Help with interrupt software routine and non-atomic operations

I have an ISR that uses a struct with two members:

struct mystruct {
   const void *data;
   size_t size;
} *curr, *next, curr_s, next_s;

void ISR(void) {
   static const unsigned char *d;
   if (next) {
     *curr = *next;
     next = NULL;
     d = curr->data;
   }
   if (curr) {
     /* Process curr structure, byte by byte */
     ...
     if (d - (const unsigned char *)curr->data ==
             curr->size) {
       curr = NULL;
     }
   }
}

Now, to activate a new data structure I wrote the following routine:

void foo(const void *data, size_t size) {
   if ((volatile struct mystruct *)curr == NULL) {
     curr_s.data = data;
     curr_s.size = size;
     curr = &curr_s;
   } else {
(*) next_s.data = data;
     next_s.size = size;
     next = &next_s;
   }
}

If a current data structure is present, it should be stopped and the new 
one should take place.

The first if guarantees that there isn't a current structure, so curr 
and curr_s can be changed without any problem.

The second if happens when a current structure really exists. In this 
case, I can't change curr_s and curr directly, so I use a temporary data 
structure next_s and an atomic flag next (I'm working on 32-bits 
architecture, so the assignment is atomic). The next data will take 
place in the next ISR.

Now suppose current structure isn't present (curr == NULL) and I call 
foo() three times:

   foo(d1, s1);
   foo(d2, s2);
   foo(d3, s3);

I would like to have finally (d3,s3). After first foo(), the first if is 
followed and curr_s data structure and curr flag is filled.
After second foo(), the second if is followed and next_s/next are filled.

What happens with the third foo()? What happens if the ISR is fired 
exactly *after* line marked with (*) above?
The ISR will use next_s data structure, but it contains uncoherent data 
values: data points to d3, but size is s2.

How to avoid this problem?

Just to explain my problem. The ISR is a timer interrupt that generates 
and audio stream at a sampling frequency on a DAC output. foo() can be 
named audio_play().

Reply by Tim Wescott ●April 28, 20162016-04-28

On Thu, 28 Apr 2016 15:15:38 +0200, pozz wrote:

> I have an ISR that uses a struct with two members:
> 
> struct mystruct {
>    const void *data;
>    size_t size;
> } *curr, *next, curr_s, next_s;
> 
> void ISR(void) {
>    static const unsigned char *d;
>    if (next) {
>      *curr = *next;
>      next = NULL;
>      d = curr->data;
>    }
>    if (curr) {
>      /* Process curr structure, byte by byte */ ...
>      if (d - (const unsigned char *)curr->data ==
>              curr->size) {
>        curr = NULL;
>      }
>    }
> }
> 
> Now, to activate a new data structure I wrote the following routine:
> 
> void foo(const void *data, size_t size) {
>    if ((volatile struct mystruct *)curr == NULL) {
>      curr_s.data = data;
>      curr_s.size = size;
>      curr = &curr_s;
>    } else {
> (*) next_s.data = data;
>      next_s.size = size;
>      next = &next_s;
>    }
> }
> 
> If a current data structure is present, it should be stopped and the new
> one should take place.
> 
> The first if guarantees that there isn't a current structure, so curr
> and curr_s can be changed without any problem.
> 
> The second if happens when a current structure really exists. In this
> case, I can't change curr_s and curr directly, so I use a temporary data
> structure next_s and an atomic flag next (I'm working on 32-bits
> architecture, so the assignment is atomic). The next data will take
> place in the next ISR.
> 
> Now suppose current structure isn't present (curr == NULL) and I call
> foo() three times:
> 
>    foo(d1, s1);
>    foo(d2, s2);
>    foo(d3, s3);
> 
> I would like to have finally (d3,s3). After first foo(), the first if is
> followed and curr_s data structure and curr flag is filled.
> After second foo(), the second if is followed and next_s/next are
> filled.
> 
> What happens with the third foo()? What happens if the ISR is fired
> exactly *after* line marked with (*) above?
> The ISR will use next_s data structure, but it contains uncoherent data
> values: data points to d3, but size is s2.
> 
> How to avoid this problem?
> 
> Just to explain my problem. The ISR is a timer interrupt that generates
> and audio stream at a sampling frequency on a DAC output. foo() can be
> named audio_play().

Normally you identify those bits of code that need to be atomic but 
aren't, you turn off interrupts for JUST THAT PART, and then you turn 
interrupts back on.

You have to accept that your interrupt latency will increase a bit (and 
you want keep a stick handy, to whack any developer who wraps a 
protection block around more than the absolute minimum of code).

-- 
www.wescottdesign.com

Reply by David Brown ●April 28, 20162016-04-28

On 28/04/16 15:15, pozz wrote:
> I have an ISR that uses a struct with two members:
>
> struct mystruct {
>    const void *data;
>    size_t size;
> } *curr, *next, curr_s, next_s;
>
> void ISR(void) {
>    static const unsigned char *d;
>    if (next) {
>      *curr = *next;
>      next = NULL;
>      d = curr->data;
>    }
>    if (curr) {
>      /* Process curr structure, byte by byte */
>      ...
>      if (d - (const unsigned char *)curr->data ==
>              curr->size) {
>        curr = NULL;
>      }
>    }
> }
>
> Now, to activate a new data structure I wrote the following routine:
>
> void foo(const void *data, size_t size) {
>    if ((volatile struct mystruct *)curr == NULL) {
>      curr_s.data = data;
>      curr_s.size = size;
>      curr = &curr_s;
>    } else {
> (*) next_s.data = data;
>      next_s.size = size;
>      next = &next_s;
>    }
> }
>
> If a current data structure is present, it should be stopped and the new
> one should take place.
>
> The first if guarantees that there isn't a current structure, so curr
> and curr_s can be changed without any problem.

Your code above does not guarantee that curr_s is set appropriately 
before curr is set.  The compiler is free to store the address of curr_s 
in curr before setting curr_s.data and curr_s.size.

To force the desired ordering, you have two choices.  You can make sure 
curr_s /and/ curr are both accessed as volatile (by using casts to 
volatile, or by simply declaring the variables as volatile).  Or you can 
use a memory barrier to stop the movement of accesses back and forth:

	asm volatile("" ::: "memory");

(assuming gcc or a compiler that supports gcc inline assembly).

Unless you are sure you know what you are doing with memory barriers, 
and with getting volatile casts in exactly the right places, it is 
usually easier to declare the relevant variables as volatile, at the 
risk of slightly suboptimal code generation.  And that means /all/ the 
relevant variables - volatile ordering does not affect the ordering of 
non-volatile accesses.

>
> The second if happens when a current structure really exists. In this
> case, I can't change curr_s and curr directly, so I use a temporary data
> structure next_s and an atomic flag next (I'm working on 32-bits
> architecture, so the assignment is atomic). The next data will take
> place in the next ISR.
>

Again, you need volatiles or a memory barrier to force the setting of 
next_s before the setting of next.

> Now suppose current structure isn't present (curr == NULL) and I call
> foo() three times:
>
>    foo(d1, s1);
>    foo(d2, s2);
>    foo(d3, s3);
>
> I would like to have finally (d3,s3). After first foo(), the first if is
> followed and curr_s data structure and curr flag is filled.
> After second foo(), the second if is followed and next_s/next are filled.
>
> What happens with the third foo()?

If the ISR has not fired, then next will be filled with d3, s3.

> What happens if the ISR is fired
> exactly *after* line marked with (*) above?

Trouble.

> The ISR will use next_s data structure, but it contains uncoherent data
> values: data points to d3, but size is s2.
>
> How to avoid this problem?
>
> Just to explain my problem. The ISR is a timer interrupt that generates
> and audio stream at a sampling frequency on a DAC output. foo() can be
> named audio_play().

There are two ways to handle this that I can think of.  One method is to 
disable interrupts (or at least the timer interrupt) around the critical 
section.  As long as you remember to use volatile accesses or memory 
barriers appropriately, that should be easy to get right (for 
convenience, your interrupt enable/disable macros should include memory 
barriers).

Other than that, you basically have a two entry FIFO queue with one 
reader client (the ISR) and one writer client (foo).  This can be 
implemented as a 3 entry ring buffer with separate head and tail 
pointers.  The writer only ever writes to the tail pointer and the 
buffer, the reader only ever writes to the head pointer.  As each part 
can only be written by one task, you can keep them consistent.  However, 
a simple implementation will only work if you drop new values when your 
FIFO is full - it gets messy if you want to be able to overwrite the 
final entry with new data.

Reply by David Brown ●April 28, 20162016-04-28

On 28/04/16 22:14, Tim Wescott wrote:

>
> Normally you identify those bits of code that need to be atomic but
> aren't, you turn off interrupts for JUST THAT PART, and then you turn
> interrupts back on.
>
> You have to accept that your interrupt latency will increase a bit (and
> you want keep a stick handy, to whack any developer who wraps a
> protection block around more than the absolute minimum of code).
>

Turning off interrupts for a bit is often the simplest way, but just be 
/very/ careful that you really do it correctly.  Something like:

	curr_s.data = data;
	curr_s.size = size;
	asm volatile("cpsid i");
	curr = &curr_s;
	asm volatile("cpsie i");

is /not/ good enough.  The compiler can freely move the writes to curr_s 
inside the critical zone (leading to unpleasant whacking), or it can 
move the write to curr outside the critical zone (leading to everything 
working perfectly until the customer demonstration, when it fails).  It 
can even move the writes to curr_s after the write to curr, and after 
the critical zone - who knows what will happen then.

Make sure you have memory barriers, such as these (for ARM Cortex-M 
devices):

static inline void disableGlobalInterrupts(void) {
	asm volatile("cpsid i" ::: "memory");
}

static inline void enableGlobalInterrupts(void) {
	asm volatile("cpsie i" ::: "memory");
}

Reply by Tim Wescott ●April 29, 20162016-04-29

On Thu, 28 Apr 2016 23:42:39 +0200, David Brown wrote:

> On 28/04/16 22:14, Tim Wescott wrote:
> 
> 
>> Normally you identify those bits of code that need to be atomic but
>> aren't, you turn off interrupts for JUST THAT PART, and then you turn
>> interrupts back on.
>>
>> You have to accept that your interrupt latency will increase a bit (and
>> you want keep a stick handy, to whack any developer who wraps a
>> protection block around more than the absolute minimum of code).
>>
>>
> Turning off interrupts for a bit is often the simplest way, but just be
> /very/ careful that you really do it correctly.  Something like:
> 
> 
> 	curr_s.data = data;
> 	curr_s.size = size;
> 	asm volatile("cpsid i");
> 	curr = &curr_s;
> 	asm volatile("cpsie i");
> 
> 
> is /not/ good enough.  The compiler can freely move the writes to curr_s
> inside the critical zone (leading to unpleasant whacking), or it can
> move the write to curr outside the critical zone (leading to everything
> working perfectly until the customer demonstration, when it fails).  It
> can even move the writes to curr_s after the write to curr, and after
> the critical zone - who knows what will happen then.
> 
> 
> Make sure you have memory barriers, such as these (for ARM Cortex-M
> devices):
> 
> 
> static inline void disableGlobalInterrupts(void) {
> 	asm volatile("cpsid i" ::: "memory");
> }
> 
> static inline void enableGlobalInterrupts(void) {
> 	asm volatile("cpsie i" ::: "memory");
> }

Something just went on my "to be fixed" queue.  I assume that I've been 
getting away with it by luck.  

At any rate, to provide the full-meal-deal you also save the current 
state of interrupts, so that you don't turn the turned-off interrupts 
off, and then turn them on inappropriately.

inline int disable_interrupts(void)
{
  int primask_copy;
  asm("mrs %[primask_copy], primask\n\t"     // save interrupt status
      "cpsid i\n\t"             // disable interrupts
      : [primask_copy] "=r" (primask_copy)
      :: "memory");

  return primask_copy;
}

inline void enable_interrupts(int primask)
{
  int primask_copy = primask;

  // Restore interrupts to their previous value
  asm("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy)
      : "memory");
}

Then in the code you write:

int interrupt_state = disable_interrupts();

// Put your teeny bit of critical stuff here

enable_interrupts(interrupt_state);

I _hope_ that this works right for all cases -- I'm interested in what 
David has to say about it.

(It's not guaranteed to compile -- I just added the "memory" stuff; I 
need to go put it into my code)

-- 
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work!  See my website if you're interested
http://www.wescottdesign.com

Reply by pozz ●April 29, 20162016-04-29

Il 28/04/2016 22:26, David Brown ha scritto:
> On 28/04/16 15:15, pozz wrote:
>> I have an ISR that uses a struct with two members:
>>
>> struct mystruct {
>>    const void *data;
>>    size_t size;
>> } *curr, *next, curr_s, next_s;
>>
>> void ISR(void) {
>>    static const unsigned char *d;
>>    if (next) {
>>      *curr = *next;
>>      next = NULL;
>>      d = curr->data;
>>    }
>>    if (curr) {
>>      /* Process curr structure, byte by byte */
>>      ...
>>      if (d - (const unsigned char *)curr->data ==
>>              curr->size) {
>>        curr = NULL;
>>      }
>>    }
>> }
>>
>> Now, to activate a new data structure I wrote the following routine:
>>
>> void foo(const void *data, size_t size) {
>>    if ((volatile struct mystruct *)curr == NULL) {
>>      curr_s.data = data;
>>      curr_s.size = size;
>>      curr = &curr_s;
>>    } else {
>> (*) next_s.data = data;
>>      next_s.size = size;
>>      next = &next_s;
>>    }
>> }
>>
>> If a current data structure is present, it should be stopped and the new
>> one should take place.
>>
>> The first if guarantees that there isn't a current structure, so curr
>> and curr_s can be changed without any problem.
>
> Your code above does not guarantee that curr_s is set appropriately
> before curr is set.  The compiler is free to store the address of curr_s
> in curr before setting curr_s.data and curr_s.size.

You are right, I didn't thought about this situation.


> To force the desired ordering, you have two choices.  You can make sure
> curr_s /and/ curr are both accessed as volatile (by using casts to
> volatile, or by simply declaring the variables as volatile).  Or you can
> use a memory barrier to stop the movement of accesses back and forth:
>
>      asm volatile("" ::: "memory");
>
> (assuming gcc or a compiler that supports gcc inline assembly).
>
> Unless you are sure you know what you are doing with memory barriers,
> and with getting volatile casts in exactly the right places, it is
> usually easier to declare the relevant variables as volatile, at the
> risk of slightly suboptimal code generation.  And that means /all/ the
> relevant variables - volatile ordering does not affect the ordering of
> non-volatile accesses.

Does the following code fix that problem?

void foo(const void *data, size_t size) {
    volatile struct mystruct *c = (volatile struct mystruct *)curr;
    if (c == NULL) {
      curr_s.data = data;
      curr_s.size = size;
      c = &curr_s;
    } else {
      /* ... */
    }
}

Now is the instruction "c = &curr_s;" *guaranteed* to be executed 
*after* curr_s assignements?


>> The second if happens when a current structure really exists. In this
>> case, I can't change curr_s and curr directly, so I use a temporary data
>> structure next_s and an atomic flag next (I'm working on 32-bits
>> architecture, so the assignment is atomic). The next data will take
>> place in the next ISR.
>>
>
> Again, you need volatiles or a memory barrier to force the setting of
> next_s before the setting of next.

Yes.


>> Now suppose current structure isn't present (curr == NULL) and I call
>> foo() three times:
>>
>>    foo(d1, s1);
>>    foo(d2, s2);
>>    foo(d3, s3);
>>
>> I would like to have finally (d3,s3). After first foo(), the first if is
>> followed and curr_s data structure and curr flag is filled.
>> After second foo(), the second if is followed and next_s/next are filled.
>>
>> What happens with the third foo()?
>
> If the ISR has not fired, then next will be filled with d3, s3.
>
>> What happens if the ISR is fired
>> exactly *after* line marked with (*) above?
>
> Trouble.
>
>> The ISR will use next_s data structure, but it contains uncoherent data
>> values: data points to d3, but size is s2.
>>
>> How to avoid this problem?
>>
>> Just to explain my problem. The ISR is a timer interrupt that generates
>> and audio stream at a sampling frequency on a DAC output. foo() can be
>> named audio_play().
>
> There are two ways to handle this that I can think of.  One method is to
> disable interrupts (or at least the timer interrupt) around the critical
> section.  As long as you remember to use volatile accesses or memory
> barriers appropriately, that should be easy to get right (for
> convenience, your interrupt enable/disable macros should include memory
> barriers).

Yes, disabling interrupts is the simplest solution, but I usually try to 
find a more elegant solution... if any exists.


> Other than that, you basically have a two entry FIFO queue with one
> reader client (the ISR) and one writer client (foo).  This can be
> implemented as a 3 entry ring buffer with separate head and tail
> pointers.  The writer only ever writes to the tail pointer and the
> buffer, the reader only ever writes to the head pointer.  As each part
> can only be written by one task, you can keep them consistent.  However,
> a simple implementation will only work if you drop new values when your
> FIFO is full - it gets messy if you want to be able to overwrite the
> final entry with new data.

But I need to keep the last pushed data, not dropping it.

What about TWO next data structures? When a foo() is called and current 
structure is busy, the free next data structure is filled and the next 
pointer is then assigned to the new data structure.


struct mystruct {
    const void *data;
    size_t size;
} *curr, *next, curr_s, next1_s, next2_s;

void foo(const void *data, size_t size) {
    volatile struct mystruct *c = (volatile struct mystruct *)curr;
    volatile struct mystruct *n = (volatile struct mystruct *)next;
    if (c == NULL) {
      curr_s.data = data;
      curr_s.size = size;
      c = &curr_s;
    } else if ((n == NULL) || (n == &next2_s)) {  // Use next1
      next1_s.data = data;
      next1_s.size = size;
      n = &next1_s;
    } else {                     // Use next2
      next2_s.data = data;
      next2_s.size = size;
      n = &next2_s;
    }
}

If volatile cast guarantees that the order of the three instructions 
inside if branches isn't changed, I think this solution works in any case.

Reply by David Brown ●April 29, 20162016-04-29

On 29/04/16 06:00, Tim Wescott wrote:
> On Thu, 28 Apr 2016 23:42:39 +0200, David Brown wrote:
> 
>> On 28/04/16 22:14, Tim Wescott wrote:
>>
>>
>>> Normally you identify those bits of code that need to be atomic but
>>> aren't, you turn off interrupts for JUST THAT PART, and then you turn
>>> interrupts back on.
>>>
>>> You have to accept that your interrupt latency will increase a bit (and
>>> you want keep a stick handy, to whack any developer who wraps a
>>> protection block around more than the absolute minimum of code).
>>>
>>>
>> Turning off interrupts for a bit is often the simplest way, but just be
>> /very/ careful that you really do it correctly.  Something like:
>>
>>
>> 	curr_s.data = data;
>> 	curr_s.size = size;
>> 	asm volatile("cpsid i");
>> 	curr = &curr_s;
>> 	asm volatile("cpsie i");
>>
>>
>> is /not/ good enough.  The compiler can freely move the writes to curr_s
>> inside the critical zone (leading to unpleasant whacking), or it can
>> move the write to curr outside the critical zone (leading to everything
>> working perfectly until the customer demonstration, when it fails).  It
>> can even move the writes to curr_s after the write to curr, and after
>> the critical zone - who knows what will happen then.
>>
>>
>> Make sure you have memory barriers, such as these (for ARM Cortex-M
>> devices):
>>
>>
>> static inline void disableGlobalInterrupts(void) {
>> 	asm volatile("cpsid i" ::: "memory");
>> }
>>
>> static inline void enableGlobalInterrupts(void) {
>> 	asm volatile("cpsie i" ::: "memory");
>> }
> 
> Something just went on my "to be fixed" queue.  I assume that I've been 
> getting away with it by luck.  

A great many people misunderstand about how memory accesses can be
re-ordered in C.  They are saved by several things:

1. If you use "volatile" generously, you will probably get the ordering
you want.  All volatile accesses (either because the variable is
declared "volatile", or you use volatile casts for the access) are
ordered with respect to each other.  It is only non-volatile accesses
that can change order.

2. Compilers don't usually move reads and writes around unless there is
something to gain.  If you have a fast superscaler processor with
caches, it can happen a lot - in particular, compilers will often
generate reads quite early so that the data is there in the register
when it is actually needed.  But for simpler, in-order processors, there
is often no point in reading early or writing late, although the
compiler may not bother writing out the data if it knows it will be
writing new values to the variable shortly, and it has the free
registers available.

3. Many developers don't really understand compiler settings and
optimisations, and simply compile with optimisations disabled (how many
times have you heard people say their code runs "correctly" when
optimisations are disabled, but not when optimised?).  Of course that
means bigger and slower code - and in this branch, that can sometimes
mean more expensive and power-hungry microcontrollers.  And it also
means code that may break on newer compilers - compilers are still free
to optimise as they want despite your choice of compiler flags.  "-O"
settings are hints, not demands.

4. When you get this sort of thing wrong, you probably won't notice.
Everything will work unless an interrupt hits at exactly the right point
in the code, and the chances of that happening are typically small.

> 
> At any rate, to provide the full-meal-deal you also save the current 
> state of interrupts, so that you don't turn the turned-off interrupts 
> off, and then turn them on inappropriately.
> 
> inline int disable_interrupts(void)
> {
>   int primask_copy;
>   asm("mrs %[primask_copy], primask\n\t"     // save interrupt status
>       "cpsid i\n\t"             // disable interrupts
>       : [primask_copy] "=r" (primask_copy)
>       :: "memory");
> 
>   return primask_copy;
> }
> 
> inline void enable_interrupts(int primask)
> {
>   int primask_copy = primask;
> 
>   // Restore interrupts to their previous value
>   asm("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy)
>       : "memory");
> }
> 
> Then in the code you write:
> 
> int interrupt_state = disable_interrupts();
> 
> // Put your teeny bit of critical stuff here
> 
> enable_interrupts(interrupt_state);
> 
> I _hope_ that this works right for all cases -- I'm interested in what 
> David has to say about it.

That all looks fine.  However, generally you don't need to save
interrupt state - because you /know/ whether you are in a critical
section or not.  If you have written:

foo(void) {
	// Do lots of stuff
	disableInterrupts();
	// tiny critical section stuff
	enableInterrupts();
	// Do lots more stuff
}

then the only reason you would want to make your
"disableInterrupts/enableInterrupts" store and restore the previous
value is if "foo" is sometimes being called from within an interrupt
function or another critical section.  And if that is happening, your
interrupts will be blocked during "lots of stuff" and "lots more stuff"
- your system is likely broken.

However, sometimes you have functions that you want to call from within
interrupts, or other situations where nested critical sections could
occur - and in that acse, you want to track and restore the interrupt
status.

> 
> (It's not guaranteed to compile -- I just added the "memory" stuff; I 
> need to go put it into my code)
>

Reply by David Brown ●April 29, 20162016-04-29

On 29/04/16 09:02, pozz wrote:
> Il 28/04/2016 22:26, David Brown ha scritto:
>> On 28/04/16 15:15, pozz wrote:
>>> I have an ISR that uses a struct with two members:
>>>
>>> struct mystruct {
>>>    const void *data;
>>>    size_t size;
>>> } *curr, *next, curr_s, next_s;
>>>
>>> void ISR(void) {
>>>    static const unsigned char *d;
>>>    if (next) {
>>>      *curr = *next;
>>>      next = NULL;
>>>      d = curr->data;
>>>    }
>>>    if (curr) {
>>>      /* Process curr structure, byte by byte */
>>>      ...
>>>      if (d - (const unsigned char *)curr->data ==
>>>              curr->size) {
>>>        curr = NULL;
>>>      }
>>>    }
>>> }
>>>
>>> Now, to activate a new data structure I wrote the following routine:
>>>
>>> void foo(const void *data, size_t size) {
>>>    if ((volatile struct mystruct *)curr == NULL) {
>>>      curr_s.data = data;
>>>      curr_s.size = size;
>>>      curr = &curr_s;
>>>    } else {
>>> (*) next_s.data = data;
>>>      next_s.size = size;
>>>      next = &next_s;
>>>    }
>>> }
>>>
>>> If a current data structure is present, it should be stopped and the new
>>> one should take place.
>>>
>>> The first if guarantees that there isn't a current structure, so curr
>>> and curr_s can be changed without any problem.
>>
>> Your code above does not guarantee that curr_s is set appropriately
>> before curr is set.  The compiler is free to store the address of curr_s
>> in curr before setting curr_s.data and curr_s.size.
> 
> You are right, I didn't thought about this situation.
> 
> 
>> To force the desired ordering, you have two choices.  You can make sure
>> curr_s /and/ curr are both accessed as volatile (by using casts to
>> volatile, or by simply declaring the variables as volatile).  Or you can
>> use a memory barrier to stop the movement of accesses back and forth:
>>
>>      asm volatile("" ::: "memory");
>>
>> (assuming gcc or a compiler that supports gcc inline assembly).
>>
>> Unless you are sure you know what you are doing with memory barriers,
>> and with getting volatile casts in exactly the right places, it is
>> usually easier to declare the relevant variables as volatile, at the
>> risk of slightly suboptimal code generation.  And that means /all/ the
>> relevant variables - volatile ordering does not affect the ordering of
>> non-volatile accesses.
> 
> Does the following code fix that problem?
> 
> void foo(const void *data, size_t size) {
>    volatile struct mystruct *c = (volatile struct mystruct *)curr;
>    if (c == NULL) {
>      curr_s.data = data;
>      curr_s.size = size;
>      c = &curr_s;
>    } else {
>      /* ... */
>    }
> }
> 
> Now is the instruction "c = &curr_s;" *guaranteed* to be executed
> *after* curr_s assignements?
> 

No - volatile accesses are only ordered with respect to other volatile
accesses (and other observable or potentially observable events, such as
calls to external functions).  The stores to curr_s can be moved after
the store to c, since the stores to curr_s are not volatile.

(You could also add a memory barrier before "c = &curr_s;".)

> 
>>> The second if happens when a current structure really exists. In this
>>> case, I can't change curr_s and curr directly, so I use a temporary data
>>> structure next_s and an atomic flag next (I'm working on 32-bits
>>> architecture, so the assignment is atomic). The next data will take
>>> place in the next ISR.
>>>
>>
>> Again, you need volatiles or a memory barrier to force the setting of
>> next_s before the setting of next.
> 
> Yes.
> 
> 
>>> Now suppose current structure isn't present (curr == NULL) and I call
>>> foo() three times:
>>>
>>>    foo(d1, s1);
>>>    foo(d2, s2);
>>>    foo(d3, s3);
>>>
>>> I would like to have finally (d3,s3). After first foo(), the first if is
>>> followed and curr_s data structure and curr flag is filled.
>>> After second foo(), the second if is followed and next_s/next are
>>> filled.
>>>
>>> What happens with the third foo()?
>>
>> If the ISR has not fired, then next will be filled with d3, s3.
>>
>>> What happens if the ISR is fired
>>> exactly *after* line marked with (*) above?
>>
>> Trouble.
>>
>>> The ISR will use next_s data structure, but it contains uncoherent data
>>> values: data points to d3, but size is s2.
>>>
>>> How to avoid this problem?
>>>
>>> Just to explain my problem. The ISR is a timer interrupt that generates
>>> and audio stream at a sampling frequency on a DAC output. foo() can be
>>> named audio_play().
>>
>> There are two ways to handle this that I can think of.  One method is to
>> disable interrupts (or at least the timer interrupt) around the critical
>> section.  As long as you remember to use volatile accesses or memory
>> barriers appropriately, that should be easy to get right (for
>> convenience, your interrupt enable/disable macros should include memory
>> barriers).
> 
> Yes, disabling interrupts is the simplest solution, but I usually try to
> find a more elegant solution... if any exists.

Often, simple /is/ elegant!

Remember, it is extremely difficult to spot issues with this sort of
thing in testing.  You can examine generated assembly code - but perhaps
at a later date, with slight changes to other code, compiler options, or
compiler versions, the generated assembly no longer matches.

So it is better to be safe than sorry, even if that means marginally
sub-optimal code.  Typically for this sort of thing, adding extra
volatiles is not going to cause inefficiency - you are going to do these
reads and writes anyway.  And disabling interrupts for a few cycles is
rarely a real problem - it can be much better than using many more
cycles for a non-locking solution.

> 
> 
>> Other than that, you basically have a two entry FIFO queue with one
>> reader client (the ISR) and one writer client (foo).  This can be
>> implemented as a 3 entry ring buffer with separate head and tail
>> pointers.  The writer only ever writes to the tail pointer and the
>> buffer, the reader only ever writes to the head pointer.  As each part
>> can only be written by one task, you can keep them consistent.  However,
>> a simple implementation will only work if you drop new values when your
>> FIFO is full - it gets messy if you want to be able to overwrite the
>> final entry with new data.
> 
> But I need to keep the last pushed data, not dropping it.

Then it gets complicated!

Basically, you need a some indirection in order to be able to atomically
swap the old data with the latest data.

> 
> What about TWO next data structures? When a foo() is called and current
> structure is busy, the free next data structure is filled and the next
> pointer is then assigned to the new data structure.
> 
> 
> struct mystruct {
>    const void *data;
>    size_t size;
> } *curr, *next, curr_s, next1_s, next2_s;
> 
> void foo(const void *data, size_t size) {
>    volatile struct mystruct *c = (volatile struct mystruct *)curr;
>    volatile struct mystruct *n = (volatile struct mystruct *)next;
>    if (c == NULL) {
>      curr_s.data = data;
>      curr_s.size = size;
>      c = &curr_s;
>    } else if ((n == NULL) || (n == &next2_s)) {  // Use next1
>      next1_s.data = data;
>      next1_s.size = size;
>      n = &next1_s;
>    } else {                     // Use next2
>      next2_s.data = data;
>      next2_s.size = size;
>      n = &next2_s;
>    }
> }
> 
> If volatile cast guarantees that the order of the three instructions
> inside if branches isn't changed, I think this solution works in any case.
> 

With the note from earlier that your curr_s (and now next1_s and
next2_s) need to be volatile, or accessed through volatile pointers, I
think this is probably okay - but I have not studied it thoroughly.

I would recommend you declare the variables here as volatile - you have
nothing to lose in efficiency, and it will make the code clearer.

I also recommend typedef'ing your struct and declaring variables
separately - again, it will make the code clearer.  But that's a style
issue, not a correctness issue, and different people have different styles.

Reply by pozz ●April 29, 20162016-04-29

Il 29/04/2016 13:02, David Brown ha scritto:
> On 29/04/16 09:02, pozz wrote:
>> Il 28/04/2016 22:26, David Brown ha scritto:
>>> On 28/04/16 15:15, pozz wrote:
>>>> I have an ISR that uses a struct with two members:
>>>>
>>>> struct mystruct {
>>>>     const void *data;
>>>>     size_t size;
>>>> } *curr, *next, curr_s, next_s;
>>>>
>>>> void ISR(void) {
>>>>     static const unsigned char *d;
>>>>     if (next) {
>>>>       *curr = *next;
>>>>       next = NULL;
>>>>       d = curr->data;
>>>>     }
>>>>     if (curr) {
>>>>       /* Process curr structure, byte by byte */
>>>>       ...
>>>>       if (d - (const unsigned char *)curr->data ==
>>>>               curr->size) {
>>>>         curr = NULL;
>>>>       }
>>>>     }
>>>> }
>>>>
>>>> Now, to activate a new data structure I wrote the following routine:
>>>>
>>>> void foo(const void *data, size_t size) {
>>>>     if ((volatile struct mystruct *)curr == NULL) {
>>>>       curr_s.data = data;
>>>>       curr_s.size = size;
>>>>       curr = &curr_s;
>>>>     } else {
>>>> (*) next_s.data = data;
>>>>       next_s.size = size;
>>>>       next = &next_s;
>>>>     }
>>>> }
>>>>
>>>> If a current data structure is present, it should be stopped and the new
>>>> one should take place.
>>>>
>>>> The first if guarantees that there isn't a current structure, so curr
>>>> and curr_s can be changed without any problem.
>>>
>>> Your code above does not guarantee that curr_s is set appropriately
>>> before curr is set.  The compiler is free to store the address of curr_s
>>> in curr before setting curr_s.data and curr_s.size.
>>
>> You are right, I didn't thought about this situation.
>>
>>
>>> To force the desired ordering, you have two choices.  You can make sure
>>> curr_s /and/ curr are both accessed as volatile (by using casts to
>>> volatile, or by simply declaring the variables as volatile).  Or you can
>>> use a memory barrier to stop the movement of accesses back and forth:
>>>
>>>       asm volatile("" ::: "memory");
>>>
>>> (assuming gcc or a compiler that supports gcc inline assembly).
>>>
>>> Unless you are sure you know what you are doing with memory barriers,
>>> and with getting volatile casts in exactly the right places, it is
>>> usually easier to declare the relevant variables as volatile, at the
>>> risk of slightly suboptimal code generation.  And that means /all/ the
>>> relevant variables - volatile ordering does not affect the ordering of
>>> non-volatile accesses.
>>
>> Does the following code fix that problem?
>>
>> void foo(const void *data, size_t size) {
>>     volatile struct mystruct *c = (volatile struct mystruct *)curr;
>>     if (c == NULL) {
>>       curr_s.data = data;
>>       curr_s.size = size;
>>       c = &curr_s;
>>     } else {
>>       /* ... */
>>     }
>> }
>>
>> Now is the instruction "c = &curr_s;" *guaranteed* to be executed
>> *after* curr_s assignements?
>>
>
> No - volatile accesses are only ordered with respect to other volatile
> accesses (and other observable or potentially observable events, such as
> calls to external functions).  The stores to curr_s can be moved after
> the store to c, since the stores to curr_s are not volatile.
>
> (You could also add a memory barrier before "c = &curr_s;".)

In that case (adding a memory barrier before "c = &curr_s", is it 
necessary to declare curr_s and c variables as volatile?


>>>> The second if happens when a current structure really exists. In this
>>>> case, I can't change curr_s and curr directly, so I use a temporary data
>>>> structure next_s and an atomic flag next (I'm working on 32-bits
>>>> architecture, so the assignment is atomic). The next data will take
>>>> place in the next ISR.
>>>>
>>>
>>> Again, you need volatiles or a memory barrier to force the setting of
>>> next_s before the setting of next.
>>
>> Yes.
>>
>>
>>>> Now suppose current structure isn't present (curr == NULL) and I call
>>>> foo() three times:
>>>>
>>>>     foo(d1, s1);
>>>>     foo(d2, s2);
>>>>     foo(d3, s3);
>>>>
>>>> I would like to have finally (d3,s3). After first foo(), the first if is
>>>> followed and curr_s data structure and curr flag is filled.
>>>> After second foo(), the second if is followed and next_s/next are
>>>> filled.
>>>>
>>>> What happens with the third foo()?
>>>
>>> If the ISR has not fired, then next will be filled with d3, s3.
>>>
>>>> What happens if the ISR is fired
>>>> exactly *after* line marked with (*) above?
>>>
>>> Trouble.
>>>
>>>> The ISR will use next_s data structure, but it contains uncoherent data
>>>> values: data points to d3, but size is s2.
>>>>
>>>> How to avoid this problem?
>>>>
>>>> Just to explain my problem. The ISR is a timer interrupt that generates
>>>> and audio stream at a sampling frequency on a DAC output. foo() can be
>>>> named audio_play().
>>>
>>> There are two ways to handle this that I can think of.  One method is to
>>> disable interrupts (or at least the timer interrupt) around the critical
>>> section.  As long as you remember to use volatile accesses or memory
>>> barriers appropriately, that should be easy to get right (for
>>> convenience, your interrupt enable/disable macros should include memory
>>> barriers).
>>
>> Yes, disabling interrupts is the simplest solution, but I usually try to
>> find a more elegant solution... if any exists.
>
> Often, simple /is/ elegant!
>
> Remember, it is extremely difficult to spot issues with this sort of
> thing in testing.  You can examine generated assembly code - but perhaps
> at a later date, with slight changes to other code, compiler options, or
> compiler versions, the generated assembly no longer matches.
>
> So it is better to be safe than sorry, even if that means marginally
> sub-optimal code.  Typically for this sort of thing, adding extra
> volatiles is not going to cause inefficiency - you are going to do these
> reads and writes anyway.  And disabling interrupts for a few cycles is
> rarely a real problem - it can be much better than using many more
> cycles for a non-locking solution.

Yes, you're right again.

If I will decide to disable timer interrupt in foo(), is it ok to avoid 
declaring variables (curr, next, curr_s, ...) with keyword volatile?

I think yes, it should be ok. If interrupt (where the variables are 
potentially changed) is disabled, the variables aren't volatile.


>>> Other than that, you basically have a two entry FIFO queue with one
>>> reader client (the ISR) and one writer client (foo).  This can be
>>> implemented as a 3 entry ring buffer with separate head and tail
>>> pointers.  The writer only ever writes to the tail pointer and the
>>> buffer, the reader only ever writes to the head pointer.  As each part
>>> can only be written by one task, you can keep them consistent.  However,
>>> a simple implementation will only work if you drop new values when your
>>> FIFO is full - it gets messy if you want to be able to overwrite the
>>> final entry with new data.
>>
>> But I need to keep the last pushed data, not dropping it.
>
> Then it gets complicated!
>
> Basically, you need a some indirection in order to be able to atomically
> swap the old data with the latest data.
>
>>
>> What about TWO next data structures? When a foo() is called and current
>> structure is busy, the free next data structure is filled and the next
>> pointer is then assigned to the new data structure.
>>
>>
>> struct mystruct {
>>     const void *data;
>>     size_t size;
>> } *curr, *next, curr_s, next1_s, next2_s;
>>
>> void foo(const void *data, size_t size) {
>>     volatile struct mystruct *c = (volatile struct mystruct *)curr;
>>     volatile struct mystruct *n = (volatile struct mystruct *)next;
>>     if (c == NULL) {
>>       curr_s.data = data;
>>       curr_s.size = size;
>>       c = &curr_s;
>>     } else if ((n == NULL) || (n == &next2_s)) {  // Use next1
>>       next1_s.data = data;
>>       next1_s.size = size;
>>       n = &next1_s;
>>     } else {                     // Use next2
>>       next2_s.data = data;
>>       next2_s.size = size;
>>       n = &next2_s;
>>     }
>> }
>>
>> If volatile cast guarantees that the order of the three instructions
>> inside if branches isn't changed, I think this solution works in any case.
>>
>
> With the note from earlier that your curr_s (and now next1_s and
> next2_s) need to be volatile, or accessed through volatile pointers, I
> think this is probably okay - but I have not studied it thoroughly.
>
> I would recommend you declare the variables here as volatile - you have
> nothing to lose in efficiency, and it will make the code clearer.
>
> I also recommend typedef'ing your struct and declaring variables
> separately - again, it will make the code clearer.  But that's a style
> issue, not a correctness issue, and different people have different styles.

Reply by David Brown ●April 29, 20162016-04-29

On 29/04/16 13:22, pozz wrote:
> Il 29/04/2016 13:02, David Brown ha scritto:
>> On 29/04/16 09:02, pozz wrote:

<snip>

>>> Now is the instruction "c = &curr_s;" *guaranteed* to be executed
>>> *after* curr_s assignements?
>>>
>>
>> No - volatile accesses are only ordered with respect to other volatile
>> accesses (and other observable or potentially observable events, such as
>> calls to external functions).  The stores to curr_s can be moved after
>> the store to c, since the stores to curr_s are not volatile.
>>
>> (You could also add a memory barrier before "c = &curr_s;".)
> 
> In that case (adding a memory barrier before "c = &curr_s", is it
> necessary to declare curr_s and c variables as volatile?
> 

You don't need both - a memory barrier tells the compiler that anything
that the source code says should be written before the barrier, will be
written before the barrier, and that anything that the source code says
should be written after the barrier, is written after the barrier.  It
also says that anything the compiler has read before the barrier should
be discarded and re-read after the barrier if needed.  So the memory
barrier provides a synchronisation point for all memory accesses,
volatile and non-volatile.

Note that both "volatile" and compiler memory barriers (using empty
inline assembly with a "memory clobber") are purely compiler-level
concepts.  They don't affect any scheduling or pipelining in the cpu,
cache, or other hardware.

<snip>

>>> Yes, disabling interrupts is the simplest solution, but I usually try to
>>> find a more elegant solution... if any exists.
>>
>> Often, simple /is/ elegant!
>>
>> Remember, it is extremely difficult to spot issues with this sort of
>> thing in testing.  You can examine generated assembly code - but perhaps
>> at a later date, with slight changes to other code, compiler options, or
>> compiler versions, the generated assembly no longer matches.
>>
>> So it is better to be safe than sorry, even if that means marginally
>> sub-optimal code.  Typically for this sort of thing, adding extra
>> volatiles is not going to cause inefficiency - you are going to do these
>> reads and writes anyway.  And disabling interrupts for a few cycles is
>> rarely a real problem - it can be much better than using many more
>> cycles for a non-locking solution.
> 
> Yes, you're right again.
> 
> If I will decide to disable timer interrupt in foo(), is it ok to avoid
> declaring variables (curr, next, curr_s, ...) with keyword volatile?
> 
> I think yes, it should be ok. If interrupt (where the variables are
> potentially changed) is disabled, the variables aren't volatile.
> 

Again, you need either volatiles or memory barriers to make sure the
required accesses (and preferably /only/ the required accesses) are
within the critical section.  An easy way to do this is to make memory
barriers part of the enable/disable interrupt functions.  But note that
memory barriers are a heavy-handed approach - if your code must be as
fast as possible, and your processor has lots of registers, long
pipelines and super-scaler scheduling, then sometimes it is not ideal.
For typical microcontrollers, however, it works very well.

I've included a few of my favourite functions below.  The assembly used
is for the ARM Cortex-M4, but can easily be adapted for other
microcontrollers (and simpler cpus like the AVR or MSP430 don't need any
cpu barrier or flush instructions - but you still need the memory clobber!).

One of these that is likely to be a new concept for many (I can imagine
Tim peering intently at this one!) is the "forceDependency" macro below.
 This is a solution to problems such as :

void setBaudrate(uint32_t baud) {
	uint32_t divisor = ((busClockTicksPerMicro * 1000000 * 2) +
			(baud / 2)) / baud;
	uint32_t sbr = divisor >> 5;

	UART_Type* pUart = UART2_BASE_PTR;

	uint8_t BDH = (sbr >> 8) & 0x1f;
	uint8_t BDL = sbr & 0xff;
	uint8_t C4 = divisor & 0x1f;

	disableInterrupts();

	// Critical section starts here

	pUart->BDH = BDH;
	pUart->BDL = BDL;
	pUart->C4 = C4;

	enableInterrupts();
}

First, "divisor" and "sbr" are calculated, which is a slow process
involving multiple cycles and a library call or two.  Then the desired
hardware register values are derived.  Then interrupts are disabled, and
the values are written into the peripheral registers.

It looks like interrupts are disabled for the minimal length of time.

But...

Even though the disableInterrupt() and enableInterrupt() functions have
a memory barrier, /and/ the pUart pointer is to a volatile variable,
those only apply to memory accesses.  /Calculations/ can be moved around
as the compiler fancies.  The compiler may not bother calculating
"divisor", "sbr", and so on, until they are actually needed - and that
means /after/ the disableInterrupt() call.  The result is that it is
quite possible for all the long calculations to be done within the
critical section.

The solution is to put:

	forceDependency(BDH);
	forceDependency(BDL);
	forceDependency(C4);

before disableInterrupt().  The "forceDependency" macro tells the
compiler that it needs to know the value of "BDH" for some inline
assembly - even though the inline assembly is empty and does nothing.

I believe that is enough lessons for today.  I hope my posts here have
been of use to both the OP (well done for asking difficult and
interesting questions) and to others in the group.

mvh.,

David

// Ensure that "val" has been calculated before next volatile access
// by requiring it as an assembly input.  Note that only volatiles are
// ordered!
#define forceDependency(val) \
		asm volatile("" :: "" (val) : )

// Ensures that all outstanding memory transactions complete before
// starting new ones
static inline void memoryBarrier(void) {
	asm volatile("dmb" : : : "memory");
}

// Compiler-only memory barrier
static inline void compilerBarrier(void) {
	asm volatile("" ::: "memory");
}

// Ensures that all outstanding memory transactions complete before
// next instruction
static inline void dsync(void) {
	asm volatile("dsb" : : : "memory");
}

// Ensures that all outstanding memory transactions complete, then
// flush instruction buffers
static inline void isync(void) {
	asm volatile("isb" : : : "memory");
}

Previous12 3 Next

Help with interrupt software routine and non-atomic operations

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About EmbeddedRelated.com

Social Networks

The Related Media Group