EmbeddedRelated.com
Forums

Help with interrupt software routine and non-atomic operations

Started by pozz April 28, 2016
On 01/05/16 23:13, pozz wrote:
> On 30/04/2016 20:46, David Brown wrote:
> > On 30/04/16 16:47, pozz wrote:
> >> On 29/04/2016 14:07, David Brown wrote:
> >>> On 29/04/16 13:22, pozz wrote:
> >>>> On 29/04/2016 13:02, David Brown wrote:
> >>>>> On 29/04/16 09:02, pozz wrote:
> >>>
> >>> <snip>
> >>>
> >>>>>> Now is the instruction "c = &curr_s;" *guaranteed* to be executed
> >>>>>> *after* curr_s assignements?
> >>>>>>
> >>>>>
> >>>>> No - volatile accesses are only ordered with respect to other
> >>>>> volatile accesses (and other observable or potentially observable
> >>>>> events, such as calls to external functions). The stores to curr_s
> >>>>> can be moved after the store to c, since the stores to curr_s are
> >>>>> not volatile.
> >>>>>
> >>>>> (You could also add a memory barrier before "c = &curr_s;".)
> >>>>
> >>>> In that case (adding a memory barrier before "c = &curr_s"), is it
> >>>> necessary to declare curr_s and c variables as volatile?
> >>>>
> >>>
> >>> You don't need both - a memory barrier tells the compiler that
> >>> anything that the source code says should be written before the
> >>> barrier, will be written before the barrier, and that anything that
> >>> the source code says should be written after the barrier, is written
> >>> after the barrier. It also says that anything the compiler has read
> >>> before the barrier should be discarded and re-read after the barrier
> >>> if needed. So the memory barrier provides a synchronisation point
> >>> for all memory accesses, volatile and non-volatile.
> >>
> >> I think I understood, even if it is a complex topic and sincerely I
> >> never heard about memory barriers before (and I have more than 10 years
> >> of embedded programming experience... my shame!).
> >
> > With generous use of "volatile", you don't need memory barriers (except
> > if you need cpu synchronisation instructions for OoO processors, as
> > noted by George Neuner).
>
> Oh yes, but you have to understand very well what volatile means and how
> it can avoid reordering. Until yesterday, I used to declare a variable
> volatile (usually a static variable at a module scope) when it is
> changed in my ISR (defined in the same module).
>
As a first approximation, what you want is volatile for all variables that are used in the ISR and another part of the program (the main loop, another ISR, etc.). That includes variables written by the main loop and only read in the ISR, and it does /not/ include variables used only within an ISR.
> I used to follow the C code, statement by statement, and check carefully
> if an interrupt trigger in that point could expose to problems.
>
> Now I know I can't follow the instructions in the same order I write
> them in the C source code, because the compiler could change this order...
Correct! That's why you need to be careful with variables /read/ by the ISR, not just those /changed/ by the ISR.
>
> The solution again is to use volatile, but for another reason: avoid
> reordering. So volatile keyword should be applied to other variables,
> not only the ones changed in interrupts routines. This is what I didn't
> know.
Don't think of volatile as "avoiding reordering" - think of it more as /enforcing/ an ordering. You only need to consider the order of accesses that are visible to the ISR, and make sure that those ones are ordered as required.
>
>
> >> Maybe I will post some other C-related technical questions to
> >> comp.lang.c for better explanations of compiler memory barriers,
> >> volatile keyword and reordering of instructions made during
> >> optimizations by the compiler.
> >
> > You can try, but I suspect you will get more theoretical or hypothetical
> > discussions about how the standards don't actually define memory
> > accesses, combined with posts about how it all worked much better in the
> > old days before the evil compiler writers broke everything. c.a.e. has
> > people with a more pragmatic attitude - if you are programming with gcc
> > for a Cortex M4, or IAR for an msp430, or Keil for an 8051, then what is
> > important is how things work for that tool and that target. Both
> > attitudes can be interesting and educational.
> >
> > Maybe you'll get some discussion going about C11 atomics - that is an
> > area I have not yet investigated much, and can be an alternative to
> > using volatiles and/or memory barriers.
>
> I'm ready to hear all of those things... they could be instructive anyway
It's high on my "things to read and learn" list. Once I have figured it out, or at least learned enough to share and/or ask sensible questions, I'll make some posts.
>
>
> >> Just a simple question: is the concept of compiler memory barrier
> >> defined in the C standard? I don't think so, because it is a special
> >> assembler instruction. So the existence of a memory barrier
> >> functionality and its effect could depend on the compiler, right?
> >>
> >
> > Correct. (But see the C11 standards for atomics.)
> >
> > "volatile" is in the C standards, of course, but "what constitutes a
> > memory access is implementation defined".
> >
> > A lot of compilers either support gcc inline assembly syntax (including
> > memory clobbers), provide intrinsics or extensions for barriers, or
> > simply have that any function calls or inline assembly act as barriers.
> > The documentation of such features can often be lacking, however.
> >
> > So generous "volatile" usage is the only cross-target way to get this
> > effect.
>
> It's a pity, because you could add volatile keyword even if it isn't
> really necessary, only to be safe. And this could have implications in
> performance.
Yes, too many "volatile" accesses can be suboptimal, as can memory barriers (since they are barriers for /all/ memory accesses). But remember that it is always /much/ better to be safe than sorry!
>
> I think one method to reduce the performance loss is to declare
> variables without volatile and access them through pointers that, in
> case safety is important, can be volatile.
Yes - remember that it is not /variables/ that are volatile, it is
/accesses/. A variable that is declared "volatile" is just one where all
the accesses are volatile (unless you use casts somewhere). While you can
use pointers to handle this, it is often convenient to do it using a
macro:

#define volatileAccess(v) *((volatile typeof((v)) *) &(v))

int a, b, c;
volatileAccess(a) = b;
c = volatileAccess(b);

(Linux has this macro too, under the name ACCESS_ONCE.)
>
> static mytype var;
>
> volatile mytype *v = (volatile mytype *)&var;
> // volatile access to var should be made through v pointer
>
>
> >>> Note that both "volatile" and compiler memory barriers (using empty
> >>> inline assembly with a "memory clobber") are purely compiler-level
> >>> concepts. They don't affect any scheduling or pipelining in the cpu,
> >>> cache, or other hardware.
> >>
> >> Even if I just learned there are hardware memory barriers too...
> >>
> >
> > What sort of processors do you use?
> >
> > If you are using an OS or RTOS of some sort, it will usually provide
> > macros or functions to handle this sort of thing for you.
> >
> > When you are doing the messy details yourself, you only need to worry
> > about processor memory ordering when you have OoO, superscalar, write
> > back buffers, memory management, caches, etc. And even then, the
> > hardware is usually arranged in a convenient way, such as to bypass
> > cache and write buffers when accessing memory mapped peripherals. The
> > fun really starts when you have a write-back cache on the cpu, and DMA
> > on the peripheral side that is trying to access the same data.
>
> I read something about Cortex-M [1] and its hw memory barrier
> instructions DSB and ISB. Cortex-M processors aren't OoO, superscalar or
> other esoteric architectures... and I'm using them currently.
>
Yes, it has those instructions (they were mentioned in another post of
mine). They are rarely needed on the M0..M4, but they are still part of
the architecture. The kind of situation where you might need them is when
fiddling with the memory management unit, or when changing code that
might be in the instruction cache (self-modifying code may be frowned
upon, but software updates are common). And note that the M7, which is
now available from some manufacturers, is dual issue.

More relevant than OoO or superscalar execution is the use of a data
writeback buffer. I don't believe the M3/M4 has one, but the M7 might.
I've worked on a PPC with data caches and writeback buffers - the
situations that require the data ordering cpu instructions are mostly
when interacting with peripherals with DMA.
> [1]
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0321a/BIHHFHJD.html
>
>
> >>>>>> Yes, disabling interrupts is the simplest solution, but I usually
> >>>>>> try to find a more elegant solution... if any exists.
> >>>>>
> >>>>> Often, simple /is/ elegant!
> >>>>>
> >>>>> Remember, it is extremely difficult to spot issues with this sort
> >>>>> of thing in testing. You can examine generated assembly code - but
> >>>>> perhaps at a later date, with slight changes to other code,
> >>>>> compiler options, or compiler versions, the generated assembly no
> >>>>> longer matches.
> >>>>>
> >>>>> So it is better to be safe than sorry, even if that means
> >>>>> marginally sub-optimal code. Typically for this sort of thing,
> >>>>> adding extra volatiles is not going to cause inefficiency - you are
> >>>>> going to do these reads and writes anyway. And disabling interrupts
> >>>>> for a few cycles is rarely a real problem - it can be much better
> >>>>> than using many more cycles for a non-locking solution.
> >>>>
> >>>> Yes, you're right again.
> >>>>
> >>>> If I will decide to disable timer interrupt in foo(), is it ok to
> >>>> avoid declaring variables (curr, next, curr_s, ...) with keyword
> >>>> volatile?
> >>>>
> >>>> I think yes, it should be ok. If interrupt (where the variables are
> >>>> potentially changed) is disabled, the variables aren't volatile.
> >>>>
> >>>
> >>> Again, you need either volatiles or memory barriers to make sure the
> >>> required accesses (and preferably /only/ the required accesses) are
> >>> within the critical section. An easy way to do this is to make memory
> >>> barriers part of the enable/disable interrupt functions. But note
> >>> that memory barriers are a heavy-handed approach - if your code must
> >>> be as fast as possible, and your processor has lots of registers,
> >>> long pipelines and superscalar scheduling, then sometimes it is not
> >>> ideal.
> >>> For typical microcontrollers, however, it works very well.
> >>
> >> I just checked the definition of sei()/cli() macros in avr-libc and...
> >> they have a memory barrier! I never noticed it.
> >
> > The folks behind the avr-libc library are pretty knowledgeable about
> > this sort of thing! Mind you, I did mention to them my
> > "forceDependency" function as a solution to the problem mentioned in
> > the avr-libc documentation (the one you found below), but it didn't
> > make it to the web page.
> >
> >>
> >>> I've included a few of my favourite functions below. The assembly
> >>> used is for the ARM Cortex-M4, but can easily be adapted for other
> >>> microcontrollers (and simpler cpus like the AVR or MSP430 don't need
> >>> any cpu barrier or flush instructions - but you still need the memory
> >>> clobber!).
> >>>
> >>> One of these that is likely to be a new concept for many (I can
> >>> imagine Tim peering intently at this one!) is the "forceDependency"
> >>> macro below. This is a solution to problems such as:
> >>>
> >>> void setBaudrate(uint32_t baud) {
> >>>     uint32_t divisor = ((busClockTicksPerMicro * 1000000 * 2) +
> >>>                         (baud / 2)) / baud;
> >>>     uint32_t sbr = divisor >> 5;
> >>>
> >>>     UART_Type* pUart = UART2_BASE_PTR;
> >>>
> >>>     uint8_t BDH = (sbr >> 8) & 0x1f;
> >>>     uint8_t BDL = sbr & 0xff;
> >>>     uint8_t C4 = divisor & 0x1f;
> >>>
> >>>     disableInterrupts();
> >>>
> >>>     // Critical section starts here
> >>>
> >>>     pUart->BDH = BDH;
> >>>     pUart->BDL = BDL;
> >>>     pUart->C4 = C4;
> >>>
> >>>     enableInterrupts();
> >>> }
> >>>
> >>> First, "divisor" and "sbr" are calculated, which is a slow process
> >>> involving multiple cycles and a library call or two. Then the desired
> >>> hardware register values are derived. Then interrupts are disabled,
> >>> and the values are written into the peripheral registers.
> >>>
> >>> It looks like interrupts are disabled for the minimal length of time.
> >>>
> >>> But...
> >>>
> >>> Even though the disableInterrupt() and enableInterrupt() functions
> >>> have a memory barrier, /and/ the pUart pointer is to a volatile
> >>> variable, those only apply to memory accesses. /Calculations/ can be
> >>> moved around as the compiler fancies. The compiler may not bother
> >>> calculating "divisor", "sbr", and so on, until they are actually
> >>> needed - and that means /after/ the disableInterrupt() call. The
> >>> result is that it is quite possible for all the long calculations to
> >>> be done within the critical section.
> >>
> >> Aaargh, this example put me on the wrong track. I thought I had
> >> understood. Your example is similar to the one shown at [1], even if
> >> there isn't a solution there.
> >
> > Yes, it is exactly the same situation - and "forceDependency" is a good
> > solution to it.
> >
> >>
> >> You say /calculations/ could be moved around and could cross memory
> >> barriers... but the results of BDH, BDL and C4 (and divisor and sbr)
> >> should be stored/saved in variables. Ok, they are automatic variables,
> >> they are on the stack, but they are /variables/.
> >
> > No, automatic variables do not necessarily result in memory variables.
> > And even if they are stored on the stack, they don't have to stay in
> > the same place on the stack, nor do they have to be the same size or
> > the same value as you use in the source. The compiler only has to turn
> > them into "real" variables in memory if you do something like pass
> > their address to an external function. That is why "what constitutes a
> > memory access is implementation-defined".
> >
> > On a device like the AVR or an ARM, with plenty of registers, most
> > automatic variables never get put on the stack. And that's a good thing
> > - it means you can freely make as many local variables as you want in
> > order to write clear and correct code, without worrying about the extra
> > stack space or timing.
>
> Are you saying the compiler has the right to not respect a memory
> barrier regarding instructions involving automatic variables *it*
> decides to manage in CPU registers? Only because on that CPU
> architecture, with those registers and in that context, it can avoid
> managing an automatic variable as a normal memory variable?
Yes.
>
> Oh my god, it's a very bad situation... I didn't think the C language
> (which was born for writing OSes and device drivers in a portable way)
> had this sort of dark corners...
It all comes down to the "as if" rule - the compiler can do anything it
wants, as long as the observable behaviour (basically, volatile accesses,
program start and stop, and calls to external functions) works /as if/
the compiler had done a simple translation.

It is not as bad as it might seem. After all, if an ISR is going to be
interacting with some data that is also read or written by the main loop,
then that data will have to go into memory as a "normal variable" -
otherwise it would not be visible to both functions. And thus your memory
barriers will work as desired.

Generally speaking, only local variables that are used within a single
function will be kept in registers, optimised away, or otherwise
manipulated. And as I say, this is a good thing - it lets you write clear
source code and lets the compiler generate efficient object code.
>
>
> What happens if the BDL, BDH and C4 variables are declared outside the
> function, as static variables at module level? They are full memory
> variables, so the compiler should respect the memory barrier for them.
> Yes, you waste a bunch of bytes, but you can be sure the non-critical
> section doesn't cross the critical section borders.
They will /probably/ get implemented as memory variables, and respect the memory barrier. But without a memory barrier, their accesses can be re-ordered. And if the compiler sees that these static variables are only ever used within this one function, and are always written by the function before reading them, it can pretend they are local automatic variables and then optimise them away - because there is no observable difference between those behaviours.
>
>
> On [2] the same problem is described in the following way:
>
> "However, memory barrier works well in ensuring that all volatile
> accesses before and after the barrier occur in the given order with
> respect to the barrier. However, it does not ensure the compiler moving
> non-volatile-related statements across the barrier."
>
> It seems the problem is caused because there are statements involving
> non-volatile variables. Maybe the problem can be solved declaring BDH,
> BDL and C4 volatile?
That would work too - but it could be less efficient. Almost certainly, BDH, BDL and C4 would be allocated memory space in the stack, and the calculated values would be written to them when their values are known, and they would be read from memory when they are used. (The interrupt enable/disable assembly is ordered as "volatile".) Using the "forceDependency" macro, the values could be kept in registers.
> [2]
> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
>
>
> >> The memory barrier
> >> embedded in disableInterrupts() macro should say to the compiler to
> >> flush/save all the values from registers to RAM. At least, this is the
> >> definition I understood of memory barrier.
> >
> > Nope.
> >
> > The barrier, as defined by the functions I gave, tells the compiler
> > that at this point, something outside of the compiler's control and
> > knowledge may read or write memory - the whole memory becomes briefly
> > volatile. However, it says /nothing/ about what should or should not be
> > stored in memory.
>
> Hmmm...., ok. Anyway, if the compiler has generated some temporary
> values for some memory variables in registers, it is forced to move
> those values to RAM to refresh the variables. Indeed, they could be
> accessed by some read operations outside the direct control of the
> compiler.
>
Remember, the compiler considers itself omniscient, except for volatile accesses. If a variable or access is not volatile, the compiler knows that nothing else will ever read it except the code it is generating.
>
> >> If the reordering is possible (the interrupts disabling instruction
> >> is moved before the calculations instructions), I haven't understood
> >> the usefulness of memory barriers.
> >
> > I believe you understand about the memory barriers, you just didn't
> > understand about what variables go in memory. As a good guide,
> > everything that has an address is in memory - that includes all global
> > data, file-level data, static data, and dynamically allocated data, as
> > well as any local data whose address you have taken, and any volatile
> > local data.
> >
> > (Strictly speaking, this is not entirely accurate in light of powerful
> > optimisations, link-time optimisation, etc., but it should be fine
> > unless the compiler was written by sadists.)
> >
> >>
> >>
> >>> The solution is to put:
> >>>
> >>>     forceDependency(BDH);
> >>>     forceDependency(BDL);
> >>>     forceDependency(C4);
> >>>
> >>> before disableInterrupt(). The "forceDependency" macro tells the
> >>> compiler that it needs to know the value of "BDH" for some inline
> >>> assembly - even though the inline assembly is empty and does nothing.
> >>
> >> Does this trick work for avr-gcc too? Because many think there isn't a
> >> solution to that problem.
> >>
> >
> > Yes, it works for avr-gcc. And I posted it as a solution on the
> > avr-libc-dev mailing list about a year ago, but the developers had
> > hoped to find a way to make the compiler automatically generate the
> > desired code rather than having "ugly" assembly. I agree that
> > mind-reading compilers are ultimately the best solution, but in the
> > meantime this trick will do the job.
> >
> >
> > I don't know if you often look at the generated assembly for your
> > compilations, but it is always interesting to see exactly what code the
> > compiler generates in different circumstances.
>
> Yes, I think this too.
>
> Moreover, I am starting to think the best approach to write a really
> working function that involves those critical aspects is to write in
> assembler directly. Mostly for safety-critical applications (where human
> deaths are possible).
It is up to you to choose the best method here, but I find it is best to write in C, with a good understanding of /exactly/ what the code does. Be generous with your volatiles - safety trumps speed every time. Then check the assembly too!
>
> If I have to check the assembler generated by the compiler, I need to
> learn it... so it is better to write in assembler directly.
That's not the same thing at all. If you make a mistake when /checking/ the assembly, then most likely you will simply fail to understand what the compiler did, and double-check your source code and re-read the information about the assembly codes. If you make a mistake when /writing/ the assembly, you are in /big/ trouble.
>
> >>> I believe that is enough lessons for today. I hope my posts here
> >>> have been of use to both the OP (well done for asking difficult and
> >>> interesting questions) and to others in the group.
> >>
> >> Thank you very much for sharing your experience.
> >>
> >> [1]
> >> www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
After learning interesting things from David about volatile keyword use, 
memory barriers and so on, I want to show three solutions to my original 
problem, with pros and cons.

I can't claim to be a guru, so my solutions could be wrong; I'll be happy
if someone points out whether they are right or wrong, and offers
suggestions. I'm writing these solutions mainly because explaining
something to others is the best way to check whether you have understood
the topic.

Here I will try to better define the original problem.

A simple audio library should be written, based around the main function 
play(). The audio stream to play is in RAM and is simply an array of 
samples to be output on a DAC peripheral embedded in the CPU. If the 
application calls play() while an audio stream is already playing, the 
library should stop the current stream and start playing the new one. 
The samples are output at the sampling frequency, which is generated by 
a timer interrupt.

--- audio.c ---
typedef struct {
   const uint16_t *samples;
   size_t size;
   const uint16_t *s;   // cursor into samples (const, like samples itself)
   bool init;
} AudioStream;

static AudioStream curr_s;             // current stream in playing
static AudioStream next1_s, next2_s;   // next streams to play
static AudioStream *curr, *next;       // used mainly as atomic flags

void Timer_ISR(void) {
   uint16_t sample;
   bool dac_refresh = false;
   if (next) {        // a new stream is pending
     curr_s = *next;
     curr = &curr_s;
     curr->init = 0;
     next = NULL;
   }
   if (curr) {        // we are playing a stream
     if (!curr->init) {
       curr->s = curr->samples;
       curr->init = 1;
     }
     sample = *(curr->s);
     dac_refresh = true;
     if (++curr->s - curr->samples == curr->size) {
       curr = NULL;   // last sample processed
     }
   }
   if (dac_refresh) DAC_VAL = sample;
}

void play(const uint16_t *samples, size_t size)
{
   // Note: the pointer objects curr/next themselves must be accessed as
   // volatile, hence "* volatile *" rather than "volatile ... **".
   AudioStream *volatile *const c =
            (AudioStream *volatile *)&curr;
   AudioStream *volatile *const n =
            (AudioStream *volatile *)&next;
   if (*c == NULL) {
     volatile AudioStream *const s =
           (volatile AudioStream *)&curr_s;
     s->samples = samples;
     s->size = size;
     s->init = 0;
     *c = &curr_s;
   } else if (*n != &next1_s) {
     volatile AudioStream *const s =
           (volatile AudioStream *)&next1_s;
     s->samples = samples;
     s->size = size;
     s->init = 0;
     *n = &next1_s;
   } else {
     volatile AudioStream *const s =
           (volatile AudioStream *)&next2_s;
     s->samples = samples;
     s->size = size;
     s->init = 0;
     *n = &next2_s;
   }
}
--- audio.c ---

This is my preferred solution, because it doesn't use memory barriers at 
all, so it is perfectly portable across platforms and completely 
standard.

I use two buffers for the next audio stream to play. They are necessary 
in situations where three (or more) calls to play() happen in the main 
application without any timer interrupt occurring in between:

   play(audio1, audio1_size);
   ...
   play(audio2, audio2_size);
   ...
   play(audio3, audio3_size);

With only one next buffer, there is a real possibility of trouble if the 
timer interrupt triggers at exactly the wrong point (that was the 
question in my original post).

The static variables that are changed in the ISR aren't declared 
volatile. This avoids inhibiting optimization of the ISR itself (having 
a small and fast ISR is surely a benefit).

The volatile memory accesses should be done in the main/background 
application, i.e. in the play() function. This is why I don't access 
curr, curr_s, next, next1_s and next2_s directly there; instead I access 
them through volatile-qualified mirror pointers.

As pointed out by David, it's important to use volatile for *all* the 
variables involved in the play() function. Let's take a look at one part 
of it:

   if (*c == NULL) {
     volatile AudioStream *s = (volatile AudioStream *)&curr_s;
     s->samples = samples;
     s->size = size;
     s->init = 0;
     *c = &curr_s;

If *c (that is, curr) is NULL, we aren't playing any stream and the ISR 
doesn't change any variable. It could appear safe to drop volatile in 
this first branch, in the following *wrong* way:

   if (*c == NULL) {
     curr_s.samples = samples;
     curr_s.size = size;
     curr_s.init = 0;
     *c = &curr_s;
   ...

Here curr_s isn't volatile, so the compiler could reorder the 
instructions. If the last statement "*c = &curr_s" is moved to the top 
(and this is allowed), we will be in trouble if the interrupt triggers 
just after it.

In order to avoid reordering, we *must* use *only* volatile accesses. As 
David explained, the compiler can't change the relative order of 
volatile memory accesses.

The only drawback of this approach is that the programmer must carefully 
check the critical code (the play() function), verifying that an 
interrupt triggering *at any point* cannot cause problems. Doing so, you 
will discover that you need two next buffers and an atomic flag that 
switches from one buffer to the other.

Regarding the curr and next flags: they are pointers, so I'm assuming 
the architecture is capable of atomically reading/writing pointers. If 
this isn't the case (for example, on AVR architectures), you could 
replace the pointers with unsigned char: curr would become 
current_stream_is_active (0 or 1) and next would become 
next_stream_pending (0 for no pending stream, 1 for a stream pending in 
next1_s, 2 for a stream pending in next2_s).

I hope I haven't written anything too wrong.

On 30/04/2016 17:39, David Brown wrote:
> [...]
> So I would suggest you either use my functions, whose names make it
> clear what is going on:
>
> static inline void disableGlobalInterrupts(void) {
>     asm volatile("cpsid i" ::: "memory");
> }
>
> static inline void enableGlobalInterrupts(void) {
>     asm volatile("cpsie i" ::: "memory");
> }
>
>
> Or you use your functions, but give them better (IMHO) names:
>
> static int pauseInterrupts(void)
>     __attribute__((warn_unused_result));
> static inline int pauseInterrupts(void)
> {
>     int primask_copy;
>     asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>         "cpsid i\n\t"                       // disable interrupts
>         : [primask_copy] "=r" (primask_copy)
>         :: "memory");
>
>     return primask_copy;
> }
>
> static inline void restoreInterrupts(int primask)
> {
>     int primask_copy = primask;
>
>     // Restore interrupts to their previous value
>     asm("msr primask, %[primask_copy]"
>         : : [primask_copy] "r" (primask_copy)
>         : "memory");
> }
avr-libc (in atomic.h) defines the ATOMIC_BLOCK macro, which accepts 
ATOMIC_FORCEON or ATOMIC_RESTORESTATE. It's gcc, so I think those macros 
can be reused for Cortex-M (with the gcc compiler), of course changing 
the assembler instructions that enable/disable interrupts accordingly.
I admit that following the code instruction by instruction, checking 
whether an interrupt trigger could cause problems, is tricky. The use of 
a double buffer (next1_s and next2_s) is obscure at first sight.

Maybe it is simpler to disable and enable interrupts, without worrying 
about the next pending audio stream.

--- audio.c ---
typedef struct {
   const uint16_t *samples;
   size_t size;
   const uint16_t *s;   // cursor into samples (const, like samples itself)
} AudioStream;

static AudioStream curr_s;      // current stream in playing
static AudioStream *curr;       // used mainly as atomic flags

void Timer_ISR(void) {
   uint16_t sample;
   bool dac_refresh = false;
   if (curr) {        // we are playing a stream
     if (curr->s == NULL) {
       curr->s = curr->samples;
     }
     sample = *(curr->s);
     dac_refresh = true;
     if (++curr->s - curr->samples == curr->size) {
       curr = NULL;   // last sample processed
     }
   }
   if (dac_refresh) DAC_VAL = sample;
}

void play(const uint16_t *samples, size_t size)
{
   disableInterrupts();  // start of critical section
   curr_s.samples = samples;
   curr_s.size = size;
   curr_s.s = NULL;
   curr = &curr_s;
   enableInterrupts();   // end of critical section
}
--- audio.c ---

In this solution, I don't need the init member in the AudioStream 
structure: I use the s member to tell whether the stream is initialized 
or not.

Here I don't need volatile for any of the variables managed in play(): 
after entering the critical section, the variables can't be changed 
asynchronously (interrupts are disabled), so accesses to them can be 
optimized normally.
[It is possible to disable only the timer interrupt instead of all 
interrupts.]

The only problem is to prevent the optimizer from moving instructions 
from inside the critical section to outside it (before 
disableInterrupts() or after enableInterrupts()).
This is something I never took care of until David's explanation.

One solution to avoid reordering is to embed a memory barrier in 
disableInterrupts() and enableInterrupts(). The memory barrier 
guarantees that the memory accesses before and after it (inside the 
critical section, in our case) aren't moved across it, even if curr_s 
and curr aren't volatile (David, is this right?).


If the code is for Atmel AVR architecture, I think it is better to use 
macros from <util/atomic.h>:

#include <util/atomic.h>
...
void play(const uint16_t *samples, size_t size)
{
   ATOMIC_BLOCK(ATOMIC_FORCEON) {
     curr_s.samples = samples;
     curr_s.size = size;
     curr_s.s = NULL;
     curr = &curr_s;
   }
}

The ATOMIC_BLOCK macro automatically disables interrupts at the 
beginning (using a memory barrier) and enables them again at the end 
(again with a memory barrier).

Incidentally, I noticed a strange thing in <util/atomic.h> and 
<interrupt.h>:

--- <interrupt.h> ---
...
#define sei()  __asm__ __volatile__ ("sei" ::: "memory")
...
--- <interrupt.h> ---

--- <util/atomic.h> ---
...
static __inline__ void __iSeiParam(const uint8_t *__s)
{
   sei();
   __asm__ volatile ("" ::: "memory");
   (void)__s;
}
...
--- <util/atomic.h> ---

Why add a memory barrier after sei() in __iSeiParam(), if the sei() 
macro already has a memory barrier?

Anyway, reading <util/atomic.h> is very instructive, because it uses many 
tricks of the C language and the gcc compiler (for example, the cleanup 
attribute on a variable).


If the code is for Cortex-Mx architectures, the macros proposed by Tim or 
David can be used. I don't know whether it is necessary to add DSB and ISB 
hardware memory barriers after "cpsid i", to be sure that pending interrupts 
can't fire after the disable instruction.
On 02/05/16 09:48, pozz wrote:
> Il 30/04/2016 17:39, David Brown ha scritto: >> [...] >> So I would suggest you either use my functions, whose names make it >> clear what is going on: >> >> static inline void disableGlobalInterrupts(void) { >> asm volatile("cpsid i" ::: "memory"); >> } >> >> static inline void enableGlobalInterrupts(void) { >> asm volatile("cpsie i" ::: "memory"); >> } >> >> >> Or you use your functions, but give them better (IMHO) names: >> >> static int pauseInterrupts(void) >> __attribute__((warn_unused_result)); >> static inline int pauseInterrupts(void) >> { >> int primask_copy; >> asm("mrs %[primask_copy], primask\n\t" // save interrupt status >> "cpsid i\n\t" // disable interrupts >> : [primask_copy] "=r" (primask_copy) >> :: "memory"); >> >> return primask_copy; >> } >> >> static inline void restoreIntterupts(int primask) >> { >> int primask_copy = primask; >> >> // Restore interrupts to their previous value >> asm("msr primask, %[primask_copy]" : : [primask_copy] "r" >> (primask_copy) >> : "memory"); >> } > > avr-libc (in atomic.h) defines ATOMIC_BLOCK macro that accepts > ATOMIC_FORCEON or ATOMIC_RESTORESTATE. It's gcc, so I think those > macros can be reused for Cortex-M (and gcc compiler), of course changing > accordingly assembler instructions that enable/disable interrupts. > >
Certainly the principles behind these macros could be adapted and reused. Whether or not you like the format of those macros is a matter of taste, but the idea is that a local variable (typically held in a register, not the stack) gets a copy of the old interrupt status, then interrupts are disabled. A gcc extension to allow a "clean up" function is used to make sure a function (to enable, disable or restore interrupts as appropriate) is called when the local variable goes out of scope. Personally, I think the method is quite nice - it avoids the need to match your "enter critical block" with an "exit critical block". For those using C++, it is perhaps neater to use a C++ class with a destructor to handle this.
Il 28/04/2016 23:42, David Brown ha scritto:
> On 28/04/16 22:14, Tim Wescott wrote: > >> >> Normally you identify those bits of code that need to be atomic but >> aren't, you turn off interrupts for JUST THAT PART, and then you turn >> interrupts back on. >> >> You have to accept that your interrupt latency will increase a bit (and >> you want keep a stick handy, to whack any developer who wraps a >> protection block around more than the absolute minimum of code). >> > > Turning off interrupts for a bit is often the simplest way, but just be > /very/ careful that you really do it correctly. Something like: > > > curr_s.data = data; > curr_s.size = size; > asm volatile("cpsid i"); > curr = &curr_s; > asm volatile("cpsie i"); > > > is /not/ good enough. The compiler can freely move the writes to curr_s > inside the critical zone (leading to unpleasant whacking), or it can > move the write to curr outside the critical zone (leading to everything > working perfectly until the customer demonstration, when it fails). It > can even move the writes to curr_s after the write to curr, and after > the critical zone - who knows what will happen then. > > > Make sure you have memory barriers, such as these (for ARM Cortex-M > devices):
Maybe another possibility is to switch off the optimizer completely for a 
certain function:

void critical_function(...) __attribute__((optimize("O0")))
{
   curr_s.data = data;
   curr_s.size = size;
   asm ("cpsid i");
   curr = &curr_s;
   asm volatile("cpsie i");
}

In this way, I don't think the volatile attribute for the inline assembly 
code is necessary anymore.

However, the gcc manual says about this attribute:

"This attribute should be used for debugging purposes only. It is not 
suitable in production code."

Why?
pozz wrote:

> Il 28/04/2016 23:42, David Brown ha scritto: >> On 28/04/16 22:14, Tim Wescott wrote: >> >>> >>> Normally you identify those bits of code that need to be atomic but >>> aren't, you turn off interrupts for JUST THAT PART, and then you turn >>> interrupts back on. >>> >>> You have to accept that your interrupt latency will increase a bit (and >>> you want keep a stick handy, to whack any developer who wraps a >>> protection block around more than the absolute minimum of code). >>> >> >> Turning off interrupts for a bit is often the simplest way, but just be >> /very/ careful that you really do it correctly. Something like: >> >> >> curr_s.data = data; >> curr_s.size = size; >> asm volatile("cpsid i"); >> curr = &curr_s; >> asm volatile("cpsie i"); >> >> >> is /not/ good enough. The compiler can freely move the writes to curr_s >> inside the critical zone (leading to unpleasant whacking), or it can >> move the write to curr outside the critical zone (leading to everything >> working perfectly until the customer demonstration, when it fails). It >> can even move the writes to curr_s after the write to curr, and after >> the critical zone - who knows what will happen then. >> >> >> Make sure you have memory barriers, such as these (for ARM Cortex-M >> devices): > > Maybe another possibility is to switch off completely the optimizer for > a certain funcion: > > void > critical_function(...) __attribute__((optimize("O0"))) > { > curr_s.data = data; > curr_s.size = size; > asm ("cpsid i"); > curr = &curr_s; > asm volatile("cpsie i"); > } > > In this way, I don't think volatile attribute for inline assembly code > is necessary anymore. > > However gcc manual says regarding this attribute: > > "This attribute should be used for debugging purposes > only. It is not suitable in production code." > > Why?
Switching the optimizer off does not guarantee that the compiler respects 
it, and even then the compiler is still free to rearrange everything. Maybe 
not now, but in the next version.

What we really need for some embedded stuff is a compiler option or pragma 
to tell the compiler "keep your fingers off my code", and a guarantee that 
this is respected.

-- 
Reinhardt
On 6.5.16 13:44, Reinhardt Behm wrote:
> pozz wrote: > >> Il 28/04/2016 23:42, David Brown ha scritto: >>> On 28/04/16 22:14, Tim Wescott wrote: >>> >>>> >>>> Normally you identify those bits of code that need to be atomic but >>>> aren't, you turn off interrupts for JUST THAT PART, and then you turn >>>> interrupts back on. >>>> >>>> You have to accept that your interrupt latency will increase a bit (and >>>> you want keep a stick handy, to whack any developer who wraps a >>>> protection block around more than the absolute minimum of code). >>>> >>> >>> Turning off interrupts for a bit is often the simplest way, but just be >>> /very/ careful that you really do it correctly. Something like: >>> >>> >>> curr_s.data = data; >>> curr_s.size = size; >>> asm volatile("cpsid i"); >>> curr = &curr_s; >>> asm volatile("cpsie i"); >>> >>> >>> is /not/ good enough. The compiler can freely move the writes to curr_s >>> inside the critical zone (leading to unpleasant whacking), or it can >>> move the write to curr outside the critical zone (leading to everything >>> working perfectly until the customer demonstration, when it fails). It >>> can even move the writes to curr_s after the write to curr, and after >>> the critical zone - who knows what will happen then. >>> >>> >>> Make sure you have memory barriers, such as these (for ARM Cortex-M >>> devices): >> >> Maybe another possibility is to switch off completely the optimizer for >> a certain funcion: >> >> void >> critical_function(...) __attribute__((optimize("O0"))) >> { >> curr_s.data = data; >> curr_s.size = size; >> asm ("cpsid i"); >> curr = &curr_s; >> asm volatile("cpsie i"); >> } >> >> In this way, I don't think volatile attribute for inline assembly code >> is necessary anymore. >> >> However gcc manual says regarding this attribute: >> >> "This attribute should be used for debugging purposes >> only. It is not suitable in production code." >> >> Why? 
> > Switching optimize off does not guarantee that the compiler respects it and > even then the compiler is still free to rearrange everything. May not now, > but the next version. > > What we really need for some embedded stuff is a compiler option or pragma > to tell the compiler "keep your fingers off my code" and a guaranty that > this is respected.
The GCC tool for that is __asm__ volatile (). -- -TV
Il 06/05/2016 14:13, Tauno Voipio ha scritto:
> On 6.5.16 13:44, Reinhardt Behm wrote: >> pozz wrote: >> >>> Il 28/04/2016 23:42, David Brown ha scritto: >>>> On 28/04/16 22:14, Tim Wescott wrote: >>>> >>>>> >>>>> Normally you identify those bits of code that need to be atomic but >>>>> aren't, you turn off interrupts for JUST THAT PART, and then you turn >>>>> interrupts back on. >>>>> >>>>> You have to accept that your interrupt latency will increase a bit >>>>> (and >>>>> you want keep a stick handy, to whack any developer who wraps a >>>>> protection block around more than the absolute minimum of code). >>>>> >>>> >>>> Turning off interrupts for a bit is often the simplest way, but just be >>>> /very/ careful that you really do it correctly. Something like: >>>> >>>> >>>> curr_s.data = data; >>>> curr_s.size = size; >>>> asm volatile("cpsid i"); >>>> curr = &curr_s; >>>> asm volatile("cpsie i"); >>>> >>>> >>>> is /not/ good enough. The compiler can freely move the writes to >>>> curr_s >>>> inside the critical zone (leading to unpleasant whacking), or it can >>>> move the write to curr outside the critical zone (leading to everything >>>> working perfectly until the customer demonstration, when it fails). It >>>> can even move the writes to curr_s after the write to curr, and after >>>> the critical zone - who knows what will happen then. >>>> >>>> >>>> Make sure you have memory barriers, such as these (for ARM Cortex-M >>>> devices): >>> >>> Maybe another possibility is to switch off completely the optimizer for >>> a certain funcion: >>> >>> void >>> critical_function(...) __attribute__((optimize("O0"))) >>> { >>> curr_s.data = data; >>> curr_s.size = size; >>> asm ("cpsid i"); >>> curr = &curr_s; >>> asm volatile("cpsie i"); >>> } >>> >>> In this way, I don't think volatile attribute for inline assembly code >>> is necessary anymore. >>> >>> However gcc manual says regarding this attribute: >>> >>> "This attribute should be used for debugging purposes >>> only. It is not suitable in production code." 
>>> >>> Why? >> >> Switching optimize off does not guarantee that the compiler respects >> it and >> even then the compiler is still free to rearrange everything. May not >> now, >> but the next version. >> >> What we really need for some embedded stuff is a compiler option or >> pragma >> to tell the compiler "keep your fingers off my code" and a guaranty that >> this is respected. > > > The GCC tool for that is __asm__ volatile ().
If you write all the function code in assembler (with __asm__ volatile), 
it's ok. If you mix inline assembler (even with volatile) and C code, there 
is some risk that the compiler/optimizer rearranges the instructions anyway.

For example, this page explains how to avoid reordering, even with volatile 
assembler:
http://www.ethernut.de/en/documents/arm-inline-asm.html
On 06/05/16 14:44, pozz wrote:
> Il 06/05/2016 14:13, Tauno Voipio ha scritto: >> On 6.5.16 13:44, Reinhardt Behm wrote: >>> pozz wrote: >>> >>>> Il 28/04/2016 23:42, David Brown ha scritto: >>>>> On 28/04/16 22:14, Tim Wescott wrote: >>>>> >>>>>> >>>>>> Normally you identify those bits of code that need to be atomic but >>>>>> aren't, you turn off interrupts for JUST THAT PART, and then you turn >>>>>> interrupts back on. >>>>>> >>>>>> You have to accept that your interrupt latency will increase a bit >>>>>> (and >>>>>> you want keep a stick handy, to whack any developer who wraps a >>>>>> protection block around more than the absolute minimum of code). >>>>>> >>>>> >>>>> Turning off interrupts for a bit is often the simplest way, but >>>>> just be >>>>> /very/ careful that you really do it correctly. Something like: >>>>> >>>>> >>>>> curr_s.data = data; >>>>> curr_s.size = size; >>>>> asm volatile("cpsid i"); >>>>> curr = &curr_s; >>>>> asm volatile("cpsie i"); >>>>> >>>>> >>>>> is /not/ good enough. The compiler can freely move the writes to >>>>> curr_s >>>>> inside the critical zone (leading to unpleasant whacking), or it can >>>>> move the write to curr outside the critical zone (leading to >>>>> everything >>>>> working perfectly until the customer demonstration, when it >>>>> fails). It >>>>> can even move the writes to curr_s after the write to curr, and after >>>>> the critical zone - who knows what will happen then. >>>>> >>>>> >>>>> Make sure you have memory barriers, such as these (for ARM Cortex-M >>>>> devices): >>>> >>>> Maybe another possibility is to switch off completely the optimizer for >>>> a certain funcion:
That's a bad idea, for several reasons.

1. You can't "switch off the optimiser". You can give a compiler hints as 
to how much effort it should make in optimising code, but it is only a hint 
- the compiler will do various optimisations and re-organisations anyway.

2. You cannot turn bad code into good code by fiddling with optimiser 
options. Write your code /correctly/, according to the rules of C and the 
additional facilities provided by individual compilers, and let the 
optimiser do its job. If your code "works when optimisation is disabled", 
then you can be pretty sure that your code is bad.

3. With -O0 optimisation, gcc (and many other compilers) produce terribly 
inefficient code. That helps no one.

4. Mixing optimisation levels in different functions in one translation 
unit works poorly. The compiler does not handle functions completely 
separately, so different options have a tendency to "rub off" on other 
functions. So you will get some optimisations unexpectedly enabled in this 
one function, and others unexpectedly disabled in other functions.
>>>> >>>> void >>>> critical_function(...) __attribute__((optimize("O0"))) >>>> { >>>> curr_s.data = data; >>>> curr_s.size = size; >>>> asm ("cpsid i"); >>>> curr = &curr_s; >>>> asm volatile("cpsie i"); >>>> } >>>> >>>> In this way, I don't think volatile attribute for inline assembly code >>>> is necessary anymore. >>>> >>>> However gcc manual says regarding this attribute: >>>> >>>> "This attribute should be used for debugging purposes >>>> only. It is not suitable in production code." >>>> >>>> Why? >>>
See above (and Reinhardt's point below).
>>> Switching optimize off does not guarantee that the compiler respects >>> it and >>> even then the compiler is still free to rearrange everything. May not >>> now, >>> but the next version. >>> >>> What we really need for some embedded stuff is a compiler option or >>> pragma >>> to tell the compiler "keep your fingers off my code" and a guaranty that >>> this is respected.
No, we don't. We need tools for telling the compiler about the code in more detail than the C language provides - and asking the compiler to give us the best code it can within those additional restrictions. "volatile" is a good step here. Memory clobbers are an additional tool, though the details are specific to each compiler, and they are quite coarse-grained.
>> >> >> The GCC tool for that is __asm__ volatile ().
That is part of it, but it is not the only one.
> > If you write all the function code in assembler (with __asm__ volatile), > it's ok. If you mix inline assembler (even with volatile) and C code, > there is some risk the compiler/optimizer rearrange the instructions > anyway.
I should hope that the compiler will re-arrange and optimise the inline assembler - that is one of the features of gcc that beats many commercial tools that get flustered and panic whenever they see assembly code. The fact that the compiler will handle inline assembly within the flow of the optimiser and the surrounding C code lets you use target-specific features without losing basic C optimisations or having to write large chunks of performance-critical code in manual assembly. But it does mean that you have to learn to understand how to talk to your compiler if you want the very best quality code, and are not satisfied by simply adding a generous selection of "volatile" qualifiers.
> > For example, here is explained how to avoid rearranging, even with > volatile assembler: > http://www.ethernut.de/en/documents/arm-inline-asm.html > >