
Help with interrupt software routine and non-atomic operations

Started by pozz April 28, 2016
On Fri, 29 Apr 2016 12:50:23 +0200, David Brown wrote:

> On 29/04/16 06:00, Tim Wescott wrote:
>> On Thu, 28 Apr 2016 23:42:39 +0200, David Brown wrote:
>>
>>> On 28/04/16 22:14, Tim Wescott wrote:
>>>
>>>> Normally you identify those bits of code that need to be atomic but
>>>> aren't, you turn off interrupts for JUST THAT PART, and then you turn
>>>> interrupts back on.
>>>>
>>>> You have to accept that your interrupt latency will increase a bit
>>>> (and you want to keep a stick handy, to whack any developer who wraps a
>>>> protection block around more than the absolute minimum of code).
>>>>
>>> Turning off interrupts for a bit is often the simplest way, but just
>>> be /very/ careful that you really do it correctly. Something like:
>>>
>>>     curr_s.data = data;
>>>     curr_s.size = size;
>>>     asm volatile("cpsid i");
>>>     curr = &curr_s;
>>>     asm volatile("cpsie i");
>>>
>>> is /not/ good enough. The compiler can freely move the writes to
>>> curr_s inside the critical zone (leading to unpleasant whacking), or
>>> it can move the write to curr outside the critical zone (leading to
>>> everything working perfectly until the customer demonstration, when it
>>> fails). It can even move the writes to curr_s after the write to
>>> curr, and after the critical zone - who knows what will happen then.
>>>
>>> Make sure you have memory barriers, such as these (for ARM Cortex-M
>>> devices):
>>>
>>> static inline void disableGlobalInterrupts(void) {
>>>     asm volatile("cpsid i" ::: "memory");
>>> }
>>>
>>> static inline void enableGlobalInterrupts(void) {
>>>     asm volatile("cpsie i" ::: "memory");
>>> }
>>
>> Something just went on my "to be fixed" queue. I assume that I've been
>> getting away with it by luck.
>
> A great many people misunderstand how memory accesses can be
> re-ordered in C. They are saved by several things:
>
> 1. If you use "volatile" generously, you will probably get the ordering
> you want. All volatile accesses (either because the variable is
> declared "volatile", or you use volatile casts for the access) are
> ordered with respect to each other. It is only non-volatile accesses
> that can change order.
>
> 2. Compilers don't usually move reads and writes around unless there is
> something to gain. If you have a fast superscalar processor with
> caches, it can happen a lot - in particular, compilers will often
> generate reads quite early so that the data is there in the register
> when it is actually needed. But for simpler, in-order processors, there
> is often no point in reading early or writing late, although the
> compiler may not bother writing out the data if it knows it will be
> writing new values to the variable shortly, and it has the free
> registers available.
>
> 3. Many developers don't really understand compiler settings and
> optimisations, and simply compile with optimisations disabled (how many
> times have you heard people say their code runs "correctly" when
> optimisations are disabled, but not when optimised?). Of course that
> means bigger and slower code - and in this business, that can sometimes
> mean more expensive and power-hungry microcontrollers. And it also
> means code that may break on newer compilers - compilers are still free
> to optimise as they want despite your choice of compiler flags. "-O"
> settings are hints, not demands.
>
> 4. When you get this sort of thing wrong, you probably won't notice.
> Everything will work unless an interrupt hits at exactly the right point
> in the code, and the chances of that happening are typically small.
>
>> At any rate, to provide the full-meal-deal you also save the current
>> state of interrupts, so that you don't turn the turned-off interrupts
>> off, and then turn them on inappropriately.
>>
>> inline int disable_interrupts(void)
>> {
>>     int primask_copy;
>>     asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
>>         "cpsid i\n\t"                       // disable interrupts
>>         : [primask_copy] "=r" (primask_copy)
>>         :: "memory");
>>
>>     return primask_copy;
>> }
>>
>> inline void enable_interrupts(int primask)
>> {
>>     int primask_copy = primask;
>>
>>     // Restore interrupts to their previous value
>>     asm("msr primask, %[primask_copy]"
>>         : : [primask_copy] "r" (primask_copy)
>>         : "memory");
>> }
>>
>> Then in the code you write:
>>
>>     int interrupt_state = disable_interrupts();
>>
>>     // Put your teeny bit of critical stuff here
>>
>>     enable_interrupts(interrupt_state);
>>
>> I _hope_ that this works right for all cases -- I'm interested in what
>> David has to say about it.
>
> That all looks fine. However, generally you don't need to save
> interrupt state - because you /know/ whether you are in a critical
> section or not. If you have written:
>
> foo(void) {
>     // Do lots of stuff
>     disableInterrupts();
>     // tiny critical section stuff
>     enableInterrupts();
>     // Do lots more stuff
> }
>
> then the only reason you would want to make your
> "disableInterrupts/enableInterrupts" store and restore the previous
> value is if "foo" is sometimes being called from within an interrupt
> function or another critical section. And if that is happening, your
> interrupts will be blocked during "lots of stuff" and "lots more stuff"
> - your system is likely broken.
>
> However, sometimes you have functions that you want to call from within
> interrupts, or other situations where nested critical sections could
> occur - and in that case, you want to track and restore the interrupt
> status.
I know. It's a team-programming thing. Sometimes you sit there thinking
"no one but a dumbass would use my code in this way" -- then you realize
that: (A) sometimes your manager* hires dumbasses; (B) when a dumbass
breaks your code, you'll be the first one blamed, and even when the smoke
clears some of that blame will be stuck to you; and finally (C),
sometimes, on a bad day, you're the dumbass that breaks your own code.

* Well, I believe that David owns his company, but still, mistakes happen.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
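To tie the two sub-threads above together, here is a minimal sketch (not
code from either post; the struct layout and the function name "publish"
are invented for illustration) of the earlier broken curr_s example,
rewritten around the barrier-carrying interrupt functions David gave:

#include <stdint.h>

struct sample { const uint8_t *data; uint32_t size; };  /* invented layout */

static struct sample curr_s;
static struct sample *curr;

static inline void disableGlobalInterrupts(void) { asm volatile("cpsid i" ::: "memory"); }
static inline void enableGlobalInterrupts(void)  { asm volatile("cpsie i" ::: "memory"); }

void publish(const uint8_t *data, uint32_t size)
{
    curr_s.data = data;
    curr_s.size = size;
    disableGlobalInterrupts();   /* "memory" clobber: the stores above cannot sink past this */
    curr = &curr_s;              /* the pointer update happens inside the critical section */
    enableGlobalInterrupts();    /* and cannot leak out below it */
}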
On Fri, 29 Apr 2016 14:07:53 +0200, David Brown
<david.brown@hesbynett.no> wrote:

>Note that both "volatile" and compiler memory barriers (using empty >inline assembly with a "memory clobber") are purely compiler-level >concepts. They don't affect any scheduling or pipelining in the cpu, >cache, or other hardware.
That's only partly correct.

It is true that the compiler *should* never reorder reads or writes
across a declared memory barrier, however ...

... memory barriers (aka fences) are actual CPU instructions that
prevent reordering reads and/or writes in the hardware. OoO CPUs
require general fencing capability, and in-order CPUs require (the
moral equivalent of) store fences if they perform write combining.

The asm barrier declaration should generate the corresponding barrier
instruction for the target CPU.

George
On 29/04/2016 14:07, David Brown wrote:
> On 29/04/16 13:22, pozz wrote: >> Il 29/04/2016 13:02, David Brown ha scritto: >>> On 29/04/16 09:02, pozz wrote: > > <snip> > >>>> Now is the instruction "c = &curr_s;" *guaranteed* to be executed >>>> *after* curr_s assignements? >>>> >>> >>> No - volatile accesses are only ordered with respect to other volatile >>> accesses (and other observable or potentially observable events, such as >>> calls to external functions). The stores to curr_s can be moved after >>> the store to c, since the stores to curr_s are not volatile. >>> >>> (You could also add a memory barrier before "c = &curr_s;".) >> >> In that case (adding a memory barrier before "c = &curr_s", is it >> necessary to declare curr_s and c variables as volatile? >> > > You don't need both - a memory barrier tells the compiler that anything > that the source code says should be written before the barrier, will be > written before the barrier, and that anything that the source code says > should be written after the barrier, is written after the barrier. It > also says that anything the compiler has read before the barrier should > be discarded and re-read after the barrier if needed. So the memory > barrier provides a synchronisation point for all memory accesses, > volatile and non-volatile.
I think I understood, even if it is a complex topic and honestly I had
never heard about memory barriers before (and I have more than 10 years
of embedded programming experience... my shame!).

Maybe I will post some other C-related technical questions to comp.lang.c
for better explanations of compiler memory barriers, the volatile keyword
and the reordering of instructions performed by the compiler during
optimization.

Just a simple question: is the concept of a compiler memory barrier
defined in the C standard? I don't think so, because it is a special
assembler instruction. So the existence of a memory barrier facility and
its effect could depend on the compiler, right?
> Note that both "volatile" and compiler memory barriers (using empty > inline assembly with a "memory clobber") are purely compiler-level > concepts. They don't affect any scheduling or pipelining in the cpu, > cache, or other hardware.
Even though I have just learned there are hardware memory barriers too...
>>>> Yes, disabling interrupts is the simplest solution, but I usually try to >>>> find a more elegant solution... if any exists. >>> >>> Often, simple /is/ elegant! >>> >>> Remember, it is extremely difficult to spot issues with this sort of >>> thing in testing. You can examine generated assembly code - but perhaps >>> at a later date, with slight changes to other code, compiler options, or >>> compiler versions, the generated assembly no longer matches. >>> >>> So it is better to be safe than sorry, even if that means marginally >>> sub-optimal code. Typically for this sort of thing, adding extra >>> volatiles is not going to cause inefficiency - you are going to do these >>> reads and writes anyway. And disabling interrupts for a few cycles is >>> rarely a real problem - it can be much better than using many more >>> cycles for a non-locking solution. >> >> Yes, you're right again. >> >> If I will decide to disable timer interrupt in foo(), is it ok to avoid >> declaring variables (curr, next, curr_s, ...) with keyword volatile? >> >> I think yes, it should be ok. If interrupt (where the variables are >> potentially changed) is disabled, the variables aren't volatile. >> > > Again, you need either volatiles or memory barriers to make sure the > required accesses (and preferably /only/ the required accesses) are > within the critical section. An easy way to do this is to make memory > barriers part of the enable/disable interrupt functions. But note that > memory barriers are a heavy-handed approach - if your code must be as > fast as possible, and your processor has lots of registers, long > pipelines and super-scaler scheduling, then sometimes it is not ideal. > For typical microcontrollers, however, it works very well.
I just checked the definition of sei()/cli() macros in avr-libc and... they have a memory barrier! I never noticed it.
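For reference (quoting from memory of <avr/interrupt.h>, so treat this as
approximate rather than authoritative), the avr-libc definitions look
roughly like this - note the "memory" clobber at the end of each:

#define sei()  __asm__ __volatile__ ("sei" ::: "memory")
#define cli()  __asm__ __volatile__ ("cli" ::: "memory")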
> I've included a few of my favourite functions below. The assembly used
> is for the ARM Cortex-M4, but can easily be adapted for other
> microcontrollers (and simpler cpus like the AVR or MSP430 don't need any
> cpu barrier or flush instructions - but you still need the memory clobber!).
>
> One of these that is likely to be a new concept for many (I can imagine
> Tim peering intently at this one!) is the "forceDependency" macro below.
> This is a solution to problems such as:
>
> void setBaudrate(uint32_t baud) {
>     uint32_t divisor = ((busClockTicksPerMicro * 1000000 * 2) +
>             (baud / 2)) / baud;
>     uint32_t sbr = divisor >> 5;
>
>     UART_Type* pUart = UART2_BASE_PTR;
>
>     uint8_t BDH = (sbr >> 8) & 0x1f;
>     uint8_t BDL = sbr & 0xff;
>     uint8_t C4 = divisor & 0x1f;
>
>     disableInterrupts();
>
>     // Critical section starts here
>
>     pUart->BDH = BDH;
>     pUart->BDL = BDL;
>     pUart->C4 = C4;
>
>     enableInterrupts();
> }
>
> First, "divisor" and "sbr" are calculated, which is a slow process
> involving multiple cycles and a library call or two. Then the desired
> hardware register values are derived. Then interrupts are disabled, and
> the values are written into the peripheral registers.
>
> It looks like interrupts are disabled for the minimal length of time.
>
> But...
>
> Even though the disableInterrupt() and enableInterrupt() functions have
> a memory barrier, /and/ the pUart pointer is to a volatile variable,
> those only apply to memory accesses. /Calculations/ can be moved around
> as the compiler fancies. The compiler may not bother calculating
> "divisor", "sbr", and so on, until they are actually needed - and that
> means /after/ the disableInterrupt() call. The result is that it is
> quite possible for all the long calculations to be done within the
> critical section.
Aaargh, this example has thrown me off. I thought I had understood.
Your example is similar to the one shown at [1], even though no solution
is given there.

You say /calculations/ could be moved around and could cross memory
barriers... but the results of BDH, BDL and C4 (and divisor and sbr)
should be stored in variables. Ok, they are automatic variables, they are
on the stack, but they are /variables/. The memory barrier embedded in
the disableInterrupts() macro should tell the compiler to flush/save all
the values from registers to RAM. At least, this is the definition of a
memory barrier as I understood it.

If this reordering is possible (the interrupt-disabling instruction moved
before the calculation instructions), then I haven't understood the
usefulness of memory barriers.
> The solution is to put:
>
>     forceDependency(BDH);
>     forceDependency(BDL);
>     forceDependency(C4);
>
> before disableInterrupt(). The "forceDependency" macro tells the
> compiler that it needs to know the value of "BDH" for some inline
> assembly - even though the inline assembly is empty and does nothing.
Does this trick work for avr-gcc too? Many seem to think there is no
solution to that problem.
> I believe that is enough lessons for today. I hope my posts here have > been of use to both the OP (well done for asking difficult and > interesting questions) and to others in the group.
Thank you very much for sharing your experience.

[1] www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
On 29/04/16 17:37, Tim Wescott wrote:
> On Fri, 29 Apr 2016 12:50:23 +0200, David Brown wrote: >
>> However, sometimes you have functions that you want to call from within
>> interrupts, or other situations where nested critical sections could
>> occur - and in that case, you want to track and restore the interrupt
>> status.
>
> I know. It's a team-programming thing. Sometimes you sit there
> thinking "no one but a dumbass would use my code in this way" -- then you
> realize that: (A) sometimes your manager* hires dumbasses; (B) when a
> dumbass breaks your code, you'll be the first one blamed, and even when
> the smoke clears some of that blame will be stuck to you; and finally
> (C), sometimes, on a bad day, you're the dumbass that breaks your own
> code.
>
> * Well, I believe that David owns his company, but still, mistakes happen.
>
I actually only own a small bit of my company (from its early days 20+
years ago, when the company couldn't afford competitive salaries and thus
gave a few key employees some shares instead). But we are a fairly small
group of programmers (the company is mainly electronics production), most
projects are handled by only one or two programmers, our department
manager is smart enough not to hire dumbasses (mistakes happen, of
course, but not often), and I have a lot of influence in what goes on in
the programming group. So usually when I have to work with idiotic code,
it is because the code came from outside. I might have to pick up the
pieces, but I can usually avoid the blame!

Your point is valid, of course - how you write your code and functions
can be heavily influenced by who will be using them. And you don't want
hidden assumptions in your functions - they should be part of the code
where possible, part of the name, and if all else fails, clearly part of
the comments.

So I would suggest you either use my functions, whose names make it clear
what is going on:

static inline void disableGlobalInterrupts(void) {
    asm volatile("cpsid i" ::: "memory");
}

static inline void enableGlobalInterrupts(void) {
    asm volatile("cpsie i" ::: "memory");
}

Or you use your functions, but give them better (IMHO) names:

static int pauseInterrupts(void) __attribute__((warn_unused_result));

static inline int pauseInterrupts(void) {
    int primask_copy;
    asm("mrs %[primask_copy], primask\n\t"  // save interrupt status
        "cpsid i\n\t"                       // disable interrupts
        : [primask_copy] "=r" (primask_copy)
        :: "memory");
    return primask_copy;
}

static inline void restoreInterrupts(int primask) {
    int primask_copy = primask;

    // Restore interrupts to their previous value
    asm("msr primask, %[primask_copy]"
        : : [primask_copy] "r" (primask_copy)
        : "memory");
}

You might also like "enterCriticalSection" and "exitCriticalSection" as
names. And of course, pick camelCase, underscores, etc., according to
preference.

(I must remember to use "__attribute__((warn_unused_result))" more often
in my own code.)
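A usage sketch of the pause/restore pair above (an illustration, not from
the original post; "sharedCounter" and "bumpSharedCounter" are invented
names), showing why this shape is safe to call both from main-line code
and from within another critical section or an ISR:

static volatile unsigned sharedCounter;   /* invented shared variable */

void bumpSharedCounter(void)
{
    int state = pauseInterrupts();   /* remember whether interrupts were already disabled */
    sharedCounter++;                 /* the teeny bit of critical stuff */
    restoreInterrupts(state);        /* put PRIMASK back exactly as it was */
}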
On 29/04/16 18:00, George Neuner wrote:
> On Fri, 29 Apr 2016 14:07:53 +0200, David Brown > <david.brown@hesbynett.no> wrote: > >> Note that both "volatile" and compiler memory barriers (using empty >> inline assembly with a "memory clobber") are purely compiler-level >> concepts. They don't affect any scheduling or pipelining in the cpu, >> cache, or other hardware. > > That's only partly correct. > > It is true that the compiler *should* never reorder reads or writes > across a declared memory barrier, however ... > > ... memory barriers (aka fences) are actual CPU instructions that > prevent reordering reads and/or writes in the hardware. OoO CPUs > require general fencing capability, and in-order CPUs require (the > moral equivalent of) store fences if they perform write combining. > > The asm barrier declaration should generate the corresponding barrier > instruction for the target CPU. >
If you include appropriate assembly instructions along with the memory
clobber in the inline assembly function, then you will get a cpu-level
barrier. Certainly when you are using superscalar OoO cpus with long
pipelines, or have caches or write buffers, then you need to be using
such instructions. There are some examples in the functions I posted
earlier.

Exactly what counts as a "memory barrier" is a question of definitions,
as it can apply at various levels. On "big" systems, such as x86 or PPC
processors, you usually include such assembly code in your definition of
barriers. (And you use "volatile" much less, because it is purely a
compiler-level concept - this is why Linus Torvalds shuns the use of
"volatile" except in the definition of things like barrier macros. It is
not powerful enough to do the job on typical Linux cpus, and when you do
things correctly with the OS primitives, it is not necessary.) On the
smaller systems and microcontrollers typical of c.a.e. members, "memory
barrier" usually just means the compiler-level memory clobber, as that is
all that is needed.
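As a concrete illustration of combining the two levels (a sketch, not
taken from David's posted function list, which is not reproduced in this
section): on a Cortex-M you would put a real barrier instruction inside
the clobbered asm, for example:

static inline void fullMemoryBarrier(void) {
    asm volatile("dmb" ::: "memory");   /* dmb orders the hardware accesses,
                                           "memory" stops the compiler reordering */
}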
On 30/04/16 16:47, pozz wrote:
> Il 29/04/2016 14:07, David Brown ha scritto: >> On 29/04/16 13:22, pozz wrote: >>> Il 29/04/2016 13:02, David Brown ha scritto: >>>> On 29/04/16 09:02, pozz wrote: >> >> <snip> >> >>>>> Now is the instruction "c = &curr_s;" *guaranteed* to be executed >>>>> *after* curr_s assignements? >>>>> >>>> >>>> No - volatile accesses are only ordered with respect to other volatile >>>> accesses (and other observable or potentially observable events, >>>> such as >>>> calls to external functions). The stores to curr_s can be moved after >>>> the store to c, since the stores to curr_s are not volatile. >>>> >>>> (You could also add a memory barrier before "c = &curr_s;".) >>> >>> In that case (adding a memory barrier before "c = &curr_s", is it >>> necessary to declare curr_s and c variables as volatile? >>> >> >> You don't need both - a memory barrier tells the compiler that anything >> that the source code says should be written before the barrier, will be >> written before the barrier, and that anything that the source code says >> should be written after the barrier, is written after the barrier. It >> also says that anything the compiler has read before the barrier should >> be discarded and re-read after the barrier if needed. So the memory >> barrier provides a synchronisation point for all memory accesses, >> volatile and non-volatile. > > I think I understood, even if it is a complex topic and sincerely I > never heard about memory barriers before (and I have more than 10 years > of embedded programming experience... my shame!). >
With generous use of "volatile", you don't need memory barriers (except if you need cpu synchronisation instructions for OoO processors, as noted by George Neuner).
> Maybe I will post some other C-related technical questions to > comp.lang.c for better explanations of compiler memory barriers, > volatile keyword and reordering of instructions made during > optimizations by the compiler.
You can try, but I suspect you will get more theoretical or hypothetical
discussions about how the standards don't actually define memory
accesses, combined with posts about how it all worked much better in the
old days before the evil compiler writers broke everything. c.a.e. has
people with a more pragmatic attitude - if you are programming with gcc
for a Cortex M4, or IAR for an msp430, or Keil for an 8051, then what is
important is how things work for that tool and that target. Both
attitudes can be interesting and educational.

Maybe you'll get some discussion going about C11 atomics - that is an
area I have not yet investigated much, and they can be an alternative to
using volatiles and/or memory barriers.
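For readers who want a pointer in that direction, here is a minimal
sketch (my illustration, not something discussed further in the thread)
of the earlier curr/curr_s example expressed with C11 atomics, assuming a
toolchain that ships <stdatomic.h>; the struct layout is invented:

#include <stdatomic.h>
#include <stdint.h>

struct sample { const uint8_t *data; uint32_t size; };   /* invented layout */

static struct sample curr_s;
static _Atomic(struct sample *) curr;

void publish(const uint8_t *data, uint32_t size)
{
    curr_s.data = data;
    curr_s.size = size;
    /* release store: the plain stores above cannot be reordered after it */
    atomic_store_explicit(&curr, &curr_s, memory_order_release);
}

struct sample *reader(void)   /* e.g. called from the ISR */
{
    /* acquire load: everything written before the release store is visible */
    return atomic_load_explicit(&curr, memory_order_acquire);
}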
> > Just a simple questions: is the concept of compiler memory barrier > defined in the C standard? I don't think, because it is a special > assembler instruction. So the existence of a memory barrier > functionality and its effect could depend on the compiler, right? >
Correct. (But see the C11 standards for atomics.)

"volatile" is in the C standards, of course, but "what constitutes a
memory access is implementation defined".

A lot of compilers either support gcc inline assembly syntax (including
memory clobbers), provide intrinsics or extensions for barriers, or
simply treat any function call or inline assembly as a barrier. The
documentation of such features can often be lacking, however.

So generous "volatile" usage is the only cross-target way to get this
effect.
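A minimal sketch of such a standalone compiler-only barrier in gcc
inline-assembly syntax - the same "memory" clobber already used in the
interrupt functions earlier, just with an empty instruction string (the
macro name here is my own):

#define compilerBarrier()  asm volatile("" ::: "memory")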
> >> Note that both "volatile" and compiler memory barriers (using empty >> inline assembly with a "memory clobber") are purely compiler-level >> concepts. They don't affect any scheduling or pipelining in the cpu, >> cache, or other hardware. > > Even if I just learned there are hardware memory barriers too... >
What sort of processors do you use?

If you are using an OS or RTOS of some sort, it will usually provide
macros or functions to handle this sort of thing for you.

When you are doing the messy details yourself, you only need to worry
about processor memory ordering when you have OoO, superscalar,
write-back buffers, memory management, caches, etc. And even then, the
hardware is usually arranged in a convenient way, such as to bypass cache
and write buffers when accessing memory-mapped peripherals. The fun
really starts when you have a write-back cache on the cpu, and DMA on the
peripheral side that is trying to access the same data.
> >>>>> Yes, disabling interrupts is the simplest solution, but I usually >>>>> try to >>>>> find a more elegant solution... if any exists. >>>> >>>> Often, simple /is/ elegant! >>>> >>>> Remember, it is extremely difficult to spot issues with this sort of >>>> thing in testing. You can examine generated assembly code - but >>>> perhaps >>>> at a later date, with slight changes to other code, compiler >>>> options, or >>>> compiler versions, the generated assembly no longer matches. >>>> >>>> So it is better to be safe than sorry, even if that means marginally >>>> sub-optimal code. Typically for this sort of thing, adding extra >>>> volatiles is not going to cause inefficiency - you are going to do >>>> these >>>> reads and writes anyway. And disabling interrupts for a few cycles is >>>> rarely a real problem - it can be much better than using many more >>>> cycles for a non-locking solution. >>> >>> Yes, you're right again. >>> >>> If I will decide to disable timer interrupt in foo(), is it ok to avoid >>> declaring variables (curr, next, curr_s, ...) with keyword volatile? >>> >>> I think yes, it should be ok. If interrupt (where the variables are >>> potentially changed) is disabled, the variables aren't volatile. >>> >> >> Again, you need either volatiles or memory barriers to make sure the >> required accesses (and preferably /only/ the required accesses) are >> within the critical section. An easy way to do this is to make memory >> barriers part of the enable/disable interrupt functions. But note that >> memory barriers are a heavy-handed approach - if your code must be as >> fast as possible, and your processor has lots of registers, long >> pipelines and super-scaler scheduling, then sometimes it is not ideal. >> For typical microcontrollers, however, it works very well. > > I just checked the definition of sei()/cli() macros in avr-libc and... > they have a memory barrier! I never noticed it. >
The folks behind the avr-libc library are pretty knowledgeable about this sort of thing! Mind you, I did mention to them my "forceDependency" function as a solution to the problem mentioned in the avr-libc documentation (the one you found below), but it didn't make it to the web page.
> >> I've included a few of my favourite functions below. The assembly used >> is for the ARM Cortex-M4, but can easily be adapted for other >> microcontrollers (and simpler cpus like the AVR or MSP430 don't need any >> cpu barrier or flush instructions - but you still need the memory >> clobber!). >> >> >> One of these that is likely to be a new concept for many (I can imagine >> Tim peering intently at this one!) is the "forceDependency" macro below. >> This is a solution to problems such as : >> >> void setBaudrate(uint32_t baud) { >> uint32_t divisor = ((busClockTicksPerMicro * 1000000 * 2) + >> (baud / 2)) / baud; >> uint32_t sbr = divisor >> 5; >> >> UART_Type* pUart = UART2_BASE_PTR; >> >> uint8_t BDH = (sbr >> 8) & 0x1f; >> uint8_t BDL = sbr & 0xff; >> uint8_t C4 = divisor & 0x1f; >> >> >> disableInterrupts(); >> >> // Critical section starts here >> >> pUart->BDH = BDH; >> pUart->BDL = BDL; >> pUart->C4 = C4; >> >> enableInterrupts(); >> } >> >> First, "divisor" and "sbr" are calculated, which is a slow process >> involving multiple cycles and a library call or two. Then the desired >> hardware register values are derived. Then interrupts are disabled, and >> the values are written into the peripheral registers. >> >> It looks like interrupts are disabled for the minimal length of time. >> >> But... >> >> Even though the disableInterrupt() and enableInterrupt() functions have >> a memory barrier, /and/ the pUart pointer is to a volatile variable, >> those only apply to memory accesses. /Calculations/ can be moved around >> as the compiler fancies. The compiler may not bother calculating >> "divisor", "sbr", and so on, until they are actually needed - and that >> means /after/ the disableInterrupt() call. The result is that it is >> quite possible for all the long calculations to be done within the >> critical section. > > Aaargh, this example put me in a wrong way. I thought I had understood. > Your example is similar to the one showed at [1], even if there isn't a > solution there.
Yes, it is exactly the same situation - and "forceDependency" is a good solution to it.
> > You say /calculations/ could be moved around and could cross memory > barriers... but the results of BDH, BDL and C4 (and divisor and sbr) > should be stored/saved in variables. Ok, they are automatic variables, > they are on the stack, but they are /variables/.
No, automatic variables do not necessarily result in memory variables. And even if they are stored on the stack, they don't have to stay in the same place on the stack, nor do they have to be the same size or the same value as you use in the source. The compiler only has to turn them into "real" variables in memory if you do something like pass their address onto an external function. That is why "what constitutes a memory access is implementation-defined". On a device like the AVR or an ARM, with plenty of registers, most automatic variables never get put on the stack. And that's a good thing - it means you can freely make as many local variables as you want in order to write clear and correct code, without worrying about the extra stack space or timing.
> The memory barrier > embedded in disableInterrupts() macro should say to the compiler to > flush/save all the values from registers to RAM. At least, this is the > definition I understood of memory barrier.
Nope. The barrier, as defined by the functions I gave, tells the compiler that at this point, something outside of the compiler's control and knowledge may read or write memory - the whole memory becomes briefly volatile. However, it says /nothing/ about what should or should not be stored in memory.
> > If the reordering is possible (the interrupts disabling instruction is > moved before the calculations instructions), I haven't understood the > usefulness of memory barriers.
I believe you understand about the memory barriers, you just didn't understand about what variables go in memory. As a good guide, everything that has an address is in memory - that includes all global data, file-level data, static data, and dynamically allocated data, as well as any local data whose address you have taken, and any volatile local data. (Strictly speaking, this is not entirely accurate in light of powerful optimisations, link-time optimisation, etc., but it should be fine unless the compiler was written by sadists.)
> > >> The solution is to put: >> >> forceDependency(BDH); >> forceDependency(BDL); >> forceDependency(C4); >> >> before disableInterrupt(). The "forceDependency" macro tells the >> compiler that it needs to know the value of "BDH" for some inline >> assembly - even though the inline assembly is empty and does nothing. > > Does this trick work for avr-gcc too? Because, many think there isn't a > solution to that problem. >
Yes, it works for avr-gcc. And I posted it as a solution on the avr-libc-dev mailing list about a year ago, but the developers had hoped to find a way to make the compiler automatically generate the desired code rather than having "ugly" assembly. I agree that mind-reading compilers are ultimately the best solution, but in the meantime this trick will do the job. I don't know if you often look at the generated assembly for your compilations, but it is always interesting to see exactly what code the compiler generates in different circumstances.
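For the record, the definition of "forceDependency" itself never appears
in this thread. Going by David's description (empty inline assembly that
takes the value as an input), a plausible minimal form for gcc and
avr-gcc would be something like this sketch:

#define forceDependency(val)  asm volatile("" : : "r" (val))

The input constraint forces the compiler to have computed "val" (and to
have it in a register) before this point, so the calculation cannot be
postponed into the critical section that follows.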
> >> I believe that is enough lessons for today. I hope my posts here have >> been of use to both the OP (well done for asking difficult and >> interesting questions) and to others in the group. > > Thank you very much for sharing your experience. > > [1] > www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html > >
On 29/04/16 22:07, David Brown wrote:
> I believe that is enough lessons for today. I hope my posts here have > been of use to both the OP (well done for asking difficult and > interesting questions) and to others in the group.
David, I, for one, would like to thank you for your clear explanations. I have done a fair bit of multi-threaded work built on volatile and atomic compare-and-swap, but though I'd read about memory barriers, I had never figured out when and whether they were compiler-only or also hardware things. You've cleared that up for me! One concept I've seen but still not grokked is the distinction between a "read barrier" and a "write barrier". Is that a thing, and can you briefly explain it and how to use them? Clifford Heath.
On Sun, 1 May 2016 07:51:18 +1000, Clifford Heath <no.spam@please.net>
wrote:

>On 29/04/16 22:07, David Brown wrote: >> I believe that is enough lessons for today. I hope my posts here have >> been of use to both the OP (well done for asking difficult and >> interesting questions) and to others in the group. > >David, I, for one, would like to thank you for your clear explanations. > >I have done a fair bit of multi-threaded work built on volatile and >atomic compare-and-swap, but though I'd read about memory barriers, I >had never figured out when and whether they were compiler-only or also >hardware things. You've cleared that up for me! > >One concept I've seen but still not grokked is the distinction between a >"read barrier" and a "write barrier". Is that a thing, and can you >briefly explain it and how to use them?
At the ISA level, and most importantly for CPUs with relaxed memory
ordering models, read and write barriers (and there are several other
types as well) force the CPU to complete all reads (or writes) prior to
the read (or write) barrier, before performing any of the reads (or
writes) after that barrier, as visible to *other* CPUs (as well as things
like I/O devices). Storage accesses are almost always visible only in the
logical order to the CPU performing them (there are some odd exceptions,
but not really relevant here).

An OoO CPU (or even an in-order one like IPF), with relaxed memory order,
might execute:

    st r1,A
    st r2,B

in either order, and a different CPU might see the stores in either
order. Let's say that before the above pair of stores, the memory words A
and B had 1 and 2 in them, and r1 and r2 had 3 and 4. A second CPU
executing:

    ld r4,A
    ld r5,B

might, of course, see (r4,r5)=(1,2) or (3,4), if the loads executed
before or after the stores. It would also be expected to be possible to
see (3,2), if the loads executed *between* the two stores. But if the
first CPU can execute the stores out of order, the second CPU might
actually be able to see (1,4)!

There are obvious cases where that's a problem - let's say B is a
semaphore that says A (and possibly other stuff) has been updated. So you
might have something like this on the second CPU:

    loop:
        ld r6,B
        cmp r6,4     ;semaphore set?
        bne loop
        ld r7,A

IOW, the intention is to not read A until B is set. With reordering, the
second CPU might see the semaphore signaled *before* it sees the updates
to A (and related), and so the second load above might actually return 1,
and not 3.

A write barrier inserted between the two stores will force the first CPU
to execute/complete all the stores prior to the barrier, before any store
after the barrier. In the context of the semaphore example above, the
semaphore (B) will not get signaled until the updates it indicates (A)
are completed.

Read barriers work on the other side, and force order on the visibility
of reads to other CPUs. Again using the semaphore example, if the second
CPU reorders when the two loads hit memory, you might actually be
checking the semaphore B *after* loading A.

So on a machine with seriously relaxed memory ordering (Alpha, for
example, and a bit simplified), you had darn well better put a write
barrier between the two stores on the first CPU, and a read barrier
between the two loads on the second CPU.

There are other ways to do that; for example, an ISA might provide just
one memory barrier that does both, or provide acquire/release semantics
for some loads and stores (which can be turned into the same thing). Or
you can have a CPU that guarantees completion of many/most/all memory
operations in some order. For example x86, while not providing true
strict ordering, does provide considerable guarantees which let you avoid
explicit memory barriers in the vast majority of situations. On others,
like IPF or Alpha, your code will die a quick and horrible death if you
assume something as silly as stores actually occurring in order.

Again, this is only about visibility from other CPUs (and other things
that can access memory, like I/O devices), and *not* the order in which
the program running on one CPU sees the accesses, which always (again,
with a handful of exceptions) sees stores and loads to memory as if they
happened in the order performed by the program.
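A C-level rendering of that semaphore example might look as follows.
This is my sketch, not Robert's code; it assumes the gcc/clang
__atomic_thread_fence built-in to emit the hardware barriers, and the
variable names mirror the assembly above:

#include <stdint.h>

volatile uint32_t A = 1;    /* the data word */
volatile uint32_t B = 2;    /* the "semaphore" word */

/* first CPU (producer) */
void producer(void)
{
    A = 3;
    __atomic_thread_fence(__ATOMIC_RELEASE);   /* write barrier: A becomes visible before B */
    B = 4;
}

/* second CPU (consumer) */
uint32_t consumer(void)
{
    while (B != 4) { }                         /* spin until the semaphore is set */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);   /* read barrier: don't satisfy the read of A early */
    return A;                                  /* now guaranteed to see 3, not 1 */
}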
On 30/04/16 23:51, Clifford Heath wrote:
> On 29/04/16 22:07, David Brown wrote: >> I believe that is enough lessons for today. I hope my posts here have >> been of use to both the OP (well done for asking difficult and >> interesting questions) and to others in the group. > > David, I, for one, would like to thank you for your clear explanations. > > I have done a fair bit of multi-threaded work built on volatile and > atomic compare-and-swap, but though I'd read about memory barriers, I > had never figured out when and whether they were compiler-only or also > hardware things. You've cleared that up for me! > > One concept I've seen but still not grokked is the distinction between a > "read barrier" and a "write barrier". Is that a thing, and can you > briefly explain it and how to use them? >
Robert has given a good explanation here.

For the most part, making a distinction between read barriers and write
barriers only really matters when you have multiple cores and OoO
processors - otherwise it is usually safest and simplest to use a generic
memory barrier (with appropriate cpu instructions as needed). And if you
have such processors, and can't simply use the OS-provided primitives
(maybe you are writing the OS!), you probably also want to make things
more fine-grained. You want to make sure /this/ read is ordered after
/that/ write, and so on, rather than just making a solid barrier. The C11
(and C++11) atomics can handle this - but it is complicated, and I don't
understand it well enough myself to try to explain it to others!

As Robert noted, when you are concerned about a single cpu, you only need
to consider the visibility and ordering for that cpu - and compiler-only
ordering (volatile and memory clobber barriers) is sufficient. That
applies even in the face of interrupts. Barring a few odd examples,
processor ISAs always look as though they simply handle one instruction
after the other in sequential order - it is only in connection with
peripherals and other cpus that there is an issue.
On 30/04/2016 20:46, David Brown wrote:
 > On 30/04/16 16:47, pozz wrote:
 >> On 29/04/2016 14:07, David Brown wrote:
 >>> On 29/04/16 13:22, pozz wrote:
 >>>> On 29/04/2016 13:02, David Brown wrote:
 >>>>> On 29/04/16 09:02, pozz wrote:
 >>>
 >>> <snip>
 >>>
 >>>>>> Now is the instruction "c = &curr_s;" *guaranteed* to be executed
 >>>>>> *after* curr_s assignements?
 >>>>>>
 >>>>>
 >>>>> No - volatile accesses are only ordered with respect to other 
volatile
 >>>>> accesses (and other observable or potentially observable events,
 >>>>> such as
 >>>>> calls to external functions).  The stores to curr_s can be moved 
after
 >>>>> the store to c, since the stores to curr_s are not volatile.
 >>>>>
 >>>>> (You could also add a memory barrier before "c = &curr_s;".)
 >>>>
 >>>> In that case (adding a memory barrier before "c = &curr_s", is it
 >>>> necessary to declare curr_s and c variables as volatile?
 >>>>
 >>>
 >>> You don't need both - a memory barrier tells the compiler that anything
 >>> that the source code says should be written before the barrier, will be
 >>> written before the barrier, and that anything that the source code says
 >>> should be written after the barrier, is written after the barrier.  It
 >>> also says that anything the compiler has read before the barrier should
 >>> be discarded and re-read after the barrier if needed.  So the memory
 >>> barrier provides a synchronisation point for all memory accesses,
 >>> volatile and non-volatile.
 >>
 >> I think I understood, even if it is a complex topic and sincerely I
 >> never heard about memory barriers before (and I have more than 10 years
 >> of embedded programming experience... my shame!).
 >
 > With generous use of "volatile", you don't need memory barriers (except
 > if you need cpu synchronisation instructions for OoO processors, as
 > noted by George Neuner).

Oh yes, but you have to understand very well what volatile means and how
it prevents reordering. Until yesterday, I used to declare a variable
volatile (usually a static variable at module scope) when it was changed
in my ISR (defined in the same module).

I used to follow the C code, statement by statement, and check carefully
whether an interrupt triggering at that point could cause problems.

Now I know the instructions aren't necessarily executed in the same order
I write them in the C source code, because the compiler may change that
order...

The solution again is to use volatile, but for another reason: to avoid
reordering. So the volatile keyword should be applied to other variables
too, not only the ones changed in interrupt routines. This is what I
didn't know.


 >> Maybe I will post some other C-related technical questions to
 >> comp.lang.c for better explanations of compiler memory barriers,
 >> volatile keyword and reordering of instructions made during
 >> optimizations by the compiler.
 >
 > You can try, but I suspect you will get more theoretical or hypothetical
 > discussions about how the standards don't actually define memory
 > accesses, combined with posts about how it all worked much better in the
 > old days before the evil compiler writers broke everything.  c.a.e. has
 > people with a more pragmatic attitude - if you are programming with gcc
 > for a Cortex M4, or IAR for an msp430, or Keil for an 8051, then what is
 > important is how things work for that tool and that target.  Both
 > attitudes can be interesting and educational.
 >
 > Maybe you'll get some discussion going about C11 atomics - that is an
 > area I have not yet investigated much, and can be an alternative to
 > using volatiles and/or memory barriers.

I'm ready to hear about all of those things... they could be instructive anyway.


 >> Just a simple questions: is the concept of compiler memory barrier
 >> defined in the C standard? I don't think, because it is a special
 >> assembler instruction. So the existence of a memory barrier
 >> functionality and its effect could depend on the compiler, right?
 >>
 >
 > Correct.  (But see the C11 standards for atomics.)
 >
 > "volatile" is in the C standards, of course, but "what constitutes a
 > memory access is implementation defined".
 >
 > A lot of compilers either support gcc inline assembly syntax (including
 > memory clobbers), provide intrinsics or extensions for barriers, or
 > simply have that any function calls or inline assembly act as barriers.
 >   The documentation of such features can often be lacking, however.
 >
 > So generous "volatile" usage is the only cross-target way to get this
 > effect.

It's a pity, because you might end up adding the volatile keyword even
where it isn't really necessary, just to be safe. And that could have
performance implications.

I think one method to reduce the performance loss is to declare variables
without volatile and access them through pointers that, where safety
matters, are declared volatile:

static mytype var;

volatile mytype *v = (volatile mytype *)&var;
// volatile accesses to var should be made through the v pointer
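As a concrete sketch of that pattern (my example, not from the thread;
"tick_count", "timer_isr" and "get_ticks" are invented names), combined
with the barrier-carrying disable/enable functions discussed earlier:

static unsigned int tick_count;
#define TICKS (*(volatile unsigned int *)&tick_count)

void timer_isr(void)
{
    TICKS++;                        /* the ISR uses the volatile view: never cached in a register */
}

unsigned int get_ticks(void)
{
    disableGlobalInterrupts();      /* the "memory" clobber orders the plain access below */
    unsigned int t = tick_count;    /* plain access is fine inside the critical section */
    enableGlobalInterrupts();
    return t;
}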


 >>> Note that both "volatile" and compiler memory barriers (using empty
 >>> inline assembly with a "memory clobber") are purely compiler-level
 >>> concepts.  They don't affect any scheduling or pipelining in the cpu,
 >>> cache, or other hardware.
 >>
 >> Even if I just learned there are hardware memory barriers too...
 >>
 >
 > What sort of processors do you use?
 >
 > If you are using an OS or RTOS of some sort, it will usually provide
 > macros or functions to handle this sort of thing for you.
 >
 > When you are doing the messy details yourself, you only need to worry
 > about processor memory ordering when you have OoO, superscaler, write
 > back buffers, memory management, caches, etc.  And even then, the
 > hardware is usually arranged in a convenient way, such as to bypass
 > cache and write buffers when accessing memory mapped peripherals.  The
 > fun really starts when you have a write-back cache on the cpu, and DMA
 > on the peripheral side that is trying to access the same data.

I read something about the Cortex-M [1] and its hardware memory barrier
instructions DSB and ISB. Cortex-M processors aren't OoO, superscalar or
other esoteric architectures... and I'm using them currently.

[1] 
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0321a/BIHHFHJD.html


 >>>>>> Yes, disabling interrupts is the simplest solution, but I usually
 >>>>>> try to
 >>>>>> find a more elegant solution... if any exists.
 >>>>>
 >>>>> Often, simple /is/ elegant!
 >>>>>
 >>>>> Remember, it is extremely difficult to spot issues with this sort of
 >>>>> thing in testing.  You can examine generated assembly code - but
 >>>>> perhaps
 >>>>> at a later date, with slight changes to other code, compiler
 >>>>> options, or
 >>>>> compiler versions, the generated assembly no longer matches.
 >>>>>
 >>>>> So it is better to be safe than sorry, even if that means marginally
 >>>>> sub-optimal code.  Typically for this sort of thing, adding extra
 >>>>> volatiles is not going to cause inefficiency - you are going to do
 >>>>> these
 >>>>> reads and writes anyway.  And disabling interrupts for a few 
cycles is
 >>>>> rarely a real problem - it can be much better than using many more
 >>>>> cycles for a non-locking solution.
 >>>>
 >>>> Yes, you're right again.
 >>>>
 >>>> If I will decide to disable timer interrupt in foo(), is it ok to 
avoid
 >>>> declaring variables (curr, next, curr_s, ...) with keyword volatile?
 >>>>
 >>>> I think yes, it should be ok. If interrupt (where the variables are
 >>>> potentially changed) is disabled, the variables aren't volatile.
 >>>>
 >>>
 >>> Again, you need either volatiles or memory barriers to make sure the
 >>> required accesses (and preferably /only/ the required accesses) are
 >>> within the critical section.  An easy way to do this is to make memory
 >>> barriers part of the enable/disable interrupt functions.  But note that
 >>> memory barriers are a heavy-handed approach - if your code must be as
 >>> fast as possible, and your processor has lots of registers, long
 >>> pipelines and super-scaler scheduling, then sometimes it is not ideal.
 >>> For typical microcontrollers, however, it works very well.
 >>
 >> I just checked the definition of sei()/cli() macros in avr-libc and...
 >> they have a memory barrier! I never noticed it.
 >
 > The folks behind the avr-libc library are pretty knowledgeable about
 > this sort of thing!  Mind you, I did mention to them my
 > "forceDependency" function as a solution to the problem mentioned in the
 > avr-libc documentation (the one you found below), but it didn't make it
 > to the web page.
 >
 >>
 >>> I've included a few of my favourite functions below.  The assembly used
 >>> is for the ARM Cortex-M4, but can easily be adapted for other
 >>> microcontrollers (and simpler cpus like the AVR or MSP430 don't 
need any
 >>> cpu barrier or flush instructions - but you still need the memory
 >>> clobber!).
 >>>
 >>>
 >>> One of these that is likely to be a new concept for many (I can imagine
 >>> Tim peering intently at this one!) is the "forceDependency" macro 
below.
 >>>   This is a solution to problems such as :
 >>>
 >>> void setBaudrate(uint32_t baud) {
 >>>     uint32_t divisor = ((busClockTicksPerMicro * 1000000 * 2) +
 >>>             (baud / 2)) / baud;
 >>>     uint32_t sbr = divisor >> 5;
 >>>
 >>>     UART_Type* pUart = UART2_BASE_PTR;
 >>>
 >>>     uint8_t BDH = (sbr >> 8) & 0x1f;
 >>>     uint8_t BDL = sbr & 0xff;
 >>>     uint8_t C4 = divisor & 0x1f;
 >>>
 >>>
 >>>     disableInterrupts();
 >>>
 >>>     // Critical section starts here
 >>>
 >>>     pUart->BDH = BDH;
 >>>     pUart->BDL = BDL;
 >>>     pUart->C4 = C4;
 >>>
 >>>     enableInterrupts();
 >>> }
 >>>
 >>> First, "divisor" and "sbr" are calculated, which is a slow process
 >>> involving multiple cycles and a library call or two.  Then the desired
 >>> hardware register values are derived.  Then interrupts are 
disabled, and
 >>> the values are written into the peripheral registers.
 >>>
 >>> It looks like interrupts are disabled for the minimal length of time.
 >>>
 >>> But...
 >>>
 >>> Even though the disableInterrupt() and enableInterrupt() functions have
 >>> a memory barrier, /and/ the pUart pointer is to a volatile variable,
 >>> those only apply to memory accesses.  /Calculations/ can be moved 
around
 >>> as the compiler fancies.  The compiler may not bother calculating
 >>> "divisor", "sbr", and so on, until they are actually needed - and that
 >>> means /after/ the disableInterrupt() call.  The result is that it is
 >>> quite possible for all the long calculations to be done within the
 >>> critical section.
 >>
 >> Aaargh, this example put me in a wrong way. I thought I had understood.
 >> Your example is similar to the one showed at [1], even if there isn't a
 >> solution there.
 >
 > Yes, it is exactly the same situation - and "forceDependency" is a good
 > solution to it.
 >
 >>
 >> You say /calculations/ could be moved around and could cross memory
 >> barriers... but the results of BDH, BDL and C4 (and divisor and sbr)
 >> should be stored/saved in variables. Ok, they are automatic variables,
 >> they are on the stack, but they are /variables/.
 >
 > No, automatic variables do not necessarily result in memory variables.
 > And even if they are stored on the stack, they don't have to stay in the
 > same place on the stack, nor do they have to be the same size or the
 > same value as you use in the source.  The compiler only has to turn them
 > into "real" variables in memory if you do something like pass their
 > address onto an external function.  That is why "what constitutes a
 > memory access is implementation-defined".
 >
 > On a device like the AVR or an ARM, with plenty of registers, most
 > automatic variables never get put on the stack.  And that's a good thing
 > - it means you can freely make as many local variables as you want in
 > order to write clear and correct code, without worrying about the extra
 > stack space or timing.

Are you saying the compiler has the right not to respect a memory
barrier for instructions involving automatic variables that *it* decides
to keep in CPU registers?  Simply because, on that CPU architecture, with
those registers and in that context, it can avoid treating an automatic
variable as a normal memory variable?

Oh my god, that's a very bad situation... I didn't think the C language
(which was born for writing OSes and device drivers in a portable way)
had this sort of obscure corner...


What happens if the BDL, BDH and C4 variables are declared outside the
function, as static variables at module level? They are then real memory
variables, so the compiler should respect the memory barrier for them.
Yes, you waste a bunch of bytes, but you can be sure the non-critical
code doesn't cross the critical section borders.


At [2] the same problem is described in the following way:

"However, memory barrier works well in ensuring that all volatile
accesses before and after the barrier occur in the given order with
respect to the barrier. However, it does not ensure the compiler moving
non-volatile-related statements across the barrier."

It seems the problem is caused by statements involving non-volatile
variables. Maybe the problem can be solved by declaring BDH, BDL and C4
volatile?

[2] http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html


 >> The memory barrier
 >> embedded in disableInterrupts() macro should say to the compiler to
 >> flush/save all the values from registers to RAM. At least, this is the
 >> definition I understood of memory barrier.
 >
 > Nope.
 >
 > The barrier, as defined by the functions I gave, tells the compiler that
 > at this point, something outside of the compiler's control and knowledge
 > may read or write memory - the whole memory becomes briefly volatile.
 > However, it says /nothing/ about what should or should not be stored in
 > memory.

Hmmm..., ok.  Anyway, if the compiler is holding temporary values for
some memory variables in registers, it is forced to write those values
back to RAM to refresh the variables, because they could be read by
something outside the compiler's direct control.


 >> If the reordering is possible (the interrupts disabling instruction is
 >> moved before the calculations instructions), I haven't understood the
 >> usefulness of memory barriers.
 >
 > I believe you understand about the memory barriers, you just didn't
 > understand about what variables go in memory.  As a good guide,
 > everything that has an address is in memory - that includes all global
 > data, file-level data, static data, and dynamically allocated data, as
 > well as any local data whose address you have taken, and any volatile
 > local data.
 >
 > (Strictly speaking, this is not entirely accurate in light of powerful
 > optimisations, link-time optimisation, etc., but it should be fine
 > unless the compiler was written by sadists.)
 >
 >>
 >>
 >>> The solution is to put:
 >>>
 >>>     forceDependency(BDH);
 >>>     forceDependency(BDL);
 >>>     forceDependency(C4);
 >>>
 >>> before disableInterrupt().  The "forceDependency" macro tells the
 >>> compiler that it needs to know the value of "BDH" for some inline
 >>> assembly - even though the inline assembly is empty and does nothing.
 >>
 >> Does this trick work for avr-gcc too? Because, many think there isn't a
 >> solution to that problem.
 >>
 >
 > Yes, it works for avr-gcc.  And I posted it as a solution on the
 > avr-libc-dev mailing list about a year ago, but the developers had hoped
 > to find a way to make the compiler automatically generate the desired
 > code rather than having "ugly" assembly.  I agree that mind-reading
 > compilers are ultimately the best solution, but in the meantime this
 > trick will do the job.
 >
 >
 > I don't know if you often look at the generated assembly for your
 > compilations, but it is always interesting to see exactly what code the
 > compiler generates in different circumstances.

Yes, I think so too.

Moreover, I'm starting to think the best approach for writing a function
that really works where these critical aspects are involved is to write
it directly in assembler, especially for safety-critical applications
(where human deaths are possible).

If I have to check the assembler generated by the compiler, I need to
learn it anyway... so it may be better to write in assembler directly.


 >>> I believe that is enough lessons for today.  I hope my posts here have
 >>> been of use to both the OP (well done for asking difficult and
 >>> interesting questions) and to others in the group.
 >>
 >> Thank you very much for sharing your experience.
 >>
 >> [1]
 >> 
www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
 >>
 >>
 >>
 >


