EmbeddedRelated.com
Forums

Cortex-M: share an int between two tasks

Started by pozz March 13, 2020
On 24/03/2020 23:55, Clifford Heath wrote:
> On 24/3/20 12:21 am, David Brown wrote:
>> Instructions like "dmb" force an order on the cpu operations, not the
>> compiler - while "volatile" enforces a partial order on the compiler,
>> but not the hardware.
>>
>> The standard solution would be asm("dmb" ::: "memory"), where the
>> memory clobber forces an ordering on the compiler - and thus you can
>> (usually) omit the "volatile".
> David, if you have time, many of us would appreciate a quick summary of
> the situations where the different barriers should be used.
>
> "dmb" can be a full read/write barrier, or just a write barrier. When to
> use each?
>
> "dsb" is different from "dmb" because it limits instruction ordering,
> but when is that useful? When to use it as a write-only barrier?
>
> "isb" is intended to precede a context switch, i.e. task switching in an
> RTOS. Is it sufficient for that, and is that the only time to use it?
>
> If you know a good article that gives practical guidelines, post a link,
> otherwise I'd really like to hear your thoughts.
>
> Clifford Heath.
ARM has an application note (DAI-0321A) explaining the barriers, see
<https://developer.arm.com/docs/103490176/10/does-the-cortex-m3-or-cortex-m4-processor-need-memory-barrier-instructions>.

--
-TV
On 31/3/20 1:07 am, David Brown wrote:
> On 24/03/2020 23:55, Clifford Heath wrote:
>> David, if you have time, many of us would appreciate a quick summary
>> of the situations where the different barriers should be used.
>> [...]
I was excited to open your response, then disappointed. Forget to hit Save? CH
On 31/03/2020 00:02, Clifford Heath wrote:
> On 31/3/20 1:07 am, David Brown wrote:
>> On 24/03/2020 23:55, Clifford Heath wrote:
>>> David, if you have time, many of us would appreciate a quick summary
>>> of the situations where the different barriers should be used.
>>> [...]
>
> I was excited to open your response, then disappointed.
> Forget to hit Save?
No, I just haven't had the time to do a decent reply yet. It is still an open draft on my desktop. I haven't forgotten about it - I've just had too many other things to do.
On 31/3/20 7:57 pm, David Brown wrote:
> On 31/03/2020 00:02, Clifford Heath wrote:
>> I was excited to open your response, then disappointed.
>> Forget to hit Save?
>
> No, I just haven't had the time to do a decent reply yet. It is still
> an open draft on my desktop. I haven't forgotten about it - I've just
> had too many other things to do.
No worries. I look forward to it. CH
In article <IdweG.1169345$Gh7.768707@fx45.iad>,
Clifford Heath  <no.spam@please.net> wrote:
>On 24/3/20 12:21 am, David Brown wrote:
>> Instructions like "dmb" force an order on the cpu operations, not the
>> compiler - while "volatile" enforces a partial order on the compiler,
>> but not the hardware.
>>
>> The standard solution would be asm("dmb" ::: "memory"), where the memory
>> clobber forces an ordering on the compiler - and thus you can (usually)
>> omit the "volatile".
>David, if you have time, many of us would appreciate a quick summary of
>the situations where the different barriers should be used.
>
>"dmb" can be a full read/write barrier, or just a write barrier. When to
>use each?
>
>"dsb" is different from "dmb" because it limits instruction ordering,
>but when is that useful? When to use it as a write-only barrier?
>
>"isb" is intended to precede a context switch, i.e. task switching in an
>RTOS. Is it sufficient for that, and is that the only time to use it?
>
>If you know a good article that gives practical guidelines, post a link,
>otherwise I'd really like to hear your thoughts.
>
>Clifford Heath.
Barriers are very difficult to get right for a programmer. In my opinion, it is also a CPU architectural mistake to need barriers for user-level code. (It's OK for system software to need some barriers like ISB). Many people disagree with me on this point, and we don't need to argue it here.

The basic problem is there's a mismatch between what the programmer is thinking about, and what the compiler/CPU are requiring. This is one reason why multithreading is more difficult than it should be.

ARM lets you choose whether the barrier is a read or write or both barrier. I suggest you always do "both", which ARM calls SYSTEM, and is the default if you just say "DMB" or "DSB". I suspect there's almost no performance difference, and it's one less thing you have to worry about. It makes sense for macros for Linux to try to be more aggressive, since they need to show off, and they care more about performance and have the time to test and debug their logic across a variety of systems. This stuff is easy to make a mistake on, and very hard to debug.

For application programming, you should generally only need DMB for ordering of volatile accesses.

DSB is for ordering other traffic with data accesses--things like cache invalidates, icache fetches, or TLB shoot downs, etc. If you have anything like that which needs to be ordered, you want DSB. Again, ordering accesses to variables in normal memory doesn't need DSB, but you can use it if you want much lower performance (DSB is a super-set of DMB, so you can use DSB anyplace you would use DMB). Generally, user code doesn't need DSB unless you're doing self-modifying code.

ISB is for ordering special system register accesses, or ordering data accesses with other CPU actions. If you want to read the TLB using an AT instruction, you must do an ISB before reading the PAR register. I cannot think of a user-level code sequence case that needs ISB off the top of my head now. ISB does nothing to order data operations, and it's not what you want. There's hidden magic in the combination: "DSB; ISB". This waits for most (but not quite all) previous bus traffic to complete before executing any new instructions, including instruction fetches, data fetches, etc. User code generally never needs to do this, but OS code sometimes does.

In terms of "heaviness", DMB is the lightest--it will slow the pipeline a few clocks to get the data accesses right. DSB is actually the slowest--it generally causes a bus transaction, which is sent to other agents, and a response is sent back (CPU optimizations can avoid this traffic sometimes, especially if another DSB was recently done). You don't want to use DSB unless you really need it. And ISB is in the middle--also a few cycles, but likely a little more than DMB. And "DSB; ISB" basically brings the CPU to a temporary halt--waits for (almost) every current fetch to finish, then restarts.

To make things worse, the way to actually insert the barrier is another level of complexity, which sadly seems to be compiler dependent.

It's almost as if this whole area is a big giant mess.

Kent
On 19/5/20 3:34 pm, Kent Dickey wrote:
> [...]
> It's almost as if this whole area is a big giant mess.
>
> Kent
Many thanks Kent, that's very useful and to-the-point. Clifford Heath
On 19/05/2020 07:34, Kent Dickey wrote:
> In article <IdweG.1169345$Gh7.768707@fx45.iad>,
> Clifford Heath <no.spam@please.net> wrote:
>> [...]
>
> Barriers are very difficult to get right for a programmer. In my opinion,
> it is also a CPU architectural mistake to need barriers for user-level code.
> (It's OK for system software to need some barriers like ISB). Many people
> disagree with me on this point, and we don't need to argue it here.
I agree on the principle. And usually it can be done in practice too, but it can come at a cost.

For most embedded systems, the way to avoid needing barrier instructions is to set up memory areas with different characteristics such as cacheable, bufferable, etc. Typically memory mapped peripherals will be in an area where all accesses are strictly ordered and uncacheable, and then no barrier instructions are needed. For small microcontroller cores, this has no cost since you don't have caches or write buffers anyway, but on bigger processors it can be significant when you have larger blocks of data to transfer. This can be a measurable hit on things like Ethernet performance or data in DMA buffers.

The most important thing is always that the code should be correct. It is better to be slower and correct than faster and incorrect! Thus usually you have such memory setups to cover the normal cases, and put any required cache or barrier instructions in system code. If you are going to need some cache flush and data ordering instruction before starting a DMA transfer, then those should be in the "start_dma_transfer" function - written by a programmer who /does/ know how these things work.

Another kind of barrier is the compiler memory barrier. Again, it can be hard for users to get these right - and they should be put in system code for things like interrupt disable functions so that users don't have to worry about them.
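A rough sketch of what "compiler barriers hidden inside interrupt disable functions" could look like is given here. It is not taken from any post in this thread: irq_save_disable, irq_restore, count_event and read_event_count are made-up names, the inline assembly assumes GCC or Clang, and the non-ARM branch is only a stand-in so the code also builds on a host.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Cortex-M: read PRIMASK, then mask interrupts. The "memory" clobber is the
 * compiler barrier: the compiler may not move memory accesses across it. */
static inline uint32_t irq_save_disable(void)
{
    uint32_t primask;
    __asm__ volatile ("mrs %0, primask\n\tcpsid i"
                      : "=r" (primask) : : "memory");
    return primask;
}

static inline void irq_restore(uint32_t primask)
{
    __asm__ volatile ("msr primask, %0" : : "r" (primask) : "memory");
}
#else
/* Host stand-in (assumption): no interrupts to mask, barrier only. */
static inline uint32_t irq_save_disable(void)
{
    __asm__ volatile ("" : : : "memory");
    return 0u;
}

static inline void irq_restore(uint32_t primask)
{
    (void)primask;
    __asm__ volatile ("" : : : "memory");
}
#endif

/* Shared with a (hypothetical) interrupt handler. Not volatile: the barriers
 * inside the two functions above keep the accesses inside the section. */
static uint32_t event_count;

void count_event(void)
{
    uint32_t state = irq_save_disable();
    event_count++;              /* read-modify-write, now uninterruptible */
    irq_restore(state);
}

uint32_t read_event_count(void)
{
    uint32_t state = irq_save_disable();
    uint32_t copy = event_count;
    irq_restore(state);
    return copy;
}

/* Small self-check that also runs on a host build. */
void count_event_selftest(void)
{
    uint32_t before = read_event_count();
    count_event();
    assert(read_event_count() == before + 1u);
}
```

The user of count_event never sees a barrier; that is the point of putting them in system code.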
> The basic problem is there's a mismatch between what the programmer is
> thinking about, and what the compiler/CPU are requiring. This is one reason
> why multithreading is more difficult than it should be.
Agreed. C11 and C++11 can help a bit with atomics and fences, but relatively few people understand these well. I am a fan of message passing and queues as a way of inter-thread communication, as it is a lot easier to understand and get right than using locks or critical sections. It is also much easier to scale with SMP or AMP. You don't need to worry about whether data is written to memory before the lock is taken, or whether you want a compiler memory barrier, a processor barrier instruction, volatile accesses - just put the message you want on the queue and off it goes. (Just don't pass pointers to data on the local stack...)
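As an illustration of the queue approach combined with C11 atomics, here is a minimal single-producer, single-consumer ring buffer. It is only a sketch under assumptions: message_t, msg_queue_t, queue_init, queue_put and queue_get are hypothetical names, exactly one thread (or interrupt handler) may call queue_put and exactly one may call queue_get, and messages are copied by value so nothing points into a local stack.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u
static_assert((QUEUE_SLOTS & (QUEUE_SLOTS - 1u)) == 0u,
              "QUEUE_SLOTS must be a power of two");

typedef struct {
    uint32_t id;
    int32_t  value;
} message_t;

typedef struct {
    message_t   slots[QUEUE_SLOTS];
    atomic_uint head;   /* next slot to write, stored only by the producer */
    atomic_uint tail;   /* next slot to read, stored only by the consumer */
} msg_queue_t;

void queue_init(msg_queue_t *q)
{
    atomic_init(&q->head, 0u);
    atomic_init(&q->tail, 0u);
}

/* Producer side. Returns false if the queue is full. */
bool queue_put(msg_queue_t *q, message_t msg)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == QUEUE_SLOTS) {
        return false;
    }
    q->slots[head % QUEUE_SLOTS] = msg;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the queue is empty. */
bool queue_get(msg_queue_t *q, message_t *out)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *out = q->slots[tail % QUEUE_SLOTS];
    /* Release: the slot is fully read before the producer may reuse it. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}
```

The compiler turns the acquire and release operations into whatever compiler or hardware barriers the target needs, so the calling code contains no volatile and no explicit barrier.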
> ARM lets you choose whether the barrier is a read or write or both barrier.
(Write or read/write barrier - there is AFAIK no read barrier.)
> I suggest you always do "both", which ARM calls SYSTEM, and is the
> default if you just say "DMB" or "DSB". I suspect there's almost no
> performance difference, and it's one less thing you have to worry about.
For smaller microcontrollers, there will be no noticeable difference. By the time you have external dynamic memory connected via a quad SPI bus, the latency on reads can be much more dramatic. Writes can be buffered further down the chain (such as in the QSPI or SDRAM controller), but you don't want to wait for reads if you don't have to. Still, it is always better to be safe than fast, and use "both" if you are not sure.
> It makes sense for macros for Linux to try to be more aggressive, since they
> need to show off, and they care more about performance and have the time to
> test and debug their logic across a variety of systems. This stuff is easy to
> make a mistake on, and very hard to debug.
Agreed.
> For application programming, you should generally only need DMB for
> ordering of volatile accesses.
Generally you don't need that either. The volatile accesses will be ordered by the compiler (as long as the programmer doesn't make the mistake of thinking that volatile accesses also order with non-volatile accesses). If the memory setup is done right, then when writing to peripherals the cpu will enforce the order without the need of DMB. And you don't need DMB for purely cpu-related actions, such as interaction between interrupt routines or threads on the same processor (volatile and compiler barriers are sufficient).

The point where you typically need DMB is for data that is in main memory and shared between bus masters, like other processors, DMA, or Ethernet controllers. Then you might need a DMB before informing the other masters that data is ready. You may also need cache control instructions. (You need this sort of thing for reads as well as writes.)
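The "DMB before informing the other master" case might be sketched as follows. Everything here is illustrative rather than a real driver: dma_descriptor_t, tx_buffer, tx_desc and the "ready" flag are invented for the example, the asm syntax assumes GCC or Clang, and on non-ARM hosts a C11 fence stands in for the DMB so the code can be compiled and exercised off-target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__arm__) || defined(__aarch64__)
/* Hardware barrier plus "memory" clobber, so it orders CPU and compiler. */
#define DMB()  __asm__ volatile ("dmb sy" : : : "memory")
#else
/* Host stand-in (assumption): a full C11 fence. */
#define DMB()  atomic_thread_fence(memory_order_seq_cst)
#endif

/* Hypothetical descriptor that a DMA engine would poll or be pointed at. */
typedef struct {
    const uint8_t    *src;
    uint32_t          length;
    volatile uint32_t ready;    /* 1 = descriptor handed to the DMA engine */
} dma_descriptor_t;

static uint8_t          tx_buffer[64];
static dma_descriptor_t tx_desc;

/* Returns 0 on success, -1 if the data does not fit in the buffer. */
int start_dma_transfer(const void *data, size_t length)
{
    assert(data != NULL);
    if (length > sizeof tx_buffer) {
        return -1;
    }
    memcpy(tx_buffer, data, length);
    tx_desc.src = tx_buffer;
    tx_desc.length = (uint32_t)length;

    /* On a part with a data cache, a cache clean of tx_buffer and tx_desc
     * would go here (device specific, not shown). */

    DMB();                      /* payload and descriptor before the flag */
    tx_desc.ready = 1u;         /* now tell the other bus master */
    return 0;
}
```

Callers just call start_dma_transfer; the ordering requirement lives in one place.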
> DSB is for ordering other traffic with data accesses--things like cache
> invalidates, icache fetches, or TLB shoot downs, etc.
Yes, and also changes to the MPU mappings are a common case.
> If you have anything
> like that which needs to be ordered, you want DSB. Again, ordering accesses
> to variables in normal memory doesn't need DSB, but you can use it if you want
> much lower performance (DSB is a super-set of DMB, so you can use DSB anyplace
> you would use DMB). Generally, user code doesn't need DSB unless you're
> doing self-modifying code.
Just say "no" to self-modifying code! Firmware updates are an exception, of course. And your DSB is likely to be combined with data cache flushes (to make sure the changes are written to memory), instruction cache flushes (to make sure you don't have stale data there) and ISB.
> ISB is for ordering special system register accesses, or ordering data accesses
> with other CPU actions. If you want to read the TLB using an AT instruction,
> you must do an ISB before reading the PAR register. I cannot think of a
> user-level code sequence case that needs ISB off the top of my head now.
> ISB does nothing to order data operations, and it's not what you want.
> There's hidden magic in the combination: "DSB; ISB".
> This waits for most (but not quite all) previous bus traffic to complete
> before executing any new instructions, including instruction fetches, data
> fetches, etc. User code generally never needs to do this, but OS code
> sometimes does.
Sometimes this sort of thing can be recommended for entering "sleep" modes - often in combination with chip errata on early versions of devices.
> In terms of "heaviness", DMB is the lightest--it will slow the pipeline
> a few clocks to get the data accesses right. DSB is actually the slowest--it
> generally causes a bus transaction, which is sent to other agents, and
> a response is sent back (CPU optimizations can avoid this traffic sometimes,
> especially if another DSB was recently done). You don't want to use DSB
> unless you really need it. And ISB is in the middle--also a few cycles, but
> likely a little more than DMB. And "DSB; ISB" basically brings the CPU
> to a temporary halt--waits for (almost) every current fetch to finish, then
> restarts.
Note that the cost of these instructions varies significantly from system to system. On an M0, all three barrier instructions will likely be no more expensive than a NOP. On an M7 with cache and outstanding transactions to external memory, they can cost a lot.
> To make things worse, the way to actually insert the barrier is another
> level of complexity, and which sadly seems to be compiler dependent.
That is partly true - ARM has made a reasonable attempt at headers that can be used with a variety of compilers for at least some of this stuff. But there are always complications when you are dealing with features that simply cannot be described in languages like C.
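For anyone without such headers to hand, the three kinds of barrier discussed in this thread could be wrapped along these lines. This is a sketch assuming GCC or Clang extended asm; the macro names and the publish/try_consume pair are invented for the example and are not the names used in ARM's headers. On non-ARM hosts the hardware barriers fall back to a plain compiler barrier so the code still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compiler-only barrier: emits no instruction, but the "memory" clobber
 * stops the compiler moving memory accesses across it. */
#define COMPILER_BARRIER()      __asm__ volatile ("" : : : "memory")

#if defined(__arm__) || defined(__aarch64__)
/* Hardware barriers (ARMv6-M or later assumed). Each carries a "memory"
 * clobber, so it orders the compiler as well as the CPU. */
#define DATA_MEMORY_BARRIER()   __asm__ volatile ("dmb sy" : : : "memory")
#define DATA_SYNC_BARRIER()     __asm__ volatile ("dsb sy" : : : "memory")
#define INSTR_SYNC_BARRIER()    __asm__ volatile ("isb"    : : : "memory")
#else
/* Host stand-ins (assumption): compiler ordering only. */
#define DATA_MEMORY_BARRIER()   COMPILER_BARRIER()
#define DATA_SYNC_BARRIER()     COMPILER_BARRIER()
#define INSTR_SYNC_BARRIER()    COMPILER_BARRIER()
#endif

static uint32_t          shared_value;   /* payload, deliberately not volatile */
static volatile uint32_t shared_ready;   /* flag seen by the other observer */

/* Writer: payload first, barrier, then the flag. */
void publish(uint32_t value)
{
    shared_value = value;
    DATA_MEMORY_BARRIER();
    shared_ready = 1u;
}

/* Reader: flag first, barrier, then the payload. Returns 1 if data was read. */
int try_consume(uint32_t *out)
{
    assert(out != NULL);
    if (shared_ready == 0u) {
        return 0;
    }
    DATA_MEMORY_BARRIER();
    *out = shared_value;
    return 1;
}
```

Because of the clobber, shared_value needs no volatile qualifier. Where the two sides are an interrupt routine and a thread on the same core, COMPILER_BARRIER() in place of DATA_MEMORY_BARRIER() would be enough, as noted earlier in this post.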
> It's almost as if this whole area is a big giant mess.
Well, it's all a big compromise. You can design a processor system that doesn't need barriers of any kind, but it won't scale for higher speeds and certainly won't work with multiple processors. (And once you get to multiple processors, you have another layer with the memory models - you can have programmer-friendly "strong" models like the x86, or far simpler and more efficient "weak" models like most RISC processors, requiring more effort from the programmer.)
On 19/05/20 11:00, David Brown wrote:
> On 19/05/2020 07:34, Kent Dickey wrote:
>> The basic problem is there's a mismatch between what the programmer is
>> thinking about, and what the compiler/CPU are requiring. This is one reason
>> why multithreading is more difficult than it should be.
>
> Agreed. C11 and C++11 can help a bit with atomics and fences, but
> relatively few people understand these well. I am a fan of message
> passing and queues as a way of inter-thread communication, as it is a
> lot easier to understand and get right than using locks or critical
> sections.
And, despite opinions to the contrary, that is just as true in Java[1].

The major advantage that Java has (probably, belatedly, "had") is a memory model that took account of caches and multicore processors. But even that model had to be "adjusted" after a few years out in the wild; humility is beneficial :)

[1] Or assembler for that matter!