EmbeddedRelated.com

How to write a simple driver in bare metal systems: volatile, memory barrier, critical sections and so on

Started by pozz October 22, 2021
On 2021-10-25 16:04, Dimiter_Popoff wrote:
> On 10/25/2021 11:09, Niklas Holsti wrote:
>> On 2021-10-24 23:27, Dimiter_Popoff wrote:
>>> On 10/24/2021 22:54, Don Y wrote:
>>>> On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
>>>>>> Disable interrupts while accessing the fifo. you really have to.
>>>>>> alternatively you'll often get away not using a fifo at all,
>>>>>> unless you're blocking for a long while in some part of the code.
>>>>>
>>>>> Why would you do that. The fifo write pointer is only modified by
>>>>> the interrupt handler, the read pointer is only modified by the
>>>>> interrupted code. Has been done so for times immemorial.
>>>>
>>>> The OPs code doesn't differentiate between FIFO full and empty.
>>>
>>> So he should fix that first, there is no sane reason why not.
>>> Few things are simpler to do than that.
>>
>>
>> [snip]
>>
>>
>>> Whatever handshakes he makes there is no problem knowing whether
>>> the fifo is full - just check if the position the write pointer
>>> will have after putting the next byte matches the read pointer
>>> at the moment. Like I said before, few things are simpler than
>>> that, can't imagine someone working as a programmer being
>>> stuck at *that*.
>>
>> That simple check would require keeping a maximum of only N-1 entries
>> in the N-position FIFO buffer, and the OP explicitly said they did not
>> want to allocate an unused place in the buffer (which I think is
>> unreasonable of the OP, but that is only IMO).
>
> Well it might be reasonable if the fifo has a size of two, you know :-).
And if each of those two items is large, yes. But here we have a FIFO of 8-bit characters... few programs are so tight on memory that they cannot stand one unused octet.
On 10/25/2021 8:34 AM, Niklas Holsti wrote:
> And if each of those two items is large, yes. But here we have a FIFO of 8-bit
> characters... few programs are so tight on memory that they cannot stand one
> unused octet.
It's not "unused". Rather, its role is that of indicating "full/overrun". The OP seems to have decided that this is of no concern -- in *one* app?
On 25/10/2021 17:34, Niklas Holsti wrote:
> On 2021-10-25 16:04, Dimiter_Popoff wrote:
>> On 10/25/2021 11:09, Niklas Holsti wrote:
>>> On 2021-10-24 23:27, Dimiter_Popoff wrote:
>>>> On 10/24/2021 22:54, Don Y wrote:
>>>>> On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
>>>>>>> Disable interrupts while accessing the fifo. you really have to.
>>>>>>> alternatively you'll often get away not using a fifo at all,
>>>>>>> unless you're blocking for a long while in some part of the code.
>>>>>>
>>>>>> Why would you do that. The fifo write pointer is only modified by
>>>>>> the interrupt handler, the read pointer is only modified by the
>>>>>> interrupted code. Has been done so for times immemorial.
>>>>>
>>>>> The OPs code doesn't differentiate between FIFO full and empty.
>>>>
>>>> So he should fix that first, there is no sane reason why not.
>>>> Few things are simpler to do than that.
>>>
>>>
>>> [snip]
>>>
>>>
>>>> Whatever handshakes he makes there is no problem knowing whether
>>>> the fifo is full - just check if the position the write pointer
>>>> will have after putting the next byte matches the read pointer
>>>> at the moment. Like I said before, few things are simpler than
>>>> that, can't imagine someone working as a programmer being
>>>> stuck at *that*.
>>>
>>> That simple check would require keeping a maximum of only N-1 entries
>>> in the N-position FIFO buffer, and the OP explicitly said they did
>>> not want to allocate an unused place in the buffer (which I think is
>>> unreasonable of the OP, but that is only IMO).
>>
>> Well it might be reasonable if the fifo has a size of two, you know :-).
>
>
> And if each of those two items is large, yes. But here we have a FIFO of
> 8-bit characters... few programs are so tight on memory that they cannot
> stand one unused octet.
When I have a small (<256) power-of-two (16, 32, 64, 128) buffer (and this is the case for a UART receive ring buffer), I like to use this implementation, which works and doesn't waste any element.

However, I know this isn't the best implementation ever, and it's a pity the thread has focused on criticizing this implementation (which was used as just *one* implementation, to have an example to discuss).

The main point was the use of volatile (and other techniques) to guarantee correct compiler output, whatever legal (i.e. conforming to the C standard) optimizations the compiler decides to perform.

It seems to me the arguments against or for volatile are completely independent of the ring-buffer implementation.
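For readers who have not seen the "power-of-two, no wasted element" scheme mentioned in this post, here is a minimal sketch of one common way to do it. It is not necessarily the OP's code: the names (`rx_buf`, `rx_in`, `rx_out`, `rx_put`, `rx_get`, `rx_count`) and the size of 16 are assumptions made for illustration. It also assumes a single-core target with exactly one producer (the ISR) and one consumer (the main loop). The indices are free-running 8-bit counters that are masked only when indexing the array, so "full" (difference equals the size) and "empty" (difference equals zero) are distinguishable without an unused slot.

```c
#include <stdbool.h>
#include <stdint.h>

#define RXBUF_SIZE 16u                  /* assumed size: power of two, below 256 */
#define RXBUF_MASK (RXBUF_SIZE - 1u)

static volatile uint8_t rx_buf[RXBUF_SIZE];
static volatile uint8_t rx_in;          /* free-running, written only by the producer */
static volatile uint8_t rx_out;         /* free-running, written only by the consumer */

/* Number of stored bytes. The 8-bit unsigned difference stays correct when
   the counters wrap past 255, because 256 is a multiple of RXBUF_SIZE. */
uint8_t rx_count(void)
{
    return (uint8_t)(rx_in - rx_out);
}

/* Producer side, intended to be called from the UART receive ISR. */
bool rx_put(uint8_t c)
{
    uint8_t in = rx_in;                 /* cache the volatile index locally */
    if ((uint8_t)(in - rx_out) == RXBUF_SIZE) {
        return false;                   /* full: every slot is in use */
    }
    rx_buf[in & RXBUF_MASK] = c;        /* store the byte first ... */
    rx_in = (uint8_t)(in + 1u);         /* ... then publish it */
    return true;
}

/* Consumer side, intended to be called from the main loop. */
bool rx_get(uint8_t *c)
{
    uint8_t out = rx_out;
    if (rx_in == out) {
        return false;                   /* empty */
    }
    *c = rx_buf[out & RXBUF_MASK];      /* read the byte first ... */
    rx_out = (uint8_t)(out + 1u);       /* ... then release the slot */
    return true;
}
```

The size must stay below 256 because a full 256-entry buffer would give an 8-bit difference of 0, which is the "empty" value. Each index is a single byte, so its loads and stores are not split even on an 8-bit CPU.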
On 10/25/2021 20:43, Don Y wrote:
> On 10/25/2021 8:34 AM, Niklas Holsti wrote:
>> And if each of those two items is large, yes. But here we have a FIFO
>> of 8-bit characters... few programs are so tight on memory that they
>> cannot stand one unused octet.
>
> It's not "unused". Rather, its role is that of indicating "full/overrun".
> The OP seems to have decided that this is of no concern -- in *one* app?
Oh come on, I joked about the fifo of two bytes only because this whole thread is a joke - pages and pages of C to maintain a fifo, what can be more of a joke than this.
On 10/25/2021 10:53 AM, Dimiter_Popoff wrote:
> On 10/25/2021 20:43, Don Y wrote:
>> On 10/25/2021 8:34 AM, Niklas Holsti wrote:
>>> And if each of those two items is large, yes. But here we have a FIFO of
>>> 8-bit characters... few programs are so tight on memory that they cannot
>>> stand one unused octet.
>>
>> It's not "unused". Rather, its role is that of indicating "full/overrun".
>> The OP seems to have decided that this is of no concern -- in *one* app?
>
> Oh come on, I joked about the fifo of two bytes only because this whole
> thread is a joke
My comment applies regardless of the size of the FIFO.
> - pages and pages of C to maintain a fifo, what can be
> more of a joke than this.
Where do you see "pages and pages of C to maintain a FIFO"?
On 10/25/2021 10:52 AM, pozz wrote:
> However I know this isn't the best implementation ever and it's a pity the
> thread emphasis has been against this implementation (that was used as *one*
> implementation just to have an example to discuss on).
The point is that you need a COMPLETE implementation before you start thinking about the amount of "license" the compiler can take with your code.

Here's *part* of an implementation:

    a = 37;

Now, should I declare A as volatile? Use the register qualifier? What size should the integer A be? Can the optimizer elide this statement from my code?

All sorts of questions whose answers depend on the REST of the implementation -- not shown!
> The main point was the use of volatile (and other techniques) to guarantee a
> correct compiler output, whatever legal (respect the C standard) optimizations
> the compiler thinks to do.
>
> It seems to me the arguments against or for volatile are completely independent
> from the implementation of ring-buffer.
It has to do with indicating how YOU (the developer) see the object being used (accessed). You, in theory, know more about the role of the object than the compiler (because it may be accessed in other modules, or have "stuff" tied to it -- like special hardware, etc.).

You need a way to tell the compiler that "you know what you are doing" in your use of the object and that it should restrain itself from making assumptions that might not be true.

If your example doesn't bring to light those various issues, then the decision as to its applicability is moot.
On 23/10/2021 18:09, David Brown wrote:
[...]
> Marking "in" and "buf" as volatile is /far/ better than using a critical
> section, and likely to be more efficient than a memory barrier. You can
> also use volatileAccess rather than making buf volatile, and it is often
> slightly more efficient to cache volatile variables in a local variable
> while working with them.
I think I got your point, but I'm wondering why there are plenty of examples of ring-buffer implementations that don't use volatile at all, even if the author explicitly refers to interrupts and multithreading.

Just one example[1], by Quantum Leaps: it promises to be a *lock-free* (I think thread-safe) ring-buffer implementation in the single-producer/single-consumer scenario (which is my scenario too).

In the source code there's no use of volatile. I could call RingBuf_put() in my rx uart ISR and call RingBuf_get() in my mainloop code.

From what I learned from you, this code usually works, but the standard doesn't guarantee it will work with every old, current, and future compiler.

[1] https://github.com/QuantumLeaps/lock-free-ring-buffer
On 2021-10-25 20:52, pozz wrote:
> On 25/10/2021 17:34, Niklas Holsti wrote:
>> On 2021-10-25 16:04, Dimiter_Popoff wrote:
>>> On 10/25/2021 11:09, Niklas Holsti wrote:
>>>> On 2021-10-24 23:27, Dimiter_Popoff wrote:
>>>>> On 10/24/2021 22:54, Don Y wrote:
>>>>>> On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
>>>>>>>> Disable interrupts while accessing the fifo. you really have to.
>>>>>>>> alternatively you'll often get away not using a fifo at all,
>>>>>>>> unless you're blocking for a long while in some part of the code.
>>>>>>>
>>>>>>> Why would you do that. The fifo write pointer is only modified by
>>>>>>> the interrupt handler, the read pointer is only modified by the
>>>>>>> interrupted code. Has been done so for times immemorial.
>>>>>>
>>>>>> The OPs code doesn't differentiate between FIFO full and empty.
(I suspect something is not quite right with the attributions of the quotations above -- Dimiter probably did not suggest disabling interrupts -- but no matter.)

[snip]
> When I have a small (<256) power-of-two (16, 32, 64, 128) buffer (and
> this is the case for a UART receiving ring-buffer), I like to use this
> implementation that works and doesn't waste any element.
>
> However I know this isn't the best implementation ever and it's a pity
> the thread emphasis has been against this implementation (that was used
> as *one* implementation just to have an example to discuss on).
>
> The main point was the use of volatile (and other techniques) to
> guarantee a correct compiler output, whatever legal (respect the C
> standard) optimizations the compiler thinks to do.
>
> It seems to me the arguments against or for volatile are completely
> independent from the implementation of ring-buffer.
Of course "volatile" is needed, in general, whenever anything is written in one thread and read in another. The issue, I think, is when "volatile" is _enough_.

I feel that detection of a full buffer (FIFO overflow) is required for a proper ring buffer implementation, and that has implications for the data structure needed, and that has implications for whether critical sections are needed.

If the FIFO implementation is based on just two pointers (read and write), and each pointer is modified by just one of the two threads (main thread = reader, and interrupt handler = writer), and those modifications are both "volatile" AND atomic (which has not been discussed so far, IIRC...), then one can do without a critical region. But then detection of a full buffer needs one "wasted" element in the buffer.

To avoid the wasted element, one could add a "full"/"not full" Boolean flag. But that flag would be modified by both threads, and should be modified atomically together with the pointer modifications, which (I think) means that a critical section is needed.
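The two-pointer scheme described above, with one "wasted" element, might look roughly like the following. This is only a sketch under stated assumptions: the type and function names (`fifo_t`, `fifo_put`, `fifo_get`, `fifo_next`) and the size of 8 are made up for illustration, the target is assumed to be single-core, and the 8-bit indices are assumed to be loaded and stored in one access (true of a single byte on typical MCUs).

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 8u   /* assumed size; holds at most FIFO_SIZE - 1 bytes */

typedef struct {
    volatile uint8_t data[FIFO_SIZE];
    volatile uint8_t wr;   /* modified only by the writer (interrupt handler) */
    volatile uint8_t rd;   /* modified only by the reader (main thread) */
} fifo_t;

/* Index that follows i, wrapping at the end of the array. */
static inline uint8_t fifo_next(uint8_t i)
{
    return (uint8_t)((i + 1u) % FIFO_SIZE);
}

/* Writer side: returns false when the FIFO is full (overflow detected). */
bool fifo_put(fifo_t *f, uint8_t c)
{
    uint8_t wr = f->wr;
    uint8_t next = fifo_next(wr);
    if (next == f->rd) {
        return false;      /* writing would make wr == rd, which means "empty" */
    }
    f->data[wr] = c;       /* fill the slot before publishing it */
    f->wr = next;
    return true;
}

/* Reader side: returns false when the FIFO is empty. */
bool fifo_get(fifo_t *f, uint8_t *c)
{
    uint8_t rd = f->rd;
    if (rd == f->wr) {
        return false;      /* empty */
    }
    *c = f->data[rd];      /* copy the byte out before releasing the slot */
    f->rd = fifo_next(rd);
    return true;
}
```

Each side only reads the other side's index and writes its own, which is why no critical section is needed here. Adding a shared "full" flag would break that property, as the post explains.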
On 25/10/2021 20:15, pozz wrote:
> On 23/10/2021 18:09, David Brown wrote:
> [...]
>> Marking "in" and "buf" as volatile is /far/ better than using a critical
>> section, and likely to be more efficient than a memory barrier. You can
>> also use volatileAccess rather than making buf volatile, and it is often
>> slightly more efficient to cache volatile variables in a local variable
>> while working with them.
>
> I think I got your point, but I'm wondering why there are plenty of
> examples of ring-buffer implementations that don't use volatile at all,
> even if the author explicitly refers to interrupts and multithreading.
You don't have to use "volatile". You can make correct code here using critical sections - it's just a lot less efficient. (If you have a queue where more than one context can be reading it or writing it, then you /do/ need some kind of locking mechanism.)

You can also use memory barriers instead of volatile, but it is likely to be slightly less efficient. You can also use atomics instead of volatiles, but it is also quite likely to be slightly less efficient. If you have an SMP system, on the other hand, then you need something more than volatile and compiler memory barriers - atomics are quite possibly the most efficient solution in that case.

And sometimes you can make code that doesn't need any special treatment at all, because you know the way it is being called. If the two ends of your buffer are handled by tasks in a cooperative multi-tasking scenario, then there is no problem - you don't need to worry about volatile or any alternatives. If you know your interrupt can't occur while the other end of the buffer is being handled, that can reduce your need for volatile. (In particular, that can also avoid complications if you have counter variables that are bigger than the processor can handle atomically - usually not a problem for a 32-bit Cortex-M, but often important on an 8-bit AVR.)

If you know, for a fact, that the code will be compiled by a weak compiler or with weak optimisation, or that the "get" and "put" implementations will always be in a separately compiled unit from code calling these functions and you'll never use any kind of cross-unit optimisations, then you can often get away without using volatile.
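For the SMP case mentioned above, a single-producer/single-consumer queue built on C11 atomics could be sketched as follows. This is an illustration only, not code from the thread: the names (`spsc_t`, `spsc_put`, `spsc_get`) and the size are assumptions, and it relies on a toolchain that provides `<stdatomic.h>`. The release store on an index pairs with the acquire load on the other side, so the data written before the store is visible to the core that observes the new index.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define Q_SIZE 8u   /* assumed size; one slot stays empty, capacity is Q_SIZE - 1 */

typedef struct {
    uint8_t data[Q_SIZE];   /* plain storage, ordered by the atomics below */
    atomic_size_t head;     /* written only by the producer */
    atomic_size_t tail;     /* written only by the consumer */
} spsc_t;

/* Producer: returns false when the queue is full. */
bool spsc_put(spsc_t *q, uint8_t c)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t next = (head + 1u) % Q_SIZE;
    /* Acquire: the consumer's read of the slot happens before we reuse it. */
    if (next == atomic_load_explicit(&q->tail, memory_order_acquire)) {
        return false;
    }
    q->data[head] = c;
    /* Release: the data store above becomes visible before the new head. */
    atomic_store_explicit(&q->head, next, memory_order_release);
    return true;
}

/* Consumer: returns false when the queue is empty. */
bool spsc_get(spsc_t *q, uint8_t *c)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    /* Acquire: pairs with the producer's release store to head. */
    if (tail == atomic_load_explicit(&q->head, memory_order_acquire)) {
        return false;
    }
    *c = q->data[tail];
    /* Release: the slot is handed back only after the byte was copied out. */
    atomic_store_explicit(&q->tail, (tail + 1u) % Q_SIZE, memory_order_release);
    return true;
}
```

On a single-core microcontroller with an ISR producer, this is likely more machinery than is needed, which matches the point made above that atomics tend to be slightly less efficient than plain volatile accesses there.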
>
> Just an example[1] by Quantum Leaps. It promises to be a *lock-free* (I
> think thread-safe) ring-buffer implementation in the scenario of single
> producer/single consumer (that is my scenario too).
It's lock-free, but not safe in the face of modern optimisation (gcc has had LTO for many years, and a lot of high-end commercial embedded compilers have used such techniques for decades). And I'd want to study it in detail and think a lot before accepting that it is safe to use its 16-bit counters on an 8-bit AVR. That could be fixed by just changing the definition of the RingBufCtr type, which is a nice feature in the code.
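To illustrate the concern about 16-bit counters on an 8-bit CPU: the main loop needs two byte loads to read a 16-bit index, and an interrupt that lands between them can produce a torn value. When the counter steps from 0x00FF to 0x0100, for example, the reader might see 0x01FF or 0x0000, depending on which byte is loaded first. One generic remedy is a short critical section around the read, sketched below. Nothing here is taken from the Quantum Leaps code: `irq_lock()` and `irq_unlock()` are hypothetical port-layer hooks (on a real target they would save and disable, then restore, the interrupt state), implemented as host stubs so the sketch compiles on a PC, and `ctr16` and `isr_advance` are made-up names.

```c
#include <stdint.h>

/* Hypothetical port layer. On real hardware these would disable and restore
   interrupts; here they are host stubs that only track nesting depth. */
static unsigned irq_lock_depth;
static inline void irq_lock(void)   { irq_lock_depth++; }
static inline void irq_unlock(void) { irq_lock_depth--; }

static volatile uint16_t ctr16;     /* 16-bit index written by the ISR */

/* Stands in for the part of the ISR that advances the index. */
void isr_advance(void)
{
    ctr16 = (uint16_t)(ctr16 + 1u);
}

/* Main-loop read of the ISR-owned index. With interrupts blocked, both
   bytes of the value come from the same update. */
uint16_t read_ctr16(void)
{
    irq_lock();
    uint16_t value = ctr16;
    irq_unlock();
    return value;
}
```

Changing the counter type to an 8-bit one, as suggested above for RingBufCtr, avoids the problem without any critical section, at the cost of limiting the buffer size.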
>
> In the source code there's no use of volatile. I could call
> RingBuf_put() in my rx uart ISR and call RingBuf_get() in my mainloop code.
>
You don't want to call functions from an ISR if you can avoid it, unless the functions are defined in the same unit and can be inlined. On many processors (less so on the Cortex-M) calling an external function from an ISR means a lot of overhead to save and restore the so-called "volatile" registers (no relation to the C keyword "volatile"), usually completely unnecessarily.
> From what I learned from you, this code usually works, but the standard
> doesn't guarantee it will work with every old, current and future
> compilers.
>
Yes, that's a fair summary. It might be good enough for some purposes. But since "volatile" will cost nothing in code efficiency but greatly increase the portability and safety of the code, I'd recommend using it. And I am certainly in favour of thinking carefully about these things - as you did in the first place, which is why we have this thread.
>
>
> [1] https://github.com/QuantumLeaps/lock-free-ring-buffer
On 25/10/2021 17:34, Niklas Holsti wrote:

> And if each of those two items is large, yes. But here we have a FIFO of
> 8-bit characters... few programs are so tight on memory that they cannot
> stand one unused octet.
I remember a program I worked with where the main challenge for the final features was not figuring out the implementation, but finding a few spare bytes of code space and a couple of spare bits of ram to use. And that was with 32 KB ROM and 512 bytes RAM (plus some bits in the registers of peripherals that weren't used). That was probably the last big assembly program I wrote - non-portability was a killer.