EmbeddedRelated.com
Forums
The 2024 Embedded Online Conference

atomic test_and_set on ADSP21020

Started by alb July 12, 2012
Hi everyone,

I'm trying to implement a locking mechanism without the need to disable
the interrupts (and yes, I know I can do this disabling interrupts) and
I thought about a 'test-and-set' function which will atomically set the
lock which will then be released at the end of the critical section.

Does anybody know about an implementation with the target's Instruction
Set?
I found in the C runtime library for the ADSP21060 (not ADSP21020) a
function called test_and_set_semaphore which is intended for the bus
lock in a multiprocessor system where there's the need to lock access to
the address bus.
The function looks overly complicated though, including a timeout I do
not need. Maybe I can strip off the timeout part and use the rest.

In your opinion, is it still possible to implement it in C language,
keeping the atomicity?
Here I have found that the 'volatile' type qualifier can be a tricky
business, moreover, considering that reordering can be an issue with
these types of operations, I'm afraid about what is written in the g21k
documentation. Here an excerpt:

> 4.3.4 Reordering & Optimization
> If an asm() has output operands, G21K assumes, for optimization > purposes, the instruction lacks side effects except to change the output > operands. This does not mean that instructions with a side effect > cannot be used, but you must be careful. The compiler may eliminate > them if the output operands are not used, or move them out of loops, > or replace two with one if they constitute a common subexpression. > Also, if your instruction does have a side effect on a variable that > otherwise appears not to change, the old value of the variable may be > reused later if it happens to be found in a register. > You can prevent an asm() instruction from being deleted, moved > significantly, or combined, by writing the keyword volatile after > the asm . For example:
> #define set_priority(x) \ > asm volatile (�set_priority %0;�: /* no outputs */ : �d� (x))
> Note: Even a volatile asm() instruction can be moved in ways that > appear insignificant to the compiler, such as across jump instructions. > You cannot expect a sequence of volatile asm() instructions to > remain perfectly consecutive. If you want consecutive output, use a > single asm() construct. Another way to avoid reordering is to use > the output of an asm() construct in a C instruction. > Even with these precautions, be aware that the -O and -O2 > compiler switches may cause asm() instructions to be moved. > Compile without optimization to minimize the chance of reordering.
It is surprising to me the amount of haziness in the documentation. 'moved significantly', 'ways that appear insignificant', 'minimize the chance'. <sarcastic mode ON> Shall a make a probability distribution and measure what is the probability that my code gets reordered? <sarcastic mode OFF> Any suggestion or pointer is appreciated. Al -- A: Because it fouls the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail?
On Jul 12, 4:27=A0pm, alb <alessandro.bas...@cern.ch> wrote:
> Hi everyone, > > I'm trying to implement a locking mechanism without the need to disable > the interrupts > .... > In your opinion, is it still possible to implement it in C language, > keeping the atomicity?
Atomic access without having interrupts masked takes some particular hardware in the processor, like a "test and set" instruction on the 68k or like the lwarx/stwcx. pair on power. You cannot do this in software, let alone trying to do it by mastering the programming language of the device at phrase-book level (in your case, C). Even if you are fluent in the device's assembly if there is no opcode doing that there is just no way of doing it. Notice that you do not need the complete atomicity of TAS and lwarx/stwcx. which are usable in multiprocessor systems as well; all you need is something equivalent to the 68k bset and bclr (bit test and set, bit test and clear), which do not lock the bus but can't be interrupted by an IRQ. Dimiter ------------------------------------------------------ Dimiter Popoff Transgalactic Instruments http://www.tgi-sci.com ------------------------------------------------------ http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On 12-07-12 15:27 , alb wrote:
> Hi everyone, > > I'm trying to implement a locking mechanism without the need to disable > the interrupts (and yes, I know I can do this disabling interrupts) and > I thought about a 'test-and-set' function which will atomically set the > lock which will then be released at the end of the critical section.
Assuming that this locking mechanism is intended for the same 21020 system that has featured in your earlier threads, I'm wondering how this locking mechanism can be useful. As I remember, your program schedules a number of ordinary processes or coroutines co-operatively (without preemption), while these processes (and the scheduler) can be interruped for interrupt handling. If you want the lock to provide mutual exclusion only between ordinary processes, you can use a simple boolean flag variable that is set by assignment statements in the normal way (non-atomically), because the ordinary processes do not preempt each other. (However, to make a process wait for a lock held by some other process, you must make the first process poll repeatedly for the lock, since your scheduler cannot -- as I recall -- suspend a process to wait for an event.) If you want the lock to provide mutual exclusion between ordinary processes and interrupt handlers, you would indeed need a way to test and set a flag atomically. But if the interrupt occurs when the flag is set to "locked", the interrupt handler cannot wait for the lock to be opened, so it can do nothing. This seems very inconvenient to me; the interrupt is effectively "lost" since it cannot update the locked data structure. The nice point about using interrupt disabling (or masking) to implement mutual exclusion between non-interrupt processes and interrupt processes is that this makes the interrupt wait (be pending) until the lock is opened (interrupts enabled or unmasked). Why do you want to avoid interrupt disabling/masking? It seems to me that interrupt disabling/masking is the only usable solution for mutual exclusion in your environment. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .
On Thu, 12 Jul 2012 09:52:11 -0700 (PDT), dp <dp@tgi-sci.com> wrote:

>On Jul 12, 4:27&nbsp;pm, alb <alessandro.bas...@cern.ch> wrote: >> Hi everyone, >> >> I'm trying to implement a locking mechanism without the need to disable >> the interrupts >> .... >> In your opinion, is it still possible to implement it in C language, >> keeping the atomicity? > >Atomic access without having interrupts masked takes some >particular hardware in the processor, like a "test and set" >instruction on the 68k or like the lwarx/stwcx. pair on >power. > >You cannot do this in software, let alone trying to do it >by mastering the programming language of the device at phrase-book >level (in your case, C).
Actually mutual exclusion CAN be done in software ... at least on a uniprocessor with in-order execution. See Dekker's algorithm, or Peterson's algorithm, or Lamport's algorithm, or any of a dozen others. And if you have atomic instructions you don't really need them to set condition codes ... an atomic register-memory swap will suffice. We don't really know that the OP needs atomicity ... only that he needs mutual exclusion.
>Even if you are fluent in the device's assembly if there is >no opcode doing that there is just no way of doing it.
Maybe.
>Notice that you do not need the complete atomicity of >TAS and lwarx/stwcx. which are usable in multiprocessor >systems as well; all you need is something equivalent to >the 68k bset and bclr (bit test and set, bit test and >clear), which do not lock the bus but can't be interrupted >by an IRQ.
Almost. Even on a uniprocessor, without bus locking you need to guarantee in-order execution. That may mean using barrier instructions on an OoO processor. Parallel ops notwithstanding, the 21020 executes in-order anyway so that is not an issue. However, I don't think it has any atomic instructions ... I don't have a manual handy to double check, but unlike the 21060, the 21020 wasn't intended for multiprocessor use.
>Dimiter
George
On 7/12/2012 11:24 PM, Niklas Holsti wrote:
> On 12-07-12 15:27 , alb wrote: >> Hi everyone, >> >> I'm trying to implement a locking mechanism without the need to disable >> the interrupts (and yes, I know I can do this disabling interrupts) and >> I thought about a 'test-and-set' function which will atomically set the >> lock which will then be released at the end of the critical section. > > Assuming that this locking mechanism is intended for the same 21020 > system that has featured in your earlier threads, I'm wondering how this > locking mechanism can be useful. >
It is indeed intended for the same system, but in order to handle multiple processes (FSMs) trying to access the output resource there has been quite a bit of modification to the previous scheme. First of all we realized the output fifo wasn't really needed. Each process will have its own memory buffer and will pass the pointers together with size of memory to the interrupt service routine which will take care about writing to the output register (except for the first writing which will be done by the process). In order to 'serve' multiple processes with the need to transmit data we thought about implementing a 'request queue' (which can eventually be even prioritized maybe according to the link budget you suggested in an earlier thread). When the process put the request in the queue it also 'attempts' to start the transmission but if the resource is locked it will not succeed. At the beginning the lock is free and the first process which attempts to start the transmission will lock it, while all the others which follow will only manage to set their request in the queue. Given this logic, the interrupt service routine, once it completed transmitting the first packet, should also check the queue and lock the resource in case there is a pending packet. Since lock is set and reset by both ordinary processes and interrupt service routines I am concerned about running into a race condition. But I should say that I did not yet find the possibility for this race to occur... so maybe you are right, a 'test_and_set' function might be not useful.
> As I remember, your program schedules a number of ordinary processes or > coroutines co-operatively (without preemption), while these processes > (and the scheduler) can be interruped for interrupt handling. > > If you want the lock to provide mutual exclusion only between ordinary > processes, you can use a simple boolean flag variable that is set by > assignment statements in the normal way (non-atomically), because the > ordinary processes do not preempt each other. (However, to make a > process wait for a lock held by some other process, you must make the > first process poll repeatedly for the lock, since your scheduler cannot > -- as I recall -- suspend a process to wait for an event.)
Indeed this is my first try and I will have it tested today. And yes, there's no way to suspend a process, therefore my processes are always returning to the main loop, while waiting for the transmission to be finished. The lock is indeed needed only to prevent multiple processes to start a new transmission when another one is going on, but they will still be able to append their request to the queue (if not full! In that case they should wait until their request is appended to the queue).
> > If you want the lock to provide mutual exclusion between ordinary > processes and interrupt handlers, you would indeed need a way to test > and set a flag atomically. But if the interrupt occurs when the flag is > set to "locked", the interrupt handler cannot wait for the lock to be > opened, so it can do nothing. This seems very inconvenient to me; the > interrupt is effectively "lost" since it cannot update the locked data > structure.
The locking mechanism will not prevent the interrupt to run. The interrupt will continue to write bytes to the serial output until the packet is complete and then will free the lock. In the situation where there is another packet in the queue the same interrupt service routine will pull the data from the buffer and transmit it, setting again the lock, in order to prevent another process to start transmission.
> > The nice point about using interrupt disabling (or masking) to implement > mutual exclusion between non-interrupt processes and interrupt processes > is that this makes the interrupt wait (be pending) until the lock is > opened (interrupts enabled or unmasked). > > Why do you want to avoid interrupt disabling/masking? It seems to me > that interrupt disabling/masking is the only usable solution for mutual > exclusion in your environment.
The reason why I would have liked to avoid disabling/masking interrupt is to understand what I'm doing :-). I currently have a 'model' - in my mind - of how the processor works and according to the model I should be able to do what I want without the need to disable the interrupts. I simply do not like to do things without the need to do them and, as you pointed out, it is possible that a simple flag will be sufficient.
On 7/12/2012 6:52 PM, dp wrote:
[...]
> > Atomic access without having interrupts masked takes some > particular hardware in the processor, like a "test and set" > instruction on the 68k or like the lwarx/stwcx. pair on > power. >
I found an interesting article on the 80x86 which suggests the use of other types of instructions which are not intended for 'test and set' but can achieve the same functionality: http://maven.smith.edu/~thiebaut/ArtOfAssembly/CH19/CH19-12.html the author talks about how to use the /xchg/ operation or the /shr/ operation in order to achieve the same effect without the need of a 'test and set' instruction.
> You cannot do this in software, let alone trying to do it > by mastering the programming language of the device at phrase-book > level (in your case, C).
I'm pretty sure you can do this with several CPUs at 'phrase-book' level, since is just about calling a built-in function of gcc: http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html And I was indeed wondering whether the g21k, which is based on gcc, provided this possibility. I did not find anything in the documentation which points to this functionality, but you never know whether the documentation is fully describing the compiler.
> > Even if you are fluent in the device's assembly if there is > no opcode doing that there is just no way of doing it.
You can use different opcodes to achieve the same result.
> Notice that you do not need the complete atomicity of > TAS and lwarx/stwcx. which are usable in multiprocessor > systems as well; all you need is something equivalent to > the 68k bset and bclr (bit test and set, bit test and > clear), which do not lock the bus but can't be interrupted > by an IRQ.
indeed and the article I refer to is exactly showing how to do this without a 'test and set' instruction.
> > Dimiter > > ------------------------------------------------------ > Dimiter Popoff Transgalactic Instruments > > http://www.tgi-sci.com > ------------------------------------------------------ > http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/ >
On 7/13/2012 6:08 AM, George Neuner wrote:
> On Thu, 12 Jul 2012 09:52:11 -0700 (PDT), dp <dp@tgi-sci.com> wrote:
[...]
>> You cannot do this in software, let alone trying to do it >> by mastering the programming language of the device at phrase-book >> level (in your case, C). > > Actually mutual exclusion CAN be done in software ... at least on a > uniprocessor with in-order execution. See Dekker's algorithm, or > Peterson's algorithm, or Lamport's algorithm, or any of a dozen > others.
Thanks for the pointers, I indeed had a look at those, but it looks to me that if there's some 'reordering' going on the logic of these algorithms will fail, unless some memory barrier is introduced.
> > And if you have atomic instructions you don't really need them to set > condition codes ... an atomic register-memory swap will suffice.
this is also what I understood.
> > We don't really know that the OP needs atomicity ... only that he > needs mutual exclusion.
I tend to believe now that I only need mutual exclusion and considering that my processes are not running in a multithreading environment I should be safe without an atomic instruction.
>> Notice that you do not need the complete atomicity of >> TAS and lwarx/stwcx. which are usable in multiprocessor >> systems as well; all you need is something equivalent to >> the 68k bset and bclr (bit test and set, bit test and >> clear), which do not lock the bus but can't be interrupted >> by an IRQ. > > Almost. Even on a uniprocessor, without bus locking you need to > guarantee in-order execution. That may mean using barrier > instructions on an OoO processor. > > Parallel ops notwithstanding, the 21020 executes in-order anyway so > that is not an issue. However, I don't think it has any atomic > instructions ... I don't have a manual handy to double check, but > unlike the 21060, the 21020 wasn't intended for multiprocessor use.
Quoting the manual:
> 3.2.1 Sequential Instruction Flow > The program sequencer determines the next instruction address by > examining both the current instruction being executed and the current > state of the processor. If no conditions require otherwise, the ADSP-21020 > executes instructions from program memory in sequential order by simply > incrementing the fetch address.
But what makes things a little worrisome is that the g21k does attempt to reorder instructions.
On Jul 13, 11:10=A0am, alb <alessandro.bas...@cern.ch> wrote:
> .... > > > You cannot do this in software, let alone trying to do it > > by mastering the programming language of the device at phrase-book > > level (in your case, C). > > I'm pretty sure you can do this with several CPUs at 'phrase-book' > level, since is just about calling a built-in function of gcc:
No you cannot. In order to execute more than one opcode in sequence without the sequence being interrupted you need either to have the interrupt masked or guarantee there will be no interrupts during that time span.
> http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
You may also want to read the links you post. It took me 2-3 seconds to spot the sentence saying the code there does not work on all processors.
> > Notice that you do not need the complete atomicity of > > TAS and lwarx/stwcx. which are usable in multiprocessor > > systems as well; all you need is something equivalent to > > the 68k bset and bclr (bit test and set, bit test and > > clear), which do not lock the bus but can't be interrupted > > by an IRQ. > > indeed and the article I refer to is exactly showing how to do this > without a 'test and set' instruction.
Well you certainly are entitled to your beliefs. Show us the code when you invent the software uninterruptible test and set. Dimiter ------------------------------------------------------ Dimiter Popoff Transgalactic Instruments http://www.tgi-sci.com ------------------------------------------------------ http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On 7/13/2012 2:26 PM, dp wrote:
[...]
>> >> I'm pretty sure you can do this with several CPUs at 'phrase-book' >> level, since is just about calling a built-in function of gcc: >
[...]
> >> http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html > > You may also want to read the links you post. It took me 2-3 seconds > to spot the sentence saying the code there does not work on > all processors.
You may also want to read my post which, for your convenience, is copied here for the relevant part. I never said the code works on *all processors*. [...]
>> >> indeed and the article I refer to is exactly showing how to do this >> without a 'test and set' instruction. >
[...]
> Show us the code when you invent the software uninterruptible > test and set.
as per the link I posted (on 80x86 architectures): example #1:
> InUseLoop: mov al, 0 ;0=In Use > xchg al, Flag > cmp al, 0 > je InUseLoop
example #2:
> InUseLoop: shr Flag, 1 ;In-use bit to carry, 0->Flag. > jnc InUseLoop ;Repeat if already in use.
both xchg and shr are instructions which achieve test_and_set functionality with no need to have dedicated hardware. On ADSP21020 there are several (7) operations which are not interruptable. Quoting the UM:
> Certain ADSP-21020 operations that span more than one cycle hold off > interrupts. If an interrupt occurs during one of these operations, it is > synchronized and latched, but its processing is delayed
So I was wondering if one can take advantage of these operations to implement a 'test and set' which is atomic in the sense that cannot be interrupted.
On Jul 13, 4:09=A0pm, alb <alessandro.bas...@cern.ch> wrote:
> .... > > InUseLoop: =A0 =A0 =A0mov =A0 =A0 al, 0 =A0 =A0 =A0 =A0 =A0 ;0=3DIn Use > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 xchg =A0 =A0al, Flag > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 cmp =A0 =A0 al, 0 > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 je =A0 =A0 =A0InUseLoop > > example #2: > > > InUseLoop: =A0 =A0 =A0shr =A0 =A0 Flag, 1 =A0 =A0 =A0 =A0 ;In-use bit t=
o carry, 0->Flag.
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 jnc =A0 =A0 InUseLoop =A0 =A0 =A0 ;Repe=
at if already in use.
> > both xchg and shr are instructions which achieve test_and_set > functionality with no need to have dedicated hardware.
Having xchg (and perhaps shr, don't know what it is doing, I am not x86 interested) _is_ the hardware it takes. It is indeed a read then write opcode which cannot be interrupted, equivalent to bset/bclr in this context.
> > On ADSP21020 there are several (7) operations which are not > interruptable. Quoting the UM: > > > Certain ADSP-21020 operations that span more than one cycle hold off > > interrupts. If an interrupt occurs during one of these operations, it i=
s
> > synchronized and latched, but its processing is delayed
Most if not all processors have non single cycle opcodes (e.g. division is non single cycle even on all RISC cores I have seen). Unless you have a specific, implemented in hardware, read and write opcode so it cannot be interrupted (which obviously will take
> 1 cycle) the cycle count is simply not relevant.
While writing this I recalled that you seemed to have a lengthy thread on FIFOs; if that is meant for implementing a FIFO well, you just do not need the masking etc. For the FIFO read and write pointers, typically one of them is written only by the interrupt service routine and the other - only by the interruptible code. Dimiter ------------------------------------------------------ Dimiter Popoff Transgalactic Instruments http://www.tgi-sci.com ------------------------------------------------------ http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

The 2024 Embedded Online Conference