atomic test_and_set on ADSP21020

Hi everyone,

I'm trying to implement a locking mechanism without the need to disable
the interrupts (and yes, I know I can do this disabling interrupts) and
I thought about a 'test-and-set' function which will atomically set the
lock which will then be released at the end of the critical section.

Does anybody know about an implementation with the target's Instruction
Set?
I found in the C runtime library for the ADSP21060 (not ADSP21020) a
function called test_and_set_semaphore which is intended for the bus
lock in a multiprocessor system where there's the need to lock access to
the address bus.
The function looks overly complicated though, including a timeout I do
not need. Maybe I can strip off the timeout part and use the rest.

In your opinion, is it still possible to implement it in C language,
keeping the atomicity?
Here I have found that the 'volatile' type qualifier can be a tricky
business, moreover, considering that reordering can be an issue with
these types of operations, I'm afraid about what is written in the g21k
documentation. Here an excerpt:

> 4.3.4 Reordering & Optimization

> If an asm() has output operands, G21K assumes, for optimization
> purposes, the instruction lacks side effects except to change the output
> operands. This does not mean that instructions with a side effect
> cannot be used, but you must be careful. The compiler may eliminate
> them if the output operands are not used, or move them out of loops,
> or replace two with one if they constitute a common subexpression.
> Also, if your instruction does have a side effect on a variable that
> otherwise appears not to change, the old value of the variable may be
> reused later if it happens to be found in a register.
> You can prevent an asm() instruction from being deleted, moved
> significantly, or combined, by writing the keyword volatile after
> the asm . For example:

> #define set_priority(x) \
> asm volatile (&#4294967295;set_priority %0;&#4294967295;: /* no outputs */ : &#4294967295;d&#4294967295; (x))

> Note: Even a volatile asm() instruction can be moved in ways that
> appear insignificant to the compiler, such as across jump instructions.
> You cannot expect a sequence of volatile asm() instructions to
> remain perfectly consecutive. If you want consecutive output, use a
> single asm() construct. Another way to avoid reordering is to use
> the output of an asm() construct in a C instruction.
> Even with these precautions, be aware that the -O and -O2
> compiler switches may cause asm() instructions to be moved.
> Compile without optimization to minimize the chance of reordering.

It is surprising to me the amount of haziness in the documentation.
'moved significantly', 'ways that appear insignificant', 'minimize the
chance'.
<sarcastic mode ON>
Shall a make a probability distribution and measure what is the
probability that my code gets reordered?
<sarcastic mode OFF>

Any suggestion or pointer is appreciated.

Al

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Reply by dp ●July 12, 20122012-07-12

On Jul 12, 4:27=A0pm, alb <alessandro.bas...@cern.ch> wrote:
> Hi everyone,
>
> I'm trying to implement a locking mechanism without the need to disable
> the interrupts
> ....
> In your opinion, is it still possible to implement it in C language,
> keeping the atomicity?

Atomic access without having interrupts masked takes some
particular hardware in the processor, like a "test and set"
instruction on the 68k or like the lwarx/stwcx. pair on
power.

You cannot do this in software, let alone trying to do it
by mastering the programming language of the device at phrase-book
level (in your case, C).

Even if you are fluent in the device's assembly if there is
no opcode doing that there is just no way of doing it.
Notice that you do not need the complete atomicity of
TAS and lwarx/stwcx. which are usable in multiprocessor
systems as well; all you need is something equivalent to
the 68k bset and bclr (bit test and set, bit test and
clear), which do not lock the bus but can't be interrupted
by an IRQ.

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Reply by Niklas Holsti ●July 12, 20122012-07-12

On 12-07-12 15:27 , alb wrote:
> Hi everyone,
>
> I'm trying to implement a locking mechanism without the need to disable
> the interrupts (and yes, I know I can do this disabling interrupts) and
> I thought about a 'test-and-set' function which will atomically set the
> lock which will then be released at the end of the critical section.

Assuming that this locking mechanism is intended for the same 21020 
system that has featured in your earlier threads, I'm wondering how this 
locking mechanism can be useful.

As I remember, your program schedules a number of ordinary processes or 
coroutines co-operatively (without preemption), while these processes 
(and the scheduler) can be interruped for interrupt handling.

If you want the lock to provide mutual exclusion only between ordinary 
processes, you can use a simple boolean flag variable that is set by 
assignment statements in the normal way (non-atomically), because the 
ordinary processes do not preempt each other. (However, to make a 
process wait for a lock held by some other process, you must make the 
first process poll repeatedly for the lock, since your scheduler cannot 
-- as I recall -- suspend a process to wait for an event.)

If you want the lock to provide mutual exclusion between ordinary 
processes and interrupt handlers, you would indeed need a way to test 
and set a flag atomically. But if the interrupt occurs when the flag is 
set to "locked", the interrupt handler cannot wait for the lock to be 
opened, so it can do nothing. This seems very inconvenient to me; the 
interrupt is effectively "lost" since it cannot update the locked data 
structure.

The nice point about using interrupt disabling (or masking) to implement 
mutual exclusion between non-interrupt processes and interrupt processes 
is that this makes the interrupt wait (be pending) until the lock is 
opened (interrupts enabled or unmasked).

Why do you want to avoid interrupt disabling/masking? It seems to me 
that interrupt disabling/masking is the only usable solution for mutual 
exclusion in your environment.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

Reply by George Neuner ●July 13, 20122012-07-13

On Thu, 12 Jul 2012 09:52:11 -0700 (PDT), dp <dp@tgi-sci.com> wrote:

>On Jul 12, 4:27&nbsp;pm, alb <alessandro.bas...@cern.ch> wrote:
>> Hi everyone,
>>
>> I'm trying to implement a locking mechanism without the need to disable
>> the interrupts
>> ....
>> In your opinion, is it still possible to implement it in C language,
>> keeping the atomicity?
>
>Atomic access without having interrupts masked takes some
>particular hardware in the processor, like a "test and set"
>instruction on the 68k or like the lwarx/stwcx. pair on
>power.
>
>You cannot do this in software, let alone trying to do it
>by mastering the programming language of the device at phrase-book
>level (in your case, C).

Actually mutual exclusion CAN be done in software ... at least on a
uniprocessor with in-order execution.  See Dekker's algorithm, or
Peterson's algorithm, or Lamport's algorithm, or any of a dozen
others.

And if you have atomic instructions you don't really need them to set
condition codes ... an atomic register-memory swap will suffice.

We don't really know that the OP needs atomicity ... only that he
needs mutual exclusion.

>Even if you are fluent in the device's assembly if there is
>no opcode doing that there is just no way of doing it.

Maybe.

>Notice that you do not need the complete atomicity of
>TAS and lwarx/stwcx. which are usable in multiprocessor
>systems as well; all you need is something equivalent to
>the 68k bset and bclr (bit test and set, bit test and
>clear), which do not lock the bus but can't be interrupted
>by an IRQ.

Almost.  Even on a uniprocessor, without bus locking you need to
guarantee in-order execution.  That may mean using barrier
instructions on an OoO processor.

Parallel ops notwithstanding, the 21020 executes in-order anyway so
that is not an issue.  However, I don't think it has any atomic
instructions ... I don't have a manual handy to double check, but 
unlike the 21060, the 21020 wasn't intended for multiprocessor use.

>Dimiter
George

Reply by alb ●July 13, 20122012-07-13

On 7/12/2012 11:24 PM, Niklas Holsti wrote:
> On 12-07-12 15:27 , alb wrote:
>> Hi everyone,
>>
>> I'm trying to implement a locking mechanism without the need to disable
>> the interrupts (and yes, I know I can do this disabling interrupts) and
>> I thought about a 'test-and-set' function which will atomically set the
>> lock which will then be released at the end of the critical section.
> 
> Assuming that this locking mechanism is intended for the same 21020
> system that has featured in your earlier threads, I'm wondering how this
> locking mechanism can be useful.
> 

It is indeed intended for the same system, but in order to handle
multiple processes (FSMs) trying to access the output resource there has
been quite a bit of modification to the previous scheme.

First of all we realized the output fifo wasn't really needed. Each
process will have its own memory buffer and will pass the pointers
together with size of memory to the interrupt service routine which will
take care about writing to the output register (except for the first
writing which will be done by the process).

In order to 'serve' multiple processes with the need to transmit data we
thought about implementing a 'request queue' (which can eventually be
even prioritized maybe according to the link budget you suggested in an
earlier thread).

When the process put the request in the queue it also 'attempts' to
start the transmission but if the resource is locked it will not succeed.
At the beginning the lock is free and the first process which attempts
to start the transmission will lock it, while all the others which
follow will only manage to set their request in the queue.

Given this logic, the interrupt service routine, once it completed
transmitting the first packet, should also check the queue and lock the
resource in case there is a pending packet.

Since lock is set and reset by both ordinary processes and interrupt
service routines I am concerned about running into a race condition. But
I should say that I did not yet find the possibility for this race to
occur... so maybe you are right, a 'test_and_set' function might be not
useful.

> As I remember, your program schedules a number of ordinary processes or
> coroutines co-operatively (without preemption), while these processes
> (and the scheduler) can be interruped for interrupt handling.
> 
> If you want the lock to provide mutual exclusion only between ordinary
> processes, you can use a simple boolean flag variable that is set by
> assignment statements in the normal way (non-atomically), because the
> ordinary processes do not preempt each other. (However, to make a
> process wait for a lock held by some other process, you must make the
> first process poll repeatedly for the lock, since your scheduler cannot
> -- as I recall -- suspend a process to wait for an event.)

Indeed this is my first try and I will have it tested today. And yes,
there's no way to suspend a process, therefore my processes are always
returning to the main loop, while waiting for the transmission to be
finished. The lock is indeed needed only to prevent multiple processes
to start a new transmission when another one is going on, but they will
still be able to append their request to the queue (if not full! In that
case they should wait until their request is appended to the queue).

> 
> If you want the lock to provide mutual exclusion between ordinary
> processes and interrupt handlers, you would indeed need a way to test
> and set a flag atomically. But if the interrupt occurs when the flag is
> set to "locked", the interrupt handler cannot wait for the lock to be
> opened, so it can do nothing. This seems very inconvenient to me; the
> interrupt is effectively "lost" since it cannot update the locked data
> structure.

The locking mechanism will not prevent the interrupt to run. The
interrupt will continue to write bytes to the serial output until the
packet is complete and then will free the lock. In the situation where
there is another packet in the queue the same interrupt service routine
will pull the data from the buffer and transmit it, setting again the
lock, in order to prevent another process to start transmission.

> 
> The nice point about using interrupt disabling (or masking) to implement
> mutual exclusion between non-interrupt processes and interrupt processes
> is that this makes the interrupt wait (be pending) until the lock is
> opened (interrupts enabled or unmasked).
> 
> Why do you want to avoid interrupt disabling/masking? It seems to me
> that interrupt disabling/masking is the only usable solution for mutual
> exclusion in your environment.

The reason why I would have liked to avoid disabling/masking interrupt
is to understand what I'm doing :-). I currently have a 'model' - in my
mind - of how the processor works and according to the model I should be
able to do what I want without the need to disable the interrupts. I
simply do not like to do things without the need to do them and, as you
pointed out, it is possible that a simple flag will be sufficient.

Reply by alb ●July 13, 20122012-07-13

On 7/12/2012 6:52 PM, dp wrote:
[...]
> 
> Atomic access without having interrupts masked takes some
> particular hardware in the processor, like a "test and set"
> instruction on the 68k or like the lwarx/stwcx. pair on
> power.
> 

I found an interesting article on the 80x86 which suggests the use of
other types of instructions which are not intended for 'test and set'
but can achieve the same functionality:

http://maven.smith.edu/~thiebaut/ArtOfAssembly/CH19/CH19-12.html

the author talks about how to use the /xchg/ operation or the /shr/
operation in order to achieve the same effect without the need of a
'test and set' instruction.

> You cannot do this in software, let alone trying to do it
> by mastering the programming language of the device at phrase-book
> level (in your case, C).

I'm pretty sure you can do this with several CPUs at 'phrase-book'
level, since is just about calling a built-in function of gcc:

http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html

And I was indeed wondering whether the g21k, which is based on gcc,
provided this possibility. I did not find anything in the documentation
which points to this functionality, but you never know whether the
documentation is fully describing the compiler.

> 
> Even if you are fluent in the device's assembly if there is
> no opcode doing that there is just no way of doing it.

You can use different opcodes to achieve the same result.

> Notice that you do not need the complete atomicity of
> TAS and lwarx/stwcx. which are usable in multiprocessor
> systems as well; all you need is something equivalent to
> the 68k bset and bclr (bit test and set, bit test and
> clear), which do not lock the bus but can't be interrupted
> by an IRQ.

indeed and the article I refer to is exactly showing how to do this
without a 'test and set' instruction.

> 
> Dimiter
> 
> ------------------------------------------------------
> Dimiter Popoff               Transgalactic Instruments
> 
> http://www.tgi-sci.com
> ------------------------------------------------------
> http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
>

Reply by alb ●July 13, 20122012-07-13

On 7/13/2012 6:08 AM, George Neuner wrote:
> On Thu, 12 Jul 2012 09:52:11 -0700 (PDT), dp <dp@tgi-sci.com> wrote:
[...]
>> You cannot do this in software, let alone trying to do it
>> by mastering the programming language of the device at phrase-book
>> level (in your case, C).
> 
> Actually mutual exclusion CAN be done in software ... at least on a
> uniprocessor with in-order execution.  See Dekker's algorithm, or
> Peterson's algorithm, or Lamport's algorithm, or any of a dozen
> others.

Thanks for the pointers, I indeed had a look at those, but it looks to
me that if there's some 'reordering' going on the logic of these
algorithms will fail, unless some memory barrier is introduced.

> 
> And if you have atomic instructions you don't really need them to set
> condition codes ... an atomic register-memory swap will suffice.

this is also what I understood.

> 
> We don't really know that the OP needs atomicity ... only that he
> needs mutual exclusion.

I tend to believe now that I only need mutual exclusion and considering
that my processes are not running in a multithreading environment I
should be safe without an atomic instruction.

>> Notice that you do not need the complete atomicity of
>> TAS and lwarx/stwcx. which are usable in multiprocessor
>> systems as well; all you need is something equivalent to
>> the 68k bset and bclr (bit test and set, bit test and
>> clear), which do not lock the bus but can't be interrupted
>> by an IRQ.
> 
> Almost.  Even on a uniprocessor, without bus locking you need to
> guarantee in-order execution.  That may mean using barrier
> instructions on an OoO processor.
> 
> Parallel ops notwithstanding, the 21020 executes in-order anyway so
> that is not an issue.  However, I don't think it has any atomic
> instructions ... I don't have a manual handy to double check, but 
> unlike the 21060, the 21020 wasn't intended for multiprocessor use.

Quoting the manual:
> 3.2.1 Sequential Instruction Flow
> The program sequencer determines the next instruction address by
> examining both the current instruction being executed and the current
> state of the processor. If no conditions require otherwise, the ADSP-21020
> executes instructions from program memory in sequential order by simply
> incrementing the fetch address.

But what makes things a little worrisome is that the g21k does attempt
to reorder instructions.

Reply by dp ●July 13, 20122012-07-13

On Jul 13, 11:10=A0am, alb <alessandro.bas...@cern.ch> wrote:
> ....
>
> > You cannot do this in software, let alone trying to do it
> > by mastering the programming language of the device at phrase-book
> > level (in your case, C).
>
> I'm pretty sure you can do this with several CPUs at 'phrase-book'
> level, since is just about calling a built-in function of gcc:

No you cannot. In order to execute more than one opcode in sequence
without the sequence being interrupted you need either to have
the interrupt masked or guarantee there will be no interrupts
during that time span.

> http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html

You may also want to read the links you post. It took me 2-3 seconds
to spot the sentence saying the code there does not work on
all processors.

> > Notice that you do not need the complete atomicity of
> > TAS and lwarx/stwcx. which are usable in multiprocessor
> > systems as well; all you need is something equivalent to
> > the 68k bset and bclr (bit test and set, bit test and
> > clear), which do not lock the bus but can't be interrupted
> > by an IRQ.
>
> indeed and the article I refer to is exactly showing how to do this
> without a 'test and set' instruction.

Well you certainly are entitled to your beliefs.
Show us the code when you invent the software uninterruptible
test and set.

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Reply by alb ●July 13, 20122012-07-13

On 7/13/2012 2:26 PM, dp wrote:
[...]
>>
>> I'm pretty sure you can do this with several CPUs at 'phrase-book'
>> level, since is just about calling a built-in function of gcc:
> 
[...]
> 
>> http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
> 
> You may also want to read the links you post. It took me 2-3 seconds
> to spot the sentence saying the code there does not work on
> all processors.

You may also want to read my post which, for your convenience, is copied
here for the relevant part. I never said the code works on *all processors*.

[...]
>>
>> indeed and the article I refer to is exactly showing how to do this
>> without a 'test and set' instruction.
> 
[...]
> Show us the code when you invent the software uninterruptible
> test and set.

as per the link I posted (on 80x86 architectures):

example #1:
> InUseLoop:      mov     al, 0           ;0=In Use
>                 xchg    al, Flag
>                 cmp     al, 0
>                 je      InUseLoop

example #2:
> InUseLoop:      shr     Flag, 1         ;In-use bit to carry, 0->Flag.
>                 jnc     InUseLoop       ;Repeat if already in use.

both xchg and shr are instructions which achieve test_and_set
functionality with no need to have dedicated hardware.

On ADSP21020 there are several (7) operations which are not
interruptable. Quoting the UM:

> Certain ADSP-21020 operations that span more than one cycle hold off
> interrupts. If an interrupt occurs during one of these operations, it is
> synchronized and latched, but its processing is delayed

So I was wondering if one can take advantage of these operations to
implement a 'test and set' which is atomic in the sense that cannot be
interrupted.

Reply by dp ●July 13, 20122012-07-13

On Jul 13, 4:09=A0pm, alb <alessandro.bas...@cern.ch> wrote:
> ....
> > InUseLoop: =A0 =A0 =A0mov =A0 =A0 al, 0 =A0 =A0 =A0 =A0 =A0 ;0=3DIn Use
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 xchg =A0 =A0al, Flag
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 cmp =A0 =A0 al, 0
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 je =A0 =A0 =A0InUseLoop
>
> example #2:
>
> > InUseLoop: =A0 =A0 =A0shr =A0 =A0 Flag, 1 =A0 =A0 =A0 =A0 ;In-use bit t=
o carry, 0->Flag.
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 jnc =A0 =A0 InUseLoop =A0 =A0 =A0 ;Repe=
at if already in use.
>
> both xchg and shr are instructions which achieve test_and_set
> functionality with no need to have dedicated hardware.

Having xchg (and perhaps shr, don't know what it is doing, I am not
x86 interested) _is_ the hardware it takes. It is indeed a read then
write opcode which cannot be interrupted, equivalent to bset/bclr
in this context.

>
> On ADSP21020 there are several (7) operations which are not
> interruptable. Quoting the UM:
>
> > Certain ADSP-21020 operations that span more than one cycle hold off
> > interrupts. If an interrupt occurs during one of these operations, it i=
s
> > synchronized and latched, but its processing is delayed

Most if not all processors have non single cycle opcodes (e.g.
division
is non single cycle even on all RISC cores I have seen).
Unless you have a specific, implemented in hardware, read and write
opcode so it cannot be interrupted (which obviously will take
> 1 cycle) the cycle count is simply not relevant.

While writing this I recalled that you seemed to have a lengthy
thread on FIFOs; if that is meant for implementing a FIFO well,
you just do not need the masking etc. For the FIFO read and write
pointers,
typically one of them is written only by the interrupt service
routine and the other - only by the interruptible code.

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Previous12 3 4 Next

atomic test_and_set on ADSP21020

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About EmbeddedRelated.com

Social Networks

The Related Media Group