EmbeddedRelated.com
Forums

atomic test_and_set on ADSP21020

Started by alb July 12, 2012
On 7/13/2012 3:35 PM, dp wrote:
> On Jul 13, 4:09 pm, alb <alessandro.bas...@cern.ch> wrote: >> .... >>> InUseLoop: mov al, 0 ;0=In Use >>> xchg al, Flag >>> cmp al, 0 >>> je InUseLoop >> >> example #2: >> >>> InUseLoop: shr Flag, 1 ;In-use bit to carry, 0->Flag. >>> jnc InUseLoop ;Repeat if already in use. >> >> both xchg and shr are instructions which achieve test_and_set >> functionality with no need to have dedicated hardware. > > Having xchg (and perhaps shr, don't know what it is doing, I am not > x86 interested) _is_ the hardware it takes. It is indeed a read then > write opcode which cannot be interrupted, equivalent to bset/bclr > in this context.
shr is a 'shift right' operation. But then I do not understand your point: you *always* need some hardware for your software to run on (except for simulators). In the examples above a shift operation is used to achieve the test_and_set functionality.
>>
>> On ADSP21020 there are several (7) operations which are not
>> interruptible. Quoting the UM:
>>
>>> Certain ADSP-21020 operations that span more than one cycle hold off
>>> interrupts. If an interrupt occurs during one of these operations, it is
>>> synchronized and latched, but its processing is delayed
>
> Most if not all processors have non single cycle opcodes (e.g.
> division is non single cycle even on all RISC cores I have seen).
Most of the ADSP21020 instruction set executes in a single cycle.
> Unless you have a specific, implemented in hardware, read and write
> opcode so it cannot be interrupted (which obviously will take > 1
> cycle) the cycle count is simply not relevant.
The ALU of the target I'm using has a series of status flags which are modified upon execution of an operation. I believe that checking these flags may lead to a result similar to that of the shift operation posted.
> While writing this I recalled that you seemed to have a lengthy
> thread on FIFOs; if that is meant for implementing a FIFO, well,
> you just do not need the masking etc. For the FIFO read and write
> pointers, typically one of them is written only by the interrupt
> service routine and the other - only by the interruptible code.
This post is about mutual exclusion and how to achieve it without masking the interrupts. But even in the FIFO implementation, 'write_to_fifo()' should prevent 'read_from_fifo()' from running, since both of them change the state of the EMPTY/FULL flag. There is, indeed, the option of not using the EMPTY/FULL flag at all, always leaving one element of the FIFO empty; in that case there should be no race between the two functions.
On Jul 13, 4:12 pm, alb <alessandro.bas...@cern.ch> wrote:
> Indeed there's
> the possibility not to use the EMPTY/FULL flag at all, leaving always 1
> element of the fifo empty and in this case there should be no race
> between the two functions.
Now I understand why you wanted the flag. I always just use the 1 element empty strategy.
On Jul 13, 5:12 pm, alb <alessandro.bas...@cern.ch> wrote:
> ... But then I do not understand your point,
> you *always* need some hardware which your software runs with (except
> for simulators). In the examples above a shift operation is used to
> achieve the test_and_set functionality.
It always takes some hardware, yes. Sometimes you do have the hardware to do read/write in one indivisible opcode and sometimes you do not. Whether it is a shift or whatever operation is irrelevant; the key is that after the operation you know the state of at least one bit in a memory location prior to _and_ after the operation, with that operation never having been interrupted. Just check the opcodes of your processor for such an instruction, it will take less time than doing a single usenet post :-). (Not that I mind your posts, the group needed some revival so yours are welcome by me at least).
> ... Indeed there's
> the possibility not to use the EMPTY/FULL flag at all, leaving always 1
> element of the fifo empty and in this case there should be no race
> between the two functions.
This is the "normal" way of doing it for me, has been for decades really.

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On Fri, 13 Jul 2012 10:47:31 +0200, alb <alessandro.basili@cern.ch>
wrote:

>On 7/13/2012 6:08 AM, George Neuner wrote:
>
>> ... mutual exclusion CAN be done in software ... at least on a
>> uniprocessor with in-order execution. See Dekker's algorithm, or
>> Peterson's algorithm, or Lamport's algorithm, or any of a dozen
>> others.
>
>Thanks for the pointers, I indeed had a look at those, but it looks to
>me that if there's some 'reordering' going on the logic of these
>algorithms will fail, unless some memory barrier is introduced.
Yes. Hence the "in-order" execution requirement.
>> We don't really know that the OP needs atomicity ... only that he
>> needs mutual exclusion.
>
>I tend to believe now that I only need mutual exclusion and considering
>that my processes are not running in a multithreading environment I
>should be safe without an atomic instruction.
The software algorithms also will work under pre-emption, as long as a single memory access is not interruptible. That may sound like an odd caveat, because it's true of all modern chips ... even with VMM, an access fault cancels and reschedules the instruction, and the resulting interrupt is then taken between instructions. However, this wasn't necessarily true for early processors, some of which had interruptible load and store operations. The early software exclusion algorithms were designed with this fact in mind.
>> ... Even on a uniprocessor, without bus locking you need to
>> guarantee in-order execution. That may mean using barrier
>> instructions on an OoO processor.
>>
>> Parallel ops notwithstanding, the 21020 executes in-order anyway so
>> that is not an issue. However, I don't think it has any atomic
>> instructions ... I don't have a manual handy to double check, but
>> unlike the 21060, the 21020 wasn't intended for multiprocessor use.
>
>Quoting the manual:
>
>> 3.2.1 Sequential Instruction Flow
>> The program sequencer determines the next instruction address by
>> examining both the current instruction being executed and the current
>> state of the processor. If no conditions require otherwise, the ADSP-21020
>> executes instructions from program memory in sequential order by simply
>> incrementing the fetch address.
>
>But what makes things a little worrisome is that the g21k does attempt
>to reorder instructions.
Yes, but the compiler won't elide accesses to volatile variables - meaning it won't cache a volatile in a register, but will store the value back immediately after changing it and will reload the value from memory every time it is needed. Nor will it reorder volatile accesses with respect to other volatile accesses ... C (and C++) treats a volatile variable access as a program sequence point.

I never played much with g21k ... back when I was doing DSP work, the company had VisualDSP for the 060, and since it covered the 020 as well I just used it for everything.

With respect to g21k being a GCC derivative: some of the early GCC versions had a *lot* of bugs (particularly some of the 3.x) ... so you might want to check out the bug list for the version your compiler is based on (separate from the provided g21k bug list, which may not include all the GCC bugs).

George
On Fri, 13 Jul 2012 05:26:54 -0700 (PDT), dp <dp@tgi-sci.com> wrote:

>On Jul 13, 11:10 am, alb <alessandro.bas...@cern.ch> wrote:
(...)
>> > Notice that you do not need the complete atomicity of
>> > TAS and lwarx/stwcx. which are usable in multiprocessor
>> > systems as well; all you need is something equivalent to
>> > the 68k bset and bclr (bit test and set, bit test and
>> > clear), which do not lock the bus but can't be interrupted
>> > by an IRQ.
>>
>> indeed and the article I refer to is exactly showing how to do this
>> without a 'test and set' instruction.
>
>Well you certainly are entitled to your beliefs.
>Show us the code when you invent the software uninterruptible
>test and set.
Assuming you can guarantee the visible ordering of certain memory accesses, Dekker's Algorithm can be used to do mutual exclusion for a two processor system, without requiring any atomic update instruction. It's very old. There are more complicated algorithms that can handle more than two processors (Eisenberg and McGuire's, for example), again requiring nothing more than some control over memory ordering.

In either case, if you cannot force some memory ordering, you're not going to manage much with a shared memory multiprocessor system, so those algorithms, with system-specific extensions (if needed) to force memory ordering, should work on all useful SMP systems. Mutual exclusion is not quite the same thing as test-and-set, but once you have either, you can synthesize the other.

As for "non-interruptible updates", these algorithms can be adapted to provide that: basically the interrupt routine has to run its half of Dekker's, and then, if the other side has already acquired the lock, it has to continue running the original (interrupted) code until it releases the lock (and then resume the interrupt handler). This is usually practical only inside an OS kernel, where both the "other" routine that wanted to do the uninterruptible update and the interrupt handler run in the same address space (of course you're not generally going to be able to disable interrupts outside the kernel anyway). Ugly, but it can be made to work.

In general, all stuff to be avoided if you have proper hardware support for atomics and disabling interrupts.
On Jul 13, 7:29 pm, Robert Wessel <robertwess...@yahoo.com> wrote:
> ...
> Assuming you can guarantee the visible ordering of certain memory
> accesses, Dekker's Algorithm can be used to do mutual exclusion for a
> two processor system, without requiring any atomic update instruction.
> It's very old. There are more complicated algorithms that can handle
> more than two processors (Eisenberg and McGuire's, for example),
> again, requiring nothing more than some control over memory ordering.
It's been 13 years since I last considered something of that sort; in fact, IIRC, what I reinvented - and discarded as unusable - was what seems to be called "Dekker's algorithm". It only lowers the probability of a conflict.

The thing is, to read and update a flag takes two separate accesses. If one of the requestors interrupts the other between these two accesses and completes the transaction, it can then test the address and succeed a second time - after which it gets interrupted by the first one, which writes its ID (not knowing an entire cycle has gone by, etc.), and we have a conflict. It can be practical in some cases - perhaps the timing of some systems might guarantee success - but this is by no means obvious. The path I took back then was similar (for two processors), but I had a CPLD which allowed me to insert a timeout for one of the accesses so the above cited conflict could never occur.

Basically, no indivisible read/write means no way to exclude the mess. On small systems this is just done by masking the interrupt; on large ones - like POWER - by "check if successful and rerun if not", perhaps the simplest of all as it does not require locking the bus (but it does need hardware detection of a write to the shared location; just a write detection is also OK).

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On Fri, 13 Jul 2012 12:21:27 -0400, George Neuner
<gneuner2@comcast.net> wrote:

>>I tend to believe now that I only need mutual exclusion and considering
>>that my processes are not running in a multithreading environment I
>>should be safe without an atomic instruction.
>
>The software algorithms also will work under pre-emption as long as a
>single memory access is not interruptible. That sounds strange
>because it's true for all modern chips ... even with VMM an access
>fault cancels and reschedules the instruction, the resulting interrupt
>then is taken between instructions. However, this wasn't necessarily
>true for early processors, some of which had interruptible load and
>store operations. The early software exclusion algorithms were
>designed with this fact in mind.
Just out of curiosity, what was the dividing line between early and modern architectures?

Of the early architectures, VAX/VMS could in the worst case generate about a dozen page fault interrupts for a single instruction (an instruction crossing a page boundary, with 1-6 operands, each causing a page fault), and of these, any indirection could cause one or two (unaligned across a page boundary) access interrupts. The string and decimal instructions could cause multiple additional page fault interrupts and were definitely restartable, rather than interruptible, instructions - but of course that is irrelevant to the mutual exclusion discussion.
On 12-07-13 10:58 , alb wrote:
> On 7/12/2012 11:24 PM, Niklas Holsti wrote:
>> On 12-07-12 15:27 , alb wrote:
>>> Hi everyone,
>>>
>>> I'm trying to implement a locking mechanism without the need to disable
>>> the interrupts (and yes, I know I can do this disabling interrupts) and
>>> I thought about a 'test-and-set' function which will atomically set the
>>> lock which will then be released at the end of the critical section.
>>
>> Assuming that this locking mechanism is intended for the same 21020
>> system that has featured in your earlier threads, I'm wondering how this
>> locking mechanism can be useful.
>
> It is indeed intended for the same system, but in order to handle
> multiple processes (FSMs) trying to access the output resource there has
> been quite a bit of modification to the previous scheme.
>
> First of all we realized the output fifo wasn't really needed. Each
> process will have its own memory buffer and will pass the pointers
> together with size of memory to the interrupt service routine which will
> take care about writing to the output register (except for the first
> writing which will be done by the process).
That sounds like a sensible simplification, for your system. But it does mean that each process must keep track of whether its buffer has been transmitted, and not use the buffer for a new message until the preceding message has been transmitted.
> In order to 'serve' multiple processes with the need to transmit data we
> thought about implementing a 'request queue' (which can eventually be
> even prioritized maybe according to the link budget you suggested in an
> earlier thread).
>
> When the process put the request in the queue it also 'attempts' to
> start the transmission but if the resource is locked it will not succeed.
> At the beginning the lock is free and the first process which attempts
> to start the transmission will lock it, while all the others which
> follow will only manage to set their request in the queue.
>
> Given this logic, the interrupt service routine, once it completed
> transmitting the first packet, should also check the queue and lock the
> resource in case there is a pending packet.
Aha, so the "lock" is not actually protecting access to some independent data structure, and making processes wait for access, but is more of a flag that shows whether or not a buffer is being transmitted. This should be an easier problem to solve.
> The reason why I would have liked to avoid disabling/masking interrupt
> is to understand what I'm doing :-). I currently have a 'model' - in my
> mind - of how the processor works and according to the model I should be
> able to do what I want without the need to disable the interrupts. I
> simply do not like to do things without the need to do them and, as you
> pointed out, it is possible that a simple flag will be sufficient.
Yes, I think it can be solved without disabling interrupts (although I would still use interrupt disabling, as the simpler solution).

Below is a sketch of a solution that I believe should work, when the request queue is a non-prioritized FIFO queue. A prioritized queue would be rather more difficult, and probably interrupt disabling would be the only practical solution in that case.

Apologies for a long piece of code, but here it is:

typedef struct buffer_t {
   data_t data[MAX_BUFFER_SIZE];
   int length;
   int data_index;
   bool transmitted;
   struct buffer_t *next;
} buffer_t;
// The data in the buffer are data[0 .. length - 1].
// The "data_index" is the index of the next data[]
// to transmit.
// The "transmitted" flag is just what its name implies.
// The "next" field points to the next buffer in the
// request queue, or is NULL if this buffer is the
// last one.

volatile buffer_t * q_head = NULL;
// Not NULL when and only when a buffer is being
// transmitted, and then points to the first (i.e.
// currently transmitting) buffer in the request queue.
//
// If a non-interrupt process reads q_head as NULL,
// an output interrupt will NOT happen again, until
// transmission is again started by some process.
//
// If a non-interrupt process reads q_head as not NULL,
// an output interrupt WILL happen after that read of
// q_head (and perhaps immediately after the read).

volatile buffer_t * volatile q_tail = NULL;
// NULL when and only when no buffer has ever been
// transmitted or put in the queue, and otherwise
// points to the last buffer in the queue (when
// transmission is going on) or to the last
// buffer transmitted (otherwise).

// Assume that all buffers in the queue are statically
// allocated, or at least never deallocated, and that
// the "next" field is irrelevant after the buffer has
// been transmitted, until the buffer is again enqueued.
//
// Also assume that all buffers entered in the queue, or
// for which transmission is started, have "length" > 0.
void transmit (buffer_t * volatile buf)
// Starts transmitting the buf, or enqueues it for
// later transmission. Assume that this can be
// interrupted at any point by an interrupt from an
// ongoing transmission, but cannot be preempted by
// another non-interrupt process.
{
   // Assume that buf->data and buf->length are set
   // by caller and that buf->length > 0.

   // Prepare the buffer for enqueuing or transmitting:

   buf->transmitted = FALSE;
   buf->next = NULL;
   buf->data_index = 0;

   // Enqueue the buffer, if there is something
   // ahead of it in the queue:

   if (q_head != NULL) {
      // Transmission is going on (or was going on
      // when we read q_head).

      q_tail->next = buf;
      // If transmission was still going on when
      // the above assignment to q_tail was executed,
      // the interrupt handler will eventually transmit
      // this buf. Otherwise transmission stopped
      // before reaching this buf.

      q_tail = buf;
      // This update of q_tail is for the benefit of
      // the next caller of transmit().
   }

   // If transmission is now going on, either it is
   // still transmitting some earlier buffer in the
   // queue, or has already started on transmitting
   // this buf. In either case, we have nothing more
   // to do.

   // If transmission has now stopped, either it had
   // already stopped before we enqueued (or did not
   // enqueue) this buf, or it has had time to transmit
   // this buf, and then stop.

   // Start transmission, if it stopped before
   // transmitting this buf:

   if (q_head == NULL) {
      // No transmission going on now.

      if (!(buf->transmitted)) {
         // All of the earlier buffers in the queue
         // have been transmitted, but not this buf.
         // This buf is now the first in the queue.

         q_head = buf;
         q_tail = buf;
         buf->data_index = 1;

         // Start transmission:

         write_to_output_port (buf->data[0]);
      }
   }
}

void output_isr (void)
// Interrupt handler for the interrupt that reports
// "transmission complete" of the last data_t written
// to the output port.
// That data_t was q_head->data[q_head->data_index - 1],
// so the data_t at q_head->data_index is next to be
// transmitted, unless the whole q_head has been
// transmitted already.
// Assume that this can interrupt any non-interrupt
// process at any point, but then itself runs atomically
// (i.e. cannot be interrupted by anything else that
// uses the same variables).
{
   buffer_t * buf = q_head;
   // This initialization may trigger a warning about
   // qualifiers being discarded (because "q_head" is
   // qualified as volatile, but "buf" is not), but
   // this warning can be ignored. Using "buf" instead of
   // q_head is an optimization (perhaps premature)
   // to avoid unnecessary repeated reads of q_head
   // and its fields. In principle, buf should be
   // declared as "buffer_t * volatile buf", but this
   // is unnecessary since this function will not be
   // in-lined, and instruction reordering within this
   // function does not matter, because the function is
   // executed atomically with respect to the non-
   // interrupt processes.

   if (buf->data_index >= buf->length) {
      // This buf was completely transmitted.
      // Go on to the next buffer in the queue,
      // if there is one:

      buf = buf->next;
      q_head = buf;
   }

   if (buf != NULL) {
      // Some data are left to transmit from this
      // buf (= q_head).

      write_to_output_port (buf->data[buf->data_index]);
      buf->data_index ++;
   }
}

I have not tested the above code, only thought about it.

--
Niklas Holsti
Tidorum Ltd

niklas holsti tidorum fi
      .      @       .
Oops, an editing error dropped one important statement:

On 12-07-14 01:07 , Niklas Holsti wrote:
>
> void output_isr (void)
> // Interrupt handler for the interrupt that reports
> // "transmission complete" of the last data_t written
> // to the output port....
> {
>    buffer_t * buf = q_head;
>    // ...
>
>    if (buf->data_index >= buf->length) {
>       // This buf was completely transmitted.
Add:

         buf->transmitted = TRUE;
>       // Go on to the next buffer in the queue,
>       // if there is one:
>
>       buf = buf->next;
>       q_head = buf;
>    }
>
>    if (buf != NULL) {
>       // Some data are left to transmit from this
>       // buf (= q_head).
>
>       write_to_output_port (buf->data[buf->data_index]);
>       buf->data_index ++;
>    }
> }
--
Niklas Holsti
Tidorum Ltd

niklas holsti tidorum fi
      .      @       .
On Fri, 13 Jul 2012 10:24:53 -0700 (PDT), dp <dp@tgi-sci.com> wrote:

>On Jul 13, 7:29 pm, Robert Wessel <robertwess...@yahoo.com> wrote:
>> ...
>> Assuming you can guarantee the visible ordering of certain memory
>> accesses, Dekker's Algorithm can be used to do mutual exclusion for a
>> two processor system, without requiring any atomic update instruction.
>> It's very old. There are more complicated algorithms that can handle
>> more than two processors (Eisenberg and McGuire's, for example),
>> again, requiring nothing more than some control over memory ordering.
>
>It's been 13 years since I last considered something of that sort,
>in fact IIRC what I reinvented - and discarded as unusable - was what
>seems to be called "Dekker's algorithm".
>It only lowers the probability of a conflict.
>The thing is, to read and update a flag takes two separate accesses;
>if one of the requestors interrupts the other between these two
>accesses, completes the transaction, then it still tests the
>address and succeeds for a second time - after which gets interrupted
>by the first one which writes its ID (not knowing an entire cycle
>has been up etc.) we have a conflict.
>It can be practical in some cases, perhaps the timing of some systems
>might guarantee success, but this is by no means obvious.
>The path I took back then was similar (for two processors), but
>I had a CPLD which allowed me to insert a timeout for one of the
>accesses so the above cited conflict could never occur.
>
>Basically, no indivisible read/write means no means to exclude
>the mess. On small systems this is just being done by masking
>the interrupt, on large ones - like power - by "check if successful
>and rerun if not", perhaps the simplest of all as it does not
>require locking of the bus (but it does need hardware detection
>of a write to the shared location, just a write detection
>is also OK).
I'm not quite sure what you mean, but Dekker's is well understood (and has been for half a century), and does, in fact, provide mutual exclusion while avoiding deadlock and starvation, requiring neither atomic updates nor disabled interrupts. The Wikipedia article on it is OK, but doesn't really go into much depth. Dijkstra's paper describing Dekker's is linked there, and is thorough, but not really easy going.