
atomic test_and_set on ADSP21020

Started by alb July 12, 2012
On 7/14/2012 12:07 AM, Niklas Holsti wrote:
>> In order to 'serve' multiple processes with the need to transmit data we
>> thought about implementing a 'request queue' (which can eventually be
>> even prioritized maybe according to the link budget you suggested in an
>> earlier thread).
>>
>> When the process put the request in the queue it also 'attempts' to
>> start the transmission but if the resource is locked it will not succeed.
>> At the beginning the lock is free and the first process which attempts
>> to start the transmission will lock it, while all the others which
>> follow will only manage to set their request in the queue.
>>
>> Given this logic, the interrupt service routine, once it completed
>> transmitting the first packet, should also check the queue and lock the
>> resource in case there is a pending packet.
>
> Aha, so the "lock" is not actually protecting access to some independent
> data structure, and making processes wait for access, but is more of a
> flag that shows whether or not a buffer is being transmitted. This
> should be an easier problem to solve.
Indeed, the 'lock' is to prevent writing to the serial port while another transmission is going on, since that would result in lost characters.
> typedef struct buffer_t {
>    data_t data[MAX_BUFFER_SIZE];
>    int length;
>    int data_index;
>    bool transmitted;
>    struct buffer_t *next;
> } buffer_t;
> // The data in the buffer are data[0 .. length - 1].
> // The "data_index" is the index of the next data[]
> // to transmit.
> // The "transmitted" flag is just what its name implies.
> // The "next" field points to the next buffer in the
> // request queue, or is NULL if this buffer is the
> // last one.
>
> volatile buffer_t * q_head = NULL;
> // Not NULL when and only when a buffer is being
> // transmitted, and then points to the first (i.e.
> // currently transmitting) buffer in the request queue.
> //
> // If a non-interrupt process reads q_head as NULL,
> // an output interrupt will NOT happen again, until
> // transmission is again started by some process.
> //
> // If a non-interrupt process reads q_head as not NULL,
> // an output interrupt WILL happen after that read of
> // q_head (and perhaps immediately after the read).
>
> volatile buffer_t * volatile q_tail = NULL;
why do you need the pointer to be volatile? Isn't it sufficient to have the pointed-to structure be volatile? AFAIK, the volatile qualifier on a structure makes each member of the structure volatile, but in this case you have a recursive structure and I'm not sure I fully understand how the volatile qualifier is interpreted here.
> // NULL when and only when no buffer has ever been
> // transmitted or put in the queue, and otherwise
> // points to the last buffer in the queue (when
> // transmission is going on) or to the last
> // buffer transmitted (otherwise).
[...]
>
> void transmit (buffer_t * volatile buf)
> // Starts transmitting the buf, or enqueues it for
> // later transmission. Assume that this can be
> // interrupted at any point by an interrupt from an
> // ongoing transmission, but cannot be preempted by
> // another non-interrupt process.
> {
>    // Assume that buf->data and buf->length are set
>    // by caller and that buf->length > 0.
>
>    // Prepare the buffer for enqueing or transmitting:
>    buf->transmitted = FALSE;
>    buf->next = NULL;
>    buf->data_index = 0;
>
>    // Enqueue the buffer, if there is something
>    // ahead of it in the queue:
>    if (q_head != NULL) {
>       // Transmission is going on (or was going on
>       // when we read q_head).
>       q_tail->next = buf;
>       // If transmission was still going on when
>       // the above assignment to q_tail was executed,
>       // the interrupt handler will eventually transmit
>       // this buf. Otherwise transmission stopped
>       // before reaching this buf.
This is a good hint that I missed in my implementation. Assigning q_tail->next allows the ISR to send this one as soon as the previous is complete. Since I did not have a 'next' pointer in my structure, I had to call the transmit() function from the ISR in order to pull data from the request queue. This approach seems more elegant than mine.
>       q_tail = buf;
>       // This update of q_tail is for the benefit of
>       // the next caller of transmit().
In a two-element request queue it seems to me that q_head->next should always be equal to q_tail. Correct?
>    }
>
>    // If transmission is now going on, either it is
>    // still transmitting some earlier buffer in the
>    // queue, or has already started on transmitting
>    // this buf. In either case, we have nothing more
>    // to do.
>
>    // If transmission has now stopped, either it had
>    // already stopped before we enqueued (or did not
>    // enqueue) this buf, or it has had time to transmit
>    // this buf, and then stop.
>
>    // Start transmission, if it stopped before
>    // transmitting this buf:
>    if (q_head == NULL) {
it seems to me this is your 'locking' variable. You do not allow any other transmission if the head of the queue is != NULL.
>       // No transmission going on now.
>       if (!(buf->transmitted)) {
>          // All of the earlier buffers in the queue
>          // have been transmitted, but not this buf.
>          // This buf is now the first in the queue.
>          q_head = buf;
>          q_tail = buf;
>          buf->data_index = 1;
>          // Start transmission:
>          write_to_output_port (buf->data[0]);
>       }
>    }
> }
>
> void output_isr (void)
> // Interrupt handler for the interrupt that reports
> // "transmission complete" of the last data_t written
> // to the output port.
> // That data_t was q_head->data[q_head->data_index - 1],
> // so the data_t at q_head->data_index is next to be
> // transmitted, unless the whole q_head has been
> // transmitted already.
> // Assume that this can interrupt any non-interrupt
> // process at any point, but then itself runs atomically
> // (i.e. cannot be interrupted by anything else that
> // uses the same variables).
> {
>    buffer_t * buf = q_head;
>    // This initialization may trigger a warning about
>    // qualifiers being discarded (because "q_head" is
>    // qualified as volatile, but "buf" is not), but
>    // this warning can be ignored. Using "buf" instead of
>    // q_head is an optimization (perhaps premature)
>    // to avoid unnecessary repeated reads of q_head
>    // and its fields. In principle, buf should be
>    // declared as "buffer_t * volatile buf", but this
>    // is unnecessary since this function will not be
>    // in-lined, and instruction reordering within this
>    // function does not matter, because the function is
>    // executed atomically with respect to the non-
>    // interrupt processes.
>
>    if (buf->data_index >= buf->length) {
>       // This buf was completely transmitted.
[pasted from your next post]
>       buf->transmitted = TRUE;
>       // Go on to the next buffer in the queue,
>       // if there is one:
>       buf = buf->next;
>       q_head = buf;
>    }
>
>    if (buf != NULL) {
>       // Some data are left to transmit from this
>       // buf (= q_head).
>       write_to_output_port (buf->data[buf->data_index]);
>       buf->data_index ++;
>    }
> }
>
> I have not tested the above code, only thought about it.
I appreciate your effort to sketch out the implementation. It indeed gave me a valuable hint on how to write the transmit() function. I think I still have to pin down a nasty bug in moving pointers around, since the program always seems to crash at the same point.
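In case it helps to spot the problem, here is roughly the calling side I have in mind - just a sketch with placeholder names, where each process owns one buffer_t and must not refill it before the ISR marks it transmitted (the initializer is C99 style):

#include <string.h>

static buffer_t my_buf = { .transmitted = TRUE };   // free at start

void send_message (const data_t * msg, int len)
{
   // Wait until the previous contents have left the queue.
   // The read must be a volatile access, since the ISR
   // writes the flag:
   while (!((volatile buffer_t *) &my_buf)->transmitted)
      ;

   memcpy (my_buf.data, msg, len * sizeof (data_t));
   my_buf.length = len;
   transmit (&my_buf);
}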
On 12-07-16 11:19 , alb wrote:
> On 7/14/2012 12:07 AM, Niklas Holsti wrote:
> [...]
>
>> volatile buffer_t * volatile q_tail = NULL;
>
> why do you need the pointer to be volatile?
My mistake (a remnant from an early version of the code). Well spotted. Since q_tail is accessed only in the non-interrupt processes, which do not preempt each other, it does not have to be volatile.
> Isn't it sufficient to have the pointed-to structure be volatile?
Yes, I agree.
> AFAIK, the volatile qualifier on a structure makes each
> member of the structure volatile,
That is my understanding too. But note that there is a paper by Eric Eide and John Regehr saying that compilers often (well, relatively often) have bugs in their treatment of volatiles (google for "volatiles are miscompiled"). I would definitely check the compiler-generated assembly code, at least once, to verify that my compiler is doing the right thing.
> but in this case you have a recursive structure
> and I'm not sure I fully understand how the volatile
> qualifier is interpreted here.
I assume that by recursion you mean that the buffer_t field "next" points to another buffer_t. Since q_head is defined as a pointer to a volatile object, accesses to the "next" field of the buffer at q_head are treated as volatile accesses. Since the "next" field is not defined as a pointer *to* a volatile object, accesses to fields of the buffer that q_head->next points to are not treated as volatile accesses, when the code uses the form q_head->next->Xxx. As I understand it.
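A small example of the distinction, as I read the standard (and remember that compilers may get this wrong, per the miscompilation paper mentioned above):

volatile buffer_t * p = q_head;

int a = p->length;        // volatile access: p points to volatile
buffer_t * n = p->next;   // volatile read of the "next" field
int b = n->length;        // NOT a volatile access: "next" points
                          // to a plain (non-volatile) buffer_t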
> In a two-element request queue it seems to me that q_head->next should
> always be equal to q_tail. Correct?
Yes, if you observe the state when neither output_isr() nor transmit() is running.
>> // Start transmission, if it stopped before
>> // transmitting this buf:
>> if (q_head == NULL) {
>
> it seems to me this is your 'locking' variable. You do not allow any
> other transmission if the head of the queue is != NULL.
Yes, q_head is certainly the main controlling variable. q_head and the field q_tail->next are the only variables that are accessed concurrently by the interrupt handler and the non-interrupt processes.

The code assumes (and I should have remarked on it) that reads and writes (loads and stores) of these variables are atomic, in the sense that when a read and a write are concurrent, the read returns either the old or the new value of the variable, and not some half-updated value. This assumption holds for the ADSP21020, where memory reads and writes are wide enough for a whole pointer, but it would not hold, for example, on a processor with 8-bit reads/writes but 16-bit or wider pointers.

(In C, unfortunately, there is no "atomic" qualifier similar to the "volatile" qualifier that could ensure at compile time that a given variable is atomically read and written. There is only the standard type sig_atomic_t, which is guaranteed to have atomic reads/writes, but this type may not be large enough to hold a pointer; it is only required to be 8 bits or wider.)

Note that while my code has concurrent (racing) reads and writes of the same variables (q_head and q_tail->next), it avoids concurrent writes, which could lose (overwrite) data. The transmit() function only writes to q_head when it knows that transmission is not going on, and it only writes to the "transmitted" field of a buffer before the buffer is enqueued. Therefore, output_isr() cannot try to write to those variables concurrently.
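As an aside, the brand-new C11 standard does add an _Atomic qualifier and atomic operations. Where a C11 compiler is available (rare so far, and certainly not on this target), the controlling pointer could be declared roughly like this - a sketch, not tested:

#include <stdatomic.h>

_Atomic(buffer_t *) q_head = NULL;   // loads and stores are atomic

// In transmit(), for example:
buffer_t * head = atomic_load (&q_head);
if (head == NULL) {
   // ... prepare buf and start transmission ...
   atomic_store (&q_head, buf);
}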
>> void output_isr (void)
>> // Interrupt handler for the interrupt that reports
>> // "transmission complete" of the last data_t written
>> // to the output port. ...
>> {
>>    buffer_t * buf = q_head;
>>    // This initialization may trigger a warning about
>>    // qualifiers being discarded (because "q_head" is
>>    // qualified as volatile, but "buf" is not), but
>>    // this warning can be ignored.
A warning about ignoring this warning :-) : The C99 standard says, in paragraph 5 of the section "6.7.3 Type qualifiers", that "If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined", which might apply to this initialization of the "buf" variable. However, the same paragraph has a foot-note saying that this applies to objects that were "never actually defined as objects in the program (such as an object at a memory-mapped input/output address)", which is not the case here.
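If one prefers to silence the warning while documenting the intent, an explicit cast does it (same semantics, just quieter):

buffer_t * buf = (buffer_t *) q_head;   // cast away volatile; safe
                                        // here, per the foot-note
                                        // discussed above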
> I appreciate your effort to sketch out the implementation. It indeed
> gave me a valuable hint on how to write the transmit() function.
You are very welcome. But note how simple the transmit() code becomes if interrupts can be disabled or masked:

void transmit (buffer_t * volatile buf)
// Starts transmitting the buf, or enqueues it for
// later transmission.
{
   // Assume that buf->data and buf->length are set
   // by caller and that buf->length > 0.

   // Prepare the buffer for enqueing or transmitting:
   buf->transmitted = FALSE;
   buf->next = NULL;

   prevent_output_interrupts();

   if (q_head != NULL) {
      // Transmission is going on. We can just enqueue
      // the buffer, and it will be transmitted.
      buf->data_index = 0;
      q_tail->next = buf;
   }
   else {
      // No transmission going on now.
      // This buf becomes the first in the queue, and
      // we start its transmission.
      q_head = buf;
      buf->data_index = 1;
      // Start transmission:
      write_to_output_port (buf->data[0]);
   }

   q_tail = buf;

   allow_output_interrupts();
}

The interrupt handler code is the same in this case.
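The prevent/allow helpers are of course target- and toolchain-specific; on the ADSP21020 they would mask and unmask just the transmit interrupt. With hypothetical intrinsics (the real names depend on your compiler) they might be no more than:

static void prevent_output_interrupts (void)
{
   disable_interrupt (OUTPUT_IRQ);   // mask only the Tx interrupt,
}                                    // leaving the others enabled

static void allow_output_interrupts (void)
{
   enable_interrupt (OUTPUT_IRQ);    // unmask it again
}

Masking only this one interrupt leaves the latency of all the other interrupts unaffected.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
      .      @       .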
On Fri, 13 Jul 2012 22:00:38 +0300, upsidedown@downunder.com wrote:

>On Fri, 13 Jul 2012 12:21:27 -0400, George Neuner
><gneuner2@comcast.net> wrote:
>
>>>I tend to believe now that I only need mutual exclusion and considering
>>>that my processes are not running in a multithreading environment I
>>>should be safe without an atomic instruction.
>>
>>The software algorithms also will work under pre-emption as long as a
>>single memory access is not interruptible. That sounds strange
>>because it's true for all modern chips ... even with VMM an access
>>fault cancels and reschedules the instruction, the resulting interrupt
>>then is taken between instructions. However, this wasn't necessarily
>>true for early processors, some of which had interruptible load and
>>store operations. The early software exclusion algorithms were
>>designed with this fact in mind.
>
>Just out of curiosity, what was the dividing line between early and
>modern architectures ? Of the early architectures, VAX/VMS could in
>the worst case generate about a dozen page fault interrupts for a
>single instruction (instruction crossing page boundary, with 1-6
>operands and each causing a page fault) and of these any indirection
>causing one or two (unaligned across page boundary) access interrupts.
You can trace the roots of load/store back to the CDC 6600 (1964), but IMO "modern" really began in the 80's with the interlocked load/store RISC designs: MIPS2, PA-RISC, etc. All modern chips effectively are a load/store architecture internally, even if externally their programmer-visible ISA presents complex addressing modes. But pinning down when the mass market adopted load/store is difficult, because it happened in stages.

The i486 and the 68040 could be called "mostly load/store", as each had only a couple of (internally) complex modes. The Pentium Pro was a true load/store design, but probably wasn't popular enough to count ... so call the Pentium II the first one for Intel. The 88K was a market failure, so call POWER the first for Motorola.

I think a case could be made that ARM really was the first widely used load/store design, but until VMM became common in designs, the issue of interruptible vs restartable instructions didn't much matter.
>The string and decimal instructions could cause multiple additional
>page fault interrupts and were definitely restartable, rather than
>interruptible instructions, but of course irrelevant to the mutual
>exclusion discussion.
String instructions are a special case. They really are a loop of microinstructions which can be interrupted between iterations. [I know not all chips technically execute microcode ... what I'm referring to as "microinstructions" may simply be states in the execution sequencer.]

This is where terminology becomes troublesome 8( ... what exactly does it mean to be "restartable" vs "interruptible"? Everyone has a slightly different understanding based on which vendors' documentation they have been reading.

FWIW: I consider that an instruction is "interruptible" if a programmer-visible context switch can occur during its execution. Excepting some special cases like string operations, most ISAs specify that an interrupt logically can occur only between ISA-level instructions.

"Restartable" is a different matter. I consider that an instruction is restartable if it can be halted and resumed in-progress, or can be aborted and reexecuted without noticeable side effects. At the ISA level, instructions may be either resumable or reexecutable. At the microinstruction level, instructions almost always are aborted and reexecuted.

On almost all chips today, all but the very simplest ISA-level instructions decompose into a sequence of microinstructions. When a microinstruction sequence representing a memory-accessing instruction takes a fault, the entire sequence is rolled back to the beginning and the VMM interrupt is generated. Once the fault is cleared and the interrupt returns, the microinstruction sequence is reexecuted. Whether this is a full reexecution from the beginning or a resume from the previous point of failure is implementation dependent.

WRT software mutual exclusion algorithms, realizing that any memory access may be arbitrarily delayed is where the difference between in-order and out-of-order execution becomes important. These algorithms all depend on specific ordering of memory accesses to work properly. As long as those ordering requirements are respected, mutual exclusion will be guaranteed even in the presence of arbitrary thread delays. Special care has to be taken on OoO chips to guarantee that the memory accesses do occur in the required order.

Lamport's algorithm is unique because it purports to work with multiple CPUs; however, it still requires in-order accesses from all participating CPUs.
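For concreteness, here is a minimal sketch of Lamport's bakery algorithm for N tasks - the names are mine, and it assumes that int reads and writes are atomic and occur in the stated order:

#define N_TASKS 4                    /* number of competing tasks */

volatile int choosing[N_TASKS];      /* task is picking a ticket  */
volatile int ticket[N_TASKS];        /* 0 = not interested        */

void bakery_lock (int i)
{
   int j, max = 0;

   choosing[i] = 1;
   for (j = 0; j < N_TASKS; j++)     /* take a ticket one higher  */
      if (ticket[j] > max)           /* than any ticket seen      */
         max = ticket[j];
   ticket[i] = max + 1;
   choosing[i] = 0;

   for (j = 0; j < N_TASKS; j++) {
      while (choosing[j])
         ;                           /* let task j finish choosing */
      while (ticket[j] != 0 &&
             (ticket[j] < ticket[i] ||
              (ticket[j] == ticket[i] && j < i)))
         ;                           /* task j goes first: lower   */
   }                                 /* ticket, or same ticket and */
}                                    /* lower task id              */

void bakery_unlock (int i)
{
   ticket[i] = 0;
}

Hope this ... doesn't confuse the issue more 8-)

George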
On Fri, 13 Jul 2012 10:24:53 -0700 (PDT), dp <dp@tgi-sci.com> wrote:

>On Jul 13, 7:29 pm, Robert Wessel <robertwess...@yahoo.com> wrote:
>> ...
>> Assuming you can guarantee the visible ordering of certain memory
>> accesses, Dekker's Algorithm can be used to do mutual exclusion for a
>> two processor system, without requiring any atomic update instruction.
>> It's very old. There are more complicated algorithms that can handle
>> more than two processors (Eisenberg and McGuire's, for example),
>> again, requiring nothing more than some control over memory ordering.
>
>It's been 13 years since I last considered something of that sort,
>in fact IIRC what I reinvented - and discarded as unusable - was what
>seems to be called "Dekker's algorithm".
>It only lowers the probability of a conflict.
>The thing is, to read and update a flag takes two separate accesses;
>if one of the requestors interrupts the other between these two
>accesses, completes the transaction, then it still tests the
>address and succeeds for a second time - after which gets interrupted
>by the first one which writes its ID (not knowing an entire cycle
>has been up etc.) we have a conflict.
No. If you examine them carefully, each of the algorithms relies on 2 separate flags being set and tested in a particular order. The use of 2 flags guarantees that the interruption of any operation does not compromise the integrity of the algorithm.
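For reference, here is a minimal sketch of Dekker's algorithm for two tasks - the names are mine, and it assumes atomic loads and stores of the flags, performed in program order (exactly the ordering requirement under discussion):

volatile int wants[2] = { 0, 0 };    /* one flag per task */
volatile int turn = 0;               /* tie breaker       */

void dekker_lock (int self)
{
   int other = 1 - self;

   wants[self] = 1;
   while (wants[other]) {            /* contention?              */
      if (turn != self) {
         wants[self] = 0;            /* back off ...             */
         while (turn != self)
            ;                        /* ... until it is our turn */
         wants[self] = 1;            /* and try again            */
      }
   }
}

void dekker_unlock (int self)
{
   turn = 1 - self;                  /* give the other task      */
   wants[self] = 0;                  /* priority next time       */
}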
>It can be practical in some cases, perhaps the timing of some systems
>might guarantee success, but this is by no means obvious.
I agree that their operation is not particularly obvious, but these algorithms have been very well proven. They are standard material in introductory CS operating-systems courses, and now that many (most?) programmers are writing threaded code, IMO every programmer should be aware of them and how they work.
>Basically, no indivisible read/write means no means to exclude
>the mess.
No, it only means the way is more complicated.
>Dimiter
George
On Mon, 16 Jul 2012 17:07:30 -0400, George Neuner
<gneuner2@comcast.net> wrote:

>On Fri, 13 Jul 2012 22:00:38 +0300, upsidedown@downunder.com wrote:
>>Just out of curiosity, what was the dividing line between early and
>>modern architectures ? Of the early architectures, VAX/VMS could in
>>the worst case generate about a dozen page fault interrupts for a
>>single instruction (instruction crossing page boundary, with 1-6
>>operands and each causing a page fault) and of these any indirection
>>causing one or two (unaligned across page boundary) access interrupts.
>
>You can trace the roots of load/store back to the CDC 6600 (1964), but
>IMO "modern" really began in the 80's with the interlocked load/store
>RISC designs: MIPS2, PA-RISC, etc. All modern chips effectively are a
>load/store architecture internally even if externally their programmer
>visible ISA presents complex addressing modes. But pinning down when
>the mass market adopted load/store is difficult because it happened in
>stages.
<clip>

Thank you very much for this description; I haven't followed the detailed HW development since the 1980's.
>WRT software mutual exclusion algorithms, realizing that any memory
>access may be arbitrarily delayed is where the difference between
>in-order and out-of-order execution becomes important. These
>algorithms all depend on specific ordering of memory accesses to work
>properly. As long as those ordering requirements are respected,
>mutual exclusion will be guaranteed even in the presence of arbitrary
>thread delays. Special care has to be taken on OoO chips to guarantee
>that the memory accesses do occur in the required order.
In the x86 family, the CPUID instruction seems to be quite a popular method of draining the pipeline.
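With GCC-style inline assembly (syntax assumed; x86 only), such a serializing barrier might look like this sketch:

static inline void serialize (void)
{
   unsigned eax = 0, ebx, ecx, edx;

   /* CPUID is a serializing instruction: nothing is reordered
      across it, and the "memory" clobber also stops the compiler
      from moving memory accesses past it. */
   __asm__ volatile ("cpuid"
                     : "+a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                     :
                     : "memory");
}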
On Jul 17, 1:45 am, George Neuner <gneun...@comcast.net> wrote:
> On Fri, 13 Jul 2012 10:24:53 -0700 (PDT), dp <d...@tgi-sci.com> wrote:
> ...
> >It's been 13 years since I last considered something of that sort,
> >in fact IIRC what I reinvented - and discarded as unusable - was what
> >seems to be called "Dekker's algorithm".
> >It only lowers the probability of a conflict.
> >The thing is, to read and update a flag takes two separate accesses;
> >if one of the requestors interrupts the other between these two
> >accesses, completes the transaction, then it still tests the
> >address and succeeds for a second time - after which gets interrupted
> >by the first one which writes its ID (not knowing an entire cycle
> >has been up etc.) we have a conflict.
>
> No. If you examine them carefully, each of the algorithms relies on 2
> separate flags being set and tested in a particular order. The use of
> 2 flags guarantees that the interruption of any operation does not
> compromise the integrity of the algorithm.
Indeed, I was wrong. Taking advantage of "in order" memory access (which can be enforced on the "out of order" machines I know), it can be made to work. I did not really remember what exactly I considered 13 years ago... :-). I suppose the fact that expanding this to multiple tasks (say, 64) would require each task to examine all 63 other flags etc. made it less appealing to me, or something. Or the fact that I just had an easier option. But it can be made to work indeed - as far as I can see at a glance, there is no fundamental flaw in it.

Dimiter
On 17/07/12 10:39, dp wrote:
> On Jul 17, 1:45 am, George Neuner<gneun...@comcast.net> wrote:
>> On Fri, 13 Jul 2012 10:24:53 -0700 (PDT), dp<d...@tgi-sci.com> wrote:
>> ...
>>> It's been 13 years since I last considered something of that sort,
>>> in fact IIRC what I reinvented - and discarded as unusable - was what
>>> seems to be called "Dekker's algorithm".
>>> [...]
>>
>> No. If you examine them carefully, each of the algorithms relies on 2
>> separate flags being set and tested in a particular order. The use of
>> 2 flags guarantees that the interruption of any operation does not
>> compromise the integrity of the algorithm.
>
> Indeed, I was wrong. Taking advantage of "in order" memory access
> (which can be enforced on the "out of order" machines I know), it
> can be made to work.
> I did not really remember what exactly I considered 13 years
> ago... :-).
> I suppose the fact that expanding this to multiple tasks
> (say, 64) would require each task to examine all 63 other flags
> etc. made it less appealing to me, or something. Or the fact that I
> just had an easier option.
> But it can be made to work indeed - as far as I can see at
> a glance, there is no fundamental flaw in it.
>
> Dimiter
There are other algorithms that are more efficient for multiple tasks - as you note, Dekker's is not very scalable. But in most cases, there should only be a few tasks competing for any given mutex/critical section/semaphore - you don't really want 64 different tasks all fighting for the same shared resource!

Note also that you don't actually need proper in-order memory access here (as enforced by things like the eieio or msync instructions on a PPC, or by memory area policies) - you only need the /appearance/ of in-order memory access. So if you are talking about multiple tasks on the same processor, you don't need anything beyond "volatile" accesses - even if the reads and writes are cached or buffered in some way, the processor will still view them in the same order.
On 16/07/12 12:06, Niklas Holsti wrote:
> On 12-07-16 11:19 , alb wrote:
>> AFAIK, the volatile qualifier on a structure makes each
>> member of the structure volatile,
>
> That is my understanding too. But note that there is a paper by Eric
> Eide and John Regehr saying that compilers often (well, relatively
> often) have bugs in their treatment of volatiles (google for "volatiles
> are miscompiled"). I would definitely check the compiler-generated
> assembly code, at least once, to verify that my compiler is doing the
> right thing.
(I haven't studied the code here - this is just general comments.)

The paper looks mainly at complex uses of "volatile", especially in connection with "sequence points" and with poorly specified areas of C. An example would be the expression "(v + v)", where "v" is declared volatile - should the compiler read "v" twice and add the results, can it simply read "v" once and double the result, or perhaps read "v" twice and double one of the results? The C specifications are very unclear on such matters, and compiler writers differ in their interpretations - as do programmers writing the code. I am not convinced that E&R are correct in all /their/ interpretations either. All we can be sure of is that with complex expressions using volatiles, we can't all be right - so such code should be avoided.

When using "volatile", the key to correct coding is to realise there is no such thing as volatile data, or volatile variables. There are only volatile /accesses/. A "volatile variable" is nothing more than an ordinary variable with a note to the compiler that all normal accesses should be "volatile". Think about when you want your /reads/ and /writes/ to be volatile, and you will get it right. I often don't even bother declaring the variables to be volatile - I use macros to enforce volatile reads or writes as needed, as sketched below.

And keep all volatile-related code simple - then the compiler will get it right too. Rather than writing "v = (v + v)", write "int v1 = v; int v2 = v; v = v1 + v2" to make explicit exactly the functionality you want.
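Such macros can be as simple as the following sketch (using the GCC __typeof__ extension; plain standard C would need one macro per type):

/* Force exactly one volatile read or write of an ordinary object. */
#define V_READ(obj)        (*(volatile __typeof__(obj) *) &(obj))
#define V_WRITE(obj, val)  (*(volatile __typeof__(obj) *) &(obj) = (val))

int shared;   /* an ordinary, non-volatile variable */

void example (void)
{
   int x = V_READ (shared);      /* exactly one read  */
   V_WRITE (shared, x + 1);      /* exactly one write */
}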
On Wed, 18 Jul 2012 11:05:46 +0200, David Brown
<david.brown@removethis.hesbynett.no> wrote:

>On 17/07/12 10:39, dp wrote:
>
>> I suppose the fact that expanding [software mutex] for multiple
>> tasks (say, 64) would require each task to examine all 63 other
>> flags etc. made it less appealing to me, or something. Or the fact
>> that I just had an easier option.
>> But it can be made to work indeed - as far as I can see at
>> a glance, there is no fundamental flaw in it.
>
>There are other algorithms that are more efficient for multiple tasks -
>as you note, Dekker's is not very scalable. But in most cases, there
>should only be a few tasks competing for any given mutex/critical
>section/semaphore - you don't really want 64 different tasks all
>fighting for the same shared resource!
In fact, none of the software only algorithms scale particularly well to large numbers of tasks.
>Note also that you don't actually need proper in-order memory access
>here (as enforced by things like the eieio or msync instructions on a
>PPC, or by memory area policies) - you only need the /appearance/ of
>in-order memory access. So if you are talking about multiple tasks on
>the same processor, you don't need anything beyond "volatile" accesses -
>even if the reads and writes are cached or buffered in some way, the
>processor will still view them in the same order.
Yes, with a caveat which may make no difference to anyone here but is mentioned for completeness.

For SMT ("hyperthreaded") chips, each physical core behaves as 2 or more logical cores executing separate instruction streams. But unlike software multitasking, in SMT the logical cores really do execute *simultaneously* (assuming sufficient chip resources are available). Exactly when and in what order writes become visible to all logical cores is implementation dependent. Code running on such a chip may have to obey the same restrictions as for a true multiprocessor even when there is only one physical core present.
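Where portability matters, the new C11 fences and atomics (assuming a C11 toolchain, which is still uncommon) make the required ordering explicit; "shared_data" and "ready" below are made-up names for illustration:

#include <stdatomic.h>

int shared_data;
atomic_int ready = ATOMIC_VAR_INIT(0);

void producer (int value)
{
   shared_data = value;
   /* release: the write above becomes visible to any core
      before the flag does */
   atomic_store_explicit (&ready, 1, memory_order_release);
}

int consumer (void)
{
   while (!atomic_load_explicit (&ready, memory_order_acquire))
      ;   /* spin until published */
   return shared_data;   /* guaranteed to see producer's write */
}

George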
On 19/07/12 22:12, George Neuner wrote:
> On Wed, 18 Jul 2012 11:05:46 +0200, David Brown
> <david.brown@removethis.hesbynett.no> wrote:
>
> [...]
>
>> Note also that you don't actually need proper in-order memory access
>> here (as enforced by things like the eieio or msync instructions on a
>> PPC, or by memory area policies) - you only need the /appearance/ of
>> in-order memory access. So if you are talking about multiple tasks on
>> the same processor, you don't need anything beyond "volatile" accesses -
>> even if the reads and writes are cached or buffered in some way, the
>> processor will still view them in the same order.
>
> Yes, with a caveat which may make no difference to anyone here but is
> mentioned for completeness.
>
> For SMT ("hyperthreaded") chips, each physical core behaves as 2 or
> more logical cores executing separate instruction streams. But unlike
> software multitasking, in SMT the logical cores really do execute
> *simultaneously* (assuming sufficient chip resources are available).
> Exactly when and in what order writes become visible to all logical
> cores is implementation dependent. Code running on such a chip may
> have to obey the same restrictions as for a true multiprocessor even
> when there is only one physical core present.
>
> George
Indeed - SMT has to be treated like true SMP as far as these things are concerned.