Task priorities in non strictly real-time systems

Reply by pozz ●January 4, 2020

On 03/01/2020 15:19, David Brown wrote:
> With pre-emptive scheduling, you will have to go
> through your existing code and make very sure that you have locks or
> synchronisation in place for any shared resources or data.

As I already wrote many times, I don't have experience with RTOS and 
task sync mechanisms such as semaphores, locks, mutexes, message queues 
and so on.
So I'm not able to understand when synchronisation is really needed.

Could you point me to some good, simple material to study (online or a book)?

For example, I often have a serial channel on which some data is 
received. A frame parser decodes the "wire data" into variables, accessed 
by other tasks.

while(1) {
   serial_task();  // frame receiver/parser
   main_task();    // uses variables touched by frame parser
}

Supposing all the variables are of type int (i.e., they are changed 
atomically in serial_task()), do I need to protect them with locks, 
since they are also used by main_task()?

I think a lock isn't needed, unless main_task() needs a coherent set of 
values for all the variables (either all new values or all old 
values).

Reply by Clifford Heath ●January 4, 2020

On 5/1/20 11:58 am, pozz wrote:
> On 03/01/2020 15:19, David Brown wrote:
>> With pre-emptive scheduling, you will have to go
>> through your existing code and make very sure that you have locks or
>> synchronisation in place for any shared resources or data.
> 
> As I already wrote many times, I don't have experience with RTOS and 
> task sync mechanisms such as semaphores, locks, mutexes, message queues 
> and so on.
> So I'm not able to understand when synchronisation is really needed.
> Could you point me to some good, simple material to study (online or a book)?

I wish I could, but it is actually a frightfully difficult subject.
Basically it's the same as thread-safe programming.
Only about 1% of programmers think they can do it.
Of those, only about 1% actually can.

It's the 0.99% that you have to worry about. At least some of them work for 
Toyota. Don't be one of them!

However, this difficulty is precisely why Rust was created. Although I 
haven't yet done a project in Rust, I've done enough multi-threaded work 
in C++ to know that the ideas in Rust are a massive leap forwards, and 
anyone doing this kind of work (especially professionally) owes it to 
their users to learn it.

> Supposing all the variables are of type int (i.e., they are changed 
> atomically in serial_task()), do I need to protect them with locks, 
> since they are also used by main_task()?

If "int" is your CPU's word size, you are using word alignment, and you 
don't have multiple CPUs with separate caches accessing the same RAM, 
you're probably ok for individual variables. However you will come 
unstuck if you expect assignments and reads to be performed in the same 
order you wrote them. A modern compiler will freely re-order things in 
extremely ambitious and unexpected ways, in order to keep the pipeline 
flowing.
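To make the ordering point concrete, here is a minimal sketch assuming a C11 compiler that provides <stdatomic.h>, with exactly one producer and one consumer; the names speed, setpoint, publish_frame() and consume_frame() are hypothetical. The two plain ints carry the data and a single atomic flag carries the ordering: the release store and the acquire load stop the compiler and the CPU from reordering the data accesses across the hand-over.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical payload decoded by serial_task() and used by main_task(). */
static int speed;
static int setpoint;

/* true = a complete frame is waiting for the consumer. */
static atomic_bool frame_ready = false;

/* Producer: refuse to overwrite a frame the consumer has not taken yet,
   write the payload with plain stores, then publish it with a release
   store. The payload stores cannot be moved below the release store. */
bool publish_frame(int new_speed, int new_setpoint)
{
    if (atomic_load_explicit(&frame_ready, memory_order_acquire))
        return false;                 /* previous frame still pending */
    speed = new_speed;
    setpoint = new_setpoint;
    atomic_store_explicit(&frame_ready, true, memory_order_release);
    return true;
}

/* Consumer: an acquire load that sees true guarantees the payload written
   before the matching release store is visible. The flag is cleared only
   after the copy, so the producer cannot overwrite data being read. */
bool consume_frame(int *out_speed, int *out_setpoint)
{
    if (!atomic_load_explicit(&frame_ready, memory_order_acquire))
        return false;                 /* nothing new */
    *out_speed = speed;
    *out_setpoint = setpoint;
    atomic_store_explicit(&frame_ready, false, memory_order_release);
    return true;
}
```

Because the flag is cleared only after the copy, the consumer always gets both values from the same frame; if frames arrive faster than they are consumed, publish_frame() reports that the previous one is still pending rather than tearing it.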

I cannot emphasise this enough. The compiler will do what it can to make 
your program do what it thinks you have asked for - which will NOT be 
the same as what you think you have asked for.

You need to understand about basic mutex operations, preferably also 
semaphores, and beyond that to read and write barriers (if you want to 
write lock-free code). It's a big subject.

Clifford Heath.

Reply by Paul Rubin ●January 5, 2020

pozz <pozzugno@gmail.com> writes:
> As I already wrote many times, I don't have experience with RTOS and
> task sync mechanisms such as semaphores, locks, mutexes, message queues
> and so on.
> So I'm not able to understand when synchronisation is really needed.
>
> Could you point me to some good, simple material to study (online or a book)?

I have found it simplest to have tasks communicate by message passing,
the so-called "CSP model" (communicating sequential processes), rather
than fooling around with explicit locks.  With locks you have to worry
about lock inversion and all kinds of other madness, and your main hope
of getting it right is formal methods, like Lamport used for the Paxos
algorithm.  Message passing incurs some cpu overhead because of the
interprocess communication and context switches, but it gets rid of a
lot of ways things go wrong.

If your RTOS supports message passing (look for "mailboxes" in the RTOS
docs) then I'd say use them.
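Mailbox APIs differ from one RTOS to the next, so the following is not any particular RTOS's API but a hypothetical, minimal stand-in: a single-producer, single-consumer queue of fixed-size messages built on C11 atomics. Each message is copied in and out whole, so the receiver never sees a half-updated struct. Unlike a real RTOS mailbox it never blocks; the receiving task has to poll it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message: one decoded serial frame. */
typedef struct {
    int speed;
    int setpoint;
} frame_msg_t;

#define MAILBOX_SLOTS 8u   /* power of two, so free-running indices wrap cleanly */

typedef struct {
    frame_msg_t slot[MAILBOX_SLOTS];
    atomic_size_t head;    /* written only by the producer */
    atomic_size_t tail;    /* written only by the consumer */
} mailbox_t;

void mailbox_init(mailbox_t *mb)
{
    atomic_init(&mb->head, 0);
    atomic_init(&mb->tail, 0);
}

/* Producer: copy the message into a free slot, then publish it by advancing
   head with a release store. Returns false if the mailbox is full. */
bool mailbox_post(mailbox_t *mb, const frame_msg_t *msg)
{
    size_t head = atomic_load_explicit(&mb->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&mb->tail, memory_order_acquire);

    if (head - tail == MAILBOX_SLOTS)
        return false;
    mb->slot[head % MAILBOX_SLOTS] = *msg;
    atomic_store_explicit(&mb->head, head + 1, memory_order_release);
    return true;
}

/* Consumer: copy the oldest message out, then free its slot by advancing
   tail. Returns false if the mailbox is empty. */
bool mailbox_fetch(mailbox_t *mb, frame_msg_t *out)
{
    size_t tail = atomic_load_explicit(&mb->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&mb->head, memory_order_acquire);

    if (head == tail)
        return false;
    *out = mb->slot[tail % MAILBOX_SLOTS];
    atomic_store_explicit(&mb->tail, tail + 1, memory_order_release);
    return true;
}
```

No lock is needed because each index has exactly one writer; the cost is that this only works for one sending task and one receiving task.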

The language most associated with CSP style is Erlang, which doesn't
really fit on small embedded devices, but Erlang materials might still
be a good place to learn about the style.  Erlang inventor Joe
Armstrong's book might be a good place to start:

   http://erlang.org/download/erlang-book-part1.pdf

At the much lower end, you could check out Brad Rodriguez's articles
about Forth multitaskers:

https://www.bradrodriguez.com/papers/mtasking.html

and related ones at https://www.bradrodriguez.com/papers/ .

Reply by upsidedown@downunder.com ●January 5, 2020

On Sun, 5 Jan 2020 14:26:12 +1100, Clifford Heath <no.spam@please.net>
wrote:

>On 5/1/20 11:58 am, pozz wrote:
>> On 03/01/2020 15:19, David Brown wrote:
>>> With pre-emptive scheduling, you will have to go
>>> through your existing code and make very sure that you have locks or
>>> synchronisation in place for any shared resources or data.
>> 
>> As I already wrote many times, I don't have experience with RTOS and 
>> task sync mechanisms such as semaphores, locks, mutexes, message queues 
>> and so on.
>> So I'm not able to understand when synchronisation is really needed.
>> Could you point me to some good, simple material to study (online or a book)?
>
>I wish I could, but it is actually a frightfully difficult subject.
>Basically it's the same as thread-safe programming.
>Only about 1% of programmers think they can do it.
>Of those, only about 1% actually can.
>
>It's the 0.99% that you have to worry about. At least some of them work for 
>Toyota. Don't be one of them!
>
>However, this difficulty is precisely why Rust was created. Although I 
>haven't yet done a project in Rust, I've done enough multi-threaded work 
>in C++ to know that the ideas in Rust are a massive leap forwards, and 
>anyone doing this kind of work (especially professionally) owes it to 
>their users to learn it.
>
>> Supposing all the variables are of type int (i.e., they are changed 
>> atomically in serial_task()), do I need to protect them with locks, 
>> since they are also used by main_task()?

You may need some double buffering in one form or another.

Assume you have a receive byte buffer that can hold a full serial
message, and a structure of integers that will receive the values
decoded from the message.

When the serial task notices the end of a message, it immediately
decodes the values into the integers in the struct. After this, the
serial byte buffer is ready to start receiving the next message. The
serial task can then inform the main task that new data is in the
integer structure, and the main task can copy it to local variables. 

Alternatively, if a serial byte buffer is not used and the received
bytes are decoded into the integer fields on the fly, a copy of the
struct can be handed over instead: e.g. after the last integer has been
decoded, put the complete struct into a mailbox, if the RTOS provides
mailbox support.

In both cases the main task has a full message transfer time to
process a message, before it has to process the next serial message.
If the main task is incapable of processing the messages in time, then
the program is faulty at least in the hard real time sense.
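One possible form of this double buffering, as a sketch only: the frame layout (two big-endian 16-bit values), the function names and the use of C11 atomics are assumptions. The serial side decodes into whichever of two structs the main task is not currently pointed at, then flips an index. It relies on the same timing assumption as above, namely that the main task finishes its copy within one message transfer time.

```c
#include <stdatomic.h>
#include <stdint.h>

#define FRAME_LEN 4u   /* hypothetical: two big-endian 16-bit values */

typedef struct {
    int speed;
    int setpoint;
} frame_values_t;

static frame_values_t values[2];          /* the two buffers */
static atomic_uint published = 0;         /* which buffer main should read */
static atomic_uint frames_done = 0;       /* count of complete frames */

/* Serial side, called when rx[] holds a complete message. Decodes into the
   buffer main is not pointed at, then publishes it by flipping the index. */
void serial_frame_complete(const uint8_t rx[FRAME_LEN])
{
    unsigned spare = 1u - atomic_load_explicit(&published, memory_order_relaxed);

    values[spare].speed    = (rx[0] << 8) | rx[1];
    values[spare].setpoint = (rx[2] << 8) | rx[3];
    atomic_store_explicit(&published, spare, memory_order_release);
    atomic_fetch_add_explicit(&frames_done, 1u, memory_order_release);
}

/* Main side: copy the latest complete frame into local storage. Returns the
   frame count, so the caller can tell whether anything new arrived. */
unsigned main_copy_latest(frame_values_t *local)
{
    unsigned count = atomic_load_explicit(&frames_done, memory_order_acquire);
    unsigned idx   = atomic_load_explicit(&published, memory_order_acquire);

    *local = values[idx];
    return count;
}
```

If the main task can be delayed by more than one message time, the writer may start refilling the buffer being copied; in that case a queue or mailbox with more slots is the safer choice.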

>
>If "int" is your CPU's word size, you are using word alignment, and you 
>don't have multiple CPUs with separate caches accessing the same RAM, 
>you're probably ok for individual variables. However you will come 
>unstuck if you expect assignments and reads to be performed in the same 
>order you wrote them. A modern compiler will freely re-order things in 
>extremely ambitious and unexpected ways, in order to keep the pipeline 
>flowing.

Using volatile declarations and turning off optimization will help. Better
yet, use small assembler routines to have full control of the actual
memory accesses.

>
>I cannot emphasise this enough. The compiler will do what it can to make 
>your program do what it thinks you have asked for - which will NOT be 
>the same as what you think you have asked for.
>
>You need to understand about basic mutex operations, preferably also 
>semaphores, and beyond that to read and write barriers (if you want to 
>write lock-free code). It's a big subject.

At least with a small microcontroller, simply disable interrupts for a
critical section. Of course, the critical section must behave like an
interrupt service routine: keep the number of instructions small and do
not call any library routines.

Reply by David Brown ●January 5, 2020

On 05/01/2020 04:26, Clifford Heath wrote:
> On 5/1/20 11:58 am, pozz wrote:
>> On 03/01/2020 15:19, David Brown wrote:
>>> With pre-emptive scheduling, you will have to go
>>> through your existing code and make very sure that you have locks or
>>> synchronisation in place for any shared resources or data.
>>
>> As I already wrote many times, I don't have experience with RTOS and 
>> task sync mechanisms such as semaphores, locks, mutexes, message queues 
>> and so on.
>> So I'm not able to understand when synchronisation is really needed.
>> Could you point me to some good, simple material to study (online or a book)?
> 
> I wish I could, but it is actually a frightfully difficult subject.
> Basically it's the same as thread-safe programming.
> Only about 1% of programmers think they can do it.
> Of those, only about 1% actually can.
> 
> It's the 0.99% that you have to worry about. At least some of them work for 
> Toyota. Don't be one of them!

All good points.

> 
> However, this difficulty is precisely why Rust was created. Although I 
> haven't yet done a project in Rust, I've done enough multi-threaded work 
> in C++ to know that the ideas in Rust are a massive leap forwards, and 
> anyone doing this kind of work (especially professionally) owes it to 
> their users to learn it.

"Safe" languages like Rust can help for simple issues, but won't give 
any benefits in the more challenging cases.  If you understand the 
basics of multi-threading, and have a good, careful development 
methodology, you won't have the kind of problems that Rust would help 
you with.  Maybe Rust will help for some cases, but don't believe that 
it is a game-changer.

> 
>> Supposing all the variables are of type int (i.e., they are changed 
>> atomically in serial_task()), do I need to protect them with 
>> locks, since they are also used by main_task()?
> 
> If "int" is your CPU's word size, you are using word alignment, and you 
> don't have multiple CPUs with separate caches accessing the same RAM, 
> you're probably ok for individual variables. However you will come 
> unstuck if you expect assignments and reads to be performed in the same 
> order you wrote them. A modern compiler will freely re-order things in 
> extremely ambitious and unexpected ways, in order to keep the pipeline 
> flowing.

Yes.  Simple reads and writes of aligned data that is no bigger than the 
cpu's word size will be atomic without any more effort.  But complex 
accesses (like "x++;") are not atomic on most processors.  And you don't 
have any ordering unless you use "volatile", or memory fences of some kind.

A key mistake many people make is to think that non-volatile accesses 
are also ordered by volatile accesses - this is, of course, untrue.
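A small sketch of the "x++" point, assuming C11 <stdatomic.h>; the counter and function names are hypothetical. Declaring the counter volatile does not make the increment atomic, whereas atomic_fetch_add_explicit() performs the whole read-modify-write as one indivisible operation. Relaxed ordering is enough for a pure event counter; on cores without native read-modify-write support, the compiler may implement the atomic with a library routine.

```c
#include <stdatomic.h>

/* Hypothetical counters bumped from an interrupt or another thread. */
static volatile unsigned plain_count;   /* volatile, but NOT atomic */
static atomic_uint event_count;         /* static storage: starts at 0 */

/* "plain_count++" may compile to a separate load, add and store, and is
   never guaranteed to be atomic: an interrupt that also increments it
   between the load and the store can make one update disappear. */
void on_event_unsafe(void)
{
    plain_count++;
}

/* One indivisible read-modify-write; no update can be lost. */
void on_event(void)
{
    atomic_fetch_add_explicit(&event_count, 1u, memory_order_relaxed);
}

unsigned events_seen(void)
{
    return atomic_load_explicit(&event_count, memory_order_relaxed);
}
```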

> 
> I cannot emphasise this enough. The compiler will do what it can to make 
> your program do what it thinks you have asked for - which will NOT be 
> the same as what you think you have asked for.

Well, it /will/ be the same as you think you asked for when you know 
what you are doing!

> 
> You need to understand about basic mutex operations, preferably also 
> semaphores, and beyond that to read and write barriers (if you want to 
> write lock-free code). It's a big subject.
> 

Yes.

Or he can use cooperative multitasking, and avoid many of these issues!
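As an illustration of why cooperative scheduling sidesteps most of this, here is a sketch of a run-to-completion frame parser for the superloop shown earlier. The byte-at-a-time signature of serial_task(), the frame format (start byte 0xAA followed by two data bytes) and the variable names are hypothetical. Because main_task() only runs between calls, it can never see speed updated without setpoint. If the bytes themselves are delivered by a UART interrupt, the buffer between that interrupt and serial_task() still needs protecting.

```c
#include <stdint.h>

/* Hypothetical frame format: start byte 0xAA, then two data bytes. */
#define FRAME_START 0xAAu

/* Shared between serial_task() and main_task(); no lock is used. */
static int speed, setpoint;

/* Called from the superloop with one received byte. The shared variables
   are written only when a whole frame has arrived, and since the loop is
   cooperative, main_task() cannot run between the two assignments. */
void serial_task(uint8_t byte)
{
    static enum { WAIT_START, WAIT_FIRST, WAIT_SECOND } state = WAIT_START;
    static uint8_t first;

    switch (state) {
    case WAIT_START:
        if (byte == FRAME_START)
            state = WAIT_FIRST;
        break;
    case WAIT_FIRST:
        first = byte;
        state = WAIT_SECOND;
        break;
    case WAIT_SECOND:
        speed = first;          /* both values change in the same */
        setpoint = byte;        /* run of serial_task()           */
        state = WAIT_START;
        break;
    }
}
```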

> Clifford Heath.

Reply by David Brown ●January 5, 2020

On 05/01/2020 10:21, upsidedown@downunder.com wrote:
> On Sun, 5 Jan 2020 14:26:12 +1100, Clifford Heath <no.spam@please.net>

>>
>> If "int" is your CPU's word size, you are using word alignment, and you
>> don't have multiple CPUs with separate caches accessing the same RAM,
>> you're probably ok for individual variables. However you will come
>> unstuck if you expect assignments and reads to be performed in the same
>> order you wrote them. A modern compiler will freely re-order things in
>> extremely ambitious and unexpected ways, in order to keep the pipeline
>> flowing.
> 
> Using volatile declarations and turning off optimization will help. Better
> yet, use small assembler routines to have full control of the actual
> memory accesses.

Turning off optimisation is /never/ the answer!  (Barring buggy 
compilers, of course.)

If your code "works with optimisation disabled", your code is /wrong/. 
In over 25 years in this business, I have never seen an exception.

Remember, there is no such thing as "disabling optimisations" - 
compilers can re-arrange code and apply whatever transformations they 
like, according to the C standards, with a total disregard for your 
choice of optimisation settings.  These settings are guidelines, not 
part of the semantics of the language - the language and the freedoms 
the compiler has do not change (unless your compiler specifically 
documents the changes).

And even if you think it is a "workaround" that is good enough for now, 
you are creating a maintainability nightmare.  Or worse - you are 
creating something that works fine during your testing and fails when 
deployed.

"Volatile", when used correctly, can be helpful.

Assembly routines for memory accesses are usually a bad idea - 
inefficient, inflexible and error-prone.

If you want a simple and relatively fool-proof system, all you really 
need are two functions (preferably inline) :

interrupt_status_t disableGlobalInterrupts(void);
void restoreGlobalInterrupts(interrupt_status_t old_status);

These must both act as full memory fences.

Then you can put whatever code needs atomic behaviour within a critical 
section bracketed by these functions.

You need more work if you have other memory masters (DMA, second 
processor, etc.).
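To see why the pair saves and restores the previous state rather than simply re-enabling interrupts, here is a host-side model that runs on a PC: a plain variable (the hypothetical sim_irq_enabled) stands in for the CPU's interrupt-enable bit. It is not a real critical section and does not act as a memory fence; it only demonstrates the nesting logic. bump_both() and the counters are hypothetical as well.

```c
#include <stdbool.h>

/* HOST MODEL ONLY: a plain variable stands in for the CPU's global
   interrupt-enable bit. On a real target these functions touch hardware. */
typedef bool interrupt_status_t;          /* true = interrupts were enabled */
static bool sim_irq_enabled = true;

static inline interrupt_status_t disableGlobalInterrupts(void)
{
    interrupt_status_t old = sim_irq_enabled;
    sim_irq_enabled = false;
    return old;
}

static inline void restoreGlobalInterrupts(interrupt_status_t old_status)
{
    sim_irq_enabled = old_status;         /* restore; do not blindly enable */
}

/* A helper that needs a critical section and may be called from code that
   is already inside one. */
static int counter_a, counter_b;

void bump_both(void)
{
    interrupt_status_t s = disableGlobalInterrupts();
    counter_a++;
    counter_b++;
    restoreGlobalInterrupts(s);
}
```

If restoreGlobalInterrupts() simply re-enabled interrupts, calling bump_both() from inside another critical section would silently end the outer one early.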

> 
>>
>> I cannot emphasise this enough. The compiler will do what it can to make
>> your program do what it thinks you have asked for - which will NOT be
>> the same as what you think you have asked for.
>>
>> You need to understand about basic mutex operations, preferably also
>> semaphores, and beyond that to read and write barriers (if you want to
>> write lock-free code). It's a big subject.
> 
> At least with a small microcontroller, simply disable interrupts for a
> critical section. Of course, the critical section must behave like an
> interrupt service routine: keep the number of instructions small and do
> not call any library routines.
>

Reply by David Brown ●January 5, 2020

On 05/01/2020 07:34, Paul Rubin wrote:
> pozz <pozzugno@gmail.com> writes:
>> As I already wrote many times, I don't have experience with RTOS and
>> task sync mechanisms such as semaphores, locks, mutexes, message queues
>> and so on.
>> So I'm not able to understand when synchronisation is really needed.
>>
>> Could you point me to some good, simple material to study (online or a book)?
> 
> I have found it simplest to have tasks communicate by message passing,
> the so-called "CSP model" (communicating sequential processes), rather
> than fooling around with explicit locks.  With locks you have to worry
> about lock inversion and all kinds of other madness, and your main hope
> of getting it right is formal methods, like Lamport used for the Paxos
> algorithm.  Message passing incurs some cpu overhead because of the
> interprocess communication and context switches, but it gets rid of a
> lot of ways things go wrong.
> 
> If your RTOS supports message passing (look for "mailboxes" in the RTOS
> docs) then I'd say use them.
> 

Yes - message passing (whether asynchronous with queues, or synchronous 
with CSP style) is often a lot easier to get right than complicated 
locking mechanisms.

> The language most associated with CSP style is Erlang, which doesn't
> really fit on small embedded devices, but Erlang materials might still
> be a good place to learn about the style.  Erlang inventor Joe
> Armstrong's book might be a good place to start:
> 
>     http://erlang.org/download/erlang-book-part1.pdf
> 

I've worked indirectly with Erlang (I made the microcontroller half of 
the system, in C, while someone else wrote the Linux half in Erlang).  I 
was not impressed - he spent a lot of time figuring out things that 
should have been very simple.  It is just one sample point, of course, 
and not enough to condemn a whole language - but it does mean Erlang is 
not high on my "languages to learn when I have time" list.

Far and away the most popular "CSP language" is Go, as I understand it. 
Another option is XC for XMOS devices, but that is hardware-specific.

> At the much lower end, you could check out Brad Rodriguez's articles
> about Forth multitaskers:
> 
> https://www.bradrodriguez.com/papers/mtasking.html
> 
> and related ones at https://www.bradrodriguez.com/papers/ .
>

Reply by upsidedown@downunder.com ●January 5, 2020

On Sun, 5 Jan 2020 14:07:37 +0100, David Brown
<david.brown@hesbynett.no> wrote:

>On 05/01/2020 10:21, upsidedown@downunder.com wrote:
>> On Sun, 5 Jan 2020 14:26:12 +1100, Clifford Heath <no.spam@please.net>
>
>>>
>>> If "int" is your CPU's word size, you are using word alignment, and you
>>> don't have multiple CPUs with separate caches accessing the same RAM,
>>> you're probably ok for individual variables. However you will come
>>> unstuck if you expect assignments and reads to be performed in the same
>>> order you wrote them. A modern compiler will freely re-order things in
>>> extremely ambitious and unexpected ways, in order to keep the pipeline
>>> flowing.
>> 
>> Using volatile declarations and turning off optimization will help. Better
>> yet, use small assembler routines to have full control of the actual
>> memory accesses.
>
>Turning off optimisation is /never/ the answer!  (Barring buggy 
>compilers, of course.)
>
>If your code "works with optimisation disabled", your code is /wrong/. 
>In over 25 years in this business, I have never seen an exception.
>
>Remember, there is no such thing as "disabling optimisations" - 
>compilers can re-arrange code and apply whatever transformations they 
>like, according to the C standards, with a total disregard for your 
>choice of optimisation settings.  These settings are guidelines, not 
>part of the semantics of the language - the language and the freedoms 
>the compiler has do not change (unless your compiler specifically 
>documents the changes).

The problem is the C standard, or actually the language lawyers (in
most languages), who do not understand multithreading or
multiprocessors.

>And even if you think it is a "workaround" that is good enough for now, 
>you are creating a maintainability nightmare.  Or worse - you are 
>creating something that works fine during your testing and fails when 
>deployed.
>
>"Volatile", when used correctly, can be helpful.
>
>Assembly routines for memory accesses are usually a bad idea - 
>inefficient, inflexible and error-prone.

On any hardware platform with at least a memory increment or decrement
instruction, where a single instruction performs the whole
read/modify/write memory access cycle, this is often quite easy.

Even if you can't get an atomic R/M/W cycle, there are often similar
tricks, such as using the lock prefix on x86.
 

>If you want a simple and relatively fool-proof system, all you really 
>need are two functions (preferably inline) :
>
>interrupt_status_t disableGlobalInterrupts(void);
>void restoreGlobalInterrupts(interrupt_status_t old_status);
>
>These must both act as full memory fences.
>
>Then you can put whatever code needs atomic behaviour within a critical 
>section bracketed by these functions.

Is this standard C in some recent version of the standard?


>You need more work if you have other memory masters (DMA, second 
>processor, etc.).

This has a lot to do with cache coherence.

Reply by Niklas Holsti ●January 5, 2020

On 2020-01-05 15:12, David Brown wrote:
> On 05/01/2020 07:34, Paul Rubin wrote:
>> pozz <pozzugno@gmail.com> writes:
>>> As I already wrote many times, I don't have experience with RTOS and
>>> task sync mechanisms such as semaphores, locks, mutexes, message queues
>>> and so on.
>>> So I'm not able to understand when synchronisation is really needed.
>>>
>>> Could you point me to some good, simple material to study (online or a book)?
>>
>> I have found it simplest to have tasks communicate by message passing,
>> the so-called "CSP model" (communicating sequential processes), rather
>> than fooling around with explicit locks.

    [snip]

> Yes - message passing (whether asynchronous with queues, or synchronous 
> with CSP style) is often a lot easier to get right than complicated 
> locking mechanisms.
> 
>> The language most associated with CSP style is Erlang, which doesn't
>> really fit on small embedded devices, but Erlang materials might still
>> be a good place to learn about the style.  Erlang inventor Joe
>> Armstrong's book might be a good place to start:
>>
>>     http://erlang.org/download/erlang-book-part1.pdf

    [snip]

> Far and away the most popular "CSP language" is Go, as I understand it. 
> Another option is XC for XMOS devices, but that is hardware-specific.

Another language with CSP-style primitives is Ada (the "rendezvous" 
feature), although AIUI most embedded Ada programs currently being 
implemented use the alternative "monitor"-like primitives (the 
"protected object" feature), which can be used to implement critical 
regions, or CSP-like message passing, or buffered (queued) message 
passing, or for many other styles.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

Reply by David Brown ●January 5, 2020

On 05/01/2020 16:46, upsidedown@downunder.com wrote:
> On Sun, 5 Jan 2020 14:07:37 +0100, David Brown
> <david.brown@hesbynett.no> wrote:
> 
>> On 05/01/2020 10:21, upsidedown@downunder.com wrote:
>>> On Sun, 5 Jan 2020 14:26:12 +1100, Clifford Heath <no.spam@please.net>
>>
>>>>
>>>> If "int" is your CPU's word size, you are using word alignment, and you
>>>> don't have multiple CPUs with separate caches accessing the same RAM,
>>>> you're probably ok for individual variables. However you will come
>>>> unstuck if you expect assignments and reads to be performed in the same
>>>> order you wrote them. A modern compiler will freely re-order things in
>>>> extremely ambitious and unexpected ways, in order to keep the pipeline
>>>> flowing.
>>>
>>> Using volatile declarations and turning off optimization will help. Better
>>> yet, use small assembler routines to have full control of the actual
>>> memory accesses.
>>
>> Turning off optimisation is /never/ the answer!  (Barring buggy
>> compilers, of course.)
>>
>> If your code "works with optimisation disabled", your code is /wrong/.
>> In over 25 years in this business, I have never seen an exception.
>>
>> Remember, there is no such thing as "disabling optimisations" -
>> compilers can re-arrange code and apply whatever transformations they
>> like, according to the C standards, with a total disregard for your
>> choice of optimisation settings.  These settings are guidelines, not
>> part of the semantics of the language - the language and the freedoms
>> the compiler has do not change (unless your compiler specifically
>> documents the changes).
> 
> The problem is the C standard, or actually the language lawyers (in
> most languages), who do not understand multithreading or
> multiprocessors.

The main point of C11 is support for multi-threading and multi-processor 
systems.  The standards, and the language lawyers, /do/ understand it.

The more advanced and progressive compilers support C11.  Many embedded 
ones do not, but that is the fault of the compiler vendors, and perhaps 
of developers who don't realise that they should be insisting on it.

The big missing feature, however, is that you need an implementation of 
some of the functions in the C11 threading libraries, and the 
implementation must fit the OS in use.  That's not too hard for Linux or 
Windows, but a different world in embedded systems.  Still, it should be 
possible to make C11 library support for FreeRTOS, mbed, and any other 
RTOS you like.

Key points like atomics, fences, and language semantics for 
multi-threading are in place.

(And C++ is more helpful in providing higher level multi-threading 
features.)

So we are far from having nice multi-threading integration in C 
toolchains, but not nearly as far as you suggest.

> 
>> And even if you think it is a "workaround" that is good enough for now,
>> you are creating a maintainability nightmare.  Or worse - you are
>> creating something that works fine during your testing and fails when
>> deployed.
>>
>> "Volatile", when used correctly, can be helpful.
>>
>> Assembly routines for memory accesses are usually a bad idea -
>> inefficient, inflexible and error-prone.
> 
> On any hardware platform with at least a memory increment or decrement
> instruction, where a single instruction performs the whole
> read/modify/write memory access cycle, this is often quite easy.
> 

/Some/ hardware platforms that let you do "x++" as a single instruction 
on memory, do so atomically.  Many others do not.  Typically, they are 
atomic on small 8-bit CISC microcontrollers.  On larger processors, you 
rarely get such instructions at all (they don't exist on any kind of 
RISC cpu).  And even when you /do/ get them, they may be implemented by 
multiple separate actions.  Perhaps they are atomic with respect to 
other code on the same core (such as interrupts), but not with respect 
to DMA or other cores.

So this kind of thing can sometimes be acceptable on target-specific 
code for small microcontrollers, but not otherwise.

> Even if you can't get an atomic R/M/W cycle, there are often similar
> tricks, such as using the lock prefix on x86.
>   

You do that using intrinsics or C11/C++11 atomics.  You certainly don't 
do it with "volatile".
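For read-modify-write operations that have no direct C11 function, the usual portable tool is a compare-and-exchange loop. A sketch, assuming C11 <stdatomic.h>; atomic_inc_saturating() is a hypothetical helper that increments a counter unless it has already reached a limit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* "Add one unless already at the limit" has no single C11 function, so it
   is built from a compare-exchange loop. If another context changes *v
   between the load and the exchange, the exchange fails, `cur` is reloaded
   with the current value, and the loop retries. The weak form may also
   fail spuriously, which the loop handles the same way. */
bool atomic_inc_saturating(atomic_uint *v, unsigned limit)
{
    unsigned cur = atomic_load_explicit(v, memory_order_relaxed);

    do {
        if (cur >= limit)
            return false;                  /* already saturated */
    } while (!atomic_compare_exchange_weak_explicit(
                 v, &cur, cur + 1u,
                 memory_order_relaxed, memory_order_relaxed));
    return true;
}
```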

> 
>> If you want a simple and relatively fool-proof system, all you really
>> need are two functions (preferably inline) :
>>
>> interrupt_status_t disableGlobalInterrupts(void);
>> void restoreGlobalInterrupts(interrupt_status_t old_status);
>>
>> These must both act as full memory fences.
>>
>> Then you can put whatever code needs atomic behaviour within a critical
>> section bracketed by these functions.
> 
> Is this standard C in some recent version of the standard?

No C standard covers interrupts, or ways to disable and enable them - 
that is highly target-specific.

For example, on the ARM Cortex-M, you might use:

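#include <stdint.h>
#include "core_cmFunc.h"	/* CMSIS: __get_PRIMASK(), __disable_irq(), __set_PRIMASK() */

typedef uint32_t interrupt_status_t;

/* Save the current PRIMASK, then mask interrupts. */
static inline interrupt_status_t disableGlobalInterrupts(void) {
	interrupt_status_t old = __get_PRIMASK();
	__disable_irq();
	return old;
}

/* Put PRIMASK back as it was, so nested critical sections work. */
static inline void restoreGlobalInterrupts(interrupt_status_t old) {
	__set_PRIMASK(old);
}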

If you don't want to use the ARM core functions, you can use inline 
assembly - but that is compiler specific.  For gcc, that would be:

#include <stdint.h>

typedef uint32_t interrupt_status_t;

static inline interrupt_status_t disableGlobalInterrupts(void) {
	interrupt_status_t old;
	asm volatile ("mrs %0, primask" : "=r" (old));
	/* The "memory" clobber stops gcc moving memory accesses across this. */
	asm volatile ("cpsid i" : : : "memory");
	return old;
}

static inline void restoreGlobalInterrupts(interrupt_status_t old) {
	asm volatile ("msr primask, %0" : : "r" (old) : "memory");
}

C11 provides standard support for a memory barrier, but since you need 
compiler-specific code for the implementation anyway, you might as well 
use the compiler-specific memory barrier.

> 
> 
>> You need more work if you have other memory masters (DMA, second
>> processor, etc.).
> 
> This has a lot to do with cache coherence.
> 

That is one aspect, yes.  But it is not the only one.  For example, 
bigger processors can have write buffers with re-ordering.