EmbeddedRelated.com

Cortex-M: share an int between two tasks

Started by pozz March 13, 2020
On Thu, 19 Mar 2020 08:03:21 +0100, pozz <pozzugno@gmail.com> wrote:

>Il 18/03/2020 19:37, upsidedown@downunder.com ha scritto:
>> On Wed, 18 Mar 2020 11:48:55 +0100, David Brown
>> <david.brown@hesbynett.no> wrote:
>>
>>> On 18/03/2020 10:59, pozz wrote:
>>>> Il 18/03/2020 10:06, David Brown ha scritto:
>>>>> On 18/03/2020 09:20, pozz wrote:
>>>>>> Il 13/03/2020 15:38, pozz ha scritto:
>>>>>>> I'm working on a Cortex-M4 MCU and using FreeRTOS.
>>>>>>>
>>>>>>> One task:
>>>>>>>
>>>>>>> uint32_t wait_period;
>>>>>>> while(1) {
>>>>>>>     // make some things
>>>>>>>     vTaskDelay(pdMS_TO_TICKS(wait_period));
>>>>>>> }
>>>>>>>
>>>>>>> The following function can be called from another task:
>>>>>>>
>>>>>>> void set_waiting_period(uint32_t new_period) {
>>>>>>>     wait_period = new_period;
>>>>>>> }
>>>>>>>
>>>>>>> In this case, is it needed to protect the access of the shared
>>>>>>> variable wait_period with a mutex? I don't think so; we are dealing
>>>>>>> with an integer variable that should be accessed (reading and
>>>>>>> writing) atomically.
>>>>>>
>>>>>> Ok, the example was extremely simple. If I have something a little
>>>>>> more complex:
>>>>>>
>>>>>> ---
>>>>>> volatile uint32_t wait_period;
>>>>>>
>>>>>> while(1) {
>>>>>>     // make some things
>>>>>>     if (wait_period > 0) {
>>>>>>         vTaskDelay(pdMS_TO_TICKS(wait_period));
>>>>>>     } else {
>>>>>>         vTaskSuspend(myTask);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> void set_waiting_period(uint32_t new_period) {
>>>>>>     wait_period = new_period;
>>>>>>     if (wait_period > 0) {
>>>>>>         vTaskResume(myTask);
>>>>>>     }
>>>>>> }
>>>>>> ---
>>>>>>
>>>>>> I think this doesn't work in all cases. For example, if
>>>>>> wait_period=0 and set_waiting_period(3) is executed from another
>>>>>> higher priority task immediately after
>>>>>>
>>>>>>     if (wait_period > 0)   // the if in the endless loop
>>>>>>
>>>>>> wait_period is changed to 3, but vTaskSuspend() is executed in the
>>>>>> else branch, leaving the task suspended with a non-null wait_period.
>>>>>>
>>>>>> Do I need a mutex in this case to protect wait_period access in both
>>>>>> points?
>>>>>>
>>>>>> Pardon for the silly question, but I'm very new to multitasking and
>>>>>> RTOS.
>>>>>
>>>>> When you have more complex examples, you start needing synchronisation
>>>>> of some sort (it can be a disabled-interrupt block, a lock like a
>>>>> mutex, or message passing, or something similar). Sometimes it can be
>>>>> handled with lock-free methods.
>>>>>
>>>>> Generally, if you have one reader and one writer, volatile will be
>>>>> enough - but you have to think about exactly how you want to arrange
>>>>> the accesses.
>>>>
>>>> Just to think of a more complex example. If you share a data structure
>>>> with some members and they must be consistent (all members must be
>>>> changed atomically, avoiding reading only a part changed), volatile is
>>>> not enough. You need a mutex, right?
>>>
>>> You need something to make the accesses atomic, so that each reader or
>>> writer sees a consistent picture. (This applies also to data that is
>>> bigger than the processor can read or write at one time, and for
>>> read-modify-write accesses.) Mutexes are one way to make atomic access,
>>> but they are not the only way. With C11/C++11, atomic access is part of
>>> the language - but support in libraries can be poor or even wrong (a
>>> common implementation I have seen could lead to deadlock). Other locks
>>> and synchronisations will work, as will a simple interrupt-disable
>>> block. You can also use indicator flags that are atomic to mark when a
>>> block is being updated, or have more than one copy of the block and use
>>> an atomic pointer to the current block. Many of these techniques can be
>>> simpler and more efficient than mutexes, but are only correct if certain
>>> assumptions hold (like there only being one writer thread).
>>
>> The concept of global variable or global struct "owner" is hardly in
>> this respect. Only let the "owner" update the volatile global variable
>> and let one or more other tasks only read this variable.
>>
>> If multiple writers are required, send an update request (by e.g. some
>> queue mechanism) to the owner, which will make the update.
>>
>> This is especially important for global structures; it is important
>> that only the owner updates fields in the struct.
>>
>> A version number field should be added to the struct, which is
>> incremented each time the owner has updated one or more fields of the
>> struct. The readers should first read the version number, copy all
>> required fields to local variables and then reread the version number.
>> If the version number is still the same, use the copied data. If the
>> version has been incremented, copy the fields to local variables
>> again, until the version number doesn't change. Of course, watch
>> out for pathological cases :-).
>>
>>> If you can arrange for your data to be transferred between threads using
>>> message queues of some sort (most RTOS's support a few different types),
>>> then this is often the easiest and cleanest method of communicating
>>> between threads.
>>
>> If the RTOS supports messages as big as the struct, just ask the owner
>> to send the most recent version of the struct.
>
>Interesting. Could you explain?
If a structure is only directly accessible (both read and write) to the "owner", some form of message transfer is needed to update and distribute it. Any task wanting to update some field(s) of the structure must send an update request to the owner. If some other task wants an atomic snapshot of the structure, it sends a read request to the owner and receives the current snapshot in return.

The problem is how large the messages transferred between tasks can be. If the size is unlimited, there is no problem delivering a snapshot of any size. However, if the maximum size is only a single word (16 or 32 bits) or a few words, things get nasty: the client must allocate a buffer for the snapshot and pass it in the read request, and the struct owner will copy the data into the requester's local buffer.
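The version-number scheme quoted above ("read the version, copy the fields, reread the version, retry on change") can be sketched in C11. This is a hedged sketch, not code from the thread: all names are made up, a single writer is assumed, and on a strict reading of C11 the plain struct copy would need fences or per-field atomics to be fully portable; on a single-core MCU with one writer this is the usual shape.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared structure whose fields must be read consistently.
 * Names are illustrative, not from the thread. */
struct sensor_config {
    uint32_t period_ms;
    uint32_t gain;
};

static struct sensor_config cfg;    /* written only by the "owner" task */
static atomic_uint cfg_version;     /* even = stable, odd = mid-update  */

/* Writer (the "owner"): bump the version around the update. */
static void owner_update(uint32_t period_ms, uint32_t gain)
{
    atomic_fetch_add_explicit(&cfg_version, 1, memory_order_release); /* odd */
    cfg.period_ms = period_ms;
    cfg.gain = gain;
    atomic_fetch_add_explicit(&cfg_version, 1, memory_order_release); /* even */
}

/* Reader: copy the fields, then re-check the version; retry on change. */
static struct sensor_config read_snapshot(void)
{
    struct sensor_config copy;
    unsigned v1, v2;
    do {
        v1 = atomic_load_explicit(&cfg_version, memory_order_acquire);
        copy = cfg;
        v2 = atomic_load_explicit(&cfg_version, memory_order_acquire);
    } while (v1 != v2 || (v1 & 1u));   /* odd version = writer mid-update */
    return copy;
}
```

The odd/even trick saves a separate "updating" flag: a reader that sees an odd version knows the writer is in the middle of an update.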
On 19/03/2020 11:26, pozz wrote:
> Il 19/03/2020 09:38, David Brown ha scritto:
>> On 19/03/2020 07:40, pozz wrote:
>>> Il 18/03/2020 11:48, David Brown ha scritto:
>>
>>>> If you are using "suspend" and "resume" directly, you are almost
>>>> certainly doing it wrong - no matter which threads are doing what.
>>>>
>>>> Suspension of a task should be for a reason. If you are waiting for
>>>> some time, use vTaskDelay (or similar functions). If you are
>>>> waiting for data from another thread, wait for that - receiving from
>>>> a queue, waiting on a semaphore, a condition variable, etc.
>>>>
>>>> Resumption of a task should be for a reason - send it a message in a
>>>> queue, set the semaphore or condition variable, etc.
>>>>
>>>> "suspend" and "resume" are low-level "beat it with a hammer"
>>>> approaches.
>>>
>>> I didn't know that the suspend/resume mechanism shouldn't be used in
>>> normal cases like mine.
>>>
>>> I simply wanted a management taskB to stop and restart sensor
>>> sampling done by taskA. taskA uses vTaskDelay() for sampling at
>>> precise timings.
>>
>> You want vTaskDelayUntil, not vTaskDelay.
>
> Oh yes.
>
>>> Suspend/resume appeared to me the most natural way to implement this
>>> stop/restart feature.
>>
>> It would be possible, but it does not look good to me. I'd be looking
>> at an atomic flag (volatile bool, pre-C11) to indicate task A should
>> be shutting down after the current sample, then wait on a semaphore.
>> But I'd need a clearer picture of the whole system and interaction
>> between threads.
>
> This is what I thought, but I'm not able to implement it without any
> race conditions.
>
> // TaskA
> static volatile bool stopped;
> static volatile unsigned int delay_ms;
>
> stopped = false;
> while(1) {
>   // get sample and process
>   if (stopped) {                          (1)
>     vTaskDelay(portMAX_DELAY);            (2)

This is a stop forever!

>   } else {
>     vTaskDelay(pdMS_TO_TICKS(delay_ms));
>   }
> }
>
> void set_delay(unsigned int new_delay) {
>   // This is TaskB
>   if (new_delay == 0) {
>     stopped = true;
>   } else {
>     stopped = false;
>     delay_ms = new_delay;
>     vTaskDelayAbort(taskA);
>   }
> }
>
> This code has some race conditions if both tasks can be interrupted at
> any point by the other.
> For example, if stopped is true and taskB enters immediately after (1)
> and before (2) with set_delay(3), I will have taskA blocked forever!!
>
> Could you suggest a solution with a simple volatile bool flag?
>
You use a flag to indicate that you should be stopping, and a semaphore to handle the wakeup (semaphores are intended for one task to give them, and another to take them).

Stop messing around with varying delays - that is an irrelevancy, and is complicating your code. Either your sensor task is running with regular samples, or it is stopped.

One possible arrangement could be:

static volatile bool sensor_stopped;
static SemaphoreHandle_t sensor_semaphore;

// TaskA
static const TickType_t delay_ticks = pdMS_TO_TICKS(10);
while (true) {
    // Get sample and process
    if (sensor_stopped) {
        // Wait for semaphore to be ready
        xSemaphoreTake(sensor_semaphore, portMAX_DELAY);
        // Then release it again
        xSemaphoreGive(sensor_semaphore);
    }
    vTaskDelay(delay_ticks);    // DelayUntil
}

// TaskB
static void turn_off_sensor(void) {
    sensor_stopped = true;
    xSemaphoreTake(sensor_semaphore, 0);
}

static void turn_on_sensor(void) {
    sensor_stopped = false;
    xSemaphoreGive(sensor_semaphore);
}

The "sensor_stopped" flag is really just an optimisation, so that TaskA doesn't have to keep checking the semaphore. It can be dropped if you want.

A really simple solution would be:

static volatile bool sensor_stopped;

// TaskA
static const TickType_t delay_ticks = pdMS_TO_TICKS(10);
while (true) {
    if (!sensor_stopped) {
        // Get sample and process
    }
    vTaskDelay(delay_ticks);    // DelayUntil
}

// TaskB
static void turn_off_sensor(void) {
    sensor_stopped = true;
}

static void turn_on_sensor(void) {
    sensor_stopped = false;
}

This all depends on why you want to stop the sensor task - if you are going into low power modes, for example, then it may be best to block the task. If you don't mind a regular task doing nothing useful, then the second choice is fine.
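For readers without a FreeRTOS target at hand, the flag-plus-semaphore gate above can be mimicked on a POSIX host. This is a sketch under the same assumptions (one sensor task, one controller); `sem_t` stands in for the FreeRTOS binary semaphore, and the function names mirror the post, not any real API.

```c
#include <assert.h>
#include <semaphore.h>
#include <stdbool.h>

/* Host-side sketch of the flag-plus-semaphore gate, with POSIX
 * semaphores in place of FreeRTOS ones.  Names are illustrative. */
static volatile bool sensor_stopped;
static sem_t sensor_gate;

static void sensor_init(void)
{
    sensor_stopped = false;
    sem_init(&sensor_gate, 0, 1);   /* gate starts open */
}

/* Controller side (TaskB in the thread). */
static void turn_off_sensor(void)
{
    sensor_stopped = true;
    (void)sem_trywait(&sensor_gate); /* drain the gate; ok if already empty */
}

static void turn_on_sensor(void)
{
    sensor_stopped = false;
    sem_post(&sensor_gate);          /* wake a blocked sensor task */
}

/* Sensor side (TaskA): call once per loop iteration before sampling.
 * Blocks while stopped, exactly like the FreeRTOS version. */
static void sensor_wait_if_stopped(void)
{
    if (sensor_stopped) {
        sem_wait(&sensor_gate);      /* park here until turn_on_sensor() */
        sem_post(&sensor_gate);      /* leave the gate open for the next check */
    }
}
```

The take-then-give pair is the same trick as in the FreeRTOS snippet: the semaphore acts as a gate that is either open (value 1) or closed (value 0), not as a counter.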
On Thu, 19 Mar 2020 11:45:37 +0100, pozz <pozzugno@gmail.com> wrote:

>Unfortunately I don't know Ada.
>
>> To map this Ada code into some C+RTOS combination, one should use a
>> mutex to make the executions of the protected operations
>> (Sampling_Control.Set_Period, .Start, .Stop, .Get_Period) mutually
>> exclusive, and use some "condition" variable or semaphore to implement
>> the guard (caller suspension) of Sampling_Control.Get_Period.
>>
>> That last point (caller suspension) has some subtleties that depend on
>> the actual set of RTOS primitives available. But I recommend a simpler
>> solution: instead of suspending/resuming the sampling task, just let it
>> run normally at the sampling period, but make the actual sampling (and
>> sample processing, I assume) conditional on a "Running" flag (by
>> returning this flag, too, from Get_Period). Usually the processor effort
>> saved by actually suspending tasks that normally run periodically is not
>> significant.
>
>Yes, I didn't imagine that controlling a task from another task was so
>difficult with a full-featured RTOS like FreeRTOS.
>So your suggestion to let the sampling task always run, even when it
>should be suspended, is good. It's a pity, because I have many sensors,
>but only a few of them are active at any time.
Why don't you simply make each sampling task like

    loop
        wait_for_event()
        do_the_sampling
    end loop

Then have a separate timing task which runs, say, every millisecond and increments a single integer counter. For each task, calculate (Counter modulo NN) and send a message to a specific sampler task when the modulo is zero. NN is the number of clock ticks before a specific sampling is done.

If you want to do sampling, say, every 10 ms, but want to spread out the sampling times, add a constant 1, 2 or 3 to a copy of the counter before performing the modulo, and one task will sample at xx7, the other at xx8 and the third at xx9 milliseconds every 10 ms.

This way, you can control the sampling times and periods from a single point in the timing task, with no need to spread this out to the individual sampler tasks.
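The modulo arithmetic in this scheme is easy to sanity-check on its own. A minimal sketch (the function name and parameters are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the timing-task idea: a single 1 ms tick counter, and each
 * sampler fires when (counter + offset) % period == 0.  Distinct offsets
 * stagger the samplers so they never fire on the same tick. */
static bool sampler_due(uint32_t counter, uint32_t period_ticks, uint32_t offset)
{
    return (counter + offset) % period_ticks == 0;
}
```

With a 10 ms period, a sampler with offset 1 fires at counter values 9, 19, 29, ..., one with offset 2 at 8, 18, 28, ..., and so on - the staggering described above.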
In Cortex-M4 there are exclusive access instruction pairs
LDREX / STREX, LDREXH / STREXH and LDREXB / STREXB which
can be used to effect mutual exclusion on a variable access.

They are actually pairs where the store part refuses to work
if there has been an exception between the load and store part
of the pair. There is a result indication of the success.

If the thread switching is done with the recommended PendSV
exception, the exclusion works, if applied suitably.
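A hedged note: you rarely write LDREX/STREX by hand. A C11 compare-and-swap loop expresses the same load/try-store/retry pattern, and on Cortex-M3/M4 GCC and Clang typically lower it to an LDREX/STREX pair, retrying when the exclusive monitor has been cleared. A sketch (the clamped-add operation is just an invented example of a read-modify-write that needs the loop):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint_fast32_t shared_counter;

/* Add 'add' to *p, but never beyond 'limit'.  The weak CAS may fail
 * spuriously or because another context intervened - in either case we
 * loop, just like retrying after a failed STREX. */
static uint32_t atomic_add_clamped(atomic_uint_fast32_t *p,
                                   uint32_t add, uint32_t limit)
{
    uint_fast32_t old = atomic_load(p);
    uint_fast32_t new_val;
    do {
        new_val = old + add;
        if (new_val > limit)
            new_val = limit;   /* clamp instead of wrapping */
    } while (!atomic_compare_exchange_weak(p, &old, new_val));
    return (uint32_t)new_val;
}
```

On failure, `atomic_compare_exchange_weak` reloads the current value into `old`, so the new value is always recomputed from fresh data.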

-- 

-TV


On 18.3.20 22:24, David Brown wrote:
> On 18/03/2020 19:37, upsidedown@downunder.com wrote:
>> On Wed, 18 Mar 2020 11:48:55 +0100, David Brown
>> <david.brown@hesbynett.no> wrote:
>>
>> [... earlier discussion of the wait_period example, quoted in full upthread ...]
>>> [... David's earlier reply, quoted in full upthread ...]
>>
>> The concept of global variable or global struct "owner" is hardly in
>
> (typo - "handy", I assume)
>
>> this respect. Only let the "owner" update the volatile global variable
>> and let one or more other tasks only read this variable.
>
> Yes.  But be careful that this alone is not enough if the variable is
> too big to be written atomically by the processor.
>
> The "owner" here refers primarily to the owning thread or context, but
> the "owning" module, function, class (for C++), etc., is also a useful
> concept for good structure.  And the ideas apply to all data, not just
> "global" data.
>
>> If multiple writers are required, send an update request (by e.g. some
>> queue mechanism) to the owner, which will make the update.
>
> Yes, that is a good way to do it.
>
>> This is especially important for global structures, it is important
>> that only the owner updates fields in the struct.
>>
>> A version number field should be added to the struct, which is
>> incremented each time the owner has updated one or more fields of the
>> struct. The readers should first read the version number, copy all
>> required fields to local variables and then reread the version number.
>> If the version number is still the same, use the copied data. If the
>> version has been incremented, copy again the fields to local
>> variables, until the version number doesn't change. Of course, watch
>> out for pathological cases :-).
>
> That is certainly one way to handle atomic reads.  It is by no means the
> only way, but it can certainly be a good way for data that is rarely
> changed.  However, you must be careful to use volatile reads for the
> struct (the local copy does not have to be volatile), or use memory
> barriers, to ensure that the reading really takes place in the correct
> manner.
>
> For smaller data - such as a 64-bit tick counter - it is sufficient to
> simply read the whole data repeatedly until you get the same value in
> two successive reads.  That avoids the need of a separate version field.
> (Don't use that for anything that may suffer from an ABA problem.)
>
>>> If you can arrange for your data to be transferred between threads using
>>> message queues of some sort (most RTOS's support a few different types),
>>> then this is often the easiest and cleanest method of communicating
>>> between threads.
>>
>> If the RTOS supports messages as big as the struct, just ask the owner
>> to send the most recent version of the struct.
>
> Often good, yes - I do that often.  When sending more complex data
> involving pointers, be very careful about the lifetime of the data
> pointed at, and who (if anyone) is responsible for clearing it up.
>
>>> But you are right that volatile is not enough!
>>>>> I find one of the best ways to get good software design is to think
>>>>> about it like hardware.  You don't have one hardware signal controlled
>>>>> by various different signals scattered around the design - if you have
>>>>> an LED that can be controlled by different signals, you put these
>>>>> signals into a multiplexer or logic gate (or wired-or / wired-and).
>>>>> And you collect them all together in one place in the design to do
>>>>> that.
>>>>>
>>>>> The same applies to software.  Don't change a variable from a dozen
>>>>> different places around the program.  Don't access the same resource
>>>>> from different places.  Put in queues, multiplexers, etc., so that you
>>>>> have control.
>>>>
>>>> I see. I think the safest approach is async messages sent in a queue.
>>>
>>> Yes, it is often good.  It is often easier to see that the code is
>>> correct when you use message passing rather than mutexes.
>>>
>>>>> In a case like this, the idea that you have your "myTask" being
>>>>> suspended from one thread and woken up by a different thread is your
>>>>> flaw.  Fix the design - the suspension and resumption of "myTask"
>>>>> should happen from /one/ place in /one/ thread - and your problems of
>>>>> synchronisation go away.
>>>>
>>>> I have two tasks. Task A can be suspended and resumed. Because Task A
>>>> can't resume itself from the suspended state, only Task B should
>>>> suspend and resume Task A.
>>>>
>>>> // Task A
>>>> while(1) {
>>>>     // make some things
>>>>     vTaskDelay(pdMS_TO_TICKS(wait_period));
>>>> }
>>>>
>>>> However I want to suspend Task A, if needed, after "make some things"
>>>> or during vTaskDelay(). I don't want to suspend Task A in the middle
>>>> of "make some things".
>>>
>>> If you are using "suspend" and "resume" directly, you are almost
>>> certainly doing it wrong - no matter which threads are doing what.
>>> Suspension of a task should be for a reason.  If you are waiting for
>>> some time, use vTaskDelay (or similar functions).  If you are waiting
>>> for data from another thread, wait for that - receiving from a queue,
>>> waiting on a semaphore, a condition variable, etc.
>>
>> Some OSes also support waiting for data from another thread with a
>> timeout parameter. If you wait for an event that will never occur,
>> the timeout parameter can be used to make a fixed delay.
>
> True, but usually the OS will have a specific call for a timed delay
> which will be clearer in the code.
>
>>> Resumption of a task should be for a reason - send it a message in a
>>> queue, set the semaphore or condition variable, etc.
>>
>> Yes, absolutely.
>>
>>> "suspend" and "resume" are low-level "beat it with a hammer" approaches.
>>
>> "Suspend" can be usable if done e.g. with the low-power Wait For
>> Interrupt instruction. When an interrupt occurs, the interrupt will be
>> served and then the scheduler is executed to find the first
>> high-priority task in the READY state and start executing it, i.e.
>> WaitForSignificantEvent. If the scheduler starts to run the original
>> suspender, it should check whether it should wake up or start a new
>> "Suspend" state.
>
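The double-read idea David mentions for a 64-bit tick counter can be sketched as follows. This is a hedged illustration, not code from the thread: one common variant re-reads only the high word rather than the whole value, a single writer (e.g. the tick interrupt) is assumed, and all names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit tick counter kept as two 32-bit halves, updated by a single
 * writer (e.g. a tick ISR).  Names are illustrative. */
static volatile uint32_t tick_hi, tick_lo;

/* Writer side: increment, carrying into the high word. */
static void tick_increment(void)
{
    if (++tick_lo == 0)
        ++tick_hi;
}

/* Reader side: re-read the high word until it is stable across the
 * low-word read, so a carry cannot tear the 64-bit value. */
static uint64_t tick_read(void)
{
    uint32_t hi, lo;
    do {
        hi = tick_hi;
        lo = tick_lo;
    } while (hi != tick_hi);
    return ((uint64_t)hi << 32) | lo;
}
```

If an interrupt carries into `tick_hi` between the two reads inside the loop, the final re-check catches it and the reader simply tries again - the same retry principle as the version-number scheme, specialised to one carry bit.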
On 19/03/2020 13:05, Tauno Voipio wrote:
> In Cortex-M4 there are exclusive access instruction pairs
> LDREX / STREX, LDREXH / STREXH and LDREXB / STREXB which
> can be used to effect mutual exclusion on a variable access.
>
> They are actually pairs where the store part refuses to work
> if there has been an exception between the load and store part
> of the pair. There is a result indication of the success.
>
> If the thread switching is done with the recommended PendSV
> exception, the exclusion works, if applied suitably.
>
These instructions can be useful, but difficult to use.  Generally they are part of how you implement mutexes, semaphores, and other synchronisation mechanisms - they are not sufficient on their own.  It's really easy to follow a "this is how you make a mutex with ldrex/strex" recipe and get something that will deadlock as soon as there is contention, because the recipe only works when you have time-sharing or multiple cores.

And on single-core microcontroller-oriented CPUs, like the Cortex-M, disabling global interrupts for a few instructions is generally more efficient and much easier to get right.
On 19.3.20 14:16, David Brown wrote:
> On 19/03/2020 13:05, Tauno Voipio wrote:
>>
>> [... LDREX / STREX description, quoted in full above ...]
>
> These instructions can be useful, but difficult to use.  Generally they
> are part of how you implement mutexes, semaphores, and other
> synchronisation mechanisms - they are not sufficient on their own.
> It's really easy to follow a "this is how you make a mutex with
> ldrex/strex" recipe and get something that will deadlock as soon as
> there is contention, because the recipe only works when you have
> time-sharing or multiple cores.
>
> And on single-core microcontroller-oriented CPUs, like the Cortex-M,
> disabling global interrupts for a few instructions is generally more
> efficient and much easier to get right.
>
Been there - done that.

I have two parallel versions of the same kernel, one with interrupt disable and the other with LDREX / STREX.

Both ways work, and I agree that the global interrupt disable is easier to use, but it may create unacceptable delays on high-priority interrupts.

Would you provide an example where LDREX / STREX will create a deadlock on a single processor? Remember that any interrupt service will cancel the exclusivity, as will a thread switch.

-- 

-TV
On 19/03/2020 13:27, Tauno Voipio wrote:
> On 19.3.20 14:16, David Brown wrote:
>> On 19/03/2020 13:05, Tauno Voipio wrote:
>>
>> [... LDREX / STREX exchange, quoted in full above ...]
>
> Been there - done that.
>
> I have two parallel versions of the same kernel, one with
> interrupt disable and the other with LDREX / STREX.
I haven't implemented any real kernels, so I'm happy to learn from your experience there.
> Both ways work, and I agree that the global interrupt disable
> is easier to use, but it may create unacceptable delays on
> high-priority interrupts.
I think people get a bit obsessed with interrupt disable times.  Many programmers get in a fluster when you write code that disables interrupts for 4 or 5 instructions - yet have no problem writing an interrupt routine that takes 50 to 200 instruction clocks to execute, with that amount of unknown jitter.  (Yes, I am ignoring interrupt priorities for simplification.)

If you need sub-microsecond reaction times for something, use hardware - not software.  That's why your microcontroller has timers, communication ports, DMA, and the like.  And that means you can disable interrupts for a microsecond without causing trouble - that's a /long/ time on a modern Cortex-M.

And when you go to a faster device like NXP's Cortex-M7 "crossover" chips at 600 MHz, a disable-interrupt based function for atomic 64-bit access, or a CAS operation, all running from tightly-coupled memory, will run faster than a single "load" operation if the instruction and data are not in the cache.  And that load operation will delay interrupts just as much as the interrupt-disable block.  Even on an M4, a divide instruction can take 12 cycles - you can have your interrupts disabled, your loads/stores done, and interrupts re-enabled in that time.

Yes, disabling interrupts may cause unacceptable delays - but you should look closely at what delays are acceptable and unacceptable.  Don't dismiss interrupt disable blocks on principle without actually running the numbers.
> Would you provide an example where LDREX / STREX will create
> a deadlock on a single processor? Remember that any interrupt
> service will cancel the exclusivity, as will a thread switch.
>
It's the same as any other way of implementing locks that does not take into account priority inversion.

Suppose Task A has higher priority than Task B.  (Task A could be an interrupt - the details don't matter too much.)

Task A:
1. Take lock.
2. Use shared resource.
3. Release lock.

Task B:
1. Take lock.
2. Use shared resource.
3. Release lock.

If task B has passed step 1 when task A is scheduled (such as the interrupt being triggered), task A will block on step 1, and task B never gets to step 3.

It is not that LDREX/STREX is special here - you'd get the same problem using interrupt-disable blocks (if you are using them to access other locks, rather than using the interrupt enable as the lock itself).  The problem is that the net is full of examples like:

"""
8.5.3. Example of LDREX and STREX usage

The following is an example of typical usage.  Suppose you are trying to claim a lock:

Lock address : LockAddr
Lock free    : 0x00
Lock taken   : 0xFF

    MOV R1, #0xFF               ; load the 'lock taken' value
try
    LDREX R0, [LockAddr]        ; load the lock value
    CMP R0, #0                  ; is the lock free?
    STREXEQ R0, R1, [LockAddr]  ; try and claim the lock
    CMPEQ R0, #0                ; did this succeed?
    BNE try                     ; no - try again
    ...                         ; yes - we have the lock
"""

(That's from ARM's documentation.)

It is very seductive to think that LDREX/STREX lets you implement locks, and these recipes are good enough.  And when you use them, and test them, everything seems to work.  But I'm sure I don't need to tell you that this is not sufficient - it will only work if both Task A and Task B can be running.
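For completeness, the C11 equivalent of that recipe is `atomic_flag`, and the single-core caveat above is exactly why a *try*-lock, with explicit handling of failure, is safer than spinning. A sketch (not code from the thread):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal test-and-set lock, the C11 counterpart of the quoted
 * LDREX/STREX recipe.  A try-lock is shown deliberately: on a single
 * core, spinning on a lock held by a preempted task is precisely the
 * deadlock described above, so failure must be handled, not spun on. */
static atomic_flag lock = ATOMIC_FLAG_INIT;

static bool try_lock(void)
{
    /* test-and-set returns the previous value; false means we got it */
    return !atomic_flag_test_and_set_explicit(&lock, memory_order_acquire);
}

static void unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}
```

A caller that sees `try_lock()` fail must back off and let the lock holder run (or use an OS primitive with priority inheritance) - looping on `try_lock()` would reintroduce the deadlock.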
Il 19/03/2020 12:12, David Brown ha scritto:
...
>>> It would be possible, but it does not look good to me.  I'd be
>>> looking at an atomic flag (volatile bool, pre-C11) to indicate task A
>>> should be shutting down after the current sample, then wait on a
>>> semaphore. But I'd need a clearer picture of the whole system and
>>> interaction between threads.
>>
>> This is what I thought, but I'm not able to implement it without any
>> race conditions.
>>
>> // TaskA
>> static volatile bool stopped;
>> static volatile unsigned int delay_ms;
>>
>> stopped = false;
>> while(1) {
>>     // get sample and process
>>     if (stopped) {                        (1)
>>         vTaskDelay(portMAX_DELAY);        (2)
>
> This is a stop forever!
Yes, but in my example I used vTaskDelayAbort(), so it isn't really a stop forever.
>>     } else {
>>         vTaskDelay(pdMS_TO_TICKS(delay_ms));
>>     }
>> }
>>
>> void set_delay(unsigned int new_delay) {
>>     // This is TaskB
>>     if (new_delay == 0) {
>>         stopped = true;
>>     } else {
>>         stopped = false;
>>         delay_ms = new_delay;
>>         vTaskDelayAbort(taskA);
>>     }
>> }
>>
>> This code has some race conditions if both tasks can be interrupted in
>> any point by the other.
>> For example, if stopped is true and taskB enters immediately after (1)
>> and before (2) with set_delay(3), I will have taskA blocked forever!!
>>
>> Could you suggest a solution with a simple volatile bool flag?
>>
>
> You use a flag to indicate that you should be stopping, and a semaphore
> to handle the wakeup (semaphores are intended for one task to give them,
> and another to take them).
>
> Stop messing around with varying delays - that is an irrelevancy, and is
> complicating your code.  Either your sensor task is running with regular
> samples, or it is stopped.
Honestly, taskB should be able to change the sampling interval too. Anyway, the big issue is stopping and restarting, not changing the interval.
> One possible arrangement could be:
>
> static volatile bool sensor_stopped;
> static SemaphoreHandle_t sensor_semaphore;
>
> // TaskA
> static const TickType_t delay_ticks = pdMS_TO_TICKS(10);
>
> while (true) {
>     // Get sample and process
>     if (sensor_stopped) {
>         // Wait for semaphore to be ready
>         xSemaphoreTake(sensor_semaphore, portMAX_DELAY);
>         // Then release it again
>         xSemaphoreGive(sensor_semaphore);
>     }
>     vTaskDelay(delay_ticks);    // DelayUntil
> }
>
> // TaskB
> static void turn_off_sensor(void) {
>     sensor_stopped = true;
>     xSemaphoreTake(sensor_semaphore, 0);
> }
>
> static void turn_on_sensor(void) {
>     sensor_stopped = false;
>     xSemaphoreGive(sensor_semaphore);
> }
>
> The "sensor_stopped" flag is really just an optimisation, so that TaskA
> doesn't have to keep checking the semaphore.  It can be dropped if you
> want.
I'm not sure about xSemaphoreTake(sensor_semaphore, 0) in turn_off_sensor(): you pass a zero timeout. Are you sure you will be able to take the semaphore in *every* situation? What happens if turn_off_sensor() runs immediately after xSemaphoreTake(..., portMAX_DELAY) returns in taskA? In that case taskB wouldn't be able to take the semaphore, so taskA never really blocks: you end up with sensor_stopped true but the semaphore released.
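The interleaving described here can be modelled on a host with a plain counter standing in for the binary semaphore (hypothetical helper names; these are not FreeRTOS calls):

```c
#include <stdbool.h>

static int  sem_count      = 1;     /* binary semaphore, initially available */
static bool sensor_stopped = false;

static bool sem_try_take(void)
{
    if (sem_count > 0) { sem_count--; return true; }
    return false;
}

static void sem_give(void)
{
    if (sem_count < 1) sem_count++;
}

/* Replay the interleaving: TaskA's Take(portMAX_DELAY) has just
 * returned, then TaskB runs turn_off_sensor() before TaskA's Give. */
static bool race_leaves_semaphore_free(void)
{
    sem_count = 1; sensor_stopped = false;
    sem_try_take();                        /* TaskA: Take returns            */
    sensor_stopped = true;                 /* TaskB: sets the stop flag      */
    bool taskb_got_it = sem_try_take();    /* TaskB: Take with 0 timeout     */
    sem_give();                            /* TaskA: Give runs next          */
    /* Inconsistent state: flag set, but the semaphore is free again, so  */
    /* TaskA's next Take(portMAX_DELAY) succeeds instead of blocking.     */
    return sensor_stopped && !taskb_got_it && sem_count == 1;
}
```

The model confirms the objection: TaskB's zero-timeout Take fails, and after TaskA's Give the semaphore is available again even though sensor_stopped is true.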
> A really simple solution would be:
>
> static volatile bool sensor_stopped;
>
> // TaskA
> static const TickType_t delay_ticks = pdMS_TO_TICKS(10);
>
> while (true) {
>     if (!sensor_stopped) {
>         // Get sample and process
>     }
>     vTaskDelay(delay_ticks);    // DelayUntil
> }
>
> // TaskB
> static void turn_off_sensor(void) {
>     sensor_stopped = true;
> }
>
> static void turn_on_sensor(void) {
>     sensor_stopped = false;
> }
>
> This all depends on why you want to stop the sensor task - if you are
> going into low power modes, for example, then it may be best to block
> the task.  If you don't mind a regular task doing nothing useful, then
> the second choice is fine.
Yes, this is another solution that doesn't really stop the task, but only the sampling process.
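For reference, the polling-flag variant can also be written with C11 atomics instead of volatile, which makes the cross-task intent explicit. A sketch with the FreeRTOS task loop omitted (only the flag accessors are shown):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomic flag shared between the sampling task and the control task. */
static atomic_bool sensor_stopped = false;

/* Called from TaskB. */
static void turn_off_sensor(void) { atomic_store(&sensor_stopped, true);  }
static void turn_on_sensor(void)  { atomic_store(&sensor_stopped, false); }

/* Polled by TaskA once per period before sampling. */
static bool sensor_enabled(void)  { return !atomic_load(&sensor_stopped); }
```

On a single-core Cortex-M either form works for a bool; atomics simply document the sharing and keep working if the code ever moves to a multi-core part.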
David Brown wrote:
> On 13/03/2020 15:38, pozz wrote:
>> I'm working on a Cortex-M4 MCU and using FreeRTOS.
>>
>> One task:
>>
>> uint32_t wait_period;
>> while(1) {
>>     // make some things
>>     vTaskDelay(pdMS_TO_TICKS(wait_period));
>> }
>>
>> The following function can be called from another task:
>>
>> void set_waiting_period(uint32_t new_period) {
>>     wait_period = new_period;
>> }
>>
>> In this case, is it needed to protect the access of the shared variable
>> wait_period with a mutex? I don't think so; we are dealing with an
>> integer variable that should be accessed (reading and writing)
>> atomically.
>
> You don't need to use a mutex, as you won't have race conditions when
> accessing it (race conditions happen when at least one thread can be
> writing, and at least one thread reading, and the accesses are not
> atomic). However, you /do/ need to make the accesses volatile, or there
> will be no guarantee that the variable will be read or written in the
> spots where you have accessed it.
>
Small digression (out of ignorance of the M3): there is nothing you have
to do with an MMU or other caching furniture to guarantee serialization
of access? So "volatile" is known to be sufficient?

I'd be tempted to put a mutex on and measure the cost, because race
conditions are quite ... challenging to test for.

--
Les Cargill
On 22/03/2020 19:42, Les Cargill wrote:
> David Brown wrote:
>> On 13/03/2020 15:38, pozz wrote:
>>> I'm working on a Cortex-M4 MCU and using FreeRTOS.
>>>
>>> One task:
>>>
>>> uint32_t wait_period;
>>> while(1) {
>>>     // make some things
>>>     vTaskDelay(pdMS_TO_TICKS(wait_period));
>>> }
>>>
>>> The following function can be called from another task:
>>>
>>> void set_waiting_period(uint32_t new_period) {
>>>     wait_period = new_period;
>>> }
>>>
>>> In this case, is it needed to protect the access of the shared variable
>>> wait_period with a mutex? I don't think so; we are dealing with an
>>> integer variable that should be accessed (reading and writing)
>>> atomically.
>>
>> You don't need to use a mutex, as you won't have race conditions when
>> accessing it (race conditions happen when at least one thread can be
>> writing, and at least one thread reading, and the accesses are not
>> atomic).  However, you /do/ need to make the accesses volatile, or there
>> will be no guarantee that the variable will be read or written in the
>> spots where you have accessed it.
>
> Small digression (out of ignorance of the M3): there is nothing you
> have to do with an MMU or other caching furniture to guarantee
> serialization of access? So "volatile" is known to be sufficient?
Yes, volatile is sufficient here. Almost all processors have a serial processing model - that is, no matter how much superscalar or out-of-order execution you have (the M3 has neither; the M7 is superscalar and can often do two instructions per cycle), the result is as though instructions execute fully in program order. Writes can end up re-ordered before they hit memory, due to caches or write buffers, but this is invisible to code running on the same core.

You need data ordering instructions (like "dsb" or "dmb" on the Cortex-M) or MPU memory regions (such as non-cacheable areas) when the data is accessed by something other than the processor that reads and writes it. So if you are writing data to a buffer that will be read by DMA, or you have a dual-processor system (as found in some Cortex-M devices), or an SMP system (like Cortex-A devices), it's a different matter - you need data synchronisation.

Also remember that volatile accesses do not synchronise with non-volatile accesses. You can't do some non-volatile writes to a buffer, then set a volatile flag, and assume the writes will always come before the flag.
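That last point can be sketched in C11 terms: a release store on the flag orders the preceding plain buffer writes before the flag becomes visible, which a volatile store alone does not guarantee. A minimal sketch (driven single-threaded here, just to show the API shape):

```c
#include <stdatomic.h>
#include <stdbool.h>

static int         buffer[4];
static atomic_bool data_ready = false;

static void producer(void)
{
    for (int i = 0; i < 4; i++)
        buffer[i] = i * i;                        /* plain, non-volatile writes */
    atomic_store_explicit(&data_ready, true,
                          memory_order_release);  /* publishes the writes above */
}

static int consumer_sum(void)
{
    if (!atomic_load_explicit(&data_ready, memory_order_acquire))
        return -1;                                /* nothing published yet      */
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += buffer[i];                         /* ordered after the acquire  */
    return sum;
}
```

On a single-core Cortex-M the release/acquire pair costs little or nothing, but unlike a volatile flag it also stops the compiler from sinking the buffer writes past the flag store.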
>
> I'd be tempted to put a mutex on and measure the cost, because race
> conditions are quite ... challenging to test for.
>
You avoid race conditions by design, not testing!
