EmbeddedRelated.com
Forums

Cortex-M: share an int between two tasks

Started by pozz March 13, 2020
On 3/22/20 2:42 PM, Les Cargill wrote:
> Small digression: (out of ignorance of the M3) - There is nothing you
> have to do with an MMU or other caching furniture to guarantee
> serialization of access? So "volatile" is known to be sufficient?
>
> I'd be tempted to put a mutex on and measure the cost, because race
> conditions are quite ... challenging to test for.
The key point is that things are different for a single-processor/single-master system than when you have multiple processors/bus masters. A single processor system that didn't keep accesses in order, at least in its own view, would be seriously broken. At worst there could be a few cycles of delay slots, which the compiler would keep track of, that needed to be observed between a write and a read.

The one catch with a single processor system is if it has something like DMA; then the memory accessed by the DMA needs the same protections as in a multiprocessor system.

The volatile qualifier was invented back at a time when, for most machines, it was sufficient for most types of access.
On 23/03/2020 01:50, Richard Damon wrote:
> On 3/22/20 2:42 PM, Les Cargill wrote:
>> Small digression: (out of ignorance of the M3) - There is nothing you
>> have to do with an MMU or other caching furniture to guarantee
>> serialization of access? So "volatile" is known to be sufficient?
>> [...]
>
> The key point is that things are different for a
> single-processor/single-master system than when you have multiple
> processors/bus masters.
> [...]
> The volatile system was invented back at a time when for most machines,
> it was sufficient for most types of access.
It's also worth noting that even though "volatile" is not sufficient for bigger systems (multiple bus masters, like multiple cores or DMA), it is generally still /necessary/ alongside the memory barriers, cache flushes, synchronisation instructions, locks, etc.

Sometimes with modern C or C++ you can use atomic types rather than volatile, since the atomic types are effectively volatile in themselves (as well as having other properties and synchronisation features).

And of course compiler-specific features can remove the need for volatile (such as using a gcc asm memory clobber).
On 19/03/2020 15:31, pozz wrote:
> On 19/03/2020 12:12, David Brown wrote:
> [...]
>> One possible arrangement could be:
>>
>> static volatile bool sensor_stopped;
>> static SemaphoreHandle_t sensor_semaphore;
>>
>> // TaskA
>> static const TickType_t delay_ticks = pdMS_TO_TICKS(10);
>>
>> while (true) {
>>     // Get sample and process
>>     if (sensor_stopped) {
>>         // Wait for semaphore to be ready
>>         xSemaphoreTake(sensor_semaphore, portMAX_DELAY);
>>         // Then release it again
>>         xSemaphoreGive(sensor_semaphore);
>>     }
>>     vTaskDelay(delay_ticks);    // DelayUntil
>> }
>>
>> // TaskB
>> static void turn_off_sensor(void) {
>>     sensor_stopped = true;
>>     xSemaphoreTake(sensor_semaphore, 0);
>> }
>>
>> static void turn_on_sensor(void) {
>>     sensor_stopped = false;
>>     xSemaphoreGive(sensor_semaphore);
>> }
>>
>> The "sensor_stopped" flag is really just an optimisation, so that
>> TaskA doesn't have to keep checking the semaphore.  It can be dropped
>> if you want.
>
> I'm not sure about xSemaphoreTake(sensor_semaphore, 0) in
> turn_off_sensor(). You pass a zero timeout. Are you sure you will be
> able to take the semaphore in *every* situation? What happens if
> turn_off_sensor() is run immediately after xSemaphoreTake(...,
> portMAX_DELAY) returns?
> In this case, I think taskB wouldn't be able to take the semaphore and
> so taskA doesn't really block. You will have sensor_stopped true, but
> the semaphore released.
David, I hope you can answer this. This topic is quite confusing to me. Thank you.
On 3/23/20 4:09 AM, David Brown wrote:
> On 23/03/2020 01:50, Richard Damon wrote:
>> On 3/22/20 2:42 PM, Les Cargill wrote:
>>> [...] So "volatile" is known to be sufficient?
>>
>> [...]
>> The volatile system was invented back at a time when for most
>> machines, it was sufficient for most types of access.
>
> It's also worth noting that even though "volatile" is not sufficient for
> bigger systems (multiple bus masters, like multiple cores or DMA), it is
> generally still /necessary/ along with the memory barriers, cache
> flushes, synchronisation instructions, locks, etc.
>
> Sometimes with modern C or C++ you can use atomic types rather than
> volatile, since the atomic types are effectively volatile in themselves
> (as well as having other properties and synchronisation features).
>
> And of course compiler-specific features can replace the need of
> volatile (such as using a gcc asm memory clobber).
I will admit that it isn't my area of expertise, as most of my programming is done on machines where volatile is sufficient for most of the needs, but I remember arguments that for the big machines volatile wasn't needed at all, as you needed a barrier that said everything that happened before this will be done before anything that happens after this, and with such a barrier volatile isn't needed.

It may well be, and I think it is, that the Standards adopted weaker barriers (that are perhaps significantly cheaper to perform), and with those weaker barriers we still have a need for volatile.

The one key thing that I remember is that the compiler needs to understand these barriers, as it needs to avoid moving accesses across them.
On 3/19/20 10:31 AM, pozz wrote:
> On 19/03/2020 12:12, David Brown wrote:
> ...
>>>> It would be possible, but it does not look good to me.  I'd be
>>>> looking at an atomic flag (volatile bool, pre-C11) to indicate task
>>>> A should be shutting down after the current sample, then wait on a
>>>> semaphore.  But I'd need a clearer picture of the whole system and
>>>> interaction between threads.
>>>
>>> This is what I thought, but I'm not able to implement it without any
>>> race conditions.
>>>
>>> // TaskA
>>> static volatile bool stopped;
>>> static volatile unsigned int delay_ms;
>>>
>>> stopped = false;
>>> while(1) {
>>>   // get sample and process
>>>   if (stopped) {                              (1)
>>>     vTaskDelay(portMAX_DELAY);                (2)
>>
>> This is a stop forever!
>
> Yes, but in my example I used vTaskDelayAbort(), so it isn't really a
> stop forever.
>
>>>   } else {
>>>     vTaskDelay(pdMS_TO_TICKS(delay_ms));
>>>   }
>>> }
>>>
>>> void set_delay(unsigned int new_delay) {
>>>   // This is TaskB
>>>   if (new_delay == 0) {
>>>     stopped = true;
>>>   } else {
>>>     stopped = false;
>>>     delay_ms = new_delay;
>>>     vTaskDelayAbort(taskA);
>>>   }
>>> }
>>>
>>> This code has some race conditions if both tasks can be interrupted
>>> at any point by the other.
>>> For example, if stopped is true and taskB enters immediately after
>>> (1) and before (2) with set_delay(3), I will have taskA blocked
>>> forever!!
>>>
>>> Could you suggest a solution with a simple volatile bool flag?
>>
>> You use a flag to indicate that you should be stopping, and a
>> semaphore to handle the wakeup (semaphores are intended for one task
>> to give them, and another to take them).
>>
>> Stop messing around with varying delays - that is an irrelevancy, and
>> is complicating your code.  Either your sensor task is running with
>> regular samples, or it is stopped.
>
> Honestly, taskB should change the sampling interval too.
> Anyway the big issue is stopping and restarting, not changing the
> interval.
>
>> One possible arrangement could be:
>> [...]
>> The "sensor_stopped" flag is really just an optimisation, so that
>> TaskA doesn't have to keep checking the semaphore.  It can be dropped
>> if you want.
>
> I'm not sure about xSemaphoreTake(sensor_semaphore, 0) in
> turn_off_sensor(). You pass a zero timeout. Are you sure you will be
> able to take the semaphore in *every* situation? What happens if
> turn_off_sensor() is run immediately after xSemaphoreTake(...,
> portMAX_DELAY) returns?
> In this case, I think taskB wouldn't be able to take the semaphore and
> so taskA doesn't really block. You will have sensor_stopped true, but
> the semaphore released.
A Take with a zero timeout will lower the semaphore if it happens to be raised, and continue if it was lowered. This will cause task A to go and block on the semaphore until task B gives it again. The take operation returns a flag indicating whether it succeeded, and TaskA should really use:

  if (xSemaphoreTake(sensor_semaphore, portMAX_DELAY)) {
      xSemaphoreGive(sensor_semaphore);
  }

Otherwise, if something causes the Take to abort, then the sampling will start back up.

I would probably do the take before setting the flag, so that if TaskA runs between the two statements it doesn't do its take, and passes through.
>> A really simple solution would be:
>>
>> static volatile bool sensor_stopped;
>>
>> // TaskA
>> static const TickType_t delay_ticks = pdMS_TO_TICKS(10);
>>
>> while (true) {
>>     if (!sensor_stopped) {
>>         // Get sample and process
>>     }
>>     vTaskDelay(delay_ticks);    // DelayUntil
>> }
>>
>> // TaskB
>> static void turn_off_sensor(void) {
>>     sensor_stopped = true;
>> }
>>
>> static void turn_on_sensor(void) {
>>     sensor_stopped = false;
>> }
>>
>> This all depends on why you want to stop the sensor task - if you are
>> going into low power modes, for example, then it may be best to block
>> the task.  If you don't mind a regular task doing nothing useful, then
>> the second choice is fine.
>
> Yes, this is another solution that doesn't really stop the task, but
> only the sampling process.
On 23/03/2020 13:10, Richard Damon wrote:
> On 3/19/20 10:31 AM, pozz wrote:
>> On 19/03/2020 12:12, David Brown wrote:
>>> One possible arrangement could be:
>>> [...]
>>> The "sensor_stopped" flag is really just an optimisation, so that
>>> TaskA doesn't have to keep checking the semaphore.  It can be dropped
>>> if you want.
>>
>> I'm not sure about xSemaphoreTake(sensor_semaphore, 0) in
>> turn_off_sensor(). You pass a zero timeout. Are you sure you will be
>> able to take the semaphore in *every* situation? [...]
>
> A Take with a zero timeout will lower the semaphore if it happens to be
> raised, and continue if it was lowered. This will cause task A to go
> and block on the semaphore until task B gives it again. The take
> operation returns a flag indicating whether it succeeded, and TaskA
> should really use:
>
>   if (xSemaphoreTake(sensor_semaphore, portMAX_DELAY)) {
>       xSemaphoreGive(sensor_semaphore);
>   }
If portMAX_DELAY is used in xSemaphoreTake(), I think the function returns only if it was able to take the semaphore (assuming we don't use functions that abruptly force a task out of the blocked state).
> Otherwise, if something causes the Take to abort, then the sampling
> will start back up.
>
> I would probably do the take before setting the flag, so if TaskA runs
> between the statements it doesn't do its take, and passes through.
Anyway my objection is still there. If turn_off_sensor() is executed immediately after xSemaphoreTake(sensor_semaphore, portMAX_DELAY) returns, taskB wouldn't be able to take the semaphore, so taskA will wrongly continue doing its work.

I think we can use portMAX_DELAY in turn_off_sensor() too. In the worst case, taskB would wait for a very short time to take the semaphore, but usually it takes the semaphore immediately.
On 23/03/2020 13:00, Richard Damon wrote:
> On 3/23/20 4:09 AM, David Brown wrote:
>> On 23/03/2020 01:50, Richard Damon wrote:
>>> [...]
>>
>> It's also worth noting that even though "volatile" is not sufficient
>> for bigger systems (multiple bus masters, like multiple cores or DMA),
>> it is generally still /necessary/ along with the memory barriers,
>> cache flushes, synchronisation instructions, locks, etc.
>>
>> Sometimes with modern C or C++ you can use atomic types rather than
>> volatile, since the atomic types are effectively volatile in
>> themselves (as well as having other properties and synchronisation
>> features).
>>
>> And of course compiler-specific features can replace the need of
>> volatile (such as using a gcc asm memory clobber).
> I will admit that it isn't my area of expertise, as most of my
> programming is done on machines where volatile is sufficient for most
> of the needs, but I remember arguments that for the big machines
> volatile wasn't needed at all, as you needed a barrier that said
> everything that happened before this will be done before anything that
> happens after this, and with such a barrier volatile isn't needed.
Yes - if you have such a barrier, volatile is generally not needed (that was one of the exceptions I gave). But the barrier needs to be defined correctly. If you write:

    uint64_t data;
    bool ready_flag;

    data = 123;
    asm ("dmb");
    ready_flag = true;

then the "dmb" instruction will form a memory barrier - all previously encountered memory operations are completed before the "dmb" happens, and none of the following memory operations are started. But there is nothing stopping the compiler re-arranging the code like:

    ready_flag = true;
    asm ("dmb");
    data = 123;

Instructions like "dmb" force an order on the cpu operations, not the compiler - while "volatile" enforces a partial order on the compiler, but not the hardware.

The standard solution would be asm("dmb" ::: "memory"), where the memory clobber forces an ordering on the compiler - and thus you can (usually) omit the "volatile".
> It may well be, and I think it is, that the Standards adopted weaker
> barriers (that are perhaps significantly cheaper to perform) and with
> those weaker barriers we still have a need for volatile.
The C11 and C++11 atomics have a number of barriers - some stronger, some weaker in various senses. Prior to C11 and C++11, there were no standard barriers at all.
> The one key thing that I remember is that the compiler needs to
> understand these barriers, as it needs to avoid moving accesses across
> them.
Yes.
On 23/03/2020 13:47, pozz wrote:
> On 23/03/2020 13:10, Richard Damon wrote:
>> On 3/19/20 10:31 AM, pozz wrote:
>>> On 19/03/2020 12:12, David Brown wrote:
>>>> One possible arrangement could be:
>>>> [...]
>>>
>>> I'm not sure about xSemaphoreTake(sensor_semaphore, 0) in
>>> turn_off_sensor(). You pass a zero timeout. [...]
>>
>> A Take with a zero timeout will lower the semaphore if it happens to
>> be raised, and continue if it was lowered. This will cause task A to
>> go and block on the semaphore until task B gives it again. The take
>> operation returns a flag indicating whether it succeeded, and TaskA
>> should really use:
>>
>>   if (xSemaphoreTake(sensor_semaphore, portMAX_DELAY)) {
>>       xSemaphoreGive(sensor_semaphore);
>>   }
>
> If portMAX_DELAY is used in xSemaphoreTake(), I think the function
> returns only if it was able to take the semaphore (if we don't use
> functions that abruptly force a task out of the blocked state).
>
>> Otherwise, if something causes the Take to abort, then the sampling
>> will start back up.
>>
>> I would probably do the take before setting the flag, so if TaskA runs
>> between the statements it doesn't do its take, and passes through.
>
> Anyway my objection is still there. If turn_off_sensor() is executed
> immediately after xSemaphoreTake(sensor_semaphore, portMAX_DELAY)
> returns, taskB wouldn't be able to take the semaphore, so taskA will
> wrongly continue doing its work.
Yes, I think the code is not quite right. I am not happy about it anyway - it doesn't feel good. But as I said before, I think you would be better with a re-think of the design at a higher level.
> I think we can use portMAX_DELAY in turn_off_sensor() too. In the
> worst case, taskB would wait for a very short time to take the
> semaphore, but usually it takes the semaphore immediately.
That depends on the timing requirements, but could work.
On 3/23/20 8:47 AM, pozz wrote:
> Anyway my objection is still there. If turn_off_sensor() is executed
> immediately after xSemaphoreTake(sensor_semaphore, portMAX_DELAY)
> returns, taskB wouldn't be able to take the semaphore, so taskA will
> wrongly continue doing its work.
Yes, if TaskB has priority greater than or equal to TaskA, then it could come in between the Take and the Give. One solution would be to make the Give back conditional on not being stopped, as well as the getting and processing of the sample. Something like:

    while (true) {
        if (sensor_stopped) {
            xSemaphoreTake(sensor_semaphore, portMAX_DELAY);
            if (!sensor_stopped) {
                xSemaphoreGive(sensor_semaphore);
            }
        } else {
            // Get sample and process
            vTaskDelay(delay_ticks);
        }
    }

With this, if TaskB just sets the stopped flag, TaskA will take the semaphore, not give it, then loop around and take it again and wait. If TaskB does a 0-timeout take, it can prevent that extra loop. If TaskA gets accidentally woken up, then it still doesn't give the semaphore and blocks on the next loop.
On 24/3/20 12:21 am, David Brown wrote:
> Instructions like "dmb" force an order on the cpu operations, not the
> compiler - while "volatile" enforces a partial order on the compiler,
> but not the hardware.
>
> The standard solution would be asm("dmb" ::: "memory"), where the
> memory clobber forces an ordering on the compiler - and thus you can
> (usually) omit the "volatile".
David, if you have time, many of us would appreciate a quick summary of the situations where the different barriers should be used.

"dmb" can be a full read/write barrier, or just a write barrier. When to use each?

"dsb" is different from "dmb" because it limits instruction ordering, but when is that useful? When to use it as a write-only barrier?

"isb" is intended to precede a context switch, i.e. task switching in an RTOS. Is it sufficient for that, and is that the only time to use it?

If you know a good article that gives practical guidelines, post a link; otherwise I'd really like to hear your thoughts.

Clifford Heath.