## Synchronizing access to shared bits array from several tasks

Started by 5 years ago11 replieslatest reply 5 years ago1142 views

Hello,

I have been developing control software based on #FreeRTOS. I have divided the
whole software into several tasks (the first executed every 1 ms, the second
executed every 20 ms and third executed every 100 ms). Each task call several
me-defined functions. These functions have as one of their parameters some bit
variables (for example signal for blocking the PI controller or ramp calculation).
These bit variables are defined in a "bit array" which is shared by the tasks.
I have written several functions for manipulating with the bits in the bits array.
But I have problem with synchronizing the access to the bits array from the tasks.
I have tried to take semaphore (mutex) at the beginnig of each bit handling
function and then release the semaphore (mutex) at the end. But these tries were
unsuccessful. The bits in the bits array are randomly overwritten by the tasks
and all the timing of the software is corrupted. I don't know where the problem
is. I am also not sure if my "architecture" with the shared bits array is the
right one. Can anybody more experienced give me any advice or recommendation how
to solve this problem? Thanks in advance for any suggestions.

[ - ]
Reply by February 20, 2017

Hi,

My first guess would be that you are not checking if the mutex/semaphore is already taken when you are accessing the data. The mutex/semaphore itself will not protect access if you don't check its state before updating your shared data.

A good example how to use it: http://www.freertos.org/a00122.html

And some background:

Mutex vs Semaphore: "Though mutex & semaphores are used as synchronization primitives ,there is a big difference between them. In the case of mutex, only the thread that locked or acquired the mutex can unlock it. In the case of a semaphore, a thread waiting on asemaphore can be signaled by a different thread "

From the FreeRTOS site:

Mutexes and binary semaphores are very similar but have some subtle differences: Mutexes include a priority inheritance mechanism, binary semaphores do not. This makes binary semaphores the better choice for implementing synchronisation (between tasks or between tasks and an interrupt), and mutexes the better choice for implementing simple mutual exclusion.

The priority of a task that 'takes' a mutex will be temporarily raised if another task of higher priority attempts to obtain the same mutex. The task that owns the mutex 'inherits' the priority of the task attempting to 'take' the same mutex. This means the mutex must always be 'given' back - otherwise the higher priority task will never be able to obtain the mutex, and the lower priority task will never 'disinherit' the priority.

[ - ]
Reply by February 20, 2017

It's difficult to say with out seeing the source. Definitely the other comments should be followed. It sounds like bit array accesses are not truely being accessed as a critical section.

A few things to try:

1) Hunt down all areas of the code that are accessing the bit arrays and verify that they are protected by the semaphore.

2) Make sure that if the semaphore times out, that the bit array is not accessed

3) The best way to visually see what is happening would be to download Percepio Tracealyzer and view the trace for the issue. It will let you see when the semaphore is acquired, returned and when tasks are pre-empting. (Alternatively you could try Segger SysView)

[ - ]
Reply by February 20, 2017

The mutex should be working.... chances are the code is still missing some location where the bit array is manipulated.

If the code passes the bit array as a parameter to manipulating functions, that could also be part of the issue. Depending upon how the array is passed, it may be copying the bit array outside the mutex locking protection. Probably not... but worth considering in the code review.

Providing sample code may help...

Matthew

https://covemountainsoftware.com/consulting/

[ - ]
Reply by February 20, 2017

Another point: consider a change in architecture.

Instead of 3 tasks, perhaps reduce the architecture to a single task with with 3 freertos timer events feeding an event queue for the single task. The single task would just wait forever on a queue msg which would receive events.

Events might be:

1_MS_EVENT

20_MS_EVENT

100_MS_EVENT

Then the code could potentially eliminate the mutual exclusion issue entirely.

Concern: a 1ms timer is probably an issue (jitter, maybe overhead and impact on other FreeRTOS timers) depending upon the FreeRTOS tick rate configuration. Might need to use a HW timer to create the event, and then use http://www.freertos.org/a00119.html to inject the 1ms event into your queue.

I'm a big fan of "event driven" architecture. See here:

and some sample/demo code (using FreeRTOS) here:

Best regards,

Matthew

[ - ]
Reply by February 20, 2017

I've worked on systems that built task loops inside of RTOS tasks.  They were, uniformly, disasters.  Why make a task loop if you've got a perfectly good RTOS?  Why use an RTOS if just a task loop will do?

[ - ]
Reply by February 20, 2017

Hi Tim,

If I understand your concern, I agree with you: while(1) style loops that are just checking bit flags or event flag variables are problematic, and quickly devolve into spaghetti code as the complexity and feature count managed by the code increase. That being said, multi-threaded software has areas of concern too.

My suggestion was limited to the minimal details provided and I was not trying to advocate creating a new type of task loop nor elimination of the RTOS. I was suggesting an "event driven" approach for this specific case where three timer/polling style tasks are accessing the same exclusive resource. The benefits of my proposal:

• Eliminates two task stack allocations, saving ram.
• Eliminates mutual exclusion overhead, saving CPU time and any RAM dedicated to the mutex.
• This assumes that the three threads mentioned are in fact the only tasks/threads in the system that access the shared data.
• Single thread that simply waits on queued messages (events), which will now always be guaranteed to be processed in order.

My proposal is based on the concepts presented in one of my favorite books: Miro Samek’s book Practical UML Statecharts in C/C++: Event-Driven Programming for Embedded Systems.

FWIW, every system I've worked on that follows the event driven paradigm has been successful, easy to understand, and easy to expand.

Best regards,

Matthew

[ - ]
Reply by February 20, 2017

Thanks for replying, and being nice in the face of my strongly worded reaction to your suggestion.

You remind me that I did not raise a concern with the OP: is he creating this common resource unnecessarily?  If the flag bits in his bit field are unrelated and he's putting them into one variable for the sake of parsimony, he may want to reevaluate his approach.  If he's setting bits based on external events then for at least some of them there's a good chance that he should just have tasks to service those events.

(I have seen while(1) loops work reasonably well -- but one has to be very structured in one's approach to dealing with them, and to avoid the spaghetti code in maintenance one must approach the maintenance task with as much care as if one were designing from scratch.  I've seen it work, and work well, but only when you're solving small problems.)

[ - ]
Reply by February 20, 2017

Without knowing your code or your intent I can't be entirely sure what's going on.

A suggestion -- which will probably take some work at this point -- is to hide your bitfield behind some sort of semantic barrier in your code: either put it inside a class if you're coding in C++, or make it a static local variable in a file if you're coding in C.  Then access that variable with accessor functions that include the necessary mutexes.

The nice thing about this is that the means to modify the bitfield is enforced by the tools, so if you can do it at all you'll be doing it right.

In C, this would be something like:

static int bitfield = 0;
int check_bit(int mask)
{
return bitfield & mask;
}

int set_bit(int mask)
{
retval = -1;
if (grab_mutex_successful())
{
bitfield |= mask;
release_mutex();
retval = 0;
}
return retval;
}

int reset_bit(int mask)
{
retval = -1;
if (grab_mutex_successful())
{
bitfield &= ~mask;
release_mutex();
retval = 0;
}
return retval;}
[ - ]
Reply by February 20, 2017

You should not be timing the controller in code with a bitfield -- you have an RTOS for that.  Unless you're tightly resource constrained you should separate the job into logical tasks, and make an RTOS task for each logical task.

This almost certainly means that you want to put the controller in its own task that wakes up and does its business when it needs to.  Ditto your ramp generator, and any other bits and bobs.

If you find that you have one task of the form "if this flag is sent then do A, if that flag is set then do B, etc.", then you've just reenvinted the task loop and put it inside an RTOS task, and that is usually not a good idea.  The whole reason for an RTOS is to take everything out of task loops.

[ - ]
Reply by February 21, 2017

Thank you very much for your help Tim Wescott. My idea behind the bits array isn't to completely block/unblock the processing of PI controller or ramp functions in a task. Both functions are processed with the execution period of the task in which they are placed. The bits are used only as "signals" to set some prescribed value at the output of the ramp or PI controller (bypass the calculation of their difference equations) as a reaction to some event in the controlled system. One of the events is for example transition into "run" state in system's finite state machine. But it isn't the only one event. For example the PI controller function has some bit signaling that the action variable is for example at the allowable maximum or minimum value.

[ - ]
Reply by February 21, 2017

Two solutions that I have used for this in the past (other than the one I mentioned):

First, if the only thing that matters is the instantaneous value of the bit and there's no two-way communications going on, use one bit per variable, and use variables that are atomic types for the processor.  If you do this you'll want to double-check the assembly code, and you'll want to be aware that your code may not be fully portable.  If you're writing to the variable from more than one task, this may not be a good way to go.

Second, you can protect an operation like that by turning off all interrupts, making your change, and turning on all interrupts.  I mention this with a bit of trepidation, because I've seen people get entirely too enthusiastic about using this and screw up systems because of it.  Basically, if you go turning off interrupts then whatever time is spent with interrupts off increases the worst-case interrupt latency, and by extension the worst-case task-switch time.  However, for things like this you can do the deed in maybe 50 times fewer clock ticks than by going through the OS.