Mutex vs. Semaphores – Part 2: The Mutex & Mutual Exclusion Problems
Part 1 of this series we looked at the history of the binary and counting semaphore, and then went on to discuss some of the associated problem areas. In this posting I aim to show how a different RTOS construct, the mutex, may overcome some, if not all, of these weaknesses.
To address the problems associated with semaphore, a new concept was developed during the late 1980’s. I have struggled to find it’s first clear definition, but the major use of the term mutex (another neologism based around MUTual EXclusion) appears to have been driven through the development of the common programming specification for UNIX based systems. In 1990 this was formalised by the IEEE as standard IEEE Std 1003.1 commonly known as POSIX.
The mutex is similar to the principles of the binary semaphore with one significant difference: the principle of ownership. Ownership is the simple concept that when a task locks (acquires) a mutex only it can unlock (release) it. If a task tries to unlock a mutex it hasn’t locked (thus doesn’t own) then an error condition is encountered and, most importantly, the mutex is not unlocked. If the mutual exclusion object doesn’t have ownership then, irrelevant of what it is called, it is not a mutex.
The concept of ownership enables mutex implementations to address the problems discussed in part 1:
- Accidental release
- Recursive deadlock
- Task-Death deadlock
- Priority inversion
- Semaphore as a signal
As already stated, ownership stops accidental release of a mutex as a check is made on the release and an error is raised if current task is not owner.
Due to ownership, a mutex can support relocking of the same mutex by the owning task as long as it is released the same number of times.
With ownership this problem can be addressed using one of the following priority inheritance protocols:
- [Basic] Priority Inheritance Protocol
- Priority Ceiling Protocol
The Basic Priority Inheritance Protocol enables a low-priority task to inherit a higher-priorities task’s priority if this higher-priority task becomes blocked waiting on a mutex currently owned by the low-priority task. The low priority task can now run and unlock the mutex – at this point it is returned back to its original priority.
The details of the Priority Inheritance Protocol and Priority Ceiling Protocol (a slight variant) are covered later in this article.
If a task terminates for any reason, the RTOS can detect if that task current owns a mutex and signal waiting tasks of this condition. In terms of what happens to the waiting tasks, there are various models, but two dominate:
- All tasks readied with error condition;
- Only one task readied; this task is responsible for ensuring integrity of critical region.
When all tasks are readied, these tasks must then assume critical region is in an undefined state. In this model no task currently has ownership of the mutex. The mutex is in an undefined state (and cannot be locked) and must be reinitialized.
When only one task is readied, ownership of the mutex is passed from the terminated task to the readied task. This task is now responsible for ensuring integrity of critical region, and can unlock the mutex as normal.
Mutual Exclusion / Synchronisation
Due to ownership a mutex cannot be used for synchronization due to lock/unlock pairing. This makes the code cleaner by not confusing the issues of mutual exclusion with synchronization.
A specific Operating Systems mutex implementation may or may not support the following:
- Priority Inheritance
- Death Detection
Mutual Exclusion Problems
The mutex is a significantly safer mechanism to use for implementing mutual exclusion around shared resources. Nevertheless, there are still a couple of problems that use of the mutex (in preference to the semaphore) will not solve. These are:
- Circular deadlock
Circular deadlock, often referred to as the “deadly embrace” problem is a condition where two or more tasks develop a circular dependency of mutual exclusion. Simply put, one task is blocked waiting on a mutex owned by another task. That other task is also block waiting on a mutex held by the first task.
So how can this happen? Take as an example a small control system. The system is made up of three tasks, a low priority Control task, a medium priority System Identification (SI) task and a high priority Alarm task. There is an analogue input shared by the Control and the SI tasks, which is protected by a mutex. There is also an analogue output protected by a different mutex.
The Control task waits for mutexes ADC and DAC:
/* critical section */
The SI Task waits for mutexes DAC and ADC:
/* critical section */
Unfortunately, under certain timing conditions, this can lead to deadlock. In this example the Control task has locked the ADC, but before locking the DAC has been pre-empted by the higher priory SI task. The SI task then locks the DAC and tries to lock the ADC. The SI task is now blocked as the ADC is already owned by the Control task. The Control task now runs and tries to lock the DAC. It is blocked as the DAC is held by the SI task. Neither task can continue until the mutex is unlocked and neither mutex can be unlocked until either task runs – classic deadlock.
For circular deadlock to occur the following conditions must all be true:
- A thread has exclusive use of resources (Mutual exclusion)
- A thread can hold on to a resource(s) whilst waiting for another resource (Hold and wait)
- A circular dependency of thread and resources is set up (Circular waiting)
- A thread never releases a resource until it is completely finished with it (No resource preemption)
These conditions can be addressed in a number of ways. For example, a design policy may stipulate that if a task needs to lock more than one mutex it must either lock all or none.
Priority Ceiling Protocol
With the Priority Ceiling Protocol (PCP) method each mutex has a defined priority ceiling, set to that of the highest priority task which uses the mutex. Any task using a mutex executes at its own priority – until a second task attempts to acquire the mutex. At this point it has its priority raised to the ceiling value, preventing suspension and thus eliminating the “hold and wait” condition.
In the deadlock example shown before, the significant point is when the SI task tries to lock the DAC. Before that succeeded and lead to cyclic deadlock. However with a PCP mutex, both the ADC and DAC mutex will have a ceiling priority equal to the SI’s task priority. When the SI task tries to lock the DAC, then the run-time system will detect that the SI’s task priority is not higher than the priority of the locked mutex ADC. The run-time system suspends the SI task without locking the DAC mutex. The control task now inherits the priority of the SI task and resumes execution.
The last, but most important aspect of mutual exclusion covered in these ramblings relies on one founding principle: we have to rely on all tasks to access critical regions using mutual exclusion primitives. Unfortunately this is dependent on the design of the software and cannot be detected by the run-time system. This final problem was addressed by Tony Hoare, called the Monitor.
The monitor is a mechanism not typically supplied by the RTOS, but something the programmer tends to build (a notable exception is Ada95’s protected object mechanism). A monitor simply encapsulates the shared resource and the locking mechanism into a single construct (e.g. a C++ Object that encapsulates the mutex mechanism). Access to the shared resource, then, is through a controlled interface which cannot be bypassed (i.e. the application never explicitly calls the mutex, but calls upon access functions).
This goal of these initial postings is to demonstrate that common terms used in the real-time programming community are open to ambiguity and interpretation. Hopefully you should now be clear about the core differences between the Binary Semaphore, General (counting) Semaphore and the Mutex.
The underlying difference between the Semaphores and the Mutex is the Principle of Ownership. Given the principle of ownership a particular implementation of a mutex may support Recursion, Priority inheritance and Death Detection.
An aspect of the mutex I haven’t covered here is that many operating systems support the concept of a condition variable. A condition variable allows a task to wait on a synchronization primitive within a critical region. The whole aspect Synchronization Patterns (e.g. semaphore as a signal) within the context of RTOSs will be the subject of a future posting.
Very informative article. Many developers underestimate the risks and skills needed to use the various RTOS mechanisms safely and correctly and this article illuminates some of the problems and solutions.
I have a question regarding the use of a mutex: What do you think about thread calling a blocking mechanism (e.g., delay() or semaphore wait()) while already holding a mutex? How often do you see it in real-life projects?
Hi Miro, thanks for the comment.
The condition-object is there specifically to allow wait-semantics within a critical region. No thread should ever explicitly delay/wait in a critical region.
I've seen delays used but not sem-waits (I guess these get debugged pretty quickly). Actually, a more common, and more worrying, issue is that I've seen numerous companies using priority mechanisms instead of using mutual-exclusion!
For most applications, the mutex is still too granular for most application programmers and should ideally hidden behind a monitor construct.
Thank you for the great article Niall.
I've read explanations of semaphores and mutexes in different textbooks, but your brief article is much clearer and easy to understand.
I also appreciate your reference to Ada and its built in protected object feature. This is one of the constructs that drew me to learn the language and one I think many application writers would benefit from.
Thanks you for your kind words, they are much appreciated.
I was disappointed that when the Threading APIs for C11 and C++11 were published, concepts such as protected objects and aync-pipes were not part of the model.
One of the things that attracts me to Rust (a really viable alternative to Ada, more so than C) is that concepts such as RwLock and Pipes are built in, but alas, not protected object.
To address the problems associated with semaphore, a new concept was developed during the late 1980’s.Try the late 1960s or early 1970s. I'm not sure of who came up with the term first, but it is present in several textbooks and papers in the early 1970's, for example Operating System Principles
My understanding, but I'm always happy to be corrected, is that typically the examples used were still based around the 'type' semaphore but the variable name was 'mutex'.
I certainly wasn't aware of Mutex as a 'type' until the late-80's (when I happened to be working for Ready Systems - remember VRTX?)
Some commercial RTOSs have a mutex that temporarly bumps the priority of a lower task while the mutex is active. The goal is to prevent the 2-task deadlock as described in the article.
BTW - excellent article!
To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.
Registering will allow you to participate to the forums on ALL the related sites and give you access to all pdf downloads.