Modern Embedded Systems Programming: Beyond the RTOS
An RTOS (Real-Time Operating System) is the most widely accepted way of designing and implementing embedded software. It is the most sought-after component of any system that outgrows the venerable "superloop". But it is also a design strategy that implies a certain programming paradigm, one that leads to particularly brittle designs that often work only by chance. I'm talking about sequential programming based on blocking.
Blocking occurs any time you wait explicitly in-line for something to happen. All RTOSes provide an assortment of blocking mechanisms, such as time delays, semaphores, event flags, mailboxes, message queues, and so on. Every RTOS task, structured as an endless loop, must use at least one such blocking mechanism, or else it will take all the CPU cycles. Typically, however, tasks block not in just one place in the endless loop, but in many places scattered throughout various functions called from the task routine. For example, in one part of the loop a task can block and wait for a semaphore that indicates the end of an ADC conversion. In another part of the loop, the same task might wait for an event flag indicating a button press, and so on.
This excessive blocking is evil, because it appears to work initially, but almost always degenerates into an unmanageable mess. The problem is that while a task is blocked, it is not doing any other work and is not responsive to other events. Such a task cannot be easily extended to handle new events, not just because the system is unresponsive, but mostly because the whole structure of the code past the blocking call is designed to handle only the event that it was explicitly waiting for.
You might think that the difficulty of adding new features (events and behaviors) to such designs matters only later, when the original software is maintained or reused for the next similar project. I disagree. Flexibility is vital from day one. Any application of nontrivial complexity is developed over time by gradually adding new events and behaviors. Inflexibility makes it exponentially harder to grow and elaborate such an application, so the design quickly degenerates through a process known as architectural decay.
The mechanisms of architectural decay of RTOS-based applications are manifold, but perhaps the worst is the unnecessary proliferation of tasks. Designers, unable to add new events to unresponsive tasks, are forced to create new tasks, regardless of coupling and cohesion. Often the new feature uses the same data and resources as an already existing feature (such features are called cohesive). But unresponsiveness forces you to add the new feature in a new task, which requires care when sharing the common data. So mutexes and other such *blocking* mechanisms must be applied, and the vicious cycle tightens. The designer ends up spending most of the time not on the feature at hand, but on managing subtle, intermittent, unintended side effects.
For these reasons experienced software developers avoid blocking as much as possible. Instead, they use the Active Object design pattern. They structure their tasks in a particular way, as "message pumps", with just one blocking call at the top of the task loop, which waits generically for all events that can flow to this particular task. Then, after this blocking call the code checks which event actually arrived, and based on the type of the event the appropriate event handler is called. The pivotal point is that these event handlers are not allowed to block, but must quickly return to the "message pump". This is, of course, the event-driven paradigm applied on top of a traditional RTOS.
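A minimal sketch of such a "message pump" in C follows. It assumes a hypothetical `EventQueue` ring buffer in place of a real RTOS message queue, so the one blocking call is simulated by a non-blocking `queue_get()` that reports when the queue is empty. All names (`Event`, `queue_post()`, `ao_task_loop()`, the signals) are illustrative and not taken from any particular RTOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event signals for one active object. */
typedef enum { SIG_ADC_DONE, SIG_BUTTON_PRESSED, SIG_TIMEOUT } Signal;

typedef struct {
    Signal sig;
    uint16_t data;   /* optional parameter, e.g. an ADC reading */
} Event;

#define QUEUE_LEN 8U

/* Minimal ring-buffer event queue (stands in for an RTOS message queue). */
typedef struct {
    Event buf[QUEUE_LEN];
    uint8_t head;
    uint8_t tail;
    uint8_t count;
} EventQueue;

static void queue_post(EventQueue *q, Event e) {
    assert(q->count < QUEUE_LEN);       /* overflow is a design error */
    q->buf[q->head] = e;
    q->head = (uint8_t)((q->head + 1U) % QUEUE_LEN);
    q->count++;
}

/* On a real RTOS this would be the ONE blocking call (e.g. a queue receive
 * with an infinite timeout). Here it only reports whether an event was
 * available, so the sketch runs without an RTOS. */
static bool queue_get(EventQueue *q, Event *e) {
    if (q->count == 0U) {
        return false;
    }
    *e = q->buf[q->tail];
    q->tail = (uint8_t)((q->tail + 1U) % QUEUE_LEN);
    q->count--;
    return true;
}

/* State touched only by this task, so no mutex is needed. */
static uint16_t last_adc;
static unsigned presses;
static unsigned timeouts;

/* Event handlers: each runs to completion and never blocks. */
static void on_adc_done(uint16_t value) { last_adc = value; }
static void on_button(void)             { presses++; }
static void on_timeout(void)            { timeouts++; }

/* The "message pump": wait generically for any event, then dispatch. */
static void ao_task_loop(EventQueue *q) {
    Event e;
    while (queue_get(q, &e)) {          /* real task: for (;;) { block; ... } */
        switch (e.sig) {
        case SIG_ADC_DONE:       on_adc_done(e.data); break;
        case SIG_BUTTON_PRESSED: on_button();         break;
        case SIG_TIMEOUT:        on_timeout();        break;
        }
    }
}
```

Because the ADC reading, the press count, and the timeout count are touched only from within this one task, none of them needs a mutex; the price is that no handler may block.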
While you can implement this pattern manually on top of a conventional RTOS, an even better way is to implement it as a software framework, because a framework is the best known method to capture and reuse a software architecture. In fact, you can already see such a framework starting to emerge: the "message pump" structure is identical for all tasks, so it can become part of the framework rather than being repeated in every application.
This also illustrates the most important characteristic of a framework, called inversion of control. When you use an RTOS, you write the main body of each task and you call code from the RTOS, such as delay(). In contrast, when you use a framework, you reuse the architecture, such as the "message pump" here, and write the code that it calls. Inversion of control is characteristic of all event-driven systems. It is the main reason for architectural reuse and the enforcement of best practices, as opposed to re-inventing them for each project at hand.
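To make the inversion of control concrete, here is a minimal C sketch in which the "framework" part owns the event loop (`ao_run()`) and the application supplies only a dispatch function. The names (`ActiveObject`, `ao_post()`, `ao_run()`, `Counter`) are hypothetical, and the loop returns when the queue is empty instead of blocking, so the sketch can run without an RTOS.

```c
#include <assert.h>
#include <stddef.h>

/* --- "Framework" side (hypothetical names, not any real product's API) --- */
typedef struct { int sig; int param; } Event;

typedef struct ActiveObject ActiveObject;
typedef void (*DispatchFn)(ActiveObject *me, const Event *e);

#define AO_QUEUE_LEN 4

struct ActiveObject {
    DispatchFn dispatch;              /* supplied by the application */
    Event queue[AO_QUEUE_LEN];        /* FIFO ring buffer */
    size_t head, tail, count;
};

static void ao_post(ActiveObject *me, Event e) {
    assert(me->count < AO_QUEUE_LEN);
    me->queue[me->head] = e;
    me->head = (me->head + 1) % AO_QUEUE_LEN;
    me->count++;
}

/* The framework owns the loop and calls the application's code. A real
 * framework would block here when the queue is empty; the sketch returns. */
static void ao_run(ActiveObject *me) {
    while (me->count > 0) {
        Event e = me->queue[me->tail];
        me->tail = (me->tail + 1) % AO_QUEUE_LEN;
        me->count--;
        me->dispatch(me, &e);         /* inversion of control */
    }
}

/* --- Application side: only the event handling is written here --- */
enum { SIG_TICK = 1, SIG_RESET };

typedef struct {
    ActiveObject super;               /* "inherits" from ActiveObject */
    int ticks;
} Counter;

static void counter_dispatch(ActiveObject *me, const Event *e) {
    Counter *c = (Counter *)me;       /* valid: super is the first member */
    switch (e->sig) {
    case SIG_TICK:  c->ticks += e->param; break;
    case SIG_RESET: c->ticks = 0;         break;
    default:        break;              /* ignore unknown events */
    }
}
```

For example, posting two `SIG_TICK` events with parameters 3 and 4 and then calling `ao_run()` leaves `ticks` at 7. The application never writes a loop or calls a blocking primitive; it only reacts to the events the framework hands it.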
But there is more, much more to the Active Object framework. For example, a framework like this can also provide support for state machines (or better yet, hierarchical state machines), with which to implement the internal behavior of active objects. In fact, this is exactly how you are supposed to model the behavior in the UML (Unified Modeling Language).
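As a rough illustration of the hierarchical idea, the sketch below represents each state as a handler function plus a pointer to its superstate; an event not handled by the current state is offered to its parent. This is a deliberate simplification: it omits entry/exit actions and the other UML state machine semantics a real framework implements, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signals for a tiny device. */
typedef enum { SIG_START, SIG_DONE, SIG_POWER_OFF, SIG_POWER_ON } Signal;

typedef struct Hsm Hsm;
typedef struct State State;

/* A handler returns nonzero if it handled the event, 0 to defer to its parent. */
typedef int (*Handler)(Hsm *me, Signal sig);

struct State {
    const State *parent;   /* NULL for a top-level state */
    Handler handler;
};

struct Hsm {
    const State *current;
    unsigned jobs_done;
};

static const State s_on, s_off, s_idle, s_busy;   /* forward declarations */

static void tran(Hsm *me, const State *target) { me->current = target; }

/* Superstate "on": handles POWER_OFF once, for all of its substates. */
static int on_handler(Hsm *me, Signal sig) {
    if (sig == SIG_POWER_OFF) { tran(me, &s_off); return 1; }
    return 0;
}
static int idle_handler(Hsm *me, Signal sig) {
    if (sig == SIG_START) { tran(me, &s_busy); return 1; }
    return 0;
}
static int busy_handler(Hsm *me, Signal sig) {
    if (sig == SIG_DONE) { me->jobs_done++; tran(me, &s_idle); return 1; }
    return 0;
}
static int off_handler(Hsm *me, Signal sig) {
    if (sig == SIG_POWER_ON) { tran(me, &s_idle); return 1; }
    return 0;
}

static const State s_on   = { NULL,  on_handler };
static const State s_idle = { &s_on, idle_handler };
static const State s_busy = { &s_on, busy_handler };
static const State s_off  = { NULL,  off_handler };

/* Dispatch: try the current state, then walk up through its superstates.
 * An event handled nowhere is simply ignored. */
static void hsm_dispatch(Hsm *me, Signal sig) {
    assert(me->current != NULL);
    for (const State *s = me->current; s != NULL; s = s->parent) {
        if (s->handler(me, sig)) {
            return;
        }
    }
}
```

The payoff of the hierarchy is that `SIG_POWER_OFF` is handled in one place, the `on` superstate, instead of being repeated in both `idle` and `busy`.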
As it turns out, active objects provide a sufficiently high, and the right, level of abstraction to apply modeling effectively. This is in contrast to a traditional RTOS, which does not provide the right abstractions. You will not find threads, semaphores, or time delays in the standard UML, but you will find active objects, events, and hierarchical state machines.
An AO framework and a modeling tool beautifully complement each other. The framework benefits from a modeling tool to take full advantage of the very expressive graphical notation of state machines, which are the most constructive part of the UML.
In summary, the RTOS and the superloop aren't the only game in town. Actor frameworks, such as Akka, are becoming all the rage in enterprise computing, but active object frameworks are an even better fit for deeply embedded programming. After working with such frameworks for over 15 years, I believe that they represent a quantum leap of improvement over the RTOS similar to the one the RTOS represents with respect to the “superloop”.
If you'd like to learn more about active objects, I recently posted a presentation on SlideShare: Beyond the RTOS: A Better Way to Design Real-Time Embedded Software
Comments
But what you're really saying is that an RTOS, used incorrectly, is bad. Then you advance your method, but you talk about how it'll work when used well.
As a counter-example, I once inherited some Very Bad Code that used your "automagically good" active object design. I could describe the code in a couple of paragraphs, but the bottom line is that the best solution would have been to split that one task into three, with two of the tasks changed from message pumps to simple loops, each of which polled the state of a switch. What I had to do (because of time constraints) was to take the incoming messages and write a good old-fashioned task loop that emulated those three tasks. It was pretty close to what you recommend, and it was a horror because it was done wrong.
The answer to software engineers going off the rails and doing stupid things isn't nifty new technologies -- it's to hire disciplined people to your team, reward them for being disciplined, and use design reviews to keep their work disciplined. All the candy coating in the world won't make the brown stuff taste like chocolate if it started out as something fundamentally different.
Specifically to your counter-example, active objects are not particularly suitable for doing low-level stuff, such as switch de-bouncing. In your particular case, the events fed to the active object were apparently very low-level and low-quality. These low-level events needed to be turned into higher-level events first, before being really useful to drive a nicely-structured state machine. But, because the production of higher-level events was conflated with real work, it looked like a mess. To me it's another example of the old principle: "garbage in, garbage out".
For that reason, all examples that ship with the QP frameworks perform switch de-bouncing in the ISRs (yes, using the simple switch statement). This is meant to show that the job of ISRs is to produce high-level, high-quality events to drive active objects.
However, I fundamentally disagree with your conclusion that we should not look for better methods and patterns, but instead base everything on heroism of disciplined teams.
But I am merely trying to show fellow embedded developers that the RTOS is no longer the last step or the only game in town. There are other ways of designing real-time embedded (RTE) software (even if you think of active objects as just another way of using the RTOS), which lead to a paradigm shift. This paradigm shift opens up many new possibilities not really available with the "naked" RTOS, such as state machine modeling and truly useful code generation, to name just a few.
So, yes, I absolutely agree that the most common cause of bugs is human error. But instead of just stating this fact and offering only "iron discipline", I'm trying to be more constructive. Code generation, for example, eliminates a lot of human errors.
Finally, I would very much like to set the record straight by saying that the Active Object design pattern is *not* a silver bullet. It obviously comes with problems of its own. But after working with both the traditional RTOS and Active Object frameworks for over 15 years, I must say that the problems with Active Object are easier to handle than "free-threading" with an RTOS.
7 of them have their own dedicated compare timer register for 1ms to 60sec delays.
All events are state-machines (switch and case) and each state is 2-10 lines.
An event is started by a button IRQ, a timer IRQ, etc., or a currently running event can start other events.
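One possible reading of this approach, sketched in C: a switch/case state machine whose short states are advanced from a button IRQ and a timer compare-match IRQ. The compare register, the LED pin, and the delay values are simulated with plain variables and are purely hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one dedicated timer compare register. */
static uint32_t compare_ms;          /* 0 = timer disarmed */
static void timer_arm(uint32_t ms) { compare_ms = ms; }

typedef enum { ST_IDLE, ST_LED_ON, ST_LED_OFF } BlinkState;

static BlinkState state = ST_IDLE;
static int led;                      /* stands in for a GPIO pin */
static unsigned blinks_left;

/* Advanced from the button IRQ and the compare-match IRQ;
 * each case is short and returns immediately. */
static void blink_step(void) {
    switch (state) {
    case ST_IDLE:                    /* started by the button IRQ */
        blinks_left = 2U;
        led = 1; timer_arm(100U);
        state = ST_LED_ON;
        break;
    case ST_LED_ON:                  /* timer IRQ: on-time elapsed */
        led = 0; timer_arm(400U);
        state = ST_LED_OFF;
        break;
    case ST_LED_OFF:                 /* timer IRQ: off-time elapsed */
        assert(blinks_left > 0U);
        if (--blinks_left > 0U) {
            led = 1; timer_arm(100U);
            state = ST_LED_ON;
        } else {
            timer_arm(0U);           /* done: disarm the timer, go idle */
            state = ST_IDLE;
        }
        break;
    }
}

static void button_irq(void) { if (state == ST_IDLE) { blink_step(); } }
static void timer_irq(void)  { if (state != ST_IDLE) { blink_step(); } }
```

A button press while the sequence is running is ignored here; whether to queue, restart, or ignore such presses is exactly the kind of decision each state has to make explicitly.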
I guess it would be helpful if we could all agree on terminology. Unfortunately, I see a lot of confusion in our field of real-time embedded programming, to the point that it is almost impossible to discuss anything. I'm not sure why that is...
What I find strange is that, at least in my experience, experienced developers tend to marry a design technique and stick with it to the point of neglecting any other good tool or technique. They won't even check it out for themselves. It's amazing how they react whenever I show them something new to try. The answers are always the same: "Sounds good, but I'll stick with my method", "Sounds great ..." (but then nothing changes), "I don't know ...", "It's great, but I don't have the time to learn anything new at this stage of the project", and the like. And if you really want to hear stupid answers, ask any developer: "Why don't you write tests for your code?"