Memfault State of IoT Report

From Baremetal to RTOS: A review of scheduling techniques

Jacob BeningoJune 8, 201617 comments

Transitioning from bare-metal embedded software development to a real-time operating system (RTOS) can be a difficult endeavor. Many developers struggle with the question of whether they should use an RTOS or simply use a bare-metal scheduler. One of the goals of this series is to walk developers through the transition and decision making process of abandoning bare-metal thinking and getting up to speed quickly with RTOSes. Before diving into the details of RTOSes, the appropriate first step is review some of the most popular scheduler techniques available to embedded software developers.

From bare-metal to RTOS Quick Links

Technique #1 – Round Robin

The easiest scheduler to implement in software is the Round Robin scheduling technique. Round Robin is so easy to implement because it is a natural way of writing software. Developers simply add new code to the super loop and a task is now added. There is no need to create dynamic and memory expensive objects. Just add code!

This article is available in PDF format for easy printing

One of the biggest problems with a Round Robin approach to scheduling is that the real-time performance of the system is affected every time new code is added. The super loop might be executing once per 100 microseconds but suddenly add new code and now it’s 110 microseconds. That code pieced developed to execute perfectly every is now toast and has to be reworked. The fact of the matter is that using just a Round Robin approach results in constant rework to ensure system timing and can be very difficult to get predictable results for hard real-time requirements.

A simple example of how a Round Robin scheduler works can be seen in Figure 1.

Figure 1 - Round Robin Scheduling

Technique #2 – Round Robin with Interrupts

Bare-metal developers never allowed the issues with Round Robin to get them down. There are dozens of incremental techniques that can be used as a scheduler before having to roll out the big gun (RTOS). The next algorithm that developers have found quite useful over the years is Round Robin scheduling with interrupts. In these applications, any code that has hard real-time requirements or are timing sensitive are moved into an interrupt service routine. An example might be incoming UART data that if left too long could be overwritten by the next incoming data byte. The timing critical section, retrieve the data, is then handled as soon as the interrupt fires by moving the data byte into a circular buffer. Processing the data byte is usually a soft real-time requirement and can be handled by the super loop when it gets around to it.

The interrupt behavior can even be used to create pre-emptive behavior that for simple applications can mimic an RTOS. Careful selection of interrupt priorities and architecting the round robin loop can go a long way towards real-time and reliable behavior before needing to move on to other more advanced scheduling techniques. Figure 2 shows an example of what Round Round with interrupts would look like from a model point of view. On the left is the traditional Round Robin super loop. At any point, an interrupt could fire causes code execution to move from the left to the right and enter the interrupt service routine (ISR). The interrupt would then execute to completion, unless a priority interrupt fires, and then return where it left off on the left side of the diagram.

Figure 2 - Round Robin with Interrupts

Technique #3 – Queued 

Queued scheduling is an interesting technique that is an intermediate step between Round Robin with Interrupts and Cooperative scheduling. In most cases it makes sense to just use a Cooperative Scheduler, but it is worth taking a few moments to discuss Queued scheduling. Before discussing the details, examine Figure 3 for a moment. 

Figure 3 – Queue Scheduling Behavior

In Queued scheduling, an array of function pointers (queue) is created that initially contains only NULL pointers (Figure 3a). When a task needs to be executed, perhaps triggered by an interrupt for example, a pointer is inserted into the first NULL location. It is possible for multiple tasks to become ready for execution at the same time. When this occurs, the scheduler simply executes one task at a time (Figure 3b) in a Round Robin fashion until the queue is once again empty (Figure 3c and Figure 3D). Once a task has been executed, its function pointer location in the queue is returned to a NULL pointer.

Technique #4 – Cooperative

Cooperative schedulers are probably one of the most widely used bare-metal scheduling algorithms available to developers. A Cooperative scheduler resides in a very small memory footprint and can be easily configured to handle any number of tasks. A Cooperative scheduler consists of a simple structure that contains that contains a function pointer to the task to be executed, the period at which the task needs to be executed and last time the task was executed. Developers will often create an array of tasks that contain all of this critical task information.

The algorithm itself is pretty simple and straight forward. The system is first initialized and then a background timer is used to keep track of time. The scheduler reads the system tick and then loops through each of the tasks defined for the system and calculates whether it is time to execute the task or not. If so, the task is executed, otherwise the next task is checked. When more than one task needs to run, they are executed in a Round Robin fashion. Developers still need to take care to monitor their system execution times to make sure during the worst case, when all tasks have to execute, that all of the deadlines are still met. 

Developers interested in learning more about Cooperative schedulers can download an example here. The scheduler can be ran with any microcontroller and only requires that a single timer be setup to get the scheduler working.

Figure 4 – Cooperative Scheduler (High Level)

Technique #5 – Real-time Operating System

The last technique that we are going to touch on is the RTOS. The RTOS is the most powerful scheduler a real-time developer can use and also the most complicated. The RTOS can usually be configured to use a number of deterministic scheduling algorithms that guarantee task deadlines are met. Each task has its own separate task control block that contains its own stack, function and ID among other parameters. Each task can literally be viewed as its own separate application. The RTOS also comes with a wide variety of synchronization and communication tools such as semaphores, mutexes and message queues. 

RTOSes are a powerful tool for embedded software developers but can sometimes be overkill. For simple 8-bit or 16-bit applications a Cooperative scheduler may work perfectly fine and use far less memory. When developers are faced with using a complex 32-bit microcontrollers with more than 32 kB of flash and 4 kB of RAM, the use of a RTOS begins to make a lot of sense. Many 32-bit applications require the use of USB, TCP/IP and file systems which can be very difficult to develop for bare-metal applications. Most third party middleware is designed to integrate seamless with an RTOS.   


To use an RTOS or not use an RTOS, that is the real question. Each scheduling algorithm has its own advantages and disadvantages. We have examined the most commonly used scheduling algorithms which forms a basis for making such a determination. The next step is to learn when and why to use an RTOS. Stay tuned! 

Jacob Beningo is an embedded software consultant who currently works with clients in more than a dozen countries to dramatically transform their businesses by improving product quality, cost and time to market. He has published more than 200 articles on embedded software development techniques, is a sought-after speaker and technical trainer and holds three degrees which include a Masters of Engineering from the University of Michigan. Feel free to contact him here, at his website www.beningo.com, and sign-up for his monthly Embedded Bytes Newsletter here.

Memfault State of IoT Report
[ - ]
Comment by Tim WescottJune 8, 2016
I have some quibbles, but they're things that shouldn't be overlooked:

First, Jacob leaves out prioritized cooperative scheduling. In this scheme the tasks are prioritized and event-driven, and once whatever task that is running is finished, the highest-priority task that's ready to run gets the processor. It's probably the only "bare metal" method that I've used in the last 20 years.

Second, using an RTOS isn't really a 32-bit thing -- it's more a memory size thing. There are ARM Cortex M0 processors out there that address the space used by 8-bitters quite nicely, but they don't really have the memory space to support an RTOS -- nor are the applications for which they're appropriate really complex enough to demand an RTOS.

Third, even on a 32-bit processor, an RTOS can cause a time hit. Task switches just take many more clock ticks than interrupts. If you have an application that's small but needs to squeeze everything out of the processor that it can, then you should tread carefully. In such circumstances you either need to abandon the RTOS, or you need to have heavier-weight ISRs than you might otherwise feel is wise.

For all that -- good post!
[ - ]
Comment by beningjwJune 8, 2016
Thanks for the comment and sharing your experiences. It is interesting to hear that you've mostly used prioritized cooperative scheduling in your systems. I've only used that method perhaps 10% of the time which is why I didn't mention it (plus I have a list of a few dozen algorithms and didn't think my first post should be a white paper). I've found that for most applications that are bare-metal, a simple round-robin cooperative scheduler can be used paired with interrupts to handle events with prioritization.

I agree that the use of an RTOS is not just a 32-bit thing. I occasionally over generalize. I've used them in 16 bit applications often but for simple applications keep it bare-metal. In my next post I'm going to discuss in more detail when an RTOS should and shouldn't be used. Some of the points that you've mentioned will definitely be discussed.

[ - ]
Comment by AstroJerryJune 7, 2016
This is a nice summary of microcomputer multitasking techniques. Many years ago, I programmed in Forth which had round robin with interrupts multitasking built in. It worked quite well for most applications. For the last 25 years or so I have used Ada which has a nice RTOS built-in which is much better. Even the small footprint RTS included with the GNAT Ada for ARM processors dose a great job.
[ - ]
Comment by beningjwJune 8, 2016
Thanks for the comment. Very interesting about Ada RTOS. To date I have not ventured into programming with Ada but I have heard some heard interesting stories!
[ - ]
Comment by PierreMJune 20, 2016
Interesting article. Thanks for sharing :)

Are there dedicated strategies for watchdog handling depending of the scheduler/RTOS used ?
[ - ]
Comment by beningjwJune 20, 2016
Thanks for the comment. If you are interested in watchdogs, I wrote a white paper a few ago on the topic. It can be found at http://beningo.com/a-review-of-watchdog-architectures/
[ - ]
Comment by fabateraJune 8, 2016
I had participated to the Renesas Hands-on training on Synergy Platform 3 days ago. They provide an RTOS called ThreadX (from X press Logic) with usb, ethernet, file system etc for free. You only pay for the microcontroller (that is a little bit more expensive than others). Everything is visual. Very easy to use.
[ - ]
Comment by beningjwJune 8, 2016
Thanks for the comment. I've been pretty excited about what Renesas has been doing with their Synergy Platform. They've really removed the barriers for the use of an RTOS with middleware. Even to the degree as you mentioned that the RTOS, middleware and even IAR compiler licensing comes at no cost.
[ - ]
Comment by s-lightJune 10, 2016
thanks for this nice article.
now i learned how the concepts iam using are called ;)
on uC baremetal things iam doing some mixture of round robin and cooperative scheduler:
my main loop basically is a round robin approach - and some sub-modules called from there have there own cooperative scheduler type logic - so it is very modular - but therefore not optimized ;)
but in most cases for me its not soo relevant to be exactly spot on timing..
[ - ]
Comment by beningjwJune 11, 2016
You are welcome! Thanks for your comment and I'm glad to hear about your experiences in this area.
[ - ]
Comment by jkvasanJune 13, 2016

Very Nice overview on the schedulers. Could not download the samples. Probably server may be busy.

By the way, Round Robin with Interrupts seems to be the most used technique. Migrating from it to subsequently described methods appears to be very difficult for a common embedded designer. Synergy platform, from what we get to know from the website, can remove hesitations and straight away help in graduating further.
[ - ]
Comment by beningjwJune 13, 2016
Thanks for the comment. I just checked the link for the download and did not have any issues. Maybe try again at your convenience. If you still have issues feel free to email me and I can get you the files.

Microcontroller vendors are definitely in the process of trying to simply the development process for developers using their parts and toolchains.
[ - ]
Comment by JLuceJune 14, 2016
This post brings back memories. My first position out of college was as an Assembler developer on a bare metal system for group controller of up to 8 elevators in a bank for a double deck car. We used an 'executive' and everything ran off interrupts with a 16 ms timer calling tasks with processing sets at 16-32-48-64-96ms.

Best learning experience ever. I have to admit, I have used VxWorks, MTOS-68k, Linux Embedded (hardly RTOS) and VRTX along with a home brew by Bell Northern Research (RIP). For what we did with these RTOSes, all but a few projects could have been written with a bare metal executive.

I miss working with this stuff . Thank you for a look at what I consider the most excellent field in software.

[ - ]
Comment by beningjwJune 14, 2016
Thanks for your comments! I'm glad to hear that it brought back fond memories. It sounds like you have had some interesting experiences! Thanks for sharing!
[ - ]
Comment by johnIsaJune 18, 2016
Very good article!

I used to work in an environment where we has four queues, each at a different priority. Each queue was basically a FIFO (round robin within the queue). Any process can fire off a new message at any required priority. The higher priority queue interrupted the lower. It was very scale-able and worked from the highest level executive processor right down to the smallest device processor.

Thanks again for the blog!
[ - ]
Comment by beningjwJune 19, 2016
Thanks for the comment! That is also an interesting way to schedule bare metal tasks. There are so many variations that can be used before ever reaching an RTOS. Thanks for sharing experience!
[ - ]
Comment by jszedulaJune 30, 2019

I use a variation on your Cooperative Scheduler that includes task priority. My changes include:

  • get and check the system time before each task 
  • restart scanning the task list after a task is run, this creates a task priority because the first task is checked for execution


To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Please login (on the right) if you already have an account on this platform.

Otherwise, please use this form to register (free) an join one of the largest online community for Electrical/Embedded/DSP/FPGA/ML engineers: