Surprising Linux Real Time Scheduler Behavior
I have recently been helping with embedded software design and development for a data acquisition and visualization device. The software executes within an embedded Linux context and consists of various animated user interfaces rendering the acquired data.
The data is received via a UART and a SPI connection. During development we noticed high UART data latency during heavy user interface animations. For this product to meet its acquisition requirements, UART reception must be soft real-time with minimal jitter.
Our first attempt to resolve the latency issue was to move all the threads of our main application into the Linux real-time scheduler using the SCHED_FIFO policy. This dramatically reduced the jitter and latency but did not eliminate them. During peak UI animations we still observed delayed UART reception, despite giving the acquisition threads a higher priority than the UI thread. We then set the issue aside for some time, as the data acquisition was now "good enough" for the visualization requirements at the time.
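For readers who have not used the real-time scheduler, the setup described above can be sketched roughly as follows. This is a minimal illustration and not our production code: the helper name make_fifo_attr and the two priority values are hypothetical, chosen only to show the acquisition threads ranked above the UI thread.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical priorities: acquisition ranked above the UI thread. */
enum { PRIO_UI = 10, PRIO_ACQUISITION = 50 };

/*
 * Prepare thread attributes for a SCHED_FIFO thread at the given priority.
 * Returns 0 on success or an errno-style code on failure.
 */
int make_fifo_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int rc;

    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread inherits the creator's
     * policy and the settings below are ignored. */
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);

    /* Set the policy before the priority so the priority can be checked
     * against the SCHED_FIFO range. */
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(attr, &param);

    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

The resulting attribute object is then passed to pthread_create() for each thread. Note that pthread_create() fails with EPERM unless the process is allowed to use real-time priorities (for example via CAP_SYS_NICE or a suitable RLIMIT_RTPRIO).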
Of course, with the product nearing production deadlines, a new requirement emerged after potential customers evaluated pre-production units. The customers wanted detailed logging of the acquired data, exportable with timestamps! The previous soft real-time requirements had suddenly morphed into near-hard real-time requirements. I was assigned the new feature request and quickly recalled the partially resolved jitter issues.
After following several dead-end paths, I ran across a post discussing a similar issue. The post implied that kernel worker threads may be creating the equivalent of a mutual exclusion issue. Even though the data acquisition threads ran at a higher priority than the UI thread, whenever several of those real-time threads use kernel drivers at the same time, the possibility of blocking on a shared kernel resource rears its ugly head. After reading that post, I experimentally removed the UI thread from the real-time scheduler, returning it to the normal Linux scheduler.
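Returning a thread to the normal scheduler is a one-call change. The sketch below shows one way to do it, assuming the UI thread's pthread_t handle is available; the helper name return_to_normal_scheduler is hypothetical.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Move a thread back to the default time-sharing policy (SCHED_OTHER).
 * SCHED_OTHER has no real-time priority, so sched_priority must be 0.
 * Returns 0 on success or an errno-style code on failure.
 */
int return_to_normal_scheduler(pthread_t thread)
{
    struct sched_param param = { .sched_priority = 0 };

    return pthread_setschedparam(thread, SCHED_OTHER, &param);
}
```

The UI thread can call this on itself with return_to_normal_scheduler(pthread_self()), while the acquisition threads stay under SCHED_FIFO.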
Suddenly, the system worked perfectly. The real-time data acquisition threads acquired data with minimal jitter. The new time-stamped log data looked great, and the user interface animations were smooth and no longer impacted data acquisition. Another Linux lesson learned.
That being said, does anyone know the root cause of the apparent priority-inversion blocking in the Linux kernel when using real-time scheduled threads? Please comment!