
Hidden latencies and delays for a running program?

Started by Unknown May 25, 2014
Hi all,

I've been a SW developer, but one question I've never addressed is: What OS 
latencies and CPU delays are there in a compiled, running program? Is there any 
simple way to minimize them?

I am thinking of a simple C code program that reads data off a PCI card and then
writes it to memory like a PCIe SSD drive. I understand there will be various
hardware latencies and delays in the data input.

But what if the assembler program is executing? Does the OS "butt in" and context
switch/multi-task during execution of a continuous compiled program? If so, how
does one shut that off?

I've read about this somewhere, but never paid attention to it. 

Thanks in advance
jb
haiticare2011@gmail.com writes:
> I've been a SW developer, but one question I've never addressed is: What OS latencies and CPU delays are there in a compiled, running program?
Lots. At the CPU level alone: variable instruction timing, cache misses, pipeline stalls, etc. At the OS level: swapping and page faults, contention for machine resources by other tasks, etc.
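For the swapping/page-fault part specifically, a minimal sketch assuming a Linux-like POSIX target (mlockall() normally needs root or the CAP_IPC_LOCK capability; nothing here comes from the original post):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Lock all current and future pages of this process into RAM so the
       time-critical path cannot stall on a page fault or on swapping. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }

    /* ... time-critical work here ... */

    return 0;
}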
> Is there any simple way to minimize them?
If you have absolute deadlines ("hard real time") then it's complicated and there are books written about it.
> But what if the assembler program is executing? Does the OS "butt in" and context switch/multi-task during execution of a continuous compiled program? If so, how does one shut that off?
Some OSes offer real-time scheduling, which basically means you can give an absolute priority to your real-time task, so no other task can run until the priority task has released the CPU.
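A minimal sketch of what that can look like through the POSIX API, assuming Linux or another system with SCHED_FIFO (the priority value is arbitrary, and the call usually needs root or CAP_SYS_NICE):

#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 50 };  /* arbitrary example value */

    /* pid 0 = the calling process; under SCHED_FIFO it runs ahead of all
       ordinary (SCHED_OTHER) tasks until it blocks or yields. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return 1;
    }

    /* ... real-time work here ... */

    return 0;
}

Note that a runaway loop at this priority can starve the rest of the machine, which is exactly the "totally unresponsive" failure mode mentioned later in this thread.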
On 25.5.14 21:44, haiticare2011@gmail.com wrote:
> Hi all,
>
> I've been a SW developer, but one question I've never addressed is: What OS latencies and CPU delays are there in a compiled, running program? Is there any simple way to minimize them?
>
> I am thinking of a simple C code program that reads data off a PCI card and then writes it to memory like a PCIe SSD drive. I understand there will be various hardware latencies and delays in the data input.
>
> But what if the assembler program is executing? Does the OS "butt in" and context switch/multi-task during execution of a continuous compiled program? If so, how does one shut that off?
Yes, it does, and you should not attempt to prevent it, as you may make the whole system totally unresponsive.

There is little difference between a compiled C program and an assembly program performing the same algorithm.

The write to the SSD drive is far from simple, if you have a file system on the card. Also, the SSD may have an internal controller which needs time slots for its own purposes. Examples are SD (camera) cards and USB sticks.

-- Tauno Voipio
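For what it's worth, a sketch of the "no file system" alternative on Linux: opening the raw block device with O_DIRECT bypasses the page cache, at the cost of device-specific alignment rules on the buffer, offset, and transfer size. The device path below is only a placeholder, and none of this is from the original post:

#define _GNU_SOURCE            /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 4096;   /* one block, aligned to a typical sector size */
    void *buf;
    int fd;

    if (posix_memalign(&buf, 4096, len) != 0)   /* O_DIRECT wants aligned memory */
        return 1;
    memset(buf, 0xA5, len);                     /* stand-in for captured data */

    fd = open("/dev/nvme0n1", O_WRONLY | O_DIRECT);  /* placeholder device path */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, buf, len) != (ssize_t)len)
        perror("write");

    close(fd);
    free(buf);
    return 0;
}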
Hi jb,

On 5/25/2014 11:44 AM, haiticare2011@gmail.com wrote:
> I've been a SW developer, but one question I've never addressed is: What OS latencies and CPU delays are there in a compiled, running program? Is there any simple way to minimize them?
That, of course, depends on the choice of processor ("CPU delays") and the choice/characteristics of the OS you are using (if any).

CPUs often include instruction pipelines, I/D caches, and (instruction) scheduling algorithms that can cause what you *think* is happening (i.e., by examining the assembly language code that is actually executing) to differ from what is *actually* happening (i.e., by examining the CPU's *state*, dynamically). Add a second (or fourth) core and things get even messier!

OSes range from *nothing* (e.g., running your code in a big loop) to those with virtual memory subsystems, dynamic scheduling algorithms, preemption, resource reservations, deadline handlers, etc.

Of course, if it's *your* hardware (and OS choice), you can opt to bypass all of those mechanisms by *carefully* designing your "system" to run at the highest hardware priority available. In essence, claiming the CPU for your exclusive use.
> I am thinking of a simple C code program that reads data off a PCI card and then writes it to memory like a PCIe SSD drive. I understand there will be various hardware latencies and delays in the data input.
Again, that depends on the choice of processor and the actual code that gets executed (recall, what you *write* can be rewritten by an aggressive compiler, so you need to look at what the actual instruction stream will be).

You can, of course, mix and match your tools to the tasks best suited. E.g., if there are timing constraints and relationships that must be observed in accessing the PCI card, code that in ASM. If the OS already knows how to *talk* to the SSD (assuming you are using a supported file system and not just writing to the raw device), then just pass the results of the ASM routine to a higher level routine that allows the OS to do the actual write.

Of course, you have to be sure your *average* throughput meets the needs of the data source. Often, that means an elastic store, somewhere, so your ASM routine can *always* be invoked to get the next batch of data even if the OS hasn't caught up with the *last* batch of data. Make this store easily resizable and then measure to see just how much gets consumed (max) in your worst case scenario.

[Hint: if you are using a COTS OS, you probably will never be able to get *published* data to allow you to make these computations a priori. And, if the OS will support a variety of unconstrained *other* applications, all bets are off -- unless you can constrain them to suit your requirements!]
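A tiny sketch of that "make it resizable and measure" idea, with invented names and a simulated trial run standing in for the real producer and consumer: keep a fill counter on the elastic store, record its high-water mark, and size the real buffer from that measurement plus margin.

#include <stdio.h>

static unsigned fill;        /* blocks currently queued in the elastic store */
static unsigned high_water;  /* maximum fill level observed so far           */

static void note_enqueue(void)   /* call from the producer side */
{
    fill++;
    if (fill > high_water)
        high_water = fill;
}

static void note_dequeue(void)   /* call from the consumer side */
{
    if (fill > 0)
        fill--;
}

int main(void)
{
    /* Simulated worst-case trial: the consumer keeps up only one time in three. */
    for (int i = 0; i < 1000; i++) {
        note_enqueue();
        if (i % 3 == 0)
            note_dequeue();
    }
    printf("high-water mark: %u blocks\n", high_water);
    return 0;
}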
> But what if the assembler program is executing? Does the OS "butt in" and context switch/multi-task during execution of a continuous compiled program? If so, how does one shut that off?
Again, depends on the OS and how you've installed your "program". E.g., if you have ensured that your code always runs at highest privilege, then the OS waits for *you* (which could bodge other applications that are expecting the OS to "be fair"). If, OTOH, you are just a userland application, then your code could "pause" for INDEFINITE periods of time: milliseconds to *days* (exaggeration).

All the "writing in ASM" buys you is the ability to see what the sequence of opcodes available to the CPU will be. Writing in a HLL hides that detail from you (though you can often tell your compiler to show it to you) *and* limits your ability to make arbitrary changes to that sequence (because the compiler has liberties to alter what you've told it -- in "compatible ways").
> I've read about this somewhere, but never paid attention to it.
Much effort goes into system designs to *free* people from having to think about these sorts of details. But, when you are dealing with hardware, there are often other constraints that force you to work around/through those abstractions.

Typically (i.e., even in a custom OS/MTOS/RTOS) a high(er) priority task deals with events that have timeliness constraints. E.g., fetching packets off a network interface (if you "miss" one, it either is lost forever *or* you have to request/wait for its retransmission -- a loss of efficiency... especially if you are likely to miss *that* one, too!). The data acquired (or *delivered* -- when pumping a data sink) is then buffered, and a lower priority task (though this might still be a relatively high priority, based on the overall needs of the system) removes data from that buffer and "consumes" it.

Note that this *adds* latency to the overall task. And, it allows that latency to exhibit a greater degree of variability (based on how much of the elastic store gets consumed -- or not -- over the course of execution). So, if you expect a close temporal relationship between "input" and "output", you have to address this with other mechanisms (e.g., if you wanted something to happen AS SOON AS -- or some predictable, constant time after -- an input event was detected, the variability in this approach is directly reflected in that "output"). Of course, if the data can't be consumed as fast as it is sourced, then your system is too slow for the task you've set for it!

"Why not just do the output in the same high priority task as the input?" What if the SSD (in your case) is not *ready* for more input at the *moment* your new input comes along? Perhaps the SSD is doing internal housekeeping? Do you twiddle your thumbs in that HIGH PRIORITY task *waiting* for it to be ready? How long can you twiddle before your *next* input comes along AND GETS *MISSED*?

OSes (particularly full-fledged RTOSes) can provide varying degrees of support to remove some of the details of this task management. E.g., one may provide support for shared circular buffers. Or, allow buffers to be dynamically memory-mapped to recipient tasks (to eliminate bcopy()'s). Signaling between the producer and consumer can be *part* of the OS (instead of forcing you to spin-wait on a flag). Deadline handlers can be created (by you) that the OS can then invoke *if* the associated task fails to meet its agreed upon deadline (e.g., what happens if you *can't* get back to look at the PCI card before the next data arrives? Or, if you can't pull the data out of the buffer before the buffer *fills*/overflows? Do you *break*? Or, do you gracefully recover?)

Best piece of advice: figure out how *not* to have timing constraints on your task. And, if unavoidable, figure out how best to handle their violation: "hard" constraints can be handled easiest -- you simply stop working on them once you're "late"! ("Sorry, the ship has already sailed!"). "Soft" requires far more thought and effort -- it assumes there is still *value* in achieving the goal, albeit *late*. ("But, if you charter a speedboat, you could probably catch up to that ship and arrange to board her AT SEA -- or in the next port. Yeah, that's a more expensive proposition, but that's what happens when you miss your deadline!")

Any more *specific* answer requires far more specifics about your execution environment (processor, hardware involved, choice of OS, etc.)

HTH,
--don
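A minimal sketch of the buffered producer/consumer split described above, using POSIX threads on an assumed Linux-like host (a real RTOS would offer its own primitives). The PCI read and SSD write are stand-in stubs and the sizes are made up; the point is the shared circular buffer and the condition-variable signaling instead of a spin-wait.

#include <pthread.h>
#include <stdint.h>

#define SLOTS 256                      /* elastic store: number of buffered blocks */
#define BLOCK 4096                     /* bytes per block (hypothetical)           */

static uint8_t  slab[SLOTS][BLOCK];
static unsigned head, tail, count;     /* ring indices and fill level              */
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

static void read_pci_block(uint8_t *dst)        { (void)dst; /* stub */ }
static void write_ssd_block(const uint8_t *src) { (void)src; /* stub */ }

static void *acquire_task(void *arg)   /* high(er) priority side */
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == SLOTS)         /* store full: this wait *is* the overrun   */
            pthread_cond_wait(&not_full, &lock);  /* a real design must budget for */
        pthread_mutex_unlock(&lock);

        read_pci_block(slab[head]);    /* fill the free slot, outside the lock     */

        pthread_mutex_lock(&lock);
        head = (head + 1) % SLOTS;
        count++;
        pthread_cond_signal(&not_empty);   /* wake the consumer, no spin-wait      */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *store_task(void *arg)     /* lower priority side */
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&not_empty, &lock);
        pthread_mutex_unlock(&lock);

        write_ssd_block(slab[tail]);   /* slow SSD write happens outside the lock  */

        pthread_mutex_lock(&lock);
        tail = (tail + 1) % SLOTS;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t producer, consumer;
    pthread_create(&producer, NULL, acquire_task, NULL);
    pthread_create(&consumer, NULL, store_task, NULL);
    pthread_join(producer, NULL);      /* never returns in this sketch */
    pthread_join(consumer, NULL);
    return 0;
}

The single-producer/single-consumer shape is what makes it safe to touch slab[head] and slab[tail] outside the lock; anything fancier needs more care.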
haiticare2011@gmail.com writes:
> I am thinking of a simple C code program that reads data off a PCI card and then writes it to memory like a PCIe SSD drive. I understand there will be various hardware latencies and delays in the data input.
Oh, I remember now, you had the other post about some kind of data logging application. As others said, it sounds like you don't really have a strict latency bound as long as you don't lose data; given enough RAM to buffer stuff while I/O is blocked, you can make the probability of loss low enough that the failure modes are dominated by the reliability of the hardware.

Anyway, my guess is that the main source of delays may be the SSD itself. Those have unpredictable delays as they sometimes have to reorganize the data internally, which on some units can take a VERY long time on rare occasions. If you use an "enterprise" SSD, the vendors try harder to control those delays, including by overprovisioning the device so that the reorganization can happen using the extra capacity in the background. For that reason the enterprise SSDs cost more.
On 5/25/2014 3:06 PM, Tauno Voipio wrote:
> On 25.5.14 21:44, haiticare2011@gmail.com wrote:
>> [...]
>> But what if the assembler program is executing? Does the OS "butt in" and context switch/multi-task during execution of a continuous compiled program? If so, how does one shut that off?
>
> Yes, it does, and you should not attempt to prevent it, as you may make the whole system totally unresponsive.
I worked on a real-time PC in which we had installed a board. It ran NT with a real-time extension. The first pass of my board had a bug which hung the bus transfer, and the *entire* machine froze. Wow! The only way out was a hardware reset.
> There is little difference between a compiled C program and an assembly program performing the same algorithm.
>
> The write to the SSD drive is far from simple, if you have a file system on the card. Also, the SSD may have an internal controller which needs time slots for its own purposes. Examples are SD (camera) cards and USB sticks.
JB seems to have a lot to learn about real time systems. The part I don't quite get is why the PC side has to be real time. If he uses a separate MCU board to capture the ADC data (the important real time part of the problem), it can then send the data to a PC, not in "real time", just with a throughput that exceeds the data rate. Adequate buffering on the MCU card will assure no loss of data. Then the PC can store the data on any media it wishes. Sounds simple enough to me, but I don't get why he continues to flog this horse.

-- Rick
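As a back-of-envelope illustration of "adequate buffering", with invented numbers (the OP never stated a data rate): the MCU-side buffer only has to cover the longest pause on the PC/link side while the ADC keeps producing.

#include <stdio.h>

int main(void)
{
    /* All three figures are assumptions for the sake of the arithmetic. */
    const double sample_rate_hz   = 1.0e6;  /* 1 Msample/s ADC              */
    const double bytes_per_sample = 2.0;    /* 16-bit samples               */
    const double worst_pc_stall_s = 0.1;    /* longest PC-side pause        */

    double bytes_needed = sample_rate_hz * bytes_per_sample * worst_pc_stall_s;

    printf("MCU buffer required: %.0f bytes (about %.0f KiB)\n",
           bytes_needed, bytes_needed / 1024.0);
    return 0;
}

With these assumed figures that works out to about 200 KB, plus whatever safety margin the measured worst case suggests.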
On 26.5.14 00:11, rickman wrote:
> On 5/25/2014 3:06 PM, Tauno Voipio wrote:
>> [...]
>
> JB seems to have a lot to learn about real time systems. The part I don't quite get is why the PC side has to be real time. If he uses a separate MCU board to capture the ADC data (the important real time part of the problem), it can then send the data to a PC, not in "real time", just with a throughput that exceeds the data rate. Adequate buffering on the MCU card will assure no loss of data. Then the PC can store the data on any media it wishes. Sounds simple enough to me, but I don't get why he continues to flog this horse.
Maybe the PHB has ordered him to make the PC a real-time capturing system. Anyway, he'll have a stiff climb up the learning steps.

-- -TV
On 5/26/2014 1:51 AM, Tauno Voipio wrote:
> Maybe the PHB has ordered him to make the PC a real-time capturing system. Anyway, he'll have a stiff climb up the learning steps.
PHB? Do you mean powers that be?

He has been asking about embedded, but seems to think he has to put the entire system on the embedded device. I don't want to give the guy grief, but it sounds like he is not familiar enough with embedded design to even know if his task can use it effectively or not. He seems to reject a lot of suggestions before he understands them. I'm also very unclear on what data rate he really needs from the front end to the storage.

-- Rick
On 26.5.14 09:19, rickman wrote:
> On 5/26/2014 1:51 AM, Tauno Voipio wrote:
>> Maybe the PHB has ordered him to make the PC a real-time capturing system. Anyway, he'll have a stiff climb up the learning steps.
>
> PHB? Do you mean powers that be? [...]
Sorry - Pointy-Haired Boss, from Dilbert.

-- -TV
On 26/05/14 08:19, rickman wrote:
> On 5/26/2014 1:51 AM, Tauno Voipio wrote:
>> [...]
>
> PHB? Do you mean powers that be? He has been asking about embedded, but seems to think he has to put the entire system on the embedded device. I don't want to give the guy grief, but it sounds like he is not familiar enough with embedded design to even know if his task can use it effectively or not. He seems to reject a lot of suggestions before he understands them. I'm also very unclear on what data rate he really needs from the front end to the storage.
The OP is very unclear about the data rate he needs (he alternates over several orders of magnitude), and has no idea at all about the sample size. The worrying thing is that he does not seem to consider this a problem, and does not realise that this project needs a lot of thought and planning, then a lot of research and prototyping, before he can start looking at implementation and development.

He also has virtually no idea about the technologies for implementing the system. He has some fixed pre-conceived ideas that he won't change no matter what people tell him - he believes USB latency will cause trouble, he believes SSD is the greatest invention since sliced bread, he believes assembly programming will be more "real time" than C programming.

The guy may be a good SW developer for all I know, but he is clearly far out of his depth with this project. I don't know if this is his own fault, or that of a PHB, but he desperately needs help here (of a kind that we cannot give him) before he wastes lots of time and money.