Small or fast?
Life is often about compromise, but embedded developers really are not good at that. Code generation is a context in which compromise is somewhat inevitable and we call it “optimization”. All modern compilers perform optimization, of course. Some do a better job than others. A lot of the time, the compiler simply guesses which optimization will produce the best result without knowing what the designer really wants. For desktop applications, this is OK. Speed is the only important criterion, as memory is effectively free. But embedded is different …
To the first approximation, all desktop computers are the same. It is quite straightforward to write acceptable applications that will run on anyone’s machine. Also, the broad expectations of their users are the same. Embedded systems are all different – the hardware and software environment varies widely and the expectations of users are just as diverse. In many ways, this is what is particularly interesting about embedded software development.
An embedded compiler is likely to have a great many options to control optimization. Sometimes that fine-grain control is vital; on other occasions, it can come down to a simple choice between optimization for speed or size. This choice is curious, but it is simply an empirical observation that small code is often slower and fast code tends to need more memory.
An obvious example is function inlining. A small function can be optimized so that its actual code is placed in line at each call site. This executes faster, because the call/return sequence is eliminated. But it will use more memory as there may be multiple copies of identical code. Sometimes you can get lucky and an optimization which yields faster code is also light on memory, but this is quite unusual.
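The inlining trade-off can be sketched in a few lines of C. This is only an illustration: the function names `clamp_to_i8` and `sum_clamped` are made up for this example, and whether the helper is actually inlined depends on your compiler and its settings (`inline` is a hint, not a command).

```c
#include <stdint.h>

/* Hypothetical helper: saturate a 16-bit sample to the signed 8-bit range.
 * It is small enough that a compiler optimizing for speed will probably
 * place its body in line at each call site. */
static inline int8_t clamp_to_i8(int16_t sample)
{
    if (sample > INT8_MAX) {
        return INT8_MAX;
    }
    if (sample < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)sample;
}

/* Three call sites. If the helper is inlined, the compare-and-clamp
 * sequence appears three times (more code, no call/return overhead).
 * If it is not inlined, there is one copy plus three calls. */
int32_t sum_clamped(int16_t a, int16_t b, int16_t c)
{
    return (int32_t)clamp_to_i8(a) + clamp_to_i8(b) + clamp_to_i8(c);
}
```

With GCC or Clang, building this once with `-O2` and once with `-Os` and comparing the disassembly is a simple way to see which choice the compiler made for your target.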
The control of optimization for embedded code generation is not set to get any easier, as more possibilities are coming along. Notably, there is increased interest in minimizing power consumption. An algorithm may be selected on the basis of how much power the CPU consumes to get the job done. This is subtle, because fast code will use less CPU power, but smaller code needs less memory, and memory also consumes power.
Comments
Be sure to check the generated code to see exactly which optimizations your compiler actually throws at you. I once had a compiler that I had set to optimize for small code space. I also had a macro for a one-instruction software interrupt, which was a call gate into the bare-bones operating system to allow the OS to run in supervisor mode and everyone else in user mode. The compiler "optimized" the one-instruction macro into a multiple-byte subroutine call, which then called the one-instruction subroutine, which it had converted to a two-instruction subroutine that included a return instruction! The compiler did not take into account the size of the repeated code; instead, it saw that the OS call was used frequently and concluded that it should replace all of the repeated code with a single subroutine.
I agree completely. Although most developers may never write a line of assembly, being able to read it is an essential skill so that you can keep an eye on what your compiler is doing.
As an embedded developer, I've also found it important to be able to compile and run code without any optimizations for debugging purposes. Not only does this allow source-level debuggers to make sane interpretations with regard to the original source code, but it's not infrequent that a debug build either doesn't suffer the original issue or the issue manifests itself in some other way. Both can be important clues as to the nature of the bug. I'm not implying the issue is a bug in compiler optimization; rather, it exposes incorrect assumptions on my part regarding how different threads of code or interrupts may interact in subtle ways. Padding the embedded environment, when possible, to allow a debug version of the code to run can pay big dividends.
I know I'm in for a long day (and possibly night) when running a debug build proves impossible, usually because code expansion exceeds the size of available flash memory or tight timing constraints are violated. It's even worse when a debugger is impractical, no serial port is available for tracing and I'm reduced to toggling spare GPIOs attached to a logic analyzer to indicate the state of the code while running.
A day in the life of an embedded software developer. :-)