On 09/11/2010 05:12, glen herrmannsfeldt wrote:
> In comp.dsp David Brown<david.brown@removethisbit.hesbynett.no> wrote:
> (snip, I wrote)
>>>> Much embedded programming is done in C with inline assembler.
>
>> On 08/11/10 21:30, Vladimir Vassilevsky wrote:
>
>>> Inline assembler is bad style and a characteristic feature of lame
>>> programmers. It combines disadvantages of both C and assembler.
>
>>> If there is a need to use an assembler, make a separate module in
>>> assembler and call it from C as an external function.
>
>> I disagree. It's a matter of taste and style, of course. It is also a
>> matter of the application and the target - for DSPs with typical DSP
>> code, assembly can make a much bigger difference over C than for more
>> common microprocessor code.
>
>> Sometimes you can't avoid using assembly. A typical good reason is
>> because you need access to low-level features that can't be expressed in
>> C (such as registers that are accessible only with special
>> instructions), for a few bits of startup code, as part of an interrupt
>> routine (if your compiler does a poor job), or for handling key parts of
>> an RTOS.
>
> The last few assembly routines I wrote were to use the IA32 RDTSC
> instruction. It conveniently returns the 64 bit result in EDX:EAX,
> the usual return registers for (long long) on IA32. The executable
> instructions are RDTSC and RET.
>
On the x86, you have so few registers that the compiler can't help you
much in the allocation, and many instructions have implicit register
usage. It also means that the function call overhead is less, since
your data is on the stack anyway (though the actual processor
implementation may keep copies internally in registers). So it really
doesn't matter if you have fixed registers in your assembly code, and
with a bit of luck the processor will handle the "ret" instruction early
on in the instruction pipeline.

But let's think about what actually happens in the processor when the
"rdtsc" instruction is implemented as an external assembly module, and
when it is implemented as inline assembly. Bear with me on the details
here - I have never used assembly on the x86, and I haven't tested this
code.

Case 1 - external module.

In assembly.s, you have:

        .globl readTimestamp
    readTimestamp:
        rdtsc
        ret

In test.c, you have:

    #include <stdint.h>

    extern uint64_t readTimestamp(void);

    void test(void) {
        // part A
        uint64_t t = readTimestamp();
        // part B
    }

First, consider what the compiler knows. All it knows about
readTimestamp is that it follows the C calling convention, and returns a
value in EDX:EAX. It must assume that the function may change any
volatile registers (according to the standard x86 ABI), and it may read
or write any memory. This means any values held in local registers in
part A, such as loop counters, pointers, etc., must be preserved on the
stack before calling readTimestamp(), and restored afterwards.
Similarly, any outstanding memory writes must be done, and in part B any
values from memory must be re-read. In other words, the call to
readTimestamp is a serious block in the flow of the optimiser.

Secondly, consider the processor executing the code. The function call
and the return are unconditional, so they will be executed early in
the instruction pipeline. But any jump in the instruction flow means a
new block of memory needs to be in the cache, with associated risks of
cache misses, page misses, etc.
In other words, it can be quite costly to read the timestamp this way.
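
Case 2 - inline assembly.

With inline assembly, you have just test.c:

    #include <stdint.h>

    static inline uint64_t readTimestamp(void) {
        uint64_t x;
        // volatile: every call must really execute rdtsc again
        asm volatile (" rdtsc " : "=A" (x) :: );
        return x;
    }

    void test(void) {
        // part A
        uint64_t t = readTimestamp();
        // part B
    }
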
The use of this inline assembly within the test() function is identical.
But now the compiler knows everything about it - it knows that
readTimestamp changes EDX:EAX, but leaves EBX and ECX untouched, and
neither reads from nor writes to memory. This gives it much more
flexibility in optimising the code in test(). The generated code will
be nothing more than the single "rdtsc" instruction and whatever
register movements are needed, and as it is inline there is no change of
flow when the processor is executing it.
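
As a hedged aside, assuming gcc or clang: the "=A" constraint only means
the EDX:EAX pair on 32-bit x86, so a variant that also builds for x86-64
can ask for the two halves separately and combine them in C. This is only
a sketch, and ticksSince() is a made-up helper name for illustration.

```c
#include <stdint.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "rdtsc is an x86 instruction"
#endif

/* Read the time stamp counter. Asking for EAX and EDX separately works
   on both 32-bit and 64-bit x86; "=A" means the EDX:EAX pair only on
   32-bit targets. */
static inline uint64_t readTimestamp(void) {
    uint32_t lo, hi;
    /* volatile: the result changes on every execution, so the compiler
       must not merge or drop repeated reads */
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
}

/* Hypothetical helper: ticks elapsed since an earlier reading.
   Unsigned subtraction stays correct even if the counter wraps. */
static inline uint64_t ticksSince(uint64_t start) {
    return readTimestamp() - start;
}
```

Typical use would be "uint64_t t0 = readTimestamp();" before the code of
interest and "ticksSince(t0)" after it. Note that rdtsc is not a
serialising instruction, so the processor may reorder it relative to
nearby instructions when measuring very short sequences.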
For processors with more registers, there is even more to be gained with
inline assembly. On the PPC, for example, I used this inline assembly
function recently in code that converted a set of values from
little-endian to big-endian:
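
    static inline uint32_t readByteSwapped(const uint32_t * p, uint32_t x) {
        uint32_t y;
        // "b" keeps the offset out of r0, which lwbrx reads as zero in
        // that operand; the unused "m" input tells the compiler that
        // the word at p + x is read
        asm (" lwbrx %[y], %[x], %[p] " :
            [y] "=r" (y) :
            [x] "b" (x), [p] "r" (p),
            "m" (*(const uint32_t *)((const char *) p + x)) );
        return y;
    }
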
By using inline assembly, the compiler can allocate registers and
pipeline the byte-swapped reads optimally. If I made this an external
assembly routine following the C calling conventions, the code would
have been ten times slower - using shifts and masks for the conversion
would have been faster.
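
For comparison, the shifts-and-masks version I mean would look roughly
like this in portable C. It is a sketch rather than the code I actually
used; swap32() and readByteSwappedPortable() are names invented here, and
I am assuming x is a byte offset from p, as in the lwbrx version.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value using shifts and masks. */
static inline uint32_t swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Same interface as the inline assembly version: p is the base address,
   x is a byte offset from it. memcpy avoids alignment and aliasing
   problems when reading the word. */
static inline uint32_t readByteSwappedPortable(const uint32_t * p, uint32_t x) {
    uint32_t v;
    memcpy(&v, (const unsigned char *) p + x, sizeof v);
    return swap32(v);
}
```

A good compiler may recognise this pattern and emit a byte-reversing
load or swap instruction by itself, but that depends on the compiler and
the target.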

>> A typical bad reason is speed-optimising code. Sometimes it /is/
>> important to get the best possible speed out of the system within a
>> small section of the code. You may also be writing library code or
>> other heavily-used code, where it can be worth the effort. But in a
>> great many cases when people think they need to write code in assembly
>> for speed, they are wrong - either it is not worth the cost (in terms of
>> development time, maintainability, readability, correctness, robustness,
>> portability, etc.), or the C compiler will actually do a good job if the
>> programmer just learned to use it properly.
>
> Yes, those are the cases where inline assembler works best, but
> also, as mentioned, is ugly, non-portable, etc. You can write
> just the inner loop, maybe only a few instructions, in assembler
> with the rest in C.
>
Assembly is /always/ ugly and non-portable. Being inline or external
makes no difference there.
>> So all in all, for most targets and most applications, you don't often
>> need assembly when using modern tools. Since well-written high-level
>> code is /generally/ clearer and more portable than even well-written
>> assembly (though you can write bad code in any language), the preference
>> should always be for using high level coding unless you have overriding
>> reasons for using assembly.
>
>> Then you have the choice - separate assembly modules, or inline assembly.
>
>> If you are writing large sections of assembly code, then assembly
>> modules make sense. It is clearer to stick to a single language at a
>> time, and use tools suitable for that language, and large "asm(...)"
>> statements are as messy as large multi-line pre-processor macros.
>
> Last I wrote inline assembler was in Dynamic C for the Rabbit 3000,
> which is pretty much a Z180. Unlike the Z80, code has a 24 bit
> addressing space, with special call and ret instructions to change
> the high bits. With inline assembler, the C compiler keeps track
> of the addressing.
>
>> But if you are mixing C and assembly, and want to have minimal assembly,
>> then inline assembly is the way to go, especially if you have good
>> tools. There are some C compilers that can't deal well with inline
>> assembly - they insist on it being restricted to "assembly functions"
>> only, or they turn off all optimisations for functions that use inline
>> assembly.
>
> For the Z180 there isn't much optimization.
>
The only cure for a poor compiler is to get a better one, if there is
one available.
>> Other compilers work very well with inline assembly - the
>> compiler will handle register and/or stack allocation, and happily
>> include the inline assembly in its optimisation flow. If you are using
>> such a compiler (gcc is a well-known example, but there are commercial
>> tools that work well too), then you will probably get the smallest and
>> fastest code by using inline assembly and letting the compiler do its
>> job. The actual assembly code itself can often be tucked away in
>> "static inline" functions.
>
> (snip)
>
>> If you need to write code requiring a lot of register tracking, then I
>> can see that inline assembly will be messy. But for a lot of uses, when
>> done properly (both by the user, and by the toolchain vendor), inline
>> assembly is much clearer and simpler than external assembly modules, and
>> results in smaller and faster code.
>
> For the Z180, there aren't many registers, and some have special
> uses, so there really isn't much of a problem with register tracking.
>
> -- glen
>