Reply by Paul March 31, 2012
In article <koWdnfyMCcuXcOvSnZ2dnUVZ8qudnZ2d@lyse.net>, 
david.brown@removethis.hesbynett.no says...
> On 31/03/12 13:16, upsidedown@downunder.com wrote:
> > On Sat, 31 Mar 2012 12:07:54 +0200, David Brown
> > <david.brown@removethis.hesbynett.no> wrote:
> >
> >> That reminds me of a situation where C was much better than assembly for
> >> startup code. This was 15 years ago - the compiler in question being
> >> about 20 years old now. The toolchain-provided startup code for
> >> clearing the bss was written in assembly, as is common. And it was slow
> >> and inefficient - also a very common situation for toolchain-provided
> >> assembly code. I re-wrote it in C - the result was clearer code, half
> >> the size of object code, and something like 10 times as fast run time.
> >> Ironically it is because the compiler generated a DBNE instruction,
> >> which the assembly code did not use.
> >
> > This might be slightly off topic, but at least it shows that embedded
> > systems are used in quite different environments.
> >
> > While it might be critical that the startup can be done in 1 ms in
> > some cases, in other cases a startup time of 1 s, 1 minute or even 1
> > hour might be acceptable, if the system is expected to run for the
> > next 1-30 years without restarts.
> >
>
> True enough - and startup time is seldom critical. But it is silly for
> a toolchain provider to have startup code in assembly that is both
> longer and slower than the equivalent C code, while being also less
> clear and flexible.
>
> Sometimes I think toolchain providers don't realise it is possible to
> have C code that runs before main() starts. They write this crap in
> assembly once, for one member of the processor family, and re-use it
> ever after because no one can be bothered re-writing it optimally for
> different members. So the startup code you get for your Coldfire v4 is
> restricted to being able to run on an 68000 from 30 years ago. Or if
> they do modify it, they try to minimise the changes - resulting in an
> incomprehensible mixture.
>
> (Sorry if that sounds like a bit of a rant - I've had to deal with some
> very messy low-level assembly code recently. The toolchain is otherwise
> good, but some of the junk created as part of the "project wizard" setup
> is ridiculous.)
Whenever I see Wizard or similar with anything I immediately think "oh shit, yet another 'there are only three possible ways to use this tool'".

Wizards for anything are a bane of my life.

--
Paul Carpenter | paul@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/> GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate
Reply by March 31, 2012
On Sat, 31 Mar 2012 13:39:54 +0200, David Brown
<david.brown@removethis.hesbynett.no> wrote:


>True enough - and startup time is seldom critical. But it is silly for
>a toolchain provider to have startup code in assembly that is both
>longer and slower than the equivalent C code, while being also less
>clear and flexible.
The last time I was involved in the cross compiler business was in the 1980s, so I might be a bit out of date :-).

In those days, when you hired a new person, what would be her/his first duty? Typically writing examples and documents for different platforms, in order to get some familiarity with the company products.

You would by no means use your best programmers for this duty; they were engaged in writing the actual compiler.

I guess that the situation has not changed a lot since those days.
Reply by David Brown March 31, 2012
On 31/03/12 13:16, upsidedown@downunder.com wrote:
> On Sat, 31 Mar 2012 12:07:54 +0200, David Brown
> <david.brown@removethis.hesbynett.no> wrote:
>
>> That reminds me of a situation where C was much better than assembly for
>> startup code. This was 15 years ago - the compiler in question being
>> about 20 years old now. The toolchain-provided startup code for
>> clearing the bss was written in assembly, as is common. And it was slow
>> and inefficient - also a very common situation for toolchain-provided
>> assembly code. I re-wrote it in C - the result was clearer code, half
>> the size of object code, and something like 10 times as fast run time.
>> Ironically it is because the compiler generated a DBNE instruction,
>> which the assembly code did not use.
>
> This might be slightly off topic, but at least it shows that embedded
> systems are used in quite different environments.
>
> While it might be critical that the startup can be done in 1 ms in
> some cases, in other cases a startup time of 1 s, 1 minute or even 1
> hour might be acceptable, if the system is expected to run for the
> next 1-30 years without restarts.
>
True enough - and startup time is seldom critical. But it is silly for a toolchain provider to have startup code in assembly that is both longer and slower than the equivalent C code, while being also less clear and flexible.

Sometimes I think toolchain providers don't realise it is possible to have C code that runs before main() starts. They write this crap in assembly once, for one member of the processor family, and re-use it ever after because no one can be bothered re-writing it optimally for different members. So the startup code you get for your Coldfire v4 is restricted to being able to run on a 68000 from 30 years ago. Or if they do modify it, they try to minimise the changes - resulting in an incomprehensible mixture.

(Sorry if that sounds like a bit of a rant - I've had to deal with some very messy low-level assembly code recently. The toolchain is otherwise good, but some of the junk created as part of the "project wizard" setup is ridiculous.)
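As a rough illustration of C code that runs before main(), here is a minimal sketch of the usual data/bss initialisation. The name `init_ram` and its parameters are hypothetical; on a real target the five pointers would normally come from linker-script symbols, whose names differ between toolchains, and the routine would be called from the reset handler before main().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the pointers stand in for linker-script symbols. */
void init_ram(const uint32_t *data_load, uint32_t *data_start,
              uint32_t *data_end, uint32_t *bss_start, uint32_t *bss_end)
{
    /* Sanity check for this sketch; real startup code would usually omit it. */
    assert(data_start <= data_end && bss_start <= bss_end);

    /* Copy initialised data from its load address (e.g. flash) into RAM. */
    for (uint32_t *dst = data_start; dst < data_end; ++dst) {
        *dst = *data_load++;
    }

    /* Zero the bss so that uninitialised statics start out as 0. */
    for (uint32_t *dst = bss_start; dst < bss_end; ++dst) {
        *dst = 0;
    }
}
```

Because it is plain C, the same source can be recompiled for each family member and the compiler picks the instructions, which is the flexibility argued for above.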
Reply by David Brown March 31, 2012
On 30/03/12 17:57, Tim Wescott wrote:
> On Fri, 30 Mar 2012 09:31:50 +0200, David Brown wrote:
>> It is clearly C++, but it would seem that CFractional is a class
>> containing an int32_t member "_x" which is the fractional value in
>> question. Think of it as syntactic sugar around the function
>>
>> int32_t add_sat_frac(int32_t a, int32_t b);
>
> Yup. In fact, that's an awful lot like what the call looks like when I
> need to do this in C (except that I'm going to be investigating just how
> ubiquitous fractional support is, now that I've been made aware of it).
>
> Sorry for not elucidating -- I thought it would be obvious.
>
It is obvious to people who are familiar with C++ - but gobbledegook to people who have managed to avoid it!

If you find out anything interesting about support for "fract", "sat", etc., it would be interesting to hear about it. I know it is supported in gcc for many processors (either with hardware support, or library calls), but I haven't looked further than that.

I also know that the numpties that wrote the specs have, as usual, underspecified them. How can they possibly have been so stupid as to write things like "the minimum formats for each type are..." ? Did they not notice that the embedded world is a great fan of the <stdint.h> types like int16_t, and used their own types like u8 before that? I would much prefer to have seen standardised names like "fract16_t", "ufract32_t", etc., from the start - before people make them up themselves.

mvh.,

David
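To make the wish for exact-width names concrete, here is a minimal sketch of what a home-made `fract16_t` might look like in portable C. Everything here is an assumption for illustration: `fract16_t`, `fract16_add_sat` and `fract16_mul_sat` are hypothetical names, not part of any standard or of gcc's fixed-point support, and the value is treated as Q15 (a 16-bit integer scaled by 2^15).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical exact-width fractional type in the style of <stdint.h>. */
typedef int16_t fract16_t;

/* Saturating Q15 add: widen to 32 bits, then clamp to the int16_t range. */
fract16_t fract16_add_sat(fract16_t a, fract16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (fract16_t)sum;
}

/* Saturating Q15 multiply: the 32-bit product is Q30, so shift right by 15.
 * Assumes an arithmetic right shift for negative values, which is
 * implementation-defined in C. */
fract16_t fract16_mul_sat(fract16_t a, fract16_t b)
{
    int32_t prod = ((int32_t)a * (int32_t)b) >> 15;
    assert(prod >= INT16_MIN);   /* only -1.0 * -1.0 leaves the range, upwards */
    return (prod > INT16_MAX) ? INT16_MAX : (fract16_t)prod;
}
```

With this scaling, 16384 represents 0.5, so `fract16_mul_sat(16384, 16384)` yields 8192, i.e. 0.25.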
Reply by March 31, 2012
On Sat, 31 Mar 2012 12:07:54 +0200, David Brown
<david.brown@removethis.hesbynett.no> wrote:

>That reminds me of a situation where C was much better than assembly for
>startup code. This was 15 years ago - the compiler in question being
>about 20 years old now. The toolchain-provided startup code for
>clearing the bss was written in assembly, as is common. And it was slow
>and inefficient - also a very common situation for toolchain-provided
>assembly code. I re-wrote it in C - the result was clearer code, half
>the size of object code, and something like 10 times as fast run time.
>Ironically it is because the compiler generated a DBNE instruction,
>which the assembly code did not use.
This might be slightly off topic, but at least it shows that embedded systems are used in quite different environments.

While it might be critical that the startup can be done in 1 ms in some cases, in other cases a startup time of 1 s, 1 minute or even 1 hour might be acceptable, if the system is expected to run for the next 1-30 years without restarts.
Reply by David Brown March 31, 2012
On 30/03/12 16:22, Mark Borgerson wrote:
> I did look at the C code and the compiler outputs. It
> seems that compilers have come a long way since I wrote some
> 68K assembly because the compiler refused to use the most
> efficient decrement-test-and-loop instruction (DBNE D0, Dest,
> I think).
>
That reminds me of a situation where C was much better than assembly for startup code. This was 15 years ago - the compiler in question being about 20 years old now. The toolchain-provided startup code for clearing the bss was written in assembly, as is common. And it was slow and inefficient - also a very common situation for toolchain-provided assembly code. I re-wrote it in C - the result was clearer code, half the size of object code, and something like 10 times as fast run time. Ironically it is because the compiler generated a DBNE instruction, which the assembly code did not use.

(The compiler will only be able to generate DBNE instructions for a 16-bit counter, not a 32-bit "int" counter. It's one of the few 16-bit only instructions on the m68k.)
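The point about 16-bit counters can be illustrated with a sketch like the one below. It is not the code from the story; `clear_words` is a hypothetical name, and whether a particular compiler turns the inner loop into a decrement-and-branch instruction depends on the compiler and its options. The sketch only shows the shape of the loop: the inner counter is kept to 16 bits, and larger regions are handled in chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clear `words` 32-bit words starting at `p` (hypothetical helper). */
void clear_words(uint32_t *p, size_t words)
{
    assert(p != NULL || words == 0);

    while (words != 0) {
        /* At most 0xFFFF words per pass, so the counter fits in 16 bits. */
        uint16_t n = (words > 0xFFFFu) ? 0xFFFFu : (uint16_t)words;
        words -= n;

        /* Down-counting loop with the test at the bottom. */
        do {
            *p++ = 0;
        } while (--n != 0);
    }
}
```

A startup routine could call this with the start of the bss and its length in words; checking the generated listing is the only way to know which instruction the compiler actually chose.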
> It is clear to me that the compiler writers are way ahead of
> me for the ARM and ARM-Cortex chips. Even on the simpler
> MSP430, I seldom use assembly outside the startup code.
> I still look at the assembly listing in the debugger, though.
>
> Mark Borgerson
>
Reply by Tim Wescott March 30, 2012
On Fri, 30 Mar 2012 09:31:50 +0200, David Brown wrote:

> It is clearly C++, but it would seem that CFractional is a class
> containing an int32_t member "_x" which is the fractional value in
> question. Think of it as syntactic sugar around the function
>
> int32_t add_sat_frac(int32_t a, int32_t b);
Yup. In fact, that's an awful lot like what the call looks like when I need to do this in C (except that I'm going to be investigating just how ubiquitous fractional support is, now that I've been made aware of it).

Sorry for not elucidating -- I thought it would be obvious.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
Reply by Mark Borgerson March 30, 2012
In article <BcSdna8XpZPQ_ujSnZ2dnUVZ7v2dnZ2d@lyse.net>, 
david@westcontrol.removethisbit.com says...
> It is clearly C++, but it would seem that CFractional is a class
> containing an int32_t member "_x" which is the fractional value in
> question. Think of it as syntactic sugar around the function
>
> int32_t add_sat_frac(int32_t a, int32_t b);
>
> (Or see my re-write of the code in C in my other post.)
I did look at the C code and the compiler outputs. It seems that compilers have come a long way since I wrote some 68K assembly because the compiler refused to use the most efficient decrement-test-and-loop instruction (DBNE D0, Dest, I think).

It is clear to me that the compiler writers are way ahead of me for the ARM and ARM-Cortex chips. Even on the simpler MSP430, I seldom use assembly outside the startup code. I still look at the assembly listing in the debugger, though.

Mark Borgerson
Reply by David Brown March 30, 2012
On 30/03/2012 06:27, Mark Borgerson wrote:
> I was going to try out that code on the IAR EWARM compiler at various
> optimization levels----until I realized that
>
> "CFractional CFractional::operator + (CFractional y) const"
>
> doesn't look like C to me. Am I missing something??
>
> Could you include enough information to make that example
> directly compilable in standard C.?
>
> Mark Borgerson
>
It is clearly C++, but it would seem that CFractional is a class containing an int32_t member "_x" which is the fractional value in question. Think of it as syntactic sugar around the function

int32_t add_sat_frac(int32_t a, int32_t b);

(Or see my re-write of the code in C in my other post.)

mvh.,

David
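For readers who want something concrete behind that prototype, here is one possible body. It is a sketch under stated assumptions, not the re-write mentioned in the post: it clamps to the symmetric range [-INT32_MAX, INT32_MAX], guessed from the limits used in the quoted code, and it detects overflow from the sign bits, which is the same condition the ARM V flag reports after an ADDS.

```c
#include <assert.h>
#include <stdint.h>

/* One possible implementation of the prototype above (assumed semantics:
 * saturate to [-INT32_MAX, INT32_MAX], never returning INT32_MIN). */
int32_t add_sat_frac(int32_t a, int32_t b)
{
    /* Add as unsigned so that wraparound is well defined. */
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t usum = ua + ub;

    /* Signed overflow: the sum's sign differs from the sign of both operands. */
    if (((ua ^ usum) & (ub ^ usum)) >> 31) {
        return (a < 0) ? -INT32_MAX : INT32_MAX;
    }

    int32_t sum = a + b;          /* safe: overflow was ruled out above */
    if (sum == INT32_MIN) {
        sum = -INT32_MAX;         /* keep the result in the symmetric range */
    }
    assert(sum >= -INT32_MAX);
    return sum;
}
```

Writing the overflow test this way avoids relying on signed wraparound, which is undefined behaviour in C and which an optimiser is free to assume never happens.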
Reply by Mark Borgerson March 30, 2012
In article <rZydncc478zzA-nSnZ2dnUVZ_ridnZ2d@web-ster.com>, 
tim@seemywebsite.com says...
> Hey Walter (et al., if you're out there):
>
> With the gnu tools, optimizations on, and an Arm Cortex M3, this goes a
> _lot_ faster when you precede it with
>
> #define ASSEMBLY_WORKS
>
> than when you don't.
>
> Yet you say that an optimizer should eat up the C code and spit out
> assembly that's better than I can do.
>
> How come the difference? Is it the tools? I know it's not because it's
> the World's Best ARM Assembly, because I've learned a bit since I did it
> and could probably speed it up -- or at least make it cleaner.
>
> CFractional CFractional::operator + (CFractional y) const
> {
> #ifdef ASSEMBLY_WORKS
>   int32_t a = _x;
>   int32_t b = y._x;
>   asm ( "adds %[a], %[b]\n"                  // add, setting the flags
>         "bvc .sat_add_vc\n"                  // check for overflow
>         "ite mi\n"
>         "ldrmi %[a], .sat_add_maxpos\n"      // set to max positive
>         "ldrpl %[a], .sat_add_maxneg\n"      // set to max negative
>         "b .sat_add_ret\n"
>         ".sat_add_maxpos: .word 0x7fffffff\n"
>         ".sat_add_maxneg: .word 0x80000001\n"
>         ".sat_add_forbid: .word 0x80000000\n"
>         ".sat_add_vc:\n"
>         "bpl .sat_add_ret\n"
>         "ldr %[b], .sat_add_forbid\n"
>         "cmp %[a], %[b]\n"
>         "it eq\n"
>         "moveq %[a], %[b]\n"
>         ".sat_add_ret:\n"
>         : [a] "=r" (a), [b] "=r" (b)
>         : "[a]" "r" (a), "[b]" "r" (b));
>
>   return CFractional(a);
> #else
>   int32_t retval = _x + y._x;
>
>   // Check for underflow and saturate if so
>   if (_x < 0 && y._x < 0 && (retval >= 0 || retval < -INT32_MAX))
>   {
>     retval = -INT32_MAX;
>   }
>
>   // check for overflow and saturate if so
>   if (_x > 0 && y._x > 0 && retval <= 0)
>   {
>     retval = INT32_MAX;
>   }
>
>   return retval;
> #endif
> }
I was going to try out that code on the IAR EWARM compiler at various optimization levels -- until I realized that

"CFractional CFractional::operator + (CFractional y) const"

doesn't look like C to me. Am I missing something??

Could you include enough information to make that example directly compilable in standard C?

Mark Borgerson
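One way to get something directly compilable as standard C is sketched below. It is not the original author's code: the struct stands in for the C++ class, `cfractional_add` is a hypothetical name, and the sum is widened to 64 bits instead of relying on 32-bit wraparound, so the range checks become plain comparisons. The ±INT32_MAX limits follow the quoted `#else` branch.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C stand-in for the C++ class: one fractional value in an int32_t. */
typedef struct {
    int32_t x;
} CFractional;

/* Saturating add, widening to 64 bits so the sum itself cannot overflow. */
CFractional cfractional_add(CFractional a, CFractional b)
{
    int64_t sum = (int64_t)a.x + (int64_t)b.x;

    if (sum > INT32_MAX) {
        sum = INT32_MAX;         /* positive overflow */
    } else if (sum < -INT32_MAX) {
        sum = -INT32_MAX;        /* negative overflow, or the excluded INT32_MIN */
    }

    assert(sum >= -INT32_MAX && sum <= INT32_MAX);   /* narrowing is lossless */
    CFractional result = { (int32_t)sum };
    return result;
}
```

This makes it easy to feed the same routine to different compilers at different optimization levels and compare the listings.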