In article <koWdnfyMCcuXcOvSnZ2dnUVZ8qudnZ2d@lyse.net>, 
david.brown@removethis.hesbynett.no says...
> 
> On 31/03/12 13:16, upsidedown@downunder.com wrote:
> > On Sat, 31 Mar 2012 12:07:54 +0200, David Brown
> > <david.brown@removethis.hesbynett.no>  wrote:
> >
> >> That reminds me of a situation where C was much better than assembly for
> >> startup code.  This was 15 years ago - the compiler in question being
> >> about 20 years old now.  The toolchain-provided startup code for
> >> clearing the bss was written in assembly, as is common.  And it was slow
> >> and inefficient - also a very common situation for toolchain-provided
> >> assembly code.  I re-wrote it in C - the result was clearer code, half
> >> the size of object code, and something like 10 times faster at run time.
> >> Ironically, this was because the compiler generated a DBNE instruction,
> >> which the assembly code did not use.
> >
> > This might be slightly off topic, but at least it shows that embedded
> > systems are used in quite different environments.
> >
> > While it might be critical that the startup can be done in 1 ms in
> > some cases, in other cases a startup time of 1 s, 1 minute or even 1
> > hour might be acceptable, if the system is expected to run for the
> > next 1-30 years without restarts.
> >
> 
> True enough - and startup time is seldom critical.  But it is silly for 
> a toolchain provider to have startup code in assembly that is both 
> longer and slower than the equivalent C code, while also being less 
> clear and flexible.
> 
> Sometimes I think toolchain providers don't realise it is possible to 
> have C code that runs before main() starts.  They write this crap in 
> assembly once, for one member of the processor family, and re-use it 
> ever after because no one can be bothered re-writing it optimally for 
> different members.  So the startup code you get for your ColdFire v4 is 
> restricted to being able to run on a 68000 from 30 years ago.  Or if 
> they do modify it, they try to minimise the changes - resulting in an 
> incomprehensible mixture.
> 
> (Sorry if that sounds like a bit of a rant - I've had to deal with some 
> very messy low-level assembly code recently.  The toolchain is otherwise 
> good, but some of the junk created as part of the "project wizard" setup 
> is ridiculous.)

Whenever I see "Wizard" or similar attached to anything, I immediately
think "oh shit, yet another case of 'there are only three possible ways
to use this tool'".

Wizards for anything are the bane of my life.



-- 
Paul Carpenter          | paul@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/>    PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/>  GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate

On Sat, 31 Mar 2012 13:39:54 +0200, David Brown
<david.brown@removethis.hesbynett.no> wrote:

>True enough - and startup time is seldom critical.  But it is silly for 
>a toolchain provider to have startup code in assembly that is both 
>longer and slower than the equivalent C code, while also being less 
>clear and flexible.

The last time I was involved in the cross compiler business was in the
1980's so I might be a bit out of date :-).

In those days, when you hired a new person, what would be her/his
first duty? Typically, writing examples and documentation for different
platforms in order to become familiar with the company's products.

You would by no means use your best programmers for this duty; they
were instead engaged in writing the actual compiler.

I guess that the situation has not changed a lot since those days.

On 31/03/12 13:16, upsidedown@downunder.com wrote:
> On Sat, 31 Mar 2012 12:07:54 +0200, David Brown
> <david.brown@removethis.hesbynett.no>  wrote:
>
>> That reminds me of a situation where C was much better than assembly for
>> startup code.  This was 15 years ago - the compiler in question being
>> about 20 years old now.  The toolchain-provided startup code for
>> clearing the bss was written in assembly, as is common.  And it was slow
>> and inefficient - also a very common situation for toolchain-provided
>> assembly code.  I re-wrote it in C - the result was clearer code, half
>> the size of object code, and something like 10 times faster at run time.
>> Ironically, this was because the compiler generated a DBNE instruction,
>> which the assembly code did not use.
>
> This might be slightly off topic, but at least it shows that embedded
> systems are used in quite different environments.
>
> While it might be critical that the startup can be done in 1 ms in
> some cases, in other cases a startup time of 1 s, 1 minute or even 1
> hour might be acceptable, if the system is expected to run for the
> next 1-30 years without restarts.
>

True enough - and startup time is seldom critical.  But it is silly for 
a toolchain provider to have startup code in assembly that is both 
longer and slower than the equivalent C code, while also being less 
clear and flexible.

Sometimes I think toolchain providers don't realise it is possible to 
have C code that runs before main() starts.  They write this crap in 
assembly once, for one member of the processor family, and re-use it 
ever after because no one can be bothered re-writing it optimally for 
different members.  So the startup code you get for your ColdFire v4 is 
restricted to being able to run on a 68000 from 30 years ago.  Or if 
they do modify it, they try to minimise the changes - resulting in an 
incomprehensible mixture.

(Sorry if that sounds like a bit of a rant - I've had to deal with some 
very messy low-level assembly code recently.  The toolchain is otherwise 
good, but some of the junk created as part of the "project wizard" setup 
is ridiculous.)
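To make the "C code can run before main()" point concrete, here is a minimal sketch of startup initialisation written in C. It assumes a core that loads its initial stack pointer from the vector table at reset (as Cortex-M and the 68000/ColdFire family do), so no assembly is needed just to get a usable stack. The linker-script symbol names (`_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`), `Reset_Handler` and the `STARTUP_ON_TARGET` guard are all hypothetical; substitute whatever your toolchain's linker script and vector table actually use. The copy/clear work is kept in a pointer-taking function so it can be exercised on a host.

```c
#include <stdint.h>

/* Copy the initialised-data image (stored in flash) to RAM, then zero .bss.
   Only plain pointer loops: nothing here relies on .data or .bss already
   being set up, so it can be called straight from the reset vector. */
void init_ram_sections(const uint32_t *data_load, uint32_t *data_start,
                       uint32_t *data_end, uint32_t *bss_start,
                       uint32_t *bss_end)
{
    while (data_start < data_end)
        *data_start++ = *data_load++;
    while (bss_start < bss_end)
        *bss_start++ = 0;
}

#ifdef STARTUP_ON_TARGET
/* Hypothetical linker-script symbols: names vary between toolchains. */
extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
int main(void);

/* Hypothetical reset entry point, referenced from the vector table. */
void Reset_Handler(void)
{
    init_ram_sections(&_sidata, &_sdata, &_edata, &_sbss, &_ebss);
    (void)main();
    for (;;) {
        /* main() is not expected to return on bare metal */
    }
}
#endif
```

One caveat: at higher optimisation levels GCC may recognise these loops and turn them into memcpy()/memset() calls. If those library routines are not usable that early, compiling this file with -fno-tree-loop-distribute-patterns is one way to prevent it.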

On 30/03/12 17:57, Tim Wescott wrote:
> On Fri, 30 Mar 2012 09:31:50 +0200, David Brown wrote:

>> It is clearly C++, but it would seem that CFractional is a class
>> containing an int32_t member "_x" which is the fractional value in
>> question.  Think of it as syntactic sugar around the function
>>
>> 	int32_t add_sat_frac(int32_t a, int32_t b);
>
> Yup.  In fact, that's an awful lot like what the call looks like when I
> need to do this in C (except that I'm going to be investigating just how
> ubiquitous fractional support is, now that I've been made aware of it).
>
> Sorry for not elucidating -- I thought it would be obvious.
>

It is obvious to people who are familiar with C++ - but gobbledegook to 
people who have managed to avoid it!

If you find out anything interesting about support for "fract", "sat", 
etc., it would be interesting to hear about it.  I know it is supported 
in gcc for many processors (either with hardware support, or library 
calls), but I haven't looked further than that.

I also know that the numpties that wrote the specs have, as usual, 
underspecified them.  How can they possibly have been so stupid as to 
write things like "the minimum formats for each type are..." ?  Did they 
not notice that the embedded world is a great fan of the <stdint.h> 
types like int16_t, and used their own types like u8 before that?  I 
would much prefer to have seen standardised names like "fract16_t", 
"ufract32_t", etc., from the start - before people make them up themselves.

Best regards,

David
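As an illustration of the kind of exact-width names wished for above, here is a sketch of a hypothetical `fract16_t` (Q15: value = raw / 32768) built on `int16_t`, with saturating add and multiply in plain C. None of these names (`fract16_t`, `FRACT16_MAX`, `fract16_add_sat`, `fract16_mul_sat`) are standard; TR 18037's `<stdfix.h>` types such as `short fract` only guarantee minimum precisions, which is exactly the complaint above. The sketch assumes `>>` on a negative `int32_t` is an arithmetic shift, which is implementation-defined but is what GCC does.

```c
#include <stdint.h>

/* Hypothetical exact-width fractional type: Q15, range [-1, 1 - 2^-15]. */
typedef int16_t fract16_t;

#define FRACT16_MAX ((fract16_t)INT16_MAX)
#define FRACT16_MIN ((fract16_t)INT16_MIN)

/* Saturating Q15 add: widen to 32 bits so the sum itself cannot overflow. */
static inline fract16_t fract16_add_sat(fract16_t a, fract16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return FRACT16_MAX;
    if (s < INT16_MIN) return FRACT16_MIN;
    return (fract16_t)s;
}

/* Saturating Q15 multiply: the Q30 product is shifted back by 15 bits
   (assumes arithmetic right shift). Only -1 * -1 gives +1, which does
   not fit, so that case is clamped to FRACT16_MAX. */
static inline fract16_t fract16_mul_sat(fract16_t a, fract16_t b)
{
    int32_t p = ((int32_t)a * b) >> 15;
    if (p > INT16_MAX) return FRACT16_MAX;
    return (fract16_t)p;
}
```

Unlike the CFractional code in this thread, this sketch keeps 0x8000 (-1.0) as a valid value rather than forcing a symmetric range; either convention is a design choice.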

On Sat, 31 Mar 2012 12:07:54 +0200, David Brown
<david.brown@removethis.hesbynett.no> wrote:

>That reminds me of a situation where C was much better than assembly for 
>startup code.  This was 15 years ago - the compiler in question being 
>about 20 years old now.  The toolchain-provided startup code for 
>clearing the bss was written in assembly, as is common.  And it was slow 
>and inefficient - also a very common situation for toolchain-provided 
>assembly code.  I re-wrote it in C - the result was clearer code, half 
>the size of object code, and something like 10 times faster at run time. 
>Ironically, this was because the compiler generated a DBNE instruction, 
>which the assembly code did not use.

This might be slightly off topic, but at least it shows that embedded
systems are used in quite different environments.

While it might be critical that the startup can be done in 1 ms in
some cases, in other cases a startup time of 1 s, 1 minute or even 1
hour might be acceptable, if the system is expected to run for the
next 1-30 years without restarts.

On 30/03/12 16:22, Mark Borgerson wrote:
> I did look at the C code and the compiler outputs.   It
> seems that compilers have come a long way since I wrote some
> 68K assembly because the compiler refused to use the most
> efficient decrement-test-and-loop instruction (DBNE D0, Dest,
> I think).
>

That reminds me of a situation where C was much better than assembly for 
startup code.  This was 15 years ago - the compiler in question being 
about 20 years old now.  The toolchain-provided startup code for 
clearing the bss was written in assembly, as is common.  And it was slow 
and inefficient - also a very common situation for toolchain-provided 
assembly code.  I re-wrote it in C - the result was clearer code, half 
the size of object code, and something like 10 times faster at run time. 
Ironically, this was because the compiler generated a DBNE instruction, 
which the assembly code did not use.

(The compiler will only be able to generate DBNE instructions for a 
16-bit counter, not a 32-bit "int" counter.  It's one of the few 16-bit-only 
instructions on the m68k.)

> It is clear to me that the compiler writers are way ahead of
> me for the ARM and ARM-Cortex chips.  Even on the simpler
> MSP430, I seldom use assembly outside the startup code.
> I still look at the assembly listing in the debugger, though.
>
> Mark Borgerson
>
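A sketch of a bss-clearing loop written so that a 68k compiler at least has the chance to use DBcc (DBRA/DBF, DBNE, ...): the inner counter is a `uint16_t`, because DBcc only decrements the low 16 bits of a data register. Larger regions are split into chunks of at most 0xFFFF words. Whether a given compiler actually emits DBRA for this loop is not guaranteed, so check the listing; the function name `clear_words` is hypothetical.

```c
#include <stdint.h>

/* Zero nwords 32-bit words starting at p.
   The inner loop counter is 16 bits wide, matching what the 68k DBcc
   instructions can decrement; larger regions are handled in chunks of
   at most 0xFFFF words by the outer loop. */
void clear_words(uint32_t *p, uint32_t nwords)
{
    while (nwords != 0) {
        uint16_t chunk = (nwords > 0xFFFFu) ? (uint16_t)0xFFFFu
                                            : (uint16_t)nwords;
        nwords -= chunk;
        do {
            *p++ = 0;
        } while (--chunk != 0);
    }
}
```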

On Fri, 30 Mar 2012 09:31:50 +0200, David Brown wrote:

> On 30/03/2012 06:27, Mark Borgerson wrote:
>> In article<rZydncc478zzA-nSnZ2dnUVZ_ridnZ2d@web-ster.com>,
>> tim@seemywebsite.com says...
>>>
>>> Hey Walter (et al., if you're out there):
>>>
>>> With the gnu tools, optimizations on, and an Arm Cortex M3, this goes
>>> a _lot_ faster when you precede it with
>>>
>>> #define ASSEMBLY_WORKS
>>>
>>> than when you don't.
>>>
>>> Yet you say that an optimizer should eat up the C code and spit out
>>> assembly that's better than I can do.
>>>
>>> How come the difference?  Is it the tools?  I know it's not because
>>> it's the World's Best ARM Assembly, because I've learned a bit since I
>>> did it and could probably speed it up -- or at least make it cleaner.
>>>
>>> CFractional CFractional::operator + (CFractional y) const
>>> {
>>> #ifdef ASSEMBLY_WORKS
>>>    int32_t a = _x;
>>>    int32_t b = y._x;
>>>    asm ( "adds   %[a], %[b]\n"     // add, setting flags
>>>          "bvc    .sat_add_vc\n"    // check for overflow
>>>          "ite    mi\n"
>>>          "ldrmi  %[a], .sat_add_maxpos\n"  // set to max positive
>>>          "ldrpl  %[a], .sat_add_maxneg\n"  // set to max negative
>>>          "b      .sat_add_ret\n"
>>>          ".sat_add_maxpos: .word   0x7fffffff\n"
>>>          ".sat_add_maxneg: .word   0x80000001\n"
>>>          ".sat_add_forbid: .word   0x80000000\n"
>>>          ".sat_add_vc:\n"
>>>          "bpl    .sat_add_ret\n"
>>>          "ldr    %[b], .sat_add_forbid\n"
>>>          "cmp    %[a], %[b]\n"
>>>          "it     eq\n"
>>>          "moveq  %[a], %[b]\n"
>>>          ".sat_add_ret:\n"
>>>          : [a] "=r" (a), [b] "=r" (b)
>>>          : "[a]" "r" (a), "[b]" "r" (b));
>>>
>>>    return CFractional(a);
>>> #else
>>>    int32_t retval = _x + y._x;
>>>
>>>    // Check for underflow and saturate if so
>>>    if (_x < 0 && y._x < 0 && (retval >= 0 || retval < -INT32_MAX))
>>>    {
>>>      retval = -INT32_MAX;
>>>    }
>>>
>>>    // check for overflow and saturate if so
>>>    if (_x > 0 && y._x > 0 && retval <= 0)
>>>    {
>>>      retval = INT32_MAX;
>>>    }
>>>
>>>    return retval;
>>> #endif
>>> }
>>
>> I was going to try out that code on the IAR EWARM compiler at various
>> optimization levels----until I realized that
>>
>> "CFractional CFractional::operator + (CFractional y) const"
>>
>> doesn't look like C to me.  Am I missing something??
>>
>> Could you include enough information to make that example directly
>> compilable in standard C?
>>
>> Mark Borgerson
>>
>>
>>
> It is clearly C++, but it would seem that CFractional is a class
> containing an int32_t member "_x" which is the fractional value in
> question.  Think of it as syntactic sugar around the function
> 
> 	int32_t add_sat_frac(int32_t a, int32_t b);

Yup.  In fact, that's an awful lot like what the call looks like when I 
need to do this in C (except that I'm going to be investigating just how 
ubiquitous fractional support is, now that I've been made aware of it).

Sorry for not elucidating -- I thought it would be obvious.

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com

In article <BcSdna8XpZPQ_ujSnZ2dnUVZ7v2dnZ2d@lyse.net>, 
david@westcontrol.removethisbit.com says...
> 
> On 30/03/2012 06:27, Mark Borgerson wrote:
> > In article<rZydncc478zzA-nSnZ2dnUVZ_ridnZ2d@web-ster.com>,
> > tim@seemywebsite.com says...
> >>
> >> Hey Walter (et al., if you're out there):
> >>
> >> With the gnu tools, optimizations on, and an Arm Cortex M3, this goes a
> >> _lot_ faster when you precede it with
> >>
> >> #define ASSEMBLY_WORKS
> >>
> >> than when you don't.
> >>
> >> Yet you say that an optimizer should eat up the C code and spit out
> >> assembly that's better than I can do.
> >>
> >> How come the difference?  Is it the tools?  I know it's not because it's
> >> the World's Best ARM Assembly, because I've learned a bit since I did it
> >> and could probably speed it up -- or at least make it cleaner.
> >>
> >> CFractional CFractional::operator + (CFractional y) const
> >> {
> >> #ifdef ASSEMBLY_WORKS
> >>    int32_t a = _x;
> >>    int32_t b = y._x;
> >>    asm ( "adds   %[a], %[b]\n"     // add, setting flags
> >>          "bvc    .sat_add_vc\n"    // check for overflow
> >>          "ite    mi\n"
> >>          "ldrmi  %[a], .sat_add_maxpos\n"  // set to max positive
> >>          "ldrpl  %[a], .sat_add_maxneg\n"  // set to max negative
> >>          "b      .sat_add_ret\n"
> >>          ".sat_add_maxpos: .word   0x7fffffff\n"
> >>          ".sat_add_maxneg: .word   0x80000001\n"
> >>          ".sat_add_forbid: .word   0x80000000\n"
> >>          ".sat_add_vc:\n"
> >>          "bpl    .sat_add_ret\n"
> >>          "ldr    %[b], .sat_add_forbid\n"
> >>          "cmp    %[a], %[b]\n"
> >>          "it     eq\n"
> >>          "moveq  %[a], %[b]\n"
> >>          ".sat_add_ret:\n"
> >>          : [a] "=r" (a), [b] "=r" (b)
> >>          : "[a]" "r" (a), "[b]" "r" (b));
> >>
> >>    return CFractional(a);
> >> #else
> >>    int32_t retval = _x + y._x;
> >>
> >>    // Check for underflow and saturate if so
> >>    if (_x < 0 && y._x < 0 && (retval >= 0 || retval < -INT32_MAX))
> >>    {
> >>      retval = -INT32_MAX;
> >>    }
> >>
> >>    // check for overflow and saturate if so
> >>    if (_x > 0 && y._x > 0 && retval <= 0)
> >>    {
> >>      retval = INT32_MAX;
> >>    }
> >>
> >>    return retval;
> >> #endif
> >> }
> >
> > I was going to try out that code on the IAR EWARM compiler at various
> > optimization levels----until I realized that
> >
> > "CFractional CFractional::operator + (CFractional y) const"
> >
> > doesn't look like C to me.  Am I missing something??
> >
> > Could you include enough information to make that example
> > directly compilable in standard C?
> >
> > Mark Borgerson
> >
> >
> 
> It is clearly C++, but it would seem that CFractional is a class 
> containing an int32_t member "_x" which is the fractional value in 
> question.  Think of it as syntactic sugar around the function
> 
> 	int32_t add_sat_frac(int32_t a, int32_t b);
> 
> (Or see my re-write of the code in C in my other post.)

I did look at the C code and the compiler outputs.   It 
seems that compilers have come a long way since I wrote some
68K assembly because the compiler refused to use the most
efficient decrement-test-and-loop instruction (DBNE D0, Dest,
I think).

It is clear to me that the compiler writers are way ahead of
me for the ARM and ARM-Cortex chips.  Even on the simpler
MSP430, I seldom use assembly outside the startup code.
I still look at the assembly listing in the debugger, though.

Mark Borgerson

On 30/03/2012 06:27, Mark Borgerson wrote:
> In article<rZydncc478zzA-nSnZ2dnUVZ_ridnZ2d@web-ster.com>,
> tim@seemywebsite.com says...
>>
>> Hey Walter (et al., if you're out there):
>>
>> With the gnu tools, optimizations on, and an Arm Cortex M3, this goes a
>> _lot_ faster when you precede it with
>>
>> #define ASSEMBLY_WORKS
>>
>> than when you don't.
>>
>> Yet you say that an optimizer should eat up the C code and spit out
>> assembly that's better than I can do.
>>
>> How come the difference?  Is it the tools?  I know it's not because it's
>> the World's Best ARM Assembly, because I've learned a bit since I did it
>> and could probably speed it up -- or at least make it cleaner.
>>
>> CFractional CFractional::operator + (CFractional y) const
>> {
>> #ifdef ASSEMBLY_WORKS
>>    int32_t a = _x;
>>    int32_t b = y._x;
>>    asm ( "adds   %[a], %[b]\n"     // add, setting flags
>>          "bvc    .sat_add_vc\n"    // check for overflow
>>          "ite    mi\n"
>>          "ldrmi  %[a], .sat_add_maxpos\n"  // set to max positive
>>          "ldrpl  %[a], .sat_add_maxneg\n"  // set to max negative
>>          "b      .sat_add_ret\n"
>>          ".sat_add_maxpos: .word   0x7fffffff\n"
>>          ".sat_add_maxneg: .word   0x80000001\n"
>>          ".sat_add_forbid: .word   0x80000000\n"
>>          ".sat_add_vc:\n"
>>          "bpl    .sat_add_ret\n"
>>          "ldr    %[b], .sat_add_forbid\n"
>>          "cmp    %[a], %[b]\n"
>>          "it     eq\n"
>>          "moveq  %[a], %[b]\n"
>>          ".sat_add_ret:\n"
>>          : [a] "=r" (a), [b] "=r" (b)
>>          : "[a]" "r" (a), "[b]" "r" (b));
>>
>>    return CFractional(a);
>> #else
>>    int32_t retval = _x + y._x;
>>
>>    // Check for underflow and saturate if so
>>    if (_x < 0 && y._x < 0 && (retval >= 0 || retval < -INT32_MAX))
>>    {
>>      retval = -INT32_MAX;
>>    }
>>
>>    // check for overflow and saturate if so
>>    if (_x > 0 && y._x > 0 && retval <= 0)
>>    {
>>      retval = INT32_MAX;
>>    }
>>
>>    return retval;
>> #endif
>> }
>
> I was going to try out that code on the IAR EWARM compiler at various
> optimization levels----until I realized that
>
> "CFractional CFractional::operator + (CFractional y) const"
>
> doesn't look like C to me.  Am I missing something??
>
> Could you include enough information to make that example
> directly compilable in standard C?
>
> Mark Borgerson
>
>

It is clearly C++, but it would seem that CFractional is a class 
containing an int32_t member "_x" which is the fractional value in 
question.  Think of it as syntactic sugar around the function

	int32_t add_sat_frac(int32_t a, int32_t b);

(Or see my re-write of the code in C in my other post.)

Best regards,

David
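For readers who want a plain C version of the `add_sat_frac()` signature above, here is one possible sketch (it is not the C re-write from the other post, which is not reproduced in this thread). It keeps the behaviour of the C++ operator quoted above: results saturate to the symmetric range [-INT32_MAX, INT32_MAX], so 0x80000000 is never returned. The sum is formed in 64 bits, so no signed overflow can occur; on a 32-bit ARM this typically compiles to an add/add-with-carry pair plus compares. Note that Cortex-M3 (ARMv7-M) has no QADD; that comes with the ARMv7E-M DSP extension (Cortex-M4), and QADD saturates to 0x80000000 anyway, so it would not match this convention directly.

```c
#include <stdint.h>

/* Saturating add of two Q31 fractions, clamped to the symmetric range
   [-INT32_MAX, INT32_MAX]. Widening to int64_t means the addition
   itself can never overflow. */
int32_t add_sat_frac(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;

    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < -INT32_MAX)      /* also catches exactly 0x80000000 */
        return -INT32_MAX;
    return (int32_t)sum;
}
```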

In article <rZydncc478zzA-nSnZ2dnUVZ_ridnZ2d@web-ster.com>, 
tim@seemywebsite.com says...
> 
> Hey Walter (et al., if you're out there):
> 
> With the gnu tools, optimizations on, and an Arm Cortex M3, this goes a 
> _lot_ faster when you precede it with
> 
> #define ASSEMBLY_WORKS
> 
> than when you don't.
> 
> Yet you say that an optimizer should eat up the C code and spit out 
> assembly that's better than I can do.
> 
> How come the difference?  Is it the tools?  I know it's not because it's 
> the World's Best ARM Assembly, because I've learned a bit since I did it 
> and could probably speed it up -- or at least make it cleaner.
> 
> CFractional CFractional::operator + (CFractional y) const
> {
> #ifdef ASSEMBLY_WORKS
>   int32_t a = _x;
>   int32_t b = y._x;
>   asm ( "adds   %[a], %[b]\n"     // add, setting flags
>         "bvc    .sat_add_vc\n"    // check for overflow
>         "ite    mi\n"
>         "ldrmi  %[a], .sat_add_maxpos\n"  // set to max positive
>         "ldrpl  %[a], .sat_add_maxneg\n"  // set to max negative
>         "b      .sat_add_ret\n"
>         ".sat_add_maxpos: .word   0x7fffffff\n"
>         ".sat_add_maxneg: .word   0x80000001\n"
>         ".sat_add_forbid: .word   0x80000000\n"
>         ".sat_add_vc:\n"
>         "bpl    .sat_add_ret\n"
>         "ldr    %[b], .sat_add_forbid\n"
>         "cmp    %[a], %[b]\n"
>         "it     eq\n"
>         "moveq  %[a], %[b]\n"
>         ".sat_add_ret:\n"
>         : [a] "=r" (a), [b] "=r" (b)
>         : "[a]" "r" (a), "[b]" "r" (b));
> 
>   return CFractional(a);
> #else
>   int32_t retval = _x + y._x;
> 
>   // Check for underflow and saturate if so
>   if (_x < 0 && y._x < 0 && (retval >= 0 || retval < -INT32_MAX))
>   {
>     retval = -INT32_MAX;
>   }
> 
>   // check for overflow and saturate if so
>   if (_x > 0 && y._x > 0 && retval <= 0)
>   {
>     retval = INT32_MAX;
>   }
> 
>   return retval;
> #endif
> }

I was going to try out that code on the IAR EWARM compiler at various 
optimization levels----until I realized that 

"CFractional CFractional::operator + (CFractional y) const"

doesn't look like C to me.  Am I missing something??

Could you include enough information to make that example 
directly compilable in standard C?

Mark Borgerson
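Whatever the reason for the speed difference, the #else branch above has a correctness problem: `_x + y._x` is computed in `int32_t` before the overflow test, and signed overflow is undefined behaviour in C and C++, so an optimiser is entitled to assume it never happens and may simplify or drop the `retval >= 0` / `retval <= 0` checks. A sketch of a well-defined alternative, keeping the same symmetric saturation as the original, does the addition in `uint32_t` (which wraps modulo 2^32 by definition) and detects overflow from the sign bits. The name `add_sat_frac_u` is hypothetical.

```c
#include <stdint.h>

/* Saturating add on Q31 values with the symmetric range [-INT32_MAX, INT32_MAX].
   The addition is done in uint32_t, where wraparound is well defined,
   instead of int32_t, where overflow is undefined behaviour. */
int32_t add_sat_frac_u(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t us = ua + ub;                  /* wraps modulo 2^32 */

    /* Signed overflow occurred iff a and b share a sign bit and the
       sum's sign bit differs from it. */
    if (((ua ^ us) & (ub ^ us)) >> 31)
        return (a >= 0) ? INT32_MAX : -INT32_MAX;

    /* No overflow, but keep 0x80000000 out of range as the original does. */
    if (us == 0x80000000u)
        return -INT32_MAX;

    /* Converting values above INT32_MAX is implementation-defined before
       C23; GCC and Clang wrap modulo 2^32, giving the expected result. */
    return (int32_t)us;
}
```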