EmbeddedRelated.com
Forums

A Challenge for our Compiler Writer(s)

Started by Tim Wescott March 29, 2012
Hey Walter (et al., if you're out there):

With the gnu tools, optimizations on, and an Arm Cortex M3, this goes a 
_lot_ faster when you precede it with

#define ASSEMBLY_WORKS

than when you don't.

Yet you say that an optimizer should eat up the C code and spit out 
assembly that's better than I can do.

How come the difference?  Is it the tools?  I know it's not because it's 
the World's Best ARM Assembly, because I've learned a bit since I did it 
and could probably speed it up -- or at least make it cleaner.

CFractional CFractional::operator + (CFractional y) const
{
#ifdef ASSEMBLY_WORKS
  int32_t a = _x;
  int32_t b = y._x;
  asm ( "adds   %[a], %[b]\n"     // subtract
        "bvc    .sat_add_vc\n"    // check for overflow
        "ite    mi\n"
        "ldrmi  %[a], .sat_add_maxpos\n"  // set to max positive
        "ldrpl  %[a], .sat_add_maxneg\n"  // set to max negative
        "b      .sat_add_ret\n"
        ".sat_add_maxpos: .word   0x7fffffff\n"
        ".sat_add_maxneg: .word   0x80000001\n"
        ".sat_add_forbid: .word   0x80000000\n"
        ".sat_add_vc:\n"
        "bpl    .sat_add_ret\n"
        "ldr    %[b], .sat_add_forbid\n"
        "cmp    %[a], %[b]\n"
        "it     eq\n"
        "moveq  %[a], %[b]\n"
        ".sat_add_ret:\n"
        : [a] "=r" (a), [b] "=r" (b)
        : "[a]" "r" (a), "[b]" "r" (b));

  return CFractional(a);
#else
  int32_t retval = _x + y._x;

  // Check for underflow and saturate if so
  if (_x < 0 && y._x < 0 && (retval >= 0 || retval < -INT32_MAX))
  {
    retval = -INT32_MAX;
  }

  // check for overflow and saturate if so
  if (_x > 0 && y._x > 0 && retval <= 0)
  {
    retval = INT32_MAX;
  }

  return retval;
#endif
}


-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
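
For comparison, here is a sketch of the same saturating add written with
__builtin_add_overflow.  This is not from the thread -- the builtin only
appeared in gcc 5, years after these posts -- but it lets the compiler see
the overflow check directly instead of reverse-engineering it from sign
tests.  The saturation values follow the C branch above; unlike that branch,
an exact (non-overflowing) sum of INT32_MIN is not remapped to -INT32_MAX.

#include <stdint.h>

static inline int32_t sat_add32(int32_t a, int32_t b)
{
  int32_t sum;

  // __builtin_add_overflow returns true if the mathematically exact sum
  // does not fit in an int32_t; the wrapped result is still stored in sum.
  if (__builtin_add_overflow(a, b, &sum))
  {
    // The overflow direction follows the sign of either operand.
    return (a < 0) ? -INT32_MAX : INT32_MAX;
  }

  return sum;
}
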
On 03/29/2012 07:39 PM, Tim Wescott wrote:
> [Tim's original post and code quoted in full -- snipped]
Not an answer to your question, but couldn't you use the SSAT instruction to your advantage here?
On Thu, 29 Mar 2012 20:19:48 +0200, Arlet Ottens wrote:

> On 03/29/2012 07:39 PM, Tim Wescott wrote:
>> [Tim's original post and code quoted in full -- snipped]
>
> Not an answer to your question, but couldn't you use the SSAT
> instruction to your advantage here?
If it's what I think it is -- very possibly.  As I said, this isn't
super-optimized assembly code, here.

-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
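
For what it's worth, here is a sketch of what Arlet's SSAT suggestion could
look like (my assumption about his intent, not code from the thread).  SSAT
clamps a 32-bit value into an n-bit signed range, so it applies directly only
when the full-precision sum still fits in 32 bits -- e.g. Q15 operands
carried in int32_t.  It does not by itself replace a full Q31 saturating add,
which needs a 33-bit intermediate (or QADD on a core with the DSP extension).

#include <stdint.h>

// Saturating add of two Q15 values held in int32_t, using the Cortex-M3's
// SSAT instruction to clamp the (non-overflowing) 32-bit sum to 16 bits.
static inline int32_t sat_add_q15(int32_t a, int32_t b)
{
  int32_t sum = a + b;              // cannot overflow 32 bits for Q15 inputs
  asm ("ssat %0, #16, %1" : "=r" (sum) : "r" (sum));
  return sum;
}
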
On 29/03/12 19:39, Tim Wescott wrote:
> [Tim's original post and code quoted in full -- snipped]
I don't have an ARM handy for testing speed, but I've just tried compiling
some test code with the latest Code Sourcery "lite" arm compiler (gcc 4.6.1),
with the command line:

  arm-none-eabi-gcc test.c -c -std=gnu99 -Wa,-ahlsd=test.lst -fverbose-asm
      -Os -mcpu=cortex-m4 -mthumb

(I tried with cortex-m4 because it supports saturating arithmetic.)

There might be differences about saturating negative values to -INT32_MAX or
to INT32_MIN - I don't know which is standard or required here.

As can be seen from the code below, your C code is not optimal.  I would be
very interested to know how the speed of satadd2() below compares to your
hand-made assembly.

However, this all raises bigger questions - why are you making your own code
for this?  Modern compilers (such as gcc) support fractional types (from
ISO/IEC TR 18037).  If you use them, as in satadd3(), the compiler will
generate optimal code for processors with hardware support (such as the
Cortex-M4).  For other processors, such as the Cortex-M3, the compiler
automatically uses a library routine.  You can expect such library routines
to be pretty optimal for the architecture in question (for the M3, the
library code is the same as for satadd2() below, which is hardly surprising
given the source of that function).

So by using "signed long sat fract" types you get fast library code on M3
and before, and when you switch to an M4 with DSP functionality, a re-compile
gives you optimal use of the hardware without having to re-write your
assembly.

Tell me again why assembly is so great in this case?

mvh.,

David


// test.c
#include <stdint.h>
#include <stdfix.h>

int32_t satadd1(int32_t x, int32_t y)
{
  int32_t retval = x + y;

  if ((x < 0) && (y < 0) && ((retval >= 0) || (retval < -INT32_MAX)))
  {
    retval = -INT32_MAX;
  }

  if ((x > 0) && (y > 0) && (retval <= 0))
  {
    retval = INT32_MAX;
  }

  return retval;
}

// Copied from <http://gcc.gnu.org/wiki/FixedPointArithmetic>
#define MIN_32 0x80000000
#define MAX_32 0x7fffffff

int32_t satadd2(int32_t x, int32_t y)
{
  int32_t retval = x + y;

  if (((x ^ y) & MIN_32) == 0)
  {
    // Sign of x and y the same
    if ((retval ^ x) & MIN_32)
    {
      // Sign of retval and x are different
      retval = (x < 0) ? MIN_32 : MAX_32;
    }
  }

  return retval;
}

int32_t satadd3(int32_t x, int32_t y)
{
  typedef union {
    int32_t i;
    signed long sat fract f;
  } satInt_t;

  satInt_t a, b, c;

  a.i = x;
  b.i = y;
  c.f = a.f + b.f;
  return c.i;
}


// test.lst
  65                satadd1:
  66                @ args = 0, pretend = 0, frame = 0
  67                @ frame_needed = 0, uses_anonymous_args = 0
  68                @ link register save eliminated.
  69 0000 0346      mov     r3, r0          @ x, x
  70 0002 002B      cmp     r3, #0          @ x,
  71 0004 0844      add     r0, r0, r1      @ retval, x, y
  72 0006 0ADA      bge     .L2             @,
  73 0008 0029      cmp     r1, #0          @ y,
  74 000a 0FDA      bge     .L5             @,
  75 000c 00F10041  add     r1, r0, #-2147483648  @ tmp140, retval,
  76 0010 4B1E      subs    r3, r1, #1      @ tmp140, tmp140,
  77 0012 074A      ldr     r2, .L8         @ tmp144,
  78 0014 0749      ldr     r1, .L8+4       @ tmp142,
  79 0016 8B42      cmp     r3, r1          @ tmp140, tmp142
  80 0018 88BF      it      hi              @
  81 001a 1046      movhi   r0, r2          @, retval, tmp144
  82 001c 7047      bx      lr              @
  83                .L2:
  84 001e 05D0      beq     .L5             @,
  85 0020 0029      cmp     r1, #0          @ y,
  86 0022 03DD      ble     .L5             @,
  87 0024 0028      cmp     r0, #0          @ retval,
  88 0026 D8BF      it      le              @
  89 0028 6FF00040  mvnle   r0, #-2147483648  @, retval,
  90                .L5:
  91 002c 7047      bx      lr              @
  92                .L9:
  93 002e 00BF      .align  2
  94                .L8:
  95 0030 01000080  .word   -2147483647
  96 0034 FEFFFF7F  .word   2147483646
  98                .align  1
  99                .global satadd2
 100                .thumb
 101                .thumb_func
 103                satadd2:
 104                @ args = 0, pretend = 0, frame = 0
 105                @ frame_needed = 0, uses_anonymous_args = 0
 106                @ link register save eliminated.
 107 0038 0346      mov     r3, r0          @ x, x
 108 003a 91EA030F  teq     r1, r3          @ y, x
 109 003e 0844      add     r0, r0, r1      @ retval, x, y
 110 0040 08D4      bmi     .L11            @,
 111 0042 90EA030F  teq     r0, r3          @ retval, x
 112 0046 05D5      bpl     .L11            @,
 113 0048 002B      cmp     r3, #0          @ x,
 114 004a ACBF      ite     ge              @
 115 004c 6FF00040  mvnge   r0, #-2147483648  @, retval,
 116 0050 4FF00040  movlt   r0, #-2147483648  @, retval,
 117                .L11:
 118 0054 7047      bx      lr              @
 120                .align  1
 121                .global satadd3
 122                .thumb
 123                .thumb_func
 125                satadd3:
 126                @ args = 0, pretend = 0, frame = 0
 127                @ frame_needed = 0, uses_anonymous_args = 0
 128                @ link register save eliminated.
 129 0056 81FA80F0  qadd    r0, r0, r1      @ <retval>, x, y
 130 005a 7047      bx      lr              @
 132                .ident  "GCC: (Sourcery CodeBench Lite 2011.09-69) 4.6.1"
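
As a small illustration of David's point (my sketch, not code from the
thread): once the value is declared with the TR 18037 type directly, no
union trick is needed and the saturation is simply a property of the type.
This assumes gcc compiling C (not C++) with <stdfix.h> available, as in
test.c above.

#include <stdfix.h>

// Saturating Q31 accumulate; per David's comments, gcc turns the addition
// into a QADD on a Cortex-M4 and into a library call on a Cortex-M3.
signed long sat fract accumulate(signed long sat fract acc,
                                 signed long sat fract sample)
{
  return acc + sample;
}
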
On Thu, 29 Mar 2012 23:41:30 +0200, David Brown wrote:

> However, this all raises bigger questions - why are you making your own
> code for this?  Modern compilers (such as gcc) support fractional types
> (from ISO/IEC TR 18037).  If you use them, as in satadd3(), the compiler
> will generate optimal code for processors with hardware support (such as
> the Cortex-M4).  For other processors, such as the Cortex-M3, the
> compiler automatically uses a library routine.  You can expect such
> library routines to be pretty optimal for the architecture in question
> (for the M3, the library code is the same as for satadd2() below, which
> is hardly surprising given the source of that function).
Damn.  And I thought I was so smart.

So, how long must I have been sleeping?

-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
In article <rZydncc478zzA-nSnZ2dnUVZ_ridnZ2d@web-ster.com>, 
tim@seemywebsite.com says...
> [Tim's original post and code quoted in full -- snipped]
I was going to try out that code on the IAR EWARM compiler at various
optimization levels -- until I realized that

  "CFractional CFractional::operator + (CFractional y) const"

doesn't look like C to me.  Am I missing something?

Could you include enough information to make that example directly
compilable in standard C?

Mark Borgerson
On 30/03/2012 06:27, Mark Borgerson wrote:
> In article<rZydncc478zzA-nSnZ2dnUVZ_ridnZ2d@web-ster.com>,
> tim@seemywebsite.com says...
>> [Tim's original post and code quoted in full -- snipped]
>
> I was going to try out that code on the IAR EWARM compiler at various
> optimization levels -- until I realized that
>
>    "CFractional CFractional::operator + (CFractional y) const"
>
> doesn't look like C to me.  Am I missing something?
>
> Could you include enough information to make that example
> directly compilable in standard C?
>
> Mark Borgerson
It is clearly C++, but it would seem that CFractional is a class
containing an int32_t member "_x" which is the fractional value in
question.  Think of it as syntactic sugar around the function

  int32_t add_sat_frac(int32_t a, int32_t b);

(Or see my re-write of the code in C in my other post.)

mvh.,

David
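
Spelled out, a plain-C rendering of that function might look like the
following (a sketch of the logic in Tim's #else branch; "add_sat_frac" is
just the name David suggests above, not anything from Tim's code base).
Note that, like the original, it relies on signed overflow wrapping, which
is strictly undefined behaviour in ISO C (gcc's -fwrapv makes it well
defined).

#include <stdint.h>

int32_t add_sat_frac(int32_t a, int32_t b)
{
  int32_t retval = a + b;

  // Negative overflow: both operands negative but the wrapped sum is not
  // (or the sum landed on the "forbidden" value INT32_MIN).
  if (a < 0 && b < 0 && (retval >= 0 || retval < -INT32_MAX))
  {
    retval = -INT32_MAX;
  }

  // Positive overflow: both operands positive but the wrapped sum is not.
  if (a > 0 && b > 0 && retval <= 0)
  {
    retval = INT32_MAX;
  }

  return retval;
}
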
In article <BcSdna8XpZPQ_ujSnZ2dnUVZ7v2dnZ2d@lyse.net>, 
david@westcontrol.removethisbit.com says...
> > On 30/03/2012 06:27, Mark Borgerson wrote:
> > > [Tim's original post and Mark's question quoted -- snipped]
>
> It is clearly C++, but it would seem that CFractional is a class
> containing an int32_t member "_x" which is the fractional value in
> question.  Think of it as syntactic sugar around the function
>
>    int32_t add_sat_frac(int32_t a, int32_t b);
>
> (Or see my re-write of the code in C in my other post.)
I did look at the C code and the compiler outputs.  It seems that
compilers have come a long way since I wrote some 68K assembly because
the compiler refused to use the most efficient decrement-test-and-loop
instruction (DBNE D0, Dest, I think).

It is clear to me that the compiler writers are way ahead of me for the
ARM and ARM-Cortex chips.  Even on the simpler MSP430, I seldom use
assembly outside the startup code.  I still look at the assembly listing
in the debugger, though.

Mark Borgerson
On Fri, 30 Mar 2012 09:31:50 +0200, David Brown wrote:

> On 30/03/2012 06:27, Mark Borgerson wrote:
>> [Tim's original post and Mark's question quoted -- snipped]
>
> It is clearly C++, but it would seem that CFractional is a class
> containing an int32_t member "_x" which is the fractional value in
> question.  Think of it as syntactic sugar around the function
>
>    int32_t add_sat_frac(int32_t a, int32_t b);
Yup.  In fact, that's an awful lot like what the call looks like when I
need to do this in C (except that I'm going to be investigating just how
ubiquitous fractional support is, now that I've been made aware of it).

Sorry for not elucidating -- I thought it would be obvious.

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
On 30/03/12 16:22, Mark Borgerson wrote:
> I did look at the C code and the compiler outputs.  It
> seems that compilers have come a long way since I wrote some
> 68K assembly because the compiler refused to use the most
> efficient decrement-test-and-loop instruction (DBNE D0, Dest,
> I think).
That reminds me of a situation where C was much better than assembly for
startup code.  This was 15 years ago - the compiler in question being
about 20 years old now.

The toolchain-provided startup code for clearing the bss was written in
assembly, as is common.  And it was slow and inefficient - also a very
common situation for toolchain-provided assembly code.  I re-wrote it in
C - the result was clearer code, half the size of object code, and
something like 10 times as fast at run time.  Ironically, it is because
the compiler generated a DBNE instruction, which the assembly code did
not use.  (The compiler will only be able to generate DBNE instructions
for a 16-bit counter, not a 32-bit "int" counter.  It's one of the few
16-bit-only instructions on the m68k.)
> It is clear to me that the compiler writers are way ahead of
> me for the ARM and ARM-Cortex chips.  Even on the simpler
> MSP430, I seldom use assembly outside the startup code.
> I still look at the assembly listing in the debugger, though.
>
> Mark Borgerson
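
For illustration, roughly the kind of C bss-clearing loop David describes --
a sketch under stated assumptions, not his actual code.  The symbol names
(__bss_start, __bss_end) are typical linker-script names and are assumptions
here, and the 16-bit counter assumes the bss is smaller than 64K longwords,
which is what lets a 68k compiler emit the DBcc loop form he mentions.

extern unsigned long __bss_start[];
extern unsigned long __bss_end[];

void clear_bss(void)
{
  unsigned long *p = __bss_start;
  unsigned short n = (unsigned short)(__bss_end - __bss_start);  /* longword count */

  while (n--)
  {
    *p++ = 0;
  }
}
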