EmbeddedRelated.com
Forums

A Challenge for our Compiler Writer(s)

Started by Tim Wescott March 29, 2012
Hey Walter (et al., if you're out there):

With the gnu tools, optimizations on, and an Arm Cortex M3, this goes a 
_lot_ faster when you precede it with

#define ASSEMBLY_WORKS

than when you don't.

Yet you say that an optimizer should eat up the C code and spit out 
assembly that's better than I can do.

How come the difference?  Is it the tools?  I know it's not because it's 
the World's Best ARM Assembly, because I've learned a bit since I did it 
and could probably speed it up -- or at least make it cleaner.

CFractional CFractional::operator + (CFractional y) const
{
#ifdef ASSEMBLY_WORKS
  int32_t a = _x;
  int32_t b = y._x;
  asm ( "adds   %[a], %[b]\n"     // subtract
        "bvc    .sat_add_vc\n"    // check for overflow
        "ite    mi\n"
        "ldrmi  %[a], .sat_add_maxpos\n"  // set to max positive
        "ldrpl  %[a], .sat_add_maxneg\n"  // set to max negative
        "b      .sat_add_ret\n"
        ".sat_add_maxpos: .word   0x7fffffff\n"
        ".sat_add_maxneg: .word   0x80000001\n"
        ".sat_add_forbid: .word   0x80000000\n"
        ".sat_add_vc:\n"
        "bpl    .sat_add_ret\n"
        "ldr    %[b], .sat_add_forbid\n"
        "cmp    %[a], %[b]\n"
        "it     eq\n"
        "moveq  %[a], %[b]\n"
        ".sat_add_ret:\n"
        : [a] "=r" (a), [b] "=r" (b)
        : "[a]" "r" (a), "[b]" "r" (b));

  return CFractional(a);
#else
  int32_t retval = _x + y._x;

  // Check for underflow and saturate if so
  if (_x < 0 && y._x < 0 && (retval >= 0 || retval < -INT32_MAX))
  {
    retval = -INT32_MAX;
  }

  // check for overflow and saturate if so
  if (_x > 0 && y._x > 0 && retval <= 0)
  {
    retval = INT32_MAX;
  }

  return retval;
#endif
}


-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
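
For comparison, here is a sketch of the same saturating add written with
__builtin_add_overflow.  This is not from the thread -- the builtin only
appeared in gcc 5, years after these posts -- but it lets the compiler see
the overflow check directly instead of reverse-engineering it from sign
tests.  The saturation values follow the C branch above; unlike that branch,
an exact (non-overflowing) sum of INT32_MIN is not remapped to -INT32_MAX.

#include <stdint.h>

static inline int32_t sat_add32(int32_t a, int32_t b)
{
  int32_t sum;

  // __builtin_add_overflow returns true if the mathematically exact sum
  // does not fit in an int32_t; the wrapped result is still stored in sum.
  if (__builtin_add_overflow(a, b, &sum))
  {
    // The overflow direction follows the sign of either operand.
    return (a < 0) ? -INT32_MAX : INT32_MAX;
  }

  return sum;
}
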
On 03/29/2012 07:39 PM, Tim Wescott wrote:
> [Tim's original post and code quoted in full -- snipped]
Not an answer to your question, but couldn't you use the SSAT instruction to your advantage here?
On Thu, 29 Mar 2012 20:19:48 +0200, Arlet Ottens wrote:

> On 03/29/2012 07:39 PM, Tim Wescott wrote:
>> [Tim's original post and code quoted in full -- snipped]
>
> Not an answer to your question, but couldn't you use the SSAT
> instruction to your advantage here?
If it's what I think it is -- very possibly.  As I said, this isn't
super-optimized assembly code, here.

-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
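
For what it's worth, here is a sketch of what Arlet's SSAT suggestion could
look like (my assumption about his intent, not code from the thread).  SSAT
clamps a 32-bit value into an n-bit signed range, so it applies directly only
when the full-precision sum still fits in 32 bits -- e.g. Q15 operands
carried in int32_t.  It does not by itself replace a full Q31 saturating add,
which needs a 33-bit intermediate (or QADD on a core with the DSP extension).

#include <stdint.h>

// Saturating add of two Q15 values held in int32_t, using the Cortex-M3's
// SSAT instruction to clamp the (non-overflowing) 32-bit sum to 16 bits.
static inline int32_t sat_add_q15(int32_t a, int32_t b)
{
  int32_t sum = a + b;              // cannot overflow 32 bits for Q15 inputs
  asm ("ssat %0, #16, %1" : "=r" (sum) : "r" (sum));
  return sum;
}
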
On 29/03/12 19:39, Tim Wescott wrote:
> [Tim's original post and code quoted in full -- snipped]
I don't have an ARM handy for testing speed, but I've just tried compiling
some test code with the latest Code Sourcery "lite" arm compiler (gcc 4.6.1),
with the command line:

  arm-none-eabi-gcc test.c -c -std=gnu99 -Wa,-ahlsd=test.lst -fverbose-asm
      -Os -mcpu=cortex-m4 -mthumb

(I tried with cortex-m4 because it supports saturating arithmetic.)

There might be differences about saturating negative values to -INT32_MAX or
to INT32_MIN - I don't know which is standard or required here.

As can be seen from the code below, your C code is not optimal.  I would be
very interested to know how the speed of satadd2() below compares to your
hand-made assembly.

However, this all raises bigger questions - why are you making your own code
for this?  Modern compilers (such as gcc) support fractional types (from
ISO/IEC TR 18037).  If you use them, as in satadd3(), the compiler will
generate optimal code for processors with hardware support (such as the
Cortex-M4).  For other processors, such as the Cortex-M3, the compiler
automatically uses a library routine.  You can expect such library routines
to be pretty optimal for the architecture in question (for the M3, the
library code is the same as for satadd2() below, which is hardly surprising
given the source of that function).

So by using "signed long sat fract" types you get fast library code on M3
and before, and when you switch to an M4 with DSP functionality, a re-compile
gives you optimal use of the hardware without having to re-write your
assembly.

Tell me again why assembly is so great in this case?

mvh.,

David


// test.c
#include <stdint.h>
#include <stdfix.h>

int32_t satadd1(int32_t x, int32_t y)
{
  int32_t retval = x + y;

  if ((x < 0) && (y < 0) && ((retval >= 0) || (retval < -INT32_MAX)))
  {
    retval = -INT32_MAX;
  }

  if ((x > 0) && (y > 0) && (retval <= 0))
  {
    retval = INT32_MAX;
  }

  return retval;
}

// Copied from <http://gcc.gnu.org/wiki/FixedPointArithmetic>
#define MIN_32 0x80000000
#define MAX_32 0x7fffffff

int32_t satadd2(int32_t x, int32_t y)
{
  int32_t retval = x + y;

  if (((x ^ y) & MIN_32) == 0)
  {
    // Sign of x and y the same
    if ((retval ^ x) & MIN_32)
    {
      // Sign of retval and x are different
      retval = (x < 0) ? MIN_32 : MAX_32;
    }
  }

  return retval;
}

int32_t satadd3(int32_t x, int32_t y)
{
  typedef union {
    int32_t i;
    signed long sat fract f;
  } satInt_t;

  satInt_t a, b, c;

  a.i = x;
  b.i = y;
  c.f = a.f + b.f;
  return c.i;
}


// test.lst
  65                satadd1:
  66                @ args = 0, pretend = 0, frame = 0
  67                @ frame_needed = 0, uses_anonymous_args = 0
  68                @ link register save eliminated.
  69 0000 0346      mov     r3, r0          @ x, x
  70 0002 002B      cmp     r3, #0          @ x,
  71 0004 0844      add     r0, r0, r1      @ retval, x, y
  72 0006 0ADA      bge     .L2             @,
  73 0008 0029      cmp     r1, #0          @ y,
  74 000a 0FDA      bge     .L5             @,
  75 000c 00F10041  add     r1, r0, #-2147483648  @ tmp140, retval,
  76 0010 4B1E      subs    r3, r1, #1      @ tmp140, tmp140,
  77 0012 074A      ldr     r2, .L8         @ tmp144,
  78 0014 0749      ldr     r1, .L8+4       @ tmp142,
  79 0016 8B42      cmp     r3, r1          @ tmp140, tmp142
  80 0018 88BF      it      hi              @
  81 001a 1046      movhi   r0, r2          @, retval, tmp144
  82 001c 7047      bx      lr              @
  83                .L2:
  84 001e 05D0      beq     .L5             @,
  85 0020 0029      cmp     r1, #0          @ y,
  86 0022 03DD      ble     .L5             @,
  87 0024 0028      cmp     r0, #0          @ retval,
  88 0026 D8BF      it      le              @
  89 0028 6FF00040  mvnle   r0, #-2147483648  @, retval,
  90                .L5:
  91 002c 7047      bx      lr              @
  92                .L9:
  93 002e 00BF      .align  2
  94                .L8:
  95 0030 01000080  .word   -2147483647
  96 0034 FEFFFF7F  .word   2147483646
  98                .align  1
  99                .global satadd2
 100                .thumb
 101                .thumb_func
 103                satadd2:
 104                @ args = 0, pretend = 0, frame = 0
 105                @ frame_needed = 0, uses_anonymous_args = 0
 106                @ link register save eliminated.
 107 0038 0346      mov     r3, r0          @ x, x
 108 003a 91EA030F  teq     r1, r3          @ y, x
 109 003e 0844      add     r0, r0, r1      @ retval, x, y
 110 0040 08D4      bmi     .L11            @,
 111 0042 90EA030F  teq     r0, r3          @ retval, x
 112 0046 05D5      bpl     .L11            @,
 113 0048 002B      cmp     r3, #0          @ x,
 114 004a ACBF      ite     ge              @
 115 004c 6FF00040  mvnge   r0, #-2147483648  @, retval,
 116 0050 4FF00040  movlt   r0, #-2147483648  @, retval,
 117                .L11:
 118 0054 7047      bx      lr              @
 120                .align  1
 121                .global satadd3
 122                .thumb
 123                .thumb_func
 125                satadd3:
 126                @ args = 0, pretend = 0, frame = 0
 127                @ frame_needed = 0, uses_anonymous_args = 0
 128                @ link register save eliminated.
 129 0056 81FA80F0  qadd    r0, r0, r1      @ <retval>, x, y
 130 005a 7047      bx      lr              @
 132                .ident  "GCC: (Sourcery CodeBench Lite 2011.09-69) 4.6.1"
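
As a small illustration of David's point (my sketch, not code from the
thread): once the value is declared with the TR 18037 type directly, no
union trick is needed and the saturation is simply a property of the type.
This assumes gcc compiling C (not C++) with <stdfix.h> available, as in
test.c above.

#include <stdfix.h>

// Saturating Q31 accumulate; per David's comments, gcc turns the addition
// into a QADD on a Cortex-M4 and into a library call on a Cortex-M3.
signed long sat fract accumulate(signed long sat fract acc,
                                 signed long sat fract sample)
{
  return acc + sample;
}
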
On Thu, 29 Mar 2012 23:41:30 +0200, David Brown wrote:

> However, this all raises bigger questions - why are you making your own
> code for this?  Modern compilers (such as gcc) support fractional types
> (from ISO/IEC TR 18037).  If you use them, as in satadd3(), the compiler
> will generate optimal code for processors with hardware support (such as
> the Cortex-M4).  For other processors, such as the Cortex-M3, the
> compiler automatically uses a library routine.  You can expect such
> library routines to be pretty optimal for the architecture in question
> (for the M3, the library code is the same as for satadd2() below, which
> is hardly surprising given the source of that function).
Damn.  And I thought I was so smart.

So, how long must I have been sleeping?

-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
In article <rZydncc478zzA-nSnZ2dnUVZ_ridnZ2d@web-ster.com>, 
tim@seemywebsite.com says...
> [Tim's original post and code quoted in full -- snipped]
I was going to try out that code on the IAR EWARM compiler at various
optimization levels -- until I realized that

  "CFractional CFractional::operator + (CFractional y) const"

doesn't look like C to me.  Am I missing something?

Could you include enough information to make that example directly
compilable in standard C?

Mark Borgerson
On 30/03/2012 06:27, Mark Borgerson wrote:
> In article<rZydncc478zzA-nSnZ2dnUVZ_ridnZ2d@web-ster.com>,
> tim@seemywebsite.com says...
>> [Tim's original post and code quoted in full -- snipped]
>
> I was going to try out that code on the IAR EWARM compiler at various
> optimization levels -- until I realized that
>
>    "CFractional CFractional::operator + (CFractional y) const"
>
> doesn't look like C to me.  Am I missing something?
>
> Could you include enough information to make that example
> directly compilable in standard C?
>
> Mark Borgerson
It is clearly C++, but it would seem that CFractional is a class
containing an int32_t member "_x" which is the fractional value in
question.  Think of it as syntactic sugar around the function

  int32_t add_sat_frac(int32_t a, int32_t b);

(Or see my re-write of the code in C in my other post.)

mvh.,

David
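
Spelled out, a plain-C rendering of that function might look like the
following (a sketch of the logic in Tim's #else branch; "add_sat_frac" is
just the name David suggests above, not anything from Tim's code base).
Note that, like the original, it relies on signed overflow wrapping, which
is strictly undefined behaviour in ISO C (gcc's -fwrapv makes it well
defined).

#include <stdint.h>

int32_t add_sat_frac(int32_t a, int32_t b)
{
  int32_t retval = a + b;

  // Negative overflow: both operands negative but the wrapped sum is not
  // (or the sum landed on the "forbidden" value INT32_MIN).
  if (a < 0 && b < 0 && (retval >= 0 || retval < -INT32_MAX))
  {
    retval = -INT32_MAX;
  }

  // Positive overflow: both operands positive but the wrapped sum is not.
  if (a > 0 && b > 0 && retval <= 0)
  {
    retval = INT32_MAX;
  }

  return retval;
}
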
In article <BcSdna8XpZPQ_ujSnZ2dnUVZ7v2dnZ2d@lyse.net>, 
david@westcontrol.removethisbit.com says...
> > On 30/03/2012 06:27, Mark Borgerson wrote:
> > > [Tim's original post and Mark's question quoted -- snipped]
>
> It is clearly C++, but it would seem that CFractional is a class
> containing an int32_t member "_x" which is the fractional value in
> question.  Think of it as syntactic sugar around the function
>
>    int32_t add_sat_frac(int32_t a, int32_t b);
>
> (Or see my re-write of the code in C in my other post.)
I did look at the C code and the compiler outputs.  It seems that
compilers have come a long way since I wrote some 68K assembly because
the compiler refused to use the most efficient decrement-test-and-loop
instruction (DBNE D0, Dest, I think).

It is clear to me that the compiler writers are way ahead of me for the
ARM and ARM-Cortex chips.  Even on the simpler MSP430, I seldom use
assembly outside the startup code.  I still look at the assembly listing
in the debugger, though.

Mark Borgerson
On Fri, 30 Mar 2012 09:31:50 +0200, David Brown wrote:

> On 30/03/2012 06:27, Mark Borgerson wrote:
>> [Tim's original post and Mark's question quoted -- snipped]
>
> It is clearly C++, but it would seem that CFractional is a class
> containing an int32_t member "_x" which is the fractional value in
> question.  Think of it as syntactic sugar around the function
>
>    int32_t add_sat_frac(int32_t a, int32_t b);
Yup.  In fact, that's an awful lot like what the call looks like when I
need to do this in C (except that I'm going to be investigating just how
ubiquitous fractional support is, now that I've been made aware of it).

Sorry for not elucidating -- I thought it would be obvious.

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
On 30/03/12 16:22, Mark Borgerson wrote:
> I did look at the C code and the compiler outputs.  It
> seems that compilers have come a long way since I wrote some
> 68K assembly because the compiler refused to use the most
> efficient decrement-test-and-loop instruction (DBNE D0, Dest,
> I think).
That reminds me of a situation where C was much better than assembly for
startup code.  This was 15 years ago - the compiler in question being
about 20 years old now.

The toolchain-provided startup code for clearing the bss was written in
assembly, as is common.  And it was slow and inefficient - also a very
common situation for toolchain-provided assembly code.  I re-wrote it in
C - the result was clearer code, half the size of object code, and
something like 10 times as fast at run time.  Ironically, it is because
the compiler generated a DBNE instruction, which the assembly code did
not use.  (The compiler will only be able to generate DBNE instructions
for a 16-bit counter, not a 32-bit "int" counter.  It's one of the few
16-bit-only instructions on the m68k.)
> It is clear to me that the compiler writers are way ahead of
> me for the ARM and ARM-Cortex chips.  Even on the simpler
> MSP430, I seldom use assembly outside the startup code.
> I still look at the assembly listing in the debugger, though.
>
> Mark Borgerson
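
For illustration, roughly the kind of C bss-clearing loop David describes --
a sketch under stated assumptions, not his actual code.  The symbol names
(__bss_start, __bss_end) are typical linker-script names and are assumptions
here, and the 16-bit counter assumes the bss is smaller than 64K longwords,
which is what lets a 68k compiler emit the DBcc loop form he mentions.

extern unsigned long __bss_start[];
extern unsigned long __bss_end[];

void clear_bss(void)
{
  unsigned long *p = __bss_start;
  unsigned short n = (unsigned short)(__bss_end - __bss_start);  /* longword count */

  while (n--)
  {
    *p++ = 0;
  }
}
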