EmbeddedRelated.com

assembler math

Started by khalakatevakis January 22, 2003
Hello
I'm using MSP430F1101 and my project requires some math equations, 
including 32 bit multiplication and division. When I use IAR C, the 
micro quickly runs out of memory.
Has anybody done 32 bit math in assembly? 



> Hello
> I'm using MSP430F1101 and my project requires some math equations, 
> including 32 bit multiplication and division. When i use IAR C, the 
> micro runs quickly out of memory.
> Has anybody done 32 bit math in assembly? 

Yes, to support our C compiler.  ;-)

I assume that you mean 32-bit integer (or unsigned) multiplication and
division.  If this is the case, then take a look at the TI Application
Report Book, chapter #5
(http://focus.ti.com/analog/docs/articles.tsp?familyId42&templateIdR
46&path=templatedata/cm/brc/data/20011218msp430userguides&articleType=br
c#app)

This has mixed 32-bit/16-bit operations.  Be warned, these are not the
fastest routines in the world as they bit-bash the multiplication with
no intelligence and with no respect for the input parameters.  A better
approach to multiplication is to look at the operands before starting
off, same for division.
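
As a rough illustration of "looking at the operands before starting off", here is a minimal C sketch of a shift-and-add 32-bit multiply. It is not the TI app note code; the name mul32 is made up for this example. It swaps the operands so the loop runs over the smaller one and stops as soon as no multiplier bits remain, instead of always grinding through 32 iterations.

```c
#include <stdint.h>

/* Shift-and-add multiply returning the low 32 bits of a * b.
   mul32 is a hypothetical name used only for this sketch. */
uint32_t mul32(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    /* Operand check: make b the smaller value so the loop below
       terminates after as few iterations as possible. */
    if (a < b) {
        uint32_t t = a;
        a = b;
        b = t;
    }

    /* One iteration per remaining multiplier bit; exits early once
       the multiplier has no set bits left. */
    while (b != 0) {
        if (b & 1u)
            product += a;   /* add the shifted multiplicand */
        a <<= 1;            /* next bit weight */
        b >>= 1;
    }
    return product;
}
```

The same structure maps onto register pairs in assembly: two 16-bit registers per 32-bit value, with shifts done through carry.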

This app note will get you some way to coding the 32-bit routines
yourself.  I'd also look up some references to multiplication and
division using Google to get a flavour of more efficient algorithms.
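
For division, the usual bit-serial starting point is shift-and-subtract (restoring) division. The C sketch below is only an assumption about what such a routine could look like; udiv32 is a hypothetical name, and it is not taken from the app note. It applies the same operand inspection: it returns at once when the numerator is smaller than the denominator and skips the numerator's leading zero bits.

```c
#include <stdint.h>

/* Restoring (shift-and-subtract) unsigned 32/32-bit division.
   Returns the quotient and stores the remainder through rem.
   udiv32 is a hypothetical name; division by zero is not handled
   and must be checked by the caller. */
uint32_t udiv32(uint32_t num, uint32_t den, uint32_t *rem)
{
    uint32_t quotient = 0;
    uint32_t r = 0;
    int bits = 32;

    /* Operand check 1: nothing to do if the numerator is smaller. */
    if (num < den) {
        *rem = num;
        return 0;
    }

    /* Operand check 2: skip the leading zero bits of the numerator. */
    while (bits > 0 && !(num & 0x80000000u)) {
        num <<= 1;
        bits--;
    }

    /* One iteration per significant numerator bit. */
    while (bits-- > 0) {
        r = (r << 1) | (num >> 31);   /* bring down the next bit */
        num <<= 1;
        quotient <<= 1;
        if (r >= den) {               /* subtract when it fits */
            r -= den;
            quotient |= 1u;
        }
    }
    *rem = r;
    return quotient;
}
```

For example, udiv32(100, 7, &r) runs 7 iterations rather than 32.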

Also good for a read are the floating-point package AN and the hardware
multiplier AN.  Of course, TI's published FP code isn't as fast as ours
and it doesn't conform to the standard IEEE formats...  ;-)

Good luck!

-- Paul.

--- In msp430@msp4..., "khalakatevakis 
<halakatevakis@e...>" <halakatevakis@e...> wrote:
> Hello
> I'm using MSP430F1101 and my project requires some math equations, 
> including 32 bit multiplication and division. When i use IAR C, the 
> micro runs quickly out of memory.
> Has anybody done 32 bit math in assembly?

What sort of math? Integer arithmetic is really simple and requires no RAM 
(multiplication uses r10 to r15, division r8 to r15). 
Float arithmetic needs more RAM: multiplication requires about
r4 to r15 and 22 bytes of RAM, division r14 to r15 and 18 bytes of RAM
(not really optimized).
~d


> --- In msp430@msp4..., "khalakatevakis <halakatevakis@e...>"
> <halakatevakis@e...> wrote:
> > Hello
> > I'm using MSP430F1101 and my project requires some math equations,
> > including 32 bit multiplication and division. When i use IAR C, the 
> > micro runs quickly out of memory.
> > Has anybody done 32 bit math in assembly?
> 
> What sort of math? Integer arith is really simple and requires no RAM 
> (multiplication - r10 to r15, division - r8 - r15). 

> Float arith will be more RAM exhaustive - mult requires about 
> r4 - r15 and 22 bytes of ram,

I don't see this at all.  My hand-coded float multiply routines only use
R6-R15 (input in R14:R15 and R13:R12) and, because we don't save R12-R15
across calls, we only need to save 6 registers for a total of 12
bytes--all working storage is in registers.  If you have a hardware
multiplier, we use two fewer registers (R8-R15) and consequently only 8
bytes of RAM.  If you don't need to save the working registers across
calls (which is very common in some cases, for instance when computing a
power series), then you can dispense with the RAM requirement and do all
arithmetic in registers.

> division - r14 to r15 and 18 
> bytes of ram. (not really optimized) ~d

Division requires R7-R15 (with input in R13:R12 and R15:R14) and only
the requirement to save the registers.  If you can do without this, fp
division can be completed in registers only.

-- Paul.

> 
> I don't see this at all.  My hand-coded float multiply routines only use
> R6-R15 (input in R14:R15 and R13:R12) and, because we don't save R12-R15
> across calls, we only need to save 6 registers for a total of 12
> bytes--all working storage is in registers.  If you have a hardware
> multiplier, we use two fewer registers (R8-R15) and consequently only 8
> bytes of RAM.  If you don't need to save the working registers across
> calls (which is very common in some cases, for instance when computing a
> power series), then you can dispense with the RAM requirement and do all
> arithmetic in registers.

That's cool!
How about singularities? 

And yes, we also do not save r12-r15, yet just in case one wants to save 
them...

> 
> > division - r14 to r15 and 18 
> > bytes of ram. (not really optimized) ~d

Sorry, r4 - r15 (r14-r12 saved on stack + return address == 18 bytes). 
 
> 
> Division requires R7-R15 (with input in R13:R12 and R15:R14) and only
> the requirement to save the registers.  If you can do without this, fp
> division can be completed in registers only.

Well, the C-coded FP mul/div do not consume stack for local variables at 
all; everything is done in registers. More careful coding will probably 
produce better output, and assembly coding will give an even better result.
The figures above are what gcc does.

~d

> 
> -- Paul.


Hi,

> > I don't see this at all.  My hand-coded float multiply routines only
> > use R6-R15 (input in R14:R15 and R13:R12) and, because we don't save
> > R12-R15 across calls, we only need to save 6 registers for a total of 12 
> > bytes--all working storage is in registers.  If you have a hardware 
> > multiplier, we use two fewer registers (R8-R15) and consequently only 8
> > bytes of RAM.  If you don't need to save the working registers across 
> > calls (which is very common in some cases, for instance when computing a
> > power series), then you can dispense with the RAM requirement and do all
> > arithmetic in registers.
> 
> That's cool!
> How about singularities? 

If you mean IEEE infinities, I deal with those and also NaNs.  It's a
simple matter to extract the exponent and compare with 255 (in the
single precision case) and deal with the exceptional cases--you need to
extract the exponent anyway.
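
To make the exponent test concrete, here is a minimal C sketch, assuming float is a 32-bit IEEE 754 single. The names f32_class, classify_f32 and f32_from_bits are invented for this example and are not part of any compiler's runtime library. An exponent field of 255 marks the exceptional cases: a zero fraction means infinity, anything else is a NaN.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical classification result for this sketch. */
enum f32_class { F32_FINITE, F32_INF, F32_NAN };

/* Reinterpret a 32-bit pattern as a float (assumes IEEE 754 single). */
float f32_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Extract the 8-bit exponent field and compare it with 255. */
enum f32_class classify_f32(float f)
{
    uint32_t bits;
    uint32_t exponent;
    uint32_t fraction;

    memcpy(&bits, &f, sizeof bits);
    exponent = (bits >> 23) & 0xFFu;     /* bits 30..23 */
    fraction = bits & 0x007FFFFFu;       /* bits 22..0  */

    if (exponent != 255u)
        return F32_FINITE;               /* normal, denormal or zero */
    return fraction == 0u ? F32_INF : F32_NAN;
}
```

A multiply or divide routine has to unpack the exponent in any case, so the check costs one compare on a value that is already in a register.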

The power series is a good example of where cascading things works very
well -- ((x*k1 + k2)*x + k3)*x + k4...  In this case, you're
multiply-adding.  To make this clearer, this is what our compiler does with
the above, if x is a float parameter and the ks are constants, as is
usual with a power series:

  power(float x)

1214                            .ALIGN  BYTE[2]
1214                    _power
1214                            LOCAL   x @ R10
1214  0b12                      PUSH.W  R11
1216  0a12                      PUSH.W  R10
1218  0a4e                      MOV.W   R14, R10
121a  0b4f                      MOV.W   R15, R11

    return ((x+K1)*x + K2)*x + K3;

121c  0e4a                      MOV.W   R10, R14
121e  0f4b                      MOV.W   R11, R15
1220  3c40b6f3                  MOV.W   #62390, R12
1224  3d409d3f                  MOV.W   #16285, R13
1228  b012e815                  CALL    #___float32_add
122c  0c4a                      MOV.W   R10, R12
122e  0d4b                      MOV.W   R11, R13
1230  b012da16                  CALL    #___float32_mul
1234  3c407b14                  MOV.W   #5243, R12
1238  3d401640                  MOV.W   #16406, R13
123c  b012e815                  CALL    #___float32_add
1240  0c4a                      MOV.W   R10, R12
1242  0d4b                      MOV.W   R11, R13
1244  b012da16                  CALL    #___float32_mul
1248  3c401b2f                  MOV.W   #12059, R12
124c  3d405d40                  MOV.W   #16477, R13
1250  b012e815                  CALL    #___float32_add
1254  3a41                      MOV.W   @SP+, R10
1256  3b41                      MOV.W   @SP+, R11
1258  3041                      RET

It's all very neat.  I love the MSP430, it's just so cute... :-)
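
For reading the listing above: each float constant is loaded as two 16-bit immediates. The C sketch below rebuilds the constants from those immediates and evaluates the same nested expression. It assumes a 32-bit IEEE 754 float and that R13 carries the high word and R12 the low word, which is the pairing that decodes to round decimal values (about 1.234, 2.345 and 3.456). The helper name f32_from_words is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Rebuild a float from the two 16-bit immediates of a MOV.W pair.
   Assumption: high word in R13, low word in R12.
   f32_from_words is a hypothetical helper for this sketch. */
float f32_from_words(uint16_t high, uint16_t low)
{
    uint32_t bits = ((uint32_t)high << 16) | low;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Same nested form as the source line in the listing:
   one add or multiply call per step, x kept in a register pair. */
float power(float x)
{
    const float k1 = f32_from_words(16285, 62390);  /* 0x3F9DF3B6, about 1.234 */
    const float k2 = f32_from_words(16406, 5243);   /* 0x4016147B, about 2.345 */
    const float k3 = f32_from_words(16477, 12059);  /* 0x405D2F1B, about 3.456 */

    return ((x + k1) * x + k2) * x + k3;
}
```

Each intermediate result comes back in the same register pair that feeds the next call, which is why the listing needs no RAM beyond the two saved registers.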

-- Paul.

Hi,

> Not so bad...
> but the output reminds me something... below... :)
> ~d

Yep, looks like gcc compiles to the same sort of output that we do
(except in my case, the Ks were constants).

But, then again, I bet that our compiler is a lot faster than gcc to
compile and link, and it comes with a pretty IDE.  But then it's not as
cheap as gcc.  :-)

-- Paul.


Not so bad...
but the output reminds me something... below... :)
~d

-------------------------------
float K1, K2, K3;
float pow(float x) {
return ((x+K1)*x + K2)*x + K3;
}

/***********************
 * Function `pow'
 ***********************/
pow:
/* prologue: frame size = 0 */
.L__FrameSize_pow=0x0
.L__FrameOffset_pow=0x4
        push    r11
        push    r10
/* prologue end (size=2) */
        mov     r14, r10
        mov     r15, r11
        mov     &K1, r12
        mov     &K1+2, r13
        call    #__addsf3
        mov     r10, r12
        mov     r11, r13
        call    #__mulsf3
        mov     &K2, r12
        mov     &K2+2, r13
        call    #__addsf3
        mov     r10, r12
        mov     r11, r13
        call    #__mulsf3
        mov     &K3, r12
        mov     &K3+2, r13
        call    #__addsf3
/* epilogue: frame size=0 */
        pop     r10
        pop     r11
        ret
/* epilogue end (size=3) */
/* function pow size 33 (28) */
.Lfe1:
        .size   pow,.Lfe1-pow
/********* End of function ******/


> 
> But, then again, I bet that our compiler is a lot faster than gcc to
> compile and link, and it comes with a pretty IDE.  But then it's not as
> cheap as gcc.  :-)

Well, you probably have to pay more attention to run-time performance 
than to compile time for a core with only 64K ;)
~d




