Hello, I'm using an MSP430F1101 and my project requires some math equations, including 32-bit multiplication and division. When I use IAR C, the micro quickly runs out of memory. Has anybody done 32-bit math in assembly?
assembler math
Started by January 22, 2003
Reply by January 22, 2003
> Hello
> I'm using MSP430F1101 and my project requires some math equations,
> including 32 bit multiplication and division. When i use IAR C, the
> micro runs quickly out of memory.
> Has anybody done 32 bit math in assembly?

Yes, to support our C compiler. ;-)

I assume that you mean 32-bit integer (signed or unsigned) multiplication and division. If so, take a look at chapter 5 of the TI Application Report Book (http://focus.ti.com/analog/docs/articles.tsp?familyId=342&templateId=5246&path=templatedata/cm/brc/data/20011218msp430userguides&articleType=brc#app). It has mixed 32-bit/16-bit operations. Be warned: these are not the fastest routines in the world, because they bit-bash the multiplication blindly, with no regard for the input values. A better approach to multiplication, and likewise to division, is to examine the operands before starting.

This app note will get you some way towards coding the 32-bit routines yourself. I'd also search Google for references on multiplication and division to get a flavour of more efficient algorithms. Also worth reading are the floating-point package application note and the hardware multiplier application note.

Of course, TI's published FP code isn't as fast as ours and it doesn't conform to the standard IEEE formats... ;-)

Good luck!

-- Paul.
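The advice to look at the operands before starting can be illustrated with a shift-and-add multiply, which is what a part without a hardware multiplier (such as the MSP430F1101) has to do. This is a minimal C sketch, not TI's app-note code or any compiler's library routine; the name `mul32` is hypothetical. Instead of always running 32 iterations, it lets the smaller operand drive the loop and stops as soon as that operand has no set bits left.

```c
#include <stdint.h>

/* Hypothetical name; a sketch only.
   32 x 32 -> 32-bit unsigned multiply by shift-and-add. It inspects the
   operands first: the smaller one drives the loop, and the loop ends as
   soon as no set bits remain, so small operands finish quickly. */
uint32_t mul32(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    /* Swap so that b, which controls the iteration count, is the smaller
       operand (it then has no more significant bits than a). */
    if (a < b) {
        uint32_t t = a;
        a = b;
        b = t;
    }

    while (b != 0) {
        if (b & 1u)
            product += a;   /* this multiplier bit is set: add shifted multiplicand */
        a <<= 1;            /* multiplicand times the next power of two */
        b >>= 1;            /* move on to the next multiplier bit */
    }
    return product;         /* low 32 bits of the full product (wraps modulo 2^32) */
}
```

For example, multiplying a larger value by 10 takes four iterations (10 is 1010 in binary), where a fixed bit-bashing routine always takes 32.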
Reply by January 22, 2003
--- In msp430@msp4..., "khalakatevakis <halakatevakis@e...>" <halakatevakis@e...> wrote:
> Hello
> I'm using MSP430F1101 and my project requires some math equations,
> including 32 bit multiplication and division. When i use IAR C, the
> micro runs quickly out of memory.
> Has anybody done 32 bit math in assembly?
What sort of math? Integer arithmetic is really simple and requires no RAM
(multiplication uses r10 to r15, division r8 to r15).
Floating-point arithmetic is more RAM-hungry: multiplication uses about
r4 to r15 and 22 bytes of RAM, division r14 to r15 and 18 bytes of RAM
(not really optimized).
~d
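Integer division needing no RAM follows from how the usual shift-and-subtract (restoring) algorithm works: its working state is just four 32-bit values (dividend, divisor, quotient, remainder) plus a loop counter, all small enough to stay in registers on a 16-bit CPU. Here is a minimal C sketch under the assumptions of unsigned operands and a non-zero divisor; `div32` is a hypothetical name, not a library or compiler routine.

```c
#include <stdint.h>

/* Hypothetical name; a sketch only.
   32 / 32-bit unsigned restoring (shift-and-subtract) division.
   Returns the quotient and stores the remainder through *rem.
   Assumes divisor != 0: the caller must check, this sketch does not. */
uint32_t div32(uint32_t dividend, uint32_t divisor, uint32_t *rem)
{
    uint32_t quotient = 0;
    uint32_t remainder = 0;

    for (int i = 31; i >= 0; --i) {
        /* Bring down the next dividend bit, most significant first. The
           remainder never exceeds the dividend bits consumed so far, so
           this shift cannot overflow 32 bits. */
        remainder = (remainder << 1) | ((dividend >> i) & 1u);

        /* If the divisor fits, subtract it and set this quotient bit. */
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= (uint32_t)1 << i;
        }
    }
    *rem = remainder;
    return quotient;
}
```

The `(uint32_t)1 << i` cast matters on the MSP430, where `int` is 16 bits and a plain `1u << 31` would be undefined.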
Reply by January 22, 2003
> What sort of math? Integer arith is really simple and
> requires no RAM
> (multiplication - r10 to r15, division - r8 - r15).
> Float arith will be more RAM exhaustive - mult requires about
> r4 - r15 and 22 bytes of ram,

I don't see this at all. My hand-coded float multiply routines only use R6-R15 (input in R14:R15 and R13:R12) and, because we don't save R12-R15 across calls, we only need to save 6 registers, for a total of 12 bytes; all working storage is in registers. If you have a hardware multiplier, we use two fewer registers (R8-R15) and consequently need only 8 bytes of RAM. If you don't need to save the working registers across calls (which is common, for instance when computing a power series), then you can dispense with the RAM requirement entirely and do all the arithmetic in registers.

> division - r14 to r15 and 18
> bytes of ram. (not really optimized)
> ~d

Division requires R7-R15 (with input in R13:R12 and R15:R14), and the only RAM it needs is for saving registers. If you can do without that, FP division can be done entirely in registers.

-- Paul.
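Why a float multiply fits in a few registers becomes clear from the algorithm: XOR the signs, add the exponents, multiply the 24-bit mantissas and renormalize. The following is a minimal C sketch, not Paul's routine or any compiler's library code. It assumes IEEE 754 binary32 floats, normal, finite, non-zero inputs, no overflow or underflow, and it truncates the mantissa product instead of rounding to nearest; `bits_of`, `float_of` and `soft_fmul_bits` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret helpers (assume IEEE 754 binary32 floats). */
static uint32_t bits_of(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float float_of(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Sketch of a software single-precision multiply on raw bit patterns.
   Simplifying assumptions: both inputs normal, finite and non-zero; the
   result neither overflows nor underflows; the mantissa product is
   truncated (round toward zero) rather than rounded to nearest-even. */
uint32_t soft_fmul_bits(uint32_t a, uint32_t b)
{
    uint32_t sign = (a ^ b) & 0x80000000u;       /* sign of the product */
    int32_t ea = (int32_t)((a >> 23) & 0xFFu);    /* biased exponents */
    int32_t eb = (int32_t)((b >> 23) & 0xFFu);
    uint32_t ma = (a & 0x7FFFFFu) | 0x800000u;    /* restore the hidden 1 bit */
    uint32_t mb = (b & 0x7FFFFFu) | 0x800000u;

    uint64_t p = (uint64_t)ma * mb;               /* 2^46 <= p < 2^48 */
    int32_t e = ea + eb - 127;                    /* remove one of the two biases */

    if (p & ((uint64_t)1 << 47)) {                /* mantissa product in [2, 4) */
        p >>= 24;
        e += 1;
    } else {                                      /* mantissa product in [1, 2) */
        p >>= 23;
    }
    return sign | ((uint32_t)e << 23) | ((uint32_t)p & 0x7FFFFFu);
}
```

On the MSP430 the 24 x 24-bit mantissa product would be assembled from 16-bit pieces (or the hardware multiplier) rather than one 64-bit multiply, but the working set is the same: a sign, an exponent and a mantissa product.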
Reply by January 22, 2003
> I don't see this at all. My hand-coded float multiply routines only use
> R6-R15 (input in R14:R15 and R13:R12) and, because we don't save R12-R15
> across calls, we only need to save 6 registers for a total of 12
> bytes--all working storage is in registers. If you have a hardware
> multiplier, we use two fewer registers (R8-R15) and consequently only 8
> bytes of RAM. If you don't need to save the working registers across
> calls (which is very common in some cases, for instance when computing a
> power series), then you can dispense with the RAM requirement and do all
> arithmetic in registers.

That's cool! How about singularities?
And yes, we also do not save r12-r15, but just in case one wants to save them...

> > division - r14 to r15 and 18
> > bytes of ram. (not really optimized) ~d

Sorry, that should be r4 - r15 (r4-r11 saved on the stack + return address == 18 bytes).

> Division requires R7-R15 (with input in R13:R12 and R15:R14) and only
> the requirement to save the registers. If yo can do without this, fp
> division can be completed in registers only.

Well, the C-coded FP mul/div routines do not use the stack for local variables at all; everything is done in registers. More careful C coding would probably give better output, and assembly coding an even better result. The figures above are what gcc produces.

~d

> -- Paul.
Reply by January 22, 2003
Hi,
> > I don't see this at all. My hand-coded float multiply routines only
> > use R6-R15 (input in R14:R15 and R13:R12) and, because we don't save
> > R12-R15 across calls, we only need to save 6 registers for a total of 12
> > bytes--all working storage is in registers. If you have a hardware
> > multiplier, we use two fewer registers (R8-R15) and consequently only 8
> > bytes of RAM. If you don't need to save the working registers across
> > calls (which is very common in some cases, for instance when computing a
> > power series), then you can dispense with the RAM requirement and do all
> > arithmetic in registers.
>
> That's cool!
> HOw about singularities?
If you mean IEEE infinities, I deal with those and with NaNs too. It's a
simple matter to extract the exponent, compare it with 255 (in the
single-precision case) and handle the exceptional cases; you need to
extract the exponent anyway.
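The exponent test described above can be sketched in C. This assumes IEEE 754 binary32 floats; the `fclass` enum and the function names are hypothetical, not part of any compiler's library. An all-ones exponent (255) with a zero mantissa is an infinity, and with a non-zero mantissa it is a NaN.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical classification codes for this sketch. */
enum fclass { FC_ZERO, FC_SUBNORMAL, FC_NORMAL, FC_INF, FC_NAN };

/* Classify a single-precision value from its raw bits. The 8-bit biased
   exponent occupies bits 30..23; the value 255 marks infinities and NaNs,
   which a software float routine handles before the ordinary arithmetic. */
enum fclass classify_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 255u)
        return mantissa ? FC_NAN : FC_INF;        /* all-ones exponent: special */
    if (exponent == 0u)
        return mantissa ? FC_SUBNORMAL : FC_ZERO; /* all-zeros exponent */
    return FC_NORMAL;
}

/* Convenience wrapper: copy the float's bits without breaking aliasing rules. */
enum fclass classify_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return classify_bits(bits);
}
```

A software multiply or divide would run this check on both operands and fall through to the mantissa arithmetic only for ordinary values.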
The power series is a good example of where cascading operations works very
well: ((x*k1 + k2)*x + k3)*x + k4... In this case you're just repeatedly
multiply-adding. To make this clearer, this is what our compiler does with
the above if x is a float parameter and the ks are constants, as is
usual with a power series:
power(float x)
1214 .ALIGN BYTE[2]
1214 _power
1214 LOCAL x @ R10
1214 0b12 PUSH.W R11
1216 0a12 PUSH.W R10
1218 0a4e MOV.W R14, R10
121a 0b4f MOV.W R15, R11
return ((x+K1)*x + K2)*x + K3;
121c 0e4a MOV.W R10, R14
121e 0f4b MOV.W R11, R15
1220 3c40b6f3 MOV.W #62390, R12
1224 3d409d3f MOV.W #16285, R13
1228 b012e815 CALL #___float32_add
122c 0c4a MOV.W R10, R12
122e 0d4b MOV.W R11, R13
1230 b012da16 CALL #___float32_mul
1234 3c407b14 MOV.W #5243, R12
1238 3d401640 MOV.W #16406, R13
123c b012e815 CALL #___float32_add
1240 0c4a MOV.W R10, R12
1242 0d4b MOV.W R11, R13
1244 b012da16 CALL #___float32_mul
1248 3c401b2f MOV.W #12059, R12
124c 3d405d40 MOV.W #16477, R13
1250 b012e815 CALL #___float32_add
1254 3a41 MOV.W @SP+, R10
1256 3b41 MOV.W @SP+, R11
1258 3041 RET
It's all very neat. I love the MSP430, it's just so cute... :-)
-- Paul.
Reply by January 22, 2003
Hi,

> Not so bad...
> but the output reminds me something... below... :)
> ~d

Yep, looks like gcc compiles to the same sort of output that we do (except that in my case the Ks were constants). But then again, I bet our compiler is a lot faster than gcc to compile and link, and it comes with a pretty IDE. But then it's not as cheap as gcc. :-)

-- Paul.
Reply by January 22, 2003
Not so bad...
but the output reminds me of something... below... :)
~d

-------------------------------
float K1, K2, K3;
float pow(float x) {
    return ((x+K1)*x + K2)*x + K3;
}

/***********************
 * Function `pow'
 ***********************/
pow:
/* prologue: frame size = 0 */
.L__FrameSize_pow=0x0
.L__FrameOffset_pow=0x4
        push    r11
        push    r10
/* prologue end (size=2) */
        mov     r14, r10
        mov     r15, r11
        mov     &K1, r12
        mov     &K1+2, r13
        call    #__addsf3
        mov     r10, r12
        mov     r11, r13
        call    #__mulsf3
        mov     &K2, r12
        mov     &K2+2, r13
        call    #__addsf3
        mov     r10, r12
        mov     r11, r13
        call    #__mulsf3
        mov     &K3, r12
        mov     &K3+2, r13
        call    #__addsf3
/* epilogue: frame size=0 */
        pop     r10
        pop     r11
        ret
/* epilogue end (size=3) */
/* function pow size 33 (28) */
.Lfe1:
        .size   pow,.Lfe1-pow
/********* End of function ******/
Reply by January 23, 2003
> But, then again, I bet that our compiler is a lot faster than gcc to
> compile and link, and it comes with a pretty IDE. But then it's not as
> cheap as gcc. :-)

Well, for a core with only 64K of address space, you probably have to pay more attention to run-time performance than to compile time ;)

~d