Not so bad...
but the output reminds me of something... below... :)
~d
-------------------------------
float K1, K2, K3;
float pow(float x) {
return ((x+K1)*x + K2)*x + K3;
}
/***********************
* Function `pow'
***********************/
pow:
/* prologue: frame size = 0 */
.L__FrameSize_pow=0x0
.L__FrameOffset_pow=0x4
push r11
push r10
/* prologue end (size=2) */
mov r14, r10
mov r15, r11
mov &K1, r12
mov &K1+2, r13
call #__addsf3
mov r10, r12
mov r11, r13
call #__mulsf3
mov &K2, r12
mov &K2+2, r13
call #__addsf3
mov r10, r12
mov r11, r13
call #__mulsf3
mov &K3, r12
mov &K3+2, r13
call #__addsf3
/* epilogue: frame size=0 */
pop r10
pop r11
ret
/* epilogue end (size=3) */
/* function pow size 33 (28) */
.Lfe1:
.size pow,.Lfe1-pow
/********* End of function ******/
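The cascaded form generalises, of course - the same multiply-add pattern over an arbitrary coefficient array is just a Horner loop. A C sketch for comparison (illustrative only, not compiler output; the names are made up):

```c
/* Horner evaluation of k[0]*x^n + k[1]*x^(n-1) + ... + k[n]:
   one multiply-add per coefficient, the same cascaded
   ((x + K1)*x + K2)*x + K3 shape as the pow() above. */
float horner(float x, const float *k, int n)
{
    float acc = k[0];
    for (int i = 1; i <= n; ++i)
        acc = acc * x + k[i];   /* multiply-accumulate step */
    return acc;
}
```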
-
> The power series is a good example of where cascading things works
> very well -- ((x*k1 + k2)*x + k3)*x + k4... In this case, you're
> multiply-adding. To make this clearer, this is what our compiler does
> with the above, if x is a float parameter and the ks are constants, as
> is usual with a power series:
>
> power(float x)
>
> 1214 .ALIGN BYTE[2]
> 1214 _power
> 1214 LOCAL x @ R10
> 1214 0b12 PUSH.W R11
> 1216 0a12 PUSH.W R10
> 1218 0a4e MOV.W R14, R10
> 121a 0b4f MOV.W R15, R11
>
> return ((x+K1)*x + K2)*x + K3;
>
> 121c 0e4a MOV.W R10, R14
> 121e 0f4b MOV.W R11, R15
> 1220 3c40b6f3 MOV.W #62390, R12
> 1224 3d409d3f MOV.W #16285, R13
> 1228 b012e815 CALL #___float32_add
> 122c 0c4a MOV.W R10, R12
> 122e 0d4b MOV.W R11, R13
> 1230 b012da16 CALL #___float32_mul
> 1234 3c407b14 MOV.W #5243, R12
> 1238 3d401640 MOV.W #16406, R13
> 123c b012e815 CALL #___float32_add
> 1240 0c4a MOV.W R10, R12
> 1242 0d4b MOV.W R11, R13
> 1244 b012da16 CALL #___float32_mul
> 1248 3c401b2f MOV.W #12059, R12
> 124c 3d405d40 MOV.W #16477, R13
> 1250 b012e815 CALL #___float32_add
> 1254 3a41 MOV.W @SP+, R10
> 1256 3b41 MOV.W @SP+, R11
> 1258 3041 RET
>
> It's all very neat. I love the MSP430, it's just so cute... :-)
>
> -- Paul.
Reply by Paul Curtis●January 22, 2003
Hi,
> Not so bad...
> but the output reminds me of something... below... :)
> ~d
Yep, looks like gcc compiles to the same sort of output that we do
(except in my case, the Ks were constants).
But, then again, I bet that our compiler is a lot faster than gcc to
compile and link, and it comes with a pretty IDE. But then it's not as
cheap as gcc. :-)
-- Paul.
>
> -------------------------------
> float K1, K2, K3;
> float pow(float x) {
> return ((x+K1)*x + K2)*x + K3;
> }
>
> /***********************
> * Function `pow'
> ***********************/
> pow:
> /* prologue: frame size = 0 */
> .L__FrameSize_pow=0x0
> .L__FrameOffset_pow=0x4
> push r11
> push r10
> /* prologue end (size=2) */
> mov r14, r10
> mov r15, r11
> mov &K1, r12
> mov &K1+2, r13
> call #__addsf3
> mov r10, r12
> mov r11, r13
> call #__mulsf3
> mov &K2, r12
> mov &K2+2, r13
> call #__addsf3
> mov r10, r12
> mov r11, r13
> call #__mulsf3
> mov &K3, r12
> mov &K3+2, r13
> call #__addsf3
> /* epilogue: frame size=0 */
> pop r10
> pop r11
> ret
> /* epilogue end (size=3) */
> /* function pow size 33 (28) */
> .Lfe1:
> .size pow,.Lfe1-pow
> /********* End of function ******/
> -
> > The power series is a good example of where cascading things works
> > very well -- ((x*k1 + k2)*x + k3)*x + k4... In this case, you're
> > multiply-adding. To make this clearer, this is what our compiler
> > does with the above, if x is a float parameter and the ks are
> > constants, as is usual with a power series:
> >
> > power(float x)
> >
> > 1214 .ALIGN BYTE[2]
> > 1214 _power
> > 1214 LOCAL x @ R10
> > 1214 0b12 PUSH.W R11
> > 1216 0a12 PUSH.W R10
> > 1218 0a4e MOV.W R14, R10
> > 121a 0b4f MOV.W R15, R11
> >
> > return ((x+K1)*x + K2)*x + K3;
> >
> > 121c 0e4a MOV.W R10, R14
> > 121e 0f4b MOV.W R11, R15
> > 1220 3c40b6f3 MOV.W #62390, R12
> > 1224 3d409d3f MOV.W #16285, R13
> > 1228 b012e815 CALL #___float32_add
> > 122c 0c4a MOV.W R10, R12
> > 122e 0d4b MOV.W R11, R13
> > 1230 b012da16 CALL #___float32_mul
> > 1234 3c407b14 MOV.W #5243, R12
> > 1238 3d401640 MOV.W #16406, R13
> > 123c b012e815 CALL #___float32_add
> > 1240 0c4a MOV.W R10, R12
> > 1242 0d4b MOV.W R11, R13
> > 1244 b012da16 CALL #___float32_mul
> > 1248 3c401b2f MOV.W #12059, R12
> > 124c 3d405d40 MOV.W #16477, R13
> > 1250 b012e815 CALL #___float32_add
> > 1254 3a41 MOV.W @SP+, R10
> > 1256 3b41 MOV.W @SP+, R11
> > 1258 3041 RET
> >
> > It's all very neat. I love the MSP430, it's just so cute... :-)
> >
> > -- Paul.
Reply by Paul Curtis●January 22, 2003
Hi,
> > I don't see this at all. My hand-coded float multiply routines only
> > use R6-R15 (input in R14:R15 and R13:R12) and, because we don't save
> > R12-R15 across calls, we only need to save 6 registers for a total
> > of 12 bytes--all working storage is in registers. If you have a
> > hardware multiplier, we use two fewer registers (R8-R15) and
> > consequently only 8 bytes of RAM. If you don't need to save the
> > working registers across calls (which is very common in some cases,
> > for instance when computing a power series), then you can dispense
> > with the RAM requirement and do all arithmetic in registers.
>
> That's cool!
> How about singularities?
If you mean IEEE infinities: I deal with those and also NaNs. It's a
simple matter to extract the exponent, compare it with 255 (in the
single-precision case), and deal with the exceptional cases--you need to
extract the exponent anyway.
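In C terms, the exponent test looks something like this (a sketch of the idea rather than the actual library code, which works on the raw register pair directly):

```c
#include <stdint.h>
#include <string.h>

/* A single-precision float is special (infinity or NaN) exactly when
   its 8-bit exponent field is all ones (255).  Sketch only. */
static int is_special(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);        /* reinterpret the bits */
    uint32_t exponent = (bits >> 23) & 0xFFu;
    return exponent == 0xFFu;              /* Inf (mantissa 0) or NaN */
}
```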
The power series is a good example of where cascading things works very
well -- ((x*k1 + k2)*x + k3)*x + k4... In this case, you're
multiply-adding. To make this clearer, this is what our compiler does with
the above, if x is a float parameter and the ks are constants, as is
usual with a power series:
power(float x)
1214 .ALIGN BYTE[2]
1214 _power
1214 LOCAL x @ R10
1214 0b12 PUSH.W R11
1216 0a12 PUSH.W R10
1218 0a4e MOV.W R14, R10
121a 0b4f MOV.W R15, R11
return ((x+K1)*x + K2)*x + K3;
121c 0e4a MOV.W R10, R14
121e 0f4b MOV.W R11, R15
1220 3c40b6f3 MOV.W #62390, R12
1224 3d409d3f MOV.W #16285, R13
1228 b012e815 CALL #___float32_add
122c 0c4a MOV.W R10, R12
122e 0d4b MOV.W R11, R13
1230 b012da16 CALL #___float32_mul
1234 3c407b14 MOV.W #5243, R12
1238 3d401640 MOV.W #16406, R13
123c b012e815 CALL #___float32_add
1240 0c4a MOV.W R10, R12
1242 0d4b MOV.W R11, R13
1244 b012da16 CALL #___float32_mul
1248 3c401b2f MOV.W #12059, R12
124c 3d405d40 MOV.W #16477, R13
1250 b012e815 CALL #___float32_add
1254 3a41 MOV.W @SP+, R10
1256 3b41 MOV.W @SP+, R11
1258 3041 RET
It's all very neat. I love the MSP430, it's just so cute... :-)
-- Paul.
Reply by diwilru●January 22, 2003
>
> I don't see this at all. My hand-coded float multiply routines only
> use R6-R15 (input in R14:R15 and R13:R12) and, because we don't save
> R12-R15 across calls, we only need to save 6 registers for a total of
> 12 bytes--all working storage is in registers. If you have a hardware
> multiplier, we use two fewer registers (R8-R15) and consequently only
> 8 bytes of RAM. If you don't need to save the working registers
> across calls (which is very common in some cases, for instance when
> computing a power series), then you can dispense with the RAM
> requirement and do all arithmetic in registers.
That's cool!
How about singularities?
And yes, we also do not save r12-r15, though just in case one wants to
save them...
>
> > division - r14 to r15 and 18 bytes of ram. (not really optimized) ~d
Sorry, r4 - r15 (r14-r12 saved on stack + return address == 18 bytes).
>
> Division requires R7-R15 (with input in R13:R12 and R15:R14) and only
> the requirement to save the registers. If you can do without this, fp
> division can be completed in registers only.
Well, C-coded FP mul/div do not consume stack for local vars at all, and
everything is done in registers. Probably, more careful coding would
result in better output, and assembly coding would give an even better
result.
The figures above are what gcc does.
~d
>
> -- Paul.
Reply by Paul Curtis●January 22, 2003
> --- In msp430@msp4..., "khalakatevakis <halakatevakis@e...>"
> <halakatevakis@e...> wrote:
> > Hello
> > I'm using MSP430F1101 and my project requires some math equations,
> > including 32 bit multiplication and division. When I use IAR C, the
> > micro quickly runs out of memory.
> > Has anybody done 32 bit math in assembly?
>
> What sort of math? Integer arith is really simple and requires no RAM
> (multiplication - r10 to r15, division - r8 - r15).
> Float arith will be more RAM exhaustive - mult requires about
> r4 - r15 and 22 bytes of ram,
I don't see this at all. My hand-coded float multiply routines only use
R6-R15 (input in R14:R15 and R13:R12) and, because we don't save R12-R15
across calls, we only need to save 6 registers for a total of 12
bytes--all working storage is in registers. If you have a hardware
multiplier, we use two fewer registers (R8-R15) and consequently only 8
bytes of RAM. If you don't need to save the working registers across
calls (which is very common in some cases, for instance when computing a
power series), then you can dispense with the RAM requirement and do all
arithmetic in registers.
> division - r14 to r15 and 18
> bytes of ram. (not really optimized) ~d
Division requires R7-R15 (with input in R13:R12 and R15:R14) and only
the requirement to save the registers. If you can do without this, fp
division can be completed in registers only.
-- Paul.
Reply by diwilru●January 22, 2003
--- In msp430@msp4..., "khalakatevakis
<halakatevakis@e...>" <halakatevakis@e...> wrote:
> Hello
> I'm using MSP430F1101 and my project requires some math equations,
> including 32 bit multiplication and division. When I use IAR C, the
> micro quickly runs out of memory.
> Has anybody done 32 bit math in assembly?
What sort of math? Integer arith is really simple and requires no RAM
(multiplication - r10 to r15, division - r8 - r15).
Float arith will be more RAM-hungry - mult requires about
r4 - r15 and 22 bytes of ram, division - r14 to r15 and 18 bytes of ram.
(not really optimized)
~d
Reply by Paul Curtis●January 22, 2003
> Hello
> I'm using MSP430F1101 and my project requires some math equations,
> including 32 bit multiplication and division. When I use IAR C, the
> micro quickly runs out of memory.
> Has anybody done 32 bit math in assembly?
Yes, to support our C compiler. ;-)
I assume that you mean 32-bit integer (or unsigned) multiplication and
division. If this is the case, then take a look at the TI Application
Report Book, chapter #5
(http://focus.ti.com/analog/docs/articles.tsp?familyId42&templateIdR46&path=templatedata/cm/brc/data/20011218msp430userguides&articleType=brc#app)
This has mixed 32-bit/16-bit operations. Be warned, these are not the
fastest routines in the world as they bit-bash the multiplication with
no intelligence and with no respect for the input parameters. A better
approach to multiplication is to look at the operands before starting
off, same for division.
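For a flavour of what looking at the operands buys you: a shift-and-add multiply can stop as soon as no multiplier bits remain, so small operands finish early. A rough C sketch (illustrative only, not the app-note code):

```c
#include <stdint.h>

/* 16x16 -> 32 shift-and-add multiply.  The loop runs only while bits
   remain in the multiplier, so a small multiplier exits early instead
   of always grinding through all 16 iterations.  Sketch only. */
uint32_t mul16(uint16_t multiplicand, uint16_t multiplier)
{
    uint32_t product = 0;
    uint32_t addend  = multiplicand;
    while (multiplier != 0) {       /* early exit: no bits left */
        if (multiplier & 1u)
            product += addend;
        addend <<= 1;
        multiplier >>= 1;
    }
    return product;
}
```

A further refinement in the same spirit is to swap the operands first so that the smaller one drives the loop.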
This app note will get you some way to coding the 32-bit routines
yourself. I'd also look up some references to multiplication and
division using Google to get a flavour of more efficient algorithms.
Also good for a read are the floating-point package AN and the hardware
multiplier AN. Of course, TI's published FP code isn't as fast as ours,
and it doesn't conform to the standard IEEE formats... ;-)
Good luck!
-- Paul.
Reply by khalakatevakis●January 22, 2003
Hello
I'm using MSP430F1101 and my project requires some math equations,
including 32 bit multiplication and division. When I use IAR C, the
micro quickly runs out of memory.
Has anybody done 32 bit math in assembly?