EmbeddedRelated.com
Forums
The 2024 Embedded Online Conference

Multiplier madness ...................... help

Started by jkw_ee September 18, 2008
I have started digging into how the multiplies are implemented in my
software.

MSP430F1611
IAR 3.42A (should update soon)

I am multiplying two signed integers (16bit signed)

When I look at the assembler listing there is some strange stuff
going on that I just can't figure out ......... my results are off
as well.

signed long ti;
signed int ar, ai;

ti = ai * ar;

Simple enough

ti = ai * ar;
004328 411C 000A mov.w 0xA(SP),R12
00432C 411E 000E mov.w 0xE(SP),R14
004330 12B0 4AF0 call #?Mul16Hw
004334 4C0E mov.w R12,R14
004336 4E0F mov.w R14,R15
004338 E33F inv.w R15
00433A 5F0F rla.w R15
00433C 7F0F subc.w R15,R15
00433E 4E06 mov.w R14,R6
004340 4F07 mov.w R15,R7

?Mul16Hw:
?Mul16to32uHw:
004AE2 1202 push.w SR
004AE4 C232 dint
004AE6 4303 nop
004AE8 4C82 0130 mov.w R12,&MPY
004AEC 4E82 0138 mov.w R14,&OP2
004AF0 421C 013A mov.w &RESLO,R12
004AF4 421D 013C mov.w &RESHI,R13
004AF8 1300 reti

This looks really strange to me.

First it is calling the Mul16to32uHW rather than the signed version.
The high result in R13 is ignored upon return.

R12 is copied to R14, then R14 is copied to R15, then a bunch of stuff is
done to R15 and R15 is subtracted from itself. Finally R14 is placed in
R6 and R15 is placed in R7.

Am I just having a massive brain fart here?

Please bring me back to reality.

What I am really trying to do is the following complex multiplication

tr = ar*RealOut[k] - ai*ImagOut[k];
ti = ar*ImagOut[k] + ai*RealOut[k];

RealOut[] and ImagOut[] are signed longs so the result is computed
using the Mul32Hw macro and takes about 51 cycles for each. I know
the bounds on RealOut[] and was attempting to cast it to a signed int
to use the Mul16to32sHW which is only 29 cycles.

Thanks for your help,
Jeremy


Hi,

> I have started digging into how the multiplies are implemented in my
> software.
>
> MSP430F1611
> IAR 3.42A (should update soon)
>
> I am multiplying two signed integers (16bit signed)
>
> When I look at the assembler listing there is some strange stuff
> going on that I just can't figure out ......... my results are off
> as well.

Your understanding is, unfortunately, incorrect.

> signed long ti;
> signed int ar, ai;
>
> ti = ai * ar;
>
> Simple enough

You think that's going to get you the right answer? Think again.

> ti = ai * ar;
> 004328 411C 000A mov.w 0xA(SP),R12
> 00432C 411E 000E mov.w 0xE(SP),R14
> 004330 12B0 4AF0 call #?Mul16Hw
> 004334 4C0E mov.w R12,R14
> 004336 4E0F mov.w R14,R15
> 004338 E33F inv.w R15
> 00433A 5F0F rla.w R15
> 00433C 7F0F subc.w R15,R15
> 00433E 4E06 mov.w R14,R6
> 004340 4F07 mov.w R15,R7
> ?Mul16Hw:
> ?Mul16to32uHw:
> 004AE2 1202 push.w SR
> 004AE4 C232 dint
> 004AE6 4303 nop
> 004AE8 4C82 0130 mov.w R12,&MPY
> 004AEC 4E82 0138 mov.w R14,&OP2
> 004AF0 421C 013A mov.w &RESLO,R12
> 004AF4 421D 013C mov.w &RESHI,R13
> 004AF8 1300 reti
> This looks really strange to me.

Doesn't to me.

> First it is calling the Mul16to32uHW rather than the signed version.

Doesn't need to use a signed version. The multiplication of 16x16->16
yields identical results (bit patterns) for both signed and unsigned
operands. -1 x -1 = 1, and 0xffff x 0xffff = 0xfffe0001, whose low 16 bits
are 0x0001. Doesn't matter a jot, does it?
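
A quick way to convince yourself of this on a PC is to compute the low 16 bits both ways and compare. This is only a sketch: the function names are made up for illustration, and the fixed-width types stand in for the MSP430's 16-bit int.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of any compiler runtime. */

/* Low 16 bits of an unsigned 16x16 product. */
static uint16_t low16_unsigned(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * b);
}

/* Low 16 bits of a signed 16x16 product (the full product fits in 32 bits). */
static uint16_t low16_signed(int16_t a, int16_t b)
{
    return (uint16_t)((int32_t)a * b);
}
```

Feeding the same bit patterns to both, for example 0xff9c and 0x0200 versus -100 and 512, gives the same low word, which is why a single unsigned routine is enough for int x int.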

> The high result in R13 is ignored upon return.

Yep. You're multiplying two ints to generate an int, what did you expect?

> R12 is copied to R14, then R14 is copied to R15, then a bunch of stuff is
> done to R15 and R15 is subtracted from itself. Finally R14 is placed in
> R6 and R15 is placed in R7.

Sign extension.
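
For anyone who wants to trace it, the inv.w / rla.w / subc.w sequence can be modelled in C roughly as below. This is a sketch that assumes the usual MSP430 definition of SUBC (dst = dst - src - 1 + C); the function name is invented for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the three instructions applied to a copy of the
   low word; returns the high word of the sign-extended 32-bit value. */
static uint16_t sign_word(uint16_t low)
{
    uint16_t r15 = low;                        /* mov.w R14,R15 */
    r15 = (uint16_t)~r15;                      /* inv.w R15 */
    unsigned carry = (r15 >> 15) & 1u;         /* rla.w moves bit 15 into C */
    r15 = (uint16_t)(r15 << 1);                /* rla.w R15 */
    r15 = (uint16_t)(r15 - r15 - 1u + carry);  /* subc.w R15,R15 */
    return r15;
}
```

After the inversion the carry ends up being the complement of the original sign bit, so the subtract-with-carry leaves 0x0000 for a non-negative low word and 0xffff for a negative one.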

> Am I just having a massive brain fart here?

I would have to say, "yes".

> Please bring me back to reality.

The above is not enough to jolt you back to reality? Short answer:
understand what your programming language does, not what you think it does.

> What I am really trying to do is the following complex multiplication
>
> tr = ar*RealOut[k] - ai*ImagOut[k];
> ti = ar*ImagOut[k] + ai*RealOut[k];
>
> RealOut[] and ImagOut[] are signed longs so the result is computed
> using the Mul32Hw macro and takes about 51 cycles for each. I know
> the bounds on RealOut[] and was attempting to cast it to a signed int
> to use the Mul16to32sHW which is only 29 cycles.

Does IAR specialise sign-extended 16-bit value x signed 32-bit value? Is
there a 16x32->32 specialisation? I don't think so, not unless you write it
yourself.
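
If you did write it yourself, one possible shape is two 16x16 multiplies plus a sign correction. The following is a portable C sketch of the arithmetic only, not IAR or CrossWorks library code; mul16x16u and mul16x32 are names invented here, and on the target the 16x16 step would map onto the hardware multiplier.

```c
#include <stdint.h>

/* Hypothetical stand-in for an unsigned 16x16->32 hardware multiply. */
static uint32_t mul16x16u(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}

/* Hypothetical 16x32->32 signed multiply, result taken modulo 2^32. */
static int32_t mul16x32(int16_t x, int32_t y)
{
    uint16_t xu = (uint16_t)x;
    uint32_t yu = (uint32_t)y;
    uint16_t yl = (uint16_t)(yu & 0xFFFFu);
    uint16_t yh = (uint16_t)(yu >> 16);

    uint32_t low = mul16x16u(xu, yl);              /* full 32-bit partial product */
    uint16_t cross = (uint16_t)mul16x16u(xu, yh);  /* only its low word matters */
    if (x < 0)
        cross = (uint16_t)(cross - yl);            /* xu is x + 65536 when x < 0 */

    uint32_t r = low + ((uint32_t)cross << 16);

    /* Convert to signed without relying on implementation-defined casts. */
    if (r <= (uint32_t)INT32_MAX)
        return (int32_t)r;
    return (int32_t)(r - 0x80000000u) + INT32_MIN;
}
```

Whether this actually beats the 51 cycles quoted for the full 32x32 routine would have to be measured on the target.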

Regards,

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
CrossWorks for ARM, MSP430, AVR, MAXQ, and now Cortex-M3 processors

Try ti = (long) ai * (long) ar;

Hugh


Hi Hugh,

> -----Original Message-----
> From: m... [mailto:m...] On Behalf Of Hugh Molesworth
> Sent: 18 September 2008 16:43
> To: m...
> Subject: Re: [msp430] Multiplier madness ...................... help
>
> Try ti = (long) ai * (long) ar;

...I think that's what he's trying to avoid. I believe he requires a
16x32->32 multiply which IAR (and indeed CrossWorks) does not specialize.
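
If the bounds Jeremy mentions really do hold, so that every RealOut[] and ImagOut[] value fits in 16 bits, one option is to narrow each element first and then widen the products. A sketch under that assumption, with a made-up function name and fixed-width types standing in for int and long on the MSP430:

```c
#include <stdint.h>

/* Hypothetical complex multiply; assumes re and im fit in 16 bits. */
static void cmul_narrow(int16_t ar, int16_t ai,
                        int32_t re, int32_t im,
                        int32_t *tr, int32_t *ti)
{
    int16_t re16 = (int16_t)re;  /* narrowing is only valid within the bounds */
    int16_t im16 = (int16_t)im;

    /* Each product is widened from 16-bit operands, the 16x16->32 case. */
    *tr = (int32_t)ar * re16 - (int32_t)ai * im16;
    *ti = (int32_t)ar * im16 + (int32_t)ai * re16;
}
```

Whether a given compiler maps each product onto its signed 16x16->32 routine needs checking in the assembler listing.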

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
CrossWorks for ARM, MSP430, AVR, MAXQ, and now Cortex-M3 processors

Hi Paul

Good point, I always oversimplify. You must be bored today as well :-)

By the way, I was testing some filesystems, and required a default
size in CrossWorks but didn't see it anywhere so I added it thus:
#define __SIZE_T_TYPE__ unsigned long

I also assume there are no file function headers?

Hugh


Hi Hugh,

> Good point, I always oversimplify. You must be bored today as well :-)

I'm not bored! I just implemented a nice new feature in the IDE. :-)

> By the way, I was testing some filesystems, and required a default
> size in CrossWorks but didn't see it anywhere so I added it thus:
> #define __SIZE_T_TYPE__ unsigned long

size_t is actually defined exactly where it should be, in (and
other places, the C standard has hoops you need to jump through). For
MSP430, size_t is in fact just 16 bits wide and is equivalent to plain
'unsigned'.

> I also assume there are no file function headers?

Correct. At present.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
CrossWorks for ARM, MSP430, AVR, MAXQ, and now Cortex-M3 processors

Paul,

Thanks for your reply.

I don't think that you have read my posting clearly. I am multiplying
two signed ints and looking for a signed long result. I believe there
is a 16x16 to 32 signed multiplication in IAR, as I stated above; it
is called Mul16to32sHW. My problem is that the compiler is calling
Mul16to32uHW instead. The simple acid test for this is the fact
that I do not get the correct result for the multiplication of two
signed numbers such as -100 x 512. The result should be -51200, not the
14336 that is returned, because the high result is dropped.

In future I would also appreciate it if you could keep your
condescending tone to yourself. The purpose of this forum is to
exchange information, not for assholes like you to slam everyone else.

Jeremy

Hi,

> Paul,
>
> Thanks for your reply.
>
> I don't think that you have read my posting clearly.

Unfortunately, I did. But you haven't understood my answer correctly.

> I am multiplying
> two signed ints and looking for a signed long result.

No MSP430 compiler is going to give you that. None. It just won't happen.

> I believe there
> is a 16x16 to 32 signed multiplication in IAR, as I stated above; it
> is called Mul16to32sHW.

Yes. But to get that you need to multiply two longs together.

> My problem is that the compiler is calling
> Mul16to32uHW instead. The simple acid test for this is the fact
> that I do not get the correct result for the multiplication of two
> signed numbers such as -100 x 512. The result should be -51200, not the
> 14336 that is returned, because the high result is dropped.

Correct. int x int gives an int result. int x int does not give a long
result. Only long x long = long.

-100 = 0xff9c. 512 = 0x200. Two 16-bit ints multiplied give a result of
0x3800 (the low 16 bits of 0x1ff3800), in decimal 14336. Compiler is correct.

> In future I would also appreciate it if you could keep your
> condescending tone to yourself. The purpose of this forum is to
> exchange information, not for assholes like you to slam everyone else.

Unfortunately, I am not an asshole, but I am blessed with one. I actually
gave you enough information for you to rescue yourself. And Hugh did too.

I will go through it step by step.

int x, y;
long z;
z = x * y;

This multiplies a 16-bit int by a 16-bit int generating a 16-bit int
product. That 16-bit product is then sign-extended to 32 bits. The
compiler is *correct* in its code generation. It doesn't matter whether the
operands are signed or not, doesn't matter a damn, same bit pattern is
generated.

If you *require* a 16x16->32, that is *not* expressible in standard C (with
16-bit ints) as a single operation--you *MUST* go through a long
multiplication which the compiler will then *specialize* by recognizing the
operands have restricted range.

IAR's compiler and my compiler both specialize the following:

int x, y;
long z;
z = (long)x * (long)y;

That will get you a correct result using plain 16-bit multiplication.
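
To see the difference with the numbers from this thread, here is a small host-side sketch. The function names are invented, and int16_t / int32_t stand in for the MSP430's int and long. The first function models what a 16-bit int x int ends up as on the target; strictly speaking, signed overflow is undefined in C, so this only mirrors the behaviour seen in the listing.

```c
#include <stdint.h>

/* Hypothetical model of z = x * y with 16-bit ints: keep the low 16 bits
   of the product, then sign-extend them to 32 bits. */
static int32_t mul_as_int16(int16_t x, int16_t y)
{
    uint16_t low = (uint16_t)((int32_t)x * y);
    return (low & 0x8000u) ? (int32_t)low - 65536 : (int32_t)low;
}

/* Model of z = (long)x * (long)y: the full product is kept. */
static int32_t mul_widened(int16_t x, int16_t y)
{
    return (int32_t)x * (int32_t)y;
}
```

With x = -100 and y = 512 the first returns 14336 and the second returns -51200, which are exactly the two values being argued about above.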

My compiler, and I believe IAR's compiler, will *not* specialize the
following, but both *could*:

int x;
long y, z;
z = x * y;

If you can't follow this, all hope is lost.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
CrossWorks for ARM, MSP430, AVR, MAXQ, and now Cortex-M3 processors

Hi,

> In future I would also appreciate it if you could keep your
> condescending tone to yourself. The purpose of this forum is to
> exchange information, not for assholes like you to slam everyone else.

I led you down the path of what you needed to know. I'm a firm believer in
people helping themselves--it's far better for people not to be spoon-fed
the answer, but to work the answer out for themselves. It sticks much
better.

BTW, I'm British, so I actually have an arsehole and fibre keeps it nice and
regular.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
CrossWorks for ARM, MSP430, AVR, MAXQ, and now Cortex-M3 processors

Yeah, I saw that but wanted to avoid altering the existing code. I
changed my definition to leverage the compiler definition; later
this will go away.
#define __SIZE_T_TYPE__ __SIZE_T

What new feature? I think I'm looking for it ...


