Basic floating point execution times, using compiler libraries

Started by Jonathan Kirwan April 9, 2004
On Sun, 11 Apr 2004 09:58:26 +0100, Paul wrote:

>Jon,
>
>> >I did do those numbers, but I think I didn't want to post them in chance
>> >of having one party or another yelling at me :-)
>> 
>> I wish they'd post their own numbers.
>
>That would mean we would have to agree on things such as input values,
>precision, whether NaNs and INFs were supported, gradual underflow,
>sticky bits, and what rounding mode to use, and whether numbers are
>correctly rounded.

No, you don't have to agree on anything to post performance figures for what you
are doing.  Before the advent of APR here in the US, back in the early 1970's,
banks posted a wide variety of interest rates that had little or nothing to do
with each other -- yet they put them out there, anyway.  Even after APR was
forced onto them, they still have a great many various fees regarding loans for
homes, for example, which make shopping on a single number almost impossible.

There is no requirement that vendors get together on these things.  There never
was such a requirement, there still isn't such a requirement.  Customers will
just have to be smarter than to compare a single number, I suppose.  But that's
been life for a long time and I don't expect it to change much.

If your answer to all this is to not say a thing, then that's fine.  I just wish
it were otherwise.

>What's the point?

I'd like it, if nothing else.

>Even more, if we tested other
>logarithmic and transcendental functions, then we'd have to agree on how
>precise the answers were, to how many ulps, and so on.  And we'd need to
>agree on whether the hardware multiplier is used or not, or any other
>peculiarities.

You keep saying you'd have to agree and I'll keep saying you don't need to.
These details were also the problem with various FP chips for the 80x86 from
other makers, too.  Do you remember Cyrix, Weitek, and the like?

I remember Cyrix going to some lengths, for example, to point up the errors in
the Intel implementations and why.  Details they claimed to have gotten right,
and they backed this up with various charts and plots of average and max error
over the range of their inputs.  I remember reading about method details
(non-linear methods like minimax) in designing their truncated coefficients
(coefficient values do not carry infinite precision) for truncated calculations
(calculations do not carry out in infinite precision, either.)
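
Charts of average and max error like the ones described above are usually expressed in ulps (units in the last place).  As a rough sketch only, here is one way to count ulps between two IEEE single-precision values in Python.  The function names are made up for this illustration, and the code assumes finite, non-NaN inputs.

```python
import struct

def f32_key(x):
    """Map a value (rounded to float32) to an integer that rises with the value."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    # Negative floats sort in reverse bit order, so mirror them below zero.
    return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits

def ulp_distance(a, b):
    """Number of representable float32 steps between a and b."""
    return abs(f32_key(a) - f32_key(b))
```

For example, `ulp_distance(1.0, 1.0 + 2.0**-23)` gives 1, because those two values are adjacent singles.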

They didn't need this 'agreement' you talk about to sell their product, document
their performance, and talk about their competition in the bargain.

Again, if your answer to your business approach is to avoid saying anything at
all, that's fine -- of course.  But I can still wish it were otherwise, too.

Nothing you've said convinces me in the least that you have a good reason to
stay quiet on the issue.

><snip about security methods>

>Hey, I don't like the fact I have to activate Windows XP.  If this is
>your attitude, then I presume you don't like XP and don't want to use
>it?

Actually, I don't like XP and don't use XP at all.

>>                     and     #0xFF, R12      ;   extract bits
>
>Either use
>
>and.b #-1, r12  ; 1 word, 1 cycle
>
>or
>
>mov.b r12, r12  ; 1 word, 1 cycle

Thanks, Paul.  I see exactly what you are saying here.  Much appreciated!

Jon

On Sun, 11 Apr 2004 09:58:26 +0100, you wrote:

>That would mean we would have to agree on things such as input values,

By this, do you mean how much shifting in denormalization is required, as one
aspect here?  In other words, the exact quantities to be processed in the test?
If so, understood.

Of course, I think that any summary of performance should include some reasoning
about them.  Just putting a number out there (or two, or three) without any
explanation is bad form.  But you might talk about the methodology used to
arrive at various figures and provide a few numbers which, in your informed
opinion, are meaningful and then go on with some detail about how they were
developed.

You don't have to agree on the input values with competitors in order to say
something meaningful.

>precision,

What's the format you use, Paul?  Is it different from the others?  If so, how
so?  Is this information available from your site without having to install the
package first?

I'm seriously interested.

>whether NaNs and INFs were supported,
>gradual underflow,
>sticky bits, and what rounding mode to use, and whether numbers are
>correctly rounded

And more, I know.  Do you support these?  Is it so hard to include that detail
with posted numbers?

Or, if the cost of supporting some of them is high enough, provide libraries for
those who don't deal with them, too?

This doesn't say why you can't provide information.  Instead, it says you need
to communicate.  It's not like we need to be treated like idiots.  Or do you see
it otherwise?

Paul, I'm sure you have an excellent technical product and are a superb talent.
Nothing I'm saying challenges that fact.  I just don't agree with what appears
to me to be a sour attitude towards customers' intelligence.  Talk about what
you are making, how it performs, etc.  Let us read what's said and make sense of
it for ourselves.  Tell us more, not less.

Jon

On Sun, 11 Apr 2004 09:58:26 +0100, Paul wrote:

>>Div32u16uQE_1       bit     #0xFF00, R14    ; upper 8 bits all zero?
>>                    jnz     Div32u16uQE_2   ; no -- do normal shifting
>>                    swpb    R14             ; yes -- shift up by 8 bits
>>                    swpb    R15             ;   lower part holds mixture
>>                    mov.b   R15, R12        ;   so, get a copy of it
>>                    bic     #0xFF, R15      ;   shift zeros into lower 16 part
>>                    and     #0xFF, R12      ;   extract bits to go to upper 16
>>                    bis     R12, R14        ;   merge them into the upper 16
>>                    sub     #8, R11         ;   and update the exponent
>>Div32u16uQE_2

>Either use
>
>and.b #-1, r12  ; 1 word, 1 cycle
>
>or
>
>mov.b r12, r12  ; 1 word, 1 cycle

By the way, I'd been coding this while in a phone meeting and my mind wasn't as
engaged as it might have been.  Your pointer to this place made me take a quieter
moment to look at what's being done.

Here's slightly faster code:

Div32u16uQE_1       bit     #0xFF00, R14    ; upper 8 bits all zero?
                    jnz     Div32u16uQE_2   ; no -- do normal shifting
                    xor     R15, R14        ; yes -- shift the
                    mov.b   R15, R15        ;   dividend up by 8
                    xor     R15, R14        ;   bits, placing zeros
                    swpb    R14             ;   into the lower-
                    swpb    R15             ;   most 8 bits and
                    sub     #8, R11         ;   update the exponent
Div32u16uQE_2

This strips 2 cycles and 3 words off of my original code and 1 cycle and 2 words
off of what results from your idiom replacement and it further removes a need
for using R12 as a quick temporary.  It's a common method I'd forgotten to use
at the time, but having my attention drawn back to this recalled it.
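
A quick way to check that the two sequences agree is to model them off-target.  The following is only a sketch in Python, not MSP430 code: register names follow the listings, the function names are invented here, and it assumes R14 holds the upper 16 bits of the dividend with its top byte already zero (the case the `bit`/`jnz` test lets through).  The redundant `and #0xFF, R12` and the exponent update in R11 are left out.

```python
MASK16 = 0xFFFF

def swpb(r):
    """MSP430 SWPB: swap the two bytes of a 16-bit word."""
    return ((r << 8) | (r >> 8)) & MASK16

def shift8_original(r14, r15):
    """First version: uses R12 as a temporary."""
    r14 = swpb(r14)            # swpb  R14
    r15 = swpb(r15)            # swpb  R15
    r12 = r15 & 0xFF           # mov.b R15, R12  (byte move clears the upper byte)
    r15 &= ~0xFF & MASK16      # bic   #0xFF, R15
    r14 |= r12                 # bis   R12, R14
    return r14, r15

def shift8_xor(r14, r15):
    """Revised version: no temporary register."""
    r14 ^= r15                 # xor   R15, R14
    r15 &= 0xFF                # mov.b R15, R15
    r14 ^= r15                 # xor   R15, R14  (leaves R14 ^ high byte of old R15)
    return swpb(r14), swpb(r15)
```

Both functions return the pair (R14, R15) after the shift, so for R14 = 0x0012 and R15 = 0xABCD each yields (0x12AB, 0xCD00), which is the 32-bit value shifted left by 8 bits.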

Jon

Hi Jon,

> >> >I did do those numbers, but I think I didn't want to post them in chance
> >> >of having one party or another yelling at me :-)
> >> 
> >> I wish they'd post their own numbers.
> >
> >That would mean we would have to agree on things such as input values,
> >precision, whether NaNs and INFs were supported, gradual underflow,
> >sticky bits, and what rounding mode to use, and whether numbers are
> >correctly rounded.
> [snip]
> There is no requirement that vendors get together on these things.  There never
> was such a requirement, there still isn't such a requirement.  Customers will
> just have to be smarter than to compare a single number, I suppose.  But that's
> been life for a long time and I don't expect it to change much.
> 
> If your answer to all this is to not say a thing, then that's fine.  I just wish
> it were otherwise.

No, I don't mind providing figures.  We also provided TI with benchmark
figures for code size on common benchmarks.  Only two vendors (ourselves
included) responded, out of those invited to do so, as of this time.  I
might pop together a quick pseudo-benchmark.  I need to provide TI with
other figures for our runtime routines, so this ties in nicely.

> >What's the point?
> 
> I'd like it, if nothing else.

Well, I might as well do it then.

> >Even more, if we tested other
> >logarithmic and transcendental functions, then we'd have to agree on how
> >precise the answers were, to how many ulps, and so on.  And we'd need to
> >agree on whether the hardware multiplier is used or not, or any other
> >peculiarities.
> 
> You keep saying you'd have to agree and I'll keep saying you don't need to.

I know how benchmarks can be used against me... :-(  I can pick numbers
to make us look good, or to prove something.  But, for instance, we have
stable functions that are well-defined over a wide range of numbers and
if we compete against some library that doesn't do accurate range
reduction with a double precision pi, then we'll look bad.  No point
saying we work over a wider domain, customers just point to benchmark
results.  Perhaps you're not an average customer...
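
To make the range-reduction point concrete, here is a small sketch (not anybody's library code) of how the precision of the stored pi affects sin() for a largish argument.  Everything is computed in Python doubles; only the constant changes, with `PI_SINGLE` standing in for pi held at single precision and `PI_DOUBLE` for pi held at twice that.  The helper names and the naive reduction scheme are assumptions made for illustration.

```python
import math
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest IEEE single."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def reduce_arg(x, pi):
    """Naive reduction: subtract the nearest multiple of 2*pi."""
    k = round(x / (2.0 * pi))
    return x - k * (2.0 * pi)

def sin_error(x, pi):
    """Error in sin(x) caused by reducing x with the given value of pi."""
    return abs(math.sin(reduce_arg(x, pi)) - math.sin(x))

PI_SINGLE = to_f32(math.pi)   # pi rounded to single precision
PI_DOUBLE = math.pi           # pi held in double precision
```

Comparing `sin_error(10000.0, PI_SINGLE)` with `sin_error(10000.0, PI_DOUBLE)` shows the gap: the error in the constant is multiplied by the number of periods removed, so a library that reduces with a short pi loses accuracy as the argument grows while looking just as fast, or faster, in a cycle-count table.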

> Again, if your answer to your business approach is to avoid saying anything at
> all, that's fine -- of course.  But I can still wish it were otherwise, too.

You don't know me very well if you think I won't say anything at all. ;-)

> Nothing you've said convinces me in the least that you have a good reason to
> stay quiet on the issue.

No, as I wrote the floating point routines for the MSP430, I know how
much effort I put into it.  I don't need to keep quiet about performance
because I think we have some tidy libraries.

> ><snip about security methods>
> 
> >Hey, I don't like the fact I have to activate Windows XP.  If this is
> >your attitude, then I presume you don't like XP and don't want to use
> >it?
> 
> Actually, I don't like XP and don't use XP at all.

But is that because of the activation requirement or something else?
You use a version of Windows as you admit to using KickStart; so I guess
it *is* the activation requirement?

> >>                     and     #0xFF, R12      ;   extract bits
> >
> >Either use
> >
> >and.b #-1, r12  ; 1 word, 1 cycle
> >
> >or
> >
> >mov.b r12, r12  ; 1 word, 1 cycle
> 
> Thanks, Paul.  I see exactly what you are saying here.  Much 
> appreciated!

No problem.

-- Paul.

Jon,

> >That would mean we would have to agree on things such as input values,
> 
> By this, do you mean how much shifting in denormalization is required, as one
> aspect here?  In other words, the exact quantities to be processed in the test?
> If so, understood.

That type of thing, yes.  However, we have more than one floating point
implementation...

> Of course, I think that any summary of performance should include some reasoning
> about them.  Just putting a number out there (or two, or three) without any
> explanation is bad form.  But you might talk about the methodology used to
> arrive at various figures and provide a few numbers which, in your informed
> opinion, are meaningful and then go on with some detail about how they were
> developed.

Sure, I don't mind doing this.

> You don't have to agree on the input values with competitors in order to say
> something meaningful.

Well, I don't mind putting together my own benchmark suite to see how
well things go.  However, I have no idea how other vendors will
benchmark using our test suite because they don't have cycle-accurate
simulators.  I can run my code under a simulator and get exact cycle
times, which is easy.  Running on hardware is more problematic, but I
can use a timer I suppose.
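
Timing on hardware with a timer usually comes down to two reads of a free-running counter and a subtraction.  A minimal sketch of that arithmetic follows, assuming a 16-bit timer clocked at the CPU rate that wraps at most once between the two reads.  `elapsed_cycles` and its `overhead` parameter are hypothetical names; the overhead would have to be measured separately by timing an empty region.

```python
TIMER_MASK = 0xFFFF  # assumption: free-running 16-bit timer at the CPU clock rate

def elapsed_cycles(start, stop, overhead=0):
    """Cycles between two timer reads, tolerating one wrap of the counter."""
    return ((stop - start) & TIMER_MASK) - overhead
```

With reads of 0xFFF0 and 0x0010 this gives 32 cycles, because the masked subtraction absorbs the counter wrapping past 0xFFFF.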

> >precision,
> 
> What's the format you use, Paul?  Is it different from the others?  If so, how
> so?  Is this information available from your site without having to install the
> package first?

All vendors must have agreed on IEEE-754 or its ISO equivalent IEC 60559
by now.  We use the standard single and double precisions (8.23 and
11.52 formats) but we also have a library that will do double extended
precision and quadruple precision in all IEEE rounding modes and has all
IEEE features.  We don't ship this version for the MSP430 as it's too
big and too slow for any practical purpose.
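
For reference, the standard IEEE layouts are 1 sign bit, 8 exponent bits and 23 stored fraction bits for single precision, and 1, 11 and 52 for double.  The sketch below pulls those fields apart in Python; it is an illustration of the formats only, and the function names are invented here.

```python
import struct

def decode_f32(x):
    """Split an IEEE single into (sign, biased exponent, stored fraction)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def decode_f64(x):
    """Split an IEEE double into (sign, biased exponent, stored fraction)."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)
```

So `decode_f32(1.0)` is `(0, 127, 0)`: sign clear, exponent equal to the bias of 127, and an empty fraction because the leading 1 is implicit.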

I believe all other vendors, excepting IAR, only provide 32-bit IEEE
floating point.  Who knows, they may even use the TI libraries to
provide their FP, in which case they won't be IEEE compliant.

> I'm seriously interested.
> 
> >whether NaNs and INFs were supported,
> >gradual underflow,
> >sticky bits, and what rounding mode to use, and whether numbers are
> >correctly rounded
> 
> And more, I know.  Do you support these?  Is it so hard to include that detail
> with posted numbers?

In our IEEE-compliant library, yes.  In the library we ship for the
MSP430, NaNs and infinities are handled, but only as outputs from, e.g.
0/0, not as inputs.  Rounding mode is fixed at IEEE round to nearest
with tie breaks.
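
As a small illustration of round to nearest with tie breaking, the sketch below rounds two exact halfway cases from double to single.  It assumes the tie break meant here is the IEEE default (ties go to the value with an even last bit) and that Python's `struct` conversion uses the platform's default rounding, which is the case on common IEEE hardware.  The names are made up for the example.

```python
import struct

def to_f32(x):
    """Round a double to IEEE single using the platform's default rounding."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

EPS = 2.0 ** -23                    # spacing of singles just above 1.0

halfway_low = 1.0 + EPS / 2         # exactly between 1.0 and 1.0 + EPS
halfway_high = 1.0 + 3 * EPS / 2    # exactly between 1.0 + EPS and 1.0 + 2*EPS
```

Under ties-to-even, `to_f32(halfway_low)` goes down to 1.0 and `to_f32(halfway_high)` goes up to `1.0 + 2 * EPS`; a library that always rounds halfway cases up or always truncates would give different last bits on one of the two.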

> Or, if the cost of supporting some of them is high enough, provide libraries for
> those who don't deal with them, too?

As I said, we have two sets of libraries, and one we choose not to ship
for the MSP430.

> This doesn't say why you can't provide information.  Instead, it says you need
> to communicate.  It's not like we need to be treated like idiots.  Or do you see
> it otherwise?

I only want a fair set of benchmarks; I'm not afraid to compete with
anybody as long as the ground is fair.

> Paul, I'm sure you have an excellent technical product and are a superb talent.
> Nothing I'm saying challenges that fact.  I just don't agree with what appears
> to me to be a sour attitude towards customers' intelligence.  Talk about what
> you are making, how it performs, etc.  Let us read what's said and make sense of
> it for ourselves.  Tell us more, not less.

I don't believe you to be an average customer, Jon.  If I appear to
have a sour attitude, it's not intended.  I only point out that simply
providing performance figures is something that is more complex than
saying "floating point addition takes x cycles."

You've spurred me on to do it, so I might as well do so.

-- Paul.

At 05:56 PM 4/11/2004 -0700, Jonathan Kirwan wrote:
...

I have asked the consultant who wrote the FP to provide numbers for basic
operations. We all know benchmarks are infamously bad for actually
comparing compilers' performance (rant: a certain 3 letter acronym company
still has a PDF on their website demonstrating how their AVR compiler is
48% better than ours on one program, ignoring for example that, for the AVR
Butterfly, the Atmel written code differs by a few hundred bytes, out of
~12K code...) but this should give some ideas for people who want to know
the ballpark performance figures.

In any case, hopefully sometime this week I can provide some numbers for the
ICC430 compilers. We have a customer who did something similar for the
HC11/HC12 compilers since he is using them for some sort of flight simulator
(the real stuff) and he cares about such things.

// richard (This email is for mailing lists. To reach me directly, please 
use richard@rich...) 


Hi Paul, Jon et al

> That type of thing, yes.  However, we have more than one floating point
> implementation...

Ideally, if I had it my way (hum), I'd have a "fract" type
(i.e. in 32 bits: bit 31 is the sign, and the rest are negative powers of 2).
It's really handy when you can make do with numbers between 0 and +/- 0.999999.
It saves time not having to unpack, do the calc and pack to e.g. IEEE.

The actual floats then can call packing and unpacking routines, effectively
converting to a fract type.

Is this doable?  (I understand it'd be a hell of a thing to modify towards.)
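
A 32-bit fract of the kind described above can be sketched as follows.  This is an illustration in Python, not anything from a shipping library, and it assumes a two's complement layout (often called Q31) rather than sign-magnitude; all names are invented for the example.

```python
Q = 31                      # number of fraction bits below the sign bit
Q_MAX = (1 << Q) - 1        # largest fract, just under +1.0
Q_MIN = -(1 << Q)           # smallest fract, exactly -1.0

def to_fract(x):
    """Convert a float in [-1, 1) to a 32-bit fract, saturating at the ends."""
    return max(Q_MIN, min(Q_MAX, round(x * (1 << Q))))

def from_fract(q):
    """Convert a fract back to a float."""
    return q / (1 << Q)

def fract_mul(a, b):
    """Multiply two fracts: 64-bit product, then drop the extra 31 fraction bits."""
    return max(Q_MIN, min(Q_MAX, (a * b) >> Q))
```

The multiply is one integer product and a shift, with no unpacking, exponent handling or normalization, which is where the time saving comes from.  The shift truncates rather than rounds, and the only product that can overflow is -1.0 times -1.0, which saturates here.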

In any case, I've always liked floats, and I DO know that CrossWorks has the
best and fastest floats/doubles, so I'm happy!

-- Kris





> Well, I don't mind putting together my own benchmark suite to see how
> well things go.  However, I have no idea how other vendors will
> benchmark using our test suite because they don't have cycle-accurate
> simulators.  I can run my code under a simulator and get exact cycle
> times, which is easy.  

I can readily support this claim.
I did several tricky timing tests with the simulator and then on HW, and indeed
CrossWorks' simulator is very cycle-accurate.
(And I have very high resolution calibrated equipment to confirm it timing-wise.)

-- Kris





Kris,

The Embedded C Standard document has fract and accum types.  Although I
have implemented fract arithmetic for MSP430, I haven't plumbed in the
compiler yet.  The compiler is undergoing a few changes to support the
_Complex and _Imaginary types, but fract and accum are just a bridge too far
at the moment.

-- Paul.

> -----Original Message-----
> From: microbit [mailto:microbit@micr...]
> Sent: 12 April 2004 12:57
> To: msp430@msp4...
> Subject: Re: [msp430] Basic floating point execution times, using
> compiler libraries
> 
> 
> Hi Paul, Jon et al
> 
> > That type of thing, yes.  However, we have more than one 
> floating point
> > implementation...
> 
> Ideally, if I had it my way (hum), I'd have a "fract" type 
> (ie. in 32 bits - bit 31 is sign, and rest is neg powers of 2).
> It's really handy when you can suffice with nums between 0 
> and +- 0.999999.
> Saves time not having to unpack, do calc and pack to eg. IEEE.
> 
> The actual floats then can call packing and unpacking 
> routines, effectively
> converting to a fract type.
> 
> Is this doable ? (I understand it'd be a hell of a thing to 
> modify towards).
> 
> In any case, I've always liked floats, and I DO know that 
> CrossWorks has the best and 
> fastest floats/doubles, so I'm happy !
> 
> -- Kris