On Sun, 11 Apr 2004 09:58:26 +0100, Paul wrote:

>Jon,
>
>> >I did do those numbers, but I think I didn't want to post them in chance
>> >of having one party or another yelling at me :-)
>>
>> I wish they'd post their own numbers.
>
>That would mean we would have to agree on things such as input values,
>precision, whether NaNs and INFs were supported, gradual underflow,
>sticky bits, and what rounding mode to use, and whether numbers are
>correctly rounded.

No, you don't have to agree on anything to post performance figures for what you are doing. Before the advent of APR here in the US, back in the early 1970's, banks posted a wide variety of interest rates that had little or nothing to do with each other -- yet they put them out there, anyway. Even after APR was forced onto them, they still have a great many various fees regarding loans for homes, for example, which make shopping on a single number almost impossible.

There is no requirement that vendors get together on these things. There never was such a requirement, there still isn't such a requirement. Customers will just have to be smarter than to compare a single number, I suppose. But that's been life for a long time and I don't expect it to change much.

If your answer to all this is to not say a thing, then that's fine. I just wish it were otherwise.

>What's the point?

I'd like it, if nothing else.

>Even more, if we tested other
>logarithmic and transcendental functions, then we'd have to agree on how
>precise the answers were, to how many ulps, and so on. And we'd need to
>agree on whether the hardware multiplier is used or not, or any other
>peculiarities.

You keep saying you'd have to agree and I'll keep saying you don't need to.

These details were also the problem with various FP chips for the 80x86 from other makers, too. Do you remember Cyrix, Weitek, and the like? I remember Cyrix going to some lengths, for example, to point up the errors in the Intel implementations and why. Details they claimed to have gotten right, and they backed this up with various charts and plots of average and max error over the range of their inputs. I remember reading about method details (non-linear methods like minimax) in designing their truncated coefficients (coefficient values do not carry infinite precision) for truncated calculations (calculations are not carried out in infinite precision, either.) They didn't need this 'agreement' you talk about to sell their product, document their performance, and talk about their competition in the bargain.

Again, if your answer to your business approach is to avoid saying anything at all, that's fine -- of course. But I can still wish it were otherwise, too.

Nothing you've said convinces me in the least that you have a good reason to stay quiet on the issue.

><snip about security methods>

>Hey, I don't like the fact I have to activate Windows XP. If this is
>your attitude, then I presume you don't like XP and don't want to use
>it?

Actually, I don't like XP and don't use XP at all.

>> and #0xFF, R12 ; extract bits
>
>Either use
>
>and.b #-1, r12 ; 1 word, 1 cycle
>
>or
>
>mov.b r12, r12 ; 1 word, 1 cycle

Thanks, Paul. I see exactly what you are saying here. Much appreciated!

Jon
Basic floating point execution times, using compiler libraries
Started by ●April 9, 2004
Reply by ●April 11, 2004
Reply by ●April 11, 2004
On Sun, 11 Apr 2004 09:58:26 +0100, you wrote:

>That would mean we would have to agree on things such as input values,

By this, do you mean how much shifting in denormalization is required, as one aspect here? In other words, the exact quantities to be processed in the test? If so, understood.

Of course, I think that any summary of performance should include some reasoning about them. Just putting a number out there (or two, or three) without any explanation is bad form. But you might talk about the methodology used to arrive at various figures and provide a few numbers which, in your informed opinion, are meaningful and then go on with some detail about how they were developed.

You don't have to agree on the input values with competitors in order to say something meaningful.

>precision,

What's the format you use, Paul? Is it different from the others? If so, how so? Is this information available from your site without having to install the package first?

I'm seriously interested.

>whether NaNs and INFs were supported,
>gradual underflow,
>sticky bits, and what rounding mode to use, and whether numbers are
>correctly rounded

And more, I know. Do you support these? Is it so hard to include that detail with posted numbers?

Or, if the cost of supporting some of them is high enough, provide libraries for those who don't deal with them, too?

This doesn't say why you can't provide information. Instead, it says you need to communicate. It's not like we need to be treated like idiots. Or do you see it otherwise?

Paul, I'm sure you have an excellent technical product and are a superb talent. Nothing I'm saying challenges that fact. I just don't agree with what appears to me to be a sour attitude towards customers' intelligence. Talk about what you are making, how it performs, etc. Let us read what's said and make sense of it for ourselves. Tell us more, not less.

Jon
Reply by ●April 11, 2004
On Sun, 11 Apr 2004 09:58:26 +0100, Paul wrote:

>>Div32u16uQE_1   bit     #0xFF00, R14        ; upper 8 bits all zero?
>>                jnz     Div32u16uQE_2       ; no -- do normal shifting
>>                swpb    R14                 ; yes -- shift up by 8 bits
>>                swpb    R15                 ; lower part holds mixture
>>                mov.b   R15, R12            ; so, get a copy of it
>>                bic     #0xFF, R15          ; shift zeros into lower 16 part
>>                and     #0xFF, R12          ; extract bits to go to upper 16
>>                bis     R12, R14            ; merge them into the upper 16
>>                sub     #8, R11             ; and update the exponent
>>Div32u16uQE_2
>
>Either use
>
>and.b #-1, r12 ; 1 word, 1 cycle
>
>or
>
>mov.b r12, r12 ; 1 word, 1 cycle

By the way, I'd been coding this while in a phone meeting and my mind wasn't as engaged as it might have been. Your pointer to this place made me take a quieter moment to look at what's being done. Here's slightly faster code:

Div32u16uQE_1   bit     #0xFF00, R14        ; upper 8 bits all zero?
                jnz     Div32u16uQE_2       ; no -- do normal shifting
                xor     R15, R14            ; yes -- shift the
                mov.b   R15, R15            ; dividend up by 8
                xor     R15, R14            ; bits, placing zeros
                swpb    R14                 ; into the lower-
                swpb    R15                 ; most 8 bits and
                sub     #8, R11             ; update the exponent
Div32u16uQE_2

This strips 2 cycles and 3 words off of my original code and 1 cycle and 2 words off of what results from your idiom replacement, and it further removes the need for using R12 as a quick temporary. It's a common method I'd forgotten to use at the time, but having my attention drawn back to this recalled it.

Jon
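For readers who don't read MSP430 assembly, here is a rough C model of the XOR sequence in the post above. It is only a sketch: the function names are made up for illustration, `hi` and `lo` stand in for R14 and R15, and `normalize_byte` is a plain 32-bit reference to compare against, not anything from Jon's actual routine.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit word, like the MSP430 SWPB instruction. */
static uint16_t swpb(uint16_t x)
{
    return (uint16_t)(((x & 0x00FFu) << 8) | (x >> 8));
}

/* Hypothetical model of the XOR sequence: hi plays R14, lo plays R15.
 * Precondition (checked by the BIT/JNZ pair in the assembly): the upper
 * 8 bits of hi are zero. */
static void shift_up_8_xor(uint16_t *hi, uint16_t *lo)
{
    assert((*hi & 0xFF00u) == 0);
    *hi ^= *lo;          /* xor   R15, R14 : hi = hi ^ lo                  */
    *lo &= 0x00FFu;      /* mov.b R15, R15 : byte move clears the top byte */
    *hi ^= *lo;          /* xor   R15, R14 : low byte of hi restored,      */
                         /*                  high byte is now lo's top byte */
    *hi = swpb(*hi);     /* swpb  R14                                      */
    *lo = swpb(*lo);     /* swpb  R15      : low byte becomes zero         */
}

/* Plain 32-bit reference: shift up by 8 and adjust the exponent when the
 * top byte of the dividend is clear, otherwise leave both untouched. */
static uint32_t normalize_byte(uint32_t dividend, int *exponent)
{
    if ((dividend & 0xFF000000u) == 0) {
        dividend <<= 8;
        *exponent -= 8;
    }
    return dividend;
}
```

Feeding 0x00AB:0xCDEF through `shift_up_8_xor` leaves 0xABCD:0xEF00, which matches what `normalize_byte` produces for 0x00ABCDEF.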
Reply by ●April 12, 2004
Hi Jon,

> >> >I did do those numbers, but I think I didn't want to post them in chance
> >> >of having one party or another yelling at me :-)
> >>
> >> I wish they'd post their own numbers.
> >
> >That would mean we would have to agree on things such as input values,
> >precision, whether NaNs and INFs were supported, gradual underflow,
> >sticky bits, and what rounding mode to use, and whether numbers are
> >correctly rounded.
>
[snip]
> There is no requirement that vendors get together on these things. There never
> was such a requirement, there still isn't such a requirement. Customers will
> just have to be smarter than to compare a single number, I suppose. But that's
> been life for a long time and I don't expect it to change much.
>
> If your answer to all this is to not say a thing, then that's fine. I just wish
> it were otherwise.

No, I don't mind providing figures. We also provided TI with benchmark figures for code size on common benchmarks. Only two vendors (ourselves included) responded, out of those invited to do so, as of this time.

I might pop together a quick pseudo-benchmark. I need to provide TI with other figures for our runtime routines, so this ties in nicely.

> >What's the point?
>
> I'd like it, if nothing else.

Well, I might as well do it then.

> >Even more, if we tested other
> >logarithmic and transcendental functions, then we'd have to agree on how
> >precise the answers were, to how many ulps, and so on. And we'd need to
> >agree on whether the hardware multiplier is used or not, or any other
> >peculiarities.
>
> You keep saying you'd have to agree and I'll keep saying you don't need to.

I know how benchmarks can be used against me... :-( I can pick numbers to make us look good, or to prove something. But, for instance, we have stable functions that are well-defined over a wide range of numbers and if we compete against some library that doesn't do accurate range reduction with a double precision pi, then we'll look bad. No point saying we work over a wider domain, customers just point to benchmark results. Perhaps you're not an average customer...

> Again, if your answer to your business approach is to avoid saying anything at
> all, that's fine -- of course. But I can still wish it were otherwise, too.

You don't know me very well if you think I won't say anything at all. ;-)

> Nothing you've said convinces me in the least that you have a good reason to
> stay quiet on the issue.

No, as I wrote the floating point routines for the MSP430, I know how much effort I put into it. I don't need to keep quiet about performance because I think we have some tidy libraries.

> ><snip about security methods>
>
> >Hey, I don't like the fact I have to activate Windows XP. If this is
> >your attitude, then I presume you don't like XP and don't want to use
> >it?
>
> Actually, I don't like XP and don't use XP at all.

But is that because of the activation requirement or something else? You use a version of Windows as you admit to using KickStart; so I guess it *is* the activation requirement?

> >> and #0xFF, R12 ; extract bits
> >
> >Either use
> >
> >and.b #-1, r12 ; 1 word, 1 cycle
> >
> >or
> >
> >mov.b r12, r12 ; 1 word, 1 cycle
>
> Thanks, Paul. I see exactly what you are saying here. Much appreciated!

No problem.

-- Paul.
Reply by ●April 12, 2004
Jon,

> >That would mean we would have to agree on things such as input values,
>
> By this, do you mean how much shifting in denormalization is required, as one
> aspect here? In other words, the exact quantities to be processed in the test?
> If so, understood.

That type of thing, yes. However, we have more than one floating point implementation...

> Of course, I think that any summary of performance should include some reasoning
> about them. Just putting a number out there (or two, or three) without any
> explanation is bad form. But you might talk about the methodology used to
> arrive at various figures and provide a few numbers which, in your informed
> opinion, are meaningful and then go on with some detail about how they were
> developed.

Sure, I don't mind doing this.

> You don't have to agree on the input values with competitors in order to say
> something meaningful.

Well, I don't mind putting together my own benchmark suite to see how well things go. However, I have no idea how other vendors will benchmark using our test suite because they don't have cycle-accurate simulators. I can run my code under a simulator and get exact cycle times, which is easy. Running on hardware is more problematic, but I can use a timer I suppose.

> >precision,
>
> What's the format you use, Paul? Is it different from the others? If so, how
> so? Is this information available from your site without having to install the
> package first?

All vendors must have agreed on IEEE-754 or its ISO equivalent IEC 60559 by now. We use the standard single and double precisions (8.23 and 11.52 formats) but we also have a library that will do double extended precision and quadruple precision in all IEEE rounding modes and has all IEEE features. We don't ship this version for the MSP430 as it's too big and too slow for any practical purpose.

I believe all other vendors, excepting IAR, only provide 32-bit IEEE floating point. Who knows, they may even use the TI libraries to provide their FP, in which case they won't be IEEE compliant.

> I'm seriously interested.
>
> >whether NaNs and INFs were supported,
> >gradual underflow,
> >sticky bits, and what rounding mode to use, and whether numbers are
> >correctly rounded
>
> And more, I know. Do you support these? Is it so hard to include that detail
> with posted numbers?

In our IEEE-compliant library, yes. In the library we ship for the MSP430, NaNs and infinities are handled, but only as outputs from, e.g. 0/0, not as inputs. Rounding mode is fixed at IEEE round to nearest with tie breaks.

> Or, if the cost of supporting some of them is high enough, provide libraries for
> those who don't deal with them, too?

As I said, we have two sets of libraries, and one we choose not to ship for the MSP430.

> This doesn't say why you can't provide information. Instead, it says you need
> to communicate. It's not like we need to be treated like idiots. Or do you see
> it otherwise?

I only want a fair set of benchmarks; I'm not afraid to compete with anybody as long as the ground is fair.

> Paul, I'm sure you have an excellent technical product and are a superb talent.
> Nothing I'm saying challenges that fact. I just don't agree with what appears
> to me to be a sour attitude towards customers' intelligence. Talk about what
> you are making, how it performs, etc. Let us read what's said and make sense of
> it for ourselves. Tell us more, not less.

I don't believe you to be an average customer, Jon. If I appear to have a sour attitude, it's not intended. I only point out that simply providing performance figures is something that is more complex than saying "floating point addition takes x cycles."

You've spurred me on to do it, so I might as well do so.

-- Paul.
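As an illustration of the "8.23" notation for IEEE single precision mentioned above (1 sign bit, 8 exponent bits, 23 stored fraction bits), here is a small C sketch that splits a float into those fields. The struct and function names are hypothetical, and it assumes the host `float` is a 32-bit IEEE-754 single; it is not taken from any vendor's library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of an IEEE-754 single. */
typedef struct {
    unsigned sign;      /* 1 bit                                   */
    unsigned exponent;  /* 8 bits, biased by 127                   */
    uint32_t fraction;  /* 23 stored bits, implicit leading 1 omitted */
} ieee_single_fields;

/* Split a float into its raw fields. Assumes float is a 32-bit IEEE single. */
static ieee_single_fields unpack_single(float f)
{
    uint32_t bits;
    ieee_single_fields out;

    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to read the bits */

    out.sign     = (unsigned)(bits >> 31);
    out.exponent = (unsigned)((bits >> 23) & 0xFFu);
    out.fraction = bits & 0x007FFFFFu;
    return out;
}
```

For example, 1.0f unpacks to sign 0, biased exponent 127 and fraction 0. A software floating point add or multiply has to do this kind of unpacking, plus the reverse packing and rounding, on every operation, which is part of why the cycle counts depend so much on which IEEE features are supported.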
Reply by ●April 12, 2004
At 05:56 PM 4/11/2004 -0700, Jonathan Kirwan wrote:
...

I have asked the consultant who wrote the FP to provide numbers for basic operations. We all know benchmarks are infamously bad for actually comparing compilers' performance (rant: a certain 3 letter acronym company still has a PDF on their website demonstrating how their AVR compiler is 48% better than ours on one program, ignoring for example that, for the AVR Butterfly, the Atmel written code differs in a few hundred bytes, out of ~12K code...) but this should give some ideas for people who want to know the ballpark figure performance.

In any case, hopefully sometime this week I can provide some numbers for the ICC430 compilers. We have a customer who did something similar for the HC11/HC12 compilers since he is using it for some sort of flight simulator (the real stuff) and he cares about such things.

// richard
(This email is for mailing lists. To reach me directly, please use richard@rich...)
Reply by ●April 12, 2004
Hi Paul, Jon et al
> That type of thing, yes. However, we have more than one floating point
> implementation...
Ideally, if I had it my way (hum), I'd have a "fract" type
(ie. in 32 bits - bit 31 is sign, and rest is neg powers of 2).
It's really handy when you can suffice with nums between 0 and +- 0.999999.
Saves time not having to unpack, do calc and pack to eg. IEEE.
The actual floats then can call packing and unpacking routines, effectively
converting to a fract type.
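A minimal sketch of what such a 32-bit "fract" type could look like in C follows. It assumes a two's complement Q1.31 layout (bit 31 carries the sign, the remaining 31 bits are negative powers of 2), which is one possible reading of the description above; the names `q31_t`, `q31_from_double`, `q31_to_double`, `q31_add` and `q31_mul` are made up for illustration and are not CrossWorks or Embedded C identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q1.31 fract: value = raw / 2^31, range [-1.0, 1.0). */
typedef int32_t q31_t;

/* Convert a double in [-1.0, 1.0) to Q1.31 (truncates toward zero). */
static q31_t q31_from_double(double x)
{
    assert(x >= -1.0 && x < 1.0);
    return (q31_t)(x * 2147483648.0);
}

/* Convert Q1.31 back to double; exact, since a double holds 53 bits. */
static double q31_to_double(q31_t q)
{
    return (double)q / 2147483648.0;
}

/* Addition is plain integer addition. The caller must keep the true sum
 * inside [-1.0, 1.0); this sketch does not saturate. */
static q31_t q31_add(q31_t a, q31_t b)
{
    return (q31_t)((int64_t)a + (int64_t)b);
}

/* Multiplication: 32x32 -> 64-bit product, then drop 31 fraction bits.
 * No unpacking, exponent handling or repacking is needed. The right
 * shift of a negative value is assumed to be arithmetic, which is
 * implementation-defined in C but true of common compilers. The single
 * overflow case, (-1.0) * (-1.0), is not handled here. */
static q31_t q31_mul(q31_t a, q31_t b)
{
    return (q31_t)(((int64_t)a * (int64_t)b) >> 31);
}
```

With this layout, multiplying the fract for 0.5 by itself gives exactly the fract for 0.25. The saving compared with IEEE floats comes from skipping the unpack, normalize and pack steps, at the price of a fixed range and the need to watch for overflow by hand.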
Is this doable ? (I understand it'd be a hell of a thing to modify
towards).
In any case, I've always liked floats, and I DO know that CrossWorks has
the best and fastest floats/doubles, so I'm happy !
-- Kris
Reply by ●April 12, 2004
> Well, I don't mind putting together my own benchmark suite to see how
> well things go. However, I have no idea how other vendors will
> benchmark using our test suite because they don't have cycle-accurate
> simulators. I can run my code under a simulator and get exact cycle
> times, which is easy.
I can readily support this claim.
I did several tricky timing tests with Simulator and then on HW, and indeed
CrossWorks' simulator is very cycle-accurate.
(And I have very high resolution calibrated equipment to confirm it timing wise)
-- Kris
Reply by ●April 12, 2004
Kris,
The Embedded C Standard document has fract and accum types. Although I
have implemented fract arithmetic for MSP430, I haven't plumbed in the
compiler yet. The compiler is undergoing a few changes to support the
_Complex and _Imag types, but fract and accum are just a bridge too far
at the moment.
-- Paul.
> -----Original Message-----
> From: microbit [mailto:microbit@micr...]
> Sent: 12 April 2004 12:57
> To: msp430@msp4...
> Subject: Re: [msp430] Basic floating point execution times, using
> compiler libraries
>
>
> Hi Paul, Jon et al
>
> > That type of thing, yes. However, we have more than one
> floating point
> > implementation...
>
> Ideally, if I had it my way (hum), I'd have a "fract" type
> (ie. in 32 bits - bit 31 is sign, and rest is neg powers of 2).
> It's really handy when you can suffice with nums between 0
> and +- 0.999999.
> Saves time not having to unpack, do calc and pack to eg. IEEE.
>
> The actual floats then can call packing and unpacking
> routines, effectively
> converting to a fract type.
>
> Is this doable ? (I understand it'd be a hell of a thing to
> modify towards).
>
> In any case, I've always liked floats, and I DO know that
> CrossWorks has the best and
> fastest floats/doubles, so I'm happy !
>
> -- Kris