The IAR KickStart C compiler does NOT support floating point. I assume this is managed by simply not including the routines in the libraries.

Thinking about the DIV routine recently discussed, I became curious just how quickly the floating point packages perform their basic operations: multiplication, division, addition, and subtraction. But since I neither have nor use any of the existing C compiler tools (except KickStart, which doesn't support FP), I realized that I also don't know what libraries folks use for their floating point, and I'm even more curious about how their performance varies. So:

(*) What floating point libraries do folks use (and where do they get them) when using IAR's KickStart assembler and limited C? The FP package at the TI freetools web site? Or something else?

(*) Is there a comparison of timing for the basic functions listed above for these routines? Does anyone have any particulars they've already developed in this regard, as a rule of thumb about their performances? (I noticed that Kris had written some time ago about some benchmarks, but didn't provide specific numbers that I could find.)

I'm mostly interested in figures for FP where the mantissa is roughly 16 to 23+ bits in size. The larger sizes (like 54+ bit mantissas) would be interesting, but are further from the question I have in mind: a comparison with the custom FP methods I more often use on non-FP CPUs, where much of what I do is satisfactorily managed with 32/16 integer division and 16x16 integer multiplication supporting the custom floating point.

In simulation, I'm typically getting 240 cycles for (x*y/z), where x, y, and z are all 16-bit unsigned, normalized (or denormal) mantissas with 16-bit signed exponents, and the result keeps a 16-bit normalized, rounded mantissa with a 16-bit exponent. Worst case is about 300 cycles and best is about 200 cycles (without special-case detection.)

Jon
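When checking a fixed-point x*y/z routine like this in simulation, an exact reference result is handy. Below is a minimal Python sketch, assuming each operand is an unsigned 16-bit integer mantissa m with a signed exponent e and value m * 2**e (the post doesn't say whether the mantissa is read as an integer or a fraction; the two views differ only by a constant exponent offset). It computes the product and quotient exactly with `fractions.Fraction`, then rounds half up to a normalized 16-bit mantissa (bit 15 set). Denormal inputs are accepted, but zero operands and exponent overflow are not handled; the names `to_value`, `round_to_m16`, and `mul_div` are made up for this example.

```python
from fractions import Fraction

MANT_MIN = 1 << 15  # smallest normalized 16-bit mantissa (bit 15 set)
MANT_LIM = 1 << 16  # one past the largest 16-bit mantissa


def to_value(m, e):
    """Exact value of a (mantissa, exponent) pair, assumed to be m * 2**e."""
    return Fraction(m) * Fraction(2) ** e


def round_to_m16(v):
    """Round a positive Fraction to (m, e) with m in [2**15, 2**16), half up."""
    if v <= 0:
        raise ValueError("only positive values are modeled")
    e = 0
    # Scale by powers of two until v lies in [2**15, 2**16).
    while v >= MANT_LIM:
        v /= 2
        e += 1
    while v < MANT_MIN:
        v *= 2
        e -= 1
    m = v.numerator // v.denominator  # integer part (v > 0)
    if v - m >= Fraction(1, 2):       # round half up
        m += 1
        if m == MANT_LIM:             # rounding carried out of 16 bits
            m >>= 1
            e += 1
    return m, e


def mul_div(x, y, z):
    """Reference result of x*y/z for (m, e) operands, rounded to 16 bits."""
    return round_to_m16(to_value(*x) * to_value(*y) / to_value(*z))


# 3 * 1 / 2 = 1.5 = 0xC000 * 2**-15
print(mul_div((3, 0), (1, 0), (2, 0)))  # (49152, -15)
```

Running a fixed-point routine and `mul_div` over the same random operands and comparing the (m, e) pairs would show its rounding error in units of the last mantissa bit.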
Basic floating point execution times, using compiler libraries
Started April 9, 2004
Reply, April 10, 2004
Hi Jon,
> (*) Is there a comparison of timing for the basic functions
> listed above for these routines? Does anyone have any
> particulars they've already developed in this regard, as a rule
> of thumb about their performances? (I noticed that Kris had
> written some time ago about some benchmarks, but didn't provide
> specific numbers that I could find.)
I did do those numbers, but I think I didn't want to post them in chance
of having one party or another yelling at me :-)
I did numbers comparing :
- IAR
- CrossWorks
- HITECH - C
If you need specific cases to look at, you could D/L a demo of eg. CrossWorks
(unrestricted wrt floats and doubles), that also is the fastest library.
If it would be too much scope, I could up my results and send them to you.
I think I averaged 20 permutations.
If you send me an off-group Email, I'll send them to you ?
Cheers,
Kris
Reply, April 10, 2004
On Sat, 10 Apr 2004 16:33:48 +1000, you wrote:

>I did do those numbers, but I think I didn't want to post them in chance
>of having one party or another yelling at me :-)

I wish they'd post their own numbers.

>I did numbers comparing :
>- IAR
>- CrossWorks
>- HITECH - C
>
>If you need specific cases to look at, you could D/L a demo of eg.
>CrossWorks
>(unrestricted wrt floats and doubles), that also is the fastest library.

I probably could download several and test myself. I was just hoping someone has already put the numbers together. I also do NOT like downloading programs which install timers or other 'secret' mechanisms to protect their wares. I do it, sometimes, but never comfortably. I've had problems in the past in the case of certain companies. It wasn't pleasant, at all.

>If it would be too much scope, I could up my results and send them to you.
>I think I averaged 20 permutations.

I'd be interested, of course.

>If you send me an off-group Email, I'll send them to you ?

Sure, just email my address if you like. But if you are offering them only by way of private information, I'll probably have to go generate my own figures. I cannot promise that I won't base further public questions on the basis of what you may tell me. So just tell me to go get them myself, if you are worried about it. But do you really imagine you'd be yelled at, posting just the times for the four basic operations here?? It's not like I'm asking for proprietary info, is it?

Oh, well. New subject...

Anyway, here are three routines below that I wrote while on a long phone meeting at work, this morning. The last routine is a copy, roughly, of what I posted before -- your basic 32/16 unsigned divide type of routine. It's smaller in code size (by 1 word), significantly faster, provides more information (remainder), and uses fewer registers than the version TI included in their notes (less resource.) (I'll post signed ones some other time, perhaps.)
The second one normalizes before performing the computation (so it computes something useful even if you try to divide 1 by 23, for example) and returns (Q + R/D) * 2^E, where D is the normalized divisor, providing a great deal of precision. For most everyday computations, though, it's handier to round the R/D part into Q and just have Q * 2^E as the result. The first routine does this: it calls the second one, then performs the rounding and, if needed, renormalization.

Anyone is free to use these, as far as I'm concerned. I hope it's helpful.

; --
; DIV32u16uQEr      32u/16u --> (16u R 16u) * 2 ^ 16s, rounded result
; --
;
; This routine provides division of an unsigned, 32-bit dividend by an
; unsigned, 16-bit divisor and produces a rounded, unsigned, 16-bit quotient
; and a signed, 16-bit exponent. The quotient is in normalized result form.
; In order to round the result, the remainder is compared with the divisor,
; adding 1 to the quotient if (remainder >= divisor - remainder) is true.
;
; Execution is from 190 to 300 cycles: including the return, but not the
; function call.
;
; Inputs:
;
;       R13     unsigned 16-bit divisor
;       R14     unsigned high-order 16 bits of the dividend
;       R15     low-order 16 bits of the dividend
;
; Outputs:
;
;       R11     signed 16-bit exponent
;       R15     rounded, normalized, unsigned 16-bit quotient result
;
; Scratches:
;
;       R12     normalization temporary and counter
;       R13     (divisor - remainder) temporary
;       R14     unsigned 16-bit remainder temporary

Div32u16uQEr    call    #Div32u16uQE    ; perform the division calculation
                sub     R14, R13        ; compute (divisor - remainder)
                cmp     R13, R14        ; compare, set carry appropriately
                addc    #0, R15         ; add in the carry result
                jnc     Div32u16uQEr_0  ; overflow?
                rrc     R15             ; yes -- shift down by 1
                inc     R11             ; and update the exponent
Div32u16uQEr_0  ret                     ; either way, done!
; --
; DIV32u16uQE       32u/16u --> (16u R 16u) * 2 ^ 16s
; --
;
; This routine provides division of an unsigned, 32-bit dividend by an
; unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient, an
; unsigned 16-bit remainder, a new unsigned divisor for that remainder, plus
; a signed, 16-bit exponent. The quotient is in normalized result form.
; A zero divisor or zero dividend is not detected (the normalization
; loops would never exit).
;
; Execution is from 176 to 284 cycles: including the return, but not the
; function call.
;
; Inputs:
;
;       R13     unsigned 16-bit divisor
;       R14     unsigned high-order 16 bits of the dividend
;       R15     low-order 16 bits of the dividend
;
; Outputs:
;
;       R11     signed 16-bit exponent
;       R13     new unsigned 16-bit divisor, matched to remainder
;       R14     unsigned 16-bit remainder
;       R15     normalized, unsigned 16-bit quotient
;
; Scratches:
;
;       R12     normalization temporary and counter

Div32u16uQE     clr     R11

; ----
; Normalize the divisor and update the exponent. The exponent
; will be too high, by 1, when we exit -- corrected soon.
; ----

; This part handles the special case where the upper 8
; bits of the divisor are all zero.

                bit     #0xFF00, R13    ; upper 8 bits all zero?
                jnz     Div32u16uQE_0   ; no -- do normal shifting
                swpb    R13             ; yes -- shift up by 8 bits
                add     #8, R11         ; and update the exponent

; This part handles the remaining part of the normalization of
; the divisor by a typical shifting process.

Div32u16uQE_0   inc     R11
                rla     R13
                jnc     Div32u16uQE_0
                rrc     R13

; ----
; Normalize the dividend and update the exponent. This part
; would set the exponent too low, by 1, and corrects above error.
; ----

; This part handles the special case where the upper 16 bits
; of the dividend are all zero.

                cmp     #0, R14         ; upper 16 bits all zero?
                jne     Div32u16uQE_1   ; no -- check upper 8 bits, only
                mov     R15, R14        ; yes -- shift lower 16 to upper
                clr     R15             ; and put zeros into lower 16
                sub     #16, R11        ; and update the exponent
                bit     #0xFF00, R14    ; upper 8 bits all zero?
                jnz     Div32u16uQE_2   ; no -- do normal shifting
                swpb    R14             ; yes -- shift up by 8 bits
                sub     #8, R11         ; and update the exponent
                jmp     Div32u16uQE_2   ; and go do normal shifting

; This part handles the special case where only the upper 8
; bits of the dividend are all zero and where the lower 16
; bits may have important content to retain.

Div32u16uQE_1   bit     #0xFF00, R14    ; upper 8 bits all zero?
                jnz     Div32u16uQE_2   ; no -- do normal shifting
                swpb    R14             ; yes -- shift up by 8 bits
                swpb    R15             ; lower part holds mixture
                mov     R15, R12        ; so, get a copy of it
                bic     #0xFF, R15      ; shift zeros into lower 16 part
                and     #0xFF, R12      ; extract bits to go to upper 16
                bis     R12, R14        ; merge them into the upper 16
                sub     #8, R11         ; and update the exponent

; This part handles the remaining part of the normalization of
; the dividend by a typical shifting process.

Div32u16uQE_2   dec     R11
                rla     R15
                rlc     R14
                jnc     Div32u16uQE_2
                rrc     R14
                rrc     R15             ; carry = 0, always

; ----
; Now, make sure that the quotient won't overflow.
; ----

                cmp     R13, R14        ; if the dividend's high-order
                jlo     Div32u16uQE_3   ; 16 bits are >= the divisor,
                clrc                    ; (cmp left C = 1, so clear it)
                rrc     R14             ; then shift the dividend down
                rrc     R15             ; by 1 so that this isn't true
                inc     R11             ; and update the exponent
Div32u16uQE_3

; ----
; Perform the division on the normalized values, by falling
; through...
; ----

; --
; DIV32u16uQR       32u/16u --> 16u R 16u
; --
;
; This routine provides division of an unsigned, 32-bit dividend by an
; unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient and
; an unsigned, 16-bit remainder. The high-order 16 bits of the dividend
; (R14) must be less than the divisor (R13), or the quotient will not fit
; in 16 bits.
;
; No normalization takes place in this routine and no exponent is produced.
;
; Execution is from 149 to 181 cycles, typically 172: including the return,
; but not the call.
;
; Inputs:
;
;       R13     unsigned 16-bit divisor
;       R14     unsigned high-order 16 bits of the dividend
;       R15     low-order 16 bits of the dividend
;
; Outputs:
;
;       R14     unsigned 16-bit remainder
;       R15     unsigned 16-bit quotient
;
; Scratches:
;
;       R12     counter

Div32u16uQR     mov     #16, R12        ; set up the shift counter
Div32u16uQR_0   rla     R15             ; shift the 32-bit dividend left,
                rlc     R14             ; moving the next bit into R14
                jc      Div32u16uQR_1   ; bit shifted out: R14 >= divisor
                sub     R13, R14        ; trial subtract of the divisor
                jc      Div32u16uQR_2   ; no borrow: quotient bit is 1
                add     R13, R14        ; borrow: restore the remainder
                dec     R12             ; done?
                jnz     Div32u16uQR_0   ; no -- continue
                ret                     ; yes -- return
Div32u16uQR_1   sub     R13, R14        ; subtract (17-bit case)
Div32u16uQR_2   inc     R15             ; set this quotient bit
                dec     R12             ; done?
                jnz     Div32u16uQR_0   ; no -- continue
                ret                     ; yes -- return

Jon
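To make the arithmetic in these three routines easier to check off-target, here is a rough bit-level Python model of them. It is a sketch, not Jon's code, and the function names are invented here: `div32u16u_qr` mirrors the restoring shift-and-subtract loop of Div32u16uQR, `div32u16u_qe` mirrors the normalize-then-divide steps of Div32u16uQE (including shifting a zero into the top bit when the quotient would overflow), and `div32u16u_qer` adds the round-half-up step of Div32u16uQEr. Like the assembly, it assumes a nonzero divisor and dividend, and it models results only, not cycle counts.

```python
MASK16 = 0xFFFF


def div32u16u_qr(hi, lo, d):
    """Model of Div32u16uQR: (hi:lo) / d -> (quotient, remainder).

    Requires hi < d so the quotient fits in 16 bits.
    """
    assert 0 <= hi < d <= MASK16 and 0 <= lo <= MASK16
    for _ in range(16):
        carry = hi >> 15                        # bit shifted out by rlc R14
        hi = ((hi << 1) | (lo >> 15)) & MASK16  # rla R15 / rlc R14
        lo = (lo << 1) & MASK16
        if carry or hi >= d:                    # jc, or sub with no borrow
            hi = (hi - d) & MASK16              # 17-bit case wraps to the true value
            lo |= 1                             # inc R15: quotient bit = 1
    return lo, hi


def div32u16u_qe(hi, lo, d):
    """Model of Div32u16uQE: returns (q, r, d_norm, e) with
    (hi:lo) / d ~= (q + r / d_norm) * 2**e and 0x8000 <= q <= 0xFFFF.
    """
    assert 0 < d <= MASK16 and (hi or lo)
    e = 0
    while not d & 0x8000:      # normalize the divisor: exponent goes up
        d <<= 1
        e += 1
    n = (hi << 16) | lo
    while not n & 0x80000000:  # normalize the dividend: exponent goes down
        n <<= 1
        e -= 1
    if (n >> 16) >= d:         # quotient would overflow 16 bits:
        n >>= 1                # shift a zero in at the top
        e += 1
    q, r = div32u16u_qr(n >> 16, n & MASK16, d)
    return q, r, d, e


def div32u16u_qer(hi, lo, d):
    """Model of Div32u16uQEr: rounded (q, e) with (hi:lo) / d ~= q * 2**e."""
    q, r, dn, e = div32u16u_qe(hi, lo, d)
    if r >= dn - r:            # round half up
        q += 1
        if q > MASK16:         # carry out: renormalize to 0x8000
            q >>= 1
            e += 1
    return q, e


# 1 / 23: 45590 * 2**-20 is about 0.0434780 (1/23 is about 0.0434783)
print(div32u16u_qer(0, 1, 23))  # (45590, -20)
```

Feeding the same operands to the simulator and to this model is one way to catch register or carry mistakes in the assembly before timing it.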
Reply, April 10, 2004
Hi Jon,

> I wish they'd post their own numbers.

So do I :-)

> I probably could download several and test myself. I was just hoping someone
> has already put the numbers together. I also do NOT like downloading programs
> which install timers or other 'secret' mechanisms to protect their wares. I do
> it, sometimes, but never comfortably. I've had problems in the past in the case
> of certain companies. It wasn't pleasant, at all.

Same here. I once installed a demo of some sort, and at that time I used an internal 14.4K modem. The modem never worked again after that demo. (Something in the registry managed to hijack the "serial port", wherever I assigned it... ?) I install "demos" on a separate machine now.

> Sure, just email my address if you like. But if you are offering them only by
> way of private information, I'll probably have to go generate my own figures. I
> cannot promise that I won't base further public questions on the basis of what
> you may tell me.

That's not a problem; indeed, it isn't private or proprietary, that'd be silly.
I can say that Rowley's CrossWorks are the fastest floats by far.

> So just tell me to go get them myself, if you are worried
> about it. But do you really imagine you'd be yelled at, posting just the times
> for the four basic operations here?? It's not like I'm asking for proprietary
> info, is it?

I think you misunderstand what I mean. Even as recent as 2 months ago, I posted an unbiased benchmark, similar to what you're asking - and I got an Email from a vendor off-group - let's just say, an unpleasant experience. I just don't need that crap anymore - thus I'm a bit more prudent, that's all.

I only offered to help.

-- Kris
Reply, April 10, 2004
On Sat, 10 Apr 2004 19:56:59 +1000, Kris wrote:

>I only offered to help.

Thanks for the offer. Very much appreciated. And I'd be more than glad to accept, if there is no issue with my writing about what you say, here. I'm gathering that you do have a problem with that, though.

>I can say that Rowley's CrossWorks are the fastest floats by far.

Thanks. That's a good place for me to look as a reference, then.

>Even as recent as 2 months ago, I posted an unbiased benchmark, similar to
>what you're asking - and I got an Email from a vendor off-group - let's just say,
>an unpleasant experience

That vendor probably deserves to be outed -- and perhaps more than that.

>I just don't need that crap anymore - thus I'm a bit more prudent, that's all.

I'm surprised you could be bullied in that fashion. I don't think I could be, but perhaps I haven't had your experience, yet. Sorry to hear that this vendor managed to affect your behavior over something like this.

Jon
Reply, April 10, 2004
Hi Jon,
> I'm surprised you could be bullied in that fashion. I don't think I could be,
> but perhaps I haven't had your experience, yet. Sorry to hear that this vendor
> managed to affect your behavior over something like this.

Oh, it's more a crescendo Jon :-)

There's been a few occasions where things are taken the wrong way.
That's inherent with being not verbose enough in monologue text.
And being too verbose risks losing the focus of the reader, it's a dilemma
sometimes.

When Rowley were close to releasing CrossWorks, I announced its arrival with
great enthusiasm (I'd been alpha testing it a bit), and then copped all this
smart arse comments about "Kris's wondertools". Sometimes it's just a bit much I
guess, one gets a bit weary at times.

More recently someone here made it clear in a non-direct way that some posters
send a post to _help_ people, whereas at other times there's the "smartarse"
comments on questions, in a fashion like "look how incredibly clever I am" rather
than a simple gesture to help someone, and derive some satisfaction from that ?

I'll see where I put those results, if I can dig them up and send them to you.
I didn't mean you can't make a reference to them Jon :-)
I hope that clarifies a bit where I might seem weary.
Take care,
Kris
Reply, April 10, 2004
On Sat, 10 Apr 2004 21:16:17 +1000, Kris wrote:

>There's been a few occasions where things are taken the wrong way.
>That's inherent with being not verbose enough in monologue text.
>And being too verbose risks losing the focus of the reader, it's
>a dilemma sometimes.

I follow that.

>When Rowley were close to releasing CrossWorks, I announced its arrival with great
>enthusiasm (I'd been alpha testing it a bit), and then copped all this smart arse comments
>about "Kris's wondertools". Sometimes it's just a bit much I guess, one gets a bit weary at times.

Understood, I think.

>More recently someone here made it clear in a non-direct way that some posters send
>a post to _help_ people, whereas at other times there's the "smartarse" comments on
>questions, in a fashion like "look how incredibly clever I am" rather than a simple
>gesture to help someone, and derive some satisfaction from that ?

I enjoy helping, if I can, Kris. I also enjoy taking on challenges. I also enjoy just learning, for learning's sake alone. I don't mind being a gadfly of sorts, where I think some verbal prodding may teach me something new about some technical issue or knock a brick out of the foundation of some argument (which almost always also teaches me something technical, in the end.)

Just for your information (and anyone else who may care, at all), I've not been using the MSP430 for anything but a toy project up to this point. I read and enjoy the posts here because I plan to use it more seriously soon enough, and I enjoy reading people who do bother to write something and have a decent mind when they do so. To be honest, I had figured on using the part for a new tool last summer, but I've been doing other projects with higher priority and it has had to wait. In the interim, another project is looming where I'll probably select the MSP430. (So, perhaps soon, I keep saying to myself.)
I contribute where I can, given my very modest experience with the chip -- which does not include detailed experience, but more general experience that I draw from prior projects. At this point, an "informed hobbyist, newly interested in the MSP430" describes me better, I think. I'll be more silent than not; more reading than posting; but that's just the way it is for me, just now.

I also tend to be verbose enough. If you look at my original question here, you'll notice paragraphs of text designed to get across where I'm coming from on this question. Not only that, I've included a fair amount of code to clarify where I'm coming from, as if the text itself wasn't enough. There's always more, of course. But the upshot in my case, if there is still any question at all in your mind about it, is that I'm curious and like to learn from others. That's it.

I don't expect answers from anyone, either. I have no right to and don't feel otherwise about it. For example, I recently posted a question on writing multi-module source files in assembly, which garnered not a single response to a very serious, detailed question about the IAR assembler tool. Not even from a representative of IAR, which makes the product and sells it as part of their toolset. But I didn't press the point, either. I accepted the lack of response as being due to a variety of reasons I cannot well fathom and left it at that. No harm, no foul.

I've no competing products -- no compilers, no operating systems, etc. I've no business to push here (I don't solicit work, nor do I look for or respond to posts asking for contractors.) I'm not looking for a single nickel from being here -- I have plenty of work from customers I like a lot, without that. I hope that puts me in a box for you, Kris, in case you had any doubts about it. When I'm looking into floating point performance, it's only because some curiosity about the details has taken me there. That's all.
>I'll see where I put those results, if I can dig them up and send them
>to you.

Thanks very much, Kris. It will be accepted with gratitude.

>I didn't mean you can't make a reference to them Jon :-)

Okay. I honestly didn't know, one way or another.

>I hope that clarifies a bit where I might seem weary.

It does, somewhat (by excluding one of my previous thoughts and thus reducing the set of possibilities.)

Jon
Reply, April 10, 2004
Jon: You have said it: I am also an "informed hobbyist, newly interested in the MSP430". It is nice to read posts which have substance in them. Incidentally, I have read all the 8900 posts on this list.

Madhav Joshi
Reply, April 10, 2004
Jon: You have said it: I am also an "informed hobbyist, newly interested in the MSP430". It is nice to read posts which have substance in them. Incidentally, I have read all the 7900 plus posts on this list. I am still learning about the MSP430 and have yet to take the plunge. Can the experts on this list give me some frank and free advice on how and where to start, please?

Madhav Joshi
Reply, April 11, 2004
Jon,

> >I did do those numbers, but I think I didn't want to post them in chance
> >of having one party or another yelling at me :-)
>
> I wish they'd post their own numbers.

That would mean we would have to agree on things such as input values, precision, whether NaNs and INFs were supported, gradual underflow, sticky bits, what rounding mode to use, and whether numbers are correctly rounded. What's the point? Even more, if we tested other logarithmic and transcendental functions, then we'd have to agree on how precise the answers were, to how many ulps, and so on. And we'd need to agree on whether the hardware multiplier is used or not, or any other peculiarities.

> >I did numbers comparing :
> >- IAR
> >- CrossWorks
> >- HITECH - C
> >
> >If you need specific cases to look at, you could D/L a demo of eg.
> >CrossWorks
> >(unrestricted wrt floats and doubles), that also is the fastest library.
>
> I probably could download several and test myself. I was just hoping someone
> has already put the numbers together. I also do NOT like downloading programs
> which install timers or other 'secret' mechanisms to protect their wares. I do
> it, sometimes, but never comfortably. I've had problems in the past in the case
> of certain companies. It wasn't pleasant, at all.

Hey, I don't like the fact I have to activate Windows XP. If this is your attitude, then I presume you don't like XP and don't want to use it?

>                 and     #0xFF, R12      ; extract bits

Either use

                and.b   #-1, r12        ; 1 word, 1 cycle

or

                mov.b   r12, r12        ; 1 word, 1 cycle

-- Paul.
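Paul's suggestion relies on two MSP430 details: a byte-mode (.b) instruction that writes a register clears that register's upper byte, and the constant generator supplies -1 (along with 0, 1, 2, 4 and 8) without an extension word, whereas #0xFF needs one. A small Python sketch, with invented function names, can confirm that the 8-bit shift in the Div32u16uQE_1 path leaves the same register contents whichever masking instruction is used. It models register values only, and `immediate_extension_words` is a simplification that ignores addressing modes other than an immediate source.

```python
MASK16 = 0xFFFF
CONSTANT_GENERATOR = {0, 1, 2, 4, 8, -1}  # immediates needing no extension word


def swpb(x):
    """swpb Rn: exchange the two bytes of a 16-bit register."""
    return ((x << 8) | (x >> 8)) & MASK16


def and_word_0xff(r):
    """and #0xFF, Rn: word operation keeping the low byte."""
    return r & 0x00FF


def and_byte_minus1(r):
    """and.b #-1, Rn: AND with 0xFF (-1 as a byte), upper byte cleared."""
    return (r & -1) & 0x00FF


def mov_byte_self(r):
    """mov.b Rn, Rn: copy the low byte onto itself, upper byte cleared."""
    return r & 0x00FF


def shift_left_8(r14, r15, extract):
    """Model of the Div32u16uQE_1 path: R14:R15 <<= 8 (R14's top byte is 0)."""
    r14 = swpb(r14)              # R14's low byte moves up
    r15 = swpb(r15)              # R15 now holds a mixture of both bytes
    r12 = r15                    # mov R15, R12
    r15 &= ~0x00FF & MASK16      # bic #0xFF, R15
    r12 = extract(r12)           # and #0xFF, R12 (or a 1-word variant)
    r14 |= r12                   # bis R12, R14
    return r14, r15


def immediate_extension_words(value):
    """Extension words for an immediate source operand (simplified)."""
    return 0 if value in CONSTANT_GENERATOR else 1


for extract in (and_word_0xff, and_byte_minus1, mov_byte_self):
    # each line ends with ['0x1234', '0x5600']
    print(extract.__name__, [hex(r) for r in shift_left_8(0x0012, 0x3456, extract)])
```

All three variants produce identical register contents, so the difference Paul points out is purely one of size and speed: `and #0xFF, R12` needs an extension word for the immediate, while his two alternatives fit in a single word.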