
Basic floating point execution times, using compiler libraries

Started by Jonathan Kirwan April 9, 2004
The IAR KickStart C compiler does NOT support floating point.  I
assume this is managed by simply not including the routines in
the libraries.

In thinking about the DIV routine recently discussed, I realized
that it made me curious just how quickly the floating point
packages tend to work on their basic operations, such as
multiplication and division and addition and subtraction.

But since I do not have nor use any of the existing C compiler
tools (except for the KickStart, which doesn't support FP), this
made me realize that I also don't know what libraries folks use
for their floating point.  And even more curious about the
variations in their performances...

So,

(*)  What floating point libraries do folks use (and where to
get them) when using IAR's Kickstart assembler and limited C?
The FP package at the TI freetools web site?  Or something else?

(*)  Is there a comparison of timing for the basic functions
listed above for these routines?  Does anyone have any
particulars they've already developed in this regard, as a rule
of thumb about their performances?  (I noticed that Kris had
written some time ago about some benchmarks, but didn't provide
specific numbers that I could find.)

I'm mostly interested in figures for FP where the mantissa is in
the range of 16 bits to 23+ bits in size, roughly.  The larger
sizes (like 54+ bit mantissas) would be interesting, but further
away from the question in mind -- comparison with the custom FP
methods I more often use on non-FP CPUs, where much of what I do
is satisfactorily managed through the use of 32/16 division with
integers and 16x16 multiplication of integers used to support
the custom floating point.

In simulation, I'm getting a typical figure of 240 cycles for (x*y/z),
where x, y, and z are all 16-bit unsigned, normalized (or
denormal) mantissas with 16-bit signed exponents and the result
maintains a 16-bit normalized, result-rounded mantissa with a
16-bit exponent.  Worst case is about 300 cycles and best is
about 200 cycles (without special case detection.)
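
A rough C model of the x*y/z computation described above may make the format
easier to follow.  It is only a sketch under stated assumptions: the fp16
struct and the fp_muldiv name are made up for illustration, mantissas are
assumed to be already normalized (bit 15 set), exponent overflow is ignored,
and rounding is done by comparing the remainder with (divisor - remainder).
It says nothing about cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical format: value = mant * 2^exp, with bit 15 of mant set. */
typedef struct {
    uint16_t mant;
    int16_t  exp;
} fp16;

/* Returns x*y/z, rounded to nearest, as a normalized fp16. */
static fp16 fp_muldiv(fp16 x, fp16 y, fp16 z)
{
    /* Assumption: all three inputs are normalized and nonzero. */
    assert((x.mant & 0x8000u) && (y.mant & 0x8000u) && (z.mant & 0x8000u));

    uint32_t prod = (uint32_t)x.mant * y.mant;      /* 16x16 -> 32 multiply */
    int exp = x.exp + y.exp - z.exp;

    /* The product of two normalized mantissas lies in [2^30, 2^32), so at
       most one left shift moves its top bit to bit 31. */
    if (!(prod & 0x80000000u)) { prod <<= 1; exp--; }

    /* Keep the quotient within 16 bits: high word must be below divisor. */
    if ((prod >> 16) >= z.mant) { prod >>= 1; exp++; }

    uint32_t q = prod / z.mant;                     /* 32/16 divide */
    uint32_t r = prod % z.mant;

    if (r >= z.mant - r) q++;                       /* round to nearest */
    if (q > 0xFFFFu) { q >>= 1; exp++; }            /* rounding overflowed */

    fp16 result = { (uint16_t)q, (int16_t)exp };
    return result;
}
```

For example, with 3 = 0xC000 * 2^-14, 5 = 0xA000 * 2^-13 and 7 = 0xE000 * 2^-13,
the sketch returns mantissa 35109 with exponent -14, i.e. about 2.1429.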

Jon


Hi Jon,

> (*)  Is there a comparison of timing for the basic functions
> listed above for these routines?  Does anyone have any
> particulars they've already developed in this regard, as a rule
> of thumb about their performances?  (I noticed that Kris had
> written some time ago about some benchmarks, but didn't provide
> specific numbers that I could find.)

I did do those numbers, but I think I didn't want to post them in chance
of having one party or another yelling at me :-)

I did numbers comparing :
- IAR
- CrossWorks
- HITECH - C

If you need specific cases to look at, you could D/L a demo of eg.
CrossWorks
(unrestricted wrt floats and doubles), that also is the fastest library.
If it would be too much scope, I could up my results and send them to you.
I think I averaged 20 permutations.
If you send me an off-group Email, I'll send them to you ?

Cheers,
Kris


On Sat, 10 Apr 2004 16:33:48 +1000, you wrote:

>I did do those numbers, but I think I didn't want to post them in chance
>of having one party or another yelling at me :-)

I wish they'd post their own numbers.

>I did numbers comparing :
>- IAR
>- CrossWorks
>- HITECH - C
>
>If you need specific cases to look at, you could D/L a demo of eg.
>CrossWorks
>(unrestricted wrt floats and doubles), that also is the fastest library.

I probably could download several and test myself.  I was just hoping someone
has already put the numbers together.  I also do NOT like downloading programs
which install timers or other 'secret' mechanisms to protect their wares.  I do
it, sometimes, but never comfortably.  I've had problems in the past in the case
of certain companies.  It wasn't pleasant, at all.

>If it would be too much scope, I could up my results and send them to you.
>I think I averaged 20 permutations.

I'd be interested, of course.

>If you send me an off-group Email, I'll send them to you ?

Sure, just email my address if you like.  But if you are offering them only by
way of private information, I'll probably have to go generate my own figures.  I
cannot promise that I won't base further public questions on the basis of what
you may tell me.  So just tell me to go get them myself, if you are worried
about it.  But do you really imagine you'd be yelled at, posting just the times
for the four basic operations here??  It's not like I'm asking for proprietary
info, is it?

Oh, well.  New subject...

Anyway, here are three routines below that I wrote while on a long phone meeting
at work, this morning.  The last routine is a copy, roughly, of what I posted
before -- your basic 32/16 unsigned divide type of routine.  It's smaller in
code size (by 1 word), significantly faster, provides more information
(remainder), and uses fewer registers than the version TI included in their
notes (less resource.)  (I'll post signed ones some other time, perhaps.)  The
second one normalizes before performing the computation (which means it computes
something useful even if you try and divide 1 by 23, for example) and returns
(Q+R/D)*2^E, providing a great deal of precision.  But more often for normal
computations, it's handier to round the R/D part into the Q part and just have a
Q*2^E as the result.  The first routine does this, calling the second one and
then performing the rounding and then renormalization, if needed.  Anyone is
free to use these, as far as I'm concerned.  I hope it's helpful.



;   --
;   DIV32u16uQEr    32u/16u --> (16u R 16u) * 2 ^ 16s, rounded result
;   --
;
;   This routine provides division of an unsigned, 32-bit dividend by an
;   unsigned, 16-bit divisor and produces a rounded, unsigned, 16-bit quotient
;   and a signed, 16-bit exponent.  The quotient is in normalized result form.
;   In order to round the result, the remainder is compared with the divisor,
;   adding 1 to the quotient if (remainder >= divisor - remainder) is true.
;
;   Execution is from 190 to 300 cycles: including the return, but not the
;   function call.
;
;   Inputs:
;
;       R13     unsigned 16-bit divisor
;       R14     unsigned high-order 16 bits of the dividend
;       R15     low-order 16 bits of the dividend
;
;   Outputs:
;
;       R11     signed 16-bit exponent
;       R15     rounded, normalized, unsigned 16-bit quotient result
;
;   Scratches:
;
;       R12     normalization temporary and counter
;       R13     (divisor - remainder) temporary
;       R14     unsigned 16-bit remainder temporary

Div32u16uQEr        call    #Div32u16uQE    ; perform the division calculation
                    sub     R14, R13        ; compute (divisor - remainder)
                    cmp     R13, R14        ; compare, set carry appropriately
                    addc    #0, R15         ; add in the carry result
                    jnc     Div32u16uQEr_0  ; overflow?
                    rrc     R15             ; yes -- shift down by 1
                    inc     R11             ;   and update the exponent
Div32u16uQEr_0      ret                     ; either way, done!



;   --
;   DIV32u16uQE     32u/16u --> (16u R 16u) * 2 ^ 16s
;   --
;
;   This routine provides division of an unsigned, 32-bit dividend by an
;   unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient, an
;   unsigned 16-bit remainder, a new unsigned divisor for that remainder, plus
;   a signed, 16-bit exponent.  The quotient is in normalized result form.
;
;   Execution is from 176 to 284 cycles: including the return, but not the
;   function call.
;
;   Inputs:
;
;       R13     unsigned 16-bit divisor
;       R14     unsigned high-order 16 bits of the dividend
;       R15     low-order 16 bits of the dividend
;
;   Outputs:
;
;       R11     signed 16-bit exponent
;       R13     new unsigned 16-bit divisor, matched to remainder
;       R14     unsigned 16-bit remainder
;       R15     normalized, unsigned 16-bit quotient
;
;   Scratches:
;
;       R12     normalization temporary and counter

Div32u16uQE         clr     R11

            ; ----
            ; Normalize the divisor and update the exponent.  The exponent
            ; will be too high, by 1, when we exit -- corrected soon.
            ; ----

                ; This part handles the special case where only the upper 8
                ; bits of the divisor are all zero.

                    bit     #0xFF00, R13    ; upper 8 bits all zero?
                    jnz     Div32u16uQE_0   ; no -- do normal shifting
                    swpb    R13             ; yes -- shift up by 8 bits
                    add     #8, R11         ;   and update the exponent

                ; This part handles the remaining part of the normalization of
                ; the divisor by a typical shifting process.

Div32u16uQE_0       inc     R11
                    rla     R13
                    jnc     Div32u16uQE_0
                    rrc     R13

            ; ----
            ; Normalize the dividend and update the exponent.  This part
            ; would set the exponent too low, by 1, and corrects above error.
            ; ----

                ; This part handles the special case where the upper 16 bits
                ; of the dividend are all zero.

                    cmp     #0, R14         ; upper 16 bits all zero?
                    jne     Div32u16uQE_1   ; no -- check upper 8 bits, only
                    mov     R15, R14        ; yes -- shift lower 16 to upper
                    clr     R15             ;   and put zeros into lower 16
                    sub     #16, R11        ;   and update the exponent
                    bit     #0xFF00, R14    ; upper 8 bits all zero?
                    jnz     Div32u16uQE_2   ; no -- do normal shifting
                    swpb    R14             ; yes -- shift up by 8 bits
                    sub     #8, R11         ;   and update the exponent
                    jmp     Div32u16uQE_2   ;   and go do normal shifting

                ; This part handles the special case where only the upper 8
                ; bits of the dividend are all zero and where the lower 16
                ; bits may have important content to retain.

Div32u16uQE_1       bit     #0xFF00, R14    ; upper 8 bits all zero?
                    jnz     Div32u16uQE_2   ; no -- do normal shifting
                    swpb    R14             ; yes -- shift up by 8 bits
                    swpb    R15             ;   lower part holds mixture
                    mov     R15, R12        ;   so, get a copy of it
                    bic     #0xFF, R15      ;   shift zeros into lower 16 part
                    and     #0xFF, R12      ;   extract bits to go to upper 16
                    bis     R12, R14        ;   merge them into the upper 16
                    sub     #8, R11         ;   and update the exponent

                ; This part handles the remaining part of the normalization of
                ; the dividend by a typical shifting process.

Div32u16uQE_2       dec     R11
                    rla     R15
                    rlc     R14
                    jnc     Div32u16uQE_2
                    rrc     R14
                    rrc     R15             ; carry = 0, always

            ; ----
            ; Now, make sure that the quotient won't overflow.
            ; ----

                    cmp     R13, R14        ; if the dividend's high order
                    jlo     Div32u16uQE_3   ;   16 bits is not lower than the
                    clrc                    ;   divisor (cmp leaves carry set
                    rrc     R14             ;   here, so clear it), shift the
                    rrc     R15             ;   dividend down by 1 so that
                    inc     R11             ;   this isn't true.
Div32u16uQE_3

            ; ----
            ; Perform the division on the normalized values, by falling
            ; through...
            ; ----


;   --
;   DIV32u16uQR     32u/16u --> 16u R 16u
;   --
;
;   This routine provides division of an unsigned, 32-bit dividend by an
;   unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient and
;   an unsigned, 16-bit remainder.
;
;   No normalization takes place in this routine and no exponent is produced.
;
;   Execution is from 149 to 181 cycles, typically 172: including the return,
;   but not the call.
;
;   Inputs:
;
;       R13     unsigned 16-bit divisor
;       R14     unsigned high-order 16 bits of the dividend
;       R15     low-order 16 bits of the dividend
;
;   Outputs:
;
;       R14     unsigned 16-bit remainder
;       R15     unsigned 16-bit quotient
;
;   Scratches:
;
;       R12     counter

Div32u16uQR         mov     #16, R12        ; set up the shift counter
Div32u16uQR_0       rla     R15
                    rlc     R14
                    jc      Div32u16uQR_1
                    sub     R13, R14
                    jc      Div32u16uQR_2
                    add     R13, R14
                    dec     R12             ; done?
                    jnz     Div32u16uQR_0   ; no -- continue
                    ret                     ; yes -- return
Div32u16uQR_1       sub     R13, R14
Div32u16uQR_2       inc     R15
                    dec     R12             ; done?
                    jnz     Div32u16uQR_0   ; no -- continue
                    ret                     ; yes -- return
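
For checking the assembly in a simulator, a bit-for-bit C model of the plain
shift-and-subtract divide can be handy.  The following is a minimal sketch of
what Div32u16uQR appears to do, not something verified against hardware.  The
names divres and div32u16u are invented here, and it assumes the caller
guarantees that the high word of the dividend is below the divisor, so that the
quotient fits in 16 bits (this also rules out a zero divisor).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t quot;   /* corresponds to R15 on exit */
    uint16_t rem;    /* corresponds to R14 on exit */
} divres;

/* 32u / 16u -> 16u quotient, 16u remainder, one quotient bit per pass. */
static divres div32u16u(uint16_t hi, uint16_t lo, uint16_t divisor)
{
    assert(hi < divisor);   /* assumption: quotient fits in 16 bits */

    for (int i = 0; i < 16; i++) {
        unsigned carry = hi >> 15;                 /* bit shifted out of hi */
        hi = (uint16_t)((hi << 1) | (lo >> 15));   /* rla lo / rlc hi       */
        lo = (uint16_t)(lo << 1);

        /* If a bit was shifted out, the 17-bit partial remainder is
           certainly >= divisor; otherwise compare directly. */
        if (carry || hi >= divisor) {
            hi = (uint16_t)(hi - divisor);
            lo |= 1;                               /* set the quotient bit  */
        }
    }

    divres result = { lo, hi };
    return result;
}
```

Comparing the result against the host's native / and % over a range of inputs,
including ones where the top bit of the high word is set, exercises both
branches of the loop.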


Jon

Hi Jon,

> I wish they'd post their own numbers.

So do I :-)

> I probably could download several and test myself.  I was just hoping someone
> has already put the numbers together.  I also do NOT like downloading programs
> which install timers or other 'secret' mechanisms to protect their wares.  I do
> it, sometimes, but never comfortably.  I've had problems in the past in the case
> of certain companies.  It wasn't pleasant, at all.

Same here.
I once installed a demo of some sort, and at that time I used an internal 14.4K modem.
The modem never worked again after that demo. (something in the registry managed
to hijack the "serial port", wherever I assigned it... ? )
I install "demos" on a separate machine now.

> Sure, just email my address if you like.  But if you are offering them only by
> way of private information, I'll probably have to go generate my own figures.  I
> cannot promise that I won't base further public questions on the basis of what
> you may tell me.

That's not a problem, indeed it isn't private nor proprietary,
that'd be silly.
I can say that Rowley's CrossWorks are the fastest floats by far.

> So just tell me to go get them myself, if you are worried
> about it.  But do you really imagine you'd be yelled at, posting just the times
> for the four basic operations here??  It's not like I'm asking for proprietary
> info, is it?

I think you misunderstand what I mean.
Even as recent as 2 months ago, I posted an unbiased benchmark, similar to
what you're asking - and I got an Email from a vendor off-group - let's just say,
an unpleasant experience.
I just don't need that crap anymore - thus I'm a bit more prudent, that's all.

I only offered to help.

-- Kris





On Sat, 10 Apr 2004 19:56:59 +1000, Kris wrote:

>I only offered to help.

Thanks for the offer.  Very much appreciated.  And I'd be more than glad to
accept, if there is no issue with my writing about what you say, here.  I'm
gathering that you do have a problem with that, though.

>I can say that Rowley's CrossWorks are the fastest floats by far.

Thanks.  That's a good place for me to look as a reference, then.

>Even as recent as 2 months ago, I posted an unbiased benchmark, similar to
>what you're asking - and I got an Email from a vendor off-group - let's just say,
>an unpleasant experience

That vendor probably deserves to be outed -- and perhaps more than that.

>I just don't need that crap anymore - thus I'm a bit more prudent, that's all.

I'm surprised you could be bullied in that fashion.  I don't think I could be,
but perhaps I haven't had your experience, yet.  Sorry to hear that this vendor
managed to affect your behavior over something like this.

Jon

Hi Jon,

> I'm surprised you could be bullied in that fashion.  I don't think I could be,
> but perhaps I haven't had your experience, yet.  Sorry to hear that this vendor
> managed to affect your behavior over something like this.

Oh, it's more a crescendo Jon :-)
There's been a few occasions where things are taken the wrong way.
That's inherent with being not verbose enough in monologue text.
And being too verbose risks losing the focus of the reader, it's a dilemma sometimes.

When Rowley were close to releasing CrossWorks, I announced its arrival with great
enthusiasm (I'd been alpha testing it a bit), and then copped all this smart arse
comments about "Kris's wondertools". Sometimes it's just a bit much I guess, one
gets a bit weary at times.

More recently someone here made it clear in a non-direct way that some posters
send a post to _help_ people, whereas at other times there's the "smartarse"
comments on questions, in a fashion like "look how incredibly clever I am" rather
than a simple gesture to help someone, and derive some satisfaction from that ?

I'll see where I put those results, if I can dig them up and send them to you.
I didn't mean you can't make a reference to them Jon :-)
I hope that clarifies a bit where I might seem weary.

Take care,
Kris





On Sat, 10 Apr 2004 21:16:17 +1000, Kris wrote:

>There's been a few occasions where things are taken the wrong way.
>That's inherent with being not verbose enough in monologue text.
>And being too verbose risks losing the focus of the reader, it's
>a dilemma sometimes.

I follow that.

>When Rowley were close to releasing CrossWorks, I announced its arrival with great
>enthusiasm (I'd been alpha testing it a bit), and then copped all this smart arse
>comments about "Kris's wondertools". Sometimes it's just a bit much I guess, one
>gets a bit weary at times.

Understood, I think.

>More recently someone here made it clear in a non-direct way that some posters
>send a post to _help_ people, whereas at other times there's the "smartarse"
>comments on questions, in a fashion like "look how incredibly clever I am" rather
>than a simple gesture to help someone, and derive some satisfaction from that ?

I enjoy helping, if I can, Kris.  I also enjoy taking on challenges.  I also
enjoy just learning, for learning's sake alone.  I don't mind being a gadfly of
sorts, where I think some verbal prodding may teach me something new about some
technical issue or knock a brick out of the foundation of some argument (which
almost always also teaches me something technical, in the end.)

Just for your information (and anyone else who may care, at all), I've not been
using MSP430 for anything but a toy project up to this point.  I read and enjoy
the posts here because I plan to use it soon enough more seriously and I enjoy
reading people who do bother to write something and have a decent mind when they
do so.  To be honest, I had figured on using the part for a new tool last
summer, but I've been doing other projects with higher priority and it has had
to wait.  In the interim, there is another project which is looming where I'll
probably select the MSP430 for it.  (So, perhaps soon, I keep saying to myself.)

I contribute where I can, given my very modest experience with the chip -- which
does not include detailed experience, but more general experience that I draw
from prior projects.  At this point, an "informed hobbyist, newly interested in
the MSP430" describes me better, I think.  I'll be more silent, than not; more
reading, than posting; but that's just the way it is for me, just now.

I also tend to be verbose enough.  If you notice my original question here about
this, you'll notice paragraphs of text designed to get across where I'm coming
from on this question.  Not only that, I've included a fair amount of code to
clarify where I'm coming from, as if the text itself wasn't enough.  There's
always more, of course.  But the upshot in my case, if there is still any
question at all in your mind about it, is that I'm curious and like to learn
from others.  That's it.

I don't expect answers from anyone, either.  I have no right to and don't feel
otherwise about it.  For example, I recently posted a question on writing
multi-module source files in assembly, which garnered not a single response to a
very serious, detailed question about the IAR assembler tool.  Not even from a
representative from IAR, which makes the product and sells it as part of their
toolset.  But I didn't press the point, either.  I accepted the lack of response
as being due to a variety of reasons I cannot well fathom and left it at that.
No harm, no foul.

I've no competing products -- no compilers, no operating systems, etc.  I've no
business to push here (I don't solicit work nor do I look for or respond to
posts put in here asking for contractors.)  I'm not looking for a single nickel
from being here -- I have plenty of work from customers I like a lot, without
that.

I hope that puts me in a box for you, Kris, in case you had any doubts about it.
When I'm looking into floating point performance, it's only because some
curiosity about the details has taken me there.  That's all.

>I'll see where I put those results, if I can dig them up and send them to you.

Thanks very much, Kris.  It will be accepted with gratitude.

>I didn't mean you can't make a reference to them Jon :-)

Okay.  I honestly didn't know, one way or another.

>I hope that clarifies a bit where I might seem weary.

It does, somewhat (by excluding one of my previous thoughts and thus reducing
the set of possibilities.)

Jon

Jon :
You have said it : I am also 
" informed hobbyist, newly interested in  the MSP430" 
It is nice to read posts which have substance in them.
Incidentally, I have read all the 8900 posts on this list.
Madhav Joshi







Jon :
You have said it : I am also
" informed hobbyist, newly interested in the MSP430"
It is nice to read posts which have substance in them.
Incidentally, I have read all the 7900 plus posts on this list.
I am still learning about MSP430 and yet to take a plunge.
Can the experts on this list give me some frank and free advice on 
how and where to start please ?

Madhav Joshi



Jon,

> >I did do those numbers, but I think I didn't want to post them in chance
> >of having one party or another yelling at me :-)
> 
> 
> I wish they'd post their own numbers.

That would mean we would have to agree on things such as input values,
precision, whether NaNs and INFs were supported, gradual underflow,
sticky bits, and what rounding mode to use, and whether numbers are
correctly rounded.  What's the point?  Even more, if we tested other
logarithmic and transcendental functions, then we'd have to agree on how
precise the answers were, to how many ulps, and so on.  And we'd need to
agree on whether the hardware multiplier is used or not, or any other
peculiarities.
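
To make the "how many ulps" point concrete, here is one common way to measure
the distance between a library's result and a reference result.  This is a
sketch only, and the function names are made up.  It assumes float is IEEE 754
single precision (32 bits), that neither argument is a NaN, and it counts +0
and -0 as one step apart.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an unsigned key that increases
   monotonically with the float's value (assumes IEEE 754 binary32). */
static uint32_t ordered_bits(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);
    /* Negative values: flip all bits.  Non-negative: set the top bit. */
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

/* Number of representable floats separating a and b. */
static uint32_t ulp_distance(float a, float b)
{
    uint32_t ua = ordered_bits(a);
    uint32_t ub = ordered_bits(b);
    return (ua > ub) ? (ua - ub) : (ub - ua);
}
```

A benchmark could then report, next to each cycle count, the worst
ulp_distance seen between the library under test and a higher-precision
reference rounded to float.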

> >I did numbers comparing :
> >- IAR
> >- CrossWorks
> >- HITECH - C
> >
> >If you need specific cases to look at, you could D/L a demo of eg.
> >CrossWorks
> >(unrestricted wrt floats and doubles), that also is the fastest library.
> 
> I probably could download several and test myself.  I was just hoping someone
> has already put the numbers together.  I also do NOT like downloading programs
> which install timers or other 'secret' mechanisms to protect their wares.  I do
> it, sometimes, but never comfortably.  I've had problems in the past in the case
> of certain companies.  It wasn't pleasant, at all.

Hey, I don't like the fact I have to activate Windows XP.  If this is
your attitude, then I presume you don't like XP and don't want to use
it?

>                     and     #0xFF, R12      ;   extract bits

Either use

and.b #-1, r12  ; 1 word, 1 cycle

or

mov.b r12, r12  ; 1 word, 1 cycle

-- Paul.