floating point calculations.

Started by knightslancer February 10, 2009
Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
> rickman wrote:
>
> > Any idea how they are performing the divide that it runs as fast as
> > the multiply?
>
> 1. There are fast hardware dividers which essentially perform
> several steps of the trivial division algorithm at once.
What are these "trivial" steps?
> 2. The division can be computed as the multiplication by 1/x, where 1/x
> is computed as the Taylor series. The whole series can be computed in
> parallel if the hardware allows for that.
The Taylor series for 1/x is 1/x. How does that help?
> 3. LUT and approximation.

Everett M. Greene wrote:
> Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
>
>>rickman wrote:
>>
>>>Any idea how they are performing the divide that it runs as fast as
>>>the multiply?
>>
>>1. There are fast hardware dividers which essentially perform
>>several steps of the trivial division algorithm at once.
>
> What are these "trivial" steps?
Elementary school algorithm. Each step consists of compare/subtract and shift.
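
A minimal C sketch of that loop, for illustration only (not code from the thread): one quotient bit per step, each step being a compare/subtract and a shift.

#include <stdint.h>

/* Restoring ("elementary school") division of 32-bit unsigned integers:
 * one quotient bit per step; each step is a compare/subtract and a shift.
 * Widths and names are illustrative. */
uint32_t restoring_div(uint32_t num, uint32_t den)
{
    uint64_t rem  = 0;
    uint32_t quot = 0;

    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((num >> i) & 1);  /* shift in next dividend bit */
        quot <<= 1;
        if (rem >= den) {                     /* compare ...                */
            rem -= den;                       /* ... and subtract           */
            quot |= 1;                        /* this quotient bit is 1     */
        }
    }
    return quot;                              /* remainder is left in rem   */
}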
>>2. The division can be computed as the multiplication by 1/x, where 1/x
>>is computed as the Taylor series. The whole series can be computed in
>>parallel if the hardware allows for that.
>
> The Taylor series for 1/x is 1/x.
:))))))))

    1
 ------- = 1 + x + x^2 + x^3 ....
  1 - x
> How does that help?
A lot. The whole series can be computed in parallel in one step.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
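
As an aside, one way to see how the whole series collapses into a few parallel operations is the factored form (1 + x)(1 + x^2)(1 + x^4)..., where each factor comes from repeated squaring. A small C sketch, purely illustrative and written sequentially; in hardware the squarings and factor multiplies form a short dependency chain rather than a long serial sum.

/* Sketch: 1/(1-x) ~= (1 + x)(1 + x^2)(1 + x^4)(1 + x^8),
 * which expands to 1 + x + x^2 + ... + x^15, so the truncation error
 * is on the order of x^16.  Requires |x| < 1; term count is illustrative. */
double recip_one_minus_x(double x)
{
    double result = 1.0 + x;
    double sq     = x * x;

    for (int i = 0; i < 3; i++) {   /* three more factors -> 16 series terms */
        result *= 1.0 + sq;
        sq     *= sq;               /* x^2 -> x^4 -> x^8 -> x^16 */
    }
    return result;                  /* approximates 1/(1 - x) */
}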
On Feb 13, 3:29 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com> wrote:
> Everett M. Greene wrote:
> > Vladimir Vassilevsky <antispam_bo...@hotmail.com> writes:
>
> >>rickman wrote:
>
> >>>Any idea how they are performing the divide that it runs as fast as
> >>>the multiply?
>
> >>1. There are fast hardware dividers which essentially perform
> >>several steps of the trivial division algorithm at once.
>
> > What are these "trivial" steps?
>
> Elementary school algorithm. Each step consists of compare/subtract and
> shift.
I'm a little fuzzy on this. How exactly do you perform multiple steps
in parallel if each step requires the result of the previous step to
evaluate the comparison? By the fourth step there are eight
conditions that have to be evaluated. For a 24 bit mantissa how many
bits are evaluated at once typically? How does this run as fast as a
multiply?

Maybe the topic changed, but the original discussion was about
software floating point. Grant Edwards said that his measurements of
a software library gave equal timing results for multiply and divide.
I was simply describing the algorithm used in hardware to explain why
division was much slower in that case. I don't see how any of these
methods would be any faster, and in fact I expect them to be slower
than Newton-Raphson iteration.
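
For comparison, a rough C sketch of the Newton-Raphson reciprocal iteration mentioned above, with the divisor pre-scaled into [0.5, 1) as a floating point divide's exponent handling would do; the seed and iteration count here are only illustrative, not code from the thread.

/* Newton-Raphson reciprocal: y <- y*(2 - d*y) roughly doubles the number
 * of correct bits per pass (two multiplies and a subtract each).
 * With the classic linear seed below (good to roughly 4 bits), three
 * passes give about single-precision accuracy.  a/b is then a * (1/b). */
double nr_recip(double d)           /* caller pre-scales d into [0.5, 1) */
{
    double y = 48.0 / 17.0 - (32.0 / 17.0) * d;   /* linear seed */

    for (int i = 0; i < 3; i++)
        y = y * (2.0 - d * y);      /* quadratic convergence */

    return y;
}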
> >>2. The division can be computed as the multiplication by 1/x, where 1/x
> >>is computed as the Taylor series. The whole series can be computed in
> >>parallel if the hardware allows for that.
>
> > The Taylor series for 1/x is 1/x.
>
> :))))))))
>
>      1
>  ------- = 1 + x + x^2 + x^3 ....
>   1 - x
>
> > How does that help?
>
> A lot. The whole series can be computed in parallel in one step.
Yes, this can, in theory, be computed in one step. When working with
32 bit floating point, just how many elements of this series does it
take to get a reasonable accuracy and how much logic does it take to
perform all those calculations?

Here is an interesting paper I found.

http://www.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-enenkel.pdf

Rick
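
A rough sizing, as a worked example (my own numbers, not from the thread): with the divisor mantissa d normalized to [1, 2) and written as d = 1 + f, the raw series 1 - f + f^2 - f^3 ... can need on the order of 24 terms for single precision, because f may be close to 1 and the error after n terms is roughly f^n. That is why practical designs seed the series from a small lookup table: if the table makes the residual x = 1 - d*LUT(d) satisfy |x| < 2^-8, then 1 + x + x^2 + x^3 is already good to about 32 bits, and the factored form (1 + x)(1 + x^2) gets there with just two multiplies.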
Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
> Everett M. Greene wrote:
> > Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
> >>rickman wrote:
> >>
> >>>Any idea how they are performing the divide that it runs as fast as
> >>>the multiply?
> >>
> >>1. There are fast hardware dividers which essentially perform
> >>several steps of the trivial division algorithm at once.
> >
> > What are these "trivial" steps?
>
> Elementary school algorithm. Each step consists of compare/subtract and
> shift.
Correct, but you said several can be computed at once. I took that to mean that several bits of the division could be done at once. Perhaps you meant that all the operations for one bit position can be performed at one time.
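
For what it's worth (my reading, not a quote from the thread): hardware dividers do retire more than one quotient bit per step by trying several multiples of the divisor at once. A radix-4 restoring sketch in C, two quotient bits per step, with illustrative widths:

#include <stdint.h>

/* Radix-4 restoring division: each step shifts in two dividend bits and
 * compares the partial remainder against 1x, 2x and 3x the divisor.
 * The three comparisons are independent, so hardware evaluates them in
 * parallel and a selector picks the largest multiple that fits, which
 * is how several steps of the one-bit algorithm merge into one. */
uint32_t radix4_div(uint32_t num, uint32_t den)
{
    uint64_t rem  = 0;
    uint32_t quot = 0;

    for (int i = 30; i >= 0; i -= 2) {
        rem = (rem << 2) | ((num >> i) & 3);   /* bring in two dividend bits */

        uint32_t q = 0;                        /* parallel compares in HW    */
        if      (rem >= 3ULL * den) q = 3;
        else if (rem >= 2ULL * den) q = 2;
        else if (rem >= den)        q = 1;

        rem  -= (uint64_t)q * den;             /* one subtract of q*den      */
        quot  = (quot << 2) | q;               /* two quotient bits per step */
    }
    return quot;
}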
> >>2. The division can be computed as the multiplication by 1/x, where 1/x
> >>is computed as the Taylor series. The whole series can be computed in
> >>parallel if the hardware allows for that.
> >
> > The Taylor series for 1/x is 1/x.
>
> :))))))))
>
>    1
> ----- = 1 + x + x^2 + x^3 ....
> 1 - x
Only if 0 <= x < 1, and to get 1/x (rather than 1/(1 - x)) you have to put some minus signs in the expression.
> > How does that help?
>
> A lot. The whole series can be computed in parallel in one step.
On Feb 13, 4:28 pm, rickman <gnu...@gmail.com> wrote:
> On Feb 13, 3:29 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
> wrote:
>
> > Everett M. Greene wrote:
> > > Vladimir Vassilevsky <antispam_bo...@hotmail.com> writes:
>
> > >>rickman wrote:
>
> > >>>Any idea how they are performing the divide that it runs as fast as
> > >>>the multiply?
>
> > >>1. There are fast hardware dividers which essentially perform
> > >>several steps of the trivial division algorithm at once.
>
> > > What are these "trivial" steps?
>
> > Elementary school algorithm. Each step consists of compare/subtract and
> > shift.
>
> I'm a little fuzzy on this. How exactly do you perform multiple steps
> in parallel if each step requires the result of the previous step to
> evaluate the comparison? By the fourth step there are eight
> conditions that have to be evaluated. For a 24 bit mantissa how many
> bits are evaluated at once typically? How does this run as fast as a
> multiply?
>
> Maybe the topic changed, but the original discussion was about
> software floating point. Grant Edwards said that his measurements of
> a software library gave equal timing results for multiply and divide.
> I was simply describing the algorithm used in hardware to explain why
> division was much slower in that case. I don't see how any of these
> methods would be any faster, and in fact I expect them to be slower
> than Newton-Raphson iteration.
>
> > >>2. The division can be computed as the multiplication by 1/x, where 1/x
> > >>is computed as the Taylor series. The whole series can be computed in
> > >>parallel if the hardware allows for that.
>
> > > The Taylor series for 1/x is 1/x.
>
> > :))))))))
>
> >    1
> > ----- = 1 + x + x^2 + x^3 ....
> > 1 - x
>
> > > How does that help?
>
> > A lot. The whole series can be computed in parallel in one step.
>
> Yes, this can, in theory, be computed in one step. When working with
> 32 bit floating point, just how many elements of this series does it
> take to get a reasonable accuracy and how much logic does it take to
> perform all those calculations?
>
> Here is an interesting paper I found.
>
> http://www.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-enenkel.pdf
>
> Rick
Actually, this discussion has become more than academic. I am
estimating a job that will be using floating point in an FPGA. I have
found three open source floating point packages. Over the next few
days/weeks I will be digging into this.

Rick
On Sat, 14 Feb 2009 12:33:13 -0800 (PST), rickman <gnuarm@gmail.com> wrote:

>Actually, this discussion has become more than academic. I am
>estimating a job that will be using floating point in an FPGA. I have
>found three open source floating point packages. Over the next few
>days/weeks I will be digging into this.
I haven't been following this side-thread and if I'm speaking out of
turn, I apologize. But I remember that Bipolar Integrated Technology
(Beaverton, Oregon) had developed a combinatorial floating point
divider. A designer I worked with on a project, in fact, used the
parts for a board he designed then, and that's how he described the
chip to me. I can't speak for support on the QNaNs and so on, though.

Jon