floating point calculations.

Thread started February 10, 2009
```
Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
> rickman wrote:
>
> > Any idea how they are performing the divide that it runs as fast as
> > the multiply?
>
> 1. There are the fast hardware dividers which essentially perform
> several steps of the trivial division algorithm at once.

What are these "trivial" steps?

> 2. The division can be computed as the multiplication by 1/x, where 1/x
> is computed as the Taylor series. The whole series can be computed in
> parallel if the hardware allows for that.

The Taylor series for 1/x is 1/x.  How does that help?

> 3. LUT and approximation.
```
```
Everett M. Greene wrote:
>
>>rickman wrote:
>>
>>>Any idea how they are performing the divide that it runs as fast as
>>>the multiply?
>>
>>1. There are the fast hardware dividers which essentially perform
>>several steps of the trivial division algorithm at once.
>
> What are these "trivial" steps?

Elementary school algorithm. Each step consists of compare/subtract and
shift.

>>2. The division can be computed as the multiplication by 1/x, where 1/x
>>is computed as the Taylor series. The whole series can be computed in
>>parallel if the hardware allows for that.
>
>
> The Taylor series for 1/x is 1/x.

:))))))))

   1
------ = 1 + x + x^2 + x^3 ....
 1 - x

>  How does that help?

A lot. The whole series can be computed in parallel in one step.

DSP and Mixed Signal Design Consultant

http://www.abvolt.com
```
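The "elementary school" step Vassilevsky describes — compare, subtract, shift — is the restoring division algorithm. A minimal software model in C, purely illustrative (fast hardware dividers unroll several of these steps per clock cycle rather than looping):

```c
#include <stdint.h>

/* Restoring division: one quotient bit per iteration.  Shift the
 * remainder left, bring in the next dividend bit, then compare against
 * the divisor and subtract if it fits.  divisor must be nonzero. */
uint32_t divide_restoring(uint32_t dividend, uint32_t divisor)
{
    uint64_t remainder = 0;   /* wide so the left shift cannot overflow */
    uint32_t quotient  = 0;

    for (int i = 31; i >= 0; i--) {
        remainder = (remainder << 1) | ((dividend >> i) & 1); /* shift   */
        if (remainder >= divisor) {                           /* compare */
            remainder -= divisor;                             /* subtract */
            quotient  |= 1u << i;
        }
    }
    return quotient;
}
```

Each iteration depends on the previous remainder, which is why a naive hardware implementation needs one clock per quotient bit, and why high-radix dividers speculate several candidate subtractions per step instead.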
```
On Feb 13, 3:29 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> Everett M. Greene wrote:
> > Vladimir Vassilevsky <antispam_bo...@hotmail.com> writes:
>
> >>rickman wrote:
>
> >>>Any idea how they are performing the divide that it runs as fast as
> >>>the multiply?
>
> >>1. There are the fast hardware dividers which essentially perform
> >>several steps of the trivial division algorithm at once.
>
> > What are these "trivial" steps?
>
> Elementary school algorithm. Each step consists of compare/subtract and
> shift.

I'm a little fuzzy on this.  How exactly do you perform multiple steps
in parallel if each step requires the result of the previous step to
evaluate the comparison?  By the fourth step there are eight
conditions that have to be evaluated.  For a 24 bit mantissa how many
bits are evaluated at once typically?  How does this run as fast as a
multiply?

Maybe the topic changed, but the original discussion was about
software floating point.  Grant Edwards said that his measurements of
a software library gave equal timing results for multiply and divide.
I was simply describing the algorithm used in hardware to explain why
division was much slower in that case.  I don't see how any of these
methods would be any faster, and in fact I expect them to be slower
than Newton-Raphson iteration.

> >>2. The division can be computed as the multiplication by 1/x, where 1/x
> >>is computed as the Taylor series. The whole series can be computed in
> >>parallel if the hardware allows for that.
>
> > The Taylor series for 1/x is 1/x.
>
> :))))))))
>
>      1
> ------ = 1 + x + x^2 + x^3 ....
>   1 - x
>
> >  How does that help?
>
> A lot. The whole series can be computed in parallel in one step.

Yes, this can, in theory, be computed in one step.  When working with
32 bit floating point, just how many elements of this series does it
take to get a reasonable accuracy and how much logic does it take to
perform all those calculations?

Here is an interesting paper I found.

http://www.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-enenkel.pdf

Rick
```
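The Newton-Raphson iteration rickman refers to computes a reciprocal y ≈ 1/d by repeating y = y·(2 − d·y), roughly doubling the number of correct bits each pass. A small sketch, assuming the divisor has been normalized into [1, 2) as with a floating-point mantissa; the fixed seed is a deliberate simplification of the small lookup table real implementations use:

```c
#include <math.h>

/* Newton-Raphson reciprocal: y_{n+1} = y_n * (2 - d * y_n).
 * Quadratic convergence means a 24-bit mantissa needs only a few
 * multiply-subtract steps once the seed is a few bits accurate. */
double recip_newton(double d)   /* assumes 1.0 <= d < 2.0 */
{
    double y = 2.0 / 3.0;       /* crude seed; worst-case error 1/3 */
    for (int i = 0; i < 5; i++)
        y = y * (2.0 - d * y);  /* error squares on every pass */
    return y;
}
```

With the worst-case seed error of 1/3, five iterations drive the relative error to about (1/3)^32 ≈ 5e-16, i.e. full double precision — which is why this family of methods beats one-bit-per-step division when fast multipliers are available.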
```
Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
> Everett M. Greene wrote:
> > Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
> >>rickman wrote:
> >>
> >>>Any idea how they are performing the divide that it runs as fast as
> >>>the multiply?
> >>
> >>1. There are the fast hardware dividers which essentially perform
> >>several steps of the trivial division algorithm at once.
> >
> > What are these "trivial" steps?
>
> Elementary school algorithm. Each step consists of compare/subtract and
> shift.

Correct, but you said several can be computed at once.
I took that to mean that several bits of the division
could be done at once.  Perhaps you meant that all the
operations for one bit position can be performed at one
time.

> >>2. The division can be computed as the multiplication by 1/x, where 1/x
> >>is computed as the Taylor series. The whole series can be computed in
> >>parallel if the hardware allows for that.
> >
> > The Taylor series for 1/x is 1/x.
>
> :))))))))
>
>     1
>   ----- = 1 + x + x^2 + x^3 ....
>   1 - x

Only if -1 < x < 1, and you put some minus signs in the expression.

> >  How does that help?
>
> A lot. The whole series can be computed in parallel in one step.
```
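To put numbers on rickman's accuracy question and Greene's convergence caveat: the geometric series converges only for |x| < 1, and the error after n terms is about |x|^n / (1 − x), so the term count depends sharply on how close x gets to 1. A quick illustrative check, not hardware code:

```c
#include <math.h>

/* Truncated geometric series for 1/(1-x), valid only for |x| < 1.
 * For |x| <= 0.5 roughly 24 terms give 24-bit accuracy (0.5^24);
 * near |x| = 1 the series is useless, which is why hardware pairs
 * it with a LUT or Newton-Raphson step to shrink x first. */
double recip_series(double x, int terms)
{
    double sum = 0.0, power = 1.0;   /* power holds x^n */
    for (int n = 0; n < terms; n++) {
        sum   += power;
        power *= x;
    }
    return sum;
}
```

Note the terms x^n can all be formed independently, which is the sense in which the whole series can be evaluated in parallel given enough multipliers; the cost is that each extra bit of range on x multiplies the number of terms required.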
```
On Feb 13, 4:28 pm, rickman <gnu...@gmail.com> wrote:
> On Feb 13, 3:29 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
> wrote:
>
> > Everett M. Greene wrote:
> > > Vladimir Vassilevsky <antispam_bo...@hotmail.com> writes:
>
> > >>rickman wrote:
>
> > >>>Any idea how they are performing the divide that it runs as fast as
> > >>>the multiply?
>
> > >>1. There are the fast hardware dividers which essentially perform
> > >>several steps of the trivial division algorithm at once.
>
> > > What are these "trivial" steps?
>
> > Elementary school algorithm. Each step consists of compare/subtract and
> > shift.
>
> I'm a little fuzzy on this.  How exactly do you perform multiple steps
> in parallel if each step requires the result of the previous step to
> evaluate the comparison?  By the fourth step there are eight
> conditions that have to be evaluated.  For a 24 bit mantissa how many
> bits are evaluated at once typically?  How does this run as fast as a
> multiply?
>
> Maybe the topic changed, but the original discussion was about
> software floating point.  Grant Edwards said that his measurements of
> a software library gave equal timing results for multiply and divide.
> I was simply describing the algorithm used in hardware to explain why
> division was much slower in that case.  I don't see how any of these
> methods would be any faster, and in fact I expect them to be slower
> than Newton-Raphson iteration.
>
> > >>2. The division can be computed as the multiplication by 1/x, where 1/x
> > >>is computed as the Taylor series. The whole series can be computed in
> > >>parallel if the hardware allows for that.
>
> > > The Taylor series for 1/x is 1/x.
>
> > :))))))))
>
> >      1
> > ------ = 1 + x + x^2 + x^3 ....
> >   1 - x
>
> > >  How does that help?
>
> > A lot. The whole series can be computed in parallel in one step.
>
> Yes, this can, in theory, be computed in one step.  When working with
> 32 bit floating point, just how many elements of this series does it
> take to get a reasonable accuracy and how much logic does it take to
> perform all those calculations?
>
> Here is an interesting paper I found.
>
> http://www.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-enenkel.pdf
>
> Rick

Actually, this discussion has become more than academic.  I am
estimating a job that will be using floating point in an FPGA.  I have
found three open source floating point packages.  Over the next few
days/weeks I will be digging into this.

Rick
```
```
On Sat, 14 Feb 2009 12:33:13 -0800 (PST), rickman <gnuarm@gmail.com>
wrote:

>Actually, this discussion has become more than academic.  I am
>estimating a job that will be using floating point in an FPGA.  I have
>found three open source floating point packages.  Over the next few
>days/weeks I will be digging into this.

I haven't been following this side-thread and if I'm speaking out of
turn, I apologize.  But I remember that Bipolar Integrated Technology
(Beaverton, Oregon) had developed a combinatorial floating point
divider.  A designer I worked with on a project, in fact, used the
parts for a board he designed then and that's how he described the
chip to me.  I can't speak for support on the QNaNs and so on, though.

Jon
```