floating point calculations.

Started by knightslancer February 10, 2009
Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
> rickman wrote:
>
> > Any idea how they are performing the divide that it runs as fast as
> > the multiply?
>
> 1. There are fast hardware dividers which essentially perform
> several steps of the trivial division algorithm at once.
What are these "trivial" steps?
> 2. The division can be computed as the multiplication by 1/x, where 1/x
> is computed as the Taylor series. The whole series can be computed in
> parallel if the hardware allows for that.
The Taylor series for 1/x is 1/x. How does that help?
> 3. LUT and approximation.

Everett M. Greene wrote:
> Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
>
>>rickman wrote:
>>
>>>Any idea how they are performing the divide that it runs as fast as
>>>the multiply?
>>
>>1. There are fast hardware dividers which essentially perform
>>several steps of the trivial division algorithm at once.
>
> What are these "trivial" steps?
Elementary school algorithm. Each step consists of compare/subtract and shift.
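
A minimal C sketch of that loop, for illustration only (not code from the thread): one quotient bit per step, each step being a compare/subtract and a shift.

#include <stdint.h>

/* Restoring ("elementary school") division of 32-bit unsigned integers:
 * one quotient bit per step; each step is a compare/subtract and a shift.
 * Widths and names are illustrative. */
uint32_t restoring_div(uint32_t num, uint32_t den)
{
    uint64_t rem  = 0;
    uint32_t quot = 0;

    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((num >> i) & 1);  /* shift in next dividend bit */
        quot <<= 1;
        if (rem >= den) {                     /* compare ...                */
            rem -= den;                       /* ... and subtract           */
            quot |= 1;                        /* this quotient bit is 1     */
        }
    }
    return quot;                              /* remainder is left in rem   */
}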
>>2. The division can be computed as the multiplication by 1/x, where 1/x
>>is computed as the Taylor series. The whole series can be computed in
>>parallel if the hardware allows for that.
>
> The Taylor series for 1/x is 1/x.
:))))))))

    1
 ------- = 1 + x + x^2 + x^3 ....
  1 - x
> How does that help?
A lot. The whole series can be computed in parallel in one step.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
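
As an aside, one way to see how the whole series collapses into a few parallel operations is the factored form (1 + x)(1 + x^2)(1 + x^4)..., where each factor comes from repeated squaring. A small C sketch, purely illustrative and written sequentially; in hardware the squarings and factor multiplies form a short dependency chain rather than a long serial sum.

/* Sketch: 1/(1-x) ~= (1 + x)(1 + x^2)(1 + x^4)(1 + x^8),
 * which expands to 1 + x + x^2 + ... + x^15, so the truncation error
 * is on the order of x^16.  Requires |x| < 1; term count is illustrative. */
double recip_one_minus_x(double x)
{
    double result = 1.0 + x;
    double sq     = x * x;

    for (int i = 0; i < 3; i++) {   /* three more factors -> 16 series terms */
        result *= 1.0 + sq;
        sq     *= sq;               /* x^2 -> x^4 -> x^8 -> x^16 */
    }
    return result;                  /* approximates 1/(1 - x) */
}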
On Feb 13, 3:29 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com> wrote:
> Everett M. Greene wrote:
> > Vladimir Vassilevsky <antispam_bo...@hotmail.com> writes:
>
> >>rickman wrote:
>
> >>>Any idea how they are performing the divide that it runs as fast as
> >>>the multiply?
>
> >>1. There are fast hardware dividers which essentially perform
> >>several steps of the trivial division algorithm at once.
>
> > What are these "trivial" steps?
>
> Elementary school algorithm. Each step consists of compare/subtract and
> shift.
I'm a little fuzzy on this. How exactly do you perform multiple steps
in parallel if each step requires the result of the previous step to
evaluate the comparison? By the fourth step there are eight
conditions that have to be evaluated. For a 24 bit mantissa how many
bits are evaluated at once typically? How does this run as fast as a
multiply?

Maybe the topic changed, but the original discussion was about
software floating point. Grant Edwards said that his measurements of
a software library gave equal timing results for multiply and divide.
I was simply describing the algorithm used in hardware to explain why
division was much slower in that case. I don't see how any of these
methods would be any faster, and in fact I expect them to be slower
than Newton-Raphson iteration.
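
For comparison, a rough C sketch of the Newton-Raphson reciprocal iteration mentioned above, with the divisor pre-scaled into [0.5, 1) as a floating point divide's exponent handling would do; the seed and iteration count here are only illustrative, not code from the thread.

/* Newton-Raphson reciprocal: y <- y*(2 - d*y) roughly doubles the number
 * of correct bits per pass (two multiplies and a subtract each).
 * With the classic linear seed below (good to roughly 4 bits), three
 * passes give about single-precision accuracy.  a/b is then a * (1/b). */
double nr_recip(double d)           /* caller pre-scales d into [0.5, 1) */
{
    double y = 48.0 / 17.0 - (32.0 / 17.0) * d;   /* linear seed */

    for (int i = 0; i < 3; i++)
        y = y * (2.0 - d * y);      /* quadratic convergence */

    return y;
}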
> >>2. The division can be computed as the multiplication by 1/x, where 1/x
> >>is computed as the Taylor series. The whole series can be computed in
> >>parallel if the hardware allows for that.
>
> > The Taylor series for 1/x is 1/x.
>
> :))))))))
>
>      1
>  ------- = 1 + x + x^2 + x^3 ....
>   1 - x
>
> > How does that help?
>
> A lot. The whole series can be computed in parallel in one step.
Yes, this can, in theory, be computed in one step. When working with
32 bit floating point, just how many elements of this series does it
take to get a reasonable accuracy and how much logic does it take to
perform all those calculations?

Here is an interesting paper I found.

http://www.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-enenkel.pdf

Rick
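
A rough sizing, as a worked example (my own numbers, not from the thread): with the divisor mantissa d normalized to [1, 2) and written as d = 1 + f, the raw series 1 - f + f^2 - f^3 ... can need on the order of 24 terms for single precision, because f may be close to 1 and the error after n terms is roughly f^n. That is why practical designs seed the series from a small lookup table: if the table makes the residual x = 1 - d*LUT(d) satisfy |x| < 2^-8, then 1 + x + x^2 + x^3 is already good to about 32 bits, and the factored form (1 + x)(1 + x^2) gets there with just two multiplies.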
Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
> Everett M. Greene wrote:
> > Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
> >>rickman wrote:
> >>
> >>>Any idea how they are performing the divide that it runs as fast as
> >>>the multiply?
> >>
> >>1. There are fast hardware dividers which essentially perform
> >>several steps of the trivial division algorithm at once.
> >
> > What are these "trivial" steps?
>
> Elementary school algorithm. Each step consists of compare/subtract and
> shift.
Correct, but you said several can be computed at once. I took that to mean that several bits of the division could be done at once. Perhaps you meant that all the operations for one bit position can be performed at one time.
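
For what it's worth (my reading, not a quote from the thread): hardware dividers do retire more than one quotient bit per step by trying several multiples of the divisor at once. A radix-4 restoring sketch in C, two quotient bits per step, with illustrative widths:

#include <stdint.h>

/* Radix-4 restoring division: each step shifts in two dividend bits and
 * compares the partial remainder against 1x, 2x and 3x the divisor.
 * The three comparisons are independent, so hardware evaluates them in
 * parallel and a selector picks the largest multiple that fits, which
 * is how several steps of the one-bit algorithm merge into one. */
uint32_t radix4_div(uint32_t num, uint32_t den)
{
    uint64_t rem  = 0;
    uint32_t quot = 0;

    for (int i = 30; i >= 0; i -= 2) {
        rem = (rem << 2) | ((num >> i) & 3);   /* bring in two dividend bits */

        uint32_t q = 0;                        /* parallel compares in HW    */
        if      (rem >= 3ULL * den) q = 3;
        else if (rem >= 2ULL * den) q = 2;
        else if (rem >= den)        q = 1;

        rem  -= (uint64_t)q * den;             /* one subtract of q*den      */
        quot  = (quot << 2) | q;               /* two quotient bits per step */
    }
    return quot;
}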
> >>2. The division can be computed as the multiplication by 1/x, where 1/x
> >>is computed as the Taylor series. The whole series can be computed in
> >>parallel if the hardware allows for that.
> >
> > The Taylor series for 1/x is 1/x.
>
> :))))))))
>
>    1
> ----- = 1 + x + x^2 + x^3 ....
> 1 - x
Only if 0 <= x < 1, and to get 1/x (rather than 1/(1 - x)) you have to put some minus signs in the expression.
> > How does that help?
>
> A lot. The whole series can be computed in parallel in one step.
On Feb 13, 4:28 pm, rickman <gnu...@gmail.com> wrote:
> On Feb 13, 3:29 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
> wrote:
>
> > Everett M. Greene wrote:
> > > Vladimir Vassilevsky <antispam_bo...@hotmail.com> writes:
>
> > >>rickman wrote:
>
> > >>>Any idea how they are performing the divide that it runs as fast as
> > >>>the multiply?
>
> > >>1. There are fast hardware dividers which essentially perform
> > >>several steps of the trivial division algorithm at once.
>
> > > What are these "trivial" steps?
>
> > Elementary school algorithm. Each step consists of compare/subtract and
> > shift.
>
> I'm a little fuzzy on this. How exactly do you perform multiple steps
> in parallel if each step requires the result of the previous step to
> evaluate the comparison? By the fourth step there are eight
> conditions that have to be evaluated. For a 24 bit mantissa how many
> bits are evaluated at once typically? How does this run as fast as a
> multiply?
>
> Maybe the topic changed, but the original discussion was about
> software floating point. Grant Edwards said that his measurements of
> a software library gave equal timing results for multiply and divide.
> I was simply describing the algorithm used in hardware to explain why
> division was much slower in that case. I don't see how any of these
> methods would be any faster, and in fact I expect them to be slower
> than Newton-Raphson iteration.
>
> > >>2. The division can be computed as the multiplication by 1/x, where 1/x
> > >>is computed as the Taylor series. The whole series can be computed in
> > >>parallel if the hardware allows for that.
>
> > > The Taylor series for 1/x is 1/x.
>
> > :))))))))
>
> >    1
> > ----- = 1 + x + x^2 + x^3 ....
> > 1 - x
>
> > > How does that help?
>
> > A lot. The whole series can be computed in parallel in one step.
>
> Yes, this can, in theory, be computed in one step. When working with
> 32 bit floating point, just how many elements of this series does it
> take to get a reasonable accuracy and how much logic does it take to
> perform all those calculations?
>
> Here is an interesting paper I found.
>
> http://www.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-enenkel.pdf
>
> Rick
Actually, this discussion has become more than academic. I am
estimating a job that will be using floating point in an FPGA. I have
found three open source floating point packages. Over the next few
days/weeks I will be digging into this.

Rick
On Sat, 14 Feb 2009 12:33:13 -0800 (PST), rickman <gnuarm@gmail.com> wrote:

>Actually, this discussion has become more than academic. I am
>estimating a job that will be using floating point in an FPGA. I have
>found three open source floating point packages. Over the next few
>days/weeks I will be digging into this.
I haven't been following this side-thread and if I'm speaking out of
turn, I apologize. But I remember that Bipolar Integrated Technology
(Beaverton, Oregon) had developed a combinatorial floating point
divider. A designer I worked with on a project, in fact, used the
parts for a board he designed then, and that's how he described the
chip to me. I can't speak for support on the QNaNs and so on, though.

Jon