Division

I know most processors have an instruction for division and certainly Forth includes division words.  But how are they implemented?  I am asking because I need to do a division in hardware in an FPGA.  

Are there processors that don't include a division instruction?  Has anyone had to implement division using something other than a native CPU division instruction?

I worked on an attached array processor many years ago (a floating point coprocessor in a rack cabinet) and it did division using Newton-Raphson iteration I believe.  It converged quickly and took some 10 or 12 cycles to complete.  Considering the multiply needed to carry it out took more than one clock cycle that's not bad.  

Are there better techniques for division?  I guess there is long division, "shift and subtract" as compared to shift and add for multiplication.  Fortunately multipliers are still common. 

-- 

  Rick C.

  - Get 1,000 miles of free Supercharging
  - Tesla referral code - https://ts.la/richard11209

Reply by Richard Damon ●August 25, 20202020-08-25

On 8/25/20 1:10 PM, Rick C wrote:
> I know most processors have an instruction for division and certainly Forth includes division words.  But how are they implemented?  I am asking because I need to do a division in hardware in an FPGA.  
> 
> Are there processors that don't include a division instruction?  Has anyone had to implement division using something other than a native CPU division instruction?
> 
> I worked on an attached array processor many years ago (a floating point coprocessor in a rack cabinet) and it did division using Newton-Raphson iteration I believe.  It converged quickly and took some 10 or 12 cycles to complete.  Considering the multiply needed to carry it out took more than one clock cycle that's not bad.  
> 
> Are there better techniques for division?  I guess there is long division, "shift and subtract" as compared to shift and add for multiplication.  Fortunately multipliers are still common. 
> 

You might want to look at
https://en.wikipedia.org/wiki/Division_algorithm

Besides the simple long division you are describing, this also describes
a couple other techniques, like SRT that can do multiple bits at once,
and Newton-Raphson that works well for big numbers as each loop doubles
the number of bits you have (but each step takes 2 multiplies)

Reply by David Brown ●August 25, 20202020-08-25

On 25/08/2020 19:10, Rick C wrote:
> I know most processors have an instruction for division and certainly
> Forth includes division words.  But how are they implemented?  I am
> asking because I need to do a division in hardware in an FPGA.
> 
> Are there processors that don't include a division instruction?  Has
> anyone had to implement division using something other than a native
> CPU division instruction?
> 

There are /lots/ of processors that don't have division instructions.
It was pretty much standard practice for small microcontrollers to avoid
hardware division instructions until the Cortex M3 became the standard.
 (Cortex M0, used for the cheapest and lowest power ARM
microcontrollers, and Cortex M1 for soft cores in FGPAs, do not have
hardware division.)

Hardware division instructions are expensive in die space and relatively
rarely used in software - they are simply not worth the cost for many
microcontroller systems.

Also, hardware division is almost invariably multi-cycle, and the
division execution unit does not lend itself to pipelining, which does
not fit well in a RISC processor when you are trying to get single cycle
for a simple and predictable core.  Some RISC designs skip division
entirely, others have a "division step" instruction that is simple
enough to be acceptable in small, low-power hardware, but speeds up
software division routines.

Even on larger devices, division is a pain - requiring multiple cycles
and without pipelining, a division is often an order of magnitude slower
in throughput and latency than a multiplication, so compilers do their
best to avoid them (preferring to multiply by the reciprocal where
possible).

If my history is correct, the Motorola 68k family had hardware division
instructions until someone realised (on the 68020, I think) that pure
software division routines were /faster/ than the hardware division
instructions.

(Very small microcontrollers don't even have multiply instructions, but
these are getting rare now.)

> I worked on an attached array processor many years ago (a floating
> point coprocessor in a rack cabinet) and it did division using
> Newton-Raphson iteration I believe.  It converged quickly and took
> some 10 or 12 cycles to complete.  Considering the multiply needed to
> carry it out took more than one clock cycle that's not bad.
> 
> Are there better techniques for division?  I guess there is long
> division, "shift and subtract" as compared to shift and add for
> multiplication.  Fortunately multipliers are still common.
> 

"Better" depends on your requirements.  Do you need to minimise hardware
size, maximize throughput, minimise latency?  Do you need exact results?
 (Not every use-case does.)  Will the denominator be re-used for
multiple divisions, or even constant?  What are your sizes?

As Richard says, the Wikipedia article can give you some suggestions (in
addition to the ones you mentioned here).  But if you can take advantage
of additional information about the situation, you might be able to do
better than standard general-purpose algorithms.

Reply by gtwrek ●August 25, 20202020-08-25

In article <44d87377-6730-4e48-90ef-f3ac49d2b0e5o@googlegroups.com>,
Rick C  <gnuarm.deletethisbit@gmail.com> wrote:
>I know most processors have an instruction for division and certainly Forth includes division words.  But how are they implemented?  I am asking because I need to do a division in hardware
>in an FPGA.  
>
>Are there processors that don't include a division instruction?  Has anyone had to implement division using something other than a native CPU division instruction?
>
>I worked on an attached array processor many years ago (a floating point coprocessor in a rack cabinet) and it did division using Newton-Raphson iteration I believe.  It converged quickly
>and took some 10 or 12 cycles to complete.  Considering the multiply needed to carry it out took more than one clock cycle that's not bad.  
>
>Are there better techniques for division?  I guess there is long division, "shift and subtract" as compared to shift and add for multiplication.  Fortunately multipliers are still common. 
>

Rick, entirely dependent on your requirements.

For < ~100 MHz, up to about ~11 bit divisors, just brute force it with
long division.  Yes, in a single clock cycle.  If you need faster, then
pipeline it.

For > 11 bit divisors I'd start getting more exotic.  Thinking ROMs to
hold look up tables of inverse values.  Then on to even more exotic
Newton-Raphson (perhaps combined with the Look up ROMs).

I've only ever needed to just brute force it (with pipelining).

Of course if you're doing this in a smaller FPGA, then you may need to
consider bit-serial like sequential algorithms that take longer to
compute each division, at the expense of more complicated hardware and
performance.

Regard,

Mark

Reply by Rick C ●August 26, 20202020-08-26

On Tuesday, August 25, 2020 at 7:31:45 PM UTC-4, gtwrek wrote:
> In article <44d87377-6730-4e48-90ef-f3ac49d2b0e5o@googlegroups.com>,
> Rick C  <gnuarm.deletethisbit@gmail.com> wrote:
> >I know most processors have an instruction for division and certainly Forth includes division words.  But how are they implemented?  I am asking because I need to do a division in hardware
> >in an FPGA.  
> >
> >Are there processors that don't include a division instruction?  Has anyone had to implement division using something other than a native CPU division instruction?
> >
> >I worked on an attached array processor many years ago (a floating point coprocessor in a rack cabinet) and it did division using Newton-Raphson iteration I believe.  It converged quickly
> >and took some 10 or 12 cycles to complete.  Considering the multiply needed to carry it out took more than one clock cycle that's not bad.  
> >
> >Are there better techniques for division?  I guess there is long division, "shift and subtract" as compared to shift and add for multiplication.  Fortunately multipliers are still common. 
> >
> 
> Rick, entirely dependent on your requirements.
> 
> For < ~100 MHz, up to about ~11 bit divisors, just brute force it with
> long division.  Yes, in a single clock cycle.  If you need faster, then
> pipeline it.
> 
> For > 11 bit divisors I'd start getting more exotic.  Thinking ROMs to
> hold look up tables of inverse values.  Then on to even more exotic
> Newton-Raphson (perhaps combined with the Look up ROMs).
> 
> I've only ever needed to just brute force it (with pipelining).
> 
> Of course if you're doing this in a smaller FPGA, then you may need to
> consider bit-serial like sequential algorithms that take longer to
> compute each division, at the expense of more complicated hardware and
> performance.

I looked at a bit serial multiplier once and it didn't save much if anything over an 8 bit sequential (shift and add) multiplier.  The control logic is much more complex for bit serial.  

I plan to do this in a small device, but don't have a feel yet for what needs to go in it.  The part does have 16 bit hardware fast multipliers and the data rate is 100,000 times slower, so why not use the available resources. 

I guess I was really asking if there were better algorithms that work like Newton-Raphson but converge more quickly or something.  From reading about Newton-Raphson if it is seeded with a small table lookup, it will converge quickly, but since time is in much greater supply than memory, it won't really matter much to start with a fixed value for every computation.  In fact, it would be perfectly acceptable to calculate worse case numbers for iterations required and simply loop that many times regardless.  Once the error is zero the result stops changing.  

This is also simplified by the fact that this is a way to match the ADC input range to the sensor output which are based on separate power supplies.  It's not necessary to use an accurate reference on the ADC if that same ADC is used to measure the sensor power supply which provides the basis for the gain setting in the sensor output.  

So the divisor will be in a narrow range.  Maybe a table lookup for that narrow range is the best way.  With a 12 bit input even 200 entries will cover 10% (&plusmn;5%).  

Darn, I keep simplifying away the fun stuff. 

-- 

  Rick C.

  + Get 1,000 miles of free Supercharging
  + Tesla referral code - https://ts.la/richard11209

Reply by gtwrek ●August 26, 20202020-08-26

In article <5c019f7b-4641-4e5b-99d8-9630936f564eo@googlegroups.com>,
Rick C  <gnuarm.deletethisbit@gmail.com> wrote:
>So the divisor will be in a narrow range.  Maybe a table lookup for that narrow range is the best way.  With a 12 bit input even 200 entries will cover 10% (&plusmn;5%).  
>
>Darn, I keep simplifying away the fun stuff. 

I hear you.  Sometimes these "do elementary school maths" with the
technology at hand, under these sets of conditions, are very satisfying
small side projects.  Easy concepts, but sometimes tricky conditions to
work under.  This may frustrate some folks but I often find this sort of
tasks quite satisfying...

Regards,
Mark

Reply by John Speth ●August 26, 20202020-08-26

On 8/25/2020 10:10 AM, Rick C wrote:
> I know most processors have an instruction for division and certainly Forth includes division words.  But how are they implemented?  I am asking because I need to do a division in hardware in an FPGA.
> 
> Are there processors that don't include a division instruction?  Has anyone had to implement division using something other than a native CPU division instruction?
> 
> I worked on an attached array processor many years ago (a floating point coprocessor in a rack cabinet) and it did division using Newton-Raphson iteration I believe.  It converged quickly and took some 10 or 12 cycles to complete.  Considering the multiply needed to carry it out took more than one clock cycle that's not bad.
> 
> Are there better techniques for division?  I guess there is long division, "shift and subtract" as compared to shift and add for multiplication.  Fortunately multipliers are still common.

Recalling an FPGA project I did about 10 years ago, I was able to 
implement FPGA processing blocks that could be integrated into the 
embedded core and present itself as a custom instruction. It worked very 
well for me. Fewer than 10 clocks would produce a single precision 
floating point result. The requirement was to compute std dev in less 
than 500 us. Emulated floating point divide and sqrt were time consuming 
operations at the time.

Details: It was an Altera Cyclone III FPGA. I was using the Quartus tool 
set. I embedded a NIOS II core. Quartus was a huge tool featuring many 
different ways to get something done. The choice I made (custom 
instruction plus various Quartus supplied floating point math blocks) 
was not obvious but I eventually stumbled across my solution combo and 
used it successfully.

JJS

Division

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About EmbeddedRelated.com

Social Networks

The Related Media Group