EmbeddedRelated.com

small CPUs

Started by Jan Gray October 22, 2004
> The XSOC RISC is also not very small.

That stings! Oh well -- de gustibus non est disputandum -- so what's new in
small processor cores? Anyone care to fill in the table below with more
recent entries?

Small (IMHO):
PicoBlaze: 76-96 slices (approx. double to get LUTs)
gr0000: (simple Virtex-optimized 16-bit RISC): < 200 LUTs + 1 BRAM
gr1000: (unpublished Virtex-optimized pipelined 16-bit RISC): < 200 LUTs + 1 BRAM
xr16: 260 logic cells (258 4-LUTs, 52 3-LUTs, 165 flip-flops, 112 TBUFs)

Middlin':
Nios-II/e: ~550 LEs (Cyclone)

Not very small:
MicroBlaze: ~900 LUTs
Nios (16-bit): ~1100 LEs
Nios-II/f: 1800 LEs (Cyclone)

(Disclaimer: all data may be obsolete/wrong.)

You can of course implement a 16- or 32-bit datapath on an 8-bit datapath
(taking 2 or 4 cycles per operation); but that will not significantly
improve the "performance divided by area" number that is my preferred figure
of merit.
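
As a rough illustration of why serializing a wide operation over a narrow datapath need not help performance per area, here is a toy model. The area split (a fixed control cost plus a per-bit datapath cost), the LUT constants, and the function name are all assumptions made up for this sketch, not measurements of any core in the table.

```python
def perf_per_area(datapath_bits, op_bits, control_luts=80, luts_per_bit=8,
                  clock_mhz=100.0):
    """Toy figure of merit: op_bits-wide operations per microsecond per LUT.

    All constants are hypothetical; real cores will differ.
    """
    # e.g. 2 cycles for a 16-bit operation on an 8-bit datapath
    cycles_per_op = op_bits // datapath_bits
    # assumed area model: fixed control logic plus a per-bit datapath cost
    area_luts = control_luts + luts_per_bit * datapath_bits
    ops_per_us = clock_mhz / cycles_per_op
    return ops_per_us / area_luts


native = perf_per_area(datapath_bits=16, op_bits=16)
serial = perf_per_area(datapath_bits=8, op_bits=16)
print(f"16-bit datapath: {native:.3f} ops/us/LUT")
print(f" 8-bit datapath: {serial:.3f} ops/us/LUT")
```

In this model the narrow datapath halves throughput but saves less than half the area, because the control logic does not shrink. With zero control overhead the two come out equal, so the ratio never improves.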

Cheers,
Jan Gray



At 12:30 AM 10/22/2004, you wrote:

> > The XSOC RISC is also not very small.
>
>That stings! Oh well -- de gustibus non est disputandum -- so what's new in
>small processor cores? Anyone care to fill in the table below with more
>recent entries?

I must apologize. My memory was incorrect. I did a brief survey a while
back and did not remember any CPU cores that were less than about 1000 LUTs
for CPUs that were completely functional. I think I did not include yours
in my mental list of small CPUs because, as opposed to GPL, the license
does not allow commercial use, which is what I was looking for.

Now that I have looked at it a bit harder, I see that the big LUT consumer,
the wide mux, is implemented in TBUFs. That won't fly in the newer Xilinx
parts or the Altera parts. Built from LUTs instead, the mux would use about
4 LUTs per bit (7 inputs), or 64 more LUTs. That is still a very small CPU
at about 375 LUTs.
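
The estimate above is simple arithmetic and can be checked as follows. This sketch assumes the existing LUT count is the sum of the 4-LUTs and 3-LUTs in the xr16 entry of the table, and takes the 4-LUTs-per-bit mux cost from the paragraph above. The function name is made up for illustration.

```python
def luts_without_tbufs(base_luts, datapath_bits, mux_luts_per_bit=4):
    """Estimate total LUTs when the TBUF-based wide mux is rebuilt from LUTs."""
    mux_luts = mux_luts_per_bit * datapath_bits
    return base_luts + mux_luts


# 4-LUTs + 3-LUTs from the xr16 table entry (assumed to be the base count)
xr16_base = 258 + 52
xr16_total = luts_without_tbufs(xr16_base, datapath_bits=16)
print(xr16_total)  # 374, i.e. "about 375"
```

Calling the same function with a 32-bit datapath adds 128 LUTs for the mux instead of 64.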

I have not looked hard at the others. But I did look at your notes with
the GR0040. You estimate a 32-bit GR0050 at 330 LUTs; even after adding
128 more for the wide mux, that is only about 460 LUTs, still very good for
a 32-bit CPU. How do you expect to extend the immediate operands to a full
32 bits? With multiple imm instructions?

>Small (IMHO):
>PicoBlaze: 76-96 slices (approx. double to get LUTs)
>gr0000: (simple Virtex-optimized 16-bit RISC): < 200 LUTs + 1 BRAM
>gr1000: (unpublished Virtex-optimized pipelined 16-bit RISC): < 200 LUTs + 1 BRAM
>xr16: 260 logic cells (258 4-LUTs, 52 3-LUTs, 165 flip-flops, 112 TBUFs)
>
>Middlin':
>Nios-II/e: ~550 LEs (Cyclone)

32 bits though, right?

>Not very small:
>MicroBlaze: ~900 LUTs
>Nios (16-bit): ~1100 LEs
>Nios-II/f: 1800 LEs (Cyclone)

I believe the MicroBlaze and Nios-II/f are 32 bits too.

>(Disclaimer: all data may be obsolete/wrong.)
>
>You can of course implement a 16- or 32-bit datapath on an 8-bit datapath
>(taking 2 or 4 cycles per operation); but that will not significantly
>improve the "performance divided by area" number that is my preferred figure
>of merit.

I would think a metric should also take into account the efficiency of the
instruction set. Using 16 bit instructions can use more program memory
than 8 bit instructions. But I guess that would be very hard to measure
other than using benchmarks.

Likewise, the instruction set affects processor speed in ways other than
just clock speed. But that takes us into the nebulous world of
benchmarking as well.
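
To make the two points above concrete: program size is instruction count times instruction width, and run time is instruction count times cycles per instruction divided by clock rate. The sketch below compares two invented CPUs. Every number in it is hypothetical and only shows why clock speed or instruction width alone does not settle the comparison.

```python
def program_bytes(instr_count, instr_bits):
    """Program memory needed, in bytes."""
    return instr_count * instr_bits // 8


def run_time_us(instr_count, cycles_per_instr, clock_mhz):
    """Execution time in microseconds: instructions * CPI / clock."""
    return instr_count * cycles_per_instr / clock_mhz


# Hypothetical CPU A: 8-bit instructions, fast clock, needs more instructions.
a_bytes = program_bytes(instr_count=1500, instr_bits=8)
a_time = run_time_us(instr_count=1500, cycles_per_instr=2, clock_mhz=100)

# Hypothetical CPU B: 16-bit instructions, slower clock, fewer instructions.
b_bytes = program_bytes(instr_count=1000, instr_bits=16)
b_time = run_time_us(instr_count=1000, cycles_per_instr=1.5, clock_mhz=60)

print("A:", a_bytes, "bytes,", a_time, "us")
print("B:", b_bytes, "bytes,", b_time, "us")
```

With these made-up numbers, B needs more program memory but finishes sooner despite its slower clock. Real numbers would have to come from benchmarks.
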
Rick Collins
Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX


Arius - Rick Collins wrote:

> I must apologize. My memory was incorrect. I did a brief survey a while
> back and did not remember any CPU cores that were less than about 1000 LUTs
> for CPUs that were completely functional. I think I did not include yours
> in my mental list of small CPUs because, as opposed to GPL, the license
> does not allow commercial use, which is what I was looking for.

But can one use LUTs as a reasonable benchmark, since not all LUTs
are equal? Also, features like fast carry and dual-port RAM with Xilinx
tend to cloud just what you can do.

>>You can of course implement a 16- or 32-bit datapath on an 8-bit datapath
>>(taking 2 or 4 cycles per operation); but that will not significantly
>>improve the "performance divided by area" number that is my preferred figure
>>of merit.
Good benchmark, but the 8-bit toy computer that fits in a 32-cell CPLD
cannot be easily beaten. http://www.tu-harburg.de/~setb0209/cpu/mcpu.html

> I would think a metric should also take into account the efficiency of the
> instruction set. Using 16 bit instructions can use more program memory
> than 8 bit instructions. But I guess that would be very hard to measure
> other than using benchmarks.
>
> Likewise, the instruction set affects processor speed in ways other than
> just clock speed. But that takes us into the nebulous world of
> benchmarking as well.

I think the real factor is the random logic needed. FPGAs have problems
with this no matter what brand of FPGA you use. Many years ago, BYTE
magazine had a benchmark of sorts that took into account the instruction
set of the computer rather than raw speed. This, I think, is what you are
looking for.



At 02:32 AM 10/22/2004, you wrote:

> But can one use LUTs as a reasonable benchmark, since not all LUTs
>are equal? Also, features like fast carry and dual-port RAM with Xilinx
>tend to cloud just what you can do.

No one is trying to compare different FPGAs, just the CPU designs. For a
given FPGA, LUTs are a useful measure of the size of a design. In fact,
they are the *only* measure since that is typically the limiting resource
in the chip.
Rick Collins