This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).
|
I just wanted to announce a new minimal CPU that I designed some time ago.

It is special in that it fits into a 32-macrocell CPLD. Even though the instruction set is severely limited, I think that the code density and CPU performance are on par with other minimal 'educational' CPUs, which are sometimes quite a bit larger in terms of gate/register usage.

Features:
- 8-bit data.
- 6-bit address bus allowing 64 bytes of shared program/data memory.
- Programming model: accumulator based. Registers are Accu[7:0], PC[5:0] and a carry flag.
- 4 instructions: NOR, ADD, STA, JCC. Instructions execute in 1-2 clock cycles.

The CPU code, along with further documentation, a simulator and examples, can be found here: http://www.tu-harburg.de/~setb0209/cpu/

So, can anybody come up with a smaller CPU? Comments are appreciated. |
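To make the programming model above concrete, here is a minimal Python sketch of an accumulator machine with 8-bit data, a 6-bit address space and the four instructions NOR, ADD, STA and JCC. Several details are assumptions that the announcement does not state: the instruction layout (two opcode bits followed by six address bits), the opcode numbering, the reading of JCC as "jump if carry clear, then clear the carry", and the convention that a JCC to its own address acts as a halt. The names `encode` and `run` are made up for this illustration.

```python
NOR, ADD, STA, JCC = 0, 1, 2, 3   # assumed opcode numbering


def encode(opcode, addr):
    """Pack an instruction as oo_aaaaaa (assumed layout)."""
    return (opcode << 6) | (addr & 0x3F)


def run(program, max_steps=1000):
    """Execute until a JCC-to-self with carry clear; return (accu, carry, mem)."""
    mem = list(program) + [0] * (64 - len(program))
    accu = carry = pc = 0
    for _ in range(max_steps):
        opcode, addr = mem[pc] >> 6, mem[pc] & 0x3F
        if opcode == JCC and addr == pc and carry == 0:
            break                          # endless loop, treated as halt
        pc = (pc + 1) & 0x3F
        if opcode == NOR:
            accu = ~(accu | mem[addr]) & 0xFF
        elif opcode == ADD:
            total = accu + mem[addr]
            accu, carry = total & 0xFF, total >> 8
        elif opcode == STA:
            mem[addr] = accu
        else:                              # JCC
            if carry == 0:
                pc = addr
            carry = 0                      # assumption: JCC clears the carry
    return accu, carry, mem


# Data lives in the same 64-byte memory as the code.
ALLONE, A, B, RESULT = 60, 61, 62, 63
program = [
    encode(NOR, ALLONE),   # accu NOR 0xFF clears the accumulator
    encode(ADD, A),        # accu = mem[A]
    encode(ADD, B),        # accu = mem[A] + mem[B]
    encode(STA, RESULT),   # store the sum
    encode(JCC, 4),        # jump to self: halt
] + [0] * 55 + [0xFF, 20, 22, 0]

accu, carry, mem = run(program)
print(mem[RESULT])
```

The example program loads a value without a dedicated load instruction by first clearing the accumulator with NOR against an all-ones constant and then adding the operand, which is the kind of idiom a four-instruction set depends on.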
|
|
|
Tim Boescke wrote:
>
> I just wanted to announce a new minimal CPU designed
> by me some time ago.
<cut>
> -8 bit data.
> -6 bit address bus allowing 64 bytes of shared program/data memory.
> -Programming model: Accumulator based. Registers are Accu[7:0], PC[5:0] and
> a carry flag.
> -4 Instructions: NOR, ADD, STA, JCC. Instructions execute in 1-2 clock
> cycles.
<snip>
> So, does anybody come up with a smaller CPU ?

Years ago Byte magazine had a similar design in TTL. I think the instructions were:
0) Load input
1) Jump minus
2) Add
3) Store Complement

Ben Franchuk. |
|
|
|
For anybody interested in the subject, we have another project called the "FiJaaC processor". You can check it out at www.geocities.com/jaime_aranguren/FiJaaC or www.geocities.com/faidoll. It's a basic 8-bit CPU on a Xilinx SpartanXL XCS05XL-PC84.

Cheers,

> -----Original message-----
> From: [mailto:] On behalf of Ben Franchuk
> Sent: Saturday, October 20, 2001 08:39 p.m.
> To:
> Subject: Re: [fpga-cpu] New minimal CPU design
>
> Years ago Byte magazine had a similar design in TTL. I think
> the instructions were
> 0) Load input 1) Jump minus 2) Add 3) Store Complement
> Ben Franchuk. |
|
|
|
> > I just wanted to announce a new minimal CPU designed
> > by me some time ago.
>
> Years ago Byte magazine had a similar design in TTL. I think
> the instructions were
> 0) Load input 1) Jump minus 2) Add 3) Store Complement

That's interesting. Do you know the exact issue?

But actually I was rather referring to the design being minimal with respect to resource usage. 31 CPLD macrocells are maybe comparable to ~16 Spartan CLBs, which is less than 20% of a Spartan XCS05. (But you can't really compare CPLDs and FPGAs anyway; the design is optimized for CPLD usage and would be suboptimal for an FPGA.)

Minimal programming models are well known (subtract-and-branch-if-negative, move machine, etc.).

I'd be interested to know about minimal FPGA CPU designs. Since the ALU can be minimized easily due to carry chains, the main problem is probably optimizing the control. The datapath of the above design should fit into 4+1+3+3=11 CLBs (ALU, Carry, PC, Adreg). However, first attempts to synthesize the design for a Spartan I revealed a CLB count of 24, so 13 CLBs are spent on control and unnecessary constructs. I have to look a bit deeper into it. |
|
> Anybody interested on the subject, we had another project called "FiJaaC
> processor". You can check it out at www.geocities.com/jaime_aranguren/FiJaaC
> or www.geocities.com/faidoll It's a basic 8 bit CPU on a Xilinx SpartanXL
> XCS05XL-PC84.

I couldn't find any code for the CPU on the web site.

Leon
--
Leon Heller, G1HSM
Tel: +44 1327 359058
Email:
My web page: http://www.geocities.com/leon_heller
My low-cost Altera Flex design kit: http://www.leonheller.com |
|
|
|
Tim Boescke wrote:
> That's interesting. Do you know the exact issue ?

No ... I lost all my old computer magazines a few years back due to water damage in storage.

> But well, actually I was rather referring to the design
> being minimal in respect to resource usage. 31 CPLD macrocells
> are maybe comparable to ~16 Spartan CLBs, which is even less than
> 20% of a Spartan XCS05. (But you can't compare CPLDs and FPGAs
> anyways.. the design is optimized for CPLD usage and would
> be suboptimal for an FPGA).

It is hard to say what a good small model is. A shift right and a jump-on-zero would be nice instructions if you had the room.

> Minimal programming models are well known.(subtract-and-branch-
> if-negative, move machine etc..)
>
> I'd be interested to know about minimal FPGA cpu designs.
> Since the ALU can be minimized easily due to carry chains,
> the main problem is probably to optimize the control.

Historically 12 bits have been the minimum for a real CPU design. I suspect a clean design

- ADD   ac = ac + n, toggle carry
- NOR   ac = ac nor n
- SHR   ac = rcr ac, shift carry
- DCA   n = ac; ac = 0
- JC    if (cy) pc = n
- JZ    if (z) pc = n
- JMP   pc = n
- JSR   ac = pc, pc = n

would be about three times the size of a small 8-bit processor.

> The datapath of the above design should fit into 4+1+3+3=11
> CLBs (Alu,Carry,Pc,Adreg). However first attempts to synthesize
> the design for a Spartan I revealed a CLB count of 24, so that
> 13 CLBs are wasted on control and unnecessary constructs. I have to
> look a bit deeper into it.

It is nice to see somebody working with the small FPGAs rather than the MegAGate-super-deluxe-do-all chips.

Ben Franchuk.
--
Standard Disclaimer : 97% speculation 2% bad grammar 1% facts.
"Pre-historic Cpu's" http://www.jetnet.ab.ca/users/bfranchuk
Now with schematics. |
|
OK, we haven't finished it yet; we are still testing the control unit (the last part). By the way, if you can point us to how to get VHDL code from Xilinx's CORE Generator dual-ported RAM models, we can give you the complete VHDL code for it. For size reasons, we had to use such an implementation for the register bank, because our VHDL implementation turned out to be too big to fit on the XCS05XL-PC84. Along the same lines, we also implemented some multiplexers as LogiBlox modules, because they can be optimized for area. The control unit is completely VHDL.

Anyway, as soon as we finish our project (that should be in about one or two weeks), we will post the whole project sources on the pages. We use Xilinx Foundation 3.1i for development, although some other tools such as Renoir99 and ModelSim have also been used.

Cheers,
Jaime.

> -----Original message-----
> From: Leon Heller [mailto:]
> Sent: Sunday, October 21, 2001 01:07 p.m.
> To:
> Subject: RE: [fpga-cpu] New minimal CPU design
>
> I couldn't find any code for the CPU on the web site.
>
> Leon |
|
|
|
Furthermore, if any of you are interested in it, we will be glad to share our project via e-mail before it gets on the web. And of course, all of your suggestions and comments are very welcome.

Best regards,
Jaime

> -----Original message-----
> From: Leon Heller [mailto:]
> Sent: Sunday, October 21, 2001 01:07 p.m.
> To:
> Subject: RE: [fpga-cpu] New minimal CPU design
>
> I couldn't find any code for the CPU on the web site.
>
> Leon |
|
--- In fpga-cpu@y..., Jaime Andrés Aranguren Cardona <jaime.aranguren@i...> wrote:
> part). By the way, if you can point us on how to get VHDL code from Xilinx'
> CORE Generator Dual Ported RAM models, we can give the complete VHDL code of
> it.

You can easily create VHDL code for RAMs by manually instantiating RAMB4_Sxx entities. See the code below implementing a 256x32 dual ported RAM and excuse me for the line breaks.

Hope this helps,
best regards

Felix
_____
Dipl.-Ing. Felix Bertram
Trenz Electronic
Duenner Kirchweg 77
D - 32257 Buende
Tel.: +49 (0) 5223 49 39 755
Fax.: +49 (0) 5223 48 945
Mailto:
http://www.trenz-electronic.de

--------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

entity dpram256x32 is
  port (
    clka:  IN  std_logic;
    addra: IN  std_logic_VECTOR(7 downto 0);
    dia:   IN  std_logic_VECTOR(31 downto 0);
    ena:   IN  std_logic;
    wea:   IN  std_logic;
    --
    clkb:  IN  std_logic;
    addrb: IN  std_logic_VECTOR(7 downto 0);
    dob:   OUT std_logic_VECTOR(31 downto 0);
    enb:   IN  std_logic);
end dpram256x32;

--------------------------------------------------------------------------------
architecture bhv of dpram256x32 is
  constant zeroC:  STD_LOGIC := '0';
  signal zero:     STD_LOGIC;
  signal zerobus:  STD_LOGIC_VECTOR(15 downto 0);
begin
  zero    <= zeroC;
  zerobus <= (others => zeroC);

  U1: RAMB4_S16_S16 port map(
    CLKA => clka, RSTA => zero, ADDRA => addra, DIA => dia(15 downto 0),
    DOA => open, ENA => ena, WEA => wea,
    CLKB => clkb, RSTB => zero, ADDRB => addrb, DIB => zerobus,
    DOB => dob(15 downto 0), ENB => enb, WEB => zero);

  U2: RAMB4_S16_S16 port map(
    CLKA => clka, RSTA => zero, ADDRA => addra, DIA => dia(31 downto 16),
    DOA => open, ENA => ena, WEA => wea,
    CLKB => clkb, RSTB => zero, ADDRB => addrb, DIB => zerobus,
    DOB => dob(31 downto 16), ENB => enb, WEB => zero);

end bhv;

--------------------------------------------------------------------------------
-- end of file |
|
|
|
> > But well, actually I was rather referring to the design
> > being minimal in respect to resource usage. 31 CPLD macrocells
> > are maybe comparable to ~16 Spartan CLBs, which is even less than
> > 20% of a Spartan XCS05. (But you can't compare CPLDs and FPGAs
> > anyways.. the design is optimized for CPLD usage and would
> > be suboptimal for an FPGA).
>
> That is hard to say what a good small model is . A shift right
> and jump zero would be nice instructions if you had the room.

There is no chance of getting anything else into the CPLD; there is not even enough room left for a synchronous reset. :)

And by the way, are there really that many uses for a zero flag? I see it being useful for bit comparisons, but in most arithmetic routines it is sufficient to have a greater-than or less-than compare. The equal-to compare can easily be done with an add #255, jcc. A zero flag needs a wide NOR gate, and wide NOR gates can eat quite a bit of resources.

> > I'd be interested to know about minimal FPGA cpu designs.
> > Since the ALU can be minimized easily due to carry chains,
> > the main problem is probably to optimize the control.
>
> Historically 12 bits have been the minimum real cpu design.
> I suspect a clean design
> - ADD ac = ac + n , toggle carry
> - NOR ac = ac nor n
> - SHR ac = rcr ac , shift carry
> - DCA ac = 0 ; n = ac
> - JC if(cy) pc = n
> - JZ if(z) pc = n
> - JMP pc = n
> - JSR ac = pc ,pc = n
>
> would be about three times the size of a small 8 bit processor.

Hm, that would imply a 9-bit PC. The datapath would probably take 12+1+9+2=24 CLBs (ALU, Carry, PC/Adreg, Z-Flag); that's not too much. The control probably would not grow much compared with an 8-bit CPU, so it is maybe just twice as big.

Jan Gray somehow managed to fit a 4-operator ALU into a single LUT per bit (in the GR0040). Is there any documentation on this? So maybe there is a way to reduce the ALU size to 6 CLBs.

The DCA instruction is neat; however, when I looked at the examples for my CPU, which use STA, I noticed that none of them could benefit from a DCA instruction. |
|
|
|
Dear Mr. Bertram,

What I meant was to instantiate the DPRAM generated by the CORE Generator, and the muxes and other blocks generated by LogiBlox. So (I think) I don't get, and shouldn't have to write, any architecture for them, do I?

Regards,

> -----Original message-----
> From: Felix Bertram [mailto:]
> Sent: Monday, October 22, 2001 09:33 a.m.
> To:
> Subject: [fpga-cpu] Re: New minimal CPU design
>
> You can easily create VHDL code for RAMs by manually instantiating
> RAMB4_Sxx entities. |
|
|
|
> > Jan Gray somehow managed to fit a 4 operator ALU into a
> > single LUT per bit. (In the GR0040) Is there any documentation on
> > this ? So maybe there is a way to reduce the ALU size to 6 CLBs.
>
> Some of it is here: http://www.fpgacpu.org/log/nov00.html#001112.
>
> Some of it is hinted at in the GR0000 paper here:
> http://www.fpgacpu.org/papers/soc-gr0040-paper.pdf
> in section 3.12.

Thank you!

> Now then. Say you build a 4-bit ALU using these techniques. 4 LUTs.
> And say you attach that to a 16 entry x 4-bit LUT RAM (another 4 LUTs,
> or 8 if you make it dual-port RAM). And say you add a 2-bit counter (2
> LUTs) to sequence through LSB addresses 00, 01, 10, and 11 to that LUT
> RAM. Now you have a simple datapath with 4 16-bit registers,
> nybble-serial, that should easily run at 100 MHz (25 MHz for each 16-bit
> operation). Total cost of datapath: 10-14 LUTs (3-4 Virtex CLBs; 2
> Virtex2 CLBs).

That's a pretty interesting idea. But how do we load data from the RAM? I fear another mux is required, at least to load constants.

> To hook that up to a 512x8 or 256x16 BRAM for program and data storage,
> you need another 8 or 9 FFs for a PC and/or address register. These FFs
> can share the same handful of CLBs with the aforementioned LUTs. The
> instruction register can be the BRAM output register.

Hm, that could substitute for the address register, if we don't allow indexed addressing modes. This would simplify the control a bit. On the other hand, the PC could be mapped to the register file, but this would double the number of cycles per instruction (8 cycles for a dual-ported register file, 12 cycles for a single-ported one).

The address register could have an incrementer. This would make variable-length instruction encoding easy (register-to-register ops: 1 byte, immediate ops: 3 bytes). |
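The nybble-serial datapath quoted above can be illustrated with a small behavioural model: a 4-bit adder is reused over four cycles, with a 2-bit counter selecting the nibble and the carry passed from one cycle to the next. This is only a sketch of the arithmetic, not of the LUT-level design; the function name `nibble_serial_add` and the use of addition as the example operation are choices made for this illustration.

```python
def nibble_serial_add(a, b, carry_in=0):
    """Add two 16-bit values 4 bits at a time, least significant nibble first.

    Returns (16-bit result, carry out of the top nibble).
    """
    result, carry = 0, carry_in
    for cycle in range(4):                 # the 2-bit counter: 00, 01, 10, 11
        shift = 4 * cycle
        total = ((a >> shift) & 0xF) + ((b >> shift) & 0xF) + carry
        result |= (total & 0xF) << shift   # write the nibble back
        carry = total >> 4                 # carry into the next cycle
    return result, carry


CYCLES_PER_OP = 4
print(nibble_serial_add(0x1234, 0x0FFF))
print(100 / CYCLES_PER_OP, "MHz per 16-bit operation at a 100 MHz clock")
```

Four cycles per 16-bit operation is where the quoted 25 MHz figure comes from, and it also explains the 8-cycle and 12-cycle estimates when the PC update has to pass through the same register file.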
|
|
|
--
Martin Thompson BEng(Hons) CEng MIEE
TRW Automotive Advanced Product Development,
Stratford Road, Solihull, B90 4GW. UK
Tel: +44 (0)121-627-3569
mailto:

>>> 21 October 2001 17:21:23 >>>
> 0) Load input 1) Jump minus 2) Add 3) Store Complement

Has anyone implemented a One Instruction Set Computing processor in an FPGA yet?

http://www.idiom.com/free-compilers/LANG/OISC-1.html

:-)
Martin |
|
> What I meant was to instantiate the DPRAM generated by
> the COREGenerator and
> the Muxes and other stuff generated by LogiBlox, so
> (I think) I don't get or
> should write any architecture, isn't it?

OK, I misunderstood you. There are some things to remember here.

CoreGen produces EDIF. This means that the CoreGen modules are treated as black boxes during synthesis and resolved during design translation with ngdbuild. This implies that you need a replacement for your core during simulation. Xilinx provides the XilinxCoreLib, which contains a set of primitives for creating models of CoreGen modules.

For simulation, you can instantiate your CoreGen modules in VHDL, as the VHDL snippet below shows. I chose a different way from the instantiation template provided in the "vho" file. It uses a placeholder named "dpram256x32ii" so that I do *not* have to create the top-level configuration used in the vho file. Instead I can instantiate the "dpram256x32" entity like any other VHDL entity. The complete file is not included in the synthesis project.

However, for simple blocks like RAMs I found this not too helpful. I feel it is much more concise to implement the RAM modules by hand in VHDL, as my previous posting showed. If you're wondering what CoreGen actually does: it creates instances of exactly the same primitives from the Unisim library that my VHDL code uses. Just have a look inside the EDIF files produced by CoreGen and search for "RAMB4". The main advantage of the hand-written approach is that you can use exactly the same set of files for synthesis and simulation, avoiding problems caused by inconsistent design files that make simulation deviate from synthesis.

Hope this helps,

Felix
_____
Dipl.-Ing. Felix Bertram
Trenz Electronic
Duenner Kirchweg 77
D - 32257 Buende
Tel.: +49 (0) 5223 49 39 755
Fax.: +49 (0) 5223 48 945
Mailto:
http://www.trenz-electronic.de

--------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- synopsys translate_off
Library XilinxCoreLib;
-- synopsys translate_on

entity dpram256x32 is
  port (
    addra: IN  std_logic_VECTOR(7 downto 0);
    clka:  IN  std_logic;
    addrb: IN  std_logic_VECTOR(7 downto 0);
    clkb:  IN  std_logic;
    dia:   IN  std_logic_VECTOR(31 downto 0);
    wea:   IN  std_logic;
    ena:   IN  std_logic;
    enb:   IN  std_logic;
    dob:   OUT std_logic_VECTOR(31 downto 0));
end entity;

--------------------------------------------------------------------------------
architecture CoreGen of dpram256x32 is

  component dpram256x32ii
    port (
      addra: IN  std_logic_VECTOR(7 downto 0);
      clka:  IN  std_logic;
      addrb: IN  std_logic_VECTOR(7 downto 0);
      clkb:  IN  std_logic;
      dia:   IN  std_logic_VECTOR(31 downto 0);
      wea:   IN  std_logic;
      ena:   IN  std_logic;
      enb:   IN  std_logic;
      dob:   OUT std_logic_VECTOR(31 downto 0));
  end component;

  -- synopsys translate_off
  for all : dpram256x32ii use entity XilinxCoreLib.C_MEM_DP_BLOCK_V1_0(behavioral)
    generic map(
      c_depth_b => 256, c_depth_a => 256,
      c_has_web => 0, c_has_wea => 1,
      c_has_dib => 0, c_has_dia => 1,
      c_clka_polarity => 1, c_web_polarity => 1,
      c_address_width_b => 8, c_address_width_a => 8,
      c_width_b => 32, c_width_a => 32,
      c_clkb_polarity => 1, c_ena_polarity => 1,
      c_rsta_polarity => 1, c_has_rstb => 0, c_has_rsta => 0,
      c_read_mif => 0, c_enb_polarity => 1, c_pipe_stages => 0,
      c_rstb_polarity => 1, c_has_enb => 1, c_has_ena => 1,
      c_mem_init_radix => 16, c_default_data => "0",
      c_mem_init_file => "dpram256x32.mif",
      c_has_dob => 1, c_generate_mif => 1, c_has_doa => 0,
      c_wea_polarity => 1);
  -- synopsys translate_on

begin

  U0 : dpram256x32ii
    port map (
      addra => addra, clka => clka,
      addrb => addrb, clkb => clkb,
      dia => dia, wea => wea,
      ena => ena, enb => enb,
      dob => dob);

end CoreGen;

--------------------------------------------------------------------------------
-- end of file |
|
|
|
Jan Gray wrote:
> Fun stuff. Brings back the old days of assembler one-upmanship,
> striving to bum an instruction or a cycle from a little compute kernel.
> "What's the most compact sequence to do atoi, or itoa, or what have
> you..."
>
> I feel sorry for anyone starting in computing today and who thinks 4 KB
> is nothing. They missed all the fun.

4 KB?? Nowadays 64 Meg is nothing. I think program size could be 1/4 of what it is, assuming people wanted to save memory at some cost in speed.

I am finally debugging my FPGA CPU in real hardware, and nothing beats the old switches and lights for seeing what the CPU is doing. Programmable logic is nice for changing the configuration around for spot testing, but I think all CPUs need a run/halt/step setup while development is being done. While my CPU still has a few bugs, I do have the UART, the front panel switches and my bootstrap (hardware) loader working. Now I just need to go instruction by instruction to see what works. :)

> Jan Gray

Ben Franchuk.
--
Standard Disclaimer : 97% speculation 2% bad grammar 1% facts.
"Pre-historic Cpu's" http://www.jetnet.ab.ca/users/bfranchuk
Now with schematics. |
|
This may sound dumb, but I'm trying to get to grips with all this RISC FPGA microcontroller stuff.

What pseudo-instructions can be emulated using a combination of Tim Boescke's minimal NOR, ADD, STA, and JCC instructions? Have I been spoilt all this time, and do I only really need these four?

Thanks,
Vincent |
|
wrote:
>
> This may sound dumb, but I'm trying to get to grips with all this RISC FPGA
> microcontroller stuff.
>
> What pseudo instructions may be emulated using a combination of Tim Boescke's
> minimal NOR, ADD, STA, and JCC instructions?
> Have I been spoilt all this time and I only really need these four?

I assume here that STA clears the accumulator. Yes, 4 is the minimum number of instructions for a simple controller. Shifts, fancy arithmetic and subroutine calls do make life easier for the programmer, however.

Having just found an intermittent connection in a ribbon cable from my FPGA to my memory chips, I will say a front panel is mandatory. Simulation does not always find bugs either: I also had an error in my branch offsets for conditional branches in the simulator compared with the hardware.

Ben
--
Standard Disclaimer : 97% speculation 2% bad grammar 1% facts.
"Pre-historic Cpu's" http://www.jetnet.ab.ca/users/bfranchuk
Now with schematics. |
|
|
|
> You may also wish to put 0 in the regfile (leaving 2 (or perhaps it is 2
> 3/4) 16-bit regs (or 6 8-bit regs, if you prefer)). Then IR=MEM[PC+=1]
> becomes rf[PC] += rf[zero] + cin=1 across 4 cycles. As the alu output
> nybbles go by, you latch them in your nybble-serial-to-parallel FF-based
> address register, that drives address lines to the BRAM. And perhaps
> you can save more registers and a mux, if the data bus is only 4-bits
> wide, if you use a RAMB_S16_S4 or something like that. (Hands wave
> furiously.)

Hm. When using a single-ported register file, the output flip-flops have to be used to store the first operand. These could be configured so that they are reset instead of loaded when the PC register is accessed. (I am not sure whether it is possible to configure the CLBs like that.) This way the PC always reads as zero when it is read as the first operand. That saves one register at the cost of one CLB. (I hope :) )

> (If this is all too complex, by all means, build a conventional 8- or
> 16-bit high regfile and ALU -- I just wanted to show how that it is
> possible to build a minimal slow austere 16-bit datapath in just 2
> Virtex2 CLBs.)

I think I will go for a 16-register, 8-bit parallel design as a first step. It should be easy to fold the design into a serial 16-bit architecture later. Of course the above trick would be less useful in this configuration.

> (Another gate bumming idea: the output network of a RAM is a mux. Use

Yeah, this could save some muxes. However, it would require an additional instruction register. I have to evaluate this later. |
|
> > > minimal NOR, ADD, STA, and JCC instructions?
> > I assume here that STA clears the accumulator. Yes 4 is the
> > minimum instructions to have for a simple controller. Shifts and
> > fancy arithmetic and subroutine calls do make a like easier to
> > program however.
>
> I was hoping for a few macros to explain things - like shift/rotate, compare,
> and etc.
>
> On further investigation, I found a few macros in Tims cpu3.inc file.
>
> Any more for any more?

Well, here are more:

memory constants used:

    allone=255
    one=1

shift akku left:

    sta temp
    add temp

test for zero:

    add allone

    c=0 if akku was zero

rotate akku left:

    sta temp
    add temp
    jcc noadd
    add one
noadd:

compare akku to temp:

    nor allone
    add one
    add temp

    c=0 if akku <= temp

    nor allone
    add temp

    c=0 if akku < temp

akku=-akku:

    nor allone
    add one |
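The shift, zero-test and rotate macros above can be checked with a few lines of Python. This is a sketch under stated assumptions: ADD is modelled as a plain 8-bit addition whose carry flag is the carry-out, and JCC is read as "jump if carry clear", which the post implies but does not spell out. The function names are invented for this illustration.

```python
def add(accu, operand):
    """8-bit ADD: return (new accu, carry-out). Assumed flag behaviour."""
    total = accu + operand
    return total & 0xFF, total >> 8


def shift_left(accu):
    temp = accu                     # sta temp
    return add(accu, temp)          # add temp: doubles akku, carry = old bit 7


def is_zero(accu):
    _, carry = add(accu, 0xFF)      # add allone
    return carry == 0               # c=0 only if akku was zero


def rotate_left(accu):
    accu, carry = shift_left(accu)  # sta temp / add temp
    if carry:                       # jcc noadd skips the add when carry is clear
        accu, carry = add(accu, 1)  # add one: feed the old bit 7 into bit 0
    return accu


print(shift_left(0x81), is_zero(0), rotate_left(0x81))
```

Doubling by adding the accumulator to a stored copy of itself is what lets a machine without a shifter shift left; a right shift has no such shortcut, which is presumably why it was on the wish list earlier in the thread.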
|
Sorry, some corrections:

> compare akku to temp:
>
>     nor zero
>     add one
>     add temp
>
>     c=0 if akku <= temp
>
>     nor zero
>     add temp
>
>     c=0 if akku < temp
>
> akku=-akku:
>
>     nor zero
>     add one |
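The reason for the correction can be seen in a short Python check: NOR with the all-ones constant always produces 0, while NOR with 0 inverts the accumulator, which is what a two's-complement negate needs. The sketch assumes plain 8-bit arithmetic and a memory cell `zero` holding 0; the function names are invented here, and the carry polarity stated in the post is not modelled.

```python
def nor(accu, operand):
    """8-bit NOR of the accumulator with a memory operand."""
    return ~(accu | operand) & 0xFF


def add(accu, operand):
    """8-bit ADD, result only (the carry is ignored in this sketch)."""
    return (accu + operand) & 0xFF


def negate(accu):
    accu = nor(accu, 0)        # nor zero: bitwise NOT
    return add(accu, 1)        # add one: two's complement


def temp_minus_akku(accu, temp):
    return add(negate(accu), temp)   # nor zero / add one / add temp


print(nor(0x5A, 0xFF), nor(0x5A, 0x00), negate(1), temp_minus_akku(3, 10))
```

With `nor allone` in place of `nor zero`, the sequences would start from 0 regardless of the accumulator, so the comparison would no longer depend on akku at all.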
|
|
|
> Could this CPU be extended for program and data memory (harvard) without
> increasing the logic count considerably?

The size will increase almost linearly with the address/data width, since the control overhead is very small. Extending the CPU to 16-bit data / 14-bit address is pretty straightforward. You just have to change all signal widths / bit offsets in the source. |
|
Tommy Thorn wrote:
> Very nice, although (as others have pointed out) not minimal in a
> mathematical sense, merely small. Also, IMHO, you have to take the
> gates for program and data memory into account, at which point
> code density becomes an issue. While clearly Turing complete
> (modulo finiteness), the code size explodes if you try running
> more realistic stuff.

I don't know; RISC machines have the same problem too. Most machines have a comfortable operating size of main memory and problem-solving ability.

0) Turing machines and OISCs would fall in this class. You can solve the problem, but only in an abstract sense.
1) Toy or educational computers. 1K or less of memory and limited I/O. You can solve trivial problems. The minimal machine fits here.
2) Calculator or simple control. 16K or less of memory, simple I/O and data storage. PDP-8s or any machine with BASIC in ROM.
3) Simple real-world problems. 128K or less of memory, often 56K for many machines. Good floppy I/O. The PDP-11 or IBM PC worked best with 128K of memory.
4) Real machines with real problems. Big, bigger, BEST money can buy this week.
5) Supercomputers -- number crunching, weather, games, Windows :)

> It would be very interesting to see what kind of architectures could
> come up if we had a competition for the fastest/smallest
> cpu (eg. maximizing MIPS/(MHz*gates), where gates includes program
> memory cost). Of course this would require fixing a set (> 4) of
> realistic/useful micro benchmarks. I'm not sure Dhrystone
> would be a good candidate. large integer factorization, RSA,
> AES (DES successor), theorem proving, n'th digit of pi, ...

Byte had (late 1980s?) a benchmark for judging CPUs by instruction size, bus size and speed. Byte also had some real benchmarks (in Small C?), I think, but I have never heard of them since. The Small C compiler is almost a good benchmark by itself for generating sample programs. I have to say I designed my CPU to generate nice Small C output. This was because I have a 24 KB limit for programs; 8 KB is reserved for DOS. A well hacked Small C (ver 1) fits just under 24 KB. (Not that I have DOS or even a disk yet.)

Ben Franchuk.
--
Standard Disclaimer : 97% speculation 2% bad grammar 1% facts.
"Pre-historic Cpu's" http://www.jetnet.ab.ca/users/bfranchuk
Now with schematics. |
|
--- In fpga-cpu@y..., "Tim Boescke" <t.boescke@t...> wrote:
> I just wanted to announce a new minimal CPU designed
> by me some time ago.
>
> Its special in the respect, that it fits into a 32
> macrocell CPLD. Even though the instruction set
> is severely limited I think that the code density
> and cpu performance is on par with other minimal
> 'educational' CPUs, which are sometimes quite a bit
> larger in respect to gate/register usage.
....
> So, does anybody come up with a smaller CPU ?

Very nice, although (as others have pointed out) not minimal in a mathematical sense, merely small. Also, IMHO, you have to take the gates for program and data memory into account, at which point code density becomes an issue. While clearly Turing complete (modulo finiteness), the code size explodes if you try running more realistic stuff.

It would be very interesting to see what kind of architectures could come up if we had a competition for the fastest/smallest CPU (e.g. maximizing MIPS/(MHz*gates), where gates includes program memory cost). Of course this would require fixing a set (> 4) of realistic/useful micro benchmarks. I'm not sure Dhrystone would be a good candidate. Large integer factorization, RSA, AES (DES successor), theorem proving, n'th digit of pi, ...

/Tommy |
|
> > I just wanted to announce a new minimal CPU designed
> > by me some time ago.
> >
> > Its special in the respect, that it fits into a 32
> > macrocell CPLD. Even though the instruction set
> > is severely limited I think that the code density
> > and cpu performance is on par with other minimal
> > 'educational' CPUs, which are sometimes quite a bit
> > larger in respect to gate/register usage.
> ....
> > So, does anybody come up with a smaller CPU ?
>
> Very nice, although (as others have pointed out) not minimal in a
> mathematical sense, merely small. Also, IMHO, you have to take the

Thanks. Well, the 'minimal' was rather referring to the CPLD used, which is the smallest you can get. But of course you are generally right.

> gates for program and data memory into account, at which point
> code density becomes an issue. While clearly Turing complete
> (modulo finiteness), the code size explodes if you try running
> more realistic stuff.

Yes, but it is interesting to note that this instruction set did not eat up more logic resources than other, more minimized and less efficient instruction sets (for example subtract-and-branch-if-negative). Proving Turing completeness is fairly simple for most instruction sets. However, are there any generalized means of measuring efficiency on a theoretical basis?

> It would be very interesting to see what kind of architectures could
> come up if we had a competition for the fastest/smallest
> cpu (eg. maximizing MIPS/(MHz*gates), where gates includes program
> memory cost). Of course this would require fixing a set (> 4) of
> realistic/useful micro benchmarks. I'm not sure Dhrystone
> would be a good candidate. large integer factorization, RSA,
> AES (DES successor), theorem proving, n'th digit of pi, ...

Indeed. Unfortunately there are lots of degrees of freedom, which makes optimizing very difficult. A more constrained approach (like limiting the instruction set) would be more realistic. |
|
|
|
"Tim Boescke" <> writes: > Yes, but it is interesting to note, that this instruction set did > not eat up more logic resources, than other more minimized and > less efficient instruction sets. (for example branch-and-substract- > when-negative). Proving Turing completeness is fairly simple for > most instruction sets - however are there any generalized means > to measure efficiency on a theoretical base ? I was trying to come up with such a thing. The best I've been able to come up with so far measures the efficiency of two generalized CPU architectures A and B. It requires that both architectures be generalized to work at arbitrary bit widths n. Then the efficiency of A relative to B is O(f(n)) if every instruction of B can be emulated by O(f(n)) instructions of A. For instance, suppose B has an instruction for computing (x>>y) where x, y, and the result are n-bit quantities. An architecture with a shift right by one bit instruction can emulate this with O(n) instructions. An architecture that requires division by repeated subtraction uses O(2^n) instructions. I conjecture that the "standard" architectures (x86, SPARC, etc.) can emulate each other in O(n). Carl Witty |
|
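Carl Witty's shift example in the message above can be made concrete by counting primitive steps in Python. The sketch emulates x >> y on n-bit values in two ways: with a shift-right-by-one primitive, and by repeated subtraction of 2^y. The step-counting convention and the function names are assumptions made for this illustration, not part of the original post.

```python
def shift_right_by_steps(x, y, n):
    """x >> y using only shift-right-by-one; returns (result, steps)."""
    steps = 0
    for _ in range(min(y, n)):     # after n single-bit shifts the value is 0
        x >>= 1
        steps += 1
    return x, steps


def shift_right_by_subtraction(x, y, n):
    """x >> y as floor(x / 2**y) by repeated subtraction; returns (result, steps)."""
    if y >= n:
        return 0, 0                # every bit is shifted out
    divisor = 1
    for _ in range(y):             # build 2**y by doubling with ADD
        divisor += divisor
    quotient, steps = 0, y
    while x >= divisor:            # one subtraction per unit of the quotient
        x -= divisor
        quotient += 1
        steps += 1
    return quotient, steps


n = 8
worst_shift = max(shift_right_by_steps(x, y, n)[1]
                  for x in range(2 ** n) for y in range(2 ** n))
worst_sub = max(shift_right_by_subtraction(x, y, n)[1]
                for x in range(2 ** n) for y in range(2 ** n))
print(worst_shift, worst_sub)
```

For n = 8 the single-bit shifter never needs more than n steps, while repeated subtraction needs on the order of 2^n in the worst case (x at its maximum, y = 0), matching the O(n) versus O(2^n) distinction drawn in the message.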
Sunday
Ed Corter (hey, that's me) wrote:

I am almost done with an auto rotate and shift module (Verilog). The module is programmable for mode of operation (shift / rotate), and a count of operations is set. When the operand is written, the module performs the N operations and flags Done.

The module description and usage are available at http://ca.geocities.com/artiedc/files/rotshift.htm

There's a zip file for download that contains all of the files (3 for the module) and all of the ModelSim files, including the test bench.

Ed Corter |