This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).
> it is not absolutely necessary to port GCC; any C compiler will do.
> In my ECO32 project I used LCC, which is very small compared to GCC,
> is well documented (Fraser & Hanson book), and uses a back-end generator.
> Included are back-end descriptions for a handful of architectures.
> Porting the compiler to ECO32 (a 32-bit RISC machine) was done in
> about 2 weeks.
>
> Regards,
> Hellwig

Yes, see also http://www.fpgacpu.org/usenet/lcc.html. By choosing LCC you work with a splendid compiler textbook, so as a side effect you will learn a lot about compiler implementation.

You'll also need an assembler, linker and a runtime library. I wrote a simple home-made assembler in C. In the past I have built assemblers using awk. These days you might be happier implementing the assembler in Python or Perl.

I also cut corners and used my assembler as my linker. The output of separately compiled modules was .o files consisting not of binary instructions with relocation records, but of assembly source with appropriate .global and .extern directives. Then at "link time" I simply concatenated the .o files together and assembled them. You just need to provide symbol table support in the assembler so that the same local symbol name appearing in different modules does not cause duplicate-definition errors.

For runtime libraries I just implemented the functions I needed as I went along. I think this approach is perfectly acceptable, particularly if you are then going to run a higher-level runtime environment like Scheme or Squeak on top of your minimal C runtime.

In contrast, the considerable extra effort/investment of a binutils/GCC/GAS/libraries/GDB port will pay off if you have a lot of off-the-shelf C/C++ software that you want to reuse. Even then, one might argue that first doing a quick LCC port while you tune up your instruction set architecture will save time in the long run.

Jan Gray, Gray Research LLC
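A minimal sketch of the "assembler as linker" trick, in Python since that is one of the languages suggested above for assemblers. It assumes a hypothetical assembly syntax where a label is defined as `name:` at the start of a line and exported or imported with `.global name` / `.extern name`; the `__m<N>_` prefix used to make local labels unique is also just an illustrative choice, not anything taken from LCC or the original tools.

```python
import re

# Hypothetical syntax: "name:" defines a label; ".global name" / ".extern name" export/import it.
LABEL_DEF = re.compile(r"^\s*([A-Za-z_.][\w.]*):")
# An identifier that is not the tail of a longer token (so "0x10" never yields "x10").
IDENT = re.compile(r"(?<![\w.])[A-Za-z_.][\w.]*")


def directive_args(lines, directive):
    """Collect the names following a directive such as '.global'."""
    names = set()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == directive:
            names.add(parts[1])
    return names


def link_by_concatenation(modules):
    """'Link' assembly-text .o modules by renaming their local labels and concatenating them."""
    combined = []
    exported = set()
    imported = set()
    for index, source in enumerate(modules):
        lines = source.splitlines()
        globals_here = directive_args(lines, ".global")
        clash = globals_here & exported
        if clash:
            raise ValueError(f"duplicate global symbol(s): {sorted(clash)}")
        exported |= globals_here
        imported |= directive_args(lines, ".extern")

        # Local labels: defined in this module but not exported.
        local = set()
        for line in lines:
            m = LABEL_DEF.match(line)
            if m and m.group(1) not in globals_here:
                local.add(m.group(1))

        prefix = f"__m{index}_"  # hypothetical per-module prefix

        def rename(match):
            name = match.group(0)
            return prefix + name if name in local else name

        # Rewrite every reference to a local label, then append the module's text.
        combined.extend(IDENT.sub(rename, line) for line in lines)

    missing = imported - exported
    if missing:
        raise ValueError(f"undefined external symbol(s): {sorted(missing)}")
    return "\n".join(combined) + "\n"
```

The combined text would then go through the assembler in a single pass, just as if it had been written as one source file; globals keep their names, so cross-module calls resolve through the ordinary symbol table.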
|
|
|
Hi Jan,

Well, thought I'd let you know that I've just finished a port of the gr0040 series to a 32-bit architecture (targeted at Spartan-3) with a few minor twists along the way. I'm just looking at the final verification of the interrupt-driven timer as it scrolls up the screen, and indeed every 64 (or so) clocks it jumps to the interrupt handler, resets the count and jumps back to the program :-)

I too wrote an assembler, though I did it in Java rather than a scripting language; the StreamTokenizer does nice things like give you a block-comment parser for free, just like C/C++ :-) Compiling it to an executable with 'gcj' makes it a nice fast tool to run, and it took about a day to write (Saturday, in fact :-)

Looking at the licence for xr16, it's not clear I'm allowed to host a web page saying 'here it is, warts and all', so I guess you're all going to have to build your own [grin]. Not that you wouldn't in the first place; excuse me, I'm just still full of euphoria from actually getting the thing to simulate verifiably correctly (as in: all the instructions do what they're supposed to :-))

I modified the interrupt handlers slightly. I have a somewhat embarrassingly large number of registers (512 in fact, 64 visible at any one time, split as 32 permanent and 32 banked), so dedicating one to the interrupt return address wasn't a problem. And since I don't have "mov rdst, rsrc" (it uses "add rdst, rsrc, r0, 0" instead), the old scheme was causing issues when r0 wasn't 0 :-)

I moved to a 4-operand format (dst, a, b, constant) for the ALU ops, mainly because I could; there are lots of bits in a 32-bit instruction :-) Loads and stores keep the same format as before, though, "ldl Rdst, imm(Rsrc)", with everything extended to 32 bits (using the same sort of ideas: duplication of data on writes and zero-extension on reads for smaller bit-widths). Branches now have a 27-bit relative range (a 25-bit field is stored and shifted left by 2, since instructions are word-aligned).
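The branch and load/store details above can be sketched as a small behavioural model (Python rather than Verilog, purely for illustration). The 25-bit field shifted left by 2 comes from the post; the function names and the byte-lane numbering (lane = address & 3, i.e. little-endian) are assumptions, since the post doesn't say which byte order the core uses.

```python
BRANCH_BITS = 25  # stored displacement field width, per the post


def encode_branch_offset(byte_offset: int) -> int:
    """Pack a word-aligned byte offset into the 25-bit branch field."""
    if byte_offset % 4:
        raise ValueError("branch targets must be word-aligned")
    if not -(1 << 26) <= byte_offset < (1 << 26):
        raise ValueError("offset outside the 27-bit branch range")
    return (byte_offset >> 2) & ((1 << BRANCH_BITS) - 1)


def decode_branch_offset(field: int) -> int:
    """Sign-extend the 25-bit field and shift left by 2, giving a 27-bit byte offset."""
    field &= (1 << BRANCH_BITS) - 1
    if field & (1 << (BRANCH_BITS - 1)):
        field -= 1 << BRANCH_BITS
    return field << 2


def store_lanes(addr: int, value: int, width: int):
    """Replicate a 1/2/4-byte store across the 32-bit bus and return (data, byte_enables).

    Each of the 4 byte-enable bits would drive one of the 4 memory blockrams.
    Assumes little-endian lane numbering: lane = addr & 3.
    """
    replicate = {1: 0x01010101, 2: 0x00010001, 4: 0x00000001}
    if width not in replicate or addr % width:
        raise ValueError("unsupported width or misaligned address")
    data = (value & ((1 << (8 * width)) - 1)) * replicate[width]
    enables = ((1 << width) - 1) << (addr & 3)
    return data, enables


def load_extract(addr: int, word: int, width: int) -> int:
    """Select the addressed byte/halfword from a 32-bit word and zero-extend it."""
    return (word >> (8 * (addr & 3))) & ((1 << (8 * width)) - 1)
```

Replicating the data means the write port never needs a shifter: every blockram sees the right byte, and only the enabled lanes actually latch it.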
The only new instruction at the moment is 'bank', which selects which bank of 32 registers appears as the upper half (r32-r63) of the visible register set. The bank defaults to 1 on reset and can range from 1 to 15. If the high bit of the register number is set in any instruction, the core replaces that bit with the current bank number, giving a 9-bit address for the register location (which is basically because a blockram is so large :-)

I am going to add 'ei' and 'di' instructions, though, to enable/disable interrupt processing. I want to port a multi-tasking OS onto the chip when I get a C compiler up and running, and being able to switch off interrupts in a critical section will be quite useful :-) It's rather nice to identify a weakness for what *I* want to do, and be able to fix it either in 'hardware' or in software :-)

The other point worth mentioning [to bring this slightly back to the original topic] is that lcc only copes with 32 registers, as I found out after getting my CPU to work with 64 at a time... Lesson 1: read the manual. Lesson 2: read it again. So, unless there's a better candidate out there, it looks like I'll be attempting gcc :-( (or perhaps :-), who knows :-)

Oh yeah, click-and-go synthesis in Webpack gives me ~56 MHz in a Spartan-3 FG456. Doing some minimal floorplanning (aligning columns, trying to keep datapaths short) gives a boost to 68-70 MHz (whenever I get it to ~70, I keep tweaking it and it drops back to 68 or so :-( I need to learn more about floorplanning...). It's slightly complicated because there aren't any tristate buffers any more (I get lots of warnings :-), which means those signals have to be routed as ordinary logic. My gut feeling is that there's a fair bit of congestion in the 'middle' bits. Also, the design as it stands uses 6 Blockrams (4 for memory, to give 8-bit write-enables, and 2 for dual-issue register files... well, I've no other use for them :-), which stretches it along the chip a bit...

Anyway, thought you might be interested :-)

Simon
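Read literally, the banking rule maps a 6-bit register specifier onto the 9-bit (512-entry) register-file address as sketched below. This assumes the 4-bit bank number becomes the upper address bits and that the 32 permanent registers sit at addresses 0-31 (effectively bank 0, which would be consistent with 'bank' only ranging over 1-15); the function name is hypothetical.

```python
def physical_register(reg: int, bank: int) -> int:
    """Map a 6-bit register number (r0-r63) to a 9-bit register-file address.

    r0-r31 are the permanent registers (addresses 0-31).
    r32-r63 have their high bit (bit 5) replaced by the 4-bit bank number,
    so bank N's registers live at addresses N*32 .. N*32+31.
    """
    if not 0 <= reg < 64:
        raise ValueError("register number must fit in 6 bits")
    if not 1 <= bank <= 15:
        raise ValueError("bank must be in the range 1-15")
    if reg & 0x20:
        return (bank << 5) | (reg & 0x1F)
    return reg


# Every (bank, banked register) pair lands on its own address above the permanent set.
addresses = {physical_register(r, b) for b in range(1, 16) for r in range(32, 64)}
print(len(addresses), min(addresses), max(addresses))  # 480 32 511
```

In hardware this is just a 2:1 choice of the top 4 address bits (zero or the bank register), so it adds very little to the register-file read path.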
|
|
|
On 25 Jul 2004, at 17:16, Simon Gornall wrote:

> I modified the interrupt handlers slightly. I have a somewhat
> embarrassingly large number of registers (512 in fact, 64 visible at
> any one time, split as 32 permanent, 32 banked)

Hmm, I think I've just thought of a way this could be a real advantage and still use lcc. I've just been skimming the book, and if Jan thinks this is easy compared to gcc, I don't want to go *near* gcc!

The idea is to do 2 ports :-) The second port differs from the first only in that it uses r32->r63, whereas the first uses r0->r31. This means I can write 'system' code that uses r0->r31 and 'application' code that uses r32->r63, giving me 16 (+ the kernel) tasks which can be context-switched using a single instruction (bank N).

This assumes I fold the sp and pc registers into the normal register file, along with the {ccc,ccn,ccv,ccz} vector (at the moment they're single-instance 'reg's in the Verilog code, as in gr0040). That still leaves 29 general-purpose registers to play with per task, which ought to be plenty :-) 'System' tasks can interrupt application ones without conflicting with their registers or state. I think Motorola had something similar with their 'supervisor mode', which had a separate set of registers when you were out of 'user mode'.

The only drawback is the limited number of tasks, but I think 16 will be a good start; I'm not trying to write Linux here :-) I just want to be able to run more than one task on the CPU, keeping the tasks separate rather than multiplexing the CPU's actions within a single program.

Tomorrow's seminar on "How to turn necessity into virtue" will be brought to you by P.ragmatism. Stay tuned :-)

Simon
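A rough behavioural model of the two-port idea: kernel code lives in r0-r31, each task's code in r32-r63, and a context switch is just a write to the bank register, with no save/restore loop. The roles given to r61-r63 (sp, flags, pc) are hypothetical, chosen only to show where the figure of 29 general-purpose registers comes from; the post doesn't say which registers would hold them. The bank range of 1-15 is taken from the earlier post.

```python
class BankedRegisterFile:
    """512-entry register file: r0-r31 permanent, r32-r63 mapped through the current bank."""

    def __init__(self):
        self.cells = [0] * 512
        self.bank = 1  # reset value given in the earlier post

    def _address(self, reg: int) -> int:
        return (self.bank << 5) | (reg & 0x1F) if reg & 0x20 else reg

    def read(self, reg: int) -> int:
        return self.cells[self._address(reg)]

    def write(self, reg: int, value: int) -> None:
        self.cells[self._address(reg)] = value & 0xFFFFFFFF

    def switch_bank(self, bank: int) -> None:
        """Model of 'bank N': the entire task context changes in one step."""
        if not 1 <= bank <= 15:
            raise ValueError("bank must be in the range 1-15")
        self.bank = bank


# Hypothetical register roles for the 'application' lcc port.
APP_GENERAL = range(32, 61)  # r32-r60: 29 general-purpose registers
APP_SP, APP_FLAGS, APP_PC = 61, 62, 63

rf = BankedRegisterFile()
for task, pc in ((1, 0x100), (2, 0x200)):
    rf.switch_bank(task)
    rf.write(APP_PC, pc)  # each task gets its own pc...
rf.write(5, 42)  # ...while kernel register r5 is shared by all
rf.switch_bank(1)
print(hex(rf.read(APP_PC)), rf.read(5), len(APP_GENERAL))  # 0x100 42 29
```

The application port would simply be told that only r32-r60 are allocatable, so the two lcc back ends never touch each other's registers.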