This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).
Hi All,

Say I want to build a cycle-accurate model of an existing processor, say the Intel 386 for example. I have access to all the data sheets and plenty of other information, so I have very good specifications as well as a brief description of the internal design.

How exactly should I go about handling this project, so that I make the best use of all the information available and work in a very systematic way? Please send me a list of suggestions from your experience and any pointers to information on such projects.

Regards,
Anand
|
|
|
When you say emulation, do you mean in FPGA (as per list) or are you thinking about a software emulation?

Original Message:
-----------------
From: Anand Gopal Shirahatti
Date: 18 Dec 2003 04:57:09 -0000
Subject: [fpga-cpu] Emulation of Processor
|
Anand Gopal Shirahatti brought up an excellent point:

> Say I want to build a cycle accurate model of an
> existing processor. Say Intel 386 for example. Now I
> have access to all the data sheets and plenty of
> other information. Now I have, a very good
> specifications as well a brief internal design.
>
> How exactly I go about handling this project, so that
> I make best use of the all the information available
> and in very systematic way. Please send me the list
> of suggestions from u r experience and any pointers
> to information on such projects.

That is a very good question. The answer obviously depends on the level of seriousness (commercial, research, hobby) and the degree of accuracy required (perfect cycle-accurate replica, bug-faithful implementation, "semantically" faithful, ...). Perfect cycle-accurate replicas are rarely necessary, except in cases where programs depend on instruction timing (which is not practical for any modern architecture).

For most processor work the pivotal element is the "golden reference model", which can be anything from a simple CPU simulator written in C to a formal description in some machine-readable form. Arriving at this model is the first part of the work. It can be written based on the data sheets, but in general those only provide a first approximation. For the full detail there is really no alternative to good old reverse engineering: writing test programs to resolve ambiguities and unknowns in the data sheets.

One lazy man's approach is to extract execution traces from a real processor (e.g. by single-stepping through real programs and dumping the complete CPU state at each step). This trace can then be used to verify the reference simulator at each step. For truly low-level timing relations, you also have to hook up a logic analyser. Often you don't have to go that far, though.
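The trace comparison above can be sketched in Python as a lockstep loop: step the reference model once per trace entry and stop at the first state that differs. Everything here is hypothetical: `CpuState` holds only a toy subset of architectural state, `step` implements a made-up two-instruction ISA rather than the 386, and the trace is assumed to be a list of post-instruction states captured from the real chip.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CpuState:
    """Hypothetical, heavily reduced architectural state after one instruction."""
    pc: int
    acc: int
    zero_flag: bool


def step(state, program):
    """Toy reference simulator for a made-up ISA: execute one instruction."""
    op, arg = program[state.pc]
    if op == "LOAD":
        acc = arg & 0xFF
    elif op == "DEC":
        acc = (state.acc - 1) & 0xFF
    else:
        raise ValueError(f"unknown opcode {op!r} at pc={state.pc}")
    return CpuState(pc=state.pc + 1, acc=acc, zero_flag=(acc == 0))


def check_against_trace(initial, program, trace):
    """Step the model once per trace entry.

    Returns None if every state matches, otherwise (index, diffs) for the
    first mismatch, where diffs maps field name -> (model value, trace value).
    """
    state = initial
    for i, expected in enumerate(trace):
        state = step(state, program)
        if state != expected:
            diffs = {
                f.name: (getattr(state, f.name), getattr(expected, f.name))
                for f in fields(CpuState)
                if getattr(state, f.name) != getattr(expected, f.name)
            }
            return i, diffs
    return None
```

Reporting only the differing fields at the first mismatching step points straight at the instruction whose semantics the model gets wrong.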
After all, what people care about most is that the new processor can run the same programs, so getting the model into a shape where it can run real programs is very helpful. Notice that this generally also requires some amount of external device simulation for the more useful programs.

Once you have a model you trust there are several ways to proceed. In my hobby projects, I evolve the simulator in stages to include more and more details of the hardware implementation and eventually implement it in Verilog. Each refinement is co-simulated with one of its predecessors to check for bugs (trust me, it's *much* easier to find and correct bugs this way than to try to debug why some application wasn't executed correctly [1]).

Anyway, that's my take on it. I look forward to other opinions.

/Tommy

[1] Alexander Klaiber, Sinclair Chau: Automatic Detection of Logic Bugs in Hardware Designs. Fourth International Workshop on Microprocessor Test and Verification: Common Challenges and Solutions (MTV 2003), May 29-30, 2003, Hyatt Town Lake Hotel, Austin, Texas, USA. IEEE Computer Society, 2003.
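Tommy's staged refinement, where each more detailed model is co-simulated against its predecessor, might look roughly like this minimal Python sketch. It assumes a hypothetical toy ISA with made-up cycle counts (LOAD takes 2 cycles, ADD takes 3); the names `InstructionModel`, `CycleModel` and `cosimulate` are illustrative only. The key design point is that the two models are compared only when the cycle-level model retires an instruction, since that is the granularity at which both define architectural state.

```python
class InstructionModel:
    """Reference model: one step() executes one whole instruction."""

    def __init__(self, program):
        self.program, self.pc, self.acc = program, 0, 0

    def step(self):
        op, arg = self.program[self.pc]
        self.acc = arg if op == "LOAD" else (self.acc + arg) & 0xFF
        self.pc += 1
        return self.pc, self.acc


class CycleModel:
    """Refined model: one tick() is one clock; state is visible only at retirement."""

    CYCLES = {"LOAD": 2, "ADD": 3}  # hypothetical per-instruction cycle counts

    def __init__(self, program):
        self.program, self.pc, self.acc, self.busy = program, 0, 0, 0

    def tick(self):
        op, arg = self.program[self.pc]
        self.busy += 1
        if self.busy < self.CYCLES[op]:
            return None  # instruction still in flight
        self.busy = 0
        self.acc = arg if op == "LOAD" else (self.acc + arg) & 0xFF
        self.pc += 1
        return self.pc, self.acc  # retired: architectural state is defined


def cosimulate(program, n_instructions, max_cycles=10_000):
    """Run both models in lockstep at instruction granularity; return cycles used."""
    ref, dut = InstructionModel(program), CycleModel(program)
    cycles = 0
    for i in range(n_instructions):
        expected = ref.step()
        got = None
        while got is None:
            if cycles >= max_cycles:
                raise RuntimeError("cycle model never retired an instruction")
            got = dut.tick()
            cycles += 1
        if got != expected:
            raise AssertionError(f"instruction {i}: expected {expected}, got {got}")
    return cycles
```

For a LOAD followed by two ADDs, the cycle model needs 2 + 3 + 3 = 8 cycles; a disagreement between the models raises at the first instruction where they differ, instead of surfacing much later as a misbehaving application.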
|
Of course on FPGA! The final goal is VHDL RTL.

On Thu, 18 Dec 2003 wrote:
> When you say emulation, do you mean in FPGA (as per list) or are you
> thinking about a software emulation?
|
> Say I want to build a cycle accurate model of an existing processor. Say
> Intel 386 for example. [...]
> How exactly I go about handling this project, so that I make best use of
> the all the information available and in very systematic way. Please send
> me the list of suggestions from u r experience and any pointers to
> information on such projects.

With all due respect, you're in way over your head. The 386 is one of the most complex scalar processors ever. Start with something simpler. If it has to be an x86, start with an 8086 or 8088. But you'd be better off doing an 8-bit processor, or a RISC processor. The stuff you learn doing that will be essential when you are actually ready to tackle something more complex.
|
Encouraging words from Eric Smith:

> With all due respect, you're in way over your head. The 386 is one of
> the most complex scalar processors ever. Start with something simpler.
> If it has to be an x86, start with an 8086 or 8088. But you'd be better
> off doing an 8-bit processor, or a RISC processor. The stuff you learn
> doing that will be essential when you are actually ready to tackle
> something more complex.

I don't even know how to start replying to this, so let me just enumerate:

0) Do you actually know Ananard? I don't, but I don't jump to conclusions about other people's resources and abilities.

1) I personally know a handful of persons who could each pull off a 386 on their own.

2) The 386 is not that complex. It's just not very orthogonal and there's a lot of detail to describing it.

3) There is already a pretty good starting point for a reference model in BOCHS.

4) One reasonable approach would be to identify the 95% most executed instructions and features used on a variety of benchmarks. Implement those and trap to an interpreter (written in that subset) for the rest. The subset actually used when running, say, Linux is a lot smaller than the full 386.

I think that could be a fun project.

/Tommy
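Point 4 can be made concrete with a small Python sketch that picks the smallest set of opcodes covering a target share of dynamically executed instructions. The opcode counts are assumed to come from an execution trace (for example by counting trace entries with `collections.Counter`); `hot_subset`, `dispatch` and the example counts are hypothetical. Taking opcodes greedily in descending frequency order gives the smallest such set, because no other set of the same size can cover more executions.

```python
def hot_subset(opcode_counts, coverage=0.95):
    """Smallest set of opcodes whose dynamic executions reach `coverage` of the total.

    opcode_counts maps opcode -> number of times it was executed in a trace.
    Returns (opcodes in descending frequency, fraction of executions covered).
    """
    total = sum(opcode_counts.values())
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    # Highest-frequency opcodes first; ties broken by name for a stable result.
    for op, n in sorted(opcode_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered >= coverage * total:
            break
        chosen.append(op)
        covered += n
    return chosen, covered / total


def dispatch(op, fast_path):
    """Decide where an opcode runs: in the hardware subset, or trapped to the interpreter."""
    return "hardware" if op in fast_path else "trap to interpreter"
```

With made-up counts such as `{"mov": 520, "add": 200, "jcc": 150, "cmp": 90, "push": 25, "bound": 10, "aaa": 5}`, the function selects `mov`, `add`, `jcc` and `cmp`, covering 96% of executions; everything else would trap to the interpreter.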
|
|
|
Tommy Thorn wrote:
> 0) Do you actually know Ananard? I don't, but I don't
> jump to conclusions about other people's resources and
> abilities.

I have no idea who Ananard is. I don't know Anand Gopal Shirahatti, but I do know that it takes more than thirty man-years for a team of expert microprocessor designers to build a fully 386-compatible core. I've spoken to engineers who were involved in such projects at two different companies. If it were as easy as you claim, there would have been a lot more than just a handful of companies (Intel, AMD, Cyrix, Chips & Technologies, perhaps I've missed one or two) making 386 processors.

So yes, based on the way his posting was written, I did jump to the conclusion that Anand doesn't want to invest thirty man-years or more in the project. I'll freely concede that this assumption could be incorrect.

> 1) I personally know a handful of persons who could
> each pull off a 386 on their own.

By making this statement, you've just demonstrated that you have no grasp of the complexity of the 386 yourself. However, feel free to tell us more about these superstars, and what they've actually accomplished.

> 2) The 386 is not that complex.

False. It is a very complex part (though obviously less complex than the newer x86 parts). The original implementation used approximately 275,000 transistors. And unlike most modern processors (RISC or CISC), very few of those are RAM. In a modern processor, more than 90% of the transistor count is RAM.

Even if you're substantially more clever than the original designers of the 386 (which I rather doubt), you're probably not going to be able to shave down the transistor count (or the gate count) by more than 20% and still maintain full compatibility to the extent that Anand wanted. Nor, by using more transistors (or gates) than the original, are you going to reduce the magnitude of the required design effort by more than 20%.
> It's just not very orthogonal and there's a lot of detail to
> describing it.

True.

> 3) There is already a pretty good starting point for a
> reference model in BOCHS.

Having a software 386 simulator is certainly useful as a reference, but it tells you very little about how to write a workable RTL model of a 386. (Or any other sort of hardware model that can meet the project objective.)

> 4) One reasonable approach would be to identify the
> 95% most executed instructions and features used on a
> variety of benchmarks. Implement those and trap to an
> interpreter (written in that subset) for the rest.

The original project as described was "to build a cycle accurate model of an existing processor. Say Intel 386 for example." The approach you're proposing, while it might yield useful results, does not achieve the goal defined for the project.

Eric
|
> Say I want to build a cycle accurate model of an existing processor. Say Intel 386 for example. Now I have access to all the data sheets and plenty of other information. Now I have, a very good specifications as well a brief internal design.
>
> How exactly I go about handling this project, so that I make best use of the all the information available and in very systematic way.

I have put together a cycle-accurate implementation of the 6502 processor. Whether or not this is a good approach, this is how I accomplished the task.

First, ignore the cycle accuracy. I'm of the opinion it's better to get a working processor first, then make it cycle accurate. But keep the cycle accuracy in the back of your mind. In other words, don't put together a bit-serial implementation of a byte-wide processor, a microcoded version of a RISC CPU, etc. The conversion required later will probably only cause problems. My advice is to follow a design pattern similar to the original's to begin with. Once you have a working processor, go back and patch it up so it's cycle accurate.

Get to cycle accuracy in stages. First try to make the CPU *faster* than cycle accurate while keeping it simple at the same time. Once it's faster than the original, it's easier to go back and add in additional 'nop' cycles to slow the instructions down so they match the original timing. Reducing the speed of a design is probably a lot easier than trying to increase the speed of a design that started off on the wrong foot with the wrong architecture.

Keep the pipelining of the original in mind. If the original processor is pipelined so that instructions execute in a single cycle, then you'll have to duplicate that pipelining in order to get single-cycle instruction execution.

Tackle cycle accuracy first on the instructions that are a) easy to make cycle accurate, and b) likely to be the critical ones for cycle accuracy.
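The "make it faster first, then pad with nop cycles" step amounts to a per-opcode table of wait states. A minimal Python sketch follows; `ORIGINAL_CYCLES` uses the standard NMOS 6502 base timings (without page-crossing or branch penalties), while `CORE_CYCLES` describes a hypothetical streamlined core, not Rob's actual design. In hardware, such a table would typically drive a small wait-state counter in the control state machine.

```python
# Documented NMOS 6502 base cycle counts (no page-crossing or branch penalties).
ORIGINAL_CYCLES = {
    "LDA #imm": 2, "LDA zp": 3, "LDA abs": 4, "STA zp": 3,
    "INX": 2, "DEX": 2, "JMP abs": 3, "NOP": 2,
}

# Hypothetical cycle counts of a streamlined core after the second iteration.
CORE_CYCLES = {
    "LDA #imm": 1, "LDA zp": 2, "LDA abs": 2, "STA zp": 2,
    "INX": 1, "DEX": 1, "JMP abs": 1, "NOP": 1,
}


def padding_table(original, core):
    """Number of extra 'nop' (wait) cycles to append to each instruction."""
    pad = {}
    for op, target in original.items():
        actual = core[op]
        if actual > target:
            # Padding can only slow an instruction down, never speed it up.
            raise ValueError(f"{op}: core takes {actual} cycles, original takes {target}")
        pad[op] = target - actual
    return pad
```

The error case is the reason for getting every instruction to the same or fewer cycles first: an instruction that is already slower than the original cannot be fixed by padding.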
It might be acceptable for other, less critical instructions to be non-cycle-accurate.

Cycle accuracy is mostly marketing hype. It's great to be able to say the processor is 100% cycle accurate, but it's not normally a requirement. Coding that depends on cycle accuracy is strongly discouraged because different versions of a processor (even within the same generation from the same manufacturer) could potentially have different timings. With today's complex systems involving overlapped instruction sequences, cache accesses, interrupts, etc., almost no one depends on cycle accuracy because it's an unreliable approach. Where cycle accuracy has been used in the past is in simple systems where clock cycles were counted to determine timing delays. Most of these delays consist of loops that simply decrement a counter. So the critical instructions to make cycle accurate are probably branch/loop instructions and decrements/increments.

For my '02 implementation, in the first pass I had many instructions that took longer than the original. Once I had the processor basically working, I looked at how I could streamline the CPU. I streamlined it to reduce all the instructions to the minimum number of cycles (once again not trying too hard to keep cycle accuracy). This was the second iteration of the CPU. At this point all instructions executed in the same or fewer clock cycles than the original. For the third iteration of the processor, I went back and added in additional 'nop' cycles to extend the instructions out to the same timings as the original.

Note there are different kinds of cycle accuracy as well. My '02 has instruction-timing accuracy, but not bus-cycle-by-bus-cycle accuracy (although it's very close).

Note that obtaining cycle accuracy cost about 10% in clock frequency and 10% in size: the cycle-accurate version runs at a 10% slower clock frequency and consumes about 10% more FPGA resources.
(Cycle accuracy uses the FPGA resources less efficiently than they could otherwise be used in this case.) I have an option to build the code without cycle accuracy for better performance and size.

=================================================

I spent about a year getting the '02 basically working. It was more than another year before I had it cycle accurate. These were not really man-years, but I spent a lot of time on it on weekends and evenings. It probably represents many man-months of effort anyway (I can code and get things working very fast...).

The x86 series is a complex processor family. Twice I've started an 8086 clone, but then dropped it after just a few hours. I'd estimate it to be about three or four times more complex than the '02, meaning it would probably take me about five years to get a decently working version (without working on it full time). Something like the 386 is several times more complex than that, so the other poster's estimate of 30 man-years isn't unreasonable. Still, if you like a challenge...

Implementing an existing processor has a lot of attraction because of the existing base of software and tools. Depending on what your goals are, it might be easier to get 386-comparable performance with a much simpler processor. For instance, isn't the xr16 20 MIPS?

Rob
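Rob's point that counted delay loops are where cycle accuracy matters can be checked with a little arithmetic. The Python sketch below counts cycles for the classic 6502 loop `LDX #n / loop: DEX / BNE loop`, using the documented timings (LDX immediate 2, DEX 2, BNE 2 when not taken and 3 when taken to the same page). The `bne_taken` parameter is a hypothetical knob showing what happens if a core gets only the taken-branch timing wrong.

```python
LDX_IMM = 2          # LDX #n
DEX = 2              # DEX
BNE_NOT_TAKEN = 2    # BNE falls through
BNE_TAKEN = 3        # BNE taken, target on the same page (4 if it crosses a page)


def delay_loop_cycles(n, bne_taken=BNE_TAKEN):
    """Cycles for 'LDX #n / loop: DEX / BNE loop' on a 6502.

    The loop body runs n times (256 when n == 0, because X wraps from 0 to 255);
    the branch is taken on every pass except the last.
    """
    iterations = 256 if n == 0 else n
    return LDX_IMM + iterations * DEX + (iterations - 1) * bne_taken + BNE_NOT_TAKEN
```

With correct timing the loop takes 5n + 1 cycles (51 for n = 10). A core that takes a branch in 2 cycles instead of 3 gives 42, so a software delay would run about 18% short even though every instruction produces the right result.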
|
|
|
Hi Rob,

May I ask for more information on how you implemented the 6502? I am also working on a 6502 emulator, which I plan to implement in Verilog. Let's say I have already obtained 6502 Verilog code from somewhere on the web; how do I test it to find the bugs? The code is said to contain bugs.

Please advise.

Thanks,
EH

Rob Finch wrote:
> I have put together a cycle-accurate implementation of the 6502 processor.
|
|
|
--- Eng How Khoo wrote:
> Let's say I have already obtained 6502 Verilog code from somewhere on the web;
> how do I test it to find the bugs? The code is said to contain bugs.

Hi,

About testing Verilog cores... I'm currently working on a bridge from a Verilog simulator to Java (using VPI). It is at an early stage now and supports linking Icarus Verilog on Linux with Java. It allows the user to single-step or run the model, set/get each of its registers/nets, keep a trace log, and set breakpoints. Currently it does not have any GUI (only a command-line-like interface). The goal is to provide a set of GUI components (Java beans) that allow users to build processor-specific simulators in a flash, including source-level software debugging.

If you are interested, I can put it together and send it to you within a week (with all sources).

regards,
Tomasz Sztejka
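The command-line side of such a bridge (get/set signals, single-step, run to a breakpoint) could look something like the following Python sketch. It is purely hypothetical and is not Tomasz's tool: `FakeSim` stands in for the simulator side of a VPI bridge with a dictionary of named signals, and the command syntax (`get a`, `set a=0x2A`, `break 0x10`, `run`, `step`) is made up for illustration.

```python
class FakeSim:
    """Hypothetical stand-in for the simulator side of a VPI bridge."""

    def __init__(self):
        self.signals = {"pc": 0, "a": 0}
        self.time = 0

    def step(self):
        # A real bridge would advance the Verilog simulation by one clock here.
        self.time += 1
        self.signals["pc"] = (self.signals["pc"] + 1) & 0xFFFF


def run_command(sim, line, breakpoints):
    """Execute one text command against the simulator and return a reply string."""
    cmd, _, rest = line.strip().partition(" ")
    if cmd == "get":
        return f"{rest} = {sim.signals[rest]:#x}"
    if cmd == "set":
        name, _, value = rest.partition("=")
        sim.signals[name.strip()] = int(value, 0)  # accepts e.g. 42, 0x2A, 0b101
        return "ok"
    if cmd == "break":
        breakpoints.add(int(rest, 0))
        return "ok"
    if cmd == "step":
        sim.step()
        return f"t={sim.time}"
    if cmd == "run":
        limit = int(rest) if rest else 1000
        for _ in range(limit):
            sim.step()
            if sim.signals["pc"] in breakpoints:
                return f"break at pc={sim.signals['pc']:#x}, t={sim.time}"
        return f"stopped after {limit} steps"
    raise ValueError(f"unknown command: {cmd!r}")
```

Setting a breakpoint at pc 0x10 and issuing `run` stops after 16 simulated steps. In a real bridge, `step` would advance the Verilog simulation and `get`/`set` would read and write nets through VPI.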
|
|
|
Hi Tomasz Sztejka,

Can you explain more about the bridge? What is a bridge? Will the simulator only work in a Linux environment? What about Windows? Most of the computers at my university run Microsoft products.

I am very interested, thank you. By the way, is there any documentation on how to use the simulator?

regards,
EH
|
|
|
--- Eng How Khoo wrote:
> Hi Tomasz Sztejka,
>
> Can you explain more about the bridge? What is a bridge? Will the
> simulator only work in a Linux environment? What about Windows?
> Most of the computers at my university run Microsoft products.
>
> I am very interested, thank you.
> By the way, is there any documentation on how to use the simulator?

Icarus Verilog (http://icarus.com/eda/verilog/) is a free Verilog compiler plus simulator for Linux, Windows, etc. (it simulates the Verilog chip description language; as far as I know it does not simulate at the chip level, so it won't calculate timings, propagation delays, etc.).

Like all Verilog simulators, it supports VPI (Verilog Procedural Interface), which allows user-written C programs to interact with the simulation process. This can be anything; in my case it is "bridge" software that sends information about simulation progress and data to another program.

So my program is _not_ a simulator. It contains a C part, which interacts with the simulator, and a Java part for the UI. The C part is specific to Icarus Verilog and probably to Linux too: I didn't try to compile it with gcc under Windows since I don't use Windows at home anymore, and I did not try other Verilog simulators since I don't have them. The second part is the Java side, which will actually implement the GUI (well, currently only a command line, but I'm working on it). The Java side is completely portable and OS independent. It allows you to type get xxx, set xxx=yy or something like that. In the future the user will just have a regular GUI to work with, and programmers will have a toolbox of Java classes to tune it for their own needs.

As I mentioned, it is at an early stage so far (not for beginners: it is horrible if you don't know who is wrong, you or the software you use).

Anyway, if you'd like to put together your own Verilog environment, I would recommend:

- Use the free software from Xilinx or Altera. It is huge (>200 MB), slow and in my opinion buggy (especially the Verilog support in Xilinx's tools), but it runs on Windows and is free and complete. I recommend it to beginners equipped with powerful PCs.
- Use Icarus Verilog as the compiler/simulator. The simulator is command line: you start it, it dumps a log to a file and finishes. Very fast. No interaction. To view the results in a nice waveform format you can use GTKWave, available in most Linux distributions. To actually get the data to program a chip, you may submit your design to Xilinx's web-accessible online system and get it back in compiled form (I haven't got that far yet). The complete software is <20 MB. I recommend it for Linux freaks :)

regards,
Tomasz Sztejka
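Besides eyeballing waveforms in GTKWave, the value-change dump (VCD) file that the simulator writes (typically requested with `$dumpfile`/`$dumpvars` in the testbench) can be checked automatically against a reference model. A minimal Python VCD reader is sketched below. It handles only `$var` declarations, `#time` stamps, and scalar and vector value changes, and it ignores the scope hierarchy, so two signals with the same name in different modules would be merged. The function names are hypothetical.

```python
# Keywords that only delimit the dump data; value changes follow them directly.
_DUMP_MARKERS = {"$dumpvars", "$dumpall", "$dumpon", "$dumpoff", "$end"}


def _take_until_end(tokens):
    """Consume tokens up to and including the next $end; return the ones before it."""
    fields = []
    for tok in tokens:
        if tok == "$end":
            return fields
        fields.append(tok)
    raise ValueError("unterminated VCD section")


def parse_vcd(text):
    """Return {signal_name: [(time, value_string), ...]} from VCD text."""
    tokens = iter(text.split())
    id_to_name, changes, time = {}, {}, 0
    for tok in tokens:
        if tok == "$var":
            # $var <type> <width> <id_code> <reference> [<range>] $end
            _kind, _width, code, name, *_ = _take_until_end(tokens)
            id_to_name[code] = name
            changes.setdefault(name, [])
        elif tok.startswith("$"):
            if tok not in _DUMP_MARKERS:
                _take_until_end(tokens)  # $scope, $timescale, $comment, $enddefinitions, ...
        elif tok.startswith("#"):
            time = int(tok[1:])
        elif tok[0] in "bBrR":
            # Vector or real value: the identifier code is the next token.
            changes[id_to_name[next(tokens)]].append((time, tok[1:]))
        else:
            # Scalar value: one character (0, 1, x, z) followed by the identifier code.
            changes[id_to_name[tok[1:]]].append((time, tok[0]))
    return changes


def value_at(changes, name, t):
    """Value of `name` at time t (the last change at or before t), or None."""
    current = None
    for when, value in changes[name]:
        if when > t:
            break
        current = value
    return current
```

A testbench could dump the CPU's register outputs, and a script could then read them back with `parse_vcd` and compare them with the reference model's state at each instruction boundary.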