Forums

Writing a simple assembler

Started by Alex March 6, 2006
> First, choosing a good algorithm up front is not "premature > optimization." It's simply good design. > > Second, despite the best intentions (encapsulation and other good > software engineering methods), it's often not so simple to just replace > one symbol table search routine with another. > > Third, "a simple assembler" today may be a complex assembler tomorrow. > Better to do it right the first time around, especially as using a hash > table lookup isn't a whole lot more complex than a linear search. > > I regret the day I said to myself "heck, this is just a prototype, I'll > use a simple linear search right now and fix it in the final version." > 82 versions later and over 100,000 lines of code, I can attest that > this was the second worst design decision I made for my "prototype > assembler" (the #1 bad design decision was using Flex and Bison).
Randy, I tip my hat to you sir. It seems many here have forgotten the tenet of Software Engineering. I said it before and Ill say it now .. it SHOWS in the sort of software currently developed. Another thing that irks me is the "its done in this library so just use it" argument. No one stops to question WHETHER that library is doing it right/efficiently. Anyway I am getting off the point ...
Grant Edwards wrote:
> On 2006-03-08, Paul Keinanen <keinanen@sci.fi> wrote: > > >>> If it's just "simple assembler" why not just use a list with > >>> O(n) access time. > >> > >>Most 8bit micros have anywhere from 40 - 60 instructions (opcodes > >>NOT opcodes+addressing formats) > >> > >>O(n) searches on that set looks like a waste of cpu cyles > >> > >>> Better yet, use a real programming language with a dictionary > >>> type (Python, Smalltalk, whatever). > >> > >>There are somethings you can ONLY do in assembly ... and on an > >>8 bit micro (with 64k address space) every byte counts > > > > In the 1970's I wrote several cross-assemblers in Fortran running on > > PDP-11s (which have 64 KiB address space) for various processors such > > as 8080 and 1802. I never bothered to use anything more fancier than > > linear search for opcode or symbol table search, since the total time > > was dominated by the program load time, the source file read time (two > > passes) and the binary output file writing (or even punching to paper > > tape for downloading to the target :-). > > My point exactly. Worrying about hash tables for symbol lookup > reeks of premature optimization for "a simple assembler".
Well the hash lookups were not for premature optimization. I first decided on doing simple string parsing but I quickly realized it would be a nightmare to code. I am simply looking for a solution that is easy to code.
> > It would be very hard to write so huge modules for any small > > target processor that would require such a huge number of > > labels, that the inefficient symbol table search time would > > have been of any significance relative to the I/O times. > > I'm also surprised that somebody thinks they're going to use > C++ and generic hashing libraries on an 8-bit target with a 64K > address space.
That choice might have been overambitious, but I do feel I need to get a grasp of OOP programming. I must ask though why you guys are so adamant about using TCL or Python?! I have not started to code it yet, I am still in the planning phase. -------------------------------------------------------------------------------------------------------------- I am an EE student looking for summer employment in Toronto, Canada area If you have any openings please contact me at isaacb[AT]rogers[DOT]com.
samIam wrote:
> WOAH > the "thought police"!
There is no need to overreact to somebody pointing out a breach of Usenet etiquette. Doing so does, however, give us a clue to the personality we're dealing with.
> > > Please do not top-post. It loses all context, and makes no sense. > > > > You don't have to implement any collision detection, it is all done > > for you. > > completely missing the point. > the problem is the NEED for collision detection ... and having to > re-allocate twice the memory when the table is filled.
It's not "collision detection", it's chaining; it's not a "problem"; and if you want an O(1) lookup, hashes are for you. If not: there are a million other data structures you can use. Buy this book: http://dogbert.abebooks.com/servlet/SearchResults?an=knuth&tn=sorting Plonk. (And I *never* plonk.)
> > having it "hidden" or "done for you" doesnt resolve the issues above
Alex wrote:
> First, I would like to thank everyone for a response and advice. > > Second - the purpose was to write a simple assembler in order to generate > an op code on a PC > and then run it on my IC (fpga is used as a host controller). > I understand that the task is trivial for gurus, but being a novice in > this thing first thing that came to > my mind was simply to make two passes: first - preprocessor, detects all > the variables etc., second actual > translation - recognising mnemonics and generate an opcode . Obviously it > is not a "proper" way to do it > (grammar descriptions and so on..). That's why I was asking about some > examples and articles about this issue.
It can be argued that since you are in learning mode there is no 'wrong' way to go about it. Go forth and build your prototype; doing so is a great way to learn about languages and tools that might be related or make the task easier. :)
> > > > -- > Alex
On 2006-03-08, Isaac Bosompem <x86asm@gmail.com> wrote:

> I must ask though why you guys are so adamant about using TCL > or Python?! I have not started to code it yet, I am still in > the planning phase.
Because the sort of stuff you're planning wouldn't need planning in a high level language. Things like symbol lookups are just basic built-in operations. -- Grant Edwards grante@visi.com
> > About 30 years ago i wrote a (cross) assembler for the Intel 8080 > processor (8 bit) in Basic (from Hewlett Packard). > I believe i have somewhere still the listing of this program. > If you are very interested i will look what i still have and try to > scan it (i do not have it in electronic format, maybe as punch paper > tape (:-) >
Hi, I am interested in seeing your BASIC assembler; if it is on paper tape, I can read it (and return the tape with conversion of your choice). I imagine this predates RMB (Rocky Mountain Basic)? Regards, Michael Grigoni Cybertheque Museum
toby wrote:

> Alex wrote: > >>First, I would like to thank everyone for a response and advice. >> >>Second - the purpose was to write a simple assembler in order to generate >>an op code on a PC >>and then run it on my IC (fpga is used as a host controller). >>I understand that the task is trivial for gurus, but being a novice in >>this thing first thing that came to >>my mind was simply to make two passes: first - preprocessor, detects all >>the variables etc., second actual >>translation - recognising mnemonics and generate an opcode . Obviously it >>is not a "proper" way to do it >>(grammar descriptions and so on..). That's why I was asking about some >>examples and articles about this issue. > > > It can be argued that since you are in learning mode there is no > 'wrong' way to go about it. Go forth and build your prototype; doing so > is a great way to learn about languages and tools that might be related > or make the task easier. :)
Total agreement. In the total time spent posting on this thread, a simple assembler could have been written and debugged :) Myself, I'd just define some macros for MASM, the old Microsoft DOS assembler.
On Wed, 08 Mar 2006 17:34:20 -0000, Alex <al.lopich@gmail.com> wrote:

>First, I would like to thank everyone for a response and advice. > >Second - the purpose was to write a simple assembler in order to generate >an op code on a PC >and then run it on my IC (fpga is used as a host controller). >I understand that the task is trivial for gurus, but being a novice in >this thing first thing that came to >my mind was simply to make two passes: first - preprocessor, detects all >the variables etc., second actual >translation - recognising mnemonics and generate an opcode . Obviously it >is not a "proper" way to do it >(grammar descriptions and so on..). That's why I was asking about some >examples and articles about this issue.
A two pass assembler is definitely a practical way of doing a cross-assembler. The first pass must generate the correct amount of code for each instruction, in order to "detect" the locations of all branch target labels in the program. In the second pass do the same thing again and using the label addresses stored in the symbol table, generate the correct code (especially the branch/jump instructions). When running the assembler on a system with a huge memory (such as PC) just generate the code into memory with any forward jump/branch set to 0 and saving this location into a fix-up record. When the assembly is ready to calculate the address of the label, you have to make a fix-up patch at all those locations that referenced that label. Finally, write the patched code into a file or target. I would consider implementing a two pass cross assembler to be the simplest thing, just use brute force :-). If you are comfortable with a single pass assembler with fix-ups, you could write as well a linker or a linking loader. Paul
samIam wrote:
>
... snip ...
> > Another thing that irks me is the "its done in this library so > just use it" argument. No one stops to question WHETHER that > library is doing it right/efficiently.
If you are talking about my recommendation of hashlib, you are perfectly free to comment on the correctness and efficiency of that library. I claim it is both, although it may give up some slight efficiencies for ease of use. It is out there in source form, totally exposed to the winds of criticism. -- "If you want to post a followup via groups.google.com, don't use the broken "Reply" link at the bottom of the article. Click on "show options" at the top of the article, then click on the "Reply" at the bottom of the article headers." - Keith Thompson More details at: <http://cfaj.freeshell.org/google/> Also see <http://www.safalra.com/special/googlegroupsreply/>
Jim Stewart wrote:
> toby wrote: > > > Alex wrote: > > > >>First, I would like to thank everyone for a response and advice. > >> > >>Second - the purpose was to write a simple assembler in order to generate > >>an op code on a PC > >>and then run it on my IC (fpga is used as a host controller). > >>I understand that the task is trivial for gurus, but being a novice in > >>this thing first thing that came to > >>my mind was simply to make two passes: first - preprocessor, detects all > >>the variables etc., second actual > >>translation - recognising mnemonics and generate an opcode . Obviously it > >>is not a "proper" way to do it > >>(grammar descriptions and so on..). That's why I was asking about some > >>examples and articles about this issue. > > > > > > It can be argued that since you are in learning mode there is no > > 'wrong' way to go about it. Go forth and build your prototype; doing so > > is a great way to learn about languages and tools that might be related > > or make the task easier. :) > > Total agreement. In the total time spent > posting on this thread, a simple assembler > could have been written and debugged :) > > Myself, I'd just define some macros for > MASM, the old Microsoft DOS assembler.
Hi Jim, Now that is a technique I have heard of before! A gentleman Tom Evans clued me into this, and I hope he will allow me the liberty of quoting: There's two ways to "write an assembler in Macro 11". The IMP/PACE one I mentioned was "the classic version". It had the parser, symbol table management and everything, all lovingly coded in individual machine instructions. Lots of them. How redundant... The SC/MP cross assembler (and other ones I've worked on ... that generated microcode) consisted of MACROS. This is cheating big-time. The Macro-11 assembler is being abused to assemble and emit code for a different CPU, or sometimes not even for a CPU but for a ROM Sequencer or worse. The macros have the same names as the target CPU's op-codes and they simply generate the appropriate code, (ab)using the symbol table management built into Macro-11. As a huge benefit you can also use all of the powerful macro facilities in Macro-11. Try emulating all of that in lex/yacc. Of course if the targeted CPU uses opcodes with the same names as the ones the PDP-11 uses there's a bit of strife, ... Macro-11 isn't exactly fast when abused like this. It took about 5 minutes to make [a] 1023-byte ROM ... Kids these days have it easy! Lex! Yacc! Puts me in mind of: http://www.phespirit.info/montypython/four_yorkshiremen.htm "SECOND YORKSHIREMAN: Luxury. We used to have to get out of the lake at six o'clock in the morning, clean the lake, eat a handful of 'ot gravel, work twenty hour day at mill for tuppence a month, come home, and Dad would thrash us to sleep with a broken bottle, if we were lucky!" --Toby