Forums

Writing a simple assembler

Started by Alex March 6, 2006
On Mon, 06 Mar 2006 16:19:24 -0000, Alex <al.lopich@gmail.com> wrote:

>Hi, > >(since asm conf. is dead, i post it here, maybe someone can help) > >In my current project , i have to create i simple assembler for an 8-bit >processor. >Since i am a novice at language writing, I would like to ask if somebody >can direct to some articles >on this topic, or where to start looking at. >Thank for help.
Hi, About 30 years ago i wrote a (cross) assembler for the Intel 8080 processor (8 bit) in Basic (from Hewlett Packard). I believe i have somewhere still the listing of this program. If you are very interested i will look what i still have and try to scan it (i do not have it in electronic format, maybe as punch paper tape (:-) As far as i remember it is a two pass assembler with simple macro's. Bu.
Grant Edwards <grante@visi.com> wrote:

> Better yet, use a real programming language with a dictionary > type (Python, Smalltalk, whatever).
Why such overkill? I held back this long, but now there's no longer any way of avoiding it. It has to be mentioned now, to put an end to this discussion Henry Spencer's "Amazing Awk Assembler" And yes, you can still find that beast out there, if you want to. -- Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de) Even if all the snow were burnt, ashes would remain.
Alex wrote:
> Hi, > > (since asm conf. is dead, i post it here, maybe someone can help) > > In my current project , i have to create i simple assembler for an 8-bit > processor. > Since i am a novice at language writing, I would like to ask if somebody > can direct to some articles > on this topic, or where to start looking at. > Thank for help. > > > -- > Alex
HI Alex You might look at the example of an assembler for the 8080, written by John Cassidy, in the language Forth. Although, it is only a single pass assembler, it includes control structures. The part that really makes it worth while is that the entire source code fits easily on a single 8.5X11 sheet of paper. This makes it easy to understand and easy to modify. With a little more modifications, it can still be a single pass assembler with forward referencing. Still, it is possible to handle all of ones referencing problems in a simple fast single pass assembler. The other thing to remember is that the order you assemble into the address space does not need to be sequencial. The simple control structures like 'if-then-else' help to make your assembly source code more readable and reduce the number of simple entry point labels that tend to clutter most assembly source. Last is that you have the Forth language in the background to create macro functions to any complexity that you like. Dwight
samIam wrote:
> > If it's just "simple assembler" why not just use a list with > > O(n) access time. > > Most 8bit micros have anywhere from 40 - 60 instructions (opcodes > NOT opcodes+addressing formats) > > O(n) searches on that set looks like a waste of cpu cyles
Not necessarily. For example, my old Athlon XP 1800 does a strcmp() between two matching 4-char strings in 17 nanoseconds. On average, a 4 char opcode can be found in 0.5 microseconds by linear search in array of 60 entries. In order to compensate for 10 minutes of time spent implementing a better search, you'll need to run 1.2 billion lines of code through the assembler.
samIam wrote:
> I have never understood the obsession with hash tables. > > SO WHAT they provide O(1) search/insert times ... but they are FIXED in > size ... and you often have to implement collision detection to get > around soem problems.
Chaining is a well studied and simple idea, see, for instance Knuth.
> > A red-and-black tree is a better compromise ... a data structure that > "grows" with the program run, and offers reasonable search and insert > times (Olog(n)?) > > There is no way to know how many labels/symbols exist in your asm source > file ... and JUST choosing a hash table size and then reallocating twice > that size whenever it fills up ... isnt a sound method of implementing a > symbol table in my honest opinion.
There is an informed discussion of symbol table implementations at http://groups.google.com/group/comp.compilers/browse_thread/thread/d44b7c420ab50e28?tvc=2 Hash tables are more efficient. See Anton Ertl's post: Resizing is an O(n) operation (where n is the no. of entries in the hash table at the time of resizing), but you resize only once every O(n) elements, so the amortized cost of resizing per insertion is O(n/n)=O(1); of course, there is no resizing cost when you are only doing lookups.
> > For an opcode table (which is fixed in size as in you know the number > of opcodes in the micro) a hash table makes more sense. > > > CBFalconer wrote: > > > Isaac Bosompem wrote: > > > > ... snip ... > > > >>I do need an efficient hashing function, with a low probability of > >>collisions (ideally). I will need to code to handle collisions. > >> > >>Anyways I am open to new ideas hopefully you have some of your own > >>to add. > > > > > > You are welcome to use hashlib, which is under GPL licensing, and > > was designed to fill just this sort of need. It is written in pure > > ISO C. > > > > You could open one table and stuff it with the opcodes. Another > > could hold the macros, and yet another the symbols. The tables > > will all use the same re-entrant hash code, yet can have widely > > different content. > > > > <http://cbfalconer.home.att.net/download/hashlib.zip> > >
On 2006-03-07, samIam <him@here.com> wrote:
>> If it's just "simple assembler" why not just use a list with >> O(n) access time. > > Most 8bit micros have anywhere from 40 - 60 instructions (opcodes > NOT opcodes+addressing formats) > > O(n) searches on that set looks like a waste of cpu cyles
Who cares? It's an _assembler_. It'll be run at most a few dozen times a day. It's not the kernel's scheduler.
>> Better yet, use a real programming language with a dictionary >> type (Python, Smalltalk, whatever). > > There are somethings you can ONLY do in assembly ... and on an 8 bit > micro (with 64k address space) every byte counts
Ah, the assembler is running on an 8-bit processor. I guess I missed that. I assumed the assembler would be running on a Linux or Windows host. -- Grant Edwards grante Yow! Kids, don't gross me at off... "Adventures with visi.com MENTAL HYGIENE" can be carried too FAR!
CBFalconer wrote:
> Isaac Bosompem wrote: > > > ... snip ... > > > > I do need an efficient hashing function, with a low probability of > > collisions (ideally). I will need to code to handle collisions. > > > > Anyways I am open to new ideas hopefully you have some of your own > > to add. > > You are welcome to use hashlib, which is under GPL licensing, and > was designed to fill just this sort of need. It is written in pure > ISO C. > > You could open one table and stuff it with the opcodes. Another > could hold the macros, and yet another the symbols. The tables > will all use the same re-entrant hash code, yet can have widely > different content. > > <http://cbfalconer.home.att.net/download/hashlib.zip> > > -- > "If you want to post a followup via groups.google.com, don't use > the broken "Reply" link at the bottom of the article. Click on > "show options" at the top of the article, then click on the > "Reply" at the bottom of the article headers." - Keith Thompson > More details at: <http://cfaj.freeshell.org/google/> > Also see <http://www.safalra.com/special/googlegroupsreply/>
Hi CBFalconer, WOW, your package will again make my life a lot easier. Thanks! I will definitely make use of it! Also I am writing for the PC, I was not aware that the OP was writing for the CPU itself (well that is what I have gotten from the replies posted). Also guys, I didnt know of any other methods to implement it, I am definitely going to stay away from parsing the text directly because it would be fairly difficult to code. What is a red - green tree? I know of a binary tree used in Huffman Compression and the like, is this data structure similar? -------------------------------------------------------------------------------------------------------------- I am an EE student looking for summer employment in Toronto, Canada area If you have any openings please contact me at isaacb[AT]rogers[DOT]com.
> In order to compensate for 10 minutes of time spent implementing a > better search, you'll need to run 1.2 billion lines of code through the > assembler.
This attitude is why there isnt much thought given to softare development nowadays and its clearly reflected in the output and the drive for faster processors and huge amounts of memory/resources
> Resizing is an O(n) operation (where n is the no. of entries in the > hash table at the time of resizing), but you resize only once every > O(n) elements, so the amortized cost of resizing per insertion is > O(n/n)=O(1); of course, there is no resizing cost when you are only > doing lookups.
Biggest assumption you are making here is that THERE IS SPACE TO ALLOCATE/RESIZE. Your assembler would not run on the target processor with that mentality. It would exist only as a cross development tool on a system with the resources to house it. Software should be designed much better than that. But hey, what do I know? Nothing apparently!
Alex wrote:
> Thank you for your comment, you gave me a lot of food for thought. > I think that in any case the main problem is to write an efficient hash > function. > > To make my life easier I was initially thinking about writing my asm in a > scripting language, > since it make string processing rather simple. For now that would work (in > the other case in will be necessary to > describe the syntax via BNF). > Can you explain why it is so necessary for you to use hash table in step 2 > (apart from speed gain). > Alex > > On Tue, 07 Mar 2006 03:30:00 -0000, Isaac Bosompem <x86asm@gmail.com> > wrote: > > > > > Alex wrote: > >> Hi, > >> > >> (since asm conf. is dead, i post it here, maybe someone can help) > >> > >> In my current project , i have to create i simple assembler for an 8-bit > >> processor. > >> Since i am a novice at language writing, I would like to ask if somebody > >> can direct to some articles > >> on this topic, or where to start looking at. > >> Thank for help. > >> > >> > >> -- > >> Alex > > > > OK here is my general idea: > > > > What I have decided to do was to split the assembler into classes. I > > figured this would be a good time to do something useful and advance my > > C++ knowledge and methodologies. > > > > 1. Preprocessor > > Basically takes equates and macros and replaces them with their > > equivalent. This alone will require a pass across the entire file. > > > > > > 1. File Tokenizer > > Once the preprocessor has finished, the assembly will start to take > > place. This respective class will take the file and divide it into > > tokens (with selected delimiters). Its sole purpose is to maintain the > > position in the file, memory associated with it etc. > > > > 2. Global Hash Table > > > > This portion was added to make my life easier. I decided that instead > > of parsing the data through text, which will make my job a living hell, > > I decided to use a global hash table to keep track of strings. This way > > I can just do a table lookup of the hashed token, look up type flags > > and determine what to do from there (assemble an opcode, report error, > > incompatible types, etc.) > > > > 3. Opcode hash list > > > > This portion really is the portion of the program which contains hash > > values for each of the opcode words (minus size suffixes), of course > > everything will be put in uppercase before its hash value is taken. > > > > 4. Opcode assembler. > > > > Takes an opcode from the hash list, parses parameters and assembles the > > respective opcode, based on the size of the variable (if operated), the > > register operands, etc. > > > > This is my general idea, of course I havent started to code it yet as I > > am still planning it. > > > > I do need an efficient hashing function, with a low probability of > > collisions (ideally). I will need to code to handle collisions. > > > > Anyways I am open to new ideas hopefully you have some of your own to > > add. > > > > -Isaac > > > > -------------------------------------------------------------------------------------------------------------- > > I am an EE student looking for summer employment in Toronto, Canada > > area > > If you have any openings please contact me at isaacb[AT]rogers[DOT]com. > > > > > > -- > Alex
Hi Alex, I decided to use a hash table for step 2 to make life easy. I originally thought of using a lookup table based on the first letter within the string but I found it to be too complicated. With a hash value, I can read the asm source, grab a token, run a hash on it and look it up. It was done for simplicity, I didn't initially believe there would be any speed improvements because you will have to do a lookup of the hash value in the table to check if it is a valid symbol and decide from there. I don't know any scripting languages. I guess that is a weakness (and I am also not that great with analog electronics as well) I do know VB though haha. I dont know Python or TCL for that matter. I will pick them up eventually, but the way I learn best is if I use the tool to make something useful with it. If you do decide to write the assembler in Python, please let me know. -Isaac -------------------------------------------------------------------------------------------------------------- I am an EE student looking for summer employment in Toronto, Canada area If you have any openings please contact me at isaacb[AT]rogers[DOT]com.