Writing a simple assembler

Started by Alex March 6, 2006
Hi,

(Since asm conf. is dead, I am posting here; maybe someone can help.)

In my current project, I have to create a simple assembler for an 8-bit
processor.
Since I am a novice at language writing, I would like to ask if somebody
can direct me to some articles
on this topic, or suggest where to start looking.
Thanks for your help.


-- 
Alex
Alfred Arnold's AS is here:
http://john.ccac.rwth-aachen.de:8000/as/download.html
It covers many microcontrollers.

More complex is Randall Hyde's HLA:
http://webster.cs.ucr.edu/AsmTools/HLA/index.html

LatticeSemi's downloads for the small Mico8 SoftCPU include an assembler source, but it is simpler than AS, which now supports Mico8.

-jg

Hey you have to get this book:

Language Translators: Assemblers, Compilers, and Interpreters
by John Zarrella
ISBN: 0935230068

Used copies on Amazon go for around $4 and change.

I bought it to give me a heads-up for a project I was working on at
home. I needed to write an assembler for an 8-bit micro as well and
needed to brush up on the different modules that make up such a task.

It's rich in both theory and practice!
(versus the "dragon book", which is all theory and no practical
  examples)


My favoured tools are lex/yacc (flex/bison). Source code for my assembler for simple architectures (non-macro, though) is here:
http://www.telegraphics.com.au/svn/dpa/trunk/
Notes are here:
http://www.telegraphics.com.au/sw/info/dpa.html

I am writing an assembler for a 16-bit soft-core CPU I made for an FPGA in
VHDL.

I have a general idea of how I am going to attack this (I don't have any
theoretical books).

I will outline my plans to you a bit later. Hopefully, people will
offer some critique and we can learn together :).

OK, here is my general idea:

What I have decided to do is split the assembler into classes. I figured this would be a good time to do something useful and advance my C++ knowledge and methodologies.

1. Preprocessor
   Basically takes equates and macros and replaces them with their equivalents. This alone will require a pass across the entire file.

2. File Tokenizer
   Once the preprocessor has finished, assembly proper starts. This class takes the file and divides it into tokens (using selected delimiters). Its sole purpose is to maintain the position in the file, the memory associated with it, etc.

3. Global Hash Table
   This was added to make my life easier. Instead of comparing tokens as text, which would make my job a living hell, I decided to use a global hash table to keep track of strings. This way I can just look up the hashed token, check its type flags and determine what to do from there (assemble an opcode, report an error, flag incompatible types, etc.).

4. Opcode Hash List
   This is the part of the program that contains the hash values for each of the opcode words (minus size suffixes). Of course, everything will be converted to uppercase before its hash value is taken.

5. Opcode Assembler
   Takes an opcode from the hash list, parses the parameters and assembles the respective opcode, based on the size of the variable (if operated on), the register operands, etc.

This is my general idea. Of course I haven't started to code it yet, as I am still planning it.

I do need an efficient hashing function, with a low probability of collisions (ideally). I will need code to handle collisions.

Anyway, I am open to new ideas; hopefully you have some of your own to add.

-Isaac

I am an EE student looking for summer employment in the Toronto, Canada area.
If you have any openings please contact me at isaacb[AT]rogers[DOT]com.

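The tokenizer and opcode-lookup steps in the plan above could be sketched roughly as follows. This is only an illustration in Python rather than the C++ classes Isaac describes: the opcode values, the `.B`/`.W` size suffixes, the `$` hex prefix, the `;` comment character and the default operand size of 1 are all assumptions, and a Python dict stands in for the hand-written hash table.

```python
import re

# Hypothetical opcode table: mnemonic (uppercase, no size suffix) -> base opcode.
OPCODES = {"MOV": 0x10, "ADD": 0x20, "JMP": 0x30}
# Hypothetical size suffixes, written as MOV.B / MOV.W; value is size in bytes.
SIZES = {"B": 1, "W": 2}

# Identifiers (optionally with a dot suffix), $hex numbers, decimal numbers,
# and a handful of single-character delimiters.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*|\$[0-9A-Fa-f]+|\d+|[,:#()+\-]")


def tokenize(line):
    """Strip a ';' comment and split the rest of the line into tokens."""
    code = line.split(";", 1)[0]
    return TOKEN_RE.findall(code)


def classify(token):
    """Return (kind, value) for one token, using table lookups only."""
    # Uppercase first and strip the size suffix before the opcode lookup.
    name, _, suffix = token.upper().partition(".")
    if name in OPCODES and (suffix == "" or suffix in SIZES):
        # Assumed default: size 1 when no suffix is given.
        return ("opcode", (OPCODES[name], SIZES.get(suffix, 1)))
    if token.startswith("$"):
        return ("number", int(token[1:], 16))
    if token.isdigit():
        return ("number", int(token))
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", token):
        return ("symbol", token)
    if token in ",:#()+-":
        return ("punct", token)
    return ("error", token)


if __name__ == "__main__":
    for tok in tokenize("loop:  mov.w r1, $1F   ; load a constant"):
        print(tok, classify(tok))
```

Keeping classification as a pure table lookup means that adding an instruction only touches `OPCODES`, which is the point of steps 3 and 4 in the plan.
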
One method I have used is to write out the syntax of your assembly language in BNF, then augment the BNF by appending to each rule the name of a function to be called when that rule is matched (i.e. fires). Let's call this augmented syntax BNF+.

Once you have the core BNF engine running, it can build the assembler for you (usually called a compiler-compiler, but this is an assembler). You write the syntax of BNF itself, in BNF+. Feed this as input to the engine, and get out a set of control tables. Drive the engine with these, and it will read anything in BNF+ (e.g. your assembly language).

Besides the BNF engine, you need a symbol table (indexing method of your choice), and the back-end code generation functions.

You can do this without a tokeniser, by defining a token in BNF and letting the main engine do the work. This is simple, but slow.

The following examples are not fully functional, but should give the flavour of the thing.

Fragment of a BNF+ syntax file:

    * rule      elements                 semantic function
    file      = {oneline} [eof] .
    oneline   = line ending .            NextLine
    *-----------------------------------------------
    * Basic elements
    eof       = #26 .
    eol       = {ws} [comment] #10 .
    ws        = wsc {wsc} .
    wsc       = " " | #9 .
    digit     = ("0".."9") .
    uphex     = ("A".."F") .
    lowhex    = ("a".."f") .
    upper     = ("A".."Z") .
    lower     = ("a".."z") .
    hexchar   = digit |                  HexValueDigit
                uphex |                  HexValueUpper
                lowhex .                 HexValueLower
    letter    = upper | lower .
    symch     = letter | digit | "_" .

Top-level function of the BNF+ engine:

    // Syntax "engine": the core of the system

    #define SP        current.syntab[current.synptr]
    #define GETCHR    {currch = getchr(&current);}
    #define NEXTCH    {lastch = currch; current.fileptr++; \
                       GETCHR; if(currch=='\n') current.linum++;}
    #define GOTOCH(x) {current.fileptr = x; GETCHR}
    #define EXIT      {if(init.filename) fclose(init.input); \
                       return(current);}

    state engine(state init)
    {
        state current;      // Local copy (we may want to back-up)
        state temp;         // Workspace for recursive calls
        WORD rule;
        char error[100];
        char currch;

        init.depth++;
        if(init.filename)   // Open new file...
        {
            if(!(init.input = fopen(init.filename, "rb")))
            {
                sprintf(error,"Fatal: cannot open file %s\n", init.filename);
                fatal(error);
            }
            init.fileptr = 0;   // Start at BOF
            init.linum = 1;
            getchr(NULL);       // Force a re-read
        }
        current = init;
        current.filename = NULL;    // File dealt with
        GETCHR;
        do
        {
            rule = SP;
            currstate = &current;   // Expose it for logging
    #if LOGSTATE
            {
                static int ctr = 0;
                char zz[8];
                if((currch >= 0x20) && (currch <= 0x7e))
                    sprintf(zz,"<%c> ",currch);
                else
                    sprintf(zz,"0x%02x", currch);
                fprintf(logfile,"%5d %3d%6d [%4d] %s [%4d] %c\n",ctr++,
                        init.depth, rule,current.synptr, zz, current.fileptr,
                        (current.match) ? 'T' : 'F');
                fflush(logfile);
            }
    #endif
            if(rule < 0)    // SUB RULE
            {
                if(current.match)
                {
                    temp = current;
                    temp.synptr = -rule;    // Rule address
                    temp = engine(temp);
                    current.match &= temp.match;
                    if(temp.match)
                        GOTOCH(temp.fileptr);
                }
            }
            else
            {
                if(rule < 200)  // LITERAL CHAR
                {
                    current.match &= ((char)rule == currch);
                    if(current.match)
                        NEXTCH
                }
                else switch(rule)
                {
                case 202:   // '|' ALTERNATE RULE
                    if(!current.match)
                    {   // Last test failed: back-up & try next
                        current.match = init.match;
                        current.fileptr = init.fileptr;
                        current.synptr++;   // To start of new rule
                        break;
                    }
                    // Else, fall through (rule succeeded)
                case 201:   // '.' END OF RULE
                    current.synptr++;       // Point at semantic
                    if(current.match)
                    {
    #if LOGSEMANTIC
                        char zz[8];
                        if((lastch >= 0x20) && (lastch <= 0x7e))
                            sprintf(zz,"<%c> ",lastch);
                        else
                            sprintf(zz,"0x%02x", lastch);
                        if(SP)
                            fprintf(logfile," Semantic[%4d] %s\n",SP,zz);
                        fflush(logfile);
    #endif
                        semantics[SP]();    // Semantic function (zero is valid)
                    }
                    EXIT;
                case 203:   // '{'
                case 204:   // '['
                    temp = current;
                    temp.synptr++;
                    temp = engine(temp);
                    if(temp.match)  // If it worked, swallow the chars.
                        GOTOCH(temp.fileptr);
                    if(current.syntab[temp.synptr] != 2+SP)
                    {
                        sprintf(error,"Fatal: bracket imbalance at %d:%d\n",
                                temp.synptr, current.synptr);
                        fatal(error);
                    }
                    if(temp.match && (current.syntab[temp.synptr] == 205))
                        continue;   // {...} repeats indefinitely
                    else
                        current.synptr = temp.synptr;
                    break;
                case 205:   // '}'
                case 206:   // ']'
                    EXIT;
                case 207:   // RANGE
                    current.match &=
                        ((currch >= (char)current.syntab[current.synptr+1]) &&
                         (currch <= (char)current.syntab[current.synptr+2]));
                    current.synptr += 2;
                    if(current.match)
                        NEXTCH;
                    break;
                case 208:   // '@' - include file
                    temp = current;
                    temp.filename = Identifier;
                    temp.fileptr = 0;   // Start at BOF of inner file
                    current.synptr++;   // Point to nested rule
                    temp.synptr = -SP;
                    temp = engine(temp);    // Run the nested file
                    current.match &= temp.match;
                    break;              // Do not fall through to the error case
                default:
                    {
                        sprintf(error,
                                "Fatal: unrecognised syntax code at [%d] = %d\n",
                                current.synptr, rule);
                        fatal(error);
                    }
                }
            }
            if(!current.match)  // If this rule failed, back-up
                GOTOCH(init.fileptr)
            do
            {
                current.synptr++;   // Next syntax item (skip if tests have failed)
            } while(!(current.match || (SP > 200)));
        } while(1);     // End the DO loop
    }

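For readers who want the idea without the compiled control tables, here is a much smaller sketch of the same technique: grammar rules as data, each alternative optionally naming a semantic function that is called when it matches. It is written in Python and is not a port of the engine above; the tuple encoding, the `Engine` class and the `hexnum` rule are assumptions made up for this illustration. Only the `digit`/`uphex`/`lowhex`/`hexchar` rules and the `HexValue*` names come from the BNF+ fragment.

```python
# Element kinds (illustrative encoding, not the original control tables):
#   ("lit", "x")         literal character
#   ("range", "0", "9")  character range
#   ("rule", "digit")    reference to another rule
#   ("rep", elem)        {elem}: zero or more
#   ("opt", elem)        [elem]: zero or one
# Each rule is a list of alternatives: (elements, semantic function name or None).
GRAMMAR = {
    "digit":  [([("range", "0", "9")], None)],
    "uphex":  [([("range", "A", "F")], None)],
    "lowhex": [([("range", "a", "f")], None)],
    "hexchar": [
        ([("rule", "digit")],  "HexValueDigit"),
        ([("rule", "uphex")],  "HexValueUpper"),
        ([("rule", "lowhex")], "HexValueLower"),
    ],
    # Hypothetical rule: "$" followed by one or more hex characters.
    "hexnum": [([("lit", "$"), ("rule", "hexchar"),
                 ("rep", ("rule", "hexchar"))], None)],
}


class Engine:
    def __init__(self, grammar):
        self.grammar = grammar
        self.value = 0  # accumulator updated by the semantic functions

    # Semantic functions: each receives the text its alternative matched.
    def HexValueDigit(self, s):
        self.value = self.value * 16 + ord(s) - ord("0")

    def HexValueUpper(self, s):
        self.value = self.value * 16 + ord(s) - ord("A") + 10

    def HexValueLower(self, s):
        self.value = self.value * 16 + ord(s) - ord("a") + 10

    def match_elem(self, elem, text, pos):
        """Return the position after the element, or None if it fails."""
        kind = elem[0]
        if kind == "lit":
            return pos + 1 if text[pos:pos + 1] == elem[1] else None
        if kind == "range":
            ch = text[pos:pos + 1]
            return pos + 1 if ch and elem[1] <= ch <= elem[2] else None
        if kind == "rule":
            return self.match_rule(elem[1], text, pos)
        if kind == "opt":
            end = self.match_elem(elem[1], text, pos)
            return pos if end is None else end
        if kind == "rep":
            while True:
                end = self.match_elem(elem[1], text, pos)
                if end is None or end == pos:
                    return pos
                pos = end
        raise ValueError(f"unknown element kind {kind!r}")

    def match_rule(self, name, text, pos):
        """Try each alternative in order; back up to pos when one fails."""
        for elements, semantic in self.grammar[name]:
            end = pos
            for elem in elements:
                end = self.match_elem(elem, text, end)
                if end is None:
                    break
            else:
                if semantic:
                    getattr(self, semantic)(text[pos:end])
                return end
        return None


if __name__ == "__main__":
    engine = Engine(GRAMMAR)
    print(engine.match_rule("hexnum", "$1aF,", 0), hex(engine.value))
```

In this sketch a semantic function fires as soon as its alternative matches, so its side effects remain even if an enclosing rule fails later. A real assembler would need to account for that, for example by keeping semantic actions on rules that cannot be backed out of.
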
Isaac Bosompem wrote:
>
... snip ...
>
> I do need an efficient hashing function, with a low probability of
> collisions (ideally). I will need to code to handle collisions.
>
> Anyways I am open to new ideas hopefully you have some of your own
> to add.

You are welcome to use hashlib, which is under GPL licensing, and was designed to fill just this sort of need. It is written in pure ISO C.

You could open one table and stuff it with the opcodes. Another could hold the macros, and yet another the symbols. The tables will all use the same re-entrant hash code, yet can have widely different content.

<http://cbfalconer.home.att.net/download/hashlib.zip>

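To make the "one hash routine, several independent tables" idea concrete, here is a minimal sketch in Python. It does not use or imitate hashlib's C API; the `ChainedTable` class and its `put`/`get` methods are hypothetical names, the hash is 32-bit FNV-1a (my choice, not something named in the thread), and collisions are handled by chaining within each bucket.

```python
def fnv1a_32(key: str) -> int:
    """32-bit FNV-1a hash of a string's UTF-8 bytes."""
    h = 0x811C9DC5                          # FNV offset basis
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # FNV prime, truncated to 32 bits
    return h


class ChainedTable:
    """Fixed-size hash table; colliding keys share a bucket (a list)."""

    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self.buckets[fnv1a_32(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                    # existing key: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))         # new key, or a collision: chain it

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


# Three independent tables sharing the same hash code.
opcodes, macros, symbols = ChainedTable(), ChainedTable(), ChainedTable()

if __name__ == "__main__":
    opcodes.put("MOV", 0x10)
    symbols.put("loop", 0x0040)
    print(opcodes.get("MOV"), symbols.get("loop"), macros.get("MOV"))
```

Because the full key is stored alongside the value, a collision costs only a short scan of one bucket and never returns the wrong entry. In Python itself a plain dict would normally do this job; the explicit table is only here to show the mechanism.
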
If you know a scripting language like Perl or Python, you can make something fairly quickly. I've made a simple Perl assembler script for my small FPGA-based CPU in about 200 lines of code.

For simple projects, a line-by-line translation works fine. Match each line against a set of patterns, translate the mnemonic to an opcode through a hash, and call a subroutine to fill in the operand bits. Use two passes. In the first pass, you build the symbol table (a hash), and in the second pass, you generate the code. Or store all the code in an array, and dump it after the second pass.

If you need more advanced features like macros, things become a bit more complicated, and then a properly written assembler using lex/yacc may be a better choice.

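The two-pass, line-by-line approach described above might look roughly like this in Python. It is a sketch under stated assumptions, not the Perl script mentioned in the post: the instruction set (one opcode byte plus at most one operand byte), the `label:` syntax and the `;` comment character are all invented for the example.

```python
import re

# Hypothetical instruction set: mnemonic -> (opcode byte, operand byte count).
OPCODES = {"NOP": (0x00, 0), "LDA": (0x10, 1), "ADD": (0x20, 1), "JMP": (0x30, 1)}

# Optional "label:", optional mnemonic, optional single operand.
LINE_RE = re.compile(r"^\s*(?:(\w+):)?\s*(?:(\w+)(?:\s+(\S+))?)?\s*$")

PROGRAM = """
start:  LDA 5
        ADD 0x03
        JMP end      ; forward reference, resolved in pass 2
        NOP
end:    JMP start
"""


def parse(source):
    """Yield (label, mnemonic, operand) for each non-empty line."""
    for raw in source.splitlines():
        m = LINE_RE.match(raw.split(";", 1)[0])   # drop the comment first
        if m is None:
            raise SyntaxError(f"cannot parse line: {raw!r}")
        label, mnemonic, operand = m.groups()
        if label or mnemonic:
            yield label, mnemonic.upper() if mnemonic else None, operand


def assemble(source):
    lines = list(parse(source))

    # Pass 1: walk the program, tracking addresses, to build the symbol table.
    symbols, address = {}, 0
    for label, mnemonic, _ in lines:
        if label:
            symbols[label] = address
        if mnemonic:
            if mnemonic not in OPCODES:
                raise ValueError(f"unknown mnemonic {mnemonic}")
            address += 1 + OPCODES[mnemonic][1]

    # Pass 2: every label is known now, so operands can be filled in.
    code = bytearray()
    for _, mnemonic, operand in lines:
        if not mnemonic:
            continue
        opcode, nbytes = OPCODES[mnemonic]
        code.append(opcode)
        if nbytes:
            if operand is None:
                raise ValueError(f"{mnemonic} needs an operand")
            value = symbols[operand] if operand in symbols else int(operand, 0)
            code.append(value & 0xFF)
    return bytes(code)


if __name__ == "__main__":
    print(assemble(PROGRAM).hex(" "))
```

The `JMP end` line is why the first pass exists: when that line is read, `end` has not been defined yet, so its address can only be filled in once the whole file has been scanned.
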
Well, that's what I thought initially. I know Tcl and have used it for parsing
HDL code, so this seems to be an easy and fast solution for the given
problem.

I just want to clarify a few things: do I really need two passes (leaving the
preprocessor aside for now)
if I don't have conditional branches and jumps?

At the same time, I was thinking about implementing a feature that
merges a few sequential instructions,
provided they can be executed simultaneously. Obviously it is possible
to just declare new
instructions, but that would not look as tidy.

Anyway, thank you for help.
Alex
