Writing a simple assembler

Hi,

(Since the asm conf. is dead, I'm posting this here; maybe someone can help.)

In my current project, I have to create a simple assembler for an 8-bit
processor. Since I am a novice at language writing, I would like to ask if
somebody can direct me to some articles on this topic, or tell me where to
start looking.
Thanks for the help.


-- 
Alex

Reply by Jim Granville, March 6, 2006

Alfred Arnold's AS is here
http://john.ccac.rwth-aachen.de:8000/as/download.html

This covers many microcontrollers.

A more complex option is Randall Hyde's HLA:
http://webster.cs.ucr.edu/AsmTools/HLA/index.html

Lattice Semiconductor's downloads for the small Mico8 soft CPU include
assembler source code, but it is simpler than AS, which now also
supports Mico8.

-jg

Reply by samIam, March 6, 2006

Hey, you have to get this book:

Language Translators: Assemblers, Compilers, and Interpreters
by John Zarrella
ISBN: 0935230068

Used copies on Amazon go for around $4 and change.

I bought it to get a head start on a project I was working on at
home... I needed to write an assembler for an 8-bit micro as well, and
needed to brush up on the different modules such a task comprises.

It's rich in both theory and practice!
(versus the "dragon book", which is all theory and no practical
examples)

Reply by toby, March 6, 2006

My favoured tools are lex/yacc (flex/bison). Source code to my
assembler for simple architectures is here (non-macro though):
http://www.telegraphics.com.au/svn/dpa/trunk/
Notes here http://www.telegraphics.com.au/sw/info/dpa.html

Reply by Isaac Bosompem, March 6, 2006

I am writing an assembler for a 16-bit soft-core CPU that I designed in
VHDL for an FPGA.

I have a general idea of how I am going to attack this (I don't have any
theoretical books).

I will outline my plans to you a bit later. Hopefully, people will
offer some critique and we can learn together :).

Reply by Isaac Bosompem, March 6, 2006

OK, here is my general idea:

I have decided to split the assembler into classes. I figured this would
be a good time to do something useful and advance my C++ knowledge and
methodologies.

1. Preprocessor
Takes equates and macros and replaces them with their equivalents. This
alone will require a pass across the entire file.

2. File tokenizer
Once the preprocessor has finished, the actual assembly starts. This class
takes the file and divides it into tokens (using selected delimiters). Its
sole purpose is to maintain the position in the file, the memory
associated with it, etc.

3. Global hash table
This portion was added to make my life easier. Instead of parsing the data
as text, which would make my job a living hell, I decided to use a global
hash table to keep track of strings. This way I can just do a table lookup
of the hashed token, look up its type flags and decide what to do from
there (assemble an opcode, report an error, flag incompatible types,
etc.).

4. Opcode hash list
This part of the program contains the hash values for each of the opcode
mnemonics (minus size suffixes). Of course, everything will be converted
to uppercase before its hash value is taken.

5. Opcode assembler
Takes an opcode from the hash list, parses its parameters and assembles
the respective opcode, based on the size of the variable (if one is
operated on), the register operands, etc.
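The tokenizer stage in the plan above could look something like the
following C sketch. It assumes a hypothetical line syntax of the form
"label: MNEMONIC op1, op2 ; comment"; the asm_line struct, split_line and
MAX_OPERANDS are illustrative names, not part of Isaac's design. It splits
one line in place and uppercases the mnemonic so it is ready for hashing.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_OPERANDS 2            /* hypothetical limit for this sketch */

/* Hypothetical result of splitting one line. The pointers refer into the
   caller's buffer, which split_line modifies in place. */
struct asm_line {
    char *label;                  /* NULL if the line has no label */
    char *mnemonic;               /* NULL for blank or comment-only lines */
    char *operand[MAX_OPERANDS];  /* NULL for absent operands */
    int   n_operands;
};

/* Strip leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/* Split "label: MNEMONIC op1, op2 ; comment" into its fields.
   Returns 0 on success, -1 if there are too many operands. */
int split_line(char *line, struct asm_line *out)
{
    assert(line != NULL && out != NULL);
    *out = (struct asm_line){0};

    char *semi = strchr(line, ';');       /* drop the comment */
    if (semi)
        *semi = '\0';

    char *p = trim(line);
    char *colon = strchr(p, ':');         /* optional label */
    if (colon) {
        *colon = '\0';
        out->label = trim(p);
        p = trim(colon + 1);
    }
    if (*p == '\0')
        return 0;                         /* empty or label-only line */

    out->mnemonic = p;                    /* mnemonic ends at first blank */
    while (*p && !isspace((unsigned char)*p)) {
        *p = (char)toupper((unsigned char)*p);
        p++;
    }
    if (*p)
        *p++ = '\0';

    p = trim(p);
    while (*p) {                          /* comma-separated operands */
        if (out->n_operands == MAX_OPERANDS)
            return -1;
        char *comma = strchr(p, ',');
        if (comma)
            *comma = '\0';
        out->operand[out->n_operands++] = trim(p);
        p = comma ? comma + 1 : p + strlen(p);
    }
    return 0;
}
```

For example, splitting "loop:  mov r1, 0x10 ; load" yields the label
"loop", the mnemonic "MOV" and the operands "r1" and "0x10".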

This is my general idea. Of course, I haven't started coding it yet, as I
am still planning it.

I do need an efficient hashing function, with a low probability of
collisions (ideally). I will also need code to handle collisions.
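One common answer to the hashing question is 32-bit FNV-1a, with linear
probing as a simple way to handle collisions. The C sketch below is an
illustration under assumptions, not Isaac's design: the table size, the
opcode_entry layout and the function names are made up. It folds case
while hashing (so "mov" and "MOV" hash alike) and still compares the
stored name after a hash match, because two different mnemonics can share
a hash value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64              /* power of two, larger than entry count */

/* Hypothetical opcode entry: mnemonic plus base opcode byte. */
struct opcode_entry {
    const char *mnemonic;          /* NULL marks an empty slot */
    uint8_t     opcode;
};

static struct opcode_entry table[TABLE_SIZE];

/* 32-bit FNV-1a over the uppercased characters of s. */
static uint32_t hash_upper(const char *s)
{
    uint32_t h = 2166136261u;      /* FNV offset basis */
    for (; *s; s++) {
        h ^= (uint8_t)toupper((unsigned char)*s);
        h *= 16777619u;            /* FNV prime */
    }
    return h;
}

/* Case-insensitive string equality. */
static int equal_upper(const char *a, const char *b)
{
    while (*a && toupper((unsigned char)*a) == toupper((unsigned char)*b)) {
        a++;
        b++;
    }
    return toupper((unsigned char)*a) == toupper((unsigned char)*b);
}

/* Insert with linear probing; returns 0 on success, -1 if the table is full.
   Entries are never deleted, so an empty slot ends every probe sequence. */
int opcode_insert(const char *mnemonic, uint8_t opcode)
{
    assert(mnemonic != NULL);
    uint32_t i = hash_upper(mnemonic) & (TABLE_SIZE - 1);
    for (int n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (table[i].mnemonic == NULL) {
            table[i].mnemonic = mnemonic;
            table[i].opcode = opcode;
            return 0;
        }
    }
    return -1;
}

/* Returns the matching entry, or NULL if the name is not an opcode. */
const struct opcode_entry *opcode_lookup(const char *mnemonic)
{
    assert(mnemonic != NULL);
    uint32_t i = hash_upper(mnemonic) & (TABLE_SIZE - 1);
    for (int n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (table[i].mnemonic == NULL)
            return NULL;                    /* reached a gap: not present */
        if (equal_upper(table[i].mnemonic, mnemonic))
            return &table[i];               /* equal hash is not enough */
    }
    return NULL;
}
```

The string comparison is what makes collisions harmless; a good hash
function only keeps the probe sequences short.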

Anyway, I am open to new ideas; hopefully you have some of your own to
add.

-Isaac

--------------------------------------------------------------------------------------------------------------
I am an EE student looking for summer employment in Toronto, Canada
area
If you have any openings please contact me at isaacb[AT]rogers[DOT]com.

Reply by David R Brooks, March 7, 2006

One method I have used is to write out the syntax of your assembly 
language in BNF, then augment the BNF by appending to each rule the name 
of a function to be called when that rule is matched (i.e. fires). Let's 
call this augmented syntax BNF+.

Once you have the core BNF engine running, it can build the assembler
for you (this is usually called a compiler-compiler, but here the result
is an assembler). You write the syntax of BNF itself in BNF+, feed this
as input to the engine, and get out a set of control tables. Drive the
engine with these, and it will read anything written in BNF+ (e.g. your
assembly language).

Besides the BNF engine, you need a symbol table (indexing method of your 
choice), and the back-end code generation functions.

You can do this without a tokeniser, by defining tokens in BNF and
letting the main engine do the work. This is simple, but slow.

The following examples are not fully functional, but should give the 
flavour of the thing:

Fragment of a BNF+ syntax file:

* rule        elements                      semantic function
file        = {oneline} [eof]           .
oneline     = line ending               .   NextLine
*-----------------------------------------------
* Basic elements
eof         = #26                       .
eol         = {ws} [comment] #10        .
ws          = wsc {wsc}                 .
wsc         = " "                       |
               #9                        .
digit       = ("0".."9")                .
uphex       = ("A".."F")                .
lowhex      = ("a".."f")                .
upper       = ("A".."Z")                .
lower       = ("a".."z")                .
hexchar     = digit                     |   HexValueDigit
               uphex                     |   HexValueUpper
               lowhex                    .   HexValueLower
letter      = upper                     |
               lower                     .
symch       = letter                    |
               digit                     |
               "_"                       .

Top-level function of the BNF+ engine:

// Syntax "engine": the core of the system
#define SP        current.syntab[current.synptr]
#define GETCHR    {currch = getchr(&current);}
#define NEXTCH    {lastch = currch; current.fileptr++; \
   GETCHR; if(currch=='\n') current.linum++;}
#define GOTOCH(x) {current.fileptr = x; GETCHR}
#define EXIT      {if(init.filename) fclose(init.input); \
   return(current);}

state engine(state init)
{
     state current;          // Local copy (we may want to back-up)
     state temp;             // Workspace for recursive calls
     WORD rule;
     char error[100];
     char currch;

     init.depth++;

     if(init.filename)       // Open new file...
     {
         if(!(init.input = fopen(init.filename, "rb")))
         {
             snprintf(error, sizeof error, "Fatal: cannot open file %s\n", init.filename);
             fatal(error);
         }

         init.fileptr  = 0;          // Start at BOF
         init.linum    = 1;
         getchr(NULL);               // Force a re-read
     }

     current = init;
     current.filename = NULL;        // File dealt with
     GETCHR;

     do
     {
         rule = SP;
         currstate = &current;       // Expose it for logging

#if LOGSTATE
         {
             static int ctr = 0;
             char zz[8];
             if((currch >= 0x20) && (currch <= 0x7e))
                 sprintf(zz,"<%c> ",currch);
             else                    // Cast keeps "0x%02x" to 4 chars (fits zz)
                 sprintf(zz,"0x%02x", (unsigned char)currch);
             fprintf(logfile,"%5d %3d%6d [%4d]  %s [%4d] %c\n", ctr++,
                     init.depth, rule, current.synptr, zz, current.fileptr,
                     (current.match) ? 'T' : 'F');
             fflush(logfile);
         }
#endif

         if(rule < 0)                // SUB RULE
         {
             if(current.match)
             {
                 temp = current;
                 temp.synptr = -rule;            // Rule address
                 temp = engine(temp);
                 current.match  &= temp.match;
                 if(temp.match)
                     GOTOCH(temp.fileptr);
             }
         }
         else
         {
             if(rule < 200)          // LITERAL CHAR
             {
                 current.match &= ((char)rule == currch);
                 if(current.match)
                     NEXTCH
             }
             else
             switch(rule)
             {
             case 202:               // '|'  ALTERNATE RULE
                 if(!current.match)
                 {                           // Last test failed: back-up & try next
                     current.match = init.match;
                     current.fileptr = init.fileptr;
                     current.synptr++;       // To start of new rule
                     break;
                 }
                 // Else, fall through (rule succeeded)
             case 201:               // '.'  END OF RULE
                 current.synptr++;           // Point at semantic
                 if(current.match)
                 {
#if LOGSEMANTIC
                     char zz[8];
                     if((lastch >= 0x20) && (lastch <= 0x7e))
                         sprintf(zz,"<%c> ",lastch);
                     else
                         sprintf(zz,"0x%02x", (unsigned char)lastch);

                     if(SP)
                         fprintf(logfile,"   Semantic[%4d] %s\n",SP,zz);
                     fflush(logfile);
#endif
                     semantics[SP]();        // Semantic function (zero is valid)
                 }
                 EXIT;

             case 203:               // '{'
             case 204:               // '['
                 temp = current;
                 temp.synptr++;
                 temp = engine(temp);
                 if(temp.match)              // If it worked, swallow the chars.
                     GOTOCH(temp.fileptr);
                 if(current.syntab[temp.synptr] != 2+SP)
                 {
                     sprintf(error,"Fatal: bracket imbalance at %d:%d\n",
                             temp.synptr, current.synptr);
                     fatal(error);
                 }

                 if(temp.match && (current.syntab[temp.synptr] == 205))
                     continue;               // {...} repeats indefinitely
                 else
                     current.synptr = temp.synptr;
                 break;

             case 205:               // '}'
             case 206:               // ']'
                 EXIT;

             case 207:               // RANGE
                 current.match &= ((currch >= (char)current.syntab[current.synptr+1])
                               && (currch <= (char)current.syntab[current.synptr+2]));
                 current.synptr += 2;
                 if(current.match)
                     NEXTCH;
                 break;

             case 208:               // '@' - include file
                 temp = current;
                 temp.filename = Identifier;
                 temp.fileptr  = 0;          // Start at BOF of inner file
                 current.synptr++;           // Point to nested rule
                 temp.synptr   = -SP;
                 temp = engine(temp);        // Run the nested file
                 current.match &= temp.match;
                 break;                      // Don't fall through into default

             default:
                 {
                     sprintf(error,"Fatal: unrecognised syntax code at [%d] = %d\n",
                             current.synptr, rule);
                     fatal(error);
                 }
             }
         }

         if(!current.match)          // If this rule failed, back-up
             GOTOCH(init.fileptr)

         do
         {
             current.synptr++;       // Next syntax item (skip if tests have failed)
         } while(!(current.match || (SP > 200)));

     } while(1);                     // End the DO loop
}

Reply by CBFalconer, March 7, 2006

Isaac Bosompem wrote:
> 
... snip ...
> 
> I do need an efficient hashing function, with a low probability of
> collisions (ideally). I will also need code to handle collisions.
> 
> Anyway, I am open to new ideas; hopefully you have some of your own
> to add.

You are welcome to use hashlib, which is under GPL licensing, and
was designed to fill just this sort of need.  It is written in pure
ISO C.

You could open one table and stuff it with the opcodes.  Another
could hold the macros, and yet another the symbols.  The tables
will all use the same re-entrant hash code, yet can have widely
different content.

  <http://cbfalconer.home.att.net/download/hashlib.zip>

Reply by Artenz, March 7, 2006

Alex wrote:

> (Since the asm conf. is dead, I'm posting this here; maybe someone can help.)
>
> In my current project, I have to create a simple assembler for an 8-bit
> processor.

If you know scripting languages like Perl or Python, you can make
something fairly quickly. I've made a simple Perl assembler script for
my small FPGA-based CPU in about 200 lines of code.

For simple projects, a line-by-line translation works fine. Match each
line against a set of patterns, translate the mnemonic to an opcode
through a hash, and call a subroutine to fill in the operand bits. Use
two passes: in the first pass, you build the symbol table (a hash), and
in the second pass, you generate the code. Alternatively, store all the
code in an array and dump it after the second pass.
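Translated from Perl hashes into C, the two-pass scheme might look like
this minimal sketch. Assumptions: a made-up ISA (NOP, LDI, JMP, HLT with
invented opcode values) where each instruction is one opcode byte plus at
most one operand byte; lines already split into label, mnemonic and
operand by a tokenizer; and linear searches standing in for the hashes to
keep it short. All names here are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ISA: opcode byte, plus one operand byte if has_operand. */
struct opdef { const char *mnemonic; uint8_t opcode; int has_operand; };
static const struct opdef isa[] = {
    { "NOP", 0x00, 0 }, { "LDI", 0x10, 1 }, { "JMP", 0x20, 1 }, { "HLT", 0xFF, 0 },
};

/* One source line, assumed already split by a tokenizer. */
struct src_line { const char *label, *mnemonic, *operand; };

#define MAX_SYMS 64
static struct { const char *name; unsigned addr; } syms[MAX_SYMS];
static int n_syms;

static const struct opdef *find_op(const char *m)
{
    for (size_t i = 0; i < sizeof isa / sizeof isa[0]; i++)
        if (strcmp(isa[i].mnemonic, m) == 0)
            return &isa[i];
    return NULL;
}

static int find_sym(const char *name, unsigned *addr)
{
    for (int i = 0; i < n_syms; i++)
        if (strcmp(syms[i].name, name) == 0) { *addr = syms[i].addr; return 1; }
    return 0;
}

/* An operand is a number (decimal or 0x-prefixed hex) or a label. */
static int eval_operand(const char *s, unsigned *v)
{
    if (isdigit((unsigned char)s[0])) { *v = (unsigned)strtoul(s, NULL, 0); return 1; }
    return find_sym(s, v);
}

/* Two-pass assembly into out[]; returns the byte count or -1 on error. */
int assemble(const struct src_line *src, int n, uint8_t *out, int cap)
{
    assert(src != NULL && out != NULL);
    unsigned pc = 0;
    n_syms = 0;

    /* Pass 1: only instruction lengths matter, so every label, including
       forward references, gets an address before any operand is used. */
    for (int i = 0; i < n; i++) {
        if (src[i].label) {
            if (n_syms == MAX_SYMS) return -1;
            syms[n_syms].name = src[i].label;
            syms[n_syms++].addr = pc;
        }
        if (src[i].mnemonic) {
            const struct opdef *op = find_op(src[i].mnemonic);
            if (!op) return -1;                 /* unknown mnemonic */
            pc += 1 + (unsigned)op->has_operand;
        }
    }

    /* Pass 2: emit bytes; all symbols are known now. */
    int len = 0;
    for (int i = 0; i < n; i++) {
        if (!src[i].mnemonic) continue;
        const struct opdef *op = find_op(src[i].mnemonic);
        if (len + 1 + op->has_operand > cap) return -1;
        out[len++] = op->opcode;
        if (op->has_operand) {
            unsigned v;
            if (!src[i].operand || !eval_operand(src[i].operand, &v) || v > 0xFF)
                return -1;                      /* missing, undefined or too big */
            out[len++] = (uint8_t)v;
        }
    }
    return len;
}
```

Given the lines "start: LDI 0x05", "JMP done", "NOP" and "done: HLT", pass 1
assigns done the address 5 before pass 2 emits the jump, so the output is
10 05 20 05 00 FF. The forward reference to done is what the first pass
is for.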

If you need more advanced features like macros, things become a bit
more complicated, and then a properly written assembler using lex/yacc
may be a better choice.

Reply by Alex, March 7, 2006

Well, that's what I thought initially. I know Tcl and have used it for
parsing HDL code, so this seems an easy and fast solution to the given
problem.

I just want to clarify a few things: do I really need two passes
(leaving the preprocessor aside) if I don't have conditional branches
and jumps?

At the same time, I was thinking about implementing a feature that
merges a few sequential instructions, provided they can be executed
simultaneously. Obviously it is possible to just declare new
instructions, but that would not look as tidy.

Anyway, thank you for help.
Alex

-- 
Alex
