Writing a simple assembler

Reply by Paul Keinanen ●March 10, 2006

On 10 Mar 2006 11:11:07 -0800, cs_posting@hotmail.com wrote:

>Paul Keinanen wrote:
>
>> A two pass assembler is definitely a practical way of doing a
>> cross-assembler. The first pass must generate the correct amount of
>> code for each instruction, in order to "detect" the locations of all
>> branch target labels in the program. In the second pass do the same
>> thing again and using the label addresses stored in the symbol table,
>> generate the correct code (especially the branch/jump instructions).
>
>I would imagine things get more complicated when you have both a short
>relative and a long call/jump addressing scheme - you don't know before
>you emit the intervening code how far away your target will be, so you
>don't know how long an instruction word you need to emit, so you don't
>know how far away your target will be...

Backward references are easy to solve; forward references are a
bit trickier :-). Anyway, you could speculatively generate the long
form of each forward reference in the first pass and, when the
definition of the label is encountered, check whether the forward
branch is within short range. If it is, decrement the symbol table
address of every label after the branch instruction by the number of
bytes saved. If an optimum result is not needed, leave it at that.

However, moving the labels upwards after the optimised instruction
may reduce the distance from some earlier long branch instruction,
which can then also be optimised.

Thus, the location of each speculative long branch instruction and
a pointer to the symbol table entry of its target label also need to
be stored in a temporary symbol table.  Those speculative long
branch instructions that have been converted to short branches can be
removed from this temporary table.

When the symbol table shuffling is done, a normal second pass can be
performed and since the locations of all labels are now fixed, it is
easy to emit either a long or short branch instruction.
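
A minimal sketch of the shrinking scheme described above, in Python.
The instruction sizes (2 and 3 bytes), the signed 8-bit displacement
measured from the end of the branch, and the item kinds ("label",
"branch", "bytes") are assumptions made for illustration. Unlike the
description, it recomputes all addresses each round rather than
decrementing symbol table entries in place.

```python
SHORT, LONG = 2, 3                 # assumed branch sizes in bytes
SHORT_MIN, SHORT_MAX = -128, 127   # assumed signed 8-bit displacement


def layout(program, sizes):
    """Assign addresses using the current size of every branch."""
    labels, addrs, pc = {}, [], 0
    for i, (kind, arg) in enumerate(program):
        addrs.append(pc)
        if kind == "label":
            labels[arg] = pc
        elif kind == "branch":
            pc += sizes[i]
        else:                      # "bytes": arg bytes of other code or data
            pc += arg
    return labels, addrs


def relax(program):
    """Start with every branch long, then shrink until nothing changes."""
    sizes = {i: LONG for i, (kind, _) in enumerate(program) if kind == "branch"}
    while True:
        labels, addrs = layout(program, sizes)
        shrunk = False
        for i, size in sizes.items():
            if size != LONG:
                continue
            target = labels[program[i][1]]
            if target > addrs[i]:
                # Forward: the target moves up together with the shrink.
                disp = target - (addrs[i] + LONG)
            else:
                # Backward: only the end of the branch moves.
                disp = target - (addrs[i] + SHORT)
            if SHORT_MIN <= disp <= SHORT_MAX:
                sizes[i] = SHORT
                shrunk = True
        if not shrunk:
            return sizes, labels   # addresses are now fixed for pass two


program = [("branch", "end"), ("bytes", 125), ("branch", "end"), ("label", "end")]
sizes, labels = relax(program)
print(sizes, labels)
```

With 125 bytes between the two branches, shortening the second branch
brings the first one into range on the next round, which is the
cascading effect mentioned above. Because shrinking only ever reduces
distances, a branch that has been made short never needs to grow again.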

Paul

Reply by rand...@earthlink.net ●March 10, 2006

toby wrote:
> > ...
> > While on the subject, I would also like to point out "yet another
> > reason not to use Flex and Bison" for assemblers--
>
> Are you going to go into the reasons why one *would* want to use them?
> Or will we leave that to others?

Flex and Bison are great for small prototype languages.
They might be okay for languages where you use Flex and/or Bison for
one small part of the compiler (e.g., the way they're used in GCC,
which is huge anyway). Trying to implement a compile-time language
inside a compiler written with Flex/Bison is a major pain in the butt
and it leads to all kinds of problems and restrictions on the grammar.

Let me give a classic example of a problem in HLA. HLA allows a data
declaration like the following:

static someVarName:int32;

or, it can allow a declaration like this:

static someVarName:int32; @external;

(with the obvious [I hope] effect.)

The problem is that if you follow a declaration of this sort by a
command that must be immediately executed at compile-time, you run into
some problems with Bison's one-symbol lookahead. In particular, the
compile-time statement may execute *before* the parser finishes the
declaration. This can create problems if the operation of that
compile-time statement depends upon the declaration, e.g., something
like

static someVarName:int32;
#if (@defined( someVarName)) ... #endif

Because the declaration has not finished yet, the symbol may not be
declared at the point the @defined compile-time function executes, and
@defined incorrectly returns false.

Unfortunately, you cannot (easily, anyway) merge the grammars of the
compile-time and run-time languages together. They are truly two
separate languages that the compiler must process concurrently (and the
other solution, using a preprocessor rather than a compile-time
language, has an even bigger set of problems, such as lack of access to
objects declared or used in the run-time language).

Note that if you create a hand-written parser, it's easy enough to work
around problems like this.  Of course, this has nothing to do with
working sets and blowing the cache away, but it is an example of some
problems I've encountered with using Flex and Bison to write a
full-blown macro assembler.
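
A rough sketch of how a hand-written parser can sidestep the ordering
problem, in Python. It is not HLA's code: the Parser class, its token
pattern, and the handling of only "static" declarations and
"#if (@defined(...))" are assumptions for illustration. The point is
that the symbol is entered at the first semicolon, before the parser
looks at the next token to see whether "@external" follows.

```python
import re

# Hypothetical token set, just large enough for the two examples above.
TOKEN = re.compile(r"#if|#endif|@defined|@external|[A-Za-z_]\w*|[:;()]")


class Parser:
    def __init__(self, source):
        self.tokens = TOKEN.findall(source)
        self.pos = 0
        self.symbols = {}      # name -> {"type": ..., "external": bool}
        self.if_results = []   # outcome of each #if (@defined(...)) seen

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self):
        while self.peek() is not None:
            if self.peek() == "static":
                self.declaration()
            elif self.peek() == "#if":
                self.compile_time_if()
            else:
                self.take()    # everything else is ignored in this sketch
        return self

    def declaration(self):
        self.take("static")
        name = self.take()
        self.take(":")
        type_name = self.take()
        self.take(";")
        # Commit the symbol now, before peeking at what comes next.
        self.symbols[name] = {"type": type_name, "external": False}
        if self.peek() == "@external":
            self.take("@external")
            self.take(";")
            self.symbols[name]["external"] = True

    def compile_time_if(self):
        self.take("#if")
        self.take("(")
        self.take("@defined")
        self.take("(")
        name = self.take()
        self.take(")")
        self.take(")")
        self.if_results.append(name in self.symbols)


p = Parser("static someVarName:int32;\n#if (@defined( someVarName)) #endif").parse()
print(p.if_results)
```

Here peeking at the next token has no side effects, so the order of
"declare, then evaluate @defined" is entirely under the parser's
control.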

In general, assemblers have such simple grammars that using Bison for
anything other than processing arithmetic expressions is probably a
waste of time anyway. Flex can be useful, though. Nevertheless, a
Flex-generated scanner is going to be *way* bigger than most
hand-generated scanners (though the hand-generated scanner I wrote for
HLA v2.0 is far from tiny, as it was written to be fast and uses
in-line coding to implement a hash search for assembler keywords; and
that's many thousands of lines of code; fortunately, it doesn't all
execute all the time, and most of the memory it's sitting in doesn't
get touched, so you don't have the cache pollution problems you get
with Flex and Bison's tables).
Cheers,
Randy Hyde

Reply by Jim Granville ●March 10, 2006

randyhyde@earthlink.net wrote:

<snip>
> In general, assemblers have such simple grammars that using Bison for
> anything other than processing arithmetic expressions is probably a
> waste of time anyway. Flex can be useful, though. Nevertheless, a
> Flex-generated scanner is going to be *way* bigger than most
> hand-generated scanners (though the hand-generated scanner I wrote for
> HLA v2.0 is far from tiny, as it was written to be fast and uses
> in-line coding to implement a hash search for assembler keywords; and
> that's many thousands of lines of code; fortunately, it doesn't all
> execute all the time, and most of the memory it's sitting in doesn't
> get touched, so you don't have the cache pollution problems you get
> with Flex and Bison's tables).

  Is HLA 2.0 ready enough that the OP (in: writing a simple assembler)
could use its MACRO features to create a desired cross-assembler?
  I'd imagine a HEX output would be needed - not sure if HLA 2.0 does that?

-jg

Reply by toby ●March 10, 2006

randyhyde@earthlink.net wrote:
> toby wrote:
> > > ...
> > > While on the subject, I would also like to point out "yet another
> > > reason not to use Flex and Bison" for assemblers--
> >
> > Are you going to go into the reasons why one *would* want to use them?
> > Or will we leave that to others?
>
> Flex and Bison are great for small prototype languages.
> They might be okay for languages where you use Flex and/or Bison for
> one small part of the compiler (e.g., the way they're used in GCC,
> which is huge anyway). Trying to implement a compile-time language
> inside a compiler written with Flex/Bison is a major pain in the butt
> ...
> In general, assemblers have such simple grammars that using Bison for
> anything other than processing arithmetic expressions is probably a
> waste of time anyway. Flex can be useful, though. Nevertheless, a
> Flex-generated scanner is going to be *way* bigger than most

The fact that its specification is clearer and simpler, leading to a
more reliable and maintainable program, may matter more.

> hand-generated scanners (though the hand-generated scanner I wrote for
> HLA v2.0 is far from tiny, ...; fortunately, it doesn't all
> execute all the time, and most of the memory it's sitting in doesn't
> get touched, so you don't have the cache pollution problems you get
> with Flex and Bison's tables).

Cache pollution is not an issue that 999 out of 1000 HLL programmers
should concern themselves with. (Let's not confuse the assembler itself
with issues that might arise in assembly programming...)

The OP was, iirc, asking about "writing a simple assembler". A handmade
lexer/parser is likely outside the 'simple' zone: I argue that it's
easier and quicker to internalise 'info flex' and 'info bison' than to
internalise the Dragon Book, a bunch of more recent references *and*
fret oneself silly over cache pollution, pipeline stalls, etc.

In short, not every assembler is HLA.

> Cheers,
> Randy Hyde

Reply by Rod Pemberton ●March 11, 2006

<randyhyde@earthlink.net> wrote in message
news:1142036486.495926.36730@i40g2000cwc.googlegroups.com...
>
> toby wrote:
> > > ...
> > > While on the subject, I would also like to point out "yet another
> > > reason not to use Flex and Bison" for assemblers--
> >
> > Are you going to go into the reasons why one *would* want to use them?
> > Or will we leave that to others?
>
> Flex and Bison are great for small prototype languages.
> They might be okay for languages where you use Flex and/or Bison for
> one small part of the compiler (e.g., the way they're used in GCC,
> which is huge anyway). Trying to implement a compile-time language
> inside a compiler written with Flex/Bison is a major pain in the butt
> and it leads to all kinds of problems and restrictions on the grammar.
>

I'm interested in those statements.
(Sorry upfront to those who feel this is going off-topic since this thread
is heavily cross-posted).

I've run into this exact issue with a C99 grammar.  The C language of course
is LALR(1) when stripped of typedefs and preprocessor directives.  I can
deal with those issues, but I'm having a problem with flex.  It matches
the longest possible token first.  This is a problem with trigraphs, digraphs,
escape sequences, universal character names, line continuation, etc.
All these small rules must be processed first (C99 translation phases 1-4
and 5-7), and they are messing up the grammar: interfering with string
tokenization, preprocessor tokenization, etc.  At this point I'm thinking
about writing a pre-pre-processor and post-pre-processor just to deal with
these situations, unless someone knows of a more elegant alternative.
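
One possible shape for such a pre-pass, sketched in Python: it applies
trigraph replacement (part of translation phase 1) and backslash-newline
splicing (phase 2) before any tokenizer sees the text. The function
names are made up for illustration. Digraphs and universal character
names are deliberately not handled here, and the sketch does not keep a
mapping back to original line numbers, which a real front end would
need for diagnostics.

```python
# The nine trigraph sequences defined by the C standard.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}


def replace_trigraphs(text):
    """Scan left to right, replacing each trigraph with its character."""
    out, i = [], 0
    while i < len(text):
        tri = text[i:i + 3]
        if tri in TRIGRAPHS:
            out.append(TRIGRAPHS[tri])
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def splice_lines(text):
    """Delete each backslash that is immediately followed by a newline."""
    return text.replace("\\\n", "")


def prepass(text):
    # Order matters: "??/" may produce the backslash that splices a line.
    return splice_lines(replace_trigraphs(text))


print(prepass("??=define X 1 ??/\n+ 2\n"))
```

Running the pre-pass first means the flex rules only ever see text in
which these replacements have already happened, so they no longer have
to compete with the longest-match rule.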

Rod Pemberton
PS.  Replies need to make it to alt.lang.asm for me to read.

Reply by Everett M. Greene ●March 11, 2006

"toby" <toby@telegraphics.com.au> writes:
> randyhyde@earthlink.net wrote:
> > toby wrote:
> > > > ...
> > > > While on the subject, I would also like to point out "yet another
> > > > reason not to use Flex and Bison" for assemblers--
> > >
> > > Are you going to go into the reasons why one *would* want to use them?
> > > Or will we leave that to others?
> >
> > Flex and Bison are great for small prototype languages.
> > They might be okay for languages where you use Flex and/or Bison for
> > one small part of the compiler (e.g., the way they're used in GCC,
> > which is huge anyway). Trying to implement a compile-time language
> > inside a compiler written with Flex/Bison is a major pain in the butt
> > ...
> > In general, assemblers have such simple grammars that using Bison for
> > anything other than processing arithmetic expressions is probably a
> > waste of time anyway. Flex can be useful, though. Nevertheless, a
> > Flex-generated scanner is going to be *way* bigger than most
> 
> The fact that its specification is clearer and simpler, leading to a
> more reliable and maintainable program, may matter more.
> 
> > hand-generated scanners (though the hand-generated scanner I wrote for
> > HLA v2.0 is far from tiny, ...; fortunately, it doesn't all
> > execute all the time, and most of the memory it's sitting in doesn't
> > get touched, so you don't have the cache pollution problems you get
> > with Flex and Bison's tables).
> 
> Cache pollution is not an issue that 999 out of 1000 HLL programmers
> should concern themselves with. (Let's not confuse the assembler itself
> with issues that might arise in assembly programming...)
> 
> The OP was, iirc, asking about "writing a simple assembler". A handmade
> lexer/parser is likely outside the 'simple' zone: I argue that it's
> easier and quicker to internalise 'info flex' and 'info bison' than to
> internalise the Dragon Book, a bunch of more recent references *and*
> fret oneself silly over cache pollution, pipeline stalls, etc.

Just what is so complicated about parsing and lexical analysis
for a simple ASM (or even a complex ASM)?  Do the obvious and
get on with the job.
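
As one reading of "the obvious", here is a small hand-written line
parser in Python. The "label: mnemonic operands ; comment" layout and
the parse_line name are assumptions for illustration, not the syntax of
any particular assembler, and a semicolon inside a string or character
operand would be mistaken for a comment.

```python
import re

# One source line: optional label, optional mnemonic, operands, comment.
LINE = re.compile(
    r"""^\s*
        (?:(?P<label>[A-Za-z_]\w*):)?   # optional "label:"
        \s*(?P<mnemonic>[A-Za-z]\w*)?   # optional mnemonic
        \s*(?P<operands>[^;]*?)         # operands, up to any comment
        \s*(?:;.*)?$                    # optional "; comment"
    """,
    re.VERBOSE,
)


def parse_line(line):
    """Split one line into (label, MNEMONIC, [operands])."""
    m = LINE.match(line)
    if m is None:
        raise SyntaxError(line)
    operands = [op.strip() for op in m.group("operands").split(",") if op.strip()]
    mnemonic = m.group("mnemonic")
    return m.group("label"), mnemonic.upper() if mnemonic else None, operands


print(parse_line("loop:  mov a, b ; copy"))
```

Expression operands would still need a small expression parser on top
of this, which is where the two camps in this thread mainly differ.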

> In short, not every assembler is HLA.

Reply by toby ●March 11, 2006

Everett M. Greene wrote:
> "toby" <toby@telegraphics.com.au> writes:
> > randyhyde@earthlink.net wrote:
> > > toby wrote:
> > > > > ...
> > > > > While on the subject, I would also like to point out "yet another
> > > > > reason not to use Flex and Bison" for assemblers--
> > > >
> > > > Are you going to go into the reasons why one *would* want to use them?
> > > > Or will we leave that to others?
> > >
> > > Flex and Bison are great for small prototype languages.
> > > They might be okay for languages where you use Flex and/or Bison for
> > > one small part of the compiler (e.g., the way they're used in GCC,
> > > which is huge anyway). Trying to implement a compile-time language
> > > inside a compiler written with Flex/Bison is a major pain in the butt
> > > ...
> > > In general, assemblers have such simple grammars that using Bison for
> > > anything other than processing arithmetic expressions is probably a
> > > waste of time anyway. Flex can be useful, though. Nevertheless, a
> > > Flex-generated scanner is going to be *way* bigger than most
> >
> > The fact that its specification is clearer and simpler, leading to a
> > more reliable and maintainable program, may matter more.
> >
> > > hand-generated scanners (though the hand-generated scanner I wrote for
> > > HLA v2.0 is far from tiny, ...; fortunately, it doesn't all
> > > execute all the time, and most of the memory it's sitting in doesn't
> > > get touched, so you don't have the cache pollution problems you get
> > > with Flex and Bison's tables).
> >
> > Cache pollution is not an issue that 999 out of 1000 HLL programmers
> > should concern themselves with. (Let's not confuse the assembler itself
> > with issues that might arise in assembly programming...)
> >
> > The OP was, iirc, asking about "writing a simple assembler". A handmade
> > lexer/parser is likely outside the 'simple' zone: I argue that it's
> > easier and quicker to internalise 'info flex' and 'info bison' than to
> > internalise the Dragon Book, a bunch of more recent references *and*
> > fret oneself silly over cache pollution, pipeline stalls, etc.
>
> Just what is so complicated about parsing and lexical analysis
> for a simple ASM (or even a complex ASM)?  Do the obvious and
> get on with the job.

Well, we have differing definitions of 'the job'. I don't consider
reinventing the lexer/parser wheels (which can get arbitrarily complex
and tedious to get right/read/maintain) as *necessarily* part of this
'job' of writing an assembler. Ymmv.

Clearly HLA falls outside my parameters, or far outside the 'simple'
parameter, because it has strict performance requirements -- e.g. it
has to process very large symbol tables and a complex macro syntax, and
because it's among other things (forgive me) a competitor in a pissing
contest, if you've ever frequented alt.lang.asm. It's also written by
an assembly language programmer with an assembly language programmer's
preoccupations - cache, etc.

I am simply defending an alternative approach: Don't microdesign and
micromanage, but exploit tools like flex and bison to handle the
tedious and move on to focus on 'the real job'.

At one point I compared[1] two implementations of a PAL-III (PDP-8)
assembler, one with hand-coded lexer/parser, and one using flex/bison
(mine). In the hand-coded case, the code concerned with lexing/parsing
was 692 lines, or 58% of the program. In the flex/bison case, it was a
mere 179 lines, or 11% of the program (including token definitions,
grammar, support code). It seems reasonable to infer that there is
correspondingly less to write and debug in the latter case, by a factor
of nearly four. And it ends up in a clearer, more maintainable form.
But of course this reasoning applies most strongly to "simple"
projects.

[1] http://www.telegraphics.com.au/sw/info/dpa.html
> 
> > In short, not every assembler is HLA.

Reply by Bu ●March 11, 2006

On Wed, 08 Mar 2006 15:58:45 -0600, msg <msg@_cybertheque.org_> wrote:

>
>> 
>> About 30 years ago i wrote a (cross) assembler for the Intel 8080
>> processor (8 bit) in Basic (from Hewlett Packard).
>> I believe i have somewhere still the listing of this program.
>> If you are very interested i will look what i still have and try to
>> scan it (i do not have it in electronic format, maybe as punch paper
>> tape (:-)
>>
>Hi,
>
>I am interested in seeing your BASIC assembler; if it is on paper
>tape, I can read it (and return the tape with conversion of your
>choice).  I imagine this predates RMB (Rocky Mountain Basic)?
>
>Regards,
>
>Michael Grigoni
>Cybertheque Museum

Hi Michael,

I looked up what i still have on this 30-year old project (old
sentiment from my education).
I could not find the papertape. I only found the listing (with all
rem-statements removed to speed up the process).

This listing is not really 'structured' basic so it is not really clear
to somebody else. I do have a detailed description of the cross
assembler program (that even has a simulator and reverse assembler!)
but this is in Dutch.

So thanks for the interest but i can not make it available without a
lot of extra work (scanning etc).

Bu

Reply by Isaac Bosompem ●March 11, 2006

Hi Grant, sorry for the cross posting.

Do you know a good Python IDE? Which one do you use yourself?

Reply by toby ●March 11, 2006

Isaac Bosompem wrote:
> Hi Grant, sorry for the cross posting.
>
> Do you know a good Python IDE? Which one do you use yourself?

Eclipse will integrate nicely with Python - though I am not a Pythonist
myself (I use Eclipse for C, C++, Perl, etc).
http://wiki.python.org/moin/EclipsePythonIntegration