EmbeddedRelated.com
Forums

Writing a simple assembler

Started by Alex March 6, 2006
On 10 Mar 2006 11:11:07 -0800, cs_posting@hotmail.com wrote:

>Paul Keinanen wrote: > >> A two pass assembler is definitely a practical way of doing a >> cross-assembler. The first pass must generate the correct amount of >> code for each instruction, in order to "detect" the locations of all >> branch target labels in the program. In the second pass do the same >> thing again and using the label addresses stored in the symbol table, >> generate the correct code (especially the branch/jump instructions). > >I would imagine things get more complicated when you have both a short >relative and a long call/jump addressing scheme - you don't know before >you emit the intervening code how far away your target will be, so you >don't know how long an instruction word you need to emit, so you don't >know how far away your target will be...
Backward references are easy to solve; the forward references are a bit trickier :-).

Anyway, you could speculatively generate the long form of each forward reference in the first pass and, when the forward definition of a label is encountered, check whether the forward branch is within short distance. If it is, decrement every label in the symbol table that lies after the branch instruction by the number of bytes saved. If an optimum result is not needed, leave it this way.

However, moving the labels upwards after the optimised instruction may reduce the distance from some other, earlier long branch instruction, which can then also be optimised. Thus the location of each speculative long branch instruction and a pointer to the symbol table entry of its target label also need to be stored in a temporary symbol table. Those speculative long branch instructions that have been converted to short branches can be removed from that temporary table.

When the symbol table shuffling is done, a normal second pass can be performed and, since the locations of all labels are now fixed, it is easy to emit either a long or a short branch instruction.

Paul
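The scheme Paul describes can be sketched roughly as follows. This is only an illustration under assumptions that are not from the thread: a short branch is taken to be 2 bytes with a signed 8-bit displacement measured from the end of the instruction, a long branch 3 bytes, and the program is a list of hypothetical `("label", name)`, `("op", size)` and `("branch", target)` items. Instead of a temporary symbol table it simply recomputes the layout until no more long branches can be shortened.

```python
SHORT, LONG = 2, 3                 # assumed instruction sizes in bytes
SHORT_MIN, SHORT_MAX = -128, 127   # assumed range of a short displacement


def layout(program, sizes):
    """Assign an address to every item and collect the label addresses."""
    addr, addrs, labels = 0, [], {}
    for i, (kind, arg) in enumerate(program):
        addrs.append(addr)
        if kind == "label":
            labels[arg] = addr
        elif kind == "op":
            addr += arg            # arg is the size of a fixed-length instruction
        else:                      # "branch": size depends on the current guess
            addr += sizes[i]
    return addrs, labels


def relax(program):
    """Start with every branch long, then shorten until nothing changes."""
    sizes = {i: LONG for i, (kind, _) in enumerate(program) if kind == "branch"}
    changed = True
    while changed:
        changed = False
        addrs, labels = layout(program, sizes)
        for i in list(sizes):
            if sizes[i] == SHORT:
                continue
            target = labels[program[i][1]]
            if target > addrs[i]:
                # Forward branch: the target moves up with us if we shrink.
                disp = target - addrs[i] - LONG
            else:
                # Backward branch: only the end of this instruction moves.
                disp = target - (addrs[i] + SHORT)
            if SHORT_MIN <= disp <= SHORT_MAX:
                sizes[i] = SHORT
                changed = True
    return sizes, layout(program, sizes)[1]


# Shortening the second branch pulls "far" into range of the first one.
program = [("branch", "far"), ("branch", "near"), ("op", 124),
           ("label", "near"), ("op", 1), ("label", "far")]
sizes, labels = relax(program)
```

Because the sketch only ever shrinks branches, and shrinking never pushes a target further away in this simplified model, the loop terminates. Alignment directives or other address-dependent padding would break that assumption.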
toby wrote:
> > ... > > While on the subject, I would also like to point out "yet another > > reason not to use Flex and Bison" for assemblers-- > > Are you going to go into the reasons why one *would* want to use them? > Or will we leave that to others?
Flex and Bison are great for small prototype languages. They might be okay for languages where you use Flex and/or Bison for one small part of the compiler (e.g., the way they're used in GCC, which is huge anyway). Trying to implement a compile-time language inside a compiler written with Flex/Bison is a major pain in the butt and it leads to all kinds of problems and restrictions on the grammar.

Let me give a classic example of a problem in HLA. HLA allows a data declaration like the following:

    static someVarName:int32;

or, it can allow a declaration like this:

    static someVarName:int32; @external;

(with the obvious [I hope] effect.) The problem is that if you follow a declaration of this sort by a command that must be immediately executed at compile-time, you run into some problems with Bison's one-symbol lookahead. In particular, the compile-time statement may execute *before* the parser finishes the declaration. This can create problems if the operation of that compile-time statement depends upon the declaration, e.g., something like

    static someVarName:int32;
    #if (@defined( someVarName))
    ...
    #endif

Because the declaration has not finished yet, the symbol may not be declared at the point the @defined compile-time function executes, and @defined incorrectly returns false.

Unfortunately, you cannot (easily, anyway) merge the grammars of the compile-time and run-time languages together. They are truly two separate languages that the compiler must process concurrently (and the other solution, using a preprocessor rather than a compile-time language, has an even bigger set of problems, such as lack of access to objects declared or used in the run-time language).

Note that if you create a hand-written parser, it's easy enough to work around problems like this. Of course, this has nothing to do with working sets and blowing the cache away, but it is an example of some problems I've encountered with using Flex and Bison to write a full-blown macro assembler.
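The ordering problem can be shown with a toy model. This is not Bison, Flex, or HLA code, only a sketch under invented assumptions: the token list, the `parse` function, and the `?name` token (standing in for `#if (@defined( name))`) are all hypothetical. With `eager=True` the scanner runs the compile-time query the moment it scans it, which happens while the parser is still fetching its one token of lookahead. With `eager=False` the parser completes the declaration first, as a hand-written parser is free to do.

```python
def parse(tokens, eager):
    """Return the answers given to each '?name' query, in order."""
    symbols, answers = set(), []
    stream = iter(tokens)

    def scan():
        tok = next(stream, None)
        if eager and tok is not None and tok.startswith("?"):
            # Compile-time query executed inside the scanner, immediately.
            answers.append(tok[1:] in symbols)
            return scan()          # the query is consumed; hand back the next token
        return tok

    tok = scan()
    while tok is not None:
        if tok == "static":
            name = scan()
            assert scan() == ";"
            lookahead = scan()     # is an optional "@external ;" coming?
            if lookahead == "@external":
                assert scan() == ";"
                lookahead = scan()
            symbols.add(name)      # only now is the declaration complete
            tok = lookahead
        elif tok.startswith("?"):
            # Deferred query: handled by the parser after earlier declarations.
            answers.append(tok[1:] in symbols)
            tok = scan()
        else:
            raise SyntaxError(tok)
    return answers


source = ["static", "v", ";", "?v"]
eager_answers = parse(source, eager=True)       # query runs during lookahead
deferred_answers = parse(source, eager=False)   # query runs after the declaration
```

In the eager case the query sees an empty symbol table and answers false, which mirrors the @defined behaviour described above. In the deferred case it answers true.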
In general, assemblers have such simple grammars that using Bison for anything other than processing arithmetic expressions is probably a waste of time anyway. Flex can be useful, though. Nevertheless, a Flex-generated scanner is going to be *way* bigger than most hand-generated scanners (though the hand-generated scanner I wrote for HLA v2.0 is far from tiny, as it was written to be fast and uses in-line coding to implement a hash search for assembler keywords; and that's many thousands of lines of code; fortunately, it doesn't all execute all the time, and most of the memory it's sitting in doesn't get touched, so you don't have the cache pollution problems you get with Flex and Bison's tables). Cheers, Randy Hyde
randyhyde@earthlink.net wrote:

<snip>
> In general, assemblers have such simple grammars that using Bison for > anything other than processing arithmetic expressions is probably a > waste of time anyway. Flex can be useful, though. Nevertheless, a > Flex-generated scanner is going to be *way* bigger than most > hand-generated scanners (though the hand-generated scanner I wrote for > HLA v2.0 is far from tiny, as it was written to be fast and uses > in-line coding to implement a hash search for assembler keywords; and > that's many thousands of lines of code; fortunately, it doesn't all > execute all the time, and most of the memory it's sitting in doesn't > get touched, so you don't have the cache pollution problems you get > with Flex and Bison's tables).
Is HLA 2.0 ready enough that the OP (in: writing a simple assembler) could use its MACRO features to create the desired cross-assembler? I'd imagine a HEX output would be needed - not sure if HLA 2.0 does that? -jg
randyhyde@earthlink.net wrote:
> toby wrote: > > > ... > > > While on the subject, I would also like to point out "yet another > > > reason not to use Flex and Bison" for assemblers-- > > > > Are you going to go into the reasons why one *would* want to use them? > > Or will we leave that to others? > > Flex and Bison are great for small prototype languages. > They might be okay for languages where you use Flex and/or Bison for > one small part of the compiler (e.g., they way they're used in GCC, > which is huge anyway). Trying to implement a compile-time language > inside a compiler written with Flex/Bison is a major pain in the butt > ... > In general, assemblers have such simple grammars that using Bison for > anything other than processing arithmetic expressions is probably a > waste of time anyway. Flex can be useful, though. Nevertheless, a > Flex-generated scanner is going to be *way* bigger than most
The fact that its specification is clearer and simpler, leading to a more reliable and maintainable program, may matter more.
> hand-generated scanners (though the hand-generated scanner I wrote for > HLA v2.0 is far from tiny, ...; fortunately, it doesn't all > execute all the time, and most of the memory it's sitting in doesn't > get touched, so you don't have the cache pollution problems you get > with Flex and Bison's tables).
Cache pollution is not an issue that 999 out of 1000 HLL programmers should concern themselves with. (Let's not confuse the assembler itself with issues that might arise in assembly programming...) The OP was, iirc, asking about "writing a simple assembler". A handmade lexer/parser is likely outside the 'simple' zone: I argue that it's easier and quicker to internalise 'info flex' and 'info bison' than to internalise the Dragon Book, a bunch of more recent references *and* fret oneself silly over cache pollution, pipeline stalls, etc. In short, not every assembler is HLA.
> Cheers, > Randy Hyde
<randyhyde@earthlink.net> wrote in message
news:1142036486.495926.36730@i40g2000cwc.googlegroups.com...
> > toby wrote: > > > ... > > > While on the subject, I would also like to point out "yet another > > > reason not to use Flex and Bison" for assemblers-- > > > > Are you going to go into the reasons why one *would* want to use them? > > Or will we leave that to others? > > Flex and Bison are great for small prototype languages. > They might be okay for languages where you use Flex and/or Bison for > one small part of the compiler (e.g., they way they're used in GCC, > which is huge anyway). Trying to implement a compile-time language > inside a compiler written with Flex/Bison is a major pain in the butt > and it leads to all kinds of problems and restrictions on the grammar. >
I'm interested in those statements. (Sorry upfront to those who feel this is going off-topic since this thread is heavily cross-posted.)

I've run into this exact issue with a C99 grammar. The C language of course is LALR(1) when stripped of typedefs and preprocessor directives. I can deal with those issues, but I'm having a problem with flex. It matches the longest token first. This is a problem with trigraphs, digraphs, escape sequences, universal character names, line continuation, etc. All these small rules must be processed first (C99 translation phases 1-4 and 5-7), and they are messing up the grammar: interfering with string tokenization, preprocessor tokenization, etc.

At this point I'm thinking about writing a pre-pre-processor and a post-pre-processor just to deal with these situations, unless someone knows of a more elegant alternative.

Rod Pemberton

PS. Replies need to make it to alt.lang.asm for me to read.
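One form the "pre-pre-processor" idea could take is a plain text filter that runs before the flex scanner ever sees the input. The sketch below covers only trigraph replacement and backslash-newline splicing (C99 translation phases 1 and 2); the function names are invented for illustration, and it makes no attempt to keep original line numbers for diagnostics. Digraphs are left out on purpose, since they are ordinary alternative token spellings and can be matched by normal scanner rules.

```python
import re

# The nine C99 trigraphs: "??" followed by the key maps to the value.
TRIGRAPHS = {"=": "#", "(": "[", "/": "\\", ")": "]", "'": "^",
             "<": "{", "!": "|", ">": "}", "-": "~"}
_TRIGRAPH_RE = re.compile(r"\?\?([=(/)'<!>\-])")


def replace_trigraphs(text):
    """Phase 1: replace every trigraph sequence, scanning left to right."""
    return _TRIGRAPH_RE.sub(lambda m: TRIGRAPHS[m.group(1)], text)


def splice_lines(text):
    """Phase 2: delete each backslash that is immediately followed by a newline."""
    return text.replace("\\\n", "")


def pre_lex(text):
    """Run both phases in order, so a '??/' trigraph can end a spliced line."""
    return splice_lines(replace_trigraphs(text))


cleaned = pre_lex("??=define X 1 ??/\n+ 2\n")
```

With the input shown, the `??=` becomes `#`, the `??/` becomes a backslash, and that backslash then splices the two lines, so the scanner would see a single `#define` line.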
"toby" <toby@telegraphics.com.au> writes:
> randyhyde@earthlink.net wrote: > > toby wrote: > > > > ... > > > > While on the subject, I would also like to point out "yet another > > > > reason not to use Flex and Bison" for assemblers-- > > > > > > Are you going to go into the reasons why one *would* want to use them? > > > Or will we leave that to others? > > > > Flex and Bison are great for small prototype languages. > > They might be okay for languages where you use Flex and/or Bison for > > one small part of the compiler (e.g., they way they're used in GCC, > > which is huge anyway). Trying to implement a compile-time language > > inside a compiler written with Flex/Bison is a major pain in the butt > > ... > > In general, assemblers have such simple grammars that using Bison for > > anything other than processing arithmetic expressions is probably a > > waste of time anyway. Flex can be useful, though. Nevertheless, a > > Flex-generated scanner is going to be *way* bigger than most > > The fact that its specification is clearer and simpler, leading to a > more reliable and maintainable program, may matter more. > > > hand-generated scanners (though the hand-generated scanner I wrote for > > HLA v2.0 is far from tiny, ...; fortunately, it doesn't all > > execute all the time, and most of the memory it's sitting in doesn't > > get touched, so you don't have the cache pollution problems you get > > with Flex and Bison's tables). > > Cache pollution is not an issue that 999 out of 1000 HLL programmers > should concern themselves with. (Let's not confuse the assembler itself > with issues that might arise in assembly programming...) > > The OP was, iirc, asking about "writing a simple assembler". A handmade > lexer/parser is likely outside the 'simple' zone: I argue that it's > easier and quicker to internalise 'info flex' and 'info bison' than to > internalise the Dragon Book, a bunch of more recent references *and* > fret oneself silly over cache pollution, pipeline stalls, etc.
Just what is so complicated about parsing and lexical analysis for a simple ASM (or even a complex ASM)? Do the obvious and get on with the job.
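For a sense of scale, "the obvious" for a very simple assembler might look something like the sketch below. The source syntax is an assumption made up for the example (`label: MNEMONIC operand, operand ; comment`), as is the `parse_line` name. It deliberately ignores string and character literals, so a `;`, `,` or `:` inside quotes would be split incorrectly.

```python
def parse_line(line):
    """Split one source line into (label, mnemonic, operands)."""
    code = line.split(";", 1)[0].strip()        # drop the trailing comment
    label = None
    if ":" in code:
        head, rest = code.split(":", 1)
        if head.strip().isidentifier():         # only accept a plain name as a label
            label, code = head.strip(), rest.strip()
    if not code:
        return label, None, []                  # blank, comment-only or label-only line
    parts = code.split(None, 1)                 # mnemonic, then the rest of the line
    mnemonic = parts[0].upper()
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return label, mnemonic, operands


parsed = parse_line("loop: mov a, b ; copy b into a")
```

Expression operands, macros and quoted strings are where a scanner of this kind starts to grow, which is roughly where the disagreement in this thread begins.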
> In short, not every assembler is HLA.
Everett M. Greene wrote:
> "toby" <toby@telegraphics.com.au> writes: > > randyhyde@earthlink.net wrote: > > > toby wrote: > > > > > ... > > > > > While on the subject, I would also like to point out "yet another > > > > > reason not to use Flex and Bison" for assemblers-- > > > > > > > > Are you going to go into the reasons why one *would* want to use them? > > > > Or will we leave that to others? > > > > > > Flex and Bison are great for small prototype languages. > > > They might be okay for languages where you use Flex and/or Bison for > > > one small part of the compiler (e.g., they way they're used in GCC, > > > which is huge anyway). Trying to implement a compile-time language > > > inside a compiler written with Flex/Bison is a major pain in the butt > > > ... > > > In general, assemblers have such simple grammars that using Bison for > > > anything other than processing arithmetic expressions is probably a > > > waste of time anyway. Flex can be useful, though. Nevertheless, a > > > Flex-generated scanner is going to be *way* bigger than most > > > > The fact that its specification is clearer and simpler, leading to a > > more reliable and maintainable program, may matter more. > > > > > hand-generated scanners (though the hand-generated scanner I wrote for > > > HLA v2.0 is far from tiny, ...; fortunately, it doesn't all > > > execute all the time, and most of the memory it's sitting in doesn't > > > get touched, so you don't have the cache pollution problems you get > > > with Flex and Bison's tables). > > > > Cache pollution is not an issue that 999 out of 1000 HLL programmers > > should concern themselves with. (Let's not confuse the assembler itself > > with issues that might arise in assembly programming...) > > > > The OP was, iirc, asking about "writing a simple assembler". 
A handmade > > lexer/parser is likely outside the 'simple' zone: I argue that it's > > easier and quicker to internalise 'info flex' and 'info bison' than to > > internalise the Dragon Book, a bunch of more recent references *and* > > fret oneself silly over cache pollution, pipeline stalls, etc. > > Just what is so complicated about parsing and lexical analysis > for a simple ASM (or even a complex ASM)? Do the obvious and > get on with the job.
Well, we have differing definitions of 'the job'. I don't consider reinventing the lexer/parser wheels (which can get arbitrarily complex and tedious to get right/read/maintain) as *necessarily* part of this 'job' of writing an assembler. Ymmv.

Clearly HLA falls outside my parameters, or far outside the 'simple' parameter, because it has strict performance requirements -- e.g. it has to process very large symbol tables and a complex macro syntax, and because it's among other things (forgive me) a competitor in a pissing contest, if you've ever frequented alt.lang.asm. It's also written by an assembly language programmer with an assembly language programmer's preoccupations - cache, etc.

I am simply defending an alternative approach: Don't microdesign and micromanage, but exploit tools like flex and bison to handle the tedious and move on to focus on 'the real job'.

At one point I compared[1] two implementations of a PAL-III (PDP-8) assembler, one with hand-coded lexer/parser, and one using flex/bison (mine). In the hand-coded case, the code concerned with lexing/parsing was 692 lines, or 58% of the program. In the flex/bison case, it was a mere 179 lines, or 11% of the program (including token definitions, grammar, support code). It seems reasonable to infer that there is correspondingly less to write and debug in the latter case, by a factor of nearly four. And it ends up in a clearer, more maintainable form.

But of course this reasoning applies most strongly to "simple" projects.

[1] http://www.telegraphics.com.au/sw/info/dpa.html
> > > In short, not every assembler is HLA.
On Wed, 08 Mar 2006 15:58:45 -0600, msg <msg@_cybertheque.org_> wrote:

> >> >> About 30 years ago i wrote a (cross) assembler for the Intel 8080 >> processor (8 bit) in Basic (from Hewlett Packard). >> I believe i have somewhere still the listing of this program. >> If you are very interested i will look what i still have and try to >> scan it (i do not have it in electronic format, maybe as punch paper >> tape (:-) >> >Hi, > >I am interested in seeing your BASIC assembler; if it is on paper >tape, I can read it (and return the tape with conversion of your >choice). I imagine this predates RMB (Rocky Mountain Basic)? > >Regards, > >Michael Grigoni >Cybertheque Museum
Hi Michael, I looked up what I still have on this 30-year-old project (old sentiment from my education). I could not find the paper tape. I only found the listing (with all REM statements removed to speed up the process). This listing is not really 'structured' BASIC, so it is not really clear for somebody else. I do have a detailed description of the cross-assembler program (that even has a simulator and reverse assembler!) but this is in Dutch. So thanks for the interest, but I cannot make it available without a lot of extra work (scanning etc). Bu
Hi Grant, sorry for the cross posting.

Do you know a good Python IDE? Which one do you use yourself?

Isaac Bosompem wrote:
> Hi Grant, sorry for the cross posting. > > Do you know a good Python IDE? Which one do you use yourself?
Eclipse will integrate nicely with Python - though I am not a Pythonist myself (I use Eclipse for C, C++, Perl, etc). http://wiki.python.org/moin/EclipsePythonIntegration