Forums

Writing a simple assembler

Started by Alex March 6, 2006
samIam wrote:
> > In order to compensate for 10 minutes of time spent implementing a > > better search, you'll need to run 1.2 billion lines of code through the > > assembler. > > This attitude is why there isnt much thought given to softare > development nowadays and its clearly reflected in the output and > the drive for faster processors and huge amounts of memory/resources
On the contrary. I'm all for optimization, but only if it makes sense. If, for any reasonable input size, the assembler takes no noticable time to process the entire source, what point is there in further improvements ? Instead, I prefer to optimize scarcer resources, such as my own time. If making the assembler is just a step towards the goal of having a binary program ready for a product, and not a goal in itself, then spending more time on the assembler than you will ever gain back by the optimizations simply makes no sense. Gigahertz processors, and gigabyte memories are here. It would be waste of resources not to use them whenever it saves us time.
samIam wrote:
> > I have never understood the obsession with hash tables. > > SO WHAT they provide O(1) search/insert times ... but they are > FIXED in size ... and you often have to implement collision > detection to get around soem problems.
Please do not top-post. It loses all context, and makes no sense. Your answer belongs after, or intermixed with, the material you quote, after snipping non-relevant portions. You obviously did not look at hashlib (reference lost by the toppost). It is not fixed in size. It can handle from 2 to about 8,000,000 entries as it stands, and the upper limit can be extended with one line of code, and the method is detailed in the source. You don't have to implement any collision detection, it is all done for you. The only case in which it is less than optimum if you need to maintain the content in sorted order continuously. There are provisions in hashlib that make it easy to sort the content on demand, but that sort will be destroyed by the next entry or deletion from the table. Note that O(1) is considerably faster than O(logN). It tends to relieve you of any need to think about the penalties of expansion. -- "If you want to post a followup via groups.google.com, don't use the broken "Reply" link at the bottom of the article. Click on "show options" at the top of the article, then click on the "Reply" at the bottom of the article headers." - Keith Thompson More details at: <http://cfaj.freeshell.org/google/> Also see <http://www.safalra.com/special/googlegroupsreply/>
samIam wrote:
> >> Resizing is an O(n) operation (where n is the no. of entries >> in the hash table at the time of resizing), but you resize >> only once every O(n) elements, so the amortized cost of >> resizing per insertion is O(n/n)=O(1); of course, there is no >> resizing cost when you are only doing lookups. > > Biggest assumption you are making here is that THERE IS SPACE TO > ALLOCATE/RESIZE. > > Your assembler would not run on the target processor with that > mentality. It would exist only as a cross development tool on a > system with the resources to house it. > > Software should be designed much better than that. But hey, what > do I know? Nothing apparently!
You said it, not I. If there is no room to house the symbol table, there is probably no room to house the symbol table. BTW how many embedded system assemblers and compilers are expected to run on the destination system? -- "If you want to post a followup via groups.google.com, don't use the broken "Reply" link at the bottom of the article. Click on "show options" at the top of the article, then click on the "Reply" at the bottom of the article headers." - Keith Thompson More details at: <http://cfaj.freeshell.org/google/> Also see <http://www.safalra.com/special/googlegroupsreply/>
Isaac Bosompem wrote:
> CBFalconer wrote: >> Isaac Bosompem wrote: >>> >> ... snip ... >>> >>> I do need an efficient hashing function, with a low probability of >>> collisions (ideally). I will need to code to handle collisions. >>> >>> Anyways I am open to new ideas hopefully you have some of your own >>> to add. >> >> You are welcome to use hashlib, which is under GPL licensing, and >> was designed to fill just this sort of need. It is written in pure >> ISO C. >> >> You could open one table and stuff it with the opcodes. Another >> could hold the macros, and yet another the symbols. The tables >> will all use the same re-entrant hash code, yet can have widely >> different content. >> >> <http://cbfalconer.home.att.net/download/hashlib.zip> > > WOW, your package will again make my life a lot easier. Thanks! I > will definitely make use of it!
You are welcome. This is exactly one of the sort of applications it was designed for. Remember, it is GPL licensed, not GLPL, so you are free to use it, but you are obliged to publish your code under GPL if you ever promulgate your end program. If you only use it for your own benefit, there are no restrictions. Look at the usage in id2id-20, which may also be useful for any mass identifier changes in a suite of source files. -- "If you want to post a followup via groups.google.com, don't use the broken "Reply" link at the bottom of the article. Click on "show options" at the top of the article, then click on the "Reply" at the bottom of the article headers." - Keith Thompson More details at: <http://cfaj.freeshell.org/google/> Also see <http://www.safalra.com/special/googlegroupsreply/>
On Tue, 07 Mar 2006 12:15:38 -0500, samIam <him@here.com> wrote:

>> If it's just "simple assembler" why not just use a list with >> O(n) access time. > >Most 8bit micros have anywhere from 40 - 60 instructions (opcodes >NOT opcodes+addressing formats) > >O(n) searches on that set looks like a waste of cpu cyles > >> Better yet, use a real programming language with a dictionary >> type (Python, Smalltalk, whatever). > >There are somethings you can ONLY do in assembly ... and on an 8 bit >micro (with 64k address space) every byte counts
In the 1970's I wrote several cross-assemblers in Fortran running on PDP-11s (which have 64 KiB address space) for various processors such as 8080 and 1802. I never bothered to use anything more fancier than linear search for opcode or symbol table search, since the total time was dominated by the program load time, the source file read time (two passes) and the binary output file writing (or even punching to paper tape for downloading to the target :-). It would be very hard to write so huge modules for any small target processor that would require such a huge number of labels, that the inefficient symbol table search time would have been of any significance relative to the I/O times. Paul
CBFalconer wrote:
> Isaac Bosompem wrote: > ... snip ... >> I do need an efficient hashing function, with a low probability of >> collisions (ideally). I will need to code to handle collisions. >> >> Anyways I am open to new ideas hopefully you have some of your own >> to add. > > You are welcome to use hashlib, which is under GPL licensing, and > was designed to fill just this sort of need. It is written in pure > ISO C. > > You could open one table and stuff it with the opcodes. Another > could hold the macros, and yet another the symbols. The tables > will all use the same re-entrant hash code, yet can have widely > different content. > > <http://cbfalconer.home.att.net/download/hashlib.zip> >
It's for this sort of reason that such programs should be written in languages like Perl or Python (or even PHP). Just as C++ is a poor choice of language for an 8051, so C/C++ is a poor choice of language for an assembler running on a PC. Both Perl and Python will give you much simpler regular expression engines than any C library, vastly easier (and more flexible, and probably faster) hash dictionaries, and a host of ready-to-use libraries.
On Tue, 07 Mar 2006 20:24:47 -0000, Grant Edwards <grante@visi.com>
wrote:

>> There are somethings you can ONLY do in assembly ... and on an 8 bit >> micro (with 64k address space) every byte counts > >Ah, the assembler is running on an 8-bit processor. I guess I >missed that. I assumed the assembler would be running on a >Linux or Windows host.
The OP wrote "for an 8 bit" not "on an 8 bit" CPU. -- 42Bastian Do not email to bastian42@yahoo.com, it's a spam-only account :-) Use <same-name>@monlynx.de instead !
David Brown wrote:
> CBFalconer wrote: >> Isaac Bosompem wrote: >> ... snip ... >>> I do need an efficient hashing function, with a low probability of >>> collisions (ideally). I will need to code to handle collisions. >>> >>> Anyways I am open to new ideas hopefully you have some of your own >>> to add. >> >> You are welcome to use hashlib, which is under GPL licensing, and >> was designed to fill just this sort of need. It is written in pure >> ISO C. >> >> You could open one table and stuff it with the opcodes. Another >> could hold the macros, and yet another the symbols. The tables >> will all use the same re-entrant hash code, yet can have widely >> different content. >> >> <http://cbfalconer.home.att.net/download/hashlib.zip> > > It's for this sort of reason that such programs should be written > in languages like Perl or Python (or even PHP). Just as C++ is a > poor choice of language for an 8051, so C/C++ is a poor choice of > language for an assembler running on a PC. > > Both Perl and Python will give you much simpler regular expression > engines than any C library, vastly easier (and more flexible, and > probably faster) hash dictionaries, and a host of ready-to-use > libraries.
And why do you assume an assembler needs the complication of a regular expression engine? It can classify items very quickly. If it starts in the lhcolumn it is either a name or a label. A label terminates with a colon. A name should be followed by a pseudo op such as macro or equ. A field starting after the lh column is an op or a pseudo op. All else (unless the line has been terminated by a comment marker) is an expression of some form. Names, ops, etc. start with an alpha. Numeric constants start with a digit, unless they come out of a symbol table. Nothing spreads over multiple lines except a macro. We don't need macros for a simple assembler. What an assembler needs is a quick and consistent way to form and maintain symbol tables. The complications appear in the object format, if it is to generate relocatable linkable code. -- "If you want to post a followup via groups.google.com, don't use the broken "Reply" link at the bottom of the article. Click on "show options" at the top of the article, then click on the "Reply" at the bottom of the article headers." - Keith Thompson More details at: <http://cfaj.freeshell.org/google/> Also see <http://www.safalra.com/special/googlegroupsreply/>
On 2006-03-08, Paul Keinanen <keinanen@sci.fi> wrote:

>>> If it's just "simple assembler" why not just use a list with >>> O(n) access time. >> >>Most 8bit micros have anywhere from 40 - 60 instructions (opcodes >>NOT opcodes+addressing formats) >> >>O(n) searches on that set looks like a waste of cpu cyles >> >>> Better yet, use a real programming language with a dictionary >>> type (Python, Smalltalk, whatever). >> >>There are somethings you can ONLY do in assembly ... and on an >>8 bit micro (with 64k address space) every byte counts > > In the 1970's I wrote several cross-assemblers in Fortran running on > PDP-11s (which have 64 KiB address space) for various processors such > as 8080 and 1802. I never bothered to use anything more fancier than > linear search for opcode or symbol table search, since the total time > was dominated by the program load time, the source file read time (two > passes) and the binary output file writing (or even punching to paper > tape for downloading to the target :-).
My point exactly. Worrying about hash tables for symbol lookup reeks of premature optimization for "a simple assembler".
> It would be very hard to write so huge modules for any small > target processor that would require such a huge number of > labels, that the inefficient symbol table search time would > have been of any significance relative to the I/O times.
I'm also surprised that somebody thinks they're going to use C++ and generic hashing libraries on an 8-bit target with a 64K address space. -- Grant Edwards grante Yow! .. someone in DAYTON, at Ohio is selling USED visi.com CARPETS to a SERBO-CROATIAN
On 2006-03-08, David Brown <david@westcontrol.removethisbit.com> wrote:
> CBFalconer wrote: >> Isaac Bosompem wrote: >> ... snip ... >>> I do need an efficient hashing function, with a low probability of >>> collisions (ideally). I will need to code to handle collisions. >>> >>> Anyways I am open to new ideas hopefully you have some of your own >>> to add. >> >> You are welcome to use hashlib, which is under GPL licensing, and >> was designed to fill just this sort of need. It is written in pure >> ISO C. >> >> You could open one table and stuff it with the opcodes. Another >> could hold the macros, and yet another the symbols. The tables >> will all use the same re-entrant hash code, yet can have widely >> different content. >> >> <http://cbfalconer.home.att.net/download/hashlib.zip> >> > > It's for this sort of reason that such programs should be written in > languages like Perl or Python (or even PHP). Just as C++ is a poor > choice of language for an 8051, so C/C++ is a poor choice of language > for an assembler running on a PC.
That's what I said, but apparently the assembler is running on an 8-bit target with a 64K address space. So Python is out of the question. But C++ and generic libraries are going to fit?
> Both Perl and Python will give you much simpler regular > expression engines than any C library, vastly easier (and more > flexible, and probably faster) hash dictionaries, and a host > of ready-to-use libraries.
Yup. -- Grant Edwards grante Yow! Darling, my ELBOW at is FLYING over FRANKFURT, visi.com Germany...