Writing a simple assembler| page 4

Reply by Artenz ●March 7, 20062006-03-07

samIam wrote:
> > In order to compensate for 10 minutes of time spent implementing a
> > better search, you'll need to run 1.2 billion lines of code through the
> > assembler.
>
> This attitude is why there isnt much thought given to softare
> development nowadays and its clearly reflected in the output and
> the drive for faster processors and huge amounts of memory/resources

On the contrary. I'm all for optimization, but only if it makes sense.
If, for any reasonable input size, the assembler takes no noticable
time to process the entire source, what point is there in further
improvements ? Instead, I prefer to optimize scarcer resources, such as
my own time.  If making the assembler is just a step towards the goal
of having a binary program ready for a product, and not a goal in
itself, then spending more time on the assembler than you will ever
gain back by the optimizations simply makes no sense.

Gigahertz processors, and gigabyte memories are here. It would be waste
of resources not to use them whenever it saves us time.

Reply by CBFalconer ●March 7, 20062006-03-07

samIam wrote:
> 
> I have never understood the obsession with hash tables.
> 
> SO WHAT they provide O(1) search/insert times ... but they are
> FIXED in size ... and you often have to implement collision
> detection to get around soem problems.

Please do not top-post.  It loses all context, and makes no sense. 
Your answer belongs after, or intermixed with, the material you
quote, after snipping non-relevant portions.

You obviously did not look at hashlib (reference lost by the
toppost).  It is not fixed in size.  It can handle from 2 to about
8,000,000 entries as it stands, and the upper limit can be extended
with one line of code, and the method is detailed in the source. 
You don't have to implement any collision detection, it is all done
for you.  

The only case in which it is less than optimum if you need to
maintain the content in sorted order continuously.  There are
provisions in hashlib that make it easy to sort the content on
demand, but that sort will be destroyed by the next entry or
deletion from the table.

Note that O(1) is considerably faster than O(logN).  It tends to
relieve you of any need to think about the penalties of expansion.

-- 
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Reply by CBFalconer ●March 7, 20062006-03-07

samIam wrote:
> 
>>    Resizing is an O(n) operation (where n is the no. of entries
>>    in the hash table at the time of resizing), but you resize
>>    only once every O(n) elements, so the amortized cost of
>>    resizing per insertion is O(n/n)=O(1); of course, there is no
>>    resizing cost when you are only doing lookups.
> 
> Biggest assumption you are making here is that THERE IS SPACE TO
> ALLOCATE/RESIZE.
> 
> Your assembler would not run on the target processor with that
> mentality. It would exist only as a cross development tool on a
> system with the resources to house it.
> 
> Software should be designed much better than that. But hey, what
> do I know? Nothing apparently!

You said it, not I.  If there is no room to house the symbol table,
there is probably no room to house the symbol table.

BTW how many embedded system assemblers and compilers are expected
to run on the destination system?

-- 
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Reply by CBFalconer ●March 7, 20062006-03-07

Isaac Bosompem wrote:
> CBFalconer wrote:
>> Isaac Bosompem wrote:
>>>
>> ... snip ...
>>>
>>> I do need an efficient hashing function, with a low probability of
>>> collisions (ideally). I will need to code to handle collisions.
>>>
>>> Anyways I am open to new ideas hopefully you have some of your own
>>> to add.
>>
>> You are welcome to use hashlib, which is under GPL licensing, and
>> was designed to fill just this sort of need.  It is written in pure
>> ISO C.
>>
>> You could open one table and stuff it with the opcodes.  Another
>> could hold the macros, and yet another the symbols.  The tables
>> will all use the same re-entrant hash code, yet can have widely
>> different content.
>>
>>   <http://cbfalconer.home.att.net/download/hashlib.zip>
> 
> WOW, your package will again make my life a lot easier. Thanks! I
> will definitely make use of it!

You are welcome.  This is exactly one of the sort of applications
it was designed for.  Remember, it is GPL licensed, not GLPL, so
you are free to use it, but you are obliged to publish your code
under GPL if you ever promulgate your end program.  If you only use
it for your own benefit, there are no restrictions.

Look at the usage in id2id-20, which may also be useful for any
mass identifier changes in a suite of source files.

-- 
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Reply by Paul Keinanen ●March 8, 20062006-03-08

On Tue, 07 Mar 2006 12:15:38 -0500, samIam <him@here.com> wrote:

>> If it's just "simple assembler" why not just use a list with
>> O(n) access time.  
>
>Most 8bit micros have anywhere from 40 - 60 instructions (opcodes
>NOT opcodes+addressing formats)
>
>O(n) searches on that set looks like a waste of cpu cyles
>
>> Better yet, use a real programming language with a dictionary
>> type (Python, Smalltalk, whatever).
>
>There are somethings you can ONLY do in assembly ... and on an 8 bit
>micro (with 64k address space) every byte counts

In the 1970's I wrote several cross-assemblers in Fortran running on
PDP-11s (which have 64 KiB address space) for various processors such
as 8080 and 1802. I never bothered to use anything more fancier than
linear search for opcode or symbol table search, since the total time
was dominated by the program load time, the source file read time (two
passes) and the binary output file writing (or even punching to paper
tape for downloading to the target :-).

It would be very hard to write so huge modules for any small target
processor that would require such a huge number of labels, that the
inefficient symbol table search time would have been of any
significance relative to the I/O times.

Paul

Reply by David Brown ●March 8, 20062006-03-08

CBFalconer wrote:
> Isaac Bosompem wrote:
> ... snip ...
>> I do need an efficient hashing function, with a low probability of
>> collisions (ideally). I will need to code to handle collisions.
>>
>> Anyways I am open to new ideas hopefully you have some of your own
>> to add.
> 
> You are welcome to use hashlib, which is under GPL licensing, and
> was designed to fill just this sort of need.  It is written in pure
> ISO C.
> 
> You could open one table and stuff it with the opcodes.  Another
> could hold the macros, and yet another the symbols.  The tables
> will all use the same re-entrant hash code, yet can have widely
> different content.
> 
>   <http://cbfalconer.home.att.net/download/hashlib.zip>
> 

It's for this sort of reason that such programs should be written in 
languages like Perl or Python (or even PHP).  Just as C++ is a poor 
choice of language for an 8051, so C/C++ is a poor choice of language 
for an assembler running on a PC.

Both Perl and Python will give you much simpler regular expression 
engines than any C library, vastly easier (and more flexible, and 
probably faster) hash dictionaries, and a host of ready-to-use libraries.

Reply by 42Bastian Schick ●March 8, 20062006-03-08

On Tue, 07 Mar 2006 20:24:47 -0000, Grant Edwards <grante@visi.com>
wrote:

>> There are somethings you can ONLY do in assembly ... and on an 8 bit
>> micro (with 64k address space) every byte counts
>
>Ah, the assembler is running on an 8-bit processor.  I guess I
>missed that.  I assumed the assembler would be running on a
>Linux or Windows host.

The OP wrote "for an 8 bit" not "on an 8 bit" CPU.

-- 
42Bastian
Do not email to bastian42@yahoo.com, it's a spam-only account :-)
Use <same-name>@monlynx.de instead !

Reply by CBFalconer ●March 8, 20062006-03-08

David Brown wrote:
> CBFalconer wrote:
>> Isaac Bosompem wrote:
>> ... snip ...
>>> I do need an efficient hashing function, with a low probability of
>>> collisions (ideally). I will need to code to handle collisions.
>>>
>>> Anyways I am open to new ideas hopefully you have some of your own
>>> to add.
>>
>> You are welcome to use hashlib, which is under GPL licensing, and
>> was designed to fill just this sort of need.  It is written in pure
>> ISO C.
>>
>> You could open one table and stuff it with the opcodes.  Another
>> could hold the macros, and yet another the symbols.  The tables
>> will all use the same re-entrant hash code, yet can have widely
>> different content.
>>
>>   <http://cbfalconer.home.att.net/download/hashlib.zip>
> 
> It's for this sort of reason that such programs should be written
> in languages like Perl or Python (or even PHP).  Just as C++ is a
> poor choice of language for an 8051, so C/C++ is a poor choice of
> language for an assembler running on a PC.
> 
> Both Perl and Python will give you much simpler regular expression
> engines than any C library, vastly easier (and more flexible, and
> probably faster) hash dictionaries, and a host of ready-to-use
> libraries.

And why do you assume an assembler needs the complication of a
regular expression engine?  It can classify items very quickly.  If
it starts in the lhcolumn it is either a name or a label.  A label
terminates with a colon.  A name should be followed by a pseudo op
such as macro or equ.  A field starting after the lh column is an
op or a pseudo op.  All else (unless the line has been terminated
by a comment marker) is an expression of some form.

Names, ops, etc. start with an alpha.  Numeric constants start with
a digit, unless they come out of a symbol table.  Nothing spreads
over multiple lines except a macro.  We don't need macros for a
simple assembler.

What an assembler needs is a quick and consistent way to form and
maintain symbol tables.  The complications appear in the object
format, if it is to generate relocatable linkable code.

-- 
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Reply by Grant Edwards ●March 8, 20062006-03-08

On 2006-03-08, Paul Keinanen <keinanen@sci.fi> wrote:

>>> If it's just "simple assembler" why not just use a list with
>>> O(n) access time.  
>>
>>Most 8bit micros have anywhere from 40 - 60 instructions (opcodes
>>NOT opcodes+addressing formats)
>>
>>O(n) searches on that set looks like a waste of cpu cyles
>>
>>> Better yet, use a real programming language with a dictionary
>>> type (Python, Smalltalk, whatever).
>>
>>There are somethings you can ONLY do in assembly ... and on an
>>8 bit micro (with 64k address space) every byte counts
>
> In the 1970's I wrote several cross-assemblers in Fortran running on
> PDP-11s (which have 64 KiB address space) for various processors such
> as 8080 and 1802. I never bothered to use anything more fancier than
> linear search for opcode or symbol table search, since the total time
> was dominated by the program load time, the source file read time (two
> passes) and the binary output file writing (or even punching to paper
> tape for downloading to the target :-).

My point exactly.  Worrying about hash tables for symbol lookup
reeks of premature optimization for "a simple assembler".

> It would be very hard to write so huge modules for any small
> target processor that would require such a huge number of
> labels, that the inefficient symbol table search time would
> have been of any significance relative to the I/O times.

I'm also surprised that somebody thinks they're going to use
C++ and generic hashing libraries on an 8-bit target with a 64K
address space.

-- 
Grant Edwards                   grante             Yow!  .. someone in DAYTON,
                                  at               Ohio is selling USED
                               visi.com            CARPETS to a SERBO-CROATIAN

Reply by Grant Edwards ●March 8, 20062006-03-08

On 2006-03-08, David Brown <david@westcontrol.removethisbit.com> wrote:
> CBFalconer wrote:
>> Isaac Bosompem wrote:
>> ... snip ...
>>> I do need an efficient hashing function, with a low probability of
>>> collisions (ideally). I will need to code to handle collisions.
>>>
>>> Anyways I am open to new ideas hopefully you have some of your own
>>> to add.
>> 
>> You are welcome to use hashlib, which is under GPL licensing, and
>> was designed to fill just this sort of need.  It is written in pure
>> ISO C.
>> 
>> You could open one table and stuff it with the opcodes.  Another
>> could hold the macros, and yet another the symbols.  The tables
>> will all use the same re-entrant hash code, yet can have widely
>> different content.
>> 
>>   <http://cbfalconer.home.att.net/download/hashlib.zip>
>> 
>
> It's for this sort of reason that such programs should be written in 
> languages like Perl or Python (or even PHP).  Just as C++ is a poor 
> choice of language for an 8051, so C/C++ is a poor choice of language 
> for an assembler running on a PC.

That's what I said, but apparently the assembler is running on
an 8-bit target with a 64K address space.  So Python is out of
the question.  But C++ and generic libraries are going to fit?

> Both Perl and Python will give you much simpler regular
> expression engines than any C library, vastly easier (and more
> flexible, and probably faster) hash dictionaries, and a host
> of ready-to-use libraries.

Yup.

-- 
Grant Edwards                   grante             Yow!  Darling, my ELBOW
                                  at               is FLYING over FRANKFURT,
                               visi.com            Germany...