This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).
|
I've been deliberating the value of these instructions (LB / LBU / SB) in a "general purpose" processor. If byte addressable memory is not used then effectively two more address bits can be made available to allow access to more memory (4G -> 16G with 32 bit addresses). Not to mention that the byte lane switching hardware can be eliminated. I'd love to be able to eliminate byte lane switching hardware because it would speed up the processor. Is byte addressability worth more than extra address bits?

I've put together a zip file for a SoC containing the BlueBird processor, and a text / graphics mode VGA controller. It's on my website.

Rob
http://www.birdcomputer.ca/
|
Hi Rob,

> I've been deliberating the value of these instructions in a "general purpose" processor. If byte addressable memory is not used then effectively two more address bits can be made available to allow access to more memory (4G -> 16G with 32 bit addresses). Not to mention that the byte lane switching hardware can be eliminated. I'd love to be able to eliminate byte lane switching hardware because it would speed up the processor. Is byte addressability worth more than extra address bits?

IIRC the Analog Devices SHARC DSP did without them: everything was 32 bits, which made for occasional fun porting C code, since char, short and int were all 32 bits :-)

Cheers,
Martin

--
Martin Thompson BEng(Hons) CEng MIEE
TRW Conekt
Stratford Road, Solihull, B90 4GW. UK
Tel: +44 (0)121-627-3569
|
rtfinch35 wrote:
> I've been deliberating the value of these instructions in a "general purpose" processor. If byte addressable memory is not used then effectively two more address bits can be made available to allow access to more memory (4G -> 16G with 32 bit addresses). Not to mention that the byte lane switching hardware can be eliminated. I'd love to be able to eliminate byte lane switching hardware because it would speed up the processor. Is byte addressability worth more than extra address bits?

It depends.

If you want good performance on code that was written with byte addressable memory in mind (e.g. most C code), then you have a problem without it. It's often very hard for a compiler to transform or schedule code to pack multiple byte accesses into a single word access. The packing also costs you (you may want specific byte pack/unpack instructions). And you'll never be able to make writing a single isolated byte to a word as fast as with byte-addressable memory; you'll always have to do a read-modify-write access in that case.

The DEC Alpha was one of the few recent system CPU architectures that tried to do without byte addressing. Despite their excellent design and advanced compiler, it didn't take long before they decided to add byte addressing (with the 21164A IIRC)... They realised it was a mistake for their high-performance system target, even with their state of the art compiler technology.

On the other hand, if you don't mind a performance hit, you can do without byte addressing and not only save address bits and reduce CPU cost, but also reduce memory (cache and main memory) cost. Also, if you expect to run code written (and optimised) specifically for your architecture most of the time, then the performance hit will be small in many cases. So for an embedded or special purpose processor it may well make sense to leave out byte addressing support.

- Reinoud
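The read-modify-write cost of an isolated byte store can be made concrete with a small C sketch. It assumes a hypothetical word-addressed memory (`mem`) whose only accesses are whole 32-bit loads and stores, and little-endian byte lanes; the names `mem`, `MEM_WORDS` and `store_byte` are made up for illustration and are not taken from any processor discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical word-addressed memory: only whole 32-bit words can be
   read or written (illustrative model, not a real design). */
#define MEM_WORDS 4u
static uint32_t mem[MEM_WORDS];

/* Emulate a byte store at byte address `addr`.
   Assumes little-endian lanes: byte 0 of a word lives in bits 7:0. */
static void store_byte(uint32_t addr, uint8_t value)
{
    uint32_t index = addr >> 2;              /* which word */
    uint32_t shift = (addr & 3u) * 8u;       /* which byte lane */
    assert(index < MEM_WORDS);

    uint32_t word = mem[index];              /* 1. read the whole word */
    word &= ~((uint32_t)0xFFu << shift);     /* 2. clear the target lane */
    word |= (uint32_t)value << shift;        /*    merge in the new byte */
    mem[index] = word;                       /* 3. write the word back */
}
```

Every emulated byte store touches memory twice (one load, one store), where a machine with byte write enables would need a single store.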
|
rtfinch35 wrote:
> I've been deliberating the value of these instructions in a "general purpose" processor. If byte addressable memory is not used then effectively two more address bits can be made available to allow access to more memory (4G -> 16G with 32 bit addresses). Not to mention that the byte lane switching hardware can be eliminated. I'd love to be able to eliminate byte lane switching hardware because it would speed up the processor. Is byte addressability worth more than extra address bits?

The old problem: bytes or no bytes. Most RISC machines that start with software byte support end up adding hardware support later. Make sure you have 16 bits around that can give you a wide byte. :)

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
Tommy Thorn wrote:
> Dhrystone MIPS (Dubious MIPS?) implies C, C implies representing strings as zero terminated byte arrays. If you're willing to give up C (say, for some other language) then nothing really dictates 1) support for eight-bit data types & 2) that word addresses are incremented by sizeof(word). Note, these are two distinct issues. For an embedded system there are many cases where byte support isn't worthwhile.

But if you omit bytes, short int takes their place. Mind you, you still lose the address bits that int-16, int-32 and int-64 take up. Nowhere does it say a byte has to be 8 bits. On some machines like the PDP-10 a byte is 7 bits. The problems come from programmers assuming that bytes and words map together, not from the language.

> The asymmetry is interesting. It's fairly cheap to implement byte loads in terms of word loads, but simulating byte stores with load-merge-store is much more expensive due to the increased memory traffic and its latencies.
>
> (Sweet hack. It's a variant of a more well known one. I'll give a few examples tomorrow).

So what is this, a serial now ... tune in next week to ( sounds of hoof beats fading off into the distance ).

> > One of the reasons I find FPGA CPUs so interesting, is they (used to) mandate a certain minimalism that leads you to re-examine conventional wisdom.

It is not minimalism ... it is using resources wisely. With my cpu I am at 98% routing capacity. Sure, I can make changes, but I have to do it carefully; nonetheless I have added several features with only a few changes. For example, I have a 12 bit processor, yet at times I want to store only 8 bits with the upper 4 bits zero filled. This was a minor change to the control logic, adding one inhibit line.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
Hi

The biggest problem I see is that of a compiler to work with it.

Porting LCC to a new target has an assumption that byte addressing is possible. It is not possible to set SIZEOFBYTE = SIZEOFINT = SIZEOFLONG. From what I can tell, other compilers are the same. It might be possible to code using only ints and longs, having done the port with this assumption in mind. The alternative is to do some rework on the compiler internals. If you were coding in assembler you'd be OK.

Invariably, most processor architectures that left out bytes to start with have adopted them later.

Veronica

-----Original Message-----
From: rtfinch35 <>
To: <>
Date: 31 January 2002 08:31
Subject: [fpga-cpu] LB / LBU / SB valuable ?

> I've been deliberating the value of these instructions in a "general purpose" processor. If byte addressable memory is not used then effectively two more address bits can be made available to allow access to more memory (4G -> 16G with 32 bit addresses). Not to mention that the byte lane switching hardware can be eliminated. I'd love to be able to eliminate byte lane switching hardware because it would speed up the processor. Is byte addressability worth more than extra address bits?
>
> I've put together a zip file for a SoC containing the BlueBird processor, and a text / graphics mode VGA controller. It's on my website.
>
> Rob http://www.birdcomputer.ca/
|
Tommy Thorn wrote:
> Not necessarily? My point was that it sometimes makes sense to design a word based machine where there is only one load and one store instruction. Obviously, you can extract bit fields from words and call them bytes and you can even layer an (expensive) byte addressing scheme on top of this. If byte operations aren't critical for performance then the savings could buy you performance elsewhere.

I like a large word length like 64 or 72 bits. You can often pack two instructions per word. What I don't like is a RISC architecture, as you seem to spend most of the time moving data to registers and only then performing your operation. Has anyone done an address stack, where you keep a cache of the last few addresses and use them as a short form of addressing? Most of my current designs have been limited to 24 bits because of FPGA size, with a 12 bit byte width since I have fewer chips to wire up.

> Now if the second word should have address 1, 2, or 4 is another issue. For 16-bit machine the difference between 64KB and 256KB is significant, but none of the systems I have in mind need anything near 4GB, let alone, 16GB. Keeping the byte addressing scheme makes it a lot easier to simulate in software.

Historically, 16 bit machines (before the PDP-11) did not have byte operations, and addressing was limited to 32k as often 1 bit was used to indicate multiple indirection. Some of the old machines had good ideas; let's keep them in mind too.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
(Good discussion thread, thanks Rob.)

> Porting LCC to a new target has an assumption that byte addressing is possible. It is not possible to set SIZEOFBYTE = SIZEOFINT = SIZEOFLONG. From what I can tell, other compilers are the same.

I've gone down the same paths before. I think byte stores are a necessary evil. I think they're one of the few areas where explicit hardware support above-and-beyond the bare minimum pays off in wall clock time. (If you ever want to catch or beat MicroBlaze at "D-MIPS" you're going to need to implement strcpy and strlen (I think it was) with an efficient handling of byte string data.)

Alternately, you can do insert-byte and insert-halfword instructions so that sb is
    load-word ;; insert-byte ;; store-word.
Another variation is the two instruction sequence
    load-word-merging-with-the-register-containing-the-byte-or-halfword-I-intend-to-store ;; store-word.
This sequence can update the register or (interlocked) do the merge in the "MDR".

By the way, you have probably noticed that xr16 has no lb(signed). It uses the sequence
    lb(unsigned) rd,addr ;; xori rd,0x80 ;; subi rd,0x80
which properly sign extends rd when rd[7] is set. The same trick (with 0x8000) works for load-halfword-signed.

And here's a cute hack that went through Microsoft back in 1988 or so: What property does w have whenever
    ((w-0x01010101) & ~w & 0x80808080) == 0
and why should you care? (If you know the answer, please embargo your response until TOMORROW so that people can take a little time to puzzle it out for themselves. If you've never twiddled bits before it's quite instructive and interesting.) It IS relevant to this discussion.

Back to the compiler discussion. Say we have 16-bit chars on a machine with 16- and 32-bit ld/st, but no 8-bit data type. (Note: I'm not suggesting this is a good idea for porting dusty deck C code.) Now let's build a C compiler with sizeof(char)==sizeof(short)==1 (16 bits) and sizeof(int)==sizeof(long)==sizeof(void*)==2 (32 bits). You could run this by lcc's ops utility:
    ops c=1 s=1 i=2 l=2 h=2 f=2 d=2 x=2 p=2
and I guess lcc would be just fine. Veronica?

And are you saying (from experience?) that
    ops c=1 s=1 i=1 l=1 h=1 f=1 d=1 x=1 p=1
won't work? Perhaps someone could try this and see what happens.

One of the reasons I find FPGA CPUs so interesting, is they (used to) mandate a certain minimalism that leads you to re-examine conventional wisdom.

Thanks.
Jan Gray, Gray Research LLC
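The xori/subi sign-extension sequence described above for xr16 can be checked in C. This is only a sketch of the arithmetic: it models the register as a 32-bit unsigned value (an assumption made for illustration; xr16 itself is a 16-bit machine), and the function names `sext8` and `sext16` are invented here.

```c
#include <stdint.h>

/* rd holds a zero-extended byte (0..0xFF), as left by an unsigned byte load.
   XOR flips the sign bit, then subtracting the same constant borrows
   through the upper bits exactly when the original sign bit was set. */
static uint32_t sext8(uint32_t rd)
{
    return (rd ^ 0x80u) - 0x80u;       /* xori rd,0x80 ;; subi rd,0x80 */
}

/* Same trick for a zero-extended halfword (0..0xFFFF). */
static uint32_t sext16(uint32_t rd)
{
    return (rd ^ 0x8000u) - 0x8000u;
}
```

For example, `sext8(0xFF)` yields 0xFFFFFFFF (that is, -1), while `sext8(0x7F)` stays 0x7F, so the same two instructions are harmless when rd[7] is clear.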
|
> I've gone down the same paths before. I think byte stores are a necessary evil. I think they're one of the few areas where explicit hardware support above-and-beyond the bare minimum pays off in wall clock time. (If you ever want to catch or beat MicroBlaze at "D-MIPS" you're going to need to implement strcpy and strlen (I think it was) with an efficient handling of byte string data.)

Dhrystone MIPS (Dubious MIPS?) implies C, C implies representing strings as zero terminated byte arrays. If you're willing to give up C (say, for some other language) then nothing really dictates 1) support for eight-bit data types & 2) that word addresses are incremented by sizeof(word). Note, these are two distinct issues. For an embedded system there are many cases where byte support isn't worthwhile.

> Alternately, you can do insert-byte and insert-halfword instructions so that sb is load-word ;; insert-byte ;; store-word. Another variation is the two instruction sequence load-word-merging-with-the-register-containing-the-byte-or-halfword-I-intend-to-store ;; store-word. This sequence can update the register or (interlocked) do the merge in the "MDR".

The asymmetry is interesting. It's fairly cheap to implement byte loads in terms of word loads, but simulating byte stores with load-merge-store is much more expensive due to the increased memory traffic and its latencies.

(Sweet hack. It's a variant of a more well known one. I'll give a few examples tomorrow).

> One of the reasons I find FPGA CPUs so interesting, is they (used to) mandate a certain minimalism that leads you to re-examine conventional wisdom.

Amen.

/Tommy
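The cheap half of that asymmetry, a byte load built from a word load, might look like the following in C. It is a sketch under assumptions: memory is modelled as an array of 32-bit words, bytes are numbered little-endian within a word, and `load_byte_unsigned` is a made-up name.

```c
#include <stdint.h>

/* Zero-extending byte load from word-addressed memory.
   Assumes byte 0 of each word sits in bits 7:0 (little-endian lanes). */
static uint32_t load_byte_unsigned(const uint32_t *mem, uint32_t addr)
{
    uint32_t word  = mem[addr >> 2];       /* the only memory access */
    uint32_t shift = (addr & 3u) * 8u;     /* lane select from the low 2 bits */
    return (word >> shift) & 0xFFu;        /* shift the lane down and mask */
}
```

The load costs one memory access plus a shift and a mask, all of which stay inside the processor; nothing has to be written back.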
|
Sorry to follow up so soon, but I think I was being misunderstood.

I wrote:
> > Dhrystone MIPS (Dubious MIPS?) implies C, C implies representing strings as zero terminated byte arrays. If you're willing to give up C (say, for some other language) then nothing really dictates 1) support for eight-bit data types & 2) that word addresses are incremented by sizeof(word). Note, these are two distinct issues. For an embedded system there are many cases where byte support isn't worthwhile.

Ben Franchuk replied:
> But if you omit bytes, short int takes its place.

Not necessarily? My point was that it sometimes makes sense to design a word based machine where there is only one load and one store instruction. Obviously, you can extract bit fields from words and call them bytes and you can even layer an (expensive) byte addressing scheme on top of this. If byte operations aren't critical for performance then the savings could buy you performance elsewhere.

Now whether the second word should have address 1, 2, or 4 is another issue. For a 16-bit machine the difference between 64KB and 256KB is significant, but none of the systems I have in mind need anything near 4GB, let alone 16GB. Keeping the byte addressing scheme makes it a lot easier to simulate in software.

To add a little more interesting content to my mail: What's so amazing about
    n << (s >> 1)
and when can it be used? (Please, don't reply until tomorrow).

Regards,
Tommy
|
On Thu, 31 Jan 2002, Ben Franchuk wrote:
> On some machines like the PDP-10 a byte is 7 bits.

Well, sometimes. The PDP-10 is a word-oriented machine, but it has a set of instructions for dealing with bytes of whatever size you want (0 to 36 bits). The byte instructions find the bytes using byte pointers, which hold the address of the word containing the byte, and the position and size of the byte within the word.

ASCII is often stored as 7 bits (5 bytes per word), since that's the most space-efficient. If you need to store 8-bit bytes, though, you could use 8 or 9 bits. The -10 also used SIXBIT, which handily fit six characters into one word.

A load byte (LDB) instruction reads the byte pointer, loads, shifts, and masks the word containing the byte, and stores it in a register. A deposit byte (DPB) instruction reads the byte pointer, reads the word, inserts the byte (from a register), and stores the word back.

Increment byte pointer (IBP) is more fun. It takes a byte pointer and slides it to the next byte in the word; if there aren't enough bits to do this, it increments the address of the word containing the byte and resets the position field to point at the first byte of that word. You can combine this with a load or deposit with the ILDB and IDPB instructions.

When the -10 got microcode, it ended up with more byte instructions. Adjust byte pointer (ADJBP) is a more powerful IBP: it allows you to slide a byte pointer by an arbitrary number of bytes when the byte you're pointing at might start at an arbitrary position in the word. There are also some extended instructions for dealing with strings of bytes (moving, comparing, decimal conversion).

They're not light on memory operations, but assembly language programmers seem to like 'em. :-)

jake (from XKL)
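The byte pointer operations described above can be modelled in a few lines of C. This is a simplified illustration, not the PDP-10's actual byte pointer encoding: 36-bit words are held in the low bits of a `uint64_t`, the pointer is a plain struct, and `pos` is defined here as the number of bits to the left of the byte, so 0 means the first byte in a word. The names `byte_ptr`, `ldb`, `dpb` and `ibp` merely echo the mnemonics.

```c
#include <stdint.h>

#define WORD_BITS 36u                     /* PDP-10 word length */

/* Simplified byte pointer (illustrative layout, not the hardware format). */
typedef struct {
    uint32_t addr;   /* index of the word containing the byte */
    uint32_t pos;    /* bits to the left of the byte within the word */
    uint32_t size;   /* byte width in bits, 0..36 */
} byte_ptr;

static uint64_t field_mask(uint32_t size)
{
    return (UINT64_C(1) << size) - 1u;
}

/* Load byte: fetch the word, shift the field down, mask it. */
static uint64_t ldb(const uint64_t *mem, byte_ptr p)
{
    uint32_t shift = WORD_BITS - p.pos - p.size;
    return (mem[p.addr] >> shift) & field_mask(p.size);
}

/* Deposit byte: read the word, replace the field, store the word back. */
static void dpb(uint64_t *mem, byte_ptr p, uint64_t value)
{
    uint32_t shift = WORD_BITS - p.pos - p.size;
    uint64_t mask  = field_mask(p.size) << shift;
    mem[p.addr] = (mem[p.addr] & ~mask) | ((value << shift) & mask);
}

/* Increment byte pointer: slide to the next byte, or move to the first
   byte of the next word when the remaining bits cannot hold one. */
static byte_ptr ibp(byte_ptr p)
{
    p.pos += p.size;
    if (p.pos + p.size > WORD_BITS) {
        p.addr += 1;
        p.pos = 0;
    }
    return p;
}
```

With `size` set to 7, five characters fit in each word and one bit is left over, which matches the 7-bit ASCII packing mentioned above.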
|
Hmmm... I think I'll compromise and support half-word (16 bit) loads and stores. It looks to me like bytes are only needed to support chars for legacy apps; nowadays everything is going Unicode.

I've never been that impressed with the way C handles strings. Pascal (and even BASIC) does a better job. There should be a built-in string type in the language, although this is fixed somewhat with C++.

I'm thinking of starting a port of gcc. Is there a place to download the gcc source and tools? The last time I tried to compile gcc I gave up because not all of the sources and tools were 'in sync'. I will be developing under Win98 / Me.

Rob
|
Hi

> What property does w have whenever ((w-0x01010101) & ~w & 0x80808080) == 0 and why should you care?

This is a nice little teaser!

> You could run this by lcc's ops utility:
>     ops c=1 s=1 i=2 l=2 h=2 f=2 d=2 x=2 p=2
> and I guess lcc would be just fine. Veronica?

Interesting that you propose that, because I was thinking of looking at it as a way forward. The easiest way may be to do this and not use char in the source.

> And are you saying (from experience?) that
>     ops c=1 s=1 i=1 l=1 h=1 f=1 d=1 x=1 p=1
> won't work?

I started looking at this from a byte-only processor and I couldn't get the backend to work properly. I then took a look from a 32-bit-only processor view and that looked worse.

Thinking about it, there is too much inherent sizing in C surrounding char types and their relationship with longer types. Just take a look at the types as defined in K&R from a compiler writer's perspective.

Page 34, K&R first edition:
    char is a single byte
    int is typically the natural size of the machine
    short is no longer than long

In the second edition (ANSI), page 36, shorts are at least 16 bits and longs at least 32; short is no longer than int, which is no longer than long.

C++ has that char is at least 8 bits, short at least 16 and long at least 32, with bytes no longer than shorts etc etc.

As an ANSI C compiler writer it is highly likely that you would then make certain assumptions about the architecture you are targeting. Bytes and byte pointers are one such area, and from memory it was code generation in this area that got messy.

Perhaps a way to go is to use 16 and 32 bit, provide dummies for the 8 bit encoding and never use 8 bit numbers. Use Unicode for strings. However, I would still anticipate code that ends up having to manipulate 8 bit numbers, because computer architecture depends so heavily on bytes.

For my own design I decided to use bytes (8 bit), shorts (16 bit) and ints/longs (32 bit), knowing that it isn't going to cost me a lot in FPGA area and makes things in the code and lcc a lot easier.
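The sizing rules listed above can be written down as compile-time checks. The sketch below uses C11 `_Static_assert` with the `<limits.h>` constants; the macro `BITS_OF` and the helper `all_integer_types_same_size` are names invented for this illustration. Note that these are the language's minimum guarantees only: the language itself does not rule out a target where CHAR_BIT is 16 or 32 and every integer type has size 1 (the SHARC case mentioned earlier), which is a separate question from whether a given compiler such as lcc can be retargeted that way.

```c
#include <limits.h>

/* Width of a type in bits as this compiler lays it out. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Minimum guarantees of ISO C; these hold on any conforming compiler. */
_Static_assert(sizeof(char) == 1, "char is the unit sizeof counts in");
_Static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");
_Static_assert(SHRT_MAX >= 32767, "short holds at least 16 bits");
_Static_assert(INT_MAX >= SHRT_MAX, "int is at least as wide as short");
_Static_assert(LONG_MAX >= INT_MAX, "long is at least as wide as int");
_Static_assert(LONG_MAX >= 2147483647L, "long holds at least 32 bits");

/* Returns 1 if char, short, int and long all have the same size on this
   target, as they would on a word-only machine with 32-bit chars. */
static int all_integer_types_same_size(void)
{
    return sizeof(char) == sizeof(short)
        && sizeof(short) == sizeof(int)
        && sizeof(int) == sizeof(long);
}
```

On an ordinary byte-addressed desktop compiler the helper returns 0; it could only return 1 on a target whose char is at least 32 bits wide.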
|
Veronica Merryfield wrote:
> Thinking about it, there is too much inherent sizing in C surrounding char types and their relationship with longer types. Just take a look at the types as defined in K&R from a compiler writer's perspective.
> Page 34, K&R first edition:
>     char is a single byte
>     int is typically the natural size of the machine
>     short is no longer than long
> In the second edition (ANSI), page 36, shorts are at least 16 bits and longs at least 32; short is no longer than int, which is no longer than long.
> C++ has that char is at least 8 bits, short at least 16 and long at least 32, with bytes no longer than shorts etc etc.

ANSI C, C++ :~) <- sticks out tongue. I think C lost it as a useful language at the point where it was ANSI'ised. Mind you, what alternatives do you have? There must be thousands of out-of-work Computer Science people, as nothing new and small has appeared to write programs in. Big systems and big languages, yes. Mind you, the only thing C really requires is that a pointer can == NULL and that end of string is 0. The rest is hardware and library dependent.

> As an ANSI C compiler writer it is highly likely that you would then make certain assumptions about the architecture you are targeting. Bytes and byte pointers are one such area, and from memory it was code generation in this area that got messy.

As a person with a CPU design that has no register to register operations other than a move, I get rather annoyed that most code generators I have seen load a register and then perform a register to register operation.

> Perhaps a way to go is to use 16 and 32 bit, provide dummies for the 8 bit encoding and never use 8 bit numbers. Use Unicode for strings. However, I would still anticipate code that ends up having to manipulate 8 bit numbers, because computer architecture depends so heavily on bytes.

A store byte instruction is still useful where the upper 8 bits are zeroed in a word store. A swap byte instruction is also handy. These were last minute additions to my instruction set, for char data to and from disk.

> For my own design I decided to use bytes (8 bit), shorts (16 bit) and ints/longs (32 bit), knowing that it isn't going to cost me a lot in FPGA area and makes things in the code and lcc a lot easier.

I really have not looked at lcc for porting to my cpu since it does seem to self compile well. Note the system I plan to design will have 96k free ram, not several megs like most C compilers.

Ben.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
> > For my own design I decided to use bytes (8 bit), shorts (16 bit) and ints/longs (32 bit), knowing that it isn't going to cost me a lot in FPGA area and makes things in the code and lcc a lot easier.
>
> I really have not looked at lcc for porting to my cpu since it does seem to self compile well. Note the system I plan to design will have 96k free ram, not several megs like most C compilers.

If you think C is only suited for big register-register architectures with lots of memory to burn, you should maybe have a look at CC65 (www.cc65.org). That's a C compiler for several Commodore machines using a 6502 or compatible. Yes, that is 64kb of memory and an 8 bit accumulator based cpu.
|
Tim Boescke wrote:
> If you think C is only suited for big register-register architectures with lots of memory to burn, you should maybe have a look at CC65 (www.cc65.org). That's a C compiler for several Commodore machines using a 6502 or compatible. Yes, that is 64kb of memory and an 8 bit accumulator based cpu.

I glanced at it a while back. 1) No source. 2) It does look like you do development from the PC.

Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
Veronica Merryfield wrote:
> 96K is plenty. Writing in C doesn't mean you have to write inefficient code. If you pay as much attention to writing C and what the compiler will do as you would in writing with asm, then you can write small fast code. Writing in a high level language doesn't mean you have to bloat it even though most programmers seem to. The same goes for C++.
>
> As an example, I have got a flue gas analyser with LCD UI into a 16k ROM with compiled C. I have others of similar complex systems written in C with relatively small footprints.

Well, for now I am limited to 32K as that is all there is on my prototype card. It is the compilers that have the big footprints.

Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
Ben Franchuk wrote:
> I glanced at it a while back. 1) No source. 2) It does look like you do development from the PC.
>
> Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html

I was wrong about no source, but it does look like you need a 32bit compiler for it. All I've got here is MS6. However, I will spend more energy on hardware and an OS. Once I have that, then I can get a better C compiler.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
> I really have not looked at lcc for porting to my cpu since it does seem to self compile well. Note the system I plan to design will have 96k free ram, not several megs like most C compilers.

96K is plenty. Writing in C doesn't mean you have to write inefficient code. If you pay as much attention to writing C and what the compiler will do as you would in writing with asm, then you can write small fast code. Writing in a high level language doesn't mean you have to bloat it even though most programmers seem to. The same goes for C++.

As an example, I have got a flue gas analyser with LCD UI into a 16k ROM with compiled C. I have others of similar complex systems written in C with relatively small footprints.

Veronica
|
> > I glanced at it a while back. 1) No source. 2) It does look like you do development from the PC.
> >
> > Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
>
> I was wrong about no source, but it does look like you need a 32bit compiler for it. All I've got here is MS6. However, I will spend more energy on hardware and an OS. Once I have that, then I can get a better C compiler.

MSVC 6? That should be fine.

Regarding an embedded OS: has anybody ever considered porting Minix to an FPGA CPU? It's a very small unix clone and well documented. It is probably quite easy to port, as there are only a few small assembler parts. Maybe it is possible to compile it with some modified LCC or Small C.
|
Sean wrote:
> Why not use eCos or uCos-II on an fpga. I am not sure how useful Minix would be for that period of time after the port was completed. B^)

<snip>

I have the book 'Embedded Systems Building Blocks' on order. I hope to port from C to assembly. At this time all I plan for my OS to do is fopen, fclose, fput, fget, delete. However, since I am still coding the hardware, I don't expect to have a motherboard, and a prom burned for it, for at least a month. Right now I can't make up my mind what frequency to run the cpu at, but a 12 MHz clock (3 MHz memory cycle time) is what it is set for now.

> I am not sure of the C compiler requirements for building either of these operating systems. If LCC worked, oh sweet joy!

I was looking at the Minix page and LCC is ported for Minix.

> Sean.

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
> Why not use eCos or uCos-II on an fpga. I am not sure how useful Minix would be for that period of time after the port was completed. B^)

Uhm well.. that's a point. Nevertheless, Minix comes with most unix tools and is geared towards the very low end. If you want some kind of "desktop" operating system on your machine, it is probably still the way to go.
|
> > Why not use eCos or uCos-II on an fpga. I am not sure how useful Minix would be for that period of time after the port was completed. B^)
>
> Uhm well.. that's a point. Nevertheless, Minix comes with most unix tools and is geared towards the very low end. If you want some kind of "desktop" operating system on your machine, it is probably still the way to go.

Well, another idea would be uClinux: http://www.uclinux.org/

That's a linux port for small embedded systems without an MMU. A bit bigger and less well documented than Minix, I'd presume..
|
Tim Boescke wrote:
> Uhm well.. that's a point. Nevertheless, Minix comes with most unix tools and is geared towards the very low end. If you want some kind of "desktop" operating system on your machine, it is probably still the way to go.

I have looked at Minix and uClinux but they are all still 8086 based. They are too complex to port at the moment. Also, about the desktop OS: I want the computer off the top of my desk. :)

--
Ben Franchuk - Dawn * 12/24 bit cpu * www.jetnet.ab.ca/users/bfranchuk/index.html
|
I wrote:
> What property does w have whenever
>     ((w-0x01010101) & ~w & 0x80808080) == 0
> and why should you care?

Answer: (Assuming w is a 2's complement 32-bit word,)
    ((w-0x01010101) & ~w & 0x80808080) != 0
if and only if some byte in w (e.g. w[31:24], w[23:16], w[15:8], w[7:0]) is 0.

See, e.g. NUL finding in http://www.ugcs.caltech.edu/~wnoise/base2.html -- which has many more fun bit twiddling tricks.

I once measured (on a cached i386 XENIX box) that strlen (and hence strcpy) based on this identity were some tens of per cent faster than byte at a time strlen / strcpy using REP SCASB. However, it was not a win for very short strings, or for i386s without caches. Who knows how they compare on newer, faster x86 processors?

Now then, back to the topic at hand. Even if your processor lacks byte load/store hardware, it can implement byte addressable memory, and C strings, if it has byte insert and extract support, or, failing that, shifts.

If you do use this approach, though, you may wish to implement C strings word-at-a-time, using the above identity to quickly test if the current 4 or 8 byte word has any NULs. (You could also follow the example of others and provide a test-for-null-byte instruction.) Then the inner loops of strlen, strcpy, etc. can sometimes go a word at a time (modulo alignment issues, etc.).

Jan Gray, Gray Research LLC
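A word-at-a-time strlen built on this identity might look like the sketch below. It is written for the word-only machine under discussion, so the string is passed as an array of 32-bit words rather than as a `char *`; the packing order (first character in bits 7:0) and the names `has_zero_byte` and `strlen_packed` are assumptions made for illustration, not code from any library mentioned in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if and only if at least one of the four bytes of w is zero. */
static uint32_t has_zero_byte(uint32_t w)
{
    return (w - 0x01010101u) & ~w & 0x80808080u;
}

/* Length of a NUL-terminated string packed four characters per word.
   Assumes the first character of each word is in bits 7:0. */
static size_t strlen_packed(const uint32_t *words)
{
    size_t n = 0;

    /* Fast path: skip whole words that contain no NUL. */
    while (!has_zero_byte(*words)) {
        words++;
        n += 4;
    }

    /* Slow path: find the NUL inside the final word. */
    uint32_t w = *words;
    while (w & 0xFFu) {
        w >>= 8;
        n++;
    }
    return n;
}
```

Because the string is stored as whole words, the fast loop never reads outside the storage that holds the string, which sidesteps the alignment and over-read concerns that apply when the same trick is used on an ordinary `char *`.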
|
Thanks to Jan for the pointer to many base-2 tricks (some I hadn't seen before). However, my little trick wasn't covered. As I'm sure everyone (who cares) has realized by now,
    n << (x >> 1) == n*x
for x = 1, 2, or 4. This formula is useful for calculating the total size given an item count and item size, when you know that items can only be bytes, 16-bits, or 32-bits. Unfortunately it doesn't work for 64-bit items.

This might be considered Off Topic, but as base-2 is fundamental to everything FPGA we should know all the tricks there are.

Regards,
Tommy
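A quick way to convince yourself of the size trick is to code it up. The sketch below shifts the count by `item_size >> 1`, i.e. by 0, 1 or 2 for item sizes of 1, 2 or 4 bytes, which is the form that satisfies the identity in C (any extra "- 1" on the shift amount would turn the 1-byte case into a negative shift, which C leaves undefined). The function name `total_size` is invented for illustration.

```c
#include <stdint.h>

/* Total size in bytes of `count` items of `item_size` bytes each.
   Only valid for item_size of 1, 2 or 4: the shift amounts are 0, 1, 2,
   which happen to equal log2(item_size) for exactly those three values. */
static uint32_t total_size(uint32_t count, uint32_t item_size)
{
    return count << (item_size >> 1);
}
```

For 8-byte items the shift amount becomes 4 rather than 3, so `total_size(10, 8)` gives 160 instead of 80, which is the 64-bit limitation noted above.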
|
Ben Franchuk <> writes:
> I have looked at Minix and uClinux but they are all still 8086 based. They are too complex to port at the moment. Also, about the desktop OS: I want the computer off the top of my desk. :)

uClinux is not 8086 based (I'm not sure if it even runs on x86). According to the web page, there are (at least preliminary) ports for Motorola 68K family processors, Motorola ColdFire, ARM, H8, and other processors.

(I'm not saying I think uClinux is a good choice for targeting an FPGA; unless you really need the features of Linux, I think an OS that's written from scratch to be embeddable is probably a better choice.)

Carl Witty