Reply by Kelly October 9, 2004

--- In rabbit-semi@rabb..., Scott Henion <shenion@s...> wrote:
> So it looks like using far pointers in ST C gets a 2.5x
> speed penalty. Not bad considering you need to do 64k wrap
> checks on each pointer and calc seg+offset. But it sure
> beats using xmem2root() and manually doing buffer
> swaps.

Impressive.

Dare I suggest someone give Duff's Device a try?

http://www.lysator.liu.se/c/duffs-device.html

In K&R C, targeting a VAX in 1983:

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
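
If someone wants to try it against Don's test harness below, a rough (untested) adaptation might look like the following. Note the *to++ here, since we're copying buffer to buffer rather than feeding a device register, and 'withDuffsDevice' is just a name made up to match the other test functions:

nodebug void withDuffsDevice( void )
{
    unsigned char *to, *from;
    unsigned short n;

    for ( i=0; i<100; i++ )
    {
        to = DstBuffer;
        from = SrcBuffer;
        n = (sizeof(DstBuffer) + 7) / 8;
        switch ( sizeof(DstBuffer) % 8 ) {
        case 0: do { *to++ = *from++;
        case 7:      *to++ = *from++;
        case 6:      *to++ = *from++;
        case 5:      *to++ = *from++;
        case 4:      *to++ = *from++;
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while ( --n > 0 );
        }
    }
}

Since 8192 is a multiple of 8 it always enters at case 0 and runs 1024 times per pass; the unrolling just spreads the loop-counter overhead over eight copies per iteration.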

Kelly



Reply by Scott Henion October 8, 2004
At 05:17 PM 10/8/2004, you wrote:

>At 02:53 PM 10/8/2004, you wrote:
> >Output:
> > Pure assembly: 199 milliseconds
> > memcpy(): 311 milliseconds
> > pointers: 3977 milliseconds
> > array index: 3580 milliseconds
>
>Softools Output:
>Pure assembly: 196 milliseconds
>memcpy(): 197 milliseconds
>pointers: 3675 milliseconds
>array index: 3030 milliseconds
>
>Using far pointers (xmem arrays)
>fmemcpy(): 197 milliseconds
>far pointers: 3676 milliseconds
>
>;)

Revised results (the far results above were actually using a near pointer):

RCM3010 Softools Output:
Pure assembly: 196 milliseconds
memcpy(): 197 milliseconds
pointers: 3676 milliseconds
array index: 3030 milliseconds

fmemcpy(): 1012 milliseconds
far pointers: 7967 milliseconds
far array index: 8499 milliseconds

So it looks like using far pointers in ST C gets a 2.5x speed penalty. Not
bad considering you need to do 64k wrap checks on each pointer and calc
seg+offset. But it sure beats using xmem2root() and manually doing buffer
swaps.
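
(From the numbers above: 7967/3676 is about 2.2x for pointers and 8499/3030
about 2.8x for array indexing, while fmemcpy() at 1012/197 takes roughly a
5x hit versus memcpy().)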

<Scott


Reply by Don Starr October 8, 2004
> I'm pretty amazed that pointers were not the fastest... :(
> I believe there is a "pointer check" compile option. I'm not at
> my rabbit bench right now. Does turning that off make much of
> a timing difference?

Both pointer and array index checking were "on" for the previous
test. Here are the results when they're "off":
Pure assembly: 199 milliseconds
memcpy(): 310 milliseconds
pointers: 3978 milliseconds
array index: 3580 milliseconds

Here are the previous results:
> > Output:
> > Pure assembly: 199 milliseconds
> > memcpy(): 311 milliseconds
> > pointers: 3977 milliseconds
> > array index: 3580 milliseconds

Doesn't look like much of a difference. I haven't looked at the
generated ASM code to see what's going on.

> I write pointer incrs when I want speed ... obviously I need to
> rethink that assumption.

Probably depends on the compiler.

memcpy() will always be the fastest "straight C" implementation, as
long as the library function is written in assembly (using something
like LDIR). DC's memcpy() implementation is a bit odd - they check
for overlapping memory regions and then use LDIR or LDDR, as
appropriate. That's supposed to be a memmove() feature (though the
fact that memcpy()'s behavior in such a circumstance is "undefined"
makes DC's implementation perfectly legal).
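
For what it's worth, the overlap test memmove() has to make looks roughly
like this in plain C (just a sketch with a made-up name; DC presumably does
the actual copies in assembly with LDIR/LDDR):

void *my_memmove( void *dst, const void *src, unsigned n )
{
    unsigned char *d;
    const unsigned char *s;

    d = dst;
    s = src;
    if ( d > s && d < s + n )
    {
        /* dst overlaps the tail of src: copy backward (what LDDR would do) */
        while ( n-- )
            d[n] = s[n];
    }
    else
    {
        /* no harmful overlap: copy forward (what LDIR would do) */
        while ( n-- )
            *d++ = *s++;
    }
    return dst;
}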

Pointers vs. arrays will depend on the compiler. A compiler could
optimize pointers such that they were faster than using arrays, or
could do the converse.

-Don


Reply by Scott Henion October 8, 2004
At 02:53 PM 10/8/2004, you wrote:
>Output:
> Pure assembly: 199 milliseconds
> memcpy(): 311 milliseconds
> pointers: 3977 milliseconds
> array index: 3580 milliseconds

Softools Output:
Pure assembly: 196 milliseconds
memcpy(): 197 milliseconds
pointers: 3675 milliseconds
array index: 3030 milliseconds

Using far pointers (xmem arrays)
fmemcpy(): 197 milliseconds
far pointers: 3676 milliseconds

;)


Reply by Kelly October 8, 2004

--- In rabbit-semi@rabb..., "Don Starr" <don@s...> wrote:
> The test below was run on a 29.4 MHz RCM3000. Code was compiled
> and run under DC 7.33TSE, "optimized" for speed.

<snip>

> Output:
> Pure assembly: 199 milliseconds
> memcpy(): 311 milliseconds
> pointers: 3977 milliseconds
> array index: 3580 milliseconds

Pure assembly: 4020 KB/sec
memcpy(): 2572.3 KB/sec
pointers: 201.16 KB/sec
indices: 223.46 KB/sec
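
(Derived from the times: 100 passes x 8 KB = 800 KB per test, so e.g.
800 KB / 0.199 s ≈ 4020 KB/sec; the rest follow the same way.)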

I would never have expected a 20x speed up for assembly over
pointer-based C code. Truly amazing.

> I'm not sure why one would use the 'pointers' or 'array indices'
> versions, unless one was moving data from a circular queue to
> another buffer (and even then, it could be handled with, at most,
> two memcpy() or ASM LDIR operations).

It might happen if you were trying to use a standard C library.
Imagine how an off-the-shelf XML library might perform.

Kelly



Reply by Don Starr October 8, 2004
--- In rabbit-semi@rabb..., "Don Starr" <don@s...> wrote:
> > Copying a 8192-byte buffer 100 times takes 199 milliseconds.
>
> With assembly.

Well, nobody placed any limitation on programming language, or
even development tools ;)

> So I imagine that it would be almost impossible to exceed that speed
> in practice.

Without some DMA hardware, yes.

> What sort of speed do you get if you write it in C?
> Either with a pointer or loop index?

The test below was run on a 29.4 MHz RCM3000. Code was compiled
and run under DC 7.33TSE, "optimized" for speed. A better compiler
would likely yield better results for the pointer and array index
times - the hardware isn't the only limiting factor here (that's
why my first test was pure ASM - it tests only the hardware).

unsigned char SrcBuffer[8192], DstBuffer[8192];
unsigned short int i, j;
unsigned char *pSrc, *pDst;

nodebug void withPureASM(void)
{
#asm
    ld a, 100
loop:
    ld de, DstBuffer
    ld hl, SrcBuffer
    ld bc, 8192
    db 0xED, 0xB0 ; LDIR
    dec a
    jr nz, loop
#endasm
}

nodebug void withMemcpy( void )
{
    for ( i=0; i<100; i++ )
    {
        memcpy( DstBuffer, SrcBuffer, sizeof(DstBuffer) );
    }
}

nodebug void withPointers( void )
{
    for ( i=0; i<100; i++ )
    {
        pSrc = SrcBuffer;
        pDst = DstBuffer;
        for ( j=0; j<sizeof(DstBuffer); *pDst++ = *pSrc++, j++ );
    }
}

nodebug void withIndices( void )
{
    for ( i=0; i<100; i++ )
    {
        for ( j=0; j<sizeof(DstBuffer); DstBuffer[j]=SrcBuffer[j], j++ );
    }
}

int main( void )
{
    auto unsigned long startTime, endTime;

    startTime = MS_TIMER;
    withPureASM();
    endTime = MS_TIMER;
    printf( "Pure assembly: %ld milliseconds\n", endTime - startTime );

    startTime = MS_TIMER;
    withMemcpy();
    endTime = MS_TIMER;
    printf( "memcpy(): %ld milliseconds\n", endTime - startTime );

    startTime = MS_TIMER;
    withPointers();
    endTime = MS_TIMER;
    printf( "pointers: %ld milliseconds\n", endTime - startTime );

    startTime = MS_TIMER;
    withIndices();
    endTime = MS_TIMER;
    printf( "array index: %ld milliseconds\n", endTime - startTime );

    return 0;
}

Output:
Pure assembly: 199 milliseconds
memcpy(): 311 milliseconds
pointers: 3977 milliseconds
array index: 3580 milliseconds

I'm not sure why one would use the 'pointers' or 'array indices'
versions, unless one was moving data from a circular queue to
another buffer (and even then, it could be handled with, at most,
two memcpy() or ASM LDIR operations).
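
For illustration, the circular-queue case might be handled something like
this (sketch only - 'queue', 'head', QSIZE, and queue_read() are made-up
names, not part of the test code above):

#define QSIZE 8192
unsigned char queue[QSIZE];

/* Copy 'len' bytes starting at offset 'head' out of the circular queue
   using at most two memcpy() calls (each of which can be an LDIR underneath). */
void queue_read( unsigned char *dst, unsigned head, unsigned len )
{
    unsigned first;

    first = QSIZE - head;                          /* bytes before the wrap point */
    if ( len <= first )
    {
        memcpy( dst, queue + head, len );          /* no wrap: one copy */
    }
    else
    {
        memcpy( dst, queue + head, first );        /* up to the end of the buffer */
        memcpy( dst + first, queue, len - first ); /* then the wrapped remainder */
    }
}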

-Don


Reply by Kelly October 8, 2004

--- In rabbit-semi@rabb..., "Don Starr" <don@s...> wrote:
> Copying a 8192-byte buffer 100 times takes 199 milliseconds.

With assembly. So I imagine that it would be almost
impossible to exceed that speed in practice.

What sort of speed do you get if you write it in C?
Either with a pointer or loop index?

I'd do this myself, but I'm at work and don't have my toys here ;)

Kelly



Reply by Don Starr October 8, 2004

--- In rabbit-semi@rabb..., Matt Pobursky <rabbituser@m...>
wrote:
> On Thu, 07 Oct 2004 21:09:06 -0000, jacob_sullivan wrote:
> > I was planning on using 100M ethernet. Are you saying that
> > although the ethernet interface can handle 100M, the rabbit
> > can't deal with that much data? (e.g. drinking from a firehose).
>
> That's exactly what he's saying. In fact, if you have a Rabbit
> module handy just write a quick test program that reads the
> system timer, moves a block of data (say 8K or 16K) from one data
> buffer to another. Read the timer again when done and compute the
> data transfer rate. You'll find you can't go much faster than
> Scott said -- and that's just moving data from one buffer to
> another!

While the Ethernet throughput is certainly limited, just moving data
between buffers isn't a problem.

I performed the test above on an RCM3000 running at 29.4 MHz:

unsigned char SrcBuffer[8192], DstBuffer[8192];

int main( void )
{
    auto unsigned long startTime, endTime;

    startTime = MS_TIMER;

#asm
    ld a, 100
loop:
    ld de, DstBuffer
    ld hl, SrcBuffer
    ld bc, 8192
    db 0xED, 0xB0 ; LDIR, to avoid DC's 'ldir_func'
    dec a
    jr nz, loop
#endasm

    endTime = MS_TIMER;

    printf( "%ld milliseconds\n", endTime - startTime );

    return 0;
}

Output:
199 milliseconds

Copying an 8192-byte buffer 100 times takes 199 milliseconds. This is
what I expected, based on the timing of the LDIR instruction. On my
little 29.4 MHz machine, I could move 4 MB per second before I ran
out of CPU cycles. Unless wait states start coming into play, the 44
MHz BL2600 should do even better.
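
(Rough math: 100 x 8192 bytes in 0.199 s is about 4.1 million bytes/sec,
which fits LDIR costing roughly 7 clocks per transferred byte -
29.4 MHz / 7 ≈ 4.2 million bytes/sec - before any wait states.)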

-Don


Reply by RonG October 8, 2004
>> They have the CPU horsepower (50-60MHz Coldfire CPU) and bus bandwidth to get the job done. <<

That's what I am using to do an XML server, a 66 MHz 5282 ColdFire.

Ron

----- Original Message -----
From: "Matt Pobursky" <rabbituser@rabb...>
To: <rabbit-semi@rabb...>
Sent: Thursday, October 07, 2004 7:17 PM
Subject: Re: [rabbit-semi] Re: XML Parser
> On Thu, 07 Oct 2004 21:09:06 -0000, jacob_sullivan wrote:
> > I was planning on using 100M ethernet. Are you saying that although
> > the ethernet interface can handle 100M, the rabbit can't deal with
> > that much data? (e.g. drinking from a firehose).
>
> That's exactly what he's saying. In fact, if you have a Rabbit module
> handy just write a quick test program that reads the system timer,
> moves a block of data (say 8K or 16K) from one data buffer to another.
> Read the timer again when done and compute the data transfer rate.
> You'll find you can't go much faster than Scott said -- and that's just
> moving data from one buffer to another! In fact, the Rabbit can't even
> saturate a 10Mbps ethernet connection.
>
> The Rabbit platform is OK for basic (read slow and simple) ethernet
> operations, but you wouldn't really want to use it for any serious
> network traffic, especially if you have to do any data crunching or
> real time control at the same time.
>
> If you really have to do XML parsing on the fly, I'd consider something
> more like the Netburner products. They have the CPU horsepower (50-60
> MHz Coldfire CPU) and bus bandwidth to get the job done.
>
> Matt Pobursky
> Maximum Performance Systems



Reply by Jim October 8, 2004

--- In rabbit-semi@rabb..., "jacob_sullivan" <mail@j...>
wrote:
>
> Does anyone have a recommendation for an XML parser that will work
> with Dynamic C?

Yep Jacob, I agree with everyone else. I wrote an XML prototype on an
admittedly slower system than the Rabbit, but ultimately decided that
XML parsing was inappropriate for embedded systems. The overhead of
including data tags, and of parsing them, bumped my streams up by a
factor of about 30 under test.

Now I simply stream raw data to Internet servers, and let them do all
the parsing. Obviously there are issues with converting native data
to the PC world, but that's what it's all about, and that should be done on
machines that have the computing power to do it properly.

Just my opinion of course, and I'm always interested in alternatives.

Actually, later on in the project we decided that XML was not
appropriate full stop, and abandoned it in favour of standard database
engines, due to the sheer volume of data we are dealing with.

Perhaps you could give us some idea of what you have in mind?
We might then be able to make some recommendations for you.

Best regards Jimbo CTT systems Irish Republic