Reply by Kelly October 9, 2004

--- In rabbit-semi@rabb..., Scott Henion <shenion@s...> wrote:
> So it looks like using far pointers in ST C gets a 2.5x
> speed penalty. Not bad considering you need to do 64k wrap
> checks on each pointer and calc seg+offset. But it sure
> beats using xmem2root() and manually doing buffer
> swaps.

Impressive.

Dare I suggest someone give Duff's Device a try?

http://www.lysator.liu.se/c/duffs-device.html

In K&R C, targeting a VAX in 1983:

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
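
If someone wants to try it against Don's test harness below, a rough (untested) adaptation might look like the following. Note the *to++ here, since we're copying buffer to buffer rather than feeding a device register, and 'withDuffsDevice' is just a name made up to match the other test functions:

nodebug void withDuffsDevice( void )
{
    unsigned char *to, *from;
    unsigned short n;

    for ( i=0; i<100; i++ )
    {
        to = DstBuffer;
        from = SrcBuffer;
        n = (sizeof(DstBuffer) + 7) / 8;
        switch ( sizeof(DstBuffer) % 8 ) {
        case 0: do { *to++ = *from++;
        case 7:      *to++ = *from++;
        case 6:      *to++ = *from++;
        case 5:      *to++ = *from++;
        case 4:      *to++ = *from++;
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while ( --n > 0 );
        }
    }
}

Since 8192 is a multiple of 8 it always enters at case 0 and runs 1024 times per pass; the unrolling just spreads the loop-counter overhead over eight copies per iteration.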

Kelly



Reply by Scott Henion October 8, 2004
At 05:17 PM 10/8/2004, you wrote:

>At 02:53 PM 10/8/2004, you wrote:
> >Output:
> > Pure assembly: 199 milliseconds
> > memcpy(): 311 milliseconds
> > pointers: 3977 milliseconds
> > array index: 3580 milliseconds
>
>Softools Output:
>Pure assembly: 196 milliseconds
>memcpy(): 197 milliseconds
>pointers: 3675 milliseconds
>array index: 3030 milliseconds
>
>Using far pointers (xmem arrays)
>fmemcpy(): 197 milliseconds
>far pointers: 3676 milliseconds
>
>;)

Revised results (the far results above were actually using a near pointer):

RCM3010 Softools Output:
Pure assembly: 196 milliseconds
memcpy(): 197 milliseconds
pointers: 3676 milliseconds
array index: 3030 milliseconds

fmemcpy(): 1012 milliseconds
far pointers: 7967 milliseconds
far array index: 8499 milliseconds

So it looks like using far pointers in ST C gets a 2.5x speed penalty. Not
bad considering you need to do 64k wrap checks on each pointer and calc
seg+offset. But it sure beats using xmem2root() and manually doing buffer
swaps.
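
(From the numbers above: 7967/3676 is about 2.2x for pointers and 8499/3030
about 2.8x for array indexing, while fmemcpy() at 1012/197 takes roughly a
5x hit versus memcpy().)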

<Scott


Reply by Don Starr October 8, 2004
> I'm pretty amazed that pointers were not the fastest... :(
> I believe there is a "pointer check" compile option. I'm not at
> my rabbit bench right now. Does turning that off make much of
> a timing difference?

Both pointer and array index checking were "on" for the previous
test. Here are the results when they're "off":
Pure assembly: 199 milliseconds
memcpy(): 310 milliseconds
pointers: 3978 milliseconds
array index: 3580 milliseconds

Here are the previous results:
> > Output:
> > Pure assembly: 199 milliseconds
> > memcpy(): 311 milliseconds
> > pointers: 3977 milliseconds
> > array index: 3580 milliseconds

Doesn't look like much of a difference. I haven't looked at the
generated ASM code to see what's going on.

> I write pointer incrs when I want speed ... obviously I need to
> rethink that assumption.

Probably depends on the compiler.

memcpy() will always be the fastest "straight C" implementation, as
long as the library function is written in assembly (using something
like LDIR). DC's memcpy() implementation is a bit odd - they check
for overlapping memory regions and then use LDIR or LDDR, as
appropriate. That's supposed to be a memmove() feature (though the
fact that memcpy()'s behavior in such a circumstance is "undefined"
makes DC's implementation perfectly legal).
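
For what it's worth, the overlap test memmove() has to make looks roughly
like this in plain C (just a sketch with a made-up name; DC presumably does
the actual copies in assembly with LDIR/LDDR):

void *my_memmove( void *dst, const void *src, unsigned n )
{
    unsigned char *d;
    const unsigned char *s;

    d = dst;
    s = src;
    if ( d > s && d < s + n )
    {
        /* dst overlaps the tail of src: copy backward (what LDDR would do) */
        while ( n-- )
            d[n] = s[n];
    }
    else
    {
        /* no harmful overlap: copy forward (what LDIR would do) */
        while ( n-- )
            *d++ = *s++;
    }
    return dst;
}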

Pointers vs. arrays will depend on the compiler. A compiler could
optimize pointers such that they were faster than using arrays, or
could do the converse.

-Don


Reply by Scott Henion October 8, 2004
At 02:53 PM 10/8/2004, you wrote:
>Output:
> Pure assembly: 199 milliseconds
> memcpy(): 311 milliseconds
> pointers: 3977 milliseconds
> array index: 3580 milliseconds

Softools Output:
Pure assembly: 196 milliseconds
memcpy(): 197 milliseconds
pointers: 3675 milliseconds
array index: 3030 milliseconds

Using far pointers (xmem arrays)
fmemcpy(): 197 milliseconds
far pointers: 3676 milliseconds

;)


Reply by Kelly October 8, 2004

--- In rabbit-semi@rabb..., "Don Starr" <don@s...> wrote:
> The test below was run on a 29.4 MHz RCM3000. Code was compiled
> and run under DC 7.33TSE, "optimized" for speed.

<snip>

> Output:
> Pure assembly: 199 milliseconds
> memcpy(): 311 milliseconds
> pointers: 3977 milliseconds
> array index: 3580 milliseconds

Pure assembly: 4020 KB/sec
memcpy(): 2572.3 KB/sec
pointers: 201.16 KB/sec
indices: 223.46 KB/sec
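
(Derived from the times: 100 passes x 8 KB = 800 KB per test, so e.g.
800 KB / 0.199 s ≈ 4020 KB/sec; the rest follow the same way.)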

I would never have expected a 20x speed up for assembly over
pointer-based C code. Truly amazing.

> I'm not sure why one would use the 'pointers' or 'array indices'
> versions, unless one was moving data from a circular queue to
> another buffer (and even then, it could be handled with, at most,
> two memcpy() or ASM LDIR operations).

It might happen if you were trying to use a standard C library.
Imagine how an off-the-shelf XML library might perform.

Kelly



Reply by Don Starr October 8, 2004
--- In rabbit-semi@rabb..., "Don Starr" <don@s...> wrote:
> > Copying a 8192-byte buffer 100 times takes 199 milliseconds.
>
> With assembly.

Well, nobody placed any limitation on programming language, or
even development tools ;)

> So I imagine that it would be almost impossible to exceed that speed
> in practice.

Without some DMA hardware, yes.

> What sort of speed do you get if you write it in C?
> Either with a pointer or loop index?

The test below was run on a 29.4 MHz RCM3000. Code was compiled
and run under DC 7.33TSE, "optimized" for speed. A better compiler
would likely yield better results for the pointer and array index
times - the hardware isn't the only limiting factor here (that's
why my first test was pure ASM - it tests only the hardware).

unsigned char SrcBuffer[8192], DstBuffer[8192];
unsigned short int i, j;
unsigned char *pSrc, *pDst;

nodebug void withPureASM(void)
{
#asm
    ld a, 100
loop:
    ld de, DstBuffer
    ld hl, SrcBuffer
    ld bc, 8192
    db 0xED, 0xB0 ; LDIR
    dec a
    jr nz, loop
#endasm
}

nodebug void withMemcpy( void )
{
    for ( i=0; i<100; i++ )
    {
        memcpy( DstBuffer, SrcBuffer, sizeof(DstBuffer) );
    }
}

nodebug void withPointers( void )
{
    for ( i=0; i<100; i++ )
    {
        pSrc = SrcBuffer;
        pDst = DstBuffer;
        for ( j=0; j<sizeof(DstBuffer); *pDst++ = *pSrc++, j++ );
    }
}

nodebug void withIndices( void )
{
    for ( i=0; i<100; i++ )
    {
        for ( j=0; j<sizeof(DstBuffer); DstBuffer[j]=SrcBuffer[j], j++ );
    }
}

int main( void )
{
    auto unsigned long startTime, endTime;

    startTime = MS_TIMER;
    withPureASM();
    endTime = MS_TIMER;
    printf( "Pure assembly: %ld milliseconds\n", endTime - startTime );

    startTime = MS_TIMER;
    withMemcpy();
    endTime = MS_TIMER;
    printf( "memcpy(): %ld milliseconds\n", endTime - startTime );

    startTime = MS_TIMER;
    withPointers();
    endTime = MS_TIMER;
    printf( "pointers: %ld milliseconds\n", endTime - startTime );

    startTime = MS_TIMER;
    withIndices();
    endTime = MS_TIMER;
    printf( "array index: %ld milliseconds\n", endTime - startTime );

    return 0;
}

Output:
Pure assembly: 199 milliseconds
memcpy(): 311 milliseconds
pointers: 3977 milliseconds
array index: 3580 milliseconds

I'm not sure why one would use the 'pointers' or 'array indices'
versions, unless one was moving data from a circular queue to
another buffer (and even then, it could be handled with, at most,
two memcpy() or ASM LDIR operations).
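
For illustration, the circular-queue case might be handled something like
this (sketch only - 'queue', 'head', QSIZE, and queue_read() are made-up
names, not part of the test code above):

#define QSIZE 8192
unsigned char queue[QSIZE];

/* Copy 'len' bytes starting at offset 'head' out of the circular queue
   using at most two memcpy() calls (each of which can be an LDIR underneath). */
void queue_read( unsigned char *dst, unsigned head, unsigned len )
{
    unsigned first;

    first = QSIZE - head;                          /* bytes before the wrap point */
    if ( len <= first )
    {
        memcpy( dst, queue + head, len );          /* no wrap: one copy */
    }
    else
    {
        memcpy( dst, queue + head, first );        /* up to the end of the buffer */
        memcpy( dst + first, queue, len - first ); /* then the wrapped remainder */
    }
}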

-Don


Reply by Kelly October 8, 2004

--- In rabbit-semi@rabb..., "Don Starr" <don@s...> wrote:
> Copying a 8192-byte buffer 100 times takes 199 milliseconds.

With assembly. So I imagine that it would be almost
impossible to exceed that speed in practice.

What sort of speed do you get if you write it in C?
Either with a pointer or loop index?

I'd do this myself, but I'm at work and don't have my toys here ;)

Kelly



Reply by Don Starr October 8, 2004

--- In rabbit-semi@rabb..., Matt Pobursky <rabbituser@m...>
wrote:
> On Thu, 07 Oct 2004 21:09:06 -0000, jacob_sullivan wrote:
> > I was planning on using 100M ethernet. Are you saying that
> > although the ethernet interface can handle 100M, the rabbit
> > can't deal with that much data? (e.g. drinking from a firehose).
>
> That's exactly what he's saying. In fact, if you have a Rabbit
> module handy just write a quick test program that reads the
> system timer, moves a block of data (say 8K or 16K) from one data
> buffer to another. Read the timer again when done and compute the
> data transfer rate. You'll find you can't go much faster than
> Scott said -- and that's just moving data from one buffer to
> another!

While the Ethernet throughput is certainly limited, just moving data
between buffers isn't a problem.

I performed the test above on an RCM3000 running at 29.4 MHz:

unsigned char SrcBuffer[8192], DstBuffer[8192];

int main( void )
{
    auto unsigned long startTime, endTime;

    startTime = MS_TIMER;

#asm
    ld a, 100
loop:
    ld de, DstBuffer
    ld hl, SrcBuffer
    ld bc, 8192
    db 0xED, 0xB0 ; LDIR, to avoid DC's 'ldir_func'
    dec a
    jr nz, loop
#endasm

    endTime = MS_TIMER;

    printf( "%ld milliseconds\n", endTime - startTime );

    return 0;
}

Output:
199 milliseconds

Copying an 8192-byte buffer 100 times takes 199 milliseconds. This is
what I expected, based on the timing of the LDIR instruction. On my
little 29.4 MHz machine, I could move 4 MB per second before I ran
out of CPU cycles. Unless wait states start coming into play, the 44
MHz BL2600 should do even better.
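
(Rough math: 100 x 8192 bytes in 0.199 s is about 4.1 million bytes/sec,
which fits LDIR costing roughly 7 clocks per transferred byte -
29.4 MHz / 7 ≈ 4.2 million bytes/sec - before any wait states.)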

-Don


Reply by RonG October 8, 2004
>> They have the CPU horsepower (50-60MHz Coldfire CPU) and bus bandwidth to get the job done. <<

That's what I am using to do an XML server, a 66 MHz 5282 ColdFire.

Ron

----- Original Message -----
From: "Matt Pobursky" <rabbituser@rabb...>
To: <rabbit-semi@rabb...>
Sent: Thursday, October 07, 2004 7:17 PM
Subject: Re: [rabbit-semi] Re: XML Parser
> On Thu, 07 Oct 2004 21:09:06 -0000, jacob_sullivan wrote:
> > I was planning on using 100M ethernet. Are you saying that although
> > the ethernet interface can handle 100M, the rabbit can't deal with
> > that much data? (e.g. drinking from a firehose).
>
> That's exactly what he's saying. In fact, if you have a Rabbit module
> handy just write a quick test program that reads the system timer,
> moves a block of data (say 8K or 16K) from one data buffer to another.
> Read the timer again when done and compute the data transfer rate.
> You'll find you can't go much faster than Scott said -- and that's just
> moving data from one buffer to another! In fact, the Rabbit can't even
> saturate a 10Mbps ethernet connection.
>
> The Rabbit platform is OK for basic (read slow and simple) ethernet
> operations, but you wouldn't really want to use it for any serious
> network traffic, especially if you have to do any data crunching or
> real time control at the same time.
>
> If you really have to do XML parsing on the fly, I'd consider something
> more like the Netburner products. They have the CPU horsepower (50-60
> MHz Coldfire CPU) and bus bandwidth to get the job done.
>
> Matt Pobursky
> Maximum Performance Systems



Reply by Jim October 8, 2004

--- In rabbit-semi@rabb..., "jacob_sullivan" <mail@j...>
wrote:
>
> Does anyone have a recommendation for an XML parser that will work
> with Dynamic C?

Yep Jacob, I agree with everyone else. I wrote an XML prototype on an
admittedly slower system than the Rabbit, but ultimately decided that
XML parsing was inappropriate for embedded systems. The overhead of
including data tags, and of parsing them, bumped my streams up by a
factor of about 30 under test.

Now I simply stream raw data to Internet servers, and let them do all
the parsing. Obviously there are issues with converting native data
to the PC world, but that's what it's all about, and that should be done on
machines that have the computing power to do it properly.

Just my opinion of course, and I'm always interested in alternatives.

Actually, later on in the project we decided that XML was not
appropriate full stop, and abandoned it in favour of standard database
engines, due to the sheer volume of data we are dealing with.

Perhaps you could give us some idea of what you have in mind?
We might then be able to make some recommendations for you.

Best regards Jimbo CTT systems Irish Republic