
Linux serial port dropping bytes

Started by Derek Young March 31, 2008
> The XScale CPU I'm using runs at 400 MHz. (I've forgotten who asked,
> but it's communicating with a TMS320F2812 DSP.) Hardware flow
> control is not an option because it's not implemented in the Arcom
> hardware.
>
> Do these interrupt frequencies sound reasonable for a non-realtime OS,
> or is it hopeless as some of my coworkers here have suggested?
The hardware can do what you are after at < 10% overhead (more like
1%). The OS (or should we call it an inOS?) or any software can be
written in a way to make any hardware unusable, of course.

Not long ago I used a 400 MHz MPC5200; part of what it did was to
continuously (no pauses at all) update a serial DAC at approx. 16 Mbps
and read 4 ADCs at another 16 Mbps (4 Mbps per ADC, that is, but going
over a single 16 Mbps link). The CPU is doing it all at a fraction of
its resources, and I actually used *no* interrupts (this was for
fun/experiment). I did not try the UART at >76800 bps (had no faster
port at the other end), but it had plenty of margin (and 9600 would
have been plenty for the application). Of course, all serial ports
work simultaneously. (See some of it at http://tgi-sci.com/y2demo/ .)

Now the XScale is not a PPC, but even if it were 16 times slower at the
same clock rate it would still be sufficient for your 1 Mbps with
plenty of margin. So clearly the software is the inhibiting factor.
> BTW, I'm going to give the guy who picked out this particular
> hardware/software combo a really hard time. :P
Well, more and more people seem to fall for things like that. It seems
the popularity of words like Windows and Linux, and being exposed all
day to colourful websites, makes people think everything will just
work no matter what - which is sometimes far from true... :-).

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Derek Young wrote:
> >> So that leaves the real problem handling throughput of
> >> approximately 1 char each 10 microsec.
> >
> > That's where a large FIFO becomes important. Using Linux on an
> > XScale (which, IIRC, is what the OP is using), I've done up to
> > 460K baud without problems. But, that was using a UART with a
> > 1K byte rx FIFO. That UART also allowed 32-bit wide accesses to
> > the tx/rx FIFOs so that you could transfer 4 bytes per bus
> > cycle.
> >
> > With a 128 byte FIFO and byte-wide access, the timing
> > constraints are quite a bit tighter, but I think it should be
> > doable if you carefully vet the other drivers that are running
> > on the system.
>
> Unfortunately, I need to rely on drivers written by Arcom. I've sent
> some questions to their tech support, but am still waiting for a
> reply.
>
> I was thinking... at 921.6 kbps (8/N/1 -> 92160 bytes/sec), if the
> FIFO interrupt level is set at:
>
> 128 bytes: 720 interrupts/sec (1.4 ms/int), 10.8 us allowed for the
> ISR to respond and empty the FIFO
>
> 64 bytes: 1440 interrupts/sec (694 us/int), ~700 us for the ISR to
> respond/finish
>
> (I'm still trying to look through the driver code to figure out where
> the interrupt level is set.)
>
> The XScale CPU I'm using runs at 400 MHz. (I've forgotten who asked,
> but it's communicating with a TMS320F2812 DSP.) Hardware flow
> control is not an option because it's not implemented in the Arcom
> hardware.
>
> Do these interrupt frequencies sound reasonable for a non-realtime
> OS, or is it hopeless as some of my coworkers here have suggested?
>
> BTW, I'm going to give the guy who picked out this particular
> hardware/software combo a really hard time. :P
>
> Derek
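A quick sanity check of those figures (the baud rate and FIFO numbers
are Derek's; the program itself is just an illustrative sketch and has
nothing to do with the Arcom driver):

  #include <stdio.h>

  /* Interrupt rate and drain headroom at 921.6 kbps, 8N1. */
  int main(void)
  {
      const double chars_per_sec = 921600.0 / 10.0;  /* start+8+stop */
      const int fifo_depth = 128;
      const int triggers[] = { 128, 64 };

      for (int i = 0; i < 2; i++) {
          int t = triggers[i];
          double ints_per_sec = chars_per_sec / t;
          /* Overrun hits once the FIFO's remaining slots fill up. */
          double headroom_us = (fifo_depth - t + 1) * 1e6 / chars_per_sec;
          printf("trigger %3d: %4.0f ints/sec, %6.1f us apart, "
                 "%5.1f us to start draining\n",
                 t, ints_per_sec, 1e6 / ints_per_sec, headroom_us);
      }
      return 0;
  }

This prints about 720 ints/sec with ~10.9 us of headroom for the
128-byte trigger, and 1440 ints/sec with ~705 us for the 64-byte one -
matching the estimates in the post.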
Didi wrote:
> CBFalconer wrote:
>> ....
>> That depends on your CPU speed. Within the interrupt, you have to
>> handle something like:
>>
>> REPEAT
>>   test for more in FIFO
>>   take one, stuff in buffer, while checking for buffer full.
>>   test for overflow or other errors.
>
> So far OK, although it sounds more complex than it actually is - 128
> bytes will be processed within 1000 CPU cycles easily, which would be
> taking all the time on a 1 MHz CPU...
>
>> if any, call appropriate handler
>
> You don't call any "handlers" from an IRQ service routine.
> You set a flag bit somewhere to indicate what happened and let the
> non-time-critical code deal with it.
In this case (a fast UART on a Linux system), you don't want to do anything with the incoming data except buffer it - processing is done in a different process/thread, as you suggest. But it's worth noting that in some embedded systems, it makes a lot of sense to do more specific handling of data during interrupt routines - interrupt handlers do not necessarily need to be as fast as possible, only as fast as necessary. If you have a system where you have better knowledge of the interrupts, the response times, and the required times, then you are free to do all the work you want during an interrupt routine.
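For concreteness, here is roughly what that pseudocode comes out to in
C. It is only a sketch: the register addresses, bit masks and buffer
names below are invented placeholders for a generic 16550-style UART,
not any real driver's definitions.

  #define UART_BASE  0x40000000u                /* hypothetical address */
  #define UART_RBR   (*(volatile unsigned char *)(UART_BASE + 0x00))
  #define UART_LSR   (*(volatile unsigned char *)(UART_BASE + 0x14))
  #define LSR_DATA_READY  0x01                  /* RX FIFO not empty */
  #define LSR_RX_ERROR    0x1E                  /* OE | PE | FE | BI */

  static volatile unsigned char rxbuf[1024];    /* software ring buffer */
  static volatile unsigned int  rx_head, rx_tail;
  static volatile unsigned int  rx_errors;      /* flag for task level */

  void uart_rx_isr(void)
  {
      unsigned char lsr;

      while ((lsr = UART_LSR) & LSR_DATA_READY) {   /* more in FIFO? */
          unsigned char c = UART_RBR;               /* take one */
          unsigned int next = (rx_head + 1) % sizeof rxbuf;

          if (lsr & LSR_RX_ERROR)
              rx_errors++;            /* just note it; no "handler" call */
          if (next != rx_tail) {      /* stuff in buffer unless full */
              rxbuf[rx_head] = c;
              rx_head = next;
          }
      }
      /* On a 16550-class part, draining LSR/RBR clears the interrupt;
         other controllers need an explicit acknowledge here. */
  }

Note that each pass of the loop costs two accesses to the UART (status,
then data) - exactly the slow part when the UART sits on a slow bus.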
On Wed, 02 Apr 2008 08:57:52 -0500, CBFalconer <cbfalconer@yahoo.com>
wrote:

>David Brown wrote:
>> CBFalconer wrote:
>>
>> <snip>
>>
>>> I left the whole thing unsnipped. The time has come for me to
>>> crave forgiveness. I think I have been afflicted with age or
>>> something. The bits/persec crowd are absolutely correct, and I
>>> am wrong.
>>
>> I don't think you need forgiveness - you just made a mistake.
>>
>>> So that leaves the real problem handling throughput of
>>> approximately 1 char each 10 microsec.
>>
>> You need to handle an *average* of 1 character per 10 us. But
>> the cost of handling each character is peanuts - even if the
>> UART is on a slow bus, you should be able to read out characters
>> at something like 20 per us. The cost is in the handling of the
>> interrupt itself - context switches, cache misses, etc. That's
>> why you use a UART with a buffer - it takes virtually the same
>> time to read 128 bytes out of the buffer during one interrupt as
>> to read 1 byte from the buffer during the interrupt. So if
>> you've set your UART to give an interrupt every 100 characters,
>> you get an interrupt every ms and read out a block of 100
>> characters at a time.
>
>That depends on your CPU speed. Within the interrupt, you have to
>handle something like:
>
>  REPEAT
>    test for more in FIFO
>    take one, stuff in buffer, while checking for buffer full.
>    test for overflow or other errors.
>    if any, call appropriate handler
>  UNTIL FIFO empty
>  clear interrupt system
>  rearm interrupt
>  exit
>
>Note that some operations will require several accesses to the
>UART. Those will eat up time. They will be much slower than
>on-chip memory access.
This can be surprisingly slow. On a recent project I used an STR9 ARM
MCU with the onboard UARTs as well as an external Exar UART. On the
Exar UART one could read the number of characters available in the RX
FIFO; on the MCU UARTs one can only check for character available, or
FIFO full. So with the Exar one could:

  read number of chars available
  repeat
    read char
    if buffer not full, stuff into buffer
  until all chars read

This turned out to be 5x faster than with the onboard UARTs, where one
had to check the FIFO-not-empty flag every time. On a 50 MHz ARM9 it
took about 25 us per 16 characters having to do it the way CBFalconer
described it, while it only took about 5 us per 16 characters where
one could read how many chars were in the RX FIFO.

Regards
Anton Erasmus
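The difference Anton measured is mostly down to peripheral-bus accesses
per character. A sketch of the two drain loops he contrasts - the
register and helper names here are made-up stand-ins, not Exar's or
ST's actual definitions:

  extern volatile unsigned char UART_RHR;    /* RX holding register      */
  extern volatile unsigned char UART_RXLVL;  /* RX FIFO fill count       */
  extern volatile unsigned char UART_LSR;    /* status; bit 0 = data rdy */
  extern void buffer_put(unsigned char c);   /* hypothetical ring buffer */

  /* Exar-style: one count read, then n data reads. */
  void drain_by_count(void)
  {
      unsigned int n = UART_RXLVL;
      while (n--)
          buffer_put(UART_RHR);
  }

  /* '550-style: a status read before every data read, i.e. twice the
     traffic to the (slow) peripheral bus. */
  void drain_by_flag(void)
  {
      while (UART_LSR & 0x01)
          buffer_put(UART_RHR);
  }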
>>> You need to handle an *average* of 1 character per 10 us. But
>>> the cost of handling each character is peanuts - even if the
>>> UART is on a slow bus, you should be able to read out
>>> characters at something like 20 per us. The cost is in the
>>> handling of the interrupt itself - context switches, cache
>>> misses, etc. That's why you use a UART with a buffer - it
>>> takes virtually the same time to read 128 bytes out the buffer
>>> during one interrupt, as to read 1 byte from the buffer during
>>> the interrupt. So if you've set your UART to give an interrupt
>>> every 100 characters, you get an interrupt every ms and read
>>> out a block of 100 characters at a time.
>>
>> That depends on your CPU speed.
>
> True. The OP is running an XScale with Linux, so I'd guess he's
> running at a couple hundred MHz.
>
>> Within the interrupt, you have to
>> handle something like:
>>
>>   REPEAT
>>     test for more in FIFO
>>     take one, stuff in buffer, while checking for buffer full.
>>     test for overflow or other errors.
>>     if any, call appropriate handler
>>   UNTIL FIFO empty
>>   clear interrupt system
>>   rearm interrupt
>>   exit
>>
>> Note that some operations will require several accesses to the
>> UART. Those will eat up time. They will be much slower than
>> on-chip memory access.
>
> People have been supporting 921K bps serial links for ages. You
> do have to pay attention to what you're doing, but it's really
> not that hard with a sufficiently large FIFO. However, IMO a
> 128-byte FIFO is getting close to being insufficiently large. I
> wouldn't want to try to support it on a Linux system with
> interrupt latencies imposed by a bunch of randomly chosen
> device drivers. If it's an embedded system and you've got
> control over what other ISRs are running, it should be doable.
>
> --
DMA is way superior to a FIFO, since you dump to memory in the
background. The AT91 (and the AVR32 implementation) supports a timeout
interrupt which is triggered if NO characters arrive in a certain
number of bit periods.

What really limits the speed in Linux is the error handling. If you
want to do proper error handling, you typically have to handle the
error before the next character arrives, and this is pretty difficult
in Linux, and will severely limit the speed.

--
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may,
or may not be shared by my employer Atmel Nordic AB
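The shape of what Ulf describes, in outline: point the DMA engine at a
buffer and take an interrupt either when the buffer fills or when the
line goes idle for a programmed number of bit times. The register and
function names below are invented placeholders, not the actual AT91
definitions - the datasheet has the real ones.

  static unsigned char dma_buf[256];

  extern volatile unsigned long UART_DMA_ADDR;    /* receive pointer   */
  extern volatile unsigned long UART_DMA_COUNT;   /* bytes remaining   */
  extern volatile unsigned long UART_RX_TIMEOUT;  /* idle bit times    */
  extern volatile unsigned long UART_IER;         /* interrupt enables */
  enum { IRQ_DMA_DONE = 1, IRQ_RX_TIMEOUT = 2 };

  extern void hand_off_to_task(unsigned char *p, unsigned int n);

  void rx_dma_setup(void)
  {
      UART_DMA_ADDR   = (unsigned long)dma_buf;
      UART_DMA_COUNT  = sizeof dma_buf;
      UART_RX_TIMEOUT = 20;              /* ~2 character times of idle */
      UART_IER        = IRQ_DMA_DONE | IRQ_RX_TIMEOUT;
  }

  void rx_irq(void)
  {
      /* Fires on a full buffer *or* at the natural end of a burst. */
      unsigned int received = sizeof dma_buf - UART_DMA_COUNT;
      hand_off_to_task(dma_buf, received);    /* hypothetical helper */
      rx_dma_setup();                         /* rearm for next burst */
  }

Between those two events the CPU takes no interrupts at all while bytes
stream into memory. The catch is the one Ulf names: by the time the
task sees the data, the per-character error status is long gone.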
David Brown wrote:
> ...
> If you have a system where you have better knowledge of the interrupts,
> the response times, and the required times, then you are free to do all
> the work you want during an interrupt routine.
Why would you want to do it there?

Interrupts are meant to be as short as possible and to do only what
cannot be done outside their handlers - this is fundamental to
programming. I know it can be done otherwise, and I know people make
such a mess with no direct consequences because most of the hardware
nowadays is 10x to 1000+x overkill, but why would you want to do it
that way?

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

David Brown wrote:
> Didi wrote:
> > CBFalconer wrote:
> >> ....
> >> That depends on your CPU speed. Within the interrupt, you have to
> >> handle something like:
> >>
> >> REPEAT
> >>   test for more in FIFO
> >>   take one, stuff in buffer, while checking for buffer full.
> >>   test for overflow or other errors.
> >
> > So far OK, although it sounds more complex than it actually is -
> > 128 bytes will be processed within 1000 CPU cycles easily, which
> > would be taking all the time on a 1 MHz CPU...
> >
> >> if any, call appropriate handler
> >
> > You don't call any "handlers" from an IRQ service routine.
> > You set a flag bit somewhere to indicate what happened and let the
> > non-time-critical code deal with it.
>
> In this case (a fast UART on a Linux system), you don't want to do
> anything with the incoming data except buffer it - processing is done
> in a different process/thread, as you suggest. But it's worth noting
> that in some embedded systems, it makes a lot of sense to do more
> specific handling of data during interrupt routines - interrupt
> handlers do not necessarily need to be as fast as possible, only as
> fast as necessary. If you have a system where you have better
> knowledge of the interrupts, the response times, and the required
> times, then you are free to do all the work you want during an
> interrupt routine.
On 2008-04-02, Anton Erasmus <nobody@spam.prevent.net> wrote:

> This turned out to be 5x faster than with the onboard UARTs where one
> had to check the FIFO not empty flag every time.
> On a 50MHz ARM9 it took about 25us per 16 characters having to do it
> the way CBFalconer described it, while it only took about 5us per 16
> characters where one could read how many chars were in the Rx FIFO.
There are a lot of micro-controllers out there with horribly designed
UARTs in them. One I've fought with recently is the one in the Samsung
S3C4530. Half of the features don't work at all. Half of the stuff
that does work is useless, because whoever specified/designed the UART
had never actually done any serial communications and had only a vague
understanding of how things like UARTs and FIFOs are used. For
example, it has FIFOs, but there's no way to flush them. It also has
"hardware flow control", but it doesn't work in a way that can be used
with any other UART on the planet.

--
Grant Edwards                   grante             Yow! ... the MYSTERIANS are
                                  at               in here with my CORDUROY
                               visi.com            SOAP DISH!!
Didi <dp@tgi-sci.com> writes:

> David Brown wrote:
>> ...
>> If you have a system where you have better knowledge of the
>> interrupts, the response times, and the required times, then you are
>> free to do all the work you want during an interrupt routine.
>
> Why would you want to do it there?
> Interrupts are meant to be as short as possible and to do only what
> cannot be done outside their handlers - this is fundamental to
> programming. I know it can be done otherwise, and I know people make
> such a mess with no direct consequences because most of the hardware
> nowadays is 10x to 1000+x overkill, but why would you want to do it
> that way?
How about decoding SLIP or similar? If you wait until the end of the
frame, you have to have double the buffer size to cope with the worst
case scenario. If decoded "inline", in the IRQ handler, the maximum
size is just that of the decoded data.

Also, protocols like Modbus need to have protocol-level decisions
(e.g. about timing) made in the ISR. It doesn't work to have a generic
"read block" performed by the ISR followed by decoding at the task
level.

(Obviously all this depends on your definition of ISR, since no doubt
it can all be done at "task level" with a good enough RTOS. But in
that case the "task" is really just another type of ISR, isn't it?)

--

John Devereux
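For reference, SLIP framing (RFC 1055) is simple enough that the
per-byte decode John describes really is just a couple of branches per
character. A sketch, with uart_getc() and frame_ready() as
hypothetical helpers:

  #define SLIP_END     0xC0    /* frame delimiter   */
  #define SLIP_ESC     0xDB    /* escape introducer */
  #define SLIP_ESC_END 0xDC    /* escaped 0xC0      */
  #define SLIP_ESC_ESC 0xDD    /* escaped 0xDB      */

  extern unsigned char uart_getc(void);
  extern void frame_ready(unsigned char *p, unsigned int n);

  static unsigned char frame[1500];   /* decoded frame - no double buffer */
  static unsigned int  len;
  static int           escaped;

  void slip_rx_isr(void)
  {
      unsigned char c = uart_getc();

      if (c == SLIP_END) {
          if (len)
              frame_ready(frame, len);    /* flag the task level */
          len = 0;
          escaped = 0;
      } else if (c == SLIP_ESC) {
          escaped = 1;
      } else {
          if (escaped) {
              if (c == SLIP_ESC_END)      c = SLIP_END;
              else if (c == SLIP_ESC_ESC) c = SLIP_ESC;
              escaped = 0;
          }
          if (len < sizeof frame)
              frame[len++] = c;
      }
  }

The buffer only ever holds decoded bytes, so its worst case is one
frame rather than twice that.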
Grant Edwards <grante@visi.com> writes:

> On 2008-04-02, Anton Erasmus <nobody@spam.prevent.net> wrote:
>
>> This turned out to be 5x faster than with the onboard UARTs where one
>> had to check the FIFO not empty flag every time.
>> On a 50MHz ARM9 it took about 25us per 16 characters having to do it
>> the way CBFalconer described it, while it only took about 5us per 16
>> characters where one could read how many chars were in the Rx FIFO.
>
> There are a lot of micro-controllers out there with horribly designed
> UARTs in them. One I've fought with recently is the one in the
> Samsung S3C4530. Half of the features don't work at all. Half of the
> stuff that does work is useless, because whoever specified/designed
> the UART had never actually done any serial communications and had
> only a vague understanding of how things like UARTs and FIFOs are
> used. For example, it has FIFOs, but there's no way to flush them. It
> also has "hardware flow control", but it doesn't work in a way that
> can be used with any other UART on the planet.
Everything with an "industry standard '550 UART" is horrible.

--

John Devereux
Hi John,

> How about decoding SLIP or similar? If you wait until the end of the
> frame, you have to have double the buffer size to cope with the worst
> case scenario. If decoded "inline", in the IRQ handler, the maximum
> size is just that of the decoded data.
Actually it takes very little more than that - a few bytes - and you
will not need to do it in the handler. Even a 16-byte FIFO organized
in memory will allow you to do it "normally" and only queue the
incoming data in the FIFO.

But OK, I can see your point. I don't know SLIP, but I have done PPP,
and decoding in the handler is doable, since you will not enter a loop
anywhere in the handler - it just makes it a bit branchier. Not much
more than queueing the data. It can be a valid choice, I agree -
although it should be taken only if there is a good enough reason not
to take the other one, e.g. you really do need the 16 or so bytes, or
you can squeeze out the last drop of CPU performance by doing so and
you need that drop, etc.
> (Obviously all this depends on your definition of ISR, since no doubt
> it can all be done at "task level" with a good enough RTOS. But in
> that case the "task" is really just another type of ISR, isn't it?)
Well no, the "task" can have much worse latency than the ISR - in the
example above, 16 times worse. Make that 256 times if you can afford a
256-byte-deep queue (FIFO). Then you can spend this latency on
multitasking or whatever else you can use it for in the particular
design.

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

John Devereux wrote:
> Didi <dp@tgi-sci.com> writes:
>
> > David Brown wrote:
> >> ...
> >> If you have a system where you have better knowledge of the
> >> interrupts, the response times, and the required times, then you
> >> are free to do all the work you want during an interrupt routine.
> >
> > Why would you want to do it there?
> > Interrupts are meant to be as short as possible and to do only what
> > cannot be done outside their handlers - this is fundamental to
> > programming. I know it can be done otherwise, and I know people
> > make such a mess with no direct consequences because most of the
> > hardware nowadays is 10x to 1000+x overkill, but why would you want
> > to do it that way?
>
> How about decoding SLIP or similar? If you wait until the end of the
> frame, you have to have double the buffer size to cope with the worst
> case scenario. If decoded "inline", in the IRQ handler, the maximum
> size is just that of the decoded data.
>
> Also, protocols like Modbus need to have protocol-level decisions
> (e.g. about timing) made in the ISR. It doesn't work to have a
> generic "read block" performed by the ISR followed by decoding at the
> task level.
>
> (Obviously all this depends on your definition of ISR, since no doubt
> it can all be done at "task level" with a good enough RTOS. But in
> that case the "task" is really just another type of ISR, isn't it?)
>
> --
>
> John Devereux
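To put numbers on the latency headroom Dimiter describes (using the
921.6 kbps figure from earlier in the thread): a character arrives
roughly every 10.85 us, so a 16-deep queue tolerates about
16 x 10.85 us = ~174 us of task latency before data is lost, and a
256-deep one about 256 x 10.85 us = ~2.8 ms.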
Didi wrote:

> Interrupts are meant to be as short as possible and do only what
> cannot be done outside their handlers - this is fundamental to
> programming.
Of course. But sometimes "what cannot be done outside" does include
some, or even all, of the processing of that incoming data. Resources
or response-time constraints might not allow any other approach.

For just one example, consider that you're running XON/XOFF flow
control on a plain old RS-232 link. That means even an interrupt
handler that would normally just stuff each byte received by the UART
into some software FIFO had better look at the actual character, too,
to check whether it's XOFF.
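A minimal sketch of that case - uart_getc(), tx_pause(), tx_resume()
and buffer_put() are hypothetical helpers:

  #define XON   0x11    /* DC1: peer is ready for more */
  #define XOFF  0x13    /* DC3: peer says stop sending */

  extern unsigned char uart_getc(void);
  extern void tx_pause(void);
  extern void tx_resume(void);
  extern void buffer_put(unsigned char c);

  void rx_isr(void)
  {
      unsigned char c = uart_getc();

      if (c == XOFF)
          tx_pause();         /* must act now: a late stop overruns
                                 the peer's buffer */
      else if (c == XON)
          tx_resume();
      else
          buffer_put(c);      /* flow-control bytes are not payload */
  }

Waiting for the task level to notice the XOFF could let dozens more
characters go out first, which defeats the flow control entirely.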