EmbeddedRelated.com
Forums

Linux serial port dropping bytes

Started by Derek Young March 31, 2008
David Brown wrote:
> CBFalconer wrote:
>
> <snip>
>
>> I left the whole thing unsnipped. The time has come for me to
>> crave forgiveness. I think I have been afflicted with age or
>> something. The bits/persec crowd are absolutely correct, and I
>> am wrong.
>
> I don't think you need forgiveness - you just made a mistake.
>
>> So that leaves the real problem handling throughput of
>> approximately 1 char each 10 microsec.
>
> You need to handle an *average* of 1 character per 10 us. But
> the cost of handling each character is peanuts - even if the
> UART is on a slow bus, you should be able to read out characters
> at something like 20 per us. The cost is in the handling of the
> interrupt itself - context switches, cache misses, etc. That's
> why you use a UART with a buffer - it takes virtually the same
> time to read 128 bytes out the buffer during one interrupt, as
> to read 1 byte from the buffer during the interrupt. So if
> you've set your UART to give an interrupt every 100 characters,
> you get an interrupt every ms and read out a block of 100
> characters at a time.
That depends on your CPU speed. Within the interrupt, you have to
handle something like:

   REPEAT
      test for more in FIFO
      take one, stuff in buffer, while checking for buffer full.
      test for overflow or other errors.
      if any, call appropriate handler
   UNTIL FIFO empty
   clear interrupt system
   rearm interrupt
   exit

Note that some operations will require several accesses to the
UART. Those will eat up time. They will be much slower than
on-chip memory access.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

--
Posted via a free Usenet account from http://www.teranews.com
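The REPEAT..UNTIL pseudocode above can be sketched in C roughly as follows. This is a minimal illustration only: the `struct uart`, its field names, and the buffer sizes are all hypothetical stand-ins for what would be memory-mapped registers on real hardware, where each access crosses the slow peripheral bus Chuck mentions.

```c
#include <assert.h>
#include <stdint.h>

#define RXBUF_SIZE 4096
#define FIFO_SIZE  128

/* Hypothetical UART model: on real hardware these would be
 * memory-mapped registers, and every access to them would be
 * much slower than an on-chip memory access. */
struct uart {
    uint8_t fifo[FIFO_SIZE]; /* rx FIFO contents            */
    int     rd;              /* next byte to read            */
    int     count;           /* bytes currently in the FIFO  */
    int     error_flags;     /* overrun/framing/parity bits  */
};

static uint8_t rxbuf[RXBUF_SIZE];
static int     rxbuf_head;
static int     error_seen;   /* recorded for non-ISR code    */

/* The REPEAT..UNTIL loop, in C: drain until the FIFO is empty,
 * checking buffer space and error flags for each byte. */
void uart_rx_isr(struct uart *u)
{
    while (u->count > 0) {               /* test for more in FIFO */
        uint8_t b = u->fifo[u->rd++ % FIFO_SIZE];
        u->count--;
        if (rxbuf_head < RXBUF_SIZE)     /* check for buffer full */
            rxbuf[rxbuf_head++] = b;
        if (u->error_flags) {            /* overflow/other errors */
            error_seen |= u->error_flags;
            u->error_flags = 0;
        }
    }
    /* on real hardware: clear and rearm the interrupt here */
}
```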
>> So that leaves the real problem handling throughput of
>> approximately 1 char each 10 microsec.
>
> That's where a large FIFO becomes important. Using Linux on an
> XScale (which, IIRC, is what the OP is using), I've done up to
> 460K baud without problems. But, that was using a UART with a
> 1K byte rx FIFO. That UART also allowed 32-bit wide accesses
> to the tx/rx FIFOs so that you could transfer 4 bytes per bus
> cycle.
>
> With a 128 byte FIFO and byte-wide access, the timing
> constraints are quite a bit tighter, but I think it should be
> doable if you carefully vet the other drivers that are running
> on the system.
Unfortunately, I need to rely on drivers written by Arcom. I've sent
some questions to their tech support, but am still waiting for a reply.

I was thinking... at 921.6 kbps (8/N/1 -> 92160 bytes/sec), if the
FIFO interrupt level is set at:

128 bytes: 720 interrupts/sec (1.4 ms/int), 10.8 us allowed for the
ISR to respond and empty the FIFO

64 bytes: 1440 interrupts/sec (694 us/int), ~700 us for the ISR to
respond/finish

(I'm still trying to look through the driver code to figure out where
the interrupt level is set.)

The XScale CPU I'm using runs at 400 MHz. (I've forgotten who asked,
but it's communicating with a TMS320F2812 DSP.) Hardware flow control
is not an option because it's not implemented in the Arcom hardware.

Do these interrupt frequencies sound reasonable for a non-realtime OS,
or is it hopeless as some of my coworkers here have suggested?

BTW, I'm going to give the guy who picked out this particular
hardware/software combo a really hard time. :P

Derek
On 2008-04-02, CBFalconer <cbfalconer@yahoo.com> wrote:

>>> So that leaves the real problem handling throughput of
>>> approximately 1 char each 10 microsec.
>>
>> You need to handle an *average* of 1 character per 10 us. But
>> the cost of handling each character is peanuts - even if the
>> UART is on a slow bus, you should be able to read out
>> characters at something like 20 per us. The cost is in the
>> handling of the interrupt itself - context switches, cache
>> misses, etc. That's why you use a UART with a buffer - it
>> takes virtually the same time to read 128 bytes out the buffer
>> during one interrupt, as to read 1 byte from the buffer during
>> the interrupt. So if you've set your UART to give an interrupt
>> every 100 characters, you get an interrupt every ms and read
>> out a block of 100 characters at a time.
>
> That depends on your CPU speed.
True. The OP is running an XScale with Linux, so I'd guess he's running at a couple hundred MHz.
> Within the interrupt, you have to handle something like:
>
>    REPEAT
>       test for more in FIFO
>       take one, stuff in buffer, while checking for buffer full.
>       test for overflow or other errors.
>       if any, call appropriate handler
>    UNTIL FIFO empty
>    clear interrupt system
>    rearm interrupt
>    exit
>
> Note that some operations will require several accesses to the
> UART. Those will eat up time. They will be much slower than
> on-chip memory access.
People have been supporting 921K bps serial links for ages. You do
have to pay attention to what you're doing, but it's really not that
hard with a sufficiently large FIFO. However, IMO a 128-byte FIFO is
getting close to being insufficiently large.

I wouldn't want to try to support it on a Linux system with interrupt
latencies imposed by a bunch of randomly chosen device drivers. If
it's an embedded system and you've got control over what other ISRs
are running, it should be doable.

--
Grant Edwards    grante at visi.com
Yow! I would like to urinate in an OVULAR, porcelain pool --
Grant Edwards wrote:

> People have been supporting 921K bps serial links for ages. You
> do have to pay attention to what you're doing, but it's really
> not that hard with a sufficiently large FIFO. However, IMO a
> 128 FIFO is getting close to being insufficiently large.
If the board has USB, he could interpose a serial-to-USB converter -
the FT2232 from FTDI has a 384 byte receive buffer, which should get
him down to 4 ms or so per interrupt.
> If the board has USB, he could interpose a serial-to-USB
> converter - the FT2232 from FTDI has a 384 byte receive buffer,
> which should get him down to 4 ms or so per interrupt.
That's a really good suggestion. During early development on a
Windows/Labview machine, I used a Quatech RS-422 to USB 2.0 converter
(with a 2K buffer) to get over this same problem.

But in the current hardware, there's not a lot of room for an adapter,
and I'm worried about getting a working driver for this particular
flavor of Linux. It took a couple of calls to Quatech to get their box
working in Windows. It was initially randomly duplicating bytes, so I
would get packets that were longer than expected!

Derek
On 2008-04-02, Derek Young <edu.mit.LL@dereky.nospam> wrote:

>>> So that leaves the real problem handling throughput of
>>> approximately 1 char each 10 microsec.
>>
>> That's where a large FIFO becomes important. Using Linux on
>> an XScale (which, IIRC, is what the OP is using), I've done
>> up to 460K baud without problems. But, that was using a UART
>> with a 1K byte rx FIFO. That UART also allowed 32-bit wide
>> accesses to the tx/rx FIFOs so that you could transfer 4
>> bytes per bus cycle.
>>
>> With a 128 byte FIFO and byte-wide access, the timing
>> constraints are quite a bit tighter, but I think it should be
>> doable if you carefully vet the other drivers that are
>> running on the system.
>
> Unfortunately, I need to rely on drivers written by Arcom.
> I've sent some questions to their tech support, but am still
> waiting for a reply.
>
> I was thinking... at 921.6 kbps (8/N/1 -> 92160 bytes/sec), if
> the FIFO interrupt level is set at:
>
> 128 bytes: 720 interrupts/sec (1.4 ms/int), 10.8 us allowed
> for the ISR to respond and empty the FIFO
That 10 us latency requirement is probably impossible.
> 64 bytes: 1440 interrupts/sec (694 us/int), ~700 us for the
> ISR to respond/finish
That might be possible as long as you can make sure there aren't any
other ISRs running that take more than a few tens of microseconds.
> (I'm still trying to look through the driver code to figure
> out where the interrupt level is set.)
>
> The XScale CPU I'm using runs at 400 MHz. (I've forgotten who
> asked, but it's communicating with a TMS320F2812 DSP.)
> Hardware flow control is not an option because it's not
> implemented in the Arcom hardware.
>
> Do these interrupt frequencies sound reasonable for a
> non-realtime OS, or is it hopeless as some of my coworkers
> here have suggested?
It's definitely pushing the limits pretty hard. With enough time and
effort you might be able to make it work, but all it would take is one
other ISR that runs for more than a few hundred microseconds and
you've lost data.

Whether the current architecture is acceptable depends on a number of
questions:

1) What are the consequences of losing data? Does somebody die, or is
   there merely a retry?

2) How much schedule risk is acceptable? Is it OK if it takes a year
   of hacking on the source code for a few kernel modules and a
   half-dozen different device drivers to get the overall ISR latency
   down?

3) How much redesign risk is acceptable? Is it OK if you work on it
   for three months before proving that it can't work and a better
   UART or different interface has to be chosen?

--
Grant Edwards    grante at visi.com
Yow! I'm having BEAUTIFUL THOUGHTS about the INSIPID WIVES of smug
and wealthy CORPORATE LAWYERS ...
sprocket wrote:
> Grant Edwards wrote:
>
>> People have been supporting 921K bps serial links for ages.
>> You do have to pay attention to what you're doing, but it's
>> really not that hard with a sufficiently large FIFO. However,
>> IMO a 128 FIFO is getting close to being insufficiently large.
>
> If the board has USB, he could interpose a serial-to-USB
> converter - the FT2232 from FTDI has a 384 byte receive buffer,
> which should get him down to 4 ms or so per interrupt.
I've got a board that uses one of these devices, with a ColdFire running at 150 MHz. I run the UART at about 2.5 Mbps (250 KB per second, if you prefer :-). I'm not running Linux, but there are plenty of other interrupts going on at rates of at least several kHz.
CBFalconer wrote:
> ....
> That depends on your CPU speed. Within the interrupt, you have
> to handle something like:
>
>    REPEAT
>       test for more in FIFO
>       take one, stuff in buffer, while checking for buffer full.
>       test for overflow or other errors.
So far OK, although it sounds more complex than it actually is - 128 bytes will be processed within 1000 CPU cycles easily, which would be taking all the time on a 1 MHz CPU...
> if any, call appropriate handler
You don't call any "handlers" from an IRQ service routine. You set a flag bit somewhere to indicate what happened and let the non-time critical code deal with it.
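That split - record in the ISR, act in background code - can be sketched in a few lines of C. The flag names here are illustrative, not from any real UART.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative error bits; real names depend on the UART. */
enum { ERR_OVERRUN = 1 << 0, ERR_FRAMING = 1 << 1 };

/* Written by the ISR, read and cleared by background code, so
 * it is volatile; on some targets the background read-and-clear
 * below would also need interrupts masked around it. */
static volatile uint8_t uart_errors;

/* ISR side: record what happened and return at once - no calls
 * into handlers from interrupt context. */
void isr_note_error(uint8_t flags)
{
    uart_errors |= flags;
}

/* Background side: pick the flags up when there is time. */
uint8_t poll_errors(void)
{
    uint8_t e = uart_errors;  /* snapshot             */
    uart_errors = 0;          /* clear for next round */
    return e;
}
```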
>    UNTIL FIFO empty
>    clear interrupt system
>    rearm interrupt
>    exit
That "rearm interrupt" is actually part of the return-from-interrupt
opcode on normal processors (perhaps even on Intel?), but generally
this is how it is typically done. The "call this or that" mistake
from a handler seems to be frequently made as well, of course.

Here is how it has to be done to keep latency really low:

begin IRQ handler:
   (save registers which will be changed)
   disable UART interrupt
   enable CPU interrupts
   empty the UART's FIFO into memory and flag error(s), if any detected
   disable CPU interrupts
   enable UART interrupt
   (restore saved registers)
   return from interrupt
end IRQ handler

This must be applied to *all* interrupt handlers in a system in order
to work, of course. The minimum latency gets a little worse, but the
maximum latency - which is the limiting factor - is dramatically
reduced.

Grant suggests this would not be that easy to do even if it is one
person working on it, and while I tend to agree with him based on what
I have witnessed over the last 20+ years, I must say I have been doing
things like that routinely over those 20+ years myself... :-).

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
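Dimiter's handler outline can be sketched in C as below. Everything here is a stand-in: on real hardware the enable/disable calls would be single register writes or CPU instructions, and the FIFO would be a device register, not an array. The stubs only track state so the ordering of the pattern can be seen.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for single register writes / CPU instructions on
 * real hardware; here they just record state. */
static int cpu_irqs_on = 0;  /* CPU-wide interrupt enable */
static int uart_irq_on = 1;  /* per-source (UART) enable  */
static void cpu_irq_enable(void)   { cpu_irqs_on = 1; }
static void cpu_irq_disable(void)  { cpu_irqs_on = 0; }
static void uart_irq_enable(void)  { uart_irq_on = 1; }
static void uart_irq_disable(void) { uart_irq_on = 0; }

/* Simulated rx FIFO and memory buffer (no wraparound handling
 * in this sketch). */
static uint8_t fifo[128], buf[4096];
static int fifo_rd, fifo_wr, buf_head;

/* The pattern: mask only this source, reopen the CPU to all
 * other interrupts while draining, then restore on the way out. */
void uart_rx_isr(void)
{
    /* (registers saved on entry by hardware/compiler) */
    uart_irq_disable();  /* this source cannot re-enter us    */
    cpu_irq_enable();    /* other ISRs may now preempt freely */

    while (fifo_rd != fifo_wr)          /* empty the UART FIFO */
        buf[buf_head++] = fifo[fifo_rd++];

    cpu_irq_disable();   /* atomic again for the final handoff */
    uart_irq_enable();
    /* return-from-interrupt restores the CPU interrupt state */
}
```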
CBFalconer wrote:
<snip>
> That depends on your CPU speed. Within the interrupt, you have
> to handle something like:
>
>    REPEAT
>       test for more in FIFO
>       take one, stuff in buffer, while checking for buffer full.
>       test for overflow or other errors.
>       if any, call appropriate handler
>    UNTIL FIFO empty
>    clear interrupt system
>    rearm interrupt
>    exit
>
> Note that some operations will require several accesses to the
> UART. Those will eat up time. They will be much slower than
> on-chip memory access.
This stuff is not magic - it's standard fare for embedded developers.
You seem determined to view the problem from the worst possible angle,
and pick the worst possible solution.

You do *not* have to check for overflows or other receive errors for
each byte (buffered UARTs provide summary flags, and you would
normally use higher level constructs, such as CRC checks, to check
correctness on a fast link). You do *not* have to check for space in
your buffer for each byte. At the start of the ISR, you ask the UART
how many bytes are in the FIFO buffer, and you check how much space
you have in the memory buffer. That tells you how often to execute
your loop.

The requirements for the read loop are so simple that on many 32-bit
microcontrollers, you can set up a DMA controller to handle it.
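David's description - query the fill level once, clamp to the space left, then run an unchecked tight loop - might look like the sketch below. The UART interface here (`uart_rx_level`, `uart_rx_read`) is hypothetical, modelled as plain functions over an array rather than device registers.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_SIZE 4096

/* Hypothetical UART: a fill-level register and a data register,
 * modelled as functions over a plain array here. */
static uint8_t hw_fifo[128];
static int     hw_rd, hw_count;
static int     uart_rx_level(void) { return hw_count; }
static uint8_t uart_rx_read(void)
{
    hw_count--;
    return hw_fifo[hw_rd++];
}

static uint8_t buf[BUF_SIZE];
static int     buf_head;

/* Read the fill level once, clamp it to the space left in the
 * memory buffer, then run a tight loop with no per-byte error
 * or bounds checks. */
void uart_rx_isr(void)
{
    int n     = uart_rx_level();
    int space = BUF_SIZE - buf_head;
    if (n > space)
        n = space;   /* excess stays in the FIFO for now */
    for (int i = 0; i < n; i++)
        buf[buf_head++] = uart_rx_read();
}
```

A per-interrupt summary check of the UART's error flags would go after the loop, as David notes, rather than inside it.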
Grant Edwards wrote:
> On 2008-04-02, Derek Young <edu.mit.LL@dereky.nospam> wrote:
>
>>>> So that leaves the real problem handling throughput of
>>>> approximately 1 char each 10 microsec.
>>>
>>> That's where a large FIFO becomes important. Using Linux on
>>> an XScale (which, IIRC, is what the OP is using), I've done
>>> up to 460K baud without problems. But, that was using a UART
>>> with a 1K byte rx FIFO. That UART also allowed 32-bit wide
>>> accesses to the tx/rx FIFOs so that you could transfer 4
>>> bytes per bus cycle.
>>>
>>> With a 128 byte FIFO and byte-wide access, the timing
>>> constraints are quite a bit tighter, but I think it should be
>>> doable if you carefully vet the other drivers that are
>>> running on the system.
>>
>> Unfortunately, I need to rely on drivers written by Arcom.
>> I've sent some questions to their tech support, but am still
>> waiting for a reply.
>>
>> I was thinking... at 921.6 kbps (8/N/1 -> 92160 bytes/sec),
>> if the FIFO interrupt level is set at:
>>
>> 128 bytes: 720 interrupts/sec (1.4 ms/int), 10.8 us allowed
>> for the ISR to respond and empty the FIFO
>
> That 10 us latency requirement is probably impossible.
>
>> 64 bytes: 1440 interrupts/sec (694 us/int), ~700 us for the
>> ISR to respond/finish
>
> That might be possible as long as you can make sure there
> aren't any other ISRs running that take more than a few tens
> of microseconds.
>
>> (I'm still trying to look through the driver code to figure
>> out where the interrupt level is set.)
>>
>> The XScale CPU I'm using runs at 400 MHz. (I've forgotten who
>> asked, but it's communicating with a TMS320F2812 DSP.)
>> Hardware flow control is not an option because it's not
>> implemented in the Arcom hardware.
>>
>> Do these interrupt frequencies sound reasonable for a
>> non-realtime OS, or is it hopeless as some of my coworkers
>> here have suggested?
>
> It's definitely pushing the limits pretty hard.
> With enough time and effort you might be able to make it work,
> but all it would take is one other ISR that runs for more than
> a few hundred microseconds and you've lost data.
>
> Whether the current architecture is acceptable depends on a
> number of questions:
>
> 1) What are the consequences of losing data? Does somebody
>    die, or is there merely a retry?
>
> 2) How much schedule risk is acceptable? Is it OK if it takes
>    a year of hacking on the source code for a few kernel
>    modules and a half-dozen different device drivers to get
>    the overall ISR latency down?
>
> 3) How much redesign risk is acceptable? Is it OK if you work
>    on it for three months before proving that it can't work
>    and a better UART or different interface has to be chosen?
Absolutely.

So, for me, the dropped bytes mean that I'm losing approximately 3%
of my data packets, worst-case. I'm going to argue that this is okay
for now. It's just data. Each packet can be considered a retry. And I
can easily tell which packets are bad and resync.

I don't think I have the time (or expertise, really) to mess around
any more with the Linux kernel. I'm going to suggest redesigning the
hardware in the next version to avoid using the serial link. I'm also
going to suggest avoiding Linux (or any OS). It's really overkill for
this application and, as it turns out, quite a hassle.

Thanks everybody for all your input and advice.

Derek