
UART TX FIFO and INTs problem

Started by forum_microbit February 17, 2004
Hi all,

[I'm reposting on the URL, the new lpc2000 address doesn't seem to wor

I'm a bit stuck with this one, and hope someone has some advice,
I must be overlooking something and I just don't see it.......

I'm using Two 256 byte circular buffers for RX and TX Interrupts on
UART0.

Putchar() writes a char to the TX buffer's current head pointer and
post increments it (auto wrap on index byte). It then increments the
# of TX chars in the buffer (tx_chars) and enables the THRE Interrupt.

The THRE handler part takes a character from the tail pointer, and
post increments. It then decrements tx_chars.
If TX buffer tail equals TX buffer Head, the THRE Interrupt is
disabled.
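
A minimal host-side sketch of the scheme described above, for readers who want to see it in code. This is not the poster's actual source: the names tx_put, thre_isr, thre_irq_enabled and last_sent are hypothetical, and the UART registers are replaced by plain variables so the logic can run on any machine.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the UART hardware (assumptions, not LPC2106 code). */
static bool    thre_irq_enabled;  /* models the THRE enable bit in U0IER   */
static uint8_t last_sent;         /* models the last byte written to U0THR */

static volatile uint8_t tx_buf[256];
static volatile uint8_t tx_head, tx_tail;  /* 8-bit indices wrap at 256 by themselves */
static volatile uint8_t tx_chars;          /* number of chars waiting in tx_buf */

/* Foreground side: what Putchar() is described as doing. */
void tx_put(uint8_t c)
{
    tx_buf[tx_head++] = c;    /* write at head, post-increment        */
    ++tx_chars;               /* another char in TX buffer            */
    thre_irq_enabled = true;  /* enable the THRE interrupt            */
}

/* Interrupt side: what the THRE handler is described as doing. */
void thre_isr(void)
{
    last_sent = tx_buf[tx_tail++];  /* read at tail, post-increment   */
    --tx_chars;                     /* one char fewer in TX buffer    */
    if (tx_tail == tx_head)
        thre_irq_enabled = false;   /* buffer drained: THRE INT off   */
}
```

Note that ++tx_chars in the foreground and --tx_chars in the handler are two separate read-modify-write sequences on the same variable, which only stay consistent if one cannot interrupt the other part-way through.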

Of course in case I write too fast to the TX buffer and it overflows
(to simplify for the time being) at the end of Putchar() I check if
tx_chars is larger than say, 200 Chars.
If it is, I wait in a while loop for tx_chars to drop below a
threshold, say 32.

I've used this principle on many MCUs, and never had a problem.

The problem is that if I write a _constant_ flow of characters much
faster than the Baudrate after quite some time the transmission INTs
stop. [ NO other interrupts are running during this test ]
To test it, I set a pin HIGH when Putchar() is waiting for the TX
buffer to empty a bit, and clear it when buffer filling resumes - a
bit like this :

volatile unsigned char tx_chars;

    ..... /* write char to buffer */
    ++tx_chars ;                    /* another char in TX buffer */
    if ( tx_chars > 200 )
    {
        /* set test pin HIGH */
        while ( tx_chars > 32 ) ;   /* let TX buffer flush out a bit */
    }
    /* set test pin LOW */
}   /* end of Putchar() */

I can see all is fine, the TEST pin spends time HIGH while the buffer
is being flushed, and is then LOW for a few mS (printf() is re-
filling up the buffer from 32 or less to > 200 chars)

When the transmit out of UART0 TX stops, tx_chars is set to one more
than the threshold (for example - here 33 ), and Tail and Head
Pointer of TX buffer are equal, hence the THRE interrupt is disabled.
The code of course is stuck in the while ( tx_chars > 32 ) loop ......

A plausible explanation would be that at times, the TX FIFO holds
more than one byte, and when the THRE interrupt occurs, more than one
char should be subtracted from tx_chars.
This could explain why tx_chars isn't <Zero> when TX Head and TX Tail
become equal and THRE INTs turn OFF.
On the other hand this doesn't make sense, as the TX FIFO should be
transparent to that process. Also, if this was a simple FIFO issue,
that process should lock up after a few hundred characters are
transmitted.

There is nothing in the user guide on LPC2106 that clarifies how
exactly the TX FIFO and the THRE interact.
Although the guide says that FIFOs should be enabled for proper UART
operation, leaving U0FCR to ZERO makes no difference, nor does
changing the FIFO trigger level (again wording is ambiguous here,
left column implies RX FIFO only, right column bitmap implies both
RX and TX FIFOs).

In fact I don't see the point of the TX FIFO if your THRE/TEMT can't
tell you whether you just flushed out 1 or <trigger level> bytes.

Is anyone seeing what the problem is ?
I'm stumped.

Best regards,
Kris




A couple of questions and observations. Maybe they'll spark something.
At 06:49 PM 2/17/04 +0000, you wrote:
>I'm using Two 256 byte circular buffers for RX and TX Interrupts on
>UART0.

Not really related to your actual problem but since you are using printf
why bother with a transmit interrupt? Unless you are planning on
multithreading? I've always found that all a serial transmit interrupt on
single threaded apps does is introduce needless complexity (read bugs).

>
>Putchar() writes a char to the TX buffer's current head pointer and
>post increments it (auto wrap on index byte). It then increments the
># of TX chars in the buffer (tx_chars) and enables the THRE Interrupt.
>
>The THRE handler part takes a character from the tail pointer, and
>post increments. It then decrements tx_chars.
>If TX buffer tail equals TX buffer Head, the THRE Interrupt is
>disabled.
>
>Of course in case I write too fast to the TX buffer and it overflows
>(to simplify for the time being) at the end of Putchar() I check if
>tx_chars is larger than say, 200 Chars.
>If it is, I wait in a while loop for tx_chars to drop below a
>threshold, say 32.

That raises a bunch of design issues to mind. Why such a large gap? Why
not add to buffer as soon as any room is available? And finally why two
different variables to keep track of the room in the ring buffer?

>
>I've used this principle on many MCUs, and never had a problem.
>
>The problem is that if I write a _constant_ flow of characters much
>faster than the Baudrate after quite some time the transmission INTs
>stop. [ NO other interrupts are running during this test ]
>To test it, I set a pin HIGH when Putchar() is waiting for the TX
>buffer to empty a bit, and clear it when buffer filling resumes - a
>bit like this :
>
>volatile unsigned char tx_chars;
>
>..... /* write char to buffer */
>++tx_chars ; /* another char in TX buffer */
>if ( tx_chars > 200 )
> {
> /* set test pin HIGH */
> while ( tx_chars > 32 ) ; /* let TX buffer flush out a bit */
> }
>/* set test pin LOW */
>}
>
>I can see all is fine, the TEST pin spends time HIGH while the buffer
>is being flushed, and is then LOW for a few mS (printf() is re-
>filling up the buffer from 32 or less to > 200 chars)
>
>When the transmit out of UART0 TX stops, tx_chars is set to one more
>than the threshold (for example - here 33 ), and Tail and Head
>Pointer of TX buffer are equal, hence the THRE interrupt is disabled.
>The code of course is stuck in the while ( tx_chars > 32 ) loop ......
>
>A plausible explanation would be that at times, the TX FIFO holds
>more than one byte, and when the THRE interrupt occurs, more than one
>char should be subtracted from tx_chars.

I don't see how that could happen given your outline. The pointer is
updated exactly once, as is the count. One possibility is that there is an
access control problem.

Are your pointers and counters declared as volatile?
Is access to both protected with interrupt disabling and re-enabling sequences?

One thing that can happen on the ARM that won't on micros with small
register sets is that the pointer and counters can be held in the register
sets and won't get spilled out to memory unless they are declared as
volatile. I'd expect the issue to be a little more dramatic in that case
but...

This does 'feel' more like an access control race where the interrupt
decrements the counter and returns to normal execution which immediately
overwrites it with an old updated value. Something like:

r1 = tx_chars
                         - jump to transmit interrupt
                         - Save appropriate registers
                         - Perform transmit
                         - r1 = tx_chars
                         - r1 -= 1
                         - tx_chars = r1
                         - restore saved registers
                         - return from interrupt
r1 += 1
tx_chars = r1

And now tx_chars is one higher than it should be. As bugs go these ones
tend to be very timing sensitive.
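
The interleaving above can be replayed deterministically on a host machine. The sketch below is an illustration only: the function names are hypothetical, the "interrupt" is an ordinary function call placed exactly where the trace puts it, and the masking in the second function is represented by comments rather than real interrupt control.

```c
#include <stdint.h>

static volatile uint8_t tx_chars;

/* What the transmit interrupt does to the counter. */
static void isr_decrement(void)
{
    uint8_t r = tx_chars;
    r -= 1;
    tx_chars = r;
}

/* Unprotected ++tx_chars, with the interrupt landing between load and store. */
uint8_t racy_increment_with_irq(uint8_t start)
{
    tx_chars = start;
    uint8_t r1 = tx_chars;  /* foreground: load                        */
    isr_decrement();        /* interrupt fires here                    */
    r1 += 1;                /* foreground continues with a stale value */
    tx_chars = r1;          /* store wipes out the ISR's decrement     */
    return tx_chars;
}

/* Same two events, but the interrupt is held off until the update is done. */
uint8_t guarded_increment_with_irq(uint8_t start)
{
    tx_chars = start;
    /* TX interrupt masked here (assumed) */
    uint8_t r1 = tx_chars;
    r1 += 1;
    tx_chars = r1;
    /* TX interrupt unmasked here (assumed): the pending ISR now runs */
    isr_decrement();
    return tx_chars;
}
```

One increment plus one decrement should leave the count unchanged. Starting from 33, the racy version ends at 34 and the guarded version ends at 33.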

Ring any bells?

Robert

" 'Freedom' has no meaning of itself. There are always restrictions,
be they legal, genetic, or physical. If you don't believe me, try to
chew a radio signal. "

Kelvin Throop, III


Hi Robert,
 
Thanks for the insights, and of course while the group was off the air for a while
I'm pretty near getting on top of it.
It turns out it's not the INTs that are causing this after all.
As you might know, this is to do with the BASIC interpreter, and I can execute
Basic that prints data really fast with no problems at all.
Also, when I just loop really fast in a printf() with a long string, it works good
as gold too.
It's when I use one specific procedure call that I've narrowed it down to.
I think it's a CG issue, but not sure, it's very hard to track down :-(
It might be my interpreter code too, but I doubt it.
 
Maybe to clarify a bit :

> Not really related to your actual problem but since you are using printf
> why bother with a transmit interrupt?  Unless you are planning on
> multithreading?  I've always found that all a serial transmit interrupt on
> single threaded apps does is introduce needless complexity (read bugs).
There's a few very specific reasons it's set up that way.
That UART channel normally carries ASCII data, and then of course no INTs
would be needed, given that it's a human I/F. However, when I will add the RF
frontend it is very important to have these INTs, for example you can give a
Connect statement to the interpreter, and that immediately sets up a TDD session
between 2 nodes, so I need to carry binary protocols along, and I can't afford to
waste CPU time on polling. (the polling loop on tx_chars dropping down is temporary
of course. I find it's a good test to see if everything is up to scratch, obviously
something's not :-)
 
The large (sort of) buffer is the minimum needed to create a pseudo-full
duplex RS232 wireless link between 2 nodes. While data is being serviced on
the RF in one direction, the other side's RS232 RX buffer can fill up, and compensate
for the Half Duplex nature and Line Turn Around delays.
When it's in Binary mode, the scheme above isn't used, the buffer is filled up,
and then the INTs are enabled, so the CPU can spend most of its time doing other
things. RTS/CTS is used there.
And yes, afterwards my own small but fast RTOS will be plugged back in, so I'll use
counting semaphores to handle the ring buffers.
 
> That raises a bunch of design issues to mind.  Why such a large gap?  Why
> not add to buffer as soon as any room is available?  And finally why two
> different variables to keep track of the room in the ring buffer?
Partially as above, the large gap is there to reduce CPU loading, if too much data
is being written too fast into the ring buffer.
I find using a Head and a Tail more convenient to manage, rather than having to
reset the pointer all the time, it's also easier to do the ground work w/o the RTOS,
but have the whole system built up so it's easy to assign tasks and only change the
"foreground" code, then _actually_ being foreground code instead of some sort of
a superloop. The dead CPU time is then being used better. I'm trying to optimise
what will be in what task, and minimise # of tasks, because many full featured OSs
reschedule as slow as buggery. Latency issues galore then.
 
 
> I don't see how that could happen given your outline.  The pointer is
> updated exactly once, as is the count.  One possibility is that there is an
> access control problem.
>
> Are your pointers and counters declared as volatile?
Of course.
 
> Is access to both protected with interrupt disabling and re-enabling sequences?
That's a good idea, but I know now that's not where this specific problem is.
It's very repeatable with differing Basic application programs, so I doubt it's
just that.
It's mainly that a CG problem with context switch threw me off.
 
> One thing that can happen on the ARM that won't on micros with small
> register sets is that the pointer and counters can be held in the register
> sets and won't get spilled out to memory unless they are declared as
> volatile.  I'd expect the issue to be a little more dramatic in that case
> but...
>
> This does 'feel' more like an access control race where the interrupt
> decrements the counter and returns to normal execution which immediately
> overwrites it with an old updated value.  Something like:
>
> r1 = tx_chars
>                          - jump to transmit interrupt
>                          - Save appropriate registers
>                          - Perform transmit
>                          - r1 = tx_chars
>                          - r1 -= 1
>                          - tx_chars = r1
>                          - restore saved registers
>                          - return from interrupt
> r1 += 1
> tx_chars = r1
>
> And now tx_chars is one higher than it should be.  As bugs go these ones
> tend to be very timing sensitive.
>
> Ring any bells?
 
That's a very good point you make Robert.
I will make sure TX INTs are disabled when I modify tx_chars.
I've had it sitting there dumping data out while executing Basic for hours, and it ran
with no problems, whereas a specific Basic program caused it to hang in that "loop"
within 30-40 secs at the most.
It must be a library call somewhere that messes up something.
I'll certainly keep you posted.
 
Best regards,
Kris
 



At 07:22 PM 2/18/04 +1100, you wrote:

<snip>
>
>The large (sort of) buffer is the minimum needed to create a pseudo-full
>duplex RS232 wireless link between 2 nodes. While data is being serviced on
>the RF in one direction, the other side's RS232 RX buffer can fill up, and
>compensate
>for the Half Duplex nature and Line Turn Around delays.
>When it's in Binary mode, the scheme above isn't used, the buffer is
>filled up,
>and then the INTs are enabled, so the CPU can spend most of its time doing
>other
>things. RTS/CTS is used there.
>And yes, afterwards my own small but fast RTOS will be plugged back in, so
>I'll use
>counting semaphores to handle the ring buffers.

I knew there had to be a reason :) Half duplex buffering makes a lot of sense.

>
> > That raises a bunch of design issues to mind. Why such a large gap? Why
> > not add to buffer as soon as any room is available? And finally why two
> > different variables to keep track of the room in the ring buffer?
>Partially as above, the large gap is there to reduce CPU loading, if too
>much data
>is being written too fast into the ring buffer.
>I find using a Head and a Tail more convenient to manage, rather than
>having to
>reset the pointer all the time, it's also easier to do the ground work w/o
>the RTOS,
>but have the whole system built up so it's easy to assign tasks and only
>change the
>"foreground" code, then _actually_ being foreground code instead of some
>sort of
>a superloop. The dead CPU time is then being used better. I'm trying to
>optimise
>what will be in what task, and minimise # of tasks, because many full
>featured OSs
>reschedule as slow as buggery. Latency issues galore then.

OK, I see what you are after here. Makes sense. Most of the serial work
I've done has had serial as a low priority compared to the rest of the code
(It's more important for the machine to run than for it to communicate text
to someone). That being the case it would only poll during dead time
anyway. I have done CAN and other network machine comms where that wasn't
the case though.

<snip>
>That's a very good point you make Robert.
>I will make sure TX INTs are disabled when I modify tx_chars.
>I've had it sitting there dumping data out while executing Basic for
>hours, and it ran
>with no problems, whereas a specific Basic program caused it to hang in
>that "loop"
>within 30-40 secs at the most.
I have seen that sort of behaviour with access control issues. Changing
unrelated code changes the observed behaviour. The messiest I've run into
was a problem with a third party RTOS that took some time to find (mostly
because of chip bugs). The problem ended up being a sequence that worked
only if compiler optimization was on above a certain level. Amazingly
enough the RTOS company claimed it wasn't a bug (and had no intention of
changing it) since it worked fine with optimization on!

>It must be a library call somewhere that messes up something.

Uggh, wild pointer or library re-enabling interrupts or ... Good luck.


Hi Robert,
 
Well you were certainly dead on the money.
It's not completely solved yet, and I do know there is a context
switch issue as well, but these are my findings so far :
 
> OK, I see what you are after here.  Makes sense.  Most of the serial work
> I've done has had serial as a low priority compared to the rest of the code
> (It's more important for the machine to run than for it to communicate text
> to someone).  That being the case it would only poll during dead time
> anyway.  I have done CAN and other network machine comms where that wasn't
> the case though.
The RF normally runs at 64 kbps, and the RX INT is a state machine that looks for
preamble, checks the Frame sync header, handles all DLL stuff, including CRC on-the-fly.
It is very important that I don't have overruns. If a char is lost, then the FHSS stuffs up etc.
That already puts quite a bit of demand on the RTOS, as the time spent in the Kernel cannot
be more than just under 2 chars at 64 kbps, or ~ 312 uS.
That was quite a bit of a challenge on MSP430, especially for auto-event handling and
timers.
On ARM it's much more relaxed, but I'm quite new to it.
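
As a quick check of the ~312 uS figure: it works out if "64 k" means 64,000 bits per second and each character occupies 10 bits on the wire (start bit, 8 data bits, stop bit). Both of those are assumptions, as is the helper name chars_to_us.

```c
/* Time on the wire for n_chars characters, in microseconds.
 * bits_per_char is assumed to include start and stop bits. */
double chars_to_us(unsigned n_chars, unsigned bits_per_char, double bits_per_sec)
{
    return (double)n_chars * bits_per_char * 1e6 / bits_per_sec;
}
```

With those assumptions, chars_to_us(2, 10, 64000.0) gives 312.5, so one character time is about 156 uS.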
 
> I have seen that sort of behaviour with access control issues.  Changing
> unrelated code changes the observed behaviour.  The messiest I've run into
> was a problem with a third party RTOS that took some time to find (mostly
> because of chip bugs).  The problem ended up being a sequence that worked
> only if compiler optimization was on above a certain level.  Amazingly
> enough the RTOS company claimed it wasn't a bug (and had no intention of
> changing it) since it worked fine with optimization on!
The UART operation on LPC2000 is still confusing wrt the TX FIFO.
The UM states that THRE asserts when 2 or more chars are in the FIFO, but then
somewhere else it states that each char asserts it.
That would have explained why I always had a char too many, because the FIFO
only INTs with 2 or more chars on THRE. If that were the case then I'm barking up
the wrong tree with my THRE Handler.
I find the User's manual too vague about it, maybe it's because it's supposed to be
16C550. I've never really programmed PCs and I never want to, so I'm not used
to these UARTs.
If someone can clarify a bit better what the hell the deal is exactly with the TX FIFO,
I'd be grateful. I've googled around on 16C550 till I'm silly, but not much luck
finding a description how exactly the TX FIFO works. Is it a FIFO between THR and
the actual Shift register ? That's my interpretation, but then you still get the same
amount of TX interrupts anyway, just that they're spaced together in time a lot more,
with more gap in between.
I don't see the point, or maybe I just don't get it. The RX, fair enough, that is great
having a FIFO there.
 
I turned optimisation off, and inserted a write of 0x01 to U0IER before I handle
the buffer or pointer, so THRE interrupt is disabled.
When I started it, the code reset itself in no time.
It's obvious that indeed an atomic operation is needed. I didn't expect that.
And you're dead right about the timing issues, optimisations affect how fast the
execution is, hence the asynchronous timing effect on each other, and that confused
me.
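
A rough sketch of the kind of guard described here, with a plain variable standing in for U0IER so it can run on a host. The bit positions follow the usual 16550 layout (bit 0 = receive data interrupt, bit 1 = THRE interrupt). The names u0ier_sim and tx_put_guarded are hypothetical, and this is not the actual code from the project.

```c
#include <stdint.h>

#define IER_RBR   0x01u  /* receive data available interrupt enable */
#define IER_THRE  0x02u  /* THR empty interrupt enable              */

static volatile uint8_t u0ier_sim = IER_RBR;  /* hypothetical stand-in for U0IER */
static volatile uint8_t tx_buf[256];
static volatile uint8_t tx_head;
static volatile uint8_t tx_chars;

void tx_put_guarded(uint8_t c)
{
    u0ier_sim = IER_RBR;             /* the 0x01 write: THRE off, RX stays on */
    tx_buf[tx_head++] = c;           /* buffer and pointer update             */
    ++tx_chars;                      /* counter update, now uninterrupted     */
    u0ier_sim = IER_RBR | IER_THRE;  /* THRE back on: there is data to send   */
}
```

Forcing THRE back on is safe here only because a character has just been queued. Code that touches the counter without queuing anything would more likely save the old enable bits and restore them afterwards.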
 
Never had such bloody headaches with this sort of thing.
What a pain.
 
Now it's pretty much fixed, except that running code still "resets" itself.
I can't work out what's getting trashed where.
It _should_ work now, there are no other coding issues wrt multiple access,
but it doesn't.
 
Oh, well I'll be off the streets for a bit.
 
> Uggh, wild pointer or library re-enabling interrupts or ...   Good luck.
I'm gonna need it !
 
All the best,
Kris
 
 



Nice summary Kris,

>I find the User's manual too vague about it, maybe it's because it's
>supposed to be
>16C550. I've never really programmed PCs and I never want to, so I'm not used
>to these UARTs.
>If someone can clarify a bit better what the hell the deal is exactly with
>the TX FIFO,
>I'd be grateful. I've googled around on 16C550 till I'm silly, but not
>much luck
>finding a description how exactly the TX FIFO works. Is it a FIFO between
>THR and
>the actual Shift register ? That's my interpretation, but then you still
>get the same
>amount of TX interrupts anyway, just that they're spaced together in time
>a lot more,
>with more gap in between.
>I don't see the point, or maybe I just don't get it. The RX, fair enough,
>that is great
>having a FIFO there.

The 16550 was a National part if I remember correctly. Hmm, I'll check.

Try this

http://www.national.com/pf/PC/PC16550D.html

If anything will have a complete description that should be it. If I
remember correctly the 16550 was a 16450 but with a FIFO that actually
worked. (but I may have my part numbers mixed up). It's been a long time
since I read the datasheet so I don't know how complete it is.



At 03:44 AM 2/19/04 +1100, you wrote:
>If someone can clarify a bit better what the hell the deal is exactly with
>the TX FIFO,
>I'd be grateful. I've googled around on 16C550 till I'm silly, but not
>much luck
>finding a description how exactly the TX FIFO works. Is it a FIFO between
>THR and
>the actual Shift register ? That's my interpretation, but then you still
>get the same
>amount of TX interrupts anyway, just that they're spaced together in time
>a lot more,
>with more gap in between.
>I don't see the point, or maybe I just don't get it. The RX, fair enough,
>that is great
>having a FIFO there.

I wanted to check the documentation before I commented (to refresh my
memory on the FIFO). You get a lot of the same benefits you get from a RX
FIFO. When you get an interrupt you can sit in a loop and stuff the FIFO
full before returning. That nicely lowers the interrupt overhead by a
factor of 15 or 16. On the down side the main service body will take
longer, maybe not a full 15x longer but longer so that each individual
interrupt may be longer and increase your latency. If you just put in one
character and exit the interrupt it will probably act just as you say.


> If anything will have a complete description that should be it. If I
> remember correctly the 16550 was a 16450 but with a FIFO that actually
> worked. (but I may have my part numbers mixed up). It's been a long time
> since I read the datasheet so I don't know how complete it is.

OT trivia:
The 16450 was the simple UART (8250). The 16550 had buffers, but there
were bugs that made them unusable. The 16550A was the fixed version.
A 16C550 should be fixed, as well. (back when C really meant something
different from no character--which generally meant NMOS)

So, NMOS 16550 bad. NMOS 16550A good. CMOS 16C550 good.

But, this is only as accurate as my failing memory.

Cheers,
David




> FIFO. When you get an interrupt you can sit in a loop and stuff the FIFO
> full before returning. That nicely lowers the interrupt overhead by a
> factor of 15 or 16. On the down side the main service body will take
> longer, maybe not a full 15x longer but longer so that each individual
> interrupt may be longer and increase your latency. If you just put in one
> character and exit the interrupt it will probably act just as you say.

How do you then know when the FIFO is full ? There is nothing
accessible to tell you.
Do you maintain a counter loop that lets you only write up to 16 chars
in the FIFO ?
The description on page 89 is really ambiguous, it implies that if 2 or more
chars are in the FIFO _and_ the shift register has just flushed out a char,
that THRE will set too.
THRE when FIFO empty "provided" certain init conditions have been met.
Is this referring to the char delay so no INTs will issue straight away at
start up ?
I guess so.

-- Kris
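
On 16550-style UARTs the usual approach is the counted loop asked about above. The TX FIFO has no "full" flag to poll, but in FIFO mode the THRE interrupt signals an empty TX FIFO, so the handler can write up to 16 bytes without checking anything. The sketch below assumes the LPC2106 behaves this way. fifo_sim is a hypothetical stand-in for writes to U0THR and thre_isr_burst is an illustrative name.

```c
#include <stdint.h>

#define TX_FIFO_DEPTH 16u  /* TX FIFO depth on a 16550-style UART */

static volatile uint8_t tx_buf[256];
static volatile uint8_t tx_head, tx_tail;  /* 8-bit indices wrap at 256 */
static uint8_t fifo_sim[TX_FIFO_DEPTH];    /* hypothetical stand-in for U0THR writes */

/* Called on THRE. Moves up to TX_FIFO_DEPTH bytes from the ring buffer
 * into the FIFO and returns how many were moved. */
unsigned thre_isr_burst(void)
{
    unsigned n = 0;
    while (n < TX_FIFO_DEPTH && tx_tail != tx_head) {
        fifo_sim[n++] = tx_buf[tx_tail++];
    }
    return n;
}
```

With 20 bytes queued, the first interrupt moves 16 and the second moves the remaining 4, so two interrupts do the work of twenty.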



At 02:17 PM 2/18/04 -0500, you wrote:
>So, NMOS 16550 bad. NMOS 16550A good. CMOS 16C550 good.
That's what I was remembering! Thanks for the refresh.

Robert


