Reply by lp2000c November 19, 2004

It seems to me that given the problem with the UART interrupt
register, it is more likely than not that the same problem exists
with other registers. Rather than "one race condition is an
oversight," I would assume similar circuit design would be used in
all parts of the chip.

In any event, this is something which should be very easy for
Philips' circuit designers to answer.

As engineers, we can all appreciate that bugs sometimes get through.
It's the lack of information, once the bug is discovered, that's
frustrating.

--- In , Robert Adsett <subscriptions@a...> wrote:
>
> Well, one race condition is an oversight. Two make me wonder if it was
> something they didn't check for properly. And since both the SPI and timer
> have documented race conditions (and given the apparent long lead time
> between the discovery of an issue and letting us know about it) ....
>
> >In fact, there are a number of other registers which also get written
> >to by both the core and peripheral hardware. Has Philips analyzed
> >which of these are subject to the same problem with simultaneous
> >access?
> >
> >Philips: How about letting us know what's going on!
>
> Very good question. Having errata come out in dribs and drabs is very
> frustrating.
>





Reply by Robert Adsett November 18, 2004
Thanks for passing this along. It may be the root of a problem that's been
bothering me for some time. There are times when I start to think Philips
is actually trying to kill this microcontroller.

At 10:00 PM 11/18/04 +0000, you wrote:
>You MUST enable the FIFO (by setting U0FCR:0) - See Description of
>that bit in Table 66 of LPC2106 User Manual.

Good heavens, that's well hidden. I don't think very many others noticed
that. Most of the example uart code doesn't use the FIFOs. It's certainly
contradictory to any other 16550 implementation.

I'll have to go back to my paper copy and see if that was in earlier revs
of the UM. Sheesh!

>Since other sections of the manual imply that you can run without
>this bit set, I asked Philips about it. One of the apps engineers
>told me it was OK to run without the FIFO enabled. However, another
>apps engineer told me that the original intention was to allow
>operation without the FIFO, but they discovered some bugs operating
>in that mode, so they redefined the product to not allow that mode of
>operation.
>
>So, as far as I know, if the FIFO is not enabled - all bets are off.
>
>Robert: Are you also operating without the FIFO when you have your
>problems?

Nope, the program I've got that ran into the problem has the FIFO
enabled. Hmm, that went through a set of changes; I may have added the FIFO
after changing the IIR read. I'll have to go back and recheck to see if it
works properly with the FIFO enabled now.

>As to whether the problem operating without the FIFO (which Philips
>has long been aware of) might be caused by a similar race condition
>as the timer interrupt problem - interesting question. When I
>discussed the FIFO issue with them, the timer problem had not yet
>been discovered. Now that Philips has an understanding of the timer
>interrupt problem (and has reportedly implemented a fix on some new
>chips in development), their circuit designers should be able to
>easily determine if a similar problem exists here.

Well, one race condition is an oversight. Two make me wonder if it was
something they didn't check for properly. And since both the SPI and timer
have documented race conditions (and given the apparent long lead time
between the discovery of an issue and letting us know about it) ....

>In fact, there are a number of other registers which also get written
>to by both the core and peripheral hardware. Has Philips analyzed
>which of these are subject to the same problem with simultaneous
>access?
>
>Philips: How about letting us know what's going on!

Very good question. Having errata come out in dribs and drabs is very
frustrating.

I'll get to this shortly, I hope, and I will report back (after restoring
the multiple IIR reads with FIFOs enabled and performing a test).

Robert

" 'Freedom' has no meaning of itself. There are always restrictions,
be they legal, genetic, or physical. If you don't believe me, try to
chew a radio signal. "

Kelvin Throop, III



Reply by g2100g November 18, 2004

You MUST enable the FIFO (by setting U0FCR:0) - See Description of
that bit in Table 66 of LPC2106 User Manual.

Since other sections of the manual imply that you can run without
this bit set, I asked Philips about it. One of the apps engineers
told me it was OK to run without the FIFO enabled. However, another
apps engineer told me that the original intention was to allow
operation without the FIFO, but they discovered some bugs operating
in that mode, so they redefined the product to not allow that mode of
operation.

So, as far as I know, if the FIFO is not enabled - all bets are off.
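
For reference, a minimal C sketch of the FIFO enable described above. The bit names, the helper uart_enable_fifo, and the choice to also reset both FIFOs are assumptions for illustration and are not taken from the User Manual. The register is passed in as a pointer so the sketch can be exercised on a host; on an LPC2106 the pointer would be the U0FCR address from your device header.

```c
#include <stdint.h>

/* FCR bit positions following the usual 16550 layout (assumed to match
 * the U0FCR description in the LPC2106 User Manual; verify before use). */
#define UART_FCR_FIFO_ENABLE (1u << 0)  /* U0FCR:0, the bit discussed above */
#define UART_FCR_RX_RESET    (1u << 1)  /* clear the receive FIFO */
#define UART_FCR_TX_RESET    (1u << 2)  /* clear the transmit FIFO */

/* Hypothetical helper: enable the FIFOs and flush both of them.
 * FCR is write-only on 16550-style parts, so the whole value is written
 * in one go instead of doing a read-modify-write. */
static void uart_enable_fifo(volatile uint8_t *fcr)
{
    *fcr = (uint8_t)(UART_FCR_FIFO_ENABLE | UART_FCR_RX_RESET | UART_FCR_TX_RESET);
}
```

Writing a plain 1 to U0FCR, as reported elsewhere in this thread, sets only the enable bit; whether the extra FIFO resets matter for this problem is not known.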

Robert: Are you also operating without the FIFO when you have your
problems?

As to whether the problem operating without the FIFO (which Philips
has long been aware of) might be caused by a similar race condition
as the timer interrupt problem - interesting question. When I
discussed the FIFO issue with them, the timer problem had not yet
been discovered. Now that Philips has an understanding of the timer
interrupt problem (and has reportedly implemented a fix on some new
chips in development), their circuit designers should be able to
easily determine if a similar problem exists here.

In fact, there are a number of other registers which also get written
to by both the core and peripheral hardware. Has Philips analyzed
which of these are subject to the same problem with simultaneous
access?

Philips: How about letting us know what's going on!

--- In , "Leighton Rowe" <leightonsrowe@y...> wrote:
>
> > >Should I enable the FIFOs (write 1 to U0FCR) before running the
> > >interrupts? I seem to be getting by without this up until this point.
> >
> > I don't see why that would eliminate the problem. Worse it might hide it
> > so that it showed up later under less benign conditions.
> >
>
> Good news...I got communication running a lot better after enabling
> FIFOs (U0FCR = 1) at startup. So far, no Rx & Tx glitches yet. Try
> doing the same to see what happens.
>
> For now the LSR mystery is worth forgetting.
>
> Leighton




Reply by Robert Adsett November 18, 2004
At 08:02 PM 11/18/04 +0000, you wrote:
> > >Should I enable the FIFOs (write 1 to U0FCR) before running the
> > >interrupts? I seem to be getting by without this up until this point.
> >
> > I don't see why that would eliminate the problem. Worse it might hide it
> > so that it showed up later under less benign conditions.
> >
>
>Good news...I got communication running a lot better after enabling
>FIFOs (U0FCR = 1) at startup. So far, no Rx & Tx glitches yet. Try
>doing the same to see what happens.

That rather worries me. I will give it a try and see what it does on my code
though (I've got a few other items to take care of first). I place it in
the same category as my read-IIR-only-once-per-interrupt 'fix' though. I
strongly suspect it's only masking temporarily whatever the underlying
cause is.

I suspect there is something similar going on here as happens with the SPI
and the timers. We've now got two independent reports of missing
interrupts on the UART with no clear source of the problem.

Robert


Reply by Leighton Rowe November 18, 2004

> >Should I enable the FIFOs (write 1 to U0FCR) before running the
> >interrupts? I seem to be getting by without this up until this point.
>
> I don't see why that would eliminate the problem. Worse it might hide it
> so that it showed up later under less benign conditions.
>

Good news...I got communication running a lot better after enabling
FIFOs (U0FCR = 1) at startup. So far, no Rx & Tx glitches yet. Try
doing the same to see what happens.

For now the LSR mystery is worth forgetting.

Leighton




Reply by Robert Adsett November 18, 2004
At 07:00 PM 11/18/04 +0000, you wrote:
> > Do you know (and can you tell) if the packet is truncated or maybe just
> > missing characters from its body? If it is always just characters at the end
> > I'm less optimistic about the race condition.
>
>Well after double checking & changing my code to the single IIR read
>concept, things remained the same. The affected packet I receive
>always gets somewhat truncated in the middle, with the last bytes
>showing up a few bytes earlier than expected.

Well, that fits with the speculation of an IIR race condition. Doesn't
prove it but it does fit. You may have duplicated what I've been seeing.

>I however notice LSR interrupt occurring at the point where the
>problem occurs but I don't see any signs of any LSR errors.

Hmmm... Doesn't make much sense to me either. Maybe instead of a simple
race the source is being misclassified? My own test would not have seen
a difference between that and the interrupt simply disappearing.

>Just a crazy question though...can the UART ISR interrupt itself?
>(eg. while processing a THRE an RDA interrupt comes in)

At the very least that would require that you re-enable the associated
interrupt (IRQ or FIQ) and, if vectored through the VIC, you would also have
to acknowledge the vectored interrupt with the appropriate write to the VIC
first. If you've done that, try it again w/o re-enabling the
interrupts.

>Should I enable the FIFOs (write 1 to U0FCR) before running the
>interrupts? I seem to be getting by without this up until this point.

I don't see why that would eliminate the problem. Worse it might hide it
so that it showed up later under less benign conditions.

Curiouser and curiouser
Robert



Reply by Leighton Rowe November 18, 2004

> Do you know (and can you tell) if the packet is truncated or maybe just
> missing characters from its body? If it is always just characters at the end
> I'm less optimistic about the race condition.

Well, after double-checking & changing my code to the single IIR read
concept, things remained the same. The affected packet I receive
always gets somewhat truncated in the middle, with the last bytes
showing up a few bytes earlier than expected.

However, I notice an LSR interrupt occurring at the point where the
problem occurs, but I don't see any signs of any LSR errors.

Just a crazy question though...can the UART ISR interrupt itself?
(eg. while processing a THRE an RDA interrupt comes in)

Should I enable the FIFOs (write 1 to U0FCR) before running the
interrupts? I seem to be getting by without this up until this point.

Thanks again,
Leighton




Reply by Robert Adsett November 18, 2004
At 02:15 PM 11/18/04 +0000, you wrote:
> > Leighton's symptoms appear to be similar to (but not the same as) what I've
> > seen so I'm wondering if he hasn't run across the same underlying problem I
> > have.
> >
> > The THRE interrupts are the only serial interrupts that do not re-assert if
> > they are not serviced so they are particularly vulnerable.
> >
>
>Robert, I haven't yet seen the problem you described that's
>interestingly the opposite to what I'm getting. Debugging through
>the JTAG and using a serial port data logger I see all the receiver
>bytes coming in from the PC. I usually send up to 16 bytes per THRE
>& for the most part the lpc sends the reply packet correctly. But
>only in that special case I see the lpc not respond to the last
>command packet. It came across the receive line ok (logger) so the
>PC's ok. However on the lpc side I debug and see the packet looking
>truncated. Like some bytes got lost while receiving, which looks
>kinda weird. I haven't yet seen the opposite happen b4.

I had a thought. If my speculation about a race condition in IIR is
correct it may well explain both our problems. In my case I lose the THRE
interrupt and thus stop transmitting. In your case you lose a receive
interrupt and drop a character. My test wouldn't catch a dropped character
since I wasn't running any sort of check or comparison so all I would see
is if the echo stopped cold (if a receive interrupt was missed it would
just pick up the next one).
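
One way to make such an echo test catch dropped characters is to send a known pattern and check it on receipt. The following is a minimal sketch, assuming the PC sends an incrementing byte counter; the seq_check type and the seq_check_feed function are hypothetical names and are not part of anyone's code in this thread.

```c
#include <stdint.h>

/* Hypothetical receive-side checker for a stream of incrementing bytes
 * (0, 1, 2, ... 255, 0, 1, ...). */
typedef struct {
    uint8_t expected;        /* next byte value we expect to see */
    unsigned long received;  /* bytes actually received */
    unsigned long missing;   /* bytes inferred to have been dropped */
} seq_check;

/* Feed one received byte. Any jump ahead of the expected value is
 * counted as that many dropped bytes (distance taken modulo 256). */
static void seq_check_feed(seq_check *s, uint8_t byte)
{
    uint8_t gap = (uint8_t)(byte - s->expected);

    s->missing += gap;
    s->received++;
    s->expected = (uint8_t)(byte + 1u);
}
```

The check has obvious limits: a run of exactly 256 lost bytes goes unnoticed, and a corrupted byte is counted as a gap, not as an error.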

Do you know (and can you tell) if the packet is truncated or maybe just
missing characters from its body? If it is always just characters at the end
I'm less optimistic about the race condition.

If it is a race in the IIR then a single read per interrupt might help (or
it might not) depending on the details of the race and the timing (ICK).

In any case let us know how changing to a single IIR read per interrupt
affects the problem.

Robert



Reply by Leighton Rowe November 18, 2004

Sorry for the delay guys,

Point taken for the answers to 1, 2, and 3. Thanks Karl & Robert.

> Leighton's symptoms appear to be similar to (but not the same as) what I've
> seen so I'm wondering if he hasn't run across the same underlying problem I
> have.
>
> The THRE interrupts are the only serial interrupts that do not re-assert if
> they are not serviced so they are particularly vulnerable.
>

Robert, I haven't yet seen the problem you described; interestingly, it's
the opposite of what I'm getting. Debugging through
the JTAG and using a serial port data logger, I see all the receive
bytes coming in from the PC. I usually send up to 16 bytes per THRE
& for the most part the LPC sends the reply packet correctly. But
only in that special case I see the LPC not respond to the last
command packet. It came across the receive line OK (logger), so the
PC's OK. However, on the LPC side I debug and see the packet looking
truncated, as if some bytes got lost while receiving, which looks
kinda weird. I haven't yet seen the opposite happen before.

> a. Check U0IIR, handle the highest-priority pending interrupt,
> thereby doing the Interrupt Reset action, and return. If more than
> one interrupt was pending, the UART will immediately interrupt again,
> and then present the next-highest priority interrupt in U0IIR, and so
> on. This way is the easiest, and the best one if multiple
> simultaneous interrupts are rare.
>
> b. Check U0IIR, handle the highest-priority pending interrupt,
> thereby doing the Interrupt Reset action. Re-check U0IIR, and loop
> around if another interrupt is pending. When no more interrupts are
> pending, return. This way is better if multiple simultaneous
> interrupts are so likely that it pays off to save the multiple
> interrupt entry/exit code.

Interesting stuff Karl. I wasn't really paying attention to the IIR
table. So, I'll have to review my code just in case I did a bit too
much for RDA & THRE. Basically, the code I'm using is following Plan
B. Bill Knight's code follows the same concept too. I'll give Plan A
a try though, since my PC <-> lpc communication is on a small scale,
and Robert's having more success with it.




Reply by Robert Adsett November 18, 2004
At 10:07 PM 11/17/04 +0000, you wrote:
>b. Check U0IIR, handle the highest-priority pending interrupt,
>thereby doing the Interrupt Reset action. Re-check U0IIR, and loop
>around if another interrupt is pending. When no more interrupts are
>pending, return. This way is better if multiple simultaneous
>interrupts are so likely that it pays off to save the multiple
>interrupt entry/exit code.

That's certainly the way it's supposed to work (and the way I have done it
on 'real' 16550s) but when I do this on an LPC2106 with both transmit and
receive interrupts I occasionally miss THRE interrupts (they just never
occur). This only happens when receive is occurring at the same time as
transmit (I used a simple echo test) and it only occurs infrequently
(several 100K up to a megabyte or more of transmitted bytes before it would
halt). The only explanation I have is an undocumented race condition in
updating the IIR as it is being read but as I said in an earlier post I
haven't seen any duplication. Changing to the less conventional one IIR
read per interrupt seems to eliminate the problem but maybe all it does is
reduce the probability.
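
The two dispatch styles under discussion can be compared on a host with a small simulation. This is a sketch under stated assumptions: sim_uart and its functions are hypothetical stand-ins for the real registers, the IIR encodings follow the usual 16550 layout, and the model assumes a THRE interrupt is cleared by the IIR read that reports it. It does not reproduce the suspected race; it only shows how the two ISR shapes differ in the number of IIR reads.

```c
#include <stdint.h>

/* IIR encodings in the usual 16550 layout (assumed; check the manual). */
#define IIR_NO_PENDING 0x01u  /* bit 0 set: nothing pending */
#define IIR_ID_MASK    0x0Eu  /* bits 3:1 identify the source */
#define IIR_ID_THRE    0x02u  /* transmit holding register empty */
#define IIR_ID_RDA     0x04u  /* receive data available */

/* Hypothetical host-side stand-in for the UART state. */
typedef struct {
    int rda_pending;
    int thre_pending;
    unsigned iir_reads;   /* how many times the ISR read IIR */
    unsigned rx_handled;
    unsigned tx_handled;
} sim_uart;

/* Simulated IIR read: reports the highest-priority pending source. */
static uint8_t sim_read_iir(sim_uart *u)
{
    u->iir_reads++;
    if (u->rda_pending)
        return IIR_ID_RDA;        /* receive data outranks THRE */
    if (u->thre_pending) {
        u->thre_pending = 0;      /* modelled: the IIR read resets a reported THRE */
        return IIR_ID_THRE;
    }
    return IIR_NO_PENDING;
}

/* Service one source; on hardware this would read RBR or write THR. */
static void service(sim_uart *u, uint8_t iir)
{
    switch (iir & IIR_ID_MASK) {
    case IIR_ID_RDA:
        u->rda_pending = 0;       /* reading the data clears the request */
        u->rx_handled++;
        break;
    case IIR_ID_THRE:
        u->tx_handled++;
        break;
    default:
        break;
    }
}

/* Single IIR read per interrupt: handle one source and return. */
static void isr_single_read(sim_uart *u)
{
    uint8_t iir = sim_read_iir(u);

    if ((iir & IIR_NO_PENDING) == 0)
        service(u, iir);
}

/* Conventional loop: re-read IIR until nothing is pending. */
static void isr_loop(sim_uart *u)
{
    uint8_t iir;

    while (((iir = sim_read_iir(u)) & IIR_NO_PENDING) == 0)
        service(u, iir);
}
```

With both sources pending, the single-read form needs two interrupt entries and reads IIR once in each, while the loop form handles both in one entry at the cost of a third IIR read that finds nothing pending. If the race is in the IIR update, the extra reads in the loop form would give it more chances to occur, which would fit the observation above without proving it.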

Leighton's symptoms appear to be similar to (but not the same as) what I've
seen so I'm wondering if he hasn't run across the same underlying problem I
have.

The THRE interrupts are the only serial interrupts that do not re-assert if
they are not serviced so they are particularly vulnerable.

Robert