
forward error correction on ADSP21020

Started by alb March 2, 2012
Hi everyone,

in the system I am working on there is an ADSP21020 connected to an FPGA
which receives data from a serial port. The FPGA deserializes the
incoming bytes (one 'start bit' and one 'stop bit') and, once a byte is
ready in the output register, sets an interrupt and a bit in a status
register. The DSP can look at the registers by simply reading from a
mapped port, and we can choose between polling the status register and
using the interrupt.

Unfortunately this is just on paper. The real world is quite different,
since the FPGA receiver is apparently 'losing' bits.
When we send a "packet" (a sequence of bytes), what we observe with the
scope is that sometimes the interrupts are not equally spaced in time
and there is one byte fewer than what we sent. So we suspect that the
receiver has started on the wrong 'start bit', hence screwing up
everything.

The incidence of this error looks dependent on the length of the packet
we send, leading us to think that due to some synchronization problem
the UART loses sync (maybe timing issues on the FPGA).

Given that we cannot change the FPGA, I came up with the idea of using
some forward error correction (FEC) encoding to overcome this issue.
But if my diagnosis is correct, the broken sequence is not merely
missing some bytes: the bits will be shifted (the receiver started on
the wrong 'start bit') and some bits will be inserted ('start bit' and
'stop bit' levels become part of the data), and I'm not sure whether
there exists any technique which can recover such a broken sequence.

On top of that, I have no feel for how much any type of FEC decoding
would cost on the DSP in terms of memory and CPU resources.

Any suggestions and/or ideas?

Al

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
In comp.arch.embedded,
alb <alessandro.basili@cern.ch> wrote:
[...]
> When we send a "packet" (a sequence of bytes), what we observe with the
> scope is that sometimes the interrupts are not equally spaced in time
> and there is one byte fewer than what we sent. So we suspect that the
> receiver has started on the wrong 'start bit', hence screwing up
> everything.
>
> The incidence of this error looks dependent on the length of the packet
> we send, leading us to think that due to some synchronization problem
> the UART loses sync (maybe timing issues on the FPGA).
[...]
Is this a continuous stream of bits, with no pauses between bytes?
It looks like the start bit detection does not re-adjust its timing to
the actual edge of the next start bit. With small differences in
bitrate, this causes the receiver to fall out of sync, as you found.

Obviously, the best solution is to fix the FPGA, as it is 'broken'. Is
there no way to fix it or get it fixed?

Can you change the sender of the data? If so, you can set it to 2 stop
bits. This can allow the receiver to re-sync every byte. If possible, I
do try to set my transmitters to 2 stop bits and receivers to 1. This
can prevent trouble like this, but costs a little bandwidth.

Another option would be to tweak the bitrates. It seems your sender is
now a tiny bit on the fast side w.r.t. the receiver. Maybe you can slow
down the clock on your sender by 1 or 2 percent? Try to get an accurate
measurement of the bitrate on both sides before you do anything.

-- 
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)

An egghead is one who stands firmly on both feet, in mid-air, on both
sides of an issue. -- Homer Ferguson
On 3/2/2012 12:52 PM, Stef wrote:
[...]
> Is this a continuous stream of bits, with no pauses between bytes?
> It looks like the start bit detection does not re-adjust its timing to
> the actual edge of the next start bit. With small differences in
> bitrate, this causes the receiver to fall out of sync, as you found.
Within a "packet" there should be no pause between bytes; I will check,
though. There might be a small difference in bitrate; I would need to
verify how much.
> Obviously, the best solution is to fix the FPGA, as it is 'broken'. Is
> there no way to fix it or get it fixed?
The FPGA is flying in space, together with the rest of the equipment.
We cannot reprogram it; we can only replace the software in the DSP,
and even that takes non-trivial effort.
> Can you change the sender of the data? If so, you can set it to 2 stop
> bits. This can allow the receiver to re-sync every byte. If possible, I
> do try to set my transmitters to 2 stop bits and receivers to 1. This
> can prevent trouble like this, but costs a little bandwidth.
We are currently investigating it; the transmitter is controlled by an
8051, and in principle we should have control over it. Your idea is to
use the second stop bit to allow better synching, and hopefully not to
lose the following start bit, correct?
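For reference (not from the thread): the classic 8051 UART has no native
two-stop-bit mode, but in the 9-bit modes (2/3) the ninth bit can be
held at 1, and an 8N1 receiver then sees it as an extra stop bit. A
minimal Keil C51-style sketch of the idea, assuming an 11.0592 MHz
crystal and the 19.2 kbaud rate mentioned below:

  /* Approximate "8 data bits + 2 stop bits" on a classic 8051: use
   * 9-bit mode 3 with TB8 forced to 1, so the constant-1 ninth bit
   * acts as a second stop bit for an 8N1 receiver. The crystal
   * frequency (11.0592 MHz) and reload value are assumptions. */
  #include <reg51.h>

  void uart_init_8n2_like(void)
  {
      SCON = 0xD8;                   /* mode 3 (9-bit UART), REN=1, TB8=1 */
      PCON |= 0x80;                  /* SMOD=1: double the baud rate */
      TMOD = (TMOD & 0x0F) | 0x20;   /* timer 1, 8-bit auto-reload */
      TH1  = 0xFD;                   /* ~19.2 kbaud @ 11.0592 MHz, SMOD=1 */
      TR1  = 1;                      /* start timer 1 */
  }

  void uart_putc(unsigned char c)
  {
      TB8  = 1;                      /* ninth bit = 1: extra stop bit */
      SBUF = c;
      while (!TI)                    /* busy-wait for transmit complete */
          ;
      TI = 0;
  }

Whether the 8051 firmware can actually be changed this way depends, of
course, on what is found in the transmitter code.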
> Another option would be to tweak the bitrates. It seems your sender is
> now a tiny bit on the fast side w.r.t. the receiver. Maybe you can slow
> down the clock on your sender by 1 or 2 percent? Try to get an accurate
> measurement of the bitrate on both sides before you do anything.
We can certainly measure the transmission rate. I am not sure we can
tweak the bitrates to that level. The current software on the 8051
supports several bitrates (19.2, 9.6, 4.8, 2.4 kbaud), but I'm afraid
those options are somehow hardcoded in the transmitter. It would
certainly be worth having a look.
In comp.arch.embedded,
alb <alessandro.basili@cern.ch> wrote:
[...]
> The FPGA is flying in space, together with the rest of the equipment.
> We cannot reprogram it; we can only replace the software in the DSP,
> and even that takes non-trivial effort.
Whoops, that's a real "cannot" then. Too often a statement like "cannot"
is flexible after a little interrogation; I guess this is not one of
those cases.

But it seems you at least have a test system on the ground you can do
measurements or tests on? Changing the FPGA there can at least confirm
that you have found the actual cause of the problem, if you need to.
>> Can you change the sender of the data? If so, you can set it to 2 stop
>> bits. This can allow the receiver to re-sync every byte. If possible, I
>> do try to set my transmitters to 2 stop bits and receivers to 1. This
>> can prevent trouble like this, but costs a little bandwidth.
>
> We are currently investigating it; the transmitter is controlled by an
> 8051, and in principle we should have control over it. Your idea is to
> use the second stop bit to allow better synching, and hopefully not to
> lose the following start bit, correct?
The extra stop bit will allow the receiver to start looking for a start
bit from scratch, causing a re-sync every byte. But the exact effect of
course depends on the receiver implementation in the FPGA. A "correct"
implementation should start looking for a start bit after half a stop
bit or so, and sync the data sampling for the real bits to the first
edge of the newly detected start bit.

If you have the FPGA code or RTL and a simulator, you can set up a
simulation testbench to test the effects of 1 and 2 stop bits under
slightly varying bitrates.
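To get a feel for the numbers (not from the thread): with 10 bits per
frame (start + 8 data + stop) and a fractional rate mismatch e, a
receiver that syncs only once per packet accumulates roughly 10*n*e bit
times of error after n bytes, and sampling fails once that approaches
half a bit. A tiny stand-alone C sketch; the 0.2% mismatch is an
assumed figure, not a measured one:

  /* How many bytes until a receiver that never re-syncs samples half
   * a bit off? Assumes 8N1 framing and an ASSUMED 0.2% rate mismatch. */
  #include <stdio.h>

  int main(void)
  {
      const double mismatch = 0.002;   /* assumed fractional mismatch */
      const int bits_per_frame = 10;   /* start + 8 data + stop */

      for (int n = 1; n <= 1000; n++) {
          double drift = n * bits_per_frame * mismatch; /* in bit times */
          if (drift >= 0.5) {
              printf("sampling is half a bit off after %d bytes\n", n);
              return 0;
          }
      }
      return 0;
  }

With these numbers the receiver slips after 25 bytes, which is one way
the failure probability can grow with packet length when there is no
per-byte re-sync.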
>> Another option would be to tweak the bitrates. It seems your sender is
>> now a tiny bit on the fast side w.r.t. the receiver. Maybe you can slow
>> down the clock on your sender by 1 or 2 percent? Try to get an accurate
>> measurement of the bitrate on both sides before you do anything.
>
> We can certainly measure the transmission rate. I am not sure we can
> tweak the bitrates to that level. The current software on the 8051
> supports several bitrates (19.2, 9.6, 4.8, 2.4 kbaud), but I'm afraid
> those options are somehow hardcoded in the transmitter. It would
> certainly be worth having a look.
If you only have control over the dividers in the 8051, there is
nothing you can do in this area; you can only change the bitrate in
large steps. Can you change the bitrate on the FPGA side? If so,
depending on the receiver implementation, changing the bitrate on both
sides may help. If the bitrates are not exact, you may be able to
reverse the speed errors at certain settings.

Tweaking the bitrate within the error margin can only be done by
changing a crystal on an 8051.

Again, if you can simulate the FPGA receiver, you can take a lot of
guessing out of the equation.

-- 
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)

I am firm. You are obstinate. He is a pig-headed fool.
-- Katharine Whitehorn
On Fri, 02 Mar 2012 14:03:07 +0100, alb wrote:

[...]
>>> So we suspect that the receiver has started on the wrong 'start bit',
>>> hence screwing up everything.
[...]
Go over the FPGA code with a fine-toothed comb -- whatever you're
doing, it won't help if the FPGA doesn't support it.

What the FPGA _should_ be doing is starting a clock when it detects the
leading edge of a start bit, then sampling the waveform at the middle
of every bit period, then beginning the search for a new start bit at
the _middle_ of the stop bit.

It sounds like it either doesn't resynchronize in the middle of a
packet at all, or it starts seeking the start bit at the _end_ of the
preceding stop bit. In the former case, adding a stop bit will just
screw things up completely. In the latter case, if the DSP bit clock is
slower than the FPGA, the whole resynchronization thing falls apart --
but adding that extra stop bit will fix things.

And next time you go sending satellites to space, put in a mechanism to
upload FPGA firmware!

-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
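For illustration (not the FPGA's actual logic): the receiver Tim
describes maps onto a small state machine. A hedged C sketch, driven by
a hypothetical 16x-oversampling tick; the key details are restarting
the bit counter on the start bit's falling edge, sampling at mid-bit,
and re-arming the start-bit search at the middle of the stop bit:

  /* Sketch of an 8N1 receiver state machine, clocked at 16x the bit
   * rate. Not from the thread; names are illustrative. */
  #include <stdint.h>

  enum rx_state { IDLE, START, DATA, STOP };

  struct uart_rx {
      enum rx_state state;
      uint8_t phase;    /* ticks within the current bit period */
      uint8_t nbits;    /* data bits received so far */
      uint8_t shift;    /* LSB-first shift register */
  };

  /* Call once per oversampling tick with the sampled line level.
   * Returns the received byte, or -1 if none completed this tick. */
  int uart_rx_tick(struct uart_rx *rx, int line)
  {
      switch (rx->state) {
      case IDLE:
          if (!line) {                  /* falling edge: candidate start */
              rx->state = START;
              rx->phase = 0;
          }
          break;
      case START:
          if (++rx->phase == 8) {       /* middle of the start bit */
              if (line) {
                  rx->state = IDLE;     /* glitch: not a real start bit */
              } else {
                  rx->state = DATA;
                  rx->phase = 0;
                  rx->nbits = 0;
                  rx->shift = 0;
              }
          }
          break;
      case DATA:
          if (++rx->phase == 16) {      /* middle of a data bit */
              rx->phase = 0;
              rx->shift = (uint8_t)((rx->shift >> 1) | (line ? 0x80 : 0));
              if (++rx->nbits == 8)
                  rx->state = STOP;
          }
          break;
      case STOP:
          if (++rx->phase == 16) {      /* MIDDLE of the stop bit: done, */
              rx->state = IDLE;         /* re-arm the start-bit search   */
              return rx->shift;         /* so the next edge re-syncs     */
          }
          break;
      }
      return -1;
  }

The failure mode described in this thread would correspond to re-arming
only after the stop bit has fully elapsed (or never), so a slightly
fast sender's next start edge is missed and the timing error
accumulates across the packet.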
In article <9rbp1cF67tU1@mid.individual.net>,
alb  <alessandro.basili@cern.ch> wrote:
}[...]
}When we send a "packet" (a sequence of bytes), what we observe with the
}scope is that sometimes the interrupts are not equally spaced in time
}and there is one byte fewer than what we sent. So we suspect that the
}receiver has started on the wrong 'start bit', hence screwing up
}everything.
}
}The incidence of this error looks dependent on the length of the packet
}we send, leading us to think that due to some synchronization problem
}the UART loses sync (maybe timing issues on the FPGA).

Try a test where you send a packet consisting of only 0xff bytes (I'll
assume it's 8 bits per character). Watch the interrupts and confirm
that they're normally spaced until one goes missing, where you get an
extra gap of about 9 bits. That would be a problem with the delivery of
the characters rather than one of recognising them.

Then try sending only 0x55 bytes (this gives a bit pattern of
010101010101...). If the interrupts show the same pattern, you're
missing a whole character; if the pattern has an extra gap of about 2
bits, then the receiver is missing a start bit and using the next 0
that arrives as the new start bit. That would be a problem of
recognising the characters.

If the problem is recognising characters, then most errors will cause
many bytes in a packet to be wrong from the point of the error. If the
problem is in delivery, then the delivered bytes will be right, merely
missing out the byte where the error occurred. The best solution will
depend on which problem you have.
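As an illustration of the measurement (not from the thread): timestamp
each RX interrupt on the DSP with a free-running timer and classify the
gaps between consecutive interrupts, measured in bit times. A rough C
sketch with assumed thresholds:

  /* Classify the gap between consecutive RX interrupts (in bit times).
   * An 8N1 frame is 10 bits, so ~10 is normal; ~12 suggests the
   * receiver re-synced on a data '0' (recognition problem); ~19-20
   * suggests a whole character vanished (delivery problem). The
   * thresholds are assumptions for the 0x55/0xff test patterns. */
  #include <stdio.h>

  void classify_gap(double gap_bits)
  {
      if (gap_bits > 15.0)
          printf("~%.1f bits: whole character lost (delivery)\n", gap_bits);
      else if (gap_bits > 10.5)
          printf("~%.1f bits: re-synced on a data bit (recognition)\n", gap_bits);
      else
          printf("~%.1f bits: normal spacing\n", gap_bits);
  }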

One possible solution might be to do the UART receive function in
software (this depends very much on how the hardware works). By setting
the baud rate on the FPGA to over 10x the true speed, it sees every bit
as either a 0xff or a 0x00 character. If you can react to the interrupt
fast enough and read a suitable clock, you can then decode the bits in
software. Of course, if the FPGA is failing to deliver characters, this
is no better.
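A rough shape of that decode, glossing over the framing subtleties
(entirely a sketch, not from the thread): the FPGA receiver only starts
a frame on a falling edge, so low stretches of the line arrive as
0x00-ish characters and high stretches as silent gaps; reconstructing
the real bits then reduces to measuring run lengths against a timer:

  /* Crude software-UART decode from oversampled FPGA characters.
   * OVERSAMPLE, emit_bit() and the timestamp source are hypothetical;
   * a real ISR would read the FPGA data/status registers and a
   * hardware timer on the DSP. */
  #include <stdint.h>

  #define OVERSAMPLE 10u            /* FPGA baud = 10x the line rate */

  extern void emit_bit(int level);  /* consumer of reconstructed bits */

  static uint32_t last_end;         /* timestamp where last char ended */

  void soft_uart_feed(uint8_t ch, uint32_t t_start)
  {
      /* the silent gap before this character was high line */
      uint32_t high = (t_start - last_end + OVERSAMPLE / 2) / OVERSAMPLE;
      while (high--)
          emit_bit(1);

      /* each (mostly) all-zeros character covers ~one low line bit */
      if (ch == 0x00)
          emit_bit(0);

      last_end = t_start + OVERSAMPLE;  /* frame ~10 FPGA bit times */
  }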

Fixing a delivery problem is more tricky. It's necessary to know more
about the losses. Is it certain bit patterns which are more likely to
get lost, or every Nth character, or apparently random losses? If
random, about how often are characters lost? How big are your packets,
and what sort of retransmitting error-correction do you already have?
On 3/2/2012 2:58 PM, Stef wrote:
>> The FPGA is flying in space, together with the rest of the equipment.
>> We cannot reprogram it; we can only replace the software in the DSP,
>> and even that takes non-trivial effort.
>
> Whoops, that's a real "cannot" then. Too often a statement like "cannot"
> is flexible after a little interrogation; I guess this is not one of
> those cases.
>
> But it seems you at least have a test system on the ground you can do
> measurements or tests on? Changing the FPGA there can at least confirm
> that you have found the actual cause of the problem, if you need to.
Well, we do have the system on the ground, and we could in principle
replace the FPGA. But unfortunately, when we try to place&route the
original design, we bump into a problem in the pin assignments (I think
it is a clock resource signal without the clock buffer, or something
similar), and we do not know how the original designer 'tricked' the
tool into producing the bitstream. There are no traces whatsoever in
the design or in any document. Needless to say, the original designer
has migrated to a different world and recollects nothing but 'how
difficult it was and how many problems they had'.

If there's a timing problem in the FPGA and we re-route the pinout, I
fear we will not see the same effects, or worse, we will reveal some
other problem somewhere else.

I should admit this project is well below the standards the other
pieces of electronics conform to. Considering our production cycle,
this piece should never have made it into the flight assembly!
>>> Can you change the sender of the data? If so, you can set it to 2 stop
>>> bits. This can allow the receiver to re-sync every byte. If possible, I
>>> do try to set my transmitters to 2 stop bits and receivers to 1. This
>>> can prevent trouble like this, but costs a little bandwidth.
>>
>> We are currently investigating it; the transmitter is controlled by an
>> 8051, and in principle we should have control over it. Your idea is to
>> use the second stop bit to allow better synching, and hopefully not to
>> lose the following start bit, correct?
>
> The extra stop bit will allow the receiver to start looking for a start
> bit from scratch, causing a re-sync every byte. But the exact effect of
> course depends on the receiver implementation in the FPGA. A "correct"
> implementation should start looking for a start bit after half a stop
> bit or so, and sync the data sampling for the real bits to the first
> edge of the newly detected start bit.
>
> If you have the FPGA code or RTL and a simulator, you can set up a
> simulation testbench to test the effects of 1 and 2 stop bits under
> slightly varying bitrates.
Here it would probably be much easier to do a test with 2 stop bits
directly. Running the simulation is probably more time-consuming, even
though it might be helpful in the long run, in case we find other
oddities in the behavior of the FPGA.
>>> Another option would be to tweak the bitrates. It seems your sender is
>>> now a tiny bit on the fast side w.r.t. the receiver. Maybe you can slow
>>> down the clock on your sender by 1 or 2 percent? Try to get an accurate
>>> measurement of the bitrate on both sides before you do anything.
>>
>> We can certainly measure the transmission rate. I am not sure we can
>> tweak the bitrates to that level. The current software on the 8051
>> supports several bitrates (19.2, 9.6, 4.8, 2.4 kbaud), but I'm afraid
>> those options are somehow hardcoded in the transmitter. It would
>> certainly be worth having a look.
>
> If you only have control over the dividers in the 8051, there is
> nothing you can do in this area; you can only change the bitrate in
> large steps. Can you change the bitrate on the FPGA side? If so,
> depending on the receiver implementation, changing the bitrate on both
> sides may help. If the bitrates are not exact, you may be able to
> reverse the speed errors at certain settings.
No, we cannot change the receiver rate. Again, we could in principle
change the crystal and see if that helps, but that certainly cannot be
the option we use to solve it.
> Tweaking the bitrate within the error margin can only be done by
> changing a crystal on an 8051.
>
> Again, if you can simulate the FPGA receiver, you can take a lot of
> guessing out of the equation.
Here I'm only concerned that we would need to run a back-annotated
simulation, otherwise we may miss some nasty time-critical effects.
When I was designing the VHDL I made sure I didn't have any timing
violations, but here we cannot guarantee that, since we have a
place&route issue that we do not know how it was originally solved.
On 3/2/2012 8:38 PM, Tim Wescott wrote:
[...]
> Go over the FPGA code with a fine-toothed comb -- whatever you're
> doing, it won't help if the FPGA doesn't support it.
Ok, a colleague of mine went through it, and indeed the start-bit logic
is faulty: it looks for a negative transition without the signal being
synchronized to the internal clock first (don't ask me how that is
possible!).

Given this type of error, an 0xFF byte will be lost completely, since
there is no other start bit to sync on within the byte, while in other
cases the receiver may resync on a '0' bit within the byte.
> It sounds like it either doesn't resynchronize in the middle of a
> packet at all, or it starts seeking the start bit at the _end_ of the
> preceding stop bit. In the former case, adding a stop bit will just
> screw things up completely. In the latter case, if the DSP bit clock
> is slower than the FPGA, the whole resynchronization thing falls apart
> -- but adding that extra stop bit will fix things.
Adding a delay between bytes does mitigate the effect, but of course it does not solve it.
> And next time you go sending satellites to space, put in a mechanism to
> upload FPGA firmware!
We have ~1000 FPGAs onboard, and all of them are anti-fuse. The whole
design and production process, which consists of several test campaigns
on different quality models (Engineering, Qualification and Flight),
should have ensured this level of functionality. The reason it failed
is most probably a poor level of quality control over the process: as
an example, we are missing test reports for the system, as well as
checksums for the FPGA firmware.

As a side note: IMO the capability to reprogram an FPGA onboard is
built in when your needs change with time, not to fix some stupid UART
receiver.
On 3/3/2012 4:32 AM, Charles Bryant wrote:
[...]
> Try a test where you send a packet consisting of only 0xff bytes (I'll
> assume it's 8 bits per character). Watch the interrupts and confirm
> that they're normally spaced until one goes missing, where you get an
> extra gap of about 9 bits. That would be a problem with the delivery of
> the characters rather than one of recognising them.
Since we found that the 'start bit' logic has a problem, the 0xFF
pattern has a good chance of being lost completely, since no other
'start bit' will be recognized.
> One possible solution might be to do the UART receive function in
> software (this depends very much on how the hardware works). By setting
> the baud rate on the FPGA to over 10x the true speed, it sees every bit
> as either a 0xff or a 0x00 character. If you can react to the interrupt
> fast enough and read a suitable clock, you can then decode the bits in
> software. Of course, if the FPGA is failing to deliver characters, this
> is no better.
An 0xFF has a good chance of being completely lost. The method you
suggest may reduce the problem of recognizing bytes to the problem of
delivering them; extra encoding would then be needed to recover the
lost bytes.
> Fixing a delivery problem is more tricky. It's necessary to know more
> about the losses. Is it certain bit patterns which are more likely to
> get lost, or every Nth character, or apparently random losses? If
> random, about how often are characters lost? How big are your packets,
> and what sort of retransmitting error-correction do you already have?
If you plot the number of failed packets [1] against the position in
the packet where the problem occurred, you see an almost linearly
increasing curve; hence the probability of a problem grows with packet
length. At the moment we don't have any retransmission mechanism, and
the rate of loss is ~0.5% for a 100-byte packet. We want to exploit the
4K buffer on the transmitter side in order not to add too much
overhead, but it looks like the rate of loss will be higher with bigger
packets.

[1] We send a packet, have it echoed back, and compare the values.
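Not from the thread, but perhaps worth costing out: since a lost byte's
position can often be inferred (byte count plus the interrupt-gap
signature), a single XOR parity byte per block would let the DSP
rebuild one known-position erasure per block. A sketch, with the block
size of 16 being an arbitrary assumption:

  /* One XOR parity byte per block of BLOCK data bytes recovers a
   * single byte lost at a KNOWN position. Sketch only; it assumes the
   * erasure has already been located and the surviving bytes re-framed
   * (the hard part after a bit-slip). */
  #include <stdint.h>
  #include <stddef.h>

  #define BLOCK 16

  /* sender: parity byte appended after each block */
  uint8_t block_parity(const uint8_t *data, size_t n)
  {
      uint8_t p = 0;
      for (size_t i = 0; i < n; i++)
          p ^= data[i];
      return p;
  }

  /* receiver: rebuild the byte missing at position 'lost', given the
   * other BLOCK-1 bytes (data[lost] is ignored) and the parity byte */
  uint8_t recover_lost_byte(const uint8_t *data, size_t lost, uint8_t parity)
  {
      uint8_t p = parity;
      for (size_t i = 0; i < BLOCK; i++)
          if (i != lost)
              p ^= data[i];
      return p;
  }

The decode cost is one XOR per byte, which is negligible on the
ADSP-21020; the real difficulty, as noted above, is locating the
erasure and re-framing after a bit-slip, which this sketch assumes
away.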
On Mon, 05 Mar 2012 11:44:43 +0100, alb wrote:

[...]
> We have ~1000 FPGAs onboard, and all of them are anti-fuse.
Ah well.
> The whole design and production process, which consists of several test
> campaigns on different quality models (Engineering, Qualification and
> Flight), should have ensured this level of functionality. The reason it
> failed is most probably a poor level of quality control over the
> process: as an example, we are missing test reports for the system, as
> well as checksums for the FPGA firmware.
Testing is only the most visible and least effective of all means of
ensuring quality. It is a net with whale-sized holes, with which you go
out and attempt to catch all the minnows in the sea by making repeated
passes.

And all too often, quality programs end up being blown off by
management and/or design personnel as an unnecessary expense, or a
personal affront. Or the QA department gets staffed with martinets or
rubber-stamp bozos or whatever. Because -- as you are seeing --
sometimes quality problems don't show up until long after everyone has
gotten rewarded for doing a good job.

Humans just aren't made to build high-reliability systems, so an
organization really needs to swim upstream to make it happen.
> As a side note: IMO the capability to reprogram an FPGA onboard is
> built in when your needs change with time, not to fix some stupid UART
> receiver.
Well, time has marched on, and your needs have certainly changed.

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com