I2C troubleshooting

Started by Patrick March 8, 2005
Hi,

I'm developing an I2C slave device based on an MSP430F169.
Another 'F169 is the master (for testing) and every now and then, the
communication breaks down.

The master's application is derived from TI app note slaa208
"Interfacing an EEPROM to the MSP430 I2C Module".

I'm implementing an I2C bootloader for the slave device, and the
control flow goes like this:

1a) Master does "ack-polling" to see if the slave is there.
(It puts a start-condition on the bus, followed by the slave address,
followed by a stop condition, until it sees ACK from the slave)
1b) Slave I2C module HW sees its address byte and sends ACK

2a) Master reads slave's status register.
2b) Slave writes its status
Master repeats until the register reads "OK, Slave ready"

3a) Master sends the control code for "write flash block"
3b) Slave enters "write flash block" routine

4a) Master sends a single line of the HEX file
4b) Slave writes data to flash and leaves "write flash block" routine

5a) Master jumps to 1) if it's not finished
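
The ack-polling in step 1 can be sketched with a retry limit, so that a slave which never ACKs cannot hang the master forever. This is only a minimal sketch in C: `probe_fn`, `ack_poll` and the fake slave are hypothetical names made up for illustration (they are not from slaa208), and on real hardware the probe would issue START + address + STOP through the I2C module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: send START + address + STOP, return true on ACK. */
typedef bool (*probe_fn)(uint8_t addr, void *ctx);

/* Poll until the slave ACKs or max_tries is used up.
 * Returns the number of attempts made on success, or -1 on timeout. */
int ack_poll(probe_fn probe, void *ctx, uint8_t addr, int max_tries)
{
    assert(probe != NULL && max_tries > 0);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (probe(addr, ctx))
            return attempt;
    }
    return -1;
}

/* Host-side stand-in for a slave that is busy for a few polls. */
struct fake_slave { uint8_t addr; int busy_polls; };

bool fake_probe(uint8_t addr, void *ctx)
{
    struct fake_slave *s = ctx;
    if (addr != s->addr)
        return false;
    if (s->busy_polls > 0) {
        s->busy_polls--;
        return false;
    }
    return true;
}
```

With a limit in place, a -1 result gives the master a defined point at which to report the error or re-initialise its own I2C module instead of polling forever.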

My problem is that the ack-polling sometimes doesn't work. It is not
exactly reproducible, and it seems to depend on things like room
temperature, air moisture, moon phase, etc.
And this error doesn't happen if the master waits a LOOONG time
(almost 100 ms) between steps 5) and 1). (But this is not desirable at
all, and I want to know the real reason for this problem).

When the ack-polling fails, the master sends the correct sequence on
the bus, but the slave doesn't ACK, and this goes on forever.
Ok, it must be the slave, I thought. But when I reset the _master_,
then everything is OK again.

Does this sound familiar to anyone? I don't really know how to further
debug this stuff. I have normal pullups (5K ohms) and if there isn't
lots of traffic, then everything's OK.

Do I have to resort to s/w bit banging? The 'F169 errata sheet doesn't
mention any bugs in the I2C module, IIRC.

Any ideas/suggestions are very welcome.

TIA,
Patrick
On Tue, 08 Mar 2005 19:58:00 +0100, Patrick wrote:

> When the ack-polling fails, the master sends the correct sequence on
> the bus, but the slave doesn't ACK, and this goes on forever.

I2C has a mode where you can stretch the clock to accommodate slower devices. IIRC the slave holds off the bus by stretching the low period of SCL. There is a timeout feature to prevent hanging up the bus indefinitely. Don't know if that's any help.

Bob
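
For a software master, honoring clock stretching means reading SCL back after releasing it and waiting until the line really is high. A minimal sketch in C, assuming a hypothetical `read_scl` pin hook; the poll limit is an arbitrary application choice, not a value taken from any spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: returns true if SCL currently reads high. */
typedef bool (*read_scl_fn)(void *ctx);

/* After the master releases SCL, wait until the line really is high.
 * Returns true once SCL is high, false if max_polls reads all saw it low. */
bool wait_scl_high(read_scl_fn read_scl, void *ctx, unsigned max_polls)
{
    assert(read_scl != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_scl(ctx))
            return true;
    }
    return false;
}

/* Host-side stand-in: the slave holds SCL low for `stretch` reads. */
struct fake_scl { unsigned stretch; };

bool fake_read_scl(void *ctx)
{
    struct fake_scl *f = ctx;
    if (f->stretch > 0) {
        f->stretch--;
        return false;
    }
    return true;
}
```

A false return tells the caller that the slave is still holding the clock, so the transfer can be aborted cleanly rather than clocked on blindly.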
Hi,

Bob Stephens wrote:
> IIRC the slave holds off the bus by stretching the low period of SCL.

Yes, IIRC, that's the case when the slave has seen its address but user code cannot receive / transmit immediately. Then the on-chip I2C module pulls SCL low until user code reads / writes the data register. But my problem is that the slave doesn't even ACK its address.

> There is a timeout feature to prevent hanging up the bus indefinitely.

My bus is idle (both SDA and SCL are high).

But thanks anyway,
Patrick
Patrick wrote:

> My problem is that the ack-polling sometimes doesn't work. It is not
> exactly reproducible, and it seems to depend on things like room
> temperature, air moisture, moon phase, etc.

Any idea of the failure rate itself, in terms of completed i2c bus packets? Sounds like this is a rare failure (<0.1%), which rather rules out fundamental HW flaws.

Can you try sending multiple STOP pulses? Because i2c is a state engine system, it is possible for slaves to not terminate/exit correctly (which is why the last byte in a series is supposed to not have ACK). If that occurs, the start arrives on a slave system that is not fully transaction-complete; of course, in ideally designed HW, the Start should override all others, but few devices are ideal.

You will see a number of the newer i2c RTC devices now come with a RESET pin, and that will have been added because system reliability demanded it....

The time delay for recovery you see might be an internal i2c watchdog?

-jg
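
One common way to act on this kind of suggestion is a bus-recovery sequence run with the master's pins under software control: pulse SCL until the slave lets go of SDA (at most nine clocks), then generate a STOP. This is a widely used idiom and not something taken from the thread; the pin hooks and the fake bus below are hypothetical, and the delays between edges are omitted. A minimal sketch in C:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical open-drain pin hooks: set(..., true) releases the line. */
struct i2c_pins {
    void (*set_scl)(void *ctx, bool release);
    void (*set_sda)(void *ctx, bool release);
    bool (*read_sda)(void *ctx);
    void *ctx;
};

/* Clock SCL until SDA reads high (max 9 pulses), then send a STOP.
 * Returns the number of pulses used, or -1 if SDA is still stuck low. */
int i2c_bus_recover(const struct i2c_pins *p)
{
    assert(p != NULL);
    int pulses = 0;
    p->set_sda(p->ctx, true);
    while (!p->read_sda(p->ctx) && pulses < 9) {
        p->set_scl(p->ctx, false);
        p->set_scl(p->ctx, true);
        pulses++;
    }
    if (!p->read_sda(p->ctx))
        return -1;
    /* STOP: SDA rises while SCL is high. */
    p->set_scl(p->ctx, false);
    p->set_sda(p->ctx, false);
    p->set_scl(p->ctx, true);
    p->set_sda(p->ctx, true);
    return pulses;
}

/* Host-side stand-in: a slave that holds SDA low until it has seen
 * `clocks_needed` rising edges on SCL. */
struct fake_bus { int clocks_needed; bool scl; bool master_sda; int stops; };

void fake_set_scl(void *ctx, bool release)
{
    struct fake_bus *b = ctx;
    if (release && !b->scl && b->clocks_needed > 0)
        b->clocks_needed--;
    b->scl = release;
}

void fake_set_sda(void *ctx, bool release)
{
    struct fake_bus *b = ctx;
    if (release && !b->master_sda && b->scl)
        b->stops++;              /* rising SDA while SCL is high = STOP */
    b->master_sda = release;
}

bool fake_read_sda(void *ctx)
{
    struct fake_bus *b = ctx;
    return b->master_sda && b->clocks_needed == 0;
}
```

If the bus is already idle, with both lines high, the loop issues no clock pulses and the routine reduces to a single extra STOP.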
Hi,

Jim Granville wrote:
> Any idea of the failure rate itself, in terms of completed i2c bus
> packets?

Not really. It can happen after a few packets, or after 1000 packets.

> Can you try sending multiple STOP pulses? Because i2c is a state
> engine system, it is possible for slaves to not terminate/exit correctly
> (which is why the last byte in a series is supposed to not have ACK).
> If that occurs, the start arrives on a slave system that is not fully
> transaction-complete; of course, in ideally designed HW, the Start
> should override all others, but few devices are ideal.

That's interesting. But it still puzzles me that the slave works again when I reset the _master_. I wonder how the slave can notice that, given that the bus is idle and doesn't change its state when I restart the master.

> You will see a number of the newer i2c RTC devices now come with a
> RESET pin, and that will have been added because system reliability
> demanded it....

Interesting again...

> The time delay for recovery you see might be an internal i2c watchdog?

I don't think so. When I comment out the flash writing stuff, the slave is ready immediately after it receives the data. (And then my problem doesn't occur.) It looks like the delay that has to be made by the master is directly correlated to the amount of time needed by the slave to complete its write operation. It's as if the system couldn't _always_ handle the situation where the slave doesn't respond immediately.

I will try the multiple STOP thing anyway.

Thanks,
Patrick
Patrick wrote:
> It looks like the delay that has to be made by the master is directly
> correlated to the amount of time needed by the slave to complete its
> write operation.
>
> I will try the multiple STOP thing anyway.

This new info is sounding more like an i2c state phase problem, i.e. if the slave is caught in various levels of too busy, the slave i2c is expected to wait somewhere along the sequence, and sometimes you hit the jackpot.....

You do send a stop after each failed start + address to the slave?

Can you buffer the info, so the write operation does not interact with the i2c timing?

-jg
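
The buffering idea can be sketched as follows: the I2C receive path only copies the HEX line into RAM and marks the slave busy, and the main loop does the slow flash write later. A minimal sketch in C; all names here are hypothetical, and the flash writer is a stub because the real one is device-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LINE_MAX_BYTES = 64 };
enum slave_status { STATUS_READY = 0, STATUS_BUSY = 1 };

struct line_buffer {
    uint8_t data[LINE_MAX_BYTES];
    size_t len;
    volatile enum slave_status status;  /* what the master reads in step 2 */
};

/* Called from the I2C receive path: only copies to RAM, never touches flash.
 * Returns false if a write is still pending or the line is too long. */
bool slave_accept_line(struct line_buffer *lb, const uint8_t *src, size_t len)
{
    assert(lb != NULL && src != NULL);
    if (lb->status == STATUS_BUSY || len > LINE_MAX_BYTES)
        return false;
    memcpy(lb->data, src, len);
    lb->len = len;
    lb->status = STATUS_BUSY;
    return true;
}

/* Hypothetical flash writer hook. */
typedef void (*flash_write_fn)(const uint8_t *data, size_t len, void *ctx);

/* Called from the main loop: does the slow write, then reports ready. */
void slave_service(struct line_buffer *lb, flash_write_fn do_write, void *ctx)
{
    assert(lb != NULL && do_write != NULL);
    if (lb->status != STATUS_BUSY)
        return;
    do_write(lb->data, lb->len, ctx);
    lb->status = STATUS_READY;
}

/* Host-side stand-in that just counts the bytes "written". */
void fake_flash_write(const uint8_t *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}
```

The point of the split is that the I2C side always finishes quickly, while the master learns about the pending write only through the status register it already polls.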
Patrick wrote:
> Like the system couldn't _always_ handle the situation that the slave
> doesn't respond immediately.

Is there a hardware clock clamp in the slave? It should clamp the clock as soon as the slave is not able to catch the current bit.

--
Tauno Voipio
tauno voipio (at) iki fi
Just FYI:

The MSP430F16x's I2C hardware controller turned out to be buggy (some
other people said they had problems as well). I replaced the rx and tx
routines on the master side with software-controlled bit banging, and
suddenly the problem was gone. I'm still using the I2C HW module on the
slave side, but that seems to be OK.

Regards,
Patrick
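
For readers wondering what a software master looks like, here is a minimal bit-banging sketch in C. It is not Patrick's code: the pin hooks and the fake bus are hypothetical, timing delays are omitted, and clock stretching is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical open-drain pin hooks: set(..., true) releases the line. */
struct bb_pins {
    void (*set_scl)(void *ctx, bool release);
    void (*set_sda)(void *ctx, bool release);
    bool (*read_sda)(void *ctx);
    void *ctx;
};

void bb_start(const struct bb_pins *p)
{
    p->set_sda(p->ctx, true);
    p->set_scl(p->ctx, true);
    p->set_sda(p->ctx, false);   /* SDA falls while SCL is high */
    p->set_scl(p->ctx, false);
}

void bb_stop(const struct bb_pins *p)
{
    p->set_sda(p->ctx, false);
    p->set_scl(p->ctx, true);
    p->set_sda(p->ctx, true);    /* SDA rises while SCL is high */
}

/* Shift out one byte MSB first; return true if the slave ACKed. */
bool bb_write_byte(const struct bb_pins *p, uint8_t byte)
{
    assert(p != NULL);
    for (int bit = 7; bit >= 0; bit--) {
        p->set_sda(p->ctx, ((byte >> bit) & 1u) != 0);
        p->set_scl(p->ctx, true);
        p->set_scl(p->ctx, false);
    }
    p->set_sda(p->ctx, true);            /* release SDA for the ACK bit */
    p->set_scl(p->ctx, true);
    bool acked = !p->read_sda(p->ctx);   /* ACK = slave pulls SDA low */
    p->set_scl(p->ctx, false);
    return acked;
}

/* Host-side stand-in: records SDA at every SCL rising edge. For simplicity
 * a slave that ACKs is modelled as pulling SDA low whenever it is read. */
struct bb_fake { bool scl, sda; uint32_t shifted; int edges; bool slave_acks; };

void bbf_set_scl(void *ctx, bool release)
{
    struct bb_fake *f = ctx;
    if (release && !f->scl) {
        f->shifted = (f->shifted << 1) | (f->sda ? 1u : 0u);
        f->edges++;
    }
    f->scl = release;
}

void bbf_set_sda(void *ctx, bool release)
{
    ((struct bb_fake *)ctx)->sda = release;
}

bool bbf_read_sda(void *ctx)
{
    struct bb_fake *f = ctx;
    return f->sda && !f->slave_acks;
}
```

An ack-poll then becomes `bb_start`, `bb_write_byte` with the address byte, and `bb_stop`, with the return value of `bb_write_byte` telling the master whether the slave answered.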
Patrick wrote:

> Just FYI:
>
> The MSP430F16x's I2c hardware controller turned out to be buggy (some
> other people said they had problems as well). I replaced the rx and tx
> routines on the master side with software-controlled bit banging, and
> suddenly the problem was gone. I'm still using the I2c HW module on the
> slave side, but that seems to be OK.

Interesting, (and not the first time). I remember years ago getting customers to change from the 87C552 i2c HW to SW masters, and getting smaller code and more reliable operation...
Bob Stephens <stephensyomamadigital@earthlink.net> wrote:

> I2C has a mode where you can stretch the clock to accommodate slower
> devices. IIRC the slave holds off the bus by stretching the low period of
> SCL. There is a timeout feature to prevent hanging up the bus indefinitely.

Several non-exhaustive examinations (i.e. skims) of Philips' I2C spec (version 2.0, December 1998) do not indicate what that timeout is. Where and how is the timeout quantified?

--
Dan Henry