Forums

The SPI interface reads the same value from the A/D. Why is it stuck?

Started by Eliza 2 months ago18 replieslatest reply 1 month ago56 views

I have a pressure sensor connected through the SPI interface of a Microchip controller, p33FJ128MC706A. The sensor is read at 125 Hz driven by a timer interrupt and data stored in a buffer. Occasionally, the pressure data is stuck at one value, the entire buffer having the same value. This problem seems to have started occurring and I am trying to find out if it is caused by a "weaker" batch of component or is there a hole in my code.

This is the code setting up the SPI

void SetupSPI()
{ 
    uint16 SPICON1Value; 
    uint16 SPICON2Value; // Set pin directions 
    _TRISG6 = 0;    //SCK1 - spi clock    
    _TRISG7 = 1;    //SDI1 - spi input 
    TRISG8 = 0;    //SDO1 - spi output //CloseSPI    
    // PIC errata requires interrupt protect when disabling interrupts.    
    INTERRUPT_PROTECT(_SPI2IE = 0); 
    // Disable the Interrupt bit in the Interrupt Enable Control Register    
    SPI2STATbits.SPIEN = 0;   
    // Disable the module. All pins controlled by PORT Functions 
    //Configure SPI2 module    
    _SPI2IP = ISP_INT_IPL;    // Interrupt priority 
    //Open SPI1   
    SPICON2Value =  FRAME_ENABLE_OFF & FRAME_SYNC_OUTPUT & FRAME_POL_ACTIVE_LOW & FRAME_SYNC_EDGE_COINCIDE; 
    SPICON1Value =  ENABLE_SCK_PIN &                    
        DISABLE_SDO_PIN & SPI_MODE16_ON &                    
        SPI_SMP_ON & SPI_CKE_OFF &                    
        SLAVE_ENABLE_OFF &                     
        CLK_POL_ACTIVE_LOW &                    
        MASTER_ENABLE_ON &                    
        SEC_PRESCAL_2_1 &                    
        PRI_PRESCAL_4_1; //spi clock freq 4.9536 MHz 
    SPI2CON1 = SPICON1Value;  
    SPI2CON2 = SPICON2Value; 
    SPI2STAT = SPI_DISABLE & SPI_IDLE_CON & SPI_RX_OVFLOW_CLR; 
    _SPI2IF = 0;    // Clear the Interrupt flag bit in the Interrupt Flag Control Register  
    _SPI2IE = 1;    //Enable interrupt (the SPI is disabled, it will be enabled when ready to take readings)

}

This is the code where the SPI conversion is executed

void StartConversion()
{
     if( SPI2STATbits.SPIROV == 1 )
     { 
        SPI2STATbits.SPIROV = 0;
        SPIROVResetCount++;
     }
        
     _RG8 = 1;                 // Start A/D conversion
     DelayUsPrecise( 4 );  //  tCONV time is max 3.2us
     _RG8 = 0;                //  enables the A/D converter device's SDO pin ( SDI pin for this microcontroller)

     DelayUsPrecise( 1 );  // this delay helps (found through experimentation)
     SPI2BUF = 0x0000;        //  starts data transfer from the A/D converter
}

This is where the pressure data is stored (after averaging 16 individual reads)

uint16 ReadPressure()
{ 
     int i;
     A2Dsum = 0;
     Pressure.Data = SPI2BUF;        //this will avoid generating buffer overrun SPI error
     SPI2STATbits.SPIEN = 1;         //SPI module enabled, so SPI interrupts will start coming in
     
    //normally, SPI interrupt is used to read sensor continuously
     //but here we just want one burst of sensor read without interrupt
     INTERRUPT_PROTECT(_SPI2IE = 0); // Disable the Interrupt bit in the Interrupt Enable Control Register
 
     for( i = 0; i < A2DBUF_SIZE; i++ )
     {
        SPI2STATbits.SPIROV = 0;          // Clear SPI2 receive overflow just in case (otherwise it may stop further data transfer)
        StartConversion();
        while( !SPI2STATbits.SPIRBF );
        A2DBuf[i] = SPI2BUF;
        A2Dsum += SPI2BUF;
     }
     Pressure.Data = (uint16)( (A2Dsum+(A2DBUF_SIZE>>1)) >> IDX_BITSIZE );
     SPI2STATbits.SPIEN = 0;            //disable SPI module
     _SPI2IE = 1;                    //leave the interrupt setting as it was before entering this function
 
     return Pressure.Data;
}


Thank you for looking at my code and eventually finding a potential cause for my problem.

Also, if you ran into similar problems where a component would get stuck and you figured it out, would be a great help to learn about it.

Thanks,

Elizabeth


[ - ]
Reply by KocsonyaJuly 9, 2019

As mr_bandit suggested, a scope would definitely help.

An other thing that you might consider is to check whether after StartConversion() the SPI becomes busy or not, i.e. SPIRBF goes away at all. If the SPI is stuck for some reason, then it will simply ignore your start command and the result is that it always indicates 'done' and the same value in the buffer.

Since the SPI transfer is a lot slower than the CPU, you could do something like this:

SpiStartConversion();
while ( SPI2STATbits.SPIRBF );
while ( ! SPI2STATbits.SPIRBF );

if the SPI is stuck, so will be your code, so you will get a good indication that the thing is dead.


[ - ]
Reply by ElizaJuly 9, 2019

I like the idea on how to sniff the problem. Thanks Kocsonya! Elizabeth (Erzsebet)

[ - ]
Reply by MichaelKellettJuly 10, 2019

What pressure sensor are you using ?


MK

[ - ]
Reply by ElizaJuly 10, 2019
SDX01G2 - differential pressure sensor

AD620AR - amplifier
LTC1864 - converter


Here are the components from the sensor up to the pins of the microcontroller

pressuresensor and converter_82724.png

[ - ]
Reply by MichaelKellettJuly 10, 2019

Nothing stands out as bad as far as I can see. I have to tell you that I almost never get a serial interface to work correctly first time round and almost always end up connecting a scope. 

When you get the buffer full of the same value is it a repetition of a valid value or something else (like 0000 or ffff).


[ - ]
Reply by ElizaJuly 10, 2019

It is valid. As if the last value from the previous operation is stuck in there. The system runs hundreds of such operations, intermittently choking on this bad data.

Indeed, during development you find the quirks and irks of the components you are using, and I have gone through several versions of code, until it looked solid and tested the sauerkraut out of it. The last main improvement of the code was where I had a burst of 16 reads and averaged them for one buffer-worthy data. The code passed the development phase five years ago, but recently more and more customers experience this "flat-line" problem. My task here is to see if we can ask the board vendor to check on their process, batches etc, or I must upgrade the firmware with a more ingenious code.


[ - ]
Reply by MichaelKellettJuly 11, 2019

You've definitely hit scope territory - set the board up on the bench in a simulation and see if you can replicate the problem. Use a board that has had problems in the field. If nothing goes wrong then try to cause the problem by messing with the power supply, inducing spikes on power supplies etc.

You could add some monitoring to your code, detect bad behavior by spotting all 16 readings the same - toggle a pin to tell you it's gone wrong.

You can't expect to fix this without catching it going wrong - since you have made lots that work it's going to be some horrible edge case kind of thing.

If you can get it to go wrong on the bench it will be much easier to fix.

Does it start working again spontaneosuly ?


[ - ]
Reply by mr_banditJuly 10, 2019

Kocsonya has a good point about the busy bit. You might put something in the loop (pseudo code):

while ( SPI2STATbits.SPIRBF ) { print "." }
while ( ! SPI2STATbits.SPIRBF ) { print "-" }

I always try to use a simple device - RS232, SPI or I2C - when creating a device driver. For example, a simple temperature sensor with I2C:

https://www.sparkfun.com/products/13314

I used this on a professional project where there was a complex I2C chain. A quick and easy way to tell if the chain was working or not.

Adafruit has the https://www.adafruit.com/product/2651 BMP280 temperature sensor that is both I2C and SPI. $10

For RS232 or others, I will write a little program to emit or read data on an Arduino using the Arduino library. For example (pseudo code)

loop()
{
    if( Serial.available )
      { int c = Wire.read();
        Serial.print( "got " );
        Serial.println( c, HEX );
       }
}

Scope are pretty easy to learn these days. The one I mentioned for under $400 is a good one. I will not tell you of the hack on he net that will convert from 50MHz to 100MHz...

Not sure where you are, but there should be a hacker/makerspace near you - google "makerspaces near me". There are folks there who will walk you through how to use oscopes - they should have a couple, including the $375 one I suggested.

They should also be able to help you with the bus pirate - a low-cost version. It use the hardware to get the data, then a program in a PC does the display and control. Pretty nifty.

Please let us know how you are doing!

[ - ]
Reply by ElizaJuly 10, 2019

The print() will not work in my system, it has no peripheral and if indeed the code gets stuck in this loop, there's no way to send commands to the firmware, I can't even write to the EEPROM leaving some breadcrumb behind about what happened.

Thanks for the many ideas. And dealing with the scope, I agree with you, I got to learn it. I had an EE colleague who did all this for me, but he left.

But, hooking up the scope is a project in itself, this is a refrigerator size machine (in production for more than five years) and the board is deeply buried in it. I would need to solder some fine wires on a board and swap the real board with this mockup. It, honestly, makes me shiver. If I can do something genius trick in the code to find out where the glitch is, then the solution should lend itself.

[ - ]
Reply by mr_banditJuly 10, 2019

A trick is to find an unused pin and wiggle it.

Can you get a board by itself? You imply this is something y'all make. How are you able to make changes to the firmware?

Microchip sells eval boards - see https://www.microchip.com/wwwproducts/en/dsPIC33FJ128MC706A

They range in price. The goal is to get a demo board and add your sensor to it. You would want a board with the exact same chip and clock speed. This way you can debug on a bench.

Where is this pressure sensor in the box? By the buried MPU board, or near the surface? How is it connected? Can you access the sensor cable?

You are really talking to the ADC. It doesn't really matter what the sensor is. Is this a completely new circuit (all three chips)? Can you get to the cable to the ADC?

We need a bit more info to be able to help guide you. The key is to simplify, simplify, simplify...

Yours .. bandit

[ - ]
Reply by mr_banditJuly 18, 2019

what is your status? how are you doing?

[ - ]
Reply by ElizaJuly 30, 2019

I got back on this project. And this is what I did.

I tried the trap with the busy bit 

while ( SPI2STATbits.SPIRBF ) { print "." }
while ( ! SPI2STATbits.SPIRBF ) { print "-" }

but the logic did not hang in the loop. 


The next thing I tried, and I am still testing, is to increase the wait time before data transfer from 1 to 2 us.

     DelayUsPrecise( 2 );  // this delay helps (found through experimentation)
     SPI2BUF = 0x0000;        //  starts data transfer from the A/D converter

The run that usually failed on this board passed so far, and now I am extending the test.

Elizabeth

[ - ]
Reply by mr_banditJuly 30, 2019

Look carefully at the definition of the SPIRBF bit. I have seen cases where a status bit's definition is deceptively defined && it doesn't work the way you would think.

A delay is a time-honored method. Just make sure the delay is *always* .GT. the worst case...


You are making progress!

[ - ]
Reply by ElizaJuly 31, 2019

I read through the data sheet of the converter and I found that the max conversion time is 4.6 us.

https://www.analog.com/media/en/technical-documentation/data-sheets/18645lfs.pdf

In my code, the wait time is 4 us. I am running a test that waits 5 us.

I will let you know how this goes. If this is the culprit of my problem, then I want to understand why this happens only at the beginning of the reading sequence. The entire data from the sequence is of one value. occasionally, it should happen in the middle of it too.

Thank you all for your help,

Elizabeth

[ - ]
Reply by mr_banditJuly 31, 2019

Not unheard of that the first i/o takes a bit longer because of setup. But it seems you have solved the basic problem.

It also may be that the only way to determine why this max convert only happens at the start is to put a scope on the PI bus && assert a line in the ISR. Since that is non-trivial for you, you may never know - although I applaud you for really wanting to grok the issue - something a fair number of programmers don't care about.

[ - ]
Reply by mr_banditJuly 9, 2019

The best thing to do is use an oscope to look at the SPI lines to determine where the problem is. You can then determine which end is getting confused.

If you don't want to pop for the $375 for the oscope, try bus pirate for $40

http://dangerousprototypes.com/blog/bus-pirate-manual/bus-pirate-spi-guide/

https://www.seeedstudio.com/Bus-Pirate-v4.html

https://www.seeedstudio.com/Bus-Pirate-v4-Acrylic-case-kit-v1-p-1417.html

oscope: https://www.tequipment.net/Rigol/DS1054Z/Digital-Oscilloscopes/?v=0


[ - ]
Reply by ElizaJuly 9, 2019

Thanks for a prompt reply. Sorry to admit, but I am not versed in using scopes. I think I will try Kocsonya's idea (below). I have better access to the code.

Elizabeth

[ - ]
Reply by ElizaAugust 9, 2019

I want to report to all my helpers what I have found.

In my code there is a silly bug. That is not in the part that I shared with you. 

The bug is that in file1 there is a global uint16 var defined and initialized with 0 in func1() in file1. Then in file2 the global variable is declared as extern of type uint32 and being incremented in func2() in file2, where func2() is the interrupt handler that gets called when the spi data got shifted into SPIxBUF. I did not display func2() on this forum, so you wouldn't have picked up this problem...

If I correct the code then the constant-values-problem does not exhibit any more, yet I cannot explain why the code change fixes the problem. Even if the variables in that memory region get corrupted, they do not drive any logic decision, so ultimately their values should not affect the outcome. The timing however changes with the code fix because I am incrementing a 16-bit memory location, instead of a 32-bit one.

So, the problem is possibly still there, it just got shifted away because of timing change. As I said, this is an interrupt handler and it can jitter.

I found other errors in the code, but not significant to be the culprits. But I thought that it should be shared.

1. The pin that is toggled to start and stop the A/D conversion should be written to by the LATCH register and not by the PORT register. As such, I could take out the 1us delay after conversion finished. As you can see in the comment in that line, I did not quite understand why I needed it. But after diving deep in every line, it finally stood out. The new code is now this.

_LATG8 = 1;          //incorrect is to use _RG8
DelayUsPrecise( 4 ); //  tCONV time is max 3.2us
_LATG8 = 0;

2. The SPI interrupt flag should be cleared before starting the reading sequence.

Soooo... I keep digging until I understand the problem and hope to share with you my eureka.

Elizabeth