EmbeddedRelated.com
Forums

optical debugger tools or some state machine deadlocks

Started by janka vietzen December 13, 2006
For the last 2 days I was at a customer's machine searching for bugs without
any success. The CPU is a Freescale 6808 and the code is written in assembly
(by me, about 5 years ago). The machine has now been in production for about
1.5 years. In that time about 10 million pieces have been produced with it.

The upper unit consists of a turntable with 2 grabbers mounted opposite each
other. The lower unit consists of a shuttle for feeding material and another
shuttle that takes away the waste. Only one of those 2 shuttles can be in
the grabber position. One grabber is always in shuttle position and the
opposite grabber is in workpiece position. Every workpiece needs one grabber
cycle to put material on it. In the process some waste accumulates on the
grabber's tools, which has to be flushed after a preselected number of
workpieces.

The SW design consists mainly of 4 state machines: one state machine for both
shuttles, because only one of them can be in grabber position; one main
machine-cycle state machine for the turntable; and 2 more for the two
grabbers. The grabbers work simultaneously. The grabber at the workpiece puts
material onto the workpiece while the opposite grabber picks up new material
from the feed shuttle. While a grabber is in action, the turntable must not
turn, to avoid a crash. In case of a cleaning cycle, the lower grabber first
flushes the waste to the cleaning shuttle, then the shuttles change places and
the same grabber picks up new material and tilts up to the workpiece.

The rules seem simple but they are not. The main state machine writes commands
to the lower state machine. There is a material request command from the
upper grabber to the shuttle state machine. That is a counter that contains
0, 1 or 2, because up to 2 grabbers can be without material at the same time.
There is also a cleaning request from the upper grabber to the shuttle, so
that the cleaning shuttle can already be in position by the time the dirty
grabber has been tilted down by the turntable for cleaning. This is only a
flag, because the two grabbers must not clean in consecutive cycles. That
would cause too much delay, because a grabber sequence of waste flush plus
material pickup takes much longer than a single material release. On average
the material shuttle is designed to be slightly faster than the external
conveyor can exchange the next workpiece. This leaves a one-cycle time window
in which one cleaning cycle can be inserted without too much delay to the
workpiece transport conveyor.

The SW now has 2 different bugs (maybe with the same cause). One bug is that
occasionally (about once an hour) one grabber makes an accidental cleaning
move to the cleaning shuttle after (or before) the correct grabber is cleaned.
The other bug I found is a deadlock. The material feed shuttle waits for the
grabber to pick up the material, the grabber above waits for the cleaning
shuttle so it can release its waste, and the cleaning shuttle cannot move into
the position where the feed shuttle is waiting.

I have now observed the machine for 2 days without a definite result, which is
driving me crazy. The time for one workpiece is about 1-2 seconds and most of
the time there are 2 or 3 concurrent moves. Unfortunately I have no idea how
to go on. I made a display that can show the state numbers in real time.
Probably a digital camera system that records the machine moves and the state
number display would be ideal. A recording time of one minute is more than
enough to replay and examine what happened before a bug. A frame rate of 20
frames per second also seems OK. Any ideas (or maybe experience) whether that
is possible with a USB camera? I hardly see a chance to trace this with a
logic analyzer.


Does not sound like a camera issue. A USB storage scope with digital channels is more what you want, preferably one with zooming digital storage (edge based).

Try and get a handle on the failure duty %. Yes, it is low, but the value can give clues.

Commonly, low failure rates are aperture effects, where some tiny window exists that allows the oops to occur.

Deadlock should be easy to resolve - you should already have code that recovers gracefully from unforeseen states, right?

-jg
> Does not sound like a camera issue. A USB storage scope, with Digital
> channels is more what you want, preferably one with zooming digital
> storage (edge based)

That seems rather complicated to connect, and it cannot record the internal state numbers shown on the screen.

> Try and get a handle on the failure duty %, yes it is low, but the
> value can give clues.

The failure rate is approximately 1 cycle in 3000.

> Commonly low failure rates are aperture effects, where some tiny window
> exists that allows the oops to occur.
> Deadlock should be easy to resolve - you should already have code that
> recovers gracefully from unforeseen states, right ?
A deadlock occurs about every 8 hours. The operators now have the instruction to write the displayed states into a table for every deadlock. Let's hope they do not differ too much.

A dark corner in the state machine design is my grabber state machines. They consist of only 3 states: idle, moving to front and moving back. Additionally I introduced a flag that indicates whether the grabber is with or without material. Later the cleaning was introduced and I added another clean/dirty flag for each grabber. Those flags represent "hidden" states that are difficult to trace, and one has to take care of the proper set-reset sequence. Probably I have to redesign the state machine for the grabbers with many more states: moving to front without material, moving back with material, moving to front without material but clean, moving to front dirty, moving back cleaned and so on. That seems an easier way to make sure the clean - pickup sequence is always in order, and it eliminates bugs in the set and reset of single flags ...
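
For illustration only, here is a minimal sketch in C of what "more states instead of flags" could look like (the real code is 6808 assembly). All state and event names are hypothetical, and the transition list is an assumption derived from the description above, not the actual machine logic. The point is that a pickup while dirty is simply not a legal transition, so it lands in a fault state instead of happening silently.

```c
/* Hypothetical explicit grabber states: each former "motion state +
   material flag + clean/dirty flag" combination gets its own name. */
typedef enum {
    G_IDLE_EMPTY,     /* at rest, clean, no material */
    G_IDLE_DIRTY,     /* at rest, waste on the tool, no material */
    G_IDLE_LOADED,    /* at rest, holding material */
    G_FWD_PICKUP,     /* moving to front, clean, to pick up material */
    G_BACK_LOADED,    /* moving back with material */
    G_FWD_FLUSH,      /* moving to front dirty, to flush waste */
    G_BACK_CLEANED,   /* moving back cleaned */
    G_FWD_PLACE,      /* moving to front with material, to place it */
    G_BACK_EMPTY,     /* moving back after placing material */
    G_FAULT           /* illegal event for the current state */
} grabber_state_t;

/* Hypothetical events: commands from other state machines and
   end-position signals. */
typedef enum {
    EV_GO_PICKUP, EV_GO_FLUSH, EV_GO_PLACE,
    EV_MARK_DIRTY, EV_AT_FRONT, EV_AT_BACK
} grabber_event_t;

/* Return the next state; any combination not listed is a fault. */
grabber_state_t grabber_next(grabber_state_t s, grabber_event_t e)
{
    switch (s) {
    case G_IDLE_EMPTY:
        if (e == EV_GO_PICKUP)  return G_FWD_PICKUP;
        if (e == EV_MARK_DIRTY) return G_IDLE_DIRTY;
        break;
    case G_IDLE_DIRTY:
        if (e == EV_GO_FLUSH) return G_FWD_FLUSH;
        break;
    case G_IDLE_LOADED:
        if (e == EV_GO_PLACE) return G_FWD_PLACE;
        break;
    case G_FWD_PICKUP:
        if (e == EV_AT_FRONT) return G_BACK_LOADED;
        break;
    case G_BACK_LOADED:
        if (e == EV_AT_BACK) return G_IDLE_LOADED;
        break;
    case G_FWD_FLUSH:
        if (e == EV_AT_FRONT) return G_BACK_CLEANED;
        break;
    case G_BACK_CLEANED:
        if (e == EV_AT_BACK) return G_IDLE_EMPTY;
        break;
    case G_FWD_PLACE:
        if (e == EV_AT_FRONT) return G_BACK_EMPTY;
        break;
    case G_BACK_EMPTY:
        if (e == EV_AT_BACK) return G_IDLE_EMPTY;
        break;
    default:
        break;
    }
    return G_FAULT;
}
```

With this shape, `grabber_next(G_IDLE_DIRTY, EV_GO_PICKUP)` yields `G_FAULT`, which is also a convenient place to hang a trace output or a recovery routine.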
> seems to be rather complicated to connect and cannot record the internal
> state numbers displayed on screen.
Why not send a log line out serially via an I/O pin each time any of the state machines changes state?

I put TRACE() macros in every state machine I write and disable them at the end. Connect a PC with a serial terminal, log to disk, and you can probably get a lot of insight into what's going on.

Even without a UART, you can bit-bang data in software (UART tx is easy, just a loop and a calibrated delay). You can output data very fast, so it should not disrupt your main timings...

If you have enough I/O to display a number on a screen, you should also be able to send those numbers serially. You can send all the state machine states and the input/output states in binary on each status change.
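
A rough sketch of the bit-bang idea in C, assuming the usual 8N1 framing (one start bit, 8 data bits LSB first, one stop bit). The pin-write and delay hooks are hypothetical and passed in as function pointers here; on the real target they would be a port write and a calibrated delay loop.

```c
#include <stdint.h>

/* Hypothetical hardware hooks: drive the trace pin, wait one bit time. */
typedef void (*trace_pin_fn)(int level);
typedef void (*trace_delay_fn)(void);

/* Build a 10-bit 8N1 frame, transmitted from bit 0 upward:
   bit 0 = start bit (0), bits 1..8 = data LSB first, bit 9 = stop (1). */
uint16_t uart_frame(uint8_t byte)
{
    return (uint16_t)((1u << 9) | ((uint16_t)byte << 1));
}

/* Shift one byte out on the pin. The line is left high (idle) after
   the stop bit. Interrupts may need to be masked around this loop so
   the bit timing is not stretched. */
void trace_send(uint8_t byte, trace_pin_fn set_pin, trace_delay_fn wait_bit)
{
    uint16_t frame = uart_frame(byte);
    int i;

    for (i = 0; i < 10; i++) {
        set_pin((int)(frame & 1u));
        wait_bit();
        frame >>= 1;
    }
}
```

At 115200 baud one byte takes roughly 87 microseconds, so whether this disturbs the main timing depends on how many bytes are sent per state change.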
Yes, it sounds like pencil and paper are the best tools - in your hands, not the operators' :)

You need to design a state control and handshake system that can cope gracefully with inputs at unexpected times, and that self-checks for recovery from any illegal states.

One aperture bug can occur if you test the signals in more than one place. You should sample the pins and create a stable set of flags that are then used by the state machine.

This app sounds like a golden reference for dirty industrial signals!

-jg
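
A minimal sketch of the "sample once, use everywhere" idea in C. The two-sample debounce is an added assumption (the advice above only asks for a stable snapshot), and the names are hypothetical. The raw port is read once per main-loop pass, and every state machine then looks only at `stable`, never at the pins.

```c
#include <stdint.h>

typedef struct {
    uint8_t stable;    /* snapshot that all state machines read */
    uint8_t last_raw;  /* raw port value from the previous pass */
} inputs_t;

/* Call exactly once per main-loop pass with the raw port value.
   A bit is copied into the stable snapshot only if it read the same
   in two consecutive samples; otherwise the old stable bit is kept. */
uint8_t inputs_update(inputs_t *in, uint8_t raw)
{
    uint8_t same = (uint8_t)~(raw ^ in->last_raw); /* unchanged bits */

    in->stable = (uint8_t)((in->stable & ~same) | (raw & same));
    in->last_raw = raw;
    return in->stable;
}
```

Because all four state machines see the same snapshot within one pass, a signal edge can no longer be seen as "old" by one test and "new" by another a few instructions later.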
I would investigate the cleaning request. What happens to your state machine when a clean request is sent and also a material request command (with either 0, 1, or 2)? When does a grabber decide that it must be cleaned? And does it do something asynchronously that causes an undesired state change?
Hi Jürgen,

[...]

>I made a Display what can show the state numbers in realtime. Probably a
>digital camera system would be ideal what records machine moves and state
I don't think so. [...]
>camera? I hardly see a chance to trace this with a logic analyzer
Why not? Record every input and output signal along with the state information.

Use DigiView as the logic analyzer because it has a smart recording method: it can record over a long period _and_ with good time resolution using very little memory. And it is cheap. Buy it from elmicro and tell Oliver Thamm that I recommended it <g>.

Oliver
--
Oliver Betz, Muenchen (oliverbetz.de)
Antonio Pasini wrote:

> Why not sending a log line out via a I/O pin serially, each time any of
> the state machines change state ?
>
> I have TRACE() macros in each state machine I do, that I disable at the
> end.
The display screen is linked via CANopen and the screen output is completely asynchronous to the state machines. The display is much faster than what you can see, but it probably cannot show all state changes at a cycle time of about 10 ms on average. Luckily the system is the CAN master, so it should be feasible to transmit one byte with the state machine number and the current state on a separate CAN ID and then use a CAN bus trace with an ID filter. Thanks for that idea ...

@Antonio: To observe the state machines, the trace() macro seems to be required at dozens of locations. It should contain a call to queue one or two bytes to a tx buffer whenever a state variable is written. Can you post some details of your macro to make sure I understand what you do with it?
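
One way the "one byte per state change, queued to a tx buffer" idea could look, sketched in C. The bit layout (2 bits for the state machine number, 6 bits for the state) and the queue size are assumptions for illustration; the CAN driver call that sends the byte on its own CAN ID is deliberately left out because it depends on the stack in use.

```c
#include <stdint.h>

#define TRACE_QUEUE_SIZE 32u   /* hypothetical; must be a power of two */

/* volatile because the consumer may run in the CAN tx interrupt */
static volatile uint8_t trace_queue[TRACE_QUEUE_SIZE];
static volatile uint8_t trace_head, trace_tail;

/* Pack state machine number (0..3) and state (0..63) into one byte. */
uint8_t trace_pack(uint8_t machine, uint8_t state)
{
    return (uint8_t)(((machine & 0x03u) << 6) | (state & 0x3Fu));
}

/* Call wherever a state variable is written.
   Returns 1 if queued, 0 if the queue was full and the entry dropped. */
int trace_put(uint8_t machine, uint8_t state)
{
    uint8_t next = (uint8_t)((trace_head + 1u) & (TRACE_QUEUE_SIZE - 1u));

    if (next == trace_tail) {
        return 0;
    }
    trace_queue[trace_head] = trace_pack(machine, state);
    trace_head = next;
    return 1;
}

/* Call from the CAN tx side. Returns 1 and stores the oldest byte in
   *out if one is pending, 0 if the queue is empty. */
int trace_get(uint8_t *out)
{
    if (trace_tail == trace_head) {
        return 0;
    }
    *out = trace_queue[trace_tail];
    trace_tail = (uint8_t)((trace_tail + 1u) & (TRACE_QUEUE_SIZE - 1u));
    return 1;
}
```

Routing every write of a state variable through one small "set state" routine that also calls `trace_put()` would avoid scattering the trace call over dozens of locations.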
> I would investigate the cleaning request. What happens to your state
> machine when a clean request is sent and also a material request
> command (with either 0, 1, or 2)? When does a grabber decide that it
> must be cleaned? And does it do something asynchronously that
> causes an undesired state change?
I already checked that detail. Each grabber has a counter that counts up. The user sets up a number of grabber cycles after which the grabbers are cleaned. The initialization sequence rotates that number right (/2) and throws away the remainder. That is the start value for grabber 1; the start value for grabber 2 is zero, to avoid timing delays from consecutive cleaning cycles. After each workpiece action the appropriate counter is incremented. If the counter matches the user-selected cycle count, a cleaning request is set. That is done in advance, so that the cleaning shuttle can be in position by the time the grabber is tilted down.

The shuttle state machine itself checks whether there is a grabber with a cleaning request (counter=0) in the down (opposite the workpiece) position. If so, it runs a cleaning sequence; if not, it checks for a grabber without material. If one is present, it runs a material feed sequence. Anyhow, there are one or more bugs in that construction, probably in the reset of the cleaning request flag.

Maybe I can eliminate the grabber state machines completely if the state machines need not correspond to a physical unit. The grabber in shuttle position and both shuttles form a synchronized sequence that can be implemented with one state machine. Once a grabber is tilted up, that same grabber is required to operate in parallel with and asynchronously to the shuttle state machine.

Probably it would be wise not to have a state machine for each grabber. Instead, design one state machine for the grabber in workpiece position and integrate the grabber in shuttle position into the shuttle state machine. In addition, a combinatorial instance is required that multiplexes the input and output signals depending on the turntable's position. That approach seems slightly more hopeful for breaking the task down into feasible details.
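
The counter scheme described above, written out as a small C sketch to make the set/reset points visible. Names are hypothetical, and where the counter and request get cleared is an assumption (the description does not say exactly); here both are reset together in a single routine, since the reset of the request flag is the suspected weak spot. The comparison uses >= instead of an exact match as a defensive choice.

```c
#include <stdint.h>

typedef struct {
    uint8_t count[2];      /* cycles since last cleaning, per grabber */
    uint8_t clean_req[2];  /* 1 = cleaning requested for this grabber */
    uint8_t interval;      /* user-selected cycles between cleanings */
} clean_sched_t;

/* Stagger the two grabbers: grabber 1 (index 0) starts at half the
   interval (remainder dropped), grabber 2 (index 1) starts at zero. */
void clean_init(clean_sched_t *s, uint8_t interval)
{
    s->interval = interval;
    s->count[0] = (uint8_t)(interval >> 1);
    s->count[1] = 0;
    s->clean_req[0] = 0;
    s->clean_req[1] = 0;
}

/* Call after each workpiece action of grabber g (0 or 1). */
void clean_on_workpiece(clean_sched_t *s, int g)
{
    s->count[g]++;
    if (s->count[g] >= s->interval) {
        s->clean_req[g] = 1;   /* set in advance of the tilt-down */
    }
}

/* Call once when the flush of grabber g has completed: counter and
   request are cleared in the same place so they cannot get out of step. */
void clean_done(clean_sched_t *s, int g)
{
    s->count[g] = 0;
    s->clean_req[g] = 0;
}
```

With an interval of 5, for example, grabber 1 starts at 2 and requests cleaning after 3 workpieces, while grabber 2 requests it after 5, so the two requests do not fall in consecutive cycles at start-up.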
> @Antonio: To observe state machines the trace() macro seems to be required
> at dozens of locations. It should contain a call to queue one or two bytes
> to a tx buffer whenever a state variable is written. Can you post some
> details to your macro to make shure I understand what you do with ?
Why not put it just before the switch() statement?

I have no "general purpose" trace macro; I simply write a tailor-made little function each time... when I feel the need, of course.

As you noted, it depends a lot on how many inputs are involved, how much RAM you have (for a queue...), which communication / debug channels you have, and their speed vs. the expected state transition times.

But usually in my systems you have maybe hundreds of loops where nothing happens, then suddenly you have a quick sequence of transitions before settling again in a stable state. I want to capture those transitions.

You can write a function that compares N arguments with stored (static?) samples, and sends a log line to a tx queue each time one of them differs.

You can also write two or more versions of it that sample different subsets of the inputs and tag them in the output.
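
A sketch of such a compare-with-stored-samples function in C. This is not an actual macro from a real project, just one possible shape; `TRACE_VARS` and the names are hypothetical. It returns a bitmask of which values changed, and the caller decides what to send to the tx queue.

```c
#include <stdint.h>

#define TRACE_VARS 4   /* hypothetical: one state byte per state machine */

static uint8_t trace_last[TRACE_VARS];   /* stored samples, start at 0 */

/* Compare the current values with the stored samples, remember the
   new values, and return a bitmask with bit i set if value i changed.
   A result of 0 means nothing happened in this loop pass. */
uint8_t trace_changed(const uint8_t current[TRACE_VARS])
{
    uint8_t mask = 0;
    int i;

    for (i = 0; i < TRACE_VARS; i++) {
        if (current[i] != trace_last[i]) {
            mask |= (uint8_t)(1u << i);
            trace_last[i] = current[i];
        }
    }
    return mask;
}
```

Called once per loop just before the switch() statements, a nonzero mask would trigger one log line (for example the mask followed by the changed values), so the long stretches where nothing happens cost no bandwidth.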