How to choose a firmware partner

Started by robi...@tesco.net May 26, 2004
<robin.pain@tesco.net> wrote in message
news:bd24a397.0406032358.26e60d15@posting.google.com...
> iddw@hotmail.com (Dave Hansen) wrote in message
news:<40bf21f8.132385989@News.individual.net>...
> Very well, _guarantee_ that this code will fail without the wdt
> enabled:-
> ...
> bitset PORT,test_bit
> bitclear PORT,test_bit
I'm not sure I understand your point, but in any case, if your test_bit is the watchdog output, then it's not a good idea to output both states in the same place. I and others have given you examples of watchdog-kicking strategies - have you read and understood them?
> So a high energy particle changes your program counter and you have a
> random GOTO occur but your program does not reset because the chances
> of the GOTO landing near a wdt_reset is now "several orders of
> magnitude" higher? And you say "Who cares?"

Eh? I'm sorry, this makes no sense at all to me. To reiterate: in normal operation, software will periodically kick the watchdog (high and low, in two wholly unrelated places that both need to happen to accurately represent "normal operation") to stop it timing out and hence resetting the CPU. If the program loses control, then the watchdog is not kicked in those two places, it times out, and the CPU gets reset. Whether a random GOTO lands near a watchdog kick (one state only) has nothing to do with anything.

Steve
http://www.sfdesign.co.uk
http://www.fivetrees.com
"Steve at fivetrees" <steve@NOSPAMTAfivetrees.com> wrote in message news:<jc6dnbSOzPUaiCLd4p2dnA@nildram.net>...
> <robin.pain@tesco.net> wrote in message
> news:bd24a397.0406022345.42dfad7b@posting.google.com...
> >
> > No, I do read data sheets but I have a small attention span and a bad
> > memory so I expect to make mistakes.
>
> Remind me not to hire you ;).
>

Why? Do you have a short attention span and a bad memory too? =:)=

Cheers
Robin
On 4 Jun 2004 00:58:45 -0700, robin.pain@tesco.net
(robin.pain@tesco.net) wrote:

>iddw@hotmail.com (Dave Hansen) wrote in message news:<40bf21f8.132385989@News.individual.net>...
[...]
>> For my superloop code, the WDT is updated in exactly two places:
>> immediately after reset, and at the top of the superloop. Combined
>> with special "come from" tests to ensure the program flow is as was
>> expected, our systems have no trouble surviving some really nasty ESD
>> testing required by our customers. (Without a WDT, I _guarantee_ your
>> system will fail these tests.)
>
>Very well, _guarantee_ that this code will fail without the wdt
>enabled:-
> ...
> jmp test
> jmp test
>
>test
> bitset PORT,test_bit
> bitclear PORT,test_bit
> jmp test
> jmp test
> ...
>

That will fail whether or not there's a WDT. You've made another silly mistake.

[...]
>> Under normal operation the WDT is being updated several orders of
>> magnitude more often than is necessary. Who cares? It's abnormal
>> operation I care about, and what the WDT is supposed to remedy.
>>
>
>So a high energy particle changes your program counter and you have a
>random GOTO occur but your program does not reset because the chances
>of the GOTO landing near a wdt_reset is now "several orders of
>magnitude" higher? And you say "Who cares?"

And another. Remember, there are exactly _two_ places in the code where the WDT is reset, not "orders of magnitude." It's a loop. And if we _do_ jump to the top of the loop from somewhere in the middle, the "come from" test will fail, triggering a reset.

Regards,

-=Dave
--
Change is inevitable, progress is not.
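The thread never shows what a "come from" test looks like, so here is a minimal C sketch of one possible reading. Every name in it (flow_token, come_from, superloop_pass, the STEP_ values) is hypothetical, and the WDT and reset are stand-ins: a counter and a flag rather than real hardware. It shows only the loop-top kick; the kick immediately after reset is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; the thread shows no code for this. */
enum step { STEP_TOP = 0xA5, STEP_INPUTS = 0x3C, STEP_OUTPUTS = 0xC3 };

static uint8_t  flow_token = STEP_OUTPUTS; /* last step completed; reset code would set this */
static unsigned wdt_kicks;                 /* stand-in for refreshing a hardware WDT */
static bool     reset_requested;           /* stand-in for forcing a CPU reset */

static void wdt_kick(void) { wdt_kicks++; }

/* "Come from" test: proceed only if the previous step is the expected one. */
static bool come_from(uint8_t expected_prev, uint8_t now)
{
    if (flow_token != expected_prev) {
        reset_requested = true;  /* real code might spin here until the WDT bites */
        return false;
    }
    flow_token = now;
    return true;
}

/* One pass of the superloop. The WDT is kicked only at the top, and only
   if the previous pass ran through to its last step. */
bool superloop_pass(void)
{
    if (!come_from(STEP_OUTPUTS, STEP_TOP)) return false;
    wdt_kick();
    if (!come_from(STEP_TOP, STEP_INPUTS)) return false;
    /* ... read inputs, run control logic ... */
    if (!come_from(STEP_INPUTS, STEP_OUTPUTS)) return false;
    /* ... write outputs ... */
    return true;
}

/* Self-check: a clean pass kicks once; a jump to the loop top from
   mid-loop is caught before the kick. */
void come_from_demo(void)
{
    bool ok = superloop_pass();
    assert(ok && wdt_kicks == 1 && !reset_requested);

    flow_token = STEP_INPUTS;    /* simulate a stray jump out of mid-loop */
    ok = superloop_pass();
    assert(!ok && wdt_kicks == 1 && reset_requested);
}
```

In this sketch the check sits before the kick, so a stray arrival at the loop top does not refresh the watchdog. A jump that lands directly on the kick instruction itself would still get one refresh, which is the case Robin keeps raising.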
iddw@hotmail.com (Dave Hansen) wrote in message news:<40c08779.223921024@News.individual.net>...
> On 4 Jun 2004 00:58:45 -0700, robin.pain@tesco.net
> (robin.pain@tesco.net) wrote:
>
> >iddw@hotmail.com (Dave Hansen) wrote in message news:<40bf21f8.132385989@News.individual.net>...
> [...]
> >> For my superloop code, the WDT is updated in exactly two places:
> >> immediately after reset, and at the top of the superloop. Combined
> >> with special "come from" tests to ensure the program flow is as was
> >> expected, our systems have no trouble surviving some really nasty ESD
> >> testing required by our customers. (Without a WDT, I _guarantee_ your
> >> system will fail these tests.)
> >
> >Very well, _guarantee_ that this code will fail without the wdt
> >enabled:-
> > ...
> > jmp test
> > jmp test
> >
> >test
> > bitset PORT,test_bit
> > bitclear PORT,test_bit
> > jmp test
> > jmp test
> > ...
> >
> That will fail whether or not there's a WDT. You've made another
> silly mistake.
>
> [...]
The ellipsis at top and bottom mean e.g.

    ...
    jmp test
    ...

is the same as

    ...
    jmp test
    jmp test
    ...

is the same as

    ...
    jmp test
    jmp test
    jmp test
    ...

etc

In the same way one might fill all remaining unused locations with e.g.

    ...
    jmp reset
    ...

Cheers
Robin
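As a rough illustration of filling unused locations with "jmp reset", here is a C sketch that pads the tail of a code image. The thread does not name a processor, so the 8051 is assumed purely for concreteness: there LJMP 0x0000 encodes as 02 00 00 and 0x00 is NOP, so an entry at the second or third byte of a fill instruction slides through NOPs into the next LJMP. The function name and the image layout are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill image[start .. size-1] with repeated "LJMP 0x0000" (8051 encoding
   02 00 00, an assumption; the thread names no processor). Any trailing
   bytes too short for a whole instruction are set to 0x00 (8051 NOP). */
void fill_unused_with_jmp_reset(uint8_t *image, size_t start, size_t size)
{
    static const uint8_t ljmp_reset[3] = { 0x02, 0x00, 0x00 };
    size_t i = start;

    while (i + sizeof ljmp_reset <= size) {
        memcpy(&image[i], ljmp_reset, sizeof ljmp_reset);
        i += sizeof ljmp_reset;
    }
    while (i < size) {
        image[i++] = 0x00;
    }
}

/* Self-check on a 16-byte image whose first 4 bytes stand for the program. */
void fill_demo(void)
{
    uint8_t image[16];

    memset(image, 0xFF, sizeof image);   /* 0xFF marks bytes we must not touch */
    fill_unused_with_jmp_reset(image, 4, sizeof image);

    assert(image[3] == 0xFF);                                   /* program area untouched */
    assert(image[4] == 0x02 && image[5] == 0x00 && image[6] == 0x00);
    assert(image[13] == 0x02 && image[14] == 0x00 && image[15] == 0x00);
}
```

On a real build this is normally done by the linker or the hex-file tooling rather than by code on the target.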
"Steve at fivetrees" <steve@NOSPAMTAfivetrees.com> wrote in message news:<vo6dnfwA_v4xyl3dRVn-gg@nildram.net>...
> <robin.pain@tesco.net> wrote in message
> news:bd24a397.0406032358.26e60d15@posting.google.com...
> > iddw@hotmail.com (Dave Hansen) wrote in message news:<40bf21f8.132385989@News.individual.net>...
> > Very well, _guarantee_ that this code will fail without the wdt
> > enabled:-
> > ...
> > bitset PORT,test_bit
> > bitclear PORT,test_bit
>
> I'm not sure I understand your point, but in any case, if your test_bit is
> the watchdog output, then it's not a good idea to output both states in the
> same place. I and others have given you examples of watchdog-kicking
> strategies - have you read and understood them?

The test bit is only for a 'scope etc. There is no watchdog timer (or if the MCU has an internal WDT it is disabled).
>
> > So a high energy particle changes your program counter and you have a
> > random GOTO occur but your program does not reset because the chances
> > of the GOTO landing near a wdt_reset is now "several orders of
> > magnitude" higher? And you say "Who cares?"
>
> Eh? I'm sorry, this makes no sense at all to me. To reiterate: in normal
> operation, software will periodically kick the watchdog (high and low, in
> two wholly unrelated places that both need to happen to accurately represent
> "normal operation") to stop it timing out and hence resetting the CPU. If
> the program loses control, then the watchdog is not kicked in those two
> places, it times out,

Evidently, but that will only happen if the "loss of control" does not inadvertently pass through one of the two wdt_reset instructions, and if these are normally "refreshed" orders of magnitude prematurely, then you increase the chance that a random loss of control can live long enough to accidentally traverse either of these two places.
> and the CPU gets reset. Whether a random GOTO lands
> near a watchdog kick (one state only) has nothing to do with anything.

If a random GOTO lands near a watchdog kick then the faulty-state CPU can run amok for "orders of magnitude" longer before the wdt resets it. It might turn out that CPU control flows back into your loop and from then on continues normally, indefinitely... so now you have had a catastrophic failure that fixed itself and no one knows about it. Some state might have changed during the failure... like... the speed of the cooker fan or the trim of the aircraft.

Cheers
Robin
"robin.pain@tesco.net" wrote:
>
> iddw@hotmail.com (Dave Hansen) wrote in message news:<40c08779.223921024@News.individual.net>...
> > On 4 Jun 2004 00:58:45 -0700, robin.pain@tesco.net
> > (robin.pain@tesco.net) wrote:
> >
> > >iddw@hotmail.com (Dave Hansen) wrote in message news:<40bf21f8.132385989@News.individual.net>...
> > [...]
> > >> For my superloop code, the WDT is updated in exactly two places:
> > >> immediately after reset, and at the top of the superloop. Combined
> > >> with special "come from" tests to ensure the program flow is as was
> > >> expected, our systems have no trouble surviving some really nasty ESD
> > >> testing required by our customers. (Without a WDT, I _guarantee_ your
> > >> system will fail these tests.)
> > >
> > >Very well, _guarantee_ that this code will fail without the wdt
> > >enabled:-
> > > ...
> > > jmp test
> > > jmp test
> > >
> > >test
> > > bitset PORT,test_bit
> > > bitclear PORT,test_bit
> > > jmp test
> > > jmp test
> > > ...
> > >
> > That will fail whether or not there's a WDT. You've made another
> > silly mistake.
> >
> > [...]
>
> The ellipsis at top and bottom mean e.g.
>
> ...
> jmp test
> ...

The jmp test instruction is clearly more than a 1 byte instruction. What if the IP starts pointing to the middle of the opcode which turns out to be something that prevents your bit set/clear from ever getting executed, like say a "jmp nottest" and loops to itself?

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
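To make the mid-opcode case concrete, here is a small C sketch of a decoder that knows just two encodings. The thread does not say which processor is involved, so the 8051 is assumed for illustration: LJMP addr16 is 02 hi lo, and SJMP rel8 is 80 rel with the target taken relative to the following instruction. The target address 0x80FE is picked deliberately so that the tail of the long jump reads as a short jump to itself; decode_at and the types are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_LJMP, OP_SJMP, OP_OTHER } op_kind;

typedef struct {
    op_kind  kind;
    uint16_t target;  /* absolute jump target; meaningful for the two jump kinds */
    uint8_t  length;  /* bytes consumed by the instruction */
} decoded;

/* "LJMP 0x80FE" as it would sit in code memory at address 0 (assumed 8051). */
static const uint8_t ljmp_image[3] = { 0x02, 0x80, 0xFE };

/* Decode the instruction that starts at pc. Only LJMP and SJMP are
   recognised; everything else is reported as a 1-byte OP_OTHER. */
decoded decode_at(const uint8_t *code, size_t size, uint16_t pc)
{
    decoded d = { OP_OTHER, 0, 1 };

    assert(pc < size);
    if (code[pc] == 0x02 && (size_t)pc + 2 < size) {         /* LJMP addr16 */
        d.kind   = OP_LJMP;
        d.target = (uint16_t)((code[pc + 1] << 8) | code[pc + 2]);
        d.length = 3;
    } else if (code[pc] == 0x80 && (size_t)pc + 1 < size) {  /* SJMP rel8 */
        int8_t rel = (int8_t)code[pc + 1];
        d.kind   = OP_SJMP;
        d.target = (uint16_t)(pc + 2 + rel);
        d.length = 2;
    }
    return d;
}

/* Self-check: the same three bytes decode differently from offset 0 and 1. */
void misaligned_entry_demo(void)
{
    decoded intended = decode_at(ljmp_image, sizeof ljmp_image, 0);
    assert(intended.kind == OP_LJMP && intended.target == 0x80FE);

    decoded stray = decode_at(ljmp_image, sizeof ljmp_image, 1);
    assert(stray.kind == OP_SJMP && stray.target == 1);  /* jumps to itself */
}
```

Entered one byte late, the same bytes decode as a two-byte jump whose target is its own address, which is the "loops to itself" case: the bit set/clear is never reached and nothing short of a reset gets the CPU out.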
rickman wrote:

>The jmp test instruction is clearly more than a 1 byte instruction.
>What if the IP starts pointing to the middle of the opcode which turns
>out to be something that prevents your bit set/clear from ever getting
>executed, like say a "jmp nottest" and loops to itself?

In systems where the code and data reside in the same address space, the instruction pointer may also be altered so that the processor begins to execute data. This rarely has predictable or desirable results.

--
========================================================================
Michael Kesti            |  "And like, one and one don't make
                         |   two, one and one make one."
mkesti@gv.net            |          - The Who, Bargain
<robin.pain@tesco.net> wrote in message
news:bd24a397.0406050536.3a6dafd8@posting.google.com...
> > Eh? I'm sorry, this makes no sense at all to me. To reiterate: in normal
> > operation, software will periodically kick the watchdog (high and low, in
> > two wholly unrelated places that both need to happen to accurately represent
> > "normal operation") to stop it timing out and hence resetting the CPU. If
> > the program loses control, then the watchdog is not kicked in those two
> > places, it times out,
>
> Evidently, but that will only happen if the "loss of control" does not
> inadvertently pass through one of the two wdt_reset instructions and
> if these are normally "refreshed" orders of magnitude prematurely,
> then you increase the chance that random loss of control can live long
> enough to accidentally traverse either of these two places.
All of which explains why it's important to use two separate kicks in two completely different levels of code (as previously explained: e.g. top-level background task, and interrupt-driven heartbeat). One without the other will still reset the CPU.
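A minimal C sketch of the two-level kick described above, with the hardware simulated. It assumes an external watchdog that is refreshed by edges on a port bit: the background task only ever drives the bit high and the heartbeat interrupt only ever drives it low. The function names, the edge counter and the "window" model are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

static bool     wdt_pin;    /* stand-in for the port bit wired to the watchdog */
static unsigned wdt_edges;  /* edges the simulated watchdog saw this window */

/* Writing the level the pin already has produces no edge. */
static void wdt_pin_write(bool level)
{
    if (level != wdt_pin) {
        wdt_pin = level;
        wdt_edges++;
    }
}

void background_task_kick(void) { wdt_pin_write(true);  }  /* top-level loop */
void heartbeat_isr_kick(void)   { wdt_pin_write(false); }  /* timer interrupt */

/* Simulated watchdog: called once per timeout window. It expires (would
   reset the CPU) if it saw no edge since the previous call. */
bool wdt_window_expired(void)
{
    bool expired = (wdt_edges == 0);
    wdt_edges = 0;
    return expired;
}

/* Self-check: both levels alive keeps the watchdog quiet; a hung main
   loop with interrupts still running does not. */
void two_level_kick_demo(void)
{
    bool expired;

    background_task_kick();
    heartbeat_isr_kick();
    expired = wdt_window_expired();
    assert(!expired);

    heartbeat_isr_kick();   /* main loop hung: the pin just stays low */
    heartbeat_isr_kick();
    expired = wdt_window_expired();
    assert(expired);
}
```

Because each site writes only one level, a stray pass through a single kick can produce at most one edge; the watchdog keeps being refreshed only while both the background task and the heartbeat are running.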
> If a random GOTO lands near a watchdog kick then the faulty-state CPU
> can run amok for "orders of magnitude" longer before the wdt-resets.
> It might turn out that the CPU control flows back into your loop and
> from then on continues normally, indefinitely... so now you have had a
> catastrophic failure that fixed itself and no one knows about it.
> Some state might have changed during the failure... like... the speed
> of the cooker fan or the trim of the aircraft.

Exactly what happens when a CPU goes "off in the weeds" is, by definition, unpredictable. Data may be corrupted; the stack may grow way past its bounds (hint: use a stack trap). My experience has been that mostly the CPU will wind up in a tight little loop somewhere... but not always.

Your points are valid, and one should use other defences to guard against such cases. But none of this invalidates the use of a watchdog. It just means that one must be thorough.

Steve
http://www.sfdesign.co.uk
http://www.fivetrees.com
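The "stack trap" hint is not spelled out in the thread. One common reading, assumed here, is a few guard words with a known pattern at the far end of the stack, checked periodically. The C sketch below simulates that with an ordinary array; the sizes, the pattern and the function names are all made up, and on a real target the region would come from the linker script rather than from a C array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS   64u
#define GUARD_WORDS   4u
#define GUARD_PATTERN 0xDEADBEEFu

/* Simulated stack region. The stack is assumed to grow downward toward
   index 0, so the guard occupies the lowest GUARD_WORDS words. */
static uint32_t stack_region[STACK_WORDS];

/* Write the guard pattern; call once at start-up, before the stack is deep. */
void stack_trap_init(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        stack_region[i] = GUARD_PATTERN;
    }
}

/* True while every guard word still holds the pattern. */
bool stack_trap_intact(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (stack_region[i] != GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}

/* Self-check: an overrun into the guard area is detected. */
void stack_trap_demo(void)
{
    stack_trap_init();
    assert(stack_trap_intact());

    stack_region[GUARD_WORDS - 1] = 0;  /* simulate the stack reaching the guard */
    assert(!stack_trap_intact());
}
```

One way to tie this to the watchdog is to kick only while stack_trap_intact() returns true, so that a detected overrun ends in a reset instead of carrying on with corrupted data.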
"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:40C1E612.BBE1DA53@yahoo.com...
> > jmp test
> > ...
>
> The jmp test instruction is clearly more than a 1 byte instruction.

Why "clearly"? Not on many (most?) DSP processors.

Meindert
"Michael R. Kesti" wrote:
> rickman wrote:
>
>> The jmp test instruction is clearly more than a 1 byte instruction.
>> What if the IP starts pointing to the middle of the opcode which
>> turns out to be something that prevents your bit set/clear from
>> ever getting executed, like say a "jmp nottest" and loops to itself?
>
> In systems where the code and data reside in the same address space, the
> instruction pointer may also be altered so that the processor begins to
> execute data. This rarely has predictable or desirable results.

For even greater amusement, have the i/o ports memory mapped.

--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?