EmbeddedRelated.com

Continuous eeprom checksum microcontroller

Started by Vishal July 3, 2004
On Sat, 03 Jul 2004 20:45:34 +0100, "Paul E. Bennett"
<peb@amleth.demon.co.uk> wrote:

>Vishal wrote: > >> Hi, Anybody come across continous background checksum tests on >> eeprom?? Is it worth doing ?? > >If the integrity requirements for your system indicate that such checking >is useful then it is definitely worth doing. As some others have indicated, >it can catch problems with bus addressing and/or data shared pathways. The >most difficult thing about deciding to use continuous eeprom checksum is >what you need to do if you discover a problem.
I was thinking the same thing. I presume some analysis was performed & the eeprom checksum is a mitigation of some fault or hazard. Otherwise what do you do when a fault is detected? Initialise to default values? Log an error & continue? Halt the device? Reset the device?

Also is a "checksum" adequate or should a CRC be calculated?

Performing continuous eeprom checks could chew up considerable MIPs, so I wouldn't do it unless I had cause for good reason. Some good design practices employ minimal resource usage -- I wouldn't put this one into that category.

Ken.

+====================================+
I hate junk email. Please direct any
genuine email to: kenlee at hotpop.com
On Tue, 06 Jul 2004 23:09:36 GMT, the renowned postmaster@noname.com
(Ken Lee) wrote:

>On Sat, 03 Jul 2004 20:45:34 +0100, "Paul E. Bennett" ><peb@amleth.demon.co.uk> wrote: > >>Vishal wrote: >> >>> Hi, Anybody come across continous background checksum tests on >>> eeprom?? Is it worth doing ?? >> >>If the integrity requirements for your system indicate that such checking >>is useful then it is definitely worth doing. As some others have indicated, >>it can catch problems with bus addressing and/or data shared pathways. The >>most difficult thing about deciding to use continuous eeprom checksum is >>what you need to do if you discover a problem. > >I was thinking the same thing. I presume some analysis was performed & >the eeprom checksum is a mitigation of some fault or hazard. Otherwise >what do you do when a fault is detected? Initialise to default values? >Log an error & continue? Halt the device? Reset the device?
Attempt to salvage correct value, attempt to repair at an appropriate time in my case.
>Also is a "checksum" adequate or should a CRC be calculated? > >Performing continuous eeprom checks could chew up considerable MIPs,
It could not, too.
>so I wouldn't do it unless I had cause for good reason.
Yes.
>Some good >design practices employ minimal resource usage -- I wouldn't put this >one into that category. > >Ken.
Depends on the other specifications. The minimum resource usage to meet ALL the specifications, right?

Best regards,
Spehro Pefhany
--
"it's the network..." "The Journey is the reward"
speff@interlog.com Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog Info for designers: http://www.speff.com
On 5 Jul 2004 21:18:06 -0700, vishalpatil_89@yahoo.co.in (Vishal)
wrote:

>OK , I will rephrase the question.Is the register area of a micro more >reliable than that of static RAM? I got this guideline doc >(requirement doc as they call it) >which asks for reg refresh.Since the document is quite old , i was >wondering if it makes sense to do reg refresh. > >Thanks.
If it is at all possible to corrupt micro registers via external means (that is, not by a software-related fault) then I wouldn't be attempting to mitigate the fault itself as, God knows what other aspects of the micro would be questionable. Instead I would mitigate against the resultant hazard. For instance, if the micro is controlling the transmitted output power of the device (as for a CAT device) then I would be looking at putting a power limiter in the hardware.

Also, if stored values need to be validated by some means then an appropriate software architecture should be adopted. Possibly you could adopt a scheme to CRC or checksum critical stored parameters before use, rather than performing continuous refreshes -- just a thought.

The fact that you are even discussing such issues suggests to me that a proper hazard analysis needs to be performed on the system to determine where you stand and to get a handle on the type of mitigations you need to put in place.

Ken.
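A minimal sketch of the "CRC before use" idea, under stated assumptions: the record layout, the field names and the helper names `params_seal` and `params_valid` are hypothetical, and CRC-16/CCITT-FALSE is just one common choice, not something the thread specifies. The record is held in RAM here; on a real target it would first be read out of the eeprom.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter record; the field names are illustrative only. */
typedef struct {
    uint16_t max_power;  /* example of a critical setting */
    uint16_t gain;
    uint16_t crc;        /* CRC-16 over all fields that precede it */
} params_t;

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, bitwise. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;

    assert(data != NULL);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((uint16_t)(crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Recompute and store the CRC after the parameters have been changed. */
void params_seal(params_t *p)
{
    assert(p != NULL);
    p->crc = crc16_ccitt((const uint8_t *)p, offsetof(params_t, crc));
}

/* Check the record immediately before its values are used. */
bool params_valid(const params_t *p)
{
    assert(p != NULL);
    return p->crc == crc16_ccitt((const uint8_t *)p, offsetof(params_t, crc));
}
```

The caller would test `params_valid()` each time the parameters are about to be used, and take whatever action the hazard analysis calls for when it returns false.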
> > >Anton Erasmus <nobody@nowhere.net> wrote in message news:<o7ffe0tdpfv90820f3cfj4fsei440o7co9@4ax.com>... >> On 2 Jul 2004 21:34:16 -0700, vishalpatil_89@yahoo.co.in (Vishal) >> wrote: >> >> >Hi, Anybody come across continous background checksum tests on >> >eeprom?? Is it worth doing ?? >> >> Hi, >> >> The most important thing when asking if any sort of error checking is >> worthwhile, is what do you do if the answer is that there is a problem >> . If you cannot do something practical/realistic with the answer, then >> do not bother. >> Anything that can actually be checked and realistically be used to >> improve reliability is worthwhile. >> >> Regards >> Anton Erasmus
On Tue, 06 Jul 2004 23:09:36 GMT, postmaster@noname.com (Ken Lee)
wrote:

>On Sat, 03 Jul 2004 20:45:34 +0100, "Paul E. Bennett" ><peb@amleth.demon.co.uk> wrote: > >>Vishal wrote: >> >>> Hi, Anybody come across continous background checksum tests on >>> eeprom?? Is it worth doing ?? >> >>If the integrity requirements for your system indicate that such checking >>is useful then it is definitely worth doing. As some others have indicated, >>it can catch problems with bus addressing and/or data shared pathways. The >>most difficult thing about deciding to use continuous eeprom checksum is >>what you need to do if you discover a problem. > >I was thinking the same thing. I presume some analysis was performed & >the eeprom checksum is a mitigation of some fault or hazard. Otherwise >what do you do when a fault is detected? Initialise to default values? >Log an error & continue? Halt the device? Reset the device? > >Also is a "checksum" adequate or should a CRC be calculated? > >Performing continuous eeprom checks could chew up considerable MIPs, >so I wouldn't do it unless I had cause for good reason. Some good >design practices employ minimal resource usage -- I wouldn't put this >one into that category.
It really depends on how often it needs to be performed. If you have a routine called in a main loop or tick interrupt that accumulates a location, increments a pointer (and if at end, does the compare and resets the pointer) and returns, it will use very few resources and a full pass will be done in seconds or minutes.

From reading followup posts, the OP seems to want to do this for the more general purpose of system integrity. I'm not sure this is the best (and certainly not the only) way to validate system integrity. As any recent c.a.e reader knows, we've had a few heated threads recently regarding devices intended to increase system integrity. Some things are a lot better than others in increasing, validating or ensuring system integrity.

Quoting an earlier response in the thread:
>Hi, one more(i am getting lazy,cant dig out on my own ;) ) >One of the generic requirement by my customer asks for a register >refresh on continuous basis again. I am using a hc12 micro. My que is >the same .. is it worth it ?? never implemented such a thing >before...
It seems to me your question is "I am using a hc12 micro. What can I do to make it as reliable as reasonably possible?"

As for the "register refresh", he may be referring to output registers and data direction registers. Electrical spikes can cause the bits in these registers to change states, so it makes sense to refresh them regularly. But why does your customer ask this? It seems that as the designer, you should be making these decisions. Is your customer micromanaging you? Or is your customer a governmental agency and these are "required specs"?

But if a spike can change an I/O register, then it can change any other read/write bit on the silicon, such as a CPU register or RAM location. These can't be "refreshed" because you don't know what values they should be. The solution to this, in addition to the above CRCs and refreshing, is to reduce the effect of a spike so it's much less likely to affect the controller: change layout, add bypass caps, add diodes and resistors on I/Os and such.

So what happens if an I/O port is the wrong value? Could it do something dangerous? Could it lose valuable data? What should the thing do? Reset? Save the fault state in an EEPROM, light a special "ERROR" LED and stop? (I rather like that, as the fault state is "valuable data" to the designer.) All this is dictated by the application.
Ben Bradley wrote:

> On Tue, 06 Jul 2004 23:09:36 GMT, postmaster@noname.com (Ken Lee) > wrote:
[%X]
>>I was thinking the same thing. I presume some analysis was performed & >>the eeprom checksum is a mitigation of some fault or hazard. Otherwise >>what do you do when a fault is detected? Initialise to default values? >>Log an error & continue? Halt the device? Reset the device? >> >>Also is a "checksum" adequate or should a CRC be calculated? >> >>Performing continuous eeprom checks could chew up considerable MIPs, >>so I wouldn't do it unless I had cause for good reason. Some good >>design practices employ minimal resource usage -- I wouldn't put this >>one into that category.
As I stated previously, only the risk assessment process for the system in its intended environments can guide the mitigating measures required in the system. Performing the risk analysis can take as long as, or longer than, the system design itself.
> It really depends on how often it needs to be performed. If you > have a routine called in a main loop or tick interrupt that > accumulates a location, increments a pointer (and if at end, does the > compare and resets the pointer) and returns, it will use very little > resources and be done in seconds or minutes. > From reading followup posts, the OP seems to want to do this for > the more general purpose of system integrity. I'm not sure this is the > best (and certainly not the only) way to validate system integrity. As > any recent c.a.e reader knows, we've had a few heated threads recently > regarding devices intended to increase system integrity. Some things > are a lot better than others in increasing, validating or insuring > system integrity.
The difference between general purpose controller systems and high integrity controller systems should, nowadays, be less reflected in the hardware manifestations. The difference is more likely to be manifested in the software techniques used and the overall system integration arrangements. Whatever errors you find in the system, you still must have some plan of action for the controller to follow, even if it is just to raise an alarm and turn itself off. Naturally, when errors are encountered they should be logged somewhere so that the engineers/operations staff can identify what happened (in sequence, hopefully).
> Quoting an earlier response in the thread: > >>Hi, one more(i am getting lazy,cant dig out on my own ;) ) >>One of the generic requirement by my customer asks for a register >>refresh on continuous basis again. I am using a hc12 micro. My que is >>the same .. is it worth it ?? never implemented such a thing >>before... > > It seems to me your question is "I am using a hc12 micro. What can > I do to make it as reliable as reasonably possible?"
Most likely that is what he is asking, but he is also trying to follow a document he has been given by his client (who may just have cherry-picked some supposedly useful phrases that have been applied to other systems, without understanding the implications of what they are asking). The OP needs to explore this with his client from the basis of a good grounding in defensive programming techniques and their overall value to system integrity. I would require a significant amount of information from the OP to be able to assist him to that level from where he seems to be.
> As for the "register refresh" he may be referring to output > registers and data direction registers. Electrical spikes can cause > the bits in these registers to change states, so it makes sense to > refresh them regularly. But why does your customer ask this? It seems > that as the designer, you should be making these decisions. Is your > customer micromanaging you? Or is your customer a governmental agency > and these are "required specs"?
The general rule for registers that are relied upon for output is that their state should be refreshed at the successful completion of each control loop cycle based on the evidence from the real system state as represented by the inputs (INPUT-->PROCESS-->OUTPUT). If the path to the end of the control loop is not completed successfully then you may need to set a default output pattern that has been determined to be safe (I try and make mine all outputs off if I can - not always possible).
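A rough sketch of that rule, assuming a single 8-bit output port. The register stand-ins `port_out` and `port_ddr`, the constants and the `compute_outputs` step are all hypothetical; on a real HC12 the registers would be the memory-mapped port and data direction registers from the device header, and the safe pattern would come out of the hazard analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for memory-mapped registers (hypothetical names). */
volatile uint8_t port_out;  /* output port data register */
volatile uint8_t port_ddr;  /* data direction register */

#define DDR_CONFIG   0xFFu  /* assumed: every pin is an output */
#define SAFE_OUTPUTS 0x00u  /* assumed safe pattern: all outputs off */

/* Hypothetical PROCESS step: returns false if the cycle could not be
   completed successfully. The range check is illustrative only. */
static bool compute_outputs(uint8_t inputs, uint8_t *outputs)
{
    assert(outputs != NULL);
    if (inputs > 0x7Fu) {
        return false;
    }
    *outputs = (uint8_t)(inputs << 1);
    return true;
}

/* One pass of INPUT --> PROCESS --> OUTPUT. */
void control_cycle(uint8_t inputs)
{
    uint8_t outputs = SAFE_OUTPUTS;
    bool ok = compute_outputs(inputs, &outputs);

    /* Rewrite the direction bits every cycle, so a corrupted bit is
       corrected within one loop period. */
    port_ddr = DDR_CONFIG;

    /* Rewrite the outputs from freshly computed state, or fall back to
       the safe pattern if the cycle did not complete successfully. */
    port_out = ok ? outputs : SAFE_OUTPUTS;
}
```

Because both registers are rewritten on every pass, a bit flipped by a spike persists for at most one loop period, which is the point of the refresh.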
> But if a spike can change an I/O register, then it can change any > other read/write bit on the silicon, such as a CPU register or RAM > location. These can't be "refreshed" because you don't know what > values they should be. The solution to this, in addition to to the > above CRC's and refreshing, is to reduce the effect of a spike so it's > much less likely to affect the controller: change layout, add bypass > caps, add diodes and resistors on I/O's and such. > So what happens if an I/O port is the wrong value? Could it do
Systems that are easily susceptible to the effects of spikes from PSU, ESD or RFI need the hardware design looking at. Decent layout, adequate decoupling, filtering, shielding and sensible arrangement of ground circuits will all have beneficial effects on the system.
> something dangerous? Could it lose valuable data? What should the > thing do? Reset? Save the fault state in an EEPROM, light a special > "ERROR" LED and stop? (I rather like that, as the fault state is > "valuable data" to the designer) All this is dictated by the > application.
It just takes a little up-front thinking to eliminate many of the problems that can arise for a system design. No-one needs to rush to dash out a system design on receipt of the requirements.

--
********************************************************************
Paul E. Bennett ....................<email://peb@a...>
Forth based HIDECS Consultancy .....<http://www.amleth.demon.co.uk/>
Mob: +44 (0)7811-639972 .........NOW AVAILABLE:- HIDECS COURSE......
Tel: +44 (0)1235-811095 .... see http://www.feabhas.com for details.
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************
On Tue, 06 Jul 2004 23:34:14 GMT, Spehro Pefhany
<speffSNIP@interlogDOTyou.knowwhat> wrote:

>On Tue, 06 Jul 2004 23:09:36 GMT, the renowned postmaster@noname.com >(Ken Lee) wrote: > >>On Sat, 03 Jul 2004 20:45:34 +0100, "Paul E. Bennett" >><peb@amleth.demon.co.uk> wrote: >> >>>Vishal wrote: >>> >>>> Hi, Anybody come across continous background checksum tests on >>>> eeprom?? Is it worth doing ?? >>> >>>If the integrity requirements for your system indicate that such checking >>>is useful then it is definitely worth doing. As some others have indicated, >>>it can catch problems with bus addressing and/or data shared pathways. The >>>most difficult thing about deciding to use continuous eeprom checksum is >>>what you need to do if you discover a problem. >> >>I was thinking the same thing. I presume some analysis was performed & >>the eeprom checksum is a mitigation of some fault or hazard. Otherwise >>what do you do when a fault is detected? Initialise to default values? >>Log an error & continue? Halt the device? Reset the device? > >Attempt to salvage correct value, attempt to repair at an appropriate >time in my case.
What does "salvage" really mean? If a checksum is performed on a block of memory & it's wrong, then one could take this to mean that one or any number of the contents are incorrect. If a checksum is performed on a single item and it's wrong, then all you can deduce from this is that the value is incorrect. Unless you keep a mirror image of the data you cannot "salvage" the correct value. Possibly the only things one could do are fall back to some "safe" or default value, reset the device or place the device in an error state.
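One way the mirror-image idea could look, as a sketch only: each value is stored twice, each copy carrying its own check word, so a read can tell which copy to trust and repair the other. The type and function names are hypothetical, the copies live in RAM here rather than in eeprom, and the complement check is a deliberately simple stand-in for a checksum or CRC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One stored copy: the value plus its bitwise complement as a check word. */
typedef struct {
    uint16_t value;
    uint16_t check;
} stored_t;

typedef enum { READ_OK, READ_REPAIRED, READ_FAILED } read_status_t;

static bool copy_valid(const stored_t *c)
{
    return (uint16_t)(c->value ^ c->check) == 0xFFFFu;
}

/* Write both copies. */
void stored_write(stored_t *primary, stored_t *mirror, uint16_t value)
{
    assert(primary != NULL && mirror != NULL);
    primary->value = value;
    primary->check = (uint16_t)~value;
    *mirror = *primary;
}

/* Read with salvage: use whichever copy passes its check and repair the
   other one. If both copies pass but disagree, the primary is preferred
   (an arbitrary policy chosen for this sketch). If neither passes, hand
   back the caller's fallback value. */
read_status_t stored_read(stored_t *primary, stored_t *mirror,
                          uint16_t fallback, uint16_t *out)
{
    assert(primary != NULL && mirror != NULL && out != NULL);

    bool p_ok = copy_valid(primary);
    bool m_ok = copy_valid(mirror);

    if (p_ok && m_ok && primary->value == mirror->value) {
        *out = primary->value;
        return READ_OK;
    }
    if (p_ok) {
        *mirror = *primary;
        *out = primary->value;
        return READ_REPAIRED;
    }
    if (m_ok) {
        *primary = *mirror;
        *out = mirror->value;
        return READ_REPAIRED;
    }
    *out = fallback;
    return READ_FAILED;
}
```

With only a single copy, the last branch (fallback, reset or error state) is all that remains, which is the point being made above.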
> >>Also is a "checksum" adequate or should a CRC be calculated? >> >>Performing continuous eeprom checks could chew up considerable MIPs, > >It could not, too.
Sorry but this requirement just has that particular pattern as a MIPs-chewer -- "repetitive calculation on a block of memory". Why can't this requirement be met on demand -- that is, the checksum is calculated when the data is read and used?
> >>so I wouldn't do it unless I had cause for good reason. > >Yes. > >>Some good >>design practices employ minimal resource usage -- I wouldn't put this >>one into that category. >> >>Ken. > >Depends on the other specifications. The minimum resource usage to >meet ALL the specifications, right?
I've no argument on resource budgeting for input requirements, but this particular requirement looks like a mitigation for some fault or hazard. I've no problem with checksumming or CRCing stored data, but doing it on a continual basis seems to me to be ill-formulated. Admittedly I don't know what the application is, but I'm in the medical electronics game and have worked in the automotive industry, and am familiar with over-burdened mitigations. Ken.
On Wed, 07 Jul 2004 23:10:59 GMT, postmaster@noname.com (Ken Lee)
wrote:

>Sorry but this requirement just has that particular pattern as a >MIPs-chewer -- "repetitive calculation on a block of memory". Why >can't this requirement be met on demand -- that is, the checksum is >calculated when the data is read and used?
If you are using some sort of multitasking kernel, which usually contains a null task (running an idle loop), which executes when no other task is runnable, simply put the memory check into this null task.

Paul
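The incremental check described in this thread (accumulate one location per call, compare and restart at the end of the block) could be sketched as below. It assumes a plain 8-bit additive checksum and a memory-readable block; the `bg_check_*` names are hypothetical, and a CRC update could replace the addition without changing the structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *base;  /* start of the block being checked */
    size_t length;        /* number of bytes in the block */
    uint8_t expected;     /* reference checksum for the block */
    size_t index;         /* next location to accumulate */
    uint8_t sum;          /* running sum for the current pass */
    bool fault;           /* latched once a pass fails */
} bg_check_t;

void bg_check_init(bg_check_t *c, const uint8_t *base, size_t length,
                   uint8_t expected)
{
    assert(c != NULL && base != NULL && length > 0);
    c->base = base;
    c->length = length;
    c->expected = expected;
    c->index = 0;
    c->sum = 0;
    c->fault = false;
}

/* Call once per idle-loop pass or tick: handles exactly one byte.
   Returns true when a full pass over the block has just completed. */
bool bg_check_step(bg_check_t *c)
{
    assert(c != NULL);

    c->sum = (uint8_t)(c->sum + c->base[c->index]);
    c->index++;
    if (c->index < c->length) {
        return false;
    }

    /* End of block: compare, latch any mismatch, restart the pass. */
    if (c->sum != c->expected) {
        c->fault = true;
    }
    c->index = 0;
    c->sum = 0;
    return true;
}
```

Each call costs one read, one add and one compare, so the load is small and fixed; the trade-off is that a fault is only noticed at the end of the pass in which it occurs.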
On Thu, 08 Jul 2004 09:46:30 +0300, Paul Keinanen <keinanen@sci.fi>
wrote:

>On Wed, 07 Jul 2004 23:10:59 GMT, postmaster@noname.com (Ken Lee) >wrote: > >>Sorry but this requirement just has that particular pattern as a >>MIPs-chewer -- "repetitive calculation on a block of memory". Why >>can't this requirement be met on demand -- that is, the checksum is >>calculated when the data is read and used? > >If you are using some sort of multitasking kernel, which usually >contains a null task (running an idle loop), which executes when no >other task is runnable, simply put the memory check into this null >task. > >Paul
I'm sure that there are a multitude of ways to implement this, but that wasn't my point. I was questioning why this requirement has to be done continuously, as opposed to when it's needed, which is when the value is actually read.

Ken
On Sun, 11 Jul 2004 22:55:13 GMT, postmaster@noname.com (Ken Lee)
wrote:


>I'm sure that there are a multitude of ways to implement this but that >wasn't my point. I was making an observation as to why this >requirement had to be done continuously, opposed when it's needed & >that's when the value is actually read. >
Suppose the device is the navigation system for an airplane, and you haven't taken off yet. Wouldn't you like to know whether you could rely on it once you're in the air? -- Jim McGinnis
In fact, that is one of the tasks that does run as part of
the normal frame in our embedded avionics software for
some of the boxes we build for airplanes! Yes, we do
want to know if things are corrupted.

--
Mike "mikey" Fields
http://home.comcast.net/~mike.fields/
outgoing email scanned by Norton Antivirus ... is that good ?

Linux users brag on how long their system stays up,
Window users assume it's a temporary condition ...

