Continuous eeprom checksum microcontroller | page 3

Reply by Ken Lee ● July 6, 2004

On Sat, 03 Jul 2004 20:45:34 +0100, "Paul E. Bennett"
<peb@amleth.demon.co.uk> wrote:

>Vishal wrote:
>
>> Hi, Anybody come across continuous background checksum tests on
>> eeprom?? Is it worth doing ??
>
>If the integrity requirements for your system indicate that such checking 
>is useful then it is definitely worth doing. As some others have indicated, 
>it can catch problems with bus addressing and/or data shared pathways. The 
>most difficult thing about deciding to use continuous eeprom checksum is 
>what you need to do if you discover a problem.

I was thinking the same thing. I presume some analysis was performed &
the eeprom checksum is a mitigation of some fault or hazard. Otherwise
what do you do when a fault is detected? Initialise to default values?
Log an error & continue? Halt the device? Reset the device? 

Also is a "checksum" adequate or should a CRC be calculated? 
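To make the checksum-versus-CRC question concrete, here is a small C sketch (not from any poster) comparing an 8-bit additive checksum with CRC-16-CCITT (polynomial 0x1021, initial value 0xFFFF). That CRC variant is an assumption, picked because it is common and cheap to compute bit by bit; the thread does not name one. A plain sum cannot tell two bytes apart when they swap places, whereas the CRC can tell the two orders apart.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-bit additive checksum: sum of all bytes, modulo 256. */
uint8_t sum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + data[i]);
    return sum;
}

/* CRC-16-CCITT: polynomial 0x1021, initial value 0xFFFF, no reflection.
 * Bitwise form: slower than a 256-entry table, but needs no ROM table. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

With `{0x12, 0x34}` and `{0x34, 0x12}`, `sum8` returns 0x46 for both, while `crc16_ccitt` returns different values. The usual check value for the ASCII string "123456789" with these CRC parameters is 0x29B1.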

Performing continuous eeprom checks could chew up considerable MIPs,
so I wouldn't do it unless I had cause for good reason. Some good
design practices employ minimal resource usage -- I wouldn't put this
one into that category.

Ken.

+====================================+
I hate junk email. Please direct any 
genuine email to: kenlee at hotpop.com

Reply by Spehro Pefhany ● July 6, 2004

On Tue, 06 Jul 2004 23:09:36 GMT, the renowned postmaster@noname.com
(Ken Lee) wrote:

>On Sat, 03 Jul 2004 20:45:34 +0100, "Paul E. Bennett"
><peb@amleth.demon.co.uk> wrote:
>
>>Vishal wrote:
>>
>>> Hi, Anybody come across continuous background checksum tests on
>>> eeprom?? Is it worth doing ??
>>
>>If the integrity requirements for your system indicate that such checking 
>>is useful then it is definitely worth doing. As some others have indicated, 
>>it can catch problems with bus addressing and/or data shared pathways. The 
>>most difficult thing about deciding to use continuous eeprom checksum is 
>>what you need to do if you discover a problem.
>
>I was thinking the same thing. I presume some analysis was performed &
>the eeprom checksum is a mitigation of some fault or hazard. Otherwise
>what do you do when a fault is detected? Initialise to default values?
>Log an error & continue? Halt the device? Reset the device? 

Attempt to salvage correct value, attempt to repair at an appropriate
time in my case.  

>Also is a "checksum" adequate or should a CRC be calculated? 
>
>Performing continuous eeprom checks could chew up considerable MIPs,

It could not, too. 

>so I wouldn't do it unless I had cause for good reason. 

Yes. 

>Some good
>design practices employ minimal resource usage -- I wouldn't put this
>one into that category.
>
>Ken.

Depends on the other specifications. The minimum resource usage to
meet ALL the specifications, right? 

Best regards, 
Spehro Pefhany
-- 
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog  Info for designers:  http://www.speff.com

Reply by Ken Lee ● July 6, 2004

On 5 Jul 2004 21:18:06 -0700, vishalpatil_89@yahoo.co.in (Vishal)
wrote:

>OK, I will rephrase the question. Is the register area of a micro more
>reliable than that of static RAM? I got this guideline doc
>(requirement doc as they call it)
>which asks for reg refresh. Since the document is quite old, I was
>wondering if it makes sense to do reg refresh.
>
>Thanks.

If it is at all possible to corrupt micro registers via external means
(that is, not by a software-related fault) then I wouldn't be
attempting to mitigate the fault, since God knows what other aspects of
the micro would be questionable. Instead I would mitigate against the
resultant hazard. For instance, if the micro is controlling the
transmitted output power of the device (as for a CAT device) then I
would be looking at putting a power limiter in the hardware.

Also if stored values need to be validated by some means then an
appropriate software architecture should be adopted. Possibly you
could adopt a scheme to CRC or checksum critically stored parameters
before use, rather than performing continuous refreshes -- just a
thought.
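Validating critical parameters each time they are used, instead of refreshing them continuously, might look like the C sketch below. It assumes the parameters sit in memory-mapped EEPROM (simulated here by an ordinary struct), protects them with CRC-16-CCITT, and substitutes a known-safe default set when the check fails. The `params_t` layout, field names and default values are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block of critical parameters as stored in EEPROM. */
typedef struct {
    uint16_t max_power;   /* example parameter */
    uint16_t trip_level;  /* example parameter */
    uint16_t crc;         /* CRC-16-CCITT over the fields above */
} params_t;

/* Hypothetical safe defaults used when the stored copy is bad. */
static const params_t param_defaults = { 100u, 50u, 0u };

static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* CRC over every byte that precedes the crc field. */
static uint16_t params_crc(const params_t *p)
{
    return crc16_ccitt((const uint8_t *)p, offsetof(params_t, crc));
}

/* Recompute and store the CRC after the parameters are changed. */
void params_seal(params_t *p)
{
    p->crc = params_crc(p);
}

/* Read the stored record once, validate the copy, and only then use it.
 * Returns 1 if the stored record was valid, 0 if defaults were substituted. */
int params_load(const params_t *stored, params_t *out)
{
    params_t tmp;
    memcpy(&tmp, stored, sizeof tmp);
    if (params_crc(&tmp) == tmp.crc) {
        *out = tmp;
        return 1;
    }
    *out = param_defaults;
    params_seal(out);
    return 0;
}
```

Copying the record before checking it means the value that was validated is the value that gets used, even if the EEPROM contents change between the check and the use.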

The fact that you are even discussing such issues suggests to me
that a proper hazard analysis needs to be performed on the system to
determine where you stand and to get a handle on the type of
mitigations you need to put in place.

Ken.

>
>
>Anton Erasmus <nobody@nowhere.net> wrote in message news:<o7ffe0tdpfv90820f3cfj4fsei440o7co9@4ax.com>...
>> On 2 Jul 2004 21:34:16 -0700, vishalpatil_89@yahoo.co.in (Vishal)
>> wrote:
>> 
>> >Hi, Anybody come across continuous background checksum tests on
>> >eeprom?? Is it worth doing ??
>> 
>> Hi,
>> 
>> The most important thing when asking if any sort of error checking is
>> worthwhile is what you do if the answer is that there is a problem.
>> If you cannot do something practical/realistic with the answer, then
>> do not bother.
>> Anything that can actually be checked and realistically be used to
>> improve reliability is worthwhile.
>> 
>> Regards
>>    Anton Erasmus

Reply by Ben Bradley ● July 6, 2004

On Tue, 06 Jul 2004 23:09:36 GMT, postmaster@noname.com (Ken Lee)
wrote:

>On Sat, 03 Jul 2004 20:45:34 +0100, "Paul E. Bennett"
><peb@amleth.demon.co.uk> wrote:
>
>>Vishal wrote:
>>
>>> Hi, Anybody come across continuous background checksum tests on
>>> eeprom?? Is it worth doing ??
>>
>>If the integrity requirements for your system indicate that such checking 
>>is useful then it is definitely worth doing. As some others have indicated, 
>>it can catch problems with bus addressing and/or data shared pathways. The 
>>most difficult thing about deciding to use continuous eeprom checksum is 
>>what you need to do if you discover a problem.
>
>I was thinking the same thing. I presume some analysis was performed &
>the eeprom checksum is a mitigation of some fault or hazard. Otherwise
>what do you do when a fault is detected? Initialise to default values?
>Log an error & continue? Halt the device? Reset the device? 
>
>Also is a "checksum" adequate or should a CRC be calculated? 
>
>Performing continuous eeprom checks could chew up considerable MIPs,
>so I wouldn't do it unless I had cause for good reason. Some good
>design practices employ minimal resource usage -- I wouldn't put this
>one into that category.

   It really depends on how often it needs to be performed. If you
have a routine called in a main loop or tick interrupt that
accumulates a location, increments a pointer (and if at end, does the
compare and resets the pointer) and returns, it will use very few
resources and be done in seconds or minutes.
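
The incremental scheme above, one byte per call from the main loop or a tick interrupt, could be sketched in C as follows. The 16-bit additive sum and the `bg_check_t` names are assumptions for illustration; a CRC can be accumulated one byte at a time in exactly the same way. As a rough figure, at one byte per 1 ms tick a hypothetical 2 KB region is covered in about 2 s (2048 × 1 ms).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *base;  /* start of the checked region (e.g. EEPROM) */
    size_t len;           /* region length in bytes, must be > 0 */
    uint16_t expected;    /* reference sum, computed when the data was written */
    size_t pos;           /* next byte to accumulate */
    uint16_t acc;         /* running sum for the current pass */
    int fault;            /* latched once any pass ends with a mismatch */
} bg_check_t;

/* Full 16-bit additive sum, used to compute the reference value. */
uint16_t sum16(const uint8_t *data, size_t len)
{
    uint16_t s = 0;
    for (size_t i = 0; i < len; i++)
        s = (uint16_t)(s + data[i]);
    return s;
}

void bg_check_init(bg_check_t *c, const uint8_t *base, size_t len,
                   uint16_t expected)
{
    c->base = base;
    c->len = len;
    c->expected = expected;
    c->pos = 0;
    c->acc = 0;
    c->fault = 0;
}

/* Call once per main-loop pass or tick. Accumulates one byte; at the end of
 * the region it compares, restarts, and returns 1. Otherwise returns 0. */
int bg_check_step(bg_check_t *c)
{
    c->acc = (uint16_t)(c->acc + c->base[c->pos]);
    if (++c->pos < c->len)
        return 0;
    if (c->acc != c->expected)
        c->fault = 1;
    c->pos = 0;
    c->acc = 0;
    return 1;
}
```

If the region is legitimately rewritten, `expected` has to be updated and the pass restarted; otherwise the next completed pass reports a false fault.
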
   From reading followup posts, the OP seems to want to do this for
the more general purpose of system integrity. I'm not sure this is the
best (and certainly not the only) way to validate system integrity. As
any recent c.a.e reader knows, we've had a few heated threads recently
regarding devices intended to increase system integrity. Some things
are a lot better than others in increasing, validating or ensuring
system integrity.

Quoting an earlier response in the thread:

>Hi, one more (I am getting lazy, can't dig it out on my own ;) )
>One of the generic requirements from my customer asks for a register
>refresh on a continuous basis again. I am using a hc12 micro. My question is
>the same... is it worth it?? Never implemented such a thing
>before...

   It seems to me your question is "I am using a hc12 micro. What can
I do to make it as reliable as reasonably possible?"

   As for the "register refresh" he may be referring to output
registers and data direction registers. Electrical spikes can cause
the bits in these registers to change states, so it makes sense to
refresh them regularly. But why does your customer ask this? It seems
that as the designer, you should be making these decisions. Is your
customer micromanaging you? Or is your customer a governmental agency
and these are "required specs"?
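
One common way to implement that kind of register refresh is to keep RAM shadow copies of the intended output and data direction values and rewrite the hardware registers from them periodically. The C sketch below assumes that approach; the `io_regs_t` struct is a stand-in for the real HC12 port registers (which would normally come from the part's header file) so that the sketch runs anywhere.

```c
#include <stdint.h>

/* Stand-ins for one port's memory-mapped registers (assumption). */
typedef struct {
    volatile uint8_t port;  /* output data register */
    volatile uint8_t ddr;   /* data direction register, 1 = output */
} io_regs_t;

/* What the software intends the registers to hold. */
typedef struct {
    uint8_t port;
    uint8_t ddr;
} io_shadow_t;

/* Route every output write through here so the shadow always reflects intent. */
void io_write_port(io_regs_t *r, io_shadow_t *s, uint8_t value)
{
    s->port = value;
    r->port = value;
}

/* Periodic refresh. Returns 1 if either register had drifted, which is
 * worth counting or logging. Only output bits of the data register are
 * compared, because on many parts reading it returns pin levels for
 * input bits (check the datasheet). */
int io_refresh(io_regs_t *r, const io_shadow_t *s)
{
    int disturbed = (r->ddr != s->ddr) ||
                    (((r->port ^ s->port) & s->ddr) != 0);
    r->port = s->port;  /* data first, so a pin switched back to output */
    r->ddr = s->ddr;    /* immediately drives the intended level */
    return disturbed;
}
```

The refresh only restores what the software already knows it wants; it does nothing for a corrupted CPU register or RAM location.
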
   But if a spike can change an I/O register, then it can change any
other read/write bit on the silicon, such as a CPU register or RAM
location. These can't be "refreshed" because you don't know what
values they should be. The solution to this, in addition to the
above CRCs and refreshing, is to reduce the effect of a spike so it's
much less likely to affect the controller: change layout, add bypass
caps, add diodes and resistors on I/O's and such.
   So what happens if an I/O port is the wrong value? Could it do
something dangerous? Could it lose valuable data? What should the
thing do? Reset? Save the fault state in an EEPROM, light a special
"ERROR" LED and stop? (I rather like that, as the fault state is
"valuable data" to the designer) All this is dictated by the
application.

Reply by Paul E. Bennett ● July 7, 2004

Ben Bradley wrote:

> On Tue, 06 Jul 2004 23:09:36 GMT, postmaster@noname.com (Ken Lee)
> wrote:

[%X]

>>I was thinking the same thing. I presume some analysis was performed &
>>the eeprom checksum is a mitigation of some fault or hazard. Otherwise
>>what do you do when a fault is detected? Initialise to default values?
>>Log an error & continue? Halt the device? Reset the device?
>>
>>Also is a "checksum" adequate or should a CRC be calculated?
>>
>>Performing continuous eeprom checks could chew up considerable MIPs,
>>so I wouldn't do it unless I had cause for good reason. Some good
>>design practices employ minimal resource usage -- I wouldn't put this
>>one into that category.

As I stated previously, only the risk assessment process for the system in 
its intended environments can guide the mitigating measures required in the 
system. Performing the risk analysis can take as long as, or longer than,
doing the system design.

>    It really depends on how often it needs to be performed. If you
> have a routine called in a main loop or tick interrupt that
> accumulates a location, increments a pointer (and if at end, does the
> compare and resets the pointer) and returns, it will use very few
> resources and be done in seconds or minutes.
>    From reading followup posts, the OP seems to want to do this for
> the more general purpose of system integrity. I'm not sure this is the
> best (and certainly not the only) way to validate system integrity. As
> any recent c.a.e reader knows, we've had a few heated threads recently
> regarding devices intended to increase system integrity. Some things
> are a lot better than others in increasing, validating or insuring
> system integrity.

The difference between general-purpose controller systems and high-integrity
controller systems should, nowadays, be reflected less in the hardware.
The difference is more likely to be manifested in the software techniques
used and the overall system integration arrangements. Whatever errors you
find in the system, you still must have some plan of action for the
controller to follow, even if it is just to raise an alarm and turn itself
off. Naturally, when errors are encountered they should be logged somewhere
so that the engineers/operations staff can identify what happened (in
sequence, hopefully).
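
A minimal sketch of such a sequenced error log: a fixed-size ring buffer in which each entry carries a monotonically increasing sequence number, so the order of events survives even after older entries are overwritten. Keeping it in RAM, the eight-entry size and the field names are all assumptions; a real controller would probably copy it to EEPROM or another non-volatile store.

```c
#include <stddef.h>
#include <stdint.h>

#define LOG_SLOTS 8u  /* hypothetical capacity */

typedef struct {
    uint32_t seq;     /* sequence number: preserves the order of events */
    uint16_t code;    /* application-defined error code */
    uint16_t detail;  /* e.g. the address or register that failed */
} log_entry_t;

typedef struct {
    log_entry_t entry[LOG_SLOTS];
    uint32_t next_seq;  /* also the total number of errors ever logged */
} error_log_t;

/* Record an error, overwriting the oldest entry once the buffer is full. */
void log_error(error_log_t *log, uint16_t code, uint16_t detail)
{
    log_entry_t *e = &log->entry[log->next_seq % LOG_SLOTS];
    e->seq = log->next_seq;
    e->code = code;
    e->detail = detail;
    log->next_seq++;
}

/* Return the n-th most recent entry (0 = newest), or NULL if it was never
 * logged or has already been overwritten. */
const log_entry_t *log_recent(const error_log_t *log, uint32_t n)
{
    if (n >= log->next_seq || n >= LOG_SLOTS)
        return NULL;
    return &log->entry[(log->next_seq - 1u - n) % LOG_SLOTS];
}
```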

> Quoting an earlier response in the thread:
> 
>>Hi, one more (I am getting lazy, can't dig it out on my own ;) )
>>One of the generic requirements from my customer asks for a register
>>refresh on a continuous basis again. I am using a hc12 micro. My question is
>>the same... is it worth it?? Never implemented such a thing
>>before...
> 
>    It seems to me your question is "I am using a hc12 micro. What can
> I do to make it as reliable as reasonably possible?"

Most likely that is what he is asking, but he is also trying to follow a
document he has been given by his client (who may just have cherry-picked
some supposedly useful phrases that have been applied to other systems
without understanding the implications of what they are asking). The OP
needs to explore this with his client from the basis of a good grounding
in defensive programming techniques and their overall value to system
integrity. I would require a significant amount of information from the
OP to be able to assist him to that level from where he seems to be.

>    As for the "register refresh" he may be referring to output
> registers and data direction registers. Electrical spikes can cause
> the bits in these registers to change states, so it makes sense to
> refresh them regularly. But why does your customer ask this? It seems
> that as the designer, you should be making these decisions. Is your
> customer micromanaging you? Or is your customer a governmental agency
> and these are "required specs"?

The general rule for registers that are relied upon for output is that 
their state should be refreshed at the successful completion of each 
control loop cycle based on the evidence from the real system state as 
represented by the inputs (INPUT-->PROCESS-->OUTPUT). If the path to the 
end of the control loop is not completed successfully then you may need to 
set a default output pattern that has been determined to be safe (I try and 
make mine all outputs off if I can - not always possible).
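
That INPUT-->PROCESS-->OUTPUT rule might be sketched in C as follows: the output register is rewritten at the end of every cycle, with the freshly computed pattern when the cycle succeeded and with a predetermined safe pattern when it did not. The all-outputs-off safe value mirrors the preference stated above; the plausibility rule in `compute_outputs` and the function names are hypothetical.

```c
#include <stdint.h>

#define SAFE_OUTPUTS 0x00u  /* all outputs off, assumed safe here */

/* PROCESS step. Returns 0 on success, nonzero if the inputs are implausible.
 * Hypothetical rule: treat 0xFF as a wiring fault, otherwise drive the
 * low nibble of the inputs. */
static int compute_outputs(uint8_t inputs, uint8_t *outputs)
{
    if (inputs == 0xFFu)
        return -1;
    *outputs = (uint8_t)(inputs & 0x0Fu);
    return 0;
}

/* One control cycle. inputs_ok reports whether the INPUT step succeeded.
 * The output register is written on every path, so a register disturbed
 * since the last cycle is corrected each time round the loop. */
uint8_t control_cycle(volatile uint8_t *out_reg, int inputs_ok, uint8_t inputs)
{
    uint8_t out = SAFE_OUTPUTS;
    if (!inputs_ok || compute_outputs(inputs, &out) != 0)
        out = SAFE_OUTPUTS;
    *out_reg = out;
    return out;
}
```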

>    But if a spike can change an I/O register, then it can change any
> other read/write bit on the silicon, such as a CPU register or RAM
> location. These can't be "refreshed" because you don't know what
> values they should be. The solution to this, in addition to the
> above CRCs and refreshing, is to reduce the effect of a spike so it's
> much less likely to affect the controller: change layout, add bypass
> caps, add diodes and resistors on I/O's and such.
>    So what happens if an I/O port is the wrong value? Could it do

Systems that are easily susceptible to the effects of spikes from PSU, ESD
or RFI need the hardware design looking at. Decent layout, adequate 
decoupling, filtering, shielding and sensible arrangement of ground 
circuits will all have beneficial effects on the system. 

> something dangerous? Could it lose valuable data? What should the
> thing do? Reset? Save the fault state in an EEPROM, light a special
> "ERROR" LED and stop? (I rather like that, as the fault state is
> "valuable data" to the designer) All this is dictated by the
> application.

It just takes a little up-front thinking to eliminate many of the problems 
that can arise for a system design. No-one needs to rush to dash out a 
system design on receipt of the requirements.

-- 
********************************************************************
Paul E. Bennett ....................<email://peb@a...>
Forth based HIDECS Consultancy .....<http://www.amleth.demon.co.uk/>
Mob: +44 (0)7811-639972 .........NOW AVAILABLE:- HIDECS COURSE......
Tel: +44 (0)1235-811095 .... see http://www.feabhas.com for details.
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************

Reply by Ken Lee ● July 7, 2004

On Tue, 06 Jul 2004 23:34:14 GMT, Spehro Pefhany
<speffSNIP@interlogDOTyou.knowwhat> wrote:

>On Tue, 06 Jul 2004 23:09:36 GMT, the renowned postmaster@noname.com
>(Ken Lee) wrote:
>
>>On Sat, 03 Jul 2004 20:45:34 +0100, "Paul E. Bennett"
>><peb@amleth.demon.co.uk> wrote:
>>
>>>Vishal wrote:
>>>
>>>> Hi, Anybody come across continuous background checksum tests on
>>>> eeprom?? Is it worth doing ??
>>>
>>>If the integrity requirements for your system indicate that such checking 
>>>is useful then it is definitely worth doing. As some others have indicated, 
>>>it can catch problems with bus addressing and/or data shared pathways. The 
>>>most difficult thing about deciding to use continuous eeprom checksum is 
>>>what you need to do if you discover a problem.
>>
>>I was thinking the same thing. I presume some analysis was performed &
>>the eeprom checksum is a mitigation of some fault or hazard. Otherwise
>>what do you do when a fault is detected? Initialise to default values?
>>Log an error & continue? Halt the device? Reset the device? 
>
>Attempt to salvage correct value, attempt to repair at an appropriate
>time in my case.  

What does "salvage" really mean? If a checksum is performed on a block
of memory & it's wrong then one could take this to mean that one or more
of the contents are incorrect. If a checksum is performed on a
single item and it's wrong, then all you can deduce from this is that
the value is incorrect. Unless you keep a mirror image of the data you
cannot "salvage" the correct value. Possibly the only things one could
do are fall back to some "safe" or default value, reset the device or
place the device in an error state.
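
The mirror-image idea can be illustrated with a short C sketch: two copies of a record, each carrying its own checksum, are compared so that a single bad copy can be rebuilt from the good one. This is one way a "salvage, then repair" approach could work, not necessarily the one Spehro uses. The record size, the two's-complement checksum and the rule of trusting the primary when both copies are valid but different (which assumes the primary is always written first) are assumptions; writes here go to RAM, whereas real EEPROM would need the part's programming sequence.

```c
#include <stdint.h>
#include <string.h>

#define REC_LEN 4u  /* hypothetical payload size in bytes */

typedef struct {
    uint8_t data[REC_LEN];
    uint8_t check;  /* chosen so that all bytes plus check sum to 0 mod 256 */
} record_t;

static uint8_t rec_sum(const record_t *r)
{
    uint8_t s = r->check;
    for (unsigned i = 0; i < REC_LEN; i++)
        s = (uint8_t)(s + r->data[i]);
    return s;
}

static int rec_ok(const record_t *r)
{
    return rec_sum(r) == 0;
}

/* Set the check byte after changing the data. */
void rec_seal(record_t *r)
{
    r->check = 0;
    r->check = (uint8_t)(0u - rec_sum(r));
}

/* Returns 0 if both copies are good and identical, 1 if one copy was
 * repaired from the other, and -1 if neither can be trusted (the caller
 * then falls back to defaults or an error state). */
int rec_salvage(record_t *primary, record_t *mirror)
{
    int p_ok = rec_ok(primary);
    int m_ok = rec_ok(mirror);

    if (p_ok && m_ok) {
        if (memcmp(primary, mirror, sizeof *primary) == 0)
            return 0;
        *mirror = *primary;  /* interrupted update: primary written first */
        return 1;
    }
    if (p_ok) {
        *mirror = *primary;
        return 1;
    }
    if (m_ok) {
        *primary = *mirror;
        return 1;
    }
    return -1;
}
```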

>
>>Also is a "checksum" adequate or should a CRC be calculated? 
>>
>>Performing continuous eeprom checks could chew up considerable MIPs,
>
>It could not, too. 

Sorry, but this requirement just has that particular pattern of a
MIPs-chewer -- "repetitive calculation on a block of memory". Why
can't this requirement be met on demand -- that is, the checksum is
calculated when the data is read and used?

>
>>so I wouldn't do it unless I had cause for good reason. 
>
>Yes. 
>
>>Some good
>>design practices employ minimal resource usage -- I wouldn't put this
>>one into that category.
>>
>>Ken.
>
>Depends on the other specifications. The minimum resource usage to
>meet ALL the specifications, right? 

I've no argument on resource budgeting for input requirements, but
this particular requirement looks like a mitigation for some fault or
hazard. I've no problem with checksumming or CRCing stored data, but
doing it on a continual basis seems to me to be ill-formulated.
Admittedly I don't know what the application is, but I'm in the
medical electronics game and have worked in the automotive industry,
and am familiar with over-burdened mitigations.

Ken.

Reply by Paul Keinanen ● July 8, 2004

On Wed, 07 Jul 2004 23:10:59 GMT, postmaster@noname.com (Ken Lee)
wrote:

>Sorry but this requirement just has that particular pattern as a
>MIPs-chewer --  "repetitive calculation on a block of memory". Why
>can't this requirement be met on demand -- that is, the checksum is
>calculated when the data is read and used?

If you are using some sort of multitasking kernel, which usually
contains a null task (running an idle loop), which executes when no
other task is runnable, simply put the memory check into this null
task.

Paul

Reply by Ken Lee ● July 11, 2004

On Thu, 08 Jul 2004 09:46:30 +0300, Paul Keinanen <keinanen@sci.fi>
wrote:

>On Wed, 07 Jul 2004 23:10:59 GMT, postmaster@noname.com (Ken Lee)
>wrote:
>
>>Sorry but this requirement just has that particular pattern as a
>>MIPs-chewer --  "repetitive calculation on a block of memory". Why
>>can't this requirement be met on demand -- that is, the checksum is
>>calculated when the data is read and used?
>
>If you are using some sort of multitasking kernel, which usually
>contains a null task (running an idle loop), which executes when no
>other task is runnable, simply put the memory check into this null
>task.
>
>Paul

I'm sure that there are a multitude of ways to implement this but that
wasn't my point. I was making an observation as to why this
requirement had to be done continuously, as opposed to when it's needed &
that's when the value is actually read.

Ken

Reply by Jim McGinnis ● July 11, 2004

On Sun, 11 Jul 2004 22:55:13 GMT, postmaster@noname.com (Ken Lee)
wrote:

>I'm sure that there are a multitude of ways to implement this but that
>wasn't my point. I was making an observation as to why this
>requirement had to be done continuously, as opposed to when it's needed &
>that's when the value is actually read.
>

Suppose the device is the navigation system for an airplane, and you
haven't taken off yet. Wouldn't you like to know whether you could
rely on it once you're in the air?

-- 
Jim McGinnis

Reply by Mike Fields ● July 12, 2004

In fact, that is one of the tasks that does run as part of
the normal frame in our embedded avionics software for
some of the boxes we build for airplanes !!  Yes, we do
want to know if things are corrupted.

--
Mike "mikey" Fields
http://home.comcast.net/~mike.fields/
outgoing email scanned by Norton Antivirus ... is that good ?

Linux users brag on how long their system stays up,
Window users assume it's a temporary condition ...


"Jim McGinnis" <remove_this.mcginnis@and_this.ieee.org> wrote in message
news:fjs3f0tkq1ifs979oj8h1ipd16fallvpq8@4ax.com...
> On Sun, 11 Jul 2004 22:55:13 GMT, postmaster@noname.com (Ken Lee)
> wrote:
>
>
> >I'm sure that there are a multitude of ways to implement this but that
> >wasn't my point. I was making an observation as to why this
> >requirement had to be done continuously, as opposed to when it's needed &
> >that's when the value is actually read.
> >
>
> Suppose the device is the navigation system for an airplane, and you
> haven't taken off yet. Wouldn't you like to know whether you could
> rely on it once you're in the air?
>
> --
> Jim McGinnis
