Continuous eeprom checksum microcontroller

Reply by Ken Lee ● July 14, 2004

On Sun, 11 Jul 2004 22:12:58 -0400, Jim McGinnis
<remove_this.mcginnis@and_this.ieee.org> wrote:

>On Sun, 11 Jul 2004 22:55:13 GMT, postmaster@noname.com (Ken Lee)
>wrote:
>
>
>>I'm sure that there are a multitude of ways to implement this but that
>>wasn't my point. I was making an observation as to why this
>>requirement had to be done continuously, as opposed to when it's
>>needed, and that's when the value is actually read.
>>
>
>Suppose the device is the navigation system for an airplane, and you
>haven't taken off yet. Wouldn't you like to know whether you could
>rely on it once you're in the air?
>
>-- 
>Jim McGinnis

So you're implying that they don't do a system check of the navigation
system on the ground before take-off. I'm assuming that the navigation
system is turned on prior to the plane getting into the air.

Let me be perfectly clear -- I'm NOT saying that the check shouldn't
be done. I'm objecting to the fact that it is done "continuously"
rather than on-demand.

Ken.

+====================================+
I hate junk email. Please direct any 
genuine email to: kenlee at hotpop.com

Reply by Frank Bemelman ● July 15, 2004

"Jim McGinnis" <remove_this.mcginnis@and_this.ieee.org> schreef in bericht
news:fjs3f0tkq1ifs979oj8h1ipd16fallvpq8@4ax.com...
> On Sun, 11 Jul 2004 22:55:13 GMT, postmaster@noname.com (Ken Lee)
> wrote:
>
>
> >I'm sure that there are a multitude of ways to implement this but that
> >wasn't my point. I was making an observation as to why this
> >requirement had to be done continuously, as opposed to when it's
> >needed, and that's when the value is actually read.
> >
>
> Suppose the device is the navigation system for an airplane, and you
> haven't taken off yet. Wouldn't you like to know whether you could
> rely on it once you're in the air?

Oh, not again. Why not a controller for a nuclear power plant, an
on-board engine controller for a satellite in orbit, a laser for
eye corrections, a heart-lung machine in an OR, a launcher for
H-bombs...

The bottom line is that folks that implement it, probably need it.
Either because their hardware is crap or they are paranoid or it
is a requirement from some retard, leaving 0.01% of applications
where it really serves a purpose.

-- 
Thanks, Frank.
(remove 'x' and 'invalid' when replying by email)

Reply by Guy Macon ● July 15, 2004

Frank Bemelman <f.bemelmanx@xs4all.invalid.nl> says...
>
>Oh, not again. Why not a controller for a nuclear power plant, an
>on-board engine controller for a satellite in orbit, a laser for
>eye corrections, a heart-lung machine in an OR, a launcher for
>H-bombs...
>
>The bottom line is that folks that implement it, probably need it.

I don't think so.

>Either because their hardware is crap

But why would they imagine that the EEPROM hardware is crap while
having faith that the "do the checksum" hardware and the "store the
checksum to compare with the EEPROM" hardware will work just fine?

>or they are paranoid 

But why would they be paranoid about the EEPROM hardware and not the
EEPROM checking hardware? 

>or it is a requirement from some retard,

Possibly, but if you develop embedded systems, it's your *job* to
identify flaws in the requirements.  If it isn't needed and the 
customer insists on having it you have to make the personal choice 
of whether to do it or find work elsewhere.

Reply by Paul Keinanen ● July 15, 2004

On Sun, 11 Jul 2004 22:12:58 -0400, Jim McGinnis
<remove_this.mcginnis@and_this.ieee.org> wrote:

>On Sun, 11 Jul 2004 22:55:13 GMT, postmaster@noname.com (Ken Lee)
>wrote:
>

>>I was making an observation as to why this
>>requirement had to be done continuously, as opposed to when it's
>>needed, and that's when the value is actually read.

>Suppose the device is the navigation system for an airplane, and you
>haven't taken off yet. Wouldn't you like to know whether you could
>rely on it once you're in the air?

With avionics, it should be noted that at 10 km in the polar cap
areas, the radiation level is higher than elsewhere, so it is a good
idea to do continuous checks if your device might move in those areas.
I don't know if the South Atlantic Anomaly will increase the radiation
levels at 10 km significantly, but at least in low orbit satellites,
there is a significant increase in the radiation levels.

Paul

Reply by Guy Macon ● July 15, 2004

Paul Keinanen <keinanen@sci.fi> says...

>With avionics, it should be noted that at 10 km in the polar cap
>areas, the radiation level is higher than elsewhere, so it is a good
>idea to do continuous checks if your device might move in those areas.

Again, I have seen no evidence that the sum-checker is more reliable
than the EEPROM being checked. Everyone seems to be accepting that
it is, based on nothing more than blind faith.

Reply by Paul Keinanen ● July 15, 2004

On Thu, 15 Jul 2004 01:45:42 -0700, Guy Macon
<http://www.guymacon.com> wrote:

>Again, I have seen no evidence that the sum-checker is more reliable
>than the EEPROM being checked. Everyone seems to be accepting that
>it is, based on nothing more than blind faith.

Even if the checksum algorithm is executed directly out of the EEPROM
(which is not always the case), the surface area occupied by the
checker is very small compared to the total area of the EEPROM in most
cases. If there is a single (hard or soft) error in the EEPROM, the
likelihood is much greater that the error is in the other part of
the EEPROM than in the checker code itself, due to the area ratio.

The worst case is that there are error(s) in the EEPROM, but a bit
flip in the actual checker code will modify the program so that it
will return EEPROM OK, but the likelihood is still smaller. 
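
To put rough numbers on the area-ratio argument, here is a minimal
Python sketch. It assumes a single bit flip lands uniformly at random
anywhere in the memory, and the sizes (a 64-byte checker in an 8 KiB
part) are made up for illustration. checker_hit_probability is a
hypothetical helper name, not something from a real toolchain.

```python
def checker_hit_probability(checker_bytes: int, total_bytes: int) -> float:
    """Chance that one uniformly placed bit flip lands in the checker code."""
    if total_bytes <= 0 or not 0 <= checker_bytes <= total_bytes:
        raise ValueError("checker must fit inside a non-empty memory")
    return checker_bytes / total_bytes


# Hypothetical sizes: a 64-byte checksum routine stored in an 8 KiB memory.
p_checker = checker_hit_probability(64, 8 * 1024)
p_elsewhere = 1.0 - p_checker
print(f"flip lands in checker: {p_checker:.4f}, elsewhere: {p_elsewhere:.4f}")
```

Under that uniform assumption the ratio is just bytes over bytes, so the
argument only says where a flip is likely to land. It says nothing about
faults in the logic that reads and executes the checker.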

Then there is the different question, is it enough to be able to
detect only a single bit error or is detection of multiple errors
needed. If the errors appear randomly, it might be sufficient to be
able to detect only one or two errors if the checker is executed often
enough. After detection of the first error, the device should be taken
out of service.

However, if there is a great likelihood of multiple errors appearing
at once, e.g. when a highly energetic particle hits the box and creates a
shower of secondary particles hitting all over the EEPROM, you need an
algorithm that is able to detect multiple errors at once.
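
As a small illustration of why multiple simultaneous errors matter, the
Python sketch below compares a plain 8-bit additive checksum with
CRC-32 (via the standard zlib module) on a pair of errors that cancel
each other in the sum. The data bytes and the sum8 helper are invented
for the example. A CRC-32 is guaranteed to catch any error burst up to
32 bits long, but it does not catch every possible error pattern either.

```python
import zlib


def sum8(data: bytes) -> int:
    """Plain 8-bit additive checksum: sum of all bytes modulo 256."""
    return sum(data) & 0xFF


original = bytes([0x10, 0x20, 0x30, 0x40])
# Two errors at once: the first byte went up by one, the second down by one,
# so the arithmetic sum of the block is unchanged.
corrupted = bytes([0x11, 0x1F, 0x30, 0x40])

sum_detects = sum8(original) != sum8(corrupted)
crc_detects = zlib.crc32(original) != zlib.crc32(corrupted)
print("additive checksum detects the damage:", sum_detects)
print("CRC-32 detects the damage:", crc_detects)
```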

Paul

Reply by Frank Bemelman ● July 15, 2004

"Guy Macon" <http://www.guymacon.com> schreef in bericht
news:10fcen2cf7e7h3f@corp.supernews.com...
>
> Frank Bemelman <f.bemelmanx@xs4all.invalid.nl> says...
> >
> >Oh, not again. Why not a controller for a nuclear power plant, an
> >on-board engine controller for a satellite in orbit, a laser for
> >eye corrections, a heart-lung machine in an OR, a launcher for
> >H-bombs...
> >
> >The bottom line is that folks that implement it, probably need it.
>
> I don't think so.

They need it, but only from their point of view.

> >Either because their hardware is crap
>
> But why would they imagine that the EEPROM hardware is crap while
> having faith that the "do the checksum" hardware and the "store the
> checksum to compare with the EEPROM" hardware will work just fine?

I have no idea.

> >or they are paranoid
>
> But why would they be paranoid about the EEPROM hardware and not the
> EEPROM checking hardware?

Paranoid behaviour implies lack of understanding.

> >or it is a requirement from some retard,
>
> Possibly, but if you develop embedded systems, it's your *job* to
> identify flaws in the requirements.  If it isn't needed and the
> customer insists on having it you have to make the personal choice
> of whether to do it or find work elsewhere.

Oh, if someone insists on it, even after pointing out it isn't
very useful, why not. But most requirement flaws I ignore without
informing the person that wrote them (or simply copied them
from another project).


-- 
Thanks, Frank.
(remove 'x' and 'invalid' when replying by email)

Reply by Spehro Pefhany ● July 15, 2004

On Wed, 07 Jul 2004 23:10:59 GMT, the renowned postmaster@noname.com
(Ken Lee) wrote:

>On Tue, 06 Jul 2004 23:34:14 GMT, Spehro Pefhany
><speffSNIP@interlogDOTyou.knowwhat> wrote:

>
>What does "salvage" really mean? If a checksum is performed on a block
>of memory & it's wrong then one could take this to mean that 1 or any
>number of the contents are incorrect. If a checksum is performed on a
>single item and it's wrong, then all you can deduce from this is that
>the value is incorrect. Unless you keep a mirror image of the data you
>cannot "salvage" the correct value. Possibly the only things one could
>do is fall back to some "safe" or default value, reset the device or
>place the device in an error state.

It means I incorporate a lot (more than just a mirror) of redundancy
on important information, because a non-recoverable failure is very
expensive. Data integrity is more important than saving a few cents on
memory. The other options you mention are open if they are acceptable
in the application, of course. Some systems have no "safe" state (few
I work on), or there is an unpleasant choice such as a) test limit
controls, or b) cause $10,000 damage (100% certain).
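
One possible shape for that kind of redundancy, sketched in Python under
stated assumptions: each value is stored several times, every copy
carries its own CRC-32, and a read returns the first copy that verifies.
The 8-byte record layout and the names pack_record, unpack_record and
salvage are hypothetical. This is not a description of the scheme
actually used in the systems mentioned above.

```python
import struct
import zlib


def pack_record(value: int) -> bytes:
    """Encode a 32-bit value followed by the CRC-32 of its four bytes."""
    payload = struct.pack("<I", value)
    return payload + struct.pack("<I", zlib.crc32(payload))


def unpack_record(record: bytes):
    """Return the stored value if its CRC verifies, otherwise None."""
    if len(record) != 8:
        return None
    payload = record[:4]
    (stored_crc,) = struct.unpack("<I", record[4:])
    if zlib.crc32(payload) != stored_crc:
        return None
    return struct.unpack("<I", payload)[0]


def salvage(copies):
    """Return the value from the first copy that verifies, or None if none do."""
    for record in copies:
        value = unpack_record(record)
        if value is not None:
            return value
    return None


good = pack_record(1234)
damaged = bytes([good[0] ^ 0x01]) + good[1:]  # one flipped bit in the payload
print(salvage([damaged, damaged, good]))
print(salvage([damaged, damaged, damaged]))
```

If every copy fails, the caller still has to fall back to a default,
reset, or enter an error state, which is the case raised in the quoted
question.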

>>>Also is a "checksum" adequate or should a CRC be calculated? 
>>>
>>>Performing continuous eeprom checks could chew up considerable MIPs,
>>
>>It could not, too. 
>Sorry but this requirement just has that particular pattern as a
>MIPs-chewer --  "repetitive calculation on a block of memory". Why
>can't this requirement be met on demand -- that is, the checksum is
>calculated when the data is read and used?

Yes, although checking the entire memory every time a few very
frequently accessed locations are used might be quite unnecessarily
costly in bandwidth. But that's just implementation, and most anyone
here can figure ways around that. 

>I've no argument on resource budgeting for input requirements, but
>this particular requirement looks like a mitigation for some fault or
>hazard. I've no problem with checksumming or CRCing stored data, but
>doing it on a continual basis seems to me to be ill-formulated.
>Admittedly I don't know what the application is, but I'm in the
>medical electronics game and have worked in the automotive industry,
>and am familiar with over-burdened mitigations.
>
>Ken.

EEPROM is fundamentally different from RAM etc. because any errors
that arise (because of issues beyond the control of the engineer) will
persist indefinitely. They also wear out, and are fundamentally less
reliable than RAM due to the high dielectric stresses involved in
Fowler-Nordheim tunneling etc. (especially from re-writing). 

On frequency: as you say above, I don't think it's necessary to do it
more often than the information is accessed. ;-) More seriously, the
upper time limit is typically set by how long it takes the system to
get into trouble, worst-case. If it's a slow thermal system, then a
minute or ten minutes with worst-case outputs may be no big deal. 
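
A background check does not have to scan the whole memory in one go.
The Python sketch below simulates a scanner that processes one small
slice per main-loop tick, so the chunk size and tick rate can be chosen
to finish a pass within whatever time limit the system tolerates. It is
a simulation, not firmware: BackgroundScanner, ticks_per_pass and the
1 KiB image are hypothetical, and zlib.crc32 stands in for whatever
check the target would really use.

```python
import zlib


class BackgroundScanner:
    """Checks a memory image one slice at a time, one slice per tick."""

    def __init__(self, memory, expected_crc: int, chunk: int):
        if chunk <= 0:
            raise ValueError("chunk must be positive")
        self.memory = memory
        self.expected_crc = expected_crc
        self.chunk = chunk
        self.offset = 0
        self.running_crc = 0

    def tick(self):
        """Process one slice. Returns True or False when a pass ends, else None."""
        end = min(self.offset + self.chunk, len(self.memory))
        # zlib.crc32 accepts the previous result as a starting value.
        self.running_crc = zlib.crc32(self.memory[self.offset:end], self.running_crc)
        self.offset = end
        if self.offset < len(self.memory):
            return None
        ok = self.running_crc == self.expected_crc
        self.offset, self.running_crc = 0, 0  # start the next pass
        return ok


def ticks_per_pass(size: int, chunk: int) -> int:
    """Number of ticks needed to cover the whole memory once (ceiling division)."""
    return -(-size // chunk)


image = bytes(range(256)) * 4  # hypothetical 1 KiB memory image
scanner = BackgroundScanner(image, zlib.crc32(image), chunk=100)
results = [scanner.tick() for _ in range(ticks_per_pass(len(image), 100))]
print(results[-1])
```

With this design an error that appears just after its slice was scanned
is only reported at the end of the following pass, so the worst-case
detection time is close to two passes rather than one.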

Best regards, 
Spehro Pefhany
-- 
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog  Info for designers:  http://www.speff.com

Reply by Guy Macon ● July 15, 2004

Paul Keinanen <keinanen@sci.fi> says...
>
>Guy Macon <http://www.guymacon.com> wrote:
>
>>Again, I have seen no evidence that the sum-checker is more reliable
>>than the EEPROM being checked. Everyone seems to be accepting that
>>it is, based on nothing more than blind faith.
>
>Even if the checksum algorithm is executed directly out of the EEPROM
>(which is not always the case), the surface area occupied by the
>checker is very small compared to the total area of the EEPROM in most
>cases. If there is a single (hard or soft) error in the EEPROM, the
>likelihood is much greater that the error is in the other part of
>the EEPROM than in the checker code itself, due to the area ratio.
>The worst case is that there are error(s) in the EEPROM, but a bit
>flip in the actual checker code will modify the program so that it
>will return EEPROM OK, but the likelihood is still smaller. 

But the sum-checker is far more than just the place where the 
sum-checking code is stored.  It is also the electronics that 
reads the code, the ALU that executes the code, the registers
and RAM that the code uses, and so forth.  One would have to 
estimate the error rate of all of those parts of the uC and 
compare them to the error rate of the EEPROM.  Unless you do 
that, you have no idea whether your continuous sum-checker 
increases or decreases system reliability compared to an 
on-demand sum-checker or no sum-checker at all. 
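
A very simple way to frame that comparison is sketched below in Python.
The model is an assumption made for illustration, not something from
this thread: faults are treated as independent, the checker can miss a
real EEPROM error or raise a false alarm on good data, and all three
input probabilities are placeholders that would have to come from real
measurements. outcome_probabilities is a hypothetical helper.

```python
def outcome_probabilities(p_eeprom_bad: float,
                          p_checker_miss: float,
                          p_false_alarm: float) -> dict:
    """Per-interval outcome probabilities under a simple independence model."""
    for p in (p_eeprom_bad, p_checker_miss, p_false_alarm):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return {
        # No checker: every EEPROM error is used as if it were good data.
        "bad_data_used_without_checker": p_eeprom_bad,
        # With checker: bad data gets through only when the checker misses it.
        "bad_data_used_with_checker": p_eeprom_bad * p_checker_miss,
        # With checker: good data can be rejected by a faulty check.
        "spurious_shutdown_with_checker": (1.0 - p_eeprom_bad) * p_false_alarm,
    }


# Placeholder numbers only; real values need measurement or vendor data.
for name, p in outcome_probabilities(1e-6, 1e-3, 1e-5).items():
    print(f"{name}: {p:.3e}")
```

Whether the checker is a net gain then depends on the measured rates
and on the relative cost of using bad data versus an unnecessary
shutdown.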

-- 
Guy Macon, Electronics Engineer & Project Manager for hire. 
Remember Doc Brown from the _Back to the Future_ movies? Do you 
have an "impossible" engineering project that only someone like 
Doc Brown can solve?  My resume is at http://www.guymacon.com/

Reply by Guy Macon ● July 15, 2004

Frank Bemelman <f.bemelmanx@xs4all.invalid.nl> says...

>But most requirement flaws I ignore without
>informing the person that wrote them (or simply copied them
>from another project).

You would have a problem if I was your project manager.  You would
be instructed to evaluate the requirements and to agree or disagree
with each requirement, and the definition of your code being "done" 
would include the independent testers verifying that your code complies 
with all requirements. On my projects requirement errors are serious,
and they are to be corrected, not ignored.

Then again, I wouldn't be handing you requirements that are male bovine
excrement. Before you got a requirement to implement a continuous checksum, 
you would have hard numbers for EEPROM errors, sense-amp errors, ALU errors,
register errors, etc., both under normal conditions and under conditions
of radiation, ESD, etc., and an analysis of the reliability impact, cost,
etc. of the sum-checker.


-- 
Guy Macon, Electronics Engineer & Project Manager for hire. 
Remember Doc Brown from the _Back to the Future_ movies? Do you 
have an "impossible" engineering project that only someone like 
Doc Brown can solve?  My resume is at http://www.guymacon.com/
