EmbeddedRelated.com

Continuous EEPROM checksum microcontroller

Started by Vishal July 3, 2004
On Sun, 11 Jul 2004 22:12:58 -0400, Jim McGinnis
<remove_this.mcginnis@and_this.ieee.org> wrote:

>On Sun, 11 Jul 2004 22:55:13 GMT, postmaster@noname.com (Ken Lee)
>wrote:
>
>>I'm sure that there are a multitude of ways to implement this but that
>>wasn't my point. I was making an observation as to why this
>>requirement had to be done continuously, opposed when it's needed &
>>that's when the value is actually read.
>>
>
>Suppose the device is the navigation system for an airplane, and you
>haven't taken off yet. Wouldn't you like to know whether you could
>rely on it once you're in the air?
>
>--
>Jim McGinnis
So you're implying that they don't do a system check of the navigation system on the ground before take-off. I'm assuming that the navigation system is turned on before the plane gets into the air.

Let me be perfectly clear: I'm NOT saying that the check shouldn't be done. I'm objecting to the fact that it is done "continuously" rather than on demand.

Ken.
"Jim McGinnis" <remove_this.mcginnis@and_this.ieee.org> wrote in message
news:fjs3f0tkq1ifs979oj8h1ipd16fallvpq8@4ax.com...
> On Sun, 11 Jul 2004 22:55:13 GMT, postmaster@noname.com (Ken Lee)
> wrote:
> >
> >I'm sure that there are a multitude of ways to implement this but that
> >wasn't my point. I was making an observation as to why this
> >requirement had to be done continuously, opposed when it's needed &
> >that's when the value is actually read.
> >
>
> Suppose the device is the navigation system for an airplane, and you
> haven't taken off yet. Wouldn't you like to know whether you could
> rely on it once you're in the air?
Oh, not again. Why not a controller for a nuclear power plant, an on-board engine controller for a satellite in orbit, a laser for eye corrections, a heart-lung machine in an OR, a launcher for H-bombs...

The bottom line is that folks that implement it probably need it. Either because their hardware is crap or they are paranoid or it is a requirement from some retard, leaving 0.01% of applications where it really serves a purpose.

--
Thanks, Frank.
Frank Bemelman <f.bemelmanx@xs4all.invalid.nl> says...
>
>Oh, not again. Why not a controller for a nuclear power plant, an
>on-board engine controller for a satellite in orbit, a laser for
>eye corrections, a heart-lung machine in an OR, a launcher for
>H-bombs...
>
>The bottom line is that folks that implement it, probably need it.
I don't think so.
>Either because their hardware is crap
But why would they imagine that the EEPROM hardware is crap while having faith that the "do the checksum" hardware and the "store the checksum to compare with the EEPROM" hardware will work just fine?
>or they are paranoid
But why would they be paranoid about the EEPROM hardware and not the EEPROM checking hardware?
>or it is a requirement from some retard,
Possibly, but if you develop embedded systems, it's your *job* to identify flaws in the requirements. If it isn't needed and the customer insists on having it, you have to make the personal choice of whether to do it or find work elsewhere.
On Sun, 11 Jul 2004 22:12:58 -0400, Jim McGinnis
<remove_this.mcginnis@and_this.ieee.org> wrote:

>On Sun, 11 Jul 2004 22:55:13 GMT, postmaster@noname.com (Ken Lee)
>wrote:
>
>>I was making an observation as to why this
>>requirement had to be done continuously, opposed when it's needed &
>>that's when the value is actually read.
>
>Suppose the device is the navigation system for an airplane, and you
>haven't taken off yet. Wouldn't you like to know whether you could
>rely on it once you're in the air?
With avionics, it should be noted that at 10 km altitude in the polar cap areas the radiation level is higher than elsewhere, so it is a good idea to do continuous checks if your device might fly in those areas.

I don't know whether the South Atlantic Anomaly increases the radiation levels at 10 km significantly, but at least for low-orbit satellites there is a significant increase in radiation levels.

Paul
Paul Keinanen <keinanen@sci.fi> says...

>With avionics, it should be noted that at 10 km in the polar cap
>areas, the radiation level is higher than elsewhere, so it is a good
>idea to do continuous checks if your device might move in those areas.
Again, I have seen no evidence that the sum-checker is more reliable than the EEPROM being checked. Everyone seems to accept that it is, based on nothing more than blind faith.
On Thu, 15 Jul 2004 01:45:42 -0700, Guy Macon
<http://www.guymacon.com> wrote:

>Again, I have seen no evidence that the sum-checker is more reliable
>than the EEPROM being checked. Everyone seems to be accepting that
>it is based on nothing more than blind faith.
Even if the checksum algorithm is executed directly out of the EEPROM (which is not always the case), in most cases the surface area occupied by the checker is very small compared to the total area of the EEPROM. If there is a single (hard or soft) error in the EEPROM, the area ratio makes it much more likely that the error is in the rest of the EEPROM than in the checker code itself. The worst case is that there are error(s) in the EEPROM and a bit flip in the checker code modifies the program so that it returns "EEPROM OK", but that is even less likely.

Then there is a different question: is it enough to detect only a single-bit error, or is detection of multiple errors needed? If the errors appear randomly, it might be sufficient to detect only one or two errors, provided the checker is executed often enough. After the first error is detected, the device should be taken out of service. However, if there is a significant likelihood of multiple errors appearing at once, e.g. when a highly energetic particle hits the box and creates a shower of secondary particles hitting all over the EEPROM, you need an algorithm that can detect multiple errors at once.

Paul
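Paul's single-versus-multiple-error point is where the checksum-versus-CRC question raised in this thread starts to matter. The C sketch below is illustrative and not from the thread: it assumes an 8-bit additive checksum and the CRC-16/CCITT-FALSE variant (polynomial 0x1021, initial value 0xFFFF), and the names `sum8` and `crc16_ccitt` are hypothetical. It is meant to show a two-bit error that a plain sum misses but the CRC catches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8-bit additive checksum: sum of all bytes modulo 256. */
static uint8_t sum8(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum;
}

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, MSB-first, no final XOR.
   Bit-at-a-time version: slow but small, which may suit a background check. */
static uint16_t crc16_ccitt(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}
```

With `good = {0x10, 0x21, 0x30, 0x40}` and `bad = {0x11, 0x20, 0x30, 0x40}` (bit 0 set in the first byte and cleared in the second), `sum8` returns 0xA1 for both, while the two CRCs differ. For blocks shorter than about 4 KB, polynomial 0x1021 detects every error of up to three flipped bits and every burst of up to 16 bits; a plain sum gives no such guarantee once two or more bits are wrong.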
"Guy Macon" <http://www.guymacon.com> wrote in message
news:10fcen2cf7e7h3f@corp.supernews.com...
>
> Frank Bemelman <f.bemelmanx@xs4all.invalid.nl> says...
> >
> >Oh, not again. Why not a controller for a nuclear power plant, an
> >on-board engine controller for a satellite in orbit, a laser for
> >eye corrections, a heart-lung machine in an OR, a launcher for
> >H-bombs...
> >
> >The bottom line is that folks that implement it, probably need it.
>
> I don't think so.
They need it, but only from their point of view.
> >Either because their hardware is crap
>
> But why would they imagine that the EEPROM hardware is crap while
> having faith that the "do the checksum" hardware and the "store the
> checksum to compare with the EEPROM" hardware will work just fine?
I have no idea.
> >or they are paranoid
>
> But why would they be paranoid about the EEPROM hardware and not the
> EEPROM checking hardware?
Paranoid behaviour implies lack of understanding.
> >or it is a requirement from some retard,
>
> Possibly, but if you develop embedded systems, it's your *job* to
> identify flaws in the requirements. If it isn't needed and the
> customer insists on having it you have to make the personal choice
> of whether to do it or find work elsewhere.
Oh, if someone insists on it, even after I point out that it isn't very useful, why not. But most requirement flaws I ignore without informing the person who wrote them (or simply copied them from another project).

--
Thanks, Frank.
On Wed, 07 Jul 2004 23:10:59 GMT, the renowned postmaster@noname.com
(Ken Lee) wrote:

>On Tue, 06 Jul 2004 23:34:14 GMT, Spehro Pefhany
><speffSNIP@interlogDOTyou.knowwhat> wrote:
>
>What does "salvage" really mean. If a checksum is performed on a block
>of memory & it's wrong then one could take this to mean that 1 or any
>number of the contents are incorrect. If a checksum is performed on a
>single item and it's wrong, then all you can deduce from this is that
>the value is incorrect. Unless you keep a mirror image of the data you
>cannot "salvage" the correct value. Possibly the only things one could
>do is fall back to some "safe" or default value, reset the device or
>place the device in an error state.
It means I incorporate a lot of redundancy (more than just a mirror) for important information, because a non-recoverable failure is very expensive. Data integrity is more important than saving a few cents on memory. The other options you mention are open if they are acceptable in the application, of course. Some systems have no "safe" state (a few I work on), or there is an unpleasant choice such as a) test limit controls, b) cause $10,000 damage (100% certain).
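As one possible way (not necessarily Spehro's) to get "more than just a mirror", here is a minimal C sketch of triple storage with a bitwise 2-of-3 majority vote. The record size `REC_LEN`, the type `tmr_record_t` and the functions `tmr_write` and `tmr_read_and_scrub` are hypothetical, and the EEPROM is simulated in RAM. A two-copy mirror can detect a disagreement but cannot tell which copy is right; the vote recovers the value as long as no single bit is corrupted in two copies at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_LEN 8   /* hypothetical record size in bytes */

/* Hypothetical record kept as three copies (in EEPROM on real hardware;
   plain RAM in this sketch). */
typedef struct {
    uint8_t copy[3][REC_LEN];
} tmr_record_t;

/* Write the same value to all three copies. */
static void tmr_write(tmr_record_t *rec, const uint8_t val[REC_LEN])
{
    assert(rec != NULL && val != NULL);
    for (int k = 0; k < 3; k++)
        memcpy(rec->copy[k], val, REC_LEN);
}

/* Bitwise 2-of-3 majority vote into `out`, then rewrite any copy that
   disagrees with the voted value. Returns the number of copies repaired. */
static int tmr_read_and_scrub(tmr_record_t *rec, uint8_t out[REC_LEN])
{
    assert(rec != NULL && out != NULL);
    for (size_t i = 0; i < REC_LEN; i++) {
        uint8_t a = rec->copy[0][i], b = rec->copy[1][i], c = rec->copy[2][i];
        out[i] = (uint8_t)((a & b) | (a & c) | (b & c));
    }
    int repaired = 0;
    for (int k = 0; k < 3; k++) {
        if (memcmp(rec->copy[k], out, REC_LEN) != 0) {
            memcpy(rec->copy[k], out, REC_LEN);  /* an EEPROM write on real hardware */
            repaired++;
        }
    }
    return repaired;
}
```

Each repair is an EEPROM write, and EEPROM cells wear out with rewriting, so on real hardware you would probably want to limit how often scrubbing rewrites the same cells.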
>>>Also is a "checksum" adequate or should a CRC be calculated?
>>>
>>>Performing continuous eeprom checks could chew up considerable MIPs,
>>
>>It could not, too.
>
>Sorry but this requirement just has that particular pattern as a
>MIPs-chewer -- "repetitive calculation on a block of memory". Why
>can't this requirement be met on demand -- that is, the checksum is
>calculated when the data is read and used?
Yes, although checking the entire memory every time a few very frequently accessed locations are used might be unnecessarily costly in bandwidth. But that's just implementation, and almost anyone here can figure out ways around that.
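One way around that bandwidth cost, sketched here in C under the assumption that the stored data can be split into independent records: give each record its own check value and verify only the record being read, which is the on-demand check Ken describes. `ee_record_t`, `record_store`, `record_load` and `REC_DATA_LEN` are hypothetical names, and Fletcher-16 is used as the per-record check purely for illustration (a CRC would slot in the same way).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_DATA_LEN 6   /* hypothetical payload size in bytes */

/* Hypothetical EEPROM record: payload plus its own check value. */
typedef struct {
    uint8_t  data[REC_DATA_LEN];
    uint16_t check;
} ee_record_t;

/* Fletcher-16: two running sums modulo 255, packed as (sum2 << 8) | sum1. */
static uint16_t fletcher16(const uint8_t *buf, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + buf[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Store a value together with its check value. */
static void record_store(ee_record_t *rec, const uint8_t src[REC_DATA_LEN])
{
    assert(rec != NULL && src != NULL);
    memcpy(rec->data, src, REC_DATA_LEN);
    rec->check = fletcher16(rec->data, REC_DATA_LEN);
}

/* On-demand check: verify only this record, at the moment it is used.
   Returns false on mismatch so the caller can fall back to a default or
   "safe" value, reset, or enter an error state. */
static bool record_load(const ee_record_t *rec, uint8_t dst[REC_DATA_LEN])
{
    assert(rec != NULL && dst != NULL);
    if (fletcher16(rec->data, REC_DATA_LEN) != rec->check)
        return false;
    memcpy(dst, rec->data, REC_DATA_LEN);
    return true;
}
```

Fletcher-16 is cheap and, unlike a plain sum, is sensitive to byte order, but because its sums are taken modulo 255 it cannot tell a 0x00 byte from a 0xFF byte.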
>I've no argument on resource budgeting for input requirements, but
>this particular requirement looks like a mitigation for some fault or
>hazard. I've no problem with checksumming or CRCing stored data, but
>doing it on a continual basis seems to me to be ill-formulated.
>Admittedly I don't know what the application is, but I'm in the
>medical electronics game and have worked in the automotive industry,
>and am familiar with over-burdened mitigations.
>
>Ken.
EEPROM is fundamentally different from RAM etc. because any errors that arise (because of issues beyond the control of the engineer) will persist indefinitely. EEPROMs also wear out, and are fundamentally less reliable than RAM due to the high dielectric stresses involved in Fowler-Nordheim tunneling etc. (especially from rewriting).

On frequency: as you say above, I don't think it's necessary to do it more often than the information is accessed. ;-)

More seriously, the upper time limit is typically set by how long it takes the system to get into trouble, worst-case. If it's a slow thermal system, then a minute or ten minutes with worst-case outputs may be no big deal.

Best regards,
Spehro Pefhany
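Spehro's worst-case time limit can be turned into a fixed CPU budget for a background check. A minimal C sketch, assuming a periodic tick, an EEPROM image the CPU can read directly, and a reference CRC stored at programming time; `scanner_t`, `scan_step` and `chunk_for_deadline` are hypothetical names, and CRC-16/CCITT-FALSE is an assumed choice. Each tick checks one chunk, so the per-tick cost stays small and constant. A fault that appears just after its bytes were scanned is only caught at the end of the following pass, so the worst-case detection time is close to two pass times; sizing the pass to about half the allowed time accounts for that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE update (poly 0x1021, MSB-first, no final XOR).
   Start a new pass with crc = 0xFFFF; the state carries across chunks. */
static uint16_t crc16_update(uint16_t crc, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Bytes to check per tick so that one full pass over `total` bytes fits in
   `pass_ms` when scan_step() is called every `tick_ms`. Rounds up. Pass about
   half the worst-case detection limit as `pass_ms` (see the note above). */
static size_t chunk_for_deadline(size_t total, uint32_t tick_ms, uint32_t pass_ms)
{
    assert(tick_ms > 0 && pass_ms >= tick_ms);
    size_t ticks = pass_ms / tick_ms;
    return (total + ticks - 1) / ticks;
}

/* Hypothetical background-scanner state. */
typedef struct {
    const uint8_t *image;   /* EEPROM contents, readable by the CPU */
    size_t total;           /* bytes covered by the reference CRC */
    size_t chunk;           /* bytes checked per call */
    size_t pos;             /* next byte to check */
    uint16_t crc;           /* running CRC for the current pass */
    uint16_t expected;      /* reference CRC stored at programming time */
} scanner_t;

/* Call once per tick. Returns false only when a pass has just completed and
   its CRC does not match, e.g. so the caller can take the device out of service. */
static bool scan_step(scanner_t *s)
{
    size_t n = s->total - s->pos;
    if (n > s->chunk)
        n = s->chunk;
    s->crc = crc16_update(s->crc, s->image + s->pos, n);
    s->pos += n;
    if (s->pos < s->total)
        return true;                 /* pass still in progress */
    bool ok = (s->crc == s->expected);
    s->pos = 0;                      /* restart for the next pass */
    s->crc = 0xFFFF;
    return ok;
}
```

For example, a 2048-byte EEPROM with a 10 ms tick and a 1 s pass needs ceil(2048 / 100) = 21 bytes per tick.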
Paul Keinanen <keinanen@sci.fi> says...
>
>Guy Macon <http://www.guymacon.com> wrote:
>
>>Again, I have seen no evidence that the sum-checker is more reliable
>>than the EEPROM being checked. Everyone seems to be accepting that
>>it is based on nothing more than blind faith.
>
>Even if the checksum algorithm is executed directly out of the EEPROM
>(which is not always the case), the surface area occupied by the
>checker is very small compared to the total area of the EEPROM in most
>cases. If there is a single (hard or soft) error in the EEPROM, the
>likelihood is much greater that is in the error is the other part of
>the EEPROM than in the checker code itself due to the area ratio.
>The worst case is that there are error(s) in the EEPROM, but a bit
>flip in the actual checker code will modify the program so that it
>will return EEPROM OK, but the likelihood is still smaller.
But the sum-checker is far more than just the place where the sum-checking code is stored. It is also the electronics that reads the code, the ALU that executes the code, the registers and RAM that the code uses, and so forth. One would have to estimate the error rate of all of those parts of the uC and compare them to the error rate of the EEPROM. Unless you do that, you have no idea whether your continuous sum-checker increases or decreases system reliability compared to an on-demand sum-checker or no sum-checker at all.

--
Guy Macon
Frank Bemelman <f.bemelmanx@xs4all.invalid.nl> says...

>But most requirement flaws I ignore without
>informing the person that wrote them (or simply copied them
>from another project).
You would have a problem if I were your project manager. You would be instructed to evaluate the requirements and to agree or disagree with each requirement, and the definition of your code being "done" would include the independent testers verifying that your code complies with all requirements. On my projects, requirement errors are serious, and they are to be corrected, not ignored.

Then again, I wouldn't be handing you requirements that are male bovine excrement. Before you got a requirement to implement a continuous checksum, you would have hard numbers for EEPROM errors, sense-amp errors, ALU errors, register errors, etc., both under normal conditions and under conditions of radiation, ESD, etc., and an analysis of the reliability impact, cost, etc. of the sum-checker.

--
Guy Macon
