EmbeddedRelated.com

Saving data in CPU on-chip EEPROM

Started by yossi_sr June 14, 2005
Hi all,
What happens if a power-down occurs while a parameter is being saved to
a byte of EEPROM (the write takes about 5-10 msec)? What will the byte
contain afterwards: the old value, the new value, or something undetermined?
I developed a diesel controller card that uses the AT89C51ED2 processor
(2K bytes of on-chip EEPROM). The system maintains a failure table for
various parameters; when a failure occurs, the corresponding counter in
EEPROM is incremented. Each failure counter consists of 2 bytes
(counting up to 0xFFFF). The system may occasionally be switched off
(power-down) during programming, and this may lead to errors such as:
1) When writing (incrementing) the low-order byte (its value is less
   than 0xFF) and a power-down occurs, I am not sure what this byte
   will contain after the next power-on.
2) When the counter value is 0x00FF, incrementing means writing
   0x00 to the low-order byte and then writing 0x01 to the high-order
   byte. If a power-down happens after writing the low-order byte,
   the system will show the wrong value 0x0000 after the next
   power-on.

One solution may be to keep a checksum over the whole block of data and
recalculate it after each parameter change. If the checksum is not OK,
I can clear all parameters. I don't like this solution because a single
error forces me to clear the whole list of failures.
Is there any other solution or suggestion for solving this problem?
Thanks!
Joseph

On 2005-06-14, yossi_sr <YSrebrnik@kinetics.co.il> wrote:

> What happens if during saving the parameter in byte of EEPROM( it takes
> about 5-10 msec)the Power Down occurs.What should be in the byte
> programmed ? The same value/ The next value/ Undetermined?

Undetermined.
> I developed the diesel controller card which uses AT89C51ED2
> processor( 2K bytes onchip EEPROM).The system manages failure
> table for various parameters and in case of failure the
> appropriate byte in EEPROM is incremented ( counting
> failures).Each failure counter consists of 2 bytes ( counting
> up to FFFF).The system may be switched off (Power down)
> occasionally during programming and this may lead to errors
> such as:
>
> 1) when writing ( incrementing) low order byte (its value is less
>    than FF)and Power Down occurs,I am not sure what this byte
>    will contain after next Power On.

You have no way of knowing.
> 2) when the value in counter is 0x00FF, incrementing means
>    writing 0x00 to low order byte and then writing 0x01 to the
>    high order byte. If Power Down happens after writing the
>    low order byte the system will show error value 0x0000
>    after next Power On.

Bummer, eh?
> One solution may be to use checksum on all block of data, and
> always recalculate the new checksum after each parameter
> change.If the checksum is not OK I can clear all parameters.I
> don't like this solution because in case of one error I should
> clear all the list of failures. Is there any other solution or
> suggestion how to solve this problem?

Put two copies in EEPROM with checksums and some sort of sequence number or "active" flag, so that you use the most recently written valid copy and write to the older one each time. That way, if you lose power during an update, you don't lose everything; you just fall back on the previous set of data.

-- 
Grant Edwards
grante at visi.com
Yow! Am I accompanied by a PARENT or GUARDIAN?
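
A minimal sketch of the two-copy idea above, in C. Several things here are assumptions and not from the post: the EEPROM is modelled as a RAM array, the payload is 4 bytes, the checksum is a simple seeded byte sum, and the names `store`, `load` and `newest_slot` are hypothetical. On real hardware each array store would be a byte write to the on-chip EEPROM followed by a wait for completion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DATA_LEN 4u                 /* payload bytes per copy (illustrative)   */
#define SLOT_LEN (DATA_LEN + 2u)    /* payload + sequence byte + checksum byte */

/* RAM stand-in for the EEPROM (assumption): slot 0, then slot 1. */
static uint8_t eeprom[2u * SLOT_LEN];

static uint8_t checksum(const uint8_t *p, unsigned n)
{
    uint8_t sum = 0x5Au;            /* seed, so an all-zero slot is not valid */
    while (n--) sum = (uint8_t)(sum + *p++);
    return sum;
}

static int slot_valid(unsigned slot)
{
    const uint8_t *s;
    assert(slot < 2u);
    s = &eeprom[slot * SLOT_LEN];
    return checksum(s, DATA_LEN + 1u) == s[DATA_LEN + 1u];
}

/* Index of the newest valid slot, or -1 if neither slot is valid. */
static int newest_slot(void)
{
    int v0 = slot_valid(0u), v1 = slot_valid(1u);
    if (v0 && v1) {
        /* Modulo-256 difference copes with the sequence byte wrapping. */
        uint8_t diff = (uint8_t)(eeprom[SLOT_LEN + DATA_LEN] - eeprom[DATA_LEN]);
        return (diff != 0u && diff < 0x80u) ? 1 : 0;
    }
    if (v0) return 0;
    if (v1) return 1;
    return -1;
}

/* Write new data to the older slot; the active copy is never touched. */
static void store(const uint8_t *data)
{
    int cur = newest_slot();
    unsigned dst = (cur == 0) ? 1u : 0u;
    uint8_t seq = (cur < 0) ? 0u
                : (uint8_t)(eeprom[(unsigned)cur * SLOT_LEN + DATA_LEN] + 1u);
    uint8_t *s = &eeprom[dst * SLOT_LEN];

    memcpy(s, data, DATA_LEN);                      /* payload first      */
    s[DATA_LEN] = seq;                              /* then sequence byte */
    s[DATA_LEN + 1u] = checksum(s, DATA_LEN + 1u);  /* checksum last      */
}

/* Copy the newest valid payload to out; returns -1 if nothing is valid. */
static int load(uint8_t *out)
{
    int cur = newest_slot();
    if (cur < 0) return -1;
    memcpy(out, &eeprom[(unsigned)cur * SLOT_LEN], DATA_LEN);
    return 0;
}
```

Because the checksum is written last, an update interrupted partway should leave the destination slot failing its check, so `load` falls back to the previous copy. A simple byte sum can collide, so a stronger check (a CRC, for example) may be preferable in practice.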
Hello Joseph,

Read about how that is done in mission critical applications, such as 
navigation systems or the actual controls.

Some systems store values in much more than one location and upon 
wake-up can then perform a majority decision. That requires at least 
three writes but I have seen as high as five. IOW, one value might be 
corrupted but when the others corroborate they will be selected as valid.

Regards, Joerg

http://www.analogconsultants.com
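
A minimal sketch of the majority decision described above, in C, for a single 16-bit counter held in three places. The `copies` array is a RAM stand-in for three separate EEPROM locations, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t copies[3];   /* stand-in for three EEPROM locations (assumption) */

/* Returns 0 and stores the agreed value in *out when at least two of the
   three copies match; returns -1 when all three disagree. */
static int vote3(uint16_t a, uint16_t b, uint16_t c, uint16_t *out)
{
    assert(out != NULL);
    if (a == b || a == c) { *out = a; return 0; }
    if (b == c)           { *out = b; return 0; }
    return -1;
}

/* Write the same value to all three locations, one after another. */
static void store3(uint16_t v)
{
    copies[0] = v;
    copies[1] = v;
    copies[2] = v;
}

/* Read with a 2-of-3 vote and rewrite all copies so an outlier is repaired. */
static int load3(uint16_t *out)
{
    if (vote3(copies[0], copies[1], copies[2], out) != 0) return -1;
    store3(*out);
    return 0;
}
```

One caveat with three sequential writes: if power fails while the second copy is being written, the first copy is new, the second is undetermined and the third is old, so all three can disagree. The caller still needs a policy for the -1 case.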
yossi_sr wrote:
> Is there any other solution or suggestion how to solve this problem?
Perhaps a small battery or supercap that can keep the CPU alive during write operations in spite of system power going down?

Kelly
yossi_sr <YSrebrnik@kinetics.co.il> wrote:
> Hi all,
> What happens if during saving the parameter in byte of EEPROM( it takes
> about 5-10 msec)the Power Down occurs.

A disaster. So you must not let that happen, or you must build your code so that it can survive the disaster. The usual suggestions are:

1) Control the power. Use a back-up supply (e.g. a large capacitor) and an external power-off detector on the "upstream" side of it, which lets you know at least a couple of milliseconds in advance that power to the CPU is about to go down.

2) Change the data pattern used by the counters to a one-way modification, i.e. each fault clears one bit instead of re-programming an entire byte. Re-programming goes through an erase to value 0xff, and could lose you not just the current error you're trying to record, but also the count of previous ones.

-- 
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
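
A minimal sketch of the one-way counter from suggestion 2, in C. It assumes the memory allows a bit to be programmed from 1 to 0 without first erasing the byte, which should be checked against the datasheet of the actual part. The `tally` array is a RAM stand-in for erased EEPROM, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TALLY_BYTES 4u   /* 4 bytes hold up to 32 events (illustrative size) */

/* RAM stand-in for EEPROM in the erased state (assumption). */
static uint8_t tally[TALLY_BYTES] = { 0xFFu, 0xFFu, 0xFFu, 0xFFu };

/* Models a program-only write: bits can go 1 -> 0 but never back. */
static void program_byte(unsigned idx, uint8_t value)
{
    assert(idx < TALLY_BYTES);
    tally[idx] &= value;
}

/* Record one event by clearing the lowest bit still set.
   Returns -1 when every bit has been used. */
static int tally_increment(void)
{
    unsigned i;
    for (i = 0; i < TALLY_BYTES; i++) {
        uint8_t b = tally[i];
        if (b != 0u) {
            program_byte(i, (uint8_t)(b & (b - 1u)));  /* clear lowest set bit */
            return 0;
        }
    }
    return -1;
}

/* The count is the number of cleared bits. */
static unsigned tally_count(void)
{
    unsigned i, n = 0;
    for (i = 0; i < TALLY_BYTES; i++) {
        uint8_t zeros = (uint8_t)~tally[i];
        while (zeros) { n++; zeros &= (uint8_t)(zeros - 1u); }
    }
    return n;
}
```

An interrupted write can at worst fail to clear the one bit being programmed, so earlier events are not lost. The cost is one bit per event: 32 events fit in 4 bytes, whereas a full 16-bit range would need about 8 KB per counter, so this only suits counters with a small maximum.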
yossi_sr wrote:
[...]
> Is there any other solution or suggestion how to solve this problem?

Welcome to the world of transaction processing!

What you need is a way to write multiple bytes so that the whole write either succeeds or is not done at all.

One way of processing a transaction is to use a journal: an area in the EEPROM large enough to house one write set (two bytes in your example) and the necessary control data. Usually the control data consists of one status byte, the starting address and the length of the write.

The operation is done as follows:

1. mark the status as tentative,
2. copy the write data to the journal,
3. mark the status as journal written,
4. copy the write data to the final destination,
5. mark the status as free.

If the operation chain is broken, the write can be recovered from the journal at the next start-up. An operation broken before the end of step 3 is simply discarded by marking the status as free. An operation broken after step 3 can be completed by copying the data from the journal to the final destination.

Please note that journaling in this form is ill suited for Flash memories, as the status byte and journal must be repetitively overwritten.

HTH

-- 
Tauno Voipio
tauno voipio (at) iki fi
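
The five steps and the recovery rule above can be sketched in C as follows. This is only an illustration under stated assumptions: the EEPROM is a RAM array, the journal layout and all names are hypothetical, and `writes_left` is a test hook that drops writes to imitate power loss. Each simulated byte write is atomic here, whereas on a real part an interrupted write of the status byte itself is undetermined.

```c
#include <assert.h>
#include <stdint.h>

enum { ST_FREE = 0, ST_TENTATIVE = 1, ST_WRITTEN = 2 };

/* Hypothetical layout: status, address, length, up to J_MAX data bytes,
   followed by the protected data area starting at DATA_BASE. */
enum { J_STATUS = 0, J_ADDR = 1, J_LEN = 2, J_DATA = 3, J_MAX = 4,
       DATA_BASE = J_DATA + J_MAX, EE_SIZE = 32 };

static uint8_t ee[EE_SIZE];       /* RAM stand-in for the EEPROM (assumption) */
static int writes_left = 1000;    /* test hook: writes are dropped at 0       */

static void ee_write(unsigned addr, uint8_t v)
{
    assert(addr < EE_SIZE);
    if (writes_left > 0) { writes_left--; ee[addr] = v; }
}

/* Step 4 (copy journal to destination) and step 5 (mark free). */
static void apply_journal(void)
{
    unsigned i;
    for (i = 0; i < ee[J_LEN]; i++)
        ee_write(ee[J_ADDR] + i, ee[J_DATA + i]);
    ee_write(J_STATUS, ST_FREE);
}

static void journal_write(uint8_t addr, const uint8_t *data, uint8_t len)
{
    unsigned i;
    assert(len <= J_MAX && addr >= DATA_BASE && addr + len <= EE_SIZE);
    ee_write(J_STATUS, ST_TENTATIVE);                  /* step 1 */
    ee_write(J_ADDR, addr);                            /* step 2 */
    ee_write(J_LEN, len);
    for (i = 0; i < len; i++) ee_write(J_DATA + i, data[i]);
    ee_write(J_STATUS, ST_WRITTEN);                    /* step 3 */
    apply_journal();                                   /* steps 4 and 5 */
}

/* Call once at start-up, before the data area is used. */
static void journal_recover(void)
{
    if (ee[J_STATUS] == ST_WRITTEN)
        apply_journal();                /* journal complete: finish the copy */
    else if (ee[J_STATUS] != ST_FREE)
        ee_write(J_STATUS, ST_FREE);    /* journal incomplete: discard it    */
}
```

With this arrangement, the 0x00FF to 0x0100 increment from the original question ends up as either the old or the new value after recovery, wherever the write sequence is cut off.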
On 2005-06-15, Tauno Voipio <tauno.voipio@iki.fi.NOSPAM.invalid> wrote:

>> What happens if during saving the parameter in byte of EEPROM( it takes
>> about 5-10 msec)the Power Down occurs.What should be in the byte
>> programmed ?

> Welcome to the world of transaction processing!
>
> What you need is a way to write multiple bytes so that the
> whole write either succeeds or is not done at all.
>
> One way of processing a transaction is to use a journal:

[...]
> Please note that journaling in this form is ill suited
> for Flash memories, as the status byte and journal must
> be repetitively overwritten.

Doesn't the same note apply to EEPROM?

-- 
Grant Edwards
grante at visi.com
Yow! DIDI... is that a MARTIAN name, or, are we in ISRAEL?
Thank you for your suggestions. The hardware design is already frozen,
so I cannot add a power-off detector. The other solutions given here
seem too complicated for this small project.
Once again, as a reminder, let me summarise the system:
1) The EEPROM is on-chip in the AT89C51ED2 microcontroller (2 KB).
   There is no page write, just byte write.
   Programming one byte typically takes 10 msec.
2) The recorded data consists of 20 failure counters (when one of the
   system failures occurs, the corresponding 2-byte counter is
   incremented by one).
   There are no preset or default values. Once a counter reaches
   0xFFFF, it is not incremented further until it is cleared. The
   EEPROM data array may be cleared by the user via an RS-232 terminal
   command. All failure counters start from zero, and at any later time
   they can be read and displayed at the terminal.
   The data block also includes 3 additional hour meters, which hold
   the total time the various parts of the system have been running.
   These are updated every 15 min (to reduce writes to EEPROM).
   There is no RTC; time is counted in software.
3) There is no RTOS running on this project.
4) A change written to EEPROM always affects only one parameter at a
   time.

In my opinion the following solution will be safe and sufficient:

1) I will hold two identical blocks of data in EEPROM (block A and
   block B), each ending with a checksum.
2) When it is time to increment a parameter:
    a) update the parameter in A.
    b) write the new checksum in A.
    c) update the parameter in B.
    d) write the new checksum in B.
3) On power-on the checksums of both blocks are checked:
    a) If A and B are both OK, we check which block is newer
       (by comparing the parameter values of both blocks).
       Then we copy the newer block over the older one (normally
       there should be only one change, so the copy is reduced to
       one parameter only).
    b) If one of the blocks is not OK, we copy the good block over
       the bad one.
    c) If both blocks are not OK, we clear both blocks.

I think this approach will sufficiently protect the data in EEPROM
against corruption when the power is switched off during a write cycle.
I am not trying to guard against any other source of disturbance; for
those I would prefer to improve the hardware rather than invest in
complicated software algorithms.
And this approach is probably reliable enough in this case because
the stored data is simple and sequential.

What do you think? Will the approach described above be sufficient?
Are there any potential errors in the logic that I don't see?
Please let me know before I start to write this software.
Thanks!
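
A minimal sketch of the A/B scheme described in this post, in C. The assumptions are mine: the EEPROM is a RAM array, the table is cut down to 4 parameters, the checksum is a seeded byte sum, and the function names are hypothetical. Instead of comparing parameter values, the sketch relies on the fixed A-then-B write order: if both blocks pass their checksum but differ, A is taken to be the newer one. That reasoning should be checked against the real write sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_PARAMS  4u                    /* real table: 20 counters + 3 hour meters */
#define BLOCK_LEN (2u * N_PARAMS + 1u)  /* 16-bit parameters + 1 checksum byte     */

static uint8_t ee[2u * BLOCK_LEN];      /* RAM stand-in: block A, then block B */

static uint8_t block_sum(const uint8_t *blk)
{
    uint8_t sum = 0xA5u;                /* seed, so a blank block is not valid */
    unsigned i;
    for (i = 0; i < BLOCK_LEN - 1u; i++) sum = (uint8_t)(sum + blk[i]);
    return sum;
}

static int block_ok(const uint8_t *blk)
{
    return block_sum(blk) == blk[BLOCK_LEN - 1u];
}

static void clear_block(uint8_t *blk)
{
    memset(blk, 0, BLOCK_LEN - 1u);
    blk[BLOCK_LEN - 1u] = block_sum(blk);
}

/* Increment one saturating counter in a block, then refresh its checksum. */
static void bump(uint8_t *blk, unsigned idx)
{
    uint16_t v;
    assert(idx < N_PARAMS);
    v = (uint16_t)(blk[2u * idx] | (blk[2u * idx + 1u] << 8));
    if (v != 0xFFFFu) v++;                      /* stop counting at 0xFFFF */
    blk[2u * idx]      = (uint8_t)(v & 0xFFu);  /* low byte  */
    blk[2u * idx + 1u] = (uint8_t)(v >> 8);     /* high byte */
    blk[BLOCK_LEN - 1u] = block_sum(blk);
}

static void increment_param(unsigned idx)
{
    bump(&ee[0], idx);            /* steps 2a and 2b: block A */
    bump(&ee[BLOCK_LEN], idx);    /* steps 2c and 2d: block B */
}

/* Power-on check following cases 3a to 3c. */
static void power_on_recover(void)
{
    uint8_t *a = &ee[0], *b = &ee[BLOCK_LEN];
    int a_ok = block_ok(a), b_ok = block_ok(b);

    if (a_ok && b_ok) {
        if (memcmp(a, b, BLOCK_LEN) != 0)
            memcpy(b, a, BLOCK_LEN);   /* 3a: A is written first, so A is newer */
    } else if (a_ok) {
        memcpy(b, a, BLOCK_LEN);       /* 3b: repair B from A */
    } else if (b_ok) {
        memcpy(a, b, BLOCK_LEN);       /* 3b: repair A from B */
    } else {
        clear_block(a);                /* 3c: nothing usable  */
        clear_block(b);
    }
}
```

When A is damaged, the increment that was in progress is lost, but every other counter survives in B. The user's clear command would need to follow the same A-then-B order for the "A is newer" rule to hold.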

yossi_sr wrote:
[...]
> What do you think ? Will the described above approach be sufficient?
> Are there any potential errors in logic which I don't see?
> Please let me know before I start to write this software.

That should work out fine. When you have the space, duplicating the data is easier than using a log journal (which would be the way to do it with less extra data). The checksum is essential so that you know which copy is correct at power-up. It is also possible to include a "version number" parameter, which would allow you to update only one copy at a time and thereby save half your writes.

Just don't copy Microsoft's design for FAT: for "safety" they make two copies of the FAT, but forgot to include any way to spot corruption in the event of discrepancies between them!
Grant Edwards wrote:
> On 2005-06-15, Tauno Voipio <tauno.voipio@iki.fi.NOSPAM.invalid> wrote:
>
>>> What happens if during saving the parameter in byte of EEPROM( it takes
>>> about 5-10 msec)the Power Down occurs.What should be in the byte
>>> programmed ?
>
>> Welcome to the world of transaction processing!
>>
>> What you need is a way to write multiple bytes so that the
>> whole write either succeeds or is not done at all.
>>
>> One way of processing a transaction is to use a journal:
>
> [...]
>
>> Please note that journaling in this form is ill suited
>> for Flash memories, as the status byte and journal must
>> be repetitively overwritten.
>
> Doesn't the same note applie to EEPROM?

Not so far - a Flash must be block erased, which complicates the thing quite remarkably.

-- 
Tauno Voipio
tauno voipio (at) iki fi
