
EEPROM guarantees after power loss during a write

Started by John Devereux February 5, 2008
Jim Granville <no.spam@designtools.maps.co.nz> writes:

> John Devereux wrote:
>> larwe <zwsdotcom@gmail.com> writes:
>>
>>>> Say power is lost during a write to a single byte in a page. What can
>>>> I assume? Is just that byte suspect, or the whole page (or the whole
>>>> device)?
>>>
>>> The answer to this question depends rather much on whether your
>>> external brownout protection also asserts the write protect pin...
>>
>> I would like to know the situation where this does not happen (i.e. no
>> external brownout detection).
>>
>> Actually in the case of the AT24C1024, it looks pretty useless
>> anyway. It is active high, which still leaves the question of brownout
>> behaviour open. And the datasheet implies it only provides write
>> protection if asserted *before* the write.
>
> If this is important, it sounds like the sort of thing you
> should run some aggressive tests on.
> Make the power fail during a write, and see what happens?
> All writes have to have a 'hidden erase', so check you can see
> that 'on demand' and then look around for collateral damage....
That could work - at least to answer the question of whether an entire page is erased as part of a single-byte write. Perhaps a timer with an output pin hooked up to disconnect power, then vary the delay until I see something interesting. I don't think manually unplugging the supply will work: the page write time is 10 ms, and for all I know the vulnerable period could be a lot less than this.
> There are FRAMs, and I saw someone just released a 32KB SPI SRAM
> too.
>
> -jg
-- John Devereux
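For the power-cut experiment discussed above, the read-back check after each interrupted write could look something like the following. This is only a sketch under stated assumptions: the 256-byte page size is the figure quoted in the thread, while `EE_PAGE_SIZE`, `damage_t` and `classify_damage` are hypothetical names, and reading the page back from the real device is left to the test rig.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EE_PAGE_SIZE 256u /* page size quoted in the thread for the 24C1024 */

typedef enum {
    DAMAGE_NONE,        /* target byte holds the old or the new value, rest intact */
    DAMAGE_TARGET_ONLY, /* target byte holds garbage, rest of the page intact      */
    DAMAGE_COLLATERAL   /* at least one byte other than the target has changed     */
} damage_t;

/*
 * Compare the page image saved before the write (`before`) with the page
 * read back after the power cut (`after`). `target` is the offset of the
 * single byte that was being written and `new_value` the value being written.
 */
damage_t classify_damage(const uint8_t *before, const uint8_t *after,
                         size_t target, uint8_t new_value)
{
    assert(target < EE_PAGE_SIZE);

    /* Any change outside the target byte counts as collateral damage. */
    for (size_t i = 0; i < EE_PAGE_SIZE; i++) {
        if (i != target && after[i] != before[i]) {
            return DAMAGE_COLLATERAL;
        }
    }

    /* The write either did not start or completed: nothing is corrupted. */
    if (after[target] == before[target] || after[target] == new_value) {
        return DAMAGE_NONE;
    }
    return DAMAGE_TARGET_ONLY;
}
```

Running this for every delay setting of the power-cut timer, and over neighbouring pages as well, would show whether a single-byte write ever disturbs more than the byte being written.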
On 2008-02-05, Grant Edwards <grante@visi.com> wrote:
> On 2008-02-05, John Devereux <jdREMOVE@THISdevereux.me.uk> wrote:
>> Hi,
>>
>> I am wondering what guarantees are there for existing EEPROM data,
>> after power is lost during a write operation?
>>
>> I am writing a datalogging routine that writes records to an
>> EEPROM. It's an Atmel 24C1024, although the question is probably
>> applicable to other devices too. This uses "page mode" for writes -
>> the device seems to be organised as 256 byte pages.
>>
>> Say power is lost during a write to a single byte in a page. What can
>> I assume? Is just that byte suspect, or the whole page (or the whole
>> device)?
>>
>> I can't find any information on this stuff.
>
> Atmel was very up front with me when I e-mailed their support
> address with that exact question. They said that the byte
> being written to when the power failed will be undefined, but
> everything else will be OK.
I can't find that e-mail, and it could have been a different vendor (I've used EEPROMs from several different ones). You probably should press Atmel for an answer.

-- 
Grant Edwards, grante at visi.com
Yow! PEGGY FLEMMING is stealing BASKET BALLS to feed the babies in VERMONT.
In article <87ve53i7fm.fsf@cordelia.devereux.me.uk>, John Devereux says...
> Hi,
>
> I am wondering what guarantees are there for existing EEPROM data,
> after power is lost during a write operation?
>
> I am writing a datalogging routine that writes records to an
> EEPROM. It's an Atmel 24C1024, although the question is probably
> applicable to other devices too. This uses "page mode" for writes -
> the device seems to be organised as 256 byte pages.
>
> Say power is lost during a write to a single byte in a page. What can
> I assume? Is just that byte suspect, or the whole page (or the whole
> device)?
>
> The microcontroller has brownout protection, so isn't going to run
> wild - but what about the EEPROM internal state machine? Are they
> generally protected against brownout?
My experience would suggest brownout protection on the devices themselves may be minimal. Brownout protection on the micro may actually make the problem worse. Do you know (is it documented) what the state of the micro's pins is during reset as opposed to coming out of reset?
> If I write a single byte, does this in fact involve a hidden
> erase/write of the whole page?
Not usually for conventional EE. If it's flash masquerading as EE.....
> I can't find any information on this stuff.
There does seem to be a fair amount of resistance to providing full details.

Let me share a previous experience with EE:

- Environment: bit-banged Microwire/SPI, electrically a bit noisy (100's of amps switching nearby). Hold-up cap to maintain power when it is detected that power is removed. Power-off detect comparator used to let the micro know when power was removed.
- EEPROM used to store operating parameters, operating clock and faults. Clocks written every 3 to 6 min to reduce wear on the EE to a tolerable level. Clock data protected with an ECC code. Fault flags unprotected. Parameters stored in two blocks, each protected by a Fletcher checksum. Both banks would be read on startup, and if one block was bad it would be restored from the other.
- Writes to EE would check that power was valid before starting.
- In operation, occasional field returns due to parameter corruption.

Results of improvement attempts. Each one of these resulted in an improvement.

- Hold-up cap size increased.
- Write sequence changed so one parameter block was completely updated with checksum before the next was written. It should have been written that way to begin with, of course.
- Extra decoupling.
- Redundant pull-up (or was it pull-down?) on some of the lines. It 'shouldn't' have been necessary, as I recall.

Although all of these helped, none eliminated the problem. Unfortunately it happened rarely enough that we didn't find a way to duplicate it in the lab. As a test I recommended a switch to FRAM to reduce the window of vulnerability, but that hadn't happened by the time I left, so I don't know if it would have helped.

Some of the reading I did at the time suggested that if the EE state machine were interrupted things could go very wrong.

Try a search for something like reliable EE. I did find something moons ago, but as I recall it was from a vendor, so judge that as you will.

Robert

-- 
Posted via a free Usenet account from http://www.teranews.com
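The two-bank scheme described in this post (each parameter block carries a Fletcher checksum, and a bad block is restored from the good one at startup) might be sketched roughly as below. Several things here are assumptions rather than anything stated in the post: the 16-bit Fletcher variant, the 16-byte block size, all of the names, and the rule that bank A is the one written first and therefore wins when both banks are valid but differ. The banks are plain RAM images; the actual EEPROM reads and writes are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARAM_BYTES 16u /* hypothetical parameter block size */

typedef struct {
    uint8_t  data[PARAM_BYTES];
    uint16_t csum; /* Fletcher-16 over data[] */
} param_bank_t;

typedef enum { BANKS_OK, BANK_A_RESTORED, BANK_B_RESTORED, BANKS_LOST } restore_t;

/* Fletcher-16: two running sums modulo 255, packed as (sum2 << 8) | sum1. */
uint16_t fletcher16(const uint8_t *buf, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + buf[i]) % 255u);
        sum2 = (uint16_t)((sum2 + sum1) % 255u);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

static int bank_ok(const param_bank_t *b)
{
    return fletcher16(b->data, sizeof b->data) == b->csum;
}

/* Startup check: repair whichever bank is bad from the one that is good. */
restore_t restore_banks(param_bank_t *a, param_bank_t *b)
{
    assert(a != NULL && b != NULL);
    int a_ok = bank_ok(a);
    int b_ok = bank_ok(b);

    if (a_ok && b_ok) {
        if (memcmp(a->data, b->data, sizeof a->data) == 0) {
            return BANKS_OK;
        }
        /* Assumption: A is always written first, so A holds the newer data. */
        *b = *a;
        return BANK_B_RESTORED;
    }
    if (a_ok) { *b = *a; return BANK_B_RESTORED; }
    if (b_ok) { *a = *b; return BANK_A_RESTORED; }
    return BANKS_LOST; /* neither bank passes its checksum */
}
```

As a sanity check, `fletcher16` over the five ASCII bytes "abcde" gives 0xC8F0. An erased block (all bytes 0xFF, checksum 0xFFFF) fails the check, but an all-zero block with a zero checksum passes, so a real design may want a non-zero seed or a magic byte in the block.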
In article <KV2qj.4178$0w.1057@newssvr27.news.prodigy.net>, Vladimir Vassilevsky says...
> John Devereux wrote:
>
> > I am wondering what guarantees are there for existing EEPROM data,
> > after power is lost during a write operation?
> >
> > I am writing a datalogging routine that writes records to an
> > EEPROM. It's an Atmel 24C1024, although the question is probably
> > applicable to other devices too. This uses "page mode" for writes -
> > the device seems to be organised as 256 byte pages.
> >
> > Say power is lost during a write to a single byte in a page. What can
> > I assume? Is just that byte suspect, or the whole page (or the whole
> > device)?
>
> I don't think anybody can tell for sure what can happen to the flash
> write state machine when the power goes down at sudden. Hopefully it
> will not have enough time to destroy the whole device, so something like
> the journaling file system could help.
>
> You can also consider the autostore NVRAMs from Simtek:
>
> http://www.simtek.com/simtekSite.php
>
> Those parts are designed for the random power outages.
> Works very well indeed.
Except, of course, when they don't. I had to modify a test bench at one point to add a test to write to such devices, power off, power on and read the device to see if the values were actually stored. Apparently there was a bad batch, and the only way to check was to power cycle them (with them being off for a significant time before repowering them). You may want to add a check like that if you use them.

Robert
Robert Adsett <sub2@aeolusdevelopment.com> writes:

> In article <87ve53i7fm.fsf@cordelia.devereux.me.uk>, John Devereux
> says...
>> [...]
>> The microcontroller has brownout protection, so isn't going to run
>> wild - but what about the EEPROM internal state machine? Are they
>> generally protected against brownout?
>
> My experience would suggest brownout protection on the devices
> themselves may be minimal. Brownout protection on the micro may
> actually make the problem worse. Do you know (is it documented) what
> the state of the micro's pins is during reset as opposed to coming out
> of reset?
I naively assumed that "brownout protection" would prevent the micro from sending arbitrary data over the I/O pins. It's an ATMega128. The EEPROM is an I2C device (with 10k pullups on the 2 wires). The datasheet does say that the microcontroller I/O pins go to their "initial state" during a reset, i.e. high impedance inputs. So the I2C lines should get pulled high. Briefly...
>> If I write a single byte, does this in fact involve a hidden
>> erase/write of the whole page?
>
> Not usually for conventional EE. If it's flash masquerading as
> EE.....
I don't think so - but it's possible I suppose!
>> I can't find any information on this stuff.
>
> There does seem to be a fair amount of resistance to providing full
> details.
>
> Let me share a previous experience with EE
>
> - Environment, bit banged Microwire/SPI, electrically a bit noisy
> (100's of Amps switching near by). Hold up cap to maintain power when
> it is detected that power is removed. Power off detect comparator used
> to let micro know when power was removed.
> - EEProm used to store operating parameters, operating clock and
> faults. Clocks written every 3 to 6 min to reduce wear on EE to
> tolerable level. Clock data protected with an ECC code. Fault flags
> unprotected. Parameters stored in two blocks each protected by a
> fletcher checksum. Both banks would be read on startup and if one block
> was bad it would be restored from the other.
> - Writes to EE would check that power was valid before starting.
>
> - In operation occasional field returns due to parameter corruption.
>
> Results of improvement attempts. Each one of these resulted in an
> improvement.
> - Hold up cap size increased
> - Write sequence changed so one parameter block completely updated
> with checksum before next written. It should have been written that way
> to begin with of course.
This is basically what I will be doing (just the software part of the above).
> - Extra decoupling
> - Redundant pull-up (or was it pull down?) on some of the lines. It
> 'shouldn't' have been necessary as I recall.
>
> Although all of these helped, none eliminated the problem.
> Unfortunately it happened rarely enough that we didn't find a way to
> duplicate it in the lab. As a test I recommended a switch to FRAM to
> reduce the window of vulnerability but that hadn't happened by the time
> I left so I don't know if it would have helped.
>
> Some of the reading I did at the time suggested that if the EE state
> machine were interrupted things could go very wrong.
>
> Try a search for something like reliable EE. I did find something moons
> ago but as I recall it was from a vendor so judge that as you will.
Interesting, thanks for sharing that. In my application there are a few hundred units in the field that have no protection at all. I.e. the software is written ignoring power failure. And we are not getting problems. But it is obviously a possibility, so I am attempting to address it. Of course this will add complexity and be quite awkward to test. If I am not careful I could introduce a bug that would make things *worse*. So I want to have some clue that it is worth doing. -- John Devereux
On Feb 5, 11:28 pm, John Devereux <jdREM...@THISdevereux.me.uk> wrote:
> Hi,
>
> I am wondering what guarantees are there for existing EEPROM data,
> after power is lost during a write operation?
>
> I am writing a datalogging routine that writes records to an
> EEPROM. It's an Atmel 24C1024, although the question is probably
> applicable to other devices too. This uses "page mode" for writes -
> the device seems to be organised as 256 byte pages.
>
> Say power is lost during a write to a single byte in a page. What can
> I assume? Is just that byte suspect, or the whole page (or the whole
> device)?
>
> The microcontroller has brownout protection, so isn't going to run
> wild - but what about the EEPROM internal state machine? Are they
> generally protected against brownout?
>
> If I write a single byte, does this in fact involve a hidden
> erase/write of the whole page?
>
> I can't find any information on this stuff.
>
> --
>
> John Devereux
John,

We encountered the same problem with our product (still encountering...!). Even though we did not have a proper fix, this is how we worked around it:

We implemented a checksum in our software to detect data corruption in the EEPROM, and in case we find corruption, we keep a known good backup copy of the EEPROM data in ROM (external flash). This data is copied back to the EEPROM during bootup, so the customer has good data when he boots up.

When wrong data is written due to brownouts, the checksum will most likely no longer match. We back up the data whenever we can conclude that at least one known good set of data exists (this can again be ascertained by comparing with the known checksum).

Since this workaround was implemented, we have never faced any problems with the content of the EEPROM. Even though brownouts still happen, the impact has been greatly reduced.

As for the brownout itself: like you, we had neither an external capacitor nor a brownout protection pin on our board. We use ST's EEPROM. I raised a similar query a couple of months back. The links are given below:

1) http://groups.google.co.in/group/comp.arch.embedded/browse_thread/thread/f24017eb1e913ac6/f51e6152809d6293?hl=en&lnk=gst&q=subbarayan#f51e6152809d6293
2) Regarding checksum: http://groups.google.co.in/group/comp.arch.embedded/browse_thread/thread/7bb610e206733fdf/70757e6c50a8dfb6?hl=en&lnk=gst&q=subbarayan#70757e6c50a8dfb6

P.S.: Ours is a consumer electronics product. Processor: ST, EEPROM: ST's M24128BW.

This solution may or may not be suitable for you, depending on your product.

Hope this helps,
Regards,
s.subbarayan
ssubbarayan <ssubba@gmail.com> writes:

> On Feb 5, 11:28 pm, John Devereux <jdREM...@THISdevereux.me.uk> wrote:
>> [...]
>
> John,
> We encountered the same problem with our product (still
> encountering...!). Even though we did not have a right fix, the way we
> approached to provide a work around for this:
> We implemented a checksum in our software to detect data corruption in
> eeprom and in case we find corruption, have a known good copy of eeprom
> data backup in ROM (external flash). This data would be copied back to
> the eeprom during bootup. So this ensures customer has good data when
> he boots up.
> When wrong data is updated due to brownouts, checksum is prone to vary.
> We will backup good data during a situation where we conclude at least
> one known set of good data is there. (This can be ascertained again by
> comparing with known checksum).
This is equivalent to what I was planning. Although I don't think I need a checksum. I was going to have "valid" markers, separate from the data blocks. So it would go

  mark copy 1 invalid
  write new copy 1
  mark copy 1 valid
  mark copy 2 invalid
  write new copy 2
  mark copy 2 valid

On power up both copy valid flags would be checked, and any "invalid" copy overwritten with the valid one. The "copy valid" markers would be stored on separate pages from the data (and each other), so hopefully will not get corrupted at the same time as the data they refer to.

Only problem with this is it requires 4 pages to be written instead of one. Using a checksum to replace the separate flags could mean just two pages - perhaps that is better after all.
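The six-step sequence and the power-up check above can be modelled in RAM along these lines. This is a minimal sketch, not a tested driver: the record size, the marker values and every name are hypothetical. It also treats each step as all-or-nothing, which is exactly what the thread cannot confirm for a real part, and it adds one rule the post does not spell out: if both copies are marked valid but differ, copy 1 (index 0, written first) is taken as the newer one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REC_BYTES    32u   /* hypothetical record size                        */
#define MARK_VALID   0xA5u /* hypothetical marker values; erased 0xFF reads   */
#define MARK_INVALID 0x00u /* as "not valid" because it is not MARK_VALID     */

typedef struct {
    uint8_t copy[2][REC_BYTES]; /* two data copies, each on its own page  */
    uint8_t mark[2];            /* one marker per copy, on separate pages */
} ee_image_t;

/*
 * Run the update sequence but stop after `steps` steps, to mimic power
 * being lost part-way through. Pass 6 for a complete update.
 */
void update_record(ee_image_t *ee, const uint8_t *rec, unsigned steps)
{
    assert(steps <= 6u);
    for (unsigned s = 0; s < steps; s++) {
        unsigned c = s / 3u; /* steps 0-2 act on copy index 0, 3-5 on index 1 */
        switch (s % 3u) {
        case 0:  ee->mark[c] = MARK_INVALID;         break;
        case 1:  memcpy(ee->copy[c], rec, REC_BYTES); break;
        default: ee->mark[c] = MARK_VALID;           break;
        }
    }
}

static void copy_over(ee_image_t *ee, unsigned dst, unsigned src)
{
    ee->mark[dst] = MARK_INVALID;
    memcpy(ee->copy[dst], ee->copy[src], REC_BYTES);
    ee->mark[dst] = MARK_VALID;
}

/* Power-up check: returns 0 once both copies agree, -1 if neither is valid. */
int recover(ee_image_t *ee)
{
    int v0 = (ee->mark[0] == MARK_VALID);
    int v1 = (ee->mark[1] == MARK_VALID);

    if (!v0 && !v1) {
        return -1;
    }
    if (v0 && !v1) {
        copy_over(ee, 1, 0);
    } else if (!v0 && v1) {
        copy_over(ee, 0, 1);
    } else if (memcmp(ee->copy[0], ee->copy[1], REC_BYTES) != 0) {
        copy_over(ee, 1, 0); /* assumption: index 0 is written first, so newer */
    }
    return 0;
}
```

Under this model, cutting the update after any number of steps from 0 to 6 leaves a state from which `recover` ends up with two identical copies holding either the old record or the new one. Whether a real EEPROM behaves as atomically as the model is the open question of the thread.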
> We have used this workaround and after this workaround was
> implemented, we never faced any problems with the content of
> eeprom. Even though brownout situation still continues to happen, the
> impact was greatly minimised.
> As far this brownout, like your situation we also did not have either
> an external capacitor or an brownout protection pin in our board. We
> use ST's eeprom. I have raised a similar query to this a couple of
> months back. Given below is the link:
> 1) http://groups.google.co.in/group/comp.arch.embedded/browse_thread/thread/f24017eb1e913ac6/f51e6152809d6293?hl=en&lnk=gst&q=subbarayan#f51e6152809d6293
I will look at these. By the way, long links often get scrambled up on usenet. You can make it easier for some people if you enclose them in angle brackets:

<http://groups.google.co.in/group/comp.arch.embedded/browse_thread/thread/f24017eb1e913ac6/f51e6152809d6293?hl=en&lnk=gst&q=subbarayan#f51e6152809d6293>

This seems to stop them getting split up by news readers.
> 2) Regarding checksum:
> http://groups.google.co.in/group/comp.arch.embedded/browse_thread/thread/7bb610e206733fdf/70757e6c50a8dfb6?hl=en&lnk=gst&q=subbarayan#70757e6c50a8dfb6
>
> P.S: ours is an consumer electronics product. Processor: ST, EEPROM: ST's
> M24128BW.
>
> This solution may or may not be suitable to you depending on your
> product.
Thank you.
> Hope this helps,
> Regards,
> s.subbarayan
-- John Devereux
A good way around this problem is to have a power monitor function on
the micro.

If this shows the power is going, then you shouldn't start a write to
the EEPROM.
Depending on the power supply, it might give you time to write one or
more pages of data to the EEPROM.

I used to work with dataloggers, and if the power supply went we had
enough time to write all the data to the EEPROM before the power
supply died.
But we did have a pin on the micro that showed power was dying.
You might even need to beef up the power supply caps to give you a bit
longer.
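To get a rough feel for how much time the supply caps buy after the power-fail pin trips, the usual first-order estimate is t = C * dV / I. The sketch below works in integer units and assumes a constant load current, which a real board only approximates. The function names and the example figures are hypothetical; the 10 ms page write time is the one mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * First-order hold-up time in milliseconds: t = C * dV / I.
 *   cap_uF  - bulk capacitance after the point where power is monitored
 *   drop_mV - usable voltage drop, from the trip level down to the lowest
 *             voltage at which EEPROM writes are still specified
 *   load_uA - load current, assumed constant (a simplification)
 * With these units, uF * mV / uA comes out directly in milliseconds.
 */
uint32_t holdup_ms(uint32_t cap_uF, uint32_t drop_mV, uint32_t load_uA)
{
    assert(load_uA > 0u);
    return (uint32_t)(((uint64_t)cap_uF * drop_mV) / load_uA);
}

/* Number of complete page writes that fit in the remaining hold-up time. */
uint32_t pages_that_fit(uint32_t budget_ms, uint32_t page_write_ms)
{
    assert(page_write_ms > 0u);
    return budget_ms / page_write_ms;
}
```

For example, 470 uF with 1.5 V of usable drop and a 20 mA load gives `holdup_ms(470, 1500, 20000)` = 35 ms, so `pages_that_fit(35, 10)` allows three 10 ms page writes. In practice it would be prudent to leave margin and to check the power-fail pin again before starting each page.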
In article <87ve52787s.fsf@cordelia.devereux.me.uk>, John Devereux says...
> Robert Adsett <sub2@aeolusdevelopment.com> writes:
>
> > [...]
> > My experience would suggest brownout protection on the devices
> > themselves may be minimal. Brownout protection on the micro may
> > actually make the problem worse. Do you know (is it documented) what
> > the state of the micro's pins is during reset as opposed to coming out
> > of reset?
>
> I naively assumed that "brownout protection" would prevent the micro
> from sending arbitrary data over the I/O pins. It's an ATMega128. The
> EEPROM is an I2C device (with 10k pullups on the 2 wires). The
> datasheet does say that the microcontroller I/O pins go to their
> "initial state" during a reset, i.e. high impedance inputs. So the I2C
> lines should get pulled high. Briefly...
There's another question I've remembered when dealing with brownout: not only whether the I/O state is the same in reset as on its rising edge, but also over what voltage range reset is asserted and will hold those values.

The problem can occur (or so I've heard) if the voltage drops to a value at which the brownout circuit can no longer hold the micro in reset but the voltage is still high enough for the EE to be operating.

Not normally an issue, since most I/O fails when the voltage drops that far anyway, but apparently it can be an issue with some EEs. And when you have a hold-up cap, any transition through such a zone will be slow.
> > Try a search for something like reliable EE. I did find something moons
> > ago but as I recall it was from a vendor so judge that as you will.
>
> Interesting, thanks for sharing that.
>
> In my application there are a few hundred units in the field that have
> no protection at all. I.e. the software is written ignoring power
> failure. And we are not getting problems. But it is obviously a
> possibility, so I am attempting to address it. Of course this will add
> complexity and be quite awkward to test. If I am not careful I could
> introduce a bug that would make things *worse*. So I want to have some
> clue that it is worth doing.
It will certainly help to have a checksum of some sort on the data if you can. At least then you know something went wrong. Otherwise, if a random byte changed, would you be able to tell?

If you are not getting problems, I'd be tempted to make my first step just making sure that problems will be detected if they occur.

Robert
Robert Adsett <sub2@aeolusdevelopment.com> writes:

> In article <87ve52787s.fsf@cordelia.devereux.me.uk>, John Devereux
> says...
>> [...]
>> I naively assumed that "brownout protection" would prevent the micro
>> from sending arbitrary data over the I/O pins. It's an ATMega128. The
>> EEPROM is an I2C device (with 10k pullups on the 2 wires). The
>> datasheet does say that the microcontroller I/O pins go to their
>> "initial state" during a reset, i.e. high impedance inputs. So the I2C
>> lines should get pulled high. Briefly...
>
> There's another question I've remembered when dealing with brownout.
> Not only the question of whether I/O is the same in reset as on its
> rising edge but also over what range reset is asserted and will hold
> those values.
>
> The problem can occur (or so I've heard) if the voltage drops to a value
> that the brownout circuit can no longer hold the micro in reset but the
> voltage is still high enough for the EE to be operating.
The problem is that this information does not seem to be available.
> Not normally an issue since most I/O fails when the voltage drops
> that far anyway but apparently it can be an issue with some EEs. And
> when you have a hold up cap any transition through such a zone will
> be slow.
I was just thinking that a "hold up" cap could be a bad idea in this respect. Might be best just to get rid of the supply ASAP - the opposite of a hold up cap, get it through the "dangerous" region quickly.
-- John Devereux