Reply by John Devereux February 8, 2008
Arlet <usenet+5@c-scape.nl> writes:

> On Feb 8, 11:58 am, John Devereux <jdREM...@THISdevereux.me.uk> wrote:
>
>> I would still love to know, for sure, that a write to part of a page
>> does not involve an internal erasure of the entire page. Without
>> knowing this each version stamp needs a page of its own as far as I
>> can see. The act of writing the version number must be guaranteed not
>> to upset the data it refers to, if it gets interrupted.
>>
>> I think I will have to try and test this.
>
> To test the system, you could make a simple test jig that switches the
> power to your board. Use another controller to switch the power in
> random intervals. The random interval timing should match the
> discharge rate of the power supply capacitors such that the board
> suffers a lot of brown out conditions. Add an extra R/C filter if
> necessary.
>
> On the device you're testing, set up some special firmware that
> continously writes updates to the EEPROM. Instead of real data, write
> a verifiable test pattern, and have the software check it regularly.
> If it finds corrupted data in a 'valid' block, trigger an alarm.
>
> Then leave the test setup in a corner of the lab, 24/7.
That sounds like a good idea to test a finished routine. But to get the initial information needed to write it, I am thinking of this:

- Hack the electronics so the EEPROM can be powered from an output pin.
- Hack my eeprom_write routine so that a timer can interrupt power to the EEPROM and hold the I2C pins low (so the EEPROM is definitely unpowered).

That allows the timer to interrupt programming using a precise time delay that I can sweep through a range of values. For each value I can

- print the EEPROM page contents (to a serial port)
- reprogram the entire page with a test pattern
- start the timer and the page programming test (different pattern, only alters part of the page)

I should be able to see any partial erasures, partial programming, and also any erasure of bytes on the same page outside of the program area. Perhaps I would print out extra regions like address 0 and parts of adjacent pages.

This does not simulate a real system since there is no "brownout" state. So I still need something like your setup as a final verification.

--
John Devereux
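The page dumps from a sweep like the one described above could also be checked by code instead of by eye. The following is only a sketch under stated assumptions: `page_report` and `classify_page` are hypothetical names, the three buffers are full-page arrays indexed by offset within the page, and the partial-write pattern is assumed to differ from the full-page test pattern at every byte of the program area.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one page dump taken after an interrupted write. */
typedef struct {
    size_t unchanged;   /* program-area bytes still holding the old pattern */
    size_t programmed;  /* program-area bytes holding the new pattern       */
    size_t partial;     /* program-area bytes holding neither value         */
    size_t collateral;  /* bytes outside the program area that changed      */
} page_report;

/*
 * before: page contents after the full-page test pattern was written
 * wanted: values the partial write was trying to program (only the
 *         entries inside the program area are looked at)
 * after:  page contents read back after the interrupted write
 * The program area is [start, start + len) within the page.
 */
page_report classify_page(const uint8_t *before, const uint8_t *wanted,
                          const uint8_t *after, size_t page_size,
                          size_t start, size_t len)
{
    page_report r = {0, 0, 0, 0};

    for (size_t i = 0; i < page_size; i++) {
        int in_area = (i >= start && i < start + len);

        if (!in_area) {
            if (after[i] != before[i])
                r.collateral++;     /* damage outside the program area */
        } else if (after[i] == wanted[i]) {
            r.programmed++;
        } else if (after[i] == before[i]) {
            r.unchanged++;
        } else {
            r.partial++;            /* half erased or half programmed */
        }
    }
    return r;
}
```

A non-zero `collateral` count at any delay value would suggest that a partial write does disturb the rest of the page, which is the question the sweep is meant to answer.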
Reply by Arlet February 8, 2008
On Feb 8, 11:58 am, John Devereux <jdREM...@THISdevereux.me.uk> wrote:

> I would still love to know, for sure, that a write to part of a page
> does not involve an internal erasure of the entire page. Without
> knowing this each version stamp needs a page of its own as far as I
> can see. The act of writing the version number must be guaranteed not
> to upset the data it refers to, if it gets interrupted.
>
> I think I will have to try and test this.
To test the system, you could make a simple test jig that switches the power to your board. Use another controller to switch the power at random intervals. The random interval timing should match the discharge rate of the power supply capacitors such that the board suffers a lot of brownout conditions. Add an extra R/C filter if necessary.

On the device you're testing, set up some special firmware that continuously writes updates to the EEPROM. Instead of real data, write a verifiable test pattern, and have the software check it regularly. If it finds corrupted data in a 'valid' block, trigger an alarm.

Then leave the test setup in a corner of the lab, 24/7.
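A "verifiable test pattern" could be as simple as a block whose every byte is derived from a sequence number stored in its first byte. This is a minimal sketch, not anything posted in the thread: `BLOCK_SIZE`, `pattern_fill` and `pattern_check` are made-up names, and the step of 7 is an arbitrary choice that keeps an all-0xFF or all-0x00 block from passing the check.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16u  /* assumed block size, for illustration only */

/* Fill a block with a pattern derived from a sequence number, so that
 * every byte can later be recomputed from the first byte alone. */
void pattern_fill(uint8_t *block, uint8_t seq)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        block[i] = (uint8_t)(seq + i * 7u);
}

/* Return 1 if the block is a complete, self-consistent pattern, 0 if
 * any byte disagrees with the sequence number in byte 0. */
int pattern_check(const uint8_t *block)
{
    uint8_t seq = block[0];

    for (size_t i = 1; i < BLOCK_SIZE; i++)
        if (block[i] != (uint8_t)(seq + i * 7u))
            return 0;
    return 1;
}
```

The test firmware would call `pattern_fill` with an incrementing sequence number before each write, and raise the alarm whenever a block that the storage scheme reports as valid fails `pattern_check`.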
Reply by John Devereux February 8, 2008
David Brown <david@westcontrol.removethisbit.com> writes:

> John Devereux wrote:
>> ssubbarayan <ssubba@gmail.com> writes:
>>
>>
>> [...]
>>
>>> John,
>>> My only worry was getting atleast one good copy.In your whole
>>> algorithm,you have assumed atleast one good copy exists.I was
>>> wondering what would be situation when the first time(no copy
>>> available,freshly you are writing data),and you encounter power brown
>>> out situation.
>>
>> Firstly, Davids algorithm is better - use a version number based
>> system like he describes.
>>
>> For any possible algorithm, if the power fails during writing of data,
>> you are always going to lose *that version*. Just as if the power
>> failed before you started to write it.
>>
>> Assuming your eeprom is initially filled with 0xff, and a 32 bit
>> version number, then a version number of 0xffffffff (or -1) would
>> indicate a missing copy.
>>
>
> It's actually enough with the versioning stamp to distinguish between
> invalid, and newer or later versions. All you really need are
> versions 1, 2, and 3, and wrap to 1 again after 3. Anything other
> than 1, 2, or 3 is invalid.
Cool - I was thinking of avoiding the wrap entirely by having a range so high it would never happen :)
> One thing to watch out for, however, is the possibility of corruption
> at addresses other than the one you are writing. External serial
> eeproms generally have protection against this, but Atmel AVRs are
> known to be able to corrupt byte 0 of the eeprom if they get a reset
> during a write (the address register gets cleared to 0, but the write
> continues - thus the data at address 0 may be half overwritten). The
> same problem can probably occur on many other eeproms - I don't know
> if the AVRs are a particular high risk, or if Atmel is just unusually
> honest!
I would still love to know, for sure, that a write to part of a page does not involve an internal erasure of the entire page. Without knowing this, each version stamp needs a page of its own as far as I can see. The act of writing the version number must be guaranteed not to upset the data it refers to, if it gets interrupted.

I think I will have to try and test this.

--
John Devereux
Reply by David Brown February 8, 2008
John Devereux wrote:
> ssubbarayan <ssubba@gmail.com> writes:
>
>
> [...]
>
>> John,
>> My only worry was getting atleast one good copy.In your whole
>> algorithm,you have assumed atleast one good copy exists.I was
>> wondering what would be situation when the first time(no copy
>> available,freshly you are writing data),and you encounter power brown
>> out situation.
>
> Firstly, Davids algorithm is better - use a version number based
> system like he describes.
>
> For any possible algorithm, if the power fails during writing of data,
> you are always going to lose *that version*. Just as if the power
> failed before you started to write it.
>
> Assuming your eeprom is initially filled with 0xff, and a 32 bit
> version number, then a version number of 0xffffffff (or -1) would
> indicate a missing copy.
>
It's actually enough for the versioning stamp to distinguish between invalid, newer, and older versions. All you really need are versions 1, 2, and 3, wrapping to 1 again after 3. Anything other than 1, 2, or 3 is invalid.

One thing to watch out for, however, is the possibility of corruption at addresses other than the one you are writing. External serial eeproms generally have protection against this, but Atmel AVRs are known to be able to corrupt byte 0 of the eeprom if they get a reset during a write (the address register gets cleared to 0, but the write continues - thus the data at address 0 may be half overwritten). The same problem can probably occur on many other eeproms - I don't know if the AVRs are a particularly high risk, or if Atmel is just unusually honest!
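With only three stamp values, "newer" has to be decided by position in the cycle 1 -> 2 -> 3 -> 1 rather than by plain numeric comparison: 2 is newer than 1, 3 is newer than 2, and 1 is newer than 3. A rough sketch of that comparison follows; the function names are illustrative, and a one-byte stamp is an assumption.

```c
#include <stdint.h>

/* A stamp is valid only if it is 1, 2 or 3. Erased EEPROM (0xFF) and
 * zero are both treated as invalid. */
int stamp_valid(uint8_t v)
{
    return v >= 1 && v <= 3;
}

/* The stamp that follows v in the cycle 1 -> 2 -> 3 -> 1. */
uint8_t stamp_next(uint8_t v)
{
    return (uint8_t)(v % 3 + 1);
}

/* Index (0 or 1) of the block holding the newest data, given the two
 * stamps, or -1 if neither stamp is valid. */
int newest_block(uint8_t v0, uint8_t v1)
{
    int ok0 = stamp_valid(v0);
    int ok1 = stamp_valid(v1);

    if (!ok0 && !ok1)
        return -1;
    if (!ok1)
        return 0;
    if (!ok0)
        return 1;

    /* Both valid: the newer stamp is the successor of the other one. */
    if (v1 == stamp_next(v0))
        return 1;
    return 0;   /* v0 follows v1, or the stamps are equal (not expected) */
}
```

Three values are the minimum that makes this unambiguous: with only two values, each stamp would be the successor of the other, so neither block could be identified as newer.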
Reply by John Devereux February 8, 2008
ssubbarayan <ssubba@gmail.com> writes:


[...]

>
> John,
> My only worry was getting atleast one good copy.In your whole
> algorithm,you have assumed atleast one good copy exists.I was
> wondering what would be situation when the first time(no copy
> available,freshly you are writing data),and you encounter power brown
> out situation.
Firstly, David's algorithm is better - use a version number based system like he describes.

For any possible algorithm, if the power fails during writing of data, you are always going to lose *that version*. Just as if the power failed before you started to write it.

Assuming your eeprom is initially filled with 0xff, and a 32-bit version number, then a version number of 0xffffffff (or -1) would indicate a missing copy.
> I guess in this scenerio theres nothing you can do about
> it.How ever if you have any solutions in mind for this,please let me
> know.
>
> Regards,
> s.subbarayan
-- John Devereux
Reply by ssubbarayan February 8, 2008
On Feb 7, 4:25 pm, John Devereux <jdREM...@THISdevereux.me.uk> wrote:
> ssubbarayan <ssu...@gmail.com> writes:
> > On Feb 6, 5:25 pm, John Devereux <jdREM...@THISdevereux.me.uk> wrote:
>
> [...]
>
> >> This is equivalent to what I was planning. Although I don't think I
> >> need a checksum. I was going to have "valid" markers, separate from
> >> the data blocks. So it would go
>
> >>   mark copy 1 invalid
> >>   write new copy 1
> >>   mark copy 1 valid
> >>   mark copy 2 invalid
> >>   write new copy 2
> >>   mark copy 2 valid
>
> >> On power up both copy valid flags would be checked, and any "invalid"
> >> copy overwritten with the valid one. The "copy valid" markers would be
> >> stored on separate pages from the data (and each other), so hopefully
> >> will not get corrupted at the same time as the data they refer to.
>
> >> Only problem with this is it requires 4 pages to be written instead of
> >> one. Using a checksum to replace the separate flags could mean just
> >> two pages - perhaps that is better after all.
>
> [...]
>
> > Hi John,
> >      I will continue using brackets while posting long links.
>
> They have to be *angle* brackets, < >. But you are using google
> groups, which usually scrambles everything up anyway.
>
> > By the way,I have a question regarding your implementation.
> > In your algorithm to make two copies of any data in nvram,what if
> > during updation to both the copies you encounter power brownout?Since
> > power brown outs are unpredictable,how are we going to guarentee
> > atleast one good copy exists with us?
> > The scenerio which I am referring to here would be the first time when
> > you are updating the data.During first updation,you wont be able to
> > ascertain whether the copy is good or bad.
> > Another question is at what point of time you would update the
> > validity flag for the data?
>
> update():
>
>   a) mark copy 1 invalid
>   b) write new copy 1
>   c) mark copy 1 valid
>
>      [same again for copy 2]
>
> startup():
>   any copy marked invalid is replaced by the copy marked valid.
>
> The steps happen in strict order. Each previous step must complete
> successfully before the next is started. So the only way the valid
> flag can be set is if the data has been successfuly written, without
> interruption.
>
> > On what basis you would come to know data is valid given that you dont
> > have a checksum?
>
> The data is marked valid only *after* it has been successfully
> written. If writing of data is interrupted, then the flag never set
> either. So next time it powers up we know that copy may be bad, and
> restore from the good one.
>
> There is always at least one good copy.
>
> Let us look at what happens if programming is interrupted during a,b,d
> above.
>
> a) The copy 1 valid *flag* is left in an unknown state. But the actual
> data is valid. So either the startup will see it invalid and restore
> the data, or it sees it valid and all is OK.
>
> b) The data is marked invalid, and the *data* is left in an unknown
> state. This is OK, the startup will see the invalid flag and restore
> the data.
>
> c) The data has been correctly written, but the valid flag is left in
> an unknown state. If the startup sees the flag as valid, that is OK,
> because the data is in fact valid. If it sees it as invalid, the data
> will be restored from the other copy. Still OK.
>
> Obviously this make a few assumptions: the eeprom has not worn out,
> and that there is some brownout protection so that the CPU does not go
> crazy and erase everything.
>
> Another assumption is that the flags are either programmed or not
> programmed. But what if the flag programming gets interrupted so that
> the flag state is not only unknown, but is actually *unreliable*. That
> is, it is only "half programmed" (or half erased), so sometimes reads
> "valid" and sometimes "invalid"? In this condition the state read
> could depend on temperature,age or supply noise.
>
> It would require a very unlikely sequence of events, but you could
> have:
>
> update()
>   ...
>   mark copy 2 invalid
>   write copy 2
>   mark copy 2 valid <interrupted>
>
> Then on power up, copy 2 valid flag is unreliable. But at startup
> happens to read OK.
>
> Then next time we do an update, we get *another* power cut, this time
> during copy 1 update. And at power up, this time copy 2 reads
> *invalid*. So we have no valid copies.
>
> I think the solution is to reprogram the "valid" flags every startup.
>
> > I am sorry if these questions look amature,I am trying to understand
> > it and felt your algorithm is more simpler then mine except for extra
> > memory needed for having copies.
>
> I find it a difficult area, too. (And it gets harder if you start
> thinking about wear-levelling or if you don't want to allocate a whole
> page to a record, or if the record does not fit in a single page...)
>
> > Looking farward for your reply and advanced thanks,
> > Regards,
> > s.subbarayan
>
> --
>
> John Devereux
John,

My only worry was getting at least one good copy. Your whole algorithm assumes that at least one good copy exists. I was wondering what the situation would be the first time, when no copy is available yet and you are writing the data fresh, and you encounter a power brownout. I guess in this scenario there is nothing you can do about it. However, if you have any solutions in mind for this, please let me know.

Regards,
s.subbarayan
Reply by John Devereux February 7, 2008
David Brown <david@westcontrol.removethisbit.com> writes:

> John Devereux wrote:
>
>> update():
>>
>>   a) mark copy 1 invalid
>>   b) write new copy 1
>>   c) mark copy 1 valid
>>
>>      [same again for copy 2]
>>
>> startup(): any copy marked invalid is replaced by the copy marked
>> valid.
>>
>> The steps happen in strict order. Each previous step must complete
>> successfully before the next is started. So the only way the valid
>> flag can be set is if the data has been successfuly written, without
>> interruption.
[...]
> A better method is to have a version stamp along with your data. You
> have two blocks, each structured as "version stamp, data". At
> startup, you verify each block based on having a valid version (and
> possibly a checksum as well, if you are particularly paranoid). The
> latest valid version shows which block you use as your data.
>
> For an update, you erase the block containing the older version of the
> data. Then you save your data to this block, then you write your new
> version stamp. There is no need to write your data a second time - it
> gives no advantages, and halves your eeprom/flash life expectancy.
That does seem a better idea. I have used versioned structures before, for a flash based system. So I don't know why I did not suggest it here too.

--
John Devereux
Reply by David Brown February 7, 2008
John Devereux wrote:

> update():
>
>   a) mark copy 1 invalid
>   b) write new copy 1
>   c) mark copy 1 valid
>
>      [same again for copy 2]
>
> startup():
>   any copy marked invalid is replaced by the copy marked valid.
>
> The steps happen in strict order. Each previous step must complete
> successfully before the next is started. So the only way the valid
> flag can be set is if the data has been successfuly written, without
> interruption.
>
>> On what basis you would come to know data is valid given that you dont
>> have a checksum?
>
> The data is marked valid only *after* it has been successfully
> written. If writing of data is interrupted, then the flag never set
> either. So next time it powers up we know that copy may be bad, and
> restore from the good one.
>
> There is always at least one good copy.
>
> Let us look at what happens if programming is interrupted during a,b,d
> above.
>
> a) The copy 1 valid *flag* is left in an unknown state. But the actual
> data is valid. So either the startup will see it invalid and restore
> the data, or it sees it valid and all is OK.
>
> b) The data is marked invalid, and the *data* is left in an unknown
> state. This is OK, the startup will see the invalid flag and restore
> the data.
>
> c) The data has been correctly written, but the valid flag is left in
> an unknown state. If the startup sees the flag as valid, that is OK,
> because the data is in fact valid. If it sees it as invalid, the data
> will be restored from the other copy. Still OK.
>
> Obviously this make a few assumptions: the eeprom has not worn out,
> and that there is some brownout protection so that the CPU does not go
> crazy and erase everything.
>
> Another assumption is that the flags are either programmed or not
> programmed. But what if the flag programming gets interrupted so that
> the flag state is not only unknown, but is actually *unreliable*. That
> is, it is only "half programmed" (or half erased), so sometimes reads
> "valid" and sometimes "invalid"? In this condition the state read
> could depend on temperature,age or supply noise.
>
> It would require a very unlikely sequence of events, but you could
> have:
>
> update()
>   ...
>   mark copy 2 invalid
>   write copy 2
>   mark copy 2 valid <interrupted>
>
> Then on power up, copy 2 valid flag is unreliable. But at startup
> happens to read OK.
>
> Then next time we do an update, we get *another* power cut, this time
> during copy 1 update. And at power up, this time copy 2 reads
> *invalid*. So we have no valid copies.
>
> I think the solution is to reprogram the "valid" flags every startup.
>
>> I am sorry if these questions look amature,I am trying to understand
>> it and felt your algorithm is more simpler then mine except for extra
>> memory needed for having copies.
>
> I find it a difficult area, too. (And it gets harder if you start
> thinking about wear-levelling or if you don't want to allocate a whole
> page to a record, or if the record does not fit in a single page...)
>
A better method is to have a version stamp along with your data. You have two blocks, each structured as "version stamp, data". At startup, you verify each block based on having a valid version (and possibly a checksum as well, if you are particularly paranoid). The latest valid version shows which block you use as your data.

For an update, you erase the block containing the older version of the data. Then you save your data to this block, then you write your new version stamp. There is no need to write your data a second time - it gives no advantages, and halves your eeprom/flash life expectancy.
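The two-block scheme can be sketched in a few lines of C. This is an illustration only, with loud assumptions: the EEPROM is modelled as a RAM array called `store`, the stamp is a 32-bit counter where 0xFFFFFFFF means "no copy" (the erased-value convention suggested elsewhere in this thread), and the names `block_t`, `store_current` and `store_update` are invented for the example. Real code would go through the part's own read and write routines, and the counter is assumed never to get anywhere near 0xFFFFFFFF.

```c
#include <stdint.h>
#include <string.h>

#define DATA_SIZE 8u            /* assumed payload size */
#define NO_COPY   0xFFFFFFFFu   /* stamp value of an erased block */

typedef struct {
    uint32_t version;           /* version stamp, always written last */
    uint8_t  data[DATA_SIZE];
} block_t;

/* RAM stand-in for the two EEPROM blocks, both initially "erased". */
block_t store[2] = {
    { NO_COPY, {0} },
    { NO_COPY, {0} }
};

/* Index of the block with the latest valid stamp, or -1 if none. */
int store_current(void)
{
    int have0 = (store[0].version != NO_COPY);
    int have1 = (store[1].version != NO_COPY);

    if (have0 && have1)
        return (store[1].version > store[0].version) ? 1 : 0;
    if (have0)
        return 0;
    if (have1)
        return 1;
    return -1;
}

/* Save new data into the older block: erase, write data, stamp last. */
void store_update(const uint8_t *data)
{
    int cur = store_current();
    int target = (cur == 0) ? 1 : 0;
    uint32_t next = (cur < 0) ? 0u : store[cur].version + 1u;

    store[target].version = NO_COPY;             /* erase the older block */
    memcpy(store[target].data, data, DATA_SIZE); /* then the data         */
    store[target].version = next;                /* stamp written last    */
}
```

If power fails anywhere before the final stamp write, the target block still reads as `NO_COPY` and the other block remains the current one, so only the version being written is lost.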
Reply by John Devereux February 7, 2008
ssubbarayan <ssubba@gmail.com> writes:

> On Feb 6, 5:25 pm, John Devereux <jdREM...@THISdevereux.me.uk> wrote:
[...]
>>
>> This is equivalent to what I was planning. Although I don't think I
>> need a checksum. I was going to have "valid" markers, separate from
>> the data blocks. So it would go
>>
>>   mark copy 1 invalid
>>   write new copy 1
>>   mark copy 1 valid
>>   mark copy 2 invalid
>>   write new copy 2
>>   mark copy 2 valid
>>
>> On power up both copy valid flags would be checked, and any "invalid"
>> copy overwritten with the valid one. The "copy valid" markers would be
>> stored on separate pages from the data (and each other), so hopefully
>> will not get corrupted at the same time as the data they refer to.
>>
>> Only problem with this is it requires 4 pages to be written instead of
>> one. Using a checksum to replace the separate flags could mean just
>> two pages - perhaps that is better after all.
>>
[...]
> Hi John,
> I will continue using brackets while posting long links.
They have to be *angle* brackets, < >. But you are using google groups, which usually scrambles everything up anyway.
> By the way,I have a question regarding your implementation.
> In your algorithm to make two copies of any data in nvram,what if
> during updation to both the copies you encounter power brownout?Since
> power brown outs are unpredictable,how are we going to guarentee
> atleast one good copy exists with us?
> The scenerio which I am referring to here would be the first time when
> you are updating the data.During first updation,you wont be able to
> ascertain whether the copy is good or bad.
> Another question is at what point of time you would update the
> validity flag for the data?
update():

  a) mark copy 1 invalid
  b) write new copy 1
  c) mark copy 1 valid

     [same again for copy 2]

startup():
  any copy marked invalid is replaced by the copy marked valid.

The steps happen in strict order. Each previous step must complete successfully before the next is started. So the only way the valid flag can be set is if the data has been successfully written, without interruption.
> On what basis you would come to know data is valid given that you dont
> have a checksum?
The data is marked valid only *after* it has been successfully written. If writing of data is interrupted, then the flag is never set either. So next time it powers up we know that copy may be bad, and restore from the good one.

There is always at least one good copy.

Let us look at what happens if programming is interrupted during a, b, or c above.

a) The copy 1 valid *flag* is left in an unknown state. But the actual data is valid. So either the startup will see it invalid and restore the data, or it sees it valid and all is OK.

b) The data is marked invalid, and the *data* is left in an unknown state. This is OK, the startup will see the invalid flag and restore the data.

c) The data has been correctly written, but the valid flag is left in an unknown state. If the startup sees the flag as valid, that is OK, because the data is in fact valid. If it sees it as invalid, the data will be restored from the other copy. Still OK.

Obviously this makes a few assumptions: that the eeprom has not worn out, and that there is some brownout protection so that the CPU does not go crazy and erase everything.

Another assumption is that the flags are either programmed or not programmed. But what if the flag programming gets interrupted so that the flag state is not only unknown, but is actually *unreliable*? That is, it is only "half programmed" (or half erased), so sometimes reads "valid" and sometimes "invalid"? In this condition the state read could depend on temperature, age or supply noise.

It would require a very unlikely sequence of events, but you could have:

update()
  ...
  mark copy 2 invalid
  write copy 2
  mark copy 2 valid <interrupted>

Then on power up, the copy 2 valid flag is unreliable. But at startup it happens to read OK.

Then next time we do an update, we get *another* power cut, this time during the copy 1 update. And at power up, this time copy 2 reads *invalid*. So we have no valid copies.

I think the solution is to reprogram the "valid" flags every startup.
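The update and startup sequence above can be tried out on a desk with a small RAM model. This is a rough sketch, not the routine from any real project: `copy_data` and `copy_valid` stand in for the data pages and the separate flag pages, `CUT_DURING_WRITE` simulates a power cut halfway through step b), and the half-programmed flag case is not modelled at all. The flag rewrite at the end of `startup` follows the tentative suggestion in the last paragraph.

```c
#include <stdint.h>
#include <string.h>

#define DATA_SIZE 4u    /* assumed record size */

enum { CUT_NONE, CUT_DURING_WRITE };

/* RAM stand-ins for two data pages and two separate flag pages. */
uint8_t copy_data[2][DATA_SIZE];
uint8_t copy_valid[2];              /* 1 = valid, 0 = invalid */

/* Steps a) to c) for copy n. With CUT_DURING_WRITE only half of the
 * data is written and the valid flag is never set. */
void update_copy(int n, const uint8_t *data, int cut)
{
    copy_valid[n] = 0;                              /* a) mark invalid */
    if (cut == CUT_DURING_WRITE) {
        memcpy(copy_data[n], data, DATA_SIZE / 2);  /* b) interrupted  */
        return;
    }
    memcpy(copy_data[n], data, DATA_SIZE);          /* b) write data   */
    copy_valid[n] = 1;                              /* c) mark valid   */
}

/* Replace any copy marked invalid with the copy marked valid.
 * Returns 0 on success, -1 if neither copy is marked valid. */
int startup(void)
{
    if (!copy_valid[0] && !copy_valid[1])
        return -1;

    for (int n = 0; n < 2; n++)
        if (!copy_valid[n])
            update_copy(n, copy_data[1 - n], CUT_NONE);

    /* Reprogram both flags so a marginal flag gets a full write. */
    copy_valid[0] = 1;
    copy_valid[1] = 1;
    return 0;
}
```

After an interrupted update of copy 1, `startup` restores it from copy 2, so the record falls back to the previous version rather than ending up half old and half new.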
> I am sorry if these questions look amature,I am trying to understand
> it and felt your algorithm is more simpler then mine except for extra
> memory needed for having copies.
I find it a difficult area, too. (And it gets harder if you start thinking about wear-levelling or if you don't want to allocate a whole page to a record, or if the record does not fit in a single page...)
> Looking farward for your reply and advanced thanks,
> Regards,
> s.subbarayan
-- John Devereux
Reply by CBFalconer February 7, 2008
ssubbarayan wrote:
>
... snip ...
> > I will continue using brackets while posting long links.
FYI the proper way to transmit links is within <> pairs. See the page URL in my sig. below for an example. Another would be:

<http://cbfalconer.home.att.net/download/>

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

--
Posted via a free Usenet account from http://www.teranews.com