Arlet <usenet+5@c-scape.nl> writes:

> On Feb 8, 11:58 am, John Devereux <jdREM...@THISdevereux.me.uk> wrote:
>
>> I would still love to know, for sure, that a write to part of a page
>> does not involve an internal erasure of the entire page. Without
>> knowing this each version stamp needs a page of its own as far as I
>> can see. The act of writing the version number must be guaranteed not
>> to upset the data it refers to, if it gets interrupted.
>>
>> I think I will have to try and test this.
>
> To test the system, you could make a simple test jig that switches the
> power to your board. Use another controller to switch the power in
> random intervals. The random interval timing should match the
> discharge rate of the power supply capacitors such that the board
> suffers a lot of brown out conditions. Add an extra R/C filter if
> necessary.
>
> On the device you're testing, set up some special firmware that
> continuously writes updates to the EEPROM. Instead of real data, write
> a verifiable test pattern, and have the software check it regularly.
> If it finds corrupted data in a 'valid' block, trigger an alarm.
>
> Then leave the test setup in a corner of the lab, 24/7.

That sounds like a good idea to test a finished routine.

But to get the initial information needed to write it, I am thinking
of this:

 - hack the electronics so the EEPROM can be powered from an output
   pin
 - hack my eeprom_write routine so that a timer can interrupt power to
   the EEPROM and hold the I2C pins low (so the EEPROM is definitely
   unpowered).

That allows the timer to interrupt programming using a precise time
delay that I can sweep through a range of values. For each value I can

 - print the eeprom page contents (to a serial port)
 - reprogram the entire page with a test pattern
 - start the timer and the page programming test
   (different pattern, only alters part of page)

I should be able to see any partial erasures, partial programming, and
also any erasure of bytes on the same page outside of the program
area.

Perhaps I would also print out extra regions, such as address 0 and
parts of adjacent pages.
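
Comparing each printed dump by eye would get tedious over a long sweep,
so the check could be automated on the PC side. What follows is only a
sketch: the name classify_page, its arguments, and the assumption that
an erased cell reads back as 0xFF are all mine and not taken from any
datasheet. The test pattern should avoid 0xFF so that "erased" and
"old" can be told apart.

```python
ERASED = 0xFF  # assumption: an erased EEPROM cell reads back as 0xFF


def classify_page(before, target, readback, start, length):
    """Compare a page dump taken after an interrupted partial write.

    before   -- page contents programmed before the test (test pattern)
    target   -- bytes the interrupted write tried to program
    readback -- page contents dumped after power was restored
    start    -- offset in the page where the partial write began
    length   -- number of bytes in the partial write
    Returns a list of (offset, label) for every byte that is not 'old'.
    """
    report = []
    for offset, value in enumerate(readback):
        inside = start <= offset < start + length
        if inside and value == target[offset - start]:
            label = "programmed"      # new value fully written
        elif value == before[offset]:
            label = "old"             # byte still holds the test pattern
        elif value == ERASED:
            label = "erased"          # erased but not reprogrammed
        else:
            label = "partial"         # neither old, new, nor erased
        if not inside and label != "old":
            label = "COLLATERAL " + label   # damage outside the write area
        if label != "old":
            report.append((offset, label))
    return report
```

Any "COLLATERAL" entry would suggest that a partial write does disturb
bytes outside the programmed area, which is exactly the question above.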

This does not simulate a real system since there is no "brownout"
state. So I still need something like your setup as a final
verification.

-- 

John Devereux

On Feb 8, 11:58 am, John Devereux <jdREM...@THISdevereux.me.uk> wrote:

> I would still love to know, for sure, that a write to part of a page
> does not involve an internal erasure of the entire page. Without
> knowing this each version stamp needs a page of its own as far as I
> can see. The act of writing the version number must be guaranteed not
> to upset the data it refers to, if it gets interrupted.
>
> I think I will have to try and test this.

To test the system, you could make a simple test jig that switches the
power to your board. Use another controller to switch the power in
random intervals. The random interval timing should match the
discharge rate of the power supply capacitors such that the board
suffers a lot of brown out conditions. Add an extra R/C filter if
necessary.

On the device you're testing, set up some special firmware that
continuously writes updates to the EEPROM. Instead of real data, write
a verifiable test pattern, and have the software check it regularly.
If it finds corrupted data in a 'valid' block, trigger an alarm.

Then leave the test setup in a corner of the lab, 24/7.
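
As a rough illustration of what a "verifiable test pattern" could look
like, here is a host-side model in Python. It is a sketch only: the
block size, the counter-plus-filler layout and the use of CRC-32 are
assumptions made for the example, and the real firmware would need an
equivalent routine in its own language.

```python
import struct
import zlib

BLOCK_SIZE = 32  # hypothetical block size in bytes


def make_block(sequence):
    """Build a self-checking block: counter, derived filler, CRC-32."""
    header = struct.pack("<I", sequence)
    filler_len = BLOCK_SIZE - len(header) - 4
    # The filler depends on the counter, so a stale byte is detectable.
    filler = bytes((sequence + i) & 0xFF for i in range(filler_len))
    body = header + filler
    return body + struct.pack("<I", zlib.crc32(body))


def check_block(block):
    """Return the sequence number if the block is intact, else None."""
    if len(block) != BLOCK_SIZE:
        return None
    body = block[:-4]
    (stored_crc,) = struct.unpack("<I", block[-4:])
    if zlib.crc32(body) != stored_crc:
        return None
    (sequence,) = struct.unpack("<I", body[:4])
    expected = bytes((sequence + i) & 0xFF for i in range(len(body) - 4))
    return sequence if body[4:] == expected else None
```

A block that the storage scheme reports as valid but for which
check_block returns None is the case that should trigger the alarm.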

David Brown <david@westcontrol.removethisbit.com> writes:

> John Devereux wrote:
>> ssubbarayan <ssubba@gmail.com> writes:
>>
>>
>> [...]
>>
>>> John,
>>> My only worry was getting at least one good copy. In your whole
>>> algorithm, you have assumed at least one good copy exists. I was
>>> wondering what the situation would be the first time (no copy
>>> available, you are writing the data afresh) and you encounter a
>>> power brown-out.
>>
>> Firstly, David's algorithm is better - use a version number based
>> system like he describes.
>>
>> For any possible algorithm, if the power fails during writing of data,
>> you are always going to lose *that version*. Just as if the power
>> failed before you started to write it.
>>
>> Assuming your eeprom is initially filled with 0xff, and a 32 bit
>> version number, then a version number of 0xffffffff (or -1) would
>> indicate a missing copy.
>>
>
> It's actually enough for the version stamp to distinguish between
> invalid, newer, and older versions.  All you really need are
> versions 1, 2, and 3, and wrap to 1 again after 3.  Anything other
> than 1, 2, or 3 is invalid.

Cool - I was thinking of avoiding the wrap entirely by having a range
so high it would never happen :)

> One thing to watch out for, however, is the possibility of corruption
> at addresses other than the one you are writing.  External serial
> eeproms generally have protection against this, but Atmel AVRs are
> known to be able to corrupt byte 0 of the eeprom if they get a reset
> during a write (the address register gets cleared to 0, but the write
> continues - thus the data at address 0 may be half overwritten).  The
> same problem can probably occur on many other eeproms - I don't know
> if the AVRs are a particularly high risk, or if Atmel is just unusually
> honest!

I would still love to know, for sure, that a write to part of a page
does not involve an internal erasure of the entire page. Without
knowing this each version stamp needs a page of its own as far as I
can see. The act of writing the version number must be guaranteed not
to upset the data it refers to, if it gets interrupted.

I think I will have to try and test this.

-- 

John Devereux

John Devereux wrote:
> ssubbarayan <ssubba@gmail.com> writes:
> 
> 
> [...]
> 
>> John,
>> My only worry was getting at least one good copy. In your whole
>> algorithm, you have assumed at least one good copy exists. I was
>> wondering what the situation would be the first time (no copy
>> available, you are writing the data afresh) and you encounter a
>> power brown-out.
> 
> Firstly, David's algorithm is better - use a version number based
> system like he describes.
> 
> For any possible algorithm, if the power fails during writing of data,
> you are always going to lose *that version*. Just as if the power
> failed before you started to write it.
> 
> Assuming your eeprom is initially filled with 0xff, and a 32 bit
> version number, then a version number of 0xffffffff (or -1) would
> indicate a missing copy.
> 

It's actually enough for the version stamp to distinguish between
invalid, newer, and older versions.  All you really need are versions
1, 2, and 3, and wrap to 1 again after 3.  Anything other than 1, 2, or
3 is invalid.
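
To make the wrap-around comparison concrete, here is a small sketch in
Python. The function names are made up for the example, and the choice
of block "a" when both stamps are equal is arbitrary, since that case
should not arise in normal operation.

```python
VALID_VERSIONS = (1, 2, 3)


def next_version(version):
    """Successor in the 1 -> 2 -> 3 -> 1 cycle."""
    return version % 3 + 1


def pick_newest(version_a, version_b):
    """Return 'a', 'b', or None given the stamps of the two blocks."""
    a_ok = version_a in VALID_VERSIONS
    b_ok = version_b in VALID_VERSIONS
    if a_ok and b_ok:
        if next_version(version_a) == version_b:
            return "b"           # b is one step ahead of a
        if next_version(version_b) == version_a:
            return "a"           # a is one step ahead of b
        return "a"               # equal stamps: arbitrary choice
    if a_ok:
        return "a"
    if b_ok:
        return "b"
    return None                  # neither block holds a valid stamp
```

Three values are enough because, with two blocks, the stamps never
differ by more than one step, so "one ahead in the cycle" always
identifies the newer block.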

One thing to watch out for, however, is the possibility of corruption at 
addresses other than the one you are writing.  External serial eeproms 
generally have protection against this, but Atmel AVRs are known to be 
able to corrupt byte 0 of the eeprom if they get a reset during a write 
(the address register gets cleared to 0, but the write continues - thus 
the data at address 0 may be half overwritten).  The same problem can 
probably occur on many other eeproms - I don't know if the AVRs are a
particularly high risk, or if Atmel is just unusually honest!

ssubbarayan <ssubba@gmail.com> writes:

[...]

>
> John,
> My only worry was getting at least one good copy. In your whole
> algorithm, you have assumed at least one good copy exists. I was
> wondering what the situation would be the first time (no copy
> available, you are writing the data afresh) and you encounter a
> power brown-out.

Firstly, David's algorithm is better - use a version number based
system like he describes.

For any possible algorithm, if the power fails during writing of data,
you are always going to lose *that version*. Just as if the power
failed before you started to write it.

Assuming your eeprom is initially filled with 0xff, and a 32 bit
version number, then a version number of 0xffffffff (or -1) would
indicate a missing copy.
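
In code, that check might look something like the following Python
model. It is a sketch under assumptions: the stamp is taken to be the
first four bytes of each copy, stored little-endian, and the function
names are invented for the example.

```python
import struct

BLANK_VERSION = 0xFFFFFFFF  # what four erased (0xFF) bytes decode to


def read_version(raw):
    """Decode a 32-bit stamp; return None for a never-written copy."""
    (version,) = struct.unpack("<I", raw[:4])
    return None if version == BLANK_VERSION else version


def newest_copy(raw_a, raw_b):
    """Index (0 or 1) of the copy with the higher stamp, or None."""
    versions = [read_version(raw_a), read_version(raw_b)]
    present = [i for i, v in enumerate(versions) if v is not None]
    if not present:
        return None          # first boot: nothing has been written yet
    return max(present, key=lambda i: versions[i])
```

When newest_copy returns None the firmware would have to fall back to
built-in defaults, which is all that can be done if power fails during
the very first write.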

> I guess in this scenario there's nothing you can do about
> it. However, if you have any solutions in mind for this, please let me
> know.
>
> Regards,
> s.subbarayan

-- 

John Devereux

On Feb 7, 4:25 pm, John Devereux <jdREM...@THISdevereux.me.uk> wrote:
> ssubbarayan <ssu...@gmail.com> writes:
> > On Feb 6, 5:25 pm, John Devereux <jdREM...@THISdevereux.me.uk> wrote:
>
> [...]
>
> >> This is equivalent to what I was planning. Although I don't think I
> >> need a checksum. I was going to have "valid" markers, separate from
> >> the data blocks. So it would go
>
> >>   mark copy 1 invalid
> >>   write new copy 1
> >>   mark copy 1 valid
> >>   mark copy 2 invalid
> >>   write new copy 2
> >>   mark copy 2 valid
>
> >> On power up both copy valid flags would be checked, and any "invalid"
> >> copy overwritten with the valid one. The "copy valid" markers would be
> >> stored on separate pages from the data (and each other), so hopefully
> >> will not get corrupted at the same time as the data they refer to.
>
> >> Only problem with this is it requires 4 pages to be written instead of
> >> one. Using a checksum to replace the separate flags could mean just
> >> two pages - perhaps that is better after all.
>
> [...]
>
> > Hi John,
> >      I will continue using brackets while posting long links.
>
> They have to be *angle* brackets, < >. But you are using google
> groups, which usually scrambles everything up anyway.
>
> > By the way, I have a question regarding your implementation.
> > In your algorithm to make two copies of any data in nvram, what if
> > you encounter a power brownout while updating both copies? Since
> > power brownouts are unpredictable, how are we going to guarantee
> > that at least one good copy exists?
> > The scenario I am referring to here would be the first time
> > you update the data. During the first update, you won't be able to
> > ascertain whether the copy is good or bad.
> > Another question is at what point in time you would update the
> > validity flag for the data?
>
> update():
>
>   a) mark copy 1 invalid
>   b) write new copy 1
>   c) mark copy 1 valid
>
>      [same again for copy 2]
>
> startup():
>   any copy marked invalid is replaced by the copy marked valid.
>
> The steps happen in strict order. Each previous step must complete
> successfully before the next is started. So the only way the valid
> flag can be set is if the data has been successfully written, without
> interruption.
>
> > On what basis would you know the data is valid, given that you don't
> > have a checksum?
>
> The data is marked valid only *after* it has been successfully
> written. If writing of data is interrupted, then the flag is never set
> either. So next time it powers up we know that copy may be bad, and
> restore from the good one.
>
> There is always at least one good copy.
>
> Let us look at what happens if programming is interrupted during a, b, c
> above.
>
> a) The copy 1 valid *flag* is left in an unknown state. But the actual
> data is valid. So either the startup will see it invalid and restore
> the data, or it sees it valid and all is OK.
>
> b) The data is marked invalid, and the *data* is left in an unknown
> state. This is OK, the startup will see the invalid flag and restore
> the data.
>
> c) The data has been correctly written, but the valid flag is left in
> an unknown state. If the startup sees the flag as valid, that is OK,
> because the data is in fact valid. If it sees it as invalid, the data
> will be restored from the other copy. Still OK.
>
> Obviously this makes a few assumptions: the eeprom has not worn out,
> and that there is some brownout protection so that the CPU does not go
> crazy and erase everything.
>
> Another assumption is that the flags are either programmed or not
> programmed. But what if the flag programming gets interrupted so that
> the flag state is not only unknown, but is actually *unreliable*. That
> is, it is only "half programmed" (or half erased), so sometimes reads
> "valid" and sometimes "invalid"? In this condition the state read
> could depend on temperature, age or supply noise.
>
> It would require a very unlikely sequence of events, but you could
> have:
>
> update()
>   ...
>   mark copy 2 invalid
>   write copy 2
>   mark copy 2 valid <interrupted>
>
> Then on power up, the copy 2 valid flag is unreliable, but at startup
> it happens to read OK.
>
> Then next time we do an update, we get *another* power cut, this time
> during copy 1 update. And at power up, this time copy 2 reads
> *invalid*. So we have no valid copies.
>
> I think the solution is to reprogram the "valid" flags every startup.
>
> > I am sorry if these questions look amateur; I am trying to understand
> > it, and felt your algorithm is simpler than mine except for the extra
> > memory needed for the copies.
>
> I find it a difficult area, too. (And it gets harder if you start
> thinking about wear-levelling or if you don't want to allocate a whole
> page to a record, or if the record does not fit in a single page...)
>
> > Looking forward to your reply, and thanks in advance,
> > Regards,
> > s.subbarayan
>
> --
>
> John Devereux

John,
My only worry was getting at least one good copy. In your whole
algorithm, you have assumed at least one good copy exists. I was
wondering what the situation would be the first time (no copy
available, you are writing the data afresh) and you encounter a
power brown-out. I guess in this scenario there's nothing you can do
about it. However, if you have any solutions in mind for this, please
let me know.

Regards,
s.subbarayan

David Brown <david@westcontrol.removethisbit.com> writes:

> John Devereux wrote:
>
>> update():
>>
>>   a) mark copy 1 invalid
>>   b) write new copy 1
>>   c) mark copy 1 valid
>>
>>      [same again for copy 2]
>>
>> startup():   any copy marked invalid is replaced by the copy marked
>> valid.
>>
>> The steps happen in strict order. Each previous step must complete
>> successfully before the next is started. So the only way the valid
>> flag can be set is if the data has been successfully written, without
>> interruption.

[...]

> A better method is to have a version stamp along with your data.  You
> have two blocks, each structured as "version stamp, data".  At
> startup, you verify each block based on having a valid version (and
> possibly a checksum as well, if you are particularly paranoid).  The
> latest valid version shows which block you use as your data.
>
> For an update, you erase the block containing the older version of the
> data.  Then you save your data to this block, then you write your new
> version stamp.  There is no need to write your data a second time - it
> gives no advantages, and halves your eeprom/flash life expectancy.

That does seem a better idea. I have used versioned structures before,
for a flash based system. So I don't know why I did not suggest it
here too.


-- 

John Devereux

John Devereux wrote:

> update():
> 
>   a) mark copy 1 invalid
>   b) write new copy 1
>   c) mark copy 1 valid
> 
>      [same again for copy 2]
> 
> startup(): 
>   any copy marked invalid is replaced by the copy marked valid.
> 
> The steps happen in strict order. Each previous step must complete
> successfully before the next is started. So the only way the valid
> flag can be set is if the data has been successfully written, without
> interruption.
> 
>> On what basis would you know the data is valid, given that you don't
>> have a checksum?
> 
> The data is marked valid only *after* it has been successfully
> written. If writing of data is interrupted, then the flag is never set
> either. So next time it powers up we know that copy may be bad, and
> restore from the good one.
> 
> There is always at least one good copy.
> 
> Let us look at what happens if programming is interrupted during a, b, c
> above.
> 
> a) The copy 1 valid *flag* is left in an unknown state. But the actual
> data is valid. So either the startup will see it invalid and restore
> the data, or it sees it valid and all is OK.
> 
> b) The data is marked invalid, and the *data* is left in an unknown
> state. This is OK, the startup will see the invalid flag and restore
> the data.
> 
> c) The data has been correctly written, but the valid flag is left in
> an unknown state. If the startup sees the flag as valid, that is OK,
> because the data is in fact valid. If it sees it as invalid, the data
> will be restored from the other copy. Still OK.
> 
> Obviously this makes a few assumptions: the eeprom has not worn out,
> and that there is some brownout protection so that the CPU does not go
> crazy and erase everything.
> 
> Another assumption is that the flags are either programmed or not
> programmed. But what if the flag programming gets interrupted so that
> the flag state is not only unknown, but is actually *unreliable*. That
> is, it is only "half programmed" (or half erased), so sometimes reads
> "valid" and sometimes "invalid"? In this condition the state read
> could depend on temperature, age or supply noise.
> 
> It would require a very unlikely sequence of events, but you could
> have:
> 
> update()
>   ...
>   mark copy 2 invalid
>   write copy 2
>   mark copy 2 valid <interrupted>
> 
> Then on power up, the copy 2 valid flag is unreliable, but at startup
> it happens to read OK.
> 
> Then next time we do an update, we get *another* power cut, this time
> during copy 1 update. And at power up, this time copy 2 reads
> *invalid*. So we have no valid copies.
> 
> I think the solution is to reprogram the "valid" flags every startup.
> 
>> I am sorry if these questions look amateur; I am trying to understand
>> it, and felt your algorithm is simpler than mine except for the extra
>> memory needed for the copies.
> 
> I find it a difficult area, too. (And it gets harder if you start
> thinking about wear-levelling or if you don't want to allocate a whole
> page to a record, or if the record does not fit in a single page...)
> 

A better method is to have a version stamp along with your data.  You 
have two blocks, each structured as "version stamp, data".  At startup, 
you verify each block based on having a valid version (and possibly a 
checksum as well, if you are particularly paranoid).  The latest valid 
version shows which block you use as your data.

For an update, you erase the block containing the older version of the 
data.  Then you save your data to this block, then you write your new 
version stamp.  There is no need to write your data a second time - it 
gives no advantages, and halves your eeprom/flash life expectancy.
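
The update order can be modelled on a PC to see why an interrupted
update never loses the previous version. The Python below is only a
sketch of the idea: the class and parameter names are invented, the
stamp simply counts upwards rather than wrapping, and each step is
treated as atomic, which real hardware does not promise.

```python
class PowerCut(Exception):
    """Raised by the model to simulate power failing mid-update."""


class TwoBlockStore:
    """Host-side model of two 'version stamp, data' blocks."""

    BLANK = None  # model of an erased block: no valid stamp

    def __init__(self):
        self.blocks = [{"version": self.BLANK, "data": None},
                       {"version": self.BLANK, "data": None}]

    def _newest(self):
        valid = [b for b in self.blocks if b["version"] is not None]
        return max(valid, key=lambda b: b["version"]) if valid else None

    def load(self):
        """Startup: return data from the latest valid block, or None."""
        newest = self._newest()
        return newest["data"] if newest else None

    def update(self, data, cut_after=None):
        """Erase the older block, write data, then write the stamp last.

        cut_after -- 'erase' or 'data' to simulate a power cut there.
        """
        newest = self._newest()
        # The target is whichever block does not hold the newest version.
        if newest is not self.blocks[0]:
            target = self.blocks[0]
        else:
            target = self.blocks[1]
        new_version = (newest["version"] + 1) if newest else 1
        target["version"], target["data"] = self.BLANK, None   # erase
        if cut_after == "erase":
            raise PowerCut
        target["data"] = data                                   # save data
        if cut_after == "data":
            raise PowerCut
        target["version"] = new_version                         # stamp last
```

Because the stamp is written last, a block with a valid stamp always
has complete data behind it, and the block holding the newest version
is never touched during an update.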

ssubbarayan <ssubba@gmail.com> writes:

> On Feb 6, 5:25 pm, John Devereux <jdREM...@THISdevereux.me.uk> wrote:

[...]

>>
>> This is equivalent to what I was planning. Although I don't think I
>> need a checksum. I was going to have "valid" markers, separate from
>> the data blocks. So it would go
>>
>>   mark copy 1 invalid
>>   write new copy 1
>>   mark copy 1 valid
>>   mark copy 2 invalid
>>   write new copy 2
>>   mark copy 2 valid
>>
>> On power up both copy valid flags would be checked, and any "invalid"
>> copy overwritten with the valid one. The "copy valid" markers would be
>> stored on separate pages from the data (and each other), so hopefully
>> will not get corrupted at the same time as the data they refer to.
>>
>> Only problem with this is it requires 4 pages to be written instead of
>> one. Using a checksum to replace the separate flags could mean just
>> two pages - perhaps that is better after all.
>>

[...]

> Hi John,
>      I will continue using brackets while posting long links.

They have to be *angle* brackets, < >. But you are using google
groups, which usually scrambles everything up anyway.

> By the way, I have a question regarding your implementation.
> In your algorithm to make two copies of any data in nvram, what if
> you encounter a power brownout while updating both copies? Since
> power brownouts are unpredictable, how are we going to guarantee
> that at least one good copy exists?
> The scenario I am referring to here would be the first time
> you update the data. During the first update, you won't be able to
> ascertain whether the copy is good or bad.
> Another question is at what point in time you would update the
> validity flag for the data?

update():

  a) mark copy 1 invalid
  b) write new copy 1
  c) mark copy 1 valid

     [same again for copy 2]

startup(): 
  any copy marked invalid is replaced by the copy marked valid.

The steps happen in strict order. Each previous step must complete
successfully before the next is started. So the only way the valid
flag can be set is if the data has been successfully written, without
interruption.

> On what basis would you know the data is valid, given that you don't
> have a checksum?

The data is marked valid only *after* it has been successfully
written. If writing of data is interrupted, then the flag is never set
either. So next time it powers up we know that copy may be bad, and
restore from the good one.

There is always at least one good copy.

Let us look at what happens if programming is interrupted during a, b, c
above.

a) The copy 1 valid *flag* is left in an unknown state. But the actual
data is valid. So either the startup will see it invalid and restore
the data, or it sees it valid and all is OK.

b) The data is marked invalid, and the *data* is left in an unknown
state. This is OK, the startup will see the invalid flag and restore
the data.

c) The data has been correctly written, but the valid flag is left in
an unknown state. If the startup sees the flag as valid, that is OK,
because the data is in fact valid. If it sees it as invalid, the data
will be restored from the other copy. Still OK.
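
The same case analysis can be run mechanically. The Python model below
is a sketch, not firmware: it cuts power during each of the six steps
in turn, tries both possible read-back values for an interrupted flag,
and checks that startup still finds either the old or the new data.
The names and the dictionary layout are invented for the example, and
an interrupted flag is assumed to read back the same value every time.

```python
import itertools

GARBAGE = object()  # stands for a half-written data block


def update(store, new_data, cut_step=None, flag_reads_as=False):
    """Run the six update steps; optionally cut power during one step.

    store         -- {'flag': [bool, bool], 'data': [x, x]}
    cut_step      -- 0..5: the step that is interrupted, or None
    flag_reads_as -- value an interrupted flag write later reads back as
    """
    steps = []
    for copy in (0, 1):
        steps += [("flag", copy, False),      # a) mark copy invalid
                  ("data", copy, new_data),   # b) write new copy
                  ("flag", copy, True)]       # c) mark copy valid
    for number, (kind, copy, value) in enumerate(steps):
        if number == cut_step:
            # Interrupted write: flag state unknown, data half-written.
            store[kind][copy] = flag_reads_as if kind == "flag" else GARBAGE
            return
        store[kind][copy] = value


def startup(store):
    """Replace any copy marked invalid by the copy marked valid."""
    for copy in (0, 1):
        other = 1 - copy
        if not store["flag"][copy] and store["flag"][other]:
            store["data"][copy] = store["data"][other]
            store["flag"][copy] = True
    for copy in (0, 1):
        if store["flag"][copy]:
            return store["data"][copy]
    return None   # no valid copy at all


def survives_every_cut(old, new):
    """True if startup recovers old or new data after any single cut."""
    for cut_step, flag_value in itertools.product(range(6), (False, True)):
        store = {"flag": [True, True], "data": [old, old]}
        update(store, new, cut_step, flag_value)
        if startup(store) not in (old, new):
            return False
    return True
```

Under these assumptions survives_every_cut("old", "new") comes out
True, which matches the argument in a) to c).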

Obviously this makes a few assumptions: the eeprom has not worn out,
and that there is some brownout protection so that the CPU does not go
crazy and erase everything.

Another assumption is that the flags are either programmed or not
programmed. But what if the flag programming gets interrupted so that
the flag state is not only unknown, but is actually *unreliable*. That
is, it is only "half programmed" (or half erased), so sometimes reads
"valid" and sometimes "invalid"? In this condition the state read
could depend on temperature, age or supply noise.

It would require a very unlikely sequence of events, but you could
have:

update()
  ...
  mark copy 2 invalid
  write copy 2
  mark copy 2 valid <interrupted>

Then on power up, the copy 2 valid flag is unreliable, but at startup
it happens to read OK.

Then next time we do an update, we get *another* power cut, this time
during copy 1 update. And at power up, this time copy 2 reads
*invalid*. So we have no valid copies.

I think the solution is to reprogram the "valid" flags every startup.
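
Roughly what I have in mind, as a Python model rather than real
firmware. It is a sketch under assumptions: a half-programmed cell is
modelled as reading randomly, a full rewrite is assumed to leave the
cell solidly programmed, and the names are invented for the example.

```python
import random

MARGINAL = "marginal"  # half-programmed cell: each read may differ


def read_flag(cell, rng):
    """Read a flag cell; a marginal cell returns a random value."""
    return rng.choice((True, False)) if cell == MARGINAL else cell


def startup_with_refresh(store, rng):
    """Read each flag once, repair invalid copies, rewrite every flag."""
    seen = [read_flag(cell, rng) for cell in store["flag"]]
    if not any(seen):
        return None                       # no copy read as valid
    good = seen.index(True)
    for copy in (0, 1):
        if not seen[copy]:
            store["data"][copy] = store["data"][good]
        store["flag"][copy] = True        # full rewrite: no longer marginal
    return store["data"][good]
```

Whichever way the marginal flag happens to read, both flags are solid
again after startup, so a later power cut during the copy 1 update
cannot leave us depending on an unreliable copy 2 flag.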

> I am sorry if these questions look amateur; I am trying to understand
> it, and felt your algorithm is simpler than mine except for the extra
> memory needed for the copies.

I find it a difficult area, too. (And it gets harder if you start
thinking about wear-levelling or if you don't want to allocate a whole
page to a record, or if the record does not fit in a single page...)

> Looking forward to your reply, and thanks in advance,
> Regards,
> s.subbarayan

-- 

John Devereux