Reply by Clifford Heath June 5, 20052005-06-05
rolf.freitag@rolf... wrote:
> > The birthday paradox says...
> That's only true for files with a) same size

For files larger than the hash, size has *nothing* to do with
it, it's simple mathematics.

 > and b) size > 160.

If you mean files smaller than 160 bits (20 bytes) - how many
of those exist anyhow? It's true that ASCII english text has
only 1-2 bits of entropy per byte, but that doesn't stop the
checksum from working, if, like SHA-1, you include the file
size in the hash.

> Because on a typical PC there are many small files
collisions du occur much often
> than in one of 2^80 cases even with a theoretically perfect hash of 160
bits.

Find me a pair then, > 20 bytes.

> md5sum is not theoretically perfect because the
size of the md5sum is 128 bit and the
> collisions where found in 128 bit files.

Again, file size is irrelevant to this, I can't imagine why
you're hung up on it.

> Which hash is theoretically perfect?

None. But the extensive testing and analysis that's been carried
out means that we can get close enough. Remember the cosmic rays :-).

I've also built a duplicate-file finder - it's an interesting project.

Clifford Heath.

Beginning Microcontrollers with the MSP430

Reply by June 4, 20052005-06-04
Hi,

> The "birthday paradox" (google for it)
says that to reach an 50/50
> chance of finding two files having the same (theoretically perfect)
> hash of 160 bits, you need 2^80 files.

That's only true for files with a) same size and b) size > 160.
Because on a typical PC there are many small files collisions du occur much
often
than in one of 2^80 cases even with a theoretically perfect hash of 160 bits.

md5sum is not theoretically perfect because the size of the md5sum is 128 bit
and the
collisions where found in 128 bit files.
Which hash is theoretically perfect?
I know the file content itself, used with the file size as the hash, is perfect
and therefore
i'm simply using this hash, e. g. for finding duplicate files (
http://freshmeat.net/projects/dupmerge/ ).

Regards,

Rolf



Reply by Onestone June 4, 20052005-06-04
Hi Jeff, I would concentrate on making the time to write much faster. 
That is remove all the latency from the detect to the write process. 
Obviously the ideal is to have the segment erased initially, and again 
everytime it rolls over. That way, on power up, you only need to find 
the next word to write. A write is a very fast operation. One reason I 
use the switchers I do is because they will give me regulated supply out 
with only 0.7Vin. Since my power loss threshhold is higher than my 
actual operating voltage it is almost impossible for me not to be able 
to program in time. The key is rapid detection, and, rather than lots of 
extra capacitiance on the power supply output have a decent cap on the 
input. I have never seen cross segment corruption in the MSP430, but I 
specifically designed to prevent such issues from the very start.

Al

Jeff wrote:

>Hi Al
>Losing the data of my last power-up time, is not that important 
>which is separate from my other flash data. What I more interested 
>in, is when writing my "time up" to a segment, and this segment is

>just allocated to "time up" data. can I loss data or corrupt data
in 
>other segments if I power down halfway during a write, for example I 
>don't allow enough time between detecting the power down and 
>implementing the write procedure. If the worst is, that I corrupt 
>the "time-up" data segment, I can live with that. But if there is
a 
>chance other segments could be corrupted, then I will have to 
>consider that. I don't know the internal architecture of flash, and 
>whether TI have put in safe guards for writing only the sector 
>requested. I don't know.
>
>Jeff
>
>
>--- In msp430@msp4..., Onestone <onestone@b...> wrote:
>  
>
>>Maths probably doesn't even enter into the reality of it. If your 
>>talking about a last gasp write on power loss. If you're worried 
>>    
>>
>about 
>  
>
>>not being able to write 1 word to flash you shouldn't even be 
>>considering  trying to calculate a CRC or othe rvalidity check, 
>>    
>>
>then 
>  
>
>>write that off plus the original data. That just strikes me as too 
>>    
>>
>silly 
>  
>
>>for words.
>>
>>The OP spoecifically queried the issue of writing 4 bytes to flash 
>>    
>>
>as a 
>  
>
>>time value.  reality says you don't need 4 bytes only 2. assuming 
>>    
>>
>a 
>  
>
>>simple 32 bit seconds counter. if you wish to be truly frugal with 
>>    
>>
>your 
>  
>
>>erasewrites to flash  you allocate a single bit for each overflow 
>>    
>>
>from 
>  
>
>>16 bits, and sequentially program these. lots of writes, but fewer 
>>erases. personally I allocate 2 rolling stores for this. a 128 
>>    
>>
>byte 
>  
>
>>dataflash segment for the MSW and a 512 byte segment for the 16 
>>    
>>
>bit 
>  
>
>>count. If you ONLY write the 16 bit LSWat power loss then you only 
>>    
>>
>need 
>  
>
>>a 128 byte segment for each.
>>
>>The MSW only gets written every 18 hours of continuous operation, 
>>    
>>
>so the 
>  
>
>>segment will fill every 48 days, thus 10 years of continuous 
>>    
>>
>operation 
>  
>
>>will result in 75 segment erase/write cycles.
>>
>>If the seconds value is ONLY written at power loss then the 
>>    
>>
>segment will 
>  
>
>>require an erase cycle every 64 power losses. even at 1 an hour ( 
>>    
>>
>a 
>  
>
>>pretty shitty system) the data flash will only need erasing every 
>>    
>>
>64 
>  
>
>>hours, or 2.6666 days. Thus in 10 years only 1370 erase cycles 
>>    
>>
>would 
>  
>
>>have occurred.
>>
>>Both of these are well within spec.
>>
>>The only issue then is to write the flash fast enough. The OP 
>>    
>>
>depended 
>  
>
>>on reading the ADC. I think this is too slow. I would use the 
>>comparator, or a low battery warning on the system power supply 
>>    
>>
>(this is 
>  
>
>>how most of my systems work. A couple of early designs use the on 
>>    
>>
>board 
>  
>
>>comparator) Since I run from Li-poly batteries, in most cases, and 
>>    
>>
>since 
>  
>
>>my power supply will run right down to 0.7 Vin I set my trigger 
>>    
>>
>voltage 
>  
>
>>very high. After a full charge the batteries sit at 4.2V. At 3.6V 
>>    
>>
>they 
>  
>
>>start to discharge more rapidly. At 3.5V the discharge rate is too 
>>    
>>
>fast. 
>  
>
>>Thus at 3.6V I trigger my shut down sequence. having given the 
>>    
>>
>user 
>  
>
>>adequate warning at 3.7V.
>>
>>Al
>>
>>augusto einsfeldt wrote:
>>
>>    
>>
>>>It seems to be a math field here.
>>>Wonder if a simple double crc word would provide enough safety. 
>>>      
>>>
>Say, CRC-16
>  
>
>>>and CRC-ITT stored along with data area. Again, math could 
>>>      
>>>
>provide final
>  
>
>>>answer but it seems that when calculating a checksum with two 
>>>      
>>>
>different ways
>  
>
>>>one can get a good safety level if the FLASH failure rate is 
>>>      
>>>
>below 5 bytes
>  
>
>>>long.
>>>A failure in the data may result in same CRC-16 value but not 
>>>      
>>>
>likely will
>  
>
>>>also result in the same CRC-ITT value. It would also be true if 
>>>      
>>>
>the failure
>  
>
>>>is in one or both checksum values.
>>>A failure in the data and onde CRC value would be catch by second 
>>>      
>>>
>CRC.
>  
>
>>>I'd like to see someone in this list, with math background, 
>>>      
>>>
>writing about
>  
>
>>>this.
>>>-Augusto
>>>
>>>
>>>-----Original Message-----
>>>From: msp430@msp4... [mailto:msp430@msp4...] On 
>>>      
>>>
>Behalf Of
>  
>
>>>rolf.freitag@e...
>>>Sent: Friday, June 03, 2005 6:31 PM
>>>To: msp430@msp4...
>>>Subject: RE: [msp430] Writing to Flash during power down
>>>
>>>
>>>Hi,
>>>
>>> 
>>>
>>>      
>>>
>>>>MD5 and SHA-1 digests collide very, very infrequently and
they're
>>>>   
>>>>
>>>>        
>>>>
>>>yes, but they are not absolutely safe. I have some files with 
>>>      
>>>
>only 128 byte
>  
>
>>>which are different and do have the same md5sum. It's saver to 
>>>      
>>>
>use md5sum
>  
>
>>>and sha1sum but that only increases the probability which will 
>>>      
>>>
>always be
>  
>
>>>smaller than exactly 1. 
>>>I do prefer exactly 1 and not nearly 1; checksums are good in 
>>>      
>>>
>some cases but
>  
>
>>>often not good enough (snake oil), especially when you have a safe
>>>alternative.
>>>
>>>Regards,
>>>
>>>Rolf
>>>
>>>
>>>
>>>
>>>.
>>>
>>>
>>>Yahoo! Groups Links
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 
>>>
>>>      
>>>
>
>
>
>
>.
>
> 
>Yahoo! Groups Links
>
>
>
> 
>
>
>
>
>  
>


Reply by Jeff June 4, 20052005-06-04
Hi Al
Losing the data of my last power-up time, is not that important 
which is separate from my other flash data. What I more interested 
in, is when writing my "time up" to a segment, and this segment is 
just allocated to "time up" data. can I loss data or corrupt data in 
other segments if I power down halfway during a write, for example I 
don't allow enough time between detecting the power down and 
implementing the write procedure. If the worst is, that I corrupt 
the "time-up" data segment, I can live with that. But if there is a 
chance other segments could be corrupted, then I will have to 
consider that. I don't know the internal architecture of flash, and 
whether TI have put in safe guards for writing only the sector 
requested. I don't know.

Jeff


--- In msp430@msp4..., Onestone <onestone@b...> wrote:
> Maths probably doesn't even enter into the
reality of it. If your 
> talking about a last gasp write on power loss. If you're worried 
about 
> not being able to write 1 word to flash you
shouldn't even be 
> considering  trying to calculate a CRC or othe rvalidity check, 
then 
> write that off plus the original data. That just
strikes me as too 
silly 
> for words.
> 
> The OP spoecifically queried the issue of writing 4 bytes to flash 
as a 
> time value.  reality says you don't need 4
bytes only 2. assuming 
a 
> simple 32 bit seconds counter. if you wish to be
truly frugal with 
your 
> erasewrites to flash  you allocate a single bit
for each overflow 
from 
> 16 bits, and sequentially program these. lots of
writes, but fewer 
> erases. personally I allocate 2 rolling stores for this. a 128 
byte 
> dataflash segment for the MSW and a 512 byte
segment for the 16 
bit 
> count. If you ONLY write the 16 bit LSWat power
loss then you only 
need 
> a 128 byte segment for each.
> 
> The MSW only gets written every 18 hours of continuous operation, 
so the 
> segment will fill every 48 days, thus 10 years of
continuous 
operation 
> will result in 75 segment erase/write cycles.
> 
> If the seconds value is ONLY written at power loss then the 
segment will 
> require an erase cycle every 64 power losses. even
at 1 an hour ( 
a 
> pretty shitty system) the data flash will only
need erasing every 
64 
> hours, or 2.6666 days. Thus in 10 years only 1370
erase cycles 
would 
> have occurred.
> 
> Both of these are well within spec.
> 
> The only issue then is to write the flash fast enough. The OP 
depended 
> on reading the ADC. I think this is too slow. I
would use the 
> comparator, or a low battery warning on the system power supply 
(this is 
> how most of my systems work. A couple of early
designs use the on 
board 
> comparator) Since I run from Li-poly batteries, in
most cases, and 
since 
> my power supply will run right down to 0.7 Vin I
set my trigger 
voltage 
> very high. After a full charge the batteries sit
at 4.2V. At 3.6V 
they 
> start to discharge more rapidly. At 3.5V the
discharge rate is too 
fast. 
> Thus at 3.6V I trigger my shut down sequence.
having given the 
user 
> adequate warning at 3.7V.
> 
> Al
> 
> augusto einsfeldt wrote:
> 
> >It seems to be a math field here.
> >Wonder if a simple double crc word would provide enough safety. 
Say, CRC-16
> >and CRC-ITT stored along with data area.
Again, math could 
provide final
> >answer but it seems that when calculating a
checksum with two 
different ways
> >one can get a good safety level if the FLASH
failure rate is 
below 5 bytes
> >long.
> >A failure in the data may result in same CRC-16 value but not 
likely will
> >also result in the same CRC-ITT value. It
would also be true if 
the failure
> >is in one or both checksum values.
> >A failure in the data and onde CRC value would be catch by second 
CRC.
> >I'd like to see someone in this list,
with math background, 
writing about
> >this.
> >-Augusto
> >
> >
> >-----Original Message-----
> >From: msp430@msp4... [mailto:msp430@msp4...] On 
Behalf Of
> >rolf.freitag@e...
> >Sent: Friday, June 03, 2005 6:31 PM
> >To: msp430@msp4...
> >Subject: RE: [msp430] Writing to Flash during power down
> >
> >
> >Hi,
> >
> >  
> >
> >>MD5 and SHA-1 digests collide very, very infrequently and
they're
> >>    
> >>
> >
> >yes, but they are not absolutely safe. I have some files with 
only 128 byte
> >which are different and do have the same
md5sum. It's saver to 
use md5sum
> >and sha1sum but that only increases the
probability which will 
always be
> >smaller than exactly 1. 
> >I do prefer exactly 1 and not nearly 1; checksums are good in 
some cases but
> >often not good enough (snake oil), especially
when you have a safe
> >alternative.
> >
> >Regards,
> >
> >Rolf
> >
> >
> >
> >
> >.
> >
> > 
> >Yahoo! Groups Links
> >
> >
> >
> > 
> >
> >
> >
> >  
> >



Reply by Clifford Heath June 4, 20052005-06-04
Paul Curtis wrote:
> Rolf, 
>>>I've worked in crypto for ten years and I never heard of anyone
 >>>accidentally discovering two files with the same MD5
>>See http://gilchrist.ca/jeff/md5GUI/
> No, that is a wilful misinterpretation.

Worse, this is an artificial brute-force search of the remaining
key-space left *after* the theoretical weakness is used. I stand
by my claim that it's *never* happened accidentally. Though the
theoretical result means that SHA-1 is preferred, especially
against brute-force reversal of the one-way characteristic.

The "birthday paradox" (google for it) says that to reach an 50/50
chance of finding two files having the same (theoretically perfect)
hash of 160 bits, you need 2^80 files. There have never been (close
to) that many files created in the course of all computing.

Rolf, you'd be surprised how many times per week your life would be
affected if SHA-1 (and its friends) was unreliable. Every credit
card or other electronic transaction, trillions of individual data
blocks mirrored by rsync servers, the list is endless.

> All digests have collision problems by their very
nature.

It's true. But even planets have collision problems. Some problems
are unlikely/infrequent enough to not deserve bothering about. It's
*much* more likely that your program will suffer by a chip being
hit by a cosmic ray.

Clifford Heath.

Reply by Paul Curtis June 4, 20052005-06-04
Rolf, 

> > >>MD5 and SHA-1 digests collide very,
very infrequently and they're
> > > yes, but they are not absolutely safe. I have some files 
> with only 
> > > 128 byte which are different and do have the same md5sum.
> > 
> > I'd very much like to see those files. I've worked in 
> crypto for ten 
> > years and I never heard of anyone accidentally discovering 
> two files 
> > with the same MD5 - despite the theoretical weakness that's been 
> > analysed.
> 
> See 
> 
> http://gilchrist.ca/jeff/md5GUI/
> http://gilchrist.ca/jeff/md5GUI/md5col.zip
> 
> When you are developing with md5sum you have such files. 
> Therefore i don't like programs which say that these files 
> are equal because even a simple diff -q says that they are different.

No, that is a wilful misinterpretation.  They do not say "these files
are equal" because they can't ever guarantee that.  What they say is
"these two inputs have a high probability of being the same". 
Changing
to a larger digest increases the confidence.  As one-way functions they
also guarantee that it's computationally difficult to find a message
M'
that hashes to the same digest D as a message M that also hashes to D.

> > And SHA-1 is even better...
> 
> Yes, but only if you also check the file size. 
> If you don't check the size you get much more collisions.
> Using a checksum without the size of the data is snake oil.

That is why the last input to SHA-1 *is* the message size!  Zero bits
are added to the original message M to pad, then the number of bits in
the original message also also appended before placing it into the SHA-1
function.  So, I cannot see how you can *ever* run SHA-1 *without* the
message length (file size) because then it isn't SHA-1!

> > > ... the probability which will always be
smaller than exactly 1. 
> > 
> > Life isn't perfectly safe... :-)
> 
> Yes, but i prefer to be exact when i can and it's not costly.

All digests have collision problems by their very nature.

--
Paul Curtis, Rowley Associates Ltd  http://www.rowley.co.uk
CrossWorks for MSP430, ARM, AVR and (soon) MAXQ processors

Reply by June 4, 20052005-06-04
Hi,

> >>MD5 and SHA-1 digests collide very, very
infrequently and they're
> > yes, but they are not absolutely safe. I have some files with only 128
byte which
> > are different and do have the same md5sum.
> 
> I'd very much like to see those files. I've worked in crypto for
ten
> years and I never heard of anyone accidentally discovering two files
> with the same MD5 - despite the theoretical weakness that's been
> analysed. 

See 

http://gilchrist.ca/jeff/md5GUI/
http://gilchrist.ca/jeff/md5GUI/md5col.zip

When you are developing with md5sum you have such files. Therefore i don't
like
programs which say that these files are equal because even a simple diff -q says
that 
they are different.


> And SHA-1 is even better...

Yes, but only if you also check the file size. 
If you don't check the size you get much more collisions.
Using a checksum without the size of the data is snake oil.


> > ... the probability which will always be
smaller than exactly 1. 
> 
> Life isn't perfectly safe... :-)

Yes, but i prefer to be exact when i can and it's not costly.

Regards,

Rolf



Reply by Onestone June 4, 20052005-06-04
Maths probably doesn't even enter into the reality of it. If your 
talking about a last gasp write on power loss. If you're worried about 
not being able to write 1 word to flash you shouldn't even be 
considering  trying to calculate a CRC or othe rvalidity check, then 
write that off plus the original data. That just strikes me as too silly 
for words.

The OP spoecifically queried the issue of writing 4 bytes to flash as a 
time value.  reality says you don't need 4 bytes only 2. assuming a 
simple 32 bit seconds counter. if you wish to be truly frugal with your 
erasewrites to flash  you allocate a single bit for each overflow from 
16 bits, and sequentially program these. lots of writes, but fewer 
erases. personally I allocate 2 rolling stores for this. a 128 byte 
dataflash segment for the MSW and a 512 byte segment for the 16 bit 
count. If you ONLY write the 16 bit LSWat power loss then you only need 
a 128 byte segment for each.

The MSW only gets written every 18 hours of continuous operation, so the 
segment will fill every 48 days, thus 10 years of continuous operation 
will result in 75 segment erase/write cycles.

If the seconds value is ONLY written at power loss then the segment will 
require an erase cycle every 64 power losses. even at 1 an hour ( a 
pretty shitty system) the data flash will only need erasing every 64 
hours, or 2.6666 days. Thus in 10 years only 1370 erase cycles would 
have occurred.

Both of these are well within spec.

The only issue then is to write the flash fast enough. The OP depended 
on reading the ADC. I think this is too slow. I would use the 
comparator, or a low battery warning on the system power supply (this is 
how most of my systems work. A couple of early designs use the on board 
comparator) Since I run from Li-poly batteries, in most cases, and since 
my power supply will run right down to 0.7 Vin I set my trigger voltage 
very high. After a full charge the batteries sit at 4.2V. At 3.6V they 
start to discharge more rapidly. At 3.5V the discharge rate is too fast. 
Thus at 3.6V I trigger my shut down sequence. having given the user 
adequate warning at 3.7V.

Al

augusto einsfeldt wrote:

>It seems to be a math field here.
>Wonder if a simple double crc word would provide enough safety. Say, CRC-16
>and CRC-ITT stored along with data area. Again, math could provide final
>answer but it seems that when calculating a checksum with two different ways
>one can get a good safety level if the FLASH failure rate is below 5 bytes
>long.
>A failure in the data may result in same CRC-16 value but not likely will
>also result in the same CRC-ITT value. It would also be true if the failure
>is in one or both checksum values.
>A failure in the data and onde CRC value would be catch by second CRC.
>I'd like to see someone in this list, with math background, writing
about
>this.
>-Augusto
>
>
>-----Original Message-----
>From: msp430@msp4... [mailto:msp430@msp4...] On Behalf Of
>rolf.freitag@rolf...
>Sent: Friday, June 03, 2005 6:31 PM
>To: msp430@msp4...
>Subject: RE: [msp430] Writing to Flash during power down
>
>
>Hi,
>
>  
>
>>MD5 and SHA-1 digests collide very, very infrequently and they're
>>    
>>
>
>yes, but they are not absolutely safe. I have some files with only 128 byte
>which are different and do have the same md5sum. It's saver to use
md5sum
>and sha1sum but that only increases the probability which will always be
>smaller than exactly 1. 
>I do prefer exactly 1 and not nearly 1; checksums are good in some cases but
>often not good enough (snake oil), especially when you have a safe
>alternative.
>
>Regards,
>
>Rolf
>
>
>
>
>.
>
> 
>Yahoo! Groups Links
>
>
>
> 
>
>
>
>  
>


Reply by Clifford Heath June 3, 20052005-06-03
rolf.freitag@rolf... wrote:
>>MD5 and SHA-1 digests collide very, very
infrequently and they're
> yes, but they are not absolutely safe. I have some files with only 128 byte
which
> are different and do have the same md5sum.

I'd very much like to see those files. I've worked in crypto for ten
years and I never heard of anyone accidentally discovering two files
with the same MD5 - despite the theoretical weakness that's been
analysed. And SHA-1 is even better...

> ... the probability which will always be smaller
than exactly 1. 

Life isn't perfectly safe... :-)

Reply by June 3, 20052005-06-03
Hi,

> but dont you also then need to do three segment
erases during this process
> and therefore this is not too nice as a power-down routine?

no, because two writes (without error) are enough, as Bahadir wrote.
It's no problem when you only have time for two writes and do erase the
next segment
only after a completed write.

Regards,

Rolf