rolf.freitag@rolf... wrote:
> > The birthday paradox says...
> That's only true for files with a) same size
For files larger than the hash, size has *nothing* to do with
it, it's simple mathematics.
> and b) size > 160.
If you mean files smaller than 160 bits (20 bytes) - how many
of those exist anyhow? It's true that ASCII english text has
only 1-2 bits of entropy per byte, but that doesn't stop the
checksum from working, if, like SHA-1, you include the file
size in the hash.
> Because on a typical PC there are many small files
collisions du occur much often
> than in one of 2^80 cases even with a theoretically perfect hash of 160
bits.
Find me a pair then, > 20 bytes.
> md5sum is not theoretically perfect because the
size of the md5sum is 128 bit and the
> collisions where found in 128 bit files.
Again, file size is irrelevant to this, I can't imagine why
you're hung up on it.
> Which hash is theoretically perfect?
None. But the extensive testing and analysis that's been carried
out means that we can get close enough. Remember the cosmic rays :-).
I've also built a duplicate-file finder - it's an interesting project.
Clifford Heath.
Hi,
> The "birthday paradox" (google for it)
says that to reach an 50/50
> chance of finding two files having the same (theoretically perfect)
> hash of 160 bits, you need 2^80 files.
That's only true for files with a) same size and b) size > 160.
Because on a typical PC there are many small files collisions du occur much
often
than in one of 2^80 cases even with a theoretically perfect hash of 160 bits.
md5sum is not theoretically perfect because the size of the md5sum is 128 bit
and the
collisions where found in 128 bit files.
Which hash is theoretically perfect?
I know the file content itself, used with the file size as the hash, is perfect
and therefore
i'm simply using this hash, e. g. for finding duplicate files (
http://freshmeat.net/projects/dupmerge/ ).
Regards,
Rolf
Reply by Onestone●June 4, 20052005-06-04
Hi Jeff, I would concentrate on making the time to write much faster.
That is remove all the latency from the detect to the write process.
Obviously the ideal is to have the segment erased initially, and again
everytime it rolls over. That way, on power up, you only need to find
the next word to write. A write is a very fast operation. One reason I
use the switchers I do is because they will give me regulated supply out
with only 0.7Vin. Since my power loss threshhold is higher than my
actual operating voltage it is almost impossible for me not to be able
to program in time. The key is rapid detection, and, rather than lots of
extra capacitiance on the power supply output have a decent cap on the
input. I have never seen cross segment corruption in the MSP430, but I
specifically designed to prevent such issues from the very start.
Al
Jeff wrote:
>Hi Al
>Losing the data of my last power-up time, is not that important
>which is separate from my other flash data. What I more interested
>in, is when writing my "time up" to a segment, and this segment is
>just allocated to "time up" data. can I loss data or corrupt data
in
>other segments if I power down halfway during a write, for example I
>don't allow enough time between detecting the power down and
>implementing the write procedure. If the worst is, that I corrupt
>the "time-up" data segment, I can live with that. But if there is
a
>chance other segments could be corrupted, then I will have to
>consider that. I don't know the internal architecture of flash, and
>whether TI have put in safe guards for writing only the sector
>requested. I don't know.
>
>Jeff
>
>
>--- In msp430@msp4..., Onestone <onestone@b...> wrote:
>
>
>>Maths probably doesn't even enter into the reality of it. If your
>>talking about a last gasp write on power loss. If you're worried
>>
>>
>about
>
>
>>not being able to write 1 word to flash you shouldn't even be
>>considering trying to calculate a CRC or othe rvalidity check,
>>
>>
>then
>
>
>>write that off plus the original data. That just strikes me as too
>>
>>
>silly
>
>
>>for words.
>>
>>The OP spoecifically queried the issue of writing 4 bytes to flash
>>
>>
>as a
>
>
>>time value. reality says you don't need 4 bytes only 2. assuming
>>
>>
>a
>
>
>>simple 32 bit seconds counter. if you wish to be truly frugal with
>>
>>
>your
>
>
>>erasewrites to flash you allocate a single bit for each overflow
>>
>>
>from
>
>
>>16 bits, and sequentially program these. lots of writes, but fewer
>>erases. personally I allocate 2 rolling stores for this. a 128
>>
>>
>byte
>
>
>>dataflash segment for the MSW and a 512 byte segment for the 16
>>
>>
>bit
>
>
>>count. If you ONLY write the 16 bit LSWat power loss then you only
>>
>>
>need
>
>
>>a 128 byte segment for each.
>>
>>The MSW only gets written every 18 hours of continuous operation,
>>
>>
>so the
>
>
>>segment will fill every 48 days, thus 10 years of continuous
>>
>>
>operation
>
>
>>will result in 75 segment erase/write cycles.
>>
>>If the seconds value is ONLY written at power loss then the
>>
>>
>segment will
>
>
>>require an erase cycle every 64 power losses. even at 1 an hour (
>>
>>
>a
>
>
>>pretty shitty system) the data flash will only need erasing every
>>
>>
>64
>
>
>>hours, or 2.6666 days. Thus in 10 years only 1370 erase cycles
>>
>>
>would
>
>
>>have occurred.
>>
>>Both of these are well within spec.
>>
>>The only issue then is to write the flash fast enough. The OP
>>
>>
>depended
>
>
>>on reading the ADC. I think this is too slow. I would use the
>>comparator, or a low battery warning on the system power supply
>>
>>
>(this is
>
>
>>how most of my systems work. A couple of early designs use the on
>>
>>
>board
>
>
>>comparator) Since I run from Li-poly batteries, in most cases, and
>>
>>
>since
>
>
>>my power supply will run right down to 0.7 Vin I set my trigger
>>
>>
>voltage
>
>
>>very high. After a full charge the batteries sit at 4.2V. At 3.6V
>>
>>
>they
>
>
>>start to discharge more rapidly. At 3.5V the discharge rate is too
>>
>>
>fast.
>
>
>>Thus at 3.6V I trigger my shut down sequence. having given the
>>
>>
>user
>
>
>>adequate warning at 3.7V.
>>
>>Al
>>
>>augusto einsfeldt wrote:
>>
>>
>>
>>>It seems to be a math field here.
>>>Wonder if a simple double crc word would provide enough safety.
>>>
>>>
>Say, CRC-16
>
>
>>>and CRC-ITT stored along with data area. Again, math could
>>>
>>>
>provide final
>
>
>>>answer but it seems that when calculating a checksum with two
>>>
>>>
>different ways
>
>
>>>one can get a good safety level if the FLASH failure rate is
>>>
>>>
>below 5 bytes
>
>
>>>long.
>>>A failure in the data may result in same CRC-16 value but not
>>>
>>>
>likely will
>
>
>>>also result in the same CRC-ITT value. It would also be true if
>>>
>>>
>the failure
>
>
>>>is in one or both checksum values.
>>>A failure in the data and onde CRC value would be catch by second
>>>
>>>
>CRC.
>
>
>>>I'd like to see someone in this list, with math background,
>>>
>>>
>writing about
>
>
>>>this.
>>>-Augusto
>>>
>>>
>>>-----Original Message-----
>>>From: msp430@msp4... [mailto:msp430@msp4...] On
>>>
>>>
>Behalf Of
>
>
>>>rolf.freitag@e...
>>>Sent: Friday, June 03, 2005 6:31 PM
>>>To: msp430@msp4...
>>>Subject: RE: [msp430] Writing to Flash during power down
>>>
>>>
>>>Hi,
>>>
>>>
>>>
>>>
>>>
>>>>MD5 and SHA-1 digests collide very, very infrequently and
they're
>>>>
>>>>
>>>>
>>>>
>>>yes, but they are not absolutely safe. I have some files with
>>>
>>>
>only 128 byte
>
>
>>>which are different and do have the same md5sum. It's saver to
>>>
>>>
>use md5sum
>
>
>>>and sha1sum but that only increases the probability which will
>>>
>>>
>always be
>
>
>>>smaller than exactly 1.
>>>I do prefer exactly 1 and not nearly 1; checksums are good in
>>>
>>>
>some cases but
>
>
>>>often not good enough (snake oil), especially when you have a safe
>>>alternative.
>>>
>>>Regards,
>>>
>>>Rolf
>>>
>>>
>>>
>>>
>>>.
>>>
>>>
>>>Yahoo! Groups Links
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
>
>
>
>.
>
>
>Yahoo! Groups Links
>
>
>
>
>
>
>
>
>
>
Reply by Jeff●June 4, 20052005-06-04
Hi Al
Losing the data of my last power-up time, is not that important
which is separate from my other flash data. What I more interested
in, is when writing my "time up" to a segment, and this segment is
just allocated to "time up" data. can I loss data or corrupt data in
other segments if I power down halfway during a write, for example I
don't allow enough time between detecting the power down and
implementing the write procedure. If the worst is, that I corrupt
the "time-up" data segment, I can live with that. But if there is a
chance other segments could be corrupted, then I will have to
consider that. I don't know the internal architecture of flash, and
whether TI have put in safe guards for writing only the sector
requested. I don't know.
Jeff
--- In msp430@msp4..., Onestone <onestone@b...> wrote:
> Maths probably doesn't even enter into the
reality of it. If your
> talking about a last gasp write on power loss. If you're worried
about
> not being able to write 1 word to flash you
shouldn't even be
> considering trying to calculate a CRC or othe rvalidity check,
then
> write that off plus the original data. That just
strikes me as too
silly
> for words.
>
> The OP spoecifically queried the issue of writing 4 bytes to flash
as a
> time value. reality says you don't need 4
bytes only 2. assuming
a
> simple 32 bit seconds counter. if you wish to be
truly frugal with
your
> erasewrites to flash you allocate a single bit
for each overflow
from
> 16 bits, and sequentially program these. lots of
writes, but fewer
> erases. personally I allocate 2 rolling stores for this. a 128
byte
> dataflash segment for the MSW and a 512 byte
segment for the 16
bit
> count. If you ONLY write the 16 bit LSWat power
loss then you only
need
> a 128 byte segment for each.
>
> The MSW only gets written every 18 hours of continuous operation,
so the
> segment will fill every 48 days, thus 10 years of
continuous
operation
> will result in 75 segment erase/write cycles.
>
> If the seconds value is ONLY written at power loss then the
segment will
> require an erase cycle every 64 power losses. even
at 1 an hour (
a
> pretty shitty system) the data flash will only
need erasing every
64
> hours, or 2.6666 days. Thus in 10 years only 1370
erase cycles
would
> have occurred.
>
> Both of these are well within spec.
>
> The only issue then is to write the flash fast enough. The OP
depended
> on reading the ADC. I think this is too slow. I
would use the
> comparator, or a low battery warning on the system power supply
(this is
> how most of my systems work. A couple of early
designs use the on
board
> comparator) Since I run from Li-poly batteries, in
most cases, and
since
> my power supply will run right down to 0.7 Vin I
set my trigger
voltage
> very high. After a full charge the batteries sit
at 4.2V. At 3.6V
they
> start to discharge more rapidly. At 3.5V the
discharge rate is too
fast.
> Thus at 3.6V I trigger my shut down sequence.
having given the
user
> adequate warning at 3.7V.
>
> Al
>
> augusto einsfeldt wrote:
>
> >It seems to be a math field here.
> >Wonder if a simple double crc word would provide enough safety.
Say, CRC-16
> >and CRC-ITT stored along with data area.
Again, math could
provide final
> >answer but it seems that when calculating a
checksum with two
different ways
> >one can get a good safety level if the FLASH
failure rate is
below 5 bytes
> >long.
> >A failure in the data may result in same CRC-16 value but not
likely will
> >also result in the same CRC-ITT value. It
would also be true if
the failure
> >is in one or both checksum values.
> >A failure in the data and onde CRC value would be catch by second
CRC.
> >I'd like to see someone in this list,
with math background,
writing about
> >this.
> >-Augusto
> >
> >
> >-----Original Message-----
> >From: msp430@msp4... [mailto:msp430@msp4...] On
Behalf Of
> >rolf.freitag@e...
> >Sent: Friday, June 03, 2005 6:31 PM
> >To: msp430@msp4...
> >Subject: RE: [msp430] Writing to Flash during power down
> >
> >
> >Hi,
> >
> >
> >
> >>MD5 and SHA-1 digests collide very, very infrequently and
they're
> >>
> >>
> >
> >yes, but they are not absolutely safe. I have some files with
only 128 byte
> >which are different and do have the same
md5sum. It's saver to
use md5sum
> >and sha1sum but that only increases the
probability which will
always be
> >smaller than exactly 1.
> >I do prefer exactly 1 and not nearly 1; checksums are good in
some cases but
> >often not good enough (snake oil), especially
when you have a safe
> >alternative.
> >
> >Regards,
> >
> >Rolf
> >
> >
> >
> >
> >.
> >
> >
> >Yahoo! Groups Links
> >
> >
> >
> >
> >
> >
> >
> >
> >
Reply by Clifford Heath●June 4, 20052005-06-04
Paul Curtis wrote:
> Rolf,
>>>I've worked in crypto for ten years and I never heard of anyone
>>>accidentally discovering two files with the same MD5
>>See http://gilchrist.ca/jeff/md5GUI/
> No, that is a wilful misinterpretation.
Worse, this is an artificial brute-force search of the remaining
key-space left *after* the theoretical weakness is used. I stand
by my claim that it's *never* happened accidentally. Though the
theoretical result means that SHA-1 is preferred, especially
against brute-force reversal of the one-way characteristic.
The "birthday paradox" (google for it) says that to reach an 50/50
chance of finding two files having the same (theoretically perfect)
hash of 160 bits, you need 2^80 files. There have never been (close
to) that many files created in the course of all computing.
Rolf, you'd be surprised how many times per week your life would be
affected if SHA-1 (and its friends) was unreliable. Every credit
card or other electronic transaction, trillions of individual data
blocks mirrored by rsync servers, the list is endless.
> All digests have collision problems by their very
nature.
It's true. But even planets have collision problems. Some problems
are unlikely/infrequent enough to not deserve bothering about. It's
*much* more likely that your program will suffer by a chip being
hit by a cosmic ray.
Clifford Heath.
Reply by Paul Curtis●June 4, 20052005-06-04
Rolf,
> > >>MD5 and SHA-1 digests collide very,
very infrequently and they're
> > > yes, but they are not absolutely safe. I have some files
> with only
> > > 128 byte which are different and do have the same md5sum.
> >
> > I'd very much like to see those files. I've worked in
> crypto for ten
> > years and I never heard of anyone accidentally discovering
> two files
> > with the same MD5 - despite the theoretical weakness that's been
> > analysed.
>
> See
>
> http://gilchrist.ca/jeff/md5GUI/
> http://gilchrist.ca/jeff/md5GUI/md5col.zip
>
> When you are developing with md5sum you have such files.
> Therefore i don't like programs which say that these files
> are equal because even a simple diff -q says that they are different.
No, that is a wilful misinterpretation. They do not say "these files
are equal" because they can't ever guarantee that. What they say is
"these two inputs have a high probability of being the same".
Changing
to a larger digest increases the confidence. As one-way functions they
also guarantee that it's computationally difficult to find a message
M'
that hashes to the same digest D as a message M that also hashes to D.
> > And SHA-1 is even better...
>
> Yes, but only if you also check the file size.
> If you don't check the size you get much more collisions.
> Using a checksum without the size of the data is snake oil.
That is why the last input to SHA-1 *is* the message size! Zero bits
are added to the original message M to pad, then the number of bits in
the original message also also appended before placing it into the SHA-1
function. So, I cannot see how you can *ever* run SHA-1 *without* the
message length (file size) because then it isn't SHA-1!
> > > ... the probability which will always be
smaller than exactly 1.
> >
> > Life isn't perfectly safe... :-)
>
> Yes, but i prefer to be exact when i can and it's not costly.
All digests have collision problems by their very nature.
--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
CrossWorks for MSP430, ARM, AVR and (soon) MAXQ processors
Reply by ●June 4, 20052005-06-04
Hi,
> >>MD5 and SHA-1 digests collide very, very
infrequently and they're
> > yes, but they are not absolutely safe. I have some files with only 128
byte which
> > are different and do have the same md5sum.
>
> I'd very much like to see those files. I've worked in crypto for
ten
> years and I never heard of anyone accidentally discovering two files
> with the same MD5 - despite the theoretical weakness that's been
> analysed.
See
http://gilchrist.ca/jeff/md5GUI/http://gilchrist.ca/jeff/md5GUI/md5col.zip
When you are developing with md5sum you have such files. Therefore i don't
like
programs which say that these files are equal because even a simple diff -q says
that
they are different.
> And SHA-1 is even better...
Yes, but only if you also check the file size.
If you don't check the size you get much more collisions.
Using a checksum without the size of the data is snake oil.
> > ... the probability which will always be
smaller than exactly 1.
>
> Life isn't perfectly safe... :-)
Yes, but i prefer to be exact when i can and it's not costly.
Regards,
Rolf
Reply by Onestone●June 4, 20052005-06-04
Maths probably doesn't even enter into the reality of it. If your
talking about a last gasp write on power loss. If you're worried about
not being able to write 1 word to flash you shouldn't even be
considering trying to calculate a CRC or othe rvalidity check, then
write that off plus the original data. That just strikes me as too silly
for words.
The OP spoecifically queried the issue of writing 4 bytes to flash as a
time value. reality says you don't need 4 bytes only 2. assuming a
simple 32 bit seconds counter. if you wish to be truly frugal with your
erasewrites to flash you allocate a single bit for each overflow from
16 bits, and sequentially program these. lots of writes, but fewer
erases. personally I allocate 2 rolling stores for this. a 128 byte
dataflash segment for the MSW and a 512 byte segment for the 16 bit
count. If you ONLY write the 16 bit LSWat power loss then you only need
a 128 byte segment for each.
The MSW only gets written every 18 hours of continuous operation, so the
segment will fill every 48 days, thus 10 years of continuous operation
will result in 75 segment erase/write cycles.
If the seconds value is ONLY written at power loss then the segment will
require an erase cycle every 64 power losses. even at 1 an hour ( a
pretty shitty system) the data flash will only need erasing every 64
hours, or 2.6666 days. Thus in 10 years only 1370 erase cycles would
have occurred.
Both of these are well within spec.
The only issue then is to write the flash fast enough. The OP depended
on reading the ADC. I think this is too slow. I would use the
comparator, or a low battery warning on the system power supply (this is
how most of my systems work. A couple of early designs use the on board
comparator) Since I run from Li-poly batteries, in most cases, and since
my power supply will run right down to 0.7 Vin I set my trigger voltage
very high. After a full charge the batteries sit at 4.2V. At 3.6V they
start to discharge more rapidly. At 3.5V the discharge rate is too fast.
Thus at 3.6V I trigger my shut down sequence. having given the user
adequate warning at 3.7V.
Al
augusto einsfeldt wrote:
>It seems to be a math field here.
>Wonder if a simple double crc word would provide enough safety. Say, CRC-16
>and CRC-ITT stored along with data area. Again, math could provide final
>answer but it seems that when calculating a checksum with two different ways
>one can get a good safety level if the FLASH failure rate is below 5 bytes
>long.
>A failure in the data may result in same CRC-16 value but not likely will
>also result in the same CRC-ITT value. It would also be true if the failure
>is in one or both checksum values.
>A failure in the data and onde CRC value would be catch by second CRC.
>I'd like to see someone in this list, with math background, writing
about
>this.
>-Augusto
>
>
>-----Original Message-----
>From: msp430@msp4... [mailto:msp430@msp4...] On Behalf Of
>rolf.freitag@rolf...
>Sent: Friday, June 03, 2005 6:31 PM
>To: msp430@msp4...
>Subject: RE: [msp430] Writing to Flash during power down
>
>
>Hi,
>
>
>
>>MD5 and SHA-1 digests collide very, very infrequently and they're
>>
>>
>
>yes, but they are not absolutely safe. I have some files with only 128 byte
>which are different and do have the same md5sum. It's saver to use
md5sum
>and sha1sum but that only increases the probability which will always be
>smaller than exactly 1.
>I do prefer exactly 1 and not nearly 1; checksums are good in some cases but
>often not good enough (snake oil), especially when you have a safe
>alternative.
>
>Regards,
>
>Rolf
>
>
>
>
>.
>
>
>Yahoo! Groups Links
>
>
>
>
>
>
>
>
>
Reply by Clifford Heath●June 3, 20052005-06-03
rolf.freitag@rolf... wrote:
>>MD5 and SHA-1 digests collide very, very
infrequently and they're
> yes, but they are not absolutely safe. I have some files with only 128 byte
which
> are different and do have the same md5sum.
I'd very much like to see those files. I've worked in crypto for ten
years and I never heard of anyone accidentally discovering two files
with the same MD5 - despite the theoretical weakness that's been
analysed. And SHA-1 is even better...
> ... the probability which will always be smaller
than exactly 1.
Life isn't perfectly safe... :-)
Reply by ●June 3, 20052005-06-03
Hi,
> but dont you also then need to do three segment
erases during this process
> and therefore this is not too nice as a power-down routine?
no, because two writes (without error) are enough, as Bahadir wrote.
It's no problem when you only have time for two writes and do erase the
next segment
only after a completed write.
Regards,
Rolf