
Embedding a Checksum in an Image File

Started by Rick C April 19, 2023
On Saturday, April 22, 2023 at 1:55:01 PM UTC-4, David Brown wrote:
> On 22/04/2023 18:56, Rick C wrote:
> > On Saturday, April 22, 2023 at 11:13:32 AM UTC-4, David Brown wrote:
> >> On 22/04/2023 05:14, Rick C wrote:
> >>> On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
> >>>> On 21/04/2023 14:12, Rick C wrote:
> >>>>>
> >>>>> This is simply to be able to say this version is unique,
> >>>>> regardless of what the version number says. Version numbers are
> >>>>> set manually and not always done correctly. I'm looking for
> >>>>> something as a backup so that if the checksums are different, I
> >>>>> can be sure the versions are not the same.
> >>>>>
> >>>>> The less work involved, the better.
> >>>>>
> >>>> Run a simple 32-bit crc over the image. The result is a hash of
> >>>> the image. Any change in the image will show up as a change in the
> >>>> crc.
> >>>
> >>> No one is trying to detect changes in the image. I'm trying to label
> >>> the image in a way that can be read in operation. I'm using the
> >>> checksum simply because that is easy to generate. I've had problems
> >>> with version numbering in the past. It will be used, but I want it
> >>> supplemented with a number that will change every time the design
> >>> changes, at least with a high probability, such as 1 in 64k.
> >>>
> >> Again - use a CRC. It will give you what you want.
> >
> > Again - as will a simple addition checksum.
>
> A simple addition checksum might be okay much of the time, but it
> doesn't have the resolving power of a CRC. If the source code changes
> "a = 1; b = 2;" to "a = 2; b = 1;", the addition checksum is likely to
> be exactly the same despite the change in the source. In general, you
> will have much higher chance of collisions, though I think it would be
> very hard to quantify that.
>
> Maybe it will be good enough for you. Simple checksums were popular
> once, and can still make sense if you are very short on program space.
> But there are good reasons why they fell out of favour in many uses.
> >
> >> You might want to go for 32-bit CRC rather than a 16-bit CRC, depending
> >> on the kind of program, how often you build it, and what consequences a
> >> hash collision could have. With a 16-bit CRC, you have a 5% chance of a
> >> collision after 82 builds. If collisions only matter for releases, and
> >> you only release a couple of updates, fine - but if they matter during
> >> development builds, you are getting a more significant risk. Since a
> >> 32-bit CRC is quick and easy, it's worth using.
> >
> > Or, I might want to go with a simple checksum.
> >
> > Thanks for your comments.
> >
> It's your choice (obviously). I only point out the weaknesses in case
> anyone else is listening in to the thread.
>
> If you like, I can post code for a 32-bit CRC. It's a table, and a few
> lines of C code.
You know nothing of the project I am working on or those that I typically work on. But thanks for the advice. -- Rick C. +-- Get 1,000 miles of free Supercharging +-- Tesla referral code - https://ts.la/richard11209
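The "5% chance of a collision after 82 builds" figure quoted above follows from the birthday approximation. A minimal sketch of that arithmetic, assuming each build's 16-bit CRC behaves like an independent, uniformly distributed value:

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Probability of at least one collision among n random k-bit values:
       p ~ 1 - exp(-n*(n-1) / (2 * 2^k))   (birthday approximation) */
    double buckets = 65536.0;   /* 2^16 distinct 16-bit CRC values */
    double builds  = 82.0;
    double p = 1.0 - exp(-builds * (builds - 1.0) / (2.0 * buckets));
    printf("P(collision) ~ %.3f\n", p);    /* prints roughly 0.049 */
    return 0;
}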
On 2023-04-23, David Brown <david.brown@hesbynett.no> wrote:

> Another thing you can look at is the distribution of checksum outputs, > for random inputs. For an additive checksum, you can consider your > input as N independent 0-255 random values, added together. The result > will be a normal distribution of the checksum. If you have, say, a 100 > byte data block and a 16-bit checksum, it's clear that you will never > get a checksum value greater than 25500, and that you are much more > likely to get a value close to 12750.
It never occurred to me that for an N-bit checksum, you would sum something other than N-bit "words" of the input data. -- Grant
On 23/04/2023 19:37, Grant Edwards wrote:
> On 2023-04-23, David Brown <david.brown@hesbynett.no> wrote: > >> Another thing you can look at is the distribution of checksum outputs, >> for random inputs. For an additive checksum, you can consider your >> input as N independent 0-255 random values, added together. The result >> will be a normal distribution of the checksum. If you have, say, a 100 >> byte data block and a 16-bit checksum, it's clear that you will never >> get a checksum value greater than 25500, and that you are much more >> likely to get a value close to 12750. > > It never occurred to me that for an N-bit checksum, you would sum > something other than N-bit "words" of the input data. >
Usually - in my experience - you sum bytes, using an unsigned integer 8-bit or 16-bit wide. Simple additive checksums are often used on small 8-bit microcontrollers where CRC's are seen (rightly or wrongly) as too demanding. Perhaps other people have different experiences. You could certainly sum 16-bit words to get your 16-bit additive checksum, and that would give a different kind of clustering - maybe better, maybe not.
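A minimal sketch of the byte-wise additive checksum being discussed, assuming a 16-bit accumulator that is simply allowed to wrap (the function name is illustrative, not from the thread):

#include <stdint.h>
#include <stddef.h>

/* Sum each byte into a 16-bit accumulator; unsigned wrap-around is well
   defined.  For a short block (e.g. 100 bytes) the sum cannot exceed
   100*255 = 25500 and clusters near 12750, as noted above. */
uint16_t checksum16_bytes(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}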
On 23/04/2023 19:34, Rick C wrote:
> On Saturday, April 22, 2023 at 1:55:01 PM UTC-4, David Brown wrote:
>> [...]
>> It's your choice (obviously). I only point out the weaknesses in
>> case anyone else is listening in to the thread.
>>
>> If you like, I can post code for a 32-bit CRC. It's a table, and a
>> few lines of C code.
>
> You know nothing of the project I am working on or those that I
> typically work on. But thanks for the advice.
>
You haven't given much to go on. It is still not really clear (to me, at least) if you are asking about checksums or how to manipulate binary images as part of a build process, or what you are really asking.

When someone wants a checksum on an image file, the appropriate choice in most cases is a CRC. If security is an issue, then a secure hash is needed. For a very limited system, additive checksums might be the only realistic choice. But more often, the reason people pick additive checksums rather than CRCs is because they don't realise that CRCs are actually very simple and efficient to implement. People unfamiliar with them might have read a little, and think they need to do calculations for each bit (which is possible but /slow/), or that they would have to understand the theory of binary polynomial division rings (they don't). They think CRCs are complicated and advanced, and shy away from them.

There are a number of people who read this group - maybe some of them have learned a little from this thread.
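For reference, a table-driven 32-bit CRC of the kind described ("a table, and a few lines of C code") might look roughly like the following sketch, assuming the common reflected polynomial 0xEDB88320 used by zlib and Ethernet - this is an illustration, not code posted by anyone in the thread:

#include <stdint.h>
#include <stddef.h>

static uint32_t crc32_table[256];

/* Build the 256-entry lookup table once at start-up. */
static void crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc32_table[i] = c;
    }
}

/* One table lookup and a couple of XORs per byte. */
static uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;              /* conventional initial value */
    for (size_t i = 0; i < len; i++)
        crc = (crc >> 8) ^ crc32_table[(crc ^ data[i]) & 0xFFu];
    return crc ^ 0xFFFFFFFFu;                /* conventional final XOR */
}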
On 4/23/23 5:45 PM, David Brown wrote:
> On 23/04/2023 19:37, Grant Edwards wrote:
>> On 2023-04-23, David Brown <david.brown@hesbynett.no> wrote:
>> [...]
>>
>> It never occurred to me that for an N-bit checksum, you would sum
>> something other than N-bit "words" of the input data.
>>
>
> Usually - in my experience - you sum bytes, using an unsigned integer
> 8-bit or 16-bit wide. Simple additive checksums are often used on small
> 8-bit microcontrollers where CRC's are seen (rightly or wrongly) as too
> demanding. Perhaps other people have different experiences.
>
> You could certainly sum 16-bit words to get your 16-bit additive
> checksum, and that would give a different kind of clustering - maybe
> better, maybe not.
I have seen 16-bit checksums done both ways. Summing 16 bit units does eliminate the issue of clustering, and makes adjacent byte swaps detectable.
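A sketch of the 16-bit-word variant mentioned above, assuming big-endian pairing and an even-length image (how an odd trailing byte is padded would have to be defined by the build step):

#include <stdint.h>
#include <stddef.h>

/* Sum 16-bit units instead of bytes.  Unlike a plain byte sum, swapping
   two adjacent bytes generally changes the result, and over random data
   the sum spreads across the full 16-bit range rather than clustering. */
uint16_t checksum16_words(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += (uint16_t)((data[i] << 8) | data[i + 1]);
    }
    return sum;
}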
On Sunday, April 23, 2023 at 5:58:51 PM UTC-4, David Brown wrote:
> On 23/04/2023 19:34, Rick C wrote:
> > [...]
> >
> > You know nothing of the project I am working on or those that I
> > typically work on. But thanks for the advice.
> >
> You haven't given much to go on. It is still not really clear (to me,
> at least) if you are asking about checksums or how to manipulate binary
> images as part of a build process, or what you are really asking.
If you don't understand, you are making this far more complicated than it is. I don't know what to tell you. There are no other details that are relevant. Don't read into this, what is not there.
> When someone wants a checksum on an image file, the appropriate choice > in most cases is a CRC.
Why? What makes a CRC an "appropriate" choice? Normally, when I design something, I establish the requirements. What requirements are you assuming, that would make the CRC more desirable than a simple checksum?
> If security is an issue, then a secure hash is needed. For a very
> limited system, additive checksums might be the only realistic
> choice.
What have I said that makes you think security is an issue??? I don't recall ever mentioning anything about security. Do you recall what I did say?
> But more often, the reason people pick additive checksums rather than > CRCs is because they don't realise that CRCs are actually very simple > and efficient to implement.
The fact that they are "simple and efficient" is not a reason to use them. I repeat, what are the requirements?
> People unfamiliar with them might have read > a little, and think they need to do calculations for each bit (which is > possible but /slow/), or that they would have to understand the theory > of binary polynomial division rings (they don't). They think CRC's are > complicated and advanced, and shy away from them. > > There are a number of people who read this group - maybe some of them > have learned a little from this thread.
I suppose there is that possibility. But when people make claims about something being good or "better", without substantiation, there's not much to learn. If you think a discussion of CRC calculations would be useful, why don't you open a thread and discuss them, instead of insisting they are the right solution to my problem, when you don't even know what the problem requirements are? It's all here in the thread. You only need to read, without projecting your opinions on the problem statement. -- Rick C. +-+ Get 1,000 miles of free Supercharging +-+ Tesla referral code - https://ts.la/richard11209
On 24/04/2023 00:16, Richard Damon wrote:
> On 4/23/23 5:45 PM, David Brown wrote:
>> [...]
>>
>> You could certainly sum 16-bit words to get your 16-bit additive
>> checksum, and that would give a different kind of clustering - maybe
>> better, maybe not.
>>
>
> I have seen 16-bit checksums done both ways. Summing 16 bit units does
> eliminate the issue of clustering, and makes adjacent byte swaps
> detectable.
Long ago, there used to be a definite risk of mixing up endianness when dealing with program images burned to flash or eeprom. Popular "hex" formats like Intel Hex and Motorola SRecord could differ in endianness. So byte swaps in the entire image was a real possibility, and good to guard against. But it's hard to imagine how an individual byte swap could occur - I see bigger movements and re-arrangements being more likely, and using 16-bit units will not help much there. Still, I think there is little doubt that using 16-bit units is better than using 8-bit units in many ways (except for efficient implementation on small 8-bit devices).
On 24/04/2023 00:24, Rick C wrote:
> On Sunday, April 23, 2023 at 5:58:51 PM UTC-4, David Brown wrote:
>
>> When someone wants a checksum on an image file, the appropriate
>> choice in most cases is a CRC.
>
> Why? What makes a CRC an "appropriate" choice? Normally, when I
> design something, I establish the requirements. What requirements
> are you assuming, that would make the CRC more desirable than a
> simple checksum?
>
I've already explained this in quite a lot of detail in this thread (as have others). If you don't like my explanation, or didn't read it, that's okay. You are under no obligation to learn about CRCs. Or if you prefer to look it up in other sources, that's obviously also an option.
>
>> If security is an issue, then a secure hash is needed. For a very
>> limited system, additive checksums might be the only realistic
>> choice.
>
> What have I said that makes you think security is an issue??? I
> don't recall ever mentioning anything about security. Do you recall
> what I did say?
>
>
> If you think a discussion of CRC calculations would be useful, why
> don't you open a thread and discuss them, instead of insisting they
> are the right solution to my problem, when you don't even know what
> the problem requirements are? It's all here in the thread. You only
> need to read, without projecting your opinions on the problem
> statement.
>
I've asked you this before - are you /sure/ you understand how Usenet works?
On 4/22/2023 7:57 AM, David Brown wrote:
>>> However, in almost every case where CRC's might be useful, you have
>>> additional checks of the sanity of the data, and an all-zero or all-one
>>> data block would be rejected. For example, Ethernet packets use CRC for
>>> integrity checking, but an attempt to send a packet type 0 from MAC
>>> address 00:00:00:00:00:00 to address 00:00:00:00:00:00, of length 0,
>>> would be rejected anyway.
>>
>> Why look at "data" -- which may be suspect -- and *then* check its CRC?
>> Run the CRC first. If it fails, decide how you are going to proceed
>> or recover.
>
> That is usually the order, yes. Sometimes you want "fail fast", such as
> dropping a packet that was not addressed to you (it doesn't matter if it
> was received correctly but for someone else, or it was addressed to you
> but the receiver address was corrupted - you are dropping the packet
> either way). But usually you will run the CRC then look at the data.
>
> But the order doesn't matter - either way, you are still checking for
> valid data, and if the data is invalid, it does not matter if the CRC
> only passed by luck or by all zeros.
You're assuming the CRC is supposed to *vouch* for the data. The CRC can be there simply to vouch for the *transport* of a datagram.
>>> I can't think of any use-cases where you would be passing around a
>>> block of "pure" data that could reasonably take absolutely any value,
>>> without any type of "envelope" information, and where you would think
>>> a CRC check is appropriate.
>>
>> I append a *version specific* CRC to each packet of marshalled data
>> in my RMIs. If the data is corrupted in transit *or* if the
>> wrong version API ends up targeted, the operation will abend
>> because we know the data "isn't right".
>
> Using a version-specific CRC sounds silly. Put the version information
> in the packet.
The packet routed to a particular interface is *supposed* to conform to "version X" of an interface. There are different stubs generated for different versions of EACH interface. The OCL for the interface defines (and is used to check) the form of that interface to that service/mechanism. The parameters are checked on the client side -- why tie up the transport medium with data that is inappropriate (redundant) to THAT interface? Why tie up the server verifying that data? The stub generator can perform all of those checks automatically and CONSISTENTLY based on the OCL definition of that version of that interface (because developers make mistakes). So, at the instant you schedule the marshalled data for transmission, you *know* the parameters are "appropriate" and compliant with the constraints of THAT version of THAT interface. Now, you have to ensure the packet doesn't get corrupted (altered) in transmission. If it remains intact, then there is no need to check the parameters on the server side. NONE OF THE PARAMETERS... including the (implied) "interface version" field! Yet, folks make mistakes. So, you want some additional reassurance that this is at least intended for this version of the interface, ESPECIALLY IF THAT CAN BE MADE AVAILABLE FOR ZERO COST (i.e., check to see if the residual is 0xDEADBEEF instead of 0xB16B00B5). Why burden the packet with a "protocol version" parameter? So, use a version-specific CRC on the packet. If it fails, then either the data in the packet has been corrupted (which could just as easily have involved an embedded "interface version" parameter); or the packet was formed with the wrong CRC. If the CRC is correct FOR THAT VERSION OF THE PROTOCOL, then why bother looking at a "protocol version" parameter? Would you ALSO want to verify all the rest of the parameters?
>> I *could* put a header saying "this is version 4.2". And, that
>> tells me nothing about the integrity of the rest of the data.
>> OTOH, ensuring the CRC reflects "4.2" does -- if the recipient
>> expects it to be so.
>
> Now you don't know if the data is corrupted, or for the wrong version -
> or occasionally, corrupted /and/ the wrong version but passing the CRC
> anyway.
You don't know if the parameters have been corrupted in a manner that allows a packet intended for the correct interface to appear as correct. What's your point?
> Unless you are absolutely desperate to save every bit you can, your system will > be simpler, clearer, and more reliable if you separate your purposes.
Yes. You verify the correct interface at the client side -- where it is invoked by the client and enforced in the OCL generated stub. Thereafter, the server is concerned with corruption during transport and the version specific CRC just gives another reassurance of correct version without adding another cost. [Imagine EVERY subroutine function call in your system having such overhead. Would you want to push an "interface version" onto the stack along with all of the arguments for that subr/ftn? Or, would you just hope everything was intact?]
>>>>>> You can also "salt" the calculation so that the residual
>>>>>> is deliberately nonzero. So, for example, "success" is
>>>>>> indicated by a residual of 0x474E. :>
>>>>>
>>>>> Again, pointless.
>>>>>
>>>>> Salt is important for security-related hashes (like password hashes),
>>>>> not for integrity checks.
>>>>
>>>> You've missed the point. The correct "sum" can be anything.
>>>> Why is "0" more special than any other value? As the value is
>>>> typically meaningless to anything other than the code that verifies
>>>> it, you couldn't look at an image (or the output of the verifier)
>>>> and gain anything from seeing that obscure value.
>>>
>>> Do you actually know what is meant by "salt" in the context of hashes,
>>> and why it is useful in some circumstances? Do you understand that
>>> "salt" is added (usually prepended, or occasionally mixed in in some
>>> other way) to the data /before/ the hash is calculated?
>>
>> What term would you have me use to indicate a "bias" applied to a CRC
>> algorithm?
>
> Well, first I'd note that any kind of modification to the basic CRC
> algorithm is pointless from the viewpoint of its use as an integrity
> check. (There have been, mostly historically, some justifications in
> terms of implementation efficiency. For example, bit and byte
> re-ordering could be done to suit hardware bit-wise implementations.)
>
> Otherwise I'd say you are picking a specific initial value if that is
> what you are doing, or modifying the final value (inverting it or
> xor'ing it with a fixed value). There is, AFAIK, no specific term for
> these - and I don't see any benefit in having one. Misusing the term
> "salt" from cryptography is certainly not helpful.
Salt just ensures that you can differentiate between functionally identical values. I.e., in a CRC, it differentiates between the "0x0000" that CRC-1 generates from the "0x0000" that CRC-2 generates. You don't see the parallel to ensuring that *my* use of "Passw0rd" is encoded in a different manner than *your* use of "Passw0rd"?
>> See the RMI desciption. > > I'm sorry, I have no idea what "RMI" is or where it is described. You've > mentioned that abbreviation twice, but I can't figure it out.
<https://en.wikipedia.org/wiki/RMI> <https://en.wikipedia.org/wiki/OCL> Nothing magical with either term.
>> OTOH, "salting" the calculation so that it is expected to yield >> a value of 0x13 means *those* situations will be flagged as errors >> (and a different set of situations will sneak by, undetected). > > And that gives you exactly /zero/ benefit.
See above.
> You run your hash algorithm, and check for the single value that indicates no
> errors.  It does not matter if that number is 0, 0x13, or - often more
-----------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ As you've admitted, it doesn't matter. So, why wouldn't I opt to have an algorithm for THIS interface give me a result that is EXPECTED for this protocol? What value picking "0"?
> conveniently - the number attached at the end of the image as the expected > result of the hash of the rest of the data.
>>> To be more accurate, the chances of them passing unnoticed are of the
>>> order of 1 in 2^n, for a good n-bit check such as a CRC check. Certain
>>> types of error are always detectable, such as single and double bit
>>> errors. That is the point of using a checksum or hash for integrity
>>> checking.
>>>
>>> /Intentional/ changes are a different matter. If a hacker changes the
>>> program image, they can change the transmitted hash to their own
>>> calculated hash. Or for a small CRC, they could change a different
>>> part of the image until the original checksum matched - for a 16-bit
>>> CRC, that only takes 65,535 attempts in the worst case.
>>
>> If the approach used is "typical", then you need far fewer attempts to
>> produce a correct image -- without EVER knowing where the CRC is stored.
>
> It is difficult to know what you are trying to say here, but if you
> believe that different initial values in a CRC algorithm make it harder
> to modify an image to make it pass the integrity test, you are simply
> wrong.
Of course it does! You don't KNOW what to expect -- unless you've identified where the test is performed in the code and the result stored/checked. If you assume the residual will be 0 and make an attempt to generate a new checksum that yields 0 and it doesn't work FIRST TIME, then, by definition, it is HARDER (more work is required -- even if not *conceptually* more *difficult*). *My* example use of the different salt is for a different purpose. And, isn't meant as a deterrent to any developer/attacker but, rather, simply to ensure the transmission of the packet is intact AND carries some reassurance that it is in the correct format.
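One possible reading of the scheme described above, as a sketch only: derive the CRC's initial value from the interface version, so a packet built for one version fails the residual check on a stub expecting another, without carrying a separate version field. The seed value and function names here are hypothetical, not taken from the thread:

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Bit-at-a-time CRC-32 (reflected 0xEDB88320), parameterised on its
   initial value so the "seed" can encode the interface version. */
static uint32_t crc32_with_init(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc;
}

#define IFACE_V4_2_SEED 0x474E0000u   /* hypothetical per-version "salt" */

/* Sender side: compute the packet CRC with the version-specific seed. */
uint32_t make_packet_crc(const uint8_t *payload, size_t len)
{
    return crc32_with_init(IFACE_V4_2_SEED, payload, len);
}

/* Receiver side: recompute with the seed *it* expects.  A mismatch means
   either corruption in transit or a packet built for a different
   interface version. */
bool packet_ok(const uint8_t *payload, size_t len, uint32_t received_crc)
{
    return crc32_with_init(IFACE_V4_2_SEED, payload, len) == received_crc;
}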
>>> That is why you need to distinguish between the two possibilities. If
>>> you don't have to worry about malicious attacks, a 32-bit CRC takes a
>>> dozen lines of C code and a 1 KB table, all running extremely
>>> efficiently. If security is an issue, you need digital signatures - an
>>> RSA-based signature system is orders of magnitude more effort in both
>>> development time and in run time.
>>
>> It's considerably more expensive AND not fool-proof -- esp if the
>> attacker knows you are signing binaries. "OK, now I need to find
>> WHERE the signature is verified and just patch that "CALL" out
>> of the code".
>
> I'm not sure if that is a straw-man argument, or just showing your
> ignorance of the topic. Do you really think security checks are done by
> the program you are trying to send securely? That would be like trying
> to have building security where people entering the building look at
> their own security cards.
Do YOU really think we all design applications that run in PCs where some CLOSED OS performs these tests in a manner that can't be subverted? *WE* (tend to) write ALL the code in the products developed, here. So, whether it's the POST WE wrote that is performing the test or the loader WE wrote, it's still *our* program. Yes, we ARE looking at our own security cards!

Manufacturers *try* to hide ("obscurity") details of these mechanisms in an attempt to improve effective security. But, there's nothing that makes these guarantees. Give me the sources for Windows (Linux, *BSD, etc.) and I can subvert all the state-of-the-art digital signing used to ensure binaries aren't altered. Nothing *outside* the box is involved so, by definition, everything I need has to reside *in* the box.

Data I/O was always paranoid about their software/firmware. They rely on custom silicon to "protect" their investment. But, they use a COTS CPU to execute the code! So, pull the MC68K out of its socket and plug in an emulator. Capture the execution trace and you know exactly what the instruction stream is/was -- despite its encoding on the distribution media.
>>>> I altered the test procedure for a piece of military gear we were
>>>> building simply to skip some lengthy tests that I *knew* would pass
>>>> (I don't want to inject an extra 20 minutes of wait time just to get
>>>> through a lengthy test I already know works before I can get to the
>>>> test of interest to me, now.
>>>>
>>>> I failed to undo the change before the official signoff on the device.
>>>>
>>>> The only evidence of this was the fact that I had also patched the
>>>> startup message to say "Go for coffee..." -- which remained on the
>>>> screen for the duration of the lengthy (even with the long test
>>>> elided) procedure...
>>>>
>>>> ..which alerted folks to the fact that this *probably* wasn't the
>>>> original image. (The computer running the test suite on the DUT had
>>>> no problem accepting my patched binary)
>>>
>>> And what, exactly, do you think that anecdote tells us about CRC checks
>>> for image files? It reminds us that we are all fallible, but does no
>>> more than that.
>>
>> That *was* the point. Because the folks who designed the test computer
>> relied on common techniques to safeguard the image.
>
> There was a human error - procedures were not good enough, or were not
> followed. It happens, and you learn from it and make better procedures.
> The fault was in what people did, not in an automated integrity check.
> It is completely unrelated.
It shows that the check was designed without consideration of how it might be subverted. This is the most common flaw in all security schemes -- failing to consider an attack/fault vector.

The vendor assumed no one would deliberately alter the test procedure -- that anyone running it would willingly sit through an extra half hour of tests ALREADY KNOWN TO PASS instead of opting to find a way to skip them (because the test designer only considered "sell off" when designing the test and not *debug*, and the test platform didn't provide hooks to facilitate that, either!!)

I unplugged a cable between two pieces of equipment that I had never seen before to subvert a security mechanism in a product. Because the designers never considered the fact that someone might do that!

Security is no different from any other "solution". You test the divisor before a calculation because you reasonably expect to encounter "unfortunate" values and don't want the operation to fail.
>> The counterfeiting example I cited indicates how "obscurity/secrecy"
>> is far more effective (yet you dismiss it out-of-hand).
>
> No, it does nothing of the sort. There is no connection at all.
The counterfeiter lost the lawsuit because he was unaware (obscurity) of the hidden SECURITY measures in the product design. This was proven by his attempts to defeat the OBVIOUS ones!
>>>>> CRC's alone are useless. All the attacker needs to do after modifying
>>>>> the image is calculate the CRC themselves, and replace the original
>>>>> checksum with their own.
>>>>
>>>> That assumes the "alterer" knows how to replace the checksum, how it
>>>> is computed, where it is embedded in the image, etc. I modified the
>>>> Compaq portable mentioned without ever knowing where the checksum was
>>>> stored or *if* it was explicitly stored. I had no desire to
>>>> disassemble the BIOS ROMs (though could obviously do so as there was
>>>> no "proprietary hardware" limiting access to their contents and the
>>>> instruction set of the processor is well known!).
>>>>
>>>> Instead, I did this by *guessing* how they would implement such a
>>>> check in a bit of kit from that era (EPROMs aren't easily modified by
>>>> malware so it wasn't likely that they would go to great lengths to
>>>> "protect" the image). And, if my guess had been incorrect, I could
>>>> always reinstall the original EPROMs -- nothing lost, nothing gained.
>>>>
>>>> Had much experience with folks counterfeiting your products and making
>>>> "simple" changes to the binaries? Like changing the copyright notice
>>>> or splash screen?
>>>>
>>>> Then, bringing the (accused) counterfeit of YOUR product into a
>>>> courtroom and revealing the *hidden* checksum that the counterfeiter
>>>> wasn't aware of?
>>>>
>>>> "Gee, why does YOUR (alleged) device have *my* name in it -- in
>>>> addition to behaving exactly like mine??"
>>>>
>>>> [I guess obscurity has its place!]
>>>
>>> Security by obscurity is not security. Having a hidden signature or
>>> other mark can be useful for proving ownership (making an intentional
>>> mistake is another common tactic - such as commercial maps having a
>>> few subtle spelling errors). But that is not security.
>>
>> Of course it is! If *you* check the "hidden signature" at runtime
>> and then alter "your" operation such that an altered copy fails
>> to perform properly, then you have secured it.
>
> That is not security. "Security" means that the program that starts the
> updated program checks the /entire/ image according to its digital
> signature, and rejects it /entirely/ if it does not match.
No, that's *your* naive assumption of security. It's why such attempts invariably fail; they are "drop dead" implementations that make it clear to anyone trying to subvert that security that their efforts have not (yet) succeeded. The goal is to prevent the program/device from being used without authorization/compensation. If it KILLS the user as a result of some hidden feature, it has met its goal -- even if a draconian approach. If it *pretends* to be doing what you want --- and then fails to complete some later step -- it is similarly preventing unauthorized use (and tying up a lot of your time, in the process). If you want to ensure the image isn't *corrupt* (which could lead to failures that could invite lawsuits, etc.), then you are concerned with INTEGRITY.
> What you are talking about here is the sort of cat-and-mouse nonsense
> computer games producers did with intentional disk errors to stop
> copying. It annoys legitimate users and does almost nothing to hinder
> the bad guys.
Because it was a bad solution that was fairly obvious in its presence: "I can't copy this disk! Let me buy Copy2PC..." The same applies to most licensing schemes and other "tamper detection" mechanisms.
>> Would you want to use a check-writing program if the account
>> balances it maintains were subtly (but not consistently)
>> incorrect?
>
> Again, you make no sense. What has this got to do with integrity checks
> or security?
If you're selling check-writing software and want to prevent FlyByNight Accounting Software, Inc. from stealing your product and reselling it as your own, a great way to prevent that is to ensure THEIR copy of the product causes accounting errors that are hard to notice. Their customers will (eventually) complain that THEIR product is buggy. But, yours isn't!

If your goal is to track your checks accurately, you're likely not going to want to wonder what yet-to-be-discovered errors exist in the "books" that THEIR software has been maintaining for you.

The original vendor has secured his product against tampering.
>> Checking signatures, CRCs, licensing schemes, etc. all are used
>> in a "drop dead" fashion so considerably easier to defeat.
>> Witness the number of "products" available as warez...
>
> Look, it is all /really/ simple. And the year is 2023, not 1973.
Yes! And it is considerably easier to subvert naive mechanisms AND SHARE YOUR HACKS!
> If you want to check the integrity of a file against accidental changes, a CRC > is usually fine.
As is a CRC on a network packet. Without having to double-check the contents of that packet after it has been verified on the sending side!
> If you want security, and to protect against malicious changes, use a
> digital signature. This must be checked by the program that /starts/
> the updated code, or that downloaded and stored it - not by the program
> itself!
And who wrote THAT program? Where is it, physically? Is there some device OUTSIDE of the device that you've built that securely performs these checks?
>> Only answer with a support contract. > > Oh, sure - the amateurs who have some of the information but not enough > details, skill or knowledge to get things working will /never/ fill forums with > questions, complaints or bad reviews that bother your support staff or scare > away real sales.
A forum doesn't have to be "public". FUD can scare off real sales even in the total absence of information (or knowledge). Your goal should always be to produce a good product that does what it claims to do. And, rely on the happiness of your customers to directly (or indirectly) generate additional sales. I've never "advertised" my services. I'm actually pretty hard to get in touch with! Yet, clients never had a hard time finding me -- through other clients who were happy with my work. As they likely weren't direct competitors to the original clients, they had nothing to fear (lose) from sharing me, as a resource. Similarly, a customer making widgets that employ some feature of your device likely has little to lose by sharing his (good or bad) experiences with another POTENTIAL customer (making wodjets). And, likely can benefit from the goodwill he receives from that other customer as well as from *you* ("Thanks for recommending us to him!"). And, ensures a continued demand for your products so you continue to be available for HIS needs!
>>> They are limited by idiotic government export restrictions made by
>>> ignorant politicians who don't understand cryptography.
>>
>> Protections don't always have to be cryptographic.
>
> Correct, but - as with a lot of what you write - completely irrelevant
> to the subject at hand.
>
> Why can't companies give out information about the security systems used
> in their microcontrollers (for example)? Because some geriatric
> ignoramuses think banning "export" of such information to certain
> countries will stop those countries knowing about security and
> cryptography.
Do you really think that's the sole reason for all the "secrecy" and NDAs? I've had to sit with gummit folks and sort out what parts of our technology could LEGALLY be exported. Even to our partners in the UK! Some of it makes sense ("Nothing goes to Libya!"). Some is bogus. And, thinking that you can put up a wall that is impermeable is a joke. Just like printing PGP in book form and selling books overseas. Or, hiring someone who worked for Company X. Or, bribing someone to make a photocopy of <whatever>.

But, this doesn't mean one should ENCOURAGE dissemination of things that may have special security/economic value. "Delay" often has as much value as "deter".

A friend who designed arcade pieces recounted how he was contacted by a guy who had disassembled ~40KB of (hand-written) code in one of his products. He had even uncovered latent bugs (!) in the code. But, his efforts were so "late" that the product had long ago lost commercial value. So, it may have been flattering that someone would invest that much time in such an endeavor. But, little else.

Nowadays, tools would make that a trivial undertaking. And, there is the possibility of easily enlisting others in the effort (without resorting to clandestine channels). OTOH, projects are now considerably larger (orders of magnitude). OTOOH, much current work is done in HLLs (so tools can recognize their code generator patterns) and with "standard" libraries; I can recognize a call to printf without decompiling any of the code -- folks aren't likely going to replace "%d" with "?b" just to obscure functionality!
>>> Some things benefit from being kept hidden, or under restricted
>>> access. The details of the CRC algorithm you use to catch accidental
>>> errors in your image file is /not/ one of them. If you think hiding
>>> it has the remotest hint of a benefit, you are doing things wrong -
>>> you need a /security/ check, not a simple /integrity/ check.
>>>
>>> And then once you have switched to a security check - a digital
>>> signature - there's no need to keep that choice hidden either, because
>>> it is the /key/ that is important, not the type of lock.
>>
>> Again, meaningless if the attacker can interfere with the *enforcement*
>> of that check. Using something "well known" just means he already knows
>> what to look for in your code. Or, how to interfere with your
>> intended implementation in ways that you may have not anticipated
>> (confident that your "security" can't be MATHEMATICALLY broken).
>>
>
> If the attacker can interfere with the enforcement of the check, then it
> doesn't matter what checks you have. Keeping the design of a building's
> locks secret does not help you if the bad guys have bribed the security
> guard /inside/ the building!
But, if that's the only way to subvert the secrets of those locks, then you only have to worry about keeping that security guard "happy".
>> "Here's my source code.&nbsp; Here are my schematics.&nbsp; Here's the >> name of the guy who oversees production (bribe him to gain >> access to the keys stored in the TPM).&nbsp; Now, what are you >> gonna *do* with all that?" > > The first two should be fine - if people can break your security after looking > at your source code or schematics, your security is /bad/.&nbsp; As for the third > one, if they can break your security by going through the production guy, your > production procedures are bad.
You can change your production procedures without having to redesign your product. You don't want to embrace a solution/technology that may soon/later be subverted (e.g., SHA1) and have to redesign portions of your product (which may already be deployed) to "fix". IMO, this is the downside of modern cryptography -- if you have a product with any significant lifespan and "exposure". You never know when the next "uncrackable" algorithm will fall. And, when someone might opt to marshall a community's resources to attack a particular implementation. Attacks that used to be considered "nation-state scale" are quickly becoming "big business scale" and even "network of workstations scale". So, any implementation that *shares* a key across a product line is vulnerable to the entire product line being compromised when/if that key is disclosed/broken. [I generate unique keys for each device on the customer's site using a dedicated (physically secure) interface so even the manufacturer doesn't know what they are. Crack one (possibly by physically attacking the device and microprobing the die) and all you get it that one device -- and whatever *its* role in the system may have been.]
On Monday, April 24, 2023 at 3:17:33 AM UTC-4, David Brown wrote:
> On 24/04/2023 00:24, Rick C wrote:
> > [...]
> >
> > Why? What makes a CRC an "appropriate" choice? Normally, when I
> > design something, I establish the requirements. What requirements
> > are you assuming, that would make the CRC more desirable than a
> > simple checksum?
> >
> I've already explained this in quite a lot of detail in this thread (as
> have others). If you don't like my explanation, or didn't read it,
> that's okay. You are under no obligation to learn about CRCs. Or if
> you prefer to look it up in other sources, that's obviously also an option.
Hmmm... I ask you a question about why you think CRC is better for my application and you respond oddly. So you can't explain why the CRC would be better for my application? OK, thanks anyway.
> [...]
> >
> > If you think a discussion of CRC calculations would be useful, why
> > don't you open a thread and discuss them, instead of insisting they
> > are the right solution to my problem, when you don't even know what
> > the problem requirements are? It's all here in the thread. You only
> > need to read, without projecting your opinions on the problem
> > statement.
> >
> I've asked you this before - are you /sure/ you understand how Usenet works?
I will say this again, rather than burying your comments on CRC in this thread about checksums, why not open a new thread, and allow the world to read what you have to say, instead of commenting as a side topic in a thread where most people have tuned out long ago? You can use an appropriate subject line like, "Why CRC is better than checksums for some applications". Or you can continue to muddy up the waters here by discussing something that is of no value in this application. -- Rick C. ++- Get 1,000 miles of free Supercharging ++- Tesla referral code - https://ts.la/richard11209
