EmbeddedRelated.com Forums

Embedding a Checksum in an Image File

Started by Rick C April 19, 2023
On 21/04/2023 13:39, Don Y wrote:
> On 4/21/2023 3:43 AM, David Brown wrote: >>> Note that you want to choose a polynomial that doesn't >>> give you a "win" result for "obviously" corrupt data. >>> E.g., if data is all zeros or all 0xFF (as these sorts of >>> conditions can happen with hardware failures) you probably >>> wouldn't want a "success" indication! >> >> No, that is pointless for something like a code image.  It just adds >> needless complexity to your CRC algorithm. > > Perhaps you've forgotten that you don't just use CRCs (secure hashes, etc.) > on "code images"?
No - but "code images" is the topic here. However, in almost every case where CRC's might be useful, you have additional checks of the sanity of the data, and an all-zero or all-one data block would be rejected. For example, Ethernet packets use CRC for integrity checking, but an attempt to send a packet type 0 from MAC address 00:00:00:00:00:00 to address 00:00:00:00:00:00, of length 0, would be rejected anyway. I can't think of any use-cases where you would be passing around a block of "pure" data that could reasonably take absolutely any value, without any type of "envelope" information, and where you would think a CRC check is appropriate.
> >> You should already have checks that would eliminate an all-zero image >> or other "obviously corrupt" data.  You'll be checking the image for a >> key or "magic number" that identifies the image as "program image for >> board X, project Y". You'll be checking version numbers.  You'll be >> reading the length of the image so you know the range for your CRC >> function, and where to find the appended CRC check.  You might not >> have all of these in a given system, but you'll have some kind of >> check which would fail on an all-zero image. > > See above.
See above.
> >>> You can also "salt" the calculation so that the residual >>> is deliberately nonzero.  So, for example, "success" is >>> indicated by a residual of 0x474E.  :> >> >> Again, pointless. >> >> Salt is important for security-related hashes (like password hashes), >> not for integrity checks. > > You've missed the point.  The correct "sum" can be anything. > Why is "0" more special than any other value?  As the value is > typically meaningless to anything other than the code that verifies > it, you couldn't look at an image (or the output of the verifier) > and gain anything from seeing that obscure value.
Do you actually know what is meant by "salt" in the context of hashes, and why it is useful in some circumstances? Do you understand that "salt" is added (usually prepended, or occasionally mixed in some other way) to the data /before/ the hash is calculated?

I have not given the slightest indication to suggest that "0" is a special value. I fully agree that the value you get from the checking algorithm does not have to be 0 - I already suggested it could be compared to the stored value. I.e., you build your image file as "data ++ crc(data)", and check it by re-calculating "crc(data)" on the received image and comparing the result to the received crc. There is no necessity or benefit in having a crc calculated over the received data plus the received crc come out as 0.

"Salt" is used in cases where the original data must be kept secret, and only the hashes are transmitted or accessible - by adding salt to the original data before hashing it, you avoid a direct correspondence between the hash and the original data. The prime use-case is to stop people being able to figure out a password by looking up the hash in a list of pre-computed hashes of common passwords.
> > OTOH, if the CRC yields something familiar -- or useful -- then > it can tell you something about the image.  E.g., salt the algorithm > with the product code, version number, your initials, 0xDEADBEEF, etc. >
You are making no sense at all. Are you suggesting that it would be a good idea to add some value to the start of the image so that the resulting crc calculation gives a nice recognisable product code? This "salt" would be different for each program image, and calculated by trial and error. If you want a product code, version number, etc., in the program image (and it's a good idea), just put these in the program image!
>>>> So now you have a new extended block   |....data....|crc| >>>> >>>> Now if you compute a new CRC on the extended block, the resulting >>>> value /should/ come out to zero. If it doesn't, either your data or >>>> the original CRC value appended to it has been changed/corrupted. >>> >>> As there is usually a lack of originality in the algorithms >>> chosen, you have to consider if you are also hoping to use >>> this to safeguard the *integrity* of your image (i.e., >>> against intentional modification). >> >> "Integrity" has nothing to do with the motivation for change. >> /Security/ is concerned with intentional modifications that >> deliberately attempt to defeat /integrity/ checks.  Integrity is about >> detecting any changes. >> >> If you are concerned about the possibility of intentional malicious >> changes, > > Changes don't have to be malicious.
Accidental changes (such as human error, noise during data transfer, memory cell errors, etc.) do not pass integrity tests unnoticed. To be more accurate, the chances of them passing unnoticed are of the order of 1 in 2^n, for a good n-bit check such as a CRC check. Certain types of error are always detectable, such as single and double bit errors. That is the point of using a checksum or hash for integrity checking. /Intentional/ changes are a different matter. If a hacker changes the program image, they can change the transmitted hash to their own calculated hash. Or for a small CRC, they could change a different part of the image until the original checksum matched - for a 16-bit CRC, that only takes 65,535 attempts in the worst case. That is why you need to distinguish between the two possibilities. If you don't have to worry about malicious attacks, a 32-bit CRC takes a dozen lines of C code and a 1 KB table, all running extremely efficiently. If security is an issue, you need digital signatures - an RSA-based signature system is orders of magnitude more effort in both development time and in run time.
> I altered the test procedure for a > piece of military gear we were building simply to skip some lengthy > tests that I *knew* would pass (I don't want to inject an extra 20 > minutes of wait time > just to get through a lengthy test I already know works before I can get > to the test of interest to me, now. > > I failed to undo the change before the official signoff on the device. > > The only evidence of this was the fact that I had also patched the > startup message to say "Go for coffee..." -- which remained on the > screen for the duration of the lengthy (even with the long test > elided) procedure... > > ..which alerted folks to the fact that this *probably* wasn't the > original image.  (The computer running the test suite on the DUT had > no problem accepting my patched binary)
And what, exactly, do you think that anecdote tells us about CRC checks for image files? It reminds us that we are all fallible, but does no more than that.
> >> CRC's alone are useless.  All the attacker needs to do after modifying >> the image is calculate the CRC themselves, and replace the original >> checksum with their own. > > That assumes the "alterer" knows how to replace the checksum, how it > is computed, where it is embedded in the image, etc.  I modified the Compaq > portable mentioned without ever knowing where the checksum was store > or *if* it was explicitly stored.  I had no desire to disassemble the > BIOS ROMs (though could obviously do so as there was no "proprietary > hardware" limiting access to their contents and the instruction set of > the processor is well known!). > > Instead, I did this by *guessing* how they would implement such a check > in a bit of kit from that era (ERPOMs aren't easily modified by malware > so it wasn't likely that they would go to great lengths to "protect" the > image).  And, if my guess had been incorrect, I could always reinstall > the original EPROMs -- nothing lost, nothing gained. > > Had much experience with folks counterfeiting your products and making > "simple" changes to the binaries?  Like changing the copyright notice > or splash screen? > > Then, bringing the (accused) counterfeit of YOUR product into a courtroom > and revealing the *hidden* checksum that the counterfeiter wasn't aware of? > > "Gee, why does YOUR (alleged) device have *my* name in it -- in addition > to behaving exactly like mine??" > > [I guess obscurity has its place!]
Security by obscurity is not security. Having a hidden signature or other mark can be useful for proving ownership (making an intentional mistake is another common tactic - such as commercial maps having a few subtle spelling errors). But that is not security.
> > Use a non-secret approach and you invite folks to alter it, as well. > >> Using non-standard algorithms for security is a simple way to get >> things completely wrong.  "Security by obscurity" is very rarely the >> right answer.  In reality, good security algorithms, and good >> implementations, are difficult and specialised tasks, best left to >> people who know what they are doing. >> >> To make something secure, you have to ensure that the check algorithms >> depend on a key that you know, but that the attacker does not have. >> That's the basis of digital signatures (though you use a secure hash >> algorithm rather than a simple CRC). > > If you can remove the check, then what value the key's secrecy?  By your > criteria, the adversary KNOWS how you are implementing your security > so he knows exactly what to remove to bypass your checks and allow his > altered image to operate in its place. > > Ever notice how manufacturers don't PUBLICLY disclose their security > hooks (without an NDA)?  If "security by obscurity" was not important, > they would publish these details INVITING challenges (instead of > trying to limit the knowledge to people with whom they've officially > contracted). >
Any serious manufacturer /does/ invite challenges to their security.

There are multiple reasons why a manufacturer (such as a semiconductor manufacturer) might be guarded about the details of their security systems. They can be avoiding giving hints to competitors. Maybe they know their systems aren't really very secure, because their keys are too short or they can be read out in some way. But I think the main reasons are often:

They want to be able to change the details, and that's far easier if there are only a few people who have read the information.

They don't want endless support questions from amateurs.

They are limited by idiotic government export restrictions made by ignorant politicians who don't understand cryptography.

Some things benefit from being kept hidden, or under restricted access. The details of the CRC algorithm you use to catch accidental errors in your image file are /not/ one of them. If you think hiding it has the remotest hint of a benefit, you are doing things wrong - you need a /security/ check, not a simple /integrity/ check. And then once you have switched to a security check - a digital signature - there's no need to keep that choice hidden either, because it is the /key/ that is important, not the type of lock.
On 21/04/2023 14:12, Rick C wrote:
> > This is simply to be able to say this version is unique, regardless > of what the version number says. Version numbers are set manually > and not always done correctly. I'm looking for something as a backup > so that if the checksums are different, I can be sure the versions > are not the same. > > The less work involved, the better. >
Run a simple 32-bit crc over the image. The result is a hash of the image. Any change in the image will show up as a change in the crc.
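For illustration, a minimal table-driven version of that (the standard reflected CRC-32, polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF) - a sketch, not tuned for any particular target:

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t crc32_table[256];

/* Build the 1 KB lookup table once at start-up. */
static void crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc32_table[i] = c;
    }
}

/* CRC-32 of a buffer, e.g. the whole program image.
 * (crc32_init() must have been called first.) */
static uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = (crc >> 8) ^ crc32_table[(crc ^ data[i]) & 0xFFu];
    return crc ^ 0xFFFFFFFFu;
}
```

The build script runs the same calculation over the image file and records the result (e.g. appended at the end); the unit recomputes it over the stored image and compares, or simply reports it as the "fingerprint" of that build.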
On 20.04.2023 at 22:44, George Neuner wrote:
> On Thu, 20 Apr 2023 09:45:59 -0700 (PDT), Rick C >> CRC is not complicated, but I would not know how to calculate an >> inserted value to force the resulting CRC to zero. How do you do >> that? > > It's implicit in the equation they chose. I don't know how it works - > just that it does.
It works for any CRC that does not invert or XOR the final remainder. CRC is based on polynomial division remainders: the CRC is the division remainder of the input interpreted as a polynomial, and if you append that remainder to the input, the new remainder is zero.

https://crccalc.com/?crc=12345678&method=CRC-16/AUG-CCITT&datatype=hex&outtype=0 -> result is 0xBA3C

https://crccalc.com/?crc=12345678BA3C&method=CRC-16/AUG-CCITT&datatype=hex&outtype=0 -> result is 0x0000

(Need to be careful with byte orders; for some CRCs on that page, you need to swap the bytes before appending.)

Stefan
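For illustration, the same property in a few lines of C, using the CRC-16/AUG-CCITT parameters from the links above (poly 0x1021, init 0x1D0F, no reflection, no final XOR) - a minimal sketch, not production code:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Bit-at-a-time CRC-16/AUG-CCITT: poly 0x1021, init 0x1D0F, no reflection, no final XOR. */
static uint16_t crc16_aug_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x1D0F;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021) : (uint16_t)(crc << 1);
    }
    return crc;
}

int main(void)
{
    uint8_t msg[6] = { 0x12, 0x34, 0x56, 0x78, 0, 0 };

    uint16_t crc = crc16_aug_ccitt(msg, 4);
    printf("crc(data)        = 0x%04X\n", crc);   /* the link above reports 0xBA3C */

    msg[4] = (uint8_t)(crc >> 8);                 /* append high byte first (non-reflected CRC) */
    msg[5] = (uint8_t)(crc & 0xFF);
    printf("crc(data ++ crc) = 0x%04X\n", crc16_aug_ccitt(msg, 6));   /* always 0x0000 */
    return 0;
}
```

Because this CRC is not reflected, the remainder is appended high byte first; a reflected CRC would want the opposite byte order, which is the caveat mentioned above.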
On 4/20/23 10:41 PM, Rick C wrote:
> On Thursday, April 20, 2023 at 10:09:35 PM UTC-4, Richard Damon wrote: >> On 4/19/23 10:06 PM, Rick C wrote: >>> This is a bit of the chicken and egg thing. If you want a embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think. >>> >>> I'm not thinking anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16. >>> >>> I keep thinking of using a placeholder, but that doesn't seem to work out in any useful way. Even if you try to anticipate the impact of adding the checksum, that only gives you a different checksum, that you then need to anticipate further... ad infinitum. >>> >>> I'm not thinking of any special checksum generator that excludes the checksum data. That would be too messy. >>> >>> I keep thinking there is a different way of looking at this to achieve the result I want... >>> >>> Maybe I can prove it is impossible. Assume the file checksums to X when the checksum data is zero. The goal would then be to include the checksum data value Y in the file, that would change X to Y. Given the properties of the module N checksum, this would appear to be impossible for the general case, unless... Add another data value, called, checksum normalizer. This data value checksums with the original checksum to give the result zero. Then, when the checksum is also added, the resulting checksum is, in fact, the checksum. Another way of looking at this is to add a value that combines with the added checksum, to be zero, leaving the original checksum intact. >>> >>> This might be inordinately hard for a CRC, but a simple checksum would not be an issue, I think. At least, this could work in software, where data can be included in an image file as itself. In a device like an FPGA, it might not be included in the bit stream file so directly... but that might depend on where in the device it is inserted. Memory might have data that is stored as itself. I'll need to look into that. >>> >> IF I understand you correctly, what you want is for the file to compute >> to some "checksum" that comes from the basic contents of the file, and >> then you want to add the "checksum" into the file so the program itself >> can print its checksum. >> >> One fact to remember, is that "cryptographic hashes" were invented >> because it was too easy to create a faked file that matches a >> non-crptographic hash/checksum, so that couldn't be a key to make sure >> you really had the right file in the presence of a determined enemy, but >> the checksums were good enough to catch "random" errors. >> >> This means that you can add the checksum into the file, and some >> additional bytes (likely at the end) and by knowing the propeties of the >> checksum algorithm, compute a value for those extra bytes such that the >> "undo" the changes caused by adding the checksum bytes to file. >> >> I'm not sure exactly how to computes these, but the key is that you add >> something at the end of the file to get the checksum back to what the >> original file had before you added the checksum into the file. > > Yeah, for a simple checksum, I think that would be easy, at least if "checksum" means a bitwise XOR operation. If the checksum and extra bytes are both 16 bits, this would also work for an arithmetic checksum where each 16 bit word were added into the checksum. All the carries would cascade out of the upper 16 bits from adding the inserted checksum and it's 2's complement. 
> > I don't even want to think about using a CRC to try to do this. >
It is a bit of work, but even a 32-bit CRC is solvable to find the reverse equation. You can do the work once generically, and get a formula that computes the value you need to put into the final bytes to get the CRC of the file back to the CRC it was before adding the CRC and the extra bytes. It wouldn't surprise me if the formula is already published somewhere for the common CRCs.
On Friday, April 21, 2023 at 10:12:49 PM UTC+10, Rick C wrote:
> On Friday, April 21, 2023 at 4:53:18 AM UTC-4, Brian Cockburn wrote: > > On Thursday, April 20, 2023 at 12:06:36 PM UTC+10, Rick C wrote: > > > This is a bit of the chicken and egg thing. If you want a embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think. > > > > > > I'm not thinking anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16. > > > > > > I keep thinking of using a placeholder, but that doesn't seem to work out in any useful way. Even if you try to anticipate the impact of adding the checksum, that only gives you a different checksum, that you then need to anticipate further... ad infinitum. > > > > > > I'm not thinking of any special checksum generator that excludes the checksum data. That would be too messy. > > > > > > I keep thinking there is a different way of looking at this to achieve the result I want... > > > > > > Maybe I can prove it is impossible. Assume the file checksums to X when the checksum data is zero. The goal would then be to include the checksum data value Y in the file, that would change X to Y. Given the properties of the module N checksum, this would appear to be impossible for the general case, unless... Add another data value, called, checksum normalizer. This data value checksums with the original checksum to give the result zero. Then, when the checksum is also added, the resulting checksum is, in fact, the checksum. Another way of looking at this is to add a value that combines with the added checksum, to be zero, leaving the original checksum intact. > > > > > > This might be inordinately hard for a CRC, but a simple checksum would not be an issue, I think. At least, this could work in software, where data can be included in an image file as itself. In a device like an FPGA, it might not be included in the bit stream file so directly... but that might depend on where in the device it is inserted. Memory might have data that is stored as itself. I'll need to look into that. > > > > > > -- > > > > > > Rick C. > > > > > > - Get 1,000 miles of free Supercharging > > > - Tesla referral code - https://ts.la/richard11209 > > Rick, What is the purpose of this? Is it (1) to be able to externally identify a binary, as one might a ROM image by computing a checksum? Is it (2) for a run-able binary to be able to check itself? This would of course only be able to detect corruption, not tampering. Is it (3) for the loader (whatever that might be) to be able to say 'this binary has the correct checksum' and only jump to it if it does? Again this would only be able to detect corruption, not tampering. Are you hoping for more than corruption detection? > This is simply to be able to say this version is unique, regardless of what the version number says. Version numbers are set manually and not always done correctly. I'm looking for something as a backup so that if the checksums are different, I can be sure the versions are not the same. > > The less work involved, the better. > > -- > > Rick C. > > ++ Get 1,000 miles of free Supercharging > ++ Tesla referral code - https://ts.la/richard11209
Rick, so you want the executable to, as part of its execution, print on the console the 'checksum' of itself? Or do you want to be able to inspect the executable with some other tool to calculate its 'checksum'? For the latter there are lots of tools to do that (your OS or PROM programmer for instance); for the former you need to embed the calculation code into the executable (along with the length over which to calculate) and run this when asked. Neither of these involves embedding the 'checksum' value. And just to be sure I understand what you wrote in a somewhat convoluted way: when you have two binary executables that report the same version number, you want to be able to distinguish them with a 'checksum', right?
On Saturday, April 22, 2023 at 1:02:28 AM UTC+10, David Brown wrote:
> On 21/04/2023 14:12, Rick C wrote: > > > > This is simply to be able to say this version is unique, regardless > > of what the version number says. Version numbers are set manually > > and not always done correctly. I'm looking for something as a backup > > so that if the checksums are different, I can be sure the versions > > are not the same. > > > > The less work involved, the better. > > > Run a simple 32-bit crc over the image. The result is a hash of the > image. Any change in the image will show up as a change in the crc.
David, a hash and a CRC are not the same thing. They both produce a reasonably unique result though. Any change would show in either (unless as a result of intentional tampering).
On 4/21/2023 7:50 AM, David Brown wrote:
> On 21/04/2023 13:39, Don Y wrote: >> On 4/21/2023 3:43 AM, David Brown wrote: >>>> Note that you want to choose a polynomial that doesn't >>>> give you a "win" result for "obviously" corrupt data. >>>> E.g., if data is all zeros or all 0xFF (as these sorts of >>>> conditions can happen with hardware failures) you probably >>>> wouldn't want a "success" indication! >>> >>> No, that is pointless for something like a code image.  It just adds >>> needless complexity to your CRC algorithm. >> >> Perhaps you've forgotten that you don't just use CRCs (secure hashes, etc.) >> on "code images"? > > No - but "code images" is the topic here.
So, anything unrelated to CRCs as applied to code images is off limits... per order of the "Internet Police"? If *all* you use CRCs for is checking *a* code image at POST, you're wasting a valuable resource. Do you not think data/parameters need to be safeguarded? Program images? Communication protocols? Or, do you develop yet another technique for *each* of those?
> However, in almost every case where CRC's might be useful, you have additional > checks of the sanity of the data, and an all-zero or all-one data block would > be rejected.  For example, Ethernet packets use CRC for integrity checking, but > an attempt to send a packet type 0 from MAC address 00:00:00:00:00:00 to > address 00:00:00:00:00:00, of length 0, would be rejected anyway.
Why look at "data" -- which may be suspect -- and *then* check its CRC? Run the CRC first. If it fails, decide how you are going to proceed or recover. ["Data" can be code or parameters] I treat blocks of "data" (carefully arranged) with individual CRCs, based on their relative importance to the operation. If the CRC is corrupt, I have no idea *where* the error lies -- as it could be anything in the checked block. So, one has to (typically) restore some defaults (or, invoke a reconfigure operation) which recreates *a* valid dataset. This is particularly useful when power to a device can be removed at arbitrary points in time (or, some other abrupt crash). Before altering anything in a block, take deliberate steps to invalidate the CRC, make your changes, then "fix" the CRC. So, an interrupted process causes the CRC to fail and remedial action taken. Note that replacing a FLASH image (mostly code) falls under such a mechanism.
> I can't think of any use-cases where you would be passing around a block of > "pure" data that could reasonably take absolutely any value, without any type > of "envelope" information, and where you would think a CRC check is appropriate.
I append a *version-specific* CRC to each packet of marshalled data in my RMIs. If the data is corrupted in transit *or* if the wrong version API ends up targeted, the operation will abend because we know the data "isn't right". I *could* put a header saying "this is version 4.2". And, that tells me nothing about the integrity of the rest of the data. OTOH, ensuring the CRC reflects "4.2" does -- if the recipient expects it to be so.
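One way to get that effect (a sketch only -- the helper names and the exact scheme are illustrative, not necessarily how it's really done): run the CRC over a version word followed by the payload, but transmit only the payload and the CRC. A receiver built for a different interface version computes a different CRC and rejects the packet just as if the data had been corrupted.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed incremental CRC-32 working on the raw register. */
extern uint32_t crc32_update(uint32_t crc, const void *data, size_t len);

#define API_VERSION 0x0402u   /* illustrative: interface version "4.2" */

/* Sender and receiver both use this; the version word itself is never sent. */
uint32_t marshal_crc(const void *payload, size_t len)
{
    uint16_t ver = API_VERSION;
    uint32_t crc = 0xFFFFFFFFu;
    crc = crc32_update(crc, &ver, sizeof ver);   /* fold the version into the CRC */
    crc = crc32_update(crc, payload, len);       /* then cover the actual payload */
    return crc ^ 0xFFFFFFFFu;
}

/* Receiver-side check: a wrong-version sender fails exactly like corruption. */
int marshalled_data_ok(const void *payload, size_t len, uint32_t rx_crc)
{
    return marshal_crc(payload, len) == rx_crc;
}
```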
>>>> You can also "salt" the calculation so that the residual >>>> is deliberately nonzero.  So, for example, "success" is >>>> indicated by a residual of 0x474E.  :> >>> >>> Again, pointless. >>> >>> Salt is important for security-related hashes (like password hashes), not >>> for integrity checks. >> >> You've missed the point.  The correct "sum" can be anything. >> Why is "0" more special than any other value?  As the value is >> typically meaningless to anything other than the code that verifies >> it, you couldn't look at an image (or the output of the verifier) >> and gain anything from seeing that obscure value. > > Do you actually know what is meant by "salt" in the context of hashes, and why > it is useful in some circumstances?  Do you understand that "salt" is added > (usually prepended, or occasionally mixed in in some other way) to the data > /before/ the hash is calculated?
What term would you have me use to indicate a "bias" applied to a CRC algorithm?
> I have not given the slightest indication to suggest that "0" is a special > value.  I fully agree that the value you get from the checking algorithm does > not have to be 0 - I already suggested it could be compared to the stored > value.  I.e., your build your image file as "data ++ crc(data)", at check it by > re-calculating "crc(data)" on the received image and comparing the result to > the received crc.  There is no necessity or benefit in having a crc run > calculated over the received data plus the received crc being 0. > > "Salt" is used in cases where the original data must be kept secret, and only > the hashes are transmitted or accessible - by adding salt to the original data > before hashing it, you avoid a direct correspondence between the hash and the > original data.  The prime use-case is to stop people being able to figure out a > password by looking up the hash in a list of pre-computed hashes of common > passwords.
See above.
>> OTOH, if the CRC yields something familiar -- or useful -- then >> it can tell you something about the image.  E.g., salt the algorithm >> with the product code, version number, your initials, 0xDEADBEEF, etc. > > You are making no sense at all.  Are you suggesting that it would be a good > idea to add some value to the start of the image so that the resulting crc > calculation gives a nice recognisable product code?  This "salt" would be > different for each program image, and calculated by trial and error.  If you > want a product code, version number, etc., in the program image (and it's a > good idea), just put these in the program image!
Again, that tells you nothing about the rest of the image! See the RMI description. [Note that the OP is expecting the checksum to help *him* identify versions: "Just put these in the program image!" Eh?]
>>>>> So now you have a new extended block   |....data....|crc| >>>>> >>>>> Now if you compute a new CRC on the extended block, the resulting >>>>> value /should/ come out to zero. If it doesn't, either your data or >>>>> the original CRC value appended to it has been changed/corrupted. >>>> >>>> As there is usually a lack of originality in the algorithms >>>> chosen, you have to consider if you are also hoping to use >>>> this to safeguard the *integrity* of your image (i.e., >>>> against intentional modification). >>> >>> "Integrity" has nothing to do with the motivation for change. /Security/ is >>> concerned with intentional modifications that deliberately attempt to defeat >>> /integrity/ checks.  Integrity is about detecting any changes. >>> >>> If you are concerned about the possibility of intentional malicious changes, >> >> Changes don't have to be malicious. > > Accidental changes (such as human error, noise during data transfer, memory > cell errors, etc.) do not pass integrity tests unnoticed.
That's not true. The role of the *test* is to notice these. If the test is blind to the types of errors that are likely to occur, then it CAN'T notice them.

A CRC (hash, etc.) reduces a large block of data to a small bit of data. So, by definition, there are multiple DIFFERENT sets of data that map to the same CRC/hash/etc. (2^(data_size - CRC_size) of them). E.g., simply summing the values in a block of memory will yield "0" for ANY condition that results in the block having identical values for ALL members, if the block size is a power of 2. So, blocks of all 0xFF, all 0x00, all 0xFE, all 0x27, all 0x88, etc. will all yield the same sum. Clearly a bad choice of test!

OTOH, "salting" the calculation so that it is expected to yield a value of 0x13 means *those* situations will be flagged as errors (and a different set of situations will sneak by, undetected). The trick (engineering) is to figure out which types of failures/faults/errors are most common to occur and guard against them.
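To make that concrete with a plain 16-bit sum (the values are arbitrary; just a sketch of the point):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>

/* Plain mod-2^16 sum over 16-bit words, including the stored check word. */
static uint16_t sum16(const uint16_t *w, size_t n)
{
    uint16_t s = 0;
    while (n--)
        s = (uint16_t)(s + *w++);
    return s;
}

int main(void)
{
    uint16_t block[32];

    /* A failed or unprogrammed device often reads back as all zeros (or all ones). */
    memset(block, 0x00, sizeof block);

    /* Naive rule: "the whole block, check word included, must sum to 0".
     * The all-zero block passes that rule without containing any real data. */
    printf("expect 0x0000: got 0x%04X -> bogus \"pass\"\n", sum16(block, 32));

    /* Salted rule: the check word is chosen at build time so the block sums
     * to a non-zero constant, say 0x0013.  The same all-zero block now fails. */
    printf("expect 0x0013: got 0x%04X -> correctly flagged\n", sum16(block, 32));
    return 0;
}
```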
> To be more accurate, > the chances of them passing unnoticed are of the order of 1 in 2^n, for a good > n-bit check such as a CRC check.  Certain types of error are always detectable, > such as single and double bit errors.  That is the point of using a checksum or > hash for integrity checking. > > /Intentional/ changes are a different matter.  If a hacker changes the program > image, they can change the transmitted hash to their own calculated hash.  Or > for a small CRC, they could change a different part of the image until the > original checksum matched - for a 16-bit CRC, that only takes 65,535 attempts > in the worst case.
If the approach used is "typical", then you need far fewer attempts to produce a correct image -- without EVER knowing where the CRC is stored.
> That is why you need to distinguish between the two possibilities.  If you > don't have to worry about malicious attacks, a 32-bit CRC takes a dozen lines > of C code and a 1 KB table, all running extremely efficiently.  If security is > an issue, you need digital signatures - an RSA-based signature system is orders > of magnitude more effort in both development time and in run time.
It's considerably more expensive AND not fool-proof -- esp if the attacker knows you are signing binaries. "OK, now I need to find WHERE the signature is verified and just patch that "CALL" out of the code".
>> I altered the test procedure for a >> piece of military gear we were building simply to skip some lengthy tests >> that I *knew* would pass (I don't want to inject an extra 20 minutes of wait >> time >> just to get through a lengthy test I already know works before I can get >> to the test of interest to me, now. >> >> I failed to undo the change before the official signoff on the device. >> >> The only evidence of this was the fact that I had also patched the >> startup message to say "Go for coffee..." -- which remained on the >> screen for the duration of the lengthy (even with the long test >> elided) procedure... >> >> ..which alerted folks to the fact that this *probably* wasn't the >> original image.  (The computer running the test suite on the DUT had >> no problem accepting my patched binary) > > And what, exactly, do you think that anecdote tells us about CRC checks for > image files?  It reminds us that we are all fallible, but does no more than that.
That *was* the point. Because the folks who designed the test computer relied on common techniques to safeguard the image. The counterfeiting example I cited indicates how "obscurity/secrecy" is far more effective (yet you dismiss it out-of-hand).
>>> CRC's alone are useless.  All the attacker needs to do after modifying the >>> image is calculate the CRC themselves, and replace the original checksum >>> with their own. >> >> That assumes the "alterer" knows how to replace the checksum, how it >> is computed, where it is embedded in the image, etc.  I modified the Compaq >> portable mentioned without ever knowing where the checksum was store >> or *if* it was explicitly stored.  I had no desire to disassemble the >> BIOS ROMs (though could obviously do so as there was no "proprietary >> hardware" limiting access to their contents and the instruction set of >> the processor is well known!). >> >> Instead, I did this by *guessing* how they would implement such a check >> in a bit of kit from that era (ERPOMs aren't easily modified by malware >> so it wasn't likely that they would go to great lengths to "protect" the >> image).  And, if my guess had been incorrect, I could always reinstall >> the original EPROMs -- nothing lost, nothing gained. >> >> Had much experience with folks counterfeiting your products and making >> "simple" changes to the binaries?  Like changing the copyright notice >> or splash screen? >> >> Then, bringing the (accused) counterfeit of YOUR product into a courtroom >> and revealing the *hidden* checksum that the counterfeiter wasn't aware of? >> >> "Gee, why does YOUR (alleged) device have *my* name in it -- in addition >> to behaving exactly like mine??" >> >> [I guess obscurity has its place!] > > Security by obscurity is not security.  Having a hidden signature or other mark > can be useful for proving ownership (making an intentional mistake is another > common tactic - such as commercial maps having a few subtle spelling errors). > But that is not security.
Of course it is! If *you* check the "hidden signature" at runtime and then alter "your" operation such that an altered copy fails to perform properly, then you have secured it. Would you want to use a check-writing program if the account balances it maintains were subtly (but not consistently) incorrect?

OTOH, if the (altered) program threw up a splash screen and said "Unlicensed copy detected" and refused to operate, the "program" is still "secured" -- but, now you've provided an easy indicator of whether or not the security has been defeated.

We started doing this in the heyday of video (arcade) gaming; a counterfeiter would have a clone of YOUR game on the market (at substantially reduced prices) in a matter of *weeks*. As Operators have no foreknowledge of which games will be moneymakers and which will be "90 day wonders" (literally, no longer played after 90 days of exposure!), what incentive is there to pay for the genuine article? If all a counterfeiter had to do was alter the copyright notice (even if it was stored in some coded form), or alter some graphics (name of game, colors/shapes of characters), that's *no* impediment -- given how often and quickly it could be done.

Games would not just look at their images during POST but, also, verify that routineX() had some particular side-effect that could be tested, etc. Counterfeiters would go to lengths to ensure even THESE tests would pass. Because the game would *complain*, otherwise! (so, keep looking for more tests until the game stops throwing an alarm).

OTOH, if you *hide* the checks in the runtime and alter the game's performance subtly by folding expected values into key calculations such that values derived from altered code differ, you can annoy the player: "why did my guy just turn blue and run off the edge of the screen?" An annoyed player stops putting money into a game. A game that doesn't earn money -- regardless of how inexpensive it was to purchase -- quickly teaches the Owner not to invest in such "buggy" games. This is much better than taking the counterfeiter to court and proving the code is a copy of yours! (and, "FlyByNight Games Counterfeiters" simply closes up shop and opens up, next door)

And, because there is no "drop dead" point in the code or the game's behavior, the counterfeiter never knows when he's found all the protection mechanisms. Checking signatures, CRCs, licensing schemes, etc. are all used in a "drop dead" fashion, so they are considerably easier to defeat. Witness the number of "products" available as warez...
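Purely as an illustration of that "fold it into a key calculation" idea (not code from any actual game; the symbols, checked range and constant are made up):

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed linker symbols bounding the code region that gets checked;
 * the expected constant itself lives outside that range. */
extern const uint8_t __rom_start[], __rom_end[];
extern uint16_t      crc16(const void *data, size_t len);

#define EXPECTED_CRC 0x474Eu   /* baked in at build time (illustrative value) */

/* A calculation the game performs constantly.  On an intact image the
 * correction term is zero; on a patched image it silently skews the
 * result, so the game just feels "buggy" -- there is no obvious
 * pass/fail check for a counterfeiter to find and patch out. */
uint16_t player_speed(uint16_t base, uint16_t boost)
{
    uint16_t skew = crc16(__rom_start, (size_t)(__rom_end - __rom_start)) ^ EXPECTED_CRC;
    return (uint16_t)(base + boost + skew);
}
```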
>> Use a non-secret approach and you invite folks to alter it, as well. >> >>> Using non-standard algorithms for security is a simple way to get things >>> completely wrong.  "Security by obscurity" is very rarely the right answer. >>> In reality, good security algorithms, and good implementations, are >>> difficult and specialised tasks, best left to people who know what they are >>> doing. >>> >>> To make something secure, you have to ensure that the check algorithms >>> depend on a key that you know, but that the attacker does not have. That's >>> the basis of digital signatures (though you use a secure hash algorithm >>> rather than a simple CRC). >> >> If you can remove the check, then what value the key's secrecy?  By your >> criteria, the adversary KNOWS how you are implementing your security >> so he knows exactly what to remove to bypass your checks and allow his >> altered image to operate in its place. >> >> Ever notice how manufacturers don't PUBLICLY disclose their security >> hooks (without an NDA)?  If "security by obscurity" was not important, >> they would publish these details INVITING challenges (instead of >> trying to limit the knowledge to people with whom they've officially >> contracted). > > Any serious manufacturer /does/ invite challenges to their security. > > There are multiple reasons why a manufacturer (such as a semiconductor > manufacturer) might be guarded about the details of their security systems. > They can be avoiding giving hints to competitors.  Maybe they know their > systems aren't really very secure, because their keys are too short or they can > be read out in some way. > > But I think the main reasons are often: > > They want to be able to change the details, and that's far easier if there are > only a few people who have read the information.
So, a legitimate customer is subjected to arbitrary changes in the product's implementation?
> They don't want endless support questions from amateurs.
Only answer with a support contract.
> They are limited by idiotic government export restrictions made by ignorant > politicians who don't understand cryptography.
Protections don't always have to be cryptographic. The "Fortress" payphone is remarkably well hardened to direct physical (brute force) attacks -- money is involved. Ditto many slot machines (again, CASH money). Yet, all have vulnerabilities. "Expose this portion of the die to ultraviolet light to reset the memory protection bits" Etc.
> Some things benefit from being kept hidden, or under restricted access. The > details of the CRC algorithm you use to catch accidental errors in your image > file is /not/ one of them.  If you think hiding it has the remotest hint of a > benefit, you are doing things wrong - you need a /security/ check, not a simple > /integrity/ check. > > And then once you have switched to a security check - a digital signature - > there's no need to keep that choice hidden either, because it is the /key/ that > is important, not the type of lock.
Again, meaningless if the attacker can interfere with the *enforcement* of that check. Using something "well known" just means he already knows what to look for in your code. Or, how to interfere with your intended implementation in ways that you may have not anticipated (confident that your "security" can't be MATHEMATICALLY broken).

I had a discussion with a friend who knew just enough about "computers" to THINK he understood that world. I mentioned my NOT using ecommerce. He laughed at me as "naive": "There's 40 bit encryption on those connections! No one is going to eavesdrop on your financial data!"

[Really, Jerry? You think, as an OLD accountant, you know more than I do as a young engineer practicing in that field? Ok...]

"Yeah, and are you 100% sure something isn't already *on* your computer looking at your keystrokes BEFORE they head down that encrypted tunnel?"

Guess he hadn't really thought out the problem to that level of detail, as his confidence quickly melted away to one of worry ("I wonder if I've already been hacked??")

People implementing security almost always focus on the wrong aspects of the problem and walk away THINKING they can rest easy. Vulnerabilities are often so blatantly obvious, after the fact, as to be embarrassing: "You're not supposed to do that!" "Then, why did your product LET ME?"

I use *many* layers of security in my current design and STILL expect them (at least the ones that are accessible) to all be subverted. So, ultimately, I rely on controlling *what* the devices can do so that, even compromised, they can't cause undetectable failures or information leaks. "Here's my source code. Here are my schematics. Here's the name of the guy who oversees production (bribe him to gain access to the keys stored in the TPM). Now, what are you gonna *do* with all that?"
On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
> On 21/04/2023 14:12, Rick C wrote: > > > > This is simply to be able to say this version is unique, regardless > > of what the version number says. Version numbers are set manually > > and not always done correctly. I'm looking for something as a backup > > so that if the checksums are different, I can be sure the versions > > are not the same. > > > > The less work involved, the better. > > > Run a simple 32-bit crc over the image. The result is a hash of the > image. Any change in the image will show up as a change in the crc.
No one is trying to detect changes in the image. I'm trying to label the image in a way that can be read in operation. I'm using the checksum simply because that is easy to generate. I've had problems with version numbering in the past. It will be used, but I want it supplemented with a number that will change every time the design changes, at least with high probability (a collision chance on the order of 1 in 64K).

-- Rick C.

--- Get 1,000 miles of free Supercharging
--- Tesla referral code - https://ts.la/richard11209
On Friday, April 21, 2023 at 7:52:27 PM UTC-4, Brian Cockburn wrote:
> On Friday, April 21, 2023 at 10:12:49 PM UTC+10, Rick C wrote: > > On Friday, April 21, 2023 at 4:53:18 AM UTC-4, Brian Cockburn wrote: > > > On Thursday, April 20, 2023 at 12:06:36 PM UTC+10, Rick C wrote: > > > > This is a bit of the chicken and egg thing. If you want a embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think. > > > > > > > > I'm not thinking anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16. > > > > > > > > I keep thinking of using a placeholder, but that doesn't seem to work out in any useful way. Even if you try to anticipate the impact of adding the checksum, that only gives you a different checksum, that you then need to anticipate further... ad infinitum. > > > > > > > > I'm not thinking of any special checksum generator that excludes the checksum data. That would be too messy. > > > > > > > > I keep thinking there is a different way of looking at this to achieve the result I want... > > > > > > > > Maybe I can prove it is impossible. Assume the file checksums to X when the checksum data is zero. The goal would then be to include the checksum data value Y in the file, that would change X to Y. Given the properties of the module N checksum, this would appear to be impossible for the general case, unless... Add another data value, called, checksum normalizer. This data value checksums with the original checksum to give the result zero. Then, when the checksum is also added, the resulting checksum is, in fact, the checksum. Another way of looking at this is to add a value that combines with the added checksum, to be zero, leaving the original checksum intact. > > > > > > > > This might be inordinately hard for a CRC, but a simple checksum would not be an issue, I think. At least, this could work in software, where data can be included in an image file as itself. In a device like an FPGA, it might not be included in the bit stream file so directly... but that might depend on where in the device it is inserted. Memory might have data that is stored as itself. I'll need to look into that. > > > > > > > > -- > > > > > > > > Rick C. > > > > > > > > - Get 1,000 miles of free Supercharging > > > > - Tesla referral code - https://ts.la/richard11209 > > > Rick, What is the purpose of this? Is it (1) to be able to externally identify a binary, as one might a ROM image by computing a checksum? Is it (2) for a run-able binary to be able to check itself? This would of course only be able to detect corruption, not tampering. Is it (3) for the loader (whatever that might be) to be able to say 'this binary has the correct checksum' and only jump to it if it does? Again this would only be able to detect corruption, not tampering. Are you hoping for more than corruption detection? > > This is simply to be able to say this version is unique, regardless of what the version number says. Version numbers are set manually and not always done correctly. I'm looking for something as a backup so that if the checksums are different, I can be sure the versions are not the same. > > > > The less work involved, the better. > > > > -- > > > > Rick C. > > > > ++ Get 1,000 miles of free Supercharging > > ++ Tesla referral code - https://ts.la/richard11209 > Rick, so you want the executable to, as part of its execution, print on the console the 'checksum' of itself? Or do you want to be able to inspect the executable with some other tool to calculate its 'checksum'? 
> For the latter there are lots of tools to do that (your OS or PROM programmer for instance), for the former you need to embed the calculation code into the executable (along with the length over which to calculate) and run this when asked. Neither of these involve embedding the 'checksum' value.
> And just to be sure I understand what you wrote in a somewhat convoluted way. When you have two binary executables that report the same version number you want to be able to distinguish them with a 'checksum', right?
Yes, I want the checksum to be readable while operating. Calculation code??? Not going to happen. That's why I want to embed the checksum. Yes, two compiled files which ended up with the same version number by error. We are using an 8 bit version number, so two hex digits. Negative numbers are lab versions, positive numbers are releases, so 64 of each. We don't do a lot of actual work on the hardware. This code usually is 99.9% working by the time it is tested on hardware. So no need for lots of rev numbers. But sometimes, in the lab, the rev number is not bumped when it should be. The checksum will tell us if we are working with different revisions in that case. So far, it looks like a simple checksum is the way to go. Include the checksum and the 2's complement of the checksum (in locations that were zeros), and the checksum will not change. -- Rick C. --+ Get 1,000 miles of free Supercharging --+ Tesla referral code - https://ts.la/richard11209
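A quick sketch of that plan, with made-up contents (two words reserved as zeros; one gets the checksum, the other its 2's complement, so the word-wise sum of the whole image is unchanged):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* 16-bit word-wise additive checksum (sum mod 2^16). */
static uint16_t sum16(const uint16_t *words, size_t n)
{
    uint16_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = (uint16_t)(s + words[i]);
    return s;
}

int main(void)
{
    /* "Image" with two reserved (zero) words at the end. */
    uint16_t image[8] = { 0x1234, 0xABCD, 0x0042, 0xBEEF, 0x0F0F, 0x5AA5, 0x0000, 0x0000 };

    uint16_t cs = sum16(image, 8);     /* checksum with the placeholders still zero */
    image[6] = cs;                     /* embed the checksum itself (readable at run time) */
    image[7] = (uint16_t)(0u - cs);    /* and its 2's complement, cancelling it out */

    printf("before embedding: 0x%04X  after: 0x%04X\n", cs, sum16(image, 8));  /* identical */
    return 0;
}
```

The embedded value is then both readable in operation and equal to the word-sum of the complete image.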
Rick,
>> Rick, so you want the executable to, as part of its execution, print on the console the 'checksum' of itself? Or do you want to be able to inspect the executable with some other tool to calculate its 'checksum'? For the latter there are lots of tools to do that (your OS or PROM programmer for instance), for the former you need to embed the calculation code into the executable (along with the length over which to calculate) and run this when asked. Neither of these involve embedding the 'checksum' value. >> And just to be sure I understand what you wrote in a somewhat convoluted way. When you have two binary executables that report the same version number you want to be able to distinguish them with a 'checksum', right? > > Yes, I want the checksum to be readable while operating. Calculation code??? Not going to happen. That's why I want to embed the checksum.
Can you expand on what you mean or expect by 'readable while operating' please? Are you planning to use some sort of tool to inspect the executing binary to 'read' this thing, or provoke output to the console in some way like:

$ run my-binary-thing --checksum
10FD
$

This would be as distinct from:

$ run my-binary-thing --version
-52
$
> Yes, two compiled files which ended up with the same version number by error. We are using an 8 bit version number, so two hex digits. Negative numbers are lab versions, positive numbers are releases, so 64 of each.
Signed 8-bit numbers range from -128 to +127 (0x80 to 0x7F) so probably a few more than 64.
> ... sometimes, in the lab, the rev number is not bumped when it should be.
This may be an indicator that better procedures are needed for code review-for-release. And that an independent pair of eyes should be doing the review against an agreed check list.
> So far, it looks like a simple checksum is the way to go. Include the checksum and the 2's complement of the checksum (in locations that were zeros), and the checksum will not change.
How will the checksum 'not change'? It will be different for every build, won't it?

Cheers, Brian.