
Embedding a Checksum in an Image File

Started by Rick C April 19, 2023
On 24/04/2023 09:32, Don Y wrote:
> On 4/22/2023 7:57 AM, David Brown wrote: >>>> However, in almost every case where CRC's might be useful, you have >>>> additional checks of the sanity of the data, and an all-zero or >>>> all-one data block would be rejected.  For example, Ethernet packets >>>> use CRC for integrity checking, but an attempt to send a packet type >>>> 0 from MAC address 00:00:00:00:00:00 to address 00:00:00:00:00:00, >>>> of length 0, would be rejected anyway. >>> >>> Why look at "data" -- which may be suspect -- and *then* check its CRC? >>> Run the CRC first.  If it fails, decide how you are going to proceed >>> or recover. >> >> That is usually the order, yes.  Sometimes you want "fail fast", such >> as dropping a packet that was not addressed to you (it doesn't matter >> if it was received correctly but for someone else, or it was addressed >> to you but the receiver address was corrupted - you are dropping the >> packet either way).  But usually you will run the CRC then look at the >> data. >> >> But the order doesn't matter - either way, you are still checking for >> valid data, and if the data is invalid, it does not matter if the CRC >> only passed by luck or by all zeros. > > You're assuming the CRC is supposed to *vouch* for the data. > The CRC can be there simply to vouch for the *transport* of a > datagram.
I am assuming that the CRC is there to determine the integrity of the data in the face of possible unintentional errors. That's what CRC checks are for. They have nothing to do with the content of the data, or the type of the data package or image. As an example of the use of CRC's in messaging, look at Ethernet frames: <https://en.wikipedia.org/wiki/Ethernet_frame> The CRC does not care about the content of the data it protects.
> > So, use a version-specific CRC on the packet. If it fails, then > either the data in the packet has been corrupted (which could just > as easily have involved an embedded "interface version" parameter); > or the packet was formed with the wrong CRC. > > If the CRC is correct FOR THAT VERSION OF THE PROTOCOL, then > why bother looking at a "protocol version" parameter? Would > you ALSO want to verify all the rest of the parameters? >
I'm sorry, I simply cannot see your point. Identifying the version of a protocol, or other protocol type information, is a totally orthogonal task to ensuring the integrity of the data. The concepts should be handled separately.
>>> What term would you have me use to indicate a "bias" applied to a CRC >>> algorithm? >> >> Well, first I'd note that any kind of modification to the basic CRC >> algorithm is pointless from the viewpoint of its use as an integrity >> check. (There have been, mostly historically, some justifications in >> terms of implementation efficiency. For example, bit and byte >> re-ordering could be done to suit hardware bit-wise implementations.) >> >> Otherwise I'd say you are picking a specific initial value if that is >> what you are doing, or modifying the final value (inverting it or >> xor'ing it with a fixed value). There is, AFAIK, no specific terms >> for these - and I don't see any benefit in having one. Misusing the >> term "salt" from cryptography is certainly not helpful. > > Salt just ensures that you can differentiate between functionally identical > values. I.e., in a CRC, it differentiates between the "0x0000" that CRC-1 > generates from the "0x0000" that CRC-2 generates.
Can we agree that this is called an "initial value", not "salt" ?
> > You don't see the parallel to ensuring that *my* use of "Passw0rd" is > encoded in a different manner than *your* use of "Passw0rd"?
No. They are different things. An important difference is that adding "salt" to a password hash is an important security feature. Picking a different initial value for a CRC instead of having appropriate protocol versioning in the data (or a surrounding envelope) is a misfeature. The second difference is the purpose of the hashing. The CRC here is for data integrity - spotting mistakes in the data during transfer or storage. The hash in a password is for security, avoiding the password ever being transmitted or stored in plain text. Any coincidence in the way these might be implemented is just that - coincidence.
> >>> See the RMI desciption. >> >> I'm sorry, I have no idea what "RMI" is or where it is described. >> You've mentioned that abbreviation twice, but I can't figure it out. > > <https://en.wikipedia.org/wiki/RMI> > <https://en.wikipedia.org/wiki/OCL> > > Nothing magical with either term.
I looked up RMI on Wikipedia before asking, and saw nothing of relevance to CRC's or checksums. I noticed no mention of "OCL" in your posts, and looking it up on Wikipedia gives no clues. So for now, I'll assume you don't want anyone to know what you meant and I can safely ignore anything you write in connection with the terms.
> >>> OTOH, "salting" the calculation so that it is expected to yield >>> a value of 0x13 means *those* situations will be flagged as errors >>> (and a different set of situations will sneak by, undetected). >> >> And that gives you exactly /zero/ benefit. > > See above.
I did. Zero benefit. Actually, it is worse than useless - it makes it harder to identify the protocol, and reduces the information content of the CRC check.
> >> You run your hash algorithm, and check for the single value that >> indicates no errors. It does not matter if that number is 0, 0x13, or >> - often more > -----------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > As you've admitted, it doesn't matter. So, why wouldn't I opt to have > an algorithm for THIS interface give me a result that is EXPECTED > for this protocol? What value picking "0"? >
A /single/ result does not matter (other than needlessly complicating things). Having multiple different valid results /does/ matter.
>>>> That is why you need to distinguish between the two possibilities. >>>> If you don't have to worry about malicious attacks, a 32-bit CRC >>>> takes a dozen lines of C code and a 1 KB table, all running >>>> extremely efficiently. If security is an issue, you need digital >>>> signatures - an RSA-based signature system is orders of magnitude >>>> more effort in both development time and in run time. >>> >>> It's considerably more expensive AND not fool-proof -- esp if the >>> attacker knows you are signing binaries. "OK, now I need to find >>> WHERE the signature is verified and just patch that "CALL" out >>> of the code". >> >> I'm not sure if that is a straw-man argument, or just showing your >> ignorance of the topic. Do you really think security checks are done >> by the program you are trying to send securely? That would be like >> trying to have building security where people entering the building >> look at their own security cards. > > Do YOU really think we all design applications that run in PCs where some > CLOSED OS performs these tests in a manner that can't be subverted?
Do you bother to read my posts at all? Or do you prefer to make up things that you imagine I write, so that you can make nonsensical attacks on them? Certainly there is no sane reading of my posts (written and sent from an /open/ OS) where "do not rely on security by obscurity" could be taken to mean "rely on obscured and closed platforms".
> *WE* (tend to) write ALL the code in the products developed, here. > So, whether it's the POST WE wrote that is performing the test or > the loader WE wrote, it's still *our* program. > > Yes, we ARE looking at our own security cards! > > Manufacturers *try* to hide ("obscurity") details of these mechanisms > in an attempt to improve effective security. But, there's nothing > that makes these guarantees.
Why are you trying to "persuade" me that manufacturer obscurity is a bad thing? You have been promoting obscurity of algorithms as though it were helpful for security - I have made clear that it is not. Are you getting your own position mixed up with mine?
> > Give me the sources for Windows (Linux, *BSD, etc.) and I can > subvert all the state-of-the-art digital signing used to ensure > binaries aren't altered. Nothing *outside* the box is involved > so, by definition, everything I need has to reside *in* the box.
No, you can't. The sources for Linux and *BSD /are/ all freely available. The private signing keys used by, for example, Red Hat or Debian, are /not/ freely available. You cannot make changes to a Red Hat or Debian package that will pass the security checks - you are unable to sign the packages. This is precisely because something /outside/ the box /is/ involved - the private half of the public/private key used for signing. The public half - and all the details of the algorithms - is easily available to let people verify the signature, but the private half is kept secret. (Sorry, but I've skipped and snipped the rest. I simply don't have time to go through it in detail. If others find it useful or interesting, that's great, but there has to be limits somewhere.)
Den 2023-04-20 kl. 04:06, skrev Rick C:
> This is a bit of the chicken and egg thing. If you want a embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think. >
The proper way to do this is to have a directive in the linker. This reserves space for the CRC and defines the area over which the CRC is calculated. I am not aware of any linker which supports this.

Two months ago, I added the DIGEST directive to binutils, aka the GNU linker. It was committed, but then people realized that I had not signed an agreement with the Free Software Foundation. Since part of the code I pushed was from a third party which released their code under MIT, the licensing has not been resolved yet; the patch is in binutils git, but reverted.

You would write (IIRC):

    DIGEST "CRC64-ECMA", (from, to)

and the linker would reserve 8 bytes, which are filled with the CRC in the final link stage.

/Ulf
> I'm not thinking anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16. > > I keep thinking of using a placeholder, but that doesn't seem to work out in any useful way. Even if you try to anticipate the impact of adding the checksum, that only gives you a different checksum, that you then need to anticipate further... ad infinitum. > > I'm not thinking of any special checksum generator that excludes the checksum data. That would be too messy. > > I keep thinking there is a different way of looking at this to achieve the result I want... > > Maybe I can prove it is impossible. Assume the file checksums to X when the checksum data is zero. The goal would then be to include the checksum data value Y in the file, that would change X to Y. Given the properties of the module N checksum, this would appear to be impossible for the general case, unless... Add another data value, called, checksum normalizer. This data value checksums with the original checksum to give the result zero. Then, when the checksum is also added, the resulting checksum is, in fact, the checksum. Another way of looking at this is to add a value that combines with the added checksum, to be zero, leaving the original checksum intact. > > This might be inordinately hard for a CRC, but a simple checksum would not be an issue, I think. At least, this could work in software, where data can be included in an image file as itself. In a device like an FPGA, it might not be included in the bit stream file so directly... but that might depend on where in the device it is inserted. Memory might have data that is stored as itself. I'll need to look into that. >
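The "checksum normalizer" idea in the quoted post can be sketched in C for a simple modulo 2^16 sum. This is only an illustration under assumptions that are not from the thread: the image is treated as an array of 16-bit words, two words are reserved in it, and the names `sum16` and `embed_checksum` are made up for the example. The stored checksum is the sum taken with both reserved words zeroed, and the normalizer is its negation, so the two cancel and the sum over the finished image equals the stored value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all 16-bit words, modulo 2^16 (unsigned wrap-around). */
static uint16_t sum16(const uint16_t *words, size_t count)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum = (uint16_t)(sum + words[i]);
    }
    return sum;
}

/*
 * Hypothetical helper: fill two reserved words so that the checksum of the
 * whole image equals the value stored at checksum_index.
 */
static uint16_t embed_checksum(uint16_t *image, size_t count,
                               size_t checksum_index, size_t normalizer_index)
{
    assert(checksum_index < count && normalizer_index < count);
    assert(checksum_index != normalizer_index);

    /* Checksum of the image with both reserved words zeroed. */
    image[checksum_index] = 0;
    image[normalizer_index] = 0;
    uint16_t x = sum16(image, count);

    image[checksum_index] = x;                      /* value the application reports */
    image[normalizer_index] = (uint16_t)(0u - x);   /* cancels the stored checksum */
    return x;
}
```

After `embed_checksum()` runs, `sum16()` over the complete image returns the same value that is stored in the checksum word, so the running code can report it without any special case in the checksum routine. The price is one extra reserved word, and the approach relies on the checksum being a plain sum.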
Den 2023-04-20 kl. 22:26, skrev David Brown:
> On 20/04/2023 18:45, Rick C wrote: >> On Thursday, April 20, 2023 at 11:33:28 AM UTC-4, George Neuner >> wrote: >>> On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C >>> <gnuarm.del...@gmail.com> wrote: >>> >>>> This is a bit of the chicken and egg thing. If you want a embed >>>> a checksum in a code module to report the checksum, is there a >>>> way of doing this? It's a bit like being your own grandfather, I >>>> think. >>> Take a look at the old xmodem/ymodem CRC. It was designed such >>> that when the CRC was sent immediately following the data, a >>> receiver computing CRC over the whole incoming packet (data and CRC >>> both) would get a result of zero. >>> >>> But AFAIK it doesn't work with CCITT equation(s) - you have to use >>> xmodem/ymodem. >>>> I'm not thinking anything too fancy, like a CRC, but rather a >>>> simple modulo N addition, maybe N being 2^16. >>> Sorry, I don't know a way to do it with a modular checksum. YMMV, >>> but I think 16-bit CRC is pretty simple. >>> >>> George >> >> CRC is not complicated, but I would not know how to calculate an >> inserted value to force the resulting CRC to zero. How do you do >> that? > > You "insert" the value at the end. Anything else is insane.
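The XMODEM property George describes, with the CRC appended at the end as David suggests, can be sketched as follows. The parameters are the commonly published CRC-16/XMODEM ones (polynomial 0x1021, initial value 0, no bit reflection, no final XOR). The helper name `append_crc16` is invented for this example, and the bitwise loop is chosen for brevity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection, no final XOR. */
static uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical helper: append the CRC, high byte first.
 * buf must have room for len + 2 bytes. */
static void append_crc16(uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    uint16_t crc = crc16_xmodem(buf, len);
    buf[len] = (uint8_t)(crc >> 8);
    buf[len + 1] = (uint8_t)(crc & 0xFFu);
}
```

Running `crc16_xmodem()` over the data plus the two appended bytes gives zero for an undamaged packet, so the receiver does not need to know where the data ends and the CRC begins. The byte order of the appended CRC matters: the high byte has to come first for this non-reflected variant.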
In all projects I have been involved with, the application binary starts with a header looking like this:

    MAGIC WORD 1
    CRC
    Entry Point
    Size
    other info...
    MAGIC WORD 2
    APPLICATION_START
    ...
    APPLICATION_END (aligned with flash sector)

The bootloader first checks the two magic words. It then computes the CRC on the header (from Entry Point) to APPLICATION_END.

I ported the IAR ielftool (open source) to Linux at https://github.com/emagii/ielftool
This can insert the CRC in the ELF file, but needs tweaks to work with an ELF file generated by the GNU tools.

/Ulf
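A bootloader-side check along the lines of this header could look roughly like the sketch below. Everything concrete in it is an assumption made for illustration, not something stated in the post: the 32-bit field widths, the magic values, the meaning of the size field, the omission of the "other info" fields, and the use of CRC-32 as the checksum. The names `app_header_t` and `app_image_is_valid` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define APP_MAGIC_1 0x41505031u /* hypothetical value */
#define APP_MAGIC_2 0x41505032u /* hypothetical value */

/* Assumed layout: all fields 32 bits wide, "other info" left out. */
typedef struct {
    uint32_t magic1;
    uint32_t crc;         /* covers entry_point .. end of application */
    uint32_t entry_point;
    uint32_t size;        /* assumed: application bytes following the header */
    uint32_t magic2;
} app_header_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), a stand-in for
 * whatever CRC the project really uses. */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Check magic words first, then the CRC from the entry_point field
 * to the end of the application. */
static bool app_image_is_valid(const uint8_t *image, size_t image_len)
{
    assert(image != NULL);
    app_header_t hdr;

    if (image_len < sizeof hdr)
        return false;
    memcpy(&hdr, image, sizeof hdr);

    if (hdr.magic1 != APP_MAGIC_1 || hdr.magic2 != APP_MAGIC_2)
        return false;
    if (hdr.size > image_len - sizeof hdr)
        return false;

    size_t first = offsetof(app_header_t, entry_point);
    size_t last = sizeof hdr + hdr.size;
    return crc32_bitwise(image + first, last - first) == hdr.crc;
}
```

Because the CRC field sits before the first checked byte, writing the CRC into the header after the build does not change the value being checked, which sidesteps the chicken-and-egg problem from the start of the thread.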
> > CRC's are quite good hashes, for suitable sized data. There are perhaps > some special cases, but basically you'd be doing trial-and-error > searches to find an inserted value that gives you a zero CRC overall. > 2^16 is not an overwhelming search space, but the whole idea is pointless. > >> >> Even so, I'm not trying to validate the file. I'm trying to come up >> with a substitute for a time stamp or version number. I don't want >> to have to rely on my consistency in handling the version number >> correctly. This would be a backup in case there was more than one >> version released, even only within the "lab", that were different. A >> checksum that could be read by the controlling software would do the >> job. > > A CRC is fine for that. > >> >> I have run into this before, where the version number was not a 100% >> indication of the uniqueness of an executable. The checksum would be >> a second indicator. >> >> I should mention that I'm not looking for a solution that relies on >> any specific details of the tools. >> > > A table-based CRC is easy, runs quickly, and can be quickly ported to > pretty much any language (the C and Python code, for example, is almost > the same).
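For reference, a table-based CRC of the kind David mentions might look like the sketch below. It assumes the widely used CRC-32 parameters (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), which nobody in the thread specified. The function names are made up for the example, and the lazily built table is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_table[256];   /* 256 entries x 4 bytes = 1 KB */
static int crc32_table_ready = 0;

/* Build the lookup table once: entry n is the CRC of the single byte n. */
static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc32_table[n] = c;
    }
    crc32_table_ready = 1;
}

/* Start with crc = 0; pass the previous result back in to continue
 * over further chunks of data. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (!crc32_table_ready)
        crc32_init_table();

    crc = ~crc;
    for (size_t i = 0; i < len; i++)
        crc = crc32_table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}
```

With these parameters, the ASCII string "123456789" gives 0xCBF43926, the usual check value for this CRC-32 variant, which is a handy test when porting the routine to another language or tool.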
Den 2023-04-22 kl. 05:14, skrev Rick C:
> On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote: >> On 21/04/2023 14:12, Rick C wrote: >>> >>> This is simply to be able to say this version is unique, regardless >>> of what the version number says. Version numbers are set manually >>> and not always done correctly. I'm looking for something as a backup >>> so that if the checksums are different, I can be sure the versions >>> are not the same. >>> >>> The less work involved, the better. >>> >> Run a simple 32-bit crc over the image. The result is a hash of the >> image. Any change in the image will show up as a change in the crc. > > No one is trying to detect changes in the image. I'm trying to label the image in a way that can be read in operation. I'm using the checksum simply because that is easy to generate. I've had problems with version numbering in the past. It will be used, but I want it supplemented with a number that will change every time the design changes, at least with a high probability, such as 1 in 64k. >
Another thing I added (and was later removed) was a timestamp directive. A 64 bit integer with the number of seconds since 1970-01-01 00:00. /Ulf
On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
> Den 2023-04-20 kl. 04:06, skrev Rick C: > > This is a bit of the chicken and egg thing. If you want a embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think. > > > The proper way to do this is to have a directive in the linker. > This reserves space for the CRC and defines the area where the CRC is > calculated.
That assumes there is a linker. How does the application access this information?
> I am not aware of any linker which support this. > > Two months ago, I added the DIGEST directive to binutils aka the GNU > linker. It was committed, but then people realized that I had not signed > an agreement with Free Software Foundation. > Since part of the code I pushed was from a third party which released > their code under MIT, the licensing has not been resolved yet > but the patch is in binutils git, but reverted. > > You would write (IIRC): > DIGEST "CRC64-ECMA", (from, to) > and the linker would reserve 8 bytes which is filled with the CRC in the > final link stage.
You are making a lot of assumptions about the tools. I'm pretty sure they don't apply to my case. I'm not at all clear how this is workable, anyway. Adding the checksum to the file, changes the checksum, which is where this conversation started... unless I'm missing something significant. -- Rick C. +++ Get 1,000 miles of free Supercharging +++ Tesla referral code - https://ts.la/richard11209
On 2023-04-27 20:09, Rick C wrote:
> On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote: >> Den 2023-04-20 kl. 04:06, skrev Rick C: >>> This is a bit of the chicken and egg thing. If you want a embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think. >>> >> The proper way to do this is to have a directive in the linker. >> This reserves space for the CRC and defines the area where the CRC is >> calculated. > > That assumes there is a linker.
Almost all toolchains have a linker.
> How does the application access this information?
In Ulf's suggestion, it seems the DIGEST directive emits 8 bytes of checksum at the current point (usually the linker "." symbol). I assume one can give that point in the image a linkage symbol, perhaps like

    _checksum  DIGEST "CRC64-ECMA", (from, to)

or like

    _checksum  EQU.   .
               DIGEST "CRC64-ECMA", (from, to)

(This is schematic linker code, not necessarily proper syntax.)

One can then from the application code access the "checksum" location as an externally defined object, say:

    extern uint8[8] checksum;

The linker will connect that C identifier to the actual address of the DIGEST checksum. Here I assumed that the C compiler mangles C identifiers into linkage symbols by prefixing an underscore; YMMV.
> You are making a lot of assumptions about the tools. I'm pretty sure > they don't apply to my case. I'm not at all clear how this is > workable, anyway. Adding the checksum to the file, changes the > checksum, which is where this conversation started... unless I'm > missing something significant.
But you have insisted that your "checksum" is for the purpose of identifying the version of the program, not for checking the integrity of the memory image. If so, that checksum does not have to be the checksum of the whole memory image, as long as it is the checksum of the part of the image that contains the actual code and constant data, and so will change according to changes in those parts of the image.
Um, just noting some typos in my last, with apologies:

On 2023-04-27 21:29, Niklas Holsti wrote:

>   _checksum  EQU.   .

should be

    _checksum  EQU   .

(Thunderbird inserted an extra period out of "friendliness"...)

>    extern uint8[8] checksum;

should be (my C is rusty):

    extern uint8 checksum[8];
Den 2023-04-27 kl. 19:09, skrev Rick C:
> On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote: >> Den 2023-04-20 kl. 04:06, skrev Rick C: >>> This is a bit of the chicken and egg thing. If you want a embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think. >>> >> The proper way to do this is to have a directive in the linker. >> This reserves space for the CRC and defines the area where the CRC is >> calculated. > > That assumes there is a linker. How does the application access this information? > >
Linker command file:

    public CRC64; start, stop
    HEADER = .;
    QUAD(MAGIC);
    CRC64 = .;
    DIGEST "CRC64-ECMA", (start, stop)
    start = .;
    # Your data to be protected
    ...
    stop = .;

C source code:

    #include <stdint.h>

    extern uint64_t CRC64;   /* filled in by the linker */
    extern char start[];     /* linker symbols: only their addresses are meaningful */
    extern char stop[];

    uint64_t crc64 = calc_crc64_ecma(start, stop);
    if (crc64 == CRC64) {
        /* everything is OK */
    }
>> I am not aware of any linker which support this. >> >> Two months ago, I added the DIGEST directive to binutils aka the GNU >> linker. It was committed, but then people realized that I had not signed >> an agreement with Free Software Foundation. >> Since part of the code I pushed was from a third party which released >> their code under MIT, the licensing has not been resolved yet >> but the patch is in binutils git, but reverted. >> >> You would write (IIRC): >> DIGEST "CRC64-ECMA", (from, to) >> and the linker would reserve 8 bytes which is filled with the CRC in the >> final link stage. > > You are making a lot of assumptions about the tools. I'm pretty sure they don't apply to my case. I'm not at all clear how this is workable, anyway. Adding the checksum to the file, changes the checksum, which is where this conversation started... unless I'm missing something significant. >
I am assuming that no tool supports this off the shelf, but the patches are inside binutils, though reverted. /Ulf
Den 2023-04-27 kl. 20:29, skrev Niklas Holsti:
> On 2023-04-27 20:09, Rick C wrote: >> On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote: >>> Den 2023-04-20 kl. 04:06, skrev Rick C: >>>> This is a bit of the chicken and egg thing. If you want a embed a >>>> checksum in a code module to report the checksum, is there a way of >>>> doing this? It's a bit like being your own grandfather, I think. >>>> >>> The proper way to do this is to have a directive in the linker. >>> This reserves space for the CRC and defines the area where the CRC is >>> calculated. >> >> That assumes there is a linker. > > > Almost all toolchains have a linker. > > >> How does the application access this information? > > > In Ulf's suggestion, it seems the DIGEST directive emits 8 bytes of > checksum at the current point (usually the linker "." symbol). I assume > one can give that point in the image a linkage symbol, perhaps like > > _checksum DIGEST "CRC64-ECMA", (from, to) > > or like > > _checksum EQU. . > DIGEST "CRC64-ECMA", (from, to) > > > (This is schematic linker code, not necessarily proper syntax.) > > One can then from the application code access the "checksum" location as > an externally defined object, say: > > extern uint8[8] checksum; > > The linker will connect that C identifier to the actual address of the > DIGEST checksum. Here I assumed that the C compiler mangles C > identifiers into linkage symbols by prefixing an underscore; YMMV. >
Yes, that is more or less it.
> >> You are making a lot of assumptions about the tools. I'm pretty sure >> they don't apply to my case. I'm not at all clear how this is >> workable, anyway. Adding the checksum to the file, changes the >> checksum, which is where this conversation started... unless I'm >> missing something significant.
No, you reserve room for the checksum, but that needs to be outside the checked area. The address of the checksum needs to be known to the application. Also the limits of the checked area. That is why the application has a header in front in my projects. The application is started by the bootloader, which checks a number of things before the application is started. The application can read the header as well to allow checking the code area at runtime.
> > > But you have insisted that your "checksum" is for the purpose of > identifying the version of the program, not for checking the integrity > of the memory image. If so, that checksum does not have to be the > checksum of the whole memory image, as long as it is the checksum of the > part of the image that contains the actual code and constant data, and > so will change according to changes in those parts of the image. >
/Ulf