On 2023-04-27 23:36, Ulf Samuelsson wrote:
> On 2023-04-27 at 19:09, Rick C wrote:
>> On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
>>> On 2023-04-20 at 04:06, Rick C wrote:
>>>> This is a bit of the chicken and egg thing. If you want to embed a
>>>> checksum in a code module to report the checksum, is there a way of
>>>> doing this? It's a bit like being your own grandfather, I think.
>>>>
>>> The proper way to do this is to have a directive in the linker.
>>> This reserves space for the CRC and defines the area where the CRC is
>>> calculated.
>>
>> That assumes there is a linker. How does the application access this
>> information?
>>
>>
> Linker command file
> public CRC64; start, stop
> HEADER = .;
> QUAD(MAGIC);
> CRC64 = .;
> DIGEST "CRC64-ECMA", (start, stop)
> start = .;
> # Your data to be protected
> ...
> stop = .;
>
> C source code.
>
> extern uint64_t CRC64;
> extern char* start;
> extern char* stop;
>
> uint64_t crc;
>
> crc64 = calc_crc64_ecma(start, stop);
> if (crc64 == CRC64) {
>     /* everything is OK */
> }

I'm nit-picking, but that C code does not look right to me. The extern
declarations for "start" and "stop" claim them to be names of memory
locations that contain addresses, but the linker file just places them
at the starting and one-past-end locations of the block to be protected.
So the "start" variable contains the first bytes of the "data to be
protected", and the contents of the "stop" variable are not defined
because it is placed after the "data to be protected", where no code or
data is loaded (it seems).

It seems to me that the call to calc_crc64_ecma should get the addresses
of "start" and "stop" as arguments (&start, &stop), instead of their
values. But perhaps calc_crc64_ecma is not a function, but a macro that
can itself take the addresses of its parameters.
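To make the point about linker symbols concrete: a symbol defined in a linker script is an address, not a variable that holds one, so a common idiom is to declare it as an array of unknown size and let the name decay to a pointer. The sketch below assumes that idiom together with a plain bitwise CRC-64 using the ECMA-182 polynomial (zero initial value, no reflection, no final XOR). Whether the proposed DIGEST "CRC64-ECMA" directive uses exactly these parameters and byte order is an assumption, and the calc_crc64_ecma signature is only inferred from the call in the quoted code.

```c
#include <stdint.h>

/*
 * In a real build the range would come from the linker script, declared as
 * arrays so that the names themselves evaluate to the addresses:
 *
 *     extern const uint8_t  start[];
 *     extern const uint8_t  stop[];
 *     extern const uint64_t CRC64;   (the 8 bytes the linker filled in)
 *
 *     int ok = (calc_crc64_ecma(start, stop) == CRC64);
 *
 * They stay in a comment here so the sketch compiles without a linker script.
 */

#define CRC64_ECMA_POLY 0x42F0E1EBA9EA3693ULL

/* Bitwise, MSB-first CRC-64 over the half-open range [start, stop). */
uint64_t calc_crc64_ecma(const uint8_t *start, const uint8_t *stop)
{
    uint64_t crc = 0;

    for (const uint8_t *p = start; p != stop; ++p) {
        crc ^= (uint64_t)*p << 56;          /* feed the next byte in at the top */
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000000000000000ULL)
                crc = (crc << 1) ^ CRC64_ECMA_POLY;
            else
                crc <<= 1;
        }
    }
    return crc;
}
```

Passing &start and &stop with the original "extern char*" declarations would reach the same addresses, but the array form avoids the misleading pointer type altogether.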
Embedding a Checksum in an Image File
Started by ●April 19, 2023
Reply by ●April 27, 2023
Reply by ●April 28, 2023
On 27/04/2023 18:36, Ulf Samuelsson wrote:
> On 2023-04-20 at 22:26, David Brown wrote:
>> On 20/04/2023 18:45, Rick C wrote:
>>> On Thursday, April 20, 2023 at 11:33:28 AM UTC-4, George Neuner
>>> wrote:
>>>> On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C
>>>> <gnuarm.del...@gmail.com> wrote:
>>>>
>>>>> This is a bit of the chicken and egg thing. If you want to embed
>>>>> a checksum in a code module to report the checksum, is there a
>>>>> way of doing this? It's a bit like being your own grandfather, I
>>>>> think.
>>>> Take a look at the old xmodem/ymodem CRC. It was designed such
>>>> that when the CRC was sent immediately following the data, a
>>>> receiver computing CRC over the whole incoming packet (data and CRC
>>>> both) would get a result of zero.
>>>>
>>>> But AFAIK it doesn't work with CCITT equation(s) - you have to use
>>>> xmodem/ymodem.
>>>>> I'm not thinking anything too fancy, like a CRC, but rather a
>>>>> simple modulo N addition, maybe N being 2^16.
>>>> Sorry, I don't know a way to do it with a modular checksum. YMMV,
>>>> but I think 16-bit CRC is pretty simple.
>>>>
>>>> George
>>>
>>> CRC is not complicated, but I would not know how to calculate an
>>> inserted value to force the resulting CRC to zero. How do you do
>>> that?
>>
>> You "insert" the value at the end. Anything else is insane.
>
> In all projects I have been involved with, the application binary starts
> with a header looking like this.
>
>
> MAGIC WORD 1
> CRC
> Entry Point
> Size
> other info...
> MAGIC WORD 2
> APPLICATION_START
> ...
> APPLICATION_END (aligned with flash sector)
>
>
> The bootloader first checks the two magic words.
> It then computes CRC on the header (from Entry Point) to APPLICATION_END
>
> I ported the IAR ielftool (open source) to Linux at
> https://github.com/emagii/ielftool
>
> This can insert the CRC in the ELF file, but needs tweaks to work
> with an ELF file generated by the GNU tools.
>
> /Ulf

That can work for some microcontrollers, but is unsuitable for others -
it depends on how the flash is organised. For an msp430, for example,
it would be fine, as the interrupt vectors (including the reset vector)
are at the end of flash. But for most ARM Cortex M devices, it would
not be suitable - they expect the reset vector and initial stack pointer
at the start of the flash image. Some devices have a boot ROM, and then
you have to match their specifics for the header - or you can have your
own boot program, and make the header however you like.

I am absolutely a fan of having some kind of header like this (and
sometimes even a human-readable copyright notice, identifier and version
information). And having it as near the beginning as possible is good.
But for many microcontrollers, having it at the start is not feasible.
And if you can't put the CRC at the start like you do, you have to put
it at the end of the image.

I've never really thought about trying to inject a CRC into an elf file.
I use elfs (or should that be "elves" ?) for debugging, not flash
programming. And usually the main concern for having a CRC at the end
of the image is when you have an online update of some kind, to check
that nothing has gone wrong during the transfer or in-field update.
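The "CRC at the end, result of zero" idea from the quoted discussion can be sketched as follows. This is a minimal illustration, assuming the CRC-16/XMODEM parameters (polynomial 0x1021, zero initial value, MSB first, no final XOR) and a CRC stored high byte first in the last two bytes of the image; the function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/XMODEM: polynomial 0x1021, initial value 0, MSB first, no final XOR. */
uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; ++i) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Post-link step: store the CRC of image[0 .. len-3] in the last two bytes,
 * high byte first. Returns 0 on success, -1 if the image is too short. */
int append_crc16(uint8_t *image, size_t len)
{
    if (len < 2)
        return -1;

    uint16_t crc = crc16_xmodem(image, len - 2);
    image[len - 2] = (uint8_t)(crc >> 8);
    image[len - 1] = (uint8_t)(crc & 0xFFu);
    return 0;
}

/* Receiver side: an intact image, CRC included, runs to a CRC of zero. */
int image_is_intact(const uint8_t *image, size_t len)
{
    return len >= 2 && crc16_xmodem(image, len) == 0;
}
```

The zero result depends on the zero initial value, the absence of a final XOR, and the byte order of the stored CRC matching the bit order of the algorithm; other CRC variants leave a fixed non-zero residue instead.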
Reply by ●April 28, 2023
On 27/04/2023 18:42, Ulf Samuelsson wrote:
> On 2023-04-22 at 05:14, Rick C wrote:
>> On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
>>> On 21/04/2023 14:12, Rick C wrote:
>>>>
>>>> This is simply to be able to say this version is unique, regardless
>>>> of what the version number says. Version numbers are set manually
>>>> and not always done correctly. I'm looking for something as a backup
>>>> so that if the checksums are different, I can be sure the versions
>>>> are not the same.
>>>>
>>>> The less work involved, the better.
>>>>
>>> Run a simple 32-bit crc over the image. The result is a hash of the
>>> image. Any change in the image will show up as a change in the crc.
>>
>> No one is trying to detect changes in the image. I'm trying to label
>> the image in a way that can be read in operation. I'm using the
>> checksum simply because that is easy to generate. I've had problems
>> with version numbering in the past. It will be used, but I want it
>> supplemented with a number that will change every time the design
>> changes, at least with a high probability, such as 1 in 64k.
>>
>
> Another thing I added (and was later removed) was a timestamp directive.
> A 64 bit integer with the number of seconds since 1970-01-01 00:00.
>

Timestamping a build in some way (as part of the "make", using __DATE__
or __TIME__ in source code, or some feature of a revision control
system) is very tempting, and can be helpful for tracking exactly what
code you have on the system.

However, IMHO having reproducible builds is much more valuable. I am
not happy with a project build until I am getting identical binaries
built on multiple hosts (Windows and Linux). That's how you can be
absolutely sure of what code went into a particular binary, even years
or decades later.

A compromise that can work is to distinguish development builds and
production builds, and have timestamping in development builds. That
also reduces the rate at which your minor version number or build number
goes up, and avoids endless changes to your "version.h" include file.
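The development/production compromise could look something like the sketch below. DEVELOPMENT_BUILD, BUILD_STAMP and build_stamp() are hypothetical names chosen for the example; the macro would typically be set from the build system (for instance with -DDEVELOPMENT_BUILD), and only the standard __DATE__ and __TIME__ macros are relied on.

```c
/* DEVELOPMENT_BUILD is a hypothetical switch set by the build system.
 * Development builds carry a compile-time stamp; production builds carry a
 * fixed string, so identical sources give identical binaries. */
#ifdef DEVELOPMENT_BUILD
#define BUILD_STAMP (__DATE__ " " __TIME__)
#else
#define BUILD_STAMP "production"
#endif

/* Returns the stamp compiled into this translation unit. */
const char *build_stamp(void)
{
    return BUILD_STAMP;
}
```

The stamp reflects when this one source file was compiled, so it only changes when that file is rebuilt.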
Reply by ●April 28, 2023
On 27/04/2023 18:26, Ulf Samuelsson wrote:
> On 2023-04-20 at 04:06, Rick C wrote:
>> This is a bit of the chicken and egg thing. If you want to embed a
>> checksum in a code module to report the checksum, is there a way of
>> doing this? It's a bit like being your own grandfather, I think.
>>
> The proper way to do this is to have a directive in the linker.
> This reserves space for the CRC and defines the area where the CRC is
> calculated.
> I am not aware of any linker which supports this.
>
> Two months ago, I added the DIGEST directive to binutils aka the GNU
> linker. It was committed, but then people realized that I had not signed
> an agreement with Free Software Foundation.
> Since part of the code I pushed was from a third party which released
> their code under MIT, the licensing has not been resolved yet
> but the patch is in binutils git, but reverted.
>
> You would write (IIRC):
> DIGEST "CRC64-ECMA", (from, to)
> and the linker would reserve 8 bytes which is filled with the CRC in the
> final link stage.
>
> /Ulf
>

I like that. Thanks for doing that work.

Is there also a way to get the length of the final link, and insert it
near the beginning of the image? I suppose that would be another kind
of DIGEST where the algorithm is simply (to - from). (I assume that "to"
and "from" may be linker symbols.)
Reply by ●April 28, 2023
On 27/04/2023 20:29, Niklas Holsti wrote:
> On 2023-04-27 20:09, Rick C wrote:
>> On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
>>> On 2023-04-20 at 04:06, Rick C wrote:
>>>> This is a bit of the chicken and egg thing. If you want to embed a
>>>> checksum in a code module to report the checksum, is there a way of
>>>> doing this? It's a bit like being your own grandfather, I think.
>>>>
>>> The proper way to do this is to have a directive in the linker.
>>> This reserves space for the CRC and defines the area where the CRC is
>>> calculated.
>>
>> That assumes there is a linker.
>
>
> Almost all toolchains have a linker.
>

It is possible that Rick is using Forth, rather than C (or other
languages traditionally compiled in a similar manner, such as C++ and
Ada). There are also some commercial C toolchains for brain-dead 8-bit
CISC devices that are monolithic and offer very little control over the
linking.

Ulf is correct that the ideal place to handle this is part of the
linking process. I do it with a post-link Python script run during the
build, because the linkers I use can't handle this at the moment. But
if Ulf's patch works its way into binutils then I'll be able to do it
directly during linking, which is neater. (I will still have post-link
scripts to handle things like renaming image files according to version,
making zips for sending to customers, etc. - linkers can't do
/everything/ !)
Reply by ●April 28, 2023
On 27/04/2023 22:44, Ulf Samuelsson wrote:
> On 2023-04-27 at 20:29, Niklas Holsti wrote:
>> On 2023-04-27 20:09, Rick C wrote:
>>
>>> You are making a lot of assumptions about the tools. I'm pretty sure
>>> they don't apply to my case. I'm not at all clear how this is
>>> workable, anyway. Adding the checksum to the file, changes the
>>> checksum, which is where this conversation started... unless I'm
>>> missing something significant.
> No, you reserve room for the checksum, but that needs to be outside
> the checked area.
> The address of the checksum needs to be known to the application.

The address here could have a symbol, and then be declared "extern" in
the C code - it would not have to be a known numerical address. But if
the image is checked or started from another program (such as a boot
program), you need an absolute address somewhere to chain this all
together.

> Also the limits of the checked area.
> That is why the application has a header in front in my projects.
> The application is started by the bootloader, which checks
> a number of things before the application is started.
> The application can read the header as well to allow checking
> the code area at runtime.
>

Or for my preferences, the CRC "DIGEST" would be put at the end of the
image, rather than near the start. Then the "from, to" range would
cover the entire image except for the final CRC. But I'd have a similar
directive for the length of the image at a specific area near the start.
Reply by ●April 28, 2023
On 2023-04-28 at 09:12, David Brown wrote:
> On 27/04/2023 18:36, Ulf Samuelsson wrote:
>> On 2023-04-20 at 22:26, David Brown wrote:
>>> On 20/04/2023 18:45, Rick C wrote:
>>>> On Thursday, April 20, 2023 at 11:33:28 AM UTC-4, George Neuner
>>>> wrote:
>>>>> On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C
>>>>> <gnuarm.del...@gmail.com> wrote:
>>>>>
>>>>>> This is a bit of the chicken and egg thing. If you want to embed
>>>>>> a checksum in a code module to report the checksum, is there a
>>>>>> way of doing this? It's a bit like being your own grandfather, I
>>>>>> think.
>>>>> Take a look at the old xmodem/ymodem CRC. It was designed such
>>>>> that when the CRC was sent immediately following the data, a
>>>>> receiver computing CRC over the whole incoming packet (data and CRC
>>>>> both) would get a result of zero.
>>>>>
>>>>> But AFAIK it doesn't work with CCITT equation(s) - you have to use
>>>>> xmodem/ymodem.
>>>>>> I'm not thinking anything too fancy, like a CRC, but rather a
>>>>>> simple modulo N addition, maybe N being 2^16.
>>>>> Sorry, I don't know a way to do it with a modular checksum. YMMV,
>>>>> but I think 16-bit CRC is pretty simple.
>>>>>
>>>>> George
>>>>
>>>> CRC is not complicated, but I would not know how to calculate an
>>>> inserted value to force the resulting CRC to zero. How do you do
>>>> that?
>>>
>>> You "insert" the value at the end. Anything else is insane.
>>
>> In all projects I have been involved with, the application binary starts
>> with a header looking like this.
>>
>>
>> MAGIC WORD 1
>> CRC
>> Entry Point
>> Size
>> other info...
>> MAGIC WORD 2
>> APPLICATION_START
>> ...
>> APPLICATION_END (aligned with flash sector)
>>
>>
>> The bootloader first checks the two magic words.
>> It then computes CRC on the header (from Entry Point) to APPLICATION_END
>>
>> I ported the IAR ielftool (open source) to Linux at
>> https://github.com/emagii/ielftool
>>
>> This can insert the CRC in the ELF file, but needs tweaks to work
>> with an ELF file generated by the GNU tools.
>>
>> /Ulf
>
> That can work for some microcontrollers, but is unsuitable for others -
> it depends on how the flash is organised. For an msp430, for example,
> it would be fine, as the interrupt vectors (including the reset vector)
> are at the end of flash. But for most ARM Cortex M devices, it would
> not be suitable - they expect the reset vector and initial stack pointer
> at the start of the flash image. Some devices have a boot ROM, and then
> you have to match their specifics for the header - or you can have your
> own boot program, and make the header however you like.

All projects I am involved with have a custom bootloader.
If there is a problem with the reset vector, then the program will fail
immediately.
The CRC is right after the initial vector table.
The bootloader application contains a copy of the vector table.
The first thing the bootloader does is to check the CRC over the area
that starts right after the CRC field.
Then it compares the vector table with the copy.

The header is only for the application.

>
> I am absolutely a fan of having some kind of header like this (and
> sometimes even a human-readable copyright notice, identifier and version
> information). And having it as near the beginning as possible is good.
> But for many microcontrollers, having it at the start is not feasible.
> And if you can't put the CRC at the start like you do, you have to put
> it at the end of the image.
>
>
> I've never really thought about trying to inject a CRC into an elf file.
> I use elfs (or should that be "elves" ?) for debugging, not flash
> programming. And usually the main concern for having a CRC at the end
> of the image is when you have an online update of some kind, to check
> that nothing has gone wrong during the transfer or in-field update.
>
>

The last bootloader I wrote downloaded using Y-Modem, which has CRC
checking. Since it had more RAM than internal flash, the whole
application was downloaded to RAM first, and then, once everything was
OK, the flash was programmed.
Finally, the header was analyzed and the flash contents checked.

There is absolutely no need to have the CRC at the end, since the CRC
result is stored in a known location.

/Ulf
Reply by ●April 28, 2023
On 2023-04-28 at 09:20, David Brown wrote:
> On 27/04/2023 18:42, Ulf Samuelsson wrote:
>> On 2023-04-22 at 05:14, Rick C wrote:
>>> On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
>>>> On 21/04/2023 14:12, Rick C wrote:
>>>>>
>>>>> This is simply to be able to say this version is unique, regardless
>>>>> of what the version number says. Version numbers are set manually
>>>>> and not always done correctly. I'm looking for something as a backup
>>>>> so that if the checksums are different, I can be sure the versions
>>>>> are not the same.
>>>>>
>>>>> The less work involved, the better.
>>>>>
>>>> Run a simple 32-bit crc over the image. The result is a hash of the
>>>> image. Any change in the image will show up as a change in the crc.
>>>
>>> No one is trying to detect changes in the image. I'm trying to label
>>> the image in a way that can be read in operation. I'm using the
>>> checksum simply because that is easy to generate. I've had problems
>>> with version numbering in the past. It will be used, but I want it
>>> supplemented with a number that will change every time the design
>>> changes, at least with a high probability, such as 1 in 64k.
>>>
>>
>> Another thing I added (and was later removed) was a timestamp directive.
>> A 64 bit integer with the number of seconds since 1970-01-01 00:00.
>>
>
> Timestamping a build in some way (as part of the "make", using __DATE__
> or __TIME__ in source code, or some feature of a revision control
> system) is very tempting, and can be helpful for tracking exactly what
> code you have on the system.
>
> However, IMHO having reproducible builds is much more valuable. I am
> not happy with a project build until I am getting identical binaries
> built on multiple hosts (Windows and Linux). That's how you can be
> absolutely sure of what code went into a particular binary, even years
> or decades later.

With the timestamp located in the header, you can simply compare the
non-header area.
Using __DATE__ or __TIME__ will tell you when that module was compiled,
not when the program was generated.
That is why TIMESTAMP is best generated in the linker.

/Ulf

>
> A compromise that can work is to distinguish development builds and
> production builds, and have timestamping in development builds. That
> also reduces the rate at which your minor version number or build number
> goes up, and avoids endless changes to your "version.h" include file.
>
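Comparing only the non-header area, as suggested for images that carry a timestamp in the header, might be sketched like this. The function name and the idea of passing the header length as a parameter are assumptions for illustration; a real project would take the header size from its own header definition.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if both images have the same length and are byte-for-byte
 * identical outside the first header_len bytes, 0 otherwise. Differences
 * inside the header (such as a build timestamp) are ignored. */
int same_payload(const unsigned char *a, size_t a_len,
                 const unsigned char *b, size_t b_len,
                 size_t header_len)
{
    if (a_len != b_len || a_len < header_len)
        return 0;

    return memcmp(a + header_len, b + header_len, a_len - header_len) == 0;
}
```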
Reply by ●April 28, 2023
On 2023-04-28 at 09:38, David Brown wrote:
> On 27/04/2023 22:44, Ulf Samuelsson wrote:
>> On 2023-04-27 at 20:29, Niklas Holsti wrote:
>>> On 2023-04-27 20:09, Rick C wrote:
>>>
>>>> You are making a lot of assumptions about the tools. I'm pretty sure
>>>> they don't apply to my case. I'm not at all clear how this is
>>>> workable, anyway. Adding the checksum to the file, changes the
>>>> checksum, which is where this conversation started... unless I'm
>>>> missing something significant.
>> No, you reserve room for the checksum, but that needs to be outside
>> the checked area.
>> The address of the checksum needs to be known to the application.
>
> The address here could have a symbol, and then declared "extern" in the
> C code - it would not have to be a known numerical address. But if the
> image is checked or started from another program (such as a boot
> program), you need an absolute address somewhere to chain this all
> together.

The header is declared as a struct.

>
>> Also the limits of the checked area.
>> That is why the application has a header in front in my projects.
>> The application is started by the bootloader, which checks
>> a number of things before the application is started.
>> The application can read the header as well to allow checking
>> the code area at runtime.
>>
>
> Or for my preferences, the CRC "DIGEST" would be put at the end of the
> image, rather than near the start. Then the "from, to" range would
> cover the entire image except for the final CRC. But I'd have a similar
> directive for the length of the image at a specific area near the start.
>

I really do not see a benefit of splitting the meta information about
the image to two separate locations.

The bootloader uses the struct for all checks.
It is a much simpler implementation once the tools support it.

You might find it easier to write a tool which adds the CRC at the end,
but that is a different issue.

Occam's Razor!

/Ulf
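A header declared as a struct, with the bootloader using it for all checks, could look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: the field widths, the magic values, the meaning of "size" (total image bytes, header included), the choice of a standard reflected CRC-32, and the function names. The "other info..." fields of the header described earlier in the thread are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define APP_MAGIC_1 0x41505031u   /* hypothetical magic words */
#define APP_MAGIC_2 0x41505032u

struct app_header {
    uint32_t magic1;
    uint32_t crc;          /* CRC-32 of everything from entry_point to the end */
    uint32_t entry_point;
    uint32_t size;         /* assumed: total image size in bytes, header included */
    uint32_t magic2;
};

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bit by bit. */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* CRC over the checked area: from the entry_point field up to image + size.
 * The caller must ensure size >= sizeof(struct app_header). */
uint32_t app_image_crc(const uint8_t *image, size_t size)
{
    size_t from = offsetof(struct app_header, entry_point);
    return crc32_bitwise(image + from, size - from);
}

/* Bootloader-side check: magic words first, then size, then the CRC. */
int app_image_is_valid(const uint8_t *image, size_t available)
{
    struct app_header h;

    if (available < sizeof h)
        return 0;
    memcpy(&h, image, sizeof h);      /* avoids alignment assumptions */

    if (h.magic1 != APP_MAGIC_1 || h.magic2 != APP_MAGIC_2)
        return 0;
    if (h.size < sizeof h || h.size > available)
        return 0;

    return app_image_crc(image, h.size) == h.crc;
}
```

Because the CRC field sits before the checked area, writing the result into the header does not change the value being checked, which is the way out of the chicken-and-egg problem described at the start of the thread.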
Reply by ●April 28, 2023
On 2023-04-28 at 00:10, Niklas Holsti wrote:
> On 2023-04-27 23:36, Ulf Samuelsson wrote:
>> On 2023-04-27 at 19:09, Rick C wrote:
>>> On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
>>>> On 2023-04-20 at 04:06, Rick C wrote:
>>>>> This is a bit of the chicken and egg thing. If you want to embed a
>>>>> checksum in a code module to report the checksum, is there a way of
>>>>> doing this? It's a bit like being your own grandfather, I think.
>>>>>
>>>> The proper way to do this is to have a directive in the linker.
>>>> This reserves space for the CRC and defines the area where the CRC is
>>>> calculated.
>>>
>>> That assumes there is a linker. How does the application access this
>>> information?
>>>
>>>
>> Linker command file
>> public CRC64; start, stop
>> HEADER = .;
>> QUAD(MAGIC);
>> CRC64 = .;
>> DIGEST "CRC64-ECMA", (start, stop)
>> start = .;
>> # Your data to be protected
>> ...
>> stop = .;
>>
>> C source code.
>>
>> extern uint64_t CRC64;
>> extern char* start;
>> extern char* stop;
>>
>> uint64_t crc;
>>
>> crc64 = calc_crc64_ecma(start, stop);
>> if (crc64 == CRC64) {
>>     /* everything is OK */
>> }
>
>
> I'm nit-picking, but that C code does not look right to me. The extern
> declarations for "start" and "stop" claim them to be names of memory
> locations that contain addresses, but the linker file just places them
> at the starting and one-past-end locations of the block to be protected.
> So the "start" variable contains the first bytes of the "data to be
> protected", and the contents of the "stop" variable are not defined
> because it is placed after the "data to be protected", where no code or
> data is loaded (it seems).
>
> It seems to me that the call to calc_crc64_ecma should get the addresses
> of "start" and "stop" as arguments (&start, &stop), instead of their
> values. But perhaps calc_crc64_ecma is not a function, but a macro that
> can itself take the addresses of its parameters.
>

Whatever, I did not put a lot of thought into that, and certainly did
not check it.
The important thing is that you can declare labels in the linker and use
them in the code through extern declarations.

/Ulf







