EmbeddedRelated.com

Patch fixed strings in .hex file

Started by pozz January 16, 2024
On 16/01/2024 13:19, pozz wrote:
> In one project I have many quasi-fixed strings that I'd like to keep in
> non volatile memory (Flash) to avoid losing precious RAM space.
>
> static const char s1[] = "/my/very/long/string/of/01020304";
> static const char s2[] = "/another/string/01020304";
> ...
>
> Substring "01020304" is a serial number that changes during production
> with specific device. It has the same length in bytes (it's a simple hex
> representation of a 32-bits integer).
>
> Of course it's too difficult and slow to rebuild the firmware during
> production passing to the compiler the real serial number. I think a
> better solution is to patch the .hex file generated by the compiler.
>
> I'm wondering how to detect the exact positions (addresses) of serial
> numbers to fix.
>
> The build system is gcc, so I could search for s1 in the elf file. Do
> you know of a tool that returns the address of a symbol in the elf or
> map file?
>
> Could you suggest a better approach?
>
With this command

readelf -s output.elf | grep string_to_patch

the output would be something like this (here for a symbol named lwt_message):

543: 00019c44    58 OBJECT  LOCAL  DEFAULT    1 lwt_message

In order to retrieve only the address:

readelf -s output.elf | grep string_to_patch | sed -e 's/^ *//' | sed -e 's/  */ /g' | cut -d " " -f 2

The first sed removes all the spaces at the beginning of the line, the second sed squeezes runs of spaces into a single space, and the cut command extracts the second field.
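The same lookup can be sketched in Python, which avoids the sed/cut chain and matches the symbol name exactly instead of as a substring. This is only a minimal sketch: the function names are made up for illustration, and it assumes GNU readelf is on the PATH and prints the usual eight-column symbol rows.

```python
import subprocess


def parse_symbol_address(readelf_output: str, symbol: str) -> int:
    """Return the Value column of `symbol` from `readelf -s` text output."""
    for line in readelf_output.splitlines():
        fields = line.split()
        # A symbol row looks like: Num: Value Size Type Bind Vis Ndx Name
        if (len(fields) == 8 and fields[0].endswith(":")
                and fields[0][:-1].isdigit() and fields[7] == symbol):
            return int(fields[1], 16)
    raise KeyError(f"symbol {symbol!r} not found")


def symbol_address(elf_path: str, symbol: str) -> int:
    """Run readelf on `elf_path` and look up `symbol` (hypothetical helper)."""
    out = subprocess.run(
        ["readelf", "-s", "--wide", elf_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_symbol_address(out, symbol)


# The sample row from the post above, parsed without touching any file.
sample = "   543: 00019c44    58 OBJECT  LOCAL  DEFAULT    1 lwt_message"
print(hex(parse_symbol_address(sample, "lwt_message")))
```

In a production script one would call symbol_address("output.elf", "s1") and subtract the flash base address to get the offset into the binary image.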
On 17/01/2024 08:45, pozz wrote:
> On 16/01/2024 19:35, Hans-Bernhard Bröker wrote:
>> On 16.01.2024 at 13:19, pozz wrote:
>>
>>> I'm wondering how to detect the exact positions (addresses) of serial
>>> numbers to fix.
>>
>> You do not.
>>
>> Instead, you set up linker scripts, linker options and/or add
>> __attribute(()) to the variables' definitions to _place_ them at a
>> predetermined, fixed, known-useful location.
>
> Do you mean to choose by yourself the exact address of *each* string?
> And where would you put them, at the beginning, in the middle or at the
> end of the Flash? You need to calculate the address of the next string
> from the address *and length* of the previous string. It seems to me a
> tedious and error-prone job that could be done easily by the linker.
>
How many strings do you need here?

While it is possible to do all this using patching of odd places in your file, using specific locations is often a better choice. Since you haven't already said "Thanks for the advice - I tried it that way, it worked, and I'm happy" in response to any post, I would say that now is the time to take fixed location solutions seriously.

The way I always handle this is to define a struct type of fixed size, containing all the information that might be added post-build. That can be version information, serial numbers, length of the image (very useful if you tag a CRC check on the end of the image), etc., - whatever you want to add. Strings have fixed maximum sizes and space.

Make a dedicated section, and in the source code have a default instance of the type in that section, with default values. (This is especially handy when running from a debugger, as your elf file will not have post-build values.) Empty strings should be all null characters. Remember to declare it "volatile const". Your linker file specifies that this section goes at a specific known fixed address (perhaps just after interrupt vectors, or whatever is appropriate for your microcontroller).

Now your post-build scripts have a simple fixed address to patch the binaries.
>
>> And do yourself one favour: have only _one_ instance of that number in
>> your code. Use concatenation or similar to output it where needed.
>>
>> Then you can use tools like srecord or GNU binutils to stamp your desired
>> number into that fixed location in the hex file. Professional-grade
>> chip flashing tools for production environments can usually do that by
>> themselves, so you don't even have to edit your "official" files.
>>
>> Details will obviously vary by tool chain.
>>
>
> Patching the .hex or .bin file replacing 8 bytes starting from a known
> address is simple. I would write a Python script or would use one of
> srecord[1] tools.
>
> [1] https://srecord.sourceforge.net/
>
Don't bother with hex or srec files. Use binary files - it makes things easier.
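As a rough illustration of why a raw binary is easy to patch: the file offset is simply the target address minus the address of the first byte in the image. The sketch below assumes a hypothetical flash base and a hypothetical fixed address for the serial number; neither value comes from this thread.

```python
FLASH_BASE = 0x0800_0000      # hypothetical: address of the first byte in the .bin
SERIAL_ADDR = 0x0800_0200     # hypothetical: fixed address chosen in the linker script
SERIAL_LEN = 8                # "01020304" is 8 ASCII hex digits


def patch_serial(image: bytes, serial: int) -> bytes:
    """Return a copy of `image` with the 8-digit hex serial stamped in."""
    if not 0 <= serial <= 0xFFFFFFFF:
        raise ValueError("serial must fit in 32 bits")
    text = f"{serial:08X}".encode("ascii")
    offset = SERIAL_ADDR - FLASH_BASE
    if not 0 <= offset <= len(image) - SERIAL_LEN:
        raise ValueError("serial field lies outside the image")
    patched = bytearray(image)
    patched[offset:offset + SERIAL_LEN] = text
    return bytes(patched)


blank = bytes(0x400)          # stand-in for the contents of firmware.bin
out = patch_serial(blank, 0x01020304)
print(out[0x200:0x208])
```

The image length never changes, so anything that depends on the layout of the rest of the file is unaffected by the patch.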
On 17/01/2024 11:27, David Brown wrote:
> On 17/01/2024 08:45, pozz wrote:
>> On 16/01/2024 19:35, Hans-Bernhard Bröker wrote:
>>> On 16.01.2024 at 13:19, pozz wrote:
>>>
>>>> I'm wondering how to detect the exact positions (addresses) of
>>>> serial numbers to fix.
>>>
>>> You do not.
>>>
>>> Instead, you set up linker scripts, linker options and/or add
>>> __attribute(()) to the variables' definitions to _place_ them at a
>>> predetermined, fixed, known-useful location.
>>
>> Do you mean to choose by yourself the exact address of *each* string?
>> And where would you put them, at the beginning, in the middle or at
>> the end of the Flash? You need to calculate the address of the next
>> string from the address *and length* of the previous string. It seems
>> to me a tedious and error-prone job that could be done easily by the
>> linker.
>>
>
> How many strings do you need here?
They are 10 strings.
> While it is possible to do all this using patching of odd places in your
> file, using specific locations is often a better choice. Since you
> haven't already said "Thanks for the advice - I tried it that way, it
> worked, and I'm happy" in response to any post, I would say that now is
> the time to take fixed location solutions seriously.
There are many suggested solutions and I think all of them can be used with success. Just for the sake of curiosity and study, I'm exploring all of them.

Honestly, I don't *like* solutions where you need to choose a fixed location yourself. Why should you do a job that can be done by the linker?
> The way I always handle this is to define a struct type of fixed size,
> containing all the information that might be added post-build. That can
> be version information, serial numbers, length of the image (very useful
> if you tag a CRC check on the end of the image), etc., - whatever you
> want to add. Strings have fixed maximum sizes and space.
>
> Make a dedicated section, and in the source code have a default instance
> of the type in that section, with default values. (This is especially
> handy when running from a debugger, as your elf file will not have
> post-build values.) Empty strings should be all null characters.
> Remember to declare it "volatile const". Your linker file specifies
> that this section goes at a specific known fixed address (perhaps just
> after interrupt vectors, or whatever is appropriate for your
> microcontroller).
>
> Now your post-build scripts have a simple fixed address to patch the
> binaries.
How should the post-build script know the exact address of a certain field in the struct?

volatile const struct post_build_data {
  uint32_t serial_number;
  uint64_t mac_address;
  uint32_t image_size;
  char s1[32];
  char s2[64];
  char s3[13];
} post_build_data __attribute(...);

I know the fixed address of the symbol post_build_data (the only object in my custom section), but now I have to calculate the offset of the field s1 in the struct. This calculation is error-prone.

In my opinion, it's much simpler to use a production script that retrieves, without any error or manual calculation, the address of a certain symbol directly from the elf.

From another post of mine:

readelf -s output.elf | grep string_to_patch | sed -e 's/^ *//' | sed -e 's/  */ /g' | cut -d " " -f 2
>>> And do yourself one favour: have only _one_ instance of that number
>>> in your code. Use concatenation or similar to output it where needed.
>>>
>>> Then you can use tools like srecord or GNU binutils to stamp your
>>> desired number into that fixed location in the hex file.
>>> Professional-grade chip flashing tools for production environments
>>> can usually do that by themselves, so you don't even have to edit
>>> your "official" files.
>>>
>>> Details will obviously vary by tool chain.
>>>
>>
>> Patching the .hex or .bin file replacing 8 bytes starting from a known
>> address is simple. I would write a Python script or would use one of
>> srecord[1] tools.
>>
>> [1] https://srecord.sourceforge.net/
>>
>
> Don't bother with hex or srec files. Use binary files - it makes things
> easier.
Yes, of course, patching a hex file or a binary file isn't the complex task here.
On 17/01/2024 12:54, pozz wrote:
> On 17/01/2024 11:27, David Brown wrote:
>> On 17/01/2024 08:45, pozz wrote:
>>> On 16/01/2024 19:35, Hans-Bernhard Bröker wrote:
>>>> On 16.01.2024 at 13:19, pozz wrote:
>>>>
>>>>> I'm wondering how to detect the exact positions (addresses) of
>>>>> serial numbers to fix.
>>>>
>>>> You do not.
>>>>
>>>> Instead, you set up linker scripts, linker options and/or add
>>>> __attribute(()) to the variables' definitions to _place_ them at a
>>>> predetermined, fixed, known-useful location.
>>>
>>> Do you mean to choose by yourself the exact address of *each* string?
>>> And where would you put them, at the beginning, in the middle or at
>>> the end of the Flash? You need to calculate the address of the next
>>> string from the address *and length* of the previous string. It seems
>>> to me a tedious and error-prone job that could be done easily by the
>>> linker.
>>>
>>
>> How many strings do you need here?
>
> They are 10 strings.
>
I thought you were storing serial numbers? But okay, if you need 10 strings you need 10 strings. The number is just a detail. (But if the number were 200 strings for supporting different languages, you might do things differently.)
>
>> While it is possible to do all this using patching of odd places in
>> your file, using specific locations is often a better choice. Since
>> you haven't already said "Thanks for the advice - I tried it that way,
>> it worked, and I'm happy" in response to any post, I would say that
>> now is the time to take fixed location solutions seriously.
>
> There are many suggested solutions and I think all of them can be used
> with success. Just for the sake of curiosity and study, I'm exploring all
> of them.
>
Fair enough.
> Honestly, I don't *like* solutions where you need to choose a fixed
> location yourself. Why should you do a job that can be done by the
> linker?
>
You do so because it makes life much easier. It is the same reason you write your patching script in Python, rather than C.
>
>> The way I always handle this is to define a struct type of fixed size,
>> containing all the information that might be added post-build. That
>> can be version information, serial numbers, length of the image (very
>> useful if you tag a CRC check on the end of the image), etc., -
>> whatever you want to add. Strings have fixed maximum sizes and space.
>
>> Make a dedicated section, and in the source code have a default
>> instance of the type in that section, with default values. (This is
>> especially handy when running from a debugger, as your elf file will
>> not have post-build values.) Empty strings should be all null
>> characters. Remember to declare it "volatile const". Your linker file
>> specifies that this section goes at a specific known fixed address
>> (perhaps just after interrupt vectors, or whatever is appropriate for
>> your microcontroller).
>>
>> Now your post-build scripts have a simple fixed address to patch the
>> binaries.
>
> How should the post-build script know the exact address of a certain
> field in the struct?
You figure it out /once/ - using one of many possible methods. Counting with static asserts to check, or looking at the binary after putting canaries in the sample data. If you think that you might change the struct often, you can use separate variables and put them all in the same section, then look at the map file. In practice you rarely need to do something like that.
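The "canaries in the sample data" idea can be sketched as a small search over the built image: give a field an unmistakable default value in the source, then look for it once in the binary and record the offset. The canary value and the stand-in image below are invented for illustration.

```python
def find_canary(image: bytes, canary: bytes) -> int:
    """Return the offset of `canary`, insisting that it occurs exactly once."""
    first = image.find(canary)
    if first < 0:
        raise ValueError("canary not found")
    if image.find(canary, first + 1) >= 0:
        raise ValueError("canary occurs more than once; pick a more unusual value")
    return first


# Stand-in image: filler, then a struct whose s1 field starts 16 bytes in
# and holds the canary as its default value.
struct_offset = 0x40
image = bytes(struct_offset) + bytes(16) + b"@@S1-CANARY@@" + bytes(32)

s1_offset = find_canary(image, b"@@S1-CANARY@@")
print(s1_offset - struct_offset)   # offset of s1 inside the struct
```

Refusing to continue when the canary appears twice guards against patching the wrong copy if the compiler happens to duplicate the constant.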
>
> volatile const struct post_build_data {
>   uint32_t serial_number;
>   uint64_t mac_address;
>   uint32_t image_size;
>   char s1[32];
>   char s2[64];
>   char s3[13];
> } post_build_data __attribute(...);
>
> I know the fixed address of the symbol post_build_data (the only object
> in my custom section), but now I have to calculate the offset of the
> field s1 in the struct. This calculation is error-prone.
>
Static assertions are your friend here.
> In my opinion, it's much simpler to use a production script that
> retrieves, without any error or manual calculation, the address of a
> certain symbol directly from the elf.
>
I doubt it is simpler. But of course it is possible, and what is simpler for me is not necessarily the same as simpler for you.
> From another post of mine:
>
> readelf -s output.elf | grep string_to_patch | sed -e 's/^ *//' | sed -e 's/  */ /g' | cut -d " " -f 2
>
You are using a Python script to do the patching. Use pyelftools and do this all in the one Python script. That way, future you who has to maintain this system will not build a time machine to go back and strangle the past you that thought this monster made sense. These kinds of pipes can seem elegant, but they are write-only and a maintainer's nightmare.
On 17.01.2024 at 12:54, pozz wrote:
> On 17/01/2024 11:27, David Brown wrote:
>> While it is possible to do all this using patching of odd places in
>> your file, using specific locations is often a better choice. Since
>> you haven't already said "Thanks for the advice - I tried it that way,
>> it worked, and I'm happy" in response to any post, I would say that
>> now is the time to take fixed location solutions seriously.
>
> There are many suggested solutions and I think all of them can be used
> with success. Just for the sake of curiosity and study, I'm exploring all
> of them.
>
> Honestly, I don't *like* solutions where you need to choose a fixed
> location yourself. Why should you do a job that can be done by the
> linker?
It's not you vs. the linker. You co-operate. You need to tell the linker about your chip anyway ("code is from 0x1000 to 0xc000, data is from 0xc000 to 0xd000"). So you can as well tell it "version stamp is from 0xcc00 to 0xd000, data only before 0xcc00".

If you have your identification information in a fixed place, you can, for example, more easily analyze field returns. It's easy if your field service has to change something, and it's easy to do software updates that preserve the identification information. You don't need to figure out which software build is running on the chip and what the address of the structure happens to be in that one.
>> Now your post-build scripts have a simple fixed address to patch the
>> binaries.
>
> How should the post-build script know the exact address of a certain
> field in the struct?
By defining the struct in a compatible way. For example....
> volatile const struct post_build_data {
>   uint32_t serial_number;
>   uint64_t mac_address;
...this is a bad idea, because in most (but probably not all) chips, uint64_t after uint32_t means there's 32 bits of padding, so if you need serial-before-mac, you should at least make the padding explicit. There also might be endian problems.

Using only char/uint8_t fields gives you a very high chance of identical structure layout everywhere (`uint8_t mac_address[8]`).

Stefan
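On the script side, a byte-only layout like the one suggested here can be mirrored with Python's struct module. The "<" prefix disables native alignment, so the format string is the whole layout, and the byte order of the serial number is chosen explicitly rather than inherited from the host. The field sizes and the big-endian choice below are assumptions for illustration, not the layout from this thread.

```python
import struct

# Hypothetical byte-only layout: 4-byte serial, 8-byte MAC field, 32-byte string.
# "<" turns off native alignment, so this format has no hidden padding.
LAYOUT = struct.Struct("<4s8s32s")


def build_record(serial: int, mac: bytes, s1: bytes) -> bytes:
    """Pack the fields; the serial's byte order is explicit (big-endian here)."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte MAC address")
    if len(s1) > 31:
        raise ValueError("s1 must leave room for a terminating NUL")
    # The 6-byte MAC is padded to the 8-byte field; "32s" NUL-pads short strings.
    return LAYOUT.pack(serial.to_bytes(4, "big"), mac + b"\x00\x00", s1)


record = build_record(0x01020304, bytes.fromhex("001122334455"), b"/my/string")
print(LAYOUT.size, record[:4].hex(), record[4:12].hex())
```

The firmware would then read the serial byte by byte in the same order, which sidesteps the endianness question entirely.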
On 1/16/2024 5:19 AM, pozz wrote:
> In one project I have many quasi-fixed strings that I'd like to keep in non
> volatile memory (Flash) to avoid losing precious RAM space.
>
> static const char s1[] = "/my/very/long/string/of/01020304";
> static const char s2[] = "/another/string/01020304";
> ...
>
> Substring "01020304" is a serial number that changes during production with
> specific device. It has the same length in bytes (it's a simple hex
> representation of a 32-bits integer).
>
> Of course it's too difficult and slow to rebuild the firmware during production
> passing to the compiler the real serial number. I think a better solution is to
> patch the .hex file generated by the compiler.
>
> I'm wondering how to detect the exact positions (addresses) of serial numbers
> to fix.
Don't "detect" (i.e., "find"); rather, *place* it/them in a specific location that your code already knows about. How else would you force vector tables to reside at specific locations, jump tables, etc.? You will also code this "hole" into any checksum routine that your code executes at POST and cover it with a check of its own (that you will have to ensure is satisfied by whatever tool you use to "patch" the binary image).
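A minimal sketch of the "hole" idea from the patching tool's point of view: the image checksum skips the patchable region, and the region carries a check of its own that the tool fills in. The offsets, the sizes and the use of CRC-32 are assumptions made for this example; the firmware's POST routine would have to mirror whatever scheme is actually chosen.

```python
import zlib

HOLE_START = 0x200   # hypothetical offset of the patchable region in the image
HOLE_LEN = 0x40      # hypothetical size; its last 4 bytes hold the region's own CRC


def image_crc(image: bytes) -> int:
    """CRC-32 of everything except the hole, so patching never changes it."""
    crc = zlib.crc32(image[:HOLE_START])
    return zlib.crc32(image[HOLE_START + HOLE_LEN:], crc)


def seal_hole(image: bytes, payload: bytes) -> bytes:
    """Write `payload` into the hole and append a CRC-32 over the hole's data."""
    data_len = HOLE_LEN - 4
    if len(payload) > data_len:
        raise ValueError("payload too large for the hole")
    data = payload.ljust(data_len, b"\x00")
    crc = zlib.crc32(data).to_bytes(4, "little")
    return image[:HOLE_START] + data + crc + image[HOLE_START + HOLE_LEN:]


def hole_ok(image: bytes) -> bool:
    """Check the hole's own CRC, as the firmware would at start-up."""
    hole = image[HOLE_START:HOLE_START + HOLE_LEN]
    return zlib.crc32(hole[:-4]) == int.from_bytes(hole[-4:], "little")


base = bytes(range(256)) * 4            # stand-in for a 1 KiB firmware image
patched = seal_hole(base, b"01020304")
print(image_crc(base) == image_crc(patched), hole_ok(patched))
```

With this split, the image checksum can be computed once at build time, and only the small per-device check has to be produced on the production line.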
> The build system is gcc, so I could search for s1 in the elf file. Do you know
> of a tool that returns the address of a symbol in the elf or map file?
>
> Could you suggest a better approach?
When faced with *small* memory regions (e.g., 100 bytes) of a particular resource (e.g., NVRAM), I prefer a tagged format that allows the available space to be dynamically traded among uses -- much like cramming a variable number of parameters in a BOOTP packet.

The downside is that you have to parse the region to extract any specific parameter. But, it eliminates the need to define static structures that might change from instance to instance:

    STRING1, "/my/very/long/string/of/01020304",
    STRING8, "Some Guy's Really long name or address",
    STRING9, "/another/string/01020304",
    etc.

Note that the tag can be designed to act as the delimiter of a field. E.g., if all tags (STRING1, STRING8, etc.) have values outside the valid range of the data being stored (e.g., > 0x7F for ASCII), then the parse can know that a string terminates when any value > 0x7F is encountered. (You know that the first value in the region is a tag.)

A more versatile approach is to have each tag invoke its own parse algorithm:

    CITY, "Cañon City\0",
    ZIPCODE, (long) 81212,
    AREACODE, ...

Note that 'ñ' is outside the ASCII code points but the CITY parse routine could rely on some other mechanism ('\0') to detect the end of that field; similarly, ZIPCODE can expect a 4-byte integer to immediately follow it; AREACODE can expect three BCD digits, etc.

One can insist that tags appear in some fixed order (like TIFF files) so encountering anything that violates that rule where a tag is expected can act as a terminator for the field. Or, you can add a tag that is ENDOFDATA, etc.

This makes the task of replacing any *individual* datum a bit harder as the fields aren't rigidly defined -- just the start of the region and its TOTAL length.

OTOH, it's a win when the design evolves to require yet another parameter without altering the space available to that COLLECTION of parameters (like a network protocol trying to cram more functionality into a single packet).

Protecting the integrity of this section can be accomplished with an error *correcting* code instead of just an error DETECTING checksum. E.g., I often create nonvolatile instances of the state of a pseudo-random number generator (because you don't want a user to be able to "reinitialize" it just by cycling power) with its own ECC -- as *it* is often far more valuable than any other "settings" in the device.
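The simplest variant described in this post, where every tag is a byte above 0x7F and therefore doubles as the delimiter of the preceding ASCII field, might be parsed roughly as follows. The numeric tag values are invented for the example; 0xFF is picked as the end marker on the assumption that unused (erased) flash reads back as 0xFF.

```python
# Hypothetical tag values; anything above 0x7F cannot be confused with ASCII data.
STRING1, STRING8, STRING9, END_OF_DATA = 0x81, 0x88, 0x89, 0xFF


def parse_region(region: bytes) -> dict[int, bytes]:
    """Split a tag-delimited region into {tag: ASCII payload}."""
    fields: dict[int, bytes] = {}
    pos = 0
    while pos < len(region):
        tag = region[pos]
        if tag < 0x80:
            raise ValueError(f"expected a tag at offset {pos}")
        if tag == END_OF_DATA:
            break
        end = pos + 1
        # Payload bytes are plain ASCII; any byte >= 0x80 is the next tag.
        while end < len(region) and region[end] < 0x80:
            end += 1
        fields[tag] = region[pos + 1:end]
        pos = end
    return fields


region = (bytes([STRING1]) + b"/my/very/long/string/of/01020304"
          + bytes([STRING9]) + b"/another/string/01020304"
          + bytes([END_OF_DATA]) * 8)
fields = parse_region(region)
print(fields[STRING1], fields[STRING9])
```

Adding a new parameter later only means appending another tag and payload, as long as the total still fits in the region.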
On 17/01/2024 17:39, Stefan Reuther wrote:
> On 17.01.2024 at 12:54, pozz wrote:
>> On 17/01/2024 11:27, David Brown wrote:
>>> While it is possible to do all this using patching of odd places in
>>> your file, using specific locations is often a better choice. Since
>>> you haven't already said "Thanks for the advice - I tried it that way,
>>> it worked, and I'm happy" in response to any post, I would say that
>>> now is the time to take fixed location solutions seriously.
>>
>> There are many suggested solutions and I think all of them can be used
>> with success. Just for the sake of curiosity and study, I'm exploring all
>> of them.
>>
>> Honestly, I don't *like* solutions where you need to choose a fixed
>> location yourself. Why should you do a job that can be done by the
>> linker?
>
> It's not you vs. the linker. You co-operate. You need to tell the linker
> about your chip anyway ("code is from 0x1000 to 0xc000, data is from
> 0xc000 to 0xd000"). So you can as well tell it "version stamp is from
> 0xcc00 to 0xd000, data only before 0xcc00".
>
> If you have your identification information in a fixed place, you can,
> for example, more easily analyze field returns. It's easy if your field
> service has to change something, and it's easy to do software updates
> that preserve the identification information. You don't need to figure
> out which software build is running on the chip and what the address of
> the structure happens to be in that one.
>
>>> Now your post-build scripts have a simple fixed address to patch the
>>> binaries.
>>
>> How should the post-build script know the exact address of a certain
>> field in the struct?
>
> By defining the struct in a compatible way. For example....
>
>> volatile const struct post_build_data {
>>   uint32_t serial_number;
>>   uint64_t mac_address;
>
> ...this is a bad idea, because in most (but probably not all) chips,
> uint64_t after uint32_t means there's 32 bits of padding, so if you need
> serial-before-mac, you should at least make the padding explicit. There
> also might be endian problems.
>
> Using only char/uint8_t fields gives you a very high chance of identical
> structure layout everywhere (`uint8_t mac_address[8]`).
>
You can also keep things safe by making sure that you are aligned by "natural" alignment, at least to size 8 bytes (I have never heard of a platform that has more than 8 byte alignment for anything). So two uint32_t's followed by an uint64_t is fine.

#pragma GCC diagnostic push
#pragma GCC diagnostic error "-Wpadded"
volatile const struct ...
#pragma GCC diagnostic pop

is a useful check (if you are using gcc or clang). And a static assert on the size of the struct is another important safe-guard.
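The "natural alignment" point can be checked roughly from a script with ctypes, which lays out a Structure the way the host C compiler would. That is an important caveat: the result reflects the host ABI, not necessarily the target's, so this is only a sanity check beside the -Wpadded and static assert checks on the real compiler. The struct and helper names are made up for the example.

```python
import ctypes


class Aligned(ctypes.Structure):
    # Two uint32_t members followed by a uint64_t: every member sits on a
    # multiple of its own size, so the compiler has no reason to add padding.
    _fields_ = [
        ("serial_number", ctypes.c_uint32),
        ("image_size", ctypes.c_uint32),
        ("mac_address", ctypes.c_uint64),
    ]


def padding_bytes(struct_type) -> int:
    """Size of the struct minus the sum of its members' sizes."""
    members = sum(ctypes.sizeof(ctype) for _name, ctype in struct_type._fields_)
    return ctypes.sizeof(struct_type) - members


print(ctypes.sizeof(Aligned), padding_bytes(Aligned))
print(Aligned.mac_address.offset)
```

Running the same helper on a struct that puts a uint64_t directly after a single uint32_t would report padding on most hosts, which is the case the -Wpadded pragma is meant to catch at compile time.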
On 16.1.2024 17.39, David Brown wrote:
> On 16/01/2024 13:19, pozz wrote:
>> In one project I have many quasi-fixed strings that I'd like to keep
>> in non volatile memory (Flash) to avoid losing precious RAM space.
>>
>> static const char s1[] = "/my/very/long/string/of/01020304";
>> static const char s2[] = "/another/string/01020304";
>> ...
>>
>> Substring "01020304" is a serial number that changes during production
>> with specific device. It has the same length in bytes (it's a simple
>> hex representation of a 32-bits integer).
>>
>> Of course it's too difficult and slow to rebuild the firmware during
>> production passing to the compiler the real serial number. I think a
>> better solution is to patch the .hex file generated by the compiler.
>>
>> I'm wondering how to detect the exact positions (addresses) of serial
>> numbers to fix.
>>
>> The build system is gcc, so I could search for s1 in the elf file. Do
>> you know of a tool that returns the address of a symbol in the elf or
>> map file?
>>
>> Could you suggest a better approach?
>>
>
> Another - perhaps more reliable - method would be to put the string in
> its own section with __attribute__((section("serial_number"))), and then
> have a linker file entry to fix it at a specific known address.
>
My vote for this. If there are many strings, they could be put into a const volatile struct which is then located at a known place.

--
-TV
On 2024-01-17, Stefan Reuther <stefan.news@arcor.de> wrote:
> On 17.01.2024 at 12:54, pozz wrote:
>> On 17/01/2024 11:27, David Brown wrote:
>>> While it is possible to do all this using patching of odd places in
>>> your file, using specific locations is often a better choice. Since
>>> you haven't already said "Thanks for the advice - I tried it that way,
>>> it worked, and I'm happy" in response to any post, I would say that
>>> now is the time to take fixed location solutions seriously.
>>
>> There are many suggested solutions and I think all of them can be used
>> with success. Just for the sake of curiosity and study, I'm exploring all
>> of them.
>>
>> Honestly, I don't *like* solutions where you need to choose a fixed
>> location yourself. Why should you do a job that can be done by the
>> linker?
>
> It's not you vs. the linker. You co-operate. You need to tell the linker
> about your chip anyway ("code is from 0x1000 to 0xc000, data is from
> 0xc000 to 0xd000"). So you can as well tell it "version stamp is from
> 0xcc00 to 0xd000, data only before 0xcc00".
>
> If you have your identification information in a fixed place, you can,
> for example, more easily analyze field returns. It's easy if your field
> service has to change something, and it's easy to do software updates
> that preserve the identification information. You don't need to figure
> out which software build is running on the chip and what the address of
> the structure happens to be in that one.
>
>>> Now your post-build scripts have a simple fixed address to patch the
>>> binaries.
>>
>> How should the post-build script know the exact address of a certain
>> field in the struct?
>
> By defining the struct in a compatible way. For example....
>
>> volatile const struct post_build_data {
>>   uint32_t serial_number;
>>   uint64_t mac_address;
>
> ...this is a bad idea, because in most (but probably not all) chips,
> uint64_t after uint32_t means there's 32 bits of padding, so if you need
> serial-before-mac, you should at least make the padding explicit.
Yes, definitely that. Or make the packing explicit. And add compile-time checks to verify the offsets of fields within the structure and fail if they're not what is expected. That has saved me many times.
pozz <pozzugno@gmail.com> writes:
> The build system is gcc, so I could search for s1 in the elf file. Do
> you know of a tool that returns the address of a symbol in the elf or
> map file?
The nlist command line tool is for that, and iirc there are some library functions that do similar.