
What is your strategy to save non volatile data?

Started by pozz August 11, 2016
On 8/13/2016 2:42 PM, pozz wrote:
> On 13/08/2016 19:06, Tim Wescott wrote:
>> On Thu, 11 Aug 2016 14:48:57 +0200, pozz wrote:
>>
>>> Many times I work on embedded projects where it is required to save some
>>> data in a non volatile way. Usually I use external (to the main
>>> microcontroller) serial/parallel EEPROM or internal Flash memory (with a
>>> software layer for EEPROM emulation).
>>>
>>> However my goal isn't to discuss the hw alternatives, but the data
>>> format and strategy.
>>>
>>> I know there are many serialization data format (XML, JSON, Protocol
>>> Buffer, SQLite), but I don't think they are valid solutions for
>>> medium/low end microcontrollers. They are complex to implement and they
>>> aren't really necessary.
>>> Data are written and read from the same platform, so the endianness and
>>> byte order are not a problem.
>>>
>>> IMHO directly saving a C structure to the non-volatile memory is the
>>> best method. When you will load bytes from the non volatile memory, you
>>> will have magically your C structure in RAM filled with the saved
>>> values.
>>>
>>> Is it the same strategy that you use?
>>
>> And then, when you upgrade your software, your data structure is
>> automatically corrupted! Woo hoo!
>
> You can include a "data layout version" that increases when one parameter is
> added, removed or changed when a new firmware version is created. During
> loading process, the "data layout version" is checked against the corresponding
> version of the current firmware.
>
> Immediately after an upgrade, the data version doesn't match the firmware
> version so a "data upgrade" process is triggered.
>
> Of course, for complex firmware and persistent data that changes frequently,
> this task could be tricky: you could have whatever previous data version (or
> the upgrade *must* be done incrementally from one version to the next one,
> without gaps).
This places a burden on the user(s): they have to track your releases
even if they have no need/desire to do so.

And, *you* have to keep all of the "updates" available (web site?) so
a user who is still running v1.0 can apply 2.0, then 3.0, then 4.0, etc.
Or, offer a set of shortcuts:
    1.0 to 3.0
    1.1 to 2.0
    2.0 to 4.0
    etc.
(this gets messy really quick -- esp for a *device*, not a "desktop
application")

And, you need some way of telling the user that the update does NOT
apply to his current version; that he must first update to version X
(which, in turn, may require yet another update)
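The "data layout version" idea quoted above can also be applied inside the firmware itself, so that one image carries every migration step and runs them in order at load time. A minimal sketch under stated assumptions: the three byte layouts, the default contrast of 50, and the names `config_upgrade`, `upgrade_1_to_2` and `upgrade_2_to_3` are all hypothetical and not taken from any post in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte layouts (byte 0 is always the layout version):
 *   v1: [1][brightness]
 *   v2: [2][brightness][contrast]
 *   v3: [3][brightness lo][brightness hi][contrast]
 */
#define CFG_CURRENT_VERSION 3u
#define CFG_MAX_LEN         4u

/* Expected stored length for each known layout version (index 0 unused). */
static const size_t layout_len[CFG_CURRENT_VERSION + 1u] = { 0u, 2u, 3u, 4u };

static size_t upgrade_1_to_2(uint8_t *buf)
{
    buf[0] = 2u;
    buf[2] = 50u;          /* new field gets an assumed factory default */
    return 3u;
}

static size_t upgrade_2_to_3(uint8_t *buf)
{
    buf[3] = buf[2];       /* move contrast up by one byte */
    buf[2] = 0u;           /* high byte of the widened brightness */
    buf[0] = 3u;
    return 4u;
}

/* One step per "from" version; entry N upgrades layout N to layout N+1. */
typedef size_t (*upgrade_fn)(uint8_t *buf);
static const upgrade_fn upgrade_step[CFG_CURRENT_VERSION] = {
    NULL, upgrade_1_to_2, upgrade_2_to_3
};

/* Upgrades buf in place, one version at a time, until it is current.
 * Returns 0 on success, -1 if the version is unknown (including data
 * written by newer firmware) or the length does not match the version. */
int config_upgrade(uint8_t buf[CFG_MAX_LEN], size_t *len)
{
    assert(buf != NULL && len != NULL);
    if (*len < 1u) return -1;
    uint8_t v = buf[0];
    if (v < 1u || v > CFG_CURRENT_VERSION) return -1;
    if (*len != layout_len[v]) return -1;
    while (buf[0] < CFG_CURRENT_VERSION) {
        *len = upgrade_step[buf[0]](buf);
    }
    return 0;
}
```

The cost is that every release has to keep all older steps in the image, which is the same bookkeeping as the chain of updates described above, only moved from the user to the firmware.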
>> You can prevent this by assuming that memory will never be corrupted and
>> only growing the structure at the end (which is oh so convenient when,
>> for instance, you want to obsolete a parameter or grow it from one byte
>> to two), or by making 0xff the default value for every byte (again, oh so
>> convenient in the code).
>>
>> Personally, I use a list of data items arranged in records. Each record
>> has a length byte, an ID byte (or word), and data. It's accompanied by
>> software that lets you define the default value for each ID, so when the
>> item isn't found the software automatically gets the factory default (I
>> use C++, so the default value is done as a parameter in a template).
>
> How do you save a variable-length data, for example a string? Do you save
> always the maximum length for that parameter? Otherwise, how do you manage an
> increase in the length of that parameter (when the length of a string
> increases)? You can't overwrite only the old value, you need to rewrite the
> "database" from the beginning.
You can use P-strings. And, to minimize the number of "database rewrites",
allow a *shorter* string to replace an existing string (padding the
excess so your parser knows to skip over it as "empty space"). Then,
the only times you need to rewrite is when you need a *longer* string
OR when you need to GC some of the "empty space" in order to accommodate
some other increased memory need (i.e., some *other* string that has grown).

(Of course, you can also use ASCIIZ strings)

In each case, you have to consider how you handle corruption in the
dataset; if a tag suggests N bytes of data *should* follow (e.g., a
6-octet MAC) but the tag has been corrupted and, really, only M bytes
actually are present (because it was really an IP address stored there!),
then you quickly can get out of sync and find yourself reading "past"
the end of the dataset. Or, coming up short.

(hence the need for a checksum/hash AND the precaution of not trusting
ANY of the data until you verify that the entire object appears intact)

Note that this requires you to have enough working memory to store the
entire "database" while it is being rewritten. Or, best case, the
largest object contained therein (assuming you can repeatedly,
incrementally rewrite the database as you shuffle entries around).
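The shorter-string replacement described above can be sketched roughly as follows. The assumptions here are mine, not the poster's: each record is `[tag][len][len bytes]` (a length-prefixed P-string), a single `0x00` byte is the filler that the parser skips, and the store is byte-rewritable (an EEPROM or a RAM image). The names `find_record` and `replace_string` are hypothetical. The bounds checks only keep the walk inside the buffer; they do not replace the whole-dataset checksum recommended above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_PAD     0x00u   /* assumed filler byte; never used as a real tag */
#define ERR_MISSING (-1)
#define ERR_CORRUPT (-2)

/* Walks the records in db[0..used) and returns the offset of the record
 * with the given tag, ERR_MISSING if there is none, or ERR_CORRUPT if a
 * header or length would run past the end of the dataset. */
long find_record(const uint8_t *db, size_t used, uint8_t tag)
{
    assert(db != NULL && tag != TAG_PAD);
    size_t i = 0;
    while (i < used) {
        if (db[i] == TAG_PAD) { i++; continue; }   /* skip "empty space" */
        if (used - i < 2u) return ERR_CORRUPT;     /* truncated header */
        size_t len = db[i + 1u];
        if (len > used - i - 2u) return ERR_CORRUPT;
        if (db[i] == tag) return (long)i;
        i += 2u + len;
    }
    return ERR_MISSING;
}

/* Replaces the string stored under tag in place.
 * Returns 0 on success, 1 if the new string is longer than the old one
 * (the caller then has to rewrite or compact the whole database), or a
 * negative error code from find_record. */
int replace_string(uint8_t *db, size_t used, uint8_t tag, const char *s)
{
    assert(s != NULL);
    long at = find_record(db, used, tag);
    if (at < 0) return (int)at;

    size_t old_len = db[at + 1];
    size_t new_len = strlen(s);
    if (new_len > old_len) return 1;

    db[at + 1] = (uint8_t)new_len;
    memcpy(&db[at + 2], s, new_len);
    /* Pad the freed tail so the parser treats it as empty space. */
    memset(&db[(size_t)at + 2u + new_len], TAG_PAD, old_len - new_len);
    return 0;
}
```

Replacing "hello" with "hi" leaves three filler bytes behind; those are what a later garbage-collection pass would reclaim.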
In article <noo48f$g4a$1@dont-email.me>, pozzugno@gmail.com says...

....
> > You can prevent this by assuming that memory will never be corrupted and
> > only growing the structure at the end (which is oh so convenient when,
> > for instance, you want to obsolete a parameter or grow it from one byte
> > to two), or by making 0xff the default value for every byte (again, oh so
> > convenient in the code).
For a lot of applications that works, as long as you add a safeguard so
that, whether the software is newer or older than the stored data, you only
load the valid number of parameters.

Even with a record set, after a downgrade you will have to skip the newer
records that the software knows nothing about, because the software has been
downgraded.
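For a struct that only ever grows at the end, that safeguard might look like the sketch below: start from defaults, then copy only as many bytes as both the stored image and the running firmware know about. The struct fields, the default values and the name `params_load` are hypothetical, and the stored length is assumed to fall on a field boundary of an older layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block; new fields are only ever appended. */
typedef struct {
    uint16_t baud_divisor;
    uint8_t  contrast;
    uint8_t  backlight;      /* appended in a later release */
} params_t;

static const params_t params_default = { 26u, 50u, 80u };

/* Loads stored_len bytes saved by some other firmware revision.
 * Older data (shorter): the missing tail keeps its default values.
 * Newer data (longer): the unknown tail is ignored. */
void params_load(params_t *out, const void *stored, size_t stored_len)
{
    assert(out != NULL);
    *out = params_default;
    if (stored == NULL) return;
    size_t n = (stored_len < sizeof *out) ? stored_len : sizeof *out;
    memcpy(out, stored, n);
}
```

This relies on the original poster's premise that the data is written and read on the same platform with the same struct layout; it does nothing about corruption, which still needs a checksum.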
> >
> > Personally, I use a list of data items arranged in records. Each record
> > has a length byte, an ID byte (or word), and data. It's accompanied by
> > software that lets you define the default value for each ID, so when the
> > item isn't found the software automatically gets the factory default (I
> > use C++, so the default value is done as a parameter in a template).
>
> How do you save a variable-length data, for example a string? Do you
> save always the maximum length for that parameter? Otherwise, how do
> you manage an increase in the length of that parameter (when the length
> of a string increases)? You can't overwrite only the old value, you
> need to rewrite the "database" from the beginning.
For the vast majority of embedded applications there is a limited number of
strings that need to be saved, and in MOST cases these are things like
hostname, filename or path, all of which have system-defined LIMITS.
It is rare that I come across a CONFIGURATION setting (not a saved state)
that needs a variable-length string. If any strings are involved, they are
normally ones with fixed limits.

Not all systems need saved state that is more complex than a set of
variables; rarely do I find embedded applications that have to save system
state or loaded sub-application state.

A lot of embedded systems cannot do more than skip unrecognised parameters,
as they have no way to inform anyone.

--
Paul Carpenter | paul@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.pcserviceselectronics.co.uk/pi/> Raspberry Pi Add-ons
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.badweb.org.uk/> For those web sites you hate
On Sat, 13 Aug 2016 23:42:49 +0200, pozz wrote:

> On 13/08/2016 19:06, Tim Wescott wrote:
>> Personally, I use a list of data items arranged in records. Each
>> record has a length byte, an ID byte (or word), and data. It's
>> accompanied by software that lets you define the default value for each
>> ID, so when the item isn't found the software automatically gets the
>> factory default (I use C++, so the default value is done as a parameter
>> in a template).
>
> How do you save a variable-length data, for example a string? Do you
> save always the maximum length for that parameter? Otherwise, how do
> you manage an increase in the length of that parameter (when the length
> of a string increases)? You can't overwrite only the old value, you
> need to rewrite the "database" from the beginning.
>
>
>> It all goes into a block with a 16-bit CRC word at the end. It's
>> extravagant in the use of overhead, but I have yet to come close to
>> using 256 bytes, much less the amount of memory commonly found in a
>> block of flash or an external EEPROM.
Basically, everything I've done with this sort of thing is on a processor
big enough that all of the individual parameters can be stored in RAM.
In fact, the process is that each parameter is echoed in memory, and at
startup each one is either loaded with the value from flash (if it is
there), or the default value (if it isn't).

Because I usually end up using a block of the processor's flash, the
"database" is, indeed, rewritten from the beginning on save (and woe be
unto the service guy who says "save" and then pulls the plug).

I generally don't store strings in this sort of space, so it hasn't been
an issue for me.

If it is an issue, you can always have a special data ID for "erasure".
You'll still need a way to rebuild memory if you're constantly changing
the size of things.

Just for reference, I'm doing this sort of thing on chips with 32K or
more of RAM, and 128K or more of flash -- I'm sure it works for smaller
processors than that, but at some point space will get constrained enough
that it's not reasonable.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
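The record scheme described in this post (length byte, ID byte, data, a 16-bit CRC at the end of the block, and a factory default when an ID is not found) could be sketched in plain C along these lines. This is not the poster's code, which is C++ with template defaults. Assumptions made here: the length byte counts data bytes only, values are 32-bit little-endian, the CRC is CRC-16/CCITT-FALSE stored low byte first, and the names `block_valid` and `param_get_u32` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, bitwise. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* The last two bytes of the block hold the CRC of everything before them. */
bool block_valid(const uint8_t *blk, size_t n)
{
    assert(blk != NULL);
    if (n < 2u) return false;
    uint16_t stored = (uint16_t)(blk[n - 2u] | (blk[n - 1u] << 8));
    return crc16(blk, n - 2u) == stored;
}

/* Returns the stored value for id, or dflt if the block fails its CRC,
 * the record is absent, or it does not have the expected size. */
uint32_t param_get_u32(const uint8_t *blk, size_t n, uint8_t id, uint32_t dflt)
{
    if (!block_valid(blk, n)) return dflt;

    size_t end = n - 2u;                      /* records stop before the CRC */
    size_t i = 0;
    while (end - i >= 2u) {
        size_t len = blk[i];
        if (len > end - i - 2u) break;        /* record would overrun block */
        if (blk[i + 1u] == id && len == 4u) {
            return (uint32_t)blk[i + 2u]
                 | ((uint32_t)blk[i + 3u] << 8)
                 | ((uint32_t)blk[i + 4u] << 16)
                 | ((uint32_t)blk[i + 5u] << 24);
        }
        i += 2u + len;
    }
    return dflt;
}
```

At startup the firmware would call something like `param_get_u32` once per parameter to fill the RAM copy, which matches the "loaded from flash if it is there, otherwise the default" behaviour described above.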
On 13/08/16 18:44, Don Y wrote:
> On 8/12/2016 1:40 PM, David Brown wrote: >> On 12/08/16 18:40, Don Y wrote: >>> On 8/12/2016 2:54 AM, David Brown wrote: >>>> On 12/08/16 00:30, Don Y wrote: >>>>> On 8/11/2016 3:17 PM, pozz wrote: >>>>> >>>>> [much elided] >>>>> >>>>>> "Everything in one structure" means you have a global (as the scope >>>>>> in C) >>>>>> struct definition that is accessed by every piece of code in the >>>>>> project, even >>>>>> if they are very different section of the project. >>>>> >>>>> No. It only has to be *effectively* global. >>>>> >>>>> E.g., you can have a global "update_parameter()" that takes a >>>>> "key" (name for a parameter -- this could be a literal string >>>>> *or* just a small integer) and a parameter value (or pointer to >>>>> a parameter value). Update_parameter() then moves the "value" >>>>> into the referenced portion of the "parameters to be saved struct". >>>> >>>> First, /don't/ use strings for that sort of thing unless you have no >>>> other choice. And don't use arbitrary integers. Use an enum, that is >>>> defined in one place. That way you get it right. Your code refers to >>>> the parameters using sensible names, rather than numbers, you don't get >>>> overlap or gaps (unless you want them, of course), and you can easily >>>> change things when you need to. And when one part of your program >>>> writes parameter "backgroundColour" and another part reads parameter >>>> "backgroundColor", you get a compile-time error rather than weird >>>> behaviour at run-time. >>> >>> You throw an error at run-time -- because you *still* have to do >>> run-time checks on parameters and their values! >> >> The difference here is that with checks on parameter values, you can >> (normally) >> just set them to safe default values if they occur, and it is >> something that >> typically would never happen. >> >> But if you have a spelling mistake in your texts, your program is >> likely to >> silently ignore storage of changes in parameters. 
> > Why would your program silently ignore them? Don't you write ROBUST code? >
If you write code that does run-time checks on strings, then you open yourself to the risk that spelling mistakes or other mismatches can cause problems that are only discovered at run time. That means you need extra layers of complications - extra checks in the code to ensure that there are no such problems, and extra test suites in order to check this checking code. Sure, it can be made robust - but at the cost of significant effort. On the other hand, if such checks are made by the compiler (such as by using enums), no test code is needed - the compiler will spot the problem.
> If you tried to set parameter 47 to 982 -- but there were only 42 supported > parameters, would your code "silently ignore" this? Or, if parameter 47 > *did* exist but expected to be set to a string, would you ignore the > attempt to set it to a numeric?
If your parameters were made using a C++ strong enum, then you could not
write code that tried to set parameter 47 when there are only 42
supported parameters.

If you are just using numbers (or normal C enums), and are using gcc (or
clang), you can do something like this:

    // settings.h
    #include <stdint.h>

    extern void __attribute__((error("Bad parameter number")))
        badParameterNumber(void);
    extern void doSetParameterUint32(int paramNo, uint32_t newValue);

    static inline void setParameterUint32(int paramNo, uint32_t newValue)
    {
        if (__builtin_constant_p(paramNo)) {
            if ((paramNo < 0) || (paramNo > 41)) {
                badParameterNumber();
            }
        }
        doSetParameterUint32(paramNo, newValue);
    }

Then in the very common case of using a fixed number (preferably from
your enum, or at least from #define's), you will get compile-time
checking of valid numbers.

The more work you can get from the compiler, the less work you have to
do yourself as a developer, and the less work your target cpu has to do
at run-time.
> >>> And, it would be silly to type out the name of the key in each >>> invocation. Instead, you'd use: >>> >>> #define PARAM_BACKGROUND_COLOR ("bacKgRouNdCoLOR") >>> #define PARAM_FOREGROUND_COLOR ("4GrouNDcoLOUr") >>> etc. >>> >>> How is this any different from: >>> >>> #define PARAM_BACKGROUND_COLOR (1) >>> #define PARAM_FOREGROUND_COLOR (2) >>> etc.? >> >> It is different, because it is - IMHO - daft. Working with strings >> like this >> is vastly less efficient, and more error prone. And if you've got a >> good > > So, if you opted to use an int as a tag/key, you'd insist on using values > like 1, 2, 3, 4, ... instead of "BACK", "FORE", "BORD", for your tags? > (i.e., each of these are 32b integers) > > And, force yourself to maintain a cheat sheet mapping *arbitrary* integers > to these interpretations -- just because your grade school teacher taught > you to start counting from '1'?
The "cheat sheet" is an enum.
> > Using strings is "daft" when you don't have the resources (memory/MIPS) > to do so. Forcing yourself to use enums/integers when you *do* have > the resources is "daft"!
Doing work yourself that can be done by the compiler is "daft". Of course there are times when strings would be the most convenient types of tag. It can reduce coupling between modules, and that may be of overriding importance. But generally, it is best to keep things simple, and to get as much compile-time checking and help as possible.
> > (why isn't the windows registry just one big set of binary tuples?
The windows registry is well-known to be a complete mess, hugely inefficient, unstable and insecure. And many of its keys are just meaningless numbers (there is no handy "enum" to translate them). It is /not/ a good example for your argument!
> why > aren't UN*X configuration files encoded in some magical, highly space and > time efficient manner -- with a suitable "decoder" to allow for the > *infrequent* human use? why do databases give names to fields instead > of just 'field1', 'field2', etc.?) >
Using an enum with integer keys for your parameters is great in a small program - it is less great for a big program, and it is a bad choice if you expect humans to manually alter the settings.
>> place to write out these #defines, then you are better off using an enum
>> there with numerical values:
>>
>> enum {
>>     paramBackgroundColour,
>>     paramForegroundColour,
>>     etc.
>> }
>>
>> It's far more efficient, easier on the eye, easier to get IDE help in
>> automatic completion, and easier for the compiler to check (such as
>> compiler warnings if you use switch on an enum but forget some cases).
>> And if you are using C++ rather than C, you can use type-safe strong
>> enums, as well as have better scoping and namespace usage.
>
> And, all it ADDS is the requirement that you insert this extra step
> (and keep it consistently, everywhere you use it).
>
> I just got notified of an "error 27". What's that mean? What's
> the symbolic label associated with that numeric value? *Then*,
> what does that symbolic label actually *mean*?
>
> [Isn't this the same argument applied to "error codes"?]
>
>>> Enums, IMO, are a bad idea -- it's far too easy to slip a new one
>>> "in the middle" and effectively bodge the keys associated with all
>>> those that follow. I.e., a software revision changes:
>>>
>>> {PARAM_BACKGROUND_COLOR, PARAM_FOREGROUND_COLOR, ...}
>>> to
>>> {PARAM_BACKGROUND_COLOR, PARAM_BORDER_COLOR, PARAM_FOREGROUND_COLOR,
>>> ...}
>>>
>>> and now any "legacy" configuration files are effectively unreadable.
>>
>> If that is a concern (and it sometimes /is/ a concern), give the enum
>> explicit values:
>>
>> enum {
>>     paramBackgroundColour = 1,
>>     paramForegroundColour = 2,
>>     etc.
>> }
>
> This is then another maintenance issue: ensuring that paramBorderColor
> gets MANUALLY assigned the correct value regardless of where it appears
> in the list of enumerations:
>     enum {
>         paramBackgroundColour = 1,
>         paramForegroundColour = 2,
>         paramBorderColor = 27,
>         paramShadowColor = 3,  // return to the value that would have been
>                                // used had not paramBorderColor been introduced
>
> or, alternatively:
>     enum {
>         paramBackgroundColour = 1,
>         paramForegroundColour = 2,
>         paramShadowColor,      // 3
>         ...   // manually skip over 24 labels (assuming none of them
>               // explicitly alter their default values) to ensure we're
>               // at "27" when the next label (border color) is encountered
>         paramBorderColor,      // 27
>
And how is that different from using a bunch of #define's with specific numbers or strings?
>>> (How do you refer to "old_PARAM_FOREGROUND_COLOR" in the code that
>>> attempts to patch-up a legacy configuration dataset in light of
>>> the *new* "PARAM_FOREGROUND_COLOR"? I.e., you end up having to create
>>> another enumeration for the legacy representation -- using DIFFERENT
>>> names!)
>>
>> And how is that worse than using different names with #define'd values?
>
> Because there is no "old_" value! The "old" and "new" are exactly the
> same!
> You just add new values to the end and obsolete values "in the middle".
>
>     #define PARAM_BACKGROUND_COLOR (3)
>     #define PARAM_FOREGROUND_COLOR (4)
>     #define PARAM_BORDER_COLOR (5)
>     #define PARAM_SHADOW_COLOR (6)
>
> becomes:
>
>     #define PARAM_BACKGROUND_COLOR (3)
>     #define PARAM_FOREGROUND_COLOR (4)
>     #define PARAM_BORDER_COLOR (5) // no longer used
>     #define PARAM_SHADOW_COLOR (6)
>
> when the border color parameter is obsoleted. So, if you encounter
> a "configuration struct" that references parameter 5, you know that
> it is referencing the obsolete border color parameter -- not the
> *new* "highlight color" that took its place in the list of numerical
> parameter identifier values.
Again, how is that different from doing /exactly/ the same thing with an enum?
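To make the comparison concrete, here is the quoted #define list written as an enum with explicit values: the obsolete entry keeps its number, and the new parameter is appended at the end. The X-macro list and the `param_name` helper are an addition of this sketch, not something proposed in the thread; they show one way to get readable names into logs without keeping a separate table by hand. The numeric values are the ones from the quoted example, and 7 for the highlight colour is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* One line per parameter ever shipped. The numbers end up in
 * non-volatile storage, so a line is never renumbered or reused. */
#define PARAM_LIST(X)                                               \
    X(paramBackgroundColour, 3)                                     \
    X(paramForegroundColour, 4)                                     \
    X(paramBorderColour,     5)  /* obsolete, number stays taken */ \
    X(paramShadowColour,     6)                                     \
    X(paramHighlightColour,  7)  /* new entries go at the end */

typedef enum {
#define X(name, id) name = id,
    PARAM_LIST(X)
#undef X
} param_id_t;

/* Generated from the same list, so names and numbers cannot drift apart. */
const char *param_name(param_id_t p)
{
    switch (p) {
#define X(name, id) case name: return #name;
    PARAM_LIST(X)
#undef X
    }
    return "unknown";
}

/* Compile-time guard: fails the build if a persisted number is changed. */
static_assert(paramShadowColour == 6, "persisted parameter IDs must not change");
```

A log line can then use `param_name(id)` with `%s`, while the stored data and the comparisons in the code stay plain integers.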
> >>> By contrast, if you manually enumerate them, you are keenly aware of >>> the original (name,key) mapping. Add new names to the end of the >>> list (who cares if "COLOR_X" and "COLOR_Y" have adjacent keys?!). >>> And, you can simply put a comment after any keys that have been >>> obsoleted >>> over time: >>> >>> #define PARAM_VARIABLE_X (...) // obsolete as per rev x.y.z >>> >>> Strings have value in that you can *see* them in logs: >>> log( "Updating parameter %s to value %s", param, value ) >>> instead of: >>> log( "updating parameter %d to value %d", param, value ) >>> or (ick): >>> log( "Updating parameter %s to value %s", >>> param_to_string(param), value_to_string(value) ) >>> >>> Integers (and enums) suffer in poorly typed languages: >>> update_parameter(PARAM_BACKGROUND_COLOR, VALUE_BROCOLLI); >>> >>> I.e., there's no way to ensure only the values valid for >>> PARAM_BACKGROUND_COLOR are applied to PARAM_BACKGROUND_COLOR! >>> The above function call may succeed (depends on whether >>> VALUE_BROCOLLI happens to coincide with a value for a >>> PARAM_BACKGROUND_COLOR) -- or not. I.e., you still have to >>> rely on run-time checks; you can't do it all at compile time. >>> >>> In my current project, parameters are just fields in an RDBMS table. >>> So, the RDBMS *enforces* "correct" (in the sense of "valid") values. >>> Can't set Background_Color to "Blue" unless "Blue" is a valid choice >>> for *that* field. And, surely can't set it to "Brocolli"! :> >>> >>> And, as the RDBMS is charged with maintaining the integrity of this >>> data, there's no need to *check* it when reading it from the RDBMS >>> (it was checked "on the way in"!) >>> >>> As everything is SQL, I *see* "Blue" not some arbitrary >>> *encoding* of PARAM_BACKGROUND_COLOR_BLUE (which might differ >>> from PARAM_FOREGROUND_COLOR_BLUE -- even though they are the >>> same color!) >> >> I don't disagree that storing things as strings can be useful >> sometimes. 
But >> we are not talking about big, wasteful PC software - we are talking >> about a >> small number of parameters in a small embedded program. SQL is too >> big by > > Who said it was a "small embedded program"? Did I miss that, somewhere > upthread?? Just because its "embedded" doesn't implicitly make it "small".
I don't think the OP has said so explicitly, but if you look at the code he has given it is far more in line with a small embedded program than an embedded Linux application.
> > How many Linux-based appliances still use fstab(5), ethers(5), > inetd.conf(5), > etc.? Surely they could replace those with "compiled" versions and elide > all of that *text* from their configurations! After all, they already > *have* run-time code to parse those configuration files and configure > the system accordingly; why not just skip the "text processing" and store > a bunch of magic integers in a file, someplace? > >> orders of magnitude for such a system. It would a different matter if >> we were >> talking about an embedded linux system, for example - then you just >> write your >> code in Python, use a JSON string to hold a dict of the parameters, >> and you've >> got the system working in an hour of developer time. Different tools for >> different tasks. > > Exactly. If I tried to replace the *thousands* of "configuration > parameters" in my current system with enums/small integers, I (and > anyone following after me) would spend all out time consulting cheat > sheets to try to remember which parameter controls which set of options. > > And, of course, document all of that nicely as it evolves. > > Even the little /pro bono/ job I'm working on this month has enough > "settings" to make that a daunting undertaking. Much easier to > just say: > result = write("/dev/display/refresh", "60") > than to sort out which ioctl(2) sets the display refresh rate to 60Hz. > Or: > result = write("/dev/eia1/ctl", "odd") > instead of trying to remember if the manifest constant for parity > is PARODD or !PAREVEN. 
> > Likewise, the "result" need not be some cryptic value that the > developer needs to look up and then translate into a *familiar* > form (so all developers use similar reporting messages): > if (result != nil) > print("Error: " result) > > [This also seems to be a trend in CONFIGURING devices and apps in > other OS's as well: "mixerctl outputs.headphones=+5" ] > > If I had to provide a set of manuals enumerating all sorts of "error > codes", > no one would ever attempt to alter the system from its default state. > "How do I change the color of the background? *Why* didn't my > attempt to change the background to "vilet" [sic] fail?" > >>>>> E.g., you can pack a bunch of (potentially unrelated) parameters >>>>> into a bitfield within that struct. Data coming *into* it gets >>>>> massaged to manipulate the appropriate fields. Likewise for data >>>>> coming *out*. >>>> >>>> That sounds like a complication that is unlikely to be worth the >>>> effort, >>>> but it's possible. In many cases its enough to simply say "all >>>> parameters are 32 bits - it's up to the app if this is 4 characters, a >>>> float, an integer, etc.". >>> >>> Depends on resources available *and* how parameters are accessed. >>> Surely easier to fetch a *set* of related small fields in one >>> access than it is to have to fetch several individual fields >>> (each consisting of lots of 0 bits!) >> >> If you have the resources to use string keys for the parameters, you >> have the >> resources for using individual data items for your bits. > > You're assuming all of my comments apply to single implementation. > I'm presenting options that have/can be used for a variety of scenarios. 
> Many are inconsistent with each other (but the OP hadn't indicated which > SPECIFIC capabilities/characteristics his system involves) > >> And if you have a number of related bits to track together, then use a >> flags >> parameter - have the application module keep the flags in a uint and >> pass them >> all together. >> >>> It also hides the representation from the caller. Should I store >>> the time-of-last-update as a time_t? Or, as some sort of string? >>> Or, as something that really only makes sense to the routine that >>> does the updates? >>> >>>>> Finally, because this represents a monitor, of sorts, you can ensure >>>>> atomic operations on that "data to be preserved" -- up to and >>>>> including >>>>> an action that automatically saves the struct to persistent store >>>>> when power is failing (the rest of the time, "updates" happen to >>>>> the "in-RAM" copy of the struct so you don't incur the costs of >>>>> updating FLASH more often than necessary) >>>> >>>> Also true - it's always important to make sure your data is consistent >>>> at all times. >
On 14/08/2016 02:06, Tim Wescott wrote:
> On Sat, 13 Aug 2016 23:42:49 +0200, pozz wrote:
>
>> On 13/08/2016 19:06, Tim Wescott wrote:
>
>>> Personally, I use a list of data items arranged in records. Each
>>> record has a length byte, an ID byte (or word), and data. It's
>>> accompanied by software that lets you define the default value for each
>>> ID, so when the item isn't found the software automatically gets the
>>> factory default (I use C++, so the default value is done as a parameter
>>> in a template).
>>
>> How do you save a variable-length data, for example a string? Do you
>> save always the maximum length for that parameter? Otherwise, how do
>> you manage an increase in the length of that parameter (when the length
>> of a string increases)? You can't overwrite only the old value, you
>> need to rewrite the "database" from the beginning.
>>
>>
>>> It all goes into a block with a 16-bit CRC word at the end. It's
>>> extravagant in the use of overhead, but I have yet to come close to
>>> using 256 bytes, much less the amount of memory commonly found in a
>>> block of flash or an external EEPROM.
>
> Basically, everything I've done with this sort of thing is on a processor
> big enough that all of the individual parameters can be stored in RAM.
> In fact, the process is that each parameter is echoed in memory, and at
> startup each one is either loaded with the value from flash (if it is
> there), or the default value (if it isn't).
>
> Because I usually end up using a block of the processor's flash, the
> "database" is, indeed, rewritten from the beginning on save (and woe be
> unto the service guy who says "save" and then pulls the plug).
Do you use any mechanism to avoid corrupting the "database" if the saving process is interrupted for any reason (for example, a power failure)? Do you restart from "all default" values, losing all the previously saved values?
> I generally don't store strings in this sort of space, so it hasn't been > an issue for me. > > If it is an issue, you can always have a special data ID for "erasure". > You'll still need a way to rebuild memory if you're constantly changing > the size of things.
If you rewrite all the database from the beginning (in a new memory area or in the same memory area), you could increase/decrease the length of strings without any problem. The overall size of the database will be greater or lower than before. There's a problem only if you want to rewrite/overwrite only the single parameter that is changed *and* the size of its value changes.
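Rewriting into a new memory area also gives one possible answer to the power-failure question: keep two areas, always write the older one, and at startup pick the newest area whose check value is intact. Nobody in the thread describes this; it is a common two-slot pattern, sketched here with hypothetical names (`slot_t`, `save`, `load`, `newest_slot`), a RAM array standing in for the two flash/EEPROM areas, and no handling of sequence-counter wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_LEN 16u

typedef struct {
    uint32_t seq;                   /* increases by one on every save */
    uint8_t  payload[PAYLOAD_LEN];  /* the whole parameter "database" */
    uint16_t crc;                   /* covers seq and payload */
} slot_t;

static slot_t nvm[2];               /* stand-in for two non-volatile areas */

/* CRC-16/CCITT-FALSE update step (polynomial 0x1021). */
static uint16_t crc16_update(uint16_t crc, const void *data, size_t n)
{
    const uint8_t *p = data;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

static uint16_t slot_crc(const slot_t *s)
{
    uint16_t crc = crc16_update(0xFFFFu, &s->seq, sizeof s->seq);
    return crc16_update(crc, s->payload, sizeof s->payload);
}

/* Index of the newest slot with a matching CRC, or -1 if neither is valid. */
int newest_slot(void)
{
    bool ok0 = slot_crc(&nvm[0]) == nvm[0].crc;
    bool ok1 = slot_crc(&nvm[1]) == nvm[1].crc;
    if (ok0 && ok1) return (nvm[1].seq > nvm[0].seq) ? 1 : 0;
    if (ok0) return 0;
    if (ok1) return 1;
    return -1;
}

/* Writes the slot that does NOT hold the newest valid copy, so an
 * interrupted save can only damage the copy being replaced. */
void save(const uint8_t payload[PAYLOAD_LEN])
{
    assert(payload != NULL);
    int cur = newest_slot();
    int dst = (cur == 0) ? 1 : 0;
    slot_t s;
    s.seq = (cur < 0) ? 1u : nvm[cur].seq + 1u;
    memcpy(s.payload, payload, PAYLOAD_LEN);
    s.crc = slot_crc(&s);
    nvm[dst] = s;   /* on real hardware: erase dst, program data, CRC last */
}

/* Returns false when no valid copy exists; the caller then uses defaults. */
bool load(uint8_t payload[PAYLOAD_LEN])
{
    int cur = newest_slot();
    if (cur < 0) return false;
    memcpy(payload, nvm[cur].payload, PAYLOAD_LEN);
    return true;
}
```

If power is lost during a save, the half-written slot fails its CRC and `load` falls back to the previous copy, so only the most recent change is lost rather than everything. The price is twice the storage.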
> Just for reference, I'm doing this sort of thing on chips with 32K or > more of RAM, and 128K or more of flash -- I'm sure it works for smaller > processors than that, but at some point space will get constrained enough > that it's not reasonable.