I'm playing with a Raspberry Pi system, but I think my question is about embedded Linux in general.

We all know that the OS (Linux, Windows or whatever) *should* be powered down gracefully with a shutdown procedure (the shutdown command in Linux). We must avoid cutting the power abruptly.

While this is possible for desktop systems, IMHO it's impossible to achieve in embedded systems. The user usually switches off a small box by pressing an OFF button that is usually connected to the main power supply input. In any case, he could unplug the power cord immediately, without waiting for the shutdown procedure to finish.

I'm interested to know which methods can be used to reduce the probability of corruption.

For example, I chose a sqlite database to save non-volatile, user-configurable settings. sqlite is transaction based, so a power interruption in the middle of a transaction shouldn't corrupt the entire database. With normal text files this should be more difficult.

I know that write requests to non-volatile memories (HDD, embedded Flash memories) are usually buffered by the OS, and we don't know when the kernel will really execute them. Is there a method to force the buffered write requests out immediately?

Other aspects to consider?
Linux embedded: how to avoid corruption on power off
Started by ●June 16, 2017
Reply by ●June 16, 2017
On Friday 16 June 2017 18:10, pozz wrote:
> [...]
> I know that write requests to non-volatile memories (HDD, embedded Flash
> memories) are usually buffered by the OS, and we don't know when the kernel
> will really execute them. Is there a method to force the buffered write
> requests out immediately?
>
> Other aspects to consider?

sync should do this, but you can never be sure that the buffers and caches of the medium itself will also be flushed.

I have several devices that *must* not get corrupted (helicopter navigation). But here I don't have many writes. I mount all partitions read-only. When a write is needed, the partition is temporarily remounted rw (mount -o remount,rw ...), and after the write (and sync) is done it is remounted ro. This, together with a journalling file system, keeps the "window of vulnerability" as small as possible.

With more than 100 systems flying for more than 10 years, I have not had one corrupted file system. The first half of those systems used mechanical hard disks in such a high-vibration environment without problems, and no, no shock mounts, just fixed rigid mounting.

It's no guarantee, but at least a low probability of problems.

--
Reinhardt
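The remount-write-sync-remount sequence described above could be sketched roughly as follows. This is only an illustration under assumptions, not the code used in those devices: the helper names (`remount_command`, `writable`, `save_file`) are made up here, the partition is assumed to be mounted read-only already, and the `mount` call needs root. As noted above, none of this guarantees that the medium's own cache has been flushed.

```python
import os
import subprocess
from contextlib import contextmanager


def remount_command(mount_point, mode):
    """Build the mount(8) command that remounts mount_point as 'rw' or 'ro'."""
    if mode not in ("rw", "ro"):
        raise ValueError("mode must be 'rw' or 'ro'")
    return ["mount", "-o", "remount," + mode, mount_point]


@contextmanager
def writable(mount_point, run=subprocess.check_call):
    """Temporarily remount a read-only partition read-write.

    `run` executes a command list; it is a parameter (an assumption of this
    sketch) so the sequence can be exercised without root privileges.
    """
    run(remount_command(mount_point, "rw"))
    try:
        yield
    finally:
        os.sync()  # flush the kernel's dirty buffers to the device
        run(remount_command(mount_point, "ro"))


def save_file(mount_point, relative_path, data, run=subprocess.check_call):
    """Write `data` (bytes) below mount_point inside the short rw window."""
    path = os.path.join(mount_point, relative_path)
    with writable(mount_point, run):
        with open(path, "wb") as f:
            f.write(data)
            f.flush()             # empty the application-level buffer
            os.fsync(f.fileno())  # ask the kernel to write this file out
```

A call such as `save_file("/data", "waypoints.txt", b"...")` (path and file name are placeholders) keeps the partition writable only for the duration of the write.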
Reply by ●June 16, 2017
On Friday, 16 June 2017 12:10:30 UTC+2, pozz wrote:
> [...]
> I'm interested to know which methods can be used to reduce the
> probability of corruption.

Put a big capacitor on the power line, so that when the main power is cut the capacitor keeps the system running for the time needed to shut down gracefully.

Bye
Jack
Reply by ●June 16, 2017
On 16/06/2017 12:45, Reinhardt Behm wrote:
> sync should do this, but you can never be sure that the buffers and caches
> of the medium will also be flushed.

When exactly do you call sync? I can call sync every time my running applications write something to non-volatile memory, but I don't know when the OS writes something.

> But here I don't have many writes.
> I mount all partitions read-only.

Which Linux distribution do you use? I know it's not simple to mount the root filesystem read-only. Some Linux tasks assume that they can write to the root filesystem.

> When a write is needed the partition gets temporarily remounted as rw
> (mount -o remount,rw ...) and after the write (and sync) is done it gets
> remounted as ro.
> This together with a journalling file system keeps the "window of
> vulnerability" as small as possible.

Do you use ext3 as the journalled filesystem?
Reply by ●June 16, 2017
On 16/06/17 12:10, pozz wrote:
> [...]
> For example, I chose a sqlite database to save non-volatile,
> user-configurable settings. sqlite is transaction based, so a power
> interruption in the middle of a transaction shouldn't corrupt the entire
> database. With normal text files this should be more difficult.
> [...]
> Other aspects to consider?

Some storage media, such as SD cards, are very sensitive to being corrupted if they are powered off unexpectedly. There just is no way to make SD card storage reliable in the face of unexpected power-offs.

In general, using read-only mounts helps enormously. In finer detail, some filesystems are more robust against unexpected power-offs, such as LogFS, NILFS and FFS. And some file types are more susceptible to problems: sqlite databases are notorious for being corrupted if writes are interrupted.

The only way to be really safe is to have enough capacitance on board to be able to finish your writes when the supply is cut off. Keep your writes short, use kernel parameter tuning (or filesystem mount options) to minimise the time dirty data stays buffered, and make sure that as much as possible is mounted read-only, with things like /tmp, /var/lock, etc., on tmpfs mounts. And when a power fail occurs, remount your r/w filesystems read-only. That should force the buffers to be written out as fast as possible and block any further writes, and it can perhaps be done early in the controlled shutdown (if you don't need to write too much log data on shutdown).
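As one possible reading of the "kernel parameter tuning" mentioned above, the sketch below shortens the time dirty pages may sit in the page cache by writing to the `vm.dirty_expire_centisecs` and `vm.dirty_writeback_centisecs` sysctls. The values are assumed examples, not recommendations from this thread; the function names and the `vm_dir` parameter are made up for the sketch, and writing under /proc/sys needs root.

```python
import os

# Assumed example values: dirty data becomes eligible for writeback after 1 s,
# and the kernel's flusher threads wake up every 0.5 s.
SETTINGS = {
    "dirty_expire_centisecs": 100,
    "dirty_writeback_centisecs": 50,
}


def apply_vm_settings(settings, vm_dir="/proc/sys/vm"):
    """Write each value to its file under vm_dir (needs root on a real system)."""
    for name, value in settings.items():
        with open(os.path.join(vm_dir, name), "w") as f:
            f.write(str(value))


def read_vm_setting(name, vm_dir="/proc/sys/vm"):
    """Read one setting back as an integer."""
    with open(os.path.join(vm_dir, name)) as f:
        return int(f.read().strip())
```

Shorter intervals mean more frequent writes, which trades flash wear for a smaller window of unwritten data. On the mount-option side, ext3 and ext4 accept `commit=<seconds>` to set the journal commit interval.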
Reply by ●June 16, 2017
On Fri, 16 Jun 2017 04:13:50 -0700 (PDT), Jack <jack4747@gmail.com> wrote:
> Put a big capacitor on the power line, so that when the main power is cut
> the capacitor keeps the system running for the time needed to shut down
> gracefully.

I second this.

E.g. for 24 V (typically diesel engines) use a series diode, a _big_ storage capacitor and an SMPS with an input voltage range of at least 8-28 V; this should give enough time to sync out the data.

Using an SMPS input range much larger than 3:1 might not be that productive, since at the lower capacitor voltages the energy stored is very small (stored energy scales with the square of the voltage).
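To put rough numbers on the capacitor argument, here is a small sketch based on the usual capacitor energy formula E = C·V²/2. The component values in the usage note (10 mF, 5 W load, 85 % converter efficiency) are assumptions picked for illustration, and the diode drop is ignored.

```python
def usable_energy_joules(capacitance_f, v_start, v_min):
    """Energy released when a capacitor discharges from v_start to v_min."""
    return 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)


def holdup_time_s(capacitance_f, v_start, v_min, load_w, efficiency=0.85):
    """Approximate hold-up time for a constant-power load behind an SMPS."""
    energy = usable_energy_joules(capacitance_f, v_start, v_min)
    return energy * efficiency / load_w


def usable_fraction(v_start, v_min):
    """Fraction of the stored energy that is available down to v_min."""
    return 1.0 - (v_min / v_start) ** 2
```

With these assumptions, `holdup_time_s(0.01, 24, 8, 5)` comes out at roughly 0.44 s. `usable_fraction(24, 8)` is about 0.89, while a 10:1 input range (`usable_fraction(24, 2.4)`) only raises that to 0.99, which is the point about diminishing returns beyond 3:1.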
Reply by ●June 16, 2017
On Friday 16 June 2017 19:19, pozz wrote:
> When exactly do you call sync?
> I can call sync every time my running applications write something to
> non-volatile memory, but I don't know when the OS writes something.

After closing all files that were open for writing, so that nothing is left buffered inside the application. Files open read-only are not affected. Writes are quite short; mostly files are newly created, written and closed again, for example when the pilot changes some waypoint(s).

> Which Linux distribution do you use? I know it's not simple to mount the
> root filesystem read-only. Some Linux tasks assume that they can write to
> the root filesystem.

The first was based on SuSE 9.3, later ones on Arch. /etc, /var and /tmp are in RAM; /etc is created and filled during startup from a tarball in the initrd. Changing some configs is not simple, but this is rarely done. The software is certified (DO-178 DAL D), so changes require re-certification and a lot of paperwork.

> Do you use ext3 as the journalled filesystem?

Yes; the first system used Reiser (2002).

--
Reinhardt
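For small files that are "newly created, written and closed again", a common pattern (a swapped-in technique, not something described in this thread) is to write a temporary file, fsync it, and rename it over the old one, so that a reader sees either the old or the new content. A minimal sketch; the function name and the ".tmp" suffix are assumptions, and it still relies on the filesystem and the medium honouring the flush.

```python
import os


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) via write-then-rename."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical naming scheme for the temporary file

    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # empty the application-level buffer
        os.fsync(f.fileno())  # push the file contents out of the page cache

    os.replace(tmp_path, path)  # rename is atomic on POSIX filesystems

    # Flush the directory entry too, so the rename itself is on stable storage.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If power fails before the rename, the old file is still intact and only a stale temporary file is left behind.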
Reply by ●June 16, 2017
On Friday 16 June 2017 19:31, David Brown wrote:
> Some storage media, such as SD cards, are very sensitive to being
> corrupted if they are powered off unexpectedly. There just is no way to
> make SD card storage reliable in the face of unexpected power-offs.

You never know when the SD card is done with writing, which can include internal reorganization for wear leveling.

--
Reinhardt
Reply by ●June 16, 2017
>>> I'm interested to know which methods can be used to reduce the
>>> probability of corruption.
>>
>> Put a big capacitor on the power line, so that when the main power is cut
>> the capacitor keeps the system running for the time needed to shut down
>> gracefully.
>
> E.g. for 24 V (typically diesel engines) use a series diode, a _big_
> storage capacitor and an SMPS with an input voltage range of at least
> 8-28 V; this should give enough time to sync out the data.

You'll also need some hardware to signal the software that external power has turned off. That event will tell your system that a little bit of time remains to shut down before the caps drain and the world ends.

JJS
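On the software side, reacting to such a power-fail signal might look roughly like the sketch below. How the signal is read is hardware specific, so `power_good` is a hypothetical callable (for instance something that reads a GPIO driven by a supply-voltage comparator); the function names are made up for this sketch. Polling is used only to keep the example short, and an interrupt-driven input would react faster.

```python
import time


def wait_for_power_fail(power_good, poll_interval_s=0.01, sleep=time.sleep):
    """Block until power_good() returns False.

    `power_good` is a hypothetical callable that reads the power-fail signal.
    """
    while power_good():
        sleep(poll_interval_s)


def run_until_power_fail(power_good, emergency_actions, **kwargs):
    """Wait for the power-fail signal, then run the emergency actions in order."""
    wait_for_power_fail(power_good, **kwargs)
    for action in emergency_actions:
        action()
```

The list of emergency actions could start with `os.sync` followed by a read-only remount, as suggested earlier in the thread.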
Reply by ●June 16, 2017
On 6/16/2017 3:10 AM, pozz wrote:
> [...]
> I'm interested to know which methods can be used to reduce the probability
> of corruption.

The easiest -- and most reliable -- is to have an "early warning" of an impending power loss (regardless of whether that is a genuine power loss, the power switch being flipped off *or* the power cord being unplugged).

The notice must allow the system to operate *reliably* long enough for it to preserve whatever it *decides* to preserve (this might not be "everything", depending on the length of the warning interval).

Consider that most/many devices are normally "shut down" in an orderly fashion. So, the application can go about saving EVERYTHING that it wants to save before removing power to itself (most power switches, nowadays, are *soft* power switches -- merely REQUESTS for power to be removed). Buffered log files -- and other objects which aren't *essential* to be flushed to permanent store -- can be written out in an orderly fashion, etc. (remember, each FILE* also typically represents a buffer in the application's space)

OTOH, if you are powering down as a result of an alert of impending power FAILURE, you have a fixed amount of time ("energy") to get work done. The system designer knows how long the system can be kept operational *and* the conditions that have to be met to do so. E.g., the software may have to immediately shut down peripherals that will sap the reserve power from the power supply's filters, extend the up-time from any on-board battery backup, etc.

The software then has to decide which subset of everything that it would LIKE to preserve absolutely MUST be preserved.

It's important not to be caught in the middle of a write to (most) permanent memory systems. Most technologies have the potential to corrupt large portions of the store if a "write cycle" is not strictly implemented. It's not like just *a* datum will have a bogus value but, rather, an entire page of data may be corrupted, etc.

So, the last part of your shutdown routine is spinning in a tight loop (or, executing HALT), deliberately NOT trying to do anything lest some "write" happen to be corrupted at the instant the power fell below a level for reliable operation.

I've designed products where the power switch had a second set of contacts that allowed the software to sense if the switch had been flipped (vs. power failing) so that I could alter the shutdown routine accordingly (the product being capable of extended operation on battery as it was used in a high availability application).

> For example, I chose a sqlite database to save non-volatile,
> user-configurable settings. sqlite is transaction based, so a power
> interruption in the middle of a transaction shouldn't corrupt the entire
> database. With normal text files this should be more difficult.

That's not true. If the DBMS software is in the process of writing to persistent store *as* the power falls into the "not guaranteed to work" realm, it can corrupt OTHER memory beyond that related to your current transaction -- memory for transactions that have already been *committed*.

Given a record (in the DBMS) that conceptually looks like:

    char name[20];
    ...
    char address[40];
    ...
    time_t birthdate;

an access to "address" that is in progress when the power falls into the realm that isn't guaranteed to yield reliable operation can corrupt ANY of these stored values. Similarly, an access to some datum not shown, above, can corrupt any of *these*! You need to understand where each datum resides if you want to risk an "interrupted write".

When you are alerted of an impending power failure, you want to finish ALL writes in the "window of reliable operation" that remains. If that means prematurely terminating a commit, then that's what you do, else you risk corrupting previously completed commits!

> I know that write requests to non-volatile memories (HDD, embedded Flash
> memories) are usually buffered by the OS, and we don't know when the kernel
> will really execute them. Is there a method to force the buffered write
> requests out immediately?

You can flush the buffer caches. Many DBMS's do exactly this to keep the persistent store up to date.

I use different tablespaces for different resources based on the technology that I want "backing" those stores. I.e., do I want this part of the database to reside on disk, in DRAM, BBDRAM, BBSRAM or in FLASH?? Note that I also have to take durability into consideration, not just persistence (I don't want to be hammering on FLASH with frequent updates lest I "wear a hole through it" from overuse. :> )

[BBSRAM is the safest bet as I can ensure the time for individual write cycles, giving me a finer "safe" granularity; with BBDRAM, it's possible to "blow a row" if a write is interrupted (ditto FLASH)]

> Other aspects to consider?

Of course!

- mechanisms have to be brought to safe states in an orderly fashion
- some indication of whether the shutdown was completed successfully (or not) has to be made so the bootstrap will know how exhaustively it should test the preserved state
- external protocol connections have to be treated as local resources potentially left in an inconsistent state (they think everything is OK even though your device is now OFF)
- is there a risk of those connections being hijacked by an adversary killing YOUR power with the knowledge that you DON'T notify those connections?
- informing your user that you are attempting a recovery (on next powerup) so he is aware of the potential for data loss (and, if the recovery process is time consuming, he doesn't fret over the long delay)
- handling "double faults" (what happens if you are in the middle of recovering and power fails, again? This can be a common experience, esp. for times when power is coming up hesitantly)
- etc.

It's relatively easy to design a product that works "steady state". Most of the blemishes appear getting into -- or out of -- that steady state operating condition!
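The idea of deciding, within a fixed hold-up time, which subset of the data MUST be preserved could be sketched as a priority-ordered list of save tasks with time estimates. This is only an illustration under assumptions: the function name, the task tuples and the `clock` parameter are made up here, and the time estimates would have to come from measurements on the real hardware.

```python
import time


def emergency_shutdown(tasks, budget_s, clock=time.monotonic):
    """Run save tasks in priority order while the time budget allows.

    tasks: list of (priority, estimated_seconds, callable); a lower priority
    number means more important. A task is started only if its estimate fits
    into the time that remains, so no write is knowingly begun that cannot
    finish. Returns the number of tasks that were skipped.
    """
    deadline = clock() + budget_s
    skipped = 0
    for _priority, estimate_s, action in sorted(tasks, key=lambda t: t[0]):
        if clock() + estimate_s <= deadline:
            action()
        else:
            skipped += 1
    return skipped
```

After this returns, the shutdown routine would do nothing further (spin or halt), as described above, rather than start another write as the supply falls out of its reliable range.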