
Safety-Critical Software Design

Started by Randy Yates July 17, 2016
Hi Everyone,

Are there any formal requirements, guidelines, or recommendations for
software which will run in a safety-critical environment in the United
States or world-wide?

By a "safety-critical" environment I mean an environment in which a
failure can lead to loss of, or serious injury to, human life. For
example, automobile navigation systems, medical devices, lasers, etc.

I know there is the MISRA association and MISRA C. I am wondering if
there are others. 

My gut and experience tells me there should NEVER be software DIRECTLY
controlling signals of devices that might lead to human injury. Rather,
such devices should be controlled by discrete hardware, perhaps as
complex as an FPGA. There is always going to be a chance that a real
processor that, e.g., controls the enable signal to a laser is going to
crash with the signal enabled.

I realize that hardware-only control is subject to failures as well,
but they wouldn't seem to be nearly as likely as a coding failure.

Let me get even more specific: would it be acceptable to use a processor
running linux in such an application? My gut reaction is "Not only no,
but HELL no," but I'm not sure if I'm being overly cautious.

Any guidance, suggestions, comments, etc., would be appreciated.
-- 
Randy Yates, DSP/Embedded Firmware Developer
Digital Signal Labs
http://www.digitalsignallabs.com
Randy Yates <yates@digitalsignallabs.com> writes:
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
This is a reasonable place to start reading, and has some useful references: http://www.dwheeler.com/essays/high-assurance-floss.html
> My gut and experience tells me there should NEVER be software DIRECTLY
> controlling signals of devices that might lead to human injury. Rather,
> such devices should be controlled by discrete hardware, perhaps as
> complex as an FPGA.
Like an FPGA running a softcore microprocessor? ;-)

Yes, there are microprocessors intended for safety-critical applications.
They have things like dual CPUs running in lockstep, predictable realtime
response, ECC memory all the way through, etc. Some of the ARM Cortex-R
series are rated for this.
http://www.ti.com/ww/en/launchpad/launchpads-hercules.html has some cheap
experimentation kits.

There are some Haskell DSLs for generating realtime C code with various
error-prone issues handled for you automatically:

https://github.com/Copilot-Language
https://github.com/Copilot-Language/atom_for_copilot

I've played with Atom and ImProve a little bit but not the rest of Copilot.
You might also want to look at SPARK/Ada.
On 16-07-17 07:57, Randy Yates wrote:
> Hi Everyone,
>
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
Several, of course.
> By a "safety-critical" environment I mean an environment in which a > failure can lead to loss of, or serious injury to, human life. For > example, automobile navigation systems, medical devices, lasers, etc. > > I know there is the MISRA association and MISRA C. I am wondering if > there are others.
One starting point is IEC 61508, https://en.wikipedia.org/wiki/IEC_61508.
> My gut and experience tells me there should NEVER be software DIRECTLY
> controlling signals of devices that might lead to human injury. Rather,
> such devices should be controlled by discrete hardware, perhaps as
> complex as an FPGA.
I'm not an expert, but I believe that the standards and requirements do not prohibit or mandate certain designs, but mandate certain analyses and the resulting assurances that the design has the required safety properties. It is up to the designer to balance the complexity of the design against the complexity of the safety analysis or "safety case". MISRA is more design-oriented.
> There is always going to be a chance that a real
> processor that, e.g., controls the enable signal to a laser is going to
> crash with the signal enabled.
Often the design provides a separate control system and a separate safety system that monitors the control system and prevents unsafe behaviour.
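
As a rough sketch of that control/safety split, assuming a hypothetical laser
example (laser_enable_read(), interlock_closed(), laser_force_off() and the
500 ms limit are all invented names, not from any real BSP): the safety
channel never commands the laser, it only checks invariants and forces the
safe state when one of them is violated.

    #include <stdbool.h>
    #include <stdint.h>

    /* Invented board-support calls and limit -- a sketch only. */
    extern bool laser_enable_read(void);
    extern bool interlock_closed(void);
    extern void laser_force_off(void);          /* latches the safe state */

    #define MAX_EXPOSURE_MS  500u               /* made-up safety limit */

    /* Independent safety channel: it never commands the laser, it only
       checks invariants and forces the safe state when one is violated. */
    void safety_channel_poll(uint32_t now_ms)
    {
        static uint32_t enabled_since_ms;
        static bool was_enabled;

        bool enabled = laser_enable_read();

        if (enabled && !was_enabled) {
            enabled_since_ms = now_ms;          /* enable edge: start timing */
        }

        if (enabled &&
            (!interlock_closed() ||
             (now_ms - enabled_since_ms) > MAX_EXPOSURE_MS)) {
            laser_force_off();
        }

        was_enabled = enabled;
    }

Ideally this runs on a separate, simpler processor (or in a hardware timer),
so that a crash of the control system cannot take the monitor down with it.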
> Let me get even more specific: would it be acceptable to use a processor
> running linux in such an application? My gut reaction is "Not only no,
> but HELL no," but I'm not sure if I'm being overly cautious.
I don't think that Linux would be prohibited out of hand, but showing that a
Linux-based system has the required safety properties is probably harder
than for a simpler, more to-the-point implementation.

HTH.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
Randy Yates wrote:

> Hi Everyone,
>
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
>
> By a "safety-critical" environment I mean an environment in which a
> failure can lead to loss of, or serious injury to, human life. For
> example, automobile navigation systems, medical devices, lasers, etc.
>
> I know there is the MISRA association and MISRA C. I am wondering if
> there are others.
>
> My gut and experience tells me there should NEVER be software DIRECTLY
> controlling signals of devices that might lead to human injury. Rather,
> such devices should be controlled by discrete hardware, perhaps as
> complex as an FPGA. There is always going to be a chance that a real
> processor that, e.g., controls the enable signal to a laser is going to
> crash with the signal enabled.
>
> I realize that hardware-only control is subject to failures as well,
> but they wouldn't seem to be nearly as likely as a coding failure.
>
> Let me get even more specific: would it be acceptable to use a processor
> running linux in such an application? My gut reaction is "Not only no,
> but HELL no," but I'm not sure if I'm being overly cautious.
>
> Any guidance, suggestions, comments, etc., would be appreciated.
There are several regulations:

- IEC 61508 for industrial functional safety
- ISO 26262 for automotive
- DO-178 for avionics
- ??? for medical

I am not at home with the first two, but I develop avionics systems to
DO-178. At its highest safety level (DAL A) you have to trace every line of
code back to system/high-level/low-level requirements and develop test
procedures to verify that the software (and that means the whole system,
from boot loader to OS, to libraries, to the application) fulfills these
requirements, and also that every decision ("if (a && b) ...") and every
combination of inputs ("a" and "b" in this example) is correctly processed
(a small illustration of what that means for a single decision follows
below).

Also, every low-level requirement (LLR) must be derived from the high-level
requirements (HLRs), every HLR must be derived from the system requirements
(Sys-Reqs), and it must be shown that the step from Sys-Reqs to HLRs covers
all Sys-Reqs. The same goes for HLRs to LLRs. You even have to certify or
verify all tools that can contribute errors, or that are used for
verification and could lead to an error not being detected during
verification.

If you think you can do it in hardware instead, like an FPGA, something
similar applies: there is DO-254 for "complex hardware". For any software
that needs certification to DO-178, e.g. Level A, the hardware has to be
certified to the corresponding DO-254 level.

Often you will also have not just one CPU but at least two doing the same
job, running software written by different teams, with some kind of
interlock between them which brings the system into a fail-safe state in
case of discrepancies. In such a case you also have to assess what "fail
safe" has to mean. Shutting down the engines in mid-flight is probably not
very fail safe.

Things like MISRA C do not by themselves guarantee any safety. It is just a
set of guidelines, mostly about what not to do because a less competent
programmer might misuse a feature. Something along the lines of "Somebody
has cut himself with a knife, so we forbid the use of knives."

The most critical part is the development of the Sys-Reqs, since everything
derives from these. This is something that is in principle outside the
realm of the software designers. From what I know about some catastrophic
failures, the problem was often rooted in incorrect Sys-Reqs. One example
was the fatal crash during landing in heavy crosswind of an Airbus in
Warsaw several years ago. It was caused by an interlock that prevented
thrust reversal from being activated unless both wheels signalled "weight
on wheels". This interlock was introduced because a Lauda Air bird dropped
from the sky in Thailand after thrust reversal had erroneously been
activated during normal flight. The people who changed the Sys-Reqs for the
flight software had not thought everything through to its end. The software
people were not to blame for correctly implementing this "feature"
according to the Sys-Reqs.

My guideline is:

- First do safety assessments; identify every potential threat.
- Develop Sys-Reqs; take into account every dangerous situation and how to
  handle it (like: have an upper time limit for activation of the laser).
  Decide what to do in HW and what in SW; the upper limit might better be
  implemented as a HW monoflop.
- Review the Sys-Reqs and fix/freeze them, signed off by the customer. Can
  every Sys-Req be verified? How do different Sys-Reqs interact to create
  an additional threat (see the fatal plane crash above)?
- Develop HLRs. This will also define the overall design. Do nothing
  fancy; keep everything simple.
- Review and freeze. Make sure each HLR can be verified during integration
  and verification.
- and so on...
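
Coming back to the decision-coverage point above: this is not from the
DO-178 text itself, just a tiny invented C illustration of what it boils
down to for a single "if (a && b)" decision. Each condition must be shown
to independently change the outcome while the other is held constant.

    #include <assert.h>
    #include <stdbool.h>

    /* Invented example decision: both interlocks must agree before firing. */
    static bool fire_permitted(bool a, bool b)
    {
        if (a && b) {          /* the decision under test */
            return true;
        }
        return false;
    }

    int main(void)
    {
        /* Each condition is shown to independently change the outcome
           while the other one is held constant (the MC/DC idea). */
        assert(fire_permitted(true,  true)  == true);
        assert(fire_permitted(false, true)  == false);  /* flipping "a" flips the result */
        assert(fire_permitted(true,  false) == false);  /* flipping "b" flips the result */
        return 0;
    }

In a real DO-178 project each of these cases would trace back to a
requirement and be run as part of the formal verification, not as ad-hoc
asserts.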
In parallel, develop the requirements for the test cases and develop the
tests. These will help you during coding. Never ever change HLRs or even
Sys-Reqs while coding. When such changes need to be made, restart the
process, including the reviews and the impact analysis for the changes.

-- 
Reinhardt
On Sun, 17 Jul 2016 00:57:16 -0400, Randy Yates
<yates@digitalsignallabs.com> wrote:

>Hi Everyone,
>
>Are there any formal requirements, guidelines, or recommendations for
>software which will run in a safety-critical environment in the United
>States or world-wide?
At least keep the safety-critical and non-safety-critical systems well
apart, preferably in separate (or even different types of) hardware.
Analyzing the _small_ safety-critical system then becomes possible. Also,
some organizations might dictate what data may be transferred between the
two systems; some might allow data to flow only out of the safety-critical
system into the non-safety-critical one.
>By a "safety-critical" environment I mean an environment in which a >failure can lead to loss of, or serious injury to, human life. For >example, automobile navigation systems, medical devices, lasers, etc. > >I know there is the MISRA association and MISRA C. I am wondering if >there are others. > >My gut and experience tells me there should NEVER be software DIRECTLY >controlling signals of devices that might lead to human injury. Rather, >such devices should be controlled by discrete hardware, perhaps as >complex as an FPGA.
Use a voter system. The simplest I have seen was a bar with three solenoids
attached, each controlled by a separate system. When at least two systems
agree, the bar moves in that direction, and the bar controls something big.
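
In software, the same 2-out-of-3 vote is only a few lines. A sketch,
assuming the three inputs come from independently implemented channels
(names are illustrative):

    #include <stdbool.h>

    /* 2-out-of-3 majority vote on a boolean command (e.g. "close the valve").
       a, b and c are assumed to come from three independent channels. */
    static bool vote_2oo3(bool a, bool b, bool c)
    {
        return (a && b) || (a && c) || (b && c);
    }

    /* For an analogue setpoint, a mid-value select does the same job. */
    static int median3(int a, int b, int c)
    {
        if ((a >= b && a <= c) || (a <= b && a >= c)) return a;
        if ((b >= a && b <= c) || (b <= a && b >= c)) return b;
        return c;
    }

The point of the mechanical version, of course, is that the vote itself is
not another piece of software that can fail.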
>There is always going to be a chance that a real
>processor that, e.g., controls the enable signal to a laser is going to
>crash with the signal enabled.
Redundancy helps, especially if the systems are built from different
hardware and by different programming teams. As a last resort, use springs
or gravity to handle power loss and similar problems.
>I realize that hardware-only control is subject to failures as well,
>but they wouldn't seem to be nearly as likely as a coding failure.
Gravity might be unreliable during an earthquake :-). It is important that
different safety systems are separate from each other and preferably
implemented with different technology. Remember Fukushima: they had a lot
of redundant emergency-cooling diesel generators, but they all got wet from
a single tsunami wave, and the rest is history.
>Let me get even more specific: would it be acceptable to use a processor
>running linux in such an application? My gut reaction is "Not only no,
>but HELL no," but I'm not sure if I'm being overly cautious.
The question is, why would you need such a complex general-purpose
operating system to run a small safety-critical system? That said, I
wouldn't be too surprised to find some heavily stripped-down Linux-based
systems out there.
>Any guidance, suggestions, comments, etc., would be appreciated.
On 7/16/2016 9:57 PM, Randy Yates wrote:
> Hi Everyone,
>
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
>
> By a "safety-critical" environment I mean an environment in which a
> failure can lead to loss of, or serious injury to, human life. For
> example, automobile navigation systems, medical devices, lasers, etc.
Like, say, a self-driving car? An X-ray machine? A CAT scanner? Fly-by-wire
aircraft? Countless process-control devices?
> I know there is the MISRA association and MISRA C. I am wondering if
> there are others.
IME (in a few different "regulated" industries), *process* seems to be what
gets stressed. *How* did you come to this design? Can you SHOW your "work"?
If there's any hand-waving, EXPECT to have problems!

MISRA is really inconsequential -- despite being adopted by some
industries. It's like telling folks not to run while carrying scissors.
They're guidelines, just like HR has guidelines for whose CVs they'll
consider (the analogy is deliberate: guidelines can be unnecessarily
limiting! Sort of like trying to avoid putting a goto in your code!)

Do you put an explicit test to verify the divisor in each division instance
is NOT zero? That the argument to each sqrt() is non-negative? (Why not??)
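
For what it's worth, a sketch of the kind of explicit guards being asked
about. report_defect() and the fallback parameters are invented
placeholders; a real project would derive the fallback behaviour from its
requirements rather than pick it ad hoc.

    #include <math.h>

    extern void report_defect(const char *what);   /* invented error hook */

    /* Division with an explicit zero-divisor check instead of trusting the caller. */
    static double checked_div(double num, double den, double fallback)
    {
        if (den == 0.0) {
            report_defect("division by zero");
            return fallback;
        }
        return num / den;
    }

    /* sqrt() with an explicit domain check. */
    static double checked_sqrt(double x, double fallback)
    {
        if (x < 0.0) {
            report_defect("sqrt of negative argument");
            return fallback;
        }
        return sqrt(x);
    }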
> My gut and experience tells me there should NEVER be software DIRECTLY
> controlling signals of devices that might lead to human injury. Rather,
> such devices should be controlled by discrete hardware, perhaps as
> complex as an FPGA. There is always going to be a chance that a real
> processor that, e.g., controls the enable signal to a laser is going to
> crash with the signal enabled.
That's the purpose of failsafes and watchdogs. Note that something (i.e.,
software) still has to TELL that hardware to allow the device to turn on.
So, you're still "trusting" the software...

I use hardware protections for things like keeping personnel and body parts
out of areas where they can be harmed by mechanisms in motion. I.e.,
"guards" with interlocks that make it hard for a user to casually/carelessly
put himself in harm's way.
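
A minimal sketch of that watchdog idea, assuming a hypothetical external
watchdog that gates the enable line in hardware; system_sane() and
watchdog_kick() are invented names, not a real API.

    #include <stdbool.h>

    /* Invented board-support calls: a hardware watchdog (e.g. an external
       watchdog IC) gates the enable line outside the CPU's control and
       removes it if it is not serviced in time. */
    extern bool system_sane(void);     /* application-level sanity checks */
    extern void watchdog_kick(void);   /* hardware drops the enable on timeout */

    void control_loop_iteration(void)
    {
        /* ... normal control work ... */

        /* Service the watchdog only after the checks pass.  If the code
           crashes, hangs, or the checks fail, the hardware times out and
           removes the enable signal without any software involvement. */
        if (system_sane()) {
            watchdog_kick();
        }
    }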
> I realize that hardware-only control is subject to failures as well,
> but they wouldn't seem to be nearly as likely as a coding failure.
>
> Let me get even more specific: would it be acceptable to use a processor
> running linux in such an application? My gut reaction is "Not only no,
> but HELL no," but I'm not sure if I'm being overly cautious.
The rationale for "hell no" is that you can't vouch for the code you're "leveraging"! And, probably couldn't find a small army of people who, in concert, *could*!
> Any guidance, suggestions, comments, etc., would be appreciated.
As with any design: make everything as simple as it can be -- but no
simpler! You want to be able to convince yourself that you know EVERYTHING
that can affect a safety-related issue -- without having to wonder how much
stuff is obfuscated in the countless "black boxes" in most designs.
Remember, those black boxes can evolve -- in ways that make your
safety-related ASSUMPTIONS about them invalid!

When I design a solution (hardware and/or software), I am ALWAYS asking
myself: how can I *break* this? And, the goal is NOT to convince myself
that the vulnerability CAN'T HAPPEN but, rather, to figure out how I will
protect against it happening (as unlikely as it may seem!).

And, *record* all of this so it's obvious (to yourself and others) why you
did something that *seems* "impossible".
On 7/17/2016 3:09 AM, Don Y wrote:
> When I design a solution (hardware and/or software), I am ALWAYS
> asking myself: how can I *break* this? And, the goal is NOT to
> convince myself that the vulnerability CAN'T HAPPEN but, rather,
> to figure out how I will protect against it happening (as unlikely
> as it may seem!).
>
> And, *record* all of this so it's obvious (to yourself and others)
> why you did something that *seems* "impossible".
An actual example of this sort of "can't happen" thinking:

I frequently design products that move mechanisms. Often, MASSIVE
mechanisms (e.g., 10HP motors).

Invariably, there are limit switches on the extremes of travel. And, these
aren't always "hardware interlocks" but, rather, rely on software to
interpret their state and adjust the motion of the mechanism, accordingly.
I.e., STOP moving IN THAT DIRECTION when the limit is reached (you don't
want to cut power to the motor when it hits the limit because you will
eventually want to move OUT of that limit!)

So, the code effectively looks like:

    if (moving_leftward) {
        if (left_limit_switch_entered) {
            stop();
        }
    }
    if (moving_rightward) {
        if (right_limit_switch_entered) {
            stop();
        }
    }

Makes sense, right?

But, it's relatively easy for a motor to be wired incorrectly. Or, the
harness designed without an adequate key. Or, the limit switch harnesses to
be swapped. Or...

In these cases, you can run the mechanism out *past* the limit switch and
damage the equipment or personnel ("Yeah, the bed is in motion but I'm
safe -- it will stop at the limit switch before it reaches me!")

So, I watch *both* limit switches regardless of which direction the
mechanism is traveling. If I see a transition on the "wrong" switch, I
bring about the STOP *and* throw an error: "The mechanism was traveling
right and I saw activity on the LEFT limit switch! That's not possible!
Yet, I SAW IT!!"
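
A sketch of that cross-check in the same style as the pseudocode above
(raise_error() and the extern declarations are purely illustrative):

    #include <stdbool.h>

    /* Illustrative declarations matching the pseudocode above. */
    extern bool moving_leftward, moving_rightward;
    extern bool left_limit_switch_entered, right_limit_switch_entered;
    extern void stop(void);
    extern void raise_error(const char *msg);   /* invented error hook */

    /* Watch BOTH limit switches, whatever direction we think we're moving. */
    void check_limit_switch_sanity(void)
    {
        if (moving_leftward && right_limit_switch_entered) {
            stop();
            raise_error("RIGHT limit tripped while moving LEFT -- wiring/assumption fault");
        }
        if (moving_rightward && left_limit_switch_entered) {
            stop();
            raise_error("LEFT limit tripped while moving RIGHT -- wiring/assumption fault");
        }
    }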
On 16-07-17 13:32, Don Y wrote:
> On 7/17/2016 3:09 AM, Don Y wrote:
>> When I design a solution (hardware and/or software), I am ALWAYS
>> asking myself: how can I *break* this? And, the goal is NOT to
>> convince myself that the vulnerability CAN'T HAPPEN but, rather,
>> to figure out how I will protect against it happening (as unlikely
>> as it may seem!).
>>
>> And, *record* all of this so it's obvious (to yourself and others)
>> why you did something that *seems* "impossible".
>
> An actual example of this sort of "can't happen" thinking:
>
> I frequently design products that move mechanisms. Often, MASSIVE
> mechanisms (e.g., 10HP motors).
>
> Invariably, there are limit switches on the extremes of travel.
> And, these aren't always "hardware interlocks" but, rather, rely
> on software to interpret their state and adjust the motion of
> the mechanism, accordingly. I.e., STOP moving IN THAT DIRECTION
> when the limit is reached (you don't want to cut power to the motor
> when it hits the limit because you will eventually want to move OUT
> of that limit!)
Sometimes similar errors happen with hardware interlocks. (Anecdote
warning:)

Some years ago, an error of mine, combined with such HW interlocks, nearly
destroyed the dome hatch of the Nordic Optical Telescope, sited on one of
the Canary Islands.

The hatch is a rectangular section of the hemispherical telescope dome,
about 3 meters wide in the azimuth direction, perhaps 6 meters long in the
elevation direction, and (I think) some hundreds of kilograms in weight.
The hatch can slide along its curved long edges from one side of the dome,
where it covers the similarly shaped dome opening, up over the peak of the
dome to the other side, uncovering the dome opening and letting the
telescope view the sky.

The hatch is driven by an electrical motor. The control computer can send a
"close" command or an "open" command, and there are electrical limit
switches in the fully-closed position and the fully-open position which
stop the motion.

One night the astronomers had given the command to open the hatch, but
before it was fully open they noticed that humid mist was blowing in over
the mountain-top, so they gave the command to close the hatch. The computer
switched off the "open" command and immediately switched on the "close"
command.

Turned out the motor is of a type that has to be stopped before its
direction of motion can be reversed; if already running in the "open"
direction, switching the command from "open" to "close" leaves the motor
happily running in the "open" direction, but energized by the "close"
command.

So, after a while, the hatch activated the fully-open limit switch. Turned
out that this switch is wired to interrupt only the "open" command, not the
"close" command. So the hatch kept moving in the "open" direction until it
hit the mechanical stops, which turned out not to be strong enough to stop
the motion... The result was some bent metal, but luckily an engineer was
present who could prop up the hatch with some wooden beams, which kept it
from falling entirely off the dome.

(Later analysis of the error revealed that the hatch was originally
designed to be operated from a manual handset which had two press-and-hold
buttons, one for "open" and the other for "close". The handset also had a
small sliding cover such that only one of the buttons was accessible and
pressable at a time. This ensured that the operator could not switch
immediately from an "open" command to a "close" command or vice versa, and
prevented the problem. However, this information had not been translated
into requirements on the control SW. Bummer. But this illustrates that
safety is a system property.)

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
On Sun, 17 Jul 2016 00:57:16 -0400, Randy Yates wrote:

> Hi Everyone,
>
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
>
> By a "safety-critical" environment I mean an environment in which a
> failure can lead to loss of, or serious injury to, human life. For
> example, automobile navigation systems, medical devices, lasers, etc.
>
> I know there is the MISRA association and MISRA C. I am wondering if
> there are others.
>
> My gut and experience tells me there should NEVER be software DIRECTLY
> controlling signals of devices that might lead to human injury. Rather,
> such devices should be controlled by discrete hardware, perhaps as
> complex as an FPGA. There is always going to be a chance that a real
> processor that, e.g., controls the enable signal to a laser is going to
> crash with the signal enabled.
>
> I realize that hardware-only control is subject to failures as well, but
> they wouldn't seem to be nearly as likely as a coding failure.
>
> Let me get even more specific: would it be acceptable to use a processor
> running linux in such an application? My gut reaction is "Not only no,
> but HELL no," but I'm not sure if I'm being overly cautious.
>
> Any guidance, suggestions, comments, etc., would be appreciated.
Try searching on the terms? The standards that I know about are DO-178,
MIL-STD-498 and MIL-STD-2167. Searching on some of the standards given
should give you some threads to start looking into.

Regardless of your gut and experience, there are plenty of places where
software DOES directly control signals or devices that might lead to injury
or death -- but the software design methods are much more stringent.

DO-178 lists five levels of criticality, ranging from "E" (the in-flight
movie doesn't work) through "A" (smoking hole in the ground surrounded by
TV crews). "E" is pretty much "anything goes" -- i.e., go ahead and use
commercial software. The rule of thumb is that each time you bump up a
level, the amount of work on the software alone goes up by, roughly, a
factor of 7, and (as mentioned) the hardware and all the tools used in the
software must march with the software design.

I can't say that the details are the same for the FDA-approved stuff, but
I've got friends who worked on pacemaker software, and the general vibe was
the same. For instance, Protocol Systems, before they got bought by Welch
Allyn, would build a whole prototype, do animal tests on it, then design
the whole pacemaker again from scratch, using lessons learned. I think
Welch Allyn did the same thing.

-- 
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
On 17/07/2016 05:57, Randy Yates wrote:
> Hi Everyone,
>
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
>
Gosh, lots. Here is a start:

For industrial, look at IEC 61508 and all its industry-specific derivatives.
For aerospace, look at DO-178C.
For rail, look at EN 50128/EN 50129.
For automotive, look at ISO 26262.
For medical, look at IEC 62304 and/or FDA 510(k).

etc. ... try some simple Google searches for whichever industry you are
working in.

-- 
Regards,
Richard.

+ http://www.FreeRTOS.org
The de facto standard, downloaded every 4.2 minutes during 2015.

+ http://www.FreeRTOS.org/plus
IoT, Trace, Certification, TCP/IP, FAT FS, Training, and more...
