
Safety-Critical Software Design

Started by Randy Yates July 17, 2016
Hi Everyone,

Are there any formal requirements, guidelines, or recommendations for
software which will run in a safety-critical environment in the United
States or world-wide?

By a "safety-critical" environment I mean an environment in which a
failure can lead to loss of, or serious injury to, human life. For
example, automobile navigation systems, medical devices, lasers, etc.

I know there is the MISRA association and MISRA C. I am wondering if
there are others. 

My gut and experience tells me there should NEVER be software DIRECTLY
controlling signals of devices that might lead to human injury. Rather,
such devices should be controlled by discrete hardware, perhaps as
complex as an FPGA. There is always going to be a chance that a real
processor that, e.g., controls the enable signal to a laser is going to
crash with the signal enabled.

I realize that hardware-only control is subject to failures as well,
but they wouldn't seem to be nearly as likely as a coding failure.

Let me get even more specific: would it be acceptable to use a processor
running linux in such an application? My gut reaction is "Not only no,
but HELL no," but I'm not sure if I'm being overly cautious.

Any guidance, suggestions, comments, etc., would be appreciated.
-- 
Randy Yates, DSP/Embedded Firmware Developer
Digital Signal Labs
http://www.digitalsignallabs.com
Randy Yates <yates@digitalsignallabs.com> writes:
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
This is a reasonable place to start reading, and has some useful references: http://www.dwheeler.com/essays/high-assurance-floss.html
> My gut and experience tells me there should NEVER be software DIRECTLY
> controlling signals of devices that might lead to human injury. Rather,
> such devices should be controlled by discrete hardware, perhaps as
> complex as an FPGA.
Like an FPGA running a softcore microprocessor? ;-)

Yes, there are microprocessors intended for safety-critical applications.
They have things like dual CPUs running in lockstep, predictable realtime
response, ECC memory all the way through, etc. Some of the ARM Cortex-R
series are rated for this.
http://www.ti.com/ww/en/launchpad/launchpads-hercules.html has some cheap
experimentation kits.

There are some Haskell DSLs for generating realtime C code with various
error-prone issues handled for you automatically:

https://github.com/Copilot-Language
https://github.com/Copilot-Language/atom_for_copilot

I've played with Atom and ImProve a little bit but not the rest of Copilot.
You might also want to look at SPARK/Ada.
On 16-07-17 07:57, Randy Yates wrote:
> Hi Everyone,
>
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
Several, of course.
> By a "safety-critical" environment I mean an environment in which a > failure can lead to loss of, or serious injury to, human life. For > example, automobile navigation systems, medical devices, lasers, etc. > > I know there is the MISRA association and MISRA C. I am wondering if > there are others.
One starting point is IEC 61508, https://en.wikipedia.org/wiki/IEC_61508.
> My gut and experience tells me there should NEVER be software DIRECTLY
> controlling signals of devices that might lead to human injury. Rather,
> such devices should be controlled by discrete hardware, perhaps as
> complex as an FPGA.
I'm not an expert, but I believe that the standards and requirements do not prohibit or mandate certain designs, but mandate certain analyses and the resulting assurances that the design has the required safety properties. It is up to the designer to balance the complexity of the design against the complexity of the safety analysis or "safety case". MISRA is more design-oriented.
> There is always going to be a chance that a real
> processor that, e.g., controls the enable signal to a laser is going to
> crash with the signal enabled.
Often the design provides a separate control system and a separate safety system that monitors the control system and prevents unsafe behaviour.
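
As a rough sketch of that control/safety split, assuming a hypothetical laser
example (laser_enable_read(), interlock_closed(), laser_force_off() and the
500 ms limit are all invented names, not from any real BSP): the safety
channel never commands the laser, it only checks invariants and forces the
safe state when one of them is violated.

    #include <stdbool.h>
    #include <stdint.h>

    /* Invented board-support calls and limit -- a sketch only. */
    extern bool laser_enable_read(void);
    extern bool interlock_closed(void);
    extern void laser_force_off(void);          /* latches the safe state */

    #define MAX_EXPOSURE_MS  500u               /* made-up safety limit */

    /* Independent safety channel: it never commands the laser, it only
       checks invariants and forces the safe state when one is violated. */
    void safety_channel_poll(uint32_t now_ms)
    {
        static uint32_t enabled_since_ms;
        static bool was_enabled;

        bool enabled = laser_enable_read();

        if (enabled && !was_enabled) {
            enabled_since_ms = now_ms;          /* enable edge: start timing */
        }

        if (enabled &&
            (!interlock_closed() ||
             (now_ms - enabled_since_ms) > MAX_EXPOSURE_MS)) {
            laser_force_off();
        }

        was_enabled = enabled;
    }

Ideally this runs on a separate, simpler processor (or in a hardware timer),
so that a crash of the control system cannot take the monitor down with it.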
> Let me get even more specific: would it be acceptable to use a processor
> running linux in such an application? My gut reaction is "Not only no,
> but HELL no," but I'm not sure if I'm being overly cautious.
I don't think that Linux would be prohibited out of hand, but showing that a
Linux-based system has the required safety properties is probably harder
than for a simpler, more to-the-point implementation.

HTH.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
Randy Yates wrote:

> Hi Everyone,
>
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
>
> By a "safety-critical" environment I mean an environment in which a
> failure can lead to loss of, or serious injury to, human life. For
> example, automobile navigation systems, medical devices, lasers, etc.
>
> I know there is the MISRA association and MISRA C. I am wondering if
> there are others.
>
> My gut and experience tells me there should NEVER be software DIRECTLY
> controlling signals of devices that might lead to human injury. Rather,
> such devices should be controlled by discrete hardware, perhaps as
> complex as an FPGA. There is always going to be a chance that a real
> processor that, e.g., controls the enable signal to a laser is going to
> crash with the signal enabled.
>
> I realize that hardware-only control is subject to failures as well,
> but they wouldn't seem to be nearly as likely as a coding failure.
>
> Let me get even more specific: would it be acceptable to use a processor
> running linux in such an application? My gut reaction is "Not only no,
> but HELL no," but I'm not sure if I'm being overly cautious.
>
> Any guidance, suggestions, comments, etc., would be appreciated.
There are several regulations:

- IEC 61508 for industrial functional safety
- ISO 26262 for automotive
- DO-178 for avionics
- ??? for medical

I am not at home with the first two, but I develop avionics systems to
DO-178. At its highest safety level (DAL A) you have to trace every line of
code back to system/high-level/low-level requirements and develop test
procedures to verify that the software (and that means the whole system,
from boot loader to OS, to libraries, to the application) fulfills these
requirements, and also that every decision ("if (a && b) ...") and every
combination of inputs ("a" and "b" in this example) is correctly processed
(a small illustration of what that means for a single decision follows
below).

Also, every low-level requirement (LLR) must be derived from the high-level
requirements (HLRs), every HLR must be derived from the system requirements
(Sys-Reqs), and it must be shown that the step from Sys-Reqs to HLRs covers
all Sys-Reqs. The same goes for HLRs to LLRs. You even have to certify or
verify all tools that can contribute errors, or that are used for
verification and could lead to an error not being detected during
verification.

If you think you can do it in hardware instead, like an FPGA, something
similar applies: there is DO-254 for "complex hardware". For any software
that needs certification to DO-178, e.g. Level A, the hardware has to be
certified to the corresponding DO-254 level.

Often you will also have not just one CPU but at least two doing the same
job, running software written by different teams, with some kind of
interlock between them which brings the system into a fail-safe state in
case of discrepancies. In such a case you also have to assess what "fail
safe" has to mean. Shutting down the engines in mid-flight is probably not
very fail safe.

Things like MISRA C do not by themselves guarantee any safety. It is just a
set of guidelines, mostly about what not to do because a less competent
programmer might misuse a feature. Something along the lines of "Somebody
has cut himself with a knife, so we forbid the use of knives."

The most critical part is the development of the Sys-Reqs, since everything
derives from these. This is something that is in principle outside the
realm of the software designers. From what I know about some catastrophic
failures, the problem was often rooted in incorrect Sys-Reqs. One example
was the fatal crash during landing in heavy crosswind of an Airbus in
Warsaw several years ago. It was caused by an interlock that prevented
thrust reversal from being activated unless both wheels signalled "weight
on wheels". This interlock was introduced because a Lauda Air bird dropped
from the sky in Thailand after thrust reversal had erroneously been
activated during normal flight. The people who changed the Sys-Reqs for the
flight software had not thought everything through to its end. The software
people were not to blame for correctly implementing this "feature"
according to the Sys-Reqs.

My guideline is:

- First do safety assessments; identify every potential threat.
- Develop Sys-Reqs; take into account every dangerous situation and how to
  handle it (like: have an upper time limit for activation of the laser).
  Decide what to do in HW and what in SW; the upper limit might better be
  implemented as a HW monoflop.
- Review the Sys-Reqs and fix/freeze them, signed off by the customer. Can
  every Sys-Req be verified? How do different Sys-Reqs interact to create
  an additional threat (see the fatal plane crash above)?
- Develop HLRs. This will also define the overall design. Do nothing
  fancy; keep everything simple.
- Review and freeze. Make sure each HLR can be verified during integration
  and verification.
- and so on...
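
Coming back to the decision-coverage point above: this is not from the
DO-178 text itself, just a tiny invented C illustration of what it boils
down to for a single "if (a && b)" decision. Each condition must be shown
to independently change the outcome while the other is held constant.

    #include <assert.h>
    #include <stdbool.h>

    /* Invented example decision: both interlocks must agree before firing. */
    static bool fire_permitted(bool a, bool b)
    {
        if (a && b) {          /* the decision under test */
            return true;
        }
        return false;
    }

    int main(void)
    {
        /* Each condition is shown to independently change the outcome
           while the other one is held constant (the MC/DC idea). */
        assert(fire_permitted(true,  true)  == true);
        assert(fire_permitted(false, true)  == false);  /* flipping "a" flips the result */
        assert(fire_permitted(true,  false) == false);  /* flipping "b" flips the result */
        return 0;
    }

In a real DO-178 project each of these cases would trace back to a
requirement and be run as part of the formal verification, not as ad-hoc
asserts.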
In parallel, develop the requirements for the test cases and develop the
tests. These will help you during coding. Never ever change HLRs or even
Sys-Reqs while coding. When such changes need to be made, restart the
process, including the reviews and the impact analysis for the changes.

-- 
Reinhardt
On Sun, 17 Jul 2016 00:57:16 -0400, Randy Yates
<yates@digitalsignallabs.com> wrote:

>Hi Everyone,
>
>Are there any formal requirements, guidelines, or recommendations for
>software which will run in a safety-critical environment in the United
>States or world-wide?
At least keep the safety-critical and non-safety-critical systems well
apart, preferably in separate (or even different types of) hardware.
Analyzing the _small_ safety-critical system then becomes possible. Also,
some organizations might dictate what data may be transferred between the
two systems; some might allow data to flow only out of the safety-critical
system into the non-safety-critical one.
>By a "safety-critical" environment I mean an environment in which a >failure can lead to loss of, or serious injury to, human life. For >example, automobile navigation systems, medical devices, lasers, etc. > >I know there is the MISRA association and MISRA C. I am wondering if >there are others. > >My gut and experience tells me there should NEVER be software DIRECTLY >controlling signals of devices that might lead to human injury. Rather, >such devices should be controlled by discrete hardware, perhaps as >complex as an FPGA.
Use a voter system. The simplest I have seen was a bar with three solenoids
attached, each controlled by a separate system. When at least two systems
agree, the bar moves in that direction, and the bar controls something big.
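
In software, the same 2-out-of-3 vote is only a few lines. A sketch,
assuming the three inputs come from independently implemented channels
(names are illustrative):

    #include <stdbool.h>

    /* 2-out-of-3 majority vote on a boolean command (e.g. "close the valve").
       a, b and c are assumed to come from three independent channels. */
    static bool vote_2oo3(bool a, bool b, bool c)
    {
        return (a && b) || (a && c) || (b && c);
    }

    /* For an analogue setpoint, a mid-value select does the same job. */
    static int median3(int a, int b, int c)
    {
        if ((a >= b && a <= c) || (a <= b && a >= c)) return a;
        if ((b >= a && b <= c) || (b <= a && b >= c)) return b;
        return c;
    }

The point of the mechanical version, of course, is that the vote itself is
not another piece of software that can fail.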
>There is always going to be a chance that a real
>processor that, e.g., controls the enable signal to a laser is going to
>crash with the signal enabled.
Redundancy helps, especially if the systems are built from different
hardware and by different programming teams. As a last resort, use springs
or gravity to handle power loss and similar problems.
>I realize that hardware-only control is subject to failures as well,
>but they wouldn't seem to be nearly as likely as a coding failure.
Gravity might be unreliable during an earthquake :-). It is important that
different safety systems are separate from each other and preferably
implemented with different technology. Remember Fukushima: they had a lot
of redundant emergency-cooling diesel generators, but they all got wet from
a single tsunami wave, and the rest is history.
>Let me get even more specific: would it be acceptable to use a processor
>running linux in such an application? My gut reaction is "Not only no,
>but HELL no," but I'm not sure if I'm being overly cautious.
The question is, why would you need such a complex general-purpose
operating system to run a small safety-critical system? That said, I
wouldn't be too surprised to find some heavily stripped-down Linux-based
systems out there.
>Any guidance, suggestions, comments, etc., would be appreciated.
On 7/16/2016 9:57 PM, Randy Yates wrote:
> Hi Everyone,
>
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
>
> By a "safety-critical" environment I mean an environment in which a
> failure can lead to loss of, or serious injury to, human life. For
> example, automobile navigation systems, medical devices, lasers, etc.
Like, say, a self-driving car? An X-ray machine? A CAT scanner? Fly-by-wire
aircraft? Countless process-control devices?
> I know there is the MISRA association and MISRA C. I am wondering if
> there are others.
IME (in a few different "regulated" industries), *process* seems to be what
gets stressed. *How* did you come to this design? Can you SHOW your "work"?
If there's any hand-waving, EXPECT to have problems!

MISRA is really inconsequential -- despite being adopted by some
industries. It's like telling folks not to run while carrying scissors.
They're guidelines, just like HR has guidelines for whose CVs they'll
consider (the analogy is deliberate: guidelines can be unnecessarily
limiting! Sort of like trying to avoid putting a goto in your code!)

Do you put an explicit test to verify the divisor in each division instance
is NOT zero? That the argument to each sqrt() is non-negative? (Why not??)
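
For what it's worth, a sketch of the kind of explicit guards being asked
about. report_defect() and the fallback parameters are invented
placeholders; a real project would derive the fallback behaviour from its
requirements rather than pick it ad hoc.

    #include <math.h>

    extern void report_defect(const char *what);   /* invented error hook */

    /* Division with an explicit zero-divisor check instead of trusting the caller. */
    static double checked_div(double num, double den, double fallback)
    {
        if (den == 0.0) {
            report_defect("division by zero");
            return fallback;
        }
        return num / den;
    }

    /* sqrt() with an explicit domain check. */
    static double checked_sqrt(double x, double fallback)
    {
        if (x < 0.0) {
            report_defect("sqrt of negative argument");
            return fallback;
        }
        return sqrt(x);
    }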
> My gut and experience tells me there should NEVER be software DIRECTLY
> controlling signals of devices that might lead to human injury. Rather,
> such devices should be controlled by discrete hardware, perhaps as
> complex as an FPGA. There is always going to be a chance that a real
> processor that, e.g., controls the enable signal to a laser is going to
> crash with the signal enabled.
That's the purpose of failsafes and watchdogs. Note that something (i.e.,
software) still has to TELL that hardware to allow the device to turn on.
So, you're still "trusting" the software...

I use hardware protections for things like keeping personnel and body parts
out of areas where they can be harmed by mechanisms in motion. I.e.,
"guards" with interlocks that make it hard for a user to casually/carelessly
put himself in harm's way.
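
A minimal sketch of that watchdog idea, assuming a hypothetical external
watchdog that gates the enable line in hardware; system_sane() and
watchdog_kick() are invented names, not a real API.

    #include <stdbool.h>

    /* Invented board-support calls: a hardware watchdog (e.g. an external
       watchdog IC) gates the enable line outside the CPU's control and
       removes it if it is not serviced in time. */
    extern bool system_sane(void);     /* application-level sanity checks */
    extern void watchdog_kick(void);   /* hardware drops the enable on timeout */

    void control_loop_iteration(void)
    {
        /* ... normal control work ... */

        /* Service the watchdog only after the checks pass.  If the code
           crashes, hangs, or the checks fail, the hardware times out and
           removes the enable signal without any software involvement. */
        if (system_sane()) {
            watchdog_kick();
        }
    }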
> I realize that hardware-only control is subject to failures as well,
> but they wouldn't seem to be nearly as likely as a coding failure.
>
> Let me get even more specific: would it be acceptable to use a processor
> running linux in such an application? My gut reaction is "Not only no,
> but HELL no," but I'm not sure if I'm being overly cautious.
The rationale for "hell no" is that you can't vouch for the code you're "leveraging"! And, probably couldn't find a small army of people who, in concert, *could*!
> Any guidance, suggestions, comments, etc., would be appreciated.
As with any design: make everything as simple as it can be -- but no
simpler! You want to be able to convince yourself that you know EVERYTHING
that can affect a safety-related issue -- without having to wonder how much
stuff is obfuscated in the countless "black boxes" in most designs.
Remember, those black boxes can evolve -- in ways that make your
safety-related ASSUMPTIONS about them invalid!

When I design a solution (hardware and/or software), I am ALWAYS asking
myself: how can I *break* this? And, the goal is NOT to convince myself
that the vulnerability CAN'T HAPPEN but, rather, to figure out how I will
protect against it happening (as unlikely as it may seem!).

And, *record* all of this so it's obvious (to yourself and others) why you
did something that *seems* "impossible".
On 7/17/2016 3:09 AM, Don Y wrote:
> When I design a solution (hardware and/or software), I am ALWAYS
> asking myself: how can I *break* this? And, the goal is NOT to
> convince myself that the vulnerability CAN'T HAPPEN but, rather,
> to figure out how I will protect against it happening (as unlikely
> as it may seem!).
>
> And, *record* all of this so it's obvious (to yourself and others)
> why you did something that *seems* "impossible".
An actual example of this sort of "can't happen" thinking:

I frequently design products that move mechanisms. Often, MASSIVE
mechanisms (e.g., 10HP motors).

Invariably, there are limit switches on the extremes of travel. And, these
aren't always "hardware interlocks" but, rather, rely on software to
interpret their state and adjust the motion of the mechanism, accordingly.
I.e., STOP moving IN THAT DIRECTION when the limit is reached (you don't
want to cut power to the motor when it hits the limit because you will
eventually want to move OUT of that limit!)

So, the code effectively looks like:

    if (moving_leftward) {
        if (left_limit_switch_entered) {
            stop();
        }
    }
    if (moving_rightward) {
        if (right_limit_switch_entered) {
            stop();
        }
    }

Makes sense, right?

But, it's relatively easy for a motor to be wired incorrectly. Or, the
harness designed without an adequate key. Or, the limit switch harnesses to
be swapped. Or...

In these cases, you can run the mechanism out *past* the limit switch and
damage the equipment or personnel ("Yeah, the bed is in motion but I'm
safe -- it will stop at the limit switch before it reaches me!")

So, I watch *both* limit switches regardless of which direction the
mechanism is traveling. If I see a transition on the "wrong" switch, I
bring about the STOP *and* throw an error: "The mechanism was traveling
right and I saw activity on the LEFT limit switch! That's not possible!
Yet, I SAW IT!!"
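
A sketch of that cross-check in the same style as the pseudocode above
(raise_error() and the extern declarations are purely illustrative):

    #include <stdbool.h>

    /* Illustrative declarations matching the pseudocode above. */
    extern bool moving_leftward, moving_rightward;
    extern bool left_limit_switch_entered, right_limit_switch_entered;
    extern void stop(void);
    extern void raise_error(const char *msg);   /* invented error hook */

    /* Watch BOTH limit switches, whatever direction we think we're moving. */
    void check_limit_switch_sanity(void)
    {
        if (moving_leftward && right_limit_switch_entered) {
            stop();
            raise_error("RIGHT limit tripped while moving LEFT -- wiring/assumption fault");
        }
        if (moving_rightward && left_limit_switch_entered) {
            stop();
            raise_error("LEFT limit tripped while moving RIGHT -- wiring/assumption fault");
        }
    }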
On 16-07-17 13:32, Don Y wrote:
> On 7/17/2016 3:09 AM, Don Y wrote:
>> When I design a solution (hardware and/or software), I am ALWAYS
>> asking myself: how can I *break* this? And, the goal is NOT to
>> convince myself that the vulnerability CAN'T HAPPEN but, rather,
>> to figure out how I will protect against it happening (as unlikely
>> as it may seem!).
>>
>> And, *record* all of this so it's obvious (to yourself and others)
>> why you did something that *seems* "impossible".
>
> An actual example of this sort of "can't happen" thinking:
>
> I frequently design products that move mechanisms. Often, MASSIVE
> mechanisms (e.g., 10HP motors).
>
> Invariably, there are limit switches on the extremes of travel.
> And, these aren't always "hardware interlocks" but, rather, rely
> on software to interpret their state and adjust the motion of
> the mechanism, accordingly. I.e., STOP moving IN THAT DIRECTION
> when the limit is reached (you don't want to cut power to the motor
> when it hits the limit because you will eventually want to move OUT
> of that limit!)
Sometimes similar errors happen with hardware interlocks. (Anecdote
warning:)

Some years ago, an error of mine, combined with such HW interlocks, nearly
destroyed the dome hatch of the Nordic Optical Telescope, sited on one of
the Canary Islands.

The hatch is a rectangular section of the hemispherical telescope dome,
about 3 meters wide in the azimuth direction, perhaps 6 meters long in the
elevation direction, and (I think) some hundreds of kilograms in weight.
The hatch can slide along its curved long edges from one side of the dome,
where it covers the similarly shaped dome opening, up over the peak of the
dome to the other side, uncovering the dome opening and letting the
telescope view the sky.

The hatch is driven by an electrical motor. The control computer can send a
"close" command or an "open" command, and there are electrical limit
switches in the fully-closed position and the fully-open position which
stop the motion.

One night the astronomers had given the command to open the hatch, but
before it was fully open they noticed that humid mist was blowing in over
the mountain-top, so they gave the command to close the hatch. The computer
switched off the "open" command and immediately switched on the "close"
command.

Turned out the motor is of a type that has to be stopped before its
direction of motion can be reversed; if already running in the "open"
direction, switching the command from "open" to "close" leaves the motor
happily running in the "open" direction, but energized by the "close"
command.

So, after a while, the hatch activated the fully-open limit switch. Turned
out that this switch is wired to interrupt only the "open" command, not the
"close" command. So the hatch kept moving in the "open" direction until it
hit the mechanical stops, which turned out not to be strong enough to stop
the motion... The result was some bent metal, but luckily an engineer was
present who could prop up the hatch with some wooden beams, which kept it
from falling entirely off the dome.

(Later analysis of the error revealed that the hatch was originally
designed to be operated from a manual handset which had two press-and-hold
buttons, one for "open" and the other for "close". The handset also had a
small sliding cover such that only one of the buttons was accessible and
pressable at a time. This ensured that the operator could not switch
immediately from an "open" command to a "close" command or vice versa, and
prevented the problem. However, this information had not been translated
into requirements on the control SW. Bummer. But this illustrates that
safety is a system property.)

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
On Sun, 17 Jul 2016 00:57:16 -0400, Randy Yates wrote:

> Hi Everyone,
>
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
>
> By a "safety-critical" environment I mean an environment in which a
> failure can lead to loss of, or serious injury to, human life. For
> example, automobile navigation systems, medical devices, lasers, etc.
>
> I know there is the MISRA association and MISRA C. I am wondering if
> there are others.
>
> My gut and experience tells me there should NEVER be software DIRECTLY
> controlling signals of devices that might lead to human injury. Rather,
> such devices should be controlled by discrete hardware, perhaps as
> complex as an FPGA. There is always going to be a chance that a real
> processor that, e.g., controls the enable signal to a laser is going to
> crash with the signal enabled.
>
> I realize that hardware-only control is subject to failures as well, but
> they wouldn't seem to be nearly as likely as a coding failure.
>
> Let me get even more specific: would it be acceptable to use a processor
> running linux in such an application? My gut reaction is "Not only no,
> but HELL no," but I'm not sure if I'm being overly cautious.
>
> Any guidance, suggestions, comments, etc., would be appreciated.
Try searching on the terms? The standards that I know about are DO-178,
MIL-STD-498 and MIL-STD-2167. Searching on some of the standards given
should give you some threads to start looking into.

Regardless of your gut and experience, there are plenty of places where
software DOES directly control signals or devices that might lead to injury
or death -- but the software design methods are much more stringent.

DO-178 lists five levels of criticality, ranging from "E" (the in-flight
movie doesn't work) through "A" (smoking hole in the ground surrounded by
TV crews). "E" is pretty much "anything goes" -- i.e., go ahead and use
commercial software. The rule of thumb is that each time you bump up a
level, the amount of work on the software alone goes up by, roughly, a
factor of 7, and (as mentioned) the hardware and all the tools used in the
software must march with the software design.

I can't say that the details are the same for the FDA-approved stuff, but
I've got friends who worked on pacemaker software, and the general vibe was
the same. For instance, Protocol Systems, before they got bought by Welch
Allyn, would build a whole prototype, do animal tests on it, then design
the whole pacemaker again from scratch, using lessons learned. I think
Welch Allyn did the same thing.

-- 
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
On 17/07/2016 05:57, Randy Yates wrote:
> Hi Everyone,
>
> Are there any formal requirements, guidelines, or recommendations for
> software which will run in a safety-critical environment in the United
> States or world-wide?
>
Gosh, lots. Here is a start:

For industrial, look at IEC 61508 and all its industry-specific derivatives.
For aerospace, look at DO-178C.
For rail, look at EN 50128/EN 50129.
For automotive, look at ISO 26262.
For medical, look at IEC 62304 and/or FDA 510(k).

etc. ... try some simple Google searches for whichever industry you are
working in.

-- 
Regards,
Richard.

+ http://www.FreeRTOS.org
The de facto standard, downloaded every 4.2 minutes during 2015.

+ http://www.FreeRTOS.org/plus
IoT, Trace, Certification, TCP/IP, FAT FS, Training, and more...
