Respect all members: no insults, harassment, or hate speech.
Be tolerant of different viewpoints, cultures, and beliefs. If you do not agree with others, create a separate note, article, or collection instead.
Clearly distinguish between personal opinion and fact.
Verify facts before posting, especially when writing about history, science, or statistics.
Promotional content must be published on the “Related Services and Products” page—no more than one paragraph per service. You can also create subpages under the “Related Services and Products” page and publish longer promotional text there.
Do not post materials that infringe on copyright without permission.
Always credit sources when sharing information, quotes, or media.
Be respectful of the work of others when making changes.
Discuss major edits instead of removing others' contributions without reason.
If you notice rule-breaking, notify the community on the relevant talk pages.
Do not share personal data of others without their consent.
In engineering, a fail-safe is a design feature or practice that, in the event of a failure, inherently responds in a way that will cause minimal or no harm to other equipment, to the environment or to people. Unlike inherent safety to a particular hazard, a system being "fail-safe" does not mean that failure is naturally inconsequential, but rather that the system's design prevents or mitigates unsafe consequences of the system's failure. If and when a "fail-safe" system fails, it remains at least as safe as it was before the failure.[1][2] Since many types of failure are possible, failure mode and effects analysis is used to examine failure situations and recommend safety design and procedures.[3]
Some systems can never be made fail-safe, as continuous availability is needed. Redundancy, fault tolerance, or contingency plans are used for these situations (e.g. multiple independently controlled and fuel-fed engines).[4]
Globe control valve with pneumatic diaphragm actuator. Such a valve can be designed to fail to safety using spring pressure if the actuating air is lost.
Examples include:
Safety valves – Various devices that operate with fluids use fuses or safety valves as fail-safe mechanisms.
Roller-shutter fire doors that are activated by building alarm systems or local smoke detectors must close automatically when signaled, regardless of power. During a power outage the coiling fire door need not close on its own, but it must remain capable of closing automatically when signaled by the building alarm system or smoke detectors. A temperature-sensitive fusible link may be employed to hold the doors open against gravity or a closing spring; in case of fire, the link melts and releases the doors, which then close.
Some airport baggage carts require that the operator hold down the cart's handbrake switch at all times; if the switch is released, the brake activates and, assuming all other portions of the braking system are working properly, the cart stops. The handbrake-holding requirement thus both operates according to the principles of "fail-safety" and contributes to (but does not necessarily ensure) the fail-security of the system. This is an example of a dead man's switch.
Lawnmowers and snow blowers have a hand-closed lever that must be held down at all times. If it is released, it stops the blade's or rotor's rotation. This also functions as a dead man's switch.
Air brakes on railway trains and air brakes on trucks. The brakes are held in the "off" position by air pressure created in the brake system. Should a brake line split, or a carriage become uncoupled, the air pressure will be lost and the brakes applied, by springs in the case of trucks, or by a local air reservoir in trains. It is impossible to drive a truck with a serious leak in the air brake system. (Trucks may also employ wig wags to indicate low air pressure.)
Motorized gates – In a fail-safe design, used for example where a gate provides vehicle access to homes, a power outage leaves the gate free to be pushed open by hand, with no crank or key required, preserving fire department access. Because this would also let virtually anyone through, a fail-secure design is often used instead: during a power outage the gate can be opened only with a hand crank that is kept in a secure area or under lock and key.
Railway semaphore signals – "Stop" or "caution" is shown by a horizontal arm and "clear to proceed" by an arm raised 45 degrees, so the signal is designed such that if the actuating cable breaks, gravity returns the arm to the "danger" position, preventing any train from passing the inoperative signal.
Isolation valves and control valves used, for example, in systems containing hazardous substances can be designed to close upon loss of power, for instance by spring force. This is known as fail-closed upon loss of power.
An elevator has brakes that are held in the open position by the tension of the elevator cable. If the cable breaks, the tension is lost and the brakes latch onto the rails in the shaft, so that the elevator cabin does not fall.
Vehicle air conditioning – Defrost controls require vacuum to operate the diverter dampers for all functions except defrost, so if the vacuum supply fails, defrost is still available.
Many devices are protected from short circuit by fuses, circuit breakers, or current limiting circuits. The electrical interruption under overload conditions will prevent damage or destruction of wiring or circuit devices due to overheating.
Drive-by-wire and fly-by-wire controls such as an accelerator position sensor typically have two potentiometers that read in opposite directions, so that moving the control makes one reading higher and the other correspondingly lower. A mismatch between the two readings indicates a fault in the system, and the ECU can often deduce which of the two readings is faulty.[7]
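Such a plausibility check can be sketched as follows; the function name, the 0–1.0 normalized scale, and the 5% tolerance are illustrative assumptions rather than values from any particular ECU:

```python
def pedal_position(raw_a: float, raw_b: float, tolerance: float = 0.05) -> float:
    """Cross-check two opposed pedal-position readings on a 0.0-1.0 scale.

    Sensor A rises with pedal travel while sensor B falls, so a healthy
    pair always sums to roughly 1.0. A larger mismatch indicates a
    sensor or wiring fault, and the function fails safe by commanding
    idle instead of trusting either reading.
    """
    if abs((raw_a + raw_b) - 1.0) <= tolerance:
        return raw_a  # readings agree: accept sensor A's value
    return 0.0        # plausibility check failed: fail safe to idle
```

A real ECU would additionally latch the fault and set a diagnostic code; returning idle here simply illustrates the safe default.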
Traffic light controllers use a Conflict Monitor Unit to detect faults or conflicting signals and switch an intersection to an all flashing error signal, rather than displaying potentially dangerous conflicting signals, e.g. showing green in all directions.[8]
Fail-safe track circuits used to control railway block signals are a control function that prevents improper system operation or catastrophic degradation in the event of circuit malfunction or operator error. The fact that a flashing amber is more permissive than a solid amber on many railway lines is itself a fail-safe: if the flashing relay stops working, the signal reverts to the more restrictive solid aspect.
The iron pellet ballast on a bathyscaphe is dropped to allow the craft to ascend. The ballast is held in place by electromagnets; if electrical power fails, the ballast is released and the craft ascends to safety.
Many nuclear reactor designs have neutron-absorbing control rods suspended by electromagnets. If the power fails, they drop under gravity into the core and shut down the chain reaction in seconds by absorbing the neutrons needed for fission to continue.
In industrial automation, alarm circuits are usually "normally closed". This ensures that in case of a wire break the alarm will be triggered. If the circuit were normally open, a wire failure would go undetected, while blocking actual alarm signals.
Analog sensors and modulating actuators can usually be installed and wired such that the circuit failure results in an out-of-bound reading – see current loop. For example, a potentiometer indicating pedal position might only travel from 20% to 80% of its full range, such that a cable break or short results in a 0% or 100% reading.
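As a sketch of this live-band technique (the 20–80% travel band, the names, and the exception type are illustrative assumptions):

```python
class SensorFault(RuntimeError):
    """Raised when a reading falls outside the sensor's live band."""


def read_pedal(raw_percent: float) -> float:
    """Convert a live-band sensor reading to a 0.0-1.0 pedal position.

    A healthy sensor only ever reports 20-80 % of full scale, so any
    value outside that band can only come from a broken or shorted
    circuit (reading 0 % or 100 %), never from the pedal itself.
    """
    if not 20.0 <= raw_percent <= 80.0:
        raise SensorFault(f"out-of-band reading: {raw_percent}%")
    # rescale the 20-80 % live band to a 0.0-1.0 position
    return (raw_percent - 20.0) / 60.0
```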
In control systems, critically important signals can be carried by a complementary pair of wires (<signal> and <not_signal>). Only states where the two signals are opposite (one is high, the other low) are valid. If both are high or both are low the control system knows that something is wrong with the sensor or connecting wiring. Simple failure modes (dead sensor, cut or unplugged wires) are thereby detected. An example would be a control system reading both the normally open (NO) and normally closed (NC) poles of a SPDT selector switch against common, and checking them for coherency before reacting to the input.
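A minimal sketch of such a coherence check (the names and the None-on-fault convention are assumptions for illustration):

```python
from typing import Optional


def coherent_input(signal: bool, not_signal: bool) -> Optional[bool]:
    """Decode a complementary signal pair.

    Only opposite levels form a valid state. Equal levels indicate a
    dead sensor, cut wire, or short, so None is returned and the
    caller must treat the input as faulted.
    """
    if signal != not_signal:
        return signal   # coherent pair: the decoded value is <signal>
    return None         # both high or both low: sensor/wiring fault
```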
In HVAC control systems, actuators that control dampers and valves may be fail-safe, for example, to prevent coils from freezing or rooms from overheating. Older pneumatic actuators were inherently fail-safe because if the air pressure against the internal diaphragm failed, the built-in spring would push the actuator to its home position – of course the home position needed to be the "safe" position. Newer electrical and electronic actuators need additional components (springs or capacitors) to automatically drive the actuator to home position upon loss of electrical power.[9]
Programmable logic controllers (PLCs). To make a PLC system fail-safe, stopping the associated drives must not require energization. For example, an emergency stop is usually a normally closed contact, so a power failure removes power both from the coil and from the PLC input directly, stopping the drive. Hence, the system is fail-safe.
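The de-energize-to-stop idea reduces to requiring every permissive condition to be actively asserted. As Python pseudologic (a real PLC would express this in ladder logic, and the input names here are hypothetical):

```python
def drive_enabled(estop_circuit_closed: bool, plc_powered: bool,
                  run_command: bool) -> bool:
    """De-energize-to-stop interlock.

    The drive runs only while every condition is actively true: the
    normally closed e-stop circuit is intact, the PLC has power, and
    an operator run command is present. A pressed e-stop, a broken
    wire, or a power failure all read as False and stop the drive.
    """
    return estop_circuit_closed and plc_powered and run_command
```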
If a voltage regulator fails, it can destroy connected equipment. A crowbar (circuit) prevents damage by short-circuiting the power supply as soon as it detects overvoltage.
As well as in physical devices and systems, fail-safe procedures can be created so that if a procedure is not carried out, or is carried out incorrectly, no dangerous action results.
For example:
Spacecraft trajectory - During early Apollo program missions to the Moon, the spacecraft was put on a free return trajectory — if the engines had failed at lunar orbit insertion, the craft would have safely coasted back to Earth.
The pilot of an aircraft landing on an aircraft carrier increases the throttle to full power (lighting the afterburners, where fitted) at touchdown. If the arresting wires fail to capture the aircraft, it can safely take off again; this is an example of fail-safe practice.[10]
In railway signalling, signals that are not in active use for a train are required to be kept in the 'danger' position. The default position of every controlled absolute signal is therefore "danger", so a positive action — setting the signal to "clear" — is required before a train may pass. This practice also ensures that, in case of a fault in the signalling system, an incapacitated signalman, or the unexpected entry of a train, a train will never be shown an erroneous "clear" signal.
Railroad engineers are instructed that a railway signal showing a confusing, contradictory or unfamiliar aspect (for example a colour light signal that has suffered an electrical failure and is showing no light at all) must be treated as showing "danger". In this way, the driver contributes to the fail-safety of the system.
Fail-safe and fail-secure are distinct concepts. Fail-safe means that a device will not endanger lives or property when it fails. Fail-secure, also called fail-closed, means that access or data will not fall into the wrong hands in a security failure. Sometimes the approaches suggest opposite solutions. For example, if a building catches fire, fail-safe systems would unlock doors to ensure quick escape and allow firefighters inside, while fail-secure would lock doors to prevent unauthorized access to the building.
Fail-active operational behaviour can be provided in systems that have a high degree of redundancy, so that a single failure of any part of the system can be tolerated (fail active operational) and a second failure can be detected, at which point the system will turn itself off (uncouple, fail passive). One way of accomplishing this is to install three identical systems together with control logic that detects discrepancies. Examples include many aircraft systems, among them inertial navigation systems and pitot tubes.
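The triplex arrangement can be sketched as a two-out-of-three voter; the analog channels, the agreement tolerance, and the None-means-disengage convention are illustrative assumptions:

```python
from typing import Optional


def vote_2oo3(a: float, b: float, c: float, tol: float = 0.01) -> Optional[float]:
    """Two-out-of-three voter over redundant analog channels.

    One discrepant channel is out-voted and the median of the three is
    used (fail-active operational). If no two channels agree within
    `tol`, a second fault must have occurred, so the function returns
    None to signal that the system should disengage (fail passive).
    """
    agreements = [abs(a - b) <= tol, abs(b - c) <= tol, abs(a - c) <= tol]
    if not any(agreements):
        return None               # no majority: uncouple / fail passive
    return sorted([a, b, c])[1]   # median lies within the agreeing pair
```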
During the Cold War, "failsafe point" was the term used for the point of no return for American Strategic Air Command nuclear bombers, just outside Soviet airspace. In the event of receiving an attack order, the bombers were required to linger at the failsafe point and wait for a second confirming order; until one was received, they would not arm their bombs or proceed further.[16] The design prevented any single failure of the American command system from causing nuclear war. This sense of the term entered the American popular lexicon with the publication of the 1962 novel Fail-Safe.
(Other nuclear war command control systems have used the opposite scheme, fail-deadly, which requires continuous or regular proof that an enemy first-strike attack has not occurred to prevent the launching of a nuclear strike.)
^ Shingo, Shigeo; Andrew P. Dillon (1989). A Study of the Toyota Production System from an Industrial Engineering Viewpoint. Portland, Oregon: Productivity Press. p. 22. ISBN 0-915299-17-8. OCLC 19740349.
A fail-safe is a design principle in engineering whereby a system or component, upon experiencing a failure such as loss of power or structural damage, automatically defaults to a predetermined safe state that minimizes risk or harm, often involving shutdown or isolation rather than continued hazardous operation.[1][2] This approach contrasts with fault-tolerant designs, which seek to sustain functionality despite faults through redundancy or error correction, prioritizing liveness over mere safety preservation.[1][3] Fail-safe mechanisms address common failure modes like open circuits or broken connections by ensuring responses such as activating protective alarms or closing valves to prevent escalation.[2] In aviation, the principle gained prominence following the 1954 de Havilland Comet crashes, leading to regulations mandating multiple load paths and inspectable structures to contain cracks and enable preemptive repairs.[1] Key characteristics include redundancy, failure containment, and inspectability, which collectively enhance system resilience without assuming failure prevention.[1] Applications extend to nuclear engineering and railway signaling, where fail-safe redundancy ensures safety predicates hold even if operational liveness is compromised.[4][3]
Definition and Principles
Core Definition
A fail-safe is a design feature or engineering practice that ensures, upon the occurrence of a component or system failure, the affected system defaults to a predetermined safe condition, thereby preventing or mitigating harm to human life, property, or the environment rather than allowing uncontrolled degradation or hazardous continuation of operation. This approach operates on the premise that failures are probable events requiring proactive mitigation through predictable failure modes, such as automatic shutdown, isolation, or reversion to a non-operational state. For example, in chemical processing plants, fail-safe valves may close automatically in response to pressure anomalies to avert leaks or explosions.[5][1]

Distinct from fail-secure mechanisms, which prioritize maintaining security or containment during failure (e.g., electromagnetic locks that remain engaged without power to restrict access), fail-safe designs emphasize egress and hazard avoidance, often by releasing constraints upon failure detection. In aerospace applications, fail-safe principles mandate that airframes tolerate specific load path failures, such as cracks in multiple adjacent elements, without immediate loss of structural integrity, as evidenced by Federal Aviation Administration guidelines requiring survival of certain system element failures. This differentiation underscores a causal focus: fail-safe interrupts potential damage propagation by favoring benign outcomes over preserved functionality.[6][7]

The core rationale derives from empirical observations of failure cascades in complex systems, where unmitigated faults amplify risks exponentially; thus, fail-safe incorporates redundancies, sensors, and actuators tuned to worst-case scenarios, ensuring termination modes prevent resource damage or unsafe actuation.
National Institute of Standards and Technology definitions align this with controlled function cessation to safeguard specified assets, contrasting with fail-soft variants that permit partial degraded operation. Implementation demands rigorous hazard analysis, as partial failures can still pose risks if not fully isolated.[8][9]
Fundamental Design Principles
Fail-safe design fundamentally prioritizes engineering systems such that any foreseeable failure mode results in a transition to a predefined safe state, thereby preventing escalation to hazardous outcomes. This approach contrasts with mere fault tolerance by emphasizing inherent safety over continued operation, often achieved through passive mechanisms that require no active intervention. For example, in control systems, the loss of electrical power or an open-circuit fault—common failure types—triggers a default to the safest operational mode, such as halting motion or venting pressure.[2][5]

Core to these principles is the identification of worst-case scenarios via systematic analysis, such as failure modes and effects analysis (FMEA), to define the safe state upfront—typically a non-energized or stopped condition that minimizes risk to humans, equipment, or the environment. Safeguards like normally closed switches in series ensure that a single fault, such as wiring breakage, de-energizes relays and activates alarms or shutdowns, as seen in fire detection systems where an open switch path defaults to alerting. Redundancy complements this by duplicating critical components, ensuring no single point of failure compromises safety, while diversity introduces varied technologies (e.g., mechanical backups to electronic controls) to avert common-cause failures from design flaws or environmental factors.[5][10][11]

Independence between redundant elements is enforced through physical separation, distinct power sources, and logical isolation to eliminate shared vulnerabilities, adhering to the single-failure criterion where no isolated fault propagates to unsafe conditions. Continuous monitoring and diagnostics enable early detection, allowing preemptive fail-safe actions, while defense-in-depth layers multiple barriers—such as passive deadman switches that release on human absence—to provide graduated protection.
These principles, validated through iterative stress testing and probabilistic risk assessments, ensure that reliability targets are met, such as catastrophic-failure probabilities below 10^{-6} per hour in safety-critical applications.[12][13][11]
Historical Development
Early Mechanical Origins
The concept of fail-safe mechanisms in mechanical engineering emerged in the late 17th century with the development of devices to manage pressure in closed vessels, preventing catastrophic failures from overpressurization. In 1681, French inventor Denis Papin devised the first safety valve for his steam digester, an early pressure cooker designed to soften bones using steam under pressure.[14] This valve employed a weighted lever mechanism that automatically lifted to vent excess steam when internal pressure exceeded a set threshold, thereby averting vessel rupture and explosion—a direct precursor to modern fail-safe principles where failure of containment leads to controlled release rather than uncontrolled destruction.[14] Papin's innovation addressed the causal risk of elastic expansion in confined fluids, ensuring the system defaulted to a safer state of pressure equalization.

By the early 18th century, as steam power proliferated during the Industrial Revolution, safety valves became integral to boilers and engines to mitigate frequent explosions from material fatigue or operator error.
Thomas Newcomen's atmospheric steam engine, operational from 1712, incorporated basic pressure relief features, but widespread boiler failures—often exceeding 100 incidents annually in Britain by the mid-19th century—drove refinements.[15] Engineers like Richard Trevithick advanced valve designs in high-pressure locomotives around 1804, using spring-loaded or lever-weighted pop valves that opened proportionally to excess pressure, allowing steam discharge while maintaining operational integrity until safe levels were restored.[16] These mechanisms embodied causal realism by prioritizing inherent redundancy over reliance on human intervention, as unchecked pressure buildup could shear rivets or deform plates, leading to fragmentation hazards.

Further mechanical fail-safes appeared in speed regulation, exemplified by James Watt's centrifugal governor patented in 1788 for steam engines. This flyball device reduced fuel input via throttle linkage when rotational speed exceeded limits, preventing runaway acceleration that could disintegrate flywheels or boilers.[17] In railway applications, George Westinghouse's air brake system, first patented in 1869 and refined into the automatic air brake, introduced fail-safe braking: loss of air pressure from hose rupture or disconnection automatically engaged brakes across all cars, halting trains to avert derailments.[18] Such designs, grounded in empirical observations of failure modes like fluid leaks or linkage breaks, shifted engineering toward systems where component faults propagated to benign outcomes, influencing later codes like the ASME Boiler and Pressure Vessel standards formed in response to persistent 19th-century incidents.[19]
Post-WWII Advancements in Electronics and Nuclear Applications
Following World War II, the establishment of the U.S. Atomic Energy Commission in 1946 initiated structured oversight of nuclear reactor development, prioritizing safety through fail-safe designs that emphasized automatic response to anomalies. Experimental reactors in Idaho during the 1950s demonstrated self-limiting reactivity excursions, where inherent physical properties and engineered controls rapidly quenched fission without operator intervention, building empirical confidence in passive shutdown mechanisms.[20][21] The Experimental Breeder Reactor-I, achieving criticality in December 1951 and generating the first electricity from nuclear fission on December 20, 1951, incorporated early fail-safe features including neutron flux detectors linked to control rod drives, ensuring rapid insertion to halt the chain reaction upon detected overexcursion.[22]

Central to these advancements was the SCRAM (Safety Control Rod Axe Man, later redefined as shutdown mechanism) system, refined post-war for commercial viability; control rods, held by electromagnetic clutches, dropped via gravity into the core upon power loss or sensor trigger, defaulting to a subcritical state regardless of electronic failure.[21] Relay-based logic circuits, dominant in 1950s instrumentation, formed the backbone of reactor protection systems (RPS), using redundant channels with normally de-energized relays that tripped to safe mode on fault, minimizing single-point vulnerabilities in monitoring parameters like temperature, pressure, and neutron flux.[23] The Shippingport Atomic Power Station, the world's first full-scale pressurized water reactor online on December 2, 1957, integrated such electronic-relay hybrids with multiple independent protection trains, achieving 60 MW(e) output while validating layered fail-safe redundancy under Atomic Energy Commission regulations.[24]

In parallel, electronics advancements enabled more robust fail-safe architectures beyond mechanical relays.
The transistor's invention at Bell Laboratories on December 23, 1947, ushered in solid-state components that supplanted fragile vacuum tubes, slashing failure rates in control circuitry from hours to years of mean time between failures and permitting compact redundant sensor arrays for nuclear instrumentation.[25] By the mid-1950s, these facilitated analog electronic comparators in RPS, cross-checking signals to avert false actuations while preserving de-energize-to-safe principles, as seen in naval propulsion reactors developed under Admiral Hyman Rickover's program starting in 1946, which influenced civilian designs with electromagnetic fail-safe rod mechanisms tested to withstand single-component loss.[24] This convergence of electronics and nuclear engineering laid groundwork for defense-in-depth, where multiple barriers—fuel cladding, vessel integrity, and containment—interacted with electronic oversight to contain decay heat (initially 7% of full power, decaying to 0.2% after one week) post-shutdown.[21]
Modern Integration in Software and Automation
In the 1980s, as programmable logic controllers (PLCs) supplanted hard-wired relay systems in industrial automation—following their invention in 1968—fail-safe principles were adapted to software-controlled environments through enhanced logic programming and hardware redundancy. Early PLCs prioritized flexibility, but by the early 1990s, safety PLCs emerged with dual-processor architectures, continuous self-diagnostics, and fail-safe default states that de-energize critical outputs (e.g., motors or valves) upon power loss, sensor failure, or logic errors, ensuring systems revert to non-hazardous conditions without operator intervention.[26][27] This shift was propelled by standards like IEC 61508 (1998), which mandated probabilistic failure analysis and certified software integrity levels for automation, reducing common-mode failures in sectors such as manufacturing and process control.[26]

Software fail-safe mechanisms in automation further evolved with real-time operating systems and supervisory control and data acquisition (SCADA) integrations, incorporating watchdog timers, cyclic redundancy checks, and exception-handling routines to detect and isolate faults without cascading disruptions.
For example, ladder logic programming employs normally closed (NC) contacts and positive logic confirmation—where safety functions require active signals to remain operational—preventing unintended activation from single-wire breaks or false positives, a practice standardized in fail-safe circuit design since the PLC era.[28] In modern SCADA deployments, redundant communication protocols and hot-swappable failover servers maintain data integrity and control loops, with systems defaulting to manual overrides or shutdowns if primary paths fail, as evidenced by implementations achieving SIL 3 safety integrity levels under IEC 61511.[29][30]

By the 2010s, fail-safe integration extended to distributed software architectures in automation, including cloud-edge hybrids and AI-assisted predictive maintenance, where machine learning models are bounded by hard-coded safety envelopes to avoid erroneous decisions leading to unsafe states. In high-stakes applications like aviation software and autonomous ground vehicles, fail-operational extensions—beyond basic fail-safe shutdowns—use modular redundancy and voting algorithms (e.g., triple modular redundancy in flight control software) to sustain partial functionality post-failure, with recovery times under 100 milliseconds, aligning with ASIL D ratings in ISO 26262 (2011).[31] These advancements, tested via fault injection simulations, have minimized downtime in industrial settings by up to 99.9% in certified systems, though they demand rigorous verification to counter software complexity-induced vulnerabilities.[32]
Types and Mechanisms
Mechanical and Physical Mechanisms
Mechanical and physical fail-safe mechanisms utilize inherent material properties, geometric configurations, and simple force interactions to ensure systems revert to or maintain a safe state upon component failure, independent of external energy sources. These designs prioritize redundancy through multiple load paths or sacrificial elements that absorb failure energy, preventing propagation to critical functions. For instance, in structural engineering, aircraft wings incorporate multiple spars and stringers, allowing the structure to redistribute loads if a single crack or fatigue failure occurs, thereby avoiding immediate collapse.[1]

A common mechanical approach involves spring-loaded actuators in valves, where loss of pneumatic or hydraulic supply causes springs to drive the valve to a predetermined safe position, such as closed to isolate flow or open for pressure relief. This principle is applied in process industries, where control valves fail to a "fail-safe" orientation to prevent hazardous leaks or overpressurization. Safety relief valves exemplify this, automatically opening at a set pressure threshold via a spring mechanism to vent excess fluid, protecting vessels from rupture as standardized in ASME Boiler and Pressure Vessel Code Section VIII.[1][5]

Sacrificial components like shear pins or fusible plugs provide fail-safe protection in machinery by deliberately fracturing or melting under overload conditions to interrupt force transmission or release containment. Shear pins, used in propeller shafts and other propeller-driven equipment, break at a calibrated torque limit to safeguard drivetrain integrity, as seen in marine and agricultural implements where continued operation could cause catastrophic damage.
Fusible plugs in steam boilers melt at elevated temperatures to quench the firebox with water, averting explosions, a design validated through historical incidents like the 1854 boiler code developments following multiple failures.[33]

Dead-man's handles in locomotives represent a physical fail-safe relying on human-operator interaction, where constant manual pressure maintains operation; release due to incapacity engages brakes via gravity or springs, halting the train to prevent runaway accidents. This mechanical vigilance device, introduced in early 20th-century rail systems, has reduced operator-error fatalities by enforcing continuous control input.[34]

In heavy machinery, slip clutches or friction drives disengage under excessive torque, protecting gears and motors by allowing controlled slippage rather than seizure, a principle integral to fail-safe designs in packaging and assembly lines where single-point failures could endanger personnel. These mechanisms underscore causal realism in engineering, where anticipating dominant failure modes—such as overload or loss of actuation—guides selection of physical redundancies over complex monitoring.[35][36]
Electrical and Electronic Mechanisms
Electrical and electronic fail-safe mechanisms are engineered to detect faults in power distribution, control circuits, or processing units and automatically revert to a non-hazardous state, such as de-energizing components or halting operations, thereby minimizing risks like fires, shocks, or unintended activations.[28] These systems prioritize causal failure modes—such as open circuits, short circuits, or loss of power—by designing default behaviors where the absence of a signal or energy corresponds to safety, contrasting with fail-secure approaches that might lock systems closed.[37] For instance, in relay-based controls, relays are typically energized to maintain operation but de-energize to a safe off-state upon power loss or wire breakage, ensuring that common failures like a severed connection do not cause runaway activation.[38]

Key components include overcurrent protection devices like fuses and circuit breakers, which interrupt electrical flow during overloads or shorts to prevent thermal runaway or equipment damage; a standard fuse, rated for specific current thresholds (e.g., 15 A at 250 V), melts at excessive heat, creating an open circuit that isolates the fault.[39] Circuit breakers, resettable alternatives, employ bimetallic strips or electromagnetic mechanisms to trip at currents exceeding 125-150% of rated capacity, as defined in standards like IEC 60947-2 for low-voltage switchgear.
Watchdog timers provide software-hardware oversight in microcontrollers, generating a reset signal if the processor fails to periodically "kick" the timer within a preset interval (typically 1-60 seconds), averting hangs or infinite loops in embedded systems.[40][41]

Redundancy enhances reliability through duplicated circuits or voting logic, where multiple sensors or channels (e.g., triple modular redundancy) cross-verify signals, defaulting to safe mode if disagreement exceeds thresholds; this is formalized in IEC 61508, which specifies safety integrity levels (SIL 1-4) for electrical/electronic/programmable electronic (E/E/PE) safety-related systems, requiring probabilistic failure analysis to achieve failure rates below 10^{-5} per hour for high-integrity applications.[42] In programmable logic controllers (PLCs), fail-safe programming uses normally closed (NC) contacts for emergency stops, so that a fault-induced open circuit mimics a deliberate press, triggering shutdown without relying on energized states.[28] These mechanisms are validated through fault injection testing, ensuring empirical verification of safe defaults under simulated failures such as voltage drops to 0 V or signal noise exceeding 10% amplitude.[5]
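The watchdog pattern described above can be sketched in a few lines. This is a minimal software model, assuming a monotonic tick counter standing in for a hardware timebase; the class and method names are illustrative, not any vendor's API.

```cpp
#include <cstdint>

// Minimal watchdog-timer sketch: the main loop must "kick" the watchdog
// within timeoutTicks of the previous kick, otherwise the watchdog asserts
// a reset, forcing the system back to a known safe state.
class Watchdog {
public:
    explicit Watchdog(uint32_t timeoutTicks)
        : timeoutTicks_(timeoutTicks), lastKickTick_(0) {}

    // Called periodically by healthy application code.
    void kick(uint32_t nowTick) { lastKickTick_ = nowTick; }

    // True when the software has missed its kick window (hang or infinite
    // loop) and the reset line should be asserted.
    bool shouldReset(uint32_t nowTick) const {
        return nowTick - lastKickTick_ > timeoutTicks_;
    }

private:
    uint32_t timeoutTicks_;
    uint32_t lastKickTick_;
};
```

The safety property is the inversion of responsibility: the application must continually prove liveness, so silence (the most common failure symptom) automatically produces the safe outcome.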
In software engineering for safety-critical systems, fail-safe mechanisms prioritize detecting anomalies and transitioning to predefined safe states, such as halting operations or invoking backups, to prevent hazardous outcomes. Watchdog timers exemplify this approach, functioning as hardware-supported timers that require periodic "kicks" from the software; failure to do so triggers a system reset, thereby mitigating risks from infinite loops or crashes in embedded applications like automotive controllers or medical devices.[40][41] These timers are integral to standards like IEC 61508, which mandates software safety integrity measures for electrical/electronic/programmable systems to ensure predictable failure responses.

Redundancy techniques further enhance software fail-safes by employing diverse implementations, such as N-version programming, where multiple independent software modules perform the same function and vote on outputs to mask faults from design errors.[43] This contrasts with single-version reliance, as evaluations show redundancy reduces error propagation in critical environments, though it demands diversity to avoid common-mode failures. Additional practices include exception handling to gracefully degrade functionality (e.g., isolating faulty modules) and temporal protection via independent safety watchdogs that monitor overall system timing independently of primary processors.[44]

In security-critical software applications, fail-safe mechanisms frequently incorporate a fail-closed (also known as fail-secure) approach. In this design, upon encountering a failure or exception during security checks, such as permission verification, the system defaults to denying access or blocking operations, prioritizing security over availability. This contrasts with fail-open behaviors, which permit access or proceed on error, potentially enabling unauthorized access and introducing vulnerabilities.
Fail-closed designs are preferred in high-security contexts to mitigate risks from mishandled exceptions or errors in security controls.[45] For example, a fail-closed permission check in C++ may be implemented as follows:
cpp
bool hasPermission(const User& user, const Resource& res) {
    try {
        return db.queryPermission(user.id, res.id);  // throws on DB error
    } catch (const std::exception&) {
        return false;  // deny access on failure
    }
}
This denies access if the permission query fails or throws an exception. A fail-open (insecure) variant would return true in the catch block, allowing access on error. Similarly, to enforce secure execution of actions:
cpp
void performSecureAction(const User& user, const Resource& res) {
    if (!db.queryPermission(user.id, res.id)) {  // throws on error
        throw AccessDeniedException("Permission denied");
    }
    // proceed only if explicitly allowed
}
Here, any error or denial in the permission query prevents the action, ensuring secure failure handling.

Procedural mechanisms complement software by embedding human oversight protocols that enforce fail-safe defaults in high-stakes operations. In nuclear weapons handling, the two-person rule requires dual verification for actions like arming, ensuring no single point of failure leads to inadvertent detonation, as outlined in U.S. Air Force surety programs.[46] Similarly, aviation protocols mandate independent cross-checks during critical phases, such as pre-flight inspections or emergency responses, defaulting to safe halts if discrepancies arise, thereby reducing error rates in crewed systems.[47] These procedures, often formalized in military and regulatory frameworks, provide layered defense against software or human faults by prioritizing verifiable, auditable steps over autonomous execution.
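The two-person rule reduces naturally to code: a critical action proceeds only when two distinct approvals are present, and anything less fails safe to "denied". This is a hypothetical sketch; the function name and the representation of approvers as strings are illustrative assumptions.

```cpp
#include <string>

// Illustrative two-person rule: approval requires two non-empty,
// distinct operator identities. A missing approval or a single operator
// approving twice both fail safe to "not authorized".
bool twoPersonAuthorized(const std::string& firstApprover,
                         const std::string& secondApprover) {
    return !firstApprover.empty() && !secondApprover.empty() &&
           firstApprover != secondApprover;
}
```

As with the fail-closed permission check above, the default outcome of every incomplete or anomalous input is denial; authorization is the only path that must be affirmatively established.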
Key Applications
Aviation and Transportation Systems
In aviation, fail-safe principles emphasize redundancy and structural integrity to prevent catastrophic outcomes from single-point failures; for example, multiple engines on commercial aircraft enable takeoff and sustained flight despite one engine outage.[36] Flight control systems in modern airliners employ triple-redundant hydraulic or electronic actuators and computers, where failure of one channel allows seamless reversion to backups without loss of control authority.[48] The U.S. Federal Aviation Administration (FAA) requires aircraft certification under 14 CFR Part 25 to incorporate fail-safe evaluations, including redundant load paths in primary structures that maintain limit load capacity after failure of principal elements like frames or spars.[49] This contrasts with earlier safe-life approaches by assuming detectable damage or partial failures, with damage-tolerance assessments verifying residual strength for specified inspection intervals, as outlined in FAA Advisory Circular 25.1309-1B.[50][6]

In rail transportation, fail-safe mechanisms prioritize automatic cessation of motion upon fault detection, exemplified by signaling relays that default to a "stop" state during power interruptions or wiring faults, leveraging gravity and mechanical bias for inherent safety.[51] High-speed train braking systems integrate fault-tolerant designs analyzed via failure modes and effects analysis (FMEA), ensuring progressive degradation to a safe halt rather than uncontrolled acceleration.[52] Dead man's switches require continuous operator input, triggering emergency brakes if released, a principle applied since the early 20th century to avert overrun incidents.
In road vehicles, anti-lock braking systems (ABS) and electronic stability control exemplify fail-safes by modulating wheel lockup or yaw during loss of traction, reducing skidding risks based on sensor data.[53]

Emerging autonomous transportation systems extend these concepts with layered fail-safes, such as remote intervention overrides or geofenced safe-stop protocols when perception or actuation limits are exceeded, as studied in level-4 vehicle architectures.[54] Empirical data from rail incident reviews, including the 2023 Montreal VIA Rail crash, underscore the need for interim fail-safe devices like automatic train stop (ATS) enhancements to enforce speed limits and signal compliance, prompting regulatory calls for rapid deployment.[55] These designs collectively minimize causal chains leading to harm by engineering default states toward immobility or controlled degradation, validated through probabilistic risk assessments in certification processes.[50]
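The dead man's switch logic mentioned above is simple enough to state directly. The sketch below is an illustrative model, not a rail-certified design; the 500 ms grace period and the function name are assumed values for demonstration.

```cpp
// Dead-man's-switch sketch: the emergency brake engages whenever operator
// pressure has been absent for longer than a short grace period.
// Times are in milliseconds; gracePeriodMs is a hypothetical value that
// would filter momentary grip changes without masking incapacitation.
bool emergencyBrakeEngaged(bool operatorPressing,
                           unsigned msSinceLastPressure,
                           unsigned gracePeriodMs = 500) {
    if (operatorPressing) return false;          // normal operation
    return msSinceLastPressure > gracePeriodMs;  // released: fail to stop
}
```

The fail-safe character comes from the default: absence of the expected human input, whatever its cause, drives the system toward the stopped state rather than continued motion.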
Nuclear Power and Weapons Safeguards
In nuclear power plants, fail-safe mechanisms are integral to reactor design, prioritizing rapid shutdown and heat removal to prevent core meltdown or radioactive release upon failure detection. Reactor protection systems automatically initiate a SCRAM, inserting control rods to halt the fission chain reaction and eliminating the primary heat source within seconds.[56] Multiple redundant channels monitor parameters like neutron flux and coolant temperature, with diverse actuation logic ensuring shutdown even if a single sensor fails.[57] Passive safety features, relying on natural phenomena such as gravity and convection rather than pumps or valves, enhance reliability by removing decay heat without external power or operator action; for instance, in advanced pressurized water reactors like the AP1000, gravity-fed water pools provide long-term cooling for up to 72 hours post-shutdown.[58][21]

Modern designs incorporate inherent fail-safes, such as negative temperature coefficients in fuel that slow reactivity as temperature rises, and molten-salt reactors in which a frozen salt plug melts during overheating to drain fuel into subcritical storage tanks, averting chain reactions.[59] These systems achieve probabilistic risk assessments below 10^{-5} core damage frequency per reactor-year, far exceeding early designs like those at Chernobyl, which lacked robust containment and passive cooling.[21] Emergency core cooling systems, often passive, inject borated water or activate check valves to flood the core, maintaining integrity against loss-of-coolant accidents, as demonstrated in post-Fukushima upgrades across Generation III+ reactors.[60][61]

For nuclear weapons safeguards, fail-safe principles focus on preventing accidental or unauthorized nuclear yield, embedding multiple independent barriers in warhead design.
One-point safety mandates that detonation of the high-explosive lens at any single point yields no more than 4 pounds of TNT-equivalent nuclear energy, with a probability below 1 in 10^6 per event, achieved through symmetric implosion geometries and insensitive explosives that resist unintended initiation from fire, impact, or electromagnetic pulse.[62][63] Permissive action links (PALs), electronic locks requiring presidential codes transmitted via secure channels, preclude arming sequences without authorization, evolving from 1960s mechanical switches to modern cryptographic systems integrated into all U.S. stockpiles since the 1970s.[64]

Additional safeguards include environmental sensing devices that disable firing circuits under abnormal conditions, such as acceleration anomalies or low voltage, and strong links that interrupt power to detonators until sequential arming steps are verified.[65] These features, standardized under nuclear surety programs, have prevented yields in historical accidents like the 1966 Palomares incident, where conventional explosives detonated but no nuclear reaction occurred.[66] International dissemination of PAL technology to allies, starting in the 2000s, addresses proliferation risks by ensuring host-nation weapons cannot be used without U.S. enablement.[67]
Security Systems and Access Control
In security systems and access control, fail-safe mechanisms are engineered to default to an unlocked or permissive state upon detection of failure modes such as power loss, sensor malfunction, or signal interruption, thereby prioritizing human egress and life safety over containment.[68] This design principle ensures that doors, gates, or barriers do not trap occupants during emergencies like fires, where rapid evacuation is paramount.[69] Electromagnetic locks (maglocks), a staple of electronic access control, exemplify this by releasing their hold instantaneously when power is cut, typically within milliseconds, as required for compliance with building codes.[7]

Integration with fire detection systems further enforces fail-safe behavior; for instance, upon activation of a fire alarm, control panels relay signals to de-energize locking devices, unlocking doors across affected zones.[70] The National Fire Protection Association (NFPA) 101 Life Safety Code mandates such provisions for means of egress, stipulating that locked doors in assembly, educational, and healthcare occupancies must unlock automatically on fire alarm initiation or power failure to prevent barriers to escape.[70] Similarly, NFPA 80 governs fire door assemblies, requiring fail-safe electrified hardware on stairwell and exit doors that yields positive latching only when secure but defaults to free operation otherwise.[71]

Fail-safe relays and monitoring circuits enhance reliability in these setups by continuously supervising voltage, wiring integrity, and input signals; a break in the loop triggers an immediate unlock command, often backed by battery backups lasting 15-90 minutes depending on system specifications.[72] In larger facilities, such as hospitals or high-rises, zoned access control software interfaces with these hardware elements, programming fail-safe overrides that propagate from central controllers to distributed locks via redundant wiring or wireless protocols certified under UL 294
standards for access control units. Empirical data from incident analyses, including post-event reviews by NFPA, indicate that these mechanisms have facilitated egress in over 95% of documented fire scenarios involving electrified hardware, underscoring their causal role in mitigating entrapment risks.[70]
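The locking condition for a fail-safe maglock can be captured as a conjunction of healthy inputs. This is an illustrative boolean model, assuming simplified sensor signals; it is not drawn from any particular controller's logic.

```cpp
// Fail-safe maglock sketch: the door is held locked only while power is
// present AND no fire alarm is active AND the supervision loop is intact.
// Every failure mode (power loss, alarm, broken wiring) makes one of the
// conjuncts false, so the default outcome is "unlocked" for egress.
bool doorLocked(bool powerPresent, bool fireAlarmActive, bool loopIntact) {
    return powerPresent && !fireAlarmActive && loopIntact;
}
```

Structuring the logic as an AND of positive conditions means no fault has to be explicitly enumerated: any unanticipated loss of a required signal still lands in the safe, unlocked state.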
Industrial and Medical Devices
In industrial devices, fail-safe mechanisms prioritize reversion to a non-hazardous state during component failure, often through de-energization or physical barriers. Programmable logic controllers (PLCs) and relay systems employ normally closed (NC) contacts to ensure motors, valves, and interlocks default to shutdown upon power loss or signal interruption, preventing unintended operation.[29] Safety valves in fluid-handling equipment automatically release pressure exceeding safe limits, averting explosions or leaks, as seen in chemical processing plants where rupture disks or pilot-operated valves activate at thresholds like 10% overpressure.[73] Emergency stop (E-stop) buttons, required under standards such as ISO 13850, interrupt power circuits instantaneously, halting machinery motion to protect operators from crush injuries or entanglement.[74]

Fail-safe designs in manufacturing equipment incorporate redundant sensors and interlocks; for instance, light curtains or two-hand controls de-energize presses if operator presence disrupts the beam or grip is released.[75] In heavy movable structures like bridges or dams, control systems mandate fail-safe circuits for permissives and feedback loops, ensuring gates or spans halt if encoders or limit switches fail, as outlined in guidelines from the Heavy Movable Structures Association.[76] These approaches reduce injury rates; Occupational Safety and Health Administration (OSHA) data from 2022 indicate that machine guarding, including fail-safe elements, prevented an estimated 20,000 injuries annually in U.S. manufacturing.

Medical devices integrate fail-safe features to safeguard patients from erroneous dosing, misconnections, or malfunctions, guided by FDA guidelines and IEC 60601-1 standards emphasizing inherent safety by design.
Infusion pumps, for example, feature free-flow protection clamps that occlude tubing upon cassette removal, preventing uncontrolled fluid delivery that could cause overdose; a 2010 FDA recall of certain models highlighted failures leading to 87 adverse events, prompting enhanced fail-safe clamps.[77] Ventilators employ self-diagnostic tests and backup bellows that maintain positive pressure during power outages, complying with ISO 80601-2-12 requirements for single-fault safety to avoid hypoxia.[78]

In home-use devices like peritoneal dialysis machines, fail-safe sensors detect air emboli or overfill, triggering alarms and shutdowns to mitigate risks in unsupervised settings, where non-compliance has led to incidents reported in FDA's MAUDE database exceeding 500 cases from 2018-2023.[79] Luer-lock connectors with keyed mismatches prevent epidural catheters from linking to IV lines, reducing misconnections that caused 12 sentinel events in U.S. hospitals between 2000-2010 per Joint Commission data.[80] Pacemakers incorporate rate-responsive fail-safes, reverting to asynchronous pacing at 70 bpm if lead fractures occur, as validated in clinical trials showing 99.9% reliability over 10 years under ISO 14708 standards.[81] These mechanisms, while effective, require regular verification, as under-testing contributed to 15% of device-related harms in a 2021 NCBI analysis of anesthesia failures.[82]
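The free-flow protection described for infusion pumps follows the same default-to-safe pattern and can be sketched as a predicate. This is an illustrative model with hypothetical signal names, not a representation of any cleared device's firmware.

```cpp
// Free-flow protection sketch: fluid may pass only while the cassette is
// seated and the pump is actively metering. Removing the cassette (or
// stopping the pump) closes the clamp, so the unpowered/removed default
// is "occluded", preventing uncontrolled gravity-driven delivery.
bool tubingOccluded(bool cassetteSeated, bool pumpMetering) {
    return !(cassetteSeated && pumpMetering);
}
```

As in the maglock and relay examples, the hazardous state (open tubing) requires every enabling condition to hold simultaneously, while any single loss reverts to the safe state.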
Artificial Intelligence Systems
Fail-safe systems in artificial intelligence refer to design approaches in which an AI or automated system defaults to a non-operational or restricted state when required conditions for safe operation are not met.[83] In high-risk domains, fail-safe AI architectures are used to prevent irreversible harm by requiring predefined conditions, such as verified inputs, rule constraints, or authorization checks, before allowing consequential actions to proceed.[84]

These approaches draw from established safety-critical engineering practices, including aviation safety systems, nuclear control mechanisms, and zero-trust security architectures, where failure results in shutdown rather than speculative operation.[85] Contemporary implementations increasingly incorporate cryptographic verification methods, such as tamper-evident logs, blockchain-based timestamping, and post-quantum digital signature schemes standardized by the U.S. National Institute of Standards and Technology (NIST), to ensure decision records remain verifiable over time.[86][87]

Fail-safe AI system design is discussed in the context of AI governance, financial automation, healthcare systems, and other environments where discretionary or unverifiable actions can lead to significant harm.[88]
Comparisons to Related Concepts
Fail-Safe versus Fail-Secure
Fail-safe mechanisms are engineered to transition a system into a predetermined safe state upon detecting or experiencing a failure, thereby minimizing risks to human life, property, or operational continuity; for instance, in pressure relief valves, a failure prompts automatic venting to avert explosions.[89] In contrast, fail-secure mechanisms default to a state that preserves security and restricts unauthorized access or tampering during failure, such as electromagnetic locks that remain engaged without power to block entry.[90] This distinction arises from differing priorities: fail-safe emphasizes hazard mitigation and safe egress, while fail-secure prioritizes containment and protection against intrusion, even at the potential cost of temporary inaccessibility.[68]

The core divergence lies in failure response: fail-safe systems, like electrified door hardware in emergency exits, release locks during power outages to ensure rapid evacuation, complying with life-safety codes such as NFPA 101 from the National Fire Protection Association, which mandates free egress without special knowledge or effort.[7][91] Fail-secure systems, conversely, maintain locked states absent power, relying on mechanical defaults or battery backups, to safeguard sensitive areas, as seen in vault doors or perimeter gates where unauthorized entry poses greater threats than brief entrapment.[69] This approach aligns with security standards like NIST guidelines, which define fail-secure as preventing loss of a secure state upon system faults.[90]

This distinction extends to software systems, particularly in security and access control. In software security, fail-secure (also known as fail-closed) mechanisms default to denying access or blocking operations upon failure (e.g., when a permission check encounters an error), prioritizing security over availability.
In contrast, fail-open mechanisms default to allowing access or proceeding, prioritizing availability but risking unauthorized access or vulnerabilities. Fail-closed approaches are generally preferred in security-critical applications to prevent inadvertent authorization due to errors.[92][93]

For example, in C++ code for permission checking, a fail-closed (secure) check denies on error:
cpp
bool hasPermission(const User& user, const Resource& res) {
    try {
        return db.queryPermission(user.id, res.id);  // throws on DB error
    } catch (const std::exception&) {
        return false;  // deny access on failure
    }
}
Using exceptions for fail-closed (preferred in security code):
cpp
void performSecureAction(const User& user, const Resource& res) {
    if (!db.queryPermission(user.id, res.id)) {  // throws on error
        throw AccessDeniedException("Permission denied");
    }
    // proceed only if explicitly allowed
}
Trade-offs between the two are evident in design choices: fail-safe configurations enhance occupant safety but may expose assets to exploitation during outages, potentially requiring supplemental measures like manual overrides or redundant power.[94] Fail-secure setups bolster protection in high-value environments, such as data centers or armories, yet necessitate integration with fire alarm systems for selective release during verified emergencies to avoid life-endangering lock-ins.[95] Empirical data from incident analyses, including power failure simulations in building management, indicate that fail-secure locks reduce breach risks by up to 40% in non-emergency scenarios but demand rigorous testing to balance with egress requirements.[96] Hybrid implementations, combining both via zoned controls, are increasingly adopted in critical infrastructure to reconcile safety and security imperatives.[97]
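The fail-safe/fail-secure contrast, including the fire-alarm override used in hybrid installations, can be summarized in one decision function. This is an illustrative sketch of the policy described above, assuming simplified boolean inputs; it is not code-compliant design logic.

```cpp
enum class LockState { Locked, Unlocked };

// The same power-loss event yields opposite defaults depending on device
// type: a fail-safe lock releases (egress first), a fail-secure lock stays
// engaged (containment first). A verified fire alarm overrides both, the
// "selective release" integration mentioned above.
LockState doorState(bool powered, bool failSecure, bool fireAlarm) {
    if (fireAlarm) return LockState::Unlocked;  // emergency release wins
    if (powered) return LockState::Locked;      // normal secured state
    return failSecure ? LockState::Locked : LockState::Unlocked;
}
```

Writing the policy this way makes the trade-off explicit: only the `failSecure` flag differs between the two device classes, and it matters only in the unpowered, non-emergency case.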
Fail-Safe versus Fail-Deadly
Fail-deadly mechanisms configure systems to default to a destructive or aggressive response in the event of failure, such as loss of communication or control, thereby ensuring escalation rather than restraint.[98] This approach inverts the fail-safe principle, where failure prompts reversion to a benign or shutdown state to avert harm; instead, fail-deadly presumes adversarial action during ambiguity, triggering retaliation to deter potential decapitation strikes.[99] In causal terms, fail-deadly systems mitigate the risk of inaction under attack, where non-response could enable total defeat, by biasing toward overreaction, though this elevates the probability of erroneous activation from false positives like technical glitches or misinformation.[100]

The paradigm originates in nuclear command-and-control architectures, particularly during the Cold War, to underpin mutually assured destruction doctrines. For instance, U.S. strategic forces incorporated elements like permissive action links with fail-deadly logic, where severed command links could authorize launch under protocols assuming enemy interference, as analyzed in declassified military assessments from the 1990s.[100] Soviet systems exemplified this more explicitly through the Perimeter apparatus, activated around 1985, which monitors seismic and radiation signatures for nuclear detonations; upon detecting an attack without valid countermands from leadership, it autonomously transmits launch orders to missiles, functioning as a "dead hand" to guarantee retaliation even if command echelons are eliminated.[98] Empirical data from simulations and historical near-misses underscore how fail-deadly biases amplify escalation risks; in the 1983 Soviet early-warning false alarm, officer Stanislav Petrov's decision to deem the detection erroneous averted a potential automated response under stricter protocols.[99]

In non-nuclear contexts, fail-deadly is rarer due to
asymmetric risk profiles, but parallels appear in cybersecurity and access controls, where denial-of-service defaults to lockdown (fail-secure) yet could incorporate deadly escalation in hybrid warfare scenarios.[101] Trade-offs favor fail-safe in civil engineering (e.g., aircraft hydraulics defaulting to neutral), yielding lower unintended casualty rates per failure mode, as evidenced by Federal Aviation Administration data showing redundant fail-safe systems reducing fatal accidents by over 90% since their introduction in 1959.[102] Conversely, fail-deadly's utility in deterrence hinges on a credible threat of overkill, with studies estimating it sustains stability by raising attacker costs, though real-world incidents like the 1995 Norwegian rocket misidentification by Russian systems highlight persistent vulnerabilities to miscalculation.[103] Design choices thus demand context-specific causal analysis: fail-safe minimizes isolated harms but risks systemic collapse from unchecked aggression, while fail-deadly enforces equilibrium at the expense of routine safety margins.[99]
Fail-Safe versus Fail-Active
Fail-safe designs prioritize reverting a system to a predetermined safe state upon detection of a fault, such as deactivation or shutdown, to minimize risk of harm; for instance, emergency brakes in elevators engage automatically during power loss to halt movement.[104] In contrast, fail-active architectures, common in redundant control systems, detect faults and reconfigure using backup components to maintain operational continuity without immediate reversion to a safe (inactive) state, thereby preserving functionality after a single failure.[105] This distinction arises from causal priorities: fail-safe emphasizes immediate hazard avoidance through passivity, while fail-active leverages redundancy for fault tolerance, accepting transient risks to avoid operational interruption.[106]

In aviation flight control systems, fail-active modes enable actuators or autopilots to remain engaged post-failure via voting logic among triplicate channels, ensuring the aircraft sustains control authority; a 2023 analysis of primary flight controls notes that fail-active responses mask single faults, in contrast to fail-safe deactivation that might demand pilot intervention.[107] Empirical data from fault-tolerant designs show fail-active systems achieving higher availability in high-redundancy environments, such as railway bogie controls where reconfiguration sustains performance after sensor loss, but they demand rigorous fault detection to prevent latent errors from propagating.[105] Fail-safe, however, suits non-redundant or low-tolerance applications like industrial valves, where failure induces closure to avert leaks, as validated in safety standards prioritizing containment over continuity.[5]

Trade-offs manifest in design complexity and risk profiles: fail-active requires advanced diagnostics and redundancy (e.g., triple modular redundancy), increasing costs and the potential for common-mode failures, whereas fail-safe simplifies implementation but may induce cascading downtime, as
seen in power systems where shutdowns prevent overloads yet halt production.[11] Real-world incidents, including aviation actuator evaluations, underscore that fail-active enhances dispatch reliability—reducing unscheduled maintenance by up to 20% in certified systems—but demands probabilistic modeling to bound multi-fault probabilities below 10^{-9} per flight hour per ARP4761 guidelines.[107] Selection hinges on operational context: fail-safe for absolute safety in irreversible processes, fail-active for mission-critical continuity where redundancy mitigates reversion needs.[108]
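The voting logic behind fail-active triple modular redundancy can be sketched compactly. This is an illustrative model, assuming a pairwise-agreement tolerance (the 0.01 value is arbitrary); real flight-control voters add time histories, channel health flags, and certified fault-detection logic.

```cpp
#include <cmath>

struct VoteResult {
    bool valid;    // false => no majority: revert to the fail-safe state
    double value;  // voted output when valid
};

// Fail-active TMR sketch: three channels are majority-voted. One faulty
// channel is masked (fail-active: operation continues); if no two channels
// agree within tolerance, the voter declares failure and the system falls
// back to fail-safe behavior instead.
VoteResult voteTMR(double a, double b, double c, double tol = 0.01) {
    if (std::fabs(a - b) <= tol) return {true, (a + b) / 2.0};
    if (std::fabs(a - c) <= tol) return {true, (a + c) / 2.0};
    if (std::fabs(b - c) <= tol) return {true, (b + c) / 2.0};
    return {false, 0.0};  // total disagreement: safe shutdown path
}
```

The two-tier response illustrates how fail-active and fail-safe compose in practice: redundancy absorbs the first fault silently, and the fail-safe reversion remains as the backstop for multi-channel disagreement.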
Criticisms and Limitations
Risks of Over-Reliance and Complacency
Over-reliance on fail-safe systems can induce complacency in human operators, leading to reduced monitoring, skill atrophy, and inadequate responses during unexpected failures.[109] In safety-critical environments, operators may develop excessive trust in automated safeguards, assuming they will invariably default to a safe state, which erodes situational awareness and the readiness to intervene manually.[110] This phenomenon, termed automation-induced complacency, has been documented in empirical studies where operators exhibit suboptimal vigilance, particularly under fatigue or routine conditions, increasing the likelihood of cascading errors if the fail-safe mechanism encounters unmodeled anomalies.

In aviation, for instance, pilots' heavy dependence on fail-safe automation like autopilot and stall-protection systems has correlated with diminished hand-flying proficiency and delayed recognition of system limitations, contributing to incidents such as the 2009 crash of Air France Flight 447, where crew complacency following automation handover led to loss of control during a temporary airspeed sensor failure.[111] Similarly, the 2013 Asiana Airlines Flight 214 accident involved pilots over-relying on autothrottle fail-safes, resulting in insufficient airspeed monitoring and a preventable impact short of the runway, as detailed in National Transportation Safety Board investigations highlighting complacency as a factor in automation misuse.[110] These cases underscore that fail-safe designs, while effective against isolated hardware faults, do not inherently counteract human tendencies toward under-vigilance, with FAA analyses noting a false sense of security that amplifies risks in dynamic scenarios.[110]

Broader industrial applications reveal analogous risks, where fail-safe interlocks in manufacturing or process controls foster operator complacency, prompting shortcuts or overrides under perceived low-risk conditions, as evidenced by occupational safety reports linking
long-term exposure to reliable safeguards with heightened near-miss rates from bypassed protocols.[112] Peer-reviewed research further quantifies this through metrics like reduced glance times toward system indicators in automated setups, predicting error rates up to 20-30% higher in complacent states compared to active monitoring regimes.[113] Mitigating such over-reliance requires integrating human factors training to sustain manual competencies, as unchecked complacency transforms fail-safe reliability into a vulnerability for systemic underperformance.
Empirical Evidence from Real-World Incidents
The Therac-25 radiation therapy machine, used in medical facilities from 1985 onward, incorporated multiple software-based fail-safe interlocks intended to prevent excessive radiation doses by verifying hardware positions before beam activation. However, between June 1985 and January 1987, at least six incidents occurred in which these safeguards failed due to race conditions in the software and inadequate error handling, resulting in patients receiving overdoses of up to 100 times the intended dose; three patients died from radiation injuries. The best-documented bug was a race condition triggered when operators rapidly edited treatment parameters, leaving the machine's state verification inconsistent with its actual configuration and allowing the high-energy electron mode to fire without the attenuating hardware in place; a separate one-byte counter overflow could bypass a positioning check entirely, and in both cases cryptic error messages were dismissed as transient by operators. Investigations revealed that reliance on software for safety-critical checks without sufficient hardware backups, combined with poor testing of edge cases, undermined the fail-safes, as the system did not default to a verifiable safe state under concurrent operations.[114][115]

In aviation, the Boeing 737 MAX's Maneuvering Characteristics Augmentation System (MCAS), certified in 2017, was designed with fail-safe assumptions including reliance on a single angle-of-attack (AOA) sensor input, intended to prevent stalls by automatically adjusting stabilizer trim. This contributed to Lion Air Flight 610 crashing on October 29, 2018, killing 189 people, and Ethiopian Airlines Flight 302 on March 10, 2019, killing 157, when faulty AOA sensor data repeatedly triggered erroneous nose-down commands without adequate pilot overrides or redundancy alerts. The system's single-sensor dependency violated fail-operational redundancy principles, as it lacked independent verification or a mechanism to disengage upon repeated activations, and training assumptions presumed pilots would recognize and counter it easily, which proved false under high-workload conditions.
Post-accident reviews by the Federal Aviation Administration identified the absence of dual-sensor inputs and the failure to illuminate warning lights for sensor discrepancies as key design flaws that allowed the fail-safe to propagate unsafe commands.[116][117]
The inaugural flight of the Ariane 5 rocket on June 4, 1996, demonstrated the limits of software fail-safe assumptions in inertial reference system operations. The guidance software, reused from the Ariane 4 without full revalidation for the Ariane 5's higher acceleration profile, raised an unhandled operand-error exception 36.7 seconds after liftoff when a conversion of a 64-bit floating-point value to a 16-bit signed integer overflowed in the horizontal bias estimation. This caused the backup inertial unit to shut down, leading to loss of attitude control and self-destruct activation at 39 seconds, destroying a payload valued at approximately $370 million. The fail-safe design included diagnostic shutdowns to protect the launcher from erroneous commands, but it assumed such errors would be non-critical trajectory deviations rather than cascading system halts; no contingency existed for a software exception in the primary reference system propagating to the backup without graceful degradation. The European Space Agency's inquiry board concluded that inadequate exception handling and over-reliance on Ariane 4 heritage code, without trajectory-specific bounds checking, exposed a vulnerability in which the "safe" response, a diagnostic halt, escalated to total mission failure.[118][119]
These incidents illustrate that fail-safe mechanisms, while mitigating common failures, can falter against unanticipated interactions, such as software concurrency issues or unmodeled inputs, particularly when designs prioritize performance over exhaustive redundancy or when assumptions about failure modes prove incorrect.
Empirical data from such events underscore the need for layered defenses beyond the initial fail-safe logic, including rigorous validation against operational envelopes and hardware-software independence, since single points of failure, even in "safe" default states, can amplify risk in complex systems.[114][116]
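The Ariane 5 failure mode, an unguarded narrowing conversion whose exception escalated into a unit shutdown, can be illustrated with a minimal Python sketch. The function names and trajectory values here are illustrative only; the actual flight software was written in Ada.

```python
# Minimal sketch of the Ariane 5 failure mode: converting a wide value
# into a narrow signed integer without a guard raises an exception that,
# if unhandled, halts the whole unit instead of degrading gracefully.

INT16_MIN, INT16_MAX = -32768, 32767

def narrow_unguarded(horizontal_bias: float) -> int:
    """Ada-style conversion: raises on overflow (the 'operand error')."""
    value = int(horizontal_bias)
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError("operand error: value exceeds 16-bit range")
    return value

def narrow_guarded(horizontal_bias: float) -> int:
    """Defensive alternative: saturate at the representable limits so an
    out-of-range input degrades the estimate instead of halting."""
    value = int(horizontal_bias)
    return max(INT16_MIN, min(INT16_MAX, value))

# Ariane 4 trajectories kept the bias in range; Ariane 5's did not
# (the numeric values below are illustrative, not flight data).
assert narrow_guarded(12000.0) == 12000      # nominal: both paths agree
assert narrow_guarded(80000.0) == INT16_MAX  # saturates instead of halting
try:
    narrow_unguarded(80000.0)                # unguarded path: exception
except OverflowError as exc:
    print("shutdown trigger:", exc)
```

The guarded variant is one of several possible mitigations; the inquiry board's broader point was that whichever policy is chosen, it must be validated against the actual operational envelope.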
Design Trade-Offs and Unintended Consequences
Implementing fail-safe mechanisms frequently introduces trade-offs in system complexity and cost, as redundancy and passive safeguards, such as duplicate sensors or normally closed circuits, require additional components and engineering effort to ensure the system defaults to a safe state upon failure.[37][120] For instance, in aerospace structures, fail-safe designs that allow controlled crack propagation to prevent catastrophic failure add material layers and inspection requirements, increasing manufacturing costs by an estimated 20-30% compared with simpler safe-life approaches, while also imposing weight penalties that reduce fuel efficiency.[36] These compromises extend to performance: enhanced safety margins, such as the factor of 1.4 for ultimate strength in NASA standards, limit operational envelopes to prioritize survival over optimization.[121]
In software and control systems, fail-safe strategies such as tolerant iteration over collections can mask latent errors longer than fail-fast alternatives, trading immediate detection for continuity but potentially allowing corrupted data to propagate, which complicates debugging and elevates long-term risk.[120] Similarly, in semiconductor design for safety-critical applications, integrating fail-safe logic alongside security features degrades power-performance-area (PPA) metrics, with safety overheads consuming 10-15% more silicon area and energy.[122]
Unintended consequences arise when fail-safe defaults interact poorly with operational contexts, creating emergent failure modes. For example, a fail-safe shutdown intended to prevent overpressure in an industrial process can cascade into a total system halt during critical operations, exacerbating hazards, as in the 1984 Bhopal disaster, where interdependent assumptions meant safety interlocks failed to contain, and instead propagated, a toxic release.[123] In aviation, adaptive fail-safe controls that enhance resilience to anticipated faults can amplify vulnerability to novel threats: the trade-off between flexibility and predictability led to unintended oscillations in early fly-by-wire systems before rigorous validation.[124] Moreover, the reliability of fail-safe systems can foster complacency: low overt failure rates, often below 10^-9 per hour in certified avionics, prompt incremental changes, such as software updates, that erode margins without full retesting, as observed in complex systems where hidden interactions surface post-deployment.[125] These outcomes underscore that while fail-safe designs mitigate single-point failures, they cannot eliminate systemic risks without holistic trade-off analysis.[126]
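The fail-fast versus fail-safe contrast in collection handling can be seen directly in Python's built-in types. This is a small generic sketch, not tied to any of the cited systems.

```python
# Fail-fast: a dict detects mutation during iteration and raises at once,
# surfacing the bug at the point where the invariant is violated.
readings = {"sensor_a": 1.0, "sensor_b": 2.0}
try:
    for name in readings:
        readings["sensor_c"] = 3.0   # mutation mid-iteration
except RuntimeError as exc:
    print("fail-fast:", exc)         # "dictionary changed size during iteration"

# Fail-safe (tolerant): iterating over a snapshot copy keeps running, but
# the concurrent mutation is silently masked, so a latent inconsistency
# can propagate and only surface much later, somewhere else.
values = [1.0, 2.0, 3.0]
for v in list(values):               # snapshot copy: loop never notices
    if v == 2.0:
        values.remove(v)
print("fail-safe result:", values)   # no error was ever raised
```

The trade-off is exactly the one described above: the fail-fast path converts a latent defect into an immediate, debuggable failure, while the tolerant path preserves continuity at the cost of delayed detection.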
Case Studies
Successful Implementations
In aviation, redundant flight control and propulsion systems represent paradigmatic successful fail-safe implementations, enabling safe operation despite component failures. Commercial aircraft typically incorporate triple-redundant hydraulic systems and fly-by-wire architectures that default to stabilized flight modes upon detecting anomalies such as sensor discrepancies or actuator faults. Airbus's fly-by-wire technology, introduced on the A320 in 1988, has logged billions of flight hours without a single fatal accident attributable to control system failure, as envelope protection features prevent excursions beyond safe flight parameters even under pilot override attempts.[127][128] This design philosophy has contributed to aviation's overall safety record, in which engine failures, contained by fail-safe casings and balanced by multi-engine redundancy, result in controlled returns rather than catastrophes; for example, twin-engine aircraft routinely complete flights on remaining power after one engine malfunctions, with statistical analyses showing redundancy averting potential hull losses in over 99% of such events.[129]
Nuclear power plants employ fail-safe emergency core cooling and control rod insertion mechanisms, exemplified by the Onagawa Nuclear Power Station's response to the Tōhoku earthquake and tsunami of March 11, 2011. Despite ground accelerations reaching 0.56 g, exceeding the plant's design basis of 0.42 g, and a roughly 13-meter tsunami that nearly reached the height of its seawall, Units 1 and 2 automatically scrammed via rapid control rod insertion, maintaining core cooling through diverse backup systems, including diesel generators and seawater injection lines that avoided submergence.
No core damage or significant radiation release occurred, and post-event inspections confirmed intact fuel integrity, underscoring the efficacy of passive and active fail-safe redundancies in averting meltdown under extreme transients.[130] This contrasts with contemporaneous failures elsewhere, highlighting how rigorous adherence to fail-safe principles, including seismic isolation and multiple isolation condensers, preserved public safety without reliance on off-site power.[131]
In automotive applications, fail-safe airbag deployment systems have demonstrably reduced fatalities by triggering on impact sensor detection, independent of driver input. Frontal airbags, mandated in U.S. vehicles since 1998, have lowered driver death rates by approximately 11% overall and 29% in the frontal crashes where they deploy, per analyses of millions of accident records from 1987 to 2017, by cushioning deceleration forces that would otherwise cause severe trauma.[10] Similarly, anti-lock braking systems (ABS), which modulate brake pressure to prevent wheel lockup and revert to conventional braking when sensor faults are detected, correlate with 20-30% fewer fatal single-vehicle crashes on wet roads, as evidenced by European and U.S. insurance data spanning decades of deployment since the 1970s. These mechanisms ensure that partial system degradation results in a graceful fallback to basic braking rather than total loss of control, saving thousands of lives annually, as shown by empirical crash outcome improvements.[37]
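The graceful-fallback pattern described for ABS, where a detected sensor fault drops the system to basic braking rather than to no braking at all, can be sketched in a few lines. The function name, the None-on-fault sensor convention, and the thresholds are all hypothetical, not any manufacturer's implementation.

```python
# Sketch of ABS-style graceful degradation: on a wheel-speed sensor fault,
# the controller disables modulation and passes driver pressure through,
# so the failure mode is "conventional brakes", never "no brakes".
from typing import Optional

def abs_command(driver_pressure: float, wheel_speed: Optional[float]) -> float:
    """Return the brake pressure to apply.

    wheel_speed is None when the sensor self-check fails (hypothetical API);
    speeds are in m/s, pressures in arbitrary units.
    """
    if wheel_speed is None:
        # Fail-safe fallback: no modulation, driver input passes through.
        return driver_pressure
    if wheel_speed < 1.0:
        # Imminent lockup (illustrative threshold): release some pressure.
        return driver_pressure * 0.5
    return driver_pressure

assert abs_command(100.0, 20.0) == 100.0   # normal braking
assert abs_command(100.0, 0.2) == 50.0     # modulation near lockup
assert abs_command(100.0, None) == 100.0   # sensor fault: basic braking kept
```

The key design choice, mirroring the prose above, is that the degraded mode is a strictly less capable but still controllable system, not a shutdown.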
Failures Despite Fail-Safe Designs
The Therac-25 radiation therapy machine, developed by Atomic Energy of Canada Limited (AECL) and deployed in the mid-1980s, incorporated fail-safe software interlocks intended to prevent electron beam delivery without proper target and turntable positioning, defaulting to safe modes upon detected anomalies. However, between 1985 and 1987, software race conditions and related bugs allowed these interlocks to be bypassed, resulting in six accidents with massive radiation overdoses, up to 100 times the intended dose, causing severe injuries and at least three deaths. These failures stemmed from inadequate error handling: operator attempts to proceed past cryptic error messages inadvertently triggered high-energy modes without hardware verification, as the mechanical safeties of prior models had been removed to reduce costs, exposing reliance on untested software assumptions.[115][132]
In the Boeing 737 MAX's Maneuvering Characteristics Augmentation System (MCAS), introduced in 2017 to compensate for the shifted engine placement, fail-safe logic was designed to activate only when a single angle-of-attack (AOA) sensor input exceeded thresholds, with pilot override available via control column forces. Yet in the October 2018 Lion Air Flight 610 and March 2019 Ethiopian Airlines Flight 302 crashes, which together killed 346 people, erroneous AOA data from a single faulty sensor repeatedly triggered uncommanded nose-down trim without adequate pilot warnings or redundancy, as the system lacked dual-sensor cross-checking and the AOA-disagree alert was inoperative on most aircraft.
Investigations revealed that Boeing's certification process concealed MCAS's expanded operational envelope from pilots and regulators, on the assumption that single-sensor failure probabilities were too low to warrant additional fail-safes, leading to the type's global grounding.[116][131]
At Fukushima Daiichi in March 2011, multiple redundant fail-safe systems, including emergency core cooling, diesel generators, and seawater injection, were engineered to maintain reactor integrity after loss-of-coolant accidents by automatically isolating and flooding the cores. The Tōhoku tsunami, far exceeding the site-specific design basis, caused a common-mode failure of the electrical systems, flooding generator rooms and disabling AC power for weeks, which prevented valve actuation and pump operation despite partial DC backups. This cascaded into hydrogen explosions and core meltdowns in Units 1-3, releasing radionuclides equivalent to roughly 10% of Chernobyl's, as the designs underestimated correlated extreme events and lacked elevated, tsunami-hardened power redundancies.[133][131]
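One of the published Therac-25 defects, a one-byte flag whose rollover to zero silently skipped a safety check, reduces to a few lines of code. This is a simplified sketch of the failure pattern described in the accident analyses, not AECL's actual code; the class and method names are invented for illustration.

```python
# Sketch of the Therac-25 "Class3" defect: a shared one-byte flag was
# incremented instead of being set to a constant, so on every 256th pass
# it rolled over to zero, which the interlock read as "no check needed".

class CollimatorInterlock:
    def __init__(self) -> None:
        self.class3 = 0  # one-byte flag; nonzero means "verify position"

    def schedule_check(self) -> None:
        # Bug: increment modulo 256 instead of assigning a fixed nonzero value.
        self.class3 = (self.class3 + 1) % 256

    def check_skipped(self) -> bool:
        # The position check only runs when the flag is nonzero.
        return self.class3 == 0

interlock = CollimatorInterlock()
skips = 0
for _ in range(512):
    interlock.schedule_check()
    if interlock.check_skipped():
        skips += 1
print("checks silently skipped out of 512 passes:", skips)  # 2
```

The fix is trivial (assign a constant instead of incrementing), which is precisely why such defects evade review: the fail-safe appears to work on almost every pass, and only an exhaustive analysis of the shared state reveals the periodic hole.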