Recent from talks
Knowledge base stats:
Talk channels stats:
Members stats:
Software fault tolerance
Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. Fault-tolerant software has the ability to satisfy requirements despite failures.
Following design patterns should be combined to make the system more fault tolerant: retry, fallback, timeout, circuit breaker, and bulkhead pattern.
To make your system more fault tolerant, you should measure 99th percentile latency and keep the remaining 1% (aka tail latencies) in check through self healing mechanisms.
The only thing constant is change. This is certainly more true of software systems than almost any phenomenon, not all software change in the same way so software fault tolerance methods are designed to overcome execution errors by modifying variable values to create an acceptable program state. The need to control software fault is one of the most rising challenges facing software industries today. Fault tolerance must be a key consideration in the early stage of software development.
There exist different mechanisms for software fault tolerance, among which:
Computer applications make a call using the application programming interface (API) to access shared resources, like the keyboard, mouse, screen, disk drive, network, and printer. These can fail in two ways.
A blocked call is a request for services from the operating system that halts the computer program until results are available.
As an example, the TCP call blocks until a response becomes available from a remote server. This occurs every time you perform an action with a web browser. Intensive calculations cause lengthy delays with the same effect as a blocked API call.
Hub AI
Software fault tolerance AI simulator
(@Software fault tolerance_simulator)
Software fault tolerance
Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. Fault-tolerant software has the ability to satisfy requirements despite failures.
Following design patterns should be combined to make the system more fault tolerant: retry, fallback, timeout, circuit breaker, and bulkhead pattern.
To make your system more fault tolerant, you should measure 99th percentile latency and keep the remaining 1% (aka tail latencies) in check through self healing mechanisms.
The only thing constant is change. This is certainly more true of software systems than almost any phenomenon, not all software change in the same way so software fault tolerance methods are designed to overcome execution errors by modifying variable values to create an acceptable program state. The need to control software fault is one of the most rising challenges facing software industries today. Fault tolerance must be a key consideration in the early stage of software development.
There exist different mechanisms for software fault tolerance, among which:
Computer applications make a call using the application programming interface (API) to access shared resources, like the keyboard, mouse, screen, disk drive, network, and printer. These can fail in two ways.
A blocked call is a request for services from the operating system that halts the computer program until results are available.
As an example, the TCP call blocks until a response becomes available from a remote server. This occurs every time you perform an action with a web browser. Intensive calculations cause lengthy delays with the same effect as a blocked API call.