Predictive failure analysis


Predictive failure analysis (PFA) refers to methods intended to predict imminent failure of systems or components (software or hardware), and potentially enable mechanisms to avoid or counteract failure issues, or recommend maintenance of systems prior to failure.

For example, a computer can analyze trends in corrected errors to predict future failures of hardware or memory components and proactively enable mechanisms to avoid them. Predictive failure analysis was originally used as a term for a proprietary IBM technology for monitoring the likelihood of hard disk drive failure, although the term is now used generically for a variety of technologies for judging the imminent failure of CPUs, memory and I/O devices.

IBM introduced the term "PFA" and its technology in 1992, with reference to its 0662-S1x drive (a 1052 MB Fast-Wide SCSI-2 disk that operated at 5400 rpm).

The technology relies on measuring several key (mainly mechanical) parameters of the drive unit, such as the flying height of the heads. The drive firmware compares the measured parameters against predefined thresholds and evaluates the health status of the drive. If the drive appears likely to fail soon, the system sends a notification to the disk controller.
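The threshold comparison described above can be illustrated with a minimal Python sketch. It is not IBM's firmware logic: the parameter names, the numeric limits and the helper functions `out_of_range` and `failure_predicted` are all hypothetical, chosen only to show measured values being checked against predefined ranges and reduced to a single yes/no verdict.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Acceptable range for one monitored parameter (hypothetical values)."""
    low: float
    high: float


# Hypothetical parameters and limits, for illustration only.
THRESHOLDS = {
    "flying_height_nm": Threshold(low=40.0, high=120.0),
    "spin_up_time_ms": Threshold(low=0.0, high=9000.0),
}


def out_of_range(measurements: dict[str, float]) -> list[str]:
    """Return the sorted names of parameters that fall outside their thresholds."""
    return sorted(
        name
        for name, value in measurements.items()
        if name in THRESHOLDS
        and not (THRESHOLDS[name].low <= value <= THRESHOLDS[name].high)
    )


def failure_predicted(measurements: dict[str, float]) -> bool:
    """Binary health verdict: True means a notification would be raised."""
    return bool(out_of_range(measurements))
```

In this sketch, `failure_predicted({"flying_height_nm": 20.0})` evaluates to `True` because the hypothetical flying height is below its lower limit, while a reading inside every range yields `False`.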

The major drawbacks of the technology included:

The technology was later merged with IntelliSafe to form the Self-Monitoring, Analysis and Reporting Technology (SMART).

High counts of intermittent RAM errors corrected by ECC can be predictive of future DIMM failures, so automatic offlining of memory and CPU caches can be used to avoid future errors. For example, under the Linux operating system the mcelog daemon will automatically remove from use memory pages showing excessive corrections, and will remove from use processor cores showing excessive correctable cache errors.
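The idea of offlining a page after "excessive corrections" can be sketched as a per-page counter over a sliding time window. This is only an illustration under stated assumptions, not mcelog's actual algorithm or configuration: the class `CorrectedErrorTracker`, its `max_errors` and `window_seconds` parameters, and the in-memory `offlined` set are all hypothetical.

```python
from collections import defaultdict, deque


class CorrectedErrorTracker:
    """Counts corrected errors per memory page inside a sliding time window."""

    def __init__(self, max_errors: int, window_seconds: float) -> None:
        self.max_errors = max_errors          # hypothetical tolerance per window
        self.window_seconds = window_seconds  # hypothetical window length
        self._events = defaultdict(deque)     # page -> timestamps of recent errors
        self.offlined = set()                 # pages taken out of use

    def record(self, page: int, timestamp: float) -> bool:
        """Record one corrected error; return True if the page is offlined."""
        if page in self.offlined:
            return True
        events = self._events[page]
        events.append(timestamp)
        # Drop events that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        # More corrections than tolerated within the window: stop using the page.
        if len(events) > self.max_errors:
            self.offlined.add(page)
            del self._events[page]
        return page in self.offlined
```

With `max_errors=2` and `window_seconds=10.0`, three corrected errors on the same page within ten seconds mark it as offlined, whereas the same three errors spread twenty seconds apart do not. A real daemon would additionally ask the operating system to stop using the page; that step is deliberately left out here.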

On optical media (CD, DVD and Blu-ray), failures caused by media degradation can be predicted, and media of low manufacturing quality can be detected before data loss occurs, by measuring the rate of correctable data errors with software such as QpxTool or Nero DiscSpeed. However, not all vendors and models of optical drives allow error scanning.
