ECC memory
from Wikipedia
ECC DIMMs typically have nine memory chips on each side, one more than usually found on non-ECC DIMMs (some modules may have 5 or 18).[1]

Error correction code memory (ECC memory) is a type of computer data storage that uses an error correction code[a] (ECC) to detect and correct n-bit data corruption which occurs in memory.

Typically, ECC memory maintains a memory system immune to single-bit errors: the data that is read from each word is always the same as the data that had been written to it, even if one of the bits actually stored has been flipped to the wrong state. Most non-ECC memory cannot detect errors, although some non-ECC memory with parity support allows detection but not correction.

ECC memory is used in most computers where data corruption cannot be tolerated, like industrial control applications, critical databases, and infrastructural memory caches.

Background: memory errors

Concept

Error correction codes protect against undetected data corruption and are used in computers where such corruption is unacceptable, examples being scientific and financial computing applications, or in database and file servers. ECC can also reduce the number of crashes in multi-user server applications and maximum-availability systems.

Electrical or magnetic interference inside a computer system can cause a single bit of dynamic random-access memory (DRAM) to spontaneously flip to the opposite state. It was initially thought that this was mainly due to alpha particles emitted by contaminants in chip packaging material, but research has shown that the majority of one-off soft errors in DRAM chips occur as a result of background radiation, chiefly neutrons from cosmic ray secondaries, which may change the contents of one or more memory cells or interfere with the circuitry used to read or write to them.[2] Hence, the error rates increase rapidly with rising altitude; for example, compared to sea level, the rate of neutron flux is 3.5 times higher at 1.5 km and 300 times higher at 10–12 km (the cruising altitude of commercial airplanes).[3] As a result, systems operating at high altitudes require special provisions for reliability.

As an example, the spacecraft Cassini–Huygens, launched in 1997, contained two identical flight recorders, each with 2.5 gigabits of memory in the form of arrays of commercial DRAM chips. Due to built-in EDAC functionality, the spacecraft's engineering telemetry reported the number of (correctable) single-bit-per-word errors and (uncorrectable) double-bit-per-word errors. During the first 2.5 years of flight, the spacecraft reported a nearly constant single-bit error rate of about 280 errors per day. However, on November 6, 1997, during the first month in space, the number of errors increased by more than a factor of four on that single day. This was attributed to a solar particle event that had been detected by the satellite GOES 9.[4]

There was some concern that as DRAM density increases further, and thus the components on chips get smaller, while operating voltages continue to fall, DRAM chips will be affected by such radiation more frequently, since lower-energy particles will be able to change a memory cell's state.[3] On the other hand, smaller cells make smaller targets, and moves to technologies such as SOI may make individual cells less susceptible and so counteract, or even reverse, this trend. Recent studies[5] show that single-event upsets due to cosmic radiation have been dropping dramatically with process geometry, and previous concerns over increasing bit cell error rates are unfounded.

Real-world error rates and consequences

Work published between 2007 and 2009 showed widely varying error rates with over 7 orders of magnitude difference, ranging from 10−10 error/(bit·h), roughly one bit error per hour per gigabyte of memory, to 10−17 error/(bit·h), roughly one bit error per millennium per gigabyte of memory.[5][6][7] A large-scale study based on Google's very large number of servers was presented at the SIGMETRICS/Performance '09 conference.[6] The actual error rate found was several orders of magnitude higher than the previous small-scale or laboratory studies, with between 25,000 (2.5×10−11 error/(bit·h)) and 70,000 (7.0×10−11 error/(bit·h), or 1 bit error per gigabyte of RAM per 1.8 hours) errors per billion device hours per megabit. More than 8% of DIMM memory modules were affected by errors per year.
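The unit conversions behind these figures can be cross-checked in a few lines; the sketch below assumes 1 GB = 2^30 bytes and uses the study's upper and lower bounds quoted above:

```python
# Convert a per-bit hourly error rate into "one error per N hours per gigabyte".
# Assumes 1 GB = 2**30 bytes; the rates are the study figures quoted above.

BITS_PER_GB = 8 * 2**30  # 8,589,934,592 bits

def hours_per_error_per_gb(rate_per_bit_hour):
    """Expected hours between single-bit errors in one gigabyte of DRAM."""
    return 1.0 / (rate_per_bit_hour * BITS_PER_GB)

# Upper bound from the Google study: 7.0e-11 error/(bit*h)
print(round(hours_per_error_per_gb(7.0e-11), 1))   # ~1.7 hours, i.e. roughly 1.8
# Lower bound: 2.5e-11 error/(bit*h)
print(round(hours_per_error_per_gb(2.5e-11), 1))   # ~4.7 hours
```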

The consequence of a memory error is system-dependent. In systems without ECC, an error can lead either to a crash or to corruption of data; in large-scale production sites, memory errors are one of the most-common hardware causes of machine crashes.[6] Memory errors can cause security vulnerabilities.[6] A memory error can have no consequences if it changes a bit which neither causes observable malfunctioning nor affects data used in calculations or saved. A 2010 simulation study showed that, for a web browser, only a small fraction of memory errors caused data corruption, although, as many memory errors are intermittent and correlated, the effects of memory errors were greater than would be expected for independent soft errors.[8]

Some tests conclude that the isolation of DRAM memory cells can be circumvented by unintended side effects of specially crafted accesses to adjacent cells. Thus, accessing data stored in DRAM causes memory cells to leak their charges and interact electrically, as a result of high cell density in modern memory, altering the content of nearby memory rows that actually were not addressed in the original memory access. This effect is known as row hammer, and it has also been used in some privilege escalation computer security exploits.[9][10]

An example of a single-bit error that would be ignored by a system with no error-checking, would halt a machine with parity checking or be invisibly corrected by ECC: a single bit is stuck at 1 due to a faulty chip, or becomes changed to 1 due to background or cosmic radiation; a spreadsheet storing numbers in ASCII format is loaded, and the character "8" (decimal value 56 in the ASCII encoding) is stored in the byte that contains the stuck bit at its lowest bit position; then, a change is made to the spreadsheet and it is saved. As a result, the "8" (0011 1000 binary) has silently become a "9" (0011 1001).
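The scenario can be reproduced in a couple of lines; the bit position and values below match the example above:

```python
# A single stuck/flipped low bit turns ASCII "8" into "9", as described above.
original = ord("8")           # 0b0011_1000 == 56
corrupted = original | 0b1    # lowest bit stuck at 1
print(bin(original), bin(corrupted))  # 0b111000 0b111001
print(chr(corrupted))         # "9" (decimal 57): the digit silently changed
```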

Solutions

Several approaches have been developed to deal with unwanted bit-flips, including immunity-aware programming, RAM parity memory, and ECC memory.

This problem can be mitigated by using DRAM modules that include extra memory bits and memory controllers that exploit these bits. These extra bits are used to record parity or to use an error-correcting code (ECC). Parity allows the detection of all single-bit errors (actually, any odd number of wrong bits), but not correction, so the system has to either carry on (just flagging the problem) or halt. Error-correction codes allow for more errors to be corrected; how much depends on the exact type of memory used.
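A short sketch of the parity scheme just described; the byte value is arbitrary:

```python
# Even parity over a byte: the parity bit makes the total number of 1s even.
# Any odd number of flipped bits changes the parity and is detected; an even
# number of flips goes unnoticed, and no flip can be located or corrected.

def parity_bit(byte):
    return bin(byte).count("1") % 2

data = 0b0011_1000
stored_parity = parity_bit(data)

one_flip = data ^ 0b0000_0001             # single-bit error: detected
two_flips = data ^ 0b0000_0011            # double-bit error: missed
print(parity_bit(one_flip) != stored_parity)   # True  -> error detected
print(parity_bit(two_flips) != stored_parity)  # False -> error slips through
```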

DRAM memory may provide increased protection against soft errors by relying on error-correcting codes. Such error-correcting memory, known as ECC or EDAC-protected memory, is particularly desirable for highly fault-tolerant applications, such as servers, as well as deep-space applications due to increased radiation.

Some systems also "scrub" the memory, by periodically reading all addresses and writing back corrected versions if necessary to remove accumulated soft errors.
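A minimal sketch of a scrub pass; for brevity it uses triple modular redundancy (majority vote over three copies) as the correction mechanism rather than a real ECC, and the memory layout is hypothetical:

```python
# Scrubbing sketch: periodically read every address (triggering correction of
# any single-bit error) and write the corrected value back, so soft errors
# cannot accumulate into uncorrectable multi-bit errors. The "ECC" here is
# triple modular redundancy, not a Hamming code; the scrub loop is the point.

def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)  # bitwise majority vote

def scrub(mem):
    """mem maps address -> [copy0, copy1, copy2]; repair all copies in place."""
    for addr, copies in mem.items():
        good = majority(*copies)
        mem[addr] = [good, good, good]

mem = {0: [0x5A, 0x5A, 0x5A], 1: [0x3C, 0x3C, 0x3C]}
mem[1][0] ^= 0b0100_0000             # a soft error flips one bit in one copy
scrub(mem)
print(mem[1])                        # [60, 60, 60]: error scrubbed away
```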

Schemes

Modern memory subsystems may deliver data integrity through one or more of the following schemes:[11]

  • By memory controller: In these schemes, the memory controller exchanges extra check data with the memory chips.
    • Side-band ECC (SBECC) is the traditional server approach. ECCs are stored in separate DRAM chips and transmitted with data through additional channels (extra bits per word). The memory controller computes ECCs when writing, corrects errors when reading, and reports error corrections and detections to the operating system or firmware (UEFI or BIOS).
    • Inline ECC or In-band ECC (IBECC) does not use extra channel width and is, as a result, compatible with "non-ECC" memory modules. Instead, the memory controller partitions the physical address space.
      • In one style of implementation, represented by Intel's IBECC and TI's RTOS processors, the physical address space is partitioned so that a chunk of memory is reserved.[12] Each write command must then be accompanied by an additional write command, and the same applies to read commands, resulting in an approximate doubling of memory latency. In practice, Intel's implementation has minimal performance impact on web browsing and productivity applications, but can reduce performance by up to 25% in gaming and video editing workloads.[13]
      • It is theoretically possible to simply partition the existing channel (say, 64 bits into 56 bits of data and 8 bits of checking) to provide an analogue of side-band ECC. A cursory read of Synopsys's description of "inline ECC", which mentions partitioning the 16-bit channel-per-chip, would suggest this understanding, but the approach is not very common in commercial products.[14]
  • By memory chip: On-die ECC (ODECC), also called in-DRAM ECC or integrated ECC,[15] is mandatory in all DDR5 and LPDDR6[16] memory modules to mitigate higher error rates associated with smaller memory cells. Additional ECC storage and error correction circuitry are embedded in DRAM chips and are invisible to the memory controller. Transmission errors are not corrected since ECCs are not sent with the data, and error corrections and detections are not reported. Additional latency is introduced only when error correction is needed.
  • By both
    • Link ECC adds error-correction to the data link but not the underlying storage. The memory controller computes and transmits ECCs with the data when writing to the DRAM, which verifies and corrects errors. When reading, the DRAM computes ECCs that the memory controller then verifies. It is a part of LPDDR5. While side-band ECC automatically provides link-level redundancy, inband/inline ECC using physical address space reserving and on-die ECC do not; they would need a layer of link ECC to protect against corruption in transmission.

Reporting of error

Many early implementations of ECC memory, as well as on-die ECC, mask correctable errors, acting as if the error never occurred, and only report uncorrectable errors. Modern implementations log both correctable errors (CE) and uncorrectable errors (UE). Some operators proactively replace memory modules that exhibit high correctable-error rates, in order to reduce the likelihood of uncorrectable-error events.[17]

Implementations

Standard server memory: side-band SECDED

Standard server memory is designed around a single-error-correction, double-error-detection (SECDED) Hamming code, which allows a single-bit error to be corrected and double-bit errors to be detected per word (the unit of bus transfer). Since DDR SDRAM, the standard bus width (word size) as far as memory is concerned has been 64 bits. As a result, the typical setup from DDR through DDR4 is a 72-bit word with 64 data bits and 8 check bits. DDR5 SDRAM splits the bus into two somewhat independent 32-bit subchannels, so ECC memory uses 80 bits of width in total, split between two 40-bit (32 data, 8 check) channels.[18] ECC is also used with smaller and larger word sizes.
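The 8-check-bit figure follows from the Hamming condition 2^m ≥ k + m + 1 plus one extra bit for double-error detection; a small sketch of that calculation (the function name is illustrative):

```python
# Why 8 check bits per 64 data bits: a Hamming code needs the smallest m with
# 2**m >= k + m + 1; SECDED then adds one overall parity bit on top.

def hamming_check_bits(k):
    m = 1
    while 2**m < k + m + 1:
        m += 1
    return m

k = 64
m = hamming_check_bits(k)        # 7, since 2**7 = 128 >= 64 + 7 + 1 = 72
print(m + 1)                     # 8 check bits -> the 72-bit (64+8) ECC word
```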

An ECC-capable memory controller uses the additional bits to store the SECDED code; the memory is only responsible for holding the extra bits. Since the late 1990s, the memory controller has also communicated with the BIOS and maintained a count of errors detected and corrected, in part to help identify failing memory modules before the problem becomes catastrophic. Reading the counters is supported on many systems via the SMBIOS standard and is available on Linux, BSD, and Windows (Windows 2000 and later).[19]

Layout of bits

Error detection and correction depends on an expectation of the kinds of errors that occur. Implicitly, it is assumed that the failure of each bit in a word of memory is independent, making two simultaneous errors improbable. This used to be the case when memory chips were one bit wide, which was typical in the first half of the 1980s; later developments moved many bits into the same chip.

This weakness is addressed by various technologies, including IBM's Chipkill, Sun Microsystems' Extended ECC, Hewlett-Packard's Chipspare, and Intel's Single Device Data Correction (SDDC), all of which ensure that the failure of one memory chip affects only one bit per ECC word. This is achieved by scattering the bits of ECC words across chips, a form of interleaving. To ensure each chip holds only one bit per word, it may be necessary to interleave across multiple memory modules (sticks).

Interleaving in general is a useful defense against correlated multi-bit failures: by assigning physically neighboring bits to different words, an event such as a cosmic ray that upsets several adjacent cells produces at most one error in each of several words. As long as a single-event upset (SEU) does not exceed the error threshold (e.g., a single error) in any particular word between accesses, it can be corrected (e.g., by a single-bit error-correcting code), and an effectively error-free memory system may be maintained.[20]
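A toy illustration of the scattering idea; the chip-to-bit mapping here is illustrative, not a real module layout:

```python
# Sketch of bit interleaving: scatter the bits of each ECC word across chips so
# that one failing chip (or a cluster of physically adjacent upset cells)
# contributes at most one bit per word.

NUM_CHIPS = 8

def chip_for_bit(bit_index):
    # Bit i of every word lives on chip i (mod NUM_CHIPS): a whole-chip
    # failure therefore touches exactly one bit position in each 8-bit word.
    return bit_index % NUM_CHIPS

# If chip 3 fails, count how many bits of one 8-bit word it held:
hit = sum(1 for b in range(8) if chip_for_bit(b) == 3)
print(hit)  # 1 -> a single-bit error per word, correctable by SECDED
```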

By memory chip itself

Some DRAM chips include internal "on-chip" or "on-die" error-correction circuits, which allow systems with non-ECC memory controllers to still gain most of the benefits of ECC memory.[21][22] In some systems, a similar effect may be achieved by using EOS memory modules.

As mentioned above, on-die ECC is mandatory on DDR5 and LPDDR6. However, its lack of reporting means that very little is known about the true state of the memory chip until errors exceed the on-die algorithm's ability to correct them; no information about the remaining margin is conveyed. Sophisticated algorithms have been built to infer the existence of corrected errors from the non-corrected errors that do surface.[23]

Location of correction

Many ECC memory systems use an "external" EDAC circuit between the CPU and the memory. A few systems with ECC memory use both internal and external EDAC systems; the external EDAC system should be designed to correct certain errors that the internal EDAC system cannot.[21] Modern desktop and server CPUs integrate the EDAC circuit,[24] a practice that predates the shift toward CPU-integrated memory controllers associated with the NUMA architecture. CPU integration enables a zero-penalty EDAC system during error-free operation.

Correction algorithms

As of 2009, the most-common error-correction codes use Hamming or Hsiao codes that provide single-bit error correction and double-bit error detection (SEC-DED). Other error-correction codes have been proposed for protecting memory – double-bit error correcting and triple-bit error detecting (DEC-TED) codes, single-nibble error correcting and double-nibble error detecting (SNC-DND) codes, Reed–Solomon error correction codes, etc. However, in practice, multi-bit correction is usually implemented by interleaving multiple SEC-DED codes.[25][26]

Early research attempted to minimize the area and delay overheads of ECC circuits. Hamming first demonstrated that SEC-DED codes were possible with one particular check matrix. Hsiao showed that an alternative matrix with odd-weight columns provides SEC-DED capability with less hardware area and shorter delay than traditional Hamming SEC-DED codes.[27] More recent research also attempts to minimize power in addition to minimizing area and delay.[28][29]

Redundancy instead of ECC

Error-correcting memory controllers traditionally use space-optimal error-correction codes such as Hamming and Hsiao. If cost and space are not a concern but speed is, triple modular redundancy (TMR) may be used instead for its faster hardware implementation.[20] Space satellite systems often use TMR,[30][31][32] although satellite RAM usually uses Hamming error correction.[33]

Personal computers

In 1982 this 512 KB memory board from Cromemco used 22 bits of storage per 16-bit word to perform single-bit error correction.

Seymour Cray famously said "parity is for farmers" when asked why he left this out of the CDC 6600.[34] Later, he included parity in the CDC 7600, which caused pundits to remark that "apparently a lot of farmers buy computers". The original IBM PC and all PCs until the early 1990s used parity checking.[35] Later ones mostly did not.

Most data paths on a 2020s personal computer, including PCIe, SATA, chip-to-chip interconnects, and on-disk storage, have some form of ECC protection. The lack of ECC on main memory is unusual by comparison, especially given its size and higher likelihood of corruption. Linus Torvalds wrote a long e-mail thread in 2021 attacking Intel's choice to forgo ECC support on desktop platforms, when contemporary AMD desktop platforms could use (but not necessarily enable the ECC feature on) registered DIMMs with ECC support.[36]


Cache

Many CPUs use error-correction codes in the on-chip cache, including the Intel Itanium, Xeon, Core and Pentium (since P6 microarchitecture)[37][38] processors, the AMD Athlon, Opteron, all Zen-[39] and Zen+-based[40] processors (EPYC, EPYC Embedded, Ryzen and Ryzen Threadripper), and the DEC Alpha 21264.[25][41]

As of 2006, EDC/ECC and ECC/ECC are the two most-common cache error-protection techniques used in commercial microprocessors. The EDC/ECC technique uses an error-detecting code (EDC) in the level 1 cache. If an error is detected, data is recovered from ECC-protected level 2 cache. The ECC/ECC technique uses an ECC-protected level 1 cache and an ECC-protected level 2 cache.[42] CPUs that use the EDC/ECC technique always write-through all STOREs to the level 2 cache, so that when an error is detected during a read from the level 1 data cache, a copy of that data can be recovered from the level 2 cache.

Registered memory

Registered, or buffered, memory is not the same as ECC; the technologies perform different functions. It is usual for memory used in servers to be both registered, to allow many memory modules to be used without electrical problems, and ECC, for data integrity.

Cost and benefits

The use of ECC to improve data integrity comes at a cost: marginally slower performance and higher memory prices.

ECC memory is more expensive than non-ECC memory due to its additional error-checking functionality.[43] In 2010, the added cost of ECC for 1 GB of memory varied between $0 and $15, depending on performance and manufacturer.[44] ECC's design for high-reliability workloads brings additional validation overhead and extra circuit-level complexity within the memory.[45] These features typically result in higher implementation costs.

Motherboard manufacturers may choose to add ECC compatibility of varying levels depending on the market segment.[46] Some ECC-enabled boards and processors are able to support unbuffered (unregistered) ECC, but will also work with non-ECC memory; system firmware enables ECC functionality if ECC memory is installed.[47]

ECC may lower memory performance by around 2–3 percent on some systems, depending on the application and implementation, due to the additional time needed for ECC memory controllers to perform error checking.[48] However, modern systems integrate ECC testing into the CPU, generating no additional delay to memory accesses as long as no errors are detected.[24][49][50]

This is not the case for in-band ECC, which stores the tables used for protection in a reserved region of main system memory.[51][52] Intel supports it for Chromebooks, where it showed little impact on web browsing and productivity tasks but caused up to a 25% reduction in gaming and video editing benchmarks.[53]

from Grokipedia
Error-correcting code (ECC) memory is a type of dynamic random-access memory (DRAM) that incorporates error-detecting and error-correcting mechanisms to identify and fix single-bit errors in stored data, enhancing data integrity in computing systems. The technology adds redundant bits, typically eight extra bits per 64 bits of data, to enable real-time correction of errors caused by cosmic rays, electrical interference, or hardware faults, using algorithms such as Hamming codes or single-error correction, double-error detection (SECDED) schemes.

ECC memory operates by generating check information during data writes, which is stored alongside the primary data in dedicated chips or channels; on reads, the memory controller recalculates this information and compares it to the stored version to pinpoint and correct discrepancies. In modern implementations such as DDR4 and DDR5 modules, it often uses a 72-bit-wide bus (64 data bits plus 8 ECC bits) in side-band configurations, where ECC data resides in separate DRAM devices, or inline setups for low-power variants. While it can reliably correct single-bit flips and detect double-bit errors, ECC does not address multi-bit errors within a single device, though advanced variants such as Chipkill provide tolerance for entire chip failures.

Primarily deployed in mission-critical environments, ECC memory is standard in servers, high-performance workstations, and embedded systems handling financial transactions, scientific simulations, or other critical data, where even minor corruption could lead to catastrophic failures. It requires compatible server-grade processors and motherboards with an integrated memory controller capable of ECC operations. Compared to non-ECC RAM, ECC modules introduce a slight overhead, around 2-3% slower due to the additional error-checking cycles, but significantly reduce the annual failure rate, from about 0.6% to 0.09% in large-scale deployments according to a 2014 study.

Recent advancements, such as on-die ECC in DDR5, integrate correction logic directly within the memory chips to protect against internal array errors, further bolstering reliability without impacting system-level performance. Overall, ECC memory plays a vital role in ensuring the reliability, availability, and serviceability (RAS) of data-intensive applications, making it indispensable for enterprise and industrial computing.

Fundamentals

Definition and Purpose

Error-correcting code (ECC) memory is a type of random-access memory (RAM) that incorporates additional parity or check bits to detect and correct data corruption, primarily single-bit errors, while also enabling detection of multi-bit errors. This design integrates error correction codes directly into the memory modules, allowing the system to identify and fix errors transparently during read operations without external intervention.

The primary purpose of ECC memory is to enhance data integrity and system reliability in environments where even minor errors could have significant consequences, such as servers, workstations, scientific computing, and financial systems. By automatically correcting single-bit errors on the fly, ECC memory minimizes the risk of undetected corruption that could cause application crashes, silent data failures, or incorrect computations, thereby reducing downtime and ensuring operational continuity in mission-critical applications. In high-stakes sectors such as finance or large-scale data processing, this capability prevents costly errors that non-ECC memory might overlook.

At its core, ECC memory operates by adding redundant check bits to the original data to form complete error-correcting codewords; a common configuration uses 8 check bits for every 64 data bits, generated and verified by the memory controller. During a write operation, the controller computes these check bits from the data and stores them alongside it; on read, it recalculates the syndrome, a value derived from comparing the received codeword against expected patterns, to pinpoint and correct any single-bit discrepancy. This mechanism, often rooted in foundational techniques like Hamming codes, addresses errors proactively to maintain an accurate data representation.

Unlike simple parity checking, which employs a single parity bit to detect only odd numbers of bit errors (such as single-bit flips) without any correction capability, ECC uses multiple check bits and syndrome decoding to both detect and actively correct single-bit errors, providing a higher level of protection against memory faults. This advancement makes ECC indispensable for scenarios demanding robust resilience beyond mere detection.

Sources of Memory Errors

Memory errors in dynamic random-access memory (DRAM) are broadly classified into two types: soft errors and hard errors. Soft errors are temporary and non-destructive, resulting in bit flips that do not cause permanent physical damage to the memory cells; they can often be resolved by rewriting the data or rebooting the system. In contrast, hard errors are permanent and stem from hardware failures, such as stuck-at faults where a bit is fixed in one state due to physical defects or wear-out mechanisms.

The primary causes of soft errors in DRAM are ionizing radiation from external and internal sources. Cosmic rays, particularly high-energy protons and neutrons produced in atmospheric interactions, induce single-event upsets (SEUs) by generating charge that collects in sensitive memory nodes, flipping stored bits. Alpha particles emitted from radioactive impurities in chip packaging materials, such as uranium and thorium decay products, similarly deposit charge directly in the memory array, causing upsets in nearby cells. Other contributors include thermal noise from random electron movement, voltage fluctuations arising from supply instability or coupled noise from adjacent circuits, and charge leakage in DRAM capacitors due to junction leakage currents, which gradually diminishes stored charge between refreshes.

Typical uncorrected error rates in non-ECC DRAM under normal sea-level conditions range from 25,000 to 70,000 failures in time (FIT) per megabit, where 1 FIT represents one error per billion device-hours; this equates to approximately one bit flip per gigabyte every few hours in larger memory configurations. These rates escalate significantly in high-altitude or radiation-heavy environments, where cosmic-ray flux increases by factors of 10 to 100, leading to a higher SEU incidence.

Historical studies, notably the 1979 work by May and Woods at Intel, first quantified alpha-particle-induced soft errors in DRAM, revealing error rates tied to packaging contamination and prompting industry-wide purification efforts. Without error correction, these memory errors can propagate through computations, resulting in cascading failures; for instance, a single bit flip in a scientific simulation or financial model may lead to grossly incorrect outcomes that compound over time, as undetected errors alter variables and subsequent operations.

Error Correction Techniques

Hamming Codes

Hamming codes were developed by Richard W. Hamming in 1950 while working at Bell Laboratories, motivated by frequent machine failures and the limitations of simple parity checks in early electronic computers like the Bell Labs Model V, which could only detect but not correct errors. This innovation addressed the need for automatic error correction in large-scale computing systems where manual intervention was impractical.

Hamming codes form a family of binary linear block codes characterized by their parity-check matrix H, whose columns are all distinct nonzero binary vectors of length m (the number of parity bits), typically arranged so that the parity-bit positions are powers of 2 (e.g., 1, 2, 4). To decode, the syndrome s is calculated by multiplying the parity-check matrix by the received codeword vector r:

s = Hr

Since r = c + e (where c is the original codeword and e is the error vector), and Hc = 0 for any valid codeword, this simplifies to s = He. If no error occurs, s = 0; otherwise, the binary representation of s directly identifies the position of the single erroneous bit, which is then flipped to correct it. This structure ensures that each possible single-bit error produces a unique nonzero syndrome, enabling precise correction without ambiguity.

A canonical example is the (7,4) Hamming code, which encodes 4 data bits into a 7-bit codeword using 3 parity bits. The parity bits are computed such that p1 (position 1) checks positions 1, 3, 5, 7; p2 (position 2) checks 2, 3, 6, 7; and p4 (position 4) checks 4, 5, 6, 7, all using even parity. This code can correct any single-bit error across the entire 7-bit word, providing a minimum Hamming distance of 3.

In the context of memory systems, Hamming codes enable single-error correction (SEC) by integrating parity bits directly with data bits in RAM words, allowing hardware to automatically detect and repair transient single-bit flips during read operations. This extends the code's efficiency to practical storage: for k data bits, the smallest m satisfying 2^m ≥ k + m + 1 parity bits suffices to protect the total n = k + m bits.
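The (7,4) construction described above can be sketched directly; positions are 1-indexed as in the text, with the conventional encoding layout:

```python
# A working (7,4) Hamming code matching the layout described above:
# parity bits at positions 1, 2, 4 (1-indexed), data at 3, 5, 6, 7, even parity.

def encode(d):                       # d: list of 4 data bits [d1, d2, d3, d4]
    c = [0, 0, d[0], 0, d[1], d[2], d[3]]   # positions 1..7 (indices 0..6)
    c[0] = c[2] ^ c[4] ^ c[6]        # p1 covers positions 1, 3, 5, 7
    c[1] = c[2] ^ c[5] ^ c[6]        # p2 covers positions 2, 3, 6, 7
    c[3] = c[4] ^ c[5] ^ c[6]        # p4 covers positions 4, 5, 6, 7
    return c

def decode(c):                       # returns (corrected codeword, error position)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4       # syndrome read as binary = error position
    if pos:
        c = c[:]
        c[pos - 1] ^= 1              # flip the erroneous bit
    return c, pos

word = encode([1, 0, 1, 1])
damaged = word[:]
damaged[4] ^= 1                      # flip the bit at position 5
fixed, where = decode(damaged)
print(where, fixed == word)          # 5 True
```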

Advanced Schemes and Variants

One prominent extension of the Hamming code is the Single Error Correction, Double Error Detection (SECDED) scheme, which augments the basic code with an overall parity bit to enhance error detection. This additional bit enables the correction of single-bit errors while detecting, but not correcting, double-bit errors, addressing the limitations of standard Hamming codes in environments prone to occasional multi-bit faults. In SECDED, the extra parity bit p is computed as the modulo-2 sum (XOR) of all data bits and Hamming parity bits, ensuring even parity across the entire codeword. During decoding, the Hamming syndrome identifies the potential error position; if the syndrome is nonzero and the overall parity check indicates an odd number of errors, the indicated bit is flipped for correction, whereas a nonzero syndrome with even parity signals a double error, which is detected but not corrected. This mechanism preserves the single-error correction property while adding reliable double-error detection with minimal overhead.

Beyond SECDED, several other variants address multi-bit errors or specific error patterns in memory systems. Bose-Chaudhuri-Hocquenghem (BCH) codes extend the error-correcting capability to multiple random bits per codeword, making them suitable for high-density DRAM where soft errors may exceed single-bit occurrences; for instance, primitive BCH codes can correct up to t errors with code length n = 2^m − 1 and dimension k = n − mt. Reed-Solomon (RS) codes, a subclass of non-binary BCH codes, excel at correcting burst errors (consecutive bit failures common in transmission or storage) by treating data as symbols over finite fields and correcting up to t symbol errors, as seen in high-bandwidth memory (HBM) applications for single-symbol burst correction. Shortened Hamming codes, derived by shortening extended Hamming codes to fit practical word sizes, are widely implemented in modern DRAM modules, such as the common 72-bit configuration with 64 data bits and 8 ECC bits for SECDED protection.

For enterprise-level reliability against catastrophic failures such as the loss of an entire chip, advanced schemes such as Chipkill, developed by IBM, employ orthogonal Latin square (OLS) codes or similar constructions to correct errors spanning a whole device. These codes distribute data and parity across multiple chips, enabling recovery from the failure of any single chip (typically 8-9 bits in x8 DRAM) by reconstructing the lost data from redundant symbols, often with Reed-Solomon-like symbol correction over bytes. OLS codes provide scalable error-correction degrees based on the number of Latin squares used, offering flexibility for varying reliability needs in server environments.

The selection among these schemes involves trade-offs between storage overhead and reliability gains. For example, SECDED imposes a 12.5% overhead (8 check bits per 64 data bits) but significantly reduces undetected errors compared to parity alone, while BCH or Chipkill variants may require 20-50% or more overhead for multi-bit or chip-level correction, justified in mission-critical systems where failure rates drop by orders of magnitude. These choices prioritize robustness within constrained memory budgets.
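The SECDED decision rule above can be sketched by extending a (7,4) Hamming codeword with an overall parity bit; this (8,4) toy is illustrative, not a production layout:

```python
# (8,4) SECDED sketch: a (7,4) Hamming codeword plus one overall parity bit.
# Nonzero syndrome + odd overall parity  -> single error, correctable.
# Nonzero syndrome + even overall parity -> double error, detected only.

def encode(d):                       # d: 4 data bits
    c = [0, 0, d[0], 0, d[1], d[2], d[3]]
    c[0] = c[2] ^ c[4] ^ c[6]        # Hamming parity bits p1, p2, p4
    c[1] = c[2] ^ c[5] ^ c[6]
    c[3] = c[4] ^ c[5] ^ c[6]
    return c + [sum(c) % 2]          # append overall (even) parity bit

def classify(c):
    s = ((c[0] ^ c[2] ^ c[4] ^ c[6])
         + 2 * (c[1] ^ c[2] ^ c[5] ^ c[6])
         + 4 * (c[3] ^ c[4] ^ c[5] ^ c[6]))
    overall_odd = sum(c) % 2 == 1
    if s == 0:
        return "parity-bit error" if overall_odd else "clean"
    return "single (corrected)" if overall_odd else "double (detected)"

w = encode([1, 0, 1, 1])
one = w[:]; one[2] ^= 1              # one flipped bit
two = w[:]; two[2] ^= 1; two[5] ^= 1 # two flipped bits
print(classify(w), "|", classify(one), "|", classify(two))
```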

Hardware implementations

In main memory modules

ECC memory is integrated into main memory modules as part of Dual Inline Memory Modules (DIMMs) or Small Outline DIMMs (SODIMMs) used for system RAM, enabling error detection and correction at the module level. Standard ECC DIMMs for DDR4 employ an x72 configuration, featuring 64 data pins alongside 8 dedicated ECC pins to store check bits. For DDR5 ECC DIMMs, the configuration advances to x80 with EC8 organization, where each of the two independent 40-bit sub-channels includes 32 data bits and 8 ECC bits, enhancing reliability through distributed error correction. The memory controller manages ECC generation and verification during data transfers. On write operations, the controller calculates the check bits from the incoming 64-bit data word using the ECC scheme and writes both the data and check bits to the module. During read operations, the controller retrieves the 72-bit (or 80-bit for DDR5) word, recomputes the check bits from the data, and compares them against the stored check bits; a mismatch identifies the error position for single-bit correction or flags double-bit detection, all handled via dedicated logic in the controller. A representative bit layout for the 72-bit ECC word in DDR4 follows the (72,64) extended Hamming code, with bits numbered from 1 to 72. The eight check bits occupy positions 1, 2, 4, 8, 16, 32, and 64 (power-of-two positions, each covering a specific subset of bits via even parity) plus position 72 (overall parity for the entire word), while the 64 data bits fill the remaining positions. This arrangement allows syndrome calculation to pinpoint and correct single errors or detect double errors. Utilizing ECC DIMMs requires compatible hardware, including motherboards with chipsets and processors that support ECC functionality, such as AMD EPYC, Intel Xeon, or NVIDIA Grace platforms, where the integrated memory controllers perform the ECC operations. For high-reliability applications such as AI servers and industrial environments, buffered ECC DDR5 modules like Registered DIMMs (RDIMMs) and Load-Reduced DIMMs (LRDIMMs) are preferred.
These provide enhanced stability and capacity for data-intensive workloads, with manufacturers including Micron, Samsung, Kingston (Server Premier), and SK hynix offering industrial-grade ECC DDR5 RDIMMs and LRDIMMs that emphasize maximum reliability through features like on-die ECC and extended error correction capabilities. For instance, Micron's DDR5 RDIMMs support capacities up to 128 GB and are optimized for AI and machine learning tasks, delivering up to five times the performance of DDR4 in deep learning applications. Unbuffered ECC DIMMs provide straightforward integration for smaller-scale systems like entry-level workstations, connecting directly to the memory controller without intermediate buffering to minimize latency, though limited to lower capacities compared to buffered options.
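The pin counts above imply different check-bit budgets per generation, which can be checked with trivial arithmetic (an illustrative Python sketch; the helper name is hypothetical):

```python
def ecc_overhead(data_bits, ecc_bits):
    """Fraction of extra storage devoted to check bits, relative to data."""
    return ecc_bits / data_bits

# DDR4 ECC DIMM: one 72-bit channel = 64 data + 8 ECC bits
ddr4 = ecc_overhead(64, 8)   # 12.5% extra storage for check bits
# DDR5 ECC DIMM (EC8): each 40-bit sub-channel = 32 data + 8 ECC bits
ddr5 = ecc_overhead(32, 8)   # 25% extra storage per sub-channel
```

The DDR5 EC8 organization thus doubles the proportional check-bit budget per sub-channel relative to DDR4's single 72-bit channel.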

In processor caches

Processor caches, implemented using SRAM for high speed and low latency, incorporate ECC to mitigate soft errors that can corrupt data during high-frequency operations. In Intel's Skylake-based processors, the shared L3 cache employs single-error correction, double-error detection (SECDED) ECC to protect against bit flips in the multi-megabyte structure shared across cores. Similarly, AMD's Zen architectures integrate ECC across the L1, L2, and L3 caches, with the L1 instruction cache using full ECC and L2/L3 providing comprehensive correction for their larger capacities. Arm Neoverse N1 cores, used in server processors like AWS Graviton2, apply SECDED ECC to L1 data and instruction caches (64 KiB each per core) as well as private L2 caches (up to 1 MiB), ensuring reliability in cloud environments. For GPUs, NVIDIA's A100 employs SECDED ECC in all L1 caches within streaming multiprocessors and the 40 MiB L2 cache, critical for error-sensitive AI and HPC workloads. Higher clock speeds and advanced process nodes exacerbate soft error susceptibility in on-die caches, as cosmic rays and alpha particles induce transient faults more readily in densely packed transistors. To balance speed and reliability, smaller L1 caches often rely on parity bits for single-bit error detection without correction, forwarding detected errors to ECC-protected L2 or L3 for resolution via write-through policies. Larger L2 and L3 caches, with more exposure due to their size, implement full SECDED ECC to correct single-bit errors inline and detect double-bit faults. Tag and data arrays in caches receive separate protection to optimize overhead; tags (storing addresses and metadata) typically use per-entry ECC or parity, while data arrays apply SECDED per 32- or 64-bit word. For instance, a 256-bit cache line segment might allocate 8-16 ECC bits for correction, depending on the granularity, allowing targeted protection without excessive area or power costs.
The latency overhead of ECC in caches remains minimal, often less than 1-2 cycles, through pipelined correction, where detection occurs in early pipeline stages and fixes are applied in later ones, preserving overall throughput in high-performance designs.
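The granularity trade-off mentioned above follows directly from the SECDED check-bit count: r Hamming bits with 2^r ≥ m + r + 1, plus one overall parity bit, for an m-bit word. A short Python sketch (the helper name is illustrative):

```python
def secded_check_bits(m):
    """Check bits for SECDED on an m-bit data word: the smallest r
    with 2**r >= m + r + 1 Hamming bits, plus one overall parity bit."""
    r = 1
    while 2 ** r < m + r + 1:
        r += 1
    return r + 1

# Protecting a 256-bit cache line at different granularities:
one_word = secded_check_bits(256)       # 10 bits, one correctable error total
four_words = 4 * secded_check_bits(64)  # 32 bits, one error per 64-bit word
```

Coarser granularity spends fewer total check bits but tolerates only a single error across the whole segment, which is why per-line ECC budgets vary with the protection granularity chosen.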

Registered and buffered ECC

Registered DIMMs (RDIMMs) incorporate an on-module register that buffers and re-times the address and command signals from the memory controller, reducing the electrical load on the memory channel and enabling stable operation with multiple modules. This supports up to three DIMMs per channel, a significant improvement over unbuffered configurations limited to one or two DIMMs, while fully accommodating ECC functionality through standard integration of error-correcting bits on the DRAM chips. Fully Buffered DIMMs (FB-DIMMs), introduced for DDR2 systems, employ an Advanced Memory Buffer (AMB) on the module to buffer all signals—including data, address, and command—converting the traditional multi-drop bus to a point-to-point serial interface for enhanced signal integrity in high-density servers. In FB-DIMMs, ECC check bits are buffered alongside the data, ensuring error-correction capabilities remain intact as the memory controller interacts with the AMB rather than directly with the DRAM. This architecture was particularly suited for older enterprise systems requiring capacities beyond standard RDIMM limits but has been phased out since around 2010 due to high power consumption and thermal issues associated with the AMB. Load-Reduced DIMMs (LRDIMMs) extend buffering further by using an isolation memory buffer (iMB) to isolate the electrical load of each DRAM rank, presenting only a single load to the memory controller and thereby supporting even higher capacities, such as 128 GB or more per module in DDR4 ECC configurations. The iMB re-drives signals to multiple ranks internally while maintaining ECC integrity, as error correction is performed at the controller level across the buffered pathways. RDIMMs and LRDIMMs remain the standard for ECC memory in 2025 server environments, providing reliable high-capacity operation without the drawbacks that led to the deprecation of FB-DIMMs.

Applications and adoption

In servers and workstations

In professional computing environments such as servers and workstations, Error-Correcting Code (ECC) memory is the standard due to its critical role in ensuring data integrity, where even minor errors can lead to significant system instability or data corruption. Nearly all server-grade processors, including the Intel Xeon and AMD EPYC series, support ECC memory to maintain reliability in enterprise workloads; using non-ECC memory in these platforms often results in reduced stability, as the absence of correction mechanisms can allow uncorrectable faults during prolonged operations. ECC memory is particularly essential in high-stakes workloads like database management, virtualization, and high-performance computing (HPC). For instance, database servers rely on ECC to handle correctable single-bit errors detected by the chipset, preventing disruptions in large-scale data processing environments. Similarly, virtualization platforms benefit from ECC's protection against memory errors, which is recommended to avoid crashes in virtual machine hosting scenarios. In HPC applications, supercomputers like Frontier at Oak Ridge National Laboratory utilize ECC-enabled DDR4 memory across their AMD EPYC processors and vast 9.2 PiB of system memory (including HBM) to support exascale simulations without silent data corruption. In artificial intelligence (AI) servers, industrial-grade ECC DDR5 memory in RDIMM or LRDIMM configurations from manufacturers such as Micron, Samsung, Kingston (Server Premier), or SK hynix is recommended for maximum reliability. These modules support enhanced error correction capabilities essential for the intensive data processing and training demands of AI workloads. Compatible platforms include AMD EPYC, Intel Xeon, and NVIDIA Grace processors, which require ECC support to ensure data integrity in AI applications. Compatibility in server and workstation setups is tightly integrated with ECC requirements, typically involving motherboards equipped with server-specific chipsets that fully enable ECC functionality.
Mixing ECC and non-ECC modules is generally incompatible and not recommended, as it often disables ECC protection across the system or leads to instability, forcing all memory to operate in non-ECC mode. By 2025, ECC has become widespread in cloud computing, with major providers such as AWS deploying it as the default in their instance fleets to meet service-level agreements (SLAs) for enterprise customers. Certain industries enforce ECC memory through regulatory and procurement standards to mitigate risks associated with data corruption. In finance, where accurate transaction processing is paramount, industry best practices strongly recommend ECC memory to prevent errors that could result in financial discrepancies. Aerospace applications require ECC for fault-tolerant systems in avionics and control hardware, aligning with safety regulations that prioritize error-free operation in mission-critical environments.

In consumer and emerging systems

In consumer systems, ECC memory remains optional and is primarily supported in high-end desktops targeted at professional users, such as those equipped with AMD Ryzen Threadripper processors, where ECC can be enabled to enhance stability during demanding workloads. Support in laptops is rare, as ECC modules consume more power and incur higher costs, making non-ECC RAM the standard for portable devices despite processor-level compatibility in some AMD-based models. Apple's M-series chips use LPDDR5X DRAM with on-die error correction capabilities, providing internal protection but without support for traditional ECC interfaces. Emerging applications are expanding ECC's role beyond traditional servers into specialized consumer-adjacent and industrial niches. In AI accelerators, NVIDIA's H100 GPUs integrate ECC support in their high-bandwidth memory (HBM) subsystems to safeguard against errors in large-scale model training and inference, ensuring reliable computation in data-intensive environments. Automotive electronic control units (ECUs) for advanced driver-assistance systems (ADAS) increasingly rely on ECC-enabled memory to meet functional-safety standards, preventing data corruption in safety-critical real-time processing. Similarly, industrial IoT edge devices, such as those handling sensor data in manufacturing, employ ECC DRAM to maintain operational reliability amid environmental stressors like temperature fluctuations and electromagnetic interference. As of 2025, ECC adoption is growing in consumer-oriented workstations for content creation, with systems running professional software suites benefiting from ECC's stability in rendering and multitasking scenarios involving large datasets. In 5G infrastructure, ECC provides soft-error protection in base station memory; additionally, low-density parity-check (LDPC) codes are used for error correction in 5G data transmission to mitigate faults from interference and high-speed data flows. ECC is also increasingly integrated into edge AI devices for real-time inference, enhancing reliability in harsh environments.
Despite these advances, barriers persist to broader consumer uptake: ECC modules carry a 10-20% price premium over non-ECC equivalents due to the additional circuitry for error handling, and while unbuffered ECC shares the same physical form factor as non-ECC memory, registered variants require platforms that support them. Non-ECC memory continues to dominate gaming PCs, where the low incidence of consequential errors in short-session gaming workloads does not justify the added expense. For partial protection in non-ECC setups, software-based approaches like checksum verification in file systems or redundant array of independent disks (RAID) configurations offer limited mitigation against memory-induced data corruption, though they cannot match hardware ECC's real-time correction capabilities.

Advantages and disadvantages

Key benefits

ECC memory significantly enhances system reliability by detecting and correcting single-bit errors in real time, reducing the likelihood of undetected data corruption from such flips to near zero. Studies of large-scale server fleets indicate that approximately 8.2% of DRAM modules experience correctable errors annually, errors that in non-ECC systems could propagate into uncorrectable failures or crashes. This error rate underscores ECC's role in mitigating transient faults from sources like cosmic rays or electrical interference, ensuring data accuracy over extended operations. By preventing error-induced crashes, ECC memory improves overall uptime, particularly for long-running tasks in servers and workstations. In production data centers, uncorrectable memory errors affect about 1.29% of machines annually when using standard ECC, a rate that would be substantially higher without correction mechanisms, as all correctable errors could propagate to system failures. For instance, advanced ECC variants like Chipkill can reduce uncorrectable error rates by 4 to 10 times compared to basic single-error correction schemes, directly contributing to fewer server outages. ECC is crucial for maintaining data integrity in applications requiring precise computations, such as scientific simulations, financial computing, and database operations, where even minor bit errors can invalidate results. It extends the mean time between failures (MTBF) of memory subsystems by transparently handling soft errors without halting operations, allowing systems to operate reliably for years without manual intervention. In large-scale deployments, ECC supports scalability by enabling the use of expansive memory pools—often terabytes per server—without a proportional increase in error risk, as correction mechanisms prevent cascading failures across larger address spaces. This is particularly beneficial in high-density environments where error probabilities scale with capacity.
Economically, ECC's initial 10-20% cost premium over non-ECC modules is offset by reduced downtime expenses; for example, average server outage costs range from $5,000 to $300,000 per hour depending on the operation's scale, making ECC's reliability gains a net positive for mission-critical systems.

Limitations and trade-offs

One primary limitation of ECC memory is the inherent storage overhead required for error correction codes, typically 12.5% relative to the data bits in standard SECDED implementations, where 8 check bits accompany every 64 data bits. This overhead means part of the raw capacity is dedicated to parity rather than data; for example, a module built from 8 GB of raw DRAM leaves approximately 7.11 GB for data, which is why ECC DIMMs include extra DRAM chips to hold the check bits while still delivering their full rated data capacity. The additional hardware also leads to slightly higher power consumption compared to non-ECC memory, as the extra chips and circuitry draw more energy during operation. Performance trade-offs arise from the error correction process, which introduces a latency of 1-2 clock cycles only when an error is detected and corrected, making it largely negligible in server workloads with ample tolerance for such delays. However, in high-speed consumer applications sensitive to latency, this overhead can accumulate and slightly degrade overall system responsiveness, with benchmarks showing 0.25-3% slower performance depending on the workload and implementation. ECC memory carries a cost premium of 10-20% over equivalent non-ECC modules, due to the specialized circuitry and additional memory chips, which limits its adoption in budget-conscious consumer systems. Compatibility poses another barrier, as not all consumer-grade motherboards and processors support ECC, and attempting to mix ECC and non-ECC modules frequently results in boot failures or forces the system to operate in non-ECC mode, negating the reliability benefits. As of 2025, challenges in ultra-dense DDR5 ECC modules include exacerbated thermal issues, where the added circuitry contributes to higher heat output amid DDR5's already elevated power demands compared to prior generations.
Emerging alternatives like on-die ECC in LPDDR5X partially address these trade-offs by integrating error correction directly within the DRAM die, avoiding the need for external parity bits and reducing both capacity and power overheads, though on-die ECC protects only data at rest inside the die, not transfers on the memory bus.
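The 12.5% overhead figure translates into capacity arithmetic as follows (a trivial Python check; the function name is illustrative):

```python
def usable_capacity(raw_gb, data_bits=64, check_bits=8):
    """Data capacity remaining once SECDED check bits take their share
    of a raw DRAM array (64 data + 8 check bits per 72-bit word)."""
    return raw_gb * data_bits / (data_bits + check_bits)

usable_capacity(8)   # ~7.11 GB of data from 8 GB of raw DRAM
usable_capacity(9)   # 8.0 GB of data from 9 GB of raw DRAM
```

The second case reflects how commercial ECC DIMMs are built: nine chips' worth of raw DRAM to expose eight chips' worth of rated data capacity.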

Historical development

Early research and invention

The theoretical foundations for error-correcting codes (ECC) in memory systems trace back to Claude Shannon's groundbreaking work in information theory. In his 1948 paper "A Mathematical Theory of Communication," published in the Bell System Technical Journal, Shannon demonstrated that reliable data transmission is possible over noisy channels by introducing redundancy, establishing the fundamental limits of error correction through concepts like channel capacity and entropy. This framework provided the mathematical groundwork for practical ECC schemes, linking information theory directly to the design of robust digital systems. Richard W. Hamming advanced this theory into actionable engineering at Bell Labs, where frequent downtime from unreliable relay-based computers—particularly during off-hours when operators were unavailable—prompted his innovation. In 1950, Hamming introduced the first binary single-error-correcting codes, along with extended variants capable of single-error correction and double-error detection (SECDED), detailed in his paper "Error Detecting and Error Correcting Codes" in the Bell System Technical Journal. These Hamming codes used parity bits to not only detect but also correct single-bit errors in data words, revolutionizing reliability in early computing by automating error recovery without human intervention. Hamming's motivation stemmed from real-world frustrations with machines like the Bell Labs relay computers, where errors often halted operations overnight. Early practical implementations of error detection appeared in 1950s IBM systems, such as mainframes introduced in the mid-1950s, which employed simple parity bits alongside magnetic-core memory to detect single-bit errors and alert operators. By the mid-1960s, Hamming-based ECC was implemented in select mainframe models, where core memory modules used extended Hamming codes (e.g., the (72,64) configuration) for automatic single-bit error correction and double-bit error detection, significantly enhancing system uptime in scientific and business applications.
Research in the 1970s further underscored the necessity of ECC by quantifying environmental threats to memory integrity. At IBM, James F. Ziegler and colleagues investigated cosmic ray-induced soft errors, publishing seminal work in 1979 that modeled the flux of high-energy particles at sea level and calculated single-event upset (SEU) rates in silicon devices, estimating error frequencies on the order of one upset per megabit per month under typical conditions. This analysis, building on earlier 1970s experiments, provided empirical evidence for the prevalence of transient errors in unshielded electronics, reinforcing the shift toward widespread ECC adoption in mission-critical computing. The commercialization of ECC memory gained momentum in the late 1980s as server architectures evolved to prioritize reliability for enterprise computing. Sun Microsystems' introduction of SPARC-based systems in 1987 marked an early standardization of ECC in high-end Unix workstations and servers, where ECC memory became integral to handling mission-critical workloads. Similarly, Intel's 80486 processor, released in 1989, facilitated ECC support through compatible motherboards and memory modules, enabling its integration into x86-based servers and broadening availability beyond mainframes. By the 1990s and early 2000s, ECC had become widespread in Unix server ecosystems, driven by the need for high availability in growing data centers; this era also saw the introduction of Registered DIMMs (RDIMMs) around the late 1990s with SDRAM technologies, which buffered address and command signals to support higher densities and scalability in multi-DIMM configurations without overloading the memory controller. In the 2010s, ECC memory extended beyond traditional CPUs to accelerators, with NVIDIA introducing ECC support in its Tesla GPU line starting with the Fermi-based Tesla C2050 and C2070 in 2010, providing single-error correction and double-error detection for applications requiring numerical accuracy.
Cloud providers further underscored ECC's value through large-scale studies; for instance, Google's 2009 analysis of DRAM errors across thousands of servers over 2.5 years found that error rates in large-scale ECC-protected fleets were orders of magnitude higher than previously reported, influencing industry mandates for ECC in data center deployments by the mid-2010s. A 2015 study by researchers at Facebook corroborated these findings, reporting that DRAM errors followed a power-law distribution and emphasizing ECC's role in mitigating row and column failures in production environments. Post-2020 developments have integrated ECC more deeply into advanced memory architectures, particularly with DDR5. AMD's EPYC Genoa (9004 series) processors, launched in 2022, feature 12-channel DDR5 support with native ECC integration via on-package I/O dies, enabling up to 6 TB of ECC RDIMM capacity at speeds of 4800 MT/s for scalable server performance. Intel's Sapphire Rapids (4th Gen Xeon Scalable) processors, introduced in 2023, offer 8-channel DDR5 ECC memory up to 4800 MT/s with up to 4 TB capacity, incorporating on-die error checking and scrubbing (ECS) to enhance reliability by correcting errors within the DRAM device itself before they propagate. By 2025, emerging trends include on-die ECC implementations in Compute Express Link (CXL) memory expanders, such as those proposed in LRC-based controllers that improve DRAM error correction efficiency while maintaining low latency for pooled memory systems. Research into advanced ECC schemes also addresses rising soft-error rates in sub-5nm processes, with increased adoption in edge AI accelerators to counteract cosmic ray-induced bit flips, as demonstrated in studies showing soft errors can alter up to 10% of outputs in vision transformers without protection. Additionally, investigations into quantum error-correcting codes, like qLDPC variants, are exploring synergies with classical ECC for hybrid systems, though these remain in early research phases focused on fault-tolerant scaling.

References

  1. https://en.wikichip.org/wiki/amd/cores/genoa