Fuzzing
In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks. Typically, fuzzers are used to test programs that take structured inputs. This structure is specified, for example in a file format or protocol, and distinguishes valid from invalid input. An effective fuzzer generates semi-valid inputs that are "valid enough" in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program, and are "invalid enough" to expose corner cases that have not been properly dealt with.
For the purpose of security, input that crosses a trust boundary is often the most useful.[1] For example, it is more important to fuzz code that handles a file uploaded by any user than it is to fuzz the code that parses a configuration file that is accessible only to a privileged user.
History
The term "fuzz" originates from a 1988 class project[2] in the graduate Advanced Operating Systems class (CS736), taught by Prof. Barton Miller at the University of Wisconsin, whose results were subsequently published in 1990.[3][4] To fuzz test a UNIX utility meant to automatically generate random input and command-line parameters for the utility. The project was designed to test the reliability of UNIX command line programs by executing a large number of random inputs in quick succession until they crashed. Miller's team was able to crash 25 to 33 percent of the utilities that they tested. They then debugged each of the crashes to determine the cause and categorized each detected failure. To allow other researchers to conduct similar experiments with other software, the source code of the tools, the test procedures, and the raw result data were made publicly available.[5] This early fuzzing would now be called black box, generational, unstructured (dumb or "classic") fuzzing.
According to Prof. Barton Miller, "In the process of writing the project description, I needed to give this kind of testing a name. I wanted a name that would evoke the feeling of random, unstructured data. After trying out several ideas, I settled on the term fuzz."[4]
A key contribution of this early work was its simple (almost simplistic) oracle. A program failed its test if it crashed or hung under the random input and was considered to have passed otherwise. While test oracles can be challenging to construct, the oracle for this early fuzz testing was simple and universally applicable.
In April 2012, Google announced ClusterFuzz, a cloud-based fuzzing infrastructure for security-critical components of the Chromium web browser.[6] Security researchers can upload their own fuzzers and collect bug bounties if ClusterFuzz finds a crash with the uploaded fuzzer.
In September 2014, Shellshock[7] was disclosed as a family of security bugs in the widely used UNIX Bash shell; most vulnerabilities of Shellshock were found using the fuzzer AFL.[8] (Many Internet-facing services, such as some web server deployments, use Bash to process certain requests, allowing an attacker to cause vulnerable versions of Bash to execute arbitrary commands. This can allow an attacker to gain unauthorized access to a computer system.[9])
In April 2015, Hanno Böck showed how the fuzzer AFL could have found the 2014 Heartbleed vulnerability.[10][11] (The Heartbleed vulnerability was disclosed in April 2014. It is a serious vulnerability that allows adversaries to decipher otherwise encrypted communication. The vulnerability was accidentally introduced into OpenSSL which implements TLS and is used by the majority of the servers on the internet. Shodan reported 238,000 machines still vulnerable in April 2016;[12] 200,000 in January 2017.[13])
In August 2016, the Defense Advanced Research Projects Agency (DARPA) held the finals of the first Cyber Grand Challenge, a fully automated capture-the-flag competition that lasted 11 hours.[14] The objective was to develop automatic defense systems that can discover, exploit, and correct software flaws in real-time. Fuzzing was used as an effective offense strategy to discover flaws in the software of the opponents. It showed tremendous potential in the automation of vulnerability detection. The winner was a system called "Mayhem"[15] developed by the team ForAllSecure led by David Brumley.
In September 2016, Microsoft announced Project Springfield, a cloud-based fuzz testing service for finding security critical bugs in software.[16]
In December 2016, Google announced OSS-Fuzz which allows for continuous fuzzing of several security-critical open-source projects.[17]
At Black Hat 2018, Christopher Domas demonstrated the use of fuzzing to expose the existence of a hidden RISC core in a processor.[18] This core was able to bypass existing security checks to execute Ring 0 commands from Ring 3.
In September 2020, Microsoft released OneFuzz, a self-hosted fuzzing-as-a-service platform that automates the detection of software bugs.[19] It supports Windows and Linux.[20] It was archived three years later on November 1, 2023.[21]
Early random testing
Testing programs with random inputs dates back to the 1950s when data was still stored on punched cards.[22] Programmers would use punched cards that were pulled from the trash or card decks of random numbers as input to computer programs. If an execution revealed undesired behavior, a bug had been detected.
The execution of random inputs is also called random testing or monkey testing.
In 1981, Duran and Ntafos formally investigated the effectiveness of testing a program with random inputs.[23][24] While random testing had been widely perceived to be the worst means of testing a program, the authors could show that it is a cost-effective alternative to more systematic testing techniques.
In 1983, Steve Capps at Apple developed "The Monkey",[25] a tool that would generate random inputs for classic Mac OS applications, such as MacPaint.[26] The figurative "monkey" refers to the infinite monkey theorem which states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will eventually type out the entire works of Shakespeare. In the case of testing, the monkey would write the particular sequence of inputs that would trigger a crash.
In 1991, the crashme tool was released, which was intended to test the robustness of Unix and Unix-like operating systems by randomly executing system calls with randomly chosen parameters.[27]
Types
A fuzzer can be categorized in several ways:[28][1]
- A fuzzer can be generation-based or mutation-based depending on whether inputs are generated from scratch or by modifying existing inputs.
- A fuzzer can be dumb (unstructured) or smart (structured) depending on whether it is aware of input structure.
- A fuzzer can be white-, grey-, or black-box, depending on whether it is aware of program structure.
Reuse of existing input seeds
A mutation-based fuzzer leverages an existing corpus of seed inputs during fuzzing. It generates inputs by modifying (or rather mutating) the provided seeds.[29] For example, when fuzzing the image library libpng, the user would provide a set of valid PNG image files as seeds while a mutation-based fuzzer would modify these seeds to produce semi-valid variants of each seed. The corpus of seed files may contain thousands of potentially similar inputs. Automated seed selection (or test suite reduction) allows users to pick the best seeds in order to maximize the total number of bugs found during a fuzz campaign.[30]
A generation-based fuzzer generates inputs from scratch. For instance, a smart generation-based fuzzer[31] takes the input model that was provided by the user to generate new inputs. Unlike mutation-based fuzzers, a generation-based fuzzer does not depend on the existence or quality of a corpus of seed inputs.
Some fuzzers have the capability to do both, to generate inputs from scratch and to generate inputs by mutation of existing seeds.[32]
Aware of input structure
Typically, fuzzers are used to generate inputs for programs that take structured inputs, such as a file, a sequence of keyboard or mouse events, or a sequence of messages. This structure distinguishes valid input that is accepted and processed by the program from invalid input that is quickly rejected by the program. What constitutes a valid input may be explicitly specified in an input model. Examples of input models are formal grammars, file formats, GUI-models, and network protocols. Even items not normally considered as input can be fuzzed, such as the contents of databases, shared memory, environment variables or the precise interleaving of threads. An effective fuzzer generates semi-valid inputs that are "valid enough" so that they are not directly rejected by the parser and "invalid enough" so that they might stress corner cases and exercise interesting program behaviours.
A smart (model-based,[32] grammar-based,[31][33] or protocol-based[34]) fuzzer leverages the input model to generate a greater proportion of valid inputs. For instance, if the input can be modelled as an abstract syntax tree, then a smart mutation-based fuzzer[33] would employ random transformations to move complete subtrees from one node to another. If the input can be modelled by a formal grammar, a smart generation-based fuzzer[31] would instantiate the production rules to generate inputs that are valid with respect to the grammar. However, generally the input model must be explicitly provided, which is difficult to do when the model is proprietary, unknown, or very complex. If a large corpus of valid and invalid inputs is available, a grammar induction technique, such as Angluin's L* algorithm, would be able to generate an input model.[35][36]
A dumb fuzzer[37][38] does not require the input model and can thus be employed to fuzz a wider variety of programs. For instance, AFL is a dumb mutation-based fuzzer that modifies a seed file by flipping random bits, by substituting random bytes with "interesting" values, and by moving or deleting blocks of data. However, a dumb fuzzer might generate a lower proportion of valid inputs and stress the parser code rather than the main components of a program. The disadvantage of dumb fuzzers can be illustrated by means of the construction of a valid checksum for a cyclic redundancy check (CRC). A CRC is an error-detecting code that ensures that the integrity of the data contained in the input file is preserved during transmission. A checksum is computed over the input data and recorded in the file. When the program processes the received file and the recorded checksum does not match the re-computed checksum, then the file is rejected as invalid. Now, a fuzzer that is unaware of the CRC is unlikely to generate the correct checksum. However, there are attempts to identify and re-compute a potential checksum in the mutated input, once a dumb mutation-based fuzzer has modified the protected data.[39]
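The following sketch illustrates the kinds of blind, byte-level mutations described above (bit flips, substitution of "interesting" boundary values, block deletion). It is a minimal illustration in C under simplifying assumptions; the function names, constants, and mutation schedule are invented for the example and do not reproduce AFL's actual implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative byte-level mutations in the spirit of a dumb mutation-based
   fuzzer; names and constants are examples, not AFL's implementation. */

static const uint8_t INTERESTING[] = { 0x00, 0x01, 0x7F, 0x80, 0xFF };

/* Flip one randomly chosen bit in the buffer. */
static void flip_random_bit(uint8_t *buf, size_t len) {
    size_t pos = (size_t)rand() % len;
    buf[pos] ^= (uint8_t)(1u << (rand() % 8));
}

/* Overwrite a random byte with a boundary ("interesting") value. */
static void set_interesting_byte(uint8_t *buf, size_t len) {
    buf[(size_t)rand() % len] =
        INTERESTING[rand() % (int)(sizeof INTERESTING)];
}

/* Delete a random block by shifting the tail left; returns the new length. */
static size_t delete_random_block(uint8_t *buf, size_t len) {
    if (len < 4)
        return len;
    size_t start = (size_t)rand() % (len / 2);
    size_t count = 1 + (size_t)rand() % (len - start - 1);
    memmove(buf + start, buf + start + count, len - start - count);
    return len - count;
}

/* Apply a small random stack of mutations to a copy of a seed input. */
size_t mutate(uint8_t *buf, size_t len) {
    int steps = 1 + rand() % 8;
    for (int i = 0; i < steps && len > 0; i++) {
        switch (rand() % 3) {
            case 0:  flip_random_bit(buf, len);        break;
            case 1:  set_interesting_byte(buf, len);   break;
            default: len = delete_random_block(buf, len); break;
        }
    }
    return len;
}
```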
Aware of program structure
Typically, a fuzzer is considered more effective if it achieves a higher degree of code coverage. The rationale is, if a fuzzer does not exercise certain structural elements in the program, then it is also not able to reveal bugs that are hiding in these elements. Some program elements are considered more critical than others. For instance, a division operator might cause a division by zero error, or a system call may crash the program.
A black-box fuzzer[37][33] treats the program as a black box and is unaware of internal program structure. For instance, a random testing tool that generates inputs at random is considered a blackbox fuzzer. Hence, a blackbox fuzzer can execute several hundred inputs per second, can be easily parallelized, and can scale to programs of arbitrary size. However, blackbox fuzzers may only scratch the surface and expose "shallow" bugs. Hence, there are attempts to develop blackbox fuzzers that can incrementally learn about the internal structure (and behavior) of a program during fuzzing by observing the program's output given an input. For instance, LearnLib employs active learning to generate an automaton that represents the behavior of a web application.
A white-box fuzzer[38][32] leverages program analysis to systematically increase code coverage or to reach certain critical program locations. For instance, SAGE[40] leverages symbolic execution to systematically explore different paths in the program (a technique known as concolic execution). If the program's specification is available, a whitebox fuzzer might leverage techniques from model-based testing to generate inputs and check the program outputs against the program specification. A whitebox fuzzer can be very effective at exposing bugs that hide deep in the program. However, the time used for analysis (of the program or its specification) can become prohibitive. If the whitebox fuzzer takes relatively too long to generate an input, a blackbox fuzzer will be more efficient.[41] Hence, there are attempts to combine the efficiency of blackbox fuzzers and the effectiveness of whitebox fuzzers.[42]
A gray-box fuzzer leverages instrumentation rather than program analysis to glean information about the program. For instance, AFL and libFuzzer utilize lightweight instrumentation to trace basic block transitions exercised by an input. This leads to a reasonable performance overhead but informs the fuzzer about the increase in code coverage during fuzzing, which makes gray-box fuzzers extremely efficient vulnerability detection tools.[43]
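The sketch below illustrates the style of lightweight edge-coverage tracking described above. It is conceptually similar to the shared coverage map that AFL documents (hashing the pair of previous and current block identifiers), but the map size, the function names, and the bookkeeping are illustrative assumptions rather than the code of any particular fuzzer.

```c
#include <stdint.h>
#include <string.h>

/* Conceptual sketch of lightweight edge-coverage instrumentation for a
   gray-box fuzzer; map size and hashing scheme are illustrative only. */

#define MAP_SIZE (1u << 16)

static uint8_t  coverage_map[MAP_SIZE];  /* hit counts for (prev, cur) edges */
static uint16_t prev_location;           /* id of the previously executed block */

/* Called from instrumentation at the start of every basic block. */
void on_basic_block(uint16_t cur_location) {
    coverage_map[(cur_location ^ prev_location) % MAP_SIZE]++;
    prev_location = cur_location >> 1;   /* distinguish A->B from B->A */
}

/* After one execution: did this input touch any previously unseen edge? */
int saw_new_coverage(const uint8_t *global_map) {
    for (size_t i = 0; i < MAP_SIZE; i++)
        if (coverage_map[i] && !global_map[i])
            return 1;
    return 0;
}

/* Reset the per-execution map before running the next input. */
void reset_coverage(void) {
    memset(coverage_map, 0, sizeof coverage_map);
    prev_location = 0;
}
```

Inputs for which saw_new_coverage reports a hit would be kept and mutated further, which is the feedback loop that makes gray-box fuzzers efficient.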
Uses
Fuzzing is used mostly as an automated technique to expose vulnerabilities in security-critical programs that might be exploited with malicious intent.[6][16][17] More generally, fuzzing is used to demonstrate the presence of bugs rather than their absence. Running a fuzzing campaign for several weeks without finding a bug does not prove the program correct.[44] After all, the program may still fail for an input that has not yet been executed; executing a program for all inputs is prohibitively expensive. If the objective is to prove a program correct for all inputs, a formal specification must exist and techniques from formal methods must be used.
Exposing bugs
In order to expose bugs, a fuzzer must be able to distinguish expected (normal) from unexpected (buggy) program behavior. However, a machine cannot always distinguish a bug from a feature. In automated software testing, this is also called the test oracle problem.[45][46]
In the absence of specifications, a fuzzer typically distinguishes between crashing and non-crashing inputs, using a simple and objective measure. Crashes can be easily identified and might indicate potential vulnerabilities (e.g., denial of service or arbitrary code execution). However, the absence of a crash does not indicate the absence of a vulnerability. For instance, a program written in C may or may not crash when an input causes a buffer overflow. Rather, the program's behavior is undefined.
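The following hypothetical C program illustrates this point: an unchecked copy into a fixed-size stack buffer that often does not crash on its own, but is reported immediately when the program is built with AddressSanitizer. The program, the buffer size, and the build command in the comment are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsing routine with an unchecked copy into a fixed-size
   stack buffer.  For many oversized inputs the overflow silently corrupts
   neighbouring memory and the program still exits normally, because the
   behavior is undefined rather than guaranteed to crash.  Built with
   AddressSanitizer (e.g. clang -fsanitize=address), the first oversized
   input aborts with a precise stack-buffer-overflow report instead. */
static void parse_record(const char *input) {
    char field[16];
    strcpy(field, input);        /* no bounds check: the bug */
    printf("parsed: %s\n", field);
}

int main(int argc, char **argv) {
    if (argc > 1)
        parse_record(argv[1]);   /* fuzzer-controlled input */
    return 0;
}
```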
To make a fuzzer more sensitive to failures other than crashes, sanitizers can be used to inject assertions that crash the program when a failure is detected.[47][48] There are different sanitizers for different kinds of bugs:
- to detect memory related errors, such as buffer overflows and use-after-free (using memory debuggers such as AddressSanitizer),
- to detect race conditions and deadlocks (ThreadSanitizer),
- to detect undefined behavior (UndefinedBehaviorSanitizer),
- to detect memory leaks (LeakSanitizer), or
- to check control-flow integrity (CFISanitizer).
Fuzzing can also be used to detect "differential" bugs if a reference implementation is available. For automated regression testing,[49] the generated inputs are executed on two versions of the same program. For automated differential testing,[50] the generated inputs are executed on two implementations of the same program (e.g., lighttpd and httpd are both implementations of a web server). If the two variants produce different output for the same input, then one may be buggy and should be examined more closely.
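A minimal sketch of such a differential check is shown below. It runs two command-line implementations on the same generated input and reports whether their outputs disagree; the program names impl_a and impl_b are placeholders for two real implementations, and the fixed output buffers are a simplifying assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal differential check: run two implementations of the same tool on
   one generated input and flag any disagreement.  Uses POSIX popen/pclose;
   "./impl_a" and "./impl_b" are placeholders for real implementations. */
static int run_and_capture(const char *cmd, char *out, size_t out_size) {
    FILE *p = popen(cmd, "r");
    if (!p)
        return -1;
    size_t n = fread(out, 1, out_size - 1, p);
    out[n] = '\0';
    return pclose(p);
}

/* Returns 1 if the two implementations produce different output, 0 if they
   agree, and -1 if one of them could not be run. */
int differs(const char *input_path) {
    char cmd_a[512], cmd_b[512], out_a[4096], out_b[4096];
    snprintf(cmd_a, sizeof cmd_a, "./impl_a %s 2>&1", input_path);
    snprintf(cmd_b, sizeof cmd_b, "./impl_b %s 2>&1", input_path);
    if (run_and_capture(cmd_a, out_a, sizeof out_a) < 0 ||
        run_and_capture(cmd_b, out_b, sizeof out_b) < 0)
        return -1;
    return strcmp(out_a, out_b) != 0;   /* disagreement = candidate bug */
}
```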
Validating static analysis reports
Static program analysis analyzes a program without actually executing it. This might lead to false positives where the tool reports problems with the program that do not actually exist. Fuzzing in combination with dynamic program analysis can be used to try to generate an input that actually witnesses the reported problem.[51]
Browser security
Modern web browsers undergo extensive fuzzing. The Chromium code of Google Chrome is continuously fuzzed by the Chrome Security Team with 15,000 cores.[52] For Microsoft Edge [Legacy] and Internet Explorer, Microsoft performed fuzz testing with 670 machine-years during product development, generating more than 400 billion DOM manipulations from 1 billion HTML files.[53][52]
Toolchain
A fuzzer produces a large number of inputs in a relatively short time. For instance, in 2016 the Google OSS-Fuzz project produced around 4 trillion inputs a week.[17] Hence, many fuzzers provide a toolchain that automates otherwise manual and tedious tasks which follow the automated generation of failure-inducing inputs.
Automated bug triage
Automated bug triage is used to group a large number of failure-inducing inputs by root cause and to prioritize each individual bug by severity. A fuzzer produces a large number of inputs, and many of the failure-inducing ones may effectively expose the same software bug. Only some of these bugs are security-critical and should be patched with higher priority. For instance, the CERT Coordination Center provides the Linux triage tools which group crashing inputs by the produced stack trace and rank each group according to its probability of being exploitable.[54] The Microsoft Security Research Centre (MSEC) developed the "!exploitable" tool which first creates a hash for a crashing input to determine its uniqueness and then assigns an exploitability rating:[55]
- Exploitable
- Probably Exploitable
- Probably Not Exploitable, or
- Unknown.
Previously unreported, triaged bugs might be automatically reported to a bug tracking system. For instance, OSS-Fuzz runs large-scale, long-running fuzzing campaigns for several security-critical software projects where each previously unreported, distinct bug is reported directly to a bug tracker.[17] The OSS-Fuzz bug tracker automatically informs the maintainer of the vulnerable software and checks at regular intervals whether the bug has been fixed in the most recent revision using the uploaded minimized failure-inducing input.
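As a rough illustration of the stack-trace grouping step described above, the sketch below buckets crashes by hashing the top few frames of their stack traces. The hash choice (FNV-1a), the number of frames, and the function name are assumptions made for the example; this is not the CERT triage tooling or the "!exploitable" analyzer.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified crash bucketing: hash the top few stack frames of a crash so
   that failures with the same call path land in the same bucket. */

#define FRAMES_FOR_HASH 3   /* only the topmost frames contribute */

/* FNV-1a hash over the chosen frame names (e.g. symbolized function names). */
uint64_t crash_bucket(const char *frames[], size_t n_frames) {
    uint64_t h = 1469598103934665603ull;
    size_t limit = n_frames < FRAMES_FOR_HASH ? n_frames : FRAMES_FOR_HASH;
    for (size_t i = 0; i < limit; i++) {
        for (const char *p = frames[i]; *p; p++) {
            h ^= (uint8_t)*p;
            h *= 1099511628211ull;
        }
    }
    return h;   /* crashes with equal buckets are triaged as one bug */
}
```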
Automated input minimization
Automated input minimization (or test case reduction) is an automated debugging technique to isolate the part of the failure-inducing input that is actually inducing the failure.[56][57] If the failure-inducing input is large and mostly malformed, it might be difficult for a developer to understand what exactly is causing the bug. Given the failure-inducing input, an automated minimization tool would remove as many input bytes as possible while still reproducing the original bug. For instance, Delta Debugging is an automated input minimization technique that employs an extended binary search algorithm to find such a minimal input.[58]
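The sketch below shows a simplified, greedy variant of this idea: it repeatedly tries to drop chunks of the failure-inducing input and keeps any shorter version that still reproduces the failure. It is not the full ddmin algorithm of Delta Debugging; the still_fails callback, the chunk schedule, and the fixed buffer size are assumptions made for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Greedy test-case reduction.  `still_fails` is assumed to re-run the
   target on the candidate input and return 1 if the original failure
   still reproduces. */
size_t minimize(unsigned char *buf, size_t len,
                int (*still_fails)(const unsigned char *, size_t)) {
    static unsigned char trial[1 << 16];         /* assumes inputs <= 64 KiB */
    size_t chunk = len / 2;

    while (chunk > 0) {
        int progress = 0;
        for (size_t start = 0; start + chunk <= len; ) {
            /* Candidate = buf with buf[start .. start+chunk) removed. */
            memcpy(trial, buf, start);
            memcpy(trial + start, buf + start + chunk, len - start - chunk);
            if (still_fails(trial, len - chunk)) {
                memcpy(buf, trial, len - chunk); /* keep the smaller input */
                len -= chunk;
                progress = 1;                    /* retry at the same offset */
            } else {
                start += chunk;                  /* chunk is needed; move on */
            }
        }
        if (!progress)
            chunk /= 2;                          /* refine the granularity */
    }
    return len;                                  /* length of reduced input */
}
```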
List of popular fuzzers
The following is a list of fuzzers described as "popular", "widely used", or similar in the academic literature.[59][60]
| Name | White/gray/black-box | Smart/dumb | Description | Written in | License |
|---|---|---|---|---|---|
| AFL[61][62] | Gray | Dumb | | C | Apache 2.0 |
| AFL++[63] | Gray | Dumb | | C | Apache 2.0 |
| AFLFast[64] | Gray | Dumb | | C | Apache 2.0 |
| Angora[65] | Gray | Dumb | | C++ | Apache 2.0 |
| honggfuzz[66][67] | Gray | Dumb | | C | Apache 2.0 |
| QSYM[68] | [?] | [?] | | [?] | [?] |
| SymCC[69] | White[70] | [?] | | C++ | GPL, LGPL |
| T-Fuzz[71] | [?] | [?] | | [?] | [?] |
| VUzzer[72] | [?] | [?] | | [?] | [?] |
References
- ^ a b John Neystadt (February 2008). "Automated Penetration Testing with White-Box Fuzzing". Microsoft. Retrieved 2009-05-14.
- ^ Barton P. Miller (September 1988). "Fall 1988 CS736 Project List" (PDF). Computer Sciences Department, University of Wisconsin-Madison. Retrieved 2020-12-30.
- ^ Barton P. Miller; Lars Fredriksen; Bryan So (December 1990). "An Empirical Study of the Reliability of UNIX Utilities". Communications of the ACM. 33 (11): 32–44. doi:10.1145/96267.96279. S2CID 14313707.
- ^ a b Miller, Barton (April 2008). "Foreword for Fuzz Testing Book". UW-Madison Computer Sciences. Retrieved 29 March 2024.
- ^ "Fuzz Testing of Application Reliability". University of Wisconsin-Madison. Retrieved 2020-12-30.
- ^ a b "Announcing ClusterFuzz". Retrieved 2017-03-09.
- ^ Perlroth, Nicole (25 September 2014). "Security Experts Expect 'Shellshock' Software Bug in Bash to Be Significant". The New York Times. Retrieved 25 September 2014.
- ^ Zalewski, Michał (1 October 2014). "Bash bug: the other two RCEs, or how we chipped away at the original fix (CVE-2014-6277 and '78)". lcamtuf's blog. Retrieved 13 March 2017.
- ^ Seltzer, Larry (29 September 2014). "Shellshock makes Heartbleed look insignificant". ZDNet. Retrieved 29 September 2014.
- ^ Böck, Hanno. "Fuzzing: Wie man Heartbleed hätte finden können (in German)". Golem.de (in German). Retrieved 13 March 2017.
- ^ Böck, Hanno. "How Heartbleed could've been found (in English)". Hanno's blog. Retrieved 13 March 2017.
- ^ "Search engine for the internet of things – devices still vulnerable to Heartbleed". shodan.io. Retrieved 13 March 2017.
- ^ "Heartbleed Report (2017-01)". shodan.io. Archived from the original on 23 January 2017. Retrieved 10 July 2017.
- ^ Walker, Michael. "DARPA Cyber Grand Challenge". darpa.mil. Retrieved 12 March 2017.
- ^ "Mayhem comes in first place at CGC". Retrieved 12 March 2017.
- ^ a b "Announcing Project Springfield". 2016-09-26. Retrieved 2017-03-08.
- ^ a b c d "Announcing OSS-Fuzz". Retrieved 2017-03-08.
- ^ Christopher Domas (August 2018). "GOD MODE UNLOCKED - Hardware Backdoors in x86 CPUs". Retrieved 2018-09-03.
- ^ "Microsoft: Windows 10 is hardened with these fuzzing security tools – now they're open source". ZDNet. September 15, 2020.
- ^ "Microsoft open-sources fuzzing test framework". InfoWorld. September 17, 2020.
- ^ microsoft/onefuzz, Microsoft, 2024-03-03, retrieved 2024-03-06
- ^ Gerald M. Weinberg (2017-02-05). "Fuzz Testing and Fuzz History". Retrieved 2017-02-06.
- ^ Joe W. Duran; Simeon C. Ntafos (1981-03-09). A report on random testing. Icse '81. Proceedings of the ACM SIGSOFT International Conference on Software Engineering (ICSE'81). pp. 179–183. ISBN 9780897911467.
- ^ Joe W. Duran; Simeon C. Ntafos (1984-07-01). "An Evaluation of Random Testing". IEEE Transactions on Software Engineering (4): 438–444. doi:10.1109/TSE.1984.5010257. S2CID 17208399.
- ^ Andy Hertzfeld (2004). Revolution in the Valley: The Insanely Great Story of How the Mac Was Made. O'Reilly Media. ISBN 978-0596007195.
- ^ "Macintosh Stories: Monkey Lives". Folklore.org. 1999-02-22. Retrieved 2010-05-28.
- ^ "crashme". CodePlex. Retrieved 2021-05-21.
- ^ Michael Sutton; Adam Greene; Pedram Amini (2007). Fuzzing: Brute Force Vulnerability Discovery. Addison-Wesley. ISBN 978-0-321-44611-4.
- ^ Offutt, Jeff; Xu, Wuzhi (2004). "Generating test cases for web services using data perturbation". ACM SIGSOFT Software Engineering Notes. 29 (5): 1–10. doi:10.1145/1022494.1022529. S2CID 52854851.
- ^ Rebert, Alexandre; Cha, Sang Kil; Avgerinos, Thanassis; Foote, Jonathan; Warren, David; Grieco, Gustavo; Brumley, David (2014). "Optimizing Seed Selection for Fuzzing" (PDF). Proceedings of the 23rd USENIX Conference on Security Symposium: 861–875.
- ^ a b c Patrice Godefroid; Adam Kiezun; Michael Y. Levin. "Grammar-based Whitebox Fuzzing" (PDF). Microsoft Research.
- ^ a b c Van-Thuan Pham; Marcel Böhme; Abhik Roychoudhury (2016-09-07). "Model-based whitebox fuzzing for program binaries". Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering - ASE 2016. Proceedings of Automated Software Engineering (ASE'16). pp. 543–553. doi:10.1145/2970276.2970316. ISBN 9781450338455. S2CID 5809364.
- ^ a b c "Peach Fuzzer". Retrieved 2017-03-08.
- ^ Greg Banks; Marco Cova; Viktoria Felmetsger; Kevin Almeroth; Richard Kemmerer; Giovanni Vigna. SNOOZE: Toward a Stateful NetwOrk prOtocol fuzZEr. Proceedings of the Information Security Conference (ISC'06).
- ^ Osbert Bastani; Rahul Sharma; Alex Aiken; Percy Liang (June 2017). Synthesizing Program Input Grammars. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). arXiv:1608.01723. Bibcode:2016arXiv160801723B.
- ^ "VDA Labs - Evolutionary Fuzzing System". Archived from the original on 2015-11-05. Retrieved 2009-05-14.
- ^ a b Ari Takanen; Jared D. Demott; Charles Miller (31 January 2018). Fuzzing for Software Security Testing and Quality Assurance, Second Edition. Artech House. p. 15. ISBN 978-1-63081-519-6. full document available (archived September 19, 2018)
- ^ a b Ganesh, Vijay; Leek, Tim; Rinard, Martin (2009). "Taint-based directed whitebox fuzzing". 2009 IEEE 31st International Conference on Software Engineering. pp. 474–484. doi:10.1109/ICSE.2009.5070546. hdl:1721.1/59320. ISBN 978-1-4244-3453-4.
- ^ Wang, T.; Wei, T.; Gu, G.; Zou, W. (May 2010). "TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection". 2010 IEEE Symposium on Security and Privacy. pp. 497–512. CiteSeerX 10.1.1.169.7866. doi:10.1109/SP.2010.37. ISBN 978-1-4244-6894-2. S2CID 11898088.
- ^ Patrice Godefroid; Michael Y. Levin; David Molnar (2008-02-08). "Automated Whitebox Fuzz Testing" (PDF). Proceedings of Network and Distributed Systems Symposium (NDSS'08).
- ^ Marcel Böhme; Soumya Paul (2015-10-05). "A Probabilistic Analysis of the Efficiency of Automated Software Testing". IEEE Transactions on Software Engineering. 42 (4): 345–360. doi:10.1109/TSE.2015.2487274. S2CID 15927031.
- ^ Nick Stephens; John Grosen; Christopher Salls; Andrew Dutcher; Ruoyu Wang; Jacopo Corbetta; Yan Shoshitaishvili; Christopher Kruegel; Giovanni Vigna (2016-02-24). Driller: Augmenting Fuzzing Through Selective Symbolic Execution (PDF). Proceedings of Network and Distributed Systems Symposium (NDSS'16).
- ^ Marcel Böhme; Van-Thuan Pham; Abhik Roychoudhury (2016-10-28). "Coverage-based Greybox Fuzzing as Markov Chain". Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. Proceedings of the ACM Conference on Computer and Communications Security (CCS'16). pp. 1032–1043. doi:10.1145/2976749.2978428. ISBN 9781450341394. S2CID 3344888.
- ^ Hamlet, Richard G.; Taylor, Ross (December 1990). "Partition testing does not inspire confidence". IEEE Transactions on Software Engineering. 16 (12): 1402–1411. doi:10.1109/32.62448.
- ^ Weyuker, Elaine J. (1 November 1982). "On Testing Non-Testable Programs". The Computer Journal. 25 (4): 465–470. doi:10.1093/comjnl/25.4.465.
- ^ Barr, Earl T.; Harman, Mark; McMinn, Phil; Shahbaz, Muzammil; Yoo, Shin (1 May 2015). "The Oracle Problem in Software Testing: A Survey" (PDF). IEEE Transactions on Software Engineering. 41 (5): 507–525. Bibcode:2015ITSEn..41..507B. doi:10.1109/TSE.2014.2372785. S2CID 7165993.
- ^ "Clang compiler documentation". clang.llvm.org. Retrieved 13 March 2017.
- ^ "GNU GCC sanitizer options". gcc.gnu.org. Retrieved 13 March 2017.
- ^ Orso, Alessandro; Xie, Tao (2008). "BERT: BEhavioral Regression Testing". Proceedings of the 2008 international workshop on dynamic analysis: Held in conjunction with the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2008). ACM. pp. 36–42. doi:10.1145/1401827.1401835. ISBN 9781605580548. S2CID 7506576.
- ^ McKeeman, William M. (1998). "Differential Testing for Software" (PDF). Digital Technical Journal. 10 (1): 100–107. Archived from the original (PDF) on 2006-10-31.
- ^ Babić, Domagoj; Martignoni, Lorenzo; McCamant, Stephen; Song, Dawn (2011). "Statically-directed dynamic automated test generation". Proceedings of the 2011 International Symposium on Software Testing and Analysis. ACM. pp. 12–22. doi:10.1145/2001420.2001423. ISBN 9781450305624. S2CID 17344927.
- ^ a b Sesterhenn, Eric; Wever, Berend-Jan; Orrù, Michele; Vervier, Markus (19 Sep 2017). "Browser Security WhitePaper" (PDF). X41D SEC GmbH.
- ^ "Security enhancements for Microsoft Edge (Microsoft Edge for IT Pros)". Microsoft. 15 Oct 2017. Retrieved 31 August 2018.
- ^ "CERT Triage Tools". CERT Division of the Software Engineering Institute (SEI) at Carnegie Mellon University (CMU). Retrieved 14 March 2017.
- ^ "Microsoft !exploitable Crash Analyzer". CodePlex. Retrieved 14 March 2017.
- ^ "Test Case Reduction". 2011-07-18.
- ^ "IBM Test Case Reduction Techniques". 2011-07-18. Archived from the original on 2016-01-10. Retrieved 2011-07-18.
- ^ Zeller, Andreas; Hildebrandt, Ralf (February 2002). "Simplifying and Isolating Failure-Inducing Input". IEEE Transactions on Software Engineering. 28 (2): 183–200. Bibcode:2002ITSEn..28..183Z. CiteSeerX 10.1.1.180.3357. doi:10.1109/32.988498. ISSN 0098-5589. Retrieved 14 March 2017.
- ^ Hazimeh, Ahmad; Herrera, Adrian; Payer, Mathias (2021-06-15). "Magma: A Ground-Truth Fuzzing Benchmark". Proceedings of the ACM on Measurement and Analysis of Computing Systems. 4 (3): 49:1–49:29. arXiv:2009.01120. doi:10.1145/3428334. S2CID 227230949.
- ^ Li, Yuwei; Ji, Shouling; Chen, Yuan; Liang, Sizhuang; Lee, Wei-Han; Chen, Yueyao; Lyu, Chenyang; Wu, Chunming; Beyah, Raheem; Cheng, Peng; Lu, Kangjie; Wang, Ting (2021). {UNIFUZZ}: A Holistic and Pragmatic {Metrics-Driven} Platform for Evaluating Fuzzers. pp. 2777–2794. ISBN 978-1-939133-24-3.
- ^ Hazimeh, Herrera & Payer 2021, p. 1: "We evaluate seven widely-used mutation-based fuzzers (AFL, ...)".
- ^ Li et al. 2021, p. 1: "Using UniFuzz, we conduct in-depth evaluations of several prominent fuzzers including AFL, ...".
- ^ Hazimeh, Herrera & Payer 2021, p. 1: "We evaluate seven widely-used mutation-based fuzzers (..., AFL++, ...)".
- ^ Li et al. 2021, p. 1: "Using UniFuzz, we conduct in-depth evaluations of several prominent fuzzers including AFL, AFLFast, ...".
- ^ Li et al. 2021, p. 1: "Using UniFuzz, we conduct in-depth evaluations of several prominent fuzzers including AFL, ..., Angora, ...".
- ^ Hazimeh, Herrera & Payer 2021, p. 1: "We evaluate seven widely-used mutation-based fuzzers (..., honggfuzz, ...)".
- ^ Li et al. 2021, p. 1: "Using UniFuzz, we conduct in-depth evaluations of several prominent fuzzers including AFL, ..., Honggfuzz, ...".
- ^ Li et al. 2021, p. 1: "Using UniFuzz, we conduct in-depth evaluations of several prominent fuzzers including AFL, ..., QSYM, ...".
- ^ Hazimeh, Herrera & Payer 2021, p. 1: "We evaluate seven widely-used mutation-based fuzzers (..., and SymCC-AFL)".
- ^ Hazimeh, Herrera & Payer 2021, p. 14.
- ^ Li et al. 2021, p. 1: "Using UniFuzz, we conduct in-depth evaluations of several prominent fuzzers including AFL, ..., T-Fuzz, ...".
- ^ Li et al. 2021, p. 1: "Using UniFuzz, we conduct in-depth evaluations of several prominent fuzzers including AFL, ..., and VUzzer64.".
Further reading
[edit]- Nappa, A.; Blázquez, E. (2023). Fuzzing Against the Machine: Automate Vulnerability Research with Emulated IoT Devices on Qemu. Packt Publishing, Limited. ISBN 9781804614976. A comprehensive guide on automated vulnerability research with emulated IoT devices.
- Zeller, Andreas; Gopinath, Rahul; Böhme, Marcel; Fraser, Gordon; Holler, Christian (2019). The Fuzzing Book. Saarbrücken: CISPA + Saarland University. A free, online, introductory textbook on fuzzing.
- Ari Takanen, Jared D. DeMott, Charles Miller, Fuzzing for Software Security Testing and Quality Assurance, 2008, ISBN 978-1-59693-214-2
- Michael Sutton, Adam Greene, and Pedram Amini. Fuzzing: Brute Force Vulnerability Discovery, 2007, ISBN 0-321-44611-9.
- H. Pohl, Cost-Effective Identification of Zero-Day Vulnerabilities with the Aid of Threat Modeling and Fuzzing, 2011
- Fabien Duchene, Detection of Web Vulnerabilities via Model Inference assisted Evolutionary Fuzzing, 2014, PhD Thesis
- Bratus, Sergey; Darley, Trey; Locasto, Michael; Patterson, Meredith L.; Shapiro, Rebecca "Bx"; Shubina, Anna (2014). "Beyond Planted Bugs in "Trusting Trust": The Input-Processing Frontier". IEEE Security & Privacy. 12 (1): 83–87. Bibcode:2014ISPri..12a..83M. doi:10.1109/MSP.2014.1. Basically highlights why fuzzing works so well: because the input is the controlling program of the interpreter.
External links
[edit]- Fuzzing Project, includes tutorials, a list of security-critical open-source projects, and other resources.
- University of Wisconsin Fuzz Testing (the original fuzz project) Source of papers and fuzz software.
- Designing Inputs That Make Software Fail, conference video including fuzzy testing
- Building 'Protocol Aware' Fuzzing Frameworks
Fuzzing
Fundamentals
Definition and Purpose
Fuzzing, also known as fuzz testing, is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program in order to discover defects such as crashes, failed assertions, or memory errors. This approach was pioneered in the late 1980s as a simple method for feeding random inputs to applications to evaluate their reliability.[4] The primary purposes of fuzzing are to identify implementation bugs and expose security vulnerabilities, such as buffer overflows, use-after-free errors, and denial-of-service conditions, thereby enhancing software robustness without necessitating detailed knowledge of the program's internal structure or source code.[2] By systematically perturbing inputs, fuzzing complements traditional testing methods and has proven effective in uncovering issues that evade specification-based verification.[4]

In contrast to white-box or model-driven approaches that rely on program semantics or formal specifications, basic fuzzing operates as a black-box technique, observing only the external input-output behavior of the program without access to its internals.[7] The basic workflow entails generating diverse test inputs, injecting them into the target application, monitoring for anomalies like crashes or hangs, and logging failures for subsequent analysis.[4]
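A minimal black-box loop corresponding to this generate / inject / monitor / log workflow might look like the sketch below. The target path ("./target"), the input size cap, and the crash-detection heuristic are placeholder assumptions, and real fuzzers add far more machinery (timeouts, corpus management, parallelism).

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <time.h>

/* Minimal black-box fuzzing loop: generate random bytes, inject them into
   the target's stdin, monitor its exit status, and log crashing inputs. */
int main(void) {
    unsigned char input[512];
    char logname[64];
    srand((unsigned)time(NULL));

    for (int iter = 0; iter < 100000; iter++) {
        size_t len = 1 + (size_t)rand() % sizeof input;
        for (size_t i = 0; i < len; i++)
            input[i] = (unsigned char)rand();               /* generate */

        FILE *p = popen("./target > /dev/null 2>&1", "w");  /* inject via stdin */
        if (!p)
            continue;
        fwrite(input, 1, len, p);
        int status = pclose(p);                             /* monitor */

        /* The shell reports a signal-killed child as exit status 128+signal. */
        int crashed = status != -1 &&
                      ((WIFEXITED(status) && WEXITSTATUS(status) > 128) ||
                       WIFSIGNALED(status));
        if (crashed) {                                      /* log failing input */
            snprintf(logname, sizeof logname, "crash-%06d.bin", iter);
            FILE *log = fopen(logname, "wb");
            if (log) { fwrite(input, 1, len, log); fclose(log); }
        }
    }
    return 0;
}
```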
Core Principles
Fuzzing operates through three fundamental components that form its operational backbone. The input generator creates test cases, often by mutating valid seed inputs or generating novel ones from models of expected formats, to probe the program's behavior under unexpected conditions.[8] The execution environment provides a controlled setting to run the target program with these inputs, typically sandboxed to manage resource usage and isolate potential crashes or hangs.[8] The oracle then monitors outputs to detect anomalies, such as segmentation faults, assertion failures, or sanitizer-detected issues like memory errors, flagging them as potential defects.[8]

At its core, fuzzing explores the vast input space of a program by systematically generating diverse inputs to uncover hidden flaws. Random sampling forms a primary principle, where inputs are produced pseudo-randomly to broadly cover possible values and reveal implementation bugs that deterministic testing might miss.[9] Boundary value testing complements this by focusing on edge cases, such as maximum or minimum values for data types, which are prone to overflows or validation errors.[10] Feedback loops enable iterative refinement, where observations from prior executions (such as execution traces or coverage data) guide the generation of subsequent inputs to prioritize unexplored regions and enhance efficiency.[9]

Success in fuzzing is evaluated using metrics that quantify exploration depth and defect detection quality. Code coverage rates, for instance, measure the proportion of the program's structure exercised by test cases, with branch coverage calculated as the percentage of unique branches executed relative to total branches (coverage = branches executed / total branches × 100%). This metric guides resource allocation toward deeper code penetration.[11] Crash uniqueness assesses the diversity of failures found, counting distinct crashes (e.g., via stack traces or hashes) to avoid redundant reports and indicate broader vulnerability exposure.[10] Fault revelation efficiency evaluates the rate of novel bugs discovered per unit of fuzzing time or effort, providing a practical gauge of the technique's productivity in real-world testing scenarios.

Instrumentation plays a pivotal role in enabling these principles by embedding lightweight probes into the target program during compilation or execution. These probes collect runtime data, such as branch transitions or memory accesses, to inform feedback loops without modifying the program's observable semantics or performance significantly. Techniques like binary instrumentation allow this monitoring even for unmodified binaries, ensuring compatibility across diverse software environments.[12]
Historical Development
Origins and Early Experiments
The concept of random testing in software development emerged in the 1950s during the debugging era, when programmers commonly used decks of punch cards with random or garbage data to probe for errors in early computer programs, simulating real-world input variability without systematic methods.[13] This practice laid informal groundwork for automated input-based testing, and by the 1960s and 1970s, rudimentary automated checks were incorporated into early operating systems to validate system stability against unexpected conditions.[14]

The modern technique of fuzzing originated in 1988 as a graduate class project in the Advanced Operating Systems course (CS736) taught by Barton P. Miller at the University of Wisconsin-Madison. Inspired by a thunderstorm that introduced line noise into Miller's dial-up connection, causing random corruption of inputs and subsequent crashes in UNIX utilities, the project aimed to systematically evaluate software reliability using automated random inputs.[15] Students developed a tool called "fuzz" to generate random ASCII streams, including printable and non-printable characters, NULL bytes, and varying lengths up to 25,000 bytes, feeding them into 88 standard UNIX utilities across seven different UNIX implementations, such as 4.3BSD, SunOS 3.2, and AIX 1.1. For interactive programs, a complementary tool named "ptyjig" simulated random keyboard and mouse inputs via pseudo-terminals.[4]

The experiments revealed significant vulnerabilities, with 25-33% of the utilities crashing or hanging across the tested systems: for instance, 29% on a VAX running 4.3BSD and 25% on a Sun workstation running SunOS. Common failures included segmentation violations, core dumps, and infinite loops, often triggered by poor input validation in areas like buffer management and string parsing; notable examples involved utilities like "troff" and "ld" producing exploitable faults. These results, published in 1990, demonstrated fuzzing's potential to uncover bugs overlooked by traditional testing, prompting UNIX vendors to integrate similar tools into their quality assurance processes.[4][15]

Despite its successes, the early fuzzing approach had notable limitations, including the purely random nature of input generation, which lacked structure or guidance toward edge cases, potentially missing deeper program paths. Crash analysis was also manual and challenging, relying on core dumps and debugger examination without access to source code for many utilities, limiting reproducibility and root-cause diagnosis.[4]
Key Milestones and Modern Advancements
In the late 1990s and early 2000s, fuzzing evolved from ad-hoc random testing to more structured frameworks targeted at specific domains. The PROTOS project, initiated in 1999 by researchers at the University of Oulu, introduced a systematic approach to protocol fuzzing by generating test cases based on protocol specifications to uncover implementation flaws in network software. This framework emphasized heuristic-based mutation of protocol fields, leading to the discovery of over 50 vulnerabilities in widely used protocols like SIP and SNMP by 2003. Building on this, Microsoft's SAGE (Automated Whitebox Fuzz Testing) tool, released in 2008, pioneered whitebox fuzzing by combining symbolic execution with random input generation to systematically explore program paths in binary applications.[16] SAGE significantly enhanced coverage in security testing, reportedly finding dozens of bugs in Windows components that blackbox methods missed.[17]

The 2010s marked a surge in coverage-guided fuzzing, driven by open-source tools that integrated genetic algorithms and compiler instrumentation. American Fuzzy Lop (AFL), developed by Michał Zalewski and publicly released in 2013, employed novel compile-time instrumentation to track code coverage and evolve inputs via mutation, achieving breakthroughs in efficiency for binary fuzzing.[18] AFL played a pivotal role in exposing follow-up vulnerabilities related to the Shellshock bug (CVE-2014-6271 and CVE-2014-6277) in Bash during 2014, demonstrating fuzzing's ability to uncover command injection flaws in shell interpreters. Concurrently, LLVM's LibFuzzer, introduced in 2015, provided an in-process fuzzing engine tightly integrated with AddressSanitizer and coverage instrumentation, enabling seamless fuzzing of C/C++ libraries with minimal overhead.[19] This tool's adoption accelerated bug detection in projects like OpenSSL, where it complemented sanitizers to identify memory errors.

Google's OSS-Fuzz, launched in 2016, represented a paradigm shift toward continuous, large-scale fuzzing for open-source software, integrating engines like AFL and LibFuzzer into CI/CD pipelines across thousands of cores.[20] As of May 2025, OSS-Fuzz has helped identify and fix over 13,000 vulnerabilities and 50,000 bugs across 1,000 projects, underscoring fuzzing's role in proactive security maintenance.[21] In parallel, syzkaller, developed by Google starting in 2015, adapted coverage-guided fuzzing for operating system kernels by generating syscall sequences informed by kernel coverage feedback, leading to thousands of Linux kernel bug reports.[22] For instance, syzkaller exposed race conditions and memory issues in subsystems like networking and filesystems, with ongoing enhancements improving its state-machine modeling for complex kernel interactions.

Modern advancements from 2017 onward have focused on scalability and hybridization. AFL++, a community fork of AFL initiated in 2017, incorporated optimizations like mirror scheduling and advanced mutation strategies (e.g., dictionary-based and havoc modes), boosting performance by up to 50% on real-world benchmarks while maintaining compatibility.[23] This evolution enabled deeper exploration in environments like web browsers and embedded systems.
Google's ClusterFuzz, first deployed in 2011 and scaled extensively by the 2010s, exemplified cloud-based fuzzing by orchestrating distributed execution across 25,000+ cores, automating triage, and integrating with OSS-Fuzz to handle high-volume campaigns.[24] Its impact was evident in high-profile detections, such as Codenomicon's 2014 fuzzing-based discovery of the Heartbleed vulnerability (CVE-2014-0160) in OpenSSL, which exposed a buffer over-read affecting millions of servers.[25]

Recent trends up to 2025 include hybrid techniques blending fuzzing with machine learning for seed prioritization, as seen in tools like those extending syzkaller, and AI enhancements in OSS-Fuzz, which in 2024 discovered 26 new vulnerabilities in established projects, including a long-standing flaw in OpenSSL,[26] further amplifying detection rates in kernel and protocol domains.
Fuzzing Techniques
Mutation-Based Fuzzing
Mutation-based fuzzing generates test inputs by applying random or heuristic modifications to a set of valid seed inputs, such as existing files, network packets, or messages, without requiring prior knowledge of the input format or protocol. The process begins by selecting a seed from a queue, optionally trimming it to minimize size while preserving behavior, then applying a series of mutations to produce variants for execution against the target program.[27] Common mutation operations include bit flips (e.g., inverting 1, 2, or 4 bits at random positions), arithmetic modifications (e.g., adding or subtracting small integers to 8-, 16-, or 32-bit values), byte insertions or deletions, overwriting with predefined "interesting" values (e.g., 0, 1, or boundary cases like 0xFF), and dictionary-based swaps using domain-specific tokens.[27] If a mutated input triggers new code coverage or crashes, it is added to the seed queue for further mutation; otherwise, the process cycles to the next seed.[23]

This approach offers low computational overhead due to its reliance on simple, stateless transformations and the reuse of valid seeds, which increases the likelihood of passing initial parsing stages compared to purely random generation.[28] It is particularly effective for binary or unstructured formats where structural models are unavailable or costly to develop, enabling rapid exploration of edge cases with minimal setup. For instance, dictionary-based mutations enhance efficiency by incorporating protocol-specific terms, such as HTTP headers, to target relevant input regions without exhaustive random trials.[27]

Key algorithms optimize seed selection and mutation application to balance exploration and exploitation. The PowerSchedule algorithm, introduced in AFL, dynamically assigns "energy" (i.e., the number of mutations attempted per seed) based on factors like input length, path depth, and historical coverage contributions, favoring shorter or more promising seeds to allocate computational resources efficiently, typically executing 1 to 10 times more mutations on high-value paths.[27] In havoc mode, a core mutation strategy, random perturbations are stacked sequentially (e.g., 2 to 4096 operations per input, selected via a batch exponent so that the number of tweaks is a power of two, 2^k for a randomly drawn exponent k), including bit flips, arithmetic changes, block deletions or duplications, and dictionary insertions, with a low probability (around 6%) of invoking custom extensions to avoid over-mutation.[23] The mutation rate is calibrated inversely with input length to maintain diversity; for an input of length n, the probability of altering a specific byte approximates 1/n, ensuring proportional changes across varying sizes.[27]

In practice, mutation-based fuzzing has proven effective for testing file parsers with minimal structural knowledge. A study on PNG image parsers using tools like zzuf applied bit-level mutations to seed files (e.g., varying chunk counts from 5 to 9), generating 200,000 variants per seed, which exposed checksum handling flaws but achieved only ~24% of the code coverage obtained by generation-based methods due to limited deep-path exploration without format awareness.[28] Similarly, a 2024 study fuzzing XML parsers such as libxml2, Apache Xerces, and Expat found that byte-level mutations with AFL detected more crashes than tree-level strategies, particularly in Xerces (up to 57 crashes with protocol-conformant seeds vs. 38 with public seeds), though no security vulnerabilities beyond illegal instructions were found.[29]
Generation-Based Fuzzing
Generation-based fuzzing employs formal models such as context-free grammars, schemas, or finite state machines (FSMs) to synthetically generate test inputs that adhere to specified input formats or protocols while incorporating deliberate faults.[30] This method contrasts with mutation-based approaches by constructing inputs from scratch according to the model, ensuring syntactic validity to reach deeper program states without early rejection by input parsers.[31] In protocol fuzzing, FSMs model the sequence of states and transitions, allowing the creation of input sequences that simulate protocol handshakes or sessions with injected anomalies.

Key techniques include random grammar mutations, where production rules are probabilistically altered to introduce variations in structure, and constraint solving to produce semantically valid yet malformed data.[32] For example, constraint solvers can enforce field dependencies in a schema while randomizing values to violate expected behaviors, such as generating HTTP requests with invalid headers that still parse correctly. In practice, parsers generated from tools like ANTLR for HTTP grammars enable the derivation of test cases by expanding non-terminals and mutating terminals, focusing faults on semantic layers.[33]

The primary benefits of generation-based fuzzing lie in its ability to explore complex state spaces through valid inputs, enabling tests of intricate logic in parsers and protocol handlers that random or mutated data might bypass.[34] However, this comes at the cost of higher computational overhead, as input generation involves recursive expansion of the model for each test case. The scale of possible derivations in a grammar without recursion is determined by the product of the number of rule choices for each non-terminal, leading to rapid growth in input variety but increased generation time.[30]

In network protocol applications, generation-based methods facilitate stateful fuzzing by producing sequences that respect transition dependencies, as seen in frameworks like Boofuzz, which use FSM-driven primitives to craft multi-packet interactions for protocols such as TCP or SIP. This approach has proven effective for uncovering vulnerabilities in state-dependent implementations, where invalid sequences reveal flaws in session management.[35]
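As a toy illustration of instantiating production rules, the sketch below expands a tiny arithmetic-expression grammar at random. The grammar is an invented example rather than a real protocol or file-format model, and practical generation-based fuzzers would also inject deliberate faults into the generated structures.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Toy generation-based fuzzing: expand a small expression grammar
       expr -> term | term op term
       term -> NUMBER | "(" expr ")"
   by choosing productions at random.  Every output is syntactically
   valid, so it exercises code beyond the parser's error path. */

static void gen_expr(char *out, size_t size, size_t *pos, int depth);

static void emit(char *out, size_t size, size_t *pos, const char *s) {
    while (*s && *pos + 1 < size)
        out[(*pos)++] = *s++;
    out[*pos] = '\0';
}

static void gen_term(char *out, size_t size, size_t *pos, int depth) {
    if (depth <= 0 || rand() % 2) {          /* term -> NUMBER */
        char num[16];
        snprintf(num, sizeof num, "%d", rand() % 1000);
        emit(out, size, pos, num);
    } else {                                 /* term -> "(" expr ")" */
        emit(out, size, pos, "(");
        gen_expr(out, size, pos, depth - 1);
        emit(out, size, pos, ")");
    }
}

static void gen_expr(char *out, size_t size, size_t *pos, int depth) {
    gen_term(out, size, pos, depth);         /* expr -> term ... */
    if (depth > 0 && rand() % 2) {           /* ... | term op term */
        const char *ops[] = { "+", "-", "*", "/" };
        emit(out, size, pos, ops[rand() % 4]);
        gen_term(out, size, pos, depth - 1);
    }
}

int main(void) {
    char buf[256];
    srand((unsigned)time(NULL));
    for (int i = 0; i < 5; i++) {
        size_t pos = 0;
        gen_expr(buf, sizeof buf, &pos, 4);
        puts(buf);                           /* one candidate test input */
    }
    return 0;
}
```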
Coverage-Guided and Hybrid Fuzzing
Coverage-guided fuzzing enhances traditional mutation-based approaches by incorporating runtime feedback to direct the generation of test inputs toward unexplored code regions. This technique involves instrumenting the target program to monitor execution coverage, typically at the level of basic blocks or control-flow edges, using lightweight mechanisms such as bitmaps to record reached transitions. Inputs that trigger new coverage are assigned higher priority for mutation, enabling efficient exploration of the program's state space; for instance, American Fuzzy Lop (AFL) employs a shared bitmap to track edge coverage across executions, favoring "power schedules" that allocate more mutations to promising seeds.[36] This feedback loop contrasts with undirected fuzzing by systematically increasing code coverage, often achieving deeper penetration into complex binaries.[23]

Hybrid fuzzing builds on coverage guidance by integrating complementary techniques, such as generation-based methods or machine learning, to overcome limitations in path exploration and input synthesis. In these approaches, mutation is combined with adaptive seeding strategies; for example, a fitness score that quantifies how efficiently an input reveals novel control flow can guide seed prioritization. Grey-box models further hybridize by selectively invoking symbolic execution to resolve hard-to-reach branches when coverage stalls, as in Driller, which augments fuzzing with concolic execution to generate inputs that bypass concrete execution dead-ends without full symbolic overhead.[37] More recent advancements incorporate machine learning, such as NEUZZ, which trains neural networks to approximate program behavior and enable gradient-based optimization for fuzzing guidance, smoothing discrete branch decisions into continuous landscapes for better seed selection.[38] As of 2025, further advancements include LLM-guided hybrid fuzzing, which uses large language models for semantic-aware input generation to improve exploration in stateful systems.[39]

These methods have demonstrated significant effectiveness in detecting vulnerabilities in large-scale, complex software, including web browsers, where traditional fuzzing struggles with deep state interactions. For example, coverage-guided hybrid techniques have uncovered numerous security bugs in Chromium by achieving higher branch coverage and faster crash reproduction compared to black-box alternatives, contributing to real-world vulnerability disclosure in production environments.[40] Quantitative evaluations show improvements in bug-finding rates, with hybrid fuzzers like Driller achieving a 13% increase in unique crashes (77 vs. 68) over pure coverage-guided baselines like AFL in the DARPA CGC benchmarks.[37]
Applications
Bug Detection and Vulnerability Exposure
Fuzzing uncovers software defects by systematically supplying invalid, malformed, or random inputs to program interfaces, with the goal of provoking exceptions, memory corruptions, or logic errors that reveal underlying flaws. This dynamic approach monitors runtime behavior for indicators of failure, such as segmentation faults or assertion violations, which signal potential defects in code handling edge cases. By exercising rarely encountered paths, fuzzing exposes issues that deterministic testing often misses, including those arising from unexpected data flows or boundary conditions.[41]

Among the vulnerabilities commonly detected, buffer overflows stand out, where excessive input data overwrites adjacent memory regions, potentially allowing arbitrary code execution. Integer overflows, which occur when arithmetic operations exceed representable values in a data type, can lead to incorrect computations and subsequent exploits. Race conditions, involving timing-dependent interactions in multithreaded environments, manifest as inconsistent states or data corruption under concurrent access. In C/C++ programs, fuzzing frequently identifies null pointer dereferences by generating inputs that nullify pointers before dereference operations, triggering crashes that pinpoint the error location.[42][43][44]

Studies indicate that fuzzing outperforms manual testing by executing programs orders of magnitude more frequently, thereby exploring deeper into state spaces and uncovering unique crashes that human-led efforts overlook. For instance, empirical evaluations show fuzzers detecting vulnerabilities in complex systems where traditional methods achieve limited coverage. Integration with memory sanitizers like AddressSanitizer (ASan) amplifies this impact by instrumenting code to intercept and report precise error details, such as the stack trace and offset for a buffer overflow, enabling faster triage and patching.[45][46][47]

To sustain effectiveness over time, corpus-based fuzzing employs seed input collections derived from prior tests or real-world data, replaying them to verify regressions and mutating them for new discoveries. This strategy ensures that code modifications do not reintroduce fixed bugs while expanding coverage. Continuous fuzzing embedded in CI/CD pipelines further automates this process, running fuzzer jobs on every commit or pull request to catch defects early in the development cycle, thereby reducing the cost of remediation.[48][49]
Validation of Static Analysis
Fuzzing serves as a dynamic complement to static analysis tools, which often generate warnings about potential issues such as memory leaks or buffer overflows but suffer from high false positive rates. In this validation process, outputs from static analyzers like Coverity or Infer are used to guide targeted fuzzing campaigns, where fuzzers generate inputs specifically aimed at reproducing the flagged code paths or functions. This involves extracting relevant code slices or hotspots from the warnings (such as tainted data flows in taint analysis) and creating minimal, compilable binaries for fuzzing, allowing the fuzzer to exercise the suspected vulnerable locations efficiently.[50][51]

The primary benefit of this approach is the reduction of false positives through empirical verification: if a warning does not lead to a crash or anomaly under extensive fuzzing, it is likely spurious, thereby alleviating the manual triage burden on developers. For instance, in scenarios involving taint analysis warnings for potential information leaks, fuzzing can confirm whether tainted inputs actually propagate to sensitive sinks, as demonstrated in evaluations on libraries like OpenSSL where buffer overflow alerts were pruned if non-crashing. This method not only confirms true positives but also provides concrete evidence for dismissal, improving overall developer productivity in large-scale software maintenance.[51][52]

Integration often employs feedback-directed fuzzing techniques, where static hotspots inform the fuzzer's power schedule or seed selection to prioritize exploration toward warning locations. Tools like FuzzSlice automate this by generating type-aware inputs for function-level slices, while advanced frameworks such as Lyso use multi-step directed greybox fuzzing, correlating alarms across program flows (via control and data flow graphs) to break validation into sequential goals. A key metric for effectiveness is the false positive reduction rate; for example, FuzzSlice identified 62% of developer-confirmed false positives in open-source warnings by failing to trigger crashes on them, and hybrid approaches have reported up to 100% false positive elimination in benchmark tests.[51][50]

Case studies in large codebases highlight practical impact, such as applying targeted fuzzing to validate undefined behavior reports in projects like tmux and OpenSSH, where static tools flagged numerous potential issues but fuzzing confirmed only a subset, enabling focused fixes. Similarly, directed fuzzing guided by static analysis on multimedia libraries (e.g., Libsndfile) has uncovered and verified previously unknown vulnerabilities from alarm correlations, demonstrating scalability for enterprise-scale validation without exhaustive manual review. These integrations underscore fuzzing's role in bridging static warnings to actionable insights, particularly for legacy or complex systems.[51][50]
Domain-Specific Implementations
Domain-Specific Implementations
Fuzzing has been extensively adapted for browser security, where it targets complex components such as DOM parsers and JavaScript engines to uncover vulnerabilities that could lead to code execution or data leaks. Google's ClusterFuzz infrastructure, which supports fuzzing of Chromium, runs on roughly 25,000 cores and had identified over 27,000 bugs in Google's codebase, including Chromium, as of February 2023.[53][54] This large-scale deployment enables continuous testing of browser rendering pipelines and script interpreters, using coverage-guided techniques to prioritize inputs that exercise rarely reached code paths in these high-risk areas.

In kernel and operating system fuzzing, tools like syzkaller focus on system call interfaces to systematically probe kernel behaviors, including those of device drivers and file systems, which are prone to memory corruption and race conditions. Syzkaller employs grammar-based input generation and kernel coverage feedback via mechanisms such as KCOV to discover deep bugs that traditional testing overlooks.[22] As of 2024, syzkaller had uncovered nearly 4,000 vulnerabilities in the Linux kernel alone, many affecting drivers for storage and networking hardware.[55] These findings have led to critical patches, demonstrating the tool's effectiveness in simulating real-world OS interactions without requiring full hardware emulation.

Fuzzing extends to other domains, such as network protocols, where stateful implementations like TLS require modeling of handshake sequences and message flows to detect flaws in cryptographic handling or state transitions. Protocol state fuzzing, for instance, has revealed multiple previously unknown vulnerabilities in major TLS libraries, including denial-of-service issues in OpenSSL and GnuTLS, by systematically exploring valid and malformed protocol states.[56] In embedded systems, adaptations for resource-constrained and stateful environments often involve firmware emulation or semi-hosted execution to maintain persistent state across fuzzing iterations, addressing challenges such as limited memory and non-deterministic hardware interactions.[57] These tailored approaches have improved coverage in IoT devices and microcontrollers, identifying buffer overflows and logic errors that could compromise system integrity.

Scaling fuzzing for domain-specific targets, especially resource-intensive ones like browsers and kernels, relies on distributed infrastructures that spread workloads across clusters to achieve high throughput. Challenges arise in efficient task scheduling, where imbalances can leave resources underutilized or duplicate effort, and in managing synchronization for stateful targets. Solutions such as the dynamic centralized scheduler in the UniFuzz framework optimize seed distribution and mutation strategies across nodes, reducing overhead and improving bug discovery rates in large-scale deployments.
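One common way to harness stateful targets, sketched below under simplifying assumptions, is to interpret the fuzz input as an ordered sequence of protocol messages so that every iteration walks the state machine from a fresh start; the three-state handshake here is invented for illustration and is far simpler than a real TLS implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy handshake state machine standing in for a real protocol implementation. */
enum state { ST_INIT, ST_HELLO_SENT, ST_ESTABLISHED, ST_ERROR };

static enum state step(enum state s, uint8_t msg_type) {
    switch (s) {
    case ST_INIT:        return msg_type == 0x01 ? ST_HELLO_SENT : ST_ERROR;   /* "ClientHello" */
    case ST_HELLO_SENT:  return msg_type == 0x02 ? ST_ESTABLISHED : ST_ERROR;  /* "Finished"    */
    case ST_ESTABLISHED: return ST_ESTABLISHED;   /* application data accepted */
    default:             return ST_ERROR;
    }
}

/* Each fuzz input is read as a list of message types, so mutations explore both
   malformed messages and out-of-order state transitions. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    enum state s = ST_INIT;                 /* reset protocol state every iteration */
    for (size_t i = 0; i < size; i++)
        s = step(s, data[i]);
    return 0;
}
```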
Tools and Infrastructure
Popular Fuzzing Frameworks
American Fuzzy Lop (AFL) and its enhanced fork AFL++ are prominent coverage-guided fuzzing frameworks that employ mutation-based techniques to generate inputs, leveraging compile-time instrumentation for efficient branch-coverage feedback. AFL uses a fork-server model to minimize process-creation overhead, enabling rapid execution of test cases, while AFL++ extends this with optimizations such as persistent mode for in-memory fuzzing without repeated initialization, custom mutator APIs for domain-specific mutations, and support for various instrumentation backends including LLVM and QEMU. These frameworks are open source and widely adopted for fuzzing user-space applications, particularly C and C++ binaries.[58][59]

LibFuzzer is an in-process, coverage-guided evolutionary fuzzer tightly integrated with the LLVM compiler infrastructure; it links directly with the target library and feeds mutated inputs without spawning external processes. It supports AddressSanitizer (ASan) and other sanitizers for detecting memory errors during fuzzing sessions, and is commonly enabled via build systems such as CMake by adding compiler flags such as -fsanitize=fuzzer. LibFuzzer excels at fuzzing libraries and APIs, prioritizing speed through in-process execution and corpus-based mutation strategies.[60]
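A minimal sketch of AFL++'s persistent-mode pattern is shown below, following the macros documented for its LLVM instrumentation; `process_input` is an invented stand-in for the code under test, exact macro usage can vary between AFL++ versions, and the harness only builds as written when compiled with AFL++'s compiler wrappers (e.g., afl-clang-fast).

```c
#include <stddef.h>
#include <stdint.h>

/* Invented stand-in for the code under test. */
static void process_input(const uint8_t *buf, size_t len) {
    (void)buf; (void)len;
}

__AFL_FUZZ_INIT();   /* sets up AFL++'s shared-memory test case buffer */

int main(void) {
#ifdef __AFL_HAVE_MANUAL_CONTROL
    __AFL_INIT();                              /* deferred fork-server start */
#endif
    unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;  /* must come after __AFL_INIT */
    while (__AFL_LOOP(10000)) {                /* persistent mode: many cases per process */
        int len = __AFL_FUZZ_TESTCASE_LEN;
        process_input(buf, (size_t)len);       /* state should be reset here if the target keeps any */
    }
    return 0;
}
```

Because the process is reused across thousands of test cases instead of being forked per input, this pattern avoids the per-execution startup cost that the fork-server model alone still incurs.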
Other notable frameworks include Honggfuzz, which provides hardware-accelerated coverage feedback using Intel PT or AMD IBS for precise edge detection, alongside software-based options, and supports multi-threaded fuzzing to utilize all CPU cores efficiently. Syzkaller is a specialized, unsupervised coverage-guided fuzzer designed for operating system kernels, generating syscall programs based on declarative descriptions and integrating with kernel coverage tools like KCOV to explore deep code paths. Peach Fuzzer, in its original open-source community edition (no longer actively maintained since 2019), focuses on protocol-oriented fuzzing through generation-based and mutation-based approaches, requiring users to define data models via Peach Pit XML files for structured input creation and stateful testing of network protocols; its technology forms the basis for the actively developed GitLab Protocol Fuzzer Community Edition.[61][22][62][63]
| Framework | Type | Primary Languages/Targets | License |
|---|---|---|---|
| AFL++ | Coverage-guided mutation | C/C++, binaries (user-space) | Apache 2.0 |
| LibFuzzer | Coverage-guided evolutionary (in-process) | C/C++, libraries/APIs | Apache 2.0 |
| Honggfuzz | Coverage-guided (HW/SW feedback) | C/C++, binaries | Apache 2.0 |
| Syzkaller | Coverage-guided (kernel-specific) | Kernel syscalls (Linux, others) | Apache 2.0 |
| Peach Fuzzer | Generation/mutation (protocol-oriented) | Protocols, networks (multi-language) | MIT |

