ReDoS
A regular expression denial of service (ReDoS)[1] is an algorithmic complexity attack that produces a denial-of-service by providing a regular expression and/or an input that takes a long time to evaluate. The attack exploits the fact that many[2] regular expression implementations have super-linear worst-case complexity; on certain regex-input pairs, the time taken can grow polynomially or exponentially in relation to the input size. An attacker can thus cause a program to spend substantial time by providing a specially crafted regular expression and/or input. The program will then slow down or become unresponsive.[3][4]
Description
Regular expression ("regex") matching can be done by building a finite-state automaton. A regex can be converted straightforwardly to a nondeterministic finite automaton (NFA), in which for each state and input symbol there may be several possible next states. After building the automaton, several possibilities exist:
- the engine may convert it to a deterministic finite-state automaton (DFA) and run the input through the result;
- the engine may try one by one all the possible paths until a match is found or all the paths have been tried and failed ("backtracking");[5][6]
- the engine may consider all possible paths through the nondeterministic automaton in parallel;
- the engine may convert the nondeterministic automaton to a DFA lazily (i.e., on the fly, during the match).
Of the above algorithms, the first two are problematic. The first is problematic because a deterministic automaton could have up to 2^m states, where m is the number of states in the nondeterministic automaton; thus, the conversion from NFA to DFA may take exponential time. The second is problematic because a nondeterministic automaton could have an exponential number of paths of length n, so that walking through an input of length n will also take exponential time.[7] The last two algorithms, however, do not exhibit pathological behavior.
Note that for non-pathological regular expressions, the problematic algorithms are usually fast, and in practice one can expect them to "compile" a regex in O(m) time and match it in O(n) time; instead, simulation of an NFA and lazy computation of the DFA have O(mn) worst-case complexity.[a] Regex denial of service occurs when these expectations are applied to a regex provided by the user, and malicious regular expressions provided by the user trigger the worst-case complexity of the regex matcher.
While regex algorithms can be written in an efficient way, most regex engines in existence extend the regex languages with additional constructs that cannot always be solved efficiently. Such extended patterns essentially force the implementation of regex in most programming languages to use backtracking.
Examples
Exponential backtracking
The most severe type of problem happens with backtracking regular expression matches, where some patterns have a runtime that is exponential in the length of the input string.[8] For strings of n characters, the runtime is O(2^n). This happens when a regular expression has three properties:
- the regular expression applies repetition (+, *) to a subexpression;
- the subexpression can match the same input in multiple ways, or the subexpression can match an input string which is a prefix of a longer possible match;
- and after the repeated subexpression, there is an expression that matches something which the subexpression does not match.
The second condition is best explained with two examples:
- in (a|a)+$, repetition is applied to the subexpression a|a, which can match a in two ways on each side of the alternation.
- in (a+)*$, repetition is applied to the subexpression a+, which can match a or aa, etc.
In both of these examples we used $ to match the end of the string, satisfying the third condition, but it is also possible to use another character for this. For example (a|aa)*c has the same problematic structure.
All three of the above regular expressions will exhibit exponential runtime when applied to strings consisting of a long run of a characters followed by a character that forces the overall match to fail. For example, matching them against aaaaaaaaaaaaaaaaaaaaaaaax on a backtracking expression engine takes a significantly long time to complete, and the runtime approximately doubles for each extra a before the x.
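The doubling behaviour can be observed directly with a short Python sketch (Python's re module uses a backtracking engine; input sizes are kept deliberately small so the demo finishes quickly):

```python
import re
import time

# (a|a)+$ can never match a string ending in "x", but a backtracking
# engine tries every way of splitting the run of a's between the two
# identical alternatives before giving up.
for n in (18, 19, 20):
    text = "a" * n + "x"
    start = time.perf_counter()
    assert re.match(r"(a|a)+$", text) is None
    elapsed = time.perf_counter() - start
    print(f"n={n}: {elapsed:.3f}s")  # runtime roughly doubles per extra 'a'
```
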
It is also possible to have backtracking which takes polynomial time O(n^2), instead of exponential time. This can also cause problems for long enough inputs, though less attention has been paid to this case because malicious input must be much longer to have a significant effect. An example of such a pattern is a*b?a*c, when the input is an arbitrarily long sequence of "a"s.
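A rough sketch of the polynomial case, assuming Python's backtracking re engine (the input is long, but the quadratic cost stays manageable):

```python
import re

# a*b?a*c forces the two a* groups to try every way of splitting a
# long run of a's before the required 'c' fails: roughly O(n^2) work.
text = "a" * 1000  # no 'c' anywhere, so every split is attempted
assert re.match(r"a*b?a*c", text) is None
# A matching input is still fast:
assert re.match(r"a*b?a*c", "aaabaac") is not None
```
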
Vulnerable regexes in online repositories
So-called "evil" or vulnerable regexes have been found in online regular expression repositories. Note that it is enough to find a vulnerable subexpression in order to attack the full regex:
- RegExLib, id=1757 (email validation) – contains a vulnerable repeated subexpression
^([a-zA-Z0-9])(([\-.]|[_]+)?([a-zA-Z0-9]+))*(@){1}[a-z0-9]+[.]{1}(([a-z]{2,3})|([a-z]{2,3}[.]{1}[a-z]{2,3}))$ - OWASP Validation Regex Repository, Java Classname – contains a vulnerable repeated subexpression
^(([a-z])+.)+[A-Z]([a-z])+$
These two examples are also susceptible to the input aaaaaaaaaaaaaaaaaaaaaaaa!.
Attacks
If the regex itself is affected by user input, such as a web service permitting clients to provide a search pattern, then an attacker can inject a malicious regex to consume the server's resources. Therefore, in most cases, regular expression denial of service can be avoided by removing the possibility for the user to execute arbitrary patterns on the server. In this case, web applications and databases are the main vulnerable applications. Alternatively, a malicious page could hang the user's web browser or cause it to use arbitrary amounts of memory.
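One way to remove the user's ability to run arbitrary patterns, sketched in Python with a hypothetical helper name: escape the user's input so it is only ever matched as literal text.

```python
import re

def search_literal(user_query, haystack):
    # re.escape neutralises regex metacharacters, so even an "evil"
    # pattern supplied by the user is treated as plain text.
    return re.search(re.escape(user_query), haystack) is not None

print(search_literal("(a+)+$", "log line containing (a+)+$ verbatim"))  # True
```
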
However, if a vulnerable regex exists on the server-side already, then an attacker may instead be able to provide an input that triggers its worst-case behavior. In this case, e-mail scanners and intrusion detection systems could also be vulnerable.
In the case of a web application, the programmer may use the same regular expression to validate input on both the client and the server side of the system. An attacker could inspect the client code, looking for evil regular expressions, and send crafted input directly to the web server in order to hang it.[9]
Mitigation
ReDoS can be mitigated without changes to the regular expression engine, simply by setting a time limit for the execution of regular expressions when untrusted input is involved.[10]
ReDoS can be avoided entirely by using a non-vulnerable regular expression implementation. After CloudFlare's web application firewall (WAF) was brought down by a PCRE ReDoS in 2019, the company rewrote its WAF to use the non-backtracking Rust regex library, using an algorithm similar to RE2.[11][12]
Vulnerable regular expressions can be detected programmatically by a linter.[13] Methods range from pure static analysis[14][15] to fuzzing.[16] In most cases, the problematic regular expressions can be rewritten as "non-evil" patterns. For example, (.*a)+ can be rewritten to ([^a]*a)+. Possessive matching and atomic grouping, which disable backtracking for parts of the expression,[17] can also be used to "pacify" vulnerable parts.[18][19]
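The rewrite mentioned above preserves the matched language; a quick Python spot-check on sample strings (an illustrative sketch, not a proof of equivalence):

```python
import re

evil = r"(.*a)+"       # ambiguous: .* can also consume a's
safe = r"([^a]*a)+"    # unambiguous: each repetition ends at the next 'a'

# Both accept exactly the non-empty strings that end in 'a', so their
# results agree on these samples:
for s in ["a", "ba", "aba", "xaya", "bb", ""]:
    assert bool(re.fullmatch(evil, s)) == bool(re.fullmatch(safe, s))
```
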
Linear-time (finite automata) regex
While some regex libraries, such as the C++ Standard Library <regex>, the C POSIX library <regex.h>,[20] or Boost's boost.regex, have no built-in defence against ReDoS attacks (they use backtracking, leading to exponential time), other regex libraries are engineered to prevent regex denial of service. This is done using deterministic finite automata, which run in linear time relative to the input size.
Using the RE2 library by Google for C++:[21]
#include <re2/re2.h>

#include <iostream>
#include <string>

int main() {
    std::string text = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!";
    std::string pattern = "(a+)+$";
    bool match = RE2::FullMatch(text, pattern);
    std::cout << "Match result: " << std::boolalpha << match << '\n';
    return 0;
}
Using the regex crate for Rust:[22]
use regex::Regex;
fn main() {
    // Regex::new() returns Result<Regex, Error> and must be unwrapped
    let re: Regex = Regex::new(r"^(a+)+$").unwrap();
    let matches: bool = re.is_match("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!");
    println!("Match result: {}", matches);
}
Regex match timeout
Timeouts can be implemented to cancel regex tasks if they take too long.
package org.wikipedia.examples;

import java.util.concurrent.*;
import java.util.regex.*;

public class Example {
    public static boolean matchesWithTimeout(String regex, String input, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Boolean> future = executor.submit(() -> {
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(input);
            return matcher.matches();
        });
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.err.println("Regex evaluation timed out!");
            return false;
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
            return false;
        } finally {
            future.cancel(true); // Stop the thread
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String regex = "(a+)+$";
        String input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!";
        boolean result = matchesWithTimeout(regex, input, 100); // 100 ms timeout
        System.out.printf("Match result: %s%n", result);
    }
}
Timeouts are built into the .NET standard library, as the class System.Text.RegularExpressions.Regex supports a MatchTimeout property.[23] The following is an example in C#:
namespace Wikipedia.Examples;

using System;
using System.Text.RegularExpressions;

public class Example
{
    static void Main(string[] args)
    {
        string pattern = @"(a+)+$";
        string input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaX";
        try
        {
            Regex regex = new(pattern, RegexOptions.None, TimeSpan.FromMilliseconds(100));
            bool match = regex.IsMatch(input);
            Console.WriteLine($"Match result: {match}");
        }
        catch (RegexMatchTimeoutException ex)
        {
            Console.WriteLine($"Regex operation timed out! {ex.Message}");
        }
    }
}
See also
- Denial-of-service attack
- Billion laughs attack, a similar attack on XML parsers
- Black fax
- Busy beaver, a program that produces the maximum possible output before terminating
- Email bomb
- Fork bomb
- Logic bomb
- Online algorithm, limit discovered rather than declared
- Zip bomb
- Time bomb (software)
References
- ^ Lazy computation of the DFA can usually reach the speed of deterministic automatons while keeping worst case behavior similar to simulation of an NFA. However, it is considerably more complex to implement and can use more memory.
- ^ OWASP (2010-02-10). "Regex Denial of Service". Retrieved 2010-04-16.
- ^ Davis, James; Louis, Michael; Coghlan, Christy; Servant, Francisco; Lee, Dongyoon (2019). "Why Aren't Regular Expressions a Lingua Franca? An Empirical Study on the Re-use and Portability of Regular Expressions" (PDF). The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering: 443–454.
- ^ RiverStar Software (2010-01-18). "Security Bulletin: Caution Using Regular Expressions". Archived from the original on 2011-07-15. Retrieved 2010-04-16.
- ^ Ristic, Ivan (2010-03-15). ModSecurity Handbook. London, UK: Feisty Duck Ltd. p. 173. ISBN 978-1-907117-02-2. Archived from the original on 2016-08-08. Retrieved 2010-04-16.
- ^ Crosby and Wallach, Usenix Security (2003). "Regular Expression Denial Of Service". Archived from the original on 2005-03-01. Retrieved 2010-01-13.
- ^ Bryan Sullivan (2010-05-03). "Regular Expression Denial of Service Attacks and Defenses". Retrieved 2010-05-06.
- ^ Kirrage, J.; Rathnayake, A.; Thielecke, H. (2013). "Static Analysis for Regular Expression Denial-of-Service Attacks". Network and System Security. Madrid, Spain: Springer. pp. 135–148. arXiv:1301.0849. doi:10.1007/978-3-642-38631-2_11.
- ^ Jim Manico and Adar Weidman (2009-12-07). "OWASP Podcast 56 (ReDoS)". Retrieved 2010-04-02.
- ^ Barlas, Efe; Du, Xin; Davis, James (2022). "Exploiting Input Sanitization for Regex Denial of Service" (PDF). ACM/IEEE International Conference on Software Engineering: 1–14. arXiv:2303.01996.
- ^ "Backtracking in .NET regular expressions - .NET". learn.microsoft.com. 11 August 2023.
When using System.Text.RegularExpressions to process untrusted input, pass a timeout. A malicious user can provide input to RegularExpressions, causing a Denial-of-Service attack. ASP.NET Core framework APIs that use RegularExpressions pass a timeout.
- ^ "Making the WAF 40% faster". The Cloudflare Blog. 1 July 2020.
- ^ Cox, Russ (2007). "Regular Expression Matching Can Be Simple And Fast". Retrieved 2011-04-20. – describes the RE2 algorithm
- ^ See e.g. Schmidt, Michael (30 March 2023). "RunDevelopment/scslre". GitHub., TSUYUSATO, Kitsune. "recheck Introduction"., and Davis, James. "vuln-regex-detector/src/detect/README.md". GitHub.
- ^ H. Thielecke, A. Rathnayake (2013). "Regular expression denial of service (ReDoS) static analysis Archived 2014-08-03 at the Wayback Machine". Retrieved 2013-05-30.
- ^ B. van der Merwe, N Weideman (2017). "Regex Static Analysis". Retrieved 2017-08-12.
- ^ "Fuzzing with Static Analysis | recheck". makenowjust-labs.github.io.
- ^ "Essential classes: Regular Expressions: Quantifiers: Differences Among Greedy, Reluctant, and Possessive Quantifiers". The Java Tutorials. Oracle. Archived from the original on 7 October 2020. Retrieved 23 December 2016.
- ^ "compose-regexp.js, "Atomic matching"". GitHub. 2 January 2024. "tc39/proposal-regexp-atomic-operators". Ecma TC39. 31 December 2023.
- ^ "Preventing Regular Expression Denial of Service (ReDoS)". www.regular-expressions.info.
- ^ "regex(3) - Linux manual page". man7.org. 28 June 2025.
- ^ "Github - google/re2". github.com. 1 July 2025.
- ^ "regex - Rust". docs.rs. Retrieved 15 September 2025.
- ^ "Regex.MatchTimeout Property (System.Text.RegularExpressions) Microsoft Learn". learn.microsoft.com. Retrieved 15 September 2025.
External links
- Examples of ReDoS in open source applications:
- ReDoS in DataVault (CVE-2009-3277)
- ReDoS in EntLib (CVE-2009-3275)
- ReDoS in NASD CORE.NET Terelik (CVE-2009-3276)
- Some benchmarks for ReDoS
- Achim Hoffman (2010). "ReDoS - benchmark for regular expression DoS in JavaScript". Retrieved 2010-04-19.
- Richard M. Smith (2010). "Regular expression denial of service (ReDoS) attack test results Archived 2011-07-22 at the Wayback Machine". Retrieved 2010-04-19.
ReDoS
Patterns with nested or ambiguous repetition, such as (a+)+$, lead to repeated failures and retries for inputs like repeated 'a's followed by a non-matching character (e.g., "aaaaaaaaaaaaaaaa!").[1][3] For instance, a seemingly simple pattern like ^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@(([0-9a-zA-Z])+([-\w]*[0-9a-zA-Z])*\.)+[a-zA-Z]{2,9})$ for email validation can take seconds or hours to process crafted strings due to the engine exploring thousands of paths.[3]
As of 2018, empirical studies revealed ReDoS as a widespread issue, with super-linear regex patterns appearing in thousands of software modules across ecosystems like JavaScript (e.g., npm) and Python (e.g., PyPI), affecting over 10,000 packages and potentially enabling attacks on diverse applications from web servers to input validators.[4] More recent analyses as of 2024 show continued widespread impact, with a 143% increase in reported ReDoS vulnerabilities in the npm ecosystem and approximately 400 related CVEs since 2014.[5][6] The impact includes resource exhaustion at multiple layers—client-side (e.g., browser hangs), server-side (e.g., CPU spikes in Node.js), and even in security tools like web application firewalls—often without detection until exploited.[1][4] Mitigation strategies emphasize avoiding vulnerable patterns, such as nested repetitions or ambiguous quantifiers, and adopting safer engines like deterministic finite automata (DFA) where possible, alongside techniques like input length limits, timeouts, and automated fuzzing tools to test for backtracking risks.[3][1] Despite advances in detection tools, developers often revise regexes manually rather than overhaul parsing logic, underscoring ongoing challenges in preventing ReDoS at scale.[4]
Fundamentals
Definition
Regular expressions, often abbreviated as regex, are formal patterns used in programming languages and tools to match, search, and manipulate strings of text through operations like concatenation, alternation, and repetition.[7] These patterns enable efficient text processing in applications such as input validation, data extraction, and search functionalities across various software systems.
Regular expression denial of service (ReDoS) is an algorithmic complexity attack that exploits vulnerabilities in the processing of regular expressions by causing excessive computational resource consumption, effectively denying service to legitimate users.[8] In ReDoS, attackers craft malicious inputs that trigger inefficient evaluation in regex engines, particularly those relying on backtracking, resulting in catastrophic slowdowns where processing time escalates exponentially with input size. This leads to worst-case time complexities as high as O(2^n), where n is the input length, in contrast to the typical linear or polynomial time for well-behaved patterns.
Unlike traditional denial-of-service attacks that overwhelm systems through high-volume traffic or resource flooding, ReDoS targets the inherent inefficiencies of regex implementations in specific application components, such as web servers or validators, allowing a single, small input to monopolize CPU cycles and halt operations.[1] This makes ReDoS particularly insidious in software relying on user-supplied inputs for pattern matching, as it exploits the gap between average-case efficiency and pathological worst-case behavior in backtracking-based engines.[8]
Backtracking Mechanism
Backtracking in regular expression engines simulates the execution of a non-deterministic finite automaton (NFA) through an input-directed depth-first search, allowing the engine to handle ambiguity in the pattern by exploring multiple matching paths. The process begins at the NFA's initial state, where the engine attempts to consume the input string symbol by symbol. For each state, if a direct match (e.g., a literal character) succeeds, it advances to the next state; however, when non-determinism arises—such as with alternatives (denoted by |) or quantifiers—the engine prioritizes one path (typically the left branch or maximum repetition) and proceeds recursively. If the chosen path reaches a dead end (i.e., no further match is possible for the remaining input), the engine backtracks to the most recent choice point, restores the previous state, and tries the next alternative, continuing until an accepting path is found or all possibilities are exhausted.[9][10] This mechanism relies on maintaining a stack of states to enable backtracking, effectively turning the NFA simulation into a recursive traversal that retries failed branches. For instance, in handling unions, the engine first explores the initial alternative fully; upon failure, it unwinds the stack and shifts to the subsequent one. Similarly, for closures like Kleene stars (*), the engine recursively applies the body of the repetition, backtracking through iteration counts when the overall pattern cannot proceed. To avoid infinite loops from epsilon transitions (empty moves), many engines track visited states within a cycle before consuming input, ensuring progress.[9][11] Greedy and lazy quantifiers play a key role in shaping the backtracking paths by dictating the order of repetition attempts. 
A greedy quantifier, such as *, instructs the engine to match the maximum possible repetitions of its preceding element before advancing, committing to longer matches first and only backtracking by reducing the count one by one if later parts of the pattern fail. In contrast, a lazy quantifier, like *?, starts with the minimum (often zero) repetitions and incrementally increases them only when required for the overall match to succeed, reversing the backtracking sequence to favor shorter initial commitments. This difference in prioritization can significantly alter the depth and breadth of the search tree explored, though both ultimately rely on the same retry mechanism.[9] Mathematically, backtracking manifests as a depth-first search tree over the NFA's state space, where each internal node represents a partial match state with pending choices, and edges denote transitions triggered by input consumption or pattern alternatives. The root is the starting state, with branches for each non-deterministic option (e.g., quantifier iterations from maximum downward for greedy cases), leading to leaves that are either accepting matches or failures. For patterns with overlapping ambiguities, such as nested repetitions or alternatives, the tree's node count can expand exponentially—proportional to the factorial of repetition bounds in worst cases—as the engine exhaustively enumerates path combinations before concluding. This growth underscores the non-linear complexity inherent in backtracking simulations of NFAs.[9][10] The backtracking approach traces its origins to Henry Spencer's public-domain regular expression library developed in the 1980s, which introduced this NFA-based recursive method as an alternative to earlier deterministic finite automaton (DFA) implementations. This library was integrated into Perl's regex engine around 1991, marking a pivotal adoption that emphasized flexibility for extended regex features over strict linear performance. 
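The greedy/lazy ordering described above can be seen directly in Python, whose re module backtracks as described:

```python
import re

# Greedy a* commits to the longest run first:
assert re.match(r"a*", "aaab").group() == "aaa"
# Lazy a*? starts from the empty match:
assert re.match(r"a*?", "aaab").group() == ""
# With a required 'b' after it, both succeed, but they reach the
# answer by opposite backtracking orders (shrinking vs. growing):
assert re.match(r"a*b", "aaab").group() == "aaab"
assert re.match(r"a*?b", "aaab").group() == "aaab"
```
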
The mechanism has since proliferated, forming the core of engines in languages like JavaScript, which enumerates NFA paths in priority order per ECMAScript specifications, and Python, whose re module employs similar depth-first traversal for pattern matching.[12][11]
Vulnerabilities
Common Patterns
Common patterns vulnerable to ReDoS typically involve syntactic constructs in regular expressions that introduce ambiguity, allowing backtracking regex engines to explore an exponential number of matching paths on certain inputs.[13] These patterns exploit the nondeterministic nature of the expressions, where multiple ways to consume the input string lead to redundant computations during the matching process.[14] Among the most prevalent vulnerable patterns are nested quantifiers, such as (a+)+, where inner repetitions can be grouped in various ways, creating overlapping possibilities for matching sequences of 'a' characters.[13] Another common construct is optional groups with alternations, exemplified by (a|aa)+?, which permits non-greedy matching that branches into multiple partial matches for strings with repeated substrings.[14] Overlapping subexpressions, such as quantified overlapping disjunctions like (\w|\d)+, further amplify this by allowing the engine to try numerous combinations of word characters and digits in ambiguous sequences.[14] These patterns are characterized by their star height greater than one or quantified overlaps, which inherently support multiple valid parses of the input.[15]
The ambiguity in these patterns causes path explosion because backtracking engines, which attempt all possible ways to advance through the expression, generate a combinatorial number of execution paths. For instance, in a pattern like (a|a)* matching a string of 'a's followed by a non-match, the engine may explore up to 2^n paths for an input of length n, as each position can be attributed to either branch of the alternation.[15] This can be illustrated through a simplified pseudocode representation of a backtracking matcher:
function match(pattern, input, pos, state):
if pos == len(input) and state is accepting:
return true
if no more transitions from state:
return false
for each possible transition from state:
new_pos = pos + consumed_length(transition)
if new_pos <= len(input) and match(pattern, input, new_pos, next_state(transition)):
return true
// Backtrack: try alternatives or rewind
return false
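The path explosion can be made concrete with a tiny instrumented matcher (a hypothetical minimal model, not a real engine) for a pattern shaped like (a|a)*X, counting match attempts:

```python
def attempts(s):
    """Count the attempts a naive backtracking matcher makes while
    trying to match (a|a)*X against s (which contains no 'X')."""
    count = 0
    def match(i):
        nonlocal count
        count += 1
        if i == len(s):
            return False          # ran out of input without finding 'X'
        if s[i] == "X":
            return True
        if s[i] == "a":
            # Two identical alternatives: each 'a' doubles the paths.
            return match(i + 1) or match(i + 1)
        return False
    match(0)
    return count

print(attempts("a" * 10))  # 2047 attempts
print(attempts("a" * 11))  # 4095: one more 'a' roughly doubles the work
```
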
Exploitation Examples
A classic illustration of a ReDoS vulnerability involves the regular expression ^(a+)+$, which intends to match one or more repetitions of one or more 'a' characters spanning the whole string. When tested against an input like "aaaa!", the backtracking engine explores numerous failure paths due to the nested quantifiers + inside another +. For a smaller input such as "aa!", the trace begins with the engine greedily matching "aa" as a single a+ group within the outer (a+)+, then failing at the $ anchor upon encountering "!". It backtracks by reducing the inner a+ to "a" and attempting a second group with the remaining "a", but fails again at $. Further backtracking tries splitting into single "a" groups or other combinations, resulting in 5 total attempts; scaling to "aaaa!" expands this to over 15 attempts, with the attempt count growing exponentially as more 'a's are added.[17]
In practical code, such vulnerabilities often appear in input validation functions using nested or overlapping groups. For instance, a JavaScript email validator employing the pattern /("[^"]*"|[^@])*@[^@]*/ can hang on crafted inputs with repeated quotes, as the engine backtracks across the alternatives [^"]* and [^@]* to find non-matches before the @. The following snippet exemplifies this:
const emailRegex = /("[^"]*"|[^@])*@[^@]*/;
function isValidEmail(email) {
return emailRegex.test(email);
}
// Vulnerable: isValidEmail('""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""@example.com');
// This input triggers extensive backtracking, potentially delaying execution by seconds or more.
Impacts
Performance Effects
Regular Expression Denial of Service (ReDoS) exploits the backtracking mechanism in many regex engines, leading to exponential time complexity in the worst case. In backtracking implementations, the engine explores multiple possible matches by trying different combinations of quantifiers and alternatives, which can result in an explosion of computation paths. For patterns with nested quantifiers, such as (a+)+b matched against a long string of 'a's without a trailing 'b', the number of backtracking steps grows exponentially with the input length n. This can be modeled as T(n) ≈ c · 2^b, where c is a constant representing the base cost per step, and b approximates the number of decision points (often proportional to n), yielding an overall O(2^n) complexity.[22][23]
Benchmarks illustrate this impact on modest hardware, such as a mid-2010s laptop. For instance, a 30-character invalid input to a vulnerable pattern like ^(.*?,){n}P can take 1.8 seconds to process, spiking CPU usage to 99%, while a valid input completes in 0.05 seconds. Extending the input by one character often doubles the runtime, escalating to several seconds for 31 characters and potentially hours for longer strings due to billions of steps; for a 20-character input, one engine requires over 3.6 million steps, and for 100 characters the theoretical step count is astronomically larger. Memory consumption also rises from stack frames for each backtrack level, potentially leading to stack overflows in languages like pre-5.10 Perl.[17][22][23]
In multi-threaded applications, such as those using Node.js, a single ReDoS-triggering input blocks the event loop thread, causing widespread delays and service unavailability as other requests queue up. Performance graphs typically show runtime versus input size as an exponential curve: near-linear for small inputs (e.g., milliseconds), but escalating almost vertically beyond a threshold (e.g., seconds to minutes), contrasting sharply with safe patterns that remain flat.[17][24]
In comparison, linear-time alternatives like Thompson's NFA construction avoid backtracking altogether, simulating the NFA by tracking sets of states as the input is processed once, achieving O(mn) time complexity where m is the pattern length and n the input length. This makes NFA-based engines, such as RE2, immune to ReDoS while handling the same inputs in microseconds even for large n, orders of magnitude faster than vulnerable backtracking in worst cases.[12]
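A minimal Python sketch of the set-of-states simulation (epsilon transitions are omitted for brevity; the NFA below, for the language a+, is an assumed example):

```python
def run_nfa(transitions, start, accepting, text):
    """Thompson-style simulation: track the set of reachable states,
    one pass over the input, no backtracking."""
    states = {start}
    for ch in text:
        states = {nxt for s in states for nxt in transitions.get((s, ch), ())}
        if not states:
            return False  # no state can consume this character
    return bool(states & accepting)

# NFA accepting a+ (the language of the pathological (a+)+$):
# state 0 --a--> 1, state 1 --a--> 1; accepting state: 1
NFA = {(0, "a"): {1}, (1, "a"): {1}}

print(run_nfa(NFA, 0, {1}, "a" * 100000))        # True, in linear time
print(run_nfa(NFA, 0, {1}, "a" * 100000 + "!"))  # False, still linear
```
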
Real-World Incidents
One of the earliest prominent real-world ReDoS incidents occurred in July 2016, when Stack Overflow experienced a 34-minute outage affecting its entire platform. The disruption was triggered by a user-submitted post containing a malicious string that exploited a vulnerable regular expression designed to trim whitespace, specifically ^[\s\u200c]+|[\s\u200c]+$, leading to excessive backtracking and CPU exhaustion on the servers.[25][18] This event highlighted the risks of processing untrusted user input with complex regex patterns in high-traffic web applications, resulting in temporary unavailability for millions of users worldwide.[26]
In July 2019, Cloudflare encountered a major global outage lasting approximately 27 minutes, impacting a significant portion of the internet's traffic routed through its network. The incident stemmed from a newly deployed Web Application Firewall (WAF) rule incorporating a regular expression with catastrophic backtracking behavior, such as .*.*=, which caused full CPU utilization on edge servers handling HTTP/HTTPS requests.[27][25] This affected millions of websites and services, underscoring the potential for ReDoS to cascade into widespread downtime when introduced in critical infrastructure components like content delivery networks.[28]
Following these high-profile cases, awareness of ReDoS has grown, particularly since 2020, with automated security scanners increasingly identifying vulnerabilities in open-source supply chains. Security analyses indicate ReDoS ranks as the fourth most common server-side vulnerability class in JavaScript ecosystems such as npm.[29] Hundreds of CVEs have been assigned to ReDoS across various libraries and frameworks as of 2023, reflecting broader adoption of static analysis for regex auditing in software development pipelines.
In 2024, ReDoS vulnerabilities continued to emerge in AI-related tools, exemplified by CVE-2024-12720 in the Hugging Face Transformers library, versions up to 4.46.3. This flaw in the Nougat tokenizer's post_process_single() function in tokenization_nougat_fast.py allowed crafted inputs to trigger exponential backtracking, potentially causing denial-of-service in AI inference workflows.[30] The vulnerability affected deployments relying on the library for document understanding tasks, emphasizing ReDoS risks in emerging machine learning supply chains.[31] As of 2025, ReDoS remains prevalent, with ongoing reports of vulnerabilities in diverse software ecosystems.
Mitigation
Design Strategies
To prevent Regular Expression Denial of Service (ReDoS) vulnerabilities, developers should follow principles that minimize or eliminate backtracking in regex patterns, particularly by avoiding constructs that lead to exponential time complexity. Key strategies include using atomic groups, which prevent backtracking within a subpattern once it has matched, and possessive quantifiers, which apply greedy matching without allowing subsequent backtracking to release characters.[32] These features, supported in engines like PCRE and Java's regex, ensure that quantified elements do not revisit prior matches, thereby bounding the execution time to linear in the input length for supported patterns.[29] Additionally, nested unbounded repeats, such as (a+)+ or (a*)*, should be avoided as they introduce infinite ambiguity and catastrophic backtracking; these are identified as high-risk anti-patterns like Star-3 in theoretical analyses of regex ambiguity.[33]
Refactoring techniques transform ambiguous patterns into deterministic equivalents that execute in linear time without altering the set of matched strings. For instance, the nested repeat (a+)+ can be refactored to the simpler a+, eliminating the outer quantifier's backtracking potential while preserving behavior for strings of 'a' characters.[32] Similarly, patterns with overlapping concatenations, such as a*(aa)* (the Concat-1 anti-pattern), can be rewritten to the equivalent a* to remove redundant ambiguity where multiple subpatterns generate overlapping languages.[32] These refactors rely on equivalence checks via finite automata, ensuring no loss of expressiveness while preventing super-linear runtime; empirical studies show such revisions resolve over 70% of identified vulnerabilities in practice.[34]
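The behavior-preserving nature of such rewrites can be spot-checked on sample strings. Note that a*(aa)* also accepts the empty string, so its exact linear-time equivalent is a* rather than a+:

```python
import re

# Sample strings covering the empty string, matches, and non-matches.
samples = ["", "a", "aa", "aaa", "aaaa", "b", "aab"]

for s in samples:
    # The nested repeat and its linear refactor agree on every input.
    assert bool(re.fullmatch(r"(a+)+", s)) == bool(re.fullmatch(r"a+", s))
    # The overlapping concatenation and its refactor likewise agree.
    assert bool(re.fullmatch(r"a*(aa)*", s)) == bool(re.fullmatch(r"a*", s))

print("refactors agree on all samples")  # prints if every check passed
```

Real refactoring tools establish such equivalences formally, by comparing the finite automata of the two patterns rather than testing finitely many strings.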
Selecting regex engines with guaranteed linear-time performance is a foundational design choice for ReDoS prevention. Google's RE2 library implements a finite-state machine approach that evaluates alternatives in parallel without backtracking, achieving O(m * n) time complexity where m is the regex size and n is the input length, and explicitly omits features like backreferences that enable exponential behavior.[35] Similarly, Rust's regex crate compiles patterns to a Thompson NFA with lazy DFA evaluation, providing worst-case O(m * n) guarantees for searches and supporting size limits on untrusted patterns to cap compilation overhead.[36] These engines prioritize safety for production use, contrasting with backtracking-based libraries like PCRE, and are recommended for applications handling untrusted input.[29]
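The parallel evaluation these engines perform can be illustrated with a short sketch. This is a hand-built simulation of the idea, not RE2's or the regex crate's actual API: instead of backtracking, the matcher advances the set of reachable NFA states one input character at a time, so each character is examined exactly once:

```python
# Hand-built two-state NFA for the language of a+ (equivalently (a+)+,
# which is catastrophic for backtracking engines). State 0 is the start;
# state 1 is accepting; both states move to state 1 on input 'a'.
def match_a_plus(s: str) -> bool:
    states = {0}                      # set of currently reachable states
    for ch in s:
        nxt = set()
        if ch == "a" and states:      # any live state can consume an 'a'
            nxt.add(1)
        states = nxt
        if not states:                # no live states: fail immediately
            return False
    return 1 in states                # accept if an accepting state is live

print(match_a_plus("aaaa"))               # True
print(match_a_plus("a" * 100000 + "b"))   # False, and still linear time
```

Because the state set is bounded by the NFA's size, the total work is O(m * n) regardless of how ambiguous the original pattern was, which is the guarantee RE2 and Rust's regex crate generalize to full regex syntax.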
Testing practices should be integrated into the software development lifecycle (SDLC) to verify regex security proactively. Unit tests should include large, adversarial inputs, such as strings longer than 1,000 characters crafted to trigger backtracking, to detect slowdowns that exceed linear scaling, with timeouts used to fail tests that run too long.[34] Input length limits, enforced before regex evaluation (e.g., rejecting strings longer than a fixed threshold), serve as a simple first-line defense that bounds resource usage and appears in over 20% of historical vulnerability fixes.[34] Embedding these tests in continuous integration pipelines ensures ongoing validation, with benchmarks on diverse inputs confirming linear performance.[36]
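A minimal sketch of the length-limit defense follows; the threshold constant and the wrapper name are hypothetical choices for illustration:

```python
import re

MAX_REGEX_INPUT = 1000   # hypothetical threshold; tune per application

def safe_fullmatch(pattern: str, text: str):
    """Reject oversized inputs before the regex engine ever sees them."""
    if len(text) > MAX_REGEX_INPUT:
        raise ValueError(f"input of length {len(text)} exceeds regex limit")
    return re.fullmatch(pattern, text)

# An adversarial unit-test input: near-matching and far too long. The guard
# rejects it up front, so the vulnerable pattern never runs.
try:
    safe_fullmatch(r"(a+)+b", "a" * 2000)
except ValueError as exc:
    print("rejected:", exc)
```

The guard bounds worst-case cost even for patterns that are later discovered to be vulnerable, which is why it features in so many historical fixes.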
Detection tools
Static analyzers identify potential ReDoS vulnerabilities by examining regular expression patterns without executing them, often modeling backtracking depth through analysis of nested quantifiers, ambiguity in alternatives, and polynomial-time complexity bounds.[37] Tools like Semgrep use pattern-matching rules to flag vulnerable regexes, such as those with repeated capturing groups containing optional elements, supporting multiple languages including Python and JavaScript.[38] RegexStaticAnalysis employs static techniques to detect catastrophic backtracking in expressions like (ht|f)tp(s?)://[^\s]*, evaluating structural risks in real-world codebases.[39] Similarly, RXXR2, an extension of earlier regex analyzers, uses pumping lemma-based checks to bound execution time and identify exponential blowups.[40]
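A toy static check in the spirit of these analyzers can be written in a few lines. It flags patterns in which a quantified group itself contains an unbounded quantifier, the classic nested-repeat anti-pattern; real tools such as Semgrep and RXXR2 are far more precise and handle many more vulnerable shapes:

```python
import re

# Heuristic: a parenthesized group containing * or +, itself followed by * or +.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[*+][^()]*\)\s*[*+]")

def looks_vulnerable(pattern: str) -> bool:
    """Crude syntactic flag for nested unbounded repetition."""
    return bool(NESTED_QUANTIFIER.search(pattern))

print(looks_vulnerable(r"(a+)+"))       # True: nested repeat
print(looks_vulnerable(r"(a*)*"))       # True: nested repeat
print(looks_vulnerable(r"[a-z]+\d*"))   # False: no nesting
```

A heuristic like this produces both false positives and false negatives; production analyzers instead reason about automaton ambiguity to decide whether the nesting actually causes super-linear behavior.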
Dynamic testing tools complement static methods by executing regexes against crafted inputs to measure runtime behavior and confirm vulnerabilities. Fuzzers such as regexploit generate adversarial strings heuristically to trigger excessive backtracking, providing proof-of-concept exploits for patterns like ^((mailto:)?[\w.%+-]+@[\w.-]+\.[\w]{2,20})?$.[41] ReScue applies genetic algorithms to evolve input strings that maximize processing time, effectively discovering ReDoS triggers in non-trivial cases by iteratively mutating candidates based on fitness scores derived from execution duration.[42]
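The core idea behind these dynamic tools can be sketched as a pumping probe: repeat a candidate prefix at increasing lengths, time the match attempt, and flag super-linear growth. Here the pump string "a" is hand-chosen rather than discovered, which is the hard part real fuzzers automate:

```python
import re
import time

def probe(pattern: str, pump: str = "a", suffix: str = "b",
          sizes=(10, 14, 18)):
    """Time a regex against pumped near-miss inputs of growing length."""
    timings = []
    for n in sizes:
        start = time.perf_counter()
        re.match(pattern, pump * n + suffix)   # near-miss input fails slowly
        timings.append(time.perf_counter() - start)
    return timings

t = probe(r"^(a+)+$")
print(t[-1] > 4 * t[0])   # exponential patterns blow up between the sizes
```

Growth far beyond the ratio of input lengths indicates super-linear matching; genetic approaches such as ReScue use exactly this kind of timing signal as the fitness function when evolving candidate inputs.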
Benchmarks evaluating these tools show varying accuracy. Advanced analyzers such as RENGAR achieve near-complete detection of known vulnerabilities, finding all those reported by nine prior tools and, on large datasets, 3 to 197 times as many additional ones, while maintaining low false-positive rates through principled modeling of backtracking states.[40] In comparative tests on benchmark sets of vulnerable expressions, static tools such as RegexStaticAnalysis identified up to 46% of cases, though hybrid approaches often exceed 90% recall in recent studies.[39] Many tools, including Semgrep and CodeQL, integrate with CI/CD pipelines via plugins for GitHub Actions or Jenkins, enabling automated scanning during code review to catch ReDoS risks early, including in legacy codebases.