Recent from talks
Nothing was collected or created yet.
Comparison of parser generators
View on Wikipedia
This article needs additional citations for verification. (July 2023) |
This is a list of notable lexer generators and parser generators for various language classes.
Regular languages
[edit]Regular languages are a category of languages (sometimes termed Chomsky Type 3) which can be matched by a state machine (more specifically, by a deterministic finite automaton or a nondeterministic finite automaton) constructed from a regular expression. In particular, a regular language can match constructs like "A follows B", "Either A or B", "A, followed by zero or more instances of B", but cannot match constructs which require consistency between non-adjacent elements, such as "some instances of A followed by the same number of instances of B", and also cannot express the concept of recursive "nesting" ("every A is eventually followed by a matching B"). A classic example of a problem which a regular grammar cannot handle is the question of whether a given string contains correctly nested parentheses. (This is typically handled by a Chomsky Type 2 grammar, also termed a context-free grammar.)
| Name | Lexer algorithm | Output languages | Grammar, code | Development platform | License |
|---|---|---|---|---|---|
| Alex | DFA | Haskell | Mixed | All | Free, BSD |
| AnnoFlex | DFA | Java | Mixed | Java virtual machine | Free, BSD |
| Astir | DFA table driven, with branching | C++ | Only grammar (actioned) | All | Free, MIT |
| AustenX | DFA | Java | Separate | All | Free, BSD |
| C# Flex | DFA | C# | Mixed | .NET CLR | Free, GNU GPL |
| C# Lex | DFA | C# | Mixed | .NET CLR | ? |
| CookCC | DFA | Java | Mixed | Java virtual machine | Free, Apache 2.0 |
| DFA | DFA compressed matrix | C, C++ | Separate | Windows, Visual Studio | BSD |
| Dolphin | DFA | C++ | Separate | All | Proprietary |
| Flex | DFA table driven | C, C++ | Mixed | All | Free, BSD |
| gelex | DFA | Eiffel | Mixed | Eiffel | Free, MIT |
| golex | DFA | Go | Mixed | Go | Free, BSD-style |
| gplex | DFA | C# | Mixed | .NET CLR | Free, BSD-like |
| JFlex | DFA | Java | Mixed | Java virtual machine | Free, BSD |
| JLex | DFA | Java | Mixed | Java virtual machine | Free, BSD-like |
| lex | DFA | C | Mixed | POSIX | Partial, proprietary, CDDL |
| lexertl | DFA | C++ | ? | All | Free, GNU LGPL |
| Quex | DFA direct code | C, C++ | Mixed | All | Free, GNU LGPL |
| Ragel | DFA | Go, C, C++, Java, assembly | Mixed | All | Free, GNU GPL, MIT[1][2] |
| RE/flex | DFA direct code, DFA table driven, and NFA regex libraries | C++ | Mixed | All | Free, BSD |
| re2c | DFA direct code | C, C++, Go, Rust | Mixed | All | Free, public domain |
Deterministic context-free languages
[edit]Context-free languages are a category of languages (sometimes termed Chomsky Type 2) which can be matched by a sequence of replacement rules, each of which essentially maps each non-terminal element to a sequence of terminal elements and/or other nonterminal elements. Grammars of this type can match anything that can be matched by a regular grammar, and furthermore, can handle the concept of recursive "nesting" ("every A is eventually followed by a matching B"), such as the question of whether a given string contains correctly nested parentheses. The rules of Context-free grammars are purely local, however, and therefore cannot handle questions that require non-local analysis such as "Does a declaration exist for every variable that is used in a function?". To do so technically would require a more sophisticated grammar, like a Chomsky Type 1 grammar, also termed a context-sensitive grammar. However, parser generators for context-free grammars often support the ability for user-written code to introduce limited amounts of context-sensitivity. (For example, upon encountering a variable declaration, user-written code could save the name and type of the variable into an external data structure, so that these could be checked against later variable references detected by the parser.)
The deterministic context-free languages are a proper subset of the context-free languages which can be efficiently parsed by deterministic pushdown automata.
| Name | Parsing algorithm | Input grammar notation | Output languages | Grammar, code | Lexer | Development platform | IDE | License |
|---|---|---|---|---|---|---|---|---|
| ANTLR4 | Adaptive LL(*)[3] | EBNF | C#, Java, Python, JavaScript, C++, Swift, Go, PHP | Separate | generated | Java virtual machine | Yes | Free, BSD |
| ANTLR3 | LL(*) | EBNF | ActionScript, Ada95, C, C++, C#, Java, JavaScript, Objective-C, Perl, Python, Ruby | Mixed | generated | Java virtual machine | Yes | Free, BSD |
| APG[citation needed] | Recursive descent, backtracking | ABNF | Python, JavaScript, C, Java | Separate | none | All | No | Free, BSD |
| Beaver[4][5] | LALR(1) | EBNF | Java | Mixed | external | Java virtual machine | No | Free, BSD |
| Bison | LALR(1), LR(1), IELR(1), GLR | Yacc | C, C++, D, Java | Mixed | external | All | No | Free, GNU GPL with exception |
| BtYacc | Backtracking Bottom-up | ? | C++ | Mixed | external | All | No | Free, public domain |
| byacc | LALR(1) | Yacc | C | Mixed | external | All | No | Free, public domain |
| CL-Yacc[6][7] | LALR(1) | Lisp | Common Lisp | Mixed | external | All | No | Free, MIT |
| Coco/R | LL(1) + semantic predicates | EBNF | C, C++, C#, F#, Java, Ada, Object Pascal, Delphi, Modula-2, Oberon, Ruby, Swift, Unicon, Visual Basic .NET | Mixed | generated | Java virtual machine, .NET framework, Windows, POSIX (depends on output language) | No | Free, GNU GPL |
| CppCC[8][9] | LL(k) | ? | C++ | Mixed | generated | POSIX | No | Free, GNU GPL |
| CUP[10][11] | LALR(1) | ? | Java | Mixed | external | Java virtual machine | No | Free, BSD-like |
| Eli[12][13] | LALR(1) | ? | C | Mixed | generated | POSIX | No | Free, GNU GPL, GNU LGPL |
| Essence[14] | LR(?) | ? | Scheme 48 | Mixed | external | All | No | Free, BSD |
| eyapp[15] | LALR(1) | ? | Perl | Mixed | external or generated | All | No | Free, Artistic |
| GOLD[16] | LALR(1) | BNF | x86 assembly language, ANSI C, C#, D, Java, Pascal, Object Pascal, Python, Visual Basic 6, Visual Basic .NET, Visual C++ | Separate | generated | Windows | Yes | Free, zlib modified |
| Hime Parser Generator[17] | LALR(1), GLR | BNF dialect | C#, Java, Rust | Separate | generated | .NET framework, Java virtual machine | No | Free, GNU LGPL |
| Hyacc[18] | LR(1), LALR(1), LR(0) | Yacc | C | Mixed | external | All | No | Free, GNU GPL |
| JavaCC[19][20] | LL(k) | EBNF | Java, C++, JavaScript (via GWT compiler)[21] | Mixed | generated | Java virtual machine | Yes | Free, BSD |
| JFLAP | LL(1), LALR(1) | ? | Java | ? | ? | Java virtual machine | Yes | ? |
| JetPAG | LL(k) | ? | C++ | Mixed | generated | All | No | Free, GNU GPL |
| JS/CC | LALR(1) | EBNF | JavaScript, JScript, ECMAScript | Mixed | internal | All | Yes | Free, BSD |
| KDevelop-PG-Qt | LL(1), backtracking, shunting-yard | ? | C++ | Mixed | generated or external | All, KDE | No | Free, GNU LGPL |
| Kelbt | Backtracking LALR(1) | ? | C++ | Mixed | generated | POSIX | No | Free, GNU GPL |
| kmyacc | LALR(1) | ? | C, Java, Perl, JavaScript | Mixed | external | All | No | Free, GNU GPL |
| Lapg | LALR(1) | ? | C, C++, C#, Java, JavaScript | Mixed | generated | Java virtual machine | No | Free, GNU GPL |
| Lark | LALR(1), Earley (SPPF) | EBNF | Python, JavaScript | Mixed | generated | All | Yes | Free, MIT |
| Lemon | LALR(1) | BNF dialect[22] | C | Mixed | external | All | No | Free, public domain |
| Lezer[23][24][25] | LR(1), GLR | EBNF dialect | JavaScript | Separate | generated | Node.js, JavaScript | No | Free, MIT |
| Lime | LALR(1) | ? | PHP | Mixed | external | All | No | Free, GNU GPL |
| LISA | LR(?), LL(?), LALR(?), SLR(?) | ? | Java | Mixed | generated | Java virtual machine | Yes | Free, public domain |
| LLgen | LL(1) | ? | C | Mixed | external | POSIX | No | Free, BSD |
| LLnextgen | LL(1) | ? | C | Mixed | external | All | No | Free, GNU GPL |
| LLLPG | LL(k) + syntactic and semantic predicates | ANTLR-like | C# | Mixed | generated (?) | .NET framework, Mono | Visual Studio | Free, GNU LGPL |
| LPG | Backtracking LALR(k) | ? | Java | Mixed | generated | Java virtual machine | No | Free, EPL |
| LRSTAR[26] | LALR(1), LALR(*) | YACC, ANTLR, EBNF | C++ | Separate | generated | Windows | Visual Studio | Free, BSD |
| Menhir | LR(1) | ? | OCaml | Mixed | generated | All | No | Free, QPL |
| ML-Yacc | LALR(1) | ? | ML | Mixed | external | All | No | ? |
| Monkey | LR(1) | ? | Java | Separate | generated | Java virtual machine | No | Free, GNU GPL |
| Msta | LALR(k), LR(k) | YACC, EBNF | C, C++ | Mixed | external or generated | POSIX, Cygwin | No | Free, GNU GPL |
| MTP (More Than Parsing) | LL(1) | ? | Java | Separate | generated | Java virtual machine | No | Free, GNU GPL |
| MyParser | LL(*) | Markdown | C++11 | Separate | internal | Any with standard C++11 compiler | No | Free, MIT |
| NLT | GLR | C#/BNF-like | C# | Mixed | mixed | .NET framework | No | Free, MIT |
| ocamlyacc | LALR(1) | ? | OCaml | Mixed | external | All | No | Free, QPL |
| olex | LL(1) | ? | C++ | Mixed | generated | All | No | Free, GNU GPL |
| Parsec | LL, backtracking | Haskell | Haskell | Mixed | none | All | No | Free, BSD |
| yapp[15] | LALR(1) | ? | Perl | Mixed | external | All | No | Free, GNU GPL |
| Parser Objects | LL(k) | ? | Java | Mixed | ? | Java virtual machine | No | Free, zlib |
| PCCTS | LL | ? | C, C++ | ? | ? | All | No | ? |
| PLY | LALR(1) | BNF | Python | Mixed | generated | All | No | Free, MIT |
| PlyPlus | LALR(1) | EBNF | Python | Separate | generated | All | No | Free, MIT |
| PRECC | LL(k) | ? | C | Separate | generated | DOS, POSIX | No | Free, GNU GPL |
| QLALR | LALR(1) | ? | C++ | Mixed | external | All | No | Free, GNU GPL |
| racc[27] | LALR(1) | BNF-like, yacc-like[28] | Ruby | Mixed | ? | Windows, Linux, macOS, FreeBSD, NetBSD | No | LGPL |
| REX[29] | LL(1) sLL(k) LR(k) LALR(k) GLR PEG DFA Context-dependent lexing | EBNF | C++, C#, Java, JavaScript, Go, Haxe, Python, Scala, Typescript, XQuery, XSLT | Separate | generated | All | No | Free, Apache License 2.0 |
| SableCC | LALR(1) | ? | C, C++, C#, Java, OCaml, Python | Separate | generated | Java virtual machine | No | Free, GNU LGPL |
| SLK[30] | LL(k) LR(k) LALR(k) | EBNF | C, C++, C#, Java, JavaScript | Separate | external | All | No | SLK[31] |
| SLY[32] | LALR(1) | BNF | Python | Mixed | generated | All | No | Free, BSD |
| SP (Simple Parser) | Recursive descent | Python | Python | Separate | generated | All | No | Free, GNU LGPL |
| Spirit | Recursive descent | ? | C++ | Mixed | internal | All | No | Free, Boost |
| Styx | LALR(1) | ? | C, C++ | Separate | generated | All | No | Free, GNU LGPL |
| Sweet Parser | LALR(1) | ? | C++ | Separate | generated | Windows | No | Free, zlib |
| Tap | LL(1) | ? | C++ | Mixed | generated | All | No | Free, GNU GPL |
| TextTransformer | LL(k) | ? | C++ | Mixed | generated | Windows | Yes | Proprietary |
| TinyPG | LL(1) | ? | C#, Visual Basic | ? | ? | Windows | Yes | Partial, CPOL 1.0 |
| Toy Parser Generator | Recursive descent | ? | Python | Mixed | generated | All | No | Free, GNU LGPL |
| TP Yacc | LALR(1) | ? | Turbo Pascal | Mixed | external | All | Yes | Free, GNU GPL |
| Tree-Sitter[33] | LR(1), GLR | JavaScript DSL, JSON | C, bindings (Rust, WebAssembly, JavaScript, Python, many other) | Separate | generated + external | All | Neovim, Helix, GNU Emacs, Lapce, Zed | Free, MIT |
| Tunnel Grammar Studio | Tunnel Parsing | ABNF | C++ | Separate | generated | Windows | Yes | Proprietary |
| UltraGram | LALR(1), LR(1), GLR | BNF | C++, Java, C#, Visual Basic .NET | Separate | external | Windows | Yes | Free, public domain |
| UniCC | LALR(1) | EBNF | C, C++, Python, JavaScript, JSON, XML | Mixed | generated | POSIX | No | Free, BSD |
| UrchinCC | LL(1) | ? | Java | ? | generated | Java virtual machine | No | ? |
| Yacc AT&T/Sun | LALR(1) | Yacc | C | Mixed | external | POSIX | No | Free, CPL & CDDL |
| Yacc++ | LR(1), LALR(1) | Yacc | C++, C# | Mixed | generated or external | All | No | Proprietary |
| Yapps | LL(1) | ? | Python | Mixed | generated | All | No | Free, MIT |
| yecc | LALR(1) | ? | Erlang | Separate | generated | All | No | Free, Apache 2.0 |
| Visual BNF | LR(1), LALR(1) | ? | C# | Separate | generated | .NET framework | Yes | Proprietary |
| YooParse | LR(1), LALR(1) | ? | C++ | Mixed | external | All | No | Free, MIT |
| Parse[34] | LR(1) | BNF in C++ types | ? | ? | none | C++11 standard compiler | No | Free, MIT |
| GGLL | LL(1) | Graph | Java | Mixed | generated | Windows | Yes | Free, MIT |
| Product | Parsing algorithm | Input grammar notation | Output languages | Grammar, code | Lexer | Development platform | IDE | License |
Parsing expression grammars, deterministic Boolean grammars
[edit]This table compares parser generators with parsing expression grammars, deterministic Boolean grammars.
| Name | Parsing algorithm | Output languages | Grammar, code | Development platform | License |
|---|---|---|---|---|---|
| AustenX | Packrat (modified) | Java | Separate | All | Free, BSD |
| Aurochs | Packrat | C, OCaml, Java | Mixed | All | Free, GNU GPL |
| BNFlite | Recursive descent | C++ | Mixed | All | Free, MIT |
| Canopy | Packrat | Java, JavaScript, Python, Ruby | Separate | All | Free, GNU GPL |
| CL-peg | Packrat | Common Lisp | Mixed | All | Free, MIT |
| Drat! | Packrat | D | Mixed | All | Free, GNU GPL |
| Frisby | Packrat | Haskell | Mixed | All | Free, BSD |
| grammar::peg | Packrat | Tcl | Mixed | All | Free, BSD |
| Grako | Packrat + Cut + Left Recursion | Python, C++ (beta) | Separate | All | Free, BSD |
| IronMeta | Packrat | C# | Mixed | Windows | Free, BSD |
| Laja | 2-phase scannerless top-down backtracking + runtime support | Java | Separate | All | Free, GNU GPL |
| lars::Parser | Packrat (supporting left-recursion and grammar ambiguity) | C++ | Identical | All | Free, BSD |
| LPeg | Parsing machine | Lua | Mixed | All | Free, MIT |
| lug | Parsing machine | C++17 | Mixed | All | Free, MIT |
| Mouse | Recursive descent (modified, limited memoization and left-recursion) | Java | Separate | Java virtual machine | Free, Apache 2.0 |
| Narwhal | Packrat | C | Mixed | POSIX, Windows | Free, BSD |
| Nearley | Earley | JavaScript | Mixed | All | Free, MIT |
| Nemerle.Peg | Recursive descent + Pratt | Nemerle | Separate | All | Free, BSD |
| neotoma | Packrat | Erlang | Separate | All | Free, MIT |
| nez[35] | Parsing machine | Java, C | Separate | Java virtual machine | Free, BSD |
| NPEG | Recursive descent | C# | Mixed | All | Free, MIT |
| OMeta | Packrat (modified, partial memoization) | JavaScript, Squeak, Python | Mixed | All | Free, MIT |
| PackCC | Packrat (modified, left-recursion support) | C | Mixed | All | Free, MIT |
| Packrat | Packrat | Scheme | Mixed | All | Free, MIT |
| Pappy | Packrat | Haskell | Mixed | All | Free, BSD |
| parboiled | Recursive descent | Java, Scala | Mixed | Java virtual machine | Free, Apache 2.0 |
| Lambda PEG | Recursive descent | Java | Mixed | Java virtual machine | Free, Apache 2.0 |
| parsepp | Recursive descent | C++ | Mixed | All | Free, public domain |
| Parsnip | Packrat | C++ | Mixed | Windows | Free, GNU GPL |
| Patterns | Parsing machine | Swift | Identical | All | Free, MIT |
| peg | Recursive descent | C | Mixed | All | Free, MIT |
| PEG.js | Packrat (partial memoization) | JavaScript | Mixed | All | Free, MIT |
| Peggy[36] | Packrat (partial memoization) | JavaScript | Mixed | All | Free, MIT |
| Pegasus | Recursive descent, Packrat (selectively) | C# | Mixed | Windows | Free, MIT |
| pegc | Recursive descent | C | Mixed | All | Free, public domain |
| pest | Recursive descent | Rust | Separate | All | Free, MIT, Apache 2.0 |
| PetitParser | Packrat | Smalltalk, Java, Dart | Mixed | All | Free, MIT |
| PEGTL[37] | Recursive descent | C++11, C++17 | Mixed | All | Free, Boost |
| Parser Grammar Engine (PGE) | Hybrid recursive descent / operator precedence[38] | Parrot bytecode | Mixed | Parrot virtual machine | Free, Artistic 2.0 |
| PyPy rlib | Packrat | Python | Mixed | All | Free, MIT |
| Rats! | Packrat | Java | Mixed | Java virtual machine | Free, GNU LGPL |
| Spirit2 | Recursive descent | C++ | Mixed | All | Free, Boost |
| Treetop | Recursive descent | Ruby | Mixed | All | Free, MIT |
| Yard | Recursive descent | C++ | Mixed | All | Free, MIT or public domain |
| Waxeye | Parsing machine | C, Java, JavaScript, Python, Racket, Ruby | Separate | All | Free, MIT |
| PHP PEG | PEG Parser? | PHP | Mixed | All | Free, BSD |
General context-free, conjunctive, or Boolean languages
[edit]This table compares parser generator languages with a general context-free grammar, a conjunctive grammar, or a Boolean grammar.
| Name | Parsing algorithm | Input grammar notation | Output languages | Grammar, code | Lexer | Development platform | IDE | License |
|---|---|---|---|---|---|---|---|---|
| ACCENT | Earley | Yacc variant | C | Mixed | external | All | No | Free, GNU GPL |
| APaGeD | GLR, LALR(1), LL(k) | ? | D | Mixed | generated | All | No | Free, Artistic |
| Bison | LALR(1), LR(1), IELR(1), GLR | Yacc | C, C++, D,[39] Java, XML | Mixed, except XML | external | All | No | Free, GNU GPL |
| DMS Software Reengineering Toolkit | GLR | ? | Parlanse | Mixed | generated | Windows | No | Proprietary |
| DParser | Scannerless GLR | ? | C | Mixed | scannerless | POSIX | No | Free, BSD |
| Dypgen | Runtime-extensible GLR | ? | OCaml | Mixed | generated | All | No | Free, CeCILL-B |
| E3 | Earley | ? | OCaml | Mixed | external, or scannerless | All | No | ? |
| Elkhound | GLR | ? | C++, OCaml | Mixed | external | All | No | Free, BSD |
| GDK | LALR(1), GLR | ? | C, Lex, Haskell, HTML, Java, Object Pascal, Yacc | Mixed | generated | POSIX | No | Free, MIT |
| Happy | LALR, GLR | ? | Haskell | Mixed | external | All | No | Free, BSD |
| Hime Parser Generator | GLR | ? | C#, Java, Rust | Separate | generated | .NET framework, Java virtual machine | No | Free, GNU LGPL |
| IronText Library | LALR(1), GLR | C# | C# | Mixed | generated or external | .NET framework | No | Free, Apache 2.0 |
| Jison | LALR(1), LR(0), SLR(1) | Yacc | JavaScript, C#, PHP | Mixed | generated | All | No | Free, MIT |
| Syntax | LALR(1), LR(0), SLR(1) CLR(1) LL(1) | JSON/Yacc | JavaScript, Python, PHP, Ruby, C++, C#, Rust, Java | Mixed | generated | All | No | Free, MIT |
| Laja | Scannerless, two phase | Laja | Java | Separate | scannerless | All | No | Free, GNU GPL |
| ModelCC | Earley | Annotated class model | Java | Generated | generated | All | No | Free, BSD |
| P3 | Earley–combinators | BNF-like | OCaml | Mixed | external, or scannerless | All | No | ? |
| P4 | Earley–combinators, infinitary CFGs | BNF-like | OCaml | Mixed | external, or scannerless | All | No | ? |
| Scannerless Boolean Parser | Scannerless GLR (Boolean grammars) | ? | Haskell, Java | Separate | scannerless | Java virtual machine | No | Free, BSD |
| SDF/SGLR | Scannerless GLR | SDF | C, Java | Separate | scannerless | All | Yes | Free, BSD |
| SmaCC | GLR(1), LALR(1), LR(1) | ? | Smalltalk | Mixed | internal | All | Yes | Free, MIT |
| SPARK | Earley | ? | Python | Mixed | external | All | No | Free, MIT |
| Tom | GLR | ? | C | Generated | none | All | No | Free, "No licensing or copyright restrictions" |
| UltraGram | LALR, LR, GLR | ? | C++, C#, Java, Visual Basic .NET | Separate | generated | Windows | Yes | Proprietary |
| Wormhole | Pruning, LR, GLR, Scannerless GLR | ? | C, Python | Mixed | scannerless | Windows | No | Free, MIT |
| Whale Calf | General tabular, SLL(k), Linear normal form (conjunctive grammars), LR, Binary normal form (Boolean grammars) | ? | C++ | Separate | external | All | No | Proprietary |
| yaep | Earley | Yacc-like | C | Mixed | external | All | No | Free, GNU LGPL |
Context-sensitive grammars
[edit]This table compares parser generators with context-sensitive grammars.
| Name | Parsing algorithm | Input grammar notation | Boolean grammar abilities | Development platform | License |
|---|---|---|---|---|---|
| bnf2xml | Recursive descent (is a text filter output is xml) | simple BNF[clarification needed] grammar (input matching), output is xml | ? | Beta, and not a full EBNF parser | Free, GNU GPL |
See also
[edit]Notes
[edit]
References
[edit]- ^ "Ragel State Machine Compiler".
- ^ http://www.colm.net/open-source/ragel/ [verification needed]
- ^ "Adaptive LL(*) Parsing: The Power of Dynamic Analysis" (PDF). Terence Parr. Retrieved 2016-04-03.
- ^ Boyland, John; Spiewak, Daniel (2010-09-17). "Tool Paper: ScalaBison Recursive Ascent-Descent Parser Generator". Electronic Notes in Theoretical Computer Science. Proceedings of the Ninth Workshop on Language Descriptions Tools and Applications (LDTA 2009). 253 (7): 65–74. doi:10.1016/j.entcs.2010.08.032. ISSN 1571-0661.
- ^ "Beaver - a LALR Parser Generator". beaver.sourceforge.net. Retrieved 2023-09-16.
- ^ Newton, Jim E.; Demaille, Akim; Verna, Didier (2016-05-09). "Type-Checking of Heterogeneous Sequences in Common Lisp" (PDF). Proceedings of the 9th European Lisp Symposium on European Lisp Symposium. ELS2016. Kraków, Poland: European Lisp Scientific Activities Association: 13–20. ISBN 978-2-9557474-0-7.
- ^ "CL-Yacc — a LALR(1) parser generator for Common Lisp". www.irif.fr. Retrieved 2023-09-16.
- ^ Hosseinpour, Sahereh; Alavi Milani, Mir Mohammad Reza; Pehlivan, Hüseyin (July 2018). "A Step-by-Step Solution Methodology for Mathematical Expressions". Symmetry. 10 (7): 285. Bibcode:2018Symm...10..285H. doi:10.3390/sym10070285. ISSN 2073-8994.
- ^ "CppCC's Home Page". cppcc.sourceforge.net. Retrieved 2023-09-16.
- ^ "Java Cup". pages.cs.wisc.edu. Retrieved 2023-09-16.
- ^ "CUP". www2.cs.tum.edu. Retrieved 2023-09-16.
- ^ Thiemann, Peter; Neubauer, Matthias (2004-12-31). "Parameterized LR Parsing". Electronic Notes in Theoretical Computer Science. Proceedings of the Fourth Workshop on Language Descriptions, Tools, and Applications (LDTA 2004). 110: 115–132. doi:10.1016/j.entcs.2004.06.007. ISSN 1571-0661.
- ^ Gray, Robert W.; Levi, Steven P.; Heuring, Vincent P.; Sloane, Anthony M.; Waite, William M. (1992). "Eli: a complete, flexible compiler construction system". Communications of the ACM. 35 (2): 121–130. doi:10.1145/129630.129637. ISSN 0001-0782. S2CID 5121773.
- ^ Owens, Scott; Flatt, M.; Shivers, O.; McMullan, Benjamin (2004-10-01). "Lexer and Parser Generators in Scheme" (PDF). Scheme 2004: Proceedings of the Fifth Workshop on Scheme and Functional Programming.
- ^ a b Areias, Hugo; Simões, Alberto; Henriques, P.; Cruz, Daniela Carneiro da (2010-09-01). "Parser generation in Perl: an overview and available tools" (PDF).
{{cite journal}}: Cite journal requires|journal=(help) - ^ Volkman, Victor (2007-07-19). "Let Your Parser Go for the GOLD". Developer.com. Retrieved 2023-11-04.
- ^ "Parsing in C#: All the Tools and Libraries You Can Use (Part 2) - DZone". dzone.com. Retrieved 2023-11-04.
- ^ Ortin, Francisco; Quiroga, Jose; Rodriguez-Prieto, Oscar; Garcia, Miguel (2022-03-03). "An empirical evaluation of Lex/Yacc and ANTLR parser generation tools". PLOS ONE. 17 (3) e0264326. Bibcode:2022PLoSO..1764326O. doi:10.1371/journal.pone.0264326. ISSN 1932-6203. PMC 8893623. PMID 35239695.
- ^ Enseling, Oliver (2000-12-29). "Build your own languages with JavaCC". InfoWorld. Retrieved 2023-11-04.
- ^ "JavaCC". JavaCC. Retrieved 2023-11-04.
- ^ "Building parsers for the web with JavaCC & GWT (Part one)". Chris Ainsley. 14 April 2014. Retrieved 2014-05-04.
- ^ "The Lemon Parser Generator". sqlite.org. Retrieved 2023-11-30.
- ^ "The Lezer Parser System".
- ^ "Building a ShopifyQL Code Editor". Shopify. Retrieved 2023-12-06.
- ^ "Sponsoring the Lezer parser system | Tines". www.tines.com. 2022-03-11. Retrieved 2023-12-06.
- ^ "An LR(*) parser generator for C++".
- ^ "Racc". i.loveruby.net. Retrieved 2021-11-26.
- ^ "Racc Grammar File Reference". i.loveruby.net. Retrieved 2021-11-26.
- ^ "The REX Parser Generator supports C, C++, Java, JavaScript, C#, Go, Haxe, Python, Scala, Typescript, XQuery, and XSLT".
- ^ "The SLK Parser Generator supports C, C++, Java, JavaScript, and C#, optional backtracking, free".
- ^ http://www.slkpg.com/license.txt [bare URL plain text file]
- ^ "SLY (Sly Lex Yacc)".
- ^ "Tree-Sitter - An incremental parsing system for programming tools".
- ^ "Parse - Compile time (LR) type safe parser generator for C++". GitHub. 30 December 2021.
- ^ Kuramitsu, Kimio (2015-11-26), Nez: practical open grammar language, arXiv:1511.08307
- ^ Maintained fork of PEG.js
- ^ taocpp/PEGTL, The Art of C++, 2024-03-14, retrieved 2024-03-16
- ^ "Parrot: Grammar Engine". The Parrot Foundation. 2011.
PGE rules provide the full power of recursive descent parsing and operator precedence parsing.
- ^ "Decl Summary (Bison 3.8.1)". www.gnu.org.
External links
[edit]Comparison of parser generators
View on GrokipediaFundamentals
Parser Generators Defined
Parser generators are software tools designed to automatically create parsers from formal specifications of a language's syntax, enabling the analysis of input strings in systems like compilers, interpreters, and data processing applications. These tools take a grammar as input and output executable code that recognizes valid structures in the input data, producing parse trees or abstract syntax trees as output. By automating the construction of parsing logic, parser generators facilitate the development of language processors for diverse domains, from programming languages to configuration files.[7] The operational workflow of a parser generator begins with a user-supplied grammar description, which defines the rules for valid input structures. The generator processes this description to produce parser code, typically in a general-purpose programming language such as C or Java, implementing algorithms like shift-reduce or recursive descent parsing. This generated parser then tokenizes and analyzes input streams against the grammar, reporting syntax errors or building structured representations of the input.[7] Historically, parser generators trace their origins to the mid-1970s, with Yacc (Yet Another Compiler-Compiler), developed by Stephen C. Johnson at Bell Laboratories in 1975, serving as a seminal example for Unix-based systems. Yacc introduced a practical approach to generating LALR parsers from grammar specifications, influencing subsequent tools and establishing automation as a standard in compiler construction.[7] In contrast to hand-written parsers, where developers manually code recognition logic, parser generators automate the process, significantly reducing errors and accelerating development for intricate grammars that would otherwise demand extensive algorithmic expertise. This automation is particularly beneficial for maintaining parsers as grammars evolve, ensuring consistency and reliability. Grammar inputs to parser generators commonly use notations such as Backus-Naur Form (BNF), a concise metalanguage for defining syntax rules, or Extended BNF (EBNF), which extends BNF with repetition and optionality operators derived from regular expressions for more expressive and compact descriptions. These formats allow precise specification of language structures while remaining amenable to automated analysis and code generation.[8]Key Grammar Classes
The Chomsky hierarchy classifies formal grammars and the languages they generate into four types based on increasing expressive power and computational complexity, providing a foundational framework for understanding the theoretical limits of parser generators.[9] Type 0 grammars are unrestricted, capable of generating any recursively enumerable language using Turing machines as recognizers, with no limitations on production rules.[9] Type 1 grammars are context-sensitive, where each production rule adheres to the form α → β with |β| ≥ |α|, and they generate context-sensitive languages recognized by linear bounded automata that operate within input tape bounds.[9] Type 3 grammars, the most restrictive, generate regular languages, which lack recursion and are recognized by finite automata or expressed via regular expressions; these include simple patterns like identifiers or keywords in programming languages.[9] Type 2 grammars produce context-free languages (CFLs), which support recursion through unrestricted nonterminal substitutions and are recognized by pushdown automata; this class encompasses the syntax of most natural and programming languages, enabling nested structures such as balanced parentheses or expressions.[9] Deterministic context-free languages (DCFLs) form a proper subset of CFLs, consisting of those parsable by deterministic pushdown automata without backtracking or ambiguity resolution.[9] Parsing expression grammars (PEGs) offer an alternative to context-free grammars, defining languages through recognition rules with ordered choice operators and no inherent ambiguity, typically parsed via packrat algorithms that employ memoization for linear-time processing.[10] Beyond the Chomsky hierarchy, extensions like conjunctive and Boolean grammars augment context-free rules with explicit intersection and union operations, respectively, enhancing expressiveness to capture languages beyond CFLs while maintaining polynomial-time recognition for certain subclasses.[11] In formal terms, a grammar consists of nonterminals , terminals , productions , and start symbol , with the language defined as the set of all terminal strings derivable from .[9]Comparisons by Language Hierarchy
Regular Languages
Parser generators for regular languages, also known as lexer generators, are specialized tools that compile regular expressions into efficient code for recognizing and tokenizing input streams based on patterns definable by finite automata. These tools are primarily designed for Type 3 grammars in the Chomsky hierarchy, which encompass regular languages characterized by non-recursive, linear patterns without context dependencies. Representative examples include Lex, its open-source variant flex, and re2c, which target simple pattern matching tasks rather than full syntactic analysis.[12][13][14] A key strength of these generators lies in their ability to produce deterministic finite automata (DFAs) that perform recognition in linear time relative to the input length, leveraging the fixed number of states to process streams efficiently without backtracking. This makes them particularly suitable for lexical analysis, such as tokenizing source code by identifying keywords, identifiers, and operators in a compiler's front-end. For instance, Lex and flex generate C code implementing a DFA-based scanner function, often namedyylex(), which partitions input into tokens matching user-defined regular expressions and executes associated actions.[12][13] Similarly, re2c compiles regular expressions directly into optimized C or C++ code for DFA execution, emphasizing minimal runtime overhead and embeddability in larger programs.[14]
These tools optimize their output through algorithms like DFA state minimization, which reduces the automaton's size while preserving recognition power, ensuring compact and performant code even for complex regular expression sets.[12] In practice, they excel at generating scanners for simple patterns, such as line-oriented processing or basic text segmentation, where the absence of recursion keeps the state space finite and manageable.[14]
However, parser generators for regular languages are inherently limited to non-recursive constructs and cannot recognize nested or context-dependent structures, such as balanced parentheses, which require memory of arbitrary depth beyond a finite automaton's capability—as demonstrated by the pumping lemma for regular languages.[15] Without support for recursion or stack-based mechanisms, they fail on patterns involving matching delimiters at varying depths, confining their role to preprocessing stages before more expressive parsers handle syntactic hierarchies.[12]
Deterministic Context-Free Languages
Parser generators for deterministic context-free languages (DCFLs) produce deterministic pushdown automata that recognize these languages in linear time, enabling efficient parsing without backtracking or ambiguity resolution during runtime. DCFLs form a proper subset of context-free languages, suitable for most programming language grammars that require bounded nondeterminism. Prominent tools include Yacc and its successor Bison for LR parsing, ANTLR for LL() parsing, and JavaCC for LL parsing. Yacc, introduced in 1975, generates bottom-up LR parsers from LALR(1) grammars, while Bison, developed in 1985 as a free alternative, extends this with support for GLR parsing to handle mild nondeterminism in otherwise deterministic grammars.[7] ANTLR employs an adaptive LL() algorithm that computes lookahead sets dynamically to achieve determinism for a broad class of grammars, avoiding the need for fixed lookahead limits.[16] JavaCC generates top-down LL(k) parsers with syntactic and semantic lookahead to resolve local ambiguities, targeting Java-based applications.[17] The primary algorithms are bottom-up LR(0) or LALR(1) for shift-reduce parsing and top-down LL(1) for predictive parsing. In LR parsing, the algorithm constructs a parsing table from canonical collections of LR items, where each item represents a production with a dot indicating the parsing position, such as . Conflicts arise in the action table; a shift-reduce conflict occurs when, for a given state and lookahead, both a shift on (from item ) and a reduce using are possible. These are resolved via operator precedence declarations, comparing the precedence of the lookahead token against the precedence of the production , with higher precedence favoring shift or reduce accordingly, and ties resolved by associativity.[7] [7] LL parsing, in contrast, uses a top-down approach with a FIRST/FOLLOW set analysis to predict productions based on one token of lookahead, avoiding left recursion through grammar transformations. These methods ensure predictable O(n runtime complexity for input of length n, making them ideal for real-time applications like compilers, as they handle nested structures and recursion via a stack without exponential backtracking.[16][17] Strengths of DCFL parser generators include their efficiency in processing most practical programming languages, such as C or Java, where grammars can be written to avoid nondeterminism, yielding fast, table-driven parsers with minimal memory overhead. However, limitations stem from the requirement for unambiguous grammars; any inherent ambiguity or nondeterminism leads to conflicts that demand manual resolution through precedence tweaks, grammar refactoring, or augmented algorithms like GLR in Bison, which trades determinism for completeness at the cost of potential cubic time in worst cases.[7]Parsing Expression Grammars and Variants
Parsing Expression Grammars (PEGs) represent a class of grammars designed for unambiguous, predictive parsing, introduced by Bryan Ford in 2004 as an alternative to traditional context-free grammars (CFGs). Unlike CFGs, which allow nondeterministic choice and backtracking that can lead to ambiguity, PEGs employ ordered choice and treat parsing failure as non-consumptive, ensuring that alternatives are tried without advancing the input position on failure. This makes PEGs particularly suitable for generating parsers that produce a unique parse tree for any valid input prefix, facilitating straightforward implementation via recursive descent methods. The core mechanics of PEGs revolve around packrat parsing, a memoization technique that caches results of subexpressions to avoid redundant computations, achieving linear time complexity O(n) for parsing strings of length n. The ordered choice operator, denoted by /, is central: for an expression e ::= a / b, the parser first attempts to match a; if a succeeds, it commits to that branch, but if a fails without consuming input, it backtracks to the starting position and tries b instead. Success of e occurs only if either a or b matches a prefix of the remaining input. This predictive nature eliminates the need for lookahead sets or conflict resolution common in table-driven parsers. PEGs offer several strengths, including the absence of inherent ambiguity due to their deterministic semantics, which simplifies grammar authoring compared to CFGs where multiple derivations may exist for the same string. They directly support left recursion without special handling, as the packrat algorithm handles it through memoization, avoiding infinite loops in recursive descent. Parser generators for PEGs, such as PEG.js for JavaScript, LPeg for Lua, and parsimonious for Python, translate PEG specifications into efficient code that leverages these features. PEG.js, for instance, compiles grammars into bytecode for fast execution in browsers or Node.js environments. LPeg integrates seamlessly with Lua's pattern matching, extending it to full PEG capabilities.[18] Variants like deterministic Boolean grammars (DBGs) extend PEG-like formalisms by incorporating Boolean operations such as intersection and complement on top of context-free rules, while maintaining determinism for efficient parsing. Introduced by Alexander Okhotin in 2004, DBGs enable recognition of languages beyond standard context-free ones, such as those requiring conjunction of conditions, but parser generators remain specialized. Despite these advantages, PEGs and their variants have limitations: without memoization, parsing can degrade to exponential time in the worst case due to repeated backtracking attempts. They are less ideal for highly ambiguous languages, as the ordered choice enforces a specific resolution that may not align with all desired parses, prioritizing the first matching alternative over exploring multiple possibilities.General Context-Free Languages
Parser generators for general context-free languages (CFLs) enable the recognition and analysis of any context-free grammar, including those that are ambiguous or nondeterministic, without requiring restrictions like determinism found in lower Chomsky hierarchy classes. These tools employ algorithms capable of handling full nondeterminism through techniques such as chart parsing or graph-structured stacks, making them suitable for applications where grammar simplicity and expressiveness outweigh efficiency concerns, such as in natural language processing. Unlike deterministic parsers, which fail on ambiguous grammars, general CFL parsers explore multiple derivation paths, often using dynamic programming to avoid redundant computations. The Earley algorithm, introduced in 1970, is a foundational method for parsing arbitrary CFLs using a bottom-up chart-filling approach. It constructs sets of partial parses, denoted as a derivation of spans from position to , incrementally advancing a dot through productions while predicting, scanning, and completing items. The algorithm fills the chart from left to right, achieving worst-case time complexity of for input length , though it reduces to for unambiguous grammars and for certain unambiguous subclasses. Parser generators like Marpa implement optimized variants of Earley parsing, supporting Perl and other languages while handling ambiguity by producing parse forests.[19][20] Generalized LR (GLR) parsing, developed by Masaru Tomita in 1985, extends LR techniques to nondeterministic grammars by maintaining a graph-structured stack to track multiple parse paths in parallel, effectively nondeterminizing the shift-reduce process. This allows GLR-based tools to parse any CFL, including ambiguous ones, with worst-case complexity, but often performs better in practice for mildly ambiguous inputs. Implementations include extensions to traditional LR generators, such as those incorporating GLR for broader grammar support. Similarly, Generalized LL (GLL) parsing, proposed by Elizabeth Scott and Adrian Johnstone in 2009, provides a recursive descent-like framework for all CFLs, using a binary graph-structured stack to manage nondeterminism and supporting left recursion without infinite loops; it also runs in worst-case time and is implemented in tools like GoGLL for context-free grammar specification.[21][22] Conjunctive grammars, introduced by Alexander Okhotin in 2001, extend standard CFLs by allowing rules with conjunctions of multiple right-hand sides, enabling the intersection of context-free languages while remaining within polynomial-time parsable classes. These grammars combine several CFGs conjunctively, parsed using modified Earley or GLR algorithms that evaluate intersections during chart construction. Boolean grammars, proposed by Okhotin in 2004, further generalize this by incorporating union, intersection, and complement operations directly into productions, expressing all Boolean combinations of CFLs; they maintain parsing complexity via generalized LR methods. Experimental tools, such as those prototyped in Okhotin's research for Boolean grammar recognition, support these extensions, though adoption remains limited due to implementation complexity.[23][24][25] The primary strength of these parser generators lies in their universality: they can handle any CFG, including highly ambiguous ones common in natural language, producing all possible parses or forests for disambiguation post-parsing. This makes them invaluable for domains like computational linguistics, where grammars prioritize descriptive accuracy over determinism. However, the cubic time complexity poses limitations for large inputs, and optimization is challenging due to the need to explore exponential parse paths; practical implementations often incorporate heuristics or grammar transformations to mitigate this, but they remain harder to tune than deterministic alternatives.[19][21]Context-Sensitive and Higher Languages
Context-sensitive grammars (CSGs), classified as Type-1 in the Chomsky hierarchy, extend context-free grammars by allowing production rules that depend on the surrounding context of nonterminals. These grammars generate context-sensitive languages (CSLs), which require the parser to consider the context in which a nonterminal appears to apply rules. Formally introduced by Noam Chomsky in 1956, CSGs feature productions of the form , where and are strings of terminals and nonterminals, is a single nonterminal, and . This non-erasing condition ensures that the length of the derived string does not decrease, distinguishing CSLs from more permissive unrestricted grammars.[26] The mechanics of CSG parsing involve simulating a linear bounded automaton (LBA), which operates on input tape bounded by a linear function of the input length, enabling recognition of CSLs within polynomial space. However, the general recognition problem for CSLs is PSPACE-complete, making it computationally intensive compared to context-free languages.[26] For higher languages, such as those generated by Type-0 unrestricted grammars, parsing equates to the halting problem and is undecidable, though Turing-complete simulators can approximate behavior for specific cases without guaranteed termination.[27] Parser generators supporting CSGs are rare due to their complexity, with most tools being experimental or specialized for mildly context-sensitive variants like Head-driven Phrase Structure Grammar (HPSG). Examples include the PET parser, designed for efficient chart-based parsing of HPSG, and the Autumn library, which provides principled stateful parsing to handle context-sensitive features in programming languages.[28] Custom implementations or general LBA simulators offer further options, but practical adoption remains limited. Unrestricted Type-0 support relies on Turing-complete environments rather than dedicated generators. A key strength of CSGs lies in their expressiveness for modeling natural language phenomena, such as subject-verb agreement and cross-serial dependencies, which exceed the capabilities of context-free grammars while remaining within decidable bounds. This full Type-1 power allows precise capture of syntactic constraints in linguistics and certain domain-specific languages. However, the PSPACE-complete recognition complexity, coupled with undecidability for Type-0 extensions, results in few practical tools; modern approximations often use macros or state extensions in context-free generators to simulate limited context-sensitivity without full CSG machinery.[26]Additional Comparison Dimensions
Implementation and Output Features
Parser generators vary significantly in their output mechanisms, which determine how the resulting parsers are compiled and deployed. The majority produce source code in a target programming language, facilitating integration into larger applications. For instance, GNU Bison primarily generates C source code for parsers, with support for C++ output through specific directives like%define api.parser.class, allowing developers to create object-oriented parser classes.[29] Similarly, ANTLR generates source code in languages such as Java, C++, Python, and others, often including facilities to build and traverse abstract syntax trees (ASTs) directly from the grammar.[30] Tree-sitter, on the other hand, outputs C code for concrete syntax tree (CST) builders, emphasizing efficient tree construction over traditional source code skeletons.[6] Bytecode generation is less common among parser generators; while tools like JavaCC produce pure Java source that compiles to bytecode, direct bytecode output is rare, as most prioritize platform-specific source code for broader compatibility.[17]
Integration options further distinguish these tools, influencing their suitability for different environments. Bison's generated parsers can form standalone executables when linked with a lexical analyzer and a main function, or they can interface with host programs in C, C++, or Java via the yyparse function, supporting reentrant and push-parsing modes for event-driven applications.[29] ANTLR excels in embeddability through its runtime libraries, which allow parsers to be incorporated as modules in applications without external dependencies, and it offers IDE plugins for environments like IntelliJ, Eclipse, and Visual Studio Code to streamline grammar development and testing.[30] Tree-sitter is particularly tailored for editor integration, generating incremental parsers that update syntax trees in real-time as code is edited, with native support for tools like Vim and Neovim across multiple programming languages.[6] In contrast, some commercial tools, such as Visual Parse++, provide standalone executable outputs optimized for Windows environments, often with graphical interfaces for non-programmers.[31]
Extensibility features enable developers to customize parsing behavior beyond pure syntax recognition. Bison supports semantic actions embedded directly in grammar rules, written in C (e.g., { $$ = $1 + $3; } to compute expression values), and includes error recovery via the error token and directives like yyerrok to resume parsing after syntax errors.[29] ANTLR extends this with semantic predicates—arbitrary host-language code that guards rule application (e.g., boolean checks during parsing)—alongside actions for tree manipulation and robust error recovery strategies that insert missing tokens or synchronize on expected ones.[16] These mechanisms allow for context-sensitive behaviors, such as disambiguating ambiguous grammars at runtime. Error recovery is a common extensibility point across tools, often involving panic-mode skipping or phrase-level resynchronization to maintain parse tree integrity in faulty input.[30]
Licensing models impact adoption, particularly in proprietary software development. Open-source options dominate, with Bison distributed under the GNU General Public License (GPL) version 3, but including a special exception that permits the generated parser code to be used in non-free programs without requiring the entire application to be GPL-licensed.[29] ANTLR adopts a more permissive BSD license, allowing unrestricted use, modification, and distribution of generated code in both open and closed-source projects.[32] Tree-sitter follows an MIT license, promoting its dependency-free C runtime for broad integration in editors and tools.[33] Commercial alternatives, like Visual Parse++, operate under proprietary licenses that typically require payment for advanced features such as unlimited grammar size or priority support, appealing to enterprises needing tailored extensibility without open-source constraints.[31]
A notable example of specialized implementation is Tree-sitter, announced in 2017, which generates incremental parsers optimized for dynamic environments like code editors; it supports over 100 languages by producing CSTs that update efficiently on text changes, enabling features such as syntax highlighting and code folding in Vim without full re-parsing.[6] This focus on incrementality sets it apart from traditional generators like Bison or ANTLR, which typically require complete input reprocessing.
Performance and Usability
Parser generators differ significantly in runtime efficiency, influenced by their parsing algorithms and supported grammar classes. For LL(k) parsers generated by ANTLR, practical time complexity is typically linear O(n for unambiguous grammars, enabling efficient processing of large inputs without excessive overhead.[34] In contrast, general context-free parsers like Marpa, which implement an optimized Earley algorithm, exhibit a worst-case time complexity of O(n³), suitable for ambiguous grammars but potentially slower on massive datasets due to cubic scaling.[35] Space complexity remains linear O(n across most generators for deterministic cases, though it can grow to O(n²) or higher in ambiguous scenarios requiring dynamic charts or stacks.[35] Usability encompasses the developer experience, including grammar specification ease and debugging support. Traditional tools like Yacc and Bison impose a steep learning curve, primarily from resolving shift/reduce and reduce/reduce conflicts that arise in LALR(1) grammars, often requiring manual precedence adjustments or grammar refactoring.[29] Empirical studies highlight ANTLR's advantages, with users achieving 6%–15.5% higher task completion rates and spending 1.7–3.5 times fewer autonomous hours compared to Lex/Yacc, attributed to its top-down parsing model that minimizes conflicts.[36] ANTLR further enhances usability through integrated GUI tools, such as the ANTLR Lab for real-time grammar testing and parse tree visualization via theantlr4-parse command with -gui flag, facilitating iterative development.[30]
Benchmarks underscore these efficiency traits. In tests parsing approximately 1,000-line JSON files, LL-based parsers like those from ANTLR complete efficiently on modern hardware, while PEG implementations like PEG.js scale poorly for larger inputs exceeding 1MB due to backtracking, often taking seconds or failing on memory-intensive structures.[37] Bison-generated LR parsers excel on deterministic context-free grammars for such large files, as LR avoids exponential backtracking.[38]
Error handling varies in sophistication and automation. ANTLR provides built-in recovery via ANTLRErrorStrategy implementations, including automatic insertion of error nodes for syntax issues and customizable listeners for detailed reporting, reducing manual intervention.[39] Bison, conversely, depends on user-defined yyerror functions and the error token for basic recovery, such as skipping invalid tokens until synchronization, which demands explicit grammar rules for robust implementation.[40]
| Parser Generator | Typical Time Complexity (Practice) | Worst-Case Time Complexity | Key Usability Feature |
|---|---|---|---|
| ANTLR (LL(*)) | O(n) | O(n⁴) | GUI parse tree visualization[30] |
| Marpa (Earley) | O(n²) for DCFGs | O(n³) | Handles ambiguity without conflicts[35] |
| Bison (LALR(1)) | O(n) | O(n) | Conflict resolution via precedence[29] |
