Hubbry Logo
Proof assistantProof assistantMain
Open search
Proof assistant
Community hub
Proof assistant
logo
8 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Proof assistant
Proof assistant
from Wikipedia
An interactive proof session in RocqIDE, showing the proof script on the left and the proof state on the right

In computer science and mathematical logic, a proof assistant or interactive theorem prover is a software tool to assist with the development of formal proofs by human–machine collaboration. This involves some sort of interactive proof editor, or other interface, with which a human can guide the search for proofs, the details of which are stored in, and some steps provided by, a computer.

A recent effort within this field is making these tools use artificial intelligence to automate the formalization of ordinary mathematics.[1]

System comparison

[edit]
Name Latest version Developer(s) Implementation language Features
Higher-order logic Dependent types Small
kernel
Proof automation Proof by
reflection
Code generation
ACL2 8.3 Matt Kaufmann, J Strother Moore Common Lisp No Untyped No Yes Yes[2] Already executable
Agda 2.6.4.3[3] Ulf Norell, Nils Anders Danielsson, and Andreas Abel (Chalmers and Gothenburg)[3] Haskell[3] Yes
[citation needed]
Yes
[4]
Yes
[citation needed]
No
[citation needed]
Partial
[citation needed]
Already executable
[citation needed]
Albatross 0.4 Helmut Brandl OCaml Yes No Yes Yes Unknown Not yet implemented
F* repository Microsoft Research and INRIA F* Yes Yes No Yes Yes[5] Yes
HOL Light repository John Harrison OCaml Yes No Yes Yes No No
HOL4 Kananaskis-13 (or repo) Michael Norrish, Konrad Slind, and others Standard ML Yes No Yes Yes No Yes
Idris 2 0.6.0 Edwin Brady Idris Yes Yes Yes Unknown Partial Yes
Isabelle Isabelle2025 (March 2025) Larry Paulson (Cambridge), Tobias Nipkow (München) and Makarius Wenzel Standard ML, Scala Yes No Yes Yes Yes Yes
Lean v4.23.0[6] Leonardo de Moura (Microsoft Research) C++, Lean Yes Yes Yes Yes Yes Yes
LEGO 1.3.1 Randy Pollack (Edinburgh) Standard ML Yes Yes Yes No No No
Metamath v0.198[7] Norman Megill ANSI C
Mizar 8.1.11 Białystok University Free Pascal Partial Yes No No No No
Nqthm
NuPRL 5 Cornell University Common Lisp Yes Yes Yes Yes Unknown Yes
PVS 6.0 SRI International Common Lisp Yes Yes No Yes No Unknown
Rocq 9.0 INRIA OCaml Yes Yes Yes Yes Yes Yes
Twelf 1.7.1 Frank Pfenning, Carsten Schürmann Standard ML Yes Yes Unknown No No Unknown
  • ACL2 – a programming language, a first-order logical theory, and a theorem prover (with both interactive and automatic modes) in the Boyer–Moore tradition.
  • Rocq (former name: Coq) – Allows the expression of mathematical assertions, mechanically checks proofs of these assertions, helps to find formal proofs, and extracts a certified program from the constructive proof of its formal specification.
  • HOL theorem provers – A family of tools ultimately derived from the LCF theorem prover. In these systems the logical core is a library of their programming language. Theorems represent new elements of the language and can only be introduced via "strategies" which guarantee logical correctness. Strategy composition gives users the ability to produce significant proofs with relatively few interactions with the system. Members of the family include:
  • IMPS, An Interactive Mathematical Proof System.[8]
  • Isabelle is an interactive theorem prover, successor of HOL. The main code-base is BSD-licensed, but the Isabelle distribution bundles many add-on tools with different licenses.
  • Jape – Java based.
  • Lean
  • LEGO
  • Matita – A light system based on the Calculus of Inductive Constructions.
  • MINLOG – A proof assistant based on first-order minimal logic.
  • Mizar – A proof assistant based on first-order logic, in a natural deduction style, and Tarski–Grothendieck set theory.
  • PhoX – A proof assistant based on higher-order logic which is eXtensible.
  • Prototype Verification System (PVS) – a proof language and system based on higher-order logic.
  • Theorem Proving System (TPS) and ETPS – Interactive theorem provers also based on simply typed lambda calculus, but based on an independent formulation of the logical theory and independent implementation.

User interfaces

[edit]

A popular front-end for proof assistants is the Emacs-based Proof General, developed at the University of Edinburgh.

Rocq includes RocqIDE, which is based on OCaml/Gtk. Isabelle includes Isabelle/jEdit, which is based on jEdit and the Isabelle/Scala infrastructure for document-oriented proof processing. More recently, Visual Studio Code extensions have been developed for Rocq,[9] Isabelle by Makarius Wenzel,[10] and for Lean 4 by the leanprover developers.[11]

Formalization extent

[edit]

Freek Wiedijk has been keeping a ranking of proof assistants by the amount of formalized theorems out of a list of 100 well-known theorems. As of September 2025, only six systems have formalized proofs of more than 70% of the theorems, namely Isabelle, HOL Light, Lean, Rocq, Metamath and Mizar.[12][13]

Notable formalized proofs

[edit]

The following is a list of notable proofs that have been formalized within proof assistants.

Theorem Proof assistant Year
Four color theorem[14] Rocq 2005
Feit–Thompson theorem[15] Rocq 2012
Fundamental group of the circle[16] Rocq 2013
Erdős–Graham problem[17][18] Lean 2022
Polynomial Freiman-Ruzsa conjecture over [19] Lean 2023
BB(5) = 47,176,870[20] Rocq 2024

See also

[edit]

Notes

[edit]

References

[edit]
[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
A proof assistant is an interactive software tool that enables users to formalize mathematical statements, construct rigorous proofs, and mechanically verify their correctness using a formal logical foundation, thereby ensuring reliability in complex reasoning tasks. These systems emerged in the late 1960s with early projects like Automath, developed by Nicolaas de Bruijn and colleagues at Eindhoven University of Technology, which focused on representing mathematical texts in a computer-checkable format to address the growing need for verifiable proofs amid increasing computational involvement in mathematics. Key features include support for dependent type theories or higher-order logics, interactive proof development via tactics and automation, and the de Bruijn criterion, which requires that generated proofs be independently verifiable by a small, trusted kernel program to minimize errors. Proof assistants distinguish themselves from automated theorem provers by emphasizing human-guided of proofs, often incorporating of pre-verified and supporting program extraction from proofs via the Curry-Howard correspondence, which equates proofs with executable programs. Prominent examples include Coq, a dependently typed system based on the Calculus of Inductive Constructions, widely used for verifying software and mathematical ; Isabelle, a generic framework supporting multiple logics like (HOL); HOL Light, an embeddable prover emphasizing a minimal kernel; Lean, a dependently typed system supporting mathematical formalization with a large called mathlib; and , which employs a declarative style close to natural mathematical language. Historically, proof assistants gained prominence through landmark formalizations, such as the 2005 Coq verification of the originally proved with computer assistance in 1976, and the 2014 completion of the Flyspeck project using HOL Light and Isabelle to confirm Thomas Hales' 1998 proof of the . Today, they play a critical role in fields like (e.g., certifying operating systems and compilers), certified programming, and advancing mathematical knowledge via large-scale libraries such as the Mathematical Components project in Coq or the Isabelle Archive of Formal Proofs. Ongoing developments as of 2025 focus on improving usability, scalability for large proofs, and bidirectional integration with and , including AI assistance in tactic selection and proof generation as well as the use of proof assistants to generate millions of verified theorems and proofs as synthetic training data for large language models in automated theorem proving.

Fundamentals

Definition and Purpose

A proof assistant is a software system designed to aid in the construction, verification, and checking of formal mathematical proofs through mechanized reasoning. These tools allow users to encode mathematical statements and proofs in a formal language, where the system rigorously validates each step against a specified logical foundation, ensuring that the proof is correct relative to the axioms and inference rules employed. The core purpose of proof assistants is to mitigate inherent in manual proof-writing, facilitate interactive development of proofs where users guide the process with high-level commands, and support the machine-checked formalization of intricate theorems that might otherwise be prone to subtle mistakes. By providing a trusted environment for proof validation, these systems enable reliable of mathematical results, which is particularly valuable in domains requiring absolute certainty, such as foundational mathematics, , and . Proof assistants differ from automated theorem provers, which primarily automate the search for proofs with minimal user intervention, by prioritizing user-directed, verifiable proof steps that allow for detailed control and inspection. In contrast to systems, which focus on symbolic manipulation and computation of expressions (e.g., solving equations or simplifying polynomials), proof assistants emphasize logical deduction and proof certification rather than mere algebraic evaluation. These systems trace their origins to early efforts in automated deduction in the late 1960s and 1970s, such as investigations into mechanical proof generation, with more structured interactive frameworks emerging in the 1980s.

Core Components

Proof assistants rely on a modular architecture centered around several key components that facilitate the construction and verification of formal proofs. Many such systems, particularly those in the LCF tradition or based on type theory, feature a kernel as the trustworthy core responsible for proof checking, implementing a minimal set of primitive rules and type-checking algorithms to validate all user-provided content. This small kernel ensures reliability by reducing complex proofs to basic, manually verifiable operations, adhering to the de Bruijn criterion, which requires generating independently checkable proof objects that can be verified by a simple program regardless of the system's complexity. Other systems, such as declarative ones like Mizar, use alternative verification mechanisms without a minimal kernel. In interactive proof assistants, tactics act as procedural commands that guide the proof process by transforming proof goals into subgoals, such as applying lemmas or performing case analysis, allowing users to build proofs interactively without directly manipulating low-level proof terms. Declarative systems instead emphasize writing proofs in a style close to natural mathematical language, with the system checking justifications against a . Libraries provide pre-built collections of theorems, definitions, and axioms, enabling reuse of established mathematical results to accelerate proof development; for instance, Coq's standard libraries include modules like Arith for arithmetic operations. Proof scripts consist of user-written sequences of commands, including tactic applications and declarations, that orchestrate the overall proof construction in a structured, reproducible manner. In systems with a kernel, the proof verification process hinges on its role in validating large-scale proofs through efficient type checking, ensuring consistency even as proofs grow to megabytes in size. By the de Bruijn criterion, the kernel independently confirms that each proof step adheres to the system's logical rules, minimizing trust assumptions to just this core component and preventing inconsistencies from propagating through higher-level abstractions. This separation allows elaborate user interactions and tactics to be untrusted, as long as they produce verifiable proof objects for kernel inspection. A typical workflow in interactive proof assistants begins with the user stating a in the system's , followed by applying a series of tactics to decompose the goal into simpler subgoals, ultimately reducing it to primitive axioms or admitted facts that the kernel verifies step by step. For example, in Coq, a user might declare a lemma, use the apply tactic to invoke a relevant from a , and then employ intros or destruct to handle variables and cases, with the kernel checking each transformation for type correctness. Upon completion, the proof script is executed to generate a proof term, which the kernel type-checks to confirm validity. Declarative systems follow a different process, where users provide detailed justifications that the system verifies against predefined rules and the . Automation plays a crucial role in handling routine subtasks through built-in procedures for decidable fragments of the underlying logic, such as linear arithmetic or propositional tautologies, which can resolve subgoals without user intervention. Tactics like Coq's auto or omega leverage these decidable solvers to automatically discharge simple obligations, integrating seamlessly with interactive proof construction while still submitting results to kernel verification. Many proof assistants, including those based on dependent type theory, incorporate such mechanisms to balance human guidance with computational efficiency.

Historical Development

Origins in Automated Reasoning

The origins of proof assistants can be traced to foundational developments in mathematical logic and computability theory during the 1930s and 1950s, which established the theoretical limits of formal systems and computation. Kurt Gödel's incompleteness theorems, published in 1931, demonstrated that in any sufficiently powerful consistent formal system capable of expressing basic arithmetic, there exist true statements that cannot be proved within the system itself, highlighting the inherent incompleteness of axiomatic mathematics. This result underscored the challenges of fully automating mathematical reasoning and influenced later efforts in formal verification by emphasizing the need for systems that could handle undecidable propositions. Complementing Gödel's work, Alan Turing's 1936 paper on computable numbers introduced the concept of a universal computing machine and proved the undecidability of the halting problem, showing that no general algorithm exists to determine whether an arbitrary program will terminate. Alonzo Church's lambda calculus, developed in the early 1930s and applied in his 1936 demonstration of the unsolvability of the Entscheidungsproblem, provided a formal framework for expressing computable functions and higher-order abstractions, serving as a precursor to typed logics used in verification. These contributions collectively laid the groundwork for automated reasoning by revealing the boundaries of mechanical proof and inspiring computational models for logic. In the 1960s, the field advanced toward practical automated theorem proving, driven by efforts to mechanize first-order logic for machine implementation. A pivotal development was J.A. Robinson's introduction of the resolution principle in 1965, a complete and sound inference rule for first-order logic that enabled efficient refutation-based theorem proving by reducing clauses through unification and resolution steps. This method, designed explicitly for computer use, became a cornerstone of automated deduction systems and facilitated the integration of logic into early artificial intelligence programs. Concurrently, early AI logic programs, such as extensions of the 1956 Logic Theorist by Newell and Simon, evolved in the 1960s to explore heuristic search in proof generation, though they often struggled with scalability for non-trivial theorems. These systems marked the shift from theoretical foundations to programmatic attempts at automation, yet they primarily operated in propositional or simple first-order settings, with emerging interest in higher-order logic for more expressive reasoning. The 1970s saw a transition from fully automated provers to semi-automated, interactive approaches, exemplified by the Boyer-Moore theorem prover, which originated in projects at the University of Texas and Xerox PARC starting around 1971. This system emphasized inductive proof techniques and user-guided simplification strategies within a first-order logic framework, allowing human intervention to resolve ambiguities in complex deductions. By focusing on semi-automated deduction, it addressed the practical limitations of pure automation, where exhaustive search often failed due to combinatorial explosion in proof spaces for mathematical theorems. Researchers recognized that fully automatic systems were insufficient for intricate mathematics, as undecidability results and the complexity of real-world proofs necessitated human insight to select lemmas, guide tactics, and manage assumptions, paving the way for interactive proof assistants.

Key Milestones and Systems

The development of proof assistants gained momentum in the with the emergence of early interactive theorem provers that emphasized human-guided proof construction. Earlier, in the late 1960s, the Automath project at University developed the first interactive proof assistant for formalizing mathematical texts. In the , Robin Milner's LCF at the introduced a foundational interactive theorem prover based on , emphasizing a trusted kernel for proof validation. One pivotal system was Nqthm, developed by Robert S. Boyer and J Strother Moore starting in the late and continuing through the at the ; it introduced techniques for program verification and became influential in industrial applications, such as hardware and software correctness proofs. Concurrently, the HOL system, initiated in 1986 by at the , built on the LCF framework to support , enabling more expressive formalizations of and concepts. Entering the 1990s, proof assistants evolved toward more robust logical foundations and user-friendly interfaces. Coq, developed from 1989 at Inria by Christine Paulin-Mohring and others, was based on the Calculus of Inductive Constructions—a theory that allowed for concise definitions of inductive structures and proofs—and quickly became a standard for formalizing complex mathematics. Isabelle, originating in 1986 under Larry Paulson at the and maturing through the 1990s, adopted an LCF-style architecture with a generic framework for classical and intuitionistic logics, facilitating theorem proving across diverse domains. The and saw broader adoption in academia and industry, with systems tailored for specific verification needs. PVS, released in 1992 by under sponsorship, integrated with automated decision procedures, proving effective for software analysis. , a successor to Nqthm launched in the mid-1990s by Boyer and Moore, focused on industrial-scale hardware and , supporting executable specifications in a with induction. Agda, introduced in 2007 at Chalmers University, advanced dependently typed programming and proof, drawing from Martin-Löf to enable interactive formalization of functional programs as proofs. In the 2010s and , proof assistants emphasized mathematical formalization and automation enhancements. Lean, developed starting in 2013 by Leonardo de Moura at , targeted large-scale mathematics libraries with its dependent type theory kernel and integration of tactics for efficient proof scripting. During this period, many systems incorporated SAT solvers and other automated tools; for instance, Isabelle's (from 2008 onward) translated goals to external solvers like , significantly boosting proof automation. Key milestones underscored these advancements: the was fully formalized in Coq in 2005 by Georges Gonthier, resulting in approximately 60,000 lines of Coq code. Similarly, the —affirming the densest —was verified in 2014 using HOL Light by the Flyspeck project led by Thomas Hales, confirming Hales' approximately 250-page proof through extensive formal checks.

Logical Foundations

Higher-Order Logic Systems

(HOL) extends by permitting quantification over functions and predicates, treating them as first-class entities alongside individuals. This allows for the direct expression of complex mathematical concepts, such as properties of properties or functions that take other functions as arguments, which are inexpressible in first-order systems. The formal semantics of HOL is grounded in the , where terms are assigned types from a hierarchy starting with base types like bool for propositions and ind for individuals, and built up via function types σ → τ. Logical connectives and quantifiers are encoded as higher-typed constants, with key axioms including (stating that functions are equal if they agree on all inputs), infinity (ensuring the existence of infinite domains, such as the natural numbers), and a choice axiom via the Hilbert ε-operator, which selects an element satisfying a predicate. This framework ensures a conservative extension of classical , preserving its theorems while enabling richer expressiveness without introducing inconsistencies. A core feature of HOL is its support for shallow embeddings of , where domains like numbers, sets, and structures are defined directly using the and lambda abstractions, avoiding deep encodings into a separate meta-language. For instance, sets are represented as predicates of type α → bool, allowing primitive treatment of set-theoretic operations within the logic itself. This approach facilitates concise formalizations of classical . Representative HOL-based proof assistants include HOL Light, which employs a minimal kernel with just 10 primitive rules to enforce , making it ideal for foundational work in verification. In contrast, Isabelle/HOL provides an extensible framework for building theories, integrating automated tactics and a rich library for higher-order unification and classical proof procedures. The strengths of HOL systems lie in their alignment with classical reasoning, enabling efficient proofs in domains requiring non-constructive methods, such as hardware and . For example, HOL's primitive support for and equality axioms has been leveraged to verify algorithms at scale, demonstrating its practicality for industrial applications. These systems typically incorporate interactive theorem proving to guide users through complex derivations.

Dependent Type Theory Systems

Dependent type theory provides a foundation for proof assistants where types can depend on values, enabling the expression of propositions as types through the Curry-Howard isomorphism. This correspondence, first articulated in the context of typed lambda calculi, identifies proofs of propositions with programs of corresponding types, allowing logical statements to be computationally inhabited. In this framework, introduced by in his , propositions are treated as types, and proofs are terms that inhabit those types, fostering a constructive approach to mathematics where existence implies constructibility. Key features of dependent type theory include the identification of proofs with programs, which supports the extraction of executable code from formal proofs, and inductive types that model recursive data structures such as natural numbers or . Some systems incorporate the univalence axiom from (HoTT), which equates paths (identities) between types with equivalences, enabling higher-dimensional structures that capture homotopy-theoretic interpretations of equality. To prevent paradoxes like Girard's paradox arising from self-referential , these theories employ cumulative or non-cumulative universe levels, stratifying types into a where each level classifies types of lower levels. Equality is typically handled via identity types, which form a propositional equality that can be refined in HoTT to include higher paths, or via more intensional variants in core systems. Representative proof assistants based on dependent type theory include Coq, which implements the Calculus of Inductive Constructions (CIC), extending the with inductive definitions to support both proofs and program extraction. Agda realizes Martin-Löf's , emphasizing dependent pattern matching and termination checking for practical programming alongside theorem proving. Lean employs a dependent type theory with a focus on through tactics and a kernel based on CIC-like rules, facilitating large-scale formalizations in and verification. These systems excel in supporting paradigms and constructive mathematics, as seen in Coq's ability to extract certified algorithms, such as verified implementations of sorting or , directly from proofs. This integration bridges with , ensuring that mathematical correctness translates to runtime guarantees without introducing non-constructive axioms by default.

System Design and Features

Interactive Theorem Proving

Interactive theorem proving represents the core in most proof assistants, where users collaboratively construct formal proofs by guiding the through a series of steps, rather than relying solely on full . In this process, the user begins with a stated , which the system treats as an initial goal, and applies tactics—high-level commands that transform the current proof state into simpler subgoals or resolve them entirely. The proof state encompasses the current goals, hypotheses, and , which the system maintains and updates after each tactic application, allowing users to focus on high-level while the assistant verifies local correctness. This backward reasoning proceeds goal-oriented, starting from the theorem and working towards primitive axioms or admitted facts, enabling the handling of complex, undecidable problems that evade complete automation. Tactics serve as the primary mechanism for goal decomposition, with users choosing between ad-hoc applications for specific steps and scripted sequences for repetitive or structured proofs. Common examples include induction, which generates subgoals for base cases and inductive steps on recursively defined structures like natural numbers or ; , which replaces terms in goals or hypotheses using equalities or definitions to simplify expressions; and (via destruct in Coq or case_tac in Isabelle), which splits goals based on constructors of inductive types, such as distinguishing empty and non-empty . These tactics can be combined using tacticals, such as sequencing (e.g., applying one after another) or repetition, to build proof scripts that mirror mathematical reasoning while ensuring type-theoretic consistency. For instance, proving list reversal properties often involves induction followed by and to handle recursive calls. To balance user interaction with computational power, proof assistants integrate external automated solvers, such as SMT solvers (e.g., Z3 or veriT) and equality-matching engines, particularly for proving lemmas in decidable fragments like linear arithmetic or propositional logic. These tools are invoked via dedicated tactics, like in Isabelle, which translates goals to external provers and reconstructs proofs using verifiable certificates checked by the assistant, or bounded invocations in Coq that limit solver exploration to avoid non-termination. This hybrid approach automates routine subproofs while deferring creative decomposition to the user, as seen in verifications where SMT handles quantifier-free constraints but induction addresses recursive aspects. The interactive paradigm offers key benefits, including the ability to manage undecidable logics like by leveraging human insight for non-trivial choices, such as selecting induction hypotheses or case splits, which fosters deeper understanding of proof structures. It also promotes reliability, as the system guarantees that only valid steps advance the proof, reducing errors in formalizing intricate theorems like the or . Overall, this user-guided method scales to large formalizations by modularizing proofs into verifiable components, enhancing both mathematical discovery and .

User Interfaces and Tools

Proof assistants offer diverse user interfaces to support interactive proof construction, catering to users ranging from novices to experts. These interfaces emphasize ergonomic interaction, enabling step-by-step verification while managing complex proof states. Text-based environments, integrated with extensible editors like and Vim, form a foundational approach for many systems. Proof General, an -based framework, provides a unified interface for assistants such as Coq and Isabelle, featuring , script parsing, and real-time feedback on proof progress. Agda's Emacs mode similarly facilitates interactive type-checking and goal refinement through structured editing and Unicode input methods. Dedicated IDEs enhance usability with graphical elements tailored to proof development. CoqIDE serves as Coq's native standalone interface, offering windows for script editing, goal inspection, and query evaluation, with support for and error localization. Isabelle/jEdit, built on the jEdit editor, delivers asynchronous processing, markup rendering, and plugin extensibility for seamless proof exploration. For Lean, the VS Code extension acts as the primary IDE, incorporating features like hover information and code navigation to streamline stating and tactic application. Visualization tools play a crucial role in presenting proof dynamics clearly. Goal displays, common across interfaces, show current subgoals, hypotheses, and context, as seen in Proof General's dedicated goals buffer that updates incrementally during tactic execution. highlighting identifies or logical issues with visual cues, reducing time. Specialized visualizers, such as Prooftree for Coq in Proof General, render proof trees as layered diagrams, with nodes representing tactics and branches indicating subgoal evolution—fully proved paths appear in green for at-a-glance verification. Auxiliary tools augment core interfaces with practical utilities. Proof state debuggers, like Coq's interactive stepping mode, allow users to inspect and retract commands mid-proof, aiding in error diagnosis. Version control integration leverages editor capabilities, such as Git hooks in VS Code or packages, to track proof script changes collaboratively. Export functions enable documentation generation; Coq's coqdoc utility converts scripts to or , producing polished PDFs for sharing formalized results. To improve accessibility, especially in educational settings, notebook-style interfaces promote exploratory learning. Lean's lean4-jupyter kernel integrates with Jupyter notebooks, allowing users to interleave code execution, theorem proofs, and visualizations in a format. User interfaces have evolved from rudimentary command-line prompts to immersive, web-accessible platforms. Initial systems depended on terminal interactions for tactic input and output, but contemporary developments like jsCoq compile Coq to , providing a browser-native IDE with scratchpad editing, key bindings for proof stepping, and embeddable demos—facilitating remote collaboration without installation.

Comparison Across Systems

Feature and Capability Overview

Proof assistants vary significantly in their core features and capabilities, reflecting their underlying logical foundations and design goals. Major systems such as Coq, Isabelle/HOL, Lean, and HOL4 support distinct logics—ranging from intuitionistic type theories to classical higher-order logics—while providing tactic-based proof construction, varying degrees of automation, and mechanisms for code extraction and proof reuse. These differences influence their suitability for formalizing , verifying software, or developing readable proofs. The following table summarizes key comparison criteria across these systems, highlighting supported logics, proof languages, automation, libraries, definition mechanisms, extraction, expressiveness, and reuse.
SystemSupported LogicTactic LanguageAutomation LevelsLibrary Size/NameLanguage for DefinitionsExtraction CapabilitiesTheorem ExpressivenessProof Reuse via Modules
CoqCalculus of Inductive Constructions (intuitionistic)Ltac (procedural)High (e.g., auto, omega for arithmetic)Large (standard library + contrib, thousands of theorems)Gallina (functional)Strong to OCaml, Haskell, SchemeHigh via dependent types for precise specificationsModules and functors for parameterized proofs
Isabelle/HOLClassical higher-order logicIsar (declarative), ML-basedVery high (Sledgehammer integrates external ATPs)Extensive (Archive of Formal Proofs, 934 entries as of November 2025)HOL (functional)Code generation to Scala, Haskell, SMLSupports classical axioms like Peirce's law nativelyLocales and theories for modular extensions
LeanDependent type theory (CIC with quotients, choice)Native tactics (e.g., simp, rw)High (simp, linarith, ring; typed tactic framework)Very large (mathlib: ~120,000 definitions, ~240,000 theorems as of November 2025)Lean (functional with dependent types)To C, JavaScript; focuses on verified executablesUniverse polymorphism for scalable hierarchiesType classes and sections for reusable abstractions
HOL4Classical higher-order logicML-based tactics (e.g., REWRITE_TAC)Moderate (HOLyHammer for premise selection, Metis)Moderate (standard library for analysis, sets; ~7,000 theorems)ML (functional)To SML, OCaml via embeddingsBalanced for automation in classical settingsTheories and lemmas for incremental building
Coq exemplifies strong extraction capabilities, allowing verified functional programs to be compiled from proofs, such as extracting a certified extraction of the solver to . Isabelle/HOL emphasizes readable proofs through , enabling structured, human-verifiable arguments that resemble natural language derivations. Lean's mathlib library prioritizes comprehensive mathematical formalization, supporting proof reuse via type classes that automatically resolve instances in algebraic structures. HOL4 facilitates proof reuse through its theory hierarchy, where new developments extend base logics without compromising soundness. These features enable theorem statement expressiveness tailored to their logics, such as Coq's dependent types for encoding program specifications directly in types.

Formalization Scale and Performance

Proof assistants have enabled formalizations of increasing scale, with major libraries comprising hundreds of thousands of definitions and theorems. For instance, Lean's mathlib library, a comprehensive mathematical repository, contains ~120,000 definitions and ~240,000 theorems as of November 2025. Similarly, Coq's ecosystem, including its and contributed packages, contains extensive formal developments across various domains, reflecting the cumulative efforts of its community. Isabelle/HOL's Archive of Formal Proofs (AFP) hosts 934 entries as of November 2025, each often spanning thousands of lines of formal developments, covering topics from algebra to verified software. Performance in proof assistants is evaluated through metrics such as proof checking time, memory usage, and support for parallelism, which are critical for handling large-scale libraries. Isabelle/HOL incorporates multi-threading for parallel proof processing, allowing efficient exploitation of multi-core hardware and reducing overall verification time for extensive theories. In Coq, proof checking for substantial developments can require significant computational resources, with benchmarks showing that unoptimized large libraries may take hours to verify fully, though targeted optimizations can yield improvements of several orders of magnitude in runtime. Memory usage similarly scales with library size; for example, verifying complex sessions in Isabelle may demand gigabytes of RAM, influenced by the depth of proof dependencies. Benchmarks for kernel verification highlight efficiency differences across systems. HOL Light's compact kernel, consisting of around 400 lines of code, enables rapid verification—often in seconds—even for intricate formalizations, prioritizing a minimal . Scalability challenges in large libraries include prolonged edit-check cycles and dependency resolution overheads, where modifying a foundational can necessitate re-verifying thousands of dependent proofs, exacerbating time and resource demands. Key factors affecting performance include kernel size and optimization techniques. Smaller kernels, as in HOL Light, enhance trustworthiness by limiting the code subject to manual audit but may require more user effort in proofs; larger kernels in systems like Coq offer richer automation at the cost of extended verification times. Techniques such as caching—evident in Coq's compiled .vo files that store pre-verified modules and Isabelle's session graphs for incremental rebuilding—mitigate recomputation, significantly accelerating iterative development in expansive libraries.

Applications and Impact

Notable Formalized Theorems

One of the landmark achievements in proof assistants is the formalization of the Four Color Theorem, which states that any planar graph can be colored with at most four colors such that no two adjacent vertices share the same color. This theorem was mechanized in the Coq proof assistant in 2005 by Georges Gonthier at Microsoft Research, with contributions from Benjamin Werner, building on the 1997 informal proof by Robertson, Sanders, Seymour, and Thomas. The formalization spanned several years of effort, involving approximately 60,000 lines of Coq proof scripts, and required refactoring extensive parts of the original proof to fit Coq's constructive framework, including the development of libraries for hypermaps and graph theory. The verification process took three days on contemporary hardware and provided mechanical assurance against errors in the original computer-assisted case analysis, which had previously relied on unverified software; this formal proof eliminated trust in external code and enabled independent reproducibility. Another significant formalization is that of the , asserting that the densest packing of equal spheres in three-dimensional has a density of π/18\pi / \sqrt{18}
Add your contribution
Related Hubs
User Avatar
No comments yet.