Recent from talks
Nothing was collected or created yet.
Isabelle (proof assistant)
View on Wikipedia| Isabelle | |
|---|---|
Isabelle–jEdit running on macOS | |
| Original author | Lawrence Paulson |
| Developers | University of Cambridge Technical University of Munich, et al. |
| Initial release | 1986[1] |
| Stable release | Isabelle2025
/ March 2025 |
| Written in | Standard ML, Scala |
| Operating system | Linux, Windows, macOS |
| Type | Mathematics |
| License | BSD |
| Website | isabelle |
The Isabelle[a] automated theorem prover is a higher-order logic (HOL) theorem prover, written in Standard ML and Scala. As a Logic for Computable Functions (LCF) style theorem prover, it is based on a small logical core (kernel) to increase the trustworthiness of proofs without requiring, yet supporting, explicit proof objects.
Isabelle is available inside a flexible system framework allowing for logically safe extensions, which comprise both theories and implementations for code-generating, documenting, and specific support for a variety of formal methods. It can be seen as an integrated development environment (IDE) for formal methods. In recent years, a substantial number of theories and system extensions have been collected in the Isabelle Archive of Formal Proofs (Isabelle AFP).[2]
Isabelle was named by Lawrence Paulson after Gérard Huet's daughter.[3]
The Isabelle theorem prover is free software, released under the revised BSD license.
Features
[edit]Isabelle is generic: it provides a meta-logic (a weak type theory), which is used to encode object logics like first-order logic (FOL), higher-order logic (HOL) or Zermelo–Fraenkel set theory (ZFC). The most widely used object logic is Isabelle/HOL, although significant set theory developments were completed in Isabelle/ZF. Isabelle's main proof method is a higher-order version of resolution, based on higher-order unification.
Though interactive, Isabelle features efficient automatic reasoning tools, such as a term rewriting engine and a tableaux prover, various decision procedures, and, through the Sledgehammer proof-automation interface, external satisfiability modulo theories (SMT) solvers (including CVC4) and resolution-based automated theorem provers (ATPs), including E, SPASS, and Vampire (the Metis[b] proof method reconstructs resolution proofs generated by these ATPs).[4] It also features two model finders (counterexample generators): Nitpick[5] and Nunchaku.[6]
Isabelle features locales which are modules that structure large proofs. A locale fixes types, constants, and assumptions within a specified scope[5] so that they do not have to be repeated for every lemma.
Isar ("intelligible semi-automated reasoning") is Isabelle's formal proof language. It is inspired by the Mizar system.[5]
Example proof
[edit]Isabelle allows proofs to be written in two different styles, the procedural and the declarative. Procedural proofs specify a series of tactics (theorem proving functions/procedures) to apply. While reflecting the procedure that a human mathematician might apply to proving a result, they are typically hard to read as they do not describe the outcome of these steps. This style is "considered harmful" in the Isabelle documentation.[7]
On the other hand, declarative proofs (supported by Isabelle's proof language, Isar) specify the actual mathematical operations to be performed, and are therefore more easily read and checked by humans.
For example, a declarative proof by contradiction in Isar that the square root of two is not rational can be written as follows.
theorem sqrt2_not_rational: "sqrt 2 ∉ ℚ" proof let ?x = "sqrt 2" assume "?x ∈ ℚ" then obtain m n :: nat where sqrt_rat: "¦?x¦ = m / n" and lowest_terms: "coprime m n" by (rule Rats_abs_nat_div_natE) hence "m^2 = ?x^2 * n^2" by (auto simp add: power2_eq_square) hence eq: "m^2 = 2 * n^2" using of_nat_eq_iff power2_eq_square by fastforce hence "2 dvd m^2" by simp hence "2 dvd m" by simp have "2 dvd n" proof - from ‹2 dvd m› obtain k where "m = 2 * k" .. with eq have "2 * n^2 = 2^2 * k^2" by simp hence "2 dvd n^2" by simp thus "2 dvd n" by simp qed with ‹2 dvd m› have "2 dvd gcd m n" by (rule gcd_greatest) with lowest_terms have "2 dvd 1" by simp thus False using odd_one by blast qed
Applications
[edit]Isabelle has been used to aid formal methods for the specification, development and verification of software and hardware systems.
Isabelle has been used to formalize numerous theorems from mathematics and computer science, like Gödel's completeness theorem, Gödel's theorem about the consistency of the axiom of choice, the prime number theorem, correctness of security protocols, and properties of programming language semantics. Many of the formal proofs are, as mentioned, maintained in the Archive of Formal Proofs, which contains (as of 2019) at least 500 articles with over 2 million lines of proof in total.[8]
- In 2009, the L4.verified project at NICTA produced the first formal proof of functional correctness of a general-purpose operating system kernel:[9] the seL4 (secure embedded L4) microkernel. The proof is constructed and checked in Isabelle/HOL and comprises over 200,000 lines of proof script to verify 7,500 lines of C. The verification covers code, design, and implementation, and the main theorem states that the C code correctly implements the formal specification of the kernel. The proof uncovered 144 bugs in an early version of the C code of the seL4 kernel, and about 150 issues in each of design and specification.
- The definition of the programming language Lightweight Java was proven type-sound in Isabelle.[10]
Alternatives
[edit]Several languages and systems provide similar functions:
- Agda, written in Haskell
- Rocq (previously known as Coq), written in OCaml
- Lean, written in Lean itself and C++
- LEGO, written in Standard ML of New Jersey
- Mizar system, written in Free Pascal
- Metamath, written in ANSI C
- Prover9, written in C, with a GUI written in Python
- Twelf, written in Standard ML
Notes
[edit]References
[edit]- ^ Paulson, L. C. (1986). "Natural deduction as higher-order resolution". The Journal of Logic Programming. 3 (3): 237–258. arXiv:cs/9301104. doi:10.1016/0743-1066(86)90015-4. S2CID 27085090.
- ^ Eberl, Manuel; Klein, Gerwin; Nipkow, Tobias; Paulson, Larry; Thiemann, René. "Archive of Formal Proofs". Retrieved 1 May 2021.
- ^ Gordon, Mike (1994-11-16). "1.2 History". Isabelle and HOL. Cambridge AR Research (The Automated Reasoning Group). Archived from the original on 2017-03-05. Retrieved 2016-04-28.
- ^ Jasmin Christian Blanchette, Lukas Bulwahn, Tobias Nipkow, "Automatic Proof and Disproof in Isabelle/HOL", in: Cesare Tinelli, Viorica Sofronie-Stokkermans (eds.), International Symposium on Frontiers of Combining Systems – FroCoS 2011, Springer, 2011.
- ^ a b c Jasmin Christian Blanchette, Mathias Fleury, Peter Lammich & Christoph Weidenbach, "A Verified SAT Solver Framework with Learn, Forget, Restart, and Incrementality", Journal of Automated Reasoning 61:333–365 (2018).
- ^ Andrew Reynolds, Jasmin Christian Blanchette, Simon Cruanes, Cesare Tinelli, "Model Finding for Recursive Functions in SMT", in: Nicola Olivetti, Ashish Tiwari (eds.), 8th International Joint Conference on Automated Reasoning, Springer, 2016.
- ^ Wenzel, Makarius (March 13, 2025). "The Isabelle/Isar Reference Manual" (PDF). Retrieved 2025-05-10. Page 148: "Arbitrary goal refinement via tactics is considered harmful". See also section 7.3, "Tactics: improper proof methods", pp. 172–175.
- ^ Eberl, Manuel; Klein, Gerwin; Nipkow, Tobias; Paulson, Larry; Thiemann, René. "Archive of Formal Proofs". Retrieved 22 October 2019.
- ^ Klein, Gerwin; Elphinstone, Kevin; Heiser, Gernot; Andronick, June; Cock, David; Derrin, Philip; Elkaduwe, Dhammika; Engelhardt, Kai; Kolanski, Rafal; Norrish, Michael; Sewell, Thomas; Tuch, Harvey; Winwood, Simon (October 2009). "seL4: Formal verification of an OS kernel" (PDF). 22nd ACM Symposium on Operating System Principles. Big Sky, Montana, US. pp. 207–200.
- ^ Strniša, Rok; Parkinson, Matthew (7 February 2011). "Lightweight Java". Archive of Formal Proofs (February 2011 ed.). ISSN 2150-914X. Retrieved 2019-11-25.
Further reading
[edit]- Lawrence C. Paulson, "The Foundation of a Generic Theorem Prover", Journal of Automated Reasoning, Volume 5, Issue 3 (September 1989), pages: 363–397, ISSN 0168-7433.
- Lawrence C. Paulson and Tobias Nipkow, "Isabelle Tutorial and User's Manual", 1990.
- M. A. Ozols, K. A. Eastaughffe, and A. Cant, "DOVE: A Tool for Design Oriented Verification and Evaluation", Proceedings of AMAST 97, M. Johnson, editor, Sydney, Australia. Lecture Notes in Computer Science (LNCS) Vol. 1349, Springer Verlag, 1997.
- Tobias Nipkow, Lawrence C. Paulson, Markus Wenzel, "Isabelle/HOL – A Proof Assistant for Higher-Order Logic", 2020.
External links
[edit]Isabelle (proof assistant)
View on GrokipediaHistory and Development
Origins
Isabelle's development began in 1986 at the University of Cambridge under the leadership of Lawrence C. Paulson, who created it as an extension of the LCF theorem prover tradition.[5] Drawing from the success of Edinburgh LCF, which introduced ML as a meta-language for defining tactics, Paulson aimed to build a more generic system capable of supporting multiple logical frameworks through a unified interface.[6] This initial design emphasized flexibility, allowing users to define object-logics within a common proof engine based on typed lambda-calculus and higher-order unification, addressing limitations in earlier LCF variants by incorporating advanced automation techniques like backtracking search.[6] The system was named Isabelle, reportedly after the daughter of the prominent logician Gérard Huet, whose work on type theory and proof systems influenced the era's developments.[7] Early implementations, such as Isabelle-86, prioritized support for diverse logics including Zermelo-Fraenkel set theory (ZF) and first-order logic, with initial experiments in constructive type theory.[6] These versions focused on enabling interactive proof construction via tactics and tacticals, surpassing LCF's capabilities in handling complex goals through unification-based solving of unknowns. The first public release occurred in 1986, marking Isabelle's debut as a tool for formal reasoning in mathematics and computer science.[8] By the early 1990s, collaboration with Tobias Nipkow at the Technical University of Munich expanded Isabelle's scope, particularly with the integration of higher-order logic (HOL) as a primary object-logic.[5] This partnership shifted emphasis toward polymorphic higher-order logics and axiomatic set theory, laying the groundwork for Isabelle's evolution into a versatile proof assistant used in formal verification and mathematical formalization.[5]Major Releases
Isabelle's development has proceeded through annual major releases since its early versions, each introducing significant enhancements to its core functionality, proof automation, and user interface. The first notable release, Isabelle91 in 1991, marked the introduction of Isabelle/HOL as the primary object-logic, providing a robust foundation for higher-order logic proofs within the generic framework. This version emphasized interactive theorem proving with improved resolution tactics and support for classical reasoning, establishing HOL as the default logic for most applications.[9] In Isabelle98, released in 1998, the Isar proof language was integrated, enabling declarative and readable proofs that bridge the gap between informal mathematical reasoning and machine-checked verification. Isar allowed users to structure proofs in a natural deduction style, moving away from imperative scripts toward a more intelligible format, while maintaining compatibility with existing tactics. This release also added hierarchical name spaces and enhanced theory management, facilitating larger-scale developments.[9][10] Isabelle2007, released in 2007, introduced locales as a modular reasoning mechanism, allowing parametric theories to be defined and composed through interpretations that manage dependencies and assumptions effectively. Locales provided a sectioning concept for scoping constants, types, and axioms, supporting reusable and hierarchical specifications in object-logics like HOL. Additional features included parallel proof checking for improved performance and the metis method for resolution-based automation.[9][11] The Isabelle2016 release in 2016 enhanced the Sledgehammer tool for seamless integration with automated theorem provers (ATPs), incorporating machine learning-based premise selection via MaSh to boost proof reconstruction efficiency. This version also improved support for (co)datatypes, Eisbach for tactic definition, and the Prover IDE with better debugging and state visualization, making interactive proving more accessible.[12][13] Isabelle2024 focused on performance optimizations and advancements in automated proving, including updates to Sledgehammer with support for newer ATPs like E 3.0 and Vampire 4.8, alongside AI-assisted techniques for premise selection and proof search. It introduced arm64 architecture support, refined multiset libraries in HOL, and enhanced distributed build capabilities for large-scale verification projects.[9] The latest stable release, Isabelle2025 in March 2025, expanded support for synthetic proof methods through updates to meson_tac and case_tac, alongside better ML and Scala integration for extensible tooling. Key additions include the Find_Facts search engine for efficient lemma retrieval and the Build_Manager for continuous integration, with ongoing library enhancements in HOL for analysis and probability. Developed collaboratively by the University of Cambridge, Technical University of Munich, and international contributors, it remains under the BSD license. Isabelle receives ongoing maintenance by its core team, with annual releases ensuring compatibility and incremental improvements.[9][3]Logical Foundations
Meta-Logic
Isabelle's meta-logic provides a generic framework for defining and reasoning about various object-logics, with those such as higher-order logic (HOL) implemented as shallow embeddings within it. This meta-logic, denoted as , consists of a small set of natural deduction rules for logical connectives such as implication () and conjunction (), enabling the formalization of inference systems in a way that mirrors intuitive proof construction.[14] The rules include introduction forms, like -introduction for deriving implications from assumptions, and elimination forms for applying them, ensuring soundness and completeness for encoded logics. Object-logics are encoded into this meta-logic through judgments, represented as structures like for a proposition , and inference rules specified as meta-level axioms or theorems. These encodings allow users to define the syntax and semantics of specific logics, such as intuitionistic propositional logic, by extending the type system and adding appropriate constants and axioms, all verified against the meta-logic's rules.[14] The process leverages the meta-logic's operators—such as meta-implication and meta-universal quantification —to express hypothetical judgments and parametric rules, facilitating modular extensions without altering the core system.[16] At the heart of the framework lies a generic inference kernel that mechanically verifies proof steps by reducing goals to subgoals via resolution and higher-order unification, relying solely on the type safety of its underlying functional language for trustworthiness. This kernel applies the natural deduction rules uniformly across any encoded logic, ensuring that proofs are checked incrementally without assuming the correctness of external components beyond the language's type system.[14] By design, Isabelle includes no built-in object-logic; instead, all logics—ranging from classical first-order logic to higher-order variants—are user-defined extensions built atop this meta-logic, promoting flexibility and reusability.[16] Historically, Isabelle's meta-logic draws significant influence from Martin-Löf's type theory, particularly in its emphasis on typed judgments and constructive reasoning, as well as from frameworks for first-order logic (FOL) that prioritize generic theorem proving. These foundations, combined with inspirations from earlier systems like LCF, enabled the development of a lightweight yet expressive meta-level that supports diverse logical paradigms without committing to a single one.[14]Object-Logics
Isabelle's object-logics are formal systems embedded within its meta-logic framework, enabling users to specify and prove theorems in diverse mathematical foundations. These logics extend the generic Isabelle/Pure system by defining object-level syntax, inference rules, and axioms tailored to specific domains, such as higher-order logic or set theory. The framework supports a variety of logics, with historical implementations exceeding a dozen, including first-order variants and sequent calculus formulations.[17][18] The primary and most developed object-logic is Isabelle/HOL, which implements classical higher-order logic grounded in typed lambda calculus. It features a polymorphic type system with basic types like booleans and natural numbers, alongside primitives such as equality (=), universal quantification (∀), and existential quantification (∃). Core axioms include reflexivity of equality (t = t for any term t), the Hilbert choice axiom via the ε-operator (P(x) ⇒ P(εx. P x)), and rules for implication and universal introduction derived from natural deduction. Isabelle/HOL balances expressiveness for computational interpretations—such as defining recursive functions and datatypes—with strong automation support, making it suitable for formalizing both mathematics and software properties.[19][17] Another foundational object-logic is Isabelle/ZF, which formalizes Zermelo-Fraenkel set theory as an untyped extension of first-order logic. It uses a single type i for sets, with primitives including membership (∈), subsets (⊆), power sets (Pow), unions (Union), and comprehensions (Collect). Key axioms encompass extensionality (sets with equal members are equal), foundation (every nonempty set has an ∈-minimal element), infinity (existence of an infinite set), replacement (image of a set under a functional formula is a set), and separation (subsets defined by formulas exist). While equivalent in expressive power to HOL as a conservative extension of first-order logic, Isabelle/ZF emphasizes pure set-based constructions without native types, facilitating foundational proofs in pure mathematics.[20][17] Isabelle also supports simpler logics like Isabelle/FOL, a many-sorted first-order logic implemented via sequent calculus (e.g., LK-style rules for intuitionistic and classical variants), serving as the basis for more complex systems like ZF. Other historical logics include classical first-order logic (CCL), lattice logic (LCF), domain theory (HOLCF), constructive type theory (CTT), the lambda cube (Cube), modal logic (Modal), and intuitionistic linear logic (ILL). These provide specialized primitives, such as dependent products in CTT or necessity operators in Modal, but see limited modern use compared to HOL and ZF.[17][20][18] Since the 1990s, Isabelle/HOL has served as the default object-logic in most distributions, reflecting its maturity and extensive library. It accounts for the majority of developments in the Isabelle ecosystem, with the Archive of Formal Proofs predominantly featuring HOL-based entries due to its practicality for applied formalization. In contrast to ZF's focus on set-theoretic purity, HOL's typed structure enables more direct computational reasoning.[1][17][18]Key Features
Interactive Proving with Isar
Isar, short for Intelligible Semi-Automated Reasoning, is a structured proof language in Isabelle designed to produce human-readable formal proofs, serving as an alternative to traditional tactic-based scripts that resemble imperative programming.[21] Introduced in 1999, Isar emphasizes declarative specifications that mimic natural mathematical reasoning, allowing proofs to be both executable by the system and comprehensible as standalone documents.[21] This approach bridges the gap between machine-verified deductions and human-oriented presentations by providing a generic framework independent of specific logics.[22] Isar proofs follow a block-structured format using keywords such aslemma to declare statements, proof to initiate subproofs, by to apply methods for completing steps, and qed to conclude the overall proof.[22] The language supports both backward reasoning, which refines goals through commands like show and assume, and forward reasoning, which chains facts using elements like then, also, and finally for calculational derivations.[22] This duality enables flexible proof construction, where local contexts can be managed with fix for variables and nested blocks for assumptions, all processed incrementally via an interpreter known as the Isar virtual machine.[22]
Interactive development in Isar is facilitated through integration with IDEs such as Proof General, an Emacs-based interface, and jEdit, Isabelle's native Scala/Swing-based environment, which provide real-time feedback during proof refinement.[22] Users can edit proofs incrementally, visualizing the proof state and errors, which supports iterative construction and debugging.[23]
The primary benefits of Isar include the generation of proofs that function as verifiable mathematical documents, enhancing readability and reusability while ensuring kernel-level verification of every step.[22] For instance, automation tools like Sledgehammer can be invoked within Isar proofs to suggest tactic scripts that are then incorporated declaratively.[22] A simple example illustrates this structure:
lemma sqrt2_not_rational: "√2 ∉ ℚ"
proof -
-- proof steps here
qed
lemma sqrt2_not_rational: "√2 ∉ ℚ"
proof -
-- proof steps here
qed
Automation and External Tools
Isabelle provides a range of automation tools to assist users in discharging proof obligations with minimal manual intervention, integrating external solvers and built-in mechanisms to handle higher-order logic goals. These tools translate Isabelle/HOL problems into formats suitable for automated theorem provers (ATPs) and satisfiability modulo theories (SMT) solvers, reconstruct verifiable proofs within the system, and support counterexample generation for debugging conjectures.[24] Sledgehammer, introduced in 2007, automates proof search by translating subgoals and relevant lemmas from Isabelle/HOL into first-order formats for external ATPs such as Vampire and E, or SMT solvers like Z3 and CVC4.[25] It heuristically selects premises using relevance filtering, invokes solvers in parallel (locally or remotely via SystemOnTPTP), and reconstructs proofs as Isabelle methods, typically using the Metis tactic for ATP outputs or SMT-specific reconstruction for others.[24] This process ensures proof certificates remain within Isabelle's trusted kernel, with optional conversion to readable Isar scripts for human inspection.[24] Nitpick serves as a counterexample finder for Isabelle/HOL, approximating infinite models with finite structures to refute conjectures efficiently.[26] Introduced around 2010, it leverages the Kodkod relational model finder, which encodes problems as SAT instances solvable by backends like MiniSat, to explore domains of bounded cardinality (default 1–10) while handling datatypes, quantifiers, and inductive definitions via Skolemization and three-valued logic for partiality.[27][26] Users invoke it directly on goals, receiving explicit counterexamples with variable assignments to guide proof refinement.[26] For first-order fragments, Isabelle includes built-in provers like Metis, an ordered paramodulation ATP integrated natively for reconstructing and discharging goals from Sledgehammer outputs.[24] Metis processes lists of lemmas to close goals without external calls, supporting equality reasoning in higher-order contexts via shallow embeddings.[28] Similarly, E, a resolution-based ATP, operates as an external backend but can be invoked directly or via Sledgehammer for efficient handling of large premise sets in first-order approximations.[24] Code generation in Isabelle exports verified, executable functions from HOL specifications to target languages including Haskell and Scala, enabling deployment outside the proof environment.[29] This shallow embedding process preserves semantic equivalence through normalization and type refinements, producing idiomatic code via customizable setups for data types and equations.[29] Users define export targets in theories, generating standalone modules for integration with functional programming workflows.[29] Custom automation extends via Standard ML (SML)-based tactics, allowing users to implement domain-specific proof procedures within Isabelle's LCF-style kernel. These tactics manipulate proof states using combinators likeSEQ for sequencing or ORELSE for alternatives, integrating with existing methods for hybrid interactive-automated proving. SML extensions ensure type-safe, verifiable additions without compromising soundness.
In Isabelle2025, enhancements to automation include improved parallel invocation of ATP calls in Sledgehammer, optimizing resource use across multiple cores for faster proof search on complex goals.[24]
Proof Examples
Basic Irrationality Proof
One of the simplest yet illustrative examples of an interactive proof in Isabelle using the Isar language is the demonstration that the square root of 2 is irrational. This proof leverages higher-order logic (HOL) within Isabelle to formalize the classical argument by contradiction, assuming √2 is rational and deriving a parity contradiction in the numerator and denominator of its fractional representation. The goal is stated as the lemmasqrt2_not_rat: "√2 ∉ ℚ", where ℚ denotes the set of rational numbers defined in Isabelle's standard library.[30]
The full Isar proof script, verifiable in Isabelle/HOL, proceeds as follows, combining manual decomposition with automated simplification:
lemma sqrt2_not_rat: "√2 ∉ ℚ"
proof
assume "√2 ∈ ℚ"
obtain m n :: nat where "n ≠ 0" and "√2 = real m / real n" and "gcd m n = 1"
by (auto simp: Rats_def)
hence "real (m²) = real 2 * real (n²)"
by (simp add: power2_eq_square real_of_nat_mult)
then have "m² = 2 * n²" by linarith
hence "even m" by (auto simp: even_nat_def)
then obtain k where "m = 2 * k" by (auto simp: even_nat_def)
hence "4 * k² = 2 * n²" by (simp add: power2_eq_square)
hence "2 * k² = n²" by simp
hence "even n" by (auto simp: even_nat_def)
then obtain l where "n = 2 * l" by (auto simp: even_nat_def)
hence "gcd m n ≥ 2" by (simp add: gcd_nat mult_le_mono)
with `gcd m n = 1` show False by simp
qed
lemma sqrt2_not_rat: "√2 ∉ ℚ"
proof
assume "√2 ∈ ℚ"
obtain m n :: nat where "n ≠ 0" and "√2 = real m / real n" and "gcd m n = 1"
by (auto simp: Rats_def)
hence "real (m²) = real 2 * real (n²)"
by (simp add: power2_eq_square real_of_nat_mult)
then have "m² = 2 * n²" by linarith
hence "even m" by (auto simp: even_nat_def)
then obtain k where "m = 2 * k" by (auto simp: even_nat_def)
hence "4 * k² = 2 * n²" by (simp add: power2_eq_square)
hence "2 * k² = n²" by simp
hence "even n" by (auto simp: even_nat_def)
then obtain l where "n = 2 * l" by (auto simp: even_nat_def)
hence "gcd m n ≥ 2" by (simp add: gcd_nat mult_le_mono)
with `gcd m n = 1` show False by simp
qed
√2 ∈ ℚ ⊢ False. The obtain tactic introduces witnesses m and n from the definition of rationals (Rats_def), which requires coprime natural numbers with n ≠ 0, refining the state to algebraic equations over reals and naturals. Squaring both sides yields m² = 2 * n² via simplification rules for powers and real embeddings, with linarith automating the linear arithmetic inference to eliminate real intermediates.[30]
The core contradiction arises from parity analysis using HOL's arithmetic lemmas. From m² = 2 * n², the evenness of the left side implies even m via even_nat_def, which equates evenness to divisibility by 2; Isabelle's automation (auto simp: even_nat_def) internally applies the lemma that an even square implies an even base (analogous to even_square_even). Substituting m = 2k propagates the factor of 2 to n², similarly yielding even n and thus n = 2l. This violates coprimality, as gcd m n ≥ 2 follows from gcd_nat and monotonicity (mult_le_mono), leading to False by simplification. The by auto and by simp steps demonstrate Isabelle's automation: auto combines rewriting, resolution, and basic tactics, while simp applies equational simplifications, closing subgoals without manual intervention. The final qed discharges the proof, confirming the lemma.[30]
Formal Verification Snippet
One illustrative example of Isabelle's application in formal verification involves specifying and proving the correctness of the insertion sort algorithm within the HOL object-logic. The algorithm is defined recursively on lists of natural numbers as follows: the empty list sorts to itself, and for a non-empty list headed by and tail , it inserts into the sorted version of . The insert function places at the first position where it maintains the sorted order, defined by cases:insert x [] = [x] and insert x (y # ys) = (if x < y then x # y # ys else y # insert x ys). This formalization allows verification of key properties such as the output being sorted and preserving the multiset of elements, ensuring it is a permutation of the input.[31]
To prove these properties, lemmas are stated and discharged using Isar, Isabelle's structured proof language, which emphasizes readability through natural deduction-style reasoning. For instance, the lemma asserting that insertion preserves sortedness, sorted ys ⟹ sorted (insert x ys), is proven by structural induction on ys. A representative Isar outline for this proof proceeds as follows:
lemma insert_sorted: "sorted ys ⟹ sorted (insert x ys)"
[apply](/page/Apply) (induct ys)
[apply](/page/Apply) (simp)
[apply](/page/Apply) (simp split: if_split_asm)
by (metis append_sorted.Cons insert_is_sort linorder_not_le sorted_append)
lemma insert_sorted: "sorted ys ⟹ sorted (insert x ys)"
[apply](/page/Apply) (induct ys)
[apply](/page/Apply) (simp)
[apply](/page/Apply) (simp split: if_split_asm)
by (metis append_sorted.Cons insert_is_sort linorder_not_le sorted_append)
x < y and invokes auxiliary facts about appending sorted lists. Similarly, the main theorem sorted (insertion_sort xs) follows by induction on xs, composing the insert lemma with the recursive structure. Permutation preservation is established via the multiset equality mset (insertion_sort xs) = mset xs, also by induction, leveraging mset (insert x ys) = {#x#} + mset ys. Termination is implicit for finite lists in HOL, as the recursion follows the well-founded list structure. These proofs rely on HOL's built-in libraries for lists and orders.[31]
In practice, interactive development integrates automation tools like Sledgehammer, which translates subgoals to first-order logic and invokes external ATPs (e.g., Vampire or E) to suggest Isar proof steps. For insertion sort lemmas, Sledgehammer often resolves inductive cases automatically, reconstructing verifiable proofs in seconds, reducing manual effort for routine arithmetic or order reasoning. Upon verification, the definitions can be exported to executable code in languages like Haskell or Scala via Isabelle's code generation framework, allowing runtime checks against the formal model to confirm implementation fidelity. This workflow exemplifies small-scale verification, typically spanning 50-100 lines of theory and proof, contrasting with purely mathematical proofs by emphasizing executable specifications and tool-assisted refinement.
Such snippet verifications mirror the refinement style used in large-scale projects like the seL4 microkernel, where Isabelle/HOL connects abstract functional specifications to C implementations through stepwise refinement proofs, ensuring absence of errors in critical systems.
Applications
Software Verification
Isabelle/HOL has been instrumental in the formal verification of complex software systems, particularly through its support for connecting high-level abstract specifications to low-level executable code. This capability enables the proof of functional correctness, security properties, and absence of errors in real-world implementations. One of the most prominent applications is in operating system kernels, where Isabelle facilitates end-to-end verification pipelines that bridge mathematical models with machine code. A landmark case study is the verification of the seL4 microkernel, a high-performance, general-purpose operating system kernel designed for security-critical applications. Developed by researchers at NICTA (now Data61), the project uses Isabelle/HOL to prove that the C implementation of seL4 refines an abstract functional specification, establishing full functional correctness including the absence of buffer overflows, deadlocks, and other common vulnerabilities. The initial verification, completed in 2009, encompassed approximately 7,500 lines of C code with over 200,000 lines of Isabelle proof script, covering the kernel's behavior on x86 and ARM architectures. Subsequent extensions have expanded this to approximately 10,000 lines of verified C code with binary-level proofs for multiple platforms, including RISC-V, demonstrating binary code extraction and machine-checkable guarantees for the entire trusted computing base. The seL4 proofs remain ongoing, with updates incorporating new features and architectures while maintaining the original correctness theorems.[32][33][34] Beyond kernels, Isabelle has been applied to verify components of larger systems, such as modules in the Linux kernel. For instance, a formally verified implementation of the red-black tree data structure—a key component used in the Linux kernel for managing memory allocations and file systems—was developed and proven correct in Isabelle/HOL. This verification establishes invariants like balance and ordering, ensuring the module's reliability. Such efforts highlight Isabelle's utility in modular verification, where individual kernel components can be isolated and proven against specifications derived from the kernel's interface.[35] Central to these verifications is Isabelle's refinement methodology, which decomposes proofs into stepwise refinements from abstract, high-level specifications (often expressed in functional languages like HOL) down to imperative, executable code in languages such as C. This approach leverages the Isabelle Refinement Framework, which automates much of the proof obligation generation through a verification condition generator (VCG). The VCG transforms refinement relations into dischargeable proof obligations, supporting Hoare-style reasoning for loops and procedures while integrating automated solvers for simpler subgoals. For imperative languages, challenges arise in modeling side effects and pointer aliasing; Isabelle addresses these via monotonic predicates and ghost variables to track abstract states without altering concrete execution.[36][37] The impact of Isabelle in software verification extends to high-assurance systems in industry and research. The seL4 project, led by Data61, has influenced secure system design in domains like avionics and automotive software, providing a foundation for certified components with minimal trusted code. Organizations such as Galois Inc. have incorporated Isabelle-compatible tools into their formal methods workflows for verifying cryptographic protocols and embedded systems, enhancing reliability in safety-critical applications. These efforts underscore Isabelle's role in scaling verification to production software, where proofs not only detect bugs but also certify compliance with standards like DO-178C for aerospace. Recent advancements include the IsaBIL framework for verifying the (in)correctness of binaries in Isabelle/HOL, presented in 2025, which builds on the Binary Analysis Platform for low-level code analysis.[33][38][39][40]Mathematical Formalizations
Isabelle/HOL has facilitated the formalization of several foundational results in mathematics and logic, demonstrating its capability for handling abstract and complex structures through higher-order logic. One prominent example is the mechanization of Gödel's incompleteness theorems, first achieved in the mid-2010s. A 2015 formalization using Nominal Isabelle followed Świerczkowski's detailed proof, verifying both theorems for Peano arithmetic and related systems.[41] Subsequent works, such as a 2019 abstract development at CADE and a 2021 machine-assisted proof for hereditarily finite sets, extended this to broader meta-logical frameworks, confirming the theorems' implications for formal systems.[42][43] These efforts highlight Isabelle's strength in encoding arithmetization and self-referential constructions. In analytic number theory, the Prime Number Theorem—stating that the number of primes up to is asymptotically —received a formal proof in Isabelle/HOL during the 2000s via the elementary Erdős–Selberg method.[44] This verification included remainder terms and relied on Dirichlet series libraries. A refined analytic version, based on Newman's presentation and incorporating Tauberian theorems, was added to the Archive of Formal Proofs in 2018, building on complex analysis primitives.[45] These formalizations underscore Isabelle's support for asymptotic analysis and summation techniques. Beyond these, Isabelle has enabled proofs of topological theorems like the Jordan curve theorem, which asserts that a simple closed curve in the plane divides it into an interior and exterior region. This was formalized in Isabelle's HOL-Analysis library by Lawrence C. Paulson, adapting insights from HOL Light to establish the theorem and its applications in plane topology.[46] The Flyspeck project, culminating in 2014, formally verified the Kepler conjecture on the densest sphere packing in three dimensions using a combination of HOL Light and Isabelle; the Isabelle component handled key algebraic and geometric auxiliaries for the proof.[47] Additionally, category theory libraries in Isabelle/HOL, initiated in 2010 and expanded since, define categories, functors, natural transformations, and related structures, supporting applications in algebra and logic.[48] More recent formalizations include the Balog–Szemerédi–Gowers theorem in additive combinatorics (2023) and the Hidden Number Problem in cryptography (2025), both contributed to the Archive of Formal Proofs.[49][50] Such projects typically comprise thousands of lemmas per formalization, leveraging HOL's expressiveness to model infinite structures and higher-order abstractions efficiently; for instance, the Archive of Formal Proofs hosts entries with hundreds to thousands of theorems each.[51] These developments bridge informal mathematical discourse with machine-checked rigor, fostering reusable libraries that advance automated theorem proving in pure mathematics.[51]Community and Extensions
Archive of Formal Proofs
The Archive of Formal Proofs (AFP) is a comprehensive repository of peer-reviewed formal proofs, libraries, and larger developments for the Isabelle proof assistant, launched in 2004 to facilitate the sharing and reuse of verified mathematical and computational content.[52] As of November 2025, it encompasses 935 entries contributed by 566 authors, totaling approximately 4.9 million lines of proof code and around 299,200 lemmas.[4] These entries span a wide range of domains, including algebra (e.g., group theory and ring formalizations), topology (e.g., general and Furstenberg topologies), cryptography (e.g., verified implementations of elliptic curves and hash functions), and artificial intelligence (e.g., Markov decision processes foundational to reinforcement learning). All submissions undergo a rigorous peer-review process, akin to that of academic journals, to ensure correctness, significance, and documentation quality before inclusion.[53] The AFP promotes modularity and reuse within the Isabelle ecosystem by allowing users to import entries directly into their proofs. After installing the AFP via theisabelle components -u command, which integrates it as a component of the Isabelle distribution, specific libraries can be loaded by declaring their session names in a project's ROOT file or using Isabelle's build system to include dependencies.[54] This mechanism enables seamless extension of existing formalizations, reducing redundancy and accelerating new developments—for instance, building reinforcement learning models on top of verified Markov decision process theories.[55]
Hosted and maintained by the Technical University of Munich, the AFP receives annual releases synchronized with Isabelle versions, incorporating new entries, updates to existing ones for compatibility, and enhancements to the build infrastructure.[51] Notable examples include the MDP-Algorithms entry, which provides verified algorithms for solving Markov decision processes used in reinforcement learning applications (often referred to in contexts like Isabelle/RL), and the Topology entry, featuring synthetic proofs of topological properties such as closure under unions and intersections.[56] These resources not only demonstrate Isabelle's expressive power but also serve as benchmarks for formal verification in applied mathematics and computer science.[55]
Recent Developments and Integrations
The Isabelle2025 release in March 2025 introduced enhancements to synthetic proof frameworks, enabling more modular and reusable proof constructions through improved tactic composition and abstraction mechanisms. This version also advanced parallel processing capabilities, allowing for distributed proof checking across multi-core systems and cloud environments, which significantly reduces verification times for large-scale formalizations. An upcoming Isabelle2025-1 release, planned for mid-December 2025, will build on these with further improvements. These updates build on prior automation tools but emphasize scalability for modern hardware, as detailed in the official release notes.[3] AI integrations have emerged as a key focus, with IsaMini representing a redesigned subset of the Isabelle proof language optimized for machine learning applications. Introduced in early 2025, IsaMini simplifies tactic syntax and proof states to better interface with large language models (LLMs), facilitating automated proof generation in neural theorem proving (NTP) pipelines. Pilot projects using NTP have demonstrated LLMs generating structured Isabelle proofs for verification conditions, achieving up to 29% higher success rates on benchmark theorems compared to traditional search-based methods, by leveraging fine-tuned models. These efforts, including frameworks for whole-proof synthesis, highlight Isabelle's adaptability to AI-driven automation.[57][58] The Isabelle Workshop 2024, held in September in Tbilisi, Georgia, and colocated with the International Conference on Interactive Theorem Proving (ITP), featured discussions on AI integration and proof modularity, with presentations on LLM-assisted tactics and cross-assistant interoperability. Building on this, community efforts continue to emphasize these themes. Extensions like Isabelle/RL have progressed, providing a formalization environment for reinforcement learning algorithms within Isabelle/HOL, enabling verified implementations of RL policies for proof search optimization.[59][60] Recent research, such as the CPP 2025 paper on an Isabelle/HOL framework for synthetic completeness proofs (presented at POPL 2025), addresses gaps in prior documentation by formalizing generic completeness arguments for proof systems, paving the way for broader AI-assisted verifications not yet widely covered in older resources. This work underscores Isabelle's evolving role in handling complex, AI-enhanced formal methods.[61]Comparisons
With Coq
Isabelle/HOL and Coq differ fundamentally in their underlying logics, with Isabelle based on classical higher-order logic (HOL) and Coq on the constructive Calculus of Inductive Constructions (CIC). HOL's classical nature allows direct use of principles like the law of excluded middle, facilitating proofs in classical mathematics without additional axioms, which makes Isabelle particularly suitable for formalizing traditional mathematical theorems. In contrast, Coq's CIC is intuitionistic by default, emphasizing constructive proofs where existence implies computability, though users can introduce classical axioms if needed; this constructive foundation supports Coq's strengths in program extraction and certified programming but can complicate classical reasoning. These logical differences influence usability: Isabelle's HOL aligns more closely with standard mathematical practice, reducing the need for workarounds in non-constructive domains. The proof languages reflect these logics and prioritize different styles of proof development. Isabelle employs Isar, a declarative language designed for structured, human-readable proofs that mimic natural mathematical discourse, enabling backward reasoning where goals are stated explicitly before tactics are applied.[62] This approach enhances proof maintainability and intelligibility, as Isar proofs can resemble textbook derivations without procedural details.[62] Coq, however, relies on Gallina for definitions and statements alongside Ltac, a tactic language that is procedural and programmable, allowing fine-grained control over proof search but often resulting in less readable, imperative-style scripts. While Ltac's flexibility supports complex automation, it can obscure the high-level proof structure compared to Isar's declarative format. Automation in both systems leverages external provers, but implementations vary in effectiveness. Isabelle's Sledgehammer translates goals to first-order logic for automated theorem provers (ATPs) like Vampire or E, then reconstructs verifiable Isar proofs from the results, achieving high success rates on HOL problems due to robust premise selection and reconstruction.[25] CoqHammer performs a similar role for Coq, translating dependent-type goals to first-order form and using machine learning for premise selection, but reconstruction in CIC's richer type system is more challenging, leading to lower automation rates for complex proofs.[63] Both tools integrate ATPs effectively, yet Isabelle's classical logic and translation strategies often yield better reconstruction fidelity. Both communities maintain extensive repositories of formalizations, fostering verification efforts. Isabelle's Archive of Formal Proofs (AFP) hosts over 650 peer-reviewed entries as of 2024 covering mathematics, computer science, and verification, providing reusable libraries integrated with the Isabelle distribution.[51][64] Coq's ecosystem includes the standard library and contributions via the Coq Platform, with thousands of formalizations in areas like software verification and type theory, supported by active development from Inria and contributors. These resources underscore both systems' strengths in formal verification, though Isabelle's AFP emphasizes curated, larger-scale developments while Coq's libraries excel in constructive and dependently typed components. Isabelle's strengths lie in its accessibility for users familiar with HOL and classical mathematics, offering readable proofs and strong automation for theorem proving in non-constructive settings. Coq, conversely, shines in scenarios requiring dependent types for fine-grained specifications and certified programs, enabling executable extractions from proofs but demanding more effort for classical results. These complementary profiles make Isabelle preferable for mathematical formalizations and Coq for dependently typed programming verification.With Lean
Isabelle and Lean represent two prominent interactive theorem provers with distinct implementation foundations. Isabelle is primarily implemented in Standard ML (SML), a functional programming language that facilitates rapid prototyping of deductive systems and proof procedures while ensuring system soundness.[1] In contrast, Lean is built using C++ as its core implementation language, with increasing portions of the system, including Lean 4, written in Lean itself to enable self-hosting and performance optimizations; this hybrid approach allows Lean to compile to efficient C code, making it particularly fast for handling large mathematical libraries.[65] Lean's design in C++ and Lean contributes to superior performance in processing extensive formalizations, such as those in its mathlib library, due to optimized memory management and compilation. Isabelle is based on higher-order logic (HOL), while Lean is based on dependent type theory via the calculus of constructions with inductive types, enabling more expressive specifications where types can depend on values and proofs can be treated as first-class objects.[65] Isabelle's HOL, while powerful for classical reasoning, remains more generic and object-logic agnostic, allowing instantiation to various logics without dependent type features, which provides flexibility for verification tasks but limits expressiveness in areas requiring fine-grained type dependencies compared to Lean.[1] This difference influences their applicability: Lean's dependent types support advanced mathematical constructions, such as universe polymorphism, that are less straightforward in Isabelle's setup. Proof development styles diverge notably between the two. Isabelle employs Isar, a structured language that produces human-readable proofs resembling natural mathematical discourse, emphasizing declarative steps over low-level manipulation.[1] Lean, while supporting a tactic mode for interactive, goal-directed proving similar to many systems, also integrates declarative styles akin to Isar through its mathlib library, which offers a vast, community-curated collection of over 210,000 formalized theorems across algebra, analysis, and topology as of May 2025, facilitating reusable mathematical components.[66] Lean's tactic system, enhanced in Lean 4 (released in 2023), allows seamless switching between scripted automation and declarative narratives, often yielding more concise proofs for complex mathematics.[67] Automation capabilities are comparable yet tailored differently. Both integrate automated theorem provers (ATPs) via hammers: Isabelle's Sledgehammer translates goals to first-order logic and invokes external ATPs like Vampire or E, reconstructing proofs in Isar with high success rates on HOL problems.[24] Lean 4 provides similar ATP interfaces through tools like Lean-auto, which exports goals to SMT solvers and ATPs, but benefits from tighter integration with its IDE, such as VS Code extensions offering real-time feedback and tactic suggestions, improving usability over Isabelle's jEdit-based environment.[68] These features make Lean 4's automation feel more modern and responsive for iterative proof development. In use cases, Isabelle excels in software and hardware verification, leveraging its maturity for industrial applications like the seL4 kernel proof, while Lean has gained traction in mathematical formalization, exemplified by its 2024 formalization of finite Ramsey theory, including exact values for small Ramsey numbers and van der Waerden variants.[1][69] Trade-offs highlight Isabelle's long-standing stability and extensive ecosystem for reliable verification against Lean's rapid evolution, driven by community contributions that have scaled mathlib dramatically since Lean 4's introduction, though this pace can introduce occasional instability in cutting-edge features.[70][71]References
- https://isabelle.in.tum.de/doc/[tutorial](/page/Tutorial).pdf
