DPLL algorithm
Figure: After 5 fruitless attempts (red), choosing the variable assignment a=1, b=1 leads, after unit propagation (bottom), to success (green): the top left CNF formula is satisfiable.
| Class | Boolean satisfiability problem |
|---|---|
| Data structure | Binary tree |
| Worst-case performance | O(2^n) |
| Best-case performance | O(1) (constant) |
| Worst-case space complexity | O(n) (basic algorithm) |
In logic and computer science, the Davis–Putnam–Logemann–Loveland (DPLL) algorithm is a complete, backtracking-based search algorithm for deciding the satisfiability of propositional logic formulae in conjunctive normal form, i.e. for solving the CNF-SAT problem.
It was introduced in 1961 by Martin Davis, George Logemann and Donald W. Loveland and is a refinement of the earlier Davis–Putnam algorithm, which is a resolution-based procedure developed by Davis and Hilary Putnam in 1960. Especially in older publications, the Davis–Logemann–Loveland algorithm is often referred to as the "Davis–Putnam method" or the "DP algorithm". Other common names that maintain the distinction are DLL and DPLL.
Implementations and applications
The SAT problem is important both from theoretical and practical points of view. In complexity theory it was the first problem proved to be NP-complete, and it appears in a broad variety of applications such as model checking, automated planning and scheduling, and diagnosis in artificial intelligence.
As such, writing efficient SAT solvers has been a research topic for many years. GRASP (1996-1999) was an early implementation using DPLL.[1] In the international SAT competitions, implementations based around DPLL such as zChaff[2] and MiniSat[3] took first place in the competitions in 2004 and 2005.[4]
Another application that often involves DPLL is automated theorem proving or satisfiability modulo theories (SMT), which is a SAT problem in which propositional variables are replaced with formulas of another mathematical theory.
The algorithm
The basic backtracking algorithm runs by choosing a literal, assigning a truth value to it, simplifying the formula and then recursively checking if the simplified formula is satisfiable; if this is the case, the original formula is satisfiable; otherwise, the same recursive check is done assuming the opposite truth value. This is known as the splitting rule, as it splits the problem into two simpler sub-problems. The simplification step essentially removes all clauses that become true under the assignment from the formula, and all literals that become false from the remaining clauses.
The DPLL algorithm enhances over the backtracking algorithm by the eager use of the following rules at each step:
- Unit propagation
- If a clause is a unit clause, i.e. it contains only a single unassigned literal, this clause can only be satisfied by assigning the necessary value to make this literal true. Thus, no choice is necessary. Unit propagation consists in removing every clause containing a unit clause's literal and in discarding the complement of a unit clause's literal from every clause containing that complement. In practice, this often leads to deterministic cascades of units, thus avoiding a large part of the naive search space.
- Pure literal elimination
- If a propositional variable occurs with only one polarity in the formula, it is called pure. A pure literal can always be assigned in a way that makes all clauses containing it true. Thus, when it is assigned in such a way, these clauses do not constrain the search anymore, and can be deleted.
Unsatisfiability of a given partial assignment is detected if one clause becomes empty, i.e. if all its variables have been assigned in a way that makes the corresponding literals false. Satisfiability of the formula is detected either when all variables are assigned without generating the empty clause, or, in modern implementations, if all clauses are satisfied. Unsatisfiability of the complete formula can only be detected after exhaustive search.
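To make the two rules concrete, here is a minimal Python sketch (an illustration, not a reference implementation; the representation and helper names are assumptions of this example). A formula is a list of clauses, each clause a set of nonzero integers, with -v denoting the negation of variable v:

```python
def unit_propagate(clauses):
    """Repeatedly satisfy unit clauses and simplify the formula."""
    assignment = []
    while True:
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            return clauses, assignment
        lit = units[0]
        assignment.append(lit)
        # Drop clauses satisfied by lit; delete ¬lit from the rest.
        clauses = [c - {-lit} for c in clauses if lit not in c]

def pure_literal_assign(clauses):
    """Assign every literal whose complement occurs nowhere."""
    literals = {l for c in clauses for l in c}
    pures = {l for l in literals if -l not in literals}
    clauses = [c for c in clauses if not c & pures]
    return clauses, sorted(pures)

# (x1 ∨ x2) ∧ (¬x1): the unit ¬x1 forces x1 = false, which
# reduces the first clause to the new unit x2.
print(unit_propagate([{1, 2}, {-1}]))   # ([], [-1, 2])
```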
The DPLL algorithm can be summarized in the following pseudocode, where Φ is the CNF formula:
```
Algorithm DPLL
  Input: A set of clauses Φ.
  Output: A truth value indicating whether Φ is satisfiable.

function DPLL(Φ)
    // unit propagation:
    while there is a unit clause {l} in Φ do
        Φ ← unit-propagate(l, Φ);
    // pure literal elimination:
    while there is a literal l that occurs pure in Φ do
        Φ ← pure-literal-assign(l, Φ);
    // stopping conditions:
    if Φ is empty then
        return true;
    if Φ contains an empty clause then
        return false;
    // DPLL procedure:
    l ← choose-literal(Φ);
    return DPLL(Φ ∧ {l}) or DPLL(Φ ∧ {¬l});
```
- "←" denotes assignment. For instance, "largest ← item" means that the value of largest changes to the value of item.
- "return" terminates the algorithm and outputs the following value.
In this pseudocode, unit-propagate(l, Φ) and pure-literal-assign(l, Φ) are functions that return the result of applying unit propagation and the pure literal rule, respectively, to the literal l and the formula Φ. In other words, they replace every occurrence of l with "true" and every occurrence of not l with "false" in the formula Φ, and simplify the resulting formula. The or in the return statement is a short-circuiting operator. Φ ∧ {l} denotes the simplified result of substituting "true" for l in Φ.
The algorithm terminates in one of two cases. Either the CNF formula Φ is empty, i.e., it contains no clauses; it is then satisfied by any assignment, as all of its clauses are vacuously true. Otherwise, the formula contains an empty clause, which is vacuously false because a disjunction requires at least one true member to be true. The existence of such a clause implies that the formula (a conjunction of all its clauses) cannot evaluate to true and must be unsatisfiable.
The pseudocode DPLL function only returns whether the final assignment satisfies the formula or not. In a real implementation, the partial satisfying assignment typically is also returned on success; this can be derived by keeping track of branching literals and of the literal assignments made during unit propagation and pure literal elimination.
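Putting the pieces together, the following is a compact, runnable sketch of the whole procedure in the same list-of-sets representation as the example above, returning a satisfying tuple of literals on success (an illustrative reimplementation, not the original 1962 program):

```python
def dpll(clauses, assignment=()):
    """Return a tuple of satisfying literals, or None if unsatisfiable."""
    # Unit propagation.
    while any(len(c) == 1 for c in clauses):
        lit = next(iter(next(c for c in clauses if len(c) == 1)))
        assignment += (lit,)
        clauses = [c - {-lit} for c in clauses if lit not in c]
    # Pure literal elimination.
    literals = {l for c in clauses for l in c}
    pures = {l for l in literals if -l not in literals}
    if pures:
        assignment += tuple(pures)
        clauses = [c for c in clauses if not c & pures]
    # Stopping conditions.
    if not clauses:
        return assignment              # all clauses satisfied
    if any(not c for c in clauses):
        return None                    # empty clause: dead end
    # Splitting rule (naive choice of branching literal).
    lit = next(iter(clauses[0]))
    for branch in (lit, -lit):
        result = dpll([c - {-branch} for c in clauses if branch not in c],
                      assignment + (branch,))
        if result is not None:
            return result
    return None

# (a ∨ b) ∧ (¬a ∨ b) ∧ (a ∨ ¬b) with a=1, b=2:
print(dpll([{1, 2}, {-1, 2}, {1, -2}]))   # a satisfying branch, e.g. (1, 2)
```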
The Davis–Logemann–Loveland algorithm depends on the choice of branching literal, which is the literal considered in the backtracking step. As a result, this is not exactly an algorithm, but rather a family of algorithms, one for each possible way of choosing the branching literal. Efficiency is strongly affected by the choice of the branching literal: there exist instances for which the running time is constant or exponential depending on the choice of the branching literals. Such choice functions are also called heuristic functions or branching heuristics.[5]
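Because the choice is isolated in one function, branching heuristics are drop-in replacements for the naive choice used in the sketch above. For example, a rule in the spirit of "maximum occurrences in clauses of minimum size" (the function name and scoring are illustrative, not a standard implementation):

```python
from collections import Counter

def choose_literal(clauses):
    """Pick the most frequent literal among the shortest clauses."""
    k = min(len(c) for c in clauses)
    counts = Counter(l for c in clauses if len(c) == k for l in c)
    return counts.most_common(1)[0][0]
```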
Formalization
A sequent calculus-like notation can be used to formalize many rewriting algorithms, including DPLL. A state is written M ∥ φ, where M is the current sequence of assigned literals (decision literals marked lᵈ) and φ is the CNF formula. The following are the 5 rules a DPLL solver can apply in order to either find a satisfying assignment M ⊨ φ or fail.[6][7]

- Unit propagate: If a clause of φ has exactly one literal l unassigned in M, with the negations of all its other literals in M, extend M with l. This rule represents the idea that a currently false clause with only one unset variable left forces that variable to be set in such a way as to make the entire clause true; otherwise the formula will not be satisfied.
- Pure literal: If a literal l appears in the formula but its negation does not, and l and ¬l are not in M, extend M with l. This rule represents the idea that if a variable appears only purely positively or purely negatively in a formula, all its occurrences can be set to true or false to make their corresponding clauses true.
- Decide: If a literal l is in the set of literals of φ and neither l nor ¬l is in M, then decide on the truth value of l and extend M with the decision literal lᵈ. This rule represents the idea that if no assignment is forced, a variable must be chosen and the assignment marked as a choice, so the search can return to it if the choice does not lead to a satisfying assignment.
- Backtrack: If some clause C is in φ, the negations of all its literals are in M, and M can be represented as M′ lᵈ N where N contains no decision literals, then backtrack by setting M to M′ ¬l. This rule represents the idea that if a contradiction is reached, the search must return to the most recent decision between two assignments and pick the other one.
- Fail: If some clause C is in φ, the negations of all its literals are in M, and there is no decision literal in M, then the DPLL algorithm fails. This rule represents the idea that if a contradiction is reached but nothing could have been done differently along the way, the formula is unsatisfiable.
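As a small worked example of these rules, assume the formula φ = (x ∨ y) ∧ (¬x ∨ y) ∧ (x ∨ ¬y) (an illustrative choice); one possible derivation is:

```latex
\emptyset \parallel \varphi
  \;\Longrightarrow_{\mathrm{Decide}}\; x^{d} \parallel \varphi
  \;\Longrightarrow_{\mathrm{UnitProp}}\; x^{d}\, y \parallel \varphi
```

After the unit propagation of y (forced by the clause ¬x ∨ y once x is true), every clause contains x or y, so M ⊨ φ and the run succeeds without ever using Backtrack.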
Visualization
Davis, Logemann, and Loveland (1962) developed this algorithm. Some properties of this original algorithm are:
- It is based on search.
- It is the basis for almost all modern SAT solvers.
- It does not use learning or non-chronological backtracking (introduced in 1996).
An example with visualization of a DPLL algorithm having chronological backtracking:

1. All clauses making up a CNF formula
2. Pick a variable
3. Make a decision: variable a = False (0), so the green clauses become true
4. After making several decisions, an implication graph is found that leads to a conflict
5. Backtrack to the most recent decision level and assign the opposite value to that variable by force
6. But the forced decision still leads to another conflict
7. Backtrack to the previous level and make a forced decision
8. Make a new decision, but it leads to a conflict
9. Make a forced decision, but again it leads to a conflict
10. Backtrack to the previous level
11. Continuing in this way, the final implication graph is reached
Related algorithms
Since 1986, (reduced ordered) binary decision diagrams have also been used for SAT solving.[citation needed]
In 1989-1990, Stålmarck's method for formula verification was presented and patented. It has found some use in industrial applications.[8]
DPLL has been extended for automated theorem proving for fragments of first-order logic by way of the DPLL(T) algorithm.[1]
In the 2010-2019 decade, work on improving the algorithm has found better policies for choosing the branching literals and new data structures to make the algorithm faster, especially the part on unit propagation. However, the main improvement has been a more powerful algorithm, Conflict-Driven Clause Learning (CDCL), which is similar to DPLL but after reaching a conflict "learns" the root causes (assignments to variables) of the conflict, and uses this information to perform non-chronological backtracking (aka backjumping) in order to avoid reaching the same conflict again. Most state-of-the-art SAT solvers are based on the CDCL framework as of 2019.[9]
Relation to other notions
Runs of DPLL-based algorithms on unsatisfiable instances correspond to tree resolution refutation proofs.[10]
References
General
- Davis, Martin; Putnam, Hilary (1960). "A Computing Procedure for Quantification Theory". Journal of the ACM. 7 (3): 201–215. doi:10.1145/321033.321034. S2CID 31888376.
- Davis, Martin; Logemann, George; Loveland, Donald (1962). "A Machine Program for Theorem Proving". Communications of the ACM. 5 (7): 394–397. doi:10.1145/368273.368557. hdl:2027/mdp.39015095248095. S2CID 15866917.
- Ouyang, Ming (1998). "How Good Are Branching Rules in DPLL?". Discrete Applied Mathematics. 89 (1–3): 281–286. doi:10.1016/S0166-218X(98)00045-6.
- John Harrison (2009). Handbook of practical logic and automated reasoning. Cambridge University Press. pp. 79–90. ISBN 978-0-521-89957-4.
Specific
- ^ a b Nieuwenhuis, Robert; Oliveras, Albert; Tinelli, Cesar (2004), "Abstract DPLL and Abstract DPLL Modulo Theories" (PDF), Proceedings Int. Conf. on Logic for Programming, Artificial Intelligence, and Reasoning, LPAR 2004, pp. 36–50
- ^ zChaff website
- ^ "Minisat website".
- ^ The international SAT Competitions web page, sat! live
- ^ Marques-Silva, João P. (1999). "The Impact of Branching Heuristics in Propositional Satisfiability Algorithms". In Barahona, Pedro; Alferes, José J. (eds.). Progress in Artificial Intelligence: 9th Portuguese Conference on Artificial Intelligence, EPIA '99 Évora, Portugal, September 21–24, 1999 Proceedings. LNCS. Vol. 1695. pp. 62–63. doi:10.1007/3-540-48159-1_5. ISBN 978-3-540-66548-9.
- ^ Nieuwenhuis, Robert; Oliveras, Albert; Tinelli, Cesare (2006-11-01). "Solving SAT and SAT Modulo Theories: From an abstract Davis--Putnam--Logemann--Loveland procedure to DPLL(T)". J. ACM. 53 (6): 937–977. doi:10.1145/1217856.1217859. ISSN 0004-5411.
- ^ Krstić, Sava; Goel, Amit (2007). Konev, Boris; Wolter, Frank (eds.). "Architecting Solvers for SAT Modulo Theories: Nelson-Oppen with DPLL". Frontiers of Combining Systems. Berlin, Heidelberg: Springer: 1–27. doi:10.1007/978-3-540-74621-8_1. ISBN 978-3-540-74621-8.
- ^ Stålmarck, G.; Säflund, M. (October 1990). "Modeling and Verifying Systems and Software in Propositional Logic". IFAC Proceedings Volumes. 23 (6): 31–36. doi:10.1016/S1474-6670(17)52173-4.
- ^ Möhle, Sibylle; Biere, Armin (2019). "Backing Backtracking". Theory and Applications of Satisfiability Testing – SAT 2019 (PDF). Lecture Notes in Computer Science. Vol. 11628. pp. 250–266. doi:10.1007/978-3-030-24258-9_18. ISBN 978-3-030-24257-2. S2CID 195755607.
- ^ Van Beek, Peter (2006). "Backtracking search algorithms". In Rossi, Francesca; Van Beek, Peter; Walsh, Toby (eds.). Handbook of constraint programming. Elsevier. p. 122. ISBN 978-0-444-52726-4.
Further reading
- Malay Ganai; Aarti Gupta (2007). SAT-based scalable formal verification solutions. Springer. pp. 23–32. ISBN 978-0-387-69166-4.
- Gomes, Carla P.; Kautz, Henry; Sabharwal, Ashish; Selman, Bart (2008). "Satisfiability Solvers". In Van Harmelen, Frank; Lifschitz, Vladimir; Porter, Bruce (eds.). Handbook of knowledge representation. Foundations of Artificial Intelligence. Vol. 3. Elsevier. pp. 89–134. doi:10.1016/S1574-6526(07)03002-7. ISBN 978-0-444-52211-5.
Prerequisites
Propositional Logic Basics
Propositional logic, also known as sentential logic, deals with propositions that can be either true or false, using connectives such as negation (¬), conjunction (∧), and disjunction (∨) to form compound statements.[6] The basic units are propositional variables, which are atomic symbols like p, q, or r, representing statements whose truth values are undetermined.[6] A literal is either a propositional variable (a positive literal) or the negation of a propositional variable (a negative literal), such as x or ¬x.[6] A clause is a disjunction of one or more literals, for example, (x ∨ ¬y ∨ z).[6] Conjunctive normal form (CNF) is a standardized representation where a formula is expressed as a conjunction of clauses, such as (x ∨ y) ∧ (¬x ∨ z).[6] This form is particularly useful in computational contexts because it structures formulas into a series of constraints that must all hold true simultaneously.[6]

A truth assignment is a function that maps each propositional variable to a truth value, either true (T) or false (F).[6] Under a given truth assignment, a literal evaluates to true if its variable is assigned true (for a positive literal) or false (for a negative literal), and a clause is satisfied if at least one of its literals is true; otherwise, the clause is falsified.[6] A CNF formula is satisfied by a truth assignment if every clause is satisfied, and falsified if at least one clause is falsified.[6] The empty clause, denoted □, consists of no literals and is inherently unsatisfiable, as there is no truth assignment that can make it true.[7] In contrast, the empty CNF formula, which is a conjunction with no clauses, is denoted ⊤ and is trivially satisfiable under any truth assignment, as there are no constraints to violate.[6]
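These definitions translate directly into a small executable form; the sketch below (the encoding conventions are assumptions of this example, not part of the source) evaluates literals, clauses, and CNF formulas under a total truth assignment:

```python
# A literal is a nonzero int: 3 means x3, -3 means ¬x3.
# A clause is a set of literals; a CNF formula is a list of clauses.

def literal_true(lit, assignment):
    """assignment maps a variable index to a bool."""
    value = assignment[abs(lit)]
    return value if lit > 0 else not value

def clause_satisfied(clause, assignment):
    return any(literal_true(l, assignment) for l in clause)

def formula_satisfied(cnf, assignment):
    return all(clause_satisfied(c, assignment) for c in cnf)

cnf = [{1, 2}, {-1, 3}]                 # (x1 ∨ x2) ∧ (¬x1 ∨ x3)
print(formula_satisfied(cnf, {1: True, 2: False, 3: True}))   # True
print(clause_satisfied(set(), {}))      # False: the empty clause
print(formula_satisfied([], {}))        # True: the empty formula
```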
Boolean Satisfiability Problem

The Boolean satisfiability problem (SAT) asks whether there exists a truth assignment to the variables of a given propositional formula that makes it evaluate to true. Formally, given a formula φ in conjunctive normal form (CNF), SAT is the problem of deciding if there is an assignment of truth values to the Boolean variables such that φ is satisfied. A CNF formula consists of a conjunction of clauses, where each clause is a disjunction of literals, and a literal is either a variable or its negation. SAT is the canonical NP-complete decision problem, for which complete solvers like DPLL have been developed to determine satisfiability.[8]

SAT was identified as the first NP-complete problem by Stephen Cook in 1971, establishing its central place in computational complexity theory.[8] The proof of NP-completeness relies on the Cook-Levin theorem, which demonstrates a polynomial-time reduction from the general verification problem in NP (simulating a nondeterministic Turing machine's accepting computation) to an instance of SAT. For the restricted 3-SAT variant, where every clause contains exactly three literals, NP-completeness follows from a polynomial-time reduction from general SAT: clauses with more than three literals are split using auxiliary variables, preserving satisfiability without altering the overall complexity.[8]

SAT exists in two primary forms: the decision version, which outputs yes if a satisfying assignment exists (SAT) or no otherwise (UNSAT), and the search version, which requires producing a satisfying assignment when one exists. The decision version is NP-complete, while the search version, though also computationally challenging, reduces to the decision version via self-reducibility: by iteratively fixing variables and querying the decision oracle, a satisfying assignment can be constructed using O(n) calls, where n is the number of variables.[9]

As an example, consider the 3-CNF formula φ = (x ∨ y ∨ z) ∧ (¬x ∨ ¬y ∨ ¬z). This instance is satisfiable, for example, under the assignment x = true, y = true, z = false, which makes both clauses true.
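The self-reducibility argument can be made concrete: a satisfying assignment is recovered with one decision query per variable by successively appending unit clauses. In the sketch below (illustrative names; the brute-force sat_decide stands in for any decision procedure such as DPLL):

```python
from itertools import product

def sat_decide(cnf, n):
    """Stand-in decision oracle: brute force over all 2^n assignments."""
    for bits in product([False, True], repeat=n):
        a = dict(enumerate(bits, start=1))
        if all(any((a[abs(l)] if l > 0 else not a[abs(l)]) for l in c)
               for c in cnf):
            return True
    return False

def sat_search(cnf, n):
    """Build a satisfying assignment via O(n) decision queries."""
    if not sat_decide(cnf, n):
        return None
    assignment = {}
    for v in range(1, n + 1):
        # Fix v = true if that keeps the formula satisfiable.
        if sat_decide(cnf + [{v}], n):
            cnf, assignment[v] = cnf + [{v}], True
        else:
            cnf, assignment[v] = cnf + [{-v}], False
    return assignment

print(sat_search([{1, 2}, {-1, 2}], n=2))   # {1: True, 2: True}
```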
Historical Development

Davis–Putnam Procedure
The Davis–Putnam procedure, introduced by Martin Davis and Hilary Putnam in 1960, serves as a foundational algorithm for deciding the satisfiability of propositional formulas in conjunctive normal form (CNF). It operates as the propositional component of a broader two-phase method for the first-order predicate calculus, focusing on systematically eliminating variables through resolution to reduce the formula until satisfiability can be determined. Unlike later variants, this procedure relies exclusively on resolution, without branching or propagation mechanisms, making it a complete but computationally intensive approach to the Boolean satisfiability problem (SAT).[2]

The core of the procedure is recursive variable elimination via resolution. Given a CNF formula φ, select an uneliminated propositional variable p. Partition the clauses into three sets: those containing p (denoted P), those containing ¬p (denoted N), and those containing neither (denoted R). Generate all non-tautological resolvents by applying the resolution rule to every pair of clauses, one from P and one from N: for clauses C ∨ p and D ∨ ¬p, the resolvent is C ∨ D. The updated formula becomes R together with the resolvents, with all clauses containing p or ¬p discarded. Recurse on the updated formula with the remaining variables until the formula is empty (indicating satisfiability, as the empty set of clauses is satisfiable) or contains the empty clause (indicating unsatisfiability). This process ensures completeness, as resolution preserves unsatisfiability, and termination occurs when no variables remain.[2][10]

To illustrate, consider the small CNF formula φ = (p ∨ q) ∧ (p ∨ r) ∧ (¬p ∨ ¬q ∨ ¬r). Select variable p for elimination. Here, P = {(p ∨ q), (p ∨ r)} and N = {(¬p ∨ ¬q ∨ ¬r)}, with R = ∅. Resolve each pair:

- (p ∨ q) with (¬p ∨ ¬q ∨ ¬r) yields (q ∨ ¬q ∨ ¬r) (tautology, discarded)
- (p ∨ r) with (¬p ∨ ¬q ∨ ¬r) yields (r ∨ ¬q ∨ ¬r) (tautology, discarded)

Both resolvents are discarded and R is empty, so the updated formula is empty and φ is satisfiable.
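A single elimination step is compact to express; the following sketch (illustrative, reusing the integer-literal clause encoding from the examples above) performs one Davis–Putnam resolution step:

```python
def eliminate(cnf, v):
    """One Davis-Putnam step: resolve variable v out of cnf."""
    pos = [c for c in cnf if v in c]              # clauses with v
    neg = [c for c in cnf if -v in c]             # clauses with ¬v
    rest = [c for c in cnf if v not in c and -v not in c]
    resolvents = []
    for cp in pos:
        for cn in neg:
            r = (cp - {v}) | (cn - {-v})
            if not any(-l in r for l in r):       # discard tautologies
                resolvents.append(r)
    return rest + resolvents

# (p ∨ q) ∧ (p ∨ r) ∧ (¬p ∨ ¬q ∨ ¬r) with p=1, q=2, r=3:
# eliminating p discards both (tautological) resolvents,
# leaving the empty formula, so the formula is satisfiable.
print(eliminate([{1, 2}, {1, 3}, {-1, -2, -3}], 1))   # []
```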
Introduction of DPLL
The Davis–Putnam–Logemann–Loveland (DPLL) algorithm emerged as a significant advancement in automated theorem proving, motivated by the limitations of the earlier Davis–Putnam (DP) procedure, which, despite its resolution-based approach to propositional satisfiability, suffered from inefficiencies in handling larger formulas due to exhaustive resolution without systematic search pruning.[1] In 1962, Martin Davis, George Logemann, and Donald W. Loveland published their refined method in the Communications of the ACM, introducing a practical computing procedure for deciding satisfiability in propositional logic, adaptable to first-order quantification theory through skolemization.[1] This work addressed DP's shortcomings by shifting from pure resolution to a hybrid strategy that incorporated dynamic simplification and search, enabling more efficient exploration of the formula's solution space.[1]

The core innovations of DPLL lie in its integration of backtracking search with simplification rules that prune the search tree effectively. Specifically, it employs unit propagation, which iteratively assigns values to literals in unit clauses (clauses with a single literal) and simplifies the formula accordingly, and pure literal elimination, which identifies literals appearing only positively or only negatively across all clauses and assigns them values to reduce the formula without loss of satisfiability.[1] These are combined with a chronological backtracking mechanism, where the algorithm branches on an undecided literal, recurses on both the literal and its negation, and returns to the most recent decision point upon failure, avoiding redundant computation.[1] This structure marked a departure from DP's static resolution, introducing a recursive, decision-directed search that dynamically adapts to the formula's constraints.[1]

The first outline of the DPLL algorithm appeared in pseudocode form, emphasizing its recursive nature. The procedure processes a conjunctive normal form (CNF) formula as follows:

```
function DPLL(Φ):
    if Φ contains the empty clause:
        return false // unsatisfiable
    if Φ has no clauses:
        return true // satisfiable
    perform unit propagation on Φ
    perform pure literal elimination on Φ
    if Φ is satisfiable after simplification:
        return true
    select an undecided literal l in Φ
    return DPLL(Φ ∧ l) or DPLL(Φ ∧ ¬l)
```
Core Algorithm
Informal Description
The DPLL algorithm decides the satisfiability of propositional formulas in conjunctive normal form (CNF) by systematically exploring possible truth assignments through a backtracking search. It builds partial assignments incrementally, applying simplification rules to reduce the formula and prune unproductive branches, recursing until a satisfying assignment is found or a contradiction proves unsatisfiability.[12]

A core simplification rule is unit propagation: if a clause reduces to a single literal (a unit clause), the algorithm assigns that literal to true, satisfying the clause, and propagates the effect by removing the complementary literal from other clauses and eliminating any now-satisfied clauses. This process repeats until no unit clauses remain, efficiently forcing necessary assignments without branching. Another rule is pure literal elimination: if a variable occurs only positively (or only negatively) across all remaining clauses, it is assigned true (or false) to satisfy those clauses safely, as the opposite polarity does not appear and cannot cause a contradiction. These rules, known respectively as the one-literal clause rule and the affirmative-negative rule, shrink the search space before branching.[12]

When no further simplifications apply, the algorithm branches by selecting an unassigned variable and recursively trying both truth values, continuing the search on the resulting subformulas. The search terminates as satisfiable if all clauses become satisfied under the current partial assignment, yielding a satisfying assignment; it terminates as unsatisfiable if an empty clause (indicating a contradiction) is derived during propagation.[12]

For illustration, consider the CNF formula (x ∨ y) ∧ (¬x ∨ ¬y) ∧ (x ∨ ¬y). No unit clauses or pure literals exist initially, so branch on x assigned true: the first and third clauses are satisfied, while the second simplifies to the unit clause ¬y, forcing y false via propagation. All clauses are now satisfied with no empty clause derived, confirming satisfiability under this assignment. (Branching on x assigned false would instead propagate to the conflicting unit clauses y and ¬y, but success on the first branch suffices.)[12]

Formal Specification
The DPLL algorithm takes as input a conjunctive normal form (CNF) formula φ defined over a finite set V of propositional variables.[1] The algorithm determines whether φ is satisfiable by constructing a partial truth assignment that satisfies all clauses in φ.[13]

The state of the algorithm is represented by a partial assignment α, which maps a subset of the variables in V to truth values (where 0 denotes false and 1 denotes true), and the current set of clauses φ|α, obtained by simplifying φ under α. Specifically, φ|α consists of the clauses of φ with the literals falsified by α removed, a clause being reduced to the empty clause if all of its literals are falsified.[13] A clause in φ|α is satisfied if it contains a literal set to true by α; otherwise, it is either unit (containing exactly one unassigned literal) or non-unit. The domain dom(α) is the set of variables assigned by α.

The algorithm proceeds via a set of inference rules, applied in sequent calculus style, that transform the state until a base case is reached. These rules are:

- Propagate (Unit Rule): If φ|α contains a unit clause {l} where l is an unassigned literal, extend α so that l becomes true (which falsifies ¬l) and update φ|α accordingly. This rule is applied exhaustively until no unit clauses remain.[1]
- Pure (Elimination Rule): If a variable x appears purely (i.e., only as x or only as ¬x across all clauses in φ|α), extend α with x ↦ 1 (or x ↦ 0 for the negative case) and remove all clauses containing that literal from φ|α. This rule is applied whenever applicable.[1]
- Decide (Branch Rule): If no unit or pure literals exist, select an unassigned variable x and branch by first extending α with x ↦ 1 and recursing on the resulting φ|α, then (upon backtracking) extending α with x ↦ 0 (equivalent to asserting ¬x) and recursing on that φ|α.[13]
- Backtrack (Failure Rule): Upon recursive failure in a branch (e.g., reaching an unsatisfiable subproblem), revert to the parent state by undoing the last decision and the propagations that followed it.[1]
- Fail (Empty Clause Rule): If φ|α contains the empty clause □, the current state is unsatisfiable; return UNSAT and backtrack.[13]
Each rule is justified as follows:

- Propagate is sound because any satisfying assignment extending α must set the unit literal l to true, so extending α with it maintains equivalence.[1]
- Pure elimination is sound since setting a pure literal to match its occurrences satisfies the affected clauses without introducing conflicts, while the opposite assignment could only falsify them.[1]
- Decide preserves satisfiability by exhaustive branching on an unassigned variable, ensuring completeness over all possible extensions.[13]
- Backtrack and Fail ensure termination without loss of solutions, as only unsatisfiable branches are pruned.[1] Together, these properties guarantee that DPLL correctly determines satisfiability.[13]
Search Tree Visualization
The search tree of the DPLL algorithm models its backtracking execution as a binary decision tree, where each node represents a partial assignment to the Boolean variables derived from prior decisions and propagations. The root node corresponds to the empty assignment at the start of the procedure. From any internal node, two outgoing edges represent the branching decision on a selected unassigned variable, one edge for assigning it true and the other for false; these edges lead to child nodes updated with the new assignment plus any literals forced by unit propagation or pure literal elimination.[1] Pruning occurs when propagation at a node results in a contradiction, such as an empty clause or conflicting unit literals, causing the algorithm to abandon the entire subtree rooted there and backtrack to the most recent decision point. Successful leaves occur when all clauses are satisfied under the partial assignment, marking a SAT outcome with a (possibly partial) satisfying assignment; unsuccessful leaves indicate an UNSAT outcome for that path due to an irrecoverable contradiction with no further variables to branch on. This structure ensures that only viable paths toward satisfiability are explored, with propagations collapsing multiple levels of the tree into single nodes labeled by the forced assignments.[1] To illustrate, consider the execution tree for the 2-variable CNF formula , which is satisfiable (e.g., via , ). The root is the empty assignment , with no initial units. Branching on (a common static ordering choice) yields two children:- Left branch (): The first and third clauses are satisfied (by ), removed. The second clause simplifies to , propagating . All clauses are now satisfied; this SAT leaf terminates successfully with assignment .
- Right branch (): The first clause simplifies to , propagating . The second clause is satisfied (by ). The third clause then becomes empty (falsified literals only), yielding a contradiction; this leaf prunes the subtree, and backtracking occurs.
Theoretical Analysis
Correctness and Completeness
The DPLL algorithm is sound, meaning that if it returns "satisfiable" (SAT) for a propositional formula in conjunctive normal form (CNF), then a satisfying assignment indeed exists. This follows from the construction of the algorithm: unit propagation and pure literal elimination preserve satisfiability by only assigning values that are forced or harmless to the formula, while decision branches explore partial assignments that maintain equivalence to the original formula under the current implications. Upon reaching a complete assignment at a leaf of the search tree, the algorithm verifies that all clauses are satisfied, ensuring no false positives.

DPLL is also complete, ensuring that if the input formula is satisfiable, the algorithm will find a satisfying assignment. Completeness arises from the exhaustive nature of the search: the algorithm branches on unassigned variables in a depth-first manner, with the unit and pure literal rules eliminating only branches that cannot lead to a model, so no potential satisfying assignment is skipped. Chronological backtracking ensures that all 2^n possible assignments are implicitly covered until a model is found or all paths are exhausted, declaring "unsatisfiable" (UNSAT) only when no model exists.

The algorithm terminates for any finite input formula with n variables, as the search tree has finite depth bounded by n (one level per variable assignment) and branching factor at most 2, with no cycles due to the monotonic progress in assigning variables and the backtracking mechanism that unwinds decisions in reverse order without repetition.[16]

A formal proof of correctness can be established by induction on the size of the formula or the depth of the search tree. The base case holds for empty or trivially satisfied/unsatisfied formulas. For the inductive step, each application of unit propagation, pure literal elimination, or decision preserves logical equivalence or implication between the original formula and the simplified subformula, ensuring that satisfiability is neither introduced nor lost; backtracking restores prior states correctly, maintaining the invariant that the unexplored branches cover all remaining possibilities. Special cases illustrate the algorithm's robustness: tautological formulas, which are satisfied by every assignment, are detected as SAT through propagation or shallow search, while contradictions (e.g., a formula containing the empty clause) immediately propagate to UNSAT without branching.[16]

Time and Space Complexity
The Davis–Putnam–Logemann–Loveland (DPLL) algorithm has a worst-case time complexity of O(2^n), where n is the number of variables in the propositional formula, arising from the potential need to explore a complete binary search tree of depth n in the backtracking process. This exponential bound reflects the algorithm's completeness for deciding satisfiability, as it may evaluate up to 2^n partial assignments before concluding unsatisfiability. In the best case, however, particularly when unit propagation heavily prunes the search space (such as in formulas where most variables are forced early), the running time reduces to O(nm), with m denoting the number of clauses, due to the linear-time cost of propagation scans across the formula at each of up to n steps. For special classes like Horn formulas, the complexity is polynomial in the input size, as the search proceeds without excessive branching.[17][18][19]

The space complexity of DPLL is linear in the size of the input, accounting for the storage of the input clauses and the stack maintaining the current partial assignment trail, which grows to at most n entries during recursion. This linear space requirement enables efficient backtracking without duplicating the entire formula, though practical implementations may use additional structures like watched literals to optimize propagation without increasing asymptotic space.

Factors influencing performance include the branching heuristic, such as selecting the most frequent literal in the current formula to minimize search tree size, and the efficiency of unit propagation, which can dramatically reduce the effective branching factor. Poor heuristic choices can lead to near-worst-case behavior even on average instances.[20]

Empirically, early implementations of DPLL in the 1960s solved satisfiability instances with up to a few tens of variables, limited by the exponential worst case on challenging inputs.[21] Advances in hardware have extended the practical reach of unenhanced DPLL to somewhat larger instances, though performance remains sensitive to formula structure. Asymptotically, no polynomial-time algorithm for general SAT exists unless P = NP, and random 3-SAT formulas display a phase transition at a clause-to-variable ratio of approximately 4.26, where instances become exponentially harder for DPLL due to balanced satisfiability probabilities and propagation bottlenecks.[22]

Modern Enhancements
Conflict-Driven Clause Learning
Conflict-driven clause learning (CDCL) represents a pivotal advancement over the original DPLL algorithm, incorporating mechanisms to analyze conflicts during search and derive new clauses that prune the search space more effectively. Introduced in the GRASP solver, CDCL extends DPLL's chronological backtracking by enabling non-chronological backtracking and by learning clauses upon encountering unsatisfiable subproblems (UNSAT leaves in the search tree). This evolution allows solvers to jump back to earlier decision levels based on conflict analysis rather than simply undoing the most recent assignments, significantly reducing redundant exploration of the search tree.[23]

The core mechanism of CDCL revolves around conflict analysis using an implication graph, which models the unit propagations performed during Boolean constraint propagation (BCP). When a conflict arises, typically when a clause becomes falsified under the current partial assignment, the implication graph is constructed with nodes for literals and a special conflict node (⊥), and directed edges representing implications from antecedent clauses. To derive a learned clause, the algorithm identifies a cut in this graph, often via the first unique implication point (1-UIP) scheme, which places the cut at the literal of the current decision level through which all paths from the decision to the conflict pass. This process, grounded in resolution steps along the graph, yields a new clause that explains the conflict and is added to the formula, ensuring the solver avoids similar conflicts in future branches. The 1-UIP approach balances clause length and explanatory power, producing assertive yet concise learned clauses.[24]

Consider a simplified example of conflict analysis. Suppose decisions and propagations assign x = true (decision at level 1), implying ¬y via the clause (¬x ∨ ¬y), and later z = true (decision at level 2), implying y via (¬z ∨ y), resulting in a conflict since y and ¬y cannot both hold. The implication graph shows the chains x → ¬y and z → y, with edges into ⊥ from the conflicting literals. Conflict analysis resolves the falsified clause (¬z ∨ y) with (¬x ∨ ¬y), the antecedent of ¬y, deriving the learned clause (¬x ∨ ¬z): x and z cannot both be true. Upon learning this clause, the solver backjumps to level 1, where the learned clause becomes unit (x is still assigned true), and propagates ¬z, pruning the invalid subtree.[24]

The impact of CDCL has been transformative, dramatically enhancing the scalability of SAT solvers by reducing the number of backtracks and enabling the solution of large industrial instances. For instance, the Chaff solver, which popularized the 1-UIP learning scheme alongside efficient data structures like watched literals, demonstrated orders-of-magnitude speedups over prior DPLL-based solvers and routinely handled problems with up to a million variables and clauses in benchmarks from hardware verification. This breakthrough, building directly on GRASP's foundations, established CDCL as the dominant paradigm in modern SAT solving.[25][23]
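The resolution step at the heart of this analysis is mechanical; the toy sketch below replays the example (illustrative only, omitting the implication-graph bookkeeping of a real CDCL solver):

```python
def resolve(c1, c2, var):
    """Resolution on var: (C ∨ var) and (D ∨ ¬var) give C ∨ D."""
    assert var in c1 and -var in c2
    return (c1 - {var}) | (c2 - {-var})

# Literals: 1 = x, 2 = y, 3 = z; negative ints are negations.
conflict_clause = {-3, 2}        # (¬z ∨ y), falsified under x, ¬y, z
reason_for_not_y = {-1, -2}      # (¬x ∨ ¬y), the antecedent of ¬y
learned = resolve(conflict_clause, reason_for_not_y, 2)
print(learned)                   # the clause (¬x ∨ ¬z), i.e. {-1, -3}
```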
Heuristics and Optimizations

Branching heuristics guide the selection of literals during the search process in DPLL-based SAT solvers, significantly influencing efficiency by prioritizing decisions that reduce the search space. Frequency-based heuristics such as the Dynamic Largest Individual Sum (DLIS) compute scores based on the frequency of literals in currently unsatisfied clauses, selecting the unassigned literal with the highest occurrence count to maximize immediate clause satisfaction. Used in early implementations like the GRASP solver, DLIS provides a straightforward, state-dependent approach but incurs high computational overhead due to frequent score updates during propagation and backtracking.[26][27]

Other dynamic heuristics incorporate search history and clause characteristics. The Maximum Occurrences in clauses of Minimum Size (MOMS) heuristic favors variables appearing most frequently in the shortest unsatisfied clauses, typically prioritizing binary clauses to exploit unit propagation and accelerate conflict detection.[28] This method, implemented in solvers like Sata, weights occurrences in longer clauses with diminishing factors (e.g., treating two ternary clauses as equivalent to one binary clause), promoting decisions that resolve tight constraints early.[28] A more advanced dynamic heuristic, the Variable State Independent Decaying Sum (VSIDS), maintains a score for each literal, initialized from its occurrences in the input clauses and incremented during clause learning, with periodic decay to emphasize recent conflicts; the highest-scoring unassigned literal is selected for branching.[25] Employed in the Chaff solver, VSIDS achieves an order-of-magnitude speedup on challenging instances by adapting to learned clauses at modest cost, roughly 10% of runtime.[25]

Unit propagation, a core DPLL operation, benefits from optimizations like the two-watched-literals scheme, which maintains two watched literals per clause to enable near-constant-time detection of unit clauses.[25] In this approach, a clause is only scanned when one of its watched literals is falsified, allowing efficient replacement with another unassigned literal, or an implication if the clause has become unit; backtracking requires no watched-literal updates, minimizing cache misses and memory accesses.[25] Pioneered in Chaff, this technique makes propagation (which accounts for over 90% of solver runtime) dramatically cheaper, yielding one to two orders of magnitude in overall performance gains compared to prior methods.[25]

Restart policies mitigate search stagnation by periodically resetting the solver to the root of the search tree after a fixed or growing number of conflicts, diversifying exploration.[29] The geometric policy, as in early MiniSat versions, starts with short intervals (e.g., 100 conflicts) and increases them exponentially (by a factor of 1.5), balancing frequent restarts for UNSAT instances with longer runs for SAT ones.[29] Luby's policy, derived from randomized algorithms, follows a sequence like 1, 1, 2, 1, 1, 2, 4 (scaled by a factor such as 32), providing irregular intervals that outperform geometric restarts on industrial benchmarks, solving up to 180 of 251 instances versus 164.[29] These policies enhance clause-learning efficiency, with nontrivial restart schemes solving 16-30% more instances than no restarts in empirical studies.[29]
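The Luby sequence behind such restart policies is easy to generate; a small sketch (the scaling factor of 32 mirrors the example above):

```python
def luby(i):
    """i-th term (1-indexed) of the Luby sequence 1,1,2,1,1,2,4,1,1,2,..."""
    k = 1
    while (1 << k) - 1 < i:          # smallest k with i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - ((1 << (k - 1)) - 1))

unit = 32   # conflicts allowed before the i-th restart = unit * luby(i)
print([unit * luby(i) for i in range(1, 8)])   # [32, 32, 64, 32, 32, 64, 128]
```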
Parallelization in DPLL solvers often employs portfolio approaches, running multiple differently configured sequential solvers concurrently to partition the search space or share learned clauses.[30] HordeSat, a massively parallel portfolio solver, uses a decentralized, hierarchical design on clusters with thousands of cores, interleaving clause exchange among instances of varied solvers like Glucose or Lingeling.[30] It has excelled on SAT competition benchmarks from the 2010s, solving all application-track benchmarks from 2011 and 2014 with scalable speedups on hard problems, outperforming single-threaded winners by leveraging solver diversity without centralized coordination.[30]

Recent advancements incorporate machine learning to guide literal selection, surpassing traditional heuristics on structured instances. AutoModSAT employs large language models to automatically discover and integrate novel branching heuristics into existing solvers, achieving 30% better performance than state-of-the-art baselines and 20% average speedup over tuned variants across diverse benchmarks, including random k-SAT, where it resolves instances 20-30% faster by predicting high-impact decisions from instance graphs.[31]
Applications

Formal Verification and Model Checking
The DPLL algorithm and its modern variants, particularly conflict-driven clause learning (CDCL), play a central role in bounded model checking (BMC), a technique for verifying finite-state transition systems by encoding execution paths of bounded length as propositional satisfiability (SAT) problems. In BMC, a system's transition relation is unrolled for a fixed number of steps k, transforming the question of whether a safety property holds within k steps into the satisfiability of a corresponding conjunctive normal form (CNF) formula; if the formula is unsatisfiable, no violating path exists up to depth k. This encoding leverages DPLL's backtracking search to systematically explore the space of bounded executions, efficiently pruning infeasible paths through unit propagation and decision heuristics. The approach was pioneered in the late 1990s as an alternative to binary decision diagram (BDD)-based methods, offering scalability for complex hardware designs where BDDs suffer from state explosion.

A representative example involves verifying a safety property in a digital circuit, such as ensuring no deadlock occurs in a pipelined processor. The circuit's state variables are unrolled over k time steps, with clauses encoding the initial state I(s_0), the transition relation at each step (e.g., T(s_t, i_t, s_{t+1}), where s_t are the state variables and i_t the inputs at step t), and the negation of the safety property (e.g., ¬P(s_t) for some t ≤ k); the resulting CNF is then solved using a DPLL-based SAT solver to detect any bounded violation. Tools like MiniSat, a lightweight CDCL solver, and Z3, an SMT solver with strong SAT capabilities, are widely employed in very-large-scale integration (VLSI) design for such property checking, integrating with hardware description languages like Verilog through front-end translators that generate CNF encodings. These tools enable automated verification of intricate components, such as arithmetic units, by exploiting CDCL's learned clauses to accelerate conflict analysis across unrolling depths.[32]

BMC with DPLL/CDCL provides completeness for bounded verification depths, guaranteeing detection of all bugs within k steps if they exist, while avoiding the incompleteness of abstraction-based methods on large state spaces. Its scalability stems from CDCL enhancements such as non-chronological backtracking and clause learning, allowing practical analysis of large industrial circuits at unrolling depths exceeding 100 steps, far beyond the reach of the basic DPLL procedure. A notable case study is Intel's application of SAT-based BMC in processor verification, evolving from early 1990s efforts in bug detection (such as the analysis of floating-point division after the Pentium FDIV incident) to routine use in modern CPU designs like Xeon and Core architectures, where it confirms equivalence and safety properties across formulas with billions of clauses.[33]
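As a toy illustration of the unrolling, the sketch below encodes a one-bit toggle register (s_{t+1} = ¬s_t, initially s_0 = 0) and asks whether s_t = 1 is reachable within k steps; bmc_toggle and the brute-force check are illustrative stand-ins for a real CNF encoder and SAT solver:

```python
from itertools import product

def bmc_toggle(k):
    """CNF for: s_0 = 0, s_{t+1} = ¬s_t, and s_t = 1 at some t <= k."""
    # DIMACS-style variables: the integer t+1 stands for s_t.
    clauses = [{-1}]                               # initial state ¬s_0
    for t in range(k):                             # s_{t+1} <-> ¬s_t
        clauses += [{t + 1, t + 2}, {-(t + 1), -(t + 2)}]
    clauses.append({t + 1 for t in range(k + 1)})  # property violation
    return clauses

def brute_force_sat(clauses, n):
    """Stand-in for a DPLL/CDCL solver."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

print(brute_force_sat(bmc_toggle(1), n=2))   # True: s_1 = 1 at depth 1
print(brute_force_sat(bmc_toggle(0), n=1))   # False: s_0 is fixed to 0
```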
Artificial Intelligence and Planning

The DPLL algorithm underpins SAT solvers that enable automated planning by encoding classical planning problems, such as those in the STRIPS formalism, into conjunctive normal form (CNF) formulas whose satisfiability corresponds to the existence of a valid plan. In this approach, planning domains and problems are translated into a plan graph, from which mutex relations (mutually exclusive facts or actions) are derived to generate compact CNF encodings with fewer variables and clauses compared to direct translations. The Blackbox planner exemplifies this method, converting STRIPS inputs in PDDL format into SAT instances solved by DPLL-based engines like Satz or WalkSAT, achieving competitive performance on benchmarks such as logistics domains by leveraging plan graph pruning and randomized restarts.[34]

In model-based diagnosis, DPLL-driven SAT solving supports consistency-based techniques to identify minimal sets of faulty components that explain observed system behaviors. The system description, component health assumptions, and observations are encoded as a Boolean formula, with cardinality constraints limiting the number of faults to ensure minimal diagnoses; iterative SAT calls with incremental solvers then compute these sets efficiently. This SAT compilation outperforms traditional hitting-set algorithms on large benchmarks like ISCAS-85 circuits, solving thousands of observations in seconds by exploiting the conflict-driven learning of modern DPLL implementations.[35]

Modern applications extend DPLL to AI domains through SMT variants like DPLL(T), as in Reluplex, which verifies deep neural networks by encoding ReLU activations and linear constraints into SMT formulas solved via DPLL(T) with on-demand splitting for non-linearities. Reluplex proves safety properties, such as robustness to adversarial inputs in collision-avoidance networks, by iteratively refining bounds and ReLU states, scaling to networks with hundreds of nodes, which is critical for AI safety in autonomous systems in the 2020s.[36]

A representative example is the blocks world domain, where an initial configuration (e.g., block B on A, A on the table, B clear) and a goal (e.g., A on B, B on the table, A clear) are encoded into CNF using time-indexed predicates like on(x,y,t) and clear(x,t), with action variables move(x,from,to,t) enforcing preconditions, effects, and frame axioms over a bounded horizon. Satisfying this formula via a DPLL solver yields a model whose true action literals form a plan sequence, such as moving B to the table at step 1 (requiring clear(B,1) and establishing on(B,table,2)) followed by moving A onto B at step 2, confirming a two-step solution.[37]
SAT4J, a Java-based DPLL-derived solver, impacts robotics by verifying control software in modular systems, transforming extended finite state machine guards into CNF to check nondeterminism (ensuring mutually exclusive transitions) and totality (covering all inputs), thus enabling reliable mission planning in platforms like ATRON robots. As of 2025, trends in hybrid ML-SAT approaches integrate large language models to autonomously discover and optimize solver heuristics, yielding 30% gains over state-of-the-art DPLL variants on planning benchmarks and enhancing constraint satisfaction in AI tasks like scheduling.[38][31]
