Variable elimination
Variable elimination (VE) is a simple and general exact inference algorithm in probabilistic graphical models, such as Bayesian networks and Markov random fields.[1] It can be used to infer the maximum a posteriori (MAP) state or to estimate conditional or marginal distributions over a subset of variables. The algorithm has exponential time complexity in the worst case, but can be efficient in practice for low-treewidth graphs if a proper elimination order is used.
Factors
Enabling a key reduction in algorithmic complexity, a factor f, also known as a potential, over a set of variables V is a relation that maps each instantiation v of V to a non-negative number, commonly denoted f(v).[2] A factor does not necessarily have a fixed interpretation: the same operations can be performed on factors representing different quantities, such as a joint probability distribution or a conditional distribution.[2] Full joint distributions quickly become too large to handle, since the cost of operating on them grows exponentially with the number of variables; variable elimination therefore works with a factorized representation of the distribution instead.
Basic Operations
Variable Summation
Algorithm 1, called sum-out (SO) or marginalization, eliminates a single variable Z from a set Φ of factors[3] and returns the resulting set of factors. The subroutine collect-relevant simply returns those factors in Φ that involve the variable Z.
Algorithm 1 sum-out(Z, Φ)
- Φ′ = collect-relevant(Z, Φ), the factors in Φ whose scope contains Z
- ψ = the product of all factors in Φ′
- return (Φ \ Φ′) ∪ {Σ_Z ψ}
Example
Here we have a joint probability distribution over five binary variables, written X₁, …, X₅ (the original variable names are immaterial). A variable, say X₁, can be summed out between a set of instantiations that agree on all of the remaining variables; the value of X₁ itself is irrelevant when it is the variable being summed out.[2]

| X₁ | X₂ | X₃ | X₄ | X₅ | value |
|---|---|---|---|---|---|
| true | true | true | false | false | 0.80 |
| false | true | true | false | false | 0.20 |

After eliminating X₁, its column is dropped and we are left with a distribution over the remaining variables only, each instantiation taking the sum of the matching values.

| X₂ | X₃ | X₄ | X₅ | value |
|---|---|---|---|---|
| true | true | false | false | 1.0 |

The distribution resulting from the sum-out operation can only help answer queries that do not mention X₁.[2] It is also worth noting that the sum-out operation is commutative.
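A minimal Python sketch of the sum-out operation, assuming table factors represented as (scope, table) pairs where the table maps assignment tuples (in scope order) to values; the representation and names are illustrative rather than from a specific library:

```python
from itertools import product

def sum_out(var, factors, domains):
    """Eliminate `var` from a set of table factors, as in Algorithm 1."""
    relevant = [f for f in factors if var in f[0]]        # collect-relevant
    rest = [f for f in factors if var not in f[0]]
    # Scope of the new factor: every variable of the relevant factors except `var`.
    scope = tuple(dict.fromkeys(v for s, _ in relevant for v in s if v != var))
    table = {}
    for assignment in product(*(domains[v] for v in scope)):
        ctx = dict(zip(scope, assignment))
        total = 0.0
        for value in domains[var]:                        # sum over the eliminated variable
            ctx[var] = value
            prod_val = 1.0
            for s, t in relevant:                         # product of the relevant factors
                prod_val *= t[tuple(ctx[v] for v in s)]
            total += prod_val
        table[assignment] = total
    return rest + [(scope, table)]

# Example: summing out B from the conditional table P(B | A) gives 1.0 for each A.
domains = {"A": (True, False), "B": (True, False)}
phi = (("A", "B"), {(True, True): 0.9, (True, False): 0.1,
                    (False, True): 0.1, (False, False): 0.9})
print(sum_out("B", [phi], domains))
```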
Factor Multiplication
Computing the product of several factors yields a factor whose value at each instantiation is the product of the factors' values at the instantiations consistent with it.[2]
Algorithm 2 mult-factors(φ₁, …, φₖ)[2]
- Z = the union of the variables appearing in φ₁, …, φₖ
- φ = a factor over Z such that φ(z) = 1 for all z
- For each instantiation z of Z
  - For i = 1 to k
    - zᵢ = the instantiation of the variables of φᵢ consistent with z
    - φ(z) = φ(z) × φᵢ(zᵢ)
- return φ
Factor multiplication is not only commutative but also associative.
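A corresponding Python sketch of Algorithm 2 under the same illustrative (scope, table) representation, building the product factor over the union of the variables:

```python
from itertools import product

def mult_factors(factors, domains):
    """Multiply table factors into a single factor over the union of their scopes."""
    scope = tuple(dict.fromkeys(v for s, _ in factors for v in s))   # Z = union of variables
    table = {}
    for z in product(*(domains[v] for v in scope)):                  # each instantiation z of Z
        ctx = dict(zip(scope, z))
        value = 1.0
        for s, t in factors:
            value *= t[tuple(ctx[v] for v in s)]                     # phi_i at the part of z over its scope
        table[z] = value
    return (scope, table)

# Example: multiplying P(A) and P(B | A) gives the joint P(A, B).
domains = {"A": (True, False), "B": (True, False)}
phi_a = (("A",), {(True,): 0.5, (False,): 0.5})
phi_b_given_a = (("A", "B"), {(True, True): 0.9, (True, False): 0.1,
                              (False, True): 0.1, (False, False): 0.9})
print(mult_factors([phi_a, phi_b_given_a], domains)[1][(True, True)])   # 0.45
```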
Inference
The most common query type is of the form P(X | E = e), where X and E are disjoint subsets of the variables U and E is observed taking value e. A basic algorithm for computing P(X | E = e) is called variable elimination (VE), first put forth by Zhang and Poole.[1]
Taken from,[1] this algorithm computes P(X | E = e) from a discrete Bayesian network B. VE calls SO to eliminate variables one by one. More specifically, in the VE algorithm below, Φ is the set C of conditional probability tables (henceforth "CPTs") of B, X is a list of query variables, E is a list of observed variables, e is the corresponding list of observed values, and σ is an elimination ordering for the variables in Y, where Y denotes U \ (X ∪ E).
Variable Elimination Algorithm VE(C, X, E, e, σ)
- Φ = the factors obtained by multiplying the evidence E = e into the appropriate CPTs of C
- While σ is not empty
  - Remove the first variable Z from σ
  - Φ = sum-out(Z, Φ)
- φ = the product of all factors in Φ
- return φ(X) / Σ_X φ(X)
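A compact, self-contained Python sketch of the full VE procedure under the same kind of illustrative table-factor representation (a scope tuple plus a dictionary of values); evidence is handled by restricting the CPTs, and the elimination order σ is expected to list every variable other than the query and evidence variables:

```python
from itertools import product

def restrict(factor, evidence):
    """Fix observed variables to their values and drop them from the scope."""
    scope, table = factor
    new_scope = tuple(v for v in scope if v not in evidence)
    new_table = {}
    for assignment, value in table.items():
        ctx = dict(zip(scope, assignment))
        if all(ctx[v] == evidence[v] for v in scope if v in evidence):
            new_table[tuple(ctx[v] for v in new_scope)] = value
    return (new_scope, new_table)

def multiply(factors, domains):
    """Product of table factors over the union of their scopes."""
    scope = tuple(dict.fromkeys(v for s, _ in factors for v in s))
    table = {}
    for z in product(*(domains[v] for v in scope)):
        ctx = dict(zip(scope, z))
        value = 1.0
        for s, t in factors:
            value *= t[tuple(ctx[v] for v in s)]
        table[z] = value
    return (scope, table)

def sum_out(var, factor):
    """Marginalize `var` out of a single table factor."""
    scope, table = factor
    new_scope = tuple(v for v in scope if v != var)
    new_table = {}
    for assignment, value in table.items():
        ctx = dict(zip(scope, assignment))
        key = tuple(ctx[v] for v in new_scope)
        new_table[key] = new_table.get(key, 0.0) + value
    return (new_scope, new_table)

def ve(cpts, evidence, order, domains):
    """Return the normalized posterior over whatever variables σ leaves uneliminated."""
    factors = [restrict(f, evidence) for f in cpts]
    for var in order:                                     # eliminate one variable at a time
        relevant = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        factors.append(sum_out(var, multiply(relevant, domains)))
    scope, table = multiply(factors, domains)
    z = sum(table.values())
    return scope, {k: v / z for k, v in table.items()}

# Example: P(A | B = true) in the two-node network A -> B.
domains = {"A": (True, False), "B": (True, False)}
cpts = [(("A",), {(True,): 0.6, (False,): 0.4}),
        (("A", "B"), {(True, True): 0.9, (True, False): 0.1,
                      (False, True): 0.2, (False, False): 0.8})]
print(ve(cpts, evidence={"B": True}, order=[], domains=domains))
# -> (('A',), {(True,): ~0.871, (False,): ~0.129})
```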
Ordering
Finding the optimal order in which to eliminate variables is an NP-hard problem. As such, there are heuristics one may follow to choose a good order (a sketch of both heuristics is given after the list):
- Minimum Degree: eliminate the variable that results in constructing the smallest factor possible.[2]
- Minimum Fill: construct an undirected graph whose edges connect variables that appear together in some CPT, and eliminate the variable whose elimination would require the fewest edges to be added afterwards.[2]
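The sketch below implements both heuristics greedily on an interaction graph (a dictionary mapping each variable to the set of variables it shares a factor with); it is an illustrative Python sketch, not a reference implementation:

```python
from itertools import combinations

def cost_min_degree(graph, v):
    """Minimum degree score: the number of current neighbours of v."""
    return len(graph[v])

def cost_min_fill(graph, v):
    """Minimum fill score: edges that eliminating v would add between its neighbours."""
    return sum(1 for a, b in combinations(graph[v], 2) if b not in graph[a])

def greedy_order(graph, cost):
    """Repeatedly eliminate the cheapest variable, connecting its neighbours (fill-in)."""
    graph = {v: set(n) for v, n in graph.items()}     # work on a copy
    order = []
    while graph:
        v = min(graph, key=lambda x: cost(graph, x))
        for a, b in combinations(graph[v], 2):        # add fill-in edges
            graph[a].add(b)
            graph[b].add(a)
        for n in graph[v]:
            graph[n].discard(v)
        del graph[v]
        order.append(v)
    return order

# Example: a 4-cycle A - B - C - D - A.
g = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
print(greedy_order(g, cost_min_fill))    # a low-fill order, e.g. ['A', 'B', 'C', 'D']
```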
References
1. Zhang, Nevin L.; Poole, David (1994). "A simple approach to Bayesian network computations". Proceedings of the 10th Canadian Artificial Intelligence Conference: 171–178.
2. Darwiche, Adnan (2009). Modeling and Reasoning with Bayesian Networks. Cambridge University Press. doi:10.1017/CBO9780511811357. ISBN 978-0-511-81135-7.
3. Koller, Daphne; Friedman, Nir (2009). Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA: MIT Press.
Variable elimination
Introduction
Overview and Definition
Variable elimination (VE) is an exact inference algorithm employed in probabilistic graphical models to compute marginal probabilities or maximum a posteriori (MAP) assignments by successively eliminating variables from a joint probability distribution.[5] This method systematically reduces the complexity of the joint distribution by focusing on subsets of variables, enabling efficient computation of queries such as the probability of evidence or the most likely configuration of hidden variables. At a high level, VE represents the joint distribution as a product of factors (compact representations of partial distributions) and iteratively multiplies the factors involving a chosen variable before summing out (marginalizing over) that variable, thereby eliminating it from consideration.[5] The process continues until only the variables of interest remain, yielding the desired marginal or assignment. The choice of elimination order influences computational cost, as suboptimal orders can lead to larger intermediate factors.

VE provides exact results, distinguishing it from approximate methods, and offers significant efficiency gains over brute-force enumeration, particularly for models with tree-structured graphs or low treewidth, where complexity is bounded by the induced width of the graph.[5] This makes it suitable for networks where the graph's structure limits the size of intermediate computations. The algorithm unifies inference across different graphical model types, including directed Bayesian networks and undirected Markov random fields, by treating both through a common factor-based framework.[5] Within probabilistic graphical models, VE serves as a general elimination-based alternative to message-passing algorithms like belief propagation, applicable to both acyclic and loopy structures without requiring explicit message routing.
Historical Development
Variable elimination emerged in the mid-1990s as a key algorithm for exact probabilistic inference in graphical models, building on earlier advancements in constraint satisfaction problems and Bayesian network reasoning. Researchers Rina Dechter and Judea Pearl contributed foundational ideas through their work on tree clustering and decomposition methods for constraint networks in the late 1980s, which laid the groundwork for systematic variable elimination in probabilistic settings. Independently, Dechter formalized the approach in her seminal 1996 paper introducing bucket elimination as a unifying framework for various inference tasks, including belief updating and most probable explanation finding in Bayesian networks. Concurrently, Nevin Lianwen Zhang and David Poole described a similar variable elimination procedure in their 1996 work on exploiting causal independence, emphasizing its efficiency for marginalizing variables in directed graphical models.[6]

The algorithm's initial applications focused on Bayesian networks, heavily influenced by Judea Pearl's 1988 development of belief propagation, which provided an exact inference method for tree-structured graphs and highlighted the need for generalizable elimination techniques in loopy networks. Variable elimination also applies to undirected graphical models such as Markov random fields, where it facilitates marginal inference by summing out variables while managing clique sizes through appropriate elimination orders. Further advancements in the 2000s addressed hybrid models combining discrete and continuous variables; for instance, methods integrating mixtures of truncated exponentials with variable elimination enabled exact inference in hybrid Bayesian networks containing both types of variables.

Key milestones include its integration into early probabilistic software tools in the 1990s and 2000s, paving the way for broader adoption in inference engines. In the 2010s, variable elimination influenced modern open-source libraries such as libDAI (implementing exact inference methods like the junction tree) and pgmpy, a Python package supporting variable elimination for Bayesian network inference.[7][8] These implementations underscored its versatility across directed and undirected models. The algorithm's impact lies in bridging exact and approximate inference paradigms, as evidenced by its prominent role in Kevin Murphy's 2012 textbook, which popularized variable elimination as a foundational technique in machine learning curricula and applications.
Background Concepts
Probabilistic Graphical Models
Probabilistic graphical models (PGMs) provide a compact and intuitive representation of multivariate probability distributions by combining graph theory with probability theory, where nodes represent random variables and edges encode conditional dependencies or independencies among them.[9] This framework exploits the conditional independence structure inherent in many real-world systems to factorize complex joint distributions into simpler local components, enabling efficient storage, manipulation, and inference over high-dimensional spaces.[9] PGMs encompass both directed and undirected graph structures, facilitating applications in fields such as machine learning, statistics, and artificial intelligence.[9]

Bayesian networks, also known as belief networks, are directed acyclic graphs (DAGs) in which each node corresponds to a random variable, and directed edges represent conditional dependencies between variables.[10] The joint probability distribution over the variables X₁, …, Xₙ factorizes according to the graph structure as P(X₁, …, Xₙ) = ∏ᵢ P(Xᵢ | Pa(Xᵢ)), where Pa(Xᵢ) denotes the set of variables with direct edges pointing to Xᵢ.[10] This factorization leverages the Markov condition, which states that each variable is conditionally independent of its non-descendants given its parents, allowing the model to capture asymmetric causal or influential relationships efficiently.[10]

Markov random fields (MRFs), or undirected graphical models, employ undirected graphs where nodes represent random variables and edges indicate mutual dependencies without directional asymmetry.[11] The joint distribution factorizes over the maximal cliques of the graph via the Hammersley–Clifford theorem, which equates positive MRFs with Gibbs distributions: under the positivity condition (all probabilities positive), a distribution is Markovian if and only if it admits a factorization P(x) = (1/Z) ∏_C ψ_C(x_C), where C ranges over the maximal cliques, the ψ_C are potential functions, and Z is the normalization constant.[11] The global Markov property in MRFs asserts conditional independence between variables separated by a conditioning set in the graph, focusing on symmetric interactions within cliques.[11]

Key differences between Bayesian networks and MRFs lie in their graph orientations and independence semantics: BNs use directed edges to encode directional asymmetries and ancestral relationships, while MRFs rely on undirected edges and graph separations to model undirected conditional independencies.[9] Both structures support variable elimination for exact inference tasks, such as computing marginal probabilities P(Y | E = e) or maximum a posteriori (MAP) estimates argmax_y P(y | E = e), where Y are query variables and E = e denotes observed evidence.[9] Standard notation in PGMs includes a set of variables X = {X₁, …, Xₙ} with discrete or continuous domains, and inference queries conditioned on evidence E = e.[9]

The computational complexity of inference in PGMs is governed by the treewidth of the graph, defined as the minimum, over all elimination orderings, of one less than the size of the largest clique induced during elimination.[12] Low treewidth implies that the graph can be decomposed into a tree-like structure with small separators, enabling polynomial-time inference algorithms like variable elimination; for instance, graphs with treewidth bounded by a constant w yield complexity exponential in w but linear in the number of nodes.[12] This measure quantifies the "tree-likeness" of the graph and is crucial for assessing the tractability of exact methods in dense dependency structures.[12]
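As a small illustration of these two factorizations (an added example, not taken from a cited source), consider three variables connected in a chain A → B → C:

```latex
% Directed (Bayesian network) factorization of the chain A -> B -> C
P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid B)

% The same chain as an MRF with maximal cliques {A,B} and {B,C}
P(A, B, C) = \frac{1}{Z}\, \psi_{AB}(A, B)\, \psi_{BC}(B, C),
\qquad Z = \sum_{a, b, c} \psi_{AB}(a, b)\, \psi_{BC}(b, c)
```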
Factors in Graphical Models
In probabilistic graphical models, a factor is defined as a function mapping assignments to a subset of variables, known as its scope, to non-negative real numbers, typically representing unnormalized probabilities or potentials that capture local interactions among the variables.[13] These factors serve as the core data structures for inference algorithms like variable elimination, allowing the joint distribution to be decomposed into compact local components rather than storing the full exponential-sized joint explicitly.[14] For discrete variables, factors are commonly represented as multidimensional arrays or tables, where each entry corresponds to the function value for a specific assignment of values within the scope; for instance, a factor over variables A and B might be tabulated with rows for the values of A and columns for the values of B.[13] In contrast, for continuous variables, factors take functional forms, such as Gaussian densities, to represent densities over continuous scopes efficiently.[13]

Factors in graphical models are derived directly from the model's structure: in Bayesian networks, each conditional probability distribution P(Xᵢ | Pa(Xᵢ)) constitutes a factor over Xᵢ and its parents Pa(Xᵢ); in Markov random fields, factors correspond to potential functions defined over maximal cliques in the undirected graph.[13] A key property is their multiplicativity, whereby the joint distribution (or unnormalized measure) is the product of all factors, enabling the full model to be reconstructed from these local pieces.[13] Additionally, summation over a variable in a factor's scope corresponds to marginalization or integration, effectively eliminating that variable from consideration.[14]

This factor-based representation is particularly advantageous in variable elimination, as it permits dynamic combination and reduction of factors during inference, avoiding the need to materialize the full joint distribution, whose size grows exponentially with the number of variables n.[14] By operating on these smaller scopes, the approach exploits conditional independencies inherent in the graphical model to maintain computational tractability.[13]
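The following minimal Python sketch shows one way a discrete table factor can be stored as a multidimensional array; the class and names are illustrative and not taken from any particular library:

```python
import numpy as np

# A discrete factor: a named scope plus a table of non-negative values,
# with one array axis per variable in the scope.
class Factor:
    def __init__(self, variables, values):
        self.variables = list(variables)          # scope, e.g. ["A", "B"]
        self.values = np.asarray(values, float)   # shape = domain sizes of the scope

# The CPT P(B | A) for binary A and B stored as a factor with scope (A, B):
# axis 0 indexes A in {0, 1}, axis 1 indexes B in {0, 1}.
phi = Factor(["A", "B"], [[0.9, 0.1],
                          [0.1, 0.9]])
print(phi.values[0, 1])   # factor value at A=0, B=1 -> 0.1
```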
Core Operations
Factor Multiplication
In probabilistic graphical models, factor multiplication is a fundamental operation used to combine information from multiple factors during inference algorithms such as variable elimination. Given two factors φ₁ with scope X₁ (the set of variables it depends on) and φ₂ with scope X₂, their product ψ = φ₁ · φ₂ has scope X₁ ∪ X₂. For any assignment z to the variables in X₁ ∪ X₂, the value is computed as ψ(z) = φ₁(z[X₁]) · φ₂(z[X₂]), where z[X₁] and z[X₂] denote the projections of z onto the respective scopes; if a variable is absent from a scope, its value is irrelevant to that factor's evaluation. This operation effectively creates a new factor that encodes the joint compatibility over the union of variables, preserving the probabilistic semantics of the original factors.[15]

To perform the multiplication algorithmically, the representations of φ₁ and φ₂ (typically tables or arrays indexed by assignments to their scopes) are aligned over any shared variables in X₁ ∩ X₂. For each possible joint assignment to X₁ ∪ X₂, the corresponding values from φ₁ and φ₂ are retrieved and multiplied pointwise to populate the new factor ψ. The time complexity of this step is O(|Val(X₁ ∪ X₂)|), where |Val(X₁ ∪ X₂)| is the size of the Cartesian product of the domains of the variables in X₁ ∪ X₂, assuming discrete finite domains; space requirements are similarly bounded by the output factor's size. If the domains of shared variables differ between φ₁ and φ₂ (e.g., due to prior restrictions or evidence instantiation), the factors must be extended by assigning dummy values (such as 0 for incompatible assignments in unnormalized potentials, or 1 for neutral extension in multiplicative contexts) or restricted to the intersection of compatible assignments to ensure well-defined products.[15][16]

A simple example illustrates the process: consider a factor φ₁(A) over a binary variable A and a factor φ₂(B) over a binary variable B, where the scopes are disjoint. Their product is ψ(A, B) = φ₁(A) · φ₂(B), yielding a table with four entries, each the product of the independent marginal values. For overlapping scopes, such as φ₁(A, B) and φ₂(B, C) with the shared variable B, the product ψ(A, B, C) has eight entries (assuming binary domains for A, B, and C), where for each matching value of B the products φ₁(a, b) · φ₂(b, c) are computed, effectively integrating information across the chain. In variable elimination, this multiplication is crucial for building intermediate factors that represent partial joint distributions over the neighbors of a variable to be eliminated, enabling efficient propagation of dependencies without expanding to the full joint over all variables.[15]

Numerical issues arise in practice due to the potential for underflow (when multiplying many probabilities less than 1) or overflow (with unnormalized potentials exceeding machine precision). To mitigate this, factor multiplication is often implemented in log-space: values are stored and manipulated as logarithms, converting the product to a sum log ψ(z) = log φ₁(z[X₁]) + log φ₂(z[X₂]), which avoids extreme exponents while preserving relative magnitudes; normalization can follow using log-sum-exp tricks for stability. This approach is particularly beneficial in high-dimensional models where intermediate factors involve numerous multiplications.[17]
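A minimal sketch of this log-space product for two small table factors, assuming numpy arrays whose axes follow the scope order noted in the comments (an illustration, not library code):

```python
import numpy as np

phi1 = np.array([[0.9, 0.1],
                 [0.1, 0.9]])     # factor phi1(A, B): axis 0 = A, axis 1 = B
phi2 = np.array([0.6, 0.4])       # factor phi2(A):    axis 0 = A

# Pointwise product in log-space: the product of values becomes a sum of logs,
# with phi2 broadcast along the B axis so the scopes align on A.
log_psi = np.log(phi1) + np.log(phi2)[:, None]
psi = np.exp(log_psi)             # psi(a, b) = phi1(a, b) * phi2(a)
print(psi)                        # [[0.54 0.06], [0.04 0.36]]
```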
Variable Summation
In variable elimination for probabilistic graphical models, the variable summation operation, also known as marginalization, removes a variable from a factor by summing over its entire domain, thereby producing a new factor with a reduced scope. This step is fundamental to reducing the joint distribution to the marginals of interest. For a factor φ whose scope includes a variable X, the resulting factor after summing out X has scope Scope(φ) \ {X} and is given by τ(y) = Σ_{x ∈ Val(X)} φ(y, x), where y denotes an assignment to the remaining variables and Val(X) is the domain of X.[14]

Algorithmically, computing this summation involves iterating over all possible assignments y to the remaining variables and, for each such assignment, accumulating the values of φ(y, x) across all x ∈ Val(X). The time required is proportional to the size of the original factor table, as each entry must be considered once. This operation often follows factor multiplication, which combines all factors involving X into a single factor before marginalization. In the context of discrete variables, as is common in many Bayesian networks and Markov random fields, the summation directly aggregates probabilities or potentials. For instance, given a factor φ(A, B) over variables A and B, summing out B yields τ(A) = Σ_b φ(A, b), a unary factor over A alone.[14]

In graphical models with continuous variables, the discrete summation is replaced by integration to marginalize over the continuous domain: τ(y) = ∫ φ(y, x) dx, assuming a one-dimensional continuous variable x; multivariate cases extend this analogously. This adaptation enables exact inference in hybrid discrete-continuous models, though it may require closed-form solutions or numerical methods for tractability. Each elimination step in the variable elimination algorithm relies on this summation to progressively remove variables, generating intermediate factors that capture the marginals over the surviving variables until only the query variables remain.[18]

When evidence is incorporated, factors are first restricted by setting observed variables to their evidence values, which prunes the domains and adjusts the summation accordingly; for example, if a variable X in a factor is observed as X = x*, the factor is multiplied by an indicator function (or, in the continuous case, a Dirac delta) at x*, effectively fixing X = x* before any summation over the other variables. This ensures that the marginals condition on the evidence without altering the core summation mechanics.[14]
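A minimal numpy sketch of summing out a variable from a table factor and of restricting a factor to evidence (an illustration with made-up values; the axis order is an assumption stated in the comments):

```python
import numpy as np

# Factor phi(A, B, C) over three binary variables: axis 0 = A, axis 1 = B, axis 2 = C.
rng = np.random.default_rng(0)
phi = rng.random((2, 2, 2))

# Sum out B: tau(A, C) = sum_b phi(A, b, C), leaving a factor over (A, C).
tau = phi.sum(axis=1)
print(tau.shape)              # (2, 2)

# Evidence restriction: observing C = 1 keeps only the matching slice and
# removes C from the scope before any further elimination.
phi_c1 = phi[:, :, 1]         # factor over (A, B) with C fixed to 1
print(phi_c1.shape)           # (2, 2)
```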
The Algorithm
Step-by-Step Procedure
The variable elimination algorithm computes marginal probabilities or maximum a posteriori (MAP) estimates in probabilistic graphical models by systematically eliminating variables from a set of factors representing the joint distribution. The input consists of an initial set of factors Φ derived from the graphical model (such as conditional probability tables in Bayesian networks or potential functions in Markov random fields), the query variables Y, and evidence variables E with observed values e. An elimination order σ for the non-query variables is also required, which specifies the sequence in which variables are removed; selection of this order is assumed to be given here.[15]

The first step processes the evidence by restricting each factor φᵢ to the observed values e. This involves evaluating φᵢ at the fixed values of the observed variables in its scope and then removing those variables from the scope, yielding a new set of restricted factors. This step ensures that only unobserved variables remain in subsequent computations, effectively conditioning the model on the evidence.[15][19]

Next, the algorithm iterates over the non-query variables in the chosen elimination order. For each variable Xᵢ to eliminate, it collects all factors in the current set whose scopes include Xᵢ. These factors are multiplied together to form an intermediate factor ψ(Xᵢ, Z) = ∏ⱼ φⱼ, where the product runs over the relevant factors and Z denotes the other variables appearing in them. Then, Xᵢ is summed out by computing φ′(Z) = Σ_{xᵢ} ψ(xᵢ, Z). This new factor is added back to the set of factors, replacing the ones used in the multiplication, while factors not involving Xᵢ remain unchanged. The process repeats for the next variable until all non-query variables are eliminated. For MAP queries, the summation over Xᵢ is replaced by a max/argmax operation to track the maximizing assignment.[15][19]

After eliminating all non-query variables, the remaining factors have scopes contained within the query variables Y. These factors are multiplied together to obtain the unnormalized marginal over Y; for probability queries, normalization is applied by dividing by the sum over Y to yield P(Y | e). The output is this final factor over Y for marginal inference, or the tracked assignment for MAP.[15]

The procedure can be outlined in pseudocode as follows:

function VariableElimination(factors φ, query Y, evidence e, order σ):
    // Step 1: Restrict to evidence
    for each φ_i in φ:
        φ_i ← Restrict(φ_i, e)                  // evaluate at e and remove observed vars
    // Steps 2–3: Eliminate the non-query, non-evidence variables in order σ
    remaining_vars ← all vars except Y and the observed vars in e
    while remaining_vars ≠ ∅:
        X_i ← next var in σ from remaining_vars
        relevant_φ ← {φ_j in φ | X_i in scope(φ_j)}
        ψ ← Product(relevant_φ)                 // multiply: ψ(x_i, Z) = ∏ φ_j
        φ' ← SumOut(ψ, X_i)                     // φ'(Z) = Σ_{x_i} ψ(x_i, Z)
        φ ← (φ \ relevant_φ) ∪ {φ'}             // update the factor set
        remaining_vars ← remaining_vars \ {X_i}
    // Step 4: Compute the final marginal
    final_φ ← Product(φ)                        // all remaining factors, over Y
    P(Y|e) ← Normalize(final_φ)                 // divide by the sum over Y
    return P(Y|e)
Illustrative Example
To illustrate the variable elimination algorithm, consider a simple Bayesian network with three binary variables: A (a root node), B (dependent on A), and C (dependent on both A and B). The network has edges A → B, A → C, and B → C, textually depicted as:

  A
 / \
B → C

The model is given by the factors φ₁(A) = P(A), φ₂(A, B) = P(B | A), and φ₃(A, B, C) = P(C | A, B); the tables below list their values (for φ₃, only the rows with C = true are shown):
| A | φ₁(A) |
|---|---|
| true | 0.5 |
| false | 0.5 |
| A | B | φ₂(A, B) |
|---|---|---|
| true | true | 0.9 |
| true | false | 0.1 |
| false | true | 0.1 |
| false | false | 0.9 |
| A | B | C | φ₃(A, B, C) |
|---|---|---|---|
| true | true | true | 0.8 |
| true | false | true | 0.3 |
| false | true | true | 0.4 |
| false | false | true | 0.2 |
To compute P(A | C = true), we restrict φ₃ to the evidence C = true and eliminate B, multiplying the factors that mention B and summing over its values. For A = true:

φ(A = true, C = true) = [φ₂(true, true) × φ₃(true, true, true)] + [φ₂(true, false) × φ₃(true, false, true)]
= (0.9 × 0.8) + (0.1 × 0.3) = 0.72 + 0.03 = 0.75

For A = false, C = true:

φ(A = false, C = true) = (0.1 × 0.4) + (0.9 × 0.2) = 0.04 + 0.18 = 0.22

Thus, the factor φ(A, C = true) is:
| A | C | φ(A, C) |
|---|---|---|
| true | true | 0.75 |
| false | true | 0.22 |
Finally, multiplying in the remaining factor φ₁(A) gives the unnormalized result ψ(A, C = true) = φ₁(A) × φ(A, C = true):

| A | C | ψ(A, C) |
|---|---|---|
| true | true | 0.5 × 0.75 = 0.375 |
| false | true | 0.5 × 0.22 = 0.11 |

Normalizing over A yields P(A = true | C = true) = 0.375 / (0.375 + 0.11) ≈ 0.773 and P(A = false | C = true) = 0.11 / 0.485 ≈ 0.227.
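The following short, self-contained Python sketch reproduces the numbers in this example, with factors stored as plain dictionaries keyed by assignment tuples (the representation and names are illustrative):

```python
# Factors of the example network (C restricted to the evidence C = true).
phi1 = {(True,): 0.5, (False,): 0.5}                      # P(A)
phi2 = {(True, True): 0.9, (True, False): 0.1,            # P(B | A), keyed by (A, B)
        (False, True): 0.1, (False, False): 0.9}
phi3 = {(True, True): 0.8, (True, False): 0.3,            # P(C = true | A, B), keyed by (A, B)
        (False, True): 0.4, (False, False): 0.2}

# Eliminate B: phi(A, C=true) = sum_B phi2(A, B) * phi3(A, B, C=true)
phi_AC = {a: sum(phi2[(a, b)] * phi3[(a, b)] for b in (True, False))
          for a in (True, False)}                          # {True: 0.75, False: 0.22}

# Multiply in phi1 and normalize over A to obtain P(A | C = true).
psi = {a: phi1[(a,)] * phi_AC[a] for a in (True, False)}   # {True: 0.375, False: 0.11}
Z = sum(psi.values())
print({a: round(v / Z, 3) for a, v in psi.items()})        # {True: 0.773, False: 0.227}
```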
