Variable elimination
from Wikipedia

Variable elimination (VE) is a simple and general exact inference algorithm in probabilistic graphical models, such as Bayesian networks and Markov random fields.[1] It can be used to infer the maximum a posteriori (MAP) state or to estimate conditional or marginal distributions over a subset of variables. The algorithm has exponential time complexity in the worst case, but can be efficient in practice for low-treewidth graphs if a proper elimination order is used.

Factors


Enabling a key reduction in algorithmic complexity, a factor, also known as a potential, over a set of variables is a function mapping each instantiation of those variables to a non-negative number.[2] A factor does not necessarily have a fixed interpretation: one may perform operations on factors that represent different quantities, such as a joint probability distribution or a conditional distribution.[2] Joint distributions often become too large to handle directly, since their size is exponential in the number of variables; variable elimination therefore becomes feasible by operating on the smaller factors of a factorized representation.

Basic Operations


Variable Summation


Algorithm 1, called sum-out (SO) or marginalization, eliminates a single variable V from a set Φ of factors[3] and returns the resulting set of factors. The subroutine collect-relevant simply returns those factors in Φ involving the variable V.

Algorithm 1 sum-out(V, Φ)

Φ' = collect-relevant(V, Φ)
ψ = the product of all factors in Φ'
τ = Σ_V ψ

return (Φ \ Φ') ∪ {τ}
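The following minimal Python sketch (not part of the original article) shows the sum-out operation on a single table factor; a factor is represented here as a tuple of variable names plus a dictionary mapping assignment tuples to values, and the usage mirrors the tables in the example below. The name sum_out and the representation are illustrative assumptions.

def sum_out(var, variables, table):
    # Eliminate `var` by summing its values out of the factor table.
    idx = variables.index(var)
    new_vars = variables[:idx] + variables[idx + 1:]
    new_table = {}
    for assignment, value in table.items():
        reduced = assignment[:idx] + assignment[idx + 1:]
        new_table[reduced] = new_table.get(reduced, 0.0) + value
    return new_vars, new_table

# Two rows differing only in the first variable collapse into one row
# whose value is the sum 0.80 + 0.20 (see the example below).
variables = ("X1", "X2", "X3", "X4", "X5")
table = {(True, True, True, False, False): 0.80,
         (False, True, True, False, False): 0.20}
print(sum_out("X1", variables, table))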

Example

Here we have two rows of a joint probability distribution over five variables, labelled X1 through X5 here for concreteness. A variable (here X1) can be summed out between instantiations that agree over all of the remaining variables; the value of X1 itself is irrelevant when it is the variable being summed out.[2]

X1     X2     X3     X4     X5     value
true   true   true   false  false  0.80
false  true   true   false  false  0.20

After eliminating X1, its column is excluded and we are left with a distribution over the remaining variables only, whose value is the sum over the matching instantiations.

X2     X3     X4     X5     value
true   true   false  false  1.0

The resulting distribution following the sum-out operation can only help answer queries that do not mention X1.[2] It is also worth noting that the sum-out operation is commutative.

Factor Multiplication


Computing the product of multiple factors yields a factor over the union of their variables; each of its entries is the product of the entries of the input factors at the instantiations consistent with the joint instantiation.[2]

Algorithm 2 mult-factors(f1, ..., fk)[2]

V = the union of the variables of the factors f1, ..., fk
f = a factor over V where f(v) = 1 for all instantiations v of V
for each instantiation v of V
    for i = 1 to k
        vi = the instantiation of the variables of fi consistent with v
        f(v) = f(v) · fi(vi)
return f

Factor multiplication is not only commutative but also associative.
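A matching Python sketch (again illustrative, not from the article) multiplies two table factors by enumerating assignments to the union of their variables; the domains argument and the name multiply are assumptions of this example.

from itertools import product

def multiply(vars1, t1, vars2, t2, domains):
    # Scope of the product: union of both scopes, shared variables listed once.
    all_vars = tuple(vars1) + tuple(v for v in vars2 if v not in vars1)
    result = {}
    for assignment in product(*(domains[v] for v in all_vars)):
        full = dict(zip(all_vars, assignment))
        result[assignment] = (t1[tuple(full[v] for v in vars1)] *
                              t2[tuple(full[v] for v in vars2)])
    return all_vars, result

domains = {"A": [True, False], "B": [True, False]}
phi1 = {(True,): 0.5, (False,): 0.5}                      # a factor over A
phi2 = {(True, True): 0.9, (True, False): 0.1,
        (False, True): 0.1, (False, False): 0.9}          # a factor over (A, B)
print(multiply(("A",), phi1, ("A", "B"), phi2, domains))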

Inference


The most common query type is of the form p(X | E = e), where X and E are disjoint subsets of the variables and E is observed taking value e. A basic algorithm for computing p(X | E = e) is called variable elimination (VE), first put forth in.[1]

Taken from,[1] this algorithm computes p(X | E = e) from a discrete Bayesian network B. VE calls SO to eliminate variables one by one. More specifically, in the algorithm below, Φ is the set C of conditional probability tables (henceforth "CPTs") for B, X is a list of query variables, E is a list of observed variables, e is the corresponding list of observed values, and σ is an elimination ordering for the remaining variables, i.e. those not in X or E.

Variable Elimination Algorithm VE(Φ, X, E, e, σ)

Multiply the evidence indicators for E = e into the appropriate CPTs
while σ is not empty
    remove the first variable V from σ
    Φ = sum-out(V, Φ)
p(X, E = e) = the product of all factors in Φ

return p(X, E = e) / Σ_X p(X, E = e)
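A compact Python sketch of this procedure follows (illustrative only; it reuses the sum_out and multiply sketches given earlier, and the name variable_elimination is invented). Evidence can be handled by restricting the initial factor tables to the observed values before calling it.

def variable_elimination(factors, query, order, domains):
    # factors: list of (variables, table) pairs; order: the non-query variables
    # to eliminate. Assumes every variable appears in at least one factor.
    for var in order:
        relevant = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        vars_, table = relevant[0]
        for v2, t2 in relevant[1:]:
            vars_, table = multiply(vars_, table, v2, t2, domains)
        factors.append(sum_out(var, vars_, table))
    # Multiply the remaining factors (all over query variables) and normalize.
    vars_, table = factors[0]
    for v2, t2 in factors[1:]:
        vars_, table = multiply(vars_, table, v2, t2, domains)
    z = sum(table.values())
    return vars_, {k: v / z for k, v in table.items()}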

Ordering


Finding the optimal order in which to eliminate variables is an NP-hard problem. As such, there are heuristics one may follow to choose a good ordering:

  1. Minimum Degree: Eliminate the variable which results in constructing the smallest factor possible.[2]
  2. Minimum Fill: By constructing an undirected graph showing the variable relations expressed by all CPTs, eliminate the variable whose elimination would require the fewest edges to be added to the graph.[2]

from Grokipedia
Variable elimination is an exact inference algorithm employed in probabilistic graphical models, including Bayesian networks and Markov random fields, to compute marginal probabilities, conditional distributions, or optimal assignments by systematically eliminating variables from the joint distribution through summation or maximization operations in a specified elimination order. Introduced as part of the bucket elimination framework, it generalizes dynamic programming principles to unify various probabilistic inference tasks, such as belief updating, finding the most probable explanation (MPE), maximum a posteriori (MAP) estimation, and maximum expected utility (MEU) computation. The algorithm operates by representing the joint probability as a product of factors—typically conditional probability tables or potentials—and iteratively processing non-query variables: for each variable to eliminate, it multiplies all relevant factors involving that variable, then marginalizes (sums out) the variable to produce a new factor over its neighboring variables, which is then placed into subsequent processing steps. This process exploits the conditional independencies encoded in the graphical model's structure to avoid exhaustive enumeration of the full joint distribution, reducing computational complexity from exponential in the number of variables to exponential in the induced width (or treewidth) of the moralized graph under the chosen elimination ordering. Optimal orderings, often found via heuristics like minimum degree or minimum fill-in, are crucial for minimizing the size of intermediate factors and thus controlling time and space requirements, which can otherwise grow prohibitively large for dense or loopy graphs. While highly efficient for tree-structured or low-treewidth models, variable elimination's exact nature limits its scalability to very large or complex networks, prompting extensions like conditioning on evidence to prune irrelevant subnetworks or approximations such as loopy belief propagation for intractable cases.

Introduction

Overview and Definition

Variable elimination (VE) is an exact algorithm employed in probabilistic graphical models to compute marginal probabilities or maximum a posteriori (MAP) assignments by successively eliminating variables from a joint distribution. This method systematically reduces the complexity of the joint distribution by focusing on subsets of variables, enabling efficient computation of queries such as the probability of evidence or the most likely configuration of hidden variables. At a high level, VE represents the joint distribution as a product of factors—compact representations of partial distributions—and iteratively multiplies the factors involving a chosen variable before summing out (marginalizing over) that variable, thereby eliminating it from consideration. The process continues until only the variables of interest remain, yielding the desired marginal or assignment. The choice of elimination order influences computational cost, as suboptimal orders can lead to larger intermediate factors. VE provides exact results, distinguishing it from approximate methods, and offers significant efficiency gains over brute-force enumeration, particularly for models with tree-structured graphs or low treewidth, where complexity is bounded by the induced width of the graph. This makes it suitable for networks where the graph's structure limits the size of intermediate computations. The algorithm unifies inference across different model types, including directed Bayesian networks and undirected Markov random fields, by treating both through a common factor-based framework. Within probabilistic graphical models, VE serves as a general elimination-based alternative to message-passing algorithms like belief propagation, applicable to both acyclic and loopy structures without requiring explicit message routing.

Historical Development

Variable elimination emerged in the mid-1990s as a key algorithm for exact probabilistic inference in graphical models, building on earlier advancements in constraint satisfaction and automated reasoning. Researchers Rina Dechter and Judea Pearl contributed foundational ideas through their work on tree clustering and decomposition methods for constraint networks in the late 1980s, which laid the groundwork for systematic variable elimination in probabilistic settings. Independently, Dechter formalized the approach in her seminal 1996 paper introducing bucket elimination as a unifying framework for various inference tasks, including belief updating and most probable explanation finding in Bayesian networks. Concurrently, Nevin Lianwen Zhang and David Poole described a similar variable elimination procedure in their 1996 work on exploiting causal independence, emphasizing its efficiency for marginalizing variables in directed graphical models. The algorithm's initial applications focused on Bayesian networks, heavily influenced by Judea Pearl's 1988 development of belief propagation, which provided an exact method for tree-structured graphs and highlighted the need for generalizable elimination techniques in loopy networks. Variable elimination also applies to undirected graphical models such as Markov random fields, where it facilitates marginal computation by summing out variables while managing clique sizes through appropriate elimination orders. Further advancements in the 2000s addressed hybrid models combining discrete and continuous variables; for instance, methods integrating mixtures of truncated exponentials with variable elimination enabled exact inference in hybrid Bayesian networks with both types of variables. Key milestones include its integration into early probabilistic inference software tools, paving the way for broader adoption in inference engines. Variable elimination also influenced modern open-source libraries such as libDAI (implementing exact inference methods like the junction tree algorithm) and pgmpy, a Python package supporting variable elimination for inference. These implementations underscored its versatility across directed and undirected models. The algorithm's impact lies in bridging exact and approximate paradigms, as evidenced by its prominent role in Kevin Murphy's 2012 textbook, which popularized variable elimination as a foundational technique in machine learning curricula and applications.

Background Concepts

Probabilistic Graphical Models

Probabilistic graphical models (PGMs) provide a compact and intuitive representation of multivariate probability distributions by combining probability theory with graph theory, where nodes represent random variables and edges encode conditional dependencies or independencies among them. This framework exploits the structure inherent in many real-world systems to factorize complex joint distributions into simpler local components, enabling efficient storage, manipulation, and inference over high-dimensional spaces. PGMs encompass both directed and undirected graph structures, facilitating applications in fields such as machine learning and statistics.

Bayesian networks, also known as belief networks, are directed acyclic graphs (DAGs) in which each node corresponds to a random variable, and directed edges represent conditional dependencies between variables. The joint probability distribution over the variables $X = \{X_1, \dots, X_n\}$ factorizes according to the graph structure as $P(X) = \prod_{i=1}^n P(X_i \mid \mathrm{Parents}(X_i))$, where $\mathrm{Parents}(X_i)$ denotes the set of variables with direct edges pointing to $X_i$. This factorization leverages the Markov condition, which states that each variable is conditionally independent of its non-descendants given its parents, allowing the model to capture asymmetric causal or influential relationships efficiently.

Markov random fields (MRFs), or undirected graphical models, employ undirected graphs where nodes represent random variables and edges indicate mutual dependencies without directional asymmetry. The joint distribution factorizes over the maximal cliques of the graph via the Hammersley–Clifford theorem, which equates positive MRFs with Gibbs distributions: under the positivity condition (all probabilities positive), a distribution is Markovian with respect to the graph if and only if it admits a factorization $P(X) = \frac{1}{Z} \prod_C \psi_C(X_C)$, where $C$ ranges over maximal cliques, $\psi_C$ are potential functions, and $Z$ is the normalization constant. The global Markov property in MRFs asserts conditional independence between variables separated by a conditioning set in the graph, focusing on symmetric interactions within cliques.

Key differences between Bayesian networks and MRFs lie in their graph orientations and independence semantics: BNs use directed edges to encode directional asymmetries and ancestral relationships, while MRFs rely on undirected edges and graph separations to model undirected conditional independencies. Both structures support variable elimination for exact inference tasks, such as computing marginal probabilities $P(Y \mid e)$ or maximum a posteriori (MAP) estimates $\arg\max P(X \mid e)$, where $Y \subseteq X$ are query variables and $e$ denotes observed evidence. Standard notation in PGMs includes the set of variables $X = \{X_1, \dots, X_n\}$, with discrete or continuous domains, and inference queries conditioned on $e$.

The complexity of exact inference in PGMs is governed by the treewidth of the graph, defined as the minimum, over all elimination orderings, of one less than the size of the largest clique induced during elimination. Low treewidth implies that the graph can be decomposed into a tree-like structure with small separators, enabling polynomial-time algorithms like variable elimination; for instance, graphs with treewidth bounded by $k$ yield complexity exponential in $k$ but linear in the number of nodes. This measure quantifies the "tree-likeness" of the graph and is crucial for assessing the tractability of exact inference methods in dense dependency structures.
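As a small illustration of the Bayesian network factorization (an invented three-node chain, not from the text), a joint probability entry is simply the product of the local conditional probabilities:

# Invented chain network A -> B -> C with binary variables.
p_a = {True: 0.6, False: 0.4}
p_b_given_a = {(True, True): 0.7, (True, False): 0.3,
               (False, True): 0.2, (False, False): 0.8}   # key: (a, b)
p_c_given_b = {(True, True): 0.9, (True, False): 0.1,
               (False, True): 0.5, (False, False): 0.5}   # key: (b, c)

def joint(a, b, c):
    # P(A, B, C) = P(A) * P(B | A) * P(C | B), per the factorization above.
    return p_a[a] * p_b_given_a[(a, b)] * p_c_given_b[(b, c)]

print(joint(True, True, False))   # 0.6 * 0.7 * 0.1 = 0.042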

Factors in Graphical Models

In probabilistic graphical models, a factor $\phi$ is defined as a function mapping assignments to a subset of variables, known as its scope, to non-negative real numbers, typically representing unnormalized probabilities or potentials that capture local interactions among the variables. These factors serve as the core data structures for inference algorithms like variable elimination, allowing the joint distribution to be decomposed into compact local components rather than storing the full exponential-sized joint explicitly. For discrete variables, factors are commonly represented as multidimensional arrays or tables, where each entry corresponds to the function value for a specific assignment of values within the scope; for instance, a factor $\phi_{A,B}(a,b)$ over variables $A$ and $B$ might be tabulated with rows for values of $A$ and columns for $B$. In contrast, for continuous variables, factors take functional forms, such as Gaussian densities, to represent densities over continuous scopes efficiently. Factors in graphical models are derived directly from the model's structure: in Bayesian networks, each conditional probability distribution $P(X_i \mid \mathrm{Pa}(X_i))$ constitutes a factor over $X_i$ and its parents $\mathrm{Pa}(X_i)$; in Markov random fields, factors correspond to potential functions defined over maximal cliques in the undirected graph. A key property is their multiplicativity, whereby the joint distribution (or unnormalized measure) is the product of all factors, enabling the full model to be reconstructed from these local pieces. Additionally, summation over a variable in a factor's scope corresponds to marginalization or integration, effectively eliminating that variable from consideration. This factor-based representation is particularly advantageous in variable elimination, as it permits dynamic combination and reduction of factors during inference, avoiding the need to materialize the full joint distribution, which grows exponentially with the number of variables $n$. By operating on these smaller scopes, the approach exploits conditional independencies inherent in the model's structure to maintain computational tractability.

Core Operations

Factor Multiplication

In probabilistic graphical models, factor multiplication is a fundamental operation used to combine information from multiple factors during inference algorithms such as variable elimination. Given two factors $\phi_1$ with scope $S_1$ (the set of variables it depends on) and $\phi_2$ with scope $S_2$, their product $\phi = \phi_1 \times \phi_2$ has scope $S = S_1 \cup S_2$. For any assignment $s$ to the variables in $S$, the value is computed as $\phi(s) = \phi_1(s_{S_1}) \times \phi_2(s_{S_2})$, where $s_{S_1}$ and $s_{S_2}$ denote the projections of $s$ onto the respective scopes; if a variable is absent from a scope, its value is irrelevant to that factor's evaluation. This operation effectively creates a new factor that encodes the compatibility over the union of variables, preserving the probabilistic semantics of the original factors.

To perform the multiplication algorithmically, the representations of $\phi_1$ and $\phi_2$ (typically tables or arrays indexed by assignments to their scopes) are aligned over any shared variables in $S_1 \cap S_2$. For each possible joint assignment to $S_1 \cup S_2$, the corresponding values from $\phi_1$ and $\phi_2$ are retrieved and multiplied to populate the new factor $\phi$. The time complexity of this step is $O(|\mathrm{dom}(S_1 \cup S_2)|)$, where $|\mathrm{dom}(V)|$ is the size of the Cartesian product of the domains of the variables in $V$, assuming discrete finite domains; space requirements are similarly bounded by the output factor's size. If the domains of shared variables differ between $\phi_1$ and $\phi_2$ (e.g., due to prior restrictions or instantiation), the factors must be extended by assigning dummy values—such as 0 for incompatible assignments in unnormalized potentials or 1 for neutral extension in multiplicative contexts—or restricted to the set of compatible assignments to ensure well-defined products.

A simple example illustrates the process: consider factors $\phi_A(a)$ over variable $A$ with domain $\{a_1, a_2\}$ and $\phi_B(b)$ over $B$ with domain $\{b_1, b_2\}$, where the scopes are disjoint. Their product is $\phi_{A,B}(a,b) = \phi_A(a) \times \phi_B(b)$, yielding a table with four entries, each the product of the independent marginal values. For overlapping scopes, such as $\phi_{A,B}(a,b)$ and $\phi_{B,C}(b,c)$ with $B$ shared and domain $\{b_1, b_2\}$, the product $\phi_{A,B,C}(a,b,c)$ has eight entries (assuming binary domains for $A$ and $C$), where for each matching $b$ the values $\phi_{A,B}(a,b) \times \phi_{B,C}(b,c)$ are computed, effectively integrating information across the chain. In variable elimination, this multiplication is crucial for building intermediate factors that represent partial joint distributions over the neighbors of a variable to be eliminated, enabling efficient propagation of dependencies without expanding to the full joint over all variables.

Numerical issues arise in practice due to the potential for underflow (when multiplying many probabilities less than 1) or overflow (with unnormalized potentials exceeding machine precision). To mitigate this, factor multiplication is often implemented in log-space: values are stored and manipulated as logarithms, converting the product to a sum $\log \phi(s) = \log \phi_1(s_{S_1}) + \log \phi_2(s_{S_2})$, which avoids extreme exponents while preserving relative magnitudes; normalization can follow using the log-sum-exp trick for stability.
This approach is particularly beneficial in high-dimensional models where intermediate factors involve numerous multiplications.
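A brief sketch (illustrative, not from the text) of log-space factor combination using numpy; scipy's logsumexp is used only for the subsequent normalization step:

import numpy as np
from scipy.special import logsumexp

# Two factors over the same binary variable pair, stored as log-values.
log_phi1 = np.log(np.array([[0.9, 0.1],
                            [0.2, 0.8]]))       # axes: (A, B)
log_phi2 = np.log(np.array([[0.5, 0.5],
                            [0.3, 0.7]]))       # axes: (A, B)

# Multiplication in log-space is addition; no underflow from tiny products.
log_product = log_phi1 + log_phi2

# Normalize with the log-sum-exp trick to recover a stable distribution.
log_Z = logsumexp(log_product)
normalized = np.exp(log_product - log_Z)
print(normalized, normalized.sum())             # sums to 1.0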

Variable Summation

In variable elimination for probabilistic graphical models, the variable summation operation, also known as marginalization, removes a variable from a factor by summing over its entire domain, thereby producing a new factor with a reduced scope. This step is fundamental to reducing the joint distribution to marginals of interest. For a factor $\phi$ defined over a scope $S$ that includes variable $X$, the resulting factor $\phi'$ after summing out $X$ has scope $S \setminus \{X\}$ and is given by $\phi'(s') = \sum_{x \in \mathrm{dom}(X)} \phi(s', x)$, where $s'$ denotes an assignment to the variables in $S \setminus \{X\}$, and $\mathrm{dom}(X)$ is the domain of $X$.

Algorithmically, computing this summation involves iterating over all possible assignments to $S \setminus \{X\}$ and, for each such assignment, accumulating the values of $\phi$ across all $x \in \mathrm{dom}(X)$. The time required is proportional to the size of the original factor, which is $O(|\mathrm{dom}(S \setminus \{X\})| \cdot |\mathrm{dom}(X)|)$, as each entry in the factor table must be considered. This operation often follows factor multiplication, which combines all factors involving $X$ into a single factor before marginalization.

In the context of discrete variables, as is common in many Bayesian networks and Markov random fields, the summation directly aggregates probabilities or potentials. For instance, given a factor $\phi_{A,B}(a,b)$ over variables $A$ and $B$, summing out $B$ yields $\phi_A(a) = \sum_{b \in \mathrm{dom}(B)} \phi_{A,B}(a,b)$, a unary factor over $A$ alone. In graphical models with continuous variables, the discrete summation is replaced by integration to marginalize over the continuous domain: $\phi'(s') = \int_{-\infty}^{\infty} \phi(s', x) \, dx$, assuming a one-dimensional continuous variable $X$; multivariate cases extend this analogously. This adaptation enables exact inference in hybrid discrete-continuous models, though it may require closed-form solutions or numerical methods for tractability.

Each elimination step in the variable elimination algorithm relies on this summation to progressively remove variables, generating intermediate factors that capture the marginals over the surviving variables until only the query variables remain. When evidence is incorporated, factors are first restricted by setting observed variables to their evidence values, which prunes the domains and adjusts the factor entries accordingly—for example, if a variable $Y$ in the factor is observed as $y = e$, the factor is multiplied by a Dirac delta or indicator function at $e$, effectively fixing $Y$ before any summation over other variables. This ensures that the marginals condition on the evidence without altering the core mechanics.
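For discrete factors stored as numpy arrays, summing out a variable is just a sum along that variable's axis (a small illustrative sketch, not from the text):

import numpy as np

# phi(A, B): rows indexed by A, columns by B.
phi_ab = np.array([[0.9, 0.1],
                   [0.1, 0.9]])

# Summing out B collapses the B axis, leaving a unary factor over A.
phi_a = phi_ab.sum(axis=1)
print(phi_a)        # [1.0, 1.0]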

The Algorithm

Step-by-Step Procedure

The variable elimination algorithm computes marginal probabilities or maximum a posteriori (MAP) estimates in probabilistic graphical models by systematically eliminating variables from a set of factors representing the joint distribution. The input consists of an initial set of factors $\{\phi_1, \dots, \phi_k\}$ derived from the graphical model (such as conditional probability tables in Bayesian networks or potential functions in Markov random fields), the query variables $Y$, and evidence variables with observed values $e$. An elimination order for the non-query variables is also required, which specifies the sequence in which variables are removed; the selection of this order is assumed to be given here.

The first step processes the evidence by restricting each factor $\phi_i$ to the observed values in $e$. This involves evaluating $\phi_i$ at the fixed values for the observed variables in its scope and then removing those variables from the scope, yielding a new set of restricted factors. This step ensures that only unobserved variables remain in subsequent computations, effectively conditioning the model on the evidence.

Next, the algorithm iterates over the non-query variables in the chosen elimination order. For each variable $X_i$ to eliminate, it collects all factors in the current set whose scopes include $X_i$. These factors are multiplied together to form an intermediate factor $\psi = \prod_j \phi_j$, where the product is over the relevant factors; the scope of $\psi$ includes $X_i$ and all other variables appearing in the multiplied factors. Then, $X_i$ is summed out by computing $\phi'(Z) = \sum_{x_i} \psi(x_i, Z)$, where $Z$ denotes the remaining variables in the scope of $\psi$. This new factor $\phi'$ is added back to the set of factors, replacing the originals used in the multiplication, while factors not involving $X_i$ remain unchanged. The process repeats for the next variable until all non-query variables are eliminated. For MAP queries, the summation over $X_i$ is replaced by a max operation, with the maximizing assignment tracked.

After eliminating all non-query variables, the remaining factors have scopes contained within the query variables $Y$. These factors are multiplied together to obtain the unnormalized marginal $\tilde{P}(Y \mid e) = \prod_m \phi_m$, where the product is over the final factors. For probability queries, normalization is applied by dividing by $\sum_Y \tilde{P}(Y \mid e)$ to yield $P(Y \mid e)$. The output is this final factor over $Y$ for marginal inference, or the tracked assignment for MAP. The procedure can be outlined in pseudocode as follows:

function VariableElimination(factors φ, query Y, evidence e, order σ):
    // Step 1: Restrict to evidence
    for each φ_i in φ:
        φ_i ← Restrict(φ_i, e)                      // Evaluate at e and remove observed vars
    // Steps 2-3: Eliminate non-query variables in order σ
    remaining_vars ← all vars except Y
    while remaining_vars ≠ ∅:
        X_i ← next var in σ from remaining_vars
        relevant_φ ← {φ_j in φ | X_i in scope(φ_j)}
        ψ ← Product(relevant_φ)                     // Multiply: ψ(x_i, Z) = ∏ φ_j
        φ' ← SumOut(ψ, X_i)                         // φ'(Z) = ∑_{x_i} ψ(x_i, Z)
        φ ← (φ \ relevant_φ) ∪ {φ'}                 // Update factor set
        remaining_vars ← remaining_vars \ {X_i}
    // Step 4: Compute final marginal
    final_φ ← Product(φ)                            // All remaining factors over Y
    P(Y|e) ← Normalize(final_φ)                     // Divide by sum over Y
    return P(Y|e)

This pseudocode integrates the core operations of factor multiplication and variable summation as building blocks for efficient inference.

Illustrative Example

To illustrate the variable elimination algorithm, consider a simple Bayesian network with three binary variables: A (root node), B (dependent on A), and C (dependent on both A and B). The network has edges A → B, A → C, and B → C, textually depicted as:

    A
   / \
  B → C

The query is the marginal probability P(C = true). The conditional probability tables (CPTs) define the initial factors: φ₁(A) = P(A), φ₂(A, B) = P(B | A), and φ₃(A, B, C) = P(C | A, B). All variables take values in {true, false}. The factor φ₁(A) is given by:
A       φ₁(A)
true    0.5
false   0.5
The factor φ₂(A, B) is:
A       B       φ₂(A, B)
true    true    0.9
true    false   0.1
false   true    0.1
false   false   0.9
The factor φ₃(A, B, C) for C = true (with symmetric probabilities for C = false summing to 1 for each (A, B) instantiation) is:
A       B       C       φ₃(A, B, C)
true    true    true    0.8
true    false   true    0.3
false   true    true    0.4
false   false   true    0.2
Following the variable elimination procedure with elimination order B then A, first multiply φ₂(A, B) and φ₃(A, B, C = true) to obtain the intermediate factor ψ(A, B, C = true) = φ₂(A, B) × φ₃(A, B, C = true). Then sum out B to produce φ(A, C = true) = ∑_B ψ(A, B, C = true). The summation over B yields: For A = true, C = true:
φ(A = true, C = true) = [φ₂(true, true) × φ₃(true, true, true)] + [φ₂(true, false) × φ₃(true, false, true)]
= (0.9 × 0.8) + (0.1 × 0.3) = 0.72 + 0.03 = 0.75
For A = false, C = true:
φ(A = false, C = true) = (0.1 × 0.4) + (0.9 × 0.2) = 0.04 + 0.18 = 0.22
Thus, the factor φ(A, C = true) is:
A       C       φ(A, C)
true    true    0.75
false   true    0.22
Next, multiply this with φ₁(A) to get ψ(A, C = true) = φ₁(A) × φ(A, C = true):
A       C       ψ(A, C)
true    true    0.5 × 0.75 = 0.375
false   true    0.5 × 0.22 = 0.11
Finally, sum out A: φ(C = true) = ∑_A ψ(A, C = true) = 0.375 + 0.11 = 0.485. This is the unnormalized marginal probability, but since the factors derive from normalized CPTs, it equals the true P(C = true) = 0.485. A similar process for C = false yields P(C = false) = 0.515, confirming normalization.
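The computation above can be checked with a short, self-contained Python snippet (illustrative, not from the text) that enumerates the joint distribution from the same CPTs:

# Verify the worked example: P(C = true) should come out to 0.485.
p_a = {True: 0.5, False: 0.5}
p_b_given_a = {(True, True): 0.9, (True, False): 0.1,
               (False, True): 0.1, (False, False): 0.9}              # key: (A, B)
p_c_true_given_ab = {(True, True): 0.8, (True, False): 0.3,
                     (False, True): 0.4, (False, False): 0.2}        # key: (A, B)

p_c_true = 0.0
for a in (True, False):
    for b in (True, False):
        # P(A) * P(B | A) * P(C = true | A, B), summed over A and B.
        p_c_true += p_a[a] * p_b_given_a[(a, b)] * p_c_true_given_ab[(a, b)]
print(p_c_true)   # 0.485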

Applications in Inference

Exact Inference in Bayesian Networks

Variable elimination (VE) applied to Bayesian networks (BNs) begins by representing the joint distribution as a product of the conditional probability tables (CPTs) associated with each node, where each CPT serves as an initial factor encoding the conditional dependencies. To perform elimination, the directed acyclic graph (DAG) of the BN is first moralized by adding undirected edges between all pairs of parents of each common child and removing edge directions, transforming it into an undirected moral graph that captures the relevant conditional independencies for inference. The elimination order is then selected on this moral graph to systematically remove variables, ensuring that the process respects the network's structure while avoiding cycles in the intermediate computations.

For marginal inference queries such as computing the posterior probability P(Y|e), where Y is the query variable and e represents observed evidence on other nodes, VE processes the factors by summing out all non-query and non-evidence variables in the chosen order, ultimately yielding a factor over Y that is normalized to obtain the marginal. When evidence is present on multiple nodes, the initial factors corresponding to those evidence variables are restricted upfront by setting them to their observed values, reducing the domain size and simplifying subsequent multiplications and summations. This adaptation ensures efficient incorporation of evidence without altering the core elimination steps.

To address maximum a posteriori (MAP) queries, which seek the most probable assignment to a subset of variables given evidence, VE is modified into a max-product variant by replacing summation operations during elimination with maximization over the eliminated variables, propagating the highest-probability assignments through the factors. This extension maintains the same structural framework but shifts the objective from probabilistic marginalization to optimization, making it suitable for finding optimal configurations in diagnostic or decision-making tasks within BNs.

A specialized variant known as bucket elimination organizes the factors into "buckets" indexed by the variable to be eliminated last among its arguments, processing each bucket sequentially from the end of the elimination order to the beginning, which facilitates efficient storage and reuse of intermediate results, particularly in directed structures like BNs. This approach leverages the ordering to minimize factor sizes during processing, enhancing practicality for networks with moderate connectivity.

Although BNs are acyclic, the moralization process can introduce cycles, which VE handles by implicitly triangulating the graph during elimination—adding fill-in edges to form cliques—and the overall time and space complexity is exponential in the induced width of the moral graph under the chosen ordering, a measure of the largest clique size formed during the process. The treewidth of the moral graph, defined as the minimum induced width over all possible orderings, provides a graph-theoretic bound on this complexity, indicating that exact VE is feasible only for BNs with small or structured treewidth, such as those approximating tree-like dependencies.
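A small sketch of moralization follows (illustrative code, not from the text): connect all pairs of parents of each node and drop edge directions.

from itertools import combinations

def moralize(parents):
    """parents: dict mapping each node to the list of its parents in the DAG.
    Returns the edge set of the moral (undirected) graph."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                        # keep original edges, now undirected
            edges.add(frozenset((p, child)))
        for p1, p2 in combinations(pa, 2):  # "marry" co-parents
            edges.add(frozenset((p1, p2)))
    return edges

# Example: the A -> C <- B structure gains a moral edge between A and B.
print(moralize({"A": [], "B": [], "C": ["A", "B"]}))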

Exact Inference in Markov Random Fields

In Markov random fields (MRFs), variable elimination enables exact inference by factorizing the joint distribution over an undirected graph into potentials defined on maximal cliques $C$. The joint probability is given by $P(X) = \frac{1}{Z} \prod_{C} \psi_C(X_C)$, where $\psi_C$ are non-negative potential functions over the variables in clique $C$, and $Z = \sum_X \prod_{C} \psi_C(X_C)$ is the normalizing partition function. The algorithm proceeds similarly to its application in Bayesian networks but operates directly on the undirected structure, without needing moralization to connect parents. An elimination order is chosen for the variables, and for each variable $X_i$ to eliminate, all potentials involving $X_i$ are multiplied together, then $X_i$ is marginalized out (typically by summation) to produce a new potential over the remaining neighboring variables. This multiplication and elimination step induces fill-in edges between neighbors of $X_i$, effectively completing the graph to a chordal form where every cycle of length greater than three has a chord; the resulting factors represent the updated distribution over the triangulated graph.

To compute a query such as the marginal $P(Y \mid e)$, evidence variables in $e$ are fixed by multiplying in delta potentials (indicators that enforce the observed values), and all non-query variables are eliminated in sequence; the final factor over $Y$ is then normalized by the partition function $Z$, which is computed via a separate full-elimination pass over all variables. Unlike in Bayesian networks, where conditional probability tables are normalized, MRF potentials $\psi_C$ are often unnormalized, and the lack of directed edges means independence is encoded via separation in the undirected graph rather than d-separation.

A representative example arises in image denoising with a grid-structured MRF, where each pixel is a variable $X_{i,j}$, data potentials $\psi_d(X_{i,j}, Y_{i,j})$ encode compatibility with the noisy observation $Y_{i,j}$, and pairwise potentials $\psi_s(X_{i,j}, X_{i,j+1})$ and $\psi_s(X_{i,j}, X_{i+1,j})$ encourage neighboring pixels to take similar values. Applying elimination row-by-row—first eliminating all variables in row 1, then row 2, and so on—multiplies horizontal and vertical potentials to form intermediate factors over entire rows, yielding the posterior marginals for the denoised image; however, this creates factors of size exponential in the grid width due to induced cliques.

Exact inference via variable elimination is feasible in MRFs that are tree-structured (treewidth 1, allowing linear-time computation equivalent to belief propagation) or have bounded low treewidth, where the complexity is polynomial in the number of variables but exponential in the treewidth. For loopy MRFs with cycles and high treewidth, such as dense grids, exact VE becomes intractable, leading to approximations like loopy belief propagation or truncated elimination schemes, though the focus here remains on exact methods for low-treewidth cases.
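A minimal illustration (invented example, not from the text): for a three-variable chain MRF with pairwise potentials, the partition function Z can be computed by eliminating one variable at a time rather than enumerating all joint states.

import numpy as np

# Chain MRF X1 - X2 - X3 with binary variables and pairwise potentials.
psi_12 = np.array([[2.0, 1.0],
                   [1.0, 2.0]])      # psi(X1, X2)
psi_23 = np.array([[3.0, 1.0],
                   [1.0, 3.0]])      # psi(X2, X3)

# Eliminate X1: m1(X2) = sum_{x1} psi(x1, X2); then eliminate X3 similarly.
m1 = psi_12.sum(axis=0)              # intermediate factor over X2
m3 = psi_23.sum(axis=1)              # intermediate factor over X2
Z = (m1 * m3).sum()                  # finally sum out X2

# Brute-force check over all 2^3 joint states gives the same value.
Z_brute = sum(psi_12[x1, x2] * psi_23[x2, x3]
              for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1))
print(Z, Z_brute)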

Complexity and Optimization

Computational Complexity

The computational complexity of variable elimination (VE) is fundamentally tied to the structure of the graphical model and the chosen elimination order. During each elimination step for a variable $X_i$, the algorithm multiplies relevant factors and sums over the domain of $X_i$, incurring a cost of $O(d^{w+1})$, where $d$ is the maximum domain size of the variables and $w$ is the induced width of the graph with respect to the elimination order (the maximum size of the scope of intermediate factors minus one). Since there are $n$ variables to eliminate, the total time complexity is $O(n \cdot d^{w+1})$. The space complexity mirrors this, as the algorithm must store intermediate factors, requiring up to $O(d^{w+1})$ space for the largest ones generated during the process.

The induced width $w$ is closely related to the treewidth $tw$ of the graph, which is the minimum induced width over all possible elimination orders; thus, VE's cost can be bounded by $O(n \cdot d^{tw+1})$. For graphs with low treewidth, such as trees where $tw = 1$, VE achieves $O(n \cdot d^{2})$ time complexity, which is linear in $n$ for fixed $d$, making it efficient for sparse structures. In the worst case, for dense graphs where $tw \approx n-1$, the complexity becomes exponential, scaling as $O(n \cdot d^{n})$ or, for binary domains ($d = 2$), $O(n \cdot 2^{n})$.

Several factors influence this complexity: larger domain sizes $d$ increase the base of the exponential term, while the number of initial factors affects multiplication costs; incorporating evidence can reduce factor scopes, potentially lowering $w$ and thus mitigating the exponential growth. Compared to naive enumeration, which requires $O(d^n)$ time by evaluating all joint assignments, VE is more efficient when $w \ll n$, but it remains inferior to approximate methods like loopy belief propagation for high-treewidth graphs where exact inference is intractable.
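A quick numeric illustration (invented numbers) of the bound $O(n \cdot d^{w+1})$ versus naive enumeration $O(d^n)$:

# For a chain of n = 30 binary variables, a good (endpoint-first) ordering
# gives induced width w = 1.
n, d, w = 30, 2, 1
ve_cost = n * d ** (w + 1)      # ~120 table entries processed by VE
naive_cost = d ** n             # 2**30, roughly 1.07e9 joint assignments
print(ve_cost, naive_cost)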

Variable Ordering Strategies

The efficiency of variable elimination (VE) hinges critically on the choice of elimination order, as it directly influences the size of intermediate factors generated during summation and multiplication steps. A poor ordering can lead to exponentially large factors by inducing high-degree cliques in the moralized graph, whereas a well-chosen order minimizes the induced width—the maximum size of these cliques minus one—keeping computational costs manageable. For instance, in a chain-structured graph with variables connected sequentially (A-B-C-D), eliminating endpoint variables first (e.g., A then D) maintains factors of size at most two, resulting in linear time complexity; conversely, eliminating a central variable like B first creates a factor over A, B, and C, inflating the width unnecessarily.

Static heuristics compute an ordering in advance based on the graph structure to approximate the minimum induced width. The minimum-degree heuristic selects the variable with the fewest neighbors at each step, aiming to delay the growth of clique sizes. The min-fill heuristic extends this by choosing the variable whose elimination introduces the fewest new edges (fill-ins) to the graph, thereby controlling the density of the induced graph more effectively; empirical studies show it often yields orderings with smaller widths than minimum-degree, particularly in sparse networks. The goal of these heuristics is to find an ordering close to the minimum induced width, though determining the exact minimum is NP-complete.

Dynamic ordering strategies adapt the selection during the VE process by evaluating current factor sizes or scopes, such as prioritizing the variable whose involved factors have the smallest product of domain sizes, to minimize immediate computational overhead. While less common in pure VE due to the fixed-order nature of the algorithm, dynamic approaches appear in hybrid variants like mini-bucket elimination, where they refine buckets on-the-fly for better bounding.

Algorithms for generating good orderings include greedy implementations of min-fill and minimum-degree, which run in near-linear time, and more advanced search-based methods with approximations for larger graphs, as exhaustive search for the optimal order is infeasible. Several software tools integrate these heuristics into VE implementations for Bayesian networks, allowing users to select min-fill or other static orderings to optimize inference on moralized graphs. Limitations persist, as even strong heuristics may not achieve optimal widths on dense graphs, often necessitating combination with graph triangulation techniques to bound fill-ins upfront; in practice, multiple heuristics are tested to balance preprocessing time against inference speed.
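A greedy minimum-degree ordering can be sketched as follows (illustrative code, not from the text); min-fill would instead score each candidate by the number of new edges its elimination would add.

def min_degree_order(adjacency):
    """adjacency: dict mapping each variable to the set of its neighbors
    in the (moralized) undirected graph. Returns an elimination order."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order = []
    while adj:
        # Pick the variable with the fewest current neighbors.
        v = min(adj, key=lambda x: len(adj[x]))
        nbrs = adj.pop(v)
        # Connect its neighbors (fill-in edges) and remove v from the graph.
        for u in nbrs:
            adj[u] |= nbrs - {u}
            adj[u].discard(v)
        order.append(v)
    return order

# Chain A - B - C - D: each step eliminates a current endpoint, keeping factors small.
print(min_degree_order({"A": {"B"}, "B": {"A", "C"},
                        "C": {"B", "D"}, "D": {"C"}}))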