Dempster–Shafer theory
from Wikipedia
[Image caption: Arthur P. Dempster at the Workshop on Theory of Belief Functions (Brest, 1 April 2010).]

The theory of belief functions, also referred to as evidence theory or Dempster–Shafer theory (DST), is a general framework for reasoning with uncertainty, with understood connections to other frameworks such as probability, possibility and imprecise probability theories. Introduced by Arthur P. Dempster[1] in the context of statistical inference, the theory was later developed by Glenn Shafer into a general framework for modeling epistemic uncertainty—a mathematical theory of evidence.[2][3] The theory allows one to combine evidence from different sources and arrive at a degree of belief (represented by a mathematical object called belief function) that takes into account all the available evidence.

In a narrow sense, the term Dempster–Shafer theory refers to the original conception of the theory by Dempster and Shafer. However, it is more common to use the term in the wider sense of the same general approach, as adapted to specific kinds of situations. In particular, many authors have proposed different rules for combining evidence, often with a view to handling conflicts in evidence better.[4] The early contributions have also been the starting points of many important developments, including the transferable belief model and the theory of hints.[5]

Overview

Dempster–Shafer theory is a generalization of the Bayesian theory of subjective probability. Belief functions base degrees of belief (or confidence, or trust) for one question on the subjective probabilities for a related question. The degrees of belief themselves may or may not have the mathematical properties of probabilities; how much they differ depends on how closely the two questions are related.[6] Put differently, it is a method of representing epistemic plausibilities, one that can yield answers contradicting those arrived at using probability theory.

Often used as a method of sensor fusion, Dempster–Shafer theory is based on two ideas: obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster's rule[7] for combining such degrees of belief when they are based on independent items of evidence. In essence, the degree of belief in a proposition depends primarily upon the number of answers (to the related questions) containing the proposition, and the subjective probability of each answer. Also contributing are the rules of combination that reflect general assumptions about the data.

In this formalism a degree of belief (also referred to as a mass) is represented as a belief function rather than a Bayesian probability distribution. Probability values are assigned to sets of possibilities rather than single events: their appeal rests on the fact they naturally encode evidence in favor of propositions.

Dempster–Shafer theory assigns its masses to all of the subsets of the set of states of a system—in set-theoretic terms, the power set of the states. For instance, assume a situation where there are two possible states of a system. For this system, any belief function assigns mass to four subsets: the empty set (neither state), each of the two states individually, and the set containing both.

Belief and plausibility

Shafer's formalism starts from a set of possibilities under consideration, for instance numerical values of a variable, or pairs of linguistic variables like "date and place of origin of a relic" (asking whether it is antique or a recent fake). A hypothesis is represented by a subset of this frame of discernment, like "(Ming dynasty, China)", or "(19th century, Germany)".[2]: p.35f. 

Shafer's framework allows for belief about such propositions to be represented as intervals, bounded by two values, belief (or support) and plausibility:

belief ≤ plausibility.

In a first step, subjective probabilities (masses) are assigned to all subsets of the frame; usually, only a restricted number of sets will have non-zero mass (focal elements).[2]: 39f.  Belief in a hypothesis is constituted by the sum of the masses of all subsets of the hypothesis-set. It is the amount of belief that directly supports either the given hypothesis or a more specific one, thus forming a lower bound on its probability. Belief (usually denoted Bel) measures the strength of the evidence in favor of a proposition p. It ranges from 0 (indicating no evidence) to 1 (denoting certainty). Plausibility is 1 minus the sum of the masses of all sets whose intersection with the hypothesis is empty. Or, it can be obtained as the sum of the masses of all sets whose intersection with the hypothesis is not empty. It is an upper bound on the possibility that the hypothesis could be true, because there is only so much evidence that contradicts that hypothesis. Plausibility (denoted by Pl) is thus related to Bel by Pl(p) = 1 − Bel(~p). It also ranges from 0 to 1 and measures the extent to which evidence in favor of ~p leaves room for belief in p.

For example, suppose we have a belief of 0.5 for a proposition, say "the cat in the box is dead." This means that we have evidence that allows us to state strongly that the proposition is true with a confidence of 0.5. However, the evidence contrary to that hypothesis (i.e. "the cat is alive") only has a confidence of 0.2. The remaining mass of 0.3 (the gap between the 0.5 supporting evidence on the one hand, and the 0.2 contrary evidence on the other) is "indeterminate," meaning that the cat could either be dead or alive. This interval represents the level of uncertainty based on the evidence in the system.

Hypothesis Mass Belief Plausibility
Neither (alive nor dead) 0 0 0
Alive 0.2 0.2 0.5
Dead 0.5 0.5 0.8
Either (alive or dead) 0.3 1.0 1.0

The "neither" hypothesis is set to zero by definition (it corresponds to "no solution"). The orthogonal hypotheses "Alive" and "Dead" have probabilities of 0.2 and 0.5, respectively. This could correspond to "Live/Dead Cat Detector" signals, which have respective reliabilities of 0.2 and 0.5. Finally, the all-encompassing "Either" hypothesis (which simply acknowledges there is a cat in the box) picks up the slack so that the sum of the masses is 1. The belief for the "Alive" and "Dead" hypotheses matches their corresponding masses because they have no non-empty proper subsets; belief for "Either" consists of the sum of all three masses (Either, Alive, and Dead) because "Alive" and "Dead" are each subsets of "Either". The "Alive" plausibility is 1 − m(Dead) = 0.5, and the "Dead" plausibility is 1 − m(Alive) = 0.8. Put another way, the "Alive" plausibility is m(Alive) + m(Either), and the "Dead" plausibility is m(Dead) + m(Either). Finally, the "Either" plausibility sums m(Alive) + m(Dead) + m(Either). The universal hypothesis ("Either") will always have 100% belief and plausibility—it acts as a checksum of sorts.
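The belief and plausibility columns of the table can be reproduced mechanically from the masses. The following is a minimal sketch, not part of the theory's standard notation: the frozenset encoding and the function names are my own.

```python
# Mass assignment from the cat example, with hypotheses encoded as frozensets.
m = {
    frozenset(): 0.0,                    # "Neither"
    frozenset({"alive"}): 0.2,           # "Alive"
    frozenset({"dead"}): 0.5,            # "Dead"
    frozenset({"alive", "dead"}): 0.3,   # "Either"
}

def belief(A, m):
    """bel(A): total mass of all subsets of A."""
    return sum(v for B, v in m.items() if B <= A)

def plausibility(A, m):
    """pl(A): total mass of all sets that intersect A."""
    return sum(v for B, v in m.items() if B & A)

alive = frozenset({"alive"})
dead = frozenset({"dead"})
either = frozenset({"alive", "dead"})
```

Here belief(alive, m) gives 0.2 and plausibility(alive, m) gives 0.5, matching the table, while belief(either, m) sums all three non-empty masses to 1.0.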

Here is a somewhat more elaborate example where the behavior of belief and plausibility begins to emerge. We're looking through a variety of detector systems at a single faraway signal light, which can only be coloured in one of three colours (red, yellow, or green):

Hypothesis Mass Belief Plausibility
None 0 0 0
Red 0.35 0.35 0.56
Yellow 0.25 0.25 0.45
Green 0.15 0.15 0.34
Red or Yellow 0.06 0.66 0.85
Red or Green 0.05 0.55 0.75
Yellow or Green 0.04 0.44 0.65
Any 0.1 1.0 1.0

Events of this kind would not be modeled as distinct entities in probability space as they are here in mass assignment space. Rather the event "Red or Yellow" would be considered as the union of the events "Red" and "Yellow", and (see probability axioms) P(Red or Yellow) ≥ P(Yellow), and P(Any) = 1, where Any refers to Red or Yellow or Green. In DST the mass assigned to Any refers to the proportion of evidence that can not be assigned to any of the other states, which here means evidence that says there is a light but does not say anything about what color it is. In this example, the proportion of evidence that shows the light is either Red or Green is given a mass of 0.05. Such evidence might, for example, be obtained from a R/G color blind person. DST lets us extract the value of this sensor's evidence. Also, in DST the empty set is considered to have zero mass, meaning here that the signal light system exists and we are examining its possible states, not speculating as to whether it exists at all.

Combining beliefs

Beliefs from different sources can be combined with various fusion operators to model specific situations of belief fusion, e.g. with Dempster's rule of combination, which combines belief constraints[8] that are dictated by independent belief sources, such as in the case of combining hints[5] or combining preferences.[9] Note that the probability masses from propositions that contradict each other can be used to obtain a measure of conflict between the independent belief sources. Other situations can be modeled with different fusion operators, such as cumulative fusion of beliefs from independent sources, which can be modeled with the cumulative fusion operator.[10]

Dempster's rule of combination is sometimes interpreted as an approximate generalisation of Bayes' rule. In this interpretation the priors and conditionals need not be specified, unlike traditional Bayesian methods, which often use a symmetry (minimax error) argument to assign prior probabilities to random variables (e.g. assigning 0.5 to binary values for which no information is available about which is more likely). However, any information contained in the missing priors and conditionals is not used in Dempster's rule of combination unless it can be obtained indirectly—and arguably is then available for calculation using Bayes equations.

Dempster–Shafer theory allows one to specify a degree of ignorance in this situation instead of being forced to supply prior probabilities that add to unity. This sort of situation, and whether there is a real distinction between risk and ignorance, has been extensively discussed by statisticians and economists. See, for example, the contrasting views of Daniel Ellsberg, Howard Raiffa, Kenneth Arrow and Frank Knight.[citation needed]

Formal definition

Let X be the universe: the set representing all possible states of a system under consideration. The power set

2^X

is the set of all subsets of X, including the empty set ∅. For example, if:

X = {a, b}

then

2^X = {∅, {a}, {b}, {a, b}}.
The elements of the power set can be taken to represent propositions concerning the actual state of the system, by containing all and only the states in which the proposition is true.

The theory of evidence assigns a belief mass to each element of the power set. Formally, a function

m : 2^X → [0, 1]

is called a basic belief assignment (BBA), when it has two properties. First, the mass of the empty set is zero:

m(∅) = 0.

Second, the masses of all the members of the power set add up to a total of 1:

∑_{A ∈ 2^X} m(A) = 1.
The mass m(A) of A, a given member of the power set, expresses the proportion of all relevant and available evidence that supports the claim that the actual state belongs to A but to no particular subset of A. The value of m(A) pertains only to the set A and makes no additional claims about any subsets of A, each of which has, by definition, its own mass.

From the mass assignments, the upper and lower bounds of a probability interval can be defined. This interval contains the precise probability of a set of interest (in the classical sense), and is bounded by two non-additive continuous measures called belief (or support) and plausibility:

The belief bel(A) for a set A is defined as the sum of all the masses of subsets of the set of interest:

bel(A) = ∑_{B ⊆ A} m(B).

The plausibility pl(A) is the sum of all the masses of the sets B that intersect the set of interest A:

pl(A) = ∑_{B : B ∩ A ≠ ∅} m(B).

The two measures are related to each other as follows:

pl(A) = 1 − bel(Ā),

where Ā denotes the complement of A. And conversely, for finite A, given the belief measure bel(B) for all subsets B of A, we can find the masses m(A) with the following inverse function:

m(A) = ∑_{B ⊆ A} (−1)^{|A − B|} bel(B),

where |A − B| is the cardinality of the set difference of the two sets.[4]

It follows from the last two equations that, for a finite set X, one needs to know only one of the three (mass, belief, or plausibility) to deduce the other two; though one may need to know the values for many sets in order to calculate one of the other values for a particular set. In the case of an infinite X, there can be well-defined belief and plausibility functions but no well-defined mass function.[11]
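This round trip can be checked numerically. Below is a small sketch (the helper names are mine): belief is computed from a mass function, and the mass is then recovered with the inverse (Möbius) formula.

```python
from itertools import chain, combinations

def subsets(A):
    """All subsets of A, as frozensets."""
    A = list(A)
    return [frozenset(s) for s in
            chain.from_iterable(combinations(A, r) for r in range(len(A) + 1))]

def belief(A, m):
    """bel(A) = sum of m(B) over all B ⊆ A."""
    return sum(v for B, v in m.items() if B <= A)

def mass_from_belief(A, bel_fn):
    """Inverse function: m(A) = sum over B ⊆ A of (−1)^|A − B| · bel(B)."""
    return sum((-1) ** len(A - B) * bel_fn(B) for B in subsets(A))

# Masses from the cat example.
m = {frozenset({"alive"}): 0.2, frozenset({"dead"}): 0.5,
     frozenset({"alive", "dead"}): 0.3}

either = frozenset({"alive", "dead"})
recovered = mass_from_belief(either, lambda B: belief(B, m))
```

Here recovered equals m(either) = 0.3 up to floating-point rounding: bel(∅) − bel({alive}) − bel({dead}) + bel({alive, dead}) = 0 − 0.2 − 0.5 + 1.0 = 0.3.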

Dempster's rule of combination

The problem we now face is how to combine two independent sets of probability mass assignments in specific situations. When different sources express their beliefs over the frame in terms of belief constraints, such as in the case of giving hints or expressing preferences, Dempster's rule of combination is the appropriate fusion operator. This rule derives common shared belief between multiple sources and ignores all the conflicting (non-shared) belief through a normalization factor. Use of that rule in situations other than combining belief constraints has come under serious criticism, such as when fusing separate belief estimates from multiple sources that are to be integrated in a cumulative manner, and not as constraints. Cumulative fusion means that all probability masses from the different sources are reflected in the derived belief, so no probability mass is ignored.

Specifically, the combination (called the joint mass) is calculated from the two sets of masses m1 and m2 in the following manner:

m_{1,2}(∅) = 0

m_{1,2}(A) = (m1 ⊕ m2)(A) = (1 / (1 − K)) ∑_{B ∩ C = A ≠ ∅} m1(B) m2(C)

where

K = ∑_{B ∩ C = ∅} m1(B) m2(C)

is a measure of the amount of conflict between the two mass sets.
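A direct implementation of the rule, as a sketch (the function name and frozenset encoding are mine):

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Products of masses landing on empty intersections are treated as
    conflict K and discarded; the remainder is renormalized by 1 − K.
    """
    joint, K = {}, 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            inter = B & C
            if inter:
                joint[inter] = joint.get(inter, 0.0) + v1 * v2
            else:
                K += v1 * v2  # conflicting mass
    if 1.0 - K <= 0.0:
        raise ValueError("total conflict: combination is undefined")
    return {A: v / (1.0 - K) for A, v in joint.items()}

# The film-preference example below: high conflict, but agreement on Y survives.
alice = {frozenset({"X"}): 0.99, frozenset({"Y"}): 0.01}
bob = {frozenset({"Z"}): 0.99, frozenset({"Y"}): 0.01}
combined = dempster_combine(alice, bob)
```

The result assigns essentially all mass to {Y}: the conflict is K = 0.9999, and the single surviving product 0.01 × 0.01 is renormalized to 1.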

Effects of conflict

The normalization factor above, 1 − K, has the effect of completely ignoring conflict and attributing any mass associated with conflict to the empty set. This combination rule for evidence can therefore produce counterintuitive results, as we show next.

Example producing correct results in case of high conflict

The following example shows how Dempster's rule produces intuitive results when applied in a preference fusion situation, even when there is high conflict.

Suppose that two friends, Alice and Bob, want to see a film at the cinema one evening, and that there are only three films showing: X, Y and Z. Alice expresses her preference for film X with probability 0.99, and her preference for film Y with a probability of only 0.01. Bob expresses his preference for film Z with probability 0.99, and his preference for film Y with a probability of only 0.01. When combining the preferences with Dempster's rule of combination it turns out that their combined preference results in probability 1.0 for film Y, because it is the only film that they both agree to see.

Dempster's rule of combination produces intuitive results even in the case of totally conflicting beliefs when interpreted in this way. Assume that Alice prefers film X with probability 1.0, and that Bob prefers film Z with probability 1.0. When trying to combine their preferences with Dempster's rule it turns out that it is undefined in this case, which means that there is no solution. This would mean that they cannot agree on seeing any film together, so they do not go to the cinema together that evening. However, the semantics of interpreting preference as a probability is vague: if it refers to the probability of seeing film X tonight, then we face the fallacy of the excluded middle: the event that actually occurs, seeing none of the films tonight, has a probability mass of 0.

Example producing counter-intuitive results in case of high conflict

An example with exactly the same numerical values was introduced by Lotfi Zadeh in 1979,[12][13][14] to point out counter-intuitive results generated by Dempster's rule when there is a high degree of conflict. The example goes as follows:

Suppose that one has two equally reliable doctors, and one doctor believes a patient has either a brain tumor, with a probability (i.e., a basic belief assignment, or mass of belief) of 0.99, or meningitis, with a probability of only 0.01. A second doctor believes the patient has a concussion, with a probability of 0.99, and believes the patient suffers from meningitis, with a probability of only 0.01. Applying Dempster's rule to combine these two sets of masses of belief, one finally gets m(meningitis) = 1 (meningitis is diagnosed with 100 percent confidence).

Such a result goes against common sense, since both doctors agree that there is little chance that the patient has meningitis. This example has been the starting point of many research works trying to find a solid justification for Dempster's rule and for the foundations of Dempster–Shafer theory,[15][16] or to show the inconsistencies of this theory.[17][18][19]
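Zadeh's numbers can be checked directly. A sketch of the arithmetic (the variable names are mine):

```python
# Doctor 1: tumor 0.99, meningitis 0.01.  Doctor 2: concussion 0.99, meningitis 0.01.
# Under Dempster's rule, only the meningitis × meningitis product has a
# non-empty intersection; every other product is conflict.
agree = 0.01 * 0.01                            # mass surviving the intersection
K = 0.99 * 0.99 + 0.99 * 0.01 + 0.01 * 0.99    # conflicting mass: 0.9999
m_meningitis = agree / (1.0 - K)               # renormalized, approximately 1.0
```

Although both doctors consider meningitis nearly impossible, the normalization by 1 − K inflates its combined mass to 1.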

Example producing counter-intuitive results in case of low conflict

The following example shows where Dempster's rule produces a counter-intuitive result, even when there is low conflict.

Suppose that one doctor believes a patient has either a brain tumor, with a probability of 0.99, or meningitis, with a probability of only 0.01. A second doctor also believes the patient has a brain tumor, with a probability of 0.99, and believes the patient suffers from concussion, with a probability of only 0.01. If we calculate m(brain tumor) with Dempster's rule, we obtain

m(brain tumor) = bel(brain tumor) = 1.
This result implies complete support for the diagnosis of a brain tumor, which both doctors believed very likely. The agreement arises from the low degree of conflict between the two sets of evidence represented by the two doctors' opinions.

In either case, it would be reasonable to expect that

m(brain tumor) < 1 and bel(brain tumor) < 1,

since the existence of non-zero belief probabilities for other diagnoses implies less than complete support for the brain tumor diagnosis.

Dempster–Shafer as a generalisation of Bayesian theory

As in Dempster–Shafer theory, a Bayesian belief function has the properties bel(∅) = 0 and bel(X) = 1. The third defining condition of a probability measure, additivity, is subsumed by, but relaxed in, DS theory:[2]: p. 19 

bel(A) + bel(Ā) ≤ 1,

with equality holding in the Bayesian case. The following condition implies the Bayesian special case of the DS theory:[2]: p. 37, 45 

  • For finite X, all focal elements of the belief function are singletons.

As an example of how the two approaches differ, a Bayesian could model the color of a car as a probability distribution over (red, green, blue), assigning one number to each color. Dempster–Shafer would assign numbers to each of (red, green, blue, (red or green), (red or blue), (green or blue), (red or green or blue)). These numbers do not have to be coherent; for example, Bel(red)+Bel(green) does not have to equal Bel(red or green).

Thus, Bayes' conditional probability can be considered as a special case of Dempster's rule of combination.[2]: p. 19f.  However, it lacks many (if not most) of the properties that make Bayes' rule intuitively desirable, leading some to argue that it cannot be considered a generalization in any meaningful sense.[20] For example, DS theory violates the requirements for Cox's theorem, which implies that it cannot be considered a coherent (contradiction-free) generalization of classical logic—specifically, DS theory violates the requirement that a statement be either true or false (but not both). As a result, DS theory is subject to the Dutch Book argument, implying that any agent using DS theory would agree to a series of bets that result in a guaranteed loss.

Bayesian approximation

The Bayesian approximation[21][22] reduces a given bpa m to a (discrete) probability distribution P(m), i.e. only singleton subsets of the frame of discernment are allowed to be focal elements of the approximated version of m:

P(m)({x}) = ∑_{A : x ∈ A} m(A) / ∑_{A} m(A)·|A|, with P(m)(A) = 0 for every non-singleton A.

It is useful for those who are interested only in single-state hypotheses.

It can be applied to the 'light' example: the table below shows a second basic probability assignment m2, the combination m1⊕m2 obtained with Dempster's rule, and the Bayesian approximations P(·) of all three.

Hypothesis m1 m2 m1⊕m2 P(m1) P(m2) P(m1⊕m2)
None 0 0 0 0 0 0
Red 0.35 0.11 0.32 0.41 0.30 0.37
Yellow 0.25 0.21 0.33 0.33 0.38 0.38
Green 0.15 0.33 0.24 0.25 0.32 0.25
Red or Yellow 0.06 0.21 0.07 0 0 0
Red or Green 0.05 0.01 0.01 0 0 0
Yellow or Green 0.04 0.03 0.01 0 0 0
Any 0.1 0.1 0.02 0 0 0
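The P(m1) column can be reproduced with the approximation formula. A sketch follows (the function name is mine; the normalization matches the values in the table):

```python
def bayesian_approximation(m):
    """Collapse a bpa to a probability distribution over singletons:
    P(x) = (sum of m(A) over focal sets A containing x) / (sum of m(A)·|A|)."""
    total = sum(v * len(A) for A, v in m.items() if A)
    frame = frozenset().union(*m)
    return {x: sum(v for A, v in m.items() if x in A) / total for x in frame}

# m1 from the signal-light example.
m1 = {frozenset({"Red"}): 0.35, frozenset({"Yellow"}): 0.25,
      frozenset({"Green"}): 0.15, frozenset({"Red", "Yellow"}): 0.06,
      frozenset({"Red", "Green"}): 0.05, frozenset({"Yellow", "Green"}): 0.04,
      frozenset({"Red", "Yellow", "Green"}): 0.1}

approx = bayesian_approximation(m1)
```

Here approx["Red"] = 0.56 / 1.35 ≈ 0.41, matching the table, and Yellow and Green come out to ≈ 0.33 and ≈ 0.25; the three values sum to 1 as a probability distribution must.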

Criticism

Judea Pearl (1988a, chapter 9;[23] 1988b[24] and 1990)[25] has argued that it is misleading to interpret belief functions as representing either "probabilities of an event," or "the confidence one has in the probabilities assigned to various outcomes," or "degrees of belief (or confidence, or trust) in a proposition," or "degree of ignorance in a situation." Instead, belief functions represent the probability that a given proposition is provable from a set of other propositions, to which probabilities are assigned. Confusing probabilities of truth with probabilities of provability may lead to counterintuitive results in reasoning tasks such as (1) representing incomplete knowledge, (2) belief-updating and (3) evidence pooling. He further demonstrated that, if partial knowledge is encoded and updated by belief function methods, the resulting beliefs cannot serve as a basis for rational decisions.

Kłopotek and Wierzchoń[26] proposed to interpret the Dempster–Shafer theory in terms of statistics of decision tables (of the rough set theory), whereby the operator of combining evidence should be seen as relational joining of decision tables. In another interpretation M. A. Kłopotek and S. T. Wierzchoń[27] propose to view this theory as describing destructive material processing (under loss of properties), e.g. like in some semiconductor production processes. Under both interpretations reasoning in DST gives correct results, contrary to the earlier probabilistic interpretations, criticized by Pearl in the cited papers and by other researchers.

Jøsang proved that Dempster's rule of combination actually is a method for fusing belief constraints.[8] It only represents an approximate fusion operator in other situations, such as cumulative fusion of beliefs, and generally produces incorrect results in such situations. The confusion around the validity of Dempster's rule therefore originates in the failure to correctly interpret the nature of the situations to be modeled. Dempster's rule of combination always produces correct and intuitive results in situations of fusing belief constraints from different sources.

Relational measures

In considering preferences one might use the partial order of a lattice instead of the total order of the real line as found in Dempster–Shafer theory. Indeed, Gunther Schmidt has proposed this modification and outlined the method.[28]

Given a set of criteria C and a bounded lattice L with ordering ≤, Schmidt defines a relational measure to be a function μ from the power set P(C) into L that respects the order ⊆ on P(C):

A ⊆ B ⟹ μ(A) ≤ μ(B),

and such that μ takes the empty subset of C to the least element of L, and takes C to the greatest element of L.

Schmidt compares μ with the belief function of Shafer, and he also considers a method of combining measures generalizing the approach of Dempster (when new evidence is combined with previously held evidence). He also introduces a relational integral and compares it to the Choquet integral and Sugeno integral. Any relation m between C and L may be introduced as a "direct valuation", then processed with the calculus of relations to obtain a possibility measure μ.

from Grokipedia
The Dempster–Shafer theory, also known as the theory of evidence or belief function theory, is a mathematical framework for modeling and reasoning under uncertainty that generalizes classical probability theory by allowing the representation of incomplete or partial knowledge through non-additive measures of belief. It enables the quantification of both support for hypotheses and the degree of ignorance about them, distinguishing between aleatory uncertainty (randomness) and epistemic uncertainty (lack of knowledge). Developed initially by Arthur P. Dempster in the 1960s, the theory introduced the concept of upper and lower probabilities induced by multivalued mappings to handle imprecise inference from data to parameters. Glenn Shafer expanded this foundation in 1976 into a comprehensive theory of belief functions, providing axioms for belief measures and a rule for combining evidence from independent sources.

At its core, the theory operates over a frame of discernment Θ, a finite set of mutually exclusive and exhaustive hypotheses, with uncertainty represented via a basic probability assignment (or mass function) m: 2^Θ → [0,1], where m(∅) = 0 and ∑_{A ⊆ Θ} m(A) = 1, assigning "mass" to subsets rather than singletons. From the mass function, two key functions are derived: the belief function Bel(A) = ∑_{B ⊆ A} m(B), which measures the total evidence supporting A, and the plausibility function Pl(A) = ∑_{B ∩ A ≠ ∅} m(B) = 1 − Bel(Θ \ A), providing an upper bound on the potential truth of A. Evidence from multiple sources is aggregated using Dempster's rule of combination, a commutative operator that normalizes the product of mass functions while accounting for conflict (mass assigned to the empty set). This rule, formalized as m_{1,2}(A) = (1/(1−K)) ∑_{B ∩ C = A} m_1(B) m_2(C), where K = ∑_{B ∩ C = ∅} m_1(B) m_2(C), allows for the fusion of disparate pieces of evidence without requiring prior probabilities.
The theory's advantages include its ability to explicitly model ignorance (via mass on the full frame Θ) and to avoid the additivity assumptions of probability, making it suitable for applications where data is sparse or subjective. It has been influential in artificial intelligence, particularly in expert systems for uncertainty management, as well as in decision support under conflicting information. Despite critiques regarding the counterintuitive behavior of Dempster's rule in certain paradoxes, extensions like the transferable belief model have addressed these issues, sustaining its relevance in modern uncertain reasoning.

Overview

Core concepts

Dempster–Shafer theory (DST), also known as evidence theory, is a mathematical framework designed to represent and reason with uncertainty arising from incomplete or conflicting evidence, extending beyond the limitations of traditional probability theory by allowing assignments of belief to sets of hypotheses rather than individual outcomes. This approach enables the modeling of evidential reasoning where evidence may support multiple possibilities without specifying exact probabilities, originating from Arthur P. Dempster's foundational work on statistical inference under uncertainty through multivalued mappings that induce upper and lower probabilities.

DST distinguishes key types of uncertainty: ignorance, where evidence provides no distinction among hypotheses and belief is uniformly distributed over the entire possibility space, and conflict, arising from incompatible pieces of evidence that cannot simultaneously support the same hypotheses. By using set-valued assignments, DST captures these uncertainties more expressively than Bayesian methods, which require commitment to point probabilities and assume independence, allowing instead for the representation of partial commitment to hypotheses based on available evidence.

At the heart of DST is the frame of discernment Θ, defined as the finite set comprising all mutually exclusive and exhaustive hypotheses relevant to the problem at hand, serving as the foundational hypothesis space over which uncertainty is assessed. This setup motivates DST's core advantage: it permits degrees of belief that reflect the strength of evidence without forcing a full probabilistic distribution, thus accommodating situations of limited information where classical probability might overcommit or underrepresent evidential support. Belief functions, derived from such assignments, provide a mechanism to quantify this evidential support in a coherent manner.

Belief and plausibility measures

In Dempster–Shafer theory, the belief function, denoted Bel, maps the power set of the frame of discernment Θ to the unit interval [0,1] and satisfies several key axioms: it is monotonic, meaning Bel(A) ≤ Bel(B) whenever A ⊆ B ⊆ Θ; it is normalized such that Bel(∅) = 0 and Bel(Θ) = 1; and Bel(A) + Bel(Θ \ A) ≤ 1 for all A ⊆ Θ. These properties ensure that belief increases with the inclusion of supporting evidence, that no belief is assigned to impossible events while full belief is assigned to the entire frame, and that there is room for uncertainty. Belief functions are constructed from basic probability assignments, which distribute belief masses over subsets of Θ.

The plausibility function, Pl, provides an upper bound on belief and is defined for any subset A ⊆ Θ by the relation Pl(A) = 1 − Bel(Ā), where Ā denotes the complement of A with respect to Θ. This dual formulation captures the degree to which evidence does not contradict A, reflecting potential support amid uncertainty. A fundamental property is that Bel(A) ≤ Pl(A) for all A ⊆ Θ, forming an interval [Bel(A), Pl(A)] that quantifies the range between confirmed support and possible extension.

Belief functions exhibit superadditivity: for any disjoint subsets A and B of Θ, Bel(A ∪ B) ≥ Bel(A) + Bel(B), indicating that the belief in a union is at least the sum of individual beliefs, allowing for reinforcement of evidence. Conversely, plausibility functions are subadditive: for disjoint A and B, Pl(A ∪ B) ≤ Pl(A) + Pl(B), indicating that the potential support for a union cannot exceed the sum of individual plausibilities. These properties highlight the theory's capacity to handle imprecision without assuming probabilistic independence.

Conceptually, Bel(A) measures the total portion of evidence that commits to A or its subsets, representing confirmed support from available sources. In contrast, Pl(A) measures the maximum evidence that might support A, incorporating portions uncommitted to A or its complement and thus accounting for ignorance. For illustration, consider Θ = {a, b} with Bel({a}) = 0.6 and Bel({b}) = 0.2; then Pl({a}) = 1 − Bel({b}) = 0.8, showing strong but not absolute support for a amid some evidence favoring b.
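The duality in this small example can be checked in two lines (a sketch; the values are those of the example above):

```python
bel_a, bel_b = 0.6, 0.2    # Bel({a}) and Bel({b}) over the frame Θ = {a, b}
pl_a = 1.0 - bel_b         # Pl({a}) = 1 − Bel(complement of {a}) = 0.8
pl_b = 1.0 - bel_a         # Pl({b}) = 0.4
```

The resulting intervals, [0.6, 0.8] for a and [0.2, 0.4] for b, each bracket whatever precise probability the evidence leaves open.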

Evidence combination process

In Dempster–Shafer theory, the evidence combination process aggregates basic probability assignments from multiple independent sources to derive an updated representation of uncertainty, enabling the fusion of partial and potentially conflicting information into a cohesive belief structure. This iterative procedure refines the overall belief and plausibility measures by incorporating new evidence, thereby reducing ignorance or reinforcing commitment to subsets of the frame of discernment. The primary goal is to model how distinct pieces of evidence interact to support or refute hypotheses without requiring complete probabilistic specificity. Several types of combination rules exist to handle different evidential relationships, with the conjunctive rule serving as the default for reinforcing agreement among sources. Conjunctive , as formalized in the theory's foundational framework, intersects focal elements to emphasize shared support, making it suitable for scenarios where sources are expected to corroborate each other. In contrast, disjunctive unions focal elements, preserving broader possibilities and avoiding the dismissal of potentially valid but non-overlapping , which is particularly useful when sources provide complementary rather than reinforcing . A hybrid approach, the Dubois-Prade rule, addresses cases of non-empty intersections by selectively applying conjunction where possible and disjunction otherwise, thus balancing precision and inclusivity in the presence of partial overlaps. The process relies on the assumption that evidence sources are independent, meaning their uncertainties arise from distinct origins without mutual influence, which justifies the multiplicative aggregation of probabilities in the underlying model. This ensures that the combined reflects the joint informational content without introducing spurious correlations. Without this assumption, alternative conditioning methods may be needed to account for dependencies. 
At a high level, the combination process follows these steps: first, assign basic probability masses to focal elements from each source; second, apply the selected combination rule (typically the conjunctive rule) to merge the masses; third, normalize the result to redistribute any conflicting mass if applicable; and finally, derive the updated belief and plausibility measures from the combined mass function. This sequence can be iterated as new evidence becomes available, progressively refining the evidential assessment. Conceptually, the process can be visualized as a flowchart: parallel inputs of basic probability assignments from multiple sources converge through the combination operator (e.g., set intersection for the conjunctive rule), pass through a normalization step if conflict arises, and output updated belief and plausibility functions, with loops for sequential additions of evidence. This picture underscores the modular and extensible nature of evidence fusion in the theory.
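The steps above can be sketched in Python. A mass function is represented here as a dictionary from frozenset focal elements to masses; the helper names (`conjunctive_combine`, `normalize`) are illustrative and not drawn from any particular library.

```python
# Sketch of the high-level combination pipeline (illustrative, not a
# reference implementation). Mass functions map frozensets to masses.

def conjunctive_combine(m1, m2):
    """Step 2: intersect focal elements and multiply masses (unnormalized)."""
    out = {}
    for b, mb in m1.items():
        for c, mc in m2.items():
            s = b & c
            out[s] = out.get(s, 0.0) + mb * mc
    return out

def normalize(m):
    """Step 3: discard mass on the empty set and rescale (Dempster-style)."""
    k = m.get(frozenset(), 0.0)  # conflict mass K
    if k >= 1.0:
        raise ValueError("total conflict (K = 1): combination undefined")
    return {a: v / (1.0 - k) for a, v in m.items() if a}

# Step 1: basic probability assignments from two hypothetical sources.
m1 = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.4}
m2 = {frozenset({"b"}): 0.5, frozenset({"a", "b"}): 0.5}

# Steps 2-3: combine and normalize; this can be iterated for new sources.
combined = normalize(conjunctive_combine(m1, m2))
```

The combined masses sum to one, and the same two calls can be repeated in a loop as further evidence arrives, matching the iterative refinement described above.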

Mathematical Foundations

Frame of discernment

In Dempster–Shafer theory, the frame of discernment, denoted Θ, is defined as a finite set of mutually exclusive and exhaustive hypotheses relevant to a particular question or problem domain. The elements of Θ represent the basic or simple hypotheses, while any subset of Θ constitutes a compound hypothesis that may span multiple simple hypotheses. This structure allows the theory to model incomplete or imprecise evidence without requiring that probabilities be assigned solely to individual elements. The power set of Θ, denoted 2^Θ, includes all possible subsets of Θ and forms the complete domain of hypotheses over which belief is distributed. Subsets within 2^Θ serve as potential focal elements to which mass can be assigned, enabling the representation of varying degrees of belief and ignorance. Basic probability assignments are defined over this power set, providing the foundation for belief functions. For scenarios involving multi-level hypotheses, refined or hierarchical frames can be employed, where a coarser frame Θ₁ is extended by a finer frame Θ₂ that partitions the elements of Θ₁ into more detailed hypotheses (i.e., each element of Θ₁ corresponds to a union of elements of Θ₂). This refinement supports nested representations of uncertainty, allowing belief functions on the coarser frame to be extended vacuously to the finer one while preserving evidential consistency. The theory assumes that Θ is finite to ensure computational tractability, as infinite frames would complicate the enumeration of subsets in 2^Θ. Unlike classical probability theory, no prior probabilities need be assigned exclusively to the singleton elements of Θ; instead, evidence can support unions of hypotheses directly, accommodating situations where distinguishing among singletons is not possible.
For instance, in a medical diagnosis context, Θ = {disease, no disease} captures the binary yet potentially uncertain outcomes based on available tests.
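As a small illustration of the structure described above, the power set 2^Θ of a finite frame can be enumerated directly (a sketch; the helper name is hypothetical):

```python
from itertools import chain, combinations

def power_set(theta):
    """Enumerate 2^Θ: every subset of a finite frame, as frozensets."""
    elems = list(theta)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(elems, r)
                                for r in range(len(elems) + 1))]

# The binary medical-diagnosis frame from the text.
theta = {"disease", "no disease"}
hypotheses = power_set(theta)  # 2^|Θ| = 4 subsets: ∅, two singletons, Θ
```

Each member of `hypotheses` is a candidate focal element; the exponential growth of this list with |Θ| is exactly why the theory assumes a finite (and, in practice, small) frame.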

Basic probability assignments

In Dempster–Shafer theory, the basic probability assignment, often denoted m, serves as the foundational representation of uncertain evidence. It is defined as a function m: 2^Θ → [0, 1], where Θ is the frame of discernment and 2^Θ denotes the power set of Θ. This function satisfies two conditions: m(∅) = 0, ensuring no evidence is assigned to the empty set, and Σ_{A ⊆ Θ} m(A) = 1, normalizing the total assignment to unity. The value m(A) for a subset A ⊆ Θ represents the portion of belief, or "mass" of evidence, committed exactly to A, without further specification to any of its proper subsets or supersets. This allows the theory to model both confirmation of specific hypotheses and degrees of ignorance, distinguishing it from classical probability distributions, which assign mass only to singletons. Subsets A ⊆ Θ for which m(A) > 0 are termed focal elements, as they delineate the focus and scope of the available evidence; the collection of focal elements thus captures the evidential structure encoded in m. When A is a singleton {θ} with θ ∈ Θ and m({θ}) > 0, the mass corresponds to precise evidence supporting a single hypothesis. In contrast, assignments to compound hypotheses, where |A| > 1, indicate partial ignorance or belief distributed over multiple possibilities. For illustration, consider a frame of discernment Θ = {a, b}, such as two potential diagnoses. A basic probability assignment might be m(∅) = 0, m({a}) = 0.4, m({b}) = 0.3, and m({a, b}) = 0.3, where the mass on {a, b} reflects uncommitted belief applicable to either outcome.
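The two defining conditions are easy to check programmatically; this sketch applies them to the example above (the helper name `is_valid_bpa` is illustrative):

```python
def is_valid_bpa(m, tol=1e-9):
    """Check the two BPA conditions: m(∅) = 0 and Σ m(A) = 1."""
    return m.get(frozenset(), 0.0) == 0.0 and abs(sum(m.values()) - 1.0) < tol

# The two-diagnosis example from the text: mass on both singletons
# plus uncommitted mass on the whole frame {a, b}.
m = {frozenset({"a"}): 0.4, frozenset({"b"}): 0.3, frozenset({"a", "b"}): 0.3}
focal_elements = [a for a, v in m.items() if v > 0]  # {a}, {b}, {a, b}
```

Omitting m(∅) from the dictionary entirely, as here, is equivalent to assigning it zero mass.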

Derivation of belief and plausibility

The belief function Bel and plausibility function Pl in Dempster–Shafer theory are derived directly from the basic probability assignment m, which assigns masses to subsets of the frame of discernment Θ. For a subset A ⊆ Θ, the belief in A is obtained by aggregating the masses of all focal elements entirely contained within A: Bel(A) = Σ_{B ⊆ A} m(B). This summation captures the total evidence committed to A and its subsets, reflecting the strength of support for A based on the available evidence. Similarly, the plausibility of A sums the masses of all focal elements that intersect A at least partially: Pl(A) = Σ_{B ∩ A ≠ ∅} m(B). This measure accounts for all evidence that does not contradict A, providing an upper bound on the potential support for A. These definitions guarantee key properties of belief and plausibility functions. First, Bel(Θ) = Σ_{B ⊆ Θ} m(B) = 1, by the normalization of m. Second, Pl(A) ≤ 1 follows because Pl(A) = 1 − Bel(Θ \ A) and Bel(Θ \ A) ≥ 0. Finally, Bel(A) ≤ Pl(A) holds since every nonempty B ⊆ A satisfies B ∩ A ≠ ∅, so the terms summed for Bel(A) form a subset of those summed for Pl(A). These inequalities follow from the non-negativity of m and the set-inclusion structure. The uncommitted belief, or degree of ignorance regarding a singleton hypothesis {θ} ⊆ Θ, is quantified as 1 − Bel(Θ \ {θ}), which equals Pl({θ}).
This value represents the portion of the basic probability assignment not committed against {θ}, highlighting the theory's ability to model ignorance beyond strict commitment. To illustrate, consider Θ = {a, b} with m({a}) = 0.4, m({b}) = 0.3, and m({a, b}) = 0.3. Then Bel({a}) = m({a}) = 0.4, while Pl({a}) = m({a}) + m({a, b}) = 0.7, demonstrating how plausibility exceeds belief because of the intersecting mass on {a, b}.
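The two summations translate directly into code; this sketch reuses the example mass function from the text (helper names are illustrative):

```python
def bel(m, a):
    """Belief in a: total mass on focal elements B with B ⊆ a."""
    return sum(v for b, v in m.items() if b <= a)

def pl(m, a):
    """Plausibility of a: total mass on focal elements B with B ∩ a ≠ ∅."""
    return sum(v for b, v in m.items() if b & a)

# Example from the text: Θ = {a, b}.
m = {frozenset({"a"}): 0.4, frozenset({"b"}): 0.3, frozenset({"a", "b"}): 0.3}
a = frozenset({"a"})
theta = frozenset({"a", "b"})
# bel(m, a) = 0.4 and pl(m, a) = 0.7, so Bel(A) ≤ Pl(A) as required.
```

On frozensets, `b <= a` is the subset test and `b & a` the intersection, so the code mirrors the definitions term for term; the duality Pl(A) = 1 − Bel(Θ \ A) can be checked numerically as well.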

Dempster's Rule of Combination

Rule formulation

Dempster's rule of combination provides a method for merging two basic probability assignments, m₁ and m₂, derived from independent sources of evidence defined over a frame of discernment Θ. The resulting combined assignment, written with the orthogonal-sum notation m = m₁ ⊕ m₂, assigns mass only to the nonempty subsets of Θ. The explicit formula for the combined mass function is m(A) = (1 / (1 − K)) · Σ_{B ∩ C = A} m₁(B) m₂(C) for every nonempty A ⊆ Θ, where the summation ranges over all pairs of subsets B, C ⊆ Θ whose intersection equals A, and K = Σ_{B ∩ C = ∅} m₁(B) m₂(C) quantifies the degree of conflict between the sources. The empty set receives no mass under this rule, i.e., m(∅) = 0. This formulation originates from the need to propagate upper and lower probabilities through multivalued mappings representing uncertain evidence. Intuitively, the rule multiplies the masses of pairs of focal elements that agree on A (i.e., intersect exactly at A) and then redistributes the result proportionally after excluding the conflicting portion captured by K, ensuring the masses sum to unity. The rule relies on the assumption that the evidence sources are statistically independent, such that the outcome of one does not affect the other. It further presupposes that the denominator 1 − K is nonzero; when K = 1, indicating complete conflict, the combination is undefined and alternative combination rules are required.
Notably, the operator ⊕ is commutative, satisfying m₁ ⊕ m₂ = m₂ ⊕ m₁, and associative, so that (m₁ ⊕ m₂) ⊕ m₃ = m₁ ⊕ (m₂ ⊕ m₃) for any compatible assignments, which facilitates iterative combination of multiple belief functions.
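The orthogonal sum can be implemented directly, and the commutativity and associativity claims checked numerically; the three mass functions below are hypothetical examples, not drawn from the text:

```python
def dempster(m1, m2):
    """Orthogonal sum m1 ⊕ m2 over frozenset-keyed mass functions."""
    joint = {}
    for b, mb in m1.items():
        for c, mc in m2.items():
            s = b & c
            joint[s] = joint.get(s, 0.0) + mb * mc
    k = joint.pop(frozenset(), 0.0)  # conflict K
    if k >= 1.0:
        raise ValueError("K = 1: complete conflict, combination undefined")
    return {a: v / (1.0 - k) for a, v in joint.items()}

def close(x, y, tol=1e-9):
    """Compare two mass functions entrywise."""
    return all(abs(x.get(s, 0.0) - y.get(s, 0.0)) < tol for s in set(x) | set(y))

A, B = frozenset({"a"}), frozenset({"b"})
TH = A | B
m1 = {A: 0.5, TH: 0.5}
m2 = {B: 0.4, TH: 0.6}
m3 = {A: 0.2, B: 0.3, TH: 0.5}

assert close(dempster(m1, m2), dempster(m2, m1))  # commutative
assert close(dempster(dempster(m1, m2), m3),
             dempster(m1, dempster(m2, m3)))      # associative
```

Because ⊕ is associative, evidence from many sources can be folded in one at a time in any order, which is what makes sequential fusion pipelines well defined.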

Conflict handling and normalization

In Dempster's rule of combination, the conflict measure K quantifies the degree of incompatibility between two basic probability assignments m₁ and m₂, defined as K = Σ_{A ∩ B = ∅} m₁(A) m₂(B), where the sum runs over all pairs of focal elements whose intersection is the empty set. This measure satisfies 0 ≤ K ≤ 1, with K = 0 indicating complete compatibility between the evidence sources and no loss of belief mass to contradiction. The normalization factor in the rule, 1/(1 − K), redistributes the remaining non-conflicting mass so that the combined basic probability assignment sums to 1, effectively discarding the portion of mass associated with K as irreconcilable. The value of K represents the proportion of the unnormalized product of the two assignments that falls on the empty set, rather than a direct probability of contradiction between the sources. A high K signals substantial incompatibility, often interpreted as evidence that the sources may be drawing on distinct or erroneous underlying models, calling their joint reliability into question. When total conflict occurs (K = 1), the normalization factor is undefined due to division by zero, preventing the application of Dempster's rule. In such scenarios, alternative approaches are required, such as the disjunctive rule of combination, which aggregates focal elements via unions (m(A) = Σ_{B ∪ C = A} m₁(B) m₂(C)) without normalization, preserving all mass while emphasizing caution over consensus. To illustrate, consider a frame of discernment Θ = {A, B}. Let the first source have basic probability assignment m₁({A}) = 0.8, m₁(Θ) = 0.2, and the second source have m₂({B}) = 1.0. The conflicting product is m₁({A}) m₂({B}) = 0.8 (since {A} ∩ {B} = ∅), yielding K = 0.8.
The sole non-conflicting product is m₁(Θ) m₂({B}) = 0.2, assigned to {B}. Normalization by 1/(1 − 0.8) = 5 gives the combined assignment m({B}) = 0.2 × 5 = 1.0, demonstrating how the rule discards the conflicting mass and amplifies the compatible portion to full belief in B.
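The worked example above can be verified numerically; this sketch also returns K so the conflict can be inspected (the `dempster` helper is illustrative, not a library function):

```python
def dempster(m1, m2):
    """Dempster's rule for frozenset-keyed mass functions; returns (m, K)."""
    joint = {}
    for b, mb in m1.items():
        for c, mc in m2.items():
            s = b & c
            joint[s] = joint.get(s, 0.0) + mb * mc
    k = joint.pop(frozenset(), 0.0)  # conflict mass K
    if k >= 1.0:
        raise ValueError("complete conflict")
    return {a: v / (1.0 - k) for a, v in joint.items()}, k

A, B = frozenset({"A"}), frozenset({"B"})
theta = A | B
m1 = {A: 0.8, theta: 0.2}
m2 = {B: 1.0}
combined, k = dempster(m1, m2)
# k = 0.8; the surviving mass 0.2 on {B} is rescaled by 1/(1 - 0.8) = 5,
# so combined[{B}] = 1.0, exactly as in the worked example.
```

Note that the source favoring A loses all influence: its only contribution compatible with m₂ was the uncommitted mass on Θ.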

Illustrative examples

To illustrate the behavior of Dempster's rule of combination, consider a frame of discernment Θ = {a, b} where two sources provide completely conflicting evidence: the first assigns all mass to {a}, so m₁({a}) = 1, while the second assigns all mass to {b}, so m₂({b}) = 1. The conflict measure K is computed as the sum of products of masses whose focal sets have empty intersection, yielding K = m₁({a}) · m₂({b}) = 1 · 1 = 1. Since 1 − K = 0, normalization is impossible and the rule fails to produce a combined mass function; this outcome correctly avoids assigning mass to any subset, preventing a false consensus from irreconcilable sources. A counter-intuitive case arises even with partial ignorance and apparently low support for consensus, as highlighted in Zadeh's critique. For Θ = {brain tumor (BT), meningitis (M), concussion (C)}, suppose one source (e.g., a doctor) assigns m₁(M) = 0.99 and m₁(BT) = 0.01, while another assigns m₂(C) = 0.99 and m₂(BT) = 0.01. The conflict is K ≈ 0.9999 (from the M∩C, M∩BT, and BT∩C pairs), with 1 − K ≈ 0.0001. The combined mass normalizes to m(BT) = (0.01 × 0.01) / 0.0001 = 1, fully supporting brain tumor despite both sources assigning it only marginal belief (0.01 each) and strongly favoring mutually exclusive alternatives (M and C); the rule unexpectedly amplifies the minor overlapping support while discarding the primary conflicting evidence. In contrast, low-conflict scenarios yield intuitive reinforcement of shared evidence. For Θ = {A, B} representing a car's possible locations in a parking lot (A for spot 1, B for spot 2), suppose m₁({A}) = 0.3, m₁({B}) = 0.2, m₁({A, B}) = 0.5 from one observer, and m₂({A}) = 0.2, m₂({A, B}) = 0.8 from another. Here K = 0.04, from m₁({B}) · m₂({A}) = 0.2 · 0.2 (since {B} ∩ {A} = ∅), leading to a combined m({A}) ≈ 0.42, m({B}) ≈ 0.17, m({A, B}) ≈ 0.41 after normalization by 1 − K = 0.96; this concentrates belief toward A (Bel({A}) ≈ 0.42, Pl({A}) ≈ 0.83), intuitively strengthening the common indication of spot A without undue amplification.
A step-by-step calculation for the above low-conflict case demonstrates the rule's mechanics. First, identify all pairs of focal sets from m₁ and m₂, computing the unnormalized mass for each possible intersection S ≠ ∅ as the product m₁(X) · m₂(Y) where X ∩ Y = S, and accumulating into K where X ∩ Y = ∅. Here K = m₁({B}) · m₂({A}) = 0.2 · 0.2 = 0.04. For S = {A}: m₁({A}) · m₂({A}) = 0.3 · 0.2 = 0.06; m₁({A}) · m₂({A, B}) = 0.3 · 0.8 = 0.24; m₁({A, B}) · m₂({A}) = 0.5 · 0.2 = 0.10; total unnormalized mass = 0.40. For S = {B}: m₁({B}) · m₂({A, B}) = 0.2 · 0.8 = 0.16. For S = {A, B}: m₁({A, B}) · m₂({A, B}) = 0.5 · 0.8 = 0.40. Normalizing each by 1 − K = 0.96 yields m({A}) ≈ 0.42, m({B}) ≈ 0.17, m({A, B}) ≈ 0.41. Finally, derive Bel({A}) = m({A}) ≈ 0.42 and Pl({A}) = 1 − Bel({B}) = 1 − m({B}) ≈ 0.83, confirming reinforced commitment to A. Dempster's rule produces expected outcomes in low-conflict settings by proportionally concentrating mass on overlapping evidence, fostering consensus without distortion, as seen in the parking-lot example's reinforcement of A. However, in high-conflict cases like complete opposition or Zadeh's scenario, it yields unexpected results, either failure to combine or disproportionate amplification of minor shared elements, highlighting sensitivities to normalization that can mislead when sources disagree strongly.
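Zadeh's scenario can be reproduced with a few lines of code, making the amplification effect concrete (a sketch using the same dictionary representation as above):

```python
def dempster(m1, m2):
    """Dempster's rule for frozenset-keyed mass functions; returns (m, K)."""
    joint = {}
    for b, mb in m1.items():
        for c, mc in m2.items():
            s = b & c
            joint[s] = joint.get(s, 0.0) + mb * mc
    k = joint.pop(frozenset(), 0.0)
    if k >= 1.0:
        raise ValueError("complete conflict")
    return {a: v / (1.0 - k) for a, v in joint.items()}, k

M, C, BT = frozenset({"M"}), frozenset({"C"}), frozenset({"BT"})
doctor1 = {M: 0.99, BT: 0.01}   # strongly favors meningitis
doctor2 = {C: 0.99, BT: 0.01}   # strongly favors concussion
combined, k = dempster(doctor1, doctor2)
# K ≈ 0.9999; the tiny agreement on BT (0.01 · 0.01 = 0.0001) is
# renormalized to full belief in brain tumor.
```

Running the same function on the parking-lot masses instead reproduces the intuitive low-conflict result (m({A}) ≈ 0.42), so the pathology lies in the inputs' conflict level, not in the implementation.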

Connections to Probability Theory

Generalization of Bayesian methods

Dempster–Shafer theory (DST) generalizes Bayesian probability theory by extending the representation of uncertainty beyond precise probability distributions over singletons to belief functions defined over subsets of the frame of discernment. In Bayesian theory, probabilities are assigned solely to individual outcomes, requiring a complete prior distribution even when information is partial or incomplete. DST addresses this limitation through basic probability assignments (m) that can allocate mass to composite sets, enabling the modeling of evidential support for hypotheses without committing to exhaustive probabilistic specifications. This framework, formalized by Dempster in his work on upper and lower probabilities, allows for inference under limited prior knowledge by deriving bounds on probabilities from multivalued mappings. A key aspect of this generalization is that Bayesian probability emerges as a special case within DST. Specifically, if the basic probability assignment m allocates mass only to singleton subsets, the resulting belief function recovers a standard probability measure. For a subset A of the frame Θ, Bel(A) = Σ_{θ ∈ A} m({θ}) = P(A), where Bel denotes the belief function and P the probability. This equivalence holds because, under such an assignment, the belief in A is the sum of the masses on its singletons, mirroring Bayesian conditioning and updating. However, DST's flexibility shines in scenarios with unknown or partial priors, where mass can be assigned to the entire frame Θ to represent ignorance, avoiding the need to fabricate a full prior distribution. The vacuous belief function, defined by m(Θ) = 1, exemplifies total ignorance, yielding Bel(A) = 0 for all proper subsets A ≠ Θ while maintaining plausibility Pl(A) = 1 for nonempty A, thus bounding uncertainty without false precision.
DST further distinguishes evidential support from probabilistic commitment by separating Bel(A), which measures committed support for A, from the plausibility Pl(A) = 1 − Bel(Θ \ A), which provides an upper bound including possible but uncommitted evidence. This duality allows DST to quantify both confirmed support and residual plausibility, offering a richer structure than Bayesian point probabilities for handling incomplete information. Consonant belief functions, a subclass in which the focal elements form a nested sequence A₁ ⊂ A₂ ⊂ ⋯ ⊂ Aₙ = Θ, approximate possibility theory by aligning plausibility with possibility measures and enable connections to fuzzy sets through graded memberships. Dempster's model specifically targeted partial prior knowledge, using multivalued mappings to induce upper and lower probabilities that capture evidential uncertainty without assuming full distributional commitment.
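The two limiting cases discussed above, the Bayesian special case and total ignorance, can be checked numerically (a sketch with hypothetical mass values):

```python
def bel(m, a):
    """Belief: total mass on focal elements contained in a."""
    return sum(v for b, v in m.items() if b <= a)

def pl(m, a):
    """Plausibility: total mass on focal elements intersecting a."""
    return sum(v for b, v in m.items() if b & a)

# Bayesian special case: all mass on singletons, so Bel = Pl = P.
m_bayes = {frozenset({"x"}): 0.2, frozenset({"y"}): 0.5, frozenset({"z"}): 0.3}
event = frozenset({"x", "y"})
# bel(m_bayes, event) = pl(m_bayes, event) = 0.7 = P(event)

# Vacuous belief function: total ignorance, all mass on Θ.
theta = frozenset({"x", "y", "z"})
m_vacuous = {theta: 1.0}
# bel(m_vacuous, event) = 0.0 while pl(m_vacuous, event) = 1.0
```

The gap Pl − Bel is zero in the Bayesian case and maximal (the whole unit interval) in the vacuous case, illustrating how belief functions interpolate between precise probability and complete ignorance.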

Bayesian approximations of belief functions

Bayesian approximations of belief functions in Dempster–Shafer theory (DST) aim to represent the uncertainty captured by belief functions in a probabilistic form compatible with standard Bayesian tools, which require precise probability distributions. These approximations are particularly useful when DST outputs need to interface with Bayesian networks or other probabilistic models that cannot directly handle the full structure of basic probability assignments. Common methods include interval-based representations and point-probability transformations, each trading off some of the expressive power of DST for computational tractability. One straightforward approximation treats the belief function as defining an interval of possible probabilities for each event, with the belief measure Bel(A) serving as the lower bound and the plausibility measure Pl(A) as the upper bound. This approach preserves the range of uncertainty inherent in the belief function, representing it as an imprecise probability interval [Bel(A), Pl(A)], which can then be used in robust Bayesian analysis. A more specific point approximation is the pignistic transformation, introduced by Philippe Smets, which converts a basic probability assignment m into a probability distribution BetP by spreading each focal element's mass evenly over its members: for a singleton {θ}, BetP({θ}) = Σ_{B ⊆ Θ, θ ∈ B} m(B) / |B|, where Θ is the frame of discernment and |B| is the cardinality of B. This transformation distributes the mass of non-singleton focal elements equally among their elements, yielding a probability distribution that approximates the decision-relevant aspects of the belief function. In Smets' transferable belief model (TBM), the pignistic transformation BetP is explicitly designated for decision-making under expected utility, where linearity in probabilities is required, but it is not intended for belief updating, which remains at the credal level using Dempster's rule.
For example, given a basic probability assignment m({a}) = 0.4 and m({a, b}) = 0.6 on Θ = {a, b}, the pignistic probabilities are BetP(a) = 0.4 + 0.6/2 = 0.7 and BetP(b) = 0.6/2 = 0.3, providing a normalized distribution for decisions. These approximations find application when DST belief functions must feed into Bayesian networks, for instance when fused expert opinions in belief form are transformed via the pignistic method to initialize probabilistic nodes, enabling subsequent inference without altering the network structure. While DST generalizes Bayesian methods by allowing mass on non-singletons, these approximations reverse the process for practical integration.
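The pignistic transformation from the example above can be sketched in a few lines (the helper name is illustrative):

```python
def pignistic(m):
    """BetP: spread each focal element's mass evenly over its elements."""
    betp = {}
    for focal, mass in m.items():
        for theta in focal:
            betp[theta] = betp.get(theta, 0.0) + mass / len(focal)
    return betp

# Example from the text: Θ = {a, b}.
m = {frozenset({"a"}): 0.4, frozenset({"a", "b"}): 0.6}
betp = pignistic(m)
# betp == {"a": 0.7, "b": 0.3}: a proper probability distribution
# suitable, e.g., for initializing nodes of a Bayesian network.
```

The result always sums to one (assuming m is a valid BPA with no mass on the empty set), which is exactly what downstream probabilistic tooling requires.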

Comparative limitations

While Dempster–Shafer theory (DST) generalizes Bayesian inference by allowing partial commitment to hypotheses through basic probability assignments (BPAs), it introduces several limitations in comparison, particularly in representation uniqueness and computational demands. For a given belief function, the underlying BPA is uniquely determined via Möbius inversion, but converting probabilistic representations (such as conditional probabilities) to belief functions lacks a canonical method, leading to non-uniqueness in the evidential model depending on the extension used, such as the ballooning extension or a consonant approximation. In contrast, Bayesian theory enforces additivity, where probabilities sum to unity across the frame of discernment, ensuring a unique and normalized representation that simplifies interpretation and avoids such ambiguities. A primary computational drawback of DST arises from Dempster's rule of combination, which naively requires enumerating all subset intersections, yielding a worst-case complexity of O(2^{2n}) for a frame of discernment with n elements, as each combination step scales quadratically with the number of focal elements (up to 2^n). Bayesian updates, by comparison, exploit conditional independence for efficient incremental computation, often O(n) in network propagation for conjugate priors or likelihoods, making them more scalable for large-scale inference with precise data. However, Bayesian methods assume full probabilistic commitment to singletons, which can misrepresent ignorance or incomplete information, whereas DST's non-additive structure better accommodates such cases. DST excels in scenarios involving unreliable sources, where evidence discounting, via a reliability factor α (0 ≤ α ≤ 1) that scales each mass by α while assigning the remaining 1 − α to the full frame, allows modeling of source unreliability without full dismissal, as in fusion tasks with partial trust.
Bayesian approaches, requiring transformation of unreliability into likelihood ratios, often demand stronger assumptions about independence and completeness, performing better with precise, reliable data streams such as controlled experiments. An illustration of DST's potential overcommitment occurs when combining two BPAs each assigning 0.95 to distinct singletons A and B (with 0.05 to the full frame): high conflict (K ≈ 0.9025) leads to normalization that assigns approximately 0.487 to each of A and B (with about 0.026 to Θ), counterintuitively committing strongly to the conflicting hypotheses despite evidential discord. To address closed-world normalization issues, unnormalized belief functions extend standard DST to open-world assumptions, permitting positive mass on the empty set to quantify uncommitted conflict without redistribution, thus preserving evidential information in incomplete domains such as database queries. Empirically, DST shows slight advantages in decision accuracy for evidential-reasoning tasks such as fault diagnosis, where it can achieve higher diagnostic precision (e.g., distinguishing 2100/2500 instances versus a lower Bayesian resolution) by explicitly handling conflict, though both yield comparable results when the pignistic probability is used as a decision bridge.
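Discounting as described above can be sketched as follows; α is the source's assumed reliability, and the helper name and example masses are illustrative:

```python
def discount(m, alpha, theta):
    """Scale each mass by reliability α; move the remaining 1 - α to Θ."""
    out = {a: alpha * v for a, v in m.items() if a != theta}
    out[theta] = alpha * m.get(theta, 0.0) + (1.0 - alpha)
    return out

theta = frozenset({"A", "B"})
m = {frozenset({"A"}): 0.9, theta: 0.1}

# A source trusted at 80% reliability.
m_discounted = discount(m, alpha=0.8, theta=theta)
# m_discounted: {A}: 0.72, Θ: 0.28. With α = 1 the source is unchanged;
# with α = 0 it becomes the vacuous belief function (all mass on Θ).
```

Discounting before combination is a standard way to keep a single overconfident or unreliable source from dominating the fused result.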

Criticisms and Extensions

Major theoretical critiques

One prominent early critique of Dempster–Shafer theory (DST) came from Lotfi A. Zadeh in 1979, who argued that the conjunctive rule of combination can produce counterintuitive and seemingly irrational results, particularly when sources exhibit high conflict. In Zadeh's illustrative example over a frame Θ = {M, C, T} (meningitis, concussion, tumor), two independent sources provide nearly certain evidence for different mutually exclusive hypotheses: one assigns m({M}) = 0.99 and m({C}) = 0.01, the other m({T}) = 0.99 and m({C}) = 0.01; their combination under Dempster's rule yields m({C}) = 1, assigning full belief to C even though both sources nearly rule it out. This behavior, Zadeh contended, reveals a fundamental flaw in the rule's normalization process, limiting its applicability to imprecise or conflicting expert opinions. Another foundational challenge arises from Peter Walley's framework of imprecise probabilities, in which some belief functions fail to satisfy his coherence axioms for avoiding Dutch books; Walley instead developed a theory built on coherent lower previsions as bounds on expectations. In response to such critiques, including Zadeh's, Philippe Smets developed the Transferable Belief Model (TBM) as a reinterpretation of DST, emphasizing that belief functions represent degrees of belief distinct from subjective probabilities, thereby justifying the separation of evidential reasoning from probabilistic decision-making without requiring full coherence in the Bayesian sense. Practical drawbacks of DST further compound these theoretical concerns, notably sensitivity to the order of combination in non-associative extensions proposed to handle high conflict, where alternative rules (e.g., disjunctive or hybrid methods) can yield divergent results depending on evidence sequencing, undermining reliability in multi-source fusion.
Additionally, repeated applications of the rule often lead to an exponential explosion in the number of focal elements, rendering computations intractable for large frames of discernment or numerous sources, as the power-set representation grows to 2^|Θ| elements. Empirical studies in AI-based diagnostics have also highlighted limitations of DST compared to Bayesian approaches in certain fault-diagnosis tasks.

Relational and alternative measures

Relational belief measures extend the Dempster–Shafer framework to capture dependencies among multiple sources of evidence using matrix representations. In Yager's 1987 formulation, belief structures are represented as matrices whose entries correspond to basic probability assignments, enabling the modeling of relational dependencies between hypotheses across sources during combination. This approach addresses limitations in handling correlated evidence by incorporating relational operators that preserve inter-source relationships, unlike standard orthogonal sums, which assume independence. The commonality function serves as a counterpart to the belief and plausibility functions in Dempster–Shafer theory, measuring the total mass committed to sets containing a given subset. Defined as Q(A) = Σ_{B ⊇ A} m(B) for all A ⊆ Θ, where m is the basic probability assignment and Θ is the frame of discernment, Q(A) quantifies commonality by summing masses over all supersets of A. This function facilitates efficient computation of combinations, since Dempster's rule can be expressed as pointwise multiplication of commonality functions followed by Möbius inversion to recover the combined mass. It is particularly useful for analyzing shared support across focal elements without explicit belief or plausibility calculations. Alternatives to classical Dempster–Shafer theory have been developed to mitigate issues with conflicting or non-exclusive evidence. The Dezert–Smarandache theory (DSmT) relaxes the exclusivity assumption inherent in the power set of the frame of discernment, instead operating over a hyper-power set that includes both unions and intersections of hypotheses to model vague or interacting propositions.
In DSmT, basic belief assignments are defined on this hyper-power set, with combination rules such as the hybrid DSm rule adapting to integrity constraints on hypothesis intersections, thus avoiding counterintuitive redistributions in high-conflict scenarios. Another alternative is the Imprecise Dirichlet Model (IDM), which integrates with Dempster–Shafer structures to provide a cautious prior for updating beliefs under limited data. The IDM employs a set of Dirichlet priors with a small hyperparameter s (typically s = 1 or s = 2) to generate lower and upper probabilities, representing imprecision as belief intervals that avoid overcommitment to precise values. When fused with evidence in Dempster–Shafer theory, it promotes conservative inference by treating ignorance explicitly through vacuous belief functions derived from the imprecise priors. These measures address shortcomings such as Zadeh's paradox, where classical Dempster's rule counterintuitively assigns full belief to a weakly supported hypothesis amid conflicting evidence favoring other singletons. In DSmT, the proportional conflict redistribution rule (PCR5) mitigates this by redistributing conflicting mass proportionally to the masses of the focal elements involved in each conflict, yielding balanced beliefs (e.g., approximately 0.486 each to two conflicting singletons M and C, and 0.028 to the common third singleton T in a standard example with masses 0.9/0.1), preserving intuitive support for the alternatives without artificial normalization. The IDM similarly tempers conflict effects by maintaining imprecision in updates, ensuring lower beliefs remain low even under conflict.
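The commonality function defined earlier in this section translates directly into code; this sketch evaluates it on the running two-element example (helper names are illustrative):

```python
def commonality(m, a):
    """Q(a): total mass on focal elements B with B ⊇ a."""
    return sum(v for b, v in m.items() if a <= b)

# Running example: Θ = {a, b}.
m = {frozenset({"a"}): 0.4, frozenset({"b"}): 0.3, frozenset({"a", "b"}): 0.3}
q_a = commonality(m, frozenset({"a"}))        # 0.4 + 0.3 = 0.7
q_ab = commonality(m, frozenset({"a", "b"}))  # 0.3
# For a singleton, Q({a}) coincides with Pl({a}): every superset of {a}
# intersects {a}, and vice versa.
```

Because combined commonalities are (up to normalization) pointwise products, Q gives a cheap route to combining many sources before recovering masses via Möbius inversion.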

Historical development and applications

The Dempster–Shafer theory originated with Arthur P. Dempster's 1967 paper, which introduced the concepts of upper and lower probabilities induced by a multivalued mapping, providing a framework for handling imprecision in statistical inference without assuming precise probabilistic distributions. This work laid the groundwork for representing incomplete information through bounds on probabilities rather than point estimates. In 1976, Glenn Shafer extended and formalized these ideas in his seminal book A Mathematical Theory of Evidence, defining belief functions as a generalization of probability measures to model evidential reasoning and combine sources of evidence. Key milestones in the theory's development include Ronald R. Yager's 1987 proposal of alternative combination rules for the Dempster–Shafer framework, which addressed conflict redistribution by assigning conflicting mass to the universal set instead of normalizing it away, improving the handling of highly contradictory evidence. Philippe Smets further advanced the theory in 1988 with the Transferable Belief Model (TBM), which separated the representation of belief (via belief functions at the credal level) from decision-making (via pignistic probabilities), allowing non-probabilistic updates without committing to underlying probabilities prematurely. The theory has found diverse applications, particularly in domains requiring fusion of uncertain or conflicting data. In robotics, it enables sensor fusion by combining readings from multiple sensors, for example in autonomous mobile robots that weigh evidence from range sensors and vision systems to improve localization and obstacle avoidance under noisy conditions. In fraud detection and risk assessment, Dempster–Shafer theory supports decision-making by modeling uncertainties in detection rules and expert judgments, integrating evidential indicators such as transaction anomalies to generate belief intervals for risk levels rather than binary classifications.
For medical diagnosis, it underpins evidential reasoning in expert systems, combining symptoms, test results, and physician judgments to assess disease hypotheses while accounting for diagnostic indeterminacy. Open-source software tools facilitate practical implementation of the theory, such as the Python package dempster-shafer, which provides functions for defining mass functions, applying Dempster's rule of combination, and computing belief and plausibility measures. As of 2025, Dempster–Shafer theory continues to evolve through integration with machine learning techniques for handling uncertain data, notably in federated settings where it fuses model outputs from distributed devices while preserving privacy and managing evidential conflicts, in applications like intrusion detection and trustworthy AI segmentation tasks. The BELIEF 2024 conference highlighted ongoing research, including applications in machine learning and information fusion. Post-Shafer developments include extensions like the Dezert–Smarandache theory (DSmT), which relaxes frame-of-discernment constraints to better manage dynamic or non-exclusive hypotheses in information fusion.

References
