Markov kernel
from Wikipedia

In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space.[1]

Formal definition


Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. A Markov kernel with source $(X, \mathcal{A})$ and target $(Y, \mathcal{B})$, sometimes written as $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$, is a function $\kappa : \mathcal{B} \times X \to [0, 1]$ with the following properties:

  1. For every (fixed) $B_0 \in \mathcal{B}$, the map $x \mapsto \kappa(B_0, x)$ is $\mathcal{A}$-measurable
  2. For every (fixed) $x_0 \in X$, the map $B \mapsto \kappa(B, x_0)$ is a probability measure on $(Y, \mathcal{B})$

In other words it associates to each point $x \in X$ a probability measure $\kappa(dy \mid x) : B \mapsto \kappa(B, x)$ on $(Y, \mathcal{B})$ such that, for every measurable set $B \in \mathcal{B}$, the map $x \mapsto \kappa(B, x)$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}$.[2]

Examples


Simple random walk on the integers


Take $X = Y = \mathbb{Z}$, and $\mathcal{A} = \mathcal{B} = \mathcal{P}(\mathbb{Z})$ (the power set of $\mathbb{Z}$). Then a Markov kernel is fully determined by the probability it assigns to singletons $\{m\}$, $m \in Y = \mathbb{Z}$, for each $n \in X = \mathbb{Z}$:

$$\kappa(B \mid n) = \sum_{m \in B} \kappa(\{m\} \mid n), \quad \forall n \in \mathbb{Z},\ \forall B \in \mathcal{B}.$$

Now the random walk that goes to the right with probability $p$ and to the left with probability $1 - p$ is defined by

$$\kappa(\{m\} \mid n) = p\,\delta_{m,n+1} + (1 - p)\,\delta_{m,n-1}, \quad \forall n, m \in \mathbb{Z},$$

where $\delta$ is the Kronecker delta. The transition probabilities for the random walk are equivalent to the Markov kernel.
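The random-walk kernel above can be sketched in Python. The function names (`random_walk_kernel`, `kernel_of_set`) and the default $p = 0.5$ are illustrative choices, not part of the definition.

```python
# Sketch of the simple-random-walk kernel kappa({m} | n) on the integers,
# with step-right probability p (an assumed parameter).
def random_walk_kernel(m: int, n: int, p: float = 0.5) -> float:
    """Probability of moving from state n to state m in one step."""
    if m == n + 1:
        return p
    if m == n - 1:
        return 1.0 - p
    return 0.0

def kernel_of_set(B, n: int, p: float = 0.5) -> float:
    """kappa(B | n): sum the singleton probabilities over a finite set B."""
    return sum(random_walk_kernel(m, n, p) for m in B)
```

Since the kernel is supported on the two neighbours of $n$, `kernel_of_set({n - 1, n + 1}, n)` always returns 1, reflecting the normalization axiom.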

General Markov processes with countable state space


More generally take $X$ and $Y$ both countable and $\mathcal{A} = \mathcal{P}(X)$, $\mathcal{B} = \mathcal{P}(Y)$. Again a Markov kernel is defined by the probability it assigns to singleton sets for each $i \in X$:

$$\kappa(B \mid i) = \sum_{j \in B} \kappa(\{j\} \mid i), \quad \forall B \in \mathcal{B}.$$

We define a Markov process by defining a transition probability $P(j \mid i) = K_{ji}$ where the numbers $K_{ji}$ define a (countable) stochastic matrix, i.e.

$$K_{ji} \geq 0, \quad \forall (j, i) \in Y \times X,$$
$$\sum_{j \in Y} K_{ji} = 1, \quad \forall i \in X.$$

We then define

$$\kappa(\{j\} \mid i) = K_{ji}, \quad \forall i \in X,\ \forall j \in Y.$$

Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
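For a finite truncation of a countable state space, the two stochastic-matrix conditions can be checked directly. A minimal sketch, assuming states are indexed $0, 1, \dots$ and with hypothetical helper names (`is_stochastic`, `kappa`); note the matrix is stored row-wise, `K[i][j]` $= \kappa(\{j\} \mid i)$.

```python
# Sketch: a countable-state kernel represented (on a finite truncation)
# as a row-stochastic matrix K, where K[i][j] = kappa({j} | i).
def is_stochastic(K, tol=1e-9):
    """Check the stochastic-matrix axioms: nonnegative entries, rows summing to 1."""
    return all(
        all(x >= 0 for x in row) and abs(sum(row) - 1.0) < tol
        for row in K
    )

def kappa(K, B, i):
    """kappa(B | i) = sum_{j in B} K[i][j] for a finite set B of column indices."""
    return sum(K[i][j] for j in B)
```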

Markov kernel defined by a kernel function and a measure


Let $\nu$ be a measure on $(Y, \mathcal{B})$, and $k : Y \times X \to [0, \infty]$ a measurable function with respect to the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$ such that

$$\int_Y k(y, x)\,\nu(dy) = 1, \quad \forall x \in X,$$

then $\kappa(dy \mid x) = k(y, x)\,\nu(dy)$, i.e. the mapping

$$\kappa(B \mid x) = \int_B k(y, x)\,\nu(dy), \quad \forall B \in \mathcal{B},\ \forall x \in X,$$

defines a Markov kernel.[3] This example generalises the countable Markov process example where $\nu$ was the counting measure. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on $X = Y = \mathbb{R}$ with $\nu$ equal to the standard Lebesgue measure and

$$k(y, x) = \frac{1}{\sqrt{2\pi}} e^{-(y - x)^2 / 2}.$$
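A quick numerical sanity check (illustrative only, using a simple trapezoidal rule) confirms that the Gaussian kernel density integrates to 1 in $y$ for each fixed $x$; `kernel_probability` is a hypothetical helper approximating $\kappa([a, b] \mid x)$.

```python
import math

def gaussian_density(y: float, x: float) -> float:
    """k(y, x) = exp(-(y - x)^2 / 2) / sqrt(2*pi), the Gaussian kernel density."""
    return math.exp(-(y - x) ** 2 / 2) / math.sqrt(2 * math.pi)

def kernel_probability(a: float, b: float, x: float, steps: int = 10_000) -> float:
    """kappa([a, b] | x), approximated by the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (gaussian_density(a, x) + gaussian_density(b, x))
    total += sum(gaussian_density(a + i * h, x) for i in range(1, steps))
    return total * h
```

Integrating over a wide interval around $x$ recovers total mass 1 up to the (negligible) Gaussian tails, which is the normalization condition $\int_Y k(y, x)\,\nu(dy) = 1$ in discretized form.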

Measurable functions


Take $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ arbitrary measurable spaces, and let $f : X \to Y$ be a measurable function. Now define $\kappa(dy \mid x) = \delta_{f(x)}(dy)$, i.e.

$$\kappa(B \mid x) = \mathbf{1}_B(f(x)) = \mathbf{1}_{f^{-1}(B)}(x) = \begin{cases} 1 & \text{if } f(x) \in B \\ 0 & \text{otherwise} \end{cases}$$

for all $B \in \mathcal{B}$.

Note that the indicator function $\mathbf{1}_{f^{-1}(B)}$ is $\mathcal{A}$-measurable for all $B \in \mathcal{B}$ iff $f$ is measurable.

This example allows us to think of a Markov kernel as a generalised function with a (in general) random rather than certain value. That is, it is a multivalued function where the values are not equally weighted.
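A minimal sketch of the deterministic kernel $\kappa(B \mid x) = \mathbf{1}_B(f(x))$, modeling a measurable set $B$ as a membership predicate — an illustrative encoding, not standard notation.

```python
# Sketch: the deterministic kernel induced by a function f.
# All probability mass sits at the single point f(x).
def deterministic_kernel(f):
    def kappa(B, x):
        # B is modeled as a membership predicate; this returns delta_{f(x)}(B).
        return 1.0 if B(f(x)) else 0.0
    return kappa
```

For example, wrapping `lambda x: x * x` gives a kernel that assigns a set probability 1 exactly when the set contains $x^2$ — a "random" value that is in fact certain, matching the remark above.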

As a less obvious example, take $X = \mathbb{N}$, $\mathcal{A} = \mathcal{P}(\mathbb{N})$, and for $Y$ the real numbers $\mathbb{R}$ with the standard sigma algebra of Borel sets. Then

$$\kappa(B \mid n) = \begin{cases} \mathbf{1}_B(0) & \text{if } n = 0 \\ \Pr\left( \xi_1 + \cdots + \xi_n \in B \right) & \text{otherwise} \end{cases}$$

where $n$ is the number of elements at the state, the $\xi_i$ are i.i.d. random variables (usually with mean 0) and where $\mathbf{1}_B$ is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.
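For the coin-flip (Galton board) case, each $\xi_i$ can be taken as $\pm 1$ with probability $1/2$, so $\kappa(\{m\} \mid n)$ is a shifted binomial. A short sketch under that assumption (`galton_kernel` is a hypothetical name):

```python
from math import comb

def galton_kernel(n: int, m: int) -> float:
    """kappa({m} | n): probability that n fair +/-1 steps sum to m.
    Nonzero only when m and n have the same parity and |m| <= n."""
    if abs(m) > n or (n - m) % 2 != 0:
        return 0.0
    k = (n + m) // 2          # number of +1 steps among the n flips
    return comb(n, k) * 0.5 ** n
```

Summing `galton_kernel(n, m)` over all reachable $m$ gives 1, as required of a probability measure for each fixed $n$.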

Composition of Markov Kernels


Given measurable spaces $(X, \mathcal{A})$, $(Y, \mathcal{B})$, we consider a Markov kernel $\kappa : \mathcal{B} \times X \to [0, 1]$ as a morphism $\kappa : X \to Y$. Intuitively, rather than assigning to each $x \in X$ a sharply defined point $y \in Y$ the kernel assigns a "fuzzy" point in $Y$ which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space $(Z, \mathcal{C})$, and probability kernels $\kappa : X \to Y$ and $\lambda : Y \to Z$, we can define a composition $\lambda \circ \kappa : X \to Z$ by the Chapman-Kolmogorov equation

$$(\lambda \circ \kappa)(dz \mid x) = \int_Y \lambda(dz \mid y)\,\kappa(dy \mid x).$$

The composition is associative by the Monotone Convergence Theorem and the identity function considered as a Markov kernel (i.e. the delta measure $\kappa(dx' \mid x) = \delta_x(dx')$) is the unit for this composition.
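For finite state spaces the Chapman-Kolmogorov integral reduces to a matrix product, and the delta-measure kernel becomes the identity matrix. A sketch with kernels stored as row-stochastic nested lists (an illustrative representation):

```python
# Sketch: composing finite-state kernels via Chapman-Kolmogorov,
# (lambda ∘ kappa)[x][z] = sum_y kappa[x][y] * lam[y][z], i.e. a matrix product.
def compose(kappa, lam):
    ny, nz = len(lam), len(lam[0])
    return [
        [sum(row[y] * lam[y][z] for y in range(ny)) for z in range(nz)]
        for row in kappa
    ]

def identity_kernel(n):
    """The delta-measure kernel: the unit for composition."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
```

Composing a kernel with `identity_kernel` returns it unchanged, and the composite of two row-stochastic matrices is again row-stochastic — the finite shadow of the statements above.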

This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere,[4] the category of Markov kernels.

Probability Space defined by Probability Distribution and a Markov Kernel


A composition of a probability space $(X, \mathcal{A}, P_X)$ and a probability kernel $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$ defines a probability space $(Y, \mathcal{B}, P_Y = \kappa \circ P_X)$, where the probability measure is given by

$$P_Y(B) = \int_X \kappa(B \mid x)\,P_X(dx), \quad \forall B \in \mathcal{B}.$$
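In the finite case the integral defining $P_Y$ becomes a weighted sum over states. A sketch, with `push_forward` a hypothetical name and the kernel again stored as a row-stochastic matrix:

```python
# Sketch (finite case): pushing an initial distribution p through a kernel K
# gives the marginal P_Y[j] = sum_i p[i] * K[i][j].
def push_forward(p, K):
    n_out = len(K[0])
    return [sum(p[i] * K[i][j] for i in range(len(p))) for j in range(n_out)]
```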

Properties


Semidirect product


Let $(X, \mathcal{A}, P)$ be a probability space and $\kappa$ a Markov kernel from $(X, \mathcal{A})$ to some $(Y, \mathcal{B})$. Then there exists a unique measure $Q$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$, such that:

$$Q(A \times B) = \int_A \kappa(B \mid x)\,P(dx), \quad \forall A \in \mathcal{A},\ \forall B \in \mathcal{B}.$$

It is called the semidirect product of $P$ and $\kappa$.
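For finite spaces the semidirect product is just the table of rectangle masses $P(\{i\})\,\kappa(\{j\} \mid i)$. A sketch under that assumption (`semidirect_product` is a hypothetical name); marginalizing the joint table over $j$ recovers $P$, and its total mass is 1.

```python
# Sketch (finite case): the semidirect product P ⊗ kappa assigns the
# rectangle {i} x {j} the mass p[i] * K[i][j].
def semidirect_product(p, K):
    return [[p[i] * K[i][j] for j in range(len(K[0]))] for i in range(len(p))]
```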

Regular conditional distribution


Let $(X, \mathcal{A})$ be a Borel space, $X$ a $(X, \mathcal{A})$-valued random variable on the measure space $(\Omega, \mathcal{F}, P)$ and $\mathcal{G} \subseteq \mathcal{F}$ a sub-$\sigma$-algebra. Then there exists a Markov kernel $\kappa$ from $(\Omega, \mathcal{G})$ to $(X, \mathcal{A})$, such that $\kappa(\cdot, B)$ is a version of the conditional expectation $\mathbb{E}[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}]$ for every $B \in \mathcal{A}$, i.e.

$$P(X \in B \mid \mathcal{G}) = \mathbb{E}\left[ \mathbf{1}_{\{X \in B\}} \mid \mathcal{G} \right] = \kappa(\cdot, B) \quad P\text{-almost surely}.$$

It is called the regular conditional distribution of $X$ given $\mathcal{G}$ and is not uniquely defined.

Generalizations


Transition kernels generalize Markov kernels in the sense that for all $x \in X$, the map

$$B \mapsto \kappa(B \mid x)$$

can be any type of (non-negative) measure, not necessarily a probability measure.

from Grokipedia
In probability theory, a Markov kernel (also known as a stochastic kernel, transition kernel, or probability kernel) is a function $K$ from the product of a measurable space $(\Omega_1, \mathcal{F}_1)$ and the sigma-algebra $\mathcal{F}_2$ of another measurable space $(\Omega_2, \mathcal{F}_2)$ into $[0, 1]$, such that for each $\omega_1 \in \Omega_1$, $K(\omega_1, \cdot)$ defines a probability measure on $(\Omega_2, \mathcal{F}_2)$, and for each $B \in \mathcal{F}_2$, the map $\omega_1 \mapsto K(\omega_1, B)$ is $\mathcal{F}_1$-measurable. This structure ensures that the kernel captures conditional probabilities in a way that preserves measurability and total probability mass of 1 for each fixed input. Markov kernels generalize the transition matrices used in discrete-state Markov chains to arbitrary measurable spaces, enabling the modeling of stochastic transitions in continuous or mixed settings. They form the basis for defining Markov processes, where the kernel specifies the conditional distribution of the next state given the current state, embodying the Markov property that future states depend only on the present. Key operations on Markov kernels include composition, which chains transitions (e.g., $K_2 \circ K_1(x, B) = \int K_1(x, dy)\,K_2(y, B)$), and marginalization, allowing the construction of joint distributions from initial measures and kernels via $\mu K(B) = \int \mu(dx)\,K(x, B)$. Beyond stochastic processes, Markov kernels underpin advanced concepts such as conditional expectations, disintegration theorems for decomposing joint measures, and categorical frameworks in probability, where they form a category with properties akin to relations, supporting applications in probabilistic programming and semantics. Their $\sigma$-finiteness and boundedness properties ensure well-behaved integral operators for transforming measures and functions, making them essential in measure-theoretic probability.

Definition and Basics

Formal Definition

A Markov kernel, also known as a stochastic kernel or transition kernel, is defined between two measurable spaces $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ as a map $K : X \times \mathcal{Y} \to [0, 1]$ such that, for each fixed $x \in X$, the set function $A \mapsto K(x, A)$ is a probability measure on $(Y, \mathcal{Y})$, and for each fixed $A \in \mathcal{Y}$, the function $x \mapsto K(x, A)$ is $\mathcal{X}$-measurable. The core axioms stem from the probability-measure requirement: non-negativity ($K(x, A) \geq 0$ for all $x \in X$ and $A \in \mathcal{Y}$), normalization ($K(x, Y) = 1$ for all $x \in X$), and countable additivity ($K(x, \bigcup_n A_n) = \sum_n K(x, A_n)$ for disjoint $A_n \in \mathcal{Y}$), alongside the measurability condition ensuring the kernel integrates properly with respect to measures on $X$. Standard notation represents the kernel as $K(x, dy)$, denoting the probability measure on $Y$ induced by $x$, with expectations of $\mathcal{Y}$-measurable functions $f : Y \to \mathbb{R}$ given by $\int_Y f(y)\,K(x, dy)$. This formulation distinguishes Markov kernels from deterministic kernels, which take the form $K(x, \cdot) = \delta_{g(x)}$ for some measurable $g : X \to Y$ (Dirac measures at deterministic points), and from arbitrary set-valued functions, which lack the normalization and measurability properties.

Interpretation in Probability

A Markov kernel provides a rigorous framework for specifying conditional probabilities in stochastic processes, particularly those exhibiting the Markov property. In the context of discrete-time processes, a kernel $K$ from a state space $S$ to subsets $A \subseteq S$ is interpreted as $K(x, A) = P(X_{n+1} \in A \mid X_n = x)$, where $X_n$ denotes the state at time $n$. This formulation captures the transition probabilities governing the evolution from the current state $x$ to future states, ensuring that the kernel defines a probability measure for each fixed $x$. The memorylessness inherent in the Markov property is directly encoded by the kernel, as the conditional distribution of the next state depends solely on the present state and remains independent of the entire past trajectory. This eliminates history dependence, meaning that predictions about future states require knowledge only of the current position, simplifying the modeling of systems where historical details beyond the immediate state do not influence outcomes. Such an interpretation is fundamental to defining Markov chains and processes in general probability spaces. When paired with an initial distribution $\mu$ on the state space, the Markov kernel generates the full joint probability measure for the process. Specifically, the distribution at time $n + 1$ is obtained by integrating the kernel against the distribution at time $n$, iteratively constructing the law of the sequence $\{X_n\}$ from $\mu$ and $K$. This combination enables the analysis of long-term behavior, such as stationarity or ergodicity, without relying on explicit path constructions.
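The iteration described here — integrating the kernel against the current distribution — can be sketched for a finite state space, with the kernel stored as a row-stochastic matrix (an illustrative encoding; the helper names are hypothetical):

```python
# Sketch: evolving a distribution mu under a finite-state kernel K,
# mu_{n+1}[j] = sum_i mu_n[i] * K[i][j].
def step_distribution(mu, K):
    return [sum(mu[i] * K[i][j] for i in range(len(mu))) for j in range(len(K[0]))]

def distribution_at(mu, K, n):
    """Distribution of X_n given the initial distribution mu and kernel K."""
    for _ in range(n):
        mu = step_distribution(mu, K)
    return mu
```

Iterating long enough on an aperiodic, irreducible kernel converges to its stationary distribution, illustrating the long-term behavior mentioned above.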

Examples

Discrete State Spaces

In discrete state spaces, which are finite or countably infinite sets, a Markov kernel $K$ from a state space $(S, \mathcal{S})$ to itself, where $\mathcal{S}$ is the power set of $S$, reduces to a transition probability matrix $P$ with entries $P_{ij} = K(i, \{j\})$ for $i, j \in S$. Each row of $P$ sums to 1, ensuring that $K(i, \cdot)$ is a probability measure on $S$ for every $i \in S$, which aligns the kernel with the standard representation of discrete-time Markov chains on countable spaces. This matrix form facilitates analysis of long-term behavior, such as convergence to stationary distributions, under conditions like irreducibility.

A classic example is the simple random walk on the integers $\mathbb{Z}$, where the state space $S = \mathbb{Z}$ is countably infinite. The Markov kernel is defined by $K(n, \{n+1\}) = p$ and $K(n, \{n-1\}) = 1 - p$ for $0 < p < 1$ and all $n \in \mathbb{Z}$, with $K(n, \{m\}) = 0$ for all singletons outside $\{n-1, n+1\}$. In matrix terms, this yields a doubly infinite transition matrix with $p$ on the superdiagonal and $1 - p$ on the subdiagonal, modeling unbiased diffusion when $p = 1/2$. In the symmetric case $p = 1/2$ this kernel captures recurrent behavior on $\mathbb{Z}$, where the chain returns to the starting point with probability 1.

Another illustrative case is the Galton-Watson branching process on the non-negative integers $S = \{0, 1, 2, \dots\}$, which tracks population sizes across generations. The kernel $K(n, \{m\})$ gives the probability that $n$ individuals produce exactly $m$ offspring in total, assuming each produces an independent number of offspring according to a fixed probability generating function $f(s) = \sum_k f_k s^k$, where $f_k = P(\xi = k)$ for the offspring random variable $\xi$. Explicitly, $K(n, \{m\}) = \sum_{k_1 + \dots + k_n = m} \prod_{i=1}^n f_{k_i}$, or equivalently, the coefficient of $s^m$ in $[f(s)]^n$. The resulting transition matrix has entries that grow combinatorially with $n$, reflecting supercritical growth if the mean offspring number exceeds 1.

For discrete Markov kernels, key structural properties include irreducibility and periodicity, which determine ergodic behavior. A kernel is irreducible if, for every pair $i, j \in S$, there exists $n \geq 1$ such that $K^{(n)}(i, \{j\}) > 0$, where $K^{(n)}$ denotes the $n$-fold iteration, meaning the chain can reach any state from any other. Periodicity measures the cyclic structure: the period of state $i$ is the greatest common divisor of $\{n \geq 1 : K^{(n)}(i, \{i\}) > 0\}$, and the chain is periodic if this period $d > 1$ for all states in an irreducible class, leading to oscillatory convergence; otherwise, it is aperiodic with $d = 1$. In irreducible chains, all states share the same period. These properties ensure unique stationary distributions under aperiodicity and positive recurrence.
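The coefficient-of-$s^m$ characterization of the Galton-Watson kernel can be computed by convolving the offspring pmf with itself $n$ times. A sketch assuming a finite offspring distribution (`gw_kernel_row` is a hypothetical name; `offspring_pmf[k]` plays the role of $f_k$):

```python
# Sketch: the Galton-Watson kernel row K(n, {m}) as the coefficients of
# f(s)^n, computed by repeated polynomial convolution of the offspring pmf.
def gw_kernel_row(offspring_pmf, n):
    """Return [K(n, {0}), K(n, {1}), ...] for n current individuals."""
    row = [1.0]                      # n = 0: zero individuals yield zero offspring
    for _ in range(n):
        new = [0.0] * (len(row) + len(offspring_pmf) - 1)
        for i, a in enumerate(row):
            for j, b in enumerate(offspring_pmf):
                new[i + j] += a * b
        row = new
    return row
```

Each returned row sums to 1, so every fixed $n$ indeed yields a probability measure on the offspring counts.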

Continuous State Spaces

In continuous state spaces, Markov kernels are often represented using densities with respect to a reference measure to handle the uncountable nature of the spaces. Specifically, for measurable spaces $(X, \mathcal{A})$ and $(Y, \mathcal{B})$, a Markov kernel $K : X \times \mathcal{B} \to [0, 1]$ may admit a density $k(x, \cdot)$ with respect to a $\sigma$-finite reference measure $\mu$ on $(Y, \mathcal{B})$, such that $K(x, B) = \int_B k(x, y)\,\mu(dy)$ for all $B \in \mathcal{B}$ and $x \in X$, where $k : X \times Y \to [0, \infty)$ is a measurable function satisfying $\int_Y k(x, y)\,\mu(dy) = 1$ for each $x$. This form arises when the induced joint measure $\sigma(dx, dy) = \lambda(dx)\,K(x, dy)$ on $X \times Y$ is absolutely continuous with respect to the product measure $\lambda \otimes \mu$, where $\lambda$ is a measure on $(X, \mathcal{A})$, and the density $k$ is the Radon-Nikodym derivative $\frac{d\sigma}{d(\lambda \otimes \mu)}(x, y)$. Such representations are essential for integrating kernels over uncountable sets and facilitate computations in stochastic analysis.

A prominent example of a Markov kernel on continuous spaces is the deterministic shift induced by a measurable function $f : X \to Y$. Here, the kernel is given by $K(x, dy) = \delta_{f(x)}(dy)$, where $\delta_z$ denotes the Dirac measure concentrated at $z \in Y$. This kernel assigns probability 1 to the singleton $\{f(x)\}$, making transitions fully deterministic while preserving measurability, and it serves as a building block for more complex kernels via mixtures or compositions.

An illustrative stochastic example is the transition kernel for Brownian motion on $\mathbb{R}$, a fundamental diffusion process. For standard Brownian motion starting at $x \in \mathbb{R}$, the kernel over time $t > 0$ has density

$$k(x, y; t) = \frac{1}{\sqrt{2\pi t}} \exp\left( -\frac{(y - x)^2}{2t} \right).$$
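As a numerical check (an illustrative discretization, not part of the definition), the Brownian transition density satisfies the Chapman-Kolmogorov semigroup property: composing the kernels for times $s$ and $t$ reproduces the density for time $s + t$. The helper `convolve` and its integration window are assumptions of this sketch.

```python
import math

def brownian_density(x, y, t):
    """Transition density of standard Brownian motion over time t > 0."""
    return math.exp(-(y - x) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

def convolve(x, z, s, t, lo=-12.0, hi=12.0, steps=4000):
    """Numerically integrate k(x, y; s) * k(y, z; t) dy (Chapman-Kolmogorov)."""
    h = (hi - lo) / steps
    return h * sum(
        brownian_density(x, lo + i * h, s) * brownian_density(lo + i * h, z, t)
        for i in range(steps)
    )
```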