Positive-definite kernel
from Wikipedia

In operator theory, a branch of mathematics, a positive-definite kernel is a generalization of a positive-definite function or a positive-definite matrix. It was first introduced by James Mercer in the early 20th century, in the context of solving integral operator equations. Since then, positive-definite functions and their various analogues and generalizations have arisen in diverse parts of mathematics. They occur naturally in Fourier analysis, probability theory, operator theory, complex function theory, moment problems, integral equations, boundary-value problems for partial differential equations, machine learning, embedding problems, information theory, and other areas.

Definition


Let $\mathcal{X}$ be a nonempty set, sometimes referred to as the index set. A symmetric function $K: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called a positive-definite (p.d.) kernel on $\mathcal{X}$ if

$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(x_i, x_j) \ge 0 \tag{1.1}$$

holds for all $n \in \mathbb{N}$, $x_1, \dots, x_n \in \mathcal{X}$, and $c_1, \dots, c_n \in \mathbb{R}$.

In probability theory, a distinction is sometimes made between positive-definite kernels, for which equality in (1.1) implies $c_i = 0$ for all $i$, and positive semi-definite (p.s.d.) kernels, which do not impose this condition. Note that this is equivalent to requiring that every finite matrix constructed by pairwise evaluation, $\mathbf{K} = [K(x_i, x_j)]_{i,j=1}^{n}$, has either entirely positive (p.d.) or nonnegative (p.s.d.) eigenvalues.

In mathematical literature, kernels are usually complex-valued functions. That is, a complex-valued function $K: \mathcal{X} \times \mathcal{X} \to \mathbb{C}$ is called a Hermitian kernel if $K(x, y) = \overline{K(y, x)}$ and positive definite if for every finite set of points $x_1, \dots, x_n \in \mathcal{X}$ and any complex numbers $c_1, \dots, c_n$,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_i \overline{c_j} K(x_i, x_j) \ge 0,$$

where $\overline{c_j}$ denotes the complex conjugate.[1] In the rest of this article we assume real-valued functions, which is the common practice in applications of p.d. kernels.
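To make condition (1.1) concrete, here is a minimal numerical sketch (assuming NumPy; the kernel and sample points are arbitrary illustrative choices) that builds a Gram matrix and checks that its eigenvalues are nonnegative:

```python
import numpy as np

def gram_is_psd(kernel, points, tol=1e-10):
    """Test condition (1.1) numerically: form K_ij = kernel(x_i, x_j)
    and check that all eigenvalues of the symmetric Gram matrix are
    nonnegative up to floating-point tolerance."""
    K = np.array([[kernel(x, y) for y in points] for x in points])
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

# The Gaussian kernel is p.d. on R, so the check passes for any points.
gaussian = lambda x, y: np.exp(-(x - y) ** 2 / 2.0)
print(gram_is_psd(gaussian, [0.0, 0.3, 1.5, -2.0]))  # True
```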

Some general properties

  • For a family of p.d. kernels $(K_i)_{i \in \mathbb{N}}$, with $K_i: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ (a numerical spot-check of these closure properties follows this list):
    • The conical sum $\sum_{i=1}^{n} \lambda_i K_i$ is p.d., given $\lambda_1, \dots, \lambda_n \ge 0$.
    • The product $K_1^{a_1} \cdots K_n^{a_n}$ is p.d., given $a_1, \dots, a_n \in \mathbb{N}$.
    • The limit $K = \lim_{n \to \infty} K_n$ is p.d. if the limit exists.
  • If $(\mathcal{X}_i)_{i=1}^{n}$ is a sequence of sets, and $(K_i)_{i=1}^{n}$, with $K_i: \mathcal{X}_i \times \mathcal{X}_i \to \mathbb{R}$, a sequence of p.d. kernels, then both
    $$K\big((x_1, \dots, x_n), (y_1, \dots, y_n)\big) = \prod_{i=1}^{n} K_i(x_i, y_i) \quad \text{and} \quad K\big((x_1, \dots, x_n), (y_1, \dots, y_n)\big) = \sum_{i=1}^{n} K_i(x_i, y_i)$$
    are p.d. kernels on $\mathcal{X} = \mathcal{X}_1 \times \dots \times \mathcal{X}_n$.
  • Let $\mathcal{X}_0 \subset \mathcal{X}$. Then the restriction $K_0$ of $K$ to $\mathcal{X}_0 \times \mathcal{X}_0$ is also a p.d. kernel.
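A brief numerical spot-check of the first group of closure properties, as a sketch assuming NumPy (points and kernels are arbitrary illustrative choices): sums and pointwise (Schur) products of p.d. kernels again yield positive semi-definite Gram matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 2))                              # 6 points in R^2
gram = lambda k: np.array([[k(x, y) for y in pts] for x in pts])

k1 = lambda x, y: float(np.dot(x, y))                      # linear kernel
k2 = lambda x, y: float(np.exp(-np.sum((x - y) ** 2)))     # Gaussian kernel

for K in (gram(k1) + gram(k2), gram(k1) * gram(k2)):       # sum and product
    print(np.linalg.eigvalsh(K).min() >= -1e-10)           # True, True
```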

Examples of p.d. kernels

  • Common examples of p.d. kernels defined on Euclidean space $\mathbb{R}^d$ include the following (several are sketched in code after this list):
    • Linear kernel: $K(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\top \mathbf{y}$, $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$.
    • Polynomial kernel: $K(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^\top \mathbf{y} + r)^n$, $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, $r \ge 0$, $n \ge 1$.
    • Gaussian kernel (RBF kernel): $K(\mathbf{x}, \mathbf{y}) = e^{-\|\mathbf{x} - \mathbf{y}\|^2/(2\sigma^2)}$, $\sigma > 0$.
    • Laplacian kernel: $K(\mathbf{x}, \mathbf{y}) = e^{-\alpha \|\mathbf{x} - \mathbf{y}\|}$, $\alpha > 0$.
    • Abel kernel: $K(x, y) = e^{-\alpha |x - y|}$, $\alpha > 0$, $x, y \in \mathbb{R}$.
    • Kernel generating the Sobolev space $W_2^k(\mathbb{R}^d)$: $K(x, y) = \|x - y\|_2^{k - d/2} B_{k - d/2}(\|x - y\|_2)$, where $B_\nu$ is the Bessel function of the third kind.
    • Kernel generating the Paley–Wiener space: $K(x, y) = \operatorname{sinc}(\alpha (x - y))$, $\alpha > 0$, $x, y \in \mathbb{R}$.
  • If $H$ is a Hilbert space, then its corresponding inner product $(\cdot, \cdot)_H: H \times H \to \mathbb{R}$ is a p.d. kernel. Indeed, we have
    $$\sum_{i,j=1}^{n} c_i c_j (x_i, x_j)_H = \Big( \sum_{i=1}^{n} c_i x_i, \sum_{j=1}^{n} c_j x_j \Big)_H = \Big\| \sum_{i=1}^{n} c_i x_i \Big\|_H^2 \ge 0.$$
  • Kernels defined on $\mathbb{R}_+^d$ and histograms: Histograms are frequently encountered in applications of real-life problems. Most observations are usually available in the form of nonnegative vectors of counts, which, if normalized, yield histograms of frequencies. It has been shown[2] that the following family of squared metrics (respectively, the Jensen divergence, the $\chi^2$-distance, the total variation, and two variations of the Hellinger distance)
    $$\psi_{JD}(\theta, \theta') = H\Big(\frac{\theta + \theta'}{2}\Big) - \frac{H(\theta) + H(\theta')}{2}, \qquad \psi_{\chi^2}(\theta, \theta') = \sum_i \frac{(\theta_i - \theta'_i)^2}{\theta_i + \theta'_i},$$
    $$\psi_{TV}(\theta, \theta') = \sum_i |\theta_i - \theta'_i|, \qquad \psi_{H_1}(\theta, \theta') = \sum_i |\sqrt{\theta_i} - \sqrt{\theta'_i}|, \qquad \psi_{H_2}(\theta, \theta') = \sum_i |\sqrt{\theta_i} - \sqrt{\theta'_i}|^2,$$
    can be used to define p.d. kernels via the formula $K(\theta, \theta') = e^{-\alpha \psi(\theta, \theta')}$, $\alpha > 0$.
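As an illustration, a minimal sketch of several kernels from the list above (assuming NumPy; parameter values are arbitrary):

```python
import numpy as np

def linear(x, y):
    return float(np.dot(x, y))

def polynomial(x, y, r=1.0, n=3):
    return float((np.dot(x, y) + r) ** n)        # r >= 0, integer n >= 1

def gaussian(x, y, sigma=1.0):
    return float(np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2)))

def laplacian(x, y, alpha=1.0):
    return float(np.exp(-alpha * np.linalg.norm(x - y)))

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
for k in (linear, polynomial, gaussian, laplacian):
    print(k.__name__, k(x, y))
```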

Examples of other kernels


The sigmoid kernel, or hyperbolic tangent kernel, is defined as $K(\mathbf{x}, \mathbf{y}) = \tanh(\gamma \, \mathbf{x}^\top \mathbf{y} + r)$, where $\gamma$ and $r$ are real parameters. The kernel is not p.d., but it has sometimes been used in kernel algorithms.[3]
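The failure of positive definiteness is easy to exhibit numerically. In the sketch below (assuming NumPy, with $\gamma = 1$, $r = 0$, and two sample points chosen for illustration), the $2 \times 2$ Gram matrix already has a negative eigenvalue:

```python
import numpy as np

# Sigmoid kernel tanh(gamma * x * y + r) in one dimension, gamma=1, r=0.
sigmoid = lambda x, y: np.tanh(x * y)

pts = [0.5, 1.0]
K = np.array([[sigmoid(x, y) for y in pts] for x in pts])
print(np.linalg.eigvalsh(K))  # smallest eigenvalue is negative (about -0.03)
```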

History


Positive-definite kernels, as defined in (1.1), appeared first in 1909 in a paper on integral equations by James Mercer.[4] Several other authors made use of this concept in the following two decades, but none of them explicitly used kernels $K(x, y) = f(x - y)$, i.e. p.d. functions (indeed M. Mathias and S. Bochner seem not to have been aware of the study of p.d. kernels). Mercer's work arose from Hilbert's paper of 1904[5] on Fredholm integral equations of the second kind:

$$f(s) = \varphi(s) - \lambda \int_a^b K(s, t) \varphi(t) \, dt. \tag{1.2}$$

In particular, Hilbert had shown that

$$\int_a^b \int_a^b K(s, t) x(s) x(t) \, ds \, dt = \sum_{n} \frac{1}{\lambda_n} \Big[ \int_a^b x(s) \psi_n(s) \, ds \Big]^2,$$

where $K$ is a continuous real symmetric kernel, $x$ is continuous, $\{\psi_n\}$ is a complete system of orthonormal eigenfunctions, and the $\lambda_n$'s are the corresponding eigenvalues of (1.2). Hilbert defined a "definite" kernel as one for which the double integral $J(x) = \int_a^b \int_a^b K(s, t) x(s) x(t) \, ds \, dt$ satisfies $J(x) > 0$ except for $x(s) = 0$. The original object of Mercer's paper was to characterize the kernels which are definite in the sense of Hilbert, but Mercer soon found that the class of such functions was too restrictive to characterize in terms of determinants. He therefore defined a continuous real symmetric kernel $K(s, t)$ to be of positive type (i.e. positive-definite) if $J(x) \ge 0$ for all real continuous functions $x$ on $[a, b]$, and he proved that (1.1) is a necessary and sufficient condition for a kernel to be of positive type. Mercer then proved that for any continuous p.d. kernel the expansion

$$K(s, t) = \sum_{n} \frac{\psi_n(s) \psi_n(t)}{\lambda_n}$$

holds absolutely and uniformly.

At about the same time W. H. Young,[6] motivated by a different question in the theory of integral equations, showed that for continuous kernels condition (1.1) is equivalent to $\int_a^b \int_a^b K(s, t) x(s) x(t) \, ds \, dt \ge 0$ for all $x \in L^1[a, b]$.

E.H. Moore[7][8] initiated the study of a very general kind of p.d. kernel. If $E$ is an abstract set, he calls functions $K(x, y)$ defined on $E \times E$ "positive Hermitian matrices" if they satisfy (1.1) for all $x_i \in E$. Moore was interested in a generalization of integral equations and showed that to each such $K$ there is a Hilbert space $H$ of functions such that, for each $f \in H$, $f(y) = (f, K(\cdot, y))_H$. This property is called the reproducing property of the kernel and turns out to have importance in the solution of boundary-value problems for elliptic partial differential equations.

Another line of development in which p.d. kernels played a large role was the theory of harmonics on homogeneous spaces as begun by E. Cartan in 1929, and continued by H. Weyl and S. Ito. The most comprehensive theory of p.d. kernels in homogeneous spaces is that of M. Krein[9] which includes as special cases the work on p.d. functions and irreducible unitary representations of locally compact groups.

In probability theory, p.d. kernels arise as covariance kernels of stochastic processes.[10]

Connection with reproducing kernel Hilbert spaces and feature maps


Positive-definite kernels provide a framework that encompasses some basic Hilbert space constructions. In the following we present a tight relationship between positive-definite kernels and two mathematical objects, namely reproducing kernel Hilbert spaces and feature maps.

Let $X$ be a set, $H$ a Hilbert space of functions $f: X \to \mathbb{R}$, and $(\cdot, \cdot)_H$ the corresponding inner product on $H$. For any $x \in X$ the evaluation functional $e_x: H \to \mathbb{R}$ is defined by $e_x(f) = f(x)$. We first define a reproducing kernel Hilbert space (RKHS):

Definition: Space $H$ is called a reproducing kernel Hilbert space if the evaluation functionals are continuous.

Every RKHS has a special function associated to it, namely the reproducing kernel:

Definition: Reproducing kernel is a function $K: X \times X \to \mathbb{R}$ such that

  1. $K(\cdot, x) \in H$ for all $x \in X$, and
  2. $(f, K(\cdot, x))_H = f(x)$ for all $f \in H$ and $x \in X$.

The latter property is called the reproducing property.

The following result shows equivalence between RKHS and reproducing kernels:

Theorem: Every reproducing kernel induces a unique RKHS, and every RKHS has a unique reproducing kernel.

Now the connection between positive-definite kernels and RKHS is given by the following theorem.

Theorem: Every reproducing kernel is positive-definite, and every positive-definite kernel defines a unique RKHS, of which it is the unique reproducing kernel.

Thus, given a positive-definite kernel $K$, it is possible to build an associated RKHS with $K$ as its reproducing kernel.

As stated earlier, positive-definite kernels can be constructed from inner products. This fact can be used to connect p.d. kernels with another interesting object that arises in machine learning applications, namely the feature map. Let $H$ be a Hilbert space and $(\cdot, \cdot)_H$ the corresponding inner product. Any map $\Phi: X \to H$ is called a feature map, and $H$ is then called the feature space. It is easy to see[11] that every feature map defines a unique p.d. kernel by

$$K(x, y) = (\Phi(x), \Phi(y))_H.$$

Indeed, positive definiteness of $K$ follows from the p.d. property of the inner product. On the other hand, every p.d. kernel, and its corresponding RKHS, have many associated feature maps. For example, let $H = H_K$ and $\Phi(x) = K(\cdot, x)$ for all $x \in X$. Then $(\Phi(x), \Phi(y))_H = (K(\cdot, x), K(\cdot, y))_H = K(x, y)$, by the reproducing property. This suggests a new look at p.d. kernels as inner products in appropriate Hilbert spaces; in other words, p.d. kernels can be viewed as similarity maps which quantify how similar two points $x$ and $y$ are through the value $K(x, y)$. Moreover, through the equivalence of p.d. kernels and their corresponding RKHS, every feature map can be used to construct an RKHS.
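For instance, here is a minimal sketch (assuming NumPy) of an explicit feature map for the homogeneous quadratic kernel $K(x, y) = (x^\top y)^2$ on $\mathbb{R}^2$: the map $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ reproduces the kernel as an inner product in $\mathbb{R}^3$.

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(x, y) = (x . y)^2 on R^2."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

k = lambda x, y: float(np.dot(x, y)) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(x), phi(y)), k(x, y))  # both print 1.0
```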

Kernels and distances


Kernel methods are often compared to distance-based methods such as nearest neighbors. In this section we discuss parallels between their two respective ingredients, namely kernels $K$ and distances $d$.

Here by a distance function between each pair of elements of some set $X$, we mean a metric defined on that set, i.e. any nonnegative-valued function $d$ on $X \times X$ which satisfies

  • $d(x, y) \ge 0$, and $d(x, y) = 0$ if and only if $x = y$,
  • $d(x, y) = d(y, x)$,
  • $d(x, z) \le d(x, y) + d(y, z)$.

One link between distances and p.d. kernels is given by a particular kind of kernel, called a negative definite kernel, defined as follows:

Definition: A symmetric function $\psi: X \times X \to \mathbb{R}$ is called a negative definite (n.d.) kernel on $X$ if

$$\sum_{i,j=1}^{n} c_i c_j \psi(x_i, x_j) \le 0$$

holds for any $n \in \mathbb{N}$, $x_1, \dots, x_n \in X$, and $c_1, \dots, c_n \in \mathbb{R}$ such that $\sum_{i=1}^{n} c_i = 0$.

The parallel between n.d. kernels and distances is the following: whenever an n.d. kernel vanishes on the set $\{(x, x): x \in X\}$, and is zero only on this set, then its square root is a distance for $X$.[12] At the same time, a distance does not necessarily correspond to an n.d. kernel. This is only true for Hilbertian distances, where a distance $d$ is called Hilbertian if one can embed the metric space $(X, d)$ isometrically into some Hilbert space.

On the other hand, n.d. kernels can be identified with a subfamily of p.d. kernels known as infinitely divisible kernels. A nonnegative-valued kernel $K$ is said to be infinitely divisible if for every $n \in \mathbb{N}$ there exists a positive-definite kernel $K_n$ such that $K = (K_n)^n$.

Another link is that a p.d. kernel induces a pseudometric, where the first constraint on the distance function is loosened to allow $d(x, y) = 0$ for $x \ne y$. Given a positive-definite kernel $K$, we can define a distance function as

$$d(x, y) = \sqrt{K(x, x) - 2K(x, y) + K(y, y)}.$$
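A minimal sketch of this induced distance (assuming NumPy; the Gaussian kernel is an arbitrary illustrative choice):

```python
import numpy as np

def kernel_distance(k, x, y):
    """d(x, y) = sqrt(K(x,x) - 2 K(x,y) + K(y,y)), i.e. the feature-space
    norm ||phi(x) - phi(y)||; max(.., 0) guards against round-off."""
    return float(np.sqrt(max(k(x, x) - 2 * k(x, y) + k(y, y), 0.0)))

gaussian = lambda x, y: float(np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / 2))
print(kernel_distance(gaussian, [0.0, 0.0], [1.0, 1.0]))  # about 1.12
```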

Some applications


Kernels in machine learning


Positive-definite kernels, through their equivalence with reproducing kernel Hilbert spaces (RKHS), are particularly important in the field of statistical learning theory because of the celebrated representer theorem, which states that every minimizer function in an RKHS can be written as a linear combination of the kernel function evaluated at the training points. This is a practically useful result, as it effectively simplifies the empirical risk minimization problem from an infinite-dimensional to a finite-dimensional optimization problem.
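A minimal kernel ridge regression sketch illustrates this (assuming NumPy; the Gaussian kernel, regularization weight, and data are arbitrary choices, not a prescribed setup): the minimizer is parametrized by one coefficient per training point, obtained from a finite linear system.

```python
import numpy as np

def fit_kernel_ridge(X, y, kernel, lam=1e-3):
    """Solve (K + lam I) alpha = y; by the representer theorem the
    minimizer has the form f(x) = sum_i alpha_i K(x, x_i)."""
    K = np.array([[kernel(a, b) for b in X] for a in X])
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda x: float(sum(a * kernel(x, xi) for a, xi in zip(alpha, X)))

gaussian = lambda a, b: np.exp(-(a - b) ** 2 / 0.02)
X = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * X)
f = fit_kernel_ridge(X, y, gaussian)
print(f(0.25))  # close to sin(pi/2) = 1
```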

Kernels in probabilistic models


There are several different ways in which kernels arise in probability theory.

  • Nondeterministic recovery problems: Assume that we want to find the response $f(x)$ of an unknown model function $f$ at a new point $x$ of a set $X$, provided that we have a sample of input-response pairs $(x_i, f(x_i))$ given by observation or experiment. The response $f(x)$ at $x$ is not a fixed function of $x$ but rather a realization of a real-valued random variable $Z(x)$. The goal is to get information about the function $E[Z(x)]$, which replaces $f$ in the deterministic setting. For two elements $x, y \in X$ the random variables $Z(x)$ and $Z(y)$ will not be uncorrelated, because if $x$ is too close to $y$ the random experiments described by $Z(x)$ and $Z(y)$ will often show similar behaviour. This is described by a covariance kernel $K(x, y) = E[Z(x) \cdot Z(y)]$. Such a kernel exists and is positive-definite under weak additional assumptions. Now a good estimate for $f(x)$ can be obtained by using kernel interpolation with the covariance kernel, ignoring the probabilistic background completely.

Assume now that a noise variable $\epsilon(x)$, with zero mean and variance $\sigma^2$, is added to the response, such that the noise is independent for different $x$ and independent of $Z$ there. Then the problem of finding a good estimate for $f$ is identical to the above one, but with a modified kernel given by $K(x, y) = E[Z(x) \cdot Z(y)] + \sigma^2 \delta_{xy}$, as in the sketch below.
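A sketch of this recovery procedure (assuming NumPy; the covariance kernel and data are illustrative assumptions): the estimate interpolates with the covariance kernel, and observation noise enters only as $\sigma^2$ added to the Gram diagonal.

```python
import numpy as np

def kernel_estimate(x_new, X, Y, k, noise_var=0.0):
    """Estimate the unknown response at x_new by kernel interpolation
    with the covariance kernel; noise of variance sigma^2 enters only
    through the Gram-matrix diagonal (the sigma^2 * delta_xy term)."""
    K = np.array([[k(a, b) for b in X] for a in X]) + noise_var * np.eye(len(X))
    w = np.linalg.solve(K, Y)
    return float(sum(wi * k(x_new, xi) for wi, xi in zip(w, X)))

k = lambda a, b: np.exp(-(a - b) ** 2)        # assumed covariance kernel
X = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
Y = np.cos(X)                                 # observed responses
print(kernel_estimate(0.75, X, Y, k, noise_var=0.01))  # near cos(0.75) ~ 0.73
```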

  • Density estimation by kernels: The problem is to recover the density $f$ of a multivariate distribution over a domain $X$ from a large sample $x_1, \dots, x_n \in X$ including repetitions. Where sampling points lie dense, the true density function must take large values. A simple density estimate is possible by counting the number of samples in each cell of a grid and plotting the resulting histogram, which yields a piecewise constant density estimate. A better estimate can be obtained by using a nonnegative translation-invariant kernel $K$, with total integral equal to one, and defining $$f(x) \approx \frac{1}{n} \sum_{i=1}^{n} K\Big(\frac{x - x_i}{h}\Big)$$ as a smooth estimate; a minimal sketch follows below.
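A minimal sketch of such a kernel density estimate (assuming NumPy; the standard normal density as the kernel and the fixed bandwidth are illustrative choices):

```python
import numpy as np

def kde(x, samples, h=0.3):
    """Smooth density estimate: a translation-invariant kernel with unit
    integral (here the standard normal density), rescaled by bandwidth h
    and averaged over the samples."""
    u = (x - np.asarray(samples)) / h
    return float(np.mean(np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)) / h)

samples = np.random.default_rng(0).normal(size=2000)
print(kde(0.0, samples))  # near the true standard normal density 0.3989
```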

Numerical solution of partial differential equations


One of the greatest application areas of so-called meshfree methods is the numerical solution of PDEs. Some popular meshfree methods are closely related to positive-definite kernels, such as the meshless local Petrov–Galerkin (MLPG) method, the reproducing kernel particle method (RKPM), and smoothed-particle hydrodynamics (SPH). These methods use a radial basis kernel for collocation.[13]
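As a concrete illustration, here is a minimal unsymmetric (Kansa-type) collocation sketch for the 1-D Poisson problem $-u'' = f$ on $(0, 1)$ with homogeneous boundary values, using Gaussian kernels centered at the collocation nodes (assuming NumPy; the shape parameter, node count, and manufactured solution are arbitrary choices, and this is a sketch rather than a production meshfree solver):

```python
import numpy as np

eps = 3.0                                       # Gaussian shape parameter
phi = lambda x, c: np.exp(-eps**2 * (x - c)**2)
# Second derivative of phi in x, used to collocate -u'' = f.
phi_xx = lambda x, c: (4*eps**4*(x - c)**2 - 2*eps**2) * phi(x, c)

nodes = np.linspace(0.0, 1.0, 15)               # collocation points = centers
f = lambda x: np.pi**2 * np.sin(np.pi * x)      # so the exact solution is sin(pi x)

A = np.zeros((nodes.size, nodes.size))
b = np.zeros(nodes.size)
for i, x in enumerate(nodes):
    if x == 0.0 or x == 1.0:                    # boundary rows: u(x) = 0
        A[i] = phi(x, nodes)
    else:                                       # interior rows: -u''(x) = f(x)
        A[i] = -phi_xx(x, nodes)
        b[i] = f(x)

c = np.linalg.solve(A, b)
u = lambda x: float(phi(x, nodes) @ c)
print(u(0.5))                                   # close to sin(pi/2) = 1
```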

Stinespring dilation theorem

Main article: Stinespring dilation theorem

Other applications


In the literature on computer experiments[14] and other engineering experiments, one increasingly encounters models based on p.d. kernels, RBFs, or kriging. One such topic is response surface methodology. Other types of applications that boil down to data fitting are rapid prototyping and computer graphics. Here one often uses implicit surface models to approximate or interpolate point cloud data.

Applications of p.d. kernels in various other branches of mathematics include multivariate integration, multivariate optimization, and numerical analysis and scientific computing, where one studies fast, accurate, and adaptive algorithms ideally implemented in high-performance computing environments.[15]


from Grokipedia
In mathematics, a positive-definite kernel (also known as a positive semi-definite kernel) is a function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, where $\mathcal{X}$ is a nonempty set, such that for any positive integer $n$, any distinct points $x_1, \dots, x_n \in \mathcal{X}$, and any real coefficients $c_1, \dots, c_n \in \mathbb{R}$, the quadratic form satisfies $\sum_{i=1}^n \sum_{j=1}^n c_i c_j k(x_i, x_j) \geq 0$. This property ensures that the associated Gram matrix $[k(x_i, x_j)]_{i,j=1}^n$ is positive semi-definite, generalizing the concept of positive-definite matrices to continuous or infinite-dimensional settings. Positive-definite kernels originated in the early 20th century through work on integral equations, with foundational contributions from early work on Hilbert spaces (1908) and from James Mercer (1909), who established Mercer's theorem linking such kernels to positive-definite integral operators via spectral decompositions. Key properties include closure under pointwise products, sums, and limits, as well as their intimate connection to reproducing kernel Hilbert spaces (RKHS), where the kernel serves as the reproducing kernel, enabling evaluations via inner products: $f(x) = \langle f, k(x, \cdot) \rangle_{H_k}$ for functions $f$ in the space $H_k$. Common examples encompass the linear kernel $k(x, y) = x^\top y$, the polynomial kernel $k(x, y) = (x^\top y + c)^d$ for constants $c \geq 0$ and degree $d > 0$, and the Gaussian (RBF) kernel $k(x, y) = \exp\left(-\|x - y\|^2 / (2\sigma^2)\right)$ for $\sigma > 0$, each positive-definite on $\mathbb{R}^d$ and widely used due to their flexibility in capturing linear and nonlinear similarities. In applications, positive-definite kernels underpin methods for scattered data approximation, numerical solutions to partial differential equations (PDEs), and algorithms such as support vector machines (SVMs) and Gaussian processes, where they implicitly map data into high-dimensional feature spaces without explicit computation. More recent extensions include operator-valued kernels for vector-valued outputs and structured data kernels for graphs, sequences, and images, enhancing tasks like multiple kernel learning and multimodal data integration.

Definition and Properties

Definition

A real symmetric matrix $A$ is positive semidefinite if it satisfies $\mathbf{x}^T A \mathbf{x} \geq 0$ for every real vector $\mathbf{x}$. Equivalently, all eigenvalues of $A$ are non-negative. A function $k: X \times X \to \mathbb{R}$, where $X$ is any nonempty set, is called a positive-definite kernel if it is symmetric, meaning $k(x, y) = k(y, x)$ for all $x, y \in X$, and if, for every positive integer $n$, every choice of points $x_1, \dots, x_n \in X$, and every choice of real coefficients $c_1, \dots, c_n \in \mathbb{R}$, the quadratic form satisfies

$$\sum_{i=1}^n \sum_{j=1}^n c_i c_j k(x_i, x_j) \geq 0.$$

This condition is equivalent to requiring that the $n \times n$ Gram matrix $K$ with entries $K_{ij} = k(x_i, x_j)$ is positive semidefinite for all such $n$ and points. If the inequality is strict ($> 0$) whenever the coefficients are not all zero, then $k$ is a strictly positive-definite kernel. The definition extends to complex-valued kernels as follows: a function $k: X \times X \to \mathbb{C}$ is positive-definite if it satisfies Hermitian symmetry, $k(y, x) = \overline{k(x, y)}$ for all $x, y \in X$, and if, for every positive integer $n$, points $x_1, \dots, x_n \in X$, and complex coefficients $c_1, \dots, c_n \in \mathbb{C}$,

$$\sum_{i=1}^n \sum_{j=1}^n c_i \overline{c_j} k(x_i, x_j) \geq 0.$$

In this case, the associated Gram matrix $K$ with $K_{ij} = k(x_i, x_j)$ is Hermitian and positive semidefinite.

Properties

A positive-definite kernel $k: X \times X \to \mathbb{R}$ satisfies $k(x, x) \geq 0$ for all $x \in X$, as this follows directly from the positive semi-definiteness of the Gram matrix for any singleton set $\{x\}$. If equality holds, i.e., $k(x, x) = 0$, then $k(x, y) = 0$ for all $y \in X$, corresponding to a degenerate case where the kernel collapses the point $x$ to the zero function in the associated feature space. Additionally, real positive-definite kernels are taken to be symmetric, $k(x, y) = k(y, x)$ for all $x, y \in X$, so that the Gram matrix is symmetric positive semi-definite. A key pointwise bound arises from the Cauchy-Schwarz inequality applied in the implicit feature space:

$$|k(x, y)| \leq \sqrt{k(x, x)\, k(y, y)}.$$
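A quick numerical spot-check of this bound (assuming NumPy; the polynomial kernel and random points are illustrative choices):

```python
import numpy as np

k = lambda x, y: float((np.dot(x, y) + 1.0) ** 2)   # polynomial kernel, c=1, d=2
rng = np.random.default_rng(1)
for _ in range(5):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert abs(k(x, y)) <= np.sqrt(k(x, x) * k(y, y)) + 1e-12
print("Cauchy-Schwarz bound held on all samples")
```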