Log probability
from Wikipedia

In probability theory and computer science, a log probability is simply a logarithm of a probability.[1] The use of log probabilities means representing probabilities on a logarithmic scale, instead of the standard unit interval.

Since the probabilities of independent events multiply, and logarithms convert multiplication to addition, log probabilities of independent events add. Log probabilities are thus practical for computations, and have an intuitive interpretation in terms of information theory: the negative expected value of the log probabilities is the information entropy of an event. Similarly, likelihoods are often transformed to the log scale, and the corresponding log-likelihood can be interpreted as the degree to which an event supports a statistical model. The log probability is widely used in implementations of computations with probability, and is studied as a concept in its own right in some applications of information theory, such as natural language processing.

Motivation


Representing probabilities in this way has several practical advantages:

  1. Speed. Since multiplication is more expensive than addition, taking the product of a high number of probabilities is often faster if they are represented in log form. (The conversion to log form is expensive, but is only incurred once.) Multiplication arises from calculating the probability that multiple independent events occur: the probability that all independent events of interest occur is the product of all these events' probabilities.
  2. Accuracy. The use of log probabilities improves numerical stability when the probabilities are very small, because of the way in which computers approximate real numbers;[1] see the sketch after this list.
  3. Simplicity. Many probability distributions have an exponential form. Taking the log of these distributions eliminates the exponential function, unwrapping the exponent. For example, the log probability of the normal distribution's probability density function is $-\frac{(x-\mu)^2}{2\sigma^2} - \log\left(\sigma\sqrt{2\pi}\right)$ instead of $\frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. Log probabilities make some mathematical manipulations easier to perform.
  4. Optimization. Since most common probability distributions—notably the exponential family—are only logarithmically concave,[2][3] and concavity of the objective function plays a key role in the maximization of a function such as probability, optimizers work better with log probabilities.
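The accuracy point can be made concrete with a minimal Python sketch (the event count and probability value below are illustrative, not from the article): a direct product of many small probabilities underflows to zero, while the equivalent sum of log probabilities remains finite.

```python
import math

# 2,000 hypothetical independent events, each with probability 0.01
probs = [0.01] * 2000

# Direct product: underflows to 0.0 in double-precision arithmetic
direct = 1.0
for p in probs:
    direct *= p

# Log-space equivalent: a finite, well-behaved sum
log_prob = sum(math.log(p) for p in probs)

print(direct)    # 0.0 (underflow)
print(log_prob)  # about -9210.3, the log of the true product
```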

Representation issues


The logarithm function is not defined for zero, so log probabilities can only represent non-zero probabilities. Since the logarithm of a number in the interval $(0, 1)$ is negative, often the negative log probabilities are used. In that case the log probabilities in the following formulas would be inverted (negated).

Any base can be selected for the logarithm.

Basic manipulations


In this section we denote probabilities in logarithmic space by $x'$ and $y'$ for short: $x' = \log(x)$ and $y' = \log(y)$.

The product of probabilities $x \cdot y$ corresponds to addition in logarithmic space: $\log(x \cdot y) = \log(x) + \log(y) = x' + y'$.

The sum of probabilities $x + y$ is a bit more involved to compute in logarithmic space, requiring the computation of one exponent and one logarithm: $\log(x + y) = \log\!\left(x\left(1 + \tfrac{y}{x}\right)\right) = \log(x) + \log\!\left(1 + \exp(\log(y) - \log(x))\right) = x' + \log\!\left(1 + \exp(y' - x')\right)$.

However, in many applications a multiplication of probabilities (giving the probability of all independent events occurring) is used more often than their addition (giving the probability of at least one of mutually exclusive events occurring). Additionally, the cost of computing the addition can be avoided in some situations by simply using the highest probability as an approximation. Since probabilities are non-negative this gives a lower bound. This approximation is used in reverse to get a continuous approximation of the max function.

Addition in log space


$\log(x + y) = x' + \log\left(1 + \exp(y' - x')\right)$

The formula above is more accurate than evaluating $\log\left(e^{x'} + e^{y'}\right)$ directly, provided one takes advantage of the asymmetry in the addition formula: $x'$ should be the larger (least negative) of the two operands. This also produces the correct behavior if one of the operands is floating-point negative infinity, which corresponds to a probability of zero.

$-\infty + \log\left(1 + \exp(y' - (-\infty))\right) = -\infty + \infty$. This quantity is indeterminate, and will result in NaN.
$x' + \log\left(1 + \exp(-\infty - x')\right) = x' + 0 = x'$. This is the desired answer.

The above formula alone will incorrectly produce an indeterminate result in the case where both arguments are $-\infty$. This should be checked for separately to return $-\infty$.

For numerical reasons, one should use a function that computes $\log(1 + x)$ (log1p) directly.
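As a concrete illustration of the recipe above, here is a minimal Python sketch (the helper name log_add is ours, not from the article) that orders the operands, uses log1p, and checks the case of two zero probabilities separately:

```python
import math

def log_add(x_log: float, y_log: float) -> float:
    """Return log(x + y) given x' = log(x) and y' = log(y)."""
    # Both probabilities are zero: return log(0) = -inf explicitly,
    # since the formula alone would produce -inf + inf = NaN.
    if x_log == float("-inf") and y_log == float("-inf"):
        return float("-inf")
    # Use the larger (least negative) operand as x' for accuracy.
    if x_log < y_log:
        x_log, y_log = y_log, x_log
    # x' + log(1 + exp(y' - x')), computed with log1p for accuracy.
    return x_log + math.log1p(math.exp(y_log - x_log))

# Example: log(0.5 + 0.25) equals log(0.75)
print(log_add(math.log(0.5), math.log(0.25)))  # ~ -0.2877
print(math.log(0.75))                          # ~ -0.2877
```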

See also


References

from Grokipedia
In probability theory and statistics, log probability is defined as the logarithm, typically the natural logarithm (base $e$), of a probability value, which lies between 0 and 1, resulting in a non-positive value that ranges from $-\infty$ to 0. This representation preserves the ordering of probabilities since the logarithm is a monotonically increasing function, allowing comparisons and optimizations to be performed equivalently on the original probabilities. The primary advantages of working with log probabilities stem from computational stability and algebraic simplicity. When computing joint probabilities as products of individual probabilities, repeated multiplications of small values (common in high-dimensional or large-sample scenarios) can lead to numerical underflow in floating-point arithmetic, where values below approximately $2.225 \times 10^{-308}$ become indistinguishable from zero in double-precision systems such as Python's. By contrast, the logarithm converts these products into sums via the identity $\log(P \times Q) = \log P + \log Q$, which avoids underflow and enables efficient gradient-based optimization in machine learning algorithms.

In statistics, log probabilities form the basis of the log-likelihood function, defined as the natural logarithm of the likelihood, which is the probability (or probability density) of observing a given sample under a parameterized statistical model. Maximizing the log-likelihood is equivalent to maximizing the likelihood itself due to the monotonicity of the logarithm, and it is routinely applied in maximum likelihood estimation to find optimal model parameters, such as the mean of a Gaussian distribution. For independent observations $x_1, \dots, x_n$, the log-likelihood simplifies to $\sum_{i=1}^n \log p(x_i \mid \theta)$, facilitating derivative computations for optimization.

Log probabilities also play a pivotal role in information theory, where the negative log probability of an event, $-\log p(x)$, quantifies its self-information or surprise, measuring the uncertainty reduction upon observing the event in bits (using the base-2 logarithm) or nats (using the natural logarithm). This concept, introduced by Claude Shannon, underpins entropy as the expected self-information over a distribution, providing a foundation for data compression, coding, and related information-theoretic calculations. For instance, a fair coin flip (probability 0.5) yields 1 bit of self-information, while a rarer event like rolling a specific face on a fair die (probability $1/6$) yields approximately 2.585 bits.

In machine learning and probabilistic modeling, log probabilities are essential for training generative models, such as naive Bayes classifiers or neural language models, where they enable the summation of log-probabilities for sequence likelihoods and regularization via techniques like entropy minimization. Their use extends to Bayesian inference, variational methods, and reinforcement learning, where they support scalable approximations of posterior distributions and policy optimization.

Definition and Fundamentals

Formal Definition

In probability theory and statistics, the log probability of an event is defined as the logarithm of its associated probability $p$, where $0 < p \leq 1$. This transformation is expressed as $\log p$, with the natural logarithm $\ln p$ serving as the standard choice in statistical analysis and computational contexts due to its mathematical properties and prevalence in likelihood functions. Although base-10 logarithms $\log_{10} p$ can be used in some applications, the natural base $e$ (approximately 2.718) is preferred for its alignment with exponential functions and information-theoretic measures. Common notation for log probability includes $\log P(X)$, where $X$ denotes the event or random variable, or the shorthand $\ell(X)$ to explicitly signify the logarithmic scale. The function is undefined at $p = 0$, as the logarithm approaches negative infinity, and for $p \in (0, 1]$ the values range from $-\infty$ to 0, reflecting the non-positive nature of log probabilities on this interval. For example, consider a fair coin flip where the probability of heads is $P(\text{heads}) = 0.5$. The log probability is then $\ln(0.5) \approx -0.693$, illustrating how the transformation yields a negative value that scales with the rarity of the event.

Relation to Natural Logarithm

In probability and statistics, the natural logarithm (base $e$) is the conventional base for log probabilities due to its favorable properties in calculus, particularly when dealing with probability densities and maximum likelihood estimation. The derivative of the natural logarithm of a probability $p$ with respect to a parameter is simply $1/p$ times the derivative of $p$, which streamlines the computation of score functions and gradients without extraneous constants. This simplification is especially useful in deriving estimators for distributions involving exponentials, as seen in exponential families. In contrast, using a logarithm with base $b \neq e$ introduces a scaling factor of $1/\ln(b)$ in the derivative, complicating analytical work. The change-of-base formula further underscores this preference: for any base $b$, $\log_b(p) = \ln(p) / \ln(b)$, meaning computations in the natural base eliminate the need for such normalization constants in theoretical derivations. While base-2 logarithms are standard in information theory to quantify information in bits, as introduced by Claude Shannon in 1948, the natural base aligns more naturally with the exponential forms prevalent in probabilistic modeling and avoids unit-specific scaling.

The practice of using natural logarithms for log probabilities emerged in early 20th-century statistics, particularly with Ronald A. Fisher's 1922 formalization of maximum likelihood estimation, where log-likelihoods proved essential for managing products of joint probabilities from independent observations. This approach addressed numerical underflow issues in likelihood computations and facilitated asymptotic theory. Alfred Rényi extended related ideas in the 1960s through his axiomatic development of generalized entropy measures, which rely on logarithmic transformations of probabilities to capture uncertainty in a unified framework. In modern computational implementations, libraries like NumPy default to the natural logarithm via the np.log function for log probability operations, ensuring compatibility with statistical algorithms and preventing underflow in high-dimensional products. This convention promotes numerical stability, as log probabilities transform multiplications into additions, a property preserved regardless of base but best suited to derivative-based optimizations when the natural log is used.
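A brief sketch of the change-of-base relation in NumPy, whose np.log is the natural logarithm (the probability value is illustrative):

```python
import numpy as np

p = 0.125
nats = np.log(p)    # natural-log probability, about -2.079
bits = np.log2(p)   # base-2 log probability, exactly -3.0

# Change of base: log2(p) = ln(p) / ln(2)
print(np.isclose(bits, nats / np.log(2)))  # True
```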

Key Properties

Monotonicity and Inequalities

The logarithm function is strictly increasing on the positive real numbers, which implies that for probabilities $0 < p_1, p_2 \leq 1$, $\log p_1 > \log p_2$ if and only if $p_1 > p_2$. This monotonicity preserves the ordering of probabilities when working in log space, ensuring that relative comparisons remain unchanged under the transformation.

The natural logarithm $\log p$ is a concave function for $p > 0$. By Jensen's inequality applied to this concavity, for a random variable $P$ taking values in $(0, 1]$ with expectation $\mathbb{E}[P]$, it holds that $\log(\mathbb{E}[P]) \geq \mathbb{E}[\log P]$, with equality if and only if $P$ is almost surely constant. This inequality underscores the subadditivity of the log transform in expectation, which is fundamental in analyses of probabilistic averages and information measures.

A useful bound arising from monotonicity is that for $0 < p, q \leq 1$, $\log(p + q) \leq \log 2 + \max(\log p, \log q)$. Without loss of generality, assume $p \geq q$; then $p + q \leq 2p$, so $\log(p + q) \leq \log(2p) = \log 2 + \log p$, as the logarithm is increasing. This provides an upper bound on the log of a sum of probabilities, keeping intermediate values bounded in computations involving many small terms.

In Bayesian updating, the monotonicity of the logarithm ensures that the posterior log-odds increase with the strength of supporting evidence, as the update adds the log-likelihood ratio to the prior log-odds. For instance, stronger evidence corresponds to a higher likelihood ratio, which monotonically boosts the posterior belief in the hypothesis.
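A quick numerical check of the Jensen inequality above, using an illustrative random variable $P$ that takes four probability values with equal weight:

```python
import numpy as np

P = np.array([0.9, 0.5, 0.1, 0.02])   # illustrative values of P

lhs = np.log(P.mean())   # log(E[P])
rhs = np.log(P).mean()   # E[log P]

print(lhs, rhs)          # lhs ~ -0.97 is larger (less negative) than rhs ~ -1.75
print(lhs >= rhs)        # True: log(E[P]) >= E[log P] for the concave log
```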

Log of Products and Sums

In probability theory, the product rule for independent events transforms under the logarithm into a simple addition. For two independent events $A$ and $B$, the joint probability is $P(A \cap B) = P(A) \cdot P(B)$, so the log probability becomes $\log P(A \cap B) = \log P(A) + \log P(B)$. This property, derived from the logarithm's additive nature over multiplication, facilitates the handling of multiplicative probability structures by converting them to summations, which are computationally more stable and interpretable in many analyses.

This additive property extends naturally to the chain rule of probability, which decomposes joint distributions into products of conditionals. For a sequence of random variables $X_1, \dots, X_n$, the joint probability is $P(X_1, \dots, X_n) = \prod_{i=1}^n P(X_i \mid X_1, \dots, X_{i-1})$, and taking the logarithm yields $\log P(X_1, \dots, X_n) = \sum_{i=1}^n \log P(X_i \mid X_1, \dots, X_{i-1})$. This representation is fundamental in probabilistic modeling, as it allows the log-joint probability to be expressed as a sum of local log-conditional probabilities, enabling efficient inference in graphical models and sequential processes.

In contrast, the sum rule for probabilities presents a challenge in log space. For mutually exclusive events $A$ and $B$, $P(A \cup B) = P(A) + P(B)$, so $\log P(A \cup B) = \log\left(P(A) + P(B)\right)$, which does not simplify to $\log P(A) + \log P(B)$. This lack of direct additivity requires careful handling in log space, typically via the log-sum-exp construction described in the following section.

A key application of the product-to-sum transformation arises in maximum likelihood estimation (MLE), where the likelihood for independent and identically distributed (i.i.d.) observations $x_1, \dots, x_n$ is $L(\theta) = \prod_{i=1}^n p(x_i \mid \theta)$. The log-likelihood then simplifies to $\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log p(x_i \mid \theta)$, turning the optimization problem into minimizing the negative sum of log probabilities, which is more numerically robust and aligns with gradient-based methods.
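A short sketch of the MLE product-to-sum identity for an i.i.d. Gaussian sample, evaluated with SciPy (the sample and candidate parameters are illustrative):

```python
import numpy as np
from scipy.stats import norm

x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])   # illustrative i.i.d. sample
mu, sigma = 1.0, 0.5                       # candidate parameters

# log L(theta) = sum_i log p(x_i | theta): sum of per-observation log densities
log_likelihood = norm.logpdf(x, loc=mu, scale=sigma).sum()

# Equivalent, but underflow-prone for large samples: log of the product of densities
log_of_product = np.log(np.prod(norm.pdf(x, loc=mu, scale=sigma)))

print(log_likelihood, log_of_product)  # agree for this small sample
```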

Operations in Log Space

Logarithmic Addition

Logarithmic addition refers to the computation of the logarithm of a sum of probabilities, which is essential when working in log space to represent unions of mutually exclusive events or marginal probabilities. For two mutually exclusive events $A$ and $B$, the probability of their union is $P(A \lor B) = P(A) + P(B)$, so the log probability is given by $\log P(A \lor B) = \log\left(\exp(\log P(A)) + \exp(\log P(B))\right)$. This form, known as the log-sum-exp function, arises naturally in probabilistic models where direct summation in probability space can lead to numerical issues, but it requires careful implementation in log space to maintain stability.

For a general sum over multiple log probabilities $\log p_i$, the expression becomes $\log\left(\sum_i \exp(\log p_i)\right)$. To prevent overflow or underflow during exponentiation, especially when the $\log p_i$ values vary widely, a normalization technique factors out the maximum value $M = \max_i(\log p_i)$, yielding $\log\left(\sum_i \exp(\log p_i)\right) = M + \log\left(\sum_i \exp(\log p_i - M)\right)$. This log-sum-exp trick ensures numerical stability by keeping all terms in the inner sum between 0 and 1 after the subtraction, avoiding extreme exponents.

In the pairwise case, the formula simplifies further: for terms $a$ and $b$ with $a \geq b$, $\log(\exp(a) + \exp(b)) = a + \log(1 + \exp(b - a))$. This avoids computing the potentially large $\exp(a)$ directly and leverages the fact that $\exp(b - a) \leq 1$, making it computationally efficient and stable.

A practical application of logarithmic addition occurs in the forward algorithm for hidden Markov models (HMMs), where state probabilities at time $t$ are computed by summing over transitions from previous states in log space: $\log \alpha_t(k) = \operatorname{log\text{-}sum\text{-}exp}\left(\{\log \alpha_{t-1}(j) + \log a_{jk} + \log b_k(o_t) \mid j = 1, \dots, K\}\right)$. This approach prevents underflow in long sequences, enabling reliable likelihood computation.
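A minimal sketch of the max-shift form of log-sum-exp described above, applied to illustrative log probabilities far below the underflow threshold:

```python
import numpy as np

def log_sum_exp(log_ps):
    """Stably compute log(sum_i exp(log_ps[i]))."""
    log_ps = np.asarray(log_ps, dtype=float)
    m = np.max(log_ps)                      # shift by the maximum term
    if np.isinf(m):                         # all terms are log(0) = -inf
        return m
    return m + np.log(np.sum(np.exp(log_ps - m)))

# Three mutually exclusive events with extremely small probabilities
log_ps = [-1000.0, -1001.0, -1002.0]
print(log_sum_exp(log_ps))  # about -999.59; naive exp() would underflow to 0
```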

Handling Divisions and Ratios

In log space, divisions and ratios of probabilities are computed via subtraction, leveraging the property that the logarithm of a quotient equals the difference of the logarithms. Specifically, for events $A$ and $B$ with $P(B) > 0$, $\log \frac{P(A)}{P(B)} = \log P(A) - \log P(B)$. This operation simplifies the calculation of probability ratios, such as odds ratios, which compare the likelihood of an event occurring versus not occurring, avoiding direct division of potentially small probabilities that could introduce numerical issues.

For conditional probabilities, the logarithm follows directly from the definition $P(A \mid B) = \frac{P(A, B)}{P(B)}$, yielding $\log P(A \mid B) = \log P(A, B) - \log P(B)$. Here, the joint probability $P(A, B)$ is often obtained in log space from the sum of log probabilities under independence assumptions, such as $\log P(A, B) = \log P(A) + \log P(B)$ if $A$ and $B$ are independent. This subtraction enables efficient computation of conditionals in probabilistic models without explicit division.

The log-odds, defined for a binary event as $\log \frac{P}{1 - P}$ where $P$ is the probability of the event, transforms probabilities into an additive scale useful for binary outcomes. Additions in log-odds space correspond to multiplications in odds space, facilitating updates in models with binary decisions. In Bayesian inference, this manifests as $\log \frac{P(\theta \mid D)}{P(\theta_0 \mid D)} = \log \frac{P(D \mid \theta)}{P(D \mid \theta_0)} + \log \frac{P(\theta)}{P(\theta_0)}$, where the posterior log-odds equal the prior log-odds plus the log-likelihood ratio. This form streamlines Bayesian updating by converting multiplicative updates to additions.

In logistic regression, model parameters directly represent changes in log-odds; for inputs $x$ and weights $w$, the log-odds of the positive class is $w^T x$, such that $P(y = 1 \mid x, w) = \sigma(w^T x)$ where $\sigma$ is the logistic (sigmoid) function. Each feature's weight indicates the log-odds shift per unit change, enabling interpretable probabilistic predictions for binary classification tasks.
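A small sketch tying log-odds to the sigmoid in a logistic-regression-style prediction (the weight and feature vectors are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -1.2, 0.3])   # illustrative logistic-regression weights
x = np.array([1.0, 0.5, 2.0])    # illustrative feature vector

log_odds = w @ x                 # w^T x is the log-odds of the positive class
p = sigmoid(log_odds)            # P(y = 1 | x, w)

# Recover the log-odds from the probability: log(p / (1 - p))
print(log_odds, np.log(p / (1 - p)))  # identical up to rounding
```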

Practical Applications

In Probabilistic Modeling

In probabilistic modeling, log probabilities play a central role in statistical inference, particularly through the log-likelihood function, which was introduced by Ronald A. Fisher in 1922 to facilitate parameter estimation. The log-likelihood $\ell(\theta)$ for a set of independent observations $\{x_i\}_{i=1}^n$ given parameters $\theta$ is defined as $\ell(\theta) = \sum_{i=1}^n \log P(x_i \mid \theta)$, transforming the product of probabilities into a sum that is easier to maximize. This formulation underpins maximum likelihood estimation, where $\hat{\theta} = \arg\max_\theta \ell(\theta)$, enabling efficient optimization on large datasets by leveraging the additivity of logarithms.

A key application arises in exponential family distributions, where the log probability takes a linear form in the natural parameter space, simplifying inference and computation. Specifically, for a random variable $x$ with natural parameter $\theta$, the log probability is given by $\log P(x \mid \theta) = \theta \cdot T(x) - \log Z(\theta) + A(x)$, where $T(x)$ is the sufficient statistic, $Z(\theta)$ is the normalizing constant (or partition function), and $A(x)$ is a base measure; this structure highlights the linearity in log space, which facilitates conjugate priors and moment calculations in Bayesian settings. This representation was formalized in the foundational work on sufficient statistics by Pitman in the 1930s, establishing exponential families as a cornerstone of tractable probabilistic models.

In graphical models, log probabilities are essential for belief propagation algorithms, such as the junction tree method, which represents joint distributions via cliques and separators to perform exact inference. The junction tree algorithm, developed by Lauritzen and Spiegelhalter in 1988, can be implemented using log potentials (logarithms of the clique and separator functions) to mitigate numerical underflow during message passing, ensuring stable computation of marginal posteriors in multiply connected Bayesian networks. This approach transforms multiplicative updates into additive ones, preserving the probabilistic structure while enhancing computational reliability.

Variational inference further relies on log probabilities to approximate intractable posteriors through the evidence lower bound (ELBO), a tractable objective that lower-bounds the marginal log-likelihood. The ELBO is expressed as $\mathcal{L}(q) = \mathbb{E}_q[\log P(x, z)] - \mathbb{E}_q[\log q(z)]$, where $q(z)$ is a variational distribution over latent variables $z$, and maximizing it yields an approximation to the true posterior $P(z \mid x)$. This framework, introduced by Jordan et al. in 1999 for graphical models, enables scalable inference by optimizing in log space, avoiding direct computation of normalizing constants. For joint distributions, the log probability of the joint can be expressed as the sum of marginal and conditional log probabilities, in line with the product rules from earlier sections.
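To make the exponential-family form concrete, here is a worked sketch for the Bernoulli distribution, whose natural parameter is the log-odds $\theta = \log\frac{p}{1-p}$, sufficient statistic $T(x) = x$, and log-partition function $\log Z(\theta) = \log(1 + e^\theta)$; this is our own illustrative example, not one given in the text:

```python
import numpy as np

def bernoulli_logpmf(x, theta):
    """log P(x | theta) = theta * x - log(1 + exp(theta)), for x in {0, 1}."""
    # np.logaddexp(0, theta) computes log(1 + e^theta) stably.
    return theta * x - np.logaddexp(0.0, theta)

p = 0.8
theta = np.log(p / (1 - p))                        # natural parameter (log-odds)

print(bernoulli_logpmf(1, theta), np.log(p))       # both ~ -0.2231
print(bernoulli_logpmf(0, theta), np.log(1 - p))   # both ~ -1.6094
```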

In Machine Learning Algorithms

The use of log probabilities in machine learning algorithms has surged since the early 2010s, coinciding with the rise of deep learning, where they enable stable computation of probabilities in high-dimensional spaces. For instance, AlexNet, a seminal convolutional neural network from 2012, maximized the average log-probability of the correct label across training cases using a multinomial logistic regression (softmax) objective, which helped mitigate numerical instability during training on large datasets like ImageNet. This approach became foundational for subsequent models, allowing efficient handling of softmax outputs without underflow in probability values close to zero.

In gradient-based optimization, log probabilities simplify the computation of derivatives for maximum likelihood estimation. The gradient of the log-likelihood with respect to model parameters $\theta$ is given by the score function $\frac{\partial}{\partial \theta} \log P(x \mid \theta) = \frac{1}{P(x \mid \theta)} \frac{\partial}{\partial \theta} P(x \mid \theta)$, which avoids direct manipulation of small probability values and reduces variance in stochastic gradient estimates. This log-derivative trick is particularly valuable in scenarios where probabilities are tiny, as it transforms products into sums and stabilizes training dynamics. During backpropagation in neural networks, log probabilities are integral to the cross-entropy loss, formulated as $-\sum_i y_i \log p_i$, where $y$ is the one-hot target and $p$ the predicted softmax probabilities; this pairing with log-softmax outputs ensures numerical robustness and efficient gradient flow through layers.

In reinforcement learning, policy gradient methods leverage log probabilities to update policies via advantage-weighted importance. The REINFORCE algorithm computes updates proportional to the gradient $\nabla_\theta \log \pi(a \mid s) \cdot A$, where $\pi(a \mid s)$ is the policy probability and $A$ the advantage, enabling direct optimization of expected rewards without explicit value functions.

A prominent application appears in large language models like GPT-3, where training maximizes the sum of log probabilities for next-token prediction, $\sum_t \log P(w_t \mid w_{<t})$, using log-softmax over a vast vocabulary to handle autoregressive generation efficiently. This formulation supports scalable training on massive corpora, yielding models capable of coherent sequence generation.
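A minimal NumPy sketch of the cross-entropy loss computed from numerically stable log-softmax outputs (the logits and target class are illustrative):

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax: logits minus their log-sum-exp."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

logits = np.array([2.0, 0.5, -1.0])   # illustrative network outputs
target = 0                            # index of the true class

log_probs = log_softmax(logits)
cross_entropy = -log_probs[target]    # -sum_i y_i log p_i with a one-hot y

print(log_probs, cross_entropy)
```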

Numerical Implementation

Avoiding Overflow and Underflow

In probabilistic computations, direct evaluation of joint probabilities often involves products of individual probabilities, each typically less than 1, such as $\prod_{i=1}^n p_i$ for $n$ events. For moderate to large $n$, these products can rapidly approach zero, leading to underflow where the result is indistinguishable from zero in floating-point arithmetic, thus causing loss of precision or erroneous computations. Working in log space mitigates this by transforming the product into a sum of logarithms, $\sum_{i=1}^n \log p_i$, where each $\log p_i \leq 0$ remains finite and negative, preserving relative magnitudes without underflow.

Although exponentiation of log probabilities, $\exp(\log p)$, could theoretically overflow for large positive $\log p$, probabilities satisfy $p \leq 1$ and thus $\log p \leq 0$, making overflow impossible; instead, severe underflow in the exponentiated result is the dominant concern when $\log p$ is large and negative. The primary strategy to address these issues is to perform all intermediate calculations entirely in log space, only exponentiating at the final step if a probability scale is required, which maintains numerical precision throughout the process.

A representative example occurs in the naive Bayes classifier, where the score for a class involves multiplying class-conditional probabilities across a long feature vector; direct computation underflows for high-dimensional data, but summing the logs of these probabilities avoids this while enabling reliable argmax decisions. Under the IEEE 754 floating-point standard, $\log(0)$ is defined to yield negative infinity, which appropriately represents impossible events without introducing NaN values, though implementations must ensure non-negative arguments to prevent undefined behavior like $\log(\text{negative})$.
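A sketch of the naive Bayes argmax carried out entirely in log space (the priors, per-feature likelihoods, and feature vector are illustrative):

```python
import numpy as np

# Illustrative per-feature likelihoods P(feature_j = 1 | class) for two classes
likelihoods = np.array([
    [0.9, 0.2, 0.7, 0.1],   # class 0
    [0.3, 0.8, 0.4, 0.6],   # class 1
])
priors = np.array([0.6, 0.4])
x = np.array([1, 0, 1, 1])   # observed binary feature vector

# P(feature_j = x_j | class): use p when x_j = 1, and (1 - p) when x_j = 0
feature_probs = np.where(x == 1, likelihoods, 1.0 - likelihoods)

# Log-space score per class: log prior + sum of log feature probabilities
log_scores = np.log(priors) + np.log(feature_probs).sum(axis=1)

print(log_scores, np.argmax(log_scores))  # class with the highest log score
```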

Log-Sum-Exp Computation

The log-sum-exp (LSE) function is defined as $\mathrm{LSE}(x_1, \dots, x_n) = \log\left(\sum_{i=1}^n \exp(x_i)\right)$, where the $x_i$ are real numbers representing log-probabilities or log-likelihoods. Direct computation of this expression risks numerical overflow when the $x_i$ are large and positive, or severe underflow when they are large and negative, leading to inaccurate results in floating-point arithmetic. To ensure stability, the function is reformulated by shifting all terms by their maximum value: let $m = \max(x_1, \dots, x_n)$, then $\mathrm{LSE}(x_1, \dots, x_n) = m + \log\left(\sum_{i=1}^n \exp(x_i - m)\right)$. This adjustment prevents overflow in the exponentials, as $\exp(x_i - m) \leq 1$ for all $i$, while the final logarithm handles the scaling accurately.

In vectorized implementations, such as the logsumexp function in SciPy, the LSE is computed efficiently over arrays or along specified axes, incorporating the max-shift internally to maintain numerical stability for high-dimensional inputs common in scientific computing. This allows seamless handling of vector or matrix arguments without explicit looping, optimizing performance on modern hardware while bounding errors comparably to the scalar case.

For sequential or incremental addition of terms, an iterative variant maintains a running LSE by incorporating each new $x_k$ via pairwise application: starting with $s_1 = x_1$, update $s_k = \mathrm{LSE}(s_{k-1}, x_k)$ using the stable shift at each step. This approach is particularly useful in streaming computations or online algorithms where terms arrive one at a time, preserving stability without recomputing the full sum.

Regarding numerical accuracy, the naive direct evaluation incurs relative errors on the order of $\mathcal{O}(\exp(-|x_i - m|))$ for terms far from the maximum, potentially losing all precision due to underflow. In contrast, the shifted LSE reduces the backward error to at most $u(1 + n\kappa)$, where $u$ is the unit roundoff and $\kappa$ is the condition number of the summation, achieving accuracy close to machine precision for well-conditioned inputs.

A practical example arises in the expectation-maximization (EM) algorithm for mixture models, where the E-step computes log posterior responsibilities as normalized log-likelihoods: for data point $x$ and components $j$, the responsibility $\gamma_j = \frac{\pi_j f_j(x)}{\sum_k \pi_k f_k(x)}$ is obtained in log space via $\log \gamma_j = \log \pi_j + \log f_j(x) - \mathrm{LSE}\left(\{\log \pi_k + \log f_k(x)\}_{k=1}^K\right)$, using LSE to stably evaluate the normalizing log-partition function.
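A short sketch of the E-step responsibility computation in log space using scipy.special.logsumexp (the mixture parameters and data point are illustrative):

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

# Illustrative two-component Gaussian mixture and one data point
weights = np.array([0.3, 0.7])   # mixing proportions pi_k
means = np.array([-2.0, 1.5])
stds = np.array([1.0, 0.5])
x = 1.0

# log(pi_k) + log f_k(x) for each component
log_joint = np.log(weights) + norm.logpdf(x, loc=means, scale=stds)

# log gamma_k = log joint - log-sum-exp over components
log_resp = log_joint - logsumexp(log_joint)

print(np.exp(log_resp), np.exp(log_resp).sum())  # responsibilities sum to 1
```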
