Logit
from Wikipedia
Plot of logit(x) in the domain of 0 to 1, where the base of the logarithm is e.

In statistics, the logit (/ˈloʊdʒɪt/ LOH-jit) function is the quantile function associated with the standard logistic distribution. It has many uses in data analysis and machine learning, especially in data transformations.

Mathematically, the logit is the inverse of the standard logistic function $\sigma(x) = 1/(1 + e^{-x})$, so the logit is defined as

$$\operatorname{logit}(p) = \sigma^{-1}(p) = \ln\frac{p}{1-p} \quad \text{for } p \in (0, 1).$$

Because of this, the logit is also called the log-odds, since it is equal to the logarithm of the odds $\frac{p}{1-p}$, where p is a probability. Thus, the logit is a type of function that maps probability values from $(0, 1)$ to real numbers in $(-\infty, +\infty)$,[1] akin to the probit function.

Definition

If p is a probability, then p/(1 − p) is the corresponding odds; the logit of the probability is the logarithm of the odds, i.e.:

$$\operatorname{logit}(p) = \ln\frac{p}{1-p} = \ln(p) - \ln(1-p) = -\ln\left(\frac{1}{p} - 1\right).$$

The base of the logarithm function used is of little importance in the present article, as long as it is greater than 1, but the natural logarithm with base e is the one most often used. The choice of base corresponds to the choice of logarithmic unit for the value: base 2 corresponds to a shannon, base e to a nat, and base 10 to a hartley; these units are particularly used in information-theoretic interpretations. For each choice of base, the logit function takes values between negative and positive infinity.

The “logistic” function of any number $\alpha$ is given by the inverse-logit:

$$\operatorname{logit}^{-1}(\alpha) = \operatorname{logistic}(\alpha) = \frac{1}{1 + e^{-\alpha}} = \frac{e^{\alpha}}{e^{\alpha} + 1}.$$

The difference between the logits of two probabilities is the logarithm of the odds ratio R, thus providing a shorthand for writing the correct combination of odds ratios by mere addition and subtraction:

$$\ln R = \ln\frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \operatorname{logit}(p_1) - \operatorname{logit}(p_2).$$

The logit function can be written as $\operatorname{logit}(p) = 2\operatorname{artanh}(2p - 1)$, so its Taylor series about $p = \tfrac{1}{2}$ is given by:

$$\operatorname{logit}(p) = 2\sum_{k=0}^{\infty} \frac{(2p-1)^{2k+1}}{2k+1}, \qquad 0 < p < 1.$$
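These identities are easy to check numerically. The following is a minimal sketch (assuming NumPy and SciPy are available; logit and its inverse expit come from scipy.special):

```python
import numpy as np
from scipy.special import logit, expit  # expit is the inverse logit (logistic sigmoid)

p = np.array([0.1, 0.5, 0.75, 0.9])

# logit maps (0, 1) onto the real line; expit maps it back.
z = logit(p)                      # [-2.197,  0.   ,  1.099,  2.197]
assert np.allclose(expit(z), p)   # round trip recovers the probabilities

# The difference of two logits is the log of the odds ratio.
p1, p2 = 0.75, 0.5
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
assert np.isclose(logit(p1) - logit(p2), np.log(odds_ratio))

# A truncated series 2 * sum((2p-1)^(2k+1) / (2k+1)) reproduces logit(p).
def logit_series(p, terms=50):
    x = 2 * p - 1
    k = np.arange(terms)
    return 2 * np.sum(x ** (2 * k + 1) / (2 * k + 1))

print(logit_series(0.75), logit(0.75))  # both approximately 1.0986 (= ln 3)
```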

History

Several approaches have been explored to adapt linear regression methods to a domain where the output is a probability value in $(0, 1)$, instead of any real number in $(-\infty, +\infty)$. In many cases, such efforts have focused on modeling this problem by mapping the range $(0, 1)$ to $(-\infty, +\infty)$ and then running the linear regression on these transformed values.[2]

In 1934, Chester Ittner Bliss used the cumulative normal distribution function to perform this mapping and called his model probit, an abbreviation for "probability unit". This is, however, computationally more expensive.[2]

In 1944, Joseph Berkson used the log of odds and called this function the logit, an abbreviation for "logistic unit", following the analogy with the probit:

"I use this term [logit] for following Bliss, who called the analogous function which is linear on for the normal curve 'probit'."

— Joseph Berkson (1944)[3]

Log odds was used extensively by Charles Sanders Peirce (late 19th century).[4] G. A. Barnard in 1949 coined the commonly used term log-odds;[5][6] the log-odds of an event is the logit of the probability of the event.[7] Barnard also coined the term lods as an abstract form of "log-odds",[8] but suggested that "in practice the term 'odds' should normally be used, since this is more familiar in everyday life".[9]

Uses and properties

Comparison with probit

Comparison of the logit function with a scaled probit (i.e. the inverse CDF of the normal distribution), comparing $\operatorname{logit}(x)$ vs. $\Phi^{-1}(x)/\sqrt{\pi/8}$, which makes the slopes the same at the y-origin.

Closely related to the logit function (and logit model) are the probit function and probit model. The logit and probit are both quantile functions defined on probabilities between 0 and 1 – i.e., inverses of the cumulative distribution function (CDF) of a probability distribution. In fact, the logit is the quantile function of the logistic distribution, while the probit is the quantile function of the standard normal distribution. The probit function is denoted $\operatorname{probit}(p) = \Phi^{-1}(p)$, where $\Phi$ is the CDF of the standard normal distribution.

As shown in the graph on the right, the logit and probit functions are extremely similar when the probit function is scaled, so that its slope at y = 0 matches the slope of the logit. As a result, probit models are sometimes used in place of logit models because for certain applications (e.g., in item response theory) the implementation is easier.[14]
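The similarity is straightforward to check numerically. Below is a minimal sketch (assuming NumPy and SciPy are available; norm.ppf is the probit, i.e. the inverse CDF of the standard normal):

```python
import numpy as np
from scipy.special import logit
from scipy.stats import norm

p = np.linspace(0.01, 0.99, 99)
probit = norm.ppf(p)                          # inverse CDF of the standard normal
scaled_probit = probit / np.sqrt(np.pi / 8)   # rescaled so its slope at p = 0.5 equals 4,
                                              # matching the slope of the logit there

# Largest gap between the two curves over (0.01, 0.99): the curves agree closely
# near p = 0.5 and diverge only in the tails.
print(np.max(np.abs(logit(p) - scaled_probit)))
```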

from Grokipedia
The logit function, also known as the log-odds function, is a mathematical transformation in statistics defined as $\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$, where $p$ is a probability satisfying $0 < p < 1$. This function maps the open interval $(0, 1)$ onto the entire real line $(-\infty, \infty)$, providing an unbounded scale suitable for linear modeling of probabilities. The logit arises naturally in the context of logistic regression, a generalized linear model used to predict binary outcomes by modeling the log-odds of success as a linear combination of predictor variables. In this framework, the inverse logit, known as the logistic or sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$, converts the linear predictor back to a probability between 0 and 1, yielding an S-shaped cumulative distribution curve that accommodates the bounded nature of probabilities. Logistic regression, which relies on the logit link, is one of the most widely applied techniques in applied statistics for analyzing dichotomous data, such as pass/fail outcomes or presence/absence events. Beyond binary classification, the logit function extends to multinomial and ordinal logistic models, enabling the analysis of categorical responses with more than two levels, as seen in choice modeling and ranking data. In machine learning, logistic regression serves as a foundational algorithm for supervised classification tasks, including spam detection, medical diagnosis, and fraud detection, and forms the basis for more complex models like neural networks. Its robustness stems from the global concavity of the log-likelihood function under the logit assumption, ensuring reliable maximum likelihood estimation even with moderate sample sizes. Applications span diverse fields, including epidemiology for risk factor analysis, social sciences for survey response modeling, and economics for discrete choice experiments.

Definition and Formulation

Definition

The logit, also known as the log-odds, is a statistical transformation that converts a probability $p$ (where $0 < p < 1$) into a continuous value on the real number line by taking the natural logarithm of the odds. The odds represent the ratio of the probability of an event occurring ($p$) to the probability of it not occurring ($1 - p$), providing a measure of relative likelihood that avoids the bounded nature of probabilities. In its binary form, the logit applies to scenarios with two possible outcomes, such as success or failure, where $p$ denotes the probability of success. This transformation is particularly useful for modeling binary events in statistics, as it linearizes the relationship between probabilities and explanatory variables. For cases with more than two outcomes, the logit extends to a multinomial framework, where odds are computed relative to a baseline category to handle multiple alternatives. To illustrate, consider a probability $p = 0.5$, which yields odds of 1 and a logit value of 0, indicating balanced likelihood. For $p = 0.75$, the odds are 3, resulting in a logit of approximately 1.099, reflecting a stronger tilt toward the event occurring. The inverse of this transformation is the logistic function, which recovers the original probability from the logit scale.
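A few lines of Python (a minimal sketch using only the standard library) reproduce these worked values:

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

def logit(p):
    """Natural-log odds, defined for 0 < p < 1."""
    return math.log(odds(p))

print(odds(0.5), logit(0.5))    # 1.0, 0.0        (balanced likelihood)
print(odds(0.75), logit(0.75))  # 3.0, 1.0986...  (odds of 3 favour the event)
```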

Mathematical Formulation

The logit function, often denoted $\operatorname{logit}(p)$ or $\eta$, transforms a probability $p$ into a real-valued quantity and is formally defined as $\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$, where $0 < p < 1$. This formulation maps the open interval $(0, 1)$ to the entire real line $(-\infty, \infty)$. The inverse of the logit function is the logistic function, commonly referred to as the sigmoid function $\sigma(x)$, given by $\sigma(x) = \frac{1}{1 + e^{-x}}$. Thus, if $x = \operatorname{logit}(p)$, then $p = \sigma(x)$. As $p$ approaches 0 from above, $\operatorname{logit}(p)$ approaches $-\infty$, and as $p$ approaches 1 from below, $\operatorname{logit}(p)$ approaches $+\infty$. These boundary behaviors ensure the function's utility in unbounded linear models. For extensions to multiple categories, the multinomial logit model generalizes the binary case. Consider $K$ categories labeled 1 to $K$, with category $K$ as the reference (where $\pi_K = 1 - \sum_{j=1}^{K-1} \pi_j$). The logit for category $j$ ($j = 1, \dots, K-1$) is defined as $\operatorname{logit}_j(\pi) = \ln\left(\frac{\pi_j}{\pi_K}\right)$. This yields $K-1$ logit values, each ranging over $(-\infty, \infty)$. The logit function derives from the odds, defined as the ratio of the probability of an event to its complement, $\operatorname{odds}(p) = p / (1-p)$; applying the natural logarithm yields $\operatorname{logit}(p) = \ln(\operatorname{odds}(p))$.
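The baseline-category construction and its inverse, the softmax, can be sketched in a few lines (NumPy assumed; the probability vector is made up for illustration):

```python
import numpy as np

pi = np.array([0.2, 0.5, 0.3])      # probabilities for K = 3 categories (sum to 1)
eta = np.log(pi[:-1] / pi[-1])      # K-1 baseline-category logits relative to category K

# Inverse mapping: append a zero logit for the reference category and apply softmax.
z = np.append(eta, 0.0)
recovered = np.exp(z) / np.exp(z).sum()

print(eta)        # [-0.405,  0.511]
print(recovered)  # [0.2, 0.5, 0.3] -- the original probabilities
```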

Properties and Interpretation

Key Properties

The logit function, defined as $\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$ for $0 < p < 1$, is strictly increasing in $p$, mapping the open interval $(0,1)$ onto the entire real line $(-\infty, \infty)$ while preserving the order of input probabilities. This monotonicity ensures that higher probabilities correspond to higher logit values, a property that facilitates its use in transforming bounded probabilities into an unbounded scale suitable for linear modeling. The function is continuous and infinitely differentiable (smooth) on $(0,1)$, with its first derivative given by $\frac{d}{dp} \operatorname{logit}(p) = \frac{1}{p(1-p)}$, which is always positive and achieves its minimum value of 4 at $p = 0.5$. This derivative is the reciprocal of the variance function of the binomial distribution in the context of generalized linear models, highlighting the function's analytical tractability. The smoothness allows for straightforward Taylor expansions and numerical optimizations involving the logit. A notable symmetry property is that $\operatorname{logit}(1-p) = -\operatorname{logit}(p)$, reflecting antisymmetry around the point $p = 0.5$, where $\operatorname{logit}(0.5) = 0$. This relation implies that deviations from 0.5 in probability correspond to equal-magnitude but opposite-signed deviations on the logit scale, providing a balanced transformation centered at the midpoint of the probability interval. Asymptotically, as $p \to 0^+$, $\operatorname{logit}(p) \to -\infty$, and as $p \to 1^-$, $\operatorname{logit}(p) \to +\infty$, resulting in unbounded tails that accommodate extreme probabilities without saturation. Near $p = 0.5$, the function admits a linear approximation, $\operatorname{logit}(p) \approx 4(p - 0.5)$, derived from the first-order Taylor expansion around the inflection point, which underscores its local linearity in the central region. In the framework of generalized linear models, the logit serves as the canonical link function for the binomial distribution, satisfying the condition that the linear predictor equals the natural parameter $\theta = \operatorname{logit}(\mu)$ of the exponential family form, where $\mu$ is the mean. This follows from the exponential-family relation $b'(\theta) = \mu$, and it ensures desirable statistical properties such as simplified maximum likelihood estimation.
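These properties can be verified numerically; the following minimal sketch (SciPy assumed) checks the derivative, the antisymmetry, and the linear approximation near $p = 0.5$:

```python
import numpy as np
from scipy.special import logit

p = np.linspace(0.05, 0.95, 19)

# The derivative 1 / (p (1 - p)) is positive everywhere and smallest (= 4) at p = 0.5.
deriv = 1.0 / (p * (1 - p))
assert np.isclose(deriv.min(), 4.0) and np.isclose(p[deriv.argmin()], 0.5)

# Antisymmetry about p = 0.5: logit(1 - p) = -logit(p).
assert np.allclose(logit(1 - p), -logit(p))

# Local linearity: logit(p) ~ 4 (p - 0.5) is accurate near the centre.
near_half = np.linspace(0.45, 0.55, 11)
print(np.max(np.abs(logit(near_half) - 4 * (near_half - 0.5))))  # ~7e-4
```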

Statistical Interpretation

In statistical modeling, the logit function serves as a canonical link function in generalized linear models (GLMs) for binary response data, transforming the probability $p$ of success in a binomial distribution to the real line via $\eta = \log\left(\frac{p}{1-p}\right)$, which allows the expected value to be expressed as a linear combination of predictors, such as $\eta = \beta_0 + \beta_1 x$. This transformation linearizes the inherently nonlinear relationship between predictors and probabilities, enabling the use of standard linear regression techniques on the logit scale while ensuring predicted probabilities remain bounded between 0 and 1. The coefficients in a logit model have a direct interpretation in terms of odds ratios: the exponentiated coefficient $e^{\beta_j}$ represents the multiplicative change in the odds of the outcome for a one-unit increase in the corresponding predictor $x_j$, holding other variables constant. For instance, if $\beta_1 = 0.693$, then $e^{0.693} \approx 2$, indicating that the odds double for each unit increase in $x_1$. This odds-based interpretation facilitates intuitive understanding of effect sizes in probabilistic terms, particularly in fields like epidemiology. The logit scale is centered at zero, where $\operatorname{logit}(0.5) = 0$ corresponds to even odds (probability of 0.5), with positive logit values indicating probabilities greater than 0.5 and negative values indicating probabilities less than 0.5. This symmetry around 0.5 provides a natural reference point for interpreting deviations from neutrality in model predictions. Additionally, for large sample sizes, the logit transformation of a binomial proportion is asymptotically normal, which helps stabilize the variance of estimates and justifies the use of normal-based inference procedures like Wald tests and confidence intervals. As an illustrative example, consider a simple logit model $\operatorname{logit}(p) = \beta_0 + \beta_1 x$ with $\beta_0 = 0$ (implying baseline probability 0.5 at $x = 0$) and $\beta_1 = 0.693$. At $x = 1$, the logit becomes 0.693, so $p = \frac{e^{0.693}}{1 + e^{0.693}} \approx 0.667$, meaning the probability increases to about 67% and the odds double from 1:1 to 2:1, demonstrating the model's capacity to quantify predictor impacts on probabilistic outcomes.
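A short sketch in plain Python (the intercept 0 and slope 0.693 are the illustrative values from the paragraph above) makes the odds-ratio interpretation concrete:

```python
import math

b0, b1 = 0.0, 0.693            # illustrative coefficients; 0.693 = ln(2)

def predicted_probability(x):
    eta = b0 + b1 * x          # linear predictor on the logit scale
    return 1 / (1 + math.exp(-eta))

for x in (0, 1, 2):
    p = predicted_probability(x)
    print(x, round(p, 3), round(p / (1 - p), 2))
# x = 0 -> p = 0.5,   odds 1:1
# x = 1 -> p ~ 0.667, odds ~ 2:1  (odds double per unit increase in x)
# x = 2 -> p = 0.8,   odds ~ 4:1
```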

Applications

In Logistic Regression

In binary logistic regression, the logit function serves as the link between the linear predictor and the probability of the outcome. The model is formulated as $P(Y=1 \mid X) = \sigma(\beta_0 + \beta^T X)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the inverse logit (sigmoid) function, $\beta_0$ is the intercept, $\beta$ is the vector of coefficients, and $X$ is the vector of predictors. This setup models the log-odds of the event as a linear combination of the predictors, ensuring predicted probabilities lie between 0 and 1. Parameters are estimated via maximum likelihood estimation (MLE), which maximizes the log-likelihood function $\ell(\beta) = \sum_{i=1}^n \left[ y_i \log(\sigma(\eta_i)) + (1 - y_i) \log(1 - \sigma(\eta_i)) \right]$, where $\eta_i = \beta_0 + \beta^T X_i$ and $y_i$ is the binary outcome for observation $i$. The MLE has no closed-form solution and is typically obtained iteratively using methods like Newton-Raphson or iteratively reweighted least squares, as sketched below. For multinomial logistic regression, the model extends to $K > 2$ categories by applying the logit to ratios against a reference category, yielding probabilities $P(Y=k \mid X) = \frac{\exp(\beta_{0k} + \beta_k^T X)}{\sum_{j=1}^K \exp(\beta_{0j} + \beta_j^T X)}$ for $k = 1, \dots, K$, which sum to 1 across categories. This formulation, known as the multinomial logit (MNL), assumes the independence of irrelevant alternatives (IIA) property. Key assumptions include independence of observations and linearity of the log-odds with respect to the predictors on the logit scale. Violations, such as multicollinearity or non-linearity, can lead to unstable estimates or biased predictions if unmodeled; multicollinearity inflates standard errors without biasing point estimates, while non-linearity in the logit can bias predictions. A common application is predicting a binary outcome such as disease status from predictors like age, where a positive coefficient indicates increased odds of the event per unit increase in the predictor, holding other factors constant.
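The iterative fitting can be sketched as follows (NumPy only; the data, random seed, and true coefficients are made up for illustration, and the loop implements a Newton-Raphson / IRLS update for the logistic log-likelihood):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one predictor, illustrative true intercept 0.0 and slope 0.7.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # design matrix with intercept
true_beta = np.array([0.0, 0.7])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))

beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))      # current fitted probabilities
    W = p * (1 - p)                      # IRLS weights (binomial variance function)
    score = X.T @ (y - p)                # gradient of the log-likelihood
    info = X.T @ (X * W[:, None])        # observed information (negative Hessian)
    step = np.linalg.solve(info, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

print(beta)   # maximum likelihood estimates, close to the true (0.0, 0.7)
```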

In Ordinal Logistic Regression

Ordinal logistic regression extends the logit model to ordered categorical outcomes, such as Likert scales or severity levels. It uses the cumulative logit link, modeling the log-odds of being at or below a given category as a linear function of the predictors. For $J$ ordered categories, there are $J-1$ cumulative probabilities, each with its own intercept but shared slopes: $\log\left(\frac{P(Y \leq j \mid X)}{P(Y > j \mid X)}\right) = \alpha_j - \beta^T X$ for $j = 1, \dots, J-1$, assuming proportional odds (the parallel lines assumption). This allows analysis of ranked data in many applied settings, e.g., modeling severity levels based on treatment effects; a small numeric sketch follows below. Violations of proportional odds can be addressed with partial proportional odds models or multinomial alternatives.
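The following minimal sketch (NumPy assumed; the cutpoints and slope are made-up illustrative values) shows how the $J-1$ cumulative logits translate into per-category probabilities:

```python
import numpy as np

alphas = np.array([-1.0, 0.5, 2.0])   # illustrative cutpoints for J = 4 ordered categories
beta = 0.8                            # illustrative shared slope for a single predictor x

def category_probabilities(x):
    cum = 1 / (1 + np.exp(-(alphas - beta * x)))   # P(Y <= j | x) for j = 1..J-1
    cum = np.append(cum, 1.0)                      # P(Y <= J) = 1
    return np.diff(np.insert(cum, 0, 0.0))         # successive differences = P(Y = j | x)

print(category_probabilities(0.0))   # four probabilities that sum to 1
print(category_probabilities(1.0))   # a larger x shifts mass toward higher categories
```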

In Discrete Choice Modeling

In discrete choice modeling, the logit model serves as the foundation for analyzing individual choices among mutually exclusive alternatives, rooted in the random utility maximization (RUM) framework. Under RUM, an individual chooses alternative $j$ if it provides the highest utility $U_j$, where utility decomposes into an observable deterministic component $V_j$ (capturing attributes of the alternative and individual characteristics) and an unobservable random error $\epsilon_j$, such that $U_j = V_j + \epsilon_j$. The probability of selecting $j$ over all other alternatives $k \neq j$ is then $P_j = P(U_j > U_k \ \forall \ k \neq j)$. When the error terms $\epsilon_j$ are independently and identically distributed according to a type I extreme value (Gumbel) distribution, this probability yields the multinomial logit (MNL) form, $P_j = \frac{\exp(V_j)}{\sum_k \exp(V_k)}$, enabling estimation via maximum likelihood. A central assumption of the MNL model is the independence of irrelevant alternatives (IIA) property, which posits that the relative probabilities of choosing between two alternatives are unaffected by the presence or attributes of other alternatives. This implies proportional substitution patterns, where adding or removing an irrelevant option scales probabilities uniformly across the remaining options, often leading to unrealistic predictions in correlated choice sets (e.g., the "red bus/blue bus" paradox, where bus modes are perfect substitutes but car is not). IIA arises directly from the independence of the Gumbel error terms in the RUM derivation and facilitates computational simplicity but restricts applicability in scenarios with shared unobserved factors. To address IIA's limitations, the nested logit model extends the MNL by grouping alternatives into nests with correlated errors within groups, allowing flexible substitution patterns across nests while maintaining IIA within them. In this generalized extreme value (GEV) framework, the probability incorporates a logsum (inclusive value) term for each nest, capturing intra-nest correlations; for instance, in transportation, nests might group similar transit modes together in one nest, separate from private modes such as car, reflecting shared unobserved costs. This model relaxes global IIA, improving fit for hierarchical choices, and is derived from utility maximization with appropriately structured error distributions. The logit-based models find widespread application in economics and the social sciences for predicting choices in transportation, marketing, and policy contexts. In transportation, MNL and nested logit estimate mode choice probabilities based on attributes like cost, time, and reliability; for example, the probability of selecting car over bus and train might be modeled as $P_{\text{car}} = \frac{\exp(-\beta_c \cdot \text{cost}_{\text{car}} - \beta_t \cdot \text{time}_{\text{car}})}{\exp(-\beta_c \cdot \text{cost}_{\text{car}} - \beta_t \cdot \text{time}_{\text{car}}) + \exp(-\beta_c \cdot \text{cost}_{\text{bus}} - \beta_t \cdot \text{time}_{\text{bus}}) + \exp(-\beta_c \cdot \text{cost}_{\text{train}} - \beta_t \cdot \text{time}_{\text{train}})}$, where $\beta_c$ and $\beta_t$ are estimated parameters reflecting sensitivity to cost and time, aiding infrastructure planning. In marketing, these models analyze brand selection from scanner data, incorporating prices and promotions to forecast market shares. In policy analysis, they model voting behavior, linking individual demographics and issue positions to candidate preferences in multiparty elections.
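The mode-choice formula above can be evaluated directly; here is a minimal sketch (NumPy assumed; the attribute values and sensitivities are entirely made up):

```python
import numpy as np

beta_c, beta_t = 0.1, 0.05                 # illustrative cost and time sensitivities
modes = ["car", "bus", "train"]
cost = np.array([8.0, 3.0, 5.0])           # hypothetical monetary cost per trip
time = np.array([20.0, 45.0, 30.0])        # hypothetical travel time in minutes

V = -beta_c * cost - beta_t * time         # deterministic utilities
P = np.exp(V) / np.exp(V).sum()            # multinomial logit choice probabilities

for mode, prob in zip(modes, P):
    print(mode, round(prob, 3))
# The probabilities sum to 1; alternatives that are cheaper or faster receive larger shares.
```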

Historical Development

Origins in Population Dynamics

The logistic function, from which the logit transformation derives as its inverse, originated in the modeling of population growth in the 19th century. In 1838, Belgian mathematician Pierre-François Verhulst introduced the logistic growth model to describe bounded population growth, addressing the limitations of exponential-growth assumptions by incorporating environmental constraints. The model is defined by the differential equation $\frac{dP}{dt} = rP \left(1 - \frac{P}{K}\right)$, where $P(t)$ represents population size at time $t$, $r$ is the intrinsic growth rate, and $K$ is the carrying capacity, the maximum sustainable population level. Verhulst solved this equation analytically, yielding the sigmoid-shaped solution $P(t) = \frac{K}{1 + e^{-rt + c}}$, which captures initial exponential-like growth followed by saturation near the carrying capacity, reflecting resource limitations in biological systems. He applied it to human population data from France and other regions, demonstrating fits that predicted long-term stabilization, without invoking the logit explicitly at the time. Though initially overlooked, Verhulst's logistic curve gained traction in the early 20th century within biology and demography for modeling saturation in various growth processes, such as animal populations and agricultural yields, prior to its statistical formalization. A pivotal adoption occurred in 1920 when American biostatisticians Raymond Pearl and Lowell J. Reed independently rediscovered and applied the model to United States population data from 1790 to 1910, fitting the logistic curve to estimate a carrying capacity of approximately 197 million people. Their work, published in the Proceedings of the National Academy of Sciences, highlighted the curve's empirical accuracy in capturing historical population trends and projected future limits, reviving interest in Verhulst's framework. Throughout the early 20th century, the logistic curve saw broader applications in biology and demography, emphasizing bounded growth in contexts like bacterial cultures and human populations, which underscored its utility for processes approaching asymptotic limits without delving into probabilistic interpretations. These early uses established the sigmoid form as a foundational tool for describing self-limiting dynamics, laying the groundwork for later connections to probability models in statistics.

Adoption in Statistics

The adoption of the logit function in statistics began with its introduction by Joseph Berkson in 1944, where he proposed its use in bio-assay for modeling dose-response relationships, coining the term "logit" to describe the logarithmic transformation of the odds. This application marked a shift from earlier biological uses of the logistic curve toward statistical modeling of binary outcomes in experimental settings. In 1958, David Cox further advanced the logit by developing regression methods for binary sequences, which formalized logistic regression as a method for regressing binary responses on explanatory variables, enabling its broader application in statistical analysis. Cox's framework addressed limitations in linear models for dichotomous data, establishing the logit link as a cornerstone of generalized linear models. The 1970s saw significant expansion of logit-based methods into econometrics through Daniel McFadden's development of the conditional logit model, which extended the approach to discrete choice analysis by incorporating individual-specific attributes in utility maximization frameworks. McFadden's contributions, detailed in his 1973 paper, earned him the Nobel Memorial Prize in Economic Sciences in 2000 for advancing the analysis of qualitative choice behavior. A key milestone in popularizing logit models occurred in the 1970s with the implementation of generalized linear models in statistical software, notably GLIM (Generalized Linear Interactive Modelling), released by the Royal Statistical Society in 1974 after development starting in the early 1970s, which facilitated fitting logit models and aided their adoption in the social sciences and beyond. This software accessibility spurred widespread use across disciplines. By the 2000s, the logit had moved into machine learning contexts, with implementations such as scikit-learn's LogisticRegression class, introduced in the library's early releases following its inception in 2007, enabling scalable classification for large datasets in predictive modeling.

Comparisons with Similar Functions

With Probit

The probit function serves as the link function in probit models and is defined as the inverse of the cumulative distribution function (CDF) of the standard normal distribution, denoted $\Phi^{-1}(p)$, where $p$ is the probability between 0 and 1. In comparison, the logit function is $\ln\left(\frac{p}{1-p}\right)$, derived from the CDF of the standard logistic distribution. Both the probit and logit functions are monotonically increasing transformations that map probabilities from $(0, 1)$ to the real line $(-\infty, \infty)$. They approximate each other closely in the central region around $p = 0.5$, where the probit value is roughly the logit value divided by 1.7, allowing for straightforward scaling between model estimates. A key difference arises from their underlying distributions: the logistic distribution underlying the logit exhibits heavier tails than the normal distribution underlying the probit, leading the logit to assign higher probabilities to extreme outcomes (near 0 or 1) under similar linear predictors. This tail behavior implies that logit models may predict more pronounced effects in the tails compared to probit models. In practice, the logit is often preferred for its analytical tractability, as both its CDF and inverse have closed-form expressions, facilitating easier computation and interpretation without numerical integration. The probit, however, aligns better with assumptions of normally distributed latent variables, making it suitable for contexts where such normality is theoretically justified. Empirically, coefficients from logit models are typically scaled by a factor of 1.6 to 1.7 relative to probit coefficients on the same data, reflecting the variance differences between the logistic ($\pi^2/3$) and standard normal (1) distributions. The logit model sees widespread adoption in econometrics, particularly for discrete choice analysis based on random utility maximization, while the probit is more common in settings where latent traits are assumed to follow a normal distribution.
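The roughly 1.6-1.7 scaling is easy to see numerically (a minimal sketch assuming SciPy; p = 0.5 is excluded because both functions are exactly 0 there):

```python
import numpy as np
from scipy.special import logit
from scipy.stats import norm

p = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
ratio = logit(p) / norm.ppf(p)     # logit divided by probit at the same probability

print(ratio.round(3))              # roughly 1.60-1.72 across the central range
print(np.pi / np.sqrt(3))          # ~1.814, the standard deviation of the standard logistic
```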

With Complementary Log-Log

The complementary log-log (cloglog) link function is defined as $g(p) = \ln(-\ln(1 - p))$, where $p$ is the probability of success in a binomial outcome, with the inverse transformation given by $p = 1 - \exp(-e^{\eta})$ and $\eta$ as the linear predictor. This contrasts with the logit link, whose inverse is the symmetric $p = \frac{1}{1 + e^{-\eta}}$, centered around 0.5 and approaching its asymptotes of 0 and 1 at equal rates. The cloglog function is asymmetric: the transformation approaches negative infinity rapidly as $p$ nears 0 but ascends more gradually toward positive infinity as $p$ approaches 1, making it particularly suitable for modeling rare events where probabilities are close to 0. This asymmetry arises from its connection to the extreme value (Gumbel) distribution, whose cumulative distribution function it mirrors in survival contexts. In comparison, the logit's symmetry renders it ideal for binary outcomes with probabilities balanced around 0.5, without favoring one extreme over the other. These differences have key implications for model specification: the cloglog link accommodates asymmetric structures inherent in extreme-value distributions, allowing for unequal scaling of variance at the probability extremes, which is advantageous in scenarios with skewed outcome distributions. The logit, by assuming a symmetric logistic distribution with constant variance, performs better for standard binary regression where outcomes are not skewed toward rarity. In practice, the cloglog link finds prominent use in discrete-time survival analysis, including grouped proportional hazards models that approximate the continuous-time Cox model for interval-censored or binned data. Such applications leverage its ability to model time-varying hazards under discrete observation. Conversely, the logit is preferred in settings with roughly balanced binary events, such as standard logistic regression for classification tasks. Notably, the cloglog approximates the logit closely for small $p$ (near 0), but the functions diverge markedly when $p > 0.5$, highlighting the need for careful choice based on expected probability ranges.
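The asymmetry is apparent from a quick comparison of the two inverse links (a minimal sketch, NumPy only):

```python
import numpy as np

eta = np.linspace(-4, 4, 9)

p_logit = 1 / (1 + np.exp(-eta))        # inverse logit: symmetric about eta = 0
p_cloglog = 1 - np.exp(-np.exp(eta))    # inverse cloglog: asymmetric

for e, a, b in zip(eta, p_logit, p_cloglog):
    print(f"{e:5.1f}  inv-logit = {a:.4f}  inv-cloglog = {b:.4f}")
# For very negative eta the two nearly coincide (both behave like exp(eta));
# for positive eta the cloglog curve approaches 1 much faster than the logistic curve.
```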
