Probit model

from Wikipedia

In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from probability + unit.[1] The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the categories; moreover, a model that classifies observations based on their predicted probabilities is a type of binary classification model.

A probit model is a popular specification for a binary response model. As such it treats the same set of problems as does logistic regression using similar techniques. When viewed in the generalized linear model framework, the probit model employs a probit link function.[2] It is most often estimated using the maximum likelihood procedure,[3] such an estimation being called a probit regression.

Conceptual framework


Suppose a response variable Y is binary, that is, it can have only two possible outcomes, which we will denote as 1 and 0. For example, Y may represent presence/absence of a certain condition, success/failure of some device, a yes/no answer on a survey, etc. We also have a vector of regressors X, which are assumed to influence the outcome Y. Specifically, we assume that the model takes the form

P(Y = 1 | X) = Φ(X′β),

where P denotes probability and Φ is the cumulative distribution function (CDF) of the standard normal distribution. The parameters β are typically estimated by maximum likelihood.

It is possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable

Y* = X′β + ε,

where ε ~ N(0, 1). Then Y can be viewed as an indicator for whether this latent variable is positive:

Y = 1{Y* > 0}.

The use of the standard normal distribution causes no loss of generality compared with the use of a normal distribution with an arbitrary mean and standard deviation, because adding a fixed amount to the mean can be compensated by subtracting the same amount from the intercept, and multiplying the standard deviation by a fixed amount can be compensated by multiplying the weights by the same amount.

To see that the two models are equivalent, note that

P(Y = 1 | X) = P(Y* > 0) = P(X′β + ε > 0) = P(ε > −X′β) = P(ε < X′β) = Φ(X′β),

where the second-to-last equality uses the symmetry of the standard normal distribution.
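The following short R sketch illustrates this latent-variable construction with made-up coefficient values; the variable names and the specific numbers are illustrative only, not part of the article.

set.seed(1)
n     <- 1000
x     <- rnorm(n)                          # a single regressor
beta  <- c(0.5, 1.2)                       # hypothetical intercept and slope
ystar <- beta[1] + beta[2] * x + rnorm(n)  # latent variable Y* = X'beta + eps, eps ~ N(0,1)
y     <- as.integer(ystar > 0)             # observed outcome: indicator that Y* is positive
# The implied probability of Y = 1 agrees with the probit formula:
mean(y)                                    # empirical frequency of Y = 1
mean(pnorm(beta[1] + beta[2] * x))         # average of Phi(X'beta)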

Model estimation


Maximum likelihood estimation


Suppose the data set {y_i, x_i}, i = 1, …, n, contains n independent statistical units corresponding to the model above.

For a single observation, conditional on the vector of inputs of that observation, we have

P(y_i = 1 | x_i) = Φ(x_i′β),
P(y_i = 0 | x_i) = 1 − Φ(x_i′β),

where x_i is a vector of inputs and β is a vector of coefficients.

The likelihood of a single observation (y_i, x_i) is then

L(β; y_i, x_i) = Φ(x_i′β)^{y_i} [1 − Φ(x_i′β)]^{1 − y_i}.

In fact, if y_i = 1, then L(β; y_i, x_i) = Φ(x_i′β), and if y_i = 0, then L(β; y_i, x_i) = 1 − Φ(x_i′β).

Since the observations are independent and identically distributed, the likelihood of the entire sample, or the joint likelihood, is equal to the product of the likelihoods of the single observations:

L(β; Y, X) = ∏_{i=1}^{n} Φ(x_i′β)^{y_i} [1 − Φ(x_i′β)]^{1 − y_i}.

The joint log-likelihood function is thus

ln L(β; Y, X) = ∑_{i=1}^{n} [ y_i ln Φ(x_i′β) + (1 − y_i) ln(1 − Φ(x_i′β)) ].

The estimator β̂ which maximizes this function will be consistent, asymptotically normal and efficient provided that E[X X′] exists and is not singular. It can be shown that this log-likelihood function is globally concave in β, and therefore standard numerical algorithms for optimization will converge rapidly to the unique maximum.

The asymptotic distribution of β̂ is given by

√n (β̂ − β) →d N(0, Ω⁻¹),

where

Ω = E[ φ²(X′β) / ( Φ(X′β)(1 − Φ(X′β)) ) · X X′ ],   Ω̂ = (1/n) ∑_{i=1}^{n} φ²(x_i′β̂) / ( Φ(x_i′β̂)(1 − Φ(x_i′β̂)) ) · x_i x_i′,

[citation needed]

and φ = Φ′ is the probability density function (PDF) of the standard normal distribution.
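In practice, probit maximum likelihood estimates can be obtained with base R's glm() using a probit link, which performs the numerical maximization of the log-likelihood above; the sketch below also shows an equivalent direct maximization with optim(). The simulated data and starting values are illustrative only.

set.seed(1)
n <- 1000
x <- rnorm(n)
y <- as.integer(0.5 + 1.2 * x + rnorm(n) > 0)      # data from a probit data-generating process

fit <- glm(y ~ x, family = binomial(link = "probit"))
summary(fit)                                        # coefficient estimates with asymptotic standard errors

# Equivalent direct maximization of the joint log-likelihood:
negloglik <- function(b) {
  eta <- b[1] + b[2] * x
  -sum(y * pnorm(eta, log.p = TRUE) + (1 - y) * pnorm(-eta, log.p = TRUE))
}
optim(c(0, 0), negloglik, method = "BFGS")$par      # should be close to coef(fit)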

Semi-parametric and non-parametric maximum likelihood methods for probit-type and other related models are also available.[4]

Berkson's minimum chi-square method


This method can be applied only when there are many observations of the response variable having the same value of the vector of regressors (such a situation may be referred to as "many observations per cell"). More specifically, the model can be formulated as follows.

Suppose among the n observations there are only T distinct values of the regressors, which can be denoted as x_(1), …, x_(T). Let n_t be the number of observations with x_i = x_(t), and r_t the number of such observations with y_i = 1. We assume that there are indeed "many" observations per each "cell": for each t, lim_{n→∞} n_t / n = c_t > 0.

Denote

p̂_t = r_t / n_t,
σ̂_t² = (1 / n_t) · p̂_t (1 − p̂_t) / φ²( Φ⁻¹(p̂_t) ).

Then Berkson's minimum chi-square estimator is a generalized least squares estimator in a regression of Φ⁻¹(p̂_t) on x_(t) with weights σ̂_t⁻²:

β̂ = ( ∑_{t=1}^{T} σ̂_t⁻² x_(t) x_(t)′ )⁻¹ ∑_{t=1}^{T} σ̂_t⁻² x_(t) Φ⁻¹(p̂_t).

It can be shown that this estimator is consistent (as n→∞ and T fixed), asymptotically normal and efficient.[citation needed] Its advantage is the presence of a closed-form formula for the estimator. However, it is only meaningful to carry out this analysis when individual observations are not available, only their aggregated counts n_t, r_t, and x_(t) (for example in the analysis of voting behavior).
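A minimal R sketch of this weighted least squares computation on aggregated cell counts, assuming one regressor; the function name, the delta-method weight formula as written, and the example counts are illustrative, not taken from the article.

berkson_probit <- function(x_t, n_t, r_t) {
  p_hat <- r_t / n_t                     # empirical cell probabilities
  z_hat <- qnorm(p_hat)                  # probit transform Phi^{-1}(p_hat)
  # Approximate variance of z_hat is p(1-p) / (n * phi(z)^2); weights are its inverse
  w <- n_t * dnorm(z_hat)^2 / (p_hat * (1 - p_hat))
  lm(z_hat ~ x_t, weights = w)           # weighted (generalized) least squares
}

# Example with made-up cell counts:
x_t <- c(-2, -1, 0, 1, 2)
n_t <- rep(200, 5)
r_t <- c(15, 50, 100, 150, 185)
coef(berkson_probit(x_t, n_t, r_t))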

Albert and Chib Gibbs sampling method


Gibbs sampling of a probit model is possible with the introduction of normally distributed latent variables z, which are observed as 1 if positive and 0 otherwise. This approach was introduced in Albert and Chib (1993),[5] which demonstrated how Gibbs sampling could be applied to binary and polychotomous response models within a Bayesian framework. Under a multivariate normal prior distribution over the weights, the model can be described as

β ~ N(b_0, B_0),
z_i = x_i′β + ε_i,   ε_i ~ N(0, 1),
y_i = 1[z_i > 0].
From this, Albert and Chib (1993)[5] derive the following full conditional distributions for the Gibbs sampling algorithm:

β | z ~ N( B (B_0⁻¹ b_0 + X′z), B ),   where B = (B_0⁻¹ + X′X)⁻¹,
p(z_i | β, y_i) ∝ φ(z_i − x_i′β) [z_i > 0]^{y_i} [z_i ≤ 0]^{1 − y_i}.

The result for β is given in the article on Bayesian linear regression, although specified with different notation, while the conditional posterior distributions of the latent variables follow truncated normal distributions within the given ranges: z_i is drawn from N(x_i′β, 1) truncated to (0, ∞) when y_i = 1 and to (−∞, 0] when y_i = 0. The notation [z_i > 0] is the Iverson bracket, sometimes written 1(z_i > 0) or similar. Thus, knowledge of the observed outcomes serves to restrict the support of the latent variables.

Sampling of the weights β given the latent vector z from the multivariate normal distribution is standard. For sampling the latent variables from their truncated normal posterior distributions, one can take advantage of the inverse-CDF method, implemented in the following vectorized R function.

zbinprobit <- function(y, X, beta, n) {
  meanv <- X %*% beta            # latent means x_i' beta
  u <- runif(n)                  # uniform(0,1) random variates
  cd <- pnorm(-meanv)            # Phi(-x_i' beta) = P(z_i <= 0)
  # Map u into the correct tail of the truncated normal:
  #   y = 0:  pu = u * cd             (truncation to (-inf, 0])
  #   y = 1:  pu = cd + u * (1 - cd)  (truncation to (0, inf))
  pu <- (u * cd) * (1 - 2 * y) + (u + cd) * y
  cpui <- qnorm(pu)              # inverse normal CDF
  z <- meanv + cpui              # draws from the truncated normal latent posterior
  return(z)
}
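A sketch of the full Albert–Chib Gibbs loop built around zbinprobit() is given below, assuming a N(b_0, B_0) prior on the coefficients; the function name, default prior values, and iteration count are illustrative choices, not part of the original article.

probit_gibbs <- function(y, X, n_iter = 2000, b0 = NULL, B0 = NULL) {
  n <- nrow(X); k <- ncol(X)
  if (is.null(b0)) b0 <- rep(0, k)
  if (is.null(B0)) B0 <- diag(100, k)        # vague prior covariance (illustrative)
  B0_inv <- solve(B0)
  B <- solve(B0_inv + crossprod(X))          # posterior covariance, fixed since var(z_i) = 1
  R <- chol(B)
  beta <- rep(0, k)                          # starting value
  draws <- matrix(NA_real_, n_iter, k)
  for (s in 1:n_iter) {
    z <- zbinprobit(y, X, beta, n)                       # latent variables | beta, y
    m <- B %*% (B0_inv %*% b0 + crossprod(X, z))         # posterior mean | z
    beta <- as.vector(m + t(R) %*% rnorm(k))             # beta | z ~ N(m, B)
    draws[s, ] <- beta
  }
  draws
}
# Posterior summaries would then be computed from the draws after discarding a burn-in,
# e.g. colMeans(draws[-(1:500), ]).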

Model evaluation


The suitability of an estimated binary model can be evaluated by counting the number of true observations equal to 1, and the number equal to 0, for which the model assigns the correct predicted classification, treating any estimated probability above 1/2 as a prediction of 1 (and any probability below 1/2 as a prediction of 0). See Logistic regression § Model for details.
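A minimal R sketch of this classification count, reusing simulated data and a probit fit as in the earlier estimation example (illustrative only):

set.seed(1)
n <- 1000; x <- rnorm(n)
y <- as.integer(0.5 + 1.2 * x + rnorm(n) > 0)
fit <- glm(y ~ x, family = binomial(link = "probit"))

p_hat  <- fitted(fit)                     # estimated probabilities Phi(x'beta_hat)
y_pred <- as.integer(p_hat > 0.5)         # predict 1 when the estimated probability exceeds 1/2
table(observed = y, predicted = y_pred)   # counts of correct and incorrect classifications
mean(y_pred == y)                         # overall share classified correctly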

Performance under misspecification


Consider the latent variable model formulation of the probit model. When the variance of ε conditional on x is not constant but dependent on x, the heteroscedasticity issue arises. For example, suppose y* = β_0 + β_1 x_1 + ε and Var(ε | x_1) = x_1², where x_1 is a continuous positive explanatory variable. Under heteroskedasticity, the probit estimator for β is usually inconsistent, and most of the tests about the coefficients are invalid. More importantly, the estimator for P(y = 1 | x) becomes inconsistent, too. To deal with this problem, the original model needs to be transformed to be homoskedastic. For instance, in the same example, 1[y* > 0] can be rewritten as 1[y** > 0], where y** = β_0/x_1 + β_1 + ε/x_1 and Var(ε/x_1 | x_1) = 1. Therefore, P(y = 1 | x) = Φ(β_1 + β_0/x_1), and running probit on the regressor 1/x_1 generates a consistent estimator for this conditional probability.
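The simulation sketch below illustrates this rescaling argument under the assumptions just stated; the parameter values and sample size are illustrative only.

set.seed(2)
n  <- 5000
x1 <- runif(n, 0.5, 3)                                  # continuous positive regressor
b0 <- 1; b1 <- -0.5                                     # illustrative true values
y  <- as.integer(b0 + b1 * x1 + x1 * rnorm(n) > 0)      # eps | x1 ~ N(0, x1^2)

coef(glm(y ~ x1, family = binomial(link = "probit")))          # naive probit: inconsistent
coef(glm(y ~ I(1 / x1), family = binomial(link = "probit")))   # transformed: intercept ~ b1, slope on 1/x1 ~ b0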

When the assumption that ε is normally distributed fails to hold, a functional form misspecification issue arises: if the model is still estimated as a probit model, the estimators of the coefficients β are inconsistent. For instance, if ε follows a logistic distribution in the true model, but the model is estimated by probit, the estimates will generally be smaller than the true values. However, the inconsistency of the coefficient estimates is practically irrelevant, because the estimates of the partial effects, ∂P(y = 1 | x)/∂x_i, will be close to the estimates given by the true logit model.[6]

To avoid the issue of distribution misspecification, one may adopt a general distributional assumption for the error term, such that many different types of distribution can be included in the model. The cost is heavier computation and lower accuracy due to the increased number of parameters.[7] In most practical cases where the distributional form is misspecified, the estimators for the coefficients are inconsistent, but estimators for the conditional probability and the partial effects remain very good.[citation needed]

One can also take semi-parametric or non-parametric approaches, e.g., via local-likelihood or nonparametric quasi-likelihood methods, which avoid assumptions on a parametric form for the index function and are robust to the choice of the link function (e.g., probit or logit).[4]

History


The probit model is usually credited to Chester Bliss, who coined the term "probit" in 1934,[8] and to John Gaddum (1933), who systematized earlier work.[9] However, the basic model dates to the Weber–Fechner law by Gustav Fechner, published in Fechner (1860), and was repeatedly rediscovered until the 1930s; see Finney (1971, Chapter 3.6) and Aitchison & Brown (1957, Chapter 1.2).[9]

A fast method for computing maximum likelihood estimates for the probit model was proposed by Ronald Fisher as an appendix to Bliss' work in 1935.[10]

from Grokipedia
The probit model is a binary regression technique used in statistics to model the probability of a dichotomous outcome—typically coded as 0 or 1—as a function of predictor variables, where the cumulative distribution function (CDF) of the standard normal distribution transforms the linear predictor into a probability between 0 and 1. Introduced by biologist Chester Ittner Bliss in 1935 for analyzing dose-response relationships in toxicology, such as the probability of mortality from pesticide exposure, the model assumes an underlying latent variable that follows a normal distribution, with the observed binary outcome determined by whether this latent variable exceeds a threshold. The probit function, derived from the inverse of the standard normal CDF (denoted Φ⁻¹), linearizes the sigmoid-shaped response curve, enabling estimation of parameters via maximum likelihood methods, as ordinary least squares is inappropriate due to the bounded nature of probabilities.

Mathematically, for a binary outcome Y_i and predictors X_i, the model specifies P(Y_i = 1 | X_i) = Φ(X_i β), where β is the vector of coefficients representing changes in the index (z-score) per unit change in predictors, and Φ is the standard normal CDF. This formulation contrasts with the logit model, which uses the logistic CDF and yields similar but not identical interpretations, often with probit coefficients approximately 1.6 times smaller than logit ones due to differences in the distributions' variances. David J. Finney advanced the model in his seminal 1947 book Probit Analysis, providing rigorous statistical treatments for bioassay applications and establishing estimation protocols that became routinely feasible with 1970s computing advancements.

Beyond its origins in bioassays for estimating lethal doses (e.g., LD50), the probit model has broad applications in econometrics for discrete choice analysis, such as predicting consumer decisions or labor force participation, and in finance for credit risk assessment, where it models default probabilities based on firm characteristics. Extensions include the ordered probit for ordinal outcomes (e.g., rating scales) and the multinomial probit for multiple categories, though the latter faces computational challenges from correlated errors. Interpretation typically involves marginal effects, which quantify how changes in predictors alter outcome probabilities, evaluated at means or specific values, as coefficients alone do not directly represent probability shifts due to the nonlinear link function.

Overview and Foundations

Definition and Purpose

The probit model is a type of regression model employed in statistics to analyze binary dependent variables, where the response probability is linked to a linear combination of predictors via the inverse of the cumulative distribution function of the standard normal distribution. This approach models the probability of a dichotomous outcome, such as an event occurring (coded as 1) versus not occurring (coded as 0), under the assumption that the error terms in the underlying process follow a normal distribution. The primary purpose of the probit model is to estimate and predict probabilities for binary events, particularly in scenarios where the latent, unobserved factors influencing the outcome are believed to exhibit symmetric, bell-shaped variability consistent with normality. It serves as a tool for understanding how covariates affect the likelihood of outcomes like participation in an activity or the onset of a condition, providing a framework that bounds predictions between 0 and 1 while capturing nonlinear relationships. This makes it suitable for applications requiring probabilistic interpretation without assuming a logistic structure.

The name "probit" derives from "probability unit," a term coined by Chester Ittner Bliss in 1934 to describe a transformation that converts observed probabilities into a scale aligned with the standard normal distribution, facilitating linear analysis of sigmoid-shaped response curves. Bliss introduced this in the context of toxicological experiments to quantify dose thresholds, building on earlier psychometric work by Louis Leon Thurstone in 1927 that applied normal distributions to comparative judgments. The probit scale thus standardizes probabilities for easier modeling and comparison across studies.

In practice, the probit model finds widespread use in economics to predict labor force participation, where covariates such as education and age influence the probability of participation among individuals. Similarly, in epidemiology, it models disease incidence, estimating the likelihood of conditions like chronic illnesses based on risk factors including demographics and exposures. These applications highlight its role in deriving covariate-specific probabilities for policy and health interventions.

Relation to Logit and Other Binary Choice Models

The probit model and the logit model are both parametric approaches to binary choice modeling, differing primarily in their choice of cumulative distribution function (CDF) for the latent error term. The probit model applies the standard normal CDF, denoted Φ, to map the linear predictor to probabilities, resulting in an S-shaped curve that is symmetric around 0.5. In contrast, the logit model uses the logistic CDF, Λ, which produces a similar sigmoidal shape but with heavier tails and a less steep central slope compared to the probit. This difference arises because the logistic distribution is leptokurtic relative to the normal distribution assumed in probit, leading to marginally different predicted probabilities, especially at extreme values of the predictors; however, empirical studies show that the models often yield substantively similar results in most applications (see the sketch following the table below).

The linear probability model (LPM), which uses ordinary least squares to directly regress the binary outcome on predictors, serves as a simpler baseline but faces limitations that the probit model mitigates. LPM predictions can fall outside the [0,1] interval, violating the bounded nature of probabilities, and its error variance is inherently heteroskedastic, varying with the predicted probability as var(y|x) = p(x)[1 − p(x)]. The probit model addresses these by enforcing predictions within [0,1] through the normal CDF and accounting for the nonlinear, heteroskedastic structure via maximum likelihood estimation, though it requires more computational effort.

Another alternative is the complementary log-log (cloglog) model, which employs the CDF of the extreme value distribution, producing asymmetric probability curves that approach 0 and 1 at different rates. Unlike the symmetric error assumptions in probit and logit, the cloglog model is suited to scenarios with inherent asymmetry, such as discrete-time hazard models, but the probit is generally preferred when normality and symmetry align with theoretical expectations, as in many econometric contexts. The following table summarizes key distinctions among these models:
Model | Link Function | Error Distribution | Typical Use Cases
Probit | Probit (Φ⁻¹(p)) | Standard normal | Bioassays with symmetric responses; models assuming normality
Logit | Logit (ln(p/(1−p))) | Logistic | General binary outcomes; interpretable odds ratios for convenience
Linear Probability | Identity (p) | Homoskedastic (idealized) | Quick linear approximations despite boundary and heteroskedasticity issues
Complementary Log-Log | Cloglog (ln(−ln(1−p))) | Extreme value (Type I) | Asymmetric events, e.g., survival or rare-event analysis
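A quick R illustration of the probit–logit comparison discussed above: fitting both models to the same simulated data, the predicted probabilities are nearly identical while the logit coefficients are roughly 1.6–1.8 times the probit ones. The data-generating values are illustrative, not a general proof.

set.seed(3)
n <- 5000
x <- rnorm(n)
y <- as.integer(0.3 + 0.8 * x + rnorm(n) > 0)

probit_fit <- glm(y ~ x, family = binomial(link = "probit"))
logit_fit  <- glm(y ~ x, family = binomial(link = "logit"))
coef(logit_fit) / coef(probit_fit)           # coefficient ratio, roughly 1.6-1.8
cor(fitted(probit_fit), fitted(logit_fit))   # predicted probabilities almost perfectly correlated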

Mathematical Formulation

Binary Probit Model

The binary probit model provides a framework for modeling binary outcomes, where the dependent variable Y_i takes values 0 or 1, conditional on a vector of covariates X_i. The core specification expresses the probability of the positive outcome as

P(Y_i = 1 | X_i) = Φ(X_i β),

where Φ(·) denotes the cumulative distribution function (CDF) of the standard normal distribution, X_i includes an intercept and explanatory variables, and β is the vector of unknown parameters to be estimated. Key assumptions underpin the model's validity: observations are independent across individuals i; the linear index X_i β correctly captures the systematic component of the outcome probability; and, implicitly, the disturbance term follows a standard normal distribution with mean 0 and variance 1, ensuring that the CDF Φ appropriately bounds probabilities between 0 and 1. These assumptions yield a smooth, monotonically increasing probability function that asymptotes to 0 and 1, avoiding the boundary issues of linear probability models.

The probit link function defines the transformation from probability to the linear-predictor scale, given by g(p) = Φ⁻¹(p), which maps a success probability p ∈ (0,1) to its corresponding z-score under the standard normal distribution. This inverse CDF, often termed the probit transformation, linearizes the nonlinear probability relationship, allowing estimation of β such that Φ⁻¹(P(Y_i = 1 | X_i)) = X_i β. In practice, the link places the model within the generalized linear model class, with the normal CDF providing a symmetric, bell-shaped density for the underlying errors.

Identification of β relies on normalizing the error variance to 1, which fixes the scale of the parameters and eliminates the ambiguity that would arise in an unnormalized latent model; no additional intercept adjustments are required beyond this standardization. This normalization aligns with the standard normal assumption, enabling unique maximum likelihood estimates under the specified functional form. The binary probit formulation can be viewed as arising from a latent continuous variable thresholded at zero, though full details of this representation are addressed elsewhere.

Latent Variable Representation

The probit model is conceptually grounded in a latent variable framework, which posits an unobserved continuous variable Y_i* that captures the underlying propensity or utility driving the observed binary outcome Y_i. The latent variable follows the linear specification

Y_i* = X_i β + ε_i,

where X_i denotes the vector of explanatory variables for observation i, β is the vector of parameters, and ε_i is an error term assumed to be independently and identically distributed standard normal, ε_i ~ N(0, 1). The observed binary response is then determined by a single threshold rule: Y_i = 1 if Y_i* > 0, and Y_i = 0 otherwise. This setup links the probit model to utility maximization or propensity thresholds in decision-making contexts, such as economic choices where Y_i* represents the net utility of an option.

This latent structure interprets the binary outcome as arising from a threshold-crossing mechanism, where the decision to select one category (e.g., "yes") occurs only if the latent propensity surpasses the normalized threshold of zero. The choice of zero as the threshold is without loss of generality, as any constant can be absorbed into the intercept term in X_i β. Such a representation is particularly useful in fields like econometrics and biostatistics, where the unobserved Y_i* might reflect latent traits such as risk propensity or biological response intensity, censored at the observation level.

A key aspect of this formulation is the normalization of the error variance to σ² = 1, which ensures model identification, since the scale of the latent variable cannot be estimated separately from the binary data alone. Without this restriction, the parameters β would be scaled by an arbitrary factor, leading to non-unique solutions; this contrasts with linear models, where the variance is directly estimable from observed variation. In unnormalized variants, such as certain heteroskedastic extensions, additional parameters may be introduced, but the standard probit relies on this homoskedastic unit-variance assumption for computational tractability and consistency in estimation. While the binary probit employs a single threshold at zero, this latent framework extends naturally to multiple ordered thresholds for polytomous outcomes, forming the basis of ordered probit models, where categories emerge from several crossing points on the latent continuum.

Estimation Techniques

Maximum Likelihood Estimation

The parameters of the probit model are estimated using maximum likelihood estimation (MLE), which maximizes the log-likelihood function with respect to the coefficient vector β:

l(β) = ∑_{i=1}^{n} [ y_i log Φ(X_i′β) + (1 − y_i) log(1 − Φ(X_i′β)) ].

Here, y_i is the observed binary outcome for the i-th observation, X_i′ denotes the row vector of explanatory variables, and Φ is the cumulative distribution function of the standard normal distribution. Since no closed-form solution exists, the maximization is performed numerically using iterative algorithms such as the Newton–Raphson method, which updates β based on the score vector (the first derivative of the log-likelihood) and the Hessian matrix (the second derivative). Under standard regularity conditions, including correct model specification and identification, the MLE β̂ is consistent, meaning that β̂ converges in probability to β as the sample size grows.
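A minimal Fisher-scoring sketch in R (a Newton-type iteration that uses the expected Hessian rather than the observed one) for this log-likelihood; the function name and simulated data are illustrative, and in practice one would simply call glm with a probit link.

probit_fisher_scoring <- function(y, X, tol = 1e-8, max_iter = 50) {
  beta <- rep(0, ncol(X))
  for (iter in 1:max_iter) {
    eta <- as.vector(X %*% beta)
    p   <- pnorm(eta); phi <- dnorm(eta)
    score <- crossprod(X, (y - p) * phi / (p * (1 - p)))   # gradient of the log-likelihood
    info  <- crossprod(X * (phi^2 / (p * (1 - p))), X)     # expected information matrix
    step  <- solve(info, score)
    beta  <- beta + as.vector(step)
    if (max(abs(step)) < tol) break
  }
  beta
}

# Check against glm on simulated data:
set.seed(4); n <- 2000; x <- rnorm(n)
y <- as.integer(-0.2 + 0.7 * x + rnorm(n) > 0)
X <- cbind(1, x)
probit_fisher_scoring(y, X)
coef(glm(y ~ x, family = binomial(link = "probit")))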