Probit
from Wikipedia
Plot of probit function

In statistics, the probit function converts a probability (a number between 0 and 1) into a score. This score indicates how many standard deviations from the mean a value from a standard normal distribution (or "bell curve") is. For example, a probability of 0.5 (50%) represents the exact middle of the distribution, so its probit score is 0. A smaller probability like 0.025 (2.5%) is far to the left on the curve, corresponding to a probit score of approximately −1.96.

The function is widely used in probit models, a type of regression analysis for binary outcomes (e.g., success/failure or pass/fail). It was first developed in toxicology to analyze dose-response relationships, such as how the percentage of pests killed by a pesticide changes with its concentration.[1] The probit function is also used to create Q–Q plots, a graphical tool for assessing whether a dataset is normally distributed.

Mathematically, the probit function is the quantile function (the inverse of the cumulative distribution function (CDF)) associated with the standard normal distribution. If the CDF is denoted by $\Phi(z)$, then the probit function is defined as:

$$\operatorname{probit}(p) = \Phi^{-1}(p) \quad \text{for} \quad p \in (0, 1).$$

This means that for any probability $p$, the probit function finds the value $z$ such that the area under the standard normal curve to the left of $z$ is equal to $p$.
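As a small illustration of this definition, the sketch below evaluates the probit function with Python's standard library, assuming that `statistics.NormalDist.inv_cdf` is an acceptable implementation of the standard normal quantile function. The helper name `probit` is chosen here for illustration only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit(p: float) -> float:
    """Return z such that the standard normal CDF at z equals p (0 < p < 1)."""
    return STD_NORMAL.inv_cdf(p)


for p in (0.025, 0.5, 0.975):
    z = probit(p)
    # Applying the CDF to z recovers p, since probit inverts the CDF.
    print(f"p = {p:<5}  probit(p) = {z:+.4f}  cdf(probit(p)) = {STD_NORMAL.cdf(z):.4f}")
```

The loop covers the two probabilities used as examples in the text (0.5 and 0.025) together with the mirror-image value 0.975.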

Conceptual development

The idea of the probit function was published by Chester Ittner Bliss in a 1934 article in Science on how to treat data such as the percentage of a pest killed by a pesticide.[1] Bliss proposed transforming the percentage killed into a "probability unit" (or "probit") which was linearly related to the modern definition (he defined it arbitrarily as equal to 0 for 0.0001 and 1 for 0.9999):[2]

These arbitrary probability units have been termed "probits" ...

He included a table to aid other researchers to convert their kill percentages to his probit, which they could then plot against the logarithm of the dose and thereby, it was hoped, obtain a more or less straight line. Such a so-called probit model is still important in toxicology, as well as other fields. The approach is justified in particular if response variation can be rationalized as a lognormal distribution of tolerances among subjects on test, where the tolerance of a particular subject is the dose just sufficient for the response of interest.
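The straight-line relationship described above can be sketched numerically. The example assumes a purely hypothetical population whose tolerances are lognormally distributed (the parameters `MU_LOG10_DOSE` and `SIGMA_LOG10_DOSE` are invented for illustration) and uses the modern probit without Bliss's original scaling. Under that assumption the probit of the expected kill fraction is exactly linear in the logarithm of the dose, with slope equal to one over the standard deviation of the log tolerances.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Hypothetical tolerance distribution: log10(tolerance) ~ Normal(mu, sigma).
MU_LOG10_DOSE = 1.0      # assumed median tolerance of 10 dose units
SIGMA_LOG10_DOSE = 0.4   # assumed spread of log10 tolerances


def kill_fraction(dose: float) -> float:
    """Expected fraction of subjects whose tolerance lies below the dose."""
    z = (math.log10(dose) - MU_LOG10_DOSE) / SIGMA_LOG10_DOSE
    return STD_NORMAL.cdf(z)


def probit(p: float) -> float:
    """Standard normal quantile of a probability p in (0, 1)."""
    return STD_NORMAL.inv_cdf(p)


doses = [2.0, 5.0, 10.0, 20.0, 50.0]
points = [(math.log10(d), probit(kill_fraction(d))) for d in doses]

# Slopes between neighbouring points; all should equal 1 / sigma.
slopes = [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(points, points[1:])]

for (log_dose, z), dose in zip(points, doses):
    print(f"dose = {dose:5.1f}  log10(dose) = {log_dose:.3f}  probit = {z:+.3f}")
print("slopes:", [round(s, 6) for s in slopes])
```

With observed rather than expected kill fractions the points would scatter around the line, which is why a fitting procedure is needed in practice.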

The method introduced by Bliss was carried forward in Probit Analysis, an important text on toxicological applications by D. J. Finney.[3][4] Values tabled by Finney can be derived from probits as defined here by adding a value of 5. This distinction is summarized by Collett (p. 55):[5] "The original definition of a probit [with 5 added] was primarily to avoid having to work with negative probits; ... This definition is still used in some quarters, but in the major statistical software packages for what is referred to as probit analysis, probits are defined without the addition of 5." Probit methodology, including numerical optimization for fitting of probit functions, was introduced before widespread availability of electronic computing. When using tables, it was convenient to have probits uniformly positive. Common areas of application do not require positive probits.

Symmetries

Largely because of the central limit theorem, the standard normal distribution plays a fundamental role in probability theory and statistics. If we consider the familiar fact that the standard normal distribution places 95% of probability between −1.96 and 1.96 and is symmetric around zero, it follows that

$$\Phi(-1.96) \approx 0.025 \approx 1 - \Phi(1.96).$$

The probit function gives the 'inverse' computation, generating a value of a standard normal random variable associated with a specified cumulative probability. Continuing the example,

$$\operatorname{probit}(0.025) \approx -1.96 \approx -\operatorname{probit}(0.975).$$

In general,

$$\Phi(\operatorname{probit}(p)) = p$$

and

$$\operatorname{probit}(\Phi(z)) = z.$$
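The inverse relationship between the normal CDF and the probit function, and the symmetry about zero used in the example, can be checked numerically. This is a minimal sketch that again assumes `statistics.NormalDist` as the implementation of both functions; the grid of probabilities is arbitrary.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
phi = STD_NORMAL.cdf          # standard normal CDF
probit = STD_NORMAL.inv_cdf   # its inverse, the probit function

probabilities = [0.001, 0.025, 0.2, 0.5, 0.8, 0.975, 0.999]

# Round trip: applying the CDF after the probit returns the probability.
round_trip_error = max(abs(phi(probit(p)) - p) for p in probabilities)

# Symmetry about zero: probit(1 - p) is the negative of probit(p).
symmetry_error = max(abs(probit(1 - p) + probit(p)) for p in probabilities)

print(f"max |Phi(probit(p)) - p|        = {round_trip_error:.2e}")
print(f"max |probit(1 - p) + probit(p)| = {symmetry_error:.2e}")
```

Both errors should be at the level of floating-point rounding rather than exactly zero.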

Diagnosing deviation of a distribution from normality

In addition to providing a basis for important types of regression, the probit function is useful in statistical analysis for diagnosing deviation from normality, according to the method of Q–Q plotting. If a set of data is actually a sample of a normal distribution, a plot of the values against their probit scores will be approximately linear. Specific deviations from normality such as asymmetry, heavy tails, or bimodality can be diagnosed based on detection of specific deviations from linearity. While the Q–Q plot can be used for comparison to any distribution family (not only the normal), the normal Q–Q plot is a relatively standard exploratory data analysis procedure because the assumption of normality is often a starting point for analysis.
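A normal Q–Q plot can be sketched in a few lines. The example below pairs each ordered observation with the probit of a plotting position; the choice of plotting position `(i + 0.5) / n` is one common convention and is an assumption here, as are the helper names and the simulated data. The correlation of the resulting points is printed as a rough numeric summary of linearity.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_qq_points(data):
    """Pair each ordered observation with a standard normal quantile."""
    n = len(data)
    ordered = sorted(data)
    # Plotting positions (i + 0.5) / n for i = 0..n-1 (one common convention).
    theoretical = [STD_NORMAL.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def pearson_r(pairs):
    """Pearson correlation of (x, y) pairs; near 1 means nearly linear."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the simulated samples are reproducible
normal_sample = [rng.gauss(10.0, 2.0) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]

print("normal sample r:", round(pearson_r(normal_qq_points(normal_sample)), 4))
print("skewed sample r:", round(pearson_r(normal_qq_points(skewed_sample)), 4))
```

In practice the points would be drawn with a plotting library and inspected visually, since the shape of the departure from a straight line is what indicates skewness, heavy tails, or bimodality.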

Computation

The normal distribution CDF and its inverse are not available in closed form, and computation requires careful use of numerical procedures. However, the functions are widely available in software for statistics and probability modeling, and in spreadsheets. In computing environments where numerical implementations of the inverse error function are available, the probit function may be obtained as

$$\operatorname{probit}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1).$$

An example is MATLAB, where an 'erfinv' function is available. The language Mathematica implements 'InverseErf'. Other environments directly implement the probit function as is shown in the following R code.

> qnorm(0.025)
[1] -1.959964
> pnorm(-1.96)
[1] 0.02499790

Details for computing the inverse error function can be found at [1]. Wichura gives a fast algorithm for computing the probit function to 16 decimal places; this is used in R to generate random variates for the normal distribution.[6]
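Where neither a probit nor an inverse error function is available, the CDF can be inverted numerically. The sketch below uses plain bisection on a CDF built from `math.erfc`; it is not Wichura's algorithm and is far slower, but it shows the idea under the assumption that an accurate complementary error function exists. The bracket of ±40 and the tolerance are arbitrary choices made for this illustration.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def probit_bisect(p: float, tol: float = 1e-12) -> float:
    """Approximate the probit of p by bisection on the normal CDF."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = -40.0, 40.0  # the CDF is 0 and 1 to machine precision at these ends
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < p:
            lo = mid   # the root lies to the right of mid
        else:
            hi = mid   # the root lies at or to the left of mid
    return 0.5 * (lo + hi)


for p in (0.025, 0.5, 0.975):
    print(f"probit({p}) is approximately {probit_bisect(p):+.6f}")
```

Bisection halves the bracket on every step, so about 46 iterations are needed to shrink an interval of width 80 below the default tolerance.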

An ordinary differential equation for the probit function

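Another means of computation is based on forming a non-linear ordinary differential equation (ODE) for probit, as per the Steinbrecher and Shaw method.[7] Abbreviating the probit function as $w(p)$, the ODE is

$$\frac{dw}{dp} = \frac{1}{f(w)}$$

where $f(w)$ is the probability density function of $w$.

In the case of the Gaussian:

$$\frac{dw}{dp} = \sqrt{2\pi}\, e^{w^2/2}$$

Differentiating again:

$$\frac{d^2 w}{dp^2} = w \left(\frac{dw}{dp}\right)^2$$

with the centre (initial) conditions

$$w(1/2) = 0, \qquad w'(1/2) = \sqrt{2\pi}.$$

This equation may be solved by several methods, including the classical power series approach. From this, solutions of arbitrarily high accuracy may be developed based on Steinbrecher's approach to the series for the inverse error function. The power series solution is given by

$$w(p) = \sqrt{\frac{\pi}{2}} \sum_{k=0}^{\infty} \frac{d_k}{2k+1} (2p-1)^{2k+1}$$

where the coefficients $d_k$ satisfy the non-linear recurrence

$$d_{k+1} = \frac{\pi}{4} \sum_{j=0}^{k} \frac{d_j d_{k-j}}{(j+1)(2j+1)}$$

with $d_0 = 1$. In this form the ratio $d_{k+1}/d_k \to 1$ as $k \to \infty$.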
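The power series can be tried out directly. The sketch below writes the series and its coefficient recurrence explicitly in the comments; these are the standard inverse error function series rescaled to the probit, stated here as an assumption of the example rather than quoted from the cited paper. The function name and the default number of terms are illustrative.

```python
import math
from statistics import NormalDist


def probit_series(p: float, n_terms: int = 100) -> float:
    """Truncated power series for the probit around p = 1/2.

    Assumed form:
        w(p)    = sqrt(pi/2) * sum_k d_k / (2k + 1) * (2p - 1)**(2k + 1)
        d_0     = 1
        d_{k+1} = (pi/4) * sum_{j=0..k} d_j * d_{k-j} / ((j + 1) * (2j + 1))
    """
    d = [1.0]
    for k in range(n_terms - 1):
        s = sum(d[j] * d[k - j] / ((j + 1) * (2 * j + 1)) for j in range(k + 1))
        d.append(math.pi / 4.0 * s)
    x = 2.0 * p - 1.0
    total = sum(d_k / (2 * k + 1) * x ** (2 * k + 1) for k, d_k in enumerate(d))
    return math.sqrt(math.pi / 2.0) * total


reference = NormalDist().inv_cdf  # library quantile function, for comparison
for p in (0.5, 0.6, 0.75, 0.9):
    print(f"p = {p}: series = {probit_series(p):.10f}, library = {reference(p):.10f}")
```

Because the coefficient ratio tends to 1, the series converges slowly as $p$ approaches 0 or 1, so a truncated sum of this kind is mainly useful near the centre of the distribution.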

Logit

Comparison of the logit function with a scaled probit (i.e. the inverse CDF of the normal distribution), comparing $\operatorname{logit}(x)$ vs. $\Phi^{-1}(x)/\sqrt{\pi/8}$, which makes the slopes the same at the origin.

Closely related to the probit function (and probit model) are the logit function and logit model. The inverse of the logistic function is given by

$$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right).$$

Analogously to the probit model, we may assume that such a quantity is related linearly to a set of predictors, resulting in the logit model, the basis in particular of the logistic regression model, the most prevalent form of regression analysis for categorical response data. In current statistical practice, probit and logit regression models are often handled as cases of the generalized linear model.
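The closeness of the two functions can be illustrated numerically. The logit has slope 4 at $p = 1/2$ and the probit has slope $\sqrt{2\pi}$ there, so dividing the probit by $\sqrt{\pi/8}$ equalizes the slopes at the centre. The sketch below, with helper names chosen for illustration, tabulates both functions under that scaling.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def scaled_probit(p: float) -> float:
    """Probit divided by sqrt(pi/8), so its slope at p = 1/2 matches the logit."""
    return STD_NORMAL.inv_cdf(p) / math.sqrt(math.pi / 8.0)


for p in (0.5, 0.6, 0.75, 0.9, 0.99):
    print(f"p = {p:<4}  logit = {logit(p):+.4f}  scaled probit = {scaled_probit(p):+.4f}")
```

The two columns agree closely near 0.5 and drift apart toward the tails, where the logit grows faster in magnitude than the scaled probit.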

from Grokipedia
The probit model, also known as probit regression, is a statistical method used to analyze binary outcome variables by modeling the probability of a positive outcome as a function of predictor variables through the cumulative distribution function of the standard normal distribution. In this framework, the probit link (the inverse standard normal cumulative distribution function) maps a probability between 0 and 1 to the linear predictor; equivalently, the standard normal CDF transforms the linear predictor into a probability. The model can be motivated by an underlying latent normal variable that determines the observed binary response.

Originating in the field of bioassay and toxicology, the probit model was introduced by Chester I. Bliss in 1934 as a tool to quantify dose-response relationships, such as the proportion of organisms affected by varying concentrations of a toxic agent. Bliss coined the term "probit" from "probability unit," building on earlier work by J.H. Gaddum to linearize sigmoid-shaped response curves for easier statistical analysis. This approach facilitated maximum likelihood estimation and hypothesis testing, marking a shift from graphical methods to parametric modeling in quantitative biology.

The probit model has since become widely applied in the social sciences and other fields for scenarios involving dichotomous choices, such as labor force participation and medical treatment efficacy. Unlike the closely related logit model, which employs the logistic function, the probit assumes normality in the error term of the latent variable. Because the normal distribution has thinner tails than the logistic distribution, fitted probit probabilities approach 0 and 1 somewhat faster, but the two models give similar predictions in most practical settings. Estimation typically involves maximum likelihood methods, with extensions like random-effects probit accommodating clustered observations in longitudinal studies.
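The way a probit model turns predictors into probabilities, and the log-likelihood that maximum likelihood estimation would maximize, can be sketched as follows. The data, the coefficient values, and the function names are all hypothetical and chosen only for illustration; no fitting is performed.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def probit_probability(x, beta):
    """P(y = 1 | x) = Phi(beta[0] + beta[1] * x[0] + ...); beta[0] is the intercept."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return STD_NORMAL.cdf(eta)


def log_likelihood(rows, outcomes, beta):
    """Bernoulli log-likelihood of binary outcomes under the probit model."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        p = probit_probability(x, beta)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Hypothetical data: one predictor and a binary response.
rows = [[0.5], [1.0], [1.5], [2.0], [2.5]]
outcomes = [0, 0, 1, 1, 1]
beta = [-2.0, 1.5]  # hypothetical intercept and slope, not fitted values

for x in rows:
    print(f"x = {x[0]:.1f}  P(y = 1) = {probit_probability(x, beta):.3f}")
print("log-likelihood:", round(log_likelihood(rows, outcomes, beta), 4))
```

An estimation routine would search over `beta` for the values that make this log-likelihood as large as possible.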

Definition and History

Definition of the Probit Function

The probit function, denoted $\operatorname{probit}(p)$ or $\Phi^{-1}(p)$, is defined as the inverse of the cumulative distribution function (CDF) $\Phi$ of the standard normal distribution $N(0,1)$. It transforms a probability $p \in (0,1)$ into the corresponding value $z$ on the real line such that $\Phi(z) = p$. This mapping allows the probit to convert bounded probabilities into unbounded z-scores, which are useful for linearizing sigmoid response curves in statistical modeling. For instance, $\operatorname{probit}(0.5) = 0$, reflecting the median of the standard normal distribution, and $\operatorname{probit}(0.975) \approx 1.96$, which marks the approximate upper limit of the central 95% probability interval. Equivalently, the probit function can be expressed in terms of the inverse error function:

$$\operatorname{probit}(p) = \sqrt{2} \cdot \operatorname{erf}^{-1}(2p - 1).$$