
Jackknife resampling

from Wikipedia
Schematic of jackknife resampling

In statistics, the jackknife (jackknife cross-validation) is a cross-validation technique and, therefore, a form of resampling. It is especially useful for bias and variance estimation. The jackknife pre-dates other common resampling methods such as the bootstrap. Given a sample of size $n$, a jackknife estimator can be built by aggregating the parameter estimates from each subsample of size $n-1$ obtained by omitting one observation.[1] The jackknife is a linear approximation of the bootstrap.[2]

The jackknife technique was developed by Maurice Quenouille (1924–1973) from 1949 and refined in 1956. John Tukey expanded on the technique in 1958 and proposed the name "jackknife" because, like a physical jack-knife (a compact folding knife), it is a rough-and-ready tool that can improvise a solution for a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool.[2]

A simple example: mean estimation


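The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset, calculating the parameter estimate over the remaining observations, and then aggregating these calculations.

For example, if the parameter to be estimated is the population mean $\mu$ of a random variable $X$, then for a given set of i.i.d. observations $x_1, \ldots, x_n$ the natural estimator is the sample mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i \in [n]} x_i,$$

where the last sum uses another way to indicate that the index $i$ runs over the set $[n] = \{1, \ldots, n\}$.

Then we proceed as follows: for each $i \in [n]$ we compute the mean $\bar{x}_{(i)}$ of the jackknife subsample consisting of all but the $i$-th data point; this is called the $i$-th jackknife replicate:

$$\bar{x}_{(i)} = \frac{1}{n-1}\sum_{j \in [n],\, j \neq i} x_j, \qquad i = 1, \ldots, n.$$

It can help to think of these jackknife replicates as approximating the distribution of the sample mean $\bar{x}$; a larger $n$ improves the approximation. Finally, to get the jackknife estimator, the jackknife replicates are averaged:

$$\bar{x}_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^{n}\bar{x}_{(i)}.$$

One may ask about the bias and the variance of $\bar{x}_{\mathrm{jack}}$. From its definition as the average of the jackknife replicates, one could try to calculate them explicitly. The bias is a trivial calculation, but the variance of $\bar{x}_{\mathrm{jack}}$ is more involved since the jackknife replicates are not independent.

For the special case of the mean, one can show explicitly that the jackknife estimate equals the usual estimate. Since $\bar{x}_{(i)} = (n\bar{x} - x_i)/(n-1)$,

$$\bar{x}_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^{n}\frac{n\bar{x} - x_i}{n-1} = \frac{n\bar{x} - \bar{x}}{n-1} = \bar{x}.$$

This establishes the identity $\bar{x}_{\mathrm{jack}} = \bar{x}$. Taking expectations gives $E[\bar{x}_{\mathrm{jack}}] = E[\bar{x}] = E[X]$, so $\bar{x}_{\mathrm{jack}}$ is unbiased, while taking variances gives $\operatorname{Var}[\bar{x}_{\mathrm{jack}}] = \operatorname{Var}[\bar{x}] = \operatorname{Var}[X]/n$. However, these properties do not generally hold for parameters other than the mean.

This simple example for the case of mean estimation is just to illustrate the construction of a jackknife estimator, while the real subtleties (and the usefulness) emerge for the case of estimating other parameters, such as higher moments than the mean or other functionals of the distribution.

The difference $\bar{x}_{\mathrm{jack}} - \bar{x}$ could be used to construct an empirical estimate of the bias of $\bar{x}$, namely $\widehat{\operatorname{bias}}(\bar{x})_{\mathrm{jack}} = c\,(\bar{x}_{\mathrm{jack}} - \bar{x})$ with some suitable factor $c > 0$. In this case we already know that $\bar{x}_{\mathrm{jack}} = \bar{x}$, so this construction adds no meaningful knowledge, but it does give the correct estimate of the bias (which is zero).

A jackknife estimate of the variance of $\bar{x}$ can be calculated from the variance of the jackknife replicates $\bar{x}_{(i)}$:[3][4]

$$\widehat{\operatorname{Var}}(\bar{x})_{\mathrm{jack}} = \frac{n-1}{n}\sum_{i=1}^{n}\left(\bar{x}_{(i)} - \bar{x}_{\mathrm{jack}}\right)^2 = \frac{1}{n(n-1)}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$

The left equality defines the estimator and the right equality is an identity that can be verified directly, using $\bar{x}_{\mathrm{jack}} = \bar{x}$ and $\bar{x}_{(i)} - \bar{x} = (\bar{x} - x_i)/(n-1)$. Taking expectations then gives $E[\widehat{\operatorname{Var}}(\bar{x})_{\mathrm{jack}}] = \operatorname{Var}(X)/n = \operatorname{Var}(\bar{x})$, so this is an unbiased estimator of the variance of $\bar{x}$.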
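The identities for the mean can be checked numerically. The sketch below assumes NumPy; the function names and the five-point sample are made up for illustration. It uses the shortcut $\bar{x}_{(i)} = (n\bar{x} - x_i)/(n-1)$ so that all $n$ replicates cost $O(n)$ in total, compares it with brute-force deletion, and confirms that $\bar{x}_{\mathrm{jack}} = \bar{x}$ and that the jackknife variance equals $s^2/n$ (with $s^2$ the $n-1$ denominator sample variance, `np.var(..., ddof=1)`):

```python
import numpy as np

def loo_means(x):
    """All leave-one-out means via the shortcut (n*xbar - x_i) / (n - 1)."""
    x = np.asarray(x, dtype=float)
    return (x.sum() - x) / (x.size - 1)

def jackknife_var_of_mean(x):
    """(n-1)/n * sum((xbar_(i) - xbar_jack)^2) over the leave-one-out means."""
    x = np.asarray(x, dtype=float)
    n = x.size
    reps = loo_means(x)
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2)

x = np.array([2.0, 4.0, 7.0, 1.0, 6.0])     # made-up sample, xbar = 4
# Brute force: drop the i-th point and take the ordinary mean.
brute = np.array([np.delete(x, i).mean() for i in range(x.size)])
print(np.allclose(loo_means(x), brute))                  # True
print(np.isclose(loo_means(x).mean(), x.mean()))         # True: xbar_jack = xbar
print(np.isclose(jackknife_var_of_mean(x),
                 np.var(x, ddof=1) / x.size))            # True: equals s^2 / n
```

The shortcut only works because the mean is linear; for a general estimator each replicate needs a full recomputation.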

Estimating the bias of an estimator


The jackknife technique can be used to estimate (and correct) the bias of an estimator calculated over the entire sample.

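Suppose $\theta$ is the target parameter of interest, which is assumed to be some functional of the distribution of $X$. Based on a finite set of observations $x_1, \ldots, x_n$, which is assumed to consist of i.i.d. copies of $X$, the estimator $\hat{\theta}$ is constructed:

$$\hat{\theta} = f_n(x_1, \ldots, x_n).$$

The value of $\hat{\theta}$ is sample-dependent, so this value will change from one random sample to another.

By definition, the bias of $\hat{\theta}$ is as follows:

$$\operatorname{bias}(\hat{\theta}) = E[\hat{\theta}] - \theta.$$

One may wish to compute several values of $\hat{\theta}$ from several samples and average them to calculate an empirical approximation of $E[\hat{\theta}]$, but this is impossible when there are no "other samples" because the entire set of available observations was used to calculate $\hat{\theta}$. In this kind of situation the jackknife resampling technique may help.

We construct the jackknife replicates:

$$\hat{\theta}_{(1)}, \hat{\theta}_{(2)}, \ldots, \hat{\theta}_{(n)},$$

where each replicate is a "leave-one-out" estimate based on the jackknife subsample consisting of all but one of the data points:

$$\hat{\theta}_{(i)} = f_{n-1}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), \qquad i = 1, \ldots, n.$$

Then we define their average:

$$\hat{\theta}_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}.$$

The jackknife estimate of the bias of $\hat{\theta}$ is given by:

$$\widehat{\operatorname{bias}}(\hat{\theta})_{\mathrm{jack}} = (n-1)\left(\hat{\theta}_{\mathrm{jack}} - \hat{\theta}\right),$$

and the resulting bias-corrected jackknife estimate of $\theta$ is given by:

$$\hat{\theta}_{\mathrm{jack}}^{*} = \hat{\theta} - \widehat{\operatorname{bias}}(\hat{\theta})_{\mathrm{jack}} = n\hat{\theta} - (n-1)\hat{\theta}_{\mathrm{jack}}.$$

This removes the bias in the special case that the bias is exactly proportional to $n^{-1}$, and reduces it to $O(n^{-2})$ in other cases.[2]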
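The recipe above translates directly into code. The following is a minimal NumPy sketch for a one-dimensional sample; the function name, the made-up data, and the example estimator are illustrative choices, not part of the original text. The example uses $\hat{\theta} = \bar{x}^2$ for $\theta = \mu^2$: since $E[\bar{x}^2] = \mu^2 + \sigma^2/n$, its bias is exactly proportional to $1/n$, and the jackknife bias estimate works out to $s^2/n$, so the corrected value $\bar{x}^2 - s^2/n$ is unbiased.

```python
import numpy as np

def jackknife_bias_correct(x, estimator):
    """Return (theta_hat, jackknife bias estimate, bias-corrected estimate)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    theta_hat = estimator(x)                          # f_n(x_1, ..., x_n)
    # Leave-one-out replicates theta_(i) = f_{n-1}(sample without x_i)
    reps = np.array([estimator(np.delete(x, i)) for i in range(n)])
    theta_jack = reps.mean()
    bias = (n - 1) * (theta_jack - theta_hat)
    corrected = n * theta_hat - (n - 1) * theta_jack  # = theta_hat - bias
    return theta_hat, bias, corrected

x = np.array([2.0, 4.0, 7.0, 1.0, 6.0])              # made-up sample
theta_hat, bias, corrected = jackknife_bias_correct(x, lambda s: s.mean() ** 2)
# For xbar^2 the jackknife bias estimate equals s^2 / n.
print(theta_hat, bias, corrected)
```

The estimator is passed as a callable, so the same function applies unchanged to any statistic of a 1-D sample.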

Estimating the variance of an estimator


The jackknife technique can also be used to estimate the variance of an estimator calculated over the entire sample.

from Grokipedia
Jackknife resampling is a nonparametric statistical method for estimating the bias and standard error of an estimator by recomputing it on subsamples formed by systematically leaving out one observation at a time from the original dataset of size $n$, yielding $n$ such subsamples.[1] The technique was first introduced by Maurice H. Quenouille in 1949 as a method to reduce bias in estimators, particularly for serial correlation in time-series data, by averaging estimates from split samples. John W. Tukey expanded and popularized the approach in 1958, naming it the "jackknife" for its versatility as a rough-and-ready tool analogous to a pocket knife, and applied it to variance estimation and confidence intervals for a wide range of statistics. Bradley Efron further developed resampling methods in the late 1970s, highlighting the jackknife's role as a linear approximation to the more general bootstrap technique for bias and variance estimation.[2] In practice, if $\theta_n$ denotes the original estimator from the full sample and $\theta_{(i)}$ the leave-one-out estimator, the jackknife pseudovalues are computed as $\tilde{\theta}_i = n\theta_n - (n-1)\theta_{(i)}$, allowing the bias to be estimated as $\widehat{\text{bias}} = \theta_n - \frac{1}{n}\sum_{i=1}^{n}\tilde{\theta}_i$ and the variance as $\widehat{\text{var}} = \frac{1}{n(n-1)}\sum_{i=1}^{n}(\tilde{\theta}_i - \bar{\tilde{\theta}})^2$, where $\bar{\tilde{\theta}}$ is the mean of the pseudovalues.
This process is computationally cheaper than the bootstrap, requiring only $n$ resamples, and is particularly useful for smooth estimators where higher-order bias terms are negligible.[3] Jackknife resampling finds applications in survey sampling, regression analysis, and hypothesis testing, such as in the National Assessment of Educational Progress (NAEP) for complex variance estimation under multistage designs, and in reducing bias for maximum likelihood estimators or ratios.[4] Its advantages include simplicity, lack of distributional assumptions, and adaptability to grouped or delete-$d$ variants for handling dependencies or large datasets, though it may perform poorly for nonsmooth or highly nonlinear statistics, where the bootstrap is preferred.[1]

Introduction and Fundamentals

Definition and Motivation

Jackknife resampling is a nonparametric statistical technique designed to assess the reliability of estimators by generating multiple subsets from an original sample of size $n$, where each subset omits exactly one observation, resulting in $n$ such subsets.[2] This approach falls within the broader framework of resampling methods in statistics, which emerged to overcome the shortcomings of traditional parametric techniques that often require strong distributional assumptions about the data.[5] For complex estimators, such as those involving medians or ratios, conventional standard error calculations can fail due to the lack of closed-form expressions or reliance on unverified asymptotic normality, particularly in small or moderate sample sizes.[2]

The primary motivation for jackknife resampling lies in its ability to provide empirical approximations of bias and variance without an estimator-specific analytical derivation, which may be unavailable or may rest on large-sample theory that is unreliable for finite datasets.[5] By leveraging the original sample to simulate variability through systematic deletion, it enables statisticians to evaluate estimator performance in scenarios where analytical derivations are infeasible or inaccurate.[2] This method thus bridges the gap between theoretical inference and practical data analysis, offering a practical tool for uncertainty quantification in diverse applications.[5]

Key advantages of the jackknife include its computational simplicity relative to more intensive simulation-based alternatives, as it requires only $n$ recomputations rather than generating numerous random samples.[2] It can also reduce bias in certain estimators: applied to the plug-in sample variance $\frac{1}{n}\sum_i (x_i - \bar{x})^2$, the jackknife correction yields exactly the unbiased estimator $\frac{1}{n-1}\sum_i (x_i - \bar{x})^2$.[5] These features make it particularly appealing for scenarios demanding quick, distribution-free assessments of statistical stability.[2]

Historical Development

The jackknife resampling technique originated in the mid-20th century as a method for bias reduction in statistical estimation. Maurice Quenouille first introduced a preliminary form of the technique in 1949, applying it to reduce bias in estimators of serial correlation by splitting samples into halves and averaging the results. He further refined and generalized the approach in 1956, extending it to bias correction in ratio estimation and other symmetric estimators through a systematic leave-one-out procedure.[6][7] In 1958, John Tukey built upon Quenouille's work by demonstrating the method's utility for variance estimation and confidence interval construction, while coining the term "jackknife" to evoke the versatility and robustness of a pocket knife as a multi-purpose tool. This naming and expansion marked a key milestone, shifting the jackknife from a niche bias-correction tool to a broader resampling strategy applicable to various estimators.[6] The technique gained further prominence through comprehensive reviews and formalizations in the 1970s. Rupert G. Miller Jr. provided an influential survey in 1974, synthesizing early developments and evaluating the jackknife's effectiveness in bias reduction alongside its emerging role in robust inference. Bradley Efron then formalized and extended the method in his seminal 1979 paper, integrating it with variance estimation and laying groundwork for its incorporation into wider resampling frameworks during the 1980s, as computational advances enabled practical implementations.[6][2]

Core Methodology

Basic Procedure

The basic procedure of the delete-one jackknife resampling method involves systematically omitting one observation at a time from the original sample to generate replicates of an estimator. Given an independent and identically distributed sample $\{X_1, \dots, X_n\}$ and an estimator $\hat{\theta}$ computed from the full sample, the process begins by calculating $\hat{\theta}$.[8] For each $i = 1, \dots, n$, the jackknife replicate $\hat{\theta}_{(-i)}$ is then computed by applying the estimator to the reduced sample excluding $X_i$, that is, $\hat{\theta}_{(-i)} = \hat{\theta}(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$.[8] This yields a set of $n$ jackknife replicates $\{\hat{\theta}_{(-i)}\}_{i=1}^{n}$, whose mean is $\bar{\theta} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(-i)}$.

Pseudo-values provide a transformed set of values derived from the replicates, facilitating further statistical analysis by treating them analogously to independent observations. Introduced by Tukey, the pseudo-value for the $i$-th observation is given by

$$J_i = n\hat{\theta} - (n-1)\hat{\theta}_{(-i)}.$$

To see how these pseudo-values recover the original estimator on average, consider their sample mean:

$$\bar{J} = \frac{1}{n}\sum_{i=1}^{n} J_i = \frac{1}{n}\sum_{i=1}^{n}\left[n\hat{\theta} - (n-1)\hat{\theta}_{(-i)}\right] = n\hat{\theta} - (n-1)\bar{\theta}.$$

This expression equals $\hat{\theta}$ precisely when $\bar{\theta} = \hat{\theta}$, which holds for certain estimators such as the sample mean. For the sample mean $\hat{\theta} = \bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j$, the leave-one-out replicate is $\hat{\theta}_{(-i)} = \bar{X}_{(-i)} = \frac{1}{n-1}\sum_{j \neq i} X_j = \frac{n\bar{X} - X_i}{n-1}$. Substituting into the pseudo-value formula gives

$$J_i = n\bar{X} - (n-1)\cdot\frac{n\bar{X} - X_i}{n-1} = n\bar{X} - (n\bar{X} - X_i) = X_i.$$

Thus, the pseudo-values are exactly the original observations $\{X_1, \dots, X_n\}$, and their average $\bar{J} = \bar{X} = \hat{\theta}$ recovers the original estimator exactly.[9] In general, $\bar{J}$ is a bias-corrected version of $\hat{\theta}$ and is close to $\hat{\theta}$ only when the average of the leave-one-out estimates closely matches the full-sample value.[9]

Computationally, the delete-one jackknife requires $n+1$ evaluations of the estimator $\hat{\theta}$: one for the full sample and $n$ for the reduced samples. Each reduced-sample computation typically costs nearly as much as the full one, so the overall cost is $O(n)$ times the cost of a single evaluation of $\hat{\theta}$.[9] This can be reduced by reusing computations where possible, especially for estimators like means or linear statistics. The following code outlines the procedure:
import numpy as np

def jackknife_replicates(data, estimator):
    """Return the full-sample estimate and the n leave-one-out replicates."""
    data = np.asarray(data)
    n = len(data)
    theta_hat = estimator(data)  # Full-sample estimate
    # Replicate i applies the estimator to the data with observation (row) i removed
    replicates = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    return theta_hat, replicates

# Optional: compute pseudo-values J_i = n * theta_hat - (n - 1) * replicates[i]
def pseudo_values(theta_hat, replicates, n):
    return n * theta_hat - (n - 1) * np.asarray(replicates)
This structure allows straightforward extension to variance or bias estimation using the replicates or pseudo-values.[9]
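As a sanity check on the derivation that the pseudo-values of the sample mean are the observations themselves, the following self-contained sketch (NumPy assumed, data made up) compares $J_i$ with $X_i$ directly and confirms that $\bar{J} = \hat{\theta}$:

```python
import numpy as np

X = np.array([3.0, 1.0, 4.0, 1.0, 5.0])           # made-up sample
n = X.size
theta_hat = X.mean()
# Leave-one-out replicates theta_(-i) and Tukey pseudo-values J_i
reps = np.array([np.delete(X, i).mean() for i in range(n)])
J = n * theta_hat - (n - 1) * reps
print(np.allclose(J, X))                          # True: J_i = X_i for the mean
print(np.isclose(J.mean(), theta_hat))            # True: J-bar recovers theta-hat
```

For a nonlinear estimator the same two lines generally give `False` for the first check, and $\bar{J}$ then differs from $\hat{\theta}$ by the jackknife bias correction.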

Pseudo-Values and Jackknife Estimates

In jackknife resampling, pseudo-values are derived from the full-sample estimator $\hat{\theta}$ and the leave-one-out estimators $\hat{\theta}_{-i}$, where $\hat{\theta}_{-i}$ is computed by omitting the $i$-th observation from the sample of size $n$. The $i$-th pseudo-value is given by

$$J_i = n\hat{\theta} - (n-1)\hat{\theta}_{-i},$$

for $i = 1, \dots, n$. This formulation, introduced by Quenouille for bias reduction and formalized by Tukey, isolates the contribution of each observation to the overall estimate.[10] The average of the pseudo-values is $\bar{J} = \frac{1}{n}\sum_{i=1}^{n} J_i$. Substituting the definition yields

$$\bar{J} = n\hat{\theta} - (n-1)\bar{\hat{\theta}}_{-},$$

where $\bar{\hat{\theta}}_{-} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{-i}$ is the average of the leave-one-out estimators. For linear statistics, such as the sample mean, the leave-one-out estimators satisfy $\bar{\hat{\theta}}_{-} = \hat{\theta}$, so $\bar{J} = \hat{\theta}$. This equality holds because a linear estimator treats every observation symmetrically, so averaging the leave-one-out estimates reproduces the full-sample estimate exactly.

Pseudo-values can be interpreted as "leave-one-in" contributions, representing the influence of the $i$-th observation on $\hat{\theta}$ as if it were the sole contributor in a reweighted sense. This perspective treats $\{J_1, \dots, J_n\}$ as an approximately independent sample whose mean estimates $\theta$ and whose spread reflects the sampling variability of $\hat{\theta}$ (each $J_i$ has variance of roughly $n\operatorname{Var}(\hat{\theta})$), enabling further statistical analysis on this transformed data. For instance, the jackknife point estimate is $\bar{J}$, and the standard error of $\hat{\theta}$ is the sample standard deviation of the $J_i$ divided by $\sqrt{n}$. For a function $g(\theta)$, the same construction is applied to the estimator $g(\hat{\theta})$, forming pseudo-values from $g(\hat{\theta})$ and the $g(\hat{\theta}_{-i})$.

Key properties of pseudo-values include unbiasedness for linear statistics: for the sample mean the $J_i$ recover the original observations exactly, preserving the unbiasedness of the estimator. Additionally, they provide finite-sample bias corrections by design, reducing the order of bias from $O(1/n)$ to $O(1/n^2)$ for smooth functions; the correction is exact only when the bias is exactly proportional to $1/n$, as for the plug-in variance $\frac{1}{n}\sum_i (X_i - \bar{X})^2$.[10]
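To illustrate the pseudo-value summaries, here is a minimal sketch assuming NumPy; the function name, the made-up data, and the choice of the coefficient of variation as a nonlinear test statistic are all illustrative. It returns $\bar{J}$ and $\mathrm{sd}(J)/\sqrt{n}$, and checks that this standard error matches the one computed from the leave-one-out replicates, since $J_i - \bar{J} = -(n-1)(\hat{\theta}_{-i} - \bar{\hat{\theta}}_{-})$:

```python
import numpy as np

def pseudo_value_summary(x, estimator):
    """Jackknife point estimate J-bar and standard error sd(J)/sqrt(n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    theta_hat = estimator(x)
    reps = np.array([estimator(np.delete(x, i)) for i in range(n)])
    J = n * theta_hat - (n - 1) * reps                # pseudo-values
    return J.mean(), J.std(ddof=1) / np.sqrt(n), reps

x = np.array([2.0, 4.0, 7.0, 1.0, 6.0])              # made-up sample
cv = lambda s: s.std(ddof=1) / s.mean()               # coefficient of variation
est, se, reps = pseudo_value_summary(x, cv)
n = x.size
# Same standard error computed from the replicates directly
se_reps = np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))
print(est, se, np.isclose(se, se_reps))
```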

Estimation Techniques

Bias Correction

In jackknife resampling, bias correction is achieved by estimating the bias of an estimator $\hat{\theta}$ and subtracting it to obtain a corrected version, with pseudo-values serving as the computational basis for deriving these estimates.[11] The jackknife estimate of bias, denoted $\hat{b}$, is given by $\hat{b} = (n-1)\left[\bar{\hat{\theta}}_{(-i)} - \hat{\theta}\right]$, where $n$ is the sample size, $\hat{\theta}$ is the estimator computed from the full sample, and $\bar{\hat{\theta}}_{(-i)}$ denotes the average over $i$ of the $n$ leave-one-out estimators $\hat{\theta}_{(-i)}$, each omitting the $i$-th observation.

To derive this, suppose the bias of an estimator based on $m$ observations has the expansion $E[\hat{\theta}_m] - \theta = b_1/m + b_2/m^2 + O(1/m^3)$. Each leave-one-out estimate uses $m = n-1$ observations, so $E[\bar{\hat{\theta}}_{(-i)}] = \theta + b_1/(n-1) + b_2/(n-1)^2 + O(1/n^3)$. Substituting into the formula gives

$$E[\hat{b}] = (n-1)\left(\frac{b_1}{n-1} - \frac{b_1}{n}\right) + (n-1)\left(\frac{b_2}{(n-1)^2} - \frac{b_2}{n^2}\right) + O(1/n^2) = \frac{b_1}{n} + O(1/n^2),$$

so $\hat{b}$ isolates the leading $b_1/n$ term. The bias-corrected jackknife estimator is then $\tilde{\theta} = \hat{\theta} - \hat{b}$, which simplifies to $\tilde{\theta} = n\hat{\theta} - (n-1)\bar{\hat{\theta}}_{(-i)}$,[11] and the first-order terms cancel, giving $E[\tilde{\theta}] = \theta + O(1/n^2)$. Theoretically, this correction is justified through the asymptotic expansion of the bias, assuming the estimator is consistent and the data are independent with finite moments. Under these conditions, the jackknife reduces the bias from $O(1/n)$ to $O(1/n^2)$.
It is particularly effective for mildly biased estimators such as the plug-in sample variance $\frac{1}{n}\sum (x_i - \bar{x})^2$, whose bias is exactly $-\sigma^2/n$: here the correction removes the bias entirely and returns the usual unbiased estimator $\frac{1}{n-1}\sum (x_i - \bar{x})^2$. In general, however, the jackknife removes only the first-order term and does not address higher-order biases.
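The plug-in variance case can be checked directly. A minimal sketch (NumPy assumed; the data and the helper name `plugin_var` are made up) applies the delete-one bias correction $n\hat{\theta} - (n-1)\bar{\hat{\theta}}_{(-i)}$ to $\frac{1}{n}\sum_i (x_i - \bar{x})^2$ and compares the result with the $n-1$ denominator sample variance:

```python
import numpy as np

def plugin_var(s):
    """Biased plug-in variance (1/n) * sum((x_i - xbar)^2)."""
    return np.mean((s - s.mean()) ** 2)

x = np.array([2.0, 4.0, 7.0, 1.0, 6.0])              # made-up sample
n = x.size
theta_hat = plugin_var(x)
reps = np.array([plugin_var(np.delete(x, i)) for i in range(n)])
corrected = n * theta_hat - (n - 1) * reps.mean()     # jackknife bias correction
print(theta_hat, corrected, np.var(x, ddof=1))        # corrected matches ddof=1
```

Here the correction is exact because the bias has no higher-order terms; for most estimators only the $1/n$ term is removed.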

Variance Estimation

The jackknife method provides an estimate of the variance of a point estimator $\hat{\theta}$ derived from a sample of size $n$ by computing leave-one-out replicates $\hat{\theta}_{-i}$ for $i = 1, \dots, n$, where each $\hat{\theta}_{-i}$ is the estimator based on the sample excluding the $i$-th observation. The jackknife variance estimator is then given by

$$\hat{v} = \frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{-i} - \bar{\theta}\right)^2,$$

where $\bar{\theta} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{-i}$ is the average of the leave-one-out estimates.[12] This formula, introduced by Tukey, multiplies the sum of squared deviations of the replicates by $\frac{n-1}{n}$, a much larger factor than the $\frac{1}{n-1}$ of an ordinary sample variance. The inflation compensates for the replicates being far less variable than independent estimates, since any two of them share $n-2$ observations.[10] For linear statistics like the sample mean, the jackknife variance coincides exactly with the conventional estimator $\hat{\sigma}^2/n$, where $\hat{\sigma}^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2$ is the sample variance.[12]

An equivalent formulation expresses the variance in terms of pseudo-values $J_i = n\hat{\theta} - (n-1)\hat{\theta}_{-i}$, which can be interpreted as adjusted contributions of each observation to the estimator. The average pseudo-value $\bar{J} = \frac{1}{n}\sum_{i=1}^{n} J_i = n\hat{\theta} - (n-1)\bar{\theta}$ is the bias-corrected estimate (it equals $\hat{\theta}$ for linear statistics), and the variance is

$$\hat{v} = \frac{1}{n(n-1)}\sum_{i=1}^{n}(J_i - \bar{J})^2.$$

The two forms agree because $J_i - \bar{J} = -(n-1)(\hat{\theta}_{-i} - \bar{\theta})$. This representation derives from viewing the pseudo-values as approximately independent replicates with variance $n\operatorname{Var}(\hat{\theta})$, so that their sample variance, divided by $n$, estimates $\operatorname{Var}(\hat{\theta})$.[12] The estimator is consistent under regularity conditions for smooth (differentiable) functions of the data.[2]

The estimated variance $\hat{v}$ quantifies the uncertainty in $\hat{\theta}$, enabling computation of the standard error $\widehat{se} = \sqrt{\hat{v}}$, which measures the precision of the point estimate on the scale of the estimator. This standard error facilitates hypothesis testing, such as t-tests comparing $\hat{\theta}$ to a null value $\theta_0$, by standardizing the statistic $(\hat{\theta} - \theta_0)/\widehat{se}$.[12] However, the jackknife variance estimator can be unreliable for non-smooth estimators such as the sample median, where it is not even consistent.[13]
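A short sketch of the variance formula and the standardized statistic, assuming NumPy. The paired data, the ratio-of-means statistic $\bar{y}/\bar{x}$, and the null value $\theta_0 = 2$ are made-up illustrations; rows of the data array are treated as observations so that paired data are deleted together:

```python
import numpy as np

def jackknife_se(data, estimator):
    """Jackknife SE: sqrt((n-1)/n * sum((theta_(-i) - theta_bar)^2)).
    Rows of `data` are observations (1-D data are treated as n scalars)."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    reps = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))

# Made-up paired observations (x_i, y_i); statistic: ratio of means ybar / xbar.
pairs = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]])
ratio = lambda d: d[:, 1].mean() / d[:, 0].mean()
theta_hat = ratio(pairs)
se = jackknife_se(pairs, ratio)
theta_0 = 2.0                                    # hypothetical null value
print(theta_hat, se, (theta_hat - theta_0) / se)
```

For the sample mean the same function returns $s/\sqrt{n}$, matching the closed form given above.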

Practical Examples

Univariate Mean Estimation

Jackknife resampling provides a practical illustration in the context of estimating the population mean from a univariate sample $X_1, X_2, \dots, X_n$ drawn independently from an unknown distribution $F$, where the estimator is the sample mean $\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} X_i$.[12] The jackknife procedure generates $n$ replicates by omitting one observation at a time, yielding $\hat{\theta}_{-i} = \frac{1}{n-1}\sum_{j \neq i} X_j$ for $i = 1, \dots, n$.[12] Consider a small dataset with $n = 3$ and values $X = (1, 2, 3)$. The original sample mean is $\hat{\theta} = 2$. The jackknife replicates are computed as follows: omitting $X_1 = 1$ gives $\hat{\theta}_{-1} = (2+3)/2 = 2.5$; omitting $X_2 = 2$ gives $\hat{\theta}_{-2} = (1+3)/2 = 2$; omitting $X_3 = 3$ gives $\hat{\theta}_{-3} = (1+2)/2 = 1.5$. The average of these replicates is $\bar{\hat{\theta}}_{-} = (2.5 + 2 + 1.5)/3 = 2$.[14] Applying the general jackknife formulas to this case, the bias estimate is $\widehat{\text{bias}} = (n-1)(\bar{\hat{\theta}}_{-} - \hat{\theta}) = 2(2 - 2) = 0$.[12] The variance estimate is $\widehat{\text{Var}}(\hat{\theta}) = \frac{n-1}{n}\sum_{i=1}^{n}(\hat{\theta}_{-i} - \bar{\hat{\theta}}_{-})^2 = \frac{2}{3}\left[(2.5-2)^2 + (2-2)^2 + (1.5-2)^2\right] = \frac{2}{3} \times 0.5 = \frac{1}{3}$.[12] These results are summarized in the table below:

| $i$ | $X_i$ | $\hat{\theta}_{-i}$ | $\hat{\theta}_{-i} - \bar{\hat{\theta}}_{-}$ |
|---|---|---|---|
| 1 | 1 | 2.5 | 0.5 |
| 2 | 2 | 2 | 0 |
| 3 | 3 | 1.5 | -0.5 |

This example highlights that for linear statistics like the sample mean, the jackknife bias estimate is exactly zero and the variance estimate matches the classical formula $\frac{1}{n(n-1)}\sum_{i=1}^{n}(X_i - \hat{\theta})^2$.[14]
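The worked example can be reproduced in a few lines. This sketch assumes NumPy and uses exactly the data $X = (1, 2, 3)$ from the text; it also computes the classical formula for comparison:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
n = X.size
theta_hat = X.mean()                                          # 2.0
reps = np.array([np.delete(X, i).mean() for i in range(n)])   # 2.5, 2.0, 1.5
bias = (n - 1) * (reps.mean() - theta_hat)                    # 0.0
var = (n - 1) / n * np.sum((reps - reps.mean()) ** 2)         # 1/3 (up to rounding)
classical = np.sum((X - theta_hat) ** 2) / (n * (n - 1))      # also 1/3
print(reps, bias, var, classical)
```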

Linear Regression Coefficients

In linear regression, the model is specified as $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{Y}$ is an $n \times 1$ response vector, $\mathbf{X}$ is an $n \times p$ design matrix of predictors, $\boldsymbol{\beta}$ is a $p \times 1$ vector of unknown coefficients, and $\boldsymbol{\varepsilon}$ is an error vector with mean zero. The ordinary least-squares estimator is $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$.[15] The jackknife procedure for coefficients involves refitting the model $n$ times, each time omitting one observation to obtain $\hat{\boldsymbol{\beta}}_{-i}$ for $i = 1, \dots, n$. Focusing on a single coefficient for simplicity, such as the slope $\hat{\beta}_j$, the leave-one-out estimates $\hat{\beta}_{j,-i}$ are used to compute pseudo-values $\beta_{j,i}^{*} = n\hat{\beta}_j - (n-1)\hat{\beta}_{j,-i}$. The jackknife estimate of $\beta_j$ is the average of these pseudo-values, which corrects for bias as $\hat{\beta}_j^{JK} = n\hat{\beta}_j - (n-1)\bar{\hat{\beta}}_{j,(\cdot)}$, where $\bar{\hat{\beta}}_{j,(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat{\beta}_{j,-i}$. The estimated bias is $(n-1)(\bar{\hat{\beta}}_{j,(\cdot)} - \hat{\beta}_j)$, and the variance is $\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{n-1}{n}\sum_{i=1}^{n}(\hat{\beta}_{j,-i} - \bar{\hat{\beta}}_{j,(\cdot)})^2$. This delete-one approach extends the univariate case by requiring full model refits at each step.[15] For illustration, consider a simulated dataset with $n = 10$ observations and $p = 2$ predictors, where the full-sample slope for the first predictor is $\hat{\beta}_1 = 2.0$. The leave-one-out slope estimates $\hat{\beta}_{1,-i}$ are shown in the table below, along with deviations from their average $\bar{\hat{\beta}}_{1,(\cdot)} = 2.0$.
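| $i$ | $\hat{\beta}_{1,-i}$ | Deviation from average |
|---|---|---|
| 1 | 1.95 | -0.05 |
| 2 | 2.05 | 0.05 |
| 3 | 1.90 | -0.10 |
| 4 | 2.10 | 0.10 |
| 5 | 2.00 | 0.00 |
| 6 | 1.85 | -0.15 |
| 7 | 2.15 | 0.15 |
| 8 | 2.00 | 0.00 |
| 9 | 1.95 | -0.05 |
| 10 | 2.05 | 0.05 |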
The jackknife bias estimate is $(n-1)(2.0 - 2.0) = 9 \times 0 = 0$. The sum of squared deviations is 0.075, yielding a variance estimate of $\frac{9}{10} \times 0.075 = 0.0675$ (standard error ≈ 0.26). Observations 6 and 7 show the largest deviations, indicating potential influence.[15] A key insight from the jackknife in this context is its ability to handle leverage points. With $\mathbf{x}_i^T$ denoting the $i$-th row of $\mathbf{X}$, the leverage of the $i$-th observation is $w_i = \mathbf{x}_i^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i$, and a weighted delete-one variance incorporates these leverages as $V_{J,1}(\hat{\boldsymbol{\beta}}) = \sum_{i=1}^{n}(1 - w_i)(\hat{\boldsymbol{\beta}}_{-i} - \hat{\boldsymbol{\beta}})(\hat{\boldsymbol{\beta}}_{-i} - \hat{\boldsymbol{\beta}})^T$. Large values of $|\hat{\beta}_{j,-i} - \hat{\beta}_j|$ flag influential observations that disproportionately affect the coefficient estimates.[15]
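A sketch of the delete-one jackknife for OLS coefficients, assuming NumPy and refitting with `np.linalg.lstsq`. The function name, the synthetic data, and the seed are illustrative and do not reproduce the simulated dataset behind the table. It returns the leave-one-out coefficients, the jackknife bias and covariance, and the leverage-weighted $V_{J,1}$; it also re-checks the arithmetic of the table (note that printed indices are 0-based):

```python
import numpy as np

def jackknife_ols(X, y):
    """Delete-one jackknife for OLS coefficients.

    Returns beta_hat, leave-one-out estimates (n x p), jackknife bias,
    the jackknife covariance (n-1)/n * sum (b_-i - b_bar)(b_-i - b_bar)^T,
    and the leverage-weighted V_J1 = sum (1 - w_i)(b_-i - b)(b_-i - b)^T."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = X.shape[0]
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    loo = np.array([
        np.linalg.lstsq(np.delete(X, i, axis=0), np.delete(y, i), rcond=None)[0]
        for i in range(n)
    ])
    loo_mean = loo.mean(axis=0)
    bias = (n - 1) * (loo_mean - beta_hat)
    dev = loo - loo_mean
    cov_jack = (n - 1) / n * dev.T @ dev
    # Leverages w_i = x_i^T (X^T X)^{-1} x_i, the diagonal of the hat matrix
    w = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    d = loo - beta_hat
    cov_wu = (d * (1 - w)[:, None]).T @ d
    return beta_hat, loo, bias, cov_jack, cov_wu

# Synthetic data: intercept plus one predictor, true slope 2.
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
beta_hat, loo, bias, cov_jack, cov_wu = jackknife_ols(X, y)
print("slope:", beta_hat[1], "jackknife SE:", np.sqrt(cov_jack[1, 1]))
# Most influential observation for the slope (largest |b_1,-i - b_1|)
print("most influential:", np.argmax(np.abs(loo[:, 1] - beta_hat[1])))

# Arithmetic check of the table: variance from the ten listed replicates
table = np.array([1.95, 2.05, 1.90, 2.10, 2.00, 1.85, 2.15, 2.00, 1.95, 2.05])
print(9 / 10 * np.sum((table - table.mean()) ** 2))   # approximately 0.0675
```

Refitting $n$ times is the direct approach; for OLS the leave-one-out coefficients can also be obtained from the residuals and leverages without refitting, which is how the weighted form can be checked.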

Extensions and Applications

Generalized Jackknife

The generalized jackknife extends the standard delete-one procedure by modifying the resampling strategy to address limitations in bias reduction and variance estimation for certain statistics. One prominent variant is the delete-$d$ jackknife, where $d$ observations (with $d > 1$) are omitted at a time instead of a single observation, generating $\binom{n}{d}$ replicates from a sample of size $n$.[16] This approach adjusts the bias and variance formulas relative to the delete-one case. The delete-$d$ jackknife is particularly useful when the delete-one jackknife gives poor or inconsistent variance estimates, as for non-smooth statistics such as the sample median, provided $d$ is chosen appropriately relative to $n$ (for the median, $d$ must grow faster than $\sqrt{n}$ while $n - d \to \infty$). For instance, in regression settings, selecting $d > 1$ helps mitigate underestimation of variability in heteroscedastic models without requiring parametric assumptions.[17] The variance is estimated as $\widehat{\operatorname{Var}}_{J(d)} = \frac{n-d}{d\binom{n}{d}}\sum_{s}\left(\hat{\theta}_{(s)} - \bar{\theta}\right)^2$, where the sum runs over all $\binom{n}{d}$ subsets $s$ of deleted observations, $\hat{\theta}_{(s)}$ is the estimate with subset $s$ removed, and $\bar{\theta}$ is the average of these estimates; for $d = 1$ it reduces to the delete-one formula. It is consistent for linear and smooth nonlinear statistics. Pseudo-values are less commonly defined for $d > 1$ due to the large number of replicates.
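The delete-$d$ variance formula can be checked on a small example by enumerating every subset with `itertools.combinations`, which is feasible only for small $n$ (in practice one would typically sample subsets at random). The sketch below, with an illustrative function name and made-up data, confirms that $d = 1$ reproduces the delete-one estimate and that, for the sample mean, every $d$ gives $s^2/n$:

```python
import numpy as np
from itertools import combinations
from math import comb

def delete_d_jackknife_var(x, estimator, d):
    """(n - d) / (d * C(n, d)) * sum_s (theta_(s) - theta_bar)^2 over all
    C(n, d) ways of deleting d observations (exhaustive, so small n only)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    reps = np.array([estimator(np.delete(x, list(drop)))
                     for drop in combinations(range(n), d)])
    return (n - d) / (d * comb(n, d)) * np.sum((reps - reps.mean()) ** 2)

x = np.array([2.0, 4.0, 7.0, 1.0, 6.0, 3.0, 9.0])    # made-up sample
for d in (1, 2, 3):
    print(d, delete_d_jackknife_var(x, np.mean, d))   # same value for each d
print(np.var(x, ddof=1) / x.size)                     # s^2 / n
print(delete_d_jackknife_var(x, np.median, 3))        # a non-smooth statistic
```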
Other generalizations include the infinitesimal jackknife, which approximates the finite delete-one jackknife through differential calculus for asymptotic expansions, treating observation deletion as an infinitesimal perturbation to derive influence functions and variance estimates efficiently for complex estimators like those in machine learning ensembles.[5] This method aligns with the delta method and is especially valuable for large-scale computations where full resampling is infeasible.[18] Additionally, the block jackknife adapts the technique for dependent data, such as time series, by omitting contiguous blocks of l observations rather than individual points, preserving serial correlations while estimating variance through the sample variance of block-deleted replicates.[19]
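A sketch of a delete-a-block jackknife, under stated simplifying assumptions: blocks of length $l$ (`block_len`) do not overlap, $n$ is a multiple of $l$, and with $g = n/l$ blocks the variance is $\frac{g-1}{g}\sum_j (\hat{\theta}_{(j)} - \bar{\theta})^2$, the delete-one formula applied to blocks. Moving-block versions with overlapping blocks use a different scaling and are not shown. The AR(1) series is simulated purely for illustration (NumPy assumed):

```python
import numpy as np

def block_jackknife_var(x, estimator, block_len):
    """Delete-a-block jackknife variance with g = n / block_len
    non-overlapping blocks: (g - 1)/g * sum_j (theta_(j) - theta_bar)^2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n % block_len:
        raise ValueError("len(x) must be a multiple of block_len")
    g = n // block_len
    reps = np.array([
        estimator(np.delete(x, np.arange(j * block_len, (j + 1) * block_len)))
        for j in range(g)
    ])
    return (g - 1) / g * np.sum((reps - reps.mean()) ** 2)

# Simulated AR(1) series x_t = 0.7 x_{t-1} + e_t (positive serial correlation).
rng = np.random.default_rng(1)
e = rng.normal(size=200)
x = np.empty(200)
x[0] = e[0]
for t in range(1, 200):
    x[t] = 0.7 * x[t - 1] + e[t]
# block_len = 1 is the ordinary delete-one jackknife, which ignores the
# correlation; longer blocks keep neighboring observations together.
print(block_jackknife_var(x, np.mean, 1), block_jackknife_var(x, np.mean, 20))
```

With positive autocorrelation the block estimate is typically the larger of the two, reflecting variability that the delete-one version misses.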

Use in Confidence Intervals

Jackknife resampling contributes to confidence interval construction by leveraging pseudo-values and variance estimates to approximate the sampling distribution of a parameter estimator $\hat{\theta}$. The pseudo-values $PV_i = n\hat{\theta} - (n-1)\hat{\theta}_{(i)}$ for $i = 1, \dots, n$, where $\hat{\theta}_{(i)}$ is the estimator omitting the $i$-th observation, behave roughly like $n$ independent observations whose mean is the bias-corrected estimate and whose variance is about $n\operatorname{Var}(\hat{\theta})$. A percentile method applied directly to the pseudo-values is sometimes described, but because of this factor of $n$ the $\alpha/2$ and $1-\alpha/2$ percentiles of the $PV_i$ describe the spread of individual pseudo-values (for the sample mean, of the observations themselves) and are far too wide to serve as a $(1-\alpha)$ confidence interval for $\theta$; the pseudo-values are better used through their mean and standard error. Studentized jackknife intervals do this by incorporating the jackknife estimate of standard error $\widehat{\mathrm{se}}$. The interval is constructed as $\hat{\theta} \pm t_{n-1,1-\alpha/2}\,\widehat{\mathrm{se}}$, where $t_{n-1,1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of Student's $t$-distribution with $n-1$ degrees of freedom, and $\widehat{\mathrm{se}}$ is derived from the variability among the delete-one estimators as $\widehat{\mathrm{se}} = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}(\hat{\theta}_{(i)} - \bar{\theta}_{\mathrm{jack}})^2}$, with $\bar{\theta}_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}$. This method treats $(\hat{\theta} - \theta)/\widehat{\mathrm{se}}$ as an approximately pivotal $t$-statistic, and the heavier-tailed $t$ quantiles widen the interval in small samples. The bias-corrected and accelerated (BCa) interval integrates jackknife estimates into bootstrap procedures for refined percentile intervals. The bias correction $z_0$ is estimated from the $B$ bootstrap replicates $\hat{\theta}^*_b$ as $z_0 = \Phi^{-1}\left(\frac{1}{B}\sum_{b=1}^{B} I(\hat{\theta}^*_b < \hat{\theta})\right)$, where $\Phi^{-1}$ is the standard normal inverse CDF and $I$ is the indicator function.
The acceleration $a$, which measures skewness, is computed from the jackknife replicates as $a = \frac{\sum_{i=1}^{n}(\bar{\theta}_{\mathrm{jack}} - \hat{\theta}_{(i)})^3}{6\left[\sum_{i=1}^{n}(\bar{\theta}_{\mathrm{jack}} - \hat{\theta}_{(i)})^2\right]^{3/2}}$; the same value results from using $PV_i - \overline{PV}$ in place of $\bar{\theta}_{\mathrm{jack}} - \hat{\theta}_{(i)}$, since the two differ only by the factor $n-1$. These adjust the bootstrap percentile limits to $\hat{\theta}^*_{\alpha/2} = G^{-1}\left(\Phi\left(z_0 + \frac{z_0 + z_{\alpha/2}}{1 - a(z_0 + z_{\alpha/2})}\right)\right)$, and similarly for the upper limit with $z_{1-\alpha/2}$, where $G$ is the bootstrap CDF and $z_{\alpha/2} = \Phi^{-1}(\alpha/2)$, yielding intervals that adjust for both bias and skewness. A jackknife-after-bootstrap hybrid assesses the variability of bootstrap quantities used in interval construction, such as standard errors or quantiles: for each $i$, the quantity is recomputed from only those bootstrap resamples that happen not to contain observation $i$, so no second, nested level of resampling is needed. For the univariate mean, the jackknife standard error coincides with the classical $s/\sqrt{n}$, so the studentized jackknife interval reduces to the ordinary $t$ interval. For a small skewed sample, such as $n = 10$ draws from a chi-squared distribution with 1 degree of freedom (mean 1, positively skewed), it therefore differs from the normal-approximation interval only by using $t_9$ rather than normal quantiles, which widens it symmetrically around $\bar{x}$. Such symmetric intervals cannot adapt to skewness, which is what the asymmetric BCa adjustment is designed to address.
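A compact sketch of the BCa recipe described above, assuming NumPy and Python's standard-library `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$; the function names, $B$, and the seeds are illustrative choices. $z_0$ comes from the bootstrap replicates and $a$ from the jackknife replicates. For real analyses, SciPy's `scipy.stats.bootstrap` offers `method='BCa'`.

```python
import numpy as np
from statistics import NormalDist

def jackknife_acceleration(x, estimator):
    """a = sum(u^3) / (6 * sum(u^2)^1.5), with u_i = theta_bar - theta_(i)."""
    x = np.asarray(x, dtype=float)
    reps = np.array([estimator(np.delete(x, i)) for i in range(x.size)])
    u = reps.mean() - reps
    return np.sum(u ** 3) / (6.0 * np.sum(u ** 2) ** 1.5)

def bca_interval(x, estimator, alpha=0.05, B=4000, seed=0):
    """BCa interval: z0 from bootstrap replicates, a from the jackknife."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    theta_hat = estimator(x)
    boot = np.array([estimator(rng.choice(x, size=x.size, replace=True))
                     for _ in range(B)])
    N = NormalDist()
    # Fails if all (or none) of the bootstrap replicates fall below theta_hat.
    z0 = N.inv_cdf(float(np.mean(boot < theta_hat)))
    a = jackknife_acceleration(x, estimator)
    limits = []
    for q in (alpha / 2, 1 - alpha / 2):
        zq = N.inv_cdf(q)
        level = N.cdf(z0 + (z0 + zq) / (1 - a * (z0 + zq)))
        limits.append(float(np.quantile(boot, level)))   # G^{-1}(level)
    return tuple(limits)

rng = np.random.default_rng(1)
x = rng.chisquare(df=1, size=10)          # small, right-skewed sample
print(jackknife_acceleration(x, np.mean), bca_interval(x, np.mean))
```

For the mean, $a$ is proportional to the sample skewness, so it is zero for symmetric data and positive for right-skewed data, which shifts the interval to the right.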

Comparisons and Limitations

Relation to Bootstrap Resampling

Jackknife and bootstrap resampling are both nonparametric techniques for estimating the bias and variance of a statistic by generating multiple approximations of its sampling distribution from the original dataset. The bootstrap, introduced by Bradley Efron in 1979, builds on the jackknife and provides a more general framework; the jackknife can be viewed as a linear approximation to the bootstrap.[18] Both methods require fewer assumptions about the underlying data distribution than traditional parametric approaches, enabling reliable inference in diverse statistical contexts.[20]

A key difference lies in how the resamples are generated. The jackknife is deterministic: it produces exactly $n$ replicates by deleting one observation at a time from a dataset of size $n$. The bootstrap is stochastic: it draws a large number $B \gg n$ of resamples with replacement from the full dataset. The jackknife is therefore faster and less resource-intensive, particularly for moderate sample sizes, while the bootstrap's random resampling gives it broader applicability and typically yields estimates with smaller standard errors. The jackknife performs well for smooth, nearly linear statistics but can be inaccurate for non-smooth ones such as the sample median, where its variance estimator is inconsistent, unlike the bootstrap's.[18][20] The bootstrap's variance estimates also converge faster asymptotically, achieving higher-order accuracy in many cases, whereas the jackknife's delete-one structure offers simplicity and removes bias exactly when the bias is of order $1/n$. For small samples or nearly linear statistics, the jackknife is often preferred for its efficiency and lack of randomness; the bootstrap is better suited to complex statistics or skewed distributions, where more replicates improve precision.[18][20]
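The contrast between the two resampling schemes, and the jackknife's trouble with the median, can be seen in a small pure-Python sketch. The function names are hypothetical and the data are simulated for illustration only; `jackknife_se` always builds exactly n replicates, while `bootstrap_se` depends on the number of resamples and the random seed.

```python
import math
import random
from statistics import fmean, median, stdev


def jackknife_se(data, estimator):
    """Deterministic: exactly n delete-one replicates."""
    n = len(data)
    reps = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    bar = fmean(reps)
    return math.sqrt((n - 1) / n * sum((r - bar) ** 2 for r in reps))


def bootstrap_se(data, estimator, n_boot=2000, seed=0):
    """Stochastic: n_boot resamples drawn with replacement; SE uses B - 1."""
    rng = random.Random(seed)
    reps = [estimator(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return stdev(reps)


def distinct_median_replicates(data):
    """Sorted distinct values among the delete-one medians."""
    return sorted({median(data[:i] + data[i + 1:]) for i in range(len(data))})


if __name__ == "__main__":
    gen = random.Random(42)
    sample = [gen.expovariate(1.0) for _ in range(20)]  # skewed, n even
    for name, stat in [("mean", fmean), ("median", median)]:
        print(name, "jackknife:", jackknife_se(sample, stat),
              "bootstrap:", bootstrap_se(sample, stat))
    # With n even and distinct values, only two delete-one medians occur.
    print("distinct delete-one medians:", len(distinct_median_replicates(sample)))
```

With an even sample size and distinct values, removing any observation from the lower half leaves the upper middle value as the median, and vice versa, so the delete-one medians take only two values. The jackknife variance is built from this coarse two-point spread, which is the root of its inconsistency for the median.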

Assumptions and Drawbacks

The jackknife resampling method assumes that the observed data are independent and identically distributed (i.i.d.) samples from an underlying distribution.[21] This i.i.d. condition is fundamental to the method's nonparametric estimation of bias and variance, and violating it can lead to invalid inferences.[5] In addition, for asymptotic validity the estimator of interest must be a smooth (differentiable) function of the data, which ensures that the method provides a second-order accurate approximation to the sampling distribution.[2] The procedure also requires that the distribution not be heavy-tailed, typically meaning finite second or higher moments, so that the variance being estimated exists.[21]

A key drawback of the jackknife is its inconsistency when estimating the variance of nonsmooth functionals such as the sample median or other quantiles, where the delete-1 jackknife variance estimate does not converge to the true variance.[22] Similarly, it can be inconsistent for U-statistics unless specialized modifications are applied, because the linear approximation inherent in the method breaks down for these nonlinear forms.[17] In small samples, some bias-corrected versions of the jackknife variance estimator can be negative (the basic delete-1 estimator, being a sum of squares, cannot), which undermines their usefulness even though such values are sometimes retained to preserve unbiasedness.[23] The method also performs poorly with dependent data, such as time series, unless it is adapted, for example by block jackknifing, to account for serial correlation.[12] Furthermore, the jackknife is sensitive to outliers: an aberrant observation is retained in all but one of the $n$ replicates, so it influences nearly every replicate and can distort the overall bias and variance estimates.[2] To mitigate these issues, variants such as the delete-$d$ jackknife, in which $d > 1$ observations are omitted per replicate, can restore consistency for certain nonsmooth statistics and reduce the influence of outliers.[24] Robust extensions further address heavy-tailed or skewed distributions.
Simulation studies indicate that in skewed settings, the standard jackknife may underestimate true variance, highlighting the need for these adjustments in non-normal data.[25]
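A minimal sketch of the delete-d variant follows, assuming the usual delete-d variance formula v = (n - d) / (d * N) * sum over S of (theta_S - theta_bar)^2, where N = C(n, d), theta_S is the estimate with the observations in S removed, and theta_bar is the average of all theta_S. With d = 1 it reduces to the delete-one estimator. The function name is hypothetical, and the code enumerates every subset, so it only suits small n.

```python
import math
from itertools import combinations
from statistics import fmean, median


def delete_d_jackknife_variance(data, estimator, d):
    """Delete-d jackknife variance over all C(n, d) subsets of removed indices.

    Computes (n - d) / (d * N) * sum((theta_S - theta_bar)^2), N = C(n, d).
    Enumerates every subset, so it is practical only for small n.
    """
    n = len(data)
    if not 1 <= d < n:
        raise ValueError("need 1 <= d < n")
    estimates = []
    for removed in combinations(range(n), d):
        drop = set(removed)
        kept = [x for i, x in enumerate(data) if i not in drop]
        estimates.append(estimator(kept))
    n_subsets = len(estimates)  # equals math.comb(n, d)
    bar = fmean(estimates)
    return (n - d) / (d * n_subsets) * sum((e - bar) ** 2 for e in estimates)


if __name__ == "__main__":
    sample = [0.3, 1.1, 1.9, 2.4, 3.8, 4.2, 5.5, 7.0, 9.6, 12.1]
    for d in (1, 3, 5):
        se = math.sqrt(delete_d_jackknife_variance(sample, median, d))
        print(f"d = {d}: jackknife SE of the median = {se:.3f}")
```

For the sample mean this formula returns s^2 / n for every d, so the variants differ only for nonlinear statistics such as the median. Consistency results for the median require d to grow with the sample size, which makes d a tuning choice rather than a fixed constant.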

References
