Resampling (statistics)

from Wikipedia

In statistics, resampling is the creation of new samples based on one observed sample. Resampling methods are:

  1. Permutation tests (also re-randomization tests) for generating counterfactual samples
  2. Bootstrapping
  3. Cross validation
  4. Jackknife

Permutation tests


Permutation tests rely on resampling the original data under the assumption that the null hypothesis is true. From the resampled data one can assess how likely the observed data would be if the null hypothesis held.
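
As an illustration, here is a minimal sketch of a Monte Carlo permutation test for a difference in group means, assuming Python with NumPy and hypothetical example data (the group values, the number of permutations, and the two-sided comparison are all illustrative choices, not taken from the text):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical observed data for two groups.
    group_a = np.array([12.1, 9.8, 11.4, 10.9, 12.7])
    group_b = np.array([10.2, 9.1, 9.9, 10.5, 8.8])

    observed = group_a.mean() - group_b.mean()
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)

    n_perm = 10_000
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)      # relabel the data under the null
        diff = shuffled[:n_a].mean() - shuffled[n_a:].mean()
        if abs(diff) >= abs(observed):          # two-sided comparison
            count += 1

    p_value = (count + 1) / (n_perm + 1)        # conservative adjustment
    print(f"observed difference = {observed:.2f}, p ~= {p_value:.4f}")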

Bootstrap

Figure: the bootstrapping method, the best example of the plug-in principle.

Bootstrapping is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter like a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient. It has been called the plug-in principle,[1] as it is the method of estimation of functionals of a population distribution by evaluating the same functionals at the empirical distribution based on a sample.

For example,[1] when estimating the population mean, this method uses the sample mean; to estimate the population median, it uses the sample median; to estimate the population regression line, it uses the sample regression line.
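
For concreteness, the following is a minimal sketch of the nonparametric bootstrap for the standard error and a percentile confidence interval of a sample median, assuming Python with NumPy; the data and the number of resamples B are hypothetical choices:

    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical observed sample.
    sample = np.array([3.2, 4.7, 5.1, 2.9, 6.3, 4.4, 5.8, 3.9, 4.1, 5.5])

    B = 5_000
    boot_medians = np.empty(B)
    for b in range(B):
        # Resample with replacement from the original sample (plug-in principle).
        resample = rng.choice(sample, size=len(sample), replace=True)
        boot_medians[b] = np.median(resample)

    se = boot_medians.std(ddof=1)                                # bootstrap standard error
    ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])   # percentile 95% CI
    print(f"median = {np.median(sample):.2f}, SE = {se:.2f}, "
          f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")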

It may also be used for constructing hypothesis tests. It is often used as a robust alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible or requires very complicated formulas for the calculation of standard errors. Bootstrapping techniques are also used in the updating-selection transitions of particle filters, genetic-type algorithms and related resample/reconfiguration Monte Carlo methods used in computational physics.[2][3] In this context, the bootstrap is used to sequentially replace empirical weighted probability measures by empirical measures, allowing samples with low weights to be replaced by copies of the samples with high weights.

Cross-validation


Cross-validation is a statistical method for validating a predictive model. Subsets of the data are held out for use as validating sets; a model is fit to the remaining data (a training set) and used to predict for the validation set. Averaging the quality of the predictions across the validation sets yields an overall measure of prediction accuracy. Cross-validation is employed repeatedly in building decision trees.

One form of cross-validation leaves out a single observation at a time; this is similar to the jackknife. Another, K-fold cross-validation, splits the data into K subsets; each is held out in turn as the validation set.

This avoids "self-influence". For comparison, in regression analysis methods such as linear regression, each y value draws the regression line toward itself, making the prediction of that value appear more accurate than it really is. Cross-validation applied to linear regression predicts the y value for each observation without using that observation.

This is often used for deciding how many predictor variables to use in regression. Without cross-validation, adding predictors always reduces the residual sum of squares (or possibly leaves it unchanged). In contrast, the cross-validated mean-square error will tend to decrease if valuable predictors are added, but increase if worthless predictors are added.[4]
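
As a rough sketch of how K-fold cross-validation can guide this choice, the following uses plain NumPy with synthetic data; the 5 folds, the least-squares fit, and all variable names are illustrative assumptions rather than a prescribed procedure:

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic data: y depends only on the first two of five predictors.
    n, p = 100, 5
    X = rng.normal(size=(n, p))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)

    def cv_mse(X, y, k=5):
        """Mean squared prediction error of least squares under k-fold CV."""
        idx = rng.permutation(len(y))
        folds = np.array_split(idx, k)
        errors = []
        for fold in folds:
            train = np.setdiff1d(idx, fold)
            A_train = np.column_stack([np.ones(len(train)), X[train]])  # intercept + predictors
            coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
            A_val = np.column_stack([np.ones(len(fold)), X[fold]])
            pred = A_val @ coef
            errors.append(np.mean((y[fold] - pred) ** 2))
        return np.mean(errors)

    # Compare models that use the first m predictors, m = 1..p.
    for m in range(1, p + 1):
        print(f"{m} predictors: CV MSE = {cv_mse(X[:, :m], y):.3f}")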

Monte Carlo cross-validation


Subsampling is an alternative method for approximating the sampling distribution of an estimator. The two key differences to the bootstrap are:

  1. the resample size is smaller than the sample size and
  2. resampling is done without replacement.

The advantage of subsampling is that it is valid under much weaker conditions than the bootstrap. In particular, a set of sufficient conditions is that the rate of convergence of the estimator is known and that the limiting distribution is continuous. In addition, the resample (or subsample) size must tend to infinity together with the sample size, but at a smaller rate, so that their ratio converges to zero. While subsampling was originally proposed for the case of independent and identically distributed (iid) data only, the methodology has been extended to cover time series data as well; in this case, one resamples blocks of subsequent data rather than individual data points. There are many cases of applied interest where subsampling leads to valid inference whereas bootstrapping does not; for example, cases where the rate of convergence of the estimator is not the square root of the sample size, or where the limiting distribution is non-normal. When both subsampling and the bootstrap are consistent, the bootstrap is typically more accurate. RANSAC is a popular algorithm using subsampling.
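
A minimal sketch of subsampling for a sample mean, drawing subsamples of size b without replacement and rescaling by the known square-root convergence rate (Python with NumPy; the data, b, and the number of subsamples are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(7)

    # Synthetic sample from a skewed distribution.
    n = 1_000
    x = rng.exponential(scale=2.0, size=n)

    theta_hat = x.mean()        # full-sample estimate
    b = 50                      # subsample size: b grows with n, but b/n -> 0
    n_sub = 2_000

    # Subsampling: draw WITHOUT replacement, with subsample size smaller than n.
    centered = np.empty(n_sub)
    for j in range(n_sub):
        sub = rng.choice(x, size=b, replace=False)
        # Rescale by the (known) square-root rate of convergence of the mean.
        centered[j] = np.sqrt(b) * (sub.mean() - theta_hat)

    # Approximate 95% confidence interval for the population mean.
    q_low, q_high = np.percentile(centered, [2.5, 97.5])
    ci = (theta_hat - q_high / np.sqrt(n), theta_hat - q_low / np.sqrt(n))
    print(f"mean = {theta_hat:.3f}, subsampling 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")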

Jackknife cross-validation


Jackknifing (jackknife cross-validation) is used in statistical inference to estimate the bias and standard error (variance) of a statistic when a random sample of observations is used to calculate it. Historically, this method preceded the invention of the bootstrap, with Quenouille inventing it in 1949 and Tukey extending it in 1958.[5][6] The method was foreshadowed by Mahalanobis, who in 1946 suggested repeated estimates of the statistic of interest with half the sample chosen at random.[7] He coined the name 'interpenetrating samples' for this method.

Quenouille invented this method with the intention of reducing the bias of the sample estimate. Tukey extended this method by assuming that if the replicates could be considered identically and independently distributed, then an estimate of the variance of the sample parameter could be made and that it would be approximately distributed as a t variate with n−1 degrees of freedom (n being the sample size).

The basic idea behind the jackknife variance estimator is to systematically recompute the statistic, leaving out one or more observations at a time from the sample. From this new set of replicates of the statistic, estimates of the bias and of the variance of the statistic can be calculated. The jackknife is equivalent to leave-one-out (subsampling) cross-validation; it differs only in its goal.[8]
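
As a minimal sketch of the delete-1 jackknife bias and variance estimates (Python with NumPy; the plug-in variance is used as the statistic purely for illustration, since its bias is known, and the data are synthetic):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(loc=5.0, scale=2.0, size=30)   # synthetic sample

    def statistic(data):
        return np.var(data)    # plug-in variance, biased by a factor (n-1)/n

    n = len(x)
    theta_hat = statistic(x)

    # Delete-1 jackknife: recompute the statistic leaving out one point at a time.
    theta_jack = np.array([statistic(np.delete(x, i)) for i in range(n)])
    theta_bar = theta_jack.mean()

    bias = (n - 1) * (theta_bar - theta_hat)                    # jackknife bias estimate
    var = (n - 1) / n * np.sum((theta_jack - theta_bar) ** 2)   # jackknife variance estimate

    print(f"estimate          = {theta_hat:.4f}")
    print(f"bias-corrected    = {theta_hat - bias:.4f}")
    print(f"jackknife std err = {np.sqrt(var):.4f}")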

For many statistical parameters the jackknife estimate of variance tends asymptotically to the true value almost surely. In technical terms one says that the jackknife estimate is consistent. The jackknife is consistent for the sample means, sample variances, central and non-central t-statistics (with possibly non-normal populations), sample coefficient of variation, maximum likelihood estimators, least squares estimators, correlation coefficients and regression coefficients.

It is not consistent for the sample median. In the case of a unimodal variate the ratio of the jackknife variance to the sample variance tends to be distributed as one half the square of a chi square distribution with two degrees of freedom.

Instead of using the jackknife to estimate the variance, it may be applied to the log of the variance. This transformation may yield better estimates, particularly when the distribution of the variance itself is non-normal.

The jackknife, like the original bootstrap, is dependent on the independence of the data. Extensions of the jackknife to allow for dependence in the data have been proposed. One such extension is the delete-a-group method used in association with Poisson sampling.

Comparison of bootstrap and jackknife


Both methods, the bootstrap and the jackknife, estimate the variability of a statistic from the variability of that statistic between subsamples, rather than from parametric assumptions. The bootstrap can be seen as a random approximation of the more general delete-m observations jackknife. Both yield similar numerical results, which is why each can be seen as an approximation to the other. Although there are huge theoretical differences in their mathematical insights, the main practical difference for statistics users is that the bootstrap gives different results when repeated on the same data, whereas the jackknife gives exactly the same result each time. Because of this, the jackknife is popular when the estimates need to be verified several times before publishing (e.g., by official statistics agencies). On the other hand, when this verification feature is not crucial and the interest lies not in a single number but in an idea of its distribution, the bootstrap is preferred (e.g., in studies in physics, economics, and the biological sciences).

Whether to use the bootstrap or the jackknife may depend more on operational aspects than on statistical concerns of a survey. The jackknife, originally used for bias reduction, is more of a specialized method and only estimates the variance of the point estimator. This can be enough for basic statistical inference (e.g., hypothesis testing, confidence intervals). The bootstrap, on the other hand, first estimates the whole distribution (of the point estimator) and then computes the variance from that. While powerful and easy, this can become highly computationally intensive.

"The bootstrap can be applied to both variance and distribution estimation problems. However, the bootstrap variance estimator is not as good as the jackknife or the balanced repeated replication (BRR) variance estimator in terms of the empirical results. Furthermore, the bootstrap variance estimator usually requires more computations than the jackknife or the BRR. Thus, the bootstrap is mainly recommended for distribution estimation."[attribution needed][9]

There is a special consideration with the jackknife, particularly the delete-1 observation jackknife: it should only be used with smooth, differentiable statistics (e.g., totals, means, proportions, ratios, odds ratios, regression coefficients), not with medians or other quantiles. This can become a practical disadvantage, and it is usually the argument favoring bootstrapping over jackknifing. More general jackknives than the delete-1, such as the delete-m jackknife or the delete-all-but-2 Hodges–Lehmann estimator, overcome this problem for medians and quantiles by relaxing the smoothness requirements for consistent variance estimation.

Usually the jackknife is easier to apply to complex sampling schemes than the bootstrap. Complex sampling schemes may involve stratification, multiple stages (clustering), varying sampling weights (non-response adjustments, calibration, post-stratification), and unequal-probability sampling designs. Theoretical aspects of both the bootstrap and the jackknife can be found in Shao and Tu (1995),[10] whereas a basic introduction is given in Wolter (2007).[11] The bootstrap estimate of model prediction bias is more precise than jackknife estimates with linear models such as the linear discriminant function or multiple regression.[12]

Literature

  • Good, P. (2006). Resampling Methods. 3rd ed. Birkhäuser.
  • Wolter, K. M. (2007). Introduction to Variance Estimation. 2nd ed. Springer.
  • Del Moral, Pierre (2004). Feynman–Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Springer, Probability and Applications. ISBN 978-0-387-20268-6.
  • Del Moral, Pierre (2013). Mean Field Simulation for Monte Carlo Integration. Chapman & Hall/CRC Press, Monographs on Statistics and Applied Probability. ISBN 978-1-4665-0405-9.
  • Jiang, W.; Simon, R. (2007). "A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification". Statistics in Medicine 26(29): 5320–5334. doi:10.1002/sim.2968. PMID 17624926. https://brb.nci.nih.gov/techreport/prederr_rev_0407.pdf
from Grokipedia
Resampling methods in statistics are a class of nonparametric techniques that leverage computational power to repeatedly draw samples from an observed dataset, thereby approximating the sampling distribution of a statistic (properties such as its bias, variance, standard error, or confidence intervals) without assuming a specific parametric form for the underlying distribution. These methods, which gained prominence with the rise of digital computing in the mid-20th century, enable robust inference in scenarios where traditional parametric approaches fail due to complex or unknown data structures.

Key techniques include the jackknife, developed by Maurice Quenouille starting in 1949 for bias correction, refined in 1956, and extended by John Tukey in 1958 to estimate variance; it involves recomputing the statistic by omitting one observation at a time from the sample of size n to generate n pseudoreplicates. The bootstrap, pioneered by Bradley Efron in 1979, builds on the jackknife by treating the empirical distribution as a proxy for the population and drawing B resamples (typically B = 1,000 or more) with replacement to simulate variability and construct empirical distributions for estimators. Additional prominent approaches encompass permutation tests, which reshuffle data labels to test hypotheses under exchangeability assumptions, and cross-validation, a resampling strategy for model evaluation that partitions data into subsets to estimate prediction error, such as in k-fold variants where k is often 5 or 10. Widely applied across many scientific fields, resampling methods provide flexible tools for estimation, model evaluation, and hypothesis testing, particularly with high-dimensional or non-normal data.

Fundamentals

Definition and Principles

Resampling in statistics refers to a class of non-parametric methods that involve generating multiple samples from an observed dataset, either with or without replacement, to approximate the sampling distribution of a statistic or estimator without relying on assumptions about an underlying parametric model. This approach enables the estimation of properties such as variance, bias, confidence intervals, or p-values by treating the empirical distribution of the sample as a surrogate for the true population distribution.

The core principles of resampling rest on the idea that the empirical distribution function, defined as \hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \leq x), where I is the indicator function and X_1, \dots, X_n are the observed data points, serves as a reliable proxy for the unknown true distribution F(x). Through repeated simulations, resampling leverages computational power to approximate the behavior predicted by the central limit theorem, providing asymptotic normality or other distributional properties via empirical aggregation rather than analytical derivation. Unlike Monte Carlo methods, which generate samples from hypothetical or parametric models external to the data, resampling strictly utilizes the observed dataset itself to mimic variability under the data-generating process.

The basic workflow of resampling entails drawing a large number of resamples from the original sample, computing the statistic of interest (such as a mean or a regression coefficient) on each resample, and then aggregating the resulting values, often via the sample mean for point estimates or percentiles for interval estimates, to derive inferences. This process is particularly advantageous in scenarios involving small sample sizes or distributions with unknown or complex forms, where traditional parametric assumptions may fail, allowing for robust inference grounded in the data's empirical characteristics.
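
As a concrete illustration of this workflow, here is a short sketch in Python with NumPy; the particular statistic (a trimmed mean), the number of resamples, and the synthetic data are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)

    def ecdf(data):
        """Empirical distribution function F_hat of the observed data."""
        sorted_data = np.sort(data)
        return lambda x: np.searchsorted(sorted_data, x, side="right") / len(data)

    def resample_distribution(data, statistic, n_resamples=2_000):
        """Generic workflow: resample, recompute the statistic, aggregate."""
        n = len(data)
        values = np.empty(n_resamples)
        for b in range(n_resamples):
            values[b] = statistic(rng.choice(data, size=n, replace=True))
        return values

    def trimmed_mean(d):
        return np.mean(np.sort(d)[4:-4])   # drop 4 values from each tail

    # Hypothetical sample; any statistic can be plugged in.
    x = rng.lognormal(mean=1.0, sigma=0.5, size=40)
    dist = resample_distribution(x, trimmed_mean)

    print("point estimate:", dist.mean())
    print("90% interval  :", np.percentile(dist, [5, 95]))
    print("F_hat(3.0)    :", ecdf(x)(3.0))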

Historical Development

The origins of resampling methods in statistics trace back to the early 20th century, with foundational work on randomization in experimental design. In 1935, Ronald A. Fisher introduced randomization tests as a means to assess the significance of experimental outcomes under null hypotheses of no treatment effects, emphasizing the role of randomization in generating reference distributions. This approach laid the groundwork for permutation-based inference by treating observed data permutations as proxies for the null distribution. Concurrently, E. J. G. Pitman developed exact tests for small samples using randomization principles, applicable to any population without parametric assumptions, further solidifying the theoretical basis for non-parametric resampling in hypothesis testing during the 1930s and 1940s. By the late 1940s, Maurice H. Quenouille proposed an early form of the jackknife method for bias reduction in estimators, involving the deletion of individual observations to approximate sampling variability, which marked a shift toward computational resampling for estimation purposes.

The formalization and expansion of resampling techniques accelerated in the late 1970s, driven by key publications that connected and extended these ideas. Bradley Efron played a pivotal role in 1979 by introducing the bootstrap method in his seminal paper, which reframed the jackknife as a special case and proposed resampling with replacement from the empirical distribution to estimate sampling distributions, variance, and bias without relying on asymptotic theory. In the 1980s, Eugene S. Edgington advanced permutation tests through his book Randomization Tests, providing practical guidelines for their implementation in randomized experiments and broadening their application beyond Fisher's original context in experimental design. John W. Tukey contributed to the jackknife's refinement in the 1950s and promoted resampling as a robust, data-driven alternative to rigid parametric models, influencing the field's emphasis on computational exploration. Philip Good further propelled permutation tests in the 1990s with works like Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses (first edition 1994), advocating their use for exact inference in diverse settings.

The evolution of resampling methods was profoundly shaped by advances in computing power from the 1980s onward, which made intensive simulations feasible on personal computers and shifted practice from parametric reliance to computationally intensive, distribution-free approaches. Michael R. Chernick's Bootstrap Methods: A Practitioner's Guide (1999) exemplified this integration, offering accessible implementations tailored to growing computational resources and practitioner needs. A major theoretical milestone came in 1992 with Peter Hall's The Bootstrap and Edgeworth Expansion, which provided rigorous asymptotic justifications for bootstrap validity, bridging computational practice with mathematical foundations. Post-2000, resampling techniques adapted to high-dimensional data challenges; for example, in 2010, methods such as stability selection via subsampling addressed sparsity and multiple testing without assuming low dimensionality.

Methods for Hypothesis Testing

Permutation Tests

Permutation tests constitute a fundamental resampling method in statistics for hypothesis testing, where the null distribution of a test statistic is derived by permuting the observed data under the assumption that group labels or observations are exchangeable. This approach generates an exact reference distribution for finite samples by considering all possible rearrangements consistent with the null hypothesis of no systematic differences between groups. Originating from early work by Ronald A. Fisher and E. J. G. Pitman, permutation tests offer a distribution-free alternative to parametric methods, particularly when data violate normality assumptions or sample sizes are small.

The core assumption underlying permutation tests is exchangeability of the observations under the null hypothesis, implying that the joint distribution remains unchanged when permuting labels within or across groups. This holds when data are independent and identically distributed (i.i.d.) under the null, such as in randomized experiments testing for no treatment effect. Permutation tests are applicable to a range of statistics, including differences in means, medians, or correlations, making them versatile for comparing distributions without specifying parametric forms. Unlike asymptotic tests, they rely solely on the observed data's structure for validity.

The standard procedure begins with calculating the observed test statistic T(obs), such as the difference in group means for a two-sample comparison. Under the null, group labels are randomly permuted, and T(perm) is recomputed for each permutation to simulate the null distribution. For tests with small samples, all permutations are enumerated; for larger datasets, a random subset of permutations (e.g., thousands) approximates the distribution. The p-value is then the proportion of permuted statistics at least as extreme as T(obs), often adjusted conservatively. Specifically, for a right-tailed test,

p = \frac{1 + \sum_{\text{perms}} I\big(T(\text{perm}) \geq T(\text{obs})\big)}{N_{\text{perms}} + 1},

where I(\cdot) is the indicator function (1 if true, 0 otherwise) and N_{\text{perms}} is the total number of permutations considered; the additive 1s prevent p-values of exactly 0 and provide an unbiased estimate. This method ensures the p-value directly reflects the probability of observing data as extreme under the null.

A representative example uses a permutation test as a robust alternative to the two-sample t-test for comparing group means. Consider two groups of 5 observations each (total n = 10), where the null posits identical distributions. The observed mean difference serves as T(obs), and the null distribution arises from all \binom{10}{5} = 252 ways to reassign labels to the pooled data, which is computationally feasible on modern hardware. For instance, with drug treatment scores (36, 39, 60) versus control scores (55, 70, 73) in a smaller n = 6 case (3 per group, 20 permutations), the p-value is the fraction of permuted mean differences at least as extreme as the observed value of -21, yielding an assessment without reliance on normality. Such exact enumeration is practical for n up to around 20, beyond which random sampling of permutations is employed.

Key advantages of permutation tests include their ability to achieve exact Type I error control in finite samples when fully enumerated, avoiding the approximations inherent in parametric tests that may fail under non-normality or heteroscedasticity. They require no assumptions about the data-generating distribution beyond exchangeability under the null, enhancing robustness in experimental and observational settings. This exactness stems from the randomization principle, ensuring the test's validity regardless of sample size, though computational demands grow factorially with n. Seminal developments, including Fisher's 1935 exposition in The Design of Experiments and Pitman's 1937 formalization of significance tests for any population, underscore their foundational role in modern resampling.
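
The small example above can be enumerated exactly; here is a sketch of such an exact permutation test in Python (standard library plus NumPy), one-sided in the direction of the observed difference:

    from itertools import combinations

    import numpy as np

    treatment = np.array([36, 39, 60])
    control = np.array([55, 70, 73])
    pooled = np.concatenate([treatment, control])

    observed = treatment.mean() - control.mean()   # -21.0

    # Enumerate all C(6, 3) = 20 ways of relabelling the pooled scores.
    diffs = []
    for idx in combinations(range(len(pooled)), len(treatment)):
        mask = np.zeros(len(pooled), dtype=bool)
        mask[list(idx)] = True
        diffs.append(pooled[mask].mean() - pooled[~mask].mean())
    diffs = np.array(diffs)

    # One-sided p-value: permuted differences at least as extreme (as negative).
    p_value = np.mean(diffs <= observed)
    print(f"observed difference = {observed:.1f}, exact one-sided p = {p_value:.3f}")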

Bootstrap Hypothesis Testing

Bootstrap hypothesis testing extends the resampling principles of the bootstrap method, originally developed for estimation, by generating an approximate null distribution of a test statistic through data modification under the specified null hypothesis. This approach enables the computation of p-values and critical regions for a wide range of test statistics without assuming a particular parametric form for the underlying distribution, making it suitable for complex or non-standard statistics. The technique was formalized in the seminal work on bootstrap methods, where resampling under the null is used to mimic the variability expected if the null hypothesis holds true.

To adapt the bootstrap for testing, the original sample is first transformed to conform to the null hypothesis; for instance, when testing whether a population mean equals a specific value \mu_0, each observation is recentered (by subtracting the sample mean and adding \mu_0) so that the shifted empirical distribution has mean \mu_0, aligning it with the null. Bootstrap samples are then drawn with replacement from this recentered dataset, and the test statistic is computed for each resample to approximate its distribution under the null. The p-value is estimated as the proportion of these bootstrap test statistics that are as extreme as or more extreme than the statistic observed on the original sample, providing a data-driven measure of evidence against the null. This procedure is detailed in foundational texts on resampling, which emphasize its flexibility for scenarios where analytical null distributions are intractable.

For enhanced accuracy, particularly with statistics that are approximately pivotal, the studentized bootstrap is employed, which standardizes the test statistic by its estimated standard error in both the original and the bootstrap samples. The resulting p-value is given by

p = \frac{1}{B} \sum_{b=1}^{B} I\left( \frac{T^*_b - \hat{\theta}}{\widehat{se}^*} \geq \frac{T - \hat{\theta}}{\widehat{se}} \right),

where I(\cdot) is the indicator function, B the number of bootstrap resamples, T^*_b and \widehat{se}^* the statistic and its estimated standard error computed on the b-th resample, and T, \hat{\theta}, and \widehat{se} the observed test statistic, point estimate, and standard error from the original sample.
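
A minimal sketch of this procedure for testing H0: mu = mu0, using the shift-to-the-null recentering described above together with a studentized statistic (Python with NumPy; the data, mu0, and B are illustrative assumptions, and the two-sided comparison is one common variant):

    import numpy as np

    rng = np.random.default_rng(10)

    x = rng.normal(loc=5.4, scale=2.0, size=25)   # hypothetical sample
    mu0 = 5.0                                     # null value to test

    def t_stat(data, mu):
        """Studentized statistic: (mean - mu) / estimated standard error."""
        return (data.mean() - mu) / (data.std(ddof=1) / np.sqrt(len(data)))

    t_obs = t_stat(x, mu0)

    # Recenter so that the empirical distribution satisfies the null hypothesis.
    x_null = x - x.mean() + mu0

    B = 5_000
    t_boot = np.empty(B)
    for b in range(B):
        xb = rng.choice(x_null, size=len(x), replace=True)   # resample under H0
        t_boot[b] = t_stat(xb, mu0)

    # Two-sided bootstrap p-value.
    p_value = np.mean(np.abs(t_boot) >= np.abs(t_obs))
    print(f"t_obs = {t_obs:.3f}, bootstrap p ~= {p_value:.4f}")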