Kolmogorov–Smirnov test

In statistics, the Kolmogorov–Smirnov test (also K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous, see Section 2.2), one-dimensional probability distributions. It can be used to test whether a sample came from a given reference probability distribution (one-sample K–S test), or to test whether or not two samples came from the same distribution (two-sample K–S test). It is named after Andrey Kolmogorov and Nikolai Smirnov, who developed it in the 1930s.[1]
The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null distribution of this statistic is calculated under the null hypothesis that the sample is drawn from the reference distribution (in the one-sample case) or that the samples are drawn from the same distribution (in the two-sample case). In the one-sample case, the distribution considered under the null hypothesis may be continuous (see Section 2), purely discrete or mixed (see Section 2.2). In the two-sample case (see Section 3), the distribution considered under the null hypothesis is a continuous distribution but is otherwise unrestricted.
The two-sample K–S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples.
The Kolmogorov–Smirnov test can be modified to serve as a goodness of fit test. In the special case of testing for normality of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic (see Test with estimated parameters). Various studies have found that, even in this corrected form, the test is less powerful for testing normality than the Shapiro–Wilk test or Anderson–Darling test.[2] However, these other tests have their own disadvantages. For instance, the Shapiro–Wilk test is known not to work well in samples with many identical values.
One-sample Kolmogorov–Smirnov statistic
The empirical distribution function $F_n$ for $n$ independent and identically distributed (i.i.d.) ordered observations $X_i$ is defined as

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{(-\infty,x]}(X_i),$$

where $\mathbf{1}_{(-\infty,x]}(X_i)$ is the indicator function, equal to 1 if $X_i \le x$ and equal to 0 otherwise.
The Kolmogorov–Smirnov statistic for a given cumulative distribution function $F(x)$ is

$$D_n = \sup_x |F_n(x) - F(x)|,$$

where $\sup_x$ is the supremum of the set of distances. Intuitively, the statistic takes the largest absolute difference between the two distribution functions across all $x$ values.
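The definition can be checked numerically. The following sketch (assuming NumPy and SciPy are available) computes $D_n$ directly from the empirical distribution function and compares it with SciPy's implementation; since $F_n$ jumps only at the order statistics, the supremum is attained just before or at one of the sorted observations.

```python
import numpy as np
from scipy import stats

def ks_statistic(sample, cdf):
    """One-sample KS statistic: sup_x |F_n(x) - F(x)|.

    F_n jumps at each order statistic, so the supremum is attained
    either at a sorted observation or just before it.
    """
    x = np.sort(np.asarray(sample))
    n = len(x)
    f = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - f)   # F_n(x_i) - F(x_i)
    d_minus = np.max(f - np.arange(0, n) / n)      # F(x_i) - F_n(x_i^-)
    return max(d_plus, d_minus)

rng = np.random.default_rng(0)
sample = rng.standard_normal(100)
d = ks_statistic(sample, stats.norm.cdf)
d_scipy = stats.kstest(sample, stats.norm.cdf).statistic
assert abs(d - d_scipy) < 1e-9
```

Evaluating the difference only at the jump points (from both sides) is what makes the computation exact rather than a grid approximation.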
By the Glivenko–Cantelli theorem, if the sample comes from the distribution $F(x)$, then $D_n$ converges to 0 almost surely in the limit when $n$ goes to infinity. Kolmogorov strengthened this result by effectively providing the rate of this convergence (see Kolmogorov distribution). Donsker's theorem provides a yet stronger result.
In practice, the statistic requires a relatively large number of data points (in comparison to other goodness of fit criteria such as the Anderson–Darling test statistic) to properly reject the null hypothesis.
Kolmogorov distribution
The Kolmogorov distribution is the distribution of the random variable

$$K = \sup_{t\in[0,1]} |B(t)|,$$

where $B(t)$ is the Brownian bridge. The cumulative distribution function of $K$ is given by[3]

$$\Pr(K \le x) = 1 - 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2x^2} = \frac{\sqrt{2\pi}}{x}\sum_{k=1}^{\infty} e^{-(2k-1)^2\pi^2/(8x^2)},$$
which can also be expressed by the Jacobi theta function $\vartheta_{01}(z = 0; \tau = 2ix^2/\pi)$. Both the form of the Kolmogorov–Smirnov test statistic and its asymptotic distribution under the null hypothesis were published by Andrey Kolmogorov,[4] while a table of the distribution was published by Nikolai Smirnov.[5] Recurrence relations for the distribution of the test statistic in finite samples are available.[4]
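The alternating series for the CDF of $K$ converges very quickly and is easy to evaluate directly. As a sanity check, the sketch below (assuming SciPy is available) compares a truncated series against `scipy.stats.kstwobign`, which implements this same limiting distribution.

```python
import math
from scipy.stats import kstwobign

def kolmogorov_cdf(x, terms=100):
    """P(K <= x) via the alternating series 1 - 2 * sum (-1)^(k-1) exp(-2 k^2 x^2)."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s

# kstwobign is SciPy's name for the Kolmogorov (limiting) distribution.
assert abs(kolmogorov_cdf(1.36) - kstwobign.cdf(1.36)) < 1e-7
```

In practice a handful of terms already suffice for moderate $x$, since the terms decay like $e^{-2k^2x^2}$.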
Under the null hypothesis that the sample comes from the hypothesized distribution $F(x)$,

$$\sqrt{n}\,D_n \xrightarrow{n\to\infty} \sup_t |B(F(t))|$$

in distribution, where $B(t)$ is the Brownian bridge. If $F$ is continuous, then under the null hypothesis $\sqrt{n}\,D_n$ converges to the Kolmogorov distribution, which does not depend on $F$. This result may also be known as the Kolmogorov theorem.
The accuracy of this limit as an approximation to the exact CDF of $\sqrt{n}\,D_n$ when $n$ is finite is not very impressive: even when $n = 100$, the corresponding maximum error is about 0.9%; this error increases to 2.6% when $n = 10$ and to a totally unacceptable 7% when $n = 5$. However, a very simple expedient of replacing $x$ by

$$x + \frac{1}{6\sqrt{n}} + \frac{x - 1}{4n}$$

in the argument of the Jacobi theta function reduces these errors to 0.003%, 0.027% and 0.27% respectively; such accuracy would usually be considered more than adequate for all practical applications.[6]
The goodness-of-fit test or the Kolmogorov–Smirnov test can be constructed by using the critical values of the Kolmogorov distribution. This test is asymptotically valid when $n \to \infty$. It rejects the null hypothesis at level $\alpha$ if

$$\sqrt{n}\,D_n > K_\alpha,$$

where $K_\alpha$ is found from

$$\Pr(K \le K_\alpha) = 1 - \alpha.$$
The asymptotic power of this test is 1.
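The critical value $K_\alpha$ is simply a quantile of the Kolmogorov distribution, so it can be read off from SciPy's `kstwobign` (an assumption that SciPy is available; the uniform sample below is purely illustrative):

```python
import numpy as np
from scipy import stats
from scipy.stats import kstwobign

alpha = 0.05
# K_alpha solves P(K <= K_alpha) = 1 - alpha for the Kolmogorov distribution.
k_alpha = kstwobign.ppf(1 - alpha)

# Asymptotic rejection rule: reject H0 when sqrt(n) * D_n > K_alpha.
rng = np.random.default_rng(1)
x = rng.uniform(size=200)
d_n = stats.kstest(x, "uniform").statistic
reject = np.sqrt(len(x)) * d_n > k_alpha
```

For $\alpha = 0.05$ this gives $K_\alpha \approx 1.358$, the familiar asymptotic 5% critical value.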
Fast and accurate algorithms to compute the cdf $\Pr(D_n \le x)$ or its complement for arbitrary $n$ and $x$ are available from:
- [7] and [8] for continuous null distributions, with code in C and Java available in [7].
- [9] for purely discrete, mixed or continuous null distributions, implemented in the KSgeneral package[10] of the R project for statistical computing, which for a given sample also computes the KS test statistic and its p-value. An alternative C++ implementation is available in [9].
Test with estimated parameters
If either the form or the parameters of F(x) are determined from the data Xi, the critical values determined in this way are invalid. In such cases, Monte Carlo or other methods may be required, but tables have been prepared for some cases. Details of the required modifications to the test statistic and of the critical values for the normal distribution and the exponential distribution have been published,[11] and later publications also cover the Gumbel distribution.[12] The Lilliefors test represents a special case of this for the normal distribution. A logarithm transformation may help in cases where the data do not appear to fit the assumption that they came from the normal distribution.
When parameters are estimated, the question arises which estimation method should be used. Usually this would be maximum likelihood, but, for example, the MLE of sigma for the normal distribution has a large bias. Using a moment fit or KS minimization instead has a large impact on the critical values, and also some impact on test power. If we need to decide, for Student's t data with df = 2, whether the data could be normal via the KS test, then an ML estimate based on H0 (the data are normal, so the standard deviation is used for scale) would give a much larger KS distance than a fit with minimum KS. In this case we should reject H0, which is often the outcome with MLE, because the sample standard deviation may be very large for t(2) data; with KS minimization, however, the resulting KS distance may still be too small to reject H0. In the Student's t case, a modified KS test with a KS estimate instead of the MLE makes the KS test slightly worse. However, in other cases such a modified KS test leads to slightly better test power.[citation needed]
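A small Monte Carlo sketch (a hypothetical setup, assuming NumPy and SciPy) illustrates why the standard critical values become invalid with estimated parameters: naively plugging the sample mean and standard deviation into the reference normal and then using ordinary KS p-values makes the test strongly conservative, which is exactly what the Lilliefors correction addresses.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps, alpha = 50, 1000, 0.05
rejections = 0
for _ in range(reps):
    x = rng.standard_normal(n)
    # Naive approach: estimate mu and sigma from the same data,
    # then apply the standard one-sample KS test to the fitted normal.
    p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).pvalue
    rejections += p < alpha
rate = rejections / reps
# The observed type I error rate falls far below the nominal 5% level.
assert rate < 0.03
```

Because the fitted distribution is pulled toward the empirical one, $D_n$ is systematically too small, and the nominal 5% test rejects a true null far less than 5% of the time.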
Discrete and mixed null distribution
Under the assumption that $F(x)$ is non-decreasing and right-continuous, with a countable (possibly infinite) number of jumps, the KS test statistic can be expressed as:

$$D_n = \sup_x |F_n(x) - F(x)| = \sup_{0 \le t \le 1} |F_n(F^{-1}(t)) - F(F^{-1}(t))|.$$
From the right-continuity of $F(x)$, it follows that $F(F^{-1}(t)) \ge t$ and $F^{-1}(F(x)) \le x$, and hence the distribution of $D_n$ depends on the null distribution $F(x)$, i.e., it is no longer distribution-free as in the continuous case. Therefore, a fast and accurate method has been developed to compute the exact and asymptotic distribution of $D_n$ when $F(x)$ is purely discrete or mixed,[9] implemented in C++ and in the KSgeneral package[10] of the R language. The functions disc_ks_test(), mixed_ks_test() and cont_ks_test() also compute the KS test statistic and p-values for purely discrete, mixed or continuous null distributions and arbitrary sample sizes. The KS test and its p-values for discrete null distributions and small sample sizes are also computed in [13] as part of the dgof package of the R language. Major statistical packages, among them SAS PROC NPAR1WAY[14] and Stata ksmirnov,[15] implement the KS test under the assumption that $F(x)$ is continuous, which is more conservative if the null distribution is actually not continuous (see [16][17][18]).
Two-sample Kolmogorov–Smirnov test
The Kolmogorov–Smirnov test may also be used to test whether two underlying one-dimensional probability distributions differ. In this case, the Kolmogorov–Smirnov statistic is

$$D_{n,m} = \sup_x |F_{1,n}(x) - F_{2,m}(x)|,$$

where $F_{1,n}$ and $F_{2,m}$ are the empirical distribution functions of the first and the second sample respectively, and $\sup$ is the supremum function.
For large samples, the null hypothesis is rejected at level $\alpha$ if

$$D_{n,m} > c(\alpha)\sqrt{\frac{n+m}{n\cdot m}},$$

where $n$ and $m$ are the sizes of the first and second sample respectively. The value of $c(\alpha)$ is given in the table below for the most common levels of $\alpha$:
| α | 0.20 | 0.15 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 | 0.001 |
|---|------|------|------|------|-------|------|-------|-------|
| c(α) | 1.073 | 1.138 | 1.224 | 1.358 | 1.48 | 1.628 | 1.731 | 1.949 |
and in general[19] by

$$c(\alpha) = \sqrt{-\tfrac{1}{2}\ln\tfrac{\alpha}{2}},$$

so that the condition reads

$$D_{n,m} > \sqrt{-\tfrac{1}{2}\ln\tfrac{\alpha}{2}}\cdot\sqrt{\frac{n+m}{n\cdot m}}.$$
Here, again, the larger the sample sizes, the more sensitive the minimal bound: for a given ratio of sample sizes (e.g., m = n), the minimal bound scales with the size of either of the samples according to its inverse square root.
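The closed form $c(\alpha) = \sqrt{-\ln(\alpha/2)/2}$ reproduces the tabulated coefficients, and the resulting large-sample rejection rule can be checked against SciPy's `ks_2samp` (the samples below are an illustrative assumption, drawn from the same distribution):

```python
import numpy as np
from scipy import stats

def c_of_alpha(alpha):
    """Coefficient in the large-sample rejection rule: c(a) = sqrt(-ln(a/2) / 2)."""
    return np.sqrt(-np.log(alpha / 2) / 2)

# Reproduces the tabulated values, e.g. c(0.05) ~ 1.358, c(0.01) ~ 1.628.
assert abs(c_of_alpha(0.05) - 1.358) < 1e-3
assert abs(c_of_alpha(0.01) - 1.628) < 1e-3

rng = np.random.default_rng(3)
x = rng.standard_normal(300)
y = rng.standard_normal(200)           # drawn from the same distribution
d = stats.ks_2samp(x, y).statistic
threshold = c_of_alpha(0.05) * np.sqrt((300 + 200) / (300 * 200))
reject = d > threshold                 # large-sample decision at alpha = 0.05
```

SciPy's own p-value (exact for small samples, asymptotic otherwise) will generally agree with this threshold-based decision for sample sizes this large.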
Note that the two-sample test checks whether the two data samples come from the same distribution. This does not specify what that common distribution is (e.g. whether it's normal or not normal). Again, tables of critical values have been published. A shortcoming of the univariate Kolmogorov–Smirnov test is that it is not very powerful because it is devised to be sensitive against all possible types of differences between two distribution functions. Some argue[20][21] that the Cucconi test, originally proposed for simultaneously comparing location and scale, can be much more powerful than the Kolmogorov–Smirnov test when comparing two distribution functions.
Two-sample KS tests have been applied in economics to detect asymmetric effects and to study natural experiments.[22]
Setting confidence limits for the shape of a distribution function
While the Kolmogorov–Smirnov test is usually used to test whether a given F(x) is the underlying probability distribution of Fn(x), the procedure may be inverted to give confidence limits on F(x) itself. If one chooses a critical value of the test statistic Dα such that P(Dn > Dα) = α, then a band of width ±Dα around Fn(x) will entirely contain F(x) with probability 1 − α.
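This inversion is straightforward to implement. The sketch below (assuming NumPy and SciPy) uses the asymptotic critical value $K_\alpha/\sqrt{n}$ for the band half-width and clips the band to $[0, 1]$, since a CDF cannot leave that range:

```python
import numpy as np
from scipy.stats import kstwobign

def ks_confidence_band(sample, alpha=0.05):
    """Asymptotic 1 - alpha confidence band: F_n(x) +/- K_alpha / sqrt(n),
    evaluated at the sorted sample points and clipped to [0, 1]."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    ecdf = np.arange(1, n + 1) / n
    half_width = kstwobign.ppf(1 - alpha) / np.sqrt(n)
    lower = np.clip(ecdf - half_width, 0.0, 1.0)
    upper = np.clip(ecdf + half_width, 0.0, 1.0)
    return x, lower, upper

rng = np.random.default_rng(5)
xs, lo, hi = ks_confidence_band(rng.standard_normal(400))
```

With probability $1 - \alpha$ (asymptotically), the true CDF lies between `lo` and `hi` simultaneously at every point.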
The Kolmogorov–Smirnov statistic in more than one dimension
A distribution-free multivariate Kolmogorov–Smirnov goodness of fit test has been proposed by Justel, Peña and Zamar (1997).[23] The test uses a statistic which is built using Rosenblatt's transformation, and an algorithm is developed to compute it in the bivariate case. An approximate test that can be easily computed in any dimension is also presented.
The Kolmogorov–Smirnov test statistic needs to be modified if a similar test is to be applied to multivariate data. This is not straightforward because the maximum difference between two joint cumulative distribution functions is not generally the same as the maximum difference of any of the complementary distribution functions. Thus the maximum difference will differ depending on which of $\Pr(X < x \wedge Y < y)$ or $\Pr(X < x \wedge Y > y)$ or either of the other two possible arrangements is used. One might require that the result of the test used should not depend on which choice is made.
One approach to generalizing the Kolmogorov–Smirnov statistic to higher dimensions which meets the above concern is to compare the cdfs of the two samples with all possible orderings, and take the largest of the set of resulting KS statistics. In d dimensions, there are 2d − 1 such orderings. One such variation is due to Peacock[24] (see also Gosset[25] for a 3D version) and another to Fasano and Franceschini[26] (see Lopes et al. for a comparison and computational details).[27] Critical values for the test statistic can be obtained by simulations, but depend on the dependence structure in the joint distribution.
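A simplified two-sample, two-dimensional sketch of this idea (an illustration in the spirit of Fasano and Franceschini, not a faithful reimplementation of their method; assumes NumPy) evaluates, at every observed point, the fraction of each sample falling in each of the four quadrant orientations and takes the largest discrepancy:

```python
import numpy as np

def ks_2d_statistic(a, b):
    """Simplified 2-D two-sample KS statistic: at every observed point,
    compare the fraction of each sample in each of the four quadrants
    (the 2^2 orderings) and return the largest absolute discrepancy."""
    pts = np.vstack([a, b])
    d = 0.0
    for px, py in pts:
        for sx in (-1, 1):          # quadrant orientation along x
            for sy in (-1, 1):      # quadrant orientation along y
                fa = np.mean((sx * (a[:, 0] - px) <= 0) & (sy * (a[:, 1] - py) <= 0))
                fb = np.mean((sx * (b[:, 0] - px) <= 0) & (sy * (b[:, 1] - py) <= 0))
                d = max(d, abs(fa - fb))
    return d

rng = np.random.default_rng(9)
a = rng.standard_normal((80, 2))
b = rng.standard_normal((80, 2))
d = ks_2d_statistic(a, b)
```

As the text notes, critical values for such statistics must come from simulation and depend on the dependence structure of the joint distribution, so the statistic alone does not yield a distribution-free p-value.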
Implementations
The Kolmogorov–Smirnov test is implemented in many software programs. Most of these implement both the one-sample and two-sample tests.
- Mathematica has KolmogorovSmirnovTest.
- MATLAB's Statistics Toolbox has kstest and kstest2 for one-sample and two-sample Kolmogorov–Smirnov tests, respectively.
- The R package "KSgeneral"[10] computes the KS test statistic and its p-value under arbitrary, possibly discrete, mixed or continuous null distributions.
- R's base "stats" package implements the test as ks.test.
- SAS implements the test in its PROC NPAR1WAY procedure.
- In Python, the SciPy package implements the test in the scipy.stats.kstest function.[28]
- SYSTAT (SPSS Inc., Chicago, IL)
- Java has an implementation of this test provided by Apache Commons.[29]
- KNIME has a node implementing this test based on the above Java implementation.[30]
- Julia has the package HypothesisTests.jl with the function ExactOneSampleKSTest(x::AbstractVector{<:Real}, d::UnivariateDistribution).[31]
- StatsDirect (StatsDirect Ltd, Manchester, UK) implements all common variants.
- Stata (Stata Corporation, College Station, TX) implements the test in ksmirnov (Kolmogorov–Smirnov equality-of-distributions test) command.[32]
- PSPP implements the test in its KOLMOGOROV-SMIRNOV command (or using the KS shortcut).
- The Real Statistics Resource Pack for Excel runs the test as KSCRIT and KSPROB.[33]
References
- ^ "7.2.1.2. Kolmogorov–Smirnov test". Retrieved 8 October 2025.
- ^ Stephens, M. A. (1974). "EDF Statistics for Goodness of Fit and Some Comparisons". Journal of the American Statistical Association. 69 (347): 730–737. doi:10.2307/2286009. JSTOR 2286009.
- ^ Marsaglia G, Tsang WW, Wang J (2003). "Evaluating Kolmogorov's Distribution". Journal of Statistical Software. 8 (18): 1–4. doi:10.18637/jss.v008.i18.
- ^ a b Kolmogorov A (1933). "Sulla determinazione empirica di una legge di distribuzione". G. Ist. Ital. Attuari. 4: 83–91.
- ^ Smirnov N (1948). "Table for estimating the goodness of fit of empirical distributions". Annals of Mathematical Statistics. 19 (2): 279–281. doi:10.1214/aoms/1177730256.
- ^ Vrbik, Jan (2018). "Small-Sample Corrections to Kolmogorov–Smirnov Test Statistic". Pioneer Journal of Theoretical and Applied Statistics. 15 (1–2): 15–23.
- ^ a b Simard R, L'Ecuyer P (2011). "Computing the Two-Sided Kolmogorov–Smirnov Distribution". Journal of Statistical Software. 39 (11): 1–18. doi:10.18637/jss.v039.i11.
- ^ Moscovich A, Nadler B (2017). "Fast calculation of boundary crossing probabilities for Poisson processes". Statistics and Probability Letters. 123: 177–182. arXiv:1503.04363. doi:10.1016/j.spl.2016.11.027. S2CID 12868694.
- ^ a b c Dimitrova DS, Kaishev VK, Tan S (2020). "Computing the Kolmogorov–Smirnov Distribution when the Underlying cdf is Purely Discrete, Mixed or Continuous". Journal of Statistical Software. 95 (10): 1–42. doi:10.18637/jss.v095.i10.
- ^ a b c Dimitrova, Dimitrina; Yun, Jia; Kaishev, Vladimir; Tan, Senren (21 May 2024). "KSgeneral: KSgeneral: Computing P-Values of the One-Sample K-S Test and the Two-Sample K-S and Kuiper Tests for (Dis)Continuous Null Distribution". CRAN.R-project.org/package=KSgeneral.
- ^ Pearson, E. S.; Hartley, H. O., eds. (1972). Biometrika Tables for Statisticians. Vol. 2. Cambridge University Press. pp. 117–123, Tables 54, 55. ISBN 978-0-521-06937-3.
- ^ Shorack, Galen R.; Wellner, Jon A. (1986). Empirical Processes with Applications to Statistics. Wiley. p. 239. ISBN 978-0-471-86725-8.
- ^ Arnold, Taylor B.; Emerson, John W. (2011). "Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions" (PDF). The R Journal. 3 (2): 34–39. doi:10.32614/rj-2011-016.
- ^ "SAS/STAT(R) 14.1 User's Guide". support.sas.com. Retrieved 14 April 2018.
- ^ "ksmirnov — Kolmogorov–Smirnov equality-of-distributions test" (PDF). stata.com. Retrieved 14 April 2018.
- ^ Noether GE (1963). "Note on the Kolmogorov Statistic in the Discrete Case". Metrika. 7 (1): 115–116. doi:10.1007/bf02613966. S2CID 120687545.
- ^ Slakter MJ (1965). "A Comparison of the Pearson Chi-Square and Kolmogorov Goodness-of-Fit Tests with Respect to Validity". Journal of the American Statistical Association. 60 (311): 854–858. doi:10.2307/2283251. JSTOR 2283251.
- ^ Walsh JE (1963). "Bounded Probability Properties of Kolmogorov–Smirnov and Similar Statistics for Discrete Data". Annals of the Institute of Statistical Mathematics. 15 (1): 153–158. doi:10.1007/bf02865912. S2CID 122547015.
- ^ Eq. (15) in Section 3.3.1 of Knuth, D.E., The Art of Computer Programming, Volume 2 (Seminumerical Algorithms), 3rd Edition, Addison Wesley, Reading Mass, 1998.
- ^ Marozzi, Marco (2009). "Some Notes on the Location-Scale Cucconi Test". Journal of Nonparametric Statistics. 21 (5): 629–647. doi:10.1080/10485250902952435. S2CID 120038970.
- ^ Marozzi, Marco (2013). "Nonparametric Simultaneous Tests for Location and Scale Testing: a Comparison of Several Methods". Communications in Statistics – Simulation and Computation. 42 (6): 1298–1317. doi:10.1080/03610918.2012.665546. S2CID 28146102.
- ^ Monge, Marco (2023). "Two-Sample Kolmogorov-Smirnov Tests as Causality Tests. A narrative of Latin American inflation from 2020 to 2022". Revista Chilena de Economía y Sociedad. 17 (1): 68–78.
- ^ Justel, A.; Peña, D.; Zamar, R. (1997). "A multivariate Kolmogorov–Smirnov test of goodness of fit". Statistics & Probability Letters. 35 (3): 251–259. CiteSeerX 10.1.1.498.7631. doi:10.1016/S0167-7152(97)00020-5.
- ^ Peacock J.A. (1983). "Two-dimensional goodness-of-fit testing in astronomy". Monthly Notices of the Royal Astronomical Society. 202 (3): 615–627. Bibcode:1983MNRAS.202..615P. doi:10.1093/mnras/202.3.615.
- ^ Gosset E. (1987). "A three-dimensional extended Kolmogorov–Smirnov test as a useful tool in astronomy". Astronomy and Astrophysics. 188 (1): 258–264. Bibcode:1987A&A...188..258G.
- ^ Fasano, G.; Franceschini, A. (1987). "A multidimensional version of the Kolmogorov–Smirnov test". Monthly Notices of the Royal Astronomical Society. 225: 155–170. Bibcode:1987MNRAS.225..155F. doi:10.1093/mnras/225.1.155. ISSN 0035-8711.
- ^ Lopes, R.H.C.; Reid, I.; Hobson, P.R. (23–27 April 2007). The two-dimensional Kolmogorov–Smirnov test (PDF). XI International Workshop on Advanced Computing and Analysis Techniques in Physics Research. Amsterdam, the Netherlands.
- ^ "scipy.stats.kstest". SciPy v1.7.1 Manual. The Scipy community. Retrieved 26 October 2021.
- ^ "KolmogorovSmirnovTest". Retrieved 18 June 2019.
- ^ "New statistics nodes". Retrieved 25 June 2020.
- ^ "Nonparametric tests · HypothesisTests.jl".
- ^ "ksmirnov — Kolmogorov–Smirnov equality-of-distributions test" (PDF). Retrieved 18 June 2019.
- ^ "Kolmogorov–Smirnov Test for Normality Hypothesis Testing". Retrieved 18 June 2019.
Further reading
- Daniel, Wayne W. (1990). "Kolmogorov–Smirnov one-sample test". Applied Nonparametric Statistics (2nd ed.). Boston: PWS-Kent. pp. 319–330. ISBN 978-0-534-91976-4.
- Eadie, W.T.; D. Drijard; F.E. James; M. Roos; B. Sadoulet (1971). Statistical Methods in Experimental Physics. Amsterdam: North-Holland. pp. 269–271. ISBN 978-0-444-10117-4.
- Stuart, Alan; Ord, Keith; Arnold, Steven [F.] (1999). Classical Inference and the Linear Model. Kendall's Advanced Theory of Statistics. Vol. 2A (Sixth ed.). London: Arnold. pp. 25.37 – 25.43. ISBN 978-0-340-66230-4. MR 1687411.
- Corder, G. W.; Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach. Wiley. ISBN 978-1-118-84031-3.
- Stephens, M. A. (1979). "Test of fit for the logistic distribution based on the empirical distribution function". Biometrika. 66 (3): 591–595. doi:10.1093/biomet/66.3.591.
- Kesemen, O.; Tiryaki, B.K.; Tezel, Ö.; Özkul, E. (2021). "A new goodness of fit test for multivariate normality". Hacettepe Journal of Mathematics and Statistics. 50 (3): 872–894. doi:10.15672/hujms.644516.
External links
- "Kolmogorov–Smirnov test". Encyclopedia of Mathematics. EMS Press. 2001 [1994].
- Short introduction
- KS test explanation
- JavaScript implementation of one- and two-sided tests
- Online calculator with the KS test
- Open-source C++ code to compute the Kolmogorov distribution and perform the KS test
- Paper on Evaluating Kolmogorov's Distribution; contains C implementation. This is the method used in Matlab.
- Paper on Computing the Two-Sided Kolmogorov–Smirnov Distribution; computing the cdf of the KS statistic in C or Java.
- Paper powerlaw: A Python Package for Analysis of Heavy-Tailed Distributions; Jeff Alstott, Ed Bullmore, Dietmar Plenz. Among others, it also performs the Kolmogorov–Smirnov test. Source code and installers of powerlaw package are available at PyPi.
Kolmogorov–Smirnov test
View on GrokipediaIntroduction
Purpose and variants
The Kolmogorov–Smirnov test serves as a nonparametric statistical hypothesis test primarily used to evaluate goodness-of-fit in the one-sample case, where it determines whether a sample's empirical cumulative distribution function aligns with a specified theoretical distribution, and in the two-sample case, where it assesses whether two independent samples arise from the same underlying distribution.[1] This test quantifies discrepancies between cumulative distribution functions without requiring parametric assumptions about the form of the distributions, making it suitable for broad applications in distributional inference.[8] As a nonparametric procedure, the test imposes minimal assumptions on the data-generating process, relying only on the continuity of the distributions for the validity of its asymptotic distribution under the null hypothesis; it does not presuppose membership in any specific parametric family, such as normality or exponentiality.[9][10] This distribution-free property under the null enables its use across diverse datasets where parametric forms are unknown or suspect.[11] Key variants include the standard one-sample test, which compares a single empirical distribution to a fully specified reference, and the two-sample test, which compares two empirical distributions directly; a notable adaptation is the Lilliefors test, which modifies the one-sample version to account for estimated parameters, such as mean and variance in normality testing.[12] Under the null hypothesis, the distributions are identical, while the alternative posits a difference in their cumulative distribution functions.[8] Developed in the early 20th century, the test emerged as a foundational tool for distribution-free statistical inference, with the one-sample form introduced by Kolmogorov in 1933 and the two-sample extension by Smirnov in 1939.[8][2]Historical development
The Kolmogorov–Smirnov test emerged in the early 20th century as a nonparametric method for assessing the fit between empirical and theoretical distributions. In 1933, Andrey Kolmogorov introduced the one-sample variant in his paper "Sulla determinazione empirica di una legge di distribuzione," where he defined the test for continuous distributions and established its asymptotic distribution, laying the groundwork for modern goodness-of-fit testing. This contribution was contemporaneous with and influenced by the Glivenko-Cantelli theorem, proved by Valery Glivenko in the same year, which demonstrates the almost sure uniform convergence of the empirical distribution function to the true cumulative distribution function as sample size increases. Nikolai V. Smirnov built upon Kolmogorov's foundation with key extensions in the late 1930s and 1940s. In 1939, Smirnov developed the two-sample test in his work "On the Estimation of the Discrepancy Between Empirical Curves of Distribution for Two Independent Samples," enabling comparisons between distributions from two independent samples without assuming a specific form.[13] He further advanced the framework in 1948 by publishing tables for estimating the goodness of fit of empirical distributions, which provided practical computational aids for applying the test in finite samples. These developments formalized the test's variants, shifting it from abstract probability theory toward applied statistics. Post-World War II, the test gained broader acceptance as statistical computing advanced, with its incorporation into software packages by the 1960s enhancing accessibility for researchers.[1] Refinements in the 1950s, notably by F. J. 
Massey, included detailed tables of critical values for the goodness-of-fit statistic, improving the test's usability for small to moderate sample sizes.[14] By the 1970s, the Kolmogorov–Smirnov test had transitioned into a standard practical tool, widely applied in quality control for verifying process distributions and in astronomy for comparing observational data sets to theoretical models.[1][15]One-sample test
Statistic definition
The one-sample Kolmogorov–Smirnov test assesses whether a random sample is drawn from a specified continuous distribution by comparing its empirical cumulative distribution function (ECDF) to the theoretical cumulative distribution function (CDF).[1] Let be a random sample from a distribution with hypothesized CDF . The ECDF for the sample is defined as where is the indicator function.[1] The test statistic is the supremum of the absolute difference between the ECDF and the hypothesized CDF, which quantifies the maximum vertical distance between the ECDF step function and the theoretical CDF.[1] This statistic, introduced by Kolmogorov in 1933, measures overall discrepancies in the distribution without assuming a parametric form beyond the specified . The null hypothesis is that the sample arises from the distribution with CDF , with the alternative that it does not.[2] To compute , sort the observations to obtain the order statistics , then evaluate the differences at the points just before and after each , as the supremum occurs at these jumps in the ECDF. Specifically, compute and .[1] For illustration, consider a sample of 1000 observations generated from a standard normal distribution and tested against the normal CDF (α = 0.05, critical value ≈ 0.043). The computed does not exceed the critical value, failing to reject the null hypothesis of normality. In contrast, the same sample size from a lognormal distribution yields , leading to rejection.[1]Asymptotic distribution
Under the null hypothesis that a sample of size is drawn from the continuous distribution function , the one-sample Kolmogorov–Smirnov statistic satisfies where is a standard Brownian bridge on .[1] This convergence holds as .[16] The cumulative distribution function of this limiting random variable is given by the Kolmogorov distribution: This distribution is independent of the underlying CDF .[1] For inference at significance level , the asymptotic critical value of is approximately 1.36; thus, the critical value for itself is about .[17] Critical values for finite samples can be obtained from tables.[18] In finite samples, particularly when , the asymptotic approximation may be inaccurate, and exact p-values are computed using the null distribution via enumeration or Monte Carlo simulations with at least 10,000 replicates.[19] The proof relies on Donsker's invariance principle, where the empirical process converges in distribution to a Brownian bridge , such that the supremum norm converges to that of .[16]Parameter estimation effects
When the parameters of the hypothesized distribution, such as the mean μ and standard deviation σ for a normal distribution, are estimated from the sample data, the null distribution of the Kolmogorov–Smirnov statistic shifts. Specifically, the test statistic , where denotes the estimated parameters, tends to be smaller under the null hypothesis because the fitted distribution is adjusted to better match the empirical distribution function . If the standard critical values assuming known parameters are used, the test becomes conservative, resulting in Type I error rates lower than the nominal level.[1] The Lilliefors test addresses this issue for testing normality with estimated mean and variance. It modifies the one-sample KS test by employing critical values obtained from Monte Carlo simulations of the null distribution under parameter estimation, rather than relying on the standard Kolmogorov tables. These simulated critical values are smaller than the standard ones to account for the bias introduced by estimation, ensuring the test maintains the desired significance level; for instance, for a sample size of 20 and α = 0.05, the Lilliefors critical value is approximately 0.190, compared to 0.294 for known parameters. Lilliefors (1967) provided tables for critical values up to sample sizes of 300 at common significance levels. For more general cases, particularly location-scale families, the test statistic is computed using the estimated parameters, but appropriate critical values must be derived via Monte Carlo simulations tailored to the specific distribution and number of estimated parameters. 
This approach simulates samples from the fitted distribution and computes the empirical distribution of to obtain p-values or thresholds that correct for the estimation effect.[1] The Pelz-Good algorithm offers an efficient computational method for approximating the cumulative distribution function of the standard KS statistic (with known parameters), which can be integrated into parametric bootstrap procedures for p-value estimation when parameters are unknown. By generating simulations from the estimated distribution and applying the algorithm to compute tail probabilities, it accounts for the reduced degrees of freedom due to parameter estimation, enabling accurate inference without exhaustive full simulations. Pelz and Good (1976) developed this series expansion-based approximation for the lower tail areas, improving efficiency for moderate to large sample sizes. As a practical guideline, the asymptotic Kolmogorov distribution provides a reasonable approximation for the test with estimated parameters when the sample size n > 50, but for smaller n, simulated or tabulated critical values are essential to avoid excessive conservatism. For example, in testing an exponential distribution with estimated rate parameter , the critical value for at α = 0.05 and n = 20 is approximately 0.22–0.25, a decrease of about 10–20% compared to the value of 0.294 for known λ, reflecting the need for adjustment to achieve nominal Type I error control.[1]Discrete case adjustments
The asymptotic theory underlying the Kolmogorov–Smirnov (KS) test assumes a continuous null cumulative distribution function (CDF) , under which the empirical CDF converges to a uniform distribution on [0,1], enabling the use of Kolmogorov's limiting distribution for p-values. For discrete null distributions, however, ties in the data cause to exhibit jumps and plateaus, preventing the supremum from achieving the full range of the continuous case and resulting in a discrete distribution for the test statistic that stochastically dominates the continuous one.[9] This leads to conservative p-values when using continuous tables, meaning the test has lower power to detect deviations, with actual type I error rates below the nominal level (e.g., observed rejection rates of about 2-3% at the 5% level for binomial data). To address these issues, modifications to the statistic or its evaluation have been proposed. One approach uses a modified statistic , where denotes the left limit, focusing on deviations just before jumps in the discrete CDF to better approximate continuous behavior. Alternatively, a continuity correction can be applied by adjusting the empirical ranks, such as comparing to for ordered observations , which shifts the steps to midpoints of intervals and reduces conservatism, though it may slightly alter power properties. These adjustments maintain the nonparametric nature of the test while improving its applicability to discrete data. For exact inference in the discrete case, the null hypothesis posits that the sample follows a multinomial distribution with cell probabilities given by the jumps in . An exact p-value is then , computed by enumerating or recursively summing over all possible multinomial outcomes with statistic at least as extreme as observed. 
Conover's algorithm efficiently calculates this for one-sided tests with D_n⁺ or D_n⁻ by leveraging the structure of partial sums, providing exact critical values; for two-sided tests, conservative bounds are used due to symmetry challenges.[9] This method has been implemented in statistical software for small samples, ensuring proper control of the Type I error. In cases of mixed continuous–discrete null distributions, where F has both jumps and continuous parts, discretization techniques such as binning the continuous components into fine intervals or using weighted deviations can approximate the test. More precisely, recursive methods compute the exact distribution of D_n by integrating over continuous segments and handling discrete masses separately, often incorporating a continuity correction such as adding 0.5 to jumps for numerical stability.[20] These approaches yield exact p-values without relying on asymptotic approximations, though they require specifying the full hybrid CDF. For example, consider testing a sample against a known Poisson null distribution with parameter λ. The supremum is evaluated at the integer points where F jumps, but to adjust for the flat intervals between integers, one averages deviations over each unit interval or applies the continuity correction to the ranks before computing D_n. Using Conover's exact method on a small sample might yield a p-value of 0.12 for the observed D_n, whereas the unadjusted continuous approximation would conservatively report around 0.20. This highlights the adjustment's role in restoring power for count data common in Poisson testing. 
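A minimal sketch of the discrete case, testing against a Poisson null. Because the null CDF jumps only at the integers, the supremum is attained there, so scanning the integer support suffices; the p-value is then approximated by Monte Carlo under the discrete null rather than read from continuous tables (the sample values and simulation sizes below are illustrative, not from any reference):

```python
import math
import random

def poisson_cdf(k, lam):
    """CDF of Poisson(lam) at integer k, by summing the pmf."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def ks_stat_poisson(sample, lam):
    """KS statistic against a Poisson(lam) null: since both F_n and F
    jump only at integers, the supremum is the max over k = 0..max."""
    n = len(sample)
    d = 0.0
    for k in range(max(sample) + 1):
        f_n = sum(1 for x in sample if x <= k) / n
        d = max(d, abs(f_n - poisson_cdf(k, lam)))
    return d

def mc_pvalue(sample, lam, n_sim=1000, seed=42):
    """Monte Carlo p-value under the discrete null, avoiding the
    conservatism of continuous-case tables."""
    rng = random.Random(seed)
    n = len(sample)
    d_obs = ks_stat_poisson(sample, lam)

    def draw():  # inverse-CDF sampling from Poisson(lam)
        u, k = rng.random(), 0
        while poisson_cdf(k, lam) < u:
            k += 1
        return k

    hits = sum(
        ks_stat_poisson([draw() for _ in range(n)], lam) >= d_obs
        for _ in range(n_sim)
    )
    return hits / n_sim
```

The same Monte Carlo skeleton extends to mixed continuous–discrete nulls, provided the full hybrid CDF can be evaluated and simulated from.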
Despite these advances, exact methods remain computationally intensive for large n (e.g., beyond 100) or distributions with many categories, as the multinomial enumeration grows factorially.[9] Approximations like Monte Carlo simulation under the null can mitigate this but introduce variability; for highly discrete cases with sparse categories, the chi-squared goodness-of-fit test is often recommended as a more efficient alternative due to its asymptotic chi-squared distribution and better power in binned settings.
Two-sample test
Statistic definition
The two-sample Kolmogorov–Smirnov test assesses whether two independent samples are drawn from the same continuous distribution by comparing their empirical cumulative distribution functions (ECDFs).[21] Let X_1, …, X_n be a random sample from an unknown distribution with CDF F, and Y_1, …, Y_m be an independent random sample from another unknown distribution with CDF G. The ECDF for the first sample is defined as F_n(x) = (1/n) Σ_{i=1}^n 1{X_i ≤ x}, where 1{·} is the indicator function, and similarly for the second sample, G_m(x) = (1/m) Σ_{j=1}^m 1{Y_j ≤ x}.[22] The test statistic is the supremum of the absolute difference between these ECDFs, D_{n,m} = sup_x |F_n(x) − G_m(x)|, which quantifies the maximum vertical distance between the two step functions.[21] This statistic, introduced by Smirnov in 1939, measures overall discrepancies in distribution shape, location, and scale without assuming a specific parametric form.[22] For asymptotic approximations, an effective sample size N = nm/(n + m) is used to scale the statistic as √N · D_{n,m}, which converges in distribution to the Kolmogorov distribution under the null hypothesis as min(n, m) → ∞.[23] To compute D_{n,m}, combine and sort the observations from both samples to form the order statistics, then evaluate the ECDF differences at these points, as the supremum occurs at jumps in either ECDF.[21] For illustration, consider two samples of size 8 from Italy ({2, 4, 6, 8, 10, 12, 14, 16}) and size 7 from France ({1, 3, 5, 7, 9, 11, 13}), which can be viewed as scaled uniforms for demonstration; the ECDFs yield D_{8,7} = 0.25 as the maximum absolute difference, attained at x = 13.[24] The null hypothesis is that the two samples arise from the same continuous distribution (H_0: F = G), with the alternative that the distributions differ (H_1: F ≠ G).[21]
Asymptotic distribution
Under the null hypothesis that two independent samples of sizes n and m are drawn from the same continuous distribution function F, the two-sample Kolmogorov–Smirnov statistic satisfies √(nm/(n + m)) · D_{n,m} →d sup_{0≤t≤1} |B(t)|, where N = nm/(n + m) is the effective sample size and the limit arises from two independent standard Brownian bridges B_1, B_2 on [0,1] associated with the two empirical processes.[16] This convergence holds as min(n, m) → ∞.[16] The cumulative distribution function of the limiting random variable is given by the Kolmogorov distribution: P(K ≤ x) = 1 − 2 Σ_{k=1}^∞ (−1)^{k−1} e^{−2k²x²}. Surprisingly, this distribution is identical to that of the limiting supremum in the one-sample Kolmogorov–Smirnov test, despite involving two independent samples.[16] For inference at significance level α = 0.05, the asymptotic critical value of √(nm/(n + m)) · D_{n,m} is approximately 1.36; thus, the critical value for D_{n,m} itself is about 1.36 · √((n + m)/(nm)).[17] For unequal sample sizes, critical values can be obtained from tables based on the effective sample size N = nm/(n + m).[25] In finite samples, particularly when min(n, m) is small, the asymptotic approximation may be inaccurate, and exact p-values are computed using the permutation distribution under the null, which enumerates all possible label assignments to the combined sample.[19] For larger but still moderate samples, Monte Carlo simulations with at least 10,000 replicates provide reliable p-values by approximating the null distribution.[19] The proof relies on the convergence of the empirical processes √n(F_n − F) and √m(G_m − F) to independent Brownian bridges, such that their appropriately scaled difference converges to a Gaussian process whose supremum has the Kolmogorov distribution; specifically, the law of √(1−λ) B_1 − √λ B_2 (after variance adjustment, with λ = lim n/(n + m)) coincides with that of a single Brownian bridge B.[16]
Theoretical foundations
Kolmogorov distribution properties
The Kolmogorov distribution arises as the limiting distribution of the Kolmogorov–Smirnov statistic in large samples, specifically as the distribution of the random variable K = sup_{0≤t≤1} |B(t)|, where B(t) = W(t) − tW(1) is a standard Brownian bridge and W(t) is a standard Wiener process. The cumulative distribution function of K is given explicitly by the infinite series P(K ≤ x) = 1 − 2 Σ_{k=1}^∞ (−1)^{k−1} e^{−2k²x²} for x > 0. This alternating series converges rapidly for x bounded away from zero due to the quadratic exponential decay in the terms, allowing practical truncation after a few terms for numerical evaluation. The mean of K is E[K] = √(π/2) ln 2 ≈ 0.8687, with variance Var(K) = π²/12 − (π/2)(ln 2)² ≈ 0.0678; higher moments are obtainable through numerical integration of the survival function derived from the series expansion. For large x, the tail probability satisfies P(K > x) ≈ 2e^{−2x²}, reflecting the dominant contribution from the first term of the series and providing key insights into large-deviation behavior. The Kolmogorov distribution coincides with that of the supremum norm of a centered Gaussian process on [0,1] with the Brownian bridge covariance function Cov(B(s), B(t)) = min(s, t) − st, and it relates to the scaling limits of tied-down random walks or excursion processes in discrete approximations. Numerical evaluation of the CDF employs recursive relations or direct summation of the series for efficiency, with precomputed quantile tables available for critical values corresponding to significance levels as low as α = 0.001.
Convergence and limiting behavior
The Glivenko–Cantelli theorem provides the foundational result for the uniform consistency of the empirical distribution function. Specifically, for independent and identically distributed (i.i.d.) random variables X_1, X_2, … drawn from a distribution function F, the supremum sup_x |F_n(x) − F(x)| converges to 0 almost surely as n → ∞, where F_n is the empirical distribution function. This strong uniform convergence holds under the i.i.d. assumption without further restrictions on F. Building on this, Donsker's invariance principle describes the weak convergence of the properly scaled empirical process to a Brownian bridge. Under the i.i.d. condition, the process √n (F_n(x) − F(x)) converges in distribution to B(F(x)) in the Skorokhod space equipped with the Skorokhod topology, where B(t) = W(t) − tW(1) is a Brownian bridge and W is a standard Brownian motion. This functional central limit theorem underpins the asymptotic distribution of the Kolmogorov–Smirnov statistic D_n = sup_x |F_n(x) − F(x)|, which satisfies √n D_n →d sup_{0≤t≤1} |B(t)|. The validity of these asymptotic results requires i.i.d. samples from F for the one-sample case; for the two-sample test, the samples must be independent and drawn from the same underlying distribution F. Additionally, continuity of F ensures the exact limiting Kolmogorov distribution without adjustments for ties or discontinuities. Stronger quantitative bounds on the approximation are given by the Komlós–Major–Tusnády strong approximation theorem, which constructs a probability space on which sup_x |√n (F_n(x) − F(x)) − B_n(F(x))| = O(n^{−1/2} log n) with probability approaching 1 as n → ∞, under the i.i.d. assumption. This rate quantifies the error in approximating the KS statistic by its Brownian bridge limit and improves upon weaker convergence results. Extensions to non-i.i.d. settings, such as dependent data in time series, necessitate modifications like blocking techniques or bootstrapping to restore approximate independence and enable asymptotic validity, though these require case-specific conditions on the dependence structure.[26] Regarding the rate of convergence, for a fixed significance level α, the scaled statistic n D_n² converges in distribution to the squared supremum of the Brownian bridge, and the KS test exhibits slower convergence rates against local alternatives than parametric tests that achieve full asymptotic efficiency.
Extensions
Confidence bands for distributions
The Kolmogorov–Smirnov test exhibits a duality with confidence bands for the cumulative distribution function (CDF). Specifically, failing to reject the null hypothesis H_0: F = F_0 (where F is the true CDF and F_0 is a specified CDF) at significance level α implies, with confidence 1 − α, that the true CDF satisfies sup_x |F(x) − F_0(x)| ≤ d_α, where d_α is the critical value of the KS statistic D_n.[27] This inversion of the test provides a nonparametric confidence region consisting of all CDFs G such that sup_x |F_n(x) − G(x)| ≤ d_α.[28] For an unknown distribution without a specified F_0, the duality yields confidence bands centered on the empirical CDF F_n. The confidence band is given by F_n(x) ± K_α/√n, where K_α is the (1 − α) quantile of the Kolmogorov distribution (the limiting distribution of √n D_n). For example, F_n(x) ± 1.36/√n provides an asymptotic 95% confidence band.[29][28] Smirnov's method constructs these bands with uniform width for the one-sample case assuming known parameters, relying on the exact distribution of the KS statistic under continuity.[27] When parameters are estimated from the data (e.g., via maximum likelihood), the null distribution of the KS statistic changes, requiring the band to be widened by a simulation-derived factor to maintain coverage; this adjustment, often implemented via Monte Carlo methods, accounts for the estimation uncertainty and is detailed in approaches like the Lilliefors modification.[28][30] In survival analysis, these bands set nonparametric limits on reliability functions (survival curves S(t) = 1 − F(t)), aiding inference on failure probabilities without parametric assumptions. The bands provide simultaneous coverage over all x, ensuring the true CDF lies within the band with probability 1 − α. 
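A minimal sketch of the asymptotic band construction, assuming the 1.36 quantile for 95% coverage and clipping the band to [0, 1]:

```python
import math

def ks_confidence_band(sample, k_alpha=1.36):
    """Asymptotic simultaneous confidence band around the empirical CDF:
    at each order statistic x_(i), the band is [F_n - eps, F_n + eps]
    clipped to [0, 1], with half-width eps = K_alpha / sqrt(n).
    K_alpha = 1.36 gives approximately 95% coverage."""
    xs = sorted(sample)
    n = len(xs)
    eps = k_alpha / math.sqrt(n)
    band = []
    for i, x in enumerate(xs, 1):
        f_n = i / n  # ECDF value just after the jump at x_(i)
        band.append((x, max(0.0, f_n - eps), min(1.0, f_n + eps)))
    return band

# Example with 25 points: half-width is 1.36 / 5 = 0.272
band = ks_confidence_band([j / 25 for j in range(1, 26)])
print(band[12])  # middle point: (0.52, 0.52 - 0.272, 0.52 + 0.272)
```

The clipping at 0 and 1 is exactly where the uniform-width band is weakest, which motivates the variance-weighted alternatives discussed below.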
For known and continuous distributions, coverage is exact; with unknown distributions, it is asymptotic as n → ∞.[27][28] For discrete distributions, the bands tend to be conservative due to ties in the empirical CDF, leading to inflated critical values; adjustments like randomization or continuity corrections mitigate this.[28] Alternatives, such as Hall–Wellner bands, weight the deviations by a factor reflecting the local variance of the empirical process to improve uniformity and power in the tails, though they require more computation.[31]
Multivariate versions
The multivariate Kolmogorov–Smirnov test extends the univariate test to compare probability distributions in d dimensions, addressing scenarios where data points are vectors in R^d. The multivariate empirical cumulative distribution function (ECDF) for a sample of n i.i.d. observations X_1, …, X_n is defined as F_n(t) = (1/n) Σ_{i=1}^n 1{X_i ≤ t}, where the inequality X_i ≤ t is interpreted componentwise and 1{·} is the indicator function; this counts the proportion of points whose components are all less than or equal to the corresponding components of t, analogous to the univariate case but over orthants.[32] For the one-sample goodness-of-fit test against a hypothesized distribution with CDF F, the test statistic is D_n = sup_t |F_n(t) − F(t)|, measuring the maximum deviation between the ECDF and the target CDF. However, computing this supremum is computationally intractable due to the infinite domain of t and the need to evaluate the difference everywhere, which becomes prohibitive even for moderate d.[32] Practical implementations approximate the supremum using methods like evaluating the statistic over a finite grid of points or at marginal maxima derived from the data, as developed in early multidimensional extensions. For instance, Fasano and Franceschini (1987) proposed an exact computational approach for the two-sample case in two or three dimensions by sorting the data and checking deviations across all relevant orthants defined by the combined sample points, avoiding full grid searches. An alternative is the Cramér–von Mises analog, which integrates the squared deviations over the space rather than taking the supremum, though it requires similar approximations in higher dimensions.[33] In the two-sample setting, with samples of sizes n and m having ECDFs F_n and G_m, the statistic is D_{n,m} = sup_t |F_n(t) − G_m(t)|, again evaluated over orthants to detect differences in multidimensional distributions. 
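A naive sketch of the two-sample statistic in two dimensions, evaluating the lower-left-orthant ECDF difference at every combined sample point (one of the orientations a Fasano–Franceschini-style procedure would consider; O((n+m)²) and for illustration only):

```python
def ecdf2(points, tx, ty):
    """Bivariate ECDF: proportion of points with both coordinates <= (tx, ty)."""
    return sum(1 for (px, py) in points if px <= tx and py <= ty) / len(points)

def ks_2d_two_sample(a, b):
    """Sup over the combined sample points of |F_a - F_b| on lower-left orthants."""
    return max(abs(ecdf2(a, x, y) - ecdf2(b, x, y)) for (x, y) in a + b)

# Two well-separated point clouds give the maximal value 1.0
cloud1 = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.2)]
cloud2 = [(10.0, 10.0), (11.0, 12.0), (10.5, 11.0)]
print(ks_2d_two_sample(cloud1, cloud2))  # → 1.0
```

A full Fasano–Franceschini implementation would repeat this over all four orthant orientations and take the maximum; the sketch keeps one orientation for brevity.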
Under the null hypothesis of identical distributions, the asymptotic distribution involves a multivariate Brownian bridge, but no closed-form expression exists for dimensions d ≥ 2, complicating exact p-value computation.[33] Key challenges in multivariate applications include the curse of dimensionality, where the effective sample size diminishes rapidly as d increases, leading to low power and unreliable approximations even for moderate d. Consequently, tests often rely on bootstrap methods to estimate p-values, resampling from the combined or individual samples to simulate the null distribution, though this adds significant computational cost.[34] For example, to test bivariate normality on 2D scatterplot data from a sample of size n, one might compute D_n by discretizing a grid over the observed range of each dimension (e.g., 50×50 points) and evaluating deviations from the bivariate normal CDF, then using bootstrapping for the p-value; significant deviations would indicate non-normality, such as clustering or outliers in specific orthants.[32]
Higher-dimensional generalizations
Higher-dimensional generalizations of the Kolmogorov–Smirnov (KS) test extend beyond finite-dimensional vectors to infinite-dimensional or non-Euclidean spaces, such as function spaces or graph structures, by adapting the supremum distance to measure discrepancies between empirical and target distributions. In functional data analysis, where observations are curves or functions in a Hilbert space H, the test compares projected empirical distribution functions to a null hypothesis distribution. The statistic is typically of the form D_n = sup_x |F_{n,π}(x) − F_π(x)|, where π is a projection (e.g., onto principal components derived from a Karhunen–Loève expansion), F_{n,π} is the empirical cumulative distribution function of the projected data, and F_π is the projected null distribution; this approach leverages random projections, or a supremum over all projections, to capture multidimensional variability while maintaining computational tractability.[35] Such tests assess goodness-of-fit for functional models by aggregating projection-based KS statistics, with asymptotic properties derived from empirical process theory.[36] For graph or network distributions, the KS test is adapted by treating graphs as realizations of random variables over edge probabilities or subgraph counts, with the statistic computed as the supremum of differences in empirical cumulative distributions of these features. 
This measures similarity between network ensembles, such as testing whether two sets of graphs arise from the same generative model, and has applications in social network analysis for detecting structural shifts.[37] The approach embeds graph properties into a distributional framework, enabling nonparametric comparisons without assuming specific parametric forms.[38] As an alternative in high dimensions, the energy distance provides a related measure that avoids the curse of dimensionality plaguing grid-based KS extensions; defined as D²(F, G) = 2E‖X − Y‖ − E‖X − X′‖ − E‖Y − Y′‖ for independent pairs X, X′ ∼ F and Y, Y′ ∼ G, it quantifies distribution differences via expected Euclidean distances and offers faster computation through direct sample estimates without projections or kernels.[39] This makes it suitable for high-dimensional settings where traditional KS variants degrade in power.[40] Theoretically, these generalizations embed distributions into a reproducing kernel Hilbert space (RKHS) H, mapping measures P to kernel mean embeddings μ_P = E[k(·, X)], with the test statistic taken as a supremum of mean discrepancies over the RKHS unit ball; convergence under the null follows functional Donsker theorems, ensuring that the scaled statistic converges to the supremum of a Gaussian process on the RKHS unit ball.[41] This framework unifies infinite-dimensional two-sample testing with universal Donsker classes for empirical processes. 
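As a concrete instance of projection-based testing (a sketch under simple assumptions, not any specific published procedure): by Archimedes' hat-box theorem, the projection of a uniform point on the sphere S² onto any fixed axis is Uniform(−1, 1), so uniformity on the sphere can be probed with a one-dimensional KS statistic on projected coordinates.

```python
import math
import random

def ks_stat(sample, cdf):
    """One-sample KS statistic, evaluated at the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, 1):
        f = cdf(x)
        d = max(d, abs(i / n - f), abs((i - 1) / n - f))
    return d

def sphere_projection_stat(n=500, seed=7):
    """Sample uniformly on S^2 (normalized Gaussian vectors), project
    onto the z-axis, and test the projections against Uniform(-1, 1)."""
    rng = random.Random(seed)
    proj = []
    for _ in range(n):
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        r = math.sqrt(x * x + y * y + z * z)
        proj.append(z / r)
    return ks_stat(proj, lambda t: (t + 1) / 2)  # CDF of Uniform(-1, 1)
```

For genuinely uniform data, √n · D_n behaves like the Kolmogorov variable K, so values near 0.87/√n are typical; aggregating over several random directions, as described above, guards against deviations invisible to a single axis.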
Post-2010 developments include bootstrap methods for p-value computation in streaming-data scenarios, where KS statistics monitor evolving distributions online by resampling from sliding windows to approximate null distributions under non-stationarity.[42] In machine learning, these tests detect distribution shifts, such as in deep reinforcement learning for traffic control, where KS distances between vehicle-flow empirical CDFs quantify deviations causing performance drops (e.g., a 0.02 increase linked to 3.7% throughput loss).[43] An illustrative example is testing uniformity on the hypersphere S^{d−1}, where projections onto random directions yield one-dimensional KS statistics on the projected coordinates, aggregated over multiple directions to assess global deviations; angular measures, such as pairwise angles θ_{ij} = arccos(x_i · x_j), further adapt the supremum to spherical geometry for rotation-invariant testing.[44]
Practical considerations
Implementation algorithms
The Kolmogorov–Smirnov (KS) statistic for the one-sample test is computed by first sorting the sample data in ascending order to obtain the order statistics X_(1) ≤ X_(2) ≤ … ≤ X_(n). The empirical cumulative distribution function (ECDF) is then evaluated at these points, and the test statistic is the supremum of the absolute differences between the ECDF and the hypothesized cumulative distribution function F, specifically D_n = sup_x |F_n(x) − F(x)|, where F_n(X_(i)) = i/n for i = 1, …, n. In practice, this supremum is achieved at the order statistics, so D_n is calculated as the maximum of |i/n − F(X_(i))| and |(i − 1)/n − F(X_(i))| over i = 1, …, n.[1] For the two-sample test, the samples from the two groups are sorted separately to form order statistics, and the ECDFs F_n and G_m are computed for samples of sizes n and m. The statistic is the supremum of |F_n(x) − G_m(x)| over x, which can be efficiently found by merging the two sorted lists and evaluating the differences at the combined order statistics, accounting for jumps in each ECDF. This merged approach avoids redundant computations by traversing the combined sequence once.[1] The time complexity of computing the KS statistic is dominated by the initial sorting step, which requires O(n log n) operations for a sample of size n in the one-sample case, or O((n + m) log(n + m)) in the two-sample case. Following sorting, the search for the supremum difference proceeds in linear time by iterating over the order statistics.[45] P-values for the KS test are calculated differently based on sample size. For large n, asymptotic approximations to the Kolmogorov distribution are used, with λ = √n · D_n (adjusted in the two-sample case to λ = √(nm/(n + m)) · D_{n,m}), giving the tail probability P(K > λ) ≈ 2 Σ_{k=1}^∞ (−1)^{k−1} e^{−2k²λ²}. For smaller samples or when parameters are estimated, exact p-values can be obtained via dynamic-programming methods, such as the Durbin matrix algorithm, which computes the null distribution recursively from powers of a structured matrix. 
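The merged two-sample computation can be sketched as follows, reproducing the Italy/France example from the two-sample section:

```python
def ks_two_sample_stat(a, b):
    """Two-sample KS statistic sup_x |F_n(x) - G_m(x)|, found in one
    pass over the merged order statistics after sorting each sample."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:   # advance past ties in the first sample
            i += 1
        while j < m and b[j] <= x:   # advance past ties in the second sample
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

italy = [2, 4, 6, 8, 10, 12, 14, 16]
france = [1, 3, 5, 7, 9, 11, 13]
print(ks_two_sample_stat(italy, france))  # → 0.25
```

Once one sample is exhausted, the remaining ECDF differences can only shrink, so the loop may stop early without missing the supremum.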
An efficient approximation is the Pelz–Good asymptotic series, which expands the distribution function using additional correction terms for improved accuracy over basic asymptotics, particularly for moderate n.[46] When exact or asymptotic methods are computationally intensive, Monte Carlo simulation offers a flexible alternative for p-value estimation. This involves generating a large number (typically 10³ to 10⁵) of independent samples under the null hypothesis, computing the KS statistic for each, and estimating the p-value as the proportion of simulated statistics exceeding the observed D_n. The process is parallelizable across simulations to reduce computation time.[47] Special handling is required for edge cases to ensure numerical stability and validity. In discrete distributions with ties, the standard KS test assumes continuity and may produce biased results; ties can be addressed by averaging the ECDF ranks at tied points or by adding small uniform random noise to break ties while preserving the distribution. Numerical stability issues arise when F(x) is near 0 or 1, potentially causing floating-point errors in the differences; these are mitigated by using adjusted ECDF evaluations, such as (i − 0.5)/n instead of i/n, to avoid boundary artifacts.[48] The following pseudocode illustrates the computation of the one-sample KS statistic:
function ks_one_sample_statistic(sample, F):
n = length(sample)
sort sample to get X_sorted # O(n log n)
D = 0
for i in 1 to n:
# Evaluate at jumps
d1 = abs(i / n - F(X_sorted[i]))
d2 = abs((i - 1) / n - F(X_sorted[i]))
# Optional adjustment for continuity correction
d3 = abs((i - 0.5) / n - F(X_sorted[i]))
D = max(D, d1, d2, d3)
return D # O(n) after sorting
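A runnable Python version of the pseudocode, together with an asymptotic p-value from the truncated Kolmogorov tail series (a sketch; exact methods are preferable for small n, and the truncated series is clipped to [0, 1] since it is inaccurate for very small λ):

```python
import math

def ks_one_sample(sample, cdf):
    """One-sample KS statistic: max over the order statistics of
    |i/n - F(x_(i))| and |(i-1)/n - F(x_(i))|."""
    xs = sorted(sample)                  # O(n log n)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, 1):        # O(n) scan over the jump points
        f = cdf(x)
        d = max(d, abs(i / n - f), abs((i - 1) / n - f))
    return d

def kolmogorov_tail(lam, terms=100):
    """P(K > lam) ~ 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                     for k in range(1, terms + 1))

def ks_test_asymptotic(sample, cdf):
    """Return (D_n, asymptotic p-value) for a fully specified null CDF."""
    d = ks_one_sample(sample, cdf)
    lam = math.sqrt(len(sample)) * d
    return d, min(1.0, max(0.0, kolmogorov_tail(lam)))

# Uniform(0,1) null, F(x) = x: evenly spread points give a small D_n
d, p = ks_test_asymptotic([0.125, 0.375, 0.625, 0.875], lambda x: x)
print(d)  # → 0.125
```

With n = 4 the asymptotic p-value is close to 1, as expected for data placed at the midpoints of the ECDF steps.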
Software and libraries
The Kolmogorov–Smirnov (KS) test is implemented in various statistical software packages and programming libraries, facilitating its application in data analysis workflows across different environments. In the R programming language, the base stats package provides the ks.test() function, which supports both one-sample and two-sample KS tests. For the one-sample case, it compares an empirical distribution to a specified cumulative distribution function (CDF), such as the normal distribution; corrections for estimated parameters, such as the Lilliefors adjustment, are available through add-on packages. The syntax is exemplified as ks.test(x, y = "pnorm", mean = 0, sd = 1), where x is the data vector and y specifies the theoretical distribution. For advanced handling of discrete distributions, the dgof package extends KS testing with adjustments for ties and lattice distributions, while the boot package enables simulation-based p-value estimation through bootstrapping.
Python's SciPy library offers robust implementations via the scipy.stats module, including kstest() for one-sample tests against a specified or empirical CDF, and ks_2samp() for two-sample comparisons. These functions integrate seamlessly with NumPy arrays, supporting efficient computation on large datasets, and allow custom CDFs for flexible hypothesis testing. For instance, kstest(rvs, 'norm', args=(0, 1)) tests normality assuming mean 0 and standard deviation 1. Starting with SciPy version 1.10.0, enhancements to bootstrap methods improve accuracy for p-value calculations in complex scenarios.
MATLAB includes built-in functions kstest() and kstest2() in its Statistics and Machine Learning Toolbox, performing one- and two-sample KS tests, respectively, with access to precomputed critical value tables for asymptotic distributions. These tools output test statistics, p-values, and confidence intervals, suitable for both interactive and scripted analysis.
Other environments provide specialized support: Julia's HypothesisTests.jl package implements KS tests through types such as ExactOneSampleKSTest for the one-sample case and ApproximateTwoSampleKSTest for the two-sample case, emphasizing high-performance computing. In SAS, the PROC NPAR1WAY procedure conducts KS tests as part of nonparametric analyses. For users of Microsoft Excel, third-party add-ins like the Real Statistics Resource Pack enable basic one-sample KS testing via menu-driven interfaces or VBA functions.
Considerations for implementation include the balance between open-source options like R, Python, and Julia—which offer extensive customization and community support—and proprietary tools like MATLAB and SAS, which provide integrated environments with validated compliance for enterprise use. Version-specific updates, such as those in SciPy, underscore the importance of maintaining current installations to leverage algorithmic refinements.
Power and limitations
The Kolmogorov–Smirnov (KS) test demonstrates moderate statistical power for detecting location shifts in continuous distributions, where it outperforms the chi-squared test due to the latter's reliance on binning, which reduces effectiveness for small sample sizes and continuous data.[1][49] Simulation studies show that power varies with sample size, alternative distribution, and significance level.[50] However, the KS test has low power against changes in variance or deviations involving heavy tails, as it is primarily sensitive to discrepancies near the center of the distribution rather than the extremes.[1] In comparisons with other goodness-of-fit tests, the KS statistic exhibits lower power than the Anderson-Darling test, particularly for tail deviations, where the latter weights observations more heavily in the tails and requires smaller sample sizes (e.g., n=14 vs. n=17 for 80% power at a standardized shift of δ=0.9 in normal distributions).[50][12] Relative to the chi-squared test, the KS approach is preferable for small n (e.g., n<50) because it avoids arbitrary binning and provides exact distribution-free critical values under continuity assumptions.[1] Empirical evidence from 1980s studies, such as those by Stephens, quantified these power curves through extensive tables and simulations for exponentiality and normality tests, confirming the KS test's relative strengths and weaknesses across alternatives. Key limitations include high sensitivity to ties in discrete data, which renders the test underpowered and requires conservative p-value adjustments or specialized implementations to maintain validity.[9] The test assumes independent and identically distributed (i.i.d.) 
observations, leading to invalid results and inflated Type I errors for clustered or dependent data, such as time series.[51] Additionally, when distribution parameters are estimated from the data, the test becomes conservative, necessitating simulation-based critical values rather than asymptotic approximations.[1] Common pitfalls involve misinterpreting p-values without verifying continuity or over-relying on asymptotic distributions for small samples (n < 20), where exact methods or bootstrapping are recommended to avoid underpowered inferences.[1] In the context of credit risk modeling, the two-sample KS statistic is commonly employed to assess the discriminatory power of credit scoring models. It measures the maximum vertical separation between the cumulative distribution functions of predicted scores for good (non-defaulting) borrowers and bad (defaulting) borrowers. A higher KS value indicates superior model discrimination, with values in the range of 40–60% typically signifying strong separation and effective model performance.[52][53][54] To address these issues, variants of the KS test, such as Kuiper's test for circular data, improve power against specific alternatives like rotational shifts by combining the maximal deviations above and below the null CDF.[55] Bootstrap resampling enhances overall power by incorporating parameter uncertainty and providing more accurate p-values, particularly for finite samples or non-i.i.d. settings.[56] Recent studies in the 2020s highlight significant power loss in high-dimensional extensions of the KS test, where the curse of dimensionality dilutes sensitivity, prompting developments in sliced or projected variants for multivariate applications.[57]
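The credit-scoring use of the two-sample statistic can be sketched as follows (the score values are hypothetical, for illustration only):

```python
def ks_separation(good_scores, bad_scores):
    """Maximum vertical gap between the score ECDFs of good and bad
    borrowers, reported as a percentage (the 'KS statistic' of credit
    scoring); roughly 40-60% is usually read as strong separation."""
    cuts = sorted(set(good_scores) | set(bad_scores))
    n, m = len(good_scores), len(bad_scores)
    return 100.0 * max(
        abs(sum(g <= c for g in good_scores) / n -
            sum(b <= c for b in bad_scores) / m)
        for c in cuts
    )

# Hypothetical scores: defaulters concentrate at the low end
good = [620, 660, 700, 720, 750, 780]
bad = [480, 520, 560, 600, 640, 700]
print(round(ks_separation(good, bad), 1))  # → 66.7
```

The cutoff achieving the maximum gap is also a natural candidate score threshold, since it best separates the two populations.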
To address these issues, weighted variants of the KS test, such as Kuiper's test for circular data, improve power against specific alternatives like rotational shifts by emphasizing [uniform](/page/Uniform) weighting across the domain.[](https://cxc.cfa.harvard.edu/csc/why/ks_test.html) Bootstrap resampling enhances overall power by incorporating parameter uncertainty and providing more accurate p-values, particularly for finite samples or non-i.i.d. settings.[](https://www.jams.or.jp/scm/contents/e-2015-3/2015-30.pdf) Recent studies in the 2020s highlight significant power loss in high-dimensional extensions of the KS test, where the curse of dimensionality dilutes sensitivity, prompting developments in sliced or projected variants for multivariate applications.[](https://ml4physicalsciences.github.io/2020/files/NeurIPS_ML4PS_2020_75.pdf)
[](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm)
### Software and libraries
The Kolmogorov–Smirnov (KS) test is implemented in various statistical software packages and programming libraries, facilitating its application in data analysis workflows across different environments.
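Underlying all of these implementations is the same computation: sort the sample and take the largest gap between the empirical CDF and the reference CDF, checking both sides of each step. A minimal pure-Python sketch of the one-sample statistic against the standard normal (the choice of reference CDF here is illustrative):

```python
import math

def ks_statistic(sample):
    """One-sample KS statistic D_n against the standard normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
        # The ECDF jumps from (i-1)/n to i/n at x; check the gap on both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

For a single observation at 0, for example, the ECDF jumps from 0 to 1 at the median of the reference distribution, so the statistic is 0.5.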
In the R programming language, the base `stats` package provides the `ks.test()` function, which supports both one-sample and two-sample KS tests. For the one-sample case, it compares an empirical distribution to a fully specified cumulative distribution function (CDF), as in `ks.test(x, "pnorm", mean = 0, sd = 1)`, where `x` is the data vector and the remaining arguments name the theoretical distribution and its parameters. Because `ks.test()` assumes those parameters are fixed in advance, testing normality with mean and standard deviation estimated from the data calls for the Lilliefors correction, available for example as `lillie.test()` in the `nortest` package. For discrete distributions, the `dgof` package extends KS testing with adjustments for ties and lattice distributions, while the `boot` package supports simulation-based p-value estimation through bootstrapping.
Python's [SciPy](/page/SciPy) library offers implementations via the `scipy.stats` module: `kstest()` for one-sample tests against a specified CDF, and `ks_2samp()` for two-sample comparisons. These functions operate on [NumPy](/page/NumPy) arrays and accept callable CDFs for flexible [hypothesis](/page/Hypothesis) testing. For instance, `kstest(rvs, 'norm', args=(0, 1))` tests against a normal distribution with mean 0 and standard deviation 1. Since version 1.10.0, [SciPy](/page/SciPy) also provides `scipy.stats.goodness_of_fit()`, which computes Monte Carlo [p-values](/page/P-value) for the KS statistic when distribution parameters are estimated from the data.
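A short usage sketch of the SciPy functions named above; the sample sizes, seed, and shift are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=300)

# One-sample test: is the sample consistent with N(0, 1)?
one = stats.kstest(sample, 'norm', args=(0, 1))

# Two-sample test: compare against an independent, shifted sample.
other = rng.normal(loc=1.0, scale=1.0, size=300)
two = stats.ks_2samp(sample, other)

print(one.statistic, one.pvalue)
print(two.statistic, two.pvalue)
```

Both functions return a result object whose `statistic` and `pvalue` attributes hold the KS distance and the associated p-value.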
MATLAB includes built-in functions `kstest()` and `kstest2()` in its Statistics and [Machine Learning](/page/Machine_learning) Toolbox, performing one- and two-sample KS tests, respectively, with access to precomputed critical value tables for asymptotic distributions. These tools output test statistics, p-values, and [confidence](/page/Confidence) intervals, suitable for both interactive and scripted analysis.
Other environments provide specialized support. Julia's `HypothesisTests.jl` package implements KS tests through `ExactOneSampleKSTest` for the one-sample case and `ApproximateTwoSampleKSTest` for the two-sample case. In SAS, the `PROC NPAR1WAY` procedure with the `EDF` option conducts KS tests as part of its nonparametric analyses. For users of [Microsoft Excel](/page/Microsoft_Excel), third-party add-ins such as the Real Statistics Resource Pack enable basic one-sample KS testing via menu-driven interfaces or VBA functions.
Considerations for implementation include the trade-off between open-source options such as [R](/page/R), Python, and Julia, which offer extensive customization and community support, and proprietary tools such as [MATLAB](/page/MATLAB) and SAS, which provide integrated, validated environments favored in regulated enterprise settings. Version-specific updates, such as those in [SciPy](/page/SciPy), underscore the importance of keeping installations current to benefit from algorithmic refinements.
### Power and limitations
The Kolmogorov–Smirnov (KS) test demonstrates moderate statistical power for detecting location shifts in continuous distributions, where it outperforms the [chi-squared test](/page/Chi-squared_test): the latter relies on binning, which discards information and reduces effectiveness for small sample sizes and continuous data.[](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm)[](http://article.sapub.org/10.5923.j.ijps.20180705.02.html) Simulation studies show that power varies with sample size, alternative distribution, and significance level.[](https://www.cna.org/archive/CNA_Files/pdf/dop-2016-u-014638-final.pdf) However, the KS test has low power against changes in variance or deviations in the tails, because the statistic is most sensitive to discrepancies near the center of the distribution, where the variance of the empirical CDF is largest, rather than at the extremes.[](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm)
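The sensitivity pattern described above can be checked by simulation: estimate the rejection rate of the two-sample test under a pure location shift and under a pure scale change. A rough Monte Carlo sketch, in which the sample sizes, effect sizes, and replication count are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

def ks_power(alt_loc, alt_scale, n=50, reps=500, alpha=0.05, seed=0):
    """Fraction of replications in which ks_2samp rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        a = rng.normal(0.0, 1.0, size=n)
        b = rng.normal(alt_loc, alt_scale, size=n)
        if stats.ks_2samp(a, b).pvalue < alpha:
            rejections += 1
    return rejections / reps

# Location shift vs. scale change of broadly comparable magnitude:
power_shift = ks_power(alt_loc=0.75, alt_scale=1.0)
power_scale = ks_power(alt_loc=0.0, alt_scale=1.75)
```

With these settings the estimated power against the location shift comes out substantially higher than against the scale change, consistent with the test's center-weighted sensitivity.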
In comparisons with other goodness-of-fit tests, the KS statistic exhibits lower power than the Anderson–Darling test, particularly for tail deviations, where the latter weights observations more heavily in the tails and requires smaller sample sizes (e.g., n=14 vs. n=17 for 80% power at a standardized shift of δ=0.9 in normal distributions).[](https://www.cna.org/archive/CNA_Files/pdf/dop-2016-u-014638-final.pdf)[](https://www.nrc.gov/docs/ml1714/ml17143a100.pdf) Relative to the [chi-squared test](/page/Chi-squared_test), the KS approach is preferable for small n (e.g., n<50) because it avoids arbitrary binning and provides exact distribution-free critical values under continuity assumptions.[](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm) [Empirical evidence](/page/Empirical_evidence) from 1980s studies, such as those by Stephens, quantified these power curves through extensive tables and simulations for exponentiality and normality tests, confirming the KS test's relative strengths and weaknesses across alternatives.
Key limitations include high sensitivity to ties in discrete data, which renders the test underpowered and requires conservative [p-value](/page/P-value) adjustments or specialized implementations to maintain validity.[](http://www.stat.yale.edu/~jay/EmersonMaterials/DiscreteGOF.pdf) The test assumes independent and identically distributed (i.i.d.) observations, leading to invalid results and inflated Type I errors for clustered or dependent data, such as [time series](/page/Time_series).[](https://www.pnas.org/doi/10.1073/pnas.1008446107) Additionally, when distribution parameters are estimated from the data, the test becomes conservative, necessitating simulation-based critical values rather than asymptotic approximations.[](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm) Common pitfalls involve misinterpreting p-values without verifying continuity or over-relying on asymptotic distributions for small samples (n<20), where exact methods or [bootstrapping](/page/Bootstrapping) are recommended to avoid underpowered inferences.[](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm)
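The parameter-estimation issue can be handled with a parametric bootstrap in the style of the Lilliefors test: re-estimate the parameters on every simulated sample so the null distribution of the statistic reflects the estimation step. A sketch for testing normality with estimated mean and standard deviation; the replication count is an arbitrary choice:

```python
import numpy as np
from scipy import stats

def lilliefors_pvalue(x, reps=1000, seed=0):
    """Monte Carlo p-value for a KS normality test with estimated mean/sd."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    d_obs = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1))).statistic
    exceed = 0
    for _ in range(reps):
        sim = rng.normal(size=n)
        # Re-fit the parameters on each simulated sample, as was done on x.
        d_sim = stats.kstest(sim, 'norm',
                             args=(sim.mean(), sim.std(ddof=1))).statistic
        if d_sim >= d_obs:
            exceed += 1
    return (exceed + 1) / (reps + 1)
```

Comparing `d_obs` against the standard KS tables instead of this refitted null distribution would make the test markedly conservative, which is the failure mode noted above.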
To address these issues, variants of the KS statistic change how the deviations are weighted or combined. Kuiper's test, for example, sums the largest deviations above and below the reference CDF; the resulting statistic is invariant to cyclic shifts of the data, making it suited to circular data and to tail discrepancies that the plain KS statistic misses.[](https://cxc.cfa.harvard.edu/csc/why/ks_test.html) Bootstrap resampling enhances overall power by incorporating parameter uncertainty and providing more accurate p-values, particularly for finite samples or non-i.i.d. settings.[](https://www.jams.or.jp/scm/contents/e-2015-3/2015-30.pdf) Recent studies in the 2020s highlight significant power loss in high-dimensional extensions of the KS test, where the curse of dimensionality dilutes sensitivity, prompting the development of sliced or projected variants for multivariate applications.[](https://ml4physicalsciences.github.io/2020/files/NeurIPS_ML4PS_2020_75.pdf)
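As one concrete resampling scheme, a permutation version of the two-sample test pools both samples and repeatedly reassigns labels, yielding finite-sample p-values that do not rely on the asymptotic distribution. A sketch, with an arbitrary replication count:

```python
import numpy as np
from scipy import stats

def ks_2samp_permutation(a, b, reps=999, seed=0):
    """Permutation p-value for the two-sample KS statistic."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    d_obs = stats.ks_2samp(a, b).statistic
    pooled = np.concatenate([a, b])
    n = len(a)
    exceed = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # in-place relabeling of the pooled sample
        d = stats.ks_2samp(pooled[:n], pooled[n:]).statistic
        if d >= d_obs:
            exceed += 1
    return (exceed + 1) / (reps + 1)
```

The `+1` terms keep the p-value strictly positive, the standard correction for Monte Carlo permutation tests.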
