Chi-squared test

A chi-squared test (also chi-square or χ2 test) is a statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. In simpler terms, this test is primarily used to examine whether two categorical variables (two dimensions of the contingency table) are independent in influencing the test statistic (values within the table).[1] The test is valid when the test statistic is chi-squared distributed under the null hypothesis, specifically Pearson's chi-squared test and variants thereof. Pearson's chi-squared test is used to determine whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table. For contingency tables with smaller sample sizes, Fisher's exact test is used instead.
In the standard applications of this test, the observations are classified into mutually exclusive classes. If the null hypothesis that there are no differences between the classes in the population is true, the test statistic computed from the observations follows a χ2 frequency distribution. The purpose of the test is to evaluate how likely the observed frequencies would be assuming the null hypothesis is true.
Test statistics that follow a χ2 distribution occur when the observations are independent. There are also χ2 tests for testing the null hypothesis of independence of a pair of random variables based on observations of the pairs.
The term chi-squared test often refers to tests for which the distribution of the test statistic approaches the χ2 distribution asymptotically, meaning that the sampling distribution (if the null hypothesis is true) of the test statistic approximates a chi-squared distribution more and more closely as sample sizes increase.
History
In the 19th century, statistical analytical methods were mainly applied in biological data analysis, and it was customary for researchers such as Sir George Airy and Mansfield Merriman to assume that observations followed a normal distribution; their works were criticized by Karl Pearson in his 1900 paper.[2]
At the end of the 19th century, Pearson noticed the existence of significant skewness within some biological observations. In order to model the observations regardless of being normal or skewed, Pearson, in a series of articles published from 1893 to 1916,[3][4][5][6] devised the Pearson distribution, a family of continuous probability distributions, which includes the normal distribution and many skewed distributions, and proposed a method of statistical analysis consisting of using the Pearson distribution to model the observation and performing a test of goodness of fit to determine how well the model really fits to the observations.
Pearson's chi-squared test
In 1900, Pearson published a paper[2] on the χ2 test which is considered to be one of the foundations of modern statistics.[7] In this paper, Pearson investigated a test of goodness of fit.
Suppose that n observations in a random sample from a population are classified into k mutually exclusive classes with respective observed numbers of observations xi (for i = 1,2,…,k), and a null hypothesis gives the probability pi that an observation falls into the ith class. So we have the expected numbers mi = npi for all i, where

$$\sum_{i=1}^{k} p_i = 1, \qquad \sum_{i=1}^{k} m_i = n \sum_{i=1}^{k} p_i = n.$$

Pearson proposed that, under the circumstance of the null hypothesis being correct, as n → ∞ the limiting distribution of the quantity given below is the χ2 distribution:

$$X^2 = \sum_{i=1}^{k} \frac{(x_i - m_i)^2}{m_i}.$$
Pearson dealt first with the case in which the expected numbers mi are large enough known numbers in all cells assuming every observation xi may be taken as normally distributed, and reached the result that, in the limit as n becomes large, X2 follows the χ2 distribution with k − 1 degrees of freedom.
However, Pearson next considered the case in which the expected numbers depended on the parameters that had to be estimated from the sample, and suggested that, with the notation of mi being the true expected numbers and m′i being the estimated expected numbers, the difference

$$X^2 - X'^2 = \sum_{i=1}^{k} \frac{(x_i - m_i)^2}{m_i} - \sum_{i=1}^{k} \frac{(x_i - m'_i)^2}{m'_i}$$

will usually be positive and small enough to be omitted. In conclusion, Pearson argued that if we regarded X′2 as also distributed as the χ2 distribution with k − 1 degrees of freedom, the error in this approximation would not affect practical decisions. This conclusion caused some controversy in practical applications and was not settled for 20 years, until Fisher's 1922 and 1924 papers.[8][9]
Other examples of chi-squared tests
One test statistic that follows a chi-squared distribution exactly arises in the test that the variance of a normally distributed population has a given value, based on the sample variance. Such tests are uncommon in practice because the true variance of the population is usually unknown. However, there are several statistical tests where the chi-squared distribution is approximately valid:
Fisher's exact test
For an exact test used in place of the 2 × 2 chi-squared test for independence when all the row and column totals are fixed by design, see Fisher's exact test. When the row or column margins (or both) are random variables (as in most common research designs), Fisher's exact test tends to be overly conservative and underpowered.[10]
Binomial test
For an exact test used in place of the 2 × 1 chi-squared test for goodness of fit, see binomial test.
Other chi-squared tests
- Cochran–Mantel–Haenszel chi-squared test
- McNemar's test, used in certain 2 × 2 tables with pairing
- Tukey's test of additivity
- The portmanteau test in time-series analysis, testing for the presence of autocorrelation
- Likelihood-ratio tests in general statistical modelling, for testing whether there is evidence of the need to move from a simple model to a more complicated one (where the simple model is nested within the complicated one).
Yates's correction for continuity
Using the chi-squared distribution to interpret Pearson's chi-squared statistic requires one to assume that the discrete probability of observed binomial frequencies in the table can be approximated by the continuous chi-squared distribution. This assumption is not quite correct and introduces some error.
To reduce the error in approximation, Frank Yates suggested a correction for continuity that adjusts the formula for Pearson's chi-squared test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table.[11] This reduces the chi-squared value obtained and thus increases its p-value.
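A minimal Python sketch, assuming SciPy and an arbitrary small 2 × 2 table, comparing the uncorrected Pearson statistic with the Yates-corrected one:

import numpy as np
from scipy.stats import chi2_contingency

# Arbitrary small 2 x 2 table, used only to illustrate the correction
observed = np.array([[8, 2],
                     [1, 5]])

stat_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)
stat_yates, p_yates, _, _ = chi2_contingency(observed, correction=True)

# The corrected statistic is smaller, so its p-value is larger
print(f"Pearson: chi2 = {stat_plain:.3f}, p = {p_plain:.4f}")
print(f"Yates:   chi2 = {stat_yates:.3f}, p = {p_yates:.4f}")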
Chi-squared test for variance in a normal population
If a sample of size n is taken from a population having a normal distribution, then there is a result (see distribution of the sample variance) which allows a test to be made of whether the variance of the population has a pre-determined value. For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error. Suppose that a variant of the process is being tested, giving rise to a small sample of n product items whose variation is to be tested. The test statistic T in this instance could be set to be the sum of squares about the sample mean, divided by the nominal value for the variance (i.e. the value to be tested as holding). Then T has a chi-squared distribution with n − 1 degrees of freedom. For example, if the sample size is 21, the acceptance region for T with a significance level of 5% is between 9.59 and 34.17.
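The acceptance region quoted above can be checked with a small sketch, assuming SciPy, by taking the 2.5% and 97.5% quantiles of the chi-squared distribution with 20 degrees of freedom:

from scipy.stats import chi2

# Two-sided 5% test of a variance with sample size 21, i.e. 20 degrees of freedom
df = 21 - 1
lower = chi2.ppf(0.025, df)   # about 9.59
upper = chi2.ppf(0.975, df)   # about 34.17
print(f"Acceptance region for T: ({lower:.2f}, {upper:.2f})")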
Example chi-squared test for categorical data
Suppose there is a city of 1,000,000 residents with four neighborhoods: A, B, C, and D. A random sample of 650 residents of the city is taken and their occupation is recorded as "white collar", "blue collar", or "no collar". The null hypothesis is that each person's neighborhood of residence is independent of the person's occupational classification. The data are tabulated as:
| | A | B | C | D | Total |
|---|---|---|---|---|---|
| White collar | 90 | 60 | 104 | 95 | 349 |
| Blue collar | 30 | 50 | 51 | 20 | 151 |
| No collar | 30 | 40 | 45 | 35 | 150 |
| Total | 150 | 150 | 200 | 150 | 650 |
Let us take the sample living in neighborhood A, 150, to estimate what proportion of the whole 1,000,000 live in neighborhood A. Similarly we take 349/650 to estimate what proportion of the 1,000,000 are white-collar workers. By the assumption of independence under the hypothesis we should "expect" the number of white-collar workers in neighborhood A to be

$$150 \times \frac{349}{650} \approx 80.54.$$

Then in that "cell" of the table, we have

$$\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(90 - 80.54)^2}{80.54} \approx 1.11.$$

The sum of these quantities over all of the cells is the test statistic; in this case, it is approximately 24.57. Under the null hypothesis, this sum has approximately a chi-squared distribution whose number of degrees of freedom is

$$(\text{number of rows} - 1)(\text{number of columns} - 1) = (3 - 1)(4 - 1) = 6.$$
If the test statistic is improbably large according to that chi-squared distribution, then one rejects the null hypothesis of independence.
A related issue is a test of homogeneity. Suppose that instead of giving every resident of each of the four neighborhoods an equal chance of inclusion in the sample, we decide in advance how many residents of each neighborhood to include. Then each resident has the same chance of being chosen as do all residents of the same neighborhood, but residents of different neighborhoods would have different probabilities of being chosen if the four sample sizes are not proportional to the populations of the four neighborhoods. In such a case, we would be testing "homogeneity" rather than "independence". The question is whether the proportions of blue-collar, white-collar, and no-collar workers in the four neighborhoods are the same. However, the test is done in the same way.
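A short sketch, assuming SciPy, reproduces this worked example from the observed table; the statistic comes out to about 24.6 on 6 degrees of freedom:

import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows are occupations, columns are neighborhoods A-D
observed = np.array([[90, 60, 104, 95],
                     [30, 50,  51, 20],
                     [30, 40,  45, 35]])

stat, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p:.2e}")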
Applications
In cryptanalysis, the chi-squared test is used to compare the distribution of plaintext and (possibly) decrypted ciphertext. The lowest value of the test statistic means that the decryption was successful with high probability.[12][13] This method can be generalized for solving modern cryptographic problems.[14]
In bioinformatics, the chi-squared test is used to compare the distribution of certain properties of genes (e.g., genomic content, mutation rate, interaction network clustering, etc.) belonging to different categories (e.g., disease genes, essential genes, genes on a certain chromosome etc.).[15][16]
References
[edit]- ^ "Chi-Square - Sociology 3112 - Department of Sociology - The University of utah". soc.utah.edu. Retrieved 2022-11-12.
- ^ a b Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling". Philosophical Magazine. Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
- ^ Pearson, Karl (1893). "Contributions to the mathematical theory of evolution [abstract]". Proceedings of the Royal Society. 54: 329–333. doi:10.1098/rspl.1893.0079. JSTOR 115538.
- ^ Pearson, Karl (1895). "Contributions to the mathematical theory of evolution, II: Skew variation in homogeneous material". Philosophical Transactions of the Royal Society. 186: 343–414. Bibcode:1895RSPTA.186..343P. doi:10.1098/rsta.1895.0010. JSTOR 90649.
- ^ Pearson, Karl (1901). "Mathematical contributions to the theory of evolution, X: Supplement to a memoir on skew variation". Philosophical Transactions of the Royal Society A. 197 (287–299): 443–459. Bibcode:1901RSPTA.197..443P. doi:10.1098/rsta.1901.0023. JSTOR 90841.
- ^ Pearson, Karl (1916). "Mathematical contributions to the theory of evolution, XIX: Second supplement to a memoir on skew variation". Philosophical Transactions of the Royal Society A. 216 (538–548): 429–457. Bibcode:1916RSPTA.216..429P. doi:10.1098/rsta.1916.0009. JSTOR 91092.
- ^ Cochran, William G. (1952). "The Chi-square Test of Goodness of Fit". The Annals of Mathematical Statistics. 23 (3): 315–345. doi:10.1214/aoms/1177729380. JSTOR 2236678.
- ^ Fisher, Ronald A. (1922). "On the Interpretation of χ2 from Contingency Tables, and the Calculation of P". Journal of the Royal Statistical Society. 85 (1): 87–94. doi:10.2307/2340521. JSTOR 2340521.
- ^ Fisher, Ronald A. (1924). "The Conditions Under Which χ2 Measures the Discrepancy Between Observation and Hypothesis". Journal of the Royal Statistical Society. 87 (3): 442–450. JSTOR 2341149.
- ^ Campbell, Ian (2007-08-30). "Chi-squared and Fisher–Irwin tests of two-by-two tables with small sample recommendations". Statistics in Medicine. 26 (19): 3661–3675. doi:10.1002/sim.2832. ISSN 0277-6715. PMID 17315184.
- ^ Yates, Frank (1934). "Contingency table involving small numbers and the χ2 test". Supplement to the Journal of the Royal Statistical Society. 1 (2): 217–235. doi:10.2307/2983604. JSTOR 2983604.
- ^ "Chi-squared Statistic". Practical Cryptography. Archived from the original on 18 February 2015. Retrieved 18 February 2015.
- ^ "Using Chi Squared to Crack Codes". IB Maths Resources. British International School Phuket. 15 June 2014.
- ^ Ryabko, B. Ya.; Stognienko, V. S.; Shokin, Yu. I. (2004). "A new test for randomness and its application to some cryptographic problems" (PDF). Journal of Statistical Planning and Inference. 123 (2): 365–376. doi:10.1016/s0378-3758(03)00149-6. Retrieved 18 February 2015.
- ^ Feldman, I.; Rzhetsky, A.; Vitkup, D. (2008). "Network properties of genes harboring inherited disease mutations". PNAS. 105 (11): 4323–432. Bibcode:2008PNAS..105.4323F. doi:10.1073/pnas.0701722105. PMC 2393821. PMID 18326631.
- ^ "chi-square-tests" (PDF). Archived from the original (PDF) on 29 June 2018. Retrieved 29 June 2018.
Further reading
- Weisstein, Eric W. "Chi-Squared Test". MathWorld.
- Corder, G. W.; Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach. New York: Wiley. ISBN 978-1118840313.
- Greenwood, Cindy; Nikulin, M. S. (1996). A guide to chi-squared testing. New York: Wiley. ISBN 0-471-55779-X.
- Nikulin, M. S. (1973). Chi-squared test for normality. Proceedings of the International Vilnius Conference on Probability Theory and Mathematical Statistics. Vol. 2. pp. 119–122.
- Bagdonavicius, Vilijandas B.; Nikulin, Mikhail S. (2011). "Chi-squared goodness-of-fit test for right censored data". International Journal of Applied Mathematics & Statistics. 24: 30–50. MR 2800388.
- Chicco D.; Sichenze A.; Jurman G. (2025). "A simple guide to the use of Student's t-test, Mann-Whitney U test, Chi-squared test, and Kruskal-Wallis test in biostatistics". BioData Mining. 18 (56): 1-51. doi:10.1186/s13040-025-00465-6.
Chi-squared test
Introduction
Definition and Purpose
The chi-squared test is a statistical hypothesis test that employs the chi-squared distribution to assess the extent of discrepancies between observed frequencies and expected frequencies in categorical data.[4] It evaluates whether these differences are likely due to random variation or indicate a significant deviation from the null hypothesis.[5] Under the null hypothesis, the test statistic follows an asymptotic chi-squared distribution, allowing for the computation of p-values to determine statistical significance.[4]

The primary purposes of the chi-squared test are to examine independence between two or more categorical variables in contingency tables and to test the goodness-of-fit of observed data to a specified theoretical distribution.[6] In the test of independence, it determines whether the distribution of one variable depends on the levels of another, such as assessing associations in survey responses across demographic groups.[5] For goodness-of-fit, it verifies if empirical frequencies align with expected proportions under models like uniformity or specific probability distributions.[4] The test statistic is given by

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i},$$

where $O_i$ represents the observed frequencies and $E_i$ the expected frequencies for each category $i$.[4]

Developed in early 20th-century statistics for analyzing categorical data, the chi-squared test is non-parametric, imposing no assumptions on the underlying distribution of the data itself, but relying on the asymptotic chi-squared distribution of the statistic under the null hypothesis.[7] This makes it versatile for applications where data are counts or proportions without normality requirements.[5]

Assumptions and Prerequisites
The chi-squared test requires that observations are independent, meaning each data point is collected without influencing others, to ensure the validity of the underlying statistical inference.[5] This independence assumption holds when the sample is drawn as a simple random sample from the population, avoiding any systematic dependencies or clustering in the data.[8] Additionally, the test is designed for categorical data, where variables are discrete or binned into mutually exclusive categories, rather than continuous measurements that have not been discretized.[9]

A critical assumption concerns sample size adequacy: the expected frequencies in at least 80% of the cells should be 5 or greater, with no expected frequencies less than 1, to justify the asymptotic approximation to the chi-squared distribution under the null hypothesis.[10] Violations of this rule, particularly in small samples, can lead to unreliable p-values, necessitating alternatives such as exact tests like Fisher's exact test.[11]

Prior to applying the chi-squared test, users should possess foundational knowledge in probability theory, including concepts like expected values and distributions, as well as the hypothesis testing framework, encompassing null and alternative hypotheses, test statistics, and interpretation of p-values at chosen significance levels (e.g., α = 0.05).[12] These prerequisites enable proper setup of the test for applications such as assessing independence in contingency tables.[13]

Historical Development
Karl Pearson's Formulation
In 1900, Karl Pearson introduced the chi-squared test in a paper published in the Philosophical Magazine, presenting it as a criterion to determine whether observed deviations from expected probabilities in a system of correlated variables could reasonably be ascribed to random sampling. This formulation addressed key limitations in prior approaches to analyzing categorical data, particularly in biological contexts where earlier methods struggled to quantify discrepancies between empirical observations and theoretical expectations.[14] Pearson's work was motivated by the need for a robust tool to evaluate patterns in genetics, building on challenges posed by datasets like those from Gregor Mendel's experiments on pea plant inheritance, which highlighted inconsistencies in fitting discrete distributions to observed frequencies.[15]

Pearson derived the test statistic as a sum of squared deviations between observed and expected frequencies, divided by the expected frequencies to account for varying scales across categories; this measure captured the overall discrepancy in a single value, inspired by the summation of squared standardized normals from multivariate normal theory.[16] He symbolized the statistic with the Greek letter χ² (pronounced "chi-squared"), reflecting its connection to the squared form of the character χ, a notation that has persisted in statistical literature. Initially, Pearson applied the test to biological data on inheritance patterns, such as ratios in genetic crosses, enabling researchers to assess whether empirical results aligned with hypothesized Mendelian proportions under random variation.[14]

A pivotal aspect of Pearson's contribution was establishing the asymptotic distribution of the χ² statistic under the null hypothesis of good fit, linking it to a chi-squared distribution with k degrees of freedom, where k equals the number of categories minus the number of parameters estimated from the data. This theoretical foundation allowed for probabilistic inference, with larger values of the statistic indicating poorer fit and lower probabilities of the data arising by chance alone.[16] By formalizing this approach, Pearson provided the first systematic method for goodness-of-fit testing in categorical settings, profoundly influencing the development of modern statistical inference in biology and beyond.[14]

Subsequent Contributions and Naming
Following Karl Pearson's initial formulation, Ronald A. Fisher advanced the chi-squared test in the 1920s by rigorously establishing its asymptotic chi-squared distribution under the null hypothesis and extending its application to testing independence in contingency tables. In his 1922 paper, Fisher derived the appropriate degrees of freedom for the test statistic, (r − 1)(c − 1) for an r × c contingency table, correcting earlier inconsistencies in Pearson's approach and enabling more accurate p-value calculations for assessing deviations from independence.[17]

The nomenclature distinguishes "Pearson's chi-squared test" as the statistical procedure itself, crediting its originator, from the "chi-squared distribution," which describes the limiting probability distribution of the test statistic. This naming convention arises from Pearson's adoption of the symbol χ² (chi squared) for the statistic, while Fisher provided the foundational proof of its convergence to the chi-squared distribution, solidifying the theoretical basis.[18]

In the 1930s, the chi-squared test became integrated into the Neyman-Pearson framework for hypothesis testing, which emphasized specifying alternative hypotheses, controlling both Type I and Type II error rates, and using p-values to quantify evidence against the null. This incorporation elevated the test's role in formal inferential procedures, aligning it with broader developments in statistical decision theory. By the 1940s, the chi-squared test achieved widespread recognition in genetics, as seen in Fisher's 1936 application to evaluate the goodness-of-fit of Gregor Mendel's experimental ratios to Mendelian expectations, revealing improbably precise results suggestive of data adjustment. In social sciences, it facilitated analysis of associations in categorical survey data, with standardization occurring through its prominent inclusion in influential textbooks like E.F. Lindquist's 1940 Statistical Analysis in Educational Research, which exemplified its use in fields such as education and sociology.[19][20]

The Pearson Chi-squared Statistic
Mathematical Formulation
The chi-squared test evaluates hypotheses concerning the distribution of categorical data. The null hypothesis $H_0$ asserts that the observed frequencies conform to expected frequencies under a specified theoretical distribution (goodness-of-fit test) or that categorical variables are independent (test of independence), while the alternative hypothesis posits deviation from this fit or presence of dependence.[4] The Pearson chi-squared statistic is given by

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i},$$

where the sum is over all categories $i$, $O_i$ denotes the observed frequency in category $i$, and $E_i$ is the expected frequency under $H_0$. This formulation measures the discrepancy between observed and expected values, normalized by the expected frequencies to account for varying category sizes.[4] The statistic was originally proposed by Karl Pearson in 1900 as a measure of goodness of fit for frequency distributions.[7]

The statistic derives from the multinomial likelihood under $H_0$, where the data follow a multinomial distribution with probabilities $p_i$ yielding the expected frequencies $E_i$. The log-likelihood ratio test statistic provides an alternative measure, but under large samples, a second-order Taylor (quadratic) approximation of the log-likelihood around the null yields the Pearson form asymptotically.[21]

For the goodness-of-fit test, the expected frequencies are $E_i = n p_i$, where $n$ is the total sample size and $p_i$ are the theoretical probabilities for each category under $H_0$.[4] In the test of independence for an $r \times c$ contingency table, the expected frequency for cell $(i, j)$ is $E_{ij} = R_i C_j / N$, where $R_i$ is the total for row $i$, $C_j$ the total for column $j$, and $N$ the grand total.[22] Under $H_0$ and large sample sizes, $\chi^2$ approximately follows a chi-squared distribution with appropriate degrees of freedom, and the p-value is $P(\chi^2_{df} \geq x)$, where $x$ is the computed statistic.[4]

Asymptotic Properties and Degrees of Freedom
Under the null hypothesis and for sufficiently large sample sizes, the Pearson chi-squared statistic converges in distribution to a central chi-squared distribution with a specified number of degrees of freedom, providing the theoretical basis for hypothesis testing. This asymptotic property, established by Pearson in his foundational work, allows the use of chi-squared critical values to assess the significance of observed deviations from expected frequencies.[7]

The degrees of freedom for the chi-squared distribution depend on the test context. In the test of independence for an $r \times c$ contingency table, the degrees of freedom are $(r - 1)(c - 1)$, reflecting the number of independent cells after accounting for row and column marginal constraints.[5] For the goodness-of-fit test involving $k$ categories where the expected frequencies are fully specified, the degrees of freedom are $k - 1$; if $m$ parameters of the hypothesized distribution are estimated from the data, this adjusts to $k - 1 - m$.[4]

This asymptotic chi-squared distribution arises from the Central Limit Theorem applied to the multinomial sampling model underlying the test. Under the null hypothesis, the standardized differences $(O_i - E_i)/\sqrt{E_i}$ for each category are approximately independent standard normal random variables for large expected frequencies $E_i$, so their squares sum to a chi-squared variate.[23]

To conduct the test, the observed chi-squared statistic is compared to the critical value from the chi-squared distribution with the appropriate degrees of freedom at significance level $\alpha$; the null hypothesis is rejected if the statistic exceeds this value. Critical values are available in standard tables or computed via statistical software functions.[24] The validity of the chi-squared approximation strengthens as the expected frequencies increase, typically recommended to be at least 5 in most cells to ensure reliable inference.[4]

Primary Applications
Test of Independence for Categorical Data
The chi-squared test of independence assesses whether there is a statistically significant association between two categorical variables, using an r × c contingency table that displays observed frequencies Oij for each combination of row category i (where i = 1 to r) and column category j (where j = 1 to c). This setup arises from cross-classifying a sample of N observations into the table cells based on their values for the two variables. The null hypothesis H0 posits that the row variable and column variable are independent, implying that the distribution of one variable does not depend on the levels of the other; the alternative hypothesis Ha suggests dependence or association between them.

Under the null hypothesis, expected frequencies Eij for each cell are computed as the product of the row total for i and the column total for j, divided by the overall sample size N:

$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N}.$$

These expected values represent what would be anticipated if the variables were truly independent, preserving the marginal totals of the observed table. The test then evaluates deviations between observed and expected frequencies using the Pearson chi-squared statistic, which approximates a chi-squared distribution with (r − 1)(c − 1) degrees of freedom for sufficiently large samples (typically when all expected frequencies exceed 5).

Interpretation involves computing the p-value from the chi-squared distribution of the test statistic; if the p-value is less than the chosen significance level α (commonly 0.05), the null hypothesis is rejected in favor of the alternative, indicating evidence of dependence between the variables. To quantify the strength of any detected association beyond mere significance, measures such as Cramér's V can be applied, defined as the square root of the chi-squared statistic divided by N times the minimum of (r − 1) and (c − 1), yielding a value between 0 (no association) and 1 (perfect association). This test is particularly common in analyzing survey data, such as examining the relationship between gender (rows) and voting preference (columns) in election studies.

If the test rejects independence, post-hoc analysis of cell contributions aids in identifying which specific combinations drive the result. Standardized Pearson residuals, calculated as (Oij − Eij) / √Eij, highlight deviations; residuals with absolute values exceeding about 2 (corresponding to a roughly 5% tail probability under the null) suggest cells where observed frequencies differ markedly from expectations, signaling localized associations. These residuals follow an approximate standard normal distribution under the null, facilitating targeted interpretation while accounting for varying expected cell sizes.[25]
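A minimal Python sketch, assuming SciPy and a hypothetical 2 × 3 table of survey counts, showing how the statistic, Cramér's V, and the residuals described above might be computed:

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 3 table (rows: gender, columns: candidate preference)
observed = np.array([[30, 15, 25],
                     [20, 25, 35]])

stat, p, dof, expected = chi2_contingency(observed, correction=False)

# Cramer's V: sqrt(chi2 / (N * min(r - 1, c - 1)))
n = observed.sum()
r, c = observed.shape
cramers_v = np.sqrt(stat / (n * min(r - 1, c - 1)))

# Residuals (O - E) / sqrt(E) flag the cells driving any association
residuals = (observed - expected) / np.sqrt(expected)

print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.4f}")
print(f"Cramer's V = {cramers_v:.3f}")
print(residuals)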
Goodness-of-Fit Test

The chi-squared goodness-of-fit test evaluates whether the distribution of observed categorical data aligns with a predefined theoretical distribution, providing a measure of discrepancy between observed and expected frequencies. This test is particularly valuable when assessing if sample outcomes conform to expected probabilities derived from theoretical models, such as uniform, Poisson, or multinomial distributions. Introduced as part of the broader chi-squared framework by Karl Pearson, it serves as a foundational tool in statistical inference for distribution validation.[7][4]

In the standard setup, the data is partitioned into k mutually exclusive categories, yielding observed counts Oi for each category i = 1, 2, ..., k. The corresponding expected counts are then calculated as $E_i = n p_i$, where n is the total sample size and $p_i$ represents the theoretical probability for category i, often set to $1/k$ for uniformity or derived from parametric models like the Poisson distribution. For instance, in testing dice fairness, each of the six faces would have an expected probability of $1/6$ under the null assumption of uniformity. The test proceeds by computing the chi-squared statistic from these frequencies, as detailed in the mathematical formulation section.[4][26]

The null hypothesis (H0) asserts that the observed data arises from the specified theoretical distribution, implying no significant deviation between observed and expected frequencies. The alternative hypothesis (Ha) posits that the data does not follow this distribution, indicating a mismatch that could arise from systematic biases or non-random processes. Unique applications include verifying uniformity in random number generators or gaming devices like dice, as well as checking adherence to multinomial models in fields such as genetics or market research.[4][27]

When the theoretical probabilities involve parameters estimated directly from the sample data, such as the mean in a Poisson fit, the degrees of freedom must be adjusted to account for this estimation, given by $df = k - 1 - m$, where m is the number of parameters fitted. This adjustment ensures the test's validity by reducing the effective freedom to reflect the information used in parameter estimation. The chi-squared goodness-of-fit test is commonly applied in quality control to assess process consistency, such as verifying that defect rates or product categorizations match expected distributional norms in manufacturing.[4][28]
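As a sketch of the multinomial case, assuming SciPy, the counts from Mendel's classic dihybrid cross can be tested against the hypothesized 9:3:3:1 ratio with scipy.stats.chisquare; its ddof argument implements the k − 1 − m adjustment when m parameters are estimated from the data (here none):

import numpy as np
from scipy.stats import chisquare

# Mendel's dihybrid-cross counts, tested against the 9:3:3:1 ratio
observed = np.array([315, 101, 108, 32])
expected = observed.sum() * np.array([9, 3, 3, 1]) / 16

# ddof=0 because no parameters were estimated from the data (df = k - 1 = 3)
stat, p = chisquare(f_obs=observed, f_exp=expected, ddof=0)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")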
Computational Methods
Step-by-Step Calculation
To perform a manual calculation of the chi-squared test statistic, begin by organizing the data into a contingency table for tests of independence or a frequency table for goodness-of-fit tests, recording the observed frequencies in each cell or category.[5][29] Next, compute the expected frequencies for each cell or category, which depend on the specific application: for a test of independence in categorical data, these are derived from the marginal totals and overall sample size as outlined in the mathematical formulation; for a goodness-of-fit test, they are obtained by multiplying the total sample size by the hypothesized proportions for each category.[5][29]

Then, calculate the test statistic using the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E},$$

where the sum is taken over all cells or categories, providing a measure of deviation between observed and expected frequencies.[5][29] Determine the degrees of freedom (df) based on the test type: for independence, df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns in the contingency table; for goodness-of-fit, df = k − 1 − m, where k is the number of categories and m is the number of parameters estimated from the data. Use a chi-squared distribution table or software to find the critical value at a chosen significance level (e.g., α = 0.05) or the p-value, referencing the asymptotic properties of the statistic for large samples.[5][29]

Compare the computed $\chi^2$ to the critical value: if $\chi^2$ exceeds the critical value (or if the p-value < α), reject the null hypothesis of independence or good fit; otherwise, fail to reject it. Report the results in the standard format, such as "$\chi^2$(df) = value, p = value," to summarize the finding.[5][29] Note that the chi-squared approximation is reliable only when expected frequencies meet certain conditions, such as $E \geq 5$ in at least 80% of cells with no $E < 1$, as smaller values can lead to inaccurate p-values and may require alternative tests.[5][29]

Software and Implementation Notes
The chi-squared test is implemented in various statistical software packages, facilitating both tests of independence and goodness-of-fit. In R, the chisq.test() function from the base stats package handles both types of tests on count data.[30] For a contingency table test of independence, users provide a matrix of observed frequencies, optionally applying Yates's continuity correction via the correct parameter (default: TRUE for 2x2 tables).[30] For goodness-of-fit, the function accepts a vector of observed counts and a vector of expected probabilities via the p parameter.[30] An example for independence is:
observed <- matrix(c(10, 20, 30, 40), nrow=2)
result <- chisq.test(observed, correct=FALSE)
print(result)
In Python, SciPy provides scipy.stats.chi2_contingency() for tests of independence on contingency tables, returning the statistic, p-value, degrees of freedom, and expected frequencies.[31] The function applies Yates's correction by default but allows disabling it; since version 1.11.0, a method parameter supports Monte Carlo simulation or permutation tests for improved accuracy with small samples.[31] For goodness-of-fit, scipy.stats.chisquare() compares observed frequencies to expected ones under the null hypothesis of equal probabilities (or user-specified via f_exp).[32] An example for independence is:
import numpy as np
from scipy.stats import chi2_contingency
observed = np.array([[10, 20], [30, 40]])
stat, p, dof, expected = chi2_contingency(observed)
print(f'Statistic: {stat}, p-value: {p}')
In Microsoft Excel, the CHISQ.TEST() function computes the p-value for a goodness-of-fit or independence test by comparing actual and expected ranges; the right-tail probability for a given statistic and degrees of freedom can be computed separately with CHISQ.DIST.RT().[34] For example: =CHISQ.TEST(A1:B2, C1:D2), where A1:B2 holds observed and C1:D2 expected values.[34]
Software implementations often issue warnings for low expected counts, as the chi-squared approximation may be unreliable if more than 20% of cells have expected frequencies <5 or any <1.[30][31] In R, chisq.test() explicitly warns if expected values are <5.[30] Similarly, SciPy notes potential inaccuracy for small frequencies and recommends alternatives like exact tests.[32] Residuals, useful for identifying influential cells, are accessible in outputs: R provides Pearson residuals (result$residuals) and standardized residuals (result$stdres); SciPy allows computation from returned expected frequencies as (observed - expected) / sqrt(expected); SPSS includes them in Crosstabs tables when selected.[30][31][33]
As of 2025, modern software includes simulation-based options like Monte Carlo or bootstrapping for p-values in small samples to enhance accuracy beyond the asymptotic approximation.[30][31] In R, set simulate.p.value=TRUE with B replicates for Monte Carlo p-values.[30] SciPy's chi2_contingency supports 'monte-carlo' or 'permutation' methods.[31] SPSS Exact Tests module offers Monte Carlo simulation for exact p-values in the Crosstabs dialog.[33] These approaches resample the data to estimate the null distribution, mitigating issues with low counts.[30][31]
Adjustments and Related Tests
Yates's Correction for Continuity
Yates's correction for continuity is a modification to the standard Pearson chi-squared statistic designed specifically for 2×2 contingency tables involving small sample sizes. Introduced by Frank Yates in 1934, it adjusts the test to better account for the discrete nature of categorical count data when approximating the continuous chi-squared distribution, thereby improving the accuracy of the p-value estimation.[35][36] The corrected statistic is computed as

$$\chi^2_{\text{Yates}} = \sum_{i} \frac{(|O_i - E_i| - 0.5)^2}{E_i},$$

where $O_i$ denotes the observed frequency in cell $i$, $E_i$ the expected frequency under the null hypothesis of independence, and the subtraction of 0.5 serves as the continuity correction to mitigate the discontinuity between discrete observations and the continuous approximation. This adjustment reduces the value of the chi-squared statistic compared to the uncorrected version, making it less likely to reject the null hypothesis and thus lowering the risk of Type I error inflation in small samples.[36][37]

The correction is recommended for application in 2×2 tables when all expected cell frequencies are at least 1 and at least one is less than 5, as these conditions indicate potential inadequacy of the chi-squared approximation without adjustment. However, Yates explicitly advised against its use for tables larger than 2×2, where the correction has minimal impact and may unnecessarily complicate computations.[35][36]

Despite its historical utility, the routine use of Yates's correction remains debated among statisticians, with critics arguing that it is overly conservative, particularly in modern contexts where exact tests are computationally feasible, potentially reducing statistical power without substantial benefits in controlling error rates. Influential analyses, such as those by Agresti, highlight that the correction is often unnecessary given advancements in exact methods and simulation-based approaches.[38]

Fisher's Exact Test and Binomial Test as Alternatives
Fisher's exact test provides an exact alternative to the chi-squared test of independence for 2×2 contingency tables, particularly when sample sizes are small and the chi-squared approximation may be unreliable. Developed by Ronald A. Fisher, the test computes the probability of observing the given table (or one more extreme) under the null hypothesis of independence, assuming fixed marginal totals, using the hypergeometric distribution.[39] The p-value is obtained by summing the hypergeometric probabilities of all tables with the same margins that are as or less probable than the observed table. This test is especially recommended for 2×2 tables where one or more expected cell frequencies are less than 5, as the chi-squared test's asymptotic approximation performs poorly in such cases, potentially leading to inaccurate p-values.[40]

Computationally, Fisher's exact test traditionally relies on enumerating all possible tables consistent with the fixed margins, though for larger tables this becomes intensive; modern implementations use efficient network algorithms to optimize the summation over the probability space. Despite these challenges for tables beyond 2×2, the test is routinely available in statistical software for practical use.[41]

For even simpler cases, such as testing a single proportion in a 2×1 table (e.g., comparing observed successes to an expected rate under the null), the binomial test serves as an exact alternative. This test evaluates deviations from the hypothesized proportion using the exact binomial distribution, calculating the p-value as the cumulative probability of outcomes as extreme as or more extreme than observed.[42] Like Fisher's test, it is preferred when expected counts are small (e.g., fewer than 5 successes or failures), avoiding reliance on normal approximations inherent in large-sample methods. The p-value is computed directly from the binomial cumulative distribution function, which is straightforward and efficient even for moderate sample sizes.[42]
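A brief sketch, assuming SciPy and made-up small-sample counts, showing both exact alternatives:

from scipy.stats import fisher_exact, binomtest

# Fisher's exact test on a hypothetical 2 x 2 table with small expected counts
table = [[3, 9],
         [7, 2]]
odds_ratio, p_fisher = fisher_exact(table, alternative='two-sided')
print(f"Fisher's exact test: odds ratio = {odds_ratio:.3f}, p = {p_fisher:.4f}")

# Exact binomial test: 7 successes in 20 trials against a hypothesized rate of 0.5
result = binomtest(k=7, n=20, p=0.5, alternative='two-sided')
print(f"Binomial test: p = {result.pvalue:.4f}")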
Chi-squared Test for Variance
Formulation for Normal Populations
The chi-squared test for variance assesses whether the variance of a normally distributed population equals a specified hypothesized value, applying specifically to continuous data rather than the categorical data addressed by Pearson's chi-squared test of independence or goodness-of-fit. This test evaluates the null hypothesis $H_0: \sigma^2 = \sigma_0^2$, where $\sigma^2$ is the population variance and $\sigma_0^2$ is the hypothesized value.[43] The test statistic is formulated as

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2},$$

where $n$ denotes the sample size and $s^2$ represents the sample variance, calculated as

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,$$

with $\bar{x}$ as the sample mean.[43][44] Assuming the population is normally distributed, under the null hypothesis this statistic follows an exact chi-squared distribution with $n - 1$ degrees of freedom, eliminating the need for asymptotic approximations, unlike in categorical applications.[44][45]

For hypothesis testing, a two-sided alternative ($\sigma^2 \neq \sigma_0^2$) rejects $H_0$ if the p-value or test statistic falls outside critical values from the chi-squared distribution table at the chosen significance level; one-sided alternatives ($\sigma^2 > \sigma_0^2$ or $\sigma^2 < \sigma_0^2$) use the appropriate tail.[43][45] This formulation emerged in the 1920s through Ronald A. Fisher's development of inference methods for normal distributions, distinct from Karl Pearson's earlier work on chi-squared for discrete data.

Interpretation and Limitations
The chi-squared test for variance involves computing the test statistic $\chi^2 = (n-1)s^2/\sigma_0^2$, where $n$ is the sample size, $s^2$ is the sample variance, and $\sigma_0^2$ is the hypothesized population variance under the null hypothesis $H_0: \sigma^2 = \sigma_0^2$. Under $H_0$ and assuming normality, this statistic follows a chi-squared distribution with $n - 1$ degrees of freedom.[46][43] For a two-sided test at significance level $\alpha$, reject $H_0$ if $\chi^2 > \chi^2_{1-\alpha/2,\,n-1}$ (upper critical value) or $\chi^2 < \chi^2_{\alpha/2,\,n-1}$ (lower critical value); one-sided alternatives adjust the critical region accordingly.[46][47] A p-value is obtained by comparing the observed $\chi^2$ to the chi-squared distribution, with rejection if p < $\alpha$.[43]

A confidence interval for the population variance is given by

$$\left[\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}\right],$$

where the quantiles are from the chi-squared distribution with $n - 1$ degrees of freedom; this interval contains $\sigma^2$ with probability $1 - \alpha$ under normality.[48][49] If the interval excludes $\sigma_0^2$, it supports rejection of $H_0$. For the standard deviation $\sigma$, take square roots of the interval bounds.[50] This test is often paired with the t-test for the population mean in normal theory inference, where both assess aspects of a normal distribution: the t-test evaluates the mean assuming known or estimated variance, while the chi-squared test verifies the variance assumption.[51][52]

The test relies on the strict assumption that the population is normally distributed, making it highly sensitive to departures from normality, such as skewness, kurtosis, or outliers, which can distort the distribution and lead to invalid p-values or coverage probabilities for confidence intervals.[49] For non-normal data, robust alternatives like Levene's test (adapted for single samples) or bootstrap methods are recommended over the chi-squared approach.[49][24] Additionally, the test exhibits low power for detecting deviations from $\sigma_0^2$, especially with small to moderate sample sizes, often requiring large $n$ (e.g., >30) to achieve adequate sensitivity to variance changes.[53]

The chi-squared test for a single variance can be generalized to compare two population variances using the F-test, where the statistic $F = s_1^2/s_2^2$ (assuming equal hypothesized variances) follows an F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom under normality and $H_0$.[47][54]
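A small sketch, assuming SciPy and hypothetical measurements, computing the statistic, a two-sided p-value, and the confidence interval described above:

import numpy as np
from scipy.stats import chi2

# Hypothetical measurements; test H0: sigma^2 = 4.0 at alpha = 0.05
x = np.array([10.2, 9.8, 11.5, 10.9, 9.4, 10.1, 11.0, 10.6, 9.9, 10.4])
sigma0_sq = 4.0
alpha = 0.05

n = len(x)
s_sq = x.var(ddof=1)                  # sample variance
stat = (n - 1) * s_sq / sigma0_sq     # chi-squared test statistic

# Two-sided p-value: twice the smaller tail probability
p = 2 * min(chi2.cdf(stat, df=n - 1), chi2.sf(stat, df=n - 1))

# 95% confidence interval for the population variance
ci_low = (n - 1) * s_sq / chi2.ppf(1 - alpha / 2, n - 1)
ci_high = (n - 1) * s_sq / chi2.ppf(alpha / 2, n - 1)

print(f"s^2 = {s_sq:.3f}, chi2 = {stat:.3f}, p = {p:.4f}")
print(f"95% CI for sigma^2: ({ci_low:.3f}, {ci_high:.3f})")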
Illustrative Examples
Contingency Table Example
To illustrate the chi-squared test for independence, consider a hypothetical 2x2 contingency table examining the association between smoking status and lung cancer diagnosis in a sample of 100 patients. The observed frequencies are as follows:

| | Lung Cancer | No Lung Cancer | Total |
|---|---|---|---|
| Smokers | 40 | 10 | 50 |
| Non-Smokers | 5 | 45 | 50 |
| Total | 45 | 55 | 100 |
Under the null hypothesis of independence, the expected frequency for each cell is the product of its row and column totals divided by the grand total, giving:

| | Lung Cancer | No Lung Cancer | Total |
|---|---|---|---|
| Smokers | 22.5 | 27.5 | 50 |
| Non-Smokers | 22.5 | 27.5 | 50 |
| Total | 45 | 55 | 100 |
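The example can be carried through with a short sketch, assuming SciPy; with discrepancies this large the resulting p-value is far below 0.05, so the null hypothesis of independence is rejected:

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[40, 10],
                     [ 5, 45]])

# correction=False gives the plain Pearson statistic (no Yates adjustment)
stat, p, dof, expected = chi2_contingency(observed, correction=False)

print(expected)                      # [[22.5 27.5] [22.5 27.5]], matching the table
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p:.2e}")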
Goodness-of-Fit Example
A classic illustration of the chi-squared goodness-of-fit test involves assessing the fairness of a six-sided die, where the null hypothesis states that each face appears with equal probability of $1/6$. Consider an experiment with 30 rolls, yielding the observed frequencies shown in the table below.

| Face | Observed Frequency ($O_i$) |
|---|---|
| 1 | 3 |
| 2 | 7 |
| 3 | 5 |
| 4 | 10 |
| 5 | 2 |
| 6 | 3 |
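Completing the example with a short sketch, assuming SciPy: with expected counts of 5 per face, the statistic is 9.2 on 5 degrees of freedom, below the 5% critical value of about 11.07, so the hypothesis of a fair die is not rejected:

from scipy.stats import chisquare

# Observed die-roll counts from the table above (30 rolls, expected 5 per face)
observed = [3, 7, 5, 10, 2, 3]

# With no f_exp given, chisquare assumes equal expected frequencies (a fair die)
stat, p = chisquare(observed)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")   # chi2 = 9.20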
