Chi-squared test
from Wikipedia

Chi-squared distribution, showing χ2 on the first axis and p-value (right tail probability) on the second axis.

A chi-squared test (also chi-square or χ2 test) is a statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. In simpler terms, this test is primarily used to examine whether two categorical variables (two dimensions of the contingency table) are independent in influencing the test statistic (values within the table).[1] The test is valid when the test statistic is chi-squared distributed under the null hypothesis, specifically Pearson's chi-squared test and variants thereof. Pearson's chi-squared test is used to determine whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table. For contingency tables with smaller sample sizes, Fisher's exact test is used instead.

In the standard applications of this test, the observations are classified into mutually exclusive classes. If the null hypothesis that there are no differences between the classes in the population is true, the test statistic computed from the observations follows a χ2 frequency distribution. The purpose of the test is to evaluate how likely the observed frequencies would be assuming the null hypothesis is true.

Test statistics that follow a χ2 distribution occur when the observations are independent. There are also χ2 tests for testing the null hypothesis of independence of a pair of random variables based on observations of the pairs.

The term chi-squared test often refers to tests for which the distribution of the test statistic approaches the χ2 distribution asymptotically, meaning that the sampling distribution (if the null hypothesis is true) of the test statistic approximates a chi-squared distribution more and more closely as sample sizes increase.

History

In the 19th century, statistical analytical methods were mainly applied to biological data, and it was customary for researchers such as Sir George Airy and Mansfield Merriman to assume that observations followed a normal distribution; their works were criticized by Karl Pearson in his 1900 paper.[2]

At the end of the 19th century, Pearson noticed the existence of significant skewness within some biological observations. In order to model the observations regardless of being normal or skewed, Pearson, in a series of articles published from 1893 to 1916,[3][4][5][6] devised the Pearson distribution, a family of continuous probability distributions, which includes the normal distribution and many skewed distributions, and proposed a method of statistical analysis consisting of using the Pearson distribution to model the observation and performing a test of goodness of fit to determine how well the model really fits to the observations.

Pearson's chi-squared test

In 1900, Pearson published a paper[2] on the χ2 test which is considered to be one of the foundations of modern statistics.[7] In this paper, Pearson investigated a test of goodness of fit.

Suppose that n observations in a random sample from a population are classified into k mutually exclusive classes with respective observed numbers of observations x_i (for i = 1, 2, …, k), and a null hypothesis gives the probability p_i that an observation falls into the ith class. So we have the expected numbers m_i = np_i for all i, where

$$\sum_{i=1}^{k} p_i = 1, \qquad \sum_{i=1}^{k} m_i = n \sum_{i=1}^{k} p_i = n.$$

Pearson proposed that, under the circumstance of the null hypothesis being correct, as n → ∞ the limiting distribution of the quantity given below is the χ2 distribution:

$$X^2 = \sum_{i=1}^{k} \frac{(x_i - m_i)^2}{m_i} = \sum_{i=1}^{k} \frac{x_i^2}{m_i} - n.$$

Pearson dealt first with the case in which the expected numbers m_i are sufficiently large known numbers in all cells and, assuming every observation x_i may be taken as normally distributed, reached the result that, in the limit as n becomes large, X2 follows the χ2 distribution with k − 1 degrees of freedom.

However, Pearson next considered the case in which the expected numbers depended on parameters that had to be estimated from the sample, and suggested that, with the notation of m_i being the true expected numbers and m′_i being the estimated expected numbers, the difference

$$X^2 - X'^2 = \sum_{i=1}^{k} \frac{(x_i - m_i)^2}{m_i} - \sum_{i=1}^{k} \frac{(x_i - m_i')^2}{m_i'}$$

will usually be positive and small enough to be omitted. In conclusion, Pearson argued that if we regarded X2 as also following the χ2 distribution with k − 1 degrees of freedom, the error in this approximation would not affect practical decisions. This conclusion caused some controversy in practical applications and was not settled for 20 years, until Fisher's 1922 and 1924 papers.[8][9]

Other examples of chi-squared tests

One test statistic that follows a chi-squared distribution exactly is the test that the variance of a normally distributed population has a given value based on a sample variance. Such tests are uncommon in practice because the true variance of the population is usually unknown. However, there are several statistical tests where the chi-squared distribution is approximately valid:

Fisher's exact test

For an exact test used in place of the 2 × 2 chi-squared test for independence when all the row and column totals are fixed by design, see Fisher's exact test. When the row or column margins (or both) are random variables (as in most common research designs), Fisher's exact test tends to be overly conservative and underpowered.[10]

Binomial test

For an exact test used in place of the 2 × 1 chi-squared test for goodness of fit, see binomial test.

Other chi-squared tests

Yates's correction for continuity

Using the chi-squared distribution to interpret Pearson's chi-squared statistic requires one to assume that the discrete probability of observed binomial frequencies in the table can be approximated by the continuous chi-squared distribution. This assumption is not quite correct and introduces some error.

To reduce the error in approximation, Frank Yates suggested a correction for continuity that adjusts the formula for Pearson's chi-squared test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table.[11] This reduces the chi-squared value obtained and thus increases its p-value.
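
In symbols, Yates's corrected statistic for a 2 × 2 table replaces each squared deviation with the squared reduced deviation:

$$\chi^2_{\text{Yates}} = \sum_{i=1}^{4} \frac{\left(|O_i - E_i| - 0.5\right)^2}{E_i},$$

where $O_i$ and $E_i$ are the observed and expected counts in each of the four cells.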

Chi-squared test for variance in a normal population

If a sample of size n is taken from a population having a normal distribution, then there is a result (see distribution of the sample variance) which allows a test to be made of whether the variance of the population has a pre-determined value. For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error. Suppose that a variant of the process is being tested, giving rise to a small sample of n product items whose variation is to be tested. The test statistic T in this instance could be set to be the sum of squares about the sample mean, divided by the nominal value for the variance (i.e. the value to be tested as holding). Then T has a chi-squared distribution with n − 1 degrees of freedom. For example, if the sample size is 21, the acceptance region for T with a significance level of 5% is between 9.59 and 34.17.
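
As a minimal sketch (assuming SciPy is available), the acceptance region quoted above can be reproduced from the quantiles of the chi-squared distribution with n − 1 = 20 degrees of freedom:

```python
# Sketch: two-sided acceptance region for the chi-squared variance test (assumes SciPy).
from scipy.stats import chi2

n = 21          # sample size
df = n - 1      # degrees of freedom
alpha = 0.05    # significance level

lower = chi2.ppf(alpha / 2, df)        # 2.5% quantile, about 9.59
upper = chi2.ppf(1 - alpha / 2, df)    # 97.5% quantile, about 34.17
print(lower, upper)
```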

Example chi-squared test for categorical data

Suppose there is a city of 1,000,000 residents with four neighborhoods: A, B, C, and D. A random sample of 650 residents of the city is taken and their occupation is recorded as "white collar", "blue collar", or "no collar". The null hypothesis is that each person's neighborhood of residence is independent of the person's occupational classification. The data are tabulated as:

              A    B    C    D   Total
White collar  90   60  104   95   349
Blue collar   30   50   51   20   151
No collar     30   40   45   35   150
Total        150  150  200  150   650

Let us take the sample living in neighborhood A, 150, to estimate what proportion of the whole 1,000,000 live in neighborhood A. Similarly we take 349/650 to estimate what proportion of the 1,000,000 are white-collar workers. By the assumption of independence under the hypothesis we should "expect" the number of white-collar workers in neighborhood A to be

$$150 \times \frac{349}{650} \approx 80.54.$$

Then in that "cell" of the table, we have

$$\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(90 - 80.54)^2}{80.54} \approx 1.11.$$

The sum of these quantities over all of the cells is the test statistic; in this case, $\chi^2 \approx 24.57$. Under the null hypothesis, this sum has approximately a chi-squared distribution whose number of degrees of freedom is

$$(\text{number of rows} - 1)(\text{number of columns} - 1) = (3 - 1)(4 - 1) = 6.$$

If the test statistic is improbably large according to that chi-squared distribution, then one rejects the null hypothesis of independence.

A related issue is a test of homogeneity. Suppose that instead of giving every resident of each of the four neighborhoods an equal chance of inclusion in the sample, we decide in advance how many residents of each neighborhood to include. Then each resident has the same chance of being chosen as do all residents of the same neighborhood, but residents of different neighborhoods would have different probabilities of being chosen if the four sample sizes are not proportional to the populations of the four neighborhoods. In such a case, we would be testing "homogeneity" rather than "independence". The question is whether the proportions of blue-collar, white-collar, and no-collar workers in the four neighborhoods are the same. However, the test is done in the same way.
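
As a sketch (assuming SciPy is available), the calculation above can be reproduced directly from the observed table; scipy.stats.chi2_contingency returns the same statistic, degrees of freedom, and the corresponding p-value:

```python
# Sketch: chi-squared test of independence on the neighborhood/occupation table (assumes SciPy).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [90, 60, 104, 95],   # white collar in A, B, C, D
    [30, 50,  51, 20],   # blue collar
    [30, 40,  45, 35],   # no collar
])

chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(chi2_stat)   # about 24.57
print(dof)         # (3 - 1) * (4 - 1) = 6
print(p_value)     # about 0.0004, so independence is rejected at the 5% level
```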

Applications

In cryptanalysis, the chi-squared test is used to compare the distribution of plaintext and (possibly) decrypted ciphertext. The candidate decryption yielding the lowest value of the test statistic is, with high probability, the successful one.[12][13] This method can be generalized for solving modern cryptographic problems.[14]

In bioinformatics, the chi-squared test is used to compare the distribution of certain properties of genes (e.g., genomic content, mutation rate, interaction network clustering, etc.) belonging to different categories (e.g., disease genes, essential genes, genes on a certain chromosome etc.).[15][16]

from Grokipedia
The chi-squared test, also known as the χ² test, is a non-parametric statistical hypothesis test that determines whether there is a significant association between categorical variables or whether observed categorical data frequencies deviate substantially from those expected under a specified null hypothesis. Developed by the mathematician Karl Pearson in 1900, it provides a criterion for evaluating the fit of sample data to a theoretical model without assuming normality, marking a foundational advancement in modern statistics. The test computes a test statistic based on the formula $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ represents observed frequencies and $E_i$ expected frequencies, which approximately follows a chi-squared distribution under the null hypothesis for large sample sizes. Pearson's innovation addressed the need for a general method to test deviations in correlated systems of variables, originating from his work on random sampling and probable errors in biological and social data. Initially introduced for goodness-of-fit analysis (assessing whether data conform to a hypothesized distribution such as the normal or Poisson), the test has since expanded to include the test of independence, which examines associations between two categorical variables in a contingency table, and the test of homogeneity, which compares distributions across multiple populations. For the independence test, degrees of freedom are calculated as $(r-1)(c-1)$, where $r$ and $c$ are the number of rows and columns in the table, enabling p-value computation to reject or retain the null hypothesis of no association. Key assumptions include random sampling, independence of observations, and sufficiently large expected frequencies (typically at least 5 per cell to ensure the chi-squared approximation holds, as per Cochran's rule). Violations, such as small sample sizes, may necessitate alternatives like Fisher's exact test. Widely used in fields like biology, sociology, and medicine for analyzing survey data, genetic inheritance, and clinical trials, the chi-squared test remains a cornerstone of categorical data analysis due to its simplicity and robustness.

Introduction

Definition and Purpose

The chi-squared test is a statistical hypothesis test that employs the chi-squared distribution to assess the extent of discrepancies between observed frequencies and expected frequencies in categorical data. It evaluates whether these differences are likely due to random variation or indicate a significant deviation from the null hypothesis. Under the null hypothesis, the test statistic follows an asymptotic chi-squared distribution, allowing for the computation of p-values to determine statistical significance. The primary purposes of the chi-squared test are to examine independence between two or more categorical variables in contingency tables and to test the goodness-of-fit of observed data to a specified theoretical distribution. In the test of independence, it determines whether the distribution of one variable depends on the levels of another, such as assessing associations in survey responses across demographic groups. For goodness-of-fit, it verifies whether empirical frequencies align with expected proportions under models like uniformity or specific probability distributions. The test statistic is given by $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ represents the observed frequencies and $E_i$ the expected frequencies for each category $i$. Developed in early 20th-century statistics for analyzing categorical data, the chi-squared test is non-parametric, imposing no assumptions on the underlying distribution of the data itself, but relying on the asymptotic chi-squared distribution of the statistic under the null hypothesis. This makes it versatile for applications where data are counts or proportions without normality requirements.
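
A minimal sketch of the goodness-of-fit use (assuming SciPy is available) tests whether a die is fair from 120 observed rolls; the counts are illustrative, not taken from the text above:

```python
# Sketch: chi-squared goodness-of-fit test for a fair die (assumes SciPy; counts are illustrative).
from scipy.stats import chisquare

observed = [22, 17, 20, 26, 21, 14]   # counts of faces 1-6 in 120 rolls (hypothetical data)
expected = [20] * 6                   # a fair die: 120 / 6 expected per face

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat)      # sum of (O_i - E_i)^2 / E_i over the six faces
print(p_value)   # referred to the chi-squared distribution with 6 - 1 = 5 degrees of freedom
```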

Assumptions and Prerequisites

The chi-squared test requires that observations are independent, meaning each data point is collected without influencing others, to ensure the validity of the underlying chi-squared approximation. This assumption holds when the sample is drawn as a simple random sample from the population, avoiding any systematic dependencies or clustering in the data. Additionally, the test is designed for categorical data, where variables are discrete or binned into mutually exclusive categories, rather than continuous measurements that have not been discretized. A critical assumption concerns sample size adequacy: the expected frequencies in at least 80% of the cells should be 5 or greater, with no expected frequencies less than 1, to justify the asymptotic approximation to the chi-squared distribution under the null hypothesis. Violations of this rule, particularly in small samples, can lead to unreliable p-values, necessitating alternatives such as exact tests like Fisher's exact test. Prior to applying the chi-squared test, users should possess foundational knowledge in probability and statistics, including concepts like expected values and distributions, as well as the hypothesis testing framework, encompassing null and alternative hypotheses and the interpretation of p-values at chosen significance levels (e.g., α = 0.05). These prerequisites enable proper setup of the test for applications such as assessing independence in contingency tables.
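
A minimal sketch (assuming NumPy) of the expected-frequency check described above; the helper name and the example table are illustrative:

```python
# Sketch: check the rule of thumb on a table of expected frequencies (assumes NumPy; helper is hypothetical).
import numpy as np

def expected_counts_ok(expected):
    """True if at least 80% of expected counts are >= 5 and none is below 1."""
    expected = np.asarray(expected, dtype=float)
    return np.mean(expected >= 5) >= 0.8 and expected.min() >= 1

expected = np.array([[80.5, 80.5, 107.4, 80.5],
                     [34.8, 34.8,  46.5, 34.8],
                     [34.6, 34.6,  46.2, 34.6]])
print(expected_counts_ok(expected))   # True: the chi-squared approximation is reasonable here
```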

Historical Development

Karl Pearson's Formulation

In 1900, Karl Pearson introduced the chi-squared test in a paper published in the Philosophical Magazine, presenting it as a criterion to determine whether observed deviations from expected probabilities in a system of correlated variables could reasonably be ascribed to random sampling. This formulation addressed key limitations in prior approaches to analyzing categorical data, particularly in biological contexts where earlier methods struggled to quantify discrepancies between empirical observations and theoretical expectations. Pearson's work was motivated by the need for a robust tool to evaluate patterns in genetics, building on challenges posed by datasets like those from Gregor Mendel's experiments on pea plant inheritance, which highlighted inconsistencies in fitting discrete distributions to observed frequencies. Pearson derived the test statistic as a sum of squared deviations between observed and expected frequencies, divided by the expected frequencies to account for varying scales across categories; this measure captured the overall discrepancy in a single value, inspired by the summation of squared standardized normals from multivariate normal theory. He symbolized the statistic with the Greek letter χ², pronounced "chi-squared", reflecting its connection to the squared form of the character χ, a notation that has persisted in statistical literature. Initially, Pearson applied the test to biological data on inheritance patterns, such as ratios in genetic crosses, enabling researchers to assess whether empirical results aligned with hypothesized Mendelian proportions under random variation. A pivotal aspect of Pearson's contribution was establishing the limiting distribution of the χ² statistic under the null hypothesis of good fit, linking it to a chi-squared distribution with k degrees of freedom, where k equals the number of categories minus the number of parameters estimated from the data. This theoretical foundation allowed for probabilistic interpretation of the test, with larger values of the statistic indicating poorer fit and lower probabilities of the observed deviations arising by chance alone. By formalizing this approach, Pearson provided the first systematic method for goodness-of-fit testing in categorical settings, profoundly influencing the development of modern statistical inference.

Subsequent Contributions and Naming

Following Karl Pearson's initial formulation, Ronald A. Fisher advanced the chi-squared test in the 1920s by rigorously establishing its asymptotic distribution under the null hypothesis and extending its application to testing independence in contingency tables. In his 1922 paper, Fisher derived the appropriate degrees of freedom for the test of independence, (r − 1)(c − 1) for an r × c contingency table, correcting earlier inconsistencies in Pearson's approach and enabling more accurate p-value calculations for assessing deviations from independence. The nomenclature distinguishes "Pearson's chi-squared test" as the statistical procedure itself, crediting its originator, from the "chi-squared distribution," which describes the limiting distribution of the test statistic. This arises from Pearson's adoption of the symbol χ² (chi squared) for the statistic, while Fisher provided the foundational proof of its convergence to the chi-squared distribution, solidifying the theoretical basis. In the 1930s, the chi-squared test became integrated into the Neyman-Pearson framework for hypothesis testing, which emphasized specifying alternative hypotheses, controlling both Type I and Type II error rates, and using p-values to quantify evidence against the null. This incorporation elevated the test's role in formal inferential procedures, aligning it with broader developments in statistical theory. By the 1930s the chi-squared test had achieved widespread recognition in genetics, as seen in Fisher's 1936 application to evaluate the goodness-of-fit of Gregor Mendel's experimental ratios to Mendelian expectations, revealing improbably precise results suggestive of data adjustment. In social sciences, it facilitated analysis of associations in categorical survey data, with standardization occurring through its prominent inclusion in influential textbooks like E.F. Lindquist's 1940 Statistical Analysis in Educational Research, which exemplified its use in fields such as education and psychology.

The Pearson Chi-squared Statistic

Mathematical Formulation

The chi-squared test evaluates hypotheses concerning the distribution of categorical data. The null hypothesis $H_0$ asserts that the observed frequencies conform to expected frequencies under a specified theoretical distribution (goodness-of-fit test) or that categorical variables are independent (test of independence), while the alternative hypothesis $H_A$ posits deviation from this fit or presence of dependence. The Pearson chi-squared statistic is given by $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where the sum is over all categories $i$, $O_i$ denotes the observed frequency in category $i$, and $E_i$ is the expected frequency under $H_0$. This formulation measures the discrepancy between observed and expected values, normalized by the expected frequencies to account for varying category sizes. The statistic was originally proposed by Karl Pearson in 1900 as a measure of goodness of fit for frequency distributions. The statistic derives from the multinomial likelihood under $H_0$, where the data follow a multinomial distribution with probabilities yielding the expected frequencies $E_i$. The log-likelihood ratio test statistic $G^2 = 2 \sum_i O_i \log(O_i / E_i)$ provides an alternative measure, but for large samples a second-order (quadratic) Taylor expansion of the log-likelihood around the null yields the Pearson form $\chi^2$ asymptotically. For the goodness-of-fit test, the expected frequencies are $E_i = n p_i$, where $n$ is the total sample size and $p_i$ are the theoretical probabilities for each category under $H_0$. In the test of independence for an $r \times c$ contingency table, the expected frequency for cell $(i,j)$ is $E_{ij} = (r_i c_j) / N$, where $r_i$ is the total for row $i$, $c_j$ the total for column $j$, and $N$ the grand total. Under $H_0$ and large sample sizes, $\chi^2$ approximately follows a chi-squared distribution with the appropriate degrees of freedom $df$, and the p-value is $P(\chi^2_{df} > \chi^2_{\text{obs}})$, where $\chi^2_{\text{obs}}$ is the computed statistic.
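
A short sketch (assuming NumPy) of the formulas above, computing the expected cell counts from the margins, the Pearson statistic, and the likelihood-ratio statistic $G^2$ for the same table; the function name is illustrative:

```python
# Sketch: Pearson chi-squared and likelihood-ratio G^2 for a contingency table (assumes NumPy; name is hypothetical).
import numpy as np

def pearson_and_g2(observed):
    observed = np.asarray(observed, dtype=float)
    N = observed.sum()
    row_totals = observed.sum(axis=1, keepdims=True)   # r_i
    col_totals = observed.sum(axis=0, keepdims=True)   # c_j
    expected = row_totals @ col_totals / N             # E_ij = r_i * c_j / N
    chi2 = ((observed - expected) ** 2 / expected).sum()
    g2 = 2 * (observed * np.log(observed / expected)).sum()   # assumes all observed counts > 0
    df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return chi2, g2, df
```

For the neighborhood table used earlier, this returns a Pearson statistic of about 24.57 with 6 degrees of freedom.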

Asymptotic Properties and Degrees of Freedom

Under the null hypothesis and for sufficiently large sample sizes, the Pearson chi-squared statistic converges in distribution to a central chi-squared distribution with a specified number of degrees of freedom, providing the theoretical basis for significance testing. This asymptotic property, established by Pearson in his foundational work, allows the use of chi-squared critical values to assess the significance of observed deviations from expected frequencies. The degrees of freedom for the statistic depend on the test context. In the test of independence for an $r \times c$ contingency table, the degrees of freedom are $(r-1)(c-1)$, reflecting the number of independent cells after accounting for row and column marginal constraints. For the goodness-of-fit test involving $k$ categories where the expected frequencies are fully specified, the degrees of freedom are $k - 1$; if $m$ parameters of the hypothesized distribution are estimated from the data, this adjusts to $k - 1 - m$. This asymptotic chi-squared distribution arises from the central limit theorem applied to the multinomial sampling model underlying the test. Under the null hypothesis, the standardized differences $(O_i - E_i)/\sqrt{E_i}$ are approximately jointly normal with mean zero, and the sum of their squares, subject to the linear constraints imposed by the fixed totals, converges to the chi-squared distribution with the stated degrees of freedom.
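
A small simulation sketch (assuming NumPy and SciPy) can illustrate this convergence: draw repeated multinomial samples under the null and compare the upper tail of the simulated Pearson statistics with the chi-squared distribution with k − 1 degrees of freedom; the probabilities and sample size are illustrative:

```python
# Sketch: empirical check that the Pearson statistic is approximately chi-squared under the null
# (assumes NumPy and SciPy; the null probabilities and sample size are illustrative).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
p = np.array([0.2, 0.3, 0.5])   # null category probabilities (k = 3 categories)
n = 500                          # observations per replicate
reps = 10_000

counts = rng.multinomial(n, p, size=reps)   # observed frequencies, one row per replicate
expected = n * p                             # expected frequencies under the null
stats = ((counts - expected) ** 2 / expected).sum(axis=1)

# The rejection rate at the 5% level should be close to 0.05 if the approximation is good.
print(np.mean(stats > chi2.ppf(0.95, df=len(p) - 1)))
```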