T-statistic
In statistics, the t-statistic is the ratio of the difference between a parameter's estimated value and its hypothesized value to its standard error. It is used in hypothesis testing via Student's t-test, where it determines whether to support or reject the null hypothesis. It is very similar to the z-score, but the t-statistic is used when the sample size is small or the population standard deviation is unknown. For example, the t-statistic is used in estimating the population mean from a sampling distribution of sample means when the population standard deviation is unknown. It is also used along with the p-value when running hypothesis tests, where the p-value indicates how likely the observed results would be if the null hypothesis were true.
Definition and features
Let $\hat\beta$ be an estimator of parameter β in some statistical model. Then a t-statistic for this parameter is any quantity of the form

$$ t_{\hat\beta} = \frac{\hat\beta - \beta_0}{\operatorname{s.e.}(\hat\beta)} $$

where β0 is a non-random, known constant, which may or may not match the actual unknown parameter value β, and $\operatorname{s.e.}(\hat\beta)$ is the standard error of the estimator $\hat\beta$ for β.
By default, statistical packages report the t-statistic with β0 = 0 (these t-statistics are used to test the significance of the corresponding regressor). However, when the t-statistic is needed to test a hypothesis of the form H0: β = β0, a non-zero β0 may be used.
If $\hat\beta$ is an ordinary least squares estimator in the classical linear regression model (that is, with normally distributed and homoscedastic error terms), and if the true value of the parameter β is equal to β0, then the sampling distribution of the t-statistic is the Student's t-distribution with (n − k) degrees of freedom, where n is the number of observations and k is the number of regressors (including the intercept)[citation needed].
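As a minimal illustration of this default β0 = 0 case, the following Python sketch (with simulated, purely illustrative data) computes the t-statistic for an OLS slope by hand and refers it to a t-distribution with n − k degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data (purely illustrative): y = 2 + 3x + Gaussian noise
n = 30
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=n)

# OLS with an intercept, so k = 2 regressors
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual variance estimated with n - k degrees of freedom
resid = y - X @ beta_hat
k = X.shape[1]
sigma2_hat = resid @ resid / (n - k)

# Standard errors from the diagonal of sigma2 * (X'X)^-1
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

# t-statistic for the slope against beta0 = 0 (the default packages report)
t_slope = beta_hat[1] / se[1]
p_value = 2 * stats.t.sf(abs(t_slope), df=n - k)
print(t_slope, p_value)
```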
In the majority of models, the estimator $\hat\beta$ is consistent for β and is distributed asymptotically normally. If the true value of the parameter β is equal to β0, and the quantity $\operatorname{s.e.}(\hat\beta)$ correctly estimates the asymptotic variance of this estimator, then the t-statistic will asymptotically have the standard normal distribution.
In some models the distribution of the t-statistic is different from the normal distribution, even asymptotically. For example, when a time series with a unit root is regressed in the augmented Dickey–Fuller test, the test t-statistic will asymptotically have one of the Dickey–Fuller distributions (depending on the test setting).
Use
Most frequently, t-statistics are used in Student's t-tests, a form of statistical hypothesis testing, and in the computation of certain confidence intervals.
The key property of the t-statistic is that it is a pivotal quantity – while defined in terms of the sample mean, its sampling distribution does not depend on the population parameters, and thus it can be used regardless of what these may be.
One can also divide a residual by the sample standard deviation:

$$ g(x, X) = \frac{x - \bar{X}}{s} $$
to compute an estimate of the number of standard deviations a given sample lies from the mean, as a sample analogue of the z-score; the z-score itself requires the population parameters.
Prediction
Given a normal distribution $N(\mu, \sigma^2)$ with unknown mean and variance, the t-statistic of a future observation $X_{n+1}$, after one has made n observations, is an ancillary statistic – a pivotal quantity (it does not depend on the values of μ and σ2) that is a statistic (computed from observations). This allows one to compute a frequentist prediction interval (a predictive confidence interval) via the following t-distribution:

$$ \frac{X_{n+1} - \bar{X}_n}{s_n \sqrt{1 + n^{-1}}} \sim T^{n-1} $$
Solving for $X_{n+1}$ yields the prediction distribution

$$ \bar{X}_n + s_n \sqrt{1 + n^{-1}} \cdot T^{n-1} $$
from which one may compute predictive confidence intervals – given a probability p, one may compute intervals such that 100p% of the time, the next observation will fall in that interval.
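As a sketch of this computation with hypothetical observations, the 95% prediction interval follows directly from the quantiles of the t-distribution:

```python
import numpy as np
from scipy import stats

# Hypothetical prior observations
sample = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4])
n = sample.size
x_bar = sample.mean()
s_n = sample.std(ddof=1)            # sample standard deviation

# 95% prediction interval for the next observation:
# x_bar +/- t_{0.975, n-1} * s_n * sqrt(1 + 1/n)
t_crit = stats.t.ppf(0.975, df=n - 1)
half_width = t_crit * s_n * np.sqrt(1 + 1 / n)
print(x_bar - half_width, x_bar + half_width)
```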
History
The term "t-statistic" is abbreviated from "hypothesis test statistic".[1][citation needed] In statistics, the t-distribution was first derived as a posterior distribution in 1876 by Helmert[2][3][4] and Lüroth.[5][6][7] The t-distribution also appeared in a more general form as the Pearson Type IV distribution in Karl Pearson's 1895 paper.[8] However, the t-distribution, also known as Student's t-distribution, gets its name from William Sealy Gosset, who first published the result in English in his 1908 Biometrika paper "The Probable Error of a Mean" under the pseudonym "Student",[9][10] because his employer preferred staff to use pen names when publishing scientific papers.[11] Gosset worked at the Guinness Brewery in Dublin, Ireland, and was interested in the problems of small samples – for example, the chemical properties of barley, where sample sizes might be as few as 3. A second version of the etymology is that Guinness did not want competitors to know that it was using the t-test to assess the quality of raw material. Although it was Gosset who wrote under the name "Student", it was through the work of Ronald Fisher that the distribution became well known as "Student's distribution"[12][13] and "Student's t-test".
Related concepts
- z-score (standardization): If the population parameters are known, then rather than computing the t-statistic, one can compute the z-score; analogously, rather than using a t-test, one uses a z-test. This is rare outside of standardized testing.
- Studentized residual: In regression analysis, the standard errors of the estimators at different data points vary (compare the middle versus endpoints of a simple linear regression), and thus one must divide the different residuals by different estimates for the error, yielding what are called studentized residuals.
References
[edit]- ^ The Microbiome in Health and Disease. Academic Press. 29 May 2020. p. 397. ISBN 978-0-12-820001-8.
- ^ Szabó, István (2003), "Systeme aus einer endlichen Anzahl starrer Körper", Einführung in die Technische Mechanik, Springer Berlin Heidelberg, pp. 196–199, doi:10.1007/978-3-642-61925-0_16, ISBN 978-3-540-13293-6
- ^ Schlyvitch, B. (October 1937). "Untersuchungen über den anastomotischen Kanal zwischen der Arteria coeliaca und mesenterica superior und damit in Zusammenhang stehende Fragen". Zeitschrift für Anatomie und Entwicklungsgeschichte. 107 (6): 709–737. doi:10.1007/bf02118337. ISSN 0340-2061. S2CID 27311567.
- ^ Helmert (1876). "Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen Beobachtungsfehlers directer Beobachtungen gleicher Genauigkeit". Astronomische Nachrichten (in German). 88 (8–9): 113–131. Bibcode:1876AN.....88..113H. doi:10.1002/asna.18760880802.
- ^ Lüroth, J. (1876). "Vergleichung von zwei Werthen des wahrscheinlichen Fehlers". Astronomische Nachrichten (in German). 87 (14): 209–220. Bibcode:1876AN.....87..209L. doi:10.1002/asna.18760871402.
- ^ Pfanzagl, J. (1996). "Studies in the history of probability and statistics XLIV. A forerunner of the t-distribution". Biometrika. 83 (4): 891–898. doi:10.1093/biomet/83.4.891. MR 1766040.
- ^ Sheynin, Oscar (1995). "Helmert's work in the theory of errors". Archive for History of Exact Sciences. 49 (1): 73–104. doi:10.1007/BF00374700. ISSN 0003-9519. S2CID 121241599.
- ^ Pearson, Karl (1895). "X. Contributions to the mathematical theory of evolution.—II. Skew variation in homogeneous material". Philosophical Transactions of the Royal Society of London A. 186: 343–414. Bibcode:1895RSPTA.186..343P. doi:10.1098/rsta.1895.0010. ISSN 1364-503X.
- ^ "Student" (William Sealy Gosset) (1908). "The Probable Error of a Mean". Biometrika. 6 (1): 1–25. doi:10.1093/biomet/6.1.1. hdl:10338.dmlcz/143545. JSTOR 2331554.
- ^ "T Table | History of T Table, Etymology, one-tail T Table, two-tail T Table and T-statistic".
- ^ Wendl, M. C. (2016). "Pseudonymous fame". Science. 351 (6280): 1406. doi:10.1126/science.351.6280.1406. PMID 27013722.
- ^ Tuttle, Md; Anazonwu, Bs, Walter; Rubin, Md, Lee (2014). "Subgroup Analysis of Topical Tranexamic Acid in Total Knee Arthroplasty". Reconstructive Review. 4 (2): 37–41. doi:10.15438/rr.v4i2.72.
- ^ Walpole, Ronald E. (2006). Probability & statistics for engineers & scientists. Myers, H. Raymond. (7th ed.). New Delhi: Pearson. ISBN 81-7758-404-9. OCLC 818811849.
Mathematical Foundation
Definition and Formula
The t-statistic is a ratio that measures the difference between a sample mean and a hypothesized population mean, standardized by an estimate of the standard error derived from the sample data. It is primarily used when the population standard deviation is unknown, providing a test statistic for inference about population parameters based on small samples.[8] For the one-sample case, the t-statistic is defined by the formula

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, $$

where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized population mean, $s$ is the sample standard deviation, and $n$ is the sample size.
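As a minimal illustration, the following Python sketch (with hypothetical data) computes this one-sample t-statistic by hand and checks it against SciPy's scipy.stats.ttest_1samp:

```python
import numpy as np
from scipy import stats

# Hypothetical sample; test H0: mu = 5.0
sample = np.array([5.3, 4.9, 5.6, 5.1, 4.8, 5.4, 5.2, 5.0])
mu0 = 5.0

n = sample.size
x_bar = sample.mean()
s = sample.std(ddof=1)          # sample standard deviation

# t = (x_bar - mu0) / (s / sqrt(n))
t_manual = (x_bar - mu0) / (s / np.sqrt(n))

# SciPy computes the same statistic (and a two-sided p-value)
t_scipy, p_value = stats.ttest_1samp(sample, popmean=mu0)
print(t_manual, t_scipy, p_value)
```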
Interpretation of the t-value
The t-value quantifies the extent to which the sample mean deviates from the mean specified under the null hypothesis, expressed as the number of standard errors away from that hypothesized value. This standardization allows for assessing the plausibility of the observed difference under the assumption of no true effect, where the standard error is derived from the sample data, including the sample standard deviation s as a key component in its estimation.[14][15] The absolute value of the t-statistic, |t|, serves as the primary indicator of evidence against the null hypothesis: larger magnitudes imply a more substantial deviation relative to the variability in the sample, thereby providing stronger grounds for rejecting the null. To determine significance, |t| is compared to critical values obtained from the t-distribution table, which depend on the degrees of freedom (df) and the chosen significance level (α). For instance, with large df (> 30), the critical value approximates the z-score of 1.96 for a two-tailed test at α = 0.05.[15][16]

Considerations of test directionality distinguish one-tailed from two-tailed interpretations. In a two-tailed test, the alternative hypothesis posits a difference in either direction, so the critical region is divided equally between both tails of the t-distribution (using α/2 per tail), and the sign of t reveals the direction of the deviation. Conversely, a one-tailed test focuses on a directional alternative (greater than or less than), allocating the entire critical region to one tail (using α), which requires the t-value to align with the hypothesized direction for rejection.[17][18] For example, a computed t = 2.5 with df = 10 exceeds the two-tailed critical value of 2.228 at α = 0.05, indicating sufficient evidence to reject the null hypothesis in favor of a significant difference.[19]
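These tail computations can be reproduced directly from the t-distribution; a minimal sketch using the t = 2.5, df = 10 example above:

```python
from scipy import stats

t_obs, df, alpha = 2.5, 10, 0.05

# Two-tailed critical value: t such that P(|T| > t) = alpha
t_crit = stats.t.ppf(1 - alpha / 2, df)    # ~2.228

# Two-tailed p-value for the observed statistic
p_two = 2 * stats.t.sf(abs(t_obs), df)

print(t_crit, p_two, abs(t_obs) > t_crit)  # reject H0 if True
```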
Properties and Assumptions
Underlying Distribution
Under the null hypothesis and when the underlying assumptions are satisfied, the t-statistic follows a Student's t-distribution with degrees of freedom equal to the sample size minus one for a single-sample test.[20] The Student's t-distribution is symmetric around zero and bell-shaped, resembling the standard normal distribution but featuring heavier tails that reflect greater variability in estimates from smaller samples.[21] As the degrees of freedom increase, the distribution converges to the standard normal (z) distribution; for practical purposes, it provides a close approximation when degrees of freedom exceed 30.[22] The probability density function for the Student's t-distribution with $\nu$ degrees of freedom is

$$ f(t) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}. $$
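As a quick numerical check, the sketch below evaluates this density formula directly and compares it against scipy.stats.t.pdf, also illustrating the heavier tails at small ν:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma

def t_pdf(t, nu):
    """Student's t density, evaluated from the formula above."""
    coef = gamma((nu + 1) / 2) / (np.sqrt(nu * np.pi) * gamma(nu / 2))
    return coef * (1 + t**2 / nu) ** (-(nu + 1) / 2)

t = np.linspace(-3, 3, 7)
print(np.allclose(t_pdf(t, 5), stats.t.pdf(t, 5)))    # True

# Heavier tails at small nu; close to the normal density for large nu
print(t_pdf(3.0, 2), stats.norm.pdf(3.0), t_pdf(3.0, 100))
```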
Key Assumptions and Limitations
The validity of the t-statistic relies on several core assumptions about the underlying data. Primarily, the data must be drawn from a population that follows a normal distribution, although for large sample sizes (typically n > 30), the central limit theorem provides a reasonable approximation even if normality is not strictly met.[24] Observations must also be independent of one another, meaning that the value of one observation does not influence or depend on another, which is crucial to ensure unbiased estimation of the population parameters.[25] For two-sample t-tests, homogeneity of variances – equal variances across groups – is assumed, preventing distortions in the test statistic due to differing spreads in the data.[26] Additionally, the presence of extreme outliers can unduly influence the sample standard deviation, compromising the reliability of the t-statistic, particularly in smaller samples.[27]

Despite these assumptions, the t-statistic exhibits notable limitations, especially in scenarios where they are violated. In small samples, non-normality such as skewness can lead to inaccurate p-values and unreliable inference, as the t-distribution may not adequately approximate the sampling distribution of the mean difference.[24] When variances are unequal between groups, the standard t-test assumes homogeneity, which, if violated, can bias results; this issue is addressed by modifications like Welch's t-test, which adjusts the degrees of freedom to account for heteroscedasticity without assuming equal variances.[28] The degrees of freedom, directly tied to sample size, further underscore the t-statistic's dependence on adequate n to mitigate these sensitivities.[26]

To assess robustness, researchers commonly employ diagnostic tools prior to applying the t-statistic. Normality can be evaluated using quantile-quantile (Q-Q) plots, which visually compare the sample quantiles against theoretical normal quantiles to detect deviations like heavy tails or skewness. For homogeneity of variances in two-sample cases, Levene's test is widely used, as it is robust to non-normality and tests whether the absolute deviations from group means are equal across groups.[29]

Violations of these assumptions carry significant consequences for statistical inference. Non-normality or outliers in small samples often inflate the Type I error rate, increasing the likelihood of falsely rejecting the null hypothesis, while also reducing the test's power to detect true effects.[30] Heterogeneity of variances similarly distorts error rates, potentially leading to overly conservative or liberal conclusions depending on sample sizes.[31] In such cases, alternatives like non-parametric tests (e.g., the Mann-Whitney U test) may be considered to bypass parametric assumptions, though they come with their own trade-offs in efficiency.[32]
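A short sketch of these diagnostics (with hypothetical two-group data), using scipy.stats.probplot for a Q-Q check and scipy.stats.levene for variance homogeneity:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(10.0, 2.0, size=25)   # hypothetical samples
group_b = rng.normal(11.0, 2.5, size=25)

# Q-Q check: how closely do ordered sample values track normal quantiles?
(osm, osr), (slope, intercept, r) = stats.probplot(group_a, dist="norm")
print("Q-Q correlation:", r)   # near 1 suggests approximate normality

# Levene's test for equal variances (robust to non-normality)
stat, p = stats.levene(group_a, group_b)
print("Levene p-value:", p)    # small p suggests unequal variances
```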
Applications
Hypothesis Testing
The t-statistic plays a central role in hypothesis testing for assessing whether sample data provide sufficient evidence to challenge claims about population means or differences in means, particularly when population variances are unknown and sample sizes are small.[33] In such tests, the null hypothesis $ H_0 $ typically posits no effect or equality, such as $ H_0: \mu = \mu_0 $ for a single population mean $ \mu $ compared to a specified value $ \mu_0 $, while the alternative hypothesis $ H_a $ specifies the direction or existence of a difference, such as $ H_a: \mu \neq \mu_0 $ (two-sided), $ \mu > \mu_0 $, or $ \mu < \mu_0 $ (one-sided).[34]

The general test procedure involves calculating the t-statistic, determining its associated p-value from the t-distribution with appropriate degrees of freedom (df), and comparing it to a preselected significance level $ \alpha $ (commonly 0.05). If the p-value is less than $ \alpha $, the null hypothesis is rejected in favor of the alternative. Alternatively, the absolute value of the t-statistic can be compared directly to a critical value from the t-distribution table for the given df and $ \alpha $; rejection occurs if the t-statistic exceeds this threshold.[33] The p-value represents the probability of obtaining a t-statistic at least as extreme as the observed value assuming the null hypothesis is true.[34]

For the one-sample t-test, the t-statistic is computed as

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, $$

with $ n - 1 $ degrees of freedom.
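The decision rule can be sketched in a few lines of Python (hypothetical data; α = 0.05 is assumed):

```python
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0])  # hypothetical
mu0, alpha = 12.0, 0.05

t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)  # two-sided by default
df = sample.size - 1

# Equivalent decisions: p-value vs alpha, or |t| vs critical value
t_crit = stats.t.ppf(1 - alpha / 2, df)
print("reject H0:", p_value < alpha, abs(t_stat) > t_crit)
```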
Estimation and Confidence Intervals
In point estimation, the sample mean $\bar{x}$ serves as an unbiased estimator of the population mean $\mu$, with the t-statistic quantifying the precision of this estimate through the standard error $s/\sqrt{n}$, where $s$ is the sample standard deviation and $n$ is the sample size.[38] This approach accounts for the uncertainty in estimating $\sigma$ from the sample, making it suitable for small samples where the population standard deviation is unknown.[39]

For constructing confidence intervals around the population mean, the formula is $\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$, where $t_{\alpha/2,\,n-1}$ is the critical value from the t-distribution with $n - 1$ degrees of freedom, and $\alpha$ is the significance level (e.g., 0.05 for a 95% confidence level).[38] This interval provides a range within which the true $\mu$ is likely to lie, with the interpretation that if the sampling process were repeated many times, 95% of such intervals would contain the true population mean.[39] The width of the interval decreases as the sample size increases or as the sample standard deviation decreases, reflecting greater precision in the estimate.[40]

Prediction intervals extend this framework to estimate the range for a single future observation from the population, given by $\bar{x} \pm t_{\alpha/2,\,n-1}\,s\sqrt{1 + \frac{1}{n}}$.[41] Unlike confidence intervals, which focus on the mean, prediction intervals incorporate the additional variability of an individual observation, resulting in wider bounds that account for both sampling error and inherent population scatter.[41] These methods assume the population is normally distributed, though they remain approximately valid for larger samples due to the central limit theorem.[23]

For illustration, with a sample of size $n = 11$ (so 10 degrees of freedom), the 95% confidence interval is $\bar{x} \pm 2.228\,(s/\sqrt{11})$, using the critical value $t_{0.025,\,10} = 2.228$.[16]
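A minimal sketch (hypothetical data) computing both a confidence interval and the wider prediction interval:

```python
import numpy as np
from scipy import stats

sample = np.array([5.2, 4.9, 5.5, 5.1, 4.8, 5.3, 5.0, 5.4])  # hypothetical
n = sample.size
x_bar, s = sample.mean(), sample.std(ddof=1)
t_crit = stats.t.ppf(0.975, df=n - 1)

# 95% confidence interval for the population mean
ci = (x_bar - t_crit * s / np.sqrt(n), x_bar + t_crit * s / np.sqrt(n))

# 95% prediction interval for a single future observation (wider)
pi = (x_bar - t_crit * s * np.sqrt(1 + 1 / n),
      x_bar + t_crit * s * np.sqrt(1 + 1 / n))
print(ci, pi)
```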
Historical Development
Origins and Invention
The t-statistic was invented by William Sealy Gosset in 1908 while he was employed as a chemist and statistician at the Guinness Brewery in Dublin, Ireland.[42] Gosset's work was driven by the practical needs of quality control in beer production, where small sample sizes – often fewer than 30 observations – were common due to economic constraints on testing materials like yeast viability and barley yields.[43] These limitations made traditional normal distribution assumptions unreliable for assessing variability in brewing processes, prompting Gosset to develop a new approach for inference with unknown population standard deviation.[44]

Gosset first published his findings under the pseudonym "Student" in the paper "The Probable Error of a Mean", which appeared in the journal Biometrika in 1908.[45] In this seminal work, he analytically derived what became known as the Student's t-distribution to address the challenges of small-sample estimation, extending earlier theoretical foundations laid by Karl Pearson during Gosset's time studying at Pearson's Biometric Laboratory in London.[44] The t-statistic emerged as a key component of this distribution, enabling more accurate probability calculations for means when the population variance was estimated from the sample itself.[46]

Publication faced significant hurdles due to Guinness's strict policy on industrial secrecy, which initially prohibited Gosset from revealing brewery-specific applications and delayed the release of his research.[47] To circumvent this, he adopted the "Student" pseudonym with the brewery's eventual approval, allowing the ideas to enter the public domain without disclosing proprietary details.[48] Later, Ronald A. Fisher played a crucial role in popularizing the t-statistic through his writings and refinements in the 1920s, integrating it into broader statistical practice.[49]
Adoption and Naming
The t-statistic gained prominence in the 1920s through the efforts of Ronald A. Fisher, who integrated it into his foundational work on analysis of variance (ANOVA) and the principles of experimental design, thereby extending its utility beyond initial small-sample contexts.[50] Fisher's 1925 book, Statistical Methods for Research Workers, marked a pivotal inclusion of the t-test, presenting tables and methods that made it practical for biologists and other researchers dealing with experimental data.[51] This publication, aimed at non-mathematicians, facilitated its rapid dissemination in academic and applied settings.[52]

Fisher coined the term "Student's t-distribution" in a 1925 paper to honor William Sealy Gosset, who had developed the underlying method under the pseudonym "Student" while addressing small-sample challenges in brewery quality testing at Guinness.[44] Gosset had originally denoted the statistic as z, but Fisher introduced the notation t for it, distinguishing it from the standard normal z-statistic and adapting the formula to emphasize the standard error. The designation "t-statistic" emerged later, appearing routinely in mid-20th-century statistical textbooks as the method became entrenched in standard curricula.[49] Gosset's true identity remained confidential during his lifetime due to employer restrictions, only becoming widely known after his death in 1937.[53]

By the 1930s, the t-statistic had achieved widespread adoption as a core tool in statistical education across universities and in industrial practices for data analysis.[54] Its application surged during World War II, particularly in quality control for munitions and manufacturing, where statistical techniques like the t-test supported efficient process monitoring and variability assessment under resource constraints.[55] A key milestone came with Fisher's 1935 book The Design of Experiments, which elaborated on t-tests for handling multiple comparisons in complex designs, solidifying their role in rigorous hypothesis testing.[56]
Related Concepts
Comparison to Other Statistics
The t-statistic is primarily employed when the population standard deviation $\sigma$ is unknown and sample sizes are small (typically $n < 30$), whereas the z-statistic is suitable when $\sigma$ is known and samples are large ($n \geq 30$), allowing reliance on the central limit theorem for approximate normality.[57][58] The t-distribution exhibits heavier tails than the standard normal distribution, resulting in more conservative inference and wider confidence intervals to account for estimation uncertainty in the standard deviation; for instance, the critical value for a two-sided 95% confidence interval is 1.96 under the z-distribution but 2.228 for the t-distribution with 10 degrees of freedom.[20][59] In contrast to the F-statistic, which tests ratios of variances (e.g., in ANOVA or regression models for multiple parameters), the t-statistic focuses on univariate mean differences.[60] Under the null hypothesis, the square of a t-statistic follows an F-distribution with 1 numerator degree of freedom and the same denominator degrees of freedom as the t-test:

$$ t_\nu^2 \sim F_{1,\,\nu}. $$
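This identity is easy to verify numerically; assuming SciPy, the squared two-sided t critical value matches the corresponding F critical value:

```python
from scipy import stats

nu, alpha = 10, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, nu)   # two-sided t critical value
f_crit = stats.f.ppf(1 - alpha, 1, nu)    # F critical value with (1, nu) df
print(t_crit**2, f_crit)                  # both ~4.96
```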
Extensions and Variants
Welch's t-test extends the standard two-sample t-test to handle cases where the variances of the two populations are unequal, avoiding the assumption of homogeneity required in the pooled variance approach. The test statistic is given by

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, $$

with degrees of freedom approximated by the Welch–Satterthwaite equation. Another common variant is the paired t-test, which applies the one-sample procedure to within-pair differences. In R, the t.test() function supports Welch's adjustment via the var.equal=FALSE argument and paired tests with paired=TRUE. Similarly, Python's SciPy library provides scipy.stats.ttest_ind() for independent samples, including Welch's version (with equal_var=False), and ttest_rel() for paired data.
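A brief sketch of both SciPy calls, using simulated (purely illustrative) data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(10.0, 1.0, size=20)   # illustrative group with smaller spread
b = rng.normal(10.8, 3.0, size=15)   # illustrative group with larger spread

# Welch's t-test: no equal-variance assumption
t_w, p_w = stats.ttest_ind(a, b, equal_var=False)

# Paired test needs equal-length, matched samples
before = rng.normal(100.0, 10.0, size=12)
after = before + rng.normal(2.0, 4.0, size=12)
t_p, p_p = stats.ttest_rel(before, after)

print(p_w, p_p)
```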