Post hoc analysis
In a scientific study, post hoc analysis (from Latin post hoc, "after this") consists of statistical analyses that were specified after the data were seen.[1][2][3] A post hoc analysis is usually used to explore specific, statistically significant differences between the means of three or more independent groups, differences detected with an analysis of variance (ANOVA).[4] An ANOVA does not identify the group(s); for that, a post hoc analysis is required.[5]
Because each post hoc analysis is effectively a statistical test, conducting multiple post hoc comparisons introduces a family-wise error rate problem, which is a type of multiple testing problem. This increases the likelihood of false positives unless corrected.
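As a numerical illustration, here is a minimal Python sketch (not tied to any particular study) of how the family-wise error rate inflates with the number of comparisons m, alongside the per-test threshold a simple Bonferroni correction would use:

```python
# Minimal sketch: family-wise error rate (FWER) inflation with the number
# of independent comparisons m, and the Bonferroni per-test threshold that
# keeps the FWER at or below alpha. Purely illustrative arithmetic.
alpha = 0.05  # per-comparison Type I error rate

for m in (1, 3, 10, 45):  # e.g., 45 = all pairs among 10 groups
    fwer = 1 - (1 - alpha) ** m  # chance of at least one false positive
    print(f"m={m:2d}  uncorrected FWER={fwer:.3f}  Bonferroni alpha/m={alpha / m:.4f}")
```

At m = 10 the uncorrected family-wise rate already exceeds 40%, which is why corrections are needed.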
Post hoc tests are follow-up tests performed after a significant ANOVA result[6] to identify where the differences lie (which specific groups differ). To compensate for the multiple testing problem, correction procedures are sometimes applied to post hoc tests, but doing so precisely is often difficult or impossible. Post hoc analysis that is conducted and interpreted without adequate consideration of this problem is sometimes called data dredging (p-hacking) by critics, because the statistical associations that it finds are often spurious.[7] In other words, findings from data dredging are invalid or not trustworthy.
Post hoc analyses are acceptable when transparently reported as exploratory. In other words, post hoc analyses are not inherently unethical.[8] The main requirement for their ethical use is simply that their results not be misrepresented as the original hypothesis.[8] Modern editions of scientific manuals have clarified this point; for example, APA style now specifies that "hypotheses should now be stated in three groupings: preplanned–primary, preplanned–secondary, and exploratory (post hoc). Exploratory hypotheses are allowable, and there should be no pressure to disguise them as if they were preplanned."[8]
Types of post hoc analysis
Types or categories of post hoc analyses include:[9]
- Pairwise comparisons: Tests all possible pairs
- Trend analysis: Tests for linear or quadratic trends across ordered groups
- Simple effects analysis: Examines effects within factorial ANOVA
- Interaction probing: Analyzes interaction effects within factorial ANOVA
- Restricted sets of contrasts: Tests smaller families of comparisons
In addition, a subgroup analysis[10] examines whether findings differ between discrete categories of subjects in the sample. This approach is common in clinical and observational studies.
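As an illustrative sketch (all data, column names, and labels below are hypothetical), a subgroup analysis can be run by repeating the same treatment comparison within each category:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical trial data: a treatment/control comparison examined
# separately within two discrete subgroups of subjects.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sex": np.repeat(["female", "male"], 40),
    "group": np.tile(np.repeat(["treatment", "control"], 20), 2),
    "outcome": rng.normal(0.0, 1.0, 80),
})

# Test treatment vs. control within each subgroup.
for sex, sub in df.groupby("sex"):
    treated = sub.loc[sub["group"] == "treatment", "outcome"]
    control = sub.loc[sub["group"] == "control", "outcome"]
    t, p = stats.ttest_ind(treated, control)
    print(f"{sex}: t = {t:.2f}, p = {p:.3f}")
```

Because each subgroup test is itself a post hoc comparison, the multiplicity concerns described above apply here as well.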
Common post hoc tests
Common post hoc tests include:[11][12]
- Holm-Bonferroni Procedure
- Newman-Keuls
- Rodger's Method
- Scheffé's Method
- Tukey's Test and Honestly Significant Difference (HSD) (see also: Studentized Range Distribution)
However, with the exception of Scheffé's Method, these tests should be specified "a priori" despite being called "post-hoc" in conventional usage. For example, a difference between means could be significant with the Holm-Bonferroni method but not with the Tukey Test, and vice versa. It would be poor practice for a data analyst to choose which of these tests to report based on which gave the desired result.
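To see how such disagreements can arise, the sketch below runs both procedures on the same synthetic three-group data; the seed, group means, and sample sizes are arbitrary, and the snippet assumes SciPy 1.8+ (for scipy.stats.tukey_hsd) and statsmodels:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Synthetic data: three groups with modestly separated means.
rng = np.random.default_rng(42)
groups = [rng.normal(loc, 1.0, size=15) for loc in (0.0, 0.8, 1.2)]

# Holm-Bonferroni: adjust raw pairwise t-test p-values.
pairs = [(0, 1), (0, 2), (1, 2)]
raw_p = [stats.ttest_ind(groups[i], groups[j]).pvalue for i, j in pairs]
_, holm_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")

# Tukey's HSD on the same data.
tukey = stats.tukey_hsd(*groups)
for (i, j), hp in zip(pairs, holm_p):
    print(f"groups {i} vs {j}: Holm p = {hp:.3f}, Tukey p = {tukey.pvalue[i, j]:.3f}")
```

Near the significance threshold, the two adjusted p-values for the same pair can fall on opposite sides of α, which is exactly why the choice of procedure must be made before seeing the results.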
Causes
Sometimes the temptation to engage in post hoc analysis is motivated by a desire to produce positive results or to see a project as successful. In the case of pharmaceutical research, there may be significant financial consequences to a failed trial.[citation needed]
References
[edit]- ^ "What is the significance and use of post-hoc analysis studies?". www.cwauthors.com. Retrieved 2022-12-09.
- ^ "11.8: Post Hoc Tests". Statistics LibreTexts. 2019-11-12. Retrieved 2022-12-09.
- ^ "Post Hoc". FORRT - Framework for Open and Reproducible Research Training. Retrieved 2025-11-02.
- ^ "SAGE Research Methods - The SAGE Encyclopedia of Communication Research Methods". methods.sagepub.com. Retrieved 2022-12-09.
- ^ "11.8: Post Hoc Tests". Statistics LibreTexts. 2019-11-12. Retrieved 2025-11-02.
- ^ Bobbitt, Zach (2019-04-14). "A Guide to Using Post Hoc Tests with ANOVA". Statology. Retrieved 2025-11-02.
- ^ Zhang, Yiran; Hedo, Rita; Rivera, Anna; Rull, Rudolph; Richardson, Sabrina; Tu, Xin M. (2019-08-01). "Post hoc power analysis: is it an informative and meaningful analysis?". General Psychiatry. 32 (4) e100069. doi:10.1136/gpsych-2019-100069. ISSN 2517-729X. PMC 6738696.
- ^ a b c American Psychological Association (2020). Publication Manual of the American Psychological Association: the Official Guide to APA Style (7th ed.). Washington, DC: American Psychological Association. ISBN 978-1-4338-3217-8.
- ^ Beaton, Albert E.; Keppel, Geoffrey (1975). "Design and Analysis: A Researcher's Handbook". American Educational Research Journal. 12 (1): 101. doi:10.2307/1162588. ISSN 0002-8312.
- ^ Andrade, Chittaranjan (2023-11-01). "Types of Analysis: Planned (prespecified) vs Post Hoc, Primary vs Secondary, Hypothesis-driven vs Exploratory, Subgroup and Sensitivity, and Others". Indian Journal of Psychological Medicine. 45 (6): 640–641. doi:10.1177/02537176231216842. ISSN 0253-7176. PMC 10964884. PMID 38545527.
- ^ "Post Hoc Definition and Types of Tests". Statistics How To. Retrieved 2022-12-09.
- ^ Pamplona, Fabricio (2022-07-28). "Post Hoc Analysis: Process and types of tests". Mind the Graph Blog. Retrieved 2022-12-09.
Post hoc analysis
Introduction
Definition
Post hoc analysis, derived from the Latin phrase post hoc meaning "after this," refers to statistical procedures or explorations performed retrospectively, after data collection and primary hypothesis testing, to investigate specific patterns or differences observed in the dataset.[4] These analyses are typically initiated when an overall test, such as analysis of variance (ANOVA), indicates significant differences among groups, allowing researchers to probe deeper into the nature of those differences.[3] A defining feature of post hoc analysis is its exploratory orientation, as it involves unplanned comparisons that were not specified in advance, often encompassing multiple tests on the same dataset.[5] This approach contrasts with pre-planned confirmatory testing by emphasizing discovery over verification, though it requires careful control of error rates due to the increased likelihood of false positives from repeated testing.[6] For example, in an experiment evaluating the effects of three fertilizer types on crop yield, a post hoc analysis would follow a significant ANOVA result to determine which specific pairs of fertilizers lead to statistically distinguishable yields.[3] The retrospective application underscores the term's Latin roots, highlighting its role in examining data after initial findings have emerged.[4]
Historical Development
The roots of post hoc analysis lie in the early 20th-century evolution of experimental design in statistics, particularly through Ronald A. Fisher's pioneering work on analysis of variance (ANOVA). In his 1925 book Statistical Methods for Research Workers, Fisher introduced ANOVA as a method to assess variance in experimental data, such as agricultural trials, which established the need for subsequent tests to pinpoint specific group differences following an overall significant result. This framework shifted statistical practice from simple pairwise comparisons to structured follow-up analyses in multifactor experiments. The mid-20th century saw the formalization of specific post hoc methods to address multiple comparisons while controlling error rates. In 1949, John W. Tukey developed the Honestly Significant Difference (HSD) test, presented in his paper "Comparing Individual Means in the Analysis of Variance," which provided a practical procedure for pairwise comparisons after ANOVA by using the studentized range distribution to maintain family-wise error rates. Building on this, Henry Scheffé introduced a more versatile method in 1953 for judging all possible linear contrasts, including complex ones, in his Biometrika article "A Method for Judging All Contrasts in the Analysis of Variance," offering conservative simultaneous confidence intervals suitable for exploratory investigations. These innovations addressed the limitations of earlier ad hoc approaches, emphasizing protection against inflated Type I errors in planned and unplanned comparisons.[7] Post-1960s advancements in computing facilitated the widespread application of post hoc analyses by enabling rapid execution of multiple tests on large datasets. This era also highlighted the need for robust error control, with the Bonferroni correction—originally formulated by Carlo Emilio Bonferroni in 1936 for probability inequalities—gaining prominence in the 1970s as a simple yet conservative adjustment for multiple testing in statistical software and experimental designs.[8] In the modern context, post hoc analysis has faced increased scrutiny amid the reproducibility crisis of the 2010s, where practices like p-hacking—manipulating data through iterative post hoc tests to achieve statistical significance—were identified as contributors to non-replicable findings in fields such as psychology and medicine. To mitigate these issues, the American Psychological Association's 7th edition Publication Manual (2019) introduced guidelines distinguishing exploratory post hoc analyses from confirmatory ones, requiring clear labeling, pre-registration where possible, and transparent reporting to enhance scientific integrity.
Context and Prerequisites
Relation to Hypothesis Testing
Post hoc analysis functions as a critical follow-up to omnibus hypothesis tests, such as the one-way analysis of variance (ANOVA), which evaluate the null hypothesis that all group means are equal against the alternative that at least one mean differs.[9] These primary tests detect overall differences among multiple groups but cannot specify which particular groups account for the effect, necessitating post hoc procedures to localize significant pairwise or complex contrasts.[10] A key prerequisite for conducting post hoc analysis is a statistically significant result from the ANOVA F-test, conventionally at a significance level of p < 0.05, indicating that overall group differences exist and warrant further investigation to identify the sources of variation. The F-statistic itself is computed as F = MS_between / MS_within, where MS_between is the mean square variance between groups and MS_within the mean square variance within groups; a large F-value relative to the F-distribution under the null hypothesis triggers the application of post hoc tests.[11] Within experimental design, post hoc analysis integrates into a sequential testing pipeline, where the initial confirmatory hypothesis test (e.g., ANOVA) precedes exploratory breakdowns to refine understanding of the effects while maintaining statistical control.[9] For instance, in psychological experiments evaluating treatment effects on depression incidence across groups (e.g., cognitive behavioral therapy, medication, and placebo), an ANOVA on group means is performed first; only upon significance do post hoc tests follow to pinpoint differences, such as between therapy and placebo, thereby avoiding unnecessary comparisons on non-significant data.[12]
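A minimal sketch of this sequential pipeline, with hypothetical data standing in for the three treatment groups (SciPy's f_oneway and tukey_hsd serve as generic stand-ins for the omnibus and post hoc steps):

```python
import numpy as np
from scipy import stats

# Hypothetical outcome scores for three treatment groups.
rng = np.random.default_rng(1)
therapy = rng.normal(10.0, 2.0, 30)     # cognitive behavioral therapy
medication = rng.normal(11.0, 2.0, 30)
placebo = rng.normal(13.0, 2.0, 30)

# Step 1: omnibus one-way ANOVA.
f_stat, p_value = stats.f_oneway(therapy, medication, placebo)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Step 2: post hoc pairwise tests, only if the omnibus test is significant.
if p_value < 0.05:
    print(stats.tukey_hsd(therapy, medication, placebo))
else:
    print("Omnibus test not significant; no post hoc tests performed.")
```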
A Priori Versus Post Hoc Approaches
In statistical research, a priori approaches involve formulating specific hypotheses and planned comparisons prior to data collection, ensuring that the analyses are driven by theoretical expectations rather than observed results. This pre-specification allows researchers to control the Type I error rate at the nominal level, such as α = 0.05, for each planned test without the need for multiplicity adjustments, as the comparisons are limited and theoretically justified. For instance, in analysis of variance (ANOVA), orthogonal contrasts can be designed a priori to examine particular patterns, like a linear trend across increasing drug doses in a clinical trial, thereby maintaining the integrity of the overall experiment while focusing on hypothesized effects.[13] In contrast, post hoc approaches are data-driven explorations conducted after initial analyses reveal patterns, such as significant overall effects in ANOVA, to probe specific group differences that were not anticipated beforehand. These analyses offer flexibility for discovering novel insights but carry a higher risk of false positives due to the increased number of potential comparisons, necessitating adjustments like Tukey's honestly significant difference or Scheffé's method to control the family-wise error rate (FWER) and prevent inflation of the overall Type I error. An example is following a significant ANOVA result with pairwise comparisons among all treatment groups to identify which pairs differ, even if no specific pairs were hypothesized initially; without correction, this could lead to spurious findings.[6] The fundamental distinction between these approaches lies in their impact on error control and inferential validity: a priori tests preserve the designated α level per comparison because they are constrained by design, whereas post hoc tests demand conservative adjustments to maintain an acceptable FWER across the exploratory family of tests. Philosophically, a priori planning aligns with the principle of falsification in scientific inquiry, where pre-stated hypotheses are rigorously tested to avoid confirmation bias, while post hoc methods are better suited for hypothesis generation rather than definitive confirmation, as their exploratory nature can inadvertently capitalize on chance findings.[14][15]
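A brief sketch of such a planned linear-trend contrast, computed from first principles on hypothetical dose-group data (the coefficients, means, and sample sizes are illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical responses for three ordered dose groups (low, mid, high).
rng = np.random.default_rng(7)
doses = [rng.normal(mu, 1.5, 12) for mu in (5.0, 6.0, 7.2)]
c = np.array([-1.0, 0.0, 1.0])  # orthogonal linear-trend coefficients

means = np.array([d.mean() for d in doses])
n = np.array([len(d) for d in doses])

# Pooled within-group variance (the ANOVA mean square error).
df_error = n.sum() - len(doses)
mse = sum(((d - d.mean()) ** 2).sum() for d in doses) / df_error

estimate = c @ means                      # contrast estimate
se = np.sqrt(mse * np.sum(c**2 / n))      # standard error of the contrast
t = estimate / se
p = 2 * stats.t.sf(abs(t), df_error)      # two-sided p-value, no adjustment
print(f"linear trend: estimate = {estimate:.2f}, t = {t:.2f}, p = {p:.4f}")
```

Because this single contrast was specified before seeing the data, it is tested at the nominal α with no multiplicity adjustment, in line with the distinction drawn above.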
Types of Post Hoc Analysis
Pairwise Comparisons
Pairwise comparisons represent the most fundamental form of post hoc analysis, involving the examination of differences between every possible pair of group means after an initial omnibus test, such as ANOVA, has indicated overall significance among multiple groups. This approach allows researchers to pinpoint which specific groups differ from one another, providing targeted insights into the nature of the observed effects.[10][1] These comparisons are particularly common in balanced experimental designs where groups have equal sample sizes, facilitating straightforward computation and interpretation. They typically assume that the data are normally distributed within each group and that variances are homogeneous across groups, ensuring the validity of the underlying statistical inferences. The process begins with calculating the mean difference for each pair using independent t-tests, followed by the application of a multiplicity correction, such as adjustments to p-values or critical values, to control the inflated risk of Type I errors from multiple testing. For k groups, this results in k(k − 1)/2 pairwise tests, which grows quadratically and underscores the need for such corrections.[10][1][16] A practical example occurs in a clinical trial evaluating four different diets for weight loss effectiveness. After ANOVA reveals a significant overall difference in mean weight loss across the diets (F(3, 196) = 5.67, p < 0.01), pairwise comparisons might show that the low-carbohydrate diet significantly outperforms the standard diet (mean difference = 3.2 kg, adjusted p = 0.02), while no other pairs differ meaningfully. This isolates the superior intervention without overinterpreting the broad ANOVA result.[10] The primary limitation of pairwise comparisons lies in their quadratic increase in the number of tests as the number of groups rises; for instance, five groups require 10 comparisons, amplifying the multiple comparisons problem and potentially reducing statistical power unless robust error-rate adjustments are employed. This heightens the overall experiment-wise error rate if uncorrected, emphasizing the importance of proceeding only after omnibus significance.[10][1]
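The quadratic growth in the number of tests is easy to tabulate; a minimal sketch:

```python
from math import comb

# Number of pairwise post hoc comparisons for k groups: k*(k-1)/2 = C(k, 2).
for k in (3, 4, 5, 8, 10):
    print(f"k = {k:2d} groups -> {comb(k, 2):2d} pairwise comparisons")
```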
Complex Exploratory Analyses
Complex exploratory analyses extend beyond simple pairwise comparisons to uncover nuanced patterns in data, such as trends and interactions, particularly when initial omnibus tests like ANOVA indicate overall significance but require deeper dissection. These analyses are employed in scenarios where group means exhibit ordered or interactive relationships, allowing researchers to probe underlying structures without prior specification of all comparisons. For instance, in factorial designs, they facilitate the examination of how effects vary across levels of multiple factors.[3] Key types include trend analysis, which tests for linear or quadratic patterns across ordered categories using orthogonal polynomial contrasts; simple effects analysis, which evaluates the influence of one factor within specific levels of another; interaction probing, which assesses moderator effects by decomposing significant interactions; and restricted contrasts, which focus on theory-guided subsets of comparisons rather than all possible pairs. Trend analysis, for example, applies coefficients like those for linear (e.g., -1, 0, 1) or quadratic (-1, 2, -1) trends to detect monotonic or curvilinear relationships. Simple effects involve running focused tests, such as one-way ANOVAs, at each level of a moderator to clarify interaction patterns. Interaction probing further explores how variables jointly influence outcomes, while restricted contrasts limit the family of tests to hypothesized subsets, enhancing power for targeted inquiries.[17][18][3] These methods are particularly useful when pairwise comparisons alone fail to capture complexity, such as in probing interactions from factorial ANOVA where overall effects mask subgroup variations. By decomposing the omnibus effect into components like main effects within subgroups or trend components, researchers gain insights into data structures that inform model refinement. In education research, for example, post hoc trend analysis on performance data across age groups can reveal non-linear learning curves, such as a quadratic pattern where gains accelerate in middle childhood before plateauing in adolescence, as observed in studies of cognitive skill acquisition. A distinctive aspect of complex exploratory analyses is their role in hypothesis generation for subsequent confirmatory studies, provided results are explicitly labeled as exploratory to distinguish them from pre-planned tests and mitigate overinterpretation risks.[19] The process typically begins after a significant overall test, involving the specification of contrasts or subgroup models to partition variance into interpretable components, followed by evaluation of their significance without full a priori planning, though adjustments for multiplicity may be applied depending on the exploratory scope.[20]
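As an illustrative sketch of one of these approaches, a simple-effects analysis can be carried out by testing factor A separately within each level of factor B after a significant interaction; the 2 × 3 design and data below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 3 factorial data: three levels of factor A observed
# at each of two levels of factor B.
rng = np.random.default_rng(3)
cells = {
    "B1": [rng.normal(5.0, 1.0, 20), rng.normal(5.2, 1.0, 20), rng.normal(5.1, 1.0, 20)],
    "B2": [rng.normal(5.0, 1.0, 20), rng.normal(6.5, 1.0, 20), rng.normal(7.3, 1.0, 20)],
}

# Simple effect of A within each level of B: a one-way ANOVA per level.
for level, groups in cells.items():
    f, p = stats.f_oneway(*groups)
    print(f"simple effect of A within {level}: F = {f:.2f}, p = {p:.4f}")
```

Here factor A would show little effect within B1 but a strong effect within B2, decomposing the interaction into interpretable parts.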
Common Post Hoc Tests
Tukey's Honestly Significant Difference Test
Tukey's Honestly Significant Difference (HSD) test is a single-step post-hoc procedure designed for performing all pairwise comparisons among group means after a significant one-way analysis of variance (ANOVA), while controlling the family-wise error rate (FWER) at the desired significance level α. Developed by John Tukey, the method relies on the studentized range distribution to determine critical values, ensuring that the probability of at least one Type I error across all comparisons does not exceed α. It is particularly suited for balanced experimental designs where the focus is on identifying which specific pairs of means differ significantly.[21][22] The test assumes that the data are normally distributed within each group, that variances are homogeneous across groups, and that sample sizes are equal (balanced design), making it most appropriate following a one-way ANOVA with these conditions met. Violations of normality or homogeneity can be assessed via residual plots or formal tests like Levene's, though the procedure is robust to moderate departures. For unequal sample sizes, an extension known as the Tukey-Kramer method adjusts the standard error for each pairwise comparison, though this renders the test more conservative.[22][23] The core of the test involves computing a critical difference threshold, or HSD, using the formula HSD = q(α, k, df) × √(MSE / n), where q(α, k, df) is the critical value from the studentized range distribution (obtained from statistical tables or software for significance level α, k groups, and the error degrees of freedom df), MSE is the mean square error from the ANOVA, and n is the common sample size per group. A pairwise mean difference is deemed significant if it exceeds the HSD. Simultaneous confidence intervals for pairwise differences can also be constructed by adding and subtracting the HSD to the observed difference.[24][22] To apply the test, first conduct the one-way ANOVA and confirm overall significance (p < α). Then, calculate the HSD using the formula above. Compute the absolute differences for all pairwise comparisons and flag those exceeding the HSD as significant. Results are often summarized in a table or compact letter display, where means sharing the same letter are not significantly different. Software like R (via TukeyHSD()) or SAS automates these computations, including simultaneous confidence intervals.[22][23]
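A minimal Python sketch of this workflow, using statsmodels' pairwise_tukeyhsd as an analogue of R's TukeyHSD() on hypothetical balanced data:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical balanced samples from three groups.
rng = np.random.default_rng(5)
values = np.concatenate([rng.normal(mu, 2.0, 20) for mu in (10.0, 12.0, 15.0)])
labels = np.repeat(["A", "B", "C"], 20)

# All pairwise comparisons with the FWER controlled at alpha = 0.05.
result = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(result)  # table of mean differences, adjusted p-values, and CIs
```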
For instance, consider an experiment evaluating crop yields from five fertilizer varieties, each tested on n = 10 plots, yielding ANOVA MSE = 25. With α = 0.05, k = 5, and 45 error degrees of freedom, the critical q ≈ 4.02, so HSD = 4.02 × √(25/10) ≈ 6.36. If mean yields are 20, 22, 25, 28, and 30 bushels per acre, pairs like the first and last (difference = 10 > 6.36) would be significantly different, identifying the superior variety without inflating error rates.[22][25]
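Those figures can be reproduced with SciPy's studentized_range distribution (SciPy 1.7+); the sketch below assumes the back-solved values n = 10 plots per variety and 45 error degrees of freedom used in the example above:

```python
import numpy as np
from scipy.stats import studentized_range

k, n, mse = 5, 10, 25.0           # groups, plots per group, ANOVA MSE
df_error = k * (n - 1)            # 45 error degrees of freedom

q_crit = studentized_range.ppf(0.95, k, df_error)  # about 4.02
hsd = q_crit * np.sqrt(mse / n)                    # about 6.36
print(f"q = {q_crit:.2f}, HSD = {hsd:.2f}")

# Flag variety pairs whose mean difference exceeds the HSD.
means = [20, 22, 25, 28, 30]
for i in range(k):
    for j in range(i + 1, k):
        diff = abs(means[j] - means[i])
        if diff > hsd:
            print(f"varieties {i + 1} vs {j + 1} differ (|diff| = {diff})")
```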
The Tukey HSD test is powerful for detecting true differences in balanced designs with equal sample sizes, offering better control over Type II errors compared to more conservative methods like Bonferroni, while maintaining FWER control. It is widely implemented and recommended for standard pairwise post-hoc analyses in parametric settings. However, it is less flexible for unequal sample sizes (where the Tukey-Kramer adjustment increases conservatism) or for comparisons involving linear contrasts beyond pairs, and its power decreases as the number of groups grows large.[26]
