Post hoc analysis
from Wikipedia

In a scientific study, post hoc analysis (from Latin post hoc, "after this") consists of statistical analyses that were specified after the data were seen.[1][2][3] A post hoc analysis is usually used to explore specific, statistically significant differences between the means of three or more independent groups, as detected with an analysis of variance (ANOVA).[4] An ANOVA, however, does not identify which group or groups differ; for that, a post hoc analysis is required.[5]
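As a concrete illustration of this two-step workflow, the following minimal sketch (with invented data for three hypothetical groups) runs the omnibus ANOVA with SciPy and then uses the Tukey HSD implementation in statsmodels to locate which pairs differ:

```python
# A minimal sketch (hypothetical data): the one-way ANOVA detects that *some*
# difference exists among three groups; Tukey's HSD then locates the pairs.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical measurements for three independent groups
a = [24.1, 25.3, 26.0, 24.8, 25.5]
b = [27.9, 28.4, 27.1, 28.8, 27.6]
c = [24.5, 25.0, 24.2, 25.8, 24.9]

f_stat, p_value = stats.f_oneway(a, b, c)   # omnibus test: are all means equal?
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# The ANOVA does not say *which* groups differ; the post hoc test does.
values = np.concatenate([a, b, c])
labels = ["a"] * 5 + ["b"] * 5 + ["c"] * 5
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```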

Because each post hoc analysis is effectively a statistical test, conducting multiple post hoc comparisons introduces a family-wise error rate problem, which is a type of multiple testing problem. This increases the likelihood of false positives unless corrected.
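The inflation is easy to quantify under the simplifying assumption of independent tests: with m tests each run at level alpha, the probability of at least one false positive is 1 - (1 - alpha)^m, as the short sketch below shows.

```python
# A minimal sketch of family-wise error rate inflation, assuming m
# independent tests each conducted at level alpha.
alpha = 0.05
for m in (1, 3, 10, 45):                    # 45 = all pairs among 10 groups
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:2d} tests -> family-wise error rate ~ {fwer:.3f}")
# With 45 uncorrected tests there is roughly a 90% chance of at least
# one false positive, even if no group truly differs.
```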

Post hoc tests are follow-up tests performed after a significant ANOVA result[6] to identify where the differences lie, that is, which specific groups differ. To compensate for the multiple-testing problem, corrected post hoc testing procedures are sometimes used, but precise correction is often difficult or impossible. Post hoc analysis that is conducted and interpreted without adequate consideration of this problem is sometimes called data dredging (p-hacking) by critics, because the statistical associations it finds are often spurious.[7] In other words, findings from data dredging are invalid or untrustworthy.

Post hoc analyses are acceptable when transparently reported as exploratory; in other words, post hoc analyses are not inherently unethical.[8] The main requirement for their ethical use is simply that their results not be misrepresented as the original hypothesis.[8] Modern editions of scientific manuals have clarified this point; for example, APA style now specifies that "hypotheses should now be stated in three groupings: preplanned–primary, preplanned–secondary, and exploratory (post hoc). Exploratory hypotheses are allowable, and there should be no pressure to disguise them as if they were preplanned."[8]

Types of post hoc analysis


Types or categories of post hoc analyses include:[9]

  • Pairwise comparisons: Tests all possible pairs of group means
  • Trend analysis: Tests for linear or quadratic trends across ordered groups
  • Simple effects analysis: Examines the effect of one factor within each level of another in a factorial ANOVA
  • Interaction probing: Analyzes interaction contrasts within factorial ANOVA
  • Restricted sets of contrasts: Tests smaller, theory-guided families of comparisons

In addition, a subgroup analysis[10] examines whether findings differ between discrete categories of subjects in the sample. This approach is common in clinical and observational studies.

Common post hoc tests


Common post hoc tests include:[11][12]

  • Tukey's honestly significant difference (HSD) test
  • Scheffé's method
  • The Bonferroni correction and the Holm-Bonferroni method
  • The Newman-Keuls and Duncan range tests

However, with the exception of Scheffé's method, these tests should be specified "a priori" despite being called "post hoc" in conventional usage. For example, a difference between means could be significant with the Holm-Bonferroni method but not with the Tukey test, and vice versa. It would be poor practice for a data analyst to choose which of these tests to report based on which gave the desired result.
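As a sketch of how such a correction is applied in practice, the example below feeds a set of hypothetical raw p-values through the Holm-Bonferroni adjustment in statsmodels ("holm" is one of several built-in correction methods):

```python
# A minimal sketch (hypothetical p-values) of the Holm-Bonferroni adjustment
# using statsmodels' multipletests helper.
from statsmodels.stats.multitest import multipletests

raw_p = [0.010, 0.020, 0.030, 0.400]          # hypothetical pairwise p-values
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
for p, p_adj, r in zip(raw_p, adj_p, reject):
    print(f"raw p = {p:.3f}  adjusted p = {p_adj:.3f}  reject H0: {r}")
```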

Causes


Sometimes the temptation to engage in post hoc analysis is motivated by a desire to produce positive results or to see a project as successful. In the case of pharmaceutical research, there may be significant financial consequences of a failed trial.[citation needed]

from Grokipedia
Post hoc analysis, in the context of statistics, refers to a set of procedures used to explore and identify specific differences between groups after an initial omnibus test, such as analysis of variance (ANOVA), has indicated a significant overall effect. These analyses are typically performed retrospectively, after data collection and following the rejection of the null hypothesis in the primary test, to pinpoint which particular group means differ from one another. Unlike planned (a priori) comparisons, post hoc tests are not specified in advance and are chosen based on the observed results, which necessitates adjustments to control for inflated Type I error rates across multiple simultaneous tests.

The primary purpose of post hoc analysis is to provide detailed insights into the nature of significant effects detected in experiments or observational studies, enabling researchers to draw more precise conclusions without conducting additional data collection. Common methods include Tukey's Honestly Significant Difference (HSD) test, which performs all pairwise comparisons while controlling the family-wise error rate using the Studentized range distribution, making it suitable for balanced designs with equal sample sizes. Other widely used approaches are the Scheffé test, which is more conservative and allows for complex contrasts beyond simple pairwise comparisons, and the Bonferroni correction, a straightforward method that divides the overall alpha level by the number of comparisons to maintain error control. Less conservative options, such as the Newman-Keuls or Duncan tests, sequentially test comparisons based on the range of means, offering greater power to detect differences but at the risk of higher Type I errors.

While post hoc analyses enhance interpretability and hypothesis generation from existing data, they carry inherent limitations, including reduced statistical power due to multiple testing corrections, which can lead to false negatives, and the potential for data dredging—exploring patterns without pre-specification—that may produce spurious findings not replicable in future studies. To mitigate these issues, researchers emphasize transparent reporting of all conducted tests and recommend combining post hoc results with confirmatory a priori analyses or replication studies for robust inference. In fields like medicine, psychology, and the social sciences, post hoc methods are indispensable for dissecting complex group effects but must be interpreted cautiously to avoid overgeneralization.

Introduction

Definition

Post hoc analysis, derived from the Latin phrase post hoc meaning "after this," refers to statistical procedures or explorations performed retrospectively, after data collection and primary hypothesis testing, to investigate specific patterns or differences observed in the data. These analyses are typically initiated when an overall test, such as analysis of variance (ANOVA), indicates significant differences among groups, allowing researchers to probe deeper into the nature of those differences. A defining feature of post hoc analysis is its exploratory orientation, as it involves unplanned comparisons that were not specified in advance, often encompassing multiple tests on the identical dataset. This approach contrasts with pre-planned confirmatory testing by emphasizing discovery over verification, though it requires careful control of error rates due to the increased likelihood of false positives from repeated testing.

For example, in an experiment evaluating the effects of three fertilizer types on crop yield, a post hoc analysis would follow a significant ANOVA result to determine which specific pairs of fertilizers lead to statistically distinguishable yields. The retrospective application underscores the term's Latin etymology, highlighting its role in examining data after initial findings have emerged.

Historical Development

The roots of post hoc analysis lie in the early 20th-century evolution of experimental design in statistics, particularly through Ronald A. Fisher's pioneering work on analysis of variance (ANOVA). In his 1925 book Statistical Methods for Research Workers, Fisher introduced ANOVA as a method to assess variance in experimental data, such as agricultural trials, which established the need for subsequent tests to pinpoint specific group differences following an overall significant result. This framework shifted statistical practice from simple pairwise comparisons to structured follow-up analyses in multifactor experiments.

The mid-20th century saw the formalization of specific post hoc methods to address multiple comparisons while controlling error rates. In 1949, John W. Tukey developed the Honestly Significant Difference (HSD) test, presented in his paper "Comparing Individual Means in the Analysis of Variance," which provided a practical procedure for pairwise comparisons after ANOVA by using the studentized range distribution to maintain family-wise error rates. Building on this, Henry Scheffé introduced a more versatile method in 1953 for judging all possible linear contrasts, including complex ones, in his article "A Method for Judging All Contrasts in the Analysis of Variance," offering conservative simultaneous confidence intervals suitable for exploratory investigations. These innovations addressed the limitations of earlier ad hoc approaches, emphasizing protection against inflated Type I errors in planned and unplanned comparisons.

Post-1960s advancements in computing facilitated the widespread application of post hoc analyses by enabling rapid execution of multiple tests on large datasets. This era also highlighted the need for robust error control, with the Bonferroni correction—originally formulated by Carlo Emilio Bonferroni in 1936 for probability inequalities—gaining prominence in the 1970s as a simple yet conservative adjustment for multiple testing in statistical software and experimental designs.

In the modern context, post hoc analysis has faced increased scrutiny amid the reproducibility crisis of the 2010s, where practices like p-hacking—manipulating data through iterative post hoc tests to achieve statistical significance—were identified as contributors to non-replicable findings in fields such as psychology and medicine. To mitigate these issues, the American Psychological Association's 7th edition Publication Manual (2019) introduced guidelines distinguishing exploratory post hoc analyses from confirmatory ones, requiring clear labeling, pre-registration where possible, and transparent reporting to enhance scientific integrity.

Context and Prerequisites

Relation to Hypothesis Testing

Post hoc analysis functions as a critical follow-up to omnibus hypothesis tests, such as the analysis of variance (ANOVA), which evaluate the null hypothesis that all group means are equal against the alternative that at least one mean differs. These primary tests detect overall differences among multiple groups but cannot specify which particular groups account for the effect, necessitating post hoc procedures to localize significant pairwise or complex contrasts.

A key prerequisite for conducting post hoc analysis is a statistically significant result from the ANOVA F-test, conventionally at a significance level of p < 0.05, indicating that overall group differences exist and warrant further investigation to identify the sources of variation. The F-statistic itself is computed as

$$F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}},$$

where $\text{MS}_{\text{between}}$ represents the mean square variance between groups and $\text{MS}_{\text{within}}$ the mean square variance within groups; a large F-value relative to the F-distribution under the null hypothesis triggers the application of post hoc tests.

Within experimental design, post hoc analysis integrates into a sequential testing pipeline, where the initial confirmatory hypothesis test (e.g., ANOVA) precedes exploratory breakdowns to refine understanding of the effects while maintaining statistical control. For instance, in psychological experiments evaluating treatment effects on depression incidence across groups (e.g., cognitive behavioral therapy, medication, and placebo), an ANOVA on group means is performed first; only upon significance do post hoc tests follow to pinpoint differences, such as between therapy and placebo, thereby avoiding unnecessary comparisons on non-significant data.
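The following minimal sketch computes the F-statistic directly from the definitions above, using invented data for three small groups:

```python
# A minimal sketch of the F-statistic F = MS_between / MS_within,
# computed from hypothetical data for three groups.
import numpy as np

groups = [np.array([3.1, 2.9, 3.4]),
          np.array([4.0, 4.2, 3.8]),
          np.array([2.5, 2.7, 2.6])]

grand_mean = np.mean(np.concatenate(groups))
k = len(groups)                              # number of groups
n_total = sum(len(g) for g in groups)

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)            # df_between = k - 1
ms_within = ss_within / (n_total - k)        # df_within = N - k
F = ms_between / ms_within
print(f"F({k - 1}, {n_total - k}) = {F:.2f}")
```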

A Priori Versus Post Hoc Approaches

In statistical research, a priori approaches involve formulating specific hypotheses and planned comparisons prior to data collection, ensuring that the analyses are driven by theoretical expectations rather than observed results. This pre-specification allows researchers to control the Type I error rate at the nominal level, such as α = 0.05, for each planned test without the need for multiplicity adjustments, as the comparisons are limited and theoretically justified. For instance, in analysis of variance (ANOVA), orthogonal contrasts can be designed a priori to examine particular patterns, like a linear trend across increasing drug doses in a clinical trial, thereby maintaining the integrity of the overall experiment while focusing on hypothesized effects.

In contrast, post hoc approaches are data-driven explorations conducted after initial analyses reveal patterns, such as significant overall effects in ANOVA, to probe specific group differences that were not anticipated beforehand. These analyses offer flexibility for discovering novel insights but carry a higher risk of false positives due to the increased number of potential comparisons, necessitating adjustments like Tukey's honestly significant difference or Scheffé's method to control the family-wise error rate (FWER) and prevent inflation of the overall Type I error. An example is following a significant ANOVA result with pairwise comparisons among all treatment groups to identify which pairs differ, even if no specific pairs were hypothesized initially; without correction, this could lead to spurious findings.

The fundamental distinction between these approaches lies in their impact on error control and inferential validity: a priori tests preserve the designated α level per comparison because they are constrained by design, whereas post hoc tests demand conservative adjustments to maintain an acceptable FWER across the exploratory family of tests. Philosophically, a priori planning aligns with the principle of falsification in scientific inquiry, where pre-stated hypotheses are rigorously tested to avoid confirmation bias, while post hoc methods are better suited for hypothesis generation rather than definitive confirmation, as their exploratory nature can inadvertently capitalize on chance findings.
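To make the a priori case concrete, the sketch below (with hypothetical dose-response data) tests a single planned linear-trend contrast across three ordered dose groups. Because this is the one comparison specified in advance, it is tested at the nominal α with no multiplicity adjustment:

```python
# A minimal sketch (hypothetical data) of a planned linear-trend contrast
# across three ordered dose groups, using the pooled within-group variance.
import numpy as np
from scipy import stats

low  = np.array([5.1, 5.4, 4.9, 5.2])
mid  = np.array([5.9, 6.1, 5.8, 6.0])
high = np.array([6.8, 7.1, 6.9, 7.2])
groups = [low, mid, high]
coeffs = np.array([-1.0, 0.0, 1.0])      # orthogonal linear-trend coefficients

# Pooled within-group variance (MS_within) and its degrees of freedom
df_within = sum(len(g) - 1 for g in groups)
ms_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / df_within

L = sum(c * g.mean() for c, g in zip(coeffs, groups))          # contrast value
se = np.sqrt(ms_within * sum(c**2 / len(g) for c, g in zip(coeffs, groups)))
t = L / se
p = 2 * stats.t.sf(abs(t), df_within)
print(f"linear trend: t({df_within}) = {t:.2f}, p = {p:.4f}")
```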

Types of Post Hoc Analysis

Pairwise Comparisons

Pairwise comparisons represent the most fundamental form of post hoc analysis, involving the examination of differences between every possible pair of group means after an initial omnibus test, such as ANOVA, has indicated overall significance among multiple groups. This approach allows researchers to pinpoint which specific groups differ from one another, providing targeted insights into the nature of the observed effects.

These comparisons are particularly common in balanced experimental designs where groups have equal sample sizes, facilitating straightforward computation and interpretation. They typically assume that the data are normally distributed within each group and that variances are homogeneous across groups, ensuring the validity of the underlying statistical inferences.

The process begins with calculating the mean difference for each pair using independent t-tests, followed by the application of a multiplicity correction—such as adjustments to p-values or critical values—to control the inflated risk of Type I errors from multiple testing. For $k$ groups, this results in $k(k-1)/2$ pairwise tests, which grows quadratically and underscores the need for such corrections.

A practical example occurs in a clinical trial evaluating four different diets for weight loss effectiveness. After ANOVA reveals a significant overall difference in mean weight loss across the diets (F(3, 196) = 5.67, p < 0.01), pairwise comparisons might show that the low-carbohydrate diet significantly outperforms the standard diet (mean difference = 3.2 kg, adjusted p = 0.02), while no other pairs differ meaningfully. This isolates the superior intervention without overinterpreting the broad ANOVA result.

The primary limitation of pairwise comparisons lies in their quadratic increase in the number of tests as the number of groups rises—for instance, five groups require 10 comparisons, amplifying the multiple comparisons problem and potentially reducing statistical power unless robust error-rate adjustments are employed. This heightens the overall experiment-wise error rate if uncorrected, emphasizing the importance of proceeding only after omnibus significance.
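The sketch below mirrors the diet example with invented data: it enumerates all $k(k-1)/2$ pairs, runs an independent t-test per pair, and applies a Bonferroni adjustment (standing in for whichever correction a real analysis would choose):

```python
# A minimal sketch: the number of pairwise tests grows as k(k-1)/2, and each
# pairwise t-test p-value is Bonferroni-adjusted. All data are hypothetical.
from itertools import combinations
from scipy import stats

diets = {                                    # hypothetical weight-loss samples
    "low_carb": [4.1, 3.8, 4.5, 3.9],
    "standard": [1.2, 1.5, 0.9, 1.1],
    "low_fat":  [2.2, 2.6, 2.0, 2.4],
    "fasting":  [2.8, 3.1, 2.5, 2.9],
}
pairs = list(combinations(diets, 2))
m = len(pairs)                               # k(k-1)/2 = 4*3/2 = 6 tests
print(f"{len(diets)} groups -> {m} pairwise comparisons")

for g1, g2 in pairs:
    t, p = stats.ttest_ind(diets[g1], diets[g2])
    p_adj = min(1.0, p * m)                  # Bonferroni adjustment
    print(f"{g1} vs {g2}: adjusted p = {p_adj:.4f}")
```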

Complex Exploratory Analyses

Complex exploratory analyses extend beyond simple pairwise comparisons to uncover nuanced patterns in data, such as trends and interactions, particularly when initial omnibus tests like ANOVA indicate overall significance but require deeper dissection. These analyses are employed in scenarios where group means exhibit ordered or interactive relationships, allowing researchers to probe underlying structures without prior specification of all comparisons. For instance, in factorial designs, they facilitate the examination of how effects vary across levels of multiple factors.

Key types include trend analysis, which tests for linear or quadratic patterns across ordered categories using orthogonal polynomial contrasts; simple effects analysis, which evaluates the influence of one factor within specific levels of another; interaction probing, which assesses moderator effects by decomposing significant interactions; and restricted contrasts, which focus on theory-guided subsets of comparisons rather than all possible pairs. Trend analysis, for example, applies coefficients like those for linear (e.g., -1, 0, 1) or quadratic (-1, 2, -1) trends to detect monotonic or curvilinear relationships. Simple effects involve running focused tests, such as one-way ANOVAs, at each level of a moderator to clarify interaction patterns. Interaction probing further explores how variables jointly influence outcomes, while restricted contrasts limit the family of tests to hypothesized subsets, enhancing power for targeted inquiries.

These methods are particularly useful when pairwise comparisons alone fail to capture complexity, such as in probing interactions from factorial ANOVA where overall effects mask subgroup variations. By decomposing the omnibus effect into components like main effects within subgroups or trend components, researchers gain insights into data structures that inform model refinement. In education research, for example, post hoc trend analysis on performance data across age groups can reveal non-linear learning curves, such as a quadratic pattern where gains accelerate in middle childhood before plateauing in adolescence, as observed in studies of cognitive skill acquisition.

A distinctive aspect of complex exploratory analyses is their role in hypothesis generation for subsequent confirmatory studies, provided results are explicitly labeled as exploratory to distinguish them from pre-planned tests and mitigate overinterpretation risks. The process typically begins after a significant overall test, involving the specification of contrasts or subgroup models to partition variance into interpretable components, followed by evaluation of their significance without full a priori planning, though adjustments for multiplicity may be applied depending on the exploratory scope.
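As an illustration of the simple-effects idea, the sketch below (with invented data for a hypothetical 2x3 factorial design) runs a one-way ANOVA over the levels of factor B within each level of factor A. Note this is a simplification: a rigorous simple-effects analysis would typically reuse the pooled error term from the full factorial model rather than per-level error terms.

```python
# A minimal sketch (hypothetical data) of a simple-effects analysis after a
# significant interaction in a 2x3 factorial design: a one-way ANOVA over
# factor B is run separately within each level of factor A.
from scipy import stats

cells = {  # hypothetical outcome scores, keyed by (factor_A, factor_B)
    ("a1", "b1"): [2.0, 2.3, 1.9],
    ("a1", "b2"): [2.1, 2.4, 2.2],
    ("a1", "b3"): [2.0, 1.8, 2.1],
    ("a2", "b1"): [2.2, 2.0, 2.1],
    ("a2", "b2"): [3.5, 3.8, 3.6],
    ("a2", "b3"): [4.9, 5.2, 5.0],
}
for a_level in ("a1", "a2"):
    b_samples = [cells[(a_level, b)] for b in ("b1", "b2", "b3")]
    f_stat, p = stats.f_oneway(*b_samples)
    print(f"simple effect of B at {a_level}: F = {f_stat:.2f}, p = {p:.4f}")
```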

Common Post Hoc Tests

Tukey's Honestly Significant Difference Test

Tukey's Honestly Significant Difference (HSD) test is a single-step post-hoc procedure designed for performing all pairwise comparisons among group means after a significant one-way analysis of variance (ANOVA), while controlling the family-wise error rate (FWER) at the desired significance level $\alpha$. Developed by John Tukey, the method relies on the studentized range distribution to determine critical values, ensuring that the probability of at least one Type I error across all comparisons does not exceed $\alpha$. It is particularly suited for balanced experimental designs where the focus is on identifying which specific pairs of means differ significantly.

The test assumes that the data are normally distributed within each group, that variances are homogeneous across groups, and that sample sizes are equal (balanced design), making it most appropriate following a one-way ANOVA with these conditions met. Violations of normality or homogeneity can be assessed via residual plots or formal tests like Levene's, though the procedure is robust to moderate departures. For unequal sample sizes, an extension known as the Tukey-Kramer method adjusts the standard error for each pairwise comparison, though this renders the test more conservative.

The core of the test involves computing a critical difference threshold, or HSD, using the formula:

$$\text{HSD} = q_{\alpha, k, \nu} \sqrt{\frac{\text{MSE}}{n}},$$

where $q_{\alpha, k, \nu}$ is the critical value of the studentized range distribution for $k$ groups and $\nu$ error degrees of freedom, MSE is the mean square error from the ANOVA, and $n$ is the sample size per group.
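The sketch below evaluates this threshold with SciPy's studentized range distribution (available in SciPy 1.7 and later); the group count, per-group sample size, and MSE are hypothetical stand-ins for values taken from a real ANOVA table:

```python
# A minimal sketch of the HSD threshold from the formula above, under
# assumed values: k = 4 groups, n = 10 per group, hypothetical MSE.
import numpy as np
from scipy.stats import studentized_range

alpha, k, n = 0.05, 4, 10
nu = k * (n - 1)                     # error degrees of freedom, here 36
mse = 2.5                            # hypothetical mean squared error

q_crit = studentized_range.ppf(1 - alpha, k, nu)   # q_{alpha, k, nu}
hsd = q_crit * np.sqrt(mse / n)
print(f"any pair of means differing by more than {hsd:.2f} is significant")
```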