All models are wrong
View on Wikipedia"All models are wrong" is a common aphorism in statistics. It is often expanded as "All models are wrong, but some are useful". The aphorism acknowledges that statistical models always fall short of the complexities of reality but can still be useful nonetheless. The aphorism is generally attributed to George E. P. Box, a British statistician, although the underlying concept predates Box's writings.
History
The phrase "all models are wrong" is attributed[1] to George Box, who used it in a 1976 paper to refer to the limitations of models, arguing that while no model is ever completely accurate, simpler models can still provide valuable insights if applied judiciously.[2]: 792
In their 1983 book on generalized linear models, Peter McCullagh and John Nelder stated that while modeling in science is a creative process, some models are better than others, even though none can claim eternal truth.[3][4] In 1996, an Applied Statistician's Creed was proposed by M.R. Nester, which incorporated the aphorism as a central tenet.[1]
The longer form appears in a 1987 book by Box and Norman Draper, in a section titled "The Use of Approximating Functions":
"The fact that the polynomial is an approximation does not necessarily detract from its usefulness because all models are approximations. Essentially, all models are wrong, but some are useful."[5]: 424
Discussions
Box used the aphorism again in 1979, where he expanded on the idea by discussing how models serve as useful approximations, despite failing to perfectly describe empirical phenomena.[6] He reiterated this sentiment in his later works, where he discussed how models should be judged based on their utility rather than their absolute correctness.[7][8]
David Cox, in a 1995 commentary, argued that stating all models are wrong is unhelpful, as models by their nature simplify reality. He emphasized that statistical models, like other scientific models, aim to capture important aspects of systems through idealized representations.[9]
In their 2002 book on statistical model selection, Burnham and Anderson reiterated Box's statement, noting that while models are simplifications of reality, they vary in usefulness, from highly useful to essentially useless.[10]
J. Michael Steele used the analogy of city maps to explain that models, like maps, serve practical purposes despite their limitations, emphasizing that certain models, though simplified, are not necessarily wrong.[11] In response, Andrew Gelman acknowledged Steele's point but defended the usefulness of the aphorism, particularly in drawing attention to the inherent imperfections of models.[12]
Philosopher Peter Truran, in a 2013 essay, discussed how seemingly incompatible models can make accurate predictions by representing different aspects of the same phenomenon, illustrating the point with an example of two observers viewing a cylindrical object from different angles.[13]
In 2014, David Hand reiterated that models are meant to aid in understanding or decision-making about the real world, a point emphasized by Box's famous remark.[14]
See also
- Anscombe's quartet – Four data sets with the same descriptive statistics, yet very different distributions
- Bonini's paradox – As a model of a complex system becomes more complete, it becomes less understandable
- Lie-to-children – Teaching a complex subject via simpler models
- Map–territory relation – Relationship between an object and a representation of that object
- Pragmatism – Philosophical tradition
- Reification (fallacy) – Fallacy of treating an abstraction as if it were a real thing
- Scientific modelling – Scientific activity that produces models
- Statistical model – Type of mathematical model
- Statistical model validation – Evaluating whether a chosen statistical model is appropriate or not
- Verisimilitude – Resemblance to reality
Notes
1. Nester, M. R. (1996), "An applied statistician's creed" (PDF), Journal of the Royal Statistical Society, Series C, 45 (4): 401–410, doi:10.2307/2986064, JSTOR 2986064.
2. Box, George E. P. (1976), "Science and statistics" (PDF), Journal of the American Statistical Association, 71 (356): 791–799, doi:10.1080/01621459.1976.10480949.
3. McCullagh, P.; Nelder, J. A. (1983), Generalized Linear Models, Chapman & Hall, §1.1.4.
4. McCullagh, P.; Nelder, J. A. (1989), Generalized Linear Models (2nd ed.), Chapman & Hall, §1.1.4.
5. Box, George E. P.; Draper, Norman R. (1987), Empirical Model-Building and Response Surfaces, Wiley Series in Probability and Mathematical Statistics, New York: Wiley, ISBN 978-0-471-81033-9.
6. Box, G. E. P. (1979), "Robustness in the strategy of scientific model building", in Launer, R. L.; Wilkinson, G. N. (eds.), Robustness in Statistics, Academic Press, pp. 201–236, doi:10.1016/B978-0-12-438150-6.50018-2, ISBN 978-1-4832-6336-6.
7. Box, G. E. P.; Draper, N. R. (1987), Empirical Model-Building and Response Surfaces, John Wiley & Sons.
8. The relatedness of Shewhart's quotation with the aphorism "all models are wrong" is noted by Fricker & Woodall (2016).
9. Cox, D. R. (1995), "Comment on 'Model uncertainty, data mining and statistical inference'", Journal of the Royal Statistical Society, Series A, 158: 455–456.
10. Burnham, K. P.; Anderson, D. R. (2002), Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.), Springer-Verlag, §1.2.5.
11. Steele, J. M., "Models: Masterpieces and Lame Excuses".
12. Gelman, A. (12 June 2008), "Some thoughts on the saying, 'All models are wrong, but some are useful'".
13. Truran, P. (2013), "Models: Useful but Not True", Practical Applications of the Philosophy of Science, SpringerBriefs in Philosophy, Springer, pp. 61–67, doi:10.1007/978-3-319-00452-5_10, ISBN 978-3-319-00451-8.
14. Hand, D. J. (2014), "Wonderful examples, but let's not close our eyes", Statistical Science, 29: 98–100, arXiv:1405.4986, doi:10.1214/13-STS446.
References
- Ashby, N. (2002), "Relativity and the Global Positioning System" (PDF), Physics Today, 55 (5): 41–47, Bibcode:2002PhT....55e..41A, doi:10.1063/1.1485583.
- Fricker, R. D., Jr.; Woodall, W. H. (2016), "Play it again, and again, Sam", Significance, 13 (4): 46, doi:10.1111/j.1740-9713.2016.00944.x.
- Valéry, Paul (1970), Collected Works of Paul Valéry, Volume 14—Analects, translated by Stuart Gilbert, Princeton University Press.
- Vankat, J. L. (2013), Vegetation Dynamics on the Mountains and Plateaus of the American Southwest, Springer.
- Wolfson, M. C.; Murphy, B. B. (April 1998), "New views on inequality trends" (PDF), Monthly Labor Review: 3–23.
Further reading
- Anderson, C. (23 June 2008), "The end of theory", Wired
- Box, G. E. P. (1999), "Statistics as a catalyst to learning by scientific method Part II—A discussion", Journal of Quality Technology, 31: 16–29, doi:10.1080/00224065.1999.11979890
- Enderling, H.; Wolkenhauer, O. (2021), "Are all models wrong?", Computational and Systems Oncology, 1 (1) e1008, doi:10.1002/cso2.1008, PMC 7880041, PMID 33585835
- Saltelli, A.; Funtowicz, S. (Winter 2014), "When all models are wrong", Issues in Science and Technology, 30
All models are wrong
Introduction
The Aphorism
The aphorism "All models are wrong, but some are useful" is a foundational statement in statistical philosophy, commonly attributed to the British statistician George E. P. Box. A related phrasing, "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful," appears on page 424 of the 1987 book Empirical Model-Building and Response Surfaces, co-authored by Box and Norman R. Draper.[7] This succinct expression captures the inherent limitations of mathematical and statistical models while highlighting their potential value in practical applications.[8] George E. P. Box (1919–2013) was a pioneering statistician whose work profoundly influenced experimental design, time-series analysis, and industrial quality control.[9] He is particularly renowned for developing response surface methodology in collaboration with K. B. Wilson in 1951, a technique for optimizing processes through sequential experimentation, and for co-authoring the influential Box-Jenkins methodology for ARIMA models in 1970. Box's career spanned academia and industry, including key roles at the University of Wisconsin-Madison and contributions to wartime chemical engineering efforts during World War II.[9] The core idea behind the aphorism traces to Box's 1976 paper "Science and Statistics," published in the Journal of the American Statistical Association, where he stated, "Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration."[10] In this work, Box aimed to caution researchers in empirical sciences against over-literal interpretations of models, urging instead a focus on their approximate nature and the importance of iterative refinement to address key discrepancies with reality.[10] This perspective underscores that models serve as tools for insight rather than perfect representations, a notion that resonates with earlier philosophical distinctions like Alfred Korzybski's 1933 assertion that "the map is not the territory."Core Principle
The core principle underlying the aphorism "all models are wrong, but some are useful" posits that models serve as simplified abstractions of complex reality, capturing key patterns while inevitably excluding numerous details and nuances. In statistical and scientific contexts, a model is defined as a mathematical or conceptual representation that approximates the relationships within observed data, relying on assumptions to make the system tractable for analysis and prediction.[11] These simplifications—such as assuming specific distributions or functional forms—render every model inherently inaccurate, as no representation can fully encapsulate the infinite intricacies of the real world.[3]

Despite this fundamental wrongness, models derive value from their pragmatic utility in facilitating understanding, forecasting outcomes, or informing decisions, provided they align sufficiently with empirical evidence in targeted scenarios. The principle emphasizes that usefulness arises not from pursuing unattainable perfect truth, but from iterative refinement where models are tested, critiqued, and adjusted against real-world data to highlight significant discrepancies over minor ones.[3] For instance, linear regression exemplifies this by assuming a straight-line relationship between variables, which is often untrue in nonlinear phenomena, yet it effectively approximates many such relationships when the deviations are not extreme, enabling reliable predictions in fields like economics or biology.[12] (A brief simulation at the end of this section illustrates the point.)

This distinction between absolute truth—which remains elusive in multifaceted systems—and practical utility underscores the aphorism's philosophical essence: models should be evaluated by their capacity to yield actionable insights rather than their completeness. As articulated by statistician George E. P. Box, the focus must lie on models that, though flawed, advance scientific inquiry through robust, falsifiable approximations.
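To make the linear-regression point concrete, the following minimal sketch (Python with NumPy; the logarithmic data-generating function and noise level are illustrative assumptions, not drawn from any cited source) fits a straight line to mildly nonlinear data and reports how close the misspecified model's predictions remain over the sampled range:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: the true relationship is mildly nonlinear,
# so a straight line is "wrong" by construction.
x = np.linspace(0.0, 3.0, 200)
y = np.log1p(x) + rng.normal(scale=0.05, size=x.size)

# Fit the deliberately misspecified linear model.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# The residual error is small over the sampled range, so the wrong
# model still gives usable predictions there.
rmse = np.sqrt(np.mean((y - y_hat) ** 2))
print(f"y ~ {slope:.3f} * x + {intercept:.3f}, RMSE = {rmse:.3f}")
```

Extrapolating the fitted line far beyond the sampled range would expose the misspecification, which is precisely the aphorism's caveat about a model's intended domain.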
Historical Origins
Precedents Before Box
The intellectual precursors to the aphorism "all models are wrong" can be traced to early 20th-century discussions in philosophy, semantics, and statistics that underscored the inherent limitations of abstract representations in capturing complex realities. A foundational influence emerged from Alfred Korzybski's development of general semantics, where he introduced the dictum "the map is not the territory" to highlight the epistemological and semantic boundaries between linguistic or symbolic abstractions and the actual world they describe. In Science and Sanity (1933), Korzybski argued that all human knowledge operates through such "maps"—simplified structures that resemble but never fully encompass the "territory" of reality, leading to potential errors in scientific reasoning if the distinction is ignored. This principle emphasized that abstractions, while useful for navigation and prediction, impose structural similarities rather than identical replications, thereby establishing a cautionary framework for scientific modeling that prioritizes awareness of representational incompleteness.[13]

Building on semantic critiques, statistical thinkers like Walter Shewhart advanced ideas about approximations in applied science, particularly in quality management. In his 1939 monograph Statistical Method from the Viewpoint of Quality Control, Shewhart described statistical control charts as practical tools for approximating process variability in manufacturing, acknowledging that no mathematical framework could exhaustively characterize the dynamic, chance-influenced nature of physical systems. He posited that such methods enable rational prediction and control but are inherently limited by the unpredictability of assignable causes versus common variation, framing statistical models as operational approximations rather than definitive truths. This perspective influenced quality engineering by promoting models that balance utility with an explicit recognition of their partial coverage of real-world phenomena.[14]

In the realm of computing and simulations, John von Neumann contributed a related insight during the mid-1940s, amid early efforts to model complex systems numerically. In 1947, von Neumann remarked that "truth is much too complicated to allow anything but approximations," reflecting on the challenges of exact modeling in computational contexts like Monte Carlo simulations and automata theory. This observation arose from his work on electronic computing at the Institute for Advanced Study, where he recognized that the intricate dynamics of physical and biological processes defied precise mathematical replication, necessitating iterative, approximate methods for practical advancement. His view positioned computational models as indispensable yet flawed instruments for exploring otherwise intractable realities.[15]

Complementing these developments, Bertrand Russell's early 20th-century philosophy of logic and science provided broader conceptual groundwork through his emphasis on logical constructions as approximations to empirical reality. In works such as Our Knowledge of the External World (1914), Russell advocated for logical atomism, wherein complex phenomena are analyzed into simpler propositional structures that approximate the underlying order of the world, but he cautioned that such analyses are mediated by incomplete sense data and cannot fully mirror unobservable essences. His approach in The Analysis of Matter (1927) further elaborated how scientific theories construct relational models that capture structural analogies to reality rather than its intrinsic nature, influencing positivist traditions by treating logical approximations as the closest achievable proxies in philosophy and science. These ideas reinforced the notion that all representational systems, logical or otherwise, entail necessary simplifications.

George Box's Formulation
George Edward Pelham Box, a prominent British statistician, first articulated the core idea behind the aphorism in his 1976 presidential address to the American Statistical Association, published in the Journal of the American Statistical Association. In the paper "Science and Statistics," Box stated: "Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena." This formulation emphasized parsimony in modeling, arguing that scientists should prioritize simple, adequate approximations over complex but still flawed representations to uncover new insights about reality.[3]

Box's background in experimental design profoundly shaped this perspective. During World War II, he contributed to chemical engineering statistics at Imperial Chemical Industries, where he developed methods for optimizing industrial processes. His seminal 1951 collaboration with K. B. Wilson introduced response surface methodology (RSM), a technique for modeling and analyzing problems in which a response of interest is influenced by several variables, enabling efficient experimental designs to approximate optimal conditions. Later, Box advanced Bayesian inference through works like his 1973 book with George C. Tiao, applying probabilistic approaches to model updating and uncertainty quantification in statistical analysis.

Box expanded the aphorism in 1979, in his paper "Robustness in the Strategy of Scientific Model Building", where he famously declared: "All models are wrong but some are useful." In this context, Box illustrated the point using the ideal gas law (PV = nRT) as an example of a model that, despite its inaccuracies in capturing real gas behavior, provides valuable predictions and guides further inquiry into molecular dynamics. This work popularized the phrase within statistical literature, highlighting how imperfect models can still drive practical advancements when they align with empirical data.[16] (A numerical comparison at the end of this section illustrates the gap between the ideal gas law and a more detailed model.)

Within response surface methodology, Box employed the aphorism to advocate for iterative model refinement, a process where initial quadratic approximations are fitted to experimental data, evaluated for adequacy, and sequentially improved through additional designs to better capture system responses. This sequential approach, rooted in his 1951 RSM framework, underscores the aphorism's role in promoting adaptive, usefulness-focused modeling over static perfectionism. In his 1987 book with Draper, Empirical Model-Building and Response Surfaces, Box further elaborated on models as parsimonious simplifications of complex realities, akin to caricatures that exaggerate key features while omitting minor details to facilitate understanding and prediction. Box's ideas echoed earlier sentiments, such as John von Neumann's remark that "with four parameters I can fit an elephant, and with five I can make him wiggle his trunk," which similarly critiqued overparameterized models.
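Box's ideal gas example can be made quantitative. A small worked comparison (Python; the van der Waals constants for CO2 and the chosen state are standard textbook values, used here purely for illustration) shows the "wrong" ideal gas law agreeing with a more detailed model to within about one percent at moderate density:

```python
# Ideal gas law vs. van der Waals: the "wrong" ideal model is close enough
# for many practical purposes. Illustrative constants for CO2:
# a = 3.640 L^2*bar/mol^2, b = 0.04267 L/mol; R = 0.083145 L*bar/(mol*K).
R = 0.083145
a, b = 3.640, 0.04267

n, T, V = 1.0, 300.0, 10.0   # 1 mol of CO2 at 300 K in a 10 L vessel

p_ideal = n * R * T / V                                  # PV = nRT
p_vdw = n * R * T / (V - n * b) - a * n**2 / V**2        # van der Waals

print(f"ideal gas:     {p_ideal:.4f} bar")
print(f"van der Waals: {p_vdw:.4f} bar")  # differs by roughly 1% here
```

At higher densities or lower temperatures the discrepancy grows, which mirrors Box's point: the simpler model is useful within a domain, not universally.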
Conceptual Foundations
Nature of Models
In statistical and scientific modeling, a fundamental representation of the relationship between observed outcomes and predictors is given by the equation y = f(x; θ) + ε, where y is the response variable, x represents the input features, θ denotes the model parameters, f is the functional form capturing the systematic relationship, and ε is the error term accounting for unmodeled variability or noise, often assumed to have mean zero. This formulation underscores that all models are inherently approximations, as ε encapsulates aspects of reality that the model cannot fully capture, such as random fluctuations or omitted complexities.

Models are broadly categorized into parametric and non-parametric types based on their structural assumptions. Parametric models assume a fixed functional form with a finite number of parameters θ, such as a Gaussian distribution for error terms or linear relationships in regression, which allows for efficient estimation but risks misspecification if the assumed form does not match the data-generating process. In contrast, non-parametric models avoid rigid assumptions about the form of f, instead estimating it flexibly from the data, such as through kernel smoothing or splines; however, they require larger sample sizes to achieve reliable performance and can suffer from overfitting in high dimensions. These categories highlight the trade-off between simplicity and adaptability inherent in modeling.

The "wrongness" of models arises from several key sources of misspecification. Omitted variables occur when relevant predictors are excluded, leading to biased estimates of the included parameters, as the model attributes effects of the missing factors to the observed ones—a phenomenon known as omitted variable bias.[17] (A short simulation at the end of this subsection illustrates the effect.) Assumption violations, such as non-linearity in a linear model or non-normal errors where normality is presumed, distort inference and prediction accuracy by invalidating the theoretical basis for estimation.[18] In complex systems, scaling issues further exacerbate flaws, where models calibrated at one scale (e.g., local interactions) fail to capture emergent behaviors or interactions at larger scales, due to nonlinear dynamics or hierarchical structures that amplify small discrepancies.[19]

A classic illustration of these limitations is Newtonian physics, which serves as a local approximation to general relativity under conditions of low velocities and weak gravitational fields but breaks down at relativistic scales, such as near the speed of light or in strong gravity, where spacetime curvature introduces unmodeled effects like time dilation.[20] This example reminds us, as articulated in George Box's aphorism, that models are simplifications destined to be wrong in some respects yet potentially useful within their intended domains.
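As a hedged illustration of omitted variable bias, the following sketch (Python with NumPy; the coefficients and correlation structure are arbitrary illustrative choices) simulates data from a two-predictor process and shows how dropping one correlated predictor inflates the remaining coefficient:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Data-generating process: y depends on two correlated predictors.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # x2 correlated with x1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

# Correctly specified model: regress y on both predictors.
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

# Misspecified model: omit x2; its effect is absorbed into x1's coefficient.
X_short = np.column_stack([np.ones(n), x1])
beta_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

print("full model  b1 ~", round(beta_full[1], 2))   # close to the true 1.0
print("short model b1 ~", round(beta_short[1], 2))  # close to 1.0 + 2.0*0.8 = 2.6
```

The inflated coefficient in the short model is exactly the textbook bias formula at work: the true effect plus the omitted effect times the regression of x2 on x1.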
Criteria for Usefulness

The usefulness of a model lies in its ability to deliver practical value despite inevitable simplifications and errors, primarily through strong predictive power, interpretability, generalizability, and cost-effectiveness. Predictive power refers to the model's capacity to accurately forecast outcomes on unseen data, enabling reliable decision-making in real-world scenarios. Interpretability allows users to comprehend the relationships and factors driving predictions, fostering trust and actionable insights. Generalizability measures how well the model performs across diverse datasets or conditions, avoiding degradation in novel contexts. Cost-effectiveness evaluates the trade-off between the model's benefits and the resources required for development, training, and deployment, ensuring feasibility in resource-constrained environments.

A fundamental principle underlying model usefulness is the bias-variance tradeoff, which highlights the tension between model simplification—introducing bias through assumptions that deviate from true complexity—and overfitting, which increases variance by capturing noise in training data at the expense of broader applicability.[21] This tradeoff is formally captured in the decomposition of mean squared error (MSE), a common measure of predictive accuracy: MSE = Bias² + Variance + Irreducible error. Here, Bias² quantifies systematic errors from model assumptions, Variance reflects sensitivity to training data fluctuations, and the irreducible error represents inherent noise in the data that no model can eliminate.[21] Effective models minimize total MSE by balancing these components, prioritizing configurations that enhance overall utility without excessive complexity. (A simulation at the end of this section makes the decomposition concrete.)

For instance, in public policy decisions, simple linear models are frequently favored over intricate alternatives precisely because their straightforward coefficients and assumptions enhance interpretability, allowing policymakers to readily discern causal influences and build stakeholder confidence.
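The MSE decomposition can be estimated empirically by repeatedly refitting a model class on fresh training samples and examining its predictions at a fixed point. A minimal simulation sketch (Python with NumPy; the sine target, noise level, and polynomial degrees are illustrative assumptions) shows low-degree fits dominated by bias and high-degree fits dominated by variance:

```python
import numpy as np

rng = np.random.default_rng(2)

def true_f(x):
    return np.sin(2 * x)

x_test, sigma, n_train, n_reps = 1.0, 0.3, 30, 2000

for degree in (1, 3, 9):
    preds = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.uniform(0, 2, n_train)
        y = true_f(x) + rng.normal(scale=sigma, size=n_train)
        coefs = np.polyfit(x, y, degree)          # refit on fresh training data
        preds[r] = np.polyval(coefs, x_test)
    bias2 = (preds.mean() - true_f(x_test)) ** 2  # systematic error of the model class
    var = preds.var()                             # sensitivity to the training sample
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {var:.4f}")
```

The noise variance sigma² is the irreducible error floor: no choice of degree can push expected squared prediction error below it.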
Key Interpretations
In Statistics and Modeling
In statistical practice, the aphorism "all models are wrong" underscores the inherent approximations in modeling real-world phenomena, influencing approaches to model selection and validation by encouraging practitioners to prioritize usefulness over perfection. This perspective, originally formulated by George Box, reminds statisticians that no model fully captures the complexity of data-generating processes, prompting techniques that account for uncertainty and model inadequacy.[22]

A key interpretation appears in the development of generalized linear models (GLMs), where Peter McCullagh and John Nelder frame models explicitly as approximations rather than exact representations of truth.[23] In their foundational text, they emphasize the goal of identifying models that provide adequate fits for practical inference, particularly in handling non-normal response variables through link functions and variance structures. This view promotes rigorous diagnostic checks and iterative refinement to mitigate the inevitable discrepancies between model assumptions and data. McCullagh and Nelder stress that the approximate nature of models must always be acknowledged to avoid overconfidence in predictions or parameter estimates.

Building on this, Kenneth P. Burnham and David R. Anderson advocate for model averaging as a strategy to address the flaws in any single model, recognizing that model selection often favors one approximation at the expense of others, leading to biased inferences. In their information-theoretic framework, they recommend using Akaike's Information Criterion (AIC) to weigh multiple candidate models and compute weighted averages of predictions, thereby incorporating model uncertainty and reducing the risks associated with relying on a "best" but inevitably wrong model. This multimodel inference approach has become widely adopted in ecology and beyond, as it aligns with the aphorism by treating all models as provisional tools for approximating reality rather than definitive truths.[24] (The weighting computation is sketched at the end of this section.)

Mark R. Nester integrates the aphorism into his "creed for modelers," a set of guiding principles for applied statisticians that emphasizes humility in the face of model limitations. Presented as a manifesto against overreliance on rigid assumptions, Nester's creed lists "all models are wrong" as a core tenet, urging modelers to focus on robust estimation, sensitivity analyses, and communication of uncertainties rather than pursuing unattainable exactitude. This creed promotes validation techniques like cross-validation and residual analysis to reveal model inadequacies, fostering a practice-oriented mindset that values practical utility in decision-making.

John Michael Steele extends the discussion with an analogy to city maps, portraying statistical models as selective representations designed for specific purposes, such as a map that depicts a building as a simple square rather than its full architectural detail to serve navigation needs. In this view, models are not wrong in an absolute sense but are intentionally simplified to serve targeted inquiries, such as prediction or hypothesis testing; their "wrongness" arises only if mismatched to the intended use. Steele's analogy reinforces validation practices that assess model fit relative to the problem at hand, encouraging statisticians to choose and critique models based on their representational fidelity for practical tasks.[22]
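Burnham and Anderson's AIC-based averaging reduces to a short computation: each model's AIC difference from the best candidate is converted into a relative likelihood and normalized into a weight. A minimal sketch (Python with NumPy; the AIC scores and per-model predictions are hypothetical numbers for illustration):

```python
import numpy as np

def akaike_weights(aic_scores):
    """Relative support for each candidate model from its AIC score."""
    aic = np.asarray(aic_scores, dtype=float)
    delta = aic - aic.min()          # differences from the best model
    w = np.exp(-0.5 * delta)         # relative likelihood of each model
    return w / w.sum()               # normalize to Akaike weights

# Hypothetical AIC scores for three candidate models.
aic = [100.0, 101.2, 106.5]
w = akaike_weights(aic)
print(np.round(w, 3))                # roughly [0.63, 0.35, 0.02]

# Model-averaged prediction: weight each model's prediction by its support.
preds = np.array([2.9, 3.1, 4.0])    # hypothetical per-model predictions
print(round(float(w @ preds), 3))
```

No single candidate is treated as true; the averaged prediction hedges across all of them in proportion to their empirical support.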
The aphorism "all models are wrong" aligns closely with scientific anti-realism, particularly Bas van Fraassen's constructive empiricism, which posits that scientific models function as instruments for empirical adequacy rather than as literal depictions of unobservable reality. In constructive empiricism, the success of a theory or model is measured by its ability to accurately represent observable phenomena, without committing to the truth of theoretical entities or structures beyond what is empirically accessible. This view echoes the aphorism by emphasizing that models need not capture the "whole truth" to be valuable; instead, their utility lies in predictive power and empirical fit, acknowledging inherent approximations and idealizations that prevent perfect representation. The epistemological limits highlighted in the aphorism also draw from Alfred Korzybski's general semantics, where his dictum "the map is not the territory" underscores that abstractions and representations inevitably distort the complexity of reality. Korzybski argued in his foundational work that linguistic and symbolic models impose structural constraints on human understanding, leading to potential errors if treated as identical to the phenomena they describe. This influence revisits broader philosophical concerns about the boundaries of knowledge, reinforcing the aphorism's caution against conflating models with objective truth and promoting awareness of their abstract nature in scientific inquiry. In discussions of scientific inference, David J. Hand has elaborated on the aphorism to advocate a shift from mere prediction toward decision-making utility, arguing that models' value extends beyond explanatory accuracy to informing practical choices under uncertainty. Hand contends that while models cannot fully replicate reality, their role in science increasingly involves supporting robust decisions by highlighting probabilistic outcomes and risks, rather than aspiring to unattainable precision in forecasting. This perspective integrates the aphorism into pragmatic philosophy, where model assessment prioritizes actionable insights over idealized truth claims. Philosopher Peter Truran further illustrates the aphorism's implications by observing that even simplistic models, such as approximating non-cylindrical objects as cylinders in physical analyses, can disclose partial truths about underlying mechanisms despite their inaccuracies. Truran emphasizes that such models reveal aspects of reality by focusing on relevant features while abstracting away irrelevant details, thereby contributing to knowledge without pretending to completeness. This approach supports the philosophical acceptance of approximation as a necessary feature of scientific progress, where the "wrongness" of models does not negate their capacity to illuminate truths incrementally.Applications Across Fields
In Traditional Sciences
In physics, the aphorism underscores the role of approximate models in simulations that facilitate practical engineering applications, despite inherent inaccuracies. For instance, computational fluid dynamics (CFD) models approximate complex turbulent flows by simplifying small-scale eddies and using parameterization for unresolved phenomena, enabling designs for aircraft and vehicles that reduce the need for physical prototypes.[25] These models, such as those based on the Navier-Stokes equations, are iteratively refined through validation against experimental data to improve predictive accuracy for macroscopic behaviors, even as they fail to capture every molecular interaction.[26]

In engineering, George Box applied the principle through his development of response surface methodology (RSM) for optimizing chemical processes at Imperial Chemical Industries. RSM uses quadratic polynomial approximations to model relationships between input variables and process outputs, allowing engineers to identify optimal operating conditions despite the models' simplifications of nonlinear dynamics. This approach emphasizes iterative experimentation to refine models, balancing computational feasibility with actionable insights for industrial efficiency, as seen in yield maximization for chemical reactors.[27]

In biology, ecological models exemplify the aphorism as deliberate simplifications that inform conservation strategies, highlighting the value of iterative updates based on field data. The Lotka-Volterra equations, which describe idealized predator-prey dynamics through coupled differential equations, ignore factors like spatial heterogeneity and environmental stochasticity but provide foundational insights into population oscillations and stability thresholds.[28] Such models are refined over time—incorporating extensions for multiple species or density dependence—to guide planning, such as predicting extinction risks in island ecosystems and informing invasive species control efforts.[29]
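As a concrete illustration of the Lotka-Volterra simplification, the sketch below (Python with NumPy; the parameter values, initial populations, and forward-Euler integrator are illustrative choices, not calibrated to any real system) integrates the two coupled equations and exhibits the characteristic population oscillations:

```python
import numpy as np

# Classic Lotka-Volterra predator-prey model: a deliberately simplified
# pair of coupled ODEs, integrated here with a simple Euler scheme.
alpha, beta, delta, gamma = 1.0, 0.1, 0.075, 1.5
prey, pred = 10.0, 5.0
dt, steps = 0.001, 20_000

history = np.empty((steps, 2))
for t in range(steps):
    d_prey = alpha * prey - beta * prey * pred    # growth minus predation
    d_pred = delta * prey * pred - gamma * pred   # predation gain minus death
    prey += dt * d_prey
    pred += dt * d_pred
    history[t] = prey, pred

print(f"prey range:     {history[:, 0].min():.1f} to {history[:, 0].max():.1f}")
print(f"predator range: {history[:, 1].min():.1f} to {history[:, 1].max():.1f}")
```

Everything the paragraph lists as ignored (space, stochasticity, other species) is absent here, yet the cyclic boom-and-bust pattern the model does capture is exactly what makes it useful as a baseline for refinement.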
In Machine Learning and AI

In the domain of explainable AI (XAI), the aphorism inspires variants like "All models are wrong, but some are dangerous," which stress the perils of deploying opaque models in high-stakes applications without transparency mechanisms. This perspective is exemplified in the development of local interpretable model-agnostic explanations (LIME), where Ribeiro et al. (2016) argue that understanding model predictions is essential to mitigate risks from erroneous or biased outputs, enabling users to probe and trust black-box classifiers. Such approaches have become foundational in XAI, promoting surrogate models that approximate complex decisions locally to reveal potential flaws.

The limitations of large language models (LLMs) further illustrate the aphorism, as claims of emergent abilities—sudden performance jumps in tasks like arithmetic or commonsense reasoning—often prove misleading due to inherent model flaws and evaluation artifacts. Schaeffer et al. (2023) show that these "emergences" largely vanish when performance is scored with linear or continuous metrics rather than discontinuous ones, attributing the phenomenon to the choice of measurement rather than genuine capabilities, thus highlighting how LLMs remain fundamentally approximate and prone to brittleness. This underscores the need for cautious interpretation in AI scaling, where model wrongs manifest as unreliable generalizations.

Physics-enhanced machine learning leverages the idea to advocate hybrid models that incorporate domain knowledge, acknowledging pure data-driven approaches as wrong in data-sparse regimes but improvable through physical constraints. In a position paper, Cicirello (2024) invokes the aphorism to justify physics-informed neural networks for dynamical systems, where embedding governing equations reduces extrapolation errors and enhances utility over standalone ML, as demonstrated in simulations of fluid flows and structural mechanics.[30] This integration yields more robust predictions, bridging the gap between approximate models and real-world applicability.

Assessing variable importance in ML also draws on the principle by shifting focus from single flawed models to entire classes, yielding stable insights despite individual inaccuracies. Fisher et al. (2019) introduce model class reliance, a method that aggregates reliance across prediction models to quantify feature effects robustly, even for black-box or misspecified systems, as validated on datasets like UCI benchmarks showing reduced variance in importance rankings.[31] Recent extensions, such as targeted learning frameworks in 2024, build on this for causal inference in AI. (A sketch of the underlying permutation idea appears at the end of this section.)

Contemporary developments in trustworthy AI and arbitrariness in ML continue to reference the aphorism, emphasizing model-agnostic limits and selection strategies. For instance, works on distribution-free prediction bounds establish fundamental hardness results, confirming that detecting model wrongs is challenging without assumptions, informing safer AI deployment (Müller, 2025).[32] Similarly, explorations of arbitrariness reveal how random seeds or architectures lead to divergent outcomes, advocating ensemble methods for reliability in trustworthy systems (Ganesh, 2025).[33] These arXiv preprints from 2023–2025 highlight ongoing efforts to make "wrong" models more predictably useful.
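The permutation idea underlying model reliance can be sketched in a few lines: shuffle one feature's column to break its link with the outcome, and measure how much a fitted model's error grows. This is a simplified single-model version (Python with NumPy; the data, the stand-in "black-box" predictor, and the function name are hypothetical), not Fisher et al.'s full model class reliance, which aggregates the same quantity across a class of near-optimal models:

```python
import numpy as np

def model_reliance(predict, X, y, feature, n_shuffles=50, seed=0):
    """Permutation-style importance: ratio of shuffled to baseline error."""
    rng = np.random.default_rng(seed)
    base_err = np.mean((y - predict(X)) ** 2)
    errs = []
    for _ in range(n_shuffles):
        Xp = X.copy()
        rng.shuffle(Xp[:, feature])          # permute one column in place
        errs.append(np.mean((y - predict(Xp)) ** 2))
    return np.mean(errs) / base_err          # ratio > 1 means the feature matters

# Hypothetical black-box model: any callable mapping X -> predictions works.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
y = 3 * X[:, 0] + rng.normal(size=500)       # only feature 0 is relevant
predict = lambda A: 3 * A[:, 0]              # stand-in for a fitted model

print(round(model_reliance(predict, X, y, feature=0), 2))  # well above 1
print(round(model_reliance(predict, X, y, feature=1), 2))  # approximately 1
```

Because the measure needs only a prediction function, it applies unchanged to models whose internals are opaque, which is what makes it attractive when every individual model is assumed wrong.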
Criticisms and Debates
Primary Critiques
One prominent critique of the aphorism "all models are wrong" comes from statistician David Cox, who in 1995 argued that the statement is unhelpful because the very concept of a model inherently involves simplification and approximation to aid understanding of complex phenomena, rather than aiming for exact replication of reality.[34] Cox emphasized that such declarations overlook the practical roles models play in hypothesis testing and inference, where their utility derives from targeted approximations rather than absolute accuracy.[34]

Critics have also pointed to an overemphasis on the "wrongness" of models, which can inadvertently discourage rigorous validation and improvement efforts. In the context of artificial intelligence, particularly large language models (LLMs), analyses of LLM risks highlight how inherent flaws in training data—such as embedded biases and incomplete representations of language—can propagate discriminatory outputs, underscoring the need for proactive mitigation of ethical issues like bias amplification and societal harms.[35]

Philosophical discussions highlight a tension between instrumentalist views, which emphasize models as useful tools for prediction, and scientific realism, which suggests that refined models can approximate underlying truths about the world.

Responses and Evolutions
In response to the inherent limitations of models, George E. P. Box emphasized in 1987 the value of iterative refinement in empirical model-building processes to progressively enhance their practical utility, viewing model development as a cyclical endeavor involving conjecture, design, experimentation, and verification. This approach counters the aphorism's acknowledgment of imperfection by focusing on incremental improvements that make models increasingly useful for prediction and decision-making, even if they never achieve perfect accuracy.[1]

Building on this, Kenneth P. Burnham advanced the discourse in 2002 by promoting multi-model inference as a robust alternative to reliance on a single "best" model, which often amplifies flaws due to selection bias and uncertainty. Through information-theoretic criteria like Akaike's Information Criterion (AIC), Burnham advocated averaging predictions across a set of candidate models weighted by their relative support from the data, thereby mitigating the risks of overconfidence in any one imperfect representation and improving overall inference reliability in statistical applications.[36]

In the domain of artificial intelligence, recent position papers have extended these ideas to explainable AI (XAI), proposing "explanation agreement" as a mechanism to bolster robustness amid model imperfections. A 2024 framework, EXAGREE, formalizes methods to quantify and resolve disagreements in feature importance rankings across multiple explainers, enabling the identification of stable insights that persist despite individual model errors and thus enhancing trust in AI decisions for high-stakes scenarios like healthcare diagnostics.[37] (A toy measure of such agreement is sketched at the end of this section.)

Ongoing debates in 2025 continue to explore formalizations of model trustworthiness under imperfections, with arXiv preprints surveying trustworthiness dimensions—such as robustness, fairness, and explainability—in large language models (LLMs) and proposing metrics to quantify reliability when perfect fidelity to reality is unattainable. These works underscore the aphorism's relevance by developing protocols for uncertainty quantification and conflict resolution in ensemble systems, ensuring that even flawed models contribute to verifiable, safe AI deployments.[38]
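As a toy proxy for "explanation agreement" (not EXAGREE's actual method), one can compute a rank correlation between the feature-importance vectors produced by two explainers. The sketch below (Python with NumPy; the importance scores are hypothetical) uses a Spearman correlation of ranks:

```python
import numpy as np

def rank_agreement(imp_a, imp_b):
    """Spearman rank correlation between two feature-importance vectors:
    a simple proxy for agreement between two explainers."""
    ra = np.argsort(np.argsort(imp_a)).astype(float)  # ranks of vector a
    rb = np.argsort(np.argsort(imp_b)).astype(float)  # ranks of vector b
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Hypothetical importance scores from two different explainers.
lime_like = np.array([0.42, 0.31, 0.15, 0.08, 0.04])
shap_like = np.array([0.40, 0.20, 0.25, 0.10, 0.05])
print(round(rank_agreement(lime_like, shap_like), 2))  # 0.9: mostly consistent
```

High agreement across explainers suggests an insight that survives the flaws of any one model or explanation method; low agreement flags a conclusion that should not be trusted on a single model's say-so.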
References
- https://en.wikiquote.org/wiki/George_E._P._Box
