List of statistics articles
0–9
- 1.96
- 2SLS (two-stage least squares) – redirects to instrumental variable
- 3SLS – see three-stage least squares
- 68–95–99.7 rule
- 100-year flood
A
- A priori probability
- Abductive reasoning
- Absolute deviation
- Absolute risk reduction
- Absorbing Markov chain
- ABX test
- Accelerated failure time model
- Acceptable quality limit
- Acceptance sampling
- Accidental sampling
- Accuracy and precision
- Accuracy paradox
- Acquiescence bias
- Actuarial science
- Adapted process
- Adaptive estimator
- Additive Markov chain
- Additive model
- Additive smoothing
- Additive white Gaussian noise
- Adjusted Rand index – see Rand index (subsection)
- ADMB – software
- Admissible decision rule
- Age adjustment
- Age-standardized mortality rate
- Age stratification
- Aggregate data
- Aggregate pattern
- Akaike information criterion
- Algebra of random variables
- Algebraic statistics
- Algorithmic inference
- Algorithms for calculating variance
- All models are wrong
- All-pairs testing
- Allan variance
- Alignments of random points
- Almost surely
- Alpha beta filter
- Alternative hypothesis
- Analyse-it – software
- Analysis of categorical data
- Analysis of covariance
- Analysis of molecular variance
- Analysis of rhythmic variance
- Analysis of variance
- Analytic and enumerative statistical studies
- Ancestral graph
- Anchor test
- Ancillary statistic
- ANCOVA – redirects to Analysis of covariance
- Anderson–Darling test
- ANOVA
- ANOVA on ranks
- ANOVA–simultaneous component analysis
- Anomaly detection
- Anomaly time series
- Anscombe transform
- Anscombe's quartet
- Antecedent variable
- Antithetic variates
- Approximate Bayesian computation
- Approximate entropy
- Arcsine distribution
- Area chart
- Area compatibility factor
- ARGUS distribution
- Arithmetic mean
- Armitage–Doll multistage model of carcinogenesis
- Arrival theorem
- Artificial neural network
- Ascertainment bias
- ASReml – software
- Association (statistics)
- Association mapping
- Association scheme
- Assumed mean
- Astrostatistics
- Asymptotic distribution
- Asymptotic equipartition property (information theory)
- Asymptotic normality – redirects to Asymptotic distribution
- Asymptotic relative efficiency – redirects to Efficiency (statistics)
- Asymptotic theory (statistics)
- Atkinson index
- Attack rate
- Augmented Dickey–Fuller test
- Aumann's agreement theorem
- Autocorrelation
- Autocorrelation plot – redirects to Correlogram
- Autocovariance
- Autoregressive conditional duration
- Autoregressive conditional heteroskedasticity
- Autoregressive fractionally integrated moving average
- Autoregressive integrated moving average
- Autoregressive model
- Autoregressive–moving-average model
- Auxiliary particle filter
- Average
- Average treatment effect
- Averaged one-dependence estimators
- Azuma's inequality
B
- BA model – model for a random network
- Backfitting algorithm
- Balance equation
- Balanced incomplete block design – redirects to Block design
- Balanced repeated replication
- Balding–Nichols model
- Banburismus – related to Bayesian networks
- Bangdiwala's B
- Bapat–Beg theorem
- Bar chart
- Barabási–Albert model
- Barber–Johnson diagram
- Barnard's test
- Barnardisation
- Barnes interpolation
- Bartlett's method
- Bartlett's test
- Bartlett's theorem
- Base rate
- Baseball statistics
- Basu's theorem
- Bates distribution
- Baum–Welch algorithm
- Bayes classifier
- Bayes error rate
- Bayes estimator
- Bayes factor
- Bayes linear statistics
- Bayes' rule
- Bayes' theorem
- Bayesian – disambiguation
- Bayesian average
- Bayesian brain
- Bayesian econometrics
- Bayesian experimental design
- Bayesian game
- Bayesian inference
- Bayesian inference in marketing
- Bayesian inference in phylogeny
- Bayesian inference using Gibbs sampling
- Bayesian information criterion
- Bayesian linear regression
- Bayesian model comparison – see Bayes factor
- Bayesian multivariate linear regression
- Bayesian network
- Bayesian probability
- Bayesian search theory
- Bayesian spam filtering
- Bayesian statistics
- Bayesian tool for methylation analysis
- Bayesian vector autoregression
- BCMP network – queueing theory
- Bean machine
- Behrens–Fisher distribution
- Behrens–Fisher problem
- Belief propagation
- Belt transect
- Benford's law
- Benini distribution
- Bennett's inequality
- Berkson error model
- Berkson's paradox
- Berlin procedure
- Bernoulli distribution
- Bernoulli process
- Bernoulli sampling
- Bernoulli scheme
- Bernoulli trial
- Bernstein inequalities (probability theory)
- Bernstein–von Mises theorem
- Berry–Esseen theorem
- Bertrand's ballot theorem
- Bertrand's box paradox
- Bessel process
- Bessel's correction
- Best linear unbiased prediction
- Beta (finance)
- Beta-binomial distribution
- Beta-binomial model
- Beta distribution
- Beta function – for incomplete beta function
- Beta negative binomial distribution
- Beta prime distribution
- Beta rectangular distribution
- Beverton–Holt model
- Bhatia–Davis inequality
- Bhattacharya coefficient – redirects to Bhattacharyya distance
- Bias (statistics)
- Bias of an estimator
- Biased random walk (biochemistry)
- Biased sample – see Sampling bias
- Biclustering
- Big O in probability notation
- Bienaymé–Chebyshev inequality
- Bills of Mortality
- Bimodal distribution
- Binary classification
- Bingham distribution
- Binomial distribution
- Binomial proportion confidence interval
- Binomial regression
- Binomial test
- Bioinformatics
- Biometrics (statistics) – redirects to Biostatistics
- Biostatistics
- Biplot
- Birnbaum–Saunders distribution
- Birth–death process
- Bispectrum
- Bivariate analysis
- Bivariate von Mises distribution
- Black–Scholes
- Bland–Altman plot
- Blind deconvolution
- Blind experiment
- Block design
- Blocking (statistics)
- Blumenthal's zero–one law
- BMDP – software
- Bochner's theorem
- Bonferroni correction
- Bonferroni inequalities – redirects to Boole's inequality
- Boole's inequality
- Boolean analysis
- Bootstrap aggregating
- Bootstrap error-adjusted single-sample technique
- Bootstrapping (statistics)
- Bootstrapping populations
- Borel–Cantelli lemma
- Bose–Mesner algebra
- Box–Behnken design
- Box–Cox distribution
- Box–Cox transformation – redirects to Power transform
- Box–Jenkins
- Box–Muller transform
- Box–Pierce test
- Box plot
- Branching process
- Bregman divergence
- Breusch–Godfrey test
- Breusch–Pagan statistic – redirects to Breusch–Pagan test
- Breusch–Pagan test
- Brown–Forsythe test
- Brownian bridge
- Brownian excursion
- Brownian motion
- Brownian tree
- Bruck–Ryser–Chowla theorem
- Burke's theorem
- Burr distribution
- Business statistics
- Bühlmann model
- Buzen's algorithm
- BV4.1 (software)
C
- c-chart
- Càdlàg
- Calculating demand forecast accuracy
- Calculus of predispositions
- Calibrated probability assessment
- Calibration (probability) – subjective probability, redirects to Calibrated probability assessment
- Calibration (statistics) – the statistical calibration problem
- Cancer cluster
- Candlestick chart
- Canonical analysis
- Canonical correlation
- Canopy clustering algorithm
- Cantor distribution
- Carpet plot
- Cartogram
- Case-control – redirects to Case-control study
- Case-control study
- Catastro of Ensenada – a census of part of Spain
- Categorical data
- Categorical distribution
- Categorical variable
- Cauchy distribution
- Cauchy–Schwarz inequality
- Causal Markov condition
- CDF-based nonparametric confidence interval
- Ceiling effect (statistics)
- Cellular noise
- Censored regression model
- Censoring (clinical trials)
- Censoring (statistics)
- Centering matrix
- Centerpoint (geometry) – to which Tukey median redirects
- Central composite design
- Central limit theorem
- Central moment
- Central tendency
- Census
- Cepstrum
- CHAID – CHi-squared Automatic Interaction Detector
- Chain rule for Kolmogorov complexity
- Challenge–dechallenge–rechallenge
- Champernowne distribution
- Change detection
- Chapman–Kolmogorov equation
- Chapman–Robbins bound
- Characteristic function (probability theory)
- Chauvenet's criterion
- Chebyshev center
- Chebyshev's inequality
- Checking if a coin is biased – redirects to Checking whether a coin is fair
- Checking whether a coin is fair
- Cheeger bound
- Chemometrics
- Chernoff bound – a special case of Chernoff's inequality
- Chernoff face
- Chernoff's distribution
- Chernoff's inequality
- Chi distribution
- Chi-squared distribution
- Chi-squared test
- Chinese restaurant process
- Choropleth map
- Chow test
- Chronux software
- Circular analysis
- Circular distribution
- Circular error probable
- Circular statistics – redirects to Directional statistics
- Circular uniform distribution
- Civic statistics
- Clark–Ocone theorem
- Class membership probabilities
- Classic data sets
- Classical definition of probability
- Classical test theory – psychometrics
- Classification rule
- Classifier (mathematics)
- Climate ensemble
- Climograph
- Clinical significance
- Clinical study design
- Clinical trial
- Clinical utility of diagnostic tests
- Cliodynamics
- Closed testing procedure
- Cluster analysis
- Cluster randomised controlled trial
- Cluster sampling
- Cluster-weighted modeling
- Clustering high-dimensional data
- CMA-ES (Covariance Matrix Adaptation Evolution Strategy)
- Coalescent theory
- Cochran's C test
- Cochran's Q test
- Cochran's theorem
- Cochran–Armitage test for trend
- Cochran–Mantel–Haenszel statistics
- Cochrane–Orcutt estimation
- Coding (social sciences)
- Coefficient of coherence – redirects to Coherence (statistics)
- Coefficient of determination
- Coefficient of dispersion
- Coefficient of variation
- Cognitive pretesting
- Cohen's class distribution function – a time-frequency distribution function
- Cohen's kappa
- Coherence (signal processing)
- Coherence (statistics)
- Cohort (statistics)
- Cohort effect
- Cohort study
- Cointegration
- Collectively exhaustive events
- Collider (epidemiology)
- Combinatorial data analysis
- Combinatorial design
- Combinatorial meta-analysis
- Common-method variance
- Common mode failure
- Common cause and special cause (statistics)
- Comonotonicity
- Comparing means
- Comparison of general and generalized linear models
- Comparison of statistical packages
- Comparisonwise error rate
- Complementary event
- Complete-linkage clustering
- Complete spatial randomness
- Completely randomized design
- Completeness (statistics)
- Compositional data
- Composite bar chart
- Compound Poisson distribution
- Compound Poisson process
- Compound probability distribution
- Computational formula for the variance
- Computational learning theory
- Computational statistics
- Computer experiment
- Computer-assisted survey information collection
- Concomitant (statistics)
- Concordance correlation coefficient
- Concordant pair
- Concrete illustration of the central limit theorem
- Concurrent validity
- Conditional change model
- Conditional distribution – see Conditional probability distribution
- Conditional dependence
- Conditional expectation
- Conditional independence
- Conditional probability
- Conditional probability distribution
- Conditional random field
- Conditional variance
- Conditionality principle
- Confidence band – redirects to Confidence and prediction bands
- Confidence distribution
- Confidence interval
- Confidence region
- Configural frequency analysis
- Confirmation bias
- Confirmatory factor analysis
- Confounding
- Confounding factor
- Confusion of the inverse
- Congruence coefficient
- Conjoint analysis
- Conjugate prior
- Consensus-based assessment
- Consensus clustering
- Consensus forecast
- Conservatism (belief revision)
- Consistency (statistics)
- Consistent estimator
- Constant elasticity of substitution
- Constant false alarm rate
- Constraint (information theory)
- Consumption distribution
- Contact process (mathematics)
- Content validity
- Contiguity (probability theory)
- Contingency table
- Continuity correction
- Continuous distribution – see Continuous probability distribution
- Continuous mapping theorem
- Continuous probability distribution
- Continuous stochastic process
- Continuous-time Markov process
- Continuous-time stochastic process
- Contrast (statistics)
- Control chart
- Control event rate
- Control limits
- Control variate
- Controlling for a variable
- Convergence of measures
- Convergence of random variables
- Convex hull
- Convolution of probability distributions
- Convolution random number generator
- Conway–Maxwell–Poisson distribution
- Cook's distance
- Cophenetic correlation
- Copula (statistics)
- Cornish–Fisher expansion
- Correct sampling
- Correction for attenuation
- Correlation
- Correlation and dependence
- Correlation does not imply causation
- Correlation clustering
- Correlation function
- Correlation inequality
- Correlation ratio
- Correlogram
- Correspondence analysis
- Cosmic variance
- Cost-of-living index
- Count data
- Counternull
- Counting process
- Covariance
- Covariance and correlation
- Covariance intersection
- Covariance matrix
- Covariance function
- Covariate
- Cover's theorem
- Coverage probability
- Cox process
- Cox's theorem
- Cox–Ingersoll–Ross model
- Cramér–Rao bound
- Cramér–von Mises criterion
- Cramér's decomposition theorem
- Cramér's theorem (large deviations)
- Cramér's V
- Craps principle
- Credal set
- Credible interval
- Cricket statistics
- Crime statistics
- Critical region – redirects to Statistical hypothesis testing
- Cromwell's rule
- Cronbach's α
- Cross-correlation
- Cross-covariance
- Cross-entropy method
- Cross-sectional data
- Cross-sectional regression
- Cross-sectional study
- Cross-spectrum
- Cross tabulation
- Cross-validation (statistics)
- Crossover study
- Crystal Ball function – a probability distribution
- Cumulant
- Cumulant generating function – redirects to cumulant
- Cumulative accuracy profile
- Cumulative distribution function
- Cumulative frequency analysis
- Cumulative incidence
- Cunningham function
- CURE data clustering algorithm
- Curve fitting
- CUSUM
- Cuzick–Edwards test
- Cyclostationary process
D
- d-separation
- D/M/1 queue
- D'Agostino's K-squared test
- Dagum distribution
- DAP – open source software
- Data analysis
- Data assimilation
- Data binning
- Data classification (business intelligence)
- Data cleansing
- Data clustering
- Data collection
- Data Desk – software
- Data dredging
- Data fusion
- Data generating process
- Data mining
- Data reduction
- Data point
- Data quality assurance
- Data set
- Data-snooping bias
- Data stream clustering
- Data transformation (statistics)
- Data visualization
- DataDetective – software
- Dataplot – software
- Davies–Bouldin index
- Davis distribution
- De Finetti's game
- De Finetti's theorem
- DeFries–Fulker regression
- de Moivre's law
- De Moivre–Laplace theorem
- Decision boundary
- Decision theory
- Decomposition of time series
- Degenerate distribution
- Degrees of freedom (statistics)
- Delaporte distribution
- Delphi method
- Delta method
- Demand forecasting
- Deming regression
- Demographics
- Demography
- Dendrogram
- Density estimation
- Dependent and independent variables
- Descriptive research
- Descriptive statistics
- Design effect
- Design matrix
- Design of experiments
- The Design of Experiments (book by Fisher)
- Detailed balance
- Detection theory
- Determining the number of clusters in a data set
- Detrended correspondence analysis
- Detrended fluctuation analysis
- Deviance (statistics)
- Deviance information criterion
- Deviation (statistics)
- Deviation analysis (disambiguation)
- DFFITS – a regression diagnostic
- Diagnostic odds ratio
- Dickey–Fuller test
- Difference in differences
- Differential entropy
- Diffusion process
- Diffusion-limited aggregation
- Dimension reduction
- Dilution assay
- Direct relationship
- Directional statistics
- Dirichlet distribution
- Dirichlet-multinomial distribution
- Dirichlet process
- Disattenuation
- Discrepancy function
- Discrete choice
- Discrete choice analysis
- Discrete distribution – redirects to section of Probability distribution
- Discrete frequency domain
- Discrete phase-type distribution
- Discrete probability distribution – redirects to section of Probability distribution
- Discrete time
- Discretization of continuous features
- Discriminant function analysis
- Discriminative model
- Disorder problem
- Distance correlation
- Distance sampling
- Distributed lag
- Distribution fitting
- Divergence (statistics)
- Diversity index
- Divisia index
- Divisia monetary aggregates index
- Dixon's Q test
- Dominating decision rule
- Donsker's theorem
- Doob decomposition theorem
- Doob martingale
- Doob's martingale convergence theorems
- Doob's martingale inequality
- Doob–Meyer decomposition theorem
- Doomsday argument
- Dot plot (bioinformatics)
- Dot plot (statistics)
- Double counting (fallacy)
- Double descent
- Double exponential distribution (disambiguation)
- Double mass analysis
- Doubly stochastic model
- Drift rate – redirects to Stochastic drift
- Dudley's theorem
- Dummy variable (statistics)
- Duncan's new multiple range test
- Dunn index
- Dunnett's test
- Durbin test
- Durbin–Watson statistic
- Dutch book
- Dvoretzky–Kiefer–Wolfowitz inequality
- Dyadic distribution
- Dynamic Bayesian network
- Dynamic factor
- Dynamic topic model
E
- E-statistic
- Earth mover's distance
- Eaton's inequality
- Ecological correlation
- Ecological fallacy
- Ecological regression
- Ecological study
- Econometrics
- Econometric model
- Econometric software – redirects to Comparison of statistical packages
- Economic data
- Economic epidemiology
- Economic statistics
- Eddy covariance
- Edgeworth series
- Effect size
- Efficiency (statistics)
- Efficient estimator
- Ehrenfest model
- Elastic map
- Elliptical distribution
- Ellsberg paradox
- Elston–Stewart algorithm
- EMG distribution
- Empirical
- Empirical Bayes method
- Empirical distribution function
- Empirical likelihood
- Empirical measure
- Empirical orthogonal functions
- Empirical probability
- Empirical process
- Empirical statistical laws
- Endogeneity (econometrics)
- End point of clinical trials
- Energy distance
- Energy statistics (disambiguation)
- Encyclopedia of Statistical Sciences (book)
- Engelbert–Schmidt zero–one law
- Engineering statistics
- Engineering tolerance
- Engset calculation
- Ensemble forecasting
- Ensemble Kalman filter
- Entropy (information theory)
- Entropy estimation
- Entropy power inequality
- Environmental statistics
- Epi Info – software
- Epidata – software
- Epidemic model
- Epidemiological methods
- Epilogism
- Epitome (image processing)
- Epps effect
- Equating – test equating
- Equipossible
- Equiprobable
- Erdős–Rényi model
- Erlang distribution
- Ergodic theory
- Ergodicity
- Error bar
- Error correction model
- Error function
- Errors and residuals in statistics
- Errors-in-variables models
- An Essay Towards Solving a Problem in the Doctrine of Chances
- Estimating equations
- Estimation theory
- Estimation of covariance matrices
- Estimation of signal parameters via rotational invariance techniques
- Estimator
- Etemadi's inequality
- Ethical problems using children in clinical trials
- Event (probability theory)
- Event study
- Evidence lower bound
- Evidence under Bayes theorem
- Evolutionary data mining
- Ewens's sampling formula
- EWMA chart
- Exact statistics
- Exact test
- Examples of Markov chains
- Excess risk
- Exchange paradox
- Exchangeable random variables
- Expander walk sampling
- Expectation–maximization algorithm
- Expectation propagation
- Expected mean squares
- Expected utility hypothesis
- Expected value
- Expected value of sample information
- Experiment
- Experimental design diagram
- Experimental event rate
- Experimental uncertainty analysis
- Experimenter's bias
- Experimentwise error rate
- Explained sum of squares
- Explained variation
- Explanatory variable
- Exploratory data analysis
- Exploratory factor analysis
- Exponential dispersion model
- Exponential distribution
- Exponential family
- Exponential-logarithmic distribution
- Exponential power distribution – redirects to Generalized normal distribution
- Exponential random numbers – redirect to subsection of Exponential distribution
- Exponential smoothing
- Exponentially modified Gaussian distribution
- Exponentiated Weibull distribution
- Exposure variable
- Extended Kalman filter
- Extended negative binomial distribution
- Extensions of Fisher's method
- External validity
- Extrapolation domain analysis
- Extreme value theory
- Extremum estimator
F
- F-distribution
- F-divergence
- F-statistics – population genetics
- F-test
- F-test of equality of variances
- F1 score
- Facet theory
- Factor analysis
- Factor regression model
- Factor graph
- Factorial code
- Factorial experiment
- Factorial moment
- Factorial moment generating function
- Failure rate
- Fair coin
- Falconer's formula
- False discovery rate
- False nearest neighbor algorithm
- False negative
- False positive
- False positive rate
- False positive paradox
- Family-wise error rate
- Fan chart (time series)
- Fano factor
- Fast Fourier transform
- Fast Kalman filter
- FastICA – fast independent component analysis
- Fat-tailed distribution
- Feasible generalized least squares
- Feature extraction
- Feller process
- Feller's coin-tossing constants
- Feller-continuous process
- Felsenstein's tree-pruning algorithm – statistical genetics
- Fides (reliability)
- Fiducial inference
- Field experiment
- Fieller's theorem
- File drawer problem
- Filtering problem (stochastic processes)
- Financial econometrics
- Financial models with long-tailed distributions and volatility clustering
- Finite-dimensional distribution
- First-hitting-time model
- First-in-man study
- Fishburn–Shepp inequality
- Fisher consistency
- Fisher information
- Fisher information metric
- Fisher kernel
- Fisher transformation
- Fisher's exact test
- Fisher's inequality
- Fisher's linear discriminator
- Fisher's method
- Fisher's noncentral hypergeometric distribution
- Fisher's z-distribution
- Fisher–Tippett distribution – redirects to Generalized extreme value distribution
- Fisher–Tippett–Gnedenko theorem
- Five-number summary
- Fixed effects estimator and Fixed effects estimation – redirect to Fixed effects model
- Fixed-effect Poisson model
- FLAME clustering
- Fleiss' kappa
- Fleming–Viot process
- Flood risk assessment
- Floor effect
- Focused information criterion
- Fokker–Planck equation
- Folded normal distribution
- Forecast bias
- Forecast error
- Forecast skill
- Forecasting
- Forest plot
- Fork-join queue
- Formation matrix
- Forward measure
- Foster's theorem
- Foundations of statistics
- Founders of statistics
- Fourier analysis
- Fowlkes–Mallows index
- Fraction of variance unexplained
- Fractional Brownian motion
- Fractional factorial design
- Fréchet distribution
- Fréchet mean
- Free statistical software
- Freedman's paradox
- Freedman–Diaconis rule
- Freidlin–Wentzell theorem
- Frequency (statistics)
- Frequency distribution
- Frequency domain
- Frequency probability
- Frequentist inference
- Friedman test
- Friendship paradox
- Frisch–Waugh–Lovell theorem
- Fully crossed design
- Function approximation
- Functional boxplot
- Functional data analysis
- Funnel plot
- Fuzzy logic
- Fuzzy measure theory
- FWL theorem – relating regression and projection
G
- G/G/1 queue
- G-network
- G-test
- Galbraith plot
- Gallagher Index
- Galton–Watson process
- Galton's problem
- Gambler's fallacy
- Gambler's ruin
- Gambling and information theory
- Game of chance
- Gamma distribution
- Gamma test (statistics)
- Gamma process
- Gamma variate
- GAUSS (software)
- Gauss's inequality
- Gauss–Kuzmin distribution
- Gauss–Markov process
- Gauss–Markov theorem
- Gauss–Newton algorithm
- Gaussian function
- Gaussian isoperimetric inequality
- Gaussian measure
- Gaussian noise
- Gaussian process
- Gaussian process emulator
- Gaussian q-distribution
- Geary's C
- GEH statistic – a statistic comparing modelled and observed counts
- General linear model
- Generalizability theory
- Generalized additive model
- Generalized additive model for location, scale and shape
- Generalized beta distribution
- Generalized canonical correlation
- Generalized chi-squared distribution
- Generalized Dirichlet distribution
- Generalized entropy index
- Generalized estimating equation
- Generalized expected utility
- Generalized extreme value distribution
- Generalized gamma distribution
- Generalized Gaussian distribution
- Generalised hyperbolic distribution
- Generalized inverse Gaussian distribution
- Generalized least squares
- Generalized linear array model
- Generalized linear mixed model
- Generalized linear model
- Generalized logistic distribution
- Generalized method of moments
- Generalized multidimensional scaling
- Generalized multivariate log-gamma distribution
- Generalized normal distribution
- Generalized p-value
- Generalized Pareto distribution
- Generalized Procrustes analysis
- Generalized randomized block design
- Generalized Tobit
- Generalized Wiener process
- Generative model
- Genetic epidemiology
- GenStat – software
- Geo-imputation
- Geodemographic segmentation
- Geometric Brownian motion
- Geometric data analysis
- Geometric distribution
- Geometric median
- Geometric standard deviation
- Geometric stable distribution
- Geospatial predictive modeling
- Geostatistics
- German tank problem
- Gerschenkron effect
- Gibbs sampling
- Gillespie algorithm
- Gini coefficient
- Girsanov theorem
- Gittins index
- GLIM (software) – software
- Glivenko–Cantelli theorem
- GLUE (uncertainty assessment)
- Goldfeld–Quandt test
- Gompertz distribution
- Gompertz function
- Gompertz–Makeham law of mortality
- Good–Turing frequency estimation
- Goodhart's law
- Goodman and Kruskal's gamma
- Goodman and Kruskal's lambda
- Goodness of fit
- Gordon–Newell network
- Gordon–Newell theorem
- Graeco-Latin square
- Grand mean
- Granger causality
- Graph cuts in computer vision – a potential application of Bayesian analysis
- Graphical model
- Graphical models for protein structure
- GraphPad InStat – software
- GraphPad Prism – software
- Gravity model of trade
- Greenwood statistic
- Gretl
- Group family
- Group method of data handling
- Group size measures
- Grouped data
- Grubbs's test for outliers
- Guess value
- Guesstimate
- Gumbel distribution
- Guttman scale
- Gy's sampling theory
H
- h-index
- Hájek–Le Cam convolution theorem
- Half circle distribution
- Half-logistic distribution
- Half-normal distribution
- Halton sequence
- Hamburger moment problem
- Hannan–Quinn information criterion
- Harris chain
- Hardy–Weinberg principle – statistical genetics
- Hartley's test
- Hat matrix
- Hammersley–Clifford theorem
- Hausdorff moment problem
- Hausman specification test – redirects to Hausman test
- Haybittle–Peto boundary
- Hazard function – redirects to Failure rate
- Hazard ratio
- Heaps' law
- Health care analytics
- Heart rate variability
- Heavy-tailed distribution
- Heckman correction
- Hedonic regression
- Hellin's law
- Hellinger distance
- Helmert–Wolf blocking
- Herdan's law
- Herfindahl index
- Heston model
- Heteroscedasticity
- Heteroscedasticity-consistent standard errors
- Heteroskedasticity – see Heteroscedasticity
- Hewitt–Savage zero–one law
- Hidden Markov model
- Hidden Markov random field
- Hidden semi-Markov model
- Hierarchical Bayes model
- Hierarchical clustering
- Hierarchical hidden Markov model
- Hierarchical linear modeling
- High-dimensional statistics
- Higher-order factor analysis
- Higher-order statistics
- Hirschman uncertainty
- Histogram
- Historiometry
- History of randomness
- History of statistics
- Hitting time
- Hodges' estimator
- Hodges–Lehmann estimator
- Hoeffding's independence test
- Hoeffding's lemma
- Hoeffding's inequality
- Holm–Bonferroni method
- Holtsmark distribution
- Homogeneity (statistics)
- Homogenization (climate)
- Homoscedasticity
- Hoover index (a.k.a. Robin Hood index)
- Horvitz–Thompson estimator
- Hosmer–Lemeshow test
- Hotelling's T-squared distribution
- How to Lie with Statistics (book)
- Howland will forgery trial
- Hubbert curve
- Huber–White standard error – see Heteroscedasticity-consistent standard errors
- Huber loss function
- Human subject research
- Hurst exponent
- Hyper-exponential distribution
- Hyper-Graeco-Latin square design
- Hyperbolic distribution
- Hyperbolic secant distribution
- Hypergeometric distribution
- Hyperparameter (Bayesian statistics)
- Hyperparameter (machine learning)
- Hyperprior
- Hypoexponential distribution
I
- Idealised population
- Idempotent matrix
- Identifiability
- Ignorability
- Illustration of the central limit theorem
- Image denoising
- Importance sampling
- Imprecise probability
- Impulse response
- Imputation (statistics)
- Incidence (epidemiology)
- Increasing process
- Indecomposable distribution
- Independence of irrelevant alternatives
- Independent component analysis
- Independent and identically distributed random variables
- Index (economics)
- Index number
- Index of coincidence
- Index of dispersion
- Index of dissimilarity
- Indicators of spatial association
- Indirect least squares
- Inductive inference
- An inequality on location and scale parameters – see Chebyshev's inequality
- Inference
- Inferential statistics – redirects to Statistical inference
- Infinite divisibility (probability)
- Infinite monkey theorem
- Influence diagram
- Info-gap decision theory
- Information bottleneck method
- Information geometry
- Information gain ratio
- Information ratio – finance
- Information source (mathematics)
- Information theory
- Inherent bias
- Inherent zero
- Injury prevention – application
- Innovation (signal processing)
- Innovations vector
- Institutional review board
- Instrumental variable
- Integrated nested Laplace approximations
- Intention to treat analysis
- Interaction (statistics)
- Interaction variable – see Interaction (statistics)
- Interclass correlation
- Interdecile range
- Interim analysis
- Internal consistency
- Internal validity
- Interquartile mean
- Interquartile range
- Inter-rater reliability
- Interval estimation
- Intervening variable
- Intra-rater reliability
- Intraclass correlation
- Invariant estimator
- Invariant extended Kalman filter
- Inverse distance weighting
- Inverse distribution
- Inverse Gaussian distribution
- Inverse matrix gamma distribution
- Inverse Mills ratio
- Inverse probability
- Inverse probability weighting
- Inverse relationship
- Inverse-chi-squared distribution
- Inverse-gamma distribution
- Inverse transform sampling
- Inverse-variance weighting
- Inverse-Wishart distribution
- Iris flower data set
- Irwin–Hall distribution
- Isomap
- Isotonic regression
- Isserlis' theorem
- Item response theory
- Item-total correlation
- Item tree analysis
- Iterative proportional fitting
- Iteratively reweighted least squares
- Itô calculus
- Itô isometry
- Itô's lemma
J
- Jaccard index
- Jackknife (statistics)
- Jackson network
- Jackson's theorem (queueing theory)
- Jadad scale
- James–Stein estimator
- Jarque–Bera test
- Jeffreys prior
- Jensen's inequality
- Jensen–Shannon divergence
- JMulTi – software
- Johansen test
- Johnson SU distribution
- Joint probability distribution
- Jonckheere's trend test
- JMP (statistical software)
- Jump process
- Jump-diffusion model
- Junction tree algorithm
K
- K-distribution
- K-means algorithm – redirects to k-means clustering
- K-means++
- K-medians clustering
- K-medoids
- K-statistic
- Kalman filter
- Kaplan–Meier estimator
- Kappa coefficient
- Kappa statistic
- Karhunen–Loève theorem
- Kendall tau distance
- Kendall tau rank correlation coefficient
- Kendall's notation
- Kendall's W – Kendall's coefficient of concordance
- Kent distribution
- Kernel density estimation
- Kernel Fisher discriminant analysis
- Kernel methods
- Kernel principal component analysis
- Kernel regression
- Kernel smoother
- Kernel (statistics)
- Khmaladze transformation (probability theory)
- Killed process
- Khintchine inequality
- Kingman's formula
- Kirkwood approximation
- Kish grid
- Kitchen sink regression
- Klecka's tau
- Knightian uncertainty
- Kolmogorov backward equation
- Kolmogorov continuity theorem
- Kolmogorov extension theorem
- Kolmogorov's criterion
- Kolmogorov's generalized criterion
- Kolmogorov's inequality
- Kolmogorov's zero–one law
- Kolmogorov–Smirnov test
- KPSS test
- Kriging
- Kruskal–Wallis one-way analysis of variance
- Kuder–Richardson Formula 20
- Kuiper's test
- Kullback's inequality
- Kullback–Leibler divergence
- Kumaraswamy distribution
- Kurtosis
- Kushner equation
L
- L-estimator
- L-moment
- Labour Force Survey
- Lack-of-fit sum of squares
- Lady tasting tea
- Lag operator
- Lag windowing
- Lambda distribution – disambiguation
- Landau distribution
- Lander–Green algorithm
- Language model
- Laplace distribution
- Laplace principle (large deviations theory)
- LaplacesDemon – software
- Large deviations theory
- Large deviations of Gaussian random functions
- LARS – see least-angle regression
- Latent variable, latent variable model
- Latent class model
- Latent Dirichlet allocation
- Latent growth modeling
- Latent semantic analysis
- Latin rectangle
- Latin square
- Latin hypercube sampling
- Law (stochastic processes)
- Law of averages
- Law of comparative judgment
- Law of large numbers
- Law of the iterated logarithm
- Law of the unconscious statistician
- Law of total covariance
- Law of total cumulance
- Law of total expectation
- Law of total probability
- Law of total variance
- Law of truly large numbers
- Layered hidden Markov model
- Le Cam's theorem
- Lead time bias
- Least absolute deviations
- Least-angle regression
- Least squares
- Least-squares spectral analysis
- Least squares support vector machine
- Least trimmed squares
- Learning theory (statistics)
- Leftover hash-lemma
- Lehmann–Scheffé theorem
- Length time bias
- Levene's test
- Level of analysis
- Level of measurement
- Levenberg–Marquardt algorithm
- Leverage (statistics)
- Levey–Jennings chart – redirects to Laboratory quality control
- Lévy's convergence theorem
- Lévy's continuity theorem
- Lévy arcsine law
- Lévy distribution
- Lévy flight
- Lévy process
- Lewontin's Fallacy
- Lexis diagram
- Lexis ratio
- Lies, damned lies, and statistics
- Life expectancy
- Life table
- Lift (data mining)
- Likelihood function
- Likelihood principle
- Likelihood-ratio test
- Likelihood ratios in diagnostic testing
- Likert scale
- Lilliefors test
- Limited dependent variable
- Limiting density of discrete points
- Lincoln index
- Lindeberg's condition
- Lindley equation
- Lindley's paradox
- Line chart
- Line-intercept sampling
- Linear classifier
- Linear discriminant analysis
- Linear least squares
- Linear model
- Linear prediction
- Linear probability model
- Linear regression
- Linguistic demography
- Linnik distribution – redirects to Geometric stable distribution
- LISREL – proprietary statistical software package
- List of basic statistics topics – redirects to Outline of statistics
- List of convolutions of probability distributions
- List of graphical methods
- List of information graphics software
- List of probability topics
- List of random number generators
- List of scientific journals in statistics
- List of statistical packages
- List of statisticians
- Listwise deletion
- Little's law
- Littlewood's law
- Ljung–Box test
- Local convex hull
- Local independence
- Local martingale
- Local regression
- Location estimation – redirects to Location parameter
- Location estimation in sensor networks
- Location parameter
- Location test
- Location-scale family
- Local asymptotic normality
- Locality (statistics)
- Loess curve – redirects to Local regression
- Log-Cauchy distribution
- Log-Laplace distribution
- Log-normal distribution
- Log-linear analysis
- Log-linear model
- Log-linear modeling – redirects to Poisson regression
- Log-log plot
- Log-logistic distribution
- Logarithmic distribution
- Logarithmic mean
- Logistic distribution
- Logistic function
- Logistic regression
- Logit
- Logit analysis in marketing
- Logit-normal distribution
- Logrank test
- Lomax distribution
- Long-range dependency
- Long Tail
- Long-tail traffic
- Longitudinal study
- Longstaff–Schwartz model
- Lorenz curve
- Loss function
- Lot quality assurance sampling
- Lotka's law
- Low birth weight paradox
- Lucia de Berk – prob/stats related court case
- Lukacs's proportion-sum independence theorem
- Lumpability
- Lusser's law
- Lyapunov's central limit theorem
M
- M/D/1 queue
- M/G/1 queue
- M/M/1 queue
- M/M/c queue
- M-estimator
- M-separation
- Mabinogion sheep problem
- Machine learning
- Mahalanobis distance
- Main effect
- Mallows's Cp
- MANCOVA
- Manhattan plot
- Mann–Whitney U
- MANOVA
- Mantel test
- MAP estimator – redirects to Maximum a posteriori estimation
- Marchenko–Pastur distribution
- Marcinkiewicz–Zygmund inequality
- Marcum Q-function
- Margin of error
- Marginal conditional stochastic dominance
- Marginal distribution
- Marginal likelihood
- Marginal model
- Marginal variable – redirects to Marginal distribution
- Mark and recapture
- Markov additive process
- Markov blanket
- Markov chain
- Markov chain Monte Carlo
- Markov decision process
- Markov information source
- Markov kernel
- Markov logic network
- Markov model
- Markov network
- Markov process
- Markov property
- Markov random field
- Markov renewal process
- Markov's inequality
- Markovian arrival processes
- Marsaglia polar method
- Martingale (probability theory)
- Martingale difference sequence
- Martingale representation theorem
- Master equation
- Matched filter
- Matching pursuit
- Matching (statistics)
- Matérn covariance function
- Mathematica – software
- Mathematical biology
- Mathematical modelling in epidemiology
- Mathematical modelling of infectious disease
- Mathematical statistics
- Matthews correlation coefficient
- Matrix gamma distribution
- Matrix normal distribution
- Matrix population models
- Matrix t-distribution
- Mauchly's sphericity test
- Maximal ergodic theorem
- Maximal information coefficient
- Maximum a posteriori estimation
- Maximum entropy classifier – redirects to Logistic regression
- Maximum-entropy Markov model
- Maximum entropy method – redirects to Principle of maximum entropy
- Maximum entropy probability distribution
- Maximum entropy spectral estimation
- Maximum likelihood
- Maximum likelihood sequence estimation
- Maximum parsimony
- Maximum spacing estimation
- Maxwell speed distribution
- Maxwell–Boltzmann distribution
- Maxwell's theorem
- Mazziotta–Pareto index
- MCAR (missing completely at random)
- McCullagh's parametrization of the Cauchy distributions
- McDiarmid's inequality
- McDonald–Kreitman test – statistical genetics
- McKay's approximation for the coefficient of variation
- McNemar's test
- Meadow's law
- Mean – see also expected value
- Mean absolute error
- Mean absolute percentage error
- Mean absolute scaled error
- Mean and predicted response
- Mean deviation (disambiguation)
- Mean difference
- Mean integrated squared error
- Mean of circular quantities
- Mean percentage error
- Mean preserving spread
- Mean reciprocal rank
- Mean signed difference
- Mean square quantization error
- Mean square weighted deviation
- Mean squared error
- Mean squared prediction error
- Mean time between failures
- Mean-reverting process – redirects to Ornstein–Uhlenbeck process
- Mean value analysis
- Measurement, level of – see level of measurement.
- Measurement invariance
- MedCalc – software
- Median
- Median absolute deviation
- Median polish
- Median test
- Mediation (statistics)
- Medical statistics
- Medoid
- Memorylessness
- Mendelian randomization
- Meta-analysis
- Meta-regression
- Metalog distribution
- Method of moments (statistics)
- Method of simulated moments
- Method of support
- Metropolis–Hastings algorithm
- Mexican paradox
- Microdata (statistics)
- Midhinge
- Mid-range
- MinHash
- Minimax
- Minimax estimator
- Minimisation (clinical trials)
- Minimum chi-square estimation
- Minimum distance estimation
- Minimum mean square error
- Minimum-variance unbiased estimator
- Minimum viable population
- Minitab
- MINQUE – minimum norm quadratic unbiased estimation
- Misleading graph
- Missing completely at random
- Missing data
- Missing values – see Missing data
- Mittag–Leffler distribution
- Mixed logit
- Misconceptions about the normal distribution
- Misuse of statistics
- Mixed data sampling
- Mixed-design analysis of variance
- Mixed model
- Mixing (mathematics)
- Mixture distribution
- Mixture model
- Mixture (probability)
- MLwiN
- Mode (statistics)
- Model output statistics
- Model selection
- Model specification
- Moderator variable – redirects to Moderation (statistics)
- Modifiable areal unit problem
- Moffat distribution
- Moment (mathematics)
- Moment-generating function
- Moments, method of – see method of moments (statistics)
- Moment problem
- Monotone likelihood ratio
- Monte Carlo integration
- Monte Carlo method
- Monte Carlo method for photon transport
- Monte Carlo methods for option pricing
- Monte Carlo methods in finance
- Monte Carlo molecular modeling
- Moral graph
- Moran process
- Moran's I
- Morisita's overlap index
- Morris method
- Mortality rate
- Most probable number
- Moving average
- Moving-average model
- Moving average representation – redirects to Wold's theorem
- Moving least squares
- Multi-armed bandit
- Multi-vari chart
- Multiclass classification
- Multiclass LDA (linear discriminant analysis) – redirects to Linear discriminant analysis
- Multicollinearity
- Multidimensional analysis
- Multidimensional Chebyshev's inequality
- Multidimensional panel data
- Multidimensional scaling
- Multifactor design of experiments software
- Multifactor dimensionality reduction
- Multilevel model
- Multilinear principal component analysis
- Multinomial distribution
- Multinomial logistic regression
- Multinomial logit – see Multinomial logistic regression
- Multinomial probit
- Multinomial test
- Multiple baseline design
- Multiple comparisons
- Multiple correlation
- Multiple correspondence analysis
- Multiple discriminant analysis
- Multiple-indicator kriging
- Multiple Indicator Cluster Survey
- Multiple of the median
- Multiple testing correction – redirects to Multiple comparisons
- Multiple-try Metropolis
- Multiresolution analysis
- Multiscale decision making
- Multiscale geometric analysis
- Multistage testing
- Multitaper
- Multitrait-multimethod matrix
- Multivariate adaptive regression splines
- Multivariate analysis
- Multivariate analysis of variance
- Multivariate distribution – see Joint probability distribution
- Multivariate kernel density estimation
- Multivariate normal distribution
- Multivariate Pareto distribution
- Multivariate Pólya distribution
- Multivariate probit – redirects to Multivariate probit model
- Multivariate random variable
- Multivariate stable distribution
- Multivariate statistics
- Multivariate Student distribution – redirects to Multivariate t-distribution
- Multivariate t-distribution
N
- n = 1 fallacy
- N of 1 trial
- Naive Bayes classifier
- Nakagami distribution
- National and international statistical services
- Nash–Sutcliffe model efficiency coefficient
- National Health Interview Survey
- Natural experiment
- Natural exponential family
- Natural process variation
- NCSS (statistical software)
- Nearest-neighbor chain algorithm
- Negative binomial distribution
- Negative multinomial distribution
- Negative predictive value
- Negative relationship
- Negentropy
- Neighbourhood components analysis
- Neighbor joining
- Nelson rules
- Nelson–Aalen estimator
- Nemenyi test
- Nested case-control study
- Nested sampling algorithm
- Network probability matrix
- Neutral vector
- Newcastle–Ottawa scale
- Newey–West estimator
- Newman–Keuls method
- Neyer d-optimal test
- Neyman construction
- Neyman–Pearson lemma
- Nicholson–Bailey model
- Nominal category
- Noncentral beta distribution
- Noncentral chi distribution
- Noncentral chi-squared distribution
- Noncentral F-distribution
- Noncentral hypergeometric distributions
- Noncentral t-distribution
- Noncentrality parameter
- Nonlinear autoregressive exogenous model
- Nonlinear dimensionality reduction
- Non-linear iterative partial least squares
- Nonlinear regression
- Non-homogeneous Poisson process
- Non-linear least squares
- Non-negative matrix factorization
- Nonparametric skew
- Non-parametric statistics
- Non-response bias
- Non-sampling error
- Nonparametric regression
- Nonprobability sampling
- Normal curve equivalent
- Normal distribution
- Normal probability plot – see also rankit
- Normal score – see also rankit and Z score
- Normal variance-mean mixture
- Normal-exponential-gamma distribution
- Normal-gamma distribution
- Normal-inverse Gaussian distribution
- Normal-scaled inverse gamma distribution
- Normality test
- Normalization (statistics)
- Notation in probability and statistics
- Novikov's condition
- np-chart
- Null distribution
- Null hypothesis
- Null result
- Nuisance parameter
- Nuisance variable
- Numerical data
- Numerical methods for linear least squares
- Numerical parameter – redirects to statistical parameter
- Numerical smoothing and differentiation
- Nuremberg Code
O
- O'Brien–Fleming boundary
- Observable variable
- Observational equivalence
- Observational error
- Observational study
- Observed information
- Occupancy frequency distribution
- Odds
- Odds algorithm
- Odds ratio
- Official statistics
- Ogden tables
- Ogive (statistics)
- Omitted-variable bias
- Omnibus test
- One- and two-tailed tests
- One-class classification
- One-factor-at-a-time method
- One-tailed test – redirects to One- and two-tailed tests
- One-way analysis of variance
- Online NMF – online non-negative matrix factorisation
- Open-label trial
- OpenEpi – software
- OpenBUGS – software
- Operational confound
- Operational sex ratio
- Operations research
- Opinion poll
- Optimal decision
- Optimal design
- Optimal discriminant analysis
- Optimal matching
- Optimal stopping
- Optimality criterion
- Optimistic knowledge gradient
- Optional stopping theorem
- Order of a kernel
- Order of integration
- Order statistic
- Ordered logit
- Ordered probit
- Ordered subset expectation maximization
- Ordinal regression
- Ordinary least squares
- Ordination (statistics)
- Ornstein–Uhlenbeck process
- Orthogonal array testing
- Orthogonality
- Orthogonality principle
- Outlier
- Outliers ratio
- Outline of probability
- Outline of regression analysis
- Outline of statistics
- Overdispersion
- Overfitting
- Owen's T function
- OxMetrics – software
P
- p-chart
- p-rep
- P-value
- P–P plot
- Page's trend test
- Paid survey
- Paired comparison analysis
- Paired difference test
- Pairwise comparison
- Pairwise independence
- Panel analysis
- Panel data
- Panjer recursion – a class of discrete compound distributions
- Paley–Zygmund inequality
- Parabolic fractal distribution
- PARAFAC (parallel factor analysis)
- Parallel coordinates – graphical display of data
- Parallel factor analysis – redirects to PARAFAC
- Paradigm (experimental)
- Parameter identification problem
- Parameter space
- Parametric family
- Parametric model
- Parametric statistics
- Pareto analysis
- Pareto chart
- Pareto distribution
- Pareto index
- Pareto interpolation
- Pareto principle
- Park test
- Partial autocorrelation – redirects to Partial autocorrelation function
- Partial autocorrelation function
- Partial correlation
- Partial least squares
- Partial least squares regression
- Partial leverage
- Partial regression plot
- Partial residual plot
- Particle filter
- Partition of sums of squares
- Parzen window
- Path analysis (statistics)
- Path coefficient
- Path space (disambiguation)
- Pattern recognition
- Pearson's chi-squared test (one of various chi-squared tests)
- Pearson distribution
- Pearson product-moment correlation coefficient
- Pedometric mapping
- People v. Collins (prob/stats related court case)
- Per capita
- Per-comparison error rate
- Per-protocol analysis
- Percentile
- Percentile rank
- Periodic variation – redirects to Seasonality
- Periodogram
- Peirce's criterion
- Pensim2 – an econometric model
- Percentage point
- Permutation test – redirects to Resampling (statistics)
- Pharmaceutical statistics
- Phase dispersion minimization
- Phase-type distribution
- Phi coefficient
- Phillips–Perron test
- Philosophy of probability
- Philosophy of statistics
- Pickands–Balkema–de Haan theorem
- Pie chart
- Piecewise-deterministic Markov process
- Pignistic probability
- Pill puzzle
- Pinsker's inequality
- Pitman closeness criterion
- Pitman–Koopman–Darmois theorem
- Pitman–Yor process
- Pivotal quantity
- Placebo-controlled study
- Plackett–Burman design
- Plate notation
- Plot (graphics)
- Pocock boundary
- Poincaré plot
- Point-biserial correlation coefficient
- Point estimation
- Point pattern analysis
- Point process
- Poisson binomial distribution
- Poisson distribution
- Poisson hidden Markov model
- Poisson limit theorem
- Poisson process
- Poisson regression
- Poisson random numbers – redirects to section of Poisson distribution
- Poisson sampling
- Polar distribution – see Circular distribution
- Policy capturing
- Political forecasting
- Pollaczek–Khinchine formula
- Pollyanna Creep
- Polykay
- Poly-Weibull distribution
- Polychoric correlation
- Polynomial and rational function modeling
- Polynomial chaos
- Polynomial regression
- Polytree (Bayesian networks)
- Pooled standard deviation – redirects to Pooled variance
- Pooling design
- Popoviciu's inequality on variances
- Population
- Population dynamics
- Population ecology – application
- Population modeling
- Population process
- Population pyramid
- Population statistics
- Population variance
- Population viability analysis
- Portmanteau test
- Positive predictive value
- Post-hoc analysis
- Posterior predictive distribution
- Posterior probability
- Power law
- Power transform
- Prais–Winsten estimation
- Pre- and post-test probability
- Precision (statistics)
- Precision and recall
- Prediction interval
- Predictive analytics
- Predictive inference
- Predictive informatics
- Predictive intake modelling
- Predictive modelling
- Predictive validity
- Preference regression (in marketing)
- Preferential attachment process – see Preferential attachment
- PRESS statistic
- Prevalence
- Principal component analysis
- Principal component regression
- Principal geodesic analysis
- Principal stratification
- Principle of indifference
- Principle of marginality
- Principle of maximum entropy
- Prior knowledge for pattern recognition
- Prior probability
- Prior probability distribution – redirects to Prior probability
- Probabilistic causation
- Probabilistic design
- Probabilistic forecasting
- Probabilistic latent semantic analysis
- Probabilistic metric space
- Probabilistic proposition
- Probabilistic relational model
- Probability
- Probability bounds analysis
- Probability box
- Probability density function
- Probability distribution
- Probability distribution function (disambiguation)
- Probability integral transform
- Probability interpretations
- Probability mass function
- Probability matching
- Probability metric
- Probability of error
- Probability of precipitation
- Probability plot
- Probability plot correlation coefficient – redirects to Q–Q plot
- Probability plot correlation coefficient plot
- Probability space
- Probability theory
- Probability-generating function
- Probable error
- Probit
- Probit model
- Procedural confound
- Process control
- Process Window Index
- Procrustes analysis
- Proebsting's paradox
- Product distribution
- Product form solution
- Profile likelihood – redirects to Likelihood function
- Progressively measurable process
- Prognostics
- Projection pursuit
- Projection pursuit regression
- Proof of Stein's example
- Propagation of uncertainty
- Propensity probability
- Propensity score
- Propensity score matching
- Proper linear model
- Proportional hazards models
- Proportional reduction in loss
- Prosecutor's fallacy
- Proxy (statistics)
- Psephology
- Pseudo-determinant
- Pseudo-random number sampling
- Pseudocount
- Pseudolikelihood
- Pseudomedian
- Pseudoreplication
- PSPP (free software)
- Psychological statistics
- Psychometrics
- Pythagorean expectation
Q
- Q test
- Q-exponential distribution
- Q-function
- Q-Gaussian distribution
- Q–Q plot
- Q-statistic (disambiguation)
- Quadrat
- Quadrant count ratio
- Quadratic classifier
- Quadratic form (statistics)
- Quadratic variation
- Qualitative comparative analysis
- Qualitative data
- Qualitative variation
- Quality control
- Quantile
- Quantile function
- Quantile normalization
- Quantile regression
- Quantile-parameterized distribution
- Quantitative marketing research
- Quantitative psychological research
- Quantitative research
- Quartile
- Quartile coefficient of dispersion
- Quasi-birth–death process
- Quasi-experiment
- Quasi-experimental design – see Design of quasi-experiments
- Quasi-likelihood
- Quasi-maximum likelihood
- Quasireversibility
- Quasi-variance
- Questionnaire
- Queueing model
- Queueing theory
- Queuing delay
- Queuing theory in teletraffic engineering
- Quota sampling
R
- R programming language – see R (programming language)
- R v Adams (prob/stats related court case)
- Radar chart
- Rademacher distribution
- Radial basis function network
- Raikov's theorem
- Raised cosine distribution
- Ramaswami's formula
- Ramsey RESET test – the Ramsey Regression Equation Specification Error Test
- Rand index
- Random assignment
- Random compact set
- Random data – see randomness
- Random effects estimation – see Random effects model
- Random effects model
- Random element
- Random field
- Random function
- Random graph
- Random matrix
- Random measure
- Random multinomial logit
- Random naive Bayes
- Random permutation statistics
- Random regular graph
- Random sample
- Random sampling
- Random sequence
- Random variable
- Random variate
- Random walk
- Random walk hypothesis
- Randomization
- Randomized block design
- Randomized controlled trial
- Randomized decision rule
- Randomized experiment
- Randomized response
- Randomness
- Randomness tests
- Range (statistics)
- Rank abundance curve
- Rank correlation – mainly links to Kendall tau rank correlation coefficient and Spearman's rank correlation coefficient
- Rank product
- Rank-size distribution
- Ranking
- Rankit
- Ranklet
- RANSAC
- Rao–Blackwell theorem
- Rao-Blackwellisation – see Rao–Blackwell theorem
- Rasch model
- Rasch model estimation
- Ratio distribution
- Ratio estimator
- Rational quadratic covariance function
- Rayleigh distribution
- Rayleigh mixture distribution
- Raw score
- Realization (probability)
- Recall bias
- Receiver operating characteristic
- Reciprocal distribution
- Rectified Gaussian distribution
- Recurrence period density entropy
- Recurrence plot
- Recurrence quantification analysis
- Recursive Bayesian estimation
- Recursive least squares
- Recursive partitioning
- Reduced form
- Reference class problem
- Reflected Brownian motion
- Regenerative process
- Regression analysis – see also linear regression
- Regression Analysis of Time Series – proprietary software
- Regression control chart
- Regression diagnostic
- Regression dilution
- Regression discontinuity design
- Regression estimation
- Regression fallacy
- Regression-kriging
- Regression model validation
- Regression toward the mean
- Regret (decision theory)
- Reification (statistics)
- Rejection sampling
- Relationships among probability distributions
- Relative change and difference
- Relative efficiency – redirects to Efficiency (statistics)
- Relative index of inequality
- Relative likelihood
- Relative risk
- Relative risk reduction
- Relative standard deviation
- Relative standard error – redirects to Relative standard deviation
- Relative variance – redirects to Relative standard deviation
- Relative survival
- Relativistic Breit–Wigner distribution
- Relevance vector machine
- Reliability (statistics)
- Reliability block diagram
- Reliability engineering
- Reliability theory
- Reliability theory of aging and longevity
- Rencontres numbers – a discrete distribution
- Renewal theory
- Repeatability
- Repeated measures design
- Replication (statistics)
- Representation validity
- Reproducibility
- Resampling (statistics)
- Rescaled range
- Resentful demoralization – experimental design
- Residual – see Errors and residuals in statistics
- Residual sum of squares
- Response bias
- Response rate (survey)
- Response surface methodology
- Response variable
- Restricted maximum likelihood
- Restricted randomization
- Reversible-jump Markov chain Monte Carlo
- Reversible dynamics
- Rind et al. controversy – interpretations of paper involving meta-analysis
- Rice distribution
- Richardson–Lucy deconvolution
- Ridge regression – redirects to Tikhonov regularization
- Ridit scoring
- Risk adjusted mortality rate
- Risk factor
- Risk function
- Risk perception
- Risk theory
- Risk–benefit analysis
- Robbins lemma
- Robust Bayesian analysis
- Robust confidence intervals
- Robust measures of scale
- Robust regression
- Robust statistics
- Root mean square
- Root-mean-square deviation
- Root mean square deviation (bioinformatics)
- Root mean square fluctuation
- Ross's conjecture
- Rossmo's formula
- Rothamsted Experimental Station
- Round robin test
- Rubin causal model
- Ruin theory
- Rule of succession
- Rule of three (medicine)
- Run chart
- RV coefficient
S
[edit]- S (programming language)
- S-PLUS
- Safety in numbers
- Sally Clark (prob/stats related court case)
- Sammon projection
- Sample mean and covariance – redirects to Sample mean and sample covariance
- Sample mean and sample covariance
- Sample maximum and minimum
- Sample size determination
- Sample space
- Sample (statistics)
- Sample-continuous process
- Sampling (statistics)
- Sampling bias
- Sampling design
- Sampling distribution
- Sampling error
- Sampling fraction
- Sampling frame
- Sampling probability
- Sampling risk
- Samuelson's inequality
- Sargan test
- SAS (software)
- SAS language
- SAS System – see SAS (software)
- Savitzky–Golay smoothing filter
- Sazonov's theorem
- Saturated array
- Scale analysis (statistics)
- Scale parameter
- Scaled-inverse-chi-squared distribution
- Scaling pattern of occupancy
- Scatter matrix
- Scatter plot
- Scatterplot smoothing
- Scheffé's method
- Scheirer–Ray–Hare test
- Schilder's theorem
- Schramm–Loewner evolution
- Schuette–Nesbitt formula
- Schwarz criterion
- Score (statistics)
- Score test
- Scoring algorithm
- Scoring rule
- SCORUS
- Scott's Pi
- SDMX – a standard for exchanging statistical data
- Seasonal adjustment
- Seasonality
- Seasonal subseries plot
- Seasonal variation
- Seasonally adjusted annual rate
- Second moment method
- Secretary problem
- Secular variation
- Seed-based d mapping
- Seemingly unrelated regressions
- Seismic to simulation
- Selection bias
- Selective recruitment
- Self-organizing map
- Self-selection bias
- Self-similar process
- Segmented regression
- Seismic inversion
- Self-similarity matrix
- Semantic mapping (statistics)
- Semantic relatedness
- Semantic similarity
- Semi-Markov process
- Semi-log graph
- Semidefinite embedding
- Semimartingale
- Semiparametric model
- Semiparametric regression
- Semivariance
- Sensitivity (tests)
- Sensitivity analysis
- Sensitivity and specificity
- Sensitivity index
- Separation test
- Sequential analysis
- Sequential estimation
- Sequential Monte Carlo methods – redirects to Particle filter
- Sequential probability ratio test
- Serial dependence
- Seriation (archaeology)
- SETAR (model) – a time series model
- Sethi model
- Seven-number summary
- Sexual dimorphism measures
- Shannon–Hartley theorem
- Shape of the distribution
- Shape parameter
- Shapiro–Wilk test
- Sharpe ratio
- SHAZAM (software)
- Shewhart individuals control chart
- Shifted Gompertz distribution
- Shifted log-logistic distribution
- Shifting baseline
- Shrinkage (statistics)
- Shrinkage estimator
- Sichel distribution
- Siegel–Tukey test
- Sieve estimator
- Sigma-algebra
- SigmaStat – software
- Sign test
- Signal-to-noise ratio
- Signal-to-noise statistic
- Significance analysis of microarrays
- Silhouette (clustering)
- Simfit – software
- Similarity matrix
- Simon model
- Simple linear regression
- Simple moving average crossover
- Simple random sample
- Simpson's paradox
- Simulated annealing
- Simultaneous equation methods (econometrics)
- Simultaneous equations model
- Single equation methods (econometrics)
- Single-linkage clustering
- Singular distribution
- Singular spectrum analysis
- Sinusoidal model
- Sinkov statistic
- Size (statistics)
- Skellam distribution
- Skew normal distribution
- Skewness
- Skorokhod's representation theorem
- Slash distribution
- Slice sampling
- Sliced inverse regression
- Slutsky's theorem
- Small area estimation
- Smearing retransformation
- Smoothing
- Smoothing spline
- Smoothness (probability theory)
- Snowball sampling
- Sobel test
- Social network change detection
- Social statistics
- SOFA Statistics – software
- Soliton distribution – redirects to Luby transform code
- Somers' D
- Sørensen similarity index
- Spaghetti plot
- Sparse binary polynomial hashing
- Sparse PCA – sparse principal components analysis
- Sparsity-of-effects principle
- Spatial analysis
- Spatial dependence
- Spatial descriptive statistics
- Spatial distribution
- Spatial econometrics
- Spatial statistics – redirects to Spatial analysis
- Spatial variability
- Spearman's rank correlation coefficient
- Spearman–Brown prediction formula
- Species discovery curve
- Specification (regression) – redirects to Statistical model specification
- Specificity (tests)
- Spectral clustering – (cluster analysis)
- Spectral density
- Spectral density estimation
- Spectrum bias
- Spectrum continuation analysis
- Speed prior
- Spherical design
- Split normal distribution
- SPRT – redirects to Sequential probability ratio test
- SPSS – software
- SPSS Clementine – software (data mining)
- Spurious relationship
- Square root biased sampling
- Squared deviations
- St. Petersburg paradox
- Stability (probability)
- Stable distribution
- Stable and tempered stable distributions with volatility clustering – financial applications
- Standard deviation
- Standard error
- Standard normal deviate
- Standard normal table
- Standard probability space
- Standard score
- Standardized coefficient
- Standardized moment
- Standardised mortality rate
- Standardized mortality ratio
- Standardized rate
- Stanine
- STAR model – a time series model
- Star plot – redirects to Radar chart
- Stata
- State space representation
- Statgraphics – software
- Static analysis
- Stationary distribution
- Stationary ergodic process
- Stationary process
- Stationary sequence
- Stationary subspace analysis
- Statistic
- STATISTICA – software
- Statistical arbitrage
- Statistical assembly
- Statistical assumption
- Statistical benchmarking
- Statistical classification
- Statistical conclusion validity
- Statistical consultant
- Statistical deviance – see deviance (statistics)
- Statistical dispersion
- Statistical distance
- Statistical efficiency
- Statistical epidemiology
- Statistical estimation – redirects to Estimation theory
- Statistical finance
- Statistical genetics – redirects to population genetics
- Statistical geography
- Statistical graphics
- Statistical hypothesis testing
- Statistical independence
- Statistical inference
- Statistical interference
- Statistical Lab – software
- Statistical learning theory
- Statistical literacy
- Statistical model
- Statistical model specification
- Statistical model validation
- Statistical noise
- Statistical package
- Statistical parameter
- Statistical parametric mapping
- Statistical parsing
- Statistical population
- Statistical power
- Statistical probability
- Statistical process control
- Statistical proof
- Statistical randomness
- Statistical range – see range (statistics)
- Statistical regularity
- Statistical relational learning
- Statistical sample
- Statistical semantics
- Statistical shape analysis
- Statistical signal processing
- Statistical significance
- Statistical survey
- Statistical syllogism
- Statistical theory
- Statistical unit
- Statisticians' and engineers' cross-reference of statistical terms
- Statistics
- Statistics education
- Statistics Online Computational Resource – training materials
- StatPlus
- StatXact – software
- Stein's example
- Stein's lemma
- Stein's unbiased risk estimate
- Steiner system
- Stemplot – see Stem-and-leaf display
- Step detection
- Stepwise regression
- Stieltjes moment problem
- Stimulus-response model
- Stochastic
- Stochastic approximation
- Stochastic calculus
- Stochastic convergence
- Stochastic differential equation
- Stochastic dominance
- Stochastic drift
- Stochastic equicontinuity
- Stochastic gradient descent
- Stochastic grammar
- Stochastic investment model
- Stochastic kernel estimation
- Stochastic matrix
- Stochastic modelling (insurance)
- Stochastic optimization
- Stochastic ordering
- Stochastic process
- Stochastic rounding
- Stochastic simulation
- Stopped process
- Stopping time
- Stratified sampling
- Stratonovich integral
- Streamgraph
- Stress majorization
- Strong law of small numbers
- Strong prior
- Structural break
- Structural equation modeling
- Structural estimation
- Structured data analysis (statistics)
- Studentized range
- Studentized residual
- Student's t-distribution
- Student's t-statistic
- Student's t-test
- Student's t-test for Gaussian scale mixture distributions – see Location testing for Gaussian scale mixture distributions
- Studentization
- Study design
- Study heterogeneity
- Subcontrary mean – redirects to Harmonic mean
- Subgroup analysis
- Subindependence
- Substitution model
- SUDAAN – software
- Sufficiency (statistics) – see Sufficient statistic
- Sufficient dimension reduction
- Sufficient statistic
- Sum of normally distributed random variables
- Sum of squares (disambiguation) – general disambiguation
- Sum of squares (statistics) – see Partition of sums of squares
- Summary statistic
- Support curve
- Support vector machine
- Surrogate model
- Survey data collection
- Survey sampling
- Survey methodology
- Survival analysis
- Survival rate
- Survival function
- Survivorship bias
- Symmetric design
- Symmetric mean absolute percentage error
- SYSTAT – software
- System dynamics
- System identification
- Systematic error (also see bias (statistics) and errors and residuals in statistics)
- Systematic review
T
[edit]- t-distribution – see Student's t-distribution (includes table)
- T distribution (disambiguation)
- t-statistic
- Tag cloud – graphical display of info
- Taguchi loss function
- Taguchi methods
- Tajima's D
- Taleb distribution
- Tampering (quality control)
- Taylor expansions for the moments of functions of random variables
- Taylor's law – empirical variance-mean relations
- Telegraph process
- Test for structural change
- Test–retest reliability
- Test score
- Test set
- Test statistic
- Testimator
- Testing hypotheses suggested by the data
- Text analytics
- The Long Tail – possibly seminal magazine article
- The Unscrambler – software
- Theil index
- Theil–Sen estimator
- Theory of conjoint measurement
- Therapeutic effect
- Thompson sampling
- Three-point estimation
- Three-stage least squares
- Threshold model
- Thurstone scale
- Thurstonian model
- Time–frequency analysis
- Time–frequency representation
- Time reversibility
- Time series
- Time-series regression
- Time use survey
- Time-varying covariate
- Timeline of probability and statistics
- TinkerPlots – proprietary software for schools
- Tobit model
- Tolerance interval
- Top-coded
- Topic model (statistical natural language processing)
- Topological data analysis
- Tornqvist index
- Total correlation
- Total least squares
- Total sum of squares
- Total survey error
- Total variation distance – a statistical distance measure
- TPL Tables – software
- Tracy–Widom distribution
- Traffic equations
- Training set
- Transect
- Transferable belief model
- Transiogram
- Transition rate matrix
- Treatment and control groups
- Trend analysis
- Trend estimation
- Trend-stationary process
- Treynor ratio
- Triangular distribution
- Trimean
- Trimmed estimator
- Trispectrum
- True experiment
- True variance
- Truncated distribution
- Truncated mean
- Truncated normal distribution
- Truncated regression model
- Truncation (statistics)
- Tsallis distribution
- Tsallis statistics
- Tschuprow's T
- Tucker decomposition
- Tukey's range test – multiple comparisons
- Tukey's test of additivity – interaction in two-way anova
- Tukey–Duckworth test
- Tukey–Kramer method
- Tukey lambda distribution
- Tweedie distribution
- Twisting properties
- Two stage least squares – redirects to Instrumental variable
- Two-tailed test
- Two-way analysis of variance
- Type I and type II errors
- Type-1 Gumbel distribution
- Type-2 Gumbel distribution
- Tyranny of averages
U
[edit]- u-chart
- U-quadratic distribution
- U-statistic
- U test
- Umbrella sampling
- Unbiased estimator – see bias (statistics)
- Unbiased estimation of standard deviation
- Uncertainty
- Uncertainty coefficient
- Uncertainty quantification
- Uncomfortable science
- Uncorrelated
- Underdispersion – redirects to Overdispersion
- Underfitting – redirects to Overfitting
- Underprivileged area score
- Unevenly spaced time series
- Unexplained variation – see Explained variation
- Uniform distribution (continuous)
- Uniform distribution (discrete)
- Uniformly most powerful test
- Unimodal distribution – redirects to Unimodal function (has some stats context)
- Unimodality
- Unit (statistics)
- Unit of observation
- Unit root
- Unit root test
- Unit-weighted regression
- Unitized risk
- Univariate
- Univariate analysis
- Univariate distribution
- Unmatched count
- Unseen species problem
- Unsolved problems in statistics
- Upper and lower probabilities
- Upside potential ratio – finance
- Urn problem
- Ursell function
- Utility maximization problem
- Utilization distribution
V
[edit]- Validity (statistics)
- Van der Waerden test
- Van Houtum distribution
- Vapnik–Chervonenkis theory
- Varadhan's lemma
- Variable
- Variable kernel density estimation
- Variable-order Bayesian network
- Variable-order Markov model
- Variable rules analysis
- Variance
- Variance decomposition of forecast errors
- Variance gamma process
- Variance inflation factor
- Variance-gamma distribution
- Variance reduction
- Variance-stabilizing transformation
- Variance-to-mean ratio
- Variation ratio
- Variational Bayesian methods
- Variational message passing
- Variogram
- Varimax rotation
- Vasicek model
- VC dimension
- VC theory
- Vector autoregression
- VEGAS algorithm
- Violin plot
- ViSta – Software, see ViSta, The Visual Statistics system
- Voigt profile
- Volatility (finance)
- Volcano plot (statistics)
- Von Mises distribution
- Von Mises–Fisher distribution
- V-optimal histograms
- V-statistic
- Vuong's closeness test
- Vysochanskiï–Petunin inequality
W
[edit]- Wait list control group
- Wald distribution – redirects to Inverse Gaussian distribution
- Wald test
- Wald–Wolfowitz runs test
- Wallenius' noncentral hypergeometric distribution
- Wang and Landau algorithm
- Ward's method
- Watterson estimator
- Watts and Strogatz model
- Weibull chart – redirects to Weibull distribution
- Weibull distribution
- Weibull modulus
- Weight function
- Weighted median
- Weighted covariance matrix – redirects to Sample mean and sample covariance
- Weighted mean
- Weighted sample – redirects to Sample mean and sample covariance
- Welch's method – spectral density estimation
- Welch's t test
- Welch–Satterthwaite equation
- Well-behaved statistic
- Wick product
- Wilks' lambda distribution
- Wilks' theorem
- Winsorized mean
- Whipple's index
- White test
- White noise
- Wide and narrow data
- Wiener deconvolution
- Wiener filter
- Wiener process
- Wigner quasi-probability distribution
- Wigner semicircle distribution
- Wike's law of low odd primes
- Wilcoxon signed-rank test
- Will Rogers phenomenon
- WinBUGS – software
- Window function
- Winpepi – software
- Winsorising
- Wishart distribution
- Wold's theorem
- Wombling
- Working–Hotelling procedure
- World Programming System – software
- Wrapped Cauchy distribution
- Wrapped distribution
- Wrapped exponential distribution
- Wrapped normal distribution
- Wrapped Lévy distribution
- Writer invariant
X
[edit]- X-12-ARIMA
- x̄ chart
- x̄ and R chart
- x̄ and s chart
- XLispStat – software
- XploRe – software
Y
[edit]Z
[edit]See also
[edit]- Supplementary lists
These lists include items that are related to statistics but are not included in this index:
- List of statisticians
- List of important publications in statistics
- List of scientific journals in statistics
- Topic lists
- Outline of statistics
- List of probability topics
- Glossary of probability and statistics
- Glossary of experimental design
- Notation in probability and statistics
- List of probability distributions
- List of graphical methods
- List of fields of application of statistics
- List of stochastic processes topics
- Lists of statistics topics
- List of statistical packages
External links
[edit]- ISI Glossary of Statistical Terms (multilingual), International Statistical Institute
List of statistics articles
from Grokipedia
The list of statistics articles serves as a comprehensive index of key concepts, methods, and subfields within statistics, the branch of mathematics dedicated to collecting, analyzing, interpreting, and drawing conclusions from data in the presence of uncertainty and variability.[1][2] This compilation encompasses foundational topics such as descriptive statistics—including measures of central tendency like the mean, median, and mode, as well as dispersion metrics such as range and standard deviation—probability theory with discrete and continuous distributions, and inferential techniques like hypothesis testing, confidence intervals, and sampling distributions.[3][4][5] It also extends to regression analysis, including linear and multiple regression models, correlation, and analysis of variance (ANOVA), which are essential for modeling relationships in data.[6][7]
Beyond introductory elements, the list highlights advanced and specialized areas that address complex real-world applications, such as Bayesian inference for updating probabilities with new evidence, time series analysis for modeling temporal data, and nonparametric methods for distributions without strong assumptions.[8][9] These topics reflect the interdisciplinary nature of modern statistics, integrating with fields like biostatistics for health research, environmental statistics for ecological modeling, spatial statistics for geographic patterns, and machine learning for high-dimensional data analysis.[8][10] Overall, such a list provides an organized reference for scholars, practitioners, and students navigating the evolution of statistical methodologies from classical foundations to contemporary computational approaches.
where $\phi_1, \dots, \phi_p$ are parameters and $\varepsilon_t$ is Gaussian noise. The MA(q) component incorporates $q$ lagged errors:

$$X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}.$$

Integration through differencing (order $d$) transforms a non-stationary series into a stationary one by applying the operator $(1 - B)^d$, where $B$ is the backshift operator. The full ARIMA(p,d,q) model is specified after identifying orders via autocorrelation and partial autocorrelation functions. Stationarity is assessed using the Augmented Dickey–Fuller (ADF) test, which augments the basic Dickey–Fuller regression with lagged differences to test the null hypothesis of a unit root ($H_0\colon \gamma = 0$):

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{k} \delta_i \,\Delta y_{t-i} + \varepsilon_t,$$

rejecting non-stationarity if the test statistic falls below critical values. These models, developed by Box and Jenkins, are widely used for short-term forecasting in economics and finance due to their parsimony and interpretability. Exponential smoothing methods provide simpler alternatives for forecasting, weighting recent observations more heavily to adapt to changes in level, trend, or seasonality. Simple exponential smoothing updates forecasts as a convex combination of the actual value and the prior forecast:

$$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\,\hat{y}_t,$$

where $\alpha \in (0, 1)$ is the smoothing parameter controlling responsiveness. Holt's method extends this to linear trends by maintaining separate level and slope equations:

$$\ell_t = \alpha y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1}), \qquad b_t = \beta\,(\ell_t - \ell_{t-1}) + (1 - \beta)\, b_{t-1},$$

with forecast $\hat{y}_{t+h} = \ell_t + h\, b_t$. The Holt–Winters approach adds multiplicative or additive seasonality, decomposing the series into level, trend, and seasonal factors updated via additional smoothing parameters, making it suitable for data with periodic patterns like monthly sales. These techniques, originating from Holt's 1957 work and Winters' extension, excel in computational efficiency for real-time applications. Spectral analysis decomposes time series into frequency components to identify dominant cycles, contrasting time-domain methods by revealing periodic structures. The periodogram serves as a nonparametric estimator of the power spectral density, computed as

$$I(f_j) = \frac{1}{n} \left| \sum_{t=1}^{n} y_t \, e^{-2\pi i f_j t} \right|^2$$

for frequencies $f_j = j/n$, $j = 1, \dots, \lfloor n/2 \rfloor$. It highlights peaks corresponding to periodicities but suffers from high variance, often requiring smoothing via Welch's method or multitaper techniques for stability. This approach is particularly valuable for detecting hidden rhythms in geophysical or astronomical data, building on early work by Schuster. Forecasting performance is evaluated using scale-dependent metrics that quantify prediction errors on holdout data. The mean absolute error (MAE) measures average deviation without penalizing direction:

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|,$$

offering interpretability in original units. The root mean squared error (RMSE) emphasizes larger errors through squaring:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2},$$

and is more sensitive to outliers, making it best suited to normally distributed residuals. These metrics guide model selection, with lower values indicating better accuracy, as standardized in forecasting practice.
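To make these updates and error metrics concrete, here is a minimal Python sketch of simple exponential smoothing together with the MAE and RMSE calculations described above. The synthetic series, the choice $\alpha = 0.3$, and the function names are illustrative assumptions, not part of any referenced software; in practice the smoothing parameter is often chosen by minimizing one of these error metrics on holdout data.

```python
import numpy as np

def simple_exponential_smoothing(y, alpha):
    """One-step-ahead forecasts: y_hat[t+1] = alpha * y[t] + (1 - alpha) * y_hat[t]."""
    y_hat = np.empty(len(y))
    y_hat[0] = y[0]  # initialize the first forecast with the first observation
    for t in range(len(y) - 1):
        y_hat[t + 1] = alpha * y[t] + (1 - alpha) * y_hat[t]
    return y_hat

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Synthetic series: a slowly drifting level plus Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
y = 10 + 0.05 * np.arange(100) + rng.normal(scale=0.5, size=100)

forecasts = simple_exponential_smoothing(y, alpha=0.3)
print(f"MAE  = {mae(y, forecasts):.3f}")
print(f"RMSE = {rmse(y, forecasts):.3f}")
```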
Foundational Concepts
Core Definitions
Statistics is the mathematical science concerned with the collection, analysis, interpretation, presentation, and organization of data to make informed decisions or inferences about real-world phenomena.[11] This discipline encompasses two primary branches: descriptive statistics, which involves summarizing and organizing data from a sample or population to describe its main features, such as through measures of central tendency and variability; and inferential statistics, which uses sample data to draw conclusions or make predictions about a larger population, often incorporating probability to account for uncertainty.[12][13] Fundamental terms in statistics include population, referring to the entire set of entities or observations under study; sample, a subset of the population selected for analysis; parameter, a numerical characteristic describing a population, such as its mean or variance; and statistic, a numerical characteristic computed from a sample to estimate a parameter.[14] These concepts were formalized in the late 19th and early 20th centuries, with Karl Pearson playing a pivotal role in establishing modern statistical terminology and methodology through his work in biometrics and probability, including the introduction of systematic distinctions between populations and samples in his 1895 contributions to curve fitting and data analysis.[15] Ronald A. Fisher later refined the term statistic in the 1920s to specifically denote sample-based estimates of population parameters.[16] Data in statistics is classified by type and scale to determine appropriate analytical methods. Qualitative data, also known as categorical data, describes qualities or categories without numerical meaning, such as gender or color, while quantitative data involves numerical values that can be measured or counted, such as height or income.[17] Quantitative data is further divided into discrete types, which take on countable integer values (e.g., number of children), and continuous types, which can assume any value within an interval (e.g., temperature).[18] Scales of measurement, as defined by psychologist Stanley Smith Stevens in 1946, provide a framework for these classifications: nominal scale for unordered categories (e.g., blood type); ordinal scale for ordered categories without equal intervals (e.g., Likert scales); interval scale for ordered numerical data with equal intervals but no true zero (e.g., Celsius temperature); and ratio scale for ordered numerical data with equal intervals and a true zero (e.g., weight).[19] Bias in statistics refers to a systematic error that skews results away from the true value, often arising from flawed data collection or non-representative sampling, leading to consistently inaccurate estimates.[20] In hypothesis testing, errors are categorized as Type I error, the incorrect rejection of a true null hypothesis (false positive), and Type II error, the failure to reject a false null hypothesis (false negative), with their probabilities denoted as α and β, respectively.[21] Precision describes the consistency or reproducibility of measurements, reflecting low variability among repeated observations, whereas accuracy measures how close those measurements are to the true value; high precision does not guarantee accuracy if bias is present, and vice versa.[22] Probability serves as a foundational tool for quantifying uncertainty in these concepts, with deeper exploration in subsequent sections on probability basics.[11]Probability Basics
Probability theory begins with the concept of a random experiment, which is any process or phenomenon whose outcome cannot be predicted with certainty, such as tossing a coin or drawing a card from a deck. The sample space, denoted as , is the set of all possible outcomes of such an experiment, while events are subsets of the sample space representing specific outcomes or collections of outcomes. For instance, in a coin toss, the sample space is , and the event "heads" is the subset containing only that outcome. These foundational elements allow for the modeling of uncertainty in a structured manner. The axioms of probability, formalized by Andrey Kolmogorov in 1933, provide the rigorous mathematical foundation for assigning probabilities to events. The first axiom states that the probability of any event is a non-negative real number: for any event . The second axiom requires that the probability of the entire sample space is 1: . The third axiom, known as countable additivity, asserts that for a countable collection of mutually exclusive events , the probability of their union is the sum of their individual probabilities: . These axioms ensure that probability measures are consistent and applicable to infinite sample spaces. From these axioms, basic rules of probability follow directly. The addition rule for two events and states that , accounting for overlap to avoid double-counting. The multiplication rule for independent events, where the occurrence of one does not affect the other, simplifies to . More generally, for any events, , introducing conditional probability , which is the probability of given that has occurred, defined as when . Events and are independent if . Bayes' theorem, derived from the multiplication rule, reverses conditional probabilities: . This theorem is crucial for updating beliefs based on new evidence. A classic example in medical testing involves a disease affecting 1% of the population, with a test that is 99% accurate (true positive and true negative rates both 99%). If a person tests positive, the probability they have the disease is approximately 50%, not 99%, because the low disease prevalence (prior probability) outweighs the test's accuracy, leading to many false positives. This illustrates base rate neglect and the theorem's practical importance in diagnostics.[23][24] Expected value, variance, and covariance extend these principles to quantify averages and spreads under uncertainty. The expected value (or mean) of an event or quantity is the probability-weighted average, representing the long-run average outcome over many repetitions. For a discrete random variable taking values with probabilities , it is . Variance measures the expected squared deviation from the mean, , capturing dispersion. Covariance between two variables and , , assesses their joint variability; positive values indicate tendency to move together. These measures are probability-weighted summaries essential for risk assessment and modeling dependencies. Random variables formalize outcomes as functions on the sample space, building on event probabilities for numerical analysis.Random Variables and Processes
Random variables provide a formal framework for modeling uncertainty in statistical analysis by associating numerical outcomes with events in a probability space. A random variable is defined as a measurable function from the sample space to the real numbers , where the probability measure on induces a distribution on .[25] This concept extends basic probability by quantifying outcomes through expected values and variability, serving as a foundation for more advanced inferential techniques. Discrete random variables take on a countable set of possible values, such as integers, and are characterized by their probability mass function (PMF), denoted , which satisfies and for all .[26] For example, the number of heads in coin flips is a discrete random variable with a binomial PMF. In contrast, continuous random variables assume values in an uncountable interval of the reals and are described by a probability density function (PDF), , where the probability over an interval is given by the integral , with and .[27] Unlike PMFs, PDFs can exceed 1 but represent densities rather than direct probabilities, as in the uniform distribution over [0,1]. For multiple random variables, such as and , the joint distribution captures their combined behavior. The joint PMF for discrete variables is , while the joint PDF for continuous variables is , satisfying .[28] Marginal distributions are obtained by summing or integrating out the other variable: the marginal PMF of is , and similarly for the PDF .[29] Conditional distributions describe the distribution of one variable given the other, with the conditional PMF for , and analogously for PDFs, enabling analysis of dependencies.[30] Stochastic processes model sequences of random variables indexed by time or another parameter, representing evolving systems under uncertainty. Markov chains are discrete-time stochastic processes where the future state depends only on the current state, not the past, defined by transition probabilities , forming a transition matrix for finite states.[31] They are foundational for modeling systems like queueing or population dynamics. Poisson processes, as continuous-time counterparts, count events occurring randomly at a constant average rate , with interarrival times exponentially distributed; the number of events in interval follows a Poisson distribution with parameter , and increments are independent.[32] Moment-generating functions (MGFs) and characteristic functions facilitate the analysis of random variables by encoding their moments and distributions. The MGF of a random variable is , defined for in some neighborhood of 0 where the expectation exists, and the -th moment is obtained via .[33] MGFs uniquely determine distributions when they exist and simplify convolutions for sums of independent variables, as . Characteristic functions, always defined as where , extend this to all distributions via Fourier transforms and share similar uniqueness and convolution properties, aiding in limit theorems and inversion to recover densities.[34]Descriptive Statistics
Measures of Central Tendency
Measures of central tendency summarize the central or typical value of a dataset, providing a single representative figure for the distribution's location. These measures include the arithmetic mean, geometric mean, harmonic mean, median, mode, weighted mean, and midrange, each suited to different data characteristics and assumptions. They originated in ancient mathematics and gained prominence in astronomy and social sciences for handling observational errors and aggregating data.[35][36] The arithmetic mean, also known as the average, is calculated as the sum of all values divided by the number of observations. For a dataset , the formula is . It assumes equal importance of each value and is widely used in economics for averaging quantities like income or production levels, where additive properties apply. Historically, the arithmetic mean traces back to Pythagorean concepts around 280 BC for proportions in music, but its statistical use emerged in 16th-century astronomy, where observers like Tycho Brahe averaged multiple measurements to minimize errors in planetary positions. By 1755, Thomas Simpson formalized its application in astronomy, and Pierre-Simon Laplace's 1810 central limit theorem justified its reliability for error reduction. In the 19th century, Adolphe Quetelet extended it to social data, such as averaging chest sizes of soldiers to define the "average man."[37][36][35] The geometric mean is the nth root of the product of n positive values, given by . It is appropriate for data involving ratios or growth rates, such as averaging investment returns over time or environmental concentrations spanning orders of magnitude, where it better captures multiplicative effects than the arithmetic mean. For instance, financial analysts use it to compute compound annual growth rates from historical returns. Its origins lie with the Pythagoreans, who termed it the "mean proportional" for geometric constructions like right-triangle altitudes around the 5th century BC, with formalization by Euclid.[37][38][39] The harmonic mean, defined as the reciprocal of the arithmetic mean of the reciprocals, is for positive values. It is useful for averaging rates, such as speeds over equal distances or harmonic means in hydrological statistics for low-flow frequencies in rivers. In music theory, Pythagoras linked it to string length ratios for consonant intervals around 500 BC, as documented by Boethius.[37][40][38] The median is the middle value in an ordered dataset, the 50th percentile, which divides the data into equal halves. For even n, it is the average of the two central values. Unlike the arithmetic mean, the median resists outliers and is preferred for skewed distributions, where it provides a more stable central summary. In right-skewed data, the mean exceeds the median, pulled toward the tail, while in left-skewed data, the mean falls below it. The mode, the most frequent value, identifies the peak in multimodal or categorical data but can be multiple or absent; it aligns with the mean and median in symmetric distributions. Outliers minimally affect the median and mode, making them robust for real-world data with extremes, such as income distributions.[41] Weighted means assign different importance to values via weights , computed as , useful when data points vary in reliability, like survey responses weighted by sample size. The midrange, a positional average, is simply , offering a quick estimate of the center but sensitive to extremes. 
Early uses of such positional averages, including midrange, appeared in Thucydides' 5th-century BC estimates of ship crews by averaging minima and maxima. In astronomy from the 9th to 16th centuries, Arabian and European astronomers employed midrange for celestial observations before generalizing to the arithmetic mean with decimal advancements. These measures complement assessments of data spread, though their selection depends on distribution shape.[42][35]Measures of Variability
Measures of variability, also known as measures of dispersion, quantify the extent to which data points deviate from a central value, such as the mean or median, thereby complementing measures of central tendency to describe the full profile of a dataset's distribution. These metrics are essential in descriptive statistics for assessing data spread, identifying outliers, and facilitating comparisons across different scales or units. Unlike central tendency, which locates the data's center, variability highlights the consistency or heterogeneity within the data, with applications in fields like quality control, finance, and social sciences. The range is the most basic measure of variability, defined as the difference between the maximum and minimum values in a dataset. To calculate it, sort the data to identify the largest value and the smallest value , then compute . This metric provides a quick sense of the data's total spread but is highly sensitive to extreme values, making it less robust for datasets with outliers.[43][44] The interquartile range (IQR) offers a more robust alternative by capturing the spread of the central 50% of the data, excluding potential outliers in the tails. It is calculated as the difference between the third quartile (the 75th percentile) and the first quartile (the 25th percentile): . To find the quartiles, first sort the dataset in ascending order; then, determine at position and at , where is the number of observations, interpolating if the position is not an integer. The IQR is particularly useful for skewed distributions or when paired with the median.[43][44][45] The semi-interquartile range, or quartile deviation, is simply half the IQR, providing a symmetric measure of spread around the median: . This is computed directly after determining the IQR and is often used in older statistical texts or for summarizing variability in small samples without emphasizing the full middle spread.[43][44][45] Variance quantifies the average squared deviation of data points from the mean, emphasizing larger deviations due to the squaring. For a population, it is given by where is the population mean and the total number of observations; to compute it, subtract from each , square the differences, sum them, and divide by . For a sample, an unbiased estimator uses the sample variance with as the sample mean and (Bessel's correction) in the denominator to account for sampling variability. The population version assumes complete data, while the sample version corrects for underestimation in finite samples.[43][44][45] The standard deviation is the square root of the variance, translating the measure back to the original units of the data for intuitive interpretation as the typical deviation from the mean. The population standard deviation is , and the sample standard deviation is , following the same computational steps as variance but taking the positive square root at the end. It is widely used because it aligns with the normal distribution's properties and facilitates probabilistic interpretations.[43][44][45] The coefficient of variation (CV) and relative standard deviation (RSD) normalize variability relative to the mean, enabling comparisons between datasets with different scales or units. The CV is calculated as for samples, where is the sample standard deviation; the RSD is the non-percentage form . 
These are particularly valuable in fields like biology or engineering, where relative consistency matters more than absolute spread, but they are undefined or misleading if the mean is near zero.[43][44] Skewness and kurtosis extend variability measures to the distribution's shape via higher moments, capturing asymmetry and tail behavior beyond second-moment spread. Skewness assesses deviation from symmetry and is computed as the standardized third moment: for a sample; values near zero indicate symmetry, positive values right-skew (longer right tail), and negative values left-skew. To calculate, first standardize each deviation , cube them, average, and interpret relative to the normal distribution's zero skewness.[43][46][45] Kurtosis evaluates the peakedness and tail heaviness relative to a normal distribution, using the standardized fourth moment: (excess kurtosis for samples); zero indicates mesokurtic (normal-like), positive leptokurtic (heavy tails, sharp peak), and negative platykurtic (light tails, flat peak). Computation involves standardizing deviations, raising to the fourth power, applying the bias-corrected formula, and subtracting 3 for excess relative to normal. These shape measures help detect non-normality but require larger samples for reliability.[43][46][45]Data Presentation Techniques
Data presentation techniques in statistics involve tabular and graphical methods to summarize and visualize descriptive statistics from observed data, facilitating pattern recognition and interpretation. These approaches transform raw numerical data into accessible formats that highlight distributions, frequencies, and relationships without inferring underlying probabilities. Common methods include tables for categorical summaries and charts for continuous data representations, ensuring clarity and minimizing distortion in communication.Tables
Frequency distribution tables organize data by grouping values into classes or categories and counting occurrences within each, providing a foundational summary of data spread. For instance, in a dataset of exam scores, classes might range from 0-10, 11-20, up to 91-100, with counts indicating how many scores fall into each interval.[47][48] This method is particularly useful for large datasets, as it condenses information while preserving ordinal structure.[49] Cumulative frequency tables extend frequency distributions by accumulating counts progressively, showing the total number of observations up to a given class boundary. Construction involves adding each class's frequency to the sum of all preceding frequencies; for example, if frequencies are 5, 10, and 15 for classes A, B, and C, cumulative values become 5, 15, and 30.[50] This cumulative approach aids in identifying percentiles and overall data accumulation.[51] Cross-tabulations, also known as contingency tables, display joint frequencies for two or more categorical variables in a matrix format, revealing associations between them. Rows and columns represent categories of each variable, with cell entries showing counts for their intersections; margins provide row and column totals.[52] For example, a table might cross-tabulate gender (rows) and product preference (columns) to show purchase frequencies.[53] These tables support initial exploration of dependencies in multivariate categorical data.Charts
Histograms represent the distribution of continuous data by dividing the range into bins and plotting bar heights proportional to frequencies within each bin. Construction begins with selecting bin width, often guided by Sturges' rule, which approximates the optimal number of bins as , where is the sample size, to balance detail and smoothness.[54] Bins should cover the data range evenly, starting from the minimum value, with adjacent bars touching to indicate continuity.[55] Box plots, or box-and-whisker plots, summarize data distribution through quartiles, median, and potential outliers in a compact graphical form. To construct one, calculate the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum; draw a box from Q1 to Q3 with a line at Q2, extending whiskers to the minimum and maximum (or 1.5 times the interquartile range if outliers exist).[56][57] This method effectively visualizes central tendency, variability, and skewness from the same dataset. Pie charts depict proportional relationships in categorical data as sectors of a circle, where each slice's angle corresponds to its frequency relative to the total. Angles are computed as , with as the category frequency and the total; sectors are drawn clockwise from a reference line, labeled with percentages or values.[58] Guidelines recommend limiting slices to five or fewer for readability and avoiding 3D effects that distort perceptions.[59]Plots for Small Datasets
Stem-and-leaf plots retain the original data values while organizing them hierarchically, suitable for datasets up to about 50 observations. The "stem" comprises leading digits, and the "leaf" the trailing digit(s); for example, the value 23 appears as stem 2 with leaf 3, listed in ascending order per stem.[60] This format allows quick assessment of shape, central tendency, and outliers without losing precision.[61] Dot plots display individual data points as stacked dots along a number line, ideal for small samples to reveal exact values, clusters, and gaps. Each unique value is marked on the axis, with dots stacked vertically for multiples; for instance, three observations at 5 would show three dots above 5.[62] They emphasize frequency without binning artifacts, though overlap can obscure details in denser areas.[63]Principles of Effective Visualization
Effective data presentation prioritizes clarity and integrity, with Edward Tufte's data-ink ratio advocating maximization of ink (or pixels) dedicated to data itself relative to non-essential elements. The ratio is defined as the proportion of total ink representing data, calculated as ; high ratios (ideally approaching 1) are achieved by erasing redundant lines, frames, and decorations.[64] This principle, from Tufte's seminal work, ensures visualizations convey information efficiently without distraction. Other guidelines include aligning scales linearly, labeling axes clearly, and avoiding chartjunk like excessive colors or gridlines that dilute focus. These techniques often draw on measures of central tendency and variability to scale axes and highlight key features.Probability Theory
Probability Distributions
Probability distributions provide a mathematical description of the likelihood of different outcomes for a random variable, extending the foundational framework of random variables by specifying exact probability assignments to values or intervals.[65] These distributions are broadly categorized into discrete and continuous families, each with parametric forms that define their probability mass functions (PMFs) for discrete cases or probability density functions (PDFs) for continuous cases, along with key properties such as mean and variance.[65] Seminal developments in these distributions trace back to early probability theory, with discrete forms often arising from counting processes and continuous ones from limiting behaviors in natural phenomena.[65]Discrete Distributions
Discrete probability distributions model random variables that take on a countable number of distinct values, typically integers, with probabilities given by a PMF summing to 1.[65] Key examples include the Bernoulli, binomial, Poisson, geometric, and negative binomial distributions, each suited to specific counting scenarios such as trials or events.[65] The Bernoulli distribution, introduced by Jacob Bernoulli in his 1713 work Ars Conjectandi, represents a single trial with two outcomes. Its PMF is where is the success probability, with mean and variance .[65] The binomial distribution generalizes the Bernoulli to independent trials and was also formalized in Bernoulli's Ars Conjectandi. Its PMF is with mean and variance ; higher moments include the third central moment .[65] The Poisson distribution, derived by Siméon Denis Poisson in 1837 for rare events, has PMF mean and variance both , and third central moment .[65] Its factorial moments are .[65] The geometric distribution counts trials until the first success, with PMF mean and variance .[65] The negative binomial distribution extends this to failures before successes, with PMF mean and variance .[65] The following table summarizes the PMFs, means, and variances for these discrete distributions:| Distribution | PMF | Mean | Variance |
|---|---|---|---|
| Bernoulli | $p^x (1-p)^{1-x}$, $x \in \{0, 1\}$ | $p$ | $p(1-p)$ |
| Binomial | $\binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, \dots, n$ | $np$ | $np(1-p)$ |
| Poisson | $\dfrac{\lambda^x e^{-\lambda}}{x!}$, $x = 0, 1, 2, \dots$ | $\lambda$ | $\lambda$ |
| Geometric | $(1-p)^{x-1} p$, $x = 1, 2, \dots$ | $1/p$ | $(1-p)/p^2$ |
| Negative Binomial | $\binom{x+r-1}{x} p^r (1-p)^x$, $x = 0, 1, 2, \dots$ | $r(1-p)/p$ | $r(1-p)/p^2$ |
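As a quick sanity check on the tabulated moments, the following sketch compares the means and variances reported by `scipy.stats` with the closed-form expressions. It assumes SciPy is installed, and the parameter values ($n = 10$, $p = 0.3$, $\lambda = 4$, $r = 5$) are arbitrary choices for illustration; note that SciPy's `geom` counts trials until the first success and `nbinom` counts failures before the $r$-th success, matching the conventions in the table.

```python
from scipy import stats

# Arbitrary parameter values, chosen only to illustrate the formulas.
n, p, lam, r = 10, 0.3, 4.0, 5

checks = {
    "Bernoulli":         (stats.bernoulli(p), p,               p * (1 - p)),
    "Binomial":          (stats.binom(n, p),  n * p,           n * p * (1 - p)),
    "Poisson":           (stats.poisson(lam), lam,             lam),
    "Geometric":         (stats.geom(p),      1 / p,           (1 - p) / p**2),
    "Negative binomial": (stats.nbinom(r, p), r * (1 - p) / p, r * (1 - p) / p**2),
}

for name, (dist, mean_formula, var_formula) in checks.items():
    m, v = dist.stats(moments="mv")  # mean and variance from the library
    print(f"{name:>17}: mean {float(m):.4f} vs {mean_formula:.4f}, "
          f"variance {float(v):.4f} vs {var_formula:.4f}")
```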
Continuous Distributions
Continuous probability distributions apply to random variables over uncountable intervals, characterized by a PDF where the integral over any interval gives the probability.[65] Prominent families include the uniform, normal, exponential, gamma, chi-squared, t, and F distributions, widely used in modeling measurements, waiting times, and statistical tests.[65] The uniform distribution on assumes equal density, with PDF mean and variance .[65] The normal (Gaussian) distribution, central to statistics and first systematically studied by Carl Friedrich Gauss in 1809, has PDF mean and variance ; odd central moments are zero, and even moments are .[65] The exponential distribution models inter-arrival times, with PDF mean and variance ; moments are .[65] The gamma distribution generalizes the exponential for shape and rate , with PDF mean and variance ; moments .[65] The chi-squared distribution with degrees of freedom, arising from sums of squared normals, has PDF mean and variance ; moments .[65] The Student's t-distribution with degrees of freedom, introduced by William Sealy Gosset in 1908 under the pseudonym "Student," has PDF mean 0 (for ) and variance (for ).[65] The F-distribution with degrees of freedom and , used in variance ratio tests and derived by Ronald Fisher in 1924, has PDF mean (for ) and variance (for ).[65] The table below summarizes the PDFs, means, and variances for these continuous distributions:| Distribution | Mean | Variance | |
|---|---|---|---|
| Uniform | $\dfrac{1}{b-a}$, $a \le x \le b$ | $\dfrac{a+b}{2}$ | $\dfrac{(b-a)^2}{12}$ |
| Normal | $\dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$, $x \in \mathbb{R}$ | $\mu$ | $\sigma^2$ |
| Exponential | $\lambda e^{-\lambda x}$, $x \ge 0$ | $1/\lambda$ | $1/\lambda^2$ |
| Gamma | $\dfrac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}$, $x > 0$ | $\alpha/\beta$ | $\alpha/\beta^2$ |
| Chi-squared | $\dfrac{1}{2^{k/2}\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}$, $x > 0$ | $k$ | $2k$ |
| t | $\dfrac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \dfrac{x^2}{\nu}\right)^{-(\nu+1)/2}$, $x \in \mathbb{R}$ | 0 ($\nu > 1$) | $\dfrac{\nu}{\nu-2}$ ($\nu > 2$) |
| F | See full PDF above | $\dfrac{d_2}{d_2 - 2}$ ($d_2 > 2$) | Complex (see text) |
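For the continuous families, a brief simulation illustrates how the sample moments of a large random sample approach the tabulated values, here for the gamma distribution in the shape–rate parameterization used above. The parameter values are arbitrary, and the sketch assumes NumPy's shape–scale convention for its sampler (scale $= 1/\text{rate}$).

```python
import numpy as np

# Shape-rate parameterization as in the table above; values are illustrative.
alpha, beta = 3.0, 2.0          # shape alpha, rate beta
rng = np.random.default_rng(42)

# NumPy's gamma sampler takes (shape, scale), where scale = 1 / rate.
sample = rng.gamma(shape=alpha, scale=1.0 / beta, size=1_000_000)

print(f"sample mean     {sample.mean():.4f}  vs  alpha/beta    = {alpha / beta:.4f}")
print(f"sample variance {sample.var():.4f}  vs  alpha/beta**2 = {alpha / beta**2:.4f}")
```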
Multivariate Extensions
Multivariate distributions extend univariate forms to vectors of random variables, capturing joint behaviors and dependencies.[65] The multivariate normal distribution, building on the univariate normal and formalized in early 20th-century work by researchers like Karl Pearson, has joint PDF for -dimensional , with mean vector and covariance matrix ; marginals and conditionals are also normal.[65] Moments derive from the characteristic function, emphasizing its role in linear models and dimensionality reduction.[65]Stochastic Processes
Stochastic processes model random phenomena that evolve over time or space, extending the framework of probability distributions to sequences of interrelated random variables. Unlike independent and identically distributed (i.i.d.) random variables covered in probability distributions, stochastic processes capture dependencies, such as temporal correlations, where the state at one time influences future states. These processes are fundamental in statistics for analyzing systems like population dynamics, financial markets, and physical systems subject to randomness. Key properties include the marginal distributions at individual times, which follow standard probability distributions, but the joint behavior over multiple times defines the process type. Markov processes form a cornerstone of stochastic process theory, characterized by the Markov property: the conditional distribution of future states depends only on the current state, independent of prior history. Discrete-time Markov chains, introduced by Andrey Markov in his 1906 work on dependent quantities, model sequences where transitions are governed by a stochastic transition matrix , with entries , satisfying for each state .[66] Continuous-time Markov processes generalize this to time , using infinitesimal generator matrices or transition rates, as formalized in early developments by Kolmogorov in the 1930s. These processes enable analysis of long-run behavior, such as stationary distributions solving with , widely applied in reliability and genetics. Renewal processes describe counting processes , the number of events up to time , where inter-event times are i.i.d. positive random variables, generalizing the Poisson process to arbitrary distributions. The renewal function satisfies the renewal equation , where is the interarrival cumulative distribution function; Feller's 1941 integral equation analysis established key asymptotic results, including the elementary renewal theorem as , with .[67] In queueing theory, renewal processes underpin models like the M/M/1 queue, a single-server system with Poisson arrivals (rate ) and exponential service times (rate ); Kendall's 1953 embedded Markov chain method derived the steady-state queue length distribution , where , providing metrics like mean queue length .[68] Continuous stochastic processes include the Wiener process, also known as Brownian motion, a Gaussian process with independent, normally distributed increments and continuous paths. Defined rigorously by Norbert Wiener in 1923, it starts at 0, has , stationary increments for , and almost surely nowhere differentiable paths, serving as a building block for diffusion processes via the stochastic differential equation . Stationarity in stochastic processes refers to invariance of finite-dimensional distributions under time shifts, with weak stationarity requiring constant mean and autocovariance depending only on lag ; the autocorrelation function quantifies linear dependence at lag . Ergodicity, established by Birkhoff's 1931 pointwise ergodic theorem, ensures that for an ergodic measure-preserving transformation, time averages converge almost surely to the expectation under the invariant measure: as , linking sample paths to ensemble properties.Limit Theorems
Limit theorems in probability and statistics provide foundational results for understanding the behavior of sums and averages of random variables as the sample size grows large. These theorems justify the use of asymptotic approximations, which are essential for deriving the limiting distributions that underpin much of statistical inference. Key results include convergence of sample means to population parameters and normalization leading to standard distributions, under specified conditions on the underlying random variables. The law of large numbers (LLN) asserts that the sample average of independent and identically distributed (i.i.d.) random variables with finite expectation converges to the true mean. The weak law of large numbers (WLLN), first rigorously established by Chebyshev using Markov's inequality, states that for i.i.d. random variables with and finite variance, the sample mean satisfies as , where denotes convergence in probability. The strong law of large numbers (SLLN), proved by Kolmogorov under the condition of finite expectation, strengthens this to almost sure convergence: with probability 1. This result holds more generally for i.i.d. sequences without variance assumptions, and the almost sure convergence implies consistency of the sample mean as an estimator of the population mean in the sense of probability convergence. The central limit theorem (CLT) describes the asymptotic normality of appropriately scaled sums of random variables. For i.i.d. random variables with mean and finite positive variance , the standardized sum converges in distribution to a standard normal random variable: . More general versions apply to independent but non-identically distributed sequences satisfying the Lindeberg-Feller conditions: for triangular arrays of row-wise independent random variables with zero means and variances summing to 1, the Lindeberg condition requires that for every , as , which is necessary and sufficient for the row sums to converge in distribution to . The Berry-Esseen theorem quantifies the rate of this convergence, bounding the supremum distance between the cumulative distribution function of the standardized sum and that of the standard normal by , where and is a universal constant (Esseen's original bound used , improved to in recent works).[69] The delta method extends the CLT to functions of asymptotically normal estimators. If and is a differentiable function at with , then . [70] This approximation arises from a first-order Taylor expansion of around . Slutsky's theorem facilitates operations on convergent sequences of random variables. If and (a constant), then and ; more generally, if , then under joint convergence conditions, products and quotients preserve distributional limits provided the limiting quantity is continuous. The continuous mapping theorem complements this by stating that if and is continuous at points in the support of (or almost everywhere), then . These results, often used in tandem, enable the derivation of limiting distributions for complex statistics composed of simpler convergent components.Inferential Statistics
Sampling and Estimation
- Simple random sample
- Stratified sampling
- Cluster sampling
- Systematic sampling
- Point estimation
- Method of moments[71]
- Maximum likelihood estimation
- Unbiased estimator
- Efficient estimator
- Sufficient statistic
- Bias–variance tradeoff
- Cramér–Rao bound
Hypothesis Testing
- Null hypothesis
- Alternative hypothesis
- Test statistic
- P-value
- Significance level
- Type I and Type II errors
- Z-test
- Student's t-test
- Chi-squared test
- F-test
- Statistical power
- Sample size determination
- Multiple comparisons problem
- Bonferroni correction
- False discovery rate
- Benjamini–Hochberg procedure
Confidence Intervals
- Confidence interval
- Pivotal quantity
- Normal confidence interval for the mean
- Student's t confidence interval
- Wald interval (for proportions)
- Chi-squared confidence interval for variance
- Confidence interval for difference of means
- Bootstrap confidence interval
Regression and Correlation
Linear Models
Linear models provide a fundamental framework in statistics for modeling the relationship between a dependent variable and one or more independent variables, assuming the relationship is linear. The core approach minimizes the sum of squared differences between observed and predicted values, known as ordinary least squares (OLS) estimation, which yields unbiased and efficient parameter estimates under specified conditions.[72] These models are widely applied in fields such as economics, biology, and engineering to predict outcomes and infer relationships.[72]

Simple linear regression models the response as a function of a single predictor via the equation $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ represents the random error term assumed to have mean zero.[72] The least squares estimates are $\hat{\beta}_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \big/ \sum_i (x_i - \bar{x})^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, minimizing the residual sum of squares $\sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$.[72] This technique originated in astronomy for orbit determination, with Adrien-Marie Legendre publishing the first formal description in 1805 in his work Nouvelles méthodes pour la détermination des orbites des comètes.[73] Carl Friedrich Gauss later provided a probabilistic justification in 1809, emphasizing the normality of errors.[74]

Multiple linear regression extends this to several predictors, expressed in matrix notation as $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{y}$ is the response vector, $X$ is the design matrix (including a column of ones for the intercept), $\boldsymbol{\beta}$ is the parameter vector, and $\boldsymbol{\varepsilon}$ is the error vector.[72] The OLS estimator is $\hat{\boldsymbol{\beta}} = (X^{\mathsf T}X)^{-1}X^{\mathsf T}\mathbf{y}$, assuming $X$ has full column rank.[72] This formulation was advanced by George Udny Yule in 1907, who developed the theory of correlation and regression for multiple variables using a notation that facilitated partial coefficients.[75]

Valid inference in linear models relies on key assumptions: linearity in parameters, independence of errors, homoscedasticity (constant variance), and normality of errors for certain tests.[72] The Gauss-Markov theorem establishes that OLS estimators are the best linear unbiased estimators (BLUE) if errors have zero mean, constant variance, and are uncorrelated, without requiring normality. Model diagnostics involve examining residuals $e_i = y_i - \hat{y}_i$, such as plotting residuals against fitted values to detect non-linearity or heteroscedasticity, and against predictors to check independence; patterns like funnels indicate violations.[72]

Analysis of variance (ANOVA) serves as a special case of linear models when predictors are categorical, partitioning total variance into components explained by the model and residual error.[72] The overall significance is tested using the F-statistic, $F = \mathrm{MSR}/\mathrm{MSE}$, where MSR is the mean square regression and MSE is the mean square error, following an F-distribution under the null hypothesis of no predictor effects.[72] Ronald A. Fisher introduced ANOVA in his 1925 book Statistical Methods for Research Workers to analyze experimental designs in agriculture.[76] Linear models build on measures of association like correlation by enabling predictive modeling and coefficient interpretation.[75]
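A minimal numerical sketch of the OLS estimator $\hat{\boldsymbol{\beta}} = (X^{\mathsf T}X)^{-1}X^{\mathsf T}\mathbf{y}$ follows; the simulated data and coefficient values are hypothetical, chosen only to show the matrix computation and a basic residual check.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulate y = 2 + 3*x1 - 1.5*x2 + noise (hypothetical coefficients).
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2 + 3 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column of ones.
X = np.column_stack([np.ones(n), x1, x2])

# OLS fit; lstsq is numerically safer than forming (X'X)^{-1} explicitly.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals and the usual error-variance estimate s^2 = RSS / (n - p).
residuals = y - X @ beta_hat
s2 = residuals @ residuals / (n - X.shape[1])

print("beta_hat:", np.round(beta_hat, 3))
print("sigma^2 estimate:", round(float(s2), 3))
```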
Nonlinear and Generalized Models

Nonlinear least squares methods extend the principles of ordinary least squares to models where the relationship between predictors and the response is inherently nonlinear, requiring iterative optimization techniques to minimize the sum of squared residuals. Unlike linear models, which assume a straight-line relationship and can be solved in closed form, nonlinear models, such as those describing logistic growth, demand numerical approximation algorithms like the Gauss-Newton or Levenberg-Marquardt methods to estimate parameters. A classic example is the logistic growth model, given by $f(x) = \frac{L}{1 + e^{-k(x - x_0)}}$, where $L$ is the curve's maximum value, $k$ the growth rate, and $x_0$ the midpoint, often fitted to population or epidemic data to capture saturation effects. This approach, detailed in Bates and Watts (1988), emphasizes the importance of parameter identifiability and curvature in the parameter space to ensure stable convergence and reliable inference.

Generalized linear models (GLMs) provide a flexible framework for regression analysis when the response variable does not follow a normal distribution, generalizing the linear model by incorporating a link function that connects the linear predictor to the mean of the response distribution. Introduced by Nelder and Wedderburn (1972), GLMs unify various regression types under exponential family distributions, with the linear predictor $\eta = X\boldsymbol{\beta}$ related to the mean via $g(\mu) = \eta$, where $g$ is the link function. For binary outcomes, the logit link in logistic regression models probabilities as $\log\frac{p}{1 - p} = X\boldsymbol{\beta}$, enabling odds ratio interpretations for predictor effects. Poisson regression, using the log link $\log \mu = X\boldsymbol{\beta}$, suits count data like event occurrences, assuming variance equals the mean. The comprehensive treatment in McCullagh and Nelder (1989) established GLMs as a cornerstone for handling diverse data types, from binomial to gamma distributions, through maximum likelihood estimation via iteratively reweighted least squares.[77]

Quasi-likelihood methods address limitations in full likelihood-based inference for GLMs, particularly overdispersion where the variance exceeds that implied by the model, such as in count data with extra variability due to clustering or unobserved heterogeneity. Developed by Wedderburn (1974), quasi-likelihood estimates parameters by solving estimating equations that mimic the score equations of a full likelihood but without specifying the full distribution, using only the mean-variance relationship $\operatorname{Var}(Y) = \phi\, V(\mu)$, where $\phi$ is a dispersion parameter. This approach yields consistent estimators for the mean parameters even under model misspecification and allows robust standard errors to account for overdispersion, as extended in McCullagh (1983). In practice, quasi-Poisson models scale the Poisson variance by $\phi$, providing a simple adjustment for real-world data deviations without altering the link function.[78]

Model selection in nonlinear and generalized models often relies on information criteria that balance goodness-of-fit with model complexity to prevent overfitting. The Akaike Information Criterion (AIC), proposed by Akaike (1973), estimates relative model quality as $\mathrm{AIC} = -2\ell + 2k$, where $\ell$ is the maximized log-likelihood and $k$ the number of parameters, favoring parsimonious models with predictive power. The Bayesian Information Criterion (BIC), introduced by Schwarz (1978), imposes a stronger penalty on complexity via $\mathrm{BIC} = -2\ell + k \log n$, with $n$ the sample size, making it asymptotically consistent for selecting the true model under certain conditions.
These criteria, applied post-estimation in GLM software, guide choices among candidate link functions or nonlinear forms, prioritizing those with the lowest value for out-of-sample performance.[79]
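The sketch below illustrates, under assumed simulated binary data, the iteratively reweighted least squares (IRLS) fit of a logistic GLM together with an AIC/BIC comparison of two candidate predictor sets; the function name and data-generating coefficients are illustrative, not taken from the cited references.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    """Logistic-regression GLM fit via iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))        # inverse logit link
        W = mu * (1.0 - mu)                     # working weights
        z = eta + (y - mu) / W                  # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    loglik = np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
    return beta, loglik

rng = np.random.default_rng(7)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x1)))    # x2 is irrelevant by construction
y = rng.binomial(1, p)

candidates = {"x1 only": np.column_stack([np.ones(n), x1]),
              "x1 + x2": np.column_stack([np.ones(n), x1, x2])}
for name, X in candidates.items():
    beta, loglik = fit_logistic_irls(X, y)
    k = X.shape[1]
    aic, bic = -2 * loglik + 2 * k, -2 * loglik + k * np.log(n)
    print(f"{name}: AIC = {aic:.1f}  BIC = {bic:.1f}")
```

With the irrelevant predictor included, both criteria should typically favor the smaller model, with BIC penalizing the extra parameter more heavily.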
Correlation and Association

Correlation and association in statistics refer to measures quantifying the strength and direction of relationships between variables, essential for understanding joint variability without implying causation. These metrics typically range from -1 to +1, where values near 0 indicate no linear or monotonic association, positive values suggest direct relationships, and negative values indicate inverse ones. They form the foundation for more advanced modeling techniques, such as those explored in linear models where regression coefficients serve as related predictive metrics.

The Pearson correlation coefficient, denoted $r$, assesses the degree of linear dependence between two continuous variables $X$ and $Y$ in a sample of size $n$. It is computed as
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},$$
where $\bar{x}$ and $\bar{y}$ are the sample means; this formula normalizes the covariance by the product of standard deviations, yielding a dimensionless measure invariant to linear transformations. Introduced by Karl Pearson in 1895, it assumes bivariate normality and linearity for optimal interpretation, with larger values of $|r|$ indicating stronger associations in applied contexts.[80]

Significance testing for the Pearson coefficient typically employs a t-test to evaluate the null hypothesis of no correlation ($\rho = 0$), using the statistic $t = r\sqrt{\frac{n - 2}{1 - r^2}}$, which follows a Student's t-distribution with $n - 2$ degrees of freedom under the null; p-values below 0.05 reject the null for most studies, though effect size via $r^2$ provides substantive insight beyond significance. This approach, formalized by R.A. Fisher in 1921, accounts for sample size to avoid overinterpreting small correlations in large datasets.

For ordinal or non-normally distributed data, rank-based measures like Spearman's rho and Kendall's tau offer robust alternatives by evaluating monotonic associations rather than strict linearity. Spearman's rank correlation coefficient, $r_s$, replaces raw values with ranks and applies the Pearson formula to these ranks, effectively capturing nonlinear but consistently increasing or decreasing relationships; it equals 1 for perfect monotonicity and is less sensitive to outliers than Pearson's $r$. Developed by Charles Spearman in 1904, it is widely used in psychology and biology for ranked data, with significance assessed via permutation tests or t-approximations.[81] Kendall's tau, $\tau$, quantifies concordance between rankings by counting concordant and discordant pairs: $\tau = \frac{n_c - n_d}{n(n - 1)/2}$, where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs, with tie-corrected variants such as $\tau_b$ adjusting the denominator for ties; it emphasizes pairwise agreements, making it suitable for small samples or sparse data. Proposed by Maurice Kendall in 1938, tau is computationally intensive but provides a probability-based interpretation, with values near 0.3 common in social sciences for moderate associations, and exact p-values available by enumeration for small samples.[82]

Partial correlation extends bivariate measures by isolating the association between two variables while controlling for one or more covariates, computed as the Pearson correlation of residuals from regressing each variable on the controls. This isolates direct linear relationships, useful in confounding-heavy fields like epidemiology; for variables $X$ and $Y$ controlling for $Z$, it equals $r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ}\,r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$. Introduced by George Udny Yule in 1907, it assumes multivariate normality and is tested similarly to Pearson's $r$ via t-statistics.
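SciPy provides direct implementations of these bivariate measures; the brief sketch below, with hypothetical monotone but nonlinear data, contrasts Pearson's $r$ with the rank-based Spearman and Kendall coefficients.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 5, size=100)
y = np.exp(x) + rng.normal(scale=5.0, size=100)   # monotone but strongly nonlinear

r, p_r = stats.pearsonr(x, y)        # linear association
rho, p_rho = stats.spearmanr(x, y)   # monotonic association on ranks
tau, p_tau = stats.kendalltau(x, y)  # concordant vs. discordant pairs

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```

Because the relationship is monotone but not linear, the rank-based coefficients will typically sit closer to 1 than Pearson's $r$, illustrating the distinction drawn above.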
Multiple correlation, denoted $R$, generalizes to the strength of linear association between one variable and a set of others, equivalent to the correlation between the dependent variable and its linear prediction from the independents; $R^2$ represents the proportion of variance explained. Karl Pearson formalized this in 1896 within multivariate normal theory, with significance via F-tests comparing $R^2$ to zero, critical in variable selection where a large $R^2$ signals strong predictability.

For categorical data in contingency tables, association measures like the phi coefficient and Cramér's V assess dependence without assuming continuity. The phi coefficient, $\phi$, for 2×2 tables is the Pearson correlation of the binary indicators, with magnitude $|\phi| = \sqrt{\chi^2 / n}$, where $\chi^2$ is the chi-squared statistic and $n$ the sample size; it ranges from -1 to 1 and equals Pearson's $r$ for dichotomous variables. Derived by Karl Pearson in 1900 as part of chi-squared development, it detects nominal associations with significance from chi-squared tests. Cramér's V extends phi to larger $r \times c$ tables as $V = \sqrt{\frac{\chi^2}{n\,\min(r - 1,\, c - 1)}}$, normalizing to [0,1] for asymmetric tables and providing a symmetric strength measure independent of table dimensions. Introduced by Harald Cramér in 1946, it is preferred in sociology for multi-category data, with values above 0.15 indicating notable associations, and p-values from the chi-squared test with $(r - 1)(c - 1)$ degrees of freedom.
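The categorical measures follow directly from a contingency table's chi-squared statistic; a short sketch with a made-up 2×3 table of counts is shown below.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table of counts.
table = np.array([[30, 25, 45],
                  [20, 35, 25]])

chi2, p, dof, expected = chi2_contingency(table)
n = table.sum()
r, c = table.shape

# V = sqrt( chi2 / (n * min(r-1, c-1)) )
cramers_v = np.sqrt(chi2 / (n * (min(r, c) - 1)))
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}, Cramér's V = {cramers_v:.3f}")
```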
Multivariate and Advanced Analysis
Multivariate Statistics
Multivariate statistics encompasses methods for analyzing datasets with multiple interdependent variables, extending univariate techniques to capture joint distributions and relationships. These approaches often assume multivariate normality, where variables follow a joint normal distribution characterized by a mean vector and covariance matrix, enabling generalizations of classical tests and estimators. Key applications include dimensionality reduction to simplify high-dimensional data while preserving variance, association measures between variable sets, and hypothesis testing for multivariate means, all of which are foundational in fields like psychometrics, econometrics, and bioinformatics.

Principal component analysis (PCA) is a seminal technique for dimensionality reduction, transforming correlated variables into a set of uncorrelated principal components ordered by decreasing variance. Introduced by Karl Pearson in 1901, PCA identifies orthogonal directions (principal axes) that maximize the variance captured from the data, allowing retention of the most informative components while discarding noise. The method computes the eigenvectors and eigenvalues of the data's covariance matrix $\Sigma$, where the first principal component corresponds to the eigenvector with the largest eigenvalue $\lambda_1$, explaining the proportion $\lambda_1 / \sum_j \lambda_j$ of total variance. The component scores are $T = XW$, where $X$ is the centered data matrix and $W$ is the matrix of eigenvectors, yielding coordinates in the reduced space.

Factor analysis complements PCA by modeling observed variables as linear combinations of underlying latent factors plus unique errors, assuming a smaller number of factors explain correlations. Developed by Charles Spearman in 1904 to identify a general intelligence factor from test scores, it decomposes the covariance matrix as $\Sigma = \Lambda\Lambda^{\mathsf T} + \Psi$, where $\Lambda$ is the factor loading matrix and $\Psi$ is diagonal with specific variances. Unlike PCA's emphasis on total variance, factor analysis focuses on common variance, often using maximum likelihood estimation under normality assumptions.

Canonical correlation analysis (CCA) extends correlation to pairs of multivariate sets, seeking linear combinations that maximize their correlation. Harold Hotelling formalized CCA in 1936, deriving canonical variates $u = a^{\mathsf T}x$ and $v = b^{\mathsf T}y$ from two data matrices $X$ and $Y$ to maximize $\operatorname{corr}(u, v)$, with subsequent pairs orthogonal to prior ones. The canonical correlations are the square roots of the eigenvalues of $\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}$. Linear discriminant analysis (LDA), proposed by Ronald A. Fisher in 1936, applies similar projections for classification, finding directions that maximize the ratio of between-class to within-class scatter, ideal for separating groups in taxonomic problems like iris species differentiation. LDA assumes multivariate normality within classes with equal covariances, yielding decision boundaries as linear functions of the features.

Hotelling's $T^2$ statistic generalizes the univariate t-test to multivariate means, testing hypotheses like equality of group means under normality. Harold Hotelling introduced it in 1931 as the generalization of Student's ratio, defined for a sample mean $\bar{x}$ as $T^2 = n(\bar{x} - \mu_0)^{\mathsf T} S^{-1} (\bar{x} - \mu_0)$, where $S$ is the sample covariance and $n$ is the sample size; it follows a scaled F-distribution under the null. The Mahalanobis distance extends this idea to measure dissimilarity between a point and a distribution, accounting for variable correlations via the inverse covariance: $D_M(x) = \sqrt{(x - \mu)^{\mathsf T}\Sigma^{-1}(x - \mu)}$. P.C. Mahalanobis developed it in 1936 for anthropometric studies, providing a scale-invariant metric superior to Euclidean distance for elliptical distributions.
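A compact sketch of PCA via the eigendecomposition of the sample covariance matrix follows, using hypothetical correlated data; in practice the SVD of the centered data matrix is usually preferred numerically, but the eigen route mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical correlated 3-variable data.
cov = np.array([[1.0, 0.8, 0.3],
                [0.8, 1.0, 0.4],
                [0.3, 0.4, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=300)

# Center, form the sample covariance, and eigendecompose it.
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)           # ascending order for symmetric S
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()            # proportion of variance per component
scores = Xc @ eigvecs                          # T = XW, coordinates in component space

print("explained variance ratios:", np.round(explained, 3))
print("first two rows of scores:\n", np.round(scores[:2, :2], 3))
```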
Assessing multivariate normality is crucial, as many methods rely on it; key tests include Mardia's measures of skewness and kurtosis from 1970, which quantify deviations from zero skewness and from kurtosis equal to $p(p + 2)$ for $p$ dimensions under normality. Mardia's skewness is given by $b_{1,p} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left[(x_i - \bar{x})^{\mathsf T} S^{-1} (x_j - \bar{x})\right]^3$, with the test statistic $n\,b_{1,p}/6$ following approximately a chi-squared distribution with $p(p + 1)(p + 2)/6$ degrees of freedom. The kurtosis measure is $b_{2,p} = \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - \bar{x})^{\mathsf T} S^{-1} (x_i - \bar{x})\right]^2$, and the excess $b_{2,p} - p(p + 2)$ follows approximately a normal distribution with mean 0 and variance $8p(p + 2)/n$, enabling joint tests. Other influential tests, like Henze-Zirkler's, provide affine-invariant alternatives, but Mardia's remains widely adopted for its simplicity and power against common alternatives.[83][84]
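Mardia's measures can be computed with a few lines of linear algebra; the sketch below (simulated normal data, so neither statistic should signal departure from normality) follows the skewness and kurtosis definitions given above, using the maximum-likelihood covariance estimate as is conventional for these statistics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, p = 200, 3
X = rng.normal(size=(n, p))                 # multivariate normal by construction

Xc = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(Xc, rowvar=False, bias=True))
G = Xc @ S_inv @ Xc.T                       # entries (x_i - xbar)' S^{-1} (x_j - xbar)

b1 = (G ** 3).sum() / n**2                  # Mardia's multivariate skewness
b2 = (np.diag(G) ** 2).mean()               # Mardia's multivariate kurtosis

skew_stat = n * b1 / 6.0
skew_df = p * (p + 1) * (p + 2) / 6.0
skew_p = stats.chi2.sf(skew_stat, df=skew_df)

kurt_z = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
kurt_p = 2 * stats.norm.sf(abs(kurt_z))

print(f"skewness b1 = {b1:.3f}, chi2 = {skew_stat:.2f}, p = {skew_p:.3f}")
print(f"kurtosis b2 = {b2:.3f}, z = {kurt_z:.2f}, p = {kurt_p:.3f}")
```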
Nonparametric Methods

Nonparametric methods in statistics encompass distribution-free techniques that do not rely on assumptions about the underlying probability distribution of the data, making them robust alternatives for inference when normality or other parametric conditions are violated. These methods are particularly useful for small sample sizes, ordinal data, or outliers, prioritizing ranks, signs, or empirical distributions over parametric forms. They provide valid p-values and confidence intervals through exact or asymptotic approximations, often achieving comparable power to parametric counterparts under ideal conditions but excelling in robustness.

For testing differences in location (central tendency) between paired samples, the sign test evaluates the median difference by counting the proportion of positive and negative differences, ignoring magnitudes. It requires no distributional assumptions beyond independent pairs and serves as a simple, exact test of the null hypothesis that the median difference is zero. The test statistic is the number of positive signs, with p-values computed from the Binomial(n, 1/2) distribution under the null. This method was formalized in Dixon and Mood (1946), who provided tables for significance levels and demonstrated its utility in comparing treatments. The Wilcoxon signed-rank test extends the sign test by incorporating both the direction and rank magnitudes of differences, offering greater power for symmetric distributions. Differences are ranked by absolute value, signed according to direction, and summed for positive and negative ranks; the smaller of the two sums serves as the test statistic $W$, with exact distributions tabulated for small samples or normal approximations for larger ones. Introduced by Wilcoxon (1945), it tests the null that the distribution of differences is symmetric about zero. For independent samples, the Mann-Whitney U test (also known as the Wilcoxon rank-sum test) assesses stochastic dominance by ranking all observations combined and computing the sum of ranks in one group. The statistic is $U = \min(U_1, U_2)$ with $U_i = R_i - n_i(n_i + 1)/2$, where $R_i$ is the rank sum and $n_i$ the size of group $i$, testing the null that the distributions are identical. It is exact for small samples via enumeration and asymptotically normal otherwise, with high efficiency under normality but superior robustness to heavy tails. Mann and Whitney (1947) derived its distribution and moments, establishing its validity for continuous distributions.

Goodness-of-fit tests in nonparametric statistics evaluate how well sample data match a specified distribution using empirical cumulative distribution functions (ECDFs). The Kolmogorov-Smirnov test measures the maximum vertical distance $D = \sup_x |F_n(x) - F(x)|$ between the ECDF $F_n$ and the hypothesized CDF $F$, rejecting the null if $D$ exceeds critical values from Kolmogorov's distribution. It is distribution-free for continuous cases and sensitive to discrepancies anywhere in the distribution. Massey (1951) tabulated percentage points and illustrated its application to one- and two-sample problems. The Anderson-Darling test weights the squared deviations between $F_n$ and $F$ by $[F(x)(1 - F(x))]^{-1}$, emphasizing the tails for greater sensitivity to deviations in extremes. Critical values depend on the tested distribution, enhancing power over unweighted tests. Anderson and Darling (1954) developed its asymptotic theory and significance points for large samples.
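SciPy implements these rank- and ECDF-based tests directly; a small sketch with simulated paired and independent samples follows (the data are hypothetical).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)

# Paired samples for the signed-rank test; independent groups for Mann-Whitney.
before = rng.normal(10, 2, size=30)
after = before + rng.normal(0.8, 1.0, size=30)
group_a = rng.normal(0.0, 1.0, size=40)
group_b = rng.normal(0.5, 1.0, size=40)

w_stat, w_p = stats.wilcoxon(after - before)          # Wilcoxon signed-rank test
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)    # Mann-Whitney U test
d_stat, d_p = stats.kstest(group_a, "norm")           # one-sample Kolmogorov-Smirnov

print(f"Wilcoxon W = {w_stat:.1f} (p = {w_p:.3f})")
print(f"Mann-Whitney U = {u_stat:.1f} (p = {u_p:.3f})")
print(f"KS D = {d_stat:.3f} (p = {d_p:.3f})")
```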
Rank-based regression methods adapt linear models to non-normal errors by using medians of slopes rather than least squares. The Theil-Sen estimator computes the median of all pairwise slopes $(y_j - y_i)/(x_j - x_i)$ over pairs with $x_i \neq x_j$, providing a slope estimate robust to outliers with a breakdown point of roughly 29%. It assumes linearity but no error distribution, yielding consistent estimates under mild conditions. Theil (1950) introduced the rank-invariant approach for simple regression, while Sen (1968) established its asymptotic normality and efficiency relative to least squares.

Permutation tests and exact methods form the foundation of many nonparametric procedures by resampling data under the null hypothesis to derive empirical distributions. Permutation tests randomize group labels or pairings to compute the reference distribution of a test statistic, ensuring exact validity without distributional assumptions for exchangeable data. They are computationally intensive but feasible with modern algorithms, offering flexibility for complex hypotheses. Fisher (1935) pioneered randomization tests in experimental design, with Pitman (1937) formalizing exact significance for rank statistics. Exact methods extend this by enumerating all possible permutations or sign combinations for small samples, providing precise p-values for tests like the sign or Wilcoxon without approximations. These are essential in finite-sample inference, as detailed in foundational works on nonparametric exact distributions.
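As an explicit example of the permutation idea, the sketch below randomizes group labels to build the null distribution of a difference in means; the data and the number of resamples are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(12)
x = rng.normal(0.0, 1.0, size=25)      # hypothetical control group
y = rng.normal(0.6, 1.0, size=25)      # hypothetical treatment group

observed = y.mean() - x.mean()
pooled = np.concatenate([x, y])

n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[len(x):].mean() - perm[:len(x)].mean()
    if abs(diff) >= abs(observed):     # two-sided comparison
        count += 1

p_value = (count + 1) / (n_perm + 1)   # add-one adjustment keeps p strictly positive
print(f"observed difference = {observed:.3f}, permutation p = {p_value:.4f}")
```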
Time Series and Forecasting

Time series analysis examines sequential data observed over time, focusing on patterns such as trends, seasonality, and autocorrelation to model and forecast future values. Unlike independent and identically distributed assumptions in standard regression, time series methods account for temporal dependencies where observations are correlated with past values. This approach applies theoretical foundations from stochastic processes to empirical data, enabling the identification of underlying structures in non-stationary series.

ARIMA models form a cornerstone of time series modeling, integrating autoregressive (AR), differencing (I), and moving average (MA) components to handle non-stationarity and serial correlation. An AR(p) process models the current value as a linear function of p lagged values plus white noise:
$$X_t = c + \varphi_1 X_{t-1} + \cdots + \varphi_p X_{t-p} + \varepsilon_t,$$
where the $\varphi_i$ are parameters and $\varepsilon_t$ is Gaussian white noise. The MA(q) component incorporates q lagged errors:
$$X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}.$$
Integration through differencing (order d) transforms a non-stationary series into a stationary one by applying the operator $(1 - B)^d$, where $B$ is the backshift operator defined by $B X_t = X_{t-1}$. The full ARIMA(p,d,q) model is specified after identifying orders via autocorrelation and partial autocorrelation functions. Stationarity is assessed using the Augmented Dickey-Fuller (ADF) test, which augments the basic Dickey-Fuller regression with lagged differences to test the null hypothesis of a unit root:
$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,$$
rejecting non-stationarity if the test statistic falls below critical values. These models, developed by Box and Jenkins, are widely used for short-term forecasting in economics and finance due to their parsimony and interpretability. Exponential smoothing methods provide simpler alternatives for forecasting, weighting recent observations more heavily to adapt to changes in level, trend, or seasonality. Simple exponential smoothing updates forecasts as a convex combination of the actual value and prior forecast:
$$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\,\hat{y}_t,$$
where $\alpha \in (0, 1]$ is the smoothing parameter controlling responsiveness. Holt's method extends this to linear trends by maintaining separate level and slope equations:
$$\ell_t = \alpha y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1}), \qquad b_t = \beta(\ell_t - \ell_{t-1}) + (1 - \beta)\,b_{t-1},$$
with $h$-step-ahead forecast $\hat{y}_{t+h} = \ell_t + h\,b_t$. The Holt-Winters approach adds multiplicative or additive seasonality, decomposing the series into level, trend, and seasonal factors updated via additional smoothing parameters, making it suitable for data with periodic patterns like monthly sales. These techniques, originating from Holt's 1957 work and Winters' extension, excel in computational efficiency for real-time applications. Spectral analysis decomposes time series into frequency components to identify dominant cycles, contrasting time-domain methods by revealing periodic structures. The periodogram serves as a nonparametric estimator of the power spectral density, computed as
$$I(\omega_k) = \frac{1}{n}\left|\sum_{t=1}^{n} x_t\, e^{-i\omega_k t}\right|^2$$
for the Fourier frequencies $\omega_k = 2\pi k / n$, $k = 1, \dots, \lfloor n/2 \rfloor$. It highlights peaks corresponding to periodicities but suffers from high variance, often requiring smoothing via Welch's method or multitaper techniques for stability. This approach is particularly valuable for detecting hidden rhythms in geophysical or astronomical data, building on early work by Schuster. Forecasting performance is evaluated using scale-dependent metrics that quantify prediction errors on holdout data. The mean absolute error (MAE) measures average deviation without penalizing direction:
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|,$$
offering interpretability in the original units. The root mean squared error (RMSE) emphasizes larger errors through squaring:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2},$$
and is sensitive to outliers, making it suitable for normally distributed residuals. These metrics guide model selection, with lower values indicating better accuracy, as standardized in forecasting practice.
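To tie the smoothing recursion to the error metrics, the sketch below applies simple exponential smoothing to a hypothetical noisy level series and scores its one-step-ahead forecasts with MAE and RMSE; the series, smoothing constant, and initialization are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(21)
# Hypothetical slowly drifting level plus observation noise.
y = 50 + np.cumsum(rng.normal(0, 0.3, size=120)) + rng.normal(0, 1.0, size=120)

def simple_exp_smoothing(y, alpha):
    """One-step-ahead forecasts: f[t+1] = alpha * y[t] + (1 - alpha) * f[t]."""
    f = np.empty_like(y)
    f[0] = y[0]                       # initialize the level with the first observation
    for t in range(len(y) - 1):
        f[t + 1] = alpha * y[t] + (1 - alpha) * f[t]
    return f

forecasts = simple_exp_smoothing(y, alpha=0.3)
errors = y[1:] - forecasts[1:]        # one-step-ahead forecast errors

mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Comparing these two scores across candidate values of the smoothing parameter (or across competing models) is the model-selection use described above, with lower values indicating better holdout accuracy.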
