Algebraic statistics
from Wikipedia

Algebraic statistics is a branch of mathematical statistics that focuses on the use of algebraic, geometric, and combinatorial methods in statistics. While the use of these methods has a long history in statistics, algebraic statistics is continuously forging new interdisciplinary connections.

This growing field has established itself squarely at the intersection of several areas of mathematics, including, for instance, multilinear algebra, commutative algebra, algebraic geometry, convex geometry, combinatorics, theoretical problems in statistics, and their practical applications. For example, algebraic statistics has been useful for experimental design, parameter estimation, and hypothesis testing.

History


Algebraic statistics can be traced back to Karl Pearson, who used polynomial algebra to study Gaussian mixture models. Subsequently, Ronald A. Fisher, Henry B. Mann, and Rosemary A. Bailey applied Abelian groups to the design of experiments. Experimental designs were also studied with affine geometry over finite fields and then with the introduction of association schemes by R. C. Bose. C. R. Rao also introduced orthogonal arrays for experimental designs.

The field experienced a major revitalization in the 1990s. In 1998, Diaconis and Sturmfels introduced Gröbner bases for constructing Markov chain Monte Carlo algorithms for conditional sampling from discrete exponential families. Pistone and Wynn, in 1996, applied computational commutative algebra to the design and analysis of experiments, providing new tools for understanding confounding and identifiability in complex experimental settings. These works, along with the monograph by Giovanni Pistone, Eva Riccomagno, and Henry P. Wynn, in which the term “algebraic statistics” was first used, played a pivotal role in establishing this field as a unified area of research.

Modern researchers in algebraic statistics explore a wide range of topics, including computational biology, graphical models, and statistical learning.

Active Research Areas


Phylogenetics


Maximum likelihood estimation


Method of moments


Graphical models


Tropical statistics


Statistical learning theory


Algebraic geometry has also recently found applications to statistical learning theory, including a generalization of the Akaike information criterion to singular statistical models.[1]

Other topics


Algebraic analysis and abstract statistical inference



Invariant measures on locally compact groups have long been used in statistical theory, particularly in multivariate analysis. Beurling's factorization theorem and much of the work on (abstract) harmonic analysis sought better understanding of the Wold decomposition of stationary stochastic processes, which is important in time series statistics.

Encompassing previous results on probability theory on algebraic structures, Ulf Grenander developed a theory of "abstract inference". Grenander's abstract inference and his theory of patterns are useful for spatial statistics and image analysis; these theories rely on lattice theory.

Partially ordered sets and lattices


Partially ordered vector spaces and vector lattices are used throughout statistical theory. Garrett Birkhoff metrized the positive cone using Hilbert's projective metric and proved Jentzsch's theorem using the contraction mapping theorem.[2] Birkhoff's results have been used for maximum entropy estimation (which can be viewed as linear programming in infinite dimensions) by Jonathan Borwein and colleagues.

Vector lattices and conical measures were introduced into statistical decision theory by Lucien Le Cam.

Introductory Example


Consider a random variable X which can take on the values 0, 1, 2. Such a variable is completely characterized by the three probabilities

p_i = P(X = i), i = 0, 1, 2,

and these numbers satisfy

p_0 + p_1 + p_2 = 1 and p_0, p_1, p_2 ≥ 0.

Conversely, any three such numbers unambiguously specify a random variable, so we can identify the random variable X with the tuple (p_0, p_1, p_2).

Now suppose X is a binomial random variable with parameter q and n = 2, i.e. X represents the number of successes when repeating a certain experiment two times, where each experiment has an individual success probability of q. Then

p_0 = (1 - q)^2, p_1 = 2q(1 - q), p_2 = q^2,

and it is not hard to show that the tuples which arise in this way are precisely the ones satisfying

4 p_0 p_2 - p_1^2 = 0.

The latter is a polynomial equation defining an algebraic variety (or surface) in R^3, and this variety, when intersected with the simplex given by

p_0 + p_1 + p_2 = 1, p_0, p_1, p_2 ≥ 0,

yields a piece of an algebraic curve which may be identified with the set of all binomially distributed random variables with n = 2. Determining the parameter q amounts to locating one point on this curve; testing the hypothesis that a given variable X is binomially distributed with n = 2 amounts to testing whether a certain point lies on that curve or not.
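To make the example concrete, here is a short Python sketch (not part of the original article; the function names are illustrative) that checks whether a tuple (p_0, p_1, p_2) lies on the curve 4 p_0 p_2 = p_1^2 inside the simplex and recovers q from a point on the curve.

```python
# Minimal sketch: membership test for the binomial (n = 2) curve and recovery of q.

def on_binomial_curve(p0, p1, p2, tol=1e-9):
    """Check the simplex constraints and the defining polynomial 4*p0*p2 - p1**2."""
    in_simplex = abs(p0 + p1 + p2 - 1.0) < tol and min(p0, p1, p2) >= -tol
    on_variety = abs(4 * p0 * p2 - p1 ** 2) < tol
    return in_simplex and on_variety

def recover_q(p1, p2):
    """For a point on the curve, q equals the mean divided by n = 2, i.e. p2 + p1/2."""
    return p2 + p1 / 2

q = 0.3
p = ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)    # (0.49, 0.42, 0.09)
print(on_binomial_curve(*p))                   # True
print(recover_q(p[1], p[2]))                   # 0.3
print(on_binomial_curve(0.5, 0.25, 0.25))      # False: not a binomial(2, q) law
```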


References

from Grokipedia
Algebraic statistics is an interdisciplinary field that integrates tools from commutative algebra, algebraic geometry, and combinatorics to analyze statistical models, particularly those defined by polynomial equations and involving discrete random variables. The field emerged prominently in the late 1990s, with foundational work by Diaconis and Sturmfels in 1998 introducing Markov bases for log-linear models of contingency tables, which enabled exact goodness-of-fit tests via algebraic methods. This development bridged classical statistics with computational algebra, allowing the study of likelihood equations and model structures through ideals and varieties in polynomial rings. Key concepts include toric ideals, which parametrize probability distributions in discrete models such as graphical models and hidden Markov models, and Gröbner bases, used to solve systems of equations arising in inference. Algebraic statistics has applications across diverse areas, including computational biology—such as phylogenetic tree reconstruction using likelihood methods like Felsenstein's algorithm and parametric inference—and the social sciences, where it models ranked preferences through polytope algebras and related combinatorial structures. Recent extensions include network analysis, where conditional independencies in multi-variable systems are examined using tensor representations, as seen in studies of Bernoulli trials or medical experiments assessing variable interactions. Computational tools like 4ti2 for Markov bases and Singular for ideal computations have made these methods practical, while ongoing research explores secant varieties and phylogenetic invariants to enhance model identifiability. The field's growth, documented in influential texts like Pachter and Sturmfels' 2005 book Algebraic Statistics for Computational Biology, underscores its role in advancing both theoretical inference and applied statistics.

Overview

Definition and Scope

Algebraic statistics is an interdisciplinary field that applies tools from commutative algebra, algebraic geometry, and combinatorics to the study and analysis of statistical models, with a particular emphasis on those featuring polynomial parametrizations. These tools enable the algebraic description of probability distributions and inference procedures, transforming traditional statistical problems into questions about algebraic structures such as ideals and varieties. The field emerged as a response to the need for exact computational methods in complex probabilistic settings, distinguishing it from classical statistics by prioritizing symbolic and combinatorial techniques over numerical approximations. The scope of algebraic statistics centers on discrete statistical models, such as those involving contingency tables and hidden Markov models, where the underlying data-generating processes correspond to algebraic varieties defined over polynomial rings. Unlike broader statistical approaches that often rely on asymptotic approximations or simulation-based methods, algebraic statistics focuses on finite-sample exactness through algebraic invariants, making it especially suited for models with rational or monomial map parametrizations that capture dependencies in categorical data. This delineation excludes purely continuous or non-algebraic models, emphasizing instead the intersection of statistics with exact computability in discrete spaces. At its core, algebraic statistics involves parametrizing probability distributions through rational maps from parameter spaces to probability simplices, allowing the representation of models as images of these maps. It further examines ideals generated by conditional independences, which encode the structural constraints of graphical models, and employs Gröbner bases to facilitate exact inference tasks like hypothesis testing and parameter estimation. These components provide a rigorous framework for deriving algebraic certificates of statistical properties, such as identifiability and likelihood maximizers, without resorting to iterative numerical solvers. The interdisciplinary scope of algebraic statistics extends to combinatorics through the study of lattice polytopes associated with toric models, to machine learning via secant varieties that describe mixture models, and to optimization by integrating semidefinite programming for relaxations in likelihood computations. These connections highlight its role in bridging pure mathematics with applied data analysis, fostering advancements in fields like computational biology and machine learning.

Importance and Interdisciplinary Connections

Algebraic statistics plays a vital role in modern data analysis by enabling exact symbolic computations for key statistical tasks such as model selection, hypothesis testing, and parameter estimation, particularly in high-dimensional settings where numerical methods are prone to failure due to instability or intractability. This approach leverages algebraic structures to provide rigorous guarantees that are absent in approximate numerical techniques, ensuring precise solutions even for complex discrete data distributions. A key advantage over classical statistical methods lies in its ability to handle non-identifiable models through primary decomposition of polynomial ideals, which breaks down the solution set into irreducible components that reveal underlying identifiability structures. Furthermore, it delivers certificates of optimality, for instance via the Hilbert series of toric ideals, which encodes the degrees of freedom in statistical models by yielding a Hilbert polynomial whose leading term corresponds to the model's dimension. The field's interdisciplinary connections underscore its broad impact across sciences. In biology, algebraic statistics facilitates phylogenetic reconstruction by modeling evolutionary processes on trees using toric varieties and invariants, allowing inference of ancestral relationships from genetic sequence data with algebraic guarantees of consistency. Within machine learning, it intersects with tensor decompositions to address latent variable models, where algebraic tools ensure parameter identifiability and efficient recovery of low-rank structures in multi-way arrays representing high-dimensional data. Algebraic methods inspired by algebraic statistics connect to game theory in economics through the representation of equilibria—such as Nash or dependency equilibria—as points on algebraic varieties, enabling the use of commutative algebra to study stability and multiplicity in strategic interactions. Algebraic statistics also tackles pressing challenges in big data, including computational complexity, by deploying specialized algebraic algorithms to resolve issues like moment problems. For example, Fourier-Motzkin elimination projects systems of linear inequalities arising in such problems onto lower dimensions, providing exact feasibility certificates and bounding feasible parameter regions without exhaustive enumeration. This method proves particularly effective in scaling algebraic techniques to large-scale datasets, offering polynomial-time solutions in certain structured cases where heuristic optimizations falter. Recent developments as of 2024 include applications to environmental and ecological modeling for climate change analysis, as well as enhancements to methods like Bayesian additive regression trees using algebraic and combinatorial tools.

Historical Development

Early Foundations (Pre-1990s)

The roots of algebraic statistics trace back to early developments in probability theory during the pre-20th century, where polynomial representations began to emerge as tools for modeling probabilistic outcomes. Jacob Bernoulli's seminal work in Ars Conjectandi (1713) laid foundations for combinatorial probability calculations using polynomial expansions, providing a systematic way to encode sequences of probabilities into power series, implicitly highlighting the polynomial structure inherent in discrete distributions long before explicit algebraic frameworks were applied in statistics. In the 20th century, mathematical statistics advanced with contributions that revealed implicit algebraic structures in probabilistic models. Ronald A. Fisher's 1922 paper on the mathematical foundations of theoretical statistics formalized the method of maximum likelihood and explored its properties within families of distributions that later became known as exponential families, where the log-likelihood takes a polynomial form in the natural parameters, suggesting an underlying algebraic structure without explicit polynomial ideals at the time. This work emphasized sufficiency and efficiency, providing the probabilistic backbone for models analyzable via algebraic methods. Building on such ideas, Shelby J. Haberman's research in the 1970s focused on log-linear models for analyzing contingency tables, employing maximum likelihood methods to estimate parameters under Poisson sampling, which effectively solved systems of equations arising from marginal constraints in multi-way tables. Haberman's approach highlighted the multiplicative structure of expected frequencies, a key precursor to algebraic representations of these models. Key milestones in the pre-1990s period included early explorations of combinatorial tools for statistical testing. John N. Darroch and Dorothy Ratcliff's 1972 development of generalized iterative scaling for log-linear models offered an algorithmic solution for maximum likelihood estimation in exponential families defined on contingency tables, with connections to chi-squared goodness-of-fit tests through the evaluation of model adequacy via deviance measures. This method anticipated later combinatorial moves for exploring the fiber of observed data under fixed margins, akin to precursors of Markov bases used in exact conditional tests. In the 1990s, Giovanni Pistone and Eva Riccomagno initiated applications of Gröbner bases to statistical problems, particularly in experimental design and discrete models, where these bases facilitated the computation of ideal membership for toric ideals arising from factorial structures. The transition toward explicitly algebraic views of statistical models gained traction in the 1980s through work on graphical models. Steffen L. Lauritzen's contributions, including his collaborations on contingency table analysis, recognized the algebraic structure underlying log-linear parameterizations in undirected graphical models, where conditional independence constraints correspond to binomial generators in the variety's defining ideal. This perspective bridged probabilistic factorization with commutative algebra, setting the stage for viewing estimation as solving equations on toric varieties without delving into modern computational tools.

Modern Advances (1990s-Present)

The emergence of algebraic statistics as a distinct field in the 1990s was marked by the seminal work of Persi Diaconis and Bernd Sturmfels, who in their 1998 paper developed algebraic algorithms for sampling from conditional distributions in discrete exponential families, with a focus on contingency tables. This approach leveraged toric ideals to construct Markov bases, enabling exact goodness-of-fit tests by facilitating uniform sampling over conditional tables without relying on approximations. In the 2000s, the discipline expanded through foundational texts and computational tools that bridged algebra and applied statistics. The book Algebraic Statistics for Computational Biology by Lior Pachter and Bernd Sturmfels (2005) synthesized these methods for biological applications, such as sequence alignment and phylogenetic reconstruction, emphasizing toric varieties and Gröbner bases for likelihood inference. Complementing this theoretical advancement, the software package 4ti2 was introduced to compute Markov bases and perform related algebraic operations on integer lattices, making these techniques accessible for practical statistical modeling. From the 2010s to the present, algebraic statistics has increasingly intersected with machine learning and data science. Advances in tropical geometry have further enhanced these connections, with 2020s research utilizing min-plus algebras to derive efficient algorithms for learning causal structures in extreme value models, outperforming traditional methods on benchmark datasets. Prominent contributors have shaped these developments: Seth Sullivant advanced phylogenetic algebraic geometry by studying the ideals and varieties arising from tree-based evolutionary models, providing tools for inference and parameter estimation in phylogenetics. Mathias Drton pioneered algebraic techniques for Gaussian graphical models, developing methods to test conditional independences and recover sparse structures from covariance data using determinantal ideals. As of 2023, emerging trends include quantum algebraic statistics, exemplified by the study of quantum graphical models, which extends classical varieties to quantum states for entanglement analysis and quantum information processing.

Algebraic Foundations

Key Algebraic Structures

In algebraic statistics, polynomial rings serve as the foundational algebraic objects for modeling probability distributions. A polynomial ring over the rationals, denoted \mathbb{Q}[p_1, \dots, p_m], consists of multivariate polynomials in indeterminates p_1, \dots, p_m with coefficients in \mathbb{Q}, where these indeterminates typically represent probability parameters. These rings parametrize statistical models by encoding the algebraic relations among probabilities, particularly in discrete settings like contingency tables or hidden Markov models. A key example is the representation of exponential families as toric rings, where the probability mass function takes the form p_u = \exp(A^T \alpha \cdot u) / Z(\alpha) for a design matrix A and normalizing constant Z(\alpha), capturing log-linear dependencies in graphical models. Ideals within these polynomial rings define the constraints of statistical models, while the corresponding varieties describe the feasible parameter spaces. An ideal I \subseteq \mathbb{Q}[p_1, \dots, p_m] is a subset closed under addition and multiplication by ring elements, generated by polynomials that enforce model equations, such as those arising from conditional independences. Prime ideals play a crucial role in representing irreducible models, as the quotient ring \mathbb{Q}[p_1, \dots, p_m]/I being an integral domain ensures the model's algebraic structure cannot be decomposed further, corresponding to indecomposable toric varieties. The affine variety V(I), defined as the zero set \{ p \in \mathbb{C}^m \mid f(p) = 0 \ \forall f \in I \}, geometrically realizes the parameter space of feasible probability distributions, often intersecting with the probability simplex to bound positive probabilities. For instance, in log-linear models, the variety V_A is the Zariski closure of the image of the parametrization map, delineating the manifold of attainable marginal distributions. Gröbner bases provide an algorithmic tool for manipulating ideals to analyze model complexity. A Gröbner basis of an ideal I with respect to a monomial order is a generating set \{g_1, \dots, g_s\} such that the leading terms generate the leading-term ideal, allowing reduction of any polynomial in I to zero via multivariate division—effectively reducing the ideal to a canonical form. This structure enables computation of the dimension of the variety V(I), which equals the Krull dimension of the quotient ring and indicates the number of free parameters in the model, as well as the degree, measuring the number of intersection points with a generic linear subspace of complementary dimension and quantifying intrinsic complexity. In practice, software like Macaulay2 or 4ti2 implements these computations for statistical ideals, such as determining the dimension of phylogenetic models. Lattice polytopes arise in the combinatorial aspects of algebraic statistics, particularly for discrete models involving integer constraints. A lattice polytope is the convex hull of lattice points in \mathbb{Z}^d, and in this context, it connects to the marginal polytope of a contingency table model, which is the convex hull of all possible marginal probability vectors (sufficient statistics) attainable under the model, encoding the feasible supports for observed data. Fiber polytopes for fixed margins are also lattice polytopes, and the Ehrhart polynomial L_P(t) of such a polytope P is a polynomial of degree \dim P that counts the number of lattice points in the t-dilate tP \cap \mathbb{Z}^d, providing an exact formula for enumerating the integer points in model fibers—essential for exact hypothesis testing and sampling in contingency tables without approximation. For example, in toric models, the coefficients of the Ehrhart polynomial relate to volumes and symmetries of the polytope, facilitating precise computation of the number of contingency tables with given margins.
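As a small illustration of these ideas, the following SymPy sketch computes the toric ideal of the 2x2 independence model by eliminating the parameters a_i, b_j from the parametrization p_ij = a_i b_j with a lexicographic Gröbner basis; the setup and variable names are illustrative, not taken from the source.

```python
# Sketch: elimination via a lexicographic Groebner basis recovers a toric ideal.
import sympy as sp

a1, a2, b1, b2 = sp.symbols('a1 a2 b1 b2')
p11, p12, p21, p22 = sp.symbols('p11 p12 p21 p22')

# Parametrization p_ij = a_i * b_j, rewritten as generators of an ideal.
gens = [p11 - a1*b1, p12 - a1*b2, p21 - a2*b1, p22 - a2*b2]

# Lex order with the parameters listed first eliminates a_i, b_j; the basis
# elements involving only the p_ij generate the toric ideal of the model.
G = sp.groebner(gens, a1, a2, b1, b2, p11, p12, p21, p22, order='lex')
toric = [g for g in G.exprs if not g.free_symbols & {a1, a2, b1, b2}]
print(toric)   # expected to contain p11*p22 - p12*p21 (up to sign)
```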

Toric Ideals and Markov Bases

In algebraic statistics, toric ideals arise as the kernel of a monomial parametrization map encoding the algebraic relations imposed by the design matrix. Specifically, for a discrete model with design matrix A \in \mathbb{Z}^{d \times n} (d sufficient-statistic dimensions, n table cells), the toric ideal I_A is the kernel of the map \pi: \mathbb{K}[x_1, \dots, x_n] \to \mathbb{K}[t_1, \dots, t_d] given by \pi(x_i) = \prod_{j=1}^d t_j^{A_{ji}} for i = 1, \dots, n. This kernel consists of binomials \mathbf{x}^{\mathbf{u}^+} - \mathbf{x}^{\mathbf{u}^-} such that \mathbf{u}^+ - \mathbf{u}^- \in \ker(A: \mathbb{Z}^n \to \mathbb{Z}^d), where \mathbf{u}^\pm \in \mathbb{N}^n have disjoint supports, reflecting the linear constraints of the model. The generators of this ideal directly correspond to the conditional independences or other structural constraints in the underlying probability model, providing an algebraic certificate for the dimension and variety of the model. Markov bases extend this framework by providing a combinatorial tool for exact conditional inference in discrete models, defined as a minimal set of vectors \mathcal{B} \subset \ker(A) (where A is the design matrix of the model) such that adding multiples of elements from \mathcal{B} connects any two points in a fiber \{ \mathbf{y} \in \mathbb{N}^n : A\mathbf{y} = A\mathbf{x} \} via lattice walks that preserve the sufficient statistics. A key theorem states that if \mathcal{B} is a Markov basis, then the fiber graph—vertices as tables in the fiber, edges via moves in \mathcal{B}—is connected, enabling uniform sampling from the conditional distribution via Markov chain Monte Carlo methods without rejection, as the chain is irreducible and aperiodic under suitable choices. This connectivity ensures that the stationary distribution is uniform over the fiber, facilitating exact goodness-of-fit tests for models like log-linear families. Computing minimal Markov bases involves solving integer programs to find generating sets for the lattice ideal, often using algorithms that compute Gröbner bases of the toric ideal or directly optimize over the syzygies via branch-and-bound methods in software like 4ti2. However, determining a minimal Markov basis is NP-hard in general, even for simple hierarchical models, due to the exponential growth in the number of generators with model dimension. A classic example is the independence model for a 2 \times 2 contingency table with cells labeled a, b, c, d, where the toric ideal is generated by the binomial ad - bc, corresponding to the constraint on row and column margins. The associated Markov basis consists of the vector (1, -1, -1, 1), which generates moves that preserve margins and connect all tables with fixed totals, such as transitioning between observed and expected counts under independence.
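The following minimal Python sketch (illustrative, with a hypothetical observed table) shows how the single move (1, -1, -1, 1) connects the fiber of a 2x2 table with fixed margins, as described above.

```python
# Sketch: enumerating a 2x2 fiber by walking with the single Markov move.
move = (1, -1, -1, 1)

def fiber_by_walk(table):
    """Collect all nonnegative 2x2 tables reachable from `table` via +/- the move."""
    seen, stack = {table}, [table]
    while stack:
        current = stack.pop()
        for sign in (1, -1):
            nxt = tuple(c + sign * m for c, m in zip(current, move))
            if min(nxt) >= 0 and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen)

observed = (3, 1, 2, 4)   # cells (u11, u12, u21, u22); row margins (4, 6), column margins (5, 5)
for table in fiber_by_walk(observed):
    print(table)          # five tables, all sharing the margins of the observed table
```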

Core Methods

Maximum Likelihood Estimation

In algebraic statistics, maximum likelihood estimation (MLE) involves solving the score equations derived from the log-likelihood function, \frac{\partial \log L}{\partial \theta} = 0, which often yield a system of polynomial equations for parametric models with discrete or mixed supports. These equations arise naturally in models such as multinomial distributions or hidden Markov models, where the parameters \theta parameterize the probability distribution, transforming the optimization problem into finding critical points of the likelihood on an algebraic variety. The multiplicity of solutions to these polynomial systems is analyzed using Bézout's theorem, which provides an upper bound on the number of intersection points of hypersurfaces in projective space, reflecting the geometric complexity of the likelihood landscape. This bound, known as the maximum likelihood degree (ML degree), quantifies the algebraic difficulty of MLE and remains invariant under reparameterizations of the model. Algebraic methods address these polynomial systems using Gröbner bases to triangularize the ideal, enabling the identification of critical points including the global maximum. For instance, in low-dimensional cases, Gröbner bases simplify the system into univariate polynomials whose roots correspond to candidate MLEs, with numerical refinement used to select the likelihood-maximizing solution. The expectation-maximization (EM) algorithm, commonly used for models with hidden variables, admits an algebraic reformulation where the E-step computes expectations over latent varieties and the M-step solves restricted likelihood equations, often as toric ideals for discrete supports. This perspective reveals convergence properties through the geometry of secant varieties associated with incomplete data. Identifiability in MLE is assessed via analysis of the parameterization's Jacobian, where the model is locally identifiable if the Jacobian has full rank at the point, ensuring a unique local maximum. In mixture models, such as Gaussian mixtures, the likelihood equations define points on secant varieties, and non-identifiability occurs when these varieties have deficient dimensions, leading to multiple parameter sets yielding the same observed distribution. For example, in bivariate Gaussian mixtures, algebraic conditions determine when the secant variety fills the ambient space, guaranteeing identifiability up to label switching. A classical example arises in exponential families, where the score equations simplify to forms like \sum x_i \log \theta_j - n \log \sum \theta_j = 0 for multinomial sampling with n trials, solvable algebraically in small dimensions using resultants to yield explicit rational expressions for the MLE. This highlights how the natural parameters \theta_j balance observed counts x_i against normalization, with Gröbner bases providing a computational path for higher dimensions. Recent advances as of 2024 employ numerical algebraic geometry, such as homotopy continuation methods, to compute MLE solutions in complex models with high ML degrees.
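As a toy illustration of polynomial score equations, the following SymPy sketch (an assumption-laden example, not from the source) solves the score equation of the binomial n = 2 model from the introductory example symbolically; here the system reduces to a single linear equation in q, so there is exactly one critical point.

```python
# Sketch: symbolic score equation for counts (u0, u1, u2) under p = ((1-q)^2, 2q(1-q), q^2).
import sympy as sp

q = sp.symbols('q', positive=True)
u0, u1, u2 = sp.symbols('u0 u1 u2', positive=True)

logL = u0*sp.log((1 - q)**2) + u1*sp.log(2*q*(1 - q)) + u2*sp.log(q**2)

score = sp.together(sp.diff(logL, q))     # a single rational function of q
critical = sp.solve(sp.numer(score), q)   # roots of its numerator polynomial
print(critical)   # one critical point: (u1 + 2*u2) / (2*(u0 + u1 + u2))
```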

Method of Moments

The method of moments in algebraic statistics formulates parameter estimation as the solution to equations that equate theoretical moments of a parametric model to their empirical counterparts computed from data. These moment equations arise from expressing expected values, such as \mathbb{E}[X^k] for a random variable X, in terms of model parameters, resulting in a system of polynomial relations. For instance, in a Gaussian model, the second-order moment satisfies the equation \mathbb{E}[X^2] = \mu^2 + \sigma^2, where \mu is the mean and \sigma^2 is the variance; higher-order moments yield additional polynomials that must hold simultaneously. When the number of sample moments exceeds the number of parameters, the system becomes overdetermined, and solutions are sought by projecting onto the variety defined by the moment equations, often using least-squares minimization constrained to the variety. This geometric approach leverages tools from computational algebra to identify feasible parameter sets, ensuring consistency by testing membership in the moment ideal via Gröbner bases or saturation techniques. In mixture models, such as Gaussian mixtures, the moment variety G_{n,d} parameterizes all possible moment sequences up to order d in n dimensions, and secant varieties \sigma_k(G_{n,d}) capture mixtures of k components; solving involves intersecting sample moments with these varieties. Algebraic resolution of these systems employs syzygies—relations among the generators of the moment ideal—to derive minimal sets of independent conditions necessary for identifiability and estimation. The syzygy module of the ideal provides a basis for redundant relations, eliminating redundancy when solving for parameters; for example, in homoscedastic Gaussian mixtures, syzygies among minors enforce the necessary moment constraints. In non-parametric settings, algebraic techniques address the moment problem, which determines whether a given sequence of real numbers corresponds to the moments of a positive measure on \mathbb{R}. This is resolved by checking positive semidefiniteness of associated Hankel matrices and representing the representing measure via sums-of-squares (SOS) decompositions of positive polynomials, ensuring the sequence lies in the moment cone. SOS methods, implemented via semidefinite programming, provide certificates of positivity and enable dual optimization over moment sequences, with applications to goodness-of-fit testing in algebraic frameworks. Recent work as of 2025 uses numerical algebraic geometry for the method of moments in high-dimensional settings, improving scalability for mixture models.
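A minimal SymPy sketch of the single-Gaussian case described above, with hypothetical sample moments m1 and m2 treated as symbols:

```python
# Sketch: moment equations E[X] = m1 and E[X^2] = m2 solved for (mu, sigma^2).
import sympy as sp

mu, sigma2 = sp.symbols('mu sigma2', real=True)
m1, m2 = sp.symbols('m1 m2', real=True)   # empirical moments, assumed given

equations = [sp.Eq(mu, m1), sp.Eq(mu**2 + sigma2, m2)]
solution = sp.solve(equations, [mu, sigma2], dict=True)
print(solution)   # [{mu: m1, sigma2: m2 - m1**2}]
```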

Applications in Modeling

Graphical Models

Graphical models provide a framework for representing multivariate probability distributions through graphs, where vertices correspond to random variables and edges encode conditional dependencies or independencies. In algebraic statistics, these models are analyzed using polynomial ideals that capture the constraints imposed by the graph structure on the moments, cumulants, or probabilities of the distribution. This algebraic approach facilitates the study of conditional independence, parameter estimation, and model selection by leveraging tools from commutative algebra and algebraic geometry. Conditional independence ideals form the algebraic core of graphical models, encoding the separation criteria derived from the graph. In directed acyclic graphs (DAGs), d-separation determines conditional independencies: two nodes are d-separated given a conditioning set if every path between them is blocked, meaning non-colliders are in the set or colliders and their descendants are not. The generators of these ideals consist of polynomials vanishing on the distributions satisfying these independencies, such as binomial relations in discrete models or determinantal ideals in Gaussian cases. For instance, in a simple chain graph representing the conditional independence between variables X_1 and X_3 given X_2, the ideal is generated by binomials of the form u_{ijk} u_{i'jk'} - u_{ijk'} u_{i'jk}, where u_{ijk} denotes the joint probability for states i of X_1, j of X_2, k of X_3, capturing the Markov properties through the factorization of the joint distribution. For Gaussian Markov random fields, which correspond to undirected graphs, the vanishing ideals are defined by constraints on the inverse covariance matrix, known as the concentration matrix. Entries in the concentration matrix K = \Sigma^{-1} are zero for pairs of variables not joined by an edge of the graph, reflecting conditional independencies. These ideals are often toric, generated by quadratic polynomials in the entries of the covariance matrix \Sigma. Parameter estimation in these models involves solving algebraic equations subject to the graph's constraints. For chordal graphs, which admit perfect elimination orderings, the concentration matrix can be decomposed into sums over maximal cliques, enabling explicit maximum likelihood estimation via closed-form formulas or iterative algorithms that enforce consistency across cliques. This decomposition exploits the graph's structure to reduce the problem to estimating parameters on cliques and propagating them globally. Model selection in graphical models benefits from algebraic techniques like primary decomposition of conditional independence ideals, which reveals the irreducible components corresponding to possible graph structures. Score-based tests compare observed data against these decompositions to select the model with the highest posterior probability or lowest discrepancy, using Gröbner bases to compute the necessary statistics efficiently. This approach is particularly useful for testing nested models or identifying minimal sets of independencies.
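The binomial generators mentioned above can be written down explicitly; the following Python sketch (binary variables, illustrative variable names) lists the generators of the conditional independence ideal for X1 independent of X3 given X2.

```python
# Sketch: 2x2 minors of each X2-slice generate the CI ideal for X1 _||_ X3 | X2.
import sympy as sp
from itertools import product

# Joint-probability indeterminates u_{ijk} for binary X1, X2, X3.
u = {(i, j, k): sp.Symbol(f'u{i}{j}{k}') for i, j, k in product((0, 1), repeat=3)}

# For each fixed level j of X2, the slice (i, k) must have rank 1,
# so its single 2x2 minor must vanish.
generators = [u[0, j, 0] * u[1, j, 1] - u[0, j, 1] * u[1, j, 0] for j in (0, 1)]
print(generators)   # [u000*u101 - u001*u100, u010*u111 - u011*u110]
```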

Phylogenetics

Phylogenetic invariants are polynomial functions that vanish on the algebraic varieties parameterizing probability distributions generated by specific evolutionary models, enabling the identification of tree topologies without estimating numerical parameters such as branch lengths. These invariants provide a non-parametric approach to tree reconstruction, distinguishing between different topologies by evaluating whether observed site pattern frequencies satisfy the polynomials associated with a hypothesized tree. Seminal work introduced these invariants for discrete-state models, demonstrating their utility in resolving evolutionary relationships from sequence data. A prominent example involves quartet invariants for four taxa, which help determine the topology among the three possible resolved quartets. For taxa labeled A, B, C, D under models like the general Markov model, the invariant is derived from the expected frequencies of site patterns. Specifically, for the topology ((A,B),(C,D)), a key determinantal invariant is D_{AC} D_{BD} - D_{AD} D_{BC} = 0, where D_{XY} = \det(M_{XY}) and M_{XY} is the 4 \times 4 matrix with entries p_{st}^{XY}, the joint probability of observing state s at taxon X and t at Y. This vanishes on the variety for the correct topology but not for alternatives. Quartet invariants extend to group-based models, where Fourier transforms linearize the parameterization, facilitating the computation of higher-degree invariants via toric ideals. Distance-based methods in algebraic phylogenetics employ metrics defined on tree spaces to infer topologies, leveraging the additive property of tree metrics under certain models. These metrics, such as those derived from log-det transformations for aligned sequences, embed trees into Euclidean spaces while preserving additivity, allowing distance matrices to be tested for tree-likeness via invariants like the four-point condition. The Cavender-Farris-Neyman (CFN) model, a binary symmetric substitution model, parameterizes distributions as a toric variety in Fourier coordinates, where the invariants form a toric ideal generated by binomials corresponding to cycle structures in the tree's parameterization. This toric structure enables efficient computations of invariants on claw trees and balanced trees. Maximum likelihood estimation on phylogenetic trees involves optimizing branch lengths to maximize the probability of observed data under a specified model, often requiring solutions to likelihood equations that resemble Poisson processes for substitution rates. In continuous-time Markov models, branch lengths represent expected substitutions, estimated by solving transcendental equations derived from the matrix exponential parameterization, such as \mathbf{P}(t) = \exp(\mathbf{Q} t), where \mathbf{Q} is the rate matrix and t the branch length. Software like PAUP* implements these via numerical algebraic solvers, including Newton-Raphson iterations on the score equations, to compute maximum likelihood estimates for tree parameters under models like Jukes-Cantor or general time-reversible. Algebraic approaches further analyze the ML degree of phylogenetic varieties, quantifying the number of critical points in the optimization landscape.
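As a numerical sanity check of the determinantal quartet relation, the following Python sketch builds pairwise joint-distribution matrices under a two-state symmetric (CFN) model on the quartet AB|CD, a special case of the general Markov model; the edge change probabilities are hypothetical.

```python
# Sketch: on the topology AB|CD, D_AC*D_BD equals D_AD*D_BC, while D_AB*D_CD differs.
import numpy as np

def edge(p):
    """CFN transition matrix with change probability p along a single edge."""
    return np.array([[1 - p, p], [p, 1 - p]])

# Pendant edges to A, B, C, D and the internal edge (illustrative values).
pA, pB, pC, pD, pint = 0.1, 0.15, 0.12, 0.08, 0.2
paths = {                       # product of edge matrices along each leaf-to-leaf path
    'AB': edge(pA) @ edge(pB),
    'CD': edge(pC) @ edge(pD),
    'AC': edge(pA) @ edge(pint) @ edge(pC),
    'BD': edge(pB) @ edge(pint) @ edge(pD),
    'AD': edge(pA) @ edge(pint) @ edge(pD),
    'BC': edge(pB) @ edge(pint) @ edge(pC),
}
# With a uniform root, the joint distribution of a leaf pair is 0.5 * path product.
D = {k: np.linalg.det(0.5 * M) for k, M in paths.items()}

print(D['AC'] * D['BD'] - D['AD'] * D['BC'])   # ~0: invariant of the true split AB|CD
print(D['AB'] * D['CD'] - D['AC'] * D['BD'])   # nonzero: the wrong-split products differ
```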

Emerging and Advanced Areas

Tropical Statistics

Tropical statistics emerges as a robust framework within algebraic statistics, leveraging tropical geometry to handle uncertainty and outliers in probabilistic inference. Tropicalization transforms classical algebraic varieties associated with statistical models into piecewise-linear objects by replacing addition with minimization and multiplication with addition, known as the min-plus algebra. This shift facilitates the analysis of log-likelihood functions and enables computations that are more resilient to perturbations in data. The core operation in this algebra is the tropical sum, defined as a \oplus b = \min(a, b), with tropical multiplication given by a \otimes b = a + b. These operations extend to vectors and matrices, forming a semiring that underpins tropical varieties—piecewise-linear sets that arise as limits of classical varieties under logarithmic scaling. In statistical modeling, tropical varieties represent piecewise-linear approximations of exponential families, capturing the geometry of maximum likelihood estimation in a combinatorial manner. For instance, the tropicalization of a graphical model's variety yields a fan structure that encodes conditional independence constraints as linear inequalities. A key application is tropical principal component analysis (PCA), which adapts classical PCA to the tropical projective torus for dimension reduction and outlier detection. Tropical PCA identifies the "closest" tropical linear space or tropical polytope to observed data points using a tropical metric, such as d_{\text{tr}}(v, w) = \max_{i,j} |v_i - w_i - v_j + w_j|, making it robust to outliers because the max-norm emphasizes extreme deviations without being overly influenced by noise. This method has been applied to phylogenetic data, where it projects tree spaces into lower dimensions, revealing clusters and removing anomalous trees to improve model fit, as demonstrated on datasets with 252 sequences from eight Apicomplexa species. In hierarchical clustering, tropical statistics employs ultrametrics derived from tropical distances to construct dendrograms that reflect evolutionary or hierarchical structures. Ultrametrics satisfy the strong triangle inequality d(x, z) \leq \max(d(x, y), d(y, z)), aligning naturally with the min-plus algebra and enabling divisive analysis (DIANA) algorithms over the tropical projective torus. Such methods outperform Euclidean clustering in accuracy when applied to simulated phylogenetic trees under multi-species coalescent models, achieving higher recovery rates for true clusterings in datasets of 1000 gene trees sampled from 20 individuals. This approach models tree metrics as tropical convex hulls. For likelihood-based inference, the tropical log-likelihood is formulated as the maximum of the negative log-likelihood, or "max-log," which tropicalizes the classical likelihood by evaluating \ell_{\text{trop}}(u) = \bigoplus_i u_i \otimes \theta_i = \min_i (u_i + \theta_i) over parameters \theta. This perspective simplifies optimization in discrete models, such as those for contingency tables, where tropical ideals are generated by tropical binomials corresponding to conditional independences. These tropical exponential families generalize classical ones by parameterizing distributions via tropical cumulants, offering scalable tools for robust testing in high-dimensional data.
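A short Python sketch of the min-plus operations and the tropical metric quoted above (the numerical values are illustrative):

```python
# Sketch: tropical semiring operations, the tropical metric, and a "max-log" evaluation.
import itertools
from functools import reduce

def trop_add(a, b):        # tropical sum:     a (+) b = min(a, b)
    return min(a, b)

def trop_mul(a, b):        # tropical product: a (x) b = a + b
    return a + b

def tropical_distance(v, w):
    """d_tr(v, w) = max over i, j of |(v_i - w_i) - (v_j - w_j)|; invariant under
    adding a constant to all coordinates (the tropical projective torus)."""
    diffs = [vi - wi for vi, wi in zip(v, w)]
    return max(abs(di - dj) for di, dj in itertools.combinations(diffs, 2))

v, w = [0.0, 2.0, 5.0], [1.0, 1.0, 7.0]
print(tropical_distance(v, w))                          # 3.0
print(tropical_distance([x + 10 for x in v], w))        # 3.0: translation-invariant

# Tropical evaluation min_i (u_i + theta_i) from the text, via (+) and (x).
u, theta = [1.0, 4.0, 2.0], [3.0, 0.0, 5.0]
print(reduce(trop_add, (trop_mul(ui, ti) for ui, ti in zip(u, theta))))   # 4.0
```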

Statistical Learning Theory

Algebraic statistics intersects with statistical learning theory by employing tools from algebraic geometry and commutative algebra to analyze model complexity, identifiability, and generalization in machine learning settings. These methods provide rigorous frameworks for understanding phenomena like overfitting and regularization, often through the parameterization of statistical models as algebraic varieties. For instance, the geometry of these varieties reveals bounds on learning capacity, while tensor decompositions ensure unique recovery of latent structures in high-dimensional data. In algebraic regularization, techniques such as the \ell_1 penalty promote sparsity by encouraging solutions where many coefficients vanish, akin to testing membership in ideals generated by sparse polynomials in a polynomial ring. The ideal membership problem—determining whether a given polynomial lies in an ideal spanned by sparse generators—mirrors the sparsity constraints in Lasso regression, allowing algebraic algorithms like Gröbner bases to certify sparse solutions efficiently. This perspective extends to broader sparse modeling, where the support of the solution corresponds to the minimal generators of the ideal, providing a combinatorial certificate for regularization outcomes. Furthermore, the algebraic geometry of the VC dimension quantifies the shattering capacity of hypothesis classes; for discrete Markov networks, the VC dimension equals the dimension of the associated toric ideal, linking to the algebraic complexity of the model. This connection enables bounds on learnability using the existential theory of the reals to compute VC dimensions via semialgebraic descriptions. Tensor methods in algebraic statistics leverage canonical polyadic (CP) decomposition to address identifiability in topic models, where the observed word co-occurrence tensor is decomposed into a sum of rank-1 components representing latent topics. Uniqueness of this decomposition, guaranteed under Kruskal's conditions on the factor matrices, ensures that model parameters can be recovered from moments of the data distribution, facilitating efficient learning algorithms like tensor power iteration. In independent component analysis (ICA), secant varieties of moment varieties play a central role in establishing identifiability; the generic points on the k-th secant variety correspond to mixtures of k independent components, and non-degeneracy conditions (e.g., non-Gaussianity) ensure unique decomposition of the observed covariance structure. This geometric framework, rooted in the closure of unions of linear spans through variety points, provides algebraic certificates for the recoverability of mixing matrices in blind source separation. Generalization bounds in kernel methods benefit from algebraic extensions, such as C*-algebraic structures in reproducing kernel Hilbert modules (RKHM), which refine complexity measures to account for operator-valued kernels. These bounds show that the expected generalization error decays as O(1/√n) for n samples, with the complexity term controlled by the module's algebraic structure, offering tighter estimates than standard analyses. A key concept linking algebra to learning is overfitting, analyzed via the degree of the variety: higher-degree varieties parameterizing the model can interpolate training data more flexibly, but their increased algebraic degree amplifies variance, leading to poor out-of-sample performance as quantified by the variety's embedding dimension. Similarly, Rademacher complexity ties to the Hilbert function of the coordinate ring, which tracks the growth rate of basis elements in graded degrees and serves as an algebraic proxy for the function class's richness, bounding the expected supremum deviation over random labels. This integration highlights how algebraic invariants like the Hilbert function predict generalization gaps in overparameterized models.
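To illustrate the tensor-moment viewpoint, the following NumPy sketch (a toy single-topic model with hypothetical parameters) builds a third-order moment tensor as a sum of rank-1 terms and checks that its unfolding rank matches the number of topics.

```python
# Sketch: M3 = sum_k w_k * (a_k outer a_k outer a_k) has low-rank unfoldings.
import numpy as np

rng = np.random.default_rng(0)
vocab, topics = 6, 2
A = rng.dirichlet(np.ones(vocab), size=topics)    # rows a_k: word distributions per topic
w = np.array([0.4, 0.6])                          # topic proportions

M3 = sum(w[k] * np.einsum('i,j,k->ijk', A[k], A[k], A[k]) for k in range(topics))

# The mode-1 unfolding of a rank-r CP tensor has matrix rank at most r, one of the
# algebraic fingerprints used when identifying the number of latent topics.
unfolding = M3.reshape(vocab, vocab * vocab)
print(np.linalg.matrix_rank(unfolding))           # 2
```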

Algebraic Analysis and Abstract Inference

Algebraic analysis in statistics reframes classical problems using tools from commutative and non-commutative algebra, treating statistical models as algebraic varieties or semialgebraic sets defined by polynomial equations and inequalities. In this framework, abstract inference views the process of drawing conclusions from data as ring homomorphisms between rings of observables, where the model is represented by a polynomial ring generated by random variables, and the parameter space maps to the probability simplex via a parametrization homomorphism whose kernel defines the model's toric ideal. Sufficiency emerges algebraically through quotient ideals: a statistic is sufficient if the conditional distribution factors through a quotient by the ideal generated by functions invariant under the sufficient statistic, preserving the structure of the likelihood while reducing dimensionality, as detailed in the geometry of sufficient statistics cones for exponential families where maximum likelihood estimates exist when data points lie in the cone's interior. Decision theory receives an algebraic treatment by encoding Bayesian updates and risk measures into polynomial structures. Algebraic Bayes factors can be computed as ratios derived from the cells of Gröbner fans associated with the ideals of marginal and conditional likelihoods, providing a combinatorial certificate for model comparison in discrete settings like contingency tables, where the fan's regular subdivisions correspond to optimal posterior approximations. Orbit estimation over quotient spaces addresses equivariant problems under group actions, such as recovering signals up to symmetry; here, invariants from the invariant ring R^G (where G acts on \mathbb{R}^p) allow estimation of orbits from noisy observations, with sample complexity scaling as \Theta(\sigma^{2d^*}), where d^* is the minimal degree of separating invariants, achieving optimal rates for problems like cryo-electron microscopy. Non-commutative extensions carry this framework to quantum statistics, replacing commutative polynomial rings with free algebras to model entangled quantum systems. In free algebraic statistics, observables are non-commuting operators, and statistical models are defined over free semialgebraic sets where positivity constraints become positive-semidefiniteness of Hermitian matrices; for instance, independence models lift to positive operator-valued measures (POVMs) with tensor-product decompositions, capturing quantum correlations in non-local games via the free convex hull of rank-one parametrizations. This approach facilitates inference in entangled models by evaluating polynomials on matrix tuples, enabling algebraic optimization for quantum hypothesis testing and parameter estimation beyond classical limits. A key result in this algebraic paradigm is the algebraization of the Neyman-Pearson lemma through orthogonal ideals: the most powerful test for simple hypotheses corresponds to the quotient by an orthogonal ideal in the ring of test functions, where orthogonality ensures the test statistic separates the null and alternative measures maximally, generalizing the classical lemma to invariant and equivariant settings via decomposition of the hypothesis ideal. This provides an algebraic computation of critical regions, ensuring type I error control while maximizing power in algebraic models of discrete data.

Partially Ordered Sets and Lattices

In algebraic statistics, partially ordered sets (posets) provide a natural framework for modeling preference orders in ranking data, where the states of a statistical model correspond to orderings of a finite set of items represented as maximal chains in a graded poset. These models parameterize families of probability distributions over rankings using rational functions, leading to algebraic varieties that capture dependencies among preferences; for instance, the Plackett-Luce model is non-toric, while toric examples include the Birkhoff and Bradley-Terry models, whose toric ideals and Markov bases are analyzed via the poset's combinatorial structure. The Dedekind-MacNeille completion extends such posets to complete lattices by embedding them into the smallest complete lattice containing the original order, facilitating closure operators that ensure the statistical model's hierarchical structure is fully deductive and handles incomplete rankings through ideal completions. Lattice theory plays a central role in algebraic statistics through distributive lattices, which model marginalization operations in multi-way contingency tables under hierarchical log-linear models. In these models, the set of allowable interactions among categorical variables forms a distributive lattice ordered by refinement of marginal constraints, where the join and meet operations correspond to combining or intersecting marginal distributions, enabling exact tests via toric ideals. Birkhoff's representation theorem asserts that every finite distributive lattice is isomorphic to the lattice of order ideals in a poset of join-irreducible elements, providing an algebraic decomposition for such marginal lattices in contingency table analysis; this representation aids in enumerating sufficient statistics and computing Gröbner bases for exact inference in sparse data settings. For inference in these structures, algebraic tests assess lattice compatibility by verifying whether marginal or conditional distributions derive consistently from a global joint distribution within the lattice of kernels, using projections and conditionings as lattice operations to check dominance relations. Möbius inversion from the incidence algebra of the lattice inverts summation over order ideals to perform inclusion-exclusion on probabilities, computing query probabilities as P(\Phi) = \sum_{v \leq \hat{1}} \mu_L(v, \hat{1}) P(\lambda(v)), where \mu_L is the Möbius function, thus enabling efficient Bayesian updates and avoiding overcounting in hierarchical models. A key example is the Boolean lattice for power set models, where the poset of subsets ordered by inclusion models independence structures in multi-dimensional binary data, such as in latent class analysis; the zeta function is defined by \zeta(S,T) = 1 if S \subseteq T and 0 otherwise, facilitating Möbius inversion to compute exact probabilities over subset events via the incidence algebra.
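A small Python sketch of Möbius inversion on the Boolean lattice, using the Möbius function mu(S, T) = (-1)^{|T \ S|} implicit in the zeta function above (the probability assignment is synthetic):

```python
# Sketch: zeta-style summation over supersets and its Moebius inverse on subsets of {0, 1, 2}.
import random
from itertools import combinations

items = (0, 1, 2)
subsets = [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

# Synthetic assignment f(T) summing to 1 (stands in for P(exactly the events in T occur)).
random.seed(1)
raw = [random.random() for _ in subsets]
f = {T: x / sum(raw) for T, x in zip(subsets, raw)}

# Accumulation along the order: g(S) = sum of f(T) over all T containing S.
g = {S: sum(f[T] for T in subsets if S <= T) for S in subsets}

# Moebius inversion recovers f: f(S) = sum over T >= S of (-1)^(|T|-|S|) * g(T).
f_back = {S: sum((-1) ** (len(T) - len(S)) * g[T] for T in subsets if S <= T)
          for S in subsets}

print(all(abs(f[S] - f_back[S]) < 1e-12 for S in subsets))   # True
```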

Illustrative Examples

Basic Example: Independence Models in Contingency Tables

A fundamental illustration of algebraic statistics arises in testing independence between two binary variables using a 2×2 contingency table, where observed counts \mathbf{u} = (u_{11}, u_{12}, u_{21}, u_{22}) follow a multinomial distribution with total sample size n = \sum u_{ij} and unknown probabilities \mathbf{p} = (p_{11}, p_{12}, p_{21}, p_{22}) \in \Delta_3, the 3-simplex. Under the null hypothesis of independence, the probabilities satisfy p_{ij} = a_i b_j for parameters a_1, a_2, b_1, b_2 > 0 with \sum a_i = \sum b_j = 1, forming a toric model whose algebraic variety is the rank-1 surface in \mathbb{R}^4 defined by the condition that the probability matrix has rank at most 1. This model preserves fixed row and column marginals, and the corresponding toric ideal in the homogeneous coordinate ring is generated by the single binomial I = \langle p_{11} p_{22} - p_{12} p_{21} \rangle, which encodes the independence constraints algebraically. To test the hypothesis conditional on the observed marginals, algebraic methods employ a Markov basis for the integer kernel of the design matrix encoding the marginal constraints, enabling conditional inference via Markov chain Monte Carlo sampling. For the 2×2 model, the minimal Markov basis consists of a single essential move up to sign: \mathbf{m} = (1, -1, -1, 1), corresponding to adding or subtracting 1 from the diagonal entries while adjusting the off-diagonals to preserve margins; this basis connects all tables in the fiber \mathcal{F}_\mathbf{u} = \{ \mathbf{v} \in \mathbb{N}^4 : A\mathbf{v} = A\mathbf{u} \}, where A is the matrix computing the row and column margins of the 2×2 table. The test statistic is the Pearson chi-squared value X^2(\mathbf{v}) = \sum_{i,j} \frac{(v_{ij} - \hat{v}_{ij})^2}{\hat{v}_{ij}}, with expected counts \hat{v}_{ij} computed from the marginals, evaluated at tables \mathbf{v} in the fiber. A step-by-step exact test proceeds as follows: first, generate the Markov basis using computational software, which for this model yields \{\pm \mathbf{m}\}; then, starting from the observed \mathbf{u}, perform a random walk on \mathcal{F}_\mathbf{u} by adding integer multiples of basis elements while staying non-negative, approximating the uniform distribution over the fiber after sufficient steps (mixing time is rapid here due to the simple basis). The p-value is the proportion of sampled tables with X^2(\mathbf{v}) \geq X^2(\mathbf{u}), providing a non-asymptotic alternative to the standard chi-squared approximation, which assumes large n and follows a \chi^2_1 distribution. For small samples, the method avoids the approximation's inaccuracies, as the fiber grows rapidly with table size but remains tractable for small tables. The degrees of freedom for the asymptotic test equal the codimension of the model variety, which is 1 here, reflecting the single algebraic constraint reducing the 3-dimensional simplex to a 2-dimensional surface (after accounting for the normalization \sum p_{ij} = 1). This aligns with the statistical count (2-1)(2-1) = 1, linking the geometric dimension of the model to classical degrees-of-freedom calculations.
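The sampling procedure described above can be sketched in a few lines of Python; the observed table is hypothetical, and the walk targets the uniform distribution on the fiber because the ±move proposal is symmetric and rejected moves leave the chain in place.

```python
# Sketch: Monte Carlo approximation of the proportion of fiber tables with X^2 >= X^2_obs.
import random

def chi_squared(table):
    """Pearson X^2 against the independence expectations for a 2x2 table."""
    u11, u12, u21, u22 = table
    n = u11 + u12 + u21 + u22
    rows, cols = (u11 + u12, u21 + u22), (u11 + u21, u12 + u22)
    expected = [rows[i] * cols[j] / n for i in (0, 1) for j in (0, 1)]
    return sum((o - e) ** 2 / e for o, e in zip(table, expected))

def walk_p_value(observed, steps=20000, seed=1):
    """Fiber walk with the move (1, -1, -1, 1); moves leaving the fiber are rejected."""
    random.seed(seed)
    move = (1, -1, -1, 1)
    current, x2_obs, hits = list(observed), chi_squared(observed), 0
    for _ in range(steps):
        sign = random.choice((1, -1))
        proposal = [c + sign * m for c, m in zip(current, move)]
        if min(proposal) >= 0:         # stay inside the nonnegative fiber
            current = proposal
        if chi_squared(current) >= x2_obs:
            hits += 1
    return hits / steps

print(walk_p_value((3, 1, 2, 4)))      # roughly 0.8 for this hypothetical table
```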

Advanced Example: Phylogenetic Invariants

In phylogenetic reconstruction, an advanced application of algebraic statistics involves using invariants to infer quartet trees from distance data under the additive tree model. Consider four taxa labeled A, B, C, and D, where pairwise distances d_{ij} represent the expected evolutionary distances between taxa i and j, assumed to follow an additive model on an unrooted binary tree. Under this model, the distances satisfy the four-point condition: for any quartet, the two largest of the three possible sums d_{AB} + d_{CD}, d_{AC} + d_{BD}, and d_{AD} + d_{BC} are equal, corresponding to the path lengths crossing the internal branch of the true tree topology. The key invariants are constructed as linear polynomials in the distances, each vanishing precisely on the variety defined by one of the three possible quartet topologies. Specifically, the six polynomials consist of the three primary differences and their negatives: \delta_{AB|CD} = d_{AC} + d_{BD} - d_{AD} - d_{BC}, \delta_{AC|BD} = d_{AB} + d_{CD} - d_{AD} - d_{BC}, \delta_{AD|BC} = d_{AB} + d_{CD} - d_{AC} - d_{BD}, together with -\delta_{AB|CD}, -\delta_{AC|BD}, -\delta_{AD|BC}. For the true topology AB|CD, \delta_{AB|CD} = 0 holds exactly, while the other two \deltas are negative and equal (each minus twice the length of the internal branch); the negatives ensure the ideal is generated symmetrically for computational purposes in software. These polynomials derive from the edge invariants of the tree metric, providing a coordinate-free description of the model variety. To reconstruct the tree, evaluate the observed distances on these polynomials and identify the topology via the sign patterns or magnitudes. Under the additive model with no noise, exactly one \delta vanishes, and the corresponding split defines the topology; the other two \deltas share the same negative sign. With noisy data, the topology is the one minimizing the absolute value of its \delta, or more algebraically, solve the system \delta_k = 0 for each k using resultants to eliminate variables and test membership in the variety (though for linear cases, direct evaluation suffices; resultants are crucial for higher-degree invariants in sequence-based models). This approach extends the basic models by handling continuous distance parameters and multivariate dependencies. These invariants offer robustness to mild model misspecification, such as heterogeneous substitution rates across sites, as the vanishing of the correct \delta persists under certain transformations (e.g., gamma-distributed rates), unlike higher-degree invariants that may fail. However, severe long-branch attraction can distort sign patterns, requiring additional checks like \delta-plots to detect outliers. Software implementations, such as standard distance-based phylogenetics packages or Macaulay2 for ideal membership testing, facilitate practical computation by computing the pairwise distances and applying the four-point condition to score topologies efficiently.
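A minimal Python sketch of this reconstruction step, evaluating the three delta polynomials on exact additive distances from a hypothetical tree:

```python
# Sketch: pick the quartet split whose delta is closest to zero on additive distances.
def quartet_deltas(d):
    """d maps frozensets of two taxon labels to pairwise distances."""
    s_ab_cd = d[frozenset('AB')] + d[frozenset('CD')]
    s_ac_bd = d[frozenset('AC')] + d[frozenset('BD')]
    s_ad_bc = d[frozenset('AD')] + d[frozenset('BC')]
    return {
        'AB|CD': s_ac_bd - s_ad_bc,      # vanishes when the split is AB|CD
        'AC|BD': s_ab_cd - s_ad_bc,      # vanishes when the split is AC|BD
        'AD|BC': s_ab_cd - s_ac_bd,      # vanishes when the split is AD|BC
    }

# Additive distances on the tree ((A,B),(C,D)) with pendant lengths 1, 2, 3, 4 and
# internal branch length 5 (a hypothetical example, not from the source).
d = {frozenset('AB'): 3, frozenset('CD'): 7,
     frozenset('AC'): 9, frozenset('AD'): 10,
     frozenset('BC'): 10, frozenset('BD'): 11}

deltas = quartet_deltas(d)
print(deltas)                                      # AB|CD delta is 0, the others are -10
print(min(deltas, key=lambda k: abs(deltas[k])))   # 'AB|CD'
```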

