Iterative proportional fitting
The iterative proportional fitting procedure (IPF or IPFP, also known as biproportional fitting or biproportion in statistics or economics (input-output analysis, etc.), the RAS algorithm[1] in economics, raking in survey statistics, and matrix scaling in computer science) is the operation of finding the fitted matrix $X$ which is the closest to an initial matrix $Z$ but with the row and column totals of a target matrix $Y$ (which provides the constraints of the problem; the interior of $Y$ is unknown). The fitted matrix is of the form $X = PZQ$, where $P$ and $Q$ are diagonal matrices such that $X$ has the margins (row and column sums) of $Y$. Several algorithms can be chosen to perform biproportion: entropy maximization,[2][3] information loss minimization (or cross-entropy),[4] or RAS, which consists of factoring the matrix rows to match the specified row totals, then factoring its columns to match the specified column totals; each step usually disturbs the previous step's match, so these steps are repeated in cycles, re-adjusting the rows and columns in turn, until all specified marginal totals are satisfactorily approximated. However, all of these algorithms give the same solution.[5] In three- or more-dimensional cases, adjustment steps are applied for the marginals of each dimension in turn, the steps likewise repeated in cycles.
History
IPF has been "re-invented" many times, the earliest by Kruithof in 1937[6] in relation to telephone traffic ("Kruithof's double factor method"), by Deming and Stephan in 1940[7] for adjusting census crosstabulations, and by G. V. Sheleikhovskii for traffic, as reported by Bregman.[8] (Deming and Stephan proposed IPFP as an algorithm leading to a minimizer of the Pearson X-squared statistic, which Stephan later reported it does not.)[9] Early proofs of uniqueness and convergence came from Sinkhorn (1964),[10] Bacharach (1965),[11] Bishop (1967),[12] and Fienberg (1970).[13] Bishop's proof that IPFP finds the maximum likelihood estimator for any number of dimensions extended a 1959 proof by Brown for 2×2×2 cases. Fienberg's proof by differential geometry exploits the method's constant crossproduct ratios for strictly positive tables. Csiszár (1975)[14] found necessary and sufficient conditions for general tables having zero entries. Pukelsheim and Simeone (2009)[15] give further results on convergence and error behavior.
An exhaustive treatment of the algorithm and its mathematical foundations can be found in the book by Bishop et al. (1975).[16] Idel (2016)[17] gives a more recent survey.
Other general algorithms can be modified to yield the same limit as the IPFP, for instance the Newton–Raphson method and the EM algorithm. In most cases, IPFP is preferred due to its computational speed, low storage requirements, numerical stability and algebraic simplicity.
Applications of IPFP have grown to include trip distribution models (Fratar or Furness) and other applications in transportation planning (Lamond and Stewart), survey weighting, synthesis of cross-classified demographic data, adjusting input–output models in economics, estimating expected quasi-independent contingency tables, biproportional apportionment systems of political representation, and use as a preconditioner in linear algebra.[18]
Biproportion
Biproportion, whatever the algorithm used to solve it, is the following concept: matrix $Y$ and matrix $Z$ are known real nonnegative matrices of dimension $m \times n$; the interior of $Y$ is unknown, and a matrix $X$ is sought such that $X$ has the same margins as $Y$, i.e., $Xs = Ys$ and $s'X = s'Y$ ($s$ being the sum vector), and such that $X$ is close to $Z$ following a given criterion, the fitted matrix being of the form $X = PZQ$, where $P$ and $Q$ are diagonal matrices.
The criterion is the minimization of the information loss $\min_X \sum_i \sum_j x_{ij} \ln \frac{x_{ij}}{z_{ij}}$ s.t. $\sum_j x_{ij} = y_{i\cdot}$, $\forall i$, and $\sum_i x_{ij} = y_{\cdot j}$, $\forall j$. The Lagrangian is
$L = \sum_i \sum_j x_{ij} \ln \frac{x_{ij}}{z_{ij}} + \sum_i \lambda_i \Big( y_{i\cdot} - \sum_j x_{ij} \Big) + \sum_j \mu_j \Big( y_{\cdot j} - \sum_i x_{ij} \Big).$
Thus $\frac{\partial L}{\partial x_{ij}} = \ln \frac{x_{ij}}{z_{ij}} + 1 - \lambda_i - \mu_j = 0$, $\forall i, j$,
which, after posing $p_i = e^{\lambda_i - 1/2}$ and $q_j = e^{\mu_j - 1/2}$, yields
$x_{ij} = p_i\, z_{ij}\, q_j$, $\forall i, j$, i.e., $X = PZQ$,
with $p_i = \dfrac{y_{i\cdot}}{\sum_j z_{ij}\, q_j}$, $\forall i$, and $q_j = \dfrac{y_{\cdot j}}{\sum_i p_i\, z_{ij}}$, $\forall j$. The $p_i$ and $q_j$ form a system that can be solved iteratively:
$p_i^{(t+1)} = \dfrac{y_{i\cdot}}{\sum_j z_{ij}\, q_j^{(t)}}$, $\forall i$, and $q_j^{(t+1)} = \dfrac{y_{\cdot j}}{\sum_i p_i^{(t+1)} z_{ij}}$, $\forall j$.
The solution $X$ is independent of the initialization chosen (i.e., we can begin with $q_j^{(0)} = 1$, $\forall j$, or with $p_i^{(0)} = 1$, $\forall i$). If the matrix $Z$ is "indecomposable", then this process has a unique fixed point, because it is deduced from a program in which the objective function is convex and continuously differentiable on a compact set. In some cases the solution may not exist: see de Mesnard's example cited by Miller and Blair (Miller R.E. & Blair P.D. (2009) Input-Output Analysis: Foundations and Extensions, Second edition, Cambridge (UK): Cambridge University Press, pp. 335–336, freely available).
Some properties (see de Mesnard (1994)):
- Lack of information: if $Z$ brings no information, i.e., $z_{ij} = k$, $\forall i, j$ (for some constant $k > 0$), then $X$ is the independence table with the margins of $Y$: $x_{ij} = y_{i\cdot}\, y_{\cdot j} / y_{\cdot\cdot}$.
- Idempotency: $X = Z$ if $Z$ already has the same margins as $Y$.
- Composition of biproportions: writing $\operatorname{bip}(Z, Y)$ for the biproportion of $Z$ to the margins of $Y$, $\operatorname{bip}(\operatorname{bip}(Z, Y^{1}), Y^{2}) = \operatorname{bip}(Z, Y^{2})$.
- Zeros: a zero in $Z$ is projected as a zero in $X$. Thus, a block-diagonal matrix is projected as a block-diagonal matrix and a triangular matrix is projected as a triangular matrix.
- Theorem of separable modifications: if $Z$ is premultiplied by a diagonal matrix and/or postmultiplied by a diagonal matrix, then the solution $X$ is unchanged.
- Theorem of "unicity": if $\mathcal{A}$ is any non-specified algorithm producing $X = \mathcal{A}(Z, Y)$ of the form $X = PZQ$, with the diagonal matrices $P$ and $Q$ unknown, then $X$ coincides with the biproportional solution. The demonstration calls on the above properties, particularly the theorem of separable modifications and the composition of biproportions.
Algorithm 1 (classical IPF)
Given a two-way $(I \times J)$-table $(x_{ij})$, we wish to estimate a new table $\hat m_{ij} > 0$ for all $i$ and $j$ such that the marginals satisfy $\sum_j \hat m_{ij} = u_i$ and $\sum_i \hat m_{ij} = v_j$.
Choose initial values $\hat m_{ij}^{(0)} := x_{ij}$, and for $\eta \geq 0$ set
$\hat m_{ij}^{(2\eta+1)} = \dfrac{\hat m_{ij}^{(2\eta)}\, u_i}{\sum_k \hat m_{ik}^{(2\eta)}},$
$\hat m_{ij}^{(2\eta+2)} = \dfrac{\hat m_{ij}^{(2\eta+1)}\, v_j}{\sum_k \hat m_{kj}^{(2\eta+1)}}.$
Repeat these steps until row and column totals are sufficiently close to u and v.
Notes:
- For the RAS form of the algorithm, define the diagonalization operator $\operatorname{diag}\colon \mathbb{R}^{k} \to \mathbb{R}^{k \times k}$, which produces a (diagonal) matrix with its input vector on the main diagonal and zeros elsewhere. Then, for each row adjustment, let $R = \operatorname{diag}(u)\, \operatorname{diag}(M \mathbf{1})^{-1}$, from which $M^{\text{new}} = R M$. Similarly each column adjustment's $S = \operatorname{diag}(v)\, \operatorname{diag}(M^{\top} \mathbf{1})^{-1}$, from which $M^{\text{new}} = M S$. Reducing the operations to the necessary ones, it can easily be seen that RAS does the same as classical IPF. In practice, one would not implement actual matrix multiplication with the whole $R$ and $S$ matrices; the RAS form is more a notational than a computational convenience.
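The alternating row and column fitting steps above can be sketched in a few lines of NumPy (a minimal illustration under our own naming, not the mipfp or ipfn implementation):

```python
import numpy as np

def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Classical IPF: alternately rescale rows and columns of `seed`
    until both margins match the targets within `tol`."""
    m = seed.astype(float).copy()
    for _ in range(max_iter):
        # Row fitting: scale each row i by u_i / (current row sum)
        m *= (row_targets / m.sum(axis=1))[:, None]
        # Column fitting: scale each column j by v_j / (current column sum)
        m *= col_targets / m.sum(axis=0)
        if (np.abs(m.sum(axis=1) - row_targets).max() < tol
                and np.abs(m.sum(axis=0) - col_targets).max() < tol):
            break
    return m
```

Note that because a column fitting step always ends a cycle, the column margins are matched exactly after every iteration, while the row margins converge gradually.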
Algorithm 2 (factor estimation)
Assume the same setting as in the classical IPFP. Alternatively, we can estimate the row and column factors separately: choose initial values $b_j^{(0)} := 1$, and for $\eta \geq 0$ set
$a_i^{(\eta+1)} = \dfrac{u_i}{\sum_j x_{ij}\, b_j^{(\eta)}},$
$b_j^{(\eta+1)} = \dfrac{v_j}{\sum_i x_{ij}\, a_i^{(\eta+1)}}.$
Repeat these steps until successive changes of a and b are sufficiently negligible (indicating the resulting row- and column-sums are close to u and v).
Finally, the result matrix is $\hat m_{ij} = a_i\, x_{ij}\, b_j$.
Notes:
- The two variants of the algorithm are mathematically equivalent, as can be seen by formal induction. With factor estimation, it is not necessary to actually compute each cycle's table $\hat m_{ij}^{(\eta)}$.
- The factorization is not unique, since $a_i\, x_{ij}\, b_j = (\gamma a_i)\, x_{ij}\, (b_j / \gamma)$ for all $\gamma > 0$.
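The factor iteration can likewise be sketched directly on the vectors $a$ and $b$ (a hedged NumPy sketch; function and variable names are ours):

```python
import numpy as np

def ipf_factors(x, u, v, tol=1e-12, max_iter=1000):
    """Algorithm 2: estimate row factors a and column factors b so that
    m_ij = a_i * x_ij * b_j has row sums u and column sums v."""
    a = np.ones(len(u))
    b = np.ones(len(v))
    for _ in range(max_iter):
        a_new = u / (x @ b)        # a_i = u_i / sum_j x_ij b_j
        b_new = v / (x.T @ a_new)  # b_j = v_j / sum_i x_ij a_i
        done = (np.abs(a_new - a).max() < tol
                and np.abs(b_new - b).max() < tol)
        a, b = a_new, b_new
        if done:
            break
    # Assemble the fitted table only once, at the end
    return a, b, a[:, None] * x * b[None, :]
```

The full table is materialized only after convergence, which is exactly why this variant is cheaper per iteration than the classical form.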
Discussion
The vaguely demanded "similarity" between $M$ and $X$ can be explained as follows: IPFP (and thus RAS) maintains the crossproduct ratios, i.e.,
$\dfrac{x_{i_1 j_1}\, x_{i_2 j_2}}{x_{i_1 j_2}\, x_{i_2 j_1}} = \dfrac{\hat m_{i_1 j_1}\, \hat m_{i_2 j_2}}{\hat m_{i_1 j_2}\, \hat m_{i_2 j_1}}$ for all $i_1 \neq i_2$ and $j_1 \neq j_2$,
since $\hat m_{ij} = a_i\, x_{ij}\, b_j$: the row factors cancel within each column pair and the column factors within each row pair.
This property is sometimes called structure conservation, and it directly leads to the geometrical interpretation of contingency tables and the proof of convergence in the seminal paper of Fienberg (1970).
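Structure conservation is easy to verify numerically (a quick sketch on a 2×2 seed of our own choosing; the cross-product ratio survives every multiplicative row or column adjustment):

```python
import numpy as np

x = np.array([[40., 30.], [35., 50.]])   # seed table
m = x.copy()
for _ in range(100):                      # a few IPF cycles
    m *= (np.array([100., 80.]) / m.sum(axis=1))[:, None]   # row fitting
    m *= np.array([90., 90.]) / m.sum(axis=0)               # column fitting

def cpr(t):
    # 2x2 cross-product (odds) ratio
    return t[0, 0] * t[1, 1] / (t[0, 1] * t[1, 0])

# cpr(x) and cpr(m) agree up to floating-point error
```
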
Direct factor estimation (algorithm 2) is generally the more efficient way to solve IPF: whereas the matrix (RAS) form of the classical IPFP needs on the order of $IJ(I + J)$ elementary operations in each iteration step (including a row and a column fitting step), factor estimation needs only on the order of $IJ$ operations, and is thus at least one order of magnitude faster than classical IPFP.
IPFP can be used to estimate expected quasi-independent (incomplete) contingency tables, with $u_i = x_{i\cdot}$, $v_j = x_{\cdot j}$, and seed values $\hat m_{ij}^{(0)} = 1$ for included cells and $\hat m_{ij}^{(0)} = 0$ for excluded cells. For fully independent (complete) contingency tables, estimation with IPFP concludes in exactly one cycle.
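The one-cycle claim for independent tables can be checked directly: if the seed is a rank-one (independence) table, a single row-and-column cycle already reproduces both target margins exactly (a small sketch with illustrative numbers of our own):

```python
import numpy as np

u = np.array([150., 300., 400., 150.])   # target row sums
v = np.array([200., 300., 400., 100.])   # target column sums (same total)

# Rank-1 seed = outer product => "fully independent" table
seed = np.outer([1., 2., 4., 3.], [5., 1., 2., 6.])

m = seed * (u / seed.sum(axis=1))[:, None]   # one row fitting step
m = m * (v / m.sum(axis=0))                  # one column fitting step
# After this single cycle both margins are matched exactly:
# m_ij = u_i * v_j / (total), the independence table with the new margins.
```
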
Comparison with the NM-method
Similar to the IPF, the NM-method is also an operation of finding a matrix $X$ which is the "closest" to a given matrix $Z$ while its row totals and column totals are identical to those of a target matrix $Y$.
However, there are differences between the NM-method and the IPF. For instance, the NM-method defines closeness of matrices of the same size differently from the IPF.[19] Also, the NM-method was developed to solve for matrix $X$ in problems where matrix $Z$ is not a sample from the population characterized by the row totals and column totals of matrix $Y$, but represents another population.[19] In contrast, matrix $Z$ is a sample from this population in problems where the IPF is applied as the maximum likelihood estimator.
Macdonald (2023)[20] concurs with the conclusion by Naszodi (2023)[21] that the IPF is suitable for sampling-correction tasks, but not for the generation of counterfactuals. Like Naszodi, Macdonald also questions whether the row- and column-proportional transformations of the IPF preserve the structure of association within a contingency table that allows us to study social mobility.
Existence and uniqueness of MLEs
Necessary and sufficient conditions for the existence and uniqueness of MLEs are complicated in the general case (see[22]), but sufficient conditions for 2-dimensional tables are simple:
- the marginals of the observed table do not vanish (that is, $x_{i\cdot} > 0$ for all $i$ and $x_{\cdot j} > 0$ for all $j$), and
- the observed table is inseparable (i.e., the table does not permute to a block-diagonal shape).
If unique MLEs exist, IPFP exhibits linear convergence in the worst case (Fienberg 1970), but exponential convergence has also been observed (Pukelsheim and Simeone 2009). If a direct estimator (i.e., a closed form of $\hat m_{ij}$) exists, IPFP converges after two iterations. If unique MLEs do not exist, IPFP converges toward the so-called extended MLEs by design (Haberman 1974), but convergence may be arbitrarily slow and often computationally infeasible.
If all observed values are strictly positive, existence and uniqueness of MLEs and therefore convergence is ensured.
Example
Consider the following table, given with its row and column sums and the targets.
| 1 | 2 | 3 | 4 | TOTAL | TARGET | |
|---|---|---|---|---|---|---|
| 1 | 40 | 30 | 20 | 10 | 100 | 150 |
| 2 | 35 | 50 | 100 | 75 | 260 | 300 |
| 3 | 30 | 80 | 70 | 120 | 300 | 400 |
| 4 | 20 | 30 | 40 | 50 | 140 | 150 |
| TOTAL | 125 | 190 | 230 | 255 | 800 | |
| TARGET | 200 | 300 | 400 | 100 | 1000 |
To execute the classical IPFP, we first adjust the rows:
| 1 | 2 | 3 | 4 | TOTAL | TARGET | |
|---|---|---|---|---|---|---|
| 1 | 60.00 | 45.00 | 30.00 | 15.00 | 150.00 | 150 |
| 2 | 40.38 | 57.69 | 115.38 | 86.54 | 300.00 | 300 |
| 3 | 40.00 | 106.67 | 93.33 | 160.00 | 400.00 | 400 |
| 4 | 21.43 | 32.14 | 42.86 | 53.57 | 150.00 | 150 |
| TOTAL | 161.81 | 241.50 | 281.58 | 315.11 | 1000.00 | |
| TARGET | 200 | 300 | 400 | 100 | 1000 |
The first step exactly matched row sums, but not the column sums. Next we adjust the columns:
| 1 | 2 | 3 | 4 | TOTAL | TARGET | |
|---|---|---|---|---|---|---|
| 1 | 74.16 | 55.90 | 42.62 | 4.76 | 177.44 | 150 |
| 2 | 49.92 | 71.67 | 163.91 | 27.46 | 312.96 | 300 |
| 3 | 49.44 | 132.50 | 132.59 | 50.78 | 365.31 | 400 |
| 4 | 26.49 | 39.93 | 60.88 | 17.00 | 144.30 | 150 |
| TOTAL | 200.00 | 300.00 | 400.00 | 100.00 | 1000.00 | |
| TARGET | 200 | 300 | 400 | 100 | 1000 |
Now the column sums exactly match their targets, but the row sums no longer match theirs. After completing three cycles, each with a row adjustment and a column adjustment, we get a closer approximation:
| 1 | 2 | 3 | 4 | TOTAL | TARGET | |
|---|---|---|---|---|---|---|
| 1 | 64.61 | 46.28 | 35.42 | 3.83 | 150.13 | 150 |
| 2 | 49.95 | 68.15 | 156.49 | 25.37 | 299.96 | 300 |
| 3 | 56.70 | 144.40 | 145.06 | 53.76 | 399.92 | 400 |
| 4 | 28.74 | 41.18 | 63.03 | 17.03 | 149.99 | 150 |
| TOTAL | 200.00 | 300.00 | 400.00 | 100.00 | 1000.00 | |
| TARGET | 200 | 300 | 400 | 100 | 1000 |
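The tables above can be reproduced with a short script (a sketch; each cycle is one row adjustment followed by one column adjustment, and the printed values are rounded to two decimals):

```python
import numpy as np

seed = np.array([[40., 30., 20., 10.],
                 [35., 50., 100., 75.],
                 [30., 80., 70., 120.],
                 [20., 30., 40., 50.]])
u = np.array([150., 300., 400., 150.])   # row targets
v = np.array([200., 300., 400., 100.])   # column targets

m = seed.copy()
for _ in range(3):                       # three full cycles
    m *= (u / m.sum(axis=1))[:, None]    # row adjustment
    m *= v / m.sum(axis=0)               # column adjustment

print(np.round(m, 2))   # compare with the three-cycle table in the text
```

After three cycles the column sums are exact (a column adjustment ran last) and the row sums are within about 0.13 of their targets, as in the table.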
Implementation
The R package mipfp (currently in version 3.2) provides a multi-dimensional implementation of the traditional iterative proportional fitting procedure.[23] The package allows the updating of an N-dimensional array with respect to given target marginal distributions (which, in turn, can be multi-dimensional).
Python has an equivalent package, ipfn,[24][25] which can be installed via pip. The package supports numpy and pandas input objects.
See also
- Data cleansing
- Data editing
- NM-method
- Triangulation (social science) for quantitative and qualitative study data enhancement.
References
- ^ Bacharach, M. (1965). "Estimating Nonnegative Matrices from Marginal Data". International Economic Review. 6 (3). Blackwell Publishing: 294–310. doi:10.2307/2525582. JSTOR 2525582.
- ^ Jaynes E.T. (1957) Information theory and statistical mechanics, Physical Review, 106: 620-30.
- ^ Wilson A.G. (1970) Entropy in urban and regional modelling. London: Pion LTD, Monograph in spatial and environmental systems analysis.
- ^ Kullback S. & Leibler R.A. (1951) On information and sufficiency, Annals of Mathematics and Statistics, 22 (1951) 79-86.
- ^ de Mesnard, L. (1994). "Unicity of Biproportion". SIAM Journal on Matrix Analysis and Applications. 15 (2): 490–495. doi:10.1137/S0895479891222507. https://www.researchgate.net/publication/243095013_Unicity_of_Biproportion
- ^ Kruithof, J (February 1937). "Telefoonverkeersrekening (Calculation of telephone traffic)". De Ingenieur. 52 (8): E15 – E25.
- ^ Deming, W. E.; Stephan, F. F. (1940). "On a Least Squares Adjustment of a Sampled Frequency Table When the Expected Marginal Totals are Known". Annals of Mathematical Statistics. 11 (4): 427–444. doi:10.1214/aoms/1177731829. MR 0003527.
- ^ Lamond, B. and Stewart, N.F. (1981) Bregman's balancing method. Transportation Research 15B, 239-248.
- ^ Stephan, F. F. (1942). "Iterative method of adjusting frequency tables when expected margins are known". Annals of Mathematical Statistics. 13 (2): 166–178. doi:10.1214/aoms/1177731604. MR 0006674. Zbl 0060.31505.
- ^ Sinkhorn, Richard (1964). "A Relationship Between Arbitrary Positive Matrices and Doubly Stochastic Matrices". Annals of Mathematical Statistics. 35 (2): 876–879.
- ^ Bacharach, Michael (1965). "Estimating Nonnegative Matrices from Marginal Data". International Economic Review. 6 (3): 294–310.
- ^ Bishop, Y. M. M. (1967). "Multidimensional contingency tables: cell estimates". PhD thesis. Harvard University.
- ^ Fienberg, S. E. (1970). "An Iterative Procedure for Estimation in Contingency Tables". Annals of Mathematical Statistics. 41 (3): 907–917. doi:10.1214/aoms/1177696968. JSTOR 2239244. MR 0266394. Zbl 0198.23401.
- ^ Csiszár, I. (1975). "I-Divergence of Probability Distributions and Minimization Problems". Annals of Probability. 3 (1): 146–158. doi:10.1214/aop/1176996454. JSTOR 2959270. MR 0365798. Zbl 0318.60013.
- ^ Pukelsheim, F.; Simeone, B. "On the Iterative Proportional Fitting Procedure: Structure of Accumulation Points and L1-Error Analysis". Retrieved 2009-06-28.
- ^ Bishop, Y. M. M.; Fienberg, S. E.; Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press. ISBN 978-0-262-02113-5. MR 0381130.
- ^ Martin Idel (2016) A review of matrix scaling and Sinkhorn’s normal form for matrices and positive maps arXiv preprint https://arxiv.org/pdf/1609.06349.pdf
- ^ Bradley, A.M. (2010) Algorithms for the equilibration of matrices and their application to limited-memory quasi-newton methods. Ph.D. thesis, Institute for Computational and Mathematical Engineering, Stanford University, 2010
- ^ a b Naszodi, A.; Mendonca, F. (2021). "A new method for identifying the role of marital preferences at shaping marriage patterns". Journal of Demographic Economics. 1 (1): 1–27. doi:10.1017/dem.2021.1.
- ^ Macdonald, K. (2023). "The marginal adjustment of mobility tables, revisited". OSF: 1–19.
- ^ Naszodi, A. (2023). "The iterative proportional fitting algorithm and the NM-method: solutions for two different sets of problems". arXiv:2303.05515 [econ.GN].
- ^ Haberman, S. J. (1974). The Analysis of Frequency Data. Univ. Chicago Press. ISBN 978-0-226-31184-5.
- ^ Barthélemy, Johan; Suesse, Thomas. "mipfp: Multidimensional Iterative Proportional Fitting". CRAN. Retrieved 23 February 2015.
- ^ "ipfn: pip".
- ^ "ipfn: github". GitHub.
Overview
Definition and Purpose
Iterative proportional fitting (IPF) is an iterative algorithm for adjusting the entries of an initial nonnegative matrix to satisfy given row and column marginal totals, while preserving the relative proportions of the original matrix as closely as possible. The method alternates between scaling all rows to match their prescribed totals and then scaling all columns to match theirs, repeating until convergence or a specified tolerance is achieved. This process ensures the adjusted matrix reproduces the exact marginal constraints multiplicatively.[5][4]

The core purpose of IPF is to generate or refine multidimensional contingency tables from partial information, such as one-dimensional marginal distributions and a seed matrix capturing approximate cell dependencies. It addresses the problem of estimating joint probabilities or frequencies when direct observations are unavailable but margins are known from sources like censuses or surveys. For instance, in demographic modeling, IPF constructs cross-classified tables (e.g., households by income and size) by fitting an initial uniform or prior-based matrix to separate univariate totals, enabling consistent population projections.[7][8]

In economic applications, such as input-output analysis, IPF updates matrices under the RAS procedure to reconcile supply and demand margins amid structural changes, maintaining balance without assuming independence. The algorithm's multiplicative nature yields the maximum entropy solution, or maximum likelihood estimates under log-linear models with Poisson-distributed cell counts, providing a statistically principled adjustment that avoids negative values and respects data nonnegativity. Limitations include potential non-convergence for incompatible margins or highly sparse seeds, though it remains computationally efficient for moderate dimensions.[9][4][1]

Relation to Biproportional Fitting
Biproportional fitting refers to the adjustment of an initial nonnegative matrix $Z$ to a target matrix $X = PZQ$ for diagonal scaling matrices $P$ and $Q$, ensuring the row sums of $X$ match prescribed totals $u$ and the column sums match prescribed totals $v$.[10] This formulation preserves the cross-product ratios of the initial matrix while achieving exact marginal consistency, a property central to applications in contingency table estimation and input-output analysis.[11] Iterative proportional fitting (IPF) provides the computational procedure to obtain this biproportional solution by alternately normalizing rows and columns of an intermediate matrix until convergence to the unique solution satisfying the margins, assuming compatibility conditions hold (i.e., the total sum of row margins equals that of column margins).[12]

The equivalence arises because each IPF iteration multiplies row $i$ by $u_i$ divided by the current row sum, and column $j$ by $v_j$ divided by the current column sum, accumulating the diagonal matrices $P$ and $Q$ such that $X = PZQ$.[13] Under the assumption of positive entries in $Z$ and compatible margins, the fixed point of IPF coincides with the unique biproportional fit, which also maximizes the likelihood under a multinomial model for cell probabilities proportional to the fitted entries.[14] This relation underscores IPF's role as the iterative solver for the biproportional problem, with historical usage of "biproportional fitting" predating widespread adoption of IPF terminology, as noted in economic literature from the 1960s.[11]

Historical Development
Origins and Early Applications
The iterative proportional fitting procedure originated in 1937 with R. Kruithof's work on telephone traffic networks, where it served as a method to adjust an initial matrix of observed call volumes to match specified row and column totals representing origin-destination demands and capacities.[15] Kruithof's approach, known as the "double factor method", iteratively scaled rows and columns alternately until the marginal constraints were satisfied, addressing imbalances in traffic data estimation without assuming independence between factors.[16] This application arose from practical needs in telecommunications engineering to reconcile incomplete or inconsistent datasets with known aggregates, demonstrating the method's utility in balancing multi-way arrays under additive constraints.

In 1940, W. Edwards Deming and Frederick F. Stephan independently adapted and formalized the procedure for statistical estimation in contingency tables, applying it to adjust sample frequency tables from surveys or censuses to align with known population marginal totals.[17] Their paper outlined an iterative process to minimize discrepancies while preserving relative cell proportions, motivated by challenges in raking survey data where direct marginal controls were available but internal crosstabulations were under-sampled or erroneous.[18] This statistical framing emphasized least-squares adjustments under Poisson sampling assumptions, establishing IPF as a tool for unbiased estimation in incomplete tabular data, particularly in demographic and social surveys.

Early applications extended to fields requiring reconciliation of inconsistent matrices, such as early input-output economic modeling and transportation planning, where the method balanced supply-demand tables against fixed totals.[2] By the mid-1940s, it gained traction in U.S. census adjustments and sample validation, though convergence properties remained empirically observed rather than theoretically proven until later analyses. These initial uses highlighted IPF's robustness for multi-dimensional problems but also its sensitivity to starting values and marginal consistency, as inconsistent constraints could prevent convergence.[17]

Key Advancements Post-1940
In the decades following its initial formulation, iterative proportional fitting (IPF) saw significant theoretical refinements, particularly regarding convergence and its statistical interpretation. Sinkhorn (1964) provided an early proof of convergence for the procedure when applied to scaling matrices to doubly stochastic form, establishing conditions under which the iterations stabilize.[17] This work laid groundwork for broader applications, though it initially focused on permutation-invariant cases. Bacharach (1965) extended the analysis to biproportional fitting in economic contexts, demonstrating uniqueness under compatible marginal constraints.[2]

By the late 1960s, connections to log-linear models emerged as a pivotal advancement. Darroch (1962) implicitly employed IPF to compute maximum likelihood estimates under models of independence, bridging the algorithm to probabilistic inference in contingency tables.[2] Fienberg (1970) formalized its role in deriving maximum likelihood estimates for hierarchical log-linear models, as proposed by Birch (1963), enabling IPF's use beyond simple margin matching to fit complex interaction structures. Bishop (1967) contributed proofs of existence and uniqueness for solutions in multi-way tables, addressing scalability to higher dimensions.[17]

Ireland and Kullback (1968) advanced the theoretical foundation by proving monotonic convergence to the unique solution minimizing Kullback-Leibler divergence from the initial matrix, subject to marginal constraints, thus linking IPF to information theory and entropy maximization principles.[10] This interpretation justified IPF's use in scenarios requiring minimal alteration to prior estimates, such as survey weighting or input-output table balancing via the RAS variant, which gained traction in economic modeling during the 1960s for updating matrices with new sector totals.[19]

Subsequent extensions in the 1970s and beyond included algorithmic optimizations for computational efficiency and handling of incompatible margins, with Fienberg and others emphasizing IPF's equivalence to generalized least squares under log-linear parametrization. These developments solidified IPF's status as a robust tool for empirical adjustment in statistics and econometrics, despite occasional non-convergence in sparse or high-dimensional settings without additional regularization.

Mathematical Formulation
Core Problem Setup
Iterative proportional fitting addresses the biproportional scaling problem of adjusting an initial non-negative matrix to prescribed row and column marginals via separable multiplicative factors. Given an $m \times n$ matrix $Z \geq 0$, a positive row marginal vector $u \in \mathbb{R}^{m}$, and a positive column marginal vector $v \in \mathbb{R}^{n}$ satisfying the compatibility condition $\sum_i u_i = \sum_j v_j$, the task is to determine positive diagonal matrices $P = \operatorname{diag}(p)$ and $Q = \operatorname{diag}(q)$ such that the scaled matrix $X = PZQ$ fulfills the marginal constraints $X \mathbf{1}_n = u$ and $X^{\top} \mathbf{1}_m = v$, where $\mathbf{1}_k$ is the $k$-dimensional all-ones vector.[10][17] This formulation ensures that, for entries where $z_{ij} > 0$, the adjustment ratios satisfy $x_{ij} / z_{ij} = p_i q_j$, maintaining proportionality through independent row and column scaling factors.

The problem originates from the need to reconcile a sampled or prior frequency table with known population marginal totals, as posed by Deming and Stephan in 1940, who framed it as a least-squares adjustment minimizing discrepancies relative to the initial estimates while enforcing the marginals.[20][18] Under standard assumptions of positive initial entries in the support of the solution and marginal compatibility, a unique solution exists within the biproportional family.[10]

Connection to Maximum Likelihood Estimation
Iterative proportional fitting (IPF) computes the maximum likelihood estimates (MLEs) for the expected cell frequencies in a contingency table under a log-linear Poisson model where the observed marginal totals are treated as fixed sufficient statistics.[2] Specifically, assuming the cell counts $n_{ij}$ are independent Poisson random variables with means $m_{ij}$, the model posits $\log m_{ij} = \lambda + \lambda_i^{A} + \lambda_j^{B}$ to incorporate row and column effects, ensuring the estimated margins match the observed ones.[21] The likelihood is maximized subject to these margin constraints, and IPF iteratively scales an initial estimate (often the observed table or a uniform matrix) by row and column factors until convergence, yielding the unique MLE under standard regularity conditions.[22]

This equivalence arises because the estimating equations for the MLE in such models reduce to matching the observed margins, as the Poisson log-likelihood's score equations depend only on the differences between expected and observed marginals.[23] Each IPF iteration corresponds to a conditional maximization step in the expectation-conditional maximization (ECM) algorithm, a generalization of the expectation-maximization (EM) algorithm, confirming its optimality for the Poisson parameterization.[24] For sparse tables or higher-dimensional arrays, IPF remains computationally efficient for MLE, though direct Newton-Raphson methods may be used for parameter estimation in unsaturated models.[4]

Equivalently, under the multinomial sampling framework with fixed total count, IPF provides the MLE for cell probabilities by conditioning on the margins, preserving the same iterative adjustment process.[25] This connection extends to relational models and curved exponential families, where generalized IPF variants compute MLEs for parameters beyond simple margins, such as in network inference from marginals.[26] Empirical studies confirm IPF's MLE properties hold under the Poisson assumption, with deviations only if margins are not sufficient statistics for the model.[27]

Algorithms
Classical Iterative Proportional Fitting
The classical iterative proportional fitting (IPF) procedure, also known as the RAS method in some economic contexts, is an alternating scaling algorithm designed to adjust an initial nonnegative matrix to satisfy specified row and column marginal totals while preserving the relative structure of the input.[9] Introduced by Deming and Stephan in 1940 for estimating cell probabilities in sampled frequency tables under known marginal constraints, it operates through successive multiplicative adjustments to rows and columns.[2] The method assumes the marginal totals are compatible, meaning the sum of row margins equals the sum of column margins, ensuring a feasible solution exists.[17]

The algorithm begins with an initial matrix $M^{(0)}$, typically derived from observed data, a uniform prior, or an independence assumption (e.g., the outer product of the marginals divided by the grand total). In each iteration $t$:
- Compute row scaling factors $r_i^{(t)} = u_i / \sum_j m_{ij}^{(t)}$, where $u_i$ is the target $i$-th row marginal.
- Update to the intermediate matrix $m_{ij}^{(t+1/2)} = r_i^{(t)}\, m_{ij}^{(t)}$.
- Compute column scaling factors $s_j^{(t)} = v_j / \sum_i m_{ij}^{(t+1/2)}$, where $v_j$ is the target $j$-th column marginal.
- Update to $m_{ij}^{(t+1)} = m_{ij}^{(t+1/2)}\, s_j^{(t)}$.
Factor Estimation Approach
The factor estimation approach to iterative proportional fitting computes scaling factors for each dimension directly, rather than iteratively adjusting the full contingency table as in the classical method. For a two-way table, it seeks diagonal matrices $A = \operatorname{diag}(a)$ (row factors) and $B = \operatorname{diag}(b)$ (column factors) such that the adjusted table $a_i x_{ij} b_j$ matches the target row marginals $u_i$ and column marginals $v_j$, where $X = (x_{ij})$ is the seed table. This formulation minimizes divergence from $X$ subject to the marginal constraints, equivalent to maximum likelihood under a multinomial model assuming independence.[30][31]

The algorithm initializes column factors $b_j^{(0)} = 1$ for all $j$, then alternates updates: row factors $a_i^{(\eta+1)} = u_i / \sum_j x_{ij} b_j^{(\eta)}$ for each $i$, followed by column factors $b_j^{(\eta+1)} = v_j / \sum_i x_{ij} a_i^{(\eta+1)}$ for each $j$. These steps repeat until the marginal discrepancies fall below a tolerance threshold, such as the L2-norm of the differences. The final adjusted table is then $\hat m_{ij} = a_i x_{ij} b_j$. This avoids materializing the full table in every iteration, reducing the per-step computational cost relative to classical IPF, where $I$ and $J$ are the table dimensions, and making it more efficient for large sparse tables.[30]

For multi-way tables, the approach extends to multiple factor sets, one per dimension, with cyclic updates over the marginal constraints; implementations handle $N$ dimensions via tensor operations. Convergence holds under the same conditions as classical IPF (positive seed entries and consistent margins), yielding the unique solution minimizing Kullback-Leibler divergence.[32][33] Practical software, such as Julia's ProportionalFitting package, employs this for $N$-dimensional arrays, confirming its scalability.[32]

Theoretical Properties
Existence and Uniqueness of Solutions
A solution to the biproportional fitting problem underlying iterative proportional fitting (IPF) exists if and only if the prescribed row marginal vector and column marginal vector (with ) are compatible with the support of the initial nonnegative matrix , meaning that and lie within the transportation polytope defined by the zero pattern of . This compatibility ensures the feasibility of scaling factors and such that achieves exactly the target margins, without violating structural zeros in . For instance, if has full support (all entries positive) and , existence is guaranteed as the polytope has nonempty interior.[10] When a solution exists, it is unique. This uniqueness stems from the fact that distinct diagonal scaling matrices and producing the same margins would imply identical row and column scaling factors up to the biproportional structure, as the mapping from scalings to margins is strictly monotone and invertible on the feasible set. Formally, if two biproportional scalings and of share the same row sums and column sums , then the row scaling factors coincide ( for all ) and similarly for columns, yielding . This property holds even for matrices with zeros, provided the solution respects the support.[10] In the context of maximum likelihood estimation for contingency tables under independent Poisson sampling, the IPF solution corresponds to the unique maximum-likelihood estimator (MLE) when it exists, as the likelihood is strictly log-concave on the feasible set, ensuring a unique global maximum. Absence of existence manifests as IPF failing to converge to exact margins, often diverging or stabilizing at infeasible approximations; uniqueness then applies to any limit point within the feasible polytope. 
Empirical verification in applications such as demographic projections confirms that violations of the existence conditions (e.g., marginals that are overly discrepant relative to the scale of X) lead to non-convergence, underscoring the theorem's practical import.[29]

Convergence Conditions
The iterative proportional fitting (IPF) procedure converges if and only if a biproportional fit to the prescribed row margins u and column margins v exists for the initial nonnegative matrix X, which requires the total marginal sums to be equal (Σ_i u_i = Σ_j v_j) and the target margins to satisfy the support constraints of X's zero pattern. Specifically, for every subset I of rows, the partial row sum must not exceed the partial column sum over the columns that receive support from I in X, i.e., Σ_{i ∈ I} u_i ≤ Σ_{j ∈ J(I)} v_j, where J(I) = {j : x_ij > 0 for some i ∈ I}. These conditions ensure feasibility within the transportation polytope defined by X's support, preserving zeros throughout the iterations and yielding a unique limit matrix when a solution exists.[10] The L1-error between the current row and column sums and their targets decreases monotonically during the IPF iterations and is bounded by the maximum discrepancy in partial sums over row subsets and their complements; convergence to zero error follows when this bound vanishes, confirming the procedure's success. When the marginals and initial entries are strictly positive, these conditions hold generically, leading to rapid convergence to the maximum entropy solution matching the margins. In contrast, infeasible margins, such as inconsistent totals or violations of the subset inequalities, result in non-convergence, often manifesting as oscillation or divergence of the sums.[10][34] Extensions to general information projection problems, including multivariate densities or linear subspace constraints, guarantee exponential convergence under additional regularity assumptions such as strong duality, log-densities bounded uniformly by a constant, and closure of the sum of the constraint subspaces in the underlying L2 space. The contraction rate depends on the condition number of the constraint operator and on geometric factors such as the Friedrichs angle between subspaces, with larger angles yielding faster rates; these results quantify IPF's behavior beyond classical matrix scaling.[35]

Applications
In Contingency Tables and Demography
Iterative proportional fitting (IPF), also known as the RAS method in certain contexts, serves as a standard procedure for estimating expected cell frequencies in multi-dimensional contingency tables while ensuring consistency with specified marginal totals. Originally developed by Deming and Stephan in 1940 for two-way tables, it was generalized by Fienberg in 1970 to higher-dimensional arrays: an initial matrix is adjusted through alternating proportional scalings of rows and columns (or higher-order margins) until the marginal constraints are satisfied within a tolerance.[36] This yields maximum likelihood estimates under log-linear models in which cell counts follow a Poisson distribution (equivalent to multinomial sampling with fixed totals), under the hypothesis of independence or of specified conditional independences.[2] In contingency tables, IPF addresses incomplete data or under-sampling by "completing" the table: starting from an initial estimate (e.g., observed frequencies or uniform priors), it scales subsets of cells proportionally to match observed or target marginals, preserving the cross-classification structure without assuming a parametric form beyond the margins.[4] For instance, in a three-way table of variables such as age, sex, and occupation, IPF can fit the one-way marginals, or all two-way marginals, simultaneously, producing estimates that are the unique solution under non-negativity and the specified constraints when a solution exists.[27] The method's iterative nature ensures convergence to the MLE for hierarchical log-linear models, as proven geometrically and algebraically in the literature.[34]

Demographic applications of IPF center on population synthesis and projection, where it generates synthetic microdata or joint distributions matching aggregate marginal constraints from censuses, surveys, or administrative records.[37] For example, to create household-level datasets for small areas, IPF adjusts an initial seed table of joint attribute frequencies (e.g., household size by income and location) to align with known univariate totals such as total population by age and sex or regional counts, enabling agent-based simulations without microdata disclosure risks.[8] In multi-regional demographic models, it refits migration or fertility matrices to updated marginals, such as age-specific rates, to support forecasts; a 2016 review notes its widespread use in transportation and health demography as a cost-effective alternative to full microsimulation when detailed joint distributions are unavailable.[37] Empirical evaluations, such as those of U.S. synthetic population datasets, confirm IPF's effectiveness in reproducing marginals accurately, though without extensions it may underperform in capturing rare events or correlations beyond the fitted margins.[38] In census updating, IPF has been applied since the 1970s to estimate small-area statistics by reallocating totals proportionally, as in UK subnational projections.[39]

In Transportation and Spatial Microsimulation
In transportation planning, iterative proportional fitting (IPF) estimates origin-destination (OD) matrices by scaling an initial seed matrix—often derived from surveys or prior models—to match observed marginal totals for trip productions (row sums) and attractions (column sums). This doubly constrained approach underpins trip distribution in gravity-based models, enabling the reproduction of aggregate flows while preserving compatibility with supply-side constraints like network capacities. For example, in transit systems, IPF processes automatic passenger count data from boarding and alighting sensors to infer route-level OD patterns, as demonstrated in applications to bus networks where seed matrices are iteratively adjusted until residuals fall below predefined thresholds.[40] A 2010 analysis applied IPF to derive passenger OD flows on urban bus routes, achieving convergence in under 20 iterations for matrices up to 50x50 in size when starting from uniform priors.[41] IPF's utility extends to dynamic OD estimation in rail and highway contexts, where it fuses real-time data sources—such as ticket sales and traffic counts—to update matrices iteratively, supporting short-term forecasting for congestion management. In one railway network study, IPF integrated heterogeneous inputs to estimate peak-hour OD demands, yielding matrices with mean absolute percentage errors below 5% relative to ground-truth validations from dedicated surveys.[42] The method's computational efficiency, requiring only matrix multiplications and normalizations per cycle, makes it suitable for large-scale applications, though sensitivity to seed matrix choice necessitates robustness checks against perturbations.[43] In spatial microsimulation, IPF synthesizes disaggregate populations by adjusting a benchmark microdataset (e.g., individual records with attributes like age, income, and location) to conform to aggregate small-area constraints from censuses or administrative data. 
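The doubly constrained OD fitting described above can be sketched in a few lines of NumPy; the three-zone seed matrix and the production/attraction totals below are invented for illustration:

```python
import numpy as np

def fit_od_matrix(seed, productions, attractions, tol=1e-6, max_iter=500):
    """Classical IPF for a doubly constrained OD matrix: alternately scale
    rows to trip productions and columns to trip attractions."""
    T = seed.astype(float).copy()
    for it in range(max_iter):
        T *= (productions / T.sum(axis=1))[:, None]   # match row sums
        T *= (attractions / T.sum(axis=0))[None, :]   # match column sums
        if np.abs(T.sum(axis=1) - productions).max() < tol:
            return T, it + 1                          # both margins now met
    return T, max_iter

# Hypothetical inputs: seed from an outdated survey, margins from fresh counts.
seed = np.array([[  5.0,  50.0, 100.0],
                 [ 50.0,   5.0, 100.0],
                 [ 50.0, 100.0,   5.0]])
productions = np.array([400.0, 460.0, 400.0])   # trips leaving each origin
attractions = np.array([260.0, 400.0, 600.0])   # trips entering each destination

od, n_iter = fit_od_matrix(seed, productions, attractions)
print(np.round(od, 1))
print(n_iter)
```

Because the seed is strictly positive and the production and attraction totals agree (both sum to 1260), convergence is guaranteed; sensitivity to the seed can be probed by perturbing `seed` and comparing the fitted matrices.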
The process alternates row and column scalings across multi-way contingency tables, converging to a joint distribution that matches the univariate and bivariate marginals simultaneously. This enables policy simulations, such as projecting household-level energy demands or health risks in under-sampled regions. For instance, county-level health indicator estimation has employed IPF to expand national survey microdata, aligning outputs with local vital statistics and reducing ecological-fallacy biases in the resulting inferences.[44] Performance assessments confirm IPF's reliability for modest contingency dimensions (e.g., 10-20 categories per margin), with convergence typically achieved in 10-50 iterations given positive initial values and consistent marginals. However, it produces fractional cell weights, which are often addressed via post-hoc integerization such as "truncate, replicate, sample" to generate integer micro-units for agent-based modeling. Empirical tests across UK small areas showed IPF outperforming uniform priors in replicating observed attribute correlations, with chi-squared goodness-of-fit statistics improving by factors of 2-5 over naive benchmarks.[1] Applications in demographic forecasting highlight its role in overcoming data sparsity, though failures occur with zero cells or incompatible constraints, prompting hybrid uses with combinatorial optimization.[45]

Practical Considerations
Illustrative Examples
A common application of iterative proportional fitting (IPF) involves adjusting an initial estimate of a two-way contingency table to match specified row and column marginal totals while preserving the relative structure of the initial matrix. The process iteratively scales rows and columns until convergence. Consider a survey sample of 300 individuals classified by sex (female, male) and ethnicity (Black, White, Asian, Native American, Other), with each respondent weighted by 10 to estimate a population of 3000. The initial weighted table, derived from the sample proportions, is as follows:

| Sex/Ethnicity | Black | White | Asian | Native | Other | Total |
|---|---|---|---|---|---|---|
| Female | 300 | 1200 | 60 | 30 | 30 | 1620 |
| Male | 150 | 1080 | 90 | 30 | 30 | 1380 |
| Total | 450 | 2280 | 150 | 60 | 60 | 3000 |
Scaling the rows to the target row totals (1510 female, 1490 male) gives:

| Sex/Ethnicity | Black | White | Asian | Native | Other | Total |
|---|---|---|---|---|---|---|
| Female | 279.63 | 1118.52 | 55.93 | 27.96 | 27.96 | 1510 |
| Male | 161.96 | 1166.09 | 97.17 | 32.39 | 32.39 | 1490 |
| Total | 441.59 | 2284.61 | 153.10 | 60.35 | 60.35 | 3000 |
Scaling the columns to the target column totals (600, 2120, 150, 100, 30) then matches the column margins exactly but disturbs the row totals again:

| Sex/Ethnicity | Black | White | Asian | Native | Other | Total |
|---|---|---|---|---|---|---|
| Female | 379.94 | 1037.93 | 54.79 | 46.33 | 13.90 | 1532.89 |
| Male | 220.06 | 1082.07 | 95.21 | 53.67 | 16.10 | 1467.11 |
| Total | 600.00 | 2120.00 | 150.00 | 100.00 | 30.00 | 3000 |
After a second cycle of row and column adjustments, the table is close to convergence, with the column totals matched exactly and the row totals within about 0.5 of their targets:

| Sex/Ethnicity | Black | White | Asian | Native | Other | Total |
|---|---|---|---|---|---|---|
| Female | 375.67 | 1021.76 | 53.74 | 45.57 | 13.67 | 1510.41 |
| Male | 224.33 | 1098.24 | 96.26 | 54.43 | 16.33 | 1489.59 |
| Total | 600.00 | 2120.00 | 150.00 | 100.00 | 30.00 | 3000 |
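The sequence of tables above can be reproduced with a short NumPy sketch; two full row/column cycles starting from the initial weighted table yield the final table shown:

```python
import numpy as np

# Initial weighted table (sample counts times 10); rows = (Female, Male),
# columns = (Black, White, Asian, Native, Other).
M = np.array([[300.0, 1200.0, 60.0, 30.0, 30.0],
              [150.0, 1080.0, 90.0, 30.0, 30.0]])
row_targets = np.array([1510.0, 1490.0])
col_targets = np.array([600.0, 2120.0, 150.0, 100.0, 30.0])

for _ in range(2):   # each cycle: scale rows to targets, then columns
    M *= (row_targets / M.sum(axis=1))[:, None]
    M *= (col_targets / M.sum(axis=0))[None, :]

print(np.round(M, 2))   # first row ≈ [375.67, 1021.76, 53.74, 45.57, 13.67]
```

Running further cycles drives the remaining row-total discrepancy (about 0.41 after two cycles) toward zero.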
