Least absolute deviations
Least absolute deviations (LAD), also known as least absolute errors (LAE), least absolute residuals (LAR), or least absolute values (LAV), is a statistical optimality criterion and a statistical optimization technique based on minimizing the sum of absolute deviations (also sum of absolute residuals or sum of absolute errors) or the L1 norm of such values. It is analogous to the least squares technique, except that it is based on absolute values instead of squared values. It attempts to find a function which closely approximates a set of data by minimizing residuals between points generated by the function and corresponding data points. The LAD estimate also arises as the maximum likelihood estimate if the errors have a Laplace distribution. It was introduced in 1757 by Roger Joseph Boscovich.[1]
Formulation
Suppose that the data set consists of the points (xi, yi) with i = 1, 2, ..., n. We want to find a function f such that f(xi) ≈ yi.
To attain this goal, we suppose that the function f is of a particular form containing some parameters that need to be determined. For instance, the simplest form would be linear: f(x) = bx + c, where b and c are parameters whose values are not known but which we would like to estimate. Less simply, suppose that f(x) is quadratic, meaning that f(x) = ax2 + bx + c, where a, b and c are not yet known. (More generally, there could be not just one explanator x, but rather multiple explanators, all appearing as arguments of the function f.)
We now seek estimated values of the unknown parameters that minimize the sum of the absolute values of the residuals:

S = |y1 − f(x1)| + |y2 − f(x2)| + ... + |yn − f(xn)|.
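As an illustration, a minimal Python sketch that evaluates this sum for a candidate linear fit f(x) = bx + c on made-up data:

```python
def sae(points, b, c):
    """Sum of absolute errors of the candidate line y = b*x + c over (x, y) points."""
    return sum(abs(y - (b * x + c)) for x, y in points)

points = [(1, 1), (2, 3), (3, 2), (4, 5)]  # made-up data
print(sae(points, 1.0, 0.0))   # candidate f(x) = x gives S = 3.0
print(sae(points, 1.2, -0.5))  # a slightly worse candidate (larger S)
```

Fitting by least absolute deviations means searching for the (b, c) that makes this sum as small as possible.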
Solution
Though the idea of least absolute deviations regression is just as straightforward as that of least squares regression, the least absolute deviations line is not as simple to compute efficiently. Unlike least squares regression, least absolute deviations regression does not have a closed-form solution; an iterative approach is therefore required. The following is an enumeration of some least absolute deviations solving methods.
- Simplex-based methods (such as the Barrodale-Roberts algorithm[2])
- Because the problem is a linear program, any of the many linear programming techniques (including the simplex method as well as others) can be applied.
- Iteratively re-weighted least squares[3]
- Wesolowsky's direct descent method[4]
- Li-Arce's maximum likelihood approach[5]
- Recursive reduction of dimensionality approach[6]
- Check all combinations of point-to-point lines for minimum sum of errors
Simplex-based methods are the "preferred" way to solve the least absolute deviations problem.[7] A simplex method solves a linear programming problem; the most popular algorithm for this purpose is the Barrodale–Roberts modified simplex algorithm. The algorithms for IRLS, Wesolowsky's method, and Li's method can be found in Appendix A of [7], among other methods. Checking all combinations of lines traversing any two (x,y) data points is another way of finding the least absolute deviations line. Since at least one least absolute deviations line is known to traverse at least two data points, this method computes the SAE (sum of absolute errors) of each candidate line and chooses the line with the smallest SAE. If multiple lines tie for the smallest SAE, they outline the region of multiple solutions. Though simple, this final method is inefficient for large data sets.
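The brute-force method just described can be sketched in a few lines of Python (the function names and data below are made up for illustration):

```python
from itertools import combinations

def sae(points, b, c):
    """Sum of absolute errors of the line y = b*x + c over (x, y) points."""
    return sum(abs(y - (b * x + c)) for x, y in points)

def lad_line_bruteforce(points):
    """Return (b, c) of a least absolute deviations line found by checking every
    line through two data points (at least one LAD line passes through two
    data points). Cubic in the number of points, so only for small data sets."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # skip vertical candidate lines
        b = (y2 - y1) / (x2 - x1)
        c = y1 - b * x1
        s = sae(points, b, c)
        if best is None or s < best[0]:
            best = (s, b, c)
    return best[1], best[2]

points = [(0, 0), (1, 1), (2, 2), (3, 10)]  # last point is an outlier
b, c = lad_line_bruteforce(points)
print(b, c)  # the fitted line follows the first three points, not the outlier
```

With ties broken in favor of the first candidate found, the outlier at (3, 10) is left with a large residual rather than being allowed to drag the line upward.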
Solution using linear programming
The problem can be solved using any linear programming technique on the following problem specification. We wish to

minimize Σi=1..n |yi − Σj=1..k xij βj|
with respect to the choice of the values of the parameters β1, ..., βk, where yi is the value of the ith observation of the dependent variable, and xij is the value of the ith observation of the jth independent variable (j = 1,...,k). We rewrite this problem in terms of artificial variables ui as
- minimize Σi=1..n ui
- with respect to β1, ..., βk and u1, ..., un
- subject to ui ≥ yi − Σj=1..k xij βj and ui ≥ −(yi − Σj=1..k xij βj) for i = 1, ..., n.
These constraints have the effect of forcing each ui to equal |yi − Σj xij βj| upon being minimized, so the objective function is equivalent to the original objective function. Since this version of the problem statement does not contain the absolute value operator, it is in a format that can be solved with any linear programming package.
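Assuming SciPy is available, this specification can be sketched directly with the general-purpose linprog solver (the helper name lad_fit and the toy data are made up for illustration; any LP package would serve):

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """LAD regression via the linear programming reformulation.
    X: (n, k) design matrix (include a column of ones for an intercept).
    Returns the coefficient vector beta minimizing sum |y - X beta|."""
    n, k = X.shape
    # Decision variables: [beta (k entries), u (n entries)]; minimize sum(u).
    c = np.concatenate([np.zeros(k), np.ones(n)])
    # u_i >= y_i - x_i.beta   ->  -X beta - u <= -y
    # u_i >= -(y_i - x_i.beta) ->   X beta - u <=  y
    I = np.eye(n)
    A_ub = np.block([[-X, -I], [X, -I]])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * k + [(0, None)] * n  # beta free, u nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:k]

X = np.column_stack([np.ones(5), np.array([0., 1., 2., 3., 4.])])
y = np.array([0., 1., 2., 3., 100.])  # one gross outlier
beta = lad_fit(X, y)
print(beta)  # close to [0, 1]: the line through the four clean points
```

The fitted intercept and slope stay on the clean points; a least squares fit to the same data would be pulled strongly toward the outlier.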
Properties
The least absolute deviations line has some unique properties. In the case of a set of (x,y) data, the least absolute deviations line will always pass through at least two of the data points, unless there are multiple solutions. If multiple solutions exist, then the region of valid least absolute deviations solutions will be bounded by at least two lines, each of which passes through at least two data points. More generally, if there are k regressors (including the constant), then at least one optimal regression surface will pass through k of the data points.[8]: p.936
This "latching" of the line to the data points can help to understand the "instability" property: if the line always latches to at least two points, then the line will jump between different sets of points as the data points are altered. The "latching" also helps to understand the "robustness" property: if there exists an outlier, and a least absolute deviations line must latch onto two data points, the outlier will most likely not be one of those two points because that will not minimize the sum of absolute deviations in most cases.
One known case in which multiple solutions exist is a set of points symmetric about a horizontal line, as shown in Figure A below.
(Figure A: a set of data points symmetric about a horizontal line; the green region contains infinitely many least absolute deviations lines, one of which is shown in pink.)
To understand why there are multiple solutions in the case shown in Figure A, consider the pink line in the green region. Its sum of absolute errors is some value S. If one were to tilt the line upward slightly, while still keeping it within the green region, the sum of errors would still be S. It would not change because the distance from each point to the line grows on one side of the line, while the distance to each point on the opposite side of the line diminishes by exactly the same amount. Thus the sum of absolute errors remains the same. Also, since one can tilt the line in infinitely small increments, this also shows that if there is more than one solution, there are infinitely many solutions.
Advantages and disadvantages
The following is a table contrasting some properties of the method of least absolute deviations with those of the method of least squares (for non-singular problems).[9][10]
| Ordinary least squares regression | Least absolute deviations regression |
|---|---|
| Not very robust | Robust |
| Stable solution | Unstable solution |
| One solution* | Possibly multiple solutions |
*Provided that the number of data points is greater than or equal to the number of features.
The method of least absolute deviations finds applications in many areas, due to its robustness compared to the least squares method. Least absolute deviations is robust in that it is resistant to outliers in the data. LAD gives equal emphasis to all observations, in contrast to ordinary least squares (OLS) which, by squaring the residuals, gives more weight to large residuals, that is, outliers in which predicted values are far from actual observations. This may be helpful in studies where outliers do not need to be given greater weight than other observations. If it is important to give greater weight to outliers, the method of least squares is a better choice.
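The contrast is easiest to see in the univariate case, where least squares yields the mean and least absolute deviations the median; a small illustration on made-up data:

```python
# Outlier sensitivity in the univariate location case:
# least squares estimates the mean, least absolute deviations the median.
import statistics

data = [2.0, 2.1, 1.9, 2.0, 2.2]
contaminated = data + [100.0]  # a single gross outlier

print(statistics.mean(data), statistics.mean(contaminated))      # mean jumps
print(statistics.median(data), statistics.median(contaminated))  # median barely moves
```

The single outlier drags the mean from about 2 to over 18, while the median moves only from 2.0 to 2.05.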
Variations, extensions, specializations
If in the sum of the absolute values of the residuals one generalises the absolute value function to a tilted absolute value function, which on the left half-line has slope τ − 1 and on the right half-line has slope τ, where 0 < τ < 1, one obtains quantile regression. The case of τ = 1/2 gives the standard regression by least absolute deviations and is also known as median regression.
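As an informal check, the following Python sketch (the helper names tilted_abs and best_constant are made up for illustration) minimizes the tilted loss over a constant model; since the summed loss is piecewise linear with kinks at the data points, restricting candidates to the data values suffices:

```python
def tilted_abs(r, tau):
    """Tilted absolute value loss: slope tau - 1 for r < 0, slope tau for r >= 0.
    tau = 0.5 recovers half the ordinary absolute value."""
    return r * tau if r >= 0 else r * (tau - 1)

def best_constant(ys, tau, candidates):
    """Constant c minimizing the summed tilted loss over the given candidates;
    for tau = 0.5 this is a median, for other tau an empirical quantile."""
    return min(candidates, key=lambda c: sum(tilted_abs(y - c, tau) for y in ys))

ys = [1.0, 2.0, 3.0, 4.0, 5.0]
print(best_constant(ys, 0.5, ys))  # 3.0, the median
print(best_constant(ys, 0.7, ys))  # 4.0, an upper quantile
```

Raising τ above 1/2 penalizes under-prediction more heavily than over-prediction, pushing the fitted constant toward higher quantiles of the data.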
The least absolute deviation problem may be extended to include multiple explanators, constraints and regularization, e.g., a linear model with linear constraints:[11]
- minimize Σi |yi − a′xi − b|
- subject to, e.g., a1 ≤ k

where a is a column vector of coefficients to be estimated, b is an intercept to be estimated, xi is a column vector of the ith observations on the various explanators, yi is the ith observation on the dependent variable, and k is a known constant.
Regularization with LASSO (least absolute shrinkage and selection operator) may also be combined with LAD.[12]
References
[edit]- ^ "Least Absolute Deviation Regression". The Concise Encyclopedia of Statistics. Springer. 2008. pp. 299–302. doi:10.1007/978-0-387-32833-1_225. ISBN 9780387328331.
- ^ Barrodale, I.; Roberts, F. D. K. (1973). "An improved algorithm for discrete L1 linear approximation". SIAM Journal on Numerical Analysis. 10 (5): 839–848. Bibcode:1973SJNA...10..839B. doi:10.1137/0710069. hdl:1828/11491. JSTOR 2156318.
- ^ Schlossmacher, E. J. (December 1973). "An Iterative Technique for Absolute Deviations Curve Fitting". Journal of the American Statistical Association. 68 (344): 857–859. doi:10.2307/2284512. JSTOR 2284512.
- ^ Wesolowsky, G. O. (1981). "A new descent algorithm for the least absolute value regression problem". Communications in Statistics – Simulation and Computation. B10 (5): 479–491. doi:10.1080/03610918108812224.
- ^ Li, Yinbo; Arce, Gonzalo R. (2004). "A Maximum Likelihood Approach to Least Absolute Deviation Regression". EURASIP Journal on Applied Signal Processing. 2004 (12): 1762–1769. Bibcode:2004EJASP2004...61L. doi:10.1155/S1110865704401139.
- ^ Kržić, Ana Sović; Seršić, Damir (2018). "L1 minimization using recursive reduction of dimensionality". Signal Processing. 151: 119–129. Bibcode:2018SigPr.151..119S. doi:10.1016/j.sigpro.2018.05.002.
- ^ William A. Pfeil, Statistical Teaching Aids, Bachelor of Science thesis, Worcester Polytechnic Institute, 2006
- ^ Branham, R. L., Jr., "Alternatives to least squares", Astronomical Journal 87, June 1982, 928–937. Available at the SAO/NASA Astrophysics Data System (ADS).
- ^ For a set of applets that demonstrate these differences, see the following site: http://www.math.wpi.edu/Course_Materials/SAS/lablets/7.3/73_choices.html
- ^ For a discussion of LAD versus OLS, see these academic papers and reports: http://www.econ.uiuc.edu/~roger/research/rq/QRJEP.pdf and https://www.leeds.ac.uk/educol/documents/00003759.htm
- ^ Shi, Mingren; Mark A., Lukas (March 2002). "An L1 estimation algorithm with degeneracy and linear constraints". Computational Statistics & Data Analysis. 39 (1): 35–55. doi:10.1016/S0167-9473(01)00049-4.
- ^ Wang, Li; Gordon, Michael D.; Zhu, Ji (December 2006). "Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning". Proceedings of the Sixth International Conference on Data Mining. pp. 690–700. doi:10.1109/ICDM.2006.134.
Further reading
- Peter Bloomfield; William Steiger (1980). "Least Absolute Deviations Curve-Fitting". SIAM Journal on Scientific Computing. 1 (2): 290–301. doi:10.1137/0901019.
- Subhash C. Narula and John F. Wellington (1982). "The Minimum Sum of Absolute Errors Regression: A State of the Art Survey". International Statistical Review. 50 (3): 317–326. doi:10.2307/1402501. JSTOR 1402501.
- Robert F. Phillips (July 2002). "Least absolute deviations estimation via the EM algorithm". Statistics and Computing. 12 (3): 281–285. doi:10.1023/A:1020759012226.
- Enno Siemsen & Kenneth A. Bollen (2007). "Least Absolute Deviation Estimation in Structural Equation Modeling". Sociological Methods & Research. 36 (2): 227–265. doi:10.1177/0049124107301946.
Overview and Background
Definition and Motivation
Least absolute deviations (LAD), also known as L1 regression, is a statistical estimation method that determines parameters by minimizing the sum of absolute residuals between observed and predicted values, applicable to both regression analysis and location estimation problems.[4] This approach seeks to find the best-fitting model or central tendency that reduces the total absolute deviation across data points, offering a robust alternative to methods sensitive to error magnitude. The primary motivation for LAD stems from its robustness to outliers, as the absolute value penalty treats large deviations linearly rather than quadratically, preventing extreme values from disproportionately influencing the estimate.[4] In contrast, squared-error methods amplify outliers, leading to biased fits in contaminated datasets, whereas LAD maintains stability by bounding the impact of such anomalies. This property makes LAD particularly valuable in real-world applications where data may include measurement errors or atypical observations, ensuring more reliable parameter estimates under heteroscedasticity or heavy-tailed error distributions.[4]

In the univariate case, LAD estimation corresponds to the geometric median, which for one-dimensional data is simply the sample median that minimizes the sum of absolute deviations from the data points.[5] For example, given a dataset, the value that achieves this minimum balances the number of points on either side, providing a central location resistant to skewed or outlier-affected distributions.
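The univariate claim can be checked numerically; a minimal sketch on made-up data (the helper sad is hypothetical shorthand for the sum of absolute deviations):

```python
# Check that the sample median minimizes the sum of absolute deviations,
# by scanning candidate locations over a small made-up sample.
import statistics

xs = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]

def sad(mu):
    """Sum of absolute deviations of the sample from mu."""
    return sum(abs(x - mu) for x in xs)

# A minimizer always occurs at a data point, so scanning xs suffices.
best = min(xs, key=sad)
print(best, statistics.median(xs))  # both are 3.0
```

The brute-force minimizer and the sample median coincide, as the text states.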
Historically, LAD emerged as an estimation technique in the 18th century, with Roger Boscovich introducing the method in 1757 for fitting linear models by minimizing absolute deviations, predating the least squares approach and serving as an early robust alternative.[6] Its use gained renewed attention in the 20th century through computational advancements and theoretical developments, solidifying its role in robust statistics.[4]

Relation to Other Regression Methods
Ordinary least squares (OLS) regression minimizes the sum of squared residuals between observed and predicted values, providing efficient estimates under the assumption of normally distributed errors with constant variance. This method relies on the L2 norm and is highly sensitive to outliers, as large residuals disproportionately influence the estimates due to squaring. In contrast, least absolute deviations (LAD) regression minimizes the sum of absolute residuals, employing the L1 norm, which treats all deviations more equally and reduces the impact of extreme values.[7][8] The distinction between L1 and L2 norms positions LAD as a robust alternative to OLS, particularly in datasets with non-normal errors or contamination. While OLS achieves maximum efficiency under Gaussian assumptions, LAD serves as the maximum likelihood estimator for Laplace-distributed errors, offering greater resistance to outliers without requiring normality. This makes LAD preferable in scenarios where data may include gross errors, though it sacrifices some efficiency relative to OLS in uncontaminated normal settings.[8][7] LAD regression is equivalent to median regression, estimating the conditional median of the response variable given the predictors, analogous to how the sample median minimizes absolute deviations in univariate data. Under the assumption that errors have a conditional median of zero, LAD identifies parameters that minimize expected absolute deviations, generalizing the median to linear models and providing a location estimate robust to skewness and outliers.[9] Within the broader M-estimation framework for robust statistics, LAD corresponds to a specific choice of the influence function where the objective is the absolute deviation, balancing efficiency and breakdown resistance. 
As a robust M-estimator, it offers a compromise between the outlier sensitivity of OLS and more complex methods, though modern variants like MM-estimators often surpass it in high-leverage scenarios.[7]

Mathematical Formulation
General Objective Function
The general objective function of least absolute deviations (LAD) estimation seeks the parameter μ that minimizes the sum of absolute deviations from a set of observed data points x1, ..., xn. This is formulated as

minimize over μ: Σi=1..n |xi − μ|,

which corresponds to the classical location estimation problem in one dimension.[10] The method traces its origins to early statistical work, where it was recognized as a robust alternative to squared-error criteria for fitting data to a central value.[10]

In the univariate case, the solution to this objective is the sample median, μ̂ = median(x1, ..., xn). To derive this, assume without loss of generality that the data are ordered so that x1 ≤ x2 ≤ ... ≤ xn. For an odd sample size n = 2m + 1, the median is xm+1. The objective function is piecewise linear and convex, with subgradient

g(μ) = Σi=1..n sign(μ − xi).

For μ < xm+1 the subgradient is negative (more terms pull rightward), so the objective decreases as μ increases; for μ > xm+1 the subgradient is positive (more terms pull leftward), so the objective increases. Thus μ = xm+1 minimizes the objective, as the subgradient contains zero there. For even n = 2m, any μ in [xm, xm+1] works, with the conventional choice being the midpoint or either endpoint. This property was established in early probability theory as a key advantage of the median for absolute error minimization.[10]

This objective extends naturally to multivariate location estimation in d dimensions, where the goal is to minimize the sum of L1 norms, Σi=1..n ‖xi − μ‖1. Due to the separability of the L1 norm across coordinates, the optimization decouples into d independent univariate problems, yielding the component-wise sample median as the solution: μ̂j = median(x1j, ..., xnj) for each dimension j. This multivariate form preserves the robustness of the univariate case while handling vector-valued data.

The LAD objective function is convex but non-differentiable at points where μ = xi for some i (or component-wise in the multivariate case), as the absolute value function has a kink at zero.
This non-smoothness implies that standard gradient-based optimization techniques fail, necessitating specialized approaches like subgradient descent or reformulation into equivalent smooth or linear programs for computation.[10]

Linear Model Specification
In the linear model specification for least absolute deviations (LAD) regression, the response variable yi for each observation is modeled as a linear function of predictors plus an error term:

yi = β0 + β1 xi1 + ... + βk xik + εi,

where β0 is the intercept parameter, βj (for j = 1, ..., k) are the slope parameters, xij are the values of the predictor variables, and εi are errors assumed to have median zero.[11] The LAD estimator is obtained by minimizing the sum of absolute residuals:

minimize over β0, ..., βk: Σi |yi − β0 − Σj βj xij|.

This objective directly adapts the general LAD criterion to the linear predictor form, focusing on median-based fitting rather than mean-based.[11]

The intercept parameter shifts the regression hyperplane vertically to center the fitted values around the median of the responses when all predictors are zero, while each slope parameter measures the marginal effect of the corresponding predictor on yi, adjusted for the other predictors in the model.[11] These parameters collectively define the linear predictor β0 + Σj βj xij, which approximates the conditional median of yi given the predictors.[4]

In simple linear regression, the specification reduces to a single predictor (k = 1), yielding the model yi = β0 + β1 xi + εi and the objective Σi |yi − β0 − β1 xi|, often visualized as the line passing through the data that minimizes total vertical absolute deviations.[11] For multiple linear regression (k > 1), the formulation incorporates additional predictors, allowing for more complex relationships while maintaining the same absolute deviation minimization over the multivariate linear combination.[11]

Residuals in the LAD linear model are ri = yi − ŷi, where ŷi = β̂0 + Σj β̂j xij; the sum Σi |ri| is the minimized objective value, emphasizing pointwise deviations without squaring.[11]

Computation and Solution Methods
Linear Programming Formulation
The least absolute deviations (LAD) estimation problem for a linear model can be reformulated as a linear program by introducing auxiliary variables to handle the absolute values in the objective function. Specifically, for observations i = 1, ..., n, define non-negative auxiliary variables ui+ and ui− for each i such that the residual satisfies yi − xi′β = ui+ − ui− and |yi − xi′β| = ui+ + ui−. This transformation linearizes the absolute deviation term while preserving the original minimization of Σi |yi − xi′β|.[12]

The full linear programming problem is then stated as:

minimize Σi (ui+ + ui−) subject to xi′β + ui+ − ui− = yi, ui+ ≥ 0, ui− ≥ 0 for all i,

where β is the vector of regression parameters. To ensure feasibility and numerical stability in solvers, bounds on the parameters can be added as additional constraints, converting the problem to a standard form suitable for linear programming software such as CPLEX or Gurobi. This bounded-variable formulation was first detailed for regression contexts in seminal work on applying linear programming to statistical estimation.[12][13]

For small datasets, where the number of observations and parameters is modest, the resulting linear program can be solved exactly using the simplex method, which efficiently navigates the feasible region defined by the equality constraints and non-negativity conditions. The simplex algorithm's pivot operations handle the bounded variables effectively, yielding the optimal LAD estimates upon termination. This approach demonstrates the computational tractability of LAD via linear programming for problems where exact solutions are preferred over approximations.[12]

Iterative Approximation Algorithms
Iterative approximation algorithms for least absolute deviations (LAD) regression provide practical solutions when exact linear programming formulations become computationally prohibitive for large datasets, offering scalable alternatives that leverage iterative refinements to approximate the L1 minimizer.[14] These methods typically start from an initial estimate, such as ordinary least squares, and iteratively update parameters until convergence to a solution that minimizes the sum of absolute residuals.[15]

The Barrodale–Roberts algorithm is a seminal specialized variant of the simplex method tailored specifically for LAD problems, optimizing the linear programming representation of L1 regression by exploiting its structure to reduce storage and computational demands.[14] Introduced in 1973, it performs row and column operations on an augmented matrix to efficiently navigate the feasible region, achieving exact solutions faster than general-purpose simplex implementations for moderate-sized problems with up to hundreds of observations.[14] This algorithm is particularly effective for linear LAD models, as it avoids unnecessary pivots by prioritizing median-based properties inherent to the L1 objective.[16]

Another widely adopted approach is the iteratively reweighted least squares (IRLS) adaptation for L1 minimization, which approximates the non-differentiable absolute deviation function through a sequence of weighted least squares subproblems.[17] In each iteration, weights are assigned inversely proportional to the absolute residuals from the previous estimate (wi = 1/|ri| for nonzero residuals, with safeguards such as a small epsilon to avoid division by zero), transforming the LAD objective into a quadratic form solvable via standard least squares solvers.[17] This method converges to the LAD solution under mild conditions, such as bounded residuals and a suitable initial guess, often requiring 10–20 iterations for practical accuracy in low-dimensional settings.

Convergence properties of these iterative methods depend on the algorithm and problem characteristics; the Barrodale–Roberts simplex variant guarantees finite termination at an exact optimum due to the finite basis in linear programming, typically in O(nm) operations, where n is the number of observations and m the number of parameters.[14] In contrast, IRLS exhibits monotonic decrease in the objective function and linear convergence rates globally when augmented with smoothing regularization, though it may slow near zero residuals or require acceleration techniques in high dimensions. Common stopping criteria include a relative change in parameter estimates below a threshold (e.g., 10^-6), a fixed maximum number of iterations (e.g., 50), or negligible reduction in the sum of absolute deviations between successive steps.[15]

Handling non-uniqueness in LAD solutions, where multiple parameter sets achieve the same minimum due to the objective's flatness at the median (e.g., when an even number of residuals straddle zero), requires algorithms to detect and report alternative minimizers. The Barrodale–Roberts algorithm identifies non-uniqueness by checking for degenerate bases or multiple optimal vertices in the simplex tableau, allowing users to select, for instance, the solution with minimal L2 norm among equivalents.[16] For IRLS, non-uniqueness manifests as convergence to one of several local equivalents, mitigated by post-processing sensitivity analysis or ensemble starts from perturbed initial values to verify robustness.

Statistical Properties
Robustness Characteristics
Least absolute deviations (LAD) estimation demonstrates strong robustness properties, particularly in its resistance to outliers and deviations from distributional assumptions. In the univariate setting, LAD corresponds to the sample median, which achieves a breakdown point of 50%, allowing it to withstand contamination of up to nearly half the data points by arbitrary outliers before the estimate can be driven to infinity or lose all resemblance to the true parameter; this contrasts sharply with the sample mean's breakdown point of 0%, where even a single outlier can arbitrarily distort the estimate.[18][19]

The influence function further underscores LAD's robustness, revealing bounded influence from individual observations. For the univariate median, the influence function is

IF(x) = sign(x − m) / (2 f(m)),

where m is the true median and f(m) is the density at m; it limits the effect of any single point to at most 1/(2 f(m)), preventing outliers from exerting unbounded leverage. In the regression context, LAD's influence function in the residual is proportional to sign(r), which is bounded between −1 and 1, ensuring that outliers in the response variable contribute only a constant maximum influence regardless of their extremity, unlike ordinary least squares (OLS), where influence grows linearly with residual size.[20][21]

Under contamination models such as Huber's ε-contamination framework, where the data distribution is F = (1 − ε)G + εH, with G the good distribution and H arbitrary, the univariate LAD estimator remains consistent provided ε < 1/2, as the median shifts continuously but stays within a bounded neighborhood of the true value.
This property extends to LAD regression under similar conditions, maintaining consistency for moderate contamination levels when the design matrix satisfies standard regularity assumptions, such as bounded leverage.[22] Empirical simulations illustrate these characteristics, showing that in scenarios with outliers in the response direction or high-leverage points, LAD yields lower mean squared errors for slope estimates than OLS, demonstrating reduced bias and variance under outlier contamination.

Asymptotic Behavior
The least absolute deviations (LAD) estimator is consistent for the true regression parameters under mild conditions, including the design matrix having full column rank asymptotically and the error distribution possessing a unique median at zero with finite first moment.[23] Under additional regularity conditions, such as the error density being positive and continuous at the median, the LAD estimator exhibits asymptotic normality. Specifically, for independent and identically distributed errors with median zero,

√n (β̂ − β0) →d N(0, [2 f(0)]^−2 Q^−1), with Q = E[xi xi′],

where f(0) denotes the error density evaluated at the median, β0 is the true parameter vector, and xi represents the regressors.[23][9]

The asymptotic relative efficiency of the LAD estimator compared to ordinary least squares (OLS) depends on the error distribution. For Gaussian errors, LAD achieves approximately 64% (2/π) of the efficiency of OLS due to the influence of the density term in the asymptotic variance. However, in heavy-tailed distributions like the Laplace, where outliers are more prevalent, LAD outperforms OLS in efficiency, leveraging its robustness to achieve lower asymptotic variance.[24][4]

A Bahadur representation exists for the LAD regression coefficients, expressing β̂ as the true β0 plus a linear term involving the sign-based score function and a remainder that is asymptotically negligible (of smaller order than n^−1/2), facilitating derivations of higher-order asymptotics and inference procedures.[25]

Advantages and Disadvantages
Benefits in Outlier Resistance
Least absolute deviations (LAD) regression offers significant advantages in handling outliers, particularly in datasets prone to anomalies, by minimizing the sum of absolute residuals rather than squared ones, which prevents extreme values from disproportionately influencing the fit.[26] This approach yields improved prediction accuracy in contaminated environments, such as financial time series where sudden market shocks introduce outliers; for instance, applications to exchange rate returns demonstrate that LAD models effectively capture non-linear behaviors while maintaining stability against such disruptions.[27] Unlike ordinary least squares (OLS), which can produce biased estimates when outliers are present, LAD's robustness stems from its equivalence to estimating the conditional median, providing a straightforward interpretation as the value that minimizes absolute deviations and resists vertical outliers up to nearly 50% contamination.[28] This property enhances overall model reliability in real-world scenarios with irregular data structures. In simulations of contaminated regression settings, LAD consistently outperforms OLS by delivering lower mean squared errors and higher efficiency when up to 20% of observations are outliers, underscoring its practical superiority for robust inference.[3] As noted in analyses of robustness characteristics, LAD achieves a high breakdown point for response outliers, further solidifying its role in outlier-resistant estimation.[26]

Limitations in Efficiency and Computation
One significant limitation of the least absolute deviations (LAD) estimator is its lower asymptotic efficiency compared to ordinary least squares (OLS) when the errors follow a normal distribution. Specifically, the asymptotic relative efficiency of the LAD estimator relative to OLS is 2/π ≈ 0.64, implying that the asymptotic variance of the LAD estimator is approximately π/2 ≈ 1.57 times larger than that of the OLS estimator under normality. This reduced efficiency arises because LAD minimizes the sum of absolute residuals, which is less optimal for symmetric unimodal densities like the normal, where squared residuals better capture the variance structure.[29]

Another challenge is the potential non-uniqueness of LAD solutions, particularly in cases of collinear predictors or sparse data configurations. When the design matrix is singular due to collinearity, the linear programming formulation of LAD admits multiple optimal solutions, as the objective function remains constant along certain directions in the parameter space. Similarly, in sparse datasets, such as those with few observations relative to parameters or with many tied residuals, the median-regression nature of LAD can lead to non-unique minimizers, complicating interpretation and requiring additional regularization to select a unique estimate.[4]

LAD also suffers from higher computational demands, especially for large sample sizes, due to its formulation as a linear program lacking a closed-form solution, unlike OLS. Solving the LAD problem typically requires numerical optimization via simplex or interior-point methods, making it substantially slower than the direct computation of OLS for a fixed number of predictors. This absence of a closed-form expression necessitates iterative approximations, further increasing the practical burden for high-dimensional or big-data applications. While specialized algorithms like iteratively reweighted least squares can mitigate some costs, they still demand more resources than closed-form alternatives.[4]

Applications and Examples
Real-World Use Cases
In econometrics, least absolute deviations (LAD) regression is applied to model income data, which often exhibits skewness and outliers due to factors like economic shocks or high-income extremes. For instance, Glahe and Hunt (1970) used LAD to estimate parameters in a simultaneous equation model incorporating personal and farm income variables from U.S. time series data (1960–1964), demonstrating its superior performance over ordinary least squares in small samples with non-normal error distributions.[11] This robustness makes LAD particularly suitable for analyzing income distributions where outliers can distort mean-based estimates, as highlighted in broader econometric literature on thick-tailed economic indicators.[11] In engineering, LAD regression supports signal processing tasks, especially for recovering sparse signals contaminated by heavy-tailed noise, where traditional least squares methods fail. Markovic (2013) proposed an LAD-based algorithm that adapts orthogonal matching pursuit for sparse signal reconstruction, achieving higher recovery rates than least squares under t(2)-distributed noise, which is common in real-world sensor data.[30] This approach enhances fault detection in systems by identifying deviations in signal patterns without undue influence from anomalous measurements, leveraging LAD's outlier resistance.[30] In environmental science, LAD regression is employed to analyze skewed pollution measurements, such as contaminant concentrations in sediments or volatile compounds, where outliers from sampling variability or extreme events are prevalent. Mebarki et al. 
(2017) applied LAD to model retention indices of pyrazines—heterocyclic pollutants from industrial sources—using quantitative structure-retention relationships, yielding robust fits (R² > 98%) on gas chromatography data for 114 compounds.[31] Similarly, Grant and Middleton (1998) utilized least absolute values regression for grain-size normalization of metal contaminants in Humber Estuary sediments, effectively handling outliers to distinguish true pollution signals from textural artifacts.[32] Software implementations of LAD regression are widely available, facilitating its adoption across disciplines. In R, the quantreg package provides the rq() function, where setting tau=0.5 computes the LAD estimator as a special case of quantile regression.[33] Python's statsmodels library offers QuantReg from statsmodels.regression.quantile_regression, with q=0.5 yielding LAD results for linear models.[34] In SAS, the PROC QUANTREG procedure supports LAD estimation via the quantile=0.5 option, including inference tools for robust analysis.[35]

Illustrative Numerical Example

[edit]
To illustrate the least absolute deviations (LAD) method, begin with the univariate case, where minimizing the sum of absolute deviations from the data points is achieved by the median. Consider the following dataset of 11 observations, which includes an outlier: 22, 24, 26, 28, 29, 31, 35, 37, 41, 53, 64. The values are already sorted, and with an odd number of points the median is the sixth value, 31. The absolute deviations from 31 are 9, 7, 5, 3, 2, 0, 4, 6, 10, 22, and 33, summing to 101. This sum is minimized at the median, which limits the influence of the outlier at 64: it contributes only 33 to the total, whereas it would exert disproportionate influence under squared-error minimization.[36] For comparison, the arithmetic mean of the dataset is 390/11 ≈ 35.45. The absolute deviations from the mean are |22 − 35.45| = 13.45, |24 − 35.45| = 11.45, |26 − 35.45| = 9.45, |28 − 35.45| = 7.45, |29 − 35.45| = 6.45, |31 − 35.45| = 4.45, |35 − 35.45| = 0.45, |37 − 35.45| = 1.55, |41 − 35.45| = 5.55, |53 − 35.45| = 17.55, and |64 − 35.45| = 28.55, summing to approximately 106.36 (the outlier alone contributes 28.55 and pulls the center upward). Thus the median yields a lower sum of absolute deviations (101 versus about 106.36), highlighting LAD's outlier resistance in the univariate setting. Now consider a simple linear regression example using a dataset of 8 points, including a potential outlier at the low value for x = 3: (1,7), (2,14), (3,10), (4,17), (5,15), (6,21), (7,26), (8,23).
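The univariate sums above are easy to verify directly. The following short sketch (illustrative, not part of the cited sources) computes the sum of absolute deviations about the median and about the mean for this dataset.

```python
# Verify the univariate LAD example: sum of absolute deviations
# about the median (101) vs. about the mean (1170/11 ~ 106.36).
data = [22, 24, 26, 28, 29, 31, 35, 37, 41, 53, 64]

median = sorted(data)[len(data) // 2]   # 11 points -> the 6th sorted value
mean = sum(data) / len(data)            # 390 / 11

sad_median = sum(abs(x - median) for x in data)
sad_mean = sum(abs(x - mean) for x in data)

print(median, sad_median)               # 31 101
print(round(mean, 2), round(sad_mean, 2))  # 35.45 106.36
```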
To fit the parameters β₀ (intercept) and β₁ (slope), the LAD objective minimizes ∑|y_i − (β₀ + β₁x_i)|, which can be reformulated as a basic linear program: minimize ∑(u_i + v_i) subject to y_i = β₀ + β₁x_i + u_i − v_i for i = 1, ..., 8, with u_i ≥ 0 and v_i ≥ 0 representing the positive and negative parts of each residual. This setup ensures that at the optimum the absolute residual |y_i − (β₀ + β₁x_i)| equals u_i + v_i.[37] Solving this linear program yields the fitted line ŷ = 4.2 + 2.8x, which passes through two of the data points, (1,7) and (6,21), as is characteristic of univariate LAD regression lines. The residuals and absolute residuals are shown in the table below:

| x | y | ŷ | Residual (y − ŷ) | Absolute Residual |
|---|---|---|---|---|
| 1 | 7 | 7.0 | 0.0 | 0.0 |
| 2 | 14 | 9.8 | 4.2 | 4.2 |
| 3 | 10 | 12.6 | -2.6 | 2.6 |
| 4 | 17 | 15.4 | 1.6 | 1.6 |
| 5 | 15 | 18.2 | -3.2 | 3.2 |
| 6 | 21 | 21.0 | 0.0 | 0.0 |
| 7 | 26 | 23.8 | 2.2 | 2.2 |
| 8 | 23 | 26.6 | -3.6 | 3.6 |
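Because a two-parameter LAD line on data in general position passes through at least two of the data points, the fit above can be reproduced by a brute-force search over all point pairs. This is an illustrative sketch rather than a practical algorithm; real implementations use the linear-programming or quantile-regression routines mentioned earlier.

```python
# Brute-force LAD line fit: try the line through every pair of points
# and keep the one minimizing the sum of absolute residuals.
# (Relies on the fact that an optimal two-parameter LAD line passes
# through at least two data points.)
from itertools import combinations

points = [(1, 7), (2, 14), (3, 10), (4, 17), (5, 15), (6, 21), (7, 26), (8, 23)]

def sad(b0, b1):
    """Sum of absolute residuals for the line y = b0 + b1*x."""
    return sum(abs(y - (b0 + b1 * x)) for x, y in points)

best = None
for (x1, y1), (x2, y2) in combinations(points, 2):
    if x1 == x2:
        continue                   # vertical line: no finite slope
    b1 = (y2 - y1) / (x2 - x1)     # slope of the line through the pair
    b0 = y1 - b1 * x1              # intercept of that line
    total = sad(b0, b1)
    if best is None or total < best[0]:
        best = (total, b0, b1)

total, b0, b1 = best
print(f"y = {b0:.1f} + {b1:.1f}x, sum |resid| = {total:.1f}")
# -> y = 4.2 + 2.8x, sum |resid| = 17.4
```

The search recovers ŷ = 4.2 + 2.8x, the line through (1,7) and (6,21), with a minimal sum of absolute residuals of 17.4, matching the table above.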
