Extrapolation
from Wikipedia

In mathematics, extrapolation is a type of estimation, beyond the original observation range, of the value of a variable on the basis of its relationship with another variable. It is similar to interpolation, which produces estimates between known observations, but extrapolation is subject to greater uncertainty and a higher risk of producing meaningless results. Extrapolation may also mean extension of a method, assuming similar methods will be applicable. Extrapolation may also apply to human experience to project, extend, or expand known experience into an area not known or previously experienced. By doing so, one makes an assumption of the unknown[1] (for example, a driver may extrapolate road conditions beyond what is currently visible and these extrapolations may be correct or incorrect). The extrapolation method can be applied in the interior reconstruction problem.

Example illustration of the extrapolation problem: assigning a meaningful value at the blue box given the red data points.

Method

A sound choice of which extrapolation method to apply relies on a priori knowledge of the process that created the existing data points. Some experts have proposed the use of causal forces in the evaluation of extrapolation methods.[2] Crucial questions include, for example, whether the data can be assumed to be continuous, smooth, possibly periodic, and so on.

Linear

Global temperature 1942–2025 from Our World in Data, forecast to 2100.

Linear extrapolation means creating a tangent line at the end of the known data and extending it beyond that limit. Linear extrapolation will only provide good results when used to extend the graph of an approximately linear function or not too far beyond the known data.

If the two data points nearest the point $x_*$ to be extrapolated are $(x_{k-1}, y_{k-1})$ and $(x_k, y_k)$, linear extrapolation gives the function:

$$y(x_*) = y_{k-1} + \frac{x_* - x_{k-1}}{x_k - x_{k-1}} (y_k - y_{k-1})$$

(which is identical to linear interpolation if $x_{k-1} < x_* < x_k$). It is possible to include more than two points, averaging the slope of the linear interpolant by regression-like techniques on the data points chosen to be included. This is similar to linear prediction.
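As a concrete sketch, the two-point formula above maps directly to code; the function name and sample points below are illustrative, not part of the original text:

```python
# A minimal sketch of two-point linear extrapolation, assuming the last two
# observations (x1, y1) and (x2, y2) are representative of the local trend.

def linear_extrapolate(x1: float, y1: float, x2: float, y2: float, x: float) -> float:
    """Extend the line through (x1, y1) and (x2, y2) to the query point x."""
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (x - x1)

# Example: data ends at (2, 4) and (3, 6); estimate the value at x = 5.
print(linear_extrapolate(2, 4, 3, 6, 5))  # 10.0
```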

Polynomial

Lagrange extrapolations of the sequence 1,2,3. Extrapolating by 4 leads to a polynomial of minimal degree (cyan line).

A polynomial curve can be created through the entire known data or just near the end (two points for linear extrapolation, three points for quadratic extrapolation, etc.). The resulting curve can then be extended beyond the end of the known data. Polynomial extrapolation is typically done by means of Lagrange interpolation or using Newton's method of finite differences to create a Newton series that fits the data. The resulting polynomial may be used to extrapolate the data.

High-order polynomial extrapolation must be used with due care. For the example data set and problem in the figure above, anything above order 1 (linear extrapolation) will possibly yield unusable values; an error estimate of the extrapolated value will grow with the degree of the polynomial extrapolation. This is related to Runge's phenomenon.
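A small numerical experiment makes this concrete. The sketch below, with invented near-linear data, fits polynomials of increasing degree with numpy.polyfit and shows how the extrapolated value can drift as the degree grows:

```python
# A sketch of how extrapolation error grows with polynomial degree, using
# numpy.polyfit on noisy samples of an underlying linear trend (all values
# here are illustrative assumptions, not data from the figure above).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 8)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)  # roughly linear data

for degree in (1, 3, 5):
    coeffs = np.polyfit(x, y, degree)       # least-squares polynomial fit
    estimate = np.polyval(coeffs, 10.0)     # extrapolate well past x = 5
    print(degree, round(estimate, 2))       # degree 1 stays near 21; higher
                                            # degrees can wander far from it
```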


Conic

A conic section can be created using five points near the end of the known data. If the conic section created is an ellipse or circle, when extrapolated it will loop back and rejoin itself. An extrapolated parabola or hyperbola will not rejoin itself, but may curve back relative to the X-axis. This type of extrapolation could be done with a conic sections template (on paper) or with a computer.
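In software, the five-point conic fit reduces to finding the null space of a 5×6 design matrix. The sketch below, with hypothetical points lying on the unit circle, recovers the conic coefficients via the SVD:

```python
# A sketch of fitting the general conic a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0
# through five points via the null space of the 5x6 design matrix. The sample
# points are invented and happen to lie on the unit circle.
import numpy as np

pts = np.array([(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0),
                (np.sqrt(0.5), np.sqrt(0.5))])
x, y = pts[:, 0], pts[:, 1]
design = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])

# The conic coefficients span the null space of the design matrix; the last
# right-singular vector of the SVD gives them up to scale.
_, _, vt = np.linalg.svd(design)
coeffs = vt[-1] / vt[-1][0]  # normalize so the x^2 coefficient is 1
print(np.round(coeffs, 3))   # ~ [1, 0, 1, 0, 0, -1], i.e. x^2 + y^2 = 1
```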

French curve

French curve extrapolation is a method suitable for any distribution that has a tendency to be exponential, but with accelerating or decelerating factors.[3] This method has been used successfully in providing forecast projections of the growth of HIV/AIDS in the UK since 1987 and variant CJD in the UK for a number of years. Another study has shown that extrapolation can produce the same quality of forecasting results as more complex forecasting strategies.[4]

Geometric extrapolation with error prediction

A geometric extrapolation can be created from three points of a sequence and a "moment" or "index"; this type of extrapolation achieves 100% accuracy in predictions for a large percentage of the known series database (OEIS).[5]

Quality

Typically, the quality of a particular method of extrapolation is limited by the assumptions about the function made by the method. If the method assumes the data are smooth, then a non-smooth function will be poorly extrapolated.

In terms of complex time series, some experts have discovered that extrapolation is more accurate when performed through the decomposition of causal forces.[6]

Even for proper assumptions about the function, the extrapolation can diverge severely from the function. The classic example is truncated power series representations of sin(x) and related trigonometric functions. For instance, taking only data from near x = 0, we may estimate that the function behaves as sin(x) ~ x. In the neighborhood of x = 0, this is an excellent estimate. Away from x = 0, however, the extrapolation moves arbitrarily away from the x-axis while sin(x) remains in the interval [−1, 1]; i.e., the error increases without bound.

Taking more terms in the power series of sin(x) around x = 0 will produce better agreement over a larger interval near x = 0, but will produce extrapolations that eventually diverge away from the x-axis even faster than the linear approximation.
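The divergence is easy to reproduce numerically. This sketch (illustrative code, not from the original) compares truncated Taylor sums of sin(x) at points progressively farther from zero:

```python
# A sketch of how truncated Taylor series of sin(x) diverge away from x = 0:
# more terms improve agreement near zero yet diverge even faster far away.
import math

def sin_taylor(x: float, terms: int) -> float:
    """Partial sum of the Taylor series of sin(x) about 0."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

for x in (1.0, 5.0, 10.0):
    print(x, round(math.sin(x), 3),
          round(sin_taylor(x, 1), 2),   # sin(x) ~ x
          round(sin_taylor(x, 3), 2))   # degree-5 truncation
```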

This divergence is a specific property of extrapolation methods and is only circumvented when the functional forms assumed by the extrapolation method (inadvertently or intentionally due to additional information) accurately represent the nature of the function being extrapolated. For particular problems, this additional information may be available, but in the general case, it is impossible to satisfy all possible function behaviors with a workably small set of potential behaviors.

In the complex plane

In complex analysis, a problem of extrapolation may be converted into an interpolation problem by the change of variable $\hat{z} = 1/z$. This transform exchanges the part of the complex plane inside the unit circle with the part of the complex plane outside of the unit circle. In particular, the compactification point at infinity is mapped to the origin and vice versa. Care must be taken with this transform, however, since the original function may have had "features", for example poles and other singularities, at infinity that were not evident from the sampled data.
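As a minimal sketch of this idea, assuming a function with a finite limit at infinity, one can sample at large |z|, substitute w = 1/z, and interpolate near the origin; the function f(z) = (2z + 1)/z below is a made-up example:

```python
# A sketch of converting extrapolation toward infinity into interpolation
# near the origin via w = 1/z, assuming the function behaves well at
# infinity (here f(z) = (2z + 1)/z, which tends to 2).
import numpy as np

z = np.array([2.0, 4.0, 8.0, 16.0])        # samples far from the origin
f = (2 * z + 1) / z

w = 1.0 / z                                # transformed abscissae near 0
coeffs = np.polyfit(w, f, 2)               # interpolate g(w) = f(1/w)
print(np.polyval(coeffs, 0.0))             # estimate of f as z -> infinity: ~2.0
```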

Another problem of extrapolation is loosely related to the problem of analytic continuation, where (typically) a power series representation of a function is expanded at one of its points of convergence to produce a power series with a larger radius of convergence. In effect, a set of data from a small region is used to extrapolate a function onto a larger region.

Again, analytic continuation can be thwarted by function features that were not evident from the initial data.

Also, one may use sequence transformations like Padé approximants and Levin-type sequence transformations as extrapolation methods that lead to a summation of power series that are divergent outside the original radius of convergence. In this case, one often obtains rational approximants.
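For illustration, scipy.interpolate.pade constructs such a rational approximant from Taylor coefficients; the example below sums the series of log(1 + x) well outside its radius of convergence (the choice of function and orders is an assumption for the demo):

```python
# A sketch of summing a power series outside its radius of convergence with
# a Pade approximant built from the Taylor coefficients of log(1 + x),
# whose series converges only for |x| < 1.
import numpy as np
from scipy.interpolate import pade

taylor = [0.0, 1.0, -1.0 / 2, 1.0 / 3, -1.0 / 4]  # log(1 + x) about 0
p, q = pade(taylor, 2)                             # [2/2] rational approximant

x = 3.0                                            # well outside |x| < 1
print(p(x) / q(x), np.log1p(x))                    # ~1.364 vs 1.386...
```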

Extrapolation arguments

Extrapolation arguments are informal and unquantified arguments which assert that something is probably true beyond the range of values for which it is known to be true. For example, we believe in the reality of what we see through magnifying glasses because it agrees with what we see with the naked eye but extends beyond it; we believe in what we see through light microscopes because it agrees with what we see through magnifying glasses but extends beyond it; and similarly for electron microscopes. Such arguments are widely used in biology in extrapolating from animal studies to humans and from pilot studies to a broader population.[7]

Like slippery slope arguments, extrapolation arguments may be strong or weak depending on such factors as how far the extrapolation goes beyond the known range.[8]

from Grokipedia
Extrapolation is a fundamental technique in mathematics, statistics, and numerical analysis used to estimate unknown values by extending patterns or trends observed within a known dataset beyond its observed range. This method contrasts with interpolation, which estimates values within the data range; extrapolation inherently carries greater uncertainty due to the potential for unmodeled changes in the underlying function or relationship. In statistics, extrapolation is commonly applied in regression models to predict outcomes for predictor variables outside the sample data's scope, such as future trends from historical observations. However, it is considered risky because the assumed relationships or trends may not persist, leading to significant errors, as demonstrated in cases where extrapolated values deviate markedly from actual measurements. For instance, a regression equation fitted to urine concentration data from 0 to 5.80 ml/plate predicted 34.8 colonies at 11.60 ml/plate, while the observed value was approximately 15.1, highlighting the limitations.

In numerical analysis, extrapolation methods enhance computational accuracy and efficiency by systematically eliminating dominant error terms in approximations. A prominent example is Richardson extrapolation, pioneered by Lewis Fry Richardson and J. Arthur Gaunt in 1927 for solving ordinary differential equations, which combines solutions at different step sizes to achieve higher-order convergence. This technique, later extended in Romberg integration by Werner Romberg in 1955, improves tasks like numerical differentiation (reaching 14 decimal places of accuracy versus 7 without it) and integration (attaining machine precision with coarser grids). Applications span scientific computing, including series acceleration for constants like π (computed to 10 decimals with 392 evaluations) and broader predictive modeling in physics and engineering.
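As an illustration of the idea, the sketch below applies Richardson extrapolation to a central-difference derivative estimate; the test function and step sizes are arbitrary choices for the demo:

```python
# A sketch of Richardson extrapolation for numerical differentiation: combine
# central-difference estimates at step sizes h and h/2 to cancel the leading
# O(h^2) error term. The test function f(x) = exp(x) at x = 1 is illustrative.
import math

def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

f, x, h = math.exp, 1.0, 0.1
d1 = central_diff(f, x, h)
d2 = central_diff(f, x, h / 2)
richardson = (4 * d2 - d1) / 3        # eliminates the h^2 error term

print(abs(d1 - math.e))               # ~4.5e-3
print(abs(richardson - math.e))       # ~5.7e-7, several orders better
```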

Fundamentals

Definition

Extrapolation is the process of estimating values for variables outside the observed range of a dataset by extending the trends or patterns identified within the known data. This technique is commonly applied in statistics, forecasting, and related fields to make predictions beyond the boundaries of available observations, such as future outcomes based on historical records. Unlike mere guesswork, extrapolation relies on systematic methods to infer these estimates, though it inherently carries risks if the underlying patterns do not persist.

Mathematically, extrapolation involves selecting or constructing a function $f$ that approximates a set of observed data points $(x_i, y_i)$ for $i = 1$ to $n$, where the $x_i$ lie within a specific interval, say $[a, b]$. The goal is to evaluate $f(x)$ for $x < a$ or $x > b$ to predict corresponding $y$ values, typically achieved through curve-fitting approaches that minimize discrepancies between $f(x_i)$ and $y_i$. For example, consider data points $(1, 2)$, $(2, 4)$, and $(3, 6)$; fitting a linear function $y = 2x$ allows extrapolation to $x = 4$, yielding an estimated $y = 8$, assuming the linear relationship continues.

In statistical contexts, extrapolation serves as a foundational tool in predictive modeling, enabling inferences about unobserved phenomena under assumptions such as the continuity of the process or the persistence of observed trends. These assumptions imply that causal factors supporting the data's patterns remain stable beyond the sampled range, though violations can lead to unreliable predictions. Traditional forecasting methods often implicitly rely on such trend persistence to project outcomes, highlighting the need for cautious application in practice.

The concept of extrapolation traces its origins to 19th-century astronomy and physics, with the term first appearing in 1862 in a Harvard Observatory report on the comet of 1858, where it described inferring orbital positions from limited observations; this usage is linked to the work of English mathematician and astronomer Sir George Airy.

Distinction from Interpolation

The primary distinction between extrapolation and interpolation lies in the range of the independent variable relative to the known data points. Interpolation involves estimating values within the observed data range—for instance, predicting a function value at $x = 3$ given data at $x = 1$ and $x = 5$—whereas extrapolation extends estimates beyond this range, such as at $x = 6$ or $x = 0$.

Conceptually, interpolation fills gaps between data points to create a smoother representation of the underlying function, akin to connecting dots within a scatter plot to approximate missing intermediates. In contrast, extrapolation projects the trend outward from the endpoints, potentially extending a line or curve into uncharted territory. For example, consider a series of temperature readings from 9 a.m. to 5 p.m.; interpolation might estimate the temperature at noon, while extrapolation could forecast it at 7 p.m., assuming the pattern persists. This visual difference highlights interpolation's role in internal refinement versus extrapolation's forward or backward projection.

Extrapolation relies on the assumption that the observed trend continues unchanged beyond the range, an assumption that introduces greater risk due to possible shifts in underlying patterns, such as non-linear behaviors or external influences not captured in the data. Interpolation, operating within bounds, is typically more reliable as it adheres closely to observed behavior, reducing the likelihood of significant errors from unmodeled changes. The dangers of extrapolation are particularly pronounced in high-stakes applications, where erroneous predictions can lead to flawed decisions, underscoring the need for caution and validation.

Mathematically, the boundary is defined by the domain of approximation: interpolation confines estimates to the convex hull of the data points—the smallest convex set containing all points—ensuring the query point is a convex combination of observed locations. Extrapolation occurs when the point lies outside this hull, violating the safe interpolation region and amplifying uncertainty.

In practice, interpolation is preferred for tasks like data smoothing or filling internal gaps in datasets, where accuracy within known bounds is paramount. Extrapolation suits forecasting or scenario planning, such as economic projections or trend extensions, but requires additional safeguards like sensitivity analysis to mitigate risks. Selecting between them depends on the context: stay within the data for reliability, but venture outside only with strong theoretical justification.
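Returning to the convex-hull criterion above, a query point can be classified programmatically; the sketch below uses scipy.spatial.Delaunay on invented 2-D points:

```python
# A sketch of classifying a query as interpolation vs. extrapolation by
# testing membership in the convex hull of the observed inputs. The 2-D
# points below are invented for illustration.
import numpy as np
from scipy.spatial import Delaunay

observed = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
hull = Delaunay(observed)  # the triangulation doubles as a point-in-hull test

queries = np.array([[0.5, 0.5], [2.0, 2.0]])
for point, simplex in zip(queries, hull.find_simplex(queries)):
    # find_simplex returns -1 for points outside the hull
    print(point, "interpolation" if simplex >= 0 else "extrapolation")
```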

Methods

Linear Extrapolation

Linear extrapolation is the simplest form of extrapolation, involving the fitting of a straight line to two or more known data points at the end of a dataset and extending that line beyond the observed range to predict values outside it. This method assumes a linear relationship between the variables, where the rate of change remains constant, allowing for straightforward extension using the slope of the line determined from the given points.

The formula for linear extrapolation derives from the slope-intercept form of a line, $y = mx + b$, where $m$ is the slope and $b$ is the intercept. To derive it from two points $(x_1, y_1)$ and $(x_2, y_2)$, first compute the slope $m = \frac{y_2 - y_1}{x_2 - x_1}$. Substituting into the point-slope form $y - y_1 = m(x - x_1)$ yields the extrapolation formula:

$$y = y_1 + \frac{y_2 - y_1}{x_2 - x_1} (x - x_1).$$

This equation directly extends the line by scaling the slope by the distance from the reference point $x_1$. Consider the points $(1, 2)$ and $(3, 6)$; to extrapolate the value at $x = 5$:

  1. Calculate the slope: $m = \frac{6 - 2}{3 - 1} = \frac{4}{2} = 2$.
  2. Apply the formula using the first point: $y = 2 + 2(5 - 1) = 2 + 8 = 10$.

Thus, the extrapolated value is $y = 10$ at $x = 5$.
This method relies on the assumption of a constant rate of change, meaning the underlying relationship is perfectly linear within and beyond the data range. However, it has limitations when the true relationship is non-linear, as the straight-line extension can lead to significant inaccuracies over longer projections. Linear extrapolation finds applications in basic forecasting for time series data, such as estimating short-term population growth by extending trends from recent census points. It is also used in operations management for simple trend projections in business metrics like sales over limited horizons.
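A minimal forecasting sketch along these lines fits a degree-1 trend with numpy.polyfit and extends it a few periods ahead; the yearly series is invented for illustration:

```python
# A sketch of a basic trend forecast: fit a degree-1 polynomial to yearly
# observations and extend it a few steps ahead. The series is invented.
import numpy as np

years = np.arange(2018, 2025)
sales = np.array([10.0, 11.2, 12.1, 13.0, 14.2, 15.1, 16.0])

slope, intercept = np.polyfit(years, sales, 1)   # least-squares trend line
forecast = slope * 2027 + intercept              # extrapolate three years out
print(round(forecast, 1))                        # ~19, if the trend persists
```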

Polynomial Extrapolation

Polynomial extrapolation extends the use of polynomial functions beyond linear approximations by fitting a polynomial of degree $n > 1$,

$$p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0,$$

to a set of data points $(x_i, y_i)$ for $i = 0, \dots, m$ where $m \geq n$, and then evaluating $p(x)$ at points outside the interval spanned by the $x_i$. This approach allows for capturing nonlinear trends in the data, serving as a higher-degree generalization of linear extrapolation.

Two primary techniques for constructing the interpolating polynomial are the Lagrange form and Newton's divided difference method. The Lagrange form builds the polynomial directly as

$$p(x) = \sum_{k=0}^{n} y_k \ell_k(x), \qquad \ell_k(x) = \prod_{\substack{j=0 \\ j \neq k}}^{n} \frac{x - x_j}{x_k - x_j}.$$

This form is explicit but computationally intensive for large $n$ due to the product evaluations. In contrast, Newton's divided difference method expresses the polynomial in a nested form that facilitates efficient computation, particularly when adding more points:

$$p(x) = f[x_0] + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + \dots + f[x_0, \dots, x_n] \prod_{j=0}^{n-1} (x - x_j),$$

with divided differences defined recursively: $f[x_i] = y_i$ and

$$f[x_i, \dots, x_{i+k}] = \frac{f[x_{i+1}, \dots, x_{i+k}] - f[x_i, \dots, x_{i+k-1}]}{x_{i+k} - x_i}.$$

This method leverages a divided difference table for incremental updates.

Consider a quadratic example with points $(0, 1)$, $(1, 2)$, and $(2, 5)$. The zeroth-order divided differences are $f[x_0] = 1$, $f[x_1] = 2$, $f[x_2] = 5$. The first-order differences are $f[0, 1] = (2 - 1)/(1 - 0) = 1$ and $f[1, 2] = (5 - 2)/(2 - 1) = 3$. The second-order difference is $f[0, 1, 2] = (3 - 1)/(2 - 0) = 1$. Thus,

$$p(x) = 1 + 1 \cdot (x - 0) + 1 \cdot (x - 0)(x - 1) = 1 + x + x(x - 1) = x^2 + 1.$$

Extrapolating to $x = 3$ yields $p(3) = 9 + 1 = 10$. This derivation confirms the polynomial passes through the points: $p(0) = 1$, $p(1) = 2$, $p(2) = 5$.

Compared to linear extrapolation, polynomial methods better capture curvature in data exhibiting quadratic or higher-order trends, improving accuracy for moderately nonlinear functions. However, high-degree polynomials can suffer from Runge's phenomenon, where oscillations amplify near the interval endpoints, leading to poor extrapolation stability, as illustrated by interpolating $f(x) = 1/(1 + 25x^2)$ on $[-1, 1]$ with increasing degrees.

For computational efficiency, especially with equally spaced points, finite differences simplify the process by approximating divided differences. The forward difference table starts with values $f(x_i)$, computes first differences $\Delta f(x_i) = f(x_{i+1}) - f(x_i)$, second differences $\Delta^2 f(x_i) = \Delta f(x_{i+1}) - \Delta f(x_i)$, and so on, until constant $n$th differences for a degree-$n$ polynomial. Extrapolation then uses Newton's forward difference formula:

$$p(x) = \sum_{k=0}^{n} \binom{s}{k} \Delta^k f(x_0),$$

where $s = (x - x_0)/h$ and $h$ is the spacing. This avoids full divided difference tables for uniform grids.
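The divided-difference scheme above is straightforward to implement; this sketch (an illustrative implementation, not from the original) reproduces the worked example, extrapolating the points (0, 1), (1, 2), (2, 5) to x = 3:

```python
# A sketch of Newton's divided-difference extrapolation, reproducing the
# worked example above: points (0, 1), (1, 2), (2, 5) give p(x) = x^2 + 1,
# so extrapolating to x = 3 yields 10.
import numpy as np

def newton_extrapolate(xs, ys, x):
    """Evaluate the Newton-form interpolating polynomial at x."""
    xs, coef = np.asarray(xs, float), np.asarray(ys, float).copy()
    n = len(xs)
    # Build divided differences in place: coef[k] becomes f[x_0, ..., x_k].
    for k in range(1, n):
        coef[k:] = (coef[k:] - coef[k - 1:-1]) / (xs[k:] - xs[:-k])
    # Horner-style evaluation of the nested Newton form.
    result = coef[-1]
    for k in range(n - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result

print(newton_extrapolate([0, 1, 2], [1, 2, 5], 3))  # 10.0
```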

Conic and Geometric Extrapolation

Conic extrapolation involves fitting a conic section—such as a circle, ellipse, parabola, or hyperbola—to a set of points to predict values beyond the observed range. This method uses the general implicit equation $ax^2 + bxy + cy^2 + dx + ey + f = 0$, where the coefficients $a, b, c, d, e, f$ are determined by minimizing an error metric, typically the algebraic or geometric distance from the points to the curve. Common approaches include linear least-squares methods like the direct ellipse fit (LIN), which solve a linear system subject to a quadratic constraint to ensure the conic type, or more robust geometric minimization techniques that account for orthogonal distances. These fittings are particularly effective for data exhibiting quadratic or hyperbolic trends, allowing extension of the curve while preserving geometric properties.

A representative application is parabolic extrapolation in projectile motion, where the trajectory follows a quadratic path under constant gravitational acceleration. The vertical position is modeled as $y = ax^2 + bx + c$, with coefficients fitted to observed position data; for instance, eliminating time from the kinematic equations yields

$$y = (\tan \theta) x - \frac{g x^2}{2 v_i^2 \cos^2 \theta},$$

where $\theta$ is the launch angle, $v_i$ the initial velocity, and $g$ the gravitational acceleration. This fit enables prediction of landing points or maximum height by extending the parabola beyond initial measurements.

Geometric extrapolation extends curves manually or with aids like rulers, compasses, or templates to visually continue trends from plotted points. Rulers and compasses facilitate straight-line extensions or circular arcs, while specialized tools approximate conic paths through linkage mechanisms. The French curve, a template with varying radii, allows freehand drawing of smooth, spline-like extensions by aligning segments with points and tracing beyond them, suitable for irregular or accelerating distributions. Modern software emulates these by parameterizing curves via control points and fitting splines for seamless continuation.

To incorporate uncertainty, geometric methods can include error prediction via confidence cones, which visualize reliability around extrapolated paths. These cones, apexed at the origin and widening along the direction of extension, bound the probable true path based on sampling variability, with width determined by factors like the $R^2$ statistic; for example, a 95% confidence cone in a two-variable optimization spans angles indicating directional uncertainty.

Historically, conic and geometric extrapolation featured in pre-computer engineering drawings, where mechanical linkages and templates—developed from ancient Greek devices—enabled precise curve extensions for designs without numerical computation.
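As a sketch of the projectile application, the code below fits a quadratic to early trajectory samples (generated here from an assumed 45-degree launch with $v_i^2 = 20$ m²/s²) and extrapolates to the landing point:

```python
# A sketch of parabolic extrapolation for projectile motion: fit
# y = a*x^2 + b*x + c to early trajectory samples and extrapolate to the
# landing point. The samples are synthesized from an assumed 45-degree
# launch, so y = x - g*x^2 / (2 * v^2 * cos^2(theta)) with v^2*cos^2 = 10.
import numpy as np

x_obs = np.array([0.0, 0.5, 1.0, 1.5])          # horizontal positions, m
y_obs = x_obs - 9.81 * x_obs**2 / 20.0          # heights while still airborne

a, b, c = np.polyfit(x_obs, y_obs, 2)           # quadratic trajectory fit
roots = np.roots([a, b, c])                     # where the parabola hits y = 0
print(max(roots))                                # predicted range: ~2.04 m
```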

Other Curve-Fitting Techniques

Spline extrapolation employs piecewise polynomial functions, typically cubic splines, to fit segments between specified knots and extend the fit beyond the data range. These methods construct a smooth curve by ensuring continuity in the function value, first derivative, and second derivative at the knots, allowing for local adjustments that avoid the oscillations often seen in high-degree global polynomials. A cubic spline segment between knots $t_i$ and $t_{i+1}$ is given by

$$S_i(x) = a_i + b_i(x - t_i) + c_i(x - t_i)^2 + d_i(x - t_i)^3,$$

where the coefficients $a_i, b_i, c_i, d_i$ are determined by solving a system of equations from the interpolation conditions and boundary constraints. For extrapolation, the spline is extended using the last segment's polynomial, often with natural boundary conditions where the second derivative is zero at the endpoints to minimize curvature. In practice, cubic splines can fit scattered data points, such as irregularly spaced observations of a physical process, and extrapolate forward by maintaining the smoothness of the final segment; for instance, applying natural conditions to endpoint data ensures a linear-like extension without abrupt changes.

These techniques offer flexibility for modeling complex, non-linear trends in data without the Runge phenomenon associated with global polynomials, enabling better adaptation to local variations. Software implementations, such as MATLAB's pchip function, provide shape-preserving piecewise cubic interpolation that avoids overshoots during extrapolation, making it suitable for engineering and scientific applications. However, spline extrapolation is sensitive to the choice of boundary conditions and endpoint data, as alterations can propagate instabilities or unrealistic trends beyond the observed range.

Non-parametric methods, such as kernel smoothing and nearest-neighbor approaches, offer alternatives for extrapolation with irregular or noisy data by avoiding rigid parametric forms. Kernel smoothing estimates the function at a point by weighting nearby observations with a kernel function, like the Gaussian kernel, and can extend estimates beyond the data by incorporating distant points with diminishing influence, though effectiveness diminishes far from the data domain. Nearest-neighbor extension selects the closest data points and averages or weights their values, providing a simple local extrapolation for sparse, irregular datasets without assuming an underlying functional form. These methods excel in capturing data-driven patterns but require careful bandwidth or neighbor selection to balance bias and variance during extension.
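In Python, SciPy offers both natural cubic splines and a shape-preserving analogue of MATLAB's pchip; the sketch below, with invented data, extrapolates one unit past the last knot:

```python
# A sketch of natural cubic-spline extrapolation with SciPy: the spline's
# last segment is extended past the final knot. Data points are invented.
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8])        # roughly sine-like samples

spline = CubicSpline(x, y, bc_type="natural", extrapolate=True)
print(spline(5.0))                               # value one unit past the data

# Shape-preserving alternative, analogous to MATLAB's pchip:
pchip = PchipInterpolator(x, y, extrapolate=True)
print(pchip(5.0))
```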

Quality and Error Assessment

Error Measures and Prediction Techniques

In the assessment of extrapolation accuracy, common deterministic error measures include the mean squared error (MSE) and the residual sum of squares (RSS). MSE quantifies the average of the squared differences between extrapolated predictions and corresponding true values, when available, thereby extending traditional regression evaluation to points beyond the observed range to gauge predictive fidelity. RSS, meanwhile, measures the total squared deviations between observed values and the fitted model within the training set, serving as a foundational indicator of model fit quality that informs expected extrapolation reliability.

Prediction techniques for estimating extrapolation errors encompass bootstrap resampling and forward error analysis. Bootstrap resampling generates error bands by iteratively drawing samples with replacement from the dataset, refitting the extrapolation model each time, and analyzing the variability in predicted values at target points; this approach is particularly useful for non-parametric error characterization in extrapolation scenarios. Forward error analysis, rooted in numerical methods, evaluates how initial data perturbations—such as rounding errors or measurement inaccuracies—propagate through the extrapolation algorithm to bound the difference between the computed and exact extrapolated results.

A representative example arises in linear extrapolation, where the predicted error at a new point $x$ is computed via the standard error of the prediction:

$$\hat{\sigma} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}$$

where $\hat{\sigma}$ is the residual standard error, $n$ the sample size, $\bar{x}$ the mean of the observed inputs, and $S_{xx} = \sum_i (x_i - \bar{x})^2$. The last term grows quadratically as $x$ moves away from the center of the data, quantifying the widening uncertainty of extrapolation.
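The formula translates directly into code; the sketch below, with an invented five-point dataset, shows the prediction standard error widening as the query point leaves the data range:

```python
# A sketch of the prediction standard error for a simple linear fit,
# implementing the formula above. The widening error with distance from the
# mean of x quantifies extrapolation risk. Data are invented.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
n = x.size
sigma_hat = np.sqrt(residuals @ residuals / (n - 2))   # residual std. error
s_xx = np.sum((x - x.mean()) ** 2)

def prediction_se(x_new: float) -> float:
    return sigma_hat * np.sqrt(1 + 1 / n + (x_new - x.mean()) ** 2 / s_xx)

for x_new in (3.0, 10.0):          # inside vs. far outside the data range
    print(x_new, round(prediction_se(x_new), 3))
```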