Piecewise linear function
In mathematics, a piecewise linear or segmented function is a real-valued function of a real variable, whose graph is composed of straight-line segments.[1]
Definition
A piecewise linear function is a function defined on a (possibly unbounded) interval of real numbers, such that there is a collection of intervals on each of which the function is an affine function. (Thus "piecewise linear" is actually defined to mean "piecewise affine".) If the domain of the function is compact, there needs to be a finite collection of such intervals; if the domain is not compact, the collection may either be required to be finite or to be locally finite in the reals.
Examples
The function defined by

$$f(x) = \begin{cases} -x - 3 & \text{if } x \le -3 \\ x + 3 & \text{if } -3 \le x \le 0 \\ -2x + 3 & \text{if } 0 \le x \le 3 \\ 0.5x - 4.5 & \text{if } x \ge 3 \end{cases}$$

is piecewise linear with four pieces. Since the graph of an affine(*) function is a line, the graph of a piecewise linear function consists of line segments and rays. The x values (in the above example −3, 0, and 3) where the slope changes are typically called breakpoints, changepoints, threshold values or knots. As in many applications, this function is also continuous. The graph of a continuous piecewise linear function on a compact interval is a polygonal chain.
(*) A linear function satisfies $f(\lambda x) = \lambda f(x)$ by definition and therefore in particular $f(0) = 0$; functions whose graph is a straight line are affine rather than linear.
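As a minimal sketch, the example function above can be evaluated with NumPy's piecewise; the conditions and formulas simply restate the definition above:

import numpy as np

# Evaluate the four-piece example; breakpoints at x = -3, 0, and 3.
def f(x):
    x = np.asarray(x, dtype=float)
    conditions = [x <= -3,
                  (x > -3) & (x <= 0),
                  (x > 0) & (x <= 3),
                  x > 3]
    formulas = [lambda t: -t - 3,
                lambda t: t + 3,
                lambda t: -2 * t + 3,
                lambda t: 0.5 * t - 4.5]
    return np.piecewise(x, conditions, formulas)

# Adjacent pieces agree at each breakpoint, so the function is continuous.
print(f([-4, -3, 0, 3, 5]))  # [ 1.  0.  3. -3. -2.]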
There are other examples of piecewise linear functions:
- Absolute value[2]
- Sawtooth function
- Floor function
- Step function, a function composed of constant sub-functions, so also called a piecewise constant function
- Triangular function
Fitting to a curve
An approximation to a known curve can be found by sampling the curve and interpolating linearly between the points. An algorithm for computing the most significant points subject to a given error tolerance has been published.[3]
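As a minimal sketch of this approach (using uniform sampling rather than the error-driven point selection of the cited algorithm), NumPy's interp evaluates the piecewise linear interpolant through the sampled points:

import numpy as np

# Sample a known curve at a handful of knots.
curve = np.sin
a, b, n = 0.0, 2 * np.pi, 8
knots = np.linspace(a, b, n + 1)
samples = curve(knots)

# Interpolate linearly between the sampled points and measure the error.
x = np.linspace(a, b, 1001)
approx = np.interp(x, knots, samples)
print("max error:", np.max(np.abs(approx - curve(x))))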
Fitting to data
If partitions, and then breakpoints, are already known, linear regression can be performed independently on these partitions. However, continuity is not preserved in that case, and there is also no unique reference model underlying the observed data. A stable algorithm for this case has been derived.[4]
If partitions are not known, the residual sum of squares can be used to choose optimal separation points.[5] However, efficient computation and joint estimation of all model parameters (including the breakpoints) may be obtained by an iterative procedure[6] currently implemented in the package segmented[7] for the R language.
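As an illustrative sketch of choosing a separation point by residual sum of squares (a simple grid search, not the iterative procedure of the segmented package; all names here are invented):

import numpy as np

def fit_two_segments(x, y, candidates):
    # Continuous two-segment model: y = b0 + b1*x + b2*max(x - c, 0).
    best = None
    for c in candidates:
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
        if best is None or rss < best[0]:
            best = (rss, c, beta)
    return best  # (residual sum of squares, breakpoint, coefficients)

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.where(x < 4, 2 * x, 8 + 0.5 * (x - 4)) + rng.normal(0, 0.3, x.size)
rss, c, beta = fit_two_segments(x, y, np.linspace(1, 9, 81))
print("estimated breakpoint:", c)

The hinge term max(x − c, 0) enforces continuity at the breakpoint by construction, so only the slope changes there.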
A variant of decision tree learning called model trees learns piecewise linear functions.[8]
Generalizations
The notion of a piecewise linear function makes sense in several different contexts. Piecewise linear functions may be defined on n-dimensional Euclidean space, or more generally any vector space or affine space, as well as on piecewise linear manifolds and simplicial complexes (see simplicial map). In each case, the function may be real-valued, or it may take values from a vector space, an affine space, a piecewise linear manifold, or a simplicial complex. (In these contexts, the term “linear” does not refer solely to linear transformations, but to more general affine linear functions.)
In dimensions higher than one, it is common to require the domain of each piece to be a polygon or polytope. This guarantees that the graph of the function will be composed of polygonal or polytopal pieces.
Splines generalize piecewise linear functions to higher-order polynomials, which are in turn contained in the category of piecewise-differentiable functions, PDIFF.
Specializations
Important sub-classes of piecewise linear functions include the continuous piecewise linear functions and the convex piecewise linear functions. In general, for every $n$-dimensional continuous piecewise linear function $f : \mathbb{R}^n \to \mathbb{R}$, there is a finite family of affine functions $g_{ij} : \mathbb{R}^n \to \mathbb{R}$ such that

$$f(x) = \max_i \min_j g_{ij}(x).$$

If $f$ is convex and continuous, then there is a finite family of affine functions $g_1, \dots, g_k$ such that

$$f(x) = \max_{1 \le j \le k} g_j(x).$$[9]
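For example, the convex function $|x|$ is $\max(x, -x)$, needing no inner min, while a non-convex piecewise linear function such as the triangular function requires both operations. A minimal sketch:

import numpy as np

# Non-convex continuous PL function via max-min: the "hat" on [0, 2].
# f(x) = max(0, min(x, 2 - x)); a plain max of affine functions could
# not produce it, since such a max is always convex.
def hat(x):
    return np.maximum(0.0, np.minimum(x, 2.0 - x))

# Convex case: a max of affine functions suffices.
def absval(x):
    return np.maximum(x, -x)

x = np.linspace(-1.0, 3.0, 9)
print(hat(x))
print(absval(x))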
Applications
In agriculture, piecewise regression analysis of measured data is used to detect the range over which growth factors affect the yield and the range over which the crop is not sensitive to changes in these factors.
In one example, the fitted regression shows that at shallow water tables the yield declines, whereas at deeper (> 7 dm) water tables the yield is unaffected. The graph is made using the method of least squares to find the two segments with the best fit.
A second example shows that crop yields tolerate a soil salinity of up to ECe = 8 dS/m (ECe is the electrical conductivity of an extract of a saturated soil sample), while beyond that value the crop production declines. The graph is made with the method of partial regression to find the longest range of "no effect", i.e. where the line is horizontal. The two segments need not join at the same point; the method of least squares is used only for the second segment.
Further reading
- Apps, P., Long, N., & Rees, R. (2014). Optimal piecewise linear income taxation. Journal of Public Economic Theory, 16(4), 523–545.
References
- ^ Stanley, William D. (2004). Technical Analysis And Applications With Matlab. Cengage Learning. p. 143. ISBN 978-1401864811.
- ^ a b Weisstein, Eric W. "Piecewise Function". mathworld.wolfram.com. Retrieved 2020-08-24.
- ^ Hamann, B.; Chen, J. L. (1994). "Data point selection for piecewise linear curve approximation" (PDF). Computer Aided Geometric Design. 11 (3): 289. doi:10.1016/0167-8396(94)90004-3.
- ^ Golovchenko, Nikolai. "Least-squares Fit of a Continuous Piecewise Linear Function". Retrieved 6 Dec 2012.
- ^ Vieth, E. (1989). "Fitting piecewise linear regression functions to biological responses". Journal of Applied Physiology. 67 (1): 390–396. doi:10.1152/jappl.1989.67.1.390. PMID 2759968.
- ^ Muggeo, V. M. R. (2003). "Estimating regression models with unknown break-points". Statistics in Medicine. 22 (19): 3055–3071. doi:10.1002/sim.1545. PMID 12973787. S2CID 36264047.
- ^ Muggeo, V. M. R. (2008). "Segmented: an R package to fit regression models with broken-line relationships" (PDF). R News. pp. 20–25.
- ^ Landwehr, N.; Hall, M.; Frank, E. (2005). "Logistic Model Trees" (PDF). Machine Learning. 59 (1–2): 161–205. doi:10.1007/s10994-005-0466-3. S2CID 6306536.
- ^ Ovchinnikov, Sergei (2002). "Max-min representation of piecewise linear functions". Beiträge zur Algebra und Geometrie. 43 (1): 297–302. arXiv:math/0009026. MR 1913786.
- ^ A calculator for piecewise regression.
- ^ A calculator for partial regression.
Core Concepts
Definition
A piecewise linear function, more precisely termed a piecewise affine function in modern usage, is a mapping $f : D \to \mathbb{R}^m$, where the domain $D \subseteq \mathbb{R}^n$ is partitioned into a finite number of polyhedral sets $P_1, \dots, P_k$ (the pieces) such that the interiors of the $P_i$ are disjoint, their union covers $D$, and $f$ restricts to an affine function on each piece $P_i$.[7] On each $P_i$, the affine form is given by $f(x) = A_i x + b_i$, where $A_i$ is an $m \times n$ matrix and $b_i$ is a constant vector.[8] This structure allows the function to exhibit linear behavior locally within each polyhedral region while enabling global nonlinearity through the partitioning.[9]

The terminology "piecewise linear" is often used interchangeably with "piecewise affine," though strictly speaking, the latter emphasizes the inclusion of the constant term $b_i$, providing greater generality over purely homogeneous linear functions (where $b_i = 0$).[10] In contexts requiring homogeneity, such as certain geometric or algebraic applications, piecewise linear may refer exclusively to functions without the additive constant.[8] The polyhedral pieces are typically defined by linear inequalities, ensuring the domain decomposition aligns with convex optimization frameworks.[9] Piecewise linear functions gained prominence in the mid-20th century in numerical analysis and optimization as a tool for approximating nonlinear problems, building on foundational developments in linear programming such as George Dantzig's 1947 introduction of the simplex method.[11]
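A minimal sketch of this representation in code, assuming each piece is given by inequality data $C_i x \le d_i$ together with its affine map $(A_i, b_i)$ (all values invented for illustration):

import numpy as np

# Each piece: (C, d, A, b) with region {x : C @ x <= d} and value A @ x + b.
pieces = [
    (np.array([[ 1.0, 0.0]]), np.array([0.0]),   # region x1 <= 0
     np.array([[ 1.0, 1.0]]), np.array([0.0])),  # f(x) = x1 + x2
    (np.array([[-1.0, 0.0]]), np.array([0.0]),   # region x1 >= 0
     np.array([[ 2.0, 1.0]]), np.array([0.0])),  # f(x) = 2*x1 + x2
]

def evaluate(x):
    for C, d, A, b in pieces:
        if np.all(C @ x <= d + 1e-12):
            return A @ x + b
    raise ValueError("x lies outside every piece")

# The two affine maps agree on the shared boundary x1 = 0, so the
# function is well-defined (and continuous) across it.
print(evaluate(np.array([-1.0, 2.0])))  # [1.]
print(evaluate(np.array([ 1.0, 2.0])))  # [4.]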
Examples
One prominent example of a piecewise linear function is the absolute value function, defined as

$$|x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0. \end{cases}$$

Its graph forms a V-shape with the vertex at the origin, consisting of two half-lines: one with slope −1 for negative $x$ and one with slope 1 for non-negative $x$.[12] In machine learning, the rectified linear unit (ReLU) activation function serves as another fundamental piecewise linear example, expressed as

$$\operatorname{ReLU}(x) = \max(0, x).$$

This function outputs zero for negative inputs and follows the identity line with slope 1 for non-negative inputs, promoting sparsity and efficient computation in neural networks.[13]

Piecewise linear functions extend naturally to multiple dimensions; for instance, in two dimensions, a function can be defined over a triangulated domain where it is affine on each triangular region, interpolating linearly via barycentric coordinates from vertex values. Such constructions are common in finite element methods for approximating surfaces or fields on planar domains.[14] A non-continuous piecewise linear function arises in approximations of step functions, such as the Heaviside step function, $H(x) = 1$ for $x \ge 0$ and $H(x) = 0$ for $x < 0$, which is piecewise constant (hence linear with zero slope) across the discontinuity at $x = 0$. Smoother approximations might insert a short linear ramp segment over a narrow interval around the jump to model transitions while preserving overall piecewise linearity.[15]
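As a sketch of the triangulated two-dimensional case, SciPy's LinearNDInterpolator builds a Delaunay triangulation of the given points and interpolates barycentrically on each triangle (the vertex data are invented):

import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Vertex locations and values; the result is affine on each triangle.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([0.0, 1.0, 1.0, 0.0])
f = LinearNDInterpolator(points, values)

print(f(0.25, 0.25))  # barycentric combination of one triangle's vertices
print(f(0.75, 0.50))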
Properties
Continuity and Differentiability
A piecewise linear function, defined as a function that is affine on each subdomain of a partition of its domain, is continuous at a breakpoint $c$ if the left-hand limit $\lim_{x \to c^-} f(x)$, the right-hand limit $\lim_{x \to c^+} f(x)$, and the function value $f(c)$ are equal. This condition holds when the affine expressions for the adjacent pieces match at $c$, ensuring no jump discontinuity occurs. In one dimension, if the pieces are $f(x) = a_1 x + b_1$ for $x \le c$ and $f(x) = a_2 x + b_2$ for $x > c$, continuity requires $a_1 c + b_1 = a_2 c + b_2$.[16] In higher dimensions, continuity extends to the pieces agreeing on the shared hyperplanes or faces between adjacent polytopes in the domain partition. For adjacent pieces defined by affine functions $f_1$ on polytope $P_1$ and $f_2$ on polytope $P_2$, the condition is $f_1(x) = f_2(x)$ for all $x$ in the intersection $P_1 \cap P_2$, which simplifies to matching values at vertices of the shared face. This ensures the function is well-defined and continuous across boundaries.[16]

Piecewise linear functions are differentiable at interior points of each piece, where the derivative equals the constant slope (gradient) of the affine expression for that piece. At breakpoints, differentiability requires the slopes of adjacent pieces to match; if $a_1 = a_2$ in one dimension (or gradients agree across the boundary hyperplane in higher dimensions), the function is differentiable with derivative $a_1$. Otherwise, the function is not differentiable at the breakpoint due to a corner or kink. The one-sided derivatives always exist and are given by the adjacent slopes: the left derivative $f'(c^-) = a_1$ and the right derivative $f'(c^+) = a_2$.[17]

For convex piecewise linear functions, such as the pointwise maximum of affine functions $f(x) = \max_i (a_i^\top x + b_i)$, the subdifferential at a point $x$ is the convex hull of the gradients of the active pieces: $\partial f(x) = \operatorname{conv}\{a_i : i \in I(x)\}$, where $I(x) = \{i : a_i^\top x + b_i = f(x)\}$. At interior points or smooth boundaries where only one piece is active, the subdifferential is a singleton equal to the gradient, indicating differentiability. At kinks where multiple pieces are active, the subdifferential is a polytope with dimension greater than zero, confirming non-differentiability. For example, the absolute value function $f(x) = |x|$, which is piecewise linear with pieces $x$ for $x \ge 0$ and $-x$ for $x < 0$, is continuous everywhere but not differentiable at $x = 0$, where $\partial f(0) = [-1, 1]$.[17]
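A quick numeric check of these one-sided derivatives at a kink, using the absolute value function at $c = 0$ (a sketch; the step size is arbitrary):

# One-sided difference quotients recover the adjacent slopes at a kink.
def one_sided_slopes(f, c, h=1e-6):
    left = (f(c) - f(c - h)) / h
    right = (f(c + h) - f(c)) / h
    return left, right

left, right = one_sided_slopes(abs, 0.0)
print(left, right)  # -1.0 1.0: the endpoints of the subdifferential [-1, 1]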
Convexity and Other Structural Properties
A piecewise linear function in one dimension is convex if and only if the slopes of its successive linear pieces are non-decreasing, assuming the pieces are ordered according to their domains along the real line. This condition ensures that the function lies below any chord connecting two points on its graph, preserving the defining property of convexity. More generally, in higher dimensions, a piecewise linear function is convex if it can be represented as the pointwise maximum of a finite collection of affine functions, as the maximum of convex functions is itself convex.[18][19]

Monotonicity of a piecewise linear function follows directly from the signs of its slopes. Specifically, the function is non-decreasing over its domain if every slope is non-negative, and it is strictly increasing if every slope is positive. In the context of convex piecewise linear functions, non-decreasing slopes already imply a form of global non-decreasing behavior when combined with non-negative initial slopes, though strict monotonicity requires stricter conditions on the slopes.[18]

The boundedness of a piecewise linear function depends on its domain and the slopes of the boundary pieces. On a bounded domain, the function is always bounded above and below, as it is continuous and the domain is compact. On an unbounded domain, such as the entire real line, the function is bounded below if the leftmost slope is non-positive and the rightmost slope is non-negative (or vice versa for bounded above), but it becomes unbounded in the direction where the boundary slope has the same sign as the direction of extension.

Piecewise linear functions exhibit Lipschitz continuity when their slopes are bounded. In particular, if the absolute values of all slopes are finite, the function is globally Lipschitz continuous over its domain, with the Lipschitz constant equal to the supremum of the absolute values of the slopes; this follows from the fact that the function's variation is controlled by the steepest linear piece. For convex piecewise linear functions, local Lipschitz continuity holds on open convex sets, but global bounds require uniform control on the slopes.[20]
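A sketch of these slope tests for a one-dimensional piecewise linear function given by its breakpoints and values (data invented for illustration):

import numpy as np

def segment_slopes(xs, ys):
    # Slopes of the piecewise linear interpolant through (xs, ys).
    return np.diff(ys) / np.diff(xs)

xs = np.array([-3.0, 0.0, 3.0, 6.0])
ys = np.array([ 9.0, 0.0, 3.0, 12.0])
s = segment_slopes(xs, ys)

print("convex:", bool(np.all(np.diff(s) >= 0)))          # slopes non-decreasing
print("non-decreasing:", bool(np.all(s >= 0)))           # every slope >= 0
print("Lipschitz constant:", float(np.max(np.abs(s))))   # steepest piece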
Approximation Techniques
Fitting to Curves
Piecewise linear functions provide a fundamental approach to approximating smooth nonlinear curves by dividing the domain interval $[a, b]$ into subintervals and fitting linear segments on each. In uniform piecewise linear interpolation, the interval is partitioned into $n$ equal subintervals using knots $x_i = a + ih$ for $i = 0, 1, \dots, n$, where $h = (b - a)/n$. The approximation is then constructed by connecting the points $(x_i, f(x_i))$ with straight line segments, ensuring the function passes exactly through these evaluation points. This method is straightforward to implement and computationally efficient, as each segment is defined by the simple linear formula

$$p(x) = f(x_i) + \frac{f(x_{i+1}) - f(x_i)}{h}(x - x_i)$$

for $x \in [x_i, x_{i+1}]$.[21][22]

For a twice continuously differentiable function $f$, the error in this approximation is bounded by $\frac{h^2}{8} \max_{x \in [x_i, x_{i+1}]} |f''(x)|$ on each subinterval $[x_i, x_{i+1}]$, leading to an overall error of order $O(h^2)$. With equidistant knots, this simplifies to $\|f - p\|_\infty \le C h^2$, where $C$ depends on the maximum second derivative, demonstrating second-order convergence as the number of subintervals increases. These bounds highlight the method's accuracy for functions with bounded curvature, though the error grows quadratically with subinterval length.[23]

To enhance accuracy without uniformly increasing the number of segments, adaptive methods refine the knot placement in regions of high curvature, such as where the second derivative is large. These approaches estimate curvature, often via local quadratic approximations or direct computation of $f''$, and insert additional knots to reduce subinterval lengths proportionally to the local variation, ensuring a more equitable error distribution across the domain. For instance, curvature-threshold-based subdivision identifies key points where the curve's bending exceeds a specified tolerance, allowing targeted refinement while maintaining computational efficiency.[24][25]

Piecewise linear interpolation corresponds to a spline of degree 1, characterized by $C^0$ continuity at the knots, meaning the function is continuous but its derivative may exhibit jumps. This structure arises naturally from matching function values at knot points without enforcing derivative continuity, distinguishing it from higher-degree splines. In practice, the construction algorithm begins with knot selection (uniform or adaptive), followed by evaluation of $f$ at these points, and then determination of linear coefficients on each segment, either by direct connection for interpolation or via local least-squares fitting for minor smoothing. Historical roots of these techniques lie in 19th-century graphical methods for astronomical tabulations and engineering drawings, with formal theoretical development in numerical analysis accelerating after the 1950s alongside electronic computing advancements.[21][26]
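A quick numerical check of the second-order convergence: halving $h$ should cut the maximum interpolation error by roughly a factor of four (a sketch using $f = \sin$, whose second derivative is bounded by 1):

import numpy as np

f = np.sin
a, b = 0.0, np.pi
x = np.linspace(a, b, 10001)

for n in (8, 16, 32, 64):
    knots = np.linspace(a, b, n + 1)
    err = np.max(np.abs(np.interp(x, knots, f(knots)) - f(x)))
    h = (b - a) / n
    print(f"n={n:3d}  max error={err:.2e}  bound h^2/8={h * h / 8:.2e}")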
Fitting to Data
Fitting piecewise linear functions to discrete, noisy datasets is a common task in statistical modeling, where the goal is to approximate the underlying relationship between variables using segments of linear functions joined at knots, while accounting for observational errors. The standard approach employs least squares estimation to minimize the sum of squared residuals between observed data points and the predicted values from the piecewise linear model. This involves solving an optimization problem where the model parameters (slopes, intercepts, and knot locations) are adjusted to best fit the data, assuming the errors are independent and identically distributed. For a dataset $(x_i, y_i)$, $i = 1, \dots, n$, the objective is to minimize

$$\sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2,$$

where $f$ is a continuous piecewise linear function with breaks (knots) at positions $c_1 < c_2 < \dots < c_K$, such that $f(x) = a_j x + b_j$ for $x$ in the $j$-th interval defined by the knots.[3][27]

When knots are fixed in advance, the problem reduces to separate linear regressions on each segment, with continuity constraints enforced at the joins to ensure smoothness. For variable knots, optimization algorithms such as differential evolution or dynamic programming are used to simultaneously estimate knot positions and segment parameters, as the objective function is non-convex but can leverage the piecewise linearity for efficient computation.[28][29]

Knot selection is crucial for balancing model complexity and fit, often guided by information criteria like the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC), which penalize additional pieces to avoid overfitting. These criteria evaluate models with varying numbers of knots, selecting the one that minimizes $\mathrm{AIC} = -2 \log L + 2p$ or $\mathrm{BIC} = -2 \log L + p \log n$, where $L$ is the likelihood and $p$ the number of parameters. Alternative methods include forward selection, starting with a single line and iteratively adding knots where they most reduce residuals, or backward selection from an overparameterized model.[30][31]

Change-point detection methods identify potential knots by testing for structural breaks in the data, such as abrupt shifts in slope. Statistical tests like the cumulative sum (CUSUM) procedure monitor the partial sums of residuals or score statistics to detect deviations from a null linear model, signaling a change point when the CUSUM statistic exceeds a threshold derived from asymptotic distributions or simulations. For multiple change points, sequential testing or penalized likelihood approaches extend this framework.[32][33]

Implementations are available in statistical software; for example, the R package 'segmented' fits piecewise linear models via iterative least squares, estimating breakpoints and supporting AIC/BIC for selection.[31] In Python, the 'pwlf' library uses differential evolution to optimize continuous piecewise linear fits for a specified number of segments. A basic pseudocode for segmented regression with fixed knots might proceed as follows:

Initialize parameters: slopes and intercepts for each segment
While convergence not achieved:
    For each segment j:
        Fit linear regression to data points in interval j, using continuity at knots
    Update knot positions if variable (e.g., via grid search or optimization)
    Compute residuals and check change in objective < tolerance
Return fitted parameters and function
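The 'pwlf' workflow mentioned above looks roughly as follows; PiecewiseLinFit, fit, and predict are the library's documented entry points, though details may vary across versions (a sketch, with invented data):

import numpy as np
import pwlf

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = np.where(x < 4, 2 * x, 8 + 0.5 * (x - 4)) + rng.normal(0, 0.3, x.size)

model = pwlf.PiecewiseLinFit(x, y)
breaks = model.fit(2)      # differential evolution over knot locations
y_hat = model.predict(x)
print(breaks)              # domain endpoints plus the fitted interior knot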