Hubbry Logo
search
logo
2Sum
2Sum
current hub

2Sum

logo
Community Hub0 Subscribers
Read side by side
from Wikipedia

2Sum[1] is a floating-point algorithm for computing the exact round-off error in a floating-point addition operation.

2Sum and its variant Fast2Sum were first published by Ole Møller in 1965.[2] Fast2Sum is often used implicitly in other algorithms such as compensated summation algorithms;[1] Kahan's summation algorithm was published first in 1965,[3] and Fast2Sum was later factored out of it by Dekker in 1971 for double-double arithmetic algorithms.[4] The names 2Sum and Fast2Sum appear to have been applied retroactively by Shewchuk in 1997.[5]

Algorithm

[edit]

Given two floating-point numbers and , 2Sum computes the floating-point sum rounded to nearest and the floating-point error so that , where and respectively denote the addition and subtraction rounded to nearest. The error is itself a floating-point number.

Inputs floating-point numbers
Outputs rounded sum and exact error
  1. return

Provided the floating-point arithmetic is correctly rounded to nearest (with ties resolved any way), as is the default in IEEE 754, and provided the sum does not overflow and, if it underflows, underflows gradually, it can be proven that .[1][6][2]

A variant of 2Sum called Fast2Sum uses only three floating-point operations, for floating-point arithmetic in radix 2 or radix 3, under the assumption that the exponent of is at least as large as the exponent of , such as when :[1][6][7][4]

Inputs radix-2 or radix-3 floating-point numbers and , where either at least one is zero, or which have normalized exponents
Outputs rounded sum and exact error
  1. return

Even if the conditions are not satisfied, 2Sum and Fast2Sum often provide reasonable approximations to the error, i.e. , which enables algorithms for compensated summation, dot-product, etc., to have low error even if the inputs are not sorted or the rounding mode is unusual.[1][2]

More complicated variants of 2Sum and Fast2Sum are used for rounding modes other than round-to-nearest.[1]

See also

[edit]

References

[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
The 2Sum algorithm is a fundamental technique in floating-point arithmetic designed to compute both the rounded sum $ s = \RN(a + b) $ of two floating-point numbers $ a $ and $ b $, and an error term $ e $ such that $ s + e = a + b $ exactly in exact arithmetic, thereby capturing the round-off error without additional precision. This error-free transformation enables higher accuracy in numerical computations by allowing the error to be propagated and compensated in subsequent operations. Introduced by Ole Møller in his 1965 paper on achieving quasi double-precision through careful error handling in additions, the algorithm addresses the loss of accuracy when adding numbers of significantly different magnitudes in floating-point systems. It was later formalized and analyzed by Donald Knuth, who integrated it into discussions of numerical stability in The Art of Computer Programming. The standard 2Sum procedure involves a sequence of floating-point operations: first compute $ s $, then derive the residuals for $ a $ and $ b $ relative to $ s $, and finally set $ e $ as their sum, ensuring the result holds under IEEE 754 rounding modes assuming no overflow.[1] A key variant, Fast2Sum, optimizes the process by assuming $ |a| \geq |b| $, reducing the number of operations from six to three while preserving exactness, making it suitable for performance-critical applications. Both algorithms serve as building blocks in advanced numerical methods, including compensated summation (e.g., for accurate dot products) and floating-point expansions like those in Shewchuk's robust geometric predicates. Their robustness has been extensively studied, confirming reliability even under non-standard rounding modes and minimal overflow risks, with applications spanning scientific computing, computer algebra, and verified numerical software.

Introduction

Definition and Purpose

The 2Sum algorithm is a fundamental technique in numerical computing designed to extract the exact rounding error from a floating-point addition operation. Given two floating-point numbers aa and bb, it computes the floating-point sum s=fl(a+b)s = \mathrm{fl}(a + b) and the error term tt such that s+t=a+bs + t = a + b exactly, where fl()\mathrm{fl}(\cdot) denotes the floating-point evaluation with rounding. This decomposition allows the exact mathematical sum to be represented as a short expansion of two floating-point numbers, preserving full precision without requiring higher-radix or extended-precision hardware.
s=fl(a+b),t=a+bs. \begin{aligned} s &= \mathrm{fl}(a + b), \\ t &= a + b - s. \end{aligned}
The primary purpose of 2Sum is to enable precise error tracking during sequential additions, facilitating compensated summation and other methods that achieve higher effective precision from standard floating-point arithmetic. By isolating the roundoff error tt, which is typically small in magnitude (bounded by the unit in the last place of ss), the algorithm supports robust implementations in adaptive precision systems, such as those used in geometric predicates and scientific simulations, without the overhead of arbitrary-precision libraries. This makes it particularly valuable in environments constrained by IEEE 754 floating-point standards, where naive summation can accumulate significant errors. The term "2Sum" was coined retroactively by Jonathan Shewchuk in his 1997 work on adaptive floating-point arithmetic, though the underlying algorithm was originally described by Ole Møller in 1965 as a means to achieve quasi double-precision addition. A related variant, Fast2Sum, simplifies the process under the assumption that one operand dominates in magnitude.

Historical Development

The 2Sum algorithm originated in the work of Ole Møller, who first described it in 1965 as a technique for achieving quasi-double precision in floating-point addition by computing an error-free transformation that separates the rounded sum from the exact error term.[2] This approach addressed the loss of accuracy when adding numbers of disparate magnitudes in floating-point systems, enabling more precise computations without requiring additional hardware precision.[2] Møller's method laid the groundwork for handling roundoff errors explicitly, predating the formalization of modern floating-point standards. Møller's contribution appeared two decades before the IEEE 754 standard for binary floating-point arithmetic was established in 1985, yet the 2Sum algorithm's reliance on properties like rounding-to-nearest and guard digits aligns seamlessly with the standard's requirements for reproducible and accurate computations.[2] The technique's robustness in binary floating-point environments ensured its enduring relevance even after IEEE 754's adoption. The specific terminology "2Sum" was retroactively applied by Jonathan Richard Shewchuk in his 1997 technical report on adaptive precision floating-point arithmetic, where he formalized the algorithm's role in constructing exact predicates and expansions for geometric computations.[3][4] Shewchuk's nomenclature distinguished it from related error-free operations, such as the variant Fast2Sum, which assumes ordered inputs for efficiency and traces its operations back to elements in Møller's original description.[3] Over time, 2Sum evolved within the framework of error-free transformations, becoming a foundational component in numerical libraries for extending precision and compensating for roundoff in summation algorithms, including Kahan's compensated summation method.[3] This integration facilitated its adoption in adaptive arithmetic systems, influencing subsequent variants and broader applications in robust geometric and scientific computing.[4]

Algorithm Description

Standard 2Sum Procedure

The standard 2Sum procedure computes a faithful representation of the sum of two floating-point numbers aa and bb by producing the rounded sum s=fl([a+b](/page/ListofFrenchcomposers))s = \mathrm{fl}([a + b](/page/List_of_French_composers)) and an error term tt such that a+b=s+ta + b = s + t exactly, with both ss and tt being floating-point numbers.[https://hal.archives-ouvertes.fr/ensl-01310023] This algorithm, originally described by Møller and formalized by Knuth, operates under the IEEE 754 standard and requires six floating-point operations, involving additions and subtractions to isolate the rounding error without any prior assumption on the relative magnitudes of aa and bb.[5] The procedure begins by computing the floating-point sum ss. It then estimates effective contributions of aa and bb to ss through successive subtractions, accounting for potential cancellation in the initial addition. The differences between the original inputs and these estimates yield the error components δa\delta_a and δb\delta_b, which are finally added to form tt. This sequence ensures that tt captures the exact rounding error of the initial sum, preserving the total as s+t=a+bs + t = a + b without overflow or underflow issues in the IEEE 754 framework, provided no intermediate overflow occurs. If the computed tt is tiny relative to the unit in the last place (ulp) of ss, the algorithm's structure inherently handles it through the subtraction steps, though advanced variants may incorporate number splitting to further mitigate denormalization risks; however, the core process relies on the six operations for robustness.[6] The operational mechanics derive tt as follows. Start with s=fl(a+b)s = \mathrm{fl}(a + b), where fl()\mathrm{fl}(\cdot) denotes the floating-point evaluation under round-to-nearest. The effective aa' is fl(sb)\mathrm{fl}(s - b), approximating aa as seen in the sum. Then, b=fl(sa)b' = \mathrm{fl}(s - a') refines the estimate for bb. The error terms are δa=fl(aa)\delta_a = \mathrm{fl}(a - a') and δb=fl(bb)\delta_b = \mathrm{fl}(b - b'), leading to t=fl(δa+δb)t = \mathrm{fl}(\delta_a + \delta_b). Under IEEE 754 arithmetic with precision p4p \geq 4, this yields t=(a+b)s+αt = (a + b) - s + \alpha where α<2p+1ulp(s)|\alpha| < 2^{-p+1} \cdot \mathrm{ulp}(s), ensuring tt is a faithful rounding of the true error (a+b)s(a + b) - s. Here is the pseudocode for the standard 2Sum procedure:
function [s, t] = TwoSum(a, b)
    s = fl(a + b)                    // Step 1: Rounded sum
    a_prime = fl(s - b)              // Step 2: Estimate effective a
    b_prime = fl(s - a_prime)        // Step 3: Estimate effective b
    delta_a = fl(a - a_prime)        // Step 4: Error in a estimate
    delta_b = fl(b - b_prime)        // Step 5: Error in b estimate
    t = fl(delta_a + delta_b)        // Step 6: Total error term
    return [s, t]
end
This general approach contrasts with the Fast2Sum variant, which uses only three operations but requires the precondition ab|a| \geq |b| for efficiency.

Fast2Sum Variant

The Fast2Sum algorithm serves as an optimized variant of the 2Sum procedure, designed to compute the rounded sum $ s $ and the associated error $ t $ such that $ s + t = a + b $ exactly, while requiring fewer floating-point operations under specific preconditions.[7] Introduced by T. J. Dekker in 1971 as part of techniques for extending floating-point precision, it prioritizes computational efficiency for cases where the inputs satisfy magnitude and format constraints.[7] This makes it particularly suitable for inner loops in numerical algorithms where speed is critical and the preconditions can be ensured. The procedure consists of three floating-point operations: first, compute the rounded sum $ s = \mathrm{fl}(a + b) $; second, calculate the intermediate difference $ \delta = \mathrm{fl}(s - a) $; third, extract the error term $ t = \mathrm{fl}(b - \delta) $.[7] This simplified error extraction is captured in the key relation $ t = \mathrm{fl}(b - \mathrm{fl}(s - a)) $, which avoids the more elaborate scaling and unrounding steps of the standard 2Sum.[8] The algorithm assumes $ |a| \geq |b| $ and operates in floating-point systems with radix $ \beta \leq 3 $, ensuring the error $ t $ is correctly rounded and the sum $ s + t $ is exact.[7] Without the magnitude precondition, Fast2Sum fails to produce the correct error term, necessitating a prior swap of $ a $ and $ b $ if violated.[8] In IEEE 754 binary floating-point arithmetic, Fast2Sum remains compatible with gradual underflow, preserving accuracy even when subnormal numbers are involved, as long as the radix and rounding mode assumptions hold.[9] Compared to the standard 2Sum, which requires six floating-point operations to handle arbitrary inputs unconditionally, Fast2Sum reduces this to three, offering a significant speedup at the cost of restricted applicability.[10] This efficiency has made it a building block in enhanced summation methods, such as variants of Kahan summation.[7]

Mathematical Foundations

Error Computation in Floating-Point Addition

In floating-point arithmetic conforming to the IEEE 754 standard, the addition of two floating-point numbers aa and bb is computed as s=fl(a+b)s = \mathrm{fl}(a + b), where the exact sum a+ba + b is rounded to the nearest representable floating-point number ss, with ties resolved by rounding to even. This rounding introduces an error bounded by half the unit in the last place (ulp) of the result: a+bs12ulp(s)|a + b - s| \leq \frac{1}{2} \mathrm{ulp}(s). Equivalently, in terms of relative error, s=(a+b)(1+ϵ)s = (a + b)(1 + \epsilon) with ϵ2p|\epsilon| \leq 2^{-p}, where pp is the precision of the floating-point format (e.g., p=53p = 53 for binary64 double precision), and the unit roundoff u=2pu = 2^{-p} bounds the maximum relative rounding error.[11][12] The error term tt in the 2Sum transformation represents this rounding error exactly, satisfying the relation a+b=s+ta + b = s + t, where ss is the floating-point sum and tt captures the discarded portion due to rounding. Here, tt is bounded by t12ulp(s)|t| \leq \frac{1}{2} \mathrm{ulp}(s), ensuring that tt is small relative to ss and aligns with the ulp scale around (a+b)/2(a + b)/2, specifically tulp((a+b)/2)|t| \leq \mathrm{ulp}((a + b)/2) under typical conditions where no overflow occurs. This bound follows directly from the IEEE 754 rounding guarantee, as the maximum deviation from the exact sum cannot exceed half the spacing between representable numbers at the magnitude of ss.[11][9][12] The 2Sum algorithm constitutes an error-free transformation, decomposing the exact sum a+ba + b into two floating-point components ss and tt such that their sum is exact without further rounding loss, while preserving all information from the original addition. This separation allows subsequent computations to track and compensate for accumulated errors precisely, leveraging the fact that tt is a floating-point number whose magnitude is controlled by the ulp bound, enabling applications in high-accuracy numerical methods.[11][12]

Assumptions and Preconditions

The 2Sum algorithm operates under the precondition of compliance with the IEEE 754 standard for binary floating-point arithmetic, which ensures a consistent framework for representation, operations, and exception handling in radix-2 systems with specified precision pp.[9] Specifically, it requires the rounding mode to be round-to-nearest (ties to even), the default mode in IEEE 754, to guarantee the exact computation of the error term in the floating-point addition a+b=s+αa + b = s + \alpha, where ss is the rounded sum and α\alpha is the exact error.[9] Additionally, the inputs aa and bb must satisfy a+b>0|a| + |b| > 0, as the case where both are zero yields a trivial sum with no error, rendering the algorithm's error extraction unnecessary.[13] The algorithm inherently handles gradual underflow through support for subnormal numbers, maintaining accuracy without abrupt transitions to zero when results approach the underflow threshold.[9] Limitations arise if these preconditions are violated, leading to inexact error computation. For instance, in non-round-to-nearest modes such as directed roundings (toward ++\infty or -\infty), 2Sum produces a faithful rounding of the error rather than the exact value, though the result remains bounded closely to the true error.[13] Similarly, the algorithm is designed for binary radices (radix 2); in higher radices like 3 or 10, it may fail to deliver exact errors due to differences in representation and rounding properties beyond the binary case.[9] Overflow is generally managed robustly, with 2Sum remaining nearly immune except in rare cases where one input equals the largest representable finite number Ω\Omega, potentially requiring additional checks for detection.[13] The algorithm accommodates subnormal inputs and outputs effectively within IEEE 754, preserving the exact error relation even when denormalization occurs, though extreme denormalization issues—such as when the sum's exponent difference causes partial loss—may lead to approximated rather than exact error terms in boundary scenarios.[9] A key conceptual aspect is its scale invariance, allowing consistent operation regardless of the absolute magnitudes of aa and bb. The algorithm exhibits scale invariance, allowing consistent exact error computation regardless of the absolute magnitudes or exponent differences of aa and bb, provided the preconditions are satisfied and no overflow occurs. This ties into the broader error model of floating-point addition, where the precondition of round-to-nearest ensures the unit roundoff bound 12ulp(s)\frac{1}{2} \mathrm{ulp}(s).[9]

Applications and Implementations

Use in Compensated Summation

The 2Sum algorithm serves as a key building block in compensated summation techniques, where it enables the exact decomposition of floating-point additions into a rounded sum and a corresponding error term, allowing these errors to be tracked and compensated in subsequent operations. In particular, it integrates into variants of the Kahan summation algorithm, originally proposed for reducing rounding errors in sequential additions by maintaining a separate accumulator for lost low-order bits. By replacing the approximate error estimation in Kahan's method with the precise error extraction provided by 2Sum, the overall summation achieves bounded error growth that is significantly tighter than the original.[14][15] A representative example of this integration occurs in iterative summation algorithms, such as the Sum2 procedure, where 2Sum is applied repeatedly to the current accumulated sum $ s $ and the next input term $ x_i $. Specifically, for each step, compute $ (t, \delta) = 2\mathrm{Sum}(s, x_i) $, where $ t $ is the rounded sum and $ \delta $ is the exact error term satisfying $ s + x_i = t + \delta $ with $ |\delta| < \mathrm{ulp}(t) $; then update $ s \leftarrow t $ and add $ \delta $ to a secondary accumulator that feeds into future iterations. This process ensures that rounding errors are not discarded but instead incorporated precisely, preventing their accumulation into higher-order bits.[14] One distinctive benefit of employing 2Sum in such compensated schemes is the enhancement of accuracy from the $ O(n \cdot \mathrm{ulp}) $ error bound of naive pairwise summation—where errors grow linearly with the number of terms $ n $—to a near-exact result with error bounded by a small constant multiple of the unit in the last place (ulp), independent of $ n $ for well-conditioned inputs. For instance, in floating-point summation of $ n $ terms, this reduces the effective error to at most 0.5 ulp(s), where s is the true sum, ensuring faithful rounding and full double-precision accuracy even for large $ n $.[14] A specialized variant appears in Dekker's double-double arithmetic, an early framework for emulating extended-precision computation using pairs of floating-point numbers. Here, 2Sum (termed "two-sum" in the original) is employed to perform exact addition of double-length accumulators by decomposing the sum of the high-order parts and then compensating the low-order parts with the extracted error, enabling summation with approximately twice the precision of the base floating-point format without hardware support.

Integration with Other Numerical Methods

The 2Sum algorithm plays a crucial role in adaptive precision techniques for geometric predicates, particularly in robust computational geometry where exact sign determination of determinants is required without invoking multiprecision arithmetic. In Jonathan Shewchuk's framework, 2Sum serves as a foundational error-free transformation to construct floating-point expansions, which represent exact sums of non-overlapping floating-point numbers. These expansions enable the evaluation of predicates like orientation and incircle tests by adaptively increasing precision only when necessary, ensuring robustness against roundoff errors in standard floating-point arithmetic.[3] This approach allows for certified computations in geometric algorithms, such as Delaunay triangulations, by guaranteeing exact results solely through double-precision operations on commodity hardware.[3] A key application of 2Sum extends to error tracking in dot products, where it compensates for rounding errors in summation steps to achieve higher accuracy in linear algebra operations. When combined with the 2Prod algorithm—which computes the exact product and its rounding error—2Sum facilitates compensated dot products, as in the CompDot method. This integration corrects both multiplication and addition errors iteratively, yielding results accurate to nearly twice the precision of naive computations for matrix-vector multiplications and similar tasks.[14] For instance, in matrix operations involving ill-conditioned data, such pairings maintain backward stability without requiring extended precision formats.[14] The unique capability of 2Sum-based expansions to deliver exact predicates in floating-point geometry without multiprecision hardware has led to its adoption in established libraries. Notably, the CGAL library incorporates Shewchuk's robust predicates, leveraging 2Sum-derived expansions for certified geometric computations in higher-dimensional spaces and mesh generation.[16][17] This integration supports applications in computer-aided design and scientific simulation, where topological consistency is paramount.

Properties and Analysis

Correctness Guarantees

The 2Sum algorithm guarantees the exact recovery of the rounding error in floating-point addition under IEEE 754 binary arithmetic with rounding to nearest, computing floating-point numbers ss and tt such that s+t=a+bs + t = a + b exactly, where s=RN(a+b)s = \mathrm{RN}(a + b) is the correctly rounded sum and tt is the error term.
\] This exactness holds without comparisons, using six floating-point operations, and is formally verified to be robust against underflow in intermediate steps, though overflow can invalidate the result.\[
The proof of correctness relies on the properties of floating-point subtraction and addition in IEEE 754. Let s=RN(a+b)s = \mathrm{RN}(a + b), a=RN(sb)a' = \mathrm{RN}(s - b), b=RN(sa)b' = \mathrm{RN}(s - a'), δa=RN(aa)\delta_a = \mathrm{RN}(a - a'), δb=RN(bb)\delta_b = \mathrm{RN}(b - b'), and t=RN(δa+δb)t = \mathrm{RN}(\delta_a + \delta_b). Each subtraction recovers the exact difference up to rounding, and the final addition reconstructs the error such that no further rounding loss occurs, ensuring a+b=s+ta + b = s + t in exact arithmetic, with all terms representable in the floating-point system.
\] This chain of operations leverages the fact that subtractions in floating-point preserve the relative [error](/page/Error) bound of $\frac{1}{2} \mathrm{ulp}(s)$, propagating exactly under the no-overflow [precondition](/page/Precondition).\[
The error term tt is bounded by t12ulp(s)|t| \leq \frac{1}{2} \mathrm{ulp}(s), with equality achievable in the worst case when a+ba + b lies exactly midway between two representable numbers. $$] \begin{equation} |t| \leq \frac{1}{2} ,\mathrm{ulp}(s) \end{equation} This bound follows directly from the IEEE 754 rounding guarantee for addition, where the maximum deviation from the exact sum is half a unit in the last place of the result.[$$ Additionally, 2Sum preserves exactness when adapted for fused multiply-add operations, by emulating the fused step through error-free transformations that maintain the invariant s+t=s + t = exact result.[]

Computational Complexity

The 2Sum algorithm operates in constant time, with a complexity of O(1) per call, typically requiring 3 to 6 floating-point operations depending on the specific variant employed.[18] The Fast2Sum variant, applicable under conditions where the exponent of the first operand is at least as large as that of the second, achieves this with exactly 3 operations: one addition, one subtraction for the error extraction from the larger operand, and one final subtraction to adjust the smaller operand.[18] In contrast, the general 2Sum requires up to 6 operations to handle arbitrary inputs without such assumptions, ensuring the exact decomposition into a rounded sum and its error term.[18] Space complexity for both variants remains O(1), as the algorithm generates only two output floating-point values—the computed sum and the corresponding rounding error—without necessitating additional data structures or storage beyond a handful of temporary variables.[14] In the context of summing n floating-point numbers, iterative application of 2Sum yields an overall time complexity of O(n), preserving the linear scaling of standard summation while mitigating error accumulation to levels comparable to twice the working precision.[14] This approach avoids the overhead of higher-order expansions, enabling accurate results without deviating from the asymptotic efficiency of naive pairwise addition. Compared to arbitrary-precision arithmetic libraries, 2Sum implementations offer superior performance for error tracking in summations, delivering near-equivalent accuracy at a fraction of the computational cost—often several times faster—by leveraging native floating-point hardware rather than software-emulated multi-word arithmetic.[19]
User Avatar
No comments yet.