2Sum[1] is a floating-point algorithm for computing the exact round-off error in a floating-point addition operation.
2Sum and its variant Fast2Sum were first published by Ole Møller in 1965.[2] Fast2Sum is often used implicitly in other algorithms such as compensated summation algorithms;[1] Kahan's summation algorithm was published first in 1965,[3] and Fast2Sum was later factored out of it by Dekker in 1971 for double-double arithmetic algorithms.[4] The names 2Sum and Fast2Sum appear to have been applied retroactively by Shewchuk in 1997.[5]
Algorithm
[edit]Given two floating-point numbers and , 2Sum computes the floating-point sum rounded to nearest and the floating-point error so that , where and respectively denote the addition and subtraction rounded to nearest. The error is itself a floating-point number.
- Inputs floating-point numbers
- Outputs rounded sum and exact error
- return
Provided the floating-point arithmetic is correctly rounded to nearest (with ties resolved any way), as is the default in IEEE 754, and provided the sum does not overflow and, if it underflows, underflows gradually, it can be proven that .[1][6][2]
A variant of 2Sum called Fast2Sum uses only three floating-point operations, for floating-point arithmetic in radix 2 or radix 3, under the assumption that the exponent of is at least as large as the exponent of , such as when :[1][6][7][4]
- Inputs radix-2 or radix-3 floating-point numbers and , where either at least one is zero, or which have normalized exponents
- Outputs rounded sum and exact error
- return
Even if the conditions are not satisfied, 2Sum and Fast2Sum often provide reasonable approximations to the error, i.e. , which enables algorithms for compensated summation, dot-product, etc., to have low error even if the inputs are not sorted or the rounding mode is unusual.[1][2]
More complicated variants of 2Sum and Fast2Sum are used for rounding modes other than round-to-nearest.[1]
See also
[edit]References
[edit]- ^ a b c d e f Muller, Jean-Michel; Brunie, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Joldes, Mioara; Lefèvre, Vincent; Melquiond, Guillaume; Revol, Nathalie; Torres, Serge (2018). Handbook of Floating-Point Arithmetic (2nd ed.). Cham, Switzerland: Birkhäuser. pp. 104–111. doi:10.1007/978-3-319-76526-6. ISBN 978-3-319-76525-9. Archived from the original on 2023-04-28. Retrieved 2020-09-20.
- ^ a b c Møller, Ole (March 1965). "Quasi double-precision in floating point addition". BIT Numerical Mathematics. 5: 37–50. doi:10.1007/BF01975722. S2CID 119991676.
- ^ Kahan, W. (January 1965). "Further remarks on reducing truncation errors". Communications of the ACM. 8 (1). Association for Computing Machinery: 40. doi:10.1145/363707.363723. ISSN 0001-0782. S2CID 22584810.
- ^ a b Dekker, T.J. (June 1971). "A floating-point technique for extending the available precision". Numerische Mathematik. 18 (3): 224–242. doi:10.1007/BF01397083. S2CID 63218464. Archived from the original on 2020-07-19. Retrieved 2020-09-24.
- ^ Shewchuk, Jonathan Richard (October 1997). "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates". Discrete & Computational Geometry. 18 (3): 305–363. doi:10.1007/PL00009321.
- ^ a b Knuth, Donald E. (1998). The Art of Computer Programming, Volume II: Seminumerical Algorithms (3rd ed.). Addison–Wesley. p. 236. ISBN 978-0-201-89684-8. Archived from the original on 2017-07-16. Retrieved 2020-09-20.
- ^ Sterbenz, Pat H. (1974). Floating-Point Computation. Englewood Cliffs, NJ, United States: Prentice-Hall. pp. 138–143. ISBN 0-13-322495-3.
Introduction
Definition and Purpose
The 2Sum algorithm is a fundamental technique in numerical computing designed to extract the exact rounding error from a floating-point addition operation. Given two floating-point numbers and , it computes the floating-point sum and the error term such that exactly, where denotes the floating-point evaluation with rounding. This decomposition allows the exact mathematical sum to be represented as a short expansion of two floating-point numbers, preserving full precision without requiring higher-radix or extended-precision hardware.Historical Development
The 2Sum algorithm originated in the work of Ole Møller, who first described it in 1965 as a technique for achieving quasi-double precision in floating-point addition by computing an error-free transformation that separates the rounded sum from the exact error term.[2] This approach addressed the loss of accuracy when adding numbers of disparate magnitudes in floating-point systems, enabling more precise computations without requiring additional hardware precision.[2] Møller's method laid the groundwork for handling roundoff errors explicitly, predating the formalization of modern floating-point standards. Møller's contribution appeared two decades before the IEEE 754 standard for binary floating-point arithmetic was established in 1985, yet the 2Sum algorithm's reliance on properties like rounding-to-nearest and guard digits aligns seamlessly with the standard's requirements for reproducible and accurate computations.[2] The technique's robustness in binary floating-point environments ensured its enduring relevance even after IEEE 754's adoption. The specific terminology "2Sum" was retroactively applied by Jonathan Richard Shewchuk in his 1997 technical report on adaptive precision floating-point arithmetic, where he formalized the algorithm's role in constructing exact predicates and expansions for geometric computations.[3][4] Shewchuk's nomenclature distinguished it from related error-free operations, such as the variant Fast2Sum, which assumes ordered inputs for efficiency and traces its operations back to elements in Møller's original description.[3] Over time, 2Sum evolved within the framework of error-free transformations, becoming a foundational component in numerical libraries for extending precision and compensating for roundoff in summation algorithms, including Kahan's compensated summation method.[3] This integration facilitated its adoption in adaptive arithmetic systems, influencing subsequent variants and broader applications in robust geometric and scientific computing.[4]Algorithm Description
Standard 2Sum Procedure
The standard 2Sum procedure computes a faithful representation of the sum of two floating-point numbers and by producing the rounded sum and an error term such that exactly, with both and being floating-point numbers.[https://hal.archives-ouvertes.fr/ensl-01310023] This algorithm, originally described by Møller and formalized by Knuth, operates under the IEEE 754 standard and requires six floating-point operations, involving additions and subtractions to isolate the rounding error without any prior assumption on the relative magnitudes of and .[5] The procedure begins by computing the floating-point sum . It then estimates effective contributions of and to through successive subtractions, accounting for potential cancellation in the initial addition. The differences between the original inputs and these estimates yield the error components and , which are finally added to form . This sequence ensures that captures the exact rounding error of the initial sum, preserving the total as without overflow or underflow issues in the IEEE 754 framework, provided no intermediate overflow occurs. If the computed is tiny relative to the unit in the last place (ulp) of , the algorithm's structure inherently handles it through the subtraction steps, though advanced variants may incorporate number splitting to further mitigate denormalization risks; however, the core process relies on the six operations for robustness.[6] The operational mechanics derive as follows. Start with , where denotes the floating-point evaluation under round-to-nearest. The effective is , approximating as seen in the sum. Then, refines the estimate for . The error terms are and , leading to . Under IEEE 754 arithmetic with precision , this yields where , ensuring is a faithful rounding of the true error . Here is the pseudocode for the standard 2Sum procedure:function [s, t] = TwoSum(a, b)
s = fl(a + b) // Step 1: Rounded sum
a_prime = fl(s - b) // Step 2: Estimate effective a
b_prime = fl(s - a_prime) // Step 3: Estimate effective b
delta_a = fl(a - a_prime) // Step 4: Error in a estimate
delta_b = fl(b - b_prime) // Step 5: Error in b estimate
t = fl(delta_a + delta_b) // Step 6: Total error term
return [s, t]
end
This general approach contrasts with the Fast2Sum variant, which uses only three operations but requires the precondition for efficiency.