Exponentiation by squaring
In mathematics and computer programming, exponentiating by squaring is a general method for fast computation of large positive integer powers of a number, or more generally of an element of a semigroup, like a polynomial or a square matrix. Some variants are commonly referred to as square-and-multiply algorithms or binary exponentiation. These can be of quite general use, for example in modular arithmetic or powering of matrices. For semigroups for which additive notation is commonly used, like elliptic curves used in cryptography, this method is also referred to as double-and-add.
Basic method
Recursive version
The method is based on the observation that, for any integer n > 0, one has
x^n = (x^2)^{n/2} if n is even, and x^n = x·(x^2)^{(n−1)/2} if n is odd.
If the exponent n is zero, then the answer is 1. If the exponent is negative, then we can reuse the previous formula by rewriting the value using a positive exponent. That is,
x^n = (1/x)^{−n}.
Together, these may be implemented directly as the following recursive algorithm:
Inputs: a real number x; an integer n
Output: xn
function exp_by_squaring(x, n) is
    if n < 0 then
        return exp_by_squaring(1 / x, −n)
    else if n = 0 then
        return 1
    else if n is even then
        return exp_by_squaring(x × x, n / 2)
    else if n is odd then
        return x × exp_by_squaring(x × x, (n − 1) / 2)
end function
In each recursive call, the least-significant digit of the binary representation of n is removed. It follows that the number of recursive calls is the number of bits of the binary representation of n. So this algorithm computes this number of squarings and a smaller number of multiplications, equal to the number of 1s in the binary representation of n. This logarithmic number of operations is to be compared with the trivial algorithm which requires n − 1 multiplications.
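As a concrete sketch, the recursive pseudocode above maps directly onto Python (the function name follows the pseudocode):

```python
def exp_by_squaring(x, n):
    """Recursive square-and-multiply for x**n with integer n."""
    if n < 0:
        # Negative exponent: invert the base and flip the sign of n.
        return exp_by_squaring(1 / x, -n)
    if n == 0:
        return 1
    if n % 2 == 0:
        # Even exponent: x^n = (x^2)^(n/2).
        return exp_by_squaring(x * x, n // 2)
    # Odd exponent: x^n = x * (x^2)^((n-1)/2).
    return x * exp_by_squaring(x * x, (n - 1) // 2)
```

For example, exp_by_squaring(2, 10) returns 1024.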
This algorithm is not tail-recursive. This implies that it requires an amount of auxiliary memory that is roughly proportional to the number of recursive calls, or perhaps higher if the amount of data per iteration is increasing.
The algorithms of the next section use a different approach; the resulting algorithms need the same number of operations, but use an auxiliary memory that is roughly the same as the memory required to store the result.
With constant auxiliary memory
The variants described in this section are based on the formula
y·x^n = y·(x^2)^{n/2} if n is even, and y·x^n = (y·x)·(x^2)^{(n−1)/2} if n is odd.
If one applies this formula recursively, starting with y = 1, one eventually gets an exponent equal to 0, and the desired result is then the accumulated factor y.
This may be implemented as a tail-recursive function:
Function exp_by_squaring(x, n)
    return exp_by_squaring2(1, x, n)
Function exp_by_squaring2(y, x, n)
    if n < 0 then return exp_by_squaring2(y, 1 / x, -n);
    else if n = 0 then return y;
    else if n is even then return exp_by_squaring2(y, x * x, n / 2);
    else if n is odd then return exp_by_squaring2(x * y, x * x, (n - 1) / 2);
The iterative version of the algorithm also uses a bounded auxiliary space, and is given by
Function exp_by_squaring_iterative(x, n)
    if n < 0 then
        x := 1 / x;
        n := -n;
    if n = 0 then return 1
    y := 1;
    while n > 1 do
        if n is odd then
            y := x * y;
            n := n - 1;
        x := x * x;
        n := n / 2;
    return x * y
The correctness of the algorithm results from the fact that y·x^n is invariant during the computation; it is 1·x^n = x^n at the beginning; and it is x·y at the end, when n = 1. Thus the returned value x·y is the desired power.
These algorithms use exactly the same number of operations as the algorithm of the preceding section, but the multiplications are done in a different order.
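The constant-memory variants above can be sketched in Python as a single loop over a few scalar variables (names follow the pseudocode):

```python
def exp_by_squaring_iterative(x, n):
    """Compute x**n with O(1) auxiliary memory.
    Loop invariant: y * x**n stays equal to the power being computed."""
    if n < 0:
        x, n = 1 / x, -n
    if n == 0:
        return 1
    y = 1
    while n > 1:
        if n % 2 == 1:
            y = x * y    # fold the odd bit into the accumulator
            n -= 1
        x = x * x        # square the running base
        n //= 2
    return x * y
```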
Computational complexity
A brief analysis shows that such an algorithm uses ⌊log2 n⌋ squarings and at most ⌊log2 n⌋ multiplications, where ⌊ ⌋ denotes the floor function. More precisely, the number of multiplications is one less than the number of ones present in the binary expansion of n. For n greater than about 4 this is computationally more efficient than naively multiplying the base with itself repeatedly.
Each squaring results in approximately double the number of digits of the previous, and so, if multiplication of two d-digit numbers is implemented in O(d^k) operations for some fixed k, then the complexity of computing x^n is given by
Σ_{i=0}^{⌊log2 n⌋} (2^i d)^k = O((nd)^k).
2^k-ary method
This algorithm calculates the value of x^n after expanding the exponent in base 2^k. It was first proposed by Brauer in 1939. In the algorithm below we make use of the function f defined by f(0) = (k, 0) and f(m) = (s, u), where m = u·2^s with u odd.
Algorithm:
- Input
- An element x of G, a parameter k > 0, a non-negative integer n = (n_{l−1}, n_{l−2}, ..., n_0)_{2^k} and the precomputed values x^3, x^5, ..., x^{2^k−1}.
- Output
- The element xn in G
y := 1; i := l - 1
while i ≥ 0 do
    (s, u) := f(n_i)
    for j := 1 to k - s do
        y := y^2
    y := y * x^u
    for j := 1 to s do
        y := y^2
    i := i - 1
return y
For optimal efficiency, k should be the smallest integer satisfying[1]
log2 n < k(k + 1)·2^{2k} / (2^{k+1} − k − 2) + 1.
Sliding-window method
This method is an efficient variant of the 2^k-ary method. For example, to calculate the exponent 398, which has binary expansion (110 001 110)_2, we take a window of length 3 using the 2^k-ary method algorithm and calculate 1, x^3, x^6, x^12, x^24, x^48, x^49, x^98, x^99, x^198, x^199, x^398. But we can also compute 1, x^3, x^6, x^12, x^24, x^48, x^96, x^192, x^199, x^398, which saves one multiplication and amounts to evaluating (110 001 110)_2.
Here is the general algorithm:
Algorithm:
- Input
- An element x of G, a non-negative integer n = (n_{l−1}, n_{l−2}, ..., n_0)_2, a parameter k > 0 and the precomputed values x^3, x^5, ..., x^{2^k−1}.
- Output
- The element xn ∈ G.
Algorithm:
y := 1; i := l - 1
while i > -1 do
    if n_i = 0 then
        y := y^2
        i := i - 1
    else
        s := max{i - k + 1, 0}
        while n_s = 0 do
            s := s + 1[notes 1]
        for h := 1 to i - s + 1 do
            y := y^2
        u := (n_i, n_{i−1}, ..., n_s)_2
        y := y * x^u
        i := s - 1
return y
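A sliding-window sketch in Python, with the scan expressed over the bit string of n (one possible realization; variable names are illustrative):

```python
def sliding_window_pow(x, n, k=3):
    """Sliding-window exponentiation: precompute the odd powers
    x^1, x^3, ..., x^(2^k - 1), then scan the bits of n from the most
    significant end, consuming windows that start and end with a 1 bit."""
    if n == 0:
        return 1
    x2 = x * x
    odd = {1: x}
    for u in range(3, 2 ** k, 2):
        odd[u] = odd[u - 2] * x2
    bits = bin(n)[2:]
    l = len(bits)
    y = 1
    i = 0  # index into bits, MSB first
    while i < l:
        if bits[i] == '0':
            y = y * y                 # zero bit: just square
            i += 1
        else:
            # Window bits[i..j], at most k bits, truncated to end in a 1.
            j = min(i + k - 1, l - 1)
            while bits[j] == '0':
                j -= 1
            u = int(bits[i:j + 1], 2)
            for _ in range(j - i + 1):
                y = y * y             # one squaring per window bit
            y = y * odd[u]
            i = j + 1
    return y
```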
Montgomery's ladder technique
Many algorithms for exponentiation do not provide defence against side-channel attacks. Namely, an attacker observing the sequence of squarings and multiplications can (partially) recover the exponent involved in the computation. This is a problem if the exponent should remain secret, as with many public-key cryptosystems. A technique called "Montgomery's ladder"[2] addresses this concern.
Given the binary expansion of a positive integer n = (n_{k−1}...n_0)_2 with n_{k−1} = 1, we can compute x^n as follows:
x1 = x; x2 = x^2
for i = k - 2 to 0 do
    if n_i = 0 then
        x2 = x1 * x2; x1 = x1^2
    else
        x1 = x1 * x2; x2 = x2^2
return x1
The algorithm performs a fixed sequence of operations (up to log n): a multiplication and squaring takes place for each bit in the exponent, regardless of the bit's specific value. A similar algorithm for multiplication by doubling exists.
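In Python the ladder can be sketched as follows; this formulation starts from R0 = 1 so the leading bit needs no special treatment, and it only illustrates the operation sequence (Python's big-integer arithmetic is not itself constant-time):

```python
def montgomery_ladder(x, n, m):
    """x**n mod m via the Montgomery ladder: one multiplication and one
    squaring per exponent bit, in the same pattern for both bit values.
    Invariant: r1 == r0 * x (mod m) before and after every step."""
    r0, r1 = 1, x % m
    for bit in bin(n)[2:]:  # exponent bits, MSB to LSB
        if bit == '0':
            r1 = (r0 * r1) % m
            r0 = (r0 * r0) % m
        else:
            r0 = (r0 * r1) % m
            r1 = (r1 * r1) % m
    return r0
```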
This specific implementation of Montgomery's ladder is not yet protected against cache timing attacks: memory access latencies might still be observable to an attacker, as different variables are accessed depending on the value of bits of the secret exponent. Modern cryptographic implementations use a "scatter" technique to make sure the processor always misses the faster cache.[3]
Fixed-base exponent
There are several methods which can be employed to calculate x^n when the base is fixed and the exponent varies. As one can see, precomputations play a key role in these algorithms.
Yao's method
Yao's method is orthogonal to the 2^k-ary method: the exponent is expanded in radix b = 2^k and the computation is as performed in the algorithm above. Let n, n_i, b, and b_i be integers.
Let the exponent n be written as
n = Σ_{i=0}^{w−1} n_i b_i,
where 0 ≤ n_i < h for all i ∈ [0, w−1].
Let x_i = x^{b_i}.
Then the algorithm uses the equality
x^n = Π_{i=0}^{w−1} x_i^{n_i} = Π_{j=1}^{h−1} ( Π_{n_i = j} x_i )^j.
Given the element x of G, and the exponent n written in the above form, along with the precomputed values x^{b_0}, ..., x^{b_{w−1}}, the element x^n is calculated using the algorithm below:
y = 1, u = 1, j = h - 1
while j > 0 do
    for i = 0 to w - 1 do
        if n_i = j then
            u = u × x^{b_i}
    y = y × u
    j = j - 1
return y
If we set h = 2^k and b_i = h^i, then the n_i values are simply the digits of n in base h. Yao's method collects in u first those x_i that appear to the highest power h − 1; in the next round those with power h − 2 are collected in u as well, etc. The variable y is multiplied h − 1 times with the initial u, h − 2 times with the next-highest powers, and so on. The algorithm uses w + h − 2 multiplications, and w + 1 elements must be stored to compute x^n.[1]
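A sketch of Yao's method in Python with b_i = h^i, so the table depends only on the base and can be reused across many exponents (the table is filled with the built-in power operator for brevity; in practice it would be built by repeated squaring):

```python
def yao_pow(x, n, h=16):
    """Yao's fixed-base method with b_i = h^i and table xs[i] = x^(h^i)."""
    digits = []
    while n > 0:
        digits.append(n % h)        # digit n_i, least significant first
        n //= h
    if not digits:
        return 1
    xs = [x]
    for _ in range(len(digits) - 1):
        xs.append(xs[-1] ** h)      # precomputation step
    y = u = 1
    for j in range(h - 1, 0, -1):
        for i, d in enumerate(digits):
            if d == j:
                u = u * xs[i]       # collect the x_i whose digit equals j
        y = y * u                   # u is folded into y once per round
    return y
```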
Euclidean method
The Euclidean method was first introduced in Efficient Exponentiation Using Precomputation and Vector Addition Chains by P. de Rooij.
This method for computing x^n in group G, where n is a natural integer, whose algorithm is given below, is using the following equality recursively:
x_0^{n_0} · x_1^{n_1} = (x_0 · x_1^q)^{n_0} · x_1^{n_1 mod n_0},
where q = ⌊n_1 / n_0⌋. In other words, a Euclidean division of the exponent n_1 by n_0 is used to return a quotient q and a remainder n_1 mod n_0.
Given the base element x in group G, and the exponent n written as in Yao's method, the element x^n is calculated using l precomputed values x^{b_0}, ..., x^{b_{l−1}} and then the algorithm below.
Begin loop
    Find M, such that n_M ≥ n_i for all i.
    Find N ≠ M, such that n_N ≥ n_i for all i ≠ M.
    Break loop if n_N = 0.
    Let q = ⌊n_M / n_N⌋, and then let n_M = n_M mod n_N.
    Compute recursively x_M^q, and then let x_N = x_N · x_M^q.
End loop
Return x_M^{n_M}.
The algorithm first finds the largest value among the n_i and then the largest value within the set { n_i | i ≠ M }. Then it raises x_M to the power q, multiplies this value with x_N, and then assigns x_N the result of this computation and n_M the value n_M modulo n_N.
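A Python sketch of this loop, combined with a Yao-style digit decomposition of the exponent (an illustration; x_M^q is taken with the built-in power operator where the method would recurse):

```python
def euclidean_pow(x, n, h=4):
    """Fixed-base exponentiation via the Euclidean method: keep pairs
    (x_i, n_i) with x_i = x^(h^i) and n = sum of n_i * h^i, then repeatedly
    fold the largest exponent onto the second largest."""
    xs, ns = [], []
    base = x
    while n > 0:
        xs.append(base)
        ns.append(n % h)       # digit n_i, least significant first
        base = base ** h       # next table entry; by repeated squaring in practice
        n //= h
    if not ns:
        return 1
    while True:
        M = max(range(len(ns)), key=lambda i: ns[i])   # largest exponent
        rest = [i for i in range(len(ns)) if i != M]
        if not rest:
            break
        N = max(rest, key=lambda i: ns[i])             # second largest
        if ns[N] == 0:
            break
        q = ns[M] // ns[N]
        xs[N] = xs[N] * xs[M] ** q   # x_N := x_N * x_M^q (recursive step)
        ns[M] %= ns[N]
    M = max(range(len(ns)), key=lambda i: ns[i])
    return xs[M] ** ns[M]
```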
Further applications
The approach also works with semigroups that are not of characteristic zero, for example allowing fast computation of large exponents modulo a number. Especially in cryptography, it is useful to compute powers in a ring of integers modulo q. For example, the evaluation of
- 13789^722341 (mod 2345) = 2029
would take a very long time and much storage space if the naïve method of computing 13789^722341 and then taking the remainder when divided by 2345 were used. Even using a more effective method will take a long time: square 13789, take the remainder when divided by 2345, multiply the result by 13789, and so on.
Applying the above exp-by-squaring algorithm, with "*" interpreted as x * y = xy mod 2345 (that is, a multiplication followed by a division with remainder) leads to only 27 multiplications and divisions of integers, which may all be stored in a single machine word. Generally, any of these approaches will take fewer than 2 log2(722340) ≤ 40 modular multiplications.
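A sketch of modular square-and-multiply in Python; reducing after every product keeps each intermediate below m^2:

```python
def mod_exp(x, n, m):
    """Right-to-left square-and-multiply with a reduction after every
    product, so every intermediate value stays below m * m."""
    result = 1
    x %= m
    while n > 0:
        if n % 2 == 1:
            result = (result * x) % m
        x = (x * x) % m
        n //= 2
    return result
```

mod_exp(13789, 722341, 2345) reproduces the example above; Python's built-in pow(13789, 722341, 2345) computes the same value.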
The approach can also be used to compute integer powers in a group, using either of the rules
- Power(x, −n) = Power(x^−1, n),
- Power(x, −n) = (Power(x, n))^−1.
The approach also works in non-commutative semigroups and is often used to compute powers of matrices.
More generally, the approach works with positive integer exponents in every magma for which the binary operation is power associative.
Signed-digit recoding
In certain computations it may be more efficient to allow negative coefficients and hence use the inverse of the base, provided inversion in G is "fast" or has been precomputed. For example, when computing x^{2^k−1}, the binary method requires k − 1 multiplications and k − 1 squarings. However, one could perform k squarings to get x^{2^k} and then multiply by x^{−1} to obtain x^{2^k−1}.
To this end we define the signed-digit representation of an integer n in radix b as
n = Σ_{i=0}^{l−1} n_i b^i with |n_i| < b.
Signed binary representation corresponds to the particular choice b = 2 and n_i ∈ {−1, 0, 1}. It is denoted by (n_{l−1} ... n_0)_s. There are several methods for computing this representation. The representation is not unique. For example, take n = 478: two distinct signed-binary representations are given by (0 1 1 1 0 1 1 1 1 0)_s and (1 0 0 0 1̄ 0 0 0 1̄ 0)_s, where 1̄ is used to denote −1. Since the binary method computes a multiplication for every non-zero entry in the base-2 representation of n, we are interested in finding the signed-binary representation with the smallest number of non-zero entries, that is, the one with minimal Hamming weight. One method of doing this is to compute the representation in non-adjacent form, or NAF for short, which is one that satisfies n_i n_{i+1} = 0 for all i ≥ 0 and is denoted by (n_{l−1} ... n_0)_NAF. For example, the NAF representation of 478 is (1 0 0 0 1̄ 0 0 0 1̄ 0)_NAF, that is, 478 = 512 − 32 − 2. This representation always has minimal Hamming weight. A simple algorithm to compute the NAF representation of a given integer n = (n_l n_{l−1} ... n_0)_2 with n_l = n_{l−1} = 0 is the following:
c_0 = 0
for i = 0 to l − 1 do
    c_{i+1} = ⌊(c_i + n_i + n_{i+1}) / 2⌋
    n'_i = c_i + n_i − 2c_{i+1}
return (n'_{l−1} ... n'_0)_NAF
Another algorithm by Koyama and Tsuruoka does not require the condition that n_l = n_{l−1} = 0; it still minimizes the Hamming weight.
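The NAF recoding and the matching signed-digit exponentiation can be sketched in Python (digits are stored least significant first; the −1 digits use the inverse of the base, so a float base is used in the example):

```python
def naf(n):
    """Non-adjacent form of n: digits in {-1, 0, 1}, least significant first."""
    digits = []
    while n > 0:
        if n % 2 == 1:
            d = 2 - (n % 4)   # 1 if n % 4 == 1, -1 if n % 4 == 3
        else:
            d = 0
        digits.append(d)
        n = (n - d) // 2
    return digits

def naf_pow(x, n):
    """Left-to-right square-and-multiply over the NAF digits of n,
    multiplying by 1/x wherever the digit is -1."""
    x_inv = 1 / x
    y = 1
    for d in reversed(naf(n)):
        y = y * y
        if d == 1:
            y = y * x
        elif d == -1:
            y = y * x_inv
    return y
```

naf(478) yields the digits of (1 0 0 0 1̄ 0 0 0 1̄ 0)_NAF, i.e. 512 − 32 − 2.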
Alternatives and generalizations
Exponentiation by squaring can be viewed as a suboptimal addition-chain exponentiation algorithm: it computes the exponent by an addition chain consisting of repeated exponent doublings (squarings) and/or incrementing exponents by one (multiplying by x) only. More generally, if one allows any previously computed exponents to be summed (by multiplying those powers of x), one can sometimes perform the exponentiation using fewer multiplications (but typically using more memory). The smallest power where this occurs is for n = 15:
- x^15 = x × (x × [x × x^2]^2)^2 (squaring, 6 multiplies),
- x^15 = x^3 × ([x^3]^2)^2 (optimal addition chain, 5 multiplies if x^3 is re-used).
In general, finding the optimal addition chain for a given exponent is a hard problem, for which no efficient algorithms are known, so optimal chains are typically used for small exponents only (e.g. in compilers where the chains for small powers have been pre-tabulated). However, there are a number of heuristic algorithms that, while not being optimal, have fewer multiplications than exponentiation by squaring at the cost of additional bookkeeping work and memory usage. Regardless, the number of multiplications never grows more slowly than Θ(log n), so these algorithms improve asymptotically upon exponentiation by squaring by only a constant factor at best.
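The two chains for x^15 can be checked by writing out the multiplications (an illustrative script; the base 7 is arbitrary):

```python
x = 7  # arbitrary base

# Square-and-multiply chain for x^15: 6 multiplications.
x2 = x * x            # x^2
x3 = x2 * x           # x^3
x6 = x3 * x3          # x^6
x7 = x6 * x           # x^7
x14 = x7 * x7         # x^14
x15_square = x14 * x  # x^15

# Addition chain re-using x^3: 5 multiplications.
y2 = x * x            # x^2
y3 = y2 * x           # x^3
y6 = y3 * y3          # x^6
y12 = y6 * y6         # x^12
x15_chain = y12 * y3  # x^15
```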
Notes
- ^ In this line, the loop finds the longest string of length less than or equal to k ending in a non-zero value. Not all odd powers x^u for u up to 2^k − 1 need be computed, and only those specifically involved in the computation need be considered.
References
- ^ a b Cohen, H.; Frey, G., eds. (2006). Handbook of Elliptic and Hyperelliptic Curve Cryptography. Discrete Mathematics and Its Applications. Chapman & Hall/CRC. ISBN 9781584885184.
- ^ Montgomery, Peter L. (1987). "Speeding the Pollard and Elliptic Curve Methods of Factorization" (PDF). Math. Comput. 48 (177): 243–264. doi:10.1090/S0025-5718-1987-0866113-7.
- ^ Gueron, Shay (5 April 2012). "Efficient software implementations of modular exponentiation" (PDF). Journal of Cryptographic Engineering. 2 (1): 31–43. doi:10.1007/s13389-012-0031-5. S2CID 7629541.
Core Algorithms
Recursive Binary Exponentiation
Recursive binary exponentiation is a divide-and-conquer method for computing x^n, where x is the base and n is a non-negative integer exponent. The algorithm recursively computes x^⌊n/2⌋, squares this value to obtain x^{2⌊n/2⌋}, and then multiplies by an additional factor of x if n is odd to account for the least significant bit in the binary representation of n. This approach exploits the identity x^n = (x^⌊n/2⌋)^2 · x^{n mod 2} to break down the problem into smaller subproblems.[4] The recursive structure is captured in the following pseudocode, which handles base cases for n = 0 (returning the multiplicative identity 1) and n = 1 (returning x):
function power(x, n):
if n == 0:
return 1
if n == 1:
return x
half = power(x, n // 2)
result = half * half
if n % 2 == 1:
result = result * x
return result
Tracing power(5, 13):
- power(5, 13): Since 13 is odd, compute half = power(5, 6), result = half * half, then result = result * 5.
- power(5, 6): Even, so compute half = power(5, 3), result = half * half.
- power(5, 3): Odd, so compute half = power(5, 1), result = half * half = 5 * 5 = 25, then result = 25 * 5 = 125.
- power(5, 1): Base case, returns 5.
- Thus, power(5, 3) = 125.
- Then, power(5, 6) = 125 * 125 = 15,625.
- Finally, power(5, 13): half = 15,625, result = 15,625 * 15,625 = 244,140,625, then result = 244,140,625 * 5 = 1,220,703,125.
Iterative Binary Exponentiation
The iterative binary exponentiation algorithm provides a space-efficient method to compute x^n by processing the binary representation of the exponent from least significant bit (LSB) to most significant bit (MSB), avoiding the recursion depth limitations of recursive implementations.[7] In the right-to-left variant, the process initializes the result to 1 and iteratively squares the current base while conditionally multiplying it into the result based on whether the current bit of the exponent is set, effectively building the power through repeated squaring and selective accumulation.[7] The following pseudocode illustrates the right-to-left approach:
function iterative_pow(base, exponent):
result = 1
while exponent > 0:
if exponent % 2 == 1:
result = result * base
base = base * base
exponent = exponent // 2
return result
Tracing iterative_pow(5, 13), where 13 = 1101 in binary:
- Iteration 1: exponent = 13 (odd), result = 1 × 5 = 5; base = 5 × 5 = 25; exponent = 6.
- Iteration 2: exponent = 6 (even), result = 5; base = 25 × 25 = 625; exponent = 3.
- Iteration 3: exponent = 3 (odd), result = 5 × 625 = 3125; base = 625 × 625 = 390625; exponent = 1.
- Iteration 4: exponent = 1 (odd), result = 3125 × 390625 = 1,220,703,125; base = 390625 × 390625 (unused); exponent = 0.
A left-to-right variant instead processes the bits from MSB to LSB, squaring the accumulated result once per bit:
function left_to_right_pow(base, exponent):
# Assume exponent binary representation has t+1 bits, e_t ... e_0
result = 1
for i from t downto 0:
result = result * result
if bit i of exponent == 1:
result = result * base
return result
Efficiency Analysis
Time and Space Complexity
The binary exponentiation algorithms, both recursive and iterative, exhibit a time complexity of O(log n) in terms of the number of multiplications required to compute x^n, where n is the exponent. This logarithmic scaling arises because the exponent is halved in each step, leading to at most ⌊log2 n⌋ squarings and a number of additional multiplications equal to the Hamming weight of n minus one. The Hamming weight, or population count ν(n), is the number of 1-bits in the binary representation of n and averages approximately (log2 n)/2 for random exponents. Thus, the total number of multiplications is approximately 1.5 log2 n on average.

For large integers without modular reduction, each squaring or multiplication of two d-digit operands incurs an additional cost, e.g. O(d^2) using the schoolbook multiplication method, and the number of digits roughly doubles with each squaring. In the context of modular exponentiation, where all operations are performed modulo m, the intermediate values remain bounded by the bit length of m, preserving the multiplication count while each modular multiplication costs O((log m)^2) with schoolbook arithmetic; the overall complexity thus remains logarithmic in n. This represents a significant improvement over the naive repeated-multiplication approach, which requires n − 1 multiplications.

Regarding space complexity, the iterative version of binary exponentiation uses O(1) auxiliary space, relying only on a constant number of variables to track the result, current power, and exponent. In contrast, the recursive version requires O(log n) space due to the call stack, which matches the recursion depth of about log2 n.

Comparison to Naive Multiplication
The naive method for computing x^n involves repeatedly multiplying x by itself, resulting in n − 1 multiplications. This approach is straightforward but becomes highly inefficient as n grows large, since the number of operations scales linearly with the exponent. In contrast, binary exponentiation reduces the operation count to approximately ⌊log2 n⌋ + ν(n) − 1 multiplications, where ν(n) is the Hamming weight (number of 1-bits) in the binary representation of n. For example, with n = 1000 (binary 1111101000, ⌊log2 n⌋ = 9, ν(n) = 6), the naive method requires 999 multiplications, while binary exponentiation needs only 14 (9 squarings + 5 extra multiplications). This logarithmic scaling provides substantial efficiency gains for moderate to large exponents.

To illustrate the difference more clearly, consider the following table comparing operation counts for powers of 2, where binary exponentiation consists solely of squarings due to the single 1-bit in the exponent:

| Exponent | Naive Multiplications (n − 1) | Binary Multiplications (log2 n) | Relative Speedup |
|---|---|---|---|
| 2^10 = 1024 | 1023 | 10 | 102.3× |
| 2^20 = 1,048,576 | 1,048,575 | 20 | 52,428.75× |
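The counts above follow from a short counting function (a sketch; popcount is taken via bin(n).count):

```python
def binary_mult_count(n):
    """Total multiplications used by binary exponentiation for n >= 1:
    floor(log2 n) squarings plus popcount(n) - 1 extra multiplications."""
    return (n.bit_length() - 1) + (bin(n).count("1") - 1)
```

For example, binary_mult_count(1024) == 10 and binary_mult_count(2**20) == 20, versus 1023 and 1,048,575 naive multiplications; binary_mult_count(1000) == 14 matches the example above.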
Higher-Radix Variants
2^k-ary Exponentiation
2^k-ary exponentiation generalizes binary exponentiation, the special case k = 1, by representing the exponent in base m = 2^k and precomputing the powers x^0, x^1, ..., x^{m−1} to reduce the number of multiplications at the expense of additional precomputation and storage. The method processes groups of k bits at a time, enabling fewer but more expensive operations compared to binary methods.[8][9]

In the algorithm, the exponent n is expressed as n = Σ_{i=0}^{l−1} d_i m^i, where each digit d_i ∈ {0, 1, ..., m−1} and l ≈ log_m n = (log2 n)/k is the number of digits. Precomputation involves calculating the table of powers x^j for j = 1 to m − 1, typically using successive multiplications, costing O(m) operations.[8] The main computation uses a left-to-right approach: initialize the result R = x^{d_{l−1}}, then for each subsequent digit d_i from i = l − 2 down to 0, update R ← R^m · x^{d_i}, where R^m is computed via k successive squarings.[10] This requires approximately log2 n squarings in total, independent of k, and up to l multiplications for the digit terms, yielding O((log2 n)/k) extra multiplications.[8]

For k = 2 (quaternary exponentiation, m = 4), the binary digits of the exponent are grouped into 2-bit chunks, with precomputed values x^0 = 1, x^1 = x, x^2, and x^3. Each group corresponds to a digit d_i ∈ {0, 1, 2, 3}, and R^m = R^4 is two squarings: (R^2)^2. The precomputation costs 2 multiplications (x^2 = x·x, then x^3 = x^2·x).[8]

Consider computing 3^25 with k = 2. First, 25 = (121)_4 since 1·4^2 + 2·4^1 + 1·4^0 = 16 + 8 + 1 = 25, with digits d_2 = 1, d_1 = 2, d_0 = 1. Precompute p_0 = 1, p_1 = 3, p_2 = 9, p_3 = 27. Initialize R = p_1 = 3 (for 3^1). For d_1 = 2: compute R^4 = (3^2)^2 = 9^2 = 81, then R = 81·p_2 = 81·9 = 729 (now 3^{4+2} = 3^6). For d_0 = 1: R^4 = (729^2)^2 = 531441^2 = 282429536481, then R = 282429536481·p_1 = 282429536481·3 = 847288609443 (3^{24+1} = 3^{25}). This uses 4 squarings and 2 digit multiplications, plus the precomputation.[10]

The trade-off involves fewer multiplications, scaling as O((log2 n)/k), but higher precomputation cost O(2^k) and potentially costlier multiplications by larger precomputed powers, with the number of extra multiplications depending on the digit distribution (typically l on average if multiplying even for d_i = 0).[9] For typical hardware and exponent sizes around 1024-2048 bits, an optimal k is around 3 to 5, balancing reduced iterations against precomputation overhead and multiplication complexity.[10]

Sliding-Window Exponentiation
Sliding-window exponentiation is an optimization of binary exponentiation that processes the exponent in windows of up to k bits, reducing the total number of multiplications by combining lookahead with precomputation of small odd powers of the base x. The method scans the exponent bits from the most significant bit (left-to-right), starting a window at each '1' bit and looking ahead up to k bits to form the window value, which allows runs of zeros to be handled by squarings alone. This trades a small amount of precomputation storage for fewer operations during the main loop, making it suitable for variable exponents in cryptographic applications like modular exponentiation.[11]

The algorithm selects a window size k, typically small, to balance precomputation cost against runtime savings. A lookup table holds x^u for the odd integers u from 1 to 2^k − 1, computed from x and x^2 with one multiplication per entry, about 2^{k−1} multiplications in total. The main computation initializes the result to 1 and processes the exponent by repeatedly finding the next window from the current position: if the current bit is 0, it advances with a single squaring of the result; otherwise it forms a window of at most k bits that both starts and ends with a 1 bit, squares the result once per window bit, and multiplies by the precomputed x^u.[8][11]

For example, consider computing x^26 with k = 2, where 26 in binary is 11010. Precompute the odd powers x and x^3 (via x^2, 2 multiplications). Scanning from the left, the first window is 11, so the result starts as x^3; the next bit is 0, giving one squaring to x^6; the last window is the single bit 1 followed by a trailing 0, giving a squaring to x^12, a multiplication by x to x^13, and a final squaring to x^26. The main loop thus uses 3 squarings and 1 multiplication, and the windows adapt to the bit pattern instead of falling on fixed 2-bit boundaries.[8]

The benefit of sliding-window exponentiation is fewer overall multiplications than plain binary methods: an n-bit exponent still needs about n squarings, but the expected number of window multiplications drops to roughly n/(k+1), since each window is followed on average by a run of zeros. Precomputation cost is about 2^{k−1} multiplications, but the runtime savings dominate for large exponents. Window sizes k = 5 to 6 are typical for 512-bit exponents, balancing table size (16-32 entries) against reported speed gains of 5-8% over m-ary methods, as larger k increases precomputation exponentially while returns diminish. Unlike the non-sliding 2^k-ary method with fixed blocks, sliding windows align themselves with the bit pattern to further reduce multiplications.[11]

Constant-Time Techniques
Montgomery Ladder
The Montgomery ladder is a variant of binary exponentiation designed for constant-time execution, making it resistant to side-channel attacks such as timing and power analysis in cryptographic applications.[12] It maintains two registers, R_0 and R_1, initialized to 1 and the base g respectively, and processes the binary representation of the exponent from the most significant bit (MSB) to the least significant bit (LSB), performing exactly one squaring and one multiplication per bit regardless of the bit value.[12] This uniformity ensures that the algorithm's execution path and operation counts do not leak information about the exponent.[12] The core technique operates in a multiplicative group, where for each bit k_j:
- If k_j = 0, compute R_1 ← R_0 · R_1 followed by R_0 ← R_0 · R_0.
- If k_j = 1, compute R_0 ← R_0 · R_1 followed by R_1 ← R_1 · R_1.
Input: base g, exponent k = (k_{t-1} ... k_0)_2, modulus n
Output: y = g^k mod n
R_0 ← 1 mod n
R_1 ← g mod n
for j = t-1 downto 0 do
if k_j = 0 then
R_1 ← (R_0 * R_1) mod n
R_0 ← (R_0 * R_0) mod n
else
R_0 ← (R_0 * R_1) mod n
R_1 ← (R_1 * R_1) mod n
return R_0
For example, computing g^13 with exponent bits 1101 (so t = 4), starting from R_0 = 1 and R_1 = g:
- Bit 1 (MSB): R_0 ← R_0·R_1 = g, R_1 ← R_1^2 = g^2. Now (R_0, R_1) = (g, g^2).
- Bit 1: R_0 ← g·g^2 = g^3, R_1 ← (g^2)^2 = g^4. Now (g^3, g^4).
- Bit 0: R_1 ← g^3·g^4 = g^7, R_0 ← (g^3)^2 = g^6. Now (g^6, g^7).
- Bit 1 (LSB): R_0 ← g^6·g^7 = g^13, R_1 ← (g^7)^2 = g^14. Now (g^13, g^14).
The invariant R_1 = g·R_0 holds throughout, and R_0 = g^13 is returned.
Signed-Digit Recoding
Signed-digit recoding transforms the binary representation of the exponent into a signed-digit form, such as the non-adjacent form (NAF), to minimize the number of non-zero digits while ensuring no two consecutive digits are non-zero. In NAF, each digit belongs to the set {−1, 0, 1}, with 1̄ often used to denote −1, and the representation of an integer n is n = Σ_i d_i 2^i with d_i·d_{i+1} = 0 for all i. This recoding, originally introduced for efficient binary arithmetic, reduces the average Hamming weight (the number of non-zero digits) from approximately l/2 in standard binary to l/3 in NAF for l-bit exponents, leading to fewer multiplication operations during exponentiation.[15][16]

The NAF can be computed from the binary representation by a deterministic algorithm that scans the bits from least to most significant, propagating a carry to enforce the non-adjacency constraint. One standard method initializes a carry c_0 = 0 and, for each bit position i, computes the next carry c_{i+1} = ⌊(c_i + n_i + n_{i+1})/2⌋ and the recoded digit d_i = c_i + n_i − 2c_{i+1}, where the n_i are the binary digits; the process may extend by one position if a final carry remains. This produces the unique NAF representation, which has minimal weight. Related techniques, such as Booth recoding adapted for signed digits, follow similar principles by examining overlapping bit pairs to generate −1, 0, or 1, though NAF specifically guarantees non-adjacency. For example, the exponent 13 in binary is 1101, which recodes to 1 0 1̄ 0 1 (digits from MSB: 1, 0, −1, 0, 1), satisfying 16 − 4 + 1 = 13.[16][17]

In the context of exponentiation by squaring, the recoded exponent enables a left-to-right algorithm where the result is iteratively squared and multiplied by the base x (or by x^−1 for −1 digits) only at non-zero positions, with x^−1 precomputed once if needed. This is most attractive when inversion in G is cheap, notably in elliptic-curve groups, where negating a point is essentially free, so a negative digit costs no more than a positive one.

This approach yields roughly 30% fewer base multiplications than the binary method, as the reduced Hamming weight lowers their number from about l/2 to l/3 for an l-bit exponent, making it particularly advantageous for hardware implementations with left-to-right scanning. Balanced ternary, another signed-digit system, using radix 3 with digits −1, 0, 1, offers similar sparsity benefits but requires adjustments for higher-radix squaring. The non-adjacent property can also support more uniform operation patterns, which helps against some side-channel attacks.[17][16]

Fixed-Base Optimizations
Yao's Method
Yao's method, proposed by Andrew Yao in 1976, provides an efficient approach for fixed-base exponentiation, particularly beneficial when the same base is used to compute for multiple different exponents . The technique relies on representing the exponent in a higher radix , precomputing a table of powers for , where , and then assembling the result through grouped multiplications based on the digits of . This amortizes the precomputation cost across multiple evaluations, making it suitable for applications such as batch verification in cryptographic protocols.[16] The algorithm proceeds as follows. First, during precomputation, compute the table entries successively: set , and for , compute using group multiplications per step, yielding a total precomputation cost of multiplications and storage, where . For a given exponent , express with digits . The power is then given by In the evaluation phase, initialize . For each down to , if there exists at least one with , compute using multiplications, where is the number of such ; then compute via binary exponentiation in multiplications and update . The total evaluation cost is , dominated by the digit products and powerings.[16] To illustrate, consider computing with (so ) and precomputed table , , , , , . The base-4 digits of 2989 are , or .- For : Positions , so (1 multiplication); compute (using 1 squaring and 1 multiplication via binary exponentiation); set .
- For : Positions , so (2 multiplications); compute (1 squaring); set .
- For : Position , so (0 multiplications); compute (0 operations); set .
Euclidean Method
The Euclidean method provides an efficient approach to exponentiation by applying the Euclidean algorithm to the exponents, decomposing the problem through division and modulo operations. For bases x_0, x_1 and exponents n_0 ≥ n_1, the product x_0^{n_0} · x_1^{n_1} is computed as (x_0^q · x_1)^{n_1} · x_0^r, where q = ⌊n_0 / n_1⌋ and r = n_0 mod n_1, with the exponentiation to the power n_1 handled recursively and the small power x_0^q obtained by squaring. This decomposition extends the core idea of exponentiation by squaring by exploiting exponent quotients and remainders, shrinking the exponents at each step exactly as the Euclidean algorithm does for GCD computation.[18] The algorithm maintains a set of current group elements and corresponding exponents, initially derived from the representation of the target exponent, and repeatedly applies the Euclidean step to the largest remaining exponents, recursing until a base case such as n_1 = 0 or n_1 = 1 is reached. In practice, for a single target exponent n, the process starts by splitting n into two parts, for example n = n_0 + n_1 with x^n = x^{n_0} · x^{n_1}, and then recurses on quotients and remainders, building the result through a chain of multiplications and squarings. This avoids exhaustive bit-by-bit scanning and instead reduces the exponents rapidly via division. The method is related to Yao's table-based approach for fixed bases, but relies on recursive reduction rather than extensive precomputed tables.[8] As a small example, consider x^15 written as x^8 · x^7. Applying the Euclidean step to the exponent pair (8, 7) gives q = 1 and r = 1, so x^8 · x^7 = (x · x)^7 · x = (x^2)^7 · x. Writing y = x^2, the power y^7 = y^4 · y^3 reduces the same way: the pair (4, 3) gives q = 1, r = 1, so y^4 · y^3 = (y^2)^3 · y, and (y^2)^3 is finished with one squaring and one multiplication. The whole chain uses 3 squarings and 3 multiplications, matching binary exponentiation for this small case; the method's advantage appears for products of several powers and for exponents with favorable quotient structure.
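The recursion above can be sketched as follows, assuming multiplication modulo mod; dual_pow is a hypothetical name for the two-base primitive, and the built-in pow stands in for the squaring chain applied to the small quotient power:

```python
def dual_pow(x0, n0, x1, n1, mod):
    """Compute x0^n0 * x1^n1 (mod mod) via the Euclidean identity
    x0^n0 * x1^n1 = (x0^q * x1)^n1 * x0^r,
    where q = n0 // n1 and r = n0 % n1.

    The recursion on the exponent pair mirrors the Euclidean
    algorithm, so the exponents shrink rapidly.
    """
    if n0 < n1:
        x0, n0, x1, n1 = x1, n1, x0, n0  # keep n0 >= n1
    if n1 == 0:
        return pow(x0, n0, mod)          # base case: plain squaring
    q, r = divmod(n0, n1)
    return dual_pow(pow(x0, q, mod) * x1 % mod, n1, x0, r, mod)

# The worked example: x^15 split as x^8 * x^7, with x = 3 mod 1009.
assert dual_pow(3, 8, 3, 7, 1009) == pow(3, 15, 1009)
```

The same primitive directly handles products of two distinct powers, the setting in which the method outperforms separate binary exponentiations.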
Deeper recursion makes the exponents shrink at the rate of the Euclidean algorithm.[18] The efficiency stems from the fact that the number of multiplications tracks the number of steps of the Euclidean algorithm applied to the exponents, yielding O(log n) operations overall, though the constant factor depends on the exponent structure: smaller constants are observed for Fibonacci-like exponents, whose recurrence aligns closely with the decomposition steps and enables near-optimal addition chains. For general exponents the method performs comparably to binary exponentiation, but it excels when exponents share common factors or admit quick modulo reductions. The approach draws on the Euclidean algorithm for constructing exponent chains, with its standard formulation in de Rooij's 1995 work.[8][10]

Extensions and Applications
Further Uses in Computing
Exponentiation by squaring serves as the core algorithm for modular exponentiation in public-key cryptography, particularly in RSA encryption and decryption, where computing m^e mod n or c^d mod n requires efficient handling of large exponents. The method reduces the cost from O(e) to O(log e) modular multiplications, making RSA practical for 2048-bit and larger keys.[19] Similarly, in the Diffie–Hellman key exchange, the parties compute shared secrets via g^a mod p and g^b mod p, relying on binary exponentiation to perform these operations efficiently over large primes.[20] Software libraries commonly implement the technique for cryptographic primitives; for instance, Python's built-in pow(base, exp, mod) function uses binary exponentiation in its long-integer arithmetic to compute modular powers efficiently.
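A minimal sketch of the underlying square-and-multiply loop, with a toy RSA round trip; the tiny modulus is purely illustrative (real deployments use 2048-bit or larger keys and constant-time library routines):

```python
def mod_pow(base, exp, mod):
    """Right-to-left square-and-multiply; equivalent to Python's
    built-in pow(base, exp, mod) for non-negative exponents."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:                  # multiply when the current bit is 1
            result = result * base % mod
        base = base * base % mod     # square for the next bit
        exp >>= 1
    return result

# Toy RSA round trip: n = 61 * 53, e * d ≡ 1 (mod φ(n)).
n, e, d = 3233, 17, 2753
m = 65
c = mod_pow(m, e, n)                 # encrypt
assert mod_pow(c, d, n) == m         # decrypt recovers the message
```

Only about 2·log₂(exp) modular multiplications are performed, which is what makes the large exponents of RSA tractable.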
In scientific computing, exponentiation by squaring extends to matrices, enabling efficient computation of powers of transition matrices in Markov chains to determine long-term state probabilities. For a transition matrix P, the n-step transition probabilities are given by P^n, which can be calculated with O(log n) matrix multiplications by repeated squaring, avoiding the prohibitive cost of n − 1 naive multiplications for large n. This approach is particularly valuable in simulations of stochastic processes, such as population dynamics or queueing models.[21]
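A sketch with plain Python lists, assuming a small two-state chain; mat_mul and mat_pow are illustrative helper names, and the transition matrix is invented for the example:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """P**n by repeated squaring: O(log n) matrix multiplications."""
    size = len(P)
    result = [[float(i == j) for j in range(size)] for i in range(size)]  # identity
    while n > 0:
        if n & 1:
            result = mat_mul(result, P)
        P = mat_mul(P, P)
        n >>= 1
    return result

# Two-state Markov chain; rows of P^n approach the stationary
# distribution (5/6, 1/6) as n grows.
P = [[0.9, 0.1],
     [0.5, 0.5]]
P64 = mat_pow(P, 64)
assert abs(P64[0][0] - 5/6) < 1e-9
```

Computing P^64 here takes 6 squarings instead of 63 sequential multiplications, and the same routine applies unchanged to larger chains.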
Applications in computer graphics leverage exponentiation by squaring for rapid evaluation of power functions in rendering pipelines, including the specular term in shading models such as Phong, where high shininess exponents (e.g., 100 or more) are common, and in procedural generation of fractals that require iterated power computations.[22]
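As a sketch of why the technique helps here: an integer shininess exponent needs only about log₂ n squarings rather than n − 1 sequential multiplications. The helper ipow below is hypothetical, of the kind a software rasterizer might inline, and is not taken from any particular rendering API:

```python
def ipow(x, n):
    """x**n for a non-negative integer exponent by squaring,
    e.g. a Phong-style specular term (cos angle)**shininess."""
    result = 1.0
    while n > 0:
        if n & 1:
            result *= x
        x *= x
        n >>= 1
    return result

# Shininess 100 costs only a handful of squarings instead of
# 99 sequential multiplications.
spec = ipow(0.95, 100)
assert abs(spec - 0.95**100) < 1e-12
```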
Hardware implementations integrate exponentiation by squaring into processors and accelerators for cryptographic workloads. On x86 architectures, libraries such as Intel IPP Cryptography provide optimized modular exponentiation routines using binary methods tailored to instruction sets like AVX for RSA and elliptic curve operations.[23] In field-programmable gate arrays (FPGAs), it forms the basis of dedicated crypto accelerators, such as those for RSA modular exponentiation via Montgomery reduction, achieving high throughput for secure communications.[24]
Post-2020 advancements in post-quantum cryptography, including NIST-standardized lattice-based schemes such as CRYSTALS-Kyber (standardized as ML-KEM in FIPS 203, August 2024), employ the Number Theoretic Transform (NTT) with precomputed twiddle factors, i.e. powers of a root of unity obtained by modular exponentiation; the NTT underpins efficient polynomial multiplication during key generation and encapsulation, significantly reducing computation time compared to schoolbook multiplication.[25]
