Exponentiation by squaring
from Wikipedia

In mathematics and computer programming, exponentiating by squaring is a general method for fast computation of large positive integer powers of a number, or more generally of an element of a semigroup, like a polynomial or a square matrix. Some variants are commonly referred to as square-and-multiply algorithms or binary exponentiation. These can be of quite general use, for example in modular arithmetic or powering of matrices. For semigroups for which additive notation is commonly used, like elliptic curves used in cryptography, this method is also referred to as double-and-add.

Basic method

Recursive version

The method is based on the observation that, for any integer n > 0, one has

    x^n = (x^2)^(n/2)            if n is even
    x^n = x · (x^2)^((n−1)/2)    if n is odd

If the exponent n is zero, then the answer is 1. If the exponent is negative then we can reuse the previous formula by rewriting the value using a positive exponent. That is,

    x^n = (1/x)^(−n)
Together, these may be implemented directly as the following recursive algorithm:

Inputs: a real number x; an integer n
Output: x^n

function exp_by_squaring(x, n) is
    if n < 0 then
        return exp_by_squaring(1 / x, −n)
    else if n = 0 then
        return 1
    else if n is even then
        return exp_by_squaring(x × x, n / 2)
    else if n is odd then
        return x × exp_by_squaring(x × x, (n − 1) / 2)
end function

In each recursive call, the least significant digit of the binary representation of n is removed. It follows that the number of recursive calls is the number of bits of the binary representation of n. So this algorithm computes this number of squarings and a smaller number of multiplications, equal to the number of 1s in the binary representation of n. This logarithmic number of operations is to be compared with the trivial algorithm, which requires n − 1 multiplications.

This algorithm is not tail-recursive. This implies that it requires an amount of auxiliary memory roughly proportional to the number of recursive calls, or more if the numbers involved grow at each step.

The algorithms of the next section use a different approach; they need the same number of operations, but use auxiliary memory that is roughly the same as the memory required to store the result.
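The recursive pseudocode above translates almost line-for-line into Python; this sketch returns exact integers for non-negative exponents and floats for negative ones:

```python
def exp_by_squaring(x, n):
    """Recursive square-and-multiply computation of x**n."""
    if n < 0:
        return exp_by_squaring(1 / x, -n)   # x^n = (1/x)^(-n)
    if n == 0:
        return 1
    if n % 2 == 0:
        return exp_by_squaring(x * x, n // 2)
    # n odd: peel off one factor of x, then recurse on the even part
    return x * exp_by_squaring(x * x, (n - 1) // 2)
```

For example, exp_by_squaring(5, 13) returns 1220703125 using a number of multiplications proportional to the number of bits of 13, rather than 12 repeated multiplications.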

With constant auxiliary memory

The variants described in this section are based on the formula

    y · x^n = y · (x^2)^(n/2)              if n is even
    y · x^n = (y · x) · (x^2)^((n−1)/2)    if n is odd

If one applies this formula recursively, starting with y = 1, one eventually gets an exponent equal to 0, and the desired result is then the left factor.

This may be implemented as a tail-recursive function:

Function exp_by_squaring(x, n)
    return exp_by_squaring2(1, x, n)

Function exp_by_squaring2(y, x, n)
    if n < 0 then return exp_by_squaring2(y, 1 / x, -n);
    else if n = 0 then return y;
    else if n is even then return exp_by_squaring2(y, x * x, n / 2);
    else if n is odd then return exp_by_squaring2(x * y, x * x, (n - 1) / 2).

The iterative version of the algorithm also uses a bounded auxiliary space, and is given by

Function exp_by_squaring_iterative(x, n)
    if n < 0 then
        x := 1 / x;
        n := -n;
    if n = 0 then return 1
    y := 1;
    while n > 1 do
        if n is odd then
            y := x * y;
            n := n - 1;
        x := x * x;
        n := n / 2;
    return x * y

The correctness of the algorithm results from the fact that y · x^n is invariant during the computation; it is equal to the desired power x^n at the beginning (where y = 1), and it is equal to x · y at the end (where n = 1).

These algorithms use exactly the same number of operations as the algorithm of the preceding section, but the multiplications are done in a different order.
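The iterative pseudocode above can be transcribed directly into Python; as in the pseudocode, only the three variables x, n, and y are needed:

```python
def exp_by_squaring_iterative(x, n):
    """Iterative square-and-multiply with constant auxiliary memory."""
    if n < 0:
        x, n = 1 / x, -n
    if n == 0:
        return 1
    y = 1                      # accumulates the odd-bit factors
    while n > 1:
        if n % 2 == 1:
            y = x * y
            n -= 1
        x = x * x
        n //= 2
    return x * y               # invariant: answer = y * x^n
```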

Computational complexity

A brief analysis shows that such an algorithm uses ⌊log₂ n⌋ squarings and at most ⌊log₂ n⌋ multiplications, where ⌊ ⌋ denotes the floor function. More precisely, the number of multiplications is one less than the number of ones present in the binary expansion of n. For n greater than about 4 this is computationally more efficient than naively multiplying the base with itself repeatedly.

Each squaring results in approximately double the number of digits of the previous, and so, if multiplication of two d-digit numbers is implemented in O(d^k) operations for some fixed k, then the complexity of computing x^n is given by

    O( Σ_{i=0}^{O(log n)} (2^i log x)^k ) = O( (n log x)^k )

2^k-ary method

This algorithm calculates the value of x^n after expanding the exponent in base 2^k. It was first proposed by Brauer in 1939. In the algorithm below we make use of the function f with f(0) = (k, 0) and f(m) = (s, u), where m = u · 2^s with u odd.

Algorithm:

Input
An element x of G, a parameter k > 0, a non-negative integer n = (n_{l−1}, n_{l−2}, ..., n_0)_{2^k} and the precomputed values x^2, x^3, ..., x^(2^k − 1).
Output
The element xn in G
y := 1; i := l - 1
while i ≥ 0 do
    (s, u) := f(ni)
    for j := 1 to k - s do
        y := y^2
    y := y * x^u
    for j := 1 to s do
        y := y^2
    i := i - 1
return y

For optimal efficiency, k should be the smallest integer satisfying[1]

    log₂ n < k (k + 1) · 2^(2k) / (2^(k+1) − k − 2) + 1
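A Python sketch of the 2^k-ary method as described, including the digit decomposition f(m) = (s, u) with m = u · 2^s; the table of odd powers stands in for the precomputed values, and the function and variable names are illustrative:

```python
def exp_2k_ary(x, n, k):
    """2^k-ary exponentiation: scan the exponent in base-2^k digits,
    using precomputed odd powers x^1, x^3, ..., x^(2^k - 1)."""
    assert n >= 0 and k > 0
    # Precompute x^u for odd u < 2^k (x^0 = 1 covers the zero digit).
    table = {0: 1, 1: x}
    x2 = x * x
    for u in range(3, 2 ** k, 2):
        table[u] = table[u - 2] * x2

    if n == 0:
        return 1
    # Digits of n in base 2^k, most significant first.
    digits = []
    m = n
    while m:
        digits.append(m % (2 ** k))
        m //= 2 ** k
    digits.reverse()

    def f(m):
        # Write m = u * 2^s with u odd; f(0) = (k, 0) by convention.
        if m == 0:
            return (k, 0)
        s = 0
        while m % 2 == 0:
            m //= 2
            s += 1
        return (s, m)

    y = 1
    for d in digits:
        s, u = f(d)
        for _ in range(k - s):   # square past the odd part of the digit
            y = y * y
        y = y * table[u]
        for _ in range(s):       # square past the trailing zeros
            y = y * y
    return y
```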

Sliding-window method

This method is an efficient variant of the 2^k-ary method. For example, to calculate the exponent 398, which has binary expansion (110 001 110)₂, we take a window of length 3 using the 2^k-ary method algorithm and calculate 1, x^3, x^6, x^12, x^24, x^48, x^49, x^98, x^99, x^198, x^199, x^398. But we can also compute 1, x^3, x^6, x^12, x^24, x^48, x^96, x^192, x^199, x^398, which saves one multiplication and amounts to evaluating (110 001 110)₂.

Here is the general algorithm:

Algorithm:

Input
An element x of G, a non-negative integer n = (n_{l−1}, n_{l−2}, ..., n_0)₂, a parameter k > 0 and the precomputed values x^3, x^5, ..., x^(2^k − 1).
Output
The element x^n ∈ G.

y := 1; i := l - 1
while i > -1 do
    if ni = 0 then
        y := y^2
        i := i - 1
    else
        s := max{i - k + 1, 0}
        while ns = 0 do
            s := s + 1[notes 1]
        for h := 1 to i - s + 1 do
            y := y^2
        u := (ni, ni-1, ..., ns)2
        y := y * x^u
        i := s - 1
return y
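A Python sketch of the sliding-window method: windows are chosen to end on a 1 bit, so only the odd powers x, x^3, ..., x^(2^k − 1) need to be precomputed (names are illustrative):

```python
def exp_sliding_window(x, n, k):
    """Left-to-right sliding-window exponentiation, window width k.
    Precomputes the odd powers x^1, x^3, ..., x^(2^k - 1)."""
    assert n >= 0 and k > 0
    if n == 0:
        return 1
    table = {1: x}
    x2 = x * x
    for u in range(3, 2 ** k, 2):
        table[u] = table[u - 2] * x2

    bits = bin(n)[2:]          # most significant bit first
    y = 1
    i = 0
    while i < len(bits):
        if bits[i] == '0':
            y = y * y          # a lone zero costs one squaring
            i += 1
        else:
            # Take up to k bits, shrinking until the window ends in a 1.
            j = min(i + k, len(bits))
            while bits[j - 1] == '0':
                j -= 1
            u = int(bits[i:j], 2)      # odd window value
            for _ in range(j - i):
                y = y * y
            y = y * table[u]
            i = j
    return y
```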

Montgomery's ladder technique

Many algorithms for exponentiation do not provide defence against side-channel attacks. Namely, an attacker observing the sequence of squarings and multiplications can (partially) recover the exponent involved in the computation. This is a problem if the exponent should remain secret, as with many public-key cryptosystems. A technique called "Montgomery's ladder"[2] addresses this concern.

Given the binary expansion of a positive, non-zero integer n = (n_{k−1} ... n_0)₂ with n_{k−1} = 1, we can compute x^n as follows:

x1 = x; x2 = x^2
for i = k - 2 to 0 do
    if ni = 0 then
        x2 = x1 * x2; x1 = x1^2
    else
        x1 = x1 * x2; x2 = x2^2
return x1

The algorithm performs a fixed sequence of operations (up to log n): a multiplication and squaring takes place for each bit in the exponent, regardless of the bit's specific value. A similar algorithm for multiplication by doubling exists.

This specific implementation of Montgomery's ladder is not yet protected against cache timing attacks: memory access latencies might still be observable to an attacker, as different variables are accessed depending on the value of bits of the secret exponent. Modern cryptographic implementations use a "scatter" technique to make sure the processor always misses the faster cache.[3]
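In Python the ladder looks as follows; note this is only a sketch of the fixed operation sequence, since an interpreted language offers no real constant-time guarantees:

```python
def montgomery_ladder(x, n):
    """Montgomery ladder: one multiplication and one squaring per
    exponent bit, in a fixed order, regardless of the bit's value."""
    assert n > 0
    bits = bin(n)[2:]          # leading bit is 1 by construction
    x1, x2 = x, x * x
    for b in bits[1:]:         # bits k-2 .. 0
        if b == '0':
            x2 = x1 * x2
            x1 = x1 * x1
        else:
            x1 = x1 * x2
            x2 = x2 * x2
    return x1
```

Throughout the loop the invariant x2 = x1 · x holds, which is what lets both branches perform the same pair of operations.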

Fixed-base exponent

There are several methods which can be employed to calculate xn when the base is fixed and the exponent varies. As one can see, precomputations play a key role in these algorithms.

Yao's method

Yao's method is orthogonal to the 2^k-ary method where the exponent is expanded in radix b = 2^k and the computation is as performed in the algorithm above. Let n, n_i, b, and b_i be integers.

Let the exponent n be written as

    n = Σ_{i=0}^{w−1} n_i b_i,

where 0 ⩽ n_i < h for all i ∈ [0, w−1].

Let x_i = x^(b_i).

Then the algorithm uses the equality

    x^n = Π_{i=0}^{w−1} x_i^(n_i) = Π_{j=1}^{h−1} [ Π_{n_i = j} x_i ]^j

Given the element x of G, and the exponent n written in the above form, along with the precomputed values x^(b_0), ..., x^(b_{w−1}), the element x^n is calculated using the algorithm below:

y = 1, u = 1, j = h - 1
while j > 0 do
    for i = 0 to w - 1 do
        if ni = j then
            u = u × x^(b_i)
    y = y × u
    j = j - 1
return y

If we set h = 2^k and b_i = h^i, then the n_i values are simply the digits of n in base h. Yao's method collects in u first those x_i that appear to the highest power h − 1; in the next round those with power h − 2 are collected in u as well, etc. The variable y is multiplied h − 1 times with the initial u, h − 2 times with the next highest powers, and so on. The algorithm uses w + h − 2 multiplications, and w + 1 elements must be stored to compute x^n.[1]
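A Python sketch of Yao's method for the choice b_i = h^i, so that the n_i are the base-h digits of the exponent; the precomputation of x_i = x^(h^i) is written with ** for brevity, where a real fixed-base implementation would store these values once:

```python
def yao_pow(x, n, h):
    """Yao's fixed-base method: digits of n in base h, with
    precomputed x_i = x^(h^i)."""
    assert n >= 0 and h >= 2
    # Digits n_i of n in base h, least significant first.
    digits = []
    m = n
    while m:
        digits.append(m % h)
        m //= h
    # Fixed-base precomputation: xs[i] = x^(h^i).
    xs = []
    p = x
    for _ in range(len(digits)):
        xs.append(p)
        p = p ** h
    # Collect bases digit-value by digit-value, highest value first.
    y, u = 1, 1
    for j in range(h - 1, 0, -1):
        for i, d in enumerate(digits):
            if d == j:
                u = u * xs[i]
        y = y * u
    return y
```

Because u persists across rounds, each xs[i] with digit d is multiplied into y exactly d times, yielding x_i^(n_i).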

Euclidean method

The Euclidean method was first introduced in Efficient exponentiation using precomputation and vector addition chains by P. de Rooij.

This method for computing x^n in a group G, where n is a natural integer, whose algorithm is given below, uses the following equality recursively:

    x_0^(n_0) · x_1^(n_1) = (x_0 · x_1^q)^(n_0) · x_1^(n_1 mod n_0),

where q = ⌊n_1 / n_0⌋. In other words, a Euclidean division of the exponent n_1 by n_0 is used to return a quotient q and a remainder n_1 mod n_0.

Given the base element x in group G, and the exponent n written as in Yao's method, the element x^n is calculated using the precomputed values x^(b_0), ..., x^(b_{w−1}) and then the algorithm below.

Begin loop
    Find M such that n_M ≥ n_i for all i.
    Find N ≠ M such that n_N ≥ n_i for all i ≠ M.
    Break loop if n_N = 0.
    Let q = ⌊n_M / n_N⌋, and then let n_M = n_M mod n_N.
    Compute recursively x_M^q, and then let x_N = x_N · x_M^q.
End loop;
Return x_M^(n_M).

The algorithm first finds the largest value among the n_i and then the largest value within the set { n_i : i ≠ M }. Then it raises x_M to the power q, multiplies this value with x_N, and then assigns x_N the result of this computation and n_M the value n_M modulo n_N.
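The loop above can be sketched in Python as follows, with the exponent digits taken in base h and the bases x_i = x^(h^i); the precomputation and the final small power are written with ** for brevity (names are illustrative):

```python
def euclidean_pow(x, n, h):
    """de Rooij's Euclidean method (sketch): exponent digits in base h
    with precomputed bases xs[i] = x^(h^i), combined via repeated
    Euclidean division of the two largest exponents."""
    assert n >= 0 and h >= 2
    digits, m = [], n
    while m:
        digits.append(m % h)
        m //= h
    if not digits:
        return 1
    xs = []
    p = x
    for _ in range(len(digits)):
        xs.append(p)
        p = p ** h
    ns = digits[:]
    while True:
        # M: index of the largest exponent; N: largest among the rest.
        M = max(range(len(ns)), key=lambda i: ns[i])
        rest = [i for i in range(len(ns)) if i != M]
        if not rest:
            break
        N = max(rest, key=lambda i: ns[i])
        if ns[N] == 0:
            break
        q, ns[M] = divmod(ns[M], ns[N])
        xs[N] = xs[N] * xs[M] ** q    # x_N <- x_N * x_M^q
    return xs[M] ** ns[M]
```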

Further applications

The approach also works with semigroups that are not of characteristic zero, for example allowing fast computation of large exponents modulo a number. Especially in cryptography, it is useful to compute powers in a ring of integers modulo q. For example, the evaluation of

13789^722341 (mod 2345) = 2029

would take a very long time and much storage space if the naïve method of computing 13789^722341 and then taking the remainder when divided by 2345 were used. Even using a more effective method will take a long time: square 13789, take the remainder when divided by 2345, multiply the result by 13789, and so on.

Applying the above exp-by-squaring algorithm, with "*" interpreted as x * y = xy mod 2345 (that is, a multiplication followed by a division with remainder) leads to only 27 multiplications and divisions of integers, which may all be stored in a single machine word. Generally, any of these approaches will take fewer than 2log2(722340) ≤ 40 modular multiplications.
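The modular version reduces after every multiplication, which is what keeps every intermediate value small in the example above; a minimal Python sketch, equivalent in effect to Python's built-in three-argument pow:

```python
def mod_pow(base, exp, mod):
    """Square-and-multiply with reduction after every step, so no
    intermediate value exceeds mod**2."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:
            result = (result * base) % mod
        base = (base * base) % mod
        exp >>= 1
    return result
```

For the example above, mod_pow(13789, 722341, 2345) returns 2029.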

The approach can also be used to compute integer powers in a group, using either of the rules

Power(x, −n) = Power(x^(−1), n),
Power(x, −n) = (Power(x, n))^(−1).

The approach also works in non-commutative semigroups and is often used to compute powers of matrices.

More generally, the approach works with positive integer exponents in every magma for which the binary operation is power associative.

Signed-digit recoding

In certain computations it may be more efficient to allow negative coefficients and hence use the inverse of the base, provided inversion in G is "fast" or has been precomputed. For example, when computing x^(2^k − 1), the binary method requires k − 1 multiplications and k − 1 squarings. However, one could perform k squarings to get x^(2^k) and then multiply by x^(−1) to obtain x^(2^k − 1).

To this end we define the signed-digit representation of an integer n in radix b as

    n = Σ_{i=0}^{l−1} n_i b^i,   with |n_i| < b.

Signed binary representation corresponds to the particular choice b = 2 and n_i ∈ {−1, 0, 1}. It is denoted by (n_{l−1} ... n_0)_s. There are several methods for computing this representation. The representation is not unique. For example, take n = 478: two distinct signed-binary representations are given by (0 1 1 1 0 1 1 1 1 0)_s and (1 1 1 1 0 0 0 1̄ 0)_s, where 1̄ is used to denote −1. Since the binary method computes a multiplication for every non-zero entry in the base-2 representation of n, we are interested in finding the signed-binary representation with the smallest number of non-zero entries, that is, the one with minimal Hamming weight. One method of doing this is to compute the representation in non-adjacent form, or NAF for short, which is one that satisfies n_i n_{i+1} = 0 for all i ⩾ 0 and is denoted by (n_{l−1} ... n_0)_NAF. For example, the NAF representation of 478 is (1 0 0 0 1̄ 0 0 0 1̄ 0)_s, that is, 478 = 512 − 32 − 2. This representation always has minimal Hamming weight. A simple algorithm to compute the NAF representation of a given integer n = (n_l n_{l−1} ... n_0)₂ with n_l = n_{l−1} = 0 is the following:


c_0 = 0
for i = 0 to l − 1 do
    c_{i+1} = ⌊(c_i + n_i + n_{i+1}) / 2⌋
    n′_i = c_i + n_i − 2 c_{i+1}
return (n′_{l−1} ... n′_0)_NAF

Another algorithm by Koyama and Tsuruoka does not require the condition that n_l = n_{l−1} = 0; it still minimizes the Hamming weight.
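A common way to compute the NAF digits, equivalent in effect to the carry-based algorithm above, examines n modulo 4 at each odd step; a Python sketch (digit order is least significant first):

```python
def naf(n):
    """Non-adjacent form of n >= 0, least significant digit first;
    digits are in {-1, 0, 1} and no two adjacent digits are nonzero."""
    digits = []
    while n > 0:
        if n % 2 == 1:
            d = 2 - (n % 4)    # 1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits
```

For n = 478 this yields the digits of 512 − 32 − 2.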

Alternatives and generalizations

Exponentiation by squaring can be viewed as a suboptimal addition-chain exponentiation algorithm: it computes the exponent by an addition chain consisting of repeated exponent doublings (squarings) and/or incrementing exponents by one (multiplying by x) only. More generally, if one allows any previously computed exponents to be summed (by multiplying those powers of x), one can sometimes perform the exponentiation using fewer multiplications (but typically using more memory). The smallest power where this occurs is for n = 15:

    x^15 = x × (x × [x × x^2]^2)^2    (squaring, 6 multiplies),
    x^15 = x^3 × ([x^3]^2)^2          (optimal addition chain, 5 multiplies if x^3 is re-used).

In general, finding the optimal addition chain for a given exponent is a hard problem, for which no efficient algorithms are known, so optimal chains are typically used for small exponents only (e.g. in compilers where the chains for small powers have been pre-tabulated). However, there are a number of heuristic algorithms that, while not being optimal, have fewer multiplications than exponentiation by squaring at the cost of additional bookkeeping work and memory usage. Regardless, the number of multiplications never grows more slowly than Θ(log n), so these algorithms improve asymptotically upon exponentiation by squaring by only a constant factor at best.
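The two evaluation orders for x^15 can be written out explicitly to count the multiplications:

```python
def pow15_squaring(x):
    """x^15 by plain square-and-multiply: 6 multiplications."""
    x2 = x * x          # 1
    x3 = x2 * x         # 2
    x6 = x3 * x3        # 3
    x7 = x6 * x         # 4
    x14 = x7 * x7       # 5
    return x14 * x      # 6

def pow15_chain(x):
    """x^15 via the addition chain 1, 2, 3, 6, 12, 15: 5 multiplications."""
    x2 = x * x          # 1
    x3 = x2 * x         # 2
    x6 = x3 * x3        # 3
    x12 = x6 * x6       # 4
    return x12 * x3     # 5 (x^3 is re-used)
```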

from Grokipedia
Exponentiation by squaring is an efficient algorithm for computing large powers a^b, where a is the base and b is a non-negative exponent, by leveraging the binary representation of b to minimize the number of multiplications through repeated squaring of intermediate results. This method, also known as binary exponentiation or the square-and-multiply method, transforms the problem of exponentiation into a series of squarings and selective multiplications, making it particularly effective for scenarios involving very large exponents.

The algorithm operates by expressing the exponent b in its binary form, say b = Σ_{i=0}^{k} u_i 2^i where each u_i is 0 or 1, and then computing powers of the base corresponding to powers of 2 via successive squaring. For instance, it initializes with a^(2^0) = a, then iteratively squares to obtain a^(2^j) = (a^(2^(j−1)))^2 for j = 1 up to the bit length of b, reducing modulo m if applicable to keep values manageable. The final result is the product of those a^(2^j) where u_j = 1, again taken modulo m if needed. This can be implemented recursively, where a^b = (a^(b/2))^2 if b is even, or a · (a^((b−1)/2))^2 if b is odd, or iteratively by scanning the binary digits from most to least significant bit.

In terms of efficiency, the naive approach to exponentiation requires up to b − 1 multiplications, which is impractical for large b, whereas exponentiation by squaring performs at most 2 log₂ b multiplications (roughly one squaring per bit and one additional multiplication per set bit), achieving O(log b) time complexity. This logarithmic reduction is crucial in computational number theory, as it handles exponents with thousands of bits in feasible time. The algorithm finds widespread use in modular exponentiation, a core operation in public-key cryptosystems such as the RSA algorithm, where computing a^b mod m is essential yet computationally intensive without optimization. It also appears in matrix exponentiation for graph algorithms and scientific computing, extending its utility beyond scalar arithmetic to linear algebra problems.

Core Algorithms

Recursive Binary Exponentiation

Recursive binary exponentiation is a divide-and-conquer method for computing x^n, where x is the base and n is a non-negative exponent. The algorithm recursively computes x^⌊n/2⌋, squares this value to obtain x^(n − (n mod 2)), and then multiplies by an additional factor of x if n is odd to account for the least significant bit in the binary representation of n. This approach exploits the identity x^n = (x^⌊n/2⌋)^2 · x^(n mod 2) to break down the problem into smaller subproblems. The recursive structure is captured in the following pseudocode, which handles base cases for n = 0 (returning the multiplicative identity 1) and n = 1 (returning x):

function power(x, n):
    if n == 0: return 1
    if n == 1: return x
    half = power(x, n // 2)
    result = half * half
    if n % 2 == 1:
        result = result * x
    return result

This performs a chain of recursive calls corresponding to the binary digits of n, with each level halving the exponent until reaching the base cases. To illustrate, consider computing 5^13. The binary representation of 13 is 1101 (i.e., 13 = 2^3 + 2^2 + 2^0). The recursion proceeds as follows:
  • power(5, 13): Since 13 is odd, compute half = power(5, 6), result = half * half, then result = result * 5.
  • power(5, 6): Even, so compute half = power(5, 3), result = half * half.
  • power(5, 3): Odd, so compute half = power(5, 1), result = half * half = 5 * 5 = 25, then result = 25 * 5 = 125.
  • power(5, 1): Base case, returns 5.
  • Thus, power(5, 3) = 125.
  • Then, power(5, 6) = 125 * 125 = 15,625.
  • Finally, power(5, 13): half = 15,625, result = 15,625 * 15,625 = 244,140,625, then result = 244,140,625 * 5 = 1,220,703,125.
The multiplications are: 5 × 5 = 25, 25 × 5 = 125, 125 × 125 = 15,625, 15,625 × 15,625 = 244,140,625, and 244,140,625 × 5 = 1,220,703,125. This leverages the binary expansion of the exponent, selecting terms from the powers computed via squaring (specifically 5^1, 5^2, 5^4, 5^8) to reduce the total multiplications to a number logarithmic in n. The technique has ancient origins, appearing in Pingala's Chandah-sutra (c. 200 BCE) as a method for efficiently handling powers in prosody. It was formalized as a modern recursive algorithm in 20th-century computer science, notably in Donald Knuth's The Art of Computer Programming (1969). As a conceptual tool, the recursive version highlights the divide-and-conquer paradigm, though practical implementations often prefer iterative alternatives to avoid stack overhead.

Iterative Binary Exponentiation

Iterative binary exponentiation provides a space-efficient method to compute a^n by processing the binary representation of the exponent n from least significant bit (LSB) to most significant bit (MSB), avoiding the recursion depth limitations of recursive implementations. In the right-to-left variant, the process initializes the result to 1 and iteratively squares the current base while conditionally multiplying it into the result whenever the current bit of n is set, effectively building the power through repeated squaring and selective accumulation. The following pseudocode illustrates the right-to-left approach:

function iterative_pow(base, exponent):
    result = 1
    while exponent > 0:
        if exponent % 2 == 1:
            result = result * base
        base = base * base
        exponent = exponent // 2
    return result

This implementation uses a simple loop that halves the exponent in each iteration until it reaches zero. To demonstrate, consider computing 5^13, where 13 in binary is 1101 (bits: 1, 1, 0, 1 from MSB to LSB). Start with result = 1, base = 5, exponent = 13:
  • Iteration 1: exponent = 13 (odd), result = 1 × 5 = 5; base = 5 × 5 = 25; exponent = 6.
  • Iteration 2: exponent = 6 (even), result = 5; base = 25 × 25 = 625; exponent = 3.
  • Iteration 3: exponent = 3 (odd), result = 5 × 625 = 3125; base = 625 × 625 = 390625; exponent = 1.
  • Iteration 4: exponent = 1 (odd), result = 3125 × 390625 = 1,220,703,125; base = 390625 × 390625 (unused); exponent = 0.
The final result is 1,220,703,125, matching 5^13. This method requires only constant auxiliary memory beyond the input values, achieving O(1) space complexity by relying on a fixed number of variables (result, base, and exponent) regardless of the exponent size, making it ideal for large exponents in resource-constrained environments. A left-to-right variant processes the bits from MSB to LSB, initializing the result to 1 and iteratively squaring it while multiplying by the original base if the current bit is 1, yielding equivalent time and space usage. The pseudocode for this approach is:

function left_to_right_pow(base, exponent):
    # Assume the exponent's binary representation has t+1 bits, e_t ... e_0
    result = 1
    for i from t downto 0:
        result = result * result
        if bit i of exponent == 1:
            result = result * base
    return result

This iterative form serves as a practical, stack-safe alternative to the recursive binary exponentiation, producing the same logarithmic-time results without function call overhead.

Efficiency Analysis

Time and Space Complexity

The binary exponentiation algorithms, both recursive and iterative, exhibit a time complexity of O(log n) in terms of the number of multiplications required to compute x^n, where n is the exponent. This logarithmic scaling arises because the exponent is halved in each step, leading to at most ⌊log₂ n⌋ + 1 squarings and a number of additional multiplications equal to the Hamming weight of n minus one. The Hamming weight, or population count (popcount(n)), is the number of 1-bits in the binary representation of n and averages approximately (log₂ n + 1)/2 for random exponents. Thus, the total number of multiplications is approximately log₂ n + popcount(n) − 1. For large integers without modular reduction, each squaring or multiplication incurs an additional cost of O((log x)^2) using the schoolbook method, where x is the base, resulting in an overall complexity of O((log n) · (log x)^2). In the context of modular exponentiation, where all operations are performed modulo m, the intermediate values remain bounded by the size of m, preserving the O(log n) operation count while each modular multiplication costs O((log m)^2); the overall complexity thus remains logarithmic in n. This represents a significant improvement over the naive repeated-multiplication approach, which requires O(n) operations. Regarding space complexity, the iterative version of binary exponentiation uses O(1) auxiliary space, relying only on a constant number of variables to track the result, current power, and exponent. In contrast, the recursive version requires O(log n) space due to the call stack depth, which matches the recursion depth of log₂ n.

Comparison to Naive Multiplication

The naive method for computing x^n involves repeatedly multiplying x by itself n times, resulting in n − 1 multiplications. This approach is straightforward but becomes highly inefficient as n grows large, since the number of operations scales linearly with the exponent. In contrast, binary exponentiation reduces the operation count to approximately ⌊log₂ n⌋ + wt(n) − 1 multiplications, where wt(n) is the Hamming weight (number of 1-bits) of the binary representation of n. For example, with n = 1000 (binary 1111101000, log₂ 1000 ≈ 9.97, wt(1000) = 6), the naive method requires 999 multiplications, while binary exponentiation needs only 14 (9 squarings + 5 extra multiplications). This logarithmic scaling provides substantial efficiency gains for moderate to large exponents. To illustrate the difference more clearly, the following table compares operation counts for powers of 2, where binary exponentiation consists solely of squarings due to the single 1-bit in the exponent:
Exponent n          Naive multiplications (n − 1)    Binary multiplications (log₂ n)    Relative speedup
2^10 = 1024         1023                             10                                 102.3×
2^20 = 1,048,576    1,048,575                        20                                 52,428.75×
Calculations are based on the standard formulas for each method. Historically, the naive method sufficed for small exponents in early computing, but binary exponentiation became essential with the advent of public-key cryptography, such as RSA, where a 2048-bit exponent would require an astronomically large number of naive multiplications but only about 3000 operations with binary methods. In RSA implementations, the square-and-multiply variant of binary exponentiation is standard for handling modular arithmetic efficiently. The naive method may still suffice in resource-constrained embedded systems for very small exponents (e.g., n < 10), where the overhead of binary exponentiation's conditional branches outweighs the minimal savings and the hardware lacks optimized big-integer support.
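The operation counts in the table follow from the two formulas; a small Python sketch, where binary_mults counts squarings plus extra multiplications:

```python
def naive_mults(n):
    """Repeated multiplication: n - 1 multiplications."""
    return n - 1

def binary_mults(n):
    """Square-and-multiply: floor(log2 n) squarings plus
    (popcount(n) - 1) extra multiplications."""
    return (n.bit_length() - 1) + (bin(n).count('1') - 1)
```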

Higher-Radix Variants

2^k-ary Exponentiation

2^k-ary exponentiation generalizes binary exponentiation (the special case k = 1) by representing the exponent in base m = 2^k and precomputing the powers x^0, x^1, ..., x^(m−1), reducing the number of multiplications at the expense of additional precomputation and storage. The method processes groups of k bits at a time, enabling fewer but more expensive operations compared to binary methods.

In the algorithm, the exponent n is expressed as n = Σ_{i=0}^{ℓ−1} d_i m^i, where each digit d_i ∈ {0, 1, ..., m−1} and ℓ ≈ log_m n = (log₂ n)/k is the number of digits. Precomputation involves calculating the table of powers x^j for j = 1 to m − 1, typically using successive multiplications, costing O(m) operations. The main computation uses a left-to-right approach: initialize the result R = x^(d_{ℓ−1}), then for each subsequent digit d_i from i = ℓ−2 down to 0, update R ← R^m · x^(d_i), where R^m is computed via k successive squarings. This requires approximately log₂ n squarings in total, independent of k, and up to ℓ multiplications for the digit terms, yielding O((log₂ n)/k) extra multiplications.

For k = 2 (quaternary exponentiation, m = 4), the binary digits of the exponent are grouped into 2-bit chunks, with precomputed values x^0 = 1, x^1 = x, x^2, and x^3. Each group corresponds to a digit d_i ∈ {0, 1, 2, 3}, and R^m = R^4 is two squarings: (R^2)^2. The precomputation costs 2 multiplications (x^2 = x · x, x^3 = x^2 · x).

Consider computing 3^25 with k = 2. First, 25₁₀ = 121₄, since 1 · 4^2 + 2 · 4^1 + 1 · 4^0 = 16 + 8 + 1 = 25, with digits d_2 = 1, d_1 = 2, d_0 = 1. Precompute: p_0 = 1, p_1 = 3, p_2 = 9, p_3 = 27. Initialize R = p_1 = 3 (for 3^1). For d_1 = 2: compute R^4 = (3^2)^2 = 9^2 = 81, then R = 81 · p_2 = 81 · 9 = 729 (now 3^(1·4+2) = 3^6). For d_0 = 1: R^4 = (729^2)^2 = 531,441^2 = 282,429,536,481, then R = 282,429,536,481 · p_1 = 847,288,609,443 (3^(6·4+1) = 3^25). The main loop uses 4 squarings and 2 multiplications (plus 2 for precomputation); for an exponent this small the cost matches plain binary exponentiation, but for larger exponents the reduction in multiplications outweighs the precomputation.

The trade-off involves fewer multiplications, scaling as O((log₂ n)/k), but a higher precomputation cost O(2^k) and potentially costlier multiplications by larger precomputed powers, with the number of extra multiplications depending on the digit distribution. For typical hardware and exponent sizes around 1024-2048 bits, an optimal k is around 3 to 5, balancing reduced iterations against precomputation overhead and multiplication complexity.

Sliding-Window Exponentiation

Sliding-window exponentiation is an optimization of binary exponentiation that processes the exponent in windows of up to w bits to reduce the total number of multiplications, by looking ahead from each '1' bit and precomputing small odd powers of the base x. The method scans the exponent bits from the most significant bit (left-to-right); each window starts at a '1' bit and extends at most w bits, ending on a '1' bit, so that only odd window values are needed and runs of zeros cost only squarings. This trades a small amount of precomputation storage for fewer operations during the main loop, making it suitable for variable exponents in cryptographic applications like modular exponentiation.

The algorithm selects a window size w, typically small, to balance precomputation cost and runtime savings. A lookup table is built for the odd window values 1, 3, ..., 2^w − 1: with x^2 computed once, each entry costs one multiplication (x^j = x^(j−2) · x^2), for 2^(w−1) − 1 multiplications in total. The main computation initializes the result to 1 and processes the exponent bit by bit: if the current bit is 0, square the result and advance; otherwise, form the window value v (at most w bits, ending in a 1), square the result once per window bit, multiply by the precomputed x^v, and advance past the window.

For example, consider computing 7^26 with w = 3, where 26 in binary is 11010₂. Precompute the odd powers 7^1 = 7, 7^3 = 343, 7^5 = 16,807, 7^7 = 823,543. Scanning from the left, the first window is 11 (value 3), giving y = 7^3; the next bit is 0, costing one squaring (y = 7^6); the following 1 cannot extend to an odd-ending window of length 3, so the window is the single bit 1: square (y = 7^12) and multiply by 7 (y = 7^13); the final 0 costs one squaring (y = 7^26). The main loop thus uses 4 squarings and 1 multiplication, versus 4 squarings and 2 multiplications for the plain binary method, and the precomputed 7^3 pays for itself when the same base is reused.

The benefit of sliding-window exponentiation is fewer overall multiplications than naive binary methods: for an n-bit exponent, roughly n squarings plus about n/(w+1) multiplications on average, since each window plus the expected run of zeros between windows covers more than w bits per multiplication. Precomputation costs 2^(w−1) − 1 multiplications, but the runtime savings dominate for large exponents. Window sizes w = 4 to 5 are typical, balancing table size (8-16 entries) against speed gains of roughly 5-8% over m-ary methods for 512-bit exponents, as larger w increases precomputation exponentially while giving diminishing marginal returns. Unlike the non-sliding 2^k-ary method with fixed blocks, sliding windows adapt to the bit pattern, skipping zero runs and thereby saving squarings and multiplications.

Constant-Time Techniques

Montgomery Ladder

The Montgomery ladder is a variant of binary exponentiation designed for constant-time execution, making it resistant to side-channel attacks such as timing and power analysis in cryptographic applications. It maintains two registers, R_0 and R_1, initialized to 1 and the base g respectively, and processes the binary representation of the exponent k from the most significant bit (MSB) to the least significant bit (LSB), performing exactly one squaring and one multiplication per bit regardless of the bit's value. This uniformity ensures that the algorithm's execution path and operation counts do not leak information about the exponent. In a multiplicative group, the core technique computes, for each bit k_j:
  • If k_j = 0, compute R_1 ← R_0 · R_1 followed by R_0 ← R_0^2.
  • If k_j = 1, compute R_0 ← R_0 · R_1 followed by R_1 ← R_1^2.
To avoid conditional branches that could introduce timing variations, implementations often use conditional swaps or other arithmetic that is independent of the bit value. The invariant preserved throughout is that R_1 / R_0 = g after processing each bit (starting from the MSB). At the end, R_0 = g^k. The total cost is 2t multiplications for a t-bit exponent (t squarings and t multiplications), matching the worst-case cost of standard binary exponentiation while adding side-channel resistance. The technique was first described by Peter L. Montgomery in 1987. Here is the pseudocode for computing y = g^k mod n in a ring (e.g., modular arithmetic):

Input: base g, exponent k = (k_{t−1} … k_0)_2, modulus n
Output: y = g^k mod n

R_0 ← 1
R_1 ← g mod n
for j = t−1 downto 0 do
    if k_j = 0 then
        R_1 ← (R_0 × R_1) mod n
        R_0 ← (R_0 × R_0) mod n
    else
        R_0 ← (R_0 × R_1) mod n
        R_1 ← (R_1 × R_1) mod n
return R_0


Branch-free variants replace the if-statement with a swap conditioned on k_j, ensuring constant-time behavior. To illustrate, consider computing x^{13} mod n, where 13 = 1101_2 (MSB first: bits 1, 1, 0, 1). Initialize R_0 = 1, R_1 = x.
  • Bit 1 (MSB): R_0 ← 1 · x = x, R_1 ← x^2. Now (R_0, R_1) = (x, x^2).
  • Bit 1: R_0 ← x · x^2 = x^3, R_1 ← (x^2)^2 = x^4. Now (R_0, R_1) = (x^3, x^4).
  • Bit 0: R_1 ← x^3 · x^4 = x^7, R_0 ← (x^3)^2 = x^6. Now (R_0, R_1) = (x^6, x^7).
  • Bit 1 (LSB): R_0 ← x^6 · x^7 = x^{13}, R_1 ← (x^7)^2 = x^{14}. Now (R_0, R_1) = (x^{13}, x^{14}).
The result is R_0 = x^{13} mod n. The security stems from performing the same sequence of operations (one squaring and one multiplication) per bit, preventing attackers from distinguishing bit values via execution time, power consumption, or electromagnetic emissions. This makes it suitable for protecting secret exponents, where traditional binary methods could leak through variable-time branches. Variants further randomize the ladder to counter fault attacks or differential power analysis. In practice, the Montgomery ladder is widely adopted for elliptic-curve scalar multiplication in standards such as ECDSA and protocols such as TLS, with implementations in cryptographic libraries that use it for constant-time point multiplication on Montgomery curves. It also applies to general modular exponentiation in RSA and Diffie-Hellman, and because the two register updates per bit are independent, it offers parallelization potential with reported speedups of up to 200% on multi-processor systems.
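The ladder can be sketched in Python as follows. The conditional swap stands in for the constant-time cswap used in hardened implementations; Python's big-integer arithmetic is not itself constant-time, so this only illustrates the structure (the function name is illustrative):

```python
def montgomery_ladder_pow(g, k, n):
    """Montgomery ladder computing g^k mod n, a sketch assuming k >= 0.
    Performs one multiplication and one squaring per exponent bit."""
    r0, r1 = 1, g % n
    for j in range(k.bit_length() - 1, -1, -1):
        bit = (k >> j) & 1
        # Swap instead of branching on the update formulas: after the
        # swap, both bit values execute the identical pair of operations.
        # Real constant-time code would use an arithmetic/masked cswap.
        if bit:
            r0, r1 = r1, r0
        r1 = (r0 * r1) % n   # always exactly one multiplication ...
        r0 = (r0 * r0) % n   # ... and one squaring per bit
        if bit:
            r0, r1 = r1, r0
    return r0
```

Unrolling the loop for k = 13 reproduces the register pairs (x, x^2), (x^3, x^4), (x^6, x^7), (x^{13}, x^{14}) from the worked example.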

Signed-Digit Recoding

Signed-digit recoding transforms the binary representation of the exponent into a signed-digit form, such as the non-adjacent form (NAF), which minimizes the number of non-zero digits while ensuring that no two consecutive digits are non-zero. In NAF, each digit d_i belongs to the set {−1, 0, 1} (with −1 often written 1̄), and an integer e is represented as e = Σ_{i=0}^{ℓ} d_i 2^i. This recoding, originally introduced for efficient binary arithmetic, reduces the average density of non-zero digits from approximately 1/2 in standard binary to 1/3 in NAF, leading to fewer multiplication operations during exponentiation.

The NAF can be computed from the binary digits b_i by a deterministic scan from least to most significant bit. One standard method initializes a carry c_0 = 0 and, for each bit position i from 0 to ℓ − 1, computes the next carry c_{i+1} = ⌊(c_i + b_i + b_{i+1}) / 2⌋ and the recoded digit d_i = c_i + b_i − 2c_{i+1} (taking b_ℓ = 0); the output extends by one digit if a final carry remains. This produces the unique NAF representation, which has minimal weight among signed-binary representations. Related techniques, such as Booth recoding adapted for signed digits, follow similar principles by examining overlapping bit pairs to generate digits 0 or ±1, though NAF specifically guarantees non-adjacency. For example, the exponent 13 is 1101_2 in binary, which recodes to 101̄01 in NAF (digits from MSB: 1, 0, −1, 0, 1), satisfying 1·2^4 + 0·2^3 + (−1)·2^2 + 0·2^1 + 1·2^0 = 16 − 4 + 1 = 13.
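The recoding can also be expressed with an equivalent right-to-left rule that emits the digit 2 − (e mod 4) at each odd step; a minimal sketch (the function name is illustrative):

```python
def naf(e):
    """Non-adjacent form of a nonnegative integer e, returned with the
    least significant digit first. A sketch of the standard recoding."""
    digits = []
    while e > 0:
        if e & 1:
            d = 2 - (e % 4)  # 1 if e = 1 (mod 4), -1 if e = 3 (mod 4)
            e -= d           # subtracting d makes e divisible by 4 or 0
        else:
            d = 0
        digits.append(d)
        e //= 2
    return digits
```

For example, `naf(13)` yields `[1, 0, -1, 0, 1]` (LSB first), i.e. the digits 1, 0, −1, 0, 1 read from the MSB, matching the 101̄01 example above.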
In the context of exponentiation by squaring, the recoded exponent enables a left-to-right algorithm in which the result is squared at every digit and multiplied by the base x (or by x^{−1} for −1 digits) only at non-zero positions; in modular arithmetic, x^{−1} mod m is precomputed once. In additive settings such as elliptic-curve scalar multiplication, the inverse is essentially free, since negating a point only negates one coordinate, which makes signed-digit recoding especially attractive there. This approach yields roughly 30% fewer multiplications than the plain binary method, as the reduced density of non-zero digits lowers the expected number of base multiplications from about m/2 to about m/3 for an m-bit exponent, making it particularly advantageous for hardware implementations with left-to-right scanning. Balanced ternary, another signed-digit system using radix 3 with digits −1, 0, 1, offers similar sparsity benefits but requires adjustments for higher-radix squaring. The non-adjacent property can also help against side-channel attacks by promoting more uniform operation patterns.
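A left-to-right NAF exponentiation might look like the following sketch, assuming gcd(x, m) = 1 so that the modular inverse exists (the `pow(x, -1, m)` form requires Python 3.8+); the recoding loop is inlined so the example is self-contained, and the function name is illustrative:

```python
def naf_pow(x, e, m):
    """Modular exponentiation over the NAF digits of e: square at every
    digit, multiply by x or x^-1 only at nonzero digits. A sketch."""
    # Recode e into NAF digits, least significant digit first.
    digits, t = [], e
    while t > 0:
        if t & 1:
            d = 2 - (t % 4)  # digit is +1 or -1 at odd steps
            t -= d
        else:
            d = 0
        digits.append(d)
        t //= 2
    x = x % m
    x_inv = pow(x, -1, m)        # one inversion, reused for all -1 digits
    result = 1
    for d in reversed(digits):   # process digits MSB first
        result = (result * result) % m
        if d == 1:
            result = (result * x) % m
        elif d == -1:
            result = (result * x_inv) % m
    return result
```

For e = 13 the digit string 101̄01 triggers three base multiplications' worth of work as two multiplications by x and one by x^{−1}, versus three multiplications by x for the binary string 1101.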

Fixed-Base Optimizations

Yao's Method

Yao's method, proposed by Andrew Yao in 1976, provides an efficient approach to fixed-base exponentiation, particularly beneficial when the same base x is used to compute x^n for multiple different exponents n. The technique represents the exponent in a higher radix b = h > 2, precomputes a table of powers x^{b^i} for i = 0, 1, …, ℓ − 1, where ℓ = ⌈log_h n⌉, and then assembles the result through grouped multiplications based on the digits of n. This amortizes the precomputation cost across multiple evaluations, making it suitable for applications such as batch verification in cryptographic protocols. The algorithm proceeds as follows. First, during precomputation, compute the table entries successively: set x_0 = x and, for i ≥ 1, compute x_i = x_{i−1}^h using O(log h) group multiplications per entry, yielding a total precomputation cost of O(log n) multiplications and O(ℓ) storage. For a given exponent n, express n = Σ_{i=0}^{ℓ−1} n_i b^i with digits 0 ≤ n_i < h. The power is then given by

x^n = ∏_{j=1}^{h−1} ( ∏_{i : n_i = j} x^{b^i} )^j.
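Under the notation above (with b = h), the precomputation and per-exponent evaluation can be sketched as follows; `yao_precompute` and `yao_pow` are illustrative names, and the inner loop evaluates the product formula by accumulating, for j = h − 1 down to 1, a running product u of the table entries with digit at least j:

```python
def yao_precompute(x, h, max_bits, m):
    """Table x^(h^i) mod m for i = 0 .. l-1; since h >= 2, the base-h
    digit count of any exponent of up to max_bits bits is at most l."""
    table = [x % m]
    for _ in range(max_bits - 1):
        table.append(pow(table[-1], h, m))  # x_i = x_{i-1}^h
    return table

def yao_pow(table, n, h, m):
    """Evaluate x^n mod m from the precomputed table (Yao's method)."""
    digits = []                 # base-h digits of n, least significant first
    while n > 0:
        digits.append(n % h)
        n //= h
    y = u = 1
    for j in range(h - 1, 0, -1):
        for i, d in enumerate(digits):
            if d == j:
                u = (u * table[i]) % m  # u = prod of x^(h^i) with n_i >= j
        y = (y * u) % m                 # each x^(h^i) enters y exactly n_i times
    return y
```

Because the table depends only on the base x, a single call to `yao_precompute` serves any number of subsequent `yao_pow` calls with different exponents, which is where the amortization claimed above comes from.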