Multiplication algorithm
from Wikipedia

A multiplication algorithm is an algorithm (or method) to multiply two numbers. Depending on the size of the numbers, different algorithms are more efficient than others. Numerous algorithms are known and there has been much research into the topic.

The oldest and simplest method, known since antiquity as long multiplication or grade-school multiplication, consists of multiplying every digit in the first number by every digit in the second and adding the results. This has a time complexity of O(n²), where n is the number of digits. When done by hand, this may also be reframed as grid method multiplication or lattice multiplication. In software, this may be called "shift and add" due to bitshifts and addition being the only two operations needed.

In 1960, Anatoly Karatsuba discovered Karatsuba multiplication, unleashing a flood of research into fast multiplication algorithms. This method uses three multiplications rather than four to multiply two two-digit numbers. (A variant of this can also be used to multiply complex numbers quickly.) Done recursively, this has a time complexity of O(n^(log₂ 3)) ≈ O(n^1.585). Splitting numbers into more than two parts results in Toom–Cook multiplication; for example, using three parts results in the Toom-3 algorithm. Using many parts can set the exponent arbitrarily close to 1, but the constant factor also grows, making it impractical.

In 1968, the Schönhage–Strassen algorithm, which makes use of a Fourier transform over a modulus, was discovered. It has a time complexity of O(n log n log log n). In 2007, Martin Fürer proposed an algorithm with complexity O(n log n 2^(Θ(log* n))). In 2014, Harvey, Joris van der Hoeven, and Lecerf proposed one with complexity O(n log n 2^(3 log* n)), thus making the implicit constant explicit; this was improved to O(n log n 2^(2 log* n)) in 2018. Lastly, in 2019, Harvey and van der Hoeven came up with a galactic algorithm with complexity O(n log n). This matches a guess by Schönhage and Strassen that this would be the optimal bound, although this remains a conjecture today.

Integer multiplication algorithms can also be used to multiply polynomials by means of the method of Kronecker substitution.

Long multiplication

If a positional numeral system is used, a natural way of multiplying numbers is taught in schools as long multiplication, sometimes called grade-school multiplication, sometimes called the Standard Algorithm: multiply the multiplicand by each digit of the multiplier and then add up all the properly shifted results. It requires memorization of the multiplication table for single digits.

This is the usual algorithm for multiplying larger numbers by hand in base 10. A person doing long multiplication on paper will write down all the products and then add them together; an abacus-user will sum the products as soon as each one is computed.

Example

This example uses long multiplication to multiply 23,958,233 (multiplicand) by 5,830 (multiplier) and arrives at 139,676,498,390 for the result (product).

      23958233
×         5830
———————————————
      00000000 ( =  23,958,233 ×     0)
     71874699  ( =  23,958,233 ×    30)
   191665864   ( =  23,958,233 ×   800)
+ 119791165    ( =  23,958,233 × 5,000)
———————————————
  139676498390 ( = 139,676,498,390)

Other notations

In some countries such as Germany, the above multiplication is depicted similarly but with the original product kept horizontal and computation starting with the first digit of the multiplier:[1]

23958233 · 5830
———————————————
   119791165
    191665864
      71874699
       00000000
———————————————
   139676498390

The pseudocode below describes the process of the above multiplication. It keeps only a single row to maintain the running sum, which finally becomes the result. Note that the '+=' operator is used to denote sum to existing value and store operation (akin to languages such as Java and C) for compactness.

multiply(a[1..p], b[1..q], base)                            // Operands containing rightmost digits at index 1
  product = [1..p+q]                                        // Allocate space for result, initialized to zero
  for b_i = 1 to q                                          // for all digits in b
    carry = 0
    for a_i = 1 to p                                        // for all digits in a
      product[a_i + b_i - 1] += carry + a[a_i] * b[b_i]
      carry = product[a_i + b_i - 1] / base
      product[a_i + b_i - 1] = product[a_i + b_i - 1] mod base
    product[b_i + p] = carry                               // last digit comes from final carry
  return product
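
A direct rendering of this pseudocode in Python might look as follows (a minimal sketch; the function name and the least-significant-digit-first list convention are illustrative assumptions rather than a standard library interface):

def multiply(a, b, base=10):
    """Long multiplication of two digit lists, least-significant digit first."""
    product = [0] * (len(a) + len(b))         # result digits, initialized to zero
    for b_i in range(len(b)):                 # for every digit of b
        carry = 0
        for a_i in range(len(a)):             # multiply it by every digit of a
            total = product[a_i + b_i] + carry + a[a_i] * b[b_i]
            product[a_i + b_i] = total % base
            carry = total // base             # integer division extracts the carry
        product[b_i + len(a)] += carry        # last digit of this row comes from the final carry
    return product

# 23,958,233 x 5,830 = 139,676,498,390 (digits stored least-significant first)
assert multiply([3, 3, 2, 8, 5, 9, 3, 2], [0, 3, 8, 5]) == [0, 9, 3, 8, 9, 4, 6, 7, 6, 9, 3, 1]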

Usage in computers

Some chips implement long multiplication, in hardware or in microcode, for various integer and floating-point word sizes. In arbitrary-precision arithmetic, it is common to use long multiplication with the base set to 2^w, where w is the number of bits in a word, for multiplying relatively small numbers. To multiply two numbers with n digits using this method, one needs about n² operations. More formally, multiplying two n-digit numbers using long multiplication requires Θ(n²) single-digit operations (additions and multiplications).

When implemented in software, long multiplication algorithms must deal with overflow during additions, which can be expensive. A typical solution is to represent the number in a small base, b, such that, for example, 8b is a representable machine integer. Several additions can then be performed before an overflow occurs. When the number becomes too large, we add part of it to the result, or we carry and map the remaining part back to a number that is less than b. This process is called normalization. Richard Brent used this approach in his Fortran package, MP.[2]

Computers initially used a very similar algorithm to long multiplication in base 2, but modern processors have optimized circuitry for fast multiplications using more efficient algorithms, at the price of a more complex hardware realization.[citation needed] In base two, long multiplication is sometimes called "shift and add", because the algorithm simplifies and just consists of shifting left (multiplying by powers of two) and adding. Most currently available microprocessors implement this or other similar algorithms (such as Booth encoding) for various integer and floating-point sizes in hardware multipliers or in microcode.[citation needed]

On currently available processors, a bit-wise shift instruction is usually (but not always) faster than a multiply instruction and can be used to multiply (shift left) and divide (shift right) by powers of two. Multiplication by a constant and division by a constant can be implemented using a sequence of shifts and adds or subtracts. For example, there are several ways to multiply by 10 using only bit-shift and addition.

 ((x << 2) + x) << 1 # Here 10*x is computed as (x*2^2 + x)*2
 (x << 3) + (x << 1) # Here 10*x is computed as x*2^3 + x*2

In some cases such sequences of shifts and adds or subtracts will outperform hardware multipliers and especially dividers. A division by a number of the form 2^n or 2^n ± 1 can often be converted to such a short sequence.

Algorithms for multiplying by hand

In addition to the standard long multiplication, there are several other methods used to perform multiplication by hand. Such algorithms may be devised for speed, ease of calculation, or educational value, particularly when computers or multiplication tables are unavailable.

Grid method

The grid method (or box method) is an introductory method for multiple-digit multiplication that is often taught to pupils at primary school or elementary school. It has been a standard part of the national primary school mathematics curriculum in England and Wales since the late 1990s.[3]

Both factors are broken up ("partitioned") into their hundreds, tens and units parts, and the products of the parts are then calculated explicitly in a relatively simple multiplication-only stage, before these contributions are then totalled to give the final answer in a separate addition stage.

The calculation 34 × 13, for example, could be computed using the grid:

  ×     30     4
 10    300    40
  3     90    12

followed by addition to obtain 442, either in a single sum

  300
   40
   90
 + 12
 ————
  442

or through forming the row-by-row totals

(300 + 40) + (90 + 12) = 340 + 102 = 442.

This calculation approach (though not necessarily with the explicit grid arrangement) is also known as the partial products algorithm. Its essence is the calculation of the simple multiplications separately, with all addition being left to the final gathering-up stage.

The grid method can in principle be applied to factors of any size, although the number of sub-products becomes cumbersome as the number of digits increases. Nevertheless, it is seen as a usefully explicit method to introduce the idea of multiple-digit multiplications; and, in an age when most multiplication calculations are done using a calculator or a spreadsheet, it may in practice be the only multiplication algorithm that some students will ever need.
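
The partial-products idea behind the grid method can be sketched in a few lines of Python (an illustrative sketch, not a prescribed classroom procedure; the helper names are my own):

def place_value_parts(n):
    """Split a number into its place-value parts, e.g. 34 -> [30, 4]."""
    digits = str(n)
    return [int(d) * 10 ** (len(digits) - i - 1)
            for i, d in enumerate(digits) if d != "0"]

def grid_multiply(x, y):
    """Form every partial product of the place-value parts, then sum them."""
    partials = [px * py for px in place_value_parts(x) for py in place_value_parts(y)]
    return partials, sum(partials)

partials, total = grid_multiply(34, 13)
print(partials)   # [300, 90, 40, 12]
print(total)      # 442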

Lattice multiplication

First, set up the grid by marking its rows and columns with the numbers to be multiplied. Then, fill in the boxes with tens digits in the top triangles and units digits on the bottom. Finally, sum along the diagonal tracts and carry as needed to get the answer.

Lattice, or sieve, multiplication is algorithmically equivalent to long multiplication. It requires the preparation of a lattice (a grid drawn on paper) which guides the calculation and separates all the multiplications from the additions. It was introduced to Europe in 1202 in Fibonacci's Liber Abaci. Fibonacci described the operation as mental, using his right and left hands to carry the intermediate calculations. Matrakçı Nasuh presented 6 different variants of this method in his 16th-century book, Umdet-ul Hisab. It was widely used in Enderun schools across the Ottoman Empire.[4] Napier's bones, or Napier's rods, also used this method, as published by Napier in 1617, the year of his death.

As shown in the example, the multiplicand and multiplier are written above and to the right of a lattice, or a sieve. It is found in Muhammad ibn Musa al-Khwarizmi's "Arithmetic", one of Leonardo's sources mentioned by Sigler, author of "Fibonacci's Liber Abaci", 2002.[citation needed]

  • During the multiplication phase, the lattice is filled in with two-digit products of the corresponding digits labeling each row and column: the tens digit goes in the top-left corner.
  • During the addition phase, the lattice is summed on the diagonals.
  • Finally, if a carry phase is necessary, the answer as shown along the left and bottom sides of the lattice is converted to normal form by carrying tens digits as in long addition or multiplication.

Example

As a simple example, one can calculate 345 × 12 this way. For a more complicated example, consider the diagram below displaying the computation of 23,958,233 multiplied by 5,830 (the multiplier); the result is 139,676,498,390. Notice that 23,958,233 is along the top of the lattice and 5,830 is along the right side. The products fill the lattice, and the sums of those products (taken along the diagonals) appear along the left and bottom sides. Those sums are then totaled as shown.

     2   3   9   5   8   2   3   3
   +---+---+---+---+---+---+---+---+-
   |1 /|1 /|4 /|2 /|4 /|1 /|1 /|1 /|
   | / | / | / | / | / | / | / | / | 5
 01|/ 0|/ 5|/ 5|/ 5|/ 0|/ 0|/ 5|/ 5|
   +---+---+---+---+---+---+---+---+-
   |1 /|2 /|7 /|4 /|6 /|1 /|2 /|2 /|
   | / | / | / | / | / | / | / | / | 8
 02|/ 6|/ 4|/ 2|/ 0|/ 4|/ 6|/ 4|/ 4|
   +---+---+---+---+---+---+---+---+-
   |0 /|0 /|2 /|1 /|2 /|0 /|0 /|0 /|
   | / | / | / | / | / | / | / | / | 3
 17|/ 6|/ 9|/ 7|/ 5|/ 4|/ 6|/ 9|/ 9|
   +---+---+---+---+---+---+---+---+-
   |0 /|0 /|0 /|0 /|0 /|0 /|0 /|0 /|
   | / | / | / | / | / | / | / | / | 0
 24|/ 0|/ 0|/ 0|/ 0|/ 0|/ 0|/ 0|/ 0|
   +---+---+---+---+---+---+---+---+-
     26  15  13  18  17  13  09  00
 01
 002
 0017
 00024
 000026
 0000015
 00000013
 000000018
 0000000017
 00000000013
 000000000009
 0000000000000
 —————————————
  139676498390
= 139,676,498,390
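
In code, the lattice is just a table of digit products that is summed along its diagonals; a minimal Python sketch of that idea (not a reproduction of the paper layout) might be:

def lattice_multiply(x, y):
    """Multiply two non-negative integers by summing digit products along diagonals."""
    xd = [int(d) for d in str(x)][::-1]          # digits, least significant first
    yd = [int(d) for d in str(y)][::-1]
    diagonals = [0] * (len(xd) + len(yd))
    for i, dx in enumerate(xd):                  # cell (i, j) holds dx * dy and lies
        for j, dy in enumerate(yd):              # on diagonal i + j
            diagonals[i + j] += dx * dy
    result, carry = [], 0
    for s in diagonals:                          # sum each diagonal, carrying as needed
        carry, digit = divmod(s + carry, 10)
        result.append(digit)
    return int("".join(map(str, result[::-1])))

print(lattice_multiply(23958233, 5830))          # 139676498390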

Russian peasant multiplication

The binary method is also known as peasant multiplication, because it has been widely used by people who are classified as peasants and thus have not memorized the multiplication tables required for long multiplication.[5][failed verification] The algorithm was in use in ancient Egypt.[6] Its main advantages are that it can be taught quickly, requires no memorization, and can be performed using tokens, such as poker chips, if paper and pencil aren't available. The disadvantage is that it takes more steps than long multiplication, so it can be unwieldy for large numbers.

Description

On paper, write down in one column the numbers you get when you repeatedly halve the multiplier, ignoring the remainder; in a column beside it repeatedly double the multiplicand. Cross out each row in which the last digit of the first number is even, and add the remaining numbers in the second column to obtain the product.

Examples

This example uses peasant multiplication to multiply 11 by 3 to arrive at a result of 33.

Decimal:     Binary:
11   3       1011  11
5    6       101  110
2   12       10  1100
1   24       1  11000
    ——         ——————
    33         100001

Describing the steps explicitly:

  • 11 and 3 are written at the top
  • 11 is halved (5.5) and 3 is doubled (6). The fractional portion is discarded (5.5 becomes 5).
  • 5 is halved (2.5) and 6 is doubled (12). The fractional portion is discarded (2.5 becomes 2). The figure in the left column (2) is even, so the figure in the right column (12) is discarded.
  • 2 is halved (1) and 12 is doubled (24).
  • All not-scratched-out values are summed: 3 + 6 + 24 = 33.

The method works because multiplication is distributive, so:

3 × 11 = 3 × (1 × 2³ + 0 × 2² + 1 × 2¹ + 1 × 2⁰)
       = 3 × (8 + 2 + 1)
       = (3 × 8) + (3 × 2) + (3 × 1)
       = 24 + 6 + 3
       = 33
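
The halve-and-double procedure translates directly into a short loop; here is an illustrative Python sketch (the function name is my own):

def peasant_multiply(multiplier, multiplicand):
    """Russian peasant multiplication: halve one number, double the other, and
    add the doubled values from the rows where the halved value is odd."""
    total = 0
    while multiplier >= 1:
        if multiplier % 2 == 1:          # odd row: keep this doubled value
            total += multiplicand
        multiplier //= 2                 # halve, discarding the remainder
        multiplicand *= 2                # double
    return total

print(peasant_multiply(11, 3))           # 33
print(peasant_multiply(5830, 23958233))  # 139676498390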

A more complicated example, using the figures from the earlier examples (23,958,233 and 5,830):

Decimal:             Binary:
5830  23958233       1011011000110  1011011011001001011011001
2915  47916466       101101100011  10110110110010010110110010
1457  95832932       10110110001  101101101100100101101100100
728  191665864       1011011000  1011011011001001011011001000
364  383331728       101101100  10110110110010010110110010000
182  766663456       10110110  101101101100100101101100100000
91  1533326912       1011011  1011011011001001011011001000000
45  3066653824       101101  10110110110010010110110010000000
22  6133307648       10110  101101101100100101101100100000000
11 12266615296       1011  1011011011001001011011001000000000
5  24533230592       101  10110110110010010110110010000000000
2  49066461184       10  101101101100100101101100100000000000
1  98132922368       1  1011011011001001011011001000000000000
  ————————————          1022143253354344244353353243222210110 (before carry)
  139676498390         10000010000101010111100011100111010110

Quarter square multiplication

This formula can in some cases be used to make multiplication tasks easier to complete:

((x + y)/2)² − ((x − y)/2)² = ((x + y)² − (x − y)²)/4 = xy

In the case where x and y are integers, we have that

⌊(x + y)²/4⌋ − ⌊(x − y)²/4⌋ = ((x + y)² − (x − y)²)/4

because x + y and x − y are either both even or both odd. This means that

xy = ⌊(x + y)²/4⌋ − ⌊(x − y)²/4⌋

and it's sufficient to (pre-)compute the integral part of squares divided by 4, as in the following example.

Examples

Below is a lookup table of quarter squares with the remainder discarded for the digits 0 through 18; this allows for the multiplication of numbers up to 9×9.

n        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
⌊n²/4⌋   0   0   1   2   4   6   9  12  16  20  25  30  36  42  49  56  64  72  81

If, for example, you wanted to multiply 9 by 3, you observe that the sum and difference are 12 and 6 respectively. Looking both those values up on the table yields 36 and 9, the difference of which is 27, which is the product of 9 and 3.
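
A small Python sketch of the table-lookup approach (the table covers 0 through 18, matching the digit table above; the names are illustrative):

# Quarter-square lookup table for n = 0..18, enough to multiply numbers up to 9 x 9.
quarter_squares = [n * n // 4 for n in range(19)]

def quarter_square_multiply(a, b):
    """Multiply via ab = floor((a+b)^2/4) - floor((a-b)^2/4), using the table."""
    return quarter_squares[a + b] - quarter_squares[abs(a - b)]

print(quarter_square_multiply(9, 3))   # 27: the table gives 36 - 9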

History of quarter square multiplication

Quarter square multiplication using the floor function is ancient; some sources[7][8] attribute it to Babylonian mathematics (2000–1600 BC).

Antoine Voisin published a table of quarter squares from 1 to 1000 in 1817 as an aid in multiplication. A larger table of quarter squares from 1 to 100000 was published by Samuel Laundy in 1856,[9] and a table from 1 to 200000 by Joseph Blater in 1888.[10]

Quarter square multipliers were used in analog computers to form an analog signal that was the product of two analog input signals. In this application, the sum and difference of two input voltages are formed using operational amplifiers. The square of each of these is approximated using piecewise linear circuits. Finally the difference of the two squares is formed and scaled by a factor of one fourth using yet another operational amplifier.

In 1980, Everett L. Johnson proposed using the quarter square method in a digital multiplier.[11] To form the product of two 8-bit integers, for example, the digital device forms the sum and difference, looks both quantities up in a table of squares, takes the difference of the results, and divides by four by shifting two bits to the right. For 8-bit integers the table of quarter squares will have 2⁹ − 1 = 511 entries (one entry for the full range 0..510 of possible sums, the differences using only the first 256 entries in range 0..255) or 2⁹ − 1 = 511 entries (using for negative differences the technique of 2-complements and 9-bit masking, which avoids testing the sign of differences), each entry being 16-bit wide (the entry values are from (0²/4) = 0 to (510²/4) = 65025).
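
A software rendering of that lookup scheme in Python (illustrative only; Johnson's design is a hardware circuit, and the names here are my own):

# 511-entry table of quarter squares; every entry fits in 16 bits (max 510*510//4 = 65025).
qsq = [n * n // 4 for n in range(511)]

def mul8(a, b):
    """Multiply two 8-bit integers via the quarter-square identity."""
    assert 0 <= a <= 255 and 0 <= b <= 255
    return qsq[a + b] - qsq[abs(a - b)]

print(mul8(200, 123))   # 24600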

The quarter square multiplier technique has benefited 8-bit systems that do not have any support for a hardware multiplier. Charles Putney implemented this for the 6502.[12]

Computational complexity of multiplication

Unsolved problem in computer science
What is the fastest algorithm for multiplication of two n-digit numbers?

A line of research in theoretical computer science is about the number of single-bit arithmetic operations necessary to multiply two n-bit integers. This is known as the computational complexity of multiplication. Usual algorithms done by hand have asymptotic complexity of O(n²), but in 1960 Anatoly Karatsuba discovered that better complexity was possible (with the Karatsuba algorithm).[13]

Currently, the algorithm with the best computational complexity is a 2019 algorithm of David Harvey and Joris van der Hoeven, which builds on the number-theoretic-transform strategies introduced with the Schönhage–Strassen algorithm to multiply integers using only O(n log n) operations.[14] This is conjectured to be the best possible algorithm, but lower bounds of Ω(n log n) are not known.

Karatsuba multiplication

Karatsuba multiplication is an O(n^(log₂ 3)) ≈ O(n^1.585) divide-and-conquer algorithm that uses recursion to merge together sub-calculations.

Rewriting the product formula exposes sub-calculations that can be performed recursively, and this recursion is what makes the method fast.

Let x and y be represented as n-digit strings in some base B. For any positive integer m less than n, one can write the two given numbers as

x = x1 B^m + x0,
y = y1 B^m + y0,

where x0 and y0 are less than B^m. The product is then

xy = (x1 B^m + x0)(y1 B^m + y0) = x1 y1 B^(2m) + (x1 y0 + x0 y1) B^m + x0 y0 = z2 B^(2m) + z1 B^m + z0,

where

z2 = x1 y1,
z1 = x1 y0 + x0 y1,
z0 = x0 y0.

These formulae require four multiplications and were known to Charles Babbage.[15] Karatsuba observed that xy can be computed in only three multiplications, at the cost of a few extra additions. With z2 and z0 as before one can observe that

z1 = x1 y0 + x0 y1 = (x1 + x0)(y1 + y0) − x1 y1 − x0 y0 = (x1 + x0)(y1 + y0) − z2 − z0.

Because of the overhead of recursion, Karatsuba's multiplication is slower than long multiplication for small values of n; typical implementations therefore switch to long multiplication for small values of n.
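
A recursive Python sketch of the three-multiplication scheme (the base case and the decimal split point are illustrative choices):

def karatsuba(x, y):
    """Karatsuba multiplication of non-negative integers."""
    if x < 10 or y < 10:                       # small enough: multiply directly
        return x * y
    m = max(len(str(x)), len(str(y))) // 2     # number of low-order digits to split off
    x1, x0 = divmod(x, 10 ** m)
    y1, y0 = divmod(y, 10 ** m)
    z2 = karatsuba(x1, y1)
    z0 = karatsuba(x0, y0)
    z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0   # equals x1*y0 + x0*y1, with one multiplication
    return z2 * 10 ** (2 * m) + z1 * 10 ** m + z0

print(karatsuba(1234, 5678))   # 7006652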

General case with multiplication of N numbers

By expanding the product of N numbers, each split as x_i = x_{i,1} B^m + x_{i,0}, one sees the following pattern: the expansion is a sum of 2^N terms, and each summand is associated with a unique binary number from 0 to 2^N − 1 whose i-th bit records whether the high part (1) or the low part (0) of the i-th factor appears in that term; for example, with four factors x, y, z, a, the term x1 y1 z1 a1 corresponds to 1111 and x1 y0 z1 a0 to 1010. Furthermore, B is raised to m times the number of 1s in that binary string.

If we express this in fewer terms, we get:

x_1 · x_2 ⋯ x_N = Σ_{j=0}^{2^N − 1} B^(m · s(j)) · Π_{i=1}^{N} x_{i, j_i},

where j_i denotes the bit of j in position i (selecting the high or low part of the i-th number) and s(j) is the number of 1 bits in j.

History

Karatsuba's algorithm was the first known algorithm for multiplication that is asymptotically faster than long multiplication,[16] and can thus be viewed as the starting point for the theory of fast multiplications.

Toom–Cook

Another method of multiplication is called Toom–Cook or Toom-3. The Toom–Cook method splits each number to be multiplied into multiple parts. The Toom–Cook method is one of the generalizations of the Karatsuba method. A three-way Toom–Cook can do a size-3N multiplication for the cost of five size-N multiplications. This accelerates the operation by a factor of 9/5, while the Karatsuba method accelerates it by 4/3.

Although using more and more parts can reduce the time spent on recursive multiplications further, the overhead from additions and digit management also grows. For this reason, the method of Fourier transforms is typically faster for numbers with several thousand digits, and asymptotically faster for even larger numbers.

Schönhage–Strassen

Demonstration of multiplying 1234 × 5678 = 7006652 using fast Fourier transforms (FFTs). Number-theoretic transforms in the integers modulo 337 are used, selecting 85 as an 8th root of unity. Base 10 is used in place of base 2^w for illustrative purposes.

Every number in base B can be written as a polynomial:

X = Σ_i x_i B^i

Furthermore, multiplication of two numbers can be thought of as a product of two polynomials:

XY = (Σ_i x_i B^i)(Σ_j y_j B^j)

Since the coefficient of B^k in the product is the convolution sum c_k = Σ_{i+j=k} x_i y_j, one has a convolution, and one can use the fast Fourier transform (FFT) to compute it: transform both coefficient sequences, multiply them pointwise, and transform back.

Therefore, the multiplication is reduced to an FFT, O(n) pointwise multiplications, and an inverse FFT, which results in a time complexity of O(n log(n) log(log n)).
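
As an illustration of this reduction (not the exact Schönhage–Strassen algorithm, which uses an exact number-theoretic transform instead of floating-point FFTs), one can multiply moderately sized integers in Python with numpy by convolving their digit sequences and then propagating carries:

import numpy as np

def fft_multiply(x, y, base=10):
    """Multiply non-negative integers via FFT convolution of their digit sequences.
    Floating-point rounding limits this sketch to moderate sizes."""
    a = [int(d) for d in str(x)][::-1]          # digits, least significant first
    b = [int(d) for d in str(y)][::-1]
    n = 1
    while n < len(a) + len(b):                  # pad to a power of two
        n *= 2
    fa, fb = np.fft.fft(a, n), np.fft.fft(b, n)
    coeffs = np.rint(np.fft.ifft(fa * fb).real).astype(np.int64)   # convolution c_k
    result, carry = [], 0
    for c in coeffs:                            # propagate carries digit by digit
        carry, digit = divmod(int(c) + carry, base)
        result.append(digit)
    while carry:
        carry, digit = divmod(carry, base)
        result.append(digit)
    return int("".join(map(str, result[::-1])) or "0")

print(fft_multiply(1234, 5678))   # 7006652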

History

The algorithm was invented by Strassen (1968). It was made practical and theoretical guarantees were provided in 1971 by Schönhage and Strassen resulting in the Schönhage–Strassen algorithm.[17]

Further improvements

In 2007 the asymptotic complexity of integer multiplication was improved by the Swiss mathematician Martin Fürer of Pennsylvania State University to O(n log n 2^(Θ(log* n))) using Fourier transforms over complex numbers,[18] where log* denotes the iterated logarithm. Anindya De, Chandan Saha, Piyush Kurur and Ramprasad Saptharishi gave a similar algorithm using modular arithmetic in 2008 achieving the same running time.[19] In context of the above material, what these latter authors have achieved is to find N much less than 2^(3k) + 1, so that Z/NZ has a (2^m)th root of unity. This speeds up computation and reduces the time complexity. However, these latter algorithms are only faster than Schönhage–Strassen for impractically large inputs.

In 2014, Harvey, Joris van der Hoeven and Lecerf[20] gave a new algorithm that achieves a running time of O(n log n · 2^(3 log* n)), making explicit the implied constant in the exponent. They also proposed a variant of their algorithm which achieves O(n log n · 2^(2 log* n)) but whose validity relies on standard conjectures about the distribution of Mersenne primes. In 2016, Covanov and Thomé proposed an integer multiplication algorithm based on a generalization of Fermat primes that conjecturally achieves a complexity bound of O(n log n · 2^(2 log* n)). This matches the 2015 conditional result of Harvey, van der Hoeven, and Lecerf but uses a different algorithm and relies on a different conjecture.[21] In 2018, Harvey and van der Hoeven used an approach based on the existence of short lattice vectors guaranteed by Minkowski's theorem to prove an unconditional complexity bound of O(n log n · 2^(2 log* n)).[22]

In March 2019, David Harvey and Joris van der Hoeven announced their discovery of an O(n log n) multiplication algorithm.[23] It was published in the Annals of Mathematics in 2021.[24] Because Schönhage and Strassen predicted that n log(n) is the "best possible" result, Harvey said: "... our work is expected to be the end of the road for this problem, although we don't know yet how to prove this rigorously."[25]

Lower bounds

There is a trivial lower bound of Ω(n) for multiplying two n-bit numbers on a single processor; no matching algorithm (on conventional machines, that is on Turing equivalent machines) nor any sharper lower bound is known. The Hartmanis–Stearns conjecture would imply that O(n) cannot be achieved. Multiplication lies outside of AC⁰[p] for any prime p, meaning there is no family of constant-depth, polynomial (or even subexponential) size circuits using AND, OR, NOT, and MOD_p gates that can compute a product. This follows from a constant-depth reduction of MOD_q to multiplication.[26] Lower bounds for multiplication are also known for some classes of branching programs.[27]

Complex number multiplication

Complex multiplication normally involves four multiplications and two additions:

(a + bi)(c + di) = ac + adi + bci + bdi² = (ac − bd) + (bc + ad)i.

As observed by Peter Ungar in 1963, one can reduce the number of multiplications to three, using essentially the same computation as Karatsuba's algorithm.[28] The product (a + bi) · (c + di) can be calculated in the following way.

k1 = c · (a + b)
k2 = a · (d − c)
k3 = b · (c + d)
Real part = k1 − k3
Imaginary part = k1 + k2.

This algorithm uses only three multiplications, rather than four, and five additions or subtractions rather than two. If a multiply is more expensive than three adds or subtracts, as when calculating by hand, then there is a gain in speed. On modern computers a multiply and an add can take about the same time so there may be no speed gain. There is a trade-off in that there may be some loss of precision when using floating point.

For fast Fourier transforms (FFTs) (or any linear transformation) the complex multiplies are by constant coefficients c + di (called twiddle factors in FFTs), in which case two of the additions (d − c and c + d) can be precomputed. Hence, only three multiplies and three adds are required.[29] However, trading off a multiplication for an addition in this way may no longer be beneficial with modern floating-point units.[30]
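
A small Python sketch of the three-multiplication scheme, written with explicit real and imaginary parts so the multiplication count is visible (the function name is my own):

def complex_multiply_3(a, b, c, d):
    """Compute (a + bi)(c + di) using three real multiplications."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2          # (real part, imaginary part)

print(complex_multiply_3(1, 2, 3, 4))   # (-5, 10), i.e. (1 + 2i)(3 + 4i) = -5 + 10i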

Polynomial multiplication

All the above multiplication algorithms can also be expanded to multiply polynomials. Alternatively the Kronecker substitution technique may be used to convert the problem of multiplying polynomials into a single binary multiplication.[31]

Long multiplication methods can be generalised to allow the multiplication of algebraic formulae:

 14ac - 3ab + 2 multiplied by ac - ab + 1
 14ac  -3ab   2
   ac   -ab   1
 ————————————————————
 14a²c²  -3a²bc   2ac
        -14a²bc         3a²b²   -2ab
                 14ac           -3ab   2
 ———————————————————————————————————————
 14a²c² -17a²bc   16ac  3a²b²    -5ab  +2
 =======================================[32]

As a further example of column based multiplication, consider multiplying 23 long tons (t), 12 hundredweight (cwt) and 2 quarters (qtr) by 47. This example uses avoirdupois measures: 1 t = 20 cwt, 1 cwt = 4 qtr.

    t    cwt  qtr
   23     12    2
               47 x
 ————————————————
  141     94   94
  940    470
   29     23
 ————————————————
 1110    587   94
 ————————————————
 1110      7    2
 =================  Answer: 1110 ton 7 cwt 2 qtr

First multiply the quarters by 47, the result 94 is written into the first workspace. Next, multiply cwt 12*47 = (2 + 10)*47 but don't add up the partial results (94, 470) yet. Likewise multiply 23 by 47 yielding (141, 940). The quarters column is totaled and the result placed in the second workspace (a trivial move in this case). 94 quarters is 23 cwt and 2 qtr, so place the 2 in the answer and put the 23 in the next column left. Now add up the three entries in the cwt column giving 587. This is 29 t 7 cwt, so write the 7 into the answer and the 29 in the column to the left. Now add up the tons column. There is no adjustment to make, so the result is just copied down.

The same layout and methods can be used for any traditional measurements and non-decimal currencies such as the old British £sd system.

from Grokipedia
A multiplication algorithm is a systematic procedure for computing the product of two mathematical objects, such as integers or polynomials, through a finite sequence of arithmetic operations, with variants optimized for efficiency in time and space depending on the operand size and the computing context. These algorithms form a cornerstone of computer arithmetic, enabling everything from basic arithmetic in processors to high-precision calculations in cryptography and scientific simulations.

The most straightforward multiplication algorithm, often taught in elementary education, employs the long multiplication method, which generates partial products by multiplying the multiplicand by each digit of the multiplier and then summing them with appropriate shifts, achieving a time complexity of O(n²) for n-digit numbers. This approach is simple to implement in software and hardware but becomes inefficient for large n, prompting the development of faster alternatives. In hardware designs, such as those in digital processors, shift-and-add techniques extend this by iteratively shifting the multiplicand and adding it to an accumulator based on multiplier bits, typically requiring O(n) cycles for n-bit operands. Optimizations like Booth's recoding reduce the number of additions by encoding the multiplier to handle sequences of identical bits efficiently, using values like -1, 0, or +1, which cuts partial additions by nearly half on average.

For larger numbers, divide-and-conquer strategies offer asymptotic improvements over the quadratic baseline. The Karatsuba algorithm, devised by Anatolii Karatsuba in 1962, recursively splits operands into halves and computes the product using only three multiplications of half-sized numbers instead of four, plus some additions and shifts, yielding a complexity of O(n^(log_2 3)) ≈ O(n^1.585). This breakthrough inspired generalizations like the Toom-Cook algorithm, which evaluates polynomials at multiple points for even better performance in practice for moderate sizes. For extremely large integers, fast Fourier transform (FFT)-based methods dominate, with the Schönhage-Strassen algorithm (1971) achieving O(n log n log log n) time by representing numbers as polynomials, performing the convolution via FFT over rings with roots of unity, and managing carry propagation. Recent refinements, such as Fürer's algorithm (2007) and Harvey and van der Hoeven's algorithm (2019), have pushed closer to the theoretical O(n log n) limit, influencing libraries like GMP for arbitrary-precision arithmetic.

In hardware contexts, advanced multipliers employ parallel structures to reduce latency. Array multipliers use a grid of full adders to generate and sum partial products in a regular, pipelined fashion, suitable for VLSI implementation but with O(n) delay. Tree multipliers, such as Wallace or Dadda trees, leverage carry-save adders (CSAs) to compress partial products in logarithmic depth, achieving O(log n) latency at the cost of increased area, and are common in modern CPUs and GPUs for floating-point units. High-radix variants, processing multiple bits per cycle (e.g., radix-4 or -8), further minimize iterations using multiplexers and precomputed multiples, balancing speed and complexity in embedded systems. Overall, the evolution of multiplication algorithms reflects ongoing trade-offs between theoretical efficiency, practical implementation, and application-specific constraints.

Manual Multiplication Methods

Long Multiplication

Long multiplication, also known as the grade-school or standard multiplication algorithm, is a manual method for computing the product of two multi-digit numbers by generating partial products for each digit of the multiplier and summing them with appropriate shifts for place value. This approach leverages the distributivity of multiplication over addition, treating the multiplier's digits as contributions scaled by powers of the base (typically 10 in decimal arithmetic). It is particularly effective for pencil-and-paper calculations involving numbers up to several digits long.

The step-by-step process begins by aligning the two numbers vertically, with the multiplicand (the number being multiplied) placed above the multiplier (the number doing the multiplying), both right-justified. Starting with the units digit of the multiplier, multiply it by each digit of the multiplicand from right to left, recording the partial product below the line and handling any carries by adding 1 to the next digit's product if it reaches or exceeds the base. Shift this partial product one position to the left (adding a zero) for the tens digit of the multiplier, and repeat the multiplication and shifting for each subsequent digit, increasing the shift by one position per place value. Finally, add all the partial products column by column from right to left, again managing carries explicitly to obtain the total product. This method ensures accurate place value alignment and systematic error checking through intermediate steps.

A detailed example illustrates the process for multiplying 123 by 456. First, multiply 123 by 6 (units digit of 456), yielding 738 (123 × 6 = 738). Next, multiply 123 by 50 (tens digit, shifted left by one), giving 6150 (123 × 5 = 615, then 0). Then, multiply 123 by 400 (hundreds digit, shifted left by two), resulting in 49200 (123 × 4 = 492, then two 0s). Add the partial products: 738 + 6150 = 6888; 6888 + 49200 = 56088. Thus, 123 × 456 = 56088. Carries are handled within each partial product multiplication, such as when 3 × 6 = 18 (write 8, carry 1 to the next column).

   123
 × 456
 -----
   738    (123 × 6)
  6150    (123 × 50, shifted)
 49200    (123 × 400, shifted)
 -----
 56088

Variations in notation emphasize vertical alignment to maintain place values, with the multiplicand and multiplier stacked and a multiplication symbol (×) placed between them. Some presentations use diagonal lines or arrows to visually connect the digits being multiplied, highlighting the "cross-multiplications" between corresponding positions, though this is optional for clarity. Carries are often noted explicitly above the digits or in a separate row during partial product formation and the final addition to avoid errors in manual computation. For hand calculations with n-digit numbers, long multiplication requires approximately n² single-digit multiplications and additions, resulting in O(n²) time complexity, which is efficient enough for typical educational and everyday use but scales poorly for very large numbers. Its structured, repeatable steps make it highly suitable for instruction in schools, as it builds understanding of place value and partial products without requiring advanced tools.

Grid Method

The grid method, also known as the box method or area model, is a visual technique for multiplying multi-digit numbers by decomposing them into place values and organizing partial products in a tabular format. To construct the grid, the digits of each factor are separated by place value—for instance, a two-digit number like 23 is broken into 20 and 3—creating rows for one factor's components and columns for the other's, forming cells at each intersection. Each cell is then filled with the product of the corresponding row and column values, representing partial products shifted appropriately for place value. The summation process involves adding the partial products from the grid, typically by summing the values in the cells while carrying over any values exceeding 9 or 99 as needed to the next position. This step-by-step aggregation ensures the total product is calculated without losing track of tens, hundreds, or higher places. For example, to multiply 23 by 14, decompose 23 into 20 and 3 (rows) and 14 into 10 and 4 (columns), forming a 2×2 grid:
 ×    10     4
20   200    80
 3    30    12
Summing the cells—200 + 80 + 30 + 12—yields 322. This method shares conceptual similarities with long multiplication by relying on partial products but organizes them spatially for simultaneous visualization rather than sequential computation. It is particularly advantageous for beginners, as the grid visually separates place values and partial products, reducing errors in alignment and fostering a deeper understanding of how place-value decomposition contributes to the overall product.

Lattice Multiplication

Lattice multiplication, also known as the gelosia method, is a visual technique for multiplying multi-digit numbers that employs a grid with diagonal lines in each cell to separate the units and tens components of partial products, allowing for organized addition along parallel diagonals. This method emphasizes the separation of the multiplication and addition steps, providing a structured way to handle place values and carries, which aids in error detection and conceptual understanding of the process.

The lattice method traces its origins to early Hindu mathematics, where it appeared in texts such as the 16th-century commentary by Ganesa on Bhaskara II's Lilavati, and was subsequently adopted in Arab mathematics as the "method of the sieve" or "method of the net" following the 12th century. It reached Europe in the late medieval period, with the earliest printed account in the Italian Treviso Arithmetic of 1478, earning the name gelosia from the lattice-patterned window screens resembling jealous guards against onlookers. The technique is also featured in Vedic mathematics, a system drawing from ancient Indian traditions to simplify computations.

To construct the grid, align the multiplicand's digits across the top of a rectangular grid and the multiplier's digits along the left side, creating a cell for each digit pair, with each cell containing a diagonal line from upper right to lower left. Starting with the highest digit of the multiplier at the top of the left column and proceeding downward, multiply the digits for each cell, placing the units digit of the product below the diagonal and the tens digit above; this splitting inherently accounts for place values based on the row and column positions. The lattice serves as a precursor to the grid method by incorporating these diagonal splits for carries, unlike simpler grids that lack such built-in place value separation. Summing occurs along the diagonals parallel to the cell dividers, starting from the bottom-right corner and moving upward-left, where each diagonal may span multiple cells depending on the numbers' lengths; add the digits along each diagonal, carrying over any sum of 10 or more to the next diagonal. The resulting product is formed by reading the diagonal sums from the uppermost (leftmost) to the lowermost (rightmost), placing the rightmost sum as the units digit and proceeding leftward, ignoring a leading zero if the top diagonal sums to 10 or more after carry.

For instance, to compute 89 × 92, draw a 2×2 grid with 8 and 9 across the top, and 9 (tens digit) at the top of the left column and 2 (units digit) at the bottom. In the top-left cell, 8 × 9 = 72, so write 2 below the diagonal and 7 above. In the top-right cell, 9 × 9 = 81, so write 1 below and 8 above. In the bottom-left cell, 8 × 2 = 16, so write 6 below and 1 above. In the bottom-right cell, 9 × 2 = 18, so write 8 below and 1 above. The diagonal sums are: units place (bottom-right below): 8. Tens place (bottom-right above + bottom-left below + top-right below): 1 + 6 + 1 = 8. Hundreds place (bottom-left above + top-left below + top-right above): 1 + 2 + 8 = 11 (write 1, carry 1). Thousands place (top-left above + carry): 7 + 1 = 8. Thus, the product is 8188.

Russian Peasant Multiplication

The Russian peasant multiplication, also known as the Egyptian or Ethiopian multiplication method, is an ancient technique for performing multiplication through repeated doubling and halving, without relying on multiplication tables or decimal place values. This approach was documented in the Rhind Mathematical Papyrus from around 1650 BCE and later observed among Russian peasants in the 19th century, where it gained its modern name due to its simplicity for those without formal arithmetic training. It proved particularly efficient in cultures lacking widespread use of Hindu-Arabic numerals, allowing calculations with basic operations like addition and division by 2.

The algorithm proceeds as follows: Select one number as the multiplier (to be halved) and the other as the multiplicand (to be doubled). Create two columns: in the first, repeatedly halve the multiplier, discarding any remainder (i.e., use integer division) until reaching zero; in the second, simultaneously double the multiplicand for each step. For each row where the halved value is odd, mark or note the corresponding doubled value. Continue until the halved value is zero, then sum the marked doubled values to obtain the product. This process requires only the ability to double, halve, and add, making it accessible for manual computation.

Mathematically, the method exploits the binary representation of the multiplier: each odd step corresponds to a '1' bit in the binary expansion, effectively summing the multiplicand shifted by powers of 2 (i.e., multiplied by 2^k for each set bit position k). This equivalence to binary decomposition ensures the algorithm's correctness, as multiplication by an integer n is identical to summing the multiplicand multiplied by each power of 2 where n's binary digit is 1. To illustrate, consider the multiplication of 13 by 18, treating 18 as the multiplier to halve:
Halved (18)   Doubled (13)   Mark if Odd
18 (even)      13
 9 (odd)       26             ✓
 4 (even)      52
 2 (even)     104
 1 (odd)      208             ✓
 0            416
Sum the marked values: 26 + 208 = 234, which is 13 × 18. Another example is 25 × 7, halving 7:
Halved (7)   Doubled (25)   Mark if Odd
7 (odd)       25             ✓
3 (odd)       50             ✓
1 (odd)      100             ✓
0            200
Sum: 25 + 50 + 100 = 175, confirming 25 × 7 = 175. This technique bears a resemblance to the binary multiplication process employed in early digital computers, where shifts and additions mimic the doubling and selective summing.

Quarter Square Multiplication

The quarter square multiplication method computes the product ab using the algebraic identity ab = ((a + b)² − (a − b)²)/4, which can be expressed in terms of quarter squares q(n) = n²/4, yielding ab = q(a + b) − q(a − b). This approach transforms multiplication into additions, subtractions, and lookups in a precomputed table of quarter squares, avoiding direct digit-by-digit operations. It relies on the fact that (a + b)² − (a − b)² = 4ab, allowing the product to emerge from the difference of two scaled squares divided by 4.

To apply the procedure, first compute the sum s = a + b and the difference d = a − b (assuming a ≥ b > 0). Look up q(s) and q(d) from a quarter square table, subtract to get q(s) − q(d), and the result is ab. Tables typically list values of q(n) for integers n up to a certain limit, often scaled (e.g., multiplied by 100 or 1000) to avoid fractions. For multi-digit numbers, the method adapts by applying the identity to pairs of digits or using partial products, though this increases table size and lookup complexity for larger operands. For example, to multiply 23 by 17, set a = 23 and b = 17, so s = 40 and d = 6. Assuming a table where q(40) = 400 and q(6) = 9, the product is 400 − 9 = 391. This method was particularly useful with printed tables, as it required only basic arithmetic beyond the lookups.

The origins of quarter square multiplication trace back to ancient mathematics, with the underlying identity appearing in Euclid's Elements (circa 300 BCE), Book II, Proposition 5. Early traces appear in Middle Eastern clay tablets from around 2000 BCE, possibly used for area calculations via half-integer squares. By the 19th century, dedicated quarter square tables proliferated as alternatives to logarithmic tables for multiplication; notable examples include John Leslie's table up to 2,000 in his 1820 The Philosophy of Arithmetic and Samuel Linn Laundy's extensive table up to 100,000 in 1856. These tables often involved collaborative labor, with mathematicians overseeing computations by teams of assistants. The method declined in the 20th century with the advent of electronic calculators and computers, which rendered table lookups obsolete.

Computer Multiplication Algorithms

Binary Multiplication in Computers

Binary multiplication in computers adapts the long multiplication algorithm to binary representation, where operands consist solely of bits valued at 0 or 1, eliminating the need for multi-digit partial products beyond simple AND operations. For each bit position in the multiplier, if the bit is 1, the multiplicand is logically ANDed with the multiplier bit (yielding the multiplicand itself) and shifted left by the bit's position—equivalent to multiplication by powers of 2—before being added to an accumulator; if the bit is 0, no addition occurs for that position. This shift-and-add process is implemented in hardware using barrel shifters for efficient left shifts and adder circuits for accumulation, forming the basis of integer multiplication in central processing units (CPUs). To illustrate, consider multiplying the 4-bit number 1011₂ (decimal 11) by 1101₂ (decimal 13). The multiplier's bits, from least to most significant, are 1, 0, 1, 1. The partial products are generated as follows:
  • For bit 0 (value 1): 1011₂
  • For bit 1 (value 0, shifted left by 1): 0000₂
  • For bit 2 (value 1, shifted left by 2): 101100₂
  • For bit 3 (value 1, shifted left by 3): 1011000₂
Summing these yields 1011₂ + 101100₂ + 1011000₂ = 10001111₂ (decimal 143), confirming 11 × 13 = 143. In hardware implementations, parallel processing accelerates this by generating all partial products simultaneously via an array of AND gates, followed by reduction using adder trees. Array multipliers organize full adders and half adders in a rectangular grid to sum partial products row by row, propagating carries horizontally and vertically for the final product. For faster reduction of multiple partial products, Wallace trees employ a parallel structure of carry-save adders to compress terms logarithmically, reducing the number of adder levels from linear to O(log n) depth. This technique, proposed by C. S. Wallace, minimizes delay in combinational circuits. Wallace trees and array multipliers were foundational in early microprocessors, where simpler designs executed the MUL instruction as shift-and-add sequences in microcode within the arithmetic logic unit (ALU). The time complexity of binary multiplication via this method is O(n²) bit operations for n-bit operands, as it involves n partial products each requiring O(n) work for generation and summation, rendering it efficient for small fixed-width integers in ALUs but prompting advanced algorithms for larger numbers.
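
The partial-product view described above can be sketched in a few lines of Python (an illustration of the principle only; hardware implementations operate on fixed-width registers, and the names here are my own):

def shift_and_add(multiplicand, multiplier):
    """Form one shifted partial product per set bit of the multiplier and sum them."""
    partials = [multiplicand << i for i in range(multiplier.bit_length())
                if (multiplier >> i) & 1]
    return partials, sum(partials)

partials, product = shift_and_add(0b1011, 0b1101)
print([bin(p) for p in partials])   # ['0b1011', '0b101100', '0b1011000']
print(bin(product), product)        # 0b10001111 143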

Karatsuba Multiplication

The Karatsuba multiplication algorithm is a divide-and-conquer method for multiplying large integers that reduces the number of required multiplications from the naive four to three, offering asymptotic improvements over the standard quadratic-time approach. Developed by Soviet mathematician Anatoly Karatsuba in 1960 while a student under Andrey Kolmogorov, the algorithm was initially met with skepticism by Kolmogorov, who had recently conjectured that no sub-quadratic multiplication method existed; however, Karatsuba's result disproved this, and the work was published in 1962 in collaboration with Yuri Ofman. This breakthrough marked the first asymptotically faster integer multiplication algorithm, influencing subsequent developments in computational complexity.

The algorithm operates by splitting each input number into two parts of roughly equal size, assuming the numbers are represented in base B with n digits, where n is even for simplicity. Let x = x_1 B^m + x_0 and y = y_1 B^m + y_0, where m = n/2 and x_1, x_0 < B^m, and similarly for y. Instead of computing the four products x_1 y_1, x_1 y_0, x_0 y_1, and x_0 y_0 as in long multiplication, the method computes only three:

p_1 = x_1 y_1,   p_2 = x_0 y_0,   p_3 = (x_1 + x_0)(y_1 + y_0).

The product is then reconstructed as

xy = p_1 B^(2m) + (p_3 − p_1 − p_2) B^m + p_2,

where the middle term p_3 − p_1 − p_2 = x_1 y_0 + x_0 y_1. This avoids one multiplication at the cost of a few additions and subtractions, which are cheaper operations. The process can be applied recursively to the smaller subproducts for larger numbers.

The time complexity of the Karatsuba algorithm is O(n^(log_2 3)), where log_2 3 ≈ 1.585, significantly better than the O(n²) complexity of schoolbook multiplication for sufficiently large n. This bound arises from the recurrence T(n) = 3T(n/2) + O(n), solved using the master theorem, yielding the sub-quadratic growth; the constant factors are higher than the naive method, so it becomes advantageous only for numbers with hundreds of digits or more.

A concrete example illustrates the algorithm with the multiplication 1234 × 5678, splitting at m = 2 digits (base 10): 1234 = 12 · 10² + 34 and 5678 = 56 · 10² + 78. Compute p_1 = 12 × 56 = 672, p_2 = 34 × 78 = 2652, and p_3 = (12 + 34)(56 + 78) = 46 × 134 = 6164. The middle coefficient is 6164 − 672 − 2652 = 2840, so the product is 672 · 10⁴ + 2840 · 10² + 2652 = 6,720,000 + 284,000 + 2,652 = 7,006,652, matching the direct computation. Recursive application extends the method to arbitrarily large numbers by further subdividing the subproducts.

Toom–Cook Algorithm

The Toom–Cook algorithm is a family of multiplication algorithms that generalize the Karatsuba method by splitting each operand into k parts rather than two, treating them as polynomials of degree k − 1, evaluating at 2k − 1 points, performing multiplications at those points, and interpolating back to obtain the product coefficients. This reduces the number of multiplications from k² to 2k − 1, offering better asymptotic performance for larger k, though with increased additive work in evaluation and interpolation. Andrei Toom introduced the approach in 1963, and Stephen Cook refined it in his 1966 PhD thesis, leading to variants like Toom-3 (k = 3), which requires 5 multiplications instead of 9 for three-part splits.

For Toom-3, numbers are split into three segments in base B, say x = x_2 B^(2m) + x_1 B^m + x_0 and similarly for y. The polynomials are evaluated at five points (typically −1, 0, 1, 1/2, ∞), multiplied pointwise, and interpolated using Lagrange or Newton methods to recover the five coefficients of the degree-4 product polynomial, followed by carry propagation. The time complexity for Toom-k is O(n^(log_k (2k−1))); for Toom-3, log_3 5 ≈ 1.465, improving on Karatsuba's exponent for moderately large numbers and making it practical in libraries such as GMP for operands of roughly 100–1000 digits before switching to FFT-based methods.

An example for Toom-3 with small numbers in base 10, multiplying 123 × 456 (split as 1|2|3 and 4|5|6, m = 1): evaluate the two polynomials at the chosen points, for instance at 0 (giving x_0 y_0 = 3 · 6 = 18) and at 1 (giving the product of the digit sums, 6 · 15 = 90), and likewise at −1 and the remaining points; pointwise multiplication and interpolation then yield the product 56088 after carry adjustments. In practice, higher-precision splits are used recursively.

Schönhage–Strassen Algorithm

The Schönhage–Strassen algorithm is a fast multiplication method for very large integers, leveraging fast Fourier transform techniques to achieve sub-quadratic time complexity. Developed for numbers with thousands or more digits, it represents a significant advancement over earlier divide-and-conquer approaches by incorporating cyclic convolution techniques. The algorithm is particularly effective for asymptotic performance in computational number theory and cryptographic applications requiring high-precision arithmetic.

Invented by German mathematicians Arnold Schönhage and Volker Strassen, the algorithm was first described in their 1971 paper, marking a breakthrough in integer multiplication by integrating fast Fourier transform (FFT) principles into exact arithmetic. It has been implemented in major libraries such as the GNU Multiple Precision Arithmetic Library (GMP), where it serves as the default method for multiplying integers beyond a certain size threshold, typically around 10,000 bits.

At its core, the algorithm treats the input integers as polynomials and computes their product via cyclic convolution using a number-theoretic transform (NTT), a discrete Fourier transform variant over finite fields or rings that ensures exact integer results without floating-point errors. To handle large numbers, the inputs are split into smaller blocks, represented as coefficients of a polynomial modulo a carefully chosen prime or a ring like Z/(2^(2^k) + 1)Z, where the modulus supports primitive roots of unity for efficient transforms. The convolution is performed recursively, with the transform size doubling at each level to manage carries. Final reconstruction uses the Chinese Remainder Theorem (CRT) to combine results from multiple modular computations, resolving any overflow and yielding the exact product. This NTT-based approach avoids the precision issues of standard FFT while maintaining fast convolution.

The time complexity of the Schönhage–Strassen algorithm is O(n log n log log n) bit operations for multiplying two n-bit integers, which is asymptotically superior to the O(n^1.585) of the Karatsuba algorithm and the bounds of higher-degree Toom–Cook methods for sufficiently large n. This near-linear scaling arises from the recursive application of FFT-like transforms, where the log log n factor accounts for the depth of recursion in handling the transform lengths. For practical sizes, such as n > 2^20, it outperforms simpler methods, though constant factors make it less efficient for smaller inputs.

A high-level example illustrates the process for multiplying two 2^k-bit numbers a and b. First, partition a and b into m = 2^(k/2) blocks of m bits each, treating them as polynomials A(x) and B(x) of degree less than m. Compute the cyclic convolution C(x) = A(x)B(x) mod (x^m − 1) using the NTT: transform A and B to point-value representations via an NTT of length 2m (to avoid wrap-around issues), multiply pointwise, and apply the inverse NTT to recover C. Handle carries by adding the high part of C shifted by m to the low part. Repeat this recursively for the NTT steps until the base case of small multiplications, then apply the CRT across multiple such computations if composite moduli are used for larger ranges. The result is the full product without intermediate rounding errors.
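
A compact Python sketch of the number-theoretic-transform idea (using the prime 998244353 with primitive root 3 as an illustrative choice; the actual Schönhage–Strassen algorithm works modulo numbers of the form 2^(2^k) + 1 and is considerably more involved):

MOD = 998244353      # prime with 2^23 dividing MOD - 1, so power-of-two roots of unity exist
ROOT = 3             # primitive root modulo MOD

def ntt(a, invert=False):
    """In-place iterative Cooley-Tukey NTT; len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):                        # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)     # inverse root for the inverse transform
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * w % MOD
                a[k], a[k + length // 2] = (u + v) % MOD, (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)             # scale by n^(-1) for the inverse transform
        for i in range(n):
            a[i] = a[i] * n_inv % MOD
    return a

def ntt_multiply(x, y, base=10):
    """Multiply non-negative integers by convolving digit sequences with an exact NTT."""
    a = [int(d) for d in str(x)][::-1]
    b = [int(d) for d in str(y)][::-1]
    n = 1
    while n < len(a) + len(b):
        n *= 2
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    coeffs = ntt([fa[i] * fb[i] % MOD for i in range(n)], invert=True)
    result, carry = [], 0
    for c in coeffs:                             # carry propagation
        carry, digit = divmod(c + carry, base)
        result.append(digit)
    while carry:
        carry, digit = divmod(carry, base)
        result.append(digit)
    return int("".join(map(str, result[::-1])) or "0")

print(ntt_multiply(23958233, 5830))   # 139676498390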

Recent Improvements and Complexity Bounds

Subsequent to the Schönhage–Strassen algorithm, further refinements have pushed the theoretical limits of integer multiplication closer to the conjectured optimal O(n log n) bound. In 2007, Martin Fürer developed an algorithm achieving O(n log n · 2^(O(log* n))), where log* n is the iterated logarithm, slightly improving the extra factor over Schönhage–Strassen by using a more sophisticated choice of rings and transform lengths in the recursive structure.

A major breakthrough came in 2019 with the algorithm by David Harvey and Joris van der Hoeven, which attains O(n log n) time, matching the bound conjectured to be optimal for integer multiplication in standard models. This result relies on advanced Fourier-transform techniques and careful control of recursion depths to eliminate the log log n factor, though the large hidden constants make it impractical for current hardware. As of 2025, it remains primarily theoretical, with no widespread implementation in libraries like GMP, which continue to rely on Schönhage–Strassen for very large integers due to better practical constants. These advancements underscore the ongoing pursuit of optimal computational bounds, influencing research in algebraic complexity theory.

Specialized Multiplication Techniques

Complex Number Multiplication

The multiplication of two complex numbers z_1 = a + bi and z_2 = c + di, where a, b, c, d are real numbers and i = √−1