Fast Fourier transform
A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). A Fourier transform converts a signal from its original domain (often time or space) to a representation in the frequency domain and vice versa.
The DFT is obtained by decomposing a sequence of values into components of different frequencies.[1] This operation is useful in many fields, but computing it directly from the definition is often too slow to be practical. An FFT rapidly computes such transformations by factorizing the DFT matrix into a product of sparse (mostly zero) factors.[2] As a result, it reduces the complexity of computing the DFT from $O(n^2)$, which arises if one simply applies the definition of the DFT, to $O(n \log n)$, where $n$ is the data size. The difference in speed can be enormous, especially for long data sets where $n$ may be in the thousands or millions.
As the FFT is merely an algebraic refactoring of terms within the DFT, the DFT and the FFT both perform mathematically equivalent and interchangeable operations, assuming that all terms are computed with infinite precision. However, in the presence of round-off error, many FFT algorithms are much more accurate than evaluating the DFT definition directly or indirectly. There are many different FFT algorithms based on a wide range of published theories, from simple complex-number arithmetic to group theory and number theory. The best-known FFT algorithms depend upon the factorization of $n$, but there are FFTs with $O(n \log n)$ complexity for all, even prime, $n$. Many FFT algorithms depend only on the fact that $e^{-2\pi i/n}$ is an nth primitive root of unity, and thus can be applied to analogous transforms over any finite field, such as number-theoretic transforms. Since the inverse DFT is the same as the DFT, but with the opposite sign in the exponent and a $1/n$ factor, any FFT algorithm can easily be adapted for it.
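The adaptation to the inverse transform can be sketched as follows. This is a minimal illustration, assuming the forward convention with exponent $-2\pi i km/n$; the names `dft` and `idft_via_forward` are hypothetical, and the direct $O(n^2)$ `dft` merely stands in for any forward FFT routine.

```python
import cmath

def dft(x):
    """Forward DFT straight from the definition (O(n^2)); stands in for any FFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft_via_forward(X, forward=dft):
    """Inverse DFT using only a forward transform.

    Conjugating the input and the output flips the sign of the exponent;
    dividing by n supplies the 1/n factor.
    """
    n = len(X)
    y = forward([v.conjugate() for v in X])
    return [v.conjugate() / n for v in y]

signal = [1.0, 2.0, 0.5, -1.0, 3.0]
roundtrip = idft_via_forward(dft(signal))
# Largest deviation between the original samples and the round-trip result.
max_error = max(abs(a - b) for a, b in zip(signal, roundtrip))
```

The same wrapper works unchanged around a fast forward transform, which is why libraries rarely need a separate inverse algorithm.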
Fast Fourier transforms are widely used for applications in engineering, music, science, and mathematics. The basic ideas were popularized in 1965, but some algorithms had been derived as early as 1805.[1] In 1994, Gilbert Strang described the FFT as "the most important numerical algorithm of our lifetime",[3][4] and it was included in Top 10 Algorithms of 20th Century by the IEEE magazine Computing in Science & Engineering.[5]
History
The development of fast algorithms for DFT was prefigured in Carl Friedrich Gauss's unpublished 1805 work on the orbits of asteroids Pallas and Juno. Gauss wanted to interpolate the orbits from sample observations;[6][7] his method was very similar to the one that would be published in 1965 by James Cooley and John Tukey, who are generally credited for the invention of the modern generic FFT algorithm. While Gauss's work predated even Joseph Fourier's 1822 results, he did not analyze the method's complexity, and eventually used other methods to achieve the same end.
Between 1805 and 1965, some versions of FFT were published by other authors. Frank Yates in 1932 published his version, called the interaction algorithm, which provided efficient computation of Hadamard and Walsh transforms.[8] Yates' algorithm is still used in the field of statistical design and analysis of experiments. In 1942, G. C. Danielson and Cornelius Lanczos published their version to compute DFT for x-ray crystallography, a field where calculation of Fourier transforms presented a formidable bottleneck.[9][10] While many methods in the past had focused on reducing the constant factor for $O(n^2)$ computation by taking advantage of symmetries, Danielson and Lanczos realized that one could use the periodicity and apply a doubling trick to "double [n] with only slightly more than double the labor", though like Gauss they did not do the analysis to discover that this led to $O(n \log n)$ scaling.[11] In 1958, I. J. Good published a paper establishing the prime-factor FFT algorithm that applies to discrete Fourier transforms of size $n = n_1 n_2$, where $n_1$ and $n_2$ are coprime.[12]
James Cooley and John Tukey independently rediscovered these earlier algorithms[7] and published a more general FFT in 1965 that is applicable when n is composite and not necessarily a power of 2, as well as analyzing the scaling.[13] Tukey came up with the idea during a meeting of President Kennedy's Science Advisory Committee where a discussion topic involved detecting nuclear tests by the Soviet Union by setting up sensors to surround the country from outside. To analyze the output of these sensors, an FFT algorithm would be needed. In discussion with Tukey, Richard Garwin recognized the general applicability of the algorithm not just to national security problems, but also to a wide range of problems including one of immediate interest to him, determining the periodicities of the spin orientations in a 3-D crystal of Helium-3.[14] Garwin gave Tukey's idea to Cooley (both worked at IBM's Watson labs) for implementation.[15] Cooley and Tukey published the paper in a relatively short time of six months.[16] As Tukey did not work at IBM, the patentability of the idea was doubted and the algorithm went into the public domain, which, through the computing revolution of the next decade, made FFT one of the indispensable algorithms in digital signal processing.
Definition
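Let $x_0, \ldots, x_{n-1}$ be complex numbers. The DFT is defined by the formula
$$X_k = \sum_{m=0}^{n-1} x_m e^{-i 2\pi k m/n}, \qquad k = 0, \ldots, n-1,$$
where $e^{i 2\pi/n}$ is a primitive nth root of 1.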
Evaluating this definition directly requires $O(n^2)$ operations: there are $n$ outputs $X_k$, and each output requires a sum of $n$ terms. An FFT is any method to compute the same results in $O(n \log n)$ operations. All known FFT algorithms require $O(n \log n)$ operations, although there is no known proof that lower complexity is impossible.[17]
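Direct evaluation can be written down almost verbatim. The sketch below assumes the usual convention $X_k = \sum_m x_m e^{-2\pi i km/n}$; the function name `dft_direct` and the operation counter are illustrative additions that make the $n \times n$ cost visible.

```python
import cmath

def dft_direct(x):
    """Evaluate X_k = sum_m x_m * exp(-2*pi*i*k*m/n) term by term.

    Returns the outputs together with the number of complex
    multiplications performed.
    """
    n = len(x)
    multiplications = 0
    X = []
    for k in range(n):          # n outputs ...
        total = 0j
        for m in range(n):      # ... each a sum of n terms
            total += x[m] * cmath.exp(-2j * cmath.pi * k * m / n)
            multiplications += 1
        X.append(total)
    return X, multiplications

# A unit impulse transforms to a constant spectrum of ones.
spectrum, count = dft_direct([1, 0, 0, 0, 0, 0, 0, 0])
```

For this length-8 input the counter reports 64 multiplications, i.e. $n^2$, which is the cost an FFT avoids.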
To illustrate the savings of an FFT, consider the count of complex multiplications and additions for $n = 4096$ data points. Evaluating the DFT's sums directly involves $n^2$ complex multiplications and $n(n-1)$ complex additions, of which $O(n)$ operations can be saved by eliminating trivial operations such as multiplications by 1, leaving about 33.5 million operations. In contrast, the radix-2 Cooley–Tukey algorithm, for $n$ a power of 2, can compute the same result with only $(n/2)\log_2 n$ complex multiplications (again, ignoring simplifications of multiplications by 1 and similar) and $n \log_2 n$ complex additions, in total about 74,000 operations, roughly 450 times fewer than with direct evaluation. In practice, actual performance on modern computers is usually dominated by factors other than the speed of arithmetic operations and the analysis is a complicated subject (for example, see Frigo & Johnson, 2005),[18] but the overall improvement from $O(n^2)$ to $O(n \log n)$ remains.
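As a worked check of these operation counts, assuming $n = 4096$ (so $\log_2 n = 12$) and ignoring the $O(n)$ trivial operations:

$$n^2 + n(n-1) = 16{,}777{,}216 + 16{,}773{,}120 = 33{,}550{,}336$$

$$\frac{n}{2}\log_2 n + n \log_2 n = 24{,}576 + 49{,}152 = 73{,}728$$

$$\frac{33{,}550{,}336}{73{,}728} \approx 455$$

The ratio grows like $n/\log_2 n$, so the advantage widens as the transform gets longer.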
Algorithms
Cooley–Tukey algorithm
By far the most commonly used FFT is the Cooley–Tukey algorithm. This is a divide-and-conquer algorithm that recursively breaks down a DFT of any composite size $n = n_1 n_2$ into $n_1$ smaller DFTs of size $n_2$, along with $O(n)$ multiplications by complex roots of unity traditionally called twiddle factors (after Gentleman and Sande, 1966).[19]
This method (and the general idea of an FFT) was popularized by a publication of Cooley and Tukey in 1965,[13] but it was later discovered[1] that those two authors had together independently re-invented an algorithm known to Carl Friedrich Gauss around 1805[20] (and subsequently rediscovered several times in limited forms).
The best known use of the Cooley–Tukey algorithm is to divide the transform into two pieces of size n/2 at each step, and is therefore limited to power-of-two sizes, but any factorization can be used in general (as was known to both Gauss and Cooley/Tukey[1]). These are called the radix-2 and mixed-radix cases, respectively (and other variants such as the split-radix FFT have their own names as well). Although the basic idea is recursive, most traditional implementations rearrange the algorithm to avoid explicit recursion. Also, because the Cooley–Tukey algorithm breaks the DFT into smaller DFTs, it can be combined arbitrarily with any other algorithm for the DFT, such as those described below.
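The radix-2 case can be sketched recursively as below. This is a minimal decimation-in-time illustration, assuming a power-of-two length and the exponent sign $-2\pi i/n$; it keeps the explicit recursion that production implementations usually unroll, and `fft_radix2` and `dft_direct` are hypothetical names.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])   # DFT of the even-indexed samples, size n/2
    odd = fft_radix2(x[1::2])    # DFT of the odd-indexed samples, size n/2
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor
        out[k] = even[k] + twiddle * odd[k]
        out[k + n // 2] = even[k] - twiddle * odd[k]
    return out

def dft_direct(x):
    """Reference O(n^2) evaluation of the DFT definition."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

samples = [0.0, 1.0, 0.5, -2.0, 3.0, 0.25, -1.0, 4.0]
# Largest difference between the fast and the direct result.
max_diff = max(abs(a - b) for a, b in zip(fft_radix2(samples), dft_direct(samples)))
```

Each level of the recursion does $O(n)$ work and there are $\log_2 n$ levels, which is where the $O(n \log n)$ total comes from.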
Other FFT algorithms
For $n = n_1 n_2$ with coprime $n_1$ and $n_2$, one can use the prime-factor (Good–Thomas) algorithm (PFA), based on the Chinese remainder theorem, to factorize the DFT similarly to Cooley–Tukey but without the twiddle factors. The Rader–Brenner algorithm (1976)[21] is a Cooley–Tukey-like factorization but with purely imaginary twiddle factors, reducing multiplications at the cost of increased additions and reduced numerical stability; it was later superseded by the split-radix variant of Cooley–Tukey (which achieves the same multiplication count but with fewer additions and without sacrificing accuracy). Algorithms that recursively factorize the DFT into smaller operations other than DFTs include the Bruun and QFT algorithms. (The Rader–Brenner[21] and QFT algorithms were proposed for power-of-two sizes, but it is possible that they could be adapted to general composite $n$. Bruun's algorithm applies to arbitrary even composite sizes.) Bruun's algorithm, in particular, is based on interpreting the FFT as a recursive factorization of the polynomial $z^n - 1$, here into real-coefficient polynomials of the form $z^m - 1$ and $z^{2m} + a z^m + 1$.
Another polynomial viewpoint is exploited by the Winograd FFT algorithm,[22][23] which factorizes $z^n - 1$ into cyclotomic polynomials; these often have coefficients of 1, 0, or −1, and therefore require few (if any) multiplications, so Winograd can be used to obtain minimal-multiplication FFTs and is often used to find efficient algorithms for small factors. Indeed, Winograd showed that the DFT can be computed with only $O(n)$ irrational multiplications, leading to a proven achievable lower bound on the number of multiplications for power-of-two sizes; this comes at the cost of many more additions, a tradeoff no longer favorable on modern processors with hardware multipliers. In particular, Winograd also makes use of the PFA as well as an algorithm by Rader for FFTs of prime sizes.
Rader's algorithm, exploiting the existence of a generator for the multiplicative group modulo prime $n$, expresses a DFT of prime size $n$ as a cyclic convolution of (composite) size $n - 1$, which can then be computed by a pair of ordinary FFTs via the convolution theorem (although Winograd uses other convolution methods). Another prime-size FFT is due to L. I. Bluestein, and is sometimes called the chirp-z algorithm; it also re-expresses a DFT as a convolution, but this time of the same size (which can be zero-padded to a power of two and evaluated by radix-2 Cooley–Tukey FFTs, for example), via the identity
$$km = -\frac{(k-m)^2}{2} + \frac{m^2}{2} + \frac{k^2}{2},$$
where $k$ and $m$ are the output and input indices of the DFT.
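Bluestein's re-expression can be sketched as follows, assuming the convention $X_k = \sum_m x_m e^{-2\pi i km/n}$ and substituting $km = \tfrac{1}{2}\left(k^2 + m^2 - (k-m)^2\right)$ into the exponent. To keep the example short, the convolution is evaluated directly in $O(n^2)$ time; a practical version would zero-pad it and use power-of-two FFTs. The names `bluestein_dft` and `dft_direct` are hypothetical.

```python
import cmath

def bluestein_dft(x):
    """DFT of arbitrary length n, rewritten as a convolution with a chirp sequence."""
    n = len(x)
    # chirp[j] = exp(-pi*i*j^2/n); j*j is reduced mod 2n to keep the phase small.
    chirp = [cmath.exp(-1j * cmath.pi * ((j * j) % (2 * n)) / n) for j in range(n)]
    a = [x[m] * chirp[m] for m in range(n)]   # input pre-multiplied by the chirp
    out = []
    for k in range(n):
        # Convolution of a with b_j = conj(chirp[|j|]), summed directly here.
        acc = sum(a[m] * chirp[abs(k - m)].conjugate() for m in range(n))
        out.append(chirp[k] * acc)            # post-multiply by the chirp
    return out

def dft_direct(x):
    """Reference O(n^2) evaluation of the DFT definition."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

samples = [2.0, -1.0, 0.5, 3.0, 1.5, -0.25, 4.0]   # prime length 7
max_diff = max(abs(a - b) for a, b in zip(bluestein_dft(samples), dft_direct(samples)))
```

Because the inner sum depends only on $k - m$, it is a convolution, and that is the part a fast implementation hands to ordinary FFTs.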
The hexagonal fast Fourier transform (HFFT) aims at computing an efficient FFT for hexagonally sampled data by using a new addressing scheme for hexagonal grids, called Array Set Addressing (ASA).
FFT algorithms specialized for real or symmetric data
In many applications, the input data for the DFT are purely real, in which case the outputs satisfy the symmetry
$$X_{n-k} = X_k^*$$
and efficient FFT algorithms have been designed for this situation (see e.g., Sorensen, 1987).[24][25] One approach consists of taking an ordinary algorithm (e.g. Cooley–Tukey) and removing the redundant parts of the computation, saving roughly a factor of two in time and memory. Alternatively, it is possible to express an even-length real-input DFT as a complex DFT of half the length (whose real and imaginary parts are the even/odd elements of the original real data), followed by $O(n)$ post-processing operations.
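The redundancy in a real-input transform can be made concrete with a short sketch. It assumes the conjugate symmetry $X_{n-k} = X_k^*$ for real input and computes only the outputs $X_0, \ldots, X_{\lfloor n/2 \rfloor}$, rebuilding the rest. The sums are evaluated directly rather than by a specialized real-input FFT, and `half_spectrum` and `full_from_half` are hypothetical names.

```python
import cmath

def half_spectrum(x):
    """Non-redundant outputs X_0..X_{n//2} of a real-input DFT (direct evaluation)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n // 2 + 1)]

def full_from_half(half, n):
    """Rebuild all n outputs from the first n//2 + 1 using X_{n-k} = conj(X_k)."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full

real_data = [0.5, 1.0, -2.0, 3.0, 0.0, 1.5, -1.0]
n = len(real_data)
rebuilt = full_from_half(half_spectrum(real_data), n)

# Full direct DFT for comparison.
reference = [sum(real_data[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
             for k in range(n)]
max_diff = max(abs(a - b) for a, b in zip(rebuilt, reference))
```

Storing only the first half of the spectrum is the source of the roughly twofold saving in memory; specialized algorithms obtain the matching saving in time.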
It was once believed that real-input DFTs could be more efficiently computed by means of the discrete Hartley transform (DHT), but it was subsequently argued that a specialized real-input DFT algorithm (FFT) can typically be found that requires fewer operations than the corresponding DHT algorithm (FHT) for the same number of inputs.[24] Bruun's algorithm (above) is another method that was initially proposed to take advantage of real inputs, but it has not proved popular.
There are further FFT specializations for the cases of real data that have even/odd symmetry, in which case one can gain another factor of roughly two in time and memory and the DFT becomes the discrete cosine/sine transform(s) (DCT/DST). Instead of directly modifying an FFT algorithm for these cases, DCTs/DSTs can also be computed via FFTs of real data combined with pre- and post-processing.
Computational issues
Bounds on complexity and operation counts
A fundamental question of longstanding theoretical interest is to prove lower bounds on the complexity and exact operation counts of fast Fourier transforms, and many open problems remain. It is not rigorously proved whether DFTs truly require $\Omega(n \log n)$ (i.e., order $n \log n$ or greater) operations, even for the simple case of power-of-two sizes, although no algorithms with lower complexity are known. In particular, the count of arithmetic operations is usually the focus of such questions, although actual performance on modern-day computers is determined by many other factors such as cache or CPU pipeline optimization.
Following work by Shmuel Winograd (1978),[22] a tight $\Theta(n)$ lower bound is known for the number of real multiplications required by an FFT. It can be shown that only $4n - 2\log_2^2(n) - 2\log_2(n) - 4$ irrational real multiplications are required to compute a DFT of power-of-two length $n = 2^m$. Moreover, explicit algorithms that achieve this count are known (Heideman & Burrus, 1986;[26] Duhamel, 1990[27]). However, these algorithms require too many additions to be practical, at least on modern computers with hardware multipliers (Duhamel, 1990;[27] Frigo & Johnson, 2005).[18]
A tight lower bound is not known on the number of required additions, although lower bounds have been proved under some restrictive assumptions on the algorithms. In 1973, Morgenstern[28] proved an $\Omega(n \log n)$ lower bound on the addition count for algorithms where the multiplicative constants have bounded magnitudes (which is true for most but not all FFT algorithms). Pan (1986)[29] proved an $\Omega(n \log n)$ lower bound assuming a bound on a measure of the FFT algorithm's asynchronicity, but the generality of this assumption is unclear. For the case of power-of-two $n$, Papadimitriou (1979)[30] argued that the number $n \log_2 n$ of complex-number additions achieved by Cooley–Tukey algorithms is optimal under certain assumptions on the graph of the algorithm (his assumptions imply, among other things, that no additive identities in the roots of unity are exploited). (This argument would imply that at least $2n \log_2 n$ real additions are required, although this is not a tight bound because extra additions are required as part of complex-number multiplications.) Thus far, no published FFT algorithm has achieved fewer than $n \log_2 n$ complex-number additions (or their equivalent) for power-of-two $n$.
A third problem is to minimize the total number of real multiplications and additions, sometimes called the arithmetic complexity (although in this context it is the exact count and not the asymptotic complexity that is being considered). Again, no tight lower bound has been proven. Since 1968, however, the lowest published count for power-of-two $n$ was long achieved by the split-radix FFT algorithm, which requires $4n\log_2(n) - 6n + 8$ real multiplications and additions for $n > 1$. This was recently reduced to $\sim \frac{34}{9} n \log_2 n$ (Johnson and Frigo, 2007;[17] Lundy and Van Buskirk, 2007[31]). A slightly larger count (but still better than split radix for $n \ge 256$) was shown to be provably optimal for $n \le 512$ under additional restrictions on the possible algorithms (split-radix-like flowgraphs with unit-modulus multiplicative factors), by reduction to a satisfiability modulo theories problem solvable by brute force (Haynal & Haynal, 2011).[32]
Most of the attempts to lower or prove the complexity of FFT algorithms have focused on the ordinary complex-data case, because it is the simplest. However, complex-data FFTs are so closely related to algorithms for related problems such as real-data FFTs, discrete cosine transforms, discrete Hartley transforms, and so on, that any improvement in one of these would immediately lead to improvements in the others (Duhamel & Vetterli, 1990).[33]
Approximations
All of the FFT algorithms discussed above compute the DFT exactly (i.e., neglecting floating-point errors). A few FFT algorithms have been proposed, however, that compute the DFT approximately, with an error that can be made arbitrarily small at the expense of increased computations. Such algorithms trade the approximation error for increased speed or other properties. For example, an approximate FFT algorithm by Edelman et al. (1999)[34] achieves lower communication requirements for parallel computing with the help of a fast multipole method. A wavelet-based approximate FFT by Guo and Burrus (1996)[35] takes sparse inputs/outputs (time/frequency localization) into account more efficiently than is possible with an exact FFT. Another algorithm for approximate computation of a subset of the DFT outputs is due to Shentov et al. (1995).[36] The Edelman algorithm works equally well for sparse and non-sparse data, since it is based on the compressibility (rank deficiency) of the Fourier matrix itself rather than the compressibility (sparsity) of the data. Conversely, if the data are sparse, that is, if only $k$ out of $n$ Fourier coefficients are nonzero, then the complexity can be reduced to $O(k \log(n) \log(n/k))$, and this has been demonstrated to lead to practical speedups compared to an ordinary FFT for $n/k > 32$ in a large-$n$ example ($n = 2^{22}$) using a probabilistic approximate algorithm (which estimates the largest $k$ coefficients to several decimal places).[37]
Accuracy
FFT algorithms have errors when finite-precision floating-point arithmetic is used, but these errors are typically quite small; most FFT algorithms, e.g. Cooley–Tukey, have excellent numerical properties as a consequence of the pairwise summation structure of the algorithms. The upper bound on the relative error for the Cooley–Tukey algorithm is $O(\varepsilon \log n)$, compared to $O(\varepsilon n^{3/2})$ for the naïve DFT formula,[19] where $\varepsilon$ is the machine floating-point relative precision. In fact, the root mean square (rms) errors are much better than these upper bounds, being only $O(\varepsilon \sqrt{\log n})$ for Cooley–Tukey and $O(\varepsilon \sqrt{n})$ for the naïve DFT (Schatzman, 1996).[38] These results, however, are very sensitive to the accuracy of the twiddle factors used in the FFT (i.e. the trigonometric function values), and it is not unusual for incautious FFT implementations to have much worse accuracy, e.g. if they use inaccurate trigonometric recurrence formulas. Some FFTs other than Cooley–Tukey, such as the Rader–Brenner algorithm, are intrinsically less stable.
In fixed-point arithmetic, the finite-precision errors accumulated by FFT algorithms are worse, with rms errors growing as $O(\sqrt{n})$ for the Cooley–Tukey algorithm (Welch, 1969).[39] Achieving this accuracy requires careful attention to scaling to minimize loss of precision, and fixed-point FFT algorithms involve rescaling at each intermediate stage of decompositions like Cooley–Tukey.
To verify the correctness of an FFT implementation, rigorous guarantees can be obtained in $O(n \log n)$ time by a simple procedure checking the linearity, impulse-response, and time-shift properties of the transform on random inputs (Ergün, 1995).[40]
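A property-based check in this spirit can be sketched as below. It is only an illustration of the three properties named above, not Ergün's exact procedure, and it carries none of that paper's formal guarantees. The function `check_fft`, its tolerance, and the two transforms under test are all hypothetical.

```python
import cmath
import random

def check_fft(fft, n, trials=5, tol=1e-9, seed=0):
    """Return True if fft passes linearity, impulse and time-shift checks at size n."""
    rng = random.Random(seed)

    def rand_c():
        return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))

    def close(u, v):
        return all(abs(p - q) <= tol * n for p, q in zip(u, v))

    # Impulse response: a unit impulse at index 0 transforms to all ones.
    if not close(fft([1 + 0j] + [0j] * (n - 1)), [1 + 0j] * n):
        return False
    for _ in range(trials):
        x = [rand_c() for _ in range(n)]
        y = [rand_c() for _ in range(n)]
        a, b = rand_c(), rand_c()
        fx, fy = fft(x), fft(y)
        # Linearity: F(a*x + b*y) == a*F(x) + b*F(y).
        lhs = fft([a * p + b * q for p, q in zip(x, y)])
        if not close(lhs, [a * p + b * q for p, q in zip(fx, fy)]):
            return False
        # Time shift: delaying x by one sample multiplies X_k by exp(-2*pi*i*k/n).
        shifted = [x[-1]] + x[:-1]
        expected = [fx[k] * cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
        if not close(fft(shifted), expected):
            return False
    return True

def dft_direct(x):
    """Correct transform under test (direct O(n^2) evaluation)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def dft_wrong_sign(x):
    """Deliberately broken transform: exponent has the wrong sign."""
    n = len(x)
    return [sum(x[m] * cmath.exp(2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

passes = check_fft(dft_direct, 8)
fails = check_fft(dft_wrong_sign, 8)
```

The wrong-sign transform is still linear and still maps an impulse to all ones, so only the time-shift check exposes it. That is why all three properties are tested together.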
The values for intermediate frequencies may be obtained by various averaging methods.
Multidimensional FFTs
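As defined in the multidimensional DFT article, the multidimensional DFT
$$X_{\mathbf{k}} = \sum_{\mathbf{n}=0}^{\mathbf{N}-1} e^{-2\pi i\, \mathbf{k} \cdot (\mathbf{n}/\mathbf{N})} x_{\mathbf{n}}$$
transforms an array $x_{\mathbf{n}}$ with a d-dimensional vector of indices $\mathbf{n} = (n_1, \ldots, n_d)$ by a set of d nested summations (over $n_j = 0, \ldots, N_j - 1$ for each $j$), where the division $\mathbf{n}/\mathbf{N} = (n_1/N_1, \ldots, n_d/N_d)$ is performed element-wise. Equivalently, it is the composition of a sequence of d sets of one-dimensional DFTs, performed along one dimension at a time (in any order).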
This compositional viewpoint immediately provides the simplest and most common multidimensional DFT algorithm, known as the row-column algorithm (after the two-dimensional case, below). That is, one simply performs a sequence of d one-dimensional FFTs (by any of the above algorithms): first along the $n_1$ dimension, then along the $n_2$ dimension, and so on (any ordering works). This method is easily shown to have the usual $O(n \log n)$ complexity, where $n = N_1 N_2 \cdots N_d$ is the total number of data points transformed and $N_j$ is the size of dimension $j$. In particular, there are $n/N_1$ transforms of size $N_1$, and so on, so the complexity of the sequence of FFTs is:
$$\frac{n}{N_1} O(N_1 \log N_1) + \cdots + \frac{n}{N_d} O(N_d \log N_d) = O\big(n [\log N_1 + \cdots + \log N_d]\big) = O(n \log n).$$
In two dimensions, the $x_{\mathbf{n}}$ can be viewed as an $N_1 \times N_2$ matrix, and this algorithm corresponds to first performing the FFT of all the rows (resp. columns), grouping the resulting transformed rows (resp. columns) together as another $N_1 \times N_2$ matrix, and then performing the FFT on each of the columns (resp. rows) of this second matrix, and similarly grouping the results into the final result matrix.
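The two-dimensional row-column procedure can be sketched as follows. The example assumes a rectangular list-of-lists input and uses a direct one-dimensional DFT as the building block, where any FFT could be substituted. The function names are hypothetical, and the brute-force double sum is included only as a cross-check.

```python
import cmath

def dft_1d(x):
    """One-dimensional DFT by direct evaluation; any FFT could replace this."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def dft_2d_row_column(a):
    """2-D DFT of a rectangular matrix: transform every row, then every column."""
    rows = [dft_1d(row) for row in a]                  # pass 1: along each row
    cols = [dft_1d(list(col)) for col in zip(*rows)]   # pass 2: along each column
    return [list(row) for row in zip(*cols)]           # transpose back to row-major

def dft_2d_direct(a):
    """Reference: the nested double sum from the multidimensional definition."""
    size1, size2 = len(a), len(a[0])
    return [[sum(a[i][j] * cmath.exp(-2j * cmath.pi * (k1 * i / size1 + k2 * j / size2))
                 for i in range(size1) for j in range(size2))
             for k2 in range(size2)]
            for k1 in range(size1)]

matrix = [[1.0, 2.0, 0.0, -1.0],
          [0.5, -0.5, 3.0, 2.0],
          [4.0, 1.0, -2.0, 0.0]]
fast = dft_2d_row_column(matrix)
slow = dft_2d_direct(matrix)
max_diff = max(abs(fast[r][c] - slow[r][c]) for r in range(3) for c in range(4))
```

Swapping the two passes (columns first, then rows) gives the same result, reflecting the fact that the one-dimensional transforms along different dimensions commute.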
In more than two dimensions, it is often advantageous for cache locality to group the dimensions recursively. For example, a three-dimensional FFT might first perform two-dimensional FFTs of each planar slice for each fixed $n_1$, and then perform the one-dimensional FFTs along the $n_1$ direction. More generally, an asymptotically optimal cache-oblivious algorithm consists of recursively dividing the dimensions into two groups $(n_1, \ldots, n_{d/2})$ and $(n_{d/2+1}, \ldots, n_d)$ that are transformed recursively (rounding if d is not even) (see Frigo and Johnson, 2005).[18] Still, this remains a straightforward variation of the row-column algorithm that ultimately requires only a one-dimensional FFT algorithm as the base case, and still has $O(n \log n)$ complexity. Yet another variation is to perform matrix transpositions in between transforming subsequent dimensions, so that the transforms operate on contiguous data; this is especially important for out-of-core and distributed memory situations where accessing non-contiguous data is extremely time-consuming.
There are other multidimensional FFT algorithms that are distinct from the row-column algorithm, although all of them have $O(n \log n)$ complexity. Perhaps the simplest non-row-column FFT is the vector-radix FFT algorithm, which is a generalization of the ordinary Cooley–Tukey algorithm where one divides the transform dimensions by a vector $\mathbf{r} = (r_1, r_2, \ldots, r_d)$ of radices at each step. (This may also have cache benefits.) The simplest case of vector-radix is where all of the radices are equal (e.g., vector-radix-2 divides all of the dimensions by two), but this is not necessary. Vector radix with only a single non-unit radix at a time, i.e. $\mathbf{r} = (1, \ldots, 1, r, 1, \ldots, 1)$, is essentially a row-column algorithm. Other, more complicated, methods include polynomial transform algorithms due to Nussbaumer (1977),[41] which view the transform in terms of convolutions and polynomial products. See Duhamel and Vetterli (1990)[33] for more information and references.
Other generalizations
An $O(n^{5/2} \log n)$ generalization to spherical harmonics on the sphere $S^2$ with $n^2$ nodes was described by Mohlenkamp,[42] along with an algorithm conjectured (but not proven) to have $O(n^2 \log^2(n))$ complexity; Mohlenkamp also provides an implementation in the libftsh library.[43] A spherical-harmonic algorithm with $O(n^2 \log n)$ complexity is described by Rokhlin and Tygert.[44]
The fast folding algorithm is analogous to the FFT, except that it operates on a series of binned waveforms rather than a series of real or complex scalar values. Rotation (which in the FFT is multiplication by a complex phasor) is a circular shift of the component waveform.
Various groups have also published FFT algorithms for non-equispaced data, as reviewed in Potts et al. (2001).[45] Such algorithms do not strictly compute the DFT (which is only defined for equispaced data), but rather some approximation thereof (a non-uniform discrete Fourier transform, or NDFT, which itself is often computed only approximately). More generally, there are various other methods of spectral estimation.
Applications
The FFT is used in digital recording, sampling, additive synthesis and pitch correction software.[46]
The FFT's importance derives from the fact that it has made working in the frequency domain equally computationally feasible as working in the temporal or spatial domain. Some of the important applications of the FFT include:[16][47]
- fast large-integer multiplication algorithms and polynomial multiplication,
- efficient matrix–vector multiplication for Toeplitz, circulant and other structured matrices,
- filtering algorithms (see overlap–add and overlap–save methods),
- fast algorithms for discrete cosine or sine transforms (e.g. fast DCT used for JPEG and MPEG/MP3 encoding and decoding),
- fast Chebyshev approximation,
- solving difference equations,
- computation of isotopic distributions.[48]
- modulation and demodulation of complex data symbols using orthogonal frequency-division multiplexing (OFDM) for 5G, LTE, Wi-Fi, DSL, and other modern communication systems.
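The first application in the list, polynomial multiplication, can be sketched with the convolution theorem: transform both coefficient sequences, multiply pointwise, and transform back. This is a minimal illustration that assumes non-empty integer coefficient lists small enough for floating-point rounding to recover exact integers. The helper names `fft` and `poly_multiply` are hypothetical.

```python
import cmath

def fft(x, sign=-1):
    """Recursive radix-2 FFT; sign=-1 is the forward transform, sign=+1 the
    unnormalized inverse. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], sign)
    odd = fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def poly_multiply(p, q):
    """Coefficients (lowest degree first) of the product of two integer polynomials."""
    result_len = len(p) + len(q) - 1
    size = 1
    while size < result_len:       # zero-pad to a power of two that holds the product
        size *= 2
    fp = fft([complex(c) for c in p] + [0j] * (size - len(p)))
    fq = fft([complex(c) for c in q] + [0j] * (size - len(q)))
    pointwise = [a * b for a, b in zip(fp, fq)]    # convolution theorem
    coeffs = fft(pointwise, sign=+1)               # inverse, still missing the 1/size factor
    return [round(c.real / size) for c in coeffs[:result_len]]

# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
product = poly_multiply([1, 2, 3], [4, 5])
```

Large-integer multiplication follows the same pattern, with digits as coefficients and a final carry-propagation step.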
Alternatives
The FFT can be a poor choice for analyzing signals with non-stationary frequency content, where the frequency characteristics change over time. DFTs provide a global frequency estimate, assuming that all frequency components are present throughout the entire signal, which makes it challenging to detect short-lived or transient features within signals.
For cases where frequency information appears briefly in the signal or generally varies over time, alternatives like the short-time Fourier transform, discrete wavelet transforms, or discrete Hilbert transform can be more suitable.[49][50] These transforms allow for localized frequency analysis by capturing both frequency and time-based information.
Research areas
- Big FFTs
- With the explosion of big data in fields such as astronomy, the need for 512K FFTs has arisen for certain interferometry calculations. The data collected by projects such as WMAP and LIGO require FFTs of tens of billions of points. As this size does not fit into main memory, so-called out-of-core FFTs are an active area of research.[51]
- Approximate FFTs
- For applications such as MRI, it is necessary to compute DFTs for nonuniformly spaced grid points and/or frequencies. Multipole-based approaches can compute approximate quantities at the cost of some factor of increase in runtime.[52]
- Group FFTs
- The FFT may also be explained and interpreted using group representation theory, allowing for further generalization. A function on any compact group, including non-cyclic ones, has an expansion in terms of a basis of irreducible matrix elements. It remains an active area of research to find an efficient algorithm for performing this change of basis. Applications include efficient spherical harmonic expansion, the analysis of certain Markov processes, and robotics.[53]
- Quantum FFTs
- Shor's fast algorithm for integer factorization on a quantum computer has a subroutine to compute DFT of a binary vector. This is implemented as a sequence of 1- or 2-bit quantum gates now known as quantum FFT, which is effectively the Cooley–Tukey FFT realized as a particular factorization of the Fourier matrix. Extensions to these ideas are currently being explored.[54]
Language reference
| Language | Command–method | Prerequisites |
|---|---|---|
| R | stats::fft(x) | None |
| Scilab | fft(x) | None |
| MATLAB, Octave | fft(x) | None |
| Python | fft.fft(x) | numpy or scipy |
| Mathematica | Fourier[x] | None |
| Fortran | fftw_one(plan,in,out) | FFTW |
| Julia | fft(A [,dims]) | FFTW |
| Rust | fft.process(&mut x); | rustfft |
| Haskell | dft x | fft |
See also
FFT-related algorithms:
- Bit-reversal permutation
- Goertzel algorithm – computes individual terms of the discrete Fourier transform
FFT implementations:
- ALGLIB – a dual/GPL-licensed C++ and C# library (also supporting other languages), with real/complex FFT implementation
- FFTPACK – another Fortran FFT library (public domain)
- Architecture-specific:
- Arm Performance Libraries[55]
- Intel Integrated Performance Primitives
- Intel Math Kernel Library
- Many more implementations are available,[56] for CPUs and GPUs, such as PocketFFT for C++
Other links:
- Odlyzko–Schönhage algorithm applies the FFT to finite Dirichlet series
- Schönhage–Strassen algorithm – asymptotically fast multiplication algorithm for large integers
- Butterfly diagram – a diagram used to describe FFTs
- Spectral music (involves application of DFT analysis to musical composition)
- Spectrum analyzer – any of several devices that perform spectrum analysis, often via a DFT
- Time series
- Fast Walsh–Hadamard transform
- Generalized distributive law
- Least-squares spectral analysis
- Multidimensional transform
- Multidimensional discrete convolution
- Fast Fourier Transform Telescope
References
- ^ a b c d Heideman, Michael T.; Johnson, Don H.; Burrus, Charles Sidney (1984). "Gauss and the history of the fast Fourier transform" (PDF). IEEE ASSP Magazine. 1 (4): 14–21. Bibcode:1984IASSP...1...14H. CiteSeerX 10.1.1.309.181. doi:10.1109/MASSP.1984.1162257. S2CID 10032502. Archived (PDF) from the original on 2013-03-19.
- ^ Van Loan, Charles (1992). Computational Frameworks for the Fast Fourier Transform. SIAM.
- ^ Strang, Gilbert (May–June 1994). "Wavelets". American Scientist. 82 (3): 250–255. Bibcode:1994AmSci..82..250S. JSTOR 29775194.
- ^ Kent, Raymond D.; Read, Charles (2002). The Acoustic Analysis of Speech (2nd ed.). Singular/Thomson Learning. p. 61. ISBN 978-0-7693-0112-9.
- ^ Dongarra, Jack; Sullivan, Francis (January 2000). "Guest Editors' Introduction to the top 10 algorithms". Computing in Science & Engineering. 2 (1): 22–23. Bibcode:2000CSE.....2a..22D. doi:10.1109/MCISE.2000.814652. ISSN 1521-9615.
- ^ Gauss, Carl Friedrich (1866). "Theoria interpolationis methodo nova tractata" [Theory regarding a new method of interpolation]. Nachlass (Unpublished manuscript). Werke (in Latin and German). Vol. 3. Göttingen, Germany: Königlichen Gesellschaft der Wissenschaften zu Göttingen. pp. 265–303.
- ^ a b Heideman, Michael T.; Johnson, Don H.; Burrus, Charles Sidney (1985-09-01). "Gauss and the history of the fast Fourier transform". Archive for History of Exact Sciences. 34 (3): 265–277. CiteSeerX 10.1.1.309.181. doi:10.1007/BF00348431. ISSN 0003-9519. S2CID 122847826.
- ^ Yates, Frank (1937). "The design and analysis of factorial experiments". Technical Communication No. 35 of the Commonwealth Bureau of Soils. 142 (3585): 90–92. Bibcode:1938Natur.142...90F. doi:10.1038/142090a0. S2CID 23501205.
- ^ Danielson, Gordon C.; Lanczos, Cornelius (1942). "Some improvements in practical Fourier analysis and their application to x-ray scattering from liquids". Journal of the Franklin Institute. 233 (4): 365–380. doi:10.1016/S0016-0032(42)90767-1.
- ^ Lanczos, Cornelius (1956). Applied Analysis. Prentice–Hall.
- ^ Cooley, James W.; Lewis, Peter A. W.; Welch, Peter D. (June 1967). "Historical notes on the fast Fourier transform". IEEE Transactions on Audio and Electroacoustics. 15 (2): 76–79. Bibcode:1967ITAuE..15...76C. CiteSeerX 10.1.1.467.7209. doi:10.1109/TAU.1967.1161903. ISSN 0018-9278.
- ^ Good, I. J. (July 1958). "The Interaction Algorithm and Practical Fourier Analysis". Journal of the Royal Statistical Society, Series B (Methodological). 20 (2): 361–372. doi:10.1111/j.2517-6161.1958.tb00300.x.
- ^ a b Cooley, James W.; Tukey, John W. (1965). "An algorithm for the machine calculation of complex Fourier series". Mathematics of Computation. 19 (90): 297–301. doi:10.1090/S0025-5718-1965-0178586-1. ISSN 0025-5718.
- ^ Cooley, James W. (1987). "The Re-Discovery of the Fast Fourier Transform Algorithm" (PDF). Microchimica Acta. Vol. III. Vienna, Austria. pp. 33–45. Archived (PDF) from the original on 2016-08-20.
- ^ Garwin, Richard (June 1969). "The Fast Fourier Transform As an Example of the Difficulty in Gaining Wide Use for a New Technique" (PDF). IEEE Transactions on Audio and Electroacoustics. AU-17 (2): 68–72. Archived (PDF) from the original on 2006-05-17.
- ^ a b Rockmore, Daniel N. (January 2000). "The FFT: an algorithm the whole family can use". Computing in Science & Engineering. 2 (1): 60–64. Bibcode:2000CSE.....2a..60R. CiteSeerX 10.1.1.17.228. doi:10.1109/5992.814659. ISSN 1521-9615. S2CID 14978667.
- ^ a b Frigo, Matteo; Johnson, Steven G. (January 2007) [2006-12-19]. "A Modified Split-Radix FFT With Fewer Arithmetic Operations". IEEE Transactions on Signal Processing. 55 (1): 111–119. Bibcode:2007ITSP...55..111J. CiteSeerX 10.1.1.582.5497. doi:10.1109/tsp.2006.882087. S2CID 14772428.
- ^ a b c Frigo, Matteo; Johnson, Steven G. (2005). "The Design and Implementation of FFTW3" (PDF). Proceedings of the IEEE. 93 (2): 216–231. Bibcode:2005IEEEP..93..216F. CiteSeerX 10.1.1.66.3097. doi:10.1109/jproc.2004.840301. S2CID 6644892. Archived (PDF) from the original on 2005-02-07.
- ^ a b Gentleman, W. Morven; Sande, G. (1966). "Fast Fourier transforms—for fun and profit". Proceedings of the AFIPS. 29: 563–578. doi:10.1145/1464291.1464352. S2CID 207170956.
- ^ Gauss, Carl Friedrich (1866) [1805]. Theoria interpolationis methodo nova tractata. Werke (in Latin and German). Vol. 3. Göttingen, Germany: Königliche Gesellschaft der Wissenschaften. pp. 265–327.
- ^ a b Brenner, Norman M.; Rader, Charles M. (1976). "A New Principle for Fast Fourier Transformation". IEEE Transactions on Acoustics, Speech, and Signal Processing. 24 (3): 264–266. Bibcode:1976ITASS..24..264R. doi:10.1109/TASSP.1976.1162805.
- ^ a b Winograd, Shmuel (1978). "On computing the discrete Fourier transform". Mathematics of Computation. 32 (141): 175–199. doi:10.1090/S0025-5718-1978-0468306-4. JSTOR 2006266. PMC 430186. PMID 16592303.
- ^ Winograd, Shmuel (1979). "On the multiplicative complexity of the discrete Fourier transform". Advances in Mathematics. 32 (2): 83–117. doi:10.1016/0001-8708(79)90037-9.
- ^ a b Sorensen, Henrik V.; Jones, Douglas L.; Heideman, Michael T.; Burrus, Charles Sidney (1987). "Real-valued fast Fourier transform algorithms". IEEE Transactions on Acoustics, Speech, and Signal Processing. 35 (6): 849–863. Bibcode:1987ITASS..35..849S. CiteSeerX 10.1.1.205.4523. doi:10.1109/TASSP.1987.1165220.
- ^ Sorensen, Henrik V.; Jones, Douglas L.; Heideman, Michael T.; Burrus, Charles Sidney (1987). "Corrections to "Real-valued fast Fourier transform algorithms"". IEEE Transactions on Acoustics, Speech, and Signal Processing. 35 (9): 1353. Bibcode:1987ITASS..35R1353S. doi:10.1109/TASSP.1987.1165284.
- ^ Heideman, Michael T.; Burrus, Charles Sidney (1986). "On the number of multiplications necessary to compute a length-2n DFT". IEEE Transactions on Acoustics, Speech, and Signal Processing. 34 (1): 91–95. Bibcode:1986ITASS..34...91H. doi:10.1109/TASSP.1986.1164785.
- ^ a b Duhamel, Pierre (1990). "Algorithms meeting the lower bounds on the multiplicative complexity of length-2n DFTs and their connection with practical algorithms". IEEE Transactions on Acoustics, Speech, and Signal Processing. 38 (9): 1504–1511. doi:10.1109/29.60070.
- ^ Morgenstern, Jacques (1973). "Note on a lower bound of the linear complexity of the fast Fourier transform". Journal of the ACM. 20 (2): 305–306. doi:10.1145/321752.321761. S2CID 2790142.
- ^ Pan, Victor Ya. (1986-01-02). "The trade-off between the additive complexity and the asynchronicity of linear and bilinear algorithms". Information Processing Letters. 22 (1): 11–14. doi:10.1016/0020-0190(86)90035-9. Retrieved 2017-10-31.
- ^ Papadimitriou, Christos H. (1979). "Optimality of the fast Fourier transform". Journal of the ACM. 26 (1): 95–102. doi:10.1145/322108.322118. S2CID 850634.
- ^ Lundy, Thomas J.; Van Buskirk, James (2007). "A new matrix approach to real FFTs and convolutions of length 2k". Computing. 80 (1): 23–45. doi:10.1007/s00607-007-0222-6. S2CID 27296044.
- ^ Haynal, Steve; Haynal, Heidi (2011). "Generating and Searching Families of FFT Algorithms" (PDF). Journal on Satisfiability, Boolean Modeling and Computation. 7 (4): 145–187. arXiv:1103.5740. Bibcode:2011arXiv1103.5740H. doi:10.3233/SAT190084. S2CID 173109. Archived from the original (PDF) on 2012-04-26.
- ^ a b Duhamel, Pierre; Vetterli, Martin (1990). "Fast Fourier transforms: a tutorial review and a state of the art". Signal Processing. 19 (4): 259–299. Bibcode:1990SigPr..19..259D. doi:10.1016/0165-1684(90)90158-U.
- ^ Edelman, Alan; McCorquodale, Peter; Toledo, Sivan (1999). "The Future Fast Fourier Transform?" (PDF). SIAM Journal on Scientific Computing. 20 (3): 1094–1114. CiteSeerX 10.1.1.54.9339. doi:10.1137/S1064827597316266. Archived (PDF) from the original on 2017-07-05.
- ^ Guo, Haitao; Burrus, Charles Sidney (1996). "Fast approximate Fourier transform via wavelets transform". In Unser, Michael A.; Aldroubi, Akram; Laine, Andrew F. (eds.). Wavelet Applications in Signal and Image Processing IV. Proceedings of SPIE. Vol. 2825. pp. 250–259. Bibcode:1996SPIE.2825..250G. CiteSeerX 10.1.1.54.3984. doi:10.1117/12.255236. S2CID 120514955.
- ^ Shentov, Ognjan V.; Mitra, Sanjit K.; Heute, Ulrich; Hossen, Abdul N. (1995). "Subband DFT. I. Definition, interpretations and extensions". Signal Processing. 41 (3): 261–277. doi:10.1016/0165-1684(94)00103-7.
- ^ Hassanieh, Haitham; Indyk, Piotr; Katabi, Dina; Price, Eric (January 2012). "Simple and Practical Algorithm for Sparse Fourier Transform" (PDF). ACM-SIAM Symposium on Discrete Algorithms. Archived (PDF) from the original on 2012-03-04. (NB. See also the sFFT Web Page.)
- ^ Schatzman, James C. (1996). "Accuracy of the discrete Fourier transform and the fast Fourier transform". SIAM Journal on Scientific Computing. 17 (5): 1150–1166. Bibcode:1996SJSC...17.1150S. CiteSeerX 10.1.1.495.9184. doi:10.1137/s1064827593247023.
- ^ Welch, Peter D. (1969). "A fixed-point fast Fourier transform error analysis". IEEE Transactions on Audio and Electroacoustics. 17 (2): 151–157. Bibcode:1969ITAuE..17..151W. doi:10.1109/TAU.1969.1162035.
- ^ Ergün, Funda (1995). "Testing multivariate linear functions". Proceedings of the twenty-seventh annual ACM symposium on Theory of computing - STOC '95. Kyoto, Japan. pp. 407–416. doi:10.1145/225058.225167. ISBN 978-0897917186. S2CID 15512806.
- ^ Nussbaumer, Henri J. (1977). "Digital filtering using polynomial transforms". Electronics Letters. 13 (13): 386–387. Bibcode:1977ElL....13..386N. doi:10.1049/el:19770280.
- ^ Mohlenkamp, Martin J. (1999). "A Fast Transform for Spherical Harmonics" (PDF). Journal of Fourier Analysis and Applications. 5 (2–3): 159–184. Bibcode:1999JFAA....5..159M. CiteSeerX 10.1.1.135.9830. doi:10.1007/BF01261607. S2CID 119482349. Archived (PDF) from the original on 2017-05-06. Retrieved 2018-01-11.
- ^ "libftsh library". Archived from the original on 2010-06-23. Retrieved 2007-01-09.
- ^ Rokhlin, Vladimir; Tygert, Mark (2006). "Fast Algorithms for Spherical Harmonic Expansions" (PDF). SIAM Journal on Scientific Computing. 27 (6): 1903–1928. Bibcode:2006SJSC...27.1903R. CiteSeerX 10.1.1.125.7415. doi:10.1137/050623073. Archived (PDF) from the original on 2014-12-17. Retrieved 2014-09-18.
- ^ Potts, Daniel; Steidl, Gabriele; Tasche, Manfred (2001). "Fast Fourier transforms for nonequispaced data: A tutorial" (PDF). In Benedetto, J. J.; Ferreira, P. (eds.). Modern Sampling Theory: Mathematics and Applications. Birkhäuser. Archived (PDF) from the original on 2007-09-26.
- ^ Burgess, Richard James (2014). The History of Music Production. Oxford University Press. ISBN 978-0199357178. Retrieved 1 August 2019.
- ^ Chu, Eleanor; George, Alan (1999-11-11) [1999-11-11]. "Chapter 16". Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms. CRC Press. pp. 153–168. ISBN 978-1-42004996-1.
- ^ Fernandez-de-Cossio Diaz, Jorge; Fernandez-de-Cossio, Jorge (2012-08-08). "Computation of Isotopic Peak Center-Mass Distribution by Fourier Transform". Analytical Chemistry. 84 (16): 7052–7056. doi:10.1021/ac301296a. ISSN 0003-2700. PMID 22873736.
- ^ Kijewski-Correa, T.; Kareem, A. (October 2006). "Efficacy of Hilbert and Wavelet Transforms for Time-Frequency Analysis". Journal of Engineering Mechanics. 132 (10): 1037–1049. doi:10.1061/(ASCE)0733-9399(2006)132:10(1037). ISSN 0733-9399.
- ^ Stern, Richard M. (2020). "Notes on short-time Fourier transforms" (PDF). Archived (PDF) from the original on 2025-02-08. Retrieved 2025-02-08.
- ^ Cormen, Thomas H.; Nicol, David M. (1998). "Performing out-of-core FFTs on parallel disk systems". Parallel Computing. 24 (1): 5–20. CiteSeerX 10.1.1.44.8212. doi:10.1016/S0167-8191(97)00114-2. S2CID 14996854.
- ^ Dutt, Alok; Rokhlin, Vladimir (1993-11-01). "Fast Fourier Transforms for Nonequispaced Data". SIAM Journal on Scientific Computing. 14 (6): 1368–1393. Bibcode:1993SJSC...14.1368D. doi:10.1137/0914081. ISSN 1064-8275.
- ^ Rockmore, Daniel N. (2004). "Recent Progress and Applications in Group FFTs". In Byrnes, Jim (ed.). Computational Noncommutative Algebra and Applications. NATO Science Series II: Mathematics, Physics and Chemistry. Vol. 136. Springer Netherlands. pp. 227–254. CiteSeerX 10.1.1.324.4700. doi:10.1007/1-4020-2307-3_9. ISBN 978-1-4020-1982-1. S2CID 1412268.
- ^ Ryo, Asaka; Kazumitsu, Sakai; Ryoko, Yahagi (2020). "Quantum circuit for the fast Fourier transform". Quantum Information Processing. 19 (277): 277. arXiv:1911.03055. Bibcode:2020QuIP...19..277A. doi:10.1007/s11128-020-02776-5. S2CID 207847474.
- ^ "Arm Performance Libraries". Arm. 2020. Retrieved 2020-12-16.
- ^ "Complete list of C/C++ FFT libraries". VCV Community. 2020-04-05. Retrieved 2021-03-03.
Further reading
- Brigham, Elbert Oran (1974). The fast Fourier transform (Nachdr. ed.). Englewood Cliffs, N.J: Prentice-Hall. ISBN 978-0-13-307496-3.
- Briggs, William L.; Henson, Van Emden (1995). The DFT: An Owner's Manual for the Discrete Fourier Transform. Philadelphia: Society for Industrial and Applied Mathematics. ISBN 978-0-89871-342-8.
- Chu, Eleanor; George, Alan (2000). Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms. Computational mathematics series. Boca Raton, Fla. London: CRC Press. ISBN 978-0-8493-0270-1.
- Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Chapter 30: Polynomials and the FFT". Introduction to Algorithms (2nd. ed.). Cambridge (Mass.): MIT Press. ISBN 978-0-262-03293-3.
- Elliott, Douglas F.; Rao, K. Ramamohan (1982). Fast transforms: algorithms, analyses, applications. New York: Academic Press. ISBN 978-0-12-237080-9.
- Guo, H.; Sitton, G.A.; Burrus, C.S. (1994). "The quick discrete Fourier transform". Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing. Vol. iii. IEEE. pp. III/445–III/448. doi:10.1109/ICASSP.1994.389994. ISBN 978-0-7803-1775-8. S2CID 42639206.
- Johnson, Steven G.; Frigo, Matteo (January 2007). "A Modified Split-Radix FFT With Fewer Arithmetic Operations" (PDF). IEEE Transactions on Signal Processing. 55 (1): 111–119. Bibcode:2007ITSP...55..111J. CiteSeerX 10.1.1.582.5497. doi:10.1109/TSP.2006.882087. ISSN 1053-587X. S2CID 14772428. Archived (PDF) from the original on 2005-05-26.
- Nussbaumer, Henri J. (1990). Fast Fourier Transform and Convolution Algorithms. Springer series in information sciences (2., corr. and updated ed.). Berlin Heidelberg: Springer. ISBN 978-3-540-11825-1.
- Press, William H.; Teukolsky, Saul A.; Vetterling, William T.; Flannery, Brian P. (2007). "Chapter 12. Fast Fourier Transform". Numerical recipes: the art of scientific computing (PDF). Numerical Recipes (3. ed.). Cambridge: Cambridge University Press. pp. 600–639. ISBN 978-0-521-88068-8.
- Singleton, R. (June 1969). "A short bibliography on the fast Fourier transform". IEEE Transactions on Audio and Electroacoustics. 17 (2): 166–169. Bibcode:1969ITAuE..17..166S. doi:10.1109/TAU.1969.1162040. ISSN 0018-9278. (NB. Contains extensive bibliography.)
- Prestini, Elena (2004). The evolution of applied harmonic analysis: models of the real world. Applied and numerical harmonic analysis. Boston; Berlin: Springer Media. Section 3.10: Gauss and the asteroids: history of the FFT. ISBN 978-0-8176-4125-2.
- Van Loan, Charles F. (1992). Computational Frameworks for the Fast Fourier Transform. Frontiers in applied mathematics. Philadelphia: Society for Industrial and Applied Mathematics. ISBN 978-0-89871-285-8.
- Terras, Audrey (1999). Fourier Analysis on Finite Groups and Applications. London Mathematical Society student texts. Cambridge (GB): Cambridge University Press. ISBN 978-0-521-45718-7. (Chap.9 and other chapters)
External links
- Fast Fourier Transform for Polynomial Multiplication – fast Fourier algorithm
- Fast Fourier transform — FFT – FFT programming in C++ – the Cooley–Tukey algorithm
- Online documentation, links, book, and code
- Sri Welaratna, "Thirty years of FFT analyzers Archived 2014-01-12 at the Wayback Machine", Sound and Vibration (January 1997, 30th anniversary issue) – a historical review of hardware FFT devices
- ALGLIB FFT Code – a dual/GPL-licensed multilanguage (VBA, C++, Pascal, etc.) numerical analysis and data processing library
- SFFT: Sparse Fast Fourier Transform – MIT's sparse (sub-linear time) FFT algorithm, sFFT, and implementation
- VB6 FFT – a VB6 optimized library implementation with source code
- Interactive FFT Tutorial – a visual interactive intro to Fourier transforms and FFT methods
- Introduction to Fourier analysis of time series – a tutorial on how to use the Fourier transform in time series analysis
Fast Fourier transform
Fundamentals
Definition
The fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a finite sequence of equally spaced data points in a computationally efficient manner.[1] The DFT of a complex-valued sequence $x_n$ for $n = 0, 1, \dots, N-1$ produces coefficients

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i kn/N}, \qquad k = 0, 1, \dots, N-1,$$

which represent the frequency-domain representation of the input.[9] Direct evaluation via this summation requires $O(N^2)$ operations, but the FFT reduces the complexity to $O(N \log N)$, enabling practical computation for large $N$.

The FFT serves a fundamental role in signal processing by converting time-domain sequences into frequency-domain representations, allowing decomposition into constituent sinusoidal components for analysis and manipulation.[10] The transform output is periodic with period $N$, such that $X_{k+N} = X_k$, and exhibits conjugate symmetry for real-valued input sequences, where $X_{N-k} = \overline{X_k}$ for $k = 1, \dots, N-1$.[11]

As an introductory example, consider a real-valued sequence of length $N = 4$. Its FFT output illustrates the DC component in $X_0$, the conjugate symmetry between $X_1$ and $X_3$, and the Nyquist frequency at $X_2$.[9]

Relationship to the Discrete Fourier Transform
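As a concrete check of the DFT sum and the symmetry properties stated in the definition, the following sketch evaluates the transform directly in $O(N^2)$ time. The input `[1, 2, 3, 4]` and the helper names `dft` and `idft` are illustrative assumptions, not values or functions taken from the article; the sketch also confirms the inverse transform and the energy relation numerically.

```python
import cmath

def dft(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]

def idft(X):
    """Direct inverse: x_n = (1/N) * sum_k X_k * exp(+2*pi*i*k*n/N)."""
    n_pts = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
                for k in range(n_pts)) / n_pts
            for n in range(n_pts)]

x = [1.0, 2.0, 3.0, 4.0]   # hypothetical example input
X = dft(x)                 # approximately [10, -2+2j, -2, -2-2j]
x_back = idft(X)           # recovers x up to rounding

# Energy in the time domain and (1/N) times energy in the frequency domain.
energy_time = sum(abs(v) ** 2 for v in x)
energy_freq = sum(abs(v) ** 2 for v in X) / len(X)
```

Here `X[0]` is the DC term (the sum of the samples), `X[1]` and `X[3]` are complex conjugates of each other, and `X[2]` is the real-valued Nyquist term.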
The Discrete Fourier Transform (DFT) provides a discrete-frequency representation of a finite-length sequence of equally spaced samples of a function, serving as the primary computational tool for frequency analysis in digital signal processing. For a sequence $x_n$ of length $N$, defined for $n = 0, 1, \dots, N-1$, the forward DFT is given by

$$X_k = \sum_{n=0}^{N-1} x_n W_N^{kn}, \qquad k = 0, 1, \dots, N-1,$$

where $W_N = e^{-2\pi i/N}$ and $X_k$ represents the frequency-domain coefficients at discrete frequencies $k/N$ cycles per sample.[11] The inverse DFT recovers the original sequence via

$$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k W_N^{-kn}, \qquad n = 0, 1, \dots, N-1.$$

This pair of transforms is unitary up to scaling, preserving the structure of the signal in the frequency domain.[11]

Key properties of the DFT include linearity, time and frequency shift theorems, and convolution properties, which mirror those of the continuous Fourier transform but adapted to finite discrete sequences. A fundamental property is Parseval's theorem, which states that the energy of the sequence is preserved between domains:

$$\sum_{n=0}^{N-1} |x_n|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X_k|^2.$$

This relation quantifies the power distribution across frequencies, enabling energy-based analyses without loss of information.[11]

The DFT arises as a sampled version of the continuous-time Fourier transform under specific assumptions. For a bandlimited continuous signal $x(t)$ sampled at rate $f_s = 1/T$ to yield $x_n = x(nT)$, the DFT coefficients correspond to samples of the Fourier transform of the periodic extension $\tilde{x}(t)$, where $\tilde{x}(t)$ repeats every $NT$ seconds. This periodic assumption implies that the sequence is treated as one period of an infinite periodic train, with zero-padding if necessary to length $N$, and requires $NT$ to be at least the signal duration to avoid time-domain aliasing.[12][11]

Direct computation of the DFT involves evaluating $N$ sums, each with $N$ terms, leading to an overall complexity of $O(N^2)$ arithmetic operations. This can be interpreted as a matrix-vector multiplication $X = F_N x$, where the DFT matrix $F_N$ has entries $(F_N)_{kn} = W_N^{kn}$, a dense Vandermonde structure requiring $N^2$ complex multiplications and $N(N-1)$ complex additions.[13]

Historical Development
Early Concepts and Precursors
The foundations of the Fast Fourier Transform (FFT) trace back to the continuous Fourier analysis introduced by Joseph Fourier in his 1822 treatise Théorie analytique de la chaleur, where he demonstrated that arbitrary periodic functions could be represented as infinite sums of sines and cosines, laying the groundwork for spectral decomposition in physical systems like heat conduction.[14] This continuous framework motivated later discretizations as computational needs arose in the early 20th century, particularly with the advent of numerical methods for solving differential equations, where sampled data required finite approximations of Fourier integrals to enable practical calculations.

A pivotal precursor emerged in Carl Friedrich Gauss's unpublished 1805 work on least-squares interpolation for asteroid orbits, in which he developed a discrete summation akin to the modern Discrete Fourier Transform (DFT) using a divide-and-conquer approach to exploit symmetries in trigonometric sums, though this remained obscure until its posthumous publication in 1866.[15] Building on such ideas, G. C. Danielson and Cornelius Lanczos advanced efficient computation in 1942 by deriving a recursive lemma that decomposes the DFT into smaller subproblems for spectrum analysis in x-ray crystallography, achieving $O(N \log N)$ complexity through repeated halvings of the data length, motivated by the need for faster filtering in experimental data processing.[16] That same year, Ralph V. L. Hartley proposed a real-valued transform alternative to the complex Fourier series in his analysis of transmission problems in electrical engineering, emphasizing symmetrical kernel functions to simplify computations for real signals without imaginary components.[17]

In the 1950s, these concepts gained traction amid growing demands in early digital signal processing, particularly for radar signal analysis, which had expanded during World War II, and for subsequent seismological applications, where direct DFT evaluation proved computationally prohibitive on vacuum-tube computers, prompting explorations of recursive and symmetry-based accelerations.[16] A notable hint toward formalized fast methods appeared in I. J. Good's 1958 paper on statistical interaction algorithms, where a footnote briefly alluded to efficient multidimensional Fourier techniques via prime-factor decompositions for factorial designs, without providing a complete algorithmic description.[18]

Cooley-Tukey Breakthrough and Evolution
James W. Cooley, working at IBM's Thomas J. Watson Research Center, and John W. Tukey, a statistician at Princeton University and Bell Labs, developed the divide-and-conquer approach to computing the discrete Fourier transform. The algorithm was first demonstrated in 1964 and was motivated by the need for efficient processing of seismic data to detect underground nuclear tests.[19] It reduced computation time dramatically compared to direct methods, enabling practical applications on early computers. Cooley and Tukey published their findings in 1965 as "An Algorithm for the Machine Calculation of Complex Fourier Series" in Mathematics of Computation, building upon scattered precursors like Carl Friedrich Gauss's 1805 work on least squares and I. J. Good's 1958 suggestions for efficient DFT computation.[2]

Tukey's interest in Fourier methods stemmed from his earlier contributions to spectrum analysis during the 1940s at Bell Labs, where he developed techniques for estimating power spectra from time series data, including applications in radar and communications engineering. The 1965 paper quickly gained traction, with initial implementations appearing soon after. In 1967, Norman M. Brenner at MIT's Lincoln Laboratory published FORTRAN implementations of the algorithm, building on interest sparked by Thomas G. Stockham; these facilitated its use in analyzing seismic and audio data and drew broader interest among signal processing researchers.[20]

The algorithm's evolution accelerated through optimizations and dissemination in the late 1960s. Glenn D. Bergland at Sandia Laboratories published radix-2 and higher-radix variants in 1969, including a radix-8 subroutine for real-valued series that improved efficiency for specific hardware and data types. Key events included the Arden House Workshops on Fast Fourier Transform Processing, organized by the IEEE Audio and Electroacoustics Committee in 1968 and 1970, which brought together researchers from industry and academia to share implementations and applications, significantly popularizing the FFT. These developments enabled real-time signal processing in fields like seismology and speech analysis, transforming the feasibility of large-scale Fourier computations on computers of the era.[21]

Core Algorithms
Cooley-Tukey Algorithm
The Cooley-Tukey algorithm is a divide-and-conquer approach to computing the discrete Fourier transform (DFT) by recursively decomposing an N-point transform into smaller transforms of sizes N₁ and N₂, where N = N₁ × N₂. This factorization reduces the computational complexity from O(N²) to O(N log N) operations when N is highly composite, in particular when N is a power of two. The general recursive step reindexes the input and output arrays to separate the DFT sum into independent subproblems, with twiddle factors (powers of W = exp(-2πi / N)) applied to combine results. For the common radix-2 case where N = 2ᵐ, the decomposition splits the input into even- and odd-indexed subsequences of length N/2.[22]

In the radix-2 formulation, the DFT coefficients X_k for k = 0 to N-1 are computed as follows for 0 ≤ k < N/2:

$$X_k = X_{\text{even},k} + W^k X_{\text{odd},k}, \qquad X_{k+N/2} = X_{\text{even},k} - W^k X_{\text{odd},k},$$

where X_even and X_odd are the (N/2)-point DFTs of the even- and odd-indexed inputs, respectively, and W = exp(-2πi / N). This even-odd split is applied recursively until base cases of 1- or 2-point DFTs are reached, forming a recursion tree with log₂ N levels. In their in-place forms, the algorithm supports both decimation-in-time (DIT), which processes input in bit-reversed order, and decimation-in-frequency (DIF), which produces bit-reversed output.[22][23][24]

The radix-2 butterfly operation is the core computational unit, combining two inputs a and b into outputs a + W^j · b and a - W^j · b, where j is the index determining the twiddle factor. In a signal-flow graph, butterflies connect inputs to outputs across stages, with each stage halving the transform size and applying twiddles. For in-place computation, the array is overwritten stage by stage: initialize with bit-reversed input (for DIT), then for each of the log₂ N stages, iterate over groups of size 2^s (s from 1 to log₂ N), computing butterflies within each group using strides that double per stage, with twiddles W^{m · 2^{log₂ N - s}} for offset m.
This requires (N/2) log₂ N complex multiplications and N log₂ N complex additions overall.[25][23]

The recursive DIT variant can be written in Python as follows (assuming the length N of the complex input sequence x is a power of 2):

```python
import cmath

def dit_fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = dit_fft(x[0::2])  # DFT of the even-indexed samples
    odd = dit_fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n)  # W^k
        out[k] = even[k] + twiddle * odd[k]
        out[k + n // 2] = even[k] - twiddle * odd[k]
    return out
```
The corresponding recursive DIF variant applies the butterflies first and then transforms the two halves of the result:

```python
import cmath

def dif_fft(x):
    """Recursive radix-2 decimation-in-frequency FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Butterflies: sums feed the even outputs, twiddled differences the odd ones.
    top = [x[m] + x[m + half] for m in range(half)]
    bottom = [(x[m] - x[m + half]) * cmath.exp(-2j * cmath.pi * m / n)
              for m in range(half)]
    even = dif_fft(top)     # X[0], X[2], X[4], ...
    odd = dif_fft(bottom)   # X[1], X[3], X[5], ...
    out = [0j] * n
    out[0::2] = even
    out[1::2] = odd
    return out
```
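The in-place, stage-by-stage procedure described earlier in this section (bit-reversed input followed by log₂ N butterfly stages) can be sketched iteratively. This is a minimal illustration rather than a tuned implementation; the function names `bit_reverse_permute` and `fft_inplace` are hypothetical, and the twiddle factor is accumulated by repeated multiplication for brevity, which a production library would typically replace with a precomputed table.

```python
import cmath

def bit_reverse_permute(a):
    """Reorder list a in place so that element i moves to its bit-reversed index."""
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:       # propagate the carry of a reversed-binary increment
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:            # swap each pair only once
            a[i], a[j] = a[j], a[i]

def fft_inplace(a):
    """Iterative radix-2 DIT FFT that overwrites a; len(a) must be a power of 2."""
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    bit_reverse_permute(a)
    size = 2                 # group size 2^s for stage s
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0 + 0j
            for m in range(half):
                u = a[start + m]
                t = w * a[start + m + half]
                a[start + m] = u + t           # a + W^j * b
                a[start + m + half] = u - t    # a - W^j * b
                w *= w_step
        size *= 2
    return a

spectrum = fft_inplace([1, 2, 3, 4])   # approximately [10, -2+2j, -2, -2-2j]
```

Apart from the list itself, the routine uses only a constant amount of extra storage, which is the main reason the bit-reversal pass is accepted as a preprocessing cost.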
Prime-Factor and Related Factorization Algorithms
Prime-factor algorithms provide an alternative to radix-based decompositions for computing the discrete Fourier transform (DFT) when the transform length factors into coprime integers, leveraging number-theoretic mappings to simplify the computation. The prime-factor algorithm (PFA), pioneered by Good in 1958 and refined by Thomas in 1963, reindexes the input and output sequences using the Chinese Remainder Theorem (CRT) to express the $N$-point DFT as a set of smaller DFTs without intermediate twiddle factors or bit-reversal permutations. This row-column-like decomposition treats the data as a virtual two-dimensional array of dimensions $N_1 \times N_2$, where $N = N_1 N_2$ and $\gcd(N_1, N_2) = 1$, computing $N_1$ DFTs of length $N_2$ followed by $N_2$ DFTs of length $N_1$.

The indexing in the PFA is defined by the CRT mapping: for input index $n = \langle N_2 n_1 + N_1 n_2 \rangle_N$ and output index $k = \langle N_2 \langle N_2^{-1} \rangle_{N_1} k_1 + N_1 \langle N_1^{-1} \rangle_{N_2} k_2 \rangle_N$, where $0 \le n_1, k_1 < N_1$, $0 \le n_2, k_2 < N_2$, $\langle N_2^{-1} \rangle_{N_1}$ is the multiplicative inverse of $N_2$ modulo $N_1$, $\langle N_1^{-1} \rangle_{N_2}$ is the multiplicative inverse of $N_1$ modulo $N_2$, and $\langle \cdot \rangle_N$ denotes reduction modulo $N$. The resulting DFT relation simplifies to

$$X_{k_1, k_2} = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x_{n_1, n_2} \, W_{N_1}^{n_1 k_1} \, W_{N_2}^{n_2 k_2},$$

which separates into independent smaller DFTs along each dimension. For multiple factors, the algorithm extends recursively or iteratively over the prime factorization of $N$.

An illustrative example is the 15-point DFT ($N = 15 = 3 \times 5$), where the input sequence x[0] to x[14] is rearranged into a $3 \times 5$ matrix via the mapping $n = \langle 5 n_1 + 3 n_2 \rangle_{15}$. Three 5-point DFTs are computed on the rows, followed by five 3-point DFTs on the columns of the result; the output is then read out using the CRT mapping $k = \langle 10 k_1 + 6 k_2 \rangle_{15}$ to yield X[0] to X[14]. This requires 34 real multiplications and 82 real additions in total, compared to higher counts in non-factorized methods for this length.
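The 15-point decomposition can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `dft` and `pfa_dft` are hypothetical, the small transforms are evaluated directly rather than with optimized short-length modules, and the index maps follow the standard Good-Thomas form (input map $n = \langle N_2 n_1 + N_1 n_2 \rangle_N$, CRT map on the output), so the sketch shows the twiddle-free structure but not the operation counts quoted above.

```python
import cmath

def dft(x):
    """Direct DFT, used here as a stand-in for the short-length modules."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]

def pfa_dft(x, n1, n2):
    """Good-Thomas DFT of length n1*n2 for coprime n1 and n2 (no twiddles)."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    inv_n2 = pow(n2, -1, n1)   # inverse of n2 modulo n1 (Python 3.8+)
    inv_n1 = pow(n1, -1, n2)   # inverse of n1 modulo n2
    # Input map: element (a, b) of the n1 x n2 grid is x[(a*n2 + b*n1) mod n].
    grid = [[x[(a * n2 + b * n1) % n] for b in range(n2)] for a in range(n1)]
    # n1 row transforms of length n2.
    rows = [dft(row) for row in grid]
    # n2 column transforms of length n1; cols[k2][k1] holds the result.
    cols = [dft([rows[a][b] for a in range(n1)]) for b in range(n2)]
    # Output map (CRT): k = (k1*inv_n2*n2 + k2*inv_n1*n1) mod n.
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(k1 * inv_n2 * n2 + k2 * inv_n1 * n1) % n] = cols[k2][k1]
    return out

x15 = [complex(n, 0) for n in range(15)]   # hypothetical test input
X15 = pfa_dft(x15, 3, 5)                   # matches dft(x15) up to rounding
```

For $N_1 = 3$ and $N_2 = 5$ both modular inverses equal 2, which is where the output map $k = \langle 10 k_1 + 6 k_2 \rangle_{15}$ comes from.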
Related factorization approaches, such as the Winograd FFT introduced in 1976, further optimize small prime-length DFTs by minimizing multiplications through sparse polynomial convolutions derived from the cyclic structure of the transform. For a prime length $N$, the Rader algorithm (1968) maps the non-trivial indices (all indices except zero) to a cyclic convolution of length $N - 1$ using a primitive root $g$ modulo $N$, where the index permutation is $n = g^q \bmod N$ for $q = 0$ to $N - 2$, reducing the problem to an efficient convolution computable via smaller FFTs. Winograd's method generalizes this to short composite lengths by factoring into minimal-arithmetic kernels, expressing the underlying cyclic convolution of the input $x$ with a fixed kernel $h$ in the bilinear form $C\,[(A h) \odot (B x)]$, where $\odot$ denotes the element-wise product and $A$, $B$, $C$ are sparse matrices with few non-zero entries.

Winograd algorithms achieve lower multiplication counts for short transforms by exploiting algebraic identities to replace general multiplications with additions and pre/post-additions around fewer scalar multiplies. For instance:

| Length | Winograd Real Multiplications | Winograd Real Additions | Cooley-Tukey Baseline (Real Multiplications) |
|---|---|---|---|
| 2 | 0 | 2 | 4 |
| 3 | 2 | 10 | 12 |
| 4 | 0 | 10 | 16 |
| 5 | 4 | 34 | 40 |
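Rader's reindexing, mentioned above for prime lengths, can be illustrated with a short sketch. The function names are hypothetical, the primitive root is found by brute force, and the cyclic convolution is evaluated directly for clarity; a fast implementation would instead compute that convolution with FFTs of a convenient length.

```python
import cmath

def primitive_root(p):
    """Smallest primitive root modulo an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, q, p) for q in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")

def cyclic_convolve(a, b):
    """Direct cyclic convolution: c[r] = sum_q a[q] * b[(r - q) mod m]."""
    m = len(a)
    return [sum(a[q] * b[(r - q) % m] for q in range(m)) for r in range(m)]

def rader_dft(x):
    """DFT of prime length p via one cyclic convolution of length p - 1."""
    p = len(x)
    g = primitive_root(p)
    g_inv = pow(g, -1, p)                      # modular inverse (Python 3.8+)
    # Permuted input a[q] = x[g^q] and kernel b[q] = W^(g^-q), W = exp(-2*pi*i/p).
    a = [x[pow(g, q, p)] for q in range(p - 1)]
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, q, p) / p) for q in range(p - 1)]
    conv = cyclic_convolve(a, b)
    out = [0j] * p
    out[0] = sum(x)                            # zero-frequency term
    for m in range(p - 1):
        out[pow(g_inv, m, p)] = x[0] + conv[m] # X[g^-m] = x[0] + (a * b)[m]
    return out

x7 = [complex(n, -0.5 * n) for n in range(7)]  # hypothetical test input
X7 = rader_dft(x7)
```

The identity used is $X_{g^{-m}} = x_0 + \sum_q x_{g^q} W^{g^{-(m-q)}}$, which is a cyclic convolution in the exponent index because $g$ generates all nonzero residues modulo $p$.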
Specialized Algorithms
For Real-Valued Inputs
When the input sequence to the discrete Fourier transform (DFT) consists entirely of real-valued numbers, specialized fast Fourier transform (FFT) algorithms can exploit the inherent symmetry of the resulting transform to reduce computational requirements by approximately half compared to the general complex case. This optimization is particularly valuable in applications such as signal processing, where inputs like audio or sensor data are often real-valued. The core approach involves treating the real sequence as a complex sequence with zero imaginary parts and then post-processing the output to extract the nonredundant DFT coefficients using the Hermitian symmetry property.

One fundamental method is complexification followed by output separation. Given a real-valued sequence $x_n$ for $n = 0, \dots, N-1$, form the complex input $z_n = x_n + i \cdot 0$ and compute its $N$-point complex DFT $Z_k$. Because the input is real, $Z_{N-k} = \overline{Z_k}$, where the overline denotes the complex conjugate. Only the coefficients for $k = 0$ to $N/2$ therefore need to be kept:

$$X_k = Z_k \quad (k = 0, \dots, N/2), \qquad X_{N-k} = \overline{X_k},$$

with $X_0$ and $X_{N/2}$ (if $N$ is even) being purely real. The real parts are even in $k$ and the imaginary parts are odd, so the upper half of the spectrum carries no additional information. This approach requires one full complex FFT plus $O(N)$ post-processing operations.[28]

For further efficiency, dedicated real-input FFT algorithms avoid the full complex computation by directly incorporating the symmetry into the factorization. A prominent example is the real-valued split-radix FFT, which adapts the split-radix decomposition to real data, eliminating redundant operations on imaginary components. For power-of-two lengths $N$, this roughly halves the real multiplications and real additions compared to a complex split-radix FFT. The algorithm proceeds by splitting the real input into even and odd indexed parts, applying recursive real FFTs, and using sine-cosine symmetries to combine results without complex arithmetic throughout.[28]

To compute multiple real FFTs efficiently, techniques pack two or more real sequences into a single complex input of the same length, leveraging phase shifts or direct interleaving. For two real sequences $x_n$ and $y_n$, form the complex input $z_n = x_n + i y_n$ and compute its $N$-point complex FFT $Z_k$. The individual DFTs are separated using

$$X_k = \frac{Z_k + \overline{Z_{N-k}}}{2}, \qquad Y_k = \frac{Z_k - \overline{Z_{N-k}}}{2i},$$

for $k = 0, \dots, N-1$, with $Z_N$ read as $Z_0$ so that the DC and Nyquist terms are handled correctly. This method computes two $N$-point real FFTs using one $N$-point complex FFT plus $O(N)$ separation steps, effectively halving the cost per transform. For example, in audio processing, this packing allows simultaneous transformation of stereo channels (left and right) with minimal overhead. Such optimizations are widely implemented in libraries like FFTW, building on these foundational techniques.[28]

For Symmetric or Structured Data
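As a small illustration of how conjugate symmetry is exploited in practice, the two-for-one trick for real-valued inputs (two real sequences packed into one complex transform) can be sketched as follows. The names `dft` and `two_real_dfts` are hypothetical, and the direct `dft` merely stands in for whichever complex FFT routine is available; the separation formulas are the standard ones, $X_k = (Z_k + \overline{Z_{N-k}})/2$ and $Y_k = (Z_k - \overline{Z_{N-k}})/(2i)$.

```python
import cmath

def dft(z):
    """Direct complex DFT; a stand-in for any complex FFT routine."""
    n_pts = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]

def two_real_dfts(x, y):
    """DFTs of two equal-length real sequences from one complex transform."""
    n = len(x)
    if len(y) != n:
        raise ValueError("sequences must have the same length")
    z = [complex(a, b) for a, b in zip(x, y)]   # z_n = x_n + i*y_n
    Z = dft(z)
    X, Y = [], []
    for k in range(n):
        zc = Z[(n - k) % n].conjugate()         # conj(Z_{N-k}), with Z_N = Z_0
        X.append((Z[k] + zc) / 2)
        Y.append((Z[k] - zc) / 2j)
    return X, Y

left = [1.0, 2.0, 3.0, 4.0]      # hypothetical "left channel"
right = [0.0, 1.0, 0.0, -1.0]    # hypothetical "right channel"
L_spec, R_spec = two_real_dfts(left, right)
```

The separation works because the spectrum of a real sequence is Hermitian, so $\overline{Z_{N-k}} = X_k - i Y_k$ while $Z_k = X_k + i Y_k$.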
When the input sequence to the discrete Fourier transform (DFT) possesses symmetries such as even or odd functions, specialized fast Fourier transform (FFT) algorithms can exploit these properties to reduce the computational size below that of the standard N-point FFT. For an even-symmetric input, where $x_n = x_{N-n}$, the DFT computation reduces to an N/2-point FFT after appropriate preprocessing of the data into a real-valued sequence that captures the symmetric components. This approach eliminates redundant calculations inherent in the symmetry, achieving approximately half the operations of a full complex FFT while maintaining the same output. Similarly, for odd-symmetric inputs, $x_n = -x_{N-n}$, a parallel reduction to an N/2-point FFT is possible, with modifications to the post-processing steps to recover the imaginary parts of the transform. These techniques build on the Cooley-Tukey radix-2 decomposition but prune unnecessary branches due to the symmetry constraints.[29]

Further savings arise when the input exhibits combined symmetries, such as quarter-wave even symmetry (even around both n=0 and n=N/4), allowing reduction to an N/4-point FFT. In this case, the input satisfies $x_n = x_{N-n}$ and $x_n = x_{N/2-n}$, and preprocessing involves weighting and folding the data to form a smaller transform that encodes the full DFT. For example, consider an N=4 quarter-wave even-symmetric input. The algorithm first forms a 2-point real sequence by folding the data, computes its 2-point FFT, and then reconstructs the 4-point DFT via simple additions and multiplications by sine/cosine factors, requiring only 4 real multiplications and 6 additions total, far fewer than the 24 operations of a general 4-point FFT. Such reductions are particularly valuable in applications where data naturally arises with these symmetries, like certain filter designs or periodic extensions.[29]

A prominent application of symmetry exploitation occurs in computing the Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST), which are tailored for real-valued inputs with even or odd extensions, yielding real outputs and serving as Hermitian-symmetric equivalents to the DFT. The Type-II DCT, widely adopted in compression standards like JPEG and MPEG, is formulated as

$$X_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right], \qquad k = 0, \dots, N-1,$$

and can be derived by taking the real part of the DFT of an augmented 2N-point sequence formed by even extension and zero-padding of the input. This embedding allows the DCT to be computed via a single 2N-point FFT followed by O(N) post-processing, achieving the same complexity as the FFT while benefiting from the input's implicit symmetry. The DST, particularly Type-III, follows analogously using odd extensions. These transforms are Hermitian in the sense that their outputs are real and symmetric, enabling storage and computation savings comparable to real-input FFTs.[30]

In structured data contexts, such as symmetric Toeplitz matrices, which arise in autoregressive modeling and linear prediction, the FFT exploits the constant-diagonal structure to enable fast matrix-vector multiplications. A symmetric N × N Toeplitz matrix can be embedded into a larger (2N-1) × (2N-1) circulant matrix, whose eigendecomposition is performed efficiently via the DFT, reducing the multiplication from O(N^2) to O(N log N) operations. For an N=4 example, a symmetric 4 × 4 Toeplitz matrix is extended to a 7×7 circulant matrix, and the product is obtained by forward and inverse FFTs on the padded vectors, with the central N elements extracted as the result. This method underpins iterative solvers for Toeplitz systems, leveraging the matrix's symmetry for numerical stability and efficiency.[31]

Computational Analysis
Complexity and Operation Counts
The direct computation of the discrete Fourier transform requires $O(N^2)$ arithmetic operations for an input sequence of length $N$. In contrast, fast Fourier transform algorithms reduce this to $O(N \log N)$ operations through recursive decomposition.[32] This bound follows from the divide-and-conquer structure of algorithms like Cooley-Tukey, where the transform is recursively split into smaller subtransforms. The recursion forms a tree with $\log_2 N$ levels for $N = 2^m$, and each level performs $O(N)$ operations (such as butterflies, each involving a few arithmetic steps), summing to $O(N \log N)$ total work.[32]

For the radix-2 Cooley-Tukey algorithm, the operation count is approximately $\frac{N}{2} \log_2 N$ complex multiplications and $N \log_2 N$ complex additions, assuming trivial multiplications by 1 or -1 are excluded. Lower bounds establish that any linear algorithm for the DFT with bounded multiplicative constants requires $\Omega(N \log N)$ additions, with no tight bound known for general arithmetic operations. Split-radix variants achieve improved counts, such as approximately $4N \log_2 N$ real operations.[32]

In-place implementations of the FFT, which overwrite the input array, require $O(N)$ space overall. Recursive formulations incur an additional $O(\log N)$ space for the call stack due to the recursion depth.[33]

| Algorithm | N | Complex Multiplications | Complex Additions |
|---|---|---|---|
| Cooley-Tukey (radix-2) | 4 | 0 | 8 |
| Cooley-Tukey (radix-2) | 8 | 4 | 24 |
| Cooley-Tukey (radix-2) | 16 | 16 | 64 |
| Winograd | 4 | 0 | 10 |
| Winograd | 5 | 4 | 19 |
| Winograd | 7 | 12 | 42 |
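The recursion-tree argument in this section can be checked with a small sketch. It assumes a textbook recursive radix-2 decimation-in-time FFT and counts every butterfly, including those whose twiddle factor is trivial, so the totals equal $\frac{N}{2} \log_2 N$ and are upper bounds on the multiplication counts listed in the table. The names `fft_radix2` and `count_butterflies` are hypothetical.

```python
import numpy as np

def fft_radix2(x, counter):
    """Recursive decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return np.asarray(x, dtype=complex)
    even = fft_radix2(x[0::2], counter)
    odd = fft_radix2(x[1::2], counter)
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    counter["butterflies"] += n // 2      # one butterfly per output pair at this level
    t = twiddle * odd
    return np.concatenate([even + t, even - t])

def count_butterflies(N):
    """Run the FFT on a test ramp, check it against NumPy, return the butterfly count."""
    counter = {"butterflies": 0}
    x = np.arange(N, dtype=float)
    X = fft_radix2(x, counter)
    assert np.allclose(X, np.fft.fft(x))
    return counter["butterflies"]

for N in (4, 8, 16):
    # Second and third columns agree: (N/2) * log2(N) butterflies in total.
    print(N, count_butterflies(N), (N // 2) * int(np.log2(N)))
```

Each of the $\log_2 N$ levels contributes $N/2$ butterflies, which is the per-level $O(N)$ work described above.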
Numerical Accuracy and Stability
The numerical stability of the fast Fourier transform (FFT) in finite-precision floating-point arithmetic has been extensively analyzed, revealing that accumulated rounding errors remain well-controlled despite the recursive structure of the algorithm. Backward stability proofs demonstrate that the computed output $\hat{X}$ satisfies $\hat{X} = F(x + \delta x)$, where $F$ denotes the exact DFT and the perturbation obeys $\|\delta x\| \le c\, u \log_2 N\, \|x\|$ for some constant $c$, $u$ is the machine unit roundoff (typically $2^{-53} \approx 1.1 \times 10^{-16}$ in double precision), $N$ is the transform length, and $\|\cdot\|$ denotes a suitable vector norm such as the Euclidean norm. This bound arises from the logarithmic depth of the computation tree in radix-2 Cooley-Tukey FFTs, where each level introduces rounding errors of order $u$ times the current intermediate norm, and errors propagate through the stages without significant growth due to the unitary nature of the transform.[34][35]

Although the FFT computation itself is stable, individual outputs of the discrete Fourier transform can be poorly determined for specific inputs that lead to destructive interference in the frequency domain, amplifying forward errors beyond the backward perturbation. For instance, inputs with components aligned such that certain output bins experience near-total cancellation can result in relative forward errors up to $\kappa\, u \log_2 N$, where $\kappa$ is the condition number of the effective submatrix, which can become large in pathological cases like highly oscillatory signals near the Nyquist frequency. However, since the DFT matrix is unitary up to scaling and thus well-conditioned overall ($\kappa = 1$), such amplification is rare in practice, and the error is typically bounded by the same $O(u \log N)$ for generic inputs.
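The logarithmic error growth discussed above can be probed empirically. The sketch below assumes a plain recursive radix-2 FFT run in single precision and compared against NumPy's double-precision result; the helper names are hypothetical, and the measured error also includes the rounding of the input to single precision. It is a rough experiment, not a reproduction of the cited analyses.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 FFT that keeps the dtype of x (length must be a power of two)."""
    n = x.shape[0]
    if n == 1:
        return x.copy()
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    # Twiddles are computed in double precision, then rounded to the working precision.
    w = np.exp(-2j * np.pi * np.arange(n // 2) / n).astype(x.dtype)
    t = w * odd
    return np.concatenate([even + t, even - t])

def relative_error(log2_n, seed=0):
    """Relative Euclidean error of the single-precision FFT for random input."""
    rng = np.random.default_rng(seed)
    n = 1 << log2_n
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    reference = np.fft.fft(x)                      # double-precision reference
    approx = fft_radix2(x.astype(np.complex64))    # single-precision computation
    return np.linalg.norm(approx - reference) / np.linalg.norm(reference)

u = np.finfo(np.float32).eps / 2                   # unit roundoff in single precision
for log2_n in (4, 8, 12):
    # Compare the measured error with the scale u * log2(N) from the bound.
    print(log2_n, relative_error(log2_n), u * log2_n)
```

Single precision is used only so that the rounding error is large enough to measure against a double-precision reference.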
The twiddle factors themselves do not contribute to ill-conditioning, as all of them have unit magnitude, but precomputation errors in approximating these complex exponentials via trigonometric functions can introduce additional error if not handled carefully.[34][36]

Specific implementation details can further impact stability, including the bit-reversal permutation stage, where an indexing error scrambles input elements and leads to complete output corruption. Twiddle factor precomputation introduces roundoff errors during sine and cosine evaluations, with worst-case analyses showing that certain methods (e.g., angle reduction via multiple-angle formulas) can accumulate extra error per factor, which propagates through the butterflies and affects the overall bound. Numerical experiments for large $N$ illustrate this growth: in double precision, the observed relative error for random inputs stays close to the theoretical $O(u \log N)$ estimate, confirming the sharpness of the bound without exceeding it significantly.[34][36][37]

To enhance stability, particularly for high-precision requirements, techniques such as compensated summation can be integrated into the butterfly additions, where an error compensation term tracks lost low-order bits, reducing summation errors from $O(nu)$ to $O(u)$ for $n$ terms, as in Kahan's algorithm adapted to FFT stages. For applications demanding precision beyond standard floating-point, falling back to direct quadratic-time evaluation of the DFT does not help, since its rounding error typically grows faster with $N$ than that of the FFT; mixed-precision schemes that precompute twiddles in higher precision are a more effective way to limit error propagation. These methods improve robustness, with compensated variants observed to halve error growth in large-scale transforms compared to naive implementations.[34]

Extensions and Generalizations
Multidimensional Transforms
The fast Fourier transform extends naturally to multidimensional arrays through a separable decomposition, leveraging the one-dimensional FFT along each dimension independently. This approach, a generalization of the Cooley-Tukey algorithm, enables efficient computation of the multidimensional discrete Fourier transform (DFT) for data such as images or volumetric signals.[38]

For a $d$-dimensional array of dimensions $N_1 \times N_2 \times \cdots \times N_d$, the separable multidimensional FFT applies a one-dimensional FFT sequentially to all elements along the first dimension, then the second, and so on. The total number of operations scales as $O(N \log N)$, where $N = N_1 N_2 \cdots N_d$ is the total array size, because the cost along dimension $j$ is $O(N \log N_j)$ and $\sum_j \log N_j = \log N$. This efficiency holds assuming each $N_j$ admits a fast one-dimensional factorization, such as powers of two.

In the two-dimensional case, the DFT of an $N_1 \times N_2$ array $x_{n_1,n_2}$ is defined as $X_{k_1,k_2} = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x_{n_1,n_2}\, e^{-2\pi i \left(k_1 n_1/N_1 + k_2 n_2/N_2\right)}$ for $k_1 = 0, \dots, N_1-1$ and $k_2 = 0, \dots, N_2-1$. The separability of the exponential term allows this double sum to be rewritten as a sequence of one-dimensional transforms.[39] The row-column algorithm implements this by first computing an $N_2$-point FFT along each of the $N_1$ rows of the input array, producing an intermediate array of size $N_1 \times N_2$. Then, an $N_1$-point FFT is applied along each of the $N_2$ columns of the intermediate array to yield the final $X_{k_1,k_2}$. For arrays with power-of-two dimensions, the algorithm supports in-place computation by reusing the input storage and applying bit-reversal permutations or similar indexing schemes from the one-dimensional Cooley-Tukey radix-2 method along each dimension, minimizing memory overhead.[40][38]

As an illustrative example, consider a 4×4 grayscale image whose intensity increases linearly across the array. Applying the row-column algorithm with 4-point FFTs (using Cooley-Tukey radix-2) first transforms each row, capturing horizontal frequency content, such as low-frequency trends from left to right. The subsequent column transforms incorporate vertical frequencies, revealing overall patterns like the linear increase in intensity. The resulting frequency-domain matrix encodes the image's spectral components, where low-frequency coefficients near (0,0) dominate due to the smooth gradient, while higher indices capture finer details or noise; shifting the zero-frequency term to the center via fftshift aids visual interpretation of the spectrum.[38]
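The row-column procedure described above can be sketched with NumPy's one-dimensional FFT applied along each axis in turn. The 4×4 ramp used here is a hypothetical stand-in for a smooth gradient image, not the article's example matrix, and `fft2_row_column` is an illustrative name.

```python
import numpy as np

def fft2_row_column(a):
    """2D DFT computed as row FFTs followed by column FFTs."""
    a = np.asarray(a, dtype=complex)
    rows_done = np.fft.fft(a, axis=1)       # N2-point FFT along each of the N1 rows
    return np.fft.fft(rows_done, axis=0)    # N1-point FFT along each of the N2 columns

# Hypothetical 4x4 "image" with a linear intensity ramp (values 0..15).
image = np.arange(16, dtype=float).reshape(4, 4)
spectrum = fft2_row_column(image)

# Move the zero-frequency (DC) term from index (0, 0) to the center for display.
centered = np.fft.fftshift(spectrum)
```

Because the transform is separable, the order of the two passes does not matter; transforming columns first gives the same result.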
Non-Standard Lengths and Variants
Fast Fourier transform (FFT) algorithms like the Cooley-Tukey method are highly efficient for input lengths that are powers of two, but many applications require transforms of arbitrary or non-composite lengths, such as primes or other irregular sizes. To address this, specialized variants adapt the FFT framework to maintain near-optimal computational complexity, often by reformulating the discrete Fourier transform (DFT) into forms computable via standard FFTs of convenient lengths. These adaptations are crucial for scenarios where data lengths are dictated by physical constraints, like sensor arrays or cryptographic keys.[41]

One prominent method for arbitrary lengths is Bluestein's chirp z-transform algorithm, which converts the DFT of length $N$ into a linear convolution that can be efficiently evaluated using FFTs of padded power-of-two sizes. Developed in 1968, this approach exploits the quadratic phase structure of the DFT to express the output as $X_k = e^{-\pi i k^2/N} \sum_{n=0}^{N-1} \left(x_n e^{-\pi i n^2/N}\right) e^{\pi i (k-n)^2/N}$. By pre- and post-multiplying the input and output with chirp signals (quadratic phase factors) and padding the resulting convolution to a power of two, the transform achieves $O(N \log N)$ complexity regardless of $N$'s factorization, making it suitable for prime or composite lengths. This method is widely implemented in libraries like FFTW for non-standard sizes.[42][43]

For prime lengths specifically, Rader's algorithm provides an alternative by mapping the prime-length DFT to a cyclic convolution of length $N-1$, which is then computed using an FFT of that size. Introduced in 1968, it leverages properties of primitive roots modulo $N$ to reorder the transform into a form where the convolution can be accelerated, again yielding $O(N \log N)$ operations. This is particularly effective when $N-1$ has favorable factors for standard FFTs, though it incurs overhead from index permutations.
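Bluestein's reformulation can be sketched in a few lines, using the identity $nk = \tfrac{1}{2}\left[n^2 + k^2 - (k-n)^2\right]$ to turn the DFT into a convolution with a chirp. This is a minimal illustration rather than a library implementation: the name `bluestein_dft` is hypothetical, and the padded length is chosen conservatively as a power of two of at least $2N - 1$.

```python
import numpy as np

def bluestein_dft(x):
    """DFT of arbitrary length n via chirp pre/post-multiplication and FFT convolution."""
    x = np.asarray(x, dtype=complex)
    n = x.shape[0]
    m = 1 << (2 * n - 1).bit_length()       # a power of two >= 2n - 1
    k = np.arange(n)
    # Chirp exp(-i*pi*k^2/n); k^2 is reduced mod 2n to keep the phase argument small.
    chirp = np.exp(-1j * np.pi * ((k * k) % (2 * n)) / n)

    a = np.zeros(m, dtype=complex)
    a[:n] = x * chirp                        # pre-multiplied input, zero-padded
    b = np.zeros(m, dtype=complex)
    b[:n] = np.conj(chirp)                   # chirp filter for lags 0 .. n-1
    b[m - n + 1:] = np.conj(chirp[1:][::-1]) # lags -(n-1) .. -1 wrap to the end

    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))   # circular convolution
    return chirp * conv[:n]                  # post-multiply by the output chirp

x = np.random.default_rng(0).standard_normal(17)   # prime length
X = bluestein_dft(x)
```

In a production setting the FFT of the chirp filter `b` would be precomputed once per length and reused across transforms.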
Rader's method complements Bluestein's for primes, with implementations appearing in high-performance libraries for prime-length transforms.[44][45]

Beyond these, variants extend the FFT paradigm to specialized data structures. The sparse FFT targets compressible signals where the Fourier spectrum is $k$-sparse (with only $k$ significant coefficients), enabling sublinear-time recovery via compressive sensing techniques that sample and hash the input to isolate non-zero frequencies. Recent integrations with deep learning have improved reconstruction accuracy for signals like structural vibrations.[46][47] In quantum computing, the quantum Fourier transform (QFT) generalizes the FFT for superposition states, running in $O(\log^2 N)$ time on quantum hardware and serving as a core subroutine in Shor's algorithm for integer factorization, where it extracts periods from modular exponentials.[48]

As an illustrative example, consider computing the DFT for $N = 17$ (a prime) using Bluestein's algorithm with padding to the power-of-two length $M = 32$. The input is multiplied by a chirp $e^{-\pi i n^2/17}$, convolved with a precomputed chirp filter via two 32-point FFTs and an inverse FFT, then adjusted by the output chirp $e^{-\pi i k^2/17}$; this yields the 17-point transform in $O(M \log M)$ operations, an advantage over direct $O(N^2)$ evaluation that grows with $N$.[42]

Applications
Signal and Audio Processing
In signal and audio processing, the fast Fourier transform (FFT) enables efficient spectral analysis of one-dimensional time-series data, allowing the estimation of frequency content in signals such as audio waveforms. A fundamental application is the computation of the power spectral density (PSD) via the periodogram method, where the squared magnitude of the FFT output provides an estimate of the signal's power distribution across frequencies. Specifically, for a discrete-time signal $x_n$ of length $N$, the periodogram is given by $P_k = \frac{1}{N} |X_k|^2$, where $X_k$ is the $k$-th DFT coefficient obtained via FFT.[49] This nonparametric estimator reveals dominant frequencies in non-stationary signals like speech or music, though it suffers from high variance that can be mitigated by averaging multiple periodograms.[49]

The FFT also facilitates fast convolution, essential for implementing finite impulse response (FIR) filters in signal processing. Circular convolution of an input signal $x$ with a filter impulse response $h$, both of length $N$, is computed as $y = \mathrm{IFFT}\left(\mathrm{FFT}(x) \odot \mathrm{FFT}(h)\right)$, where $\odot$ denotes element-wise multiplication; this exploits the convolution theorem to reduce complexity from $O(N^2)$ to $O(N \log N)$.[50] For linear convolution of long signals, the overlap-add method segments the input into non-overlapping blocks, zero-pads each block, applies FFT-based circular convolution to it, and sums the overlapping output tails after appropriate shifting.[50] This technique is widely used for real-time filtering in audio systems, enabling efficient removal of noise or unwanted frequency bands.

In audio processing, the short-time Fourier transform (STFT), which applies the FFT to windowed segments of the signal, produces spectrograms that visualize time-varying frequency content.
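The FFT-based convolution and overlap-add steps described in this section can be sketched as follows. The sketch assumes real-valued signals and uses NumPy's real-input FFT; the function name `overlap_add` and the block size of 64 are illustrative choices, not values taken from the text.

```python
import numpy as np

def overlap_add(x, h, block=64):
    """Linear convolution of x with FIR filter h via FFT-based overlap-add."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    nfft = 1
    while nfft < block + len(h) - 1:     # room for each block's full linear convolution
        nfft *= 2
    H = np.fft.rfft(h, nfft)             # filter spectrum, computed once
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]     # non-overlapping input block
        yseg = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        stop = min(start + nfft, len(y))
        y[start:stop] += yseg[:stop - start]   # output tails overlap and are summed
    return y

rng = np.random.default_rng(0)
signal = rng.standard_normal(300)
taps = rng.standard_normal(17)
filtered = overlap_add(signal, taps)
```

Zero-padding each block to at least `block + len(h) - 1` samples is what makes the circular convolution of each block equal to its linear convolution.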
The STFT of $x_n$ with a window $w_n$ of length $L$ is $X_{m,k} = \sum_{n=0}^{L-1} x_{n+mH}\, w_n\, e^{-2\pi i kn/L}$, where $H$ is the hop size and $k = 0, \dots, L-1$; the magnitude squared yields the spectrogram for applications like audio compression and effects.[51] For pitch detection in speech or music, autocorrelation can be efficiently computed using the FFT: the autocorrelation sequence is the IFFT of $|X_k|^2$ (with the signal zero-padded to avoid circular wrap-around), with the pitch period identified as the lag of the first significant peak beyond the zero lag.[52]

As an illustrative example, consider designing an FFT-based low-pass FIR filter. The filter's impulse response $h_n$ is derived from the inverse FFT of an ideal low-pass frequency response, such as a rectangular window in the frequency domain truncated to pass frequencies below a cutoff $f_c$. The resulting filter's frequency response, obtained by taking the FFT of $h_n$, exhibits a passband with near-unity gain up to $f_c$, a transition band with sidelobe ripples due to truncation, and attenuation in the stopband, enabling effective high-frequency suppression in audio signals while preserving lower components.[50]

Image Analysis and Scientific Computing
In image processing, the two-dimensional Fast Fourier Transform (2D FFT) enables efficient computation of convolutions, which are fundamental for operations such as blurring and sharpening. Blurring is achieved by convolving the image with a low-pass filter, like a Gaussian kernel, in the frequency domain by multiplying the image's Fourier transform with the filter's transform, leveraging the convolution theorem to reduce computational cost from O(N^4) to O(N^2 log N) for an N x N image.[53] Similarly, sharpening enhances high-frequency components through high-emphasis filters, such as adding a scaled Laplacian to the original image after frequency-domain multiplication, preserving edges while amplifying details.[53] Frequency-domain filtering further utilizes 2D FFT for tasks like edge detection, where high-pass filters isolate abrupt intensity changes; for instance, a Sobel operator approximates gradients by emphasizing high frequencies perpendicular to edges, followed by magnitude computation and thresholding to delineate boundaries.[53] In scientific computing, the FFT facilitates solving partial differential equations (PDEs) in spectral space, particularly for periodic boundary conditions. A seminal approach solves the Poisson equation ∇²φ = -ρ by transforming to Fourier space, where it becomes -k² Φ(k) = -ρ(k), allowing direct division and inverse transform to obtain φ with O(N log N) complexity per dimension, ideal for electrostatics in simulations.[54] In molecular dynamics, FFT accelerates computation of correlation functions, such as the pair distribution function g(r), by performing the Fourier transform of the structure factor S(k) obtained from particle positions, enabling efficient analysis of liquid structure and dynamics in large systems. Parallel and multigrid variants of FFT enhance scalability in multidimensional simulations. 
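The spectral Poisson solve described above can be sketched for a periodic two-dimensional square grid. The sketch assumes a zero-mean source term and a domain of side $2\pi$ by default; the function name `solve_poisson_periodic` and the test field are hypothetical.

```python
import numpy as np

def solve_poisson_periodic(rho, length=2 * np.pi):
    """Solve laplacian(phi) = -rho on a periodic square grid using 2D FFTs."""
    n = rho.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)    # angular wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    rho_hat = np.fft.fft2(rho)
    phi_hat = np.zeros_like(rho_hat)
    nonzero = k2 > 0                                    # skip the k = 0 (mean) mode
    phi_hat[nonzero] = rho_hat[nonzero] / k2[nonzero]   # from -k^2 phi_hat = -rho_hat
    return np.fft.ifft2(phi_hat).real                   # zero-mean solution

n = 32
grid = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(grid, grid, indexing="ij")
phi_exact = np.sin(X) * np.cos(2 * Y)
rho = 5.0 * phi_exact          # laplacian(phi_exact) = -(1 + 4) * phi_exact
phi = solve_poisson_periodic(rho)
```

The mean of the potential is not determined by the equation under periodic boundary conditions, so the sketch fixes it to zero.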
Multigrid methods combine FFT solvers on coarse grids with iterative refinement for non-periodic PDEs, reducing iterations in fluid dynamics. As of 2025, GPU acceleration of parallel FFT has been integrated into climate models, such as the Meso-NH atmospheric simulation code (version 5.5), where 3D FFT solves pressure equations via pencil decomposition on up to 64 nodes (256 AMD MI250X GPUs) of the Adastra supercomputer, achieving up to 6× speedup for high-resolution convection-permitting forecasts with horizontal grid spacing down to 100 m.[55]

As a precursor to modern compression, 2D FFT relates to the Discrete Cosine Transform (DCT) in JPEG, where fast DCT algorithms exploit FFT-like butterfly structures to compute block-wise transforms, concentrating energy in low frequencies for quantization and lossy encoding.

Implementations and Alternatives
Software Libraries and Performance
Several prominent software libraries implement the fast Fourier transform (FFT), optimized for various languages, platforms, and hardware. FFTPACK, a Fortran package developed at the National Center for Atmospheric Research, provides efficient subroutines for 1D and multidimensional FFTs of periodic, real, and symmetric sequences, serving as a foundational reference implementation.[56] FFTW (Fastest Fourier Transform in the West), a widely used C library, employs an adaptive architecture that generates platform-specific code through a planner routine, which benchmarks and selects from multiple algorithm variants to maximize performance.[57] In the Python ecosystem, NumPy's fft module offers basic DFT routines, while SciPy's scipy.fft submodule extends this with advanced features like multidimensional transforms and real-to-complex optimizations, often leveraging backends such as PocketFFT for portability and speed. For GPU computing, NVIDIA's cuFFT library integrates with CUDA, supporting batched and multidimensional FFTs on NVIDIA hardware, with optimizations for power-of-two sizes like $N = 2^{20}$.[58]

Key performance factors in these libraries include cache efficiency, which reduces memory latency by improving data locality during the divide-and-conquer stages of the Cooley-Tukey algorithm; SIMD vectorization, enabling parallel execution of butterfly operations across multiple data elements using CPU extensions like AVX; and autotuning, as exemplified by FFTW's planner, which tests codelets to identify the fastest execution path for given hardware and problem size.[59][60]

As of 2025, the FFT landscape has evolved with quantum computing tools, such as IBM's Qiskit library, which includes implementations of the quantum Fourier transform (QFT) for hybrid classical-quantum workflows, allowing FFT-like decompositions in quantum circuits integrated with classical post-processing.[61] Additionally, sparse FFT techniques, which give sublinear-time approximations for signals with few dominant frequencies, have been explored alongside the Python scientific computing stack.

Benchmarks highlight the throughput advantages of GPU acceleration over CPUs for large-scale FFTs. The table below summarizes, in qualitative terms, single-precision complex-to-complex FFT performance for $N = 2^{20}$ (1,048,576 points) on 2025-era hardware; throughput for such transforms is conventionally reported in GFLOPS, assuming about $5N \log_2 N$ operations. The comparison indicates the scale at which GPUs excel for data-parallel workloads.

| Library | Hardware Platform | Relative Throughput |
|---|---|---|
| FFTW | Intel Xeon Platinum 8592+ (64-core CPU) | High performance on multi-core CPUs |
| cuFFT | NVIDIA H100 GPU (single) | Orders of magnitude higher than CPUs |
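The nominal GFLOPS figure mentioned above can be computed from a measured run time using the conventional $5N \log_2 N$ operation estimate. The sketch below times NumPy's built-in CPU FFT as a stand-in; it is not a benchmark of FFTW or cuFFT, the helper names are hypothetical, and the result depends entirely on the machine and NumPy build.

```python
import time
import numpy as np

def fft_gflops(n, seconds):
    """Nominal throughput using the conventional 5 N log2 N flop estimate."""
    return 5.0 * n * np.log2(n) / seconds / 1e9

def time_fft(n, repeats=5):
    """Best-of-`repeats` wall-clock time for one complex FFT of length n."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    np.fft.fft(x)                          # warm-up call, excluded from timing
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.fft.fft(x)
        best = min(best, time.perf_counter() - t0)
    return best

n = 2**20
seconds = time_fft(n)
print(f"N = {n}: {seconds * 1e3:.2f} ms, about {fft_gflops(n, seconds):.1f} GFLOPS nominal")
```

Taking the best of several repeats reduces the influence of one-off effects such as cache warm-up or scheduler noise.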
Competing Transform Methods
While the fast Fourier transform (FFT) excels in efficient computation of the discrete Fourier transform for stationary signals, wavelet transforms offer a compelling alternative for non-stationary signals by providing localized time-frequency representations. The discrete wavelet transform (DWT), introduced by Mallat, decomposes signals using scalable and translatable basis functions, in contrast to the FFT's global sinusoidal basis, which assumes uniform frequency content across the signal. This multiresolution approach allows wavelets to capture transient features, such as sudden changes in seismic or biomedical signals, where FFT artifacts like spectral leakage degrade analysis. For instance, the Morlet wavelet, developed for geophysical applications, combines a Gaussian envelope with a complex exponential to yield a balanced time-frequency resolution suitable for continuous wavelet analysis of oscillatory non-stationary processes.

The number-theoretic transform (NTT) serves as another FFT alternative in domains requiring exact integer arithmetic, particularly cryptography, by evaluating polynomial multiplications modulo a prime via roots of unity in finite fields. Pioneered by Pollard, the NTT mirrors the FFT's divide-and-conquer structure but operates over rings like $\mathbb{Z}/p\mathbb{Z}$, eliminating the floating-point precision issues inherent in FFT implementations. In lattice-based cryptosystems, such as those using ring learning with errors, NTT accelerates key operations like convolution, outperforming FFT in modular settings by ensuring error-free computations and leveraging hardware-optimized integer operations.

Other specialized transforms address limitations of the FFT for particular signal classes. Chirplet transforms, generalized from Gabor and wavelet bases, parameterize functions with linear frequency sweeps to model chirp signals (such as radar returns or bat echolocation), where the FFT assumes constant frequencies and thus requires many basis elements for accurate representation. Introduced by Mann and Haykin, chirplets adaptively track instantaneous frequency variations, improving resolution for frequency-modulated waveforms compared to the FFT's fixed grid. For sparse signals dominated by a few exponentials, Prony's method provides an approximation technique that recovers parameters like frequencies and amplitudes directly, bypassing the FFT's full spectral computation and enabling super-resolution beyond the FFT's frequency-grid spacing in low-complexity scenarios.

Comparisons highlight contexts where these methods surpass FFT. In modular arithmetic for cryptography, NTT avoids FFT's approximation errors, achieving identical results to direct multiplication but with $O(N \log N)$ complexity for large polynomials. For compression, wavelet transforms in JPEG2000 yield superior rate-distortion performance over the FFT-related discrete cosine transform in JPEG, with gains of 20-30% in compression ratio or 2-4 dB in PSNR for natural images due to better energy compaction in subbands.

| Transform | Key Advantage over FFT | Typical Application | Performance Edge Example |
|---|---|---|---|
| Wavelet (DWT) | Localized time-frequency analysis for non-stationarity | Signal denoising, feature extraction | 20-30% better compression in JPEG2000 vs. JPEG for images |
| NTT | Exact integer computation in finite fields | Polynomial multiplication in crypto | Error-free vs. FFT rounding in lattice schemes, same speed |
| Chirplet | Handles linear frequency modulation | Radar, audio chirps | Improved resolution for swept signals, fewer basis functions needed |
| Prony | Sparse exponential recovery | Super-resolution spectroscopy | Recovers frequencies beyond FFT's grid without full transform |
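The exact-arithmetic advantage of the NTT can be illustrated with a minimal sketch of polynomial multiplication modulo a prime. It assumes the commonly used NTT-friendly prime $998244353 = 119 \cdot 2^{23} + 1$ with primitive root 3, which is a choice made here for illustration and not a parameter taken from any particular cryptosystem; the function names are hypothetical.

```python
MOD = 998244353       # prime of the form 119 * 2**23 + 1
ROOT = 3              # primitive root modulo MOD

def ntt(a, invert=False):
    """In-place iterative radix-2 NTT over Z/MOD; len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):                 # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)   # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)          # modular inverse of the root
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w % MOD
                a[k] = (u + v) % MOD                  # butterfly, all in integers
                a[k + length // 2] = (u - v) % MOD
                w = w * w_len % MOD
        length *= 2
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD
    return a

def poly_multiply(p, q):
    """Coefficients of p(x) * q(x) modulo MOD, lowest degree first."""
    size = 1
    while size < len(p) + len(q) - 1:
        size *= 2
    fa = ntt(list(p) + [0] * (size - len(p)))
    fb = ntt(list(q) + [0] * (size - len(q)))
    prod = [x * y % MOD for x, y in zip(fa, fb)]
    return ntt(prod, invert=True)[: len(p) + len(q) - 1]
```

For example, `poly_multiply([1, 2, 3], [4, 5])` multiplies $1 + 2x + 3x^2$ by $4 + 5x$; the result is exact as long as the true coefficients are smaller than the modulus.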