1@node Introduction, Tutorial, Top, Top 2@chapter Introduction 3This manual documents version @value{VERSION} of FFTW, the 4@emph{Fastest Fourier Transform in the West}. FFTW is a comprehensive 5collection of fast C routines for computing the discrete Fourier 6transform (DFT) and various special cases thereof. 7@cindex discrete Fourier transform 8@cindex DFT 9@itemize @bullet 10@item FFTW computes the DFT of complex data, real data, even- 11 or odd-symmetric real data (these symmetric transforms are usually 12 known as the discrete cosine or sine transform, respectively), and the 13 discrete Hartley transform (DHT) of real data. 14 15@item The input data can have arbitrary length. 16 FFTW employs @Onlogn{} algorithms for all lengths, including 17 prime numbers. 18 19@item FFTW supports arbitrary multi-dimensional data. 20 21@item FFTW supports the SSE, SSE2, AVX, AVX2, AVX512, KCVI, Altivec, VSX, and 22 NEON vector instruction sets. 23 24@item FFTW includes parallel (multi-threaded) transforms 25 for shared-memory systems. 26@item Starting with version 3.3, FFTW includes distributed-memory parallel 27 transforms using MPI. 28@end itemize 29 30We assume herein that you are familiar with the properties and uses of 31the DFT that are relevant to your application. Otherwise, see 32e.g. @cite{The Fast Fourier Transform and Its Applications} by E. O. Brigham 33(Prentice-Hall, Englewood Cliffs, NJ, 1988). 34@uref{http://www.fftw.org, Our web page} also has links to FFT-related 35information online. 36@cindex FFTW 37 38@c TODO: revise. We don't need to brag any longer 39@c 40@c FFTW is usually faster (and sometimes much faster) than all other 41@c freely-available Fourier transform programs found on the Net. It is 42@c competitive with (and often faster than) the FFT codes in Sun's 43@c Performance Library, IBM's ESSL library, HP's CXML library, and 44@c Intel's MKL library, which are targeted at specific machines. 45@c Moreover, FFTW's performance is @emph{portable}. Indeed, FFTW is 46@c unique in that it automatically adapts itself to your machine, your 47@c cache, the size of your memory, your number of registers, and all the 48@c other factors that normally make it impossible to optimize a program 49@c for more than one machine. An extensive comparison of FFTW's 50@c performance with that of other Fourier transform codes has been made, 51@c and the results are available on the Web at 52@c @uref{http://fftw.org/benchfft, the benchFFT home page}. 53@c @cindex benchmark 54@c @fpindex benchfft 55 56In order to use FFTW effectively, you need to learn one basic concept 57of FFTW's internal structure: FFTW does not use a fixed algorithm for 58computing the transform, but instead it adapts the DFT algorithm to 59details of the underlying hardware in order to maximize performance. 60Hence, the computation of the transform is split into two phases. 61First, FFTW's @dfn{planner} ``learns'' the fastest way to compute the 62transform on your machine. The planner 63@cindex planner 64produces a data structure called a @dfn{plan} that contains this 65@cindex plan 66information. Subsequently, the plan is @dfn{executed} 67@cindex execute 68to transform the array of input data as dictated by the plan. The 69plan can be reused as many times as needed. In typical 70high-performance applications, many transforms of the same size are 71computed and, consequently, a relatively expensive initialization of 72this sort is acceptable. On the other hand, if you need a single 73transform of a given size, the one-time cost of the planner becomes 74significant. For this case, FFTW provides fast planners based on 75heuristics or on previously computed plans. 76 77FFTW supports transforms of data with arbitrary length, rank, 78multiplicity, and a general memory layout. In simple cases, however, 79this generality may be unnecessary and confusing. Consequently, we 80organized the interface to FFTW into three levels of increasing 81generality. 82@itemize @bullet 83@item The @dfn{basic interface} computes a single 84 transform of contiguous data. 85@item The @dfn{advanced interface} computes transforms 86 of multiple or strided arrays. 87@item The @dfn{guru interface} supports the most general data 88 layouts, multiplicities, and strides. 89@end itemize 90We expect that most users will be best served by the basic interface, 91whereas the guru interface requires careful attention to the 92documentation to avoid problems. 93@cindex basic interface 94@cindex advanced interface 95@cindex guru interface 96 97 98Besides the automatic performance adaptation performed by the planner, 99it is also possible for advanced users to customize FFTW manually. For 100example, if code space is a concern, we provide a tool that links only 101the subset of FFTW needed by your application. Conversely, you may need 102to extend FFTW because the standard distribution is not sufficient for 103your needs. For example, the standard FFTW distribution works most 104efficiently for arrays whose size can be factored into small primes 105(@math{2}, @math{3}, @math{5}, and @math{7}), and otherwise it uses a 106slower general-purpose routine. If you need efficient transforms of 107other sizes, you can use FFTW's code generator, which produces fast C 108programs (``codelets'') for any particular array size you may care 109about. 110@cindex code generator 111@cindex codelet 112For example, if you need transforms of size 113@ifinfo 114@math{513 = 19 x 3^3}, 115@end ifinfo 116@tex 117$513 = 19 \cdot 3^3$, 118@end tex 119@html 120513 = 19*3<sup>3</sup>, 121@end html 122you can customize FFTW to support the factor @math{19} efficiently. 123 124For more information regarding FFTW, see the paper, ``The Design and 125Implementation of FFTW3,'' by M. Frigo and S. G. Johnson, which was an 126invited paper in @cite{Proc. IEEE} @b{93} (2), p. 216 (2005). The 127code generator is described in the paper ``A fast Fourier transform 128compiler'', 129@cindex compiler 130by M. Frigo, in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference 131on Programming Language Design and Implementation (PLDI), Atlanta, 132Georgia, May 1999}. These papers, along with the latest version of 133FFTW, the FAQ, benchmarks, and other links, are available at 134@uref{http://www.fftw.org, the FFTW home page}. 135 136The current version of FFTW incorporates many good ideas from the past 137thirty years of FFT literature. In one way or another, FFTW uses the 138Cooley-Tukey algorithm, the prime factor algorithm, Rader's algorithm 139for prime sizes, and a split-radix algorithm (with a 140``conjugate-pair'' variation pointed out to us by Dan Bernstein). 141FFTW's code generator also produces new algorithms that we do not 142completely understand. 143@cindex algorithm 144The reader is referred to the cited papers for the appropriate 145references. 146 147The rest of this manual is organized as follows. We first discuss the 148sequential (single-processor) implementation. We start by describing 149the basic interface/features of FFTW in @ref{Tutorial}. 150Next, @ref{Other Important Topics} discusses data alignment 151(@pxref{SIMD alignment and fftw_malloc}), 152the storage scheme of multi-dimensional arrays 153(@pxref{Multi-dimensional Array Format}), and FFTW's mechanism for 154storing plans on disk (@pxref{Words of Wisdom-Saving Plans}). Next, 155@ref{FFTW Reference} provides comprehensive documentation of all 156FFTW's features. Parallel transforms are discussed in their own 157chapters: @ref{Multi-threaded FFTW} and @ref{Distributed-memory FFTW 158with MPI}. Fortran programmers can also use FFTW, as described in 159@ref{Calling FFTW from Legacy Fortran} and @ref{Calling FFTW from 160Modern Fortran}. @ref{Installation and Customization} explains how to 161install FFTW in your computer system and how to adapt FFTW to your 162needs. License and copyright information is given in @ref{License and 163Copyright}. Finally, we thank all the people who helped us in 164@ref{Acknowledgments}. 165 166