This is fftw3.info, produced by makeinfo version 6.5 from fftw3.texi.

This manual is for FFTW (version 3.3.9, 10 December 2020).

   Copyright (C) 2003 Matteo Frigo.

   Copyright (C) 2003 Massachusetts Institute of Technology.

     Permission is granted to make and distribute verbatim copies of
     this manual provided the copyright notice and this permission
     notice are preserved on all copies.

     Permission is granted to copy and distribute modified versions of
     this manual under the conditions for verbatim copying, provided
     that the entire resulting derived work is distributed under the
     terms of a permission notice identical to this one.

     Permission is granted to copy and distribute translations of this
     manual into another language, under the above conditions for
     modified versions, except that this permission notice may be stated
     in a translation approved by the Free Software Foundation.
INFO-DIR-SECTION Development
START-INFO-DIR-ENTRY
* fftw3: (fftw3). FFTW User's Manual.
END-INFO-DIR-ENTRY


File: fftw3.info, Node: Top, Next: Introduction, Prev: (dir), Up: (dir)

FFTW User Manual
****************

Welcome to FFTW, the Fastest Fourier Transform in the West. FFTW is a
collection of fast C routines to compute the discrete Fourier transform.
This manual documents FFTW version 3.3.9.

* Menu:

* Introduction::
* Tutorial::
* Other Important Topics::
* FFTW Reference::
* Multi-threaded FFTW::
* Distributed-memory FFTW with MPI::
* Calling FFTW from Modern Fortran::
* Calling FFTW from Legacy Fortran::
* Upgrading from FFTW version 2::
* Installation and Customization::
* Acknowledgments::
* License and Copyright::
* Concept Index::
* Library Index::


File: fftw3.info, Node: Introduction, Next: Tutorial, Prev: Top, Up: Top

1 Introduction
**************

This manual documents version 3.3.9 of FFTW, the _Fastest Fourier
Transform in the West_.
FFTW is a comprehensive collection of fast C
routines for computing the discrete Fourier transform (DFT) and various
special cases thereof.
   * FFTW computes the DFT of complex data, real data, even- or
     odd-symmetric real data (these symmetric transforms are usually
     known as the discrete cosine or sine transform, respectively), and
     the discrete Hartley transform (DHT) of real data.

   * The input data can have arbitrary length. FFTW employs O(n log n)
     algorithms for all lengths, including prime numbers.

   * FFTW supports arbitrary multi-dimensional data.

   * FFTW supports the SSE, SSE2, AVX, AVX2, AVX512, KCVI, Altivec, VSX,
     and NEON vector instruction sets.

   * FFTW includes parallel (multi-threaded) transforms for
     shared-memory systems.
   * Starting with version 3.3, FFTW includes distributed-memory
     parallel transforms using MPI.

   We assume herein that you are familiar with the properties and uses
of the DFT that are relevant to your application. Otherwise, see e.g.
'The Fast Fourier Transform and Its Applications' by E. O. Brigham
(Prentice-Hall, Englewood Cliffs, NJ, 1988). Our web page
(http://www.fftw.org) also has links to FFT-related information online.

   In order to use FFTW effectively, you need to learn one basic concept
of FFTW's internal structure: FFTW does not use a fixed algorithm for
computing the transform, but instead it adapts the DFT algorithm to
details of the underlying hardware in order to maximize performance.
Hence, the computation of the transform is split into two phases.
First, FFTW's "planner" "learns" the fastest way to compute the
transform on your machine. The planner produces a data structure called
a "plan" that contains this information. Subsequently, the plan is
"executed" to transform the array of input data as dictated by the plan.
The plan can be reused as many times as needed.
In typical high-performance applications, many transforms of the same size are
computed and, consequently, a relatively expensive initialization of
this sort is acceptable. On the other hand, if you need a single
transform of a given size, the one-time cost of the planner becomes
significant. For this case, FFTW provides fast planners based on
heuristics or on previously computed plans.

   FFTW supports transforms of data with arbitrary length, rank,
multiplicity, and a general memory layout. In simple cases, however,
this generality may be unnecessary and confusing. Consequently, we
organized the interface to FFTW into three levels of increasing
generality.
   * The "basic interface" computes a single transform of contiguous
     data.
   * The "advanced interface" computes transforms of multiple or strided
     arrays.
   * The "guru interface" supports the most general data layouts,
     multiplicities, and strides.
   We expect that most users will be best served by the basic interface,
whereas the guru interface requires careful attention to the
documentation to avoid problems.

   Besides the automatic performance adaptation performed by the
planner, it is also possible for advanced users to customize FFTW
manually. For example, if code space is a concern, we provide a tool
that links only the subset of FFTW needed by your application.
Conversely, you may need to extend FFTW because the standard
distribution is not sufficient for your needs. For example, the
standard FFTW distribution works most efficiently for arrays whose size
can be factored into small primes (2, 3, 5, and 7), and otherwise it
uses a slower general-purpose routine. If you need efficient transforms
of other sizes, you can use FFTW's code generator, which produces fast C
programs ("codelets") for any particular array size you may care about.
For example, if you need transforms of size 513 = 19 x 3^3, you can
customize FFTW to support the factor 19 efficiently.

   For more information regarding FFTW, see the paper, "The Design and
Implementation of FFTW3," by M. Frigo and S. G. Johnson, which was an
invited paper in 'Proc. IEEE' 93 (2), p. 216 (2005). The code
generator is described in the paper "A fast Fourier transform compiler",
by M. Frigo, in the 'Proceedings of the 1999 ACM SIGPLAN Conference on
Programming Language Design and Implementation (PLDI), Atlanta, Georgia,
May 1999'. These papers, along with the latest version of FFTW, the
FAQ, benchmarks, and other links, are available at the FFTW home page
(http://www.fftw.org).

   The current version of FFTW incorporates many good ideas from the
past thirty years of FFT literature. In one way or another, FFTW uses
the Cooley-Tukey algorithm, the prime factor algorithm, Rader's
algorithm for prime sizes, and a split-radix algorithm (with a
"conjugate-pair" variation pointed out to us by Dan Bernstein). FFTW's
code generator also produces new algorithms that we do not completely
understand. The reader is referred to the cited papers for the
appropriate references.

   The rest of this manual is organized as follows. We first discuss
the sequential (single-processor) implementation. We start by
describing the basic interface/features of FFTW in *note Tutorial::.
Next, *note Other Important Topics:: discusses data alignment (*note
SIMD alignment and fftw_malloc::), the storage scheme of
multi-dimensional arrays (*note Multi-dimensional Array Format::), and
FFTW's mechanism for storing plans on disk (*note Words of Wisdom-Saving
Plans::). Next, *note FFTW Reference:: provides comprehensive
documentation of all FFTW's features. Parallel transforms are discussed
in their own chapters: *note Multi-threaded FFTW:: and *note
Distributed-memory FFTW with MPI::.
Fortran programmers can also use
FFTW, as described in *note Calling FFTW from Legacy Fortran:: and *note
Calling FFTW from Modern Fortran::. *note Installation and
Customization:: explains how to install FFTW in your computer system and
how to adapt FFTW to your needs. License and copyright information is
given in *note License and Copyright::. Finally, we thank all the
people who helped us in *note Acknowledgments::.


File: fftw3.info, Node: Tutorial, Next: Other Important Topics, Prev: Introduction, Up: Top

2 Tutorial
**********

* Menu:

* Complex One-Dimensional DFTs::
* Complex Multi-Dimensional DFTs::
* One-Dimensional DFTs of Real Data::
* Multi-Dimensional DFTs of Real Data::
* More DFTs of Real Data::

This chapter describes the basic usage of FFTW, i.e., how to compute the
Fourier transform of a single array. This chapter tells the truth, but
not the _whole_ truth. Specifically, FFTW implements additional
routines and flags that are not documented here, although in many cases
we try to indicate where added capabilities exist. For more complete
information, see *note FFTW Reference::. (Note that you need to compile
and install FFTW before you can use it in a program. For the details of
the installation, see *note Installation and Customization::.)

   We recommend that you read this tutorial in order.(1) At the least,
read the first section (*note Complex One-Dimensional DFTs::) before
reading any of the others, even if your main interest lies in one of the
other transform types.

   Users of FFTW version 2 and earlier may also want to read *note
Upgrading from FFTW version 2::.

   ---------- Footnotes ----------

   (1) You can read the tutorial in bit-reversed order after computing
your first transform.


File: fftw3.info, Node: Complex One-Dimensional DFTs, Next: Complex Multi-Dimensional DFTs, Prev: Tutorial, Up: Tutorial

2.1 Complex One-Dimensional DFTs
================================

     Plan: To bother about the best method of accomplishing an
     accidental result. [Ambrose Bierce, 'The Enlarged Devil's
     Dictionary'.]

   The basic usage of FFTW to compute a one-dimensional DFT of size 'N'
is simple, and it typically looks something like this code:

     #include <fftw3.h>
     ...
     {
         fftw_complex *in, *out;
         fftw_plan p;
         ...
         in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
         out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
         p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
         ...
         fftw_execute(p); /* repeat as needed */
         ...
         fftw_destroy_plan(p);
         fftw_free(in); fftw_free(out);
     }

   You must link this code with the 'fftw3' library. On Unix systems,
link with '-lfftw3 -lm'.

   The example code first allocates the input and output arrays. You
can allocate them in any way that you like, but we recommend using
'fftw_malloc', which behaves like 'malloc' except that it properly
aligns the array when SIMD instructions (such as SSE and Altivec) are
available (*note SIMD alignment and fftw_malloc::). [Alternatively, we
provide a convenient wrapper function 'fftw_alloc_complex(N)' which has
the same effect.]

   The data is an array of type 'fftw_complex', which is by default a
'double[2]' composed of the real ('in[i][0]') and imaginary ('in[i][1]')
parts of a complex number.

   The next step is to create a "plan", which is an object that contains
all the data that FFTW needs to compute the FFT.
This function creates the plan:

     fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);

   The first argument, 'n', is the size of the transform you are trying
to compute. The size 'n' can be any positive integer, but sizes that
are products of small factors are transformed most efficiently (although
prime sizes still use an O(n log n) algorithm).

   The next two arguments are pointers to the input and output arrays of
the transform. These pointers can be equal, indicating an "in-place"
transform.

   The fourth argument, 'sign', can be either 'FFTW_FORWARD' ('-1') or
'FFTW_BACKWARD' ('+1'), and indicates the direction of the transform you
are interested in; technically, it is the sign of the exponent in the
transform.

   The 'flags' argument is usually either 'FFTW_MEASURE' or
'FFTW_ESTIMATE'. 'FFTW_MEASURE' instructs FFTW to run and measure the
execution time of several FFTs in order to find the best way to compute
the transform of size 'n'. This process takes some time (usually a few
seconds), depending on your machine and on the size of the transform.
'FFTW_ESTIMATE', on the contrary, does not run any computation and just
builds a reasonable plan that is probably sub-optimal. In short, if
your program performs many transforms of the same size and
initialization time is not important, use 'FFTW_MEASURE'; otherwise use
the estimate.

   _You must create the plan before initializing the input_, because
'FFTW_MEASURE' overwrites the 'in'/'out' arrays. (Technically,
'FFTW_ESTIMATE' does not touch your arrays, but you should always create
plans first just to be sure.)

   Once the plan has been created, you can use it as many times as you
like for transforms on the specified 'in'/'out' arrays, computing the
actual transforms via 'fftw_execute(plan)':
     void fftw_execute(const fftw_plan plan);

   The DFT results are stored in-order in the array 'out', with the
zero-frequency (DC) component in 'out[0]'. If 'in != out', the
transform is "out-of-place" and the input array 'in' is not modified.
Otherwise, the input array is overwritten with the transform.

   If you want to transform a _different_ array of the same size, you
can create a new plan with 'fftw_plan_dft_1d' and FFTW automatically
reuses the information from the previous plan, if possible.
Alternatively, with the "guru" interface you can apply a given plan to a
different array, if you are careful. *Note FFTW Reference::.

   When you are done with the plan, you deallocate it by calling
'fftw_destroy_plan(plan)':
     void fftw_destroy_plan(fftw_plan plan);
   If you allocate an array with 'fftw_malloc()' you must deallocate it
with 'fftw_free()'. Do not use 'free()' or, heaven forbid, 'delete'.

   FFTW computes an _unnormalized_ DFT. Thus, computing a forward
followed by a backward transform (or vice versa) results in the original
array scaled by 'n'. For the definition of the DFT, see *note What FFTW
Really Computes::.

   If you have a C compiler, such as 'gcc', that supports the C99
standard, and you '#include <complex.h>' _before_ '<fftw3.h>', then
'fftw_complex' is the native double-precision complex type and you can
manipulate it with ordinary arithmetic. Otherwise, FFTW defines its own
complex type, which is bit-compatible with the C99 complex type. *Note
Complex numbers::. (The C++ '<complex>' template class may also be
usable via a typecast.)
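
   The bit-compatibility and normalization remarks above can be tried
out without FFTW itself. The following is only a sketch under stated
assumptions: 'pair_t' is a hypothetical stand-in for FFTW's default
'double[2]' definition of 'fftw_complex', and 'as_pairs' and 'normalize'
are illustrative helper names that are not part of the FFTW API.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-in for FFTW's default fftw_complex (a double[2]). */
typedef double pair_t[2];

/* View an array of C99 complex numbers as an array of {re, im} pairs.
   C99 specifies that a complex type has the same representation and
   alignment as a two-element array of the corresponding real type. */
static pair_t *as_pairs(double complex *z)
{
    assert(z != NULL);
    return (pair_t *) z;
}

/* Divide n complex values by n.  This is the scaling a caller would
   apply after a forward transform followed by a backward transform,
   since both are unnormalized. */
static void normalize(double complex *z, size_t n)
{
    assert(z != NULL && n > 0);
    for (size_t k = 0; k < n; ++k)
        z[k] /= (double) n;
}
```

   With 'double complex z[2]' holding '1+2i' and '-3+0.5i', the view
returned by 'as_pairs(z)' exposes the real parts in 'p[k][0]' and the
imaginary parts in 'p[k][1]', the same indexing the text above uses for
'in[i][0]' and 'in[i][1]'.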

   To use single or long-double precision versions of FFTW, replace the
'fftw_' prefix by 'fftwf_' or 'fftwl_' and link with '-lfftw3f' or
'-lfftw3l', but use the _same_ '<fftw3.h>' header file.

   Many more flags exist besides 'FFTW_MEASURE' and 'FFTW_ESTIMATE'.
For example, use 'FFTW_PATIENT' if you're willing to wait even longer
for a possibly even faster plan (*note FFTW Reference::). You can also
save plans for future use, as described by *note Words of Wisdom-Saving
Plans::.


File: fftw3.info, Node: Complex Multi-Dimensional DFTs, Next: One-Dimensional DFTs of Real Data, Prev: Complex One-Dimensional DFTs, Up: Tutorial

2.2 Complex Multi-Dimensional DFTs
==================================

Multi-dimensional transforms work much the same way as one-dimensional
transforms: you allocate arrays of 'fftw_complex' (preferably using
'fftw_malloc'), create an 'fftw_plan', execute it as many times as you
want with 'fftw_execute(plan)', and clean up with
'fftw_destroy_plan(plan)' (and 'fftw_free').

   FFTW provides two routines for creating plans for 2d and 3d
transforms, and one routine for creating plans of arbitrary
dimensionality. The 2d and 3d routines have the following signature:
     fftw_plan fftw_plan_dft_2d(int n0, int n1,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);
     fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);

   These routines create plans for 'n0' by 'n1' two-dimensional (2d)
transforms and 'n0' by 'n1' by 'n2' 3d transforms, respectively. All of
these transforms operate on contiguous arrays in the C-standard
"row-major" order, so that the last dimension has the fastest-varying
index in the array. This layout is described further in *note
Multi-dimensional Array Format::.

   FFTW can also compute transforms of higher dimensionality.
In order to avoid confusion between the various meanings of the word
"dimension", we use the term _rank_ to denote the number of independent
indices in an array.(1) For example, we say that a 2d transform has
rank 2, a 3d transform has rank 3, and so on. You can plan transforms
of arbitrary rank by means of the following function:

     fftw_plan fftw_plan_dft(int rank, const int *n,
                             fftw_complex *in, fftw_complex *out,
                             int sign, unsigned flags);

   Here, 'n' is a pointer to an array 'n[rank]' denoting an 'n[0]' by
'n[1]' by ... by 'n[rank-1]' transform. Thus, for example, the call
     fftw_plan_dft_2d(n0, n1, in, out, sign, flags);
   is equivalent to the following code fragment:
     int n[2];
     n[0] = n0;
     n[1] = n1;
     fftw_plan_dft(2, n, in, out, sign, flags);
   'fftw_plan_dft' is not restricted to 2d and 3d transforms, however,
but it can plan transforms of arbitrary rank.

   You may have noticed that all the planner routines described so far
have overlapping functionality. For example, you can plan a 1d or 2d
transform by using 'fftw_plan_dft' with a 'rank' of '1' or '2', or even
by calling 'fftw_plan_dft_3d' with 'n0' and/or 'n1' equal to '1' (with
no loss in efficiency). This pattern continues, and FFTW's planning
routines in general form a "partial order," sequences of interfaces with
strictly increasing generality but correspondingly greater complexity.

   'fftw_plan_dft' is the most general complex-DFT routine that we
describe in this tutorial, but there are also the advanced and guru
interfaces, which allow one to efficiently combine multiple/strided
transforms into a single FFTW plan, transform a subset of a larger
multi-dimensional array, and/or to handle more general complex-number
formats. For more information, see *note FFTW Reference::.
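
   To make the row-major convention for an 'n[rank]' array concrete, here
is a small sketch of the index arithmetic. The helper names
'total_elements' and 'row_major_offset' are hypothetical and are not
part of FFTW; they only illustrate how many elements a rank-'rank' array
holds and where a given element lives when the last index varies
fastest.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of elements in an n[0] x n[1] x ... x n[rank-1] array. */
static size_t total_elements(int rank, const int *n)
{
    size_t total = 1;
    assert(rank >= 1 && n != NULL);
    for (int d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        total *= (size_t) n[d];
    }
    return total;
}

/* Row-major offset of the element with indices idx[0..rank-1]:
   the last index varies fastest in memory. */
static size_t row_major_offset(int rank, const int *n, const int *idx)
{
    size_t offset = 0;
    assert(rank >= 1 && n != NULL && idx != NULL);
    for (int d = 0; d < rank; ++d) {
        assert(idx[d] >= 0 && idx[d] < n[d]);
        offset = offset * (size_t) n[d] + (size_t) idx[d];
    }
    return offset;
}
```

   For a 4 by 3 by 5 array, for instance, element (1, 2, 3) sits at
offset (1*3 + 2)*5 + 3 = 28, and an array passed to a rank-3 plan would
need room for 60 'fftw_complex' values.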

   ---------- Footnotes ----------

   (1) The term "rank" is commonly used in the APL, FORTRAN, and Common
Lisp traditions, although it is not so common in the C world.


File: fftw3.info, Node: One-Dimensional DFTs of Real Data, Next: Multi-Dimensional DFTs of Real Data, Prev: Complex Multi-Dimensional DFTs, Up: Tutorial

2.3 One-Dimensional DFTs of Real Data
=====================================

In many practical applications, the input data 'in[i]' are purely real
numbers, in which case the DFT output satisfies the "Hermitian"
redundancy: 'out[i]' is the conjugate of 'out[n-i]'. It is possible to
take advantage of these circumstances in order to achieve roughly a
factor of two improvement in both speed and memory usage.

   In exchange for these speed and space advantages, the user sacrifices
some of the simplicity of FFTW's complex transforms. First of all, the
input and output arrays are of _different sizes and types_: the input is
'n' real numbers, while the output is 'n/2+1' complex numbers (the
non-redundant outputs); this also requires slight "padding" of the input
array for in-place transforms. Second, the inverse transform (complex
to real) has the side-effect of _overwriting its input array_, by
default. Neither of these inconveniences should pose a serious problem
for users, but it is important to be aware of them.

   The routines to perform real-data transforms are almost the same as
those for complex transforms: you allocate arrays of 'double' and/or
'fftw_complex' (preferably using 'fftw_malloc' or 'fftw_alloc_complex'),
create an 'fftw_plan', execute it as many times as you want with
'fftw_execute(plan)', and clean up with 'fftw_destroy_plan(plan)' (and
'fftw_free'). The only differences are that the input (or output) is of
type 'double' and there are new routines to create the plan.
In one dimension:

     fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out,
                                    unsigned flags);

   for the real input to complex-Hermitian output ("r2c") and
complex-Hermitian input to real output ("c2r") transforms. Unlike the
complex DFT planner, there is no 'sign' argument. Instead, r2c DFTs are
always 'FFTW_FORWARD' and c2r DFTs are always 'FFTW_BACKWARD'. (For
single/long-double precision 'fftwf' and 'fftwl', 'double' should be
replaced by 'float' and 'long double', respectively.)

   Here, 'n' is the "logical" size of the DFT, not necessarily the
physical size of the array. In particular, the real ('double') array
has 'n' elements, while the complex ('fftw_complex') array has 'n/2+1'
elements (where the division is rounded down). For an in-place
transform, 'in' and 'out' are aliased to the same array, which must be
big enough to hold both; so, the real array would actually have
'2*(n/2+1)' elements, where the elements beyond the first 'n' are unused
padding. (Note that this is very different from the concept of
"zero-padding" a transform to a larger length, which changes the logical
size of the DFT by actually adding new input data.) The kth element of
the complex array is exactly the same as the kth element of the
corresponding complex DFT. All positive 'n' are supported; products of
small factors are most efficient, but an O(n log n) algorithm is used
even for prime sizes.

   As noted above, the c2r transform destroys its input array even for
out-of-place transforms. This can be prevented, if necessary, by
including 'FFTW_PRESERVE_INPUT' in the 'flags', with unfortunately some
sacrifice in performance. This flag is also not currently supported for
multi-dimensional real DFTs (next section).
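
   The size rules and the Hermitian redundancy described in this section
amount to a few lines of integer arithmetic. The sketch below is only an
illustration: the helper names are hypothetical (not FFTW functions),
and complex values are represented as '{re, im}' pairs in the style of
FFTW's default 'double[2]' layout.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex outputs of an r2c transform of logical size n. */
static size_t r2c_complex_count(int n)
{
    assert(n > 0);
    return (size_t) (n / 2 + 1);   /* integer division rounds down */
}

/* Number of doubles the real array must hold for an in-place transform;
   the elements beyond the first n are padding. */
static size_t r2c_inplace_real_count(int n)
{
    return 2 * r2c_complex_count(n);
}

/* Recover element k (0 <= k < n) of the full complex DFT from the
   n/2+1 stored outputs, using the fact that out[k] is the conjugate
   of out[n-k] for real input. */
static void full_spectrum_at(int n, double (*half)[2], int k,
                             double result[2])
{
    assert(n > 0 && half != NULL && k >= 0 && k < n);
    if (k <= n / 2) {
        result[0] = half[k][0];
        result[1] = half[k][1];
    } else {
        result[0] = half[n - k][0];
        result[1] = -half[n - k][1];
    }
}
```

   For the real input 1, 2, 3, 4 (so 'n = 4'), the three stored outputs
are 10, -2+2i, and -2; 'full_spectrum_at' with 'k = 3' then yields the
redundant element -2-2i.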

   Readers familiar with DFTs of real data will recall that the 0th (the
"DC") and 'n/2'-th (the "Nyquist" frequency, when 'n' is even) elements
of the complex output are purely real. Some implementations therefore
store the Nyquist element where the DC imaginary part would go, in order
to make the input and output arrays the same size. Such packing,
however, does not generalize well to multi-dimensional transforms, and
the space savings are minuscule in any case; FFTW does not support it.

   An alternative interface for one-dimensional r2c and c2r DFTs can be
found in the 'r2r' interface (*note The Halfcomplex-format DFT::), with
"halfcomplex"-format output that _is_ the same size (and type) as the
input array. That interface, although it is not very useful for
multi-dimensional transforms, may sometimes yield better performance.


File: fftw3.info, Node: Multi-Dimensional DFTs of Real Data, Next: More DFTs of Real Data, Prev: One-Dimensional DFTs of Real Data, Up: Tutorial

2.4 Multi-Dimensional DFTs of Real Data
=======================================

Multi-dimensional DFTs of real data use the following planner routines:

     fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
                                 double *in, fftw_complex *out,
                                 unsigned flags);

   as well as the corresponding 'c2r' routines with the input/output
types swapped. These routines work similarly to their complex
analogues, except for the fact that here the complex output array is cut
roughly in half and the real array requires padding for in-place
transforms (as in 1d, above).
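
   For the two-dimensional case, the layout rules of this section (a
complex last dimension of 'n1/2+1', and real rows padded to '2*(n1/2+1)'
doubles when the transform is in place) can be sketched as index
helpers. The function names and the 'in_place' parameter are
hypothetical and serve only to illustrate the arithmetic; they are not
part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Row length, in doubles, of the real n0 x n1 array.  In-place
   transforms pad each row to 2*(n1/2+1) doubles; out-of-place
   transforms store exactly n1 doubles per row. */
static size_t real_row_stride(int n1, int in_place)
{
    assert(n1 > 0);
    return in_place ? 2 * (size_t) (n1 / 2 + 1) : (size_t) n1;
}

/* Offset (in doubles) of real element (i, j), 0 <= i < n0, 0 <= j < n1. */
static size_t real_offset(int n0, int n1, int in_place, int i, int j)
{
    assert(i >= 0 && i < n0 && j >= 0 && j < n1);
    return (size_t) i * real_row_stride(n1, in_place) + (size_t) j;
}

/* Offset (in complex values) of complex element (i, k),
   0 <= i < n0, 0 <= k <= n1/2. */
static size_t complex_offset(int n0, int n1, int i, int k)
{
    assert(i >= 0 && i < n0 && k >= 0 && k <= n1 / 2);
    return (size_t) i * (size_t) (n1 / 2 + 1) + (size_t) k;
}
```

   With 'n0 = 3' and 'n1 = 4', an in-place real row occupies 6 doubles
(4 data values plus 2 of padding), so real element (2, 3) is at offset
15 rather than 11.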

   As before, 'n' is the logical size of the array, and the consequences
of this on the format of the complex arrays deserve careful
attention. Suppose that the real data has dimensions n[0] x n[1] x n[2]
x ... x n[d-1] (in row-major order). Then, after an r2c transform, the
output is an n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) array of
'fftw_complex' values in row-major order, corresponding to slightly over
half of the output of the corresponding complex DFT. (The division is
rounded down.) The ordering of the data is otherwise exactly the same
as in the complex-DFT case.

   For out-of-place transforms, this is the end of the story: the real
data is stored as a row-major array of size n[0] x n[1] x n[2] x ... x
n[d-1] and the complex data is stored as a row-major array of size n[0]
x n[1] x n[2] x ... x (n[d-1]/2 + 1).

   For in-place transforms, however, extra padding of the real-data
array is necessary because the complex array is larger than the real
array, and the two arrays share the same memory locations. Thus, for
in-place transforms, the final dimension of the real-data array must be
padded with extra values to accommodate the size of the complex
data--two values if the last dimension is even and one if it is odd.
That is, the last dimension of the real data must physically contain 2 *
(n[d-1]/2+1) 'double' values (exactly enough to hold the complex data).
This physical array size does not, however, change the _logical_ array
size--only n[d-1] values are actually stored in the last dimension, and
n[d-1] is the last dimension passed to the plan-creation routine.

   For example, consider the transform of a two-dimensional real array
of size 'n0' by 'n1'. The output of the r2c transform is a
two-dimensional complex array of size 'n0' by 'n1/2+1', where the 'y'
dimension has been cut nearly in half because of redundancies in the
output.
Because 'fftw_complex' is twice the size of 'double', the
output array is slightly bigger than the input array. Thus, if we want
to compute the transform in place, we must _pad_ the input array so that
it is of size 'n0' by '2*(n1/2+1)'. If 'n1' is even, then there are two
padding elements at the end of each row (which need not be initialized,
as they are only used for output).

   These transforms are unnormalized, so an r2c followed by a c2r
transform (or vice versa) will result in the original data scaled by the
number of real data elements--that is, the product of the (logical)
dimensions of the real data.

   (Because the last dimension is treated specially, if it is equal to
'1' the transform is _not_ equivalent to a lower-dimensional r2c/c2r
transform. In that case, the last complex dimension also has size '1'
('=1/2+1'), and no advantage is gained over the complex transforms.)


File: fftw3.info, Node: More DFTs of Real Data, Prev: Multi-Dimensional DFTs of Real Data, Up: Tutorial

2.5 More DFTs of Real Data
==========================

* Menu:

* The Halfcomplex-format DFT::
* Real even/odd DFTs (cosine/sine transforms)::
* The Discrete Hartley Transform::

FFTW supports several other transform types via a unified "r2r"
(real-to-real) interface, so called because it takes a real ('double')
array and outputs a real array of the same size. These r2r transforms
currently fall into three categories: DFTs of real input and
complex-Hermitian output in halfcomplex format, DFTs of real input with
even/odd symmetry (a.k.a. discrete cosine/sine transforms, DCTs/DSTs),
and discrete Hartley transforms (DHTs), all described in more detail by
the following sections.

   The r2r transforms follow the by now familiar interface of creating
an 'fftw_plan', executing it with 'fftw_execute(plan)', and destroying
it with 'fftw_destroy_plan(plan)'.
Furthermore, all r2r transforms share the same planner interface:

     fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
                                fftw_r2r_kind kind, unsigned flags);
     fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
                                fftw_r2r_kind kind0, fftw_r2r_kind kind1,
                                unsigned flags);
     fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
                                double *in, double *out,
                                fftw_r2r_kind kind0,
                                fftw_r2r_kind kind1,
                                fftw_r2r_kind kind2,
                                unsigned flags);
     fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
                             const fftw_r2r_kind *kind, unsigned flags);

   Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional
transforms for contiguous arrays in row-major order, transforming (real)
input to output of the same size, where 'n' specifies the _physical_
dimensions of the arrays. All positive 'n' are supported (with the
exception of 'n=1' for the 'FFTW_REDFT00' kind, noted in the real-even
subsection below); products of small factors are most efficient
(factorizing 'n-1' and 'n+1' for 'FFTW_REDFT00' and 'FFTW_RODFT00'
kinds, described below), but an O(n log n) algorithm is used even for
prime sizes.

   Each dimension has a "kind" parameter, of type 'fftw_r2r_kind',
specifying the kind of r2r transform to be used for that dimension. (In
the case of 'fftw_plan_r2r', this is an array 'kind[rank]' where
'kind[i]' is the transform kind for the dimension 'n[i]'.) The kind can
be one of a set of predefined constants, defined in the following
subsections.

   In other words, FFTW computes the separable product of the specified
r2r transforms over each dimension, which can be used e.g. for partial
differential equations with mixed boundary conditions. (For some r2r
kinds, notably the halfcomplex DFT and the DHT, such a separable product
is somewhat problematic in more than one dimension, however, as is
described below.)
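
   As a conceptual sketch of what a "separable product" means (and not a
description of how FFTW implements it), the code below applies one 1d
transform along the rows and another along the columns of a row-major
'n0' by 'n1' array. The 'transform_1d' type, 'separable_2d', and
'butterfly2' are hypothetical names introduced only for this
illustration; in FFTW the per-dimension choice is expressed through the
'kind' arguments instead.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical in-place 1d transform of n values that are spaced
   `stride` elements apart in memory. */
typedef void (*transform_1d)(int n, ptrdiff_t stride, double *data);

/* Apply t1 along every row (the last dimension) and then t0 along
   every column (the first dimension) of a row-major n0 x n1 array. */
static void separable_2d(int n0, int n1, double *data,
                         transform_1d t0, transform_1d t1)
{
    assert(n0 > 0 && n1 > 0 && data != NULL);
    for (int i = 0; i < n0; ++i)
        t1(n1, 1, data + (size_t) i * (size_t) n1);   /* rows */
    for (int j = 0; j < n1; ++j)
        t0(n0, n1, data + j);                         /* columns */
}

/* Example 1d transform: the size-2 butterfly (a, b) -> (a+b, a-b),
   i.e. the unnormalized size-2 DFT of real data. */
static void butterfly2(int n, ptrdiff_t stride, double *data)
{
    assert(n == 2 && data != NULL);
    double a = data[0], b = data[stride];
    data[0] = a + b;
    data[stride] = a - b;
}
```

   Applied to the 2 by 2 array 1, 2, 3, 4 with 'butterfly2' in both
dimensions, the rows become (3, -1) and (7, -1), and the column pass
then gives 10, -2, -4, 0. Passing a different function for 't0' than
for 't1' corresponds to choosing a different kind per dimension.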

   In the current version of FFTW, all r2r transforms except for the
halfcomplex type are computed via pre- or post-processing of halfcomplex
transforms, and they are therefore not as fast as they could be. Since
most other general DCT/DST codes employ a similar algorithm, however,
FFTW's implementation should provide at least competitive performance.


File: fftw3.info, Node: The Halfcomplex-format DFT, Next: Real even/odd DFTs (cosine/sine transforms), Prev: More DFTs of Real Data, Up: More DFTs of Real Data

2.5.1 The Halfcomplex-format DFT
--------------------------------

An r2r kind of 'FFTW_R2HC' ("r2hc") corresponds to an r2c DFT (*note
One-Dimensional DFTs of Real Data::) but with "halfcomplex" format
output, and may sometimes be faster and/or more convenient than the
latter. The inverse "hc2r" transform is of kind 'FFTW_HC2R'. This
consists of the non-redundant half of the complex output for a 1d
real-input DFT of size 'n', stored as a sequence of 'n' real numbers
('double') in the format:

     r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1

   Here, rk is the real part of the kth output, and ik is the imaginary
part. (Division by 2 is rounded down.) For a halfcomplex array
'hc[n]', the kth component thus has its real part in 'hc[k]' and its
imaginary part in 'hc[n-k]', with the exception of 'k == 0' or 'n/2'
(the latter only if 'n' is even)--in these two cases, the imaginary part
is zero due to symmetries of the real-input DFT, and is not stored.
Thus, the r2hc transform of 'n' real values is a halfcomplex array of
length 'n', and vice versa for hc2r.

   Aside from the differing format, the output of
'FFTW_R2HC'/'FFTW_HC2R' is otherwise exactly the same as for the
corresponding 1d r2c/c2r transform (i.e. 'FFTW_FORWARD'/'FFTW_BACKWARD'
transforms, respectively).
Recall that these transforms are unnormalized, so r2hc followed by hc2r
will result in the original data multiplied by 'n'. Furthermore, like
the c2r transform, an out-of-place hc2r transform will _destroy its
input_ array.

   Although these halfcomplex transforms can be used with the
multi-dimensional r2r interface, the interpretation of such a separable
product of transforms along each dimension is problematic. For example,
consider a two-dimensional 'n0' by 'n1', r2hc by r2hc transform planned
by 'fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC, FFTW_R2HC,
FFTW_MEASURE)'. Conceptually, FFTW first transforms the rows (of size
'n1') to produce halfcomplex rows, and then transforms the columns (of
size 'n0'). Half of these column transforms, however, are of imaginary
parts, and should therefore be multiplied by i and combined with the
r2hc transforms of the real columns to produce the 2d DFT amplitudes;
FFTW's r2r transform does _not_ perform this combination for you. Thus,
if a multi-dimensional real-input/output DFT is required, we recommend
using the ordinary r2c/c2r interface (*note Multi-Dimensional DFTs of
Real Data::).


File: fftw3.info, Node: Real even/odd DFTs (cosine/sine transforms), Next: The Discrete Hartley Transform, Prev: The Halfcomplex-format DFT, Up: More DFTs of Real Data

2.5.2 Real even/odd DFTs (cosine/sine transforms)
-------------------------------------------------

The Fourier transform of a real-even function f(-x) = f(x) is real-even,
and i times the Fourier transform of a real-odd function f(-x) = -f(x)
is real-odd. Similar results hold for a discrete Fourier transform, and
thus for these symmetries the need for complex inputs/outputs is
entirely eliminated.
Moreover, one gains a factor of two in speed/space
from the fact that the data are real, and an additional factor of two
from the even/odd symmetry: only the non-redundant (first) half of the
array need be stored. The result is the real-even DFT ("REDFT") and the
real-odd DFT ("RODFT"), also known as the discrete cosine and sine
transforms ("DCT" and "DST"), respectively.

   (In this section, we describe the 1d transforms; multi-dimensional
transforms are just a separable product of these transforms operating
along each dimension.)

   Because of the discrete sampling, one has an additional choice: is
the data even/odd around a sampling point, or around the point halfway
between two samples? The latter corresponds to _shifting_ the samples
by _half_ an interval, and gives rise to several transform variants
denoted by REDFTab and RODFTab: a and b are 0 or 1, and indicate whether
the input (a) and/or output (b) are shifted by half a sample (1 means it
is shifted). These are also known as types I-IV of the DCT and DST, and
all four types are supported by FFTW's r2r interface.(1)

   The r2r kinds for the various REDFT and RODFT types supported by
FFTW, along with the boundary conditions at both ends of the _input_
array ('n' real numbers 'in[j=0..n-1]'), are:

   * 'FFTW_REDFT00' (DCT-I): even around j=0 and even around j=n-1.

   * 'FFTW_REDFT10' (DCT-II, "the" DCT): even around j=-0.5 and even
     around j=n-0.5.

   * 'FFTW_REDFT01' (DCT-III, "the" IDCT): even around j=0 and odd
     around j=n.

   * 'FFTW_REDFT11' (DCT-IV): even around j=-0.5 and odd around j=n-0.5.

   * 'FFTW_RODFT00' (DST-I): odd around j=-1 and odd around j=n.

   * 'FFTW_RODFT10' (DST-II): odd around j=-0.5 and odd around j=n-0.5.

   * 'FFTW_RODFT01' (DST-III): odd around j=-1 and even around j=n-1.

   * 'FFTW_RODFT11' (DST-IV): odd around j=-0.5 and even around j=n-0.5.
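
   The boundary conditions in the list above can be made concrete by
writing out the periodic array that a given kind implicitly describes.
The following is a minimal sketch for illustration only, not FFTW code:
the helper names 'redft00_extend' and 'redft10_extend' are hypothetical,
and FFTW itself never needs you to build such arrays. It covers the two
even cases 'FFTW_REDFT00' (mirror about the samples j=0 and j=n-1) and
'FFTW_REDFT10' (mirror about the half-sample points j=-0.5 and j=n-0.5;
the half-sample shift itself is not represented in the array).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FFTW): write one period of the even
   array implied by a size-n REDFT00 (DCT-I) input into
   out[0 .. 2(n-1)-1].  Requires n >= 2.  Returns the length 2(n-1). */
size_t redft00_extend(size_t n, const double *in, double *out)
{
    size_t j, len;
    assert(n >= 2);
    len = 2 * (n - 1);
    for (j = 0; j < n; ++j)
        out[j] = in[j];             /* the physical data */
    for (j = n; j < len; ++j)
        out[j] = in[len - j];       /* mirror about sample j = n-1 */
    return len;
}

/* Hypothetical helper (not part of FFTW): same idea for a size-n
   REDFT10 (DCT-II) input; the mirror point lies halfway between
   samples, so the end samples are repeated.  Returns the length 2n. */
size_t redft10_extend(size_t n, const double *in, double *out)
{
    size_t j, len;
    assert(n >= 1);
    len = 2 * n;
    for (j = 0; j < n; ++j)
        out[j] = in[j];             /* the physical data */
    for (j = n; j < len; ++j)
        out[j] = in[len - 1 - j];   /* mirror about j = n - 0.5 */
    return len;
}
```

   For instance, five inputs 1 2 3 4 5 passed to 'redft00_extend' give
the eight values 1 2 3 4 5 4 3 2, while four inputs 1 2 3 4 passed to
'redft10_extend' give 1 2 3 4 4 3 2 1.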

   Note that these symmetries apply to the "logical" array being
transformed; *there are no constraints on your physical input data*.
So, for example, if you specify a size-5 REDFT00 (DCT-I) of the data
abcde, it corresponds to the DFT of the logical even array abcdedcb of
size 8. A size-4 REDFT10 (DCT-II) of the data abcd corresponds to the
size-8 logical DFT of the even array abcddcba, shifted by half a sample.

   All of these transforms are invertible. The inverse of R*DFT00 is
R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called
simply "the" DCT and IDCT, respectively); and of R*DFT11 is R*DFT11.
However, the transforms computed by FFTW are unnormalized, exactly like
the corresponding real and complex DFTs, so computing a transform
followed by its inverse yields the original array scaled by N, where N
is the _logical_ DFT size. For REDFT00, N=2(n-1); for RODFT00,
N=2(n+1); otherwise, N=2n.

   Note that the boundary conditions of the transform output array are
given by the input boundary conditions of the inverse transform. Thus,
the above transforms are all inequivalent in terms of input/output
boundary conditions, even neglecting the 0.5 shift difference.

   FFTW is most efficient when N is a product of small factors; note
that this _differs_ from the factorization of the physical size 'n' for
REDFT00 and RODFT00! There is another oddity: 'n=1' REDFT00 transforms
correspond to N=0, and so are _not defined_ (the planner will return
'NULL'). Otherwise, any positive 'n' is supported.

   For the precise mathematical definitions of these transforms as used
by FFTW, see *note What FFTW Really Computes::. (For people accustomed
to the DCT/DST, FFTW's definitions have a coefficient of 2 in front of
the cos/sin functions so that they correspond precisely to an even/odd
DFT of size N.
Some authors also include additional multiplicative
factors of sqrt(2) for selected inputs and outputs; this makes the
transform orthogonal, but sacrifices the direct equivalence to a
symmetric DFT.)

Which type do you need?
.......................

Since the required flavor of even/odd DFT depends upon your problem, you
are the best judge of this choice, but we can make a few comments on
relative efficiency to help you in your selection. In particular,
R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 (especially
for odd sizes), while the R*DFT00 transforms are sometimes significantly
slower (especially for even sizes).(2)

   Thus, if only the boundary conditions on the transform inputs are
specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
R*DFT11 (unless the half-sample shift or the self-inverse property is
significant for your problem).

   If performance is important to you and you are using only small sizes
(say n<200), e.g. for multi-dimensional transforms, then you might
consider generating hard-coded transforms of those sizes and types that
you are interested in (*note Generating your own code::).

   We are interested in hearing what types of symmetric transforms you
find most useful.

   ---------- Footnotes ----------

   (1) There are also type V-VIII transforms, which correspond to a
logical DFT of _odd_ size N, independent of whether the physical size
'n' is odd, but we do not support these variants.

   (2) R*DFT00 is sometimes slower in FFTW because we discovered that
the standard algorithm for computing this by a pre/post-processed real
DFT--the algorithm used in FFTPACK, Numerical Recipes, and other sources
for decades now--has serious numerical problems: it already loses
several decimal places of accuracy for 16k sizes.
There seem to be only
two alternatives in the literature that do not suffer similarly: a
recursive decomposition into smaller DCTs, which would require a large
set of codelets for efficiency and generality, or sacrificing a factor
of 2 in speed to use a real DFT of twice the size. We currently employ
the latter technique for general n, as well as a limited form of the
former method: a split-radix decomposition when n is odd (N a multiple
of 4). For N containing many factors of 2, the split-radix method seems
to recover most of the speed of the standard algorithm without the
accuracy tradeoff.


File: fftw3.info, Node: The Discrete Hartley Transform, Prev: Real even/odd DFTs (cosine/sine transforms), Up: More DFTs of Real Data

2.5.3 The Discrete Hartley Transform
------------------------------------

If you are planning to use the DHT because you've heard that it is
"faster" than the DFT (FFT), *stop here*. The DHT is not faster than
the DFT. That story is an old but enduring misconception that was
debunked in 1987.

   The discrete Hartley transform (DHT) is an invertible linear
transform closely related to the DFT. In the DFT, one multiplies each
input by cos - i * sin (a complex exponential), whereas in the DHT each
input is multiplied by simply cos + sin. Thus, the DHT transforms 'n'
real numbers to 'n' real numbers, and has the convenient property of
being its own inverse. In FFTW, a DHT (of any positive 'n') can be
specified by an r2r kind of 'FFTW_DHT'.

   Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of
size 'n' followed by another DHT of the same size will result in the
original array multiplied by 'n'.

   The DHT was originally proposed as a more efficient alternative to
the DFT for real data, but it was subsequently shown that a specialized
DFT (such as FFTW's r2hc or r2c transforms) could be just as fast.
In FFTW, the DHT is actually computed by post-processing an r2hc transform,
so there is ordinarily no reason to prefer it from a performance
perspective.(1) However, we have heard rumors that the DHT might be the
most appropriate transform in its own right for certain applications,
and we would be very interested to hear from anyone who finds it useful.

   If 'FFTW_DHT' is specified for multiple dimensions of a
multi-dimensional transform, FFTW computes the separable product of 1d
DHTs along each dimension. Unfortunately, this is not quite the same
thing as a true multi-dimensional DHT; you can compute the latter, if
necessary, with at most 'rank-1' post-processing passes [see e.g. H.
Hao and R. N. Bracewell, Proc. IEEE 75, 264-266 (1987)].

   For the precise mathematical definition of the DHT as used by FFTW,
see *note What FFTW Really Computes::.

   ---------- Footnotes ----------

   (1) We provide the DHT mainly as a byproduct of some internal
algorithms. FFTW computes a real input/output DFT of _prime_ size by
re-expressing it as a DHT plus post/pre-processing and then using
Rader's prime-DFT algorithm adapted to the DHT.


File: fftw3.info, Node: Other Important Topics, Next: FFTW Reference, Prev: Tutorial, Up: Top

3 Other Important Topics
************************

* Menu:

* SIMD alignment and fftw_malloc::
* Multi-dimensional Array Format::
* Words of Wisdom-Saving Plans::
* Caveats in Using Wisdom::


File: fftw3.info, Node: SIMD alignment and fftw_malloc, Next: Multi-dimensional Array Format, Prev: Other Important Topics, Up: Other Important Topics

3.1 SIMD alignment and fftw_malloc
==================================

SIMD, which stands for "Single Instruction Multiple Data," is a set of
special operations supported by some processors to perform a single
operation on several numbers (usually 2 or 4) simultaneously.
SIMD floating-point instructions are available on several popular CPUs:
SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and
VSX on some POWER/PowerPCs, NEON on some ARM models. FFTW can be
compiled to support the SIMD instructions on any of these systems.

   A program linking to an FFTW library compiled with SIMD support can
obtain a nonnegligible speedup for most complex and r2c/c2r transforms.
In order to obtain this speedup, however, the arrays of complex (or
real) data passed to FFTW must be specially aligned in memory (typically
16-byte aligned), and often this alignment is more stringent than that
provided by the usual 'malloc' (etc.) allocation routines.

   In order to guarantee proper alignment for SIMD, therefore, in case
your program is ever linked against a SIMD-using FFTW, we recommend
allocating your transform data with 'fftw_malloc' and de-allocating it
with 'fftw_free'. These have exactly the same interface and behavior as
'malloc'/'free', except that for a SIMD FFTW they ensure that the
returned pointer has the necessary alignment (by calling 'memalign' or
its equivalent on your OS).

   You are not _required_ to use 'fftw_malloc'. You can allocate your
data in any way that you like, from 'malloc' to 'new' (in C++) to a
fixed-size array declaration. If the array happens not to be properly
aligned, FFTW will not use the SIMD extensions.

   Since 'fftw_malloc' only ever needs to be used for real and complex
arrays, we provide two convenient wrapper routines 'fftw_alloc_real(N)'
and 'fftw_alloc_complex(N)' that are equivalent to
'(double*)fftw_malloc(sizeof(double) * N)' and
'(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)', respectively (or
their equivalents in other precisions).
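
   To make the notion of "aligned in memory" concrete, the sketch below
tests and rounds up a pointer's address. It is an illustration under
stated assumptions, not FFTW's own mechanism: 'is_aligned' and
'align_up' are hypothetical helper names, and the value 16 used in the
usage note is only the "typical" alignment quoted above, which may
differ for a particular SIMD build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFTW): nonzero if the address p is
   a multiple of 'alignment' bytes.  'alignment' must be a power of 2. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t) p & ((uintptr_t) alignment - 1)) == 0;
}

/* Hypothetical helper (not part of FFTW): smallest address >= p that
   is a multiple of 'alignment' bytes (a power of 2).  The caller must
   ensure that the result still lies inside the same buffer. */
void *align_up(void *p, size_t alignment)
{
    uintptr_t mask;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    mask = (uintptr_t) alignment - 1;
    return (void *) (((uintptr_t) p + mask) & ~mask);
}
```

   A call such as 'is_aligned(in, 16)' on a buffer obtained from plain
'malloc' may or may not succeed, depending on the platform; this is the
uncertainty that allocating with 'fftw_malloc' is meant to remove.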


File: fftw3.info, Node: Multi-dimensional Array Format, Next: Words of Wisdom-Saving Plans, Prev: SIMD alignment and fftw_malloc, Up: Other Important Topics

3.2 Multi-dimensional Array Format
==================================

This section describes the format in which multi-dimensional arrays are
stored in FFTW. We felt that a detailed discussion of this topic was
necessary. Since several different formats are common, this topic is
often a source of confusion.

* Menu:

* Row-major Format::
* Column-major Format::
* Fixed-size Arrays in C::
* Dynamic Arrays in C::
* Dynamic Arrays in C-The Wrong Way::


File: fftw3.info, Node: Row-major Format, Next: Column-major Format, Prev: Multi-dimensional Array Format, Up: Multi-dimensional Array Format

3.2.1 Row-major Format
----------------------

The multi-dimensional arrays passed to 'fftw_plan_dft' etcetera are
expected to be stored as a single contiguous block in "row-major" order
(sometimes called "C order"). Basically, this means that as you step
through adjacent memory locations, the first dimension's index varies
most slowly and the last dimension's index varies most quickly.

   To be more explicit, let us consider an array of rank d whose
dimensions are n[0] x n[1] x n[2] x ... x n[d-1]. Now, we specify a
location in the array by a sequence of d (zero-based) indices, one for
each dimension: (i[0], i[1], ..., i[d-1]). If the array is stored in
row-major order, then this element is located at the position i[d-1] +
n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])).

   Note that, for the ordinary complex DFT, each element of the array
must be of type 'fftw_complex'; i.e. a (real, imaginary) pair of
(double-precision) numbers.
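
   The position formula above can be sketched as a small helper. This
is an illustration only: 'row_major_offset' is a hypothetical name, not
an FFTW function, and the value it returns is an element index (to be
used as a subscript into an 'fftw_complex *', say), not a byte offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FFTW): position of the element
   (i[0], ..., i[d-1]) in a row-major array of dimensions
   n[0] x ... x n[d-1], that is,
   i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])). */
size_t row_major_offset(int d, const int *n, const int *i)
{
    size_t pos = 0;
    int k;
    assert(d >= 1);
    for (k = 0; k < d; ++k) {
        assert(i[k] >= 0 && i[k] < n[k]);
        /* Horner-style evaluation, slowest-varying index first */
        pos = pos * (size_t) n[k] + (size_t) i[k];
    }
    return pos;
}
```

   For a 5 x 12 x 27 array, the element (1,2,3) is at position
3 + 27 * (2 + 12 * 1) = 381, and the last element (4,11,26) is at
position 5*12*27 - 1 = 1619.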

   In the advanced FFTW interface, the physical dimensions n from which
the indices are computed can be different from (larger than) the logical
dimensions of the transform to be computed, in order to transform a
subset of a larger array. Note also that, in the advanced interface,
the expression above is multiplied by a "stride" to get the actual array
index--this is useful in situations where each element of the
multi-dimensional array is actually a data structure (or another array),
and you just want to transform a single field. In the basic interface,
however, the stride is 1.


File: fftw3.info, Node: Column-major Format, Next: Fixed-size Arrays in C, Prev: Row-major Format, Up: Multi-dimensional Array Format

3.2.2 Column-major Format
-------------------------

Readers from the Fortran world are used to arrays stored in
"column-major" order (sometimes called "Fortran order"). This is
essentially the exact opposite of row-major order in that, here, the
_first_ dimension's index varies most quickly.

   If you have an array stored in column-major order and wish to
transform it using FFTW, it is quite easy to do. When creating the
plan, simply pass the dimensions of the array to the planner in _reverse
order_. For example, if your array is a rank three 'N x M x L' matrix
in column-major order, you should pass the dimensions of the array as if
it were an 'L x M x N' matrix (which it is, from the perspective of
FFTW). This is done for you _automatically_ by the FFTW legacy-Fortran
interface (*note Calling FFTW from Legacy Fortran::), but you must do it
manually with the modern Fortran interface (*note Reversing array
dimensions::).
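
   The dimension-reversal rule can be checked numerically: an element's
position in a column-major array equals its position in a row-major
array whose dimensions and indices are both listed in reverse order.
The sketch below is an illustration only; 'col_major_offset',
'row_major_offset' and 'reverse_ints' are hypothetical helper names and
not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: column-major position, first index fastest. */
size_t col_major_offset(int d, const int *n, const int *i)
{
    size_t pos = 0;
    int k;
    assert(d >= 1);
    for (k = d - 1; k >= 0; --k) {
        assert(i[k] >= 0 && i[k] < n[k]);
        pos = pos * (size_t) n[k] + (size_t) i[k];
    }
    return pos;
}

/* Hypothetical helper: row-major position, last index fastest. */
size_t row_major_offset(int d, const int *n, const int *i)
{
    size_t pos = 0;
    int k;
    assert(d >= 1);
    for (k = 0; k < d; ++k) {
        assert(i[k] >= 0 && i[k] < n[k]);
        pos = pos * (size_t) n[k] + (size_t) i[k];
    }
    return pos;
}

/* Hypothetical helper: copy src[0..d-1] into dst in reverse order,
   as one would do with the dimensions before calling the planner. */
void reverse_ints(int d, const int *src, int *dst)
{
    int k;
    assert(d >= 1);
    for (k = 0; k < d; ++k)
        dst[k] = src[d - 1 - k];
}
```

   With dimensions 4 x 3 x 2 and indices (1,2,1), the column-major
position is 1 + 4 * (2 + 3 * 1) = 21, which is also the row-major
position of (1,2,1) reversed in an array of dimensions 2 x 3 x 4.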


File: fftw3.info, Node: Fixed-size Arrays in C, Next: Dynamic Arrays in C, Prev: Column-major Format, Up: Multi-dimensional Array Format

3.2.3 Fixed-size Arrays in C
----------------------------

A multi-dimensional array whose size is declared at compile time in C is
_already_ in row-major order. You don't have to do anything special to
transform it. For example:

     {
          fftw_complex data[N0][N1][N2];
          fftw_plan plan;
          ...
          plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
                                  FFTW_FORWARD, FFTW_ESTIMATE);
          ...
     }

   This will plan a 3d in-place transform of size 'N0 x N1 x N2'.
Notice how we took the address of the zero-th element to pass to the
planner (we could also have used a typecast).

   However, we tend to _discourage_ users from declaring their arrays in
this way, for two reasons. First, this allocates the array on the stack
("automatic" storage), which has a very limited size on most operating
systems (declaring an array with more than a few thousand elements will
often cause a crash). (You can get around this limitation on many
systems by declaring the array as 'static' and/or global, but that has
its own drawbacks.) Second, it may not optimally align the array for
use with a SIMD FFTW (*note SIMD alignment and fftw_malloc::). Instead,
we recommend using 'fftw_malloc', as described below.


File: fftw3.info, Node: Dynamic Arrays in C, Next: Dynamic Arrays in C-The Wrong Way, Prev: Fixed-size Arrays in C, Up: Multi-dimensional Array Format

3.2.4 Dynamic Arrays in C
-------------------------

We recommend allocating most arrays dynamically, with 'fftw_malloc'.
This isn't too hard to do, although it is not as straightforward for
multi-dimensional arrays as it is for one-dimensional arrays.

   Creating the array is simple: using a dynamic-allocation routine like
'fftw_malloc', allocate an array big enough to store N 'fftw_complex'
values (for a complex DFT), where N is the product of the sizes of the
array dimensions (i.e. the total number of complex values in the
array). For example, here is code to allocate a 5 x 12 x 27 rank-3
array:

     fftw_complex *an_array;
     an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex));

   Accessing the array elements, however, is more tricky--you can't
simply use multiple applications of the '[]' operator like you could for
fixed-size arrays. Instead, you have to explicitly compute the offset
into the array using the formula given earlier for row-major arrays.
For example, to reference the (i,j,k)-th element of the array allocated
above, you would use the expression 'an_array[k + 27 * (j + 12 * i)]'.

   This pain can be alleviated somewhat by defining appropriate macros,
or, in C++, creating a class and overloading the '()' operator. The
recent C99 standard provides a way to reinterpret the dynamic array as a
"variable-length" multi-dimensional array amenable to '[]', but this
feature is not yet widely supported by compilers.


File: fftw3.info, Node: Dynamic Arrays in C-The Wrong Way, Prev: Dynamic Arrays in C, Up: Multi-dimensional Array Format

3.2.5 Dynamic Arrays in C--The Wrong Way
----------------------------------------

A different method for allocating multi-dimensional arrays in C is often
suggested that is incompatible with FFTW: _using it will cause FFTW to
die a painful death_. We discuss the technique here, however, because
it is so commonly known and used. This method is to create arrays of
pointers of arrays of pointers of ...etcetera.
For example, the analogue in this method to the example above is:

     int i,j;
     fftw_complex ***a_bad_array;  /* another way to make a 5x12x27 array */

     a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **));
     for (i = 0; i < 5; ++i) {
          a_bad_array[i] =
             (fftw_complex **) malloc(12 * sizeof(fftw_complex *));
          for (j = 0; j < 12; ++j)
               a_bad_array[i][j] =
                     (fftw_complex *) malloc(27 * sizeof(fftw_complex));
     }

   As you can see, this sort of array is inconvenient to allocate (and
deallocate). On the other hand, it has the advantage that the
(i,j,k)-th element can be referenced simply by 'a_bad_array[i][j][k]'.

   If you like this technique and want to maximize convenience in
accessing the array, but still want to pass the array to FFTW, you can
use a hybrid method. Allocate the array as one contiguous block, but
also declare an array of arrays of pointers that point to appropriate
places in the block. That sort of trick is beyond the scope of this
documentation; for more information on multi-dimensional arrays in C,
see the 'comp.lang.c' FAQ (http://c-faq.com/aryptr/dynmuldimary.html).


File: fftw3.info, Node: Words of Wisdom-Saving Plans, Next: Caveats in Using Wisdom, Prev: Multi-dimensional Array Format, Up: Other Important Topics

3.3 Words of Wisdom--Saving Plans
=================================

FFTW implements a method for saving plans to disk and restoring them.
In fact, what FFTW does is more general than just saving and loading
plans. The mechanism is called "wisdom". Here, we describe this
feature at a high level. *Note FFTW Reference::, for a less casual but
more complete discussion of how to use wisdom in FFTW.

   Plans created with the 'FFTW_MEASURE', 'FFTW_PATIENT', or
'FFTW_EXHAUSTIVE' options produce near-optimal FFT performance, but may
require a long time to compute because FFTW must measure the runtime of
many possible plans and select the best one. This setup is designed for
the situations where so many transforms of the same size must be
computed that the start-up time is irrelevant. For short initialization
times, but slower transforms, we have provided 'FFTW_ESTIMATE'. The
'wisdom' mechanism is a way to get the best of both worlds: you compute
a good plan once, save it to disk, and later reload it as many times as
necessary. The wisdom mechanism can actually save and reload many plans
at once, not just one.

   Whenever you create a plan, the FFTW planner accumulates wisdom,
which is information sufficient to reconstruct the plan. After
planning, you can save this information to disk by means of the
function:
     int fftw_export_wisdom_to_filename(const char *filename);
   (This function returns non-zero on success.)

   The next time you run the program, you can restore the wisdom with
'fftw_import_wisdom_from_filename' (which also returns non-zero on
success), and then recreate the plan using the same flags as before.
     int fftw_import_wisdom_from_filename(const char *filename);

   Wisdom is automatically used for any size to which it is applicable,
as long as the planner flags are not more "patient" than those with
which the wisdom was created. For example, wisdom created with
'FFTW_MEASURE' can be used if you later plan with 'FFTW_ESTIMATE' or
'FFTW_MEASURE', but not with 'FFTW_PATIENT'.

   The 'wisdom' is cumulative, and is stored in a global, private data
structure managed internally by FFTW. The storage space required is
minimal, proportional to the logarithm of the sizes the wisdom was
generated from.
If memory usage is a concern, however, the wisdom can
be forgotten and its associated memory freed by calling:
     void fftw_forget_wisdom(void);

   Wisdom can be exported to a file, a string, or any other medium. For
details, see *note Wisdom::.


File: fftw3.info, Node: Caveats in Using Wisdom, Prev: Words of Wisdom-Saving Plans, Up: Other Important Topics

3.4 Caveats in Using Wisdom
===========================

     For in much wisdom is much grief, and he that increaseth knowledge
     increaseth sorrow. [Ecclesiastes 1:18]

   There are pitfalls to using wisdom, in that it can negate FFTW's
ability to adapt to changing hardware and other conditions. For
example, it would be perfectly possible to export wisdom from a program
running on one processor and import it into a program running on another
processor. Doing so, however, would mean that the second program would
use plans optimized for the first processor, instead of the one it is
running on.

   It should be safe to reuse wisdom as long as the hardware and program
binaries remain unchanged. (Actually, the optimal plan may change even
between runs of the same binary on identical hardware, due to
differences in the virtual memory environment, etcetera. Users
seriously interested in performance should worry about this problem,
too.) It is likely that, if the same wisdom is used for two different
program binaries, even running on the same machine, the plans may be
sub-optimal because of differing code alignments. It is therefore wise
to recreate wisdom every time an application is recompiled. The more
the underlying hardware and software changes between the creation of
wisdom and its use, the greater grows the risk of sub-optimal plans.

   Nevertheless, if the choice is between using 'FFTW_ESTIMATE' or using
possibly-suboptimal wisdom (created on the same machine, but for a
different binary), the wisdom is likely to be better. For this reason,
we provide a function to import wisdom from a standard system-wide
location ('/etc/fftw/wisdom' on Unix):

     int fftw_import_system_wisdom(void);

   FFTW also provides a standalone program, 'fftw-wisdom' (described by
its own 'man' page on Unix) with which users can create wisdom, e.g.
for a canonical set of sizes to store in the system wisdom file. *Note
Wisdom Utilities::.


File: fftw3.info, Node: FFTW Reference, Next: Multi-threaded FFTW, Prev: Other Important Topics, Up: Top

4 FFTW Reference
****************

This chapter provides a complete reference for all sequential (i.e.,
one-processor) FFTW functions. Parallel transforms are described in
later chapters.

* Menu:

* Data Types and Files::
* Using Plans::
* Basic Interface::
* Advanced Interface::
* Guru Interface::
* New-array Execute Functions::
* Wisdom::
* What FFTW Really Computes::


File: fftw3.info, Node: Data Types and Files, Next: Using Plans, Prev: FFTW Reference, Up: FFTW Reference

4.1 Data Types and Files
========================

All programs using FFTW should include its header file:

     #include <fftw3.h>

   You must also link to the FFTW library. On Unix, this means adding
'-lfftw3 -lm' at the _end_ of the link command.

* Menu:

* Complex numbers::
* Precision::
* Memory Allocation::


File: fftw3.info, Node: Complex numbers, Next: Precision, Prev: Data Types and Files, Up: Data Types and Files

4.1.1 Complex numbers
---------------------

The default FFTW interface uses 'double' precision for all
floating-point numbers, and defines a 'fftw_complex' type to hold
complex numbers as:

     typedef double fftw_complex[2];

   Here, the '[0]' element holds the real part and the '[1]' element
holds the imaginary part.

   Alternatively, if you have a C compiler (such as 'gcc') that supports
the C99 revision of the ANSI C standard, you can use C's new native
complex type (which is binary-compatible with the typedef above). In
particular, if you '#include <complex.h>' _before_ '<fftw3.h>', then
'fftw_complex' is defined to be the native complex type and you can
manipulate it with ordinary arithmetic (e.g. 'x = y * (3+4*I)', where
'x' and 'y' are 'fftw_complex' and 'I' is the standard symbol for the
imaginary unit).

   C++ has its own 'complex<T>' template class, defined in the standard
'<complex>' header file. Reportedly, the C++ standards committee has
recently agreed to mandate that the storage format used for this type be
binary-compatible with the C99 type, i.e. an array 'T[2]' with
consecutive real '[0]' and imaginary '[1]' parts. (See report
<http://www.open-std.org/jtc1/sc22/WG21/docs/papers/2002/n1388.pdf
WG21/N1388>.) Although not part of the official standard as of this
writing, the proposal stated that: "This solution has been tested with
all current major implementations of the standard library and shown to
be working." To the extent that this is true, if you have a variable
'complex<double> *x', you can pass it directly to FFTW via
'reinterpret_cast<fftw_complex*>(x)'.
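
   The binary compatibility between the array form and the C99 native
complex type can be illustrated without FFTW at all. In the sketch
below, 'pair_t' is a hypothetical stand-in for the 'double[2]'
definition of 'fftw_complex' shown above, and 'complex_to_pair' and
'sum_pairs' are hypothetical helpers; bytes are copied with 'memcpy' so
that the example makes no assumption beyond the two types having the
same size and layout.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Hypothetical stand-in for the array form of fftw_complex, so that
   this sketch does not need the FFTW headers. */
typedef double pair_t[2];

/* Copy a C99 complex value into the [real, imaginary] array form;
   no conversion is performed, the bytes are identical. */
void complex_to_pair(double complex z, pair_t out)
{
    assert(sizeof(double complex) == sizeof(pair_t));
    memcpy(out, &z, sizeof(pair_t));
}

/* Read n [real, imaginary] pairs as C99 complex values and add them
   using ordinary complex arithmetic. */
double complex sum_pairs(pair_t *x, int n)
{
    double complex s = 0, zk;
    int k;
    assert(n >= 0);
    for (k = 0; k < n; ++k) {
        memcpy(&zk, x[k], sizeof zk);
        s += zk;
    }
    return s;
}
```

   For example, converting '3.0 + 4.0 * I' with 'complex_to_pair' yields
the pair (3.0, 4.0), and summing the pairs (1, 2) and (3, -5) gives the
complex value 4 - 3i.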


File: fftw3.info, Node: Precision, Next: Memory Allocation, Prev: Complex numbers, Up: Data Types and Files

4.1.2 Precision
---------------

You can install single and long-double precision versions of FFTW, which
replace 'double' with 'float' and 'long double', respectively (*note
Installation and Customization::). To use these interfaces, you:

   * Link to the single/long-double libraries; on Unix, '-lfftw3f' or
     '-lfftw3l' instead of (or in addition to) '-lfftw3'. (You can link
     to the different-precision libraries simultaneously.)

   * Include the _same_ '<fftw3.h>' header file.

   * Replace all lowercase instances of 'fftw_' with 'fftwf_' or
     'fftwl_' for single or long-double precision, respectively.
     ('fftw_complex' becomes 'fftwf_complex', 'fftw_execute' becomes
     'fftwf_execute', etcetera.)

   * Uppercase names, i.e. names beginning with 'FFTW_', remain the
     same.

   * Replace 'double' with 'float' or 'long double' for subroutine
     parameters.

   Depending upon your compiler and/or hardware, 'long double' may not
be any more precise than 'double' (or may not be supported at all,
although it is standard in C99).

   We also support using the nonstandard '__float128'
quadruple-precision type provided by recent versions of 'gcc' on 32- and
64-bit x86 hardware (*note Installation and Customization::). To use
this type, link with '-lfftw3q -lquadmath -lm' (the 'libquadmath'
library provided by 'gcc' is needed for quadruple-precision
trigonometric functions) and use 'fftwq_' identifiers.


File: fftw3.info, Node: Memory Allocation, Prev: Precision, Up: Data Types and Files

4.1.3 Memory Allocation
-----------------------

     void *fftw_malloc(size_t n);
     void fftw_free(void *p);

   These are functions that behave identically to 'malloc' and 'free',
except that they guarantee that the returned pointer obeys any special
alignment restrictions imposed by any algorithm in FFTW (e.g. for SIMD
acceleration). *Note SIMD alignment and fftw_malloc::.

   Data allocated by 'fftw_malloc' _must_ be deallocated by 'fftw_free'
and not by the ordinary 'free'.

   These routines simply call through to your operating system's
'malloc' or, if necessary, its aligned equivalent (e.g. 'memalign'), so
you normally need not worry about any significant time or space
overhead. You are _not required_ to use them to allocate your data, but
we strongly recommend it.

   Note: in C++, just as with ordinary 'malloc', you must typecast the
output of 'fftw_malloc' to whatever pointer type you are allocating.

   We also provide the following two convenience functions to allocate
real and complex arrays with 'n' elements, which are equivalent to
'(double *) fftw_malloc(sizeof(double) * n)' and '(fftw_complex *)
fftw_malloc(sizeof(fftw_complex) * n)', respectively:

     double *fftw_alloc_real(size_t n);
     fftw_complex *fftw_alloc_complex(size_t n);

   The equivalent functions in other precisions allocate arrays of 'n'
elements in that precision, e.g. 'fftwf_alloc_real(n)' is equivalent
to '(float *) fftwf_malloc(sizeof(float) * n)'.

File: fftw3.info, Node: Using Plans, Next: Basic Interface, Prev: Data Types and Files, Up: FFTW Reference

4.2 Using Plans
===============

Plans for all transform types in FFTW are stored as type 'fftw_plan' (an
opaque pointer type), and are created by one of the various planning
routines described in the following sections. An 'fftw_plan' contains
all information necessary to compute the transform, including the
pointers to the input and output arrays.

     void fftw_execute(const fftw_plan plan);

   This executes the 'plan', to compute the corresponding transform on
the arrays for which it was planned (which must still exist). The plan
is not modified, and 'fftw_execute' can be called as many times as
desired.

   To apply a given plan to a different array, you can use the new-array
execute interface. *Note New-array Execute Functions::.

   'fftw_execute' (and equivalents) is the only function in FFTW
guaranteed to be thread-safe; see *note Thread safety::.

   This function:
     void fftw_destroy_plan(fftw_plan plan);
   deallocates the 'plan' and all its associated data.

   FFTW's planner saves some other persistent data, such as the
accumulated wisdom and a list of algorithms available in the current
configuration. If you want to deallocate all of that and reset FFTW to
the pristine state it was in when you started your program, you can
call:

     void fftw_cleanup(void);

   After calling 'fftw_cleanup', all existing plans become undefined,
and you should not attempt to execute them nor to destroy them. You can
however create and execute/destroy new plans, in which case FFTW starts
accumulating wisdom information again.

   'fftw_cleanup' does not deallocate your plans, however. To prevent
memory leaks, you must still call 'fftw_destroy_plan' before executing
'fftw_cleanup'.

   Occasionally, it may be useful to know FFTW's internal "cost" metric
that it uses to compare plans to one another; this cost is proportional
to an execution time of the plan, in undocumented units, if the plan was
created with the 'FFTW_MEASURE' or other timing-based options, or
alternatively is a heuristic cost function for 'FFTW_ESTIMATE' plans.
(The cost values of measured and estimated plans are not comparable,
being in different units. Also, costs from different FFTW versions or
the same version compiled differently may not be in the same units.
Plans created from wisdom have a cost of 0 since no timing measurement
is performed for them. Finally, certain problems for which only one
top-level algorithm was possible may have required no measurements of
the cost of the whole plan, in which case 'fftw_cost' will also return
0.) The cost metric for a given plan is returned by:

     double fftw_cost(const fftw_plan plan);

   The following two routines are provided purely for academic purposes
(that is, for entertainment).

     void fftw_flops(const fftw_plan plan,
                     double *add, double *mul, double *fma);

   Given a 'plan', set 'add', 'mul', and 'fma' to an exact count of the
number of floating-point additions, multiplications, and fused
multiply-add operations involved in the plan's execution. The total
number of floating-point operations (flops) is 'add + mul + 2*fma', or
'add + mul + fma' if the hardware supports fused multiply-add
instructions (although the number of FMA operations is only approximate
because of compiler voodoo). (The number of operations should be an
integer, but we use 'double' to avoid overflowing 'int' for large
transforms; the arguments are of type 'double' even for single and
long-double precision versions of FFTW.)
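   The two flop totals described above can be written as a small helper.
This is only a sketch: 'total_flops' is a hypothetical name that is not
part of FFTW, and its three count arguments are assumed to be the values
that 'fftw_flops' stored for some plan.

```c
#include <assert.h>

/* Hypothetical helper (not part of FFTW): combine the three counts
   reported by fftw_flops into one total.  If hardware_has_fma is
   nonzero each fused multiply-add counts as one operation, otherwise
   as two (one multiplication plus one addition). */
static double total_flops(double add, double mul, double fma,
                          int hardware_has_fma)
{
    assert(add >= 0.0 && mul >= 0.0 && fma >= 0.0);
    return add + mul + (hardware_has_fma ? 1.0 : 2.0) * fma;
}
```

   With the made-up counts add = 10, mul = 4, and fma = 3 (illustrative
numbers, not measured from any plan), the totals are 20 without FMA
hardware and 17 with it.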

     void fftw_fprint_plan(const fftw_plan plan, FILE *output_file);
     void fftw_print_plan(const fftw_plan plan);
     char *fftw_sprint_plan(const fftw_plan plan);

   This outputs a "nerd-readable" representation of the 'plan' to the
given file, to 'stdout', or to a newly allocated NUL-terminated string
(which the caller is responsible for deallocating with 'free'),
respectively.


File: fftw3.info, Node: Basic Interface, Next: Advanced Interface, Prev: Using Plans, Up: FFTW Reference

4.3 Basic Interface
===================

Recall that the FFTW API is divided into three parts(1): the "basic
interface" computes a single transform of contiguous data, the "advanced
interface" computes transforms of multiple or strided arrays, and the
"guru interface" supports the most general data layouts, multiplicities,
and strides. This section describes the basic interface, which we
expect to satisfy the needs of most users.

* Menu:

* Complex DFTs::
* Planner Flags::
* Real-data DFTs::
* Real-data DFT Array Format::
* Real-to-Real Transforms::
* Real-to-Real Transform Kinds::

   ---------- Footnotes ----------

   (1) Gallia est omnis divisa in partes tres (Julius Caesar).

File: fftw3.info, Node: Complex DFTs, Next: Planner Flags, Prev: Basic Interface, Up: Basic Interface

4.3.1 Complex DFTs
------------------

     fftw_plan fftw_plan_dft_1d(int n0,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);
     fftw_plan fftw_plan_dft_2d(int n0, int n1,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);
     fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);
     fftw_plan fftw_plan_dft(int rank, const int *n,
                             fftw_complex *in, fftw_complex *out,
                             int sign, unsigned flags);

   Plan a complex input/output discrete Fourier transform (DFT) in zero
or more dimensions, returning an 'fftw_plan' (*note Using Plans::).

   Once you have created a plan for a certain transform type and
parameters, then creating another plan of the same type and parameters,
but for different arrays, is fast and shares constant data with the
first plan (if it still exists).

   The planner returns 'NULL' if the plan cannot be created. In the
standard FFTW distribution, the basic interface is guaranteed to return
a non-'NULL' plan. A plan may be 'NULL', however, if you are using a
customized FFTW configuration supporting a restricted set of transforms.

Arguments
.........

   * 'rank' is the rank of the transform (it should be the size of the
     array '*n'), and can be any non-negative integer. (*Note Complex
     Multi-Dimensional DFTs::, for the definition of "rank".) The
     '_1d', '_2d', and '_3d' planners correspond to a 'rank' of '1',
     '2', and '3', respectively. The rank may be zero, which is
     equivalent to a rank-1 transform of size 1, i.e. a copy of one
     number from input to output.

   * 'n0', 'n1', 'n2', or 'n[0..rank-1]' (as appropriate for each
     routine) specify the size of the transform dimensions.
     They can be any positive integer.

        - Multi-dimensional arrays are stored in row-major order with
          dimensions: 'n0' x 'n1'; or 'n0' x 'n1' x 'n2'; or 'n[0]' x
          'n[1]' x ... x 'n[rank-1]'. *Note Multi-dimensional Array
          Format::.
        - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
          11^e 13^f, where e+f is either 0 or 1, and the other exponents
          are arbitrary. Other sizes are computed by means of a slow,
          general-purpose algorithm (which nevertheless retains O(n log
          n) performance even for prime sizes). It is possible to
          customize FFTW for different array sizes; see *note
          Installation and Customization::. Transforms whose sizes are
          powers of 2 are especially fast.

   * 'in' and 'out' point to the input and output arrays of the
     transform, which may be the same (yielding an in-place transform).
     These arrays are overwritten during planning, unless
     'FFTW_ESTIMATE' is used in the flags. (The arrays need not be
     initialized, but they must be allocated.)

     If 'in == out', the transform is "in-place" and the input array is
     overwritten. If 'in != out', the two arrays must not overlap (but
     FFTW does not check for this condition).

   * 'sign' is the sign of the exponent in the formula that defines the
     Fourier transform. It can be -1 (= 'FFTW_FORWARD') or +1 (=
     'FFTW_BACKWARD').

   * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
     defined in *note Planner Flags::.

   FFTW computes an unnormalized transform: computing a forward followed
by a backward transform (or vice versa) will result in the original data
multiplied by the size of the transform (the product of the dimensions).
For more information, see *note What FFTW Really Computes::.
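   The sign convention and the missing normalization can be checked by
hand on a tiny case. The sketch below is a naive size-4 DFT written out
directly from the definition; it is not FFTW's algorithm and calls no
FFTW routine. The names 'cplx', 'twiddle4', and 'dft4' are hypothetical,
and size 4 is chosen only because its twiddle factors (1, i, -1, -i) are
exact in floating point.

```c
#include <assert.h>

/* Hypothetical stand-in for a complex value (real part, imaginary part). */
typedef struct { double re, im; } cplx;

/* Multiply z by (sign*i)^m, i.e. by exp(sign * 2*pi*i * m / 4). */
static cplx twiddle4(cplx z, int m, int sign)
{
    cplx r;
    switch (m % 4) {
    case 0:  r.re = z.re;         r.im = z.im;         break;
    case 1:  r.re = -sign * z.im; r.im =  sign * z.re; break;
    case 2:  r.re = -z.re;        r.im = -z.im;        break;
    default: r.re =  sign * z.im; r.im = -sign * z.re; break;
    }
    return r;
}

/* Naive unnormalized size-4 DFT:
     out[k] = sum over j of in[j] * exp(sign * 2*pi*i * j*k / 4)
   sign is -1 (forward) or +1 (backward); in and out must not overlap. */
static void dft4(const cplx in[4], cplx out[4], int sign)
{
    int j, k;

    assert(sign == -1 || sign == 1);
    for (k = 0; k < 4; ++k) {
        out[k].re = 0.0;
        out[k].im = 0.0;
        for (j = 0; j < 4; ++j) {
            cplx t = twiddle4(in[j], j * k, sign);
            out[k].re += t.re;
            out[k].im += t.im;
        }
    }
}
```

   Calling 'dft4(x, y, -1)' and then 'dft4(y, z, +1)' leaves every
'z[i]' equal to 4 times 'x[i]', so a caller who wants the original data
back divides by the transform size.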

File: fftw3.info, Node: Planner Flags, Next: Real-data DFTs, Prev: Complex DFTs, Up: Basic Interface

4.3.2 Planner Flags
-------------------

All of the planner routines in FFTW accept an integer 'flags' argument,
which is a bitwise OR ('|') of zero or more of the flag constants
defined below. These flags control the rigor (and time) of the planning
process, and can also impose (or lift) restrictions on the type of
transform algorithm that is employed.

   _Important:_ the planner overwrites the input array during planning
unless a saved plan (*note Wisdom::) is available for that problem, so
you should initialize your input data after creating the plan. The only
exceptions to this are the 'FFTW_ESTIMATE' and 'FFTW_WISDOM_ONLY' flags,
as mentioned below.

   In all cases, if wisdom is available for the given problem that was
created with equal-or-greater planning rigor, then the more rigorous
wisdom is used. For example, in 'FFTW_ESTIMATE' mode any available
wisdom is used, whereas in 'FFTW_PATIENT' mode only wisdom created in
patient or exhaustive mode can be used. *Note Words of Wisdom-Saving
Plans::.

Planning-rigor flags
....................

   * 'FFTW_ESTIMATE' specifies that, instead of actual measurements of
     different algorithms, a simple heuristic is used to pick a
     (probably sub-optimal) plan quickly. With this flag, the
     input/output arrays are not overwritten during planning.

   * 'FFTW_MEASURE' tells FFTW to find an optimized plan by actually
     _computing_ several FFTs and measuring their execution time.
     Depending on your machine, this can take some time (often a few
     seconds). 'FFTW_MEASURE' is the default planning option.

   * 'FFTW_PATIENT' is like 'FFTW_MEASURE', but considers a wider range
     of algorithms and often produces a "more optimal" plan (especially
     for large transforms), but at the expense of several times longer
     planning time (especially for large transforms).

   * 'FFTW_EXHAUSTIVE' is like 'FFTW_PATIENT', but considers an even
     wider range of algorithms, including many that we think are
     unlikely to be fast, to produce the most optimal plan but with a
     substantially increased planning time.

   * 'FFTW_WISDOM_ONLY' is a special planning mode in which the plan is
     only created if wisdom is available for the given problem, and
     otherwise a 'NULL' plan is returned. This can be combined with
     other flags, e.g. 'FFTW_WISDOM_ONLY | FFTW_PATIENT' creates a plan
     only if wisdom is available that was created in 'FFTW_PATIENT' or
     'FFTW_EXHAUSTIVE' mode. The 'FFTW_WISDOM_ONLY' flag is intended
     for users who need to detect whether wisdom is available; for
     example, if wisdom is not available one may wish to allocate new
     arrays for planning so that user data is not overwritten.

Algorithm-restriction flags
...........................

   * 'FFTW_DESTROY_INPUT' specifies that an out-of-place transform is
     allowed to _overwrite its input_ array with arbitrary data; this
     can sometimes allow more efficient algorithms to be employed.

   * 'FFTW_PRESERVE_INPUT' specifies that an out-of-place transform must
     _not change its input_ array. This is ordinarily the _default_,
     except for c2r and hc2r (i.e. complex-to-real) transforms for
     which 'FFTW_DESTROY_INPUT' is the default.
     In the latter cases, passing 'FFTW_PRESERVE_INPUT' will attempt to
     use algorithms that do not destroy the input, at the expense of
     worse performance; for multi-dimensional c2r transforms, however,
     no input-preserving algorithms are implemented and the planner
     will return 'NULL' if one is requested.

   * 'FFTW_UNALIGNED' specifies that the algorithm may not impose any
     unusual alignment requirements on the input/output arrays (i.e. no
     SIMD may be used). This flag is normally _not necessary_, since
     the planner automatically detects misaligned arrays. The only use
     for this flag is if you want to use the new-array execute interface
     to execute a given plan on a different array that may not be
     aligned like the original. (Using 'fftw_malloc' makes this flag
     unnecessary even then. You can also use 'fftw_alignment_of' to
     detect whether two arrays are equivalently aligned.)

Limiting planning time
......................

     extern void fftw_set_timelimit(double seconds);

   This function instructs FFTW to spend at most 'seconds' seconds
(approximately) in the planner. If 'seconds == FFTW_NO_TIMELIMIT' (the
default value, which is negative), then planning time is unbounded.
Otherwise, FFTW plans with a progressively wider range of algorithms
until the given time limit is reached or the given range of
algorithms is explored, returning the best available plan.

   For example, specifying 'FFTW_PATIENT' first plans in 'FFTW_ESTIMATE'
mode, then in 'FFTW_MEASURE' mode, then finally (time permitting) in
'FFTW_PATIENT'. If 'FFTW_EXHAUSTIVE' is specified instead, the planner
will further progress to 'FFTW_EXHAUSTIVE' mode.

   Note that the 'seconds' argument specifies only a rough limit; in
practice, the planner may use somewhat more time if the time limit is
reached when the planner is in the middle of an operation that cannot be
interrupted. At the very least, the planner will complete planning in
'FFTW_ESTIMATE' mode (which is thus equivalent to a time limit of 0).


File: fftw3.info, Node: Real-data DFTs, Next: Real-data DFT Array Format, Prev: Planner Flags, Up: Basic Interface

4.3.3 Real-data DFTs
--------------------

     fftw_plan fftw_plan_dft_r2c_1d(int n0,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
                                 double *in, fftw_complex *out,
                                 unsigned flags);

   Plan a real-input/complex-output discrete Fourier transform (DFT) in
zero or more dimensions, returning an 'fftw_plan' (*note Using Plans::).

   Once you have created a plan for a certain transform type and
parameters, then creating another plan of the same type and parameters,
but for different arrays, is fast and shares constant data with the
first plan (if it still exists).

   The planner returns 'NULL' if the plan cannot be created. A
non-'NULL' plan is always returned by the basic interface unless you are
using a customized FFTW configuration supporting a restricted set of
transforms, or if you use the 'FFTW_PRESERVE_INPUT' flag with a
multi-dimensional out-of-place c2r transform (see below).

Arguments
.........

   * 'rank' is the rank of the transform (it should be the size of the
     array '*n'), and can be any non-negative integer.
     (*Note Complex Multi-Dimensional DFTs::, for the definition of
     "rank".) The '_1d', '_2d', and '_3d' planners correspond to a
     'rank' of '1', '2', and '3', respectively. The rank may be zero,
     which is equivalent to a rank-1 transform of size 1, i.e. a copy
     of one real number (with zero imaginary part) from input to output.

   * 'n0', 'n1', 'n2', or 'n[0..rank-1]' (as appropriate for each
     routine) specify the size of the transform dimensions. They can be
     any positive integer. This is different in general from the
     _physical_ array dimensions, which are described in *note Real-data
     DFT Array Format::.

        - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
          11^e 13^f, where e+f is either 0 or 1, and the other exponents
          are arbitrary. Other sizes are computed by means of a slow,
          general-purpose algorithm (which nevertheless retains O(n log
          n) performance even for prime sizes). (It is possible to
          customize FFTW for different array sizes; see *note
          Installation and Customization::.) Transforms whose sizes are
          powers of 2 are especially fast, and it is generally
          beneficial for the _last_ dimension of an r2c/c2r transform to
          be _even_.

   * 'in' and 'out' point to the input and output arrays of the
     transform, which may be the same (yielding an in-place transform).
     These arrays are overwritten during planning, unless
     'FFTW_ESTIMATE' is used in the flags. (The arrays need not be
     initialized, but they must be allocated.) For an in-place
     transform, it is important to remember that the real array will
     require padding, described in *note Real-data DFT Array Format::.

   * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
     defined in *note Planner Flags::.

   The inverse transforms, taking complex input (storing the
non-redundant half of a logically Hermitian array) to real output, are
given by:

     fftw_plan fftw_plan_dft_c2r_1d(int n0,
                                    fftw_complex *in, double *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_c2r_2d(int n0, int n1,
                                    fftw_complex *in, double *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_c2r_3d(int n0, int n1, int n2,
                                    fftw_complex *in, double *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_c2r(int rank, const int *n,
                                 fftw_complex *in, double *out,
                                 unsigned flags);

   The arguments are the same as for the r2c transforms, except that the
input and output data formats are reversed.

   FFTW computes an unnormalized transform: computing an r2c followed by
a c2r transform (or vice versa) will result in the original data
multiplied by the size of the transform (the product of the logical
dimensions). An r2c transform produces the same output as a
'FFTW_FORWARD' complex DFT of the same input, and a c2r transform is
correspondingly equivalent to 'FFTW_BACKWARD'. For more information,
see *note What FFTW Really Computes::.


File: fftw3.info, Node: Real-data DFT Array Format, Next: Real-to-Real Transforms, Prev: Real-data DFTs, Up: Basic Interface

4.3.4 Real-data DFT Array Format
--------------------------------

The output of a DFT of real data (r2c) contains symmetries that, in
principle, make half of the outputs redundant (*note What FFTW Really
Computes::). (Similarly for the input of an inverse c2r transform.) In
practice, it is not possible to entirely realize these savings in an
efficient and understandable format that generalizes to
multi-dimensional transforms. Instead, the output of the r2c transforms
is _slightly_ over half of the output of the corresponding complex
transform.
We do not "pack" the data in any way, but store it as an
ordinary array of 'fftw_complex' values. In fact, this data is simply a
subsection of what would be the array in the corresponding complex
transform.

   Specifically, for a real transform of d (= 'rank') dimensions n[0] x
n[1] x n[2] x ... x n[d-1], the complex data is an n[0] x n[1] x n[2]
x ... x (n[d-1]/2 + 1) array of 'fftw_complex' values in row-major
order (with the division rounded down). That is, we only store the
_lower_ half (non-negative frequencies), plus one element, of the last
dimension of the data from the ordinary complex transform. (We could
have instead taken half of any other dimension, but implementation turns
out to be simpler if the last, contiguous, dimension is used.)

   For an out-of-place transform, the real data is simply an array with
physical dimensions n[0] x n[1] x n[2] x ... x n[d-1] in row-major
order.

   For an in-place transform, some complications arise since the complex
data is slightly larger than the real data. In this case, the final
dimension of the real data must be _padded_ with extra values to
accommodate the size of the complex data--two extra if the last
dimension is even and one if it is odd. That is, the last dimension of
the real data must physically contain 2 * (n[d-1]/2+1) 'double' values
(exactly enough to hold the complex data). This physical array size
does not, however, change the _logical_ array size--only n[d-1] values
are actually stored in the last dimension, and n[d-1] is the last
dimension passed to the planner.
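   The size rules above reduce to a little integer arithmetic. The
following sketch computes how many elements each array must hold; the
helper names 'r2c_complex_count' and 'r2c_inplace_real_count' are
hypothetical and are not FFTW functions, and the sketch assumes a rank
of at least 1 and sizes small enough that the products do not overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of fftw_complex elements in the r2c
   output (or c2r input) for logical dimensions n[0] x ... x n[rank-1].
   Only the last dimension is cut to n[rank-1]/2 + 1. */
static size_t r2c_complex_count(int rank, const int *n)
{
    size_t count = 1;
    int i;

    assert(rank >= 1);
    for (i = 0; i < rank - 1; ++i)
        count *= (size_t)n[i];
    /* Integer division rounds down, as the text requires. */
    return count * (size_t)(n[rank - 1] / 2 + 1);
}

/* Hypothetical helper: number of doubles the real array must physically
   hold for an IN-PLACE transform, i.e. with the last dimension padded
   to 2 * (n[rank-1]/2 + 1). */
static size_t r2c_inplace_real_count(int rank, const int *n)
{
    return 2 * r2c_complex_count(rank, n);
}
```

   For a 5 x 6 transform this gives 20 complex values (5 x 4) and 40
doubles for the padded in-place real array (5 x 8, two extra per row
because 6 is even). For a 1d transform of size 7 it gives 4 complex
values and 8 doubles (one extra because 7 is odd).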

File: fftw3.info, Node: Real-to-Real Transforms, Next: Real-to-Real Transform Kinds, Prev: Real-data DFT Array Format, Up: Basic Interface

4.3.5 Real-to-Real Transforms
-----------------------------

     fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
                                fftw_r2r_kind kind, unsigned flags);
     fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
                                fftw_r2r_kind kind0, fftw_r2r_kind kind1,
                                unsigned flags);
     fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
                                double *in, double *out,
                                fftw_r2r_kind kind0,
                                fftw_r2r_kind kind1,
                                fftw_r2r_kind kind2,
                                unsigned flags);
     fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
                             const fftw_r2r_kind *kind, unsigned flags);

   Plan a real input/output (r2r) transform of various kinds in zero or
more dimensions, returning an 'fftw_plan' (*note Using Plans::).

   Once you have created a plan for a certain transform type and
parameters, then creating another plan of the same type and parameters,
but for different arrays, is fast and shares constant data with the
first plan (if it still exists).

   The planner returns 'NULL' if the plan cannot be created. A
non-'NULL' plan is always returned by the basic interface unless you are
using a customized FFTW configuration supporting a restricted set of
transforms, or for size-1 'FFTW_REDFT00' kinds (which are not defined).

Arguments
.........

   * 'rank' is the dimensionality of the transform (it should be the
     size of the arrays '*n' and '*kind'), and can be any non-negative
     integer. The '_1d', '_2d', and '_3d' planners correspond to a
     'rank' of '1', '2', and '3', respectively. A 'rank' of zero is
     equivalent to a copy of one number from input to output.

   * 'n', or 'n0'/'n1'/'n2', or 'n[rank]', respectively, gives the
     (physical) size of the transform dimensions.
     They can be any positive integer.

        - Multi-dimensional arrays are stored in row-major order with
          dimensions: 'n0' x 'n1'; or 'n0' x 'n1' x 'n2'; or 'n[0]' x
          'n[1]' x ... x 'n[rank-1]'. *Note Multi-dimensional Array
          Format::.
        - FFTW is generally best at handling sizes of the form 2^a 3^b
          5^c 7^d 11^e 13^f, where e+f is either 0 or 1, and the other
          exponents are arbitrary. Other sizes are computed by means of
          a slow, general-purpose algorithm (which nevertheless retains
          O(n log n) performance even for prime sizes). (It is possible
          to customize FFTW for different array sizes; see *note
          Installation and Customization::.) Transforms whose sizes are
          powers of 2 are especially fast.
        - For a 'REDFT00' or 'RODFT00' transform kind in a dimension of
          size n, it is n-1 or n+1, respectively, that should be
          factorizable in the above form.

   * 'in' and 'out' point to the input and output arrays of the
     transform, which may be the same (yielding an in-place transform).
     These arrays are overwritten during planning, unless
     'FFTW_ESTIMATE' is used in the flags. (The arrays need not be
     initialized, but they must be allocated.)

   * 'kind', or 'kind0'/'kind1'/'kind2', or 'kind[rank]', is the kind of
     r2r transform used for the corresponding dimension. The valid kind
     constants are described in *note Real-to-Real Transform Kinds::.
     In a multi-dimensional transform, what is computed is the separable
     product formed by taking each transform kind along the
     corresponding dimension, one dimension after another.

   * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
     defined in *note Planner Flags::.
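   The size guideline in the argument list above can be turned into a
quick check before choosing a transform length. This is a sketch only:
'is_fast_size' and 'is_fast_r2r_size' are hypothetical helpers that are
not part of FFTW, and the 'offset' parameter is an illustrative way of
encoding the n-1 (REDFT00) and n+1 (RODFT00) adjustment rather than an
FFTW constant.

```c
#include <assert.h>

/* Hypothetical helper: nonzero if n = 2^a 3^b 5^c 7^d 11^e 13^f
   with e+f equal to 0 or 1 (the form the text calls best handled). */
static int is_fast_size(int n)
{
    static const int small_primes[] = {2, 3, 5, 7};
    int i, big = 0;   /* big counts factors of 11 and 13, i.e. e+f */

    assert(n > 0);
    for (i = 0; i < 4; ++i)
        while (n % small_primes[i] == 0)
            n /= small_primes[i];
    while (n % 11 == 0) { n /= 11; ++big; }
    while (n % 13 == 0) { n /= 13; ++big; }
    return n == 1 && big <= 1;
}

/* Hypothetical helper for one r2r dimension of physical size n.
   offset is -1 for a REDFT00 dimension (n-1 is checked), +1 for a
   RODFT00 dimension (n+1 is checked), and 0 for the other kinds. */
static int is_fast_r2r_size(int n, int offset)
{
    assert(offset >= -1 && offset <= 1);
    return n + offset > 0 && is_fast_size(n + offset);
}
```

   Under this check a REDFT00 dimension of size 129 passes (128 is a
power of 2) even though 129 = 3 x 43 itself does not, and a size of
143 = 11 x 13 fails because e+f would be 2.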

File: fftw3.info, Node: Real-to-Real Transform Kinds, Prev: Real-to-Real Transforms, Up: Basic Interface

4.3.6 Real-to-Real Transform Kinds
----------------------------------

FFTW currently supports 11 different r2r transform kinds, specified by
one of the constants below. For the precise definitions of these
transforms, see *note What FFTW Really Computes::. For a more
colloquial introduction to these transform kinds, see *note More DFTs of
Real Data::.

   For a dimension of size 'n', there is a corresponding "logical"
dimension 'N' that determines the normalization (and the optimal
factorization); the formula for 'N' is given for each kind below. Also,
with each transform kind is listed its corresponding inverse transform.
FFTW computes unnormalized transforms: a transform followed by its
inverse will result in the original data multiplied by 'N' (or the
product of the 'N''s for each dimension, in multi-dimensions).

   * 'FFTW_R2HC' computes a real-input DFT with output in "halfcomplex"
     format, i.e. real and imaginary parts for a transform of size 'n'
     stored as: r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1
     (Logical 'N=n', inverse is 'FFTW_HC2R'.)

   * 'FFTW_HC2R' computes the reverse of 'FFTW_R2HC', above. (Logical
     'N=n', inverse is 'FFTW_R2HC'.)

   * 'FFTW_DHT' computes a discrete Hartley transform. (Logical 'N=n',
     inverse is 'FFTW_DHT'.)

   * 'FFTW_REDFT00' computes an REDFT00 transform, i.e. a DCT-I.
     (Logical 'N=2*(n-1)', inverse is 'FFTW_REDFT00'.)

   * 'FFTW_REDFT10' computes an REDFT10 transform, i.e. a DCT-II
     (sometimes called "the" DCT). (Logical 'N=2*n', inverse is
     'FFTW_REDFT01'.)

   * 'FFTW_REDFT01' computes an REDFT01 transform, i.e. a DCT-III
     (sometimes called "the" IDCT, being the inverse of DCT-II).
     (Logical 'N=2*n', inverse is 'FFTW_REDFT10'.)

   * 'FFTW_REDFT11' computes an REDFT11 transform, i.e. a DCT-IV.
     (Logical 'N=2*n', inverse is 'FFTW_REDFT11'.)

   * 'FFTW_RODFT00' computes an RODFT00 transform, i.e. a DST-I.
     (Logical 'N=2*(n+1)', inverse is 'FFTW_RODFT00'.)

   * 'FFTW_RODFT10' computes an RODFT10 transform, i.e. a DST-II.
     (Logical 'N=2*n', inverse is 'FFTW_RODFT01'.)

   * 'FFTW_RODFT01' computes an RODFT01 transform, i.e. a DST-III.
     (Logical 'N=2*n', inverse is 'FFTW_RODFT10'.)

   * 'FFTW_RODFT11' computes an RODFT11 transform, i.e. a DST-IV.
     (Logical 'N=2*n', inverse is 'FFTW_RODFT11'.)


File: fftw3.info, Node: Advanced Interface, Next: Guru Interface, Prev: Basic Interface, Up: FFTW Reference

4.4 Advanced Interface
======================

FFTW's "advanced" interface supplements the basic interface with four
new planner routines, providing a new level of flexibility: you can plan
a transform of multiple arrays simultaneously, operate on non-contiguous
(strided) data, and transform a subset of a larger multi-dimensional
array. Other than these additional features, the planner operates in
the same fashion as in the basic interface, and the resulting
'fftw_plan' is used in the same way (*note Using Plans::).

* Menu:

* Advanced Complex DFTs::
* Advanced Real-data DFTs::
* Advanced Real-to-real Transforms::


File: fftw3.info, Node: Advanced Complex DFTs, Next: Advanced Real-data DFTs, Prev: Advanced Interface, Up: Advanced Interface

4.4.1 Advanced Complex DFTs
---------------------------

     fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
                                  fftw_complex *in, const int *inembed,
                                  int istride, int idist,
                                  fftw_complex *out, const int *onembed,
                                  int ostride, int odist,
                                  int sign, unsigned flags);

   This routine plans multiple multidimensional complex DFTs, and it
extends the 'fftw_plan_dft' routine (*note Complex DFTs::) to compute
'howmany' transforms, each having rank 'rank' and size 'n'. In
addition, the transform data need not be contiguous, but it may be laid
out in memory with an arbitrary stride. To account for these
possibilities, 'fftw_plan_many_dft' adds the new parameters 'howmany',
{'i','o'}'nembed', {'i','o'}'stride', and {'i','o'}'dist'. The FFTW
basic interface (*note Complex DFTs::) provides routines specialized for
ranks 1, 2, and 3, but the advanced interface handles only the
general-rank case.

   'howmany' is the (nonnegative) number of transforms to compute. The
resulting plan computes 'howmany' transforms, where the input of the
'k'-th transform is at location 'in+k*idist' (in C pointer arithmetic),
and its output is at location 'out+k*odist'. Plans obtained in this way
can often be faster than calling FFTW multiple times for the individual
transforms. The basic 'fftw_plan_dft' interface corresponds to
'howmany=1' (in which case the 'dist' parameters are ignored).

   Each of the 'howmany' transforms has rank 'rank' and size 'n', as in
the basic interface.
In addition, the advanced interface allows the
input and output arrays of each transform to be row-major subarrays of
larger rank-'rank' arrays, described by 'inembed' and 'onembed'
parameters, respectively. {'i','o'}'nembed' must be arrays of length
'rank', and 'n' should be elementwise less than or equal to
{'i','o'}'nembed'. Passing 'NULL' for an 'nembed' parameter is
equivalent to passing 'n' (i.e. same physical and logical dimensions,
as in the basic interface).

   The 'stride' parameters indicate that the 'j'-th element of the input
or output arrays is located at 'j*istride' or 'j*ostride', respectively.
(For a multi-dimensional array, 'j' is the ordinary row-major index.)
When combined with the 'k'-th transform in a 'howmany' loop, from above,
this means that the ('j','k')-th element is at 'j*stride+k*dist'. (The
basic 'fftw_plan_dft' interface corresponds to a stride of 1.)

   For in-place transforms, the input and output 'stride' and 'dist'
parameters should be the same; otherwise, the planner may return 'NULL'.

   Arrays 'n', 'inembed', and 'onembed' are not used after this function
returns. You can safely free or reuse them.
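   The addressing rule 'j*stride+k*dist' described above can be sketched
as a one-line helper. 'many_offset' is a hypothetical name, not an FFTW
function; it returns the offset, in array elements from the base
pointer, of the 'j'-th element of the 'k'-th transform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset (in elements, not bytes) of the j-th
   element of the k-th transform, i.e. j*stride + k*dist. */
static ptrdiff_t many_offset(int j, int k, int stride, int dist)
{
    assert(j >= 0 && k >= 0);
    return (ptrdiff_t)j * stride + (ptrdiff_t)k * dist;
}
```

   To transform the columns of a row-major array with 10 rows and 3
columns, one would take stride 3 and dist 1: element 'j = 2' of column
'k = 1' is then at offset 'many_offset(2, 1, 3, 1)', which is 7, the
same as the row-major index 2*3 + 1. For contiguous 5 by 6 arrays stored
one after another (stride 1, dist 30), element 4 of the third array
('k = 2') is at offset 64.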

   *Examples*: One transform of one 5 by 6 array contiguous in memory:
        int rank = 2;
        int n[] = {5, 6};
        int howmany = 1;
        int idist = 0, odist = 0;     /* unused because howmany = 1 */
        int istride = 1, ostride = 1; /* array is contiguous in memory */
        int *inembed = n, *onembed = n;

   Transform of three 5 by 6 arrays, each contiguous in memory, stored
in memory one after another:
        int rank = 2;
        int n[] = {5, 6};
        int howmany = 3;
        int idist = n[0]*n[1];        /* = 30, the distance in memory
                                         between the first element
                                         of the first array and the
                                         first element of the second array */
        int odist = idist;
        int istride = 1, ostride = 1; /* array is contiguous in memory */
        int *inembed = n, *onembed = n;

   Transform each column of a 2d array with 10 rows and 3 columns:
        int rank = 1;                 /* not 2: we are computing 1d transforms */
        int n[] = {10};               /* 1d transforms of length 10 */
        int howmany = 3;
        int idist = 1, odist = 1;
        int istride = 3, ostride = 3; /* distance between two elements in
                                         the same column */
        int *inembed = n, *onembed = n;


File: fftw3.info, Node: Advanced Real-data DFTs, Next: Advanced Real-to-real Transforms, Prev: Advanced Complex DFTs, Up: Advanced Interface

4.4.2 Advanced Real-data DFTs
-----------------------------

     fftw_plan fftw_plan_many_dft_r2c(int rank, const int *n, int howmany,
                                      double *in, const int *inembed,
                                      int istride, int idist,
                                      fftw_complex *out, const int *onembed,
                                      int ostride, int odist,
                                      unsigned flags);
     fftw_plan fftw_plan_many_dft_c2r(int rank, const int *n, int howmany,
                                      fftw_complex *in, const int *inembed,
                                      int istride, int idist,
                                      double *out, const int *onembed,
                                      int ostride, int odist,
                                      unsigned flags);

   Like 'fftw_plan_many_dft', these two functions add 'howmany',
'nembed', 'stride', and 'dist' parameters to the 'fftw_plan_dft_r2c' and
'fftw_plan_dft_c2r' functions, but otherwise behave the same as the
basic interface.

   The interpretation of 'howmany', 'stride', and 'dist' is the same as
for 'fftw_plan_many_dft', above. Note that the 'stride' and 'dist' for
the real array are in units of 'double', and for the complex array are
in units of 'fftw_complex'.

   If an 'nembed' parameter is 'NULL', it is interpreted as what it
would be in the basic interface, as described in *note Real-data DFT
Array Format::. That is, for the complex array the size is assumed to
be the same as 'n', but with the last dimension cut roughly in half.
For the real array, the size is assumed to be 'n' if the transform is
out-of-place, or 'n' with the last dimension "padded" if the transform
is in-place.

   If an 'nembed' parameter is non-'NULL', it is interpreted as the
physical size of the corresponding array, in row-major order, just as
for 'fftw_plan_many_dft'. In this case, each dimension of 'nembed'
should be '>=' what it would be in the basic interface (e.g. the halved
or padded 'n').

   Arrays 'n', 'inembed', and 'onembed' are not used after this function
returns. You can safely free or reuse them.


File: fftw3.info, Node: Advanced Real-to-real Transforms, Prev: Advanced Real-data DFTs, Up: Advanced Interface

4.4.3 Advanced Real-to-real Transforms
--------------------------------------

     fftw_plan fftw_plan_many_r2r(int rank, const int *n, int howmany,
                                  double *in, const int *inembed,
                                  int istride, int idist,
                                  double *out, const int *onembed,
                                  int ostride, int odist,
                                  const fftw_r2r_kind *kind, unsigned flags);

   Like 'fftw_plan_many_dft', this function adds 'howmany', 'nembed',
'stride', and 'dist' parameters to the 'fftw_plan_r2r' function, but
otherwise behaves the same as the basic interface.
The interpretation of
those additional parameters is the same as for 'fftw_plan_many_dft'.
(Of course, the 'stride' and 'dist' parameters are now in units of
'double', not 'fftw_complex'.)

   Arrays 'n', 'inembed', 'onembed', and 'kind' are not used after this
function returns. You can safely free or reuse them.


File: fftw3.info, Node: Guru Interface, Next: New-array Execute Functions, Prev: Advanced Interface, Up: FFTW Reference

4.5 Guru Interface
==================

The "guru" interface to FFTW is intended to expose as much as possible
of the flexibility in the underlying FFTW architecture. It allows one
to compute multi-dimensional "vectors" (loops) of multi-dimensional
transforms, where each vector/transform dimension has an independent
size and stride. One can also use more general complex-number formats,
e.g. separate real and imaginary arrays.

   For those users who require the flexibility of the guru interface, it
is important that they pay special attention to the documentation lest
they shoot themselves in the foot.

* Menu:

* Interleaved and split arrays::
* Guru vector and transform sizes::
* Guru Complex DFTs::
* Guru Real-data DFTs::
* Guru Real-to-real Transforms::
* 64-bit Guru Interface::


File: fftw3.info, Node: Interleaved and split arrays, Next: Guru vector and transform sizes, Prev: Guru Interface, Up: Guru Interface

4.5.1 Interleaved and split arrays
----------------------------------

The guru interface supports two representations of complex numbers,
which we call the interleaved and the split format.

   The "interleaved" format is the same one used by the basic and
advanced interfaces, and it is documented in *note Complex numbers::.
In the interleaved format, you provide pointers to the real part of a
complex number, and the imaginary part is understood to be stored in the
next memory location.

   The "split" format allows separate pointers to the real and imaginary
parts of a complex array.

   Technically, the interleaved format is redundant, because you can
always express an interleaved array in terms of a split array with
appropriate pointers and strides. On the other hand, the interleaved
format is simpler to use, and it is common in practice. Hence, FFTW
supports it as a special case.


File: fftw3.info, Node: Guru vector and transform sizes, Next: Guru Complex DFTs, Prev: Interleaved and split arrays, Up: Guru Interface

4.5.2 Guru vector and transform sizes
-------------------------------------

The guru interface introduces one basic new data structure,
'fftw_iodim', that is used to specify sizes and strides for
multi-dimensional transforms and vectors:

     typedef struct {
          int n;
          int is;
          int os;
     } fftw_iodim;

   Here, 'n' is the size of the dimension, and 'is' and 'os' are the
strides of that dimension for the input and output arrays. (The stride
is the separation of consecutive elements along this dimension.)

   The meaning of the stride parameter depends on the type of the array
that the stride refers to. _If the array is interleaved complex,
strides are expressed in units of complex numbers ('fftw_complex'). If
the array is split complex or real, strides are expressed in units of
real numbers ('double')._ This convention is consistent with the usual
pointer arithmetic in the C language. An interleaved array is denoted
by a pointer 'p' to 'fftw_complex', so that 'p+1' points to the next
complex number. Split arrays are denoted by pointers to 'double', in
which case pointer arithmetic operates in units of 'sizeof(double)'.
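
The difference in stride units can be checked with a few lines of plain C. The sketch below assumes the default layout of 'fftw_complex' (a 'double[2]' holding the real part followed by the imaginary part) and declares a local stand-in type, 'complex_pair', so that it compiles without 'fftw3.h'. Both 'complex_pair' and 'views_agree' are hypothetical names.

```c
#include <stddef.h>

/* Stand-in for the default fftw_complex layout: real part, then imaginary
   part.  Declared locally so the sketch does not need fftw3.h. */
typedef double complex_pair[2];

/* Returns 1 if walking a contiguous interleaved array with stride 1 (in
   units of complex numbers) visits the same values as walking it as a
   split array with stride 2 (in units of double). */
static int views_agree(void)
{
    complex_pair a[3] = {{1.0, -1.0}, {2.0, -2.0}, {3.0, -3.0}};
    const double *re = (const double *) a;      /* first real part      */
    const double *im = (const double *) a + 1;  /* first imaginary part */

    for (ptrdiff_t k = 0; k < 3; ++k) {
        if (a[k * 1][0] != re[k * 2] || a[k * 1][1] != im[k * 2])
            return 0;
    }
    return 1;
}
```

The same element is reached with stride 1 through the 'complex_pair' pointer and with stride 2 through the 'double' pointers, which is why the stride must always be stated in the units of the pointer type that is passed to the planner.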

   The guru planner interfaces all take a ('rank', 'dims[rank]') pair
describing the transform size, and a ('howmany_rank',
'howmany_dims[howmany_rank]') pair describing the "vector" size (a
multi-dimensional loop of transforms to perform), where 'dims' and
'howmany_dims' are arrays of 'fftw_iodim'. Each 'n' field must be
positive for 'dims' and nonnegative for 'howmany_dims', while both
'rank' and 'howmany_rank' must be nonnegative.

   For example, the 'howmany' parameter in the advanced complex-DFT
interface corresponds to 'howmany_rank' = 1, 'howmany_dims[0].n' =
'howmany', 'howmany_dims[0].is' = 'idist', and 'howmany_dims[0].os' =
'odist'. (To compute a single transform, you can just use
'howmany_rank' = 0.)

   A row-major multidimensional array with dimensions 'n[rank]' (*note
Row-major Format::) corresponds to 'dims[i].n' = 'n[i]' and the
recurrence 'dims[i].is' = 'n[i+1] * dims[i+1].is' (similarly for 'os').
The stride of the last ('i=rank-1') dimension is the overall stride of
the array. For example, to be equivalent to the advanced complex-DFT
interface, you would have 'dims[rank-1].is' = 'istride' and
'dims[rank-1].os' = 'ostride'.

   In general, we only guarantee FFTW to return a non-'NULL' plan if the
vector and transform dimensions correspond to a set of distinct indices,
and for in-place transforms the input/output strides should be the same.
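
The row-major recurrence above can be sketched as a loop that fills the dimensions from last to first. The struct 'iodim' below is a local stand-in with the same three fields as 'fftw_iodim', so that the example compiles without 'fftw3.h'; the helper name 'fill_row_major' is hypothetical.

```c
/* Local stand-in with the same fields as fftw_iodim. */
typedef struct {
    int n;   /* size of the dimension */
    int is;  /* input stride */
    int os;  /* output stride */
} iodim;

/* Hypothetical helper: fill dims[0..rank-1] for a row-major array of shape
   n[0..rank-1] whose last dimension has strides istride and ostride. */
static void fill_row_major(int rank, const int *n, int istride, int ostride,
                           iodim *dims)
{
    for (int i = rank - 1; i >= 0; --i) {
        dims[i].n = n[i];
        if (i == rank - 1) {
            /* the last dimension carries the overall stride of the array */
            dims[i].is = istride;
            dims[i].os = ostride;
        } else {
            /* dims[i].is = n[i+1] * dims[i+1].is, and likewise for os */
            dims[i].is = n[i + 1] * dims[i + 1].is;
            dims[i].os = n[i + 1] * dims[i + 1].os;
        }
    }
}
```

For a contiguous 4 by 5 by 6 array (overall stride 1), this yields input strides 30, 6, and 1 for the three dimensions.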


File: fftw3.info, Node: Guru Complex DFTs, Next: Guru Real-data DFTs, Prev: Guru vector and transform sizes, Up: Guru Interface

4.5.3 Guru Complex DFTs
-----------------------

     fftw_plan fftw_plan_guru_dft(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          fftw_complex *in, fftw_complex *out,
          int sign, unsigned flags);

     fftw_plan fftw_plan_guru_split_dft(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          double *ri, double *ii, double *ro, double *io,
          unsigned flags);

   These two functions plan a complex-data, multi-dimensional DFT for
the interleaved and split format, respectively. Transform dimensions
are given by ('rank', 'dims') over a multi-dimensional vector (loop) of
dimensions ('howmany_rank', 'howmany_dims'). 'dims' and 'howmany_dims'
should point to 'fftw_iodim' arrays of length 'rank' and 'howmany_rank',
respectively.

   'flags' is a bitwise OR ('|') of zero or more planner flags, as
defined in *note Planner Flags::.

   In the 'fftw_plan_guru_dft' function, the pointers 'in' and 'out'
point to the interleaved input and output arrays, respectively. The
sign can be either -1 (= 'FFTW_FORWARD') or +1 (= 'FFTW_BACKWARD'). If
the pointers are equal, the transform is in-place.

   In the 'fftw_plan_guru_split_dft' function, 'ri' and 'ii' point to
the real and imaginary input arrays, and 'ro' and 'io' point to the real
and imaginary output arrays. The input and output pointers may be the
same, indicating an in-place transform.
For example, for 'fftw_complex'
pointers 'in' and 'out', the corresponding parameters are:

     ri = (double *) in;
     ii = (double *) in + 1;
     ro = (double *) out;
     io = (double *) out + 1;

   Because 'fftw_plan_guru_split_dft' accepts split arrays, strides are
expressed in units of 'double'. For a contiguous 'fftw_complex' array,
the overall stride of the transform should be 2, the distance between
consecutive real parts or between consecutive imaginary parts; see *note
Guru vector and transform sizes::. Note that the dimension strides are
applied equally to the real and imaginary parts; real and imaginary
arrays with different strides are not supported.

   There is no 'sign' parameter in 'fftw_plan_guru_split_dft'. This
function always plans for an 'FFTW_FORWARD' transform. To plan for an
'FFTW_BACKWARD' transform, you can exploit the identity that the
backwards DFT is equal to the forwards DFT with the real and imaginary
parts swapped.
For example, in the case of the 'fftw_complex' arrays
above, the 'FFTW_BACKWARD' transform is computed by the parameters:

     ri = (double *) in + 1;
     ii = (double *) in;
     ro = (double *) out + 1;
     io = (double *) out;


File: fftw3.info, Node: Guru Real-data DFTs, Next: Guru Real-to-real Transforms, Prev: Guru Complex DFTs, Up: Guru Interface

4.5.4 Guru Real-data DFTs
-------------------------

     fftw_plan fftw_plan_guru_dft_r2c(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          double *in, fftw_complex *out,
          unsigned flags);

     fftw_plan fftw_plan_guru_split_dft_r2c(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          double *in, double *ro, double *io,
          unsigned flags);

     fftw_plan fftw_plan_guru_dft_c2r(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          fftw_complex *in, double *out,
          unsigned flags);

     fftw_plan fftw_plan_guru_split_dft_c2r(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          double *ri, double *ii, double *out,
          unsigned flags);

   Plan a real-input (r2c) or real-output (c2r), multi-dimensional DFT
with transform dimensions given by ('rank', 'dims') over a
multi-dimensional vector (loop) of dimensions ('howmany_rank',
'howmany_dims'). 'dims' and 'howmany_dims' should point to 'fftw_iodim'
arrays of length 'rank' and 'howmany_rank', respectively. As for the
basic and advanced interfaces, an r2c transform is 'FFTW_FORWARD' and a
c2r transform is 'FFTW_BACKWARD'.

   The _last_ dimension of 'dims' is interpreted specially: that
dimension of the real array has size 'dims[rank-1].n', but that
dimension of the complex array has size 'dims[rank-1].n/2+1' (division
rounded down).
The strides, on the other hand, are taken to be exactly
as specified. It is up to the user to specify the strides appropriately
for the peculiar dimensions of the data, and we do not guarantee that
the planner will succeed (return non-'NULL') for any dimensions other
than those described in *note Real-data DFT Array Format:: and
generalized in *note Advanced Real-data DFTs::. (That is, for an
in-place transform, each individual dimension should be able to operate
in place.)

   'in' and 'out' point to the input and output arrays for r2c and c2r
transforms, respectively. For split arrays, 'ri' and 'ii' point to the
real and imaginary input arrays for a c2r transform, and 'ro' and 'io'
point to the real and imaginary output arrays for an r2c transform.
'in' and 'ro' or 'ri' and 'out' may be the same, indicating an in-place
transform. (In-place transforms where 'in' and 'io' or 'ii' and 'out'
are the same are not currently supported.)

   'flags' is a bitwise OR ('|') of zero or more planner flags, as
defined in *note Planner Flags::.

   In-place transforms of rank greater than 1 are currently only
supported for interleaved arrays. For split arrays, the planner will
return 'NULL'.


File: fftw3.info, Node: Guru Real-to-real Transforms, Next: 64-bit Guru Interface, Prev: Guru Real-data DFTs, Up: Guru Interface

4.5.5 Guru Real-to-real Transforms
----------------------------------

     fftw_plan fftw_plan_guru_r2r(int rank, const fftw_iodim *dims,
                                  int howmany_rank,
                                  const fftw_iodim *howmany_dims,
                                  double *in, double *out,
                                  const fftw_r2r_kind *kind,
                                  unsigned flags);

   Plan a real-to-real (r2r) multi-dimensional 'FFTW_FORWARD' transform
with transform dimensions given by ('rank', 'dims') over a
multi-dimensional vector (loop) of dimensions ('howmany_rank',
'howmany_dims').
'dims' and 'howmany_dims' should point to 'fftw_iodim'
arrays of length 'rank' and 'howmany_rank', respectively.

   The transform kind of each dimension is given by the 'kind'
parameter, which should point to an array of length 'rank'. Valid
'fftw_r2r_kind' constants are given in *note Real-to-Real Transform
Kinds::.

   'in' and 'out' point to the real input and output arrays; they may be
the same, indicating an in-place transform.

   'flags' is a bitwise OR ('|') of zero or more planner flags, as
defined in *note Planner Flags::.


File: fftw3.info, Node: 64-bit Guru Interface, Prev: Guru Real-to-real Transforms, Up: Guru Interface

4.5.6 64-bit Guru Interface
---------------------------

When compiled in 64-bit mode on a 64-bit architecture (where addresses
are 64 bits wide), FFTW uses 64-bit quantities internally for all
transform sizes, strides, and so on--you don't have to do anything
special to exploit this. However, in the ordinary FFTW interfaces, you
specify the transform size by an 'int' quantity, which is normally only
32 bits wide. This means that, even though FFTW is using 64-bit sizes
internally, you cannot specify a single transform dimension larger than
2^31-1 numbers.

   We expect that few users will require transforms larger than this,
but, for those who do, we provide a 64-bit version of the guru interface
in which all sizes are specified as integers of type 'ptrdiff_t' instead
of 'int'. ('ptrdiff_t' is a signed integer type defined by the C
standard to be wide enough to represent address differences, and thus
must be at least 64 bits wide on a 64-bit machine.) We stress that
there is _no performance advantage_ to using this interface--the same
internal FFTW code is employed regardless--and it is only necessary if
you want to specify very large transform sizes.

   In particular, the 64-bit guru interface is a set of planner routines
that are exactly the same as the guru planner routines, except that they
are named with 'guru64' instead of 'guru' and they take arguments of
type 'fftw_iodim64' instead of 'fftw_iodim'. For example, instead of
'fftw_plan_guru_dft', we have 'fftw_plan_guru64_dft'.

     fftw_plan fftw_plan_guru64_dft(
          int rank, const fftw_iodim64 *dims,
          int howmany_rank, const fftw_iodim64 *howmany_dims,
          fftw_complex *in, fftw_complex *out,
          int sign, unsigned flags);

   The 'fftw_iodim64' type is similar to 'fftw_iodim', with the same
interpretation, except that it uses type 'ptrdiff_t' instead of type
'int'.

     typedef struct {
          ptrdiff_t n;
          ptrdiff_t is;
          ptrdiff_t os;
     } fftw_iodim64;

   Every other 'fftw_plan_guru' function also has a 'fftw_plan_guru64'
equivalent, but we do not repeat their documentation here since they are
identical to the 32-bit versions except as noted above.


File: fftw3.info, Node: New-array Execute Functions, Next: Wisdom, Prev: Guru Interface, Up: FFTW Reference

4.6 New-array Execute Functions
===============================

Normally, one executes a plan for the arrays with which the plan was
created, by calling 'fftw_execute(plan)' as described in *note Using
Plans::. However, it is possible for sophisticated users to apply a
given plan to a _different_ array using the "new-array execute"
functions detailed below, provided that the following conditions are
met:

   * The array size, strides, etcetera are the same (since those are set
     by the plan).

   * The input and output arrays are the same (in-place) or different
     (out-of-place) if the plan was originally created to be in-place or
     out-of-place, respectively.

   * For split arrays, the separations between the real and imaginary
     parts, 'ii-ri' and 'io-ro', are the same as they were for the input
     and output arrays when the plan was created. (This condition is
     automatically satisfied for interleaved arrays.)

   * The "alignment" of the new input/output arrays is the same as that
     of the input/output arrays when the plan was created, unless the
     plan was created with the 'FFTW_UNALIGNED' flag. Here, the
     alignment is a platform-dependent quantity (for example, it is the
     address modulo 16 if SSE SIMD instructions are used, but the
     address modulo 4 for non-SIMD single-precision FFTW on the same
     machine). In general, only arrays allocated with 'fftw_malloc' are
     guaranteed to be equally aligned (*note SIMD alignment and
     fftw_malloc::).

   The alignment issue is especially critical, because if you don't use
'fftw_malloc' then you may have little control over the alignment of
arrays in memory. For example, neither the C++ 'new' function nor the
Fortran 'allocate' statement provides strong enough guarantees about data
alignment. If you don't use 'fftw_malloc', therefore, you probably have
to use 'FFTW_UNALIGNED' (which disables most SIMD support). If
possible, it is probably better for you to simply create multiple plans
(creating a new plan is quick once one exists for a given size), or
better yet re-use the same array for your transforms.
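
To make the "address modulo 16" case concrete, the following sketch compares two pointers by that measure. It only illustrates the SSE example given in the list above and is not how FFTW decides alignment; the helper names are hypothetical, and real code should rely on 'fftw_malloc' or on the 'fftw_alignment_of' function documented later in this section.

```c
#include <stdint.h>

/* Illustration only: the address of p modulo 16, i.e. the alignment
   quantity quoted above for the SSE case.  Not FFTW's implementation. */
static unsigned address_mod16(const void *p)
{
    return (unsigned) ((uintptr_t) p % 16u);
}

/* Two pointers are "equally aligned" in this simplified sense if their
   addresses agree modulo 16. */
static int same_alignment_mod16(const void *a, const void *b)
{
    return address_mod16(a) == address_mod16(b);
}
```

Under this measure, and assuming an 8-byte 'double', a pointer to the second element of a 16-byte-aligned 'double' array is not aligned like the array itself, whereas a pointer to the third element is.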

   For rare circumstances in which you cannot control the alignment of
allocated memory, but wish to determine whether a given array is aligned
like the original array for which a plan was created, you can use the
'fftw_alignment_of' function:
     int fftw_alignment_of(double *p);
   Two arrays have equivalent alignment (for the purposes of applying a
plan) if and only if 'fftw_alignment_of' returns the same value for the
corresponding pointers to their data (typecast to 'double*' if
necessary).

   If you are tempted to use the new-array execute interface because you
want to transform a known bunch of arrays of the same size, you should
probably use the advanced interface instead (*note Advanced
Interface::).

   The new-array execute functions are:

     void fftw_execute_dft(
          const fftw_plan p,
          fftw_complex *in, fftw_complex *out);

     void fftw_execute_split_dft(
          const fftw_plan p,
          double *ri, double *ii, double *ro, double *io);

     void fftw_execute_dft_r2c(
          const fftw_plan p,
          double *in, fftw_complex *out);

     void fftw_execute_split_dft_r2c(
          const fftw_plan p,
          double *in, double *ro, double *io);

     void fftw_execute_dft_c2r(
          const fftw_plan p,
          fftw_complex *in, double *out);

     void fftw_execute_split_dft_c2r(
          const fftw_plan p,
          double *ri, double *ii, double *out);

     void fftw_execute_r2r(
          const fftw_plan p,
          double *in, double *out);

   These execute the 'plan' to compute the corresponding transform on
the input/output arrays specified by the subsequent arguments. The
input/output array arguments have the same meanings as the ones passed
to the guru planner routines in the preceding sections. The 'plan' is
not modified, and these routines can be called as many times as desired,
or intermixed with calls to the ordinary 'fftw_execute'.

   The 'plan' _must_ have been created for the transform type
corresponding to the execute function, e.g. it must be a complex-DFT
plan for 'fftw_execute_dft'. Any of the planner routines for that
transform type, from the basic to the guru interface, could have been
used to create the plan, however.


File: fftw3.info, Node: Wisdom, Next: What FFTW Really Computes, Prev: New-array Execute Functions, Up: FFTW Reference

4.7 Wisdom
==========

This section documents the FFTW mechanism for saving and restoring plans
from disk. This mechanism is called "wisdom".

* Menu:

* Wisdom Export::
* Wisdom Import::
* Forgetting Wisdom::
* Wisdom Utilities::


File: fftw3.info, Node: Wisdom Export, Next: Wisdom Import, Prev: Wisdom, Up: Wisdom

4.7.1 Wisdom Export
-------------------

     int fftw_export_wisdom_to_filename(const char *filename);
     void fftw_export_wisdom_to_file(FILE *output_file);
     char *fftw_export_wisdom_to_string(void);
     void fftw_export_wisdom(void (*write_char)(char c, void *), void *data);

   These functions allow you to export all currently accumulated wisdom
in a form from which it can be later imported and restored, even during
a separate run of the program. (*Note Words of Wisdom-Saving Plans::.)
The current store of wisdom is not affected by calling any of these
routines.

   'fftw_export_wisdom' exports the wisdom to any output medium, as
specified by the callback function 'write_char'. 'write_char' is a
'putc'-like function that writes the character 'c' to some output; its
second parameter is the 'data' pointer passed to 'fftw_export_wisdom'.
For convenience, the following three "wrapper" routines are provided:

   'fftw_export_wisdom_to_filename' writes wisdom to a file named
'filename' (which is created or overwritten), returning '1' on success
and '0' on failure. A lower-level function, which requires you to open
and close the file yourself (e.g. if you want to write wisdom to a
portion of a larger file) is 'fftw_export_wisdom_to_file'. This writes
the wisdom to the current position in 'output_file', which should be
open with write permission; upon exit, the file remains open and is
positioned at the end of the wisdom data.

   'fftw_export_wisdom_to_string' returns a pointer to a
null-terminated string holding the wisdom data. This string is
dynamically allocated, and it is the responsibility of the caller to
deallocate it with 'free' when it is no longer needed.

   All of these routines export the wisdom in the same format, which we
will not document here except to say that it is LISP-like ASCII text
that is insensitive to white space.


File: fftw3.info, Node: Wisdom Import, Next: Forgetting Wisdom, Prev: Wisdom Export, Up: Wisdom

4.7.2 Wisdom Import
-------------------

     int fftw_import_system_wisdom(void);
     int fftw_import_wisdom_from_filename(const char *filename);
     int fftw_import_wisdom_from_file(FILE *input_file);
     int fftw_import_wisdom_from_string(const char *input_string);
     int fftw_import_wisdom(int (*read_char)(void *), void *data);

   These functions import wisdom into a program from data stored by the
'fftw_export_wisdom' functions above. (*Note Words of Wisdom-Saving
Plans::.) The imported wisdom replaces any wisdom already accumulated
by the running program.

   'fftw_import_wisdom' imports wisdom from any input medium, as
specified by the callback function 'read_char'.
'read_char' is a
'getc'-like function that returns the next character in the input; its
parameter is the 'data' pointer passed to 'fftw_import_wisdom'. If the
end of the input data is reached (which should never happen for valid
data), 'read_char' should return 'EOF' (as defined in '<stdio.h>'). For
convenience, the following "wrapper" routines are provided:

   'fftw_import_wisdom_from_filename' reads wisdom from a file named
'filename'. A lower-level function, which requires you to open and
close the file yourself (e.g. if you want to read wisdom from a portion
of a larger file) is 'fftw_import_wisdom_from_file'. This reads wisdom
from the current position in 'input_file' (which should be open with
read permission); upon exit, the file remains open, but the position of
the read pointer is unspecified.

   'fftw_import_wisdom_from_string' reads wisdom from the
null-terminated string 'input_string'.

   'fftw_import_system_wisdom' reads wisdom from an
implementation-defined standard file ('/etc/fftw/wisdom' on Unix and GNU
systems).

   The return value of these import routines is '1' if the wisdom was
read successfully and '0' otherwise. Note that, in all of these
functions, any data in the input stream past the end of the wisdom data
is simply ignored.


File: fftw3.info, Node: Forgetting Wisdom, Next: Wisdom Utilities, Prev: Wisdom Import, Up: Wisdom

4.7.3 Forgetting Wisdom
-----------------------

     void fftw_forget_wisdom(void);

   Calling 'fftw_forget_wisdom' causes all accumulated 'wisdom' to be
discarded and its associated memory to be freed. (New 'wisdom' can
still be gathered subsequently, however.)
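
The callback-based routines 'fftw_export_wisdom' and 'fftw_import_wisdom' described in the two preceding sections accept any functions with the 'write_char' and 'read_char' signatures. As a minimal sketch, the callbacks below write to and read from memory buffers. The types 'wisdom_sink' and 'wisdom_source' and both function names are hypothetical, and the fixed-capacity buffer is an assumption made to keep the example short.

```c
#include <stdio.h>   /* EOF */
#include <stddef.h>

/* Hypothetical fixed-capacity destination for exported wisdom. */
typedef struct {
    char *buf;
    size_t len;
    size_t cap;
} wisdom_sink;

/* putc-like callback matching void (*write_char)(char c, void *).
   Characters beyond the capacity are dropped; a real program would grow
   the buffer or record an error instead. */
static void sink_write_char(char c, void *data)
{
    wisdom_sink *s = (wisdom_sink *) data;
    if (s->len + 1 < s->cap) {          /* keep room for the terminator */
        s->buf[s->len++] = c;
        s->buf[s->len] = '\0';
    }
}

/* Hypothetical source that reads from a null-terminated string. */
typedef struct {
    const char *text;
    size_t pos;
} wisdom_source;

/* getc-like callback matching int (*read_char)(void *): returns the next
   character, or EOF once the end of the string is reached. */
static int source_read_char(void *data)
{
    wisdom_source *s = (wisdom_source *) data;
    unsigned char c = (unsigned char) s->text[s->pos];
    if (c == '\0')
        return EOF;
    s->pos++;
    return c;
}
```

With 'fftw3.h' included, these would be passed as 'fftw_export_wisdom(sink_write_char, &sink)' and 'fftw_import_wisdom(source_read_char, &source)', where 'sink' and 'source' are instances of the two structures.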


File: fftw3.info, Node: Wisdom Utilities, Prev: Forgetting Wisdom, Up: Wisdom

4.7.4 Wisdom Utilities
----------------------

FFTW includes two standalone utility programs that deal with wisdom. We
merely summarize them here, since they come with their own 'man' pages
for Unix and GNU systems (with HTML versions on our web site).

   The first program is 'fftw-wisdom' (or 'fftwf-wisdom' in single
precision, etcetera), which can be used to create a wisdom file
containing plans for any of the transform sizes and types supported by
FFTW. It is preferable to create wisdom directly from your executable
(*note Caveats in Using Wisdom::), but this program is useful for
creating global wisdom files for 'fftw_import_system_wisdom'.

   The second program is 'fftw-wisdom-to-conf', which takes a wisdom
file as input and produces a "configuration routine" as output. The
latter is a C subroutine that you can compile and link into your
program, replacing a routine of the same name in the FFTW library, that
determines which parts of FFTW are callable by your program.
'fftw-wisdom-to-conf' produces a configuration routine that links to
only those parts of FFTW needed by the saved plans in the wisdom,
greatly reducing the size of statically linked executables (which should
only attempt to create plans corresponding to those in the wisdom,
however).


File: fftw3.info, Node: What FFTW Really Computes, Prev: Wisdom, Up: FFTW Reference

4.8 What FFTW Really Computes
=============================

In this section, we provide precise mathematical definitions for the
transforms that FFTW computes. These transform definitions are fairly
standard, but some authors follow slightly different conventions for the
normalization of the transform (the constant factor in front) and the
sign of the complex exponent.
We begin by presenting the
one-dimensional (1d) transform definitions, and then give the
straightforward extension to multi-dimensional transforms.

* Menu:

* The 1d Discrete Fourier Transform (DFT)::
* The 1d Real-data DFT::
* 1d Real-even DFTs (DCTs)::
* 1d Real-odd DFTs (DSTs)::
* 1d Discrete Hartley Transforms (DHTs)::
* Multi-dimensional Transforms::


File: fftw3.info, Node: The 1d Discrete Fourier Transform (DFT), Next: The 1d Real-data DFT, Prev: What FFTW Really Computes, Up: What FFTW Really Computes

4.8.1 The 1d Discrete Fourier Transform (DFT)
---------------------------------------------

The forward ('FFTW_FORWARD') discrete Fourier transform (DFT) of a 1d
complex array X of size n computes an array Y, where:
     Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
   The backward ('FFTW_BACKWARD') DFT computes:
     Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .

   FFTW computes an unnormalized transform, in that there is no
coefficient in front of the summation in the DFT. In other words,
applying the forward and then the backward transform will multiply the
input by n.

   From above, an 'FFTW_FORWARD' transform corresponds to a sign of -1
in the exponent of the DFT. Note also that we use the standard
"in-order" output ordering--the k-th output corresponds to the frequency
k/n (or k/T, where T is your total sampling period). For those who like
to think in terms of positive and negative frequencies, this means that
the positive frequencies are stored in the first half of the output and
the negative frequencies are stored in backwards order in the second
half of the output. (The frequency -k/n is the same as the frequency
(n-k)/n.)


File: fftw3.info, Node: The 1d Real-data DFT, Next: 1d Real-even DFTs (DCTs), Prev: The 1d Discrete Fourier Transform (DFT), Up: What FFTW Really Computes

4.8.2 The 1d Real-data DFT
--------------------------

The real-input (r2c) DFT in FFTW computes the _forward_ transform Y of
the size 'n' real array X, exactly as defined above, i.e.
     Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
   This output array Y can easily be shown to possess the "Hermitian"
symmetry Y[k] = Y[n-k]*, where we take Y to be periodic so that Y[n] =
Y[0].

   As a result of this symmetry, half of the output Y is redundant
(being the complex conjugate of the other half), and so the 1d r2c
transforms only output elements 0...n/2 of Y (n/2+1 complex numbers),
where the division by 2 is rounded down.

   Moreover, the Hermitian symmetry implies that Y[0] and, if n is even,
the Y[n/2] element, are purely real. So, for the 'R2HC' r2r transform,
the halfcomplex format does not store the imaginary parts of these
elements.

   The c2r and 'HC2R' r2r transforms compute the backward DFT of the
_complex_ array X with Hermitian symmetry, stored in the r2c/'R2HC'
output formats, respectively, where the backward transform is defined
exactly as for the complex case:
     Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
   The outputs 'Y' of this transform can easily be seen to be purely
real, and are stored as an array of real numbers.

   Like FFTW's complex DFT, these transforms are unnormalized. In other
words, applying the real-to-complex (forward) and then the
complex-to-real (backward) transform will multiply the input by n.
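
As a small illustration of the Hermitian symmetry described above, the helpers below give the number of complex outputs stored by a 1d r2c transform and map any frequency index k in 0...n-1 to the stored element that carries it. Both function names are hypothetical and not part of FFTW.

```c
/* Number of complex numbers stored by a 1d r2c transform of n reals:
   elements 0...n/2, with the division rounded down. */
static int r2c_output_length(int n)
{
    return n / 2 + 1;
}

/* For 0 <= k < n, return the index of the stored element that determines
   Y[k].  *conjugate is set to 1 when Y[k] is the complex conjugate of the
   stored element (from Y[k] = Y[n-k]*), and to 0 when it is stored as is. */
static int stored_index(int n, int k, int *conjugate)
{
    if (k <= n / 2) {
        *conjugate = 0;
        return k;
    }
    *conjugate = 1;
    return n - k;
}
```

For n = 8, five complex numbers are stored, and Y[5] is recovered as the conjugate of the stored Y[3].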


File: fftw3.info, Node: 1d Real-even DFTs (DCTs), Next: 1d Real-odd DFTs (DSTs), Prev: The 1d Real-data DFT, Up: What FFTW Really Computes

4.8.3 1d Real-even DFTs (DCTs)
------------------------------

The Real-even symmetry DFTs in FFTW are exactly equivalent to the
unnormalized forward (and backward) DFTs as defined above, where the
input array X of length N is purely real and also has "even" symmetry.
In this case, the output array is likewise real and even-symmetric.

   For the case of 'REDFT00', this even symmetry means that X[j] =
X[N-j], where we take X to be periodic so that X[N] = X[0]. Because of
this redundancy, only the first n real numbers are actually stored,
where N = 2(n-1).

   The proper definition of even symmetry for 'REDFT10', 'REDFT01', and
'REDFT11' transforms is somewhat more intricate because of the shifts by
1/2 of the input and/or output, although the corresponding boundary
conditions are given in *note Real even/odd DFTs (cosine/sine
transforms)::. Because of the even symmetry, however, the sine terms in
the DFT all cancel and the remaining cosine terms are written explicitly
below. This formulation often leads people to call such a transform a
"discrete cosine transform" (DCT), although it is really just a special
case of the DFT.

   In each of the definitions below, we transform a real array X of
length n to a real array Y of length n:

REDFT00 (DCT-I)
...............

An 'REDFT00' transform (type-I DCT) in FFTW is defined by: Y[k] = X[0] +
(-1)^k X[n-1] + 2 (sum for j = 1 to n-2 of X[j] cos(pi jk /(n-1))).
Note that this transform is not defined for n=1. For n=2, the summation
term above is dropped as you might expect.

REDFT10 (DCT-II)
................

An 'REDFT10' transform (type-II DCT, sometimes called "the" DCT) in FFTW
is defined by: Y[k] = 2 (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) k /
n)).

REDFT01 (DCT-III)
.................

An 'REDFT01' transform (type-III DCT) in FFTW is defined by: Y[k] = X[0]
+ 2 (sum for j = 1 to n-1 of X[j] cos(pi j (k+1/2) / n)). In the case
of n=1, this reduces to Y[0] = X[0]. Up to a scale factor (see below),
this is the inverse of 'REDFT10' ("the" DCT), and so the 'REDFT01'
(DCT-III) is sometimes called the "IDCT".

REDFT11 (DCT-IV)
................

An 'REDFT11' transform (type-IV DCT) in FFTW is defined by: Y[k] = 2
(sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) (k+1/2) / n)).

Inverses and Normalization
..........................

These definitions correspond directly to the unnormalized DFTs used
elsewhere in FFTW (hence the factors of 2 in front of the summations).
The unnormalized inverse of 'REDFT00' is 'REDFT00', of 'REDFT10' is
'REDFT01' and vice versa, and of 'REDFT11' is 'REDFT11'. Each
unnormalized inverse results in the original array multiplied by N,
where N is the _logical_ DFT size. For 'REDFT00', N=2(n-1) (note that
n=1 is not defined); otherwise, N=2n.

   In defining the discrete cosine transform, some authors also include
additional factors of sqrt(2) (or its inverse) multiplying selected
inputs and/or outputs. This is a mostly cosmetic change that makes the
transform orthogonal, but sacrifices the direct equivalence to a
symmetric DFT.


File: fftw3.info, Node: 1d Real-odd DFTs (DSTs), Next: 1d Discrete Hartley Transforms (DHTs), Prev: 1d Real-even DFTs (DCTs), Up: What FFTW Really Computes

4.8.4 1d Real-odd DFTs (DSTs)
-----------------------------

The Real-odd symmetry DFTs in FFTW are exactly equivalent to the
unnormalized forward (and backward) DFTs as defined above, where the
input array X of length N is purely real and also has "odd" symmetry. In
this case, the output is odd-symmetric and purely imaginary.

   For the case of 'RODFT00', this odd symmetry means that X[j] =
-X[N-j], where we take X to be periodic so that X[N] = X[0]. Because of
this redundancy, only the first n real numbers starting at j=1 are
actually stored (the j=0 element is zero), where N = 2(n+1).

   The proper definition of odd symmetry for 'RODFT10', 'RODFT01', and
'RODFT11' transforms is somewhat more intricate because of the shifts by
1/2 of the input and/or output, although the corresponding boundary
conditions are given in *note Real even/odd DFTs (cosine/sine
transforms)::. Because of the odd symmetry, however, the cosine terms
in the DFT all cancel and the remaining sine terms are written
explicitly below. This formulation often leads people to call such a
transform a "discrete sine transform" (DST), although it is really just
a special case of the DFT.

   In each of the definitions below, we transform a real array X of
length n to a real array Y of length n:

RODFT00 (DST-I)
...............

An 'RODFT00' transform (type-I DST) in FFTW is defined by: Y[k] = 2 (sum
for j = 0 to n-1 of X[j] sin(pi (j+1)(k+1) / (n+1))).

RODFT10 (DST-II)
................

An 'RODFT10' transform (type-II DST) in FFTW is defined by: Y[k] = 2
(sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1) / n)).

RODFT01 (DST-III)
.................

An 'RODFT01' transform (type-III DST) in FFTW is defined by: Y[k] =
(-1)^k X[n-1] + 2 (sum for j = 0 to n-2 of X[j] sin(pi (j+1) (k+1/2) /
n)). In the case of n=1, this reduces to Y[0] = X[0].

RODFT11 (DST-IV)
................

An 'RODFT11' transform (type-IV DST) in FFTW is defined by: Y[k] = 2
(sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1/2) / n)).

Inverses and Normalization
..........................

These definitions correspond directly to the unnormalized DFTs used
elsewhere in FFTW (hence the factors of 2 in front of the summations).
The unnormalized inverse of 'RODFT00' is 'RODFT00', of 'RODFT10' is
'RODFT01' and vice versa, and of 'RODFT11' is 'RODFT11'. Each
unnormalized inverse results in the original array multiplied by N,
where N is the _logical_ DFT size. For 'RODFT00', N=2(n+1); otherwise,
N=2n.

   In defining the discrete sine transform, some authors also include
additional factors of sqrt(2) (or its inverse) multiplying selected
inputs and/or outputs. This is a mostly cosmetic change that makes the
transform orthogonal, but sacrifices the direct equivalence to an
antisymmetric DFT.


File: fftw3.info, Node: 1d Discrete Hartley Transforms (DHTs), Next: Multi-dimensional Transforms, Prev: 1d Real-odd DFTs (DSTs), Up: What FFTW Really Computes

4.8.5 1d Discrete Hartley Transforms (DHTs)
-------------------------------------------

The discrete Hartley transform (DHT) of a 1d real array X of size n
computes a real array Y of the same size, where:
     Y[k] = sum for j = 0 to (n - 1) of X[j] * [cos(2 pi j k / n) + sin(2 pi j k / n)].

   FFTW computes an unnormalized transform, in that there is no
coefficient in front of the summation in the DHT. In other words,
applying the transform twice (the DHT is its own inverse) will multiply
the input by n.


File: fftw3.info, Node: Multi-dimensional Transforms, Prev: 1d Discrete Hartley Transforms (DHTs), Up: What FFTW Really Computes

4.8.6 Multi-dimensional Transforms
----------------------------------

The multi-dimensional transforms of FFTW, in general, compute simply the
separable product of the given 1d transform along each dimension of the
array. Since each of these transforms is unnormalized, computing the
forward followed by the backward/inverse multi-dimensional transform
will result in the original array scaled by the product of the
normalization factors for each dimension (e.g. the product of the
dimension sizes, for a multi-dimensional DFT).

   The definition of FFTW's multi-dimensional DFT of real data (r2c)
deserves special attention. In this case, we logically compute the full
multi-dimensional DFT of the input data; since the input data are purely
real, the output data have the Hermitian symmetry and therefore only one
non-redundant half need be stored. More specifically, for an n[0] x
n[1] x n[2] x ... x n[d-1] multi-dimensional real-input DFT, the full
(logical) complex output array Y[k[0], k[1], ..., k[d-1]] has the
symmetry: Y[k[0], k[1], ..., k[d-1]] = Y[n[0] - k[0], n[1] - k[1], ...,
n[d-1] - k[d-1]]* (where each dimension is periodic). Because of this
symmetry, we only store the k[d-1] = 0...n[d-1]/2 elements of the _last_
dimension (division by 2 is rounded down). (We could instead have cut
any other dimension in half, but the last dimension proved
computationally convenient.) This results in the peculiar array format
described in more detail by *note Real-data DFT Array Format::.

   The multi-dimensional c2r transform is simply the unnormalized
inverse of the r2c transform, i.e.
it is the same as FFTW's complex
backward multi-dimensional DFT, operating on a Hermitian input array in
the peculiar format mentioned above and outputting a real array (since
the DFT output is purely real).

   We should remind the user that the separable product of 1d transforms
along each dimension, as computed by FFTW, is not always the same thing
as the usual multi-dimensional transform. A multi-dimensional 'R2HC'
(or 'HC2R') transform is not identical to the multi-dimensional DFT,
requiring some post-processing to combine the requisite real and
imaginary parts, as was described in *note The Halfcomplex-format DFT::.
Likewise, FFTW's multidimensional 'FFTW_DHT' r2r transform is not the
same thing as the logical multi-dimensional discrete Hartley transform
defined in the literature, as discussed in *note The Discrete Hartley
Transform::.


File: fftw3.info, Node: Multi-threaded FFTW, Next: Distributed-memory FFTW with MPI, Prev: FFTW Reference, Up: Top

5 Multi-threaded FFTW
*********************

In this chapter we document the parallel FFTW routines for shared-memory
parallel hardware. These routines, which support parallel one- and
multi-dimensional transforms of both real and complex data, are the
easiest way to take advantage of multiple processors with FFTW. They
work just like the corresponding uniprocessor transform routines, except
that you have an extra initialization routine to call, and there is a
routine to set the number of threads to employ. Any program that uses
the uniprocessor FFTW can therefore be trivially modified to use the
multi-threaded FFTW.

   A shared-memory machine is one in which all CPUs can directly access
the same main memory, and such machines are now common due to the
ubiquity of multi-core CPUs.
FFTW's multi-threading support allows you
to utilize these additional CPUs transparently from a single program.
However, this does not necessarily translate into performance
gains--when multiple threads/CPUs are employed, there is an overhead
required for synchronization that may outweigh the computational
parallelism. Therefore, you can only benefit from threads if your
problem is sufficiently large.

* Menu:

* Installation and Supported Hardware/Software::
* Usage of Multi-threaded FFTW::
* How Many Threads to Use?::
* Thread safety::


File: fftw3.info, Node: Installation and Supported Hardware/Software, Next: Usage of Multi-threaded FFTW, Prev: Multi-threaded FFTW, Up: Multi-threaded FFTW

5.1 Installation and Supported Hardware/Software
================================================

All of the FFTW threads code is located in the 'threads' subdirectory of
the FFTW package. On Unix systems, the FFTW threads libraries and
header files can be automatically configured, compiled, and installed
along with the uniprocessor FFTW libraries simply by including
'--enable-threads' in the flags to the 'configure' script (*note
Installation on Unix::), or '--enable-openmp' to use OpenMP
(http://www.openmp.org) threads.

   The threads routines require your operating system to have some sort
of shared-memory threads support. Specifically, the FFTW threads
package works with POSIX threads (available on most Unix variants, from
GNU/Linux to MacOS X) and Win32 threads. OpenMP threads, which are
supported in many common compilers (e.g. gcc) are also supported, and
may give better performance on some systems. (OpenMP threads are also
useful if you are employing OpenMP in your own code, in order to
minimize conflicts between threading models.)
If you have a
shared-memory machine that uses a different threads API, it should be a
simple matter of programming to include support for it; see the file
'threads/threads.c' for more detail.

   You can compile FFTW with _both_ '--enable-threads' and
'--enable-openmp' at the same time, since they install libraries with
different names ('fftw3_threads' and 'fftw3_omp', as described below).
However, your programs may only link to _one_ of these two libraries at
a time.

   Ideally, of course, you should also have multiple processors in order
to get any benefit from the threaded transforms.


File: fftw3.info, Node: Usage of Multi-threaded FFTW, Next: How Many Threads to Use?, Prev: Installation and Supported Hardware/Software, Up: Multi-threaded FFTW

5.2 Usage of Multi-threaded FFTW
================================

Here, it is assumed that the reader is already familiar with the usage
of the uniprocessor FFTW routines, described elsewhere in this manual.
We only describe what one has to change in order to use the
multi-threaded routines.

   First, programs using the parallel complex transforms should be
linked with '-lfftw3_threads -lfftw3 -lm' on Unix, or '-lfftw3_omp
-lfftw3 -lm' if you compiled with OpenMP. You will also need to link
with whatever library is responsible for threads on your system (e.g.
'-lpthread' on GNU/Linux) or include whatever compiler flag enables
OpenMP (e.g. '-fopenmp' with gcc).

   Second, before calling _any_ FFTW routines, you should call the
function:

     int fftw_init_threads(void);

   This function, which need only be called once, performs any one-time
initialization required to use threads on your system. It returns zero
if there was some error (which should not happen under normal
circumstances) and a non-zero value otherwise.

   Third, before creating a plan that you want to parallelize, you
should call:

     void fftw_plan_with_nthreads(int nthreads);

   The 'nthreads' argument indicates the number of threads you want FFTW
to use (or actually, the maximum number). All plans subsequently
created with any planner routine will use that many threads. You can
call 'fftw_plan_with_nthreads', create some plans, call
'fftw_plan_with_nthreads' again with a different argument, and create
some more plans for a new number of threads. Plans already created
before a call to 'fftw_plan_with_nthreads' are unaffected. If you pass
an 'nthreads' argument of '1' (the default), threads are disabled for
subsequent plans.

   You can determine the current number of threads that the planner can
use by calling:

     int fftw_planner_nthreads(void);

   With OpenMP, to configure FFTW to use all of the currently running
OpenMP threads (set by 'omp_set_num_threads(nthreads)' or by the
'OMP_NUM_THREADS' environment variable), you can do:
'fftw_plan_with_nthreads(omp_get_max_threads())'. (The 'omp_' OpenMP
functions are declared via '#include <omp.h>'.)

   Given a plan, you then execute it as usual with 'fftw_execute(plan)',
and the execution will use the number of threads specified when the plan
was created. When done, you destroy it as usual with
'fftw_destroy_plan'. As described in *note Thread safety::, plan
_execution_ is thread-safe, but plan creation and destruction are _not_:
you should create/destroy plans only from a single thread, but can
safely execute multiple plans in parallel.

   There is one additional routine: if you want to get rid of all memory
and other resources allocated internally by FFTW, you can call:

     void fftw_cleanup_threads(void);

   which is much like the 'fftw_cleanup()' function except that it also
gets rid of threads-related data.
You must _not_ execute any previously
created plans after calling this function.

   We should also mention one other restriction: if you save wisdom from
a program using the multi-threaded FFTW, that wisdom _cannot be used_ by
a program using only the single-threaded FFTW (i.e. not calling
'fftw_init_threads'). *Note Words of Wisdom-Saving Plans::.

   Finally, FFTW provides an optional callback interface that allows you
to replace its parallel threading backend at runtime:

     void fftw_threads_set_callback(
         void (*parallel_loop)(void *(*work)(char *), char *jobdata, size_t elsize, int njobs, void *data),
         void *data);

   This routine (which is _not_ threadsafe and should generally be
called before creating any FFTW plans) allows you to provide a function
'parallel_loop' that executes parallel work for FFTW: it should call the
function 'work(jobdata + elsize*i)' for 'i' from '0' to 'njobs-1',
possibly in parallel. (The 'data' pointer supplied to
'fftw_threads_set_callback' is passed through to your 'parallel_loop'
function.) For example, if you link to an FFTW threads library built to
use POSIX threads, but you want it to use OpenMP instead (because you
are using OpenMP elsewhere in your program and want to avoid competing
threads), you can call 'fftw_threads_set_callback' with the callback
function:

     void parallel_loop(void *(*work)(char *), char *jobdata, size_t elsize, int njobs, void *data)
     {
     #pragma omp parallel for
         for (int i = 0; i < njobs; ++i)
             work(jobdata + elsize * i);
     }

   The same mechanism could be used in order to make FFTW use a
threading backend implemented via Intel TBB, Apple GCD, or Cilk, for
example.


File: fftw3.info, Node: How Many Threads to Use?, Next: Thread safety, Prev: Usage of Multi-threaded FFTW, Up: Multi-threaded FFTW

5.3 How Many Threads to Use?
============================

There is a fair amount of overhead involved in synchronizing threads, so
the optimal number of threads to use depends upon the size of the
transform as well as on the number of processors you have.

   As a general rule, you don't want to use more threads than you have
processors. (Using more threads will work, but there will be extra
overhead with no benefit.) In fact, if the problem size is too small,
you may want to use fewer threads than you have processors.

   You will have to experiment with your system to see what level of
parallelization is best for your problem size. Typically, the problem
will have to involve at least a few thousand data points before threads
become beneficial. If you plan with 'FFTW_PATIENT', it will
automatically disable threads for sizes that don't benefit from
parallelization.


File: fftw3.info, Node: Thread safety, Prev: How Many Threads to Use?, Up: Multi-threaded FFTW

5.4 Thread safety
=================

Users writing multi-threaded programs (including OpenMP) must concern
themselves with the "thread safety" of the libraries they use--that is,
whether it is safe to call routines in parallel from multiple threads.
FFTW can be used in such an environment, but some care must be taken
because the planner routines share data (e.g. wisdom and trigonometric
tables) between calls and plans.

   The upshot is that the only thread-safe routine in FFTW is
'fftw_execute' (and the new-array variants thereof). All other routines
(e.g. the planner) should only be called from one thread at a time.
So, for example, you can wrap a semaphore lock around any calls to the
planner; even more simply, you can just create all of your plans from
one thread.
We do not think this should be an important restriction
(FFTW is designed for the situation where the only performance-sensitive
code is the actual execution of the transform), and the benefits of
shared data between plans are great.

   Note also that, since the plan is not modified by 'fftw_execute', it
is safe to execute the _same plan_ in parallel by multiple threads.
However, since a given plan operates by default on a fixed array, you
need to use one of the new-array execute functions (*note New-array
Execute Functions::) so that different threads compute the transform of
different data.

   (Users should note that these comments only apply to programs using
shared-memory threads or OpenMP. Parallelism using MPI or forked
processes involves a separate address-space and global variables for
each process, and is not susceptible to problems of this sort.)

   The FFTW planner is intended to be called from a single thread. If
you really must call it from multiple threads, you are expected to grab
whatever lock makes sense for your application, with the understanding
that you may be holding that lock for a long time, which is undesirable.

   Neither strategy works, however, in the following situation. The
"application" is structured as a set of "plugins" which are unaware of
each other, and for whatever reason the "plugins" cannot coordinate on
grabbing the lock. (This is not a technical problem, but an
organizational one. The "plugins" are written by independent agents,
and from the perspective of each plugin's author, each plugin is using
FFTW correctly from a single thread.)
To cope with this situation,
starting from FFTW-3.3.5, FFTW supports an API to make the planner
thread-safe:

     void fftw_make_planner_thread_safe(void);

   This call operates by brute force: It just installs a hook that wraps
a lock (chosen by us) around all planner calls. So there is no magic
and you get the worst of all worlds. The planner is still
single-threaded, but you cannot choose which lock to use. The planner
still holds the lock for a long time, but you cannot impose a timeout on
lock acquisition. As of FFTW-3.3.5 and FFTW-3.3.6, this call does not
work when using OpenMP as threading substrate. (Suggestions on what to
do about this bug are welcome.) _Do not use
'fftw_make_planner_thread_safe' unless there is no other choice,_ such
as in the application/plugin situation.


File: fftw3.info, Node: Distributed-memory FFTW with MPI, Next: Calling FFTW from Modern Fortran, Prev: Multi-threaded FFTW, Up: Top

6 Distributed-memory FFTW with MPI
**********************************

In this chapter we document the parallel FFTW routines for parallel
systems supporting the MPI message-passing interface. Unlike the
shared-memory threads described in the previous chapter, MPI allows you
to use _distributed-memory_ parallelism, where each CPU has its own
separate memory, and which can scale up to clusters of many thousands of
processors. This capability comes at a price, however: each process
only stores a _portion_ of the data to be transformed, which means that
the data structures and programming-interface are quite different from
the serial or threads versions of FFTW.

   Distributed-memory parallelism is especially useful when you are
transforming arrays so large that they do not fit into the memory of a
single processor.
The storage per-process required by FFTW's MPI
routines is proportional to the total array size divided by the number
of processes. Conversely, distributed-memory parallelism can easily
pose an unacceptably high communications overhead for small problems;
the threshold problem size for which parallelism becomes advantageous
will depend on the precise problem you are interested in, your hardware,
and your MPI implementation.

   A note on terminology: in MPI, you divide the data among a set of
"processes" which each run in their own memory address space.
Generally, each process runs on a different physical processor, but this
is not required. A set of processes in MPI is described by an opaque
data structure called a "communicator," the most common of which is the
predefined communicator 'MPI_COMM_WORLD' which refers to _all_
processes. For more information on these and other concepts common to
all MPI programs, we refer the reader to the documentation at the MPI
home page (http://www.mcs.anl.gov/research/projects/mpi/).

   We assume in this chapter that the reader is familiar with the usage
of the serial (uniprocessor) FFTW, and focus only on the concepts new to
the MPI interface.

* Menu:

* FFTW MPI Installation::
* Linking and Initializing MPI FFTW::
* 2d MPI example::
* MPI Data Distribution::
* Multi-dimensional MPI DFTs of Real Data::
* Other Multi-dimensional Real-data MPI Transforms::
* FFTW MPI Transposes::
* FFTW MPI Wisdom::
* Avoiding MPI Deadlocks::
* FFTW MPI Performance Tips::
* Combining MPI and Threads::
* FFTW MPI Reference::
* FFTW MPI Fortran Interface::


File: fftw3.info, Node: FFTW MPI Installation, Next: Linking and Initializing MPI FFTW, Prev: Distributed-memory FFTW with MPI, Up: Distributed-memory FFTW with MPI

6.1 FFTW MPI Installation
=========================

All of the FFTW MPI code is located in the 'mpi' subdirectory of the
FFTW package. On Unix systems, the FFTW MPI libraries and header files
are automatically configured, compiled, and installed along with the
uniprocessor FFTW libraries simply by including '--enable-mpi' in the
flags to the 'configure' script (*note Installation on Unix::).

   Any implementation of the MPI standard, version 1 or later, should
work with FFTW. The 'configure' script will attempt to automatically
detect how to compile and link code using your MPI implementation. In
some cases, especially if you have multiple different MPI
implementations installed or have an unusual MPI software package, you
may need to provide this information explicitly.

   Most commonly, one compiles MPI code by invoking a special compiler
command, typically 'mpicc' for C code. The 'configure' script knows the
most common names for this command, but you can specify the MPI
compilation command explicitly by setting the 'MPICC' variable, as in
'./configure MPICC=mpicc ...'.

   If, instead of a special compiler command, you need to link a certain
library, you can specify the link command via the 'MPILIBS' variable, as
in './configure MPILIBS=-lmpi ...'. Note that if your MPI library is
installed in a non-standard location (one the compiler does not know
about by default), you may also have to specify the location of the
library and header files via 'LDFLAGS' and 'CPPFLAGS' variables,
respectively, as in './configure LDFLAGS=-L/path/to/mpi/libs
CPPFLAGS=-I/path/to/mpi/include ...'.


File: fftw3.info, Node: Linking and Initializing MPI FFTW, Next: 2d MPI example, Prev: FFTW MPI Installation, Up: Distributed-memory FFTW with MPI

6.2 Linking and Initializing MPI FFTW
=====================================

Programs using the MPI FFTW routines should be linked with '-lfftw3_mpi
-lfftw3 -lm' on Unix in double precision, '-lfftw3f_mpi -lfftw3f -lm' in
single precision, and so on (*note Precision::). You will also need to
link with whatever library is responsible for MPI on your system; in
most MPI implementations, there is a special compiler alias named
'mpicc' to compile and link MPI code.

   Before calling any FFTW routines except possibly 'fftw_init_threads'
(*note Combining MPI and Threads::), but after calling 'MPI_Init', you
should call the function:

     void fftw_mpi_init(void);

   If, at the end of your program, you want to get rid of all memory and
other resources allocated internally by FFTW, for both the serial and
MPI routines, you can call:

     void fftw_mpi_cleanup(void);

   which is much like the 'fftw_cleanup()' function except that it also
gets rid of FFTW's MPI-related data. You must _not_ execute any
previously created plans after calling this function.


File: fftw3.info, Node: 2d MPI example, Next: MPI Data Distribution, Prev: Linking and Initializing MPI FFTW, Up: Distributed-memory FFTW with MPI

6.3 2d MPI example
==================

Before we document the FFTW MPI interface in detail, we begin with a
simple example outlining how one would perform a two-dimensional 'N0' by
'N1' complex DFT.

     #include <fftw3-mpi.h>

     int main(int argc, char **argv)
     {
         const ptrdiff_t N0 = ..., N1 = ...;
         fftw_plan plan;
         fftw_complex *data;
         ptrdiff_t alloc_local, local_n0, local_0_start, i, j;

         MPI_Init(&argc, &argv);
         fftw_mpi_init();

         /* get local data size and allocate */
         alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
                                              &local_n0, &local_0_start);
         data = fftw_alloc_complex(alloc_local);

         /* create plan for in-place forward DFT */
         plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
                                     FFTW_FORWARD, FFTW_ESTIMATE);

         /* initialize data to some function my_function(x,y) */
         for (i = 0; i < local_n0; ++i) for (j = 0; j < N1; ++j)
            data[i*N1 + j] = my_function(local_0_start + i, j);

         /* compute transforms, in-place, as many times as desired */
         fftw_execute(plan);

         fftw_destroy_plan(plan);

         MPI_Finalize();
     }

   As can be seen above, the MPI interface follows the same basic style
of allocate/plan/execute/destroy as the serial FFTW routines. All of
the MPI-specific routines are prefixed with 'fftw_mpi_' instead of
'fftw_'. There are a few important differences, however:

   First, we must call 'fftw_mpi_init()' after calling 'MPI_Init'
(required in all MPI programs) and before calling any other 'fftw_mpi_'
routine.

   Second, when we create the plan with 'fftw_mpi_plan_dft_2d',
analogous to 'fftw_plan_dft_2d', we pass an additional argument: the
communicator, indicating which processes will participate in the
transform (here 'MPI_COMM_WORLD', indicating all processes). Whenever
you create, execute, or destroy a plan for an MPI transform, you must
call the corresponding FFTW routine on _all_ processes in the
communicator for that transform. (That is, these are _collective_
calls.) Note that the plan for the MPI transform uses the standard
'fftw_execute' and 'fftw_destroy_plan' routines (on the other hand,
there are MPI-specific new-array execute functions documented below).

   Third, all of the FFTW MPI routines take 'ptrdiff_t' arguments
instead of 'int' as for the serial FFTW. 'ptrdiff_t' is a standard C
integer type which is (at least) 32 bits wide on a 32-bit machine and 64
bits wide on a 64-bit machine. This is to make it easy to specify very
large parallel transforms on a 64-bit machine. (You can specify 64-bit
transform sizes in the serial FFTW, too, but only by using the 'guru64'
planner interface. *Note 64-bit Guru Interface::.)

   Fourth, and most importantly, you don't allocate the entire
two-dimensional array on each process. Instead, you call
'fftw_mpi_local_size_2d' to find out what _portion_ of the array resides
on each processor, and how much space to allocate. Here, the portion of
the array on each process is a 'local_n0' by 'N1' slice of the total
array, starting at index 'local_0_start'. The total number of
'fftw_complex' numbers to allocate is given by the 'alloc_local' return
value, which _may_ be greater than 'local_n0 * N1' (in case some
intermediate calculations require additional storage). The data
distribution in FFTW's MPI interface is described in more detail by the
next section.

   Given the portion of the array that resides on the local process, it
is straightforward to initialize the data (here to a function
'my_function') and otherwise manipulate it. Of course, at the end of the
program you may want to output the data somehow, but synchronizing this
output is up to you and is beyond the scope of this manual. (One good
way to output a large multi-dimensional distributed array in MPI to a
portable binary file is to use the free HDF5 library; see the HDF home
page (http://www.hdfgroup.org/).)


File: fftw3.info, Node: MPI Data Distribution, Next: Multi-dimensional MPI DFTs of Real Data, Prev: 2d MPI example, Up: Distributed-memory FFTW with MPI

6.4 MPI Data Distribution
=========================

The most important concept to understand in using FFTW's MPI interface
is the data distribution. With a serial or multithreaded FFT, all of
the inputs and outputs are stored as a single contiguous chunk of
memory. With a distributed-memory FFT, the inputs and outputs are
broken into disjoint blocks, one per process.

   In particular, FFTW uses a _1d block distribution_ of the data,
distributed along the _first dimension_. For example, if you want to
perform a 100 x 200 complex DFT, distributed over 4 processes, each
process will get a 25 x 200 slice of the data. That is, process 0 will
get rows 0 through 24, process 1 will get rows 25 through 49, process 2
will get rows 50 through 74, and process 3 will get rows 75 through 99.
If you take the same array but distribute it over 3 processes, then it
is not evenly divisible so the different processes will have unequal
chunks. FFTW's default choice in this case is to assign 34 rows to
processes 0 and 1, and 32 rows to process 2.
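
   The row counts quoted above can be reproduced with a little integer
arithmetic. The following is only a sketch: it assumes that the default
block size is 'n0 / P' rounded up and that blocks are handed out in rank
order, and the helper names 'default_block' and 'rows_on_rank' are
hypothetical rather than part of FFTW. Real programs should still call
the 'fftw_mpi_local_size' routines instead of predicting the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: assumed default block size, n0 / nproc rounded up. */
ptrdiff_t default_block(ptrdiff_t n0, ptrdiff_t nproc)
{
    assert(n0 > 0 && nproc > 0);
    return (n0 + nproc - 1) / nproc;
}

/* Hypothetical helper: number of rows held by `rank` when blocks of
   `block` rows are handed out in rank order until n0 rows are used up. */
ptrdiff_t rows_on_rank(ptrdiff_t n0, ptrdiff_t block, ptrdiff_t rank)
{
    ptrdiff_t start = rank * block;      /* first global row of this rank */
    assert(block > 0 && rank >= 0);
    if (start >= n0)
        return 0;                        /* nothing left for this rank */
    return (n0 - start < block) ? n0 - start : block;
}
```

   For the 100 x 200 array on three processes, 'default_block(100, 3)'
is 34, and 'rows_on_rank(100, 34, rank)' gives 34, 34, and 32 rows for
ranks 0, 1, and 2.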

   FFTW provides several 'fftw_mpi_local_size' routines that you can
call to find out what portion of an array is stored on the current
process. In most cases, you should use the default block sizes picked
by FFTW, but it is also possible to specify your own block size. For
example, with a 100 x 200 array on three processes, you can tell FFTW to
use a block size of 40, which would assign 40 rows to processes 0 and 1,
and 20 rows to process 2. FFTW's default is to divide the data equally
among the processes if possible, and as best it can otherwise. The rows
are always assigned in "rank order," i.e. process 0 gets the first
block of rows, then process 1, and so on. (You can change this by using
'MPI_Comm_split' to create a new communicator with re-ordered
processes.) However, you should always call the 'fftw_mpi_local_size'
routines, if possible, rather than trying to predict FFTW's distribution
choices.

   In particular, it is critical that you allocate the storage size that
is returned by 'fftw_mpi_local_size', which is _not_ necessarily the
size of the local slice of the array. The reason is that intermediate
steps of FFTW's algorithms involve transposing the array and
redistributing the data, so at these intermediate steps FFTW may require
more local storage space (albeit always proportional to the total size
divided by the number of processes). The 'fftw_mpi_local_size'
functions know how much storage is required for these intermediate steps
and tell you the correct amount to allocate.

* Menu:

* Basic and advanced distribution interfaces::
* Load balancing::
* Transposed distributions::
* One-dimensional distributions::


File: fftw3.info, Node: Basic and advanced distribution interfaces, Next: Load balancing, Prev: MPI Data Distribution, Up: MPI Data Distribution

6.4.1 Basic and advanced distribution interfaces
------------------------------------------------

As with the planner interface, the 'fftw_mpi_local_size' distribution
interface is broken into basic and advanced ('_many') interfaces, where
the latter allows you to specify the block size manually and also to
request block sizes when computing multiple transforms simultaneously.
These functions are documented more exhaustively by the FFTW MPI
Reference, but we summarize the basic ideas here using a couple of
two-dimensional examples.

   For the 100 x 200 complex-DFT example, above, we would find the
distribution by calling the following function in the basic interface:

     ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);

   Given the total size of the data to be transformed (here, 'n0 = 100'
and 'n1 = 200') and an MPI communicator ('comm'), this function provides
three numbers.

   First, it describes the shape of the local data: the current process
should store a 'local_n0' by 'n1' slice of the overall dataset, in
row-major order ('n1' dimension contiguous), starting at index
'local_0_start'. That is, if the total dataset is viewed as a 'n0' by
'n1' matrix, the current process should store the rows 'local_0_start'
to 'local_0_start+local_n0-1'. Obviously, if you are running with only
a single MPI process, that process will store the entire array:
'local_0_start' will be zero and 'local_n0' will be 'n0'. *Note
Row-major Format::.
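
   To make the row-major layout concrete, the sketch below maps a global
element '(i, j)' to its offset in the local array, given the 'local_n0'
and 'local_0_start' values reported by 'fftw_mpi_local_size_2d'. The
helper name 'local_offset' is hypothetical, and returning -1 for a row
stored on another process is a convention chosen only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of global element (i, j) of an n0 x n1
   row-major array within the local slice of the calling process, or -1
   if row i is stored on a different process. */
ptrdiff_t local_offset(ptrdiff_t i, ptrdiff_t j, ptrdiff_t n1,
                       ptrdiff_t local_n0, ptrdiff_t local_0_start)
{
    assert(n1 > 0 && j >= 0 && j < n1);
    if (i < local_0_start || i >= local_0_start + local_n0)
        return -1;                        /* row lives elsewhere */
    return (i - local_0_start) * n1 + j;  /* n1 dimension is contiguous */
}
```

   For example, on the process holding rows 25 through 49 of the 100 x
200 array ('local_0_start = 25', 'local_n0 = 25'), global element (26,
3) sits at local offset 203.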

   Second, the return value is the total number of data elements (e.g.,
complex numbers for a complex DFT) that should be allocated for the
input and output arrays on the current process (ideally with
'fftw_malloc' or an 'fftw_alloc' function, to ensure optimal alignment).
It might seem that this should always be equal to 'local_n0 * n1', but
this is _not_ the case. FFTW's distributed FFT algorithms require data
redistributions at intermediate stages of the transform, and in some
circumstances this may require slightly larger local storage. This is
discussed in more detail below, under *note Load balancing::.

   The advanced-interface 'local_size' function for multidimensional
transforms returns the same three things ('local_n0', 'local_0_start',
and the total number of elements to allocate), but takes more inputs:

     ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n,
                                        ptrdiff_t howmany,
                                        ptrdiff_t block0,
                                        MPI_Comm comm,
                                        ptrdiff_t *local_n0,
                                        ptrdiff_t *local_0_start);

   The two-dimensional case above corresponds to 'rnk = 2' and an array
'n' of length 2 with 'n[0] = n0' and 'n[1] = n1'. This routine is for
any 'rnk > 1'; one-dimensional transforms have their own interface
because they work slightly differently, as discussed below.

   First, the advanced interface allows you to perform multiple
transforms at once, of interleaved data, as specified by the 'howmany'
parameter. ('howmany' is 1 for a single transform.)

   Second, here you can specify your desired block size in the 'n0'
dimension, 'block0'. To use FFTW's default block size, pass
'FFTW_MPI_DEFAULT_BLOCK' (0) for 'block0'.
Otherwise, on 'P' processes,
FFTW will return 'local_n0' equal to 'block0' on the first 'n0 / block0'
processes (rounded down), return 'local_n0' equal to 'n0 - block0 * (n0 /
block0)' on the next process, and 'local_n0' equal to zero on any
remaining processes. In general, we recommend using the default block
size (which corresponds to 'n0 / P', rounded up).

   For example, suppose you have 'P = 4' processes and 'n0 = 21'. The
default will be a block size of '6', which will give 'local_n0 = 6' on
the first three processes and 'local_n0 = 3' on the last process.
Instead, however, you could specify 'block0 = 5' if you wanted, which
would give 'local_n0 = 5' on processes 0 to 2, 'local_n0 = 6' on process
3. (This choice, while it may look superficially more "balanced," has
the same critical path as FFTW's default but requires more
communications.)


File: fftw3.info, Node: Load balancing, Next: Transposed distributions, Prev: Basic and advanced distribution interfaces, Up: MPI Data Distribution

6.4.2 Load balancing
--------------------

Ideally, when you parallelize a transform over some P processes, each
process should end up with work that takes equal time. Otherwise, all
of the processes end up waiting on whichever process is slowest. This
goal is known as "load balancing." In this section, we describe the
circumstances under which FFTW is able to load-balance well, and in
particular how you should choose your transform size in order to load
balance.

   Load balancing is especially difficult when you are parallelizing
over heterogeneous machines; for example, if one of your processors is an
old 486 and another is a Pentium IV, obviously you should give the
Pentium more work to do than the 486 since the latter is much slower.
FFTW does not deal with this problem, however--it assumes that your
processes run on hardware of comparable speed, and that the goal is
therefore to divide the problem as equally as possible.

   For a multi-dimensional complex DFT, FFTW can divide the problem
equally among the processes if: (i) the _first_ dimension 'n0' is
divisible by P; and (ii), the _product_ of the subsequent dimensions is
divisible by P. (For the advanced interface, where you can specify
multiple simultaneous transforms via some "vector" length 'howmany', a
factor of 'howmany' is included in the product of the subsequent
dimensions.)

   For a one-dimensional complex DFT, the length 'N' of the data should
be divisible by P _squared_ to be able to divide the problem equally
among the processes.


File: fftw3.info, Node: Transposed distributions, Next: One-dimensional distributions, Prev: Load balancing, Up: MPI Data Distribution

6.4.3 Transposed distributions
------------------------------

Internally, FFTW's MPI transform algorithms work by first computing
transforms of the data local to each process, then by globally
_transposing_ the data in some fashion to redistribute the data among
the processes, transforming the new data local to each process, and
transposing back. For example, a two-dimensional 'n0' by 'n1' array,
distributed across the 'n0' dimension, is transformed by: (i)
transforming the 'n1' dimension, which is local to each process; (ii)
transposing to an 'n1' by 'n0' array, distributed across the 'n1'
dimension; (iii) transforming the 'n0' dimension, which is now local to
each process; (iv) transposing back.

   However, in many applications it is acceptable to compute a
multidimensional DFT whose results are produced in transposed order
(e.g., 'n1' by 'n0' in two dimensions).
This provides a significant
performance advantage, because it means that the final transposition
step can be omitted. FFTW supports this optimization, which you specify
by passing the flag 'FFTW_MPI_TRANSPOSED_OUT' to the planner routines.
To compute the inverse transform of transposed output, you specify
'FFTW_MPI_TRANSPOSED_IN' to tell it that the input is transposed. In
this section, we explain how to interpret the output format of such a
transform.

   Suppose you are transforming multi-dimensional data with (at
least two) dimensions n[0] x n[1] x n[2] x ... x n[d-1]. As always,
it is distributed along the first dimension n[0]. Now, if we compute
its DFT with the 'FFTW_MPI_TRANSPOSED_OUT' flag, the resulting output
data are stored with the first _two_ dimensions transposed: n[1] x n[0]
x n[2] x ... x n[d-1], distributed along the n[1] dimension.
Conversely, if we take the n[1] x n[0] x n[2] x ... x n[d-1] data and
transform it with the 'FFTW_MPI_TRANSPOSED_IN' flag, then the format
goes back to the original n[0] x n[1] x n[2] x ... x n[d-1] array.

   There are two ways to find the portion of the transposed array that
resides on the current process. First, you can simply call the
appropriate 'local_size' function, passing n[1] x n[0] x n[2] x ... x
n[d-1] (the transposed dimensions). This would mean calling the
'local_size' function twice, once for the transposed and once for the
non-transposed dimensions. Alternatively, you can call one of the
'local_size_transposed' functions, which returns both the non-transposed
and transposed data distribution from a single call.
For example, for a
3d transform with transposed output (or input), you might call:

     ptrdiff_t fftw_mpi_local_size_3d_transposed(
                     ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
                     ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                     ptrdiff_t *local_n1, ptrdiff_t *local_1_start);

   Here, 'local_n0' and 'local_0_start' give the size and starting index
of the 'n0' dimension for the _non_-transposed data, as in the previous
sections. For _transposed_ data (e.g. the output for
'FFTW_MPI_TRANSPOSED_OUT'), 'local_n1' and 'local_1_start' give the size
and starting index of the 'n1' dimension, which is the first dimension
of the transposed data ('n1' by 'n0' by 'n2').

   (Note that 'FFTW_MPI_TRANSPOSED_IN' is completely equivalent to
performing 'FFTW_MPI_TRANSPOSED_OUT' and passing the first two
dimensions to the planner in reverse order, or vice versa. If you pass
_both_ the 'FFTW_MPI_TRANSPOSED_IN' and 'FFTW_MPI_TRANSPOSED_OUT' flags,
it is equivalent to swapping the first two dimensions passed to the
planner and passing _neither_ flag.)


File: fftw3.info, Node: One-dimensional distributions, Prev: Transposed distributions, Up: MPI Data Distribution

6.4.4 One-dimensional distributions
-----------------------------------

For one-dimensional distributed DFTs using FFTW, matters are slightly
more complicated because the data distribution is more closely tied to
how the algorithm works. In particular, you can no longer pass an
arbitrary block size and must accept FFTW's default; also, the block
sizes may be different for input and output. Also, the data
distribution depends on the flags and transform direction, in order for
forward and backward transforms to work correctly.

     ptrdiff_t fftw_mpi_local_size_1d(ptrdiff_t n0, MPI_Comm comm,
                     int sign, unsigned flags,
                     ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                     ptrdiff_t *local_no, ptrdiff_t *local_o_start);

   This function computes the data distribution for a 1d transform of
size 'n0' with the given transform 'sign' and 'flags'. Both input and
output data use block distributions. The input on the current process
will consist of 'local_ni' numbers starting at index 'local_i_start';
e.g. if only a single process is used, then 'local_ni' will be 'n0' and
'local_i_start' will be '0'. Similarly for the output, with 'local_no'
numbers starting at index 'local_o_start'. The return value of
'fftw_mpi_local_size_1d' will be the total number of elements to
allocate on the current process (which might be slightly larger than the
local size due to intermediate steps in the algorithm).

   As mentioned above (*note Load balancing::), the data will be divided
equally among the processes if 'n0' is divisible by the _square_ of the
number of processes. In this case, 'local_ni' will equal 'local_no'.
Otherwise, they may be different.

   For some applications, such as convolutions, the order of the output
data is irrelevant. In this case, performance can be improved by
specifying that the output data be stored in an FFTW-defined "scrambled"
format. (In particular, this is the analogue of transposed output in
the multidimensional case: scrambled output saves a communications
step.) If you pass 'FFTW_MPI_SCRAMBLED_OUT' in the flags, then the
output is stored in this (undocumented) scrambled order. Conversely, to
perform the inverse transform of data in scrambled order, pass the
'FFTW_MPI_SCRAMBLED_IN' flag.

   In MPI FFTW, only composite sizes 'n0' can be parallelized; we have
not yet implemented a parallel algorithm for large prime sizes.
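
   The divisibility conditions for an equal split, stated here for the
one-dimensional case and under *note Load balancing:: for the
multi-dimensional case, are easy to check before settling on a transform
size. The sketch below merely restates those conditions in C; the
function names 'balanced_1d' and 'balanced_nd' are hypothetical, and
nothing here queries FFTW itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: a 1d transform of length n0 divides equally over
   nproc processes when n0 is divisible by nproc squared. */
int balanced_1d(ptrdiff_t n0, ptrdiff_t nproc)
{
    assert(n0 > 0 && nproc > 0);
    return n0 % (nproc * nproc) == 0;
}

/* Hypothetical check for rnk >= 2: n[0] must be divisible by nproc, and
   so must howmany times the product of the remaining dimensions. */
int balanced_nd(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                ptrdiff_t nproc)
{
    ptrdiff_t rest = howmany % nproc;   /* running product, kept mod nproc */
    int d;
    assert(rnk >= 2 && nproc > 0);
    for (d = 1; d < rnk; ++d)
        rest = (rest * (n[d] % nproc)) % nproc;
    return n[0] % nproc == 0 && rest == 0;
}
```

   Keeping the running product modulo 'nproc' avoids overflow for large
dimensions. With four processes, a length of 1024 satisfies the 1d
condition (1024 is divisible by 16) whereas a length of 1000 does not.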


File: fftw3.info, Node: Multi-dimensional MPI DFTs of Real Data, Next: Other Multi-dimensional Real-data MPI Transforms, Prev: MPI Data Distribution, Up: Distributed-memory FFTW with MPI

6.5 Multi-dimensional MPI DFTs of Real Data
===========================================

FFTW's MPI interface also supports multi-dimensional DFTs of real data,
similar to the serial r2c and c2r interfaces. (Parallel one-dimensional
real-data DFTs are not currently supported; you must use a complex
transform and set the imaginary parts of the inputs to zero.)

   The key points to understand for r2c and c2r MPI transforms (compared
to the MPI complex DFTs or the serial r2c/c2r transforms), are:

   * Just as for serial transforms, r2c/c2r DFTs transform n[0] x n[1] x
     n[2] x ... x n[d-1] real data to/from n[0] x n[1] x n[2] x ... x
     (n[d-1]/2 + 1) complex data: the last dimension of the complex data
     is cut in half (rounded down), plus one. As for the serial
     transforms, the sizes you pass to the 'plan_dft_r2c' and
     'plan_dft_c2r' are the n[0] x n[1] x n[2] x ... x n[d-1]
     dimensions of the real data.

   * Although the real data is _conceptually_ n[0] x n[1] x n[2] x ...
     x n[d-1], it is _physically_ stored as an n[0] x n[1] x n[2] x ...
     x [2 (n[d-1]/2 + 1)] array, where the last dimension has been
     _padded_ to make it the same size as the complex output. This is
     much like the in-place serial r2c/c2r interface (*note
     Multi-Dimensional DFTs of Real Data::), except that in MPI the
     padding is required even for out-of-place data. The extra padding
     numbers are ignored by FFTW (they are _not_ like zero-padding the
     transform to a larger size); they are only used to determine the
     data layout.

   * The data distribution in MPI for _both_ the real and complex data
     is determined by the shape of the _complex_ data.
     That is, you
     call the appropriate 'local_size' function for the n[0] x n[1] x
     n[2] x ... x (n[d-1]/2 + 1) complex data, and then use the _same_
     distribution for the real data except that the last complex
     dimension is replaced by a (padded) real dimension of twice the
     length.

   For example, suppose we are performing an out-of-place r2c transform
of L x M x N real data [padded to L x M x 2(N/2+1)], resulting in L x M
x N/2+1 complex data. Similar to the example in *note 2d MPI example::,
we might do something like:

     #include <fftw3-mpi.h>

     int main(int argc, char **argv)
     {
         const ptrdiff_t L = ..., M = ..., N = ...;
         fftw_plan plan;
         double *rin;
         fftw_complex *cout;
         ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;

         MPI_Init(&argc, &argv);
         fftw_mpi_init();

         /* get local data size and allocate */
         alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
                                              &local_n0, &local_0_start);
         rin = fftw_alloc_real(2 * alloc_local);
         cout = fftw_alloc_complex(alloc_local);

         /* create plan for out-of-place r2c DFT */
         plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
                                         FFTW_MEASURE);

         /* initialize rin to some function my_func(x,y,z) */
         for (i = 0; i < local_n0; ++i)
             for (j = 0; j < M; ++j)
                 for (k = 0; k < N; ++k)
                     rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);

         /* compute transforms as many times as desired */
         fftw_execute(plan);

         fftw_destroy_plan(plan);

         MPI_Finalize();
     }

   Note that we allocated 'rin' using 'fftw_alloc_real' with an argument
of '2 * alloc_local': since 'alloc_local' is the number of _complex_
values to allocate, the number of _real_ values is twice as many.
The
'rin' array is then local_n0 x M x 2(N/2+1) in row-major order, so its
'(i,j,k)' element is at the index '(i*M + j) * (2*(N/2+1)) + k' (*note
Multi-dimensional Array Format::).

   As for the complex transforms, improved performance can be obtained
by specifying that the output is the transpose of the input or vice
versa (*note Transposed distributions::). In our L x M x N r2c example,
including 'FFTW_MPI_TRANSPOSED_OUT' in the flags means that the input
would be a padded L x M x 2(N/2+1) real array distributed over the 'L'
dimension, while the output would be a M x L x N/2+1 complex array
distributed over the 'M' dimension. To perform the inverse c2r
transform with the same data distributions, you would use the
'FFTW_MPI_TRANSPOSED_IN' flag.


File: fftw3.info, Node: Other Multi-dimensional Real-data MPI Transforms, Next: FFTW MPI Transposes, Prev: Multi-dimensional MPI DFTs of Real Data, Up: Distributed-memory FFTW with MPI

6.6 Other multi-dimensional Real-Data MPI Transforms
====================================================

FFTW's MPI interface also supports multi-dimensional 'r2r' transforms of
all kinds supported by the serial interface (e.g. discrete cosine and
sine transforms, discrete Hartley transforms, etc.). Only
multi-dimensional 'r2r' transforms, not one-dimensional transforms, are
currently parallelized.

   These are used much like the multidimensional complex DFTs discussed
above, except that the data is real rather than complex, and one needs
to pass an r2r transform kind ('fftw_r2r_kind') for each dimension as in
the serial FFTW (*note More DFTs of Real Data::).

   For example, one might perform a two-dimensional L x M transform that
is an REDFT10 (DCT-II) in the first dimension and an RODFT10 (DST-II) in
the second dimension with code like:

         const ptrdiff_t L = ..., M = ...;
         fftw_plan plan;
         double *data;
         ptrdiff_t alloc_local, local_n0, local_0_start, i, j;

         /* get local data size and allocate */
         alloc_local = fftw_mpi_local_size_2d(L, M, MPI_COMM_WORLD,
                                              &local_n0, &local_0_start);
         data = fftw_alloc_real(alloc_local);

         /* create plan for in-place REDFT10 x RODFT10 */
         plan = fftw_mpi_plan_r2r_2d(L, M, data, data, MPI_COMM_WORLD,
                                     FFTW_REDFT10, FFTW_RODFT10, FFTW_MEASURE);

         /* initialize data to some function my_function(x,y) */
         for (i = 0; i < local_n0; ++i) for (j = 0; j < M; ++j)
             data[i*M + j] = my_function(local_0_start + i, j);

         /* compute transforms, in-place, as many times as desired */
         fftw_execute(plan);

         fftw_destroy_plan(plan);

   Notice that we use the same 'local_size' functions as we did for
complex data, only now we interpret the sizes in terms of real rather
than complex values, and correspondingly use 'fftw_alloc_real'.


File: fftw3.info, Node: FFTW MPI Transposes, Next: FFTW MPI Wisdom, Prev: Other Multi-dimensional Real-data MPI Transforms, Up: Distributed-memory FFTW with MPI

6.7 FFTW MPI Transposes
=======================

FFTW's MPI Fourier transforms rely on one or more _global
transposition_ steps for their communications. For example, the
multidimensional transforms work by transforming along some dimensions,
then transposing to make the first dimension local and transforming
that, then transposing back. Because global transposition of a
block-distributed matrix has many other potential uses besides FFTs,
FFTW's transpose routines can be called directly, as documented in this
section.

* Menu:

* Basic distributed-transpose interface::
* Advanced distributed-transpose interface::
* An improved replacement for MPI_Alltoall::


File: fftw3.info, Node: Basic distributed-transpose interface, Next: Advanced distributed-transpose interface, Prev: FFTW MPI Transposes, Up: FFTW MPI Transposes

6.7.1 Basic distributed-transpose interface
-------------------------------------------

In particular, suppose that we have an 'n0' by 'n1' array in row-major
order, block-distributed across the 'n0' dimension. To transpose this
into an 'n1' by 'n0' array block-distributed across the 'n1' dimension,
we would create a plan by calling the following function:

     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                       double *in, double *out,
                                       MPI_Comm comm, unsigned flags);

   The input and output arrays ('in' and 'out') can be the same. The
transpose is actually executed by calling 'fftw_execute' on the plan, as
usual.

   The 'flags' are the usual FFTW planner flags, but support two
additional flags: 'FFTW_MPI_TRANSPOSED_OUT' and/or
'FFTW_MPI_TRANSPOSED_IN'. What these flags indicate, for transpose
plans, is that the output and/or input, respectively, are _locally_
transposed. That is, on each process input data is normally stored as a
'local_n0' by 'n1' array in row-major order, but for an
'FFTW_MPI_TRANSPOSED_IN' plan the input data is stored as 'n1' by
'local_n0' in row-major order. Similarly, 'FFTW_MPI_TRANSPOSED_OUT'
means that the output is 'n0' by 'local_n1' instead of 'local_n1' by
'n0'.
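
   The effect of the two flags on the local layout can be summarized by
the index arithmetic below. This is a sketch only: the helper names
'in_index' and 'out_index' are hypothetical, and the code restates the
layouts described above without calling FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of input element (i, j), where i is a
   local row in [0, local_n0) and j is a column in [0, n1). */
ptrdiff_t in_index(ptrdiff_t i, ptrdiff_t j, ptrdiff_t local_n0,
                   ptrdiff_t n1, int transposed_in)
{
    assert(i >= 0 && i < local_n0 && j >= 0 && j < n1);
    /* default: local_n0 x n1; FFTW_MPI_TRANSPOSED_IN: n1 x local_n0 */
    return transposed_in ? j * local_n0 + i : i * n1 + j;
}

/* Hypothetical helper: offset of output element (j, i) of the n1 x n0
   result, where j is a local row in [0, local_n1) and i is in [0, n0). */
ptrdiff_t out_index(ptrdiff_t j, ptrdiff_t i, ptrdiff_t local_n1,
                    ptrdiff_t n0, int transposed_out)
{
    assert(j >= 0 && j < local_n1 && i >= 0 && i < n0);
    /* default: local_n1 x n0; FFTW_MPI_TRANSPOSED_OUT: n0 x local_n1 */
    return transposed_out ? i * local_n1 + j : j * n0 + i;
}
```

   In both helpers the flag simply swaps which of the two local
dimensions is contiguous in memory; the set of elements stored on each
process is unchanged.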

   To determine the local size of the array on each process before and
after the transpose, as well as the amount of storage that must be
allocated, one should call 'fftw_mpi_local_size_2d_transposed', just as
for a 2d DFT as described in the previous section:

     ptrdiff_t fftw_mpi_local_size_2d_transposed
                     (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                      ptrdiff_t *local_n1, ptrdiff_t *local_1_start);

   Again, the return value is the local storage to allocate, which in
this case is the number of _real_ ('double') values rather than complex
numbers as in the previous examples.


File: fftw3.info, Node: Advanced distributed-transpose interface, Next: An improved replacement for MPI_Alltoall, Prev: Basic distributed-transpose interface, Up: FFTW MPI Transposes

6.7.2 Advanced distributed-transpose interface
----------------------------------------------

The above routines are for a transpose of a matrix of numbers (of type
'double'), using FFTW's default block sizes. More generally, one can
perform transposes of _tuples_ of numbers, with user-specified block
sizes for the input and output:

     fftw_plan fftw_mpi_plan_many_transpose
                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      ptrdiff_t block0, ptrdiff_t block1,
                      double *in, double *out, MPI_Comm comm, unsigned flags);

   In this case, one is transposing an 'n0' by 'n1' matrix of
'howmany'-tuples (e.g. 'howmany = 2' for complex numbers). The input
is distributed along the 'n0' dimension with block size 'block0', and
the 'n1' by 'n0' output is distributed along the 'n1' dimension with
block size 'block1'. If 'FFTW_MPI_DEFAULT_BLOCK' (0) is passed for a
block size then FFTW uses its default block size. To get the local size
of the data on each process, you should then call
'fftw_mpi_local_size_many_transposed'.
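
   As a point of reference for what a transpose of tuples means, the
following serial sketch transposes an 'n0' by 'n1' matrix of
'howmany'-tuples held entirely in one array. It is not FFTW code: the
function name 'transpose_tuples' is hypothetical, and it works
out-of-place only. The distributed routine yields the same global
result, with the rows block-distributed across processes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial reference: out (n1 x n0 tuples) receives the
   transpose of in (n0 x n1 tuples), both stored in row-major order. */
void transpose_tuples(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      const double *in, double *out)
{
    ptrdiff_t i, j, k;
    assert(in != out);                   /* this sketch is out-of-place only */
    for (i = 0; i < n0; ++i)
        for (j = 0; j < n1; ++j)
            for (k = 0; k < howmany; ++k)   /* each tuple moves as a unit */
                out[(j * n0 + i) * howmany + k] =
                    in[(i * n1 + j) * howmany + k];
}
```

   With 'howmany = 2' each tuple can hold the real and imaginary parts
of one complex number, so the two parts stay adjacent after the
transpose.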


File: fftw3.info, Node: An improved replacement for MPI_Alltoall, Prev: Advanced distributed-transpose interface, Up: FFTW MPI Transposes

6.7.3 An improved replacement for MPI_Alltoall
----------------------------------------------

We close this section by noting that FFTW's MPI transpose routines can
be thought of as a generalization of the 'MPI_Alltoall' function
(albeit only for floating-point types), and in some circumstances can
function as an improved replacement.

   'MPI_Alltoall' is defined by the MPI standard as:

     int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                      void *recvbuf, int recvcnt, MPI_Datatype recvtype,
                      MPI_Comm comm);

   In particular, for 'double*' arrays 'in' and 'out', consider the
call:

     MPI_Alltoall(in, howmany, MPI_DOUBLE, out, howmany, MPI_DOUBLE, comm);

   This is completely equivalent to:

     MPI_Comm_size(comm, &P);
     plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1, in, out, comm, FFTW_ESTIMATE);
     fftw_execute(plan);
     fftw_destroy_plan(plan);

   That is, computing a P x P transpose on 'P' processes, with a block
size of 1, is just a standard all-to-all communication.

   However, using the FFTW routine instead of 'MPI_Alltoall' may have
certain advantages. First of all, FFTW's routine can operate in-place
('in == out') whereas 'MPI_Alltoall' can only operate out-of-place.

   Second, even for out-of-place plans, FFTW's routine may be faster,
especially if you need to perform the all-to-all communication many
times and can afford to use 'FFTW_MEASURE' or 'FFTW_PATIENT'. It should
certainly be no slower, not including the time to create the plan, since
one of the possible algorithms that FFTW uses for an out-of-place
transpose _is_ simply to call 'MPI_Alltoall'.
However, FFTW also
considers several other possible algorithms that, depending on your MPI
implementation and your hardware, may be faster.


File: fftw3.info, Node: FFTW MPI Wisdom, Next: Avoiding MPI Deadlocks, Prev: FFTW MPI Transposes, Up: Distributed-memory FFTW with MPI

6.8 FFTW MPI Wisdom
===================

FFTW's "wisdom" facility (*note Words of Wisdom-Saving Plans::) can be
used to save MPI plans as well as to save uniprocessor plans. However,
for MPI there are several unavoidable complications.

   First, the MPI standard does not guarantee that every process can
perform file I/O (at least, not using C stdio routines)--in general, we
may only assume that process 0 is capable of I/O.(1) So, if we want to
export the wisdom from a single process to a file, we must first export
the wisdom to a string, then send it to process 0, then write it to a
file.

   Second, in principle we may want to have separate wisdom for every
process, since in general the processes may run on different hardware
even for a single MPI program. However, in practice FFTW's MPI code is
designed for the case of homogeneous hardware (*note Load balancing::),
and in this case it is convenient to use the same wisdom for every
process. Thus, we need a mechanism to synchronize the wisdom.

   To address both of these problems, FFTW provides the following two
functions:

     void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
     void fftw_mpi_gather_wisdom(MPI_Comm comm);

   Given a communicator 'comm', 'fftw_mpi_broadcast_wisdom' will
broadcast the wisdom from process 0 to all other processes. Conversely,
'fftw_mpi_gather_wisdom' will collect wisdom from all processes onto
process 0. (If the plans created for the same problem by different
processes are not the same, 'fftw_mpi_gather_wisdom' will arbitrarily
choose one of the plans.)
Both of these functions may result in
suboptimal plans for different processes if the processes are running on
non-identical hardware. Both of these functions are _collective_ calls,
which means that they must be executed by all processes in the
communicator.

   So, for example, a typical code snippet to import wisdom from a file
and use it on all processes would be:

     {
         int rank;

         fftw_mpi_init();
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
         if (rank == 0) fftw_import_wisdom_from_filename("mywisdom");
         fftw_mpi_broadcast_wisdom(MPI_COMM_WORLD);
     }

   (Note that we must call 'fftw_mpi_init' before importing any wisdom
that might contain MPI plans.) Similarly, a typical code snippet to
export wisdom from all processes to a file is:

     {
         int rank;

         fftw_mpi_gather_wisdom(MPI_COMM_WORLD);
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
         if (rank == 0) fftw_export_wisdom_to_filename("mywisdom");
     }

   ---------- Footnotes ----------

   (1) In fact, even this assumption is not technically guaranteed by
the standard, although it seems to be universal in actual MPI
implementations and is widely assumed by MPI-using software.
Technically, you need to query the 'MPI_IO' attribute of
'MPI_COMM_WORLD' with 'MPI_Attr_get'. If this attribute is
'MPI_PROC_NULL', no I/O is possible. If it is 'MPI_ANY_SOURCE', any
process can perform I/O. Otherwise, it is the rank of a process that can
perform I/O ... but since it is not guaranteed to yield the _same_ rank
on all processes, you have to do an 'MPI_Allreduce' of some kind if you
want all processes to agree about which is going to do I/O. And even
then, the standard only guarantees that this process can perform output,
but not input. See e.g. 'Parallel Programming with MPI' by P. S.
Pacheco, section 8.1.3.
Needless to say, in our experience virtually no
MPI programmers worry about this.


File: fftw3.info,  Node: Avoiding MPI Deadlocks,  Next: FFTW MPI Performance Tips,  Prev: FFTW MPI Wisdom,  Up: Distributed-memory FFTW with MPI

6.9 Avoiding MPI Deadlocks
==========================

An MPI program can _deadlock_ if one process is waiting for a message
from another process that never gets sent. To avoid deadlocks when
using FFTW's MPI routines, it is important to know which functions are
_collective_: that is, which functions must _always_ be called in the
_same order_ from _every_ process in a given communicator.
('MPI_Barrier' is the canonical example of a collective function in the
MPI standard.)

   The functions in FFTW that are _always_ collective are: every
function beginning with 'fftw_mpi_plan', as well as
'fftw_mpi_broadcast_wisdom' and 'fftw_mpi_gather_wisdom'. Also, the
following functions from the ordinary FFTW interface are collective when
they are applied to a plan created by an 'fftw_mpi_plan' function:
'fftw_execute', 'fftw_destroy_plan', and 'fftw_flops'.


File: fftw3.info,  Node: FFTW MPI Performance Tips,  Next: Combining MPI and Threads,  Prev: Avoiding MPI Deadlocks,  Up: Distributed-memory FFTW with MPI

6.10 FFTW MPI Performance Tips
==============================

In this section, we collect a few tips on getting the best performance
out of FFTW's MPI transforms.

   First, because of the 1d block distribution, FFTW's parallelization
is currently limited by the size of the first dimension.
(Multidimensional block distributions may be supported by a future
version.) More generally, you should ideally arrange the dimensions so
that FFTW can divide them equally among the processes. *Note Load
balancing::.

   Second, if it is not too inconvenient, you should consider working
with transposed output for multidimensional plans, as this saves a
considerable amount of communication. *Note Transposed
distributions::.

   Third, the fastest choices are generally either an in-place transform
or an out-of-place transform with the 'FFTW_DESTROY_INPUT' flag (which
allows the input array to be used as scratch space). In-place is
especially beneficial if the amount of data per process is large.

   Fourth, if you have multiple arrays to transform at once, rather than
calling FFTW's MPI transforms several times it usually seems to be
faster to interleave the data and use the advanced interface. (This
groups the communications together instead of requiring separate
messages for each transform.)


File: fftw3.info,  Node: Combining MPI and Threads,  Next: FFTW MPI Reference,  Prev: FFTW MPI Performance Tips,  Up: Distributed-memory FFTW with MPI

6.11 Combining MPI and Threads
==============================

In certain cases, it may be advantageous to combine MPI
(distributed-memory) and threads (shared-memory) parallelization. FFTW
supports this, with certain caveats. For example, if you have a cluster
of 4-processor shared-memory nodes, you may want to use threads within
the nodes and MPI between the nodes, instead of MPI for all
parallelization.

   In particular, it is possible to seamlessly combine the MPI FFTW
routines with the multi-threaded FFTW routines (*note Multi-threaded
FFTW::).
However, some care must be taken in the initialization code,
which should look something like this:

     int threads_ok;

     int main(int argc, char **argv)
     {
         int provided;
         MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
         threads_ok = provided >= MPI_THREAD_FUNNELED;

         if (threads_ok) threads_ok = fftw_init_threads();
         fftw_mpi_init();

         ...
         if (threads_ok) fftw_plan_with_nthreads(...);
         ...

         MPI_Finalize();
     }

   First, note that instead of calling 'MPI_Init', you should call
'MPI_Init_thread', which is the initialization routine defined by the
MPI-2 standard to indicate to MPI that your program will be
multithreaded. We pass 'MPI_THREAD_FUNNELED', which indicates that we
will only call MPI routines from the main thread. (FFTW will launch
additional threads internally, but the extra threads will not call MPI
code.) (You may also pass 'MPI_THREAD_SERIALIZED' or
'MPI_THREAD_MULTIPLE', which request additional multithreading support
from the MPI implementation, but this is not required by FFTW.) The
'provided' parameter returns the level of thread support actually
provided by your MPI implementation; this _must_ be at least
'MPI_THREAD_FUNNELED' if you want to call the FFTW threads routines, so
we define a global variable 'threads_ok' to record this. You should
only call 'fftw_init_threads' or 'fftw_plan_with_nthreads' if
'threads_ok' is true. For more information on thread safety in MPI, see
the MPI and Threads
(http://www.mpi-forum.org/docs/mpi-20-html/node162.htm) section of the
MPI-2 standard.

   Second, we must call 'fftw_init_threads' _before_ 'fftw_mpi_init'.
This is critical for technical reasons having to do with how FFTW
initializes its list of algorithms.

   Then, if you call 'fftw_plan_with_nthreads(N)', _every_ MPI process
will launch (up to) 'N' threads to parallelize its transforms.

   For example, in the hypothetical cluster of 4-processor nodes, you
might wish to launch only a single MPI process per node, and then call
'fftw_plan_with_nthreads(4)' on each process to use all processors in
the nodes.

   This may or may not be faster than simply using as many MPI processes
as you have processors, however. On the one hand, using threads within
a node eliminates the need for explicit message passing within the node.
On the other hand, FFTW's transpose routines are not multi-threaded, and
this means that the communications that do take place will not benefit
from parallelization within the node. Moreover, many MPI
implementations already have optimizations to exploit shared memory when
it is available, so adding the multithreaded FFTW on top of this may be
superfluous.


File: fftw3.info,  Node: FFTW MPI Reference,  Next: FFTW MPI Fortran Interface,  Prev: Combining MPI and Threads,  Up: Distributed-memory FFTW with MPI

6.12 FFTW MPI Reference
=======================

This section provides a complete reference to all FFTW MPI functions,
datatypes, and constants. See also *note FFTW Reference:: for
information on functions and types in common with the serial interface.

* Menu:

* MPI Files and Data Types::
* MPI Initialization::
* Using MPI Plans::
* MPI Data Distribution Functions::
* MPI Plan Creation::
* MPI Wisdom Communication::


File: fftw3.info,  Node: MPI Files and Data Types,  Next: MPI Initialization,  Prev: FFTW MPI Reference,  Up: FFTW MPI Reference

6.12.1 MPI Files and Data Types
-------------------------------

All programs using FFTW's MPI support should include its header file:

     #include <fftw3-mpi.h>

   Note that this header file includes the serial-FFTW 'fftw3.h' header
file, and also the 'mpi.h' header file for MPI, so you need not include
those files separately.

   You must also link to _both_ the FFTW MPI library and to the serial
FFTW library. On Unix, this means adding '-lfftw3_mpi -lfftw3 -lm' at
the end of the link command.

   Different precisions are handled as in the serial interface: *Note
Precision::. That is, 'fftw_' functions become 'fftwf_' (in single
precision) etcetera, and the libraries become '-lfftw3f_mpi -lfftw3f
-lm' etcetera on Unix. Long-double precision is supported in MPI, but
quad precision ('fftwq_') is not due to the lack of MPI support for this
type.


File: fftw3.info,  Node: MPI Initialization,  Next: Using MPI Plans,  Prev: MPI Files and Data Types,  Up: FFTW MPI Reference

6.12.2 MPI Initialization
-------------------------

Before calling any other FFTW MPI ('fftw_mpi_') function, and before
importing any wisdom for MPI problems, you must call:

     void fftw_mpi_init(void);

   If FFTW threads support is used, however, 'fftw_mpi_init' should be
called _after_ 'fftw_init_threads' (*note Combining MPI and Threads::).
Calling 'fftw_mpi_init' additional times (before 'fftw_mpi_cleanup') has
no effect.

   If you want to deallocate all persistent data and reset FFTW to the
pristine state it was in when you started your program, you can call:

     void fftw_mpi_cleanup(void);

   (This calls 'fftw_cleanup', so you need not call the serial cleanup
routine too, although it is safe to do so.) After calling
'fftw_mpi_cleanup', all existing plans become undefined, and you should
not attempt to execute or destroy them. You must call 'fftw_mpi_init'
again after 'fftw_mpi_cleanup' if you want to resume using the MPI FFTW
routines.


File: fftw3.info,  Node: Using MPI Plans,  Next: MPI Data Distribution Functions,  Prev: MPI Initialization,  Up: FFTW MPI Reference

6.12.3 Using MPI Plans
----------------------

Once an MPI plan is created, you can execute and destroy it using
'fftw_execute', 'fftw_destroy_plan', and the other functions in the
serial interface that operate on generic plans (*note Using Plans::).

   The 'fftw_execute' and 'fftw_destroy_plan' functions, applied to MPI
plans, are _collective_ calls: they must be called by all processes in
the communicator that was used to create the plan.

   You must _not_ use the serial new-array plan-execution functions
'fftw_execute_dft' and so on (*note New-array Execute Functions::) with
MPI plans. Such functions are specialized to the problem type, and
there are specific new-array execute functions for MPI plans:

     void fftw_mpi_execute_dft(fftw_plan p, fftw_complex *in, fftw_complex *out);
     void fftw_mpi_execute_dft_r2c(fftw_plan p, double *in, fftw_complex *out);
     void fftw_mpi_execute_dft_c2r(fftw_plan p, fftw_complex *in, double *out);
     void fftw_mpi_execute_r2r(fftw_plan p, double *in, double *out);

   These functions have the same restrictions as those of the serial
new-array execute functions.
They are _always_ safe to apply to the
_same_ 'in' and 'out' arrays that were used to create the plan. They
can only be applied to new arrays if those arrays have the same types,
dimensions, in-placeness, and alignment as the original arrays, where
the best way to ensure the same alignment is to use FFTW's 'fftw_malloc'
and related allocation functions for all arrays (*note Memory
Allocation::). Note that distributed transposes (*note FFTW MPI
Transposes::) use 'fftw_mpi_execute_r2r', since they count as rank-zero
r2r plans from FFTW's perspective.


File: fftw3.info,  Node: MPI Data Distribution Functions,  Next: MPI Plan Creation,  Prev: Using MPI Plans,  Up: FFTW MPI Reference

6.12.4 MPI Data Distribution Functions
--------------------------------------

As described above (*note MPI Data Distribution::), in order to allocate
your arrays, _before_ creating a plan, you must first call one of the
following routines to determine the required allocation size and the
portion of the array locally stored on a given process. The 'MPI_Comm'
communicator passed here must be equivalent to the communicator used
below for plan creation.

   The basic interface for multidimensional transforms consists of the
functions:

     ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     ptrdiff_t fftw_mpi_local_size_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                      MPI_Comm comm,
                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     ptrdiff_t fftw_mpi_local_size(int rnk, const ptrdiff_t *n, MPI_Comm comm,
                                   ptrdiff_t *local_n0, ptrdiff_t *local_0_start);

     ptrdiff_t fftw_mpi_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                                                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
     ptrdiff_t fftw_mpi_local_size_3d_transposed(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                                 MPI_Comm comm,
                                                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
     ptrdiff_t fftw_mpi_local_size_transposed(int rnk, const ptrdiff_t *n, MPI_Comm comm,
                                              ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                              ptrdiff_t *local_n1, ptrdiff_t *local_1_start);

   These functions return the number of elements to allocate (complex
numbers for DFT/r2c/c2r plans, real numbers for r2r plans), whereas the
'local_n0' and 'local_0_start' arguments return the portion
('local_0_start' to 'local_0_start + local_n0 - 1') of the first
dimension of an n[0] x n[1] x n[2] x ... x n[d-1] array that is stored
on the local process. *Note Basic and advanced distribution
interfaces::. For 'FFTW_MPI_TRANSPOSED_OUT' plans, the '_transposed'
variants are useful in order to also return the local portion of the
first dimension in the n[1] x n[0] x n[2] x ... x n[d-1] transposed
output. *Note Transposed distributions::.
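
   To make the meaning of 'local_n0' and 'local_0_start' concrete, here
is a minimal sketch in plain C (it calls neither FFTW nor MPI) of a 1d
block distribution of the first dimension. It assumes, purely for
illustration, a block size of ceil(n0/P) for P processes; the names
'block_slice', 'default_block', and 'slice_for_rank' are hypothetical
helpers and are not part of the FFTW API. It also does not model the
returned allocation size, which may exceed the local portion. In a real
program you should always use the values returned by the 'local_size'
functions above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; NOT part of the FFTW API. */
typedef struct {
    ptrdiff_t local_n0;      /* number of first-dimension rows on this rank */
    ptrdiff_t local_0_start; /* global index of the first local row */
} block_slice;

/* Assumed block size: ceil(n0 / nprocs). */
ptrdiff_t default_block(ptrdiff_t n0, int nprocs)
{
    assert(n0 > 0 && nprocs > 0);
    return (n0 + nprocs - 1) / nprocs;
}

/* Rows [local_0_start, local_0_start + local_n0) held by 'rank' when the
   first dimension is cut into consecutive blocks of 'block' rows.  Ranks
   past the end of the array get an empty slice, reported here with
   local_0_start = n0 (a convention of this sketch only). */
block_slice slice_for_rank(ptrdiff_t n0, ptrdiff_t block, int rank)
{
    block_slice s;
    ptrdiff_t start, remaining;

    assert(n0 > 0 && block > 0 && rank >= 0);
    start = (ptrdiff_t)rank * block;
    remaining = n0 - start;
    if (remaining <= 0) {
        s.local_n0 = 0;
        s.local_0_start = n0;
    } else {
        s.local_n0 = remaining < block ? remaining : block;
        s.local_0_start = start;
    }
    return s;
}
```

   Under these assumptions, a first dimension of n0 = 100 split over 8
processes gives a block of 13 rows: ranks 0 through 6 each hold 13 rows,
and rank 7 holds the remaining 9 rows starting at row 91. When there
are more processes than rows, the trailing ranks hold no data at all,
which is one reason to choose dimensions that divide evenly.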

   The advanced interface for multidimensional transforms is:

     ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                                        ptrdiff_t block0, MPI_Comm comm,
                                        ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     ptrdiff_t fftw_mpi_local_size_many_transposed(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                                                   ptrdiff_t block0, ptrdiff_t block1, MPI_Comm comm,
                                                   ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                   ptrdiff_t *local_n1, ptrdiff_t *local_1_start);

   These differ from the basic interface in only two ways. First, they
allow you to specify block sizes 'block0' and 'block1' (the latter for
the transposed output); you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use
FFTW's default block size as in the basic interface. Second, you can
pass a 'howmany' parameter, corresponding to the advanced planning
interface below: this is for transforms of contiguous 'howmany'-tuples
of numbers ('howmany = 1' in the basic interface).

   The corresponding basic and advanced routines for one-dimensional
transforms (currently only complex DFTs) are:

     ptrdiff_t fftw_mpi_local_size_1d(
                  ptrdiff_t n0, MPI_Comm comm, int sign, unsigned flags,
                  ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                  ptrdiff_t *local_no, ptrdiff_t *local_o_start);
     ptrdiff_t fftw_mpi_local_size_many_1d(
                  ptrdiff_t n0, ptrdiff_t howmany,
                  MPI_Comm comm, int sign, unsigned flags,
                  ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                  ptrdiff_t *local_no, ptrdiff_t *local_o_start);

   As above, the return value is the number of elements to allocate
(complex numbers, for complex DFTs). The 'local_ni' and 'local_i_start'
arguments return the portion ('local_i_start' to 'local_i_start +
local_ni - 1') of the 1d array that is stored on this process for the
transform _input_, and 'local_no' and 'local_o_start' are the
corresponding quantities for the _output_.
The 'sign' ('FFTW_FORWARD' or
'FFTW_BACKWARD') and 'flags' must match the arguments passed when
creating a plan. Although the inputs and outputs have different data
distributions in general, it is guaranteed that the _output_ data
distribution of an 'FFTW_FORWARD' plan will match the _input_ data
distribution of an 'FFTW_BACKWARD' plan and vice versa; similarly for
the 'FFTW_MPI_SCRAMBLED_OUT' and 'FFTW_MPI_SCRAMBLED_IN' flags. *Note
One-dimensional distributions::.


File: fftw3.info,  Node: MPI Plan Creation,  Next: MPI Wisdom Communication,  Prev: MPI Data Distribution Functions,  Up: FFTW MPI Reference

6.12.5 MPI Plan Creation
------------------------

Complex-data MPI DFTs
.....................

Plans for complex-data DFTs (*note 2d MPI example::) are created by:

     fftw_plan fftw_mpi_plan_dft_1d(ptrdiff_t n0, fftw_complex *in, fftw_complex *out,
                                    MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_2d(ptrdiff_t n0, ptrdiff_t n1,
                                    fftw_complex *in, fftw_complex *out,
                                    MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                    fftw_complex *in, fftw_complex *out,
                                    MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_dft(int rnk, const ptrdiff_t *n,
                                 fftw_complex *in, fftw_complex *out,
                                 MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_many_dft(int rnk, const ptrdiff_t *n,
                                      ptrdiff_t howmany, ptrdiff_t block, ptrdiff_t tblock,
                                      fftw_complex *in, fftw_complex *out,
                                      MPI_Comm comm, int sign, unsigned flags);

   These are similar to their serial counterparts (*note Complex DFTs::)
in specifying the dimensions, sign, and flags of the transform.
The 'comm' argument gives an MPI communicator that specifies the set of
processes to participate in the transform; plan creation is a collective
function that must be called by all processes in the communicator. The
'in' and 'out' pointers refer only to a portion of the overall transform
data (*note MPI Data Distribution::) as specified by the 'local_size'
functions in the previous section. Unless 'flags' contains
'FFTW_ESTIMATE', these arrays are overwritten during plan creation as
for the serial interface. For multi-dimensional transforms, any
dimensions '> 1' are supported; for one-dimensional transforms, only
composite (non-prime) 'n0' are currently supported (unlike the serial
FFTW). Requesting an unsupported transform size will yield a 'NULL'
plan. (As in the serial interface, highly composite sizes generally
yield the best performance.)

   The advanced-interface 'fftw_mpi_plan_many_dft' additionally allows
you to specify the block sizes for the first dimension ('block') of the
n[0] x n[1] x n[2] x ... x n[d-1] input data and the first dimension
('tblock') of the n[1] x n[0] x n[2] x ... x n[d-1] transposed data (at
intermediate steps of the transform, and for the output if
'FFTW_MPI_TRANSPOSED_OUT' is specified in 'flags'). These must be the
same block sizes as were passed to the corresponding 'local_size'
function; you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use FFTW's default
block size as in the basic interface. Also, the 'howmany' parameter
specifies that the transform is of contiguous 'howmany'-tuples rather
than individual complex numbers; this corresponds to the same parameter
in the serial advanced interface (*note Advanced Complex DFTs::) with
'stride = howmany' and 'dist = 1'.

MPI flags
.........

The 'flags' can be any of those for the serial FFTW (*note Planner
Flags::), and in addition may include one or more of the following
MPI-specific flags, which improve performance at the cost of changing
the output or input data formats.

   * 'FFTW_MPI_SCRAMBLED_OUT', 'FFTW_MPI_SCRAMBLED_IN': valid for 1d
     transforms only, these flags indicate that the output/input of the
     transform are in an undocumented "scrambled" order. A forward
     'FFTW_MPI_SCRAMBLED_OUT' transform can be inverted by a backward
     'FFTW_MPI_SCRAMBLED_IN' (times the usual 1/N normalization). *Note
     One-dimensional distributions::.

   * 'FFTW_MPI_TRANSPOSED_OUT', 'FFTW_MPI_TRANSPOSED_IN': valid for
     multidimensional ('rnk > 1') transforms only, these flags specify
     that the output or input of an n[0] x n[1] x n[2] x ... x n[d-1]
     transform is transposed to n[1] x n[0] x n[2] x ... x n[d-1].
     *Note Transposed distributions::.

Real-data MPI DFTs
..................

Plans for real-input/output (r2c/c2r) DFTs (*note Multi-dimensional MPI
DFTs of Real Data::) are created by:

     fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
                                        double *in, fftw_complex *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_r2c_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                        double *in, fftw_complex *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_r2c(int rnk, const ptrdiff_t *n,
                                     double *in, fftw_complex *out,
                                     MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
                                        fftw_complex *in, double *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_c2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                        fftw_complex *in, double *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_c2r(int rnk, const ptrdiff_t *n,
                                     fftw_complex *in, double *out,
                                     MPI_Comm comm, unsigned flags);

   Similar to the serial interface (*note Real-data DFTs::), these
transform logically n[0] x n[1] x n[2] x ... x n[d-1] real data to/from
n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) complex data, representing
the non-redundant half of the conjugate-symmetric output of a real-input
DFT (*note Multi-dimensional Transforms::). However, the real array
must be stored within a padded n[0] x n[1] x n[2] x ... x [2 (n[d-1]/2
+ 1)] array (much like the in-place serial r2c transforms, but here for
out-of-place transforms as well). Currently, only multi-dimensional
('rnk > 1') r2c/c2r transforms are supported (requesting a plan for 'rnk
= 1' will yield 'NULL').
As explained above (*note Multi-dimensional
MPI DFTs of Real Data::), the data distribution of both the real and
complex arrays is given by the 'local_size' function called for the
dimensions of the _complex_ array. Similar to the other planning
functions, the input and output arrays are overwritten when the plan is
created except in 'FFTW_ESTIMATE' mode.

   As for the complex DFTs above, there is an advanced interface that
allows you to manually specify block sizes and to transform contiguous
'howmany'-tuples of real/complex numbers:

     fftw_plan fftw_mpi_plan_many_dft_r2c
                   (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                    ptrdiff_t iblock, ptrdiff_t oblock,
                    double *in, fftw_complex *out,
                    MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_many_dft_c2r
                   (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                    ptrdiff_t iblock, ptrdiff_t oblock,
                    fftw_complex *in, double *out,
                    MPI_Comm comm, unsigned flags);

MPI r2r transforms
..................

There are corresponding plan-creation routines for r2r transforms (*note
More DFTs of Real Data::), currently supporting multidimensional ('rnk >
1') transforms only ('rnk = 1' will yield a 'NULL' plan):

     fftw_plan fftw_mpi_plan_r2r_2d(ptrdiff_t n0, ptrdiff_t n1,
                                    double *in, double *out,
                                    MPI_Comm comm,
                                    fftw_r2r_kind kind0, fftw_r2r_kind kind1,
                                    unsigned flags);
     fftw_plan fftw_mpi_plan_r2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                    double *in, double *out,
                                    MPI_Comm comm,
                                    fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2,
                                    unsigned flags);
     fftw_plan fftw_mpi_plan_r2r(int rnk, const ptrdiff_t *n,
                                 double *in, double *out,
                                 MPI_Comm comm, const fftw_r2r_kind *kind,
                                 unsigned flags);
     fftw_plan fftw_mpi_plan_many_r2r(int rnk, const ptrdiff_t *n,
                                      ptrdiff_t howmany,
                                      ptrdiff_t iblock, ptrdiff_t oblock,
                                      double *in, double *out,
                                      MPI_Comm comm, const fftw_r2r_kind *kind,
                                      unsigned flags);

   The parameters are much the same as for the complex DFTs above,
except that the arrays are of real numbers (and hence the outputs of the
'local_size' data-distribution functions should be interpreted as counts
of real rather than complex numbers). Also, the 'kind' parameters
specify the r2r kinds along each dimension as for the serial interface
(*note Real-to-Real Transform Kinds::). *Note Other Multi-dimensional
Real-data MPI Transforms::.

MPI transposition
.................

FFTW also provides routines to plan a transpose of a distributed 'n0' by
'n1' array of real numbers, or an array of 'howmany'-tuples of real
numbers with specified block sizes (*note FFTW MPI Transposes::):

     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                       double *in, double *out,
                                       MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_many_transpose
                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      ptrdiff_t block0, ptrdiff_t block1,
                      double *in, double *out, MPI_Comm comm, unsigned flags);

   These plans are used with the 'fftw_mpi_execute_r2r' new-array
execute function (*note Using MPI Plans::), since they count as (rank
zero) r2r plans from FFTW's perspective.


File: fftw3.info,  Node: MPI Wisdom Communication,  Prev: MPI Plan Creation,  Up: FFTW MPI Reference

6.12.6 MPI Wisdom Communication
-------------------------------

To facilitate synchronizing wisdom among the different MPI processes, we
provide two functions:

     void fftw_mpi_gather_wisdom(MPI_Comm comm);
     void fftw_mpi_broadcast_wisdom(MPI_Comm comm);

   The 'fftw_mpi_gather_wisdom' function gathers all wisdom in the given
communicator 'comm' to the process of rank 0 in the communicator: that
process obtains the union of all wisdom on all the processes. As a side
effect, some other processes will gain additional wisdom from other
processes, but only process 0 will gain the complete union.

   The 'fftw_mpi_broadcast_wisdom' function does the reverse: it exports
wisdom from process 0 in 'comm' to all other processes in the
communicator, replacing any wisdom they currently have.

   *Note FFTW MPI Wisdom::.


File: fftw3.info,  Node: FFTW MPI Fortran Interface,  Prev: FFTW MPI Reference,  Up: Distributed-memory FFTW with MPI

6.13 FFTW MPI Fortran Interface
===============================

The FFTW MPI interface is callable from modern Fortran compilers
supporting the Fortran 2003 'iso_c_binding' standard for calling C
functions. As described in *note Calling FFTW from Modern Fortran::,
this means that you can directly call FFTW's C interface from Fortran
with only minor changes in syntax. There are, however, a few things
specific to the MPI interface to keep in mind:

   * Instead of including 'fftw3.f03' as in *note Overview of Fortran
     interface::, you should 'include 'fftw3-mpi.f03'' (after 'use,
     intrinsic :: iso_c_binding' as before). The 'fftw3-mpi.f03' file
     includes 'fftw3.f03', so you should _not_ 'include' them both
     yourself. (You will also want to include the MPI header file,
     usually via 'include 'mpif.h'' or similar, although this is not
     needed by 'fftw3-mpi.f03' per se.) (To use the 'fftwl_' 'long
     double' extended-precision routines in supporting compilers, you
     should include 'fftw3l-mpi.f03' in _addition_ to 'fftw3-mpi.f03'.
     *Note Extended and quadruple precision in Fortran::.)

   * Because of the different storage conventions between C and Fortran,
     you reverse the order of your array dimensions when passing them to
     FFTW (*note Reversing array dimensions::). This is merely a
     difference in notation and incurs no performance overhead.
     However, it means that, whereas in C the _first_ dimension is
     distributed, in Fortran the _last_ dimension of your array is
     distributed.

   * In Fortran, communicators are stored as 'integer' types; there is
     no 'MPI_Comm' type, nor is there any way to access a C 'MPI_Comm'.
     Fortunately, this is taken care of for you by the FFTW Fortran
     interface: whenever the C interface expects an 'MPI_Comm' type, you
     should pass the Fortran communicator as an 'integer'.(1)

   * Because you need to call the 'local_size' function to find out how
     much space to allocate, and this may be _larger_ than the local
     portion of the array (*note MPI Data Distribution::), you should
     _always_ allocate your arrays dynamically using FFTW's allocation
     routines as described in *note Allocating aligned memory in
     Fortran::. (Coincidentally, this also provides the best
     performance by guaranteeing proper data alignment.)

   * Because all sizes in the MPI FFTW interface are declared as
     'ptrdiff_t' in C, you should use 'integer(C_INTPTR_T)' in Fortran
     (*note FFTW Fortran type reference::).

   * In Fortran, because of the language semantics, we generally
     recommend using the new-array execute functions for all plans, even
     in the common case where you are executing the plan on the same
     arrays for which the plan was created (*note Plan execution in
     Fortran::). However, note that in the MPI interface these
     functions are changed: 'fftw_execute_dft' becomes
     'fftw_mpi_execute_dft', etcetera. *Note Using MPI Plans::.

   For example, here is a Fortran code snippet to perform a distributed
L x M complex DFT in-place. (This assumes you have already initialized
MPI with 'MPI_Init' and have also performed 'call fftw_mpi_init'.)

       use, intrinsic :: iso_c_binding
       include 'fftw3-mpi.f03'
       integer(C_INTPTR_T), parameter :: L = ...
       integer(C_INTPTR_T), parameter :: M = ...
       type(C_PTR) :: plan, cdata
       complex(C_DOUBLE_COMPLEX), pointer :: data(:,:)
       integer(C_INTPTR_T) :: i, j, alloc_local, local_M, local_j_offset

     !   get local data size and allocate (note dimension reversal)
       alloc_local = fftw_mpi_local_size_2d(M, L, MPI_COMM_WORLD, &
                                            local_M, local_j_offset)
       cdata = fftw_alloc_complex(alloc_local)
       call c_f_pointer(cdata, data, [L,local_M])

     !   create MPI plan for in-place forward DFT (note dimension reversal)
       plan = fftw_mpi_plan_dft_2d(M, L, data, data, MPI_COMM_WORLD, &
                                   FFTW_FORWARD, FFTW_MEASURE)

     ! initialize data to some function my_function(i,j)
       do j = 1, local_M
         do i = 1, L
           data(i, j) = my_function(i, j + local_j_offset)
         end do
       end do

     ! compute transform (as many times as desired)
       call fftw_mpi_execute_dft(plan, data, data)

       call fftw_destroy_plan(plan)
       call fftw_free(cdata)

   Note that we called 'fftw_mpi_local_size_2d' and
'fftw_mpi_plan_dft_2d' with the dimensions in reversed order, since an L
x M Fortran array is viewed by FFTW in C as an M x L array. This means
that the array was distributed over the 'M' dimension, the local portion
of which is an L x local_M array in Fortran. (You must _not_ use an
'allocate' statement to allocate an L x local_M array, however; you must
allocate 'alloc_local' complex numbers, which may be greater than 'L *
local_M', in order to reserve space for intermediate steps of the
transform.) Finally, we mention that because C's array indices are
zero-based, the 'local_j_offset' argument can conveniently be
interpreted as an offset in the 1-based 'j' index (rather than as a
starting index as in C).

   If instead you had used the 'ior(FFTW_MEASURE,
FFTW_MPI_TRANSPOSED_OUT)' flag, the output of the transform would be a
transposed M x local_L array, associated with the _same_ 'cdata'
allocation (since the transform is in-place), and which you could
declare with:

       complex(C_DOUBLE_COMPLEX), pointer :: tdata(:,:)
       ...
4718 call c_f_pointer(cdata, tdata, [M,local_L]) 4719 4720 where 'local_L' would have been obtained by changing the 4721'fftw_mpi_local_size_2d' call to: 4722 4723 alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, & 4724 local_M, local_j_offset, local_L, local_i_offset) 4725 4726 ---------- Footnotes ---------- 4727 4728 (1) Technically, this is because you aren't actually calling the C 4729functions directly. You are calling wrapper functions that translate 4730the communicator with 'MPI_Comm_f2c' before calling the ordinary C 4731interface. This is all done transparently, however, since the 4732'fftw3-mpi.f03' interface file renames the wrappers so that they are 4733called in Fortran with the same names as the C interface functions. 4734 4735 4736File: fftw3.info, Node: Calling FFTW from Modern Fortran, Next: Calling FFTW from Legacy Fortran, Prev: Distributed-memory FFTW with MPI, Up: Top 4737 47387 Calling FFTW from Modern Fortran 4739********************************** 4740 4741Fortran 2003 standardized ways for Fortran code to call C libraries, and 4742this allows us to support a direct translation of the FFTW C API into 4743Fortran. Compared to the legacy Fortran 77 interface (*note Calling 4744FFTW from Legacy Fortran::), this direct interface offers many 4745advantages, especially compile-time type-checking and aligned memory 4746allocation. As of this writing, support for these C interoperability 4747features seems widespread, having been implemented in nearly all major 4748Fortran compilers (e.g. GNU, Intel, IBM, Oracle/Solaris, Portland 4749Group, NAG). 4750 4751 This chapter documents that interface. For the most part, since this 4752interface allows Fortran to call the C interface directly, the usage is 4753identical to C translated to Fortran syntax. However, there are a few 4754subtle points such as memory allocation, wisdom, and data types that 4755deserve closer attention. 

* Menu:

* Overview of Fortran interface::
* Reversing array dimensions::
* FFTW Fortran type reference::
* Plan execution in Fortran::
* Allocating aligned memory in Fortran::
* Accessing the wisdom API from Fortran::
* Defining an FFTW module::


File: fftw3.info, Node: Overview of Fortran interface, Next: Reversing array dimensions, Prev: Calling FFTW from Modern Fortran, Up: Calling FFTW from Modern Fortran

7.1 Overview of Fortran interface
=================================

FFTW provides a file 'fftw3.f03', found in the same directory as
'fftw3.h' (the C header file), that defines Fortran 2003 interfaces for
all of its C routines except the MPI routines, which are described
elsewhere. In any Fortran subroutine where you want to use FFTW
functions, you should begin with:

     use, intrinsic :: iso_c_binding
     include 'fftw3.f03'

   This includes the interface definitions and the standard
'iso_c_binding' module (which defines the equivalents of C types). You
can also put the FFTW functions into a module if you prefer (*note
Defining an FFTW module::).

   At this point, you can now call anything in the FFTW C interface
directly, almost exactly as in C other than minor changes in syntax.
For example:

     type(C_PTR) :: plan
     complex(C_DOUBLE_COMPLEX), dimension(1024,1000) :: in, out
     plan = fftw_plan_dft_2d(1000,1024, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
     ...
     call fftw_execute_dft(plan, in, out)
     ...
     call fftw_destroy_plan(plan)

   A few important things to keep in mind are:

   * FFTW plans are 'type(C_PTR)'. Other C types are mapped in the
     obvious way via the 'iso_c_binding' standard: 'int' turns into
     'integer(C_INT)', 'fftw_complex' turns into
     'complex(C_DOUBLE_COMPLEX)', 'double' turns into 'real(C_DOUBLE)',
     and so on. *Note FFTW Fortran type reference::.

   * Functions in C become functions in Fortran if they have a return
     value, and subroutines in Fortran otherwise.

   * The ordering of the Fortran array dimensions must be _reversed_
     when they are passed to the FFTW plan creation, thanks to
     differences in array indexing conventions (*note Multi-dimensional
     Array Format::). This is _unlike_ the legacy Fortran interface
     (*note Fortran-interface routines::), which reversed the dimensions
     for you. *Note Reversing array dimensions::.

   * Using ordinary Fortran array declarations like this works, but may
     yield suboptimal performance because the data may not be aligned
     to exploit SIMD instructions on modern processors (*note SIMD
     alignment and fftw_malloc::). Better performance will often be
     obtained by allocating with 'fftw_alloc'. *Note Allocating
     aligned memory in Fortran::.

   * Similar to the legacy Fortran interface (*note FFTW Execution in
     Fortran::), we currently recommend _not_ using 'fftw_execute' but
     rather using the more specialized functions like 'fftw_execute_dft'
     (*note New-array Execute Functions::). However, you should execute
     the plan on the _same arrays_ as the ones for which you created the
     plan, unless you are especially careful. *Note Plan execution in
     Fortran::. To prevent you from using 'fftw_execute' by mistake,
     the 'fftw3.f03' file does not provide an 'fftw_execute' interface
     declaration.

   * Multiple planner flags are combined with 'ior' (equivalent to '|'
     in C). For example, 'FFTW_MEASURE | FFTW_DESTROY_INPUT' becomes
     'ior(FFTW_MEASURE, FFTW_DESTROY_INPUT)'. (You can also use '+' as
     long as you don't try to include a given flag more than once.)
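
   The last point above (why 'ior' is safer than '+') can be
illustrated with a small sketch. It does not use FFTW at all: the two
constants 'DEMO_FLAG_A' and 'DEMO_FLAG_B' are hypothetical single-bit
values standing in for planner flags, and the module and function names
are made up for this illustration. Real code should take the 'FFTW_'
constants from 'fftw3.f03'.

```fortran
module flag_combine_sketch
  use, intrinsic :: iso_c_binding
  implicit none
  ! Hypothetical single-bit flags standing in for FFTW_ planner constants;
  ! real code takes the values from fftw3.f03 rather than defining them.
  integer(C_INT), parameter :: DEMO_FLAG_A = 1
  integer(C_INT), parameter :: DEMO_FLAG_B = 64
contains
  ! Combine a list of flags with bitwise OR: repeating a flag is harmless.
  pure function combine_with_ior(flags) result(combined)
    integer(C_INT), intent(in) :: flags(:)
    integer(C_INT) :: combined
    integer :: k
    combined = 0
    do k = 1, size(flags)
      combined = ior(combined, flags(k))
    end do
  end function combine_with_ior

  ! Combine the same list with '+': repeating a flag changes the result.
  pure function combine_with_plus(flags) result(combined)
    integer(C_INT), intent(in) :: flags(:)
    integer(C_INT) :: combined
    combined = sum(flags)
  end function combine_with_plus
end module flag_combine_sketch
```

   With these stand-in values, both functions give 65 for the list
'[DEMO_FLAG_A, DEMO_FLAG_B]'. If 'DEMO_FLAG_A' is listed twice,
'combine_with_ior' still gives 65, whereas 'combine_with_plus' gives 66,
which no longer encodes the intended pair of flags.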

* Menu:

* Extended and quadruple precision in Fortran::


File: fftw3.info, Node: Extended and quadruple precision in Fortran, Prev: Overview of Fortran interface, Up: Overview of Fortran interface

7.1.1 Extended and quadruple precision in Fortran
-------------------------------------------------

If FFTW is compiled in 'long double' (extended) precision (*note
Installation and Customization::), you may be able to call the resulting
'fftwl_' routines (*note Precision::) from Fortran if your compiler
supports the 'C_LONG_DOUBLE_COMPLEX' type code.

   Because some Fortran compilers do not support
'C_LONG_DOUBLE_COMPLEX', the 'fftwl_' declarations are segregated into a
separate interface file 'fftw3l.f03', which you should include _in
addition_ to 'fftw3.f03' (which declares precision-independent 'FFTW_'
constants):

     use, intrinsic :: iso_c_binding
     include 'fftw3.f03'
     include 'fftw3l.f03'

   We also support using the nonstandard '__float128'
quadruple-precision type provided by recent versions of 'gcc' on 32- and
64-bit x86 hardware (*note Installation and Customization::), using the
corresponding 'real(16)' and 'complex(16)' types supported by
'gfortran'. The quadruple-precision 'fftwq_' functions (*note
Precision::) are declared in a 'fftw3q.f03' interface file, which should
be included in addition to 'fftw3.f03', as above. You should also link
with '-lfftw3q -lquadmath -lm' as in C.


File: fftw3.info, Node: Reversing array dimensions, Next: FFTW Fortran type reference, Prev: Overview of Fortran interface, Up: Calling FFTW from Modern Fortran

7.2 Reversing array dimensions
==============================

A minor annoyance in calling FFTW from Fortran is that FFTW's array
dimensions are defined in the C convention (row-major order), while
Fortran's array dimensions are the opposite convention (column-major
order). *Note Multi-dimensional Array Format::. This is just a
bookkeeping difference, with no effect on performance. The only
consequence of this is that, whenever you create an FFTW plan for a
multi-dimensional transform, you must always _reverse the ordering of
the dimensions_.

   For example, consider the three-dimensional (L x M x N) arrays:

     complex(C_DOUBLE_COMPLEX), dimension(L,M,N) :: in, out

   To plan a DFT for these arrays using 'fftw_plan_dft_3d', you could
do:

     plan = fftw_plan_dft_3d(N,M,L, in,out, FFTW_FORWARD,FFTW_ESTIMATE)

   That is, from FFTW's perspective this is an N x M x L array. _No
data transposition need occur_, as this is _only notation_. Similarly,
to use the more generic routine 'fftw_plan_dft' with the same arrays,
you could do the following (the size array is named 'dims' because
Fortran is case-insensitive, so 'n' would clash with 'N'):

     integer(C_INT), dimension(3) :: dims
     dims = [N,M,L]
     plan = fftw_plan_dft(3, dims, in,out, FFTW_FORWARD,FFTW_ESTIMATE)

   Note, by the way, that this is different from the legacy Fortran
interface (*note Fortran-interface routines::), which automatically
reverses the order of the array dimension for you. Here, you are
calling the C interface directly, so there is no "translation" layer.

   An important thing to keep in mind is the implication of this for
multidimensional real-to-complex transforms (*note Multi-Dimensional
DFTs of Real Data::). In C, a multidimensional real-to-complex DFT
chops the last dimension roughly in half (N x M x L real input goes to N
x M x L/2+1 complex output). In Fortran, because the array dimension
notation is reversed, the _first_ dimension of the complex data is
chopped roughly in half. For example consider the 'r2c' transform of L
x M x N real input in Fortran:

     type(C_PTR) :: plan
     real(C_DOUBLE), dimension(L,M,N) :: in
     complex(C_DOUBLE_COMPLEX), dimension(L/2+1,M,N) :: out
     plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
     ...
     call fftw_execute_dft_r2c(plan, in, out)

   Alternatively, for an in-place r2c transform, as described in the C
documentation we must _pad_ the _first_ dimension of the real input with
an extra two entries (which are ignored by FFTW) so as to leave enough
space for the complex output. The input is _allocated_ as a 2[L/2+1] x
M x N array, even though only L x M x N of it is actually used. In this
example, we will allocate the array as a pointer type, using
'fftw_alloc' to ensure aligned memory for maximum performance (*note
Allocating aligned memory in Fortran::); this also makes it easy to
reference the same memory as both a real array and a complex array.

     real(C_DOUBLE), pointer :: in(:,:,:)
     complex(C_DOUBLE_COMPLEX), pointer :: out(:,:,:)
     type(C_PTR) :: plan, data
     data = fftw_alloc_complex(int((L/2+1) * M * N, C_SIZE_T))
     call c_f_pointer(data, in, [2*(L/2+1),M,N])
     call c_f_pointer(data, out, [L/2+1,M,N])
     plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
     ...
     call fftw_execute_dft_r2c(plan, in, out)
     ...
     call fftw_destroy_plan(plan)
     call fftw_free(data)


File: fftw3.info, Node: FFTW Fortran type reference, Next: Plan execution in Fortran, Prev: Reversing array dimensions, Up: Calling FFTW from Modern Fortran

7.3 FFTW Fortran type reference
===============================

The following are the most important type correspondences between the C
interface and Fortran:

   * Plans ('fftw_plan' and variants) are 'type(C_PTR)' (i.e. an opaque
     pointer).

   * The C floating-point types 'double', 'float', and 'long double'
     correspond to 'real(C_DOUBLE)', 'real(C_FLOAT)', and
     'real(C_LONG_DOUBLE)', respectively. The C complex types
     'fftw_complex', 'fftwf_complex', and 'fftwl_complex' correspond in
     Fortran to 'complex(C_DOUBLE_COMPLEX)', 'complex(C_FLOAT_COMPLEX)',
     and 'complex(C_LONG_DOUBLE_COMPLEX)', respectively. Just as in C
     (*note Precision::), the FFTW subroutines and types are prefixed
     with 'fftw_', 'fftwf_', and 'fftwl_' for the different precisions,
     and link to different libraries ('-lfftw3', '-lfftw3f', and
     '-lfftw3l' on Unix), but use the _same_ include file 'fftw3.f03'
     and the _same_ constants (all of which begin with 'FFTW_'). The
     exception is 'long double' precision, for which you should _also_
     include 'fftw3l.f03' (*note Extended and quadruple precision in
     Fortran::).

   * The C integer types 'int' and 'unsigned' (used for planner flags)
     become 'integer(C_INT)'. The C integer type 'ptrdiff_t' (e.g. in
     the *note 64-bit Guru Interface::) becomes 'integer(C_INTPTR_T)',
     and 'size_t' (in 'fftw_malloc' etc.) becomes 'integer(C_SIZE_T)'.

   * The 'fftw_r2r_kind' type (*note Real-to-Real Transform Kinds::)
     becomes 'integer(C_FFTW_R2R_KIND)'. The various constant values of
     the C enumerated type ('FFTW_R2HC' etc.) become simply integer
     constants of the same names in Fortran.

   * Numeric array pointer arguments (e.g. 'double *') become
     'dimension(*), intent(out)' arrays of the same type, or
     'dimension(*), intent(in)' if they are pointers to constant data
     (e.g. 'const int *'). There are a few exceptions where numeric
     pointers refer to scalar outputs (e.g. for 'fftw_flops'), in which
     case they are 'intent(out)' scalar arguments in Fortran too. For
     the new-array execute functions (*note New-array Execute
     Functions::), the input arrays are declared 'dimension(*),
     intent(inout)', since they can be modified in the case of in-place
     or 'FFTW_DESTROY_INPUT' transforms.

   * Pointer _return_ values (e.g. 'double *') become 'type(C_PTR)'.
     (If they are pointers to arrays, as for 'fftw_alloc_real', you can
     convert them back to Fortran array pointers with the standard
     intrinsic function 'c_f_pointer'.)

   * The 'fftw_iodim' type in the guru interface (*note Guru vector and
     transform sizes::) becomes 'type(fftw_iodim)' in Fortran, a derived
     data type (the Fortran analogue of C's 'struct') with three
     'integer(C_INT)' components: 'n', 'is', and 'os', with the same
     meanings as in C. The 'fftw_iodim64' type in the 64-bit guru
     interface (*note 64-bit Guru Interface::) is the same, except that
     its components are of type 'integer(C_INTPTR_T)'.

   * Using the wisdom import/export functions from Fortran is a bit
     tricky, and is discussed in *note Accessing the wisdom API from
     Fortran::. In brief, the 'FILE *' arguments map to 'type(C_PTR)',
     'const char *' to 'character(C_CHAR), dimension(*), intent(in)'
     (null-terminated!), and the generic read-char/write-char functions
     map to 'type(C_FUNPTR)'.
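
   As a rough illustration of the correspondences listed above, the
following sketch compares a few native Fortran kinds with their
'iso_c_binding' counterparts. It is only a sketch: the module and
function names are invented for this example, and it uses nothing
beyond the standard 'iso_c_binding' constants.

```fortran
module kind_check_sketch
  use, intrinsic :: iso_c_binding
  implicit none
contains
  ! True when the native double-precision real kind is the C 'double' kind.
  pure logical function double_matches_c()
    double_matches_c = (kind(0.0d0) == C_DOUBLE)
  end function double_matches_c

  ! True when the native double-precision complex kind matches
  ! complex(C_DOUBLE_COMPLEX), i.e. the Fortran view of 'fftw_complex'.
  pure logical function dcomplex_matches_c()
    dcomplex_matches_c = (kind((0.0d0, 0.0d0)) == C_DOUBLE_COMPLEX)
  end function dcomplex_matches_c

  ! True when default 'integer' is the C 'int' kind.
  pure logical function integer_matches_c()
    integer_matches_c = (kind(0) == C_INT)
  end function integer_matches_c
end module kind_check_sketch
```

   If all three functions return '.true.' on your system, arrays
declared with the native types have the kinds that the 'fftw3.f03'
interfaces expect. If one returns '.false.', passing the native type
should produce a compile-time type-mismatch error rather than silently
wrong results.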

   You may be wondering if you need to search-and-replace
'real(kind(0.0d0))' (or whatever your favorite Fortran spelling of
"double precision" is) with 'real(C_DOUBLE)' everywhere in your program,
and similarly for 'complex' and 'integer' types. The answer is no; you
can still use your existing types. As long as these types match their C
counterparts, things should work without a hitch. The worst that can
happen, e.g. in the (unlikely) event of a system where
'real(kind(0.0d0))' is different from 'real(C_DOUBLE)', is that the
compiler will give you a type-mismatch error. That is, if you don't use
the 'iso_c_binding' kinds you need to accept at least the theoretical
possibility of having to change your code in response to compiler errors
on some future machine, but you don't need to worry about silently
compiling incorrect code that yields runtime errors.


File: fftw3.info, Node: Plan execution in Fortran, Next: Allocating aligned memory in Fortran, Prev: FFTW Fortran type reference, Up: Calling FFTW from Modern Fortran

7.4 Plan execution in Fortran
=============================

In C, in order to use a plan, one normally calls 'fftw_execute', which
executes the plan to perform the transform on the input/output arrays
passed when the plan was created (*note Using Plans::). The
corresponding subroutine call in modern Fortran is:
     call fftw_execute(plan)

   However, we have had reports that this causes problems with some
recent optimizing Fortran compilers. The problem is, because the
input/output arrays are not passed as explicit arguments to
'fftw_execute', the semantics of Fortran (unlike C) allow the compiler
to assume that the input/output arrays are not changed by
'fftw_execute'. As a consequence, certain compilers end up
repositioning the call to 'fftw_execute', assuming incorrectly that it
does nothing to the arrays.

   There are various workarounds to this, but the safest and simplest
thing is to not use 'fftw_execute' in Fortran. Instead, use the
functions described in *note New-array Execute Functions::, which take
the input/output arrays as explicit arguments. For example, if the plan
is for a complex-data DFT and was created for the arrays 'in' and 'out',
you would do:
     call fftw_execute_dft(plan, in, out)

   There are a few things to be careful of, however:

   * You must use the correct type of execute function, matching the way
     the plan was created. Complex DFT plans should use
     'fftw_execute_dft', real-input (r2c) DFT plans should use
     'fftw_execute_dft_r2c', and real-output (c2r) DFT plans should use
     'fftw_execute_dft_c2r'. The various r2r plans should use
     'fftw_execute_r2r'. Fortunately, if you use the wrong one you will
     get a compile-time type-mismatch error (unlike legacy Fortran).

   * You should normally pass the same input/output arrays that were
     used when creating the plan. This is always safe.

   * _If_ you pass _different_ input/output arrays compared to those
     used when creating the plan, you must abide by all the restrictions
     of the new-array execute functions (*note New-array Execute
     Functions::). The most tricky of these is the requirement that the
     new arrays have the same alignment as the original arrays; the best
     (and possibly only) way to guarantee this is to use the
     'fftw_alloc' functions to allocate your arrays (*note Allocating
     aligned memory in Fortran::). Alternatively, you can use the
     'FFTW_UNALIGNED' flag when creating the plan, in which case the
     plan does not depend on the alignment, but this may sacrifice
     substantial performance on architectures (like x86) with SIMD
     instructions (*note SIMD alignment and fftw_malloc::).
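
   To make the alignment requirement in the last point concrete, here
is a minimal sketch of how one might compare the starting alignment of
two arrays in pure Fortran. It rests on two assumptions that are not
taken from FFTW itself: that a 16-byte boundary is the relevant one (the
figure mentioned in the legacy Fortran chapter), and that a C address
can be reinterpreted as an 'integer(C_INTPTR_T)' with 'transfer', which
holds on common platforms but is processor-dependent. The module and
function names are hypothetical.

```fortran
module alignment_sketch
  use, intrinsic :: iso_c_binding
  implicit none
contains
  ! Offset of the start of 'a' from the previous 16-byte boundary
  ! (0 means 16-byte aligned).  Assumes a C address can be reinterpreted
  ! as an integer(C_INTPTR_T) via transfer (processor-dependent).
  function offset_from_16_bytes(a) result(offset)
    complex(C_DOUBLE_COMPLEX), intent(in), target :: a(*)
    integer(C_INTPTR_T) :: offset
    offset = modulo(transfer(c_loc(a), 0_C_INTPTR_T), 16_C_INTPTR_T)
  end function offset_from_16_bytes

  ! True when both arrays start at the same offset from a 16-byte
  ! boundary; in this sketch that is treated as "same alignment".
  logical function same_alignment(a, b)
    complex(C_DOUBLE_COMPLEX), intent(in), target :: a(*), b(*)
    same_alignment = (offset_from_16_bytes(a) == offset_from_16_bytes(b))
  end function same_alignment
end module alignment_sketch
```

   A check like 'same_alignment(in, in2)' could be used as a debugging
aid before executing a plan created for 'in' on a different array
'in2'. It does not replace allocating with the 'fftw_alloc' functions,
which remains the recommended way to obtain matching alignment.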


File: fftw3.info, Node: Allocating aligned memory in Fortran, Next: Accessing the wisdom API from Fortran, Prev: Plan execution in Fortran, Up: Calling FFTW from Modern Fortran

7.5 Allocating aligned memory in Fortran
========================================

In order to obtain maximum performance in FFTW, you should store your
data in arrays that have been specially aligned in memory (*note SIMD
alignment and fftw_malloc::). Enforcing alignment also permits you to
safely use the new-array execute functions (*note New-array Execute
Functions::) to apply a given plan to more than one pair of in/out
arrays. Unfortunately, standard Fortran arrays do _not_ provide any
alignment guarantees. The _only_ way to allocate aligned memory in
standard Fortran is to allocate it with an external C function, like the
'fftw_alloc_real' and 'fftw_alloc_complex' functions. Fortunately,
Fortran 2003 provides a simple way to associate such allocated memory
with a standard Fortran array pointer that you can then use normally.

   We therefore recommend allocating all your input/output arrays using
the following technique:

  1. Declare a 'pointer', 'arr', to your array of the desired type and
     dimensions. For example, 'real(C_DOUBLE), pointer :: arr(:,:)' for
     a 2d real array, or 'complex(C_DOUBLE_COMPLEX), pointer ::
     arr(:,:,:)' for a 3d complex array.

  2. The number of elements to allocate must be an 'integer(C_SIZE_T)'.
     You can either declare a variable of this type, e.g.
     'integer(C_SIZE_T) :: sz', to store the number of elements to
     allocate, or you can use the 'int(..., C_SIZE_T)' intrinsic
     function. For example, set 'sz = L * M * N' or use 'int(L * M * N,
     C_SIZE_T)' for an L x M x N array.

  3. Declare a 'type(C_PTR) :: p' to hold the return value from FFTW's
     allocation routine. Set 'p = fftw_alloc_real(sz)' for a real
     array, or 'p = fftw_alloc_complex(sz)' for a complex array.

  4. Associate your pointer 'arr' with the allocated memory 'p' using
     the standard 'c_f_pointer' subroutine: 'call c_f_pointer(p, arr,
     [...dimensions...])', where '[...dimensions...]' is an array of
     the dimensions of the array (in the usual Fortran order). For
     example, 'call c_f_pointer(p, arr, [L,M,N])' for an L x M x N
     array. (The dimensions argument is required whenever 'arr' is an
     array pointer; it is omitted only for a scalar pointer.) You can
     now use 'arr' as a usual multidimensional array.

  5. When you are done using the array, deallocate the memory with
     'call fftw_free(p)'.

   For example, here is how we would allocate an L x M 2d real array:

     real(C_DOUBLE), pointer :: arr(:,:)
     type(C_PTR) :: p
     p = fftw_alloc_real(int(L * M, C_SIZE_T))
     call c_f_pointer(p, arr, [L,M])
     _...use arr and arr(i,j) as usual..._
     call fftw_free(p)

   and here is an L x M x N 3d complex array:

     complex(C_DOUBLE_COMPLEX), pointer :: arr(:,:,:)
     type(C_PTR) :: p
     p = fftw_alloc_complex(int(L * M * N, C_SIZE_T))
     call c_f_pointer(p, arr, [L,M,N])
     _...use arr and arr(i,j,k) as usual..._
     call fftw_free(p)

   See *note Reversing array dimensions:: for an example allocating a
single array and associating both real and complex array pointers with
it, for in-place real-to-complex transforms.


File: fftw3.info, Node: Accessing the wisdom API from Fortran, Next: Defining an FFTW module, Prev: Allocating aligned memory in Fortran, Up: Calling FFTW from Modern Fortran

7.6 Accessing the wisdom API from Fortran
=========================================

As explained in *note Words of Wisdom-Saving Plans::, FFTW provides a
"wisdom" API for saving plans to disk so that they can be recreated
quickly. The C API for exporting (*note Wisdom Export::) and importing
(*note Wisdom Import::) wisdom is somewhat tricky to use from Fortran,
however, because of differences in file I/O and string types between C
and Fortran.

* Menu:

* Wisdom File Export/Import from Fortran::
* Wisdom String Export/Import from Fortran::
* Wisdom Generic Export/Import from Fortran::


File: fftw3.info, Node: Wisdom File Export/Import from Fortran, Next: Wisdom String Export/Import from Fortran, Prev: Accessing the wisdom API from Fortran, Up: Accessing the wisdom API from Fortran

7.6.1 Wisdom File Export/Import from Fortran
--------------------------------------------

The easiest way to export and import wisdom is to do so using
'fftw_export_wisdom_to_filename' and 'fftw_import_wisdom_from_filename'.
The only trick is that these require you to pass a C string, which is an
array of type 'CHARACTER(C_CHAR)' that is terminated by 'C_NULL_CHAR'.
You can call them like this:

     integer(C_INT) :: ret
     ret = fftw_export_wisdom_to_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
     if (ret .eq. 0) stop 'error exporting wisdom to file'
     ret = fftw_import_wisdom_from_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
     if (ret .eq. 0) stop 'error importing wisdom from file'

   Note that prepending 'C_CHAR_' is needed to specify that the literal
string is of kind 'C_CHAR', and we null-terminate the string by
appending '// C_NULL_CHAR'. These functions return an 'integer(C_INT)'
('ret') which is '0' if an error occurred during export/import and
nonzero otherwise.

   It is also possible to use the lower-level routines
'fftw_export_wisdom_to_file' and 'fftw_import_wisdom_from_file', which
accept parameters of the C type 'FILE*', expressed in Fortran as
'type(C_PTR)'. However, you are then responsible for creating the
'FILE*' yourself. You can do this by using 'iso_c_binding' to define
Fortran interfaces for the C library functions 'fopen' and 'fclose',
which is a bit strange in Fortran but workable.


File: fftw3.info, Node: Wisdom String Export/Import from Fortran, Next: Wisdom Generic Export/Import from Fortran, Prev: Wisdom File Export/Import from Fortran, Up: Accessing the wisdom API from Fortran

7.6.2 Wisdom String Export/Import from Fortran
----------------------------------------------

Dealing with FFTW's C string export/import is a bit more painful. In
particular, the 'fftw_export_wisdom_to_string' function requires you to
deal with a dynamically allocated C string. To get its length, you must
define an interface to the C 'strlen' function (which returns a
'size_t', i.e. 'integer(C_SIZE_T)'), and to deallocate it you must
define an interface to C 'free':

     use, intrinsic :: iso_c_binding
     interface
       integer(C_SIZE_T) function strlen(s) bind(C, name='strlen')
         import
         type(C_PTR), value :: s
       end function strlen
       subroutine free(p) bind(C, name='free')
         import
         type(C_PTR), value :: p
       end subroutine free
     end interface

   Given these definitions, you can then export wisdom to a Fortran
character array:

     character(C_CHAR), pointer :: s(:)
     integer(C_SIZE_T) :: slen
     type(C_PTR) :: p
     p = fftw_export_wisdom_to_string()
     if (.not. c_associated(p)) stop 'error exporting wisdom'
     slen = strlen(p)
     call c_f_pointer(p, s, [slen+1])
     ...
     call free(p)

   Note that 'slen' is the length of the C string, but the length of the
array is 'slen+1' because it includes the terminating null character.
(You can omit the '+1' if you don't want Fortran to know about the null
character.) The standard 'c_associated' function checks whether 'p' is
a null pointer, which is returned by 'fftw_export_wisdom_to_string' if
there was an error.
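
   Once the exported wisdom is visible as a 'character(C_CHAR)' array
like 's' above, it is often convenient to copy it into an ordinary
scalar string before the C memory is freed. The following is a minimal
sketch of such a helper; the module name 'cstring_sketch' and the
function name 'c_chars_to_string' are hypothetical and not part of
FFTW.

```fortran
module cstring_sketch
  use, intrinsic :: iso_c_binding
  implicit none
contains
  ! Copy the characters of 's' that precede the first C_NULL_CHAR (or all
  ! of 's' if there is none) into a scalar string of kind C_CHAR.
  function c_chars_to_string(s) result(str)
    character(kind=C_CHAR), intent(in) :: s(:)
    character(kind=C_CHAR, len=:), allocatable :: str
    integer :: i, n
    n = size(s)
    do i = 1, size(s)
      if (s(i) == C_NULL_CHAR) then
        n = i - 1
        exit
      end if
    end do
    allocate(character(kind=C_CHAR, len=n) :: str)
    do i = 1, n
      str(i:i) = s(i)
    end do
  end function c_chars_to_string
end module cstring_sketch
```

   In the export snippet above, this could be called as 'wisdom =
c_chars_to_string(s)' between 'call c_f_pointer(...)' and 'call
free(p)', where 'wisdom' is a hypothetical deferred-length character
variable. The copy stays valid after the C string has been freed.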

   To import wisdom from a string, use 'fftw_import_wisdom_from_string'
as usual; note that the argument of this function must be a
'character(C_CHAR)' that is terminated by the 'C_NULL_CHAR' character,
like the 's' array above.


File: fftw3.info, Node: Wisdom Generic Export/Import from Fortran, Prev: Wisdom String Export/Import from Fortran, Up: Accessing the wisdom API from Fortran

7.6.3 Wisdom Generic Export/Import from Fortran
-----------------------------------------------

The most generic wisdom export/import functions allow you to provide an
arbitrary callback function to read/write one character at a time in any
way you want. However, your callback function must be written in a
special way, using the 'bind(C)' attribute to be passed to a C
interface.

   In particular, to call the generic wisdom export function
'fftw_export_wisdom', you would write a callback subroutine of the form:

     subroutine my_write_char(c, p) bind(C)
       use, intrinsic :: iso_c_binding
       character(C_CHAR), value :: c
       type(C_PTR), value :: p
       _...write c..._
     end subroutine my_write_char

   Given such a subroutine (along with the corresponding interface
definition), you could then export wisdom using:

     call fftw_export_wisdom(c_funloc(my_write_char), p)

   The standard 'c_funloc' intrinsic converts a Fortran 'bind(C)'
subroutine into a C function pointer. The parameter 'p' is a
'type(C_PTR)' to any arbitrary data that you want to pass to
'my_write_char' (or 'C_NULL_PTR' if none). (Note that you can get a C
pointer to Fortran data using the intrinsic 'c_loc', and convert it back
to a Fortran pointer in 'my_write_char' using 'c_f_pointer'.)
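
   The 'c_loc' / 'c_f_pointer' round trip mentioned above can be
sketched as follows. This is one possible shape for the callback, not
the only one: the derived type 'char_buffer', its capacity 'BUF_MAX',
and the module name are all hypothetical. The callback recovers a
Fortran pointer to the buffer from 'p' and appends each character to
it, silently dropping characters once the buffer is full.

```fortran
module wisdom_buffer_sketch
  use, intrinsic :: iso_c_binding
  implicit none
  integer, parameter :: BUF_MAX = 4096   ! hypothetical fixed capacity

  ! Hypothetical user data: a fixed-size character buffer and fill count.
  type :: char_buffer
    integer :: n = 0
    character(kind=C_CHAR) :: text(BUF_MAX)
  end type char_buffer
contains
  ! Callback in the form described above: append 'c' to the buffer that
  ! the C pointer 'p' refers to (characters beyond BUF_MAX are dropped).
  subroutine my_write_char(c, p) bind(C)
    character(kind=C_CHAR), value :: c
    type(C_PTR), value :: p
    type(char_buffer), pointer :: buf
    call c_f_pointer(p, buf)
    if (buf%n < BUF_MAX) then
      buf%n = buf%n + 1
      buf%text(buf%n) = c
    end if
  end subroutine my_write_char
end module wisdom_buffer_sketch
```

   Given a variable declared as 'type(char_buffer), target :: buf', the
export call from the text would then read 'call
fftw_export_wisdom(c_funloc(my_write_char), c_loc(buf))', after which
'buf%text(1:buf%n)' holds whatever characters were written.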

   Similarly, to use the generic 'fftw_import_wisdom', you would define
a callback function of the form:

     integer(C_INT) function my_read_char(p) bind(C)
       use, intrinsic :: iso_c_binding
       type(C_PTR), value :: p
       character :: c
       _...read a character c..._
       my_read_char = ichar(c, C_INT)
     end function my_read_char

     ....

     integer(C_INT) :: ret
     ret = fftw_import_wisdom(c_funloc(my_read_char), p)
     if (ret .eq. 0) stop 'error importing wisdom'

   Your function can return '-1' if the end of the input is reached.
Again, 'p' is an arbitrary 'type(C_PTR)' that is passed through to your
function. 'fftw_import_wisdom' returns '0' if an error occurred and
nonzero otherwise.


File: fftw3.info, Node: Defining an FFTW module, Prev: Accessing the wisdom API from Fortran, Up: Calling FFTW from Modern Fortran

7.7 Defining an FFTW module
===========================

Rather than using the 'include' statement to include the 'fftw3.f03'
interface file in any subroutine where you want to use FFTW, you might
prefer to define an FFTW Fortran module. FFTW does not install itself
as a module, primarily because 'fftw3.f03' can be shared between
different Fortran compilers while modules (in general) cannot. However,
it is trivial to define your own FFTW module if you want. Just create a
file containing:

     module FFTW3
       use, intrinsic :: iso_c_binding
       include 'fftw3.f03'
     end module

   Compile this file into a module as usual for your compiler (e.g.
with 'gfortran -c' you will get a file 'fftw3.mod'). Now, instead of
'include 'fftw3.f03'', whenever you want to use FFTW routines you can
just do:

     use FFTW3

   as usual for Fortran modules. (You still need to link to the FFTW
library, of course.)


File: fftw3.info, Node: Calling FFTW from Legacy Fortran, Next: Upgrading from FFTW version 2, Prev: Calling FFTW from Modern Fortran, Up: Top

8 Calling FFTW from Legacy Fortran
**********************************

This chapter describes the interface to FFTW callable by Fortran code in
older compilers not supporting the Fortran 2003 C interoperability
features (*note Calling FFTW from Modern Fortran::). This interface has
the major disadvantage that it is not type-checked, so if you mistake
the argument types or ordering then your program will not have any
compiler errors, and will likely crash at runtime. So, greater care is
needed. Also, technically interfacing older Fortran versions to C is
nonstandard, but in practice we have found that the techniques used in
this chapter have worked with all known Fortran compilers for many
years.

   The legacy Fortran interface differs from the C interface only in the
prefix ('dfftw_' instead of 'fftw_' in double precision) and a few other
minor details. This Fortran interface is included in the FFTW libraries
by default, unless a Fortran compiler isn't found on your system or
'--disable-fortran' is included in the 'configure' flags. We assume
here that the reader is already familiar with the usage of FFTW in C, as
described elsewhere in this manual.

   The MPI parallel interface to FFTW is _not_ currently available to
legacy Fortran.
5369 5370* Menu: 5371 5372* Fortran-interface routines:: 5373* FFTW Constants in Fortran:: 5374* FFTW Execution in Fortran:: 5375* Fortran Examples:: 5376* Wisdom of Fortran?:: 5377 5378 5379File: fftw3.info, Node: Fortran-interface routines, Next: FFTW Constants in Fortran, Prev: Calling FFTW from Legacy Fortran, Up: Calling FFTW from Legacy Fortran 5380 53818.1 Fortran-interface routines 5382============================== 5383 5384Nearly all of the FFTW functions have Fortran-callable equivalents. The 5385name of the legacy Fortran routine is the same as that of the 5386corresponding C routine, but with the 'fftw_' prefix replaced by 5387'dfftw_'.(1) The single and long-double precision versions use 'sfftw_' 5388and 'lfftw_', respectively, instead of 'fftwf_' and 'fftwl_'; quadruple 5389precision ('real*16') is available on some systems as 'fftwq_' (*note 5390Precision::). (Note that 'long double' on x86 hardware is usually at 5391most 80-bit extended precision, _not_ quadruple precision.) 5392 5393 For the most part, all of the arguments to the functions are the 5394same, with the following exceptions: 5395 5396 * 'plan' variables (what would be of type 'fftw_plan' in C), must be 5397 declared as a type that is at least as big as a pointer (address) 5398 on your machine. We recommend using 'integer*8' everywhere, since 5399 this should always be big enough. 5400 5401 * Any function that returns a value (e.g. 'fftw_plan_dft') is 5402 converted into a _subroutine_. The return value is converted into 5403 an additional _first_ parameter of this subroutine.(2) 5404 5405 * The Fortran routines expect multi-dimensional arrays to be in 5406 _column-major_ order, which is the ordinary format of Fortran 5407 arrays (*note Multi-dimensional Array Format::). 
     They do this transparently and costlessly simply by reversing the
     order of the dimensions passed to FFTW, but this has one important
     consequence for multi-dimensional real-complex transforms,
     discussed below.

   * Wisdom import and export is somewhat more tricky because one cannot
     easily pass files or strings between C and Fortran; see *note
     Wisdom of Fortran?::.

   * Legacy Fortran cannot use the 'fftw_malloc' dynamic-allocation
     routine. If you want to exploit the SIMD FFTW (*note SIMD
     alignment and fftw_malloc::), you'll need to figure out some other
     way to ensure that your arrays are at least 16-byte aligned.

   * Since Fortran 77 does not have data structures, the 'fftw_iodim'
     structure from the guru interface (*note Guru vector and transform
     sizes::) must be split into separate arguments. In particular, any
     'fftw_iodim' array arguments in the C guru interface become three
     integer array arguments ('n', 'is', and 'os') in the Fortran guru
     interface, all of whose lengths should be equal to the
     corresponding 'rank' argument.

   * The guru planner interface in Fortran does _not_ do any automatic
     translation between column-major and row-major; you are responsible
     for setting the strides etcetera to correspond to your Fortran
     arrays. However, as a slight bug that we are preserving for
     backwards compatibility, the 'plan_guru_r2r' in Fortran _does_
     reverse the order of its 'kind' array parameter, so the 'kind'
     array of that routine should be in the reverse of the order of the
     iodim arrays (see above).

   In general, you should take care to use Fortran data types that
correspond to (i.e. are the same size as) the C types used by FFTW. In
practice, this correspondence is usually straightforward (i.e.
'integer' corresponds to 'int', 'real' corresponds to 'float',
etcetera).
The native Fortran double/single-precision complex type
should be compatible with 'fftw_complex'/'fftwf_complex'. Such simple
correspondences are assumed in the examples below.

   ---------- Footnotes ----------

   (1) Technically, Fortran 77 identifiers are not allowed to have more
than 6 characters, nor may they contain underscores. Any compiler that
enforces this limitation doesn't deserve to link to FFTW.

   (2) The reason for this is that some Fortran implementations seem to
have trouble with C function return values, and vice versa.


File: fftw3.info, Node: FFTW Constants in Fortran, Next: FFTW Execution in Fortran, Prev: Fortran-interface routines, Up: Calling FFTW from Legacy Fortran

8.2 FFTW Constants in Fortran
=============================

When creating plans in FFTW, a number of constants are used to specify
options, such as 'FFTW_MEASURE' or 'FFTW_ESTIMATE'. The same constants
must be used with the wrapper routines, but of course the C header files
where the constants are defined can't be incorporated directly into
Fortran code.

   Instead, we have placed Fortran equivalents of the FFTW constant
definitions in the file 'fftw3.f', which can be found in the same
directory as 'fftw3.h'. If your Fortran compiler supports a
preprocessor of some sort, you should be able to 'include' or '#include'
this file; otherwise, you can paste it directly into your code.

   In C, you combine different flags (like 'FFTW_PRESERVE_INPUT' and
'FFTW_MEASURE') using the ''|'' operator; in Fortran you should just use
''+''. (Take care not to add in the same flag more than once, though.
Alternatively, you can use the 'ior' intrinsic function standardized in
Fortran 95.)


File: fftw3.info, Node: FFTW Execution in Fortran, Next: Fortran Examples, Prev: FFTW Constants in Fortran, Up: Calling FFTW from Legacy Fortran

8.3 FFTW Execution in Fortran
=============================

In C, in order to use a plan, one normally calls 'fftw_execute', which
executes the plan to perform the transform on the input/output arrays
passed when the plan was created (*note Using Plans::). The
corresponding subroutine call in legacy Fortran is:
           call dfftw_execute(plan)

   However, we have had reports that this causes problems with some
recent optimizing Fortran compilers. The problem is, because the
input/output arrays are not passed as explicit arguments to
'dfftw_execute', the semantics of Fortran (unlike C) allow the compiler
to assume that the input/output arrays are not changed by
'dfftw_execute'. As a consequence, certain compilers end up optimizing
out or repositioning the call to 'dfftw_execute', assuming incorrectly
that it does nothing.

   There are various workarounds to this, but the safest and simplest
thing is to not use 'dfftw_execute' in Fortran. Instead, use the
functions described in *note New-array Execute Functions::, which take
the input/output arrays as explicit arguments. For example, if the plan
is for a complex-data DFT and was created for the arrays 'in' and 'out',
you would do:
           call dfftw_execute_dft(plan, in, out)

   There are a few things to be careful of, however:

   * You must use the correct type of execute function, matching the way
     the plan was created. Complex DFT plans should use
     'dfftw_execute_dft', real-input (r2c) DFT plans should use
     'dfftw_execute_dft_r2c', and real-output (c2r) DFT plans should use
     'dfftw_execute_dft_c2r'. The various r2r plans should use
     'dfftw_execute_r2r'.

   * You should normally pass the same input/output arrays that were
     used when creating the plan. This is always safe.

   * _If_ you pass _different_ input/output arrays compared to those
     used when creating the plan, you must abide by all the restrictions
     of the new-array execute functions (*note New-array Execute
     Functions::). The most difficult of these, in Fortran, is the
     requirement that the new arrays have the same alignment as the
     original arrays, because there seems to be no way in legacy Fortran
     to obtain guaranteed-aligned arrays (analogous to 'fftw_malloc' in
     C). You can, of course, use the 'FFTW_UNALIGNED' flag when creating
     the plan, in which case the plan does not depend on the alignment,
     but this may sacrifice substantial performance on architectures
     (like x86) with SIMD instructions (*note SIMD alignment and
     fftw_malloc::).


File: fftw3.info, Node: Fortran Examples, Next: Wisdom of Fortran?, Prev: FFTW Execution in Fortran, Up: Calling FFTW from Legacy Fortran

8.4 Fortran Examples
====================

In C, you might have something like the following to transform a
one-dimensional complex array:

     fftw_complex in[N], out[N];
     fftw_plan plan;

     plan = fftw_plan_dft_1d(N,in,out,FFTW_FORWARD,FFTW_ESTIMATE);
     fftw_execute(plan);
     fftw_destroy_plan(plan);

   In Fortran, you would use the following to accomplish the same thing:

           double complex in, out
           dimension in(N), out(N)
           integer*8 plan

           call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
           call dfftw_execute_dft(plan, in, out)
           call dfftw_destroy_plan(plan)

   Notice how all routines are called as Fortran subroutines, and the
plan is returned via the first argument to 'dfftw_plan_dft_1d'. Notice
also that we changed 'fftw_execute' to 'dfftw_execute_dft' (*note FFTW
Execution in Fortran::).
To do the same thing, but using 8 threads in
parallel (*note Multi-threaded FFTW::), you would simply prefix these
calls with:

           integer iret
           call dfftw_init_threads(iret)
           call dfftw_plan_with_nthreads(8)

   (You might want to check the value of 'iret': if it is zero, it
indicates an unlikely error during thread initialization.)

   To check the number of threads currently being used by the planner,
you can do the following:

           integer iret
           call dfftw_planner_nthreads(iret)

   To transform a three-dimensional array in-place with C, you might do:

     fftw_complex arr[L][M][N];
     fftw_plan plan;

     plan = fftw_plan_dft_3d(L,M,N, arr,arr,
                             FFTW_FORWARD, FFTW_ESTIMATE);
     fftw_execute(plan);
     fftw_destroy_plan(plan);

   In Fortran, you would use this instead:

           double complex arr
           dimension arr(L,M,N)
           integer*8 plan

           call dfftw_plan_dft_3d(plan, L,M,N, arr,arr,
          &                       FFTW_FORWARD, FFTW_ESTIMATE)
           call dfftw_execute_dft(plan, arr, arr)
           call dfftw_destroy_plan(plan)

   Note that we pass the array dimensions in the "natural" order in both
C and Fortran.

   To transform a one-dimensional real array in Fortran, you might do:

           double precision in
           dimension in(N)
           double complex out
           dimension out(N/2 + 1)
           integer*8 plan

           call dfftw_plan_dft_r2c_1d(plan,N,in,out,FFTW_ESTIMATE)
           call dfftw_execute_dft_r2c(plan, in, out)
           call dfftw_destroy_plan(plan)

   To transform a two-dimensional real array, out of place, you might
use the following:

           double precision in
           dimension in(M,N)
           double complex out
           dimension out(M/2 + 1, N)
           integer*8 plan

           call dfftw_plan_dft_r2c_2d(plan,M,N,in,out,FFTW_ESTIMATE)
           call dfftw_execute_dft_r2c(plan, in, out)
           call dfftw_destroy_plan(plan)

   *Important:* Notice that it is the _first_ dimension of the complex
output array that is cut in half in Fortran, rather than the last
dimension as in C. This is a consequence of the interface routines
reversing the order of the array dimensions passed to FFTW so that the
Fortran program can use its ordinary column-major order.


File: fftw3.info, Node: Wisdom of Fortran?, Prev: Fortran Examples, Up: Calling FFTW from Legacy Fortran

8.5 Wisdom of Fortran?
======================

In this section, we discuss how one can import/export FFTW wisdom (saved
plans) to/from a Fortran program; we assume that the reader is already
familiar with wisdom, as described in *note Words of Wisdom-Saving
Plans::.

   The basic problem is that it is difficult to (portably) pass files
and strings between Fortran and C, so we cannot provide a direct Fortran
equivalent to the 'fftw_export_wisdom_to_file', etcetera, functions.
Fortran interfaces _are_ provided for the functions that do not take
file/string arguments, however: 'dfftw_import_system_wisdom',
'dfftw_import_wisdom', 'dfftw_export_wisdom', and 'dfftw_forget_wisdom'.

   So, for example, to import the system-wide wisdom, you would do:

           integer isuccess
           call dfftw_import_system_wisdom(isuccess)

   As usual, the C return value is turned into a first parameter;
'isuccess' is non-zero on success and zero on failure (e.g. if there is
no system wisdom installed).

   If you want to import/export wisdom from/to an arbitrary file or
elsewhere, you can employ the generic 'dfftw_import_wisdom' and
'dfftw_export_wisdom' functions, for which you must supply a subroutine
to read/write one character at a time. The FFTW package contains an
example file 'doc/f77_wisdom.f' demonstrating how to implement
'import_wisdom_from_file' and 'export_wisdom_to_file' subroutines in
this way. (These routines cannot be compiled into the FFTW library
itself, lest all FFTW-using programs be required to link with the
Fortran I/O library.)


File: fftw3.info, Node: Upgrading from FFTW version 2, Next: Installation and Customization, Prev: Calling FFTW from Legacy Fortran, Up: Top

9 Upgrading from FFTW version 2
*******************************

In this chapter, we outline the process for updating codes designed for
the older FFTW 2 interface to work with FFTW 3. The interface for FFTW
3 is not backwards-compatible with the interface for FFTW 2 and earlier
versions; codes written to use those versions will fail to link with
FFTW 3. Nor is it possible to write "compatibility wrappers" to bridge
the gap (at least not efficiently), because FFTW 3 has different
semantics from previous versions. However, upgrading should be a
straightforward process because the data formats are identical and the
overall style of planning/execution is essentially the same.

   Unlike FFTW 2, there are no separate header files for real and
complex transforms (or even for different precisions) in FFTW 3; all
interfaces are defined in the '<fftw3.h>' header file.

Numeric Types
=============

The main difference in data types is that 'fftw_complex' in FFTW 2 was
defined as a 'struct' with macros 'c_re' and 'c_im' for accessing the
real/imaginary parts. (This is binary-compatible with FFTW 3 on any
machine except perhaps for some older Crays in single precision.) The
equivalent macros for FFTW 3 are:

     #define c_re(c) ((c)[0])
     #define c_im(c) ((c)[1])

   This does not work if you are using the C99 complex type, however,
unless you insert a 'double*' typecast into the above macros (*note
Complex numbers::).

   Also, FFTW 2 had an 'fftw_real' typedef that was an alias for
'double' (in double precision). In FFTW 3 you should just use 'double'
(or whatever precision you are employing).

Plans
=====

The major difference between FFTW 2 and FFTW 3 is in the
planning/execution division of labor. In FFTW 2, plans were found for a
given transform size and type, and then could be applied to _any_ arrays
and for _any_ multiplicity/stride parameters. In FFTW 3, you specify
the particular arrays, stride parameters, etcetera when creating the
plan, and the plan is then executed for _those_ arrays (unless the guru
interface is used) and _those_ parameters _only_. (FFTW 2 had "specific
planner" routines that planned for a particular array and stride, but
the plan could still be used for other arrays and strides.) That is,
much of the information that was formerly specified at execution time is
now specified at planning time.

   Like FFTW 2's specific planner routines, the FFTW 3 planner
overwrites the input/output arrays unless you use 'FFTW_ESTIMATE'.

   FFTW 2 had separate data types 'fftw_plan', 'fftwnd_plan',
'rfftw_plan', and 'rfftwnd_plan' for complex and real one- and
multi-dimensional transforms, and each type had its own 'destroy'
function. In FFTW 3, all plans are of type 'fftw_plan' and all are
destroyed by 'fftw_destroy_plan(plan)'.

   Where you formerly used 'fftw_create_plan' and 'fftw_one' to plan and
compute a single 1d transform, you would now use 'fftw_plan_dft_1d' to
plan the transform. If you used the generic 'fftw' function to execute
the transform with multiplicity ('howmany') and stride parameters, you
would now use the advanced interface 'fftw_plan_many_dft' to specify
those parameters. The plans are now executed with 'fftw_execute(plan)',
which takes all of its parameters (including the input/output arrays)
from the plan.

   In-place transforms no longer interpret their output argument as
scratch space, nor is there an 'FFTW_IN_PLACE' flag. You simply pass
the same pointer for both the input and output arguments. (Previously,
the output 'ostride' and 'odist' parameters were ignored for in-place
transforms; now, if they are specified via the advanced interface, they
are significant even in the in-place case, although they should normally
equal the corresponding input parameters.)

   The 'FFTW_ESTIMATE' and 'FFTW_MEASURE' flags have the same meaning as
before, although the planning time will differ. You may also consider
using 'FFTW_PATIENT', which is like 'FFTW_MEASURE' except that it takes
more time in order to consider a wider variety of algorithms.

   For multi-dimensional complex DFTs, instead of 'fftwnd_create_plan'
(or 'fftw2d_create_plan' or 'fftw3d_create_plan'), followed by
'fftwnd_one', you would use 'fftw_plan_dft' (or 'fftw_plan_dft_2d' or
'fftw_plan_dft_3d'), followed by 'fftw_execute'.
If you used 'fftwnd'
to specify strides etcetera, you would instead specify these via
'fftw_plan_many_dft'.

   The analogues to 'rfftw_create_plan' and 'rfftw_one' with
'FFTW_REAL_TO_COMPLEX' or 'FFTW_COMPLEX_TO_REAL' directions are
'fftw_plan_r2r_1d' with kind 'FFTW_R2HC' or 'FFTW_HC2R', followed by
'fftw_execute'. The stride etcetera arguments of 'rfftw' are now in
'fftw_plan_many_r2r'.

   Instead of 'rfftwnd_create_plan' (or 'rfftw2d_create_plan' or
'rfftw3d_create_plan') followed by 'rfftwnd_one_real_to_complex' or
'rfftwnd_one_complex_to_real', you now use 'fftw_plan_dft_r2c' (or
'fftw_plan_dft_r2c_2d' or 'fftw_plan_dft_r2c_3d') or 'fftw_plan_dft_c2r'
(or 'fftw_plan_dft_c2r_2d' or 'fftw_plan_dft_c2r_3d'), respectively,
followed by 'fftw_execute'. As usual, the strides etcetera of
'rfftwnd_real_to_complex' or 'rfftwnd_complex_to_real' are now specified
in the advanced planner routines, 'fftw_plan_many_dft_r2c' or
'fftw_plan_many_dft_c2r'.

Wisdom
======

In FFTW 2, you had to supply the 'FFTW_USE_WISDOM' flag in order to use
wisdom; in FFTW 3, wisdom is always used. (You could simulate the FFTW
2 wisdom-less behavior by calling 'fftw_forget_wisdom' after every
planner call.)

   The FFTW 3 wisdom import/export routines are almost the same as
before (although the storage format is entirely different). There is
one significant difference, however. In FFTW 2, the import routines
would never read past the end of the wisdom, so you could store extra
data beyond the wisdom in the same file, for example.
In FFTW 3, the
file-import routine may read up to a few hundred bytes past the end of
the wisdom, so you cannot store other data just beyond it.(1)

   Wisdom has been enhanced by additional humility in FFTW 3: whereas
FFTW 2 would re-use wisdom for a given transform size regardless of the
stride etc., in FFTW 3 wisdom is only used with the strides etc. for
which it was created. Unfortunately, this means FFTW 3 has to create
new plans from scratch more often than FFTW 2 (in FFTW 2, planning e.g.
one transform of size 1024 also created wisdom for all smaller powers of
2, but this no longer occurs).

   FFTW 3 also has the new routine 'fftw_import_system_wisdom' to import
wisdom from a standard system-wide location.

Memory allocation
=================

In FFTW 3, we recommend allocating your arrays with 'fftw_malloc' and
deallocating them with 'fftw_free'; this is not required, but allows
optimal performance when SIMD acceleration is used. (Those two
functions actually existed in FFTW 2, and worked the same way, but were
not documented.)

   In FFTW 2, there were 'fftw_malloc_hook' and 'fftw_free_hook'
functions that allowed the user to replace FFTW's memory-allocation
routines (e.g. to implement different error-handling, since by default
FFTW prints an error message and calls 'exit' to abort the program if
'malloc' returns 'NULL'). These hooks are not supported in FFTW 3;
those few users who require this functionality can just directly modify
the memory-allocation routines in FFTW (they are defined in
'kernel/alloc.c').

Fortran interface
=================

In FFTW 2, the subroutine names were obtained by replacing 'fftw_' with
'fftw_f77'; in FFTW 3, you replace 'fftw_' with 'dfftw_' (or 'sfftw_' or
'lfftw_', depending upon the precision).

   In FFTW 3, we have begun recommending that you always declare the
type used to store plans as 'integer*8'. (Too many people didn't notice
our instruction to switch from 'integer' to 'integer*8' for 64-bit
machines.)

   In FFTW 3, we provide a 'fftw3.f' "header file" to include in your
code (and which is officially installed on Unix systems). (In FFTW 2,
we supplied a 'fftw_f77.i' file, but it was not installed.)

   Otherwise, the C-Fortran interface relationship is much the same as
it was before (e.g. return values become initial parameters, and
multi-dimensional arrays are in column-major order). Unlike FFTW 2, we
do provide some support for wisdom import/export in Fortran (*note
Wisdom of Fortran?::).

Threads
=======

Like FFTW 2, only the execution routines are thread-safe. All planner
routines, etcetera, should be called by only a single thread at a time
(*note Thread safety::). _Unlike_ FFTW 2, there is no special
'FFTW_THREADSAFE' flag for the planner to allow a given plan to be
usable by multiple threads in parallel; this is now the case by default.

   The multi-threaded version of FFTW 2 required you to pass the number
of threads each time you execute the transform. The number of threads
is now stored in the plan, and is specified before the planner is called
by 'fftw_plan_with_nthreads'. The threads initialization routine used
to be called 'fftw_threads_init' and would return zero on success; the
new routine is called 'fftw_init_threads' and returns zero on failure.
The current number of threads used by the planner can be checked with
'fftw_planner_nthreads'. *Note Multi-threaded FFTW::.

   There is no separate threads header file in FFTW 3; all the function
prototypes are in '<fftw3.h>'.
However, you still have to link to a
separate library ('-lfftw3_threads -lfftw3 -lm' on Unix), as well as to
the threading library (e.g. POSIX threads on Unix).

   ---------- Footnotes ----------

   (1) We do our own buffering because GNU libc I/O routines are
horribly slow for single-character I/O, apparently for thread-safety
reasons (whether you are using threads or not).


File: fftw3.info, Node: Installation and Customization, Next: Acknowledgments, Prev: Upgrading from FFTW version 2, Up: Top

10 Installation and Customization
*********************************

This chapter describes the installation and customization of FFTW, the
latest version of which may be downloaded from the FFTW home page
(http://www.fftw.org).

   In principle, FFTW should work on any system with an ANSI C compiler
('gcc' is fine). However, planner time is drastically reduced if FFTW
can exploit a hardware cycle counter; FFTW comes with cycle-counter
support for all modern general-purpose CPUs, but you may need to add a
couple of lines of code if your compiler is not yet supported (*note
Cycle Counters::). (On Unix, there will be a warning at the end of the
'configure' output if no cycle counter is found.)

   Installation of FFTW is simplest if you have a Unix or a GNU system,
such as GNU/Linux, and we describe this case in the first section below,
including the use of special configuration options to e.g. install
different precisions or exploit optimizations for particular
architectures (e.g. SIMD). Compilation on non-Unix systems is a more
manual process, but we outline the procedure in the second section. It
is also likely that pre-compiled binaries will be available for popular
systems.

   Finally, we describe how you can customize FFTW for particular needs
by generating _codelets_ for fast transforms of sizes not supported
efficiently by the standard FFTW distribution.

* Menu:

* Installation on Unix::
* Installation on non-Unix systems::
* Cycle Counters::
* Generating your own code::


File: fftw3.info, Node: Installation on Unix, Next: Installation on non-Unix systems, Prev: Installation and Customization, Up: Installation and Customization

10.1 Installation on Unix
=========================

FFTW comes with a 'configure' program in the GNU style. Installation
can be as simple as:

     ./configure
     make
     make install

   This will build the uniprocessor complex and real transform libraries
along with the test programs. (We recommend that you use GNU 'make' if
it is available; on some systems it is called 'gmake'.) The "'make
install'" command installs the FFTW libraries in standard
places, and typically requires root privileges (unless you specify a
different install directory with the '--prefix' flag to 'configure').
You can also type "'make check'" to put the FFTW test programs through
their paces. If you have problems during configuration or compilation,
you may want to run "'make distclean'" before trying again; this ensures
that you don't have any stale files left over from previous compilation
attempts.

   The 'configure' script chooses the 'gcc' compiler by default, if it
is available; you can select some other compiler with:
     ./configure CC="<the name of your C compiler>"

   The 'configure' script knows good 'CFLAGS' (C compiler flags) for a
few systems. If your system is not known, the 'configure' script will
print out a warning.
In this case, you should re-configure FFTW with
the command
     ./configure CFLAGS="<write your CFLAGS here>"
   and then compile as usual. If you do find an optimal set of 'CFLAGS'
for your system, please let us know what they are (along with the output
of 'config.guess') so that we can include them in future releases.

   'configure' supports all the standard flags defined by the GNU Coding
Standards; see the 'INSTALL' file in FFTW or the GNU web page
(http://www.gnu.org/prep/standards/html_node/index.html). Note
especially '--help' to list all flags and '--enable-shared' to create
shared, rather than static, libraries. 'configure' also accepts a few
FFTW-specific flags, particularly:

   * '--enable-float': Produces a single-precision version of FFTW
     ('float') instead of the default double-precision ('double').
     *Note Precision::.

   * '--enable-long-double': Produces a long-double precision version of
     FFTW ('long double') instead of the default double-precision
     ('double'). The 'configure' script will halt with an error message
     if 'long double' is the same size as 'double' on your
     machine/compiler. *Note Precision::.

   * '--enable-quad-precision': Produces a quadruple-precision version
     of FFTW using the nonstandard '__float128' type provided by 'gcc'
     4.6 or later on x86, x86-64, and Itanium architectures, instead of
     the default double-precision ('double'). The 'configure' script
     will halt with an error message if the compiler is not 'gcc'
     version 4.6 or later or if 'gcc''s 'libquadmath' library is not
     installed. *Note Precision::.

   * '--enable-threads': Enables compilation and installation of the
     FFTW threads library (*note Multi-threaded FFTW::), which provides
     a simple interface to parallel transforms for SMP systems. By
     default, the threads routines are not compiled.

   * '--enable-openmp': Like '--enable-threads', but using OpenMP
     compiler directives in order to induce parallelism rather than
     spawning its own threads directly, and installing an 'fftw3_omp'
     library rather than an 'fftw3_threads' library (*note
     Multi-threaded FFTW::). You can use both '--enable-openmp' and
     '--enable-threads' since they compile/install libraries with
     different names. By default, the OpenMP routines are not compiled.

   * '--with-combined-threads': By default, if '--enable-threads' is
     used, the threads support is compiled into a separate library that
     must be linked in addition to the main FFTW library. This is so
     that users of the serial library do not need to link the system
     threads libraries. If '--with-combined-threads' is specified,
     however, then no separate threads library is created, and threads
     are included in the main FFTW library. This is mainly useful under
     Windows, where no system threads library is required and
     inter-library dependencies are problematic.

   * '--enable-mpi': Enables compilation and installation of the FFTW
     MPI library (*note Distributed-memory FFTW with MPI::), which
     provides parallel transforms for distributed-memory systems with
     MPI. (By default, the MPI routines are not compiled.) *Note FFTW
     MPI Installation::.

   * '--disable-fortran': Disables inclusion of legacy-Fortran wrapper
     routines (*note Calling FFTW from Legacy Fortran::) in the standard
     FFTW libraries. These wrapper routines increase the library size
     by only a negligible amount, so they are included by default as
     long as the 'configure' script finds a Fortran compiler on your
     system. (To specify a particular Fortran compiler foo, pass
     'F77='foo to 'configure'.)

   * '--with-g77-wrappers': By default, when Fortran wrappers are
     included, the wrappers employ the linking conventions of the
     Fortran compiler detected by the 'configure' script. If this
     compiler is GNU 'g77', however, then _two_ versions of the wrappers
     are included: one with 'g77''s idiosyncratic convention of
     appending two underscores to identifiers, and one with the more
     common convention of appending only a single underscore. This way,
     the same FFTW library will work with both 'g77' and other Fortran
     compilers, such as GNU 'gfortran'. However, the converse is not
     true: if you configure with a different compiler, then the
     'g77'-compatible wrappers are not included. By specifying
     '--with-g77-wrappers', the 'g77'-compatible wrappers are included
     in addition to wrappers for whatever Fortran compiler 'configure'
     finds.

   * '--with-slow-timer': Disables the use of hardware cycle counters,
     and falls back on 'gettimeofday' or 'clock'. This greatly worsens
     performance, and should generally not be used (unless you don't
     have a cycle counter but still really want an optimized plan
     regardless of the time). *Note Cycle Counters::.

   * '--enable-sse' (single precision), '--enable-sse2' (single,
     double), '--enable-avx' (single, double), '--enable-avx2' (single,
     double), '--enable-avx512' (single, double),
     '--enable-avx-128-fma', '--enable-kcvi' (single),
     '--enable-altivec' (single), '--enable-vsx' (single, double),
     '--enable-neon' (single, double on aarch64),
     '--enable-generic-simd128', and '--enable-generic-simd256':

     Enable various SIMD instruction sets. You need a compiler that
     supports the given SIMD extensions, but FFTW will try to detect at
     runtime whether the CPU supports these extensions. That is, you
     can compile with '--enable-avx' and the code will still run on a
     CPU without AVX support.

        - These options require a compiler supporting SIMD extensions,
          and compiler support is always a bit flaky: see the FFTW FAQ
          for a list of compiler versions that have problems compiling
          FFTW.
        - Because of the large variety of ARM processors and ABIs, FFTW
          does not attempt to guess the correct 'gcc' flags for
          generating NEON code. In general, you will have to provide
          them on the command line. This command line is known to have
          worked at least once:
               ./configure --with-slow-timer --host=arm-linux-gnueabi \
                 --enable-single --enable-neon \
                 "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"

   To force 'configure' to use a particular C compiler foo (instead of
the default, usually 'gcc'), pass 'CC='foo to the 'configure' script;
you may also need to set the flags via the variable 'CFLAGS' as
described above.


File: fftw3.info, Node: Installation on non-Unix systems, Next: Cycle Counters, Prev: Installation on Unix, Up: Installation and Customization

10.2 Installation on non-Unix systems
=====================================

It should be relatively straightforward to compile FFTW even on non-Unix
systems lacking the niceties of a 'configure' script. Basically, you
need to edit the 'config.h' header (copy it from 'config.h.in') to
'#define' the various options and compiler characteristics, and then
compile all the '.c' files in the relevant directories.

   The 'config.h' header contains about 100 options to set, each one
initially an '#undef', each documented with a comment, and most of them
fairly obvious. For most of the options, you should simply '#define'
them to '1' if they are applicable, although a few options require a
particular value (e.g. 'SIZEOF_LONG_LONG' should be defined to the size
of the 'long long' type, in bytes, or zero if it is not supported).
We will likely post some sample 'config.h' files for various operating
systems and compilers for you to use (at least as a starting point).
Please let us know if you have to hand-create a configuration file
(and/or a pre-compiled binary) that you want to share.

   To create the FFTW library, you will then need to compile all of the
'.c' files in the 'kernel', 'dft', 'dft/scalar', 'dft/scalar/codelets',
'rdft', 'rdft/scalar', 'rdft/scalar/r2cf', 'rdft/scalar/r2cb',
'rdft/scalar/r2r', 'reodft', and 'api' directories. If you are
compiling with SIMD support (e.g. you defined 'HAVE_SSE2' in
'config.h'), then you also need to compile the '.c' files in the
'simd-support', '{dft,rdft}/simd', and '{dft,rdft}/simd/*' directories.

   Once these files are all compiled, link them into a library, or a
shared library, or directly into your program.

   To compile the FFTW test program, additionally compile the code in
the 'libbench2/' directory, and link it into a library. Then compile
the code in the 'tests/' directory and link it to the 'libbench2' and
FFTW libraries. To compile the 'fftw-wisdom' (command-line) tool (*note
Wisdom Utilities::), compile 'tools/fftw-wisdom.c' and link it to the
'libbench2' and FFTW libraries.


File: fftw3.info, Node: Cycle Counters, Next: Generating your own code, Prev: Installation on non-Unix systems, Up: Installation and Customization

10.3 Cycle Counters
===================

FFTW's planner actually executes and times different possible FFT
algorithms in order to pick the fastest plan for a given n. In order to
do this in as short a time as possible, however, the timer must have a
very high resolution, and to accomplish this we employ the hardware
"cycle counters" that are available on most CPUs.
Currently, FFTW supports the cycle counters on x86, PowerPC/POWER,
Alpha, UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors.

   Access to the cycle counters, unfortunately, is a compiler and/or
operating-system dependent task, often requiring inline assembly
language, and it may be that your compiler is not supported. If your
compiler is _not_ supported, FFTW will by default fall back on its
estimator (effectively using 'FFTW_ESTIMATE' for all plans).

   You can add support by editing the file 'kernel/cycle.h'; normally,
this will involve adapting one of the examples already present in order
to use the inline-assembler syntax for your C compiler, and will only
require a couple of lines of code. Anyone adding support for a new
system to 'cycle.h' is encouraged to email us at <fftw@fftw.org>.

   If a cycle counter is not available on your system (e.g. some
embedded processor), and you don't want to use estimated plans, as a
last resort you can use the '--with-slow-timer' option to 'configure'
(on Unix) or '#define WITH_SLOW_TIMER' in 'config.h' (elsewhere). This
will use the much lower-resolution 'gettimeofday' function, or even
'clock' if the former is unavailable, and planning will be extremely
slow.


File: fftw3.info, Node: Generating your own code, Prev: Cycle Counters, Up: Installation and Customization

10.4 Generating your own code
=============================

The directory 'genfft' contains the programs that were used to generate
FFTW's "codelets," which are hard-coded transforms of small sizes. We
do not expect casual users to employ the generator, which is a rather
sophisticated program that generates directed acyclic graphs of FFT
algorithms and performs algebraic simplifications on them. It was
written in Objective Caml, a dialect of ML, which is available at
<http://caml.inria.fr/ocaml/index.en.html>.

   If you have Objective Caml installed (along with recent versions of
GNU 'autoconf', 'automake', and 'libtool'), then you can change the set
of codelets that are generated or play with the generation options. The
set of generated codelets is specified by the
'{dft,rdft}/{codelets,simd}/*/Makefile.am' files. For example, you can
add efficient REDFT codelets of small sizes by modifying
'rdft/codelets/r2r/Makefile.am'. After you modify any 'Makefile.am'
files, you can type 'sh bootstrap.sh' in the top-level directory
followed by 'make' to re-generate the files.

   We do not provide more details about the code-generation process,
since we do not expect that most users will need to generate their own
code. However, feel free to contact us at <fftw@fftw.org> if you are
interested in the subject.

   You might find it interesting to learn Caml and/or some modern
programming techniques that we used in the generator (including monadic
programming), especially if you heard the rumor that Java and
object-oriented programming are the latest advancement in the field.
The internal operation of the codelet generator is described in the
paper, "A Fast Fourier Transform Compiler," by M. Frigo, which is
available from the FFTW home page (http://www.fftw.org) and also
appeared in the 'Proceedings of the 1999 ACM SIGPLAN Conference on
Programming Language Design and Implementation (PLDI)'.


File: fftw3.info, Node: Acknowledgments, Next: License and Copyright, Prev: Installation and Customization, Up: Top

11 Acknowledgments
******************

Matteo Frigo was supported in part by the Special Research Program SFB
F011 "AURORA" of the Austrian Science Fund FWF and by MIT Lincoln
Laboratory.
For previous versions of FFTW, he was supported in part by the Defense
Advanced Research Projects Agency (DARPA), under Grants
N00014-94-1-0985 and F30602-97-1-0270, and by a Digital Equipment
Corporation Fellowship.

   Steven G. Johnson was supported in part by a Dept. of Defense NDSEG
Fellowship, an MIT Karl Taylor Compton Fellowship, and by the Materials
Research Science and Engineering Center program of the National Science
Foundation under award DMR-9400334.

   Code for the Cell Broadband Engine was graciously donated to the FFTW
project by the IBM Austin Research Lab and included in fftw-3.2. (This
code was removed in fftw-3.3.)

   Code for the MIPS paired-single SIMD support was graciously donated
to the FFTW project by CodeSourcery, Inc.

   We are grateful to Sun Microsystems Inc. for its donation of a
cluster of 9 8-processor Ultra HPC 5000 SMPs (24 Gflops peak). These
machines served as the primary platform for the development of early
versions of FFTW.

   We thank Intel Corporation for donating a four-processor Pentium Pro
machine. We thank the GNU/Linux community for giving us a decent OS to
run on that machine.

   We are thankful to the AMD corporation for donating an AMD Athlon XP
1700+ computer to the FFTW project.

   We thank the Compaq/HP testdrive program and VA Software Corporation
(SourceForge.net) for providing remote access to machines that were used
to test FFTW.

   The 'genfft' suite of code generators was written using Objective
Caml, a dialect of ML. Objective Caml is a small and elegant language
developed by Xavier Leroy. The implementation is available from
<http://caml.inria.fr/>. In previous releases of FFTW, 'genfft' was
written in Caml Light, by the same authors.
An even earlier implementation of 'genfft' was written in Scheme, but
Caml is definitely better for this kind of application.

   FFTW uses many tools from the GNU project, including 'automake',
'texinfo', and 'libtool'.

   Prof. Charles E. Leiserson of MIT provided continuous support and
encouragement. This program would not exist without him. Charles also
proposed the name "codelets" for the basic FFT blocks.

   Prof. John D. Joannopoulos of MIT demonstrated continuing tolerance
of Steven's "extra-curricular" computer-science activities, as well as
remarkable creativity in working them into his grant proposals.
Steven's physics degree would not exist without him.

   Franz Franchetti wrote SIMD extensions to FFTW 2, which eventually
led to the SIMD support in FFTW 3.

   Stefan Kral wrote most of the K7 code generator distributed with FFTW
3.0.x and 3.1.x.

   Andrew Sterian contributed the Windows timing code in FFTW 2.

   Didier Miras reported a bug in the test procedure used in FFTW 1.2.
We now use a completely different test algorithm by Funda Ergun that
does not require a separate FFT program to compare against.

   Wolfgang Reimer contributed the Pentium cycle counter and a few fixes
that help portability.

   Ming-Chang Liu uncovered a well-hidden bug in the complex transforms
of FFTW 2.0 and supplied a patch to correct it.

   The FFTW FAQ was written in 'bfnn' (Bizarre Format With No Name) and
formatted using the tools developed by Ian Jackson for the Linux FAQ.

   _We are especially thankful to all of our users for their continuing
support, feedback, and interest during our development of FFTW._


File: fftw3.info, Node: License and Copyright, Next: Concept Index, Prev: Acknowledgments, Up: Top

12 License and Copyright
************************

FFTW is Copyright (C) 2003, 2007-11 Matteo Frigo, Copyright (C) 2003,
2007-11 Massachusetts Institute of Technology.

   FFTW is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.

   This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

   You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. You
can also find the GPL on the GNU web site
(http://www.gnu.org/licenses/gpl-2.0.html).

   In addition, we kindly ask you to acknowledge FFTW and its authors in
any program or publication in which you use FFTW. (You are not
_required_ to do so; it is up to your common sense to decide whether you
want to comply with this request or not.) For general publications, we
suggest referencing: Matteo Frigo and Steven G. Johnson, "The design and
implementation of FFTW3," Proc. IEEE 93 (2), 216-231 (2005).

   Non-free versions of FFTW are available under terms different from
those of the General Public License (e.g. they do not require you to
accompany any object code using FFTW with the corresponding source
code).
For these alternative terms you must purchase a license from MIT's
Technology Licensing Office. Users interested in such a license should
contact us (<fftw@fftw.org>) for more information.