This is fftw3.info, produced by makeinfo version 6.5 from fftw3.texi.

This manual is for FFTW (version 3.3.9, 10 December 2020).

   Copyright (C) 2003 Matteo Frigo.

   Copyright (C) 2003 Massachusetts Institute of Technology.

     Permission is granted to make and distribute verbatim copies of
     this manual provided the copyright notice and this permission
     notice are preserved on all copies.

     Permission is granted to copy and distribute modified versions of
     this manual under the conditions for verbatim copying, provided
     that the entire resulting derived work is distributed under the
     terms of a permission notice identical to this one.

     Permission is granted to copy and distribute translations of this
     manual into another language, under the above conditions for
     modified versions, except that this permission notice may be stated
     in a translation approved by the Free Software Foundation.
INFO-DIR-SECTION Development
START-INFO-DIR-ENTRY
* fftw3: (fftw3).	FFTW User's Manual.
END-INFO-DIR-ENTRY


File: fftw3.info,  Node: Top,  Next: Introduction,  Prev: (dir),  Up: (dir)

FFTW User Manual
****************

Welcome to FFTW, the Fastest Fourier Transform in the West.  FFTW is a
collection of fast C routines to compute the discrete Fourier transform.
This manual documents FFTW version 3.3.9.

* Menu:

* Introduction::
* Tutorial::
* Other Important Topics::
* FFTW Reference::
* Multi-threaded FFTW::
* Distributed-memory FFTW with MPI::
* Calling FFTW from Modern Fortran::
* Calling FFTW from Legacy Fortran::
* Upgrading from FFTW version 2::
* Installation and Customization::
* Acknowledgments::
* License and Copyright::
* Concept Index::
* Library Index::


File: fftw3.info,  Node: Introduction,  Next: Tutorial,  Prev: Top,  Up: Top

1 Introduction
**************

This manual documents version 3.3.9 of FFTW, the _Fastest Fourier
Transform in the West_.  FFTW is a comprehensive collection of fast C
routines for computing the discrete Fourier transform (DFT) and various
special cases thereof.

   * FFTW computes the DFT of complex data, real data, even- or
     odd-symmetric real data (these symmetric transforms are usually
     known as the discrete cosine or sine transform, respectively), and
     the discrete Hartley transform (DHT) of real data.

   * The input data can have arbitrary length.  FFTW employs O(n log n)
     algorithms for all lengths, including prime numbers.

   * FFTW supports arbitrary multi-dimensional data.

   * FFTW supports the SSE, SSE2, AVX, AVX2, AVX512, KCVI, Altivec, VSX,
     and NEON vector instruction sets.

   * FFTW includes parallel (multi-threaded) transforms for
     shared-memory systems.

   * Starting with version 3.3, FFTW includes distributed-memory
     parallel transforms using MPI.

   We assume herein that you are familiar with the properties and uses
of the DFT that are relevant to your application.  Otherwise, see e.g.
'The Fast Fourier Transform and Its Applications' by E. O. Brigham
(Prentice-Hall, Englewood Cliffs, NJ, 1988).  Our web page
(http://www.fftw.org) also has links to FFT-related information online.

   In order to use FFTW effectively, you need to learn one basic concept
of FFTW's internal structure: FFTW does not use a fixed algorithm for
computing the transform, but instead it adapts the DFT algorithm to
details of the underlying hardware in order to maximize performance.
Hence, the computation of the transform is split into two phases.
First, FFTW's "planner" "learns" the fastest way to compute the
transform on your machine.  The planner produces a data structure called
a "plan" that contains this information.  Subsequently, the plan is
"executed" to transform the array of input data as dictated by the plan.
The plan can be reused as many times as needed.  In typical
high-performance applications, many transforms of the same size are
computed and, consequently, a relatively expensive initialization of
this sort is acceptable.  On the other hand, if you need a single
transform of a given size, the one-time cost of the planner becomes
significant.  For this case, FFTW provides fast planners based on
heuristics or on previously computed plans.

   FFTW supports transforms of data with arbitrary length, rank,
multiplicity, and a general memory layout.  In simple cases, however,
this generality may be unnecessary and confusing.  Consequently, we
organized the interface to FFTW into three levels of increasing
generality.

   * The "basic interface" computes a single transform of contiguous
     data.
   * The "advanced interface" computes transforms of multiple or strided
     arrays.
   * The "guru interface" supports the most general data layouts,
     multiplicities, and strides.

   We expect that most users will be best served by the basic interface,
whereas the guru interface requires careful attention to the
documentation to avoid problems.

   Besides the automatic performance adaptation performed by the
planner, it is also possible for advanced users to customize FFTW
manually.  For example, if code space is a concern, we provide a tool
that links only the subset of FFTW needed by your application.
Conversely, you may need to extend FFTW because the standard
distribution is not sufficient for your needs.  For example, the
standard FFTW distribution works most efficiently for arrays whose size
can be factored into small primes (2, 3, 5, and 7), and otherwise it
uses a slower general-purpose routine.  If you need efficient transforms
of other sizes, you can use FFTW's code generator, which produces fast C
programs ("codelets") for any particular array size you may care about.
For example, if you need transforms of size 513 = 19 x 3^3, you can
customize FFTW to support the factor 19 efficiently.
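
   As a rough illustration of the size guideline above, the following C
sketch tests whether a length factors entirely into 2, 3, 5, and 7.  The
name 'has_only_small_factors' is hypothetical and is not part of FFTW;
the function only encodes the statement made in this section and says
nothing about how the planner actually chooses an algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not an FFTW routine: returns true when n is a
 * product of the primes 2, 3, 5, and 7 only (n == 1 counts as true). */
static bool has_only_small_factors(size_t n)
{
    static const size_t primes[] = {2, 3, 5, 7};

    assert(n > 0);
    for (size_t i = 0; i < sizeof primes / sizeof primes[0]; ++i) {
        /* Divide out every power of this prime. */
        while (n % primes[i] == 0)
            n /= primes[i];
    }
    return n == 1;  /* anything left over is a larger prime factor */
}
```

   For instance, 512 = 2^9 passes this test, whereas 513 = 19 x 3^3 does
not, because the factor 19 remains after the small primes are removed.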

   For more information regarding FFTW, see the paper, "The Design and
Implementation of FFTW3," by M. Frigo and S. G. Johnson, which was an
invited paper in 'Proc. IEEE' 93 (2), p. 216 (2005).  The code
generator is described in the paper "A fast Fourier transform compiler",
by M. Frigo, in the 'Proceedings of the 1999 ACM SIGPLAN Conference on
Programming Language Design and Implementation (PLDI), Atlanta, Georgia,
May 1999'.  These papers, along with the latest version of FFTW, the
FAQ, benchmarks, and other links, are available at the FFTW home page
(http://www.fftw.org).

   The current version of FFTW incorporates many good ideas from the
past thirty years of FFT literature.  In one way or another, FFTW uses
the Cooley-Tukey algorithm, the prime factor algorithm, Rader's
algorithm for prime sizes, and a split-radix algorithm (with a
"conjugate-pair" variation pointed out to us by Dan Bernstein).  FFTW's
code generator also produces new algorithms that we do not completely
understand.  The reader is referred to the cited papers for the
appropriate references.

   The rest of this manual is organized as follows.  We first discuss
the sequential (single-processor) implementation.  We start by
describing the basic interface/features of FFTW in *note Tutorial::.
Next, *note Other Important Topics:: discusses data alignment (*note
SIMD alignment and fftw_malloc::), the storage scheme of
multi-dimensional arrays (*note Multi-dimensional Array Format::), and
FFTW's mechanism for storing plans on disk (*note Words of Wisdom-Saving
Plans::).  Next, *note FFTW Reference:: provides comprehensive
documentation of all FFTW's features.  Parallel transforms are discussed
in their own chapters: *note Multi-threaded FFTW:: and *note
Distributed-memory FFTW with MPI::.  Fortran programmers can also use
FFTW, as described in *note Calling FFTW from Legacy Fortran:: and *note
Calling FFTW from Modern Fortran::.  *note Installation and
Customization:: explains how to install FFTW in your computer system and
how to adapt FFTW to your needs.  License and copyright information is
given in *note License and Copyright::.  Finally, we thank all the
people who helped us in *note Acknowledgments::.


File: fftw3.info,  Node: Tutorial,  Next: Other Important Topics,  Prev: Introduction,  Up: Top

2 Tutorial
**********

* Menu:

* Complex One-Dimensional DFTs::
* Complex Multi-Dimensional DFTs::
* One-Dimensional DFTs of Real Data::
* Multi-Dimensional DFTs of Real Data::
* More DFTs of Real Data::

This chapter describes the basic usage of FFTW, i.e., how to compute the
Fourier transform of a single array.  This chapter tells the truth, but
not the _whole_ truth.  Specifically, FFTW implements additional
routines and flags that are not documented here, although in many cases
we try to indicate where added capabilities exist.  For more complete
information, see *note FFTW Reference::.  (Note that you need to compile
and install FFTW before you can use it in a program.  For the details of
the installation, see *note Installation and Customization::.)

   We recommend that you read this tutorial in order.(1)  At the least,
read the first section (*note Complex One-Dimensional DFTs::) before
reading any of the others, even if your main interest lies in one of the
other transform types.

   Users of FFTW version 2 and earlier may also want to read *note
Upgrading from FFTW version 2::.

   ---------- Footnotes ----------

   (1) You can read the tutorial in bit-reversed order after computing
your first transform.


File: fftw3.info,  Node: Complex One-Dimensional DFTs,  Next: Complex Multi-Dimensional DFTs,  Prev: Tutorial,  Up: Tutorial

2.1 Complex One-Dimensional DFTs
================================

     Plan: To bother about the best method of accomplishing an
     accidental result.  [Ambrose Bierce, 'The Enlarged Devil's
     Dictionary'.]

   The basic usage of FFTW to compute a one-dimensional DFT of size 'N'
is simple, and it typically looks something like this code:

     #include <fftw3.h>
     ...
     {
         fftw_complex *in, *out;
         fftw_plan p;
         ...
         in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
         out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
         p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
         ...
         fftw_execute(p); /* repeat as needed */
         ...
         fftw_destroy_plan(p);
         fftw_free(in); fftw_free(out);
     }

   You must link this code with the 'fftw3' library.  On Unix systems,
link with '-lfftw3 -lm'.

   The example code first allocates the input and output arrays.  You
can allocate them in any way that you like, but we recommend using
'fftw_malloc', which behaves like 'malloc' except that it properly
aligns the array when SIMD instructions (such as SSE and Altivec) are
available (*note SIMD alignment and fftw_malloc::).  [Alternatively, we
provide a convenient wrapper function 'fftw_alloc_complex(N)' which has
the same effect.]

   The data is an array of type 'fftw_complex', which is by default a
'double[2]' composed of the real ('in[i][0]') and imaginary ('in[i][1]')
parts of a complex number.

   The next step is to create a "plan", which is an object that contains
all the data that FFTW needs to compute the FFT.  This function creates
the plan:

     fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);

   The first argument, 'n', is the size of the transform you are trying
to compute.  The size 'n' can be any positive integer, but sizes that
are products of small factors are transformed most efficiently (although
prime sizes still use an O(n log n) algorithm).

   The next two arguments are pointers to the input and output arrays of
the transform.  These pointers can be equal, indicating an "in-place"
transform.

   The fourth argument, 'sign', can be either 'FFTW_FORWARD' ('-1') or
'FFTW_BACKWARD' ('+1'), and indicates the direction of the transform you
are interested in; technically, it is the sign of the exponent in the
transform.

   The 'flags' argument is usually either 'FFTW_MEASURE' or
'FFTW_ESTIMATE'.  'FFTW_MEASURE' instructs FFTW to run and measure the
execution time of several FFTs in order to find the best way to compute
the transform of size 'n'.  This process takes some time (usually a few
seconds), depending on your machine and on the size of the transform.
'FFTW_ESTIMATE', on the contrary, does not run any computation and just
builds a reasonable plan that is probably sub-optimal.  In short, if
your program performs many transforms of the same size and
initialization time is not important, use 'FFTW_MEASURE'; otherwise use
'FFTW_ESTIMATE'.

   _You must create the plan before initializing the input_, because
'FFTW_MEASURE' overwrites the 'in'/'out' arrays.  (Technically,
'FFTW_ESTIMATE' does not touch your arrays, but you should always create
plans first just to be sure.)

   Once the plan has been created, you can use it as many times as you
like for transforms on the specified 'in'/'out' arrays, computing the
actual transforms via 'fftw_execute(plan)':

     void fftw_execute(const fftw_plan plan);

   The DFT results are stored in-order in the array 'out', with the
zero-frequency (DC) component in 'out[0]'.  If 'in != out', the
transform is "out-of-place" and the input array 'in' is not modified.
Otherwise, the input array is overwritten with the transform.

   If you want to transform a _different_ array of the same size, you
can create a new plan with 'fftw_plan_dft_1d' and FFTW automatically
reuses the information from the previous plan, if possible.
Alternatively, with the "guru" interface you can apply a given plan to a
different array, if you are careful.  *Note FFTW Reference::.

   When you are done with the plan, you deallocate it by calling
'fftw_destroy_plan(plan)':

     void fftw_destroy_plan(fftw_plan plan);

   If you allocate an array with 'fftw_malloc()' you must deallocate it
with 'fftw_free()'.  Do not use 'free()' or, heaven forbid, 'delete'.

   FFTW computes an _unnormalized_ DFT.  Thus, computing a forward
followed by a backward transform (or vice versa) results in the original
array scaled by 'n'.  For the definition of the DFT, see *note What FFTW
Really Computes::.

   If you have a C compiler, such as 'gcc', that supports the C99
standard, and you '#include <complex.h>' _before_ '<fftw3.h>', then
'fftw_complex' is the native double-precision complex type and you can
manipulate it with ordinary arithmetic.  Otherwise, FFTW defines its own
complex type, which is bit-compatible with the C99 complex type.  *Note
Complex numbers::.  (The C++ '<complex>' template class may also be
usable via a typecast.)
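
   The bit-compatibility mentioned above can be illustrated without FFTW
itself.  In this sketch, 'pair_complex' is a hypothetical stand-in for
the default 'double[2]' form of 'fftw_complex', and 'store_c99' and
'load_c99' are illustrative helpers that copy values between it and the
C99 'double complex' type byte for byte.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Stand-in for the default fftw_complex layout (hypothetical name). */
typedef double pair_complex[2];

/* Copy a C99 complex value into the two-double layout. */
static void store_c99(pair_complex dst, double complex z)
{
    /* C99 gives a complex type the same representation as an array of
     * two reals: real part first, imaginary part second. */
    assert(sizeof(double complex) == sizeof(pair_complex));
    memcpy(dst, &z, sizeof z);
}

/* Read the two-double layout back as a C99 complex value. */
static double complex load_c99(const pair_complex src)
{
    double complex z;
    memcpy(&z, src, sizeof z);
    return z;
}
```

   After 'store_c99(p, 3.0 - 4.0 * I)', the real part is in 'p[0]' and
the imaginary part is in 'p[1]'.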

   To use single or long-double precision versions of FFTW, replace the
'fftw_' prefix by 'fftwf_' or 'fftwl_' and link with '-lfftw3f' or
'-lfftw3l', but use the _same_ '<fftw3.h>' header file.

   Many more flags exist besides 'FFTW_MEASURE' and 'FFTW_ESTIMATE'.
For example, use 'FFTW_PATIENT' if you're willing to wait even longer
for a possibly even faster plan (*note FFTW Reference::).  You can also
save plans for future use, as described by *note Words of Wisdom-Saving
Plans::.


File: fftw3.info,  Node: Complex Multi-Dimensional DFTs,  Next: One-Dimensional DFTs of Real Data,  Prev: Complex One-Dimensional DFTs,  Up: Tutorial

2.2 Complex Multi-Dimensional DFTs
==================================

Multi-dimensional transforms work much the same way as one-dimensional
transforms: you allocate arrays of 'fftw_complex' (preferably using
'fftw_malloc'), create an 'fftw_plan', execute it as many times as you
want with 'fftw_execute(plan)', and clean up with
'fftw_destroy_plan(plan)' (and 'fftw_free').

   FFTW provides two routines for creating plans for 2d and 3d
transforms, and one routine for creating plans of arbitrary
dimensionality.  The 2d and 3d routines have the following signature:

     fftw_plan fftw_plan_dft_2d(int n0, int n1,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);
     fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);

   These routines create plans for 'n0' by 'n1' two-dimensional (2d)
transforms and 'n0' by 'n1' by 'n2' 3d transforms, respectively.  All of
these transforms operate on contiguous arrays in the C-standard
"row-major" order, so that the last dimension has the fastest-varying
index in the array.  This layout is described further in *note
Multi-dimensional Array Format::.
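
   To make the row-major convention concrete, the following sketch
computes the offset of an element in a contiguous 2d or 3d array.  The
helper names 'index_2d' and 'index_3d' are illustrative and are not
provided by FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i0, i1) in a contiguous n0 x n1 row-major array. */
static size_t index_2d(size_t n1, size_t i0, size_t i1)
{
    assert(i1 < n1);
    return i0 * n1 + i1;
}

/* Offset of element (i0, i1, i2) in a contiguous n0 x n1 x n2
 * row-major array; the last index varies fastest. */
static size_t index_3d(size_t n1, size_t n2,
                       size_t i0, size_t i1, size_t i2)
{
    assert(i1 < n1 && i2 < n2);
    return (i0 * n1 + i1) * n2 + i2;
}
```

   With this layout, stepping the last index by one moves to the
adjacent element in memory, while stepping the first index skips a whole
'n1' (or 'n1' by 'n2') block.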

   FFTW can also compute transforms of higher dimensionality.  In order
to avoid confusion between the various meanings of the word
"dimension", we use the term _rank_ to denote the number of independent
indices in an array.(1)  For example, we say that a 2d transform has
rank 2, a 3d transform has rank 3, and so on.  You can plan transforms
of arbitrary rank by means of the following function:

     fftw_plan fftw_plan_dft(int rank, const int *n,
                             fftw_complex *in, fftw_complex *out,
                             int sign, unsigned flags);

   Here, 'n' is a pointer to an array 'n[rank]' denoting an 'n[0]' by
'n[1]' by ... by 'n[rank-1]' transform.  Thus, for example, the call

     fftw_plan_dft_2d(n0, n1, in, out, sign, flags);

is equivalent to the following code fragment:

     int n[2];
     n[0] = n0;
     n[1] = n1;
     fftw_plan_dft(2, n, in, out, sign, flags);

   'fftw_plan_dft' is not restricted to 2d and 3d transforms, however,
but it can plan transforms of arbitrary rank.

   You may have noticed that all the planner routines described so far
have overlapping functionality.  For example, you can plan a 1d or 2d
transform by using 'fftw_plan_dft' with a 'rank' of '1' or '2', or even
by calling 'fftw_plan_dft_3d' with 'n0' and/or 'n1' equal to '1' (with
no loss in efficiency).  This pattern continues, and FFTW's planning
routines in general form a "partial order," sequences of interfaces with
strictly increasing generality but correspondingly greater complexity.

   'fftw_plan_dft' is the most general complex-DFT routine that we
describe in this tutorial, but there are also the advanced and guru
interfaces, which allow one to efficiently combine multiple/strided
transforms into a single FFTW plan, transform a subset of a larger
multi-dimensional array, and/or to handle more general complex-number
formats.  For more information, see *note FFTW Reference::.

   ---------- Footnotes ----------

   (1) The term "rank" is commonly used in the APL, FORTRAN, and Common
Lisp traditions, although it is not so common in the C world.


File: fftw3.info,  Node: One-Dimensional DFTs of Real Data,  Next: Multi-Dimensional DFTs of Real Data,  Prev: Complex Multi-Dimensional DFTs,  Up: Tutorial

2.3 One-Dimensional DFTs of Real Data
=====================================

In many practical applications, the input data 'in[i]' are purely real
numbers, in which case the DFT output satisfies the "Hermitian"
redundancy: 'out[i]' is the conjugate of 'out[n-i]'.  It is possible to
take advantage of these circumstances in order to achieve roughly a
factor of two improvement in both speed and memory usage.

   In exchange for these speed and space advantages, the user sacrifices
some of the simplicity of FFTW's complex transforms.  First of all, the
input and output arrays are of _different sizes and types_: the input is
'n' real numbers, while the output is 'n/2+1' complex numbers (the
non-redundant outputs); this also requires slight "padding" of the input
array for in-place transforms.  Second, the inverse transform (complex
to real) has the side-effect of _overwriting its input array_, by
default.  Neither of these inconveniences should pose a serious problem
for users, but it is important to be aware of them.
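
   The Hermitian redundancy means that the 'n/2+1' stored outputs
determine the remaining ones.  As a sketch, the hypothetical helper
'expand_hermitian' below rebuilds all 'n' complex outputs from the
non-redundant half; 'pair_complex' again stands in for the default
'double[2]' layout of 'fftw_complex', and neither name is part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the default fftw_complex layout (hypothetical name). */
typedef double pair_complex[2];

/* Rebuild all n outputs of a real-input DFT from the n/2+1 stored ones.
 * half must hold n/2+1 entries and full must hold n entries. */
static void expand_hermitian(size_t n, pair_complex *half,
                             pair_complex *full)
{
    assert(n > 0);
    for (size_t k = 0; k < n; ++k) {
        if (k <= n / 2) {
            /* Stored directly in the non-redundant half. */
            full[k][0] = half[k][0];
            full[k][1] = half[k][1];
        } else {
            /* out[k] is the conjugate of out[n-k]. */
            full[k][0] = half[n - k][0];
            full[k][1] = -half[n - k][1];
        }
    }
}
```

   For the real input 1, 2, 3, 4, the full DFT is 10, -2+2i, -2, -2-2i;
only the first three values need to be stored, and the fourth is the
conjugate of the second.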

   The routines to perform real-data transforms are almost the same as
those for complex transforms: you allocate arrays of 'double' and/or
'fftw_complex' (preferably using 'fftw_malloc' or 'fftw_alloc_complex'),
create an 'fftw_plan', execute it as many times as you want with
'fftw_execute(plan)', and clean up with 'fftw_destroy_plan(plan)' (and
'fftw_free').  The only differences are that the input (or output) is of
type 'double' and there are new routines to create the plan.  In one
dimension:

     fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out,
                                    unsigned flags);

for the real input to complex-Hermitian output ("r2c") and
complex-Hermitian input to real output ("c2r") transforms.  Unlike the
complex DFT planner, there is no 'sign' argument.  Instead, r2c DFTs are
always 'FFTW_FORWARD' and c2r DFTs are always 'FFTW_BACKWARD'.  (For
single/long-double precision 'fftwf' and 'fftwl', 'double' should be
replaced by 'float' and 'long double', respectively.)

   Here, 'n' is the "logical" size of the DFT, not necessarily the
physical size of the array.  In particular, the real ('double') array
has 'n' elements, while the complex ('fftw_complex') array has 'n/2+1'
elements (where the division is rounded down).  For an in-place
transform, 'in' and 'out' are aliased to the same array, which must be
big enough to hold both; so, the real array would actually have
'2*(n/2+1)' elements, where the elements beyond the first 'n' are unused
padding.  (Note that this is very different from the concept of
"zero-padding" a transform to a larger length, which changes the logical
size of the DFT by actually adding new input data.)  The kth element of
the complex array is exactly the same as the kth element of the
corresponding complex DFT.  All positive 'n' are supported; products of
small factors are most efficient, but an O(n log n) algorithm is used
even for prime sizes.
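
   The size rules above are easy to get wrong by one element, so it may
help to write them down as code.  The helpers below are hypothetical
conveniences, not FFTW functions; they only restate the 'n/2+1' and
'2*(n/2+1)' formulas from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex outputs of an r2c transform of logical size n. */
static size_t r2c_complex_len(size_t n)
{
    assert(n > 0);
    return n / 2 + 1;  /* integer division rounds down */
}

/* Number of doubles the real array needs for an in-place transform. */
static size_t r2c_inplace_real_len(size_t n)
{
    return 2 * r2c_complex_len(n);
}

/* Unused padding elements after the first n reals (in-place only). */
static size_t r2c_inplace_padding(size_t n)
{
    return r2c_inplace_real_len(n) - n;
}
```

   For 'n = 8' this gives 5 complex outputs and an in-place real array
of 10 doubles (2 of padding); for 'n = 7' it gives 4 complex outputs and
8 doubles (1 of padding).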

   As noted above, the c2r transform destroys its input array even for
out-of-place transforms.  This can be prevented, if necessary, by
including 'FFTW_PRESERVE_INPUT' in the 'flags', with unfortunately some
sacrifice in performance.  This flag is also not currently supported for
multi-dimensional real DFTs (next section).

   Readers familiar with DFTs of real data will recall that the 0th (the
"DC") and 'n/2'-th (the "Nyquist" frequency, when 'n' is even) elements
of the complex output are purely real.  Some implementations therefore
store the Nyquist element where the DC imaginary part would go, in order
to make the input and output arrays the same size.  Such packing,
however, does not generalize well to multi-dimensional transforms, and
the space savings are minuscule in any case; FFTW does not support it.

   An alternative interface for one-dimensional r2c and c2r DFTs can be
found in the 'r2r' interface (*note The Halfcomplex-format DFT::), with
"halfcomplex"-format output that _is_ the same size (and type) as the
input array.  That interface, although it is not very useful for
multi-dimensional transforms, may sometimes yield better performance.


File: fftw3.info,  Node: Multi-Dimensional DFTs of Real Data,  Next: More DFTs of Real Data,  Prev: One-Dimensional DFTs of Real Data,  Up: Tutorial

2.4 Multi-Dimensional DFTs of Real Data
=======================================

Multi-dimensional DFTs of real data use the following planner routines:

     fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
                                 double *in, fftw_complex *out,
                                 unsigned flags);

as well as the corresponding 'c2r' routines with the input/output
types swapped.  These routines work similarly to their complex
analogues, except for the fact that here the complex output array is cut
roughly in half and the real array requires padding for in-place
transforms (as in 1d, above).

   As before, 'n' is the logical size of the array, and the consequences
of this on the format of the complex arrays deserve careful
attention.  Suppose that the real data has dimensions n[0] x n[1] x n[2]
x ... x n[d-1] (in row-major order).  Then, after an r2c transform, the
output is an n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) array of
'fftw_complex' values in row-major order, corresponding to slightly over
half of the output of the corresponding complex DFT.  (The division is
rounded down.)  The ordering of the data is otherwise exactly the same
as in the complex-DFT case.

   For out-of-place transforms, this is the end of the story: the real
data is stored as a row-major array of size n[0] x n[1] x n[2] x ... x
n[d-1] and the complex data is stored as a row-major array of size n[0]
x n[1] x n[2] x ... x (n[d-1]/2 + 1).

   For in-place transforms, however, extra padding of the real-data
array is necessary because the complex array is larger than the real
array, and the two arrays share the same memory locations.  Thus, for
in-place transforms, the final dimension of the real-data array must be
padded with extra values to accommodate the size of the complex
data--two values if the last dimension is even and one if it is odd.
That is, the last dimension of the real data must physically contain 2 *
(n[d-1]/2+1) 'double' values (exactly enough to hold the complex data).
This physical array size does not, however, change the _logical_ array
size--only n[d-1] values are actually stored in the last dimension, and
n[d-1] is the last dimension passed to the plan-creation routine.

   For example, consider the transform of a two-dimensional real array
of size 'n0' by 'n1'.  The output of the r2c transform is a
two-dimensional complex array of size 'n0' by 'n1/2+1', where the 'y'
dimension has been cut nearly in half because of redundancies in the
output.  Because 'fftw_complex' is twice the size of 'double', the
output array is slightly bigger than the input array.  Thus, if we want
to compute the transform in place, we must _pad_ the input array so that
it is of size 'n0' by '2*(n1/2+1)'.  If 'n1' is even, then there are two
padding elements at the end of each row (which need not be initialized,
as they are only used for output).
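
   The in-place layout of this two-dimensional example can be sketched
as index arithmetic.  The helper names below are hypothetical and are
not part of FFTW; they assume the padded row length '2*(n1/2+1)' and the
complex row length 'n1/2+1' described above.

```c
#include <assert.h>
#include <stddef.h>

/* Physical row length, in doubles, of an in-place n0 x n1 real array. */
static size_t padded_row_len(size_t n1)
{
    assert(n1 > 0);
    return 2 * (n1 / 2 + 1);
}

/* Offset, in doubles, of logical real element (i, j) with j < n1.
 * Offsets j >= n1 within a row are padding and hold no input. */
static size_t padded_real_index(size_t n1, size_t i, size_t j)
{
    assert(j < n1);
    return i * padded_row_len(n1) + j;
}

/* Offset, in complex elements, of output (i, k) with k <= n1/2. */
static size_t complex_index(size_t n1, size_t i, size_t k)
{
    assert(k <= n1 / 2);
    return i * (n1 / 2 + 1) + k;
}
```

   With 'n1 = 6', each real row occupies 8 doubles, of which the last 2
are padding, and the same memory holds 4 complex values per row after
the transform.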

   These transforms are unnormalized, so an r2c followed by a c2r
transform (or vice versa) will result in the original data scaled by the
number of real data elements--that is, the product of the (logical)
dimensions of the real data.

   (Because the last dimension is treated specially, if it is equal to
'1' the transform is _not_ equivalent to a lower-dimensional r2c/c2r
transform.  In that case, the last complex dimension also has size '1'
('=1/2+1'), and no advantage is gained over the complex transforms.)


File: fftw3.info,  Node: More DFTs of Real Data,  Prev: Multi-Dimensional DFTs of Real Data,  Up: Tutorial

2.5 More DFTs of Real Data
==========================

* Menu:

* The Halfcomplex-format DFT::
* Real even/odd DFTs (cosine/sine transforms)::
* The Discrete Hartley Transform::

FFTW supports several other transform types via a unified "r2r"
(real-to-real) interface, so called because it takes a real ('double')
array and outputs a real array of the same size.  These r2r transforms
currently fall into three categories: DFTs of real input and
complex-Hermitian output in halfcomplex format, DFTs of real input with
even/odd symmetry (a.k.a. discrete cosine/sine transforms, DCTs/DSTs),
and discrete Hartley transforms (DHTs), all described in more detail by
the following sections.

   The r2r transforms follow the by now familiar interface of creating
an 'fftw_plan', executing it with 'fftw_execute(plan)', and destroying
it with 'fftw_destroy_plan(plan)'.  Furthermore, all r2r transforms
share the same planner interface:

     fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
                                fftw_r2r_kind kind, unsigned flags);
     fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
                                fftw_r2r_kind kind0, fftw_r2r_kind kind1,
                                unsigned flags);
     fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
                                double *in, double *out,
                                fftw_r2r_kind kind0,
                                fftw_r2r_kind kind1,
                                fftw_r2r_kind kind2,
                                unsigned flags);
     fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
                             const fftw_r2r_kind *kind, unsigned flags);

   Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional
transforms for contiguous arrays in row-major order, transforming (real)
input to output of the same size, where 'n' specifies the _physical_
dimensions of the arrays.  All positive 'n' are supported (with the
exception of 'n=1' for the 'FFTW_REDFT00' kind, noted in the real-even
subsection below); products of small factors are most efficient
(factorizing 'n-1' and 'n+1' for 'FFTW_REDFT00' and 'FFTW_RODFT00'
kinds, described below), but an O(n log n) algorithm is used even for
prime sizes.

   Each dimension has a "kind" parameter, of type 'fftw_r2r_kind',
specifying the kind of r2r transform to be used for that dimension.  (In
the case of 'fftw_plan_r2r', this is an array 'kind[rank]' where
'kind[i]' is the transform kind for the dimension 'n[i]'.)  The kind can
be one of a set of predefined constants, defined in the following
subsections.

   In other words, FFTW computes the separable product of the specified
r2r transforms over each dimension, which can be used e.g. for partial
differential equations with mixed boundary conditions.  (For some r2r
kinds, notably the halfcomplex DFT and the DHT, such a separable product
is somewhat problematic in more than one dimension, however, as is
described below.)

   In the current version of FFTW, all r2r transforms except for the
halfcomplex type are computed via pre- or post-processing of halfcomplex
transforms, and they are therefore not as fast as they could be.  Since
most other general DCT/DST codes employ a similar algorithm, however,
FFTW's implementation should provide at least competitive performance.
623
624
625File: fftw3.info,  Node: The Halfcomplex-format DFT,  Next: Real even/odd DFTs (cosine/sine transforms),  Prev: More DFTs of Real Data,  Up: More DFTs of Real Data
626
6272.5.1 The Halfcomplex-format DFT
628--------------------------------
629
630An r2r kind of 'FFTW_R2HC' ("r2hc") corresponds to an r2c DFT (*note
631One-Dimensional DFTs of Real Data::) but with "halfcomplex" format
632output, and may sometimes be faster and/or more convenient than the
latter.  The inverse "hc2r" transform is of kind 'FFTW_HC2R'.  The
halfcomplex format consists of the non-redundant half of the complex
output for a 1d real-input DFT of size 'n', stored as a sequence of 'n'
real numbers ('double') in the format:

   r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1
639
640   Here, rk is the real part of the kth output, and ik is the imaginary
641part.  (Division by 2 is rounded down.)  For a halfcomplex array
'hc[n]', the kth component thus has its real part in 'hc[k]' and its
imaginary part in 'hc[n-k]', with the exception of 'k == 0' or
'k == n/2' (the latter only if 'n' is even).  In these two cases, the
imaginary part is zero due to symmetries of the real-input DFT, and is
not stored.
646Thus, the r2hc transform of 'n' real values is a halfcomplex array of
647length 'n', and vice versa for hc2r.
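
   As an illustration of this layout, the following minimal sketch reads
the real and imaginary parts of output 'k' (for 0 <= k <= n/2) from a
halfcomplex array.  The helper names 'hc_real' and 'hc_imag' are
hypothetical and are not part of the FFTW API.

```c
#include <stddef.h>

/* Real part of the k-th DFT output (0 <= k <= n/2) of a halfcomplex
   array hc[n].  (hc_real/hc_imag are illustrative names, not FFTW's.) */
static double hc_real(const double *hc, size_t n, size_t k)
{
    (void)n;            /* the real part always lives at index k */
    return hc[k];
}

/* Imaginary part of the k-th output.  It is zero, and not stored, for
   k == 0 and (when n is even) for k == n/2; otherwise it is hc[n-k]. */
static double hc_imag(const double *hc, size_t n, size_t k)
{
    if (k == 0 || (n % 2 == 0 && k == n / 2))
        return 0.0;
    return hc[n - k];
}
```

   For 'n = 5' the array holds r0, r1, r2, i2, i1, so
'hc_imag(hc, 5, 2)' reads 'hc[3]'.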
648
649   Aside from the differing format, the output of
650'FFTW_R2HC'/'FFTW_HC2R' is otherwise exactly the same as for the
651corresponding 1d r2c/c2r transform (i.e.  'FFTW_FORWARD'/'FFTW_BACKWARD'
652transforms, respectively).  Recall that these transforms are
653unnormalized, so r2hc followed by hc2r will result in the original data
654multiplied by 'n'.  Furthermore, like the c2r transform, an out-of-place
655hc2r transform will _destroy its input_ array.
656
657   Although these halfcomplex transforms can be used with the
658multi-dimensional r2r interface, the interpretation of such a separable
659product of transforms along each dimension is problematic.  For example,
660consider a two-dimensional 'n0' by 'n1', r2hc by r2hc transform planned
661by 'fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC, FFTW_R2HC,
662FFTW_MEASURE)'.  Conceptually, FFTW first transforms the rows (of size
663'n1') to produce halfcomplex rows, and then transforms the columns (of
664size 'n0').  Half of these column transforms, however, are of imaginary
665parts, and should therefore be multiplied by i and combined with the
666r2hc transforms of the real columns to produce the 2d DFT amplitudes;
667FFTW's r2r transform does _not_ perform this combination for you.  Thus,
668if a multi-dimensional real-input/output DFT is required, we recommend
669using the ordinary r2c/c2r interface (*note Multi-Dimensional DFTs of
670Real Data::).
671
672
673File: fftw3.info,  Node: Real even/odd DFTs (cosine/sine transforms),  Next: The Discrete Hartley Transform,  Prev: The Halfcomplex-format DFT,  Up: More DFTs of Real Data
674
6752.5.2 Real even/odd DFTs (cosine/sine transforms)
676-------------------------------------------------
677
678The Fourier transform of a real-even function f(-x) = f(x) is real-even,
679and i times the Fourier transform of a real-odd function f(-x) = -f(x)
680is real-odd.  Similar results hold for a discrete Fourier transform, and
681thus for these symmetries the need for complex inputs/outputs is
682entirely eliminated.  Moreover, one gains a factor of two in speed/space
683from the fact that the data are real, and an additional factor of two
684from the even/odd symmetry: only the non-redundant (first) half of the
685array need be stored.  The result is the real-even DFT ("REDFT") and the
686real-odd DFT ("RODFT"), also known as the discrete cosine and sine
687transforms ("DCT" and "DST"), respectively.
688
689   (In this section, we describe the 1d transforms; multi-dimensional
690transforms are just a separable product of these transforms operating
691along each dimension.)
692
693   Because of the discrete sampling, one has an additional choice: is
694the data even/odd around a sampling point, or around the point halfway
695between two samples?  The latter corresponds to _shifting_ the samples
696by _half_ an interval, and gives rise to several transform variants
697denoted by REDFTab and RODFTab: a and b are 0 or 1, and indicate whether
698the input (a) and/or output (b) are shifted by half a sample (1 means it
699is shifted).  These are also known as types I-IV of the DCT and DST, and
700all four types are supported by FFTW's r2r interface.(1)
701
702   The r2r kinds for the various REDFT and RODFT types supported by
703FFTW, along with the boundary conditions at both ends of the _input_
704array ('n' real numbers 'in[j=0..n-1]'), are:
705
706   * 'FFTW_REDFT00' (DCT-I): even around j=0 and even around j=n-1.
707
708   * 'FFTW_REDFT10' (DCT-II, "the" DCT): even around j=-0.5 and even
709     around j=n-0.5.
710
711   * 'FFTW_REDFT01' (DCT-III, "the" IDCT): even around j=0 and odd
712     around j=n.
713
714   * 'FFTW_REDFT11' (DCT-IV): even around j=-0.5 and odd around j=n-0.5.
715
716   * 'FFTW_RODFT00' (DST-I): odd around j=-1 and odd around j=n.
717
718   * 'FFTW_RODFT10' (DST-II): odd around j=-0.5 and odd around j=n-0.5.
719
720   * 'FFTW_RODFT01' (DST-III): odd around j=-1 and even around j=n-1.
721
722   * 'FFTW_RODFT11' (DST-IV): odd around j=-0.5 and even around j=n-0.5.
723
724   Note that these symmetries apply to the "logical" array being
725transformed; *there are no constraints on your physical input data*.
726So, for example, if you specify a size-5 REDFT00 (DCT-I) of the data
727abcde, it corresponds to the DFT of the logical even array abcdedcb of
728size 8.  A size-4 REDFT10 (DCT-II) of the data abcd corresponds to the
729size-8 logical DFT of the even array abcddcba, shifted by half a sample.
730
731   All of these transforms are invertible.  The inverse of R*DFT00 is
732R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called
733simply "the" DCT and IDCT, respectively); and of R*DFT11 is R*DFT11.
734However, the transforms computed by FFTW are unnormalized, exactly like
735the corresponding real and complex DFTs, so computing a transform
736followed by its inverse yields the original array scaled by N, where N
737is the _logical_ DFT size.  For REDFT00, N=2(n-1); for RODFT00,
738N=2(n+1); otherwise, N=2n.
739
740   Note that the boundary conditions of the transform output array are
741given by the input boundary conditions of the inverse transform.  Thus,
742the above transforms are all inequivalent in terms of input/output
743boundary conditions, even neglecting the 0.5 shift difference.
744
745   FFTW is most efficient when N is a product of small factors; note
746that this _differs_ from the factorization of the physical size 'n' for
747REDFT00 and RODFT00!  There is another oddity: 'n=1' REDFT00 transforms
748correspond to N=0, and so are _not defined_ (the planner will return
749'NULL').  Otherwise, any positive 'n' is supported.
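
   The size bookkeeping described above can be sketched as follows.  The
enum and the function names are hypothetical stand-ins, chosen so that
the example compiles without '<fftw3.h>'; in a real program you would
switch on the 'FFTW_REDFT00' etc.  constants instead.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the FFTW_REDFTab / FFTW_RODFTab kinds. */
enum sym_kind { REDFT00, REDFT10, REDFT01, REDFT11,
                RODFT00, RODFT10, RODFT01, RODFT11 };

/* Logical DFT size N for physical size n >= 1.  Returns 0 for the
   undefined case (REDFT00 with n == 1). */
static size_t logical_size(enum sym_kind kind, size_t n)
{
    switch (kind) {
    case REDFT00: return n > 1 ? 2 * (n - 1) : 0;
    case RODFT00: return 2 * (n + 1);
    default:      return 2 * n;
    }
}

/* Undo the factor of N left by a transform followed by its inverse. */
static void normalize(enum sym_kind kind, size_t n, double *data)
{
    size_t N = logical_size(kind, n);
    for (size_t i = 0; N != 0 && i < n; ++i)
        data[i] /= (double)N;
}
```

   When choosing sizes for speed, it is 'logical_size(kind, n)', not
'n', whose factorization matters.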
750
751   For the precise mathematical definitions of these transforms as used
752by FFTW, see *note What FFTW Really Computes::.  (For people accustomed
753to the DCT/DST, FFTW's definitions have a coefficient of 2 in front of
754the cos/sin functions so that they correspond precisely to an even/odd
755DFT of size N. Some authors also include additional multiplicative
756factors of sqrt(2) for selected inputs and outputs; this makes the
757transform orthogonal, but sacrifices the direct equivalence to a
758symmetric DFT.)
759
760Which type do you need?
761.......................
762
763Since the required flavor of even/odd DFT depends upon your problem, you
764are the best judge of this choice, but we can make a few comments on
765relative efficiency to help you in your selection.  In particular,
766R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 (especially
767for odd sizes), while the R*DFT00 transforms are sometimes significantly
768slower (especially for even sizes).(2)
769
770   Thus, if only the boundary conditions on the transform inputs are
771specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
772R*DFT11 (unless the half-sample shift or the self-inverse property is
773significant for your problem).
774
775   If performance is important to you and you are using only small sizes
776(say n<200), e.g.  for multi-dimensional transforms, then you might
777consider generating hard-coded transforms of those sizes and types that
778you are interested in (*note Generating your own code::).
779
780   We are interested in hearing what types of symmetric transforms you
781find most useful.
782
783   ---------- Footnotes ----------
784
785   (1) There are also type V-VIII transforms, which correspond to a
786logical DFT of _odd_ size N, independent of whether the physical size
787'n' is odd, but we do not support these variants.
788
789   (2) R*DFT00 is sometimes slower in FFTW because we discovered that
790the standard algorithm for computing this by a pre/post-processed real
791DFT--the algorithm used in FFTPACK, Numerical Recipes, and other sources
792for decades now--has serious numerical problems: it already loses
793several decimal places of accuracy for 16k sizes.  There seem to be only
794two alternatives in the literature that do not suffer similarly: a
795recursive decomposition into smaller DCTs, which would require a large
796set of codelets for efficiency and generality, or sacrificing a factor
797of 2 in speed to use a real DFT of twice the size.  We currently employ
798the latter technique for general n, as well as a limited form of the
799former method: a split-radix decomposition when n is odd (N a multiple
800of 4).  For N containing many factors of 2, the split-radix method seems
801to recover most of the speed of the standard algorithm without the
802accuracy tradeoff.
803
804
805File: fftw3.info,  Node: The Discrete Hartley Transform,  Prev: Real even/odd DFTs (cosine/sine transforms),  Up: More DFTs of Real Data
806
8072.5.3 The Discrete Hartley Transform
808------------------------------------
809
810If you are planning to use the DHT because you've heard that it is
811"faster" than the DFT (FFT), *stop here*.  The DHT is not faster than
812the DFT. That story is an old but enduring misconception that was
813debunked in 1987.
814
815   The discrete Hartley transform (DHT) is an invertible linear
816transform closely related to the DFT. In the DFT, one multiplies each
817input by cos - i * sin (a complex exponential), whereas in the DHT each
818input is multiplied by simply cos + sin.  Thus, the DHT transforms 'n'
819real numbers to 'n' real numbers, and has the convenient property of
820being its own inverse.  In FFTW, a DHT (of any positive 'n') can be
821specified by an r2r kind of 'FFTW_DHT'.
822
823   Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of
824size 'n' followed by another DHT of the same size will result in the
825original array multiplied by 'n'.
826
827   The DHT was originally proposed as a more efficient alternative to
828the DFT for real data, but it was subsequently shown that a specialized
829DFT (such as FFTW's r2hc or r2c transforms) could be just as fast.  In
830FFTW, the DHT is actually computed by post-processing an r2hc transform,
831so there is ordinarily no reason to prefer it from a performance
832perspective.(1)  However, we have heard rumors that the DHT might be the
833most appropriate transform in its own right for certain applications,
834and we would be very interested to hear from anyone who finds it useful.
835
836   If 'FFTW_DHT' is specified for multiple dimensions of a
837multi-dimensional transform, FFTW computes the separable product of 1d
838DHTs along each dimension.  Unfortunately, this is not quite the same
839thing as a true multi-dimensional DHT; you can compute the latter, if
840necessary, with at most 'rank-1' post-processing passes [see e.g.  H.
841Hao and R. N. Bracewell, Proc.  IEEE 75, 264-266 (1987)].
842
843   For the precise mathematical definition of the DHT as used by FFTW,
844see *note What FFTW Really Computes::.
845
846   ---------- Footnotes ----------
847
848   (1) We provide the DHT mainly as a byproduct of some internal
849algorithms.  FFTW computes a real input/output DFT of _prime_ size by
850re-expressing it as a DHT plus post/pre-processing and then using
851Rader's prime-DFT algorithm adapted to the DHT.
852
853
854File: fftw3.info,  Node: Other Important Topics,  Next: FFTW Reference,  Prev: Tutorial,  Up: Top
855
8563 Other Important Topics
857************************
858
859* Menu:
860
861* SIMD alignment and fftw_malloc::
862* Multi-dimensional Array Format::
863* Words of Wisdom-Saving Plans::
864* Caveats in Using Wisdom::
865
866
867File: fftw3.info,  Node: SIMD alignment and fftw_malloc,  Next: Multi-dimensional Array Format,  Prev: Other Important Topics,  Up: Other Important Topics
868
8693.1 SIMD alignment and fftw_malloc
870==================================
871
872SIMD, which stands for "Single Instruction Multiple Data," is a set of
873special operations supported by some processors to perform a single
874operation on several numbers (usually 2 or 4) simultaneously.  SIMD
875floating-point instructions are available on several popular CPUs:
876SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and
877VSX on some POWER/PowerPCs, NEON on some ARM models.  FFTW can be
878compiled to support the SIMD instructions on any of these systems.
879
880   A program linking to an FFTW library compiled with SIMD support can
881obtain a nonnegligible speedup for most complex and r2c/c2r transforms.
882In order to obtain this speedup, however, the arrays of complex (or
883real) data passed to FFTW must be specially aligned in memory (typically
88416-byte aligned), and often this alignment is more stringent than that
885provided by the usual 'malloc' (etc.)  allocation routines.
886
887   In order to guarantee proper alignment for SIMD, therefore, in case
888your program is ever linked against a SIMD-using FFTW, we recommend
889allocating your transform data with 'fftw_malloc' and de-allocating it
890with 'fftw_free'.  These have exactly the same interface and behavior as
891'malloc'/'free', except that for a SIMD FFTW they ensure that the
892returned pointer has the necessary alignment (by calling 'memalign' or
893its equivalent on your OS).
894
895   You are not _required_ to use 'fftw_malloc'.  You can allocate your
896data in any way that you like, from 'malloc' to 'new' (in C++) to a
897fixed-size array declaration.  If the array happens not to be properly
898aligned, FFTW will not use the SIMD extensions.
899
900   Since 'fftw_malloc' only ever needs to be used for real and complex
901arrays, we provide two convenient wrapper routines 'fftw_alloc_real(N)'
902and 'fftw_alloc_complex(N)' that are equivalent to
903'(double*)fftw_malloc(sizeof(double) * N)' and
904'(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)', respectively (or
905their equivalents in other precisions).
906
907
908File: fftw3.info,  Node: Multi-dimensional Array Format,  Next: Words of Wisdom-Saving Plans,  Prev: SIMD alignment and fftw_malloc,  Up: Other Important Topics
909
9103.2 Multi-dimensional Array Format
911==================================
912
This section describes the format in which multi-dimensional arrays are
stored in FFTW.  We felt that a detailed discussion was necessary
because several different formats are in common use, which makes this
topic a frequent source of confusion.
917
918* Menu:
919
920* Row-major Format::
921* Column-major Format::
922* Fixed-size Arrays in C::
923* Dynamic Arrays in C::
924* Dynamic Arrays in C-The Wrong Way::
925
926
927File: fftw3.info,  Node: Row-major Format,  Next: Column-major Format,  Prev: Multi-dimensional Array Format,  Up: Multi-dimensional Array Format
928
9293.2.1 Row-major Format
930----------------------
931
932The multi-dimensional arrays passed to 'fftw_plan_dft' etcetera are
933expected to be stored as a single contiguous block in "row-major" order
934(sometimes called "C order").  Basically, this means that as you step
935through adjacent memory locations, the first dimension's index varies
936most slowly and the last dimension's index varies most quickly.
937
938   To be more explicit, let us consider an array of rank d whose
939dimensions are n[0] x n[1] x n[2] x ...  x n[d-1] .  Now, we specify a
940location in the array by a sequence of d (zero-based) indices, one for
941each dimension: (i[0], i[1], ..., i[d-1]).  If the array is stored in
942row-major order, then this element is located at the position i[d-1] +
943n[d-1] * (i[d-2] + n[d-2] * (...  + n[1] * i[0])).
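
   The position formula is a Horner-style evaluation, which the
following sketch spells out for arbitrary rank.  The function name
'row_major_offset' is hypothetical and is not part of FFTW.

```c
#include <stddef.h>

/* Offset of element (i[0], ..., i[d-1]) in a row-major array with
   dimensions n[0] x ... x n[d-1]:
   i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])). */
static size_t row_major_offset(int d, const size_t *n, const size_t *i)
{
    size_t off = 0;
    for (int r = 0; r < d; ++r)
        off = off * n[r] + i[r];
    return off;
}
```

   For a 5 x 12 x 27 array, element (1, 2, 3) is at offset 3 + 27 * (2 +
12 * 1) = 381.  Note that 'n[0]' never enters the computation.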
944
945   Note that, for the ordinary complex DFT, each element of the array
946must be of type 'fftw_complex'; i.e.  a (real, imaginary) pair of
947(double-precision) numbers.
948
949   In the advanced FFTW interface, the physical dimensions n from which
950the indices are computed can be different from (larger than) the logical
951dimensions of the transform to be computed, in order to transform a
952subset of a larger array.  Note also that, in the advanced interface,
953the expression above is multiplied by a "stride" to get the actual array
954index--this is useful in situations where each element of the
955multi-dimensional array is actually a data structure (or another array),
956and you just want to transform a single field.  In the basic interface,
957however, the stride is 1.
958
959
960File: fftw3.info,  Node: Column-major Format,  Next: Fixed-size Arrays in C,  Prev: Row-major Format,  Up: Multi-dimensional Array Format
961
9623.2.2 Column-major Format
963-------------------------
964
965Readers from the Fortran world are used to arrays stored in
966"column-major" order (sometimes called "Fortran order").  This is
967essentially the exact opposite of row-major order in that, here, the
968_first_ dimension's index varies most quickly.
969
970   If you have an array stored in column-major order and wish to
971transform it using FFTW, it is quite easy to do.  When creating the
972plan, simply pass the dimensions of the array to the planner in _reverse
973order_.  For example, if your array is a rank three 'N x M x L' matrix
974in column-major order, you should pass the dimensions of the array as if
975it were an 'L x M x N' matrix (which it is, from the perspective of
976FFTW). This is done for you _automatically_ by the FFTW legacy-Fortran
977interface (*note Calling FFTW from Legacy Fortran::), but you must do it
978manually with the modern Fortran interface (*note Reversing array
979dimensions::).
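
   The reason the reversal works can be seen in a small sketch: the
column-major offset of an element equals the row-major offset computed
with both the dimensions and the indices reversed.  All three function
names below are hypothetical helpers, not FFTW routines.

```c
#include <stddef.h>

/* Row-major ("C order") offset: the last index varies fastest. */
static size_t row_major_offset(int d, const size_t *n, const size_t *i)
{
    size_t off = 0;
    for (int r = 0; r < d; ++r)
        off = off * n[r] + i[r];
    return off;
}

/* Column-major ("Fortran order") offset: the first index varies fastest. */
static size_t col_major_offset(int d, const size_t *n, const size_t *i)
{
    size_t off = 0;
    for (int r = d - 1; r >= 0; --r)
        off = off * n[r] + i[r];
    return off;
}

/* Reverse a length-d list of dimensions (or indices) in place. */
static void reverse_dims(int d, size_t *a)
{
    for (int lo = 0, hi = d - 1; lo < hi; ++lo, --hi) {
        size_t t = a[lo];
        a[lo] = a[hi];
        a[hi] = t;
    }
}
```

   So a column-major 'N x M x L' array can be handed to the planner
unchanged, as long as the dimensions are passed as 'L, M, N'.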
980
981
982File: fftw3.info,  Node: Fixed-size Arrays in C,  Next: Dynamic Arrays in C,  Prev: Column-major Format,  Up: Multi-dimensional Array Format
983
9843.2.3 Fixed-size Arrays in C
985----------------------------
986
987A multi-dimensional array whose size is declared at compile time in C is
988_already_ in row-major order.  You don't have to do anything special to
989transform it.  For example:
990
991     {
992          fftw_complex data[N0][N1][N2];
993          fftw_plan plan;
994          ...
995          plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
996                                  FFTW_FORWARD, FFTW_ESTIMATE);
997          ...
998     }
999
1000   This will plan a 3d in-place transform of size 'N0 x N1 x N2'.
1001Notice how we took the address of the zero-th element to pass to the
1002planner (we could also have used a typecast).
1003
1004   However, we tend to _discourage_ users from declaring their arrays in
1005this way, for two reasons.  First, this allocates the array on the stack
1006("automatic" storage), which has a very limited size on most operating
1007systems (declaring an array with more than a few thousand elements will
1008often cause a crash).  (You can get around this limitation on many
1009systems by declaring the array as 'static' and/or global, but that has
1010its own drawbacks.)  Second, it may not optimally align the array for
1011use with a SIMD FFTW (*note SIMD alignment and fftw_malloc::).  Instead,
1012we recommend using 'fftw_malloc', as described below.
1013
1014
1015File: fftw3.info,  Node: Dynamic Arrays in C,  Next: Dynamic Arrays in C-The Wrong Way,  Prev: Fixed-size Arrays in C,  Up: Multi-dimensional Array Format
1016
10173.2.4 Dynamic Arrays in C
1018-------------------------
1019
1020We recommend allocating most arrays dynamically, with 'fftw_malloc'.
1021This isn't too hard to do, although it is not as straightforward for
1022multi-dimensional arrays as it is for one-dimensional arrays.
1023
1024   Creating the array is simple: using a dynamic-allocation routine like
1025'fftw_malloc', allocate an array big enough to store N 'fftw_complex'
1026values (for a complex DFT), where N is the product of the sizes of the
1027array dimensions (i.e.  the total number of complex values in the
1028array).  For example, here is code to allocate a 5 x 12 x 27 rank-3
1029array:
1030
1031     fftw_complex *an_array;
1032     an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex));
1033
   Accessing the array elements, however, is trickier: you can't simply
use multiple applications of the '[]' operator as you could for
fixed-size arrays.  Instead, you have to explicitly compute the offset
1037into the array using the formula given earlier for row-major arrays.
1038For example, to reference the (i,j,k)-th element of the array allocated
1039above, you would use the expression 'an_array[k + 27 * (j + 12 * i)]'.
1040
1041   This pain can be alleviated somewhat by defining appropriate macros,
1042or, in C++, creating a class and overloading the '()' operator.  The
C99 standard provides a way to reinterpret the dynamic array as a
"variable-length" multi-dimensional array amenable to '[]', but this
feature is not supported by every compiler (C11 made it optional).
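
   Both remedies can be sketched in a few lines.  The example below
assumes a compiler that supports C99 variable-length array types; the
macro 'IDX3', the typedef 'cplx' (a stand-in for 'fftw_complex'), and
the function 'views_agree' are hypothetical names, and plain 'malloc' is
used only so that the sketch compiles without FFTW.

```c
#include <stdlib.h>

typedef double cplx[2];   /* stand-in for fftw_complex (double[2]) */

/* Row-major offset of element (i,j,k) in an n0 x n1 x n2 array. */
#define IDX3(i, j, k, n1, n2) ((k) + (n2) * ((j) + (n1) * (i)))

/* Returns 1 if the macro and a C99 pointer-to-array view address the
   same elements of one contiguous block; 0 on mismatch or if the
   allocation fails. */
static int views_agree(size_t n0, size_t n1, size_t n2)
{
    /* with FFTW, allocate with fftw_malloc and free with fftw_free */
    cplx *flat = malloc(n0 * n1 * n2 * sizeof(cplx));
    if (!flat)
        return 0;

    /* reinterpret the flat block as view[n0][n1][n2] */
    cplx (*view)[n1][n2] = (cplx (*)[n1][n2]) flat;

    int ok = 1;
    for (size_t i = 0; i < n0; ++i)
        for (size_t j = 0; j < n1; ++j)
            for (size_t k = 0; k < n2; ++k)
                if (&view[i][j][k] != &flat[IDX3(i, j, k, n1, n2)])
                    ok = 0;
    free(flat);
    return ok;
}
```

   Either way, the pointer passed to the planner is still the single
contiguous block 'flat'.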
1046
1047
1048File: fftw3.info,  Node: Dynamic Arrays in C-The Wrong Way,  Prev: Dynamic Arrays in C,  Up: Multi-dimensional Array Format
1049
10503.2.5 Dynamic Arrays in C--The Wrong Way
1051----------------------------------------
1052
A different method for allocating multi-dimensional arrays in C is often
suggested that is incompatible with FFTW: _using it will cause FFTW to
die a painful death_.  We discuss the technique here, however, because
it is so commonly known and used.  This method is to create arrays of
pointers to arrays of pointers to ..., and so on.  For example, the
analogue in this method to the example above is:
1059
1060     int i,j;
1061     fftw_complex ***a_bad_array;  /* another way to make a 5x12x27 array */
1062
1063     a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **));
1064     for (i = 0; i < 5; ++i) {
1065          a_bad_array[i] =
1066             (fftw_complex **) malloc(12 * sizeof(fftw_complex *));
1067          for (j = 0; j < 12; ++j)
1068               a_bad_array[i][j] =
1069                     (fftw_complex *) malloc(27 * sizeof(fftw_complex));
1070     }
1071
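   As you can see, this sort of array is inconvenient to allocate (and
deallocate), and because every 'malloc' call returns a separate block,
the data are not stored as the single contiguous row-major block that
FFTW requires.  On the other hand, it has the advantage that the
(i,j,k)-th element can be referenced simply by 'a_bad_array[i][j][k]'.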
1075
1076   If you like this technique and want to maximize convenience in
1077accessing the array, but still want to pass the array to FFTW, you can
1078use a hybrid method.  Allocate the array as one contiguous block, but
1079also declare an array of arrays of pointers that point to appropriate
1080places in the block.  That sort of trick is beyond the scope of this
1081documentation; for more information on multi-dimensional arrays in C,
1082see the 'comp.lang.c' FAQ (http://c-faq.com/aryptr/dynmuldimary.html).
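
   For orientation only, one possible shape of such a hybrid is sketched
here.  The struct 'array3', its functions, and the typedef 'cplx' (a
stand-in for 'fftw_complex') are all hypothetical; plain 'malloc' is
used so that the sketch compiles without FFTW, whereas the data block
would normally come from 'fftw_malloc'.

```c
#include <stdlib.h>

typedef double cplx[2];   /* stand-in for fftw_complex */

struct array3 {
    cplx   *block;   /* contiguous n0*n1*n2 data: pass this to FFTW */
    cplx  **rows;    /* n0*n1 pointers into block                   */
    cplx ***a;       /* n0 pointers into rows, so a[i][j][k] works  */
};

/* Returns 1 on success, 0 if any allocation fails. */
static int array3_alloc(struct array3 *m, size_t n0, size_t n1, size_t n2)
{
    m->block = malloc(n0 * n1 * n2 * sizeof(cplx));
    m->rows  = malloc(n0 * n1 * sizeof(cplx *));
    m->a     = malloc(n0 * sizeof(cplx **));
    if (!m->block || !m->rows || !m->a) {
        free(m->block);
        free(m->rows);
        free(m->a);
        return 0;
    }
    for (size_t i = 0; i < n0; ++i) {
        m->a[i] = m->rows + i * n1;
        for (size_t j = 0; j < n1; ++j)
            m->rows[i * n1 + j] = m->block + (i * n1 + j) * n2;
    }
    return 1;
}

static void array3_free(struct array3 *m)
{
    free(m->a);
    free(m->rows);
    free(m->block);
}
```

   Here 'm.a[i][j][k]' and 'm.block[k + n2 * (j + n1 * i)]' name the
same element, and only 'm.block' is ever handed to a planner.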
1083
1084
1085File: fftw3.info,  Node: Words of Wisdom-Saving Plans,  Next: Caveats in Using Wisdom,  Prev: Multi-dimensional Array Format,  Up: Other Important Topics
1086
10873.3 Words of Wisdom--Saving Plans
1088=================================
1089
1090FFTW implements a method for saving plans to disk and restoring them.
1091In fact, what FFTW does is more general than just saving and loading
1092plans.  The mechanism is called "wisdom".  Here, we describe this
1093feature at a high level.  *Note FFTW Reference::, for a less casual but
1094more complete discussion of how to use wisdom in FFTW.
1095
1096   Plans created with the 'FFTW_MEASURE', 'FFTW_PATIENT', or
1097'FFTW_EXHAUSTIVE' options produce near-optimal FFT performance, but may
1098require a long time to compute because FFTW must measure the runtime of
1099many possible plans and select the best one.  This setup is designed for
1100the situations where so many transforms of the same size must be
1101computed that the start-up time is irrelevant.  For short initialization
1102times, but slower transforms, we have provided 'FFTW_ESTIMATE'.  The
1103'wisdom' mechanism is a way to get the best of both worlds: you compute
1104a good plan once, save it to disk, and later reload it as many times as
1105necessary.  The wisdom mechanism can actually save and reload many plans
1106at once, not just one.
1107
1108   Whenever you create a plan, the FFTW planner accumulates wisdom,
1109which is information sufficient to reconstruct the plan.  After
1110planning, you can save this information to disk by means of the
1111function:
1112     int fftw_export_wisdom_to_filename(const char *filename);
1113   (This function returns non-zero on success.)
1114
1115   The next time you run the program, you can restore the wisdom with
1116'fftw_import_wisdom_from_filename' (which also returns non-zero on
1117success), and then recreate the plan using the same flags as before.
1118     int fftw_import_wisdom_from_filename(const char *filename);
1119
1120   Wisdom is automatically used for any size to which it is applicable,
1121as long as the planner flags are not more "patient" than those with
1122which the wisdom was created.  For example, wisdom created with
1123'FFTW_MEASURE' can be used if you later plan with 'FFTW_ESTIMATE' or
1124'FFTW_MEASURE', but not with 'FFTW_PATIENT'.
1125
   Wisdom is cumulative, and is stored in a global, private data
structure managed internally by FFTW.  The storage space required is
minimal, proportional to the logarithm of the sizes the wisdom was
generated from.  If memory usage is a concern, however, the wisdom can
be forgotten and its associated memory freed by calling:
1131     void fftw_forget_wisdom(void);
1132
1133   Wisdom can be exported to a file, a string, or any other medium.  For
1134details, see *note Wisdom::.
1135
1136
1137File: fftw3.info,  Node: Caveats in Using Wisdom,  Prev: Words of Wisdom-Saving Plans,  Up: Other Important Topics
1138
11393.4 Caveats in Using Wisdom
1140===========================
1141
1142     For in much wisdom is much grief, and he that increaseth knowledge
1143     increaseth sorrow.  [Ecclesiastes 1:18]
1144
1145   There are pitfalls to using wisdom, in that it can negate FFTW's
1146ability to adapt to changing hardware and other conditions.  For
1147example, it would be perfectly possible to export wisdom from a program
1148running on one processor and import it into a program running on another
1149processor.  Doing so, however, would mean that the second program would
1150use plans optimized for the first processor, instead of the one it is
1151running on.
1152
1153   It should be safe to reuse wisdom as long as the hardware and program
1154binaries remain unchanged.  (Actually, the optimal plan may change even
1155between runs of the same binary on identical hardware, due to
1156differences in the virtual memory environment, etcetera.  Users
1157seriously interested in performance should worry about this problem,
too.)  If the same wisdom is used for two different program binaries,
even running on the same machine, the plans are likely to be sub-optimal
because of differing code alignments.  It is therefore wise to recreate
wisdom every time an application is recompiled.  The more the underlying
hardware and software change between the creation of wisdom and its
use, the greater the risk of sub-optimal plans.
1164
1165   Nevertheless, if the choice is between using 'FFTW_ESTIMATE' or using
1166possibly-suboptimal wisdom (created on the same machine, but for a
1167different binary), the wisdom is likely to be better.  For this reason,
1168we provide a function to import wisdom from a standard system-wide
location ('/etc/fftw/wisdom' on Unix):
1170
1171     int fftw_import_system_wisdom(void);
1172
1173   FFTW also provides a standalone program, 'fftw-wisdom' (described by
1174its own 'man' page on Unix) with which users can create wisdom, e.g.
1175for a canonical set of sizes to store in the system wisdom file.  *Note
1176Wisdom Utilities::.
1177
1178
1179File: fftw3.info,  Node: FFTW Reference,  Next: Multi-threaded FFTW,  Prev: Other Important Topics,  Up: Top
1180
11814 FFTW Reference
1182****************
1183
1184This chapter provides a complete reference for all sequential (i.e.,
1185one-processor) FFTW functions.  Parallel transforms are described in
1186later chapters.
1187
1188* Menu:
1189
1190* Data Types and Files::
1191* Using Plans::
1192* Basic Interface::
1193* Advanced Interface::
1194* Guru Interface::
1195* New-array Execute Functions::
1196* Wisdom::
1197* What FFTW Really Computes::
1198
1199
1200File: fftw3.info,  Node: Data Types and Files,  Next: Using Plans,  Prev: FFTW Reference,  Up: FFTW Reference
1201
12024.1 Data Types and Files
1203========================
1204
1205All programs using FFTW should include its header file:
1206
1207     #include <fftw3.h>
1208
1209   You must also link to the FFTW library.  On Unix, this means adding
1210'-lfftw3 -lm' at the _end_ of the link command.
1211
1212* Menu:
1213
1214* Complex numbers::
1215* Precision::
1216* Memory Allocation::
1217
1218
1219File: fftw3.info,  Node: Complex numbers,  Next: Precision,  Prev: Data Types and Files,  Up: Data Types and Files
1220
12214.1.1 Complex numbers
1222---------------------
1223
1224The default FFTW interface uses 'double' precision for all
1225floating-point numbers, and defines a 'fftw_complex' type to hold
1226complex numbers as:
1227
1228     typedef double fftw_complex[2];
1229
1230   Here, the '[0]' element holds the real part and the '[1]' element
1231holds the imaginary part.
1232
1233   Alternatively, if you have a C compiler (such as 'gcc') that supports
1234the C99 revision of the ANSI C standard, you can use C's new native
1235complex type (which is binary-compatible with the typedef above).  In
1236particular, if you '#include <complex.h>' _before_ '<fftw3.h>', then
1237'fftw_complex' is defined to be the native complex type and you can
1238manipulate it with ordinary arithmetic (e.g.  'x = y * (3+4*I)', where
1239'x' and 'y' are 'fftw_complex' and 'I' is the standard symbol for the
imaginary unit).
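
   The binary compatibility mentioned above can be seen in a short
sketch that uses only '<complex.h>'.  The typedef 'pair' mirrors the
'double[2]' layout of the default 'fftw_complex'; it and the function
names are hypothetical, introduced so that the example compiles without
'<fftw3.h>'.

```c
#include <complex.h>

typedef double pair[2];   /* same layout as the default fftw_complex */

/* Multiply every element by 3+4i using native C99 complex arithmetic. */
static void scale_by_3_plus_4i(double complex *x, int n)
{
    for (int i = 0; i < n; ++i)
        x[i] *= 3.0 + 4.0 * I;
}

/* Read element i back through the (real, imaginary) pair view. */
static double re_of(const double complex *x, int i)
{
    return ((const pair *) x)[i][0];
}

static double im_of(const double complex *x, int i)
{
    return ((const pair *) x)[i][1];
}
```

   Since (1+2i)(3+4i) = -5+10i, scaling an array holding '1.0 + 2.0*I'
leaves -5 in the '[0]' slot and 10 in the '[1]' slot of the pair view.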
1241
   C++ has its own 'complex<T>' template class, defined in the standard
'<complex>' header file.  Since C++11, the C++ standard requires the
storage format used for this type to be binary-compatible with the C99
type, i.e.  an array 'T[2]' with consecutive real '[0]' and imaginary
'[1]' parts.  (See report
<http://www.open-std.org/jtc1/sc22/WG21/docs/papers/2002/n1388.pdf
WG21/N1388>, which proposed this and stated that: "This solution has
been tested with all current major implementations of the standard
library and shown to be working.")  Thus, if you have a variable
'complex<double> *x', you can pass it directly to FFTW via
'reinterpret_cast<fftw_complex*>(x)'.
1254
1255
1256File: fftw3.info,  Node: Precision,  Next: Memory Allocation,  Prev: Complex numbers,  Up: Data Types and Files
1257
12584.1.2 Precision
1259---------------
1260
1261You can install single and long-double precision versions of FFTW, which
1262replace 'double' with 'float' and 'long double', respectively (*note
1263Installation and Customization::).  To use these interfaces, you:
1264
1265   * Link to the single/long-double libraries; on Unix, '-lfftw3f' or
1266     '-lfftw3l' instead of (or in addition to) '-lfftw3'.  (You can link
1267     to the different-precision libraries simultaneously.)
1268
1269   * Include the _same_ '<fftw3.h>' header file.
1270
1271   * Replace all lowercase instances of 'fftw_' with 'fftwf_' or
1272     'fftwl_' for single or long-double precision, respectively.
1273     ('fftw_complex' becomes 'fftwf_complex', 'fftw_execute' becomes
1274     'fftwf_execute', etcetera.)
1275
1276   * Uppercase names, i.e.  names beginning with 'FFTW_', remain the
1277     same.
1278
1279   * Replace 'double' with 'float' or 'long double' for subroutine
1280     parameters.
1281
1282   Depending upon your compiler and/or hardware, 'long double' may not
1283be any more precise than 'double' (or may not be supported at all,
1284although it is standard in C99).
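
   One way to check whether 'long double' is actually wider than
'double' on a given compiler is to compare the significand widths from
'<float.h>'.  This is a small sketch; the function name
'demo_long_double_is_wider' is hypothetical and not part of FFTW.

```c
#include <assert.h>
#include <float.h>

/* Return 1 if 'long double' carries more significand digits than
   'double' on this compiler/platform, and 0 if the two are equally
   precise (in which case long-double transforms gain no accuracy). */
int demo_long_double_is_wider(void)
{
    /* 'long double' can represent every 'double' value, so it is
       never narrower. */
    assert(LDBL_MANT_DIG >= DBL_MANT_DIG);
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}
```

   A result of 0 suggests that the 'fftwl_' interface would cost extra
time without improving accuracy on that platform.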
1285
1286   We also support using the nonstandard '__float128'
1287quadruple-precision type provided by recent versions of 'gcc' on 32- and
128864-bit x86 hardware (*note Installation and Customization::).  To use
1289this type, link with '-lfftw3q -lquadmath -lm' (the 'libquadmath'
1290library provided by 'gcc' is needed for quadruple-precision
1291trigonometric functions) and use 'fftwq_' identifiers.
1292
1293
1294File: fftw3.info,  Node: Memory Allocation,  Prev: Precision,  Up: Data Types and Files
1295
12964.1.3 Memory Allocation
1297-----------------------
1298
1299     void *fftw_malloc(size_t n);
1300     void fftw_free(void *p);
1301
1302   These are functions that behave identically to 'malloc' and 'free',
1303except that they guarantee that the returned pointer obeys any special
1304alignment restrictions imposed by any algorithm in FFTW (e.g.  for SIMD
1305acceleration).  *Note SIMD alignment and fftw_malloc::.
1306
1307   Data allocated by 'fftw_malloc' _must_ be deallocated by 'fftw_free'
1308and not by the ordinary 'free'.
1309
1310   These routines simply call through to your operating system's
1311'malloc' or, if necessary, its aligned equivalent (e.g.  'memalign'), so
1312you normally need not worry about any significant time or space
1313overhead.  You are _not required_ to use them to allocate your data, but
1314we strongly recommend it.
1315
1316   Note: in C++, just as with ordinary 'malloc', you must typecast the
1317output of 'fftw_malloc' to whatever pointer type you are allocating.
1318
1319   We also provide the following two convenience functions to allocate
1320real and complex arrays with 'n' elements, which are equivalent to
1321'(double *) fftw_malloc(sizeof(double) * n)' and '(fftw_complex *)
1322fftw_malloc(sizeof(fftw_complex) * n)', respectively:
1323
1324     double *fftw_alloc_real(size_t n);
1325     fftw_complex *fftw_alloc_complex(size_t n);
1326
1327   The equivalent functions in other precisions allocate arrays of 'n'
elements in that precision.  For example, 'fftwf_alloc_real(n)' is
equivalent to '(float *) fftwf_malloc(sizeof(float) * n)'.
1330
1331
1332File: fftw3.info,  Node: Using Plans,  Next: Basic Interface,  Prev: Data Types and Files,  Up: FFTW Reference
1333
13344.2 Using Plans
1335===============
1336
1337Plans for all transform types in FFTW are stored as type 'fftw_plan' (an
1338opaque pointer type), and are created by one of the various planning
1339routines described in the following sections.  An 'fftw_plan' contains
1340all information necessary to compute the transform, including the
1341pointers to the input and output arrays.
1342
1343     void fftw_execute(const fftw_plan plan);
1344
1345   This executes the 'plan', to compute the corresponding transform on
1346the arrays for which it was planned (which must still exist).  The plan
1347is not modified, and 'fftw_execute' can be called as many times as
1348desired.
1349
1350   To apply a given plan to a different array, you can use the new-array
1351execute interface.  *Note New-array Execute Functions::.
1352
1353   'fftw_execute' (and equivalents) is the only function in FFTW
1354guaranteed to be thread-safe; see *note Thread safety::.
1355
1356   This function:
1357     void fftw_destroy_plan(fftw_plan plan);
1358   deallocates the 'plan' and all its associated data.
1359
1360   FFTW's planner saves some other persistent data, such as the
1361accumulated wisdom and a list of algorithms available in the current
1362configuration.  If you want to deallocate all of that and reset FFTW to
1363the pristine state it was in when you started your program, you can
1364call:
1365
1366     void fftw_cleanup(void);
1367
1368   After calling 'fftw_cleanup', all existing plans become undefined,
1369and you should not attempt to execute them nor to destroy them.  You can
1370however create and execute/destroy new plans, in which case FFTW starts
1371accumulating wisdom information again.
1372
1373   'fftw_cleanup' does not deallocate your plans, however.  To prevent
1374memory leaks, you must still call 'fftw_destroy_plan' before executing
1375'fftw_cleanup'.
1376
   Occasionally, it may be useful to know FFTW's internal "cost" metric
that it uses to compare plans to one another; this cost is proportional
to the execution time of the plan, in undocumented units, if the plan was
1380created with the 'FFTW_MEASURE' or other timing-based options, or
1381alternatively is a heuristic cost function for 'FFTW_ESTIMATE' plans.
1382(The cost values of measured and estimated plans are not comparable,
1383being in different units.  Also, costs from different FFTW versions or
1384the same version compiled differently may not be in the same units.
1385Plans created from wisdom have a cost of 0 since no timing measurement
1386is performed for them.  Finally, certain problems for which only one
1387top-level algorithm was possible may have required no measurements of
1388the cost of the whole plan, in which case 'fftw_cost' will also return
13890.)  The cost metric for a given plan is returned by:
1390
1391     double fftw_cost(const fftw_plan plan);
1392
1393   The following two routines are provided purely for academic purposes
1394(that is, for entertainment).
1395
1396     void fftw_flops(const fftw_plan plan,
1397                     double *add, double *mul, double *fma);
1398
1399   Given a 'plan', set 'add', 'mul', and 'fma' to an exact count of the
1400number of floating-point additions, multiplications, and fused
1401multiply-add operations involved in the plan's execution.  The total
1402number of floating-point operations (flops) is 'add + mul + 2*fma', or
1403'add + mul + fma' if the hardware supports fused multiply-add
1404instructions (although the number of FMA operations is only approximate
1405because of compiler voodoo).  (The number of operations should be an
1406integer, but we use 'double' to avoid overflowing 'int' for large
1407transforms; the arguments are of type 'double' even for single and
1408long-double precision versions of FFTW.)
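
   The two totals above can be written as a one-line helper.  This is
only a sketch: 'demo_total_flops' and its 'fma_is_one_op' switch are
hypothetical names, and the three counts are assumed to come from a
prior call to 'fftw_flops'.

```c
#include <assert.h>

/* Combine the three counts reported by 'fftw_flops' into one total.
   'fma_is_one_op' (a hypothetical switch) is nonzero when each fused
   multiply-add should count as one operation, zero when it should
   count as two. */
double demo_total_flops(double add, double mul, double fma, int fma_is_one_op)
{
    assert(add >= 0.0 && mul >= 0.0 && fma >= 0.0);
    return add + mul + (fma_is_one_op ? fma : 2.0 * fma);
}
```

   For counts of 10 additions, 4 multiplications, and 3 fused
multiply-adds, the helper gives 20 operations without FMA hardware and
17 with it.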
1409
1410     void fftw_fprint_plan(const fftw_plan plan, FILE *output_file);
1411     void fftw_print_plan(const fftw_plan plan);
1412     char *fftw_sprint_plan(const fftw_plan plan);
1413
1414   This outputs a "nerd-readable" representation of the 'plan' to the
given file, to 'stdout', or to a newly allocated NUL-terminated string
1416(which the caller is responsible for deallocating with 'free'),
1417respectively.
1418
1419
1420File: fftw3.info,  Node: Basic Interface,  Next: Advanced Interface,  Prev: Using Plans,  Up: FFTW Reference
1421
14224.3 Basic Interface
1423===================
1424
1425Recall that the FFTW API is divided into three parts(1): the "basic
1426interface" computes a single transform of contiguous data, the "advanced
1427interface" computes transforms of multiple or strided arrays, and the
1428"guru interface" supports the most general data layouts, multiplicities,
and strides.  This section describes the basic interface, which we
1430expect to satisfy the needs of most users.
1431
1432* Menu:
1433
1434* Complex DFTs::
1435* Planner Flags::
1436* Real-data DFTs::
1437* Real-data DFT Array Format::
1438* Real-to-Real Transforms::
1439* Real-to-Real Transform Kinds::
1440
1441   ---------- Footnotes ----------
1442
1443   (1) Gallia est omnis divisa in partes tres (Julius Caesar).
1444
1445
1446File: fftw3.info,  Node: Complex DFTs,  Next: Planner Flags,  Prev: Basic Interface,  Up: Basic Interface
1447
14484.3.1 Complex DFTs
1449------------------
1450
1451     fftw_plan fftw_plan_dft_1d(int n0,
1452                                fftw_complex *in, fftw_complex *out,
1453                                int sign, unsigned flags);
1454     fftw_plan fftw_plan_dft_2d(int n0, int n1,
1455                                fftw_complex *in, fftw_complex *out,
1456                                int sign, unsigned flags);
1457     fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
1458                                fftw_complex *in, fftw_complex *out,
1459                                int sign, unsigned flags);
1460     fftw_plan fftw_plan_dft(int rank, const int *n,
1461                             fftw_complex *in, fftw_complex *out,
1462                             int sign, unsigned flags);
1463
1464   Plan a complex input/output discrete Fourier transform (DFT) in zero
1465or more dimensions, returning an 'fftw_plan' (*note Using Plans::).
1466
1467   Once you have created a plan for a certain transform type and
1468parameters, then creating another plan of the same type and parameters,
1469but for different arrays, is fast and shares constant data with the
1470first plan (if it still exists).
1471
1472   The planner returns 'NULL' if the plan cannot be created.  In the
1473standard FFTW distribution, the basic interface is guaranteed to return
1474a non-'NULL' plan.  A plan may be 'NULL', however, if you are using a
1475customized FFTW configuration supporting a restricted set of transforms.
1476
1477Arguments
1478.........
1479
1480   * 'rank' is the rank of the transform (it should be the size of the
1481     array '*n'), and can be any non-negative integer.  (*Note Complex
1482     Multi-Dimensional DFTs::, for the definition of "rank".)  The
1483     '_1d', '_2d', and '_3d' planners correspond to a 'rank' of '1',
1484     '2', and '3', respectively.  The rank may be zero, which is
1485     equivalent to a rank-1 transform of size 1, i.e.  a copy of one
1486     number from input to output.
1487
1488   * 'n0', 'n1', 'n2', or 'n[0..rank-1]' (as appropriate for each
1489     routine) specify the size of the transform dimensions.  They can be
1490     any positive integer.
1491
1492        - Multi-dimensional arrays are stored in row-major order with
1493          dimensions: 'n0' x 'n1'; or 'n0' x 'n1' x 'n2'; or 'n[0]' x
1494          'n[1]' x ...  x 'n[rank-1]'.  *Note Multi-dimensional Array
1495          Format::.
1496        - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
1497          11^e 13^f, where e+f is either 0 or 1, and the other exponents
1498          are arbitrary.  Other sizes are computed by means of a slow,
1499          general-purpose algorithm (which nevertheless retains O(n log
1500          n) performance even for prime sizes).  It is possible to
1501          customize FFTW for different array sizes; see *note
1502          Installation and Customization::.  Transforms whose sizes are
1503          powers of 2 are especially fast.
1504
1505   * 'in' and 'out' point to the input and output arrays of the
1506     transform, which may be the same (yielding an in-place transform).
1507     These arrays are overwritten during planning, unless
1508     'FFTW_ESTIMATE' is used in the flags.  (The arrays need not be
1509     initialized, but they must be allocated.)
1510
1511     If 'in == out', the transform is "in-place" and the input array is
1512     overwritten.  If 'in != out', the two arrays must not overlap (but
1513     FFTW does not check for this condition).
1514
1515   * 'sign' is the sign of the exponent in the formula that defines the
1516     Fourier transform.  It can be -1 (= 'FFTW_FORWARD') or +1 (=
1517     'FFTW_BACKWARD').
1518
1519   * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
1520     defined in *note Planner Flags::.
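
   To test in advance whether a size has the preferred form described
in the size argument above, one could use a check like the following
sketch.  The name 'demo_is_preferred_size' is hypothetical and not part
of FFTW; sizes that fail the check are still transformed correctly,
only by the slower general-purpose algorithm.

```c
#include <assert.h>

/* Return 1 if n has the form 2^a 3^b 5^c 7^d 11^e 13^f with e+f <= 1,
   the sizes the text singles out as handled best; otherwise return 0. */
int demo_is_preferred_size(int n)
{
    static const int small[] = { 2, 3, 5, 7 };
    int big_factors = 0;  /* counts factors of 11 and 13 together (e+f) */

    assert(n > 0);
    for (int i = 0; i < 4; ++i)
        while (n % small[i] == 0)
            n /= small[i];
    while (n % 11 == 0) { n /= 11; ++big_factors; }
    while (n % 13 == 0) { n /= 13; ++big_factors; }
    return n == 1 && big_factors <= 1;
}
```

   For instance, 1024 and 704 (= 64 x 11) pass the check, while 121
(= 11 x 11) and the prime 17 do not.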
1521
1522   FFTW computes an unnormalized transform: computing a forward followed
1523by a backward transform (or vice versa) will result in the original data
1524multiplied by the size of the transform (the product of the dimensions).
1525For more information, see *note What FFTW Really Computes::.
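
   Undoing that scale factor is left to the caller.  A minimal sketch,
assuming the data has already gone through a forward and a backward
transform: 'demo_normalize' is a hypothetical helper, and
'demo_complex' is a stand-in with the same layout as 'fftw_complex' so
that the sketch compiles without '<fftw3.h>'.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for 'fftw_complex': [0] = real part, [1] = imaginary part. */
typedef double demo_complex[2];

/* Divide every element by the transform size
   N = n[0] * n[1] * ... * n[rank-1], undoing the scale factor left by
   a forward + backward round trip.  Returns N (rank 0 gives N = 1). */
size_t demo_normalize(int rank, const int *n, demo_complex *data)
{
    size_t total = 1;

    assert(rank >= 0 && data != NULL);
    for (int d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        total *= (size_t) n[d];
    }
    for (size_t i = 0; i < total; ++i) {
        data[i][0] /= (double) total;
        data[i][1] /= (double) total;
    }
    return total;
}
```

   For a 2 x 3 transform the divisor is 6, so a round-trip value of 30
is restored to 5.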
1526
1527
1528File: fftw3.info,  Node: Planner Flags,  Next: Real-data DFTs,  Prev: Complex DFTs,  Up: Basic Interface
1529
15304.3.2 Planner Flags
1531-------------------
1532
1533All of the planner routines in FFTW accept an integer 'flags' argument,
1534which is a bitwise OR ('|') of zero or more of the flag constants
1535defined below.  These flags control the rigor (and time) of the planning
1536process, and can also impose (or lift) restrictions on the type of
1537transform algorithm that is employed.
1538
1539   _Important:_ the planner overwrites the input array during planning
1540unless a saved plan (*note Wisdom::) is available for that problem, so
1541you should initialize your input data after creating the plan.  The only
1542exceptions to this are the 'FFTW_ESTIMATE' and 'FFTW_WISDOM_ONLY' flags,
1543as mentioned below.
1544
1545   In all cases, if wisdom is available for the given problem that was
1546created with equal-or-greater planning rigor, then the more rigorous
1547wisdom is used.  For example, in 'FFTW_ESTIMATE' mode any available
1548wisdom is used, whereas in 'FFTW_PATIENT' mode only wisdom created in
1549patient or exhaustive mode can be used.  *Note Words of Wisdom-Saving
1550Plans::.
1551
1552Planning-rigor flags
1553....................
1554
1555   * 'FFTW_ESTIMATE' specifies that, instead of actual measurements of
1556     different algorithms, a simple heuristic is used to pick a
1557     (probably sub-optimal) plan quickly.  With this flag, the
1558     input/output arrays are not overwritten during planning.
1559
1560   * 'FFTW_MEASURE' tells FFTW to find an optimized plan by actually
1561     _computing_ several FFTs and measuring their execution time.
1562     Depending on your machine, this can take some time (often a few
1563     seconds).  'FFTW_MEASURE' is the default planning option.
1564
   * 'FFTW_PATIENT' is like 'FFTW_MEASURE', but considers a wider range
     of algorithms and often produces a "more optimal" plan, at the
     expense of several times longer planning time (both effects are
     most pronounced for large transforms).
1569
1570   * 'FFTW_EXHAUSTIVE' is like 'FFTW_PATIENT', but considers an even
1571     wider range of algorithms, including many that we think are
1572     unlikely to be fast, to produce the most optimal plan but with a
1573     substantially increased planning time.
1574
1575   * 'FFTW_WISDOM_ONLY' is a special planning mode in which the plan is
1576     only created if wisdom is available for the given problem, and
1577     otherwise a 'NULL' plan is returned.  This can be combined with
1578     other flags, e.g.  'FFTW_WISDOM_ONLY | FFTW_PATIENT' creates a plan
1579     only if wisdom is available that was created in 'FFTW_PATIENT' or
1580     'FFTW_EXHAUSTIVE' mode.  The 'FFTW_WISDOM_ONLY' flag is intended
1581     for users who need to detect whether wisdom is available; for
1582     example, if wisdom is not available one may wish to allocate new
1583     arrays for planning so that user data is not overwritten.
1584
1585Algorithm-restriction flags
1586...........................
1587
1588   * 'FFTW_DESTROY_INPUT' specifies that an out-of-place transform is
1589     allowed to _overwrite its input_ array with arbitrary data; this
1590     can sometimes allow more efficient algorithms to be employed.
1591
1592   * 'FFTW_PRESERVE_INPUT' specifies that an out-of-place transform must
1593     _not change its input_ array.  This is ordinarily the _default_,
1594     except for c2r and hc2r (i.e.  complex-to-real) transforms for
1595     which 'FFTW_DESTROY_INPUT' is the default.  In the latter cases,
1596     passing 'FFTW_PRESERVE_INPUT' will attempt to use algorithms that
1597     do not destroy the input, at the expense of worse performance; for
1598     multi-dimensional c2r transforms, however, no input-preserving
1599     algorithms are implemented and the planner will return 'NULL' if
1600     one is requested.
1601
1602   * 'FFTW_UNALIGNED' specifies that the algorithm may not impose any
1603     unusual alignment requirements on the input/output arrays (i.e.  no
1604     SIMD may be used).  This flag is normally _not necessary_, since
1605     the planner automatically detects misaligned arrays.  The only use
1606     for this flag is if you want to use the new-array execute interface
1607     to execute a given plan on a different array that may not be
1608     aligned like the original.  (Using 'fftw_malloc' makes this flag
1609     unnecessary even then.  You can also use 'fftw_alignment_of' to
1610     detect whether two arrays are equivalently aligned.)
1611
1612Limiting planning time
1613......................
1614
1615     extern void fftw_set_timelimit(double seconds);
1616
1617   This function instructs FFTW to spend at most 'seconds' seconds
1618(approximately) in the planner.  If 'seconds == FFTW_NO_TIMELIMIT' (the
1619default value, which is negative), then planning time is unbounded.
1620Otherwise, FFTW plans with a progressively wider range of algorithms
until the given time limit is reached or the given range of
1622algorithms is explored, returning the best available plan.
1623
1624   For example, specifying 'FFTW_PATIENT' first plans in 'FFTW_ESTIMATE'
1625mode, then in 'FFTW_MEASURE' mode, then finally (time permitting) in
1626'FFTW_PATIENT'.  If 'FFTW_EXHAUSTIVE' is specified instead, the planner
1627will further progress to 'FFTW_EXHAUSTIVE' mode.
1628
1629   Note that the 'seconds' argument specifies only a rough limit; in
1630practice, the planner may use somewhat more time if the time limit is
1631reached when the planner is in the middle of an operation that cannot be
1632interrupted.  At the very least, the planner will complete planning in
1633'FFTW_ESTIMATE' mode (which is thus equivalent to a time limit of 0).
1634
1635
1636File: fftw3.info,  Node: Real-data DFTs,  Next: Real-data DFT Array Format,  Prev: Planner Flags,  Up: Basic Interface
1637
16384.3.3 Real-data DFTs
1639--------------------
1640
1641     fftw_plan fftw_plan_dft_r2c_1d(int n0,
1642                                    double *in, fftw_complex *out,
1643                                    unsigned flags);
1644     fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
1645                                    double *in, fftw_complex *out,
1646                                    unsigned flags);
1647     fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
1648                                    double *in, fftw_complex *out,
1649                                    unsigned flags);
1650     fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
1651                                 double *in, fftw_complex *out,
1652                                 unsigned flags);
1653
1654   Plan a real-input/complex-output discrete Fourier transform (DFT) in
1655zero or more dimensions, returning an 'fftw_plan' (*note Using Plans::).
1656
1657   Once you have created a plan for a certain transform type and
1658parameters, then creating another plan of the same type and parameters,
1659but for different arrays, is fast and shares constant data with the
1660first plan (if it still exists).
1661
1662   The planner returns 'NULL' if the plan cannot be created.  A
1663non-'NULL' plan is always returned by the basic interface unless you are
1664using a customized FFTW configuration supporting a restricted set of
1665transforms, or if you use the 'FFTW_PRESERVE_INPUT' flag with a
1666multi-dimensional out-of-place c2r transform (see below).
1667
1668Arguments
1669.........
1670
1671   * 'rank' is the rank of the transform (it should be the size of the
1672     array '*n'), and can be any non-negative integer.  (*Note Complex
1673     Multi-Dimensional DFTs::, for the definition of "rank".)  The
1674     '_1d', '_2d', and '_3d' planners correspond to a 'rank' of '1',
1675     '2', and '3', respectively.  The rank may be zero, which is
1676     equivalent to a rank-1 transform of size 1, i.e.  a copy of one
1677     real number (with zero imaginary part) from input to output.
1678
   * 'n0', 'n1', 'n2', or 'n[0..rank-1]' (as appropriate for each
1680     routine) specify the size of the transform dimensions.  They can be
1681     any positive integer.  This is different in general from the
1682     _physical_ array dimensions, which are described in *note Real-data
1683     DFT Array Format::.
1684
1685        - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
1686          11^e 13^f, where e+f is either 0 or 1, and the other exponents
1687          are arbitrary.  Other sizes are computed by means of a slow,
1688          general-purpose algorithm (which nevertheless retains O(n log
1689          n) performance even for prime sizes).  (It is possible to
1690          customize FFTW for different array sizes; see *note
1691          Installation and Customization::.)  Transforms whose sizes are
1692          powers of 2 are especially fast, and it is generally
1693          beneficial for the _last_ dimension of an r2c/c2r transform to
1694          be _even_.
1695
1696   * 'in' and 'out' point to the input and output arrays of the
1697     transform, which may be the same (yielding an in-place transform).
1698     These arrays are overwritten during planning, unless
1699     'FFTW_ESTIMATE' is used in the flags.  (The arrays need not be
1700     initialized, but they must be allocated.)  For an in-place
1701     transform, it is important to remember that the real array will
1702     require padding, described in *note Real-data DFT Array Format::.
1703
1704   * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
1705     defined in *note Planner Flags::.
1706
1707   The inverse transforms, taking complex input (storing the
1708non-redundant half of a logically Hermitian array) to real output, are
1709given by:
1710
1711     fftw_plan fftw_plan_dft_c2r_1d(int n0,
1712                                    fftw_complex *in, double *out,
1713                                    unsigned flags);
1714     fftw_plan fftw_plan_dft_c2r_2d(int n0, int n1,
1715                                    fftw_complex *in, double *out,
1716                                    unsigned flags);
1717     fftw_plan fftw_plan_dft_c2r_3d(int n0, int n1, int n2,
1718                                    fftw_complex *in, double *out,
1719                                    unsigned flags);
1720     fftw_plan fftw_plan_dft_c2r(int rank, const int *n,
1721                                 fftw_complex *in, double *out,
1722                                 unsigned flags);
1723
1724   The arguments are the same as for the r2c transforms, except that the
1725input and output data formats are reversed.
1726
1727   FFTW computes an unnormalized transform: computing an r2c followed by
1728a c2r transform (or vice versa) will result in the original data
1729multiplied by the size of the transform (the product of the logical
1730dimensions).  An r2c transform produces the same output as a
1731'FFTW_FORWARD' complex DFT of the same input, and a c2r transform is
1732correspondingly equivalent to 'FFTW_BACKWARD'.  For more information,
1733see *note What FFTW Really Computes::.
1734
1735
1736File: fftw3.info,  Node: Real-data DFT Array Format,  Next: Real-to-Real Transforms,  Prev: Real-data DFTs,  Up: Basic Interface
1737
17384.3.4 Real-data DFT Array Format
1739--------------------------------
1740
1741The output of a DFT of real data (r2c) contains symmetries that, in
1742principle, make half of the outputs redundant (*note What FFTW Really
1743Computes::).  (Similarly for the input of an inverse c2r transform.)  In
1744practice, it is not possible to entirely realize these savings in an
1745efficient and understandable format that generalizes to
1746multi-dimensional transforms.  Instead, the output of the r2c transforms
1747is _slightly_ over half of the output of the corresponding complex
1748transform.  We do not "pack" the data in any way, but store it as an
1749ordinary array of 'fftw_complex' values.  In fact, this data is simply a
1750subsection of what would be the array in the corresponding complex
1751transform.
1752
1753   Specifically, for a real transform of d (= 'rank') dimensions n[0] x
n[1] x n[2] x ...  x n[d-1], the complex data is an n[0] x n[1] x n[2]
1755x ...  x (n[d-1]/2 + 1) array of 'fftw_complex' values in row-major
1756order (with the division rounded down).  That is, we only store the
1757_lower_ half (non-negative frequencies), plus one element, of the last
1758dimension of the data from the ordinary complex transform.  (We could
1759have instead taken half of any other dimension, but implementation turns
1760out to be simpler if the last, contiguous, dimension is used.)
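
   The size rule above, and the way the omitted half can be recovered,
can be sketched as follows.  Both helpers are hypothetical names, and
'demo_complex' is a stand-in for 'fftw_complex'.  The second helper
handles only the 1d case and relies on the Hermitian symmetry of a
real-input DFT, Y[n-k] = conj(Y[k]).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for 'fftw_complex': [0] = real part, [1] = imaginary part. */
typedef double demo_complex[2];

/* Number of complex values produced by an r2c transform with logical
   dimensions n[0] x ... x n[rank-1]:
   n[0] x ... x n[rank-2] x (n[rank-1]/2 + 1). */
size_t demo_r2c_complex_count(int rank, const int *n)
{
    size_t count = 1;

    assert(rank >= 1);
    for (int d = 0; d < rank - 1; ++d)
        count *= (size_t) n[d];
    /* Integer division rounds down, as the text requires. */
    return count * (size_t) (n[rank - 1] / 2 + 1);
}

/* 1d only: rebuild all n outputs of the complex transform from the
   stored n/2+1 values using Y[n-k] = conj(Y[k]). */
void demo_expand_half_spectrum(int n, demo_complex *half, demo_complex *full)
{
    assert(n > 0);
    for (int k = 0; k < n; ++k) {
        int stored = (k <= n / 2);
        int src = stored ? k : n - k;
        full[k][0] = half[src][0];
        full[k][1] = stored ? half[src][1] : -half[src][1];
    }
}
```

   A 4 x 6 real transform therefore needs room for 4 x 4 = 16 complex
outputs, and a 1d transform of size 5 stores 3.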
1761
1762   For an out-of-place transform, the real data is simply an array with
1763physical dimensions n[0] x n[1] x n[2] x ...  x n[d-1] in row-major
1764order.
1765
1766   For an in-place transform, some complications arise since the complex
1767data is slightly larger than the real data.  In this case, the final
1768dimension of the real data must be _padded_ with extra values to
1769accommodate the size of the complex data--two extra if the last
1770dimension is even and one if it is odd.  That is, the last dimension of
1771the real data must physically contain 2 * (n[d-1]/2+1) 'double' values
1772(exactly enough to hold the complex data).  This physical array size
1773does not, however, change the _logical_ array size--only n[d-1] values
1774are actually stored in the last dimension, and n[d-1] is the last
1775dimension passed to the planner.
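
   The padding rule can be expressed as a pair of small helpers.  This
is a sketch with hypothetical names; the second helper assumes a
two-dimensional in-place transform with logical dimensions n0 x n1.

```c
#include <assert.h>
#include <stddef.h>

/* Physical number of doubles in each row (last dimension) of the real
   array for an in-place r2c/c2r transform whose logical last
   dimension is n_last. */
size_t demo_padded_row_length(int n_last)
{
    assert(n_last > 0);
    return 2 * (size_t) (n_last / 2 + 1);
}

/* Offset of real element (i, j) in an in-place n0 x n1 array: rows
   are spaced by the padded length, but only j < n1 holds data. */
size_t demo_inplace_real_index(int n1, int i, int j)
{
    assert(i >= 0 && j >= 0 && j < n1);
    return (size_t) i * demo_padded_row_length(n1) + (size_t) j;
}
```

   A last dimension of 8 is padded to 10 doubles (two extra) and a last
dimension of 7 to 8 doubles (one extra), while the planner is still
given 8 or 7.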
1776
1777
1778File: fftw3.info,  Node: Real-to-Real Transforms,  Next: Real-to-Real Transform Kinds,  Prev: Real-data DFT Array Format,  Up: Basic Interface
1779
17804.3.5 Real-to-Real Transforms
1781-----------------------------
1782
1783     fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
1784                                fftw_r2r_kind kind, unsigned flags);
1785     fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
1786                                fftw_r2r_kind kind0, fftw_r2r_kind kind1,
1787                                unsigned flags);
1788     fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
1789                                double *in, double *out,
1790                                fftw_r2r_kind kind0,
1791                                fftw_r2r_kind kind1,
1792                                fftw_r2r_kind kind2,
1793                                unsigned flags);
1794     fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
1795                             const fftw_r2r_kind *kind, unsigned flags);
1796
1797   Plan a real input/output (r2r) transform of various kinds in zero or
1798more dimensions, returning an 'fftw_plan' (*note Using Plans::).
1799
1800   Once you have created a plan for a certain transform type and
1801parameters, then creating another plan of the same type and parameters,
1802but for different arrays, is fast and shares constant data with the
1803first plan (if it still exists).
1804
1805   The planner returns 'NULL' if the plan cannot be created.  A
1806non-'NULL' plan is always returned by the basic interface unless you are
1807using a customized FFTW configuration supporting a restricted set of
1808transforms, or for size-1 'FFTW_REDFT00' kinds (which are not defined).
1809
1810Arguments
1811.........
1812
1813   * 'rank' is the dimensionality of the transform (it should be the
1814     size of the arrays '*n' and '*kind'), and can be any non-negative
1815     integer.  The '_1d', '_2d', and '_3d' planners correspond to a
1816     'rank' of '1', '2', and '3', respectively.  A 'rank' of zero is
1817     equivalent to a copy of one number from input to output.
1818
1819   * 'n', or 'n0'/'n1'/'n2', or 'n[rank]', respectively, gives the
1820     (physical) size of the transform dimensions.  They can be any
1821     positive integer.
1822
1823        - Multi-dimensional arrays are stored in row-major order with
1824          dimensions: 'n0' x 'n1'; or 'n0' x 'n1' x 'n2'; or 'n[0]' x
1825          'n[1]' x ...  x 'n[rank-1]'.  *Note Multi-dimensional Array
1826          Format::.
1827        - FFTW is generally best at handling sizes of the form 2^a 3^b
1828          5^c 7^d 11^e 13^f, where e+f is either 0 or 1, and the other
1829          exponents are arbitrary.  Other sizes are computed by means of
1830          a slow, general-purpose algorithm (which nevertheless retains
1831          O(n log n) performance even for prime sizes).  (It is possible
1832          to customize FFTW for different array sizes; see *note
1833          Installation and Customization::.)  Transforms whose sizes are
1834          powers of 2 are especially fast.
1835        - For a 'REDFT00' or 'RODFT00' transform kind in a dimension of
1836          size n, it is n-1 or n+1, respectively, that should be
1837          factorizable in the above form.
1838
1839   * 'in' and 'out' point to the input and output arrays of the
1840     transform, which may be the same (yielding an in-place transform).
1841     These arrays are overwritten during planning, unless
1842     'FFTW_ESTIMATE' is used in the flags.  (The arrays need not be
1843     initialized, but they must be allocated.)
1844
1845   * 'kind', or 'kind0'/'kind1'/'kind2', or 'kind[rank]', is the kind of
1846     r2r transform used for the corresponding dimension.  The valid kind
1847     constants are described in *note Real-to-Real Transform Kinds::.
1848     In a multi-dimensional transform, what is computed is the separable
1849     product formed by taking each transform kind along the
1850     corresponding dimension, one dimension after another.
1851
1852   * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
1853     defined in *note Planner Flags::.
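
   The row-major storage order used for the multi-dimensional arrays
above can be sketched as an offset computation.  The helper name
'demo_row_major_offset' is hypothetical and not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Flat offset of the element at idx[0..rank-1] in a row-major array
   of dimensions n[0] x n[1] x ... x n[rank-1]; the last index varies
   fastest in memory. */
size_t demo_row_major_offset(int rank, const int *n, const int *idx)
{
    size_t offset = 0;

    assert(rank >= 0);
    for (int d = 0; d < rank; ++d) {
        assert(idx[d] >= 0 && idx[d] < n[d]);
        offset = offset * (size_t) n[d] + (size_t) idx[d];
    }
    return offset;
}
```

   In a 3 x 4 x 5 array, element (1, 2, 3) sits at offset
(1*4 + 2)*5 + 3 = 33.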
1854
1855
1856File: fftw3.info,  Node: Real-to-Real Transform Kinds,  Prev: Real-to-Real Transforms,  Up: Basic Interface
1857
18584.3.6 Real-to-Real Transform Kinds
1859----------------------------------
1860
1861FFTW currently supports 11 different r2r transform kinds, specified by
1862one of the constants below.  For the precise definitions of these
1863transforms, see *note What FFTW Really Computes::.  For a more
1864colloquial introduction to these transform kinds, see *note More DFTs of
1865Real Data::.
1866
   For a dimension of size 'n', there is a corresponding "logical"
dimension 'N' that determines the normalization (and the optimal
factorization); the formula for 'N' is given for each kind below.  Also,
with each transform kind is listed its corresponding inverse transform.
1871FFTW computes unnormalized transforms: a transform followed by its
1872inverse will result in the original data multiplied by 'N' (or the
1873product of the 'N''s for each dimension, in multi-dimensions).
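
   As an illustration only, the normalization rule can be sketched in
plain C using the formulas for 'N' listed with each kind in this
section.  The enum 'r2r_kind_sketch' and the functions 'logical_size'
and 'roundtrip_scale' are hypothetical names invented for this sketch;
they are not part of the FFTW API, and real code would use the
'fftw_r2r_kind' constants from 'fftw3.h'.

```c
#include <assert.h>

/* Hypothetical stand-ins for the fftw_r2r_kind constants, so that this
   sketch compiles without fftw3.h. */
typedef enum {
    K_R2HC, K_HC2R, K_DHT,
    K_REDFT00, K_REDFT10, K_REDFT01, K_REDFT11,
    K_RODFT00, K_RODFT10, K_RODFT01, K_RODFT11
} r2r_kind_sketch;

/* Logical size N for one dimension of physical size n. */
long logical_size(r2r_kind_sketch kind, long n)
{
    assert(n > 0);
    switch (kind) {
    case K_R2HC:
    case K_HC2R:
    case K_DHT:
        return n;
    case K_REDFT00:
        return 2 * (n - 1);
    case K_RODFT00:
        return 2 * (n + 1);
    default:                 /* REDFT10/01/11 and RODFT10/01/11 */
        return 2 * n;
    }
}

/* Factor by which a transform followed by its inverse scales the data:
   the product of the logical sizes over all dimensions. */
double roundtrip_scale(int rank, const r2r_kind_sketch *kind, const long *n)
{
    double scale = 1.0;
    for (int i = 0; i < rank; ++i)
        scale *= (double) logical_size(kind[i], n[i]);
    return scale;
}
```

   For instance, a 2d transform with kinds REDFT00 ('n=5') and RODFT11
('n=3') has logical sizes 8 and 6, so a round trip scales the data by
48; dividing each output by that factor recovers the original values.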
1874
   * 'FFTW_R2HC' computes a real-input DFT with output in "halfcomplex"
     format, i.e.  real and imaginary parts for a transform of size 'n'
     stored as: r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1.
     (Logical 'N=n', inverse is 'FFTW_HC2R'.)
1879
1880   * 'FFTW_HC2R' computes the reverse of 'FFTW_R2HC', above.  (Logical
1881     'N=n', inverse is 'FFTW_R2HC'.)
1882
1883   * 'FFTW_DHT' computes a discrete Hartley transform.  (Logical 'N=n',
1884     inverse is 'FFTW_DHT'.)
1885
1886   * 'FFTW_REDFT00' computes an REDFT00 transform, i.e.  a DCT-I.
1887     (Logical 'N=2*(n-1)', inverse is 'FFTW_REDFT00'.)
1888
1889   * 'FFTW_REDFT10' computes an REDFT10 transform, i.e.  a DCT-II
1890     (sometimes called "the" DCT). (Logical 'N=2*n', inverse is
1891     'FFTW_REDFT01'.)
1892
1893   * 'FFTW_REDFT01' computes an REDFT01 transform, i.e.  a DCT-III
1894     (sometimes called "the" IDCT, being the inverse of DCT-II).
     (Logical 'N=2*n', inverse is 'FFTW_REDFT10'.)
1896
1897   * 'FFTW_REDFT11' computes an REDFT11 transform, i.e.  a DCT-IV.
1898     (Logical 'N=2*n', inverse is 'FFTW_REDFT11'.)
1899
1900   * 'FFTW_RODFT00' computes an RODFT00 transform, i.e.  a DST-I.
1901     (Logical 'N=2*(n+1)', inverse is 'FFTW_RODFT00'.)
1902
1903   * 'FFTW_RODFT10' computes an RODFT10 transform, i.e.  a DST-II.
1904     (Logical 'N=2*n', inverse is 'FFTW_RODFT01'.)
1905
1906   * 'FFTW_RODFT01' computes an RODFT01 transform, i.e.  a DST-III.
     (Logical 'N=2*n', inverse is 'FFTW_RODFT10'.)
1908
1909   * 'FFTW_RODFT11' computes an RODFT11 transform, i.e.  a DST-IV.
1910     (Logical 'N=2*n', inverse is 'FFTW_RODFT11'.)
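
   The halfcomplex layout used by 'FFTW_R2HC' and 'FFTW_HC2R' (first
two items above) can be easier to follow as index arithmetic.  The
following is a minimal sketch, assuming the storage order stated above:
the real part of output 'k' sits at index 'k', and the imaginary part,
when it is stored at all, sits at index 'n-k'.  The helper names
'hc_real' and 'hc_imag' are hypothetical and not part of FFTW.

```c
#include <assert.h>

/* Real part of the k-th output of a size-n halfcomplex array,
   for 0 <= k <= n/2. */
double hc_real(const double *hc, int n, int k)
{
    assert(n > 0 && k >= 0 && k <= n / 2);
    return hc[k];
}

/* Imaginary part of the k-th output, for 0 <= k <= n/2.  The imaginary
   parts for k = 0 and (for even n) k = n/2 are not stored, because they
   are zero for real input. */
double hc_imag(const double *hc, int n, int k)
{
    assert(n > 0 && k >= 0 && k <= n / 2);
    if (k == 0 || 2 * k == n)
        return 0.0;
    return hc[n - k];
}
```

   For 'n=6' the array holds r0, r1, r2, r3, i2, i1, so 'hc_imag(hc, 6,
1)' reads the last element; for 'n=5' it holds r0, r1, r2, i2, i1.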
1911
1912
1913File: fftw3.info,  Node: Advanced Interface,  Next: Guru Interface,  Prev: Basic Interface,  Up: FFTW Reference
1914
19154.4 Advanced Interface
1916======================
1917
1918FFTW's "advanced" interface supplements the basic interface with four
1919new planner routines, providing a new level of flexibility: you can plan
1920a transform of multiple arrays simultaneously, operate on non-contiguous
1921(strided) data, and transform a subset of a larger multi-dimensional
1922array.  Other than these additional features, the planner operates in
1923the same fashion as in the basic interface, and the resulting
1924'fftw_plan' is used in the same way (*note Using Plans::).
1925
1926* Menu:
1927
1928* Advanced Complex DFTs::
1929* Advanced Real-data DFTs::
1930* Advanced Real-to-real Transforms::
1931
1932
1933File: fftw3.info,  Node: Advanced Complex DFTs,  Next: Advanced Real-data DFTs,  Prev: Advanced Interface,  Up: Advanced Interface
1934
19354.4.1 Advanced Complex DFTs
1936---------------------------
1937
1938     fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
1939                                  fftw_complex *in, const int *inembed,
1940                                  int istride, int idist,
1941                                  fftw_complex *out, const int *onembed,
1942                                  int ostride, int odist,
1943                                  int sign, unsigned flags);
1944
1945   This routine plans multiple multidimensional complex DFTs, and it
1946extends the 'fftw_plan_dft' routine (*note Complex DFTs::) to compute
1947'howmany' transforms, each having rank 'rank' and size 'n'.  In
1948addition, the transform data need not be contiguous, but it may be laid
1949out in memory with an arbitrary stride.  To account for these
1950possibilities, 'fftw_plan_many_dft' adds the new parameters 'howmany',
1951{'i','o'}'nembed', {'i','o'}'stride', and {'i','o'}'dist'.  The FFTW
1952basic interface (*note Complex DFTs::) provides routines specialized for
1953ranks 1, 2, and 3, but the advanced interface handles only the
1954general-rank case.
1955
1956   'howmany' is the (nonnegative) number of transforms to compute.  The
1957resulting plan computes 'howmany' transforms, where the input of the
1958'k'-th transform is at location 'in+k*idist' (in C pointer arithmetic),
1959and its output is at location 'out+k*odist'.  Plans obtained in this way
1960can often be faster than calling FFTW multiple times for the individual
1961transforms.  The basic 'fftw_plan_dft' interface corresponds to
1962'howmany=1' (in which case the 'dist' parameters are ignored).
1963
1964   Each of the 'howmany' transforms has rank 'rank' and size 'n', as in
1965the basic interface.  In addition, the advanced interface allows the
1966input and output arrays of each transform to be row-major subarrays of
1967larger rank-'rank' arrays, described by 'inembed' and 'onembed'
1968parameters, respectively.  {'i','o'}'nembed' must be arrays of length
1969'rank', and 'n' should be elementwise less than or equal to
{'i','o'}'nembed'.  Passing 'NULL' for an 'nembed' parameter is
equivalent to passing 'n' (i.e.  same physical and logical dimensions,
as in the basic interface).
1973
1974   The 'stride' parameters indicate that the 'j'-th element of the input
1975or output arrays is located at 'j*istride' or 'j*ostride', respectively.
1976(For a multi-dimensional array, 'j' is the ordinary row-major index.)
1977When combined with the 'k'-th transform in a 'howmany' loop, from above,
1978this means that the ('j','k')-th element is at 'j*stride+k*dist'.  (The
1979basic 'fftw_plan_dft' interface corresponds to a stride of 1.)
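
   The addressing rule described above can be sketched as two small
helpers.  This is an illustration under stated assumptions, not FFTW
code: 'row_major_index' and 'many_offset' are hypothetical names, and
offsets are counted in array elements ('fftw_complex' for this routine)
from the base pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major index j of the element idx[0..rank-1] inside an array whose
   physical dimensions are nembed[0..rank-1]. */
ptrdiff_t row_major_index(int rank, const int *nembed, const int *idx)
{
    ptrdiff_t j = 0;
    for (int d = 0; d < rank; ++d) {
        assert(idx[d] >= 0 && idx[d] < nembed[d]);
        j = j * nembed[d] + idx[d];
    }
    return j;
}

/* Offset, in elements from the base pointer, of the j-th element of the
   k-th transform: j*stride + k*dist. */
ptrdiff_t many_offset(ptrdiff_t j, int k, int stride, int dist)
{
    return j * stride + (ptrdiff_t) k * dist;
}
```

   For example, with 'stride=3' and 'dist=1' (one transform per column
of a 10 by 3 row-major array), element 2 of transform 1 is at offset
2*3 + 1*1 = 7, which is row 2, column 1 of the array.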
1980
1981   For in-place transforms, the input and output 'stride' and 'dist'
1982parameters should be the same; otherwise, the planner may return 'NULL'.
1983
1984   Arrays 'n', 'inembed', and 'onembed' are not used after this function
1985returns.  You can safely free or reuse them.
1986
1987   *Examples*: One transform of one 5 by 6 array contiguous in memory:
1988        int rank = 2;
1989        int n[] = {5, 6};
1990        int howmany = 1;
        int idist = 0, odist = 0; /* unused because howmany = 1 */
        int istride = 1, ostride = 1; /* array is contiguous in memory */
1993        int *inembed = n, *onembed = n;
1994
1995   Transform of three 5 by 6 arrays, each contiguous in memory, stored
1996in memory one after another:
1997        int rank = 2;
1998        int n[] = {5, 6};
1999        int howmany = 3;
        /* idist = odist = 30: the distance in memory between the first
           element of the first array and the first element of the
           second array */
        int idist = n[0]*n[1], odist = n[0]*n[1];
        int istride = 1, ostride = 1; /* array is contiguous in memory */
2005        int *inembed = n, *onembed = n;
2006
2007   Transform each column of a 2d array with 10 rows and 3 columns:
2008        int rank = 1; /* not 2: we are computing 1d transforms */
2009        int n[] = {10}; /* 1d transforms of length 10 */
2010        int howmany = 3;
        int idist = 1, odist = 1;
        int istride = 3, ostride = 3; /* distance between two elements in
                                         the same column */
2014        int *inembed = n, *onembed = n;
2015
2016
2017File: fftw3.info,  Node: Advanced Real-data DFTs,  Next: Advanced Real-to-real Transforms,  Prev: Advanced Complex DFTs,  Up: Advanced Interface
2018
20194.4.2 Advanced Real-data DFTs
2020-----------------------------
2021
2022     fftw_plan fftw_plan_many_dft_r2c(int rank, const int *n, int howmany,
2023                                      double *in, const int *inembed,
2024                                      int istride, int idist,
2025                                      fftw_complex *out, const int *onembed,
2026                                      int ostride, int odist,
2027                                      unsigned flags);
2028     fftw_plan fftw_plan_many_dft_c2r(int rank, const int *n, int howmany,
2029                                      fftw_complex *in, const int *inembed,
2030                                      int istride, int idist,
2031                                      double *out, const int *onembed,
2032                                      int ostride, int odist,
2033                                      unsigned flags);
2034
2035   Like 'fftw_plan_many_dft', these two functions add 'howmany',
2036'nembed', 'stride', and 'dist' parameters to the 'fftw_plan_dft_r2c' and
2037'fftw_plan_dft_c2r' functions, but otherwise behave the same as the
2038basic interface.
2039
   The interpretations of 'howmany', 'stride', and 'dist' are the same as
2041for 'fftw_plan_many_dft', above.  Note that the 'stride' and 'dist' for
2042the real array are in units of 'double', and for the complex array are
2043in units of 'fftw_complex'.
2044
2045   If an 'nembed' parameter is 'NULL', it is interpreted as what it
2046would be in the basic interface, as described in *note Real-data DFT
2047Array Format::.  That is, for the complex array the size is assumed to
2048be the same as 'n', but with the last dimension cut roughly in half.
2049For the real array, the size is assumed to be 'n' if the transform is
2050out-of-place, or 'n' with the last dimension "padded" if the transform
2051is in-place.
2052
2053   If an 'nembed' parameter is non-'NULL', it is interpreted as the
2054physical size of the corresponding array, in row-major order, just as
2055for 'fftw_plan_many_dft'.  In this case, each dimension of 'nembed'
2056should be '>=' what it would be in the basic interface (e.g.  the halved
2057or padded 'n').
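
   As a sketch of the 'NULL' defaults described above, the following
helpers compute the implied physical sizes.  The names
'default_complex_embed' and 'default_real_embed' are hypothetical (not
part of FFTW); the sketch assumes the sizes given in *note Real-data DFT
Array Format::, namely 'n/2+1' complex elements in the last dimension
and, for in-place transforms, a real last dimension padded to
'2*(n/2+1)'.

```c
#include <assert.h>

/* Fill cembed[] with the default physical size of the complex array:
   n with the last dimension replaced by n/2+1 (division rounded down). */
void default_complex_embed(int rank, const int *n, int *cembed)
{
    assert(rank >= 1);
    for (int d = 0; d < rank; ++d)
        cembed[d] = n[d];
    cembed[rank - 1] = n[rank - 1] / 2 + 1;
}

/* Fill rembed[] with the default physical size of the real array:
   n itself when out-of-place, or n with the last dimension padded to
   2*(n/2+1) when in-place. */
void default_real_embed(int rank, const int *n, int in_place, int *rembed)
{
    assert(rank >= 1);
    for (int d = 0; d < rank; ++d)
        rembed[d] = n[d];
    if (in_place)
        rembed[rank - 1] = 2 * (n[rank - 1] / 2 + 1);
}
```

   For 'n = {4, 6}' this gives a 4 by 4 complex array, and a real array
of 4 by 6 (out-of-place) or 4 by 8 (in-place).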
2058
2059   Arrays 'n', 'inembed', and 'onembed' are not used after this function
2060returns.  You can safely free or reuse them.
2061
2062
2063File: fftw3.info,  Node: Advanced Real-to-real Transforms,  Prev: Advanced Real-data DFTs,  Up: Advanced Interface
2064
20654.4.3 Advanced Real-to-real Transforms
2066--------------------------------------
2067
2068     fftw_plan fftw_plan_many_r2r(int rank, const int *n, int howmany,
2069                                  double *in, const int *inembed,
2070                                  int istride, int idist,
2071                                  double *out, const int *onembed,
2072                                  int ostride, int odist,
2073                                  const fftw_r2r_kind *kind, unsigned flags);
2074
   Like 'fftw_plan_many_dft', this function adds 'howmany', 'nembed',
'stride', and 'dist' parameters to the 'fftw_plan_r2r' function, but
otherwise behaves the same as the basic interface.  The interpretation of
those additional parameters is the same as for 'fftw_plan_many_dft'.
2079(Of course, the 'stride' and 'dist' parameters are now in units of
2080'double', not 'fftw_complex'.)
2081
2082   Arrays 'n', 'inembed', 'onembed', and 'kind' are not used after this
2083function returns.  You can safely free or reuse them.
2084
2085
2086File: fftw3.info,  Node: Guru Interface,  Next: New-array Execute Functions,  Prev: Advanced Interface,  Up: FFTW Reference
2087
20884.5 Guru Interface
2089==================
2090
2091The "guru" interface to FFTW is intended to expose as much as possible
2092of the flexibility in the underlying FFTW architecture.  It allows one
2093to compute multi-dimensional "vectors" (loops) of multi-dimensional
2094transforms, where each vector/transform dimension has an independent
2095size and stride.  One can also use more general complex-number formats,
2096e.g.  separate real and imaginary arrays.
2097
2098   For those users who require the flexibility of the guru interface, it
2099is important that they pay special attention to the documentation lest
2100they shoot themselves in the foot.
2101
2102* Menu:
2103
2104* Interleaved and split arrays::
2105* Guru vector and transform sizes::
2106* Guru Complex DFTs::
2107* Guru Real-data DFTs::
2108* Guru Real-to-real Transforms::
2109* 64-bit Guru Interface::
2110
2111
2112File: fftw3.info,  Node: Interleaved and split arrays,  Next: Guru vector and transform sizes,  Prev: Guru Interface,  Up: Guru Interface
2113
21144.5.1 Interleaved and split arrays
2115----------------------------------
2116
2117The guru interface supports two representations of complex numbers,
2118which we call the interleaved and the split format.
2119
2120   The "interleaved" format is the same one used by the basic and
2121advanced interfaces, and it is documented in *note Complex numbers::.
In the interleaved format, you provide pointers to the real part of a
complex number, and the imaginary part is understood to be stored in the
next memory location.
2125
2126   The "split" format allows separate pointers to the real and imaginary
2127parts of a complex array.
2128
2129   Technically, the interleaved format is redundant, because you can
2130always express an interleaved array in terms of a split array with
2131appropriate pointers and strides.  On the other hand, the interleaved
2132format is simpler to use, and it is common in practice.  Hence, FFTW
2133supports it as a special case.
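
   The redundancy noted above can be made concrete with a small sketch.
It assumes the default layout in which a complex number is a 'double[2]'
with the real part first; 'cplx_sketch', 'split_view', and 'as_split'
are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an interleaved complex number:
   double[2], real part first. */
typedef double cplx_sketch[2];

/* A split-format description of the same memory. */
typedef struct {
    double *re;    /* pointer to the first real part */
    double *im;    /* pointer to the first imaginary part */
    int stride;    /* distance, in doubles, between consecutive elements */
} split_view;

/* View a contiguous interleaved array as a split array with stride 2. */
split_view as_split(cplx_sketch *a)
{
    split_view v;
    assert(a != NULL);
    v.re = (double *) a;
    v.im = (double *) a + 1;
    v.stride = 2;
    return v;
}
```

   With this view, element 'k' has real part 'v.re[k * v.stride]' and
imaginary part 'v.im[k * v.stride]'.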
2134
2135
2136File: fftw3.info,  Node: Guru vector and transform sizes,  Next: Guru Complex DFTs,  Prev: Interleaved and split arrays,  Up: Guru Interface
2137
21384.5.2 Guru vector and transform sizes
2139-------------------------------------
2140
2141The guru interface introduces one basic new data structure,
2142'fftw_iodim', that is used to specify sizes and strides for
2143multi-dimensional transforms and vectors:
2144
2145     typedef struct {
2146          int n;
2147          int is;
2148          int os;
2149     } fftw_iodim;
2150
2151   Here, 'n' is the size of the dimension, and 'is' and 'os' are the
2152strides of that dimension for the input and output arrays.  (The stride
2153is the separation of consecutive elements along this dimension.)
2154
2155   The meaning of the stride parameter depends on the type of the array
2156that the stride refers to.  _If the array is interleaved complex,
2157strides are expressed in units of complex numbers ('fftw_complex').  If
2158the array is split complex or real, strides are expressed in units of
2159real numbers ('double')._  This convention is consistent with the usual
2160pointer arithmetic in the C language.  An interleaved array is denoted
2161by a pointer 'p' to 'fftw_complex', so that 'p+1' points to the next
2162complex number.  Split arrays are denoted by pointers to 'double', in
2163which case pointer arithmetic operates in units of 'sizeof(double)'.
2164
2165   The guru planner interfaces all take a ('rank', 'dims[rank]') pair
2166describing the transform size, and a ('howmany_rank',
2167'howmany_dims[howmany_rank]') pair describing the "vector" size (a
2168multi-dimensional loop of transforms to perform), where 'dims' and
2169'howmany_dims' are arrays of 'fftw_iodim'.  Each 'n' field must be
2170positive for 'dims' and nonnegative for 'howmany_dims', while both
2171'rank' and 'howmany_rank' must be nonnegative.
2172
2173   For example, the 'howmany' parameter in the advanced complex-DFT
2174interface corresponds to 'howmany_rank' = 1, 'howmany_dims[0].n' =
2175'howmany', 'howmany_dims[0].is' = 'idist', and 'howmany_dims[0].os' =
2176'odist'.  (To compute a single transform, you can just use
2177'howmany_rank' = 0.)
2178
2179   A row-major multidimensional array with dimensions 'n[rank]' (*note
2180Row-major Format::) corresponds to 'dims[i].n' = 'n[i]' and the
2181recurrence 'dims[i].is' = 'n[i+1] * dims[i+1].is' (similarly for 'os').
The stride of the last ('i=rank-1') dimension is the overall stride of
the array.  For example, to be equivalent to the advanced complex-DFT
interface, you would have 'dims[rank-1].is' = 'istride' and
'dims[rank-1].os' = 'ostride'.
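
   The correspondence with the advanced interface can be sketched as
follows.  The struct 'iodim_sketch' is a hypothetical mirror of
'fftw_iodim' so that the example compiles without 'fftw3.h', and the
function names are likewise invented here.  The sketch assumes the
advanced-interface call would have passed 'NULL' for both 'nembed'
parameters, so that physical and logical dimensions coincide.

```c
#include <assert.h>

/* Hypothetical mirror of fftw_iodim. */
typedef struct {
    int n;
    int is;
    int os;
} iodim_sketch;

/* Fill dims[0..rank-1] for a row-major array of dimensions n[], whose
   last dimension has strides istride (input) and ostride (output). */
void fill_row_major_dims(int rank, const int *n, int istride, int ostride,
                         iodim_sketch *dims)
{
    assert(rank >= 1);
    dims[rank - 1].n = n[rank - 1];
    dims[rank - 1].is = istride;
    dims[rank - 1].os = ostride;
    for (int i = rank - 2; i >= 0; --i) {
        dims[i].n = n[i];
        dims[i].is = n[i + 1] * dims[i + 1].is;
        dims[i].os = n[i + 1] * dims[i + 1].os;
    }
}

/* The single loop dimension equivalent to the advanced interface's
   howmany, idist, and odist parameters. */
iodim_sketch howmany_dim(int howmany, int idist, int odist)
{
    iodim_sketch h;
    h.n = howmany;
    h.is = idist;
    h.os = odist;
    return h;
}
```

   For 'n = {5, 6, 7}' and 'istride = 1', the input strides come out as
42, 7, and 1.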
2186
2187   In general, we only guarantee FFTW to return a non-'NULL' plan if the
2188vector and transform dimensions correspond to a set of distinct indices,
2189and for in-place transforms the input/output strides should be the same.
2190
2191
2192File: fftw3.info,  Node: Guru Complex DFTs,  Next: Guru Real-data DFTs,  Prev: Guru vector and transform sizes,  Up: Guru Interface
2193
21944.5.3 Guru Complex DFTs
2195-----------------------
2196
2197     fftw_plan fftw_plan_guru_dft(
2198          int rank, const fftw_iodim *dims,
2199          int howmany_rank, const fftw_iodim *howmany_dims,
2200          fftw_complex *in, fftw_complex *out,
2201          int sign, unsigned flags);
2202
2203     fftw_plan fftw_plan_guru_split_dft(
2204          int rank, const fftw_iodim *dims,
2205          int howmany_rank, const fftw_iodim *howmany_dims,
2206          double *ri, double *ii, double *ro, double *io,
2207          unsigned flags);
2208
2209   These two functions plan a complex-data, multi-dimensional DFT for
2210the interleaved and split format, respectively.  Transform dimensions
2211are given by ('rank', 'dims') over a multi-dimensional vector (loop) of
2212dimensions ('howmany_rank', 'howmany_dims').  'dims' and 'howmany_dims'
2213should point to 'fftw_iodim' arrays of length 'rank' and 'howmany_rank',
2214respectively.
2215
2216   'flags' is a bitwise OR ('|') of zero or more planner flags, as
2217defined in *note Planner Flags::.
2218
2219   In the 'fftw_plan_guru_dft' function, the pointers 'in' and 'out'
2220point to the interleaved input and output arrays, respectively.  The
2221sign can be either -1 (= 'FFTW_FORWARD') or +1 (= 'FFTW_BACKWARD').  If
2222the pointers are equal, the transform is in-place.
2223
2224   In the 'fftw_plan_guru_split_dft' function, 'ri' and 'ii' point to
2225the real and imaginary input arrays, and 'ro' and 'io' point to the real
2226and imaginary output arrays.  The input and output pointers may be the
2227same, indicating an in-place transform.  For example, for 'fftw_complex'
2228pointers 'in' and 'out', the corresponding parameters are:
2229
2230     ri = (double *) in;
2231     ii = (double *) in + 1;
2232     ro = (double *) out;
2233     io = (double *) out + 1;
2234
2235   Because 'fftw_plan_guru_split_dft' accepts split arrays, strides are
2236expressed in units of 'double'.  For a contiguous 'fftw_complex' array,
2237the overall stride of the transform should be 2, the distance between
2238consecutive real parts or between consecutive imaginary parts; see *note
2239Guru vector and transform sizes::.  Note that the dimension strides are
2240applied equally to the real and imaginary parts; real and imaginary
2241arrays with different strides are not supported.
2242
2243   There is no 'sign' parameter in 'fftw_plan_guru_split_dft'.  This
2244function always plans for an 'FFTW_FORWARD' transform.  To plan for an
2245'FFTW_BACKWARD' transform, you can exploit the identity that the
2246backwards DFT is equal to the forwards DFT with the real and imaginary
2247parts swapped.  For example, in the case of the 'fftw_complex' arrays
2248above, the 'FFTW_BACKWARD' transform is computed by the parameters:
2249
2250     ri = (double *) in + 1;
2251     ii = (double *) in;
2252     ro = (double *) out + 1;
2253     io = (double *) out;
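
   The swapping identity can be checked without FFTW on a tiny case.
The sketch below is a naive size-4 split-format DFT written for this
illustration only (it is not how FFTW computes anything); size 4 is
chosen because its twiddle factors are exactly 1, -i, -1, and i.  The
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Forward DFT of size 4 on split arrays (out-of-place):
   Y[k] = sum_j X[j] * w^(j*k), with w = exp(-2*pi*i/4) = -i. */
void dft4_forward_split(const double *ri, const double *ii,
                        double *ro, double *io)
{
    /* Powers of w = -i, in order: 1, -i, -1, i. */
    static const double wr[4] = {1.0, 0.0, -1.0, 0.0};
    static const double wi[4] = {0.0, -1.0, 0.0, 1.0};

    assert(ri != NULL && ii != NULL && ro != NULL && io != NULL);
    for (int k = 0; k < 4; ++k) {
        double sr = 0.0, si = 0.0;
        for (int j = 0; j < 4; ++j) {
            int m = (j * k) % 4;
            sr += ri[j] * wr[m] - ii[j] * wi[m];
            si += ri[j] * wi[m] + ii[j] * wr[m];
        }
        ro[k] = sr;
        io[k] = si;
    }
}

/* Backward DFT obtained from the forward one by swapping the real and
   imaginary pointers on both input and output. */
void dft4_backward_split(const double *ri, const double *ii,
                         double *ro, double *io)
{
    dft4_forward_split(ii, ri, io, ro);
}
```

   Applying 'dft4_forward_split' and then 'dft4_backward_split' returns
the input multiplied by 4, consistent with the unnormalized transforms
described elsewhere in this manual.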
2254
2255
2256File: fftw3.info,  Node: Guru Real-data DFTs,  Next: Guru Real-to-real Transforms,  Prev: Guru Complex DFTs,  Up: Guru Interface
2257
22584.5.4 Guru Real-data DFTs
2259-------------------------
2260
2261     fftw_plan fftw_plan_guru_dft_r2c(
2262          int rank, const fftw_iodim *dims,
2263          int howmany_rank, const fftw_iodim *howmany_dims,
2264          double *in, fftw_complex *out,
2265          unsigned flags);
2266
2267     fftw_plan fftw_plan_guru_split_dft_r2c(
2268          int rank, const fftw_iodim *dims,
2269          int howmany_rank, const fftw_iodim *howmany_dims,
2270          double *in, double *ro, double *io,
2271          unsigned flags);
2272
2273     fftw_plan fftw_plan_guru_dft_c2r(
2274          int rank, const fftw_iodim *dims,
2275          int howmany_rank, const fftw_iodim *howmany_dims,
2276          fftw_complex *in, double *out,
2277          unsigned flags);
2278
2279     fftw_plan fftw_plan_guru_split_dft_c2r(
2280          int rank, const fftw_iodim *dims,
2281          int howmany_rank, const fftw_iodim *howmany_dims,
2282          double *ri, double *ii, double *out,
2283          unsigned flags);
2284
2285   Plan a real-input (r2c) or real-output (c2r), multi-dimensional DFT
2286with transform dimensions given by ('rank', 'dims') over a
2287multi-dimensional vector (loop) of dimensions ('howmany_rank',
2288'howmany_dims').  'dims' and 'howmany_dims' should point to 'fftw_iodim'
2289arrays of length 'rank' and 'howmany_rank', respectively.  As for the
2290basic and advanced interfaces, an r2c transform is 'FFTW_FORWARD' and a
2291c2r transform is 'FFTW_BACKWARD'.
2292
2293   The _last_ dimension of 'dims' is interpreted specially: that
2294dimension of the real array has size 'dims[rank-1].n', but that
2295dimension of the complex array has size 'dims[rank-1].n/2+1' (division
2296rounded down).  The strides, on the other hand, are taken to be exactly
2297as specified.  It is up to the user to specify the strides appropriately
2298for the peculiar dimensions of the data, and we do not guarantee that
2299the planner will succeed (return non-'NULL') for any dimensions other
2300than those described in *note Real-data DFT Array Format:: and
2301generalized in *note Advanced Real-data DFTs::.  (That is, for an
2302in-place transform, each individual dimension should be able to operate
2303in place.)
2304
2305   'in' and 'out' point to the input and output arrays for r2c and c2r
2306transforms, respectively.  For split arrays, 'ri' and 'ii' point to the
2307real and imaginary input arrays for a c2r transform, and 'ro' and 'io'
2308point to the real and imaginary output arrays for an r2c transform.
2309'in' and 'ro' or 'ri' and 'out' may be the same, indicating an in-place
2310transform.  (In-place transforms where 'in' and 'io' or 'ii' and 'out'
2311are the same are not currently supported.)
2312
2313   'flags' is a bitwise OR ('|') of zero or more planner flags, as
2314defined in *note Planner Flags::.
2315
2316   In-place transforms of rank greater than 1 are currently only
2317supported for interleaved arrays.  For split arrays, the planner will
2318return 'NULL'.
2319
2320
2321File: fftw3.info,  Node: Guru Real-to-real Transforms,  Next: 64-bit Guru Interface,  Prev: Guru Real-data DFTs,  Up: Guru Interface
2322
23234.5.5 Guru Real-to-real Transforms
2324----------------------------------
2325
2326     fftw_plan fftw_plan_guru_r2r(int rank, const fftw_iodim *dims,
2327                                  int howmany_rank,
2328                                  const fftw_iodim *howmany_dims,
2329                                  double *in, double *out,
2330                                  const fftw_r2r_kind *kind,
2331                                  unsigned flags);
2332
   Plan a real-to-real (r2r) multi-dimensional transform
2334with transform dimensions given by ('rank', 'dims') over a
2335multi-dimensional vector (loop) of dimensions ('howmany_rank',
2336'howmany_dims').  'dims' and 'howmany_dims' should point to 'fftw_iodim'
2337arrays of length 'rank' and 'howmany_rank', respectively.
2338
2339   The transform kind of each dimension is given by the 'kind'
2340parameter, which should point to an array of length 'rank'.  Valid
2341'fftw_r2r_kind' constants are given in *note Real-to-Real Transform
2342Kinds::.
2343
2344   'in' and 'out' point to the real input and output arrays; they may be
2345the same, indicating an in-place transform.
2346
2347   'flags' is a bitwise OR ('|') of zero or more planner flags, as
2348defined in *note Planner Flags::.
2349
2350
2351File: fftw3.info,  Node: 64-bit Guru Interface,  Prev: Guru Real-to-real Transforms,  Up: Guru Interface
2352
23534.5.6 64-bit Guru Interface
2354---------------------------
2355
2356When compiled in 64-bit mode on a 64-bit architecture (where addresses
2357are 64 bits wide), FFTW uses 64-bit quantities internally for all
2358transform sizes, strides, and so on--you don't have to do anything
2359special to exploit this.  However, in the ordinary FFTW interfaces, you
2360specify the transform size by an 'int' quantity, which is normally only
236132 bits wide.  This means that, even though FFTW is using 64-bit sizes
2362internally, you cannot specify a single transform dimension larger than
23632^31-1 numbers.
2364
2365   We expect that few users will require transforms larger than this,
2366but, for those who do, we provide a 64-bit version of the guru interface
2367in which all sizes are specified as integers of type 'ptrdiff_t' instead
2368of 'int'.  ('ptrdiff_t' is a signed integer type defined by the C
2369standard to be wide enough to represent address differences, and thus
2370must be at least 64 bits wide on a 64-bit machine.)  We stress that
2371there is _no performance advantage_ to using this interface--the same
2372internal FFTW code is employed regardless--and it is only necessary if
2373you want to specify very large transform sizes.
2374
2375   In particular, the 64-bit guru interface is a set of planner routines
2376that are exactly the same as the guru planner routines, except that they
2377are named with 'guru64' instead of 'guru' and they take arguments of
2378type 'fftw_iodim64' instead of 'fftw_iodim'.  For example, instead of
2379'fftw_plan_guru_dft', we have 'fftw_plan_guru64_dft'.
2380
2381     fftw_plan fftw_plan_guru64_dft(
2382          int rank, const fftw_iodim64 *dims,
2383          int howmany_rank, const fftw_iodim64 *howmany_dims,
2384          fftw_complex *in, fftw_complex *out,
2385          int sign, unsigned flags);
2386
2387   The 'fftw_iodim64' type is similar to 'fftw_iodim', with the same
2388interpretation, except that it uses type 'ptrdiff_t' instead of type
2389'int'.
2390
2391     typedef struct {
2392          ptrdiff_t n;
2393          ptrdiff_t is;
2394          ptrdiff_t os;
2395     } fftw_iodim64;
2396
2397   Every other 'fftw_plan_guru' function also has a 'fftw_plan_guru64'
2398equivalent, but we do not repeat their documentation here since they are
2399identical to the 32-bit versions except as noted above.
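
   One way to decide whether the 64-bit interface is needed is to check
whether every size and stride fits in an 'int'.  The following is a
rough sketch, assuming a contiguous row-major array and a platform
where 'ptrdiff_t' is 64 bits wide; 'fits_int_iodim' is a hypothetical
helper, not an FFTW function.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return 1 if every size and every row-major stride of a contiguous
   array with dimensions n[0..rank-1] fits in an int, 0 otherwise. */
int fits_int_iodim(int rank, const ptrdiff_t *n)
{
    ptrdiff_t stride = 1;   /* stride of the last dimension */
    for (int i = rank - 1; i >= 0; --i) {
        assert(n[i] > 0);
        if (n[i] > INT_MAX || stride > INT_MAX)
            return 0;
        stride *= n[i];     /* becomes the stride of dimension i-1 */
    }
    return 1;
}
```

   Note that the total number of elements may exceed the range of 'int'
while each individual size and stride still fits: a 65536 by 65536
array passes this check, whereas a 4 by 65536 by 65536 array does not,
because its outermost stride is 2^32.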
2400
2401
2402File: fftw3.info,  Node: New-array Execute Functions,  Next: Wisdom,  Prev: Guru Interface,  Up: FFTW Reference
2403
24044.6 New-array Execute Functions
2405===============================
2406
2407Normally, one executes a plan for the arrays with which the plan was
2408created, by calling 'fftw_execute(plan)' as described in *note Using
2409Plans::.  However, it is possible for sophisticated users to apply a
2410given plan to a _different_ array using the "new-array execute"
2411functions detailed below, provided that the following conditions are
2412met:
2413
2414   * The array size, strides, etcetera are the same (since those are set
2415     by the plan).
2416
2417   * The input and output arrays are the same (in-place) or different
2418     (out-of-place) if the plan was originally created to be in-place or
2419     out-of-place, respectively.
2420
2421   * For split arrays, the separations between the real and imaginary
2422     parts, 'ii-ri' and 'io-ro', are the same as they were for the input
2423     and output arrays when the plan was created.  (This condition is
2424     automatically satisfied for interleaved arrays.)
2425
2426   * The "alignment" of the new input/output arrays is the same as that
2427     of the input/output arrays when the plan was created, unless the
2428     plan was created with the 'FFTW_UNALIGNED' flag.  Here, the
2429     alignment is a platform-dependent quantity (for example, it is the
2430     address modulo 16 if SSE SIMD instructions are used, but the
2431     address modulo 4 for non-SIMD single-precision FFTW on the same
2432     machine).  In general, only arrays allocated with 'fftw_malloc' are
2433     guaranteed to be equally aligned (*note SIMD alignment and
2434     fftw_malloc::).
2435
2436   The alignment issue is especially critical, because if you don't use
2437'fftw_malloc' then you may have little control over the alignment of
arrays in memory.  For example, neither the C++ 'new' operator nor the
Fortran 'allocate' statement provides strong enough guarantees about data
alignment.  If you don't use 'fftw_malloc', therefore, you probably have
2441to use 'FFTW_UNALIGNED' (which disables most SIMD support).  If
2442possible, it is probably better for you to simply create multiple plans
2443(creating a new plan is quick once one exists for a given size), or
2444better yet re-use the same array for your transforms.
2445
2446   For rare circumstances in which you cannot control the alignment of
allocated memory, but wish to determine whether a given array is aligned
2448like the original array for which a plan was created, you can use the
2449'fftw_alignment_of' function:
2450     int fftw_alignment_of(double *p);
2451   Two arrays have equivalent alignment (for the purposes of applying a
2452plan) if and only if 'fftw_alignment_of' returns the same value for the
2453corresponding pointers to their data (typecast to 'double*' if
2454necessary).
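
   To make the notion of "equivalent alignment" concrete, here is a
simplified model that compares addresses modulo 16.  This is only an
assumption-laden illustration: it is _not_ how 'fftw_alignment_of' is
implemented, the modulus FFTW actually uses is platform-dependent, and
the names 'address_mod16' and 'same_alignment16' are hypothetical.  Real
code should compare the values returned by 'fftw_alignment_of'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of p modulo 16 (illustrative model only; the quantity FFTW
   uses is platform-dependent). */
int address_mod16(const void *p)
{
    assert(p != NULL);
    return (int) ((uintptr_t) p % 16);
}

/* In this simplified model, two pointers are equally aligned if their
   addresses agree modulo 16. */
int same_alignment16(const void *p, const void *q)
{
    return address_mod16(p) == address_mod16(q);
}
```

   In this model, with 8-byte 'double' values and a 16-byte-aligned
array 'buf', the pointers 'buf' and 'buf + 2' are equally aligned, but
'buf' and 'buf + 1' are not.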
2455
2456   If you are tempted to use the new-array execute interface because you
2457want to transform a known bunch of arrays of the same size, you should
probably use the advanced interface instead (*note Advanced
Interface::).
2460
2461   The new-array execute functions are:
2462
2463     void fftw_execute_dft(
2464          const fftw_plan p,
2465          fftw_complex *in, fftw_complex *out);
2466
2467     void fftw_execute_split_dft(
2468          const fftw_plan p,
2469          double *ri, double *ii, double *ro, double *io);
2470
2471     void fftw_execute_dft_r2c(
2472          const fftw_plan p,
2473          double *in, fftw_complex *out);
2474
2475     void fftw_execute_split_dft_r2c(
2476          const fftw_plan p,
2477          double *in, double *ro, double *io);
2478
2479     void fftw_execute_dft_c2r(
2480          const fftw_plan p,
2481          fftw_complex *in, double *out);
2482
2483     void fftw_execute_split_dft_c2r(
2484          const fftw_plan p,
2485          double *ri, double *ii, double *out);
2486
2487     void fftw_execute_r2r(
2488          const fftw_plan p,
2489          double *in, double *out);
2490
2491   These execute the 'plan' to compute the corresponding transform on
2492the input/output arrays specified by the subsequent arguments.  The
2493input/output array arguments have the same meanings as the ones passed
2494to the guru planner routines in the preceding sections.  The 'plan' is
2495not modified, and these routines can be called as many times as desired,
2496or intermixed with calls to the ordinary 'fftw_execute'.
2497
2498   The 'plan' _must_ have been created for the transform type
2499corresponding to the execute function, e.g.  it must be a complex-DFT
2500plan for 'fftw_execute_dft'.  Any of the planner routines for that
2501transform type, from the basic to the guru interface, could have been
2502used to create the plan, however.
2503
2504
2505File: fftw3.info,  Node: Wisdom,  Next: What FFTW Really Computes,  Prev: New-array Execute Functions,  Up: FFTW Reference
2506
25074.7 Wisdom
2508==========
2509
This section documents the FFTW mechanism for saving plans to disk and
restoring them later.  This mechanism is called "wisdom".
2512
2513* Menu:
2514
2515* Wisdom Export::
2516* Wisdom Import::
2517* Forgetting Wisdom::
2518* Wisdom Utilities::
2519
2520
2521File: fftw3.info,  Node: Wisdom Export,  Next: Wisdom Import,  Prev: Wisdom,  Up: Wisdom
2522
25234.7.1 Wisdom Export
2524-------------------
2525
2526     int fftw_export_wisdom_to_filename(const char *filename);
2527     void fftw_export_wisdom_to_file(FILE *output_file);
2528     char *fftw_export_wisdom_to_string(void);
2529     void fftw_export_wisdom(void (*write_char)(char c, void *), void *data);
2530
2531   These functions allow you to export all currently accumulated wisdom
2532in a form from which it can be later imported and restored, even during
2533a separate run of the program.  (*Note Words of Wisdom-Saving Plans::.)
2534The current store of wisdom is not affected by calling any of these
2535routines.
2536
2537   'fftw_export_wisdom' exports the wisdom to any output medium, as
2538specified by the callback function 'write_char'.  'write_char' is a
2539'putc'-like function that writes the character 'c' to some output; its
2540second parameter is the 'data' pointer passed to 'fftw_export_wisdom'.
2541For convenience, the following three "wrapper" routines are provided:
2542
2543   'fftw_export_wisdom_to_filename' writes wisdom to a file named
2544'filename' (which is created or overwritten), returning '1' on success
2545and '0' on failure.  A lower-level function, which requires you to open
2546and close the file yourself (e.g.  if you want to write wisdom to a
2547portion of a larger file) is 'fftw_export_wisdom_to_file'.  This writes
2548the wisdom to the current position in 'output_file', which should be
2549open with write permission; upon exit, the file remains open and is
2550positioned at the end of the wisdom data.
2551
   'fftw_export_wisdom_to_string' returns a pointer to a NUL-terminated
string holding the wisdom data.  This string is dynamically allocated,
and it is the responsibility of the caller to deallocate it with 'free'
when it is no longer needed.
2556
2557   All of these routines export the wisdom in the same format, which we
2558will not document here except to say that it is LISP-like ASCII text
2559that is insensitive to white space.
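
   As an illustration of the 'write_char' callback, the following
sketch collects exported wisdom in a growable in-memory buffer.  The
'wisdom_buffer' type and the 'buffer_write_char' function are
hypothetical names introduced for this example and are not part of
FFTW; the FFTW call itself appears only in a comment so that the
fragment compiles without the library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable text buffer; this type is not part of FFTW. */
typedef struct {
    char  *text;    /* NUL-terminated contents (NULL while empty) */
    size_t len;     /* number of characters stored, excluding the NUL */
    size_t cap;     /* bytes allocated for text */
    int    failed;  /* nonzero once an allocation has failed */
} wisdom_buffer;

/* putc-like callback in the shape expected by fftw_export_wisdom:
   the character to write, then the user-supplied data pointer. */
void buffer_write_char(char c, void *data)
{
    wisdom_buffer *buf = (wisdom_buffer *)data;
    assert(buf != NULL);
    if (buf->failed)
        return;
    if (buf->len + 2 > buf->cap) {          /* need room for c and the NUL */
        size_t new_cap = buf->cap ? 2 * buf->cap : 64;
        char *grown = realloc(buf->text, new_cap);
        if (grown == NULL) {
            buf->failed = 1;                /* keep the old contents intact */
            return;
        }
        buf->text = grown;
        buf->cap = new_cap;
    }
    buf->text[buf->len++] = c;
    buf->text[buf->len] = '\0';
}

/* Intended use (needs <fftw3.h> and linking against FFTW):
 *     wisdom_buffer buf = {0};
 *     fftw_export_wisdom(buffer_write_char, &buf);
 *     ... use buf.text if !buf.failed ...
 *     free(buf.text);
 */
```

   When a plain string is all that is needed,
'fftw_export_wisdom_to_string' already covers this case; a custom
callback is mainly of interest when the output goes somewhere else, such
as a network connection or a section of a larger container format.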
2560
2561
2562File: fftw3.info,  Node: Wisdom Import,  Next: Forgetting Wisdom,  Prev: Wisdom Export,  Up: Wisdom
2563
25644.7.2 Wisdom Import
2565-------------------
2566
2567     int fftw_import_system_wisdom(void);
     int fftw_import_wisdom_from_filename(const char *filename);
     int fftw_import_wisdom_from_file(FILE *input_file);
2569     int fftw_import_wisdom_from_string(const char *input_string);
2570     int fftw_import_wisdom(int (*read_char)(void *), void *data);
2571
2572   These functions import wisdom into a program from data stored by the
2573'fftw_export_wisdom' functions above.  (*Note Words of Wisdom-Saving
2574Plans::.)  The imported wisdom replaces any wisdom already accumulated
2575by the running program.
2576
2577   'fftw_import_wisdom' imports wisdom from any input medium, as
2578specified by the callback function 'read_char'.  'read_char' is a
2579'getc'-like function that returns the next character in the input; its
2580parameter is the 'data' pointer passed to 'fftw_import_wisdom'.  If the
2581end of the input data is reached (which should never happen for valid
2582data), 'read_char' should return 'EOF' (as defined in '<stdio.h>').  For
2583convenience, the following three "wrapper" routines are provided:
2584
2585   'fftw_import_wisdom_from_filename' reads wisdom from a file named
2586'filename'.  A lower-level function, which requires you to open and
2587close the file yourself (e.g.  if you want to read wisdom from a portion
2588of a larger file) is 'fftw_import_wisdom_from_file'.  This reads wisdom
2589from the current position in 'input_file' (which should be open with
2590read permission); upon exit, the file remains open, but the position of
2591the read pointer is unspecified.
2592
   'fftw_import_wisdom_from_string' reads wisdom from the NUL-terminated
string 'input_string'.
2595
   'fftw_import_system_wisdom' reads wisdom from an
implementation-defined standard file ('/etc/fftw/wisdom' on Unix and GNU
systems).
2599
2600   The return value of these import routines is '1' if the wisdom was
2601read successfully and '0' otherwise.  Note that, in all of these
2602functions, any data in the input stream past the end of the wisdom data
2603is simply ignored.
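
   A 'read_char' callback for 'fftw_import_wisdom' can be sketched along
the following lines, here reading from a block of memory of known size.
The 'wisdom_reader' type and the 'reader_read_char' function are
hypothetical names used only for this example; the FFTW call is shown in
a comment so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical cursor over an in-memory byte range; not part of FFTW. */
typedef struct {
    const char *bytes;  /* start of the stored wisdom data */
    size_t      size;   /* number of valid bytes */
    size_t      pos;    /* index of the next byte to return */
} wisdom_reader;

/* getc-like callback: returns the next character, or EOF once the
   data is exhausted.  The cast through unsigned char keeps bytes with
   the high bit set from being mistaken for EOF. */
int reader_read_char(void *data)
{
    wisdom_reader *r = (wisdom_reader *)data;
    assert(r != NULL);
    if (r->pos >= r->size)
        return EOF;
    return (unsigned char)r->bytes[r->pos++];
}

/* Intended use (needs <fftw3.h> and linking against FFTW):
 *     wisdom_reader r = { saved_bytes, saved_size, 0 };
 *     if (!fftw_import_wisdom(reader_read_char, &r))
 *         ... the data could not be read as wisdom ...
 */
```

   Unlike 'fftw_import_wisdom_from_string', this form does not require
the data to be NUL-terminated, which can be convenient when the wisdom
is embedded in a larger binary blob.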
2604
2605
2606File: fftw3.info,  Node: Forgetting Wisdom,  Next: Wisdom Utilities,  Prev: Wisdom Import,  Up: Wisdom
2607
26084.7.3 Forgetting Wisdom
2609-----------------------
2610
2611     void fftw_forget_wisdom(void);
2612
2613   Calling 'fftw_forget_wisdom' causes all accumulated 'wisdom' to be
2614discarded and its associated memory to be freed.  (New 'wisdom' can
2615still be gathered subsequently, however.)
2616
2617
2618File: fftw3.info,  Node: Wisdom Utilities,  Prev: Forgetting Wisdom,  Up: Wisdom
2619
26204.7.4 Wisdom Utilities
2621----------------------
2622
2623FFTW includes two standalone utility programs that deal with wisdom.  We
2624merely summarize them here, since they come with their own 'man' pages
2625for Unix and GNU systems (with HTML versions on our web site).
2626
2627   The first program is 'fftw-wisdom' (or 'fftwf-wisdom' in single
2628precision, etcetera), which can be used to create a wisdom file
2629containing plans for any of the transform sizes and types supported by
2630FFTW. It is preferable to create wisdom directly from your executable
2631(*note Caveats in Using Wisdom::), but this program is useful for
2632creating global wisdom files for 'fftw_import_system_wisdom'.
2633
2634   The second program is 'fftw-wisdom-to-conf', which takes a wisdom
2635file as input and produces a "configuration routine" as output.  The
2636latter is a C subroutine that you can compile and link into your
2637program, replacing a routine of the same name in the FFTW library, that
2638determines which parts of FFTW are callable by your program.
2639'fftw-wisdom-to-conf' produces a configuration routine that links to
2640only those parts of FFTW needed by the saved plans in the wisdom,
2641greatly reducing the size of statically linked executables (which should
2642only attempt to create plans corresponding to those in the wisdom,
2643however).
2644
2645
2646File: fftw3.info,  Node: What FFTW Really Computes,  Prev: Wisdom,  Up: FFTW Reference
2647
26484.8 What FFTW Really Computes
2649=============================
2650
2651In this section, we provide precise mathematical definitions for the
2652transforms that FFTW computes.  These transform definitions are fairly
2653standard, but some authors follow slightly different conventions for the
2654normalization of the transform (the constant factor in front) and the
2655sign of the complex exponent.  We begin by presenting the
2656one-dimensional (1d) transform definitions, and then give the
2657straightforward extension to multi-dimensional transforms.
2658
2659* Menu:
2660
2661* The 1d Discrete Fourier Transform (DFT)::
2662* The 1d Real-data DFT::
2663* 1d Real-even DFTs (DCTs)::
2664* 1d Real-odd DFTs (DSTs)::
2665* 1d Discrete Hartley Transforms (DHTs)::
2666* Multi-dimensional Transforms::
2667
2668
2669File: fftw3.info,  Node: The 1d Discrete Fourier Transform (DFT),  Next: The 1d Real-data DFT,  Prev: What FFTW Really Computes,  Up: What FFTW Really Computes
2670
26714.8.1 The 1d Discrete Fourier Transform (DFT)
2672---------------------------------------------
2673
2674The forward ('FFTW_FORWARD') discrete Fourier transform (DFT) of a 1d
2675complex array X of size n computes an array Y, where:
2676 Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
2677   The backward ('FFTW_BACKWARD') DFT computes:
2678 Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
2679
2680   FFTW computes an unnormalized transform, in that there is no
2681coefficient in front of the summation in the DFT. In other words,
2682applying the forward and then the backward transform will multiply the
2683input by n.
2684
2685   From above, an 'FFTW_FORWARD' transform corresponds to a sign of -1
2686in the exponent of the DFT. Note also that we use the standard
2687"in-order" output ordering--the k-th output corresponds to the frequency
2688k/n (or k/T, where T is your total sampling period).  For those who like
2689to think in terms of positive and negative frequencies, this means that
2690the positive frequencies are stored in the first half of the output and
2691the negative frequencies are stored in backwards order in the second
2692half of the output.  (The frequency -k/n is the same as the frequency
2693(n-k)/n.)
2694
2695
2696File: fftw3.info,  Node: The 1d Real-data DFT,  Next: 1d Real-even DFTs (DCTs),  Prev: The 1d Discrete Fourier Transform (DFT),  Up: What FFTW Really Computes
2697
26984.8.2 The 1d Real-data DFT
2699--------------------------
2700
2701The real-input (r2c) DFT in FFTW computes the _forward_ transform Y of
2702the size 'n' real array X, exactly as defined above, i.e.
2703 Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
2704   This output array Y can easily be shown to possess the "Hermitian"
2705symmetry Y[k] = Y[n-k]*, where we take Y to be periodic so that Y[n] =
2706Y[0].
2707
2708   As a result of this symmetry, half of the output Y is redundant
2709(being the complex conjugate of the other half), and so the 1d r2c
2710transforms only output elements 0...n/2 of Y (n/2+1 complex numbers),
2711where the division by 2 is rounded down.
2712
2713   Moreover, the Hermitian symmetry implies that Y[0] and, if n is even,
2714the Y[n/2] element, are purely real.  So, for the 'R2HC' r2r transform,
2715the halfcomplex format does not store the imaginary parts of these
2716elements.
2717
   The c2r and 'HC2R' r2r transforms compute the backward DFT of the
2719_complex_ array X with Hermitian symmetry, stored in the r2c/'R2HC'
2720output formats, respectively, where the backward transform is defined
2721exactly as for the complex case:
2722 Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
2723   The outputs 'Y' of this transform can easily be seen to be purely
2724real, and are stored as an array of real numbers.
2725
2726   Like FFTW's complex DFT, these transforms are unnormalized.  In other
2727words, applying the real-to-complex (forward) and then the
2728complex-to-real (backward) transform will multiply the input by n.
2729
2730
2731File: fftw3.info,  Node: 1d Real-even DFTs (DCTs),  Next: 1d Real-odd DFTs (DSTs),  Prev: The 1d Real-data DFT,  Up: What FFTW Really Computes
2732
27334.8.3 1d Real-even DFTs (DCTs)
2734------------------------------
2735
The real-even symmetry DFTs in FFTW are exactly equivalent to the
unnormalized forward (and backward) DFTs as defined above, where the
input array X of length N is purely real and also has "even" symmetry.
In this case, the output array is likewise real with even symmetry.
2740
2741   For the case of 'REDFT00', this even symmetry means that X[j] =
2742X[N-j], where we take X to be periodic so that X[N] = X[0].  Because of
2743this redundancy, only the first n real numbers are actually stored,
2744where N = 2(n-1).
2745
2746   The proper definition of even symmetry for 'REDFT10', 'REDFT01', and
2747'REDFT11' transforms is somewhat more intricate because of the shifts by
27481/2 of the input and/or output, although the corresponding boundary
2749conditions are given in *note Real even/odd DFTs (cosine/sine
2750transforms)::.  Because of the even symmetry, however, the sine terms in
2751the DFT all cancel and the remaining cosine terms are written explicitly
2752below.  This formulation often leads people to call such a transform a
2753"discrete cosine transform" (DCT), although it is really just a special
2754case of the DFT.
2755
2756   In each of the definitions below, we transform a real array X of
2757length n to a real array Y of length n:
2758
2759REDFT00 (DCT-I)
2760...............
2761
2762An 'REDFT00' transform (type-I DCT) in FFTW is defined by: Y[k] = X[0] +
2763(-1)^k X[n-1] + 2 (sum for j = 1 to n-2 of X[j] cos(pi jk /(n-1))).
2764Note that this transform is not defined for n=1.  For n=2, the summation
2765term above is dropped as you might expect.
2766
2767REDFT10 (DCT-II)
2768................
2769
2770An 'REDFT10' transform (type-II DCT, sometimes called "the" DCT) in FFTW
2771is defined by: Y[k] = 2 (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) k /
2772n)).
2773
2774REDFT01 (DCT-III)
2775.................
2776
2777An 'REDFT01' transform (type-III DCT) in FFTW is defined by: Y[k] = X[0]
2778+ 2 (sum for j = 1 to n-1 of X[j] cos(pi j (k+1/2) / n)).  In the case
2779of n=1, this reduces to Y[0] = X[0].  Up to a scale factor (see below),
2780this is the inverse of 'REDFT10' ("the" DCT), and so the 'REDFT01'
2781(DCT-III) is sometimes called the "IDCT".
2782
2783REDFT11 (DCT-IV)
2784................
2785
2786An 'REDFT11' transform (type-IV DCT) in FFTW is defined by: Y[k] = 2
2787(sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) (k+1/2) / n)).
2788
2789Inverses and Normalization
2790..........................
2791
2792These definitions correspond directly to the unnormalized DFTs used
2793elsewhere in FFTW (hence the factors of 2 in front of the summations).
2794The unnormalized inverse of 'REDFT00' is 'REDFT00', of 'REDFT10' is
2795'REDFT01' and vice versa, and of 'REDFT11' is 'REDFT11'.  Each
2796unnormalized inverse results in the original array multiplied by N,
2797where N is the _logical_ DFT size.  For 'REDFT00', N=2(n-1) (note that
2798n=1 is not defined); otherwise, N=2n.
2799
2800   In defining the discrete cosine transform, some authors also include
2801additional factors of sqrt(2) (or its inverse) multiplying selected
2802inputs and/or outputs.  This is a mostly cosmetic change that makes the
2803transform orthogonal, but sacrifices the direct equivalence to a
2804symmetric DFT.
2805
2806
2807File: fftw3.info,  Node: 1d Real-odd DFTs (DSTs),  Next: 1d Discrete Hartley Transforms (DHTs),  Prev: 1d Real-even DFTs (DCTs),  Up: What FFTW Really Computes
2808
28094.8.4 1d Real-odd DFTs (DSTs)
2810-----------------------------
2811
The real-odd symmetry DFTs in FFTW are exactly equivalent to the
unnormalized forward (and backward) DFTs as defined above, where the
input array X of length N is purely real and also has "odd" symmetry.
In this case, the output has odd symmetry and is purely imaginary.
2816
2817   For the case of 'RODFT00', this odd symmetry means that X[j] =
2818-X[N-j], where we take X to be periodic so that X[N] = X[0].  Because of
2819this redundancy, only the first n real numbers starting at j=1 are
2820actually stored (the j=0 element is zero), where N = 2(n+1).
2821
2822   The proper definition of odd symmetry for 'RODFT10', 'RODFT01', and
2823'RODFT11' transforms is somewhat more intricate because of the shifts by
28241/2 of the input and/or output, although the corresponding boundary
2825conditions are given in *note Real even/odd DFTs (cosine/sine
2826transforms)::.  Because of the odd symmetry, however, the cosine terms
2827in the DFT all cancel and the remaining sine terms are written
2828explicitly below.  This formulation often leads people to call such a
2829transform a "discrete sine transform" (DST), although it is really just
2830a special case of the DFT.
2831
2832   In each of the definitions below, we transform a real array X of
2833length n to a real array Y of length n:
2834
2835RODFT00 (DST-I)
2836...............
2837
2838An 'RODFT00' transform (type-I DST) in FFTW is defined by: Y[k] = 2 (sum
2839for j = 0 to n-1 of X[j] sin(pi (j+1)(k+1) / (n+1))).
2840
2841RODFT10 (DST-II)
2842................
2843
2844An 'RODFT10' transform (type-II DST) in FFTW is defined by: Y[k] = 2
2845(sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1) / n)).
2846
2847RODFT01 (DST-III)
2848.................
2849
2850An 'RODFT01' transform (type-III DST) in FFTW is defined by: Y[k] =
2851(-1)^k X[n-1] + 2 (sum for j = 0 to n-2 of X[j] sin(pi (j+1) (k+1/2) /
2852n)).  In the case of n=1, this reduces to Y[0] = X[0].
2853
2854RODFT11 (DST-IV)
2855................
2856
2857An 'RODFT11' transform (type-IV DST) in FFTW is defined by: Y[k] = 2
2858(sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1/2) / n)).
2859
2860Inverses and Normalization
2861..........................
2862
2863These definitions correspond directly to the unnormalized DFTs used
2864elsewhere in FFTW (hence the factors of 2 in front of the summations).
2865The unnormalized inverse of 'RODFT00' is 'RODFT00', of 'RODFT10' is
2866'RODFT01' and vice versa, and of 'RODFT11' is 'RODFT11'.  Each
2867unnormalized inverse results in the original array multiplied by N,
2868where N is the _logical_ DFT size.  For 'RODFT00', N=2(n+1); otherwise,
2869N=2n.
2870
2871   In defining the discrete sine transform, some authors also include
2872additional factors of sqrt(2) (or its inverse) multiplying selected
2873inputs and/or outputs.  This is a mostly cosmetic change that makes the
2874transform orthogonal, but sacrifices the direct equivalence to an
2875antisymmetric DFT.
2876
2877
2878File: fftw3.info,  Node: 1d Discrete Hartley Transforms (DHTs),  Next: Multi-dimensional Transforms,  Prev: 1d Real-odd DFTs (DSTs),  Up: What FFTW Really Computes
2879
28804.8.5 1d Discrete Hartley Transforms (DHTs)
2881-------------------------------------------
2882
2883The discrete Hartley transform (DHT) of a 1d real array X of size n
2884computes a real array Y of the same size, where:
2885Y[k] = sum for j = 0 to (n - 1) of X[j] * [cos(2 pi j k / n) + sin(2 pi j k / n)].
2886
2887   FFTW computes an unnormalized transform, in that there is no
2888coefficient in front of the summation in the DHT. In other words,
2889applying the transform twice (the DHT is its own inverse) will multiply
2890the input by n.
2891
2892
2893File: fftw3.info,  Node: Multi-dimensional Transforms,  Prev: 1d Discrete Hartley Transforms (DHTs),  Up: What FFTW Really Computes
2894
28954.8.6 Multi-dimensional Transforms
2896----------------------------------
2897
2898The multi-dimensional transforms of FFTW, in general, compute simply the
2899separable product of the given 1d transform along each dimension of the
2900array.  Since each of these transforms is unnormalized, computing the
2901forward followed by the backward/inverse multi-dimensional transform
2902will result in the original array scaled by the product of the
2903normalization factors for each dimension (e.g.  the product of the
2904dimension sizes, for a multi-dimensional DFT).
2905
2906   The definition of FFTW's multi-dimensional DFT of real data (r2c)
2907deserves special attention.  In this case, we logically compute the full
2908multi-dimensional DFT of the input data; since the input data are purely
2909real, the output data have the Hermitian symmetry and therefore only one
2910non-redundant half need be stored.  More specifically, for an n[0] x
2911n[1] x n[2] x ...  x n[d-1] multi-dimensional real-input DFT, the full
2912(logical) complex output array Y[k[0], k[1], ..., k[d-1]] has the
2913symmetry: Y[k[0], k[1], ..., k[d-1]] = Y[n[0] - k[0], n[1] - k[1], ...,
2914n[d-1] - k[d-1]]* (where each dimension is periodic).  Because of this
2915symmetry, we only store the k[d-1] = 0...n[d-1]/2 elements of the _last_
2916dimension (division by 2 is rounded down).  (We could instead have cut
2917any other dimension in half, but the last dimension proved
2918computationally convenient.)  This results in the peculiar array format
2919described in more detail by *note Real-data DFT Array Format::.
2920
   The multi-dimensional c2r transform is simply the unnormalized
inverse of the r2c transform, i.e.  it is the same as FFTW's complex
2923backward multi-dimensional DFT, operating on a Hermitian input array in
2924the peculiar format mentioned above and outputting a real array (since
2925the DFT output is purely real).
2926
2927   We should remind the user that the separable product of 1d transforms
2928along each dimension, as computed by FFTW, is not always the same thing
2929as the usual multi-dimensional transform.  A multi-dimensional 'R2HC'
2930(or 'HC2R') transform is not identical to the multi-dimensional DFT,
2931requiring some post-processing to combine the requisite real and
2932imaginary parts, as was described in *note The Halfcomplex-format DFT::.
2933Likewise, FFTW's multidimensional 'FFTW_DHT' r2r transform is not the
2934same thing as the logical multi-dimensional discrete Hartley transform
2935defined in the literature, as discussed in *note The Discrete Hartley
2936Transform::.
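
   Two quantities from this section are easy to get wrong in practice:
the number of complex values stored by a multi-dimensional r2c
transform, and the factor by which a forward/backward DFT pair scales
the data.  The following helper functions are a sketch of that
bookkeeping; their names are hypothetical and they are not part of
FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex outputs stored by an n[0] x ... x n[rank-1] r2c
   transform: every dimension is kept in full except the last one,
   which is cut to n[rank-1]/2 + 1 (division rounded down). */
size_t r2c_complex_count(int rank, const int *n)
{
    assert(rank >= 1 && n != NULL);
    size_t count = 1;
    for (int d = 0; d < rank - 1; ++d) {
        assert(n[d] > 0);
        count *= (size_t)n[d];
    }
    assert(n[rank - 1] > 0);
    return count * (size_t)(n[rank - 1] / 2 + 1);
}

/* Factor by which a forward followed by a backward multi-dimensional
   DFT (complex, or r2c followed by c2r) scales the data: the product
   of the logical dimension sizes. */
size_t dft_roundtrip_scale(int rank, const int *n)
{
    assert(rank >= 1 && n != NULL);
    size_t scale = 1;
    for (int d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        scale *= (size_t)n[d];
    }
    return scale;
}
```

   For a 4 x 6 real input, for instance, the r2c output holds 4 x 4 = 16
complex numbers, and a round trip through r2c and c2r multiplies the
data by 24.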
2937
2938
2939File: fftw3.info,  Node: Multi-threaded FFTW,  Next: Distributed-memory FFTW with MPI,  Prev: FFTW Reference,  Up: Top
2940
29415 Multi-threaded FFTW
2942*********************
2943
2944In this chapter we document the parallel FFTW routines for shared-memory
2945parallel hardware.  These routines, which support parallel one- and
2946multi-dimensional transforms of both real and complex data, are the
2947easiest way to take advantage of multiple processors with FFTW. They
2948work just like the corresponding uniprocessor transform routines, except
2949that you have an extra initialization routine to call, and there is a
2950routine to set the number of threads to employ.  Any program that uses
2951the uniprocessor FFTW can therefore be trivially modified to use the
2952multi-threaded FFTW.
2953
2954   A shared-memory machine is one in which all CPUs can directly access
2955the same main memory, and such machines are now common due to the
2956ubiquity of multi-core CPUs.  FFTW's multi-threading support allows you
2957to utilize these additional CPUs transparently from a single program.
However, this does not necessarily translate into performance
gains--when multiple threads/CPUs are employed, there is an overhead
required for synchronization that may outweigh the benefit of the
computational parallelism.  Therefore, you can only benefit from
threads if your problem is sufficiently large.
2963
2964* Menu:
2965
2966* Installation and Supported Hardware/Software::
2967* Usage of Multi-threaded FFTW::
2968* How Many Threads to Use?::
2969* Thread safety::
2970
2971
2972File: fftw3.info,  Node: Installation and Supported Hardware/Software,  Next: Usage of Multi-threaded FFTW,  Prev: Multi-threaded FFTW,  Up: Multi-threaded FFTW
2973
29745.1 Installation and Supported Hardware/Software
2975================================================
2976
2977All of the FFTW threads code is located in the 'threads' subdirectory of
2978the FFTW package.  On Unix systems, the FFTW threads libraries and
2979header files can be automatically configured, compiled, and installed
2980along with the uniprocessor FFTW libraries simply by including
2981'--enable-threads' in the flags to the 'configure' script (*note
2982Installation on Unix::), or '--enable-openmp' to use OpenMP
2983(http://www.openmp.org) threads.
2984
2985   The threads routines require your operating system to have some sort
2986of shared-memory threads support.  Specifically, the FFTW threads
2987package works with POSIX threads (available on most Unix variants, from
2988GNU/Linux to MacOS X) and Win32 threads.  OpenMP threads, which are
2989supported in many common compilers (e.g.  gcc) are also supported, and
2990may give better performance on some systems.  (OpenMP threads are also
2991useful if you are employing OpenMP in your own code, in order to
2992minimize conflicts between threading models.)  If you have a
2993shared-memory machine that uses a different threads API, it should be a
2994simple matter of programming to include support for it; see the file
2995'threads/threads.c' for more detail.
2996
2997   You can compile FFTW with _both_ '--enable-threads' and
2998'--enable-openmp' at the same time, since they install libraries with
2999different names ('fftw3_threads' and 'fftw3_omp', as described below).
3000However, your programs may only link to _one_ of these two libraries at
3001a time.
3002
3003   Ideally, of course, you should also have multiple processors in order
3004to get any benefit from the threaded transforms.
3005
3006
3007File: fftw3.info,  Node: Usage of Multi-threaded FFTW,  Next: How Many Threads to Use?,  Prev: Installation and Supported Hardware/Software,  Up: Multi-threaded FFTW
3008
30095.2 Usage of Multi-threaded FFTW
3010================================
3011
3012Here, it is assumed that the reader is already familiar with the usage
3013of the uniprocessor FFTW routines, described elsewhere in this manual.
3014We only describe what one has to change in order to use the
3015multi-threaded routines.
3016
   First, programs using the parallel transforms should be
3018linked with '-lfftw3_threads -lfftw3 -lm' on Unix, or '-lfftw3_omp
3019-lfftw3 -lm' if you compiled with OpenMP. You will also need to link
3020with whatever library is responsible for threads on your system (e.g.
3021'-lpthread' on GNU/Linux) or include whatever compiler flag enables
3022OpenMP (e.g.  '-fopenmp' with gcc).
3023
3024   Second, before calling _any_ FFTW routines, you should call the
3025function:
3026
3027     int fftw_init_threads(void);
3028
3029   This function, which need only be called once, performs any one-time
3030initialization required to use threads on your system.  It returns zero
3031if there was some error (which should not happen under normal
3032circumstances) and a non-zero value otherwise.
3033
3034   Third, before creating a plan that you want to parallelize, you
3035should call:
3036
3037     void fftw_plan_with_nthreads(int nthreads);
3038
3039   The 'nthreads' argument indicates the number of threads you want FFTW
3040to use (or actually, the maximum number).  All plans subsequently
3041created with any planner routine will use that many threads.  You can
3042call 'fftw_plan_with_nthreads', create some plans, call
3043'fftw_plan_with_nthreads' again with a different argument, and create
3044some more plans for a new number of threads.  Plans already created
3045before a call to 'fftw_plan_with_nthreads' are unaffected.  If you pass
3046an 'nthreads' argument of '1' (the default), threads are disabled for
3047subsequent plans.
3048
3049   You can determine the current number of threads that the planner can
3050use by calling:
3051
3052     int fftw_planner_nthreads(void);
3053
3054   With OpenMP, to configure FFTW to use all of the currently running
3055OpenMP threads (set by 'omp_set_num_threads(nthreads)' or by the
3056'OMP_NUM_THREADS' environment variable), you can do:
3057'fftw_plan_with_nthreads(omp_get_max_threads())'.  (The 'omp_' OpenMP
3058functions are declared via '#include <omp.h>'.)
3059
3060   Given a plan, you then execute it as usual with 'fftw_execute(plan)',
3061and the execution will use the number of threads specified when the plan
3062was created.  When done, you destroy it as usual with
3063'fftw_destroy_plan'.  As described in *note Thread safety::, plan
3064_execution_ is thread-safe, but plan creation and destruction are _not_:
3065you should create/destroy plans only from a single thread, but can
3066safely execute multiple plans in parallel.
3067
3068   There is one additional routine: if you want to get rid of all memory
3069and other resources allocated internally by FFTW, you can call:
3070
3071     void fftw_cleanup_threads(void);
3072
3073   which is much like the 'fftw_cleanup()' function except that it also
3074gets rid of threads-related data.  You must _not_ execute any previously
3075created plans after calling this function.
3076
3077   We should also mention one other restriction: if you save wisdom from
3078a program using the multi-threaded FFTW, that wisdom _cannot be used_ by
3079a program using only the single-threaded FFTW (i.e.  not calling
3080'fftw_init_threads').  *Note Words of Wisdom-Saving Plans::.
3081
   Finally, FFTW provides an optional callback interface that allows you
to replace its parallel threading backend at runtime:
3084
     void fftw_threads_set_callback(
         void (*parallel_loop)(void *(*work)(char *), char *jobdata,
                               size_t elsize, int njobs, void *data),
         void *data);
3088
3089   This routine (which is _not_ threadsafe and should generally be
3090called before creating any FFTW plans) allows you to provide a function
3091'parallel_loop' that executes parallel work for FFTW: it should call the
3092function 'work(jobdata + elsize*i)' for 'i' from '0' to 'njobs-1',
3093possibly in parallel.  (The 'data' pointer supplied to
3094'fftw_threads_set_callback' is passed through to your 'parallel_loop'
3095function.)  For example, if you link to an FFTW threads library built to
3096use POSIX threads, but you want it to use OpenMP instead (because you
3097are using OpenMP elsewhere in your program and want to avoid competing
3098threads), you can call 'fftw_threads_set_callback' with the callback
3099function:
3100
3101     void parallel_loop(void *(*work)(char *), char *jobdata, size_t elsize, int njobs, void *data)
3102     {
3103     #pragma omp parallel for
3104         for (int i = 0; i < njobs; ++i)
3105             work(jobdata + elsize * i);
3106     }
3107
3108   The same mechanism could be used in order to make FFTW use a
3109threading backend implemented via Intel TBB, Apple GCD, or Cilk, for
3110example.
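
   When debugging a custom backend, it can help to start from a callback
that simply runs the jobs one after another, which the description above
permits since the jobs are only "possibly" run in parallel.  The
following sketch shows such a serial stand-in together with a toy job
function for exercising it without FFTW; 'serial_parallel_loop' and
'demo_square_job' are hypothetical names, and the 'work' argument is
assumed to take a 'char *' as in the example callback above.

```c
#include <assert.h>
#include <stddef.h>

/* Serial stand-in for a threading backend: runs job i on the element
   at jobdata + elsize*i, for i = 0 .. njobs-1, in order. */
void serial_parallel_loop(void *(*work)(char *), char *jobdata,
                          size_t elsize, int njobs, void *data)
{
    (void)data;   /* user pointer given to the set-callback call; unused */
    assert(work != NULL && njobs >= 0);
    for (int i = 0; i < njobs; ++i)
        work(jobdata + elsize * (size_t)i);
}

/* Toy job for testing the loop by hand: squares one int in place.
   FFTW supplies its own work function and job records in real use. */
void *demo_square_job(char *job)
{
    int *value = (int *)job;
    *value *= *value;
    return NULL;
}

/* Intended use (needs <fftw3.h> and an FFTW threads library):
 *     fftw_threads_set_callback(serial_parallel_loop, NULL);
 */
```

   Once the serial version behaves as expected, the loop body can be
handed to the threading system of your choice without changing the
interface seen by FFTW.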
3111
3112
3113File: fftw3.info,  Node: How Many Threads to Use?,  Next: Thread safety,  Prev: Usage of Multi-threaded FFTW,  Up: Multi-threaded FFTW
3114
31155.3 How Many Threads to Use?
3116============================
3117
3118There is a fair amount of overhead involved in synchronizing threads, so
3119the optimal number of threads to use depends upon the size of the
3120transform as well as on the number of processors you have.
3121
3122   As a general rule, you don't want to use more threads than you have
3123processors.  (Using more threads will work, but there will be extra
3124overhead with no benefit.)  In fact, if the problem size is too small,
3125you may want to use fewer threads than you have processors.
3126
3127   You will have to experiment with your system to see what level of
3128parallelization is best for your problem size.  Typically, the problem
3129will have to involve at least a few thousand data points before threads
3130become beneficial.  If you plan with 'FFTW_PATIENT', it will
3131automatically disable threads for sizes that don't benefit from
3132parallelization.
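
   The rules of thumb above can be folded into a small helper that picks
the argument for 'fftw_plan_with_nthreads'.  This is only a sketch:
'clamp_nthreads' is a hypothetical function, and the size threshold is
left as a parameter because the right value has to be found by
experiment on your own system.

```c
#include <assert.h>
#include <stddef.h>

/* Choose a thread count for a transform of npoints data points:
   - below min_points (a threshold you determine by timing), use 1;
   - otherwise never exceed the number of processors. */
int clamp_nthreads(int requested, int nprocs,
                   size_t npoints, size_t min_points)
{
    assert(requested >= 1 && nprocs >= 1);
    if (npoints < min_points)
        return 1;
    return (requested < nprocs) ? requested : nprocs;
}

/* Intended use (needs <fftw3.h> and an FFTW threads library):
 *     fftw_plan_with_nthreads(clamp_nthreads(8, nprocs, n, threshold));
 */
```

   The number of processors itself has to come from your platform (for
example, 'omp_get_max_threads()' when using OpenMP); the helper
deliberately takes it as an argument rather than querying it.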
3133
3134
3135File: fftw3.info,  Node: Thread safety,  Prev: How Many Threads to Use?,  Up: Multi-threaded FFTW
3136
31375.4 Thread safety
3138=================
3139
3140Users writing multi-threaded programs (including OpenMP) must concern
3141themselves with the "thread safety" of the libraries they use--that is,
3142whether it is safe to call routines in parallel from multiple threads.
3143FFTW can be used in such an environment, but some care must be taken
3144because the planner routines share data (e.g.  wisdom and trigonometric
3145tables) between calls and plans.
3146
3147   The upshot is that the only thread-safe routine in FFTW is
3148'fftw_execute' (and the new-array variants thereof).  All other routines
3149(e.g.  the planner) should only be called from one thread at a time.
3150So, for example, you can wrap a semaphore lock around any calls to the
3151planner; even more simply, you can just create all of your plans from
3152one thread.  We do not think this should be an important restriction
3153(FFTW is designed for the situation where the only performance-sensitive
3154code is the actual execution of the transform), and the benefits of
3155shared data between plans are great.
3156
3157   Note also that, since the plan is not modified by 'fftw_execute', it
3158is safe to execute the _same plan_ in parallel by multiple threads.
3159However, since a given plan operates by default on a fixed array, you
3160need to use one of the new-array execute functions (*note New-array
3161Execute Functions::) so that different threads compute the transform of
3162different data.
3163
3164   (Users should note that these comments only apply to programs using
3165shared-memory threads or OpenMP. Parallelism using MPI or forked
3166processes involves a separate address-space and global variables for
3167each process, and is not susceptible to problems of this sort.)
3168
3169   The FFTW planner is intended to be called from a single thread.  If
3170you really must call it from multiple threads, you are expected to grab
3171whatever lock makes sense for your application, with the understanding
3172that you may be holding that lock for a long time, which is undesirable.
3173
3174   Neither strategy works, however, in the following situation.  The
3175"application" is structured as a set of "plugins" which are unaware of
3176each other, and for whatever reason the "plugins" cannot coordinate on
3177grabbing the lock.  (This is not a technical problem, but an
3178organizational one.  The "plugins" are written by independent agents,
3179and from the perspective of each plugin's author, each plugin is using
3180FFTW correctly from a single thread.)  To cope with this situation,
3181starting from FFTW-3.3.5, FFTW supports an API to make the planner
3182thread-safe:
3183
3184     void fftw_make_planner_thread_safe(void);
3185
3186   This call operates by brute force: It just installs a hook that wraps
3187a lock (chosen by us) around all planner calls.  So there is no magic
3188and you get the worst of all worlds.  The planner is still
3189single-threaded, but you cannot choose which lock to use.  The planner
3190still holds the lock for a long time, but you cannot impose a timeout on
3191lock acquisition.  As of FFTW-3.3.5 and FFTW-3.3.6, this call does not
3192work when using OpenMP as threading substrate.  (Suggestions on what to
3193do about this bug are welcome.)  _Do not use
3194'fftw_make_planner_thread_safe' unless there is no other choice,_ such
3195as in the application/plugin situation.
3196
3197
3198File: fftw3.info,  Node: Distributed-memory FFTW with MPI,  Next: Calling FFTW from Modern Fortran,  Prev: Multi-threaded FFTW,  Up: Top
3199
32006 Distributed-memory FFTW with MPI
3201**********************************
3202
3203In this chapter we document the parallel FFTW routines for parallel
3204systems supporting the MPI message-passing interface.  Unlike the
3205shared-memory threads described in the previous chapter, MPI allows you
3206to use _distributed-memory_ parallelism, where each CPU has its own
3207separate memory, and which can scale up to clusters of many thousands of
3208processors.  This capability comes at a price, however: each process
3209only stores a _portion_ of the data to be transformed, which means that
3210the data structures and programming-interface are quite different from
3211the serial or threads versions of FFTW.
3212
3213   Distributed-memory parallelism is especially useful when you are
3214transforming arrays so large that they do not fit into the memory of a
3215single processor.  The storage per-process required by FFTW's MPI
3216routines is proportional to the total array size divided by the number
3217of processes.  Conversely, distributed-memory parallelism can easily
3218pose an unacceptably high communications overhead for small problems;
3219the threshold problem size for which parallelism becomes advantageous
3220will depend on the precise problem you are interested in, your hardware,
3221and your MPI implementation.
3222
3223   A note on terminology: in MPI, you divide the data among a set of
3224"processes" which each run in their own memory address space.
3225Generally, each process runs on a different physical processor, but this
3226is not required.  A set of processes in MPI is described by an opaque
3227data structure called a "communicator," the most common of which is the
3228predefined communicator 'MPI_COMM_WORLD' which refers to _all_
3229processes.  For more information on these and other concepts common to
3230all MPI programs, we refer the reader to the documentation at the MPI
3231home page (http://www.mcs.anl.gov/research/projects/mpi/).
3232
3233   We assume in this chapter that the reader is familiar with the usage
3234of the serial (uniprocessor) FFTW, and focus only on the concepts new to
3235the MPI interface.
3236
3237* Menu:
3238
3239* FFTW MPI Installation::
3240* Linking and Initializing MPI FFTW::
3241* 2d MPI example::
3242* MPI Data Distribution::
3243* Multi-dimensional MPI DFTs of Real Data::
3244* Other Multi-dimensional Real-data MPI Transforms::
3245* FFTW MPI Transposes::
3246* FFTW MPI Wisdom::
3247* Avoiding MPI Deadlocks::
3248* FFTW MPI Performance Tips::
3249* Combining MPI and Threads::
3250* FFTW MPI Reference::
3251* FFTW MPI Fortran Interface::
3252
3253
3254File: fftw3.info,  Node: FFTW MPI Installation,  Next: Linking and Initializing MPI FFTW,  Prev: Distributed-memory FFTW with MPI,  Up: Distributed-memory FFTW with MPI
3255
32566.1 FFTW MPI Installation
3257=========================
3258
3259All of the FFTW MPI code is located in the 'mpi' subdirectory of the
3260FFTW package.  On Unix systems, the FFTW MPI libraries and header files
3261are automatically configured, compiled, and installed along with the
3262uniprocessor FFTW libraries simply by including '--enable-mpi' in the
3263flags to the 'configure' script (*note Installation on Unix::).
3264
3265   Any implementation of the MPI standard, version 1 or later, should
3266work with FFTW. The 'configure' script will attempt to automatically
3267detect how to compile and link code using your MPI implementation.  In
3268some cases, especially if you have multiple different MPI
3269implementations installed or have an unusual MPI software package, you
3270may need to provide this information explicitly.
3271
3272   Most commonly, one compiles MPI code by invoking a special compiler
3273command, typically 'mpicc' for C code.  The 'configure' script knows the
3274most common names for this command, but you can specify the MPI
3275compilation command explicitly by setting the 'MPICC' variable, as in
3276'./configure MPICC=mpicc ...'.
3277
3278   If, instead of a special compiler command, you need to link a certain
3279library, you can specify the link command via the 'MPILIBS' variable, as
3280in './configure MPILIBS=-lmpi ...'.  Note that if your MPI library is
3281installed in a non-standard location (one the compiler does not know
3282about by default), you may also have to specify the location of the
3283library and header files via 'LDFLAGS' and 'CPPFLAGS' variables,
3284respectively, as in './configure LDFLAGS=-L/path/to/mpi/libs
3285CPPFLAGS=-I/path/to/mpi/include ...'.
3286
3287
3288File: fftw3.info,  Node: Linking and Initializing MPI FFTW,  Next: 2d MPI example,  Prev: FFTW MPI Installation,  Up: Distributed-memory FFTW with MPI
3289
32906.2 Linking and Initializing MPI FFTW
3291=====================================
3292
3293Programs using the MPI FFTW routines should be linked with '-lfftw3_mpi
3294-lfftw3 -lm' on Unix in double precision, '-lfftw3f_mpi -lfftw3f -lm' in
3295single precision, and so on (*note Precision::).  You will also need to
3296link with whatever library is responsible for MPI on your system; in
3297most MPI implementations, there is a special compiler alias named
3298'mpicc' to compile and link MPI code.
3299
3300   Before calling any FFTW routines except possibly 'fftw_init_threads'
3301(*note Combining MPI and Threads::), but after calling 'MPI_Init', you
3302should call the function:
3303
3304     void fftw_mpi_init(void);
3305
3306   If, at the end of your program, you want to get rid of all memory and
3307other resources allocated internally by FFTW, for both the serial and
3308MPI routines, you can call:
3309
3310     void fftw_mpi_cleanup(void);
3311
3312   which is much like the 'fftw_cleanup()' function except that it also
3313gets rid of FFTW's MPI-related data.  You must _not_ execute any
3314previously created plans after calling this function.
3315
3316
3317File: fftw3.info,  Node: 2d MPI example,  Next: MPI Data Distribution,  Prev: Linking and Initializing MPI FFTW,  Up: Distributed-memory FFTW with MPI
3318
33196.3 2d MPI example
3320==================
3321
3322Before we document the FFTW MPI interface in detail, we begin with a
3323simple example outlining how one would perform a two-dimensional 'N0' by
3324'N1' complex DFT.
3325
3326     #include <fftw3-mpi.h>
3327
3328     int main(int argc, char **argv)
3329     {
3330         const ptrdiff_t N0 = ..., N1 = ...;
3331         fftw_plan plan;
3332         fftw_complex *data;
3333         ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
3334
3335         MPI_Init(&argc, &argv);
3336         fftw_mpi_init();
3337
3338         /* get local data size and allocate */
3339         alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
3340                                              &local_n0, &local_0_start);
3341         data = fftw_alloc_complex(alloc_local);
3342
3343         /* create plan for in-place forward DFT */
3344         plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
3345                                     FFTW_FORWARD, FFTW_ESTIMATE);
3346
3347         /* initialize data to some function my_function(x,y) */
3348         for (i = 0; i < local_n0; ++i) for (j = 0; j < N1; ++j)
3349            data[i*N1 + j] = my_function(local_0_start + i, j);
3350
3351         /* compute transforms, in-place, as many times as desired */
3352         fftw_execute(plan);
3353
3354         fftw_destroy_plan(plan);
3355
3356         MPI_Finalize();
3357     }
3358
3359   As can be seen above, the MPI interface follows the same basic style
3360of allocate/plan/execute/destroy as the serial FFTW routines.  All of
3361the MPI-specific routines are prefixed with 'fftw_mpi_' instead of
3362'fftw_'.  There are a few important differences, however:
3363
3364   First, we must call 'fftw_mpi_init()' after calling 'MPI_Init'
3365(required in all MPI programs) and before calling any other 'fftw_mpi_'
3366routine.
3367
3368   Second, when we create the plan with 'fftw_mpi_plan_dft_2d',
3369analogous to 'fftw_plan_dft_2d', we pass an additional argument: the
3370communicator, indicating which processes will participate in the
3371transform (here 'MPI_COMM_WORLD', indicating all processes).  Whenever
3372you create, execute, or destroy a plan for an MPI transform, you must
3373call the corresponding FFTW routine on _all_ processes in the
3374communicator for that transform.  (That is, these are _collective_
3375calls.)  Note that the plan for the MPI transform uses the standard
'fftw_execute' and 'fftw_destroy_plan' routines (on the other hand,
there are MPI-specific new-array execute functions documented below).
3378
3379   Third, all of the FFTW MPI routines take 'ptrdiff_t' arguments
3380instead of 'int' as for the serial FFTW. 'ptrdiff_t' is a standard C
3381integer type which is (at least) 32 bits wide on a 32-bit machine and 64
3382bits wide on a 64-bit machine.  This is to make it easy to specify very
3383large parallel transforms on a 64-bit machine.  (You can specify 64-bit
3384transform sizes in the serial FFTW, too, but only by using the 'guru64'
3385planner interface.  *Note 64-bit Guru Interface::.)
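
   To illustrate why 'int' is too narrow for large parallel transforms,
the sketch below tests whether the element count of an 'n0' by 'n1'
array still fits in an 'int'.  The helper name 'element_count_fits_int'
is hypothetical, and the sample values in the comments assume the usual
32-bit 'int'.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return 1 if n0 * n1 fits in an 'int', 0 otherwise.  The check divides
   instead of multiplying, so it cannot overflow itself. */
int element_count_fits_int(ptrdiff_t n0, ptrdiff_t n1)
{
    assert(n0 > 0 && n1 > 0);
    return n0 <= (ptrdiff_t)INT_MAX / n1;
}
```

   With a 32-bit 'int', a 1000 x 1000 array passes this check, while a
65536 x 65536 array (2^32 elements) does not.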
3386
3387   Fourth, and most importantly, you don't allocate the entire
3388two-dimensional array on each process.  Instead, you call
'fftw_mpi_local_size_2d' to find out what _portion_ of the array resides
on each process, and how much space to allocate.  Here, the portion of
3391the array on each process is a 'local_n0' by 'N1' slice of the total
3392array, starting at index 'local_0_start'.  The total number of
3393'fftw_complex' numbers to allocate is given by the 'alloc_local' return
3394value, which _may_ be greater than 'local_n0 * N1' (in case some
3395intermediate calculations require additional storage).  The data
3396distribution in FFTW's MPI interface is described in more detail by the
3397next section.
3398
3399   Given the portion of the array that resides on the local process, it
3400is straightforward to initialize the data (here to a function
'my_function') and otherwise manipulate it.  Of course, at the end of the
3402program you may want to output the data somehow, but synchronizing this
3403output is up to you and is beyond the scope of this manual.  (One good
3404way to output a large multi-dimensional distributed array in MPI to a
3405portable binary file is to use the free HDF5 library; see the HDF home
3406page (http://www.hdfgroup.org/).)
3407
3408
3409File: fftw3.info,  Node: MPI Data Distribution,  Next: Multi-dimensional MPI DFTs of Real Data,  Prev: 2d MPI example,  Up: Distributed-memory FFTW with MPI
3410
34116.4 MPI Data Distribution
3412=========================
3413
3414The most important concept to understand in using FFTW's MPI interface
3415is the data distribution.  With a serial or multithreaded FFT, all of
3416the inputs and outputs are stored as a single contiguous chunk of
3417memory.  With a distributed-memory FFT, the inputs and outputs are
3418broken into disjoint blocks, one per process.
3419
3420   In particular, FFTW uses a _1d block distribution_ of the data,
3421distributed along the _first dimension_.  For example, if you want to
3422perform a 100 x 200 complex DFT, distributed over 4 processes, each
3423process will get a 25 x 200 slice of the data.  That is, process 0 will
3424get rows 0 through 24, process 1 will get rows 25 through 49, process 2
3425will get rows 50 through 74, and process 3 will get rows 75 through 99.
3426If you take the same array but distribute it over 3 processes, then it
3427is not evenly divisible so the different processes will have unequal
3428chunks.  FFTW's default choice in this case is to assign 34 rows to
3429processes 0 and 1, and 32 rows to process 2.
3430
3431   FFTW provides several 'fftw_mpi_local_size' routines that you can
3432call to find out what portion of an array is stored on the current
3433process.  In most cases, you should use the default block sizes picked
3434by FFTW, but it is also possible to specify your own block size.  For
3435example, with a 100 x 200 array on three processes, you can tell FFTW to
3436use a block size of 40, which would assign 40 rows to processes 0 and 1,
3437and 20 rows to process 2.  FFTW's default is to divide the data equally
3438among the processes if possible, and as best it can otherwise.  The rows
3439are always assigned in "rank order," i.e.  process 0 gets the first
3440block of rows, then process 1, and so on.  (You can change this by using
3441'MPI_Comm_split' to create a new communicator with re-ordered
3442processes.)  However, you should always call the 'fftw_mpi_local_size'
3443routines, if possible, rather than trying to predict FFTW's distribution
3444choices.
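
   For reasoning about the layout (for example in a test harness), the
rank-order block rule can be sketched in plain C as below.  These
helpers are hypothetical and do not call FFTW; in real code, take
'local_n0' and 'local_0_start' from the 'fftw_mpi_local_size' routines
as recommended above.

```c
#include <assert.h>
#include <stddef.h>

/* Default block size described in the text: n0 / P, rounded up. */
ptrdiff_t default_block(ptrdiff_t n0, int nprocs)
{
    assert(n0 > 0 && nprocs > 0);
    return (n0 + nprocs - 1) / nprocs;
}

/* Rank that holds global row 'row' when blocks of 'block' rows are
   handed out in rank order. */
int row_owner(ptrdiff_t row, ptrdiff_t block)
{
    assert(row >= 0 && block > 0);
    return (int)(row / block);
}

/* Position of global row 'row' inside its owner's local slice. */
ptrdiff_t row_local_index(ptrdiff_t row, ptrdiff_t block)
{
    assert(row >= 0 && block > 0);
    return row % block;
}
```

   For the 100-row array on 4 processes, 'default_block' gives 25, so
row 25 is the first row held by process 1.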
3445
3446   In particular, it is critical that you allocate the storage size that
3447is returned by 'fftw_mpi_local_size', which is _not_ necessarily the
3448size of the local slice of the array.  The reason is that intermediate
3449steps of FFTW's algorithms involve transposing the array and
3450redistributing the data, so at these intermediate steps FFTW may require
3451more local storage space (albeit always proportional to the total size
3452divided by the number of processes).  The 'fftw_mpi_local_size'
3453functions know how much storage is required for these intermediate steps
3454and tell you the correct amount to allocate.
3455
3456* Menu:
3457
3458* Basic and advanced distribution interfaces::
3459* Load balancing::
3460* Transposed distributions::
3461* One-dimensional distributions::
3462
3463
3464File: fftw3.info,  Node: Basic and advanced distribution interfaces,  Next: Load balancing,  Prev: MPI Data Distribution,  Up: MPI Data Distribution
3465
34666.4.1 Basic and advanced distribution interfaces
3467------------------------------------------------
3468
3469As with the planner interface, the 'fftw_mpi_local_size' distribution
3470interface is broken into basic and advanced ('_many') interfaces, where
3471the latter allows you to specify the block size manually and also to
3472request block sizes when computing multiple transforms simultaneously.
3473These functions are documented more exhaustively by the FFTW MPI
3474Reference, but we summarize the basic ideas here using a couple of
3475two-dimensional examples.
3476
3477   For the 100 x 200 complex-DFT example, above, we would find the
3478distribution by calling the following function in the basic interface:
3479
3480     ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
3481                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
3482
3483   Given the total size of the data to be transformed (here, 'n0 = 100'
3484and 'n1 = 200') and an MPI communicator ('comm'), this function provides
3485three numbers.
3486
3487   First, it describes the shape of the local data: the current process
3488should store a 'local_n0' by 'n1' slice of the overall dataset, in
3489row-major order ('n1' dimension contiguous), starting at index
3490'local_0_start'.  That is, if the total dataset is viewed as a 'n0' by
3491'n1' matrix, the current process should store the rows 'local_0_start'
3492to 'local_0_start+local_n0-1'.  Obviously, if you are running with only
3493a single MPI process, that process will store the entire array:
3494'local_0_start' will be zero and 'local_n0' will be 'n0'.  *Note
3495Row-major Format::.
3496
3497   Second, the return value is the total number of data elements (e.g.,
3498complex numbers for a complex DFT) that should be allocated for the
3499input and output arrays on the current process (ideally with
3500'fftw_malloc' or an 'fftw_alloc' function, to ensure optimal alignment).
3501It might seem that this should always be equal to 'local_n0 * n1', but
3502this is _not_ the case.  FFTW's distributed FFT algorithms require data
3503redistributions at intermediate stages of the transform, and in some
3504circumstances this may require slightly larger local storage.  This is
3505discussed in more detail below, under *note Load balancing::.
3506
3507   The advanced-interface 'local_size' function for multidimensional
3508transforms returns the same three things ('local_n0', 'local_0_start',
3509and the total number of elements to allocate), but takes more inputs:
3510
3511     ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n,
3512                                        ptrdiff_t howmany,
3513                                        ptrdiff_t block0,
3514                                        MPI_Comm comm,
3515                                        ptrdiff_t *local_n0,
3516                                        ptrdiff_t *local_0_start);
3517
3518   The two-dimensional case above corresponds to 'rnk = 2' and an array
3519'n' of length 2 with 'n[0] = n0' and 'n[1] = n1'.  This routine is for
3520any 'rnk > 1'; one-dimensional transforms have their own interface
3521because they work slightly differently, as discussed below.
3522
3523   First, the advanced interface allows you to perform multiple
3524transforms at once, of interleaved data, as specified by the 'howmany'
parameter.  ('howmany' is 1 for a single transform.)
3526
3527   Second, here you can specify your desired block size in the 'n0'
3528dimension, 'block0'.  To use FFTW's default block size, pass
3529'FFTW_MPI_DEFAULT_BLOCK' (0) for 'block0'.  Otherwise, on 'P' processes,
FFTW will return 'local_n0' equal to 'block0' on the first 'n0 / block0'
processes (rounded down), 'local_n0' equal to 'n0 - block0 * (n0 /
block0)' on the next process, and 'local_n0' equal to zero on any
3533remaining processes.  In general, we recommend using the default block
3534size (which corresponds to 'n0 / P', rounded up).
3535
3536   For example, suppose you have 'P = 4' processes and 'n0 = 21'.  The
3537default will be a block size of '6', which will give 'local_n0 = 6' on
3538the first three processes and 'local_n0 = 3' on the last process.
3539Instead, however, you could specify 'block0 = 5' if you wanted, which
3540would give 'local_n0 = 5' on processes 0 to 2, 'local_n0 = 6' on process
35413.  (This choice, while it may look superficially more "balanced," has
3542the same critical path as FFTW's default but requires more
3543communications.)
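
   The block rule (full blocks of 'block0' rows in rank order, then one
partial block, then empty processes) can be sketched as follows.  The
function 'rows_on_rank' is a hypothetical helper that only mirrors the
rule as described here; it does not call FFTW, and it assumes that
'block0' times the number of processes is at least 'n0'.  The
authoritative values are those returned by 'fftw_mpi_local_size_many'.

```c
#include <assert.h>
#include <stddef.h>

/* Rows assigned to process 'rank' out of 'nprocs' for a first dimension
   of n0 rows.  block0 == 0 selects the default block size, n0 / nprocs
   rounded up.  Assumption of this sketch: block0 * nprocs >= n0, so
   that every row has an owner. */
ptrdiff_t rows_on_rank(ptrdiff_t n0, ptrdiff_t block0, int nprocs, int rank)
{
    ptrdiff_t start, left;

    assert(n0 > 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    if (block0 == 0)
        block0 = (n0 + nprocs - 1) / nprocs;
    assert(block0 > 0 && block0 * nprocs >= n0);

    start = block0 * rank;      /* first row this rank would own */
    left = n0 - start;          /* rows not taken by lower ranks */
    if (left <= 0)
        return 0;               /* nothing remains for this rank */
    return left < block0 ? left : block0;
}
```

   For 'n0 = 21' on 4 processes with the default block size, this gives
6 rows on ranks 0 to 2 and 3 rows on rank 3.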
3544
3545
3546File: fftw3.info,  Node: Load balancing,  Next: Transposed distributions,  Prev: Basic and advanced distribution interfaces,  Up: MPI Data Distribution
3547
35486.4.2 Load balancing
3549--------------------
3550
3551Ideally, when you parallelize a transform over some P processes, each
3552process should end up with work that takes equal time.  Otherwise, all
3553of the processes end up waiting on whichever process is slowest.  This
3554goal is known as "load balancing."  In this section, we describe the
3555circumstances under which FFTW is able to load-balance well, and in
3556particular how you should choose your transform size in order to load
3557balance.
3558
3559   Load balancing is especially difficult when you are parallelizing
over heterogeneous machines; for example, if one of your processors is
an old 486 and another is a Pentium IV, obviously you should give the
3562Pentium more work to do than the 486 since the latter is much slower.
3563FFTW does not deal with this problem, however--it assumes that your
3564processes run on hardware of comparable speed, and that the goal is
3565therefore to divide the problem as equally as possible.
3566
3567   For a multi-dimensional complex DFT, FFTW can divide the problem
3568equally among the processes if: (i) the _first_ dimension 'n0' is
3569divisible by P; and (ii), the _product_ of the subsequent dimensions is
3570divisible by P. (For the advanced interface, where you can specify
3571multiple simultaneous transforms via some "vector" length 'howmany', a
3572factor of 'howmany' is included in the product of the subsequent
3573dimensions.)
3574
3575   For a one-dimensional complex DFT, the length 'N' of the data should
3576be divisible by P _squared_ to be able to divide the problem equally
3577among the processes.
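
   The two divisibility conditions above can be checked before choosing
a transform size.  The sketch below is plain C with hypothetical helper
names; it encodes only the conditions stated in this section and does
not query FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Multi-dimensional complex DFT: balanced when n[0] is divisible by P
   and howmany * n[1] * ... * n[rnk-1] is divisible by P.  The product
   is reduced modulo P at every step so that it cannot overflow. */
int multidim_balanced(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                      int nprocs)
{
    ptrdiff_t rest;
    int i;

    assert(rnk > 1 && n != NULL && howmany > 0 && nprocs > 0);
    rest = howmany % nprocs;
    for (i = 1; i < rnk; ++i)
        rest = (rest * (n[i] % nprocs)) % nprocs;
    return n[0] % nprocs == 0 && rest == 0;
}

/* One-dimensional complex DFT: balanced when n is divisible by P*P. */
int onedim_balanced(ptrdiff_t n, int nprocs)
{
    assert(n > 0 && nprocs > 0);
    return n % ((ptrdiff_t)nprocs * nprocs) == 0;
}
```

   For instance, a 100 x 200 transform satisfies both multi-dimensional
conditions on 4 processes but not on 3, since 100 is not divisible by 3.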
3578
3579
3580File: fftw3.info,  Node: Transposed distributions,  Next: One-dimensional distributions,  Prev: Load balancing,  Up: MPI Data Distribution
3581
35826.4.3 Transposed distributions
3583------------------------------
3584
3585Internally, FFTW's MPI transform algorithms work by first computing
3586transforms of the data local to each process, then by globally
3587_transposing_ the data in some fashion to redistribute the data among
3588the processes, transforming the new data local to each process, and
3589transposing back.  For example, a two-dimensional 'n0' by 'n1' array,
distributed across the 'n0' dimension, is transformed by: (i)
transforming the 'n1' dimension, which is local to each process; (ii)
3592transposing to an 'n1' by 'n0' array, distributed across the 'n1'
3593dimension; (iii) transforming the 'n0' dimension, which is now local to
3594each process; (iv) transposing back.
3595
3596   However, in many applications it is acceptable to compute a
3597multidimensional DFT whose results are produced in transposed order
3598(e.g., 'n1' by 'n0' in two dimensions).  This provides a significant
3599performance advantage, because it means that the final transposition
3600step can be omitted.  FFTW supports this optimization, which you specify
3601by passing the flag 'FFTW_MPI_TRANSPOSED_OUT' to the planner routines.
3602To compute the inverse transform of transposed output, you specify
3603'FFTW_MPI_TRANSPOSED_IN' to tell it that the input is transposed.  In
3604this section, we explain how to interpret the output format of such a
3605transform.
3606
   Suppose you are transforming multi-dimensional data with (at
3608least two) dimensions n[0] x n[1] x n[2] x ...  x n[d-1] .  As always,
3609it is distributed along the first dimension n[0] .  Now, if we compute
3610its DFT with the 'FFTW_MPI_TRANSPOSED_OUT' flag, the resulting output
3611data are stored with the first _two_ dimensions transposed: n[1] x n[0]
3612x n[2] x ...  x n[d-1] , distributed along the n[1] dimension.
3613Conversely, if we take the n[1] x n[0] x n[2] x ...  x n[d-1] data and
3614transform it with the 'FFTW_MPI_TRANSPOSED_IN' flag, then the format
3615goes back to the original n[0] x n[1] x n[2] x ...  x n[d-1] array.
3616
3617   There are two ways to find the portion of the transposed array that
3618resides on the current process.  First, you can simply call the
3619appropriate 'local_size' function, passing n[1] x n[0] x n[2] x ...  x
3620n[d-1] (the transposed dimensions).  This would mean calling the
3621'local_size' function twice, once for the transposed and once for the
3622non-transposed dimensions.  Alternatively, you can call one of the
'local_size_transposed' functions, which return both the non-transposed
3624and transposed data distribution from a single call.  For example, for a
36253d transform with transposed output (or input), you might call:
3626
3627     ptrdiff_t fftw_mpi_local_size_3d_transposed(
3628                     ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
3629                     ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
3630                     ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
3631
3632   Here, 'local_n0' and 'local_0_start' give the size and starting index
3633of the 'n0' dimension for the _non_-transposed data, as in the previous
3634sections.  For _transposed_ data (e.g.  the output for
3635'FFTW_MPI_TRANSPOSED_OUT'), 'local_n1' and 'local_1_start' give the size
3636and starting index of the 'n1' dimension, which is the first dimension
3637of the transposed data ('n1' by 'n0' by 'n2').
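
   To make the two layouts concrete, the sketch below computes where the
element with global indices '(i0, i1, i2)' sits inside the local slab in
each case.  The helper names are hypothetical; the layouts assumed are
the row-major 'local_n0' by 'n1' by 'n2' slab for non-transposed data
and the row-major 'local_n1' by 'n0' by 'n2' slab for transposed data,
as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, within the local slab, of global element (i0, i1, i2) when
   the data are in the ordinary layout: a local_n0 x n1 x n2 row-major
   slab starting at row local_0_start of the n0 dimension. */
ptrdiff_t offset_normal(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                        ptrdiff_t n1, ptrdiff_t n2,
                        ptrdiff_t local_n0, ptrdiff_t local_0_start)
{
    assert(i0 >= local_0_start && i0 < local_0_start + local_n0);
    return ((i0 - local_0_start) * n1 + i1) * n2 + i2;
}

/* The same for the transposed layout: a local_n1 x n0 x n2 row-major
   slab starting at index local_1_start of the n1 dimension. */
ptrdiff_t offset_transposed(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                            ptrdiff_t n0, ptrdiff_t n2,
                            ptrdiff_t local_n1, ptrdiff_t local_1_start)
{
    assert(i1 >= local_1_start && i1 < local_1_start + local_n1);
    return ((i1 - local_1_start) * n0 + i0) * n2 + i2;
}
```

   Note that an element generally lives on different processes in the
two layouts, because the first is distributed by 'i0' and the second by
'i1'.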
3638
3639   (Note that 'FFTW_MPI_TRANSPOSED_IN' is completely equivalent to
3640performing 'FFTW_MPI_TRANSPOSED_OUT' and passing the first two
3641dimensions to the planner in reverse order, or vice versa.  If you pass
3642_both_ the 'FFTW_MPI_TRANSPOSED_IN' and 'FFTW_MPI_TRANSPOSED_OUT' flags,
3643it is equivalent to swapping the first two dimensions passed to the
3644planner and passing _neither_ flag.)
3645
3646
3647File: fftw3.info,  Node: One-dimensional distributions,  Prev: Transposed distributions,  Up: MPI Data Distribution
3648
36496.4.4 One-dimensional distributions
3650-----------------------------------
3651
3652For one-dimensional distributed DFTs using FFTW, matters are slightly
3653more complicated because the data distribution is more closely tied to
3654how the algorithm works.  In particular, you can no longer pass an
3655arbitrary block size and must accept FFTW's default; also, the block
sizes may be different for input and output.  Furthermore, the data
distribution depends on the flags and transform direction, in order for
forward and backward transforms to work correctly.
3659
3660     ptrdiff_t fftw_mpi_local_size_1d(ptrdiff_t n0, MPI_Comm comm,
3661                     int sign, unsigned flags,
3662                     ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
3663                     ptrdiff_t *local_no, ptrdiff_t *local_o_start);
3664
3665   This function computes the data distribution for a 1d transform of
3666size 'n0' with the given transform 'sign' and 'flags'.  Both input and
3667output data use block distributions.  The input on the current process
3668will consist of 'local_ni' numbers starting at index 'local_i_start';
3669e.g.  if only a single process is used, then 'local_ni' will be 'n0' and
3670'local_i_start' will be '0'.  Similarly for the output, with 'local_no'
3671numbers starting at index 'local_o_start'.  The return value of
3672'fftw_mpi_local_size_1d' will be the total number of elements to
3673allocate on the current process (which might be slightly larger than the
3674local size due to intermediate steps in the algorithm).
3675
3676   As mentioned above (*note Load balancing::), the data will be divided
3677equally among the processes if 'n0' is divisible by the _square_ of the
3678number of processes.  In this case, 'local_ni' will equal 'local_no'.
3679Otherwise, they may be different.
3680
3681   For some applications, such as convolutions, the order of the output
3682data is irrelevant.  In this case, performance can be improved by
3683specifying that the output data be stored in an FFTW-defined "scrambled"
3684format.  (In particular, this is the analogue of transposed output in
3685the multidimensional case: scrambled output saves a communications
3686step.)  If you pass 'FFTW_MPI_SCRAMBLED_OUT' in the flags, then the
3687output is stored in this (undocumented) scrambled order.  Conversely, to
3688perform the inverse transform of data in scrambled order, pass the
3689'FFTW_MPI_SCRAMBLED_IN' flag.
3690
3691   In MPI FFTW, only composite sizes 'n0' can be parallelized; we have
3692not yet implemented a parallel algorithm for large prime sizes.
3693
3694
3695File: fftw3.info,  Node: Multi-dimensional MPI DFTs of Real Data,  Next: Other Multi-dimensional Real-data MPI Transforms,  Prev: MPI Data Distribution,  Up: Distributed-memory FFTW with MPI
3696
36976.5 Multi-dimensional MPI DFTs of Real Data
3698===========================================
3699
3700FFTW's MPI interface also supports multi-dimensional DFTs of real data,
3701similar to the serial r2c and c2r interfaces.  (Parallel one-dimensional
3702real-data DFTs are not currently supported; you must use a complex
3703transform and set the imaginary parts of the inputs to zero.)
3704
3705   The key points to understand for r2c and c2r MPI transforms (compared
3706to the MPI complex DFTs or the serial r2c/c2r transforms), are:
3707
3708   * Just as for serial transforms, r2c/c2r DFTs transform n[0] x n[1] x
3709     n[2] x ...  x n[d-1] real data to/from n[0] x n[1] x n[2] x ...  x
3710     (n[d-1]/2 + 1) complex data: the last dimension of the complex data
3711     is cut in half (rounded down), plus one.  As for the serial
3712     transforms, the sizes you pass to the 'plan_dft_r2c' and
3713     'plan_dft_c2r' are the n[0] x n[1] x n[2] x ...  x n[d-1]
3714     dimensions of the real data.
3715
3716   * Although the real data is _conceptually_ n[0] x n[1] x n[2] x ...
3717     x n[d-1] , it is _physically_ stored as an n[0] x n[1] x n[2] x ...
3718     x [2 (n[d-1]/2 + 1)] array, where the last dimension has been
3719     _padded_ to make it the same size as the complex output.  This is
3720     much like the in-place serial r2c/c2r interface (*note
3721     Multi-Dimensional DFTs of Real Data::), except that in MPI the
3722     padding is required even for out-of-place data.  The extra padding
3723     numbers are ignored by FFTW (they are _not_ like zero-padding the
3724     transform to a larger size); they are only used to determine the
3725     data layout.
3726
3727   * The data distribution in MPI for _both_ the real and complex data
3728     is determined by the shape of the _complex_ data.  That is, you
     call the appropriate 'local_size' function for the n[0] x n[1] x
3730     n[2] x ...  x (n[d-1]/2 + 1) complex data, and then use the _same_
3731     distribution for the real data except that the last complex
3732     dimension is replaced by a (padded) real dimension of twice the
3733     length.
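
   The size rules in the list above reduce to a little integer
arithmetic on the last dimension.  The helpers below are a hypothetical
sketch of that arithmetic only; they do not call FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the last dimension of the complex data: n/2 + 1, where the
   integer division rounds down. */
ptrdiff_t complex_last_dim(ptrdiff_t n_last)
{
    assert(n_last > 0);
    return n_last / 2 + 1;
}

/* Length of the last dimension of the padded real array: twice the
   complex length. */
ptrdiff_t padded_real_last_dim(ptrdiff_t n_last)
{
    return 2 * complex_last_dim(n_last);
}

/* Number of padding entries at the end of each row of real data. */
ptrdiff_t padding_per_row(ptrdiff_t n_last)
{
    return padded_real_last_dim(n_last) - n_last;
}
```

   An even last dimension therefore carries two padding entries per row
and an odd one carries one: a last dimension of 8 is padded to 10, and
a last dimension of 7 is padded to 8.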
3734
   For example, suppose we are performing an out-of-place r2c transform
of L x M x N real data [padded to L x M x 2(N/2+1)], resulting in L x M
x (N/2+1) complex data.  Similar to the example in *note 2d MPI
example::, we might do something like:
3739
3740     #include <fftw3-mpi.h>
3741
3742     int main(int argc, char **argv)
3743     {
3744         const ptrdiff_t L = ..., M = ..., N = ...;
3745         fftw_plan plan;
3746         double *rin;
3747         fftw_complex *cout;
3748         ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;
3749
3750         MPI_Init(&argc, &argv);
3751         fftw_mpi_init();
3752
3753         /* get local data size and allocate */
3754         alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
3755                                              &local_n0, &local_0_start);
3756         rin = fftw_alloc_real(2 * alloc_local);
3757         cout = fftw_alloc_complex(alloc_local);
3758
3759         /* create plan for out-of-place r2c DFT */
3760         plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
3761                                         FFTW_MEASURE);
3762
3763         /* initialize rin to some function my_func(x,y,z) */
         for (i = 0; i < local_n0; ++i)
             for (j = 0; j < M; ++j)
                 for (k = 0; k < N; ++k)
                     rin[(i*M + j) * (2*(N/2+1)) + k] =
                         my_func(local_0_start+i, j, k);
3768
3769         /* compute transforms as many times as desired */
3770         fftw_execute(plan);
3771
3772         fftw_destroy_plan(plan);
3773
3774         MPI_Finalize();
3775     }
3776
3777   Note that we allocated 'rin' using 'fftw_alloc_real' with an argument
3778of '2 * alloc_local': since 'alloc_local' is the number of _complex_
3779values to allocate, the number of _real_ values is twice as many.  The
3780'rin' array is then local_n0 x M x 2(N/2+1) in row-major order, so its
3781'(i,j,k)' element is at the index '(i*M + j) * (2*(N/2+1)) + k' (*note
3782Multi-dimensional Array Format::).
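
   The same index arithmetic can be wrapped in a small helper, shown
here as a sketch.  'rin_index' and 'rin_span' are hypothetical names,
not FFTW functions.  The second helper only counts the doubles covered
by the local slab; it is not a substitute for '2 * alloc_local', since
the value returned by the 'local size' function may be larger when FFTW
needs additional storage for intermediate calculations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FFTW): offset of the real element
   (i, j, k) in the local padded array, where i is the local index along
   the first dimension, 0 <= j < M and 0 <= k < N. */
static ptrdiff_t rin_index(ptrdiff_t i, ptrdiff_t j, ptrdiff_t k,
                           ptrdiff_t M, ptrdiff_t N)
{
    const ptrdiff_t row_len = 2 * (N / 2 + 1);   /* padded last dimension */
    assert(i >= 0 && 0 <= j && j < M && 0 <= k && k < N);
    return (i * M + j) * row_len + k;
}

/* Hypothetical helper: number of doubles spanned by local_n0 padded
   slabs of size M x 2(N/2+1). */
static ptrdiff_t rin_span(ptrdiff_t local_n0, ptrdiff_t M, ptrdiff_t N)
{
    return local_n0 * M * (2 * (N / 2 + 1));
}
```

   With M = 3 and N = 4, each row holds 6 doubles, so element (1,0,0)
starts at offset 18 and the last two slots of every row are padding
that the loop over 'k' never touches.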
3783
3784   As for the complex transforms, improved performance can be obtained
3785by specifying that the output is the transpose of the input or vice
3786versa (*note Transposed distributions::).  In our L x M x N r2c example,
including 'FFTW_MPI_TRANSPOSED_OUT' in the flags means that the input
would be a padded L x M x 2(N/2+1) real array distributed over the 'L'
dimension, while the output would be an M x L x (N/2+1) complex array
distributed over the 'M' dimension.  To perform the inverse c2r
transform with the same data distributions, you would use the
'FFTW_MPI_TRANSPOSED_IN' flag.
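
   As a sketch of what the transposed layout means for indexing, the
following helpers locate a complex output element in the normal and in
the transposed local array.  The function names are hypothetical; the
local offsets 'i_local = i - local_0_start' and 'j_local = j -
local_1_start' are assumed to come from
'fftw_mpi_local_size_3d_transposed'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset (in complex elements) of output element
   (i, j, k) in the normal layout, stored locally as
   local_n0 x M x (N/2+1); i_local = i - local_0_start. */
static ptrdiff_t cout_index(ptrdiff_t i_local, ptrdiff_t j, ptrdiff_t k,
                            ptrdiff_t M, ptrdiff_t N)
{
    assert(i_local >= 0 && 0 <= j && j < M && 0 <= k && k < N / 2 + 1);
    return (i_local * M + j) * (N / 2 + 1) + k;
}

/* Hypothetical helper: offset of the same element in the transposed
   layout, stored locally as local_n1 x L x (N/2+1);
   j_local = j - local_1_start. */
static ptrdiff_t cout_index_transposed(ptrdiff_t j_local, ptrdiff_t i,
                                       ptrdiff_t k, ptrdiff_t L, ptrdiff_t N)
{
    assert(j_local >= 0 && 0 <= i && i < L && 0 <= k && k < N / 2 + 1);
    return (j_local * L + i) * (N / 2 + 1) + k;
}
```

   Only the first two dimensions trade places; the last (halved)
dimension stays contiguous in both layouts.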
3793
3794
3795File: fftw3.info,  Node: Other Multi-dimensional Real-data MPI Transforms,  Next: FFTW MPI Transposes,  Prev: Multi-dimensional MPI DFTs of Real Data,  Up: Distributed-memory FFTW with MPI
3796
37976.6 Other multi-dimensional Real-Data MPI Transforms
3798====================================================
3799
3800FFTW's MPI interface also supports multi-dimensional 'r2r' transforms of
3801all kinds supported by the serial interface (e.g.  discrete cosine and
3802sine transforms, discrete Hartley transforms, etc.).  Only
3803multi-dimensional 'r2r' transforms, not one-dimensional transforms, are
3804currently parallelized.
3805
3806   These are used much like the multidimensional complex DFTs discussed
3807above, except that the data is real rather than complex, and one needs
3808to pass an r2r transform kind ('fftw_r2r_kind') for each dimension as in
3809the serial FFTW (*note More DFTs of Real Data::).
3810
   For example, one might perform a two-dimensional L x M transform that
is an REDFT10 (DCT-II) in the first dimension and an RODFT10 (DST-II) in
the second dimension with code like:
3814
3815         const ptrdiff_t L = ..., M = ...;
3816         fftw_plan plan;
3817         double *data;
3818         ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
3819
3820         /* get local data size and allocate */
3821         alloc_local = fftw_mpi_local_size_2d(L, M, MPI_COMM_WORLD,
3822                                              &local_n0, &local_0_start);
3823         data = fftw_alloc_real(alloc_local);
3824
3825         /* create plan for in-place REDFT10 x RODFT10 */
3826         plan = fftw_mpi_plan_r2r_2d(L, M, data, data, MPI_COMM_WORLD,
3827                                     FFTW_REDFT10, FFTW_RODFT10, FFTW_MEASURE);
3828
3829         /* initialize data to some function my_function(x,y) */
3830         for (i = 0; i < local_n0; ++i) for (j = 0; j < M; ++j)
3831            data[i*M + j] = my_function(local_0_start + i, j);
3832
3833         /* compute transforms, in-place, as many times as desired */
3834         fftw_execute(plan);
3835
3836         fftw_destroy_plan(plan);
3837
3838   Notice that we use the same 'local_size' functions as we did for
3839complex data, only now we interpret the sizes in terms of real rather
3840than complex values, and correspondingly use 'fftw_alloc_real'.
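
   Because FFTW computes unnormalized transforms, applying an r2r
transform followed by its inverse kind scales the data.  The sketch
below computes that factor for a two-dimensional transform, assuming
the logical sizes documented for the serial r2r kinds.  The 'kind_t'
enumeration and both function names are hypothetical stand-ins, used so
that the example does not depend on the FFTW headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the fftw_r2r_kind values. */
typedef enum {
    KIND_R2HC, KIND_HC2R, KIND_DHT,
    KIND_REDFT00, KIND_REDFT01, KIND_REDFT10, KIND_REDFT11,
    KIND_RODFT00, KIND_RODFT01, KIND_RODFT10, KIND_RODFT11
} kind_t;

/* Logical DFT size corresponding to a physical length n of the given
   kind, following the serial r2r documentation. */
static ptrdiff_t logical_size(kind_t kind, ptrdiff_t n)
{
    assert(n > 0);
    switch (kind) {
    case KIND_REDFT00: return 2 * (n - 1);
    case KIND_RODFT00: return 2 * (n + 1);
    case KIND_REDFT01: case KIND_REDFT10: case KIND_REDFT11:
    case KIND_RODFT01: case KIND_RODFT10: case KIND_RODFT11:
        return 2 * n;
    default:           return n;   /* R2HC, HC2R, DHT */
    }
}

/* Factor by which the data is multiplied after a 2d transform of kinds
   (k0, k1) followed by the corresponding inverse kinds. */
static double roundtrip_scale_2d(kind_t k0, ptrdiff_t n0,
                                 kind_t k1, ptrdiff_t n1)
{
    return (double)logical_size(k0, n0) * (double)logical_size(k1, n1);
}
```

   For the REDFT10 x RODFT10 example above with L = 4 and M = 8, the
inverse would use 'FFTW_REDFT01' and 'FFTW_RODFT01', and the round trip
would multiply the data by (2*4)(2*8) = 128.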
3841
3842
3843File: fftw3.info,  Node: FFTW MPI Transposes,  Next: FFTW MPI Wisdom,  Prev: Other Multi-dimensional Real-data MPI Transforms,  Up: Distributed-memory FFTW with MPI
3844
38456.7 FFTW MPI Transposes
3846=======================
3847
FFTW's MPI Fourier transforms rely on one or more _global
transposition_ steps for their communications.  For example, the
3850multidimensional transforms work by transforming along some dimensions,
3851then transposing to make the first dimension local and transforming
3852that, then transposing back.  Because global transposition of a
3853block-distributed matrix has many other potential uses besides FFTs,
3854FFTW's transpose routines can be called directly, as documented in this
3855section.
3856
3857* Menu:
3858
3859* Basic distributed-transpose interface::
3860* Advanced distributed-transpose interface::
3861* An improved replacement for MPI_Alltoall::
3862
3863
3864File: fftw3.info,  Node: Basic distributed-transpose interface,  Next: Advanced distributed-transpose interface,  Prev: FFTW MPI Transposes,  Up: FFTW MPI Transposes
3865
38666.7.1 Basic distributed-transpose interface
3867-------------------------------------------
3868
3869In particular, suppose that we have an 'n0' by 'n1' array in row-major
3870order, block-distributed across the 'n0' dimension.  To transpose this
3871into an 'n1' by 'n0' array block-distributed across the 'n1' dimension,
3872we would create a plan by calling the following function:
3873
3874     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
3875                                       double *in, double *out,
3876                                       MPI_Comm comm, unsigned flags);
3877
3878   The input and output arrays ('in' and 'out') can be the same.  The
3879transpose is actually executed by calling 'fftw_execute' on the plan, as
3880usual.
3881
   The 'flags' are the usual FFTW planner flags, optionally combined
with 'FFTW_MPI_TRANSPOSED_OUT' and/or 'FFTW_MPI_TRANSPOSED_IN'.  For
transpose plans, these flags indicate that the output and/or input,
respectively, are _locally_ transposed.  That is, on each process input
data is normally stored as a
3887'local_n0' by 'n1' array in row-major order, but for an
3888'FFTW_MPI_TRANSPOSED_IN' plan the input data is stored as 'n1' by
3889'local_n0' in row-major order.  Similarly, 'FFTW_MPI_TRANSPOSED_OUT'
3890means that the output is 'n0' by 'local_n1' instead of 'local_n1' by
3891'n0'.
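
   The four local layouts just described can be summarized by two small
index helpers.  This is only a sketch: 'in_offset' and 'out_offset' are
hypothetical names, and the 'transposed_in' and 'transposed_out'
arguments stand for whether the corresponding flag was passed to the
planner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of input element (i_local, j), with
   0 <= i_local < local_n0 and 0 <= j < n1. */
static ptrdiff_t in_offset(ptrdiff_t i_local, ptrdiff_t j,
                           ptrdiff_t local_n0, ptrdiff_t n1,
                           int transposed_in)
{
    assert(0 <= i_local && i_local < local_n0 && 0 <= j && j < n1);
    return transposed_in ? j * local_n0 + i_local   /* n1 by local_n0 */
                         : i_local * n1 + j;        /* local_n0 by n1 */
}

/* Hypothetical helper: offset of the output element in local row
   j_local and column i of the n1 by n0 transposed matrix, with
   0 <= j_local < local_n1 and 0 <= i < n0. */
static ptrdiff_t out_offset(ptrdiff_t j_local, ptrdiff_t i,
                            ptrdiff_t local_n1, ptrdiff_t n0,
                            int transposed_out)
{
    assert(0 <= j_local && j_local < local_n1 && 0 <= i && i < n0);
    return transposed_out ? i * local_n1 + j_local  /* n0 by local_n1 */
                          : j_local * n0 + i;       /* local_n1 by n0 */
}
```

   In both cases the flag swaps which of the two local dimensions is
contiguous in memory; it does not change which rows a process owns.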
3892
3893   To determine the local size of the array on each process before and
3894after the transpose, as well as the amount of storage that must be
3895allocated, one should call 'fftw_mpi_local_size_2d_transposed', just as
3896for a 2d DFT as described in the previous section:
3897
3898     ptrdiff_t fftw_mpi_local_size_2d_transposed
3899                     (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
3900                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
3901                      ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
3902
3903   Again, the return value is the local storage to allocate, which in
3904this case is the number of _real_ ('double') values rather than complex
3905numbers as in the previous examples.
3906
3907
3908File: fftw3.info,  Node: Advanced distributed-transpose interface,  Next: An improved replacement for MPI_Alltoall,  Prev: Basic distributed-transpose interface,  Up: FFTW MPI Transposes
3909
39106.7.2 Advanced distributed-transpose interface
3911----------------------------------------------
3912
3913The above routines are for a transpose of a matrix of numbers (of type
3914'double'), using FFTW's default block sizes.  More generally, one can
3915perform transposes of _tuples_ of numbers, with user-specified block
3916sizes for the input and output:
3917
3918     fftw_plan fftw_mpi_plan_many_transpose
3919                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
3920                      ptrdiff_t block0, ptrdiff_t block1,
3921                      double *in, double *out, MPI_Comm comm, unsigned flags);
3922
3923   In this case, one is transposing an 'n0' by 'n1' matrix of
3924'howmany'-tuples (e.g.  'howmany = 2' for complex numbers).  The input
3925is distributed along the 'n0' dimension with block size 'block0', and
3926the 'n1' by 'n0' output is distributed along the 'n1' dimension with
3927block size 'block1'.  If 'FFTW_MPI_DEFAULT_BLOCK' (0) is passed for a
3928block size then FFTW uses its default block size.  To get the local size
3929of the data on each process, you should then call
3930'fftw_mpi_local_size_many_transposed'.
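
   To make the role of the block size concrete, here is a simplified
model of a 1d block distribution.  It assumes that the default block
size is the dimension divided by the number of processes, rounded up
(*note Load balancing::), and that process 'rank' owns the block
starting at 'block * rank'.  The function 'block_range' is
hypothetical, and a block size of 0 stands in for
'FFTW_MPI_DEFAULT_BLOCK'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a block distribution of n items over nproc
   processes.  block == 0 selects the assumed default, ceil(n / nproc). */
static void block_range(ptrdiff_t n, ptrdiff_t block, int nproc, int rank,
                        ptrdiff_t *local_n, ptrdiff_t *local_start)
{
    ptrdiff_t start, end;

    assert(n > 0 && nproc > 0 && 0 <= rank && rank < nproc && block >= 0);
    if (block == 0)
        block = (n + nproc - 1) / nproc;   /* round up */
    assert(block * nproc >= n);            /* blocks must cover all items */

    start = block * rank;
    if (start > n) start = n;              /* this process owns nothing */
    end = start + block;
    if (end > n) end = n;                  /* last block may be short */

    *local_n = end - start;
    *local_start = start;
}
```

   Under this model, n = 100 on 4 processes gives every process 25
rows, while n = 6 on 4 processes gives 2 rows to each of the first
three processes and none to the last.  In real code, always take these
values from the 'local size' functions rather than recomputing them.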
3931
3932
3933File: fftw3.info,  Node: An improved replacement for MPI_Alltoall,  Prev: Advanced distributed-transpose interface,  Up: FFTW MPI Transposes
3934
39356.7.3 An improved replacement for MPI_Alltoall
3936----------------------------------------------
3937
3938We close this section by noting that FFTW's MPI transpose routines can
be thought of as a generalization of the 'MPI_Alltoall' function
3940(albeit only for floating-point types), and in some circumstances can
3941function as an improved replacement.
3942
3943   'MPI_Alltoall' is defined by the MPI standard as:
3944
3945     int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
3946                      void *recvbuf, int recvcnt, MPI_Datatype recvtype,
3947                      MPI_Comm comm);
3948
3949   In particular, for 'double*' arrays 'in' and 'out', consider the
3950call:
3951
     MPI_Alltoall(in, howmany, MPI_DOUBLE, out, howmany, MPI_DOUBLE, comm);
3953
3954   This is completely equivalent to:
3955
3956     MPI_Comm_size(comm, &P);
3957     plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1, in, out, comm, FFTW_ESTIMATE);
3958     fftw_execute(plan);
3959     fftw_destroy_plan(plan);
3960
3961   That is, computing a P x P transpose on 'P' processes, with a block
3962size of 1, is just a standard all-to-all communication.
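
   The equivalence can be checked with a serial model that needs neither
MPI nor FFTW.  In this sketch the send buffers of all 'P' processes are
stacked into one array, so that row 'p' is the buffer of process 'p';
with a block size of 1 this stacked array is exactly the P x P matrix of
'howmany'-tuples.  Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Serial model of an all-to-all exchange: block q of process p's send
   buffer becomes block p of process q's receive buffer. */
static void model_alltoall(int P, int howmany,
                           const double *send, double *recv)
{
    int p, q;
    assert(P > 0 && howmany > 0);
    for (p = 0; p < P; ++p)          /* sending process */
        for (q = 0; q < P; ++q)      /* receiving process */
            memcpy(recv + ((size_t)q * P + p) * howmany,
                   send + ((size_t)p * P + q) * howmany,
                   (size_t)howmany * sizeof(double));
}

/* Serial transpose of a row-major n0 x n1 matrix of howmany-tuples. */
static void model_transpose(int n0, int n1, int howmany,
                            const double *in, double *out)
{
    int i, j, t;
    assert(n0 > 0 && n1 > 0 && howmany > 0);
    for (i = 0; i < n0; ++i)
        for (j = 0; j < n1; ++j)
            for (t = 0; t < howmany; ++t)
                out[((size_t)j * n0 + i) * howmany + t] =
                    in[((size_t)i * n1 + j) * howmany + t];
}
```

   Running both functions on the same P*P*howmany input produces
identical output arrays, which is the sense in which the transpose plan
and the all-to-all call move the same data.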
3963
3964   However, using the FFTW routine instead of 'MPI_Alltoall' may have
3965certain advantages.  First of all, FFTW's routine can operate in-place
3966('in == out') whereas 'MPI_Alltoall' can only operate out-of-place.
3967
3968   Second, even for out-of-place plans, FFTW's routine may be faster,
3969especially if you need to perform the all-to-all communication many
3970times and can afford to use 'FFTW_MEASURE' or 'FFTW_PATIENT'.  It should
3971certainly be no slower, not including the time to create the plan, since
3972one of the possible algorithms that FFTW uses for an out-of-place
3973transpose _is_ simply to call 'MPI_Alltoall'.  However, FFTW also
3974considers several other possible algorithms that, depending on your MPI
3975implementation and your hardware, may be faster.
3976
3977
3978File: fftw3.info,  Node: FFTW MPI Wisdom,  Next: Avoiding MPI Deadlocks,  Prev: FFTW MPI Transposes,  Up: Distributed-memory FFTW with MPI
3979
39806.8 FFTW MPI Wisdom
3981===================
3982
3983FFTW's "wisdom" facility (*note Words of Wisdom-Saving Plans::) can be
3984used to save MPI plans as well as to save uniprocessor plans.  However,
3985for MPI there are several unavoidable complications.
3986
3987   First, the MPI standard does not guarantee that every process can
3988perform file I/O (at least, not using C stdio routines)--in general, we
3989may only assume that process 0 is capable of I/O.(1) So, if we want to
3990export the wisdom from a single process to a file, we must first export
3991the wisdom to a string, then send it to process 0, then write it to a
3992file.
3993
3994   Second, in principle we may want to have separate wisdom for every
3995process, since in general the processes may run on different hardware
3996even for a single MPI program.  However, in practice FFTW's MPI code is
3997designed for the case of homogeneous hardware (*note Load balancing::),
3998and in this case it is convenient to use the same wisdom for every
3999process.  Thus, we need a mechanism to synchronize the wisdom.
4000
4001   To address both of these problems, FFTW provides the following two
4002functions:
4003
4004     void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
4005     void fftw_mpi_gather_wisdom(MPI_Comm comm);
4006
4007   Given a communicator 'comm', 'fftw_mpi_broadcast_wisdom' will
4008broadcast the wisdom from process 0 to all other processes.  Conversely,
4009'fftw_mpi_gather_wisdom' will collect wisdom from all processes onto
4010process 0.  (If the plans created for the same problem by different
4011processes are not the same, 'fftw_mpi_gather_wisdom' will arbitrarily
4012choose one of the plans.)  Both of these functions may result in
4013suboptimal plans for different processes if the processes are running on
4014non-identical hardware.  Both of these functions are _collective_ calls,
4015which means that they must be executed by all processes in the
4016communicator.
4017
4018   So, for example, a typical code snippet to import wisdom from a file
4019and use it on all processes would be:
4020
4021     {
4022         int rank;
4023
4024         fftw_mpi_init();
4025         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
4026         if (rank == 0) fftw_import_wisdom_from_filename("mywisdom");
4027         fftw_mpi_broadcast_wisdom(MPI_COMM_WORLD);
4028     }
4029
4030   (Note that we must call 'fftw_mpi_init' before importing any wisdom
4031that might contain MPI plans.)  Similarly, a typical code snippet to
4032export wisdom from all processes to a file is:
4033
4034     {
4035         int rank;
4036
4037         fftw_mpi_gather_wisdom(MPI_COMM_WORLD);
4038         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
4039         if (rank == 0) fftw_export_wisdom_to_filename("mywisdom");
4040     }
4041
4042   ---------- Footnotes ----------
4043
4044   (1) In fact, even this assumption is not technically guaranteed by
4045the standard, although it seems to be universal in actual MPI
4046implementations and is widely assumed by MPI-using software.
4047Technically, you need to query the 'MPI_IO' attribute of
4048'MPI_COMM_WORLD' with 'MPI_Attr_get'.  If this attribute is
4049'MPI_PROC_NULL', no I/O is possible.  If it is 'MPI_ANY_SOURCE', any
4050process can perform I/O. Otherwise, it is the rank of a process that can
4051perform I/O ...  but since it is not guaranteed to yield the _same_ rank
4052on all processes, you have to do an 'MPI_Allreduce' of some kind if you
4053want all processes to agree about which is going to do I/O. And even
4054then, the standard only guarantees that this process can perform output,
4055but not input.  See e.g.  'Parallel Programming with MPI' by P. S.
4056Pacheco, section 8.1.3.  Needless to say, in our experience virtually no
4057MPI programmers worry about this.
4058
4059
4060File: fftw3.info,  Node: Avoiding MPI Deadlocks,  Next: FFTW MPI Performance Tips,  Prev: FFTW MPI Wisdom,  Up: Distributed-memory FFTW with MPI
4061
40626.9 Avoiding MPI Deadlocks
4063==========================
4064
4065An MPI program can _deadlock_ if one process is waiting for a message
4066from another process that never gets sent.  To avoid deadlocks when
4067using FFTW's MPI routines, it is important to know which functions are
4068_collective_: that is, which functions must _always_ be called in the
_same order_ from _every_ process in a given communicator.
('MPI_Barrier' is the canonical example of a collective function in the
MPI standard.)
4072
4073   The functions in FFTW that are _always_ collective are: every
4074function beginning with 'fftw_mpi_plan', as well as
4075'fftw_mpi_broadcast_wisdom' and 'fftw_mpi_gather_wisdom'.  Also, the
4076following functions from the ordinary FFTW interface are collective when
4077they are applied to a plan created by an 'fftw_mpi_plan' function:
4078'fftw_execute', 'fftw_destroy_plan', and 'fftw_flops'.
4079
4080
4081File: fftw3.info,  Node: FFTW MPI Performance Tips,  Next: Combining MPI and Threads,  Prev: Avoiding MPI Deadlocks,  Up: Distributed-memory FFTW with MPI
4082
40836.10 FFTW MPI Performance Tips
4084==============================
4085
4086In this section, we collect a few tips on getting the best performance
4087out of FFTW's MPI transforms.
4088
4089   First, because of the 1d block distribution, FFTW's parallelization
4090is currently limited by the size of the first dimension.
4091(Multidimensional block distributions may be supported by a future
4092version.)  More generally, you should ideally arrange the dimensions so
4093that FFTW can divide them equally among the processes.  *Note Load
4094balancing::.
4095
4096   Second, if it is not too inconvenient, you should consider working
4097with transposed output for multidimensional plans, as this saves a
4098considerable amount of communications.  *Note Transposed
4099distributions::.
4100
4101   Third, the fastest choices are generally either an in-place transform
4102or an out-of-place transform with the 'FFTW_DESTROY_INPUT' flag (which
4103allows the input array to be used as scratch space).  In-place is
4104especially beneficial if the amount of data per process is large.
4105
4106   Fourth, if you have multiple arrays to transform at once, rather than
4107calling FFTW's MPI transforms several times it usually seems to be
4108faster to interleave the data and use the advanced interface.  (This
4109groups the communications together instead of requiring separate
4110messages for each transform.)
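
   As an illustration of the fourth tip, the following sketch
interleaves several separate real arrays into one array of contiguous
'howmany'-tuples, which is the layout expected by the advanced
interface.  The function 'interleave' is a hypothetical helper, not
part of FFTW; for complex data the same pattern would apply with
'fftw_complex' elements in place of 'double'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy 'howmany' separate arrays of 'count'
   elements each into 'out', so that the howmany values belonging to the
   same position are adjacent in memory. */
static void interleave(const double *const *arrays, ptrdiff_t howmany,
                       ptrdiff_t count, double *out)
{
    ptrdiff_t idx, t;
    assert(howmany > 0 && count >= 0);
    for (idx = 0; idx < count; ++idx)
        for (t = 0; t < howmany; ++t)
            out[idx * howmany + t] = arrays[t][idx];
}
```

   The interleaved array would then be passed to one of the advanced
planners with the matching 'howmany' value, and its local size obtained
from 'fftw_mpi_local_size_many' with the same 'howmany'.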
4111
4112
4113File: fftw3.info,  Node: Combining MPI and Threads,  Next: FFTW MPI Reference,  Prev: FFTW MPI Performance Tips,  Up: Distributed-memory FFTW with MPI
4114
41156.11 Combining MPI and Threads
4116==============================
4117
4118In certain cases, it may be advantageous to combine MPI
4119(distributed-memory) and threads (shared-memory) parallelization.  FFTW
4120supports this, with certain caveats.  For example, if you have a cluster
4121of 4-processor shared-memory nodes, you may want to use threads within
4122the nodes and MPI between the nodes, instead of MPI for all
4123parallelization.
4124
4125   In particular, it is possible to seamlessly combine the MPI FFTW
4126routines with the multi-threaded FFTW routines (*note Multi-threaded
4127FFTW::).  However, some care must be taken in the initialization code,
4128which should look something like this:
4129
4130     int threads_ok;
4131
4132     int main(int argc, char **argv)
4133     {
4134         int provided;
4135         MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
4136         threads_ok = provided >= MPI_THREAD_FUNNELED;
4137
4138         if (threads_ok) threads_ok = fftw_init_threads();
4139         fftw_mpi_init();
4140
4141         ...
4142         if (threads_ok) fftw_plan_with_nthreads(...);
4143         ...
4144
4145         MPI_Finalize();
4146     }
4147
4148   First, note that instead of calling 'MPI_Init', you should call
'MPI_Init_thread', which is the initialization routine defined by the
4150MPI-2 standard to indicate to MPI that your program will be
4151multithreaded.  We pass 'MPI_THREAD_FUNNELED', which indicates that we
4152will only call MPI routines from the main thread.  (FFTW will launch
4153additional threads internally, but the extra threads will not call MPI
4154code.)  (You may also pass 'MPI_THREAD_SERIALIZED' or
4155'MPI_THREAD_MULTIPLE', which requests additional multithreading support
4156from the MPI implementation, but this is not required by FFTW.) The
'provided' parameter returns the level of thread support that your MPI
implementation actually provides; this _must_ be at least
4159'MPI_THREAD_FUNNELED' if you want to call the FFTW threads routines, so
4160we define a global variable 'threads_ok' to record this.  You should
4161only call 'fftw_init_threads' or 'fftw_plan_with_nthreads' if
4162'threads_ok' is true.  For more information on thread safety in MPI, see
4163the MPI and Threads
4164(http://www.mpi-forum.org/docs/mpi-20-html/node162.htm) section of the
4165MPI-2 standard.
4166
4167   Second, we must call 'fftw_init_threads' _before_ 'fftw_mpi_init'.
4168This is critical for technical reasons having to do with how FFTW
4169initializes its list of algorithms.
4170
4171   Then, if you call 'fftw_plan_with_nthreads(N)', _every_ MPI process
4172will launch (up to) 'N' threads to parallelize its transforms.
4173
4174   For example, in the hypothetical cluster of 4-processor nodes, you
4175might wish to launch only a single MPI process per node, and then call
4176'fftw_plan_with_nthreads(4)' on each process to use all processors in
4177the nodes.
4178
4179   This may or may not be faster than simply using as many MPI processes
4180as you have processors, however.  On the one hand, using threads within
4181a node eliminates the need for explicit message passing within the node.
4182On the other hand, FFTW's transpose routines are not multi-threaded, and
4183this means that the communications that do take place will not benefit
4184from parallelization within the node.  Moreover, many MPI
4185implementations already have optimizations to exploit shared memory when
4186it is available, so adding the multithreaded FFTW on top of this may be
4187superfluous.
4188
4189
4190File: fftw3.info,  Node: FFTW MPI Reference,  Next: FFTW MPI Fortran Interface,  Prev: Combining MPI and Threads,  Up: Distributed-memory FFTW with MPI
4191
41926.12 FFTW MPI Reference
4193=======================
4194
This section provides a complete reference to all FFTW MPI functions,
4196datatypes, and constants.  See also *note FFTW Reference:: for
4197information on functions and types in common with the serial interface.
4198
4199* Menu:
4200
4201* MPI Files and Data Types::
4202* MPI Initialization::
4203* Using MPI Plans::
4204* MPI Data Distribution Functions::
4205* MPI Plan Creation::
4206* MPI Wisdom Communication::
4207
4208
4209File: fftw3.info,  Node: MPI Files and Data Types,  Next: MPI Initialization,  Prev: FFTW MPI Reference,  Up: FFTW MPI Reference
4210
42116.12.1 MPI Files and Data Types
4212-------------------------------
4213
4214All programs using FFTW's MPI support should include its header file:
4215
4216     #include <fftw3-mpi.h>
4217
4218   Note that this header file includes the serial-FFTW 'fftw3.h' header
4219file, and also the 'mpi.h' header file for MPI, so you need not include
4220those files separately.
4221
4222   You must also link to _both_ the FFTW MPI library and to the serial
4223FFTW library.  On Unix, this means adding '-lfftw3_mpi -lfftw3 -lm' at
4224the end of the link command.
4225
   Different precisions are handled as in the serial interface: *Note
Precision::.  That is, 'fftw_' functions become 'fftwf_' (in single
precision) and so on, and on Unix the libraries become '-lfftw3f_mpi
-lfftw3f -lm' and so on.  Long-double precision is supported in MPI, but
quad precision ('fftwq_') is not, because MPI lacks support for this
type.
4232
4233
4234File: fftw3.info,  Node: MPI Initialization,  Next: Using MPI Plans,  Prev: MPI Files and Data Types,  Up: FFTW MPI Reference
4235
42366.12.2 MPI Initialization
4237-------------------------
4238
4239Before calling any other FFTW MPI ('fftw_mpi_') function, and before
4240importing any wisdom for MPI problems, you must call:
4241
4242     void fftw_mpi_init(void);
4243
4244   If FFTW threads support is used, however, 'fftw_mpi_init' should be
4245called _after_ 'fftw_init_threads' (*note Combining MPI and Threads::).
4246Calling 'fftw_mpi_init' additional times (before 'fftw_mpi_cleanup') has
4247no effect.
4248
4249   If you want to deallocate all persistent data and reset FFTW to the
4250pristine state it was in when you started your program, you can call:
4251
4252     void fftw_mpi_cleanup(void);
4253
4254   (This calls 'fftw_cleanup', so you need not call the serial cleanup
4255routine too, although it is safe to do so.)  After calling
4256'fftw_mpi_cleanup', all existing plans become undefined, and you should
4257not attempt to execute or destroy them.  You must call 'fftw_mpi_init'
4258again after 'fftw_mpi_cleanup' if you want to resume using the MPI FFTW
4259routines.
4260
4261
4262File: fftw3.info,  Node: Using MPI Plans,  Next: MPI Data Distribution Functions,  Prev: MPI Initialization,  Up: FFTW MPI Reference
4263
42646.12.3 Using MPI Plans
4265----------------------
4266
4267Once an MPI plan is created, you can execute and destroy it using
4268'fftw_execute', 'fftw_destroy_plan', and the other functions in the
4269serial interface that operate on generic plans (*note Using Plans::).
4270
4271   The 'fftw_execute' and 'fftw_destroy_plan' functions, applied to MPI
4272plans, are _collective_ calls: they must be called for all processes in
4273the communicator that was used to create the plan.
4274
4275   You must _not_ use the serial new-array plan-execution functions
4276'fftw_execute_dft' and so on (*note New-array Execute Functions::) with
4277MPI plans.  Such functions are specialized to the problem type, and
4278there are specific new-array execute functions for MPI plans:
4279
4280     void fftw_mpi_execute_dft(fftw_plan p, fftw_complex *in, fftw_complex *out);
4281     void fftw_mpi_execute_dft_r2c(fftw_plan p, double *in, fftw_complex *out);
4282     void fftw_mpi_execute_dft_c2r(fftw_plan p, fftw_complex *in, double *out);
4283     void fftw_mpi_execute_r2r(fftw_plan p, double *in, double *out);
4284
4285   These functions have the same restrictions as those of the serial
4286new-array execute functions.  They are _always_ safe to apply to the
4287_same_ 'in' and 'out' arrays that were used to create the plan.  They
can only be applied to new arrays if those arrays have the same types,
4289dimensions, in-placeness, and alignment as the original arrays, where
4290the best way to ensure the same alignment is to use FFTW's 'fftw_malloc'
4291and related allocation functions for all arrays (*note Memory
4292Allocation::).  Note that distributed transposes (*note FFTW MPI
4293Transposes::) use 'fftw_mpi_execute_r2r', since they count as rank-zero
4294r2r plans from FFTW's perspective.
4295
4296
4297File: fftw3.info,  Node: MPI Data Distribution Functions,  Next: MPI Plan Creation,  Prev: Using MPI Plans,  Up: FFTW MPI Reference
4298
42996.12.4 MPI Data Distribution Functions
4300--------------------------------------
4301
4302As described above (*note MPI Data Distribution::), in order to allocate
4303your arrays, _before_ creating a plan, you must first call one of the
4304following routines to determine the required allocation size and the
4305portion of the array locally stored on a given process.  The 'MPI_Comm'
4306communicator passed here must be equivalent to the communicator used
4307below for plan creation.
4308
4309   The basic interface for multidimensional transforms consists of the
4310functions:
4311
4312     ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
4313                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
4314     ptrdiff_t fftw_mpi_local_size_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4315                                      MPI_Comm comm,
4316                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
4317     ptrdiff_t fftw_mpi_local_size(int rnk, const ptrdiff_t *n, MPI_Comm comm,
4318                                   ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
4319
4320     ptrdiff_t fftw_mpi_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
4321                                                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
4322                                                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
4323     ptrdiff_t fftw_mpi_local_size_3d_transposed(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4324                                                 MPI_Comm comm,
4325                                                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
4326                                                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
4327     ptrdiff_t fftw_mpi_local_size_transposed(int rnk, const ptrdiff_t *n, MPI_Comm comm,
4328                                              ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
4329                                              ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
4330
4331   These functions return the number of elements to allocate (complex
4332numbers for DFT/r2c/c2r plans, real numbers for r2r plans), whereas the
4333'local_n0' and 'local_0_start' return the portion ('local_0_start' to
4334'local_0_start + local_n0 - 1') of the first dimension of an n[0] x n[1]
4335x n[2] x ...  x n[d-1] array that is stored on the local process.  *Note
4336Basic and advanced distribution interfaces::.  For
4337'FFTW_MPI_TRANSPOSED_OUT' plans, the '_transposed' variants are useful
4338in order to also return the local portion of the first dimension in the
4339n[1] x n[0] x n[2] x ...  x n[d-1] transposed output.  *Note Transposed
4340distributions::.  The advanced interface for multidimensional transforms
4341is:
4342
4343     ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
4344                                        ptrdiff_t block0, MPI_Comm comm,
4345                                        ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
4346     ptrdiff_t fftw_mpi_local_size_many_transposed(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
4347                                                   ptrdiff_t block0, ptrdiff_t block1, MPI_Comm comm,
4348                                                   ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
4349                                                   ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
4350
4351   These differ from the basic interface in only two ways.  First, they
4352allow you to specify block sizes 'block0' and 'block1' (the latter for
4353the transposed output); you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use
4354FFTW's default block size as in the basic interface.  Second, you can
4355pass a 'howmany' parameter, corresponding to the advanced planning
4356interface below: this is for transforms of contiguous 'howmany'-tuples
4357of numbers ('howmany = 1' in the basic interface).
4358
4359   The corresponding basic and advanced routines for one-dimensional
4360transforms (currently only complex DFTs) are:
4361
4362     ptrdiff_t fftw_mpi_local_size_1d(
4363                  ptrdiff_t n0, MPI_Comm comm, int sign, unsigned flags,
4364                  ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
4365                  ptrdiff_t *local_no, ptrdiff_t *local_o_start);
4366     ptrdiff_t fftw_mpi_local_size_many_1d(
4367                  ptrdiff_t n0, ptrdiff_t howmany,
4368                  MPI_Comm comm, int sign, unsigned flags,
4369                  ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
4370                  ptrdiff_t *local_no, ptrdiff_t *local_o_start);
4371
4372   As above, the return value is the number of elements to allocate
4373(complex numbers, for complex DFTs).  The 'local_ni' and 'local_i_start'
4374arguments return the portion ('local_i_start' to 'local_i_start +
4375local_ni - 1') of the 1d array that is stored on this process for the
4376transform _input_, and 'local_no' and 'local_o_start' are the
corresponding quantities for the output.  The 'sign' ('FFTW_FORWARD' or
4378'FFTW_BACKWARD') and 'flags' must match the arguments passed when
4379creating a plan.  Although the inputs and outputs have different data
4380distributions in general, it is guaranteed that the _output_ data
4381distribution of an 'FFTW_FORWARD' plan will match the _input_ data
4382distribution of an 'FFTW_BACKWARD' plan and vice versa; similarly for
4383the 'FFTW_MPI_SCRAMBLED_OUT' and 'FFTW_MPI_SCRAMBLED_IN' flags.  *Note
4384One-dimensional distributions::.
4385
4386
4387File: fftw3.info,  Node: MPI Plan Creation,  Next: MPI Wisdom Communication,  Prev: MPI Data Distribution Functions,  Up: FFTW MPI Reference
4388
43896.12.5 MPI Plan Creation
4390------------------------
4391
4392Complex-data MPI DFTs
4393.....................
4394
4395Plans for complex-data DFTs (*note 2d MPI example::) are created by:
4396
4397     fftw_plan fftw_mpi_plan_dft_1d(ptrdiff_t n0, fftw_complex *in, fftw_complex *out,
4398                                    MPI_Comm comm, int sign, unsigned flags);
4399     fftw_plan fftw_mpi_plan_dft_2d(ptrdiff_t n0, ptrdiff_t n1,
4400                                    fftw_complex *in, fftw_complex *out,
4401                                    MPI_Comm comm, int sign, unsigned flags);
4402     fftw_plan fftw_mpi_plan_dft_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4403                                    fftw_complex *in, fftw_complex *out,
4404                                    MPI_Comm comm, int sign, unsigned flags);
4405     fftw_plan fftw_mpi_plan_dft(int rnk, const ptrdiff_t *n,
4406                                 fftw_complex *in, fftw_complex *out,
4407                                 MPI_Comm comm, int sign, unsigned flags);
4408     fftw_plan fftw_mpi_plan_many_dft(int rnk, const ptrdiff_t *n,
4409                                      ptrdiff_t howmany, ptrdiff_t block, ptrdiff_t tblock,
4410                                      fftw_complex *in, fftw_complex *out,
4411                                      MPI_Comm comm, int sign, unsigned flags);
4412
4413   These are similar to their serial counterparts (*note Complex DFTs::)
4414in specifying the dimensions, sign, and flags of the transform.  The
4415'comm' argument gives an MPI communicator that specifies the set of
4416processes to participate in the transform; plan creation is a collective
4417function that must be called for all processes in the communicator.  The
4418'in' and 'out' pointers refer only to a portion of the overall transform
4419data (*note MPI Data Distribution::) as specified by the 'local_size'
4420functions in the previous section.  Unless 'flags' contains
4421'FFTW_ESTIMATE', these arrays are overwritten during plan creation as
4422for the serial interface.  For multi-dimensional transforms, any
4423dimensions '> 1' are supported; for one-dimensional transforms, only
4424composite (non-prime) 'n0' are currently supported (unlike the serial
4425FFTW). Requesting an unsupported transform size will yield a 'NULL'
4426plan.  (As in the serial interface, highly composite sizes generally
4427yield the best performance.)
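
   Because an unsupported size is reported only through a 'NULL' plan,
it can be convenient to check a one-dimensional size before planning in
order to give a clearer error message.  The following is a minimal
sketch of such a check using trial division; 'sketch_is_composite' is a
hypothetical helper, not part of FFTW, and checking the returned plan
against 'NULL' remains necessary in any case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if n has a divisor other than 1 and
   itself (n is composite), and 0 otherwise (n < 4 or n is prime). */
int sketch_is_composite(ptrdiff_t n)
{
    assert(n > 0);
    if (n < 4)
        return 0;
    /* Trial division up to sqrt(n); "d <= n / d" avoids overflow. */
    for (ptrdiff_t d = 2; d <= n / d; ++d)
        if (n % d == 0)
            return 1;
    return 0;
}
```

   A caller might test 'sketch_is_composite(n0)' before calling
'fftw_mpi_plan_dft_1d' and report a prime 'n0' explicitly.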
4428
4429   The advanced-interface 'fftw_mpi_plan_many_dft' additionally allows
4430you to specify the block sizes for the first dimension ('block') of the
4431n[0] x n[1] x n[2] x ...  x n[d-1] input data and the first dimension
4432('tblock') of the n[1] x n[0] x n[2] x ...  x n[d-1] transposed data (at
4433intermediate steps of the transform, and for the output if
'FFTW_MPI_TRANSPOSED_OUT' is specified in 'flags').  These must be the same
4435block sizes as were passed to the corresponding 'local_size' function;
4436you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use FFTW's default block size
4437as in the basic interface.  Also, the 'howmany' parameter specifies that
4438the transform is of contiguous 'howmany'-tuples rather than individual
4439complex numbers; this corresponds to the same parameter in the serial
4440advanced interface (*note Advanced Complex DFTs::) with 'stride =
4441howmany' and 'dist = 1'.
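
   The interleaved layout implied by 'stride = howmany' and 'dist = 1'
can be sketched as an index computation for a local two-dimensional
array.  The function name and the zero-based local indices are
assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset, in units of one number, of component k
   of the howmany-tuple stored at row-major position (i0, i1) of a local
   array whose rows hold n1 tuples each. */
ptrdiff_t sketch_tuple_index(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t n1,
                             ptrdiff_t howmany, ptrdiff_t k)
{
    assert(i0 >= 0 && i1 >= 0 && i1 < n1);
    assert(howmany > 0 && k >= 0 && k < howmany);
    return (i0 * n1 + i1) * howmany + k;
}
```

   Neighboring points of one transform are 'howmany' numbers apart (the
stride), while the same point of two consecutive transforms is one
number apart (the dist).  With 'howmany = 1' the formula reduces to the
usual row-major offset 'i0 * n1 + i1'.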
4442
4443MPI flags
4444.........
4445
4446The 'flags' can be any of those for the serial FFTW (*note Planner
4447Flags::), and in addition may include one or more of the following
4448MPI-specific flags, which improve performance at the cost of changing
4449the output or input data formats.
4450
4451   * 'FFTW_MPI_SCRAMBLED_OUT', 'FFTW_MPI_SCRAMBLED_IN': valid for 1d
4452     transforms only, these flags indicate that the output/input of the
4453     transform are in an undocumented "scrambled" order.  A forward
4454     'FFTW_MPI_SCRAMBLED_OUT' transform can be inverted by a backward
4455     'FFTW_MPI_SCRAMBLED_IN' (times the usual 1/N normalization).  *Note
4456     One-dimensional distributions::.
4457
4458   * 'FFTW_MPI_TRANSPOSED_OUT', 'FFTW_MPI_TRANSPOSED_IN': valid for
4459     multidimensional ('rnk > 1') transforms only, these flags specify
4460     that the output or input of an n[0] x n[1] x n[2] x ...  x n[d-1]
4461     transform is transposed to n[1] x n[0] x n[2] x ...  x n[d-1] .
4462     *Note Transposed distributions::.
4463
4464Real-data MPI DFTs
4465..................
4466
4467Plans for real-input/output (r2c/c2r) DFTs (*note Multi-dimensional MPI
4468DFTs of Real Data::) are created by:
4469
4470     fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
4471                                        double *in, fftw_complex *out,
4472                                        MPI_Comm comm, unsigned flags);
4476     fftw_plan fftw_mpi_plan_dft_r2c_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4477                                        double *in, fftw_complex *out,
4478                                        MPI_Comm comm, unsigned flags);
4479     fftw_plan fftw_mpi_plan_dft_r2c(int rnk, const ptrdiff_t *n,
4480                                     double *in, fftw_complex *out,
4481                                     MPI_Comm comm, unsigned flags);
4482     fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
4483                                        fftw_complex *in, double *out,
4484                                        MPI_Comm comm, unsigned flags);
4488     fftw_plan fftw_mpi_plan_dft_c2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4489                                        fftw_complex *in, double *out,
4490                                        MPI_Comm comm, unsigned flags);
4491     fftw_plan fftw_mpi_plan_dft_c2r(int rnk, const ptrdiff_t *n,
4492                                     fftw_complex *in, double *out,
4493                                     MPI_Comm comm, unsigned flags);
4494
4495   Similar to the serial interface (*note Real-data DFTs::), these
4496transform logically n[0] x n[1] x n[2] x ...  x n[d-1] real data to/from
4497n[0] x n[1] x n[2] x ...  x (n[d-1]/2 + 1) complex data, representing
4498the non-redundant half of the conjugate-symmetry output of a real-input
4499DFT (*note Multi-dimensional Transforms::).  However, the real array
4500must be stored within a padded n[0] x n[1] x n[2] x ...  x [2 (n[d-1]/2
4501+ 1)] array (much like the in-place serial r2c transforms, but here for
4502out-of-place transforms as well).  Currently, only multi-dimensional
4503('rnk > 1') r2c/c2r transforms are supported (requesting a plan for 'rnk
4504= 1' will yield 'NULL').  As explained above (*note Multi-dimensional
4505MPI DFTs of Real Data::), the data distribution of both the real and
4506complex arrays is given by the 'local_size' function called for the
4507dimensions of the _complex_ array.  Similar to the other planning
4508functions, the input and output arrays are overwritten when the plan is
4509created except in 'FFTW_ESTIMATE' mode.
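
   The padding arithmetic described above can be written out as a short
sketch.  The helper names are hypothetical, and the index function
covers only the two-dimensional case with zero-based local indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the last dimension of the complex
   array for a real transform whose last dimension is n_last. */
ptrdiff_t sketch_complex_last_dim(ptrdiff_t n_last)
{
    assert(n_last > 0);
    return n_last / 2 + 1;
}

/* Hypothetical helper: length of the last dimension of the padded real
   array, i.e. 2 * (n_last/2 + 1) real numbers per row. */
ptrdiff_t sketch_padded_real_last_dim(ptrdiff_t n_last)
{
    return 2 * sketch_complex_last_dim(n_last);
}

/* Hypothetical helper: offset of real element (i0, i1) in a local 2d
   array with n1 logical columns stored in padded rows.  The padding
   entries at the end of each row are skipped, never addressed. */
ptrdiff_t sketch_padded_real_index(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t n1)
{
    assert(i0 >= 0 && i1 >= 0 && i1 < n1);
    return i0 * sketch_padded_real_last_dim(n1) + i1;
}
```

   For 'n1 = 8' each row stores 5 complex outputs and is padded to 10
real numbers; for 'n1 = 7' each row stores 4 complex outputs and is
padded to 8 real numbers.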
4510
   As for the complex DFTs above, there is an advanced interface that
4512allows you to manually specify block sizes and to transform contiguous
4513'howmany'-tuples of real/complex numbers:
4514
4515     fftw_plan fftw_mpi_plan_many_dft_r2c
4516                   (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
4517                    ptrdiff_t iblock, ptrdiff_t oblock,
4518                    double *in, fftw_complex *out,
4519                    MPI_Comm comm, unsigned flags);
4520     fftw_plan fftw_mpi_plan_many_dft_c2r
4521                   (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
4522                    ptrdiff_t iblock, ptrdiff_t oblock,
4523                    fftw_complex *in, double *out,
4524                    MPI_Comm comm, unsigned flags);
4525
4526MPI r2r transforms
4527..................
4528
4529There are corresponding plan-creation routines for r2r transforms (*note
4530More DFTs of Real Data::), currently supporting multidimensional ('rnk >
45311') transforms only ('rnk = 1' will yield a 'NULL' plan):
4532
4533     fftw_plan fftw_mpi_plan_r2r_2d(ptrdiff_t n0, ptrdiff_t n1,
4534                                    double *in, double *out,
4535                                    MPI_Comm comm,
4536                                    fftw_r2r_kind kind0, fftw_r2r_kind kind1,
4537                                    unsigned flags);
4538     fftw_plan fftw_mpi_plan_r2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4539                                    double *in, double *out,
4540                                    MPI_Comm comm,
4541                                    fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2,
4542                                    unsigned flags);
4543     fftw_plan fftw_mpi_plan_r2r(int rnk, const ptrdiff_t *n,
4544                                 double *in, double *out,
4545                                 MPI_Comm comm, const fftw_r2r_kind *kind,
4546                                 unsigned flags);
     fftw_plan fftw_mpi_plan_many_r2r(int rnk, const ptrdiff_t *n,
                                      ptrdiff_t howmany,
                                      ptrdiff_t iblock, ptrdiff_t oblock,
                                      double *in, double *out,
                                      MPI_Comm comm, const fftw_r2r_kind *kind,
                                      unsigned flags);
4552
4553   The parameters are much the same as for the complex DFTs above,
4554except that the arrays are of real numbers (and hence the outputs of the
4555'local_size' data-distribution functions should be interpreted as counts
4556of real rather than complex numbers).  Also, the 'kind' parameters
4557specify the r2r kinds along each dimension as for the serial interface
4558(*note Real-to-Real Transform Kinds::).  *Note Other Multi-dimensional
4559Real-data MPI Transforms::.
4560
4561MPI transposition
4562.................
4563
4564FFTW also provides routines to plan a transpose of a distributed 'n0' by
4565'n1' array of real numbers, or an array of 'howmany'-tuples of real
4566numbers with specified block sizes (*note FFTW MPI Transposes::):
4567
4568     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
4569                                       double *in, double *out,
4570                                       MPI_Comm comm, unsigned flags);
4571     fftw_plan fftw_mpi_plan_many_transpose
4572                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
4573                      ptrdiff_t block0, ptrdiff_t block1,
4574                      double *in, double *out, MPI_Comm comm, unsigned flags);
4575
4576   These plans are used with the 'fftw_mpi_execute_r2r' new-array
4577execute function (*note Using MPI Plans::), since they count as (rank
4578zero) r2r plans from FFTW's perspective.
4579
4580
4581File: fftw3.info,  Node: MPI Wisdom Communication,  Prev: MPI Plan Creation,  Up: FFTW MPI Reference
4582
45836.12.6 MPI Wisdom Communication
4584-------------------------------
4585
4586To facilitate synchronizing wisdom among the different MPI processes, we
4587provide two functions:
4588
4589     void fftw_mpi_gather_wisdom(MPI_Comm comm);
4590     void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
4591
4592   The 'fftw_mpi_gather_wisdom' function gathers all wisdom in the given
4593communicator 'comm' to the process of rank 0 in the communicator: that
4594process obtains the union of all wisdom on all the processes.  As a side
4595effect, some other processes will gain additional wisdom from other
4596processes, but only process 0 will gain the complete union.
4597
   The 'fftw_mpi_broadcast_wisdom' function does the reverse: it exports
wisdom from process 0 in 'comm' to all other processes in the
communicator, replacing any wisdom they currently have.
4601
4602   *Note FFTW MPI Wisdom::.
4603
4604
4605File: fftw3.info,  Node: FFTW MPI Fortran Interface,  Prev: FFTW MPI Reference,  Up: Distributed-memory FFTW with MPI
4606
46076.13 FFTW MPI Fortran Interface
4608===============================
4609
4610The FFTW MPI interface is callable from modern Fortran compilers
4611supporting the Fortran 2003 'iso_c_binding' standard for calling C
4612functions.  As described in *note Calling FFTW from Modern Fortran::,
4613this means that you can directly call FFTW's C interface from Fortran
4614with only minor changes in syntax.  There are, however, a few things
4615specific to the MPI interface to keep in mind:
4616
4617   * Instead of including 'fftw3.f03' as in *note Overview of Fortran
4618     interface::, you should 'include 'fftw3-mpi.f03'' (after 'use,
4619     intrinsic :: iso_c_binding' as before).  The 'fftw3-mpi.f03' file
4620     includes 'fftw3.f03', so you should _not_ 'include' them both
     yourself.  (You will also want to include the MPI header file,
     usually via 'include 'mpif.h'' or similar, although this is not
     needed by 'fftw3-mpi.f03' per se.)  (To use the 'fftwl_' 'long
     double' extended-precision routines in supporting compilers, you
     should include 'fftw3l-mpi.f03' in _addition_ to 'fftw3-mpi.f03'.
     *Note Extended and quadruple precision in Fortran::.)
4627
4628   * Because of the different storage conventions between C and Fortran,
4629     you reverse the order of your array dimensions when passing them to
4630     FFTW (*note Reversing array dimensions::).  This is merely a
4631     difference in notation and incurs no performance overhead.
4632     However, it means that, whereas in C the _first_ dimension is
4633     distributed, in Fortran the _last_ dimension of your array is
4634     distributed.
4635
4636   * In Fortran, communicators are stored as 'integer' types; there is
4637     no 'MPI_Comm' type, nor is there any way to access a C 'MPI_Comm'.
4638     Fortunately, this is taken care of for you by the FFTW Fortran
4639     interface: whenever the C interface expects an 'MPI_Comm' type, you
4640     should pass the Fortran communicator as an 'integer'.(1)
4641
4642   * Because you need to call the 'local_size' function to find out how
4643     much space to allocate, and this may be _larger_ than the local
4644     portion of the array (*note MPI Data Distribution::), you should
4645     _always_ allocate your arrays dynamically using FFTW's allocation
4646     routines as described in *note Allocating aligned memory in
4647     Fortran::.  (Coincidentally, this also provides the best
     performance by guaranteeing proper data alignment.)
4649
4650   * Because all sizes in the MPI FFTW interface are declared as
4651     'ptrdiff_t' in C, you should use 'integer(C_INTPTR_T)' in Fortran
4652     (*note FFTW Fortran type reference::).
4653
4654   * In Fortran, because of the language semantics, we generally
4655     recommend using the new-array execute functions for all plans, even
4656     in the common case where you are executing the plan on the same
4657     arrays for which the plan was created (*note Plan execution in
4658     Fortran::).  However, note that in the MPI interface these
4659     functions are changed: 'fftw_execute_dft' becomes
4660     'fftw_mpi_execute_dft', etcetera.  *Note Using MPI Plans::.
4661
4662   For example, here is a Fortran code snippet to perform a distributed
4663L x M complex DFT in-place.  (This assumes you have already initialized
4664MPI with 'MPI_init' and have also performed 'call fftw_mpi_init'.)
4665
4666       use, intrinsic :: iso_c_binding
4667       include 'fftw3-mpi.f03'
4668       integer(C_INTPTR_T), parameter :: L = ...
4669       integer(C_INTPTR_T), parameter :: M = ...
4670       type(C_PTR) :: plan, cdata
4671       complex(C_DOUBLE_COMPLEX), pointer :: data(:,:)
4672       integer(C_INTPTR_T) :: i, j, alloc_local, local_M, local_j_offset
4673
4674     !   get local data size and allocate (note dimension reversal)
4675       alloc_local = fftw_mpi_local_size_2d(M, L, MPI_COMM_WORLD, &
4676                                            local_M, local_j_offset)
4677       cdata = fftw_alloc_complex(alloc_local)
4678       call c_f_pointer(cdata, data, [L,local_M])
4679
4680     !   create MPI plan for in-place forward DFT (note dimension reversal)
4681       plan = fftw_mpi_plan_dft_2d(M, L, data, data, MPI_COMM_WORLD, &
4682                                   FFTW_FORWARD, FFTW_MEASURE)
4683
4684     ! initialize data to some function my_function(i,j)
4685       do j = 1, local_M
4686         do i = 1, L
4687           data(i, j) = my_function(i, j + local_j_offset)
4688         end do
4689       end do
4690
4691     ! compute transform (as many times as desired)
4692       call fftw_mpi_execute_dft(plan, data, data)
4693
4694       call fftw_destroy_plan(plan)
4695       call fftw_free(cdata)
4696
   Note that we called 'fftw_mpi_local_size_2d' and
'fftw_mpi_plan_dft_2d' with the dimensions in reversed order, since an L
x M Fortran array is viewed by FFTW in C as an M x L array.  This means
4700that the array was distributed over the 'M' dimension, the local portion
4701of which is a L x local_M array in Fortran.  (You must _not_ use an
4702'allocate' statement to allocate an L x local_M array, however; you must
4703allocate 'alloc_local' complex numbers, which may be greater than 'L *
4704local_M', in order to reserve space for intermediate steps of the
4705transform.)  Finally, we mention that because C's array indices are
4706zero-based, the 'local_j_offset' argument can conveniently be
4707interpreted as an offset in the 1-based 'j' index (rather than as a
4708starting index as in C).
4709
4710   If instead you had used the 'ior(FFTW_MEASURE,
4711FFTW_MPI_TRANSPOSED_OUT)' flag, the output of the transform would be a
4712transposed M x local_L array, associated with the _same_ 'cdata'
4713allocation (since the transform is in-place), and which you could
4714declare with:
4715
4716       complex(C_DOUBLE_COMPLEX), pointer :: tdata(:,:)
4717       ...
4718       call c_f_pointer(cdata, tdata, [M,local_L])
4719
4720   where 'local_L' would have been obtained by changing the
4721'fftw_mpi_local_size_2d' call to:
4722
4723       alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
4724                                local_M, local_j_offset, local_L, local_i_offset)
4725
4726   ---------- Footnotes ----------
4727
4728   (1) Technically, this is because you aren't actually calling the C
4729functions directly.  You are calling wrapper functions that translate
4730the communicator with 'MPI_Comm_f2c' before calling the ordinary C
4731interface.  This is all done transparently, however, since the
4732'fftw3-mpi.f03' interface file renames the wrappers so that they are
4733called in Fortran with the same names as the C interface functions.
4734
4735
4736File: fftw3.info,  Node: Calling FFTW from Modern Fortran,  Next: Calling FFTW from Legacy Fortran,  Prev: Distributed-memory FFTW with MPI,  Up: Top
4737
47387 Calling FFTW from Modern Fortran
4739**********************************
4740
4741Fortran 2003 standardized ways for Fortran code to call C libraries, and
4742this allows us to support a direct translation of the FFTW C API into
4743Fortran.  Compared to the legacy Fortran 77 interface (*note Calling
4744FFTW from Legacy Fortran::), this direct interface offers many
4745advantages, especially compile-time type-checking and aligned memory
4746allocation.  As of this writing, support for these C interoperability
4747features seems widespread, having been implemented in nearly all major
4748Fortran compilers (e.g.  GNU, Intel, IBM, Oracle/Solaris, Portland
4749Group, NAG).
4750
4751   This chapter documents that interface.  For the most part, since this
4752interface allows Fortran to call the C interface directly, the usage is
4753identical to C translated to Fortran syntax.  However, there are a few
4754subtle points such as memory allocation, wisdom, and data types that
4755deserve closer attention.
4756
4757* Menu:
4758
4759* Overview of Fortran interface::
4760* Reversing array dimensions::
4761* FFTW Fortran type reference::
4762* Plan execution in Fortran::
4763* Allocating aligned memory in Fortran::
4764* Accessing the wisdom API from Fortran::
4765* Defining an FFTW module::
4766
4767
4768File: fftw3.info,  Node: Overview of Fortran interface,  Next: Reversing array dimensions,  Prev: Calling FFTW from Modern Fortran,  Up: Calling FFTW from Modern Fortran
4769
47707.1 Overview of Fortran interface
4771=================================
4772
FFTW provides a file 'fftw3.f03' that defines Fortran 2003 interfaces
for all of its C routines except the MPI routines, which are described
elsewhere (*note FFTW MPI Fortran Interface::).  The 'fftw3.f03' file
can be found in the same directory as 'fftw3.h' (the C header file).
In any Fortran subroutine where you want to use FFTW functions, you
should begin with:
4778
4779       use, intrinsic :: iso_c_binding
4780       include 'fftw3.f03'
4781
4782   This includes the interface definitions and the standard
4783'iso_c_binding' module (which defines the equivalents of C types).  You
4784can also put the FFTW functions into a module if you prefer (*note
4785Defining an FFTW module::).
4786
4787   At this point, you can now call anything in the FFTW C interface
4788directly, almost exactly as in C other than minor changes in syntax.
4789For example:
4790
4791       type(C_PTR) :: plan
4792       complex(C_DOUBLE_COMPLEX), dimension(1024,1000) :: in, out
4793       plan = fftw_plan_dft_2d(1000,1024, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
4794       ...
4795       call fftw_execute_dft(plan, in, out)
4796       ...
4797       call fftw_destroy_plan(plan)
4798
4799   A few important things to keep in mind are:
4800
4801   * FFTW plans are 'type(C_PTR)'.  Other C types are mapped in the
4802     obvious way via the 'iso_c_binding' standard: 'int' turns into
4803     'integer(C_INT)', 'fftw_complex' turns into
4804     'complex(C_DOUBLE_COMPLEX)', 'double' turns into 'real(C_DOUBLE)',
4805     and so on.  *Note FFTW Fortran type reference::.
4806
4807   * Functions in C become functions in Fortran if they have a return
4808     value, and subroutines in Fortran otherwise.
4809
4810   * The ordering of the Fortran array dimensions must be _reversed_
4811     when they are passed to the FFTW plan creation, thanks to
4812     differences in array indexing conventions (*note Multi-dimensional
4813     Array Format::).  This is _unlike_ the legacy Fortran interface
4814     (*note Fortran-interface routines::), which reversed the dimensions
4815     for you.  *Note Reversing array dimensions::.
4816
4817   * Using ordinary Fortran array declarations like this works, but may
     yield suboptimal performance because the data may not be
     aligned to exploit SIMD instructions on modern processors (*note
4820     SIMD alignment and fftw_malloc::).  Better performance will often
4821     be obtained by allocating with 'fftw_alloc'.  *Note Allocating
4822     aligned memory in Fortran::.
4823
4824   * Similar to the legacy Fortran interface (*note FFTW Execution in
4825     Fortran::), we currently recommend _not_ using 'fftw_execute' but
4826     rather using the more specialized functions like 'fftw_execute_dft'
4827     (*note New-array Execute Functions::).  However, you should execute
4828     the plan on the 'same arrays' as the ones for which you created the
4829     plan, unless you are especially careful.  *Note Plan execution in
4830     Fortran::.  To prevent you from using 'fftw_execute' by mistake,
4831     the 'fftw3.f03' file does not provide an 'fftw_execute' interface
4832     declaration.
4833
4834   * Multiple planner flags are combined with 'ior' (equivalent to '|'
     in C); e.g., 'FFTW_MEASURE | FFTW_DESTROY_INPUT' becomes
4836     'ior(FFTW_MEASURE, FFTW_DESTROY_INPUT)'.  (You can also use '+' as
4837     long as you don't try to include a given flag more than once.)
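
   The caveat about combining flags with '+' can be seen in a small C
sketch.  The two flag values below are hypothetical single-bit
constants chosen for illustration; they are not FFTW's actual flag
values, and the function names are likewise made up.

```c
#include <assert.h>

/* Hypothetical single-bit flags (NOT FFTW's real constants). */
#define SKETCH_FLAG_A (1u << 0)
#define SKETCH_FLAG_B (1u << 3)

/* Combine flags with bitwise OR: repeating a flag is harmless. */
unsigned sketch_combine_or(const unsigned *flags, int count)
{
    unsigned result = 0;
    assert(count >= 0);
    for (int i = 0; i < count; ++i)
        result |= flags[i];
    return result;
}

/* Combine flags by addition: correct only if no flag is repeated. */
unsigned sketch_combine_add(const unsigned *flags, int count)
{
    unsigned result = 0;
    assert(count >= 0);
    for (int i = 0; i < count; ++i)
        result += flags[i];
    return result;
}
```

   For distinct single-bit flags the two functions agree.  If the same
flag appears twice, the sum carries into a neighboring bit and no longer
equals the OR, which is why 'ior' is the safer choice in Fortran.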
4838
4839* Menu:
4840
4841* Extended and quadruple precision in Fortran::
4842
4843
4844File: fftw3.info,  Node: Extended and quadruple precision in Fortran,  Prev: Overview of Fortran interface,  Up: Overview of Fortran interface
4845
48467.1.1 Extended and quadruple precision in Fortran
4847-------------------------------------------------
4848
4849If FFTW is compiled in 'long double' (extended) precision (*note
4850Installation and Customization::), you may be able to call the resulting
4851'fftwl_' routines (*note Precision::) from Fortran if your compiler
4852supports the 'C_LONG_DOUBLE_COMPLEX' type code.
4853
4854   Because some Fortran compilers do not support
4855'C_LONG_DOUBLE_COMPLEX', the 'fftwl_' declarations are segregated into a
4856separate interface file 'fftw3l.f03', which you should include _in
4857addition_ to 'fftw3.f03' (which declares precision-independent 'FFTW_'
4858constants):
4859
4860       use, intrinsic :: iso_c_binding
4861       include 'fftw3.f03'
4862       include 'fftw3l.f03'
4863
4864   We also support using the nonstandard '__float128'
4865quadruple-precision type provided by recent versions of 'gcc' on 32- and
486664-bit x86 hardware (*note Installation and Customization::), using the
4867corresponding 'real(16)' and 'complex(16)' types supported by
4868'gfortran'.  The quadruple-precision 'fftwq_' functions (*note
4869Precision::) are declared in a 'fftw3q.f03' interface file, which should
4870be included in addition to 'fftw3.f03', as above.  You should also link
4871with '-lfftw3q -lquadmath -lm' as in C.
4872
4873
4874File: fftw3.info,  Node: Reversing array dimensions,  Next: FFTW Fortran type reference,  Prev: Overview of Fortran interface,  Up: Calling FFTW from Modern Fortran
4875
48767.2 Reversing array dimensions
4877==============================
4878
4879A minor annoyance in calling FFTW from Fortran is that FFTW's array
4880dimensions are defined in the C convention (row-major order), while
4881Fortran's array dimensions are the opposite convention (column-major
4882order).  *Note Multi-dimensional Array Format::.  This is just a
4883bookkeeping difference, with no effect on performance.  The only
4884consequence of this is that, whenever you create an FFTW plan for a
4885multi-dimensional transform, you must always _reverse the ordering of
4886the dimensions_.
4887
4888   For example, consider the three-dimensional (L x M x N ) arrays:
4889
4890       complex(C_DOUBLE_COMPLEX), dimension(L,M,N) :: in, out
4891
4892   To plan a DFT for these arrays using 'fftw_plan_dft_3d', you could
4893do:
4894
4895       plan = fftw_plan_dft_3d(N,M,L, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
4896
4897   That is, from FFTW's perspective this is a N x M x L array.  _No data
4898transposition need occur_, as this is _only notation_.  Similarly, to
4899use the more generic routine 'fftw_plan_dft' with the same arrays, you
4900could do:
4901
       ! named 'dims' because Fortran is case-insensitive ('n' would clash with 'N')
       integer(C_INT), dimension(3) :: dims = [N,M,L]
       plan = fftw_plan_dft(3, dims, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
4904
4905   Note, by the way, that this is different from the legacy Fortran
4906interface (*note Fortran-interface routines::), which automatically
4907reverses the order of the array dimension for you.  Here, you are
4908calling the C interface directly, so there is no "translation" layer.
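
   The claim that reversing the dimensions is only a change of notation
can be checked with a little index arithmetic.  In the following
sketch all function names are hypothetical and all indices are
zero-based: the column-major offset of element (i, j, k) in an L x M x N
array equals the row-major offset of element (k, j, i) in an N x M x L
array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reverse an array of rnk dimensions in place,
   e.g. {L, M, N} becomes {N, M, L}. */
void sketch_reverse_dims(int rnk, ptrdiff_t *dims)
{
    assert(rnk >= 0);
    for (int a = 0, b = rnk - 1; a < b; ++a, --b) {
        ptrdiff_t tmp = dims[a];
        dims[a] = dims[b];
        dims[b] = tmp;
    }
}

/* Column-major (Fortran-style) offset of zero-based (i, j, k) in an
   L x M x N array; N does not enter the formula. */
ptrdiff_t sketch_colmajor_index(ptrdiff_t i, ptrdiff_t j, ptrdiff_t k,
                                ptrdiff_t L, ptrdiff_t M)
{
    return i + L * (j + M * k);
}

/* Row-major (C-style) offset of zero-based (a, b, c) in an
   n0 x n1 x n2 array; n0 does not enter the formula. */
ptrdiff_t sketch_rowmajor_index(ptrdiff_t a, ptrdiff_t b, ptrdiff_t c,
                                ptrdiff_t n1, ptrdiff_t n2)
{
    return (a * n1 + b) * n2 + c;
}
```

   For example, with L = 4, M = 3, N = 2, element (1, 2, 1) lies at
offset 21 under both views, so the same memory can be described either
way without moving any data.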
4909
4910   An important thing to keep in mind is the implication of this for
4911multidimensional real-to-complex transforms (*note Multi-Dimensional
4912DFTs of Real Data::).  In C, a multidimensional real-to-complex DFT
chops the last dimension roughly in half (N x M x L real input goes to N
x M x (L/2+1) complex output).  In Fortran, because the array dimension
4915notation is reversed, the _first_ dimension of the complex data is
4916chopped roughly in half.  For example consider the 'r2c' transform of L
4917x M x N real input in Fortran:
4918
4919       type(C_PTR) :: plan
4920       real(C_DOUBLE), dimension(L,M,N) :: in
4921       complex(C_DOUBLE_COMPLEX), dimension(L/2+1,M,N) :: out
4922       plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
4923       ...
4924       call fftw_execute_dft_r2c(plan, in, out)
4925
4926   Alternatively, for an in-place r2c transform, as described in the C
4927documentation we must _pad_ the _first_ dimension of the real input with
4928an extra two entries (which are ignored by FFTW) so as to leave enough
4929space for the complex output.  The input is _allocated_ as a 2[L/2+1] x
4930M x N array, even though only L x M x N of it is actually used.  In this
4931example, we will allocate the array as a pointer type, using
4932'fftw_alloc' to ensure aligned memory for maximum performance (*note
4933Allocating aligned memory in Fortran::); this also makes it easy to
4934reference the same memory as both a real array and a complex array.
4935
4936       real(C_DOUBLE), pointer :: in(:,:,:)
4937       complex(C_DOUBLE_COMPLEX), pointer :: out(:,:,:)
4938       type(C_PTR) :: plan, data
4939       data = fftw_alloc_complex(int((L/2+1) * M * N, C_SIZE_T))
4940       call c_f_pointer(data, in, [2*(L/2+1),M,N])
4941       call c_f_pointer(data, out, [L/2+1,M,N])
4942       plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
4943       ...
4944       call fftw_execute_dft_r2c(plan, in, out)
4945       ...
4946       call fftw_destroy_plan(plan)
4947       call fftw_free(data)
4948
4949
4950File: fftw3.info,  Node: FFTW Fortran type reference,  Next: Plan execution in Fortran,  Prev: Reversing array dimensions,  Up: Calling FFTW from Modern Fortran
4951
49527.3 FFTW Fortran type reference
4953===============================
4954
4955The following are the most important type correspondences between the C
4956interface and Fortran:
4957
4958   * Plans ('fftw_plan' and variants) are 'type(C_PTR)' (i.e.  an opaque
4959     pointer).
4960
4961   * The C floating-point types 'double', 'float', and 'long double'
4962     correspond to 'real(C_DOUBLE)', 'real(C_FLOAT)', and
4963     'real(C_LONG_DOUBLE)', respectively.  The C complex types
4964     'fftw_complex', 'fftwf_complex', and 'fftwl_complex' correspond in
4965     Fortran to 'complex(C_DOUBLE_COMPLEX)', 'complex(C_FLOAT_COMPLEX)',
4966     and 'complex(C_LONG_DOUBLE_COMPLEX)', respectively.  Just as in C
4967     (*note Precision::), the FFTW subroutines and types are prefixed
4968     with 'fftw_', 'fftwf_', and 'fftwl_' for the different precisions,
4969     and link to different libraries ('-lfftw3', '-lfftw3f', and
4970     '-lfftw3l' on Unix), but use the _same_ include file 'fftw3.f03'
4971     and the _same_ constants (all of which begin with 'FFTW_').  The
4972     exception is 'long double' precision, for which you should _also_
4973     include 'fftw3l.f03' (*note Extended and quadruple precision in
4974     Fortran::).
4975
4976   * The C integer types 'int' and 'unsigned' (used for planner flags)
4977     become 'integer(C_INT)'.  The C integer type 'ptrdiff_t' (e.g.  in
4978     the *note 64-bit Guru Interface::) becomes 'integer(C_INTPTR_T)',
4979     and 'size_t' (in 'fftw_malloc' etc.)  becomes 'integer(C_SIZE_T)'.
4980
4981   * The 'fftw_r2r_kind' type (*note Real-to-Real Transform Kinds::)
4982     becomes 'integer(C_FFTW_R2R_KIND)'.  The various constant values of
4983     the C enumerated type ('FFTW_R2HC' etc.)  become simply integer
4984     constants of the same names in Fortran.
4985
4986   * Numeric array pointer arguments (e.g.  'double *') become
4987     'dimension(*), intent(out)' arrays of the same type, or
4988     'dimension(*), intent(in)' if they are pointers to constant data
4989     (e.g.  'const int *').  There are a few exceptions where numeric
4990     pointers refer to scalar outputs (e.g.  for 'fftw_flops'), in which
4991     case they are 'intent(out)' scalar arguments in Fortran too.  For
4992     the new-array execute functions (*note New-array Execute
4993     Functions::), the input arrays are declared 'dimension(*),
4994     intent(inout)', since they can be modified in the case of in-place
4995     or 'FFTW_DESTROY_INPUT' transforms.
4996
   * Pointer _return_ values (e.g.  'double *') become 'type(C_PTR)'.
     (If they are pointers to arrays, as for 'fftw_alloc_real', you can
     convert them back to Fortran array pointers with the standard
     intrinsic subroutine 'c_f_pointer'.)
5001
5002   * The 'fftw_iodim' type in the guru interface (*note Guru vector and
5003     transform sizes::) becomes 'type(fftw_iodim)' in Fortran, a derived
5004     data type (the Fortran analogue of C's 'struct') with three
5005     'integer(C_INT)' components: 'n', 'is', and 'os', with the same
5006     meanings as in C. The 'fftw_iodim64' type in the 64-bit guru
5007     interface (*note 64-bit Guru Interface::) is the same, except that
5008     its components are of type 'integer(C_INTPTR_T)'.
5009
5010   * Using the wisdom import/export functions from Fortran is a bit
5011     tricky, and is discussed in *note Accessing the wisdom API from
5012     Fortran::.  In brief, the 'FILE *' arguments map to 'type(C_PTR)',
5013     'const char *' to 'character(C_CHAR), dimension(*), intent(in)'
5014     (null-terminated!), and the generic read-char/write-char functions
5015     map to 'type(C_FUNPTR)'.
5016
5017   You may be wondering if you need to search-and-replace
5018'real(kind(0.0d0))' (or whatever your favorite Fortran spelling of
5019"double precision" is) with 'real(C_DOUBLE)' everywhere in your program,
5020and similarly for 'complex' and 'integer' types.  The answer is no; you
5021can still use your existing types.  As long as these types match their C
5022counterparts, things should work without a hitch.  The worst that can
5023happen, e.g.  in the (unlikely) event of a system where
5024'real(kind(0.0d0))' is different from 'real(C_DOUBLE)', is that the
5025compiler will give you a type-mismatch error.  That is, if you don't use
5026the 'iso_c_binding' kinds you need to accept at least the theoretical
5027possibility of having to change your code in response to compiler errors
5028on some future machine, but you don't need to worry about silently
5029compiling incorrect code that yields runtime errors.
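
   As a minimal sketch of these correspondences (the program name and
the size 'n' are arbitrary choices for illustration, not anything
defined by FFTW), the declarations for a one-dimensional complex DFT
might look like this:

       program type_sketch
         use, intrinsic :: iso_c_binding
         implicit none
         include 'fftw3.f03'
         integer(C_INT), parameter :: n = 16         ! C 'int'
         type(C_PTR) :: plan                         ! C 'fftw_plan'
         complex(C_DOUBLE_COMPLEX) :: in(n), out(n)  ! C 'fftw_complex'

         ! Integer arguments are integer(C_INT); the plan is a C pointer.
         plan = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE)
         in = (1.0_C_DOUBLE, 0.0_C_DOUBLE)
         call fftw_execute_dft(plan, in, out)
         call fftw_destroy_plan(plan)
       end program type_sketch

   Writing 'complex(kind(0.0d0))' in place of
'complex(C_DOUBLE_COMPLEX)' should compile equally well on any system
where the two kinds coincide.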
5030
5031
5032File: fftw3.info,  Node: Plan execution in Fortran,  Next: Allocating aligned memory in Fortran,  Prev: FFTW Fortran type reference,  Up: Calling FFTW from Modern Fortran
5033
50347.4 Plan execution in Fortran
5035=============================
5036
5037In C, in order to use a plan, one normally calls 'fftw_execute', which
5038executes the plan to perform the transform on the input/output arrays
5039passed when the plan was created (*note Using Plans::).  The
5040corresponding subroutine call in modern Fortran is:
5041      call fftw_execute(plan)
5042
5043   However, we have had reports that this causes problems with some
5044recent optimizing Fortran compilers.  The problem is, because the
5045input/output arrays are not passed as explicit arguments to
5046'fftw_execute', the semantics of Fortran (unlike C) allow the compiler
5047to assume that the input/output arrays are not changed by
5048'fftw_execute'.  As a consequence, certain compilers end up
5049repositioning the call to 'fftw_execute', assuming incorrectly that it
5050does nothing to the arrays.
5051
5052   There are various workarounds to this, but the safest and simplest
5053thing is to not use 'fftw_execute' in Fortran.  Instead, use the
5054functions described in *note New-array Execute Functions::, which take
5055the input/output arrays as explicit arguments.  For example, if the plan
5056is for a complex-data DFT and was created for the arrays 'in' and 'out',
5057you would do:
5058      call fftw_execute_dft(plan, in, out)
5059
5060   There are a few things to be careful of, however:
5061
5062   * You must use the correct type of execute function, matching the way
5063     the plan was created.  Complex DFT plans should use
     'fftw_execute_dft', real-input (r2c) DFT plans should use
5065     'fftw_execute_dft_r2c', and real-output (c2r) DFT plans should use
5066     'fftw_execute_dft_c2r'.  The various r2r plans should use
5067     'fftw_execute_r2r'.  Fortunately, if you use the wrong one you will
5068     get a compile-time type-mismatch error (unlike legacy Fortran).
5069
5070   * You should normally pass the same input/output arrays that were
5071     used when creating the plan.  This is always safe.
5072
5073   * _If_ you pass _different_ input/output arrays compared to those
5074     used when creating the plan, you must abide by all the restrictions
5075     of the new-array execute functions (*note New-array Execute
5076     Functions::).  The most tricky of these is the requirement that the
5077     new arrays have the same alignment as the original arrays; the best
5078     (and possibly only) way to guarantee this is to use the
5079     'fftw_alloc' functions to allocate your arrays (*note Allocating
5080     aligned memory in Fortran::).  Alternatively, you can use the
5081     'FFTW_UNALIGNED' flag when creating the plan, in which case the
5082     plan does not depend on the alignment, but this may sacrifice
5083     substantial performance on architectures (like x86) with SIMD
5084     instructions (*note SIMD alignment and fftw_malloc::).
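
   The following sketch illustrates the last point, assuming that every
array is obtained from 'fftw_alloc_complex' (*note Allocating aligned
memory in Fortran::).  The program name, the array names, and the size
'n' are hypothetical.  A plan created for one pair of arrays is applied
to a second pair of the same size and alignment:

       program new_array_sketch
         use, intrinsic :: iso_c_binding
         implicit none
         include 'fftw3.f03'
         integer(C_INT), parameter :: n = 1024
         type(C_PTR) :: plan, pa, pb, pc, pd
         complex(C_DOUBLE_COMPLEX), pointer :: a(:), b(:), c(:), d(:)

         ! All four arrays come from FFTW's allocator, so they share
         ! the same alignment.
         pa = fftw_alloc_complex(int(n, C_SIZE_T))
         pb = fftw_alloc_complex(int(n, C_SIZE_T))
         pc = fftw_alloc_complex(int(n, C_SIZE_T))
         pd = fftw_alloc_complex(int(n, C_SIZE_T))
         call c_f_pointer(pa, a, [n])
         call c_f_pointer(pb, b, [n])
         call c_f_pointer(pc, c, [n])
         call c_f_pointer(pd, d, [n])

         plan = fftw_plan_dft_1d(n, a, b, FFTW_FORWARD, FFTW_MEASURE)

         ! FFTW_MEASURE overwrites the arrays during planning, so fill
         ! the inputs only after the plan exists.
         a = (1.0_C_DOUBLE, 0.0_C_DOUBLE)
         c = (0.0_C_DOUBLE, 1.0_C_DOUBLE)

         call fftw_execute_dft(plan, a, b)  ! arrays used for planning
         call fftw_execute_dft(plan, c, d)  ! different arrays, same plan

         call fftw_destroy_plan(plan)
         call fftw_free(pa)
         call fftw_free(pb)
         call fftw_free(pc)
         call fftw_free(pd)
       end program new_array_sketch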
5085
5086
5087File: fftw3.info,  Node: Allocating aligned memory in Fortran,  Next: Accessing the wisdom API from Fortran,  Prev: Plan execution in Fortran,  Up: Calling FFTW from Modern Fortran
5088
50897.5 Allocating aligned memory in Fortran
5090========================================
5091
5092In order to obtain maximum performance in FFTW, you should store your
5093data in arrays that have been specially aligned in memory (*note SIMD
5094alignment and fftw_malloc::).  Enforcing alignment also permits you to
5095safely use the new-array execute functions (*note New-array Execute
5096Functions::) to apply a given plan to more than one pair of in/out
5097arrays.  Unfortunately, standard Fortran arrays do _not_ provide any
5098alignment guarantees.  The _only_ way to allocate aligned memory in
5099standard Fortran is to allocate it with an external C function, like the
5100'fftw_alloc_real' and 'fftw_alloc_complex' functions.  Fortunately,
5101Fortran 2003 provides a simple way to associate such allocated memory
5102with a standard Fortran array pointer that you can then use normally.
5103
5104   We therefore recommend allocating all your input/output arrays using
5105the following technique:
5106
  1. Declare a 'pointer', 'arr', to your array of the desired type and
     dimensions.  For example, 'real(C_DOUBLE), pointer :: arr(:,:)' for
     a 2d real array, or 'complex(C_DOUBLE_COMPLEX), pointer ::
     arr(:,:,:)' for a 3d complex array.
5111
5112  2. The number of elements to allocate must be an 'integer(C_SIZE_T)'.
5113     You can either declare a variable of this type, e.g.
5114     'integer(C_SIZE_T) :: sz', to store the number of elements to
5115     allocate, or you can use the 'int(..., C_SIZE_T)' intrinsic
     function.  For example, set 'sz = L * M * N' or use 'int(L * M * N,
5117     C_SIZE_T)' for an L x M x N array.
5118
5119  3. Declare a 'type(C_PTR) :: p' to hold the return value from FFTW's
5120     allocation routine.  Set 'p = fftw_alloc_real(sz)' for a real
5121     array, or 'p = fftw_alloc_complex(sz)' for a complex array.
5122
  4. Associate your pointer 'arr' with the allocated memory 'p' using
     the standard 'c_f_pointer' subroutine: 'call c_f_pointer(p, arr,
     [...dimensions...])', where '[...dimensions...]' is an array of the
     dimensions of the array (in the usual Fortran order).  For example,
     use 'call c_f_pointer(p, arr, [L,M,N])' for an L x M x N array.
     (The dimensions argument is required whenever 'arr' is an array
     pointer; the Fortran standard permits omitting it only for a
     scalar pointer.)  You can now use 'arr' as a usual
     multidimensional array.
5131
  5. When you are done using the array, deallocate the memory with
     'call fftw_free(p)'.
5134
5135   For example, here is how we would allocate an L x M 2d real array:
5136
5137       real(C_DOUBLE), pointer :: arr(:,:)
5138       type(C_PTR) :: p
5139       p = fftw_alloc_real(int(L * M, C_SIZE_T))
5140       call c_f_pointer(p, arr, [L,M])
5141       _...use arr and arr(i,j) as usual..._
5142       call fftw_free(p)
5143
5144   and here is an L x M x N 3d complex array:
5145
5146       complex(C_DOUBLE_COMPLEX), pointer :: arr(:,:,:)
5147       type(C_PTR) :: p
5148       p = fftw_alloc_complex(int(L * M * N, C_SIZE_T))
5149       call c_f_pointer(p, arr, [L,M,N])
5150       _...use arr and arr(i,j,k) as usual..._
5151       call fftw_free(p)
5152
5153   See *note Reversing array dimensions:: for an example allocating a
5154single array and associating both real and complex array pointers with
5155it, for in-place real-to-complex transforms.
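
   Putting the five steps together with a plan, an out-of-place
real-to-complex transform of an L x M array might be sketched as
follows.  The program name and the sizes are hypothetical; note that
the dimensions are passed to the planner in reversed order, as
explained in *note Reversing array dimensions::.

       program aligned_r2c_sketch
         use, intrinsic :: iso_c_binding
         implicit none
         include 'fftw3.f03'
         integer(C_INT), parameter :: L = 8, M = 6
         real(C_DOUBLE), pointer :: in(:,:)
         complex(C_DOUBLE_COMPLEX), pointer :: out(:,:)
         type(C_PTR) :: pin, pout, plan

         ! Steps 2-3: aligned allocation through FFTW.
         pin = fftw_alloc_real(int(L * M, C_SIZE_T))
         pout = fftw_alloc_complex(int((L/2 + 1) * M, C_SIZE_T))

         ! Step 4: view the C memory as Fortran arrays.
         call c_f_pointer(pin, in, [L, M])
         call c_f_pointer(pout, out, [L/2 + 1, M])

         ! The first (Fortran) dimension of the output is halved.
         plan = fftw_plan_dft_r2c_2d(M, L, in, out, FFTW_ESTIMATE)
         in = 1.0_C_DOUBLE
         call fftw_execute_dft_r2c(plan, in, out)

         ! Step 5: release the plan and the memory.
         call fftw_destroy_plan(plan)
         call fftw_free(pin)
         call fftw_free(pout)
       end program aligned_r2c_sketch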
5156
5157
5158File: fftw3.info,  Node: Accessing the wisdom API from Fortran,  Next: Defining an FFTW module,  Prev: Allocating aligned memory in Fortran,  Up: Calling FFTW from Modern Fortran
5159
51607.6 Accessing the wisdom API from Fortran
5161=========================================
5162
5163As explained in *note Words of Wisdom-Saving Plans::, FFTW provides a
5164"wisdom" API for saving plans to disk so that they can be recreated
5165quickly.  The C API for exporting (*note Wisdom Export::) and importing
5166(*note Wisdom Import::) wisdom is somewhat tricky to use from Fortran,
5167however, because of differences in file I/O and string types between C
5168and Fortran.
5169
5170* Menu:
5171
5172* Wisdom File Export/Import from Fortran::
5173* Wisdom String Export/Import from Fortran::
5174* Wisdom Generic Export/Import from Fortran::
5175
5176
5177File: fftw3.info,  Node: Wisdom File Export/Import from Fortran,  Next: Wisdom String Export/Import from Fortran,  Prev: Accessing the wisdom API from Fortran,  Up: Accessing the wisdom API from Fortran
5178
51797.6.1 Wisdom File Export/Import from Fortran
5180--------------------------------------------
5181
The easiest way to export and import wisdom is to do so using
'fftw_export_wisdom_to_filename' and 'fftw_import_wisdom_from_filename'.
The only trick is that these require you to pass a C string, which is an
array of type 'CHARACTER(C_CHAR)' that is terminated by 'C_NULL_CHAR'.
You can call them like this:
5187
5188       integer(C_INT) :: ret
5189       ret = fftw_export_wisdom_to_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
5190       if (ret .eq. 0) stop 'error exporting wisdom to file'
5191       ret = fftw_import_wisdom_from_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
5192       if (ret .eq. 0) stop 'error importing wisdom from file'
5193
5194   Note that prepending 'C_CHAR_' is needed to specify that the literal
5195string is of kind 'C_CHAR', and we null-terminate the string by
5196appending '// C_NULL_CHAR'.  These functions return an 'integer(C_INT)'
5197('ret') which is '0' if an error occurred during export/import and
5198nonzero otherwise.
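
   When the file name is held in a Fortran character variable rather
than a literal, trailing blanks should be trimmed before the null
terminator is appended.  A sketch, assuming that the default character
kind coincides with 'C_CHAR' (otherwise the compiler should report a
type mismatch); the variable name 'fname' is hypothetical:

       character(len=256) :: fname
       integer(C_INT) :: ret
       fname = 'my_wisdom.dat'
       ! trim() drops the blank padding; C_NULL_CHAR ends the C string
       ret = fftw_export_wisdom_to_filename(trim(fname) // C_NULL_CHAR)
       if (ret .eq. 0) stop 'error exporting wisdom to file'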
5199
5200   It is also possible to use the lower-level routines
5201'fftw_export_wisdom_to_file' and 'fftw_import_wisdom_from_file', which
5202accept parameters of the C type 'FILE*', expressed in Fortran as
5203'type(C_PTR)'.  However, you are then responsible for creating the
5204'FILE*' yourself.  You can do this by using 'iso_c_binding' to define
Fortran interfaces for the C library functions 'fopen' and 'fclose',
5206which is a bit strange in Fortran but workable.
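
   A sketch of what such interfaces might look like is given below.
These declarations are not supplied by FFTW; they are written here by
hand from the standard C prototypes of 'fopen' and 'fclose', and the
variable names are hypothetical:

       use, intrinsic :: iso_c_binding
       interface
         ! FILE *fopen(const char *path, const char *mode);
         type(C_PTR) function fopen(path, mode) bind(C, name='fopen')
           import
           character(C_CHAR), dimension(*), intent(in) :: path, mode
         end function fopen
         ! int fclose(FILE *fp);
         integer(C_INT) function fclose(fp) bind(C, name='fclose')
           import
           type(C_PTR), value :: fp
         end function fclose
       end interface
       type(C_PTR) :: fp
       integer(C_INT) :: ret

       fp = fopen(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR, &
                  C_CHAR_'w' // C_NULL_CHAR)
       if (.not. c_associated(fp)) stop 'error opening file'
       call fftw_export_wisdom_to_file(fp)
       ret = fclose(fp)
       if (ret .ne. 0) stop 'error closing file'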
5207
5208
5209File: fftw3.info,  Node: Wisdom String Export/Import from Fortran,  Next: Wisdom Generic Export/Import from Fortran,  Prev: Wisdom File Export/Import from Fortran,  Up: Accessing the wisdom API from Fortran
5210
52117.6.2 Wisdom String Export/Import from Fortran
5212----------------------------------------------
5213
5214Dealing with FFTW's C string export/import is a bit more painful.  In
5215particular, the 'fftw_export_wisdom_to_string' function requires you to
5216deal with a dynamically allocated C string.  To get its length, you must
5217define an interface to the C 'strlen' function, and to deallocate it you
5218must define an interface to C 'free':
5219
5220       use, intrinsic :: iso_c_binding
5221       interface
         integer(C_SIZE_T) function strlen(s) bind(C, name='strlen')
5223           import
5224           type(C_PTR), value :: s
5225         end function strlen
5226         subroutine free(p) bind(C, name='free')
5227           import
5228           type(C_PTR), value :: p
5229         end subroutine free
5230       end interface
5231
5232   Given these definitions, you can then export wisdom to a Fortran
5233character array:
5234
5235       character(C_CHAR), pointer :: s(:)
5236       integer(C_SIZE_T) :: slen
5237       type(C_PTR) :: p
5238       p = fftw_export_wisdom_to_string()
5239       if (.not. c_associated(p)) stop 'error exporting wisdom'
5240       slen = strlen(p)
5241       call c_f_pointer(p, s, [slen+1])
5242       ...
5243       call free(p)
5244
5245   Note that 'slen' is the length of the C string, but the length of the
5246array is 'slen+1' because it includes the terminating null character.
5247(You can omit the '+1' if you don't want Fortran to know about the null
5248character.)  The standard 'c_associated' function checks whether 'p' is
5249a null pointer, which is returned by 'fftw_export_wisdom_to_string' if
5250there was an error.
5251
5252   To import wisdom from a string, use 'fftw_import_wisdom_from_string'
5253as usual; note that the argument of this function must be a
5254'character(C_CHAR)' that is terminated by the 'C_NULL_CHAR' character,
5255like the 's' array above.
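
   Continuing with the declarations of 's' and 'p' and the 'strlen' and
'free' interfaces shown earlier in this section, a round trip through
a string might be sketched as follows (forgetting the wisdom in between
is done here only to make the import observable):

       integer(C_INT) :: ret
       p = fftw_export_wisdom_to_string()
       if (.not. c_associated(p)) stop 'error exporting wisdom'
       ! keep the terminating null so that s is a valid C string
       call c_f_pointer(p, s, [strlen(p) + 1])
       call fftw_forget_wisdom()
       ret = fftw_import_wisdom_from_string(s)
       if (ret .eq. 0) stop 'error importing wisdom'
       call free(p)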
5256
5257
5258File: fftw3.info,  Node: Wisdom Generic Export/Import from Fortran,  Prev: Wisdom String Export/Import from Fortran,  Up: Accessing the wisdom API from Fortran
5259
52607.6.3 Wisdom Generic Export/Import from Fortran
5261-----------------------------------------------
5262
5263The most generic wisdom export/import functions allow you to provide an
5264arbitrary callback function to read/write one character at a time in any
5265way you want.  However, your callback function must be written in a
5266special way, using the 'bind(C)' attribute to be passed to a C
5267interface.
5268
5269   In particular, to call the generic wisdom export function
5270'fftw_export_wisdom', you would write a callback subroutine of the form:
5271
5272       subroutine my_write_char(c, p) bind(C)
5273         use, intrinsic :: iso_c_binding
5274         character(C_CHAR), value :: c
5275         type(C_PTR), value :: p
5276         _...write c..._
5277       end subroutine my_write_char
5278
5279   Given such a subroutine (along with the corresponding interface
5280definition), you could then export wisdom using:
5281
5282       call fftw_export_wisdom(c_funloc(my_write_char), p)
5283
5284   The standard 'c_funloc' intrinsic converts a Fortran 'bind(C)'
5285subroutine into a C function pointer.  The parameter 'p' is a
5286'type(C_PTR)' to any arbitrary data that you want to pass to
5287'my_write_char' (or 'C_NULL_PTR' if none).  (Note that you can get a C
5288pointer to Fortran data using the intrinsic 'c_loc', and convert it back
5289to a Fortran pointer in 'my_write_char' using 'c_f_pointer'.)
5290
5291   Similarly, to use the generic 'fftw_import_wisdom', you would define
5292a callback function of the form:
5293
5294       integer(C_INT) function my_read_char(p) bind(C)
5295         use, intrinsic :: iso_c_binding
5296         type(C_PTR), value :: p
         character(C_CHAR) :: c
5298         _...read a character c..._
5299         my_read_char = ichar(c, C_INT)
5300       end function my_read_char
5301
5302       ....
5303
5304       integer(C_INT) :: ret
5305       ret = fftw_import_wisdom(c_funloc(my_read_char), p)
5306       if (ret .eq. 0) stop 'error importing wisdom'
5307
5308   Your function can return '-1' if the end of the input is reached.
Again, 'p' is an arbitrary 'type(C_PTR)' that is passed through to your
5310function.  'fftw_import_wisdom' returns '0' if an error occurred and
5311nonzero otherwise.
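
   As one possible way to fill in the export callback, the sketch below
writes each character to a Fortran unit opened for stream access and
passes the unit number through the data pointer.  The module, program,
and procedure names, the unit number 10, and the file name are all
hypothetical:

       module wisdom_io
         use, intrinsic :: iso_c_binding
         implicit none
       contains
         ! Called by FFTW once for every character of wisdom.
         subroutine write_char_to_unit(c, p) bind(C)
           character(C_CHAR), value :: c
           type(C_PTR), value :: p
           integer(C_INT), pointer :: iunit
           call c_f_pointer(p, iunit)  ! recover the unit number
           write (iunit) c             ! unformatted stream output
         end subroutine write_char_to_unit
       end module wisdom_io

       program export_sketch
         use, intrinsic :: iso_c_binding
         use wisdom_io
         implicit none
         include 'fftw3.f03'
         integer(C_INT), target :: iunit

         iunit = 10
         open (unit=iunit, file='my_wisdom.dat', access='stream', &
               form='unformatted', status='replace')
         call fftw_export_wisdom(c_funloc(write_char_to_unit), &
                                 c_loc(iunit))
         close (iunit)
       end program export_sketch

   Placing the callback in a module gives it an explicit interface,
which 'c_funloc' requires.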
5312
5313
5314File: fftw3.info,  Node: Defining an FFTW module,  Prev: Accessing the wisdom API from Fortran,  Up: Calling FFTW from Modern Fortran
5315
53167.7 Defining an FFTW module
5317===========================
5318
5319Rather than using the 'include' statement to include the 'fftw3.f03'
5320interface file in any subroutine where you want to use FFTW, you might
5321prefer to define an FFTW Fortran module.  FFTW does not install itself
5322as a module, primarily because 'fftw3.f03' can be shared between
5323different Fortran compilers while modules (in general) cannot.  However,
5324it is trivial to define your own FFTW module if you want.  Just create a
5325file containing:
5326
5327       module FFTW3
5328         use, intrinsic :: iso_c_binding
5329         include 'fftw3.f03'
5330       end module
5331
5332   Compile this file into a module as usual for your compiler (e.g.
5333with 'gfortran -c' you will get a file 'fftw3.mod').  Now, instead of
5334'include 'fftw3.f03'', whenever you want to use FFTW routines you can
5335just do:
5336
5337       use FFTW3
5338
5339   as usual for Fortran modules.  (You still need to link to the FFTW
5340library, of course.)
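
   For instance, a program using such a module might be sketched as
follows (the program name and the size 'n' are hypothetical).  Because
the module itself uses 'iso_c_binding', the C kinds are available
through 'use FFTW3' as well:

       program module_sketch
         use FFTW3
         implicit none
         integer(C_INT), parameter :: n = 8
         type(C_PTR) :: plan
         complex(C_DOUBLE_COMPLEX) :: in(n), out(n)

         plan = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE)
         in = (1.0_C_DOUBLE, 0.0_C_DOUBLE)
         call fftw_execute_dft(plan, in, out)
         call fftw_destroy_plan(plan)
       end program module_sketch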
5341
5342
5343File: fftw3.info,  Node: Calling FFTW from Legacy Fortran,  Next: Upgrading from FFTW version 2,  Prev: Calling FFTW from Modern Fortran,  Up: Top
5344
53458 Calling FFTW from Legacy Fortran
5346**********************************
5347
5348This chapter describes the interface to FFTW callable by Fortran code in
5349older compilers not supporting the Fortran 2003 C interoperability
5350features (*note Calling FFTW from Modern Fortran::).  This interface has
5351the major disadvantage that it is not type-checked, so if you mistake
5352the argument types or ordering then your program will not have any
5353compiler errors, and will likely crash at runtime.  So, greater care is
5354needed.  Also, technically interfacing older Fortran versions to C is
5355nonstandard, but in practice we have found that the techniques used in
5356this chapter have worked with all known Fortran compilers for many
5357years.
5358
5359   The legacy Fortran interface differs from the C interface only in the
5360prefix ('dfftw_' instead of 'fftw_' in double precision) and a few other
5361minor details.  This Fortran interface is included in the FFTW libraries
5362by default, unless a Fortran compiler isn't found on your system or
5363'--disable-fortran' is included in the 'configure' flags.  We assume
5364here that the reader is already familiar with the usage of FFTW in C, as
5365described elsewhere in this manual.
5366
5367   The MPI parallel interface to FFTW is _not_ currently available to
5368legacy Fortran.
5369
5370* Menu:
5371
5372* Fortran-interface routines::
5373* FFTW Constants in Fortran::
5374* FFTW Execution in Fortran::
5375* Fortran Examples::
5376* Wisdom of Fortran?::
5377
5378
5379File: fftw3.info,  Node: Fortran-interface routines,  Next: FFTW Constants in Fortran,  Prev: Calling FFTW from Legacy Fortran,  Up: Calling FFTW from Legacy Fortran
5380
53818.1 Fortran-interface routines
5382==============================
5383
5384Nearly all of the FFTW functions have Fortran-callable equivalents.  The
5385name of the legacy Fortran routine is the same as that of the
5386corresponding C routine, but with the 'fftw_' prefix replaced by
5387'dfftw_'.(1)  The single and long-double precision versions use 'sfftw_'
5388and 'lfftw_', respectively, instead of 'fftwf_' and 'fftwl_'; quadruple
5389precision ('real*16') is available on some systems as 'fftwq_' (*note
5390Precision::).  (Note that 'long double' on x86 hardware is usually at
5391most 80-bit extended precision, _not_ quadruple precision.)
5392
5393   For the most part, all of the arguments to the functions are the
5394same, with the following exceptions:
5395
5396   * 'plan' variables (what would be of type 'fftw_plan' in C), must be
5397     declared as a type that is at least as big as a pointer (address)
5398     on your machine.  We recommend using 'integer*8' everywhere, since
5399     this should always be big enough.
5400
5401   * Any function that returns a value (e.g.  'fftw_plan_dft') is
5402     converted into a _subroutine_.  The return value is converted into
5403     an additional _first_ parameter of this subroutine.(2)
5404
5405   * The Fortran routines expect multi-dimensional arrays to be in
5406     _column-major_ order, which is the ordinary format of Fortran
5407     arrays (*note Multi-dimensional Array Format::).  They do this
5408     transparently and costlessly simply by reversing the order of the
5409     dimensions passed to FFTW, but this has one important consequence
5410     for multi-dimensional real-complex transforms, discussed below.
5411
5412   * Wisdom import and export is somewhat more tricky because one cannot
5413     easily pass files or strings between C and Fortran; see *note
5414     Wisdom of Fortran?::.
5415
5416   * Legacy Fortran cannot use the 'fftw_malloc' dynamic-allocation
5417     routine.  If you want to exploit the SIMD FFTW (*note SIMD
5418     alignment and fftw_malloc::), you'll need to figure out some other
5419     way to ensure that your arrays are at least 16-byte aligned.
5420
5421   * Since Fortran 77 does not have data structures, the 'fftw_iodim'
5422     structure from the guru interface (*note Guru vector and transform
5423     sizes::) must be split into separate arguments.  In particular, any
5424     'fftw_iodim' array arguments in the C guru interface become three
5425     integer array arguments ('n', 'is', and 'os') in the Fortran guru
5426     interface, all of whose lengths should be equal to the
5427     corresponding 'rank' argument.
5428
5429   * The guru planner interface in Fortran does _not_ do any automatic
5430     translation between column-major and row-major; you are responsible
5431     for setting the strides etcetera to correspond to your Fortran
5432     arrays.  However, as a slight bug that we are preserving for
5433     backwards compatibility, the 'plan_guru_r2r' in Fortran _does_
5434     reverse the order of its 'kind' array parameter, so the 'kind'
5435     array of that routine should be in the reverse of the order of the
5436     iodim arrays (see above).
5437
5438   In general, you should take care to use Fortran data types that
5439correspond to (i.e.  are the same size as) the C types used by FFTW. In
5440practice, this correspondence is usually straightforward (i.e.
5441'integer' corresponds to 'int', 'real' corresponds to 'float',
5442etcetera).  The native Fortran double/single-precision complex type
5443should be compatible with 'fftw_complex'/'fftwf_complex'.  Such simple
5444correspondences are assumed in the examples below.
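
   As an illustration of the split 'fftw_iodim' arguments, here is a
sketch of a guru plan that transforms each column of a column-major
array 'arr(L,M)' in place.  The argument order is an assumption obtained
by applying the rules above to the C routine 'fftw_plan_guru_dft' (plan
first, each 'fftw_iodim' array split into 'n', 'is', and 'os'); the
variable names are hypothetical:

             double complex arr
             dimension arr(L,M)
             integer*8 plan
             integer n(1), is(1), os(1)
             integer hn(1), his(1), hos(1)

       C     One transform dimension: length L, unit stride.
             n(1) = L
             is(1) = 1
             os(1) = 1
       C     Repeated for M columns, each L elements apart.
             hn(1) = M
             his(1) = L
             hos(1) = L

             call dfftw_plan_guru_dft(plan, 1, n, is, os,
            &                         1, hn, his, hos,
            &                         arr, arr, FFTW_FORWARD, FFTW_ESTIMATE)
             call dfftw_execute_dft(plan, arr, arr)
             call dfftw_destroy_plan(plan)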
5445
5446   ---------- Footnotes ----------
5447
5448   (1) Technically, Fortran 77 identifiers are not allowed to have more
5449than 6 characters, nor may they contain underscores.  Any compiler that
5450enforces this limitation doesn't deserve to link to FFTW.
5451
5452   (2) The reason for this is that some Fortran implementations seem to
5453have trouble with C function return values, and vice versa.
5454
5455
5456File: fftw3.info,  Node: FFTW Constants in Fortran,  Next: FFTW Execution in Fortran,  Prev: Fortran-interface routines,  Up: Calling FFTW from Legacy Fortran
5457
54588.2 FFTW Constants in Fortran
5459=============================
5460
5461When creating plans in FFTW, a number of constants are used to specify
5462options, such as 'FFTW_MEASURE' or 'FFTW_ESTIMATE'.  The same constants
5463must be used with the wrapper routines, but of course the C header files
5464where the constants are defined can't be incorporated directly into
5465Fortran code.
5466
5467   Instead, we have placed Fortran equivalents of the FFTW constant
5468definitions in the file 'fftw3.f', which can be found in the same
5469directory as 'fftw3.h'.  If your Fortran compiler supports a
5470preprocessor of some sort, you should be able to 'include' or '#include'
5471this file; otherwise, you can paste it directly into your code.
5472
5473   In C, you combine different flags (like 'FFTW_PRESERVE_INPUT' and
5474'FFTW_MEASURE') using the ''|'' operator; in Fortran you should just use
5475''+''.  (Take care not to add in the same flag more than once, though.
5476Alternatively, you can use the 'ior' intrinsic function standardized in
Fortran 90.)
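
   For example, the two ways of combining flags might be sketched as
follows (the variable 'iflags' is hypothetical, and 'in', 'out', and 'N'
are assumed to be declared as in the examples later in this chapter):

             include 'fftw3.f'
             integer*8 plan
             integer iflags

       C     Addition is safe as long as each flag appears only once.
             iflags = FFTW_MEASURE + FFTW_PRESERVE_INPUT
       C     A bitwise OR gives the same result and tolerates repeats.
             iflags = ior(FFTW_MEASURE, FFTW_PRESERVE_INPUT)

             call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,iflags)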
5478
5479
5480File: fftw3.info,  Node: FFTW Execution in Fortran,  Next: Fortran Examples,  Prev: FFTW Constants in Fortran,  Up: Calling FFTW from Legacy Fortran
5481
54828.3 FFTW Execution in Fortran
5483=============================
5484
5485In C, in order to use a plan, one normally calls 'fftw_execute', which
5486executes the plan to perform the transform on the input/output arrays
5487passed when the plan was created (*note Using Plans::).  The
5488corresponding subroutine call in legacy Fortran is:
5489             call dfftw_execute(plan)
5490
5491   However, we have had reports that this causes problems with some
5492recent optimizing Fortran compilers.  The problem is, because the
5493input/output arrays are not passed as explicit arguments to
5494'dfftw_execute', the semantics of Fortran (unlike C) allow the compiler
5495to assume that the input/output arrays are not changed by
5496'dfftw_execute'.  As a consequence, certain compilers end up optimizing
5497out or repositioning the call to 'dfftw_execute', assuming incorrectly
5498that it does nothing.
5499
5500   There are various workarounds to this, but the safest and simplest
5501thing is to not use 'dfftw_execute' in Fortran.  Instead, use the
5502functions described in *note New-array Execute Functions::, which take
5503the input/output arrays as explicit arguments.  For example, if the plan
5504is for a complex-data DFT and was created for the arrays 'in' and 'out',
5505you would do:
5506             call dfftw_execute_dft(plan, in, out)
5507
5508   There are a few things to be careful of, however:
5509
5510   * You must use the correct type of execute function, matching the way
5511     the plan was created.  Complex DFT plans should use
     'dfftw_execute_dft', real-input (r2c) DFT plans should use
5513     'dfftw_execute_dft_r2c', and real-output (c2r) DFT plans should use
5514     'dfftw_execute_dft_c2r'.  The various r2r plans should use
5515     'dfftw_execute_r2r'.
5516
5517   * You should normally pass the same input/output arrays that were
5518     used when creating the plan.  This is always safe.
5519
5520   * _If_ you pass _different_ input/output arrays compared to those
5521     used when creating the plan, you must abide by all the restrictions
5522     of the new-array execute functions (*note New-array Execute
5523     Functions::).  The most difficult of these, in Fortran, is the
5524     requirement that the new arrays have the same alignment as the
5525     original arrays, because there seems to be no way in legacy Fortran
5526     to obtain guaranteed-aligned arrays (analogous to 'fftw_malloc' in
5527     C). You can, of course, use the 'FFTW_UNALIGNED' flag when creating
5528     the plan, in which case the plan does not depend on the alignment,
5529     but this may sacrifice substantial performance on architectures
5530     (like x86) with SIMD instructions (*note SIMD alignment and
5531     fftw_malloc::).
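
   A sketch of the 'FFTW_UNALIGNED' approach, with hypothetical array
names, might look like this; the plan is created for 'a' and 'b' and
then applied to a second pair of arrays of the same size:

             double complex a, b, c, d
             dimension a(N), b(N), c(N), d(N)
             integer*8 plan

       C     FFTW_UNALIGNED makes the plan independent of alignment.
             call dfftw_plan_dft_1d(plan, N, a, b, FFTW_FORWARD,
            &                       FFTW_ESTIMATE + FFTW_UNALIGNED)
             call dfftw_execute_dft(plan, a, b)
             call dfftw_execute_dft(plan, c, d)
             call dfftw_destroy_plan(plan)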
5532
5533
5534File: fftw3.info,  Node: Fortran Examples,  Next: Wisdom of Fortran?,  Prev: FFTW Execution in Fortran,  Up: Calling FFTW from Legacy Fortran
5535
55368.4 Fortran Examples
5537====================
5538
5539In C, you might have something like the following to transform a
5540one-dimensional complex array:
5541
5542             fftw_complex in[N], out[N];
5543             fftw_plan plan;
5544
5545             plan = fftw_plan_dft_1d(N,in,out,FFTW_FORWARD,FFTW_ESTIMATE);
5546             fftw_execute(plan);
5547             fftw_destroy_plan(plan);
5548
5549   In Fortran, you would use the following to accomplish the same thing:
5550
5551             double complex in, out
5552             dimension in(N), out(N)
5553             integer*8 plan
5554
5555             call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
5556             call dfftw_execute_dft(plan, in, out)
5557             call dfftw_destroy_plan(plan)
5558
5559   Notice how all routines are called as Fortran subroutines, and the
5560plan is returned via the first argument to 'dfftw_plan_dft_1d'.  Notice
5561also that we changed 'fftw_execute' to 'dfftw_execute_dft' (*note FFTW
5562Execution in Fortran::).  To do the same thing, but using 8 threads in
5563parallel (*note Multi-threaded FFTW::), you would simply prefix these
5564calls with:
5565
5566             integer iret
5567             call dfftw_init_threads(iret)
5568             call dfftw_plan_with_nthreads(8)
5569
5570   (You might want to check the value of 'iret': if it is zero, it
5571indicates an unlikely error during thread initialization.)
5572
5573   To check the number of threads currently being used by the planner,
5574you can do the following:
5575
5576             integer iret
5577             call dfftw_planner_nthreads(iret)
5578
5579   To transform a three-dimensional array in-place with C, you might do:
5580
5581             fftw_complex arr[L][M][N];
5582             fftw_plan plan;
5583
5584             plan = fftw_plan_dft_3d(L,M,N, arr,arr,
5585                                     FFTW_FORWARD, FFTW_ESTIMATE);
5586             fftw_execute(plan);
5587             fftw_destroy_plan(plan);
5588
5589   In Fortran, you would use this instead:
5590
5591             double complex arr
5592             dimension arr(L,M,N)
5593             integer*8 plan
5594
5595             call dfftw_plan_dft_3d(plan, L,M,N, arr,arr,
5596            &                       FFTW_FORWARD, FFTW_ESTIMATE)
5597             call dfftw_execute_dft(plan, arr, arr)
5598             call dfftw_destroy_plan(plan)
5599
5600   Note that we pass the array dimensions in the "natural" order in both
5601C and Fortran.
5602
5603   To transform a one-dimensional real array in Fortran, you might do:
5604
5605             double precision in
5606             dimension in(N)
5607             double complex out
5608             dimension out(N/2 + 1)
5609             integer*8 plan
5610
5611             call dfftw_plan_dft_r2c_1d(plan,N,in,out,FFTW_ESTIMATE)
5612             call dfftw_execute_dft_r2c(plan, in, out)
5613             call dfftw_destroy_plan(plan)
5614
5615   To transform a two-dimensional real array, out of place, you might
5616use the following:
5617
5618             double precision in
5619             dimension in(M,N)
5620             double complex out
5621             dimension out(M/2 + 1, N)
5622             integer*8 plan
5623
5624             call dfftw_plan_dft_r2c_2d(plan,M,N,in,out,FFTW_ESTIMATE)
5625             call dfftw_execute_dft_r2c(plan, in, out)
5626             call dfftw_destroy_plan(plan)
5627
5628   *Important:* Notice that it is the _first_ dimension of the complex
5629output array that is cut in half in Fortran, rather than the last
5630dimension as in C. This is a consequence of the interface routines
5631reversing the order of the array dimensions passed to FFTW so that the
5632Fortran program can use its ordinary column-major order.
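
   The effect of the dimension reversal on the output size can be
checked with a little index arithmetic.  The following C sketch is only
an illustration; the helper names and 'MAX_RANK' are hypothetical and
are not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_RANK = 8 };  /* arbitrary limit chosen for this sketch */

/* Number of complex outputs of an r2c transform whose real input has
   dimensions n[0] x ... x n[rank-1] in C (row-major) order.  Only the
   LAST dimension is cut to n/2 + 1. */
static size_t r2c_output_count_c(int rank, const int *n)
{
    size_t count = 1;
    assert(rank >= 1);
    for (int i = 0; i < rank - 1; ++i)
        count *= (size_t) n[i];
    return count * (size_t) (n[rank - 1] / 2 + 1);
}

/* Same count for dimensions given in Fortran (column-major) order.
   The dimensions are reversed before the C rule is applied, so it is
   the FIRST Fortran dimension that ends up halved. */
static size_t r2c_output_count_fortran(int rank, const int *n_fortran)
{
    int n_c[MAX_RANK];
    assert(rank >= 1 && rank <= MAX_RANK);
    for (int i = 0; i < rank; ++i)
        n_c[i] = n_fortran[rank - 1 - i];
    return r2c_output_count_c(rank, n_c);
}
```

   For the Fortran array 'in(M,N)' above, the second helper yields '(M/2
+ 1) * N' elements, matching the declaration 'out(M/2 + 1, N)'.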
5633
5634
5635File: fftw3.info,  Node: Wisdom of Fortran?,  Prev: Fortran Examples,  Up: Calling FFTW from Legacy Fortran
5636
56378.5 Wisdom of Fortran?
5638======================
5639
5640In this section, we discuss how one can import/export FFTW wisdom (saved
5641plans) to/from a Fortran program; we assume that the reader is already
5642familiar with wisdom, as described in *note Words of Wisdom-Saving
5643Plans::.
5644
   The basic problem is that it is difficult to (portably) pass files and
5646strings between Fortran and C, so we cannot provide a direct Fortran
5647equivalent to the 'fftw_export_wisdom_to_file', etcetera, functions.
5648Fortran interfaces _are_ provided for the functions that do not take
5649file/string arguments, however: 'dfftw_import_system_wisdom',
5650'dfftw_import_wisdom', 'dfftw_export_wisdom', and 'dfftw_forget_wisdom'.
5651
5652   So, for example, to import the system-wide wisdom, you would do:
5653
5654             integer isuccess
5655             call dfftw_import_system_wisdom(isuccess)
5656
5657   As usual, the C return value is turned into a first parameter;
5658'isuccess' is non-zero on success and zero on failure (e.g.  if there is
5659no system wisdom installed).
5660
5661   If you want to import/export wisdom from/to an arbitrary file or
5662elsewhere, you can employ the generic 'dfftw_import_wisdom' and
5663'dfftw_export_wisdom' functions, for which you must supply a subroutine
5664to read/write one character at a time.  The FFTW package contains an
5665example file 'doc/f77_wisdom.f' demonstrating how to implement
5666'import_wisdom_from_file' and 'export_wisdom_to_file' subroutines in
5667this way.  (These routines cannot be compiled into the FFTW library
5668itself, lest all FFTW-using programs be required to link with the
5669Fortran I/O library.)
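
   For comparison, here is a sketch in C of the same
one-character-at-a-time pattern, using an in-memory buffer instead of a
file.  The type 'char_store' and the functions 'put_char' and
'get_char' are hypothetical names invented for this example.  Their
signatures are meant to match what the generic C routines
'fftw_export_wisdom' and 'fftw_import_wisdom' expect, as far as we
understand that interface; the sketch itself does not call FFTW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* EOF */

/* Hypothetical fixed-size store for the exported characters. */
typedef struct {
    char data[4096];
    size_t len;   /* number of characters written so far */
    size_t pos;   /* index of the next character to read */
} char_store;

/* Writer callback: append one character.  Characters that do not fit
   are silently dropped in this sketch. */
static void put_char(char c, void *p)
{
    char_store *s = (char_store *) p;
    assert(s != NULL);
    if (s->len < sizeof s->data)
        s->data[s->len++] = c;
}

/* Reader callback: return the next character, or EOF at the end. */
static int get_char(void *p)
{
    char_store *s = (char_store *) p;
    assert(s != NULL);
    if (s->pos >= s->len)
        return EOF;
    return (unsigned char) s->data[s->pos++];
}
```

   With FFTW linked in, one would presumably export with
'fftw_export_wisdom(put_char, &store)' and later import with
'fftw_import_wisdom(get_char, &store)'.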
5670
5671
5672File: fftw3.info,  Node: Upgrading from FFTW version 2,  Next: Installation and Customization,  Prev: Calling FFTW from Legacy Fortran,  Up: Top
5673
56749 Upgrading from FFTW version 2
5675*******************************
5676
5677In this chapter, we outline the process for updating codes designed for
5678the older FFTW 2 interface to work with FFTW 3.  The interface for FFTW
56793 is not backwards-compatible with the interface for FFTW 2 and earlier
5680versions; codes written to use those versions will fail to link with
5681FFTW 3.  Nor is it possible to write "compatibility wrappers" to bridge
5682the gap (at least not efficiently), because FFTW 3 has different
5683semantics from previous versions.  However, upgrading should be a
5684straightforward process because the data formats are identical and the
5685overall style of planning/execution is essentially the same.
5686
5687   Unlike FFTW 2, there are no separate header files for real and
5688complex transforms (or even for different precisions) in FFTW 3; all
5689interfaces are defined in the '<fftw3.h>' header file.
5690
5691Numeric Types
5692=============
5693
5694The main difference in data types is that 'fftw_complex' in FFTW 2 was
5695defined as a 'struct' with macros 'c_re' and 'c_im' for accessing the
5696real/imaginary parts.  (This is binary-compatible with FFTW 3 on any
5697machine except perhaps for some older Crays in single precision.)  The
5698equivalent macros for FFTW 3 are:
5699
5700     #define c_re(c) ((c)[0])
5701     #define c_im(c) ((c)[1])
5702
5703   This does not work if you are using the C99 complex type, however,
5704unless you insert a 'double*' typecast into the above macros (*note
5705Complex numbers::).
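
   A minimal sketch of such macros follows.  So that it compiles
without FFTW, it uses a stand-in type 'cplx' (a hypothetical name) in
place of 'fftw_complex'; taking the address of the argument before the
cast is our assumption about how the 'double*' typecast would be
written.

```c
#include <assert.h>
#include <complex.h>

/* Stand-in for fftw_complex when the C99 complex type is in use. */
typedef double complex cplx;

/* C99 guarantees that a complex number is laid out like an array of
   two reals (real part first), so the cast below is well defined. */
#define c_re(c) (((double *) &(c))[0])
#define c_im(c) (((double *) &(c))[1])

/* Build a complex number through the macros alone. */
static cplx make_cplx(double re, double im)
{
    cplx z;
    c_re(z) = re;
    c_im(z) = im;
    assert(c_re(z) == re || re != re);  /* holds unless re is NaN */
    return z;
}
```

   Because the macros expand to lvalues, old code that assigns to
'c_re(x)' or 'c_im(x)' continues to work.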
5706
5707   Also, FFTW 2 had an 'fftw_real' typedef that was an alias for
5708'double' (in double precision).  In FFTW 3 you should just use 'double'
5709(or whatever precision you are employing).
5710
5711Plans
5712=====
5713
5714The major difference between FFTW 2 and FFTW 3 is in the
5715planning/execution division of labor.  In FFTW 2, plans were found for a
5716given transform size and type, and then could be applied to _any_ arrays
5717and for _any_ multiplicity/stride parameters.  In FFTW 3, you specify
5718the particular arrays, stride parameters, etcetera when creating the
5719plan, and the plan is then executed for _those_ arrays (unless the guru
5720interface is used) and _those_ parameters _only_.  (FFTW 2 had "specific
5721planner" routines that planned for a particular array and stride, but
5722the plan could still be used for other arrays and strides.)  That is,
5723much of the information that was formerly specified at execution time is
5724now specified at planning time.
5725
5726   Like FFTW 2's specific planner routines, the FFTW 3 planner
5727overwrites the input/output arrays unless you use 'FFTW_ESTIMATE'.
5728
5729   FFTW 2 had separate data types 'fftw_plan', 'fftwnd_plan',
5730'rfftw_plan', and 'rfftwnd_plan' for complex and real one- and
5731multi-dimensional transforms, and each type had its own 'destroy'
5732function.  In FFTW 3, all plans are of type 'fftw_plan' and all are
5733destroyed by 'fftw_destroy_plan(plan)'.
5734
5735   Where you formerly used 'fftw_create_plan' and 'fftw_one' to plan and
5736compute a single 1d transform, you would now use 'fftw_plan_dft_1d' to
5737plan the transform.  If you used the generic 'fftw' function to execute
5738the transform with multiplicity ('howmany') and stride parameters, you
5739would now use the advanced interface 'fftw_plan_many_dft' to specify
5740those parameters.  The plans are now executed with 'fftw_execute(plan)',
5741which takes all of its parameters (including the input/output arrays)
5742from the plan.
5743
5744   In-place transforms no longer interpret their output argument as
5745scratch space, nor is there an 'FFTW_IN_PLACE' flag.  You simply pass
5746the same pointer for both the input and output arguments.  (Previously,
5747the output 'ostride' and 'odist' parameters were ignored for in-place
5748transforms; now, if they are specified via the advanced interface, they
5749are significant even in the in-place case, although they should normally
5750equal the corresponding input parameters.)
5751
5752   The 'FFTW_ESTIMATE' and 'FFTW_MEASURE' flags have the same meaning as
5753before, although the planning time will differ.  You may also consider
5754using 'FFTW_PATIENT', which is like 'FFTW_MEASURE' except that it takes
5755more time in order to consider a wider variety of algorithms.
5756
5757   For multi-dimensional complex DFTs, instead of 'fftwnd_create_plan'
5758(or 'fftw2d_create_plan' or 'fftw3d_create_plan'), followed by
5759'fftwnd_one', you would use 'fftw_plan_dft' (or 'fftw_plan_dft_2d' or
'fftw_plan_dft_3d'), followed by 'fftw_execute'.  If you used 'fftwnd'
to specify strides etcetera, you would instead specify these via
5762'fftw_plan_many_dft'.
5763
5764   The analogues to 'rfftw_create_plan' and 'rfftw_one' with
5765'FFTW_REAL_TO_COMPLEX' or 'FFTW_COMPLEX_TO_REAL' directions are
5766'fftw_plan_r2r_1d' with kind 'FFTW_R2HC' or 'FFTW_HC2R', followed by
5767'fftw_execute'.  The stride etcetera arguments of 'rfftw' are now in
5768'fftw_plan_many_r2r'.
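
   The 'FFTW_R2HC' kind stores its result in "halfcomplex" order: the
real parts r0, r1, ..., r(n/2) come first, followed by the imaginary
parts in reverse order, ending with i1.  If your old code expected
explicit complex pairs, a small unpacking routine may help.  The sketch
below is plain C and does not call FFTW; the function name
'halfcomplex_to_complex' is hypothetical.

```c
#include <assert.h>

/* Unpack a halfcomplex array hc[0..n-1] into n/2 + 1 (re, im) pairs.
   Real part of output k is hc[k]; imaginary part is hc[n - k], except
   for k = 0 and (for even n) k = n/2, which are purely real. */
static void halfcomplex_to_complex(int n, const double *hc,
                                   double (*out)[2])
{
    assert(n >= 1);
    for (int k = 0; k <= n / 2; ++k) {
        out[k][0] = hc[k];
        out[k][1] = (k == 0 || 2 * k == n) ? 0.0 : hc[n - k];
    }
}
```

   For example, the real input {1, 2, 3, 4} has the halfcomplex
transform {10, -2, -2, 2}, which unpacks to 10, -2 + 2i, and -2.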
5769
5770   Instead of 'rfftwnd_create_plan' (or 'rfftw2d_create_plan' or
5771'rfftw3d_create_plan') followed by 'rfftwnd_one_real_to_complex' or
5772'rfftwnd_one_complex_to_real', you now use 'fftw_plan_dft_r2c' (or
5773'fftw_plan_dft_r2c_2d' or 'fftw_plan_dft_r2c_3d') or 'fftw_plan_dft_c2r'
5774(or 'fftw_plan_dft_c2r_2d' or 'fftw_plan_dft_c2r_3d'), respectively,
5775followed by 'fftw_execute'.  As usual, the strides etcetera of
'rfftwnd_real_to_complex' or 'rfftwnd_complex_to_real' are now specified
5777in the advanced planner routines, 'fftw_plan_many_dft_r2c' or
5778'fftw_plan_many_dft_c2r'.
5779
5780Wisdom
5781======
5782
5783In FFTW 2, you had to supply the 'FFTW_USE_WISDOM' flag in order to use
5784wisdom; in FFTW 3, wisdom is always used.  (You could simulate the FFTW
57852 wisdom-less behavior by calling 'fftw_forget_wisdom' after every
5786planner call.)
5787
5788   The FFTW 3 wisdom import/export routines are almost the same as
5789before (although the storage format is entirely different).  There is
5790one significant difference, however.  In FFTW 2, the import routines
5791would never read past the end of the wisdom, so you could store extra
5792data beyond the wisdom in the same file, for example.  In FFTW 3, the
5793file-import routine may read up to a few hundred bytes past the end of
5794the wisdom, so you cannot store other data just beyond it.(1)
5795
5796   Wisdom has been enhanced by additional humility in FFTW 3: whereas
5797FFTW 2 would re-use wisdom for a given transform size regardless of the
5798stride etc., in FFTW 3 wisdom is only used with the strides etc.  for
5799which it was created.  Unfortunately, this means FFTW 3 has to create
5800new plans from scratch more often than FFTW 2 (in FFTW 2, planning e.g.
5801one transform of size 1024 also created wisdom for all smaller powers of
58022, but this no longer occurs).
5803
5804   FFTW 3 also has the new routine 'fftw_import_system_wisdom' to import
5805wisdom from a standard system-wide location.
5806
5807Memory allocation
5808=================
5809
5810In FFTW 3, we recommend allocating your arrays with 'fftw_malloc' and
5811deallocating them with 'fftw_free'; this is not required, but allows
5812optimal performance when SIMD acceleration is used.  (Those two
5813functions actually existed in FFTW 2, and worked the same way, but were
5814not documented.)
5815
5816   In FFTW 2, there were 'fftw_malloc_hook' and 'fftw_free_hook'
5817functions that allowed the user to replace FFTW's memory-allocation
5818routines (e.g.  to implement different error-handling, since by default
5819FFTW prints an error message and calls 'exit' to abort the program if
5820'malloc' returns 'NULL').  These hooks are not supported in FFTW 3;
5821those few users who require this functionality can just directly modify
5822the memory-allocation routines in FFTW (they are defined in
5823'kernel/alloc.c').
5824
5825Fortran interface
5826=================
5827
5828In FFTW 2, the subroutine names were obtained by replacing 'fftw_' with
5829'fftw_f77'; in FFTW 3, you replace 'fftw_' with 'dfftw_' (or 'sfftw_' or
5830'lfftw_', depending upon the precision).
5831
5832   In FFTW 3, we have begun recommending that you always declare the
5833type used to store plans as 'integer*8'.  (Too many people didn't notice
5834our instruction to switch from 'integer' to 'integer*8' for 64-bit
5835machines.)
5836
5837   In FFTW 3, we provide a 'fftw3.f' "header file" to include in your
5838code (and which is officially installed on Unix systems).  (In FFTW 2,
5839we supplied a 'fftw_f77.i' file, but it was not installed.)
5840
5841   Otherwise, the C-Fortran interface relationship is much the same as
5842it was before (e.g.  return values become initial parameters, and
5843multi-dimensional arrays are in column-major order).  Unlike FFTW 2, we
5844do provide some support for wisdom import/export in Fortran (*note
5845Wisdom of Fortran?::).
5846
5847Threads
5848=======
5849
As in FFTW 2, only the execution routines are thread-safe.  All planner
5851routines, etcetera, should be called by only a single thread at a time
5852(*note Thread safety::).  _Unlike_ FFTW 2, there is no special
5853'FFTW_THREADSAFE' flag for the planner to allow a given plan to be
5854usable by multiple threads in parallel; this is now the case by default.
5855
5856   The multi-threaded version of FFTW 2 required you to pass the number
5857of threads each time you execute the transform.  The number of threads
5858is now stored in the plan, and is specified before the planner is called
5859by 'fftw_plan_with_nthreads'.  The threads initialization routine used
5860to be called 'fftw_threads_init' and would return zero on success; the
5861new routine is called 'fftw_init_threads' and returns zero on failure.
5862The current number of threads used by the planner can be checked with
5863'fftw_planner_nthreads'.  *Note Multi-threaded FFTW::.
5864
5865   There is no separate threads header file in FFTW 3; all the function
5866prototypes are in '<fftw3.h>'.  However, you still have to link to a
5867separate library ('-lfftw3_threads -lfftw3 -lm' on Unix), as well as to
5868the threading library (e.g.  POSIX threads on Unix).
5869
5870   ---------- Footnotes ----------
5871
5872   (1) We do our own buffering because GNU libc I/O routines are
5873horribly slow for single-character I/O, apparently for thread-safety
5874reasons (whether you are using threads or not).
5875
5876
5877File: fftw3.info,  Node: Installation and Customization,  Next: Acknowledgments,  Prev: Upgrading from FFTW version 2,  Up: Top
5878
587910 Installation and Customization
5880*********************************
5881
5882This chapter describes the installation and customization of FFTW, the
5883latest version of which may be downloaded from the FFTW home page
5884(http://www.fftw.org).
5885
5886   In principle, FFTW should work on any system with an ANSI C compiler
5887('gcc' is fine).  However, planner time is drastically reduced if FFTW
5888can exploit a hardware cycle counter; FFTW comes with cycle-counter
5889support for all modern general-purpose CPUs, but you may need to add a
5890couple of lines of code if your compiler is not yet supported (*note
5891Cycle Counters::).  (On Unix, there will be a warning at the end of the
5892'configure' output if no cycle counter is found.)
5893
5894   Installation of FFTW is simplest if you have a Unix or a GNU system,
5895such as GNU/Linux, and we describe this case in the first section below,
5896including the use of special configuration options to e.g.  install
5897different precisions or exploit optimizations for particular
5898architectures (e.g.  SIMD). Compilation on non-Unix systems is a more
5899manual process, but we outline the procedure in the second section.  It
5900is also likely that pre-compiled binaries will be available for popular
5901systems.
5902
5903   Finally, we describe how you can customize FFTW for particular needs
5904by generating _codelets_ for fast transforms of sizes not supported
5905efficiently by the standard FFTW distribution.
5906
5907* Menu:
5908
5909* Installation on Unix::
5910* Installation on non-Unix systems::
5911* Cycle Counters::
5912* Generating your own code::
5913
5914
5915File: fftw3.info,  Node: Installation on Unix,  Next: Installation on non-Unix systems,  Prev: Installation and Customization,  Up: Installation and Customization
5916
591710.1 Installation on Unix
5918=========================
5919
5920FFTW comes with a 'configure' program in the GNU style.  Installation
5921can be as simple as:
5922
5923     ./configure
5924     make
5925     make install
5926
5927   This will build the uniprocessor complex and real transform libraries
5928along with the test programs.  (We recommend that you use GNU 'make' if
5929it is available; on some systems it is called 'gmake'.)  The "'make
install'" command installs the FFTW libraries in standard
5931places, and typically requires root privileges (unless you specify a
5932different install directory with the '--prefix' flag to 'configure').
5933You can also type "'make check'" to put the FFTW test programs through
5934their paces.  If you have problems during configuration or compilation,
5935you may want to run "'make distclean'" before trying again; this ensures
5936that you don't have any stale files left over from previous compilation
5937attempts.
5938
5939   The 'configure' script chooses the 'gcc' compiler by default, if it
5940is available; you can select some other compiler with:
5941     ./configure CC="<the name of your C compiler>"
5942
5943   The 'configure' script knows good 'CFLAGS' (C compiler flags) for a
5944few systems.  If your system is not known, the 'configure' script will
5945print out a warning.  In this case, you should re-configure FFTW with
5946the command
5947     ./configure CFLAGS="<write your CFLAGS here>"
5948   and then compile as usual.  If you do find an optimal set of 'CFLAGS'
5949for your system, please let us know what they are (along with the output
5950of 'config.guess') so that we can include them in future releases.
5951
5952   'configure' supports all the standard flags defined by the GNU Coding
5953Standards; see the 'INSTALL' file in FFTW or the GNU web page
5954(http://www.gnu.org/prep/standards/html_node/index.html).  Note
5955especially '--help' to list all flags and '--enable-shared' to create
5956shared, rather than static, libraries.  'configure' also accepts a few
5957FFTW-specific flags, particularly:
5958
5959   * '--enable-float': Produces a single-precision version of FFTW
5960     ('float') instead of the default double-precision ('double').
5961     *Note Precision::.
5962
5963   * '--enable-long-double': Produces a long-double precision version of
5964     FFTW ('long double') instead of the default double-precision
5965     ('double').  The 'configure' script will halt with an error message
5966     if 'long double' is the same size as 'double' on your
5967     machine/compiler.  *Note Precision::.
5968
5969   * '--enable-quad-precision': Produces a quadruple-precision version
5970     of FFTW using the nonstandard '__float128' type provided by 'gcc'
5971     4.6 or later on x86, x86-64, and Itanium architectures, instead of
5972     the default double-precision ('double').  The 'configure' script
5973     will halt with an error message if the compiler is not 'gcc'
5974     version 4.6 or later or if 'gcc''s 'libquadmath' library is not
5975     installed.  *Note Precision::.
5976
5977   * '--enable-threads': Enables compilation and installation of the
5978     FFTW threads library (*note Multi-threaded FFTW::), which provides
5979     a simple interface to parallel transforms for SMP systems.  By
5980     default, the threads routines are not compiled.
5981
5982   * '--enable-openmp': Like '--enable-threads', but using OpenMP
5983     compiler directives in order to induce parallelism rather than
5984     spawning its own threads directly, and installing an 'fftw3_omp'
5985     library rather than an 'fftw3_threads' library (*note
5986     Multi-threaded FFTW::).  You can use both '--enable-openmp' and
5987     '--enable-threads' since they compile/install libraries with
5988     different names.  By default, the OpenMP routines are not compiled.
5989
5990   * '--with-combined-threads': By default, if '--enable-threads' is
5991     used, the threads support is compiled into a separate library that
5992     must be linked in addition to the main FFTW library.  This is so
5993     that users of the serial library do not need to link the system
5994     threads libraries.  If '--with-combined-threads' is specified,
5995     however, then no separate threads library is created, and threads
5996     are included in the main FFTW library.  This is mainly useful under
5997     Windows, where no system threads library is required and
5998     inter-library dependencies are problematic.
5999
6000   * '--enable-mpi': Enables compilation and installation of the FFTW
6001     MPI library (*note Distributed-memory FFTW with MPI::), which
6002     provides parallel transforms for distributed-memory systems with
6003     MPI. (By default, the MPI routines are not compiled.)  *Note FFTW
6004     MPI Installation::.
6005
6006   * '--disable-fortran': Disables inclusion of legacy-Fortran wrapper
6007     routines (*note Calling FFTW from Legacy Fortran::) in the standard
6008     FFTW libraries.  These wrapper routines increase the library size
6009     by only a negligible amount, so they are included by default as
6010     long as the 'configure' script finds a Fortran compiler on your
6011     system.  (To specify a particular Fortran compiler foo, pass
6012     'F77='foo to 'configure'.)
6013
6014   * '--with-g77-wrappers': By default, when Fortran wrappers are
6015     included, the wrappers employ the linking conventions of the
6016     Fortran compiler detected by the 'configure' script.  If this
6017     compiler is GNU 'g77', however, then _two_ versions of the wrappers
6018     are included: one with 'g77''s idiosyncratic convention of
6019     appending two underscores to identifiers, and one with the more
6020     common convention of appending only a single underscore.  This way,
6021     the same FFTW library will work with both 'g77' and other Fortran
6022     compilers, such as GNU 'gfortran'.  However, the converse is not
6023     true: if you configure with a different compiler, then the
6024     'g77'-compatible wrappers are not included.  By specifying
6025     '--with-g77-wrappers', the 'g77'-compatible wrappers are included
6026     in addition to wrappers for whatever Fortran compiler 'configure'
6027     finds.
6028
6029   * '--with-slow-timer': Disables the use of hardware cycle counters,
6030     and falls back on 'gettimeofday' or 'clock'.  This greatly worsens
6031     performance, and should generally not be used (unless you don't
6032     have a cycle counter but still really want an optimized plan
6033     regardless of the time).  *Note Cycle Counters::.
6034
6035   * '--enable-sse' (single precision), '--enable-sse2' (single,
6036     double), '--enable-avx' (single, double), '--enable-avx2' (single,
6037     double), '--enable-avx512' (single, double),
6038     '--enable-avx-128-fma', '--enable-kcvi' (single),
6039     '--enable-altivec' (single), '--enable-vsx' (single, double),
6040     '--enable-neon' (single, double on aarch64),
6041     '--enable-generic-simd128', and '--enable-generic-simd256':
6042
     Enable various SIMD instruction sets.  You need a compiler that
     supports the given SIMD extensions, but FFTW will try to detect at
     runtime whether the CPU supports these extensions.  That is, you
     can compile with '--enable-avx' and the code will still run on a CPU
6047     without AVX support.
6048
6049        - These options require a compiler supporting SIMD extensions,
6050          and compiler support is always a bit flaky: see the FFTW FAQ
6051          for a list of compiler versions that have problems compiling
6052          FFTW.
6053        - Because of the large variety of ARM processors and ABIs, FFTW
6054          does not attempt to guess the correct 'gcc' flags for
6055          generating NEON code.  In general, you will have to provide
6056          them on the command line.  This command line is known to have
6057          worked at least once:
6058               ./configure --with-slow-timer --host=arm-linux-gnueabi \
6059                 --enable-single --enable-neon \
6060                 "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
6061
6062   To force 'configure' to use a particular C compiler foo (instead of
6063the default, usually 'gcc'), pass 'CC='foo to the 'configure' script;
6064you may also need to set the flags via the variable 'CFLAGS' as
6065described above.
6066
6067
6068File: fftw3.info,  Node: Installation on non-Unix systems,  Next: Cycle Counters,  Prev: Installation on Unix,  Up: Installation and Customization
6069
607010.2 Installation on non-Unix systems
6071=====================================
6072
6073It should be relatively straightforward to compile FFTW even on non-Unix
6074systems lacking the niceties of a 'configure' script.  Basically, you
6075need to edit the 'config.h' header (copy it from 'config.h.in') to
6076'#define' the various options and compiler characteristics, and then
6077compile all the '.c' files in the relevant directories.
6078
6079   The 'config.h' header contains about 100 options to set, each one
6080initially an '#undef', each documented with a comment, and most of them
6081fairly obvious.  For most of the options, you should simply '#define'
6082them to '1' if they are applicable, although a few options require a
6083particular value (e.g.  'SIZEOF_LONG_LONG' should be defined to the size
6084of the 'long long' type, in bytes, or zero if it is not supported).  We
6085will likely post some sample 'config.h' files for various operating
6086systems and compilers for you to use (at least as a starting point).
6087Please let us know if you have to hand-create a configuration file
6088(and/or a pre-compiled binary) that you want to share.
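
   As an illustration only, a hand-edited fragment of 'config.h' might
look like the following.  The values are assumptions for a typical
64-bit machine with SSE2, and the closing check is our own addition
(it needs a C11 compiler and is not part of the FFTW sources).

```c
/* Was "#undef SIZEOF_LONG_LONG": size of 'long long' in bytes, or
   zero if the type is not supported. */
#define SIZEOF_LONG_LONG 8

/* Was "#undef HAVE_SSE2": define to 1 to compile the SSE2 support. */
#define HAVE_SSE2 1

/* Left undefined here; define it only as a last resort on systems
   without a cycle counter. */
/* #undef WITH_SLOW_TIMER */

/* Optional sanity check, best kept in a separate source file:
   compilation fails if the size entered above is wrong. */
_Static_assert(SIZEOF_LONG_LONG == sizeof(long long),
               "SIZEOF_LONG_LONG does not match this compiler");
```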
6089
6090   To create the FFTW library, you will then need to compile all of the
6091'.c' files in the 'kernel', 'dft', 'dft/scalar', 'dft/scalar/codelets',
6092'rdft', 'rdft/scalar', 'rdft/scalar/r2cf', 'rdft/scalar/r2cb',
6093'rdft/scalar/r2r', 'reodft', and 'api' directories.  If you are
6094compiling with SIMD support (e.g.  you defined 'HAVE_SSE2' in
6095'config.h'), then you also need to compile the '.c' files in the
'simd-support', '{dft,rdft}/simd', and '{dft,rdft}/simd/*' directories.
6097
6098   Once these files are all compiled, link them into a library, or a
6099shared library, or directly into your program.
6100
6101   To compile the FFTW test program, additionally compile the code in
6102the 'libbench2/' directory, and link it into a library.  Then compile
6103the code in the 'tests/' directory and link it to the 'libbench2' and
6104FFTW libraries.  To compile the 'fftw-wisdom' (command-line) tool (*note
6105Wisdom Utilities::), compile 'tools/fftw-wisdom.c' and link it to the
'libbench2' and FFTW libraries.
6107
6108
6109File: fftw3.info,  Node: Cycle Counters,  Next: Generating your own code,  Prev: Installation on non-Unix systems,  Up: Installation and Customization
6110
611110.3 Cycle Counters
6112===================
6113
6114FFTW's planner actually executes and times different possible FFT
6115algorithms in order to pick the fastest plan for a given n.  In order to
6116do this in as short a time as possible, however, the timer must have a
6117very high resolution, and to accomplish this we employ the hardware
6118"cycle counters" that are available on most CPUs.  Currently, FFTW
6119supports the cycle counters on x86, PowerPC/POWER, Alpha, UltraSPARC
6120(SPARC v9), IA64, PA-RISC, and MIPS processors.
6121
6122   Access to the cycle counters, unfortunately, is a compiler and/or
6123operating-system dependent task, often requiring inline assembly
language, and it may be that your compiler is not supported.  If it is
_not_ supported, FFTW will by default fall back on its estimator
6126(effectively using 'FFTW_ESTIMATE' for all plans).
6127
6128   You can add support by editing the file 'kernel/cycle.h'; normally,
6129this will involve adapting one of the examples already present in order
6130to use the inline-assembler syntax for your C compiler, and will only
6131require a couple of lines of code.  Anyone adding support for a new
6132system to 'cycle.h' is encouraged to email us at <fftw@fftw.org>.
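
   To give a flavor of what such an addition looks like, here is a
sketch of a cycle-counter definition for 'gcc'-compatible compilers on
x86, with a low-resolution fallback based on the standard 'clock'
function.  The names 'ticks', 'getticks', and 'elapsed' follow what we
believe to be the conventions of 'cycle.h', but treat them, and the
exact preprocessor tests, as assumptions rather than a copy of the
shipped file.

```c
#include <assert.h>
#include <time.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* x86: read the time-stamp counter with inline assembly. */
typedef unsigned long long ticks;

static inline ticks getticks(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((ticks) hi << 32) | (ticks) lo;
}
#else
/* Fallback for this sketch only: much coarser resolution. */
typedef clock_t ticks;

static inline ticks getticks(void)
{
    return clock();
}
#endif

/* Difference between two readings, in counter units. */
static inline double elapsed(ticks t1, ticks t0)
{
    return (double) t1 - (double) t0;
}

/* Time one call of 'work' in counter units. */
static double time_once(void (*work)(void))
{
    assert(work != 0);
    ticks t0 = getticks();
    work();
    return elapsed(getticks(), t0);
}
```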
6133
6134   If a cycle counter is not available on your system (e.g.  some
6135embedded processor), and you don't want to use estimated plans, as a
6136last resort you can use the '--with-slow-timer' option to 'configure'
6137(on Unix) or '#define WITH_SLOW_TIMER' in 'config.h' (elsewhere).  This
6138will use the much lower-resolution 'gettimeofday' function, or even
6139'clock' if the former is unavailable, and planning will be extremely
6140slow.
6141
6142
6143File: fftw3.info,  Node: Generating your own code,  Prev: Cycle Counters,  Up: Installation and Customization
6144
614510.4 Generating your own code
6146=============================
6147
6148The directory 'genfft' contains the programs that were used to generate
6149FFTW's "codelets," which are hard-coded transforms of small sizes.  We
6150do not expect casual users to employ the generator, which is a rather
6151sophisticated program that generates directed acyclic graphs of FFT
6152algorithms and performs algebraic simplifications on them.  It was
6153written in Objective Caml, a dialect of ML, which is available at
6154<http://caml.inria.fr/ocaml/index.en.html>.
6155
6156   If you have Objective Caml installed (along with recent versions of
6157GNU 'autoconf', 'automake', and 'libtool'), then you can change the set
6158of codelets that are generated or play with the generation options.  The
6159set of generated codelets is specified by the
6160'{dft,rdft}/{codelets,simd}/*/Makefile.am' files.  For example, you can
6161add efficient REDFT codelets of small sizes by modifying
6162'rdft/codelets/r2r/Makefile.am'.  After you modify any 'Makefile.am'
6163files, you can type 'sh bootstrap.sh' in the top-level directory
6164followed by 'make' to re-generate the files.
6165
6166   We do not provide more details about the code-generation process,
6167since we do not expect that most users will need to generate their own
6168code.  However, feel free to contact us at <fftw@fftw.org> if you are
6169interested in the subject.
6170
6171   You might find it interesting to learn Caml and/or some modern
6172programming techniques that we used in the generator (including monadic
6173programming), especially if you heard the rumor that Java and
6174object-oriented programming are the latest advancement in the field.
6175The internal operation of the codelet generator is described in the
6176paper, "A Fast Fourier Transform Compiler," by M. Frigo, which is
6177available from the FFTW home page (http://www.fftw.org) and also
6178appeared in the 'Proceedings of the 1999 ACM SIGPLAN Conference on
6179Programming Language Design and Implementation (PLDI)'.
6180
6181
6182File: fftw3.info,  Node: Acknowledgments,  Next: License and Copyright,  Prev: Installation and Customization,  Up: Top
6183
618411 Acknowledgments
6185******************
6186
6187Matteo Frigo was supported in part by the Special Research Program SFB
6188F011 "AURORA" of the Austrian Science Fund FWF and by MIT Lincoln
6189Laboratory.  For previous versions of FFTW, he was supported in part by
6190the Defense Advanced Research Projects Agency (DARPA), under Grants
6191N00014-94-1-0985 and F30602-97-1-0270, and by a Digital Equipment
6192Corporation Fellowship.
6193
6194   Steven G. Johnson was supported in part by a Dept. of Defense NDSEG
6195Fellowship, an MIT Karl Taylor Compton Fellowship, and by the Materials
6196Research Science and Engineering Center program of the National Science
6197Foundation under award DMR-9400334.
6198
6199   Code for the Cell Broadband Engine was graciously donated to the FFTW
6200project by the IBM Austin Research Lab and included in fftw-3.2.  (This
6201code was removed in fftw-3.3.)
6202
6203   Code for the MIPS paired-single SIMD support was graciously donated
6204to the FFTW project by CodeSourcery, Inc.
6205
6206   We are grateful to Sun Microsystems Inc. for its donation of a
6207cluster of 9 8-processor Ultra HPC 5000 SMPs (24 Gflops peak).  These
6208machines served as the primary platform for the development of early
6209versions of FFTW.
6210
6211   We thank Intel Corporation for donating a four-processor Pentium Pro
6212machine.  We thank the GNU/Linux community for giving us a decent OS to
6213run on that machine.
6214
6215   We are thankful to the AMD corporation for donating an AMD Athlon XP
62161700+ computer to the FFTW project.
6217
6218   We thank the Compaq/HP testdrive program and VA Software Corporation
6219(SourceForge.net) for providing remote access to machines that were used
6220to test FFTW.
6221
6222   The 'genfft' suite of code generators was written using Objective
6223Caml, a dialect of ML. Objective Caml is a small and elegant language
6224developed by Xavier Leroy.  The implementation is available from
6225'http://caml.inria.fr/' (http://caml.inria.fr/).  In previous releases
6226of FFTW, 'genfft' was written in Caml Light, by the same authors.  An
6227even earlier implementation of 'genfft' was written in Scheme, but Caml
6228is definitely better for this kind of application.
6229
6230   FFTW uses many tools from the GNU project, including 'automake',
6231'texinfo', and 'libtool'.
6232
6233   Prof. Charles E. Leiserson of MIT provided continuous support and
6234encouragement.  This program would not exist without him.  Charles also
6235proposed the name "codelets" for the basic FFT blocks.
6236
6237   Prof. John D. Joannopoulos of MIT demonstrated continuing tolerance
6238of Steven's "extra-curricular" computer-science activities, as well as
6239remarkable creativity in working them into his grant proposals.
6240Steven's physics degree would not exist without him.
6241
6242   Franz Franchetti wrote SIMD extensions to FFTW 2, which eventually
6243led to the SIMD support in FFTW 3.
6244
6245   Stefan Kral wrote most of the K7 code generator distributed with FFTW
62463.0.x and 3.1.x.
6247
6248   Andrew Sterian contributed the Windows timing code in FFTW 2.
6249
6250   Didier Miras reported a bug in the test procedure used in FFTW 1.2.
6251We now use a completely different test algorithm by Funda Ergun that
6252does not require a separate FFT program to compare against.
6253
6254   Wolfgang Reimer contributed the Pentium cycle counter and a few fixes
6255that help portability.
6256
6257   Ming-Chang Liu uncovered a well-hidden bug in the complex transforms
6258of FFTW 2.0 and supplied a patch to correct it.
6259
6260   The FFTW FAQ was written in 'bfnn' (Bizarre Format With No Name) and
6261formatted using the tools developed by Ian Jackson for the Linux FAQ.
6262
6263   _We are especially thankful to all of our users for their continuing
6264support, feedback, and interest during our development of FFTW._
6265
6266
6267File: fftw3.info,  Node: License and Copyright,  Next: Concept Index,  Prev: Acknowledgments,  Up: Top
6268
626912 License and Copyright
6270************************
6271
6272FFTW is Copyright (C) 2003, 2007-11 Matteo Frigo, Copyright (C) 2003,
62732007-11 Massachusetts Institute of Technology.
6274
6275   FFTW is free software; you can redistribute it and/or modify it under
6276the terms of the GNU General Public License as published by the Free
6277Software Foundation; either version 2 of the License, or (at your
6278option) any later version.
6279
6280   This program is distributed in the hope that it will be useful, but
6281WITHOUT ANY WARRANTY; without even the implied warranty of
6282MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
6283Public License for more details.
6284
6285   You should have received a copy of the GNU General Public License
6286along with this program; if not, write to the Free Software Foundation,
6287Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.  You can
6288also find the GPL on the GNU web site
6289(http://www.gnu.org/licenses/gpl-2.0.html).
6290
6291   In addition, we kindly ask you to acknowledge FFTW and its authors in
6292any program or publication in which you use FFTW. (You are not
6293_required_ to do so; it is up to your common sense to decide whether you
6294want to comply with this request or not.)  For general publications, we
6295suggest referencing: Matteo Frigo and Steven G. Johnson, "The design and
6296implementation of FFTW3," Proc. IEEE 93 (2), 216-231 (2005).
6297
6298   Non-free versions of FFTW are available under terms different from
6299those of the General Public License.  (For example, they do not require you to
6300accompany any object code using FFTW with the corresponding source
6301code.)  For these alternative terms you must purchase a license from
6302MIT's Technology Licensing Office.  Users interested in such a license
6303should contact us (<fftw@fftw.org>) for more information.
6304
6305