• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

FAQ/H03-May-2022-3,2882,482

cilk/H24-Mar-2003-1,5951,329

doc/H07-May-2022-19,52216,904

fftw/H03-May-2022-36,36631,954

fortran/H24-Mar-2003-9750

gensrc/H24-Mar-2003-5,0614,309

matlab/H24-Mar-2003-500324

mpi/H03-May-2022-5,4493,852

rfftw/H03-May-2022-28,98225,881

tests/H03-May-2022-3,8552,925

threads/H03-May-2022-5,3113,999

AUTHORSH A D16-Mar-2003120 53

COPYINGH A D16-Mar-200317.7 KiB342281

COPYRIGHTH A D16-Mar-2003797 1918

ChangeLogH A D24-Mar-2003161 KiB5,0403,222

INSTALLH A D16-Mar-20039 KiB215168

Makefile.amH A D16-Mar-20032 KiB4233

Makefile.inH A D03-May-202218.5 KiB577489

NEWSH A D24-Mar-200313.2 KiB339237

READMEH A D16-Mar-20032.4 KiB5641

README.hacksH A D16-Mar-20035 KiB13398

TODOH A D16-Mar-20033.1 KiB6452

acinclude.m4H A D16-Mar-20037.2 KiB239220

aclocal.m4H A D24-Mar-2003154.9 KiB4,7134,182

acx_mpi.m4H A D16-Mar-20033.1 KiB9789

acx_pthread.m4H A D16-Mar-20038.8 KiB227199

bootstrap.shH A D16-Mar-2003344 2210

config.guessH A D15-Mar-200340.4 KiB1,3891,194

config.subH A D15-Mar-200329.3 KiB1,4901,349

configureH A D24-Mar-2003449.1 KiB15,72212,934

configure.inH A D24-Mar-200312.1 KiB367317

depcompH A D17-Sep-200211.8 KiB424278

fftw.spec.inH A D16-Mar-20035.7 KiB172154

install-shH A D17-Sep-20025.4 KiB252153

ltmain.shH A D07-Jan-2003139.1 KiB5,0634,062

missingH A D17-Sep-200210 KiB337263

mkinstalldirsH A D17-Sep-20021.8 KiB10072

README

1This is FFTW, a collection of fast C routines to compute the Discrete
2Fourier Transform in one or more dimensions.
3
4`OFFICIAL' CODE:
5
6The doc/ directory contains the manual in texinfo, postscript, info,
7and HTML formats.  Frequently asked questions and answers can be found
8in the FAQ/ directory in a variety of formats (including HTML).
9
10The fftw/ directory contains the source code for the complex transforms,
11and the rfftw/ directory contains the source code for the real transforms.
12
13Large portions of the source are automatically generated by a program
14in the gensrc/ directory (written in Objective Caml).  You do not need
15this program to use FFTW, since FFTW comes with a default set of
16pregenerated codelets.  You are, however, welcome to look at and play
17with the generator (see the FFTW manual for more information).
18
19The threads/ directory contains an parallel version of FFTW (for
20shared-memory machines) that uses threads.  See the "Multi-threaded
21FFTW" section of the manual for more information.
22
23The mpi/ directory contains a parallel version of FFTW for transforms
24on machines with MPI.  (This code, unlike our other two parallel
25transforms, supports distributed memory machines.)  See the "MPI FFTW"
26section of the manual for more information.
27
28fortran/ contains some constant definitions for using FFTW from
29Fortran (see the FFTW manual), and also a small example program.
30
31Installation instructions are provided in the manual (don't worry, it
32is straightforward).
33
34`UNOFFICIAL' CODE (for you to play with):
35
36matlab/ contains code that allows you to call FFTW from MATLAB.
37
38The cilk/ directory contains an parallel version of FFTW written in
39Cilk.  Cilk is a cool C-like language in which you can write spawn
40foo() : foo will be executed in parallel with the main thread and the
41cost of spawn is just a few cycles (compare this with all the mess you
42have to do to create a posix thread and pay 3000 cycles for it).  More
43info on Cilk can be found at http://supertech.lcs.mit.edu/cilk/.
44
45CONTACTS
46--------
47
48FFTW was written by Matteo Frigo and Steven G. Johnson.  You can
49contact them at fftw@fftw.org.  The latest version of FFTW,
50benchmarks, links, and other information can be found at the FFTW home
51page (http://www.fftw.org).  You can also sign up to the fftw-announce
52mailing list to receive (infrequent) updates and information about new
53releases; to do so, go to:
54
55	http://www.fftw.org/mailman/listinfo/fftw-announce
56

README.hacks

1This file contains a random collection of the various hacks we did (or
2did not dare to do) in order to get performance.  It is not intended
3to be complete nor up to date, but it may be instructive for people to
4read (see also my (Matteo Frigo's) slides on the web page).
5
61) Negative constants:
7
8        a = 0.5 * b;               a = 0.5 * b;
9        c = 0.5 * d;	           c = -0.5 * d;
10        e = 1.0 + a;	 vs.       e = 1.0 + a;
11        f = 1.0 - c;	           f = 1.0 + c;
12
13   The code on the left is faster.  Guess why? The constants
14   0.5 and -0.5 are stored in different memory locations and loaded
15   separately.
16
172) Twiddle factors:
18
19   For some reason that escapes me, every package I have seen
20   (including FFTW 1.0) stores the twiddle factors in such
21   a way that the codelets need to access memory in a random
22   fashion.  It is much faster to just store the factors in the
23   order they will be needed.  This increased spatial locality
24   exploits caches and improves performance by about 30% for big
25   arrays.
26
273) Loops:
28
29   You may notice that the `twiddle' codelets contain a loop.
30   This is the most logical choice, as opposed to moving the loop
31   outside the codelet.  For example, one would expect that
32
33      for (i = 0; i < n; ++i)
34          foo();
35
36   be slower than the case where the loop is inside foo().  This is
37   usually the case, *except* for the ultrasparc.  FFTW would be 10%
38   faster (more or less) if the loop were outside the codelet.
39   Unfortunately, it would be slower everywhere else. If you want to
40   play with this, the codelet generator can be easily hacked...
41
424) Array padding:
43
44   (This is a disgusting hack that my programmer's ethic
45   pervents me from releasing to the public).
46
47   On the IBM RS/6000, for big n, the following **** is *way* faster:
48
49   - Take the input array
50   - `inflate' it, that is, produce a bigger array in which every
51     k-th element is unused (say k=512).  In other words, insert
52     holes in the input.
53   - execute fftw normally (properly hacked to keep the holes into
54     account)
55   - compact the array.
56
57   With this hack, FFTW is sensibly faster than IBM's ESSL library.
58   Sorry guys, I don't want to be responsible for releasing this
59   monster (this works only on the RS/6000, fortunately).
60
615) Phase of moon:
62
63   Don't take the numbers on the web page too seriously.  The
64   architectural details of modern machines make performance
65   *extremely* sensitive to little details such as where your code and
66   data are placed in memory.  The ultimate example of brain damage is
67   the Pentium Pro (what a surprise...), where we had a case in which
68   adding a printf() statement to FFTW slowed down another completely
69   unrelated fft program by a factor of 20.
70
71   Our strategy to generate huge pieces of straight-line code should
72   immunize FFTW sufficiently well against these problems, but you are
73   still likely to observe weird things like: FFTW_ESTIMATE can be
74   faster than FFTW_MEASURE.
75
76   FFTW-17.0 will compute the phase of the moon in the planner and take
77   it into account.
78
796) Estimator:
80
81   The estimator tries to come up with a `reasonable' plan.
82   Unfortunately, this concept is very machine-dependent.  The
83   estimator in the release is tuned for the UltraSPARC.
84
85   The following two parameters can be found in fftw/planner.c :
86
87   #define NOTW_OPTIMAL_SIZE 32
88   #define TWIDDLE_OPTIMAL_SIZE 12
89
90   They say: use non-twiddle codelets of size close to 32, and twiddle
91   codelets of size close to 12.  More or less, these are the most
92   efficient pieces of code on the UltraSPARC.  Your mileage *will*
93   vary.  Here are some suggestions for other machines.
94
95   Pentium
96
97   #define NOTW_OPTIMAL_SIZE 8   (or 16)
98   #define TWIDDLE_OPTIMAL_SIZE 8
99
100   Pentium Pro:
101
102   #define NOTW_OPTIMAL_SIZE 32   (or 16)
103   #define TWIDDLE_OPTIMAL_SIZE 2
104
105
106   RS/6000:
107
108   #define NOTW_OPTIMAL_SIZE 32
109   #define TWIDDLE_OPTIMAL_SIZE 32
110
111   The basic idea is: compute some plans for sizes you most care
112   about, print the plan and plug in the numbers that appear more
113   often (or some kind of average).  Note the dramatic difference
114   between Pentium an Pentium Pro. (NO LONGER TRUE WITH --enable-i386-hacks)
115
1167) Stack alignment:
117
118   Pentium-type processors impose a huge performance penalty if double-
119   precision values are not aligned to 8-byte boundaries in memory.
120   (We have seen factors of 3 or more in tight loops.)  Unfortunately,
121   the Intel ABI specifies that variables on the stack need only be aligned to
122   4-byte boundaries.  Even more unfortunately, this convention is followed
123   by Linux/x86 and gcc.
124
125   To get around this, we wrote special macros (HACK_ALIGN_STACK_EVEN
126   and HACK_ALIGN_STACK_ODD) that take effect when FFTW is compiled
127   with gcc/egcs and 'configure --enable-i386-hacks' is used.  These
128   macros are called before the computation-intensive "codelets"
129   of FFTW.  They use the gcc __builtin_alloca function to examine the
130   stack pointer and align the stack to an even or odd 4-byte boundary,
131   depending upon how many values will be pushed on the stack to call
132   the codelet.
133