Version 2.1.5 (3/24/2003)

  * Bug fix: Fortran wrappers were disabled in version 2.1.4.

Version 2.1.4 (3/16/2003)

  * Upgraded to newer versions of autoconf, etcetera, to fix compilation
    problems on various recent systems.

  * The configure script no longer picks the wrong architecture flags
    (which caused FFTW to crash) on newer IBM POWER machines running AIX.

  * Multi-threaded transforms should now utilize multiple CPUs on
    Solaris (which creates threads in single-processor mode by default).

  * Added experimental support for OpenMP (and SGI MP) compiler
    parallelization directives in the multi-threaded transforms,
    instead of using explicit thread spawning. Enable by configuring
    --with-openmp or --with-sgi-mp in addition to --enable-threads.

  * Expanded FAQ.

Version 2.1.3 (11/7/1999)

  * The configure script no longer overrides the CFLAGS environment
    variable if it is defined. (Thanks to Diab Jerius.)

  * Experimental Fortran-callable wrapper routines for MPI FFTW.
    See mpi/README.f77 for more information.

  * The configure script now detects and works around a stack
    alignment bug in gcc 2.95.x on x86.

  * configure attempts to guess the appropriate -mcpu flag on
    Linux/PPC systems, improving performance (especially on G3s with
    gcc 2.95 or later).

  * Fixed integer overflow bug for complex transforms of large prime
    sizes (> 32768). Thanks to Ezio Riva for the bug report.

  * Fixed memory leak in the Matlab wrappers; thanks to Matthew Davis
    for the bug report.

  * Fixed bugs in the configure script when detecting POSIX threads
    libraries on AIX and Tru64 (nee Digital) Unix.

  * Fixed bug in multi-threaded transforms on AIX (which strangely
    creates threads in non-joinable mode by default). Thanks to
    Jim Lindsay for the bug report, and for allowing us to debug on
    Northwestern University's IBM SP2.

  * Slight fix to help build DLL's on Win32 (thanks to Andrew Sterian).

Version 2.1.2 (5/18/1999)

  * Fixed bug in our MPI test programs which made them fail under MPICH with
    the p4 device (TCP/IP). (The 2.1.1 transforms worked, but the test
    programs crashed.)

  * Added missing fftw_f77_threads_init function to the Fortran wrappers
    for the multi-threaded transforms. Thanks to V. Sundararajan for
    the bug report.

  * The codelet generator can now output efficient hard-coded DCT/DST
    transforms. As a side effect of this work, we slightly reduced the
    code size of rfftw.

  * Test programs now support GNU-style long options when used with glibc.

  * Added some more ideas to our TODO list.

  * Improved codelet generator speed.

Version 2.1.1 (3/31/1999)

  * Fixed bug in the complex transforms for certain sizes with
    intermediate-length prime factors (17-97), which under some
    (hopefully rare) circumstances could cause incorrect results.
    Thanks to Ming-Chang Liu for the bug report and patch. (The test
    program will now catch this sort of problem when it is run in
    paranoid mode.)

Version 2.1 (3/8/1999)

  * Added Fortran-callable wrapper routines for the multi-threaded
    transforms.

  * Documentation fixes and improvements.

  * The --enable-type-prefix option to configure makes it easy to install
    both single- and double-precision versions of FFTW on the same
    (Unix) system. (See the installation section of the manual.)

  * The MPI FFTW routines now include parallel one-dimensional transforms
    for complex data. (See the fftw_mpi documentation in the FFTW
    manual.)

  * The MPI FFTW routines now include parallel multi-dimensional transforms
    specialized for real data. (See the rfftwnd_mpi documentation in the
    FFTW manual.)

  * The MPI FFTW routines are now documented in the main
    manual (in the doc directory).
    On Unix systems, they are also automatically configured, compiled,
    and installed along with the main FFTW library when you include
    --enable-mpi in the flags to the configure script. (See the FFTW
    manual.)

  * Largely-rewritten MPI code. It is now cleaner and (sometimes) faster.
    It also supports the option of a user-supplied workspace for (often)
    greater performance (using the MPI_Alltoall primitive). Beware that
    the interfaces have changed slightly, however.

  * The multi-threaded FFTW routines now include parallel one- and
    multi-dimensional transforms of real data. (See the rfftw_threads
    documentation in the FFTW manual.)

  * The multi-threaded FFTW routines are now documented in the main
    manual (in the doc directory). On Unix systems, they are also
    automatically configured, compiled, and installed along with the main
    FFTW library when you include --enable-threads in the flags to the
    configure script. (See the FFTW manual.)

  * The multi-threaded FFTW routines now include support for Mach C
    threads (used, for example, in Apple's MacOS X).

  * The Fortran-callable wrapper routines are now incorporated into
    the ordinary FFTW libraries by default (although you can
    disable this with the --disable-fortran option to configure) and
    are documented in the main FFTW manual.

  * Added an illustration of the data layout to the rfftwnd tutorial
    section of the manual, in the hope of preventing future confusion
    on this subject.

  * The test programs now allow you to specify multidimensional sizes
    (e.g. 128x54x81) for the -c and -s correctness and speed test options.

Version 2.0.1 (9/29/98)

  * (bug fix) Due to a poorly-parenthesized expression, rfftwnd overflowed
    32-bit integer precision for rank > 1 transforms with a final
    dimension >= 65536. This is now fixed. (Thanks to Walter Brisken
    for the bug report.)

  * (bug fix) Added definition of FFTW_OUT_OF_PLACE to fftw.h. The
    flag is mentioned several times in the documentation, but its
    definition was accidentally omitted since FFTW_OUT_OF_PLACE is the
    default behavior.

  * Corrected various small errors in the documentation. Thanks to
    Geir Thomassen and Jeremy Buhler for their comments.

  * Improved speed of the codelet generator by orders of magnitude,
    since a user needed a hard-coded fft of size 101.

  * Modified buffering in multidimensional transforms for some speed
    improvements (only when fftwnd_create_plan_specific is used).
    Thanks to Geert van Kempen for his tips.

  * Added Andrew Sterian's patch to allow FFTW to be used as a shared
    library more easily on Win32.

Version 2.0 (9/11/1998)

  * Completely rewritten real-complex transforms, now using
    specialized codelets and an inherently real-complex algorithm for
    greatly increased speed. Also, rfftw can now handle odd sizes and
    strided transforms. Beware that the output format for 1D rfftw
    transforms has changed. See the manual for more details.

  * The complex transforms now use a fast algorithm for large prime
    factors, working in O(N lg N) time even for prime sizes.
    (Previously, the complexity contained an O(p^2) term, where p is
    the largest prime factor of N. This is still the case for the
    rfftw transforms.) Small prime factors are still more efficient,
    however.

  * Added functions fftw_one, fftwnd_one, rfftw_one, etcetera, to
    simplify and clarify the use of fftw for single, unit-stride
    transforms.

  * Renamed FFTW_COMPLEX, FFTW_REAL to fftw_complex, fftw_real (for
    greater consistency in capitalization). The all-caps names will
    continue to be supported indefinitely, but are deprecated. (Also,
    support for the COMPLEX and REAL types from FFTW 1.0 is now
    disabled by default.)

  * There are now Fortran-callable wrappers for the rfftw real-complex
    transforms.

  * New section of the manual discussing the use of FFTW with multiple
    threads, and a new FFTW_THREADSAFE flag (described therein).

  * Added shared library support. Use configure --enable-shared to
    produce a shared library instead of a static library (the default).

  * Dropped support for the operation-count (*_op_count) routines
    introduced in v1.3, as these were little-used and were a pain to
    keep up-to-date as FFTW changed internally.

  * Made it easier to support floating-point types other than float
    and double (e.g. long double). (See the file fftw-int.h.)

Version 1.3 (4/9/1998)

  * Multi-dimensional transforms contain significant performance
    improvements for dimensions >= 3.

  * Performance improvements in multi-dimensional transforms
    with howmany > 1 and stride > dist.

  * Improved parallelization and performance in the threads
    code for dimensions >= 3.

  * Changed the wisdom import/export format (the new wisdom remembers
    the stride of the plan that generated it, for use with the new
    create_plan_specific functions). (You should regenerate any stored
    wisdom you have anyway, since this is a new version of FFTW.)

  * Several small fixes to aid compilation on some systems.

  * Fixed a bug in the MPI transform (in the transpose routine) that
    caused errors for some array sizes.

  * Fixed the (hopefully) last few things causing problems with C++
    compilers.

  * Hack for x86/gcc to properly align local double-precision variables.

  * Completely rewritten codelet generator. Now it produces
    better code for non powers of 2, and is ready to produce
    real->complex transforms.

  * Testing algorithm is now more robust, and has a more rigorous
    theoretical foundation.
    (Bugs in testing large transforms or in single precision are now
    fixed--these bugs were only in the test programs and not in the
    FFTW library itself.)

  * Added "specific" planners, which allow plan optimization for a
    specific array/stride. They also reduce the memory requirements
    of the planner, and permit new optimizations in the multi-dimensional
    case. (See the *_create_plan_specific functions.)

  * FFTW can now compute a count of the number of arithmetic operations
    it requires, which is useful for some academic purposes. (See the
    *_count_plan_ops functions.)

  * Adapted for use with GNU autoconf to aid installation on UNIX systems.
    (Installation on non-UNIX systems should be the same as before.)

  * Used gettimeofday function if available. (This function typically
    has much higher accuracy than clock(), permitting plans to be
    created much more quickly than before on many machines.)

  * Made timing algorithm (hopefully) more robust in the face of
    system interrupts, etc.

  * Added wrapper routines for calling FFTW from MATLAB (in the
    matlab/ directory).

  * Added wrapper routines for calling FFTW from Fortran (in the
    fortran/ directory). (These were available separately before.)

Version 1.2.1 (12/4/1997)

  * Fixed a third bug in the mpi transpose routines (sheesh!) that
    could cause problems when re-using a transpose plan. Thanks
    to Eric Skyllingstad for the bug reports.

  * Fixed another bug in the mpi transpose routines. This bug produced
    a memory leak and also occasionally tries to free a null pointer,
    which causes problems on some systems. The mpi transpose/fft routines
    now pass all of our malloc paranoia tests.

  * Fixed bug in mpi transpose routines, where wrong results
    could be given for some large 2D arrays.

Version 1.2 (9/8/1997)

  * Added a FAQ (in the FAQ/ directory).

  * Fixed bug in rfftwnd routines where a block was accidentally
    allocated to be too small, causing random memory to be
    overwritten (yikes!). (Amazingly, this bug only caused the
    test program to fail on one system that we could find. Our
    test suite can now catch this sort of bug.)

  * Abstractified taking differences of times (with fftw_time_diff
    macro/function) to allow more general timer data structures.

  * Added "wisdom" mechanism for saving plans & related info.

  * Made timing mechanism more robust and maintainable. (Instead of
    using a fixed number of iterations, we now repeatedly double
    the number of iterations until a specified time interval
    (FFTW_TIME_MIN) is reached.)

  * Fixed header files to prevent difficulties when a mix of C and
    C++ compilers is used, and to prevent problems with multiple
    inclusions.

  * Added experimental distributed-memory transforms using MPI.

  * Fixed memory leak in fftwnd_destroy_plan (reported by Richard
    Sullivan). Our test programs now all check for leaks.

Version 1.1 (5/5/1997)

  * Improved speed (yes!) [Some clever tricks with twiddle factors
    and better code generator]

  * Renamed `blocks' to `codelets', just to be fashionable

  * Rewritten planner and executor--much simpler and more readable
    code. Reference-counter garbage collection employed throughout.

  * Much improved codelet generator. The ML code should be now
    readable by humans, and easier to modify.

  * Support for Prime Factor transforms in the codelet generator.

  * Renamed COMPLEX -> FFTW_COMPLEX to avoid clashes with
    existing packages. COMPLEX is still supported
    for compatibility with 1.0

  * Added experimental real->complex transform (quick hack,
    use at your own risk).

  * Added experimental parallel transforms using Cilk.

  * Added experimental parallel transforms using threads (currently,
    POSIX threads and Solaris threads are implemented and tested).

  * Added DOS support, in the sense that we now support 8.3 filenames.

Version 1.0 (3/24/1997)

  * First release.