1Quad Double computation package 2Copyright (C) 2003-2018 3================================================ 4 5Revision date: 30 Oct 2018 6 7Authors: 8Yozo Hida U.C. Berkeley yozo@cs.berkeley.edu 9Xiaoye S. Li Lawrence Berkeley Natl Lab xiaoye@nersc.gov 10David H. Bailey Lawrence Berkeley Natl Lab dhbailey@lbl.gov 11 12C++ usage guide: 13Alex Kaiser Lawrence Berkeley Natl Lab adkaiser@lbl.gov 14 15This work was supported by the Director, Office of Science, Division of Mathematical, 16Information, and Computational Sciences of the U.S. Department of Energy under contract 17number DE-AC02-05CH11231. 18 19This work was supported by the Director, Office of Science, Division of Mathematical, 20Information, and Computational Sciences of the U.S. Department of Energy under contract 21numbers DE-AC03-76SF00098 and DE-AC02-05CH11231. 22 23*** IMPORTANT NOTES: 24 25See the file COPYING for modified BSD license information. 26See the file INSTALL for installation instructions. 27See the file NEWS for recent revisions. 28See the file README.pdf for additional information. The file is mostly identical, but 29 but includes additional tables on constructors and constants. 30 31Outline: 32 33I. Introduction 34II. Directories and Files 35III. C++ Usage 36IV. Fortran Usage 37V. Note on x86-Based Processors (MOST systems in use today) 38 39 40I. Introduction 41 42This package provides numeric types of twice the precision of IEEE double (106 mantissa 43bits, or approximately 32 decimal digits) and four times the precision of IEEE double (212 44mantissa bits, or approximately 64 decimal digits). Due to features such as operator and 45function overloading, these facilities can be utilized with only minor modifications to 46conventional C++ and Fortran-90 programs. 47 48In addition to the basic arithmetic operations (add, subtract, multiply, divide, square root), 49common transcendental functions such as the exponential, logarithm, trigonometric and 50hyperbolic functions are also included. A detailed description of the algorithms used is 51available in the docs subdirectory (see docs/qd.ps). An abridged version of this paper, 52which was presented at the ARITH-15 conference, is also available in this same directory 53(see docs/arith15.ps). 54 55II. Directories and Files 56 57There are six directories and several files in the main directory of this distribution, 58described below 59 60src This contains the source code of the quad-double and double-double 61 library. This source code does not include inline functions, 62 which are found in the header files in the include directory. 63 64include This directory contains the header files. 65 66fortran This directory contains Fortran-90 files. 67 68tests This directory contains some simple (not comprehensive) tests. 69 70docs This directory contains two papers describing the algorithms. 71 72config This directory contains various scripts used by the configure 73 script and the Makefile. 74 75 76 77C++ Usage: 78 79Please note that all commands refer to a Unix-type environment such as Mac OSX or Ubuntu 80Linux using the bash shell. 81 82 83A. Building 84 85To build the library, first run the included configure script by typing 86 87 ./configure 88 89This script automatically generates makefiles for building the library and selects compilers 90and necessary flags and libraries to include. If the user wishes to specify compilers or flags 91they may use the following options. 92 93 CXX C++ compiler to use 94 CXXFLAGS C++ compiler flags to use 95 CC C compiler to use (for C demo program) 96 CFLAGS C compiler flags to use (for C demo program) 97 FC Fortran 90 compiler 98 FCFLAGS Fortran 90 compiler flags to use 99 FCLIBS Fortran 90 libraries needed to link with C++ code. 100 101For example, if one is using GNU compilers, configure with: 102 103 ./configure CXX=g++ FC=gfortran 104 105The Fortran and C++ compilers must produce compatible binaries. On some systems 106additional flags must be included to ensure that portions of the 107library are not built with 32 and 64 bit object files. For example, on 10864-Bit Mac OSX 10.6 (Snow Leopard) and 10.7 (Lion) the correct 109configure line using GNU compilers is: 110 111 ./configure CXX=g++ FC=gfortran FCFLAGS=-m64 112 113To build the library, simply type 114 115 make 116 117and the automatically generated makefiles will build the library including archive files. 118 119To allow for easy linking to the library, the user may also wish to 120install the archive files to a standard place. To do this type: 121 122 make install 123 124This will also build the library if it has not already been built. Many systems, including Mac 125and Ubuntu Linux systems, require administrator privileges to install the library at such 126standard places. On such systems, one may type: 127 128 sudo make install 129 130instead if one has sufficient access. 131 132The directory "tests" contains programs for high precision quadrature and integer-relation 133detection. To build such programs, type: 134 135 make demo 136 137in the "tests" directory. 138 139B. Linking 140 141The simplest way to link to the library is to install it to a standard place as described above, 142and use the -l option. For example 143 144 g++ compileExample.cpp -o compileExample -l qd 145 146One can also use this method to build with make. A file called "compileExample.cpp" and the 147associated makefile "makeCompileExample" illustrate the process. 148 149A third alternative is to use a link script. If one types "make demo" in the test directory, the 150output produced gives guidance as to how to build the files. By following the structure of 151the compiling commands one may copy the appropriate portions, perhaps replacing the 152filename with an argument that the user can include at link time. An example of such a 153script is as follows: 154 155g++ -DHAVE_CONFIG_H -I.. -I../include -I../include -O2 -MT $1.o -MD -MP -MF 156.deps/qd_test.Tpo -c -o $1.o $1.cpp 157mv -f .deps/$1.Tpo .deps/$1.Po 158g++ -O2 -o $1 $1.o ../src/libqd.a Ðlm 159 160To use it, make the link script executable and type: 161 162./link.scr compileExample 163 164Note that the file extension is not included because the script handles all extensions, 165expecting the source file to have the extension ".cpp" . 166 167Similarly, a script for compiling fortran programs may be constructed 168as follows. In the fortran directory, type "make quadts". This 169compiles the Fortran program tquadts.f, links with all necessary 170library files, and produces the executable "quadts". As this is being 171done, all flags and linked libraries are displayed. For instance, on 172many Mac systems, presuming g++-4.0 was defined for C++ and gfortran 173for F90, the following is output: 174 175 gfortran -O2 -ffree-form -c -o tquadts.o tquadts.f 176 g++-4.0 -O2 -Wall -o quadts tquadts.o second.o libarprec_f_main.a 177 libarprecmod.a ../src/libarprec.a 178 -L/usr/local/lib/gcc/i386-apple-darwin9.0.0/4.3.0 179 -L/usr/local/lib/gcc/i386-apple-darwin9.0.0/4.3.0/../../.. -lgfortranbegin 180 -lgfortran 181 182Thus a general compile-link script (which could be saved in an 183executable file named "complink.scr") is the following: 184 185 gfortran -O2 -ffree-form -c -o $1.o $1.f 186 g++-4.0 -O2 -Wall -o $1 $1.o second.o libarprec_f_main.a \ 187 libarprecmod.a ../src/libarprec.a \ 188 -L/usr/local/lib/gcc/i386-apple-darwin9.0.0/4.3.0 \ 189 -L/usr/local/lib/gcc/i386-apple-darwin9.0.0/4.3.0/../../.. -lgfortranbegin \ 190 -lgfortran 191 192Note that if the .f90 suffix is used for Fortran-90 source files, the 193-ffree-form flag can be omitted, and the first line above ends with 194"$1.f90". Remember to type "chmod +x complink.scr". Then, for 195instance, a program named "prog.f90" could be compiled and linked by 196merely typing "./complink.scr prog". 197 198 199C. Programming techniques 200 201As much as possible, operator overloading is included to make basic programming as much 202like using standard typed floating-point arithmetic. Changing many codes should be as 203simple as changing type statements and a few other lines. 204 205i. Constructors 206 207To create dd_real and qd_real variables calculated to the proper precision, one must use 208care to use the included constructors properly. Many computations in which variables are 209not explicitly typed to multiple-precision may be evaluated with double-precision 210arithmetic. The user must take care to ensure that this does not cause errors. In particular, 211an expression such as 1.0/3.0 will be evaluated to double precision before assignment or 212further arithmetic. Upon assignment to a multi-precision variable, the value will be zero 213padded. This problem is serious and potentially difficult to debug. To avoid this, use the 214included constructors to force arithmetic to be performed in the full precision requested. 215 216 217For a table with descriptions, please see the included file README.pdf 218 219 220ii. Included functions and Constants 221 222Supported functions include assignment operators, comparisons, arithmetic and 223assignment operators, and increments for integer types. Standard C math functions such as 224exponentiation, trigonometric, logarithmic, hyperbolic, exponential and rounding functions 225are included. As in assignment statements, one must be careful with implied typing of 226constants when using these functions. Many codes need particular conversion for the power 227function, which is frequently used with constants that must be explicitly typed for multi- 228precision codes. 229 230Many constants are included, which are global and calculated upon initialization. The 231following list of constants is calculated for both the dd_real and qd_real classes separately. 232Use care to select the correct value. 233 234 235For a table with descriptions, please see the included file README.pdf 236 237ii. Conversion of types 238 239Static casts may be used to convert constants between types. One may also use constructors 240to return temporary multi-precision types within expressions, but should be careful, as this 241will waste memory if done repeatedly. For example: 242 243 qd_real y ; 244 y = sin( qd_real(4.0) / 3.0 ) ; 245 246CÐstyle casts may be used, but are not recommended. Dynamic and reinterpret casts are 247not supported and should be considered unreliable. Casting between multi-precision and 248standard precision types can be dangerous, and care must be taken to ensure that programs 249are working properly and accuracy has not degraded by use of a misplaced type-conversion. 250 251D. Available precision, Control of Precision Levels, 252 253The library provides greatly extended accuracy when compared to standard double 254precision. The type dd_real provides for 106 mantissa bits, or about 32 decimal digits. The 255type qd_real provides for 212 mantissa bits, or about 64 decimal digits. 256 257Both the dd_real and qd_real values use the exponent from the highest double-precision 258word for arithmetic, and as such do not extend the total range of values available. That 259means that the maximum absolute value for either data type is the same as that of double- 260precision, or approximately 10^308. The precision near this range, however, is greatly 261increased. 262 263To ensure that arithmetic is carried out with proper precision and accuracy, one must call 264the function fpu_fix_start before performing any double-double or quad-double 265arithmetic. This forces all arithmetic to be carried out in 64-bit double precision, not the 80- 266bit precision that is found on certain compilers and interferes with the existing library. 267 268 unsigned int old_cw; 269 fpu_fix_start(&old_cw); 270 271To return standard settings for arithmetic on oneÕs system, call the function Òfpu_fix_endÓ. 272For example: 273 274 fpu_fix_end(&old_cw); 275 276 277E. I/O 278 279The standard I/O stream routines have been overloaded to be fully compatible with all 280included data types. One may need to manually reset the precision of the stream to obtain 281full output. For example, if 60 digits are desired, use: 282 283cout.precision(60) ; 284 285When reading values using cin, each input numerical value must start on a separate 286line. Two formats are acceptable: 287 288 1. Write the full constant 289 3. Mantissa e exponent 290 291Here are three valid examples: 292 293 1.1 294 3.14159 26535 89793 295 123.123123e50 296 297 298When read using cin, these constants will be converted using full multi-precision accuracy. 299 300 301IV. Fortran-90 Usage 302 303NEW (2007-01-10): The Fortran translation modules now support the complex datatypes 304"dd_complex" and "qd_complex". 305 306Since the quad-double library is written in C++, it must be linked in with a C++ compiler (so 307that C++ specific things such as static initializations are correctly handled). Thus the main 308program must be written in C/C++ and call the Fortran 90 subroutine. The Fortran 90 309subroutine should be called f_main. 310 311Here is a sample Fortran-90 program, equivalent to the above C++ program: 312 313 subroutine f_main 314 use qdmodule 315 implicit none 316 type (qd_real) a, b 317 integer*4 old_cw 318 319 call f_fpu_fix_start(old_cw) 320 a = 1.d0 321 b = cos(a)**2 + sin(a)**2 - 1.d0 322 call qdwrite(6, b) 323 stop 324 end subroutine 325 326This verifies that cos^2(1) + sin^2(1) = 1 to 64 digit accuracy. 327 328Most operators and generic function references, including many mixed-mode type 329combinations with double-precision (ie real*8), have been overloaded (extended) to work 330with double-double and quad-double data. It is important, however, that users keep in 331mind the fact that expressions are evaluated strictly according to conventional Fortran 332operator precedence rules. Thus some subexpressions may be evaluated only to 15-digit 333accuracy. For example, with the code 334 335 real*8 d1 336 type (dd_real) t1, t2 337 ... 338 t1 = cos (t2) + d1/3.d0 339 340the expression d1/3.d0 is computed to real*8 accuracy only (about 15 digits), since both d1 341and 3.d0 have type real*8. This result is then converted to dd_real by zero extension before 342being added to cos(t2). So, for example, if d1 held the value 1.d0, then the quotient d1/3.d0 343would only be accurate to 15 digits. If a fully accurate double-double quotient is required, 344this should be written: 345 346 real*8 d1 347 type (dd_real) t1, t2 348 ... 349 t1 = cos (t2) + ddreal (d1) / 3.d0 350 351which forces all operations to be performed with double-double arithmetic. 352 353Along this line, a constant such as 1.1 appearing in an expression is evaluated only to real*4 354accuracy, and a constant such as 1.1d0 is evaluated only to real*8 accuracy (this is 355according to standard Fortran conventions). If full quad-double accuracy is required, for 356instance, one should write 357 358 type (qd_real) t1 359 ... 360 t1 = '1.1' 361 362The quotes enclosing 1.1 specify to the compiler that the constant is to be converted to 363binary using quad-double arithmetic, before assignment to t1. Quoted constants may only 364appear in assignment statements such as this. 365 366To link a Fortran-90 program with the C++ qd library, it is recommended to link with the 367C++ compiler used to generate the library. The Fortran 90 interface (along with a C-style 368main function calling f_main) is found in qdmod library. The qd-config script installed 369during "make install" can be used to determine which flags to pass to compile and link your 370programs: 371 372 "qd-config --fcflags" displays compiler flags needed to compile your Fortran files. 373 "qd-config --fclibs" displays linker flags needed by the C++ linker to link in all the 374necessary libraries. 375 376A sample Makefile that can be used as a template for compiling Fortran programs using 377quad-double library is found in fortran/Makefile.sample. 378 379F90 functions defined with dd_real arguments: 380 Arithmetic: + - * / ** 381 Comparison tests: == < > <= >= /= 382 Others: abs, acos, aint, anint, asin, atan, atan2, cos, cosh, dble, erf, 383 erfc, exp, int, log, log10, max, min, mod, ddcsshf (cosh and sinh), 384 ddcssnf (cos and sin), ddranf (random number generator in (0,1)), 385 ddnrtf (n-th root), sign, sin, sinh, sqr, sqrt, tan, tanh 386Similar functions are provided for qd_real arguments (with function 387 names qdcsshf, qdcssnf, qdranf and qdnrtf instead of the names in 388 the list above). 389 390Input and output of double-double and quad-double data is done using the special 391subroutines ddread, ddwrite, qdread and qdwrite. The first argument of these subroutines 392is the Fortran I/O unit number, while additional arguments (as many as needed, up to 9 393arguments) are scalar variables or array elements of the appropriate type. Example: 394 395 integer n 396 type (qd_real) qda, qdb, qdc(n) 397 ... 398 call qdwrite (6, qda, qdb) 399 do j = 1, n 400 call qdwrite (6, qdc(j)) 401 enddo 402 403Each input values must be on a separate line, and may include D or E exponents. Double- 404double and quad-double constants may also be specified in assignment statements by 405enclosing them in quotes, as in 406 407 ... 408 type (qd_real) pi 409 ... 410 pi = 411"3.14159265358979323846264338327950288419716939937510582097494459230" 412 ... 413 414Sample Fortran-90 programs illustrating some of these features are provided in the f90 415subdirectory. 416 417 418V. Note on x86-Based Processors (MOST systems in use today) 419 420The algorithms in this library assume IEEE double precision floating point arithmetic. Since 421Intel x86 processors have extended (80-bit) floating point registers, the round-to-double 422flag must be enabled in the control word of the FPU for this library to function properly 423under x86 processors. The following functions contains appropriate code to facilitate 424manipulation of this flag. For non-x86 systems these functions do nothing (but still exist). 425 426fpu_fix_start This turns on the round-to-double bit in the control word. 427fpu_fix_end This restores the control flag. 428 429These functions must be called by the main program, as follows: 430 431 int main() { 432 unsigned int old_cw; 433 fpu_fix_start(&old_cw); 434 435 ... user code using quad-double library ... 436 437 fpu_fix_end(&old_cw); 438 } 439 440A Fortran-90 example is the following: 441 442 subroutine f_main 443 use qdmodule 444 implicit none 445 integer*4 old_cw 446 447 call f_fpu_fix_start(old_cw) 448 449 ... user code using quad-double library ... 450 451 call f_fpu_fix_end(old_cw) 452 end subroutine 453 454