\documentclass[11pt]{report}

\usepackage{indentfirst}
\usepackage[body={6in,8.5in}]{geometry}
\usepackage{hyperref}
\usepackage{graphicx}
\DeclareGraphicsRule{.ps}{eps}{}{}

\renewcommand{\thesection}{\arabic{section}}
\setcounter{tocdepth}{3}
\setcounter{secnumdepth}{3}

\begin{document}
\begin{center}
  {\Large LAPACK Working Note 81\\
  Quick Installation Guide for LAPACK on Unix Systems\footnote{This work was
  supported by NSF Grant No. ASC-8715728 and NSF Grant No. 0444486}}
\end{center}
\begin{center}
% Edward Anderson\footnote{Current address: Cray Research Inc.,
% 655F Lone Oak Drive, Eagan, MN 55121},
  The LAPACK Authors\\
  Department of Computer Science \\
  University of Tennessee \\
  Knoxville, Tennessee 37996-1301 \\
\end{center}
\begin{center}
  REVISED: VERSION 3.1.1, February 2007 \\
  REVISED: VERSION 3.2.0, November 2008
\end{center}

\begin{center}
Abstract
\end{center}
This working note describes how to install and test version 3.2.0
of LAPACK, a linear algebra package for high-performance
computers, on a Unix system. The timing routines are not included in
release 3.2.0; the parts of this LAWN that discuss timing refer to release 3.0. Also,
version 3.2.0 contains many prototype routines needing user feedback.
Non-Unix installation instructions and
further details of the testing and timing suites are contained only in
LAPACK Working Note 41, and not in this abbreviated version.
%Separate instructions are provided for the Unix and non-Unix
%versions of the test package.
%Further details are also given on the design of the test and timing
%programs.
\newpage

\tableofcontents

\newpage
% Introduction to Implementation Guide

\section{Introduction}

LAPACK is a linear algebra library for high-performance
computers.
The library includes Fortran subroutines for
the analysis and solution of systems of simultaneous linear algebraic
equations, linear least-squares problems, and matrix eigenvalue
problems.
Our approach to achieving high efficiency is based on the use of
a standard set of Basic Linear Algebra Subprograms (the BLAS),
which can be optimized for each computing environment.
By confining most of the computational work to the BLAS,
the subroutines should be
transportable and efficient across a wide range of computers.

This working note describes how to install, test, and time this
release of LAPACK on a Unix system.

The instructions for installing, testing, and
timing\footnote{The timing programs are provided only in LAPACK 3.0 and earlier.}
are designed for a person whose
responsibility is the maintenance of a mathematical software library.
We assume the installer has experience in compiling and running
Fortran programs and in creating object libraries.
The installation process involves untarring the file, creating a set of
libraries, and compiling and running the test and timing
programs\footnotemark[\value{footnote}].

%This guide combines the instructions for the Unix and non-Unix
%versions of the LAPACK test package (the non-Unix version is in Appendix
%~\ref{appendixe}).
%At this time, the non-Unix version of LAPACK can only be obtained
%after first untarring the Unix tar tape and then following the instructions in
%Appendix ~\ref{appendixe}.

Section~\ref{fileformat} describes how the files are organized in the
distribution file, and
Section~\ref{overview} gives a general overview of the parts of the test package.
Step-by-step instructions appear in Section~\ref{installation}.
%for the Unix version and in the appendix for the non-Unix version.

Users who want additional information should refer to LAPACK
Working Note 41.
% Sections~\ref{moretesting}
%and ~\ref{moretiming} give
%details of the test and timing programs and their input files.
%Appendices ~\ref{appendixa} and ~\ref{appendixb} briefly describe
%the LAPACK routines and auxiliary routines provided
%in this release.
%Appendix ~\ref{appendixc} lists the operation counts we have computed
%for the BLAS and for some of the LAPACK routines.
Appendix~\ref{appendixd}, entitled ``Caveats'', is a compendium of the known
problems from our own experiences, with suggestions on how to
overcome them.

\textbf{It is strongly advised that the user read Appendix
A before proceeding with the installation process.}
%Appendix E contains the execution times of the different test
%and timing runs on two sample machines.
%Appendix ~\ref{appendixe} contains the instructions to install LAPACK on a non-Unix
%system.

\section{Revisions Since the First Public Release}

Since its first public release in February 1992, LAPACK has had
several updates, which have introduced new routines
and extended the functionality of existing routines. The first
update,
June 30, 1992, was version 1.0a; the second update, October 31, 1992,
was version 1.0b; the third update, March 31, 1993, was version 1.1;
version 2.0 on September 30, 1994, coincided with the release of the
Second Edition of the LAPACK Users' Guide;
version 3.0 on June 30, 1999, coincided with the release of the Third Edition of
the LAPACK Users' Guide;
version 3.1 was released in November 2006;
version 3.1.1 was released in February 2007;
and version 3.2.0 was released in November 2008.

All LAPACK routines reflect the current version number with the date
on the routine indicating when it was last modified.
For more information on revisions in the latest release, please refer
to the \texttt{revisions.info} file in the lapack directory on netlib.
\begin{quote}
\url{http://www.netlib.org/lapack/revisions.info}
\end{quote}

%The distribution \texttt{tar} file \texttt{lapack.tar.z} that is
%available on netlib is always the most up-to-date.
%
%On-line manpages (troff files) for LAPACK driver and computational
%routines, as well as most of the BLAS routines, are available via
%the \texttt{lapack} index on netlib.

\section{File Format}\label{fileformat}

The software for LAPACK is distributed in the form of a
gzipped tar file (via anonymous ftp or the World Wide Web),
which contains the Fortran source for LAPACK,
the Basic Linear Algebra Subprograms
(the Level 1, 2, and 3 BLAS) needed by LAPACK, the testing programs,
and the timing programs\footnotemark[\value{footnote}].
Users who wish to have a non-Unix installation should refer to LAPACK
Working Note 41,
although the overview in Section~\ref{overview} applies to both the Unix and non-Unix
versions.
%Users who wish to have a non-Unix installation should go to Appendix ~\ref{appendixe},
%although the overview in section ~\ref{overview} applies to both the Unix and non-Unix
%versions.

The package may be accessed via the World Wide Web through
the URL:
\begin{quote}
\url{http://www.netlib.org/lapack/lapack.tgz}
\end{quote}

Alternatively, you can retrieve the file via anonymous ftp at netlib:

\begin{verbatim}
   ftp ftp.netlib.org
   login: anonymous
   password: <your email address>
   cd lapack
   binary
   get lapack.tgz
   quit
\end{verbatim}

The software in the \texttt{tar} file
is organized in a number of essential directories as shown
in Figure 1. Please note that this figure does not reflect everything
that is contained in the \texttt{LAPACK} directory. Input and instructional
files are also located at various levels.
\begin{figure}
\vspace{11pt}
\centerline{\includegraphics[width=6.5in,height=3in]{org2.ps}}
\caption{Unix organization of LAPACK 3.0}
\vspace{11pt}
\end{figure}
Libraries are created in the LAPACK directory and
executable files are created in one of the directories BLAS, TESTING,
or TIMING\footnotemark[\value{footnote}]. Input files for the test and
timing\footnotemark[\value{footnote}] programs are also
found in these three directories so that testing may be carried out
in the directories LAPACK/BLAS, LAPACK/TESTING, and LAPACK/TIMING\footnotemark[\value{footnote}].
A top-level makefile in the LAPACK directory is provided to perform the
entire installation procedure.

\section{Overview of Tape Contents}\label{overview}

Most routines in LAPACK occur in four versions: REAL,
DOUBLE PRECISION, COMPLEX, and COMPLEX*16.
The first three versions (REAL, DOUBLE PRECISION, and COMPLEX)
are written in standard Fortran and are completely portable;
the COMPLEX*16 version is provided for
those compilers that allow this data type.
Some routines use features of Fortran 90.
For convenience, we often refer to routines by their single precision
names; the leading `S' can be replaced by a `D' for double precision,
a `C' for complex, or a `Z' for complex*16.
For LAPACK use and testing you must decide which version(s)
of the package you intend to install at your site (for example,
REAL and COMPLEX on a Cray computer or DOUBLE PRECISION and
COMPLEX*16 on an IBM computer).

\subsection{LAPACK Routines}

There are three classes of LAPACK routines:
\begin{itemize}

\item \textbf{driver} routines solve a complete problem, such as solving
a system of linear equations or computing the eigenvalues of a real
symmetric matrix. Users are encouraged to use a driver routine if there
is one that meets their requirements.
The driver routines are listed
in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
%in Appendix ~\ref{appendixa}.

\item \textbf{computational} routines, also called simply LAPACK routines,
perform a distinct computational task, such as computing
the $LU$ decomposition of an $m$-by-$n$ matrix or finding the
eigenvalues and eigenvectors of a symmetric tridiagonal matrix using
the $QR$ algorithm.
The LAPACK routines are listed in LAPACK Working Note 41~\cite{WN41}
and the LAPACK Users' Guide~\cite{LUG}.
%The LAPACK routines are listed in Appendix ~\ref{appendixa}; see also LAPACK
%Working Note \#5 \cite{WN5}.

\item \textbf{auxiliary} routines are all the other subroutines called
by the driver routines and computational routines.
%Among them are subroutines to perform subtasks of block algorithms,
%in particular, the unblocked versions of the block algorithms;
%extensions to the BLAS, such as matrix-vector operations involving
%complex symmetric matrices;
%the special routines LSAME and XERBLA which first appeared with the
%BLAS;
%and a number of routines to perform common low-level computations,
%such as computing a matrix norm, generating an elementary Householder
%transformation, and applying a sequence of plane rotations.
%Many of the auxiliary routines may be of use to numerical analysts
%or software developers, so we have documented the Fortran source for
%these routines with the same level of detail used for the LAPACK
%routines and driver routines.
The auxiliary routines are listed in LAPACK Working Note 41~\cite{WN41}
and the LAPACK Users' Guide~\cite{LUG}.
%The auxiliary routines are listed in Appendix ~\ref{appendixb}.
\end{itemize}

\subsection{Level 1, 2, and 3 BLAS}

The BLAS are a set of Basic Linear Algebra Subprograms that perform
vector-vector, matrix-vector, and matrix-matrix operations.
LAPACK is designed around the Level 1, 2, and 3 BLAS, and nearly all
of the parallelism in the LAPACK routines is contained in the BLAS.
Therefore,
the key to getting good performance from LAPACK lies in having an
efficient version of the BLAS optimized for your particular machine.
Optimized BLAS libraries are available on a variety of architectures;
refer to the BLAS FAQ on netlib for further information.
\begin{quote}
\url{http://www.netlib.org/blas/faq.html}
\end{quote}
There are also freely available BLAS generators that automatically
tune a subset of the BLAS for a given architecture, for example:
\begin{quote}
\url{http://www.netlib.org/atlas/}
\end{quote}
If neither option is available, the Fortran~77 reference implementation
of the Level 1, 2, and 3 BLAS can be obtained from netlib (it is also included in
the LAPACK distribution tar file).
\begin{quote}
\url{http://www.netlib.org/blas/blas.tgz}
\end{quote}
No matter which BLAS library is used, the BLAS test programs should
always be run.

Users should not expect too much from the Fortran~77 reference implementation
BLAS; these versions were written to define the basic operations and do not
employ the standard tricks for optimizing Fortran code.

The formal definitions of the Level 1, 2, and 3 BLAS
are in \cite{BLAS1}, \cite{BLAS2}, and \cite{BLAS3}.
The BLAS Quick Reference card is available on netlib.

\subsection{Mixed- and Extended-Precision BLAS: XBLAS}

The XBLAS extend the BLAS to work with mixed input and output
precisions and to use extra precision internally. The XBLAS are
used in the prototype extra-precise iterative refinement codes.

The current release of the XBLAS is available through
Netlib\footnote{Development versions may be available through
  \url{http://www.cs.berkeley.edu/~yozo/} or
  \url{http://www.nersc.gov/~xiaoye/XBLAS/}.} at
\begin{quote}
  \url{http://www.netlib.org/xblas}
\end{quote}
Their formal definition is in \cite{XBLAS}.

\subsection{LAPACK Test Routines}

This release contains two distinct test programs for LAPACK routines
in each data type. One test program tests the routines for solving
linear equations and linear least squares problems,
and the other tests routines for the matrix eigenvalue problem.
The routines for generating test matrices are compiled into a
separate library that is used by both test programs.

\subsection{LAPACK Timing Routines (for LAPACK 3.0 and before)}

This release also contains two distinct timing programs for the
LAPACK routines in each data type.
The linear equation timing program gathers performance data in
megaflops on the factor, solve, and inverse routines for solving
linear systems, the routines to generate or apply an orthogonal matrix
given as a sequence of elementary transformations, and the reductions
to bidiagonal, tridiagonal, or Hessenberg form for eigenvalue
computations.
The operation counts used in computing the megaflop rates are computed
from a formula;
see LAPACK Working Note 41~\cite{WN41}.
% see Appendix ~\ref{appendixc}.
The eigenvalue timing program is used with the eigensystem routines
and returns the execution time, number of floating point operations, and
megaflop rate for each of the requested subroutines.
In this program, the number of operations is computed while the
code is executing using special instrumented versions of the LAPACK
subroutines.
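
As a worked illustration of the formula-based approach used by the linear
equation timing program, the megaflop rate is the operation count divided by
$10^{6}$ times the elapsed time $t$ in seconds.
The count shown here for the $LU$ factorization of an $n$-by-$n$ matrix is the
usual leading-order estimate and is an assumption made for this sketch; the
exact formulas used by the timing program are those given in
LAPACK Working Note 41~\cite{WN41}.
\[
\mathrm{Mflop/s} = \frac{\mathrm{ops}}{10^{6}\, t},
\qquad
\mathrm{ops}_{LU}(n) \approx \frac{2}{3}\, n^{3} .
\]
For example, with the hypothetical values $n = 1000$ and $t = 0.5$ seconds,
$\mathrm{ops}_{LU} \approx 6.67 \times 10^{8}$, so the reported rate would be
about $1333$ megaflops.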

\section{Installing LAPACK on a Unix System}\label{installation}

Installing, testing, and timing\footnotemark[2] the Unix version of LAPACK
involves the following steps:
\begin{enumerate}
\item Gunzip and untar the file.

\item Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit the copy.

\item Edit the file \texttt{LAPACK/Makefile} and type \texttt{make}.

%\item Test and Install the Machine-Dependent Routines \\
%\emph{(WARNING: You may need to supply a correct version of second.f and
%dsecnd.f for your machine)}
%{\tt
%\begin{list}{}{}
%\item cd LAPACK
%\item make install
%\end{list} }
%
%\item Create the BLAS Library, \emph{if necessary} \\
%\emph{(NOTE: For best performance, it is recommended you use the manufacturers' BLAS)}
%{\tt
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blaslib}
%\end{list} }
%
%\item Run the Level 1, 2, and 3 BLAS Test Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blas\_testing}
%\end{list}
%
%\item Create the LAPACK Library
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make lapacklib}
%\end{list}
%
%\item Create the Library of Test Matrix Generators
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make tmglib}
%\end{list}
%
%\item Run the LAPACK Test Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make testing}
%\end{list}
%
%\item Run the LAPACK Timing Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make timing}
%\end{list}
%
%\item Run the BLAS Timing Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blas\_timing}
%\end{list}
\end{enumerate}

\subsection{Untar the File}

If you received a tar file of LAPACK via the World Wide
Web or anonymous ftp, enter the following
command:

\begin{list}{}{}
\item{\texttt{gunzip -c lapack.tgz | tar xvf -}}
\end{list}

\noindent
This will create a top-level directory called \texttt{LAPACK}, which
requires approximately 34 Mbytes of disk space.
The total space requirement, including the object files and executables,
is approximately 100 Mbytes for all four data types.

\subsection{Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and Edit It}

Before the libraries can be built, or the testing and timing\footnotemark[2] programs
run, you must define all machine-specific parameters for the
architecture on which you are installing LAPACK. All machine-specific
parameters are contained in the file \texttt{LAPACK/make.inc}.
An example of \texttt{LAPACK/make.inc} for a Linux machine with GNU compilers is given
in \texttt{LAPACK/make.inc.example}. Copy that file to \texttt{LAPACK/make.inc} by entering the following command:

\begin{list}{}{}
\item{\texttt{cp LAPACK/make.inc.example LAPACK/make.inc}}
\end{list}

\noindent
Now modify your \texttt{LAPACK/make.inc} by applying the following recommendations.
The first line of this \texttt{make.inc} file is:
\begin{quote}
\texttt{SHELL = /bin/sh}
\end{quote}
and it will need to be modified to \texttt{SHELL = /sbin/sh} if you are
installing LAPACK on an SGI architecture.
Next, you will need to modify \texttt{FC}, \texttt{FFLAGS},
\texttt{FFLAGS\_DRV}, \texttt{FFLAGS\_NOOPT}, and \texttt{LDFLAGS} to specify
the compiler, compiler options, compiler options for the testing and
timing\footnotemark[2] main programs, and linker options.
Next, you will have to choose which timing function the
\texttt{SECOND} and \texttt{DSECND} routines will use.
\begin{verbatim}
#  Default: SECOND and DSECND will use a call to the
#  EXTERNAL FUNCTION ETIME
#TIMER = EXT_ETIME
#  For RS6K: SECOND and DSECND will use a call to the
#  EXTERNAL FUNCTION ETIME_
#TIMER = EXT_ETIME_
#  For gfortran compiler: SECOND and DSECND will use a call to the
#  INTERNAL FUNCTION ETIME
TIMER = INT_ETIME
#  If your Fortran compiler does not provide etime (like Nag Fortran
#  Compiler, etc...) SECOND and DSECND will use a call to the
#  INTERNAL FUNCTION CPU_TIME
#TIMER = INT_CPU_TIME
#  If none of these work, you can use the NONE value.
#  In that case, SECOND and DSECND will always return 0.
#TIMER = NONE
\end{verbatim}
Refer to Section~\ref{second} for more information.


Next, you will need to modify \texttt{AR}, \texttt{ARFLAGS}, and \texttt{RANLIB} to specify the archiver,
archiver options, and ranlib for your machine. If your architecture
does not require \texttt{ranlib} to be run after each archive command (as
is the case with CRAY computers running UNICOS, Hewlett Packard
computers running HP-UX, or SUN SPARCstations running Solaris), set
\texttt{RANLIB = echo}. Finally, you must
modify the \texttt{BLASLIB} definition to specify the BLAS library to which
you will be linking. If an optimized version of the BLAS is available
on your machine, we strongly recommend linking to that library.
Otherwise, by default, \texttt{BLASLIB} is set to the Fortran~77 version.

If you want to enable the XBLAS, define the variable \texttt{USEXBLAS}
to some value, for example \texttt{USEXBLAS = Yes}. Then set the
variable \texttt{XBLASLIB} to point at the XBLAS library. Note that
the prototype iterative refinement routines and their testers will not
be built unless \texttt{USEXBLAS} is defined.

\textbf{NOTE:} Example \texttt{make.inc} include files are contained in the
\texttt{LAPACK/INSTALL} directory.
Please refer to
Appendix~\ref{appendixd} for machine-specific installation hints, and/or
the \texttt{release\_notes} file on \texttt{netlib}.
\begin{quote}
\url{http://www.netlib.org/lapack/release\_notes}
\end{quote}

\subsection{Edit the file \texttt{LAPACK/Makefile}}\label{toplevelmakefile}

This \texttt{Makefile} can be modified to perform as much of the
installation process as the user desires. Ideally, this is the ONLY
makefile the user must modify. However, modification of lower-level
makefiles may be necessary if a specific routine needs to be compiled
with a different level of optimization.

First, edit the definitions of \texttt{blaslib}, \texttt{lapacklib},
\texttt{tmglib}, \texttt{lapack\_testing}, and \texttt{timing}\footnotemark[2] in the file \texttt{LAPACK/Makefile}
to specify the data types desired. For example,
if you only wish to compile the single precision real version of the
LAPACK library, you would modify the \texttt{lapacklib} definition to be:

\begin{verbatim}
lapacklib:
	$(MAKE) -C SRC single
\end{verbatim}

Likewise, you could specify \texttt{double}, \texttt{complex}, or \texttt{complex16} to
build the double precision real, single precision complex, or double
precision complex libraries, respectively. By default, when no
arguments follow the \texttt{make} command, all four data types
are built.
The make command can be run more than once to add another
data type to the library if necessary.
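
As an illustrative sketch (not taken from the distributed \texttt{Makefile}),
a site that needs only the two double precision data types might combine
the data type names in one definition. This assumes that the data type targets in
\texttt{LAPACK/SRC/Makefile} can be requested together on a single
\texttt{make} command line, which is ordinary \texttt{make} behavior but
should be checked against your copy of the distribution:

\begin{verbatim}
# Hypothetical variant: build only DOUBLE PRECISION and COMPLEX*16
lapacklib:
	$(MAKE) -C SRC double complex16
\end{verbatim}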

%If you are installing LAPACK on a Silicon Graphics machine, you must
%modify the respective definitions of \texttt{testing} and \texttt{timing} to be
%\begin{verbatim}
%testing:
%        ( cd TESTING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}
%and
%\begin{verbatim}
%timing:
%        ( cd TIMING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}

Next, if you will be using a locally available BLAS library, you will need
to remove \texttt{blaslib} from the \texttt{lib} definition. Finally,
if you do not wish to build all of the libraries individually and
run all of the testing and timing separately, you can
modify the \texttt{all} definition to specify how much of the
installation process you want performed. By default,
the \texttt{all} definition is set to
\begin{verbatim}
all: lapack_install lib lapack_testing blas_testing
\end{verbatim}
which will perform all phases of the installation
process: testing of machine-dependent routines, building the libraries,
BLAS testing, and LAPACK testing.

The entire installation process will then be performed by typing
\texttt{make}.

Questions and/or comments can be directed to the
authors as described in Section~\ref{sendresults}. If test failures
occur, please refer to the appropriate subsection in
Section~\ref{furtherdetails}.

If disk space is limited, we suggest building each data type separately
and/or deleting all object files after building the libraries. Likewise, all
testing and timing executables can be deleted after the testing and timing
process is completed.
All object files and executables
can be removed as follows:

\begin{list}{}{}
\item \texttt{cd LAPACK}
\item \texttt{make cleanobj}
\end{list}

\section{Further Details of the Installation Process}\label{furtherdetails}

Alternatively, you can choose to run each of the phases of the
installation process separately. The following sections give details
on how this may be achieved.

\subsection{Test and Install the Machine-Dependent Routines}

There are six machine-dependent functions in the test and timing
package, at least three of which must be installed. They are

\begin{tabbing}
MONOMO \= DOUBLE PRECYSION \= \kill
LSAME \> LOGICAL \> Test if two characters are the same regardless of case \\
SLAMCH \> REAL \> Determine machine-dependent parameters \\
DLAMCH \> DOUBLE PRECISION \> Determine machine-dependent parameters \\
SECOND \> REAL \> Return time in seconds from a fixed starting time \\
DSECND \> DOUBLE PRECISION \> Return time in seconds from a fixed starting time\\
ILAENV \> INTEGER \> Check that NaN and infinity arithmetic are IEEE-754 compliant
\end{tabbing}

\noindent
If you are working only in single precision, you do not need to install
DLAMCH and DSECND, and if you are working only in double precision,
you do not need to install SLAMCH and SECOND.

These six functions are provided in \texttt{LAPACK/INSTALL},
along with six test programs.
To compile the six test programs and run the tests, go to \texttt{LAPACK} and
type \texttt{make lapack\_install}. The test programs are called
\texttt{testlsame}, \texttt{testslamch}, \texttt{testdlamch}, \texttt{testsecond}, \texttt{testdsecnd}, and
\texttt{testieee}.
If you do not wish to run all tests, you will need to modify the
\texttt{lapack\_install} definition in the \texttt{LAPACK/Makefile} to include only the
tests you wish to run. Otherwise, all tests will be performed.
The expected results of each test program are described below.

\subsubsection{Installing LSAME}

LSAME is a logical function with two character parameters, A and B.
It returns .TRUE. if A and B are the same regardless of case, or .FALSE.
if they are different.
For example, the expression

\begin{list}{}{}
\item \texttt{LSAME( UPLO, 'U' )}
\end{list}
\noindent
is equivalent to
\begin{list}{}{}
\item \texttt{( UPLO.EQ.'U' ).OR.( UPLO.EQ.'u' )}
\end{list}

The test program in \texttt{lsametst.f} tests all combinations of
the same character in upper and lower case for A and B, and two
cases where A and B are different characters.

Run the test program by typing \texttt{testlsame}.
If LSAME works correctly, the only message you should see after the
execution of \texttt{testlsame} is
\begin{verbatim}
 ASCII character set
 Tests completed
\end{verbatim}
The file \texttt{lsame.f} is automatically copied to
\texttt{LAPACK/BLAS/SRC/} and \texttt{LAPACK/SRC/}.
The function LSAME is needed by both the BLAS and LAPACK, so it is safer
to have it in both libraries as long as this does not cause trouble
in the link phase when both libraries are used.

\subsubsection{Installing SLAMCH and DLAMCH}

SLAMCH and DLAMCH are real functions with a single character parameter
that indicates the machine parameter to be returned. The test
program in \texttt{slamchtst.f}
simply prints out the different values computed by SLAMCH,
so you need to know something about what the values should be.
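
As a rough guide, the expected values can be derived by hand once the
arithmetic is known. The sketch below assumes IEEE-754 single precision
with rounding to nearest, that is, base $\beta = 2$, $t = 24$ mantissa
digits, minimum exponent $e_{\min} = -125$, and largest exponent
$e_{\max} = 128$ in the convention reported by SLAMCH; other arithmetics
give other values.
\begin{eqnarray*}
\mbox{Epsilon} & = & \frac{1}{2}\,\beta^{1-t} = 2^{-24} \approx 5.96046 \times 10^{-8} \\
\mbox{Precision} & = & \mbox{Epsilon} \times \beta = 2^{-23} \approx 1.19209 \times 10^{-7} \\
\mbox{Underflow threshold} & = & \beta^{e_{\min}-1} = 2^{-126} \approx 1.17549 \times 10^{-38} \\
\mbox{Overflow threshold} & = & (1 - \beta^{-t})\,\beta^{e_{\max}} = (1 - 2^{-24})\,2^{128} \approx 3.40282 \times 10^{38}
\end{eqnarray*}
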
For example, the output of the test program executable \texttt{testslamch}
for SLAMCH on a Sun SPARCstation is
\begin{verbatim}
 Epsilon                      =   5.96046E-08
 Safe minimum                 =   1.17549E-38
 Base                         =   2.00000
 Precision                    =   1.19209E-07
 Number of digits in mantissa =   24.0000
 Rounding mode                =   1.00000
 Minimum exponent             =  -125.000
 Underflow threshold          =   1.17549E-38
 Largest exponent             =   128.000
 Overflow threshold           =   3.40282E+38
 Reciprocal of safe minimum   =   8.50706E+37
\end{verbatim}
On a Cray machine, the safe minimum underflows its output
representation and the overflow threshold overflows its output
representation, so the safe minimum is printed as 0.00000 and overflow
is printed as R. This is normal.
If you would prefer to print a representable number, you can modify
the test program to print SFMIN*100. and RMAX/100. for the safe
minimum and overflow thresholds.

Likewise, the test executable \texttt{testdlamch} is run for DLAMCH.

If both tests were successful, go to Section~\ref{second}.

If SLAMCH (or DLAMCH) returns an invalid value, you will have to create
your own version of this function. The following options are used in
LAPACK and must be set:

\begin{list}{}{}
\item {`B': } Base of the machine
\item {`E': } Epsilon (relative machine precision)
\item {`O': } Overflow threshold
\item {`P': } Precision = Epsilon*Base
\item {`S': } Safe minimum (often same as underflow threshold)
\item {`U': } Underflow threshold
\end{list}

Some people may be familiar with R1MACH (D1MACH), a primitive
routine for setting machine parameters in which the user must
comment out the appropriate assignment statements for the target
machine.
If a version of R1MACH is on hand, the assignments in
SLAMCH can be made to refer to R1MACH using the correspondence

\begin{list}{}{}
\item {SLAMCH( `U' )} $=$ R1MACH( 1 )
\item {SLAMCH( `O' )} $=$ R1MACH( 2 )
\item {SLAMCH( `E' )} $=$ R1MACH( 3 )
\item {SLAMCH( `B' )} $=$ R1MACH( 5 )
\end{list}

\noindent
The safe minimum returned by SLAMCH( `S' ) is initially set to the
underflow value, but if $1/(\mathrm{overflow}) \geq (\mathrm{underflow})$
it is recomputed as $(1/(\mathrm{overflow})) \times ( 1 + \varepsilon )$,
where $\varepsilon$ is the machine precision.

BE AWARE that the initial call to SLAMCH or DLAMCH is expensive.
We suggest that installers run it once, save the results, and hard-code
the constants in the version they put in their library.

\subsubsection{Installing SECOND and DSECND}\label{second}

Both the timing routines\footnotemark[2] and the test routines call SECOND
(DSECND), a real function with no arguments that returns the time
in seconds from some fixed starting time.
Our version of this routine
returns only ``user time'', and not ``user time $+$ system time''.
The versions of SECOND in \texttt{second\_EXT\_ETIME.f} and \texttt{second\_INT\_ETIME.f} call
ETIME, a Fortran library routine available on some computer systems.
If ETIME is not available or a better local timing function exists,
you will have to provide the correct interface to SECOND and DSECND
on your machine.

Since LAPACK 3.1.1 we provide five different versions of the SECOND and DSECND routines.
The version that will be used depends on the value of the TIMER variable in \texttt{make.inc}.

\begin{itemize}
\item If ETIME is available as an external function, set the value of the TIMER variable in your
make.inc to \texttt{EXT\_ETIME}: \texttt{second\_EXT\_ETIME.f} and \texttt{dsecnd\_EXT\_ETIME.f} will be used.
Usually on HPPA architectures,
the compiler and linker flag \texttt{+U77} should be included to access
the function \texttt{ETIME}.

\item If ETIME\_ is available as an external function, set the value of the TIMER variable in your make.inc
to \texttt{EXT\_ETIME\_}: \texttt{second\_EXT\_ETIME\_.f} and \texttt{dsecnd\_EXT\_ETIME\_.f} will be used.
This is the case on some IBM architectures such as IBM RS/6000s.

\item If ETIME is available as an internal function, set the value of the TIMER variable in your make.inc
to \texttt{INT\_ETIME}: \texttt{second\_INT\_ETIME.f} and \texttt{dsecnd\_INT\_ETIME.f} will be used.
This is the case with gfortran.

\item If CPU\_TIME is available as an internal function, set the value of the TIMER variable in your make.inc
to \texttt{INT\_CPU\_TIME}: \texttt{second\_INT\_CPU\_TIME.f} and \texttt{dsecnd\_INT\_CPU\_TIME.f} will be used.

\item If none of these functions is available, set the value of the TIMER variable in your make.inc
to \texttt{NONE}: \texttt{second\_NONE.f} and \texttt{dsecnd\_NONE.f} will be used.
These routines will always return zero.
\end{itemize}

The test program in \texttt{secondtst.f}
performs a million operations using 5000 iterations of
the SAXPY operation $y := y + \alpha x$ on a vector of length 100.
The total time and megaflops for this test are reported, then
the operation is repeated including a call to SECOND on each of
the 5000 iterations to determine the overhead due to calling SECOND.
The test program executable is called \texttt{testsecond} (or \texttt{testdsecnd}).
There is no single right answer, but the times
in seconds should be positive and the megaflop rates should be
appropriate for your machine.

\subsubsection{Testing IEEE arithmetic and ILAENV}\label{testieee}

%\textbf{If you are installing LAPACK on a non-IEEE machine, you MUST
%modify ILAENV! Otherwise, ILAENV will crash .
%By default, ILAENV
%assumes an IEEE machine, and does a test for IEEE-754 compliance.}

As some new routines in LAPACK rely on IEEE-754 compliance,
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
machine, and does a test for IEEE-754 compliance. \textbf{NOTE: If you
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
as this test inside ILAENV will crash!}

If \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.

Thus, for non-IEEE machines, the user must hard-code the setting of
(\texttt{ILAENV=0}) for (\texttt{ISPEC=10} and \texttt{ISPEC=11}) in the version
of \texttt{LAPACK/SRC/ilaenv.f} to be put in
their library. There are also specialized testing and timing\footnotemark[\value{footnote}] versions of
ILAENV that will need to be modified.
\begin{itemize}
\item Testing version in \texttt{LAPACK/TESTING/LIN/ilaenv.f}
\item Testing version in \texttt{LAPACK/TESTING/EIG/ilaenv.f}
\item Timing version in \texttt{LAPACK/TIMING/LIN/ilaenv.f}
\item Timing version in \texttt{LAPACK/TIMING/EIG/ilaenv.f}
\end{itemize}

%Some new routines in LAPACK rely on IEEE-754 compliance, and if non-compliance
%is detected (via a call to the function ILAENV), alternative (slower)
%algorithms will be chosen.
%For further details, refer to the leading comments of routines such
%as \texttt{LAPACK/SRC/sstevr.f}.

The test program in \texttt{LAPACK/INSTALL/tstiee.f} checks an installation
architecture
to see if infinity arithmetic and NaN arithmetic are IEEE-754 compliant.
A warning message to the user is printed if non-compliance is detected.
This same test is performed inside the function ILAENV. If
\texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.

To avoid this IEEE test being run every time you call
\texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )}, we suggest
that the user hard-code the setting of
\texttt{ILAENV=1} or \texttt{ILAENV=0} in the version of \texttt{LAPACK/SRC/ilaenv.f} to be put in
their library. As mentioned above, there are also specialized testing and
timing\footnotemark[\value{footnote}] versions of ILAENV that will need to be modified.

\subsection{Create the BLAS Library}

Ideally, a highly optimized version of the BLAS library already
exists on your machine.
In this case you can go directly to Section~\ref{testblas} to
make the BLAS test programs.

\begin{itemize}
\item[a)]
Go to \texttt{LAPACK} and edit the definition of \texttt{blaslib} in the
file \texttt{Makefile} to specify the data types desired, as in the example
in Section~\ref{toplevelmakefile}.

If you already have some of the BLAS, you will need to edit the file
\texttt{LAPACK/BLAS/SRC/Makefile} to comment out the lines
defining the BLAS you have.

\item[b)]
Type \texttt{make blaslib}.
The make command can be run more than once to add another
data type to the library if necessary.
\end{itemize}

\noindent
The BLAS library is created in \texttt{LAPACK/librefblas.a},
or in the user-defined location specified by \texttt{BLASLIB} in the file
\texttt{LAPACK/make.inc}.

\subsection{Run the BLAS Test Programs}\label{testblas}

Test programs for the Level 1, 2, and 3 BLAS are in the directory
\texttt{LAPACK/BLAS/TESTING}.

To compile and run the Level 1, 2, and 3 BLAS test programs,
go to \texttt{LAPACK} and type \texttt{make blas\_testing}. The executable
files are called \texttt{xblat\_s}, \texttt{xblat\_d}, \texttt{xblat\_c}, and
\texttt{xblat\_z}, where the \_ (underscore) is replaced by 1, 2, or 3,
depending upon the level of BLAS being tested. All executable and
output files are created in \texttt{LAPACK/BLAS/}.
For the Level 1 BLAS tests, the output file names are \texttt{sblat1.out},
\texttt{dblat1.out}, \texttt{cblat1.out}, and \texttt{zblat1.out}. For the Level
2 and 3 BLAS, the name of the output file is indicated on the first line of the
input file and is currently defined to be \texttt{sblat2.out} for
the Level 2 REAL version, and \texttt{sblat3.out} for the Level 3 REAL
version, with similar names for the other data types.

If the tests using the supplied data files were completed successfully,
consider whether the tests were sufficiently thorough.
For example, on a machine with vector registers, at least one value
of $N$ greater than the length of the vector registers should be used;
otherwise, important parts of the compiled code may not be
exercised by the tests.
If the tests were not successful, either because the program did not
finish or the test ratios did not pass the threshold, you will
probably have to find and correct the problem before continuing.
If you have been testing a system-specific
BLAS library, try using the Fortran BLAS for the routines that
did not pass the tests.
For more details on the BLAS test programs,
see \cite{BLAS2-test} and \cite{BLAS3-test}.

\subsection{Create the LAPACK Library}

\begin{itemize}
\item[a)]
Go to the directory \texttt{LAPACK} and edit the definition of
\texttt{lapacklib} in the file \texttt{Makefile} to specify the data types desired,
as in the example in Section~\ref{toplevelmakefile}.

\item[b)]
Type \texttt{make lapacklib}.
The make command can be run more than once to add another
data type to the library if necessary.

\end{itemize}

\noindent
The LAPACK library is created in \texttt{LAPACK/liblapack.a},
or in the user-defined location specified by \texttt{LAPACKLIB} in the file
\texttt{LAPACK/make.inc}.

\subsection{Create the Test Matrix Generator Library}

\begin{itemize}
\item[a)]
Go to the directory \texttt{LAPACK} and edit the definition of \texttt{tmglib}
in the file \texttt{Makefile} to specify the data types desired, as in the
example in Section~\ref{toplevelmakefile}.

\item[b)]
Type \texttt{make tmglib}.
The make command can be run more than once to add another
data type to the library if necessary.

\end{itemize}

\noindent
The test matrix generator library is created in \texttt{LAPACK/libtmglib.a},
or in the user-defined location specified by \texttt{TMGLIB} in the file
\texttt{LAPACK/make.inc}.

\subsection{Run the LAPACK Test Programs}

There are two distinct test programs for LAPACK routines
in each data type, one for the linear equation routines and
one for the eigensystem routines.
In each data type, there is one input file for testing the linear
equation routines and eighteen input files for testing the eigenvalue
routines.
The input files reside in \texttt{LAPACK/TESTING}.
For more information on the test programs and how to modify the
input files, please refer to LAPACK Working Note 41~\cite{WN41}.
% see Section~\ref{moretesting}.

If you do not wish to run each of the tests individually, you can
go to \texttt{LAPACK}, edit the definition of \texttt{lapack\_testing} in the file
\texttt{Makefile} to specify the data types desired, and type \texttt{make
lapack\_testing}. This will
compile and run the tests as described in Sections~\ref{testlin}
and~\ref{testeig}.
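
As a sketch of this route, the commands below run the whole test suite from
the top-level directory and keep a copy of the screen output. They assume a
Bourne-compatible shell and that \texttt{LAPACK/make.inc} has already been
set up for your machine; the log file name \texttt{testing.log} is only an
illustrative choice.
\begin{verbatim}
cd LAPACK
# Build and run the linear equation and eigensystem tests,
# saving everything make prints to a (hypothetical) log file.
make lapack_testing > testing.log 2>&1
\end{verbatim}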

%If you are installing LAPACK on a Silicon Graphics machine, you must
%modify the definition of \texttt{testing} to be
%\begin{verbatim}
%testing:
%       ( cd TESTING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}

\subsubsection{Testing the Linear Equations Routines}\label{testlin}

\begin{itemize}

\item[a)]
Go to \texttt{LAPACK/TESTING/LIN} and type \texttt{make} followed by the data types
desired. The executable files are called \texttt{xlintsts}, \texttt{xlintstc},
\texttt{xlintstd}, and \texttt{xlintstz} and are created in \texttt{LAPACK/TESTING}.

\item[b)]
Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
For the REAL version, the command is
\begin{list}{}{}
\item{} \texttt{xlintsts < stest.in > stest.out}
\end{list}

\noindent
The tests using \texttt{xlintstd}, \texttt{xlintstc}, and \texttt{xlintstz} are similar
with the leading `s' in the input and output file names replaced
by `d', `c', or `z'.

\end{itemize}

If you encountered failures in this phase of the testing process, please
refer to Section~\ref{sendresults}.

\subsubsection{Testing the Eigensystem Routines}\label{testeig}

\begin{itemize}

\item[a)]
Go to \texttt{LAPACK/TESTING/EIG} and type \texttt{make} followed by the data types
desired. The executable files are called \texttt{xeigtsts},
\texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} and are created
in \texttt{LAPACK/TESTING}.

\item[b)]
Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
The tests for the eigensystem routines use eighteen separate input files
for testing the nonsymmetric eigenvalue problem,
the symmetric eigenvalue problem, the banded symmetric eigenvalue
problem, the generalized symmetric eigenvalue
problem, the generalized nonsymmetric eigenvalue problem, the
singular value decomposition, the banded singular value decomposition,
the generalized singular value
decomposition, the generalized QR and RQ factorizations, the generalized
linear regression model, and the constrained linear least squares
problem.
The tests for the REAL version are as follows:
\begin{list}{}{}
\item \texttt{xeigtsts < nep.in > snep.out}
\item \texttt{xeigtsts < sep.in > ssep.out}
\item \texttt{xeigtsts < svd.in > ssvd.out}
\item \texttt{xeigtsts < sec.in > sec.out}
\item \texttt{xeigtsts < sed.in > sed.out}
\item \texttt{xeigtsts < sgg.in > sgg.out}
\item \texttt{xeigtsts < sgd.in > sgd.out}
\item \texttt{xeigtsts < ssg.in > ssg.out}
\item \texttt{xeigtsts < ssb.in > ssb.out}
\item \texttt{xeigtsts < sbb.in > sbb.out}
\item \texttt{xeigtsts < sbal.in > sbal.out}
\item \texttt{xeigtsts < sbak.in > sbak.out}
\item \texttt{xeigtsts < sgbal.in > sgbal.out}
\item \texttt{xeigtsts < sgbak.in > sgbak.out}
\item \texttt{xeigtsts < glm.in > sglm.out}
\item \texttt{xeigtsts < gqr.in > sgqr.out}
\item \texttt{xeigtsts < gsv.in > sgsv.out}
\item \texttt{xeigtsts < lse.in > slse.out}
\end{list}
The tests using \texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} also
use the input files \texttt{nep.in}, \texttt{sep.in}, \texttt{svd.in},
\texttt{glm.in}, \texttt{gqr.in}, \texttt{gsv.in}, and \texttt{lse.in},
but the leading `s' in the other input file names must be changed
to `c', `d', or `z'.
\end{itemize}

If you encountered failures in this phase of the testing process, please
refer to Section~\ref{sendresults}.
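
The eighteen REAL runs listed in step b) can also be scripted. The following
is a minimal sketch, assuming a Bourne-compatible shell and that
\texttt{xeigtsts} has already been built in \texttt{LAPACK/TESTING}; the
\texttt{./} prefix is our assumption for systems where the current directory
is not on the search path.
\begin{verbatim}
cd LAPACK/TESTING
# Input files shared by all data types: the output name gets an s prefix.
for t in nep sep svd glm gqr gsv lse; do
    ./xeigtsts < ${t}.in > s${t}.out
done
# Input files whose names already carry the s prefix.
for t in sec sed sgg sgd ssg ssb sbb sbal sbak sgbal sgbak; do
    ./xeigtsts < ${t}.in > ${t}.out
done
\end{verbatim}
\noindent
For the other data types, the executable name and the `s' prefixes in both
loops would change to `c', `d', or `z' as described above.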

\subsection{Run the LAPACK Timing Programs (For LAPACK 3.0 and before)}

There are two distinct timing programs for LAPACK routines
in each data type, one for the linear equation routines and
one for the eigensystem routines. The timing program for the
linear equation routines is also used to time the BLAS.
We encourage you to conduct these timing experiments
in REAL and COMPLEX or in DOUBLE PRECISION and COMPLEX*16; it is
not necessary to send timing results in all four data types.

Two sets of input files are provided, a small set and a large set.
The small data sets are appropriate for a standard workstation or
other non-vector machine.
The large data sets are appropriate for supercomputers, vector
computers, and high-performance workstations.
We are mainly interested in results from the large data sets, and
it is not necessary to run both the large and small sets.
The values of N in the large data sets are about five times larger
than those in the small data sets,
and the large data sets use additional values for parameters such as the
block size NB and the leading array dimension LDA.
The names of the small data sets end in \_small, as in
\texttt{stime\_small.in}, and the names of the large data sets end in \_large,
as in \texttt{stime\_large.in}.
Except as noted, the leading `s' in the input file name must be
replaced by `d', `c', or `z' for the other data types.

We encourage you to obtain timing results with the large data sets,
as this allows us to compare different machines.
If this would take too much time, suggestions for paring back the large
data sets are given in the instructions below.
We also encourage you to experiment with these timing
programs and send us any interesting results, such as results for
larger problems or for a wider range of block sizes.
The main programs are dimensioned for the large data sets,
so the parameters in the main program may have to be reduced in order
to run the small data sets on a small machine, or increased to run
experiments with larger problems.

The minimum time each subroutine will be timed is set to 0.0 in
the large data files and to 0.05 in the small data files, and on
many machines this value should be increased.
If the timing interval is not long
enough, the time for the subroutine after subtracting the overhead
may be very small or zero, resulting in megaflop rates that are
very large or zero. (To avoid division by zero, the megaflop rate is
set to zero if the time is less than or equal to zero.)
The minimum time that should be used depends on the machine and the
resolution of the clock.

For more information on the timing programs and how to modify the
input files, please refer to LAPACK Working Note 41~\cite{WN41}.
% see Section~\ref{moretiming}.

If you do not wish to run each of the timings individually, you can
go to \texttt{LAPACK}, edit the definition of \texttt{lapack\_timing} in the file
\texttt{Makefile} to specify the data types desired, and type \texttt{make
lapack\_timing}. This will compile
and run the timings for the linear equation routines and the eigensystem
routines (see Sections~\ref{timelin} and~\ref{timeeig}).

%If you are installing LAPACK on a Silicon Graphics machine, you must
%modify the definition of \texttt{timing} to be
%\begin{verbatim}
%timing:
%       ( cd TIMING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}

If you encounter failures in any phase of the timing process, please
feel free to contact the authors as directed in Section~\ref{sendresults}.
Tell us the
type of machine on which the tests were run, the version of the operating
system, the compiler and compiler options that were used,
and details of the BLAS library or libraries that you used. You should
also include a copy of the output file in which the failure occurs.

Please note that the BLAS
timing runs will still need to be run as instructed in Section~\ref{timeblas}.

\subsubsection{Timing the Linear Equations Routines}\label{timelin}

The linear equation timing program is found in \texttt{LAPACK/TIMING/LIN}
and the input files are in \texttt{LAPACK/TIMING}.
Three input files are provided in each data type for timing the
linear equation routines, one for square matrices, one for band
matrices, and one for rectangular matrices. The small data sets for the REAL version
are \texttt{stime\_small.in}, \texttt{sband\_small.in}, and \texttt{stime2\_small.in}, respectively,
and the large data sets are
\texttt{stime\_large.in}, \texttt{sband\_large.in}, and \texttt{stime2\_large.in}.

The timing program for the least squares routines uses special instrumented
versions of the LAPACK routines to time individual sections of the code.
The first step in compiling the timing program is therefore to make a library
of the instrumented routines.

\begin{itemize}
\item[a)]
\begin{sloppypar}
To make a library of the instrumented LAPACK routines, first
go to \texttt{LAPACK/TIMING/LIN/LINSRC} and type \texttt{make} followed
by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
The library of instrumented code is created in
\texttt{LAPACK/TIMING/LIN/linsrc.a}.
\end{sloppypar}

\item[b)]
To make the linear equation timing programs,
go to \texttt{LAPACK/TIMING/LIN} and type \texttt{make} followed by the data
types desired, as in the examples in Section~\ref{toplevelmakefile}.
The executable files are called \texttt{xlintims},
\texttt{xlintimc}, \texttt{xlintimd}, and \texttt{xlintimz} and are created
in \texttt{LAPACK/TIMING}.

\item[c)]
Go to \texttt{LAPACK/TIMING} and
make any necessary modifications to the input files.
You may need to set the minimum time a subroutine will
be timed to a positive value, or to restrict the size of the tests
if you are using a computer with performance in between that of a
workstation and that of a supercomputer.
The computational requirements can be cut in half by using only one
value of LDA.
If it is necessary to also reduce the matrix sizes or the values of
the blocksize, corresponding changes should be made to the
BLAS input files (see Section~\ref{timeblas}).

\item[d)]
Run the programs for each data type you are using.
For the REAL version, the commands for the small data sets are

\begin{list}{}{}
\item{} \texttt{xlintims < stime\_small.in > stime\_small.out }
\item{} \texttt{xlintims < sband\_small.in > sband\_small.out }
\item{} \texttt{xlintims < stime2\_small.in > stime2\_small.out }
\end{list}
or the commands for the large data sets are
\begin{list}{}{}
\item{} \texttt{xlintims < stime\_large.in > stime\_large.out }
\item{} \texttt{xlintims < sband\_large.in > sband\_large.out }
\item{} \texttt{xlintims < stime2\_large.in > stime2\_large.out }
\end{list}

\noindent
Similar commands should be used for the other data types.
\end{itemize}

\subsubsection{Timing the BLAS}\label{timeblas}

The linear equation timing program is also used to time the BLAS.
Three input files are provided in each data type for timing the Level
2 and 3 BLAS.
These input files time the BLAS using the matrix shapes encountered
in the LAPACK routines, and we will use the results to analyze the
performance of the LAPACK routines.
For the REAL version, the small data files are
\texttt{sblasa\_small.in}, \texttt{sblasb\_small.in}, and \texttt{sblasc\_small.in}
and the large data files are
\texttt{sblasa\_large.in}, \texttt{sblasb\_large.in}, and \texttt{sblasc\_large.in}.
There are three sets of inputs because there are three
parameters in the Level 3 BLAS, M, N, and K, and
in most applications one of these parameters is small (on the order
of the blocksize) while the other two are large (on the order of the
matrix size).
In \texttt{sblasa\_small.in}, M and N are large but K is
small, while in \texttt{sblasb\_small.in} the small parameter is M, and
in \texttt{sblasc\_small.in} the small parameter is N.
The Level 2 BLAS are timed only in the first data set, where K
is also used as the bandwidth for the banded routines.

\begin{itemize}

\item[a)]
Go to \texttt{LAPACK/TIMING} and
make any necessary modifications to the input files.
You may need to set the minimum time a subroutine will
be timed to a positive value.
If you modified the values of N or NB
in Section~\ref{timelin}, set M, N, and K accordingly.
The large parameters among M, N, and K
should be the same as the matrix sizes used in timing the linear
equation routines,
and the small parameter should be the same as the
blocksizes used in timing the linear equation routines.
If necessary, the large data set can be simplified by using only one
value of LDA.

\item[b)]
Run the programs for each data type you are using.
For the REAL version, the commands for the small data sets are

\begin{list}{}{}
\item{} \texttt{xlintims < sblasa\_small.in > sblasa\_small.out }
\item{} \texttt{xlintims < sblasb\_small.in > sblasb\_small.out }
\item{} \texttt{xlintims < sblasc\_small.in > sblasc\_small.out }
\end{list}
or the commands for the large data sets are
\begin{list}{}{}
\item{} \texttt{xlintims < sblasa\_large.in > sblasa\_large.out }
\item{} \texttt{xlintims < sblasb\_large.in > sblasb\_large.out }
\item{} \texttt{xlintims < sblasc\_large.in > sblasc\_large.out }
\end{list}

\noindent
Similar commands should be used for the other data types.
\end{itemize}

\subsubsection{Timing the Eigensystem Routines}\label{timeeig}

The eigensystem timing program is found in \texttt{LAPACK/TIMING/EIG}
and the input files are in \texttt{LAPACK/TIMING}.
Four input files are provided in each data type for timing the
eigensystem routines,
one for the generalized nonsymmetric eigenvalue problem,
one for the nonsymmetric eigenvalue problem,
one for the symmetric and generalized symmetric eigenvalue problem,
and one for the singular value decomposition.
For the REAL version, the small data sets are called \texttt{sgeptim\_small.in},
\texttt{sneptim\_small.in}, \texttt{sseptim\_small.in}, and \texttt{ssvdtim\_small.in}, respectively,
and the large data sets are called \texttt{sgeptim\_large.in}, \texttt{sneptim\_large.in},
\texttt{sseptim\_large.in}, and \texttt{ssvdtim\_large.in}.
Each of the four input files reads a different set of parameters,
and the format of the input is indicated by a 3-character code
on the first line.

The timing program for eigenvalue/singular value routines accumulates
the operation count as the routines are executing using special
instrumented versions of the LAPACK routines.
The first step in
compiling the timing program is therefore to make a library of the
instrumented routines.

\begin{itemize}
\item[a)]
\begin{sloppypar}
To make a library of the instrumented LAPACK routines, first
go to \texttt{LAPACK/TIMING/EIG/EIGSRC} and type \texttt{make} followed
by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
The library of instrumented code is created in
\texttt{LAPACK/TIMING/EIG/eigsrc.a}.
\end{sloppypar}

\item[b)]
To make the eigensystem timing programs,
go to \texttt{LAPACK/TIMING/EIG} and
type \texttt{make} followed by the data types desired, as in the examples
of Section~\ref{toplevelmakefile}. The executable files are called
\texttt{xeigtims}, \texttt{xeigtimc}, \texttt{xeigtimd}, and \texttt{xeigtimz}
and are created in \texttt{LAPACK/TIMING}.

\item[c)]
Go to \texttt{LAPACK/TIMING} and
make any necessary modifications to the input files.
You may need to set the minimum time a subroutine will
be timed to a positive value, or to restrict the number of tests
if you are using a computer with performance in between that of a
workstation and that of a supercomputer.
Instead of decreasing the matrix dimensions to reduce the time,
it would be better to reduce the number of matrix types to be timed,
since the performance varies more with the matrix size than with the
type. For example, for the nonsymmetric eigenvalue routines,
you could use only one matrix of type 4 instead of four matrices of
types 1, 3, 4, and 6.
Refer to LAPACK Working Note 41~\cite{WN41} for further details.
% See Section~\ref{moretiming} for further details.

\item[d)]
Run the programs for each data type you are using.
For the REAL version, the commands for the small data sets are

\begin{list}{}{}
\item{} \texttt{xeigtims < sgeptim\_small.in > sgeptim\_small.out }
\item{} \texttt{xeigtims < sneptim\_small.in > sneptim\_small.out }
\item{} \texttt{xeigtims < sseptim\_small.in > sseptim\_small.out }
\item{} \texttt{xeigtims < ssvdtim\_small.in > ssvdtim\_small.out }
\end{list}
or the commands for the large data sets are
\begin{list}{}{}
\item{} \texttt{xeigtims < sgeptim\_large.in > sgeptim\_large.out }
\item{} \texttt{xeigtims < sneptim\_large.in > sneptim\_large.out }
\item{} \texttt{xeigtims < sseptim\_large.in > sseptim\_large.out }
\item{} \texttt{xeigtims < ssvdtim\_large.in > ssvdtim\_large.out }
\end{list}

\noindent
Similar commands should be used for the other data types.
\end{itemize}

\subsection{Send the Results to Tennessee}\label{sendresults}

Congratulations! You have now finished installing, testing, and
timing LAPACK. If you encountered failures in any phase of the
testing or timing process, please
consult our \texttt{release\_notes} file on netlib.
\begin{quote}
\url{http://www.netlib.org/lapack/release\_notes}
\end{quote}
This file contains machine-dependent installation hints that we hope will
alleviate your difficulties or at least let you know that other users
have had similar difficulties on that machine. If there is not an entry
for your machine or the suggestions do not fix your problem, please feel
free to contact the authors at
\begin{list}{}{}
\item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
\end{list}
Tell us the
type of machine on which the tests were run, the version of the operating
system, the compiler and compiler options that were used,
and details of the BLAS library or libraries that you used.
You should
also include a copy of the output file in which the failure occurs.

We would like to keep our \texttt{release\_notes} file as up-to-date as possible.
Therefore, if you do not see an entry for your machine, please contact us
with your testing results.

Comments and suggestions are also welcome.

We encourage you to make the LAPACK library available to your
users and provide us with feedback from their experiences.
%This release of LAPACK is not guaranteed to be compatible
%with any previous test release.

\subsection{Get support}\label{getsupport}
First, take a look at the complete installation manual in LAPACK Working Note 41~\cite{WN41}.
If you still cannot solve your problem, you have two options:
\begin{itemize}
\item
either post a message to the LAPACK forum
\begin{quote}
\url{http://icl.cs.utk.edu/lapack-forum}
\end{quote}
\item
or send an email to the LAPACK mailing list:
\begin{list}{}{}
\item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
\end{list}
\end{itemize}
\section*{Acknowledgments}

Ed Anderson and Susan Blackford contributed to previous versions of this report.

\appendix

\chapter{Caveats}\label{appendixd}

In this appendix we list a few of the machine-specific difficulties we
have
encountered in our own experience with LAPACK. A more detailed list
of machine-dependent problems, bugs, and compiler errors encountered
in the LAPACK installation process is maintained
on \emph{netlib}.
\begin{quote}
\url{http://www.netlib.org/lapack/release\_notes}
\end{quote}

We assume the user has installed the machine-specific routines
correctly and that the Level 1, 2 and 3 BLAS test programs have run
successfully, so we do not list any warnings associated with those
routines.

\section{\texttt{LAPACK/make.inc}}

All machine-specific
parameters are specified in the file \texttt{LAPACK/make.inc}.

The first line of this \texttt{make.inc} file is:
\begin{quote}
SHELL = /bin/sh
\end{quote}
and will need to be modified to \texttt{SHELL = /sbin/sh} if you are
installing LAPACK on an SGI architecture.

\section{ETIME}

On HPPA architectures,
the compiler and linker flag \texttt{+U77} should be included to access
the function \texttt{ETIME}.

\section{ILAENV and IEEE-754 compliance}

%By default, ILAENV (\texttt{LAPACK/SRC/ilaenv.f}) assumes an IEEE and IEEE-754
%compliant architecture, and thus sets (\texttt{ILAENV=1}) for (\texttt{ISPEC=10})
%and (\texttt{ISPEC=11}) settings in ILAENV.
%
%If you are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
%as this test inside ILAENV will crash!

As some new routines in LAPACK rely on IEEE-754 compliance,
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
machine, and does a test for IEEE-754 compliance. \textbf{NOTE: If you
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
as this test inside ILAENV will crash!}

Thus, for non-IEEE machines, the user must hard-code the setting of
(\texttt{ILAENV=0}) for (\texttt{ISPEC=10} and \texttt{ISPEC=11}) in the version
of \texttt{LAPACK/SRC/ilaenv.f} to be put in
their library. For further details, refer to Section~\ref{testieee}.

Be aware
that some IEEE compilers by default do not enforce IEEE-754 compliance, and
a compiler flag must be explicitly set by the user.

On SGIs, for example, you must set the \texttt{-OPT:IEEE\_NaN\_inf=ON} compiler
flag to enable IEEE-754 compliance.

Lastly, the test inside ILAENV to detect IEEE-754 compliance will
result in IEEE exceptions for ``Divide by Zero'' and ``Invalid Operation''.
Thus, if the user is installing on a machine that issues IEEE exception
warning messages (like a Sun SPARCstation), the user can disregard these
messages. To avoid these messages, the user can hard-code the values
inside ILAENV as explained in Section~\ref{testieee}.

\section{Lack of \texttt{/tmp} space}

If \texttt{/tmp} space is small (i.e., less than approximately 16 MB) on your
architecture, you may run out of space
when compiling. There are a few possible solutions to this problem.
\begin{enumerate}
\item You can ask your system administrator to increase the size of the
\texttt{/tmp} partition.
\item You can change the environment variable \texttt{TMPDIR} to point to
your home directory for temporary space. E.g.,
\begin{quote}
\texttt{setenv TMPDIR /home/userid/}
\end{quote}
where \texttt{/home/userid/} is the user's home directory.
\item If your archive command has an \texttt{l} option, you can change the
archive command to \texttt{ar crl} so that the
archive command will only place temporary files in the current working
directory rather than in the default temporary directory \texttt{/tmp}.
\end{enumerate}

\section{BLAS}

If you suspect a BLAS-related problem and you are linking
with an optimized version of the BLAS, we would strongly suggest
as a first step that you link to the Fortran~77 version of
the suspected BLAS routine and see if the error has disappeared.

We have included test programs for the Level 1 BLAS.
Even so, users should beware of a common problem in machine-specific
implementations of xNRM2,
the function to compute the 2-norm of a vector.
The Fortran version of xNRM2 avoids underflow or overflow
by scaling intermediate results, but some library versions of xNRM2
are not so careful about scaling.
If xNRM2 is implemented without scaling intermediate results, some of
the LAPACK test ratios may be unusually high, or
a floating point exception may occur in the problems scaled near
underflow or overflow.
The solution to these problems is to link the Fortran version of
xNRM2 with the test program. \emph{On some CRAY architectures, the Fortran~77
version of xNRM2 should be used.}

\section{Optimization}

If a large number of test failures occur for a specific matrix type
or operation, there may be an optimization problem with
your compiler. In that case, try reducing the level of
optimization or eliminating optimization entirely for those routines,
then rerun the tests to see if the failures disappear.

%LAPACK is written in Fortran 77. Prospective users with only a
%Fortran 66 compiler will not be able to use this package.

\section{Compiling testing/timing drivers}

The testing and timing main programs (xCHKAA, xCHKEE, xTIMAA, and
xTIMEE)
allocate a large number of local variables. It is therefore vitally
important to know whether your compiler by default allocates local
variables statically or on the stack. It is not uncommon for
compilers that place local variables on the stack to cause a stack
overflow at runtime in the testing or timing process. You then
have two options: increase your stack size, or force all local variables
to be allocated statically.
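
For the first option, the stack limit is usually raised in the shell that
launches the test programs. The commands below are a sketch for common Unix
shells and are not part of the LAPACK distribution; whether they are
available, and the maximum they permit, depends on the system, and a hard
limit set by the administrator may prevent the increase.
\begin{verbatim}
# Bourne-type shells (ksh, bash): show, then raise, the stack limit
ulimit -s
ulimit -s unlimited

# C-type shells (csh, tcsh)
limit stacksize
limit stacksize unlimited
\end{verbatim}
The new limit applies only to that shell and to processes started from it,
so the tests should be run from the same session.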

On HPPA architectures, the
compiler and linker flag \texttt{-K} should be used when compiling these testing
and timing main programs to avoid such a stack overflow. That is, set
\texttt{FFLAGS\_DRV = -K} in the \texttt{LAPACK/make.inc} file.

For similar reasons,
on SGI architectures, the compiler and linker flag \texttt{-static} should be
used. That is, set \texttt{FFLAGS\_DRV = -static} in the \texttt{LAPACK/make.inc} file.

\section{IEEE arithmetic}

Some of our test matrices are scaled near overflow or underflow,
but on the Crays, problems with the arithmetic near overflow and
underflow forced us to scale by only the square root of overflow
and underflow.
The LAPACK auxiliary routine SLABAD (or DLABAD) is called to
take the square root of underflow and overflow in cases where it
could cause difficulties.
We assume we are on a Cray if $\log_{10} (\mathrm{overflow})$
is greater than 2000
and take the square root of underflow and overflow in this case.
The test in SLABAD is as follows:
\begin{verbatim}
      IF( LOG10( LARGE ).GT.2000. ) THEN
         SMALL = SQRT( SMALL )
         LARGE = SQRT( LARGE )
      END IF
\end{verbatim}
Users of other machines with similar restrictions on the effective
range of usable numbers may have to modify this test so that the
square roots are done on their machine as well. \emph{Usually on
HPPA architectures, a similar restriction in SLABAD should be enforced
for all testing involving complex arithmetic.}
SLABAD is located in \texttt{LAPACK/SRC}.

For machines which have a narrow exponent range or lack gradual
underflow (DEC VAXes for example), it is not uncommon to experience
failures in sec.out and/or dec.out with SLAQTR/DLAQTR or DTRSYL.
The failures in SLAQTR/DLAQTR and DTRSYL
occur with test problems which are very badly scaled when the norm of
the solution is very close to the underflow
threshold (or even underflows to zero). We believe that these failures
could probably be avoided by an even greater degree of care in scaling,
but we did not want to delay the release of LAPACK any further. These
tests pass successfully on most other machines. An example failure in
dec.out on a MicroVAX II looks like the following:

\begin{verbatim}
Tests of the Nonsymmetric eigenproblem condition estimation routines
DLALN2, DLASY2, DLANV2, DLAEXC, DTRSYL, DTREXC, DTRSNA, DTRSEN, DLAQTR

Relative machine precision (EPS) = 0.277556D-16
Safe minimum (SFMIN) = 0.587747D-38

Routines pass computational tests if test ratio is less than 20.00

DEC routines passed the tests of the error exits ( 35 tests done)
Error in DTRSYL: RMAX = 0.155D+07
LMAX = 5323 NINFO= 1600 KNT= 27648
Error in DLAQTR: RMAX = 0.344D+04
LMAX = 15792 NINFO= 26720 KNT= 45000
\end{verbatim}

\section{Timing programs}

In the eigensystem timing program, calls are made to the LINPACK
and EISPACK equivalents of the LAPACK routines to allow a direct
comparison of performance measures.
In some cases we have increased the minimum number of
iterations in the LINPACK and EISPACK routines to allow
them to converge for our test problems, but
even this may not be enough.
One goal of the LAPACK project is to improve the convergence
properties of these routines, so error messages in the output
file indicating that a LINPACK or EISPACK routine did not
converge should not be regarded with alarm.

In the eigensystem timing program, we have equivalenced some work
arrays and then passed them to a subroutine, where both arrays are
modified.
This is a violation of the Fortran~77 standard, which
says ``if a subprogram reference causes a dummy argument in the
referenced subprogram to become associated with another dummy
argument in the referenced subprogram, neither dummy argument may
become defined during execution of the subprogram.''\footnote{ANSI X3.9-1978, sec.~15.9.3.6}
If this causes any difficulties, the equivalence
can be commented out as explained in the comments for the main
eigensystem timing programs.

%\section*{MACHINE-SPECIFIC DIFFICULTIES}
%Some IBM compilers do not recognize DBLE as a generic function as used
%in LAPACK. The software tools we use to convert from single precision
%to double precision convert REAL(C) and AIMAG(C), where C is COMPLEX,
%to DBLE(Z) and DIMAG(Z), where Z is COMPLEX*16, but
%IBM compilers use DREAL(Z) and DIMAG(Z) to take the real and
%imaginary parts of a double complex number.
%IBM users can fix this problem by changing DBLE to DREAL when the
%argument of DBLE is COMPLEX*16.
%
%IBM compilers do not permit the data type COMPLEX*16 in a FUNCTION
%subprogram definition. The data type on the first line of the
%function subprogram must be changed from COMPLEX*16 to DOUBLE COMPLEX
%for the following functions:
%
%\begin{tabbing}
%\dent ZLATMOO \= from the test matrix generator library \kill
%\dent ZBEG \> from the Level 2 BLAS test program \\
%\dent ZBEG \> from the Level 3 BLAS test program \\
%\dent ZLADIV \> from the LAPACK library \\
%\dent ZLARND \> from the test matrix generator library \\
%\dent ZLATM2 \> from the test matrix generator library \\
%\dent ZLATM3 \> from the test matrix generator library
%\end{tabbing}
%The functions ZDOTC and ZDOTU from the Level 1 BLAS are already
%declared DOUBLE COMPLEX. If that doesn't work, try the declaration
%COMPLEX FUNCTION*16.


\newpage
\addcontentsline{toc}{section}{Bibliography}

\begin{thebibliography}{99}

\bibitem{LUG}
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra,
J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
S. Ostrouchov, and D. Sorensen,
\textit{LAPACK Users' Guide}, Second Edition,
{SIAM}, Philadelphia, PA, 1995.

\bibitem{WN16}
E. Anderson and J. Dongarra,
\textit{LAPACK Working Note 16:
Results from the Initial Release of LAPACK},
University of Tennessee, CS-89-89, November 1989.

\bibitem{WN41}
E. Anderson, J. Dongarra, and S. Ostrouchov,
\textit{LAPACK Working Note 41:
Installation Guide for LAPACK},
University of Tennessee, CS-92-151, February 1992 (revised June 1999).

\bibitem{WN5}
C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,
S. Hammarling, and D. Sorensen,
\textit{LAPACK Working Note \#5: Provisional Contents},
Argonne National Laboratory, ANL-88-38, September 1988.

\bibitem{WN13}
Z. Bai, J. Demmel, and A. McKenney,
\textit{LAPACK Working Note \#13: On the Conditioning of the Nonsymmetric
Eigenvalue Problem: Theory and Software},
University of Tennessee, CS-89-86, October 1989.

\bibitem{XBLAS}
X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar,
W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung,
and D. J. Yoo, \textit{Design, implementation and testing of extended
 and mixed precision BLAS},
\textit{ACM Trans. Math. Soft.}, 28, 2:152--205, June 2002.

\bibitem{BLAS3}
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
``A Set of Level 3 Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 16, 1:1--17, March 1990.
%Argonne National Laboratory, ANL-MCS-P88-1, August 1988.

\bibitem{BLAS3-test}
J. Dongarra, J. Du Croz, I. Duff, and S.
Hammarling,
``A Set of Level 3 Basic Linear Algebra Subprograms:
Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 16, 1:18--28, March 1990.
%Argonne National Laboratory, ANL-MCS-TM-119, June 1988.

\bibitem{BLAS2}
J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
``An Extended Set of Fortran Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 14, 1:1--17, March 1988.

\bibitem{BLAS2-test}
J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
``An Extended Set of Fortran Basic Linear Algebra Subprograms:
Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 14, 1:18--32, March 1988.

\bibitem{BLAS1}
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh,
``Basic Linear Algebra Subprograms for Fortran Usage,''
\textit{ACM Trans. Math. Soft.}, 5, 3:308--323, September 1979.

\end{thebibliography}

\end{document}