1\documentclass[11pt]{report}
2
3\usepackage{indentfirst}
4\usepackage[body={6in,8.5in}]{geometry}
5\usepackage{hyperref}
6\usepackage{graphicx}
7\DeclareGraphicsRule{.ps}{eps}{}{}
8
9\renewcommand{\thesection}{\arabic{section}}
10\setcounter{tocdepth}{3}
11\setcounter{secnumdepth}{3}
12
13\begin{document}
14\begin{center}
15  {\Large LAPACK Working Note 81\\
16  Quick Installation Guide for LAPACK on Unix Systems\footnote{This work was
17 supported by NSF Grant No. ASC-8715728  and NSF Grant No. 0444486}}
18\end{center}
19\begin{center}
20%  Edward Anderson\footnote{Current address:  Cray Research Inc.,
21%                           655F Lone Oak Drive, Eagan, MN  55121},
22  The LAPACK Authors\\
23  Department of Computer Science \\
24  University of Tennessee \\
25  Knoxville, Tennessee  37996-1301 \\
26\end{center}
27\begin{center}
28  REVISED:  VERSION 3.1.1, February 2007 \\
29  REVISED:  VERSION 3.2.0, November 2008
30\end{center}
31
32\begin{center}
33Abstract
34\end{center}
This working note describes how to install and test version 3.2.0
36of LAPACK, a linear algebra package for high-performance
37computers, on a Unix System.  The timing routines are not actually included in
38release 3.2.0, and that part of the LAWN refers to release 3.0.  Also,
39version 3.2.0 contains many prototype routines needing user feedback.
Non-Unix installation instructions and
further details of the testing and timing suites are contained only in
LAPACK Working Note 41, not in this abbreviated version.
43%Separate instructions are provided for the Unix and non-Unix
44%versions of the test package.
45%Further details are also given on the design of the test and timing
46%programs.
47\newpage
48
49\tableofcontents
50
51\newpage
52% Introduction to Implementation Guide
53
54\section{Introduction}
55
56LAPACK is a linear algebra library for high-performance
57computers.
58The library includes Fortran subroutines for
59the analysis and solution of systems of simultaneous linear algebraic
60equations, linear least-squares problems, and matrix eigenvalue
61problems.
62Our approach to achieving high efficiency is based on the use of
63a standard set of Basic Linear Algebra Subprograms (the BLAS),
64which can be optimized for each computing environment.
65By confining most of the computational work to the BLAS,
66the subroutines should be
67transportable and efficient across a wide range of computers.
68
69This working note describes how to install, test, and time this
70release of LAPACK on a Unix System.
71
The instructions for installing, testing, and timing%
\footnote{The timing routines are provided only in LAPACK 3.0 and earlier.}
74are designed for a person whose
75responsibility is the maintenance of a mathematical software library.
76We assume the installer has experience in compiling and running
77Fortran programs and in creating object libraries.
78The installation process involves untarring the file, creating a set of
79libraries, and compiling and running the test and timing programs
80\footnotemark[\value{footnote}].
81
82%This guide combines the instructions for the Unix and non-Unix
83%versions of the LAPACK test package (the non-Unix version is in Appendix
84%~\ref{appendixe}).
85%At this time, the non-Unix version of LAPACK can only be obtained
86%after first untarring the Unix tar tape and then following the instructions in
87%Appendix ~\ref{appendixe}.
88
Section~\ref{fileformat} describes how the files are organized in the
distribution file, and
91Section~\ref{overview} gives a general overview of the parts of the test package.
92Step-by-step instructions appear in Section~\ref{installation}.
93%for the Unix version and in the appendix for the non-Unix version.
94
95For users desiring additional information, please refer to LAPACK
96Working Note 41.
97% Sections~\ref{moretesting}
98%and ~\ref{moretiming} give
99%details of the test and timing programs and their input files.
100%Appendices ~\ref{appendixa} and ~\ref{appendixb} briefly describe
101%the LAPACK routines and auxiliary routines provided
102%in this release.
103%Appendix ~\ref{appendixc} lists the operation counts we have computed
104%for the BLAS and for some of the LAPACK routines.
Appendix~\ref{appendixd}, entitled ``Caveats'', is a compendium of the known
106problems from our own experiences, with suggestions on how to
107overcome them.
108
109\textbf{It is strongly advised that the user read Appendix
110A before proceeding with the installation process.}
111%Appendix E contains the execution times of the different test
112%and timing runs on two sample machines.
113%Appendix ~\ref{appendixe} contains the instructions to install LAPACK on a non-Unix
114%system.
115
116\section{Revisions Since the First Public Release}
117
118Since its first public release in February, 1992, LAPACK has had
119several updates, which have encompassed the introduction of new routines
120as well as extending the functionality of existing routines.  The first
121update,
122June 30, 1992, was version 1.0a; the second update, October 31, 1992,
123was version 1.0b; the third update, March 31, 1993, was version 1.1;
124version 2.0 on September 30, 1994, coincided with the release of the
125Second Edition of the LAPACK Users' Guide;
126version 3.0 on June 30, 1999 coincided with the release of the Third Edition of
127the LAPACK Users' Guide;
version 3.1 was released in November 2006;
version 3.1.1 was released in February 2007;
and version 3.2.0 was released in November 2008.
131
132All LAPACK routines reflect the current version number with the date
133on the routine indicating when it was last modified.
134For more information on revisions in the latest release, please refer
135to the \texttt{revisions.info} file in the lapack directory on netlib.
136\begin{quote}
137\url{http://www.netlib.org/lapack/revisions.info}
138\end{quote}
139
140%The distribution \texttt{tar} file \texttt{lapack.tar.z} that is
141%available on netlib is always the most up-to-date.
142%
143%On-line manpages (troff files) for LAPACK driver and computational
144%routines, as well as most of the BLAS routines, are available via
145%the \texttt{lapack} index on netlib.
146
147\section{File Format}\label{fileformat}
148
149The software for LAPACK is distributed in the form of a
150gzipped tar file (via anonymous ftp or the World Wide Web),
151which contains the Fortran source for LAPACK,
152the Basic Linear Algebra Subprograms
153(the Level 1, 2, and 3 BLAS) needed by LAPACK, the testing programs,
154and the timing programs\footnotemark[\value{footnote}].
155Users who wish to have a non-Unix installation should refer to LAPACK
156Working Note 41,
although the overview in Section~\ref{overview} applies to both the Unix and non-Unix
158versions.
159%Users who wish to have a non-Unix installation should go to Appendix ~\ref{appendixe},
160%although the overview in section ~\ref{overview} applies to both the Unix and non-Unix
161%versions.
162
163The package may be accessed via the World Wide Web through
164the URL address:
165\begin{quote}
166\url{http://www.netlib.org/lapack/lapack.tgz}
167\end{quote}
168
169Or, you can retrieve the file via anonymous ftp at netlib:
170
171\begin{verbatim}
172     ftp ftp.netlib.org
173     login:  anonymous
174     password:  <your email address>
175     cd lapack
176     binary
177     get lapack.tgz
178     quit
179\end{verbatim}
180
181The software in the \texttt{tar} file
182is organized in a number of essential directories as shown
183in Figure 1.  Please note that this figure does not reflect everything
184that is contained in the \texttt{LAPACK} directory.  Input and instructional
185files are also located at various levels.
186\begin{figure}
187\vspace{11pt}
188\centerline{\includegraphics[width=6.5in,height=3in]{org2.ps}}
189\caption{Unix organization of LAPACK 3.0}
190\vspace{11pt}
191\end{figure}
192Libraries are created in the LAPACK directory and
193executable files are created in one of the directories BLAS, TESTING,
194or TIMING\footnotemark[\value{footnote}].  Input files for the test and
195timing\footnotemark[\value{footnote}]  programs are also
196found in these three directories so that testing may be carried out
197in the directories LAPACK/BLAS, LAPACK/TESTING, and LAPACK/TIMING \footnotemark[\value{footnote}].
198A top-level makefile in the LAPACK directory is provided to perform the
199entire installation procedure.
200
201\section{Overview of Tape Contents}\label{overview}
202
203Most routines in LAPACK occur in four versions: REAL,
204DOUBLE PRECISION, COMPLEX, and COMPLEX*16.
205The first three versions (REAL, DOUBLE PRECISION, and COMPLEX)
206are written in standard Fortran and are completely portable;
207the COMPLEX*16 version is provided for
208those compilers which allow this data type.
209Some routines use features of Fortran 90.
210For convenience, we often refer to routines by their single precision
211names; the leading `S' can be replaced by a `D' for double precision,
212a `C' for complex, or a `Z' for complex*16.
213For LAPACK use and testing you must decide which version(s)
214of the package you intend to install at your site (for example,
215REAL and COMPLEX on a Cray computer or DOUBLE PRECISION and
216COMPLEX*16 on an IBM computer).
217
218\subsection{LAPACK Routines}
219
220There are three classes of LAPACK routines:
221\begin{itemize}
222
223\item \textbf{driver} routines solve a complete problem, such as solving
224a system of linear equations or computing the eigenvalues of a real
225symmetric matrix.  Users are encouraged to use a driver routine if there
226is one that meets their requirements.  The driver routines are listed
227in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
228%in Appendix ~\ref{appendixa}.
229
230\item \textbf{computational} routines, also called simply LAPACK routines,
231perform a distinct computational task, such as computing
232the $LU$ decomposition of an $m$-by-$n$ matrix or finding the
233eigenvalues and eigenvectors of a symmetric tridiagonal matrix using
234the $QR$ algorithm.
235The LAPACK routines are listed in LAPACK Working Note 41~\cite{WN41}
236and the LAPACK Users' Guide~\cite{LUG}.
237%The LAPACK routines are listed in Appendix ~\ref{appendixa}; see also LAPACK
238%Working Note \#5 \cite{WN5}.
239
240\item \textbf{auxiliary} routines are all the other subroutines called
241by the driver routines and computational routines.
242%Among them are subroutines to perform subtasks of block algorithms,
243%in particular, the unblocked versions of the block algorithms;
244%extensions to the BLAS, such as matrix-vector operations involving
245%complex symmetric matrices;
246%the special routines LSAME and XERBLA which first appeared with the
247%BLAS;
248%and a number of routines to perform common low-level computations,
249%such as computing a matrix norm, generating an elementary Householder
250%transformation, and applying a sequence of plane rotations.
251%Many of the auxiliary routines may be of use to numerical analysts
252%or software developers, so we have documented the Fortran source for
253%these routines with the same level of detail used for the LAPACK
254%routines and driver routines.
255The auxiliary routines are listed in LAPACK Working Note 41~\cite{WN41}
256and the LAPACK Users' Guide~\cite{LUG}.
257%The auxiliary routines are listed in Appendix ~\ref{appendixb}.
258\end{itemize}
259
260\subsection{Level 1, 2, and 3 BLAS}
261
262The BLAS are a set of Basic Linear Algebra Subprograms that perform
263vector-vector, matrix-vector, and matrix-matrix operations.
264LAPACK is designed around the Level 1, 2, and 3 BLAS, and nearly all
265of the parallelism in the LAPACK routines is contained in the BLAS.
266Therefore,
267the key to getting good performance from LAPACK lies in having an
268efficient version of the BLAS optimized for your particular machine.
Optimized BLAS libraries are available on a variety of architectures;
refer to the BLAS FAQ on netlib for further information.
271\begin{quote}
272\url{http://www.netlib.org/blas/faq.html}
273\end{quote}
There are also freely available BLAS generators that automatically
tune a subset of the BLAS for a given architecture, for example ATLAS:
276\begin{quote}
277\url{http://www.netlib.org/atlas/}
278\end{quote}
279And, if all else fails, there is the Fortran~77 reference implementation
280of the Level 1, 2, and 3 BLAS available on netlib (also included in
281the LAPACK distribution tar file).
282\begin{quote}
283\url{http://www.netlib.org/blas/blas.tgz}
284\end{quote}
285No matter which BLAS library is used, the BLAS test programs should
286always be run.
287
288Users should not expect too much from the Fortran~77 reference implementation
289BLAS; these versions were written to define the basic operations and do not
290employ the standard tricks for optimizing Fortran code.
291
292The formal definitions of the Level 1, 2, and 3 BLAS
293are in \cite{BLAS1}, \cite{BLAS2}, and \cite{BLAS3}.
294The BLAS Quick Reference card is available on netlib.
295
296\subsection{Mixed- and Extended-Precision BLAS: XBLAS}
297
298The XBLAS extend the BLAS to work with mixed input and output
299precisions as well as using extra precision internally.  The XBLAS are
300used in the prototype extra-precise iterative refinement codes.
301
302The current release of the XBLAS is available through
303Netlib\footnote{Development versions may be available through
304  \url{http://www.cs.berkeley.edu/~yozo/} or
305  \url{http://www.nersc.gov/~xiaoye/XBLAS/}.}  at
306\begin{quote}
307  \url{http://www.netlib.org/xblas}
308\end{quote}
309Their formal definition is in \cite{XBLAS}.
310
311\subsection{LAPACK Test Routines}
312
313This release contains two distinct test programs for LAPACK routines
314in each data type.  One test program tests the routines for solving
315linear equations and linear least squares problems,
316and the other tests routines for the matrix eigenvalue problem.
317The routines for generating test matrices are used by both test
318programs and are compiled into a library for use by both test programs.
319
320\subsection{LAPACK Timing Routines (for LAPACK 3.0 and before) }
321
322This release also contains two distinct timing programs for the
323LAPACK routines in each data type.
324The linear equation timing program gathers performance data in
325megaflops on the factor, solve, and inverse routines for solving
326linear systems, the routines to generate or apply an orthogonal matrix
327given as a sequence of elementary transformations, and the reductions
328to bidiagonal, tridiagonal, or Hessenberg form for eigenvalue
329computations.
330The operation counts used in computing the megaflop rates are computed
331from a formula;
332see LAPACK Working Note 41~\cite{WN41}.
333% see Appendix ~\ref{appendixc}.
334The eigenvalue timing program is used with the eigensystem routines
335and returns the execution time, number of floating point operations, and
336megaflop rate for each of the requested subroutines.
337In this program, the number of operations is computed while the
338code is executing using special instrumented versions of the LAPACK
339subroutines.
340
341\section{Installing LAPACK on a Unix System}\label{installation}
342
343Installing, testing, and timing\footnotemark[\value{footnote}] the Unix version of LAPACK
344involves the following steps:
345\begin{enumerate}
\item Gunzip and untar the file.

\item Copy the file \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it.
349
350\item Edit the file \texttt{LAPACK/Makefile} and type \texttt{make}.
351
352%\item Test and Install the Machine-Dependent Routines \\
353%\emph{(WARNING:  You may need to supply a correct version of second.f and
354%dsecnd.f for your machine)}
355%{\tt
356%\begin{list}{}{}
357%\item cd LAPACK
358%\item make install
359%\end{list} }
360%
361%\item Create the BLAS Library, \emph{if necessary} \\
362%\emph{(NOTE:  For best performance, it is recommended you use the manufacturers' BLAS)}
363%{\tt
364%\begin{list}{}{}
365%\item \texttt{cd LAPACK}
366%\item \texttt{make blaslib}
367%\end{list} }
368%
369%\item Run the Level 1, 2, and 3 BLAS Test Programs
370%\begin{list}{}{}
371%\item \texttt{cd LAPACK}
372%\item \texttt{make blas\_testing}
373%\end{list}
374%
375%\item Create the LAPACK Library
376%\begin{list}{}{}
377%\item \texttt{cd LAPACK}
378%\item \texttt{make lapacklib}
379%\end{list}
380%
381%\item Create the Library of Test Matrix Generators
382%\begin{list}{}{}
383%\item \texttt{cd LAPACK}
384%\item \texttt{make tmglib}
385%\end{list}
386%
387%\item Run the LAPACK Test Programs
388%\begin{list}{}{}
389%\item \texttt{cd LAPACK}
390%\item \texttt{make testing}
391%\end{list}
392%
393%\item Run the LAPACK Timing Programs
394%\begin{list}{}{}
395%\item \texttt{cd LAPACK}
396%\item \texttt{make timing}
397%\end{list}
398%
399%\item Run the BLAS Timing Programs
400%\begin{list}{}{}
401%\item \texttt{cd LAPACK}
402%\item \texttt{make blas\_timing}
403%\end{list}
404\end{enumerate}
405
406\subsection{Untar the File}
407
408If you received a tar file of LAPACK via the World Wide
409Web or anonymous ftp, enter the following command:
410
\begin{list}{}{}
\item \texttt{gunzip -c lapack.tgz | tar xvf -}
\end{list}
414
415\noindent
416This will create a top-level directory called \texttt{LAPACK}, which
417requires approximately 34 Mbytes of disk space.
418The total space requirements including the object files and executables
419is approximately 100 Mbytes for all four data types.
420
\subsection{Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it}
422
423Before the libraries can be built, or the testing and timing\footnotemark[\value{footnote}] programs
424run, you must define all machine-specific parameters for the
425architecture to which you are installing LAPACK.  All machine-specific
426parameters are contained in the file \texttt{LAPACK/make.inc}.
An example \texttt{LAPACK/make.inc} for a Linux machine with GNU compilers is given
in \texttt{LAPACK/make.inc.example}.  Copy that file to \texttt{LAPACK/make.inc} by entering the following command:
429
\begin{list}{}{}
\item \texttt{cp LAPACK/make.inc.example LAPACK/make.inc}
\end{list}
433
434\noindent
435Now modify your \texttt{LAPACK/make.inc} by applying the following recommendations.
436The first line of this \texttt{make.inc} file is:
437\begin{quote}
\texttt{SHELL = /bin/sh}
439\end{quote}
440and it will need to be modified to \texttt{SHELL = /sbin/sh} if you are
441installing LAPACK on an SGI architecture.
Next, you will need to modify \texttt{FC}, \texttt{FFLAGS},
\texttt{FFLAGS\_DRV}, \texttt{FFLAGS\_NOOPT}, and \texttt{LDFLAGS} to specify
the compiler, the compiler options, the compiler options for the testing and
timing\footnotemark[\value{footnote}] main programs, the compiler options
used when no optimization is wanted, and the linker options.
Next, you will have to choose which timing function the
\texttt{SECOND} and \texttt{DSECND} routines will call.
448\begin{verbatim}
449#  Default:  SECOND and DSECND will use a call to the
450#  EXTERNAL FUNCTION ETIME
451#TIMER = EXT_ETIME
452#  For RS6K:  SECOND and DSECND will use a call to the
453#  EXTERNAL FUNCTION ETIME_
454#TIMER = EXT_ETIME_
455#  For gfortran compiler:  SECOND and DSECND will use a call to the
456#  INTERNAL FUNCTION ETIME
457TIMER = INT_ETIME
458#  If your Fortran compiler does not provide etime (like Nag Fortran
459#  Compiler, etc...) SECOND and DSECND will use a call to the
460#  INTERNAL FUNCTION CPU_TIME
461#TIMER = INT_CPU_TIME
462#  If none of these work, you can use the NONE value.
463#  In that case, SECOND and DSECND will always return 0.
464#TIMER = NONE
465\end{verbatim}
Refer to Section~\ref{second} for more information.
467
468
469Next, you will need to modify \texttt{AR}, \texttt{ARFLAGS}, and \texttt{RANLIB} to specify archiver,
470archiver options, and ranlib for your machine.  If your architecture
471does not require \texttt{ranlib} to be run after each archive command (as
472is the case with CRAY computers running UNICOS, Hewlett Packard
473computers running HP-UX, or SUN SPARCstations running Solaris), set
\texttt{RANLIB = echo}.  Finally, you must
modify the \texttt{BLASLIB} definition to specify the BLAS library to which
you will be linking.  If an optimized version of the BLAS is available
on your machine, we strongly recommend linking to that library.
Otherwise, by default, \texttt{BLASLIB} is set to the Fortran~77 reference version.
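
For illustration only, the archiver and BLAS settings in \texttt{make.inc} for a
machine that does not need \texttt{ranlib} and has an optimized BLAS installed
might look like the following sketch.  The library path and the archiver flags
are assumptions made for this example, not values taken from the distribution;
substitute whatever is appropriate for your system.
\begin{verbatim}
#  Hypothetical values; adjust for your machine.
AR      = ar
ARFLAGS = cr
#  No ranlib step is needed on this (assumed) architecture.
RANLIB  = echo
#  Assumed location of a locally installed, optimized BLAS.
BLASLIB = /usr/local/lib/libblas.a
\end{verbatim}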
479
480If you want to enable the XBLAS, define the variable \texttt{USEXBLAS}
481to some value, for example \texttt{USEXBLAS = Yes}.  Then set the
482variable \texttt{XBLASLIB} to point at the XBLAS library.  Note that
483the prototype iterative refinement routines and their testers will not
484be built unless \texttt{USEXBLAS} is defined.
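
A minimal sketch of the corresponding \texttt{make.inc} lines is shown below.
The path to the XBLAS library is hypothetical and must be replaced by the
location of the library on your system.
\begin{verbatim}
#  Any value enables the XBLAS-based prototype routines.
USEXBLAS = Yes
#  Assumed location of the XBLAS library.
XBLASLIB = /usr/local/lib/libxblas.a
\end{verbatim}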
485
486\textbf{NOTE:}  Example \texttt{make.inc} include files are contained in the
487\texttt{LAPACK/INSTALL} directory.  Please refer to
488Appendix~\ref{appendixd} for machine-specific installation hints, and/or
489the \texttt{release\_notes} file on \texttt{netlib}.
490\begin{quote}
491\url{http://www.netlib.org/lapack/release\_notes}
492\end{quote}
493
494\subsection{Edit the file \texttt{LAPACK/Makefile}}\label{toplevelmakefile}
495
496This \texttt{Makefile} can be modified to perform as much of the
497installation process as the user desires.  Ideally, this is the ONLY
498makefile the user must modify.  However, modification of lower-level
499makefiles may be necessary if a specific routine needs to be compiled
500with a different level of optimization.
501
502First, edit the definitions of \texttt{blaslib}, \texttt{lapacklib},
503\texttt{tmglib}, \texttt{lapack\_testing}, and \texttt{timing}\footnotemark[\value{footnote}] in the file \texttt{LAPACK/Makefile}
504to specify the data types desired.  For example,
505if you only wish to compile the single precision real version of the
506LAPACK library, you would modify the \texttt{lapacklib} definition to be:
507
508\begin{verbatim}
509lapacklib:
510        $(MAKE) -C SRC single
511\end{verbatim}
512
Likewise, you could specify \texttt{double}, \texttt{complex}, or \texttt{complex16} to
514build the double precision real, single precision complex, or double
515precision complex libraries, respectively.  By default, the presence of
516no arguments following the \texttt{make} command will result in the
517building of all four data types.
518The make command can be run more than once to add another
519data type to the library if necessary.
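
As a rough sketch of adding data types one at a time, the same targets that the
\texttt{lapacklib} rule passes to the \texttt{SRC} makefile could also be invoked
directly from the top-level directory.  This assumes that your \texttt{make}
accepts the \texttt{-C} option used in the rule above:

\begin{list}{}{}
\item \texttt{cd LAPACK}
\item \texttt{make -C SRC single}
\item \texttt{make -C SRC double}
\end{list}

\noindent
The second command would add the double precision real routines to the library
built by the first.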
520
521%If you are installing LAPACK on a Silicon Graphics machine, you must
522%modify the respective definitions of \texttt{testing} and \texttt{timing} to be
523%\begin{verbatim}
524%testing:
525%        ( cd TESTING; $(MAKE) -f Makefile.sgi )
526%\end{verbatim}
527%and
528%\begin{verbatim}
529%timing:
530%        ( cd TIMING; $(MAKE) -f Makefile.sgi )
531%\end{verbatim}
532
533Next, if you will be using a locally available BLAS library, you will need
534to remove \texttt{blaslib} from the \texttt{lib} definition.  And finally,
535if you do not wish to build all of the libraries individually and
536likewise run all of the testing and timing separately, you can
537modify the \texttt{all} definition to specify the amount of the
538installation process that you want performed.  By default,
539the \texttt{all} definition is set to
540\begin{verbatim}
541all: lapack_install lib lapack_testing blas_testing
542\end{verbatim}
543which will perform all phases of the installation
544process -- testing of machine-dependent routines, building the libraries,
545BLAS testing and LAPACK testing.
546
547The entire installation process will then be performed by typing
548\texttt{make}.
549
550Questions and/or comments can be directed to the
551authors as described in Section~\ref{sendresults}.  If test failures
552occur, please refer to the appropriate subsection in
553Section~\ref{furtherdetails}.
554
555If disk space is limited, we suggest building each data type separately
556and/or deleting all object files after building the libraries.  Likewise, all
557testing and timing executables can be deleted after the testing and timing
558process is completed.  The removal of all object files and executables
559can be accomplished by the following:
560
561\begin{list}{}{}
562\item \texttt{cd LAPACK}
563\item \texttt{make cleanobj}
564\end{list}
565
566\section{Further Details of the Installation Process}\label{furtherdetails}
567
568Alternatively, you can choose to run each of the phases of the
569installation process separately.  The following sections give details
570on how this may be achieved.
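
As a rough sketch, the phases could be run one at a time from the top-level
directory using the targets named in Section~\ref{toplevelmakefile}.  The order
shown is an assumption (machine-dependent tests first, then the BLAS, then
LAPACK), and \texttt{make blaslib} should be skipped if a locally available BLAS
library is used:

\begin{list}{}{}
\item \texttt{cd LAPACK}
\item \texttt{make lapack\_install}
\item \texttt{make blaslib}
\item \texttt{make blas\_testing}
\item \texttt{make lapacklib}
\item \texttt{make tmglib}
\item \texttt{make lapack\_testing}
\end{list}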
571
572\subsection{Test and Install the Machine-Dependent Routines.}
573
574There are six machine-dependent functions in the test and timing
575package, at least three of which must be installed.  They are
576
577\begin{tabbing}
578MONOMO  \=  DOUBLE PRECYSION  \=  \kill
579LSAME   \>  LOGICAL      \> Test if two characters are the same regardless of case \\
580SLAMCH  \>  REAL  \> Determine machine-dependent parameters \\
581DLAMCH  \>  DOUBLE PRECISION \> Determine machine-dependent parameters \\
582SECOND  \>  REAL  \> Return time in seconds from a fixed starting time \\
583DSECND  \>  DOUBLE PRECISION  \> Return time in seconds from a fixed starting time\\
584ILAENV  \>  INTEGER \> Checks that NaN and infinity arithmetic are IEEE-754 compliant
585\end{tabbing}
586
587\noindent
588If you are working only in single precision, you do not need to install
589DLAMCH and DSECND, and if you are working only in double precision,
590you do not need to install SLAMCH and SECOND.
591
These six functions are provided in \texttt{LAPACK/INSTALL},
along with six test programs.
To compile the six test programs and run the tests, go to \texttt{LAPACK} and
type \texttt{make lapack\_install}.  The test programs are called
\texttt{testlsame}, \texttt{testslamch}, \texttt{testdlamch}, \texttt{testsecond}, \texttt{testdsecnd}, and
\texttt{testieee}.
598If you do not wish to run all tests, you will need to modify the
599\texttt{lapack\_install} definition in the \texttt{LAPACK/Makefile} to only include the
600tests you wish to run.  Otherwise, all tests will be performed.
601The expected results of each test program are described below.
602
603\subsubsection{Installing LSAME}
604
605LSAME is a logical function with two character parameters, A and B.
606It returns .TRUE. if A and B are the same regardless of case, or .FALSE.
607if they are different.
608For example, the expression
609
610\begin{list}{}{}
611\item \texttt{LSAME( UPLO, 'U' )}
612\end{list}
613\noindent
614is equivalent to
615\begin{list}{}{}
616\item \texttt{( UPLO.EQ.'U' ).OR.( UPLO.EQ.'u' )}
617\end{list}
618
619The test program in \texttt{lsametst.f} tests all combinations of
620the same character in upper and lower case for A and B, and two
621cases where A and B are different characters.
622
623Run the test program by typing \texttt{testlsame}.
624If LSAME works correctly, the only message you should see after the
625execution of \texttt{testlsame} is
626\begin{verbatim}
627 ASCII character set
628 Tests completed
629\end{verbatim}
630The file \texttt{lsame.f} is automatically copied to
631\texttt{LAPACK/BLAS/SRC/} and \texttt{LAPACK/SRC/}.
632The function LSAME is needed by both the BLAS and LAPACK, so it is safer
633to have it in both libraries as long as this does not cause trouble
634in the link phase when both libraries are used.
635
636\subsubsection{Installing SLAMCH and DLAMCH}
637
638SLAMCH and DLAMCH are real functions with a single character parameter
639that indicates the machine parameter to be returned.  The test
640program in \texttt{slamchtst.f}
641simply prints out the different values computed by SLAMCH,
642so you need to know something about what the values should be.
643For example, the output of the test program executable \texttt{testslamch}
644for SLAMCH on a Sun SPARCstation is
645\begin{verbatim}
646 Epsilon                      =     5.96046E-08
647 Safe minimum                 =     1.17549E-38
648 Base                         =     2.00000
649 Precision                    =     1.19209E-07
650 Number of digits in mantissa =     24.0000
651 Rounding mode                =     1.00000
652 Minimum exponent             =    -125.000
653 Underflow threshold          =     1.17549E-38
654 Largest exponent             =     128.000
655 Overflow threshold           =     3.40282E+38
656 Reciprocal of safe minimum   =     8.50706E+37
657\end{verbatim}
658On a Cray machine, the safe minimum underflows its output
659representation and the overflow threshold overflows its output
660representation, so the safe minimum is printed as 0.00000 and overflow
661is printed as R.  This is normal.
662If you would prefer to print a representable number, you can modify
663the test program to print SFMIN*100. and RMAX/100. for the safe
664minimum and overflow thresholds.
665
666Likewise, the test executable \texttt{testdlamch} is run for DLAMCH.
667
668If both tests were successful, go to Section~\ref{second}.
669
670If SLAMCH (or DLAMCH) returns an invalid value, you will have to create
671your own version of this function.  The following options are used in
672LAPACK and must be set:
673
674\begin{list}{}{}
675\item {`B': }  Base of the machine
676\item {`E': }  Epsilon (relative machine precision)
677\item {`O': }  Overflow threshold
678\item {`P': }  Precision = Epsilon*Base
679\item {`S': }  Safe minimum (often same as underflow threshold)
680\item {`U': }  Underflow threshold
681\end{list}
682
683Some people may be familiar with R1MACH (D1MACH), a primitive
684routine for setting machine parameters in which the user must
685comment out the appropriate assignment statements for the target
686machine.  If a version of R1MACH is on hand, the assignments in
687SLAMCH can be made to refer to R1MACH using the correspondence
688
689\begin{list}{}{}
690\item {SLAMCH( `U' )}  $=$ R1MACH( 1 )
691\item {SLAMCH( `O' )}  $=$ R1MACH( 2 )
692\item {SLAMCH( `E' )}  $=$ R1MACH( 3 )
\item {SLAMCH( `B' )}  $=$ I1MACH( 10 ) \quad (R1MACH( 5 ) returns $\log_{10}$ of the base, not the base itself)
694\end{list}
695
696\noindent
697The safe minimum returned by SLAMCH( 'S' ) is initially set to the
698underflow value, but if $1/(\mathrm{overflow}) \geq (\mathrm{underflow})$
699it is recomputed as $(1/(\mathrm{overflow})) * ( 1 + \varepsilon )$,
700where $\varepsilon$ is the machine precision.
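
As a check, this rule can be applied to the sample \texttt{testslamch} output
shown above (IEEE single precision, with values rounded as printed):
\[
\frac{1}{\mathrm{overflow}} = \frac{1}{3.40282 \times 10^{38}}
\approx 2.93874 \times 10^{-39}
< 1.17549 \times 10^{-38} = \mathrm{underflow} .
\]
The condition is therefore not met, and the safe minimum stays equal to the
underflow threshold, which agrees with the two identical values in that output.
The same output is also consistent with the definition of option `P':
\[
\mathrm{Precision} = \mathrm{Epsilon} \times \mathrm{Base}
= 5.96046 \times 10^{-8} \times 2 \approx 1.19209 \times 10^{-7} .
\]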
701
702BE AWARE that the initial call to SLAMCH or DLAMCH is expensive.
703We suggest that installers run it once, save the results, and hard-code
704the constants in the version they put in their library.
705
706\subsubsection{Installing SECOND and DSECND}\label{second}
707
708Both the timing routines\footnotemark[\value{footnote}]  and the test routines call SECOND
709(DSECND), a real function with no arguments that returns the time
710in seconds from some fixed starting time.
711Our version of this routine
712returns only ``user time'', and not ``user time $+$ system time''.
The versions of SECOND in \texttt{second\_EXT\_ETIME.f} and \texttt{second\_INT\_ETIME.f} call
ETIME, a Fortran library routine available on some computer systems.
715If ETIME is not available or a better local timing function exists,
716you will have to provide the correct interface to SECOND and DSECND
717on your machine.
718
Since LAPACK 3.1.1, we provide five different versions of the SECOND and DSECND routines.
The version that will be used depends on the value of the \texttt{TIMER} variable in \texttt{make.inc}.
721
722\begin{itemize}
723\item If ETIME is available as an external function, set the value of the TIMER variable in your
724make.inc to \texttt{EXT\_ETIME}: \texttt{second\_EXT\_ETIME.f} and \texttt{dsecnd\_EXT\_ETIME.f} will be used.
725Usually on HPPA architectures,
726the compiler and linker flag \texttt{+U77} should be included to access
727the function \texttt{ETIME}.
728
729\item If ETIME\_ is available as an external function, set the value of the TIMER variable in your make.inc
730to \texttt{EXT\_ETIME\_}: \texttt{second\_EXT\_ETIME\_.f} and \texttt{dsecnd\_EXT\_ETIME\_.f} will be used.
This is the case on some IBM architectures such as the IBM RS/6000.
732
733\item If ETIME is available as an internal function, set the value of the TIMER variable in your make.inc
734to \texttt{INT\_ETIME}: \texttt{second\_INT\_ETIME.f}  and \texttt{dsecnd\_INT\_ETIME.f} will be used.
This is the case with gfortran.
736
737\item If CPU\_TIME is available as an internal function, set the value of the TIMER variable in your make.inc
738to \texttt{INT\_CPU\_TIME}: \texttt{second\_INT\_CPU\_TIME.f} and \texttt{dsecnd\_INT\_CPU\_TIME.f} will be used.
739
\item If none of these functions is available, set the value of the TIMER variable in your make.inc
741to \texttt{NONE}: \texttt{second\_NONE.f} and \texttt{dsecnd\_NONE.f} will be used.
742These routines will always return zero.
743\end{itemize}
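
\noindent
As an illustration, the fragment below sketches the corresponding line of
\texttt{make.inc} for a compiler that provides ETIME as an internal
function.  The choice of \texttt{INT\_ETIME} is only an example, and the
value appropriate for your compiler may differ.
\begin{verbatim}
# make.inc (fragment): select the SECOND/DSECND implementation
TIMER = INT_ETIME
\end{verbatim}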
744
745The test program in \texttt{secondtst.f}
746performs a million operations using 5000 iterations of
747the SAXPY operation $y := y + \alpha x$ on a vector of length 100.
The total time and megaflop rate for this test are reported, then
749the operation is repeated including a call to SECOND on each of
750the 5000 iterations to determine the overhead due to calling SECOND.
751The test program executable is called \texttt{testsecond} (or \texttt{testdsecnd}).
752There is no single right answer, but the times
in seconds should be positive and the megaflop rates should be
754appropriate for your machine.
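
\noindent
The operation count behind these figures can be checked by hand.  Each
element of a SAXPY update costs one multiplication and one addition, so
\[
  5000 \times 100 \times 2 = 10^{6} \ \mbox{floating-point operations},
  \qquad
  \mbox{megaflop rate} = \frac{10^{6}}{10^{6}\, t} = \frac{1}{t},
\]
where $t$ denotes the measured time in seconds (a symbol introduced here for
illustration).  For example, a measured time of $t = 0.002$ seconds would
correspond to a rate of 500 megaflops.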
755
756\subsubsection{Testing IEEE arithmetic and ILAENV}\label{testieee}
757
758%\textbf{If you are installing LAPACK on a non-IEEE machine, you MUST
759%modify ILAENV!  Otherwise, ILAENV will crash .  By default, ILAENV
760%assumes an IEEE machine, and does a test for IEEE-754 compliance.}
761
762As some new routines in LAPACK rely on IEEE-754 compliance,
763two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
764(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
765infinity arithmetic, respectively.  By default, ILAENV assumes an IEEE
766machine, and does a test for IEEE-754 compliance.  \textbf{NOTE:  If you
767are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
768as this test inside ILAENV will crash!}
769
770If \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
771issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
772and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.
773
Thus, for non-IEEE machines, the user must hard-code the setting
\texttt{ILAENV=0} for \texttt{ISPEC=10} and \texttt{ISPEC=11} in the version
of \texttt{LAPACK/SRC/ilaenv.f} to be put in
the library.  The specialized testing and timing\footnotemark[\value{footnote}] versions of
ILAENV must be modified in the same way:
779\begin{itemize}
780\item Testing/timing version of \texttt{LAPACK/TESTING/LIN/ilaenv.f}
781\item Testing/timing version of \texttt{LAPACK/TESTING/EIG/ilaenv.f}
782\item Testing/timing version of \texttt{LAPACK/TIMING/LIN/ilaenv.f}
783\item Testing/timing version of \texttt{LAPACK/TIMING/EIG/ilaenv.f}
784\end{itemize}
785
786%Some new routines in LAPACK rely on IEEE-754 compliance, and if non-compliance
787%is detected (via a call to the function ILAENV), alternative (slower)
788%algorithms will be chosen.
789%For further details, refer to the leading comments of routines such
790%as \texttt{LAPACK/SRC/sstevr.f}.
791
792The test program in \texttt{LAPACK/INSTALL/tstiee.f} checks an installation
793architecture
794to see if infinity arithmetic and NaN arithmetic are IEEE-754 compliant.
795A warning message to the user is printed if non-compliance is detected.
796This same test is performed inside the function ILAENV.  If
797\texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
798issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
799and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.
800
To avoid this IEEE test being run every time you call
\texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )}, we suggest
that the user hard-code the setting of
\texttt{ILAENV=1} or \texttt{ILAENV=0} in the version of \texttt{LAPACK/SRC/ilaenv.f} to be put in
the library.  As mentioned above, the specialized testing and
timing\footnotemark[\value{footnote}] versions of ILAENV will also need to be modified.
807
808\subsection{Create the BLAS Library}
809
810Ideally, a highly optimized version of the BLAS library already
811exists on your machine.
812In this case you can go directly to Section~\ref{testblas} to
813make the BLAS test programs.
814
815\begin{itemize}
816\item[a)]
817Go to \texttt{LAPACK} and edit the definition of \texttt{blaslib} in the
818file \texttt{Makefile} to specify the data types desired, as in the example
819in Section~\ref{toplevelmakefile}.
820
821If you already have some of the BLAS, you will need to edit the file
822\texttt{LAPACK/BLAS/SRC/Makefile} to comment out the lines
823defining the BLAS you have.
824
825\item[b)]
826Type \texttt{make blaslib}.
827The make command can be run more than once to add another
828data type to the library if necessary.
829\end{itemize}
830
831\noindent
832The BLAS library is created in \texttt{LAPACK/librefblas.a},
833or in the user-defined location specified by \texttt{BLASLIB} in the file
834\texttt{LAPACK/make.inc}.
835
836\subsection{Run the BLAS Test Programs}\label{testblas}
837
838Test programs for the Level 1, 2, and 3 BLAS are in the directory
839\texttt{LAPACK/BLAS/TESTING}.
840
841To compile and run the Level 1, 2, and 3 BLAS test programs,
842go to \texttt{LAPACK} and type \texttt{make blas\_testing}.  The executable
843files are called \texttt{xblat\_s}, \texttt{xblat\_d}, \texttt{xblat\_c}, and
844\texttt{xblat\_z}, where the \_ (underscore) is replaced by 1, 2, or 3,
845depending upon the level of BLAS that it is testing.  All executable and
846output files are created in \texttt{LAPACK/BLAS/}.
847For the Level 1 BLAS tests, the output file names are \texttt{sblat1.out},
848\texttt{dblat1.out}, \texttt{cblat1.out}, and \texttt{zblat1.out}.  For the Level
8492 and 3 BLAS, the name of the output file is indicated on the first line of the
850input file and is currently defined to be \texttt{sblat2.out} for
851the Level 2 REAL version, and \texttt{sblat3.out} for the Level 3 REAL
852version, with similar names for the other data types.
853
854If the tests using the supplied data files were completed successfully,
855consider whether the tests were sufficiently thorough.
856For example, on a machine with vector registers, at least one value
857of $N$ greater than the length of the vector registers should be used;
858otherwise, important parts of the compiled code may not be
859exercised by the tests.
860If the tests were not successful, either because the program did not
861finish or the test ratios did not pass the threshold, you will
862probably have to find and correct the problem before continuing.
863If you have been testing a system-specific
864BLAS library, try using the Fortran BLAS for the routines that
865did not pass the tests.
866For more details on the BLAS test programs,
867see \cite{BLAS2-test} and \cite{BLAS3-test}.
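
\noindent
A quick way to confirm that every BLAS test produced its output file is
sketched below.  This is a minimal Bourne-shell sketch and not part of the
LAPACK distribution: the function name \texttt{check\_blas\_outputs} is
hypothetical, and the Level 2 and 3 output names for the other data types are
assumed to follow the REAL pattern described above.  It is meant to be run
from \texttt{LAPACK/BLAS}.
\begin{verbatim}
#!/bin/sh
# Hypothetical helper: report BLAS test output files that are missing.
# Assumes output names of the form <type>blat<level>.out.
check_blas_outputs() {
    missing=0
    for t in s d c z; do
        for level in 1 2 3; do
            f="${t}blat${level}.out"
            if [ ! -f "$f" ]; then
                echo "missing: $f"
                missing=1
            fi
        done
    done
    return $missing
}
\end{verbatim}
The function returns a nonzero status if any file is absent, so it can be
used as \texttt{check\_blas\_outputs \&\& echo "all outputs present"}.  The
presence of a file says nothing about whether the test ratios passed; the
contents must still be inspected.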
868
869\subsection{Create the LAPACK Library}
870
871\begin{itemize}
872\item[a)]
873Go to the directory \texttt{LAPACK} and edit the definition of
874\texttt{lapacklib} in the file \texttt{Makefile} to specify the data types desired,
875as in the example in Section~\ref{toplevelmakefile}.
876
877\item[b)]
878Type \texttt{make lapacklib}.
879The make command can be run more than once to add another
880data type to the library if necessary.
881
882\end{itemize}
883
884\noindent
885The LAPACK library is created in \texttt{LAPACK/liblapack.a},
886or in the user-defined location specified by \texttt{LAPACKLIB} in the file
887\texttt{LAPACK/make.inc}.
888
889\subsection{Create the Test Matrix Generator Library}
890
891\begin{itemize}
892\item[a)]
893Go to the directory \texttt{LAPACK} and edit the definition of \texttt{tmglib}
894in the file \texttt{Makefile} to specify the data types desired, as in the
895example in Section~\ref{toplevelmakefile}.
896
897\item[b)]
898Type \texttt{make tmglib}.
899The make command can be run more than once to add another
900data type to the library if necessary.
901
902\end{itemize}
903
904\noindent
905The test matrix generator library is created in \texttt{LAPACK/libtmglib.a},
906or in the user-defined location specified by \texttt{TMGLIB} in the file
907\texttt{LAPACK/make.inc}.
908
909\subsection{Run the LAPACK Test Programs}
910
911There are two distinct test programs for LAPACK routines
912in each data type, one for the linear equation routines and
913one for the eigensystem routines.
914In each data type, there is one input file for testing the linear
915equation routines and eighteen input files for testing the eigenvalue
916routines.
917The input files reside in \texttt{LAPACK/TESTING}.
918For more information on the test programs and how to modify the
919input files, please refer to LAPACK Working Note 41~\cite{WN41}.
920% see Section~\ref{moretesting}.
921
922If you do not wish to run each of the tests individually, you can
923go to \texttt{LAPACK}, edit the definition \texttt{lapack\_testing} in the file
924\texttt{Makefile} to specify the data types desired, and type \texttt{make
925lapack\_testing}.  This will
compile and run the tests as described in Sections~\ref{testlin}
and~\ref{testeig}.
928
929%If you are installing LAPACK on a Silicon Graphics machine, you must
930%modify the definition of \texttt{testing} to be
931%\begin{verbatim}
932%testing:
933%        ( cd TESTING; $(MAKE) -f Makefile.sgi )
934%\end{verbatim}
935
936\subsubsection{Testing the Linear Equations Routines}\label{testlin}
937
938\begin{itemize}
939
940\item[a)]
941Go to \texttt{LAPACK/TESTING/LIN} and type \texttt{make} followed by the data types
desired.  The executable files are called \texttt{xlintsts}, \texttt{xlintstc},
\texttt{xlintstd}, and \texttt{xlintstz} and are created in \texttt{LAPACK/TESTING}.
944
945\item[b)]
946Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
947For the REAL version, the command is
948\begin{list}{}{}
949\item{} \texttt{xlintsts  < stest.in > stest.out}
950\end{list}
951
952\noindent
953The tests using \texttt{xlintstd}, \texttt{xlintstc}, and \texttt{xlintstz} are similar
954with the leading `s' in the input and output file names replaced
955by `d', `c', or `z'.
956
957\end{itemize}
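
\noindent
The runs in step b) can be scripted for all four data types.  The following
is a minimal Bourne-shell sketch, assuming it is run from
\texttt{LAPACK/TESTING}; the function name \texttt{run\_lin\_tests} is
hypothetical, and data types whose executable was not built are skipped
rather than treated as errors.
\begin{verbatim}
#!/bin/sh
# Hypothetical helper: run the linear equation tests for each data type
# whose executable exists in the current directory (LAPACK/TESTING).
run_lin_tests() {
    for t in s d c z; do
        exe="./xlintst${t}"
        if [ -x "$exe" ]; then
            "$exe" < "${t}test.in" > "${t}test.out"
        else
            echo "skipped: xlintst${t} not found"
        fi
    done
}
\end{verbatim}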
958
959If you encountered failures in this phase of the testing process, please
960refer to Section~\ref{sendresults}.
961
962\subsubsection{Testing the Eigensystem Routines}\label{testeig}
963
964\begin{itemize}
965
966\item[a)]
967Go to \texttt{LAPACK/TESTING/EIG} and type \texttt{make} followed by the data types
968desired.  The executable files are called \texttt{xeigtsts,
969xeigtstc, xeigtstd}, and \texttt{xeigtstz} and are created
970in \texttt{LAPACK/TESTING}.
971
972\item[b)]
973Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
974The tests for the eigensystem routines use eighteen separate input files
975for testing the nonsymmetric eigenvalue problem,
976the symmetric eigenvalue problem, the banded symmetric eigenvalue
977problem, the generalized symmetric eigenvalue
978problem, the generalized nonsymmetric eigenvalue problem, the
979singular value decomposition, the banded singular value decomposition,
980the generalized singular value
981decomposition, the generalized QR and RQ factorizations, the generalized
982linear regression model, and the constrained linear least squares
983problem.
984The tests for the REAL version are as follows:
985\begin{list}{}{}
986\item \texttt{xeigtsts  < nep.in > snep.out}
987\item \texttt{xeigtsts  < sep.in > ssep.out}
988\item \texttt{xeigtsts  < svd.in > ssvd.out}
989\item \texttt{xeigtsts  < sec.in > sec.out}
990\item \texttt{xeigtsts  < sed.in > sed.out}
991\item \texttt{xeigtsts  < sgg.in > sgg.out}
992\item \texttt{xeigtsts  < sgd.in > sgd.out}
993\item \texttt{xeigtsts  < ssg.in > ssg.out}
994\item \texttt{xeigtsts  < ssb.in > ssb.out}
995\item \texttt{xeigtsts  < sbb.in > sbb.out}
996\item \texttt{xeigtsts  < sbal.in > sbal.out}
997\item \texttt{xeigtsts  < sbak.in > sbak.out}
998\item \texttt{xeigtsts  < sgbal.in > sgbal.out}
999\item \texttt{xeigtsts  < sgbak.in > sgbak.out}
1000\item \texttt{xeigtsts  < glm.in > sglm.out}
1001\item \texttt{xeigtsts  < gqr.in > sgqr.out}
1002\item \texttt{xeigtsts  < gsv.in > sgsv.out}
1003\item \texttt{xeigtsts  < lse.in > slse.out}
1004\end{list}
1005The tests using \texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} also
1006use the input files \texttt{nep.in}, \texttt{sep.in}, \texttt{svd.in},
1007\texttt{glm.in}, \texttt{gqr.in}, \texttt{gsv.in}, and \texttt{lse.in},
1008but the leading `s' in the other input file names must be changed
1009to `c', `d', or `z'.
1010\end{itemize}
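
\noindent
The file-naming rule above (seven input files shared by all data types,
eleven whose names carry the type letter) can be captured in a short script.
This is a minimal Bourne-shell sketch: the function name
\texttt{eig\_commands} is hypothetical, and the output file names for the
other data types are assumed to follow the REAL pattern.  The function only
prints the eighteen commands for one data type; it does not run them.
\begin{verbatim}
#!/bin/sh
# Hypothetical helper: print the eigensystem test commands for the
# data type given as $1 (s, d, c or z).
eig_commands() {
    t=$1
    # Input files shared by all data types.
    for f in nep sep svd glm gqr gsv lse; do
        echo "xeigtst${t} < ${f}.in > ${t}${f}.out"
    done
    # Input files whose names start with the type letter.
    for f in ec ed gg gd sg sb bb bal bak gbal gbak; do
        echo "xeigtst${t} < ${t}${f}.in > ${t}${f}.out"
    done
}
\end{verbatim}
For example, \texttt{eig\_commands d} lists the DOUBLE PRECISION runs, and
the list can be reviewed before it is passed to a shell for execution in
\texttt{LAPACK/TESTING}.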
1011
1012If you encountered failures in this phase of the testing process, please
1013refer to Section~\ref{sendresults}.
1014
1015\subsection{Run the LAPACK Timing Programs (For LAPACK 3.0 and before)}
1016
1017There are two distinct timing programs for LAPACK routines
1018in each data type, one for the linear equation routines and
1019one for the eigensystem routines.  The timing program for the
1020linear equation routines is also used to time the BLAS.
1021We encourage you to conduct these timing experiments
1022in REAL and COMPLEX or in DOUBLE PRECISION and COMPLEX*16; it is
1023not necessary to send timing results in all four data types.
1024
1025Two sets of input files are provided, a small set and a large set.
1026The small data sets are appropriate for a standard workstation or
1027other non-vector machine.
1028The large data sets are appropriate for supercomputers, vector
1029computers, and high-performance workstations.
1030We are mainly interested in results from the large data sets, and
1031it is not necessary to run both the large and small sets.
1032The values of N in the large data sets are about five times larger
1033than those in the small data set,
1034and the large data sets use additional values for parameters such as the
1035block size NB and the leading array dimension LDA.
The small data sets carry the suffix \texttt{\_small} in their names, as in
\texttt{stime\_small.in}, and the large data sets carry the suffix \texttt{\_large},
as in \texttt{stime\_large.in}.
1039Except as noted, the leading `s' in the input file name must be
1040replaced by `d', `c', or `z' for the other data types.
1041
1042We encourage you to obtain timing results with the large data sets,
1043as this allows us to compare different machines.
1044If this would take too much time, suggestions for paring back the large
1045data sets are given in the instructions below.
1046We also encourage you to experiment with these timing
1047programs and send us any interesting results, such as results for
1048larger problems or for a wider range of block sizes.
1049The main programs are dimensioned for the large data sets,
1050so the parameters in the main program may have to be reduced in order
1051to run the small data sets on a small machine, or increased to run
1052experiments with larger problems.
1053
1054The minimum time each subroutine will be timed is set to 0.0 in
1055the large data files and to 0.05 in the small data files, and on
1056many machines this value should be increased.
1057If the timing interval is not long
1058enough, the time for the subroutine after subtracting the overhead
1059may be very small or zero, resulting in megaflop rates that are
1060very large or zero. (To avoid division by zero, the megaflop rate is
1061set to zero if the time is less than or equal to zero.)
1062The minimum time that should be used depends on the machine and the
1063resolution of the clock.
1064
1065For more information on the timing programs and how to modify the
1066input files, please refer to LAPACK Working Note 41~\cite{WN41}.
1067% see Section~\ref{moretiming}.
1068
1069If you do not wish to run each of the timings individually, you can
1070go to \texttt{LAPACK}, edit the definition \texttt{lapack\_timing} in the file
1071\texttt{Makefile} to specify the data types desired, and type \texttt{make
1072lapack\_timing}.  This will compile
1073and run the timings for the linear equation routines and the eigensystem
routines (see Sections~\ref{timelin} and~\ref{timeeig}).
1075
1076%If you are installing LAPACK on a Silicon Graphics machine, you must
1077%modify the definition of \texttt{timing} to be
1078%\begin{verbatim}
1079%timing:
1080%        ( cd TIMING; $(MAKE) -f Makefile.sgi )
1081%\end{verbatim}
1082
1083If you encounter failures in any phase of the timing process, please
1084feel free to contact the authors as directed in Section~\ref{sendresults}.
1085Tell us the
1086type of machine on which the tests were run, the version of the operating
1087system, the compiler and compiler options that were used,
1088and details of the BLAS library or libraries that you used.  You should
1089also include a copy of the output file in which the failure occurs.
1090
Please note that the BLAS
timing runs must still be performed as instructed in Section~\ref{timeblas}.
1093
1094\subsubsection{Timing the Linear Equations Routines}\label{timelin}
1095
1096The linear equation timing program is found in \texttt{LAPACK/TIMING/LIN}
1097and the input files are in \texttt{LAPACK/TIMING}.
1098Three input files are provided in each data type for timing the
1099linear equation routines, one for square matrices, one for band
1100matrices, and one for rectangular matrices.  The small data sets for the REAL version
1101are \texttt{stime\_small.in}, \texttt{sband\_small.in}, and \texttt{stime2\_small.in}, respectively,
1102and the large data sets are
1103\texttt{stime\_large.in}, \texttt{sband\_large.in}, and \texttt{stime2\_large.in}.
1104
1105The timing program for the least squares routines uses special instrumented
1106versions of the LAPACK routines to time individual sections of the code.
1107The first step in compiling the timing program is therefore to make a library
1108of the instrumented routines.
1109
1110\begin{itemize}
1111\item[a)]
1112\begin{sloppypar}
1113To make a library of the instrumented LAPACK routines, first
1114go to \texttt{LAPACK/TIMING/LIN/LINSRC} and type \texttt{make} followed
1115by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
1116The library of instrumented code is created in
1117\texttt{LAPACK/TIMING/LIN/linsrc.a}.
1118\end{sloppypar}
1119
1120\item[b)]
1121To make the linear equation timing programs,
1122go to \texttt{LAPACK/TIMING/LIN} and type \texttt{make} followed by the data
1123types desired, as in the examples in Section~\ref{toplevelmakefile}.
1124The executable files are called \texttt{xlintims},
1125\texttt{xlintimc}, \texttt{xlintimd}, and \texttt{xlintimz} and are created
1126in \texttt{LAPACK/TIMING}.
1127
1128\item[c)]
1129Go to \texttt{LAPACK/TIMING} and
1130make any necessary modifications to the input files.
1131You may need to set the minimum time a subroutine will
1132be timed to a positive value, or to restrict the size of the tests
1133if you are using a computer with performance in between that of a
1134workstation and that of a supercomputer.
1135The computational requirements can be cut in half by using only one
1136value of LDA.
1137If it is necessary to also reduce the matrix sizes or the values of
1138the blocksize, corresponding changes should be made to the
1139BLAS input files (see Section~\ref{timeblas}).
1140
1141\item[d)]
1142Run the programs for each data type you are using.
1143For the REAL version, the commands for the small data sets are
1144
1145\begin{list}{}{}
1146\item{} \texttt{xlintims < stime\_small.in > stime\_small.out }
1147\item{} \texttt{xlintims < sband\_small.in > sband\_small.out }
1148\item{} \texttt{xlintims < stime2\_small.in > stime2\_small.out }
1149\end{list}
1150or the commands for the large data sets are
1151\begin{list}{}{}
1152\item{} \texttt{xlintims < stime\_large.in > stime\_large.out }
1153\item{} \texttt{xlintims < sband\_large.in > sband\_large.out }
1154\item{} \texttt{xlintims < stime2\_large.in > stime2\_large.out }
1155\end{list}
1156
1157\noindent
1158Similar commands should be used for the other data types.
1159\end{itemize}
1160
1161\subsubsection{Timing the BLAS}\label{timeblas}
1162
1163The linear equation timing program is also used to time the BLAS.
1164Three input files are provided in each data type for timing the Level
11652 and 3 BLAS.
1166These input files time the BLAS using the matrix shapes encountered
1167in the LAPACK routines, and we will use the results to analyze the
1168performance of the LAPACK routines.
1169For the REAL version, the small data files are
1170\texttt{sblasa\_small.in}, \texttt{sblasb\_small.in}, and \texttt{sblasc\_small.in}
1171and the large data files are
1172\texttt{sblasa\_large.in}, \texttt{sblasb\_large.in}, and \texttt{sblasc\_large.in}.
1173There are three sets of inputs because there are three
1174parameters in the Level 3 BLAS, M, N, and K, and
1175in most applications one of these parameters is small (on the order
1176of the blocksize) while the other two are large (on the order of the
1177matrix size).
1178In \texttt{sblasa\_small.in}, M and N are large but K is
1179small, while in \texttt{sblasb\_small.in} the small parameter is M, and
1180in \texttt{sblasc\_small.in} the small parameter is N.
1181The Level 2 BLAS are timed only in the first data set, where K
1182is also used as the bandwidth for the banded routines.
1183
1184\begin{itemize}
1185
1186\item[a)]
1187Go to \texttt{LAPACK/TIMING} and
1188make any necessary modifications to the input files.
1189You may need to set the minimum time a subroutine will
1190be timed to a positive value.
1191If you modified the values of N or NB
1192in Section~\ref{timelin}, set M, N, and K accordingly.
1193The large parameters among M, N, and K
1194should be the same as the matrix sizes used in timing the linear
1195equation routines,
1196and the small parameter should be the same as the
1197blocksizes used in timing the linear equation routines.
1198If necessary, the large data set can be simplified by using only one
1199value of LDA.
1200
1201\item[b)]
1202Run the programs for each data type you are using.
1203For the REAL version, the commands for the small data sets are
1204
1205\begin{list}{}{}
1206\item{} \texttt{xlintims < sblasa\_small.in > sblasa\_small.out }
1207\item{} \texttt{xlintims < sblasb\_small.in > sblasb\_small.out }
1208\item{} \texttt{xlintims < sblasc\_small.in > sblasc\_small.out }
1209\end{list}
1210or the commands for the large data sets are
1211\begin{list}{}{}
1212\item{} \texttt{xlintims < sblasa\_large.in > sblasa\_large.out }
1213\item{} \texttt{xlintims < sblasb\_large.in > sblasb\_large.out }
1214\item{} \texttt{xlintims < sblasc\_large.in > sblasc\_large.out }
1215\end{list}
1216
1217\noindent
1218Similar commands should be used for the other data types.
1219\end{itemize}
1220
1221\subsubsection{Timing the Eigensystem Routines}\label{timeeig}
1222
1223The eigensystem timing program is found in \texttt{LAPACK/TIMING/EIG}
1224and the input files are in \texttt{LAPACK/TIMING}.
1225Four input files are provided in each data type for timing the
1226eigensystem routines,
1227one for the generalized nonsymmetric eigenvalue problem,
1228one for the nonsymmetric eigenvalue problem,
1229one for the symmetric and generalized symmetric eigenvalue problem,
1230and one for the singular value decomposition.
1231For the REAL version, the small data sets are called \texttt{sgeptim\_small.in},
\texttt{sneptim\_small.in}, \texttt{sseptim\_small.in}, and \texttt{ssvdtim\_small.in}, respectively,
1233and the large data sets are called \texttt{sgeptim\_large.in}, \texttt{sneptim\_large.in},
1234\texttt{sseptim\_large.in}, and \texttt{ssvdtim\_large.in}.
1235Each of the four input files reads a different set of parameters,
1236and the format of the input is indicated by a 3-character code
1237on the first line.
1238
1239The timing program for eigenvalue/singular value routines accumulates
1240the operation count as the routines are executing using special
1241instrumented versions of the LAPACK routines.  The first step in
1242compiling the timing program is therefore to make a library of the
1243instrumented routines.
1244
1245\begin{itemize}
1246\item[a)]
1247\begin{sloppypar}
1248To make a library of the instrumented LAPACK routines, first
1249go to \texttt{LAPACK/TIMING/EIG/EIGSRC} and type \texttt{make} followed
1250by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
1251The library of instrumented code is created in
1252\texttt{LAPACK/TIMING/EIG/eigsrc.a}.
1253\end{sloppypar}
1254
1255\item[b)]
1256To make the eigensystem timing programs,
1257go to \texttt{LAPACK/TIMING/EIG} and
1258type \texttt{make} followed by the data types desired, as in the examples
1259of Section~\ref{toplevelmakefile}.  The executable files are called
1260\texttt{xeigtims}, \texttt{xeigtimc}, \texttt{xeigtimd}, and \texttt{xeigtimz}
1261and are created in \texttt{LAPACK/TIMING}.
1262
1263\item[c)]
1264Go to \texttt{LAPACK/TIMING} and
1265make any necessary modifications to the input files.
1266You may need to set the minimum time a subroutine will
1267be timed to a positive value, or to restrict the number of tests
1268if you are using a computer with performance in between that of a
1269workstation and that of a supercomputer.
1270Instead of decreasing the matrix dimensions to reduce the time,
1271it would be better to reduce the number of matrix types to be timed,
1272since the performance varies more with the matrix size than with the
1273type.  For example, for the nonsymmetric eigenvalue routines,
1274you could use only one matrix of type 4 instead of four matrices of
1275types 1, 3, 4, and 6.
1276Refer to LAPACK Working Note 41~\cite{WN41} for further details.
1277%  See Section~\ref{moretiming} for further details.
1278
1279\item[d)]
1280Run the programs for each data type you are using.
1281For the REAL version, the commands for the small data sets are
1282
1283\begin{list}{}{}
1284\item{} \texttt{xeigtims < sgeptim\_small.in > sgeptim\_small.out }
1285\item{} \texttt{xeigtims < sneptim\_small.in > sneptim\_small.out }
1286\item{} \texttt{xeigtims < sseptim\_small.in > sseptim\_small.out }
1287\item{} \texttt{xeigtims < ssvdtim\_small.in > ssvdtim\_small.out }
1288\end{list}
1289or the commands for the large data sets are
1290\begin{list}{}{}
1291\item{} \texttt{xeigtims < sgeptim\_large.in > sgeptim\_large.out }
1292\item{} \texttt{xeigtims < sneptim\_large.in > sneptim\_large.out }
1293\item{} \texttt{xeigtims < sseptim\_large.in > sseptim\_large.out }
1294\item{} \texttt{xeigtims < ssvdtim\_large.in > ssvdtim\_large.out }
1295\end{list}
1296
1297\noindent
1298Similar commands should be used for the other data types.
1299\end{itemize}
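
\noindent
The timing runs of Sections~\ref{timelin}, \ref{timeblas},
and~\ref{timeeig} follow one naming pattern, which the following minimal
Bourne-shell sketch makes explicit.  The function name
\texttt{timing\_commands} is hypothetical, and the file names for the data
types other than REAL are assumed to be obtained by replacing the leading
`s'.  The function only prints the commands; it does not run them.
\begin{verbatim}
#!/bin/sh
# Hypothetical helper: print the timing commands for data type $1
# (s, d, c or z) and data set size $2 (small or large).
timing_commands() {
    t=$1
    size=$2
    # Linear equation and BLAS timings share one executable.
    for f in time band time2 blasa blasb blasc; do
        echo "xlintim${t} < ${t}${f}_${size}.in > ${t}${f}_${size}.out"
    done
    # Eigensystem timings.
    for f in geptim neptim septim svdtim; do
        echo "xeigtim${t} < ${t}${f}_${size}.in > ${t}${f}_${size}.out"
    done
}
\end{verbatim}
For example, \texttt{timing\_commands s small} lists the ten REAL runs on the
small data sets.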
1300
1301\subsection{Send the Results to Tennessee}\label{sendresults}
1302
1303Congratulations!  You have now finished installing, testing, and
1304timing LAPACK.  If you encountered failures in any phase of the
1305testing or timing process, please
1306consult our \texttt{release\_notes} file on netlib.
1307\begin{quote}
1308\url{http://www.netlib.org/lapack/release\_notes}
1309\end{quote}
This file contains machine-dependent installation hints that we hope will
1311alleviate your difficulties or at least let you know that other users
1312have had similar difficulties on that machine.  If there is not an entry
1313for your machine or the suggestions do not fix your problem, please feel
1314free to contact the authors at
1315\begin{list}{}{}
1316\item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
1317\end{list}
1318Tell us the
1319type of machine on which the tests were run, the version of the operating
1320system, the compiler and compiler options that were used,
1321and details of the BLAS library or libraries that you used.  You should
1322also include a copy of the output file in which the failure occurs.
1323
1324We would like to keep our \texttt{release\_notes} file as up-to-date as possible.
1325Therefore, if you do not see an entry for your machine, please contact us
1326with your testing results.
1327
1328Comments and suggestions are also welcome.
1329
1330We encourage you to make the LAPACK library available to your
1331users and provide us with feedback from their experiences.
1332%This release of LAPACK is not guaranteed to be compatible
1333%with any previous test release.
1334
1335\subsection{Get support}\label{getsupport}
First, take a look at the complete installation manual in LAPACK Working Note 41~\cite{WN41}.
If you still cannot solve your problem, you have two options:
1338\begin{itemize}
1339\item
either post a message on the LAPACK forum
1341\begin{quote}
1342\url{http://icl.cs.utk.edu/lapack-forum}
1343\end{quote}
1344\item
1345or send an email to the LAPACK mailing list:
1346\begin{list}{}{}
1347\item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
1348\end{list}
1349\end{itemize}
1350\section*{Acknowledgments}
1351
1352Ed Anderson and Susan Blackford contributed to previous versions of this report.
1353
1354\appendix
1355
1356\chapter{Caveats}\label{appendixd}
1357
1358In this appendix we list a few of the machine-specific difficulties we
1359have
1360encountered in our own experience with LAPACK.  A more detailed list
1361of machine-dependent problems, bugs, and compiler errors encountered
1362in the LAPACK installation process is maintained
1363on \emph{netlib}.
1364\begin{quote}
1365\url{http://www.netlib.org/lapack/release\_notes}
1366\end{quote}
1367
1368We assume the user has installed the machine-specific routines
1369correctly and that the Level 1, 2 and 3 BLAS test programs have run
1370successfully, so we do not list any warnings associated with those
1371routines.
1372
1373\section{\texttt{LAPACK/make.inc}}
1374
1375All machine-specific
1376parameters are specified in the file \texttt{LAPACK/make.inc}.
1377
1378The first line of this \texttt{make.inc} file is:
1379\begin{quote}
1380SHELL = /bin/sh
1381\end{quote}
1382and will need to be modified to \texttt{SHELL = /sbin/sh} if you are
1383installing LAPACK on an SGI architecture.
1384
1385\section{ETIME}
1386
1387On HPPA architectures,
1388the compiler and linker flag \texttt{+U77} should be included to access
1389the function \texttt{ETIME}.
1390
1391\section{ILAENV and IEEE-754 compliance}
1392
1393%By default, ILAENV (\texttt{LAPACK/SRC/ilaenv.f}) assumes an IEEE and IEEE-754
1394%compliant architecture, and thus sets (\texttt{ILAENV=1}) for (\texttt{ISPEC=10})
1395%and (\texttt{ISPEC=11}) settings in ILAENV.
1396%
1397%If you are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
1398%as this test inside ILAENV will crash!
1399
1400As some new routines in LAPACK rely on IEEE-754 compliance,
1401two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
1402(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
1403infinity arithmetic, respectively.  By default, ILAENV assumes an IEEE
1404machine, and does a test for IEEE-754 compliance.  \textbf{NOTE:  If you
1405are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
1406as this test inside ILAENV will crash!}
1407
Thus, for non-IEEE machines, the user must hard-code the setting
\texttt{ILAENV=0} for \texttt{ISPEC=10} and \texttt{ISPEC=11} in the version
of \texttt{LAPACK/SRC/ilaenv.f} to be put in
the library.  For further details, refer to Section~\ref{testieee}.
1412
Be aware
that some compilers for IEEE machines do not enforce IEEE-754 compliance by
default, so the user must set a compiler flag explicitly.
On SGIs, for example, you must set the \texttt{-OPT:IEEE\_NaN\_inf=ON} compiler
flag to enable IEEE-754 compliance.
1419
Lastly, the test inside ILAENV that detects IEEE-754 compliance will
raise IEEE exceptions for ``Divide by Zero'' and ``Invalid Operation''.
If you are installing on a machine that issues IEEE exception
warning messages (such as a Sun SPARCstation), you can disregard these
messages.  To avoid them altogether, hard-code the values
inside ILAENV as explained in section~\ref{testieee}.
1426
1427\section{Lack of \texttt{/tmp} space}
1428
1429If \texttt{/tmp} space is small (i.e., less than approximately 16 MB) on your
1430architecture, you may run out of space
1431when compiling.  There are a few possible solutions to this problem.
1432\begin{enumerate}
1433\item You can ask your system administrator to increase the size of the
1434\texttt{/tmp} partition.
1435\item You can change the environment variable \texttt{TMPDIR} to point to
1436your home directory for temporary space.  E.g.,
1437\begin{quote}
1438\texttt{setenv TMPDIR /home/userid/}
1439\end{quote}
1440where \texttt{/home/userid/} is the user's home directory.
\item If your archive command has an \texttt{l} option, you can change the
archive command to \texttt{ar crl} so that it
places temporary files only in the current working
directory rather than in the default temporary directory \texttt{/tmp}.
1445\end{enumerate}
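
The \texttt{setenv} command in the second option is C shell syntax.  For users
of a Bourne-compatible shell, a minimal sketch of the same step might look as
follows.  The directory name \texttt{lapack\_tmp} and the variable
\texttt{LAPACK\_TMP} are hypothetical names chosen for illustration; any
directory on a file system with enough free space will do.
\begin{verbatim}
#!/bin/sh
# Hypothetical scratch directory (an assumption, not part of LAPACK).
LAPACK_TMP="${LAPACK_TMP:-$HOME/lapack_tmp}"
mkdir -p "$LAPACK_TMP"
# Bourne-shell equivalent of the csh command "setenv TMPDIR ...".
TMPDIR="$LAPACK_TMP"
export TMPDIR
echo "temporary files will go to $TMPDIR"
\end{verbatim}
The assignment must be made in the same shell session that later runs the
compilation, since child processes inherit \texttt{TMPDIR} only from the
environment of the shell that starts them.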
1446
1447\section{BLAS}
1448
If you suspect a BLAS-related problem and you are linking
with an optimized version of the BLAS, we strongly suggest,
as a first step, that you link to the Fortran~77 version of
the suspected BLAS routine and check whether the error disappears.
1453
Test programs for the Level 1 BLAS are included, but
users should still beware of a common problem in machine-specific
implementations of xNRM2,
the function that computes the 2-norm of a vector.
1458The Fortran version of xNRM2 avoids underflow or overflow
1459by scaling intermediate results, but some library versions of xNRM2
1460are not so careful about scaling.
1461If xNRM2 is implemented without scaling intermediate results, some of
1462the LAPACK test ratios may be unusually high, or
1463a floating point exception may occur in the problems scaled near
1464underflow or overflow.
The solution to these problems is to link the Fortran version of
xNRM2 with the test program.  \emph{On some CRAY architectures, the Fortran~77
version of xNRM2 should be used.}
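
To see why scaling matters, consider one standard formulation of the scaled
2-norm.  This is a sketch of the idea only; a particular xNRM2 may organize
the computation differently.  The largest entry is factored out before
squaring:
\[
  \|x\|_2 = s \sqrt{\sum_{i=1}^{n} \left( \frac{x_i}{s} \right)^2},
  \qquad s = \max_{1 \le i \le n} |x_i| \ne 0 .
\]
Every ratio $x_i/s$ has magnitude at most 1, so the squares cannot overflow.
For example, in IEEE double precision, where the overflow threshold is roughly
$1.8 \times 10^{308}$, the vector $x = (10^{200}, 10^{200})$ has norm
$\sqrt{2} \times 10^{200}$, which is representable, yet the unscaled squares
$x_i^2 = 10^{400}$ overflow.  With $s = 10^{200}$ the scaled sum is
$1 + 1 = 2$, and the result is $10^{200} \sqrt{2}$.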
1468
1469\section{Optimization}
1470
If a large number of test failures occur for a specific matrix type
or operation, your compiler's optimizer may be at fault.  Try reducing
the level of optimization, or disabling optimization entirely, for the
affected routines, and then rerun the tests to see whether the failures
disappear.
1476
1477%LAPACK is written in Fortran 77.  Prospective users with only a
1478%Fortran 66 compiler will not be able to use this package.
1479
1480\section{Compiling testing/timing drivers}
1481
The testing and timing main programs (xCHKAA, xCHKEE, xTIMAA, and
xTIMEE)
declare a large amount of local storage.  It is therefore vitally
important to know whether your compiler by default allocates local
variables statically or on the stack.  With compilers that place local
variables on the stack, it is not uncommon for the testing or timing
programs to fail at run time with a stack overflow.  In that case you
have two options:  increase the stack size, or force all local variables
to be allocated statically.
1491
1492On HPPA architectures, the
1493compiler and linker flag \texttt{-K} should be used when compiling these testing
1494and timing main programs to avoid such a stack overflow.  I.e., set
1495\texttt{FFLAGS\_DRV = -K} in the \texttt{LAPACK/make.inc} file.
1496
1497For similar reasons,
1498on SGI architectures, the compiler and linker flag \texttt{-static} should be
1499used.  I.e., set \texttt{FFLAGS\_DRV = -static} in the \texttt{LAPACK/make.inc} file.
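
The other option, increasing the stack size, can often be done from the shell
before the test programs are started.  The following is a minimal sketch for a
Bourne-compatible shell.  It assumes that the shell's \texttt{ulimit} built-in
accepts the \texttt{-s} (stack size) option, which is common but not
universal; C shell users would use the \texttt{limit} built-in instead.
\begin{verbatim}
#!/bin/sh
# Report the current soft stack limit (kilobytes or "unlimited").
echo "soft stack limit before: $(ulimit -S -s)"
# Raise the soft limit as far as the hard limit allows.
ulimit -S -s "$(ulimit -H -s)" || echo "could not raise the stack limit"
echo "soft stack limit after:  $(ulimit -S -s)"
# Run the test programs from this same shell so they inherit the limit.
\end{verbatim}
If the hard limit itself is too small, only the system administrator can raise
it, and static allocation as described above is the remaining choice.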
1500
1501\section{IEEE arithmetic}
1502
1503Some of our test matrices are scaled near overflow or underflow,
1504but on the Crays, problems with the arithmetic near overflow and
1505underflow forced us to scale by only the square root of overflow
1506and underflow.
1507The LAPACK auxiliary routine SLABAD (or DLABAD) is called to
1508take the square root of underflow and overflow in cases where it
1509could cause difficulties.
1510We assume we are on a Cray if $ \log_{10} (\mathrm{overflow})$
1511is greater than 2000
1512and take the square root of underflow and overflow in this case.
1513The test in SLABAD is as follows:
1514\begin{verbatim}
1515      IF( LOG10( LARGE ).GT.2000. ) THEN
1516         SMALL = SQRT( SMALL )
1517         LARGE = SQRT( LARGE )
1518      END IF
1519\end{verbatim}
1520Users of other machines with similar restrictions on the effective
1521range of usable numbers may have to modify this test so that the
1522square roots are done on their machine as well.  \emph{Usually on
1523HPPA architectures, a similar restriction in SLABAD should be enforced
1524for all testing involving complex arithmetic.}
1525SLABAD is located in \texttt{LAPACK/SRC}.
1526
1527For machines which have a narrow exponent range or lack gradual
1528underflow (DEC VAXes for example), it is not uncommon to experience
1529failures in sec.out and/or dec.out with SLAQTR/DLAQTR or DTRSYL.
1530The failures in SLAQTR/DLAQTR and DTRSYL
1531occur with test problems which are very badly scaled when the norm of
1532the solution is very close to the underflow
1533threshold (or even underflows to zero).  We believe that these failures
1534could probably be avoided by an even greater degree of care in scaling,
1535but we did not want to delay the release of LAPACK any further.  These
1536tests pass successfully on most other machines.  An example failure in
1537dec.out on a MicroVAX II looks like the following:
1538
1539\begin{verbatim}
1540Tests of the Nonsymmetric eigenproblem condition estimation routines
1541DLALN2, DLASY2, DLANV2, DLAEXC, DTRSYL, DTREXC, DTRSNA, DTRSEN, DLAQTR
1542
1543Relative machine precision (EPS) =     0.277556D-16
1544Safe minimum (SFMIN)             =     0.587747D-38
1545
1546Routines pass computational tests if test ratio is less than   20.00
1547
1548DEC routines passed the tests of the error exits ( 35 tests done)
1549Error in DTRSYL: RMAX =   0.155D+07
1550LMAX =     5323 NINFO=    1600 KNT=   27648
1551Error in DLAQTR: RMAX =   0.344D+04
1552LMAX =    15792 NINFO=   26720 KNT=   45000
1553\end{verbatim}
1554
1555\section{Timing programs}
1556
1557In the eigensystem timing program, calls are made to the LINPACK
1558and EISPACK equivalents of the LAPACK routines to allow a direct
1559comparison of performance measures.
1560In some cases we have increased the minimum number of
1561iterations in the LINPACK and EISPACK routines to allow
1562them to converge for our test problems, but
1563even this may not be enough.
1564One goal of the LAPACK project is to improve the convergence
1565properties of these routines, so error messages in the output
1566file indicating that a LINPACK or EISPACK routine did not
1567converge should not be regarded with alarm.
1568
1569In the eigensystem timing program, we have equivalenced some work
1570arrays and then passed them to a subroutine, where both arrays are
1571modified.  This is a violation of the Fortran~77 standard, which
1572says ``if a subprogram reference causes a dummy argument in the
1573referenced subprogram to become associated with another dummy
1574argument in the referenced subprogram, neither dummy argument may
become defined during execution of the subprogram.''\footnote{ANSI X3.9-1978, sec.~15.9.3.6}
1577If this causes any difficulties, the equivalence
1578can be commented out as explained in the comments for the main
1579eigensystem timing programs.
1580
1581%\section*{MACHINE-SPECIFIC DIFFICULTIES}
1582%Some IBM compilers do not recognize DBLE as a generic function as used
1583%in LAPACK.  The software tools we use to convert from single precision
1584%to double precision convert REAL(C) and AIMAG(C), where C is COMPLEX,
1585%to DBLE(Z) and DIMAG(Z), where Z is COMPLEX*16, but
1586%IBM compilers use DREAL(Z) and DIMAG(Z) to take the real and
1587%imaginary parts of a double complex number.
1588%IBM users can fix this problem by changing DBLE to DREAL when the
1589%argument of DBLE is COMPLEX*16.
1590%
1591%IBM compilers do not permit the data type COMPLEX*16 in a FUNCTION
1592%subprogram definition.  The data type on the first line of the
1593%function subprogram must be changed from COMPLEX*16 to DOUBLE COMPLEX
1594%for the following functions:
1595%
1596%\begin{tabbing}
1597%\dent ZLATMOO \= from the test matrix generator library \kill
1598%\dent ZBEG \> from the Level 2 BLAS test program  \\
1599%\dent ZBEG \> from the Level 3 BLAS test program  \\
1600%\dent ZLADIV \> from the LAPACK library \\
1601%\dent ZLARND \> from the test matrix generator library \\
1602%\dent ZLATM2 \> from the test matrix generator library \\
1603%\dent ZLATM3 \> from the test matrix generator library
1604%\end{tabbing}
1605%The functions ZDOTC and ZDOTU from the Level 1 BLAS are already
1606%declared DOUBLE COMPLEX.  If that doesn't work, try the declaration
1607%COMPLEX FUNCTION*16.
1608
1609
1610\newpage
1611\addcontentsline{toc}{section}{Bibliography}
1612
1613\begin{thebibliography}{9}
1614
1615\bibitem{LUG}
1616E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra,
1617J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
1618S. Ostrouchov, and D. Sorensen,
1619\textit{LAPACK Users' Guide}, Second Edition,
1620{SIAM}, Philadelphia, PA, 1995.
1621
1622\bibitem{WN16}
1623E. Anderson and J. Dongarra,
1624\textit{LAPACK Working Note 16:
1625Results from the Initial Release of LAPACK},
1626University of Tennessee, CS-89-89, November 1989.
1627
1628\bibitem{WN41}
1629E. Anderson, J. Dongarra, and S. Ostrouchov,
1630\textit{LAPACK Working Note 41:
1631Installation Guide for LAPACK},
1632University of Tennessee, CS-92-151, February 1992 (revised June 1999).
1633
1634\bibitem{WN5}
1635C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,
1636S. Hammarling, and D. Sorensen,
1637\textit{LAPACK Working Note \#5:  Provisional Contents},
1638Argonne National Laboratory, ANL-88-38, September 1988.
1639
1640\bibitem{WN13}
1641Z. Bai, J. Demmel, and A. McKenney,
1642\textit{LAPACK Working Note \#13: On the Conditioning of the Nonsymmetric
1643Eigenvalue Problem:  Theory and Software},
1644University of Tennessee, CS-89-86, October 1989.
1645
1646\bibitem{XBLAS}
1647X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar,
1648W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung,
1649and D. J. Yoo, \textit{Design, implementation and testing of extended
1650  and mixed precision BLAS},
1651\textit{ACM Trans. Math. Soft.}, 28, 2:152--205, June 2002.
1652
1653\bibitem{BLAS3}
1654J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
1655``A Set of Level 3 Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 16, 1:1--17, March 1990.
1657%Argonne National Laboratory, ANL-MCS-P88-1, August 1988.
1658
1659\bibitem{BLAS3-test}
1660J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
1661``A Set of Level 3 Basic Linear Algebra Subprograms:
1662Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 16, 1:18--28, March 1990.
1664%Argonne National Laboratory, ANL-MCS-TM-119, June 1988.
1665
1666\bibitem{BLAS2}
1667J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
1668``An Extended Set of Fortran Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 14, 1:1--17, March 1988.
1670
1671\bibitem{BLAS2-test}
1672J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
1673``An Extended Set of Fortran Basic Linear Algebra Subprograms:
1674Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 14, 1:18--32, March 1988.
1676
1677\bibitem{BLAS1}
1678C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh,
1679``Basic Linear Algebra Subprograms for Fortran Usage,''
\textit{ACM Trans. Math. Soft.}, 5, 3:308--323, September 1979.
1681
1682\end{thebibliography}
1683
1684\end{document}
1685