1\documentclass[12pt]{article}
2\usepackage{hyperref}
3
4\topmargin -0.5in
5\textheight 9.0in
6\oddsidemargin 0pt
7\evensidemargin 0pt
8\textwidth 6.5in
9
10%-------------------------------------------------------------------------------
11% get epsf.tex file, for encapsulated postscript files:
12\input epsf
13%-------------------------------------------------------------------------------
14% macro for Postscript figures the easy way
15% usage:  \postscript{file.ps}{scale}
16% where scale is 1.0 for 100%, 0.5 for 50% reduction, etc.
17%
18\newcommand{\postscript}[2]
19{\setlength{\epsfxsize}{#2\hsize}
20\centerline{\epsfbox{#1}}}
21%-------------------------------------------------------------------------------
22
23\title{User's Guide for SuiteSparseQR, a multifrontal multithreaded sparse
24QR factorization package (with optional GPU acceleration)}
25\author{Timothy A. Davis\thanks{
26email: DrTimothyAldenDavis@gmail.com.
27http://www.suitesparse.com.
28Portions of this work were supported by the National
29Science Foundation, under grants 0203270, 0620286, and 0619080.},
30Sencer Nuri Yeralan, Sanjay Ranka, Wissam Sid-Lakhdar}
31
32\date{VERSION 2.0.9, Dec 28, 2018}
33
34%-------------------------------------------------------------------------------
35\begin{document}
36%-------------------------------------------------------------------------------
37\maketitle
38
39\begin{abstract}
40
41SuiteSparseQR is an implementation of the multifrontal sparse QR factorization
42method.  Parallelism is exploited both in the BLAS and across different frontal
43matrices using Intel's Threading Building Blocks, a shared-memory programming
44model for modern multicore architectures.  It can obtain a substantial fraction
45of the theoretical peak performance of a multicore computer.  The package is
46written in C++ with user interfaces for MATLAB, C, and C++.  Both real and
47complex sparse matrices are supported.
48
49\end{abstract}
50
51\maketitle
52
53%-------------------------------------------------------------------------------
54\section{Introduction}
55\label{intro}
56%-------------------------------------------------------------------------------
57
58The algorithms used in SuiteSparseQR are discussed in a companion paper,
59\cite{Davis08a}, and an overview of how to use the software is given in
60\cite{Davis08b}.  This document gives detailed information on the installation
61and use of SuiteSparseQR.
62
63%-------------------------------------------------------------------------------
64\section{Using SuiteSparseQR in MATLAB}
65%-------------------------------------------------------------------------------
66
67The simplest way to use SuiteSparseQR is via MATLAB.  Its syntax includes every
68feature of the MATLAB \verb'qr' in version R2009a and earlier
69\cite{GilbertMolerSchreiber}, plus additional features not available in MATLAB.
70It is also a replacement for \verb'x=A\b' for least-squares problems and
71underdetermined systems.  In addition to substantial gains in performance (10x
72to 100x is not uncommon, up to 10,000x has been observed), SuiteSparseQR adds
73new capabilities that are not present in MATLAB.  For example, it provides an
74efficient method for finding the minimum 2-norm solution to an underdetermined
75system.
76
77%-------------------------------------------------------------------------------
78\subsection{Installing SuiteSparseQR for use in MATLAB}
79%-------------------------------------------------------------------------------
80
81All packages in SuiteSparse, including SuiteSparseQR and the codes it relies on
82(AMD, COLAMD, CHOLMOD, METIS, CCAMD, and CCOLAMD) are compiled with a single
83command typed into the MATLAB Command Window.  SuiteSparseQR uses the LAPACK
84and BLAS libraries provided with MATLAB; you do not need to do anything to use
85these.  Below are step-by-step instructions for compiling all of SuiteSparse
86(including SuiteSparseQR), and optional instructions on using METIS and/or
87Intel's Threading Building Blocks (TBB).
88
89%-------------------------------------------------------------------------------
90\subsubsection{Required instructions for Windows}
91%-------------------------------------------------------------------------------
92
93For Windows, you cannot use the \verb'lcc' compiler that ships with MATLAB; it
94is not a C++ compiler.  To compile SuiteSparseQR, you must obtain a C++
95compiler; Microsoft Visual Studio C++ Express Edition will work fine.  Install
96this compiler from \newline
97\htmladdnormallink{http://www.microsoft.com/express/vc/}{http://www.microsoft.com/express/vc/}
98and then type \verb'mex -setup' in the MATLAB Command Window.
99
100%-------------------------------------------------------------------------------
101\subsubsection{Optional instructions on using METIS for any operating system}
102%-------------------------------------------------------------------------------
103
104SuiteSparse now relies on METIS 5.1.0, which is distributed along with
105SuiteSparse itself.  Its use is optional, however.  If you compile with
106{\tt -DNPARTITION}, or if you delete or move the {\tt SuiteSparse/metis-5.1.0}
107folder, then SuiteSparse will be compiled without it.
108
109METIS tends to give orderings that are good for the parallelism exploited by
110TBB, so its use with TBB is recommended.  Note however that METIS is not
111bug-free; it can occasionally cause segmentation faults, particularly if used
112when finding basic solutions to underdetermined systems with many more columns
113than rows (SuiteSparseQR does not use METIS, by default, for those systems).
114This (rare) faulty behavior has been confirmed with valid inputs to the METIS
115test programs themselves; it is not a bug in the SuiteSparse interface to
116METIS.  Use METIS at your own risk.
117
118%-------------------------------------------------------------------------------
119\subsubsection{Optional instructions for using TBB on Linux/Unix/Mac}
120%-------------------------------------------------------------------------------
121
122If you are using a Debian-based Linux system (such as Ubuntu), you're in luck!
123You can install TBB via the Synaptic Package Manager.  Just search for TBB,
124select it, and click Apply.  This will place the right files in \verb'/usr/lib'
125and it will create the \verb'/usr/include/tbb' directory.  It's by far the
126simplest way to install TBB.  If you do this, skip the rest of this section.
127
128Alternatively, obtain a copy of TBB from
129\htmladdnormallink{http://www.threadingbuildingblocks.org}{http://www.threadingbuildingblocks.org}
130as a \verb'tbb*.tgz' file appropriate for your version of Linux/Unix.  If you
131install via the \verb'tbb_*.tgz' file, make sure the \verb'libtbb*.so*' and
132\verb'libtbbmalloc*.so*' files are placed in the \verb'/usr/lib' directory.
133Make sure the \verb'tbb' directory containing all of the include files is
134placed in \verb'/usr/include' (for example, the
135\verb'/usr/include/tbb/task_scheduler_init.h' file must exist).
136
137If you do not have permission to install TBB properly into the \verb'/usr/lib'
138and \verb'/usr/include' directories, you can place it in your own directory and
139modify the \verb'tbb_path' variable in \verb'spqr_make.m' to specify
140the location of your own copy of TBB (refer to that file for instructions).
141You must also set your \verb'LD_LIBRARY_PATH' environment variable to include
142the directory containing the \verb'libtbb*so' files.  You need to first
143determine which subdirectory of the TBB distribution contains the libraries
144appropriate for your system; in the examples below this is just called
145\verb'/path'.  It must be an absolute path, starting with the \verb'/'
146character.  There must be no spaces in the path name.
147
148For Linux use this command at the system command line before starting MATLAB,
149where \verb'/path' should be replaced with the actual full path of the TBB
150\verb'lib' directory:
151
152\begin{verbatim}
153    setenv LD_LIBRARY_PATH /path:$LD_LIBRARY_PATH
154\end{verbatim}
155
156If you use the \verb'csh' shell, place the command in your \verb'~/.cshrc'
157file so you don't have to type it each time you start MATLAB.
158
159For the Mac, edit the \verb'/etc/profile' file and add this line to the end of
160the file:
161
162\begin{verbatim}
163    export DYLD_LIBRARY_PATH=/path:$DYLD_LIBRARY_PATH
164\end{verbatim}
165
166For example, using the 64-bit Linux version placed in my home directory, this
167command would be placed in my \verb'~/.cshrc' file, prior to starting MATLAB.
168
169{\scriptsize
170\begin{verbatim}
171    setenv LD_LIBRARY_PATH /home/davis/tbb21_009oss/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:$LD_LIBRARY_PATH
172\end{verbatim}
173}
174
175Then, before starting MATLAB, make sure this variable is set by typing this
176command at the system command line (you only have to do this for this session;
177whenever you start a new command shell this will be done automatically):
178
179\begin{verbatim}
180    source ~/.cshrc
181\end{verbatim}
182
183For this example, my \verb'spqr_make.m' file would contain this line:
184
185\begin{verbatim}
186    tbb_path = '/home/davis/tbb21_009oss' ;
187\end{verbatim}
188
189%-------------------------------------------------------------------------------
190\subsubsection{Optional instructions for using TBB on Windows}
191%-------------------------------------------------------------------------------
192
193Obtain a copy of TBB from
194\htmladdnormallink{http://www.threadingbuildingblocks.org}{http://www.threadingbuildingblocks.org}
195; for TBB Version 2.1 this file is \verb'tbb21_009oss_win.zip'.
196
197Create a folder and place the \verb'tbb21_009oss_win.zip' file there, and
198extract the file.  In the example below, I extracted to
199\verb'C:\TBB\tbb21_009oss' (if you do the same then you do not have to edit
200\verb'spqr_make.m').  There should be no spaces in the path (for example,
201placing TBB under the \verb'Program Files' directory will not work).
202
203In Windows XP, right-click \verb'My Computer' and select \verb'Properties'.
204Click the \verb'Advanced' tab.  Click \verb'Environment Variables'.  Under
205\verb'System' variables, edit the \verb'Path' to append the name of the folder
206containing the TBB \verb'bin' folder appropriate for your system, preceded by a
207semicolon.  For example, in a 32-bit Windows system if TBB is installed in
208\verb'C:\TBB' you would append the string
209\verb';C:\TBB\tbb21_009oss\ia32\vc9\bin\' to the end of your system \verb'Path'
210variable.
211
212For Vista, the instructions are the same, except that you choose
213\verb'Computer' instead of \verb'My Computer', and you click the
214\verb'Advanced System Settings' tab instead of the \verb'Advanced' tab.
215
216Next, edit the \verb'SPQR\MATLAB\spqr_make.m' file.  Change the \verb'tbb_path'
217variable to point to your copy of TBB.  For example, if you installed TBB into
218\verb'C:\TBB' your \verb'spqr_make.m' file would contain this line:
219
220\begin{verbatim}
221    tbb_path = 'C:\TBB\tbb21_009oss' ;
222\end{verbatim}
223
224That line is already in \verb'spqr_make.m', so if you choose to install TBB
225in that location, you do not have to edit the file.
226
227%-------------------------------------------------------------------------------
228\subsubsection{Now you're ready to compile (on any operating system)}
229%-------------------------------------------------------------------------------
230
231Type these commands in the MATLAB window:
232\begin{verbatim}
233    cd SuiteSparse
234    SuiteSparse_install
235\end{verbatim}
236You will be asked if you want to run some demos.  I recommend that you do this
237to ensure your functions have been installed correctly.  Next type the command
238\begin{verbatim}
239    pathtool
240\end{verbatim}
241and examine your MATLAB path.  The various SuiteSparse directories have been
242placed in your path.  Click ``save'' to save this path for future MATLAB
243sessions.  If this fails, you do not have permission to modify the
244\verb'pathdef.m' file (it is shared by all users).  An alternative is to type
245the command:
246\begin{verbatim}
247    path
248\end{verbatim}
249and cut-and-paste the paths displayed there into your own \verb'startup.m'
250file, prepending the command \verb'addpath' to each line.  For example, if I
251installed SuiteSparse into my home directory (\verb'/home/davis') then my
252\verb'startup.m' file should look like this:
253
254{\footnotesize
255\begin{verbatim}
256    addpath  /home/davis/SuiteSparse/SPQR/MATLAB
257    addpath  /home/davis/SuiteSparse/RBio
258    addpath  /home/davis/SuiteSparse/MATLAB_Tools/spok
259    addpath  /home/davis/SuiteSparse/MATLAB_Tools/waitmex
260    addpath  /home/davis/SuiteSparse/MATLAB_Tools/shellgui
261    addpath  /home/davis/SuiteSparse/MATLAB_Tools/GEE
262    addpath  /home/davis/SuiteSparse/LINFACTOR
263    addpath  /home/davis/SuiteSparse/MESHND
264    addpath  /home/davis/SuiteSparse/UFcollection
265    addpath  /home/davis/SuiteSparse/SSMULT
266    addpath  /home/davis/SuiteSparse/KLU/MATLAB
267    addpath  /home/davis/SuiteSparse/BTF/MATLAB
268    addpath  /home/davis/SuiteSparse/LDL/MATLAB
269    addpath  /home/davis/SuiteSparse/CXSparse/MATLAB/UFget
270    addpath  /home/davis/SuiteSparse/CXSparse/MATLAB/Demo
271    addpath  /home/davis/SuiteSparse/CXSparse/MATLAB/CSparse
272    addpath  /home/davis/SuiteSparse/CAMD/MATLAB
273    addpath  /home/davis/SuiteSparse/CCOLAMD/MATLAB
274    addpath  /home/davis/SuiteSparse/COLAMD/MATLAB
275    addpath  /home/davis/SuiteSparse/AMD/MATLAB
276    addpath  /home/davis/SuiteSparse/CHOLMOD/MATLAB
277    addpath  /home/davis/SuiteSparse/UMFPACK/MATLAB
278    addpath  /home/davis/SuiteSparse
279    addpath  /home/davis/SuiteSparse/MATLAB_Tools
280\end{verbatim}
281}
282
283On a Windows system, I might see paths like this instead in my \verb'startup.m'
284file:
285
286{\footnotesize
287\begin{verbatim}
288    addpath C:\Documents and Settings\davis\My Documents\SuiteSparse\SPQR\MATLAB
289    ...
290\end{verbatim}
291}
292
293Your \verb'startup.m' file should appear in the directory in which MATLAB
294starts.  Failing that, every time you start MATLAB, find your \verb'startup.m'
295file and run it.  For more help, type \verb'doc startup' in MATLAB.
296
297The \verb'SuiteSparse_install' script works on any version of MATLAB
298(Linux/Unix, Mac, or Windows) if you have a C++ compiler.  The install script
299will detect if you have placed the METIS directory in the right place, and will
300compile it for use with SuiteSparseQR if it finds it there.  Otherwise METIS
301will be skipped (the install script will tell you if it finds METIS or not).
302
303%-------------------------------------------------------------------------------
304\subsubsection{Optional instructions for using TBB on any system}
305%-------------------------------------------------------------------------------
306
307If you have followed the steps above (as I recommend that you do), you have
308just compiled SuiteSparseQR, but it will not yet be using TBB.  To use TBB with
309SuiteSparseQR, first install TBB as described above.  Once TBB is installed,
310type the following commands in the MATLAB Command Window (assuming you have
311METIS):
312
313\begin{verbatim}
314    cd SuiteSparse/SPQR/MATLAB
315    spqr_make metis tbb
316\end{verbatim}
317
318Or if you do not have METIS, do this instead:
319
320\begin{verbatim}
321    cd SuiteSparse/SPQR/MATLAB
322    spqr_make nometis tbb
323\end{verbatim}
324
325For more options, type \verb'help spqr_make'.
326
327%-------------------------------------------------------------------------------
328\subsection{Functions provided to the MATLAB user}
329%-------------------------------------------------------------------------------
330
331Three primary functions are available:
332
333\begin{enumerate}
334
335\item \verb'spqr', a replacement for the MATLAB \verb'qr'
336
337\item \verb'spqr_solve', a replacement for \verb'x=A\b' when \verb'A' is sparse
338and rectangular.  It works for the square case, too, but \verb'x=A\b' will be
339faster (using LU or Cholesky factorization). \verb'spqr_solve' is a good method
340for ill-conditioned or rank-deficient square matrices, however.
341
342\item \verb'spqr_qmult', which multiplies \verb'Q' (stored in Householder
343vector form) times a matrix \verb'x'.
344
345\end{enumerate}
346
347Their syntax is described below in the table below.
348The permutation \verb'P' is chosen to reduce fill-in and to return \verb'R' in
349upper trapezoidal form if \verb'A' is estimated to have less than full rank.
350The \verb'opts' parameter provides non-default options (refer to the next
351section).  The output \verb'Q' can be optionally returned in Householder form,
352which is far sparser than returning \verb'Q' as a sparse matrix.
353
354\vspace{0.1in}
355
356{\footnotesize
357\begin{tabular}{|ll|}
358\hline
359\verb'R = spqr (A)'             & Q-less QR factorization \\
360\verb'R = spqr (A,0)'           & economy variant (\verb'size(R,1) = min(m,n)') \\
361\verb'R = spqr (A,opts)'        & as above, with non-default options \\
362\hline
363\verb'[Q,R] = spqr (A)'         & \verb'A=Q*R' factorization \\
364\verb'[Q,R] = spqr (A,0)'       & economy variant (\verb'size(Q,2) = size(R,1) = min(m,n)') \\
365\verb'[Q,R] = spqr (A,opts)'    & \verb'A=Q*R', with non-default options \\
366\hline
367\verb'[Q,R,P] = spqr (A)'       & \verb'A*P=Q*R' where P reduces fill-in \\
368\verb'[Q,R,P] = spqr (A,0)'     & economy variant (\verb'size(Q,2) = size(R,1) = min(m,n)') \\
369\verb'[Q,R,P] = spqr (A,opts)'  & as above, with non-default options \\
370\hline
371\verb'[C,R] = spqr (A,B)'       & as \verb'R=spqr(A)', also returns \verb"C=Q'*B" \\
372\verb'[C,R] = spqr (A,B,0)'     & economy variant (\verb'size(C,1) = size(R,1) = min(m,n)') \\
373\verb'[C,R] = spqr (A,B,opts)'  & as above, with non-default options \\
374\hline
375\verb'[C,R,P] = spqr (A,B)'     & as \verb'R=spqr(A*P)', also returns \verb"C=Q'*B" \\
376\verb'[C,R,P] = spqr (A,B,0)'   & economy variant (\verb'size(C,1) = size(R,1) = min(m,n)') \\
377\verb'[C,R,P] = spqr (A,B,opts)'& as above, with non-default options \\
378\hline
379\verb'x = spqr_solve (A,B)'     & \verb'x=A\B' \\
380\verb'[x,info] = spqr_solve (A,B,opts)' & as above, with statistics and non-default parameters \\
381\hline
382\verb'Y = spqr_qmult (Q,X,k)'   & computes \verb"Q'*X", \verb"Q*X", \verb"X*Q'", or \verb"X*Q"
383(selected with \verb'k') \\
384\hline
385\end{tabular}
386}
387
388%-------------------------------------------------------------------------------
389\subsection{The {\tt opts} parameter}
390%-------------------------------------------------------------------------------
391
392The \verb'opts' struct provides control over non-default parameters for
393SuiteSparseQR.  Entries not present in \verb'opts' are set to their defaults.
394
395\begin{itemize}
396
397    \item \verb'opts.tol':   columns that have 2-norm \verb'<= opts.tol' are
398    treated as zero. The default is \verb"20*(m+n)*eps*sqrt(max(diag(A'*A)))"
399    where \verb'[m n]=size(A)'.
400
401    \item \verb'opts.econ':  number of rows of \verb'R' and columns of \verb'Q'
402    to return.  The default is \verb'm'.  Using \verb'n' gives the standard
403    economy form (as in the MATLAB \verb'qr(A,0)').  A value less than the
404    estimated rank \verb'r' is set to \verb'r', so \verb'opts.econ=0' gives
405    the ``rank-sized'' factorization, where \verb'size(R,1)==nnz(diag(R))==r'.
406
407    \item \verb'opts.ordering': a string describing which column ordering
408    method to use.  Let \newline
409    \verb'[m2 n2]=size(S)' where \verb'S' is obtained by
410    removing singletons from \verb'A'.  The singleton permutation places
411    \verb'A*P' in the form \verb'[A11 A12 ; 0 S]' where \verb'A11' is upper
412    triangular with diagonal entries all greater than \verb'opts.tol'.
413
414    The default is to use COLAMD if \verb'm2<=2*n2'; otherwise try AMD.  Let
415    \verb'f' be the flops for \verb"chol((S*P)'*(S*P))" with the ordering
416    \verb'P' found by AMD.  Then if \verb'f/nnz(R) >= 500' and
417    \verb'nnz(R)/nnz(S) >= 5' then try METIS, and take the best ordering found
418    (AMD or METIS); otherwise use AMD without trying METIS.  If METIS is not
419    installed then the default ordering is to use COLAMD if \verb'm2<=2*n2' and
420    to use AMD otherwise.
421
422    The available orderings are:
423
424       {\tt 'default'}: the default ordering.
425
426       {\tt 'amd'}: use \verb"amd(S'*S)".
427
428       {\tt 'colamd'}: use \verb"colamd(S)".
429
430       {\tt 'metis'}: use \verb"metis(S'*S)", only if METIS is
431       installed.
432
433       {\tt 'best'}: try all three (AMD, COLAMD, METIS) and take the
434       best.
435
436       {\tt 'bestamd'}: try AMD and COLAMD and take the best.
437
438       {\tt 'fixed'}: use \verb"P=I"; this is the only option if
439       \verb"P" is not present in the output.
440
441       {\tt 'natural'}: singleton removal only.
442
443    \item \verb'opts.Q': a string describing how \verb'Q' is to be returned.
444    The default is \verb"'discard'" if \verb'Q' is not present in the output,
445    or \verb"'matrix'" otherwise.  If \verb'Q' is present and \verb'opts.Q' is
446    \verb"'discard'", then \verb"Q=[]" is returned (thus \verb"R=spqr(A*P)" is
447    \verb"[Q,R,P]=spqr(A)" where \verb"spqr" finds \verb"P" but \verb"Q" is
448    discarded instead). The usage \verb"opts.Q='matrix'" returns \verb'Q' as a
449    sparse matrix where \verb'A=Q*R' or \verb"A*P=Q*R".  Using
450    \verb"opts.Q='Householder'" returns \verb'Q' as a struct containing the
451    Householder reflections applied to \verb'A' to obtain \verb'R', resulting
452    in a far sparser \verb'Q' than the \verb"'matrix'" option.
453
454    \item \verb'opts.permutation': a string describing how \verb'P' is to be
455    returned.  The default is \verb"'matrix'", so that \verb"A*P=Q*R".  Using
456    \verb"'vector'" gives \verb"A(:,P)=Q*R" instead.
457
458    \item \verb'opts.spumoni': an integer \verb'k' that
459        acts just like \verb"spparms('spumoni',k)".
460
461    \item \verb'opts.min2norm': used by \verb'spqr_solve'; you can use
462    \verb"'basic'" (the default), or \verb"'min2norm'".  Determines the kind of
463    solution that \verb'spqr_solve' computes for underdetermined systems.  Has
464    no effect for least-squares problems; ignored by \verb'spqr' itself.
465
466    \item \verb'opts.grain', \verb'opts.small', \verb'opts.nthreads':
467    multitasking control (if compiled with TBB).  Let \verb'f' be the total
468    flop count.  The analysis phase tries to ensure that all parallel tasks
469    have at least \verb'max(total_flops/opts.grain,opts.small)' flops.  No TBB
470    parallelism is exploited if \verb'opts.grain <= 1'.  The parameter
471    \verb'opts.nthreads' gives the number of threads to use for TBB (which is
472    different than the number of threads used by the BLAS).  Setting
473    \verb'opts.nthreads <= 0' means to let TBB determine the number of threads
474    (normally equal to the number of cores); otherwise, exactly
475    \verb'opts.nthreads' threads are used.  The defaults are
476    \verb'opts.grain=1', \verb'opts.small=1e6', and \verb'opts.nthreads=0',
477    respectively.  That is, TBB is disabled by default since it conflicts with
478    BLAS multithreading.  If you enable TBB, be sure to disable BLAS
479    multithreading with the MATLAB command \verb'maxNumCompThreads(1)', or
480    choose \newline \verb'opts.nthreads=k' and \verb'maxNumCompThreads(b)' so
481    that the product \verb'k*b' is equal to the number of cores.  Note that
482    these recommendations may change for future versions of TBB.  A good value
483    of \verb'opts.grain' is twice that of \verb'opts.nthreads'.  If TBB
484    parallelism is enabled, the METIS ordering normally gives the best speedup
485    for large problems.
486
487\end{itemize}
488
489%-------------------------------------------------------------------------------
490\subsection{Examples on how to use the MATLAB interface}
491%-------------------------------------------------------------------------------
492
493To solve a least-squares problem, or to find the basic solution to an
494underdetermined system, just use \verb'x = spqr_solve(A,b)' in place of
495\verb'x=A\b'.  To compute the QR factorization, use \verb'[Q,R]=spqr(A)'
496instead of \verb'[Q,R]=qr(A)'.  Better results can be obtained by discarding
497\verb'Q' with the usage \verb'R=spqr(A)' (in place of \verb'R=qr(A)'), or by
498requesting \verb'Q' in Householder form with \verb'[Q,R]=spqr(A,opts)' where
499\verb"opts.Q='Householder'".  The latter option is not available in MATLAB.  To
500use a fill-reducing ordering, simply use any of the syntaxes above with
501\verb'P' as an output parameter.
502
503The least-squares solution of an overdetermined system \verb'A*x=b' with
504\verb'm>n' (where \verb'A' has rank \verb'n') can be found in one of at least
505seven ways (in increasing order of efficiency, in time and memory):
506
507{\footnotesize
508\begin{tabular}{|l|l|}
509    \hline
510    \verb"x = pinv(full(A)) * b ;"
511        & impossible for large \verb'A' \\
512    \hline
513    \verb"[Q,R] = spqr (A) ;"
514        & high fill-in in \verb'R', \\
515    \verb"x = R\(Q'*b) ;"
516        & \verb'Q' costly in matrix form \\
517    \hline
518    \verb"[Q,R,P] = spqr (A) ;"
519        & low fill-in in \verb'R', \\
520    \verb"x = P*(R\(Q'*b)) ;"
521        & \verb'Q' costly in matrix form \\
522    \hline
523    \verb"[Q,R,P] = spqr (A,struct('Q','Householder')) ;"
524        & low fill-in in \verb'R', \\
525    \verb"x = P*(R\spqr_qmult (Q,b,0)) ;"
526        & \verb'Q' in efficient Householder form \\
527    \hline
528    \verb"[c,R,P] = spqr (A,b) ;"
529        & \verb'Q' not kept, \\
530    \verb"x = P*(R\c) ;"
531        & \verb'P' a permutation matrix \\
532    \hline
533    \verb"[c,R,p] = spqr (A,b,0) ;"
534        & \verb'Q' not kept, \\
535    \verb"y = (R\c) ; x(p) = y"
536        & \verb'p' a permutation vector \\
537    \hline
538    \verb"x = spqr_solve (A,b) ;"
539        & less memory and better handling \\
540        & of rank-deficient matrices \\
541    \hline
542\end{tabular}
543}
544
545\vspace{0.1in}
546
547The minimum-norm solution of an underdetermined system \verb'A*x=b' with
548\verb'm<n' can be found in one of five ways (in increasing order of
549efficiency, in time and memory):
550
551\vspace{0.1in}
552
553{\footnotesize
554\begin{tabular}{|l|l|}
555    \hline
556    \verb"x = pinv(full(A)) * b ;"
557        & impossible for large \verb'A' \\
558    \hline
559    \verb"[Q,R] = spqr (A') ;"
560        & high fill-in in \verb'R', \\
561    \verb"x = Q*(R'\b) ;"
562        & \verb'Q' costly in matrix form \\
563    \hline
564    \verb"[Q,R,P] = spqr (A') ;"
565        & low fill-in in \verb'R', \\
566    \verb"x = Q*(R'\(P'*b)) ;"
567        & \verb'Q' costly in matrix form \\
568    \hline
569    \verb"[Q,R,P] = spqr (A',struct('Q','Householder')) ;"
570        & low fill-in in \verb'R', \\
571    \verb"x = spqr_qmult (Q,R'\(P'*b),1) ;"
572        & \verb'Q' in efficient Householder form \\
573    \hline
574    \verb"opts.solution = 'min2norm' ;"
575        & as 4th option above, but faster, \\
576    \verb"x = spqr_solve (A,b,opts) ;"
577        & less memory, and better handling \\
578        & of rank-deficient matrices \\
579    \hline
580\end{tabular}
581}
582
583Note that \verb'spqr_solve' uses a fill-reducing ordering, by default.
584It can be disabled or modified using a non-default \verb'opts' parameter
585(\verb'opts.ordering', specifically).
586
587%-------------------------------------------------------------------------------
588\section{Using SuiteSparseQR in C and C++}
589%-------------------------------------------------------------------------------
590
591SuiteSparseQR relies on CHOLMOD for its basic sparse matrix data structure, a
592compressed sparse column format.  CHOLMOD provides interfaces to the AMD,
593COLAMD, and METIS ordering methods, supernodal symbolic Cholesky factorization
594(namely, \verb'symbfact' in MATLAB), functions for converting between different
595data structures, and for basic operations such as transpose, matrix multiply,
596reading a matrix from a file, writing a matrix to a file, and many other
597functions.
598
599%-------------------------------------------------------------------------------
600\subsection{Installing the C/C++ library on Linux/Unix}
601%-------------------------------------------------------------------------------
602
603Before you compile the SuiteSparseQR library and demo programs, you may wish to
604edit the \verb'SuiteSparse/SuiteSparse_config/SuiteSparse_config.mk' configuration file.  The
605defaults should be fine on most Linux/Unix systems and on the Mac.
606It automatically detects what system you have and sets compile parameters
607accordingly.
608
609Next, type \verb'make' at
610the Linux/Unix command line, in either the \verb'SuiteSparse' directory (which
611compiles all of SuiteSparse) or in the \verb'SuiteSparse/SPQR' directory (which
612just compiles SuiteSparseQR and the libraries it requires).  SuiteSparseQR will
613be compiled, and a set of simple demos will be run (including the one in the
614next section).
615
616The configuration file defines where the LAPACK and BLAS libraries are to be
617found.  Selecting the right BLAS is critical.  There is no standard naming
618scheme for the name and location of these libraries.  The defaults in the
619\verb'SuiteSparse_config.mk' file use \verb'-llapack' and \verb'-lblas'; the latter may
620link against the standard Fortran reference BLAS, which will not provide
621optimal performance.  For best results, you should use a
622high-performance vendor-supplied BLAS such as the
623Intel MKL.  Selection of LAPACK and
624the BLAS is done with the \verb'LAPACK=' and \verb'BLAS=' lines in the
625\verb'SuiteSparse_config.mk' file.
626
627Four compile-time options can be used to modify how SuiteSparseQR is compiled.
628Select these via the \verb'SPQR_CONFIG=' line in the \verb'SuiteSparse_config.mk' file.
629
630\begin{itemize}
631
632    \item \verb'-DNPARTITION': do not compile with METIS, CAMD, or CCOLAMD.
633    These packages are included by default.
634
635    \item \verb'-DNEXPERT': do not compile with the ``expert'' routines in
636    \verb'SuiteSparseQR_expert.cpp'.  The expert routines are included by
637    default.
638
639    \item \verb'-DHAVE_TBB': enable the Intel Threading Building Blocks, TBB.
640    The use of TBB is disabled by default.  It is disabled because not all
641    installations have TBB available.  The use of TBB is recommended, however,
642    if you have a multicore computer.
643
644\end{itemize}
645
646To fully test 100\% of the lines of SuiteSparseQR, go to the \verb'Tcov'
647directory and type \verb'make'.  This will work for Linux only.
648
649To install the shared library
650into /usr/local/lib and /usr/local/include, do {\tt make install}.
651To uninstall, do {\tt make uninstall}.
652For more options, see the {\tt SuiteSparse/README.txt} file.
653
654%-------------------------------------------------------------------------------
655\subsection{C/C++ Example}
656%-------------------------------------------------------------------------------
657
658The C++ interface is written using templates for handling both real and complex
659matrices.  The simplest function computes the MATLAB equivalent of
660\verb'x=A\b' and is almost as simple:
661{\footnotesize
662\begin{verbatim}
663    #include "SuiteSparseQR.hpp"
664    X = SuiteSparseQR <double> (A, B, cc) ;
665\end{verbatim}
666}
667The C version of this function is almost identical:
668{\footnotesize
669\begin{verbatim}
670    #include "SuiteSparseQR_C.h"
671    X = SuiteSparseQR_C_backslash_default (A, B, cc) ;
672\end{verbatim}
673}
674
675Below is a simple C++ program that illustrates the use of SuiteSparseQR.  The
676program reads in a least-squares problem from \verb'stdin' in MatrixMarket
677format \cite{BoisvertPozoRemingtonBarrettDongarra97}, solves it, and prints the
678norm of the residual and the estimated rank of \verb'A'.  The comments reflect
679the MATLAB equivalent statements.  The C version of this program is identical
680except for the \verb'#include' statement and call to SuiteSparseQR which are
681replaced with the C version of the statement above, and C-style comments.
682
683{\footnotesize
684\begin{verbatim}
685    #include "SuiteSparseQR.hpp"
686    int main (int argc, char **argv)
687    {
688        cholmod_common Common, *cc ;
689        cholmod_sparse *A ;
690        cholmod_dense *X, *B, *Residual ;
691        double rnorm, one [2] = {1,0}, minusone [2] = {-1,0} ;
692        int mtype ;
693
694        // start CHOLMOD
695        cc = &Common ;
696        cholmod_l_start (cc) ;
697
698        // load A
699        A = (cholmod_sparse *) cholmod_l_read_matrix (stdin, 1, &mtype, cc) ;
700
701        // B = ones (size (A,1),1)
702        B = cholmod_l_ones (A->nrow, 1, A->xtype, cc) ;
703
704        // X = A\B
705        X = SuiteSparseQR <double> (A, B, cc) ;
706
707        // rnorm = norm (B-A*X)
708        Residual = cholmod_l_copy_dense (B, cc) ;
709        cholmod_l_sdmult (A, 0, minusone, one, X, Residual, cc) ;
710        rnorm = cholmod_l_norm_dense (Residual, 2, cc) ;
711        printf ("2-norm of residual: %8.1e\n", rnorm) ;
712        printf ("rank %ld\n", cc->SPQR_istat [4]) ;
713
714        // free everything and finish CHOLMOD
715        cholmod_l_free_dense (&Residual, cc) ;
716        cholmod_l_free_sparse (&A, cc) ;
717        cholmod_l_free_dense (&X, cc) ;
718        cholmod_l_free_dense (&B, cc) ;
719        cholmod_l_finish (cc) ;
720        return (0) ;
721    }
722\end{verbatim}
723}
724
725%-------------------------------------------------------------------------------
726\subsection{C++ Syntax}
727%-------------------------------------------------------------------------------
728
729All features available to the MATLAB user are also available to both the C and
730C++ interfaces using a syntax that is not much more complicated than the MATLAB
731syntax.  Additional features not available via the MATLAB interface include the
732ability to compute the symbolic and numeric factorizations separately (for
733multiple matrices with the same nonzero pattern but different numerical
734values).  The following is a list of user-callable C++ functions and what they
735can do:
736
737\begin{enumerate}
738
739    \item \verb'SuiteSparseQR': an overloaded function that provides functions
740    equivalent to \verb'spqr' and \verb'spqr_solve' in the SuiteSparseQR MATLAB
741    interface.
742
743    \item \verb'SuiteSparseQR_factorize': performs both the symbolic and
744    numeric factorizations and returns a QR factorization object such that
745    \verb'A*P=Q*R'.  It always exploits singletons.
746
747    \item \verb'SuiteSparseQR_symbolic': performs the symbolic factorization
748    and returns a QR factorization object to be passed to
749    \verb'SuiteSparseQR_numeric'.  It does not exploit singletons.
750
751    \item \verb'SuiteSparseQR_numeric': performs the numeric factorization on a
752    QR factorization object, either one constructed by
753    \verb'SuiteSparseQR_symbolic', or reusing one from a prior call to
754    \verb'SuiteSparseQR_numeric' for a matrix \verb'A' with the same pattern as
755    the first one, but with different numerical values.
756
757    \item \verb'SuiteSparseQR_solve': solves a linear system using the object
758    returned by \newline \verb'SuiteSparseQR_factorize' or
759    \verb'SuiteSparseQR_numeric', namely \verb"x=R\b", \newline \verb"x=P*R\b",
760    \verb"x=R'\b", or \verb"x=R'\(P'*b)".
761
762    \item \verb'SuiteSparseQR_qmult': provides the same function as
763    \verb'spqr_qmult' in the MATLAB interface, computing
764    \verb"Q'*x", \verb"Q*x", \verb"x*Q'", or \verb"x*Q".
765    It uses either a QR factorization
766    in MATLAB-style sparse matrix format, or the QR factorization object
767    returned by \newline \verb'SuiteSparseQR_factorize' or
768    \verb'SuiteSparseQR_numeric'.
769
770    \item \verb'SuiteSparseQR_min2norm': finds the minimum 2-norm solution to
771    an underdetermined linear system.
772
773    \item \verb'SuiteSparseQR_free': frees the QR factorization object.
774
775\end{enumerate}
776
777%-------------------------------------------------------------------------------
778\subsection{Details of the C/C++ Syntax}
779%-------------------------------------------------------------------------------
780
781For further details of how to use the C/C++ syntax, please refer to the
782definitions and descriptions in the following files:
783
784\begin{enumerate}
785\item \verb'SuiteSparse/SPQR/Include/SuiteSparseQR.hpp' describes each
786C++ function.  Both \verb'double' and \verb'std::complex<double>' matrices
787are supported.
788
789\item \verb'SuiteSparse/SPQR/Include/SuiteSparseQR_definitions.h' describes
790definitions \newline common to both C and C++ functions.  For example, each of
791the ordering methods is given a \verb'#define''d name.  The default is
792\verb'ordering = SPQR_ORDERING_DEFAULT', and the default tolerance is given by
793\verb'tol = SPQR_DEFAULT_TOL'.
794
795\item \verb'SuiteSparse/SPQR/Include/SuiteSparseQR_C.h' describes
796the C-callable functions.
797
798\end{enumerate}
799
800Most of the packages in SuiteSparse come in multiple versions with different
801sized integers.  The first is the plain C/C++ \verb'int'.  The second the
802\verb'SuiteSparse_long' integer, defined in the
803\verb'SuiteSparse/SuiteSparse_config/SuiteSparse_config.h'
804file.  This integer is \verb'long' except on a Windows-64 platform for which it
805is the  \verb'__int64' type.  The intent of \verb'SuiteSparse_long'
806is that it should be
80732-bits on a 32-bit platform, and 64-bits on a 64-bit platform.
808
809By contrast, SuiteSparseQR only provides a \verb'SuiteSparse_long' version.
810Most users (except Windows-64) can simply use \verb'long' as the basic integer
811type passed to and returned from SuiteSparseQR.
812
813The C/C++ options corresponding to the MATLAB \verb'opts' parameters and the
814contents of the optional \verb'info' output of \verb'spqr_solve' are described
815below.  Let \verb'cc' be the CHOLMOD \verb'Common' object, containing parameter
816settings and statistics.  All are of type \verb'double', except for
817\verb'SPQR_istat' which is \verb'SuiteSparse_long',
818\verb'cc->memory_usage' which is
819\verb'size_t', and \verb'cc->SPQR_nthreads' which is \verb'int'.  Parameters
820include:
821
822\vspace{0.1in}
823{\footnotesize
824\begin{tabular}{|ll|}
825\hline
826\verb'cc->SPQR_grain' & the same as \verb'opts.grain' in the MATLAB interface \\
827\verb'cc->SPQR_small' & the same as \verb'opts.small' in the MATLAB interface \\
828\verb'cc->SPQR_nthreads'
829    & the same as \verb'opts.nthreads' in the MATLAB interface \\
830\hline
831\end{tabular}
832}
833\vspace{0.1in}
834
835Other parameters, such as \verb'opts.ordering' and \verb'opts.tol',
836are input parameters to the various C/C++ functions.  Others such as
837\verb"opts.solution='min2norm'" are separate functions in the C/C++
838interface.  Refer to the files listed above for details.
839Output statistics include:
840
841\vspace{0.1in}
842{\footnotesize
843\begin{tabular}{|ll|}
844\hline
845\verb'cc->SPQR_flopcount_bound' & an upper bound on the flop count \\
846\verb'cc->SPQR_tol_used' & the tolerance used (\verb'opts.tol') \\
847\hline
848\verb'cc->SPQR_istat [0]' & upper bound on \verb'nnz(R)' \\
849\verb'cc->SPQR_istat [1]' & upper bound on \verb'nnz(H)' \\
850\verb'cc->SPQR_istat [2]' & number of frontal matrices \\
851\verb'cc->SPQR_istat [3]' & number of TBB tasks \\
852\verb'cc->SPQR_istat [4]' & estimate of the rank of \verb'A' \\
853\verb'cc->SPQR_istat [5]' & number of column singletons \\
854\verb'cc->SPQR_istat [6]' & number of row singletons \\
855\verb'cc->SPQR_istat [7]' & ordering used \\
856\hline
857\verb'cc->memory_usage'   & memory used, in bytes \\
858\hline
859\end{tabular}
860}
861\vspace{0.1in}
862
863The upper bound on the flop count is found in the analysis phase, which ignores
864the numerical values of \verb'A' (the same analysis phase operates on both real
865and complex matrices).  Thus, if you are factorizing a complex matrix, multiply
866this statistic by 4.
867
868%-------------------------------------------------------------------------------
869\section{GPU acceleration}
870\label{GPU}
871%-------------------------------------------------------------------------------
872
873As of version 2.0.0, SuiteSparseQR now includes GPU acceleration.
874It can exploit a single NVIDIA GPU, via CUDA.  To enable GPU acceleration,
875you must compile SuiteSparseQR with non-default options.  See the
876\verb'SuiteSparse_config_GPU_gcc.mk' file in the \verb'SuiteSparse_config'
877directory for details.  The packages SuiteSparse\_GPURuntime and
878GPUQREngine are also required (they should appear in the SuiteSparse
879directory, along with SPQR).
880
881At run time, you must also enable the GPU by setting \verb'Common->useGPU'
882to \verb'true'.  Before calling any SuiteSparseQR function, you must
883poll the GPU to set the available memory.  Below is a sample code
884that initializes CHOLMOD and then polls the GPU for use in SuiteSparseQR.
885
886\begin{verbatim}
887    size_t total_mem, available_mem ;
888    cholmod_common *cc, Common ;
889    cc = &Common ;
890    cholmod_l_start (cc) ;
891    cc->useGPU = true ;
892    cholmod_l_gpu_memorysize (&total_mem, &available_mem, cc) ;
893    cc->gpuMemorySize = available_mem ;
894    if (cc->gpuMemorySize <= 1)
895    {
896        printf ("no GPU available\n") ;
897    }
898
899    // Subsequent calls to SuiteSparseQR will use the GPU, if available
900\end{verbatim}
901
902See \verb'Demo/qrdemo_gpu.cpp' for an extended example, which can be
903compiled via \verb'make gpu' in the \verb'Demo' directory.
904
905GPU acceleration is not yet available via the MATLAB mexFunction interface.
906We expect to include this in a future release.
907
908For a detailed technical report on the GPU-accelerated algorithm,
909see \verb'qrgpu_paper.pdf' in the \verb'Doc' directory.
910
911%-------------------------------------------------------------------------------
912\section{Requirements and Availability}
913\label{summary}
914%-------------------------------------------------------------------------------
915
916SuiteSparseQR requires four prior Collected Algorithms of the ACM: CHOLMOD
917\cite{ChenDavisHagerRajamanickam09,DavisHager09} (version 1.7 or later), AMD
918\cite{AmestoyDavisDuff96,AmestoyDavisDuff03}, and COLAMD
919\cite{DavisGilbertLarimoreNg00_algo,DavisGilbertLarimoreNg00} for its
920ordering/analysis phase and for its basic sparse matrix data structure, and the
921BLAS \cite{dddh:90} for dense matrix computations on its frontal matrices; also
922required is LAPACK \cite{LAPACK} for its Householder reflections.  An efficient
923implementation of the BLAS is strongly recommended, either vendor-provided
924(such as the Intel MKL, the AMD ACML, or the Sun Performance Library) or other
925high-performance BLAS such as those of \cite{GotoVanDeGeijn08}.
926
927The use of Intel's Threading Building Blocks is optional \cite{Reinders07}, but
928without it, only parallelism within the BLAS can be exploited (if available).
929Suite\-SparseQR can optionally use METIS 4.0.1 \cite{KarypisKumar98e} and two
930constrained minimum degree ordering algorithms, CCOLAMD and CAMD
931\cite{ChenDavisHagerRajamanickam09}, for its fill-reducing ordering options.
932SuiteSparseQR can be compiled without these ordering methods and without TBB.
933
934In addition to appearing as Collected Algorithm 8xx of the ACM, SuiteSparseQR
935is available at
936\htmladdnormallink{http://www.suitesparse.com}{http://www.suitesparse.com}
937and at MATLAB Central
938in the user-contributed File Exchange (
939\htmladdnormallink{http://www.mathworks.com/matlabcentral}{http://www.mathworks.com/matlabcentral}
940).
941See SPQR/Doc/License.txt for the license.
942Alternative licenses are also
943available; contact the author for details.
944
945%-------------------------------------------------------------------------------
946% References
947%-------------------------------------------------------------------------------
948
949\bibliographystyle{plain}
950\bibliography{spqr_user_guide}
951\end{document}
952
953