1\documentclass[12pt]{article} 2\usepackage{hyperref} 3 4\topmargin -0.5in 5\textheight 9.0in 6\oddsidemargin 0pt 7\evensidemargin 0pt 8\textwidth 6.5in 9 10%------------------------------------------------------------------------------- 11% get epsf.tex file, for encapsulated postscript files: 12\input epsf 13%------------------------------------------------------------------------------- 14% macro for Postscript figures the easy way 15% usage: \postscript{file.ps}{scale} 16% where scale is 1.0 for 100%, 0.5 for 50% reduction, etc. 17% 18\newcommand{\postscript}[2] 19{\setlength{\epsfxsize}{#2\hsize} 20\centerline{\epsfbox{#1}}} 21%------------------------------------------------------------------------------- 22 23\title{User's Guide for SuiteSparseQR, a multifrontal multithreaded sparse 24QR factorization package (with optional GPU acceleration)} 25\author{Timothy A. Davis\thanks{ 26email: DrTimothyAldenDavis@gmail.com. 27http://www.suitesparse.com. 28Portions of this work were supported by the National 29Science Foundation, under grants 0203270, 0620286, and 0619080.}, 30Sencer Nuri Yeralan, Sanjay Ranka, Wissam Sid-Lakhdar} 31 32\date{VERSION 2.0.9, Dec 28, 2018} 33 34%------------------------------------------------------------------------------- 35\begin{document} 36%------------------------------------------------------------------------------- 37\maketitle 38 39\begin{abstract} 40 41SuiteSparseQR is an implementation of the multifrontal sparse QR factorization 42method. Parallelism is exploited both in the BLAS and across different frontal 43matrices using Intel's Threading Building Blocks, a shared-memory programming 44model for modern multicore architectures. It can obtain a substantial fraction 45of the theoretical peak performance of a multicore computer. The package is 46written in C++ with user interfaces for MATLAB, C, and C++. Both real and 47complex sparse matrices are supported. 48 49\end{abstract} 50 51\maketitle 52 53%------------------------------------------------------------------------------- 54\section{Introduction} 55\label{intro} 56%------------------------------------------------------------------------------- 57 58The algorithms used in SuiteSparseQR are discussed in a companion paper, 59\cite{Davis08a}, and an overview of how to use the software is given in 60\cite{Davis08b}. This document gives detailed information on the installation 61and use of SuiteSparseQR. 62 63%------------------------------------------------------------------------------- 64\section{Using SuiteSparseQR in MATLAB} 65%------------------------------------------------------------------------------- 66 67The simplest way to use SuiteSparseQR is via MATLAB. Its syntax includes every 68feature of the MATLAB \verb'qr' in version R2009a and earlier 69\cite{GilbertMolerSchreiber}, plus additional features not available in MATLAB. 70It is also a replacement for \verb'x=A\b' for least-squares problems and 71underdetermined systems. In addition to substantial gains in performance (10x 72to 100x is not uncommon, up to 10,000x has been observed), SuiteSparseQR adds 73new capabilities that are not present in MATLAB. For example, it provides an 74efficient method for finding the minimum 2-norm solution to an underdetermined 75system. 76 77%------------------------------------------------------------------------------- 78\subsection{Installing SuiteSparseQR for use in MATLAB} 79%------------------------------------------------------------------------------- 80 81All packages in SuiteSparse, including SuiteSparseQR and the codes it relies on 82(AMD, COLAMD, CHOLMOD, METIS, CCAMD, and CCOLAMD) are compiled with a single 83command typed into the MATLAB Command Window. SuiteSparseQR uses the LAPACK 84and BLAS libraries provided with MATLAB; you do not need to do anything to use 85these. Below are step-by-step instructions for compiling all of SuiteSparse 86(including SuiteSparseQR), and optional instructions on using METIS and/or 87Intel's Threading Building Blocks (TBB). 88 89%------------------------------------------------------------------------------- 90\subsubsection{Required instructions for Windows} 91%------------------------------------------------------------------------------- 92 93For Windows, you cannot use the \verb'lcc' compiler that ships with MATLAB; it 94is not a C++ compiler. To compile SuiteSparseQR, you must obtain a C++ 95compiler; Microsoft Visual Studio C++ Express Edition will work fine. Install 96this compiler from \newline 97\htmladdnormallink{http://www.microsoft.com/express/vc/}{http://www.microsoft.com/express/vc/} 98and then type \verb'mex -setup' in the MATLAB Command Window. 99 100%------------------------------------------------------------------------------- 101\subsubsection{Optional instructions on using METIS for any operating system} 102%------------------------------------------------------------------------------- 103 104SuiteSparse now relies on METIS 5.1.0, which is distributed along with 105SuiteSparse itself. Its use is optional, however. If you compile with 106{\tt -DNPARTITION}, or if you delete or move the {\tt SuiteSparse/metis-5.1.0} 107folder, then SuiteSparse will be compiled without it. 108 109METIS tends to give orderings that are good for the parallelism exploited by 110TBB, so its use with TBB is recommended. Note however that METIS is not 111bug-free; it can occasionally cause segmentation faults, particularly if used 112when finding basic solutions to underdetermined systems with many more columns 113than rows (SuiteSparseQR does not use METIS, by default, for those systems). 114This (rare) faulty behavior has been confirmed with valid inputs to the METIS 115test programs themselves; it is not a bug in the SuiteSparse interface to 116METIS. Use METIS at your own risk. 117 118%------------------------------------------------------------------------------- 119\subsubsection{Optional instructions for using TBB on Linux/Unix/Mac} 120%------------------------------------------------------------------------------- 121 122If you are using a Debian-based Linux system (such as Ubuntu), you're in luck! 123You can install TBB via the Synaptic Package Manager. Just search for TBB, 124select it, and click Apply. This will place the right files in \verb'/usr/lib' 125and it will create the \verb'/usr/include/tbb' directory. It's by far the 126simplest way to install TBB. If you do this, skip the rest of this section. 127 128Alternatively, obtain a copy of TBB from 129\htmladdnormallink{http://www.threadingbuildingblocks.org}{http://www.threadingbuildingblocks.org} 130as a \verb'tbb*.tgz' file appropriate for your version of Linux/Unix. If you 131install via the \verb'tbb_*.tgz' file, make sure the \verb'libtbb*.so*' and 132\verb'libtbbmalloc*.so*' files are placed in the \verb'/usr/lib' directory. 133Make sure the \verb'tbb' directory containing all of the include files is 134placed in \verb'/usr/include' (for example, the 135\verb'/usr/include/tbb/task_scheduler_init.h' file must exist). 136 137If you do not have permission to install TBB properly into the \verb'/usr/lib' 138and \verb'/usr/include' directories, you can place it in your own directory and 139modify the \verb'tbb_path' variable in \verb'spqr_make.m' to specify 140the location of your own copy of TBB (refer to that file for instructions). 141You must also set your \verb'LD_LIBRARY_PATH' environment variable to include 142the directory containing the \verb'libtbb*so' files. You need to first 143determine which subdirectory of the TBB distribution contains the libraries 144appropriate for your system; in the examples below this is just called 145\verb'/path'. It must be an absolute path, starting with the \verb'/' 146character. There must be no spaces in the path name. 147 148For Linux use this command at the system command line before starting MATLAB, 149where \verb'/path' should be replaced with the actual full path of the TBB 150\verb'lib' directory: 151 152\begin{verbatim} 153 setenv LD_LIBRARY_PATH /path:$LD_LIBRARY_PATH 154\end{verbatim} 155 156If you use the \verb'csh' shell, place the command in your \verb'~/.cshrc' 157file so you don't have to type it each time you start MATLAB. 158 159For the Mac, edit the \verb'/etc/profile' file and add this line to the end of 160the file: 161 162\begin{verbatim} 163 export DYLD_LIBRARY_PATH=/path:$DYLD_LIBRARY_PATH 164\end{verbatim} 165 166For example, using the 64-bit Linux version placed in my home directory, this 167command would be placed in my \verb'~/.cshrc' file, prior to starting MATLAB. 168 169{\scriptsize 170\begin{verbatim} 171 setenv LD_LIBRARY_PATH /home/davis/tbb21_009oss/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:$LD_LIBRARY_PATH 172\end{verbatim} 173} 174 175Then, before starting MATLAB, make sure this variable is set by typing this 176command at the system command line (you only have to do this for this session; 177whenever you start a new command shell this will be done automatically): 178 179\begin{verbatim} 180 source ~/.cshrc 181\end{verbatim} 182 183For this example, my \verb'spqr_make.m' file would contain this line: 184 185\begin{verbatim} 186 tbb_path = '/home/davis/tbb21_009oss' ; 187\end{verbatim} 188 189%------------------------------------------------------------------------------- 190\subsubsection{Optional instructions for using TBB on Windows} 191%------------------------------------------------------------------------------- 192 193Obtain a copy of TBB from 194\htmladdnormallink{http://www.threadingbuildingblocks.org}{http://www.threadingbuildingblocks.org} 195; for TBB Version 2.1 this file is \verb'tbb21_009oss_win.zip'. 196 197Create a folder and place the \verb'tbb21_009oss_win.zip' file there, and 198extract the file. In the example below, I extracted to 199\verb'C:\TBB\tbb21_009oss' (if you do the same then you do not have to edit 200\verb'spqr_make.m'). There should be no spaces in the path (for example, 201placing TBB under the \verb'Program Files' directory will not work). 202 203In Windows XP, right-click \verb'My Computer' and select \verb'Properties'. 204Click the \verb'Advanced' tab. Click \verb'Environment Variables'. Under 205\verb'System' variables, edit the \verb'Path' to append the name of the folder 206containing the TBB \verb'bin' folder appropriate for your system, preceded by a 207semicolon. For example, in a 32-bit Windows system if TBB is installed in 208\verb'C:\TBB' you would append the string 209\verb';C:\TBB\tbb21_009oss\ia32\vc9\bin\' to the end of your system \verb'Path' 210variable. 211 212For Vista, the instructions are the same, except that you choose 213\verb'Computer' instead of \verb'My Computer', and you click the 214\verb'Advanced System Settings' tab instead of the \verb'Advanced' tab. 215 216Next, edit the \verb'SPQR\MATLAB\spqr_make.m' file. Change the \verb'tbb_path' 217variable to point to your copy of TBB. For example, if you installed TBB into 218\verb'C:\TBB' your \verb'spqr_make.m' file would contain this line: 219 220\begin{verbatim} 221 tbb_path = 'C:\TBB\tbb21_009oss' ; 222\end{verbatim} 223 224That line is already in \verb'spqr_make.m', so if you choose to install TBB 225in that location, you do not have to edit the file. 226 227%------------------------------------------------------------------------------- 228\subsubsection{Now you're ready to compile (on any operating system)} 229%------------------------------------------------------------------------------- 230 231Type these commands in the MATLAB window: 232\begin{verbatim} 233 cd SuiteSparse 234 SuiteSparse_install 235\end{verbatim} 236You will be asked if you want to run some demos. I recommend that you do this 237to ensure your functions have been installed correctly. Next type the command 238\begin{verbatim} 239 pathtool 240\end{verbatim} 241and examine your MATLAB path. The various SuiteSparse directories have been 242placed in your path. Click ``save'' to save this path for future MATLAB 243sessions. If this fails, you do not have permission to modify the 244\verb'pathdef.m' file (it is shared by all users). An alternative is to type 245the command: 246\begin{verbatim} 247 path 248\end{verbatim} 249and cut-and-paste the paths displayed there into your own \verb'startup.m' 250file, prepending the command \verb'addpath' to each line. For example, if I 251installed SuiteSparse into my home directory (\verb'/home/davis') then my 252\verb'startup.m' file should look like this: 253 254{\footnotesize 255\begin{verbatim} 256 addpath /home/davis/SuiteSparse/SPQR/MATLAB 257 addpath /home/davis/SuiteSparse/RBio 258 addpath /home/davis/SuiteSparse/MATLAB_Tools/spok 259 addpath /home/davis/SuiteSparse/MATLAB_Tools/waitmex 260 addpath /home/davis/SuiteSparse/MATLAB_Tools/shellgui 261 addpath /home/davis/SuiteSparse/MATLAB_Tools/GEE 262 addpath /home/davis/SuiteSparse/LINFACTOR 263 addpath /home/davis/SuiteSparse/MESHND 264 addpath /home/davis/SuiteSparse/UFcollection 265 addpath /home/davis/SuiteSparse/SSMULT 266 addpath /home/davis/SuiteSparse/KLU/MATLAB 267 addpath /home/davis/SuiteSparse/BTF/MATLAB 268 addpath /home/davis/SuiteSparse/LDL/MATLAB 269 addpath /home/davis/SuiteSparse/CXSparse/MATLAB/UFget 270 addpath /home/davis/SuiteSparse/CXSparse/MATLAB/Demo 271 addpath /home/davis/SuiteSparse/CXSparse/MATLAB/CSparse 272 addpath /home/davis/SuiteSparse/CAMD/MATLAB 273 addpath /home/davis/SuiteSparse/CCOLAMD/MATLAB 274 addpath /home/davis/SuiteSparse/COLAMD/MATLAB 275 addpath /home/davis/SuiteSparse/AMD/MATLAB 276 addpath /home/davis/SuiteSparse/CHOLMOD/MATLAB 277 addpath /home/davis/SuiteSparse/UMFPACK/MATLAB 278 addpath /home/davis/SuiteSparse 279 addpath /home/davis/SuiteSparse/MATLAB_Tools 280\end{verbatim} 281} 282 283On a Windows system, I might see paths like this instead in my \verb'startup.m' 284file: 285 286{\footnotesize 287\begin{verbatim} 288 addpath C:\Documents and Settings\davis\My Documents\SuiteSparse\SPQR\MATLAB 289 ... 290\end{verbatim} 291} 292 293Your \verb'startup.m' file should appear in the directory in which MATLAB 294starts. Failing that, every time you start MATLAB, find your \verb'startup.m' 295file and run it. For more help, type \verb'doc startup' in MATLAB. 296 297The \verb'SuiteSparse_install' script works on any version of MATLAB 298(Linux/Unix, Mac, or Windows) if you have a C++ compiler. The install script 299will detect if you have placed the METIS directory in the right place, and will 300compile it for use with SuiteSparseQR if it finds it there. Otherwise METIS 301will be skipped (the install script will tell you if it finds METIS or not). 302 303%------------------------------------------------------------------------------- 304\subsubsection{Optional instructions for using TBB on any system} 305%------------------------------------------------------------------------------- 306 307If you have followed the steps above (as I recommend that you do), you have 308just compiled SuiteSparseQR, but it will not yet be using TBB. To use TBB with 309SuiteSparseQR, first install TBB as described above. Once TBB is installed, 310type the following commands in the MATLAB Command Window (assuming you have 311METIS): 312 313\begin{verbatim} 314 cd SuiteSparse/SPQR/MATLAB 315 spqr_make metis tbb 316\end{verbatim} 317 318Or if you do not have METIS, do this instead: 319 320\begin{verbatim} 321 cd SuiteSparse/SPQR/MATLAB 322 spqr_make nometis tbb 323\end{verbatim} 324 325For more options, type \verb'help spqr_make'. 326 327%------------------------------------------------------------------------------- 328\subsection{Functions provided to the MATLAB user} 329%------------------------------------------------------------------------------- 330 331Three primary functions are available: 332 333\begin{enumerate} 334 335\item \verb'spqr', a replacement for the MATLAB \verb'qr' 336 337\item \verb'spqr_solve', a replacement for \verb'x=A\b' when \verb'A' is sparse 338and rectangular. It works for the square case, too, but \verb'x=A\b' will be 339faster (using LU or Cholesky factorization). \verb'spqr_solve' is a good method 340for ill-conditioned or rank-deficient square matrices, however. 341 342\item \verb'spqr_qmult', which multiplies \verb'Q' (stored in Householder 343vector form) times a matrix \verb'x'. 344 345\end{enumerate} 346 347Their syntax is described below in the table below. 348The permutation \verb'P' is chosen to reduce fill-in and to return \verb'R' in 349upper trapezoidal form if \verb'A' is estimated to have less than full rank. 350The \verb'opts' parameter provides non-default options (refer to the next 351section). The output \verb'Q' can be optionally returned in Householder form, 352which is far sparser than returning \verb'Q' as a sparse matrix. 353 354\vspace{0.1in} 355 356{\footnotesize 357\begin{tabular}{|ll|} 358\hline 359\verb'R = spqr (A)' & Q-less QR factorization \\ 360\verb'R = spqr (A,0)' & economy variant (\verb'size(R,1) = min(m,n)') \\ 361\verb'R = spqr (A,opts)' & as above, with non-default options \\ 362\hline 363\verb'[Q,R] = spqr (A)' & \verb'A=Q*R' factorization \\ 364\verb'[Q,R] = spqr (A,0)' & economy variant (\verb'size(Q,2) = size(R,1) = min(m,n)') \\ 365\verb'[Q,R] = spqr (A,opts)' & \verb'A=Q*R', with non-default options \\ 366\hline 367\verb'[Q,R,P] = spqr (A)' & \verb'A*P=Q*R' where P reduces fill-in \\ 368\verb'[Q,R,P] = spqr (A,0)' & economy variant (\verb'size(Q,2) = size(R,1) = min(m,n)') \\ 369\verb'[Q,R,P] = spqr (A,opts)' & as above, with non-default options \\ 370\hline 371\verb'[C,R] = spqr (A,B)' & as \verb'R=spqr(A)', also returns \verb"C=Q'*B" \\ 372\verb'[C,R] = spqr (A,B,0)' & economy variant (\verb'size(C,1) = size(R,1) = min(m,n)') \\ 373\verb'[C,R] = spqr (A,B,opts)' & as above, with non-default options \\ 374\hline 375\verb'[C,R,P] = spqr (A,B)' & as \verb'R=spqr(A*P)', also returns \verb"C=Q'*B" \\ 376\verb'[C,R,P] = spqr (A,B,0)' & economy variant (\verb'size(C,1) = size(R,1) = min(m,n)') \\ 377\verb'[C,R,P] = spqr (A,B,opts)'& as above, with non-default options \\ 378\hline 379\verb'x = spqr_solve (A,B)' & \verb'x=A\B' \\ 380\verb'[x,info] = spqr_solve (A,B,opts)' & as above, with statistics and non-default parameters \\ 381\hline 382\verb'Y = spqr_qmult (Q,X,k)' & computes \verb"Q'*X", \verb"Q*X", \verb"X*Q'", or \verb"X*Q" 383(selected with \verb'k') \\ 384\hline 385\end{tabular} 386} 387 388%------------------------------------------------------------------------------- 389\subsection{The {\tt opts} parameter} 390%------------------------------------------------------------------------------- 391 392The \verb'opts' struct provides control over non-default parameters for 393SuiteSparseQR. Entries not present in \verb'opts' are set to their defaults. 394 395\begin{itemize} 396 397 \item \verb'opts.tol': columns that have 2-norm \verb'<= opts.tol' are 398 treated as zero. The default is \verb"20*(m+n)*eps*sqrt(max(diag(A'*A)))" 399 where \verb'[m n]=size(A)'. 400 401 \item \verb'opts.econ': number of rows of \verb'R' and columns of \verb'Q' 402 to return. The default is \verb'm'. Using \verb'n' gives the standard 403 economy form (as in the MATLAB \verb'qr(A,0)'). A value less than the 404 estimated rank \verb'r' is set to \verb'r', so \verb'opts.econ=0' gives 405 the ``rank-sized'' factorization, where \verb'size(R,1)==nnz(diag(R))==r'. 406 407 \item \verb'opts.ordering': a string describing which column ordering 408 method to use. Let \newline 409 \verb'[m2 n2]=size(S)' where \verb'S' is obtained by 410 removing singletons from \verb'A'. The singleton permutation places 411 \verb'A*P' in the form \verb'[A11 A12 ; 0 S]' where \verb'A11' is upper 412 triangular with diagonal entries all greater than \verb'opts.tol'. 413 414 The default is to use COLAMD if \verb'm2<=2*n2'; otherwise try AMD. Let 415 \verb'f' be the flops for \verb"chol((S*P)'*(S*P))" with the ordering 416 \verb'P' found by AMD. Then if \verb'f/nnz(R) >= 500' and 417 \verb'nnz(R)/nnz(S) >= 5' then try METIS, and take the best ordering found 418 (AMD or METIS); otherwise use AMD without trying METIS. If METIS is not 419 installed then the default ordering is to use COLAMD if \verb'm2<=2*n2' and 420 to use AMD otherwise. 421 422 The available orderings are: 423 424 {\tt 'default'}: the default ordering. 425 426 {\tt 'amd'}: use \verb"amd(S'*S)". 427 428 {\tt 'colamd'}: use \verb"colamd(S)". 429 430 {\tt 'metis'}: use \verb"metis(S'*S)", only if METIS is 431 installed. 432 433 {\tt 'best'}: try all three (AMD, COLAMD, METIS) and take the 434 best. 435 436 {\tt 'bestamd'}: try AMD and COLAMD and take the best. 437 438 {\tt 'fixed'}: use \verb"P=I"; this is the only option if 439 \verb"P" is not present in the output. 440 441 {\tt 'natural'}: singleton removal only. 442 443 \item \verb'opts.Q': a string describing how \verb'Q' is to be returned. 444 The default is \verb"'discard'" if \verb'Q' is not present in the output, 445 or \verb"'matrix'" otherwise. If \verb'Q' is present and \verb'opts.Q' is 446 \verb"'discard'", then \verb"Q=[]" is returned (thus \verb"R=spqr(A*P)" is 447 \verb"[Q,R,P]=spqr(A)" where \verb"spqr" finds \verb"P" but \verb"Q" is 448 discarded instead). The usage \verb"opts.Q='matrix'" returns \verb'Q' as a 449 sparse matrix where \verb'A=Q*R' or \verb"A*P=Q*R". Using 450 \verb"opts.Q='Householder'" returns \verb'Q' as a struct containing the 451 Householder reflections applied to \verb'A' to obtain \verb'R', resulting 452 in a far sparser \verb'Q' than the \verb"'matrix'" option. 453 454 \item \verb'opts.permutation': a string describing how \verb'P' is to be 455 returned. The default is \verb"'matrix'", so that \verb"A*P=Q*R". Using 456 \verb"'vector'" gives \verb"A(:,P)=Q*R" instead. 457 458 \item \verb'opts.spumoni': an integer \verb'k' that 459 acts just like \verb"spparms('spumoni',k)". 460 461 \item \verb'opts.min2norm': used by \verb'spqr_solve'; you can use 462 \verb"'basic'" (the default), or \verb"'min2norm'". Determines the kind of 463 solution that \verb'spqr_solve' computes for underdetermined systems. Has 464 no effect for least-squares problems; ignored by \verb'spqr' itself. 465 466 \item \verb'opts.grain', \verb'opts.small', \verb'opts.nthreads': 467 multitasking control (if compiled with TBB). Let \verb'f' be the total 468 flop count. The analysis phase tries to ensure that all parallel tasks 469 have at least \verb'max(total_flops/opts.grain,opts.small)' flops. No TBB 470 parallelism is exploited if \verb'opts.grain <= 1'. The parameter 471 \verb'opts.nthreads' gives the number of threads to use for TBB (which is 472 different than the number of threads used by the BLAS). Setting 473 \verb'opts.nthreads <= 0' means to let TBB determine the number of threads 474 (normally equal to the number of cores); otherwise, exactly 475 \verb'opts.nthreads' threads are used. The defaults are 476 \verb'opts.grain=1', \verb'opts.small=1e6', and \verb'opts.nthreads=0', 477 respectively. That is, TBB is disabled by default since it conflicts with 478 BLAS multithreading. If you enable TBB, be sure to disable BLAS 479 multithreading with the MATLAB command \verb'maxNumCompThreads(1)', or 480 choose \newline \verb'opts.nthreads=k' and \verb'maxNumCompThreads(b)' so 481 that the product \verb'k*b' is equal to the number of cores. Note that 482 these recommendations may change for future versions of TBB. A good value 483 of \verb'opts.grain' is twice that of \verb'opts.nthreads'. If TBB 484 parallelism is enabled, the METIS ordering normally gives the best speedup 485 for large problems. 486 487\end{itemize} 488 489%------------------------------------------------------------------------------- 490\subsection{Examples on how to use the MATLAB interface} 491%------------------------------------------------------------------------------- 492 493To solve a least-squares problem, or to find the basic solution to an 494underdetermined system, just use \verb'x = spqr_solve(A,b)' in place of 495\verb'x=A\b'. To compute the QR factorization, use \verb'[Q,R]=spqr(A)' 496instead of \verb'[Q,R]=qr(A)'. Better results can be obtained by discarding 497\verb'Q' with the usage \verb'R=spqr(A)' (in place of \verb'R=qr(A)'), or by 498requesting \verb'Q' in Householder form with \verb'[Q,R]=spqr(A,opts)' where 499\verb"opts.Q='Householder'". The latter option is not available in MATLAB. To 500use a fill-reducing ordering, simply use any of the syntaxes above with 501\verb'P' as an output parameter. 502 503The least-squares solution of an overdetermined system \verb'A*x=b' with 504\verb'm>n' (where \verb'A' has rank \verb'n') can be found in one of at least 505seven ways (in increasing order of efficiency, in time and memory): 506 507{\footnotesize 508\begin{tabular}{|l|l|} 509 \hline 510 \verb"x = pinv(full(A)) * b ;" 511 & impossible for large \verb'A' \\ 512 \hline 513 \verb"[Q,R] = spqr (A) ;" 514 & high fill-in in \verb'R', \\ 515 \verb"x = R\(Q'*b) ;" 516 & \verb'Q' costly in matrix form \\ 517 \hline 518 \verb"[Q,R,P] = spqr (A) ;" 519 & low fill-in in \verb'R', \\ 520 \verb"x = P*(R\(Q'*b)) ;" 521 & \verb'Q' costly in matrix form \\ 522 \hline 523 \verb"[Q,R,P] = spqr (A,struct('Q','Householder')) ;" 524 & low fill-in in \verb'R', \\ 525 \verb"x = P*(R\spqr_qmult (Q,b,0)) ;" 526 & \verb'Q' in efficient Householder form \\ 527 \hline 528 \verb"[c,R,P] = spqr (A,b) ;" 529 & \verb'Q' not kept, \\ 530 \verb"x = P*(R\c) ;" 531 & \verb'P' a permutation matrix \\ 532 \hline 533 \verb"[c,R,p] = spqr (A,b,0) ;" 534 & \verb'Q' not kept, \\ 535 \verb"y = (R\c) ; x(p) = y" 536 & \verb'p' a permutation vector \\ 537 \hline 538 \verb"x = spqr_solve (A,b) ;" 539 & less memory and better handling \\ 540 & of rank-deficient matrices \\ 541 \hline 542\end{tabular} 543} 544 545\vspace{0.1in} 546 547The minimum-norm solution of an underdetermined system \verb'A*x=b' with 548\verb'm<n' can be found in one of five ways (in increasing order of 549efficiency, in time and memory): 550 551\vspace{0.1in} 552 553{\footnotesize 554\begin{tabular}{|l|l|} 555 \hline 556 \verb"x = pinv(full(A)) * b ;" 557 & impossible for large \verb'A' \\ 558 \hline 559 \verb"[Q,R] = spqr (A') ;" 560 & high fill-in in \verb'R', \\ 561 \verb"x = Q*(R'\b) ;" 562 & \verb'Q' costly in matrix form \\ 563 \hline 564 \verb"[Q,R,P] = spqr (A') ;" 565 & low fill-in in \verb'R', \\ 566 \verb"x = Q*(R'\(P'*b)) ;" 567 & \verb'Q' costly in matrix form \\ 568 \hline 569 \verb"[Q,R,P] = spqr (A',struct('Q','Householder')) ;" 570 & low fill-in in \verb'R', \\ 571 \verb"x = spqr_qmult (Q,R'\(P'*b),1) ;" 572 & \verb'Q' in efficient Householder form \\ 573 \hline 574 \verb"opts.solution = 'min2norm' ;" 575 & as 4th option above, but faster, \\ 576 \verb"x = spqr_solve (A,b,opts) ;" 577 & less memory, and better handling \\ 578 & of rank-deficient matrices \\ 579 \hline 580\end{tabular} 581} 582 583Note that \verb'spqr_solve' uses a fill-reducing ordering, by default. 584It can be disabled or modified using a non-default \verb'opts' parameter 585(\verb'opts.ordering', specifically). 586 587%------------------------------------------------------------------------------- 588\section{Using SuiteSparseQR in C and C++} 589%------------------------------------------------------------------------------- 590 591SuiteSparseQR relies on CHOLMOD for its basic sparse matrix data structure, a 592compressed sparse column format. CHOLMOD provides interfaces to the AMD, 593COLAMD, and METIS ordering methods, supernodal symbolic Cholesky factorization 594(namely, \verb'symbfact' in MATLAB), functions for converting between different 595data structures, and for basic operations such as transpose, matrix multiply, 596reading a matrix from a file, writing a matrix to a file, and many other 597functions. 598 599%------------------------------------------------------------------------------- 600\subsection{Installing the C/C++ library on Linux/Unix} 601%------------------------------------------------------------------------------- 602 603Before you compile the SuiteSparseQR library and demo programs, you may wish to 604edit the \verb'SuiteSparse/SuiteSparse_config/SuiteSparse_config.mk' configuration file. The 605defaults should be fine on most Linux/Unix systems and on the Mac. 606It automatically detects what system you have and sets compile parameters 607accordingly. 608 609Next, type \verb'make' at 610the Linux/Unix command line, in either the \verb'SuiteSparse' directory (which 611compiles all of SuiteSparse) or in the \verb'SuiteSparse/SPQR' directory (which 612just compiles SuiteSparseQR and the libraries it requires). SuiteSparseQR will 613be compiled, and a set of simple demos will be run (including the one in the 614next section). 615 616The configuration file defines where the LAPACK and BLAS libraries are to be 617found. Selecting the right BLAS is critical. There is no standard naming 618scheme for the name and location of these libraries. The defaults in the 619\verb'SuiteSparse_config.mk' file use \verb'-llapack' and \verb'-lblas'; the latter may 620link against the standard Fortran reference BLAS, which will not provide 621optimal performance. For best results, you should use a 622high-performance vendor-supplied BLAS such as the 623Intel MKL. Selection of LAPACK and 624the BLAS is done with the \verb'LAPACK=' and \verb'BLAS=' lines in the 625\verb'SuiteSparse_config.mk' file. 626 627Four compile-time options can be used to modify how SuiteSparseQR is compiled. 628Select these via the \verb'SPQR_CONFIG=' line in the \verb'SuiteSparse_config.mk' file. 629 630\begin{itemize} 631 632 \item \verb'-DNPARTITION': do not compile with METIS, CAMD, or CCOLAMD. 633 These packages are included by default. 634 635 \item \verb'-DNEXPERT': do not compile with the ``expert'' routines in 636 \verb'SuiteSparseQR_expert.cpp'. The expert routines are included by 637 default. 638 639 \item \verb'-DHAVE_TBB': enable the Intel Threading Building Blocks, TBB. 640 The use of TBB is disabled by default. It is disabled because not all 641 installations have TBB available. The use of TBB is recommended, however, 642 if you have a multicore computer. 643 644\end{itemize} 645 646To fully test 100\% of the lines of SuiteSparseQR, go to the \verb'Tcov' 647directory and type \verb'make'. This will work for Linux only. 648 649To install the shared library 650into /usr/local/lib and /usr/local/include, do {\tt make install}. 651To uninstall, do {\tt make uninstall}. 652For more options, see the {\tt SuiteSparse/README.txt} file. 653 654%------------------------------------------------------------------------------- 655\subsection{C/C++ Example} 656%------------------------------------------------------------------------------- 657 658The C++ interface is written using templates for handling both real and complex 659matrices. The simplest function computes the MATLAB equivalent of 660\verb'x=A\b' and is almost as simple: 661{\footnotesize 662\begin{verbatim} 663 #include "SuiteSparseQR.hpp" 664 X = SuiteSparseQR <double> (A, B, cc) ; 665\end{verbatim} 666} 667The C version of this function is almost identical: 668{\footnotesize 669\begin{verbatim} 670 #include "SuiteSparseQR_C.h" 671 X = SuiteSparseQR_C_backslash_default (A, B, cc) ; 672\end{verbatim} 673} 674 675Below is a simple C++ program that illustrates the use of SuiteSparseQR. The 676program reads in a least-squares problem from \verb'stdin' in MatrixMarket 677format \cite{BoisvertPozoRemingtonBarrettDongarra97}, solves it, and prints the 678norm of the residual and the estimated rank of \verb'A'. The comments reflect 679the MATLAB equivalent statements. The C version of this program is identical 680except for the \verb'#include' statement and call to SuiteSparseQR which are 681replaced with the C version of the statement above, and C-style comments. 682 683{\footnotesize 684\begin{verbatim} 685 #include "SuiteSparseQR.hpp" 686 int main (int argc, char **argv) 687 { 688 cholmod_common Common, *cc ; 689 cholmod_sparse *A ; 690 cholmod_dense *X, *B, *Residual ; 691 double rnorm, one [2] = {1,0}, minusone [2] = {-1,0} ; 692 int mtype ; 693 694 // start CHOLMOD 695 cc = &Common ; 696 cholmod_l_start (cc) ; 697 698 // load A 699 A = (cholmod_sparse *) cholmod_l_read_matrix (stdin, 1, &mtype, cc) ; 700 701 // B = ones (size (A,1),1) 702 B = cholmod_l_ones (A->nrow, 1, A->xtype, cc) ; 703 704 // X = A\B 705 X = SuiteSparseQR <double> (A, B, cc) ; 706 707 // rnorm = norm (B-A*X) 708 Residual = cholmod_l_copy_dense (B, cc) ; 709 cholmod_l_sdmult (A, 0, minusone, one, X, Residual, cc) ; 710 rnorm = cholmod_l_norm_dense (Residual, 2, cc) ; 711 printf ("2-norm of residual: %8.1e\n", rnorm) ; 712 printf ("rank %ld\n", cc->SPQR_istat [4]) ; 713 714 // free everything and finish CHOLMOD 715 cholmod_l_free_dense (&Residual, cc) ; 716 cholmod_l_free_sparse (&A, cc) ; 717 cholmod_l_free_dense (&X, cc) ; 718 cholmod_l_free_dense (&B, cc) ; 719 cholmod_l_finish (cc) ; 720 return (0) ; 721 } 722\end{verbatim} 723} 724 725%------------------------------------------------------------------------------- 726\subsection{C++ Syntax} 727%------------------------------------------------------------------------------- 728 729All features available to the MATLAB user are also available to both the C and 730C++ interfaces using a syntax that is not much more complicated than the MATLAB 731syntax. Additional features not available via the MATLAB interface include the 732ability to compute the symbolic and numeric factorizations separately (for 733multiple matrices with the same nonzero pattern but different numerical 734values). The following is a list of user-callable C++ functions and what they 735can do: 736 737\begin{enumerate} 738 739 \item \verb'SuiteSparseQR': an overloaded function that provides functions 740 equivalent to \verb'spqr' and \verb'spqr_solve' in the SuiteSparseQR MATLAB 741 interface. 742 743 \item \verb'SuiteSparseQR_factorize': performs both the symbolic and 744 numeric factorizations and returns a QR factorization object such that 745 \verb'A*P=Q*R'. It always exploits singletons. 746 747 \item \verb'SuiteSparseQR_symbolic': performs the symbolic factorization 748 and returns a QR factorization object to be passed to 749 \verb'SuiteSparseQR_numeric'. It does not exploit singletons. 750 751 \item \verb'SuiteSparseQR_numeric': performs the numeric factorization on a 752 QR factorization object, either one constructed by 753 \verb'SuiteSparseQR_symbolic', or reusing one from a prior call to 754 \verb'SuiteSparseQR_numeric' for a matrix \verb'A' with the same pattern as 755 the first one, but with different numerical values. 756 757 \item \verb'SuiteSparseQR_solve': solves a linear system using the object 758 returned by \newline \verb'SuiteSparseQR_factorize' or 759 \verb'SuiteSparseQR_numeric', namely \verb"x=R\b", \newline \verb"x=P*R\b", 760 \verb"x=R'\b", or \verb"x=R'\(P'*b)". 761 762 \item \verb'SuiteSparseQR_qmult': provides the same function as 763 \verb'spqr_qmult' in the MATLAB interface, computing 764 \verb"Q'*x", \verb"Q*x", \verb"x*Q'", or \verb"x*Q". 765 It uses either a QR factorization 766 in MATLAB-style sparse matrix format, or the QR factorization object 767 returned by \newline \verb'SuiteSparseQR_factorize' or 768 \verb'SuiteSparseQR_numeric'. 769 770 \item \verb'SuiteSparseQR_min2norm': finds the minimum 2-norm solution to 771 an underdetermined linear system. 772 773 \item \verb'SuiteSparseQR_free': frees the QR factorization object. 774 775\end{enumerate} 776 777%------------------------------------------------------------------------------- 778\subsection{Details of the C/C++ Syntax} 779%------------------------------------------------------------------------------- 780 781For further details of how to use the C/C++ syntax, please refer to the 782definitions and descriptions in the following files: 783 784\begin{enumerate} 785\item \verb'SuiteSparse/SPQR/Include/SuiteSparseQR.hpp' describes each 786C++ function. Both \verb'double' and \verb'std::complex<double>' matrices 787are supported. 788 789\item \verb'SuiteSparse/SPQR/Include/SuiteSparseQR_definitions.h' describes 790definitions \newline common to both C and C++ functions. For example, each of 791the ordering methods is given a \verb'#define''d name. The default is 792\verb'ordering = SPQR_ORDERING_DEFAULT', and the default tolerance is given by 793\verb'tol = SPQR_DEFAULT_TOL'. 794 795\item \verb'SuiteSparse/SPQR/Include/SuiteSparseQR_C.h' describes 796the C-callable functions. 797 798\end{enumerate} 799 800Most of the packages in SuiteSparse come in multiple versions with different 801sized integers. The first is the plain C/C++ \verb'int'. The second the 802\verb'SuiteSparse_long' integer, defined in the 803\verb'SuiteSparse/SuiteSparse_config/SuiteSparse_config.h' 804file. This integer is \verb'long' except on a Windows-64 platform for which it 805is the \verb'__int64' type. The intent of \verb'SuiteSparse_long' 806is that it should be 80732-bits on a 32-bit platform, and 64-bits on a 64-bit platform. 808 809By contrast, SuiteSparseQR only provides a \verb'SuiteSparse_long' version. 810Most users (except Windows-64) can simply use \verb'long' as the basic integer 811type passed to and returned from SuiteSparseQR. 812 813The C/C++ options corresponding to the MATLAB \verb'opts' parameters and the 814contents of the optional \verb'info' output of \verb'spqr_solve' are described 815below. Let \verb'cc' be the CHOLMOD \verb'Common' object, containing parameter 816settings and statistics. All are of type \verb'double', except for 817\verb'SPQR_istat' which is \verb'SuiteSparse_long', 818\verb'cc->memory_usage' which is 819\verb'size_t', and \verb'cc->SPQR_nthreads' which is \verb'int'. Parameters 820include: 821 822\vspace{0.1in} 823{\footnotesize 824\begin{tabular}{|ll|} 825\hline 826\verb'cc->SPQR_grain' & the same as \verb'opts.grain' in the MATLAB interface \\ 827\verb'cc->SPQR_small' & the same as \verb'opts.small' in the MATLAB interface \\ 828\verb'cc->SPQR_nthreads' 829 & the same as \verb'opts.nthreads' in the MATLAB interface \\ 830\hline 831\end{tabular} 832} 833\vspace{0.1in} 834 835Other parameters, such as \verb'opts.ordering' and \verb'opts.tol', 836are input parameters to the various C/C++ functions. Others such as 837\verb"opts.solution='min2norm'" are separate functions in the C/C++ 838interface. Refer to the files listed above for details. 839Output statistics include: 840 841\vspace{0.1in} 842{\footnotesize 843\begin{tabular}{|ll|} 844\hline 845\verb'cc->SPQR_flopcount_bound' & an upper bound on the flop count \\ 846\verb'cc->SPQR_tol_used' & the tolerance used (\verb'opts.tol') \\ 847\hline 848\verb'cc->SPQR_istat [0]' & upper bound on \verb'nnz(R)' \\ 849\verb'cc->SPQR_istat [1]' & upper bound on \verb'nnz(H)' \\ 850\verb'cc->SPQR_istat [2]' & number of frontal matrices \\ 851\verb'cc->SPQR_istat [3]' & number of TBB tasks \\ 852\verb'cc->SPQR_istat [4]' & estimate of the rank of \verb'A' \\ 853\verb'cc->SPQR_istat [5]' & number of column singletons \\ 854\verb'cc->SPQR_istat [6]' & number of row singletons \\ 855\verb'cc->SPQR_istat [7]' & ordering used \\ 856\hline 857\verb'cc->memory_usage' & memory used, in bytes \\ 858\hline 859\end{tabular} 860} 861\vspace{0.1in} 862 863The upper bound on the flop count is found in the analysis phase, which ignores 864the numerical values of \verb'A' (the same analysis phase operates on both real 865and complex matrices). Thus, if you are factorizing a complex matrix, multiply 866this statistic by 4. 867 868%------------------------------------------------------------------------------- 869\section{GPU acceleration} 870\label{GPU} 871%------------------------------------------------------------------------------- 872 873As of version 2.0.0, SuiteSparseQR now includes GPU acceleration. 874It can exploit a single NVIDIA GPU, via CUDA. To enable GPU acceleration, 875you must compile SuiteSparseQR with non-default options. See the 876\verb'SuiteSparse_config_GPU_gcc.mk' file in the \verb'SuiteSparse_config' 877directory for details. The packages SuiteSparse\_GPURuntime and 878GPUQREngine are also required (they should appear in the SuiteSparse 879directory, along with SPQR). 880 881At run time, you must also enable the GPU by setting \verb'Common->useGPU' 882to \verb'true'. Before calling any SuiteSparseQR function, you must 883poll the GPU to set the available memory. Below is a sample code 884that initializes CHOLMOD and then polls the GPU for use in SuiteSparseQR. 885 886\begin{verbatim} 887 size_t total_mem, available_mem ; 888 cholmod_common *cc, Common ; 889 cc = &Common ; 890 cholmod_l_start (cc) ; 891 cc->useGPU = true ; 892 cholmod_l_gpu_memorysize (&total_mem, &available_mem, cc) ; 893 cc->gpuMemorySize = available_mem ; 894 if (cc->gpuMemorySize <= 1) 895 { 896 printf ("no GPU available\n") ; 897 } 898 899 // Subsequent calls to SuiteSparseQR will use the GPU, if available 900\end{verbatim} 901 902See \verb'Demo/qrdemo_gpu.cpp' for an extended example, which can be 903compiled via \verb'make gpu' in the \verb'Demo' directory. 904 905GPU acceleration is not yet available via the MATLAB mexFunction interface. 906We expect to include this in a future release. 907 908For a detailed technical report on the GPU-accelerated algorithm, 909see \verb'qrgpu_paper.pdf' in the \verb'Doc' directory. 910 911%------------------------------------------------------------------------------- 912\section{Requirements and Availability} 913\label{summary} 914%------------------------------------------------------------------------------- 915 916SuiteSparseQR requires four prior Collected Algorithms of the ACM: CHOLMOD 917\cite{ChenDavisHagerRajamanickam09,DavisHager09} (version 1.7 or later), AMD 918\cite{AmestoyDavisDuff96,AmestoyDavisDuff03}, and COLAMD 919\cite{DavisGilbertLarimoreNg00_algo,DavisGilbertLarimoreNg00} for its 920ordering/analysis phase and for its basic sparse matrix data structure, and the 921BLAS \cite{dddh:90} for dense matrix computations on its frontal matrices; also 922required is LAPACK \cite{LAPACK} for its Householder reflections. An efficient 923implementation of the BLAS is strongly recommended, either vendor-provided 924(such as the Intel MKL, the AMD ACML, or the Sun Performance Library) or other 925high-performance BLAS such as those of \cite{GotoVanDeGeijn08}. 926 927The use of Intel's Threading Building Blocks is optional \cite{Reinders07}, but 928without it, only parallelism within the BLAS can be exploited (if available). 929Suite\-SparseQR can optionally use METIS 4.0.1 \cite{KarypisKumar98e} and two 930constrained minimum degree ordering algorithms, CCOLAMD and CAMD 931\cite{ChenDavisHagerRajamanickam09}, for its fill-reducing ordering options. 932SuiteSparseQR can be compiled without these ordering methods and without TBB. 933 934In addition to appearing as Collected Algorithm 8xx of the ACM, SuiteSparseQR 935is available at 936\htmladdnormallink{http://www.suitesparse.com}{http://www.suitesparse.com} 937and at MATLAB Central 938in the user-contributed File Exchange ( 939\htmladdnormallink{http://www.mathworks.com/matlabcentral}{http://www.mathworks.com/matlabcentral} 940). 941See SPQR/Doc/License.txt for the license. 942Alternative licenses are also 943available; contact the author for details. 944 945%------------------------------------------------------------------------------- 946% References 947%------------------------------------------------------------------------------- 948 949\bibliographystyle{plain} 950\bibliography{spqr_user_guide} 951\end{document} 952 953