\documentclass[11pt]{article}

\newcommand{\m}[1]{{\bf{#1}}}   % for matrices and vectors
\newcommand{\tr}{^{\sf T}}      % transpose

\topmargin 0in
\textheight 9in
\oddsidemargin 0pt
\evensidemargin 0pt
\textwidth 6.5in

%------------------------------------------------------------------------------
\begin{document}
%------------------------------------------------------------------------------

\title{User Guide for KLU and BTF}
\author{
Timothy A. Davis\thanks{
DrTimothyAldenDavis@gmail.com,
http://www.suitesparse.com.
This work was supported by Sandia National Labs, and the National
Science Foundation.
Portions of the work were done while on sabbatical at Stanford University
and Lawrence Berkeley National Laboratory (with funding from Stanford
University and the SciDAC program).
}
\and Eka Palamadai Natarajan}

\date{VERSION 1.3.9, Mar 12, 2018}
\maketitle

%------------------------------------------------------------------------------
\begin{abstract}
KLU is a set of routines for solving sparse linear systems of equations.
It is particularly well-suited to matrices arising in SPICE-like circuit
simulation applications.
It relies on a permutation to block triangular form (BTF), several methods
for finding a fill-reducing ordering (variants of approximate minimum degree
and nested dissection), and a sparse left-looking LU factorization method
to factorize each block.  A MATLAB interface is included.
KLU appears as Collected Algorithm 907 of the ACM \cite{DavisNatarajan10}.
\end{abstract}
%------------------------------------------------------------------------------

\newpage
\tableofcontents
\newpage

%------------------------------------------------------------------------------
\section{License and Copyright}
%------------------------------------------------------------------------------

KLU, Copyright \copyright\ 2004-2011 University of Florida.
All Rights Reserved.
KLU is available under alternate licenses; contact T. Davis for details.

{\bf KLU License:} see KLU/Doc/License.txt for the license.

{\bf Availability:} {\tt http://www.suitesparse.com}

{\bf Acknowledgments:}

This work was supported by Sandia National Laboratories (Mike Heroux)
and the National Science Foundation under grants 0203270 and 0620286.

%------------------------------------------------------------------------------
\newpage
\section{Overview}
%------------------------------------------------------------------------------

KLU is a set of routines for solving sparse linear systems of equations.  It
first permutes the matrix into upper block triangular form, via the BTF
package.  This is done by first finding a permutation for a zero-free diagonal
(a maximum transversal) \cite{Duff81}.  If there is no such permutation, then
the matrix is structurally rank-deficient, and is numerically singular.  Next,
Tarjan's method \cite{Duff78a,Tarjan72} is used to find the strongly-connected
components of the graph.  The block triangular form is essentially unique; any
method will lead to the same number and sizes of blocks, although the ordering
of the blocks may vary (consider a diagonal matrix, for example).  Assuming the
matrix has full structural rank, the permuted matrix has the following form:
\[
PAQ =
\left[
\begin{array}{ccc}
A_{11} & \cdots & A_{1k} \\
       & \ddots & \vdots \\
       &        & A_{kk} \\
\end{array}
\right],
\]
where each diagonal block is square with a zero-free diagonal.

Next, each diagonal block is factorized with a sparse left-looking method
\cite{GilbertPeierls88}.  The kernel of this factorization method is an
efficient method for solving $Lx=b$ when $L$, $x$, and $b$ are all sparse.
This kernel is used to compute each column of $L$ and $U$, one column at a time.
The total work performed by this method is always proportional to the number of
floating-point operations, something that is not true of any other sparse LU
factorization method.

Prior to factorizing each diagonal block, the blocks are ordered to reduce
fill-in.  By default, the symmetric approximate minimum degree (AMD) ordering
is used on $A_{ii}+A_{ii}^T$ \cite{AmestoyDavisDuff96,AmestoyDavisDuff03}.
Another ordering option is to find a column ordering via COLAMD
\cite{DavisGilbertLarimoreNg00_algo,DavisGilbertLarimoreNg00}.
Alternatively, a permutation can be provided by the user, or a pointer to
a user-provided ordering function can be passed, which is then used to order
each block.

Only the diagonal blocks need to be factorized.  Consider a linear system where
the matrix is permuted into three blocks, for example:
\[
\left[
\begin{array}{ccc}
A_{11} & A_{12} & A_{13} \\
       & A_{22} & A_{23} \\
       &        & A_{33} \\
\end{array}
\right]
\left[
\begin{array}{c}
x_{1} \\
x_{2} \\
x_{3} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
b_{1} \\
b_{2} \\
b_{3} \\
\end{array}
\right].
\]

The non-singular system $A_{33} x_{3} = b_{3}$ can first be solved for $x_{3}$.
After a block back substitution, the resulting system becomes
\[
\left[
\begin{array}{cc}
A_{11} & A_{12} \\
       & A_{22} \\
\end{array}
\right]
\left[
\begin{array}{c}
x_{1} \\
x_{2} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
b_{1} - A_{13} x_{3}\\
b_{2} - A_{23} x_{3}\\
\end{array}
\right]
=
\left[
\begin{array}{c}
b'_{1} \\
b'_{2} \\
\end{array}
\right]
\]
and the $A_{22} x_{2} = b'_{2}$ system can be solved for $x_{2}$.  The primary
advantage of this method is that no fill-in occurs in the off-diagonal blocks
($A_{12}$, $A_{13}$, and $A_{23}$).
This is particularly critical for sparse
linear systems arising in SPICE-like circuit simulation
\cite{Kundert86,KundertSangiovanniVincentelli85,NagelPederson,Quarles:M89/42}.
Circuit matrices are typically permutable into block triangular form, with many
singletons (1-by-1 blocks).  They also often have a handful of rows and columns
with many nonzero entries, due to voltage and current sources.  These rows and
columns are pushed into the upper block triangular form, and related to the
singleton blocks (for example, $A_{33}$ in the above system is 1-by-1, and the
column in $A_{13}$ and $A_{23}$ has many nonzero entries).  Since these
nearly-dense rows and columns do not appear in the LU factorization of the
diagonal blocks, they cause no fill-in.

The structural rank of a matrix is based solely on the pattern of its entries,
not their numerical values.  With random entries, the structural and numerical
ranks are equal with probability one.  The structural rank of any matrix is an
upper bound on the numerical rank.  The maximum transversal algorithm in the
BTF package is useful in determining if a matrix is structurally rank
deficient, and if so, it reveals a (non-unique) set of rows and columns that
contribute to that rank deficiency.  This is useful in determining what parts
of a circuit are poorly formulated (such as dangling components).

When ordered and factorized with KLU, very little fill-in occurs in the
resulting LU factors, which means that there is little scope for use of the
BLAS \cite{ACM679a}.  Sparse LU factorization methods that use the BLAS (such
as SuperLU \cite{SuperLU99} and UMFPACK \cite{Davis03_algo,Davis03}) are slower
than KLU when applied to sparse matrices arising in circuit simulation.  Both
KLU and SuperLU are based on Gilbert and Peierls' left-looking method
\cite{GilbertPeierls88}.
SuperLU uses supernodes, but KLU does not; thus the
name {\em KLU} refers to a ``Clark Kent'' LU factorization algorithm (what
SuperLU was before it became Super).

For details of the permutation to block triangular form, left-looking sparse
LU factorization, and approximate minimum degree, refer to \cite{Davis06book}.
Concise versions of these methods can be found in the CSparse package.  KLU is
also the topic of a Master's thesis by Palamadai Natarajan \cite{Palamadai05};
a copy of the thesis can be found in the {\tt KLU/Doc} directory.  It includes
a description of an earlier version of KLU; some of the function names and
parameter lists have changed in this version.  The descriptions of the methods
used still apply to the current version of KLU, however.

KLU appears as {\em Algorithm 907: KLU, a direct sparse solver for circuit
simulation problems}, ACM Transactions on Mathematical Software, vol.~37,
no.~3, 2010.

%------------------------------------------------------------------------------
\section{Availability}
%------------------------------------------------------------------------------

KLU and its required ordering packages (BTF, COLAMD, AMD, and
SuiteSparse\_config) are available at \newline
{\tt http://www.suitesparse.com.}  In addition, KLU can make use of any
user-provided ordering function.  One such function is included, which
provides KLU with an interface to the ordering methods used in CHOLMOD
\cite{ChenDavisHagerRajamanickam06}, such as METIS, a nested dissection
method \cite{KarypisKumar98e}.  After permutation to block triangular form,
circuit matrices have very good node separators, and are thus excellent
candidates for nested dissection.
The METIS ordering takes much more
time to compute than the AMD ordering, but if the ordering is reused many times
(typical in circuit simulation) the better-quality ordering can pay off in
lower total simulation time.

To use KLU, you must obtain the KLU, BTF, SuiteSparse\_config, AMD, and COLAMD
packages in the SuiteSparse suite of sparse matrix libraries.  See
{\tt http://www.suitesparse.com} for each of these packages.
They are also all contained within the single {\tt SuiteSparse.zip} or
{\tt SuiteSparse.tar.gz} distribution.

%------------------------------------------------------------------------------
\section{Using KLU and BTF in MATLAB}
%------------------------------------------------------------------------------

KLU has a single MATLAB interface which provides several options for
factorizing a matrix and/or using the factors to solve a linear system.  The
following is a synopsis of its use.  For more details, type {\tt help klu}
in MATLAB.

{\footnotesize
\begin{verbatim}
    LU = klu (A)          factorizes R\A(p,q) into L*U+F, returning a struct
    x = klu (A,'\',b)     x = A\b
    x = klu (b,'/',A)     x = b/A
    x = klu (LU,'\',b)    x = A\b, where LU = klu(A)
    x = klu (b,'/',LU)    x = b/A, where LU = klu(A)
\end{verbatim}
}

With a single input, {\tt klu(A)} returns a MATLAB struct containing the LU
factors.  The factorization is in the form \verb'L*U + F = R \ A(p,q)'
where {\tt L*U} is the LU factorization of just the diagonal blocks of the
block triangular form, {\tt F} is a sparse matrix containing the entries in
the off-diagonal blocks, {\tt R} is a diagonal matrix containing the row
scale factors, and {\tt p} and {\tt q} are permutation vectors.  The {\tt LU}
struct also contains a vector {\tt r} which describes the block boundaries
(the same as the third output parameter of {\tt dmperm}).
The {\tt k}th
block consists of rows and columns {\tt r(k)} to {\tt r(k+1)-1} in the
permuted matrix {\tt A(p,q)} and the factors {\tt L} and {\tt U}.

An optional final input argument ({\tt klu(A,opts)} for example) provides a
way of modifying KLU's user-definable parameters, including a partial pivoting
tolerance and ordering options.  A second output argument
({\tt [LU,info] = klu ( ... )}) provides statistics on the factorization.

The BTF package includes three user-callable MATLAB functions which replicate
most of the features of the MATLAB built-in {\tt dmperm} function, and provide
an additional option which can significantly limit the worst-case time taken
by {\tt dmperm}.  For more details, type {\tt help btf}, {\tt help maxtrans},
and {\tt help strongcomp} in MATLAB.  Additional information about how
these functions work can be found in \cite{Davis06book}.

{\footnotesize
\begin{verbatim}
    [p,q,r] = btf (A)         similar to [p,q,r] = dmperm (A)
    q = maxtrans (A)          similar to q = dmperm (A')
    [p,r] = strongcomp (A)    similar to [p,q,r] = dmperm (A + speye(n))
\end{verbatim}
}

Both {\tt btf} and {\tt maxtrans} include a second optional input,
{\tt maxwork}, which limits the total work performed in the maximum
transversal to {\tt maxwork * nnz(A)}.  The worst-case time taken by the
algorithm is $O$ ({\tt n * nnz(A)}), where the matrix {\tt A} is
{\tt n}-by-{\tt n}, but this worst-case time is rarely reached in practice.

To use the KLU and BTF functions in MATLAB, you must first compile and install
them.  In the {\tt BTF/MATLAB} directory, type {\tt btf\_install}, and then
type {\tt klu\_install} in the {\tt KLU/MATLAB} directory.  Alternatively, if
you have the entire SuiteSparse, simply run the {\tt SuiteSparse\_install}
function in the {\tt SuiteSparse} directory.
After running the installation scripts, type {\tt pathtool} and save your path
for future MATLAB sessions.  If you cannot save your path because of file
permissions, edit your {\tt startup.m} by adding {\tt addpath} commands (type
{\tt doc startup} and {\tt doc addpath} for more information).

%------------------------------------------------------------------------------
\section{Using KLU and BTF in a C program}
\label{Cversion}
%------------------------------------------------------------------------------

KLU and BTF include the following C-callable functions.  Each function is
available in two or four versions: with {\tt int} or {\tt long} integers, and
(for functions that deal with numerical values) with {\tt double} or complex
{\tt double} values.  The {\tt long} integer is actually a
{\tt SuiteSparse\_long}, which is typically a {\tt long}, defined with a
{\tt \#define} statement.  It becomes an {\tt \_\_int64} on Microsoft
Windows 64, however.

The usage of real and complex, and {\tt int} and {\tt SuiteSparse\_long}, must
not be mixed, except that some functions can be used for both real and complex
cases.

%------------------------------------------------------------------------------
\subsection{KLU Common object}
%------------------------------------------------------------------------------

The {\tt klu\_common} object ({\tt klu\_l\_common} for the
{\tt SuiteSparse\_long} version) contains user-definable parameters and
statistics returned from KLU functions.  This object appears in every KLU
function as the last parameter.  Details are given in the {\tt klu.h} include
file, which also appears below in Section~\ref{klu_include}.  Primary
parameters and statistics are summarized below.  The defaults are chosen
specifically for circuit simulation matrices.

\begin{itemize}
\item {\tt tol}: partial pivoting tolerance.
If the diagonal entry has a
magnitude greater than or equal to {\tt tol} times the largest magnitude
of entries in the pivot column, then the diagonal entry is chosen.
Default value: 0.001.

\item {\tt ordering}: which fill-reducing ordering to use: 0 for AMD,
1 for COLAMD, 2 for a user-provided permutation {\tt P} and {\tt Q}
(or a natural ordering if {\tt P} and {\tt Q} are NULL), or 3 for
the {\tt user\_order} function.  Default: 0 (AMD).

\item {\tt scale}: whether or not the matrix should be scaled.
If {\tt scale < 0}, then no scaling is performed and the input matrix
is not checked for errors.  If {\tt scale >= 0}, the input matrix is
checked for errors.
If {\tt scale=0}, then no scaling is performed.
If {\tt scale=1}, then each row of {\tt A} is divided by the sum of
the absolute values in that row.
If {\tt scale=2}, then each row of {\tt A} is divided by the maximum
absolute value in that row.  Default: 2.

\item {\tt btf}: if nonzero, then BTF is used to permute the input matrix
into block upper triangular form.  This step is skipped if {\tt Common.btf}
is zero.  Default: 1.

\item {\tt maxwork}: sets an upper limit on the amount of work performed in
{\tt btf\_maxtrans} to \newline {\tt maxwork*nnz(A)}.  If the limit is
reached, a partial zero-free diagonal might be found.  This has no effect on
whether or not the matrix can be factorized, since the matrix can be
factorized with no BTF pre-ordering at all.  This option provides a tradeoff
between the effectiveness of the BTF ordering and the cost to compute it.  A
partial result can lead to fewer, and larger, blocks in the BTF form,
requiring more work to factorize the matrix.  No limit is enforced if
{\tt maxwork <= 0}.  Default: 0.
\item {\tt user\_order}: a pointer to a function that can be provided by the
application that uses KLU, to redefine the fill-reducing ordering used by KLU
for each diagonal block.  The {\tt int} and {\tt SuiteSparse\_long} prototypes
must be as follows:

{\footnotesize
\begin{verbatim}
int my_ordering_function (int n, int Ap [ ], int Ai [ ], int Perm [ ], klu_common *Common) ;

SuiteSparse_long my_long_ordering_function (SuiteSparse_long n,
    SuiteSparse_long Ap [ ], SuiteSparse_long Ai [ ],
    SuiteSparse_long Perm [ ], klu_l_common *Common) ;
\end{verbatim}
}

The function should return 0 if an error occurred, and either -1 or a positive
(nonzero) value if no error occurred.  If greater than zero, then the return
value is interpreted by KLU as an estimate of the number of nonzeros in $L$ or
$U$ (whichever is greater), when the permuted matrix is factorized.  Only an
estimate is possible, since partial pivoting with row interchanges is
performed during numerical factorization.  The input matrix is provided to the
function in the parameters {\tt n}, {\tt Ap}, and {\tt Ai}, in
compressed-column form.  The matrix {\tt A} is {\tt n}-by-{\tt n}.  The
{\tt Ap} array is of size {\tt n+1}; the {\tt j}th column of {\tt A} has row
indices {\tt Ai[Ap[j] ... Ap[j+1]-1]}.  The {\tt Ai} array is of size
{\tt Ap[n]}.  The first column pointer {\tt Ap[0]} is zero.  The row indices
might not appear sorted in each column, but no duplicates will appear.

The output permutation is to be passed back in the {\tt Perm} array, where
{\tt Perm[k]=j} means that row and column {\tt j} of {\tt A} will appear as
the {\tt k}th row and column of the permuted matrix factorized by KLU.  The
{\tt Perm} array is already allocated when it is passed to the user function.

The user function may use, and optionally modify, the contents of the
{\tt klu\_common Common} object.
In particular, prior to calling KLU, the user
application can set both {\tt Common.user\_order} and {\tt Common.user\_data}.
The latter is a {\tt void *} pointer that KLU does not use, except to pass to
the user ordering function pointed to by {\tt Common.user\_order}.  This is a
mechanism for passing additional arguments to the user function.

An example user function is provided in the {\tt KLU/User} directory, which
provides an interface to the ordering methods in CHOLMOD.

\end{itemize}

%------------------------------------------------------------------------------
\subsection{KLU Symbolic object}
%------------------------------------------------------------------------------

KLU performs its sparse LU factorization in two steps.  The first is purely
symbolic, and does not depend on the numerical values.  This analysis returns
a {\tt klu\_symbolic} object ({\tt klu\_l\_symbolic} in the
{\tt SuiteSparse\_long} version).  The {\tt Symbolic} object contains a
pre-ordering which combines the block triangular form with the fill-reducing
ordering, and an estimate of the number of nonzeros in the factors of each
block.  Its size is thus modest, only proportional to {\tt n}, the dimension
of {\tt A}.  It can be reused multiple times for the factorization of a
sequence of matrices with identical nonzero pattern.  Note: a {\em nonzero} in
this sense is an entry present in the data structure of {\tt A}; such entries
may in fact be numerically zero.

%------------------------------------------------------------------------------
\subsection{KLU Numeric object}
%------------------------------------------------------------------------------

The {\tt Numeric} object contains the numeric sparse LU factorization,
including the final pivot permutations.  To solve a linear system, both the
{\tt Symbolic} and {\tt Numeric} objects are required.
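A minimal calling sequence ties the two objects together: set defaults, analyze, factorize, solve, and free. The following sketch uses the {\tt int}/real interface described in the sections below; the matrix and right-hand side are made up for illustration ($b$ is chosen so that the solution is $x = (1,2,3,4,5)^T$), and error checking of {\tt Common.status} is omitted for brevity.

```c
#include <stdio.h>
#include "klu.h"

int main (void)
{
    /* a small 5-by-5 sparse matrix in compressed-column form; the
       right-hand side b is chosen so that the solution is x = (1,2,3,4,5) */
    int n = 5 ;
    int Ap [ ] = {0, 2, 5, 9, 10, 12} ;
    int Ai [ ] = {0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4} ;
    double Ax [ ] = {2., 3., 3., -1., 4., 4., -3., 1., 2., 2., 6., 1.} ;
    double b [ ] = {8., 45., -3., 3., 19.} ;

    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;

    klu_defaults (&Common) ;                        /* set default parameters */
    Symbolic = klu_analyze (n, Ap, Ai, &Common) ;   /* order and analyze */
    Numeric = klu_factor (Ap, Ai, Ax, Symbolic, &Common) ;  /* factorize */
    klu_solve (Symbolic, Numeric, n, 1, b, &Common) ; /* b overwritten with x */

    for (int i = 0 ; i < n ; i++) printf ("x [%d] = %g\n", i, b [i]) ;

    klu_free_numeric (&Numeric, &Common) ;
    klu_free_symbolic (&Symbolic, &Common) ;
    return (0) ;
}
```

Compile and link against the KLU, BTF, AMD, COLAMD, and SuiteSparse\_config libraries.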
%------------------------------------------------------------------------------
\subsection{A sparse matrix in KLU}
%------------------------------------------------------------------------------

% From here on, only the {\tt int} version is described.  In the
% {\tt SuiteSparse\_long} version, the function names change slightly
% ({\tt klu\_factor} becomes {\tt klu\_l\_factor}, and the {\tt int}/complex
% version {\tt klu\_z\_factor} becomes {\tt klu\_zl\_factor}, for example).
% For more details on the {\tt SuiteSparse\_long} version, refer to
% Section~\ref{klu_include}.

The input matrix provided to KLU is in sparse compressed-column form, which is
the same data structure used internally in MATLAB, except that the version
used here allows for the row indices to appear in any ordering, and this
version also allows explicit zero entries to appear in the data structure.
The same data structure is used in CSparse, which is fully documented in
\cite{Davis06book}.  If you wish to use a simpler input data structure,
consider creating a triplet matrix in CSparse (or CXSparse if you use the long
and/or complex versions of KLU), and then convert this data structure into a
sparse compressed-column data structure, using the CSparse {\tt cs\_compress}
and {\tt cs\_dupl} functions.  A similar function,
{\tt cholmod\_triplet\_to\_sparse}, is available in CHOLMOD.

The matrix is described with four parameters:

\begin{itemize}
\item {\tt n}: an integer scalar.  The matrix is {\tt n}-by-{\tt n}.  Note
that KLU only operates on square matrices.

\item {\tt Ap}: an integer array of size {\tt n+1}.  The first entry is
{\tt Ap[0]=0}, and the last entry {\tt nz=Ap[n]} is equal to the number of
entries in the matrix.

\item {\tt Ai}: an integer array of size {\tt nz = Ap[n]}.
The row indices of entries in column {\tt j} of {\tt A} are located in
{\tt Ai [Ap [j] ... Ap [j+1]-1]}.
The matrix is zero-based; row and column
indices are in the range 0 to {\tt n-1}.

\item {\tt Ax}: a {\tt double} array of size {\tt nz} for the real case, or
{\tt 2*nz} for the complex case.  For the complex case, the real and imaginary
parts are interleaved, compatible with Fortran and the ANSI C99 Complex data
type.  KLU does not rely on the ANSI C99 data type, however, for portability
reasons.  The numerical values in column {\tt j} of a real matrix are located
in {\tt Ax [Ap [j] ... Ap [j+1]-1]}.  For a complex matrix, they appear in
{\tt Ax [2*Ap [j] ... 2*Ap [j+1]-1]}, as real/imaginary pairs (the real part
appears first, followed by the imaginary part).

\end{itemize}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_defaults}: set default parameters}
%-------------------------------------------------------------------------------

This function sets the default parameters for KLU and clears the statistics.
It may be used for either the real or complex cases.  A value of 0 is returned
if an error occurs, 1 otherwise.  This function {\bf must} be called before
any other KLU function can be called.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok ;
    klu_common Common ;
    ok = klu_defaults (&Common) ;               /* real or complex */

    #include "klu.h"
    SuiteSparse_long ok ;
    klu_l_common Common ;
    ok = klu_l_defaults (&Common) ;             /* real or complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_analyze}: order and analyze a matrix}
%-------------------------------------------------------------------------------

The following usage returns a {\tt Symbolic} object that contains the
fill-reducing ordering needed to factorize the matrix {\tt A}.  A NULL pointer
is returned if a failure occurs.
The error status for this function, and all
others, is returned in {\tt Common.status}.  These functions may be used for
both real and complex cases.  The AMD ordering is used if
{\tt Common.ordering = 0}, COLAMD is used if it is 1, the natural ordering is
used if it is 2, and the user-provided {\tt Common.user\_order} function is
used if it is 3.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int n, Ap [n+1], Ai [nz] ;
    klu_symbolic *Symbolic ;
    klu_common Common ;
    Symbolic = klu_analyze (n, Ap, Ai, &Common) ;       /* real or complex */

    #include "klu.h"
    SuiteSparse_long n, Ap [n+1], Ai [nz] ;
    klu_l_symbolic *Symbolic ;
    klu_l_common Common ;
    Symbolic = klu_l_analyze (n, Ap, Ai, &Common) ;     /* real or complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_analyze\_given}: order and analyze a matrix}
%-------------------------------------------------------------------------------

In this routine, the fill-reducing ordering is provided by the user
({\tt Common.ordering} is ignored).  Instead, the row permutation {\tt P} and
column permutation {\tt Q} are used.  These are integer arrays of size
{\tt n}.  If NULL, a natural ordering is used (so to provide just a column
ordering, pass {\tt Q} as non-NULL and {\tt P} as NULL).  A NULL pointer is
returned if an error occurs.  These functions may be used for both real and
complex cases.
{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int n, Ap [n+1], Ai [nz], P [n], Q [n] ;
    klu_symbolic *Symbolic ;
    klu_common Common ;
    Symbolic = klu_analyze_given (n, Ap, Ai, P, Q, &Common) ;    /* real or complex */

    #include "klu.h"
    SuiteSparse_long n, Ap [n+1], Ai [nz], P [n], Q [n] ;
    klu_l_symbolic *Symbolic ;
    klu_l_common Common ;
    Symbolic = klu_l_analyze_given (n, Ap, Ai, P, Q, &Common) ;  /* real or complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_factor}: numerical factorization}
%-------------------------------------------------------------------------------

The {\tt klu\_factor} function factorizes a matrix, using a sparse
left-looking method with threshold partial pivoting.  The inputs {\tt Ap} and
{\tt Ai} must be unchanged from the previous call to {\tt klu\_analyze} that
created the {\tt Symbolic} object.  A NULL pointer is returned if an error
occurs.
{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    Numeric = klu_factor (Ap, Ai, Ax, Symbolic, &Common) ;      /* real */
    Numeric = klu_z_factor (Ap, Ai, Az, Symbolic, &Common) ;    /* complex */

    #include "klu.h"
    SuiteSparse_long Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    Numeric = klu_l_factor (Ap, Ai, Ax, Symbolic, &Common) ;    /* real */
    Numeric = klu_zl_factor (Ap, Ai, Az, Symbolic, &Common) ;   /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_solve}: solve a linear system}
%-------------------------------------------------------------------------------

Solves the linear system $Ax=b$, using the {\tt Symbolic} and {\tt Numeric}
objects.  The right-hand side {\tt B} is overwritten with the solution on
output.  The array {\tt B} is stored in column major order, with a leading
dimension of {\tt ldim}, and {\tt nrhs} columns.  Thus, the real entry
$b_{ij}$ is stored in {\tt B [i+j*ldim]}, where {\tt ldim >= n} must hold.  A
complex entry $b_{ij}$ is stored in {\tt B [2*(i+j*ldim)]} and
{\tt B [2*(i+j*ldim)+1]} (for the real and imaginary parts, respectively).
Returns 1 if successful, 0 if an error occurs.
{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ldim, nrhs, ok ;
    double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_solve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;     /* real */
    ok = klu_z_solve (Symbolic, Numeric, ldim, nrhs, Bz, &Common) ;  /* complex */

    #include "klu.h"
    SuiteSparse_long ldim, nrhs, ok ;
    double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_solve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;     /* real */
    ok = klu_zl_solve (Symbolic, Numeric, ldim, nrhs, Bz, &Common) ;   /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_tsolve}: solve a transposed linear system}
%-------------------------------------------------------------------------------

Solves the linear system $A^Tx=b$ or $A^Hx=b$.  In the complex versions, the
{\tt conj\_solve} input is 0 for $A^Tx=b$, or nonzero for $A^Hx=b$.
Otherwise, the function is identical to {\tt klu\_solve}.
{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ldim, nrhs, ok ;
    double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_tsolve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;    /* real */
    ok = klu_z_tsolve (Symbolic, Numeric, ldim, nrhs, Bz, conj_solve, &Common) ;    /* complex */

    #include "klu.h"
    SuiteSparse_long ldim, nrhs, ok ;
    double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_l_tsolve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;    /* real */
    ok = klu_zl_tsolve (Symbolic, Numeric, ldim, nrhs, Bz, conj_solve, &Common) ;   /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_refactor}: numerical refactorization}
%-------------------------------------------------------------------------------

The {\tt klu\_refactor} function takes as input the {\tt Numeric} object
created by {\tt klu\_factor} (or as modified by a previous call to
{\tt klu\_refactor}).  It factorizes a new matrix with the same nonzero
pattern as that given to the call to {\tt klu\_factor} which created it.  The
same pivot order is used.  Since this can lead to numeric instability, the use
of {\tt klu\_rcond}, {\tt klu\_rgrowth}, or {\tt klu\_condest} is recommended
to check the accuracy of the resulting factorization.  The inputs {\tt Ap} and
{\tt Ai} must be unmodified since the call to {\tt klu\_factor} that first
created the {\tt Numeric} object.  This function is much faster than
{\tt klu\_factor}, and requires no dynamic memory allocation.
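For example, a circuit simulation loop typically calls {\tt klu\_analyze} and {\tt klu\_factor} once, then {\tt klu\_refactor} and {\tt klu\_solve} at each step, falling back to a fresh {\tt klu\_factor} when pivot growth indicates instability. The following sketch illustrates this pattern; the threshold {\tt rgrowth\_tol}, {\tt nsteps}, and the updating of {\tt Ax} and {\tt b} are application-specific assumptions, not part of KLU.

```c
#include "klu.h"

/* Sketch of a factor-once / refactor-many loop (hypothetical driver).
   rgrowth_tol, nsteps, and the updates of Ax and b are application-specific. */
void simulate (int n, int *Ap, int *Ai, double *Ax, double *b,
               int nsteps, double rgrowth_tol)
{
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;

    klu_defaults (&Common) ;
    Symbolic = klu_analyze (n, Ap, Ai, &Common) ;            /* once */
    Numeric = klu_factor (Ap, Ai, Ax, Symbolic, &Common) ;   /* once */

    for (int step = 0 ; step < nsteps ; step++)
    {
        /* ... application updates the values in Ax (same pattern) ... */
        klu_refactor (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;
        klu_rgrowth (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;
        if (Common.rgrowth < rgrowth_tol)
        {
            /* reciprocal pivot growth too small: re-pivot from scratch */
            klu_free_numeric (&Numeric, &Common) ;
            Numeric = klu_factor (Ap, Ai, Ax, Symbolic, &Common) ;
        }
        /* ... application fills b with the current right-hand side ... */
        klu_solve (Symbolic, Numeric, n, 1, b, &Common) ;
        /* ... b now holds the solution for this step ... */
    }

    klu_free_numeric (&Numeric, &Common) ;
    klu_free_symbolic (&Symbolic, &Common) ;
}
```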

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok, Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_refactor (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;   /* real */
    ok = klu_z_refactor (Ap, Ai, Az, Symbolic, Numeric, &Common) ; /* complex */


    #include "klu.h"
    SuiteSparse_long ok, Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_refactor (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;  /* real */
    ok = klu_zl_refactor (Ap, Ai, Az, Symbolic, Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_free\_symbolic}: destroy the {\tt Symbolic} object}
%-------------------------------------------------------------------------------

It is the user's responsibility to destroy the {\tt Symbolic} object when it
is no longer needed, or else a memory leak will occur. It is safe to pass a
NULL {\tt Symbolic} pointer. These functions may be used for both real and
complex cases.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    klu_symbolic *Symbolic ;
    klu_common Common ;
    klu_free_symbolic (&Symbolic, &Common) ;   /* real or complex */


    #include "klu.h"
    klu_l_symbolic *Symbolic ;
    klu_l_common Common ;
    klu_l_free_symbolic (&Symbolic, &Common) ; /* real or complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_free\_numeric}: destroy the {\tt Numeric} object}
%-------------------------------------------------------------------------------

It is the user's responsibility to destroy the {\tt Numeric} object when it is
no longer needed, or else a memory leak will occur.
It is safe to pass a NULL
{\tt Numeric} pointer.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    klu_numeric *Numeric ;
    klu_common Common ;
    klu_free_numeric (&Numeric, &Common) ;   /* real */
    klu_z_free_numeric (&Numeric, &Common) ; /* complex */


    #include "klu.h"
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    klu_l_free_numeric (&Numeric, &Common) ;  /* real */
    klu_zl_free_numeric (&Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_sort}: sort the columns of L and U}
%-------------------------------------------------------------------------------

The {\tt klu\_factor} function creates a {\tt Numeric} object with factors
{\tt L} and {\tt U} stored in a compressed-column form (not the same data
structure as {\tt A}, but similar). The columns typically contain lists of
row indices in unsorted order. This function sorts these indices, for two
purposes: (1) to return {\tt L} and {\tt U} to MATLAB, which expects its
sparse matrices to have sorted columns, and (2) to slightly improve the
performance of subsequent calls to {\tt klu\_solve} and {\tt klu\_tsolve}.
Except within a MATLAB mexFunction (see {\tt KLU/MATLAB/klu\_mex.c}), the use
of this function is optional.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_sort (Symbolic, Numeric, &Common) ;   /* real */
    ok = klu_z_sort (Symbolic, Numeric, &Common) ; /* complex */


    #include "klu.h"
    SuiteSparse_long ok ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_sort (Symbolic, Numeric, &Common) ;  /* real */
    ok = klu_zl_sort (Symbolic, Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_flops}: determine the flop count}
%-------------------------------------------------------------------------------

This function determines the number of floating-point operations performed
when the matrix was factorized by {\tt klu\_factor} or {\tt klu\_refactor}.
The result is returned in {\tt Common.flops}.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_flops (Symbolic, Numeric, &Common) ;   /* real */
    ok = klu_z_flops (Symbolic, Numeric, &Common) ; /* complex */


    #include "klu.h"
    SuiteSparse_long ok ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_flops (Symbolic, Numeric, &Common) ;  /* real */
    ok = klu_zl_flops (Symbolic, Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_rgrowth}: determine the pivot growth}
%-------------------------------------------------------------------------------

Computes the reciprocal pivot growth,
$\mbox{\em rgrowth} = \min_j (( \max_i |c_{ij}| ) / ( \max_i |u_{ij}| ))$,
where $c_{ij}$ is a scaled entry in a diagonal block of the block triangular
form. In MATLAB notation:
\begin{verbatim}
    rgrowth = min (max (abs ((R \ A (p,q)) - F)) ./ max (abs (U)))
\end{verbatim}
where the factorization is \verb'L*U + F = R \ A(p,q)'.
This function returns 0 if an error occurred, 1 otherwise. If {\tt rgrowth}
is very small, an inaccurate factorization may have been performed. The
inputs {\tt Ap}, {\tt Ai}, and {\tt Ax} ({\tt Az} in the complex case) must
be unchanged since the last call to {\tt klu\_factor} or {\tt klu\_refactor}.
The result is returned in {\tt Common.rgrowth}.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok, Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_rgrowth (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;   /* real */
    ok = klu_z_rgrowth (Ap, Ai, Az, Symbolic, Numeric, &Common) ; /* complex */


    #include "klu.h"
    SuiteSparse_long ok, Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_rgrowth (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;  /* real */
    ok = klu_zl_rgrowth (Ap, Ai, Az, Symbolic, Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_condest}: accurate condition number estimation}
%-------------------------------------------------------------------------------

This function is essentially the same as the MATLAB {\tt condest} function.
It computes an estimate of the 1-norm condition number, using Hager's method
\cite{Hager84} and the generalization by Higham and Tisseur
\cite{HighamTisseur00}. The inputs {\tt Ap} and {\tt Ax} ({\tt Az} in the
complex case) must be unchanged since the last call to {\tt klu\_factor} or
{\tt klu\_refactor}. The result is returned in {\tt Common.condest}.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok, Ap [n+1] ;
    double Ax [nz], Az [2*nz] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_condest (Ap, Ax, Symbolic, Numeric, &Common) ;   /* real */
    ok = klu_z_condest (Ap, Az, Symbolic, Numeric, &Common) ; /* complex */


    #include "klu.h"
    SuiteSparse_long ok, Ap [n+1] ;
    double Ax [nz], Az [2*nz] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_condest (Ap, Ax, Symbolic, Numeric, &Common) ;  /* real */
    ok = klu_zl_condest (Ap, Az, Symbolic, Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_rcond}: cheap reciprocal condition number estimation}
%-------------------------------------------------------------------------------

This function returns the smallest diagonal entry of {\tt U} divided by the
largest, which is a very crude estimate of the reciprocal of the condition
number of the matrix {\tt A}. It is very cheap to compute, however.
In MATLAB notation, {\tt rcond = min (abs (diag (U))) / max (abs (diag (U)))}.
If the matrix is singular, {\tt rcond} will be zero. The result is returned
in {\tt Common.rcond}.


{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_rcond (Symbolic, Numeric, &Common) ;   /* real */
    ok = klu_z_rcond (Symbolic, Numeric, &Common) ; /* complex */


    #include "klu.h"
    SuiteSparse_long ok ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_rcond (Symbolic, Numeric, &Common) ;  /* real */
    ok = klu_zl_rcond (Symbolic, Numeric, &Common) ; /* complex */
\end{verbatim}
}


%-------------------------------------------------------------------------------
\subsection{{\tt klu\_scale}: scale and check a sparse matrix}
%-------------------------------------------------------------------------------

This function computes the row scaling factors of a matrix and checks to see
if it is a valid sparse matrix. It can perform two kinds of scaling,
computing either the largest magnitude in each row, or the sum of the
magnitudes of the entries in each row. KLU calls this function itself,
depending upon the {\tt Common.scale} parameter, where {\tt scale < 0} means
no scaling, {\tt scale = 1} means the sum, and {\tt scale = 2} means the
maximum. That is, in MATLAB notation, {\tt Rs = sum (abs (A'))} or
{\tt Rs = max (abs (A'))}. KLU then divides each row of {\tt A} by its
corresponding scale factor. The function returns 0 if the matrix is invalid,
or 1 otherwise. A valid sparse matrix must meet the following conditions:

\begin{enumerate}
\item {\tt n > 0}. Note that KLU does not handle empty (0-by-0) matrices.
\item {\tt Ap}, {\tt Ai}, and {\tt Ax} ({\tt Az} for the complex case) must
not be NULL.
\item {\tt Ap [0] = 0}, and {\tt Ap [j] <= Ap [j+1]} for all {\tt j} in the
range 0 to {\tt n-1}.
\item The row indices in each column, {\tt Ai [Ap [j] ... Ap [j+1]-1]}, must
be in the range 0 to {\tt n-1}, and no duplicates can appear. If the
workspace {\tt W} is NULL on input, the check for duplicate entries is
skipped.
\end{enumerate}

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int scale, ok, n, Ap [n+1], Ai [nz], W [n] ;
    double Ax [nz], Az [2*nz], Rs [n] ;
    klu_common Common ;
    ok = klu_scale (scale, n, Ap, Ai, Ax, Rs, W, &Common) ;   /* real */
    ok = klu_z_scale (scale, n, Ap, Ai, Az, Rs, W, &Common) ; /* complex */


    #include "klu.h"
    SuiteSparse_long scale, ok, n, Ap [n+1], Ai [nz], W [n] ;
    double Ax [nz], Az [2*nz], Rs [n] ;
    klu_l_common Common ;
    ok = klu_l_scale (scale, n, Ap, Ai, Ax, Rs, W, &Common) ;  /* real */
    ok = klu_zl_scale (scale, n, Ap, Ai, Az, Rs, W, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_extract}: extract the LU factorization}
%-------------------------------------------------------------------------------

This function extracts the LU factorization into a set of data structures
suitable for passing back to MATLAB, with matrices in conventional
compressed-column form. The {\tt klu\_sort} function should be called first
if the row indices should be returned sorted. The factorization is returned
in caller-provided arrays; if any of them are NULL, that part of the
factorization is not extracted (this is not an error). Returns 1 if
successful, 0 otherwise.

The arrays {\tt Li}, {\tt Lx}, and {\tt Lz} are of size {\tt Numeric->lnz},
the arrays {\tt Ui}, {\tt Ux}, and {\tt Uz} are of size {\tt Numeric->unz},
and the arrays {\tt Fi}, {\tt Fx}, and {\tt Fz} are of size
{\tt Numeric->nzoff}.
Note that in the complex versions, the real and imaginary parts are returned
in separate arrays, to be compatible with how MATLAB stores complex matrices.

This function is not required to solve a linear system with KLU. KLU does
not itself make use of the extracted LU factorization returned by this
function. It is only provided to simplify the MATLAB interface to KLU, and
it may be of use to the end user who wishes to examine the contents of the
LU factors.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok, Lp [n+1], Li [lnz], Up [n+1], Ui [unz], Fp [n+1], Fi [nzoff],
        P [n], Q [n], R [n] ;
    double Lx [lnz], Lz [lnz], Ux [unz], Uz [unz], Fx [nzoff], Fz [nzoff], Rs [n] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_extract (Numeric, Symbolic,
        Lp, Li, Lx, Up, Ui, Ux, Fp, Fi, Fx, P, Q, Rs, R, &Common) ;     /* real */
    ok = klu_z_extract (Numeric, Symbolic,
        Lp, Li, Lx, Lz, Up, Ui, Ux, Uz, Fp, Fi, Fx, Fz,
        P, Q, Rs, R, &Common) ;                                         /* complex */


    #include "klu.h"
    SuiteSparse_long ok, Lp [n+1], Li [lnz], Up [n+1], Ui [unz], Fp [n+1],
        Fi [nzoff], P [n], Q [n], R [n] ;
    double Lx [lnz], Lz [lnz], Ux [unz], Uz [unz], Fx [nzoff], Fz [nzoff], Rs [n] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_extract (Numeric, Symbolic,
        Lp, Li, Lx, Up, Ui, Ux, Fp, Fi, Fx, P, Q, Rs, R, &Common) ;     /* real */
    ok = klu_zl_extract (Numeric, Symbolic,
        Lp, Li, Lx, Lz, Up, Ui, Ux, Uz, Fp, Fi, Fx, Fz,
        P, Q, Rs, R, &Common) ;                                         /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_malloc}, {\tt klu\_free}, {\tt klu\_realloc}:
memory management}
%-------------------------------------------------------------------------------

KLU uses a set of wrapper routines for {\tt malloc}, {\tt free}, and
{\tt realloc}. By default, these wrapper routines call the ANSI C versions
of these functions.
However, the function pointers in {\tt Common} can be
modified after calling {\tt klu\_defaults} to allow the use of other memory
management functions (such as the MATLAB {\tt mxMalloc}, {\tt mxFree}, and
{\tt mxRealloc}). These wrapper functions keep track of the current and peak
memory usage of KLU. They can be called by the user.

{\tt klu\_malloc} is essentially the same as {\tt p = malloc (n * size)}.
{\tt klu\_free} is essentially the same as {\tt free (p)}, except that
{\tt klu\_free} returns NULL, which can then be assigned to {\tt p}.
{\tt klu\_realloc} is similar to {\tt realloc}, except that if the
reallocation fails, {\tt p} is returned unchanged. Failure conditions are
returned in {\tt Common.status}.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    size_t n, nnew, nold, size ;
    void *p ;
    klu_common Common ;
    p = klu_malloc (n, size, &Common) ;
    p = klu_free (p, n, size, &Common) ;
    p = klu_realloc (nnew, nold, size, p, &Common) ;


    #include "klu.h"
    size_t n, nnew, nold, size ;
    void *p ;
    klu_l_common Common ;
    p = klu_l_malloc (n, size, &Common) ;
    p = klu_l_free (p, n, size, &Common) ;
    p = klu_l_realloc (nnew, nold, size, p, &Common) ;
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt btf\_maxtrans}: maximum transversal}
%-------------------------------------------------------------------------------

The BTF package includes three user-callable functions (each with {\tt int}
and {\tt SuiteSparse\_long} versions). They do not need to be called
directly by an application that uses KLU. KLU calls these functions itself
to perform the permutation into upper block triangular form.

The {\tt btf\_maxtrans} function finds a column permutation {\tt Q} that
gives {\tt A*Q} a zero-free diagonal, if one exists.
If row {\tt i} is matched to
column {\tt j}, then {\tt Match [i] = j}. If the matrix is structurally
singular, there will be some unmatched rows. If row {\tt i} is unmatched,
then {\tt Match [i] = -1}. If the matrix is square and structurally
non-singular, then {\tt Q = Match} is the column permutation. The
{\tt btf\_maxtrans} function can accept a rectangular matrix as input; it
operates on the bipartite graph of {\tt A}. It returns the number of columns
matched. Unlike the KLU user-callable functions, the BTF functions do not
check their inputs at all; a segmentation fault will occur if any input
pointers are NULL, for example.

The function can require up to $O$({\tt n*nnz(A)}) time (excluding the
{\em cheap match} phase, which takes another $O$({\tt nnz(A)}) time). If
{\tt maxwork > 0} on input, the work is limited to
$O$({\tt maxwork*nnz(A)}) (excluding the cheap match), but the maximum
transversal might not be found if the limit is reached.

The {\tt Work} array is workspace required by the method; its contents
are undefined on input and output.

{\footnotesize
\begin{verbatim}
    int nrow, ncol, Ap [ncol+1], Ai [nz], Match [nrow], Work [5*ncol], nmatch ;
    double maxwork, work ;
    nmatch = btf_maxtrans (nrow, ncol, Ap, Ai, maxwork, &work, Match, Work) ;


    SuiteSparse_long nrow, ncol, Ap [ncol+1], Ai [nz], Match [nrow],
        Work [5*ncol], nmatch ;
    double maxwork, work ;
    nmatch = btf_l_maxtrans (nrow, ncol, Ap, Ai, maxwork, &work, Match, Work) ;
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt btf\_strongcomp}: strongly connected components}
%-------------------------------------------------------------------------------

The {\tt btf\_strongcomp} function finds the strongly connected components
of a directed graph, returning a symmetric permutation {\tt P}.
The matrix
{\tt A} must be square. The diagonal of {\tt A} (or {\tt A*Q} if a column
permutation is given on input) is ignored. If {\tt Q} is NULL on input, the
matrix {\tt P*A*P'} is in upper block triangular form. Otherwise, {\tt Q} is
modified on output so that {\tt P*A*Q} is in upper block triangular form.
The vector {\tt R} gives the block boundaries, where the {\tt k}th block
consists of rows and columns {\tt R [k]} through {\tt R [k+1]-1} of the
permuted matrix. The function returns the number of strongly connected
components found (the number of diagonal blocks in the block triangular
form).

{\footnotesize
\begin{verbatim}
    int n, Ap [n+1], Ai [nz], Q [n], P [n], R [n+1], Work [4*n], ncomp ;
    ncomp = btf_strongcomp (n, Ap, Ai, Q, P, R, Work) ;


    SuiteSparse_long n, Ap [n+1], Ai [nz], Q [n], P [n], R [n+1],
        Work [4*n], ncomp ;
    ncomp = btf_l_strongcomp (n, Ap, Ai, Q, P, R, Work) ;
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt btf\_order}: permutation to block triangular form}
%-------------------------------------------------------------------------------

The {\tt btf\_order} function combines the above two functions, first
finding a maximum transversal and then permuting the resulting matrix into
upper block triangular form. Unlike {\tt dmperm} in MATLAB, it always
reveals the maximum matching along the diagonal, even if the matrix is
structurally singular.

On output, {\tt P} and {\tt Q} are the row and column permutations, where
{\tt i = P [k]} if row {\tt i} of {\tt A} is the {\tt k}th row of
{\tt P*A*Q}, and {\tt j = BTF\_UNFLIP (Q [k])} if column {\tt j} of {\tt A}
is the {\tt k}th column of {\tt P*A*Q}. If {\tt Q [k] < 0}, then the
{\tt (k,k)}th entry of {\tt P*A*Q} is structurally zero.
The vector {\tt R}, and the
return value, are the same as in {\tt btf\_strongcomp}.

{\footnotesize
\begin{verbatim}
    int n, Ap [n+1], Ai [nz], P [n], Q [n], R [n+1], nfound, Work [5*n], ncomp ;
    double maxwork, work ;
    ncomp = btf_order (n, Ap, Ai, maxwork, &work, P, Q, R, &nfound, Work) ;


    SuiteSparse_long n, Ap [n+1], Ai [nz], P [n], Q [n], R [n+1], nfound,
        Work [5*n], ncomp ;
    double maxwork, work ;
    ncomp = btf_l_order (n, Ap, Ai, maxwork, &work, P, Q, R, &nfound, Work) ;
\end{verbatim}
}

%------------------------------------------------------------------------------
\subsection{Sample C programs that use KLU}
%------------------------------------------------------------------------------

Here is a simple main program, {\tt klu\_simple.c}, that illustrates the
basic usage of KLU. It uses KLU, and indirectly makes use of BTF and AMD.
COLAMD is required to compile the demo, but it is not called by this example.
It uses statically defined global variables for the sparse matrix {\tt A},
which would not be typical of a complete application. It just makes for a
simpler example.

{\footnotesize
\input{klu_simple_c.tex}
}

The {\tt Ap}, {\tt Ai}, and {\tt Ax} arrays represent the matrix
\[
A = \left[
\begin{array}{ccccc}
 2 &  3 &  0 & 0 & 0 \\
 3 &  0 &  4 & 0 & 6 \\
 0 & -1 & -3 & 2 & 0 \\
 0 &  0 &  1 & 0 & 0 \\
 0 &  4 &  2 & 0 & 1 \\
\end{array}
\right].
\]
The solution to $Ax=b$ is $x = [1 \, 2 \, 3 \, 4 \, 5]^T$. The program
uses default control settings (no scaling, permutation to block triangular
form, and the AMD ordering). It ignores the error codes in the return values
and {\tt Common.status}.

The block triangular form found by {\tt btf\_order} for this matrix is
given below:
\[
PAQ = \left[
\begin{array}{c|ccc|c}
2 & 0 & 0 & -1 & -3 \\
\hline
  & 2 & 0 &  3 &  0 \\
  & 3 & 6 &  0 &  4 \\
  & 0 & 1 &  4 &  1 \\
\hline
  &   &   &    &  1 \\
\end{array}
\right].
\]
This ordering is not modified by the AMD ordering, because the 3-by-3 matrix
$A_{22} + A_{22}^T$ happens to be a dense matrix. No partial pivoting happens
to occur during LU factorization; all pivots are selected along the diagonal
of each block. The matrix contains two singletons, which are the original
entries $a_{34}=2$ and $a_{43}=1$, and one 3-by-3 diagonal block (in which a
single fill-in entry occurs during factorization: the $u_{23}$ entry of this
3-by-3 matrix).

For a more complete program that uses KLU, see {\tt KLU/Demo/kludemo.c} for
an {\tt int} version, and {\tt KLU/Demo/kluldemo.c} for a version that uses
{\tt SuiteSparse\_long} instead. The top-level main routine uses CHOLMOD to
read in a compressed-column sparse matrix from a Matrix Market file, because
KLU does not include such a function. Otherwise, no CHOLMOD functions are
used. Unlike {\tt klu\_simple.c}, CHOLMOD is required to run the
{\tt kludemo.c} and {\tt kluldemo.c} programs.

%------------------------------------------------------------------------------
\section{Installation}
\label{Install}
%------------------------------------------------------------------------------

Installation of the C-callable interface requires the {\tt make} utility in
Linux/Unix. Alternatively, you can use the Cygwin {\tt make} in Windows.
The MATLAB installation on any platform, including Windows, is simple: just
type {\tt klu\_install} to compile and install KLU, BTF, AMD, and COLAMD.

For {\tt make}, system-dependent configurations are in the
{\tt ../SuiteSparse\_config/SuiteSparse\_config.mk} file. You can edit that
file to customize the compilation, but it now automatically detects your
system (Linux, Mac, etc.), so it is not likely that you will need to edit
that file.

To compile and install the C-callable KLU, BTF, AMD, and COLAMD libraries,
go to the {\tt KLU} directory and type {\tt make}. The KLU and BTF libraries
are placed in {\tt KLU/Lib/libklu.*} and {\tt BTF/Lib/libbtf.*}. Two KLU
demo programs will be compiled and tested in the {\tt KLU/Demo} directory.
You can compare the output of {\tt make} with the results in the KLU
distribution, {\tt kludemo.out}.

Typing {\tt make clean} will remove all but the final compiled libraries and
demo programs. Typing {\tt make purge} or {\tt make distclean} removes all
files not in the original distribution. If you compile KLU or BTF and then
later change the {\tt ../SuiteSparse\_config/SuiteSparse\_config.mk} file,
then you should type {\tt make purge} and then {\tt make} to recompile.

When you compile a program that uses the C-callable KLU library, you need to
link against the {\tt KLU/Lib/libklu.*}, {\tt BTF/Lib/libbtf.*},
{\tt AMD/Lib/libamd.*}, and {\tt COLAMD/Lib/libcolamd.*} libraries, and you
need to tell your compiler to look in the {\tt KLU/Include} and
{\tt BTF/Include} directories for include files. Alternatively, do
{\tt make install}, and KLU will be installed in {\tt /usr/local/lib} and
{\tt /usr/local/include}, with documentation placed in {\tt /usr/local/doc}.
These installation locations can be changed; see
{\tt SuiteSparse/README.txt} for details.

If all you want to use is the KLU mexFunction in MATLAB, you can skip the use
of the {\tt make} command entirely. Simply type {\tt klu\_install} in the
MATLAB command window while in the {\tt KLU/MATLAB} directory.
This works
on any system with MATLAB, including Windows.

%------------------------------------------------------------------------------
\newpage
\section{The KLU routines}
\label{klu_include}
%------------------------------------------------------------------------------

The file {\tt KLU/Include/klu.h} listed below describes each user-callable
routine in the C version of KLU, and gives details on their use.

{\footnotesize
\input{klu_h.tex}
}

%------------------------------------------------------------------------------
\newpage
\section{The BTF routines}
\label{btf_include}
%------------------------------------------------------------------------------

The file {\tt BTF/Include/btf.h} listed below describes each user-callable
routine in the C version of BTF, and gives details on their use.

{\footnotesize
\input{btf_h.tex}
}

%------------------------------------------------------------------------------
\newpage
% References
%------------------------------------------------------------------------------

\bibliographystyle{plain}
\bibliography{KLU_UserGuide}

\end{document}