\documentclass[11pt]{article}

\newcommand{\m}[1]{{\bf{#1}}}       % for matrices and vectors
\newcommand{\tr}{^{\sf T}}          % transpose

\topmargin 0in
\textheight 9in
\oddsidemargin 0pt
\evensidemargin 0pt
\textwidth 6.5in

%------------------------------------------------------------------------------
\begin{document}
%------------------------------------------------------------------------------

\title{User Guide for KLU and BTF}
\author{
Timothy A. Davis\thanks{
DrTimothyAldenDavis@gmail.com,
http://www.suitesparse.com.
This work was supported by Sandia National Labs, and the National
Science Foundation.
Portions of the work were done while on sabbatical at Stanford University
and Lawrence Berkeley National Laboratory (with funding from Stanford
University and the SciDAC program).
}
\and Eka Palamadai Natarajan}

\date{VERSION 1.3.9, Mar 12, 2018}
\maketitle
%------------------------------------------------------------------------------
\begin{abstract}
KLU is a set of routines for solving sparse linear systems of equations.
It is particularly well-suited to matrices arising in SPICE-like circuit
simulation applications.
It relies on a permutation to block triangular form (BTF), several methods
for finding a fill-reducing ordering (variants of approximate minimum degree,
and nested dissection), and a sparse left-looking LU factorization method
to factorize each block.  A MATLAB interface is included.
KLU appears as Collected Algorithm 907 of the ACM \cite{DavisNatarajan10}.
\end{abstract}
%------------------------------------------------------------------------------

\newpage
\tableofcontents
\newpage
%------------------------------------------------------------------------------
\section{License and Copyright}
%------------------------------------------------------------------------------

KLU, Copyright \copyright\ 2004--2011 University of Florida.
All Rights Reserved.
KLU is available under alternate licenses; contact T. Davis for details.

{\bf KLU License:} see KLU/Doc/License.txt for the license.

{\bf Availability:} {\tt http://www.suitesparse.com}

{\bf Acknowledgments:}

    This work was supported by Sandia National Laboratories (Mike Heroux)
    and the National Science Foundation under grants 0203270 and 0620286.

%------------------------------------------------------------------------------
\newpage
\section{Overview}
%------------------------------------------------------------------------------

KLU is a set of routines for solving sparse linear systems of equations.  It
first permutes the matrix into upper block triangular form, via the BTF
package.  This is done by first finding a permutation for a zero-free diagonal
(a maximum transversal) \cite{Duff81}.  If there is no such permutation, then
the matrix is structurally rank-deficient, and is numerically singular.  Next,
Tarjan's method \cite{Duff78a,Tarjan72} is used to find the strongly-connected
components of the graph.  The block triangular form is essentially unique; any
method will lead to the same number and sizes of blocks, although the ordering
of the blocks may vary (consider a diagonal matrix, for example).  Assuming the
matrix has full structural rank, the permuted matrix has the following form:
\[
PAQ =
\left[
\begin{array}{ccc}
A_{11} & \cdots & A_{1k} \\
       & \ddots & \vdots \\
       &        & A_{kk} \\
\end{array}
\right],
\]
where each diagonal block is square with a zero-free diagonal.

Next, each diagonal block is factorized with a sparse left-looking method
\cite{GilbertPeierls88}.  The kernel of this factorization method is an
efficient method for solving $Lx=b$ when $L$, $x$, and $b$ are all sparse.
This kernel is used to compute each column of $L$ and $U$, one column at a time.
The total work performed by this method is always proportional to the number of
floating-point operations, something that is not true of any other sparse LU
factorization method.

Prior to factorizing each diagonal block, the blocks are ordered to reduce
fill-in.  By default, the symmetric approximate minimum degree (AMD) ordering
is used on $A_{ii}+A_{ii}^T$ \cite{AmestoyDavisDuff96,AmestoyDavisDuff03}.
Another ordering option is to find a column ordering via COLAMD
\cite{DavisGilbertLarimoreNg00_algo,DavisGilbertLarimoreNg00}.
Alternatively, a permutation can be provided by the user, or a pointer to
a user-provided ordering function can be passed, which is then used to order
each block.

Only the diagonal blocks need to be factorized.  Consider a linear system where
the matrix is permuted into three blocks, for example:
\[
\left[
\begin{array}{ccc}
A_{11} & A_{12} & A_{13} \\
       & A_{22} & A_{23} \\
       &        & A_{33} \\
\end{array}
\right]
\left[
\begin{array}{c}
x_{1} \\
x_{2} \\
x_{3} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
b_{1} \\
b_{2} \\
b_{3} \\
\end{array}
\right].
\]

The non-singular system $A_{33} x_{3} = b_{3}$ can first be solved for $x_{3}$.
After a block back substitution, the resulting system becomes
\[
\left[
\begin{array}{cc}
A_{11} & A_{12} \\
       & A_{22} \\
\end{array}
\right]
\left[
\begin{array}{c}
x_{1} \\
x_{2} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
b_{1} - A_{13} x_{3}\\
b_{2} - A_{23} x_{3}\\
\end{array}
\right]
=
\left[
\begin{array}{c}
b'_{1} \\
b'_{2} \\
\end{array}
\right]
\]
and the $A_{22} x_{2} = b'_{2}$ system can be solved for $x_{2}$.  The primary
advantage of this method is that no fill-in occurs in the off-diagonal blocks
($A_{12}$, $A_{13}$, and $A_{23}$).  This is particularly critical for sparse
linear systems arising in SPICE-like circuit simulation
\cite{Kundert86,KundertSangiovanniVincentelli85,NagelPederson,Quarles:M89/42}.
Circuit matrices are typically permutable into block triangular form, with many
singletons (1-by-1 blocks).  They also often have a handful of rows and columns
with many nonzero entries, due to voltage and current sources.  These rows and
columns are pushed into the upper block triangular form, and are related to the
singleton blocks (for example, $A_{33}$ in the above system is 1-by-1, and the
column in $A_{13}$ and $A_{23}$ has many nonzero entries).  Since these
nearly-dense rows and columns do not appear in the LU factorization of the
diagonal blocks, they cause no fill-in.

The structural rank of a matrix is based solely on the pattern of its entries,
not their numerical values.  With random entries, the structural and numerical
ranks are equal with probability one.  The structural rank of any matrix is an
upper bound on the numerical rank.  The maximum transversal algorithm in the
BTF package is useful in determining if a matrix is structurally rank
deficient, and if so, it reveals a (non-unique) set of rows and columns that
contribute to that rank deficiency.  This is useful in determining what parts
of a circuit are poorly formulated (such as dangling components).

When ordered and factorized with KLU, very little fill-in occurs in the
resulting LU factors, which means that there is little scope for use of the
BLAS \cite{ACM679a}.  Sparse LU factorization methods that use the BLAS (such
as SuperLU \cite{SuperLU99} and UMFPACK \cite{Davis03_algo,Davis03}) are slower
than KLU when applied to sparse matrices arising in circuit simulation.  Both
KLU and SuperLU are based on the left-looking method of Gilbert and Peierls
\cite{GilbertPeierls88}.  SuperLU uses supernodes, but KLU does not; thus the
name {\em KLU} refers to a ``Clark Kent'' LU factorization algorithm (what
SuperLU was before it became Super).

For details of the permutation to block triangular form, left-looking sparse
LU factorization, and approximate minimum degree, refer to \cite{Davis06book}.
Concise versions of these methods can be found in the CSparse package.  KLU is
also the topic of a Master's thesis by Palamadai Natarajan \cite{Palamadai05};
a copy of the thesis can be found in the {\tt KLU/Doc} directory.  It includes
a description of an earlier version of KLU; some of the function names and
parameter lists have changed in this version.  The descriptions of the methods
used still apply to the current version of KLU, however.

KLU appears as {\em Algorithm 907: KLU, a direct sparse solver for circuit
simulation problems}, ACM Transactions on Mathematical Software, vol.~37,
no.~3, 2010.
%------------------------------------------------------------------------------
\section{Availability}
%------------------------------------------------------------------------------

KLU and its required ordering packages (BTF, COLAMD, AMD, and
SuiteSparse\_config) are
available at \newline {\tt http://www.suitesparse.com.}  In
addition, KLU can make use of any user-provided ordering function.  One such
function is included, which provides KLU with an interface to the ordering
methods used in CHOLMOD \cite{ChenDavisHagerRajamanickam06}, such as METIS, a
nested dissection method \cite{KarypisKumar98e}.  After permutation to block
triangular form, circuit matrices have very good node separators, and are thus
excellent candidates for nested dissection.  The METIS ordering takes much more
time to compute than the AMD ordering, but if the ordering is reused many times
(typical in circuit simulation) the better-quality ordering can pay off in
lower total simulation time.

To use KLU, you must obtain the KLU, BTF, SuiteSparse\_config,
AMD, and COLAMD packages
in the SuiteSparse suite of sparse matrix libraries.  See
{\tt http://www.suitesparse.com} for each of these packages.
They are also all contained within the single {\tt SuiteSparse.zip} or
{\tt SuiteSparse.tar.gz} distribution.
%------------------------------------------------------------------------------
\section{Using KLU and BTF in MATLAB}
%------------------------------------------------------------------------------

KLU has a single MATLAB interface which provides several options for factorizing
a matrix and/or using the factors to solve a linear system.  The following is
a synopsis of its use.  For more details, type {\tt help klu} in MATLAB.

{\footnotesize
\begin{verbatim}
   LU = klu (A)            factorizes R\A(p,q) into L*U+F, returning a struct
   x = klu (A,'\',b)       x = A\b
   x = klu (b,'/',A)       x = b/A
   x = klu (LU,'\',b)      x = A\b, where LU = klu(A)
   x = klu (b,'/',LU)      x = b/A, where LU = klu(A)
\end{verbatim}
}

With a single input, {\tt klu(A)} returns a MATLAB struct containing the LU
factors.  The factorization is in the form \verb'L*U + F = R \ A(p,q)'
where {\tt L*U} is the LU factorization of just the diagonal blocks of the
block triangular form, {\tt F} is a sparse matrix containing the entries in
the off-diagonal blocks, {\tt R} is a diagonal matrix containing the row
scale factors, and {\tt p} and {\tt q} are permutation vectors.  The {\tt LU}
struct also contains a vector {\tt r} which describes the block boundaries
(the same as the third output parameter of {\tt dmperm}).  The {\tt k}th
block consists of rows and columns {\tt r(k)} to {\tt r(k+1)-1} in the
permuted matrix {\tt A(p,q)} and the factors {\tt L} and {\tt U}.

An optional final input argument ({\tt klu(A,opts)} for example) provides a
way of modifying KLU's user-definable parameters, including a partial pivoting
tolerance and ordering options.  A second output argument
({\tt [LU,info] = klu ( ... )}) provides statistics on the factorization.

The BTF package includes three user-callable MATLAB functions which replicate
most of the features of the MATLAB built-in {\tt dmperm} function, and provide
an additional option which can significantly limit the worst-case time taken
by {\tt dmperm}.  For more details, type {\tt help btf}, {\tt help maxtrans},
and {\tt help strongcomp} in MATLAB.  Additional information about how
these functions work can be found in \cite{Davis06book}.

{\footnotesize
\begin{verbatim}
   [p,q,r] = btf (A)         similar to [p,q,r] = dmperm (A)
   q = maxtrans (A)          similar to q = dmperm (A')
   [p,r] = strongcomp (A)    similar to [p,q,r] = dmperm (A + speye(n))
\end{verbatim}
}

Both {\tt btf} and {\tt maxtrans} accept a second, optional input,
{\tt maxwork}, which limits the total work performed in the maximum
transversal to {\tt maxwork * nnz(A)}.  The worst-case time taken by the
algorithm is $O$({\tt n * nnz(A)}), where the matrix {\tt A} is
{\tt n}-by-{\tt n}, but this worst-case time is rarely reached in practice.

To use the KLU and BTF functions in MATLAB, you must first compile and install
them.  In the {\tt BTF/MATLAB} directory, type {\tt btf\_install}, and then
type {\tt klu\_install} in the {\tt KLU/MATLAB} directory.  Alternatively, if
you have the entire SuiteSparse, simply run the {\tt SuiteSparse\_install}
function in the {\tt SuiteSparse} directory.

After running the installation scripts, type {\tt pathtool} and save your path
for future MATLAB sessions.  If you cannot save your path because of file
permissions, edit your {\tt startup.m} by adding {\tt addpath} commands (type
{\tt doc startup} and {\tt doc addpath} for more information).

%------------------------------------------------------------------------------
\section{Using KLU and BTF in a C program}
\label{Cversion}
%------------------------------------------------------------------------------

KLU and BTF include the following C-callable functions.  Each function is
available in two or four versions: with {\tt int} or {\tt long} integers, and
(for functions that deal with numerical values) with {\tt double} or complex
{\tt double} values.  The {\tt long} integer is actually a
{\tt SuiteSparse\_long}, which is typically a {\tt long}, defined with a
{\tt \#define} statement.  It becomes an {\tt \_\_int64} on Microsoft Windows
64, however.

The usage of real and complex, and {\tt int} and {\tt SuiteSparse\_long}, must
not be mixed, except that some functions can be used for both real and complex
cases.

%------------------------------------------------------------------------------
\subsection{KLU Common object}
%------------------------------------------------------------------------------

The {\tt klu\_common} object ({\tt klu\_l\_common} for the {\tt SuiteSparse\_long}
version) contains user-definable parameters and statistics returned from
KLU functions.  This object appears in every KLU function as the last
parameter.  Details are given in the {\tt klu.h} include file, which also
appears below in Section~\ref{klu_include}.  Primary parameters and statistics
are summarized below.  The defaults are chosen specifically for circuit
simulation matrices.
\begin{itemize}
\item {\tt tol}: partial pivoting tolerance.  If the diagonal entry has a
magnitude greater than or equal to {\tt tol} times the largest magnitude
of entries in the pivot column, then the diagonal entry is chosen.
Default value: 0.001.

\item {\tt ordering}: which fill-reducing ordering to use: 0 for AMD,
1 for COLAMD, 2 for a user-provided permutation {\tt P} and {\tt Q}
(or a natural ordering if {\tt P} and {\tt Q} are NULL), or 3 for
the {\tt user\_order} function.  Default: 0 (AMD).

\item {\tt scale}: whether or not the matrix should be scaled.
If {\tt scale < 0}, then no scaling is performed and the input matrix
is not checked for errors.  If {\tt scale >= 0}, the input matrix is
checked for errors.
If {\tt scale=0}, then no scaling is performed.
If {\tt scale=1}, then each row of {\tt A} is divided by the sum of
the absolute values in that row.
If {\tt scale=2}, then each row of {\tt A} is divided by the maximum
absolute value in that row.  Default: 2.

\item {\tt btf}:  if nonzero, then BTF is used to permute the input matrix
into block upper triangular form.  This step is skipped if {\tt Common.btf}
is zero.  Default: 1.

\item {\tt maxwork}: sets an upper limit on the amount of work performed in
{\tt btf\_maxtrans} to \newline {\tt maxwork*nnz(A)}.  If the limit is reached,
a partial zero-free diagonal might be found.  This has no effect on whether or
not the matrix can be factorized, since the matrix can be factorized with no
BTF pre-ordering at all.  This option provides a tradeoff between the
effectiveness of the BTF ordering and the cost to compute it.  A partial result
can lead to fewer, and larger, blocks in the BTF form, resulting in more work
required to factorize the matrix.  No limit is enforced if {\tt maxwork <= 0}.
Default: 0.
\item {\tt user\_order}: a pointer to a function that can be provided by the
application that uses KLU, to redefine the fill-reducing ordering used by KLU
for each diagonal block.  The {\tt int} and {\tt SuiteSparse\_long} prototypes
must be as follows:

{\footnotesize
\begin{verbatim}
int my_ordering_function (int n, int Ap [ ], int Ai [ ], int Perm [ ], klu_common *Common) ;


SuiteSparse_long my_long_ordering_function (SuiteSparse_long n,
    SuiteSparse_long Ap [ ], SuiteSparse_long Ai [ ],
    SuiteSparse_long Perm [ ], klu_l_common *Common) ;
\end{verbatim}
}

The function should return 0 if an error occurred, and either -1 or a positive
(nonzero) value if no error occurred.  If greater than zero, then the return
value is interpreted by KLU as an estimate of the number of nonzeros in $L$ or
$U$ (whichever is greater), when the permuted matrix is factorized.  Only an
estimate is possible, since partial pivoting with row interchanges is performed
during numerical factorization.  The input matrix is provided to the function
in the parameters {\tt n}, {\tt Ap}, and {\tt Ai}, in compressed-column form.
The matrix {\tt A} is {\tt n}-by-{\tt n}.  The {\tt Ap} array is of size {\tt
n+1}; the {\tt j}th column of {\tt A} has row indices {\tt Ai[Ap[j] ...
Ap[j+1]-1]}.  The {\tt Ai} array is of size {\tt Ap[n]}.  The first column
pointer {\tt Ap[0]} is zero.  The row indices might not appear sorted in each
column, but no duplicates will appear.

The output permutation is to be passed back in the {\tt Perm} array, where
{\tt Perm[k]=j} means that row and column {\tt j} of {\tt A} will appear as
the {\tt k}th row and column of the permuted matrix factorized by KLU.  The
{\tt Perm} array is already allocated when it is passed to the user function.

The user function may use, and optionally modify, the contents of the {\tt
klu\_common Common} object.  In particular, prior to calling KLU, the user
application can set both {\tt Common.user\_order} and {\tt Common.user\_data}.
The latter is a {\tt void *} pointer that KLU does not use, except to pass to
the user ordering function pointed to by {\tt Common.user\_order}.  This is a
mechanism for passing additional arguments to the user function.

An example user function is provided in the {\tt KLU/User} directory, which
provides an interface to the ordering method in CHOLMOD.

\end{itemize}

%------------------------------------------------------------------------------
\subsection{KLU Symbolic object}
%------------------------------------------------------------------------------

KLU performs its sparse LU factorization in two steps.  The first is purely
symbolic, and does not depend on the numerical values.  This analysis returns a
{\tt klu\_symbolic} object ({\tt klu\_l\_symbolic} in the {\tt SuiteSparse\_long}
version).  The {\tt Symbolic} object contains a pre-ordering which combines the
block triangular form with the fill-reducing ordering, and an estimate of the
number of nonzeros in the factors of each block.  Its size is thus modest, only
proportional to {\tt n}, the dimension of {\tt A}.  It can be reused multiple
times for the factorization of a sequence of matrices with identical nonzero
pattern.  Note: a {\em nonzero} in this sense is an entry present in the data
structure of {\tt A}; such entries may in fact be numerically zero.

%------------------------------------------------------------------------------
\subsection{KLU Numeric object}
%------------------------------------------------------------------------------

The {\tt Numeric} object contains the numeric sparse LU factorization, including
the final pivot permutations.  To solve a linear system, both the {\tt Symbolic}
and {\tt Numeric} objects are required.

%------------------------------------------------------------------------------
\subsection{A sparse matrix in KLU}
%------------------------------------------------------------------------------

% From here on, only the {\tt int} version is described.  In the {\tt SuiteSparse\_long}
% version, the function names change slightly ({\tt klu\_factor} becomes
% {\tt klu\_l\_factor}, and the {\tt int}/complex version {\tt klu\_z\_factor}
% becomes {\tt klu\_zl\_factor}, for example).  For more details on the
% {\tt SuiteSparse\_long} version, refer to Section~\ref{klu_include}.

The input matrix provided to KLU is in sparse compressed-column form, which is
the same data structure used internally in MATLAB, except that the version used
here allows the row indices to appear in any order, and also allows explicit
zero entries to appear in the data structure.  The same data structure is used
in CSparse, which is fully documented in \cite{Davis06book}.  If you wish to
use a simpler input data structure, consider creating a triplet matrix in
CSparse (or CXSparse if you use the long and/or complex versions of KLU), and
then convert it into a sparse compressed-column data structure using the
CSparse {\tt cs\_compress} and {\tt cs\_dupl} functions.  A similar function,
{\tt cholmod\_triplet\_to\_sparse}, is available in CHOLMOD.

The matrix is described with four parameters:

\begin{itemize}
\item {\tt n}: an integer scalar.  The matrix is {\tt n}-by-{\tt n}.  Note that
KLU only operates on square matrices.

\item {\tt Ap}: an integer array of size {\tt n+1}.  The first entry is {\tt
Ap[0]=0}, and the last entry {\tt nz=Ap[n]} is equal to the number of entries
in the matrix.

\item {\tt Ai}: an integer array of size {\tt nz = Ap[n]}.
The row indices of entries in column {\tt j} of {\tt A} are located in
{\tt Ai [Ap [j] ... Ap [j+1]-1]}.  The matrix is zero-based; row and column
indices are in the range 0 to {\tt n-1}.

\item {\tt Ax}: a {\tt double} array of size {\tt nz} for the real case, or
{\tt 2*nz} for the complex case.  For the complex case, the real and imaginary
parts are interleaved, compatible with Fortran and the ANSI C99 complex data
type.  KLU does not rely on the ANSI C99 data type, however, for portability
reasons.  The numerical values in column {\tt j} of a real matrix are located
in {\tt Ax [Ap [j] ... Ap [j+1]-1]}.  For a complex matrix, they appear in {\tt
Ax [2*Ap [j] ... 2*Ap [j+1]-1]}, as real/imaginary pairs (the real part appears
first, followed by the imaginary part).

\end{itemize}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_defaults}: set default parameters}
%-------------------------------------------------------------------------------

This function sets the default parameters for KLU and clears the statistics.
It may be used for either the real or complex cases.  A value of 0 is returned
if an error occurs, 1 otherwise.  This function {\bf must} be called before
any other KLU function can be called.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok ;
    klu_common Common ;
    ok = klu_defaults (&Common) ;                                             /* real or complex */


    #include "klu.h"
    SuiteSparse_long ok ;
    klu_l_common Common ;
    ok = klu_l_defaults (&Common) ;                                           /* real or complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_analyze}: order and analyze a matrix}
%-------------------------------------------------------------------------------

The following usage returns a {\tt Symbolic} object that contains the
fill-reducing ordering needed to factorize the matrix {\tt A}.  A NULL pointer
is returned if a failure occurs.  The error status for this function, and all
others, is returned in {\tt Common.status}.  These functions may be used for
both real and complex cases.  The AMD ordering is used if {\tt Common.ordering
= 0}, COLAMD is used if it is 1, the natural ordering is used if it is 2, and
the user-provided {\tt Common.user\_order} function is used if it is 3.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int n, Ap [n+1], Ai [nz] ;
    klu_symbolic *Symbolic ;
    klu_common Common ;
    Symbolic = klu_analyze (n, Ap, Ai, &Common) ;                             /* real or complex */


    #include "klu.h"
    SuiteSparse_long n, Ap [n+1], Ai [nz] ;
    klu_l_symbolic *Symbolic ;
    klu_l_common Common ;
    Symbolic = klu_l_analyze (n, Ap, Ai, &Common) ;                           /* real or complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_analyze\_given}: order and analyze a matrix}
%-------------------------------------------------------------------------------

In this routine, the fill-reducing ordering is provided by the user ({\tt
Common.ordering} is ignored).  Instead, the row permutation {\tt P} and column
permutation {\tt Q} are used.  These are integer arrays of size {\tt n}.  If
NULL, a natural ordering is used (so to provide just a column ordering, pass
{\tt Q} as non-NULL and {\tt P} as NULL).  A NULL pointer is returned if an
error occurs.  These functions may be used for both real and complex cases.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int n, Ap [n+1], Ai [nz], P [n], Q [n] ;
    klu_symbolic *Symbolic ;
    klu_common Common ;
    Symbolic = klu_analyze_given (n, Ap, Ai, P, Q, &Common) ;                 /* real or complex */


    #include "klu.h"
    SuiteSparse_long n, Ap [n+1], Ai [nz], P [n], Q [n] ;
    klu_l_symbolic *Symbolic ;
    klu_l_common Common ;
    Symbolic = klu_l_analyze_given (n, Ap, Ai, P, Q, &Common) ;               /* real or complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_factor}: numerical factorization}
%-------------------------------------------------------------------------------

The {\tt klu\_factor} function factorizes a matrix, using a sparse left-looking
method with threshold partial pivoting.  The inputs {\tt Ap} and {\tt Ai} must
be unchanged from the previous call to {\tt klu\_analyze} that created the {\tt
Symbolic} object.  A NULL pointer is returned if an error occurs.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    Numeric = klu_factor (Ap, Ai, Ax, Symbolic, &Common) ;                            /* real */
    Numeric = klu_z_factor (Ap, Ai, Az, Symbolic, &Common) ;                          /* complex */


    #include "klu.h"
    SuiteSparse_long Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    Numeric = klu_l_factor (Ap, Ai, Ax, Symbolic, &Common) ;                          /* real */
    Numeric = klu_zl_factor (Ap, Ai, Az, Symbolic, &Common) ;                         /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_solve}: solve a linear system}
%-------------------------------------------------------------------------------

Solves the linear system $Ax=b$, using the {\tt Symbolic} and {\tt Numeric}
objects.  The right-hand side {\tt B} is overwritten with the solution on
output.  The array {\tt B} is stored in column major order, with a leading
dimension of {\tt ldim}, and {\tt nrhs} columns.  Thus, the real entry $b_{ij}$
is stored in {\tt B [i+j*ldim]}, where {\tt ldim >= n} must hold.  A complex
entry $b_{ij}$ is stored in {\tt B [2*(i+j*ldim)]} and {\tt B [2*(i+j*ldim)+1]}
(for the real and imaginary parts, respectively).  Returns 1 if successful,
0 if an error occurs.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ldim, nrhs, ok ;
    double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_solve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;                      /* real */
    ok = klu_z_solve (Symbolic, Numeric, ldim, nrhs, Bz, &Common) ;                   /* complex */


    #include "klu.h"
    SuiteSparse_long ldim, nrhs, ok ;
    double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_l_solve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;                    /* real */
    ok = klu_zl_solve (Symbolic, Numeric, ldim, nrhs, Bz, &Common) ;                  /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_tsolve}: solve a transposed linear system}
%-------------------------------------------------------------------------------

Solves the linear system $A^Tx=b$ or $A^Hx=b$.  The {\tt conj\_solve} input,
present only in the complex versions, is 0 for $A^Tx=b$, or nonzero for
$A^Hx=b$.  Otherwise, these functions are identical to {\tt klu\_solve}.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ldim, nrhs, ok, conj_solve ;
    double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_tsolve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;                     /* real */
    ok = klu_z_tsolve (Symbolic, Numeric, ldim, nrhs, Bz, conj_solve, &Common) ;      /* complex */


    #include "klu.h"
    SuiteSparse_long ldim, nrhs, ok, conj_solve ;
    double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_l_tsolve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;                   /* real */
    ok = klu_zl_tsolve (Symbolic, Numeric, ldim, nrhs, Bz, conj_solve, &Common) ;     /* complex */
\end{verbatim}
}


%-------------------------------------------------------------------------------
\subsection{{\tt klu\_refactor}: numerical refactorization}
%-------------------------------------------------------------------------------

The {\tt klu\_refactor} function takes as input the {\tt Numeric} object
created by {\tt klu\_factor} (or as modified by a previous call to {\tt
klu\_refactor}).  It factorizes a new matrix with the same nonzero pattern as
that given to the call to {\tt klu\_factor} which created it.  The same pivot
order is used.  Since this can lead to numeric instability, the use of {\tt
klu\_rcond}, {\tt klu\_rgrowth}, or {\tt klu\_condest} is recommended to check
the accuracy of the resulting factorization.  The inputs {\tt Ap} and {\tt Ai}
must be unmodified since the call to {\tt klu\_factor} that first created the
{\tt Numeric} object.  This function is much faster than {\tt klu\_factor},
and requires no dynamic memory allocation.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok, Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_refactor (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;                      /* real */
    ok = klu_z_refactor (Ap, Ai, Az, Symbolic, Numeric, &Common) ;                    /* complex */


    #include "klu.h"
    SuiteSparse_long ok, Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_refactor (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;                    /* real */
    ok = klu_zl_refactor (Ap, Ai, Az, Symbolic, Numeric, &Common) ;                   /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_free\_symbolic}: destroy the {\tt Symbolic} object}
%-------------------------------------------------------------------------------

It is the user's responsibility to destroy the {\tt Symbolic} object when it is
no longer needed, or else a memory leak will occur.  It is safe to pass a NULL
{\tt Symbolic} pointer.  These functions may be used for both real and complex
cases.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    klu_symbolic *Symbolic ;
    klu_common Common ;
    klu_free_symbolic (&Symbolic, &Common) ;                                  /* real or complex */


    #include "klu.h"
    klu_l_symbolic *Symbolic ;
    klu_l_common Common ;
    klu_l_free_symbolic (&Symbolic, &Common) ;                                /* real or complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_free\_numeric}: destroy the {\tt Numeric} object}
%-------------------------------------------------------------------------------

It is the user's responsibility to destroy the {\tt Numeric} object when it is
no longer needed, or else a memory leak will occur.  It is safe to pass a NULL
{\tt Numeric} pointer.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    klu_numeric *Numeric ;
    klu_common Common ;
    klu_free_numeric (&Numeric, &Common) ;                                            /* real */
    klu_z_free_numeric (&Numeric, &Common) ;                                          /* complex */


    #include "klu.h"
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    klu_l_free_numeric (&Numeric, &Common) ;                                          /* real */
    klu_zl_free_numeric (&Numeric, &Common) ;                                         /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_sort}: sort the columns of L and U}
%-------------------------------------------------------------------------------

The {\tt klu\_factor} function creates a {\tt Numeric} object with factors
{\tt L} and {\tt U} stored in a compressed-column form (not the same data
structure as {\tt A}, but similar).  The columns typically contain lists of
row indices in unsorted order.  This function sorts these indices, for two
purposes:  (1) to return {\tt L} and {\tt U} to MATLAB, which expects its
sparse matrices to have sorted columns, and (2) to slightly improve the
performance of subsequent calls to {\tt klu\_solve} and {\tt klu\_tsolve}.
Except within a MATLAB mexFunction (see {\tt KLU/MATLAB/klu\_mex.c}), the use
of this function is optional.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_sort (Symbolic, Numeric, &Common) ;                                      /* real */
    ok = klu_z_sort (Symbolic, Numeric, &Common) ;                                    /* complex */


    #include "klu.h"
    SuiteSparse_long ok ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_sort (Symbolic, Numeric, &Common) ;                                    /* real */
    ok = klu_zl_sort (Symbolic, Numeric, &Common) ;                                   /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_flops}: determine the flop count}
%-------------------------------------------------------------------------------

This function determines the number of floating-point operations performed
when the matrix was factorized by {\tt klu\_factor} or {\tt klu\_refactor}.
The result is returned in {\tt Common.flops}.


{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_flops (Symbolic, Numeric, &Common) ;                                     /* real */
    ok = klu_z_flops (Symbolic, Numeric, &Common) ;                                   /* complex */


    #include "klu.h"
    SuiteSparse_long ok ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_flops (Symbolic, Numeric, &Common) ;                                   /* real */
    ok = klu_zl_flops (Symbolic, Numeric, &Common) ;                                  /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_rgrowth}: determine the pivot growth}
%-------------------------------------------------------------------------------

Computes the reciprocal pivot growth,
$\mbox{\em rgrowth} = \min_j (( \max_i |c_{ij}| ) / ( \max_i |u_{ij}| ))$,
where $c_{ij}$ is a scaled entry in a diagonal block of the block triangular
form.  In MATLAB notation:
\begin{verbatim}
    rgrowth = min (max (abs (R\A(p,q) - F)) ./ max (abs (U)))
\end{verbatim}
where the factorization is \verb'L*U + F = R \ A(p,q)'.
This function returns 0 if an error occurred, 1 otherwise.  If {\tt rgrowth} is
very small, an inaccurate factorization may have been performed.  The inputs
{\tt Ap}, {\tt Ai}, and {\tt Ax}  ({\tt Az} in the complex case) must be
unchanged since the last call to {\tt klu\_factor} or {\tt klu\_refactor}.  The
result is returned in {\tt Common.rgrowth}.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok, Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_rgrowth (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;                       /* real */
    ok = klu_z_rgrowth (Ap, Ai, Az, Symbolic, Numeric, &Common) ;                     /* complex */


    #include "klu.h"
    SuiteSparse_long ok, Ap [n+1], Ai [nz] ;
    double Ax [nz], Az [2*nz] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_rgrowth (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;                     /* real */
    ok = klu_zl_rgrowth (Ap, Ai, Az, Symbolic, Numeric, &Common) ;                    /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_condest}: accurate condition number estimation}
%-------------------------------------------------------------------------------

This function is essentially the same as the MATLAB {\tt condest} function.  It
computes an estimate of the 1-norm condition number, using Hager's method
\cite{Hager84} and the generalization by Higham and Tisseur
\cite{HighamTisseur00}.  The inputs {\tt Ap} and {\tt Ax} ({\tt Az} in the
complex case) must be unchanged since the last call to {\tt klu\_factor} or
{\tt klu\_refactor}.  The result is returned in {\tt Common.condest}.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok, Ap [n+1] ;
    double Ax [nz], Az [2*nz] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_condest (Ap, Ax, Symbolic, Numeric, &Common) ;                           /* real */
    ok = klu_z_condest (Ap, Az, Symbolic, Numeric, &Common) ;                         /* complex */


    #include "klu.h"
    SuiteSparse_long ok, Ap [n+1] ;
    double Ax [nz], Az [2*nz] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_condest (Ap, Ax, Symbolic, Numeric, &Common) ;                         /* real */
    ok = klu_zl_condest (Ap, Az, Symbolic, Numeric, &Common) ;                        /* complex */
\end{verbatim}
}


%-------------------------------------------------------------------------------
\subsection{{\tt klu\_rcond}: cheap reciprocal condition number estimation}
%-------------------------------------------------------------------------------

This function returns the smallest diagonal entry of {\tt U} divided by the
largest, which is a very crude estimate of the reciprocal of the condition
number of the matrix {\tt A}.  It is very cheap to compute, however.
In MATLAB notation, {\tt rcond = min(abs(diag(U))) / max(abs(diag(U)))}.
If the matrix is singular, {\tt rcond} will be zero.  The result is returned
in {\tt Common.rcond}.


{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_rcond (Symbolic, Numeric, &Common) ;                                     /* real */
    ok = klu_z_rcond (Symbolic, Numeric, &Common) ;                                   /* complex */


    #include "klu.h"
    SuiteSparse_long ok ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_rcond (Symbolic, Numeric, &Common) ;                                   /* real */
    ok = klu_zl_rcond (Symbolic, Numeric, &Common) ;                                  /* complex */
\end{verbatim}
}


%-------------------------------------------------------------------------------
\subsection{{\tt klu\_scale}: scale and check a sparse matrix}
%-------------------------------------------------------------------------------

This function computes the row scaling factors of a matrix and checks to see if
it is a valid sparse matrix.  It can perform two kinds of scaling, computing
either the largest magnitude in each row, or the sum of the magnitudes of the
entries in each row.  KLU calls this function itself, depending upon the {\tt
Common.scale} parameter, where {\tt scale < 0} means no scaling, {\tt scale=1}
means the sum, and {\tt scale=2} means the maximum.  That is, in MATLAB
notation, {\tt Rs = sum(abs(A'))} or {\tt Rs = max(abs(A'))}.  KLU then divides
each row of {\tt A} by its corresponding scale factor.  The function returns 0
if the matrix is invalid, or 1 otherwise.  A valid sparse matrix must meet the
following conditions:

\begin{enumerate}
\item {\tt n > 0}.  Note that KLU does not handle empty (0-by-0) matrices.
\item {\tt Ap}, {\tt Ai}, and {\tt Ax} ({\tt Az} for the complex case) must not be NULL.
\item {\tt Ap[0]=0}, and {\tt Ap [j] <= Ap [j+1]} for all {\tt j} in the range 0 to {\tt n-1}.
\item The row indices in each column, {\tt Ai [Ap [j] ... Ap [j+1]-1]}, must be in
the range 0 to {\tt n-1}, and no duplicates can appear.  If the workspace {\tt W} is
NULL on input, the check for duplicate entries is skipped.
\end{enumerate}

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int scale, ok, n, Ap [n+1], Ai [nz], W [n] ;
    double Ax [nz], Az [2*nz], Rs [n] ;
    klu_common Common ;
    ok = klu_scale (scale, n, Ap, Ai, Ax, Rs, W, &Common) ;                           /* real */
    ok = klu_z_scale (scale, n, Ap, Ai, Az, Rs, W, &Common) ;                         /* complex */


    #include "klu.h"
    SuiteSparse_long scale, ok, n, Ap [n+1], Ai [nz], W [n] ;
    double Ax [nz], Az [2*nz], Rs [n] ;
    klu_l_common Common ;
    ok = klu_l_scale (scale, n, Ap, Ai, Ax, Rs, W, &Common) ;                         /* real */
    ok = klu_zl_scale (scale, n, Ap, Ai, Az, Rs, W, &Common) ;                        /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_extract}: extract the LU factorization}
%-------------------------------------------------------------------------------

This function extracts the LU factorization into a set of data structures
suitable for passing back to MATLAB, with matrices in conventional
compressed-column form.  The {\tt klu\_sort} function should be called first if
the row indices should be returned sorted.  The factorization is returned in
caller-provided arrays; if any of them are NULL, that part of the factorization
is not extracted (this is not an error).  Returns 1 if successful, 0 otherwise.

The sizes of {\tt Li}, {\tt Lx}, and {\tt Lz} are {\tt Numeric->lnz},
{\tt Ui}, {\tt Ux}, and {\tt Uz} are of size {\tt Numeric->unz}, and
{\tt Fi}, {\tt Fx}, and {\tt Fz} are of size {\tt Numeric->nzoff}.
Note that in the complex versions, the real and imaginary parts are returned
in separate arrays, to be compatible with how MATLAB stores complex matrices.

This function is not required to solve a linear system with KLU.  KLU does not
itself make use of the extracted LU factorization returned by this function.
It is only provided to simplify the MATLAB interface to KLU, and it may be of
use to the end user who wishes to examine the contents of the LU factors.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    int ok, Lp [n+1], Li [lnz], Up [n+1], Ui [unz], Fp [n+1], Fi [nzoff], P [n], Q [n], R [n] ;
    double Lx [lnz], Lz [lnz], Ux [unz], Uz [unz], Fx [nzoff], Fz [nzoff], Rs [n] ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    ok = klu_extract (Numeric, Symbolic,
        Lp, Li, Lx, Up, Ui, Ux, Fp, Fi, Fx, P, Q, Rs, R, &Common) ;                   /* real */
    ok = klu_z_extract (Numeric, Symbolic,
        Lp, Li, Lx, Lz, Up, Ui, Ux, Uz, Fp, Fi, Fx, Fz, P, Q, Rs, R, &Common) ;       /* complex */


    #include "klu.h"
    SuiteSparse_long ok, Lp [n+1], Li [lnz], Up [n+1], Ui [unz], Fp [n+1],
        Fi [nzoff], P [n], Q [n], R [n] ;
    double Lx [lnz], Lz [lnz], Ux [unz], Uz [unz], Fx [nzoff], Fz [nzoff], Rs [n] ;
    klu_l_symbolic *Symbolic ;
    klu_l_numeric *Numeric ;
    klu_l_common Common ;
    ok = klu_l_extract (Numeric, Symbolic,
        Lp, Li, Lx, Up, Ui, Ux, Fp, Fi, Fx, P, Q, Rs, R, &Common) ;                   /* real */
    ok = klu_zl_extract (Numeric, Symbolic,
        Lp, Li, Lx, Lz, Up, Ui, Ux, Uz, Fp, Fi, Fx, Fz, P, Q, Rs, R, &Common) ;       /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_malloc}, {\tt klu\_free}, {\tt klu\_realloc}:
memory management}
%-------------------------------------------------------------------------------

KLU uses a set of wrapper routines for {\tt malloc}, {\tt free}, and {\tt
realloc}.  By default, these wrapper routines call the ANSI C versions of these
functions.  However, pointers to functions in {\tt Common} can be modified
after calling {\tt klu\_defaults} to allow the use of other memory management
functions (such as the MATLAB {\tt mxMalloc}, {\tt mxFree}, and {\tt
mxRealloc}).  These wrapper functions keep track of the current and peak memory
usage of KLU.  They can be called by the user.

{\tt klu\_malloc} is essentially the same as {\tt p = malloc (n * size)},
and {\tt klu\_free} is essentially the same as {\tt free(p)}, except that
{\tt klu\_free} returns NULL, which can then be assigned to {\tt p}.  {\tt
klu\_realloc} is similar to {\tt realloc}, except that if the reallocation
fails, {\tt p} is returned unchanged.  Failure conditions are returned in {\tt
Common.status}.

{\footnotesize
\begin{verbatim}
    #include "klu.h"
    size_t n, nnew, nold, size ;
    void *p ;
    klu_common Common ;
    p = klu_malloc (n, size, &Common) ;
    p = klu_free (p, n, size, &Common) ;
    p = klu_realloc (nnew, nold, size, p, &Common) ;


    #include "klu.h"
    size_t n, nnew, nold, size ;
    void *p ;
    klu_l_common Common ;
    p = klu_l_malloc (n, size, &Common) ;
    p = klu_l_free (p, n, size, &Common) ;
    p = klu_l_realloc (nnew, nold, size, p, &Common) ;
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt btf\_maxtrans}: maximum transversal}
%-------------------------------------------------------------------------------

The BTF package includes three user-callable functions (each with {\tt int}
and {\tt SuiteSparse\_long} versions).  They do not need to be called directly by an
application that uses KLU.  KLU will call these functions to perform the
permutation into upper block triangular form.

The {\tt btf\_maxtrans} function finds a column permutation {\tt Q} that gives
{\tt A*Q} a zero-free diagonal, if one exists.  If row {\tt i} is matched to
column {\tt j}, then {\tt Match[i]=j}.  If the matrix is structurally singular,
there will be some unmatched rows.  If row {\tt i} is unmatched, then {\tt
Match[i]=-1}.  If the matrix is square and structurally non-singular, then {\tt
Q=Match} is the column permutation.  The {\tt btf\_maxtrans} function can
accept as input a rectangular matrix; it operates on the bipartite graph of
{\tt A}.  It returns the number of columns matched.  Unlike the KLU
user-callable functions, the BTF functions do not check their inputs at all; a
segmentation fault will occur if any input pointers are NULL, for example.

The function can require up to $O$({\tt n*nnz(A)}) time (excluding the {\em
cheap match} phase, which takes another $O$({\tt nnz(A)}) time).  If {\tt
maxwork > 0} on input, the work is limited to $O$({\tt maxwork*nnz(A)})
(excluding the cheap match), but the maximum transversal might not be found if
the limit is reached.

The {\tt Work} array is workspace required by the methods; its contents
are undefined on input and output.
{\footnotesize
\begin{verbatim}
    int nrow, ncol, Ap [ncol+1], Ai [nz], Match [nrow], Work [5*ncol], nmatch ;
    double maxwork, work ;
    nmatch = btf_maxtrans (nrow, ncol, Ap, Ai, maxwork, &work, Match, Work) ;


    SuiteSparse_long nrow, ncol, Ap [ncol+1], Ai [nz], Match [nrow], Work [5*ncol], nmatch ;
    double maxwork, work ;
    nmatch = btf_l_maxtrans (nrow, ncol, Ap, Ai, maxwork, &work, Match, Work) ;
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt btf\_strongcomp}: strongly connected components}
%-------------------------------------------------------------------------------

The {\tt btf\_strongcomp} function finds the strongly connected components of a
directed graph, returning a symmetric permutation {\tt P}.  The matrix {\tt A}
must be square.  The diagonal of {\tt A} (or {\tt A*Q} if a column permutation
is given on input) is ignored.  If {\tt Q} is NULL on input, the matrix
{\tt P*A*P'} is in upper block triangular form.  Otherwise, {\tt Q} is modified
on output so that {\tt P*A*Q} is in upper block triangular form.  The vector
{\tt R} gives the block boundaries, where the {\tt k}th block consists of
rows and columns {\tt R[k]} through {\tt R[k+1]-1} in the permuted matrix.
The function returns the number of strongly connected components found
(the diagonal blocks in the block triangular form).

{\footnotesize
\begin{verbatim}
    int n, Ap [n+1], Ai [nz], Q [n], P [n], R [n+1], Work [4*n], ncomp ;
    ncomp = btf_strongcomp (n, Ap, Ai, Q, P, R, Work) ;


    SuiteSparse_long n, Ap [n+1], Ai [nz], Q [n], P [n], R [n+1], Work [4*n], ncomp ;
    ncomp = btf_l_strongcomp (n, Ap, Ai, Q, P, R, Work) ;
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt btf\_order}: permutation to block triangular form}
%-------------------------------------------------------------------------------


The {\tt btf\_order} function combines the above two functions, first finding a
maximum transversal and then permuting the resulting matrix into upper block
triangular form.  Unlike {\tt dmperm} in MATLAB, it always reveals the maximum
matching along the diagonal, even if the matrix is structurally singular.

On output, {\tt P} and {\tt Q} are the row and column permutations, where
{\tt i = P[k]} if row {\tt i} of {\tt A} is the {\tt k}th row of {\tt P*A*Q},
and {\tt j = BTF\_UNFLIP(Q[k])} if column {\tt j} of {\tt A} is the {\tt k}th
column of {\tt P*A*Q}.  If {\tt Q[k] < 0}, then the {\tt (k,k)}th entry in
{\tt P*A*Q} is structurally zero.  The vector {\tt R}, and the return value,
are the same as for {\tt btf\_strongcomp}.

{\footnotesize
\begin{verbatim}
    int n, Ap [n+1], Ai [nz], P [n], Q [n], R [n+1], nfound, Work [5*n], ncomp ;
    double maxwork, work ;
    ncomp = btf_order (n, Ap, Ai, maxwork, &work, P, Q, R, &nfound, Work) ;


    SuiteSparse_long n, Ap [n+1], Ai [nz], P [n], Q [n], R [n+1], nfound, Work [5*n], ncomp ;
    double maxwork, work ;
    ncomp = btf_l_order (n, Ap, Ai, maxwork, &work, P, Q, R, &nfound, Work) ;
\end{verbatim}
}

%------------------------------------------------------------------------------
\subsection{Sample C programs that use KLU}
%------------------------------------------------------------------------------

Here is a simple main program, {\tt klu\_simple.c}, that illustrates the basic
usage of KLU.  It uses KLU, and indirectly makes use of BTF and AMD.  COLAMD is
required to compile the demo, but it is not called by this example.  It uses
statically defined global variables for the sparse matrix {\tt A}, which would
not be typical of a complete application.  It just makes for a simpler example.

{\footnotesize
\input{klu_simple_c.tex}
}

The {\tt Ap}, {\tt Ai}, and {\tt Ax} arrays represent the matrix
\[
A = \left[
\begin{array}{ccccc}
 2 &  3 &  0 &  0 &  0 \\
 3 &  0 &  4 &  0 &  6 \\
 0 & -1 & -3 &  2 &  0 \\
 0 &  0 &  1 &  0 &  0 \\
 0 &  4 &  2 &  0 &  1 \\
\end{array}
\right].
\]
The solution to $Ax=b$ is $x = [1 \, 2 \, 3 \, 4 \, 5]^T$.  The program
uses default control settings (no scaling, permutation to block triangular
form, and the AMD ordering).  It ignores the error codes in the return values
and {\tt Common.status}.

The block triangular form found by {\tt btf\_order} for this matrix is
given below
\[
PAQ = \left[
\begin{array}{c|ccc|c}
2 & 0 & 0 & -1 & -3 \\
\hline
  & 2 & 0 & 3 & 0 \\
  & 3 & 6 & 0 & 4 \\
  & 0 & 1 & 4 & 1 \\
\hline
  &   &   &   & 1 \\
\end{array}
\right].
\]
This ordering is not modified by the AMD ordering because the 3-by-3 matrix
$A_{22} + A_{22}^T$ happens to be a dense matrix.  No partial pivoting happens
to occur during LU factorization; all pivots are selected along the diagonal of
each block.  The matrix contains two singletons, which are the original entries
$a_{34}=2$ and $a_{43}=1$, and one 3-by-3 diagonal block (in which a single
fill-in entry occurs during factorization: the $u_{23}$ entry of this 3-by-3
matrix).

For a more complete program that uses KLU, see {\tt KLU/Demo/kludemo.c} for an
{\tt int} version, and {\tt KLU/Demo/kluldemo.c} for a version that uses {\tt
SuiteSparse\_long} instead.  The top-level main routine uses CHOLMOD to read in a
compressed-column sparse matrix from a Matrix Market file, because KLU does not
include such a function.  Otherwise, no CHOLMOD functions are used.  Unlike
{\tt klu\_simple.c}, CHOLMOD is required to run the {\tt kludemo.c} and {\tt
kluldemo.c} programs.
%------------------------------------------------------------------------------
\section{Installation}
\label{Install}
%------------------------------------------------------------------------------

Installation of the C-callable interface requires the {\tt make} utility on
Linux/Unix.  Alternatively, you can use the Cygwin {\tt make} in Windows.
The MATLAB installation on any platform, including Windows, is simple; just
type {\tt klu\_install} to compile and install KLU, BTF, AMD, and COLAMD.

For {\tt make}, system-dependent configurations are in the {\tt
../SuiteSparse\_config/SuiteSparse\_config.mk} file.
You can edit that file to customize the
compilation, but it now automatically detects your system (Linux, Mac, etc.),
so it is not likely that you will need to edit that file.

To compile and install the C-callable KLU, BTF, AMD, and COLAMD libraries, go
to the {\tt KLU} directory and type {\tt make}.  The KLU and BTF libraries are
placed in {\tt KLU/Lib/libklu.*} and {\tt BTF/Lib/libbtf.*}.
Two KLU demo
programs will be compiled and tested in the {\tt KLU/Demo} directory.  You can
compare the output of {\tt make} with the results in the KLU distribution, {\tt
kludemo.out}.

Typing {\tt make clean} will remove all but the final compiled libraries and
demo programs.  Typing {\tt make purge} or {\tt make distclean} removes all
files not in the original distribution.  If you compile KLU or BTF and then
later change the {\tt ../SuiteSparse\_config/SuiteSparse\_config.mk}
file then you should type {\tt
make purge} and then {\tt make} to recompile.

When you compile your program that uses the C-callable KLU library, you need to
add the {\tt KLU/Lib/libklu.*}, {\tt BTF/Lib/libbtf.*}, {\tt AMD/Lib/libamd.*},
and {\tt COLAMD/Lib/libcolamd.*} libraries, and you need to tell your compiler to
look in the {\tt KLU/Include} and {\tt BTF/Include} directories for include
files.
Alternatively, do {\tt make install}, and KLU will be installed in
/usr/local/lib and /usr/local/include, and documentation is placed in
/usr/local/doc.  These installation locations can be changed;
see {\tt SuiteSparse/README.txt} for details.

If all you want to use is the KLU mexFunction in MATLAB, you can skip the use
of the {\tt make} command entirely.  Simply type {\tt klu\_install} in the
MATLAB command window while in the {\tt KLU/MATLAB} directory.  This works on
any system with MATLAB, including Windows.

%------------------------------------------------------------------------------
\newpage
\section{The KLU routines}
\label{klu_include}
%------------------------------------------------------------------------------

The file {\tt KLU/Include/klu.h} listed below describes each user-callable
routine in the C version of KLU, and gives details on their use.

{\footnotesize
\input{klu_h.tex}
}

%------------------------------------------------------------------------------
\newpage
\section{The BTF routines}
\label{btf_include}
%------------------------------------------------------------------------------

The file {\tt BTF/Include/btf.h} listed below describes each user-callable
routine in the C version of BTF, and gives details on their use.

{\footnotesize
\input{btf_h.tex}
}

%------------------------------------------------------------------------------
\newpage
% References
%------------------------------------------------------------------------------

\bibliographystyle{plain}
\bibliography{KLU_UserGuide}

\end{document}