\chapter{Using \libflame}
\label{chapter:using}

This chapter contains code examples that illustrate how to use \libflame in
your application.

\section{FLAME/C examples}
\label{sec:flamec-examples}

Let us begin with a small program that uses LAPACK.
Figure \ref{fig:fla-chol-orig} contains a C language program that acquires a
matrix buffer and its dimension properties, performs a Cholesky factorization
on the matrix, and then frees the memory associated with the matrix buffer.

\begin{figure}[h]
\begin{Verbatim}[frame=single,framesep=2.5mm,xleftmargin=5mm,fontsize=\footnotesize]
int main( void )
{
  double* buffer;
  int m, rs, cs;
  int info;
  char uplo = 'L';

  // Get the matrix buffer address, size, and row and column strides.
  get_matrix_info( &buffer, &m, &rs, &cs );

  // Compute the Cholesky factorization of the matrix, reading from and
  // updating the lower triangle.
  dpotrf_( &uplo, &m, buffer, &cs, &info );

  // Free the matrix buffer.
  free_matrix( buffer );

  return 0;
}
\end{Verbatim}
\caption{
A simple program that calls {\tt dpotrf()} from LAPACK.
}
\label{fig:fla-chol-orig}
\end{figure}

\noindent
The program is trivial in that it does not do anything with the factored
matrix before exiting.
Furthermore, the corresponding code found in most real-world programs would
most likely exist within a loop of some sort.
However, we are keeping things simple here to better illustrate the usage
of \libflame functions.

Now suppose we wish to modify the previous program to use the FLAME/C API
within \libflame.
There are two general methods.
\begin{itemize}
\item
Create a \libflame object without a buffer and then attach the conventional
row- or column-major matrix buffer to the bufferless \libflame object.
This method almost always requires the fewest code changes in the
application.
\item
Modify the application such that the matrix is created natively along with
the \libflame object.
This will require the user to interface the application to the matrix data
within the object using various query routines.
This method often involves more work because many applications are written
to access matrix buffers directly without any abstractions.
There are two different strategies for implementing this method, and
depending on the nature of the application, one strategy may be more
appropriate than the other:
\begin{itemize}
\item
The matrix may be created and fully initialized, and then copied into a
\libflame object.
\item
The matrix may be created and initialized piecemeal, perhaps one block at
a time.
\end{itemize}
Regardless of whether the matrix is initialized in full or one submatrix at
a time, the user may use \flacopybuffertoobject to copy the data from
conventional column-major matrix arrays to \libflame objects.
\end{itemize}


\begin{figure}[t]
\begin{Verbatim}[frame=single,framesep=2.5mm,xleftmargin=5mm,commandchars=\\\{\},fontsize=\footnotesize]
\textcolor{red}{#include "FLAME.h"}

int main( void )
\{
  double* buffer;
  int m, rs, cs;
  \textcolor{red}{FLA_Obj A;}

  // Initialize libflame.
  \textcolor{red}{FLA_Init();}

  // Get the matrix buffer address, size, and row and column strides.
  get_matrix_info( &buffer, &m, &rs, &cs );

  // Create an m x m double-precision libflame object without a buffer,
  // and then attach the matrix buffer to the object.
  \textcolor{red}{FLA_Obj_create_without_buffer( FLA_DOUBLE, m, m, &A );}
  \textcolor{red}{FLA_Obj_attach_buffer( buffer, rs, cs, &A );}

  // Compute the Cholesky factorization, storing to the lower triangle.
  \textcolor{red}{FLA_Chol( FLA_LOWER_TRIANGULAR, A );}

  // Free the object without freeing the matrix buffer.
  \textcolor{red}{FLA_Obj_free_without_buffer( &A );}

  // Free the matrix buffer.
  free_matrix( buffer );

  // Finalize libflame.
  \textcolor{red}{FLA_Finalize();}

  return 0;
\}
\end{Verbatim}
\caption{
The program from Figure \ref{fig:fla-chol-orig} modified to use \libflame
objects.
This example code illustrates the minimal amount of work needed to use the
FLAME/C API in a program that was originally designed to use the BLAS or
LAPACK.
}
\label{fig:fla-chol-attach}
\end{figure}

The program in Figure \ref{fig:fla-chol-attach} uses the first method to
integrate \libflamens.
Note that changes from the original example are highlighted in red.
We start by inserting a {\tt \#include} directive for the \libflame header
file, {\tt FLAME.h}.
Before calling any other \libflame functions, we must first invoke \flainitns.
Next, we replace the invocation of {\tt dpotrf()} with four lines of
\libflame code.
First, an $ m \by m $ object {\tt A} of datatype \fladouble is created without
a buffer.
Then the matrix buffer \buffer is attached to the \libflame object, assuming
row and column strides \rs and \csns.
The Cholesky factorization is invoked on {\tt A} with \flacholns.
Finally, the matrix object is released with \flaobjfreewithoutbufferns,
and the library is finalized with a call to \flafinalizens.


\begin{figure}[t]
\begin{Verbatim}[frame=single,framesep=2.5mm,xleftmargin=5mm,commandchars=\\\{\},fontsize=\footnotesize]
\textcolor{red}{#include "FLAME.h"}

int main( void )
\{
  double* buffer;
  int m, rs, cs;
  \textcolor{red}{FLA_Obj A;}

  // Initialize libflame.
  \textcolor{red}{FLA_Init();}

  // Get the matrix buffer address, size, and row and column strides.
  get_matrix_info( &buffer, &m, &rs, &cs );

  // Create an m x m double-precision libflame object.
  \textcolor{red}{FLA_Obj_create( FLA_DOUBLE, m, m, rs, cs, &A );}

  // Copy the contents of the conventional matrix into a libflame object.
  \textcolor{red}{FLA_Copy_buffer_to_object( FLA_NO_TRANSPOSE, m, m, buffer, rs, cs, 0, 0, A );}

  // Compute the Cholesky factorization, storing to the lower triangle.
  \textcolor{red}{FLA_Chol( FLA_LOWER_TRIANGULAR, A );}

  // Free the object.
  \textcolor{red}{FLA_Obj_free( &A );}

  // Free the matrix buffer.
  free_matrix( buffer );

  // Finalize libflame.
  \textcolor{red}{FLA_Finalize();}

  return 0;
\}
\end{Verbatim}
\caption{
The program from Figure \ref{fig:fla-chol-orig} modified to use
\libflame objects natively.
This code does not attach the conventional matrix buffer to a bufferless
object and instead copies the matrix contents into the object using
\flacopybuffertoobjectns.
Note that the matrix is copied all at once, and thus here we assume that
the original matrix is fully initialized in {\tt get\_matrix\_info()}.
}
\label{fig:fla-chol-native1}
\end{figure}

The second method requires somewhat more extensive modifications to the original
program.
In Figure \ref{fig:fla-chol-native1}, we revise and extend the previous
example.
This program acquires the matrix as before, then creates a \libflame
object natively (with an internal buffer), and finally copies the contents of
the conventional matrix into the \libflame object all at once.


\begin{figure}[t]
\begin{Verbatim}[frame=single,framesep=2.5mm,xleftmargin=5mm,commandchars=\\\{\},fontsize=\footnotesize]
\textcolor{red}{#include "FLAME.h"}

int main( void )
\{
  double* buffer;
  int m, rs, cs\textcolor{red}{, b};
  \textcolor{red}{int i, j;}
  \textcolor{red}{FLA_Obj A;}

  // Initialize libflame.
  \textcolor{red}{FLA_Init();}

  // Get the matrix buffer address, size, row and column strides, and block size.
  get_matrix_info( &buffer, &m, &rs, &cs\textcolor{red}{, &b });

  // Create an m x m double-precision libflame object.
  \textcolor{red}{FLA_Obj_create( FLA_DOUBLE, m, m, rs, cs, &A );}

  // Acquire the conventional matrix one block at a time and copy these
  // blocks into the appropriate location within the libflame object.
  \textcolor{red}{for( j = 0; j < m; j += b )}
  \textcolor{red}{\{}
    \textcolor{red}{for( i = 0; i < m; i += b )}
    \textcolor{red}{\{}
      \textcolor{red}{double* ij_ptr;}
      \textcolor{red}{int b_m, b_n;}

      // Compute the block dimensions, in case they are blocks along the lower and/or
      // right edges of the overall matrix.
      \textcolor{red}{b_m = ( m - i < b ? m - i : b );}
      \textcolor{red}{b_n = ( m - j < b ? m - j : b );}

      // Get a pointer to the b_m x b_n block that starts at element (i,j).
      \textcolor{red}{ij_ptr = FLA_Submatrix_at( FLA_DOUBLE, buffer, i, j, rs, cs );}

      // Copy the current block into the correct location within the libflame object.
      \textcolor{red}{FLA_Copy_buffer_to_object( FLA_NO_TRANSPOSE, b_m, b_n, ij_ptr, rs, cs, i, j, A );}
    \textcolor{red}{\}}
  \textcolor{red}{\}}

  // Compute the Cholesky factorization, storing to the lower triangle.
  \textcolor{red}{FLA_Chol( FLA_LOWER_TRIANGULAR, A );}

  // Free the object.
  \textcolor{red}{FLA_Obj_free( &A );}

  // Finalize libflame.
  \textcolor{red}{FLA_Finalize();}

  return 0;
\}
\end{Verbatim}
\caption{
The program from Figure \ref{fig:fla-chol-orig} modified to use FLAME/C
in a way that initializes a \libflame object incrementally, one block at a
time.
}
\label{fig:fla-chol-native2}
\end{figure}

Finally, Figure \ref{fig:fla-chol-native2} shows what a program might look
like if it were to use a native \libflame object but only copy over the data
one block at a time.
Here, we place \flacopybuffertoobject in a loop that copies a single
submatrix per iteration.
We use \flasubmatrixat to compute the starting address of the submatrix
whose top-left element is the $ (i,j) $ element within the overall matrix
stored in \bufferns.

Note that \flacopybuffertoobject may also be used to copy over one
row or column at a time.
Copying a single row or column is just a special case of copying a
rectangular block.




\section{FLASH examples}
\label{sec:flash-examples}

Now let us discuss how we might convert the \libflame programs in
Section \ref{sec:flamec-examples} to use the FLASH API.
Please see Section \ref{sec:flash} for a full discussion of FLASH, including
the motivation behind hierarchical objects and a summary of related
terminology.

%When using hierarchial objects, the user must consider how many levels
%to build into the object hierarchies.

In the previous section, we reviewed a program (Figure \ref{fig:fla-chol-attach})
that uses \libflame functions with an existing matrix buffer.
Figure \ref{fig:flash-chol-attach} shows what this code would look like
if we wished to use hierarchical objects.

\begin{figure}[h]
\begin{Verbatim}[frame=single,framesep=2.5mm,xleftmargin=5mm,commandchars=\\\{\},fontsize=\footnotesize]
#include "FLAME.h"

int main( void )
\{
  double* buffer;
  int m, rs, cs\textcolor{red}{, b};
  FLA_Obj A;

  // Initialize libflame.
  FLA_Init();

  // Get the matrix buffer address, size, row and column strides, and blocksize.
  get_matrix_info( &buffer, &m, &rs, &cs\textcolor{red}{, &b} );

  // Create an m x m double-precision hierarchical object without a buffer,
  // of depth 1 and blocksize b, and then attach the matrix buffer to the object.
  FLA\textcolor{red}{SH}_Obj_create_without_buffer( FLA_DOUBLE, m, m, \textcolor{red}{1, &b,} &A );
  FLA\textcolor{red}{SH}_Obj_attach_buffer( buffer, rs, cs, &A );

  // Compute the Cholesky factorization, storing to the lower triangle.
  FLA\textcolor{red}{SH}_Chol( FLA_LOWER_TRIANGULAR, A );

  // Free the object without freeing the matrix buffer.
  FLA\textcolor{red}{SH}_Obj_free_without_buffer( &A );

  // Free the matrix buffer.
  free_matrix( buffer );

  // Finalize libflame.
  FLA_Finalize();

  return 0;
\}
\end{Verbatim}
\caption{
The program from Figure \ref{fig:fla-chol-attach} modified to use the
FLASH API.
}
\label{fig:flash-chol-attach}
\end{figure}

\noindent
Note that the changes from the corresponding FLAME/C code are highlighted in
red.
The application-specific code changes are limited to inputting a blocksize
value to use in the creation of the hierarchical object {\tt A}.
All of the \libflame function names are the same as in Figure
\ref{fig:fla-chol-attach} except that the prefix has changed from
{\tt FLA\_} to {\tt FLASH\_}.
Additionally, all of the function type signatures are the same, except
for the invocation of \flashobjcreatewithoutbufferns.
This function takes two additional arguments: a depth, and an array of
blocksizes.\footnote{Since the depth is 1 in this example, we choose to
simply pass the address of the integer {\tt b} rather than create a separate
single-element array.}
The depth and the blocksize array together determine the details of the
object hierarchy.
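
As a rough illustration that does not call \libflame at all, the following
sketch enumerates the leaf blocks that a depth-1 hierarchy with blocksize
{\tt b} would be expected to induce on an $ m \by m $ matrix.
It assumes that blocks along the bottom and right edges are simply smaller
when {\tt b} does not divide {\tt m}.
The helper names ({\tt num\_blocks()}, {\tt block\_dim()}, and
{\tt elem\_offset()}) and the values chosen for {\tt m}, {\tt b}, and the
strides are hypothetical and are not part of the \libflame API.

\begin{Verbatim}[frame=single,framesep=2.5mm,xleftmargin=5mm,fontsize=\footnotesize]
#include <stdio.h>

// Number of blocks needed to cover a dimension of length m with blocksize b.
static int num_blocks( int m, int b )
{
  return ( m + b - 1 ) / b;
}

// Dimension of the k-th block; only the last block may be smaller than b.
static int block_dim( int m, int b, int k )
{
  int start = k * b;

  return ( m - start < b ? m - start : b );
}

// Offset, in elements, of element (i,j) given row and column strides.
static long elem_offset( int i, int j, int rs, int cs )
{
  return ( long ) i * rs + ( long ) j * cs;
}

int main( void )
{
  int m = 10, b = 4;   // Hypothetical matrix dimension and blocksize.
  int rs = 1, cs = 10; // Column-major strides for an m x m buffer.
  int nb = num_blocks( m, b );
  int bi, bj;

  // Visit the blocks in the same column-by-column order as the copy loops.
  for( bj = 0; bj < nb; ++bj )
    for( bi = 0; bi < nb; ++bi )
      printf( "block (%d,%d): %d x %d, starts at offset %ld\n",
              bi, bj, block_dim( m, b, bi ), block_dim( m, b, bj ),
              elem_offset( bi * b, bj * b, rs, cs ) );

  return 0;
}
\end{Verbatim}

\noindent
With these values, the $ 3 \by 3 $ grid of blocks consists of $ 4 \by 4 $
blocks except along the bottom and right edges, where the blocks have two
rows and/or two columns.
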
Also note that since a conventional matrix buffer is being attached, the
hierarchical object {\tt A} will refer to submatrices that are not contiguous
in memory.


\begin{figure}[t]
\begin{Verbatim}[frame=single,framesep=2.5mm,xleftmargin=5mm,commandchars=\\\{\},fontsize=\footnotesize]
#include "FLAME.h"

int main( void )
\{
  double* buffer;
  int m, rs, cs\textcolor{red}{, b};
  FLA_Obj A;

  // Initialize libflame.
  FLA_Init();

  // Get the matrix buffer address, size, row and column strides, and blocksize.
  get_matrix_info( &buffer, &m, &rs, &cs\textcolor{red}{, &b} );

  // Create an m x m double-precision libflame object.
  FLA\textcolor{red}{SH}_Obj_create( FLA_DOUBLE, m, m, \textcolor{red}{1, &b,} &A );

  // Copy the contents of the conventional matrix into a libflame object.
  FLA\textcolor{red}{SH}_Copy_buffer_to_hier( m, m, buffer, rs, cs, 0, 0, A );

  // Compute the Cholesky factorization, storing to the lower triangle.
  FLA\textcolor{red}{SH}_Chol( FLA_LOWER_TRIANGULAR, A );

  // Free the object.
  FLA\textcolor{red}{SH}_Obj_free( &A );

  // Free the matrix buffer.
  free_matrix( buffer );

  // Finalize libflame.
  FLA_Finalize();

  return 0;
\}
\end{Verbatim}
\caption{
The program from Figure \ref{fig:fla-chol-native1} modified to use the
FLASH API.
}
\label{fig:flash-chol-native1}
\end{figure}

In similar fashion, we have modified the code in Figure \ref{fig:fla-chol-native1}
to use hierarchical objects, as shown in Figure \ref{fig:flash-chol-native1}.
The changes in this code are similar to those discussed for the previous example.
Note that while \flacopybuffertoobject accepts a transposition argument,
{\tt FLASH\_Copy\_buffer\_to\_hier()} does not, and thus we had to remove this
argument from the invocation of the latter function.

\begin{figure}[t]
\begin{Verbatim}[frame=single,framesep=2.5mm,xleftmargin=5mm,commandchars=\\\{\},fontsize=\footnotesize]
#include "FLAME.h"

int main( void )
\{
  double* buffer;
  int m, rs, cs, b;
  int i, j;
  FLA_Obj A;

  // Initialize libflame.
  FLA_Init();

  // Get the matrix buffer address, size, row and column strides, and blocksize.
  get_matrix_info( &buffer, &m, &rs, &cs, &b );

  // Create an m x m double-precision libflame object.
  FLA\textcolor{red}{SH}_Obj_create( FLA_DOUBLE, m, m, \textcolor{red}{1, &b,} &A );

  // Acquire the conventional matrix one block at a time and copy these
  // blocks into the appropriate location within the libflame object.
  for( j = 0; j < m; j += b )
  \{
    for( i = 0; i < m; i += b )
    \{
      double* ij_ptr;
      int b_m, b_n;

      // Compute the block dimensions, in case they are blocks along the lower and/or
      // right edges of the overall matrix.
      b_m = ( m - i < b ? m - i : b );
      b_n = ( m - j < b ? m - j : b );

      // Get a pointer to the b_m x b_n block that starts at element (i,j).
      ij_ptr = FLA_Submatrix_at( FLA_DOUBLE, buffer, i, j, rs, cs );

      // Copy the current block into the correct location within the libflame object.
      FLA\textcolor{red}{SH}_Copy_buffer_to_hier( b_m, b_n, ij_ptr, rs, cs, i, j, A );
    \}
  \}

  // Compute the Cholesky factorization, storing to the lower triangle.
  FLA\textcolor{red}{SH}_Chol( FLA_LOWER_TRIANGULAR, A );

  // Free the object.
  FLA\textcolor{red}{SH}_Obj_free( &A );

  // Finalize libflame.
  FLA_Finalize();

  return 0;
\}
\end{Verbatim}
\caption{
The program from Figure \ref{fig:fla-chol-native2} modified to use
the FLASH API.
}
\label{fig:flash-chol-native2}
\end{figure}

In Figure \ref{fig:flash-chol-native2}, we show the code from Figure
\ref{fig:fla-chol-native2} modified to use hierarchical objects.
Once again, most of the differences are limited to changing the
function prefixes.
One other change deserves additional attention, though: the use of
the blocksize {\tt b} in the object creation.
In the previous code, the blocksize was used only to determine the
sizes of the submatrices that were individually acquired and copied
into {\tt A}.
This code still uses the blocksize in this manner.
However, it also uses the same value to establish the size of the
submatrix blocks in the hierarchical object.
It should be emphasized that {\tt FLASH\_Copy\_buffer\_to\_hier()} allows
the user to copy submatrices into the object that differ in size
from the underlying leaf-level blocks.
That is, the function is capable of handling copies that span multiple
block boundaries.

The key insight we hope to have impressed on our readers from these
simple examples is that the FLASH API (1) provides an easy interface for
creating and manipulating hierarchical objects, and (2) is strikingly
similar to the original FLAME/C API wherever possible.



\section{SuperMatrix examples}
\label{sec:sm-examples}



