      SUBROUTINE PCGETRF( M, N, A, IA, JA, DESCA, IPIV, INFO )
*
*  -- ScaLAPACK routine (version 1.7) --
*     University of Tennessee, Knoxville, Oak Ridge National Laboratory,
*     and University of California, Berkeley.
*     May 25, 2001
*
*     .. Scalar Arguments ..
      INTEGER            IA, INFO, JA, M, N
*     ..
*     .. Array Arguments ..
      INTEGER            DESCA( * ), IPIV( * )
      COMPLEX            A( * )
*     ..
*
*  Purpose
*  =======
*
*  PCGETRF computes an LU factorization of a general M-by-N distributed
*  matrix sub( A ) = A(IA:IA+M-1,JA:JA+N-1) using partial pivoting with
*  row interchanges.
*
*  The factorization has the form sub( A ) = P * L * U, where P is a
*  permutation matrix, L is lower triangular with unit diagonal
*  elements (lower trapezoidal if M > N), and U is upper triangular
*  (upper trapezoidal if M < N).  L and U are stored in sub( A ).
*
*  This is the right-looking Parallel Level 3 BLAS version of the
*  algorithm.
*
*  Notes
*  =====
*
*  Each global data object is described by an associated description
*  vector.  This vector stores the information required to establish
*  the mapping between an object element and its corresponding process
*  and memory location.
*
*  Let A be a generic term for any 2D block cyclically distributed
*  array.  Such a global array has an associated description vector
*  DESCA.  In the following comments, the character _ should be read as
*  "of the global array".
*
*  NOTATION        STORED IN      EXPLANATION
*  --------------- -------------- --------------------------------------
*  DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
*                                 DTYPE_A = 1.
*  CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
*                                 the BLACS process grid A is
*                                 distributed over.  The context
*                                 itself is global, but the handle
*                                 (the integer value) may vary.
*  M_A    (global) DESCA( M_ )    The number of rows in the global
*                                 array A.
*  N_A    (global) DESCA( N_ )    The number of columns in the global
*                                 array A.
*  MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
*                                 the rows of the array.
*  NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
*                                 the columns of the array.
*  RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
*                                 row of the array A is distributed.
*  CSRC_A (global) DESCA( CSRC_ ) The process column over which the
*                                 first column of the array A is
*                                 distributed.
*  LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
*                                 array.  LLD_A >= MAX(1,LOCr(M_A)).
*
*  Let K be the number of rows or columns of a distributed matrix,
*  and assume that its process grid has dimension p x q.
*  LOCr( K ) denotes the number of elements of K that a process
*  would receive if K were distributed over the p processes of its
*  process column.
*  Similarly, LOCc( K ) denotes the number of elements of K that a
*  process would receive if K were distributed over the q processes of
*  its process row.
*  The values of LOCr() and LOCc() may be determined via a call to the
*  ScaLAPACK tool function, NUMROC:
*          LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
*          LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
*  An upper bound for these quantities may be computed by:
*          LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
*          LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
*
*  This routine requires square block decomposition ( MB_A = NB_A ).
*
*  Arguments
*  =========
*
*  M       (global input) INTEGER
*          The number of rows to be operated on, i.e. the number of rows
*          of the distributed submatrix sub( A ).  M >= 0.
*
*  N       (global input) INTEGER
*          The number of columns to be operated on, i.e. the number of
*          columns of the distributed submatrix sub( A ).  N >= 0.
*
*  A       (local input/local output) COMPLEX pointer into the
*          local memory to an array of dimension (LLD_A, LOCc(JA+N-1)).
*          On entry, this array contains the local pieces of the M-by-N
*          distributed matrix sub( A ) to be factored.  On exit, this
*          array contains the local pieces of the factors L and U from
*          the factorization sub( A ) = P*L*U; the unit diagonal
*          elements of L are not stored.
*
*  IA      (global input) INTEGER
*          The row index in the global array A indicating the first
*          row of sub( A ).
*
*  JA      (global input) INTEGER
*          The column index in the global array A indicating the
*          first column of sub( A ).
*
*  DESCA   (global and local input) INTEGER array of dimension DLEN_.
*          The array descriptor for the distributed matrix A.
*
*  IPIV    (local output) INTEGER array, dimension ( LOCr(M_A)+MB_A )
*          This array contains the pivoting information.
*          IPIV(i) -> The global row local row i was swapped with.
*          This array is tied to the distributed matrix A.
*
*  INFO    (global output) INTEGER
*          = 0:  successful exit
*          < 0:  If the i-th argument is an array and the j-entry had
*                an illegal value, then INFO = -(i*100+j), if the i-th
*                argument is a scalar and had an illegal value, then
*                INFO = -i.
*          > 0:  If INFO = K, U(IA+K-1,JA+K-1) is exactly zero.
*                The factorization has been completed, but the factor U
*                is exactly singular, and division by zero will occur if
*                it is used to solve a system of equations.
*
*  =====================================================================
*
*     .. Parameters ..
      INTEGER            BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
     $                   LLD_, MB_, M_, NB_, N_, RSRC_
      PARAMETER          ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1,
     $                     CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
     $                     RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
      COMPLEX            ONE
      PARAMETER          ( ONE = 1.0E+0 )
*     ..
*     .. Local Scalars ..
      CHARACTER          COLBTOP, COLCTOP, ROWBTOP
      INTEGER            I, ICOFF, ICTXT, IINFO, IN, IROFF, J, JB, JN,
     $                   MN, MYCOL, MYROW, NPCOL, NPROW
*     ..
*     .. Local Arrays ..
      INTEGER            IDUM1( 1 ), IDUM2( 1 )
*     ..
*     .. External Subroutines ..
      EXTERNAL           BLACS_GRIDINFO, CHK1MAT, IGAMN2D, PCHK1MAT,
     $                   PB_TOPGET, PB_TOPSET, PCGEMM, PCGETF2,
     $                   PCLASWP, PCTRSM, PXERBLA
*     ..
*     .. External Functions ..
      INTEGER            ICEIL
      EXTERNAL           ICEIL
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          MIN, MOD
*     ..
*     .. Executable Statements ..
*
*     Get grid parameters
*
      ICTXT = DESCA( CTXT_ )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
*     Test the input parameters
*
      INFO = 0
      IF( NPROW.EQ.-1 ) THEN
         INFO = -(600+CTXT_)
      ELSE
         CALL CHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, INFO )
         IF( INFO.EQ.0 ) THEN
            IROFF = MOD( IA-1, DESCA( MB_ ) )
            ICOFF = MOD( JA-1, DESCA( NB_ ) )
            IF( IROFF.NE.0 ) THEN
               INFO = -4
            ELSE IF( ICOFF.NE.0 ) THEN
               INFO = -5
            ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN
               INFO = -(600+NB_)
            END IF
         END IF
         CALL PCHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, 0, IDUM1,
     $                  IDUM2, INFO )
      END IF
*
      IF( INFO.NE.0 ) THEN
         CALL PXERBLA( ICTXT, 'PCGETRF', -INFO )
         RETURN
      END IF
*
*     Quick return if possible
*
      IF( DESCA( M_ ).EQ.1 ) THEN
         IPIV( 1 ) = 1
         RETURN
      ELSE IF( M.EQ.0 .OR. N.EQ.0 ) THEN
         RETURN
      END IF
*
*     Split-ring topology for the communication along process rows
*
      CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP )
      CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP )
      CALL PB_TOPGET( ICTXT, 'Combine', 'Columnwise', COLCTOP )
      CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', 'S-ring' )
      CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', ' ' )
      CALL PB_TOPSET( ICTXT, 'Combine', 'Columnwise', ' ' )
*
*     Handle the first block of columns separately
*
      MN = MIN( M, N )
      IN = MIN( ICEIL( IA, DESCA( MB_ ) )*DESCA( MB_ ), IA+M-1 )
      JN = MIN( ICEIL( JA, DESCA( NB_ ) )*DESCA( NB_ ), JA+MN-1 )
      JB = JN - JA + 1
*
*     Factor diagonal and subdiagonal blocks and test for exact
*     singularity.
*
      CALL PCGETF2( M, JB, A, IA, JA, DESCA, IPIV, INFO )
*
      IF( JB+1.LE.N ) THEN
*
*        Apply interchanges to columns JN+1:JA+N-1.
*
         CALL PCLASWP( 'Forward', 'Rowwise', N-JB, A, IA, JN+1, DESCA,
     $                 IA, IN, IPIV )
*
*        Compute block row of U.
*
         CALL PCTRSM( 'Left', 'Lower', 'No transpose', 'Unit', JB,
     $                N-JB, ONE, A, IA, JA, DESCA, A, IA, JN+1, DESCA )
*
         IF( JB+1.LE.M ) THEN
*
*           Update trailing submatrix.
*
            CALL PCGEMM( 'No transpose', 'No transpose', M-JB, N-JB, JB,
     $                   -ONE, A, IN+1, JA, DESCA, A, IA, JN+1, DESCA,
     $                   ONE, A, IN+1, JN+1, DESCA )
*
         END IF
      END IF
*
*     Loop over the remaining blocks of columns.
*
      DO 10 J = JN+1, JA+MN-1, DESCA( NB_ )
         JB = MIN( MN-J+JA, DESCA( NB_ ) )
         I = IA + J - JA
*
*        Factor diagonal and subdiagonal blocks and test for exact
*        singularity.
*
         CALL PCGETF2( M-J+JA, JB, A, I, J, DESCA, IPIV, IINFO )
*
         IF( INFO.EQ.0 .AND. IINFO.GT.0 )
     $      INFO = IINFO + J - JA
*
*        Apply interchanges to columns JA:J-1.
*
         CALL PCLASWP( 'Forward', 'Rowwise', J-JA, A, IA, JA, DESCA,
     $                 I, I+JB-1, IPIV )
*
         IF( J-JA+JB+1.LE.N ) THEN
*
*           Apply interchanges to columns J+JB:JA+N-1.
*
            CALL PCLASWP( 'Forward', 'Rowwise', N-J-JB+JA, A, IA, J+JB,
     $                    DESCA, I, I+JB-1, IPIV )
*
*           Compute block row of U.
*
            CALL PCTRSM( 'Left', 'Lower', 'No transpose', 'Unit', JB,
     $                   N-J-JB+JA, ONE, A, I, J, DESCA, A, I, J+JB,
     $                   DESCA )
*
            IF( J-JA+JB+1.LE.M ) THEN
*
*              Update trailing submatrix.
*
               CALL PCGEMM( 'No transpose', 'No transpose', M-J-JB+JA,
     $                      N-J-JB+JA, JB, -ONE, A, I+JB, J, DESCA, A,
     $                      I, J+JB, DESCA, ONE, A, I+JB, J+JB, DESCA )
*
            END IF
         END IF
*
   10 CONTINUE
*
      IF( INFO.EQ.0 )
     $   INFO = MN + 1
      CALL IGAMN2D( ICTXT, 'Rowwise', ' ', 1, 1, INFO, 1, IDUM1, IDUM2,
     $              -1, -1, MYCOL )
      IF( INFO.EQ.MN+1 )
     $   INFO = 0
*
      CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP )
      CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP )
      CALL PB_TOPSET( ICTXT, 'Combine', 'Columnwise', COLCTOP )
*
      RETURN
*
*     End of PCGETRF
*
      END
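*
*     Illustrative calling sequence (editor's sketch, not part of the
*     library source).  MB, NB, MLOC, NLOC and LLDA below are assumed
*     caller-side variables, with MB = NB as this routine requires.
*     On an already-initialized NPROW x NPCOL BLACS grid with context
*     ICTXT, a caller would typically build the descriptor with the
*     ScaLAPACK tools NUMROC and DESCINIT and then factor the whole
*     matrix:
*
*        MLOC = NUMROC( M, MB, MYROW, 0, NPROW )
*        NLOC = NUMROC( N, NB, MYCOL, 0, NPCOL )
*        LLDA = MAX( 1, MLOC )
*        CALL DESCINIT( DESCA, M, N, MB, NB, 0, 0, ICTXT, LLDA, INFO )
*        ( fill the MLOC-by-NLOC local part of A )
*        CALL PCGETRF( M, N, A, 1, 1, DESCA, IPIV, INFO )
*
*     On return, INFO > 0 flags an exactly singular U, and A holds the
*     factors L and U ready for a solve with PCGETRS.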