1 SUBROUTINE PZPOTRF( UPLO, N, A, IA, JA, DESCA, INFO ) 2* 3* -- ScaLAPACK routine (version 1.7) -- 4* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 5* and University of California, Berkeley. 6* May 25, 2001 7* 8* .. Scalar Arguments .. 9 CHARACTER UPLO 10 INTEGER IA, INFO, JA, N 11* .. 12* .. Array Arguments .. 13 INTEGER DESCA( * ) 14 COMPLEX*16 A( * ) 15* .. 16* 17* Purpose 18* ======= 19* 20* PZPOTRF computes the Cholesky factorization of an N-by-N complex 21* hermitian positive definite distributed matrix sub( A ) denoting 22* A(IA:IA+N-1, JA:JA+N-1). 23* 24* The factorization has the form 25* 26* sub( A ) = U' * U , if UPLO = 'U', or 27* 28* sub( A ) = L * L', if UPLO = 'L', 29* 30* where U is an upper triangular matrix and L is lower triangular. 31* 32* Notes 33* ===== 34* 35* Each global data object is described by an associated description 36* vector. This vector stores the information required to establish 37* the mapping between an object element and its corresponding process 38* and memory location. 39* 40* Let A be a generic term for any 2D block cyclicly distributed array. 41* Such a global array has an associated description vector DESCA. 42* In the following comments, the character _ should be read as 43* "of the global array". 44* 45* NOTATION STORED IN EXPLANATION 46* --------------- -------------- -------------------------------------- 47* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 48* DTYPE_A = 1. 49* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 50* the BLACS process grid A is distribu- 51* ted over. The context itself is glo- 52* bal, but the handle (the integer 53* value) may vary. 54* M_A (global) DESCA( M_ ) The number of rows in the global 55* array A. 56* N_A (global) DESCA( N_ ) The number of columns in the global 57* array A. 58* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 59* the rows of the array. 60* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 61* the columns of the array. 62* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 63* row of the array A is distributed. 64* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 65* first column of the array A is 66* distributed. 67* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 68* array. LLD_A >= MAX(1,LOCr(M_A)). 69* 70* Let K be the number of rows or columns of a distributed matrix, 71* and assume that its process grid has dimension p x q. 72* LOCr( K ) denotes the number of elements of K that a process 73* would receive if K were distributed over the p processes of its 74* process column. 75* Similarly, LOCc( K ) denotes the number of elements of K that a 76* process would receive if K were distributed over the q processes of 77* its process row. 78* The values of LOCr() and LOCc() may be determined via a call to the 79* ScaLAPACK tool function, NUMROC: 80* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 81* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 82* An upper bound for these quantities may be computed by: 83* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 84* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 85* 86* This routine requires square block decomposition ( MB_A = NB_A ). 87* 88* Arguments 89* ========= 90* 91* UPLO (global input) CHARACTER 92* = 'U': Upper triangle of sub( A ) is stored; 93* = 'L': Lower triangle of sub( A ) is stored. 94* 95* N (global input) INTEGER 96* The number of rows and columns to be operated on, i.e. the 97* order of the distributed submatrix sub( A ). N >= 0. 98* 99* A (local input/local output) COMPLEX*16 pointer into the 100* local memory to an array of dimension (LLD_A, LOCc(JA+N-1)). 101* On entry, this array contains the local pieces of the 102* N-by-N Hermitian distributed matrix sub( A ) to be factored. 103* If UPLO = 'U', the leading N-by-N upper triangular part of 104* sub( A ) contains the upper triangular part of the matrix, 105* and its strictly lower triangular part is not referenced. 106* If UPLO = 'L', the leading N-by-N lower triangular part of 107* sub( A ) contains the lower triangular part of the distribu- 108* ted matrix, and its strictly upper triangular part is not 109* referenced. On exit, if UPLO = 'U', the upper triangular 110* part of the distributed matrix contains the Cholesky factor 111* U, if UPLO = 'L', the lower triangular part of the distribu- 112* ted matrix contains the Cholesky factor L. 113* 114* IA (global input) INTEGER 115* The row index in the global array A indicating the first 116* row of sub( A ). 117* 118* JA (global input) INTEGER 119* The column index in the global array A indicating the 120* first column of sub( A ). 121* 122* DESCA (global and local input) INTEGER array of dimension DLEN_. 123* The array descriptor for the distributed matrix A. 124* 125* INFO (global output) INTEGER 126* = 0: successful exit 127* < 0: If the i-th argument is an array and the j-entry had 128* an illegal value, then INFO = -(i*100+j), if the i-th 129* argument is a scalar and had an illegal value, then 130* INFO = -i. 131* > 0: If INFO = K, the leading minor of order K, 132* A(IA:IA+K-1,JA:JA+K-1) is not positive definite, and 133* the factorization could not be completed. 134* 135* ===================================================================== 136* 137* .. Parameters .. 138 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 139 $ LLD_, MB_, M_, NB_, N_, RSRC_ 140 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 141 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 142 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 143 DOUBLE PRECISION ONE 144 PARAMETER ( ONE = 1.0D+0 ) 145 COMPLEX*16 CONE 146 PARAMETER ( CONE = ( 1.0D+0, 0.0D+0 ) ) 147* .. 148* .. Local Scalars .. 149 LOGICAL UPPER 150 CHARACTER COLBTOP, ROWBTOP 151 INTEGER I, ICOFF, ICTXT, IROFF, J, JB, JN, MYCOL, 152 $ MYROW, NPCOL, NPROW 153* .. 154* .. Local Arrays .. 155 INTEGER IDUM1( 1 ), IDUM2( 1 ) 156* .. 157* .. External Subroutines .. 158 EXTERNAL BLACS_GRIDINFO, CHK1MAT, PCHK1MAT, PB_TOPGET, 159 $ PB_TOPSET, PXERBLA, PZPOTF2, PZHERK, 160 $ PZTRSM 161* .. 162* .. External Functions .. 163 LOGICAL LSAME 164 INTEGER ICEIL 165 EXTERNAL ICEIL, LSAME 166* .. 167* .. Intrinsic Functions .. 168 INTRINSIC ICHAR, MIN, MOD 169* .. 170* .. Executable Statements .. 171* 172* Get grid parameters 173* 174 ICTXT = DESCA( CTXT_ ) 175 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 176* 177* Test the input parameters 178* 179 INFO = 0 180 IF( NPROW.EQ.-1 ) THEN 181 INFO = -(600+CTXT_) 182 ELSE 183 CALL CHK1MAT( N, 2, N, 2, IA, JA, DESCA, 6, INFO ) 184 UPPER = LSAME( UPLO, 'U' ) 185 IF( INFO.EQ.0 ) THEN 186 IROFF = MOD( IA-1, DESCA( MB_ ) ) 187 ICOFF = MOD( JA-1, DESCA( NB_ ) ) 188 IF ( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN 189 INFO = -1 190 ELSE IF( IROFF.NE.0 ) THEN 191 INFO = -4 192 ELSE IF( ICOFF.NE.0 ) THEN 193 INFO = -5 194 ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN 195 INFO = -(600+NB_) 196 END IF 197 END IF 198 IF( UPPER ) THEN 199 IDUM1( 1 ) = ICHAR( 'U' ) 200 ELSE 201 IDUM1( 1 ) = ICHAR( 'L' ) 202 END IF 203 IDUM2( 1 ) = 1 204 CALL PCHK1MAT( N, 2, N, 2, IA, JA, DESCA, 6, 1, IDUM1, IDUM2, 205 $ INFO ) 206 END IF 207* 208 IF( INFO.NE.0 ) THEN 209 CALL PXERBLA( ICTXT, 'PZPOTRF', -INFO ) 210 RETURN 211 END IF 212* 213* Quick return if possible 214* 215 IF( N.EQ.0 ) 216 $ RETURN 217* 218 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 219 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 220* 221 IF( UPPER ) THEN 222* 223* Split-ring topology for the communication along process 224* columns, 1-tree topology along process rows. 225* 226 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ' ' ) 227 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'S-ring' ) 228* 229* A is upper triangular, compute Cholesky factorization A = U'*U. 230* 231* Handle the first block of columns separately 232* 233 JN = MIN( ICEIL( JA, DESCA( NB_ ) )*DESCA(NB_), JA+N-1 ) 234 JB = JN - JA + 1 235* 236* Perform unblocked Cholesky factorization on JB block 237* 238 CALL PZPOTF2( UPLO, JB, A, IA, JA, DESCA, INFO ) 239 IF( INFO.NE.0 ) 240 $ GO TO 30 241* 242 IF( JB+1.LE.N ) THEN 243* 244* Form the row panel of U using the triangular solver 245* 246 CALL PZTRSM( 'Left', UPLO, 'Conjugate transpose', 247 $ 'Non-Unit', JB, N-JB, CONE, A, IA, JA, DESCA, 248 $ A, IA, JA+JB, DESCA ) 249* 250* Update the trailing matrix, A = A - U'*U 251* 252 CALL PZHERK( UPLO, 'Conjugate transpose', N-JB, JB, -ONE, A, 253 $ IA, JA+JB, DESCA, ONE, A, IA+JB, JA+JB, DESCA ) 254 END IF 255* 256* Loop over remaining block of columns 257* 258 DO 10 J = JN+1, JA+N-1, DESCA( NB_ ) 259 JB = MIN( N-J+JA, DESCA( NB_ ) ) 260 I = IA + J - JA 261* 262* Perform unblocked Cholesky factorization on JB block 263* 264 CALL PZPOTF2( UPLO, JB, A, I, J, DESCA, INFO ) 265 IF( INFO.NE.0 ) THEN 266 INFO = INFO + J - JA 267 GO TO 30 268 END IF 269* 270 IF( J-JA+JB+1.LE.N ) THEN 271* 272* Form the row panel of U using the triangular solver 273* 274 CALL PZTRSM( 'Left', UPLO, 'Conjugate transpose', 275 $ 'Non-Unit', JB, N-J-JB+JA, CONE, A, I, J, 276 $ DESCA, A, I, J+JB, DESCA ) 277* 278* Update the trailing matrix, A = A - U'*U 279* 280 CALL PZHERK( UPLO, 'Conjugate transpose', N-J-JB+JA, JB, 281 $ -ONE, A, I, J+JB, DESCA, ONE, A, I+JB, 282 $ J+JB, DESCA ) 283 END IF 284 10 CONTINUE 285* 286 ELSE 287* 288* 1-tree topology for the communication along process columns, 289* Split-ring topology along process rows. 290* 291 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', 'S-ring' ) 292 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', ' ' ) 293* 294* A is lower triangular, compute Cholesky factorization A = L*L' 295* (right-looking) 296* 297* Handle the first block of columns separately 298* 299 JN = MIN( ICEIL( JA, DESCA( NB_ ) )*DESCA( NB_ ), JA+N-1 ) 300 JB = JN - JA + 1 301* 302* Perform unblocked Cholesky factorization on JB block 303* 304 CALL PZPOTF2( UPLO, JB, A, IA, JA, DESCA, INFO ) 305 IF( INFO.NE.0 ) 306 $ GO TO 30 307* 308 IF( JB+1.LE.N ) THEN 309* 310* Form the column panel of L using the triangular solver 311* 312 CALL PZTRSM( 'Right', UPLO, 'Conjugate transpose', 313 $ 'Non-Unit', N-JB, JB, CONE, A, IA, JA, DESCA, 314 $ A, IA+JB, JA, DESCA ) 315* 316* Update the trailing matrix, A = A - L*L' 317* 318 CALL PZHERK( UPLO, 'No Transpose', N-JB, JB, -ONE, A, IA+JB, 319 $ JA, DESCA, ONE, A, IA+JB, JA+JB, DESCA ) 320* 321 END IF 322* 323 DO 20 J = JN+1, JA+N-1, DESCA( NB_ ) 324 JB = MIN( N-J+JA, DESCA( NB_ ) ) 325 I = IA + J - JA 326* 327* Perform unblocked Cholesky factorization on JB block 328* 329 CALL PZPOTF2( UPLO, JB, A, I, J, DESCA, INFO ) 330 IF( INFO.NE.0 ) THEN 331 INFO = INFO + J - JA 332 GO TO 30 333 END IF 334* 335 IF( J-JA+JB+1.LE.N ) THEN 336* 337* Form the column panel of L using the triangular solver 338* 339 CALL PZTRSM( 'Right', UPLO, 'Conjugate transpose', 340 $ 'Non-Unit', N-J-JB+JA, JB, CONE, A, I, J, 341 $ DESCA, A, I+JB, J, DESCA ) 342* 343* Update the trailing matrix, A = A - L*L' 344* 345 CALL PZHERK( UPLO, 'No Transpose', N-J-JB+JA, JB, -ONE, 346 $ A, I+JB, J, DESCA, ONE, A, I+JB, J+JB, 347 $ DESCA ) 348* 349 END IF 350 20 CONTINUE 351* 352 END IF 353* 354 30 CONTINUE 355* 356 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 357 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 358* 359 RETURN 360* 361* End of PZPOTRF 362* 363 END 364