1 SUBROUTINE PDPOTRRV( UPLO, N, A, IA, JA, DESCA, WORK ) 2* 3* -- ScaLAPACK routine (version 1.7) -- 4* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 5* and University of California, Berkeley. 6* May 28, 2001 7* 8* .. Scalar Arguments .. 9 CHARACTER UPLO 10 INTEGER IA, JA, N 11* .. 12* .. Array Arguments .. 13 INTEGER DESCA( * ) 14 DOUBLE PRECISION A( * ), WORK( * ) 15* .. 16* 17* Purpose 18* ======= 19* 20* PDPOTRRV recomputes sub( A ) = A(IA:IA+N-1,JA:JA+N-1) from L or U 21* computed by PDPOTRF. The routine performs the Cholesky factorization 22* in reverse. 23* 24* Notes 25* ===== 26* 27* Each global data object is described by an associated description 28* vector. This vector stores the information required to establish 29* the mapping between an object element and its corresponding process 30* and memory location. 31* 32* Let A be a generic term for any 2D block cyclicly distributed array. 33* Such a global array has an associated description vector DESCA. 34* In the following comments, the character _ should be read as 35* "of the global array". 36* 37* NOTATION STORED IN EXPLANATION 38* --------------- -------------- -------------------------------------- 39* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 40* DTYPE_A = 1. 41* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 42* the BLACS process grid A is distribu- 43* ted over. The context itself is glo- 44* bal, but the handle (the integer 45* value) may vary. 46* M_A (global) DESCA( M_ ) The number of rows in the global 47* array A. 48* N_A (global) DESCA( N_ ) The number of columns in the global 49* array A. 50* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 51* the rows of the array. 52* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 53* the columns of the array. 54* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 55* row of the array A is distributed. 56* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 57* first column of the array A is 58* distributed. 59* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 60* array. LLD_A >= MAX(1,LOCr(M_A)). 61* 62* Let K be the number of rows or columns of a distributed matrix, 63* and assume that its process grid has dimension p x q. 64* LOCr( K ) denotes the number of elements of K that a process 65* would receive if K were distributed over the p processes of its 66* process column. 67* Similarly, LOCc( K ) denotes the number of elements of K that a 68* process would receive if K were distributed over the q processes of 69* its process row. 70* The values of LOCr() and LOCc() may be determined via a call to the 71* ScaLAPACK tool function, NUMROC: 72* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 73* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 74* An upper bound for these quantities may be computed by: 75* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 76* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 77* 78* Arguments 79* ========= 80* 81* UPLO (global input) CHARACTER 82* Specifies whether the upper or lower triangular part of the 83* symmetric distributed matrix sub( A ) is stored: 84* stored: 85* = 'U': Upper triangular 86* = 'L': Lower triangular 87* 88* N (global input) INTEGER 89* The number of rows and columns to be operated on, i.e. the 90* order of the distributed submatrix sub( A ). N >= 0. 91* 92* A (local input/local output) DOUBLE PRECISION pointer into the 93* local memory to an array of dimension (LLD_A, LOCc(JA+N-1)). 94* On entry, the local pieces of the factors L or U of the 95* distributed matrix sub( A ) from the Cholesky factorization. 96* On exit, the original distributed matrix sub( A ) is 97* restored. 98* 99* IA (global input) INTEGER 100* The row index in the global array A indicating the first 101* row of sub( A ). 102* 103* JA (global input) INTEGER 104* The column index in the global array A indicating the 105* first column of sub( A ). 106* 107* DESCA (global and local input) INTEGER array of dimension DLEN_. 108* The array descriptor for the distributed matrix A. 109* 110* WORK (local workspace) DOUBLE PRECISION array, dimension (LWORK) 111* LWORK >= MB_A*NB_A. 112* 113* ===================================================================== 114* 115* .. Parameters .. 116 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 117 $ LLD_, MB_, M_, NB_, N_, RSRC_ 118 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 119 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 120 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 121 DOUBLE PRECISION ONE, ZERO 122 PARAMETER ( ONE = 1.0D+0, ZERO = 0.0D+0 ) 123* .. 124* .. Local Scalars .. 125 LOGICAL UPPER 126 CHARACTER COLBTOP, ROWBTOP 127 INTEGER IACOL, IAROW, ICTXT, IL, J, JB, JL, JN, MYCOL, 128 $ MYROW, NPCOL, NPROW 129* .. Local Arrays .. 130 INTEGER DESCW( DLEN_ ) 131* .. 132* .. External Subroutines .. 133 EXTERNAL BLACS_GRIDINFO, DESCSET, PDLACPY, PDLASET, 134 $ PDSYRK, PDTRMM, PB_TOPGET, PB_TOPSET 135* .. 136* .. External Functions .. 137 LOGICAL LSAME 138 INTEGER ICEIL, INDXG2P 139 EXTERNAL ICEIL, INDXG2P, LSAME 140* .. 141* .. Intrinsic Functions .. 142 INTRINSIC MIN, MOD 143* .. 144* .. Executable Statements .. 145* 146* Get grid parameters 147* 148 ICTXT = DESCA( CTXT_ ) 149 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 150* 151 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 152 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 153* 154 UPPER = LSAME( UPLO, 'U' ) 155 JN = MIN( ICEIL( JA, DESCA( NB_ ) ) * DESCA( NB_ ), JA+N-1 ) 156 JL = MAX( ( ( JA+N-2 ) / DESCA( NB_ ) ) * DESCA( NB_ ) + 1, JA ) 157 IL = MAX( ( ( IA+N-2 ) / DESCA( MB_ ) ) * DESCA( MB_ ) + 1, IA ) 158 IAROW = INDXG2P( IL, DESCA( MB_ ), MYROW, DESCA( RSRC_ ), NPROW ) 159 IACOL = INDXG2P( JL, DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), NPCOL ) 160* 161* Define array descriptor for working array WORK 162* 163 CALL DESCSET( DESCW, DESCA( MB_ ), DESCA( NB_ ), DESCA( MB_ ), 164 $ DESCA( NB_ ), IAROW, IACOL, ICTXT, DESCA( MB_ ) ) 165* 166 IF ( UPPER ) THEN 167* 168* Compute A from the Cholesky factor U : A = U'*U. 169* 170 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ' ' ) 171 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'S-ring' ) 172* 173 DO 10 J = JL, JN+1, -DESCA( NB_ ) 174* 175 JB = MIN( JA+N-J, DESCA( NB_ ) ) 176* 177* Update the trailing matrix, A = A + U'*U 178* 179 CALL PDSYRK( 'Upper', 'Transpose', JA+N-J-JB, JB, ONE, A, IL, 180 $ J+JB, DESCA, ONE, A, IL+JB, J+JB, DESCA ) 181* 182* Copy current diagonal block of A into workspace 183* 184 CALL PDLACPY( 'All', JB, JB, A, IL, J, DESCA, WORK, 1, 1, 185 $ DESCW ) 186* 187* Zero strict lower triangular part of diagonal block, to make 188* it U1. 189* 190 CALL PDLASET( 'Lower', JB-1, JB, ZERO, ZERO, A, IL+1, J, 191 $ DESCA ) 192* 193* Update the row panel U with the triangular matrix 194* 195 CALL PDTRMM( 'Left', 'Upper', 'Transpose', 'Non-Unit', JB, 196 $ N-J+JA, ONE, WORK, 1, 1, DESCW, A, IL, J, 197 $ DESCA ) 198* 199* Restore the strict lower triangular part of diagonal block. 200* 201 CALL PDLACPY( 'Lower', JB-1, JB, WORK, 2, 1, DESCW, A, 202 $ IL+1, J, DESCA ) 203* 204 IL = IL - DESCA( MB_ ) 205 DESCW( RSRC_ ) = MOD( DESCW( RSRC_ ) + NPROW - 1, NPROW ) 206 DESCW( CSRC_ ) = MOD( DESCW( CSRC_ ) + NPCOL - 1, NPCOL ) 207* 208 10 CONTINUE 209* 210* Handle first block separately 211* 212 JB = MIN( JN-JA+1, DESCA( NB_ ) ) 213* 214* Update the trailing matrix, A = A + U'*U 215* 216 CALL PDSYRK( 'Upper', 'Transpose', N-JB, JB, ONE, A, IA, JA+JB, 217 $ DESCA, ONE, A, IA+JB, JA+JB, DESCA ) 218* 219* Copy current diagonal block of A into workspace 220* 221 CALL PDLACPY( 'All', JB, JB, A, IA, JA, DESCA, WORK, 1, 1, 222 $ DESCW ) 223* 224* Zero strict lower triangular part of diagonal block, to make 225* it U1. 226* 227 CALL PDLASET( 'Lower', JB-1, JB, ZERO, ZERO, A, IA+1, JA, 228 $ DESCA ) 229* 230* Update the row panel U with the triangular matrix 231* 232 CALL PDTRMM( 'Left', 'Upper', 'Transpose', 'Non-Unit', JB, 233 $ N, ONE, WORK, 1, 1, DESCW, A, IA, JA, DESCA ) 234* 235* Restore the strict lower triangular part of diagonal block. 236* 237 CALL PDLACPY( 'Lower', JB-1, JB, WORK, 2, 1, DESCW, A, IA+1, 238 $ JA, DESCA ) 239* 240 ELSE 241* 242* Compute A from the Cholesky factor L : A = L*L'. 243* 244 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', 'S-ring' ) 245 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', ' ' ) 246* 247 DO 20 J = JL, JN+1, -DESCA( NB_ ) 248* 249 JB = MIN( JA+N-J, DESCA( NB_ ) ) 250* 251* Update the trailing matrix, A = A + L*L' 252* 253 CALL PDSYRK( 'Lower', 'No transpose', IA+N-IL-JB, JB, ONE, A, 254 $ IL+JB, J, DESCA, ONE, A, IL+JB, J+JB, DESCA ) 255* 256* Copy current diagonal block of A into workspace 257* 258 CALL PDLACPY( 'All', JB, JB, A, IL, J, DESCA, WORK, 1, 1, 259 $ DESCW ) 260* 261* Zero strict upper triangular part of diagonal block, to make 262* it L1. 263* 264 CALL PDLASET( 'Upper', JB, JB-1, ZERO, ZERO, A, IL, J+1, 265 $ DESCA ) 266* 267* Update the column panel L with the triangular matrix 268* 269 CALL PDTRMM( 'Right', 'Lower', 'Transpose', 'Non-Unit', 270 $ IA+N-IL, JB, ONE, WORK, 1, 1, DESCW, A, IL, 271 $ J, DESCA ) 272* 273* Restore the strict upper triangular part of diagonal block. 274* 275 CALL PDLACPY( 'Upper', JB, JB-1, WORK, 1, 2, DESCW, A, 276 $ IL, J+1, DESCA ) 277* 278 IL = IL - DESCA( MB_ ) 279 DESCW( RSRC_ ) = MOD( DESCW( RSRC_ ) + NPROW - 1, NPROW ) 280 DESCW( CSRC_ ) = MOD( DESCW( CSRC_ ) + NPCOL - 1, NPCOL ) 281* 282 20 CONTINUE 283* 284* Handle first block separately 285* 286 JB = MIN( JN-JA+1, DESCA( NB_ ) ) 287* 288* Update the trailing matrix, A = A + L*L' 289* 290 CALL PDSYRK( 'Lower', 'No transpose', N-JB, JB, ONE, A, 291 $ IA+JB, JA, DESCA, ONE, A, IA+JB, JA+JB, DESCA ) 292* 293* Copy current diagonal block of A into workspace 294* 295 CALL PDLACPY( 'All', JB, JB, A, IA, JA, DESCA, WORK, 1, 1, 296 $ DESCW ) 297* 298* Zero strict upper triangular part of diagonal block, to make 299* it L1. 300* 301 CALL PDLASET( 'Upper', JB, JB-1, ZERO, ZERO, A, IA, JA+1, 302 $ DESCA ) 303* 304* Update the column panel L with the triangular matrix 305* 306 CALL PDTRMM( 'Right', 'Lower', 'Transpose', 'Non-Unit', N, JB, 307 $ ONE, WORK, 1, 1, DESCW, A, IA, JA, DESCA ) 308* 309* Restore the strict upper triangular part of diagonal block. 310* 311 CALL PDLACPY( 'Upper', JB, JB-1, WORK, 1, 2, DESCW, A, IA, 312 $ JA+1, DESCA ) 313* 314 END IF 315* 316 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 317 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 318* 319 RETURN 320* 321* End of PDPOTRRV 322* 323 END 324