1 SUBROUTINE PZLARFT( DIRECT, STOREV, N, K, V, IV, JV, DESCV, TAU, 2 $ T, WORK ) 3* 4* -- ScaLAPACK auxiliary routine (version 1.7) -- 5* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 6* and University of California, Berkeley. 7* May 1, 1997 8* 9* .. Scalar Arguments .. 10 CHARACTER DIRECT, STOREV 11 INTEGER IV, JV, K, N 12* .. 13* .. Array Arguments .. 14 INTEGER DESCV( * ) 15 COMPLEX*16 TAU( * ), T( * ), V( * ), WORK( * ) 16* .. 17* 18* Purpose 19* ======= 20* 21* PZLARFT forms the triangular factor T of a complex block reflector H 22* of order n, which is defined as a product of k elementary reflectors. 23* 24* If DIRECT = 'F', H = H(1) H(2) . . . H(k) and T is upper triangular; 25* 26* If DIRECT = 'B', H = H(k) . . . H(2) H(1) and T is lower triangular. 27* 28* If STOREV = 'C', the vector which defines the elementary reflector 29* H(i) is stored in the i-th column of the distributed matrix V, and 30* 31* H = I - V * T * V' 32* 33* If STOREV = 'R', the vector which defines the elementary reflector 34* H(i) is stored in the i-th row of the distributed matrix V, and 35* 36* H = I - V' * T * V 37* 38* Notes 39* ===== 40* 41* Each global data object is described by an associated description 42* vector. This vector stores the information required to establish 43* the mapping between an object element and its corresponding process 44* and memory location. 45* 46* Let A be a generic term for any 2D block cyclicly distributed array. 47* Such a global array has an associated description vector DESCA. 48* In the following comments, the character _ should be read as 49* "of the global array". 50* 51* NOTATION STORED IN EXPLANATION 52* --------------- -------------- -------------------------------------- 53* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 54* DTYPE_A = 1. 55* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 56* the BLACS process grid A is distribu- 57* ted over. The context itself is glo- 58* bal, but the handle (the integer 59* value) may vary. 60* M_A (global) DESCA( M_ ) The number of rows in the global 61* array A. 62* N_A (global) DESCA( N_ ) The number of columns in the global 63* array A. 64* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 65* the rows of the array. 66* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 67* the columns of the array. 68* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 69* row of the array A is distributed. 70* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 71* first column of the array A is 72* distributed. 73* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 74* array. LLD_A >= MAX(1,LOCr(M_A)). 75* 76* Let K be the number of rows or columns of a distributed matrix, 77* and assume that its process grid has dimension p x q. 78* LOCr( K ) denotes the number of elements of K that a process 79* would receive if K were distributed over the p processes of its 80* process column. 81* Similarly, LOCc( K ) denotes the number of elements of K that a 82* process would receive if K were distributed over the q processes of 83* its process row. 84* The values of LOCr() and LOCc() may be determined via a call to the 85* ScaLAPACK tool function, NUMROC: 86* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 87* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 88* An upper bound for these quantities may be computed by: 89* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 90* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 91* 92* Arguments 93* ========= 94* 95* DIRECT (global input) CHARACTER*1 96* Specifies the order in which the elementary reflectors are 97* multiplied to form the block reflector: 98* = 'F': H = H(1) H(2) . . . H(k) (Forward) 99* = 'B': H = H(k) . . . H(2) H(1) (Backward) 100* 101* STOREV (global input) CHARACTER*1 102* Specifies how the vectors which define the elementary 103* reflectors are stored (see also Further Details): 104* = 'C': columnwise 105* = 'R': rowwise 106* 107* N (global input) INTEGER 108* The order of the block reflector H. N >= 0. 109* 110* K (global input) INTEGER 111* The order of the triangular factor T (= the number of 112* elementary reflectors). 1 <= K <= MB_V (= NB_V). 113* 114* V (input/output) COMPLEX*16 pointer into the local memory 115* to an array of local dimension (LOCr(IV+N-1),LOCc(JV+K-1)) 116* if STOREV = 'C', and (LOCr(IV+K-1),LOCc(JV+N-1)) if 117* STOREV = 'R'. The distributed matrix V contains the 118* Householder vectors. See further details. 119* 120* IV (global input) INTEGER 121* The row index in the global array V indicating the first 122* row of sub( V ). 123* 124* JV (global input) INTEGER 125* The column index in the global array V indicating the 126* first column of sub( V ). 127* 128* DESCV (global and local input) INTEGER array of dimension DLEN_. 129* The array descriptor for the distributed matrix V. 130* 131* TAU (local input) COMPLEX*16, array, dimension LOCr(IV+K-1) 132* if INCV = M_V, and LOCc(JV+K-1) otherwise. This array 133* contains the Householder scalars related to the Householder 134* vectors. TAU is tied to the distributed matrix V. 135* 136* T (local output) COMPLEX*16 array, dimension (NB_V,NB_V) 137* if STOREV = 'Col', and (MB_V,MB_V) otherwise. It contains 138* the k-by-k triangular factor of the block reflector asso- 139* ciated with V. If DIRECT = 'F', T is upper triangular; 140* if DIRECT = 'B', T is lower triangular. 141* 142* WORK (local workspace) COMPLEX*16 array, 143* dimension (K*(K-1)/2) 144* 145* Further Details 146* =============== 147* 148* The shape of the matrix V and the storage of the vectors which define 149* the H(i) is best illustrated by the following example with n = 5 and 150* k = 3. The elements equal to 1 are not stored; the corresponding 151* array elements are modified but restored on exit. The rest of the 152* array is not used. 153* 154* DIRECT = 'F' and STOREV = 'C': DIRECT = 'F' and STOREV = 'R': 155* 156* V( IV:IV+N-1, ( 1 ) V( IV:IV+K-1, ( 1 v1 v1 v1 v1 ) 157* JV:JV+K-1 ) = ( v1 1 ) JV:JV+N-1 ) = ( 1 v2 v2 v2 ) 158* ( v1 v2 1 ) ( 1 v3 v3 ) 159* ( v1 v2 v3 ) 160* ( v1 v2 v3 ) 161* 162* DIRECT = 'B' and STOREV = 'C': DIRECT = 'B' and STOREV = 'R': 163* 164* V( IV:IV+N-1, ( v1 v2 v3 ) V( IV:IV+K-1, ( v1 v1 1 ) 165* JV:JV+K-1 ) = ( v1 v2 v3 ) JV:JV+N-1 ) = ( v2 v2 v2 1 ) 166* ( 1 v2 v3 ) ( v3 v3 v3 v3 1 ) 167* ( 1 v3 ) 168* ( 1 ) 169* 170* ===================================================================== 171* 172* .. Parameters .. 173 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 174 $ LLD_, MB_, M_, NB_, N_, RSRC_ 175 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 176 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 177 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 178 COMPLEX*16 ONE, ZERO 179 PARAMETER ( ONE = ( 1.0D+0, 0.0D+0 ), 180 $ ZERO = ( 0.0D+0, 0.0D+0 ) ) 181* .. 182* .. Local Scalars .. 183 LOGICAL FORWARD 184 INTEGER ICOFF, ICTXT, II, IIV, IROFF, IVCOL, IVROW, 185 $ ITMP0, ITMP1, IW, JJ, JJV, LDV, MICOL, MIROW, 186 $ MYCOL, MYROW, NP, NPCOL, NPROW, NQ 187 COMPLEX*16 VII 188* .. 189* .. External Subroutines .. 190 EXTERNAL BLACS_GRIDINFO, INFOG2L, ZCOPY, ZGEMV, 191 $ ZGSUM2D, ZLACGV, ZLASET, ZTRMV 192* .. 193* .. External Functions .. 194 LOGICAL LSAME 195 INTEGER INDXG2P, NUMROC 196 EXTERNAL INDXG2P, LSAME, NUMROC 197* .. 198* .. Intrinsic Functions .. 199 INTRINSIC MOD 200* .. 201* .. Executable Statements .. 202* 203* Quick return if possible 204* 205 IF( N.LE.0 .OR. K.LE.0 ) 206 $ RETURN 207* 208 ICTXT = DESCV( CTXT_ ) 209 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 210* 211 FORWARD = LSAME( DIRECT, 'F' ) 212 CALL INFOG2L( IV, JV, DESCV, NPROW, NPCOL, MYROW, MYCOL, 213 $ IIV, JJV, IVROW, IVCOL ) 214* 215 IF( LSAME( STOREV, 'C' ) .AND. MYCOL.EQ.IVCOL ) THEN 216* 217 IW = 1 218 LDV = DESCV( LLD_ ) 219 IROFF = MOD( IV-1, DESCV( MB_ ) ) 220* 221 IF( FORWARD ) THEN 222* 223* DIRECT = 'Forward', STOREV = 'Columnwise' 224* 225 NP = NUMROC( N+IROFF, DESCV( MB_ ), MYROW, IVROW, NPROW ) 226 IF( MYROW.EQ.IVROW ) THEN 227 NP = NP - IROFF 228 II = IIV + 1 229 ELSE 230 II = IIV 231 END IF 232 IF( IROFF+1.EQ.DESCV( MB_ ) ) THEN 233 MIROW = MOD( IVROW+1, NPROW ) 234 ELSE 235 MIROW = IVROW 236 END IF 237 ITMP0 = 0 238* 239 DO 10 JJ = JJV+1, JJV+K-1 240* 241 IF( MYROW.EQ.MIROW ) THEN 242 VII = V( II+(JJ-1)*LDV ) 243 V( II+(JJ-1)*LDV ) = ONE 244 END IF 245* 246* T(1:i-1,i) = -tau( jv+i-1 ) * 247* V(iv+i-1:iv+n-1,jv:jv+i-2)' * V(iv+i-1:iv+n-1,jv+i-1) 248* 249 ITMP0 = ITMP0 + 1 250 IF( NP-II+IIV.GT.0 ) THEN 251 CALL ZGEMV( 'Conjugate transpose', NP-II+IIV, ITMP0, 252 $ -TAU( JJ ), V( II+(JJV-1)*LDV ), LDV, 253 $ V( II+(JJ-1)*LDV ), 1, ZERO, 254 $ WORK( IW ), 1 ) 255 ELSE 256 CALL ZLASET( 'All', ITMP0, 1, ZERO, ZERO, WORK( IW ), 257 $ ITMP0 ) 258 END IF 259* 260 IW = IW + ITMP0 261 IF( MYROW.EQ.MIROW ) THEN 262 V( II+(JJ-1)*LDV ) = VII 263 II = II + 1 264 END IF 265* 266 IF( MOD( IV+ITMP0, DESCV( MB_ ) ).EQ.0 ) 267 $ MIROW = MOD( MIROW+1, NPROW ) 268* 269 10 CONTINUE 270* 271 CALL ZGSUM2D( ICTXT, 'Columnwise', ' ', IW-1, 1, WORK, IW-1, 272 $ IVROW, MYCOL ) 273* 274 IF( MYROW.EQ.IVROW ) THEN 275* 276 IW = 1 277 ITMP0 = 0 278 ITMP1 = 1 279* 280 T( ITMP1 ) = TAU( JJV ) 281* 282 DO 20 JJ = JJV+1, JJV+K-1 283* 284* T(1:j-1,j) = T(1:j-1,1:j-1) * T(1:j-1,j) 285* 286 ITMP0 = ITMP0 + 1 287 ITMP1 = ITMP1 + DESCV( NB_ ) 288 CALL ZCOPY( ITMP0, WORK( IW ), 1, T( ITMP1 ), 1 ) 289 IW = IW + ITMP0 290* 291 CALL ZTRMV( 'Upper', 'No transpose', 'Non-unit', 292 $ ITMP0, T, DESCV( NB_ ), T( ITMP1 ), 1 ) 293 T(ITMP1+ITMP0) = TAU( JJ ) 294* 295 20 CONTINUE 296* 297 END IF 298* 299 ELSE 300* 301* DIRECT = 'Backward', STOREV = 'Columnwise' 302* 303 NP = NUMROC( N+IROFF-1, DESCV( MB_ ), MYROW, IVROW, NPROW ) 304 IF( MYROW.EQ.IVROW ) 305 $ NP = NP - IROFF 306 MIROW = INDXG2P( IV+N-2, DESCV( MB_ ), MYROW, 307 $ DESCV( RSRC_ ), NPROW ) 308 II = IIV + NP - 1 309 ITMP0 = 0 310* 311 DO 30 JJ = JJV+K-2, JJV, -1 312* 313 IF( MYROW.EQ.MIROW ) THEN 314 VII = V( II+(JJ-1)*LDV ) 315 V( II+(JJ-1)*LDV ) = ONE 316 END IF 317* 318* T(1:i-1,i) = -tau( jv+i-1 ) * 319* V(iv:iv+n-k+i-1,jv+i:jv+k-1)' * V(iv:iv+n-k+i-1,jv+i-1) 320* 321 ITMP0 = ITMP0 + 1 322 IF( II-IIV+1.GT.0 ) THEN 323 CALL ZGEMV( 'Conjugate transpose', II-IIV+1, ITMP0, 324 $ -TAU( JJ ), V( IIV+JJ*LDV ), LDV, 325 $ V( IIV+(JJ-1)*LDV ), 1, ZERO, 326 $ WORK( IW ), 1 ) 327 ELSE 328 CALL ZLASET( 'All', ITMP0, 1, ZERO, ZERO, WORK( IW ), 329 $ ITMP0 ) 330 END IF 331* 332 IW = IW + ITMP0 333 IF( MYROW.EQ.MIROW ) THEN 334 V( II+(JJ-1)*LDV ) = VII 335 II = II - 1 336 END IF 337* 338 IF( MOD( IV+N-ITMP0-2, DESCV(MB_) ).EQ.0 ) 339 $ MIROW = MOD( MIROW+NPROW-1, NPROW ) 340* 341 30 CONTINUE 342* 343 CALL ZGSUM2D( ICTXT, 'Columnwise', ' ', IW-1, 1, WORK, IW-1, 344 $ IVROW, MYCOL ) 345* 346 IF( MYROW.EQ.IVROW ) THEN 347* 348 IW = 1 349 ITMP0 = 0 350 ITMP1 = K + 1 + (K-1) * DESCV( NB_ ) 351* 352 T( ITMP1-1 ) = TAU( JJV+K-1 ) 353* 354 DO 40 JJ = JJV+K-2, JJV, -1 355* 356* T(j+1:k,j) = T(j+1:k,j+1:k) * T(j+1:k,j) 357* 358 ITMP0 = ITMP0 + 1 359 ITMP1 = ITMP1 - DESCV( NB_ ) - 1 360 CALL ZCOPY( ITMP0, WORK( IW ), 1, T( ITMP1 ), 1 ) 361 IW = IW + ITMP0 362* 363 CALL ZTRMV( 'Lower', 'No transpose', 'Non-unit', 364 $ ITMP0, T( ITMP1+DESCV( NB_ ) ), 365 $ DESCV( NB_ ), T( ITMP1 ), 1 ) 366 T( ITMP1-1 ) = TAU( JJ ) 367* 368 40 CONTINUE 369* 370 END IF 371* 372 END IF 373* 374 ELSE IF( LSAME( STOREV, 'R' ) .AND. MYROW.EQ.IVROW ) THEN 375* 376 IW = 1 377 LDV = DESCV( LLD_ ) 378 ICOFF = MOD( JV-1, DESCV( NB_ ) ) 379* 380 IF( FORWARD ) THEN 381* 382* DIRECT = 'Forward', STOREV = 'Rowwise' 383* 384 NQ = NUMROC( N+ICOFF, DESCV( NB_ ), MYCOL, IVCOL, NPCOL ) 385 IF( MYCOL.EQ.IVCOL ) THEN 386 NQ = NQ - ICOFF 387 JJ = JJV + 1 388 ELSE 389 JJ = JJV 390 END IF 391 IF( ICOFF+1.EQ.DESCV( NB_ ) ) THEN 392 MICOL = MOD( IVCOL+1, NPCOL ) 393 ELSE 394 MICOL = IVCOL 395 END IF 396 ITMP0 = 0 397* 398 DO 50 II = IIV+1, IIV+K-1 399* 400 IF( MYCOL.EQ.MICOL ) THEN 401 VII = V( II+(JJ-1)*LDV ) 402 V( II+(JJ-1)*LDV ) = ONE 403 END IF 404* 405* T(1:i-1,i) = -tau( iv+i-1 ) * 406* V(iv+i-1,jv+i-1:jv+n-1) * V(iv:iv+i-2,jv+i-1:jv+n-1)' 407* 408 ITMP0 = ITMP0 + 1 409 IF( NQ-JJ+JJV.GT.0 ) THEN 410 CALL ZLACGV( NQ-JJ+JJV, V( II+(JJ-1)*LDV ), LDV ) 411 CALL ZGEMV( 'No transpose', ITMP0, NQ-JJ+JJV, 412 $ -TAU(II), V( IIV+(JJ-1)*LDV ), LDV, 413 $ V( II+(JJ-1)*LDV ), LDV, ZERO, 414 $ WORK( IW ), 1 ) 415 CALL ZLACGV( NQ-JJ+JJV, V( II+(JJ-1)*LDV ), LDV ) 416 ELSE 417 CALL ZLASET( 'All', ITMP0, 1, ZERO, ZERO, 418 $ WORK( IW ), ITMP0 ) 419 END IF 420* 421 IW = IW + ITMP0 422 IF( MYCOL.EQ.MICOL ) THEN 423 V( II+(JJ-1)*LDV ) = VII 424 JJ = JJ + 1 425 END IF 426* 427 IF( MOD( JV+ITMP0, DESCV( NB_ ) ).EQ.0 ) 428 $ MICOL = MOD( MICOL+1, NPCOL ) 429* 430 50 CONTINUE 431* 432 CALL ZGSUM2D( ICTXT, 'Rowwise', ' ', IW-1, 1, WORK, IW-1, 433 $ MYROW, IVCOL ) 434* 435 IF( MYCOL.EQ.IVCOL ) THEN 436* 437 IW = 1 438 ITMP0 = 0 439 ITMP1 = 1 440* 441 T( ITMP1 ) = TAU( IIV ) 442* 443 DO 60 II = IIV+1, IIV+K-1 444* 445* T(1:i-1,i) = T(1:i-1,1:i-1) * T(1:i-1,i) 446* 447 ITMP0 = ITMP0 + 1 448 ITMP1 = ITMP1 + DESCV( MB_ ) 449 CALL ZCOPY( ITMP0, WORK( IW ), 1, T( ITMP1 ), 1 ) 450 IW = IW + ITMP0 451* 452 CALL ZTRMV( 'Upper', 'No transpose', 'Non-unit', 453 $ ITMP0, T, DESCV( MB_ ), T( ITMP1 ), 1 ) 454 T( ITMP1+ITMP0 ) = TAU( II ) 455* 456 60 CONTINUE 457* 458 END IF 459* 460 ELSE 461* 462* DIRECT = 'Backward', STOREV = 'Rowwise' 463* 464 NQ = NUMROC( N+ICOFF-1, DESCV( NB_ ), MYCOL, IVCOL, NPCOL ) 465 IF( MYCOL.EQ.IVCOL ) 466 $ NQ = NQ - ICOFF 467 MICOL = INDXG2P( JV+N-2, DESCV( NB_ ), MYCOL, 468 $ DESCV( CSRC_ ), NPCOL ) 469 JJ = JJV + NQ - 1 470 ITMP0 = 0 471* 472 DO 70 II = IIV+K-2, IIV, -1 473* 474 IF( MYCOL.EQ.MICOL ) THEN 475 VII = V( II+(JJ-1)*LDV ) 476 V( II+(JJ-1)*LDV ) = ONE 477 END IF 478* 479* T(i+1:k,i) = -tau( iv+i-1 ) * 480* V(iv+i:iv+k-1,jv:jv+n-k+i-1)' * V(iv+i-1,jv:jv+n-k+i-1)' 481* 482 ITMP0 = ITMP0 + 1 483 IF( JJ-JJV+1.GT.0 ) THEN 484 CALL ZLACGV( JJ-JJV+1, V( II+(JJV-1)*LDV ), LDV ) 485 CALL ZGEMV( 'No transpose', ITMP0, JJ-JJV+1, 486 $ -TAU( II ), V( II+1+(JJV-1)*LDV ), LDV, 487 $ V( II+(JJV-1)*LDV ), LDV, ZERO, 488 $ WORK( IW ), 1 ) 489 CALL ZLACGV( JJ-JJV+1, V( II+(JJV-1)*LDV ), LDV ) 490 ELSE 491 CALL ZLASET( 'All', ITMP0, 1, ZERO, ZERO, 492 $ WORK( IW ), ITMP0 ) 493 END IF 494* 495 IW = IW + ITMP0 496 IF( MYCOL.EQ.MICOL ) THEN 497 V( II+(JJ-1)*LDV ) = VII 498 JJ = JJ - 1 499 END IF 500* 501 IF( MOD( JV+N-ITMP0-2, DESCV( NB_ ) ).EQ.0 ) 502 $ MICOL = MOD( MICOL+NPCOL-1, NPCOL ) 503* 504 70 CONTINUE 505* 506 CALL ZGSUM2D( ICTXT, 'Rowwise', ' ', IW-1, 1, WORK, IW-1, 507 $ MYROW, IVCOL ) 508* 509 IF( MYCOL.EQ.IVCOL ) THEN 510* 511 IW = 1 512 ITMP0 = 0 513 ITMP1 = K + 1 + (K-1) * DESCV( MB_ ) 514* 515 T( ITMP1-1 ) = TAU( IIV+K-1 ) 516* 517 DO 80 II = IIV+K-2, IIV, -1 518* 519* T(i+1:k,i) = T(i+1:k,i+1:k) * T(i+1:k,i) 520* 521 ITMP0 = ITMP0 + 1 522 ITMP1 = ITMP1 - DESCV( MB_ ) - 1 523 CALL ZCOPY( ITMP0, WORK( IW ), 1, T( ITMP1 ), 1 ) 524 IW = IW + ITMP0 525* 526 CALL ZTRMV( 'Lower', 'No transpose', 'Non-unit', 527 $ ITMP0, T( ITMP1+DESCV( MB_ ) ), 528 $ DESCV( MB_ ), T( ITMP1 ), 1 ) 529 T( ITMP1-1 ) = TAU( II ) 530* 531 80 CONTINUE 532* 533 END IF 534* 535 END IF 536* 537 END IF 538* 539 RETURN 540* 541* End of PZLARFT 542* 543 END 544