1 SUBROUTINE PZLARZB( SIDE, TRANS, DIRECT, STOREV, M, N, K, L, V, 2 $ IV, JV, DESCV, T, C, IC, JC, DESCC, WORK ) 3* 4* -- ScaLAPACK auxiliary routine (version 2.0.2) -- 5* Univ. of Tennessee, Univ. of California Berkeley, Univ. of Colorado Denver 6* May 1 2012 7* 8* .. Scalar Arguments .. 9 CHARACTER DIRECT, SIDE, STOREV, TRANS 10 INTEGER IC, IV, JC, JV, K, L, M, N 11* .. 12* .. Array Arguments .. 13 INTEGER DESCC( * ), DESCV( * ) 14 COMPLEX*16 C( * ), T( * ), V( * ), WORK( * ) 15* .. 16* 17* Purpose 18* ======= 19* 20* PZLARZB applies a complex block reflector Q or its conjugate 21* transpose Q**H to a complex M-by-N distributed matrix sub( C ) 22* denoting C(IC:IC+M-1,JC:JC+N-1), from the left or the right. 23* 24* Q is a product of k elementary reflectors as returned by PZTZRZF. 25* 26* Currently, only STOREV = 'R' and DIRECT = 'B' are supported. 27* 28* Notes 29* ===== 30* 31* Each global data object is described by an associated description 32* vector. This vector stores the information required to establish 33* the mapping between an object element and its corresponding process 34* and memory location. 35* 36* Let A be a generic term for any 2D block cyclicly distributed array. 37* Such a global array has an associated description vector DESCA. 38* In the following comments, the character _ should be read as 39* "of the global array". 40* 41* NOTATION STORED IN EXPLANATION 42* --------------- -------------- -------------------------------------- 43* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 44* DTYPE_A = 1. 45* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 46* the BLACS process grid A is distribu- 47* ted over. The context itself is glo- 48* bal, but the handle (the integer 49* value) may vary. 50* M_A (global) DESCA( M_ ) The number of rows in the global 51* array A. 52* N_A (global) DESCA( N_ ) The number of columns in the global 53* array A. 54* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 55* the rows of the array. 56* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 57* the columns of the array. 58* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 59* row of the array A is distributed. 60* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 61* first column of the array A is 62* distributed. 63* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 64* array. LLD_A >= MAX(1,LOCr(M_A)). 65* 66* Let K be the number of rows or columns of a distributed matrix, 67* and assume that its process grid has dimension p x q. 68* LOCr( K ) denotes the number of elements of K that a process 69* would receive if K were distributed over the p processes of its 70* process column. 71* Similarly, LOCc( K ) denotes the number of elements of K that a 72* process would receive if K were distributed over the q processes of 73* its process row. 74* The values of LOCr() and LOCc() may be determined via a call to the 75* ScaLAPACK tool function, NUMROC: 76* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 77* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 78* An upper bound for these quantities may be computed by: 79* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 80* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 81* 82* Arguments 83* ========= 84* 85* SIDE (global input) CHARACTER 86* = 'L': apply Q or Q**H from the Left; 87* = 'R': apply Q or Q**H from the Right. 88* 89* TRANS (global input) CHARACTER 90* = 'N': No transpose, apply Q; 91* = 'C': Conjugate transpose, apply Q**H. 92* 93* DIRECT (global input) CHARACTER 94* Indicates how H is formed from a product of elementary 95* reflectors 96* = 'F': H = H(1) H(2) . . . H(k) (Forward, not supported yet) 97* = 'B': H = H(k) . . . H(2) H(1) (Backward) 98* 99* STOREV (global input) CHARACTER 100* Indicates how the vectors which define the elementary 101* reflectors are stored: 102* = 'C': Columnwise (not supported yet) 103* = 'R': Rowwise 104* 105* M (global input) INTEGER 106* The number of rows to be operated on i.e the number of rows 107* of the distributed submatrix sub( C ). M >= 0. 108* 109* N (global input) INTEGER 110* The number of columns to be operated on i.e the number of 111* columns of the distributed submatrix sub( C ). N >= 0. 112* 113* K (global input) INTEGER 114* The order of the matrix T (= the number of elementary 115* reflectors whose product defines the block reflector). 116* 117* L (global input) INTEGER 118* The columns of the distributed submatrix sub( A ) containing 119* the meaningful part of the Householder reflectors. 120* If SIDE = 'L', M >= L >= 0, if SIDE = 'R', N >= L >= 0. 121* 122* V (local input) COMPLEX*16 pointer into the local memory 123* to an array of dimension (LLD_V, LOCc(JV+M-1)) if SIDE = 'L', 124* (LLD_V, LOCc(JV+N-1)) if SIDE = 'R'. It contains the local 125* pieces of the distributed vectors V representing the 126* Householder transformation as returned by PZTZRZF. 127* LLD_V >= LOCr(IV+K-1). 128* 129* IV (global input) INTEGER 130* The row index in the global array V indicating the first 131* row of sub( V ). 132* 133* JV (global input) INTEGER 134* The column index in the global array V indicating the 135* first column of sub( V ). 136* 137* DESCV (global and local input) INTEGER array of dimension DLEN_. 138* The array descriptor for the distributed matrix V. 139* 140* T (local input) COMPLEX*16 array, dimension MB_V by MB_V 141* The lower triangular matrix T in the representation of the 142* block reflector. 143* 144* C (local input/local output) COMPLEX*16 pointer into the 145* local memory to an array of dimension (LLD_C,LOCc(JC+N-1)). 146* On entry, the M-by-N distributed matrix sub( C ). On exit, 147* sub( C ) is overwritten by Q*sub( C ) or Q'*sub( C ) or 148* sub( C )*Q or sub( C )*Q'. 149* 150* IC (global input) INTEGER 151* The row index in the global array C indicating the first 152* row of sub( C ). 153* 154* JC (global input) INTEGER 155* The column index in the global array C indicating the 156* first column of sub( C ). 157* 158* DESCC (global and local input) INTEGER array of dimension DLEN_. 159* The array descriptor for the distributed matrix C. 160* 161* WORK (local workspace) COMPLEX*16 array, dimension (LWORK) 162* If STOREV = 'C', 163* if SIDE = 'L', 164* LWORK >= ( NqC0 + MpC0 ) * K 165* else if SIDE = 'R', 166* LWORK >= ( NqC0 + MAX( NpV0 + NUMROC( NUMROC( N+ICOFFC, 167* NB_V, 0, 0, NPCOL ), NB_V, 0, 0, LCMQ ), 168* MpC0 ) ) * K 169* end if 170* else if STOREV = 'R', 171* if SIDE = 'L', 172* LWORK >= ( MpC0 + MAX( MqV0 + NUMROC( NUMROC( M+IROFFC, 173* MB_V, 0, 0, NPROW ), MB_V, 0, 0, LCMP ), 174* NqC0 ) ) * K 175* else if SIDE = 'R', 176* LWORK >= ( MpC0 + NqC0 ) * K 177* end if 178* end if 179* 180* where LCMQ = LCM / NPCOL with LCM = ICLM( NPROW, NPCOL ), 181* 182* IROFFV = MOD( IV-1, MB_V ), ICOFFV = MOD( JV-1, NB_V ), 183* IVROW = INDXG2P( IV, MB_V, MYROW, RSRC_V, NPROW ), 184* IVCOL = INDXG2P( JV, NB_V, MYCOL, CSRC_V, NPCOL ), 185* MqV0 = NUMROC( M+ICOFFV, NB_V, MYCOL, IVCOL, NPCOL ), 186* NpV0 = NUMROC( N+IROFFV, MB_V, MYROW, IVROW, NPROW ), 187* 188* IROFFC = MOD( IC-1, MB_C ), ICOFFC = MOD( JC-1, NB_C ), 189* ICROW = INDXG2P( IC, MB_C, MYROW, RSRC_C, NPROW ), 190* ICCOL = INDXG2P( JC, NB_C, MYCOL, CSRC_C, NPCOL ), 191* MpC0 = NUMROC( M+IROFFC, MB_C, MYROW, ICROW, NPROW ), 192* NpC0 = NUMROC( N+ICOFFC, MB_C, MYROW, ICROW, NPROW ), 193* NqC0 = NUMROC( N+ICOFFC, NB_C, MYCOL, ICCOL, NPCOL ), 194* 195* ILCM, INDXG2P and NUMROC are ScaLAPACK tool functions; 196* MYROW, MYCOL, NPROW and NPCOL can be determined by calling 197* the subroutine BLACS_GRIDINFO. 198* 199* Alignment requirements 200* ====================== 201* 202* The distributed submatrices V(IV:*, JV:*) and C(IC:IC+M-1,JC:JC+N-1) 203* must verify some alignment properties, namely the following 204* expressions should be true: 205* 206* If STOREV = 'Columnwise' 207* If SIDE = 'Left', 208* ( MB_V.EQ.MB_C .AND. IROFFV.EQ.IROFFC .AND. IVROW.EQ.ICROW ) 209* If SIDE = 'Right', 210* ( MB_V.EQ.NB_C .AND. IROFFV.EQ.ICOFFC ) 211* else if STOREV = 'Rowwise' 212* If SIDE = 'Left', 213* ( NB_V.EQ.MB_C .AND. ICOFFV.EQ.IROFFC ) 214* If SIDE = 'Right', 215* ( NB_V.EQ.NB_C .AND. ICOFFV.EQ.ICOFFC .AND. IVCOL.EQ.ICCOL ) 216* end if 217* 218* ===================================================================== 219* 220* .. Parameters .. 221 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 222 $ LLD_, MB_, M_, NB_, N_, RSRC_ 223 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 224 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 225 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 226 COMPLEX*16 ONE, ZERO 227 PARAMETER ( ONE = ( 1.0D+0, 0.0D+0 ), 228 $ ZERO = ( 0.0D+0, 0.0D+0 ) ) 229* .. 230* .. Local Scalars .. 231 LOGICAL LEFT 232 CHARACTER COLBTOP, TRANST 233 INTEGER ICCOL1, ICCOL2, ICOFFC1, ICOFFC2, ICOFFV, 234 $ ICROW1, ICROW2, ICTXT, IIBEG, IIC1, IIC2, 235 $ IIEND, IINXT, IIV, ILEFT, INFO, IOFFC2, IOFFV, 236 $ IPT, IPV, IPW, IROFFC1, IROFFC2, ITOP, IVCOL, 237 $ IVROW, J, JJBEG, JJEND, JJNXT, JJC1, JJC2, JJV, 238 $ LDC, LDV, LV, LW, MBC, MBV, MPC1, MPC2, MPC20, 239 $ MQV, MQV0, MYCOL, MYDIST, MYROW, NBC, NBV, 240 $ NPCOL, NPROW, NQC1, NQC2, NQCALL, NQV 241* .. 242* .. External Subroutines .. 243 EXTERNAL BLACS_ABORT, BLACS_GRIDINFO, INFOG2L, 244 $ PBZMATADD, PB_TOPGET, PXERBLA, PBZTRAN, 245 $ ZGEBR2D, ZGEBS2D, ZGEMM, 246 $ ZGSUM2D, ZLACGV, ZLAMOV, ZLASET, 247 $ ZTRBR2D, ZTRBS2D, ZTRMM 248* .. 249* .. Intrinsic Functions .. 250 INTRINSIC MAX, MIN, MOD 251* .. 252* .. External Functions .. 253 LOGICAL LSAME 254 INTEGER ICEIL, NUMROC 255 EXTERNAL ICEIL, LSAME, NUMROC 256* .. 257* .. Executable Statements .. 258* 259* Quick return if possible 260* 261 IF( M.LE.0 .OR. N.LE.0 .OR. K.LE.0 ) 262 $ RETURN 263* 264* Get grid parameters 265* 266 ICTXT = DESCC( CTXT_ ) 267 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 268* 269* Check for currently supported options 270* 271 INFO = 0 272 IF( .NOT.LSAME( DIRECT, 'B' ) ) THEN 273 INFO = -3 274 ELSE IF( .NOT.LSAME( STOREV, 'R' ) ) THEN 275 INFO = -4 276 END IF 277 IF( INFO.NE.0 ) THEN 278 CALL PXERBLA( ICTXT, 'PZLARZB', -INFO ) 279 CALL BLACS_ABORT( ICTXT, 1 ) 280 RETURN 281 END IF 282* 283 LEFT = LSAME( SIDE, 'L' ) 284 IF( LSAME( TRANS, 'N' ) ) THEN 285 TRANST = 'C' 286 ELSE 287 TRANST = 'N' 288 END IF 289* 290 CALL INFOG2L( IV, JV, DESCV, NPROW, NPCOL, MYROW, MYCOL, IIV, JJV, 291 $ IVROW, IVCOL ) 292 MBV = DESCV( MB_ ) 293 NBV = DESCV( NB_ ) 294 ICOFFV = MOD( JV-1, NBV ) 295 NQV = NUMROC( L+ICOFFV, NBV, MYCOL, IVCOL, NPCOL ) 296 IF( MYCOL.EQ.IVCOL ) 297 $ NQV = NQV - ICOFFV 298 LDV = DESCV( LLD_ ) 299 IIV = MIN( IIV, LDV ) 300 JJV = MIN( JJV, MAX( 1, NUMROC( DESCV( N_ ), NBV, MYCOL, 301 $ DESCV( CSRC_ ), NPCOL ) ) ) 302 IOFFV = IIV + ( JJV-1 ) * LDV 303 MBC = DESCC( MB_ ) 304 NBC = DESCC( NB_ ) 305 NQCALL = NUMROC( DESCC( N_ ), NBC, MYCOL, DESCC( CSRC_ ), NPCOL ) 306 CALL INFOG2L( IC, JC, DESCC, NPROW, NPCOL, MYROW, MYCOL, IIC1, 307 $ JJC1, ICROW1, ICCOL1 ) 308 LDC = DESCC( LLD_ ) 309 IIC1 = MIN( IIC1, LDC ) 310 JJC1 = MIN( JJC1, MAX( 1, NQCALL ) ) 311* 312 IF( LEFT ) THEN 313 IROFFC1 = MOD( IC-1, MBC ) 314 MPC1 = NUMROC( K+IROFFC1, MBC, MYROW, ICROW1, NPROW ) 315 IF( MYROW.EQ.ICROW1 ) 316 $ MPC1 = MPC1 - IROFFC1 317 ICOFFC1 = MOD( JC-1, NBC ) 318 NQC1 = NUMROC( N+ICOFFC1, NBC, MYCOL, ICCOL1, NPCOL ) 319 IF( MYCOL.EQ.ICCOL1 ) 320 $ NQC1 = NQC1 - ICOFFC1 321 CALL INFOG2L( IC+M-L, JC, DESCC, NPROW, NPCOL, MYROW, MYCOL, 322 $ IIC2, JJC2, ICROW2, ICCOL2 ) 323 IROFFC2 = MOD( IC+M-L-1, MBC ) 324 MPC2 = NUMROC( L+IROFFC2, MBC, MYROW, ICROW2, NPROW ) 325 IF( MYROW.EQ.ICROW2 ) 326 $ MPC2 = MPC2 - IROFFC2 327 ICOFFC2 = ICOFFC1 328 NQC2 = NQC1 329 ELSE 330 IROFFC1 = MOD( IC-1, MBC ) 331 MPC1 = NUMROC( M+IROFFC1, MBC, MYROW, ICROW1, NPROW ) 332 IF( MYROW.EQ.ICROW1 ) 333 $ MPC1 = MPC1 - IROFFC1 334 ICOFFC1 = MOD( JC-1, NBC ) 335 NQC1 = NUMROC( K+ICOFFC1, NBC, MYCOL, ICCOL1, NPCOL ) 336 IF( MYCOL.EQ.ICCOL1 ) 337 $ NQC1 = NQC1 - ICOFFC1 338 CALL INFOG2L( IC, JC+N-L, DESCC, NPROW, NPCOL, MYROW, MYCOL, 339 $ IIC2, JJC2, ICROW2, ICCOL2 ) 340 IROFFC2 = IROFFC1 341 MPC2 = MPC1 342 ICOFFC2 = MOD( JC+N-L-1, NBC ) 343 NQC2 = NUMROC( L+ICOFFC2, NBC, MYCOL, ICCOL2, NPCOL ) 344 IF( MYCOL.EQ.ICCOL2 ) 345 $ NQC2 = NQC2 - ICOFFC2 346 END IF 347 IIC2 = MIN( IIC2, LDC ) 348 JJC2 = MIN( JJC2, NQCALL ) 349 IOFFC2 = IIC2 + ( JJC2-1 ) * LDC 350* 351 IF( LSAME( SIDE, 'L' ) ) THEN 352* 353* Form Q*sub( C ) or Q'*sub( C ) 354* 355* IROFFC2 = ICOFFV is required by the current transposition 356* routine PBZTRAN 357* 358 MQV0 = NUMROC( M+ICOFFV, NBV, MYCOL, IVCOL, NPCOL ) 359 IF( MYCOL.EQ.IVCOL ) THEN 360 MQV = MQV0 - ICOFFV 361 ELSE 362 MQV = MQV0 363 END IF 364 IF( MYROW.EQ.ICROW2 ) THEN 365 MPC20 = MPC2 + IROFFC2 366 ELSE 367 MPC20 = MPC2 368 END IF 369* 370* Locally V( IOFFV ) is K x MQV, C( IOFFC2 ) is MPC2 x NQC2 371* WORK( IPV ) is MPC20 x K = [ . V( IOFFV ) ]' 372* WORK( IPW ) is K x MQV0 = [ . V( IOFFV ) ] 373* WORK( IPT ) is the workspace for PBZTRAN 374* 375 IPV = 1 376 IPW = IPV + MPC20 * K 377 IPT = IPW + K * MQV0 378 LV = MAX( 1, MPC20 ) 379 LW = MAX( 1, K ) 380* 381 IF( MYROW.EQ.IVROW ) THEN 382 IF( MYCOL.EQ.IVCOL ) THEN 383 CALL ZLAMOV( 'All', K, MQV, V( IOFFV ), LDV, 384 $ WORK( IPW+ICOFFV*LW ), LW ) 385 ELSE 386 CALL ZLAMOV( 'All', K, MQV, V( IOFFV ), LDV, 387 $ WORK( IPW ), LW ) 388 END IF 389 END IF 390* 391* WORK( IPV ) = WORK( IPW )' (replicated) is MPC20 x K 392* 393 CALL PBZTRAN( ICTXT, 'Rowwise', 'Conjugate transpose', K, 394 $ M+ICOFFV, DESCV( NB_ ), WORK( IPW ), LW, ZERO, 395 $ WORK( IPV ), LV, IVROW, IVCOL, ICROW2, -1, 396 $ WORK( IPT ) ) 397* 398* WORK( IPV ) = ( . V )' -> WORK( IPV ) = V' is MPC2 x K 399* 400 IF( MYROW.EQ.ICROW2 ) 401 $ IPV = IPV + IROFFC2 402* 403* WORK( IPW ) becomes NQC2 x K = C( IOFFC2 )' * V' 404* WORK( IPW ) = C( IOFFC2 )' * V' (NQC2 x MPC2 x K) -> NQC2 x K 405* 406 LW = MAX( 1, NQC2 ) 407* 408 IF( MPC2.GT.0 ) THEN 409 CALL ZGEMM( 'Transpose', 'No transpose', NQC2, K, MPC2, 410 $ ONE, C( IOFFC2 ), LDC, WORK( IPV ), LV, ZERO, 411 $ WORK( IPW ), LW ) 412 ELSE 413 CALL ZLASET( 'All', NQC2, K, ZERO, ZERO, WORK( IPW ), LW ) 414 END IF 415* 416* WORK( IPW ) = WORK( IPW ) + C1 ( NQC1 = NQC2 ) 417* 418 IF( MPC1.GT.0 ) THEN 419 MYDIST = MOD( MYROW-ICROW1+NPROW, NPROW ) 420 ITOP = MAX( 0, MYDIST * MBC - IROFFC1 ) 421 IIBEG = IIC1 422 IIEND = IIC1 + MPC1 - 1 423 IINXT = MIN( ICEIL( IIBEG, MBC ) * MBC, IIEND ) 424* 425 10 CONTINUE 426 IF( IIBEG.LE.IINXT ) THEN 427 CALL PBZMATADD( ICTXT, 'Transpose', NQC2, IINXT-IIBEG+1, 428 $ ONE, C( IIBEG+(JJC1-1)*LDC ), LDC, ONE, 429 $ WORK( IPW+ITOP ), LW ) 430 MYDIST = MYDIST + NPROW 431 ITOP = MYDIST * MBC - IROFFC1 432 IIBEG = IINXT +1 433 IINXT = MIN( IINXT+MBC, IIEND ) 434 GO TO 10 435 END IF 436 END IF 437* 438 CALL ZGSUM2D( ICTXT, 'Columnwise', ' ', NQC2, K, WORK( IPW ), 439 $ LW, IVROW, MYCOL ) 440* 441* WORK( IPW ) = WORK( IPW ) * T' or WORK( IPW ) * T 442* 443 IF( MYROW.EQ.IVROW ) THEN 444 IF( MYCOL.EQ.IVCOL ) THEN 445* 446* Broadcast the block reflector to the other columns. 447* 448 CALL ZTRBS2D( ICTXT, 'Rowwise', ' ', 'Lower', 'Non unit', 449 $ K, K, T, MBV ) 450 ELSE 451 CALL ZTRBR2D( ICTXT, 'Rowwise', ' ', 'Lower', 'Non unit', 452 $ K, K, T, MBV, MYROW, IVCOL ) 453 END IF 454 CALL ZTRMM( 'Right', 'Lower', TRANST, 'Non unit', NQC2, K, 455 $ ONE, T, MBV, WORK( IPW ), LW ) 456* 457 CALL ZGEBS2D( ICTXT, 'Columnwise', ' ', NQC2, K, 458 $ WORK( IPW ), LW ) 459 ELSE 460 CALL ZGEBR2D( ICTXT, 'Columnwise', ' ', NQC2, K, 461 $ WORK( IPW ), LW, IVROW, MYCOL ) 462 END IF 463* 464* C1 = C1 - WORK( IPW ) 465* 466 IF( MPC1.GT.0 ) THEN 467 MYDIST = MOD( MYROW-ICROW1+NPROW, NPROW ) 468 ITOP = MAX( 0, MYDIST * MBC - IROFFC1 ) 469 IIBEG = IIC1 470 IIEND = IIC1 + MPC1 - 1 471 IINXT = MIN( ICEIL( IIBEG, MBC ) * MBC, IIEND ) 472* 473 20 CONTINUE 474 IF( IIBEG.LE.IINXT ) THEN 475 CALL PBZMATADD( ICTXT, 'Transpose', IINXT-IIBEG+1, NQC2, 476 $ -ONE, WORK( IPW+ITOP ), LW, ONE, 477 $ C( IIBEG+(JJC1-1)*LDC ), LDC ) 478 MYDIST = MYDIST + NPROW 479 ITOP = MYDIST * MBC - IROFFC1 480 IIBEG = IINXT +1 481 IINXT = MIN( IINXT+MBC, IIEND ) 482 GO TO 20 483 END IF 484 END IF 485* 486* C2 C2 - V' * W' 487* C( IOFFC2 ) = C( IOFFC2 ) - WORK( IPV ) * WORK( IPW )' 488* MPC2 x NQC2 MPC2 x K K x NQC2 489* 490 DO 30 J = 1, K 491 CALL ZLACGV( MPC2, WORK( IPV+(J-1)*LV ), 1 ) 492 30 CONTINUE 493 CALL ZGEMM( 'No transpose', 'Transpose', MPC2, NQC2, K, -ONE, 494 $ WORK( IPV ), LV, WORK( IPW ), LW, ONE, 495 $ C( IOFFC2 ), LDC ) 496* 497 ELSE 498* 499* Form sub( C ) * Q or sub( C ) * Q' 500* 501* Locally V( IOFFV ) is K x NQV, C( IOFFC2 ) is MPC2 x NQC2 502* WORK( IPV ) is K x NQV = V( IOFFV ), NQV = NQC2 503* WORK( IPW ) is MPC2 x K = C( IOFFC2 ) * V( IOFFV )' 504* 505 IPV = 1 506 IPW = IPV + K * NQC2 507 LV = MAX( 1, K ) 508 LW = MAX( 1, MPC2 ) 509* 510* Broadcast V to the other process rows. 511* 512 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 513 IF( MYROW.EQ.IVROW ) THEN 514 CALL ZGEBS2D( ICTXT, 'Columnwise', COLBTOP, K, NQC2, 515 $ V( IOFFV ), LDV ) 516 IF( MYCOL.EQ.IVCOL ) 517 $ CALL ZTRBS2D( ICTXT, 'Columnwise', COLBTOP, 'Lower', 518 $ 'Non unit', K, K, T, MBV ) 519 CALL ZLAMOV( 'All', K, NQC2, V( IOFFV ), LDV, WORK( IPV ), 520 $ LV ) 521 ELSE 522 CALL ZGEBR2D( ICTXT, 'Columnwise', COLBTOP, K, NQC2, 523 $ WORK( IPV ), LV, IVROW, MYCOL ) 524 IF( MYCOL.EQ.IVCOL ) 525 $ CALL ZTRBR2D( ICTXT, 'Columnwise', COLBTOP, 'Lower', 526 $ 'Non unit', K, K, T, MBV, IVROW, MYCOL ) 527 END IF 528* 529* WORK( IPV ) is K x NQC2 = V = V( IOFFV ) 530* WORK( IPW ) = C( IOFFC2 ) * V' (MPC2 x NQC2 x K) -> MPC2 x K 531* 532 IF( NQC2.GT.0 ) THEN 533 CALL ZGEMM( 'No Transpose', 'Transpose', MPC2, K, NQC2, 534 $ ONE, C( IOFFC2 ), LDC, WORK( IPV ), LV, ZERO, 535 $ WORK( IPW ), LW ) 536 ELSE 537 CALL ZLASET( 'All', MPC2, K, ZERO, ZERO, WORK( IPW ), LW ) 538 END IF 539* 540* WORK( IPW ) = WORK( IPW ) + C1 ( MPC1 = MPC2 ) 541* 542 IF( NQC1.GT.0 ) THEN 543 MYDIST = MOD( MYCOL-ICCOL1+NPCOL, NPCOL ) 544 ILEFT = MAX( 0, MYDIST * NBC - ICOFFC1 ) 545 JJBEG = JJC1 546 JJEND = JJC1 + NQC1 - 1 547 JJNXT = MIN( ICEIL( JJBEG, NBC ) * NBC, JJEND ) 548* 549 40 CONTINUE 550 IF( JJBEG.LE.JJNXT ) THEN 551 CALL PBZMATADD( ICTXT, 'No transpose', MPC2, 552 $ JJNXT-JJBEG+1, ONE, 553 $ C( IIC1+(JJBEG-1)*LDC ), LDC, ONE, 554 $ WORK( IPW+ILEFT*LW ), LW ) 555 MYDIST = MYDIST + NPCOL 556 ILEFT = MYDIST * NBC - ICOFFC1 557 JJBEG = JJNXT +1 558 JJNXT = MIN( JJNXT+NBC, JJEND ) 559 GO TO 40 560 END IF 561 END IF 562* 563 CALL ZGSUM2D( ICTXT, 'Rowwise', ' ', MPC2, K, WORK( IPW ), 564 $ LW, MYROW, IVCOL ) 565* 566* WORK( IPW ) = WORK( IPW ) * T' or WORK( IPW ) * T 567* 568 IF( MYCOL.EQ.IVCOL ) THEN 569 DO 50 J = 1, K 570 CALL ZLACGV( K-J+1, T( J+(J-1)*MBV ), 1 ) 571 50 CONTINUE 572 CALL ZTRMM( 'Right', 'Lower', TRANS, 'Non unit', MPC2, K, 573 $ ONE, T, MBV, WORK( IPW ), LW ) 574 CALL ZGEBS2D( ICTXT, 'Rowwise', ' ', MPC2, K, WORK( IPW ), 575 $ LW ) 576 DO 60 J = 1, K 577 CALL ZLACGV( K-J+1, T( J+(J-1)*MBV ), 1 ) 578 60 CONTINUE 579 ELSE 580 CALL ZGEBR2D( ICTXT, 'Rowwise', ' ', MPC2, K, WORK( IPW ), 581 $ LW, MYROW, IVCOL ) 582 END IF 583* 584* C1 = C1 - WORK( IPW ) 585* 586 IF( NQC1.GT.0 ) THEN 587 MYDIST = MOD( MYCOL-ICCOL1+NPCOL, NPCOL ) 588 ILEFT = MAX( 0, MYDIST * NBC - ICOFFC1 ) 589 JJBEG = JJC1 590 JJEND = JJC1 + NQC1 - 1 591 JJNXT = MIN( ICEIL( JJBEG, NBC ) * NBC, JJEND ) 592* 593 70 CONTINUE 594 IF( JJBEG.LE.JJNXT ) THEN 595 CALL PBZMATADD( ICTXT, 'No transpose', MPC2, 596 $ JJNXT-JJBEG+1, -ONE, 597 $ WORK( IPW+ILEFT*LW ), LW, ONE, 598 $ C( IIC1+(JJBEG-1)*LDC ), LDC ) 599 MYDIST = MYDIST + NPCOL 600 ILEFT = MYDIST * NBC - ICOFFC1 601 JJBEG = JJNXT +1 602 JJNXT = MIN( JJNXT+NBC, JJEND ) 603 GO TO 70 604 END IF 605 END IF 606* 607* C2 C2 - W * conjg( V ) 608* C( IOFFC ) = C( IOFFC ) - WORK( IPW ) * conjg( WORK( IPV ) ) 609* MPC2 x NQC2 MPC2 x K K x NQC2 610* 611 DO 80 J = 1, NQC2 612 CALL ZLACGV( K, WORK( IPV+(J-1)*LV ), 1 ) 613 80 CONTINUE 614 IF( IOFFC2.GT.0 ) 615 $ CALL ZGEMM( 'No transpose', 'No transpose', MPC2, NQC2, K, 616 $ -ONE, WORK( IPW ), LW, WORK( IPV ), LV, ONE, 617 $ C( IOFFC2 ), LDC ) 618* 619 END IF 620* 621 RETURN 622* 623* End of PZLARZB 624* 625 END 626