1 SUBROUTINE PZUNMRQ( SIDE, TRANS, M, N, K, A, IA, JA, DESCA, TAU, 2 $ C, IC, JC, DESCC, WORK, LWORK, INFO ) 3* 4* -- ScaLAPACK routine (version 1.7) -- 5* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 6* and University of California, Berkeley. 7* May 25, 2001 8* 9* .. Scalar Arguments .. 10 CHARACTER SIDE, TRANS 11 INTEGER IA, IC, INFO, JA, JC, K, LWORK, M, N 12* .. 13* .. Array Arguments .. 14 INTEGER DESCA( * ), DESCC( * ) 15 COMPLEX*16 A( * ), C( * ), TAU( * ), WORK( * ) 16* .. 17* 18* Purpose 19* ======= 20* 21* PZUNMRQ overwrites the general complex M-by-N distributed matrix 22* sub( C ) = C(IC:IC+M-1,JC:JC+N-1) with 23* 24* SIDE = 'L' SIDE = 'R' 25* TRANS = 'N': Q * sub( C ) sub( C ) * Q 26* TRANS = 'C': Q**H * sub( C ) sub( C ) * Q**H 27* 28* where Q is a complex unitary distributed matrix defined as the 29* product of K elementary reflectors 30* 31* Q = H(1)' H(2)' . . . H(k)' 32* 33* as returned by PZGERQF. Q is of order M if SIDE = 'L' and of order N 34* if SIDE = 'R'. 35* 36* Notes 37* ===== 38* 39* Each global data object is described by an associated description 40* vector. This vector stores the information required to establish 41* the mapping between an object element and its corresponding process 42* and memory location. 43* 44* Let A be a generic term for any 2D block cyclicly distributed array. 45* Such a global array has an associated description vector DESCA. 46* In the following comments, the character _ should be read as 47* "of the global array". 48* 49* NOTATION STORED IN EXPLANATION 50* --------------- -------------- -------------------------------------- 51* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 52* DTYPE_A = 1. 53* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 54* the BLACS process grid A is distribu- 55* ted over. The context itself is glo- 56* bal, but the handle (the integer 57* value) may vary. 58* M_A (global) DESCA( M_ ) The number of rows in the global 59* array A. 60* N_A (global) DESCA( N_ ) The number of columns in the global 61* array A. 62* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 63* the rows of the array. 64* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 65* the columns of the array. 66* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 67* row of the array A is distributed. 68* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 69* first column of the array A is 70* distributed. 71* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 72* array. LLD_A >= MAX(1,LOCr(M_A)). 73* 74* Let K be the number of rows or columns of a distributed matrix, 75* and assume that its process grid has dimension p x q. 76* LOCr( K ) denotes the number of elements of K that a process 77* would receive if K were distributed over the p processes of its 78* process column. 79* Similarly, LOCc( K ) denotes the number of elements of K that a 80* process would receive if K were distributed over the q processes of 81* its process row. 82* The values of LOCr() and LOCc() may be determined via a call to the 83* ScaLAPACK tool function, NUMROC: 84* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 85* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 86* An upper bound for these quantities may be computed by: 87* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 88* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 89* 90* Arguments 91* ========= 92* 93* SIDE (global input) CHARACTER 94* = 'L': apply Q or Q**H from the Left; 95* = 'R': apply Q or Q**H from the Right. 96* 97* TRANS (global input) CHARACTER 98* = 'N': No transpose, apply Q; 99* = 'C': Conjugate transpose, apply Q**H. 100* 101* M (global input) INTEGER 102* The number of rows to be operated on i.e the number of rows 103* of the distributed submatrix sub( C ). M >= 0. 104* 105* N (global input) INTEGER 106* The number of columns to be operated on i.e the number of 107* columns of the distributed submatrix sub( C ). N >= 0. 108* 109* K (global input) INTEGER 110* The number of elementary reflectors whose product defines the 111* matrix Q. If SIDE = 'L', M >= K >= 0, if SIDE = 'R', 112* N >= K >= 0. 113* 114* A (local input) COMPLEX*16 pointer into the local memory 115* to an array of dimension (LLD_A,LOCc(JA+M-1)) if SIDE='L', 116* and (LLD_A,LOCc(JA+N-1)) if SIDE='R', where 117* LLD_A >= MAX(1,LOCr(IA+K-1)); On entry, the i-th row must 118* contain the vector which defines the elementary reflector 119* H(i), IA <= i <= IA+K-1, as returned by PZGERQF in the 120* K rows of its distributed matrix argument A(IA:IA+K-1,JA:*). 121* A(IA:IA+K-1,JA:*) is modified by the routine but restored on 122* exit. 123* 124* IA (global input) INTEGER 125* The row index in the global array A indicating the first 126* row of sub( A ). 127* 128* JA (global input) INTEGER 129* The column index in the global array A indicating the 130* first column of sub( A ). 131* 132* DESCA (global and local input) INTEGER array of dimension DLEN_. 133* The array descriptor for the distributed matrix A. 134* 135* TAU (local input) COMPLEX*16, array, dimension LOCc(IA+K-1). 136* This array contains the scalar factors TAU(i) of the 137* elementary reflectors H(i) as returned by PZGERQF. 138* TAU is tied to the distributed matrix A. 139* 140* C (local input/local output) COMPLEX*16 pointer into the 141* local memory to an array of dimension (LLD_C,LOCc(JC+N-1)). 142* On entry, the local pieces of the distributed matrix sub(C). 143* On exit, sub( C ) is overwritten by Q*sub( C ) or Q'*sub( C ) 144* or sub( C )*Q' or sub( C )*Q. 145* 146* IC (global input) INTEGER 147* The row index in the global array C indicating the first 148* row of sub( C ). 149* 150* JC (global input) INTEGER 151* The column index in the global array C indicating the 152* first column of sub( C ). 153* 154* DESCC (global and local input) INTEGER array of dimension DLEN_. 155* The array descriptor for the distributed matrix C. 156* 157* WORK (local workspace/local output) COMPLEX*16 array, 158* dimension (LWORK) 159* On exit, WORK(1) returns the minimal and optimal LWORK. 160* 161* LWORK (local or global input) INTEGER 162* The dimension of the array WORK. 163* LWORK is local input and must be at least 164* if SIDE = 'L', 165* LWORK >= MAX( (MB_A*(MB_A-1))/2, ( MpC0 + MAX( MqA0 + 166* NUMROC( NUMROC( M+IROFFC, MB_A, 0, 0, NPROW ), 167* MB_A, 0, 0, LCMP ), NqC0 ) )*MB_A ) + 168* MB_A * MB_A 169* else if SIDE = 'R', 170* LWORK >= MAX( (MB_A*(MB_A-1))/2, (MpC0 + NqC0)*MB_A ) + 171* MB_A * MB_A 172* end if 173* 174* where LCMP = LCM / NPROW with LCM = ICLM( NPROW, NPCOL ), 175* 176* IROFFA = MOD( IA-1, MB_A ), ICOFFA = MOD( JA-1, NB_A ), 177* IACOL = INDXG2P( JA, NB_A, MYCOL, CSRC_A, NPCOL ), 178* MqA0 = NUMROC( M+ICOFFA, NB_A, MYCOL, IACOL, NPCOL ), 179* 180* IROFFC = MOD( IC-1, MB_C ), ICOFFC = MOD( JC-1, NB_C ), 181* ICROW = INDXG2P( IC, MB_C, MYROW, RSRC_C, NPROW ), 182* ICCOL = INDXG2P( JC, NB_C, MYCOL, CSRC_C, NPCOL ), 183* MpC0 = NUMROC( M+IROFFC, MB_C, MYROW, ICROW, NPROW ), 184* NqC0 = NUMROC( N+ICOFFC, NB_C, MYCOL, ICCOL, NPCOL ), 185* 186* ILCM, INDXG2P and NUMROC are ScaLAPACK tool functions; 187* MYROW, MYCOL, NPROW and NPCOL can be determined by calling 188* the subroutine BLACS_GRIDINFO. 189* 190* If LWORK = -1, then LWORK is global input and a workspace 191* query is assumed; the routine only calculates the minimum 192* and optimal size for all work arrays. Each of these 193* values is returned in the first entry of the corresponding 194* work array, and no error message is issued by PXERBLA. 195* 196* 197* INFO (global output) INTEGER 198* = 0: successful exit 199* < 0: If the i-th argument is an array and the j-entry had 200* an illegal value, then INFO = -(i*100+j), if the i-th 201* argument is a scalar and had an illegal value, then 202* INFO = -i. 203* 204* Alignment requirements 205* ====================== 206* 207* The distributed submatrices A(IA:*, JA:*) and C(IC:IC+M-1,JC:JC+N-1) 208* must verify some alignment properties, namely the following 209* expressions should be true: 210* 211* If SIDE = 'L', 212* ( NB_A.EQ.MB_C .AND. ICOFFA.EQ.IROFFC ) 213* If SIDE = 'R', 214* ( NB_A.EQ.NB_C .AND. ICOFFA.EQ.ICOFFC .AND. IACOL.EQ.ICCOL ) 215* 216* ===================================================================== 217* 218* .. Parameters .. 219 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 220 $ LLD_, MB_, M_, NB_, N_, RSRC_ 221 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 222 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 223 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 224* .. 225* .. Local Scalars .. 226 LOGICAL LEFT, LQUERY, NOTRAN, RIGHT, TRAN 227 CHARACTER COLBTOP, ROWBTOP, TRANST 228 INTEGER I, I1, I2, I3, IACOL, IB, ICCOL, ICOFFA, 229 $ ICOFFC, ICROW, ICTXT, IINFO, IPW, IROFFC, LCM, 230 $ LCMP, LWMIN, MI, MPC0, MQA0, MYCOL, MYROW, NI, 231 $ NPCOL, NPROW, NQ, NQC0 232* .. 233* .. Local Arrays .. 234 INTEGER IDUM1( 4 ), IDUM2( 4 ) 235* .. 236* .. External Subroutines .. 237 EXTERNAL BLACS_GRIDINFO, CHK1MAT, PCHK2MAT, PB_TOPGET, 238 $ PB_TOPSET, PXERBLA, PZLARFB, PZLARFT, 239 $ PZUNMR2 240* .. 241* .. External Functions .. 242 LOGICAL LSAME 243 INTEGER ICEIL, ILCM, INDXG2P, NUMROC 244 EXTERNAL ICEIL, ILCM, INDXG2P, LSAME, NUMROC 245* .. 246* .. Intrinsic Functions .. 247 INTRINSIC DBLE, DCMPLX, ICHAR, MAX, MIN, MOD 248* .. 249* .. Executable Statements .. 250* 251* Get grid parameters 252* 253 ICTXT = DESCA( CTXT_ ) 254 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 255* 256* Test the input parameters 257* 258 INFO = 0 259 IF( NPROW.EQ.-1 ) THEN 260 INFO = -(900+CTXT_) 261 ELSE 262 IF( LSAME( SIDE, 'L' ) ) THEN 263 LEFT = .TRUE. 264 RIGHT = .FALSE. 265 ELSE 266 LEFT = .FALSE. 267 RIGHT = .TRUE. 268 END IF 269 IF( LSAME( TRANS, 'N' ) ) THEN 270 NOTRAN = .TRUE. 271 TRAN = .FALSE. 272 ELSE 273 NOTRAN = .FALSE. 274 TRAN = .TRUE. 275 END IF 276* 277* NQ is the order of Q 278* 279 IF( LEFT ) THEN 280 NQ = M 281 CALL CHK1MAT( K, 5, M, 3, IA, JA, DESCA, 9, INFO ) 282 ELSE 283 NQ = N 284 CALL CHK1MAT( K, 5, N, 4, IA, JA, DESCA, 9, INFO ) 285 END IF 286 CALL CHK1MAT( M, 3, N, 4, IC, JC, DESCC, 14, INFO ) 287 IF( INFO.EQ.0 ) THEN 288 ICOFFA = MOD( JA-1, DESCA( NB_ ) ) 289 IROFFC = MOD( IC-1, DESCC( MB_ ) ) 290 ICOFFC = MOD( JC-1, DESCC( NB_ ) ) 291 IACOL = INDXG2P( JA, DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), 292 $ NPCOL ) 293 ICROW = INDXG2P( IC, DESCC( MB_ ), MYROW, DESCC( RSRC_ ), 294 $ NPROW ) 295 ICCOL = INDXG2P( JC, DESCC( NB_ ), MYCOL, DESCC( CSRC_ ), 296 $ NPCOL ) 297 MPC0 = NUMROC( M+IROFFC, DESCC( MB_ ), MYROW, ICROW, NPROW ) 298 NQC0 = NUMROC( N+ICOFFC, DESCC( NB_ ), MYCOL, ICCOL, NPCOL ) 299* 300 IF( LEFT ) THEN 301 MQA0 = NUMROC( M+ICOFFA, DESCA( NB_ ), MYCOL, IACOL, 302 $ NPCOL ) 303 LCM = ILCM( NPROW, NPCOL ) 304 LCMP = LCM / NPROW 305 LWMIN = MAX( ( DESCA( MB_ ) * ( DESCA( MB_ ) - 1 ) ) 306 $ / 2, ( MPC0 + MAX( MQA0 + NUMROC( NUMROC( 307 $ M+IROFFC, DESCA( MB_ ), 0, 0, NPROW ), 308 $ DESCA( MB_ ), 0, 0, LCMP ), NQC0 ) ) * 309 $ DESCA( MB_ ) ) + DESCA( MB_ ) * DESCA( MB_ ) 310 ELSE 311 LWMIN = MAX( ( DESCA( MB_ ) * ( DESCA( MB_ ) - 1 ) ) / 2, 312 $ ( MPC0 + NQC0 ) * DESCA( MB_ ) ) + 313 $ DESCA( MB_ ) * DESCA( MB_ ) 314 END IF 315* 316 WORK( 1 ) = DCMPLX( DBLE( LWMIN ) ) 317 LQUERY = ( LWORK.EQ.-1 ) 318 IF( .NOT.LEFT .AND. .NOT.LSAME( SIDE, 'R' ) ) THEN 319 INFO = -1 320 ELSE IF( .NOT.NOTRAN .AND. .NOT.LSAME( TRANS, 'C' ) ) THEN 321 INFO = -2 322 ELSE IF( K.LT.0 .OR. K.GT.NQ ) THEN 323 INFO = -5 324 ELSE IF( LEFT .AND. DESCA( NB_ ).NE.DESCC( MB_ ) ) THEN 325 INFO = -(900+NB_) 326 ELSE IF( LEFT .AND. ICOFFA.NE.IROFFC ) THEN 327 INFO = -12 328 ELSE IF( .NOT.LEFT .AND. ICOFFA.NE.ICOFFC ) THEN 329 INFO = -13 330 ELSE IF( .NOT.LEFT .AND. IACOL.NE.ICCOL ) THEN 331 INFO = -13 332 ELSE IF( .NOT.LEFT .AND. DESCA( NB_ ).NE.DESCC( NB_ ) ) THEN 333 INFO = -(1400+NB_) 334 ELSE IF( ICTXT.NE.DESCC( CTXT_ ) ) THEN 335 INFO = -(1400+CTXT_) 336 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 337 INFO = -16 338 END IF 339 END IF 340 IF( LEFT ) THEN 341 IDUM1( 1 ) = ICHAR( 'L' ) 342 ELSE 343 IDUM1( 1 ) = ICHAR( 'R' ) 344 END IF 345 IDUM2( 1 ) = 1 346 IF( NOTRAN ) THEN 347 IDUM1( 2 ) = ICHAR( 'N' ) 348 ELSE 349 IDUM1( 2 ) = ICHAR( 'C' ) 350 END IF 351 IDUM2( 2 ) = 2 352 IDUM1( 3 ) = K 353 IDUM2( 3 ) = 5 354 IF( LWORK.EQ.-1 ) THEN 355 IDUM1( 4 ) = -1 356 ELSE 357 IDUM1( 4 ) = 1 358 END IF 359 IDUM2( 4 ) = 16 360 IF( LEFT ) THEN 361 CALL PCHK2MAT( K, 5, M, 3, IA, JA, DESCA, 9, M, 3, N, 4, 362 $ IC, JC, DESCC, 14, 4, IDUM1, IDUM2, INFO ) 363 ELSE 364 CALL PCHK2MAT( K, 5, N, 4, IA, JA, DESCA, 9, M, 3, N, 4, 365 $ IC, JC, DESCC, 14, 4, IDUM1, IDUM2, INFO ) 366 END IF 367 END IF 368* 369 IF( INFO.NE.0 ) THEN 370 CALL PXERBLA( ICTXT, 'PZUNMRQ', -INFO ) 371 RETURN 372 ELSE IF( LQUERY ) THEN 373 RETURN 374 END IF 375* 376* Quick return if possible 377* 378 IF( M.EQ.0 .OR. N.EQ.0 .OR. K.EQ.0 ) 379 $ RETURN 380* 381 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 382 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 383* 384 IF( ( LEFT .AND. .NOT.NOTRAN ) .OR. 385 $ ( .NOT.LEFT .AND. NOTRAN ) ) THEN 386 I1 = MIN( ICEIL( IA, DESCA( MB_ ) ) * DESCA( MB_ ), IA+K-1 ) 387 $ + 1 388 I2 = IA + K - 1 389 I3 = DESCA( MB_ ) 390 ELSE 391 I1 = MAX( ( (IA+K-2) / DESCA( MB_ ) ) * DESCA( MB_ ) + 1, IA ) 392 I2 = MIN( ICEIL( IA, DESCA( MB_ ) ) * DESCA( MB_ ), IA+K-1 ) 393 $ + 1 394 I3 = -DESCA( MB_ ) 395 END IF 396* 397 IF( LEFT ) THEN 398 NI = N 399 ELSE 400 MI = M 401 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ' ' ) 402 IF( NOTRAN ) THEN 403 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'I-ring' ) 404 ELSE 405 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'D-ring' ) 406 END IF 407 END IF 408* 409 IF( NOTRAN ) THEN 410 TRANST = 'C' 411 ELSE 412 TRANST = 'N' 413 END IF 414* 415 IF( ( LEFT .AND. .NOT.NOTRAN ) .OR. 416 $ ( .NOT.LEFT .AND. NOTRAN ) ) THEN 417 IB = I1 - IA 418 IF( LEFT ) THEN 419 MI = M - K + IB 420 ELSE 421 NI = N - K + IB 422 END IF 423 CALL PZUNMR2( SIDE, TRANS, MI, NI, IB, A, IA, JA, DESCA, TAU, 424 $ C, IC, JC, DESCC, WORK, LWORK, IINFO ) 425 END IF 426* 427 IPW = DESCA( MB_ )*DESCA( MB_ ) + 1 428 DO 10 I = I1, I2, I3 429 IB = MIN( DESCA( MB_ ), K-I+IA ) 430* 431* Form the triangular factor of the block reflector 432* H = H(i+ib-1) . . . H(i+1) H(i) 433* 434 CALL PZLARFT( 'Backward', 'Rowwise', NQ-K+I+IB-IA, IB, 435 $ A, I, JA, DESCA, TAU, WORK, WORK( IPW ) ) 436 IF( LEFT ) THEN 437* 438* H or H' is applied to C(ic:ic+m-k+i+ib-ia-1,jc:jc+n-1) 439* 440 MI = M - K + I + IB - IA 441 ELSE 442* 443* H or H' is applied to C(ic:ic+m-1,jc:jc+n-k+i+ib-ia-1) 444* 445 NI = N - K + I + IB - IA 446 END IF 447* 448* Apply H or H' 449* 450 CALL PZLARFB( SIDE, TRANST, 'Backward', 'Rowwise', MI, NI, 451 $ IB, A, I, JA, DESCA, WORK, C, IC, JC, DESCC, 452 $ WORK( IPW ) ) 453 10 CONTINUE 454* 455 IF( ( RIGHT .AND. TRAN ) .OR. 456 $ ( LEFT .AND. NOTRAN ) ) THEN 457 IB = I2 - IA 458 IF( LEFT ) THEN 459 MI = M - K + IB 460 ELSE 461 NI = N - K + IB 462 END IF 463 CALL PZUNMR2( SIDE, TRANS, MI, NI, IB, A, IA, JA, DESCA, TAU, 464 $ C, IC, JC, DESCC, WORK, LWORK, IINFO ) 465 END IF 466* 467 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 468 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 469* 470 WORK( 1 ) = DCMPLX( DBLE( LWMIN ) ) 471* 472 RETURN 473* 474* End of PZUNMRQ 475* 476 END 477