1 SUBROUTINE PDORMRQ( SIDE, TRANS, M, N, K, A, IA, JA, DESCA, TAU, 2 $ C, IC, JC, DESCC, WORK, LWORK, INFO ) 3* 4* -- ScaLAPACK routine (version 1.7) -- 5* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 6* and University of California, Berkeley. 7* May 25, 2001 8* 9* .. Scalar Arguments .. 10 CHARACTER SIDE, TRANS 11 INTEGER IA, IC, INFO, JA, JC, K, LWORK, M, N 12* .. 13* .. Array Arguments .. 14 INTEGER DESCA( * ), DESCC( * ) 15 DOUBLE PRECISION A( * ), C( * ), TAU( * ), WORK( * ) 16* .. 17* 18* Purpose 19* ======= 20* 21* PDORMRQ overwrites the general real M-by-N distributed matrix 22* sub( C ) = C(IC:IC+M-1,JC:JC+N-1) with 23* 24* SIDE = 'L' SIDE = 'R' 25* TRANS = 'N': Q * sub( C ) sub( C ) * Q 26* TRANS = 'T': Q**T * sub( C ) sub( C ) * Q**T 27* 28* where Q is a real orthogonal distributed matrix defined as the 29* product of K elementary reflectors 30* 31* Q = H(1) H(2) . . . H(k) 32* 33* as returned by PDGERQF. Q is of order M if SIDE = 'L' and of order N 34* if SIDE = 'R'. 35* 36* Notes 37* ===== 38* 39* Each global data object is described by an associated description 40* vector. This vector stores the information required to establish 41* the mapping between an object element and its corresponding process 42* and memory location. 43* 44* Let A be a generic term for any 2D block cyclicly distributed array. 45* Such a global array has an associated description vector DESCA. 46* In the following comments, the character _ should be read as 47* "of the global array". 48* 49* NOTATION STORED IN EXPLANATION 50* --------------- -------------- -------------------------------------- 51* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 52* DTYPE_A = 1. 53* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 54* the BLACS process grid A is distribu- 55* ted over. The context itself is glo- 56* bal, but the handle (the integer 57* value) may vary. 58* M_A (global) DESCA( M_ ) The number of rows in the global 59* array A. 60* N_A (global) DESCA( N_ ) The number of columns in the global 61* array A. 62* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 63* the rows of the array. 64* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 65* the columns of the array. 66* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 67* row of the array A is distributed. 68* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 69* first column of the array A is 70* distributed. 71* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 72* array. LLD_A >= MAX(1,LOCr(M_A)). 73* 74* Let K be the number of rows or columns of a distributed matrix, 75* and assume that its process grid has dimension p x q. 76* LOCr( K ) denotes the number of elements of K that a process 77* would receive if K were distributed over the p processes of its 78* process column. 79* Similarly, LOCc( K ) denotes the number of elements of K that a 80* process would receive if K were distributed over the q processes of 81* its process row. 82* The values of LOCr() and LOCc() may be determined via a call to the 83* ScaLAPACK tool function, NUMROC: 84* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 85* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 86* An upper bound for these quantities may be computed by: 87* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 88* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 89* 90* Arguments 91* ========= 92* 93* SIDE (global input) CHARACTER 94* = 'L': apply Q or Q**T from the Left; 95* = 'R': apply Q or Q**T from the Right. 96* 97* TRANS (global input) CHARACTER 98* = 'N': No transpose, apply Q; 99* = 'T': Transpose, apply Q**T. 100* 101* M (global input) INTEGER 102* The number of rows to be operated on i.e the number of rows 103* of the distributed submatrix sub( C ). M >= 0. 104* 105* N (global input) INTEGER 106* The number of columns to be operated on i.e the number of 107* columns of the distributed submatrix sub( C ). N >= 0. 108* 109* K (global input) INTEGER 110* The number of elementary reflectors whose product defines the 111* matrix Q. If SIDE = 'L', M >= K >= 0, if SIDE = 'R', 112* N >= K >= 0. 113* 114* A (local input) DOUBLE PRECISION pointer into the local memory 115* to an array of dimension (LLD_A,LOCc(JA+M-1)) if SIDE='L', 116* and (LLD_A,LOCc(JA+N-1)) if SIDE='R', where 117* LLD_A >= MAX(1,LOCr(IA+K-1)); On entry, the i-th row must 118* contain the vector which defines the elementary reflector 119* H(i), IA <= i <= IA+K-1, as returned by PDGERQF in the 120* K rows of its distributed matrix argument A(IA:IA+K-1,JA:*). 121* A(IA:IA+K-1,JA:*) is modified by the routine but restored on 122* exit. 123* 124* IA (global input) INTEGER 125* The row index in the global array A indicating the first 126* row of sub( A ). 127* 128* JA (global input) INTEGER 129* The column index in the global array A indicating the 130* first column of sub( A ). 131* 132* DESCA (global and local input) INTEGER array of dimension DLEN_. 133* The array descriptor for the distributed matrix A. 134* 135* TAU (local input) DOUBLE PRECISION array, dimension LOCc(IA+K-1). 136* This array contains the scalar factors TAU(i) of the 137* elementary reflectors H(i) as returned by PDGERQF. 138* TAU is tied to the distributed matrix A. 139* 140* C (local input/local output) DOUBLE PRECISION pointer into the 141* local memory to an array of dimension (LLD_C,LOCc(JC+N-1)). 142* On entry, the local pieces of the distributed matrix sub(C). 143* On exit, sub( C ) is overwritten by Q*sub( C ) or Q'*sub( C ) 144* or sub( C )*Q' or sub( C )*Q. 145* 146* IC (global input) INTEGER 147* The row index in the global array C indicating the first 148* row of sub( C ). 149* 150* JC (global input) INTEGER 151* The column index in the global array C indicating the 152* first column of sub( C ). 153* 154* DESCC (global and local input) INTEGER array of dimension DLEN_. 155* The array descriptor for the distributed matrix C. 156* 157* WORK (local workspace/local output) DOUBLE PRECISION array, 158* dimension (LWORK) 159* On exit, WORK(1) returns the minimal and optimal LWORK. 160* 161* LWORK (local or global input) INTEGER 162* The dimension of the array WORK. 163* LWORK is local input and must be at least 164* if SIDE = 'L', 165* LWORK >= MAX( (MB_A*(MB_A-1))/2, ( MpC0 + MAX( MqA0 + 166* NUMROC( NUMROC( M+IROFFC, MB_A, 0, 0, NPROW ), 167* MB_A, 0, 0, LCMP ), NqC0 ) )*MB_A ) + 168* MB_A * MB_A 169* else if SIDE = 'R', 170* LWORK >= MAX( (MB_A*(MB_A-1))/2, (MpC0 + NqC0)*MB_A ) + 171* MB_A * MB_A 172* end if 173* 174* where LCMP = LCM / NPROW with LCM = ICLM( NPROW, NPCOL ), 175* 176* IROFFA = MOD( IA-1, MB_A ), ICOFFA = MOD( JA-1, NB_A ), 177* IACOL = INDXG2P( JA, NB_A, MYCOL, CSRC_A, NPCOL ), 178* MqA0 = NUMROC( M+ICOFFA, NB_A, MYCOL, IACOL, NPCOL ), 179* 180* IROFFC = MOD( IC-1, MB_C ), ICOFFC = MOD( JC-1, NB_C ), 181* ICROW = INDXG2P( IC, MB_C, MYROW, RSRC_C, NPROW ), 182* ICCOL = INDXG2P( JC, NB_C, MYCOL, CSRC_C, NPCOL ), 183* MpC0 = NUMROC( M+IROFFC, MB_C, MYROW, ICROW, NPROW ), 184* NqC0 = NUMROC( N+ICOFFC, NB_C, MYCOL, ICCOL, NPCOL ), 185* 186* ILCM, INDXG2P and NUMROC are ScaLAPACK tool functions; 187* MYROW, MYCOL, NPROW and NPCOL can be determined by calling 188* the subroutine BLACS_GRIDINFO. 189* 190* If LWORK = -1, then LWORK is global input and a workspace 191* query is assumed; the routine only calculates the minimum 192* and optimal size for all work arrays. Each of these 193* values is returned in the first entry of the corresponding 194* work array, and no error message is issued by PXERBLA. 195* 196* 197* INFO (global output) INTEGER 198* = 0: successful exit 199* < 0: If the i-th argument is an array and the j-entry had 200* an illegal value, then INFO = -(i*100+j), if the i-th 201* argument is a scalar and had an illegal value, then 202* INFO = -i. 203* 204* Alignment requirements 205* ====================== 206* 207* The distributed submatrices A(IA:*, JA:*) and C(IC:IC+M-1,JC:JC+N-1) 208* must verify some alignment properties, namely the following 209* expressions should be true: 210* 211* If SIDE = 'L', 212* ( NB_A.EQ.MB_C .AND. ICOFFA.EQ.IROFFC ) 213* If SIDE = 'R', 214* ( NB_A.EQ.NB_C .AND. ICOFFA.EQ.ICOFFC .AND. IACOL.EQ.ICCOL ) 215* 216* ===================================================================== 217* 218* .. Parameters .. 219 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 220 $ LLD_, MB_, M_, NB_, N_, RSRC_ 221 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 222 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 223 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 224* .. 225* .. Local Scalars .. 226 LOGICAL LEFT, LQUERY, NOTRAN, RIGHT, TRAN 227 CHARACTER COLBTOP, ROWBTOP, TRANST 228 INTEGER I, I1, I2, I3, IACOL, IB, ICCOL, ICOFFA, 229 $ ICOFFC, ICROW, ICTXT, IINFO, IPW, IROFFC, LCM, 230 $ LCMP, LWMIN, MI, MPC0, MQA0, MYCOL, MYROW, NI, 231 $ NPCOL, NPROW, NQ, NQC0 232* .. 233* .. Local Arrays .. 234 INTEGER IDUM1( 4 ), IDUM2( 4 ) 235* .. 236* .. External Subroutines .. 237 EXTERNAL BLACS_GRIDINFO, CHK1MAT, PCHK2MAT, PDLARFB, 238 $ PDLARFT, PDORMR2, PB_TOPGET, PB_TOPSET, PXERBLA 239* .. 240* .. External Functions .. 241 LOGICAL LSAME 242 INTEGER ICEIL, ILCM, INDXG2P, NUMROC 243 EXTERNAL ICEIL, ILCM, INDXG2P, LSAME, NUMROC 244* .. 245* .. Intrinsic Functions .. 246 INTRINSIC DBLE, ICHAR, MAX, MIN, MOD 247* .. 248* .. Executable Statements .. 249* 250* Get grid parameters 251* 252 ICTXT = DESCA( CTXT_ ) 253 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 254* 255* Test the input parameters 256* 257 INFO = 0 258 IF( NPROW.EQ.-1 ) THEN 259 INFO = -(900+CTXT_) 260 ELSE 261 IF( LSAME( SIDE, 'L' ) ) THEN 262 LEFT = .TRUE. 263 RIGHT = .FALSE. 264 ELSE 265 LEFT = .FALSE. 266 RIGHT = .TRUE. 267 END IF 268 IF( LSAME( TRANS, 'N' ) ) THEN 269 NOTRAN = .TRUE. 270 TRAN = .FALSE. 271 ELSE 272 NOTRAN = .FALSE. 273 TRAN = .TRUE. 274 END IF 275* 276* NQ is the order of Q 277* 278 IF( LEFT ) THEN 279 NQ = M 280 CALL CHK1MAT( K, 5, M, 3, IA, JA, DESCA, 9, INFO ) 281 ELSE 282 NQ = N 283 CALL CHK1MAT( K, 5, N, 4, IA, JA, DESCA, 9, INFO ) 284 END IF 285 CALL CHK1MAT( M, 3, N, 4, IC, JC, DESCC, 14, INFO ) 286 IF( INFO.EQ.0 ) THEN 287 ICOFFA = MOD( JA-1, DESCA( NB_ ) ) 288 IROFFC = MOD( IC-1, DESCC( MB_ ) ) 289 ICOFFC = MOD( JC-1, DESCC( NB_ ) ) 290 IACOL = INDXG2P( JA, DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), 291 $ NPCOL ) 292 ICROW = INDXG2P( IC, DESCC( MB_ ), MYROW, DESCC( RSRC_ ), 293 $ NPROW ) 294 ICCOL = INDXG2P( JC, DESCC( NB_ ), MYCOL, DESCC( CSRC_ ), 295 $ NPCOL ) 296 MPC0 = NUMROC( M+IROFFC, DESCC( MB_ ), MYROW, ICROW, NPROW ) 297 NQC0 = NUMROC( N+ICOFFC, DESCC( NB_ ), MYCOL, ICCOL, NPCOL ) 298* 299 IF( LEFT ) THEN 300 MQA0 = NUMROC( M+ICOFFA, DESCA( NB_ ), MYCOL, IACOL, 301 $ NPCOL ) 302 LCM = ILCM( NPROW, NPCOL ) 303 LCMP = LCM / NPROW 304 LWMIN = MAX( ( DESCA( MB_ ) * ( DESCA( MB_ ) - 1 ) ) 305 $ / 2, ( MPC0 + MAX( MQA0 + NUMROC( NUMROC( 306 $ M+IROFFC, DESCA( MB_ ), 0, 0, NPROW ), 307 $ DESCA( MB_ ), 0, 0, LCMP ), NQC0 ) ) * 308 $ DESCA( MB_ ) ) + DESCA( MB_ ) * DESCA( MB_ ) 309 ELSE 310 LWMIN = MAX( ( DESCA( MB_ ) * ( DESCA( MB_ ) - 1 ) ) / 2, 311 $ ( MPC0 + NQC0 ) * DESCA( MB_ ) ) + 312 $ DESCA( MB_ ) * DESCA( MB_ ) 313 END IF 314* 315 WORK( 1 ) = DBLE( LWMIN ) 316 LQUERY = ( LWORK.EQ.-1 ) 317 IF( .NOT.LEFT .AND. .NOT.LSAME( SIDE, 'R' ) ) THEN 318 INFO = -1 319 ELSE IF( .NOT.NOTRAN .AND. .NOT.LSAME( TRANS, 'T' ) ) THEN 320 INFO = -2 321 ELSE IF( K.LT.0 .OR. K.GT.NQ ) THEN 322 INFO = -5 323 ELSE IF( LEFT .AND. DESCA( NB_ ).NE.DESCC( MB_ ) ) THEN 324 INFO = -(900+NB_) 325 ELSE IF( LEFT .AND. ICOFFA.NE.IROFFC ) THEN 326 INFO = -12 327 ELSE IF( .NOT.LEFT .AND. ICOFFA.NE.ICOFFC ) THEN 328 INFO = -13 329 ELSE IF( .NOT.LEFT .AND. IACOL.NE.ICCOL ) THEN 330 INFO = -13 331 ELSE IF( .NOT.LEFT .AND. DESCA( NB_ ).NE.DESCC( NB_ ) ) THEN 332 INFO = -(1400+NB_) 333 ELSE IF( ICTXT.NE.DESCC( CTXT_ ) ) THEN 334 INFO = -(1400+CTXT_) 335 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 336 INFO = -16 337 END IF 338 END IF 339 IF( LEFT ) THEN 340 IDUM1( 1 ) = ICHAR( 'L' ) 341 ELSE 342 IDUM1( 1 ) = ICHAR( 'R' ) 343 END IF 344 IDUM2( 1 ) = 1 345 IF( NOTRAN ) THEN 346 IDUM1( 2 ) = ICHAR( 'N' ) 347 ELSE 348 IDUM1( 2 ) = ICHAR( 'T' ) 349 END IF 350 IDUM2( 2 ) = 2 351 IDUM1( 3 ) = K 352 IDUM2( 3 ) = 5 353 IF( LWORK.EQ.-1 ) THEN 354 IDUM1( 4 ) = -1 355 ELSE 356 IDUM1( 4 ) = 1 357 END IF 358 IDUM2( 4 ) = 16 359 IF( LEFT ) THEN 360 CALL PCHK2MAT( K, 5, M, 3, IA, JA, DESCA, 9, M, 3, N, 4, 361 $ IC, JC, DESCC, 14, 4, IDUM1, IDUM2, INFO ) 362 ELSE 363 CALL PCHK2MAT( K, 5, N, 4, IA, JA, DESCA, 9, M, 3, N, 4, 364 $ IC, JC, DESCC, 14, 4, IDUM1, IDUM2, INFO ) 365 END IF 366 END IF 367* 368 IF( INFO.NE.0 ) THEN 369 CALL PXERBLA( ICTXT, 'PDORMRQ', -INFO ) 370 RETURN 371 ELSE IF( LQUERY ) THEN 372 RETURN 373 END IF 374* 375* Quick return if possible 376* 377 IF( M.EQ.0 .OR. N.EQ.0 .OR. K.EQ.0 ) 378 $ RETURN 379* 380 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 381 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 382* 383 IF( ( LEFT .AND. .NOT.NOTRAN ) .OR. 384 $ ( .NOT.LEFT .AND. NOTRAN ) ) THEN 385 I1 = MIN( ICEIL( IA, DESCA( MB_ ) ) * DESCA( MB_ ), IA+K-1 ) 386 $ + 1 387 I2 = IA + K - 1 388 I3 = DESCA( MB_ ) 389 ELSE 390 I1 = MAX( ( (IA+K-2) / DESCA( MB_ ) ) * DESCA( MB_ ) + 1, IA ) 391 I2 = MIN( ICEIL( IA, DESCA( MB_ ) ) * DESCA( MB_ ), IA+K-1 ) 392 $ + 1 393 I3 = -DESCA( MB_ ) 394 END IF 395* 396 IF( LEFT ) THEN 397 NI = N 398 ELSE 399 MI = M 400 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ' ' ) 401 IF( NOTRAN ) THEN 402 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'I-ring' ) 403 ELSE 404 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'D-ring' ) 405 END IF 406 END IF 407* 408 IF( NOTRAN ) THEN 409 TRANST = 'T' 410 ELSE 411 TRANST = 'N' 412 END IF 413* 414 IF( ( LEFT .AND. .NOT.NOTRAN ) .OR. 415 $ ( .NOT.LEFT .AND. NOTRAN ) ) THEN 416 IB = I1 - IA 417 IF( LEFT ) THEN 418 MI = M - K + IB 419 ELSE 420 NI = N - K + IB 421 END IF 422 CALL PDORMR2( SIDE, TRANS, MI, NI, IB, A, IA, JA, DESCA, TAU, 423 $ C, IC, JC, DESCC, WORK, LWORK, IINFO ) 424 END IF 425* 426 IPW = DESCA( MB_ )*DESCA( MB_ ) + 1 427 DO 10 I = I1, I2, I3 428 IB = MIN( DESCA( MB_ ), K-I+IA ) 429* 430* Form the triangular factor of the block reflector 431* H = H(i+ib-1) . . . H(i+1) H(i) 432* 433 CALL PDLARFT( 'Backward', 'Rowwise', NQ-K+I+IB-IA, IB, 434 $ A, I, JA, DESCA, TAU, WORK, WORK( IPW ) ) 435 IF( LEFT ) THEN 436* 437* H or H' is applied to C(ic:ic+m-k+i+ib-ia-1,jc:jc+n-1) 438* 439 MI = M - K + I + IB - IA 440 ELSE 441* 442* H or H' is applied to C(ic:ic+m-1,jc:jc+n-k+i+ib-ia-1) 443* 444 NI = N - K + I + IB - IA 445 END IF 446* 447* Apply H or H' 448* 449 CALL PDLARFB( SIDE, TRANST, 'Backward', 'Rowwise', MI, NI, 450 $ IB, A, I, JA, DESCA, WORK, C, IC, JC, DESCC, 451 $ WORK( IPW ) ) 452 10 CONTINUE 453* 454 IF( ( RIGHT .AND. TRAN ) .OR. 455 $ ( LEFT .AND. NOTRAN ) ) THEN 456 IB = I2 - IA 457 IF( LEFT ) THEN 458 MI = M - K + IB 459 ELSE 460 NI = N - K + IB 461 END IF 462 CALL PDORMR2( SIDE, TRANS, MI, NI, IB, A, IA, JA, DESCA, TAU, 463 $ C, IC, JC, DESCC, WORK, LWORK, IINFO ) 464 END IF 465* 466 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 467 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 468* 469 WORK( 1 ) = DBLE( LWMIN ) 470* 471 RETURN 472* 473* End of PDORMRQ 474* 475 END 476