1 SUBROUTINE PSORMQR( SIDE, TRANS, M, N, K, A, IA, JA, DESCA, TAU, 2 $ C, IC, JC, DESCC, WORK, LWORK, INFO ) 3* 4* -- ScaLAPACK routine (version 1.7) -- 5* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 6* and University of California, Berkeley. 7* May 25, 2001 8* 9* .. Scalar Arguments .. 10 CHARACTER SIDE, TRANS 11 INTEGER IA, IC, INFO, JA, JC, K, LWORK, M, N 12* .. 13* .. Array Arguments .. 14 INTEGER DESCA( * ), DESCC( * ) 15 REAL A( * ), C( * ), TAU( * ), WORK( * ) 16* .. 17* 18* Purpose 19* ======= 20* 21* PSORMQR overwrites the general real M-by-N distributed matrix 22* sub( C ) = C(IC:IC+M-1,JC:JC+N-1) with 23* 24* SIDE = 'L' SIDE = 'R' 25* TRANS = 'N': Q * sub( C ) sub( C ) * Q 26* TRANS = 'T': Q**T * sub( C ) sub( C ) * Q**T 27* 28* where Q is a real orthogonal distributed matrix defined as the 29* product of k elementary reflectors 30* 31* Q = H(1) H(2) . . . H(k) 32* 33* as returned by PSGEQRF. Q is of order M if SIDE = 'L' and of order N 34* if SIDE = 'R'. 35* 36* Notes 37* ===== 38* 39* Each global data object is described by an associated description 40* vector. This vector stores the information required to establish 41* the mapping between an object element and its corresponding process 42* and memory location. 43* 44* Let A be a generic term for any 2D block cyclicly distributed array. 45* Such a global array has an associated description vector DESCA. 46* In the following comments, the character _ should be read as 47* "of the global array". 48* 49* NOTATION STORED IN EXPLANATION 50* --------------- -------------- -------------------------------------- 51* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 52* DTYPE_A = 1. 53* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 54* the BLACS process grid A is distribu- 55* ted over. The context itself is glo- 56* bal, but the handle (the integer 57* value) may vary. 58* M_A (global) DESCA( M_ ) The number of rows in the global 59* array A. 60* N_A (global) DESCA( N_ ) The number of columns in the global 61* array A. 62* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 63* the rows of the array. 64* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 65* the columns of the array. 66* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 67* row of the array A is distributed. 68* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 69* first column of the array A is 70* distributed. 71* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 72* array. LLD_A >= MAX(1,LOCr(M_A)). 73* 74* Let K be the number of rows or columns of a distributed matrix, 75* and assume that its process grid has dimension p x q. 76* LOCr( K ) denotes the number of elements of K that a process 77* would receive if K were distributed over the p processes of its 78* process column. 79* Similarly, LOCc( K ) denotes the number of elements of K that a 80* process would receive if K were distributed over the q processes of 81* its process row. 82* The values of LOCr() and LOCc() may be determined via a call to the 83* ScaLAPACK tool function, NUMROC: 84* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 85* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 86* An upper bound for these quantities may be computed by: 87* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 88* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 89* 90* Arguments 91* ========= 92* 93* SIDE (global input) CHARACTER 94* = 'L': apply Q or Q**T from the Left; 95* = 'R': apply Q or Q**T from the Right. 96* 97* TRANS (global input) CHARACTER 98* = 'N': No transpose, apply Q; 99* = 'T': Transpose, apply Q**T. 100* 101* M (global input) INTEGER 102* The number of rows to be operated on i.e the number of rows 103* of the distributed submatrix sub( C ). M >= 0. 104* 105* N (global input) INTEGER 106* The number of columns to be operated on i.e the number of 107* columns of the distributed submatrix sub( C ). N >= 0. 108* 109* K (global input) INTEGER 110* The number of elementary reflectors whose product defines the 111* matrix Q. If SIDE = 'L', M >= K >= 0, if SIDE = 'R', 112* N >= K >= 0. 113* 114* A (local input) REAL pointer into the local memory 115* to an array of dimension (LLD_A,LOCc(JA+K-1)). On entry, the 116* j-th column must contain the vector which defines the elemen- 117* tary reflector H(j), JA <= j <= JA+K-1, as returned by 118* PSGEQRF in the K columns of its distributed matrix 119* argument A(IA:*,JA:JA+K-1). A(IA:*,JA:JA+K-1) is modified by 120* the routine but restored on exit. 121* If SIDE = 'L', LLD_A >= MAX( 1, LOCr(IA+M-1) ); 122* if SIDE = 'R', LLD_A >= MAX( 1, LOCr(IA+N-1) ). 123* 124* IA (global input) INTEGER 125* The row index in the global array A indicating the first 126* row of sub( A ). 127* 128* JA (global input) INTEGER 129* The column index in the global array A indicating the 130* first column of sub( A ). 131* 132* DESCA (global and local input) INTEGER array of dimension DLEN_. 133* The array descriptor for the distributed matrix A. 134* 135* TAU (local input) REAL, array, dimension LOCc(JA+K-1). 136* This array contains the scalar factors TAU(j) of the 137* elementary reflectors H(j) as returned by PSGEQRF. 138* TAU is tied to the distributed matrix A. 139* 140* C (local input/local output) REAL pointer into the 141* local memory to an array of dimension (LLD_C,LOCc(JC+N-1)). 142* On entry, the local pieces of the distributed matrix sub(C). 143* On exit, sub( C ) is overwritten by Q*sub( C ) or Q'*sub( C ) 144* or sub( C )*Q' or sub( C )*Q. 145* 146* IC (global input) INTEGER 147* The row index in the global array C indicating the first 148* row of sub( C ). 149* 150* JC (global input) INTEGER 151* The column index in the global array C indicating the 152* first column of sub( C ). 153* 154* DESCC (global and local input) INTEGER array of dimension DLEN_. 155* The array descriptor for the distributed matrix C. 156* 157* WORK (local workspace/local output) REAL array, 158* dimension (LWORK) 159* On exit, WORK(1) returns the minimal and optimal LWORK. 160* 161* LWORK (local or global input) INTEGER 162* The dimension of the array WORK. 163* LWORK is local input and must be at least 164* If SIDE = 'L', 165* LWORK >= MAX( (NB_A*(NB_A-1))/2, (NqC0 + MpC0)*NB_A ) + 166* NB_A * NB_A 167* else if SIDE = 'R', 168* LWORK >= MAX( (NB_A*(NB_A-1))/2, ( NqC0 + MAX( NpA0 + 169* NUMROC( NUMROC( N+ICOFFC, NB_A, 0, 0, NPCOL ), 170* NB_A, 0, 0, LCMQ ), MpC0 ) )*NB_A ) + 171* NB_A * NB_A 172* end if 173* 174* where LCMQ = LCM / NPCOL with LCM = ICLM( NPROW, NPCOL ), 175* 176* IROFFA = MOD( IA-1, MB_A ), ICOFFA = MOD( JA-1, NB_A ), 177* IAROW = INDXG2P( IA, MB_A, MYROW, RSRC_A, NPROW ), 178* NpA0 = NUMROC( N+IROFFA, MB_A, MYROW, IAROW, NPROW ), 179* 180* IROFFC = MOD( IC-1, MB_C ), ICOFFC = MOD( JC-1, NB_C ), 181* ICROW = INDXG2P( IC, MB_C, MYROW, RSRC_C, NPROW ), 182* ICCOL = INDXG2P( JC, NB_C, MYCOL, CSRC_C, NPCOL ), 183* MpC0 = NUMROC( M+IROFFC, MB_C, MYROW, ICROW, NPROW ), 184* NqC0 = NUMROC( N+ICOFFC, NB_C, MYCOL, ICCOL, NPCOL ), 185* 186* ILCM, INDXG2P and NUMROC are ScaLAPACK tool functions; 187* MYROW, MYCOL, NPROW and NPCOL can be determined by calling 188* the subroutine BLACS_GRIDINFO. 189* 190* If LWORK = -1, then LWORK is global input and a workspace 191* query is assumed; the routine only calculates the minimum 192* and optimal size for all work arrays. Each of these 193* values is returned in the first entry of the corresponding 194* work array, and no error message is issued by PXERBLA. 195* 196* 197* INFO (global output) INTEGER 198* = 0: successful exit 199* < 0: If the i-th argument is an array and the j-entry had 200* an illegal value, then INFO = -(i*100+j), if the i-th 201* argument is a scalar and had an illegal value, then 202* INFO = -i. 203* 204* Alignment requirements 205* ====================== 206* 207* The distributed submatrices A(IA:*, JA:*) and C(IC:IC+M-1,JC:JC+N-1) 208* must verify some alignment properties, namely the following 209* expressions should be true: 210* 211* If SIDE = 'L', 212* ( MB_A.EQ.MB_C .AND. IROFFA.EQ.IROFFC .AND. IAROW.EQ.ICROW ) 213* If SIDE = 'R', 214* ( MB_A.EQ.NB_C .AND. IROFFA.EQ.ICOFFC ) 215* 216* ===================================================================== 217* 218* .. Parameters .. 219 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 220 $ LLD_, MB_, M_, NB_, N_, RSRC_ 221 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 222 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 223 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 224* .. 225* .. Local Scalars .. 226 LOGICAL LEFT, LQUERY, NOTRAN 227 CHARACTER COLBTOP, ROWBTOP 228 INTEGER IAROW, ICC, ICCOL, ICOFFC, ICROW, ICTXT, IINFO, 229 $ IPW, IROFFA, IROFFC, J, J1, J2, J3, JB, JCC, 230 $ LCM, LCMQ, LWMIN, MI, MPC0, MYCOL, MYROW, NI, 231 $ NPA0, NPCOL, NPROW, NQ, NQC0 232* .. 233* .. Local Arrays .. 234 INTEGER IDUM1( 4 ), IDUM2( 4 ) 235* .. 236* .. External Subroutines .. 237 EXTERNAL BLACS_GRIDINFO, CHK1MAT, PCHK2MAT, PSLARFB, 238 $ PSLARFT, PSORM2R, PB_TOPGET, PB_TOPSET, PXERBLA 239* .. 240* .. External Functions .. 241 LOGICAL LSAME 242 INTEGER ICEIL, ILCM, INDXG2P, NUMROC 243 EXTERNAL ICEIL, ILCM, INDXG2P, LSAME, NUMROC 244* .. 245* .. Intrinsic Functions .. 246 INTRINSIC ICHAR, MAX, MIN, MOD, REAL 247* .. 248* .. Executable Statements .. 249* 250* Get grid parameters 251* 252 ICTXT = DESCA( CTXT_ ) 253 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 254* 255* Test the input parameters 256* 257 INFO = 0 258 IF( NPROW.EQ.-1 ) THEN 259 INFO = -(900+CTXT_) 260 ELSE 261 LEFT = LSAME( SIDE, 'L' ) 262 NOTRAN = LSAME( TRANS, 'N' ) 263* 264* NQ is the order of Q 265* 266 IF( LEFT ) THEN 267 NQ = M 268 CALL CHK1MAT( M, 3, K, 5, IA, JA, DESCA, 9, INFO ) 269 ELSE 270 NQ = N 271 CALL CHK1MAT( N, 4, K, 5, IA, JA, DESCA, 9, INFO ) 272 END IF 273 CALL CHK1MAT( M, 3, N, 4, IC, JC, DESCC, 14, INFO ) 274 IF( INFO.EQ.0 ) THEN 275 IROFFA = MOD( IA-1, DESCA( MB_ ) ) 276 IROFFC = MOD( IC-1, DESCC( MB_ ) ) 277 ICOFFC = MOD( JC-1, DESCC( NB_ ) ) 278 IAROW = INDXG2P( IA, DESCA( MB_ ), MYROW, DESCA( RSRC_ ), 279 $ NPROW ) 280 ICROW = INDXG2P( IC, DESCC( MB_ ), MYROW, DESCC( RSRC_ ), 281 $ NPROW ) 282 ICCOL = INDXG2P( JC, DESCC( NB_ ), MYCOL, DESCC( CSRC_ ), 283 $ NPCOL ) 284 MPC0 = NUMROC( M+IROFFC, DESCC( MB_ ), MYROW, ICROW, NPROW ) 285 NQC0 = NUMROC( N+ICOFFC, DESCC( NB_ ), MYCOL, ICCOL, NPCOL ) 286* 287 IF( LEFT ) THEN 288 LWMIN = MAX( ( DESCA( NB_ ) * ( DESCA( NB_ ) - 1 ) ) / 2, 289 $ ( MPC0 + NQC0 ) * DESCA( NB_ ) ) + 290 $ DESCA( NB_ ) * DESCA( NB_ ) 291 ELSE 292 NPA0 = NUMROC( N+IROFFA, DESCA( MB_ ), MYROW, IAROW, 293 $ NPROW ) 294 LCM = ILCM( NPROW, NPCOL ) 295 LCMQ = LCM / NPCOL 296 LWMIN = MAX( ( DESCA( NB_ ) * ( DESCA( NB_ ) - 1 ) ) 297 $ / 2, ( NQC0 + MAX( NPA0 + NUMROC( NUMROC( 298 $ N+ICOFFC, DESCA( NB_ ), 0, 0, NPCOL ), 299 $ DESCA( NB_ ), 0, 0, LCMQ ), MPC0 ) ) * 300 $ DESCA( NB_ ) ) + DESCA( NB_ ) * DESCA( NB_ ) 301 END IF 302* 303 WORK( 1 ) = REAL( LWMIN ) 304 LQUERY = ( LWORK.EQ.-1 ) 305 IF( .NOT.LEFT .AND. .NOT.LSAME( SIDE, 'R' ) ) THEN 306 INFO = -1 307 ELSE IF( .NOT.NOTRAN .AND. .NOT.LSAME( TRANS, 'T' ) ) THEN 308 INFO = -2 309 ELSE IF( K.LT.0 .OR. K.GT.NQ ) THEN 310 INFO = -5 311 ELSE IF( .NOT.LEFT .AND. DESCA( MB_ ).NE.DESCC( NB_ ) ) THEN 312 INFO = -(900+NB_) 313 ELSE IF( LEFT .AND. IROFFA.NE.IROFFC ) THEN 314 INFO = -12 315 ELSE IF( LEFT .AND. IAROW.NE.ICROW ) THEN 316 INFO = -12 317 ELSE IF( .NOT.LEFT .AND. IROFFA.NE.ICOFFC ) THEN 318 INFO = -13 319 ELSE IF( LEFT .AND. DESCA( MB_ ).NE.DESCC( MB_ ) ) THEN 320 INFO = -(1400+MB_) 321 ELSE IF( ICTXT.NE.DESCC( CTXT_ ) ) THEN 322 INFO = -(1400+CTXT_) 323 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 324 INFO = -16 325 END IF 326 END IF 327* 328 IF( LEFT ) THEN 329 IDUM1( 1 ) = ICHAR( 'L' ) 330 ELSE 331 IDUM1( 1 ) = ICHAR( 'R' ) 332 END IF 333 IDUM2( 1 ) = 1 334 IF( NOTRAN ) THEN 335 IDUM1( 2 ) = ICHAR( 'N' ) 336 ELSE 337 IDUM1( 2 ) = ICHAR( 'T' ) 338 END IF 339 IDUM2( 2 ) = 2 340 IDUM1( 3 ) = K 341 IDUM2( 3 ) = 5 342 IF( LWORK.EQ.-1 ) THEN 343 IDUM1( 4 ) = -1 344 ELSE 345 IDUM1( 4 ) = 1 346 END IF 347 IDUM2( 4 ) = 16 348 IF( LEFT ) THEN 349 CALL PCHK2MAT( M, 3, K, 5, IA, JA, DESCA, 9, M, 3, N, 4, IC, 350 $ JC, DESCC, 14, 4, IDUM1, IDUM2, INFO ) 351 ELSE 352 CALL PCHK2MAT( N, 4, K, 5, IA, JA, DESCA, 9, M, 3, N, 4, IC, 353 $ JC, DESCC, 14, 4, IDUM1, IDUM2, INFO ) 354 END IF 355 END IF 356* 357 IF( INFO.NE.0 ) THEN 358 CALL PXERBLA( ICTXT, 'PSORMQR', -INFO ) 359 RETURN 360 ELSE IF( LQUERY ) THEN 361 RETURN 362 END IF 363* 364* Quick return if possible 365* 366 IF( M.EQ.0 .OR. N.EQ.0 .OR. K.EQ.0 ) 367 $ RETURN 368* 369 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 370 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 371* 372 IF( ( LEFT .AND. .NOT.NOTRAN ) .OR. 373 $ ( .NOT.LEFT .AND. NOTRAN ) ) THEN 374 J1 = MIN( ICEIL( JA, DESCA( NB_ ) ) * DESCA( NB_ ), JA+K-1 ) 375 $ + 1 376 J2 = JA+K-1 377 J3 = DESCA( NB_ ) 378 ELSE 379 J1 = MAX( ( (JA+K-2) / DESCA( NB_ ) ) * DESCA( NB_ ) + 1, JA ) 380 J2 = MIN( ICEIL( JA, DESCA( NB_ ) ) * DESCA( NB_ ), JA+K-1 ) 381 $ + 1 382 J3 = -DESCA( NB_ ) 383 END IF 384* 385 IF( LEFT ) THEN 386 NI = N 387 JCC = JC 388 IF( NOTRAN ) THEN 389 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', 'D-ring' ) 390 ELSE 391 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', 'I-ring' ) 392 END IF 393 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', ' ' ) 394 ELSE 395 MI = M 396 ICC = IC 397 END IF 398* 399* Use unblocked code for the first block if necessary 400* 401 IF( ( LEFT .AND. .NOT.NOTRAN ) .OR. ( .NOT.LEFT .AND. NOTRAN ) ) 402 $ CALL PSORM2R( SIDE, TRANS, M, N, J1-JA, A, IA, JA, DESCA, TAU, 403 $ C, IC, JC, DESCC, WORK, LWORK, IINFO ) 404* 405 IPW = DESCA( NB_ ) * DESCA( NB_ ) + 1 406 DO 10 J = J1, J2, J3 407 JB = MIN( DESCA( NB_ ), K-J+JA ) 408* 409* Form the triangular factor of the block reflector 410* H = H(j) H(j+1) . . . H(j+jb-1) 411* 412 CALL PSLARFT( 'Forward', 'Columnwise', NQ-J+JA, JB, A, 413 $ IA+J-JA, J, DESCA, TAU, WORK, WORK( IPW ) ) 414 IF( LEFT ) THEN 415* 416* H or H' is applied to C(ic+j-ja:ic+m-1,jc:jc+n-1) 417* 418 MI = M - J + JA 419 ICC = IC + J - JA 420 ELSE 421* 422* H or H' is applied to C(ic:ic+m-1,jc+j-ja:jc+n-1) 423* 424 NI = N - J + JA 425 JCC = JC + J - JA 426 END IF 427* 428* Apply H or H' 429* 430 CALL PSLARFB( SIDE, TRANS, 'Forward', 'Columnwise', MI, NI, 431 $ JB, A, IA+J-JA, J, DESCA, WORK, C, ICC, JCC, 432 $ DESCC, WORK( IPW ) ) 433 10 CONTINUE 434* 435* Use unblocked code for the last block if necessary 436* 437 IF( ( LEFT .AND. NOTRAN ) .OR. ( .NOT.LEFT .AND. .NOT.NOTRAN ) ) 438 $ CALL PSORM2R( SIDE, TRANS, M, N, J2-JA, A, IA, JA, DESCA, TAU, 439 $ C, IC, JC, DESCC, WORK, LWORK, IINFO ) 440* 441 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 442 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 443* 444 WORK( 1 ) = REAL( LWMIN ) 445* 446 RETURN 447* 448* End of PSORMQR 449* 450 END 451