1 SUBROUTINE PSORMR2( SIDE, TRANS, M, N, K, A, IA, JA, DESCA, TAU, 2 $ C, IC, JC, DESCC, WORK, LWORK, INFO ) 3* 4* -- ScaLAPACK routine (version 1.7) -- 5* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 6* and University of California, Berkeley. 7* May 25, 2001 8* 9* .. Scalar Arguments .. 10 CHARACTER SIDE, TRANS 11 INTEGER IA, IC, INFO, JA, JC, K, LWORK, M, N 12* .. 13* .. Array Arguments .. 14 INTEGER DESCA( * ), DESCC( * ) 15 REAL A( * ), C( * ), TAU( * ), WORK( * ) 16* .. 17* 18* Purpose 19* ======= 20* 21* PSORMR2 overwrites the general real M-by-N distributed matrix 22* sub( C ) = C(IC:IC+M-1,JC:JC+N-1) with 23* 24* SIDE = 'L' SIDE = 'R' 25* TRANS = 'N': Q * sub( C ) sub( C ) * Q 26* TRANS = 'T': Q**T * sub( C ) sub( C ) * Q**T 27* 28* where Q is a real orthogonal distributed matrix defined as the 29* product of K elementary reflectors 30* 31* Q = H(1) H(2) . . . H(k) 32* 33* as returned by PSGERQF. Q is of order M if SIDE = 'L' and of order N 34* if SIDE = 'R'. 35* 36* Notes 37* ===== 38* 39* Each global data object is described by an associated description 40* vector. This vector stores the information required to establish 41* the mapping between an object element and its corresponding process 42* and memory location. 43* 44* Let A be a generic term for any 2D block cyclicly distributed array. 45* Such a global array has an associated description vector DESCA. 46* In the following comments, the character _ should be read as 47* "of the global array". 48* 49* NOTATION STORED IN EXPLANATION 50* --------------- -------------- -------------------------------------- 51* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 52* DTYPE_A = 1. 53* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 54* the BLACS process grid A is distribu- 55* ted over. The context itself is glo- 56* bal, but the handle (the integer 57* value) may vary. 58* M_A (global) DESCA( M_ ) The number of rows in the global 59* array A. 60* N_A (global) DESCA( N_ ) The number of columns in the global 61* array A. 62* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 63* the rows of the array. 64* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 65* the columns of the array. 66* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 67* row of the array A is distributed. 68* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 69* first column of the array A is 70* distributed. 71* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 72* array. LLD_A >= MAX(1,LOCr(M_A)). 73* 74* Let K be the number of rows or columns of a distributed matrix, 75* and assume that its process grid has dimension p x q. 76* LOCr( K ) denotes the number of elements of K that a process 77* would receive if K were distributed over the p processes of its 78* process column. 79* Similarly, LOCc( K ) denotes the number of elements of K that a 80* process would receive if K were distributed over the q processes of 81* its process row. 82* The values of LOCr() and LOCc() may be determined via a call to the 83* ScaLAPACK tool function, NUMROC: 84* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 85* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 86* An upper bound for these quantities may be computed by: 87* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 88* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 89* 90* Arguments 91* ========= 92* 93* SIDE (global input) CHARACTER 94* = 'L': apply Q or Q**T from the Left; 95* = 'R': apply Q or Q**T from the Right. 96* 97* TRANS (global input) CHARACTER 98* = 'N': No transpose, apply Q; 99* = 'T': Transpose, apply Q**T. 100* 101* M (global input) INTEGER 102* The number of rows to be operated on i.e the number of rows 103* of the distributed submatrix sub( C ). M >= 0. 104* 105* N (global input) INTEGER 106* The number of columns to be operated on i.e the number of 107* columns of the distributed submatrix sub( C ). N >= 0. 108* 109* K (global input) INTEGER 110* The number of elementary reflectors whose product defines the 111* matrix Q. If SIDE = 'L', M >= K >= 0, if SIDE = 'R', 112* N >= K >= 0. 113* 114* A (local input) REAL pointer into the local memory 115* to an array of dimension (LLD_A,LOCc(JA+M-1)) if SIDE='L', 116* and (LLD_A,LOCc(JA+N-1)) if SIDE='R', where 117* LLD_A >= MAX(1,LOCr(IA+K-1)); On entry, the i-th row must 118* contain the vector which defines the elementary reflector 119* H(i), IA <= i <= IA+K-1, as returned by PSGERQF in the 120* K rows of its distributed matrix argument A(IA:IA+K-1,JA:*). 121* A(IA:IA+K-1,JA:*) is modified by the routine but restored on 122* exit. 123* 124* IA (global input) INTEGER 125* The row index in the global array A indicating the first 126* row of sub( A ). 127* 128* JA (global input) INTEGER 129* The column index in the global array A indicating the 130* first column of sub( A ). 131* 132* DESCA (global and local input) INTEGER array of dimension DLEN_. 133* The array descriptor for the distributed matrix A. 134* 135* TAU (local input) REAL, array, dimension LOCc(IA+K-1). 136* This array contains the scalar factors TAU(i) of the 137* elementary reflectors H(i) as returned by PSGERQF. 138* TAU is tied to the distributed matrix A. 139* 140* C (local input/local output) REAL pointer into the 141* local memory to an array of dimension (LLD_C,LOCc(JC+N-1)). 142* On entry, the local pieces of the distributed matrix sub(C). 143* On exit, sub( C ) is overwritten by Q*sub( C ) or Q'*sub( C ) 144* or sub( C )*Q' or sub( C )*Q. 145* 146* IC (global input) INTEGER 147* The row index in the global array C indicating the first 148* row of sub( C ). 149* 150* JC (global input) INTEGER 151* The column index in the global array C indicating the 152* first column of sub( C ). 153* 154* DESCC (global and local input) INTEGER array of dimension DLEN_. 155* The array descriptor for the distributed matrix C. 156* 157* WORK (local workspace/local output) REAL array, 158* dimension (LWORK) 159* On exit, WORK(1) returns the minimal and optimal LWORK. 160* 161* LWORK (local or global input) INTEGER 162* The dimension of the array WORK. 163* LWORK is local input and must be at least 164* If SIDE = 'L', LWORK >= MpC0 + MAX( MAX( 1, NqC0 ), NUMROC( 165* NUMROC( M+IROFFC,MB_A,0,0,NPROW ),MB_A,0,0,LCMP ) ); 166* if SIDE = 'R', LWORK >= NqC0 + MAX( 1, MpC0 ); 167* 168* where LCMP = LCM / NPROW with LCM = ICLM( NPROW, NPCOL ), 169* 170* IROFFC = MOD( IC-1, MB_C ), ICOFFC = MOD( JC-1, NB_C ), 171* ICROW = INDXG2P( IC, MB_C, MYROW, RSRC_C, NPROW ), 172* ICCOL = INDXG2P( JC, NB_C, MYCOL, CSRC_C, NPCOL ), 173* MpC0 = NUMROC( M+IROFFC, MB_C, MYROW, ICROW, NPROW ), 174* NqC0 = NUMROC( N+ICOFFC, NB_C, MYCOL, ICCOL, NPCOL ), 175* 176* ILCM, INDXG2P and NUMROC are ScaLAPACK tool functions; 177* MYROW, MYCOL, NPROW and NPCOL can be determined by calling 178* the subroutine BLACS_GRIDINFO. 179* 180* If LWORK = -1, then LWORK is global input and a workspace 181* query is assumed; the routine only calculates the minimum 182* and optimal size for all work arrays. Each of these 183* values is returned in the first entry of the corresponding 184* work array, and no error message is issued by PXERBLA. 185* 186* 187* INFO (local output) INTEGER 188* = 0: successful exit 189* < 0: If the i-th argument is an array and the j-entry had 190* an illegal value, then INFO = -(i*100+j), if the i-th 191* argument is a scalar and had an illegal value, then 192* INFO = -i. 193* 194* Alignment requirements 195* ====================== 196* 197* The distributed submatrices A(IA:*, JA:*) and C(IC:IC+M-1,JC:JC+N-1) 198* must verify some alignment properties, namely the following 199* expressions should be true: 200* 201* If SIDE = 'L', 202* ( NB_A.EQ.MB_C .AND. ICOFFA.EQ.IROFFC ) 203* If SIDE = 'R', 204* ( NB_A.EQ.NB_C .AND. ICOFFA.EQ.ICOFFC .AND. IACOL.EQ.ICCOL ) 205* 206* ===================================================================== 207* 208* .. Parameters .. 209 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 210 $ LLD_, MB_, M_, NB_, N_, RSRC_ 211 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 212 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 213 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 214 REAL ONE 215 PARAMETER ( ONE = 1.0E+0 ) 216* .. 217* .. Local Scalars .. 218 LOGICAL LEFT, LQUERY, NOTRAN 219 CHARACTER COLBTOP, ROWBTOP 220 INTEGER I, I1, I2, I3, IACOL, ICCOL, ICOFFA, ICOFFC, 221 $ ICROW, ICTXT, IROFFC, LCM, LCMP, LWMIN, MI, 222 $ MPC0, MYCOL, MYROW, NI, NPCOL, NPROW, NQ, NQC0 223 REAL AII 224* .. 225* .. External Subroutines .. 226 EXTERNAL BLACS_ABORT, BLACS_GRIDINFO, CHK1MAT, PSELSET, 227 $ PSELSET2, PSLARF, PB_TOPGET, PB_TOPSET, PXERBLA 228* .. 229* .. External Functions .. 230 LOGICAL LSAME 231 INTEGER ILCM, INDXG2P, NUMROC 232 EXTERNAL ILCM, INDXG2P, LSAME, NUMROC 233* .. 234* .. Intrinsic Functions .. 235 INTRINSIC MAX, MOD, REAL 236* .. 237* .. Executable Statements .. 238* 239* Get grid parameters 240* 241 ICTXT = DESCA( CTXT_ ) 242 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 243* 244* Test the input parameters 245* 246 INFO = 0 247 IF( NPROW.EQ.-1 ) THEN 248 INFO = -(900+CTXT_) 249 ELSE 250 LEFT = LSAME( SIDE, 'L' ) 251 NOTRAN = LSAME( TRANS, 'N' ) 252* 253* NQ is the order of Q 254* 255 IF( LEFT ) THEN 256 NQ = M 257 CALL CHK1MAT( K, 5, M, 3, IA, JA, DESCA, 9, INFO ) 258 ELSE 259 NQ = N 260 CALL CHK1MAT( K, 5, N, 4, IA, JA, DESCA, 9, INFO ) 261 END IF 262 CALL CHK1MAT( M, 3, N, 4, IC, JC, DESCC, 14, INFO ) 263 IF( INFO.EQ.0 ) THEN 264 ICOFFA = MOD( JA-1, DESCA( NB_ ) ) 265 IROFFC = MOD( IC-1, DESCC( MB_ ) ) 266 ICOFFC = MOD( JC-1, DESCC( NB_ ) ) 267 IACOL = INDXG2P( JA, DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), 268 $ NPCOL ) 269 ICROW = INDXG2P( IC, DESCC( MB_ ), MYROW, DESCC( RSRC_ ), 270 $ NPROW ) 271 ICCOL = INDXG2P( JC, DESCC( NB_ ), MYCOL, DESCC( CSRC_ ), 272 $ NPCOL ) 273 MPC0 = NUMROC( M+IROFFC, DESCC( MB_ ), MYROW, ICROW, NPROW ) 274 NQC0 = NUMROC( N+ICOFFC, DESCC( NB_ ), MYCOL, ICCOL, NPCOL ) 275* 276 IF( LEFT ) THEN 277 LCM = ILCM( NPROW, NPCOL ) 278 LCMP = LCM / NPROW 279 LWMIN = MPC0 + MAX( MAX( 1, NQC0 ), NUMROC( NUMROC( 280 $ M+IROFFC, DESCA( MB_ ), 0, 0, NPROW ), 281 $ DESCA( MB_ ), 0, 0, LCMP ) ) 282 ELSE 283 LWMIN = NQC0 + MAX( 1, MPC0 ) 284 END IF 285* 286 WORK( 1 ) = REAL( LWMIN ) 287 LQUERY = ( LWORK.EQ.-1 ) 288 IF( .NOT.LEFT .AND. .NOT.LSAME( SIDE, 'R' ) ) THEN 289 INFO = -1 290 ELSE IF( .NOT.NOTRAN .AND. .NOT.LSAME( TRANS, 'T' ) ) THEN 291 INFO = -2 292 ELSE IF( K.LT.0 .OR. K.GT.NQ ) THEN 293 INFO = -5 294 ELSE IF( LEFT .AND. DESCA( NB_ ).NE.DESCC( MB_ ) ) THEN 295 INFO = -(900+NB_) 296 ELSE IF( LEFT .AND. ICOFFA.NE.IROFFC ) THEN 297 INFO = -12 298 ELSE IF( .NOT.LEFT .AND. ICOFFA.NE.ICOFFC ) THEN 299 INFO = -13 300 ELSE IF( .NOT.LEFT .AND. IACOL.NE.ICCOL ) THEN 301 INFO = -13 302 ELSE IF( .NOT.LEFT .AND. DESCA( NB_ ).NE.DESCC( NB_ ) ) THEN 303 INFO = -(1400+NB_) 304 ELSE IF( ICTXT.NE.DESCC( CTXT_ ) ) THEN 305 INFO = -(1400+CTXT_) 306 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 307 INFO = -16 308 END IF 309 END IF 310 END IF 311* 312 IF( INFO.NE.0 ) THEN 313 CALL PXERBLA( ICTXT, 'PSORMR2', -INFO ) 314 CALL BLACS_ABORT( ICTXT, 1 ) 315 RETURN 316 ELSE IF( LQUERY ) THEN 317 RETURN 318 END IF 319* 320* Quick return if possible 321* 322 IF( M.EQ.0 .OR. N.EQ.0 .OR. K.EQ.0 ) 323 $ RETURN 324* 325 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 326 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 327* 328 IF( ( LEFT .AND. .NOT.NOTRAN .OR. .NOT.LEFT .AND. NOTRAN ) ) THEN 329 I1 = IA 330 I2 = IA + K - 1 331 I3 = 1 332 ELSE 333 I1 = IA + K - 1 334 I2 = IA 335 I3 = -1 336 END IF 337* 338 IF( LEFT ) THEN 339 NI = N 340 ELSE 341 MI = M 342 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ' ' ) 343 IF( NOTRAN ) THEN 344 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'I-ring' ) 345 ELSE 346 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'D-ring' ) 347 END IF 348 END IF 349* 350 DO 10 I = I1, I2, I3 351 IF( LEFT ) THEN 352* 353* H(i) or H(i)' is applied to C(ic:ic+m-k+i-ia,jc:jc+n-1) 354* 355 MI = M - K + I - IA + 1 356 ELSE 357* 358* H(i) or H(i)' is applied to C(ic:ic+m-1,jc:jc+n-k+i-ia+1) 359* 360 NI = N - K + I - IA + 1 361 END IF 362* 363* Apply H(i) or H(i)' 364* 365 CALL PSELSET2( AII, A, I, JA+NQ-K+I-IA, DESCA, ONE ) 366 CALL PSLARF( SIDE, MI, NI, A, I, JA, DESCA, DESCA( M_ ), 367 $ TAU, C, IC, JC, DESCC, WORK ) 368 CALL PSELSET( A, I, JA+NQ-K+I-IA, DESCA, AII ) 369* 370 10 CONTINUE 371* 372 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 373 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 374* 375 WORK( 1 ) = REAL( LWMIN ) 376* 377 RETURN 378* 379* End of PSORMR2 380* 381 END 382