1 SUBROUTINE PCUNMLQ( SIDE, TRANS, M, N, K, A, IA, JA, DESCA, TAU, 2 $ C, IC, JC, DESCC, WORK, LWORK, INFO ) 3* 4* -- ScaLAPACK routine (version 1.7) -- 5* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 6* and University of California, Berkeley. 7* May 25, 2001 8* 9* .. Scalar Arguments .. 10 CHARACTER SIDE, TRANS 11 INTEGER IA, IC, INFO, JA, JC, K, LWORK, M, N 12* .. 13* .. Array Arguments .. 14 INTEGER DESCA( * ), DESCC( * ) 15 COMPLEX A( * ), C( * ), TAU( * ), WORK( * ) 16* .. 17* 18* Purpose 19* ======= 20* 21* PCUNMLQ overwrites the general complex M-by-N distributed matrix 22* sub( C ) = C(IC:IC+M-1,JC:JC+N-1) with 23* 24* SIDE = 'L' SIDE = 'R' 25* TRANS = 'N': Q * sub( C ) sub( C ) * Q 26* TRANS = 'C': Q**H * sub( C sub( C ) * Q**H 27* 28* where Q is a complex unitary distributed matrix defined as the 29* product of K elementary reflectors 30* 31* Q = H(k)' . . . H(2)' H(1)' 32* 33* as returned by PCGELQF. Q is of order M if SIDE = 'L' and of order N 34* if SIDE = 'R'. 35* 36* Notes 37* ===== 38* 39* Each global data object is described by an associated description 40* vector. This vector stores the information required to establish 41* the mapping between an object element and its corresponding process 42* and memory location. 43* 44* Let A be a generic term for any 2D block cyclicly distributed array. 45* Such a global array has an associated description vector DESCA. 46* In the following comments, the character _ should be read as 47* "of the global array". 48* 49* NOTATION STORED IN EXPLANATION 50* --------------- -------------- -------------------------------------- 51* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 52* DTYPE_A = 1. 53* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 54* the BLACS process grid A is distribu- 55* ted over. The context itself is glo- 56* bal, but the handle (the integer 57* value) may vary. 58* M_A (global) DESCA( M_ ) The number of rows in the global 59* array A. 60* N_A (global) DESCA( N_ ) The number of columns in the global 61* array A. 62* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 63* the rows of the array. 64* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 65* the columns of the array. 66* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 67* row of the array A is distributed. 68* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 69* first column of the array A is 70* distributed. 71* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 72* array. LLD_A >= MAX(1,LOCr(M_A)). 73* 74* Let K be the number of rows or columns of a distributed matrix, 75* and assume that its process grid has dimension p x q. 76* LOCr( K ) denotes the number of elements of K that a process 77* would receive if K were distributed over the p processes of its 78* process column. 79* Similarly, LOCc( K ) denotes the number of elements of K that a 80* process would receive if K were distributed over the q processes of 81* its process row. 82* The values of LOCr() and LOCc() may be determined via a call to the 83* ScaLAPACK tool function, NUMROC: 84* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 85* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 86* An upper bound for these quantities may be computed by: 87* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 88* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 89* 90* Arguments 91* ========= 92* 93* SIDE (global input) CHARACTER 94* = 'L': apply Q or Q**H from the Left; 95* = 'R': apply Q or Q**H from the Right. 96* 97* TRANS (global input) CHARACTER 98* = 'N': No transpose, apply Q; 99* = 'C': Conjugate transpose, apply Q**H. 100* 101* M (global input) INTEGER 102* The number of rows to be operated on i.e the number of rows 103* of the distributed submatrix sub( C ). M >= 0. 104* 105* N (global input) INTEGER 106* The number of columns to be operated on i.e the number of 107* columns of the distributed submatrix sub( C ). N >= 0. 108* 109* K (global input) INTEGER 110* The number of elementary reflectors whose product defines the 111* matrix Q. If SIDE = 'L', M >= K >= 0, if SIDE = 'R', 112* N >= K >= 0. 113* 114* A (local input) COMPLEX pointer into the local memory 115* to an array of dimension (LLD_A,LOCc(JA+M-1)) if SIDE='L', 116* and (LLD_A,LOCc(JA+N-1)) if SIDE='R', where 117* LLD_A >= max(1,LOCr(IA+K-1)); On entry, the i-th row must 118* contain the vector which defines the elementary reflector 119* H(i), IA <= i <= IA+K-1, as returned by PCGELQF in the 120* K rows of its distributed matrix argument A(IA:IA+K-1,JA:*). 121* A(IA:IA+K-1,JA:*) is modified by the routine but restored on 122* exit. 123* 124* IA (global input) INTEGER 125* The row index in the global array A indicating the first 126* row of sub( A ). 127* 128* JA (global input) INTEGER 129* The column index in the global array A indicating the 130* first column of sub( A ). 131* 132* DESCA (global and local input) INTEGER array of dimension DLEN_. 133* The array descriptor for the distributed matrix A. 134* 135* TAU (local input) COMPLEX, array, dimension LOCc(IA+K-1). 136* This array contains the scalar factors TAU(i) of the 137* elementary reflectors H(i) as returned by PCGELQF. 138* TAU is tied to the distributed matrix A. 139* 140* C (local input/local output) COMPLEX pointer into the 141* local memory to an array of dimension (LLD_C,LOCc(JC+N-1)). 142* On entry, the local pieces of the distributed matrix sub(C). 143* On exit, sub( C ) is overwritten by Q*sub( C ) or Q'*sub( C ) 144* or sub( C )*Q' or sub( C )*Q. 145* 146* IC (global input) INTEGER 147* The row index in the global array C indicating the first 148* row of sub( C ). 149* 150* JC (global input) INTEGER 151* The column index in the global array C indicating the 152* first column of sub( C ). 153* 154* DESCC (global and local input) INTEGER array of dimension DLEN_. 155* The array descriptor for the distributed matrix C. 156* 157* WORK (local workspace/local output) COMPLEX array, 158* dimension (LWORK) 159* On exit, WORK(1) returns the minimal and optimal LWORK. 160* 161* LWORK (local or global input) INTEGER 162* The dimension of the array WORK. 163* LWORK is local input and must be at least 164* if SIDE = 'L', 165* LWORK >= MAX( (MB_A*(MB_A-1))/2, ( MpC0 + MAX( MqA0 + 166* NUMROC( NUMROC( M+IROFFC, MB_A, 0, 0, NPROW ), 167* MB_A, 0, 0, LCMP ), NqC0 ) )*MB_A ) + 168* MB_A * MB_A 169* else if SIDE = 'R', 170* LWORK >= MAX( (MB_A*(MB_A-1))/2, (MpC0 + NqC0)*MB_A ) + 171* MB_A * MB_A 172* end if 173* 174* where LCMP = LCM / NPROW with LCM = ICLM( NPROW, NPCOL ), 175* 176* IROFFA = MOD( IA-1, MB_A ), ICOFFA = MOD( JA-1, NB_A ), 177* IACOL = INDXG2P( JA, NB_A, MYCOL, CSRC_A, NPCOL ), 178* MqA0 = NUMROC( M+ICOFFA, NB_A, MYCOL, IACOL, NPCOL ), 179* 180* IROFFC = MOD( IC-1, MB_C ), ICOFFC = MOD( JC-1, NB_C ), 181* ICROW = INDXG2P( IC, MB_C, MYROW, RSRC_C, NPROW ), 182* ICCOL = INDXG2P( JC, NB_C, MYCOL, CSRC_C, NPCOL ), 183* MpC0 = NUMROC( M+IROFFC, MB_C, MYROW, ICROW, NPROW ), 184* NqC0 = NUMROC( N+ICOFFC, NB_C, MYCOL, ICCOL, NPCOL ), 185* 186* ILCM, INDXG2P and NUMROC are ScaLAPACK tool functions; 187* MYROW, MYCOL, NPROW and NPCOL can be determined by calling 188* the subroutine BLACS_GRIDINFO. 189* 190* If LWORK = -1, then LWORK is global input and a workspace 191* query is assumed; the routine only calculates the minimum 192* and optimal size for all work arrays. Each of these 193* values is returned in the first entry of the corresponding 194* work array, and no error message is issued by PXERBLA. 195* 196* 197* INFO (global output) INTEGER 198* = 0: successful exit 199* < 0: If the i-th argument is an array and the j-entry had 200* an illegal value, then INFO = -(i*100+j), if the i-th 201* argument is a scalar and had an illegal value, then 202* INFO = -i. 203* 204* Alignment requirements 205* ====================== 206* 207* The distributed submatrices A(IA:*, JA:*) and C(IC:IC+M-1,JC:JC+N-1) 208* must verify some alignment properties, namely the following 209* expressions should be true: 210* 211* If SIDE = 'L', 212* ( NB_A.EQ.MB_C .AND. ICOFFA.EQ.IROFFC ) 213* If SIDE = 'R', 214* ( NB_A.EQ.NB_C .AND. ICOFFA.EQ.ICOFFC .AND. IACOL.EQ.ICCOL ) 215* 216* ===================================================================== 217* 218* .. Parameters .. 219 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 220 $ LLD_, MB_, M_, NB_, N_, RSRC_ 221 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 222 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 223 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 224* .. 225* .. Local Scalars .. 226 LOGICAL LEFT, LQUERY, NOTRAN 227 CHARACTER COLBTOP, ROWBTOP, TRANST 228 INTEGER I, I1, I2, I3, IACOL, IB, ICC, ICCOL, ICOFFA, 229 $ ICOFFC, ICROW, ICTXT, IINFO, IPW, IROFFC, JCC, 230 $ LCM, LCMP, LWMIN, MI, MPC0, MQA0, MYCOL, MYROW, 231 $ NI, NPCOL, NPROW, NQ, NQC0 232* .. 233* .. Local Arrays .. 234 INTEGER IDUM1( 4 ), IDUM2( 4 ) 235* .. 236* .. External Subroutines .. 237 EXTERNAL BLACS_GRIDINFO, CHK1MAT, PCHK2MAT, PCLARFB, 238 $ PCLARFT, PCUNML2, PB_TOPGET, PB_TOPSET, PXERBLA 239* .. 240* .. External Functions .. 241 LOGICAL LSAME 242 INTEGER ICEIL, ILCM, INDXG2P, NUMROC 243 EXTERNAL ICEIL, ILCM, INDXG2P, LSAME, NUMROC 244* .. 245* .. Intrinsic Functions .. 246 INTRINSIC CMPLX, ICHAR, MAX, MIN, MOD, REAL 247* .. 248* .. Executable Statements .. 249* 250* Get grid parameters 251* 252 ICTXT = DESCA( CTXT_ ) 253 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 254* 255* Test the input parameters 256* 257 INFO = 0 258 IF( NPROW.EQ.-1 ) THEN 259 INFO = -(900+CTXT_) 260 ELSE 261 LEFT = LSAME( SIDE, 'L' ) 262 NOTRAN = LSAME( TRANS, 'N' ) 263* 264* NQ is the order of Q 265* 266 IF( LEFT ) THEN 267 NQ = M 268 CALL CHK1MAT( K, 5, M, 3, IA, JA, DESCA, 9, INFO ) 269 ELSE 270 NQ = N 271 CALL CHK1MAT( K, 5, N, 4, IA, JA, DESCA, 9, INFO ) 272 END IF 273 CALL CHK1MAT( M, 3, N, 4, IC, JC, DESCC, 14, INFO ) 274 IF( INFO.EQ.0 ) THEN 275 ICOFFA = MOD( JA-1, DESCA( NB_ ) ) 276 IROFFC = MOD( IC-1, DESCC( MB_ ) ) 277 ICOFFC = MOD( JC-1, DESCC( NB_ ) ) 278 IACOL = INDXG2P( JA, DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), 279 $ NPCOL ) 280 ICROW = INDXG2P( IC, DESCC( MB_ ), MYROW, DESCC( RSRC_ ), 281 $ NPROW ) 282 ICCOL = INDXG2P( JC, DESCC( NB_ ), MYCOL, DESCC( CSRC_ ), 283 $ NPCOL ) 284 MPC0 = NUMROC( M+IROFFC, DESCC( MB_ ), MYROW, ICROW, NPROW ) 285 NQC0 = NUMROC( N+ICOFFC, DESCC( NB_ ), MYCOL, ICCOL, NPCOL ) 286* 287 IF( LEFT ) THEN 288 MQA0 = NUMROC( M+ICOFFA, DESCA( NB_ ), MYCOL, IACOL, 289 $ NPCOL ) 290 LCM = ILCM( NPROW, NPCOL ) 291 LCMP = LCM / NPROW 292 LWMIN = MAX( ( DESCA( MB_ ) * ( DESCA( MB_ ) - 1 ) ) 293 $ / 2, ( MPC0 + MAX( MQA0 + NUMROC( NUMROC( 294 $ M+IROFFC, DESCA( MB_ ), 0, 0, NPROW ), 295 $ DESCA( MB_ ), 0, 0, LCMP ), NQC0 ) ) * 296 $ DESCA( MB_ ) ) + DESCA( MB_ ) * DESCA( MB_ ) 297 ELSE 298 LWMIN = MAX( ( DESCA( MB_ ) * ( DESCA( MB_ ) - 1 ) ) / 2, 299 $ ( MPC0 + NQC0 ) * DESCA( MB_ ) ) + 300 $ DESCA( MB_ ) * DESCA( MB_ ) 301 END IF 302* 303 WORK( 1 ) = CMPLX( REAL( LWMIN ) ) 304 LQUERY = ( LWORK.EQ.-1 ) 305 IF( .NOT.LEFT .AND. .NOT.LSAME( SIDE, 'R' ) ) THEN 306 INFO = -1 307 ELSE IF( .NOT.NOTRAN .AND. .NOT.LSAME( TRANS, 'C' ) ) THEN 308 INFO = -2 309 ELSE IF( K.LT.0 .OR. K.GT.NQ ) THEN 310 INFO = -5 311 ELSE IF( LEFT .AND. DESCA( NB_ ).NE.DESCC( MB_ ) ) THEN 312 INFO = -(900+NB_) 313 ELSE IF( LEFT .AND. ICOFFA.NE.IROFFC ) THEN 314 INFO = -12 315 ELSE IF( .NOT.LEFT .AND. ICOFFA.NE.ICOFFC ) THEN 316 INFO = -13 317 ELSE IF( .NOT.LEFT .AND. IACOL.NE.ICCOL ) THEN 318 INFO = -13 319 ELSE IF( .NOT.LEFT .AND. DESCA( NB_ ).NE.DESCC( NB_ ) ) THEN 320 INFO = -(1400+NB_) 321 ELSE IF( ICTXT.NE.DESCC( CTXT_ ) ) THEN 322 INFO = -(1400+CTXT_) 323 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 324 INFO = -16 325 END IF 326 END IF 327 IF( LEFT ) THEN 328 IDUM1( 1 ) = ICHAR( 'L' ) 329 ELSE 330 IDUM1( 1 ) = ICHAR( 'R' ) 331 END IF 332 IDUM2( 1 ) = 1 333 IF( NOTRAN ) THEN 334 IDUM1( 2 ) = ICHAR( 'N' ) 335 ELSE 336 IDUM1( 2 ) = ICHAR( 'C' ) 337 END IF 338 IDUM2( 2 ) = 2 339 IDUM1( 3 ) = K 340 IDUM2( 3 ) = 5 341 IF( LWORK.EQ.-1 ) THEN 342 IDUM1( 4 ) = -1 343 ELSE 344 IDUM1( 4 ) = 1 345 END IF 346 IDUM2( 4 ) = 16 347 IF( LEFT ) THEN 348 CALL PCHK2MAT( K, 5, M, 3, IA, JA, DESCA, 9, M, 3, N, 4, IC, 349 $ JC, DESCC, 14, 4, IDUM1, IDUM2, INFO ) 350 ELSE 351 CALL PCHK2MAT( K, 5, N, 4, IA, JA, DESCA, 9, M, 3, N, 4, IC, 352 $ JC, DESCC, 14, 4, IDUM1, IDUM2, INFO ) 353 END IF 354 END IF 355* 356 IF( INFO.NE.0 ) THEN 357 CALL PXERBLA( ICTXT, 'PCUNMLQ', -INFO ) 358 RETURN 359 ELSE IF( LQUERY ) THEN 360 RETURN 361 END IF 362* 363* Quick return if possible 364* 365 IF( M.EQ.0 .OR. N.EQ.0 .OR. K.EQ.0 ) 366 $ RETURN 367* 368 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 369 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 370* 371 IF( ( LEFT .AND. NOTRAN ) .OR. 372 $ ( .NOT.LEFT .AND. .NOT.NOTRAN ) ) THEN 373 I1 = MIN( ICEIL( IA, DESCA( MB_ ) ) * DESCA( MB_ ), IA+K-1 ) 374 $ + 1 375 I2 = IA + K - 1 376 I3 = DESCA( MB_ ) 377 ELSE 378 I1 = MAX( ( (IA+K-2) / DESCA( MB_ ) ) * DESCA( MB_ ) + 1, IA ) 379 I2 = MIN( ICEIL( IA, DESCA( MB_ ) ) * DESCA( MB_ ), IA+K-1 ) 380 $ + 1 381 I3 = -DESCA( MB_ ) 382 END IF 383* 384 IF( LEFT ) THEN 385 NI = N 386 JCC = JC 387 ELSE 388 MI = M 389 ICC = IC 390 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ' ' ) 391 IF( NOTRAN ) THEN 392 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'D-ring' ) 393 ELSE 394 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'I-ring' ) 395 END IF 396 END IF 397* 398 IF( NOTRAN ) THEN 399 TRANST = 'C' 400 ELSE 401 TRANST = 'N' 402 END IF 403* 404 IF( ( LEFT .AND. NOTRAN ) .OR. ( .NOT.LEFT .AND. .NOT.NOTRAN ) ) 405 $ CALL PCUNML2( SIDE, TRANS, M, N, I1-IA, A, IA, JA, DESCA, TAU, 406 $ C, IC, JC, DESCC, WORK, LWORK, IINFO ) 407* 408 IPW = DESCA( MB_ ) * DESCA( MB_ ) + 1 409 DO 10 I = I1, I2, I3 410 IB = MIN( DESCA( MB_ ), K-I+IA ) 411* 412* Form the triangular factor of the block reflector 413* H = H(i) H(i+1) . . . H(i+ib-1) 414* 415 CALL PCLARFT( 'Forward', 'Rowwise', NQ-I+IA, IB, A, I, JA+I-IA, 416 $ DESCA, TAU, WORK, WORK( IPW ) ) 417 IF( LEFT ) THEN 418* 419* H or H' is applied to C(ic+i-ia:ic+m-1,jc:jc+n-1) 420* 421 MI = M - I + IA 422 ICC = IC + I - IA 423 ELSE 424* 425* H or H' is applied to C(ic:ic+m-1,jc+i-ia:jc+n-1) 426* 427 NI = N - I + IA 428 JCC = JC + I - IA 429 END IF 430* 431* Apply H or H' 432* 433 CALL PCLARFB( SIDE, TRANST, 'Forward', 'Rowwise', MI, NI, IB, 434 $ A, I, JA+I-IA, DESCA, WORK, C, ICC, JCC, DESCC, 435 $ WORK( IPW ) ) 436 10 CONTINUE 437* 438 IF( ( LEFT .AND. .NOT.NOTRAN ) .OR. ( .NOT.LEFT .AND. NOTRAN ) ) 439 $ CALL PCUNML2( SIDE, TRANS, M, N, I2-IA, A, IA, JA, DESCA, TAU, 440 $ C, IC, JC, DESCC, WORK, LWORK, IINFO ) 441* 442 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 443 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 444* 445 WORK( 1 ) = CMPLX( REAL( LWMIN ) ) 446* 447 RETURN 448* 449* End of PCUNMLQ 450* 451 END 452