1 SUBROUTINE PDTRORD( COMPQ, SELECT, PARA, N, T, IT, JT, 2 $ DESCT, Q, IQ, JQ, DESCQ, WR, WI, M, WORK, LWORK, 3 $ IWORK, LIWORK, INFO ) 4* 5* Contribution from the Department of Computing Science and HPC2N, 6* Umea University, Sweden 7* 8* -- ScaLAPACK routine (version 2.0.2) -- 9* Univ. of Tennessee, Univ. of California Berkeley, Univ. of Colorado Denver 10* May 1 2012 11* 12 IMPLICIT NONE 13* 14* .. Scalar Arguments .. 15 CHARACTER COMPQ 16 INTEGER INFO, LIWORK, LWORK, M, N, 17 $ IT, JT, IQ, JQ 18* .. 19* .. Array Arguments .. 20 INTEGER SELECT( * ) 21 INTEGER PARA( 6 ), DESCT( * ), DESCQ( * ), IWORK( * ) 22 DOUBLE PRECISION Q( * ), T( * ), WI( * ), WORK( * ), WR( * ) 23* .. 24* 25* Purpose 26* ======= 27* 28* PDTRORD reorders the real Schur factorization of a real matrix 29* A = Q*T*Q**T, so that a selected cluster of eigenvalues appears 30* in the leading diagonal blocks of the upper quasi-triangular matrix 31* T, and the leading columns of Q form an orthonormal basis of the 32* corresponding right invariant subspace. 33* 34* T must be in Schur form (as returned by PDLAHQR), that is, block 35* upper triangular with 1-by-1 and 2-by-2 diagonal blocks. 36* 37* This subroutine uses a delay and accumulate procedure for performing 38* the off-diagonal updates (see references for details). 39* 40* Notes 41* ===== 42* 43* Each global data object is described by an associated description 44* vector. This vector stores the information required to establish 45* the mapping between an object element and its corresponding process 46* and memory location. 47* 48* Let A be a generic term for any 2D block cyclicly distributed array. 49* Such a global array has an associated description vector DESCA. 50* In the following comments, the character _ should be read as 51* "of the global array". 52* 53* NOTATION STORED IN EXPLANATION 54* --------------- -------------- -------------------------------------- 55* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 56* DTYPE_A = 1. 57* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 58* the BLACS process grid A is distribu- 59* ted over. The context itself is glo- 60* bal, but the handle (the integer 61* value) may vary. 62* M_A (global) DESCA( M_ ) The number of rows in the global 63* array A. 64* N_A (global) DESCA( N_ ) The number of columns in the global 65* array A. 66* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 67* the rows of the array. 68* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 69* the columns of the array. 70* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 71* row of the array A is distributed. 72* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 73* first column of the array A is 74* distributed. 75* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 76* array. LLD_A >= MAX(1,LOCr(M_A)). 77* 78* Let K be the number of rows or columns of a distributed matrix, 79* and assume that its process grid has dimension p x q. 80* LOCr( K ) denotes the number of elements of K that a process 81* would receive if K were distributed over the p processes of its 82* process column. 83* Similarly, LOCc( K ) denotes the number of elements of K that a 84* process would receive if K were distributed over the q processes of 85* its process row. 86* The values of LOCr() and LOCc() may be determined via a call to the 87* ScaLAPACK tool function, NUMROC: 88* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 89* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 90* An upper bound for these quantities may be computed by: 91* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 92* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 93* 94* Arguments 95* ========= 96* 97* 98* COMPQ (global input) CHARACTER*1 99* = 'V': update the matrix Q of Schur vectors; 100* = 'N': do not update Q. 101* 102* SELECT (global input/output) INTEGER array, dimension (N) 103* SELECT specifies the eigenvalues in the selected cluster. To 104* select a real eigenvalue w(j), SELECT(j) must be set to 1. 105* To select a complex conjugate pair of eigenvalues 106* w(j) and w(j+1), corresponding to a 2-by-2 diagonal block, 107* either SELECT(j) or SELECT(j+1) or both must be set to 1; 108* a complex conjugate pair of eigenvalues must be 109* either both included in the cluster or both excluded. 110* On output, the (partial) reordering is displayed. 111* 112* PARA (global input) INTEGER*6 113* Block parameters (some should be replaced by calls to 114* PILAENV and others by meaningful default values): 115* PARA(1) = maximum number of concurrent computational windows 116* allowed in the algorithm; 117* 0 < PARA(1) <= min(NPROW,NPCOL) must hold; 118* PARA(2) = number of eigenvalues in each window; 119* 0 < PARA(2) < PARA(3) must hold; 120* PARA(3) = window size; PARA(2) < PARA(3) < DESCT(MB_) 121* must hold; 122* PARA(4) = minimal percentage of flops required for 123* performing matrix-matrix multiplications instead 124* of pipelined orthogonal transformations; 125* 0 <= PARA(4) <= 100 must hold; 126* PARA(5) = width of block column slabs for row-wise 127* application of pipelined orthogonal 128* transformations in their factorized form; 129* 0 < PARA(5) <= DESCT(MB_) must hold. 130* PARA(6) = the maximum number of eigenvalues moved together 131* over a process border; in practice, this will be 132* approximately half of the cross border window size 133* 0 < PARA(6) <= PARA(2) must hold; 134* 135* N (global input) INTEGER 136* The order of the globally distributed matrix T. N >= 0. 137* 138* T (local input/output) DOUBLE PRECISION array, 139* dimension (LLD_T,LOCc(N)). 140* On entry, the local pieces of the global distributed 141* upper quasi-triangular matrix T, in Schur form. On exit, T is 142* overwritten by the local pieces of the reordered matrix T, 143* again in Schur form, with the selected eigenvalues in the 144* globally leading diagonal blocks. 145* 146* IT (global input) INTEGER 147* JT (global input) INTEGER 148* The row and column index in the global array T indicating the 149* first column of sub( T ). IT = JT = 1 must hold. 150* 151* DESCT (global and local input) INTEGER array of dimension DLEN_. 152* The array descriptor for the global distributed matrix T. 153* 154* Q (local input/output) DOUBLE PRECISION array, 155* dimension (LLD_Q,LOCc(N)). 156* On entry, if COMPQ = 'V', the local pieces of the global 157* distributed matrix Q of Schur vectors. 158* On exit, if COMPQ = 'V', Q has been postmultiplied by the 159* global orthogonal transformation matrix which reorders T; the 160* leading M columns of Q form an orthonormal basis for the 161* specified invariant subspace. 162* If COMPQ = 'N', Q is not referenced. 163* 164* IQ (global input) INTEGER 165* JQ (global input) INTEGER 166* The column index in the global array Q indicating the 167* first column of sub( Q ). IQ = JQ = 1 must hold. 168* 169* DESCQ (global and local input) INTEGER array of dimension DLEN_. 170* The array descriptor for the global distributed matrix Q. 171* 172* WR (global output) DOUBLE PRECISION array, dimension (N) 173* WI (global output) DOUBLE PRECISION array, dimension (N) 174* The real and imaginary parts, respectively, of the reordered 175* eigenvalues of T. The eigenvalues are in principle stored in 176* the same order as on the diagonal of T, with WR(i) = T(i,i) 177* and, if T(i:i+1,i:i+1) is a 2-by-2 diagonal block, WI(i) > 0 178* and WI(i+1) = -WI(i). 179* Note also that if a complex eigenvalue is sufficiently 180* ill-conditioned, then its value may differ significantly 181* from its value before reordering. 182* 183* M (global output) INTEGER 184* The dimension of the specified invariant subspace. 185* 0 <= M <= N. 186* 187* WORK (local workspace/output) DOUBLE PRECISION array, 188* dimension (LWORK) 189* On exit, if INFO = 0, WORK(1) returns the optimal LWORK. 190* 191* LWORK (local input) INTEGER 192* The dimension of the array WORK. 193* 194* If LWORK = -1, then a workspace query is assumed; the routine 195* only calculates the optimal size of the WORK array, returns 196* this value as the first entry of the WORK array, and no error 197* message related to LWORK is issued by PXERBLA. 198* 199* IWORK (local workspace/output) INTEGER array, dimension (LIWORK) 200* 201* LIWORK (local input) INTEGER 202* The dimension of the array IWORK. 203* 204* If LIWORK = -1, then a workspace query is assumed; the 205* routine only calculates the optimal size of the IWORK array, 206* returns this value as the first entry of the IWORK array, and 207* no error message related to LIWORK is issued by PXERBLA. 208* 209* INFO (global output) INTEGER 210* = 0: successful exit 211* < 0: if INFO = -i, the i-th argument had an illegal value. 212* If the i-th argument is an array and the j-entry had 213* an illegal value, then INFO = -(i*1000+j), if the i-th 214* argument is a scalar and had an illegal value, then INFO = -i. 215* > 0: here we have several possibilites 216* *) Reordering of T failed because some eigenvalues are too 217* close to separate (the problem is very ill-conditioned); 218* T may have been partially reordered, and WR and WI 219* contain the eigenvalues in the same order as in T. 220* On exit, INFO = {the index of T where the swap failed}. 221* *) A 2-by-2 block to be reordered split into two 1-by-1 222* blocks and the second block failed to swap with an 223* adjacent block. 224* On exit, INFO = {the index of T where the swap failed}. 225* *) If INFO = N+1, there is no valid BLACS context (see the 226* BLACS documentation for details). 227* In a future release this subroutine may distinguish between 228* the case 1 and 2 above. 229* 230* Additional requirements 231* ======================= 232* 233* The following alignment requirements must hold: 234* (a) DESCT( MB_ ) = DESCT( NB_ ) = DESCQ( MB_ ) = DESCQ( NB_ ) 235* (b) DESCT( RSRC_ ) = DESCQ( RSRC_ ) 236* (c) DESCT( CSRC_ ) = DESCQ( CSRC_ ) 237* 238* All matrices must be blocked by a block factor larger than or 239* equal to two (3). This is to simplify reordering across processor 240* borders in the presence of 2-by-2 blocks. 241* 242* Limitations 243* =========== 244* 245* This algorithm cannot work on submatrices of T and Q, i.e., 246* IT = JT = IQ = JQ = 1 must hold. This is however no limitation 247* since PDLAHQR does not compute Schur forms of submatrices anyway. 248* 249* References 250* ========== 251* 252* [1] Z. Bai and J. W. Demmel; On swapping diagonal blocks in real 253* Schur form, Linear Algebra Appl., 186:73--95, 1993. Also as 254* LAPACK Working Note 54. 255* 256* [2] D. Kressner; Block algorithms for reordering standard and 257* generalized Schur forms, ACM TOMS, 32(4):521-532, 2006. 258* Also LAPACK Working Note 171. 259* 260* [3] R. Granat, B. Kagstrom, and D. Kressner; Parallel eigenvalue 261* reordering in real Schur form, Concurrency and Computations: 262* Practice and Experience, 21(9):1225-1250, 2009. Also as 263* LAPACK Working Note 192. 264* 265* Parallel execution recommendations 266* ================================== 267* 268* Use a square grid, if possible, for maximum performance. The block 269* parameters in PARA should be kept well below the data distribution 270* block size. In particular, see [3] for recommended settings for 271* these parameters. 272* 273* In general, the parallel algorithm strives to perform as much work 274* as possible without crossing the block borders on the main block 275* diagonal. 276* 277* Contributors 278* ============ 279* 280* Implemented by Robert Granat, Dept. of Computing Science and HPC2N, 281* Umea University, Sweden, March 2007, 282* in collaboration with Bo Kagstrom and Daniel Kressner. 283* Modified by Meiyue Shao, October 2011. 284* 285* Revisions 286* ========= 287* 288* Please send bug-reports to granat@cs.umu.se 289* 290* Keywords 291* ======== 292* 293* Real Schur form, eigenvalue reordering 294* 295* ===================================================================== 296* .. 297* .. Parameters .. 298 CHARACTER TOP 299 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 300 $ LLD_, MB_, M_, NB_, N_, RSRC_ 301 DOUBLE PRECISION ZERO, ONE 302 PARAMETER ( TOP = '1-Tree', 303 $ BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 304 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 305 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9, 306 $ ZERO = 0.0D+0, ONE = 1.0D+0 ) 307* .. 308* .. Local Scalars .. 309 LOGICAL LQUERY, PAIR, SWAP, WANTQ, 310 $ ISHH, FIRST, SKIP1CR, BORDER, LASTWAIT 311 INTEGER NPROW, NPCOL, MYROW, MYCOL, NB, NPROCS, 312 $ IERR, DIM1, INDX, LLDT, TRSRC, TCSRC, ILOC1, 313 $ JLOC1, MYIERR, ICTXT, 314 $ RSRC1, CSRC1, ILOC3, JLOC3, TRSRC3, 315 $ TCSRC3, ILOC, JLOC, TRSRC4, TCSRC4, 316 $ FLOPS, I, ILO, IHI, J, K, KK, KKS, 317 $ KS, LIWMIN, LWMIN, MMULT, N1, N2, 318 $ NCB, NDTRAF, NITRAF, NWIN, NUMWIN, PDTRAF, 319 $ PITRAF, PDW, WINEIG, WINSIZ, LLDQ, 320 $ RSRC, CSRC, ILILO, ILIHI, ILSEL, IRSRC, 321 $ ICSRC, IPIW, IPW1, IPW2, IPW3, TIHI, TILO, 322 $ LIHI, WINDOW, LILO, LSEL, BUFFER, 323 $ NMWIN2, BUFFLEN, LROWS, LCOLS, ILOC2, JLOC2, 324 $ WNEICR, WINDOW0, RSRC4, CSRC4, LIHI4, RSRC3, 325 $ CSRC3, RSRC2, CSRC2, LIHIC, LIHI1, ILEN4, 326 $ SELI4, ILEN1, DIM4, IPW4, QROWS, TROWS, 327 $ TCOLS, IPW5, IPW6, IPW7, IPW8, JLOC4, 328 $ EAST, WEST, ILOC4, SOUTH, NORTH, INDXS, 329 $ ITT, JTT, ILEN, DLEN, INDXE, TRSRC1, TCSRC1, 330 $ TRSRC2, TCSRC2, ILOS, DIR, TLIHI, TLILO, TLSEL, 331 $ ROUND, LAST, WIN0S, WIN0E, WINE, MMAX, MMIN 332 DOUBLE PRECISION ELEM, ELEM1, ELEM2, ELEM3, ELEM4, SN, CS, TMP, 333 $ ELEM5 334* .. 335* .. Local Arrays .. 336 INTEGER IBUFF( 8 ), IDUM1( 1 ), IDUM2( 1 ) 337* .. 338* .. External Functions .. 339 LOGICAL LSAME 340 INTEGER NUMROC, INDXG2P, INDXG2L 341 EXTERNAL LSAME, NUMROC, INDXG2P, INDXG2L 342* .. 343* .. External Subroutines .. 344 EXTERNAL PDLACPY, PXERBLA, PCHK1MAT, PCHK2MAT, 345 $ DGEMM, DLAMOV, ILACPY, CHK1MAT, 346 $ INFOG2L, DGSUM2D, DGESD2D, DGERV2D, DGEBS2D, 347 $ DGEBR2D, IGSUM2D, BLACS_GRIDINFO, IGEBS2D, 348 $ IGEBR2D, IGAMX2D, IGAMN2D, BDLAAPP, BDTREXC 349* .. 350* .. Intrinsic Functions .. 351 INTRINSIC ABS, MAX, SQRT, MIN 352* .. 353* .. Local Functions .. 354 INTEGER ICEIL 355* .. 356* .. Executable Statements .. 357* 358* Get grid parameters. 359* 360 ICTXT = DESCT( CTXT_ ) 361 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 362 NPROCS = NPROW*NPCOL 363* 364* Test if grid is O.K., i.e., the context is valid. 365* 366 INFO = 0 367 IF( NPROW.EQ.-1 ) THEN 368 INFO = N+1 369 END IF 370* 371* Check if workspace query. 372* 373 LQUERY = LWORK.EQ.-1 .OR. LIWORK.EQ.-1 374* 375* Test dimensions for local sanity. 376* 377 IF( INFO.EQ.0 ) THEN 378 CALL CHK1MAT( N, 5, N, 5, IT, JT, DESCT, 9, INFO ) 379 END IF 380 IF( INFO.EQ.0 ) THEN 381 CALL CHK1MAT( N, 5, N, 5, IQ, JQ, DESCQ, 13, INFO ) 382 END IF 383* 384* Check the blocking sizes for alignment requirements. 385* 386 IF( INFO.EQ.0 ) THEN 387 IF( DESCT( MB_ ).NE.DESCT( NB_ ) ) INFO = -(1000*9 + MB_) 388 END IF 389 IF( INFO.EQ.0 ) THEN 390 IF( DESCQ( MB_ ).NE.DESCQ( NB_ ) ) INFO = -(1000*13 + MB_) 391 END IF 392 IF( INFO.EQ.0 ) THEN 393 IF( DESCT( MB_ ).NE.DESCQ( MB_ ) ) INFO = -(1000*9 + MB_) 394 END IF 395* 396* Check the blocking sizes for minimum sizes. 397* 398 IF( INFO.EQ.0 ) THEN 399 IF( N.NE.DESCT( MB_ ) .AND. DESCT( MB_ ).LT.3 ) 400 $ INFO = -(1000*9 + MB_) 401 IF( N.NE.DESCQ( MB_ ) .AND. DESCQ( MB_ ).LT.3 ) 402 $ INFO = -(1000*13 + MB_) 403 END IF 404* 405* Check parameters in PARA. 406* 407 NB = DESCT( MB_ ) 408 IF( INFO.EQ.0 ) THEN 409 IF( PARA(1).LT.1 .OR. PARA(1).GT.MIN(NPROW,NPCOL) ) 410 $ INFO = -(1000 * 4 + 1) 411 IF( PARA(2).LT.1 .OR. PARA(2).GE.PARA(3) ) 412 $ INFO = -(1000 * 4 + 2) 413 IF( PARA(3).LT.1 .OR. PARA(3).GT.NB ) 414 $ INFO = -(1000 * 4 + 3) 415 IF( PARA(4).LT.0 .OR. PARA(4).GT.100 ) 416 $ INFO = -(1000 * 4 + 4) 417 IF( PARA(5).LT.1 .OR. PARA(5).GT.NB ) 418 $ INFO = -(1000 * 4 + 5) 419 IF( PARA(6).LT.1 .OR. PARA(6).GT.PARA(2) ) 420 $ INFO = -(1000 * 4 + 6) 421 END IF 422* 423* Check requirements on IT, JT, IQ and JQ. 424* 425 IF( INFO.EQ.0 ) THEN 426 IF( IT.NE.1 ) INFO = -6 427 IF( JT.NE.IT ) INFO = -7 428 IF( IQ.NE.1 ) INFO = -10 429 IF( JQ.NE.IQ ) INFO = -11 430 END IF 431* 432* Test input parameters for global sanity. 433* 434 IF( INFO.EQ.0 ) THEN 435 CALL PCHK1MAT( N, 5, N, 5, IT, JT, DESCT, 9, 0, IDUM1, 436 $ IDUM2, INFO ) 437 END IF 438 IF( INFO.EQ.0 ) THEN 439 CALL PCHK1MAT( N, 5, N, 5, IQ, JQ, DESCQ, 13, 0, IDUM1, 440 $ IDUM2, INFO ) 441 END IF 442 IF( INFO.EQ.0 ) THEN 443 CALL PCHK2MAT( N, 5, N, 5, IT, JT, DESCT, 9, N, 5, N, 5, 444 $ IQ, JQ, DESCQ, 13, 0, IDUM1, IDUM2, INFO ) 445 END IF 446* 447* Decode and test the input parameters. 448* 449 IF( INFO.EQ.0 .OR. LQUERY ) THEN 450* 451 WANTQ = LSAME( COMPQ, 'V' ) 452 IF( N.LT.0 ) THEN 453 INFO = -4 454 ELSE 455* 456* Extract local leading dimension. 457* 458 LLDT = DESCT( LLD_ ) 459 LLDQ = DESCQ( LLD_ ) 460* 461* Check the SELECT vector for consistency and set M to the 462* dimension of the specified invariant subspace. 463* 464 M = 0 465 DO 10 K = 1, N 466 IF( K.LT.N ) THEN 467 CALL INFOG2L( K+1, K, DESCT, NPROW, NPCOL, 468 $ MYROW, MYCOL, ITT, JTT, TRSRC, TCSRC ) 469 IF( MYROW.EQ.TRSRC .AND. MYCOL.EQ.TCSRC ) THEN 470 ELEM = T( (JTT-1)*LLDT + ITT ) 471 IF( ELEM.NE.ZERO ) THEN 472 IF( SELECT(K).NE.0 .AND. 473 $ SELECT(K+1).EQ.0 ) THEN 474* INFO = -2 475 SELECT(K+1) = 1 476 ELSEIF( SELECT(K).EQ.0 .AND. 477 $ SELECT(K+1).NE.0 ) THEN 478* INFO = -2 479 SELECT(K) = 1 480 END IF 481 END IF 482 END IF 483 END IF 484 IF( SELECT(K).NE.0 ) M = M + 1 485 10 CONTINUE 486 MMAX = M 487 MMIN = M 488 IF( NPROCS.GT.1 ) 489 $ CALL IGAMX2D( ICTXT, 'All', TOP, 1, 1, MMAX, 1, -1, 490 $ -1, -1, -1, -1 ) 491 IF( NPROCS.GT.1 ) 492 $ CALL IGAMN2D( ICTXT, 'All', TOP, 1, 1, MMIN, 1, -1, 493 $ -1, -1, -1, -1 ) 494 IF( MMAX.GT.MMIN ) THEN 495 M = MMAX 496 IF( NPROCS.GT.1 ) 497 $ CALL IGAMX2D( ICTXT, 'All', TOP, N, 1, SELECT, N, 498 $ -1, -1, -1, -1, -1 ) 499 END IF 500* 501* Compute needed workspace. 502* 503 N1 = M 504 N2 = N - M 505* 506 TROWS = NUMROC( N, NB, MYROW, DESCT(RSRC_), NPROW ) 507 TCOLS = NUMROC( N, NB, MYCOL, DESCT(CSRC_), NPCOL ) 508 LWMIN = N + 7*NB**2 + 2*TROWS*PARA( 3 ) + TCOLS*PARA( 3 ) + 509 $ MAX( TROWS*PARA( 3 ), TCOLS*PARA( 3 ) ) 510 LIWMIN = 5*PARA( 1 ) + PARA( 2 )*PARA( 3 ) - 511 $ PARA( 2 ) * ( PARA( 2 ) + 1 ) / 2 512* 513 IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 514 INFO = -17 515 ELSE IF( LIWORK.LT.LIWMIN .AND. .NOT.LQUERY ) THEN 516 INFO = -19 517 END IF 518 END IF 519 END IF 520* 521* Global maximum on info. 522* 523 IF( NPROCS.GT.1 ) 524 $ CALL IGAMX2D( ICTXT, 'All', TOP, 1, 1, INFO, 1, -1, -1, -1, 525 $ -1, -1 ) 526* 527* Return if some argument is incorrect. 528* 529 IF( INFO.NE.0 .AND. .NOT.LQUERY ) THEN 530 M = 0 531 CALL PXERBLA( ICTXT, 'PDTRORD', -INFO ) 532 RETURN 533 ELSEIF( LQUERY ) THEN 534 WORK( 1 ) = DBLE(LWMIN) 535 IWORK( 1 ) = LIWMIN 536 RETURN 537 END IF 538* 539* Quick return if possible. 540* 541 IF( M.EQ.N .OR. M.EQ.0 ) GO TO 545 542* 543* Set parameters. 544* 545 NUMWIN = PARA( 1 ) 546 WINEIG = MAX( PARA( 2 ), 2 ) 547 WINSIZ = MIN( MAX( PARA( 3 ), PARA( 2 )*2 ), NB ) 548 MMULT = PARA( 4 ) 549 NCB = PARA( 5 ) 550 WNEICR = PARA( 6 ) 551* 552* Insert some pointers into INTEGER workspace. 553* 554* Information about all the active windows is stored 555* in IWORK( 1:5*NUMWIN ). Each processor has a copy. 556* LILO: start position 557* LIHI: stop position 558* LSEL: number of selected eigenvalues 559* RSRC: processor id (row) 560* CSRC: processor id (col) 561* IWORK( IPIW+ ) contain information of orthogonal transformations. 562* 563 ILILO = 1 564 ILIHI = ILILO + NUMWIN 565 ILSEL = ILIHI + NUMWIN 566 IRSRC = ILSEL + NUMWIN 567 ICSRC = IRSRC + NUMWIN 568 IPIW = ICSRC + NUMWIN 569* 570* Insert some pointers into DOUBLE PRECISION workspace - for now we 571* only need two pointers. 572* 573 IPW1 = 1 574 IPW2 = IPW1 + NB 575* 576* Collect the selected blocks at the top-left corner of T. 577* 578* Globally: ignore eigenvalues that are already in order. 579* ILO is a global variable and is kept updated to be consistent 580* throughout the process mesh. 581* 582 ILO = 0 583 40 CONTINUE 584 ILO = ILO + 1 585 IF( ILO.LE.N ) THEN 586 IF( SELECT(ILO).NE.0 ) GO TO 40 587 END IF 588* 589* Globally: start the collection at the top of the matrix. Here, 590* IHI is a global variable and is kept updated to be consistent 591* throughout the process mesh. 592* 593 IHI = N 594* 595* Globally: While ( ILO <= M ) do 596 50 CONTINUE 597* 598 IF( ILO.LE.M ) THEN 599* 600* Depending on the value of ILO, find the diagonal block index J, 601* such that T(1+(J-1)*NB:1+J*NB,1+(J-1)*NB:1+J*NB) contains the 602* first unsorted eigenvalue. Check that J does not point to a 603* block with only one selected eigenvalue in the last position 604* which belongs to a splitted 2-by-2 block. 605* 606 ILOS = ILO - 1 607 52 CONTINUE 608 ILOS = ILOS + 1 609 IF( SELECT(ILOS).EQ.0 ) GO TO 52 610 IF( ILOS.LT.N ) THEN 611 IF( SELECT(ILOS+1).NE.0 .AND. MOD(ILOS,NB).EQ.0 ) THEN 612 CALL PDELGET( 'All', TOP, ELEM, T, ILOS+1, ILOS, DESCT ) 613 IF( ELEM.NE.ZERO ) GO TO 52 614 END IF 615 END IF 616 J = ICEIL(ILOS,NB) 617* 618* Globally: Set start values of LILO and LIHI for all processes. 619* Choose also the number of selected eigenvalues at top of each 620* diagonal block such that the number of eigenvalues which remain 621* to be reordered is an integer multiple of WINEIG. 622* 623* All the information is saved into the INTEGER workspace such 624* that all processors are aware of each others operations. 625* 626* Compute the number of concurrent windows. 627* 628 NMWIN2 = (ICEIL(IHI,NB)*NB - (ILO-MOD(ILO,NB)+1)+1) / NB 629 NMWIN2 = MIN( MIN( NUMWIN, NMWIN2 ), ICEIL(N,NB) - J + 1 ) 630* 631* For all windows, set LSEL = 0 and find a proper start value of 632* LILO such that LILO points at the first non-selected entry in 633* the corresponding diagonal block of T. 634* 635 DO 80 K = 1, NMWIN2 636 IWORK( ILSEL+K-1) = 0 637 IWORK( ILILO+K-1) = MAX( ILO, (J-1)*NB+(K-1)*NB+1 ) 638 LILO = IWORK( ILILO+K-1 ) 639 82 CONTINUE 640 IF( SELECT(LILO).NE.0 .AND. LILO.LT.(J+K-1)*NB ) THEN 641 LILO = LILO + 1 642 IF( LILO.LE.N ) GO TO 82 643 END IF 644 IWORK( ILILO+K-1 ) = LILO 645* 646* Fix each LILO to ensure that no 2-by-2 block is cut in top 647* of the submatrix (LILO:LIHI,LILO:LIHI). 648* 649 LILO = IWORK(ILILO+K-1) 650 IF( LILO.GT.NB ) THEN 651 CALL PDELGET( 'All', TOP, ELEM, T, LILO, LILO-1, DESCT ) 652 IF( ELEM.NE.ZERO ) THEN 653 IF( LILO.LT.(J+K-1)*NB ) THEN 654 IWORK(ILILO+K-1) = IWORK(ILILO+K-1) + 1 655 ELSE 656 IWORK(ILILO+K-1) = IWORK(ILILO+K-1) - 1 657 END IF 658 END IF 659 END IF 660* 661* Set a proper LIHI value for each window. Also find the 662* processors corresponding to the corresponding windows. 663* 664 IWORK( ILIHI+K-1 ) = IWORK( ILILO+K-1 ) 665 IWORK( IRSRC+K-1 ) = INDXG2P( IWORK(ILILO+K-1), NB, MYROW, 666 $ DESCT( RSRC_ ), NPROW ) 667 IWORK( ICSRC+K-1 ) = INDXG2P( IWORK(ILILO+K-1), NB, MYCOL, 668 $ DESCT( CSRC_ ), NPCOL ) 669 TILO = IWORK(ILILO+K-1) 670 TIHI = MIN( N, ICEIL( TILO, NB ) * NB ) 671 DO 90 KK = TIHI, TILO, -1 672 IF( SELECT(KK).NE.0 ) THEN 673 IWORK(ILIHI+K-1) = MAX(IWORK(ILIHI+K-1) , KK ) 674 IWORK(ILSEL+K-1) = IWORK(ILSEL+K-1) + 1 675 IF( IWORK(ILSEL+K-1).GT.WINEIG ) THEN 676 IWORK(ILIHI+K-1) = KK 677 IWORK(ILSEL+K-1) = 1 678 END IF 679 END IF 680 90 CONTINUE 681* 682* Fix each LIHI to avoid that bottom of window cuts 2-by-2 683* block. We exclude such a block if located on block (process) 684* border and on window border or if an inclusion would cause 685* violation on the maximum number of eigenvalues to reorder 686* inside each window. If only on window border, we include it. 687* The excluded block is included automatically later when a 688* subcluster is reordered into the block from South-East. 689* 690 LIHI = IWORK(ILIHI+K-1) 691 IF( LIHI.LT.N ) THEN 692 CALL PDELGET( 'All', TOP, ELEM, T, LIHI+1, LIHI, DESCT ) 693 IF( ELEM.NE.ZERO ) THEN 694 IF( ICEIL( LIHI, NB ) .NE. ICEIL( LIHI+1, NB ) .OR. 695 $ IWORK( ILSEL+K-1 ).EQ.WINEIG ) THEN 696 IWORK( ILIHI+K-1 ) = IWORK( ILIHI+K-1 ) - 1 697 IF( IWORK( ILSEL+K-1 ).GT.2 ) 698 $ IWORK( ILSEL+K-1 ) = IWORK( ILSEL+K-1 ) - 1 699 ELSE 700 IWORK( ILIHI+K-1 ) = IWORK( ILIHI+K-1 ) + 1 701 IF( SELECT(LIHI+1).NE.0 ) 702 $ IWORK( ILSEL+K-1 ) = IWORK( ILSEL+K-1 ) + 1 703 END IF 704 END IF 705 END IF 706 80 CONTINUE 707* 708* Fix the special cases of LSEL = 0 and LILO = LIHI for each 709* window by assuring that the stop-condition for local reordering 710* is fulfilled directly. Do this by setting LIHI = startposition 711* for the corresponding block and LILO = LIHI + 1. 712* 713 DO 85 K = 1, NMWIN2 714 LILO = IWORK( ILILO + K - 1 ) 715 LIHI = IWORK( ILIHI + K - 1 ) 716 LSEL = IWORK( ILSEL + K - 1 ) 717 IF( LSEL.EQ.0 .OR. LILO.EQ.LIHI ) THEN 718 LIHI = IWORK( ILIHI + K - 1 ) 719 IWORK( ILIHI + K - 1 ) = (ICEIL(LIHI,NB)-1)*NB + 1 720 IWORK( ILILO + K - 1 ) = IWORK( ILIHI + K - 1 ) + 1 721 END IF 722 85 CONTINUE 723* 724* Associate all processors with the first computational window 725* that should be activated, if possible. 726* 727 LILO = IHI 728 LIHI = ILO 729 LSEL = M 730 FIRST = .TRUE. 731 DO 95 WINDOW = 1, NMWIN2 732 RSRC = IWORK(IRSRC+WINDOW-1) 733 CSRC = IWORK(ICSRC+WINDOW-1) 734 IF( MYROW.EQ.RSRC .OR. MYCOL.EQ.CSRC ) THEN 735 TLILO = IWORK( ILILO + WINDOW - 1 ) 736 TLIHI = IWORK( ILIHI + WINDOW - 1 ) 737 TLSEL = IWORK( ILSEL + WINDOW - 1 ) 738 IF( (.NOT. ( LIHI .GE. LILO + LSEL ) ) .AND. 739 $ ( (TLIHI .GE. TLILO + TLSEL) .OR. FIRST ) ) THEN 740 IF( FIRST ) FIRST = .FALSE. 741 LILO = TLILO 742 LIHI = TLIHI 743 LSEL = TLSEL 744 GO TO 97 745 END IF 746 END IF 747 95 CONTINUE 748 97 CONTINUE 749* 750* Exclude all processors that are not involved in any 751* computational window right now. 752* 753 IERR = 0 754 IF( LILO.EQ.IHI .AND. LIHI.EQ.ILO .AND. LSEL.EQ.M ) 755 $ GO TO 114 756* 757* Make sure all processors associated with a compuational window 758* enter the local reordering the first time. 759* 760 FIRST = .TRUE. 761* 762* Globally for all computational windows: 763* While ( LIHI >= LILO + LSEL ) do 764 ROUND = 1 765 130 CONTINUE 766 IF( FIRST .OR. ( LIHI .GE. LILO + LSEL ) ) THEN 767* 768* Perform computations in parallel: loop through all 769* compuational windows, do local reordering and accumulate 770* transformations, broadcast them in the corresponding block 771* row and columns and compute the corresponding updates. 772* 773 DO 110 WINDOW = 1, NMWIN2 774 RSRC = IWORK(IRSRC+WINDOW-1) 775 CSRC = IWORK(ICSRC+WINDOW-1) 776* 777* The process on the block diagonal computes the 778* reordering. 779* 780 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC ) THEN 781 LILO = IWORK(ILILO+WINDOW-1) 782 LIHI = IWORK(ILIHI+WINDOW-1) 783 LSEL = IWORK(ILSEL+WINDOW-1) 784* 785* Compute the local value of I -- start position. 786* 787 I = MAX( LILO, LIHI - WINSIZ + 1 ) 788* 789* Fix my I to avoid that top of window cuts a 2-by-2 790* block. 791* 792 IF( I.GT.LILO ) THEN 793 CALL INFOG2L( I, I-1, DESCT, NPROW, NPCOL, MYROW, 794 $ MYCOL, ILOC, JLOC, RSRC, CSRC ) 795 IF( T( LLDT*(JLOC-1) + ILOC ).NE.ZERO ) 796 $ I = I + 1 797 END IF 798* 799* Compute local indicies for submatrix to operate on. 800* 801 CALL INFOG2L( I, I, DESCT, NPROW, NPCOL, 802 $ MYROW, MYCOL, ILOC1, JLOC1, RSRC, CSRC ) 803* 804* The active window is ( I:LIHI, I:LIHI ). Reorder 805* eigenvalues within this window and pipeline 806* transformations. 807* 808 NWIN = LIHI - I + 1 809 KS = 0 810 PITRAF = IPIW 811 PDTRAF = IPW2 812* 813 PAIR = .FALSE. 814 DO 140 K = I, LIHI 815 IF( PAIR ) THEN 816 PAIR = .FALSE. 817 ELSE 818 SWAP = SELECT( K ).NE.0 819 IF( K.LT.LIHI ) THEN 820 CALL INFOG2L( K+1, K, DESCT, NPROW, NPCOL, 821 $ MYROW, MYCOL, ILOC, JLOC, RSRC, CSRC ) 822 IF( T( LLDT*(JLOC-1) + ILOC ).NE.ZERO ) 823 $ PAIR = .TRUE. 824 END IF 825 IF( SWAP ) THEN 826 KS = KS + 1 827* 828* Swap the K-th block to position I+KS-1. 829* 830 IERR = 0 831 KK = K - I + 1 832 KKS = KS 833 IF( KK.NE.KS ) THEN 834 NITRAF = LIWORK - PITRAF + 1 835 NDTRAF = LWORK - PDTRAF + 1 836 CALL BDTREXC( NWIN, 837 $ T(LLDT*(JLOC1-1) + ILOC1), LLDT, KK, 838 $ KKS, NITRAF, IWORK( PITRAF ), NDTRAF, 839 $ WORK( PDTRAF ), WORK(IPW1), IERR ) 840 PITRAF = PITRAF + NITRAF 841 PDTRAF = PDTRAF + NDTRAF 842* 843* Update array SELECT. 844* 845 IF ( PAIR ) THEN 846 DO 150 J = I+KK-1, I+KKS, -1 847 SELECT(J+1) = SELECT(J-1) 848 150 CONTINUE 849 SELECT(I+KKS-1) = 1 850 SELECT(I+KKS) = 1 851 ELSE 852 DO 160 J = I+KK-1, I+KKS, -1 853 SELECT(J) = SELECT(J-1) 854 160 CONTINUE 855 SELECT(I+KKS-1) = 1 856 END IF 857* 858 IF ( IERR.EQ.1 .OR. IERR.EQ.2 ) THEN 859* 860* Some blocks are too close to swap: 861* prepare to leave in a clean fashion. If 862* IERR.EQ.2, we must update SELECT to 863* account for the fact that the 2 by 2 864* block to be reordered did split and the 865* first part of this block is already 866* reordered. 867* 868 IF ( IERR.EQ.2 ) THEN 869 SELECT( I+KKS-3 ) = 1 870 SELECT( I+KKS-1 ) = 0 871 KKS = KKS + 1 872 END IF 873* 874* Update off-diagonal blocks immediately. 875* 876 GO TO 170 877 END IF 878 KS = KKS 879 END IF 880 IF( PAIR ) 881 $ KS = KS + 1 882 END IF 883 END IF 884 140 CONTINUE 885 END IF 886 110 CONTINUE 887 170 CONTINUE 888* 889* The on-diagonal processes save their information from the 890* local reordering in the integer buffer. This buffer is 891* broadcasted to updating processors, see below. 892* 893 DO 175 WINDOW = 1, NMWIN2 894 RSRC = IWORK(IRSRC+WINDOW-1) 895 CSRC = IWORK(ICSRC+WINDOW-1) 896 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC ) THEN 897 IBUFF( 1 ) = I 898 IBUFF( 2 ) = NWIN 899 IBUFF( 3 ) = PITRAF 900 IBUFF( 4 ) = KS 901 IBUFF( 5 ) = PDTRAF 902 IBUFF( 6 ) = NDTRAF 903 ILEN = PITRAF - IPIW 904 DLEN = PDTRAF - IPW2 905 IBUFF( 7 ) = ILEN 906 IBUFF( 8 ) = DLEN 907 END IF 908 175 CONTINUE 909* 910* For the updates with respect to the local reordering, we 911* organize the updates in two phases where the update 912* "direction" (controlled by the DIR variable below) is first 913* chosen to be the corresponding rows, then the corresponding 914* columns. 915* 916 DO 1111 DIR = 1, 2 917* 918* Broadcast information about the reordering and the 919* accumulated transformations: I, NWIN, PITRAF, NITRAF, 920* PDTRAF, NDTRAF. If no broadcast is performed, use an 921* artificial value of KS to prevent updating indicies for 922* windows already finished (use KS = -1). 923* 924 DO 111 WINDOW = 1, NMWIN2 925 RSRC = IWORK(IRSRC+WINDOW-1) 926 CSRC = IWORK(ICSRC+WINDOW-1) 927 IF( MYROW.EQ.RSRC .OR. MYCOL.EQ.CSRC ) THEN 928 LILO = IWORK(ILILO+WINDOW-1) 929 LIHI = IWORK(ILIHI+WINDOW-1) 930 LSEL = IWORK(ILSEL+WINDOW-1) 931 END IF 932 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC ) THEN 933 IF( NPCOL.GT.1 .AND. DIR.EQ.1 ) 934 $ CALL IGEBS2D( ICTXT, 'Row', TOP, 8, 1, IBUFF, 8 ) 935 IF( NPROW.GT.1 .AND. DIR.EQ.2 ) 936 $ CALL IGEBS2D( ICTXT, 'Col', TOP, 8, 1, IBUFF, 8 ) 937 ELSEIF( MYROW.EQ.RSRC .OR. MYCOL.EQ.CSRC ) THEN 938 IF( NPCOL.GT.1 .AND. DIR.EQ.1 .AND. MYROW.EQ.RSRC ) 939 $ THEN 940 IF( FIRST .OR. (LIHI .GE. LILO + LSEL) ) THEN 941 CALL IGEBR2D( ICTXT, 'Row', TOP, 8, 1, IBUFF, 8, 942 $ RSRC, CSRC ) 943 I = IBUFF( 1 ) 944 NWIN = IBUFF( 2 ) 945 PITRAF = IBUFF( 3 ) 946 KS = IBUFF( 4 ) 947 PDTRAF = IBUFF( 5 ) 948 NDTRAF = IBUFF( 6 ) 949 ILEN = IBUFF( 7 ) 950 DLEN = IBUFF( 8 ) 951 ELSE 952 ILEN = 0 953 DLEN = 0 954 KS = -1 955 END IF 956 END IF 957 IF( NPROW.GT.1 .AND. DIR.EQ.2 .AND. MYCOL.EQ.CSRC ) 958 $ THEN 959 IF( FIRST .OR. (LIHI .GE. LILO + LSEL) ) THEN 960 CALL IGEBR2D( ICTXT, 'Col', TOP, 8, 1, IBUFF, 8, 961 $ RSRC, CSRC ) 962 I = IBUFF( 1 ) 963 NWIN = IBUFF( 2 ) 964 PITRAF = IBUFF( 3 ) 965 KS = IBUFF( 4 ) 966 PDTRAF = IBUFF( 5 ) 967 NDTRAF = IBUFF( 6 ) 968 ILEN = IBUFF( 7 ) 969 DLEN = IBUFF( 8 ) 970 ELSE 971 ILEN = 0 972 DLEN = 0 973 KS = -1 974 END IF 975 END IF 976 END IF 977* 978* Broadcast the accumulated transformations - copy all 979* information from IWORK(IPIW:PITRAF-1) and 980* WORK(IPW2:PDTRAF-1) to a buffer and broadcast this 981* buffer in the corresponding block row and column. On 982* arrival, copy the information back to the correct part of 983* the workspace. This step is avoided if no computations 984* were performed at the diagonal processor, i.e., 985* BUFFLEN = 0. 986* 987 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC ) THEN 988 BUFFER = PDTRAF 989 BUFFLEN = DLEN + ILEN 990 IF( BUFFLEN.NE.0 ) THEN 991 DO 180 INDX = 1, ILEN 992 WORK( BUFFER+INDX-1 ) = 993 $ DBLE( IWORK(IPIW+INDX-1) ) 994 180 CONTINUE 995 CALL DLAMOV( 'All', DLEN, 1, WORK( IPW2 ), 996 $ DLEN, WORK(BUFFER+ILEN), DLEN ) 997 IF( NPCOL.GT.1 .AND. DIR.EQ.1 ) THEN 998 CALL DGEBS2D( ICTXT, 'Row', TOP, BUFFLEN, 1, 999 $ WORK(BUFFER), BUFFLEN ) 1000 END IF 1001 IF( NPROW.GT.1 .AND. DIR.EQ.2 ) THEN 1002 CALL DGEBS2D( ICTXT, 'Col', TOP, BUFFLEN, 1, 1003 $ WORK(BUFFER), BUFFLEN ) 1004 END IF 1005 END IF 1006 ELSEIF( MYROW.EQ.RSRC .OR. MYCOL.EQ.CSRC ) THEN 1007 IF( NPCOL.GT.1 .AND. DIR.EQ.1 .AND. MYROW.EQ.RSRC ) 1008 $ THEN 1009 BUFFER = PDTRAF 1010 BUFFLEN = DLEN + ILEN 1011 IF( BUFFLEN.NE.0 ) THEN 1012 CALL DGEBR2D( ICTXT, 'Row', TOP, BUFFLEN, 1, 1013 $ WORK(BUFFER), BUFFLEN, RSRC, CSRC ) 1014 END IF 1015 END IF 1016 IF( NPROW.GT.1 .AND. DIR.EQ.2 .AND. MYCOL.EQ.CSRC ) 1017 $ THEN 1018 BUFFER = PDTRAF 1019 BUFFLEN = DLEN + ILEN 1020 IF( BUFFLEN.NE.0 ) THEN 1021 CALL DGEBR2D( ICTXT, 'Col', TOP, BUFFLEN, 1, 1022 $ WORK(BUFFER), BUFFLEN, RSRC, CSRC ) 1023 END IF 1024 END IF 1025 IF((NPCOL.GT.1.AND.DIR.EQ.1.AND.MYROW.EQ.RSRC).OR. 1026 $ (NPROW.GT.1.AND.DIR.EQ.2.AND.MYCOL.EQ.CSRC ) ) 1027 $ THEN 1028 IF( BUFFLEN.NE.0 ) THEN 1029 DO 190 INDX = 1, ILEN 1030 IWORK(IPIW+INDX-1) = 1031 $ INT(WORK( BUFFER+INDX-1 )) 1032 190 CONTINUE 1033 CALL DLAMOV( 'All', DLEN, 1, 1034 $ WORK( BUFFER+ILEN ), DLEN, 1035 $ WORK( IPW2 ), DLEN ) 1036 END IF 1037 END IF 1038 END IF 1039 111 CONTINUE 1040* 1041* Now really perform the updates by applying the orthogonal 1042* transformations to the out-of-window parts of T and Q. This 1043* step is avoided if no reordering was performed by the on- 1044* diagonal processor from the beginning, i.e., BUFFLEN = 0. 1045* 1046* Count number of operations to decide whether to use 1047* matrix-matrix multiplications for updating off-diagonal 1048* parts or not. 1049* 1050 DO 112 WINDOW = 1, NMWIN2 1051 RSRC = IWORK(IRSRC+WINDOW-1) 1052 CSRC = IWORK(ICSRC+WINDOW-1) 1053* 1054 IF( (MYROW.EQ.RSRC .AND. DIR.EQ.1 ).OR. 1055 $ (MYCOL.EQ.CSRC .AND. DIR.EQ.2 ) ) THEN 1056 LILO = IWORK(ILILO+WINDOW-1) 1057 LIHI = IWORK(ILIHI+WINDOW-1) 1058 LSEL = IWORK(ILSEL+WINDOW-1) 1059* 1060* Skip update part for current WINDOW if BUFFLEN = 0. 1061* 1062 IF( BUFFLEN.EQ.0 ) GO TO 295 1063* 1064 NITRAF = PITRAF - IPIW 1065 ISHH = .FALSE. 1066 FLOPS = 0 1067 DO 200 K = 1, NITRAF 1068 IF( IWORK( IPIW + K - 1 ).LE.NWIN ) THEN 1069 FLOPS = FLOPS + 6 1070 ELSE 1071 FLOPS = FLOPS + 11 1072 ISHH = .TRUE. 1073 END IF 1074 200 CONTINUE 1075* 1076* Compute amount of work space necessary for performing 1077* matrix-matrix multiplications. 1078* 1079 PDW = BUFFER 1080 IPW3 = PDW + NWIN*NWIN 1081 ELSE 1082 FLOPS = 0 1083 END IF 1084* 1085 IF( FLOPS.NE.0 .AND. 1086 $ ( FLOPS*100 ) / ( 2*NWIN*NWIN ) .GE. MMULT ) THEN 1087* 1088* Update off-diagonal blocks and Q using matrix-matrix 1089* multiplications; if there are no Householder 1090* reflectors it is preferable to take the triangular 1091* block structure of the transformation matrix into 1092* account. 1093* 1094 CALL DLASET( 'All', NWIN, NWIN, ZERO, ONE, 1095 $ WORK( PDW ), NWIN ) 1096 CALL BDLAAPP( 1, NWIN, NWIN, NCB, WORK( PDW ), NWIN, 1097 $ NITRAF, IWORK(IPIW), WORK( IPW2 ), WORK(IPW3) ) 1098* 1099 IF( ISHH ) THEN 1100* 1101* Loop through the local blocks of the distributed 1102* matrices T and Q and update them according to the 1103* performed reordering. 1104* 1105* Update the columns of T and Q affected by the 1106* reordering. 1107* 1108 IF( DIR.EQ.2 ) THEN 1109 DO 210 INDX = 1, I-1, NB 1110 CALL INFOG2L( INDX, I, DESCT, NPROW, NPCOL, 1111 $ MYROW, MYCOL, ILOC, JLOC, RSRC1, CSRC1 ) 1112 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) 1113 $ THEN 1114 LROWS = MIN(NB,I-INDX) 1115 CALL DGEMM( 'No transpose', 1116 $ 'No transpose', LROWS, NWIN, NWIN, 1117 $ ONE, T((JLOC-1)*LLDT+ILOC), LLDT, 1118 $ WORK( PDW ), NWIN, ZERO, 1119 $ WORK(IPW3), LROWS ) 1120 CALL DLAMOV( 'All', LROWS, NWIN, 1121 $ WORK(IPW3), LROWS, 1122 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 1123 END IF 1124 210 CONTINUE 1125 IF( WANTQ ) THEN 1126 DO 220 INDX = 1, N, NB 1127 CALL INFOG2L( INDX, I, DESCQ, NPROW, 1128 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 1129 $ RSRC1, CSRC1 ) 1130 IF( MYROW.EQ.RSRC1.AND.MYCOL.EQ.CSRC1 ) 1131 $ THEN 1132 LROWS = MIN(NB,N-INDX+1) 1133 CALL DGEMM( 'No transpose', 1134 $ 'No transpose', LROWS, NWIN, NWIN, 1135 $ ONE, Q((JLOC-1)*LLDQ+ILOC), LLDQ, 1136 $ WORK( PDW ), NWIN, ZERO, 1137 $ WORK(IPW3), LROWS ) 1138 CALL DLAMOV( 'All', LROWS, NWIN, 1139 $ WORK(IPW3), LROWS, 1140 $ Q((JLOC-1)*LLDQ+ILOC), LLDQ ) 1141 END IF 1142 220 CONTINUE 1143 END IF 1144 END IF 1145* 1146* Update the rows of T affected by the reordering 1147* 1148 IF( DIR.EQ.1 ) THEN 1149 IF( LIHI.LT.N ) THEN 1150 IF( MOD(LIHI,NB).GT.0 ) THEN 1151 INDX = LIHI + 1 1152 CALL INFOG2L( I, INDX, DESCT, NPROW, 1153 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 1154 $ RSRC1, CSRC1 ) 1155 IF( MYROW.EQ.RSRC1.AND.MYCOL.EQ.CSRC1 ) 1156 $ THEN 1157 LCOLS = MOD( MIN( NB-MOD(LIHI,NB), 1158 $ N-LIHI ), NB ) 1159 CALL DGEMM( 'Transpose', 1160 $ 'No Transpose', NWIN, LCOLS, NWIN, 1161 $ ONE, WORK( PDW ), NWIN, 1162 $ T((JLOC-1)*LLDT+ILOC), LLDT, ZERO, 1163 $ WORK(IPW3), NWIN ) 1164 CALL DLAMOV( 'All', NWIN, LCOLS, 1165 $ WORK(IPW3), NWIN, 1166 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 1167 END IF 1168 END IF 1169 INDXS = ICEIL(LIHI,NB)*NB + 1 1170 DO 230 INDX = INDXS, N, NB 1171 CALL INFOG2L( I, INDX, DESCT, NPROW, 1172 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 1173 $ RSRC1, CSRC1 ) 1174 IF( MYROW.EQ.RSRC1.AND.MYCOL.EQ.CSRC1 ) 1175 $ THEN 1176 LCOLS = MIN( NB, N-INDX+1 ) 1177 CALL DGEMM( 'Transpose', 1178 $ 'No Transpose', NWIN, LCOLS, NWIN, 1179 $ ONE, WORK( PDW ), NWIN, 1180 $ T((JLOC-1)*LLDT+ILOC), LLDT, ZERO, 1181 $ WORK(IPW3), NWIN ) 1182 CALL DLAMOV( 'All', NWIN, LCOLS, 1183 $ WORK(IPW3), NWIN, 1184 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 1185 END IF 1186 230 CONTINUE 1187 END IF 1188 END IF 1189 ELSE 1190* 1191* The NWIN-by-NWIN matrix U containing the 1192* accumulated orthogonal transformations has the 1193* following structure: 1194* 1195* [ U11 U12 ] 1196* U = [ ], 1197* [ U21 U22 ] 1198* 1199* where U21 is KS-by-KS upper triangular and U12 is 1200* (NWIN-KS)-by-(NWIN-KS) lower triangular. 1201* 1202* Update the columns of T and Q affected by the 1203* reordering. 1204* 1205* Compute T2*U21 + T1*U11 in workspace. 1206* 1207 IF( DIR.EQ.2 ) THEN 1208 DO 240 INDX = 1, I-1, NB 1209 CALL INFOG2L( INDX, I, DESCT, NPROW, NPCOL, 1210 $ MYROW, MYCOL, ILOC, JLOC, RSRC1, CSRC1 ) 1211 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) 1212 $ THEN 1213 JLOC1 = INDXG2L( I+NWIN-KS, NB, MYCOL, 1214 $ DESCT( CSRC_ ), NPCOL ) 1215 LROWS = MIN(NB,I-INDX) 1216 CALL DLAMOV( 'All', LROWS, KS, 1217 $ T((JLOC1-1)*LLDT+ILOC ), LLDT, 1218 $ WORK(IPW3), LROWS ) 1219 CALL DTRMM( 'Right', 'Upper', 1220 $ 'No transpose', 1221 $ 'Non-unit', LROWS, KS, ONE, 1222 $ WORK( PDW+NWIN-KS ), NWIN, 1223 $ WORK(IPW3), LROWS ) 1224 CALL DGEMM( 'No transpose', 1225 $ 'No transpose', LROWS, KS, NWIN-KS, 1226 $ ONE, T((JLOC-1)*LLDT+ILOC), LLDT, 1227 $ WORK( PDW ), NWIN, ONE, WORK(IPW3), 1228 $ LROWS ) 1229* 1230* Compute T1*U12 + T2*U22 in workspace. 1231* 1232 CALL DLAMOV( 'All', LROWS, NWIN-KS, 1233 $ T((JLOC-1)*LLDT+ILOC), LLDT, 1234 $ WORK( IPW3+KS*LROWS ), LROWS ) 1235 CALL DTRMM( 'Right', 'Lower', 1236 $ 'No transpose', 'Non-unit', 1237 $ LROWS, NWIN-KS, ONE, 1238 $ WORK( PDW+NWIN*KS ), NWIN, 1239 $ WORK( IPW3+KS*LROWS ), LROWS ) 1240 CALL DGEMM( 'No transpose', 1241 $ 'No transpose', LROWS, NWIN-KS, KS, 1242 $ ONE, T((JLOC1-1)*LLDT+ILOC), LLDT, 1243 $ WORK( PDW+NWIN*KS+NWIN-KS ), NWIN, 1244 $ ONE, WORK( IPW3+KS*LROWS ), LROWS ) 1245* 1246* Copy workspace to T. 1247* 1248 CALL DLAMOV( 'All', LROWS, NWIN, 1249 $ WORK(IPW3), LROWS, 1250 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 1251 END IF 1252 240 CONTINUE 1253 IF( WANTQ ) THEN 1254* 1255* Compute Q2*U21 + Q1*U11 in workspace. 1256* 1257 DO 250 INDX = 1, N, NB 1258 CALL INFOG2L( INDX, I, DESCQ, NPROW, 1259 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 1260 $ RSRC1, CSRC1 ) 1261 IF( MYROW.EQ.RSRC1.AND.MYCOL.EQ.CSRC1 ) 1262 $ THEN 1263 JLOC1 = INDXG2L( I+NWIN-KS, NB, 1264 $ MYCOL, DESCQ( CSRC_ ), NPCOL ) 1265 LROWS = MIN(NB,N-INDX+1) 1266 CALL DLAMOV( 'All', LROWS, KS, 1267 $ Q((JLOC1-1)*LLDQ+ILOC ), LLDQ, 1268 $ WORK(IPW3), LROWS ) 1269 CALL DTRMM( 'Right', 'Upper', 1270 $ 'No transpose', 'Non-unit', 1271 $ LROWS, KS, ONE, 1272 $ WORK( PDW+NWIN-KS ), NWIN, 1273 $ WORK(IPW3), LROWS ) 1274 CALL DGEMM( 'No transpose', 1275 $ 'No transpose', LROWS, KS, 1276 $ NWIN-KS, ONE, 1277 $ Q((JLOC-1)*LLDQ+ILOC), LLDQ, 1278 $ WORK( PDW ), NWIN, ONE, 1279 $ WORK(IPW3), LROWS ) 1280* 1281* Compute Q1*U12 + Q2*U22 in workspace. 1282* 1283 CALL DLAMOV( 'All', LROWS, NWIN-KS, 1284 $ Q((JLOC-1)*LLDQ+ILOC), LLDQ, 1285 $ WORK( IPW3+KS*LROWS ), LROWS) 1286 CALL DTRMM( 'Right', 'Lower', 1287 $ 'No transpose', 'Non-unit', 1288 $ LROWS, NWIN-KS, ONE, 1289 $ WORK( PDW+NWIN*KS ), NWIN, 1290 $ WORK( IPW3+KS*LROWS ), LROWS) 1291 CALL DGEMM( 'No transpose', 1292 $ 'No transpose', LROWS, NWIN-KS, 1293 $ KS, ONE, Q((JLOC1-1)*LLDQ+ILOC), 1294 $ LLDQ, WORK(PDW+NWIN*KS+NWIN-KS), 1295 $ NWIN, ONE, WORK( IPW3+KS*LROWS ), 1296 $ LROWS ) 1297* 1298* Copy workspace to Q. 1299* 1300 CALL DLAMOV( 'All', LROWS, NWIN, 1301 $ WORK(IPW3), LROWS, 1302 $ Q((JLOC-1)*LLDQ+ILOC), LLDQ ) 1303 END IF 1304 250 CONTINUE 1305 END IF 1306 END IF 1307* 1308 IF( DIR.EQ.1 ) THEN 1309 IF ( LIHI.LT.N ) THEN 1310* 1311* Compute U21**T*T2 + U11**T*T1 in workspace. 1312* 1313 IF( MOD(LIHI,NB).GT.0 ) THEN 1314 INDX = LIHI + 1 1315 CALL INFOG2L( I, INDX, DESCT, NPROW, 1316 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 1317 $ RSRC1, CSRC1 ) 1318 IF( MYROW.EQ.RSRC1.AND.MYCOL.EQ.CSRC1 ) 1319 $ THEN 1320 ILOC1 = INDXG2L( I+NWIN-KS, NB, MYROW, 1321 $ DESCT( RSRC_ ), NPROW ) 1322 LCOLS = MOD( MIN( NB-MOD(LIHI,NB), 1323 $ N-LIHI ), NB ) 1324 CALL DLAMOV( 'All', KS, LCOLS, 1325 $ T((JLOC-1)*LLDT+ILOC1), LLDT, 1326 $ WORK(IPW3), NWIN ) 1327 CALL DTRMM( 'Left', 'Upper', 1328 $ 'Transpose', 'Non-unit', KS, 1329 $ LCOLS, ONE, WORK( PDW+NWIN-KS ), 1330 $ NWIN, WORK(IPW3), NWIN ) 1331 CALL DGEMM( 'Transpose', 1332 $ 'No transpose', KS, LCOLS, 1333 $ NWIN-KS, ONE, WORK(PDW), NWIN, 1334 $ T((JLOC-1)*LLDT+ILOC), LLDT, ONE, 1335 $ WORK(IPW3), NWIN ) 1336* 1337* Compute U12**T*T1 + U22**T*T2 in 1338* workspace. 1339* 1340 CALL DLAMOV( 'All', NWIN-KS, LCOLS, 1341 $ T((JLOC-1)*LLDT+ILOC), LLDT, 1342 $ WORK( IPW3+KS ), NWIN ) 1343 CALL DTRMM( 'Left', 'Lower', 1344 $ 'Transpose', 'Non-unit', 1345 $ NWIN-KS, LCOLS, ONE, 1346 $ WORK( PDW+NWIN*KS ), NWIN, 1347 $ WORK( IPW3+KS ), NWIN ) 1348 CALL DGEMM( 'Transpose', 1349 $ 'No Transpose', NWIN-KS, LCOLS, 1350 $ KS, ONE, 1351 $ WORK( PDW+NWIN*KS+NWIN-KS ), 1352 $ NWIN, T((JLOC-1)*LLDT+ILOC1), 1353 $ LLDT, ONE, WORK( IPW3+KS ), 1354 $ NWIN ) 1355* 1356* Copy workspace to T. 1357* 1358 CALL DLAMOV( 'All', NWIN, LCOLS, 1359 $ WORK(IPW3), NWIN, 1360 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 1361 END IF 1362 END IF 1363 INDXS = ICEIL(LIHI,NB)*NB + 1 1364 DO 260 INDX = INDXS, N, NB 1365 CALL INFOG2L( I, INDX, DESCT, NPROW, 1366 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 1367 $ RSRC1, CSRC1 ) 1368 IF( MYROW.EQ.RSRC1.AND.MYCOL.EQ.CSRC1 ) 1369 $ THEN 1370* 1371* Compute U21**T*T2 + U11**T*T1 in 1372* workspace. 1373* 1374 ILOC1 = INDXG2L( I+NWIN-KS, NB, 1375 $ MYROW, DESCT( RSRC_ ), NPROW ) 1376 LCOLS = MIN( NB, N-INDX+1 ) 1377 CALL DLAMOV( 'All', KS, LCOLS, 1378 $ T((JLOC-1)*LLDT+ILOC1), LLDT, 1379 $ WORK(IPW3), NWIN ) 1380 CALL DTRMM( 'Left', 'Upper', 1381 $ 'Transpose', 'Non-unit', KS, 1382 $ LCOLS, ONE, 1383 $ WORK( PDW+NWIN-KS ), NWIN, 1384 $ WORK(IPW3), NWIN ) 1385 CALL DGEMM( 'Transpose', 1386 $ 'No transpose', KS, LCOLS, 1387 $ NWIN-KS, ONE, WORK(PDW), NWIN, 1388 $ T((JLOC-1)*LLDT+ILOC), LLDT, ONE, 1389 $ WORK(IPW3), NWIN ) 1390* 1391* Compute U12**T*T1 + U22**T*T2 in 1392* workspace. 1393* 1394 CALL DLAMOV( 'All', NWIN-KS, LCOLS, 1395 $ T((JLOC-1)*LLDT+ILOC), LLDT, 1396 $ WORK( IPW3+KS ), NWIN ) 1397 CALL DTRMM( 'Left', 'Lower', 1398 $ 'Transpose', 'Non-unit', 1399 $ NWIN-KS, LCOLS, ONE, 1400 $ WORK( PDW+NWIN*KS ), NWIN, 1401 $ WORK( IPW3+KS ), NWIN ) 1402 CALL DGEMM( 'Transpose', 1403 $ 'No Transpose', NWIN-KS, LCOLS, 1404 $ KS, ONE, 1405 $ WORK( PDW+NWIN*KS+NWIN-KS ), 1406 $ NWIN, T((JLOC-1)*LLDT+ILOC1), 1407 $ LLDT, ONE, WORK(IPW3+KS), NWIN ) 1408* 1409* Copy workspace to T. 1410* 1411 CALL DLAMOV( 'All', NWIN, LCOLS, 1412 $ WORK(IPW3), NWIN, 1413 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 1414 END IF 1415 260 CONTINUE 1416 END IF 1417 END IF 1418 END IF 1419 ELSEIF( FLOPS.NE.0 ) THEN 1420* 1421* Update off-diagonal blocks and Q using the pipelined 1422* elementary transformations. 1423* 1424 IF( DIR.EQ.2 ) THEN 1425 DO 270 INDX = 1, I-1, NB 1426 CALL INFOG2L( INDX, I, DESCT, NPROW, NPCOL, 1427 $ MYROW, MYCOL, ILOC, JLOC, RSRC1, CSRC1 ) 1428 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) THEN 1429 LROWS = MIN(NB,I-INDX) 1430 CALL BDLAAPP( 1, LROWS, NWIN, NCB, 1431 $ T((JLOC-1)*LLDT+ILOC ), LLDT, NITRAF, 1432 $ IWORK(IPIW), WORK( IPW2 ), 1433 $ WORK(IPW3) ) 1434 END IF 1435 270 CONTINUE 1436 IF( WANTQ ) THEN 1437 DO 280 INDX = 1, N, NB 1438 CALL INFOG2L( INDX, I, DESCQ, NPROW, NPCOL, 1439 $ MYROW, MYCOL, ILOC, JLOC, RSRC1, CSRC1 ) 1440 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) 1441 $ THEN 1442 LROWS = MIN(NB,N-INDX+1) 1443 CALL BDLAAPP( 1, LROWS, NWIN, NCB, 1444 $ Q((JLOC-1)*LLDQ+ILOC), LLDQ, NITRAF, 1445 $ IWORK(IPIW), WORK( IPW2 ), 1446 $ WORK(IPW3) ) 1447 END IF 1448 280 CONTINUE 1449 END IF 1450 END IF 1451 IF( DIR.EQ.1 ) THEN 1452 IF( LIHI.LT.N ) THEN 1453 IF( MOD(LIHI,NB).GT.0 ) THEN 1454 INDX = LIHI + 1 1455 CALL INFOG2L( I, INDX, DESCT, NPROW, NPCOL, 1456 $ MYROW, MYCOL, ILOC, JLOC, RSRC1, CSRC1 ) 1457 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) 1458 $ THEN 1459 LCOLS = MOD( MIN( NB-MOD(LIHI,NB), 1460 $ N-LIHI ), NB ) 1461 CALL BDLAAPP( 0, NWIN, LCOLS, NCB, 1462 $ T((JLOC-1)*LLDT+ILOC), LLDT, NITRAF, 1463 $ IWORK(IPIW), WORK( IPW2 ), 1464 $ WORK(IPW3) ) 1465 END IF 1466 END IF 1467 INDXS = ICEIL(LIHI,NB)*NB + 1 1468 DO 290 INDX = INDXS, N, NB 1469 CALL INFOG2L( I, INDX, DESCT, NPROW, NPCOL, 1470 $ MYROW, MYCOL, ILOC, JLOC, RSRC1, CSRC1 ) 1471 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) 1472 $ THEN 1473 LCOLS = MIN( NB, N-INDX+1 ) 1474 CALL BDLAAPP( 0, NWIN, LCOLS, NCB, 1475 $ T((JLOC-1)*LLDT+ILOC), LLDT, NITRAF, 1476 $ IWORK(IPIW), WORK( IPW2 ), 1477 $ WORK(IPW3) ) 1478 END IF 1479 290 CONTINUE 1480 END IF 1481 END IF 1482 END IF 1483* 1484* If I was not involved in the updates for the current 1485* window or the window was fully processed, I go here and 1486* try again for the next window. 1487* 1488 295 CONTINUE 1489* 1490* Update LIHI and LIHI depending on the number of 1491* eigenvalues really moved - for on-diagonal processes we 1492* do this update only once since each on-diagonal process 1493* is only involved with one window at one time. The 1494* indicies are updated in three cases: 1495* 1) When some reordering was really performed 1496* -- indicated by BUFFLEN > 0. 1497* 2) When no selected eigenvalues was found in the 1498* current window -- indicated by KS = 0. 1499* 3) When some selected eigenvalues was found in the 1500* current window but no one of them was moved 1501* (KS > 0 and BUFFLEN = 0) 1502* False index updating is avoided by sometimes setting 1503* KS = -1. This will affect processors involved in more 1504* than one window and where the first one ends up with 1505* KS = 0 and for the second one is done already. 1506* 1507 IF( MYROW.EQ.RSRC.AND.MYCOL.EQ.CSRC ) THEN 1508 IF( DIR.EQ.2 ) THEN 1509 IF( BUFFLEN.NE.0 .OR. KS.EQ.0 .OR. 1510 $ ( BUFFLEN.EQ.0 .AND. KS.GT.0 ) ) 1511 $ LIHI = I + KS - 1 1512 IWORK( ILIHI+WINDOW-1 ) = LIHI 1513 IF( .NOT. LIHI.GE.LILO+LSEL ) THEN 1514 LILO = LILO + LSEL 1515 IWORK( ILILO+WINDOW-1 ) = LILO 1516 END IF 1517 END IF 1518 ELSEIF( MYROW.EQ.RSRC .AND. DIR.EQ.1 ) THEN 1519 IF( BUFFLEN.NE.0 .OR. KS.EQ.0 .OR. 1520 $ ( BUFFLEN.EQ.0 .AND. KS.GT.0 ) ) 1521 $ LIHI = I + KS - 1 1522 IWORK( ILIHI+WINDOW-1 ) = LIHI 1523 IF( .NOT. LIHI.GE.LILO+LSEL ) THEN 1524 LILO = LILO + LSEL 1525 IWORK( ILILO+WINDOW-1 ) = LILO 1526 END IF 1527 ELSEIF( MYCOL.EQ.CSRC .AND. DIR.EQ.2 ) THEN 1528 IF( BUFFLEN.NE.0 .OR. KS.EQ.0 .OR. 1529 $ ( BUFFLEN.EQ.0 .AND. KS.GT.0 ) ) 1530 $ LIHI = I + KS - 1 1531 IWORK( ILIHI+WINDOW-1 ) = LIHI 1532 IF( .NOT. LIHI.GE.LILO+LSEL ) THEN 1533 LILO = LILO + LSEL 1534 IWORK( ILILO+WINDOW-1 ) = LILO 1535 END IF 1536 END IF 1537* 1538 112 CONTINUE 1539* 1540* End of direction loop for updates with respect to local 1541* reordering. 1542* 1543 1111 CONTINUE 1544* 1545* Associate each process with one of the corresponding 1546* computational windows such that the test for another round 1547* of local reordering is carried out properly. Since the 1548* column updates were computed after the row updates, it is 1549* sufficient to test for changing the association to the 1550* window in the corresponding process row. 1551* 1552 DO 113 WINDOW = 1, NMWIN2 1553 RSRC = IWORK( IRSRC + WINDOW - 1 ) 1554 IF( MYROW.EQ.RSRC .AND. (.NOT. LIHI.GE.LILO+LSEL ) ) THEN 1555 LILO = IWORK( ILILO + WINDOW - 1 ) 1556 LIHI = IWORK( ILIHI + WINDOW - 1 ) 1557 LSEL = IWORK( ILSEL + WINDOW - 1 ) 1558 END IF 1559 113 CONTINUE 1560* 1561* End While ( LIHI >= LILO + LSEL ) 1562 ROUND = ROUND + 1 1563 IF( FIRST ) FIRST = .FALSE. 1564 GO TO 130 1565 END IF 1566* 1567* All processors excluded from the local reordering go here. 1568* 1569 114 CONTINUE 1570* 1571* Barrier to collect the processes before proceeding. 1572* 1573 CALL BLACS_BARRIER( ICTXT, 'All' ) 1574* 1575* Compute global maximum of IERR so that we know if some process 1576* experienced a failure in the reordering. 1577* 1578 MYIERR = IERR 1579 IF( NPROCS.GT.1 ) 1580 $ CALL IGAMX2D( ICTXT, 'All', TOP, 1, 1, IERR, 1, -1, 1581 $ -1, -1, -1, -1 ) 1582* 1583 IF( IERR.NE.0 ) THEN 1584* 1585* When calling BDTREXC, the block at position I+KKS-1 failed 1586* to swap. 1587* 1588 IF( MYIERR.NE.0 ) INFO = MAX(1,I+KKS-1) 1589 IF( NPROCS.GT.1 ) 1590 $ CALL IGAMX2D( ICTXT, 'All', TOP, 1, 1, INFO, 1, -1, 1591 $ -1, -1, -1, -1 ) 1592 GO TO 300 1593 END IF 1594* 1595* Now, for each compuational window, move the selected 1596* eigenvalues across the process border. Do this by forming the 1597* processors into groups of four working together to bring the 1598* window over the border. The processes are numbered as follows 1599* 1600* 1 | 2 1601* --+-- 1602* 3 | 4 1603* 1604* where '|' and '-' denotes the process (and block) borders. 1605* This implies that the cluster to be reordered over the border 1606* is held by process 4, process 1 will receive the cluster after 1607* the reordering, process 3 holds the local (2,1)th element of a 1608* 2-by-2 diagonal block located on the block border and process 2 1609* holds the closest off-diagonal part of the window that is 1610* affected by the cross-border reordering. 1611* 1612* The active window is now ( I : LIHI[4], I : LIHI[4] ), where 1613* I = MAX( ILO, LIHI - 2*MOD(LIHI,NB) ). If this active window is 1614* too large compared to the value of PARA( 6 ), it will be 1615* truncated in both ends such that a maximum of PARA( 6 ) 1616* eigenvalues is reordered across the border this time. 1617* 1618* The active window will be collected and built in workspace at 1619* process 1 and 4, which both compute the reordering and return 1620* the updated parts to the corresponding processes 2-3. Next, the 1621* accumulated transformations are broadcasted for updates in the 1622* block rows and column that corresponds to the process rows and 1623* columns where process 1 and 4 reside. 1624* 1625* The off-diagonal blocks are updated by the processes receiving 1626* from the broadcasts of the orthogonal transformations. Since 1627* the active window is split over the process borders, the 1628* updates of T and Q requires that stripes of block rows of 1629* columns are exchanged between neighboring processes in the 1630* corresponding process rows and columns. 1631* 1632* First, form each group of processors involved in the 1633* crossborder reordering. Do this in two (or three) phases: 1634* 1) Reorder each odd window over the border. 1635* 2) Reorder each even window over the border. 1636* 3) Reorder the last odd window over the border, if it was not 1637* processed in the first phase. 1638* 1639* When reordering the odd windows over the border, we must make 1640* sure that no process row or column is involved in both the 1641* first and the last window at the same time. This happens when 1642* the total number of windows is odd, greater than one and equal 1643* to the minumum process mesh dimension. Therefore the last odd 1644* window may be reordered over the border at last. 1645* 1646 LASTWAIT = NMWIN2.GT.1 .AND. MOD(NMWIN2,2).EQ.1 .AND. 1647 $ NMWIN2.EQ.MIN(NPROW,NPCOL) 1648* 1649 LAST = 0 1650 308 CONTINUE 1651 IF( LASTWAIT ) THEN 1652 IF( LAST.EQ.0 ) THEN 1653 WIN0S = 1 1654 WIN0E = 2 1655 WINE = NMWIN2 - 1 1656 ELSE 1657 WIN0S = NMWIN2 1658 WIN0E = NMWIN2 1659 WINE = NMWIN2 1660 END IF 1661 ELSE 1662 WIN0S = 1 1663 WIN0E = 2 1664 WINE = NMWIN2 1665 END IF 1666 DO 310 WINDOW0 = WIN0S, WIN0E 1667 DO 320 WINDOW = WINDOW0, WINE, 2 1668* 1669* Define the process holding the down-right part of the 1670* window. 1671* 1672 RSRC4 = IWORK(IRSRC+WINDOW-1) 1673 CSRC4 = IWORK(ICSRC+WINDOW-1) 1674* 1675* Define the other processes in the group of four. 1676* 1677 RSRC3 = RSRC4 1678 CSRC3 = MOD( CSRC4 - 1 + NPCOL, NPCOL ) 1679 RSRC2 = MOD( RSRC4 - 1 + NPROW, NPROW ) 1680 CSRC2 = CSRC4 1681 RSRC1 = RSRC2 1682 CSRC1 = CSRC3 1683 IF( ( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) .OR. 1684 $ ( MYROW.EQ.RSRC2 .AND. MYCOL.EQ.CSRC2 ) .OR. 1685 $ ( MYROW.EQ.RSRC3 .AND. MYCOL.EQ.CSRC3 ) .OR. 1686 $ ( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) ) THEN 1687* 1688* Compute the correct active window - for reordering 1689* into a block that has not been active at all before, 1690* we try to reorder as many of our eigenvalues over the 1691* border as possible without knowing of the situation on 1692* the other side - this may cause very few eigenvalues 1693* to be reordered over the border this time (perhaps not 1694* any) but this should be an initial problem. Anyway, 1695* the bottom-right position of the block will be at 1696* position LIHIC. 1697* 1698 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 1699 LIHI4 = ( IWORK( ILILO + WINDOW - 1 ) + 1700 $ IWORK( ILIHI + WINDOW - 1 ) ) / 2 1701 LIHIC = MIN(LIHI4,(ICEIL(LIHI4,NB)-1)*NB+WNEICR) 1702* 1703* Fix LIHIC to avoid that bottom of window cuts 1704* 2-by-2 block and make sure all processors in the 1705* group knows about the correct value. 1706* 1707 IF( (.NOT. LIHIC.LE.NB) .AND. LIHIC.LT.N ) THEN 1708 ILOC = INDXG2L( LIHIC+1, NB, MYROW, 1709 $ DESCT( RSRC_ ), NPROW ) 1710 JLOC = INDXG2L( LIHIC, NB, MYCOL, 1711 $ DESCT( CSRC_ ), NPCOL ) 1712 IF( T( (JLOC-1)*LLDT+ILOC ).NE.ZERO ) THEN 1713 IF( MOD( LIHIC, NB ).EQ.1 .OR. 1714 $ ( MOD( LIHIC, NB ).EQ.2 .AND. 1715 $ SELECT(LIHIC-2).EQ.0 ) ) 1716 $ THEN 1717 LIHIC = LIHIC + 1 1718 ELSE 1719 LIHIC = LIHIC - 1 1720 END IF 1721 END IF 1722 END IF 1723 IF( RSRC4.NE.RSRC1 .OR. CSRC4.NE.CSRC1 ) 1724 $ CALL IGESD2D( ICTXT, 1, 1, LIHIC, 1, RSRC1, 1725 $ CSRC1 ) 1726 IF( RSRC4.NE.RSRC2 .OR. CSRC4.NE.CSRC2 ) 1727 $ CALL IGESD2D( ICTXT, 1, 1, LIHIC, 1, RSRC2, 1728 $ CSRC2 ) 1729 IF( RSRC4.NE.RSRC3 .OR. CSRC4.NE.CSRC3 ) 1730 $ CALL IGESD2D( ICTXT, 1, 1, LIHIC, 1, RSRC3, 1731 $ CSRC3 ) 1732 END IF 1733 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) THEN 1734 IF( RSRC4.NE.RSRC1 .OR. CSRC4.NE.CSRC1 ) 1735 $ CALL IGERV2D( ICTXT, 1, 1, LIHIC, 1, RSRC4, 1736 $ CSRC4 ) 1737 END IF 1738 IF( MYROW.EQ.RSRC2 .AND. MYCOL.EQ.CSRC2 ) THEN 1739 IF( RSRC4.NE.RSRC2 .OR. CSRC4.NE.CSRC2 ) 1740 $ CALL IGERV2D( ICTXT, 1, 1, LIHIC, 1, RSRC4, 1741 $ CSRC4 ) 1742 END IF 1743 IF( MYROW.EQ.RSRC3 .AND. MYCOL.EQ.CSRC3 ) THEN 1744 IF( RSRC4.NE.RSRC3 .OR. CSRC4.NE.CSRC3 ) 1745 $ CALL IGERV2D( ICTXT, 1, 1, LIHIC, 1, RSRC4, 1746 $ CSRC4 ) 1747 END IF 1748* 1749* Avoid going over the border with the first window if 1750* it resides in the block where the last global position 1751* T(ILO,ILO) is or ILO has been updated to point to a 1752* position right of T(LIHIC,LIHIC). 1753* 1754 SKIP1CR = WINDOW.EQ.1 .AND. 1755 $ ICEIL(LIHIC,NB).LE.ICEIL(ILO,NB) 1756* 1757* Decide I, where to put top of window, such that top of 1758* window does not cut 2-by-2 block. Make sure that we do 1759* not end up in a situation where a 2-by-2 block 1760* splitted on the border is left in its original place 1761* -- this can cause infinite loops. 1762* Remedy: make sure that the part of the window that 1763* resides left to the border is at least of dimension 1764* two (2) in case we have 2-by-2 blocks in top of the 1765* cross border window. 1766* 1767* Also make sure all processors in the group knows about 1768* the correct value of I. When skipping the crossborder 1769* reordering, just set I = LIHIC. 1770* 1771 IF( .NOT. SKIP1CR ) THEN 1772 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) THEN 1773 IF( WINDOW.EQ.1 ) THEN 1774 LIHI1 = ILO 1775 ELSE 1776 LIHI1 = IWORK( ILIHI + WINDOW - 2 ) 1777 END IF 1778 I = MAX( LIHI1, 1779 $ MIN( LIHIC-2*MOD(LIHIC,NB) + 1, 1780 $ (ICEIL(LIHIC,NB)-1)*NB - 1 ) ) 1781 ILOC = INDXG2L( I, NB, MYROW, DESCT( RSRC_ ), 1782 $ NPROW ) 1783 JLOC = INDXG2L( I-1, NB, MYCOL, DESCT( CSRC_ ), 1784 $ NPCOL ) 1785 IF( T( (JLOC-1)*LLDT+ILOC ).NE.ZERO ) 1786 $ I = I - 1 1787 IF( RSRC1.NE.RSRC4 .OR. CSRC1.NE.CSRC4 ) 1788 $ CALL IGESD2D( ICTXT, 1, 1, I, 1, RSRC4, 1789 $ CSRC4 ) 1790 IF( RSRC1.NE.RSRC2 .OR. CSRC1.NE.CSRC2 ) 1791 $ CALL IGESD2D( ICTXT, 1, 1, I, 1, RSRC2, 1792 $ CSRC2 ) 1793 IF( RSRC1.NE.RSRC3 .OR. CSRC1.NE.CSRC3 ) 1794 $ CALL IGESD2D( ICTXT, 1, 1, I, 1, RSRC3, 1795 $ CSRC3 ) 1796 END IF 1797 IF( MYROW.EQ.RSRC2 .AND. MYCOL.EQ.CSRC2 ) THEN 1798 IF( RSRC1.NE.RSRC2 .OR. CSRC1.NE.CSRC2 ) 1799 $ CALL IGERV2D( ICTXT, 1, 1, I, 1, RSRC1, 1800 $ CSRC1 ) 1801 END IF 1802 IF( MYROW.EQ.RSRC3 .AND. MYCOL.EQ.CSRC3 ) THEN 1803 IF( RSRC1.NE.RSRC3 .OR. CSRC1.NE.CSRC3 ) 1804 $ CALL IGERV2D( ICTXT, 1, 1, I, 1, RSRC1, 1805 $ CSRC1 ) 1806 END IF 1807 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 1808 IF( RSRC1.NE.RSRC4 .OR. CSRC1.NE.CSRC4 ) 1809 $ CALL IGERV2D( ICTXT, 1, 1, I, 1, RSRC1, 1810 $ CSRC1 ) 1811 END IF 1812 ELSE 1813 I = LIHIC 1814 END IF 1815* 1816* Finalize computation of window size: active window is 1817* now (I:LIHIC,I:LIHIC). 1818* 1819 NWIN = LIHIC - I + 1 1820 KS = 0 1821* 1822* Skip rest of this part if appropriate. 1823* 1824 IF( SKIP1CR ) GO TO 360 1825* 1826* Divide workspace -- put active window in 1827* WORK(IPW2:IPW2+NWIN**2-1) and orthogonal 1828* transformations in WORK(IPW3:...). 1829* 1830 CALL DLASET( 'All', NWIN, NWIN, ZERO, ZERO, 1831 $ WORK( IPW2 ), NWIN ) 1832* 1833 PITRAF = IPIW 1834 IPW3 = IPW2 + NWIN*NWIN 1835 PDTRAF = IPW3 1836* 1837* Exchange the current view of SELECT for the active 1838* window between process 1 and 4 to make sure that 1839* exactly the same job is performed for both processes. 1840* 1841 IF( RSRC1.NE.RSRC4 .OR. CSRC1.NE.CSRC4 ) THEN 1842 ILEN4 = MOD(LIHIC,NB) 1843 SELI4 = ICEIL(I,NB)*NB+1 1844 ILEN1 = NWIN - ILEN4 1845 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) THEN 1846 CALL IGESD2D( ICTXT, ILEN1, 1, SELECT(I), 1847 $ ILEN1, RSRC4, CSRC4 ) 1848 CALL IGERV2D( ICTXT, ILEN4, 1, SELECT(SELI4), 1849 $ ILEN4, RSRC4, CSRC4 ) 1850 END IF 1851 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 1852 CALL IGESD2D( ICTXT, ILEN4, 1, SELECT(SELI4), 1853 $ ILEN4, RSRC1, CSRC1 ) 1854 CALL IGERV2D( ICTXT, ILEN1, 1, SELECT(I), 1855 $ ILEN1, RSRC1, CSRC1 ) 1856 END IF 1857 END IF 1858* 1859* Form the active window by a series of point-to-point 1860* sends and receives. 1861* 1862 DIM1 = NB - MOD(I-1,NB) 1863 DIM4 = NWIN - DIM1 1864 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) THEN 1865 ILOC = INDXG2L( I, NB, MYROW, DESCT( RSRC_ ), 1866 $ NPROW ) 1867 JLOC = INDXG2L( I, NB, MYCOL, DESCT( CSRC_ ), 1868 $ NPCOL ) 1869 CALL DLAMOV( 'All', DIM1, DIM1, 1870 $ T((JLOC-1)*LLDT+ILOC), LLDT, WORK(IPW2), 1871 $ NWIN ) 1872 IF( RSRC1.NE.RSRC4 .OR. CSRC1.NE.CSRC4 ) THEN 1873 CALL DGESD2D( ICTXT, DIM1, DIM1, 1874 $ WORK(IPW2), NWIN, RSRC4, CSRC4 ) 1875 CALL DGERV2D( ICTXT, DIM4, DIM4, 1876 $ WORK(IPW2+DIM1*NWIN+DIM1), NWIN, RSRC4, 1877 $ CSRC4 ) 1878 END IF 1879 END IF 1880 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 1881 ILOC = INDXG2L( I+DIM1, NB, MYROW, DESCT( RSRC_ ), 1882 $ NPROW ) 1883 JLOC = INDXG2L( I+DIM1, NB, MYCOL, DESCT( CSRC_ ), 1884 $ NPCOL ) 1885 CALL DLAMOV( 'All', DIM4, DIM4, 1886 $ T((JLOC-1)*LLDT+ILOC), LLDT, 1887 $ WORK(IPW2+DIM1*NWIN+DIM1), NWIN ) 1888 IF( RSRC4.NE.RSRC1 .OR. CSRC4.NE.CSRC1 ) THEN 1889 CALL DGESD2D( ICTXT, DIM4, DIM4, 1890 $ WORK(IPW2+DIM1*NWIN+DIM1), NWIN, RSRC1, 1891 $ CSRC1 ) 1892 CALL DGERV2D( ICTXT, DIM1, DIM1, 1893 $ WORK(IPW2), NWIN, RSRC1, CSRC1 ) 1894 END IF 1895 END IF 1896 IF( MYROW.EQ.RSRC2 .AND. MYCOL.EQ.CSRC2 ) THEN 1897 ILOC = INDXG2L( I, NB, MYROW, DESCT( RSRC_ ), 1898 $ NPROW ) 1899 JLOC = INDXG2L( I+DIM1, NB, MYCOL, DESCT( CSRC_ ), 1900 $ NPCOL ) 1901 CALL DLAMOV( 'All', DIM1, DIM4, 1902 $ T((JLOC-1)*LLDT+ILOC), LLDT, 1903 $ WORK(IPW2+DIM1*NWIN), NWIN ) 1904 IF( RSRC2.NE.RSRC1 .OR. CSRC2.NE.CSRC1 ) THEN 1905 CALL DGESD2D( ICTXT, DIM1, DIM4, 1906 $ WORK(IPW2+DIM1*NWIN), NWIN, RSRC1, CSRC1 ) 1907 END IF 1908 END IF 1909 IF( MYROW.EQ.RSRC2 .AND. MYCOL.EQ.CSRC2 ) THEN 1910 IF( RSRC2.NE.RSRC4 .OR. CSRC2.NE.CSRC4 ) THEN 1911 CALL DGESD2D( ICTXT, DIM1, DIM4, 1912 $ WORK(IPW2+DIM1*NWIN), NWIN, RSRC4, CSRC4 ) 1913 END IF 1914 END IF 1915 IF( MYROW.EQ.RSRC3 .AND. MYCOL.EQ.CSRC3 ) THEN 1916 ILOC = INDXG2L( I+DIM1, NB, MYROW, DESCT( RSRC_ ), 1917 $ NPROW ) 1918 JLOC = INDXG2L( I+DIM1-1, NB, MYCOL, 1919 $ DESCT( CSRC_ ), NPCOL ) 1920 CALL DLAMOV( 'All', 1, 1, 1921 $ T((JLOC-1)*LLDT+ILOC), LLDT, 1922 $ WORK(IPW2+(DIM1-1)*NWIN+DIM1), NWIN ) 1923 IF( RSRC3.NE.RSRC1 .OR. CSRC3.NE.CSRC1 ) THEN 1924 CALL DGESD2D( ICTXT, 1, 1, 1925 $ WORK(IPW2+(DIM1-1)*NWIN+DIM1), NWIN, 1926 $ RSRC1, CSRC1 ) 1927 END IF 1928 END IF 1929 IF( MYROW.EQ.RSRC3 .AND. MYCOL.EQ.CSRC3 ) THEN 1930 IF( RSRC3.NE.RSRC4 .OR. CSRC3.NE.CSRC4 ) THEN 1931 CALL DGESD2D( ICTXT, 1, 1, 1932 $ WORK(IPW2+(DIM1-1)*NWIN+DIM1), NWIN, 1933 $ RSRC4, CSRC4 ) 1934 END IF 1935 END IF 1936 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) THEN 1937 IF( RSRC1.NE.RSRC2 .OR. CSRC1.NE.CSRC2 ) THEN 1938 CALL DGERV2D( ICTXT, DIM1, DIM4, 1939 $ WORK(IPW2+DIM1*NWIN), NWIN, RSRC2, 1940 $ CSRC2 ) 1941 END IF 1942 IF( RSRC1.NE.RSRC3 .OR. CSRC1.NE.CSRC3 ) THEN 1943 CALL DGERV2D( ICTXT, 1, 1, 1944 $ WORK(IPW2+(DIM1-1)*NWIN+DIM1), NWIN, 1945 $ RSRC3, CSRC3 ) 1946 END IF 1947 END IF 1948 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 1949 IF( RSRC4.NE.RSRC2 .OR. CSRC4.NE.CSRC2 ) THEN 1950 CALL DGERV2D( ICTXT, DIM1, DIM4, 1951 $ WORK(IPW2+DIM1*NWIN), NWIN, RSRC2, 1952 $ CSRC2 ) 1953 END IF 1954 IF( RSRC4.NE.RSRC3 .OR. CSRC4.NE.CSRC3 ) THEN 1955 CALL DGERV2D( ICTXT, 1, 1, 1956 $ WORK(IPW2+(DIM1-1)*NWIN+DIM1), NWIN, 1957 $ RSRC3, CSRC3 ) 1958 END IF 1959 END IF 1960* 1961* Compute the reordering (just as in the total local 1962* case) and accumulate the transformations (ONLY 1963* ON-DIAGONAL PROCESSES). 1964* 1965 IF( ( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) .OR. 1966 $ ( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) ) THEN 1967 PAIR = .FALSE. 1968 DO 330 K = I, LIHIC 1969 IF( PAIR ) THEN 1970 PAIR = .FALSE. 1971 ELSE 1972 SWAP = SELECT( K ).NE.0 1973 IF( K.LT.LIHIC ) THEN 1974 ELEM = WORK(IPW2+(K-I)*NWIN+K-I+1) 1975 IF( ELEM.NE.ZERO ) 1976 $ PAIR = .TRUE. 1977 END IF 1978 IF( SWAP ) THEN 1979 KS = KS + 1 1980* 1981* Swap the K-th block to position I+KS-1. 1982* 1983 IERR = 0 1984 KK = K - I + 1 1985 KKS = KS 1986 IF( KK.NE.KS ) THEN 1987 NITRAF = LIWORK - PITRAF + 1 1988 NDTRAF = LWORK - PDTRAF + 1 1989 CALL BDTREXC( NWIN, WORK(IPW2), NWIN, 1990 $ KK, KKS, NITRAF, IWORK( PITRAF ), 1991 $ NDTRAF, WORK( PDTRAF ), 1992 $ WORK(IPW1), IERR ) 1993 PITRAF = PITRAF + NITRAF 1994 PDTRAF = PDTRAF + NDTRAF 1995* 1996* Update array SELECT. 1997* 1998 IF ( PAIR ) THEN 1999 DO 340 J = I+KK-1, I+KKS, -1 2000 SELECT(J+1) = SELECT(J-1) 2001 340 CONTINUE 2002 SELECT(I+KKS-1) = 1 2003 SELECT(I+KKS) = 1 2004 ELSE 2005 DO 350 J = I+KK-1, I+KKS, -1 2006 SELECT(J) = SELECT(J-1) 2007 350 CONTINUE 2008 SELECT(I+KKS-1) = 1 2009 END IF 2010* 2011 IF ( IERR.EQ.1 .OR. IERR.EQ.2 ) THEN 2012* 2013 IF ( IERR.EQ.2 ) THEN 2014 SELECT( I+KKS-3 ) = 1 2015 SELECT( I+KKS-1 ) = 0 2016 KKS = KKS + 1 2017 END IF 2018* 2019 GO TO 360 2020 END IF 2021 KS = KKS 2022 END IF 2023 IF( PAIR ) 2024 $ KS = KS + 1 2025 END IF 2026 END IF 2027 330 CONTINUE 2028 END IF 2029 360 CONTINUE 2030* 2031* Save information about the reordering. 2032* 2033 IF( ( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) .OR. 2034 $ ( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) ) THEN 2035 IBUFF( 1 ) = I 2036 IBUFF( 2 ) = NWIN 2037 IBUFF( 3 ) = PITRAF 2038 IBUFF( 4 ) = KS 2039 IBUFF( 5 ) = PDTRAF 2040 IBUFF( 6 ) = NDTRAF 2041 ILEN = PITRAF - IPIW + 1 2042 DLEN = PDTRAF - IPW3 + 1 2043 IBUFF( 7 ) = ILEN 2044 IBUFF( 8 ) = DLEN 2045* 2046* Put reordered data back into global matrix if a 2047* reordering took place. 2048* 2049 IF( .NOT. SKIP1CR ) THEN 2050 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) THEN 2051 ILOC = INDXG2L( I, NB, MYROW, DESCT( RSRC_ ), 2052 $ NPROW ) 2053 JLOC = INDXG2L( I, NB, MYCOL, DESCT( CSRC_ ), 2054 $ NPCOL ) 2055 CALL DLAMOV( 'All', DIM1, DIM1, WORK(IPW2), 2056 $ NWIN, T((JLOC-1)*LLDT+ILOC), LLDT ) 2057 END IF 2058 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 2059 ILOC = INDXG2L( I+DIM1, NB, MYROW, 2060 $ DESCT( RSRC_ ), NPROW ) 2061 JLOC = INDXG2L( I+DIM1, NB, MYCOL, 2062 $ DESCT( CSRC_ ), NPCOL ) 2063 CALL DLAMOV( 'All', DIM4, DIM4, 2064 $ WORK(IPW2+DIM1*NWIN+DIM1), NWIN, 2065 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 2066 END IF 2067 END IF 2068 END IF 2069* 2070* Break if appropriate -- IBUFF(3:8) may now contain 2071* nonsens, but that's no problem. The processors outside 2072* the cross border group only needs to know about I and 2073* NWIN to get a correct value of SKIP1CR (see below) and 2074* to skip the cross border updates if necessary. 2075* 2076 IF( WINDOW.EQ.1 .AND. SKIP1CR ) GO TO 325 2077* 2078* Return reordered data to process 2 and 3. 2079* 2080 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) THEN 2081 IF( RSRC1.NE.RSRC3 .OR. CSRC1.NE.CSRC3 ) THEN 2082 CALL DGESD2D( ICTXT, 1, 1, 2083 $ WORK( IPW2+(DIM1-1)*NWIN+DIM1 ), NWIN, 2084 $ RSRC3, CSRC3 ) 2085 END IF 2086 END IF 2087 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 2088 IF( RSRC4.NE.RSRC2 .OR. CSRC4.NE.CSRC2 ) THEN 2089 CALL DGESD2D( ICTXT, DIM1, DIM4, 2090 $ WORK( IPW2+DIM1*NWIN), NWIN, RSRC2, 2091 $ CSRC2 ) 2092 END IF 2093 END IF 2094 IF( MYROW.EQ.RSRC2 .AND. MYCOL.EQ.CSRC2 ) THEN 2095 ILOC = INDXG2L( I, NB, MYROW, DESCT( RSRC_ ), 2096 $ NPROW ) 2097 JLOC = INDXG2L( I+DIM1, NB, MYCOL, 2098 $ DESCT( CSRC_ ), NPCOL ) 2099 IF( RSRC2.NE.RSRC4 .OR. CSRC2.NE.CSRC4 ) THEN 2100 CALL DGERV2D( ICTXT, DIM1, DIM4, 2101 $ WORK(IPW2+DIM1*NWIN), NWIN, RSRC4, CSRC4 ) 2102 END IF 2103 CALL DLAMOV( 'All', DIM1, DIM4, 2104 $ WORK( IPW2+DIM1*NWIN ), NWIN, 2105 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 2106 END IF 2107 IF( MYROW.EQ.RSRC3 .AND. MYCOL.EQ.CSRC3 ) THEN 2108 ILOC = INDXG2L( I+DIM1, NB, MYROW, 2109 $ DESCT( RSRC_ ), NPROW ) 2110 JLOC = INDXG2L( I+DIM1-1, NB, MYCOL, 2111 $ DESCT( CSRC_ ), NPCOL ) 2112 IF( RSRC3.NE.RSRC1 .OR. CSRC3.NE.CSRC1 ) THEN 2113 CALL DGERV2D( ICTXT, 1, 1, 2114 $ WORK( IPW2+(DIM1-1)*NWIN+DIM1 ), NWIN, 2115 $ RSRC1, CSRC1 ) 2116 END IF 2117 T((JLOC-1)*LLDT+ILOC) = 2118 $ WORK( IPW2+(DIM1-1)*NWIN+DIM1 ) 2119 END IF 2120 END IF 2121* 2122 325 CONTINUE 2123* 2124 320 CONTINUE 2125* 2126* For the crossborder updates, we use the same directions as 2127* in the local reordering case above. 2128* 2129 DO 2222 DIR = 1, 2 2130* 2131* Broadcast information about the reordering. 2132* 2133 DO 321 WINDOW = WINDOW0, WINE, 2 2134 RSRC4 = IWORK(IRSRC+WINDOW-1) 2135 CSRC4 = IWORK(ICSRC+WINDOW-1) 2136 RSRC1 = MOD( RSRC4 - 1 + NPROW, NPROW ) 2137 CSRC1 = MOD( CSRC4 - 1 + NPCOL, NPCOL ) 2138 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) THEN 2139 IF( NPCOL.GT.1 .AND. DIR.EQ.1 ) 2140 $ CALL IGEBS2D( ICTXT, 'Row', TOP, 8, 1, 2141 $ IBUFF, 8 ) 2142 IF( NPROW.GT.1 .AND. DIR.EQ.2 ) 2143 $ CALL IGEBS2D( ICTXT, 'Col', TOP, 8, 1, 2144 $ IBUFF, 8 ) 2145 SKIP1CR = WINDOW.EQ.1 .AND. 2146 $ ICEIL(LIHIC,NB).LE.ICEIL(ILO,NB) 2147 ELSEIF( MYROW.EQ.RSRC1 .OR. MYCOL.EQ.CSRC1 ) THEN 2148 IF( NPCOL.GT.1 .AND. DIR.EQ.1 .AND. 2149 $ MYROW.EQ.RSRC1 ) THEN 2150 CALL IGEBR2D( ICTXT, 'Row', TOP, 8, 1, 2151 $ IBUFF, 8, RSRC1, CSRC1 ) 2152 I = IBUFF( 1 ) 2153 NWIN = IBUFF( 2 ) 2154 PITRAF = IBUFF( 3 ) 2155 KS = IBUFF( 4 ) 2156 PDTRAF = IBUFF( 5 ) 2157 NDTRAF = IBUFF( 6 ) 2158 ILEN = IBUFF( 7 ) 2159 DLEN = IBUFF( 8 ) 2160 BUFFLEN = ILEN + DLEN 2161 IPW3 = IPW2 + NWIN*NWIN 2162 DIM1 = NB - MOD(I-1,NB) 2163 DIM4 = NWIN - DIM1 2164 LIHIC = NWIN + I - 1 2165 SKIP1CR = WINDOW.EQ.1 .AND. 2166 $ ICEIL(LIHIC,NB).LE.ICEIL(ILO,NB) 2167 END IF 2168 IF( NPROW.GT.1 .AND. DIR.EQ.2 .AND. 2169 $ MYCOL.EQ.CSRC1 ) THEN 2170 CALL IGEBR2D( ICTXT, 'Col', TOP, 8, 1, 2171 $ IBUFF, 8, RSRC1, CSRC1 ) 2172 I = IBUFF( 1 ) 2173 NWIN = IBUFF( 2 ) 2174 PITRAF = IBUFF( 3 ) 2175 KS = IBUFF( 4 ) 2176 PDTRAF = IBUFF( 5 ) 2177 NDTRAF = IBUFF( 6 ) 2178 ILEN = IBUFF( 7 ) 2179 DLEN = IBUFF( 8 ) 2180 BUFFLEN = ILEN + DLEN 2181 IPW3 = IPW2 + NWIN*NWIN 2182 DIM1 = NB - MOD(I-1,NB) 2183 DIM4 = NWIN - DIM1 2184 LIHIC = NWIN + I - 1 2185 SKIP1CR = WINDOW.EQ.1 .AND. 2186 $ ICEIL(LIHIC,NB).LE.ICEIL(ILO,NB) 2187 END IF 2188 END IF 2189 IF( RSRC1.NE.RSRC4 ) THEN 2190 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 2191 IF( NPCOL.GT.1 .AND. DIR.EQ.1 ) 2192 $ CALL IGEBS2D( ICTXT, 'Row', TOP, 8, 1, 2193 $ IBUFF, 8 ) 2194 SKIP1CR = WINDOW.EQ.1 .AND. 2195 $ ICEIL(LIHIC,NB).LE.ICEIL(ILO,NB) 2196 ELSEIF( MYROW.EQ.RSRC4 ) THEN 2197 IF( NPCOL.GT.1 .AND. DIR.EQ.1 ) THEN 2198 CALL IGEBR2D( ICTXT, 'Row', TOP, 8, 1, 2199 $ IBUFF, 8, RSRC4, CSRC4 ) 2200 I = IBUFF( 1 ) 2201 NWIN = IBUFF( 2 ) 2202 PITRAF = IBUFF( 3 ) 2203 KS = IBUFF( 4 ) 2204 PDTRAF = IBUFF( 5 ) 2205 NDTRAF = IBUFF( 6 ) 2206 ILEN = IBUFF( 7 ) 2207 DLEN = IBUFF( 8 ) 2208 BUFFLEN = ILEN + DLEN 2209 IPW3 = IPW2 + NWIN*NWIN 2210 DIM1 = NB - MOD(I-1,NB) 2211 DIM4 = NWIN - DIM1 2212 LIHIC = NWIN + I - 1 2213 SKIP1CR = WINDOW.EQ.1 .AND. 2214 $ ICEIL(LIHIC,NB).LE.ICEIL(ILO,NB) 2215 END IF 2216 END IF 2217 END IF 2218 IF( CSRC1.NE.CSRC4 ) THEN 2219 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 2220 IF( NPROW.GT.1 .AND. DIR.EQ.2 ) 2221 $ CALL IGEBS2D( ICTXT, 'Col', TOP, 8, 1, 2222 $ IBUFF, 8 ) 2223 SKIP1CR = WINDOW.EQ.1 .AND. 2224 $ ICEIL(LIHIC,NB).LE.ICEIL(ILO,NB) 2225 ELSEIF( MYCOL.EQ.CSRC4 ) THEN 2226 IF( NPROW.GT.1 .AND. DIR.EQ.2 ) THEN 2227 CALL IGEBR2D( ICTXT, 'Col', TOP, 8, 1, 2228 $ IBUFF, 8, RSRC4, CSRC4 ) 2229 I = IBUFF( 1 ) 2230 NWIN = IBUFF( 2 ) 2231 PITRAF = IBUFF( 3 ) 2232 KS = IBUFF( 4 ) 2233 PDTRAF = IBUFF( 5 ) 2234 NDTRAF = IBUFF( 6 ) 2235 ILEN = IBUFF( 7 ) 2236 DLEN = IBUFF( 8 ) 2237 BUFFLEN = ILEN + DLEN 2238 IPW3 = IPW2 + NWIN*NWIN 2239 DIM1 = NB - MOD(I-1,NB) 2240 DIM4 = NWIN - DIM1 2241 LIHIC = NWIN + I - 1 2242 SKIP1CR = WINDOW.EQ.1 .AND. 2243 $ ICEIL(LIHIC,NB).LE.ICEIL(ILO,NB) 2244 END IF 2245 END IF 2246 END IF 2247* 2248* Skip rest of broadcasts and updates if appropriate. 2249* 2250 IF( SKIP1CR ) GO TO 326 2251* 2252* Broadcast the orthogonal transformations. 2253* 2254 IF( MYROW.EQ.RSRC1 .AND. MYCOL.EQ.CSRC1 ) THEN 2255 BUFFER = PDTRAF 2256 BUFFLEN = DLEN + ILEN 2257 IF( (NPROW.GT.1 .AND. DIR.EQ.2) .OR. 2258 $ (NPCOL.GT.1 .AND. DIR.EQ.1) ) THEN 2259 DO 370 INDX = 1, ILEN 2260 WORK( BUFFER+INDX-1 ) = 2261 $ DBLE( IWORK(IPIW+INDX-1) ) 2262 370 CONTINUE 2263 CALL DLAMOV( 'All', DLEN, 1, WORK( IPW3 ), 2264 $ DLEN, WORK(BUFFER+ILEN), DLEN ) 2265 END IF 2266 IF( NPCOL.GT.1 .AND. DIR.EQ.1 ) THEN 2267 CALL DGEBS2D( ICTXT, 'Row', TOP, BUFFLEN, 1, 2268 $ WORK(BUFFER), BUFFLEN ) 2269 END IF 2270 IF( NPROW.GT.1 .AND. DIR.EQ.2 ) THEN 2271 CALL DGEBS2D( ICTXT, 'Col', TOP, BUFFLEN, 1, 2272 $ WORK(BUFFER), BUFFLEN ) 2273 END IF 2274 ELSEIF( MYROW.EQ.RSRC1 .OR. MYCOL.EQ.CSRC1 ) THEN 2275 IF( NPCOL.GT.1 .AND. DIR.EQ.1 .AND. 2276 $ MYROW.EQ.RSRC1 ) THEN 2277 BUFFER = PDTRAF 2278 BUFFLEN = DLEN + ILEN 2279 CALL DGEBR2D( ICTXT, 'Row', TOP, BUFFLEN, 1, 2280 $ WORK(BUFFER), BUFFLEN, RSRC1, CSRC1 ) 2281 END IF 2282 IF( NPROW.GT.1 .AND. DIR.EQ.2 .AND. 2283 $ MYCOL.EQ.CSRC1 ) THEN 2284 BUFFER = PDTRAF 2285 BUFFLEN = DLEN + ILEN 2286 CALL DGEBR2D( ICTXT, 'Col', TOP, BUFFLEN, 1, 2287 $ WORK(BUFFER), BUFFLEN, RSRC1, CSRC1 ) 2288 END IF 2289 IF( (NPCOL.GT.1.AND.DIR.EQ.1.AND.MYROW.EQ.RSRC1) 2290 $ .OR. (NPROW.GT.1.AND.DIR.EQ.2.AND. 2291 $ MYCOL.EQ.CSRC1) ) THEN 2292 DO 380 INDX = 1, ILEN 2293 IWORK(IPIW+INDX-1) = 2294 $ INT( WORK( BUFFER+INDX-1 ) ) 2295 380 CONTINUE 2296 CALL DLAMOV( 'All', DLEN, 1, 2297 $ WORK( BUFFER+ILEN ), DLEN, 2298 $ WORK( IPW3 ), DLEN ) 2299 END IF 2300 END IF 2301 IF( RSRC1.NE.RSRC4 ) THEN 2302 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 2303 BUFFER = PDTRAF 2304 BUFFLEN = DLEN + ILEN 2305 IF( NPCOL.GT.1 .AND. DIR.EQ.1 ) THEN 2306 DO 390 INDX = 1, ILEN 2307 WORK( BUFFER+INDX-1 ) = 2308 $ DBLE( IWORK(IPIW+INDX-1) ) 2309 390 CONTINUE 2310 CALL DLAMOV( 'All', DLEN, 1, WORK( IPW3 ), 2311 $ DLEN, WORK(BUFFER+ILEN), DLEN ) 2312 CALL DGEBS2D( ICTXT, 'Row', TOP, BUFFLEN, 2313 $ 1, WORK(BUFFER), BUFFLEN ) 2314 END IF 2315 ELSEIF( MYROW.EQ.RSRC4 .AND. DIR.EQ.1 .AND. 2316 $ NPCOL.GT.1 ) THEN 2317 BUFFER = PDTRAF 2318 BUFFLEN = DLEN + ILEN 2319 CALL DGEBR2D( ICTXT, 'Row', TOP, BUFFLEN, 2320 $ 1, WORK(BUFFER), BUFFLEN, RSRC4, CSRC4 ) 2321 DO 400 INDX = 1, ILEN 2322 IWORK(IPIW+INDX-1) = 2323 $ INT( WORK( BUFFER+INDX-1 ) ) 2324 400 CONTINUE 2325 CALL DLAMOV( 'All', DLEN, 1, 2326 $ WORK( BUFFER+ILEN ), DLEN, 2327 $ WORK( IPW3 ), DLEN ) 2328 END IF 2329 END IF 2330 IF( CSRC1.NE.CSRC4 ) THEN 2331 IF( MYROW.EQ.RSRC4 .AND. MYCOL.EQ.CSRC4 ) THEN 2332 BUFFER = PDTRAF 2333 BUFFLEN = DLEN + ILEN 2334 IF( NPROW.GT.1 .AND. DIR.EQ.2 ) THEN 2335 DO 395 INDX = 1, ILEN 2336 WORK( BUFFER+INDX-1 ) = 2337 $ DBLE( IWORK(IPIW+INDX-1) ) 2338 395 CONTINUE 2339 CALL DLAMOV( 'All', DLEN, 1, WORK( IPW3 ), 2340 $ DLEN, WORK(BUFFER+ILEN), DLEN ) 2341 CALL DGEBS2D( ICTXT, 'Col', TOP, BUFFLEN, 2342 $ 1, WORK(BUFFER), BUFFLEN ) 2343 END IF 2344 ELSEIF( MYCOL.EQ.CSRC4 .AND. DIR.EQ.2 .AND. 2345 $ NPROW.GT.1 ) THEN 2346 BUFFER = PDTRAF 2347 BUFFLEN = DLEN + ILEN 2348 CALL DGEBR2D( ICTXT, 'Col', TOP, BUFFLEN, 1, 2349 $ WORK(BUFFER), BUFFLEN, RSRC4, CSRC4 ) 2350 DO 402 INDX = 1, ILEN 2351 IWORK(IPIW+INDX-1) = 2352 $ INT( WORK( BUFFER+INDX-1 ) ) 2353 402 CONTINUE 2354 CALL DLAMOV( 'All', DLEN, 1, 2355 $ WORK( BUFFER+ILEN ), DLEN, 2356 $ WORK( IPW3 ), DLEN ) 2357 END IF 2358 END IF 2359* 2360 326 CONTINUE 2361* 2362 321 CONTINUE 2363* 2364* Compute crossborder updates. 2365* 2366 DO 322 WINDOW = WINDOW0, WINE, 2 2367 IF( WINDOW.EQ.1 .AND. SKIP1CR ) GO TO 327 2368 RSRC4 = IWORK(IRSRC+WINDOW-1) 2369 CSRC4 = IWORK(ICSRC+WINDOW-1) 2370 RSRC1 = MOD( RSRC4 - 1 + NPROW, NPROW ) 2371 CSRC1 = MOD( CSRC4 - 1 + NPCOL, NPCOL ) 2372* 2373* Prepare workspaces for updates: 2374* IPW3 holds now the orthogonal transformations 2375* IPW4 holds the explicit orthogonal matrix, if formed 2376* IPW5 holds the crossborder block column of T 2377* IPW6 holds the crossborder block row of T 2378* IPW7 holds the crossborder block column of Q 2379* (if WANTQ=.TRUE.) 2380* IPW8 points to the leftover workspace used as lhs in 2381* matrix multiplications 2382* 2383 IF( ((MYCOL.EQ.CSRC1.OR.MYCOL.EQ.CSRC4).AND.DIR.EQ.2) 2384 $ .OR. ((MYROW.EQ.RSRC1.OR.MYROW.EQ.RSRC4).AND. 2385 $ DIR.EQ.1)) THEN 2386 IPW4 = BUFFER 2387 IF( DIR.EQ.2 ) THEN 2388 IF( WANTQ ) THEN 2389 QROWS = NUMROC( N, NB, MYROW, DESCQ( RSRC_ ), 2390 $ NPROW ) 2391 ELSE 2392 QROWS = 0 2393 END IF 2394 TROWS = NUMROC( I-1, NB, MYROW, DESCT( RSRC_ ), 2395 $ NPROW ) 2396 ELSE 2397 QROWS = 0 2398 TROWS = 0 2399 END IF 2400 IF( DIR.EQ.1 ) THEN 2401 TCOLS = NUMROC( N - (I+DIM1-1), NB, MYCOL, 2402 $ CSRC4, NPCOL ) 2403 IF( MYCOL.EQ.CSRC4 ) TCOLS = TCOLS - DIM4 2404 ELSE 2405 TCOLS = 0 2406 END IF 2407 IPW5 = IPW4 + NWIN*NWIN 2408 IPW6 = IPW5 + TROWS * NWIN 2409 IF( WANTQ ) THEN 2410 IPW7 = IPW6 + NWIN * TCOLS 2411 IPW8 = IPW7 + QROWS * NWIN 2412 ELSE 2413 IPW8 = IPW6 + NWIN * TCOLS 2414 END IF 2415 END IF 2416* 2417* Let each process row and column involved in the updates 2418* exchange data in T and Q with their neighbours. 2419* 2420 IF( DIR.EQ.2 ) THEN 2421 IF( MYCOL.EQ.CSRC1 .OR. MYCOL.EQ.CSRC4 ) THEN 2422 DO 410 INDX = 1, NPROW 2423 IF( MYCOL.EQ.CSRC1 ) THEN 2424 CALL INFOG2L( 1+(INDX-1)*NB, I, DESCT, 2425 $ NPROW, NPCOL, MYROW, MYCOL, ILOC, 2426 $ JLOC1, RSRC, CSRC1 ) 2427 IF( MYROW.EQ.RSRC ) THEN 2428 CALL DLAMOV( 'All', TROWS, DIM1, 2429 $ T((JLOC1-1)*LLDT+ILOC), LLDT, 2430 $ WORK(IPW5), TROWS ) 2431 IF( NPCOL.GT.1 ) THEN 2432 EAST = MOD( MYCOL + 1, NPCOL ) 2433 CALL DGESD2D( ICTXT, TROWS, DIM1, 2434 $ WORK(IPW5), TROWS, RSRC, 2435 $ EAST ) 2436 CALL DGERV2D( ICTXT, TROWS, DIM4, 2437 $ WORK(IPW5+TROWS*DIM1), TROWS, 2438 $ RSRC, EAST ) 2439 END IF 2440 END IF 2441 END IF 2442 IF( MYCOL.EQ.CSRC4 ) THEN 2443 CALL INFOG2L( 1+(INDX-1)*NB, I+DIM1, 2444 $ DESCT, NPROW, NPCOL, MYROW, MYCOL, 2445 $ ILOC, JLOC4, RSRC, CSRC4 ) 2446 IF( MYROW.EQ.RSRC ) THEN 2447 CALL DLAMOV( 'All', TROWS, DIM4, 2448 $ T((JLOC4-1)*LLDT+ILOC), LLDT, 2449 $ WORK(IPW5+TROWS*DIM1), TROWS ) 2450 IF( NPCOL.GT.1 ) THEN 2451 WEST = MOD( MYCOL-1+NPCOL, NPCOL ) 2452 CALL DGESD2D( ICTXT, TROWS, DIM4, 2453 $ WORK(IPW5+TROWS*DIM1), TROWS, 2454 $ RSRC, WEST ) 2455 CALL DGERV2D( ICTXT, TROWS, DIM1, 2456 $ WORK(IPW5), TROWS, RSRC, 2457 $ WEST ) 2458 END IF 2459 END IF 2460 END IF 2461 410 CONTINUE 2462 END IF 2463 END IF 2464* 2465 IF( DIR.EQ.1 ) THEN 2466 IF( MYROW.EQ.RSRC1 .OR. MYROW.EQ.RSRC4 ) THEN 2467 DO 420 INDX = 1, NPCOL 2468 IF( MYROW.EQ.RSRC1 ) THEN 2469 IF( INDX.EQ.1 ) THEN 2470 CALL INFOG2L( I, LIHIC+1, DESCT, NPROW, 2471 $ NPCOL, MYROW, MYCOL, ILOC1, JLOC, 2472 $ RSRC1, CSRC ) 2473 ELSE 2474 CALL INFOG2L( I, 2475 $ (ICEIL(LIHIC,NB)+(INDX-2))*NB+1, 2476 $ DESCT, NPROW, NPCOL, MYROW, MYCOL, 2477 $ ILOC1, JLOC, RSRC1, CSRC ) 2478 END IF 2479 IF( MYCOL.EQ.CSRC ) THEN 2480 CALL DLAMOV( 'All', DIM1, TCOLS, 2481 $ T((JLOC-1)*LLDT+ILOC1), LLDT, 2482 $ WORK(IPW6), NWIN ) 2483 IF( NPROW.GT.1 ) THEN 2484 SOUTH = MOD( MYROW + 1, NPROW ) 2485 CALL DGESD2D( ICTXT, DIM1, TCOLS, 2486 $ WORK(IPW6), NWIN, SOUTH, 2487 $ CSRC ) 2488 CALL DGERV2D( ICTXT, DIM4, TCOLS, 2489 $ WORK(IPW6+DIM1), NWIN, SOUTH, 2490 $ CSRC ) 2491 END IF 2492 END IF 2493 END IF 2494 IF( MYROW.EQ.RSRC4 ) THEN 2495 IF( INDX.EQ.1 ) THEN 2496 CALL INFOG2L( I+DIM1, LIHIC+1, DESCT, 2497 $ NPROW, NPCOL, MYROW, MYCOL, ILOC4, 2498 $ JLOC, RSRC4, CSRC ) 2499 ELSE 2500 CALL INFOG2L( I+DIM1, 2501 $ (ICEIL(LIHIC,NB)+(INDX-2))*NB+1, 2502 $ DESCT, NPROW, NPCOL, MYROW, MYCOL, 2503 $ ILOC4, JLOC, RSRC4, CSRC ) 2504 END IF 2505 IF( MYCOL.EQ.CSRC ) THEN 2506 CALL DLAMOV( 'All', DIM4, TCOLS, 2507 $ T((JLOC-1)*LLDT+ILOC4), LLDT, 2508 $ WORK(IPW6+DIM1), NWIN ) 2509 IF( NPROW.GT.1 ) THEN 2510 NORTH = MOD( MYROW-1+NPROW, NPROW ) 2511 CALL DGESD2D( ICTXT, DIM4, TCOLS, 2512 $ WORK(IPW6+DIM1), NWIN, NORTH, 2513 $ CSRC ) 2514 CALL DGERV2D( ICTXT, DIM1, TCOLS, 2515 $ WORK(IPW6), NWIN, NORTH, 2516 $ CSRC ) 2517 END IF 2518 END IF 2519 END IF 2520 420 CONTINUE 2521 END IF 2522 END IF 2523* 2524 IF( DIR.EQ.2 ) THEN 2525 IF( WANTQ ) THEN 2526 IF( MYCOL.EQ.CSRC1 .OR. MYCOL.EQ.CSRC4 ) THEN 2527 DO 430 INDX = 1, NPROW 2528 IF( MYCOL.EQ.CSRC1 ) THEN 2529 CALL INFOG2L( 1+(INDX-1)*NB, I, DESCQ, 2530 $ NPROW, NPCOL, MYROW, MYCOL, ILOC, 2531 $ JLOC1, RSRC, CSRC1 ) 2532 IF( MYROW.EQ.RSRC ) THEN 2533 CALL DLAMOV( 'All', QROWS, DIM1, 2534 $ Q((JLOC1-1)*LLDQ+ILOC), LLDQ, 2535 $ WORK(IPW7), QROWS ) 2536 IF( NPCOL.GT.1 ) THEN 2537 EAST = MOD( MYCOL + 1, NPCOL ) 2538 CALL DGESD2D( ICTXT, QROWS, DIM1, 2539 $ WORK(IPW7), QROWS, RSRC, 2540 $ EAST ) 2541 CALL DGERV2D( ICTXT, QROWS, DIM4, 2542 $ WORK(IPW7+QROWS*DIM1), 2543 $ QROWS, RSRC, EAST ) 2544 END IF 2545 END IF 2546 END IF 2547 IF( MYCOL.EQ.CSRC4 ) THEN 2548 CALL INFOG2L( 1+(INDX-1)*NB, I+DIM1, 2549 $ DESCQ, NPROW, NPCOL, MYROW, MYCOL, 2550 $ ILOC, JLOC4, RSRC, CSRC4 ) 2551 IF( MYROW.EQ.RSRC ) THEN 2552 CALL DLAMOV( 'All', QROWS, DIM4, 2553 $ Q((JLOC4-1)*LLDQ+ILOC), LLDQ, 2554 $ WORK(IPW7+QROWS*DIM1), QROWS ) 2555 IF( NPCOL.GT.1 ) THEN 2556 WEST = MOD( MYCOL-1+NPCOL, 2557 $ NPCOL ) 2558 CALL DGESD2D( ICTXT, QROWS, DIM4, 2559 $ WORK(IPW7+QROWS*DIM1), 2560 $ QROWS, RSRC, WEST ) 2561 CALL DGERV2D( ICTXT, QROWS, DIM1, 2562 $ WORK(IPW7), QROWS, RSRC, 2563 $ WEST ) 2564 END IF 2565 END IF 2566 END IF 2567 430 CONTINUE 2568 END IF 2569 END IF 2570 END IF 2571* 2572 327 CONTINUE 2573* 2574 322 CONTINUE 2575* 2576 DO 323 WINDOW = WINDOW0, WINE, 2 2577 RSRC4 = IWORK(IRSRC+WINDOW-1) 2578 CSRC4 = IWORK(ICSRC+WINDOW-1) 2579 RSRC1 = MOD( RSRC4 - 1 + NPROW, NPROW ) 2580 CSRC1 = MOD( CSRC4 - 1 + NPCOL, NPCOL ) 2581 FLOPS = 0 2582 IF( ((MYCOL.EQ.CSRC1.OR.MYCOL.EQ.CSRC4).AND.DIR.EQ.2) 2583 $ .OR. ((MYROW.EQ.RSRC1.OR.MYROW.EQ.RSRC4).AND. 2584 $ DIR.EQ.1) ) THEN 2585* 2586* Skip this part of the updates if appropriate. 2587* 2588 IF( WINDOW.EQ.1 .AND. SKIP1CR ) GO TO 328 2589* 2590* Count number of operations to decide whether to use 2591* matrix-matrix multiplications for updating 2592* off-diagonal parts or not. 2593* 2594 NITRAF = PITRAF - IPIW 2595 ISHH = .FALSE. 2596 DO 405 K = 1, NITRAF 2597 IF( IWORK( IPIW + K - 1 ).LE.NWIN ) THEN 2598 FLOPS = FLOPS + 6 2599 ELSE 2600 FLOPS = FLOPS + 11 2601 ISHH = .TRUE. 2602 END IF 2603 405 CONTINUE 2604* 2605* Perform updates in parallel. 2606* 2607 IF( FLOPS.NE.0 .AND. 2608 $ ( 2*FLOPS*100 )/( 2*NWIN*NWIN ) .GE. MMULT ) 2609 $ THEN 2610* 2611 CALL DLASET( 'All', NWIN, NWIN, ZERO, ONE, 2612 $ WORK( IPW4 ), NWIN ) 2613 WORK(IPW8) = DBLE(MYROW) 2614 WORK(IPW8+1) = DBLE(MYCOL) 2615 CALL BDLAAPP( 1, NWIN, NWIN, NCB, WORK( IPW4 ), 2616 $ NWIN, NITRAF, IWORK(IPIW), WORK( IPW3 ), 2617 $ WORK(IPW8) ) 2618* 2619* Test if sparsity structure of orthogonal matrix 2620* can be exploited (see below). 2621* 2622 IF( ISHH .OR. DIM1.NE.KS .OR. DIM4.NE.KS ) THEN 2623* 2624* Update the columns of T and Q affected by the 2625* reordering. 2626* 2627 IF( DIR.EQ.2 ) THEN 2628 DO 440 INDX = 1, MIN(I-1,1+(NPROW-1)*NB), 2629 $ NB 2630 IF( MYCOL.EQ.CSRC1 ) THEN 2631 CALL INFOG2L( INDX, I, DESCT, NPROW, 2632 $ NPCOL, MYROW, MYCOL, ILOC, 2633 $ JLOC, RSRC, CSRC1 ) 2634 IF( MYROW.EQ.RSRC ) THEN 2635 CALL DGEMM( 'No transpose', 2636 $ 'No transpose', TROWS, DIM1, 2637 $ NWIN, ONE, WORK( IPW5 ), 2638 $ TROWS, WORK( IPW4 ), NWIN, 2639 $ ZERO, WORK(IPW8), TROWS ) 2640 CALL DLAMOV( 'All', TROWS, DIM1, 2641 $ WORK(IPW8), TROWS, 2642 $ T((JLOC-1)*LLDT+ILOC), 2643 $ LLDT ) 2644 END IF 2645 END IF 2646 IF( MYCOL.EQ.CSRC4 ) THEN 2647 CALL INFOG2L( INDX, I+DIM1, DESCT, 2648 $ NPROW, NPCOL, MYROW, MYCOL, 2649 $ ILOC, JLOC, RSRC, CSRC4 ) 2650 IF( MYROW.EQ.RSRC ) THEN 2651 CALL DGEMM( 'No transpose', 2652 $ 'No transpose', TROWS, DIM4, 2653 $ NWIN, ONE, WORK( IPW5 ), 2654 $ TROWS, 2655 $ WORK( IPW4+NWIN*DIM1 ), 2656 $ NWIN, ZERO, WORK(IPW8), 2657 $ TROWS ) 2658 CALL DLAMOV( 'All', TROWS, DIM4, 2659 $ WORK(IPW8), TROWS, 2660 $ T((JLOC-1)*LLDT+ILOC), 2661 $ LLDT ) 2662 END IF 2663 END IF 2664 440 CONTINUE 2665* 2666 IF( WANTQ ) THEN 2667 DO 450 INDX = 1, MIN(N,1+(NPROW-1)*NB), 2668 $ NB 2669 IF( MYCOL.EQ.CSRC1 ) THEN 2670 CALL INFOG2L( INDX, I, DESCQ, 2671 $ NPROW, NPCOL, MYROW, MYCOL, 2672 $ ILOC, JLOC, RSRC, CSRC1 ) 2673 IF( MYROW.EQ.RSRC ) THEN 2674 CALL DGEMM( 'No transpose', 2675 $ 'No transpose', QROWS, 2676 $ DIM1, NWIN, ONE, 2677 $ WORK( IPW7 ), QROWS, 2678 $ WORK( IPW4 ), NWIN, 2679 $ ZERO, WORK(IPW8), 2680 $ QROWS ) 2681 CALL DLAMOV( 'All', QROWS, 2682 $ DIM1, WORK(IPW8), QROWS, 2683 $ Q((JLOC-1)*LLDQ+ILOC), 2684 $ LLDQ ) 2685 END IF 2686 END IF 2687 IF( MYCOL.EQ.CSRC4 ) THEN 2688 CALL INFOG2L( INDX, I+DIM1, 2689 $ DESCQ, NPROW, NPCOL, MYROW, 2690 $ MYCOL, ILOC, JLOC, RSRC, 2691 $ CSRC4 ) 2692 IF( MYROW.EQ.RSRC ) THEN 2693 CALL DGEMM( 'No transpose', 2694 $ 'No transpose', QROWS, 2695 $ DIM4, NWIN, ONE, 2696 $ WORK( IPW7 ), QROWS, 2697 $ WORK( IPW4+NWIN*DIM1 ), 2698 $ NWIN, ZERO, WORK(IPW8), 2699 $ QROWS ) 2700 CALL DLAMOV( 'All', QROWS, 2701 $ DIM4, WORK(IPW8), QROWS, 2702 $ Q((JLOC-1)*LLDQ+ILOC), 2703 $ LLDQ ) 2704 END IF 2705 END IF 2706 450 CONTINUE 2707 END IF 2708 END IF 2709* 2710* Update the rows of T affected by the 2711* reordering. 2712* 2713 IF( DIR.EQ.1 ) THEN 2714 IF ( LIHIC.LT.N ) THEN 2715 IF( MYROW.EQ.RSRC1.AND.MYCOL.EQ.CSRC4 2716 $ .AND.MOD(LIHIC,NB).NE.0 ) THEN 2717 INDX = LIHIC + 1 2718 CALL INFOG2L( I, INDX, DESCT, NPROW, 2719 $ NPCOL, MYROW, MYCOL, ILOC, 2720 $ JLOC, RSRC1, CSRC4 ) 2721 CALL DGEMM( 'Transpose', 2722 $ 'No Transpose', DIM1, TCOLS, 2723 $ NWIN, ONE, WORK(IPW4), NWIN, 2724 $ WORK( IPW6 ), NWIN, ZERO, 2725 $ WORK(IPW8), DIM1 ) 2726 CALL DLAMOV( 'All', DIM1, TCOLS, 2727 $ WORK(IPW8), DIM1, 2728 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 2729 END IF 2730 IF( MYROW.EQ.RSRC4.AND.MYCOL.EQ.CSRC4 2731 $ .AND.MOD(LIHIC,NB).NE.0 ) THEN 2732 INDX = LIHIC + 1 2733 CALL INFOG2L( I+DIM1, INDX, DESCT, 2734 $ NPROW, NPCOL, MYROW, MYCOL, 2735 $ ILOC, JLOC, RSRC4, CSRC4 ) 2736 CALL DGEMM( 'Transpose', 2737 $ 'No Transpose', DIM4, TCOLS, 2738 $ NWIN, ONE, 2739 $ WORK( IPW4+DIM1*NWIN ), NWIN, 2740 $ WORK( IPW6), NWIN, ZERO, 2741 $ WORK(IPW8), DIM4 ) 2742 CALL DLAMOV( 'All', DIM4, TCOLS, 2743 $ WORK(IPW8), DIM4, 2744 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 2745 END IF 2746 INDXS = ICEIL(LIHIC,NB)*NB + 1 2747 INDXE = MIN(N,INDXS+(NPCOL-2)*NB) 2748 DO 460 INDX = INDXS, INDXE, NB 2749 IF( MYROW.EQ.RSRC1 ) THEN 2750 CALL INFOG2L( I, INDX, DESCT, 2751 $ NPROW, NPCOL, MYROW, MYCOL, 2752 $ ILOC, JLOC, RSRC1, CSRC ) 2753 IF( MYCOL.EQ.CSRC ) THEN 2754 CALL DGEMM( 'Transpose', 2755 $ 'No Transpose', DIM1, 2756 $ TCOLS, NWIN, ONE, 2757 $ WORK( IPW4 ), NWIN, 2758 $ WORK( IPW6 ), NWIN, 2759 $ ZERO, WORK(IPW8), DIM1 ) 2760 CALL DLAMOV( 'All', DIM1, 2761 $ TCOLS, WORK(IPW8), DIM1, 2762 $ T((JLOC-1)*LLDT+ILOC), 2763 $ LLDT ) 2764 END IF 2765 END IF 2766 IF( MYROW.EQ.RSRC4 ) THEN 2767 CALL INFOG2L( I+DIM1, INDX, 2768 $ DESCT, NPROW, NPCOL, MYROW, 2769 $ MYCOL, ILOC, JLOC, RSRC4, 2770 $ CSRC ) 2771 IF( MYCOL.EQ.CSRC ) THEN 2772 CALL DGEMM( 'Transpose', 2773 $ 'No Transpose', DIM4, 2774 $ TCOLS, NWIN, ONE, 2775 $ WORK( IPW4+NWIN*DIM1 ), 2776 $ NWIN, WORK( IPW6 ), 2777 $ NWIN, ZERO, WORK(IPW8), 2778 $ DIM4 ) 2779 CALL DLAMOV( 'All', DIM4, 2780 $ TCOLS, WORK(IPW8), DIM4, 2781 $ T((JLOC-1)*LLDT+ILOC), 2782 $ LLDT ) 2783 END IF 2784 END IF 2785 460 CONTINUE 2786 END IF 2787 END IF 2788 ELSE 2789* 2790* The NWIN-by-NWIN matrix U containing the 2791* accumulated orthogonal transformations has 2792* the following structure: 2793* 2794* [ U11 U12 ] 2795* U = [ ], 2796* [ U21 U22 ] 2797* 2798* where U21 is KS-by-KS upper triangular and 2799* U12 is (NWIN-KS)-by-(NWIN-KS) lower 2800* triangular. For reordering over the border 2801* the structure is only exploited when the 2802* border cuts the columns of U conformally with 2803* the structure itself. This happens exactly 2804* when all eigenvalues in the subcluster was 2805* moved to the other side of the border and 2806* fits perfectly in their new positions, i.e., 2807* the reordering stops when the last eigenvalue 2808* to cross the border is reordered to the 2809* position closest to the border. Tested by 2810* checking is KS = DIM1 = DIM4 (see above). 2811* This should hold quite often. But this branch 2812* is entered only if all involved eigenvalues 2813* are real. 2814* 2815* Update the columns of T and Q affected by the 2816* reordering. 2817* 2818* Compute T2*U21 + T1*U11 on the left side of 2819* the border. 2820* 2821 IF( DIR.EQ.2 ) THEN 2822 INDXE = MIN(I-1,1+(NPROW-1)*NB) 2823 DO 470 INDX = 1, INDXE, NB 2824 IF( MYCOL.EQ.CSRC1 ) THEN 2825 CALL INFOG2L( INDX, I, DESCT, NPROW, 2826 $ NPCOL, MYROW, MYCOL, ILOC, 2827 $ JLOC, RSRC, CSRC1 ) 2828 IF( MYROW.EQ.RSRC ) THEN 2829 CALL DLAMOV( 'All', TROWS, KS, 2830 $ WORK( IPW5+TROWS*DIM4), 2831 $ TROWS, WORK(IPW8), TROWS ) 2832 CALL DTRMM( 'Right', 'Upper', 2833 $ 'No transpose', 2834 $ 'Non-unit', TROWS, KS, 2835 $ ONE, WORK( IPW4+DIM4 ), 2836 $ NWIN, WORK(IPW8), TROWS ) 2837 CALL DGEMM( 'No transpose', 2838 $ 'No transpose', TROWS, KS, 2839 $ DIM4, ONE, WORK( IPW5 ), 2840 $ TROWS, WORK( IPW4 ), NWIN, 2841 $ ONE, WORK(IPW8), TROWS ) 2842 CALL DLAMOV( 'All', TROWS, KS, 2843 $ WORK(IPW8), TROWS, 2844 $ T((JLOC-1)*LLDT+ILOC), 2845 $ LLDT ) 2846 END IF 2847 END IF 2848* 2849* Compute T1*U12 + T2*U22 on the right 2850* side of the border. 2851* 2852 IF( MYCOL.EQ.CSRC4 ) THEN 2853 CALL INFOG2L( INDX, I+DIM1, DESCT, 2854 $ NPROW, NPCOL, MYROW, MYCOL, 2855 $ ILOC, JLOC, RSRC, CSRC4 ) 2856 IF( MYROW.EQ.RSRC ) THEN 2857 CALL DLAMOV( 'All', TROWS, DIM4, 2858 $ WORK(IPW5), TROWS, 2859 $ WORK( IPW8 ), TROWS ) 2860 CALL DTRMM( 'Right', 'Lower', 2861 $ 'No transpose', 2862 $ 'Non-unit', TROWS, DIM4, 2863 $ ONE, WORK( IPW4+NWIN*KS ), 2864 $ NWIN, WORK( IPW8 ), TROWS ) 2865 CALL DGEMM( 'No transpose', 2866 $ 'No transpose', TROWS, DIM4, 2867 $ KS, ONE, 2868 $ WORK( IPW5+TROWS*DIM4), 2869 $ TROWS, 2870 $ WORK( IPW4+NWIN*KS+DIM4 ), 2871 $ NWIN, ONE, WORK( IPW8 ), 2872 $ TROWS ) 2873 CALL DLAMOV( 'All', TROWS, DIM4, 2874 $ WORK(IPW8), TROWS, 2875 $ T((JLOC-1)*LLDT+ILOC), 2876 $ LLDT ) 2877 END IF 2878 END IF 2879 470 CONTINUE 2880 IF( WANTQ ) THEN 2881* 2882* Compute Q2*U21 + Q1*U11 on the left 2883* side of border. 2884* 2885 INDXE = MIN(N,1+(NPROW-1)*NB) 2886 DO 480 INDX = 1, INDXE, NB 2887 IF( MYCOL.EQ.CSRC1 ) THEN 2888 CALL INFOG2L( INDX, I, DESCQ, 2889 $ NPROW, NPCOL, MYROW, MYCOL, 2890 $ ILOC, JLOC, RSRC, CSRC1 ) 2891 IF( MYROW.EQ.RSRC ) THEN 2892 CALL DLAMOV( 'All', QROWS, KS, 2893 $ WORK( IPW7+QROWS*DIM4), 2894 $ QROWS, WORK(IPW8), 2895 $ QROWS ) 2896 CALL DTRMM( 'Right', 'Upper', 2897 $ 'No transpose', 2898 $ 'Non-unit', QROWS, 2899 $ KS, ONE, 2900 $ WORK( IPW4+DIM4 ), NWIN, 2901 $ WORK(IPW8), QROWS ) 2902 CALL DGEMM( 'No transpose', 2903 $ 'No transpose', QROWS, 2904 $ KS, DIM4, ONE, 2905 $ WORK( IPW7 ), QROWS, 2906 $ WORK( IPW4 ), NWIN, ONE, 2907 $ WORK(IPW8), QROWS ) 2908 CALL DLAMOV( 'All', QROWS, KS, 2909 $ WORK(IPW8), QROWS, 2910 $ Q((JLOC-1)*LLDQ+ILOC), 2911 $ LLDQ ) 2912 END IF 2913 END IF 2914* 2915* Compute Q1*U12 + Q2*U22 on the right 2916* side of border. 2917* 2918 IF( MYCOL.EQ.CSRC4 ) THEN 2919 CALL INFOG2L( INDX, I+DIM1, 2920 $ DESCQ, NPROW, NPCOL, MYROW, 2921 $ MYCOL, ILOC, JLOC, RSRC, 2922 $ CSRC4 ) 2923 IF( MYROW.EQ.RSRC ) THEN 2924 CALL DLAMOV( 'All', QROWS, 2925 $ DIM4, WORK(IPW7), QROWS, 2926 $ WORK( IPW8 ), QROWS ) 2927 CALL DTRMM( 'Right', 'Lower', 2928 $ 'No transpose', 2929 $ 'Non-unit', QROWS, 2930 $ DIM4, ONE, 2931 $ WORK( IPW4+NWIN*KS ), 2932 $ NWIN, WORK( IPW8 ), 2933 $ QROWS ) 2934 CALL DGEMM( 'No transpose', 2935 $ 'No transpose', QROWS, 2936 $ DIM4, KS, ONE, 2937 $ WORK(IPW7+QROWS*(DIM4)), 2938 $ QROWS, 2939 $ WORK(IPW4+NWIN*KS+DIM4), 2940 $ NWIN, ONE, WORK( IPW8 ), 2941 $ QROWS ) 2942 CALL DLAMOV( 'All', QROWS, 2943 $ DIM4, WORK(IPW8), QROWS, 2944 $ Q((JLOC-1)*LLDQ+ILOC), 2945 $ LLDQ ) 2946 END IF 2947 END IF 2948 480 CONTINUE 2949 END IF 2950 END IF 2951* 2952 IF( DIR.EQ.1 ) THEN 2953 IF ( LIHIC.LT.N ) THEN 2954* 2955* Compute U21**T*T2 + U11**T*T1 on the 2956* upper side of the border. 2957* 2958 IF( MYROW.EQ.RSRC1.AND.MYCOL.EQ.CSRC4 2959 $ .AND.MOD(LIHIC,NB).NE.0 ) THEN 2960 INDX = LIHIC + 1 2961 CALL INFOG2L( I, INDX, DESCT, NPROW, 2962 $ NPCOL, MYROW, MYCOL, ILOC, 2963 $ JLOC, RSRC1, CSRC4 ) 2964 CALL DLAMOV( 'All', KS, TCOLS, 2965 $ WORK( IPW6+DIM4 ), NWIN, 2966 $ WORK(IPW8), KS ) 2967 CALL DTRMM( 'Left', 'Upper', 2968 $ 'Transpose', 'Non-unit', 2969 $ KS, TCOLS, ONE, 2970 $ WORK( IPW4+DIM4 ), NWIN, 2971 $ WORK(IPW8), KS ) 2972 CALL DGEMM( 'Transpose', 2973 $ 'No transpose', KS, TCOLS, 2974 $ DIM4, ONE, WORK(IPW4), NWIN, 2975 $ WORK(IPW6), NWIN, ONE, 2976 $ WORK(IPW8), KS ) 2977 CALL DLAMOV( 'All', KS, TCOLS, 2978 $ WORK(IPW8), KS, 2979 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 2980 END IF 2981* 2982* Compute U12**T*T1 + U22**T*T2 on the 2983* lower side of the border. 2984* 2985 IF( MYROW.EQ.RSRC4.AND.MYCOL.EQ.CSRC4 2986 $ .AND.MOD(LIHIC,NB).NE.0 ) THEN 2987 INDX = LIHIC + 1 2988 CALL INFOG2L( I+DIM1, INDX, DESCT, 2989 $ NPROW, NPCOL, MYROW, MYCOL, 2990 $ ILOC, JLOC, RSRC4, CSRC4 ) 2991 CALL DLAMOV( 'All', DIM4, TCOLS, 2992 $ WORK( IPW6 ), NWIN, 2993 $ WORK( IPW8 ), DIM4 ) 2994 CALL DTRMM( 'Left', 'Lower', 2995 $ 'Transpose', 'Non-unit', 2996 $ DIM4, TCOLS, ONE, 2997 $ WORK( IPW4+NWIN*KS ), NWIN, 2998 $ WORK( IPW8 ), DIM4 ) 2999 CALL DGEMM( 'Transpose', 3000 $ 'No Transpose', DIM4, TCOLS, 3001 $ KS, ONE, 3002 $ WORK( IPW4+NWIN*KS+DIM4 ), 3003 $ NWIN, WORK( IPW6+DIM1 ), NWIN, 3004 $ ONE, WORK( IPW8), DIM4 ) 3005 CALL DLAMOV( 'All', DIM4, TCOLS, 3006 $ WORK(IPW8), DIM4, 3007 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 3008 END IF 3009* 3010* Compute U21**T*T2 + U11**T*T1 on upper 3011* side on border. 3012* 3013 INDXS = ICEIL(LIHIC,NB)*NB+1 3014 INDXE = MIN(N,INDXS+(NPCOL-2)*NB) 3015 DO 490 INDX = INDXS, INDXE, NB 3016 IF( MYROW.EQ.RSRC1 ) THEN 3017 CALL INFOG2L( I, INDX, DESCT, 3018 $ NPROW, NPCOL, MYROW, MYCOL, 3019 $ ILOC, JLOC, RSRC1, CSRC ) 3020 IF( MYCOL.EQ.CSRC ) THEN 3021 CALL DLAMOV( 'All', KS, TCOLS, 3022 $ WORK( IPW6+DIM4 ), NWIN, 3023 $ WORK(IPW8), KS ) 3024 CALL DTRMM( 'Left', 'Upper', 3025 $ 'Transpose', 3026 $ 'Non-unit', KS, 3027 $ TCOLS, ONE, 3028 $ WORK( IPW4+DIM4 ), NWIN, 3029 $ WORK(IPW8), KS ) 3030 CALL DGEMM( 'Transpose', 3031 $ 'No transpose', KS, 3032 $ TCOLS, DIM4, ONE, 3033 $ WORK(IPW4), NWIN, 3034 $ WORK(IPW6), NWIN, ONE, 3035 $ WORK(IPW8), KS ) 3036 CALL DLAMOV( 'All', KS, TCOLS, 3037 $ WORK(IPW8), KS, 3038 $ T((JLOC-1)*LLDT+ILOC), 3039 $ LLDT ) 3040 END IF 3041 END IF 3042* 3043* Compute U12**T*T1 + U22**T*T2 on 3044* lower side of border. 3045* 3046 IF( MYROW.EQ.RSRC4 ) THEN 3047 CALL INFOG2L( I+DIM1, INDX, 3048 $ DESCT, NPROW, NPCOL, MYROW, 3049 $ MYCOL, ILOC, JLOC, RSRC4, 3050 $ CSRC ) 3051 IF( MYCOL.EQ.CSRC ) THEN 3052 CALL DLAMOV( 'All', DIM4, 3053 $ TCOLS, WORK( IPW6 ), 3054 $ NWIN, WORK( IPW8 ), 3055 $ DIM4 ) 3056 CALL DTRMM( 'Left', 'Lower', 3057 $ 'Transpose', 3058 $ 'Non-unit', DIM4, 3059 $ TCOLS, ONE, 3060 $ WORK( IPW4+NWIN*KS ), 3061 $ NWIN, WORK( IPW8 ), 3062 $ DIM4 ) 3063 CALL DGEMM( 'Transpose', 3064 $ 'No Transpose', DIM4, 3065 $ TCOLS, KS, ONE, 3066 $ WORK(IPW4+NWIN*KS+DIM4), 3067 $ NWIN, WORK( IPW6+DIM1 ), 3068 $ NWIN, ONE, WORK( IPW8), 3069 $ DIM4 ) 3070 CALL DLAMOV( 'All', DIM4, 3071 $ TCOLS, WORK(IPW8), DIM4, 3072 $ T((JLOC-1)*LLDT+ILOC), 3073 $ LLDT ) 3074 END IF 3075 END IF 3076 490 CONTINUE 3077 END IF 3078 END IF 3079 END IF 3080 ELSEIF( FLOPS.NE.0 ) THEN 3081* 3082* Update off-diagonal blocks and Q using the 3083* pipelined elementary transformations. Now we 3084* have a delicate problem: how to do this without 3085* redundant work? For now, we let the processes 3086* involved compute the whole crossborder block 3087* rows and column saving only the part belonging 3088* to the corresponding side of the border. To make 3089* this a realistic alternative, we have modified 3090* the ratio r_flops (see Reference [2] above) to 3091* give more favor to the ordinary matrix 3092* multiplication. 3093* 3094 IF( DIR.EQ.2 ) THEN 3095 INDXE = MIN(I-1,1+(NPROW-1)*NB) 3096 DO 500 INDX = 1, INDXE, NB 3097 CALL INFOG2L( INDX, I, DESCT, NPROW, 3098 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 3099 $ RSRC, CSRC ) 3100 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC ) 3101 $ THEN 3102 CALL BDLAAPP( 1, TROWS, NWIN, NCB, 3103 $ WORK(IPW5), TROWS, NITRAF, 3104 $ IWORK(IPIW), WORK( IPW3 ), 3105 $ WORK(IPW8) ) 3106 CALL DLAMOV( 'All', TROWS, DIM1, 3107 $ WORK(IPW5), TROWS, 3108 $ T((JLOC-1)*LLDT+ILOC ), LLDT ) 3109 END IF 3110 CALL INFOG2L( INDX, I+DIM1, DESCT, NPROW, 3111 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 3112 $ RSRC, CSRC ) 3113 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC ) 3114 $ THEN 3115 IF( NPCOL.GT.1 ) 3116 $ CALL BDLAAPP( 1, TROWS, NWIN, NCB, 3117 $ WORK(IPW5), TROWS, NITRAF, 3118 $ IWORK(IPIW), WORK( IPW3 ), 3119 $ WORK(IPW8) ) 3120 CALL DLAMOV( 'All', TROWS, DIM4, 3121 $ WORK(IPW5+TROWS*DIM1), TROWS, 3122 $ T((JLOC-1)*LLDT+ILOC ), LLDT ) 3123 END IF 3124 500 CONTINUE 3125 IF( WANTQ ) THEN 3126 INDXE = MIN(N,1+(NPROW-1)*NB) 3127 DO 510 INDX = 1, INDXE, NB 3128 CALL INFOG2L( INDX, I, DESCQ, NPROW, 3129 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 3130 $ RSRC, CSRC ) 3131 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC ) 3132 $ THEN 3133 CALL BDLAAPP( 1, QROWS, NWIN, NCB, 3134 $ WORK(IPW7), QROWS, NITRAF, 3135 $ IWORK(IPIW), WORK( IPW3 ), 3136 $ WORK(IPW8) ) 3137 CALL DLAMOV( 'All', QROWS, DIM1, 3138 $ WORK(IPW7), QROWS, 3139 $ Q((JLOC-1)*LLDQ+ILOC ), LLDQ ) 3140 END IF 3141 CALL INFOG2L( INDX, I+DIM1, DESCQ, 3142 $ NPROW, NPCOL, MYROW, MYCOL, ILOC, 3143 $ JLOC, RSRC, CSRC ) 3144 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC ) 3145 $ THEN 3146 IF( NPCOL.GT.1 ) 3147 $ CALL BDLAAPP( 1, QROWS, NWIN, 3148 $ NCB, WORK(IPW7), QROWS, 3149 $ NITRAF, IWORK(IPIW), 3150 $ WORK( IPW3 ), WORK(IPW8) ) 3151 CALL DLAMOV( 'All', QROWS, DIM4, 3152 $ WORK(IPW7+QROWS*DIM1), QROWS, 3153 $ Q((JLOC-1)*LLDQ+ILOC ), LLDQ ) 3154 END IF 3155 510 CONTINUE 3156 END IF 3157 END IF 3158* 3159 IF( DIR.EQ.1 ) THEN 3160 IF( LIHIC.LT.N ) THEN 3161 INDX = LIHIC + 1 3162 CALL INFOG2L( I, INDX, DESCT, NPROW, 3163 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 3164 $ RSRC, CSRC ) 3165 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC.AND. 3166 $ MOD(LIHIC,NB).NE.0 ) THEN 3167 CALL BDLAAPP( 0, NWIN, TCOLS, NCB, 3168 $ WORK( IPW6 ), NWIN, NITRAF, 3169 $ IWORK(IPIW), WORK( IPW3 ), 3170 $ WORK(IPW8) ) 3171 CALL DLAMOV( 'All', DIM1, TCOLS, 3172 $ WORK( IPW6 ), NWIN, 3173 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 3174 END IF 3175 CALL INFOG2L( I+DIM1, INDX, DESCT, NPROW, 3176 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 3177 $ RSRC, CSRC ) 3178 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC.AND. 3179 $ MOD(LIHIC,NB).NE.0 ) THEN 3180 IF( NPROW.GT.1 ) 3181 $ CALL BDLAAPP( 0, NWIN, TCOLS, NCB, 3182 $ WORK( IPW6 ), NWIN, NITRAF, 3183 $ IWORK(IPIW), WORK( IPW3 ), 3184 $ WORK(IPW8) ) 3185 CALL DLAMOV( 'All', DIM4, TCOLS, 3186 $ WORK( IPW6+DIM1 ), NWIN, 3187 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 3188 END IF 3189 INDXS = ICEIL(LIHIC,NB)*NB + 1 3190 INDXE = MIN(N,INDXS+(NPCOL-2)*NB) 3191 DO 520 INDX = INDXS, INDXE, NB 3192 CALL INFOG2L( I, INDX, DESCT, NPROW, 3193 $ NPCOL, MYROW, MYCOL, ILOC, JLOC, 3194 $ RSRC, CSRC ) 3195 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC ) 3196 $ THEN 3197 CALL BDLAAPP( 0, NWIN, TCOLS, NCB, 3198 $ WORK(IPW6), NWIN, NITRAF, 3199 $ IWORK(IPIW), WORK( IPW3 ), 3200 $ WORK(IPW8) ) 3201 CALL DLAMOV( 'All', DIM1, TCOLS, 3202 $ WORK( IPW6 ), NWIN, 3203 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 3204 END IF 3205 CALL INFOG2L( I+DIM1, INDX, DESCT, 3206 $ NPROW, NPCOL, MYROW, MYCOL, ILOC, 3207 $ JLOC, RSRC, CSRC ) 3208 IF( MYROW.EQ.RSRC .AND. MYCOL.EQ.CSRC ) 3209 $ THEN 3210 IF( NPROW.GT.1 ) 3211 $ CALL BDLAAPP( 0, NWIN, TCOLS, 3212 $ NCB, WORK(IPW6), NWIN, NITRAF, 3213 $ IWORK(IPIW), WORK( IPW3 ), 3214 $ WORK(IPW8) ) 3215 CALL DLAMOV( 'All', DIM4, TCOLS, 3216 $ WORK( IPW6+DIM1 ), NWIN, 3217 $ T((JLOC-1)*LLDT+ILOC), LLDT ) 3218 END IF 3219 520 CONTINUE 3220 END IF 3221 END IF 3222 END IF 3223 END IF 3224* 3225 328 CONTINUE 3226* 3227 323 CONTINUE 3228* 3229* End of loops over directions (DIR). 3230* 3231 2222 CONTINUE 3232* 3233* End of loops over diagonal blocks for reordering over the 3234* block diagonal. 3235* 3236 310 CONTINUE 3237 LAST = LAST + 1 3238 IF( LASTWAIT .AND. LAST.LT.2 ) GO TO 308 3239* 3240* Barrier to collect the processes before proceeding. 3241* 3242 CALL BLACS_BARRIER( ICTXT, 'All' ) 3243* 3244* Compute global maximum of IERR so that we know if some process 3245* experienced a failure in the reordering. 3246* 3247 MYIERR = IERR 3248 IF( NPROCS.GT.1 ) 3249 $ CALL IGAMX2D( ICTXT, 'All', TOP, 1, 1, IERR, 1, -1, 3250 $ -1, -1, -1, -1 ) 3251* 3252 IF( IERR.NE.0 ) THEN 3253* 3254* When calling BDTREXC, the block at position I+KKS-1 failed 3255* to swap. 3256* 3257 IF( MYIERR.NE.0 ) INFO = MAX(1,I+KKS-1) 3258 IF( NPROCS.GT.1 ) 3259 $ CALL IGAMX2D( ICTXT, 'All', TOP, 1, 1, INFO, 1, -1, 3260 $ -1, -1, -1, -1 ) 3261 GO TO 300 3262 END IF 3263* 3264* Do a global update of the SELECT vector. 3265* 3266 DO 530 K = 1, N 3267 RSRC = INDXG2P( K, NB, MYROW, DESCT( RSRC_ ), NPROW ) 3268 CSRC = INDXG2P( K, NB, MYCOL, DESCT( CSRC_ ), NPCOL ) 3269 IF( MYROW.NE.RSRC .OR. MYCOL.NE.CSRC ) 3270 $ SELECT( K ) = 0 3271 530 CONTINUE 3272 IF( NPROCS.GT.1 ) 3273 $ CALL IGSUM2D( ICTXT, 'All', TOP, N, 1, SELECT, N, -1, -1 ) 3274* 3275* Find the global minumum of ILO and IHI. 3276* 3277 ILO = ILO - 1 3278 523 CONTINUE 3279 ILO = ILO + 1 3280 IF( ILO.LE.N ) THEN 3281 IF( SELECT(ILO).NE.0 ) GO TO 523 3282 END IF 3283 IHI = IHI + 1 3284 527 CONTINUE 3285 IHI = IHI - 1 3286 IF( IHI.GE.1 ) THEN 3287 IF( SELECT(IHI).EQ.0 ) GO TO 527 3288 END IF 3289* 3290* End While ( ILO <= M ) 3291 GO TO 50 3292 END IF 3293* 3294 300 CONTINUE 3295* 3296* In case an error occured, do an additional global update of 3297* SELECT. 3298* 3299 IF( INFO.NE.0 ) THEN 3300 DO 540 K = 1, N 3301 RSRC = INDXG2P( K, NB, MYROW, DESCT( RSRC_ ), NPROW ) 3302 CSRC = INDXG2P( K, NB, MYCOL, DESCT( CSRC_ ), NPCOL ) 3303 IF( MYROW.NE.RSRC .OR. MYCOL.NE.CSRC ) 3304 $ SELECT( K ) = 0 3305 540 CONTINUE 3306 IF( NPROCS.GT.1 ) 3307 $ CALL IGSUM2D( ICTXT, 'All', TOP, N, 1, SELECT, N, -1, -1 ) 3308 END IF 3309* 3310 545 CONTINUE 3311* 3312* Store the output eigenvalues in WR and WI: first let all the 3313* processes compute the eigenvalue inside their diagonal blocks in 3314* parallel, except for the eigenvalue located next to a block 3315* border. After that, compute all eigenvalues located next to the 3316* block borders. Finally, do a global summation over WR and WI so 3317* that all processors receive the result. Notice: real eigenvalues 3318* extracted from a non-canonical 2-by-2 block are not stored in 3319* any particular order. 3320* 3321 DO 550 K = 1, N 3322 WR( K ) = ZERO 3323 WI( K ) = ZERO 3324 550 CONTINUE 3325* 3326* Loop 560: extract eigenvalues from the blocks which are not laid 3327* out across a border of the processor mesh, except for those 1x1 3328* blocks on the border. 3329* 3330 PAIR = .FALSE. 3331 DO 560 K = 1, N 3332 IF( .NOT. PAIR ) THEN 3333 BORDER = ( K.NE.N .AND. MOD( K, NB ).EQ.0 ) .OR. 3334 % ( K.NE.1 .AND. MOD( K, NB ).EQ.1 ) 3335 IF( .NOT. BORDER ) THEN 3336 CALL INFOG2L( K, K, DESCT, NPROW, NPCOL, MYROW, MYCOL, 3337 $ ILOC1, JLOC1, TRSRC1, TCSRC1 ) 3338 IF( MYROW.EQ.TRSRC1 .AND. MYCOL.EQ.TCSRC1 ) THEN 3339 ELEM1 = T((JLOC1-1)*LLDT+ILOC1) 3340 IF( K.LT.N ) THEN 3341 ELEM3 = T((JLOC1-1)*LLDT+ILOC1+1) 3342 ELSE 3343 ELEM3 = ZERO 3344 END IF 3345 IF( ELEM3.NE.ZERO ) THEN 3346 ELEM2 = T((JLOC1)*LLDT+ILOC1) 3347 ELEM4 = T((JLOC1)*LLDT+ILOC1+1) 3348 CALL DLANV2( ELEM1, ELEM2, ELEM3, ELEM4, 3349 $ WR( K ), WI( K ), WR( K+1 ), WI( K+1 ), SN, 3350 $ CS ) 3351 PAIR = .TRUE. 3352 ELSE 3353 IF( K.GT.1 ) THEN 3354 TMP = T((JLOC1-2)*LLDT+ILOC1) 3355 IF( TMP.NE.ZERO ) THEN 3356 ELEM1 = T((JLOC1-2)*LLDT+ILOC1-1) 3357 ELEM2 = T((JLOC1-1)*LLDT+ILOC1-1) 3358 ELEM3 = T((JLOC1-2)*LLDT+ILOC1) 3359 ELEM4 = T((JLOC1-1)*LLDT+ILOC1) 3360 CALL DLANV2( ELEM1, ELEM2, ELEM3, ELEM4, 3361 $ WR( K-1 ), WI( K-1 ), WR( K ), 3362 $ WI( K ), SN, CS ) 3363 ELSE 3364 WR( K ) = ELEM1 3365 END IF 3366 ELSE 3367 WR( K ) = ELEM1 3368 END IF 3369 END IF 3370 END IF 3371 END IF 3372 ELSE 3373 PAIR = .FALSE. 3374 END IF 3375 560 CONTINUE 3376* 3377* Loop 570: extract eigenvalues from the blocks which are laid 3378* out across a border of the processor mesh. The processors are 3379* numbered as below: 3380* 3381* 1 | 2 3382* --+-- 3383* 3 | 4 3384* 3385 DO 570 K = NB, N-1, NB 3386 CALL INFOG2L( K, K, DESCT, NPROW, NPCOL, MYROW, MYCOL, 3387 $ ILOC1, JLOC1, TRSRC1, TCSRC1 ) 3388 CALL INFOG2L( K, K+1, DESCT, NPROW, NPCOL, MYROW, MYCOL, 3389 $ ILOC2, JLOC2, TRSRC2, TCSRC2 ) 3390 CALL INFOG2L( K+1, K, DESCT, NPROW, NPCOL, MYROW, MYCOL, 3391 $ ILOC3, JLOC3, TRSRC3, TCSRC3 ) 3392 CALL INFOG2L( K+1, K+1, DESCT, NPROW, NPCOL, MYROW, MYCOL, 3393 $ ILOC4, JLOC4, TRSRC4, TCSRC4 ) 3394 IF( MYROW.EQ.TRSRC2 .AND. MYCOL.EQ.TCSRC2 ) THEN 3395 ELEM2 = T((JLOC2-1)*LLDT+ILOC2) 3396 IF( TRSRC1.NE.TRSRC2 .OR. TCSRC1.NE.TCSRC2 ) 3397 $ CALL DGESD2D( ICTXT, 1, 1, ELEM2, 1, TRSRC1, TCSRC1 ) 3398 END IF 3399 IF( MYROW.EQ.TRSRC3 .AND. MYCOL.EQ.TCSRC3 ) THEN 3400 ELEM3 = T((JLOC3-1)*LLDT+ILOC3) 3401 IF( TRSRC1.NE.TRSRC3 .OR. TCSRC1.NE.TCSRC3 ) 3402 $ CALL DGESD2D( ICTXT, 1, 1, ELEM3, 1, TRSRC1, TCSRC1 ) 3403 END IF 3404 IF( MYROW.EQ.TRSRC4 .AND. MYCOL.EQ.TCSRC4 ) THEN 3405 WORK(1) = T((JLOC4-1)*LLDT+ILOC4) 3406 IF( K+1.LT.N ) THEN 3407 WORK(2) = T((JLOC4-1)*LLDT+ILOC4+1) 3408 ELSE 3409 WORK(2) = ZERO 3410 END IF 3411 IF( TRSRC1.NE.TRSRC4 .OR. TCSRC1.NE.TCSRC4 ) 3412 $ CALL DGESD2D( ICTXT, 2, 1, WORK, 2, TRSRC1, TCSRC1 ) 3413 END IF 3414 IF( MYROW.EQ.TRSRC1 .AND. MYCOL.EQ.TCSRC1 ) THEN 3415 ELEM1 = T((JLOC1-1)*LLDT+ILOC1) 3416 IF( TRSRC1.NE.TRSRC2 .OR. TCSRC1.NE.TCSRC2 ) 3417 $ CALL DGERV2D( ICTXT, 1, 1, ELEM2, 1, TRSRC2, TCSRC2 ) 3418 IF( TRSRC1.NE.TRSRC3 .OR. TCSRC1.NE.TCSRC3 ) 3419 $ CALL DGERV2D( ICTXT, 1, 1, ELEM3, 1, TRSRC3, TCSRC3 ) 3420 IF( TRSRC1.NE.TRSRC4 .OR. TCSRC1.NE.TCSRC4 ) 3421 $ CALL DGERV2D( ICTXT, 2, 1, WORK, 2, TRSRC4, TCSRC4 ) 3422 ELEM4 = WORK(1) 3423 ELEM5 = WORK(2) 3424 IF( ELEM5.EQ.ZERO ) THEN 3425 IF( WR( K ).EQ.ZERO .AND. WI( K ).EQ.ZERO ) THEN 3426 CALL DLANV2( ELEM1, ELEM2, ELEM3, ELEM4, WR( K ), 3427 $ WI( K ), WR( K+1 ), WI( K+1 ), SN, CS ) 3428 ELSEIF( WR( K+1 ).EQ.ZERO .AND. WI( K+1 ).EQ.ZERO ) THEN 3429 WR( K+1 ) = ELEM4 3430 END IF 3431 ELSEIF( WR( K ).EQ.ZERO .AND. WI( K ).EQ.ZERO ) THEN 3432 WR( K ) = ELEM1 3433 END IF 3434 END IF 3435 570 CONTINUE 3436* 3437 IF( NPROCS.GT.1 ) THEN 3438 CALL DGSUM2D( ICTXT, 'All', TOP, N, 1, WR, N, -1, -1 ) 3439 CALL DGSUM2D( ICTXT, 'All', TOP, N, 1, WI, N, -1, -1 ) 3440 END IF 3441* 3442* Store storage requirements in workspaces. 3443* 3444 WORK( 1 ) = DBLE(LWMIN) 3445 IWORK( 1 ) = LIWMIN 3446* 3447* Return to calling program. 3448* 3449 RETURN 3450* 3451* End of PDTRORD 3452* 3453 END 3454* 3455