1 SUBROUTINE PSLAED2( ICTXT, K, N, N1, NB, D, DROW, DCOL, Q, LDQ, 2 $ RHO, Z, W, DLAMDA, Q2, LDQ2, QBUF, CTOT, PSM, 3 $ NPCOL, INDX, INDXC, INDXP, INDCOL, COLTYP, NN, 4 $ NN1, NN2, IB1, IB2 ) 5* 6* -- ScaLAPACK auxiliary routine (version 1.7) -- 7* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 8* and University of California, Berkeley. 9* December 31, 1998 10* 11* .. Scalar Arguments .. 12 INTEGER DCOL, DROW, IB1, IB2, ICTXT, K, LDQ, LDQ2, N, 13 $ N1, NB, NN, NN1, NN2, NPCOL 14 REAL RHO 15* .. 16* .. Array Arguments .. 17 INTEGER COLTYP( * ), CTOT( 0: NPCOL-1, 4 ), 18 $ INDCOL( N ), INDX( * ), INDXC( * ), INDXP( * ), 19 $ PSM( 0: NPCOL-1, 4 ) 20 REAL D( * ), DLAMDA( * ), Q( LDQ, * ), 21 $ Q2( LDQ2, * ), QBUF( * ), W( * ), Z( * ) 22* .. 23* 24* Purpose 25* ======= 26* 27* PSLAED2 sorts the two sets of eigenvalues together into a single 28* sorted set. Then it tries to deflate the size of the problem. 29* There are two ways in which deflation can occur: when two or more 30* eigenvalues are close together or if there is a tiny entry in the 31* Z vector. For each such occurrence the order of the related secular 32* equation problem is reduced by one. 33* 34* Arguments 35* ========= 36* 37* ICTXT (global input) INTEGER 38* The BLACS context handle, indicating the global context of 39* the operation on the matrix. The context itself is global. 40* 41* K (output) INTEGER 42* The number of non-deflated eigenvalues, and the order of the 43* related secular equation. 0 <= K <=N. 44* 45* N (input) INTEGER 46* The dimension of the symmetric tridiagonal matrix. N >= 0. 47* 48* N1 (input) INTEGER 49* The location of the last eigenvalue in the leading sub-matrix. 50* min(1,N) < N1 < N. 51* 52* NB (global input) INTEGER 53* The blocking factor used to distribute the columns of the 54* matrix. NB >= 1. 55* 56* D (input/output) REAL array, dimension (N) 57* On entry, D contains the eigenvalues of the two submatrices to 58* be combined. 59* On exit, D contains the trailing (N-K) updated eigenvalues 60* (those which were deflated) sorted into increasing order. 61* 62* DROW (global input) INTEGER 63* The process row over which the first row of the matrix D is 64* distributed. 0 <= DROW < NPROW. 65* 66* DCOL (global input) INTEGER 67* The process column over which the first column of the 68* matrix D is distributed. 0 <= DCOL < NPCOL. 69* 70* Q (input/output) REAL array, dimension (LDQ, N) 71* On entry, Q contains the eigenvectors of two submatrices in 72* the two square blocks with corners at (1,1), (N1,N1) 73* and (N1+1, N1+1), (N,N). 74* On exit, Q contains the trailing (N-K) updated eigenvectors 75* (those which were deflated) in its last N-K columns. 76* 77* LDQ (input) INTEGER 78* The leading dimension of the array Q. LDQ >= max(1,NQ). 79* 80* RHO (global input/output) REAL 81* On entry, the off-diagonal element associated with the rank-1 82* cut which originally split the two submatrices which are now 83* being recombined. 84* On exit, RHO has been modified to the value required by 85* PSLAED3. 86* 87* Z (global input) REAL array, dimension (N) 88* On entry, Z contains the updating vector (the last 89* row of the first sub-eigenvector matrix and the first row of 90* the second sub-eigenvector matrix). 91* On exit, the contents of Z have been destroyed by the updating 92* process. 93* 94* DLAMDA (global output) REAL array, dimension (N) 95* A copy of the first K eigenvalues which will be used by 96* SLAED3 to form the secular equation. 97* 98* W (global output) REAL array, dimension (N) 99* The first k values of the final deflation-altered z-vector 100* which will be passed to SLAED3. 101* 102* Q2 (output) REAL array, dimension (LDQ2, NQ) 103* A copy of the first K eigenvectors which will be used by 104* 105* LDQ2 (input) INTEGER 106* The leading dimension of the array Q2. 107* 108* QBUF (workspace) REAL array, dimension 3*N 109* 110* CTOT (workspace) INTEGER array, dimension( NPCOL, 4) 111* 112* PSM (workspace) INTEGER array, dimension( NPCOL, 4) 113* 114* NPCOL (global input) INTEGER 115* The total number of columns over which the distributed 116* submatrix is distributed. 117* 118* INDX (workspace) INTEGER array, dimension (N) 119* The permutation used to sort the contents of DLAMDA into 120* ascending order. 121* 122* INDXC (output) INTEGER array, dimension (N) 123* The permutation used to arrange the columns of the deflated 124* Q matrix into three groups: the first group contains non-zero 125* elements only at and above N1, the second contains 126* non-zero elements only below N1, and the third is dense. 127* 128* INDXP (workspace) INTEGER array, dimension (N) 129* The permutation used to place deflated values of D at the end 130* of the array. INDXP(1:K) points to the nondeflated D-values 131* and INDXP(K+1:N) points to the deflated eigenvalues. 132* 133* INDCOL (workspace) INTEGER array, dimension (N) 134* 135* COLTYP (workspace/output) INTEGER array, dimension (N) 136* During execution, a label which will indicate which of the 137* following types a column in the Q2 matrix is: 138* 1 : non-zero in the upper half only; 139* 2 : dense; 140* 3 : non-zero in the lower half only; 141* 4 : deflated. 142* 143* NN (global output) INTEGER, the order of matrix U, (PSLAED1). 144* NN1 (global output) INTEGER, the order of matrix Q1, (PSLAED1). 145* NN2 (global output) INTEGER, the order of matrix Q2, (PSLAED1). 146* IB1 (global output) INTEGER, pointeur on Q1, (PSLAED1). 147* IB2 (global output) INTEGER, pointeur on Q2, (PSLAED1). 148* 149* ===================================================================== 150* 151* .. Parameters .. 152 REAL MONE, ZERO, ONE, TWO, EIGHT 153 PARAMETER ( MONE = -1.0E0, ZERO = 0.0E0, ONE = 1.0E0, 154 $ TWO = 2.0E0, EIGHT = 8.0E0 ) 155* .. 156* .. Local Scalars .. 157 INTEGER COL, CT, I, IAM, IE1, IE2, IMAX, INFO, J, JJQ2, 158 $ JJS, JMAX, JS, K2, MYCOL, MYROW, N1P1, N2, NJ, 159 $ NJCOL, NJJ, NP, NPROCS, NPROW, PJ, PJCOL, PJJ 160 REAL C, EPS, S, T, TAU, TOL 161* .. 162* .. External Functions .. 163 INTEGER INDXG2L, INDXL2G, ISAMAX, NUMROC 164 REAL PSLAMCH, SLAPY2 165 EXTERNAL INDXG2L, INDXL2G, ISAMAX, NUMROC, PSLAMCH, 166 $ SLAPY2 167* .. 168* .. External Subroutines .. 169 EXTERNAL BLACS_GRIDINFO, BLACS_PINFO, INFOG1L, SCOPY, 170 $ SGERV2D, SGESD2D, SLAPST, SROT, SSCAL 171* .. 172* .. Intrinsic Functions .. 173 INTRINSIC ABS, MAX, MIN, MOD, SQRT 174* .. 175* .. External Functions .. 176* .. 177* .. Local Arrays .. 178 INTEGER PTT( 4 ) 179* .. 180* .. Executable Statements .. 181* 182* Quick return if possible 183* 184 IF( N.EQ.0 ) 185 $ RETURN 186* 187 CALL BLACS_PINFO( IAM, NPROCS ) 188 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 189 NP = NUMROC( N, NB, MYROW, DROW, NPROW ) 190* 191 N2 = N - N1 192 N1P1 = N1 + 1 193* 194 IF( RHO.LT.ZERO ) THEN 195 CALL SSCAL( N2, MONE, Z( N1P1 ), 1 ) 196 END IF 197* 198* Normalize z so that norm(z) = 1. Since z is the concatenation of 199* two normalized vectors, norm2(z) = sqrt(2). 200* 201 T = ONE / SQRT( TWO ) 202 CALL SSCAL( N, T, Z, 1 ) 203* 204* RHO = ABS( norm(z)**2 * RHO ) 205* 206 RHO = ABS( TWO*RHO ) 207* 208* Calculate the allowable deflation tolerance 209* 210 IMAX = ISAMAX( N, Z, 1 ) 211 JMAX = ISAMAX( N, D, 1 ) 212 EPS = PSLAMCH( ICTXT, 'Epsilon' ) 213 TOL = EIGHT*EPS*MAX( ABS( D( JMAX ) ), ABS( Z( IMAX ) ) ) 214* 215* If the rank-1 modifier is small enough, no more needs to be done 216* except to reorganize Q so that its columns correspond with the 217* elements in D. 218* 219 IF( RHO*ABS( Z( IMAX ) ).LE.TOL ) THEN 220 K = 0 221 GO TO 220 222 END IF 223* 224* If there are multiple eigenvalues then the problem deflates. Here 225* the number of equal eigenvalues are found. As each equal 226* eigenvalue is found, an elementary reflector is computed to rotate 227* the corresponding eigensubspace so that the corresponding 228* components of Z are zero in this new basis. 229* 230* 231 CALL SLAPST( 'I', N, D, INDX, INFO ) 232* 233 DO 10 I = 1, N1 234 COLTYP( I ) = 1 235 10 CONTINUE 236 DO 20 I = N1P1, N 237 COLTYP( I ) = 3 238 20 CONTINUE 239 COL = DCOL 240 DO 40 I = 1, N, NB 241 DO 30 J = 0, NB - 1 242 IF( I+J.LE.N ) 243 $ INDCOL( I+J ) = COL 244 30 CONTINUE 245 COL = MOD( COL+1, NPCOL ) 246 40 CONTINUE 247* 248 K = 0 249 K2 = N + 1 250 DO 50 J = 1, N 251 NJ = INDX( J ) 252 IF( RHO*ABS( Z( NJ ) ).LE.TOL ) THEN 253* 254* Deflate due to small z component. 255* 256 K2 = K2 - 1 257 COLTYP( NJ ) = 4 258 INDXP( K2 ) = NJ 259 IF( J.EQ.N ) 260 $ GO TO 80 261 ELSE 262 PJ = NJ 263 GO TO 60 264 END IF 265 50 CONTINUE 266 60 CONTINUE 267 J = J + 1 268 NJ = INDX( J ) 269 IF( J.GT.N ) 270 $ GO TO 80 271 IF( RHO*ABS( Z( NJ ) ).LE.TOL ) THEN 272* 273* Deflate due to small z component. 274* 275 K2 = K2 - 1 276 COLTYP( NJ ) = 4 277 INDXP( K2 ) = NJ 278 ELSE 279* 280* Check if eigenvalues are close enough to allow deflation. 281* 282 S = Z( PJ ) 283 C = Z( NJ ) 284* 285* Find sqrt(a**2+b**2) without overflow or 286* destructive underflow. 287* 288 TAU = SLAPY2( C, S ) 289 T = D( NJ ) - D( PJ ) 290 C = C / TAU 291 S = -S / TAU 292 IF( ABS( T*C*S ).LE.TOL ) THEN 293* 294* Deflation is possible. 295* 296 Z( NJ ) = TAU 297 Z( PJ ) = ZERO 298 IF( COLTYP( NJ ).NE.COLTYP( PJ ) ) 299 $ COLTYP( NJ ) = 2 300 COLTYP( PJ ) = 4 301 CALL INFOG1L( NJ, NB, NPCOL, MYCOL, DCOL, NJJ, NJCOL ) 302 CALL INFOG1L( PJ, NB, NPCOL, MYCOL, DCOL, PJJ, PJCOL ) 303 IF( INDCOL( PJ ).EQ.INDCOL( NJ ) .AND. MYCOL.EQ.NJCOL ) THEN 304 CALL SROT( NP, Q( 1, PJJ ), 1, Q( 1, NJJ ), 1, C, S ) 305 ELSE IF( MYCOL.EQ.PJCOL ) THEN 306 CALL SGESD2D( ICTXT, NP, 1, Q( 1, PJJ ), NP, MYROW, 307 $ NJCOL ) 308 CALL SGERV2D( ICTXT, NP, 1, QBUF, NP, MYROW, NJCOL ) 309 CALL SROT( NP, Q( 1, PJJ ), 1, QBUF, 1, C, S ) 310 ELSE IF( MYCOL.EQ.NJCOL ) THEN 311 CALL SGESD2D( ICTXT, NP, 1, Q( 1, NJJ ), NP, MYROW, 312 $ PJCOL ) 313 CALL SGERV2D( ICTXT, NP, 1, QBUF, NP, MYROW, PJCOL ) 314 CALL SROT( NP, QBUF, 1, Q( 1, NJJ ), 1, C, S ) 315 END IF 316 T = D( PJ )*C**2 + D( NJ )*S**2 317 D( NJ ) = D( PJ )*S**2 + D( NJ )*C**2 318 D( PJ ) = T 319 K2 = K2 - 1 320 I = 1 321 70 CONTINUE 322 IF( K2+I.LE.N ) THEN 323 IF( D( PJ ).LT.D( INDXP( K2+I ) ) ) THEN 324 INDXP( K2+I-1 ) = INDXP( K2+I ) 325 INDXP( K2+I ) = PJ 326 I = I + 1 327 GO TO 70 328 ELSE 329 INDXP( K2+I-1 ) = PJ 330 END IF 331 ELSE 332 INDXP( K2+I-1 ) = PJ 333 END IF 334 PJ = NJ 335 ELSE 336 K = K + 1 337 DLAMDA( K ) = D( PJ ) 338 W( K ) = Z( PJ ) 339 INDXP( K ) = PJ 340 PJ = NJ 341 END IF 342 END IF 343 GO TO 60 344 80 CONTINUE 345* 346* Record the last eigenvalue. 347* 348 K = K + 1 349 DLAMDA( K ) = D( PJ ) 350 W( K ) = Z( PJ ) 351 INDXP( K ) = PJ 352* 353* Count up the total number of the various types of columns, then 354* form a permutation which positions the four column types into 355* four uniform groups (although one or more of these groups may be 356* empty). 357* 358 DO 100 J = 1, 4 359 DO 90 I = 0, NPCOL - 1 360 CTOT( I, J ) = 0 361 90 CONTINUE 362 PTT( J ) = 0 363 100 CONTINUE 364 DO 110 J = 1, N 365 CT = COLTYP( J ) 366 COL = INDCOL( J ) 367 CTOT( COL, CT ) = CTOT( COL, CT ) + 1 368 110 CONTINUE 369* 370* PSM(*) = Position in SubMatrix (of types 1 through 4) 371* 372 DO 120 COL = 0, NPCOL - 1 373 PSM( COL, 1 ) = 1 374 PSM( COL, 2 ) = 1 + CTOT( COL, 1 ) 375 PSM( COL, 3 ) = PSM( COL, 2 ) + CTOT( COL, 2 ) 376 PSM( COL, 4 ) = PSM( COL, 3 ) + CTOT( COL, 3 ) 377 120 CONTINUE 378 PTT( 1 ) = 1 379 DO 140 I = 2, 4 380 CT = 0 381 DO 130 J = 0, NPCOL - 1 382 CT = CT + CTOT( J, I-1 ) 383 130 CONTINUE 384 PTT( I ) = PTT( I-1 ) + CT 385 140 CONTINUE 386* 387* Fill out the INDXC array so that the permutation which it induces 388* will place all type-1 columns first, all type-2 columns next, 389* then all type-3's, and finally all type-4's. 390* 391 DO 150 J = 1, N 392 JS = INDXP( J ) 393 COL = INDCOL( JS ) 394 CT = COLTYP( JS ) 395 I = INDXL2G( PSM( COL, CT ), NB, COL, DCOL, NPCOL ) 396 INDX( J ) = I 397 INDXC( PTT( CT ) ) = I 398 PSM( COL, CT ) = PSM( COL, CT ) + 1 399 PTT( CT ) = PTT( CT ) + 1 400 150 CONTINUE 401 DO 160 J = 1, N 402 JS = INDXP( J ) 403 JJS = INDXG2L( JS, NB, J, J, NPCOL ) 404 COL = INDCOL( JS ) 405 IF( COL.EQ.MYCOL ) THEN 406 I = INDX( J ) 407 JJQ2 = INDXG2L( I, NB, J, J, NPCOL ) 408 CALL SCOPY( NP, Q( 1, JJS ), 1, Q2( 1, JJQ2 ), 1 ) 409 END IF 410 160 CONTINUE 411* 412* 413* The deflated eigenvalues and their corresponding vectors go back 414* into the last N - K slots of D and Q respectively. 415* 416 CALL SCOPY( N, D, 1, Z, 1 ) 417 DO 170 J = K + 1, N 418 JS = INDXP( J ) 419 I = INDX( J ) 420 D( I ) = Z( JS ) 421 170 CONTINUE 422* 423 PTT( 1 ) = 1 424 DO 190 I = 2, 4 425 CT = 0 426 DO 180 J = 0, NPCOL - 1 427 CT = CT + CTOT( J, I-1 ) 428 180 CONTINUE 429 PTT( I ) = PTT( I-1 ) + CT 430 190 CONTINUE 431* 432* 433 IB1 = INDXC( 1 ) 434 IE1 = IB1 435 IB2 = INDXC( PTT( 2 ) ) 436 IE2 = IB2 437 DO 200 I = 2, PTT( 3 ) - 1 438 IB1 = MIN( IB1, INDXC( I ) ) 439 IE1 = MAX( IE1, INDXC( I ) ) 440 200 CONTINUE 441 DO 210 I = PTT( 2 ), PTT( 4 ) - 1 442 IB2 = MIN( IB2, INDXC( I ) ) 443 IE2 = MAX( IE2, INDXC( I ) ) 444 210 CONTINUE 445 NN1 = IE1 - IB1 + 1 446 NN2 = IE2 - IB2 + 1 447 NN = MAX( IE1, IE2 ) - MIN( IB1, IB2 ) + 1 448 220 CONTINUE 449 RETURN 450* 451* End of PSLAED2 452* 453 END 454