1 SUBROUTINE PCHENTRD( UPLO, N, A, IA, JA, DESCA, D, E, TAU, WORK, 2 $ LWORK, RWORK, LRWORK, INFO ) 3* 4* -- ScaLAPACK routine (version 1.7) -- 5* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 6* and University of California, Berkeley. 7* October 15, 1999 8* 9* .. Scalar Arguments .. 10 CHARACTER UPLO 11 INTEGER IA, INFO, JA, LRWORK, LWORK, N 12* .. 13* .. Array Arguments .. 14 INTEGER DESCA( * ) 15 REAL D( * ), E( * ), RWORK( * ) 16 COMPLEX A( * ), TAU( * ), WORK( * ) 17* .. 18* Bugs 19* ==== 20* 21* 22* Support for UPLO='U' is limited to calling the old, slow, PCHETRD 23* code. 24* 25* 26* Purpose 27* 28* ======= 29* 30* PCHENTRD is a prototype version of PCHETRD which uses tailored 31* codes (either the serial, CHETRD, or the parallel code, PCHETTRD) 32* when the workspace provided by the user is adequate. 33* 34* 35* PCHENTRD reduces a complex Hermitian matrix sub( A ) to Hermitian 36* tridiagonal form T by an unitary similarity transformation: 37* Q' * sub( A ) * Q = T, where sub( A ) = A(IA:IA+N-1,JA:JA+N-1). 38* 39* Features 40* ======== 41* 42* PCHENTRD is faster than PCHETRD on almost all matrices, 43* particularly small ones (i.e. N < 500 * sqrt(P) ), provided that 44* enough workspace is available to use the tailored codes. 45* 46* The tailored codes provide performance that is essentially 47* independent of the input data layout. 48* 49* The tailored codes place no restrictions on IA, JA, MB or NB. 50* At present, IA, JA, MB and NB are restricted to those values allowed 51* by PCHETRD to keep the interface simple. These restrictions are 52* documented below. (Search for "restrictions".) 53* 54* Notes 55* ===== 56* 57* 58* Each global data object is described by an associated description 59* vector. This vector stores the information required to establish 60* the mapping between an object element and its corresponding process 61* and memory location. 62* 63* Let A be a generic term for any 2D block cyclicly distributed array. 64* Such a global array has an associated description vector DESCA. 65* In the following comments, the character _ should be read as 66* "of the global array". 67* 68* NOTATION STORED IN EXPLANATION 69* --------------- -------------- -------------------------------------- 70* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 71* DTYPE_A = 1. 72* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 73* the BLACS process grid A is distribu- 74* ted over. The context itself is glo- 75* bal, but the handle (the integer 76* value) may vary. 77* M_A (global) DESCA( M_ ) The number of rows in the global 78* array A. 79* N_A (global) DESCA( N_ ) The number of columns in the global 80* array A. 81* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 82* the rows of the array. 83* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 84* the columns of the array. 85* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 86* row of the array A is distributed. 87* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 88* first column of the array A is 89* distributed. 90* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 91* array. LLD_A >= MAX(1,LOCr(M_A)). 92* 93* Let K be the number of rows or columns of a distributed matrix, 94* and assume that its process grid has dimension p x q. 95* LOCr( K ) denotes the number of elements of K that a process 96* would receive if K were distributed over the p processes of its 97* process column. 98* Similarly, LOCc( K ) denotes the number of elements of K that a 99* process would receive if K were distributed over the q processes of 100* its process row. 101* The values of LOCr() and LOCc() may be determined via a call to the 102* ScaLAPACK tool function, NUMROC: 103* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 104* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 105* An upper bound for these quantities may be computed by: 106* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 107* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 108* 109* 110* Arguments 111* ========= 112* 113* UPLO (global input) CHARACTER 114* Specifies whether the upper or lower triangular part of the 115* Hermitian matrix sub( A ) is stored: 116* = 'U': Upper triangular 117* = 'L': Lower triangular 118* 119* N (global input) INTEGER 120* The number of rows and columns to be operated on, i.e. the 121* order of the distributed submatrix sub( A ). N >= 0. 122* 123* A (local input/local output) COMPLEX pointer into the 124* local memory to an array of dimension (LLD_A,LOCc(JA+N-1)). 125* On entry, this array contains the local pieces of the 126* Hermitian distributed matrix sub( A ). If UPLO = 'U', the 127* leading N-by-N upper triangular part of sub( A ) contains 128* the upper triangular part of the matrix, and its strictly 129* lower triangular part is not referenced. If UPLO = 'L', the 130* leading N-by-N lower triangular part of sub( A ) contains the 131* lower triangular part of the matrix, and its strictly upper 132* triangular part is not referenced. On exit, if UPLO = 'U', 133* the diagonal and first superdiagonal of sub( A ) are over- 134* written by the corresponding elements of the tridiagonal 135* matrix T, and the elements above the first superdiagonal, 136* with the array TAU, represent the unitary matrix Q as a 137* product of elementary reflectors; if UPLO = 'L', the diagonal 138* and first subdiagonal of sub( A ) are overwritten by the 139* corresponding elements of the tridiagonal matrix T, and the 140* elements below the first subdiagonal, with the array TAU, 141* represent the unitary matrix Q as a product of elementary 142* reflectors. See Further Details. 143* 144* IA (global input) INTEGER 145* The row index in the global array A indicating the first 146* row of sub( A ). 147* 148* JA (global input) INTEGER 149* The column index in the global array A indicating the 150* first column of sub( A ). 151* 152* DESCA (global and local input) INTEGER array of dimension DLEN_. 153* The array descriptor for the distributed matrix A. 154* 155* D (local output) REAL array, dimension LOCc(JA+N-1) 156* The diagonal elements of the tridiagonal matrix T: 157* D(i) = A(i,i). D is tied to the distributed matrix A. 158* 159* E (local output) REAL array, dimension LOCc(JA+N-1) 160* if UPLO = 'U', LOCc(JA+N-2) otherwise. The off-diagonal 161* elements of the tridiagonal matrix T: E(i) = A(i,i+1) if 162* UPLO = 'U', E(i) = A(i+1,i) if UPLO = 'L'. E is tied to the 163* distributed matrix A. 164* 165* TAU (local output) COMPLEX, array, dimension 166* LOCc(JA+N-1). This array contains the scalar factors TAU of 167* the elementary reflectors. TAU is tied to the distributed 168* matrix A. 169* 170* WORK (local workspace/local output) COMPLEX array, 171* dimension (LWORK) 172* On exit, WORK( 1 ) returns the optimal LWORK. 173* 174* LWORK (local or global input) INTEGER 175* The dimension of the array WORK. 176* LWORK is local input and must be at least 177* LWORK >= MAX( NB * ( NP +1 ), 3 * NB ) 178* 179* For optimal performance, greater workspace is needed, i.e. 180* LWORK >= 2*( ANB+1 )*( 4*NPS+2 ) + ( NPS + 4 ) * NPS 181* ICTXT = DESCA( CTXT_ ) 182* ANB = PJLAENV( ICTXT, 3, 'PCHETTRD', 'L', 0, 0, 0, 0 ) 183* SQNPC = INT( SQRT( REAL( NPROW * NPCOL ) ) ) 184* NPS = MAX( NUMROC( N, 1, 0, 0, SQNPC ), 2*ANB ) 185* 186* NUMROC is a ScaLAPACK tool functions; 187* PJLAENV is a ScaLAPACK envionmental inquiry function 188* MYROW, MYCOL, NPROW and NPCOL can be determined by calling 189* the subroutine BLACS_GRIDINFO. 190* 191* 192* RWORK (local workspace/local output) COMPLEX array, 193* dimension (LRWORK) 194* On exit, RWORK( 1 ) returns the optimal LRWORK. 195* 196* LRWORK (local or global input) INTEGER 197* The dimension of the array RWORK. 198* LRWORK is local input and must be at least 199* LRWORK >= 1 200* 201* For optimal performance, greater workspace is needed, i.e. 202* LRWORK >= MAX( 2 * N ) 203* 204* 205* INFO (global output) INTEGER 206* = 0: successful exit 207* < 0: If the i-th argument is an array and the j-entry had 208* an illegal value, then INFO = -(i*100+j), if the i-th 209* argument is a scalar and had an illegal value, then 210* INFO = -i. 211* 212* Further Details 213* =============== 214* 215* If UPLO = 'U', the matrix Q is represented as a product of elementary 216* reflectors 217* 218* Q = H(n-1) . . . H(2) H(1). 219* 220* Each H(i) has the form 221* 222* H(i) = I - tau * v * v' 223* 224* where tau is a complex scalar, and v is a complex vector with 225* v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in 226* A(ia:ia+i-2,ja+i), and tau in TAU(ja+i-1). 227* 228* If UPLO = 'L', the matrix Q is represented as a product of elementary 229* reflectors 230* 231* Q = H(1) H(2) . . . H(n-1). 232* 233* Each H(i) has the form 234* 235* H(i) = I - tau * v * v' 236* 237* where tau is a complex scalar, and v is a complex vector with 238* v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in 239* A(ia+i+1:ia+n-1,ja+i-1), and tau in TAU(ja+i-1). 240* 241* The contents of sub( A ) on exit are illustrated by the following 242* examples with n = 5: 243* 244* if UPLO = 'U': if UPLO = 'L': 245* 246* ( d e v2 v3 v4 ) ( d ) 247* ( d e v3 v4 ) ( e d ) 248* ( d e v4 ) ( v1 e d ) 249* ( d e ) ( v1 v2 e d ) 250* ( d ) ( v1 v2 v3 e d ) 251* 252* where d and e denote diagonal and off-diagonal elements of T, and vi 253* denotes an element of the vector defining H(i). 254* 255* Alignment requirements 256* ====================== 257* 258* The distributed submatrix sub( A ) must verify some alignment proper- 259* ties, namely the following expression should be true: 260* ( MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA .AND. IROFFA.EQ.0 ) with 261* IROFFA = MOD( IA-1, MB_A ) and ICOFFA = MOD( JA-1, NB_A ). 262* 263* ===================================================================== 264* 265* .. Parameters .. 266 INTEGER BLOCK_CYCLIC_2D, DLEN_, DTYPE_, CTXT_, M_, N_, 267 $ MB_, NB_, RSRC_, CSRC_, LLD_ 268 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 269 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 270 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 271 REAL ONE 272 PARAMETER ( ONE = 1.0E+0 ) 273 COMPLEX CONE 274 PARAMETER ( CONE = ( 1.0E+0, 0.0E+0 ) ) 275* .. 276* .. Local Scalars .. 277 LOGICAL LQUERY, UPPER 278 CHARACTER COLCTOP, ROWCTOP 279 INTEGER ANB, CTXTB, I, IACOL, IAROW, ICOFFA, ICTXT, 280 $ IINFO, INDB, INDRD, INDRE, INDTAU, INDW, IPW, 281 $ IROFFA, J, JB, JX, K, KK, LLRWORK, LLWORK, 282 $ LRWMIN, LWMIN, MINSZ, MYCOL, MYCOLB, MYROW, 283 $ MYROWB, NB, NP, NPCOL, NPCOLB, NPROW, NPROWB, 284 $ NPS, NQ, ONEPMIN, ONEPRMIN, SQNPC, TTLRWMIN, 285 $ TTLWMIN 286* .. 287* .. Local Arrays .. 288 INTEGER DESCB( DLEN_ ), DESCW( DLEN_ ), IDUM1( 3 ), 289 $ IDUM2( 3 ) 290* .. 291* .. External Subroutines .. 292 EXTERNAL BLACS_GET, BLACS_GRIDEXIT, BLACS_GRIDINFO, 293 $ BLACS_GRIDINIT, CHETRD, CHK1MAT, DESCSET, 294 $ IGAMN2D, PCELSET, PCHER2K, PCHETD2, PCHETTRD, 295 $ PCHK1MAT, PCLAMR1D, PCLATRD, PCTRMR2D, 296 $ PSLAMR1D, PB_TOPGET, PB_TOPSET, PXERBLA 297* .. 298* .. External Functions .. 299 LOGICAL LSAME 300 INTEGER INDXG2L, INDXG2P, NUMROC, PJLAENV 301 EXTERNAL LSAME, INDXG2L, INDXG2P, NUMROC, PJLAENV 302* .. 303* .. Intrinsic Functions .. 304 INTRINSIC CMPLX, ICHAR, INT, MAX, MIN, MOD, REAL, SQRT 305* .. 306* .. Executable Statements .. 307* 308* This is just to keep ftnchek and toolpack/1 happy 309 IF( BLOCK_CYCLIC_2D*CSRC_*CTXT_*DLEN_*DTYPE_*LLD_*MB_*M_*NB_*N_* 310 $ RSRC_.LT.0 )RETURN 311* Get grid parameters 312* 313 ICTXT = DESCA( CTXT_ ) 314 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 315* 316* Test the input parameters 317* 318 INFO = 0 319 IF( NPROW.EQ.-1 ) THEN 320 INFO = -( 600+CTXT_ ) 321 ELSE 322 CALL CHK1MAT( N, 2, N, 2, IA, JA, DESCA, 6, INFO ) 323 UPPER = LSAME( UPLO, 'U' ) 324 IF( INFO.EQ.0 ) THEN 325 NB = DESCA( NB_ ) 326 IROFFA = MOD( IA-1, DESCA( MB_ ) ) 327 ICOFFA = MOD( JA-1, DESCA( NB_ ) ) 328 IAROW = INDXG2P( IA, NB, MYROW, DESCA( RSRC_ ), NPROW ) 329 IACOL = INDXG2P( JA, NB, MYCOL, DESCA( CSRC_ ), NPCOL ) 330 NP = NUMROC( N, NB, MYROW, IAROW, NPROW ) 331 NQ = MAX( 1, NUMROC( N+JA-1, NB, MYCOL, DESCA( CSRC_ ), 332 $ NPCOL ) ) 333 LWMIN = MAX( ( NP+1 )*NB, 3*NB ) 334 ANB = PJLAENV( ICTXT, 3, 'PCHETTRD', 'L', 0, 0, 0, 0 ) 335 MINSZ = PJLAENV( ICTXT, 5, 'PCHETTRD', 'L', 0, 0, 0, 0 ) 336 SQNPC = INT( SQRT( REAL( NPROW*NPCOL ) ) ) 337 NPS = MAX( NUMROC( N, 1, 0, 0, SQNPC ), 2*ANB ) 338 TTLWMIN = 2*( ANB+1 )*( 4*NPS+2 ) + ( NPS+2 )*NPS 339 LRWMIN = 1 340 TTLRWMIN = 2*NPS 341* 342 WORK( 1 ) = CMPLX( REAL( TTLWMIN ) ) 343 RWORK( 1 ) = REAL( TTLRWMIN ) 344 LQUERY = ( LWORK.EQ.-1 .OR. LRWORK.EQ.-1 ) 345 IF( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN 346 INFO = -1 347* 348* The following two restrictions are not necessary provided 349* that either of the tailored codes are used. 350* 351 ELSE IF( IROFFA.NE.ICOFFA .OR. ICOFFA.NE.0 ) THEN 352 INFO = -5 353 ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN 354 INFO = -( 600+NB_ ) 355 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 356 INFO = -11 357 ELSE IF( LRWORK.LT.LRWMIN .AND. .NOT.LQUERY ) THEN 358 INFO = -13 359 END IF 360 END IF 361 IF( UPPER ) THEN 362 IDUM1( 1 ) = ICHAR( 'U' ) 363 ELSE 364 IDUM1( 1 ) = ICHAR( 'L' ) 365 END IF 366 IDUM2( 1 ) = 1 367 IF( LWORK.EQ.-1 ) THEN 368 IDUM1( 2 ) = -1 369 ELSE 370 IDUM1( 2 ) = 1 371 END IF 372 IDUM2( 2 ) = 11 373 IF( LRWORK.EQ.-1 ) THEN 374 IDUM1( 3 ) = -1 375 ELSE 376 IDUM1( 3 ) = 1 377 END IF 378 IDUM2( 3 ) = 13 379 CALL PCHK1MAT( N, 2, N, 2, IA, JA, DESCA, 6, 3, IDUM1, IDUM2, 380 $ INFO ) 381 END IF 382* 383 IF( INFO.NE.0 ) THEN 384 CALL PXERBLA( ICTXT, 'PCHENTRD', -INFO ) 385 RETURN 386 ELSE IF( LQUERY ) THEN 387 RETURN 388 END IF 389* 390* Quick return if possible 391* 392 IF( N.EQ.0 ) 393 $ RETURN 394* 395* 396 ONEPMIN = N*N + 3*N + 1 397 LLWORK = LWORK 398 CALL IGAMN2D( ICTXT, 'A', ' ', 1, 1, LLWORK, 1, 1, -1, -1, -1, 399 $ -1 ) 400* 401 ONEPRMIN = 2*N 402 LLRWORK = LRWORK 403 CALL IGAMN2D( ICTXT, 'A', ' ', 1, 1, LLRWORK, 1, 1, -1, -1, -1, 404 $ -1 ) 405* 406* 407* Use the serial, LAPACK, code: CTRD on small matrices if we 408* we have enough space. 409* 410 NPROWB = 0 411 IF( ( N.LT.MINSZ .OR. SQNPC.EQ.1 ) .AND. LLWORK.GE.ONEPMIN .AND. 412 $ LLRWORK.GE.ONEPRMIN .AND. .NOT.UPPER ) THEN 413 NPROWB = 1 414 NPS = N 415 ELSE 416 IF( LLWORK.GE.TTLWMIN .AND. LLRWORK.GE.TTLRWMIN .AND. .NOT. 417 $ UPPER ) THEN 418 NPROWB = SQNPC 419 END IF 420 END IF 421* 422 IF( NPROWB.GE.1 ) THEN 423 NPCOLB = NPROWB 424 SQNPC = NPROWB 425 INDB = 1 426 INDRD = 1 427 INDRE = INDRD + NPS 428 INDTAU = INDB + NPS*NPS 429 INDW = INDTAU + NPS 430 LLWORK = LLWORK - INDW + 1 431* 432 CALL BLACS_GET( ICTXT, 10, CTXTB ) 433 CALL BLACS_GRIDINIT( CTXTB, 'Row major', SQNPC, SQNPC ) 434 CALL BLACS_GRIDINFO( CTXTB, NPROWB, NPCOLB, MYROWB, MYCOLB ) 435 CALL DESCSET( DESCB, N, N, 1, 1, 0, 0, CTXTB, NPS ) 436* 437 CALL PCTRMR2D( UPLO, 'N', N, N, A, IA, JA, DESCA, WORK( INDB ), 438 $ 1, 1, DESCB, ICTXT ) 439* 440* 441* Only those processors in context CTXTB are needed for a while 442* 443 IF( NPROWB.GT.0 ) THEN 444* 445 IF( NPROWB.EQ.1 ) THEN 446 CALL CHETRD( UPLO, N, WORK( INDB ), NPS, RWORK( INDRD ), 447 $ RWORK( INDRE ), WORK( INDTAU ), 448 $ WORK( INDW ), LLWORK, INFO ) 449 ELSE 450* 451 CALL PCHETTRD( 'L', N, WORK( INDB ), 1, 1, DESCB, 452 $ RWORK( INDRD ), RWORK( INDRE ), 453 $ WORK( INDTAU ), WORK( INDW ), LLWORK, 454 $ INFO ) 455* 456 END IF 457 END IF 458* 459* All processors participate in moving the data back to the 460* way that PCHENTRD expects it. 461* 462 CALL PSLAMR1D( N-1, RWORK( INDRE ), 1, 1, DESCB, E, 1, JA, 463 $ DESCA ) 464* 465 CALL PSLAMR1D( N, RWORK( INDRD ), 1, 1, DESCB, D, 1, JA, 466 $ DESCA ) 467* 468 CALL PCLAMR1D( N, WORK( INDTAU ), 1, 1, DESCB, TAU, 1, JA, 469 $ DESCA ) 470* 471 CALL PCTRMR2D( UPLO, 'N', N, N, WORK( INDB ), 1, 1, DESCB, A, 472 $ IA, JA, DESCA, ICTXT ) 473* 474 IF( MYROWB.GE.0 ) 475 $ CALL BLACS_GRIDEXIT( CTXTB ) 476* 477 ELSE 478* 479 CALL PB_TOPGET( ICTXT, 'Combine', 'Columnwise', COLCTOP ) 480 CALL PB_TOPGET( ICTXT, 'Combine', 'Rowwise', ROWCTOP ) 481 CALL PB_TOPSET( ICTXT, 'Combine', 'Columnwise', '1-tree' ) 482 CALL PB_TOPSET( ICTXT, 'Combine', 'Rowwise', '1-tree' ) 483* 484 IPW = NP*NB + 1 485* 486 IF( UPPER ) THEN 487* 488* Reduce the upper triangle of sub( A ). 489* 490 KK = MOD( JA+N-1, NB ) 491 IF( KK.EQ.0 ) 492 $ KK = NB 493 CALL DESCSET( DESCW, N, NB, NB, NB, IAROW, 494 $ INDXG2P( JA+N-KK, NB, MYCOL, DESCA( CSRC_ ), 495 $ NPCOL ), ICTXT, MAX( 1, NP ) ) 496* 497 DO 10 K = N - KK + 1, NB + 1, -NB 498 JB = MIN( N-K+1, NB ) 499 I = IA + K - 1 500 J = JA + K - 1 501* 502* Reduce columns I:I+NB-1 to tridiagonal form and form 503* the matrix W which is needed to update the unreduced part of 504* the matrix 505* 506 CALL PCLATRD( UPLO, K+JB-1, JB, A, IA, JA, DESCA, D, E, 507 $ TAU, WORK, 1, 1, DESCW, WORK( IPW ) ) 508* 509* Update the unreduced submatrix A(IA:I-1,JA:J-1), using an 510* update of the form: 511* A(IA:I-1,JA:J-1) := A(IA:I-1,JA:J-1) - V*W' - W*V' 512* 513 CALL PCHER2K( UPLO, 'No transpose', K-1, JB, -CONE, A, 514 $ IA, J, DESCA, WORK, 1, 1, DESCW, ONE, A, 515 $ IA, JA, DESCA ) 516* 517* Copy last superdiagonal element back into sub( A ) 518* 519 JX = MIN( INDXG2L( J, NB, 0, IACOL, NPCOL ), NQ ) 520 CALL PCELSET( A, I-1, J, DESCA, CMPLX( E( JX ) ) ) 521* 522 DESCW( CSRC_ ) = MOD( DESCW( CSRC_ )+NPCOL-1, NPCOL ) 523* 524 10 CONTINUE 525* 526* Use unblocked code to reduce the last or only block 527* 528 CALL PCHETD2( UPLO, MIN( N, NB ), A, IA, JA, DESCA, D, E, 529 $ TAU, WORK, LWORK, IINFO ) 530* 531 ELSE 532* 533* Reduce the lower triangle of sub( A ) 534* 535 KK = MOD( JA+N-1, NB ) 536 IF( KK.EQ.0 ) 537 $ KK = NB 538 CALL DESCSET( DESCW, N, NB, NB, NB, IAROW, IACOL, ICTXT, 539 $ MAX( 1, NP ) ) 540* 541 DO 20 K = 1, N - NB, NB 542 I = IA + K - 1 543 J = JA + K - 1 544* 545* Reduce columns I:I+NB-1 to tridiagonal form and form 546* the matrix W which is needed to update the unreduced part 547* of the matrix 548* 549 CALL PCLATRD( UPLO, N-K+1, NB, A, I, J, DESCA, D, E, TAU, 550 $ WORK, K, 1, DESCW, WORK( IPW ) ) 551* 552* Update the unreduced submatrix A(I+NB:IA+N-1,I+NB:IA+N-1), 553* using an update of the form: A(I+NB:IA+N-1,I+NB:IA+N-1) := 554* A(I+NB:IA+N-1,I+NB:IA+N-1) - V*W' - W*V' 555* 556 CALL PCHER2K( UPLO, 'No transpose', N-K-NB+1, NB, -CONE, 557 $ A, I+NB, J, DESCA, WORK, K+NB, 1, DESCW, 558 $ ONE, A, I+NB, J+NB, DESCA ) 559* 560* Copy last subdiagonal element back into sub( A ) 561* 562 JX = MIN( INDXG2L( J+NB-1, NB, 0, IACOL, NPCOL ), NQ ) 563 CALL PCELSET( A, I+NB, J+NB-1, DESCA, CMPLX( E( JX ) ) ) 564* 565 DESCW( CSRC_ ) = MOD( DESCW( CSRC_ )+1, NPCOL ) 566* 567 20 CONTINUE 568* 569* Use unblocked code to reduce the last or only block 570* 571 CALL PCHETD2( UPLO, KK, A, IA+K-1, JA+K-1, DESCA, D, E, TAU, 572 $ WORK, LWORK, IINFO ) 573 END IF 574* 575 CALL PB_TOPSET( ICTXT, 'Combine', 'Columnwise', COLCTOP ) 576 CALL PB_TOPSET( ICTXT, 'Combine', 'Rowwise', ROWCTOP ) 577* 578 END IF 579* 580 WORK( 1 ) = CMPLX( REAL( TTLWMIN ) ) 581 RWORK( 1 ) = REAL( TTLRWMIN ) 582* 583 RETURN 584* 585* End of PCHENTRD 586* 587 END 588