1*> \brief \b DGESVJ 2* 3* =========== DOCUMENTATION =========== 4* 5* Online html documentation available at 6* http://www.netlib.org/lapack/explore-html/ 7* 8*> \htmlonly 9*> Download DGESVJ + dependencies 10*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/dgesvj.f"> 11*> [TGZ]</a> 12*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/dgesvj.f"> 13*> [ZIP]</a> 14*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/dgesvj.f"> 15*> [TXT]</a> 16*> \endhtmlonly 17* 18* Definition: 19* =========== 20* 21* SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, 22* LDV, WORK, LWORK, INFO ) 23* 24* .. Scalar Arguments .. 25* INTEGER INFO, LDA, LDV, LWORK, M, MV, N 26* CHARACTER*1 JOBA, JOBU, JOBV 27* .. 28* .. Array Arguments .. 29* DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ), 30* $ WORK( LWORK ) 31* .. 32* 33* 34*> \par Purpose: 35* ============= 36*> 37*> \verbatim 38*> 39*> DGESVJ computes the singular value decomposition (SVD) of a real 40*> M-by-N matrix A, where M >= N. The SVD of A is written as 41*> [++] [xx] [x0] [xx] 42*> A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx] 43*> [++] [xx] 44*> where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal 45*> matrix, and V is an N-by-N orthogonal matrix. The diagonal elements 46*> of SIGMA are the singular values of A. The columns of U and V are the 47*> left and the right singular vectors of A, respectively. 48*> DGESVJ can sometimes compute tiny singular values and their singular vectors much 49*> more accurately than other SVD routines, see below under Further Details. 50*> \endverbatim 51* 52* Arguments: 53* ========== 54* 55*> \param[in] JOBA 56*> \verbatim 57*> JOBA is CHARACTER*1 58*> Specifies the structure of A. 59*> = 'L': The input matrix A is lower triangular; 60*> = 'U': The input matrix A is upper triangular; 61*> = 'G': The input matrix A is general M-by-N matrix, M >= N. 62*> \endverbatim 63*> 64*> \param[in] JOBU 65*> \verbatim 66*> JOBU is CHARACTER*1 67*> Specifies whether to compute the left singular vectors 68*> (columns of U): 69*> = 'U': The left singular vectors corresponding to the nonzero 70*> singular values are computed and returned in the leading 71*> columns of A. See more details in the description of A. 72*> The default numerical orthogonality threshold is set to 73*> approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E'). 74*> = 'C': Analogous to JOBU='U', except that user can control the 75*> level of numerical orthogonality of the computed left 76*> singular vectors. TOL can be set to TOL = CTOL*EPS, where 77*> CTOL is given on input in the array WORK. 78*> No CTOL smaller than ONE is allowed. CTOL greater 79*> than 1 / EPS is meaningless. The option 'C' 80*> can be used if M*EPS is satisfactory orthogonality 81*> of the computed left singular vectors, so CTOL=M could 82*> save few sweeps of Jacobi rotations. 83*> See the descriptions of A and WORK(1). 84*> = 'N': The matrix U is not computed. However, see the 85*> description of A. 86*> \endverbatim 87*> 88*> \param[in] JOBV 89*> \verbatim 90*> JOBV is CHARACTER*1 91*> Specifies whether to compute the right singular vectors, that 92*> is, the matrix V: 93*> = 'V': the matrix V is computed and returned in the array V 94*> = 'A': the Jacobi rotations are applied to the MV-by-N 95*> array V. In other words, the right singular vector 96*> matrix V is not computed explicitly, instead it is 97*> applied to an MV-by-N matrix initially stored in the 98*> first MV rows of V. 99*> = 'N': the matrix V is not computed and the array V is not 100*> referenced 101*> \endverbatim 102*> 103*> \param[in] M 104*> \verbatim 105*> M is INTEGER 106*> The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0. 107*> \endverbatim 108*> 109*> \param[in] N 110*> \verbatim 111*> N is INTEGER 112*> The number of columns of the input matrix A. 113*> M >= N >= 0. 114*> \endverbatim 115*> 116*> \param[in,out] A 117*> \verbatim 118*> A is DOUBLE PRECISION array, dimension (LDA,N) 119*> On entry, the M-by-N matrix A. 120*> On exit : 121*> If JOBU = 'U' .OR. JOBU = 'C' : 122*> If INFO = 0 : 123*> RANKA orthonormal columns of U are returned in the 124*> leading RANKA columns of the array A. Here RANKA <= N 125*> is the number of computed singular values of A that are 126*> above the underflow threshold DLAMCH('S'). The singular 127*> vectors corresponding to underflowed or zero singular 128*> values are not computed. The value of RANKA is returned 129*> in the array WORK as RANKA=NINT(WORK(2)). Also see the 130*> descriptions of SVA and WORK. The computed columns of U 131*> are mutually numerically orthogonal up to approximately 132*> TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'), 133*> see the description of JOBU. 134*> If INFO > 0 : 135*> the procedure DGESVJ did not converge in the given number 136*> of iterations (sweeps). In that case, the computed 137*> columns of U may not be orthogonal up to TOL. The output 138*> U (stored in A), SIGMA (given by the computed singular 139*> values in SVA(1:N)) and V is still a decomposition of the 140*> input matrix A in the sense that the residual 141*> ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small. 142*> 143*> If JOBU = 'N' : 144*> If INFO = 0 : 145*> Note that the left singular vectors are 'for free' in the 146*> one-sided Jacobi SVD algorithm. However, if only the 147*> singular values are needed, the level of numerical 148*> orthogonality of U is not an issue and iterations are 149*> stopped when the columns of the iterated matrix are 150*> numerically orthogonal up to approximately M*EPS. Thus, 151*> on exit, A contains the columns of U scaled with the 152*> corresponding singular values. 153*> If INFO > 0 : 154*> the procedure DGESVJ did not converge in the given number 155*> of iterations (sweeps). 156*> \endverbatim 157*> 158*> \param[in] LDA 159*> \verbatim 160*> LDA is INTEGER 161*> The leading dimension of the array A. LDA >= max(1,M). 162*> \endverbatim 163*> 164*> \param[out] SVA 165*> \verbatim 166*> SVA is DOUBLE PRECISION array, dimension (N) 167*> On exit : 168*> If INFO = 0 : 169*> depending on the value SCALE = WORK(1), we have: 170*> If SCALE = ONE : 171*> SVA(1:N) contains the computed singular values of A. 172*> During the computation SVA contains the Euclidean column 173*> norms of the iterated matrices in the array A. 174*> If SCALE .NE. ONE : 175*> The singular values of A are SCALE*SVA(1:N), and this 176*> factored representation is due to the fact that some of the 177*> singular values of A might underflow or overflow. 178*> If INFO > 0 : 179*> the procedure DGESVJ did not converge in the given number of 180*> iterations (sweeps) and SCALE*SVA(1:N) may not be accurate. 181*> \endverbatim 182*> 183*> \param[in] MV 184*> \verbatim 185*> MV is INTEGER 186*> If JOBV = 'A', then the product of Jacobi rotations in DGESVJ 187*> is applied to the first MV rows of V. See the description of JOBV. 188*> \endverbatim 189*> 190*> \param[in,out] V 191*> \verbatim 192*> V is DOUBLE PRECISION array, dimension (LDV,N) 193*> If JOBV = 'V', then V contains on exit the N-by-N matrix of 194*> the right singular vectors; 195*> If JOBV = 'A', then V contains the product of the computed right 196*> singular vector matrix and the initial matrix in 197*> the array V. 198*> If JOBV = 'N', then V is not referenced. 199*> \endverbatim 200*> 201*> \param[in] LDV 202*> \verbatim 203*> LDV is INTEGER 204*> The leading dimension of the array V, LDV >= 1. 205*> If JOBV = 'V', then LDV >= max(1,N). 206*> If JOBV = 'A', then LDV >= max(1,MV) . 207*> \endverbatim 208*> 209*> \param[in,out] WORK 210*> \verbatim 211*> WORK is DOUBLE PRECISION array, dimension (LWORK) 212*> On entry : 213*> If JOBU = 'C' : 214*> WORK(1) = CTOL, where CTOL defines the threshold for convergence. 215*> The process stops if all columns of A are mutually 216*> orthogonal up to CTOL*EPS, EPS=DLAMCH('E'). 217*> It is required that CTOL >= ONE, i.e. it is not 218*> allowed to force the routine to obtain orthogonality 219*> below EPS. 220*> On exit : 221*> WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N) 222*> are the computed singular values of A. 223*> (See description of SVA().) 224*> WORK(2) = NINT(WORK(2)) is the number of the computed nonzero 225*> singular values. 226*> WORK(3) = NINT(WORK(3)) is the number of the computed singular 227*> values that are larger than the underflow threshold. 228*> WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi 229*> rotations needed for numerical convergence. 230*> WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep. 231*> This is useful information in cases when DGESVJ did 232*> not converge, as it can be used to estimate whether 233*> the output is still useful and for post festum analysis. 234*> WORK(6) = the largest absolute value over all sines of the 235*> Jacobi rotation angles in the last sweep. It can be 236*> useful for a post festum analysis. 237*> \endverbatim 238*> 239*> \param[in] LWORK 240*> \verbatim 241*> LWORK is INTEGER 242*> length of WORK, WORK >= MAX(6,M+N) 243*> \endverbatim 244*> 245*> \param[out] INFO 246*> \verbatim 247*> INFO is INTEGER 248*> = 0: successful exit. 249*> < 0: if INFO = -i, then the i-th argument had an illegal value 250*> > 0: DGESVJ did not converge in the maximal allowed number (30) 251*> of sweeps. The output may still be useful. See the 252*> description of WORK. 253*> \endverbatim 254* 255* Authors: 256* ======== 257* 258*> \author Univ. of Tennessee 259*> \author Univ. of California Berkeley 260*> \author Univ. of Colorado Denver 261*> \author NAG Ltd. 262* 263*> \ingroup doubleGEcomputational 264* 265*> \par Further Details: 266* ===================== 267*> 268*> \verbatim 269*> 270*> The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane 271*> rotations. The rotations are implemented as fast scaled rotations of 272*> Anda and Park [1]. In the case of underflow of the Jacobi angle, a 273*> modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses 274*> column interchanges of de Rijk [2]. The relative accuracy of the computed 275*> singular values and the accuracy of the computed singular vectors (in 276*> angle metric) is as guaranteed by the theory of Demmel and Veselic [3]. 277*> The condition number that determines the accuracy in the full rank case 278*> is essentially min_{D=diag} kappa(A*D), where kappa(.) is the 279*> spectral condition number. The best performance of this Jacobi SVD 280*> procedure is achieved if used in an accelerated version of Drmac and 281*> Veselic [5,6], and it is the kernel routine in the SIGMA library [7]. 282*> Some tuning parameters (marked with [TP]) are available for the 283*> implementer. 284*> The computational range for the nonzero singular values is the machine 285*> number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even 286*> denormalized singular values can be computed with the corresponding 287*> gradual loss of accurate digits. 288*> \endverbatim 289* 290*> \par Contributors: 291* ================== 292*> 293*> \verbatim 294*> 295*> ============ 296*> 297*> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany) 298*> \endverbatim 299* 300*> \par References: 301* ================ 302*> 303*> \verbatim 304*> 305*> [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling. 306*> SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174. 307*> [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the 308*> singular value decomposition on a vector computer. 309*> SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371. 310*> [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR. 311*> [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular 312*> value computation in floating point arithmetic. 313*> SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222. 314*> [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I. 315*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342. 316*> LAPACK Working note 169. 317*> [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II. 318*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362. 319*> LAPACK Working note 170. 320*> [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, 321*> QSVD, (H,K)-SVD computations. 322*> Department of Mathematics, University of Zagreb, 2008. 323*> \endverbatim 324* 325*> \par Bugs, examples and comments: 326* ================================= 327*> 328*> \verbatim 329*> =========================== 330*> Please report all bugs and send interesting test examples and comments to 331*> drmac@math.hr. Thank you. 332*> \endverbatim 333*> 334* ===================================================================== 335 SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, 336 $ LDV, WORK, LWORK, INFO ) 337* 338* -- LAPACK computational routine -- 339* -- LAPACK is a software package provided by Univ. of Tennessee, -- 340* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..-- 341* 342* .. Scalar Arguments .. 343 INTEGER INFO, LDA, LDV, LWORK, M, MV, N 344 CHARACTER*1 JOBA, JOBU, JOBV 345* .. 346* .. Array Arguments .. 347 DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ), 348 $ WORK( LWORK ) 349* .. 350* 351* ===================================================================== 352* 353* .. Local Parameters .. 354 DOUBLE PRECISION ZERO, HALF, ONE 355 PARAMETER ( ZERO = 0.0D0, HALF = 0.5D0, ONE = 1.0D0) 356 INTEGER NSWEEP 357 PARAMETER ( NSWEEP = 30 ) 358* .. 359* .. Local Scalars .. 360 DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG, 361 $ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ, 362 $ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL, 363 $ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA, 364 $ THSIGN, TOL 365 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1, 366 $ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34, 367 $ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP, 368 $ SWBAND 369 LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK, 370 $ RSVEC, UCTOL, UPPER 371* .. 372* .. Local Arrays .. 373 DOUBLE PRECISION FASTR( 5 ) 374* .. 375* .. Intrinsic Functions .. 376 INTRINSIC DABS, MAX, MIN, DBLE, DSIGN, DSQRT 377* .. 378* .. External Functions .. 379* .. 380* from BLAS 381 DOUBLE PRECISION DDOT, DNRM2 382 EXTERNAL DDOT, DNRM2 383 INTEGER IDAMAX 384 EXTERNAL IDAMAX 385* from LAPACK 386 DOUBLE PRECISION DLAMCH 387 EXTERNAL DLAMCH 388 LOGICAL LSAME 389 EXTERNAL LSAME 390* .. 391* .. External Subroutines .. 392* .. 393* from BLAS 394 EXTERNAL DAXPY, DCOPY, DROTM, DSCAL, DSWAP 395* from LAPACK 396 EXTERNAL DLASCL, DLASET, DLASSQ, XERBLA 397* 398 EXTERNAL DGSVJ0, DGSVJ1 399* .. 400* .. Executable Statements .. 401* 402* Test the input arguments 403* 404 LSVEC = LSAME( JOBU, 'U' ) 405 UCTOL = LSAME( JOBU, 'C' ) 406 RSVEC = LSAME( JOBV, 'V' ) 407 APPLV = LSAME( JOBV, 'A' ) 408 UPPER = LSAME( JOBA, 'U' ) 409 LOWER = LSAME( JOBA, 'L' ) 410* 411 IF( .NOT.( UPPER .OR. LOWER .OR. LSAME( JOBA, 'G' ) ) ) THEN 412 INFO = -1 413 ELSE IF( .NOT.( LSVEC .OR. UCTOL .OR. LSAME( JOBU, 'N' ) ) ) THEN 414 INFO = -2 415 ELSE IF( .NOT.( RSVEC .OR. APPLV .OR. LSAME( JOBV, 'N' ) ) ) THEN 416 INFO = -3 417 ELSE IF( M.LT.0 ) THEN 418 INFO = -4 419 ELSE IF( ( N.LT.0 ) .OR. ( N.GT.M ) ) THEN 420 INFO = -5 421 ELSE IF( LDA.LT.M ) THEN 422 INFO = -7 423 ELSE IF( MV.LT.0 ) THEN 424 INFO = -9 425 ELSE IF( ( RSVEC .AND. ( LDV.LT.N ) ) .OR. 426 $ ( APPLV .AND. ( LDV.LT.MV ) ) ) THEN 427 INFO = -11 428 ELSE IF( UCTOL .AND. ( WORK( 1 ).LE.ONE ) ) THEN 429 INFO = -12 430 ELSE IF( LWORK.LT.MAX( M+N, 6 ) ) THEN 431 INFO = -13 432 ELSE 433 INFO = 0 434 END IF 435* 436* #:( 437 IF( INFO.NE.0 ) THEN 438 CALL XERBLA( 'DGESVJ', -INFO ) 439 RETURN 440 END IF 441* 442* #:) Quick return for void matrix 443* 444 IF( ( M.EQ.0 ) .OR. ( N.EQ.0 ) )RETURN 445* 446* Set numerical parameters 447* The stopping criterion for Jacobi rotations is 448* 449* max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS 450* 451* where EPS is the round-off and CTOL is defined as follows: 452* 453 IF( UCTOL ) THEN 454* ... user controlled 455 CTOL = WORK( 1 ) 456 ELSE 457* ... default 458 IF( LSVEC .OR. RSVEC .OR. APPLV ) THEN 459 CTOL = DSQRT( DBLE( M ) ) 460 ELSE 461 CTOL = DBLE( M ) 462 END IF 463 END IF 464* ... and the machine dependent parameters are 465*[!] (Make sure that DLAMCH() works properly on the target machine.) 466* 467 EPSLN = DLAMCH( 'Epsilon' ) 468 ROOTEPS = DSQRT( EPSLN ) 469 SFMIN = DLAMCH( 'SafeMinimum' ) 470 ROOTSFMIN = DSQRT( SFMIN ) 471 SMALL = SFMIN / EPSLN 472 BIG = DLAMCH( 'Overflow' ) 473* BIG = ONE / SFMIN 474 ROOTBIG = ONE / ROOTSFMIN 475 LARGE = BIG / DSQRT( DBLE( M*N ) ) 476 BIGTHETA = ONE / ROOTEPS 477* 478 TOL = CTOL*EPSLN 479 ROOTTOL = DSQRT( TOL ) 480* 481 IF( DBLE( M )*EPSLN.GE.ONE ) THEN 482 INFO = -4 483 CALL XERBLA( 'DGESVJ', -INFO ) 484 RETURN 485 END IF 486* 487* Initialize the right singular vector matrix. 488* 489 IF( RSVEC ) THEN 490 MVL = N 491 CALL DLASET( 'A', MVL, N, ZERO, ONE, V, LDV ) 492 ELSE IF( APPLV ) THEN 493 MVL = MV 494 END IF 495 RSVEC = RSVEC .OR. APPLV 496* 497* Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N ) 498*(!) If necessary, scale A to protect the largest singular value 499* from overflow. It is possible that saving the largest singular 500* value destroys the information about the small ones. 501* This initial scaling is almost minimal in the sense that the 502* goal is to make sure that no column norm overflows, and that 503* DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries 504* in A are detected, the procedure returns with INFO=-6. 505* 506 SKL= ONE / DSQRT( DBLE( M )*DBLE( N ) ) 507 NOSCALE = .TRUE. 508 GOSCALE = .TRUE. 509* 510 IF( LOWER ) THEN 511* the input matrix is M-by-N lower triangular (trapezoidal) 512 DO 1874 p = 1, N 513 AAPP = ZERO 514 AAQQ = ONE 515 CALL DLASSQ( M-p+1, A( p, p ), 1, AAPP, AAQQ ) 516 IF( AAPP.GT.BIG ) THEN 517 INFO = -6 518 CALL XERBLA( 'DGESVJ', -INFO ) 519 RETURN 520 END IF 521 AAQQ = DSQRT( AAQQ ) 522 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 523 SVA( p ) = AAPP*AAQQ 524 ELSE 525 NOSCALE = .FALSE. 526 SVA( p ) = AAPP*( AAQQ*SKL) 527 IF( GOSCALE ) THEN 528 GOSCALE = .FALSE. 529 DO 1873 q = 1, p - 1 530 SVA( q ) = SVA( q )*SKL 531 1873 CONTINUE 532 END IF 533 END IF 534 1874 CONTINUE 535 ELSE IF( UPPER ) THEN 536* the input matrix is M-by-N upper triangular (trapezoidal) 537 DO 2874 p = 1, N 538 AAPP = ZERO 539 AAQQ = ONE 540 CALL DLASSQ( p, A( 1, p ), 1, AAPP, AAQQ ) 541 IF( AAPP.GT.BIG ) THEN 542 INFO = -6 543 CALL XERBLA( 'DGESVJ', -INFO ) 544 RETURN 545 END IF 546 AAQQ = DSQRT( AAQQ ) 547 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 548 SVA( p ) = AAPP*AAQQ 549 ELSE 550 NOSCALE = .FALSE. 551 SVA( p ) = AAPP*( AAQQ*SKL) 552 IF( GOSCALE ) THEN 553 GOSCALE = .FALSE. 554 DO 2873 q = 1, p - 1 555 SVA( q ) = SVA( q )*SKL 556 2873 CONTINUE 557 END IF 558 END IF 559 2874 CONTINUE 560 ELSE 561* the input matrix is M-by-N general dense 562 DO 3874 p = 1, N 563 AAPP = ZERO 564 AAQQ = ONE 565 CALL DLASSQ( M, A( 1, p ), 1, AAPP, AAQQ ) 566 IF( AAPP.GT.BIG ) THEN 567 INFO = -6 568 CALL XERBLA( 'DGESVJ', -INFO ) 569 RETURN 570 END IF 571 AAQQ = DSQRT( AAQQ ) 572 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 573 SVA( p ) = AAPP*AAQQ 574 ELSE 575 NOSCALE = .FALSE. 576 SVA( p ) = AAPP*( AAQQ*SKL) 577 IF( GOSCALE ) THEN 578 GOSCALE = .FALSE. 579 DO 3873 q = 1, p - 1 580 SVA( q ) = SVA( q )*SKL 581 3873 CONTINUE 582 END IF 583 END IF 584 3874 CONTINUE 585 END IF 586* 587 IF( NOSCALE )SKL= ONE 588* 589* Move the smaller part of the spectrum from the underflow threshold 590*(!) Start by determining the position of the nonzero entries of the 591* array SVA() relative to ( SFMIN, BIG ). 592* 593 AAPP = ZERO 594 AAQQ = BIG 595 DO 4781 p = 1, N 596 IF( SVA( p ).NE.ZERO )AAQQ = MIN( AAQQ, SVA( p ) ) 597 AAPP = MAX( AAPP, SVA( p ) ) 598 4781 CONTINUE 599* 600* #:) Quick return for zero matrix 601* 602 IF( AAPP.EQ.ZERO ) THEN 603 IF( LSVEC )CALL DLASET( 'G', M, N, ZERO, ONE, A, LDA ) 604 WORK( 1 ) = ONE 605 WORK( 2 ) = ZERO 606 WORK( 3 ) = ZERO 607 WORK( 4 ) = ZERO 608 WORK( 5 ) = ZERO 609 WORK( 6 ) = ZERO 610 RETURN 611 END IF 612* 613* #:) Quick return for one-column matrix 614* 615 IF( N.EQ.1 ) THEN 616 IF( LSVEC )CALL DLASCL( 'G', 0, 0, SVA( 1 ), SKL, M, 1, 617 $ A( 1, 1 ), LDA, IERR ) 618 WORK( 1 ) = ONE / SKL 619 IF( SVA( 1 ).GE.SFMIN ) THEN 620 WORK( 2 ) = ONE 621 ELSE 622 WORK( 2 ) = ZERO 623 END IF 624 WORK( 3 ) = ZERO 625 WORK( 4 ) = ZERO 626 WORK( 5 ) = ZERO 627 WORK( 6 ) = ZERO 628 RETURN 629 END IF 630* 631* Protect small singular values from underflow, and try to 632* avoid underflows/overflows in computing Jacobi rotations. 633* 634 SN = DSQRT( SFMIN / EPSLN ) 635 TEMP1 = DSQRT( BIG / DBLE( N ) ) 636 IF( ( AAPP.LE.SN ) .OR. ( AAQQ.GE.TEMP1 ) .OR. 637 $ ( ( SN.LE.AAQQ ) .AND. ( AAPP.LE.TEMP1 ) ) ) THEN 638 TEMP1 = MIN( BIG, TEMP1 / AAPP ) 639* AAQQ = AAQQ*TEMP1 640* AAPP = AAPP*TEMP1 641 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.LE.TEMP1 ) ) THEN 642 TEMP1 = MIN( SN / AAQQ, BIG / ( AAPP*DSQRT( DBLE( N ) ) ) ) 643* AAQQ = AAQQ*TEMP1 644* AAPP = AAPP*TEMP1 645 ELSE IF( ( AAQQ.GE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN 646 TEMP1 = MAX( SN / AAQQ, TEMP1 / AAPP ) 647* AAQQ = AAQQ*TEMP1 648* AAPP = AAPP*TEMP1 649 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN 650 TEMP1 = MIN( SN / AAQQ, BIG / ( DSQRT( DBLE( N ) )*AAPP ) ) 651* AAQQ = AAQQ*TEMP1 652* AAPP = AAPP*TEMP1 653 ELSE 654 TEMP1 = ONE 655 END IF 656* 657* Scale, if necessary 658* 659 IF( TEMP1.NE.ONE ) THEN 660 CALL DLASCL( 'G', 0, 0, ONE, TEMP1, N, 1, SVA, N, IERR ) 661 END IF 662 SKL= TEMP1*SKL 663 IF( SKL.NE.ONE ) THEN 664 CALL DLASCL( JOBA, 0, 0, ONE, SKL, M, N, A, LDA, IERR ) 665 SKL= ONE / SKL 666 END IF 667* 668* Row-cyclic Jacobi SVD algorithm with column pivoting 669* 670 EMPTSW = ( N*( N-1 ) ) / 2 671 NOTROT = 0 672 FASTR( 1 ) = ZERO 673* 674* A is represented in factored form A = A * diag(WORK), where diag(WORK) 675* is initialized to identity. WORK is updated during fast scaled 676* rotations. 677* 678 DO 1868 q = 1, N 679 WORK( q ) = ONE 680 1868 CONTINUE 681* 682* 683 SWBAND = 3 684*[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective 685* if DGESVJ is used as a computational routine in the preconditioned 686* Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure 687* works on pivots inside a band-like region around the diagonal. 688* The boundaries are determined dynamically, based on the number of 689* pivots above a threshold. 690* 691 KBL = MIN( 8, N ) 692*[TP] KBL is a tuning parameter that defines the tile size in the 693* tiling of the p-q loops of pivot pairs. In general, an optimal 694* value of KBL depends on the matrix dimensions and on the 695* parameters of the computer's memory. 696* 697 NBL = N / KBL 698 IF( ( NBL*KBL ).NE.N )NBL = NBL + 1 699* 700 BLSKIP = KBL**2 701*[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL. 702* 703 ROWSKIP = MIN( 5, KBL ) 704*[TP] ROWSKIP is a tuning parameter. 705* 706 LKAHEAD = 1 707*[TP] LKAHEAD is a tuning parameter. 708* 709* Quasi block transformations, using the lower (upper) triangular 710* structure of the input matrix. The quasi-block-cycling usually 711* invokes cubic convergence. Big part of this cycle is done inside 712* canonical subspaces of dimensions less than M. 713* 714 IF( ( LOWER .OR. UPPER ) .AND. ( N.GT.MAX( 64, 4*KBL ) ) ) THEN 715*[TP] The number of partition levels and the actual partition are 716* tuning parameters. 717 N4 = N / 4 718 N2 = N / 2 719 N34 = 3*N4 720 IF( APPLV ) THEN 721 q = 0 722 ELSE 723 q = 1 724 END IF 725* 726 IF( LOWER ) THEN 727* 728* This works very well on lower triangular matrices, in particular 729* in the framework of the preconditioned Jacobi SVD (xGEJSV). 730* The idea is simple: 731* [+ 0 0 0] Note that Jacobi transformations of [0 0] 732* [+ + 0 0] [0 0] 733* [+ + x 0] actually work on [x 0] [x 0] 734* [+ + x x] [x x]. [x x] 735* 736 CALL DGSVJ0( JOBV, M-N34, N-N34, A( N34+1, N34+1 ), LDA, 737 $ WORK( N34+1 ), SVA( N34+1 ), MVL, 738 $ V( N34*q+1, N34+1 ), LDV, EPSLN, SFMIN, TOL, 739 $ 2, WORK( N+1 ), LWORK-N, IERR ) 740* 741 CALL DGSVJ0( JOBV, M-N2, N34-N2, A( N2+1, N2+1 ), LDA, 742 $ WORK( N2+1 ), SVA( N2+1 ), MVL, 743 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 2, 744 $ WORK( N+1 ), LWORK-N, IERR ) 745* 746 CALL DGSVJ1( JOBV, M-N2, N-N2, N4, A( N2+1, N2+1 ), LDA, 747 $ WORK( N2+1 ), SVA( N2+1 ), MVL, 748 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1, 749 $ WORK( N+1 ), LWORK-N, IERR ) 750* 751 CALL DGSVJ0( JOBV, M-N4, N2-N4, A( N4+1, N4+1 ), LDA, 752 $ WORK( N4+1 ), SVA( N4+1 ), MVL, 753 $ V( N4*q+1, N4+1 ), LDV, EPSLN, SFMIN, TOL, 1, 754 $ WORK( N+1 ), LWORK-N, IERR ) 755* 756 CALL DGSVJ0( JOBV, M, N4, A, LDA, WORK, SVA, MVL, V, LDV, 757 $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N, 758 $ IERR ) 759* 760 CALL DGSVJ1( JOBV, M, N2, N4, A, LDA, WORK, SVA, MVL, V, 761 $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ), 762 $ LWORK-N, IERR ) 763* 764* 765 ELSE IF( UPPER ) THEN 766* 767* 768 CALL DGSVJ0( JOBV, N4, N4, A, LDA, WORK, SVA, MVL, V, LDV, 769 $ EPSLN, SFMIN, TOL, 2, WORK( N+1 ), LWORK-N, 770 $ IERR ) 771* 772 CALL DGSVJ0( JOBV, N2, N4, A( 1, N4+1 ), LDA, WORK( N4+1 ), 773 $ SVA( N4+1 ), MVL, V( N4*q+1, N4+1 ), LDV, 774 $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N, 775 $ IERR ) 776* 777 CALL DGSVJ1( JOBV, N2, N2, N4, A, LDA, WORK, SVA, MVL, V, 778 $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ), 779 $ LWORK-N, IERR ) 780* 781 CALL DGSVJ0( JOBV, N2+N4, N4, A( 1, N2+1 ), LDA, 782 $ WORK( N2+1 ), SVA( N2+1 ), MVL, 783 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1, 784 $ WORK( N+1 ), LWORK-N, IERR ) 785 786 END IF 787* 788 END IF 789* 790* .. Row-cyclic pivot strategy with de Rijk's pivoting .. 791* 792 DO 1993 i = 1, NSWEEP 793* 794* .. go go go ... 795* 796 MXAAPQ = ZERO 797 MXSINJ = ZERO 798 ISWROT = 0 799* 800 NOTROT = 0 801 PSKIPPED = 0 802* 803* Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs 804* 1 <= p < q <= N. This is the first step toward a blocked implementation 805* of the rotations. New implementation, based on block transformations, 806* is under development. 807* 808 DO 2000 ibr = 1, NBL 809* 810 igl = ( ibr-1 )*KBL + 1 811* 812 DO 1002 ir1 = 0, MIN( LKAHEAD, NBL-ibr ) 813* 814 igl = igl + ir1*KBL 815* 816 DO 2001 p = igl, MIN( igl+KBL-1, N-1 ) 817* 818* .. de Rijk's pivoting 819* 820 q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1 821 IF( p.NE.q ) THEN 822 CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 ) 823 IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1, 824 $ V( 1, q ), 1 ) 825 TEMP1 = SVA( p ) 826 SVA( p ) = SVA( q ) 827 SVA( q ) = TEMP1 828 TEMP1 = WORK( p ) 829 WORK( p ) = WORK( q ) 830 WORK( q ) = TEMP1 831 END IF 832* 833 IF( ir1.EQ.0 ) THEN 834* 835* Column norms are periodically updated by explicit 836* norm computation. 837* Caveat: 838* Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1) 839* as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to 840* overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to 841* underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold). 842* Hence, DNRM2 cannot be trusted, not even in the case when 843* the true norm is far from the under(over)flow boundaries. 844* If properly implemented DNRM2 is available, the IF-THEN-ELSE 845* below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)". 846* 847 IF( ( SVA( p ).LT.ROOTBIG ) .AND. 848 $ ( SVA( p ).GT.ROOTSFMIN ) ) THEN 849 SVA( p ) = DNRM2( M, A( 1, p ), 1 )*WORK( p ) 850 ELSE 851 TEMP1 = ZERO 852 AAPP = ONE 853 CALL DLASSQ( M, A( 1, p ), 1, TEMP1, AAPP ) 854 SVA( p ) = TEMP1*DSQRT( AAPP )*WORK( p ) 855 END IF 856 AAPP = SVA( p ) 857 ELSE 858 AAPP = SVA( p ) 859 END IF 860* 861 IF( AAPP.GT.ZERO ) THEN 862* 863 PSKIPPED = 0 864* 865 DO 2002 q = p + 1, MIN( igl+KBL-1, N ) 866* 867 AAQQ = SVA( q ) 868* 869 IF( AAQQ.GT.ZERO ) THEN 870* 871 AAPP0 = AAPP 872 IF( AAQQ.GE.ONE ) THEN 873 ROTOK = ( SMALL*AAPP ).LE.AAQQ 874 IF( AAPP.LT.( BIG / AAQQ ) ) THEN 875 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1, 876 $ q ), 1 )*WORK( p )*WORK( q ) / 877 $ AAQQ ) / AAPP 878 ELSE 879 CALL DCOPY( M, A( 1, p ), 1, 880 $ WORK( N+1 ), 1 ) 881 CALL DLASCL( 'G', 0, 0, AAPP, 882 $ WORK( p ), M, 1, 883 $ WORK( N+1 ), LDA, IERR ) 884 AAPQ = DDOT( M, WORK( N+1 ), 1, 885 $ A( 1, q ), 1 )*WORK( q ) / AAQQ 886 END IF 887 ELSE 888 ROTOK = AAPP.LE.( AAQQ / SMALL ) 889 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN 890 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1, 891 $ q ), 1 )*WORK( p )*WORK( q ) / 892 $ AAQQ ) / AAPP 893 ELSE 894 CALL DCOPY( M, A( 1, q ), 1, 895 $ WORK( N+1 ), 1 ) 896 CALL DLASCL( 'G', 0, 0, AAQQ, 897 $ WORK( q ), M, 1, 898 $ WORK( N+1 ), LDA, IERR ) 899 AAPQ = DDOT( M, WORK( N+1 ), 1, 900 $ A( 1, p ), 1 )*WORK( p ) / AAPP 901 END IF 902 END IF 903* 904 MXAAPQ = MAX( MXAAPQ, DABS( AAPQ ) ) 905* 906* TO rotate or NOT to rotate, THAT is the question ... 907* 908 IF( DABS( AAPQ ).GT.TOL ) THEN 909* 910* .. rotate 911*[RTD] ROTATED = ROTATED + ONE 912* 913 IF( ir1.EQ.0 ) THEN 914 NOTROT = 0 915 PSKIPPED = 0 916 ISWROT = ISWROT + 1 917 END IF 918* 919 IF( ROTOK ) THEN 920* 921 AQOAP = AAQQ / AAPP 922 APOAQ = AAPP / AAQQ 923 THETA = -HALF*DABS(AQOAP-APOAQ)/AAPQ 924* 925 IF( DABS( THETA ).GT.BIGTHETA ) THEN 926* 927 T = HALF / THETA 928 FASTR( 3 ) = T*WORK( p ) / WORK( q ) 929 FASTR( 4 ) = -T*WORK( q ) / 930 $ WORK( p ) 931 CALL DROTM( M, A( 1, p ), 1, 932 $ A( 1, q ), 1, FASTR ) 933 IF( RSVEC )CALL DROTM( MVL, 934 $ V( 1, p ), 1, 935 $ V( 1, q ), 1, 936 $ FASTR ) 937 SVA( q ) = AAQQ*DSQRT( MAX( ZERO, 938 $ ONE+T*APOAQ*AAPQ ) ) 939 AAPP = AAPP*DSQRT( MAX( ZERO, 940 $ ONE-T*AQOAP*AAPQ ) ) 941 MXSINJ = MAX( MXSINJ, DABS( T ) ) 942* 943 ELSE 944* 945* .. choose correct signum for THETA and rotate 946* 947 THSIGN = -DSIGN( ONE, AAPQ ) 948 T = ONE / ( THETA+THSIGN* 949 $ DSQRT( ONE+THETA*THETA ) ) 950 CS = DSQRT( ONE / ( ONE+T*T ) ) 951 SN = T*CS 952* 953 MXSINJ = MAX( MXSINJ, DABS( SN ) ) 954 SVA( q ) = AAQQ*DSQRT( MAX( ZERO, 955 $ ONE+T*APOAQ*AAPQ ) ) 956 AAPP = AAPP*DSQRT( MAX( ZERO, 957 $ ONE-T*AQOAP*AAPQ ) ) 958* 959 APOAQ = WORK( p ) / WORK( q ) 960 AQOAP = WORK( q ) / WORK( p ) 961 IF( WORK( p ).GE.ONE ) THEN 962 IF( WORK( q ).GE.ONE ) THEN 963 FASTR( 3 ) = T*APOAQ 964 FASTR( 4 ) = -T*AQOAP 965 WORK( p ) = WORK( p )*CS 966 WORK( q ) = WORK( q )*CS 967 CALL DROTM( M, A( 1, p ), 1, 968 $ A( 1, q ), 1, 969 $ FASTR ) 970 IF( RSVEC )CALL DROTM( MVL, 971 $ V( 1, p ), 1, V( 1, q ), 972 $ 1, FASTR ) 973 ELSE 974 CALL DAXPY( M, -T*AQOAP, 975 $ A( 1, q ), 1, 976 $ A( 1, p ), 1 ) 977 CALL DAXPY( M, CS*SN*APOAQ, 978 $ A( 1, p ), 1, 979 $ A( 1, q ), 1 ) 980 WORK( p ) = WORK( p )*CS 981 WORK( q ) = WORK( q ) / CS 982 IF( RSVEC ) THEN 983 CALL DAXPY( MVL, -T*AQOAP, 984 $ V( 1, q ), 1, 985 $ V( 1, p ), 1 ) 986 CALL DAXPY( MVL, 987 $ CS*SN*APOAQ, 988 $ V( 1, p ), 1, 989 $ V( 1, q ), 1 ) 990 END IF 991 END IF 992 ELSE 993 IF( WORK( q ).GE.ONE ) THEN 994 CALL DAXPY( M, T*APOAQ, 995 $ A( 1, p ), 1, 996 $ A( 1, q ), 1 ) 997 CALL DAXPY( M, -CS*SN*AQOAP, 998 $ A( 1, q ), 1, 999 $ A( 1, p ), 1 ) 1000 WORK( p ) = WORK( p ) / CS 1001 WORK( q ) = WORK( q )*CS 1002 IF( RSVEC ) THEN 1003 CALL DAXPY( MVL, T*APOAQ, 1004 $ V( 1, p ), 1, 1005 $ V( 1, q ), 1 ) 1006 CALL DAXPY( MVL, 1007 $ -CS*SN*AQOAP, 1008 $ V( 1, q ), 1, 1009 $ V( 1, p ), 1 ) 1010 END IF 1011 ELSE 1012 IF( WORK( p ).GE.WORK( q ) ) 1013 $ THEN 1014 CALL DAXPY( M, -T*AQOAP, 1015 $ A( 1, q ), 1, 1016 $ A( 1, p ), 1 ) 1017 CALL DAXPY( M, CS*SN*APOAQ, 1018 $ A( 1, p ), 1, 1019 $ A( 1, q ), 1 ) 1020 WORK( p ) = WORK( p )*CS 1021 WORK( q ) = WORK( q ) / CS 1022 IF( RSVEC ) THEN 1023 CALL DAXPY( MVL, 1024 $ -T*AQOAP, 1025 $ V( 1, q ), 1, 1026 $ V( 1, p ), 1 ) 1027 CALL DAXPY( MVL, 1028 $ CS*SN*APOAQ, 1029 $ V( 1, p ), 1, 1030 $ V( 1, q ), 1 ) 1031 END IF 1032 ELSE 1033 CALL DAXPY( M, T*APOAQ, 1034 $ A( 1, p ), 1, 1035 $ A( 1, q ), 1 ) 1036 CALL DAXPY( M, 1037 $ -CS*SN*AQOAP, 1038 $ A( 1, q ), 1, 1039 $ A( 1, p ), 1 ) 1040 WORK( p ) = WORK( p ) / CS 1041 WORK( q ) = WORK( q )*CS 1042 IF( RSVEC ) THEN 1043 CALL DAXPY( MVL, 1044 $ T*APOAQ, V( 1, p ), 1045 $ 1, V( 1, q ), 1 ) 1046 CALL DAXPY( MVL, 1047 $ -CS*SN*AQOAP, 1048 $ V( 1, q ), 1, 1049 $ V( 1, p ), 1 ) 1050 END IF 1051 END IF 1052 END IF 1053 END IF 1054 END IF 1055* 1056 ELSE 1057* .. have to use modified Gram-Schmidt like transformation 1058 CALL DCOPY( M, A( 1, p ), 1, 1059 $ WORK( N+1 ), 1 ) 1060 CALL DLASCL( 'G', 0, 0, AAPP, ONE, M, 1061 $ 1, WORK( N+1 ), LDA, 1062 $ IERR ) 1063 CALL DLASCL( 'G', 0, 0, AAQQ, ONE, M, 1064 $ 1, A( 1, q ), LDA, IERR ) 1065 TEMP1 = -AAPQ*WORK( p ) / WORK( q ) 1066 CALL DAXPY( M, TEMP1, WORK( N+1 ), 1, 1067 $ A( 1, q ), 1 ) 1068 CALL DLASCL( 'G', 0, 0, ONE, AAQQ, M, 1069 $ 1, A( 1, q ), LDA, IERR ) 1070 SVA( q ) = AAQQ*DSQRT( MAX( ZERO, 1071 $ ONE-AAPQ*AAPQ ) ) 1072 MXSINJ = MAX( MXSINJ, SFMIN ) 1073 END IF 1074* END IF ROTOK THEN ... ELSE 1075* 1076* In the case of cancellation in updating SVA(q), SVA(p) 1077* recompute SVA(q), SVA(p). 1078* 1079 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS ) 1080 $ THEN 1081 IF( ( AAQQ.LT.ROOTBIG ) .AND. 1082 $ ( AAQQ.GT.ROOTSFMIN ) ) THEN 1083 SVA( q ) = DNRM2( M, A( 1, q ), 1 )* 1084 $ WORK( q ) 1085 ELSE 1086 T = ZERO 1087 AAQQ = ONE 1088 CALL DLASSQ( M, A( 1, q ), 1, T, 1089 $ AAQQ ) 1090 SVA( q ) = T*DSQRT( AAQQ )*WORK( q ) 1091 END IF 1092 END IF 1093 IF( ( AAPP / AAPP0 ).LE.ROOTEPS ) THEN 1094 IF( ( AAPP.LT.ROOTBIG ) .AND. 1095 $ ( AAPP.GT.ROOTSFMIN ) ) THEN 1096 AAPP = DNRM2( M, A( 1, p ), 1 )* 1097 $ WORK( p ) 1098 ELSE 1099 T = ZERO 1100 AAPP = ONE 1101 CALL DLASSQ( M, A( 1, p ), 1, T, 1102 $ AAPP ) 1103 AAPP = T*DSQRT( AAPP )*WORK( p ) 1104 END IF 1105 SVA( p ) = AAPP 1106 END IF 1107* 1108 ELSE 1109* A(:,p) and A(:,q) already numerically orthogonal 1110 IF( ir1.EQ.0 )NOTROT = NOTROT + 1 1111*[RTD] SKIPPED = SKIPPED + 1 1112 PSKIPPED = PSKIPPED + 1 1113 END IF 1114 ELSE 1115* A(:,q) is zero column 1116 IF( ir1.EQ.0 )NOTROT = NOTROT + 1 1117 PSKIPPED = PSKIPPED + 1 1118 END IF 1119* 1120 IF( ( i.LE.SWBAND ) .AND. 1121 $ ( PSKIPPED.GT.ROWSKIP ) ) THEN 1122 IF( ir1.EQ.0 )AAPP = -AAPP 1123 NOTROT = 0 1124 GO TO 2103 1125 END IF 1126* 1127 2002 CONTINUE 1128* END q-LOOP 1129* 1130 2103 CONTINUE 1131* bailed out of q-loop 1132* 1133 SVA( p ) = AAPP 1134* 1135 ELSE 1136 SVA( p ) = AAPP 1137 IF( ( ir1.EQ.0 ) .AND. ( AAPP.EQ.ZERO ) ) 1138 $ NOTROT = NOTROT + MIN( igl+KBL-1, N ) - p 1139 END IF 1140* 1141 2001 CONTINUE 1142* end of the p-loop 1143* end of doing the block ( ibr, ibr ) 1144 1002 CONTINUE 1145* end of ir1-loop 1146* 1147* ... go to the off diagonal blocks 1148* 1149 igl = ( ibr-1 )*KBL + 1 1150* 1151 DO 2010 jbc = ibr + 1, NBL 1152* 1153 jgl = ( jbc-1 )*KBL + 1 1154* 1155* doing the block at ( ibr, jbc ) 1156* 1157 IJBLSK = 0 1158 DO 2100 p = igl, MIN( igl+KBL-1, N ) 1159* 1160 AAPP = SVA( p ) 1161 IF( AAPP.GT.ZERO ) THEN 1162* 1163 PSKIPPED = 0 1164* 1165 DO 2200 q = jgl, MIN( jgl+KBL-1, N ) 1166* 1167 AAQQ = SVA( q ) 1168 IF( AAQQ.GT.ZERO ) THEN 1169 AAPP0 = AAPP 1170* 1171* .. M x 2 Jacobi SVD .. 1172* 1173* Safe Gram matrix computation 1174* 1175 IF( AAQQ.GE.ONE ) THEN 1176 IF( AAPP.GE.AAQQ ) THEN 1177 ROTOK = ( SMALL*AAPP ).LE.AAQQ 1178 ELSE 1179 ROTOK = ( SMALL*AAQQ ).LE.AAPP 1180 END IF 1181 IF( AAPP.LT.( BIG / AAQQ ) ) THEN 1182 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1, 1183 $ q ), 1 )*WORK( p )*WORK( q ) / 1184 $ AAQQ ) / AAPP 1185 ELSE 1186 CALL DCOPY( M, A( 1, p ), 1, 1187 $ WORK( N+1 ), 1 ) 1188 CALL DLASCL( 'G', 0, 0, AAPP, 1189 $ WORK( p ), M, 1, 1190 $ WORK( N+1 ), LDA, IERR ) 1191 AAPQ = DDOT( M, WORK( N+1 ), 1, 1192 $ A( 1, q ), 1 )*WORK( q ) / AAQQ 1193 END IF 1194 ELSE 1195 IF( AAPP.GE.AAQQ ) THEN 1196 ROTOK = AAPP.LE.( AAQQ / SMALL ) 1197 ELSE 1198 ROTOK = AAQQ.LE.( AAPP / SMALL ) 1199 END IF 1200 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN 1201 AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1, 1202 $ q ), 1 )*WORK( p )*WORK( q ) / 1203 $ AAQQ ) / AAPP 1204 ELSE 1205 CALL DCOPY( M, A( 1, q ), 1, 1206 $ WORK( N+1 ), 1 ) 1207 CALL DLASCL( 'G', 0, 0, AAQQ, 1208 $ WORK( q ), M, 1, 1209 $ WORK( N+1 ), LDA, IERR ) 1210 AAPQ = DDOT( M, WORK( N+1 ), 1, 1211 $ A( 1, p ), 1 )*WORK( p ) / AAPP 1212 END IF 1213 END IF 1214* 1215 MXAAPQ = MAX( MXAAPQ, DABS( AAPQ ) ) 1216* 1217* TO rotate or NOT to rotate, THAT is the question ... 1218* 1219 IF( DABS( AAPQ ).GT.TOL ) THEN 1220 NOTROT = 0 1221*[RTD] ROTATED = ROTATED + 1 1222 PSKIPPED = 0 1223 ISWROT = ISWROT + 1 1224* 1225 IF( ROTOK ) THEN 1226* 1227 AQOAP = AAQQ / AAPP 1228 APOAQ = AAPP / AAQQ 1229 THETA = -HALF*DABS(AQOAP-APOAQ)/AAPQ 1230 IF( AAQQ.GT.AAPP0 )THETA = -THETA 1231* 1232 IF( DABS( THETA ).GT.BIGTHETA ) THEN 1233 T = HALF / THETA 1234 FASTR( 3 ) = T*WORK( p ) / WORK( q ) 1235 FASTR( 4 ) = -T*WORK( q ) / 1236 $ WORK( p ) 1237 CALL DROTM( M, A( 1, p ), 1, 1238 $ A( 1, q ), 1, FASTR ) 1239 IF( RSVEC )CALL DROTM( MVL, 1240 $ V( 1, p ), 1, 1241 $ V( 1, q ), 1, 1242 $ FASTR ) 1243 SVA( q ) = AAQQ*DSQRT( MAX( ZERO, 1244 $ ONE+T*APOAQ*AAPQ ) ) 1245 AAPP = AAPP*DSQRT( MAX( ZERO, 1246 $ ONE-T*AQOAP*AAPQ ) ) 1247 MXSINJ = MAX( MXSINJ, DABS( T ) ) 1248 ELSE 1249* 1250* .. choose correct signum for THETA and rotate 1251* 1252 THSIGN = -DSIGN( ONE, AAPQ ) 1253 IF( AAQQ.GT.AAPP0 )THSIGN = -THSIGN 1254 T = ONE / ( THETA+THSIGN* 1255 $ DSQRT( ONE+THETA*THETA ) ) 1256 CS = DSQRT( ONE / ( ONE+T*T ) ) 1257 SN = T*CS 1258 MXSINJ = MAX( MXSINJ, DABS( SN ) ) 1259 SVA( q ) = AAQQ*DSQRT( MAX( ZERO, 1260 $ ONE+T*APOAQ*AAPQ ) ) 1261 AAPP = AAPP*DSQRT( MAX( ZERO, 1262 $ ONE-T*AQOAP*AAPQ ) ) 1263* 1264 APOAQ = WORK( p ) / WORK( q ) 1265 AQOAP = WORK( q ) / WORK( p ) 1266 IF( WORK( p ).GE.ONE ) THEN 1267* 1268 IF( WORK( q ).GE.ONE ) THEN 1269 FASTR( 3 ) = T*APOAQ 1270 FASTR( 4 ) = -T*AQOAP 1271 WORK( p ) = WORK( p )*CS 1272 WORK( q ) = WORK( q )*CS 1273 CALL DROTM( M, A( 1, p ), 1, 1274 $ A( 1, q ), 1, 1275 $ FASTR ) 1276 IF( RSVEC )CALL DROTM( MVL, 1277 $ V( 1, p ), 1, V( 1, q ), 1278 $ 1, FASTR ) 1279 ELSE 1280 CALL DAXPY( M, -T*AQOAP, 1281 $ A( 1, q ), 1, 1282 $ A( 1, p ), 1 ) 1283 CALL DAXPY( M, CS*SN*APOAQ, 1284 $ A( 1, p ), 1, 1285 $ A( 1, q ), 1 ) 1286 IF( RSVEC ) THEN 1287 CALL DAXPY( MVL, -T*AQOAP, 1288 $ V( 1, q ), 1, 1289 $ V( 1, p ), 1 ) 1290 CALL DAXPY( MVL, 1291 $ CS*SN*APOAQ, 1292 $ V( 1, p ), 1, 1293 $ V( 1, q ), 1 ) 1294 END IF 1295 WORK( p ) = WORK( p )*CS 1296 WORK( q ) = WORK( q ) / CS 1297 END IF 1298 ELSE 1299 IF( WORK( q ).GE.ONE ) THEN 1300 CALL DAXPY( M, T*APOAQ, 1301 $ A( 1, p ), 1, 1302 $ A( 1, q ), 1 ) 1303 CALL DAXPY( M, -CS*SN*AQOAP, 1304 $ A( 1, q ), 1, 1305 $ A( 1, p ), 1 ) 1306 IF( RSVEC ) THEN 1307 CALL DAXPY( MVL, T*APOAQ, 1308 $ V( 1, p ), 1, 1309 $ V( 1, q ), 1 ) 1310 CALL DAXPY( MVL, 1311 $ -CS*SN*AQOAP, 1312 $ V( 1, q ), 1, 1313 $ V( 1, p ), 1 ) 1314 END IF 1315 WORK( p ) = WORK( p ) / CS 1316 WORK( q ) = WORK( q )*CS 1317 ELSE 1318 IF( WORK( p ).GE.WORK( q ) ) 1319 $ THEN 1320 CALL DAXPY( M, -T*AQOAP, 1321 $ A( 1, q ), 1, 1322 $ A( 1, p ), 1 ) 1323 CALL DAXPY( M, CS*SN*APOAQ, 1324 $ A( 1, p ), 1, 1325 $ A( 1, q ), 1 ) 1326 WORK( p ) = WORK( p )*CS 1327 WORK( q ) = WORK( q ) / CS 1328 IF( RSVEC ) THEN 1329 CALL DAXPY( MVL, 1330 $ -T*AQOAP, 1331 $ V( 1, q ), 1, 1332 $ V( 1, p ), 1 ) 1333 CALL DAXPY( MVL, 1334 $ CS*SN*APOAQ, 1335 $ V( 1, p ), 1, 1336 $ V( 1, q ), 1 ) 1337 END IF 1338 ELSE 1339 CALL DAXPY( M, T*APOAQ, 1340 $ A( 1, p ), 1, 1341 $ A( 1, q ), 1 ) 1342 CALL DAXPY( M, 1343 $ -CS*SN*AQOAP, 1344 $ A( 1, q ), 1, 1345 $ A( 1, p ), 1 ) 1346 WORK( p ) = WORK( p ) / CS 1347 WORK( q ) = WORK( q )*CS 1348 IF( RSVEC ) THEN 1349 CALL DAXPY( MVL, 1350 $ T*APOAQ, V( 1, p ), 1351 $ 1, V( 1, q ), 1 ) 1352 CALL DAXPY( MVL, 1353 $ -CS*SN*AQOAP, 1354 $ V( 1, q ), 1, 1355 $ V( 1, p ), 1 ) 1356 END IF 1357 END IF 1358 END IF 1359 END IF 1360 END IF 1361* 1362 ELSE 1363 IF( AAPP.GT.AAQQ ) THEN 1364 CALL DCOPY( M, A( 1, p ), 1, 1365 $ WORK( N+1 ), 1 ) 1366 CALL DLASCL( 'G', 0, 0, AAPP, ONE, 1367 $ M, 1, WORK( N+1 ), LDA, 1368 $ IERR ) 1369 CALL DLASCL( 'G', 0, 0, AAQQ, ONE, 1370 $ M, 1, A( 1, q ), LDA, 1371 $ IERR ) 1372 TEMP1 = -AAPQ*WORK( p ) / WORK( q ) 1373 CALL DAXPY( M, TEMP1, WORK( N+1 ), 1374 $ 1, A( 1, q ), 1 ) 1375 CALL DLASCL( 'G', 0, 0, ONE, AAQQ, 1376 $ M, 1, A( 1, q ), LDA, 1377 $ IERR ) 1378 SVA( q ) = AAQQ*DSQRT( MAX( ZERO, 1379 $ ONE-AAPQ*AAPQ ) ) 1380 MXSINJ = MAX( MXSINJ, SFMIN ) 1381 ELSE 1382 CALL DCOPY( M, A( 1, q ), 1, 1383 $ WORK( N+1 ), 1 ) 1384 CALL DLASCL( 'G', 0, 0, AAQQ, ONE, 1385 $ M, 1, WORK( N+1 ), LDA, 1386 $ IERR ) 1387 CALL DLASCL( 'G', 0, 0, AAPP, ONE, 1388 $ M, 1, A( 1, p ), LDA, 1389 $ IERR ) 1390 TEMP1 = -AAPQ*WORK( q ) / WORK( p ) 1391 CALL DAXPY( M, TEMP1, WORK( N+1 ), 1392 $ 1, A( 1, p ), 1 ) 1393 CALL DLASCL( 'G', 0, 0, ONE, AAPP, 1394 $ M, 1, A( 1, p ), LDA, 1395 $ IERR ) 1396 SVA( p ) = AAPP*DSQRT( MAX( ZERO, 1397 $ ONE-AAPQ*AAPQ ) ) 1398 MXSINJ = MAX( MXSINJ, SFMIN ) 1399 END IF 1400 END IF 1401* END IF ROTOK THEN ... ELSE 1402* 1403* In the case of cancellation in updating SVA(q) 1404* .. recompute SVA(q) 1405 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS ) 1406 $ THEN 1407 IF( ( AAQQ.LT.ROOTBIG ) .AND. 1408 $ ( AAQQ.GT.ROOTSFMIN ) ) THEN 1409 SVA( q ) = DNRM2( M, A( 1, q ), 1 )* 1410 $ WORK( q ) 1411 ELSE 1412 T = ZERO 1413 AAQQ = ONE 1414 CALL DLASSQ( M, A( 1, q ), 1, T, 1415 $ AAQQ ) 1416 SVA( q ) = T*DSQRT( AAQQ )*WORK( q ) 1417 END IF 1418 END IF 1419 IF( ( AAPP / AAPP0 )**2.LE.ROOTEPS ) THEN 1420 IF( ( AAPP.LT.ROOTBIG ) .AND. 1421 $ ( AAPP.GT.ROOTSFMIN ) ) THEN 1422 AAPP = DNRM2( M, A( 1, p ), 1 )* 1423 $ WORK( p ) 1424 ELSE 1425 T = ZERO 1426 AAPP = ONE 1427 CALL DLASSQ( M, A( 1, p ), 1, T, 1428 $ AAPP ) 1429 AAPP = T*DSQRT( AAPP )*WORK( p ) 1430 END IF 1431 SVA( p ) = AAPP 1432 END IF 1433* end of OK rotation 1434 ELSE 1435 NOTROT = NOTROT + 1 1436*[RTD] SKIPPED = SKIPPED + 1 1437 PSKIPPED = PSKIPPED + 1 1438 IJBLSK = IJBLSK + 1 1439 END IF 1440 ELSE 1441 NOTROT = NOTROT + 1 1442 PSKIPPED = PSKIPPED + 1 1443 IJBLSK = IJBLSK + 1 1444 END IF 1445* 1446 IF( ( i.LE.SWBAND ) .AND. ( IJBLSK.GE.BLSKIP ) ) 1447 $ THEN 1448 SVA( p ) = AAPP 1449 NOTROT = 0 1450 GO TO 2011 1451 END IF 1452 IF( ( i.LE.SWBAND ) .AND. 1453 $ ( PSKIPPED.GT.ROWSKIP ) ) THEN 1454 AAPP = -AAPP 1455 NOTROT = 0 1456 GO TO 2203 1457 END IF 1458* 1459 2200 CONTINUE 1460* end of the q-loop 1461 2203 CONTINUE 1462* 1463 SVA( p ) = AAPP 1464* 1465 ELSE 1466* 1467 IF( AAPP.EQ.ZERO )NOTROT = NOTROT + 1468 $ MIN( jgl+KBL-1, N ) - jgl + 1 1469 IF( AAPP.LT.ZERO )NOTROT = 0 1470* 1471 END IF 1472* 1473 2100 CONTINUE 1474* end of the p-loop 1475 2010 CONTINUE 1476* end of the jbc-loop 1477 2011 CONTINUE 1478*2011 bailed out of the jbc-loop 1479 DO 2012 p = igl, MIN( igl+KBL-1, N ) 1480 SVA( p ) = DABS( SVA( p ) ) 1481 2012 CONTINUE 1482*** 1483 2000 CONTINUE 1484*2000 :: end of the ibr-loop 1485* 1486* .. update SVA(N) 1487 IF( ( SVA( N ).LT.ROOTBIG ) .AND. ( SVA( N ).GT.ROOTSFMIN ) ) 1488 $ THEN 1489 SVA( N ) = DNRM2( M, A( 1, N ), 1 )*WORK( N ) 1490 ELSE 1491 T = ZERO 1492 AAPP = ONE 1493 CALL DLASSQ( M, A( 1, N ), 1, T, AAPP ) 1494 SVA( N ) = T*DSQRT( AAPP )*WORK( N ) 1495 END IF 1496* 1497* Additional steering devices 1498* 1499 IF( ( i.LT.SWBAND ) .AND. ( ( MXAAPQ.LE.ROOTTOL ) .OR. 1500 $ ( ISWROT.LE.N ) ) )SWBAND = i 1501* 1502 IF( ( i.GT.SWBAND+1 ) .AND. ( MXAAPQ.LT.DSQRT( DBLE( N ) )* 1503 $ TOL ) .AND. ( DBLE( N )*MXAAPQ*MXSINJ.LT.TOL ) ) THEN 1504 GO TO 1994 1505 END IF 1506* 1507 IF( NOTROT.GE.EMPTSW )GO TO 1994 1508* 1509 1993 CONTINUE 1510* end i=1:NSWEEP loop 1511* 1512* #:( Reaching this point means that the procedure has not converged. 1513 INFO = NSWEEP - 1 1514 GO TO 1995 1515* 1516 1994 CONTINUE 1517* #:) Reaching this point means numerical convergence after the i-th 1518* sweep. 1519* 1520 INFO = 0 1521* #:) INFO = 0 confirms successful iterations. 1522 1995 CONTINUE 1523* 1524* Sort the singular values and find how many are above 1525* the underflow threshold. 1526* 1527 N2 = 0 1528 N4 = 0 1529 DO 5991 p = 1, N - 1 1530 q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1 1531 IF( p.NE.q ) THEN 1532 TEMP1 = SVA( p ) 1533 SVA( p ) = SVA( q ) 1534 SVA( q ) = TEMP1 1535 TEMP1 = WORK( p ) 1536 WORK( p ) = WORK( q ) 1537 WORK( q ) = TEMP1 1538 CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 ) 1539 IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1, V( 1, q ), 1 ) 1540 END IF 1541 IF( SVA( p ).NE.ZERO ) THEN 1542 N4 = N4 + 1 1543 IF( SVA( p )*SKL.GT.SFMIN )N2 = N2 + 1 1544 END IF 1545 5991 CONTINUE 1546 IF( SVA( N ).NE.ZERO ) THEN 1547 N4 = N4 + 1 1548 IF( SVA( N )*SKL.GT.SFMIN )N2 = N2 + 1 1549 END IF 1550* 1551* Normalize the left singular vectors. 1552* 1553 IF( LSVEC .OR. UCTOL ) THEN 1554 DO 1998 p = 1, N2 1555 CALL DSCAL( M, WORK( p ) / SVA( p ), A( 1, p ), 1 ) 1556 1998 CONTINUE 1557 END IF 1558* 1559* Scale the product of Jacobi rotations (assemble the fast rotations). 1560* 1561 IF( RSVEC ) THEN 1562 IF( APPLV ) THEN 1563 DO 2398 p = 1, N 1564 CALL DSCAL( MVL, WORK( p ), V( 1, p ), 1 ) 1565 2398 CONTINUE 1566 ELSE 1567 DO 2399 p = 1, N 1568 TEMP1 = ONE / DNRM2( MVL, V( 1, p ), 1 ) 1569 CALL DSCAL( MVL, TEMP1, V( 1, p ), 1 ) 1570 2399 CONTINUE 1571 END IF 1572 END IF 1573* 1574* Undo scaling, if necessary (and possible). 1575 IF( ( ( SKL.GT.ONE ) .AND. ( SVA( 1 ).LT.( BIG / SKL) ) ) 1576 $ .OR. ( ( SKL.LT.ONE ) .AND. ( SVA( MAX( N2, 1 ) ) .GT. 1577 $ ( SFMIN / SKL) ) ) ) THEN 1578 DO 2400 p = 1, N 1579 SVA( P ) = SKL*SVA( P ) 1580 2400 CONTINUE 1581 SKL= ONE 1582 END IF 1583* 1584 WORK( 1 ) = SKL 1585* The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE 1586* then some of the singular values may overflow or underflow and 1587* the spectrum is given in this factored representation. 1588* 1589 WORK( 2 ) = DBLE( N4 ) 1590* N4 is the number of computed nonzero singular values of A. 1591* 1592 WORK( 3 ) = DBLE( N2 ) 1593* N2 is the number of singular values of A greater than SFMIN. 1594* If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers 1595* that may carry some information. 1596* 1597 WORK( 4 ) = DBLE( i ) 1598* i is the index of the last sweep before declaring convergence. 1599* 1600 WORK( 5 ) = MXAAPQ 1601* MXAAPQ is the largest absolute value of scaled pivots in the 1602* last sweep 1603* 1604 WORK( 6 ) = MXSINJ 1605* MXSINJ is the largest absolute value of the sines of Jacobi angles 1606* in the last sweep 1607* 1608 RETURN 1609* .. 1610* .. END OF DGESVJ 1611* .. 1612 END 1613