1*> \brief \b SGESVJ 2* 3* =========== DOCUMENTATION =========== 4* 5* Online html documentation available at 6* http://www.netlib.org/lapack/explore-html/ 7* 8*> \htmlonly 9*> Download SGESVJ + dependencies 10*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/sgesvj.f"> 11*> [TGZ]</a> 12*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/sgesvj.f"> 13*> [ZIP]</a> 14*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/sgesvj.f"> 15*> [TXT]</a> 16*> \endhtmlonly 17* 18* Definition: 19* =========== 20* 21* SUBROUTINE SGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, 22* LDV, WORK, LWORK, INFO ) 23* 24* .. Scalar Arguments .. 25* INTEGER INFO, LDA, LDV, LWORK, M, MV, N 26* CHARACTER*1 JOBA, JOBU, JOBV 27* .. 28* .. Array Arguments .. 29* REAL A( LDA, * ), SVA( N ), V( LDV, * ), 30* $ WORK( LWORK ) 31* .. 32* 33* 34*> \par Purpose: 35* ============= 36*> 37*> \verbatim 38*> 39*> SGESVJ computes the singular value decomposition (SVD) of a real 40*> M-by-N matrix A, where M >= N. The SVD of A is written as 41*> [++] [xx] [x0] [xx] 42*> A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx] 43*> [++] [xx] 44*> where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal 45*> matrix, and V is an N-by-N orthogonal matrix. The diagonal elements 46*> of SIGMA are the singular values of A. The columns of U and V are the 47*> left and the right singular vectors of A, respectively. 48*> SGESVJ can sometimes compute tiny singular values and their singular vectors much 49*> more accurately than other SVD routines, see below under Further Details. 50*> \endverbatim 51* 52* Arguments: 53* ========== 54* 55*> \param[in] JOBA 56*> \verbatim 57*> JOBA is CHARACTER*1 58*> Specifies the structure of A. 59*> = 'L': The input matrix A is lower triangular; 60*> = 'U': The input matrix A is upper triangular; 61*> = 'G': The input matrix A is general M-by-N matrix, M >= N. 62*> \endverbatim 63*> 64*> \param[in] JOBU 65*> \verbatim 66*> JOBU is CHARACTER*1 67*> Specifies whether to compute the left singular vectors 68*> (columns of U): 69*> = 'U': The left singular vectors corresponding to the nonzero 70*> singular values are computed and returned in the leading 71*> columns of A. See more details in the description of A. 72*> The default numerical orthogonality threshold is set to 73*> approximately TOL=CTOL*EPS, CTOL=SQRT(M), EPS=SLAMCH('E'). 74*> = 'C': Analogous to JOBU='U', except that user can control the 75*> level of numerical orthogonality of the computed left 76*> singular vectors. TOL can be set to TOL = CTOL*EPS, where 77*> CTOL is given on input in the array WORK. 78*> No CTOL smaller than ONE is allowed. CTOL greater 79*> than 1 / EPS is meaningless. The option 'C' 80*> can be used if M*EPS is satisfactory orthogonality 81*> of the computed left singular vectors, so CTOL=M could 82*> save few sweeps of Jacobi rotations. 83*> See the descriptions of A and WORK(1). 84*> = 'N': The matrix U is not computed. However, see the 85*> description of A. 86*> \endverbatim 87*> 88*> \param[in] JOBV 89*> \verbatim 90*> JOBV is CHARACTER*1 91*> Specifies whether to compute the right singular vectors, that 92*> is, the matrix V: 93*> = 'V': the matrix V is computed and returned in the array V 94*> = 'A': the Jacobi rotations are applied to the MV-by-N 95*> array V. In other words, the right singular vector 96*> matrix V is not computed explicitly; instead it is 97*> applied to an MV-by-N matrix initially stored in the 98*> first MV rows of V. 99*> = 'N': the matrix V is not computed and the array V is not 100*> referenced 101*> \endverbatim 102*> 103*> \param[in] M 104*> \verbatim 105*> M is INTEGER 106*> The number of rows of the input matrix A. 1/SLAMCH('E') > M >= 0. 107*> \endverbatim 108*> 109*> \param[in] N 110*> \verbatim 111*> N is INTEGER 112*> The number of columns of the input matrix A. 113*> M >= N >= 0. 114*> \endverbatim 115*> 116*> \param[in,out] A 117*> \verbatim 118*> A is REAL array, dimension (LDA,N) 119*> On entry, the M-by-N matrix A. 120*> On exit, 121*> If JOBU = 'U' .OR. JOBU = 'C': 122*> If INFO = 0: 123*> RANKA orthonormal columns of U are returned in the 124*> leading RANKA columns of the array A. Here RANKA <= N 125*> is the number of computed singular values of A that are 126*> above the underflow threshold SLAMCH('S'). The singular 127*> vectors corresponding to underflowed or zero singular 128*> values are not computed. The value of RANKA is returned 129*> in the array WORK as RANKA=NINT(WORK(2)). Also see the 130*> descriptions of SVA and WORK. The computed columns of U 131*> are mutually numerically orthogonal up to approximately 132*> TOL=SQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'), 133*> see the description of JOBU. 134*> If INFO > 0, 135*> the procedure SGESVJ did not converge in the given number 136*> of iterations (sweeps). In that case, the computed 137*> columns of U may not be orthogonal up to TOL. The output 138*> U (stored in A), SIGMA (given by the computed singular 139*> values in SVA(1:N)) and V is still a decomposition of the 140*> input matrix A in the sense that the residual 141*> ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small. 142*> If JOBU = 'N': 143*> If INFO = 0: 144*> Note that the left singular vectors are 'for free' in the 145*> one-sided Jacobi SVD algorithm. However, if only the 146*> singular values are needed, the level of numerical 147*> orthogonality of U is not an issue and iterations are 148*> stopped when the columns of the iterated matrix are 149*> numerically orthogonal up to approximately M*EPS. Thus, 150*> on exit, A contains the columns of U scaled with the 151*> corresponding singular values. 152*> If INFO > 0: 153*> the procedure SGESVJ did not converge in the given number 154*> of iterations (sweeps). 155*> \endverbatim 156*> 157*> \param[in] LDA 158*> \verbatim 159*> LDA is INTEGER 160*> The leading dimension of the array A. LDA >= max(1,M). 161*> \endverbatim 162*> 163*> \param[out] SVA 164*> \verbatim 165*> SVA is REAL array, dimension (N) 166*> On exit, 167*> If INFO = 0 : 168*> depending on the value SCALE = WORK(1), we have: 169*> If SCALE = ONE: 170*> SVA(1:N) contains the computed singular values of A. 171*> During the computation SVA contains the Euclidean column 172*> norms of the iterated matrices in the array A. 173*> If SCALE .NE. ONE: 174*> The singular values of A are SCALE*SVA(1:N), and this 175*> factored representation is due to the fact that some of the 176*> singular values of A might underflow or overflow. 177*> 178*> If INFO > 0 : 179*> the procedure SGESVJ did not converge in the given number of 180*> iterations (sweeps) and SCALE*SVA(1:N) may not be accurate. 181*> \endverbatim 182*> 183*> \param[in] MV 184*> \verbatim 185*> MV is INTEGER 186*> If JOBV = 'A', then the product of Jacobi rotations in SGESVJ 187*> is applied to the first MV rows of V. See the description of JOBV. 188*> \endverbatim 189*> 190*> \param[in,out] V 191*> \verbatim 192*> V is REAL array, dimension (LDV,N) 193*> If JOBV = 'V', then V contains on exit the N-by-N matrix of 194*> the right singular vectors; 195*> If JOBV = 'A', then V contains the product of the computed right 196*> singular vector matrix and the initial matrix in 197*> the array V. 198*> If JOBV = 'N', then V is not referenced. 199*> \endverbatim 200*> 201*> \param[in] LDV 202*> \verbatim 203*> LDV is INTEGER 204*> The leading dimension of the array V, LDV >= 1. 205*> If JOBV = 'V', then LDV >= max(1,N). 206*> If JOBV = 'A', then LDV >= max(1,MV) . 207*> \endverbatim 208*> 209*> \param[in,out] WORK 210*> \verbatim 211*> WORK is REAL array, dimension (LWORK) 212*> On entry, 213*> If JOBU = 'C' : 214*> WORK(1) = CTOL, where CTOL defines the threshold for convergence. 215*> The process stops if all columns of A are mutually 216*> orthogonal up to CTOL*EPS, EPS=SLAMCH('E'). 217*> It is required that CTOL >= ONE, i.e. it is not 218*> allowed to force the routine to obtain orthogonality 219*> below EPSILON. 220*> On exit, 221*> WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N) 222*> are the computed singular vcalues of A. 223*> (See description of SVA().) 224*> WORK(2) = NINT(WORK(2)) is the number of the computed nonzero 225*> singular values. 226*> WORK(3) = NINT(WORK(3)) is the number of the computed singular 227*> values that are larger than the underflow threshold. 228*> WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi 229*> rotations needed for numerical convergence. 230*> WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep. 231*> This is useful information in cases when SGESVJ did 232*> not converge, as it can be used to estimate whether 233*> the output is still useful and for post festum analysis. 234*> WORK(6) = the largest absolute value over all sines of the 235*> Jacobi rotation angles in the last sweep. It can be 236*> useful for a post festum analysis. 237*> \endverbatim 238*> 239*> \param[in] LWORK 240*> \verbatim 241*> LWORK is INTEGER 242*> length of WORK, WORK >= MAX(6,M+N) 243*> \endverbatim 244*> 245*> \param[out] INFO 246*> \verbatim 247*> INFO is INTEGER 248*> = 0: successful exit. 249*> < 0: if INFO = -i, then the i-th argument had an illegal value 250*> > 0: SGESVJ did not converge in the maximal allowed number (30) 251*> of sweeps. The output may still be useful. See the 252*> description of WORK. 253*> \endverbatim 254* 255* Authors: 256* ======== 257* 258*> \author Univ. of Tennessee 259*> \author Univ. of California Berkeley 260*> \author Univ. of Colorado Denver 261*> \author NAG Ltd. 262* 263*> \ingroup realGEcomputational 264* 265*> \par Further Details: 266* ===================== 267*> 268*> The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane 269*> rotations. The rotations are implemented as fast scaled rotations of 270*> Anda and Park [1]. In the case of underflow of the Jacobi angle, a 271*> modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses 272*> column interchanges of de Rijk [2]. The relative accuracy of the computed 273*> singular values and the accuracy of the computed singular vectors (in 274*> angle metric) is as guaranteed by the theory of Demmel and Veselic [3]. 275*> The condition number that determines the accuracy in the full rank case 276*> is essentially min_{D=diag} kappa(A*D), where kappa(.) is the 277*> spectral condition number. The best performance of this Jacobi SVD 278*> procedure is achieved if used in an accelerated version of Drmac and 279*> Veselic [5,6], and it is the kernel routine in the SIGMA library [7]. 280*> Some tuning parameters (marked with [TP]) are available for the 281*> implementer. \n 282*> The computational range for the nonzero singular values is the machine 283*> number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even 284*> denormalized singular values can be computed with the corresponding 285*> gradual loss of accurate digits. 286*> 287*> \par Contributors: 288* ================== 289*> 290*> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany) 291*> 292*> \par References: 293* ================ 294*> 295*> [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling. \n 296*> SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174. \n\n 297*> [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the 298*> singular value decomposition on a vector computer. \n 299*> SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371. \n\n 300*> [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR. \n 301*> [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular 302*> value computation in floating point arithmetic. \n 303*> SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222. \n\n 304*> [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I. \n 305*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342. \n 306*> LAPACK Working note 169. \n\n 307*> [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II. \n 308*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362. \n 309*> LAPACK Working note 170. \n\n 310*> [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, 311*> QSVD, (H,K)-SVD computations.\n 312*> Department of Mathematics, University of Zagreb, 2008. 313*> 314*> \par Bugs, Examples and Comments: 315* ================================= 316*> 317*> Please report all bugs and send interesting test examples and comments to 318*> drmac@math.hr. Thank you. 319* 320* ===================================================================== 321 SUBROUTINE SGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, 322 $ LDV, WORK, LWORK, INFO ) 323* 324* -- LAPACK computational routine -- 325* -- LAPACK is a software package provided by Univ. of Tennessee, -- 326* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..-- 327* 328* .. Scalar Arguments .. 329 INTEGER INFO, LDA, LDV, LWORK, M, MV, N 330 CHARACTER*1 JOBA, JOBU, JOBV 331* .. 332* .. Array Arguments .. 333 REAL A( LDA, * ), SVA( N ), V( LDV, * ), 334 $ WORK( LWORK ) 335* .. 336* 337* ===================================================================== 338* 339* .. Local Parameters .. 340 REAL ZERO, HALF, ONE 341 PARAMETER ( ZERO = 0.0E0, HALF = 0.5E0, ONE = 1.0E0) 342 INTEGER NSWEEP 343 PARAMETER ( NSWEEP = 30 ) 344* .. 345* .. Local Scalars .. 346 REAL AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG, 347 $ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ, 348 $ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL, 349 $ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA, 350 $ THSIGN, TOL 351 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1, 352 $ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34, 353 $ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP, 354 $ SWBAND 355 LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK, 356 $ RSVEC, UCTOL, UPPER 357* .. 358* .. Local Arrays .. 359 REAL FASTR( 5 ) 360* .. 361* .. Intrinsic Functions .. 362 INTRINSIC ABS, MAX, MIN, FLOAT, SIGN, SQRT 363* .. 364* .. External Functions .. 365* .. 366* from BLAS 367 REAL SDOT, SNRM2 368 EXTERNAL SDOT, SNRM2 369 INTEGER ISAMAX 370 EXTERNAL ISAMAX 371* from LAPACK 372 REAL SLAMCH 373 EXTERNAL SLAMCH 374 LOGICAL LSAME 375 EXTERNAL LSAME 376* .. 377* .. External Subroutines .. 378* .. 379* from BLAS 380 EXTERNAL SAXPY, SCOPY, SROTM, SSCAL, SSWAP 381* from LAPACK 382 EXTERNAL SLASCL, SLASET, SLASSQ, XERBLA 383* 384 EXTERNAL SGSVJ0, SGSVJ1 385* .. 386* .. Executable Statements .. 387* 388* Test the input arguments 389* 390 LSVEC = LSAME( JOBU, 'U' ) 391 UCTOL = LSAME( JOBU, 'C' ) 392 RSVEC = LSAME( JOBV, 'V' ) 393 APPLV = LSAME( JOBV, 'A' ) 394 UPPER = LSAME( JOBA, 'U' ) 395 LOWER = LSAME( JOBA, 'L' ) 396* 397 IF( .NOT.( UPPER .OR. LOWER .OR. LSAME( JOBA, 'G' ) ) ) THEN 398 INFO = -1 399 ELSE IF( .NOT.( LSVEC .OR. UCTOL .OR. LSAME( JOBU, 'N' ) ) ) THEN 400 INFO = -2 401 ELSE IF( .NOT.( RSVEC .OR. APPLV .OR. LSAME( JOBV, 'N' ) ) ) THEN 402 INFO = -3 403 ELSE IF( M.LT.0 ) THEN 404 INFO = -4 405 ELSE IF( ( N.LT.0 ) .OR. ( N.GT.M ) ) THEN 406 INFO = -5 407 ELSE IF( LDA.LT.M ) THEN 408 INFO = -7 409 ELSE IF( MV.LT.0 ) THEN 410 INFO = -9 411 ELSE IF( ( RSVEC .AND. ( LDV.LT.N ) ) .OR. 412 $ ( APPLV .AND. ( LDV.LT.MV ) ) ) THEN 413 INFO = -11 414 ELSE IF( UCTOL .AND. ( WORK( 1 ).LE.ONE ) ) THEN 415 INFO = -12 416 ELSE IF( LWORK.LT.MAX( M+N, 6 ) ) THEN 417 INFO = -13 418 ELSE 419 INFO = 0 420 END IF 421* 422* #:( 423 IF( INFO.NE.0 ) THEN 424 CALL XERBLA( 'SGESVJ', -INFO ) 425 RETURN 426 END IF 427* 428* #:) Quick return for void matrix 429* 430 IF( ( M.EQ.0 ) .OR. ( N.EQ.0 ) )RETURN 431* 432* Set numerical parameters 433* The stopping criterion for Jacobi rotations is 434* 435* max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS 436* 437* where EPS is the round-off and CTOL is defined as follows: 438* 439 IF( UCTOL ) THEN 440* ... user controlled 441 CTOL = WORK( 1 ) 442 ELSE 443* ... default 444 IF( LSVEC .OR. RSVEC .OR. APPLV ) THEN 445 CTOL = SQRT( FLOAT( M ) ) 446 ELSE 447 CTOL = FLOAT( M ) 448 END IF 449 END IF 450* ... and the machine dependent parameters are 451*[!] (Make sure that SLAMCH() works properly on the target machine.) 452* 453 EPSLN = SLAMCH( 'Epsilon' ) 454 ROOTEPS = SQRT( EPSLN ) 455 SFMIN = SLAMCH( 'SafeMinimum' ) 456 ROOTSFMIN = SQRT( SFMIN ) 457 SMALL = SFMIN / EPSLN 458 BIG = SLAMCH( 'Overflow' ) 459* BIG = ONE / SFMIN 460 ROOTBIG = ONE / ROOTSFMIN 461 LARGE = BIG / SQRT( FLOAT( M*N ) ) 462 BIGTHETA = ONE / ROOTEPS 463* 464 TOL = CTOL*EPSLN 465 ROOTTOL = SQRT( TOL ) 466* 467 IF( FLOAT( M )*EPSLN.GE.ONE ) THEN 468 INFO = -4 469 CALL XERBLA( 'SGESVJ', -INFO ) 470 RETURN 471 END IF 472* 473* Initialize the right singular vector matrix. 474* 475 IF( RSVEC ) THEN 476 MVL = N 477 CALL SLASET( 'A', MVL, N, ZERO, ONE, V, LDV ) 478 ELSE IF( APPLV ) THEN 479 MVL = MV 480 END IF 481 RSVEC = RSVEC .OR. APPLV 482* 483* Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N ) 484*(!) If necessary, scale A to protect the largest singular value 485* from overflow. It is possible that saving the largest singular 486* value destroys the information about the small ones. 487* This initial scaling is almost minimal in the sense that the 488* goal is to make sure that no column norm overflows, and that 489* SQRT(N)*max_i SVA(i) does not overflow. If INFinite entries 490* in A are detected, the procedure returns with INFO=-6. 491* 492 SKL = ONE / SQRT( FLOAT( M )*FLOAT( N ) ) 493 NOSCALE = .TRUE. 494 GOSCALE = .TRUE. 495* 496 IF( LOWER ) THEN 497* the input matrix is M-by-N lower triangular (trapezoidal) 498 DO 1874 p = 1, N 499 AAPP = ZERO 500 AAQQ = ONE 501 CALL SLASSQ( M-p+1, A( p, p ), 1, AAPP, AAQQ ) 502 IF( AAPP.GT.BIG ) THEN 503 INFO = -6 504 CALL XERBLA( 'SGESVJ', -INFO ) 505 RETURN 506 END IF 507 AAQQ = SQRT( AAQQ ) 508 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 509 SVA( p ) = AAPP*AAQQ 510 ELSE 511 NOSCALE = .FALSE. 512 SVA( p ) = AAPP*( AAQQ*SKL ) 513 IF( GOSCALE ) THEN 514 GOSCALE = .FALSE. 515 DO 1873 q = 1, p - 1 516 SVA( q ) = SVA( q )*SKL 517 1873 CONTINUE 518 END IF 519 END IF 520 1874 CONTINUE 521 ELSE IF( UPPER ) THEN 522* the input matrix is M-by-N upper triangular (trapezoidal) 523 DO 2874 p = 1, N 524 AAPP = ZERO 525 AAQQ = ONE 526 CALL SLASSQ( p, A( 1, p ), 1, AAPP, AAQQ ) 527 IF( AAPP.GT.BIG ) THEN 528 INFO = -6 529 CALL XERBLA( 'SGESVJ', -INFO ) 530 RETURN 531 END IF 532 AAQQ = SQRT( AAQQ ) 533 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 534 SVA( p ) = AAPP*AAQQ 535 ELSE 536 NOSCALE = .FALSE. 537 SVA( p ) = AAPP*( AAQQ*SKL ) 538 IF( GOSCALE ) THEN 539 GOSCALE = .FALSE. 540 DO 2873 q = 1, p - 1 541 SVA( q ) = SVA( q )*SKL 542 2873 CONTINUE 543 END IF 544 END IF 545 2874 CONTINUE 546 ELSE 547* the input matrix is M-by-N general dense 548 DO 3874 p = 1, N 549 AAPP = ZERO 550 AAQQ = ONE 551 CALL SLASSQ( M, A( 1, p ), 1, AAPP, AAQQ ) 552 IF( AAPP.GT.BIG ) THEN 553 INFO = -6 554 CALL XERBLA( 'SGESVJ', -INFO ) 555 RETURN 556 END IF 557 AAQQ = SQRT( AAQQ ) 558 IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 559 SVA( p ) = AAPP*AAQQ 560 ELSE 561 NOSCALE = .FALSE. 562 SVA( p ) = AAPP*( AAQQ*SKL ) 563 IF( GOSCALE ) THEN 564 GOSCALE = .FALSE. 565 DO 3873 q = 1, p - 1 566 SVA( q ) = SVA( q )*SKL 567 3873 CONTINUE 568 END IF 569 END IF 570 3874 CONTINUE 571 END IF 572* 573 IF( NOSCALE )SKL = ONE 574* 575* Move the smaller part of the spectrum from the underflow threshold 576*(!) Start by determining the position of the nonzero entries of the 577* array SVA() relative to ( SFMIN, BIG ). 578* 579 AAPP = ZERO 580 AAQQ = BIG 581 DO 4781 p = 1, N 582 IF( SVA( p ).NE.ZERO )AAQQ = MIN( AAQQ, SVA( p ) ) 583 AAPP = MAX( AAPP, SVA( p ) ) 584 4781 CONTINUE 585* 586* #:) Quick return for zero matrix 587* 588 IF( AAPP.EQ.ZERO ) THEN 589 IF( LSVEC )CALL SLASET( 'G', M, N, ZERO, ONE, A, LDA ) 590 WORK( 1 ) = ONE 591 WORK( 2 ) = ZERO 592 WORK( 3 ) = ZERO 593 WORK( 4 ) = ZERO 594 WORK( 5 ) = ZERO 595 WORK( 6 ) = ZERO 596 RETURN 597 END IF 598* 599* #:) Quick return for one-column matrix 600* 601 IF( N.EQ.1 ) THEN 602 IF( LSVEC )CALL SLASCL( 'G', 0, 0, SVA( 1 ), SKL, M, 1, 603 $ A( 1, 1 ), LDA, IERR ) 604 WORK( 1 ) = ONE / SKL 605 IF( SVA( 1 ).GE.SFMIN ) THEN 606 WORK( 2 ) = ONE 607 ELSE 608 WORK( 2 ) = ZERO 609 END IF 610 WORK( 3 ) = ZERO 611 WORK( 4 ) = ZERO 612 WORK( 5 ) = ZERO 613 WORK( 6 ) = ZERO 614 RETURN 615 END IF 616* 617* Protect small singular values from underflow, and try to 618* avoid underflows/overflows in computing Jacobi rotations. 619* 620 SN = SQRT( SFMIN / EPSLN ) 621 TEMP1 = SQRT( BIG / FLOAT( N ) ) 622 IF( ( AAPP.LE.SN ) .OR. ( AAQQ.GE.TEMP1 ) .OR. 623 $ ( ( SN.LE.AAQQ ) .AND. ( AAPP.LE.TEMP1 ) ) ) THEN 624 TEMP1 = MIN( BIG, TEMP1 / AAPP ) 625* AAQQ = AAQQ*TEMP1 626* AAPP = AAPP*TEMP1 627 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.LE.TEMP1 ) ) THEN 628 TEMP1 = MIN( SN / AAQQ, BIG / ( AAPP*SQRT( FLOAT( N ) ) ) ) 629* AAQQ = AAQQ*TEMP1 630* AAPP = AAPP*TEMP1 631 ELSE IF( ( AAQQ.GE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN 632 TEMP1 = MAX( SN / AAQQ, TEMP1 / AAPP ) 633* AAQQ = AAQQ*TEMP1 634* AAPP = AAPP*TEMP1 635 ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN 636 TEMP1 = MIN( SN / AAQQ, BIG / ( SQRT( FLOAT( N ) )*AAPP ) ) 637* AAQQ = AAQQ*TEMP1 638* AAPP = AAPP*TEMP1 639 ELSE 640 TEMP1 = ONE 641 END IF 642* 643* Scale, if necessary 644* 645 IF( TEMP1.NE.ONE ) THEN 646 CALL SLASCL( 'G', 0, 0, ONE, TEMP1, N, 1, SVA, N, IERR ) 647 END IF 648 SKL = TEMP1*SKL 649 IF( SKL.NE.ONE ) THEN 650 CALL SLASCL( JOBA, 0, 0, ONE, SKL, M, N, A, LDA, IERR ) 651 SKL = ONE / SKL 652 END IF 653* 654* Row-cyclic Jacobi SVD algorithm with column pivoting 655* 656 EMPTSW = ( N*( N-1 ) ) / 2 657 NOTROT = 0 658 FASTR( 1 ) = ZERO 659* 660* A is represented in factored form A = A * diag(WORK), where diag(WORK) 661* is initialized to identity. WORK is updated during fast scaled 662* rotations. 663* 664 DO 1868 q = 1, N 665 WORK( q ) = ONE 666 1868 CONTINUE 667* 668* 669 SWBAND = 3 670*[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective 671* if SGESVJ is used as a computational routine in the preconditioned 672* Jacobi SVD algorithm SGESVJ. For sweeps i=1:SWBAND the procedure 673* works on pivots inside a band-like region around the diagonal. 674* The boundaries are determined dynamically, based on the number of 675* pivots above a threshold. 676* 677 KBL = MIN( 8, N ) 678*[TP] KBL is a tuning parameter that defines the tile size in the 679* tiling of the p-q loops of pivot pairs. In general, an optimal 680* value of KBL depends on the matrix dimensions and on the 681* parameters of the computer's memory. 682* 683 NBL = N / KBL 684 IF( ( NBL*KBL ).NE.N )NBL = NBL + 1 685* 686 BLSKIP = KBL**2 687*[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL. 688* 689 ROWSKIP = MIN( 5, KBL ) 690*[TP] ROWSKIP is a tuning parameter. 691* 692 LKAHEAD = 1 693*[TP] LKAHEAD is a tuning parameter. 694* 695* Quasi block transformations, using the lower (upper) triangular 696* structure of the input matrix. The quasi-block-cycling usually 697* invokes cubic convergence. Big part of this cycle is done inside 698* canonical subspaces of dimensions less than M. 699* 700 IF( ( LOWER .OR. UPPER ) .AND. ( N.GT.MAX( 64, 4*KBL ) ) ) THEN 701*[TP] The number of partition levels and the actual partition are 702* tuning parameters. 703 N4 = N / 4 704 N2 = N / 2 705 N34 = 3*N4 706 IF( APPLV ) THEN 707 q = 0 708 ELSE 709 q = 1 710 END IF 711* 712 IF( LOWER ) THEN 713* 714* This works very well on lower triangular matrices, in particular 715* in the framework of the preconditioned Jacobi SVD (xGEJSV). 716* The idea is simple: 717* [+ 0 0 0] Note that Jacobi transformations of [0 0] 718* [+ + 0 0] [0 0] 719* [+ + x 0] actually work on [x 0] [x 0] 720* [+ + x x] [x x]. [x x] 721* 722 CALL SGSVJ0( JOBV, M-N34, N-N34, A( N34+1, N34+1 ), LDA, 723 $ WORK( N34+1 ), SVA( N34+1 ), MVL, 724 $ V( N34*q+1, N34+1 ), LDV, EPSLN, SFMIN, TOL, 725 $ 2, WORK( N+1 ), LWORK-N, IERR ) 726* 727 CALL SGSVJ0( JOBV, M-N2, N34-N2, A( N2+1, N2+1 ), LDA, 728 $ WORK( N2+1 ), SVA( N2+1 ), MVL, 729 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 2, 730 $ WORK( N+1 ), LWORK-N, IERR ) 731* 732 CALL SGSVJ1( JOBV, M-N2, N-N2, N4, A( N2+1, N2+1 ), LDA, 733 $ WORK( N2+1 ), SVA( N2+1 ), MVL, 734 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1, 735 $ WORK( N+1 ), LWORK-N, IERR ) 736* 737 CALL SGSVJ0( JOBV, M-N4, N2-N4, A( N4+1, N4+1 ), LDA, 738 $ WORK( N4+1 ), SVA( N4+1 ), MVL, 739 $ V( N4*q+1, N4+1 ), LDV, EPSLN, SFMIN, TOL, 1, 740 $ WORK( N+1 ), LWORK-N, IERR ) 741* 742 CALL SGSVJ0( JOBV, M, N4, A, LDA, WORK, SVA, MVL, V, LDV, 743 $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N, 744 $ IERR ) 745* 746 CALL SGSVJ1( JOBV, M, N2, N4, A, LDA, WORK, SVA, MVL, V, 747 $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ), 748 $ LWORK-N, IERR ) 749* 750* 751 ELSE IF( UPPER ) THEN 752* 753* 754 CALL SGSVJ0( JOBV, N4, N4, A, LDA, WORK, SVA, MVL, V, LDV, 755 $ EPSLN, SFMIN, TOL, 2, WORK( N+1 ), LWORK-N, 756 $ IERR ) 757* 758 CALL SGSVJ0( JOBV, N2, N4, A( 1, N4+1 ), LDA, WORK( N4+1 ), 759 $ SVA( N4+1 ), MVL, V( N4*q+1, N4+1 ), LDV, 760 $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N, 761 $ IERR ) 762* 763 CALL SGSVJ1( JOBV, N2, N2, N4, A, LDA, WORK, SVA, MVL, V, 764 $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ), 765 $ LWORK-N, IERR ) 766* 767 CALL SGSVJ0( JOBV, N2+N4, N4, A( 1, N2+1 ), LDA, 768 $ WORK( N2+1 ), SVA( N2+1 ), MVL, 769 $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1, 770 $ WORK( N+1 ), LWORK-N, IERR ) 771 772 END IF 773* 774 END IF 775* 776* .. Row-cyclic pivot strategy with de Rijk's pivoting .. 777* 778 DO 1993 i = 1, NSWEEP 779* 780* .. go go go ... 781* 782 MXAAPQ = ZERO 783 MXSINJ = ZERO 784 ISWROT = 0 785* 786 NOTROT = 0 787 PSKIPPED = 0 788* 789* Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs 790* 1 <= p < q <= N. This is the first step toward a blocked implementation 791* of the rotations. New implementation, based on block transformations, 792* is under development. 793* 794 DO 2000 ibr = 1, NBL 795* 796 igl = ( ibr-1 )*KBL + 1 797* 798 DO 1002 ir1 = 0, MIN( LKAHEAD, NBL-ibr ) 799* 800 igl = igl + ir1*KBL 801* 802 DO 2001 p = igl, MIN( igl+KBL-1, N-1 ) 803* 804* .. de Rijk's pivoting 805* 806 q = ISAMAX( N-p+1, SVA( p ), 1 ) + p - 1 807 IF( p.NE.q ) THEN 808 CALL SSWAP( M, A( 1, p ), 1, A( 1, q ), 1 ) 809 IF( RSVEC )CALL SSWAP( MVL, V( 1, p ), 1, 810 $ V( 1, q ), 1 ) 811 TEMP1 = SVA( p ) 812 SVA( p ) = SVA( q ) 813 SVA( q ) = TEMP1 814 TEMP1 = WORK( p ) 815 WORK( p ) = WORK( q ) 816 WORK( q ) = TEMP1 817 END IF 818* 819 IF( ir1.EQ.0 ) THEN 820* 821* Column norms are periodically updated by explicit 822* norm computation. 823* Caveat: 824* Unfortunately, some BLAS implementations compute SNRM2(M,A(1,p),1) 825* as SQRT(SDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to 826* overflow for ||A(:,p)||_2 > SQRT(overflow_threshold), and to 827* underflow for ||A(:,p)||_2 < SQRT(underflow_threshold). 828* Hence, SNRM2 cannot be trusted, not even in the case when 829* the true norm is far from the under(over)flow boundaries. 830* If properly implemented SNRM2 is available, the IF-THEN-ELSE 831* below should read "AAPP = SNRM2( M, A(1,p), 1 ) * WORK(p)". 832* 833 IF( ( SVA( p ).LT.ROOTBIG ) .AND. 834 $ ( SVA( p ).GT.ROOTSFMIN ) ) THEN 835 SVA( p ) = SNRM2( M, A( 1, p ), 1 )*WORK( p ) 836 ELSE 837 TEMP1 = ZERO 838 AAPP = ONE 839 CALL SLASSQ( M, A( 1, p ), 1, TEMP1, AAPP ) 840 SVA( p ) = TEMP1*SQRT( AAPP )*WORK( p ) 841 END IF 842 AAPP = SVA( p ) 843 ELSE 844 AAPP = SVA( p ) 845 END IF 846* 847 IF( AAPP.GT.ZERO ) THEN 848* 849 PSKIPPED = 0 850* 851 DO 2002 q = p + 1, MIN( igl+KBL-1, N ) 852* 853 AAQQ = SVA( q ) 854* 855 IF( AAQQ.GT.ZERO ) THEN 856* 857 AAPP0 = AAPP 858 IF( AAQQ.GE.ONE ) THEN 859 ROTOK = ( SMALL*AAPP ).LE.AAQQ 860 IF( AAPP.LT.( BIG / AAQQ ) ) THEN 861 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 862 $ q ), 1 )*WORK( p )*WORK( q ) / 863 $ AAQQ ) / AAPP 864 ELSE 865 CALL SCOPY( M, A( 1, p ), 1, 866 $ WORK( N+1 ), 1 ) 867 CALL SLASCL( 'G', 0, 0, AAPP, 868 $ WORK( p ), M, 1, 869 $ WORK( N+1 ), LDA, IERR ) 870 AAPQ = SDOT( M, WORK( N+1 ), 1, 871 $ A( 1, q ), 1 )*WORK( q ) / AAQQ 872 END IF 873 ELSE 874 ROTOK = AAPP.LE.( AAQQ / SMALL ) 875 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN 876 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 877 $ q ), 1 )*WORK( p )*WORK( q ) / 878 $ AAQQ ) / AAPP 879 ELSE 880 CALL SCOPY( M, A( 1, q ), 1, 881 $ WORK( N+1 ), 1 ) 882 CALL SLASCL( 'G', 0, 0, AAQQ, 883 $ WORK( q ), M, 1, 884 $ WORK( N+1 ), LDA, IERR ) 885 AAPQ = SDOT( M, WORK( N+1 ), 1, 886 $ A( 1, p ), 1 )*WORK( p ) / AAPP 887 END IF 888 END IF 889* 890 MXAAPQ = MAX( MXAAPQ, ABS( AAPQ ) ) 891* 892* TO rotate or NOT to rotate, THAT is the question ... 893* 894 IF( ABS( AAPQ ).GT.TOL ) THEN 895* 896* .. rotate 897*[RTD] ROTATED = ROTATED + ONE 898* 899 IF( ir1.EQ.0 ) THEN 900 NOTROT = 0 901 PSKIPPED = 0 902 ISWROT = ISWROT + 1 903 END IF 904* 905 IF( ROTOK ) THEN 906* 907 AQOAP = AAQQ / AAPP 908 APOAQ = AAPP / AAQQ 909 THETA = -HALF*ABS( AQOAP-APOAQ ) / AAPQ 910* 911 IF( ABS( THETA ).GT.BIGTHETA ) THEN 912* 913 T = HALF / THETA 914 FASTR( 3 ) = T*WORK( p ) / WORK( q ) 915 FASTR( 4 ) = -T*WORK( q ) / 916 $ WORK( p ) 917 CALL SROTM( M, A( 1, p ), 1, 918 $ A( 1, q ), 1, FASTR ) 919 IF( RSVEC )CALL SROTM( MVL, 920 $ V( 1, p ), 1, 921 $ V( 1, q ), 1, 922 $ FASTR ) 923 SVA( q ) = AAQQ*SQRT( MAX( ZERO, 924 $ ONE+T*APOAQ*AAPQ ) ) 925 AAPP = AAPP*SQRT( MAX( ZERO, 926 $ ONE-T*AQOAP*AAPQ ) ) 927 MXSINJ = MAX( MXSINJ, ABS( T ) ) 928* 929 ELSE 930* 931* .. choose correct signum for THETA and rotate 932* 933 THSIGN = -SIGN( ONE, AAPQ ) 934 T = ONE / ( THETA+THSIGN* 935 $ SQRT( ONE+THETA*THETA ) ) 936 CS = SQRT( ONE / ( ONE+T*T ) ) 937 SN = T*CS 938* 939 MXSINJ = MAX( MXSINJ, ABS( SN ) ) 940 SVA( q ) = AAQQ*SQRT( MAX( ZERO, 941 $ ONE+T*APOAQ*AAPQ ) ) 942 AAPP = AAPP*SQRT( MAX( ZERO, 943 $ ONE-T*AQOAP*AAPQ ) ) 944* 945 APOAQ = WORK( p ) / WORK( q ) 946 AQOAP = WORK( q ) / WORK( p ) 947 IF( WORK( p ).GE.ONE ) THEN 948 IF( WORK( q ).GE.ONE ) THEN 949 FASTR( 3 ) = T*APOAQ 950 FASTR( 4 ) = -T*AQOAP 951 WORK( p ) = WORK( p )*CS 952 WORK( q ) = WORK( q )*CS 953 CALL SROTM( M, A( 1, p ), 1, 954 $ A( 1, q ), 1, 955 $ FASTR ) 956 IF( RSVEC )CALL SROTM( MVL, 957 $ V( 1, p ), 1, V( 1, q ), 958 $ 1, FASTR ) 959 ELSE 960 CALL SAXPY( M, -T*AQOAP, 961 $ A( 1, q ), 1, 962 $ A( 1, p ), 1 ) 963 CALL SAXPY( M, CS*SN*APOAQ, 964 $ A( 1, p ), 1, 965 $ A( 1, q ), 1 ) 966 WORK( p ) = WORK( p )*CS 967 WORK( q ) = WORK( q ) / CS 968 IF( RSVEC ) THEN 969 CALL SAXPY( MVL, -T*AQOAP, 970 $ V( 1, q ), 1, 971 $ V( 1, p ), 1 ) 972 CALL SAXPY( MVL, 973 $ CS*SN*APOAQ, 974 $ V( 1, p ), 1, 975 $ V( 1, q ), 1 ) 976 END IF 977 END IF 978 ELSE 979 IF( WORK( q ).GE.ONE ) THEN 980 CALL SAXPY( M, T*APOAQ, 981 $ A( 1, p ), 1, 982 $ A( 1, q ), 1 ) 983 CALL SAXPY( M, -CS*SN*AQOAP, 984 $ A( 1, q ), 1, 985 $ A( 1, p ), 1 ) 986 WORK( p ) = WORK( p ) / CS 987 WORK( q ) = WORK( q )*CS 988 IF( RSVEC ) THEN 989 CALL SAXPY( MVL, T*APOAQ, 990 $ V( 1, p ), 1, 991 $ V( 1, q ), 1 ) 992 CALL SAXPY( MVL, 993 $ -CS*SN*AQOAP, 994 $ V( 1, q ), 1, 995 $ V( 1, p ), 1 ) 996 END IF 997 ELSE 998 IF( WORK( p ).GE.WORK( q ) ) 999 $ THEN 1000 CALL SAXPY( M, -T*AQOAP, 1001 $ A( 1, q ), 1, 1002 $ A( 1, p ), 1 ) 1003 CALL SAXPY( M, CS*SN*APOAQ, 1004 $ A( 1, p ), 1, 1005 $ A( 1, q ), 1 ) 1006 WORK( p ) = WORK( p )*CS 1007 WORK( q ) = WORK( q ) / CS 1008 IF( RSVEC ) THEN 1009 CALL SAXPY( MVL, 1010 $ -T*AQOAP, 1011 $ V( 1, q ), 1, 1012 $ V( 1, p ), 1 ) 1013 CALL SAXPY( MVL, 1014 $ CS*SN*APOAQ, 1015 $ V( 1, p ), 1, 1016 $ V( 1, q ), 1 ) 1017 END IF 1018 ELSE 1019 CALL SAXPY( M, T*APOAQ, 1020 $ A( 1, p ), 1, 1021 $ A( 1, q ), 1 ) 1022 CALL SAXPY( M, 1023 $ -CS*SN*AQOAP, 1024 $ A( 1, q ), 1, 1025 $ A( 1, p ), 1 ) 1026 WORK( p ) = WORK( p ) / CS 1027 WORK( q ) = WORK( q )*CS 1028 IF( RSVEC ) THEN 1029 CALL SAXPY( MVL, 1030 $ T*APOAQ, V( 1, p ), 1031 $ 1, V( 1, q ), 1 ) 1032 CALL SAXPY( MVL, 1033 $ -CS*SN*AQOAP, 1034 $ V( 1, q ), 1, 1035 $ V( 1, p ), 1 ) 1036 END IF 1037 END IF 1038 END IF 1039 END IF 1040 END IF 1041* 1042 ELSE 1043* .. have to use modified Gram-Schmidt like transformation 1044 CALL SCOPY( M, A( 1, p ), 1, 1045 $ WORK( N+1 ), 1 ) 1046 CALL SLASCL( 'G', 0, 0, AAPP, ONE, M, 1047 $ 1, WORK( N+1 ), LDA, 1048 $ IERR ) 1049 CALL SLASCL( 'G', 0, 0, AAQQ, ONE, M, 1050 $ 1, A( 1, q ), LDA, IERR ) 1051 TEMP1 = -AAPQ*WORK( p ) / WORK( q ) 1052 CALL SAXPY( M, TEMP1, WORK( N+1 ), 1, 1053 $ A( 1, q ), 1 ) 1054 CALL SLASCL( 'G', 0, 0, ONE, AAQQ, M, 1055 $ 1, A( 1, q ), LDA, IERR ) 1056 SVA( q ) = AAQQ*SQRT( MAX( ZERO, 1057 $ ONE-AAPQ*AAPQ ) ) 1058 MXSINJ = MAX( MXSINJ, SFMIN ) 1059 END IF 1060* END IF ROTOK THEN ... ELSE 1061* 1062* In the case of cancellation in updating SVA(q), SVA(p) 1063* recompute SVA(q), SVA(p). 1064* 1065 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS ) 1066 $ THEN 1067 IF( ( AAQQ.LT.ROOTBIG ) .AND. 1068 $ ( AAQQ.GT.ROOTSFMIN ) ) THEN 1069 SVA( q ) = SNRM2( M, A( 1, q ), 1 )* 1070 $ WORK( q ) 1071 ELSE 1072 T = ZERO 1073 AAQQ = ONE 1074 CALL SLASSQ( M, A( 1, q ), 1, T, 1075 $ AAQQ ) 1076 SVA( q ) = T*SQRT( AAQQ )*WORK( q ) 1077 END IF 1078 END IF 1079 IF( ( AAPP / AAPP0 ).LE.ROOTEPS ) THEN 1080 IF( ( AAPP.LT.ROOTBIG ) .AND. 1081 $ ( AAPP.GT.ROOTSFMIN ) ) THEN 1082 AAPP = SNRM2( M, A( 1, p ), 1 )* 1083 $ WORK( p ) 1084 ELSE 1085 T = ZERO 1086 AAPP = ONE 1087 CALL SLASSQ( M, A( 1, p ), 1, T, 1088 $ AAPP ) 1089 AAPP = T*SQRT( AAPP )*WORK( p ) 1090 END IF 1091 SVA( p ) = AAPP 1092 END IF 1093* 1094 ELSE 1095* A(:,p) and A(:,q) already numerically orthogonal 1096 IF( ir1.EQ.0 )NOTROT = NOTROT + 1 1097*[RTD] SKIPPED = SKIPPED + 1 1098 PSKIPPED = PSKIPPED + 1 1099 END IF 1100 ELSE 1101* A(:,q) is zero column 1102 IF( ir1.EQ.0 )NOTROT = NOTROT + 1 1103 PSKIPPED = PSKIPPED + 1 1104 END IF 1105* 1106 IF( ( i.LE.SWBAND ) .AND. 1107 $ ( PSKIPPED.GT.ROWSKIP ) ) THEN 1108 IF( ir1.EQ.0 )AAPP = -AAPP 1109 NOTROT = 0 1110 GO TO 2103 1111 END IF 1112* 1113 2002 CONTINUE 1114* END q-LOOP 1115* 1116 2103 CONTINUE 1117* bailed out of q-loop 1118* 1119 SVA( p ) = AAPP 1120* 1121 ELSE 1122 SVA( p ) = AAPP 1123 IF( ( ir1.EQ.0 ) .AND. ( AAPP.EQ.ZERO ) ) 1124 $ NOTROT = NOTROT + MIN( igl+KBL-1, N ) - p 1125 END IF 1126* 1127 2001 CONTINUE 1128* end of the p-loop 1129* end of doing the block ( ibr, ibr ) 1130 1002 CONTINUE 1131* end of ir1-loop 1132* 1133* ... go to the off diagonal blocks 1134* 1135 igl = ( ibr-1 )*KBL + 1 1136* 1137 DO 2010 jbc = ibr + 1, NBL 1138* 1139 jgl = ( jbc-1 )*KBL + 1 1140* 1141* doing the block at ( ibr, jbc ) 1142* 1143 IJBLSK = 0 1144 DO 2100 p = igl, MIN( igl+KBL-1, N ) 1145* 1146 AAPP = SVA( p ) 1147 IF( AAPP.GT.ZERO ) THEN 1148* 1149 PSKIPPED = 0 1150* 1151 DO 2200 q = jgl, MIN( jgl+KBL-1, N ) 1152* 1153 AAQQ = SVA( q ) 1154 IF( AAQQ.GT.ZERO ) THEN 1155 AAPP0 = AAPP 1156* 1157* .. M x 2 Jacobi SVD .. 1158* 1159* Safe Gram matrix computation 1160* 1161 IF( AAQQ.GE.ONE ) THEN 1162 IF( AAPP.GE.AAQQ ) THEN 1163 ROTOK = ( SMALL*AAPP ).LE.AAQQ 1164 ELSE 1165 ROTOK = ( SMALL*AAQQ ).LE.AAPP 1166 END IF 1167 IF( AAPP.LT.( BIG / AAQQ ) ) THEN 1168 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 1169 $ q ), 1 )*WORK( p )*WORK( q ) / 1170 $ AAQQ ) / AAPP 1171 ELSE 1172 CALL SCOPY( M, A( 1, p ), 1, 1173 $ WORK( N+1 ), 1 ) 1174 CALL SLASCL( 'G', 0, 0, AAPP, 1175 $ WORK( p ), M, 1, 1176 $ WORK( N+1 ), LDA, IERR ) 1177 AAPQ = SDOT( M, WORK( N+1 ), 1, 1178 $ A( 1, q ), 1 )*WORK( q ) / AAQQ 1179 END IF 1180 ELSE 1181 IF( AAPP.GE.AAQQ ) THEN 1182 ROTOK = AAPP.LE.( AAQQ / SMALL ) 1183 ELSE 1184 ROTOK = AAQQ.LE.( AAPP / SMALL ) 1185 END IF 1186 IF( AAPP.GT.( SMALL / AAQQ ) ) THEN 1187 AAPQ = ( SDOT( M, A( 1, p ), 1, A( 1, 1188 $ q ), 1 )*WORK( p )*WORK( q ) / 1189 $ AAQQ ) / AAPP 1190 ELSE 1191 CALL SCOPY( M, A( 1, q ), 1, 1192 $ WORK( N+1 ), 1 ) 1193 CALL SLASCL( 'G', 0, 0, AAQQ, 1194 $ WORK( q ), M, 1, 1195 $ WORK( N+1 ), LDA, IERR ) 1196 AAPQ = SDOT( M, WORK( N+1 ), 1, 1197 $ A( 1, p ), 1 )*WORK( p ) / AAPP 1198 END IF 1199 END IF 1200* 1201 MXAAPQ = MAX( MXAAPQ, ABS( AAPQ ) ) 1202* 1203* TO rotate or NOT to rotate, THAT is the question ... 1204* 1205 IF( ABS( AAPQ ).GT.TOL ) THEN 1206 NOTROT = 0 1207*[RTD] ROTATED = ROTATED + 1 1208 PSKIPPED = 0 1209 ISWROT = ISWROT + 1 1210* 1211 IF( ROTOK ) THEN 1212* 1213 AQOAP = AAQQ / AAPP 1214 APOAQ = AAPP / AAQQ 1215 THETA = -HALF*ABS( AQOAP-APOAQ ) / AAPQ 1216 IF( AAQQ.GT.AAPP0 )THETA = -THETA 1217* 1218 IF( ABS( THETA ).GT.BIGTHETA ) THEN 1219 T = HALF / THETA 1220 FASTR( 3 ) = T*WORK( p ) / WORK( q ) 1221 FASTR( 4 ) = -T*WORK( q ) / 1222 $ WORK( p ) 1223 CALL SROTM( M, A( 1, p ), 1, 1224 $ A( 1, q ), 1, FASTR ) 1225 IF( RSVEC )CALL SROTM( MVL, 1226 $ V( 1, p ), 1, 1227 $ V( 1, q ), 1, 1228 $ FASTR ) 1229 SVA( q ) = AAQQ*SQRT( MAX( ZERO, 1230 $ ONE+T*APOAQ*AAPQ ) ) 1231 AAPP = AAPP*SQRT( MAX( ZERO, 1232 $ ONE-T*AQOAP*AAPQ ) ) 1233 MXSINJ = MAX( MXSINJ, ABS( T ) ) 1234 ELSE 1235* 1236* .. choose correct signum for THETA and rotate 1237* 1238 THSIGN = -SIGN( ONE, AAPQ ) 1239 IF( AAQQ.GT.AAPP0 )THSIGN = -THSIGN 1240 T = ONE / ( THETA+THSIGN* 1241 $ SQRT( ONE+THETA*THETA ) ) 1242 CS = SQRT( ONE / ( ONE+T*T ) ) 1243 SN = T*CS 1244 MXSINJ = MAX( MXSINJ, ABS( SN ) ) 1245 SVA( q ) = AAQQ*SQRT( MAX( ZERO, 1246 $ ONE+T*APOAQ*AAPQ ) ) 1247 AAPP = AAPP*SQRT( MAX( ZERO, 1248 $ ONE-T*AQOAP*AAPQ ) ) 1249* 1250 APOAQ = WORK( p ) / WORK( q ) 1251 AQOAP = WORK( q ) / WORK( p ) 1252 IF( WORK( p ).GE.ONE ) THEN 1253* 1254 IF( WORK( q ).GE.ONE ) THEN 1255 FASTR( 3 ) = T*APOAQ 1256 FASTR( 4 ) = -T*AQOAP 1257 WORK( p ) = WORK( p )*CS 1258 WORK( q ) = WORK( q )*CS 1259 CALL SROTM( M, A( 1, p ), 1, 1260 $ A( 1, q ), 1, 1261 $ FASTR ) 1262 IF( RSVEC )CALL SROTM( MVL, 1263 $ V( 1, p ), 1, V( 1, q ), 1264 $ 1, FASTR ) 1265 ELSE 1266 CALL SAXPY( M, -T*AQOAP, 1267 $ A( 1, q ), 1, 1268 $ A( 1, p ), 1 ) 1269 CALL SAXPY( M, CS*SN*APOAQ, 1270 $ A( 1, p ), 1, 1271 $ A( 1, q ), 1 ) 1272 IF( RSVEC ) THEN 1273 CALL SAXPY( MVL, -T*AQOAP, 1274 $ V( 1, q ), 1, 1275 $ V( 1, p ), 1 ) 1276 CALL SAXPY( MVL, 1277 $ CS*SN*APOAQ, 1278 $ V( 1, p ), 1, 1279 $ V( 1, q ), 1 ) 1280 END IF 1281 WORK( p ) = WORK( p )*CS 1282 WORK( q ) = WORK( q ) / CS 1283 END IF 1284 ELSE 1285 IF( WORK( q ).GE.ONE ) THEN 1286 CALL SAXPY( M, T*APOAQ, 1287 $ A( 1, p ), 1, 1288 $ A( 1, q ), 1 ) 1289 CALL SAXPY( M, -CS*SN*AQOAP, 1290 $ A( 1, q ), 1, 1291 $ A( 1, p ), 1 ) 1292 IF( RSVEC ) THEN 1293 CALL SAXPY( MVL, T*APOAQ, 1294 $ V( 1, p ), 1, 1295 $ V( 1, q ), 1 ) 1296 CALL SAXPY( MVL, 1297 $ -CS*SN*AQOAP, 1298 $ V( 1, q ), 1, 1299 $ V( 1, p ), 1 ) 1300 END IF 1301 WORK( p ) = WORK( p ) / CS 1302 WORK( q ) = WORK( q )*CS 1303 ELSE 1304 IF( WORK( p ).GE.WORK( q ) ) 1305 $ THEN 1306 CALL SAXPY( M, -T*AQOAP, 1307 $ A( 1, q ), 1, 1308 $ A( 1, p ), 1 ) 1309 CALL SAXPY( M, CS*SN*APOAQ, 1310 $ A( 1, p ), 1, 1311 $ A( 1, q ), 1 ) 1312 WORK( p ) = WORK( p )*CS 1313 WORK( q ) = WORK( q ) / CS 1314 IF( RSVEC ) THEN 1315 CALL SAXPY( MVL, 1316 $ -T*AQOAP, 1317 $ V( 1, q ), 1, 1318 $ V( 1, p ), 1 ) 1319 CALL SAXPY( MVL, 1320 $ CS*SN*APOAQ, 1321 $ V( 1, p ), 1, 1322 $ V( 1, q ), 1 ) 1323 END IF 1324 ELSE 1325 CALL SAXPY( M, T*APOAQ, 1326 $ A( 1, p ), 1, 1327 $ A( 1, q ), 1 ) 1328 CALL SAXPY( M, 1329 $ -CS*SN*AQOAP, 1330 $ A( 1, q ), 1, 1331 $ A( 1, p ), 1 ) 1332 WORK( p ) = WORK( p ) / CS 1333 WORK( q ) = WORK( q )*CS 1334 IF( RSVEC ) THEN 1335 CALL SAXPY( MVL, 1336 $ T*APOAQ, V( 1, p ), 1337 $ 1, V( 1, q ), 1 ) 1338 CALL SAXPY( MVL, 1339 $ -CS*SN*AQOAP, 1340 $ V( 1, q ), 1, 1341 $ V( 1, p ), 1 ) 1342 END IF 1343 END IF 1344 END IF 1345 END IF 1346 END IF 1347* 1348 ELSE 1349 IF( AAPP.GT.AAQQ ) THEN 1350 CALL SCOPY( M, A( 1, p ), 1, 1351 $ WORK( N+1 ), 1 ) 1352 CALL SLASCL( 'G', 0, 0, AAPP, ONE, 1353 $ M, 1, WORK( N+1 ), LDA, 1354 $ IERR ) 1355 CALL SLASCL( 'G', 0, 0, AAQQ, ONE, 1356 $ M, 1, A( 1, q ), LDA, 1357 $ IERR ) 1358 TEMP1 = -AAPQ*WORK( p ) / WORK( q ) 1359 CALL SAXPY( M, TEMP1, WORK( N+1 ), 1360 $ 1, A( 1, q ), 1 ) 1361 CALL SLASCL( 'G', 0, 0, ONE, AAQQ, 1362 $ M, 1, A( 1, q ), LDA, 1363 $ IERR ) 1364 SVA( q ) = AAQQ*SQRT( MAX( ZERO, 1365 $ ONE-AAPQ*AAPQ ) ) 1366 MXSINJ = MAX( MXSINJ, SFMIN ) 1367 ELSE 1368 CALL SCOPY( M, A( 1, q ), 1, 1369 $ WORK( N+1 ), 1 ) 1370 CALL SLASCL( 'G', 0, 0, AAQQ, ONE, 1371 $ M, 1, WORK( N+1 ), LDA, 1372 $ IERR ) 1373 CALL SLASCL( 'G', 0, 0, AAPP, ONE, 1374 $ M, 1, A( 1, p ), LDA, 1375 $ IERR ) 1376 TEMP1 = -AAPQ*WORK( q ) / WORK( p ) 1377 CALL SAXPY( M, TEMP1, WORK( N+1 ), 1378 $ 1, A( 1, p ), 1 ) 1379 CALL SLASCL( 'G', 0, 0, ONE, AAPP, 1380 $ M, 1, A( 1, p ), LDA, 1381 $ IERR ) 1382 SVA( p ) = AAPP*SQRT( MAX( ZERO, 1383 $ ONE-AAPQ*AAPQ ) ) 1384 MXSINJ = MAX( MXSINJ, SFMIN ) 1385 END IF 1386 END IF 1387* END IF ROTOK THEN ... ELSE 1388* 1389* In the case of cancellation in updating SVA(q) 1390* .. recompute SVA(q) 1391 IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS ) 1392 $ THEN 1393 IF( ( AAQQ.LT.ROOTBIG ) .AND. 1394 $ ( AAQQ.GT.ROOTSFMIN ) ) THEN 1395 SVA( q ) = SNRM2( M, A( 1, q ), 1 )* 1396 $ WORK( q ) 1397 ELSE 1398 T = ZERO 1399 AAQQ = ONE 1400 CALL SLASSQ( M, A( 1, q ), 1, T, 1401 $ AAQQ ) 1402 SVA( q ) = T*SQRT( AAQQ )*WORK( q ) 1403 END IF 1404 END IF 1405 IF( ( AAPP / AAPP0 )**2.LE.ROOTEPS ) THEN 1406 IF( ( AAPP.LT.ROOTBIG ) .AND. 1407 $ ( AAPP.GT.ROOTSFMIN ) ) THEN 1408 AAPP = SNRM2( M, A( 1, p ), 1 )* 1409 $ WORK( p ) 1410 ELSE 1411 T = ZERO 1412 AAPP = ONE 1413 CALL SLASSQ( M, A( 1, p ), 1, T, 1414 $ AAPP ) 1415 AAPP = T*SQRT( AAPP )*WORK( p ) 1416 END IF 1417 SVA( p ) = AAPP 1418 END IF 1419* end of OK rotation 1420 ELSE 1421 NOTROT = NOTROT + 1 1422*[RTD] SKIPPED = SKIPPED + 1 1423 PSKIPPED = PSKIPPED + 1 1424 IJBLSK = IJBLSK + 1 1425 END IF 1426 ELSE 1427 NOTROT = NOTROT + 1 1428 PSKIPPED = PSKIPPED + 1 1429 IJBLSK = IJBLSK + 1 1430 END IF 1431* 1432 IF( ( i.LE.SWBAND ) .AND. ( IJBLSK.GE.BLSKIP ) ) 1433 $ THEN 1434 SVA( p ) = AAPP 1435 NOTROT = 0 1436 GO TO 2011 1437 END IF 1438 IF( ( i.LE.SWBAND ) .AND. 1439 $ ( PSKIPPED.GT.ROWSKIP ) ) THEN 1440 AAPP = -AAPP 1441 NOTROT = 0 1442 GO TO 2203 1443 END IF 1444* 1445 2200 CONTINUE 1446* end of the q-loop 1447 2203 CONTINUE 1448* 1449 SVA( p ) = AAPP 1450* 1451 ELSE 1452* 1453 IF( AAPP.EQ.ZERO )NOTROT = NOTROT + 1454 $ MIN( jgl+KBL-1, N ) - jgl + 1 1455 IF( AAPP.LT.ZERO )NOTROT = 0 1456* 1457 END IF 1458* 1459 2100 CONTINUE 1460* end of the p-loop 1461 2010 CONTINUE 1462* end of the jbc-loop 1463 2011 CONTINUE 1464*2011 bailed out of the jbc-loop 1465 DO 2012 p = igl, MIN( igl+KBL-1, N ) 1466 SVA( p ) = ABS( SVA( p ) ) 1467 2012 CONTINUE 1468*** 1469 2000 CONTINUE 1470*2000 :: end of the ibr-loop 1471* 1472* .. update SVA(N) 1473 IF( ( SVA( N ).LT.ROOTBIG ) .AND. ( SVA( N ).GT.ROOTSFMIN ) ) 1474 $ THEN 1475 SVA( N ) = SNRM2( M, A( 1, N ), 1 )*WORK( N ) 1476 ELSE 1477 T = ZERO 1478 AAPP = ONE 1479 CALL SLASSQ( M, A( 1, N ), 1, T, AAPP ) 1480 SVA( N ) = T*SQRT( AAPP )*WORK( N ) 1481 END IF 1482* 1483* Additional steering devices 1484* 1485 IF( ( i.LT.SWBAND ) .AND. ( ( MXAAPQ.LE.ROOTTOL ) .OR. 1486 $ ( ISWROT.LE.N ) ) )SWBAND = i 1487* 1488 IF( ( i.GT.SWBAND+1 ) .AND. ( MXAAPQ.LT.SQRT( FLOAT( N ) )* 1489 $ TOL ) .AND. ( FLOAT( N )*MXAAPQ*MXSINJ.LT.TOL ) ) THEN 1490 GO TO 1994 1491 END IF 1492* 1493 IF( NOTROT.GE.EMPTSW )GO TO 1994 1494* 1495 1993 CONTINUE 1496* end i=1:NSWEEP loop 1497* 1498* #:( Reaching this point means that the procedure has not converged. 1499 INFO = NSWEEP - 1 1500 GO TO 1995 1501* 1502 1994 CONTINUE 1503* #:) Reaching this point means numerical convergence after the i-th 1504* sweep. 1505* 1506 INFO = 0 1507* #:) INFO = 0 confirms successful iterations. 1508 1995 CONTINUE 1509* 1510* Sort the singular values and find how many are above 1511* the underflow threshold. 1512* 1513 N2 = 0 1514 N4 = 0 1515 DO 5991 p = 1, N - 1 1516 q = ISAMAX( N-p+1, SVA( p ), 1 ) + p - 1 1517 IF( p.NE.q ) THEN 1518 TEMP1 = SVA( p ) 1519 SVA( p ) = SVA( q ) 1520 SVA( q ) = TEMP1 1521 TEMP1 = WORK( p ) 1522 WORK( p ) = WORK( q ) 1523 WORK( q ) = TEMP1 1524 CALL SSWAP( M, A( 1, p ), 1, A( 1, q ), 1 ) 1525 IF( RSVEC )CALL SSWAP( MVL, V( 1, p ), 1, V( 1, q ), 1 ) 1526 END IF 1527 IF( SVA( p ).NE.ZERO ) THEN 1528 N4 = N4 + 1 1529 IF( SVA( p )*SKL.GT.SFMIN )N2 = N2 + 1 1530 END IF 1531 5991 CONTINUE 1532 IF( SVA( N ).NE.ZERO ) THEN 1533 N4 = N4 + 1 1534 IF( SVA( N )*SKL.GT.SFMIN )N2 = N2 + 1 1535 END IF 1536* 1537* Normalize the left singular vectors. 1538* 1539 IF( LSVEC .OR. UCTOL ) THEN 1540 DO 1998 p = 1, N2 1541 CALL SSCAL( M, WORK( p ) / SVA( p ), A( 1, p ), 1 ) 1542 1998 CONTINUE 1543 END IF 1544* 1545* Scale the product of Jacobi rotations (assemble the fast rotations). 1546* 1547 IF( RSVEC ) THEN 1548 IF( APPLV ) THEN 1549 DO 2398 p = 1, N 1550 CALL SSCAL( MVL, WORK( p ), V( 1, p ), 1 ) 1551 2398 CONTINUE 1552 ELSE 1553 DO 2399 p = 1, N 1554 TEMP1 = ONE / SNRM2( MVL, V( 1, p ), 1 ) 1555 CALL SSCAL( MVL, TEMP1, V( 1, p ), 1 ) 1556 2399 CONTINUE 1557 END IF 1558 END IF 1559* 1560* Undo scaling, if necessary (and possible). 1561 IF( ( ( SKL.GT.ONE ) .AND. ( SVA( 1 ).LT.( BIG / SKL ) ) ) 1562 $ .OR. ( ( SKL.LT.ONE ) .AND. ( SVA( MAX( N2, 1 ) ) .GT. 1563 $ ( SFMIN / SKL ) ) ) ) THEN 1564 DO 2400 p = 1, N 1565 SVA( P ) = SKL*SVA( P ) 1566 2400 CONTINUE 1567 SKL = ONE 1568 END IF 1569* 1570 WORK( 1 ) = SKL 1571* The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE 1572* then some of the singular values may overflow or underflow and 1573* the spectrum is given in this factored representation. 1574* 1575 WORK( 2 ) = FLOAT( N4 ) 1576* N4 is the number of computed nonzero singular values of A. 1577* 1578 WORK( 3 ) = FLOAT( N2 ) 1579* N2 is the number of singular values of A greater than SFMIN. 1580* If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers 1581* that may carry some information. 1582* 1583 WORK( 4 ) = FLOAT( i ) 1584* i is the index of the last sweep before declaring convergence. 1585* 1586 WORK( 5 ) = MXAAPQ 1587* MXAAPQ is the largest absolute value of scaled pivots in the 1588* last sweep 1589* 1590 WORK( 6 ) = MXSINJ 1591* MXSINJ is the largest absolute value of the sines of Jacobi angles 1592* in the last sweep 1593* 1594 RETURN 1595* .. 1596* .. END OF SGESVJ 1597* .. 1598 END 1599