1 SUBROUTINE PSGEBD2( M, N, A, IA, JA, DESCA, D, E, TAUQ, TAUP, 2 $ WORK, LWORK, INFO ) 3* 4* -- ScaLAPACK auxiliary routine (version 1.7) -- 5* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 6* and University of California, Berkeley. 7* May 1, 1997 8* 9* .. Scalar Arguments .. 10 INTEGER IA, INFO, JA, LWORK, M, N 11* .. 12* .. Array Arguments .. 13 INTEGER DESCA( * ) 14 REAL A( * ), D( * ), E( * ), TAUP( * ), TAUQ( * ), 15 $ WORK( * ) 16* .. 17* 18* Purpose 19* ======= 20* 21* PSGEBD2 reduces a real general M-by-N distributed matrix 22* sub( A ) = A(IA:IA+M-1,JA:JA+N-1) to upper or lower bidiagonal 23* form B by an orthogonal transformation: Q' * sub( A ) * P = B. 24* 25* If M >= N, B is upper bidiagonal; if M < N, B is lower bidiagonal. 26* 27* Notes 28* ===== 29* 30* Each global data object is described by an associated description 31* vector. This vector stores the information required to establish 32* the mapping between an object element and its corresponding process 33* and memory location. 34* 35* Let A be a generic term for any 2D block cyclicly distributed array. 36* Such a global array has an associated description vector DESCA. 37* In the following comments, the character _ should be read as 38* "of the global array". 39* 40* NOTATION STORED IN EXPLANATION 41* --------------- -------------- -------------------------------------- 42* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 43* DTYPE_A = 1. 44* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 45* the BLACS process grid A is distribu- 46* ted over. The context itself is glo- 47* bal, but the handle (the integer 48* value) may vary. 49* M_A (global) DESCA( M_ ) The number of rows in the global 50* array A. 51* N_A (global) DESCA( N_ ) The number of columns in the global 52* array A. 53* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 54* the rows of the array. 55* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 56* the columns of the array. 57* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 58* row of the array A is distributed. 59* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 60* first column of the array A is 61* distributed. 62* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 63* array. LLD_A >= MAX(1,LOCr(M_A)). 64* 65* Let K be the number of rows or columns of a distributed matrix, 66* and assume that its process grid has dimension p x q. 67* LOCr( K ) denotes the number of elements of K that a process 68* would receive if K were distributed over the p processes of its 69* process column. 70* Similarly, LOCc( K ) denotes the number of elements of K that a 71* process would receive if K were distributed over the q processes of 72* its process row. 73* The values of LOCr() and LOCc() may be determined via a call to the 74* ScaLAPACK tool function, NUMROC: 75* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 76* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 77* An upper bound for these quantities may be computed by: 78* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 79* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 80* 81* Arguments 82* ========= 83* 84* M (global input) INTEGER 85* The number of rows to be operated on, i.e. the number of rows 86* of the distributed submatrix sub( A ). M >= 0. 87* 88* N (global input) INTEGER 89* The number of columns to be operated on, i.e. the number of 90* columns of the distributed submatrix sub( A ). N >= 0. 91* 92* A (local input/local output) REAL pointer into the 93* local memory to an array of dimension (LLD_A,LOCc(JA+N-1)). 94* On entry, this array contains the local pieces of the 95* general distributed matrix sub( A ). On exit, if M >= N, 96* the diagonal and the first superdiagonal of sub( A ) are 97* overwritten with the upper bidiagonal matrix B; the elements 98* below the diagonal, with the array TAUQ, represent the 99* orthogonal matrix Q as a product of elementary reflectors, 100* and the elements above the first superdiagonal, with the 101* array TAUP, represent the orthogonal matrix P as a product 102* of elementary reflectors. If M < N, the diagonal and the 103* first subdiagonal are overwritten with the lower bidiagonal 104* matrix B; the elements below the first subdiagonal, with the 105* array TAUQ, represent the orthogonal matrix Q as a product of 106* elementary reflectors, and the elements above the diagonal, 107* with the array TAUP, represent the orthogonal matrix P as a 108* product of elementary reflectors. See Further Details. 109* 110* IA (global input) INTEGER 111* The row index in the global array A indicating the first 112* row of sub( A ). 113* 114* JA (global input) INTEGER 115* The column index in the global array A indicating the 116* first column of sub( A ). 117* 118* DESCA (global and local input) INTEGER array of dimension DLEN_. 119* The array descriptor for the distributed matrix A. 120* 121* D (local output) REAL array, dimension 122* LOCc(JA+MIN(M,N)-1) if M >= N; LOCr(IA+MIN(M,N)-1) otherwise. 123* The distributed diagonal elements of the bidiagonal matrix 124* B: D(i) = A(i,i). D is tied to the distributed matrix A. 125* 126* E (local output) REAL array, dimension 127* LOCr(IA+MIN(M,N)-1) if M >= N; LOCc(JA+MIN(M,N)-2) otherwise. 128* The distributed off-diagonal elements of the bidiagonal 129* distributed matrix B: 130* if m >= n, E(i) = A(i,i+1) for i = 1,2,...,n-1; 131* if m < n, E(i) = A(i+1,i) for i = 1,2,...,m-1. 132* E is tied to the distributed matrix A. 133* 134* TAUQ (local output) REAL array dimension 135* LOCc(JA+MIN(M,N)-1). The scalar factors of the elementary 136* reflectors which represent the orthogonal matrix Q. TAUQ 137* is tied to the distributed matrix A. See Further Details. 138* 139* TAUP (local output) REAL array, dimension 140* LOCr(IA+MIN(M,N)-1). The scalar factors of the elementary 141* reflectors which represent the orthogonal matrix P. TAUP 142* is tied to the distributed matrix A. See Further Details. 143* 144* WORK (local workspace/local output) REAL array, 145* dimension (LWORK) 146* On exit, WORK(1) returns the minimal and optimal LWORK. 147* 148* LWORK (local or global input) INTEGER 149* The dimension of the array WORK. 150* LWORK is local input and must be at least 151* LWORK >= MAX( MpA0, NqA0 ) 152* 153* where NB = MB_A = NB_A, IROFFA = MOD( IA-1, NB ) 154* IAROW = INDXG2P( IA, NB, MYROW, RSRC_A, NPROW ), 155* IACOL = INDXG2P( JA, NB, MYCOL, CSRC_A, NPCOL ), 156* MpA0 = NUMROC( M+IROFFA, NB, MYROW, IAROW, NPROW ), 157* NqA0 = NUMROC( N+IROFFA, NB, MYCOL, IACOL, NPCOL ). 158* 159* INDXG2P and NUMROC are ScaLAPACK tool functions; 160* MYROW, MYCOL, NPROW and NPCOL can be determined by calling 161* the subroutine BLACS_GRIDINFO. 162* 163* If LWORK = -1, then LWORK is global input and a workspace 164* query is assumed; the routine only calculates the minimum 165* and optimal size for all work arrays. Each of these 166* values is returned in the first entry of the corresponding 167* work array, and no error message is issued by PXERBLA. 168* 169* INFO (local output) INTEGER 170* = 0: successful exit 171* < 0: If the i-th argument is an array and the j-entry had 172* an illegal value, then INFO = -(i*100+j), if the i-th 173* argument is a scalar and had an illegal value, then 174* INFO = -i. 175* 176* Further Details 177* =============== 178* 179* The matrices Q and P are represented as products of elementary 180* reflectors: 181* 182* If m >= n, 183* 184* Q = H(1) H(2) . . . H(n) and P = G(1) G(2) . . . G(n-1) 185* 186* Each H(i) and G(i) has the form: 187* 188* H(i) = I - tauq * v * v' and G(i) = I - taup * u * u' 189* 190* where tauq and taup are real scalars, and v and u are real vectors; 191* v(1:i-1) = 0, v(i) = 1, and v(i+1:m) is stored on exit in 192* A(ia+i:ia+m-1,ja+i-1); 193* u(1:i) = 0, u(i+1) = 1, and u(i+2:n) is stored on exit in 194* A(ia+i-1,ja+i+1:ja+n-1); 195* tauq is stored in TAUQ(ja+i-1) and taup in TAUP(ia+i-1). 196* 197* If m < n, 198* 199* Q = H(1) H(2) . . . H(m-1) and P = G(1) G(2) . . . G(m) 200* 201* Each H(i) and G(i) has the form: 202* 203* H(i) = I - tauq * v * v' and G(i) = I - taup * u * u' 204* 205* where tauq and taup are real scalars, and v and u are real vectors; 206* v(1:i) = 0, v(i+1) = 1, and v(i+2:m) is stored on exit in 207* A(ia+i+1:ia+m-1,ja+i-1); 208* u(1:i-1) = 0, u(i) = 1, and u(i+1:n) is stored on exit in 209* A(ia+i-1,ja+i:ja+n-1); 210* tauq is stored in TAUQ(ja+i-1) and taup in TAUP(ia+i-1). 211* 212* The contents of sub( A ) on exit are illustrated by the following 213* examples: 214* 215* m = 6 and n = 5 (m > n): m = 5 and n = 6 (m < n): 216* 217* ( d e u1 u1 u1 ) ( d u1 u1 u1 u1 u1 ) 218* ( v1 d e u2 u2 ) ( e d u2 u2 u2 u2 ) 219* ( v1 v2 d e u3 ) ( v1 e d u3 u3 u3 ) 220* ( v1 v2 v3 d e ) ( v1 v2 e d u4 u4 ) 221* ( v1 v2 v3 v4 d ) ( v1 v2 v3 e d u5 ) 222* ( v1 v2 v3 v4 v5 ) 223* 224* where d and e denote diagonal and off-diagonal elements of B, vi 225* denotes an element of the vector defining H(i), and ui an element of 226* the vector defining G(i). 227* 228* Alignment requirements 229* ====================== 230* 231* The distributed submatrix sub( A ) must verify some alignment proper- 232* ties, namely the following expressions should be true: 233* ( MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA ) 234* 235* ===================================================================== 236* 237* .. Parameters .. 238 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 239 $ LLD_, MB_, M_, NB_, N_, RSRC_ 240 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 241 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 242 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 243 REAL ONE, ZERO 244 PARAMETER ( ONE = 1.0E+0, ZERO = 0.0E+0 ) 245* .. 246* .. Local Scalars .. 247 LOGICAL LQUERY 248 INTEGER I, IACOL, IAROW, ICOFFA, ICTXT, II, IROFFA, J, 249 $ JJ, K, LWMIN, MPA0, MYCOL, MYROW, NPCOL, NPROW, 250 $ NQA0 251 REAL ALPHA 252* .. 253* .. Local Arrays .. 254 INTEGER DESCD( DLEN_ ), DESCE( DLEN_ ) 255* .. 256* .. External Subroutines .. 257 EXTERNAL BLACS_ABORT, BLACS_GRIDINFO, CHK1MAT, DESCSET, 258 $ INFOG2L, PSLARF, PSLARFG, PSELSET, 259 $ PXERBLA, SGEBR2D, SGEBS2D, SLARFG 260* .. 261* .. External Functions .. 262 INTEGER INDXG2P, NUMROC 263 EXTERNAL INDXG2P, NUMROC 264* .. 265* .. Intrinsic Functions .. 266 INTRINSIC MAX, MIN, MOD, REAL 267* .. 268* .. Executable Statements .. 269* 270* Test the input parameters 271* 272 ICTXT = DESCA( CTXT_ ) 273 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 274* 275* Test the input parameters 276* 277 INFO = 0 278 IF( NPROW.EQ.-1 ) THEN 279 INFO = -(600+CTXT_) 280 ELSE 281 CALL CHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, INFO ) 282 IF( INFO.EQ.0 ) THEN 283 IROFFA = MOD( IA-1, DESCA( MB_ ) ) 284 ICOFFA = MOD( JA-1, DESCA( NB_ ) ) 285 IAROW = INDXG2P( IA, DESCA( MB_ ), MYROW, DESCA( RSRC_ ), 286 $ NPROW ) 287 IACOL = INDXG2P( JA, DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), 288 $ NPCOL ) 289 MPA0 = NUMROC( M+IROFFA, DESCA( MB_ ), MYROW, IAROW, NPROW ) 290 NQA0 = NUMROC( N+ICOFFA, DESCA( NB_ ), MYCOL, IACOL, NPCOL ) 291 LWMIN = MAX( MPA0, NQA0 ) 292* 293 WORK( 1 ) = REAL( LWMIN ) 294 LQUERY = ( LWORK.EQ.-1 ) 295 IF( IROFFA.NE.ICOFFA ) THEN 296 INFO = -5 297 ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN 298 INFO = -(600+NB_) 299 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 300 INFO = -12 301 END IF 302 END IF 303 END IF 304* 305 IF( INFO.LT.0 ) THEN 306 CALL PXERBLA( ICTXT, 'PSGEBD2', -INFO ) 307 CALL BLACS_ABORT( ICTXT, 1 ) 308 RETURN 309 ELSE IF( LQUERY ) THEN 310 RETURN 311 END IF 312* 313 CALL INFOG2L( IA, JA, DESCA, NPROW, NPCOL, MYROW, MYCOL, II, JJ, 314 $ IAROW, IACOL ) 315* 316 IF( M.EQ.1 .AND. N.EQ.1 ) THEN 317 IF( MYCOL.EQ.IACOL ) THEN 318 IF( MYROW.EQ.IAROW ) THEN 319 I = II+(JJ-1)*DESCA( LLD_ ) 320 CALL SLARFG( 1, A( I ), A( I ), 1, TAUQ( JJ ) ) 321 D( JJ ) = A( I ) 322 CALL SGEBS2D( ICTXT, 'Columnwise', ' ', 1, 1, D( JJ ), 323 $ 1 ) 324 CALL SGEBS2D( ICTXT, 'Columnwise', ' ', 1, 1, TAUQ( JJ ), 325 $ 1 ) 326 ELSE 327 CALL SGEBR2D( ICTXT, 'Columnwise', ' ', 1, 1, D( JJ ), 328 $ 1, IAROW, IACOL ) 329 CALL SGEBR2D( ICTXT, 'Columnwise', ' ', 1, 1, TAUQ( JJ ), 330 $ 1, IAROW, IACOL ) 331 END IF 332 END IF 333 IF( MYROW.EQ.IAROW ) 334 $ TAUP( II ) = ZERO 335 RETURN 336 END IF 337* 338 ALPHA = ZERO 339* 340 IF( M.GE.N ) THEN 341* 342* Reduce to upper bidiagonal form 343* 344 CALL DESCSET( DESCD, 1, JA+MIN(M,N)-1, 1, DESCA( NB_ ), MYROW, 345 $ DESCA( CSRC_ ), DESCA( CTXT_ ), 1 ) 346 CALL DESCSET( DESCE, IA+MIN(M,N)-1, 1, DESCA( MB_ ), 1, 347 $ DESCA( RSRC_ ), MYCOL, DESCA( CTXT_ ), 348 $ DESCA( LLD_ ) ) 349 DO 10 K = 1, N 350 I = IA + K - 1 351 J = JA + K - 1 352* 353* Generate elementary reflector H(j) to annihilate 354* A(ia+i:ia+m-1,j) 355* 356 CALL PSLARFG( M-K+1, ALPHA, I, J, A, MIN( I+1, M+IA-1 ), 357 $ J, DESCA, 1, TAUQ ) 358 CALL PSELSET( D, 1, J, DESCD, ALPHA ) 359 CALL PSELSET( A, I, J, DESCA, ONE ) 360* 361* Apply H(i) to A(i:ia+m-1,i+1:ja+n-1) from the left 362* 363 CALL PSLARF( 'Left', M-K+1, N-K, A, I, J, DESCA, 1, TAUQ, A, 364 $ I, J+1, DESCA, WORK ) 365 CALL PSELSET( A, I, J, DESCA, ALPHA ) 366* 367 IF( K.LT.N ) THEN 368* 369* Generate elementary reflector G(i) to annihilate 370* A(i,ja+j+1:ja+n-1) 371* 372 CALL PSLARFG( N-K, ALPHA, I, J+1, A, I, 373 $ MIN( J+2, JA+N-1 ), DESCA, DESCA( M_ ), 374 $ TAUP ) 375 CALL PSELSET( E, I, 1, DESCE, ALPHA ) 376 CALL PSELSET( A, I, J+1, DESCA, ONE ) 377* 378* Apply G(i) to A(i+1:ia+m-1,i+1:ja+n-1) from the right 379* 380 CALL PSLARF( 'Right', M-K, N-K, A, I, J+1, DESCA, 381 $ DESCA( M_ ), TAUP, A, I+1, J+1, DESCA, 382 $ WORK ) 383 CALL PSELSET( A, I, J+1, DESCA, ALPHA ) 384 ELSE 385 CALL PSELSET( TAUP, I, 1, DESCE, ZERO ) 386 END IF 387 10 CONTINUE 388* 389 ELSE 390* 391* Reduce to lower bidiagonal form 392* 393 CALL DESCSET( DESCD, IA+MIN(M,N)-1, 1, DESCA( MB_ ), 1, 394 $ DESCA( RSRC_ ), MYCOL, DESCA( CTXT_ ), 395 $ DESCA( LLD_ ) ) 396 CALL DESCSET( DESCE, 1, JA+MIN(M,N)-1, 1, DESCA( NB_ ), MYROW, 397 $ DESCA( CSRC_ ), DESCA( CTXT_ ), 1 ) 398 DO 20 K = 1, M 399 I = IA + K - 1 400 J = JA + K - 1 401* 402* Generate elementary reflector G(i) to annihilate 403* A(i,ja+j:ja+n-1) 404* 405 CALL PSLARFG( N-K+1, ALPHA, I, J, A, I, 406 $ MIN( J+1, JA+N-1 ), DESCA, DESCA( M_ ), TAUP ) 407 CALL PSELSET( D, I, 1, DESCD, ALPHA ) 408 CALL PSELSET( A, I, J, DESCA, ONE ) 409* 410* Apply G(i) to A(i:ia+m-1,j:ja+n-1) from the right 411* 412 CALL PSLARF( 'Right', M-K, N-K+1, A, I, J, DESCA, 413 $ DESCA( M_ ), TAUP, A, MIN( I+1, IA+M-1 ), J, 414 $ DESCA, WORK ) 415 CALL PSELSET( A, I, J, DESCA, ALPHA ) 416* 417 IF( K.LT.M ) THEN 418* 419* Generate elementary reflector H(i) to annihilate 420* A(i+2:ia+m-1,j) 421* 422 CALL PSLARFG( M-K, ALPHA, I+1, J, A, 423 $ MIN( I+2, IA+M-1 ), J, DESCA, 1, TAUQ ) 424 CALL PSELSET( E, 1, J, DESCE, ALPHA ) 425 CALL PSELSET( A, I+1, J, DESCA, ONE ) 426* 427* Apply H(i) to A(i+1:ia+m-1,j+1:ja+n-1) from the left 428* 429 CALL PSLARF( 'Left', M-K, N-K, A, I+1, J, DESCA, 1, TAUQ, 430 $ A, I+1, J+1, DESCA, WORK ) 431 CALL PSELSET( A, I+1, J, DESCA, ALPHA ) 432 ELSE 433 CALL PSELSET( TAUQ, 1, J, DESCE, ZERO ) 434 END IF 435 20 CONTINUE 436 END IF 437* 438 WORK( 1 ) = REAL( LWMIN ) 439* 440 RETURN 441* 442* End of PSGEBD2 443* 444 END 445