1 SUBROUTINE PZTZRZF( M, N, A, IA, JA, DESCA, TAU, WORK, LWORK, 2 $ INFO ) 3* 4* -- ScaLAPACK routine (version 1.7) -- 5* University of Tennessee, Knoxville, Oak Ridge National Laboratory, 6* and University of California, Berkeley. 7* May 25, 2001 8* 9* .. Scalar Arguments .. 10 INTEGER IA, INFO, JA, LWORK, M, N 11* .. 12* .. Array Arguments .. 13 INTEGER DESCA( * ) 14 COMPLEX*16 A( * ), TAU( * ), WORK( * ) 15* .. 16* 17* Purpose 18* ======= 19* 20* PZTZRZF reduces the M-by-N ( M<=N ) complex upper trapezoidal matrix 21* sub( A ) = A(IA:IA+M-1,JA:JA+N-1) to upper triangular form by means 22* of unitary transformations. 23* 24* The upper trapezoidal matrix sub( A ) is factored as 25* 26* sub( A ) = ( R 0 ) * Z, 27* 28* where Z is an N-by-N unitary matrix and R is an M-by-M upper 29* triangular matrix. 30* 31* Notes 32* ===== 33* 34* Each global data object is described by an associated description 35* vector. This vector stores the information required to establish 36* the mapping between an object element and its corresponding process 37* and memory location. 38* 39* Let A be a generic term for any 2D block cyclicly distributed array. 40* Such a global array has an associated description vector DESCA. 41* In the following comments, the character _ should be read as 42* "of the global array". 43* 44* NOTATION STORED IN EXPLANATION 45* --------------- -------------- -------------------------------------- 46* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 47* DTYPE_A = 1. 48* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 49* the BLACS process grid A is distribu- 50* ted over. The context itself is glo- 51* bal, but the handle (the integer 52* value) may vary. 53* M_A (global) DESCA( M_ ) The number of rows in the global 54* array A. 55* N_A (global) DESCA( N_ ) The number of columns in the global 56* array A. 57* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 58* the rows of the array. 59* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 60* the columns of the array. 61* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 62* row of the array A is distributed. 63* CSRC_A (global) DESCA( CSRC_ ) The process column over which the 64* first column of the array A is 65* distributed. 66* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 67* array. LLD_A >= MAX(1,LOCr(M_A)). 68* 69* Let K be the number of rows or columns of a distributed matrix, 70* and assume that its process grid has dimension p x q. 71* LOCr( K ) denotes the number of elements of K that a process 72* would receive if K were distributed over the p processes of its 73* process column. 74* Similarly, LOCc( K ) denotes the number of elements of K that a 75* process would receive if K were distributed over the q processes of 76* its process row. 77* The values of LOCr() and LOCc() may be determined via a call to the 78* ScaLAPACK tool function, NUMROC: 79* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 80* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 81* An upper bound for these quantities may be computed by: 82* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 83* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 84* 85* Arguments 86* ========= 87* 88* M (global input) INTEGER 89* The number of rows to be operated on, i.e. the number of rows 90* of the distributed submatrix sub( A ). M >= 0. 91* 92* N (global input) INTEGER 93* The number of columns to be operated on, i.e. the number of 94* columns of the distributed submatrix sub( A ). N >= 0. 95* 96* A (local input/local output) COMPLEX*16 pointer into the 97* local memory to an array of dimension (LLD_A, LOCc(JA+N-1)). 98* On entry, the local pieces of the M-by-N distributed matrix 99* sub( A ) which is to be factored. On exit, the leading M-by-M 100* upper triangular part of sub( A ) contains the upper trian- 101* gular matrix R, and elements M+1 to N of the first M rows of 102* sub( A ), with the array TAU, represent the unitary matrix Z 103* as a product of M elementary reflectors. 104* 105* IA (global input) INTEGER 106* The row index in the global array A indicating the first 107* row of sub( A ). 108* 109* JA (global input) INTEGER 110* The column index in the global array A indicating the 111* first column of sub( A ). 112* 113* DESCA (global and local input) INTEGER array of dimension DLEN_. 114* The array descriptor for the distributed matrix A. 115* 116* TAU (local output) COMPLEX*16, array, dimension LOCr(IA+M-1) 117* This array contains the scalar factors of the elementary 118* reflectors. TAU is tied to the distributed matrix A. 119* 120* WORK (local workspace/local output) COMPLEX*16 array, 121* dimension (LWORK) 122* On exit, WORK(1) returns the minimal and optimal LWORK. 123* 124* LWORK (local or global input) INTEGER 125* The dimension of the array WORK. 126* LWORK is local input and must be at least 127* LWORK >= MB_A * ( Mp0 + Nq0 + MB_A ), where 128* 129* IROFF = MOD( IA-1, MB_A ), ICOFF = MOD( JA-1, NB_A ), 130* IAROW = INDXG2P( IA, MB_A, MYROW, RSRC_A, NPROW ), 131* IACOL = INDXG2P( JA, NB_A, MYCOL, CSRC_A, NPCOL ), 132* Mp0 = NUMROC( M+IROFF, MB_A, MYROW, IAROW, NPROW ), 133* Nq0 = NUMROC( N+ICOFF, NB_A, MYCOL, IACOL, NPCOL ), 134* 135* and NUMROC, INDXG2P are ScaLAPACK tool functions; 136* MYROW, MYCOL, NPROW and NPCOL can be determined by calling 137* the subroutine BLACS_GRIDINFO. 138* 139* If LWORK = -1, then LWORK is global input and a workspace 140* query is assumed; the routine only calculates the minimum 141* and optimal size for all work arrays. Each of these 142* values is returned in the first entry of the corresponding 143* work array, and no error message is issued by PXERBLA. 144* 145* INFO (global output) INTEGER 146* = 0: successful exit 147* < 0: If the i-th argument is an array and the j-entry had 148* an illegal value, then INFO = -(i*100+j), if the i-th 149* argument is a scalar and had an illegal value, then 150* INFO = -i. 151* 152* Further Details 153* =============== 154* 155* The factorization is obtained by Householder's method. The kth 156* transformation matrix, Z( k ), whose conjugate transpose is used to 157* introduce zeros into the (m - k + 1)th row of sub( A ), is given in 158* the form 159* 160* Z( k ) = ( I 0 ), 161* ( 0 T( k ) ) 162* 163* where 164* 165* T( k ) = I - tau*u( k )*u( k )', u( k ) = ( 1 ), 166* ( 0 ) 167* ( z( k ) ) 168* 169* tau is a scalar and z( k ) is an ( n - m ) element vector. 170* tau and z( k ) are chosen to annihilate the elements of the kth row 171* of sub( A ). 172* 173* The scalar tau is returned in the kth element of TAU and the vector 174* u( k ) in the kth row of sub( A ), such that the elements of z( k ) 175* are in a( k, m + 1 ), ..., a( k, n ). The elements of R are returned 176* in the upper triangular part of sub( A ). 177* 178* Z is given by 179* 180* Z = Z( 1 ) * Z( 2 ) * ... * Z( m ). 181* 182* ===================================================================== 183* 184* .. Parameters .. 185 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 186 $ LLD_, MB_, M_, NB_, N_, RSRC_ 187 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 188 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 189 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 190 COMPLEX*16 ZERO 191 PARAMETER ( ZERO = ( 0.0D+0, 0.0D+0 ) ) 192* .. 193* .. Local Scalars .. 194 LOGICAL LQUERY 195 CHARACTER COLBTOP, ROWBTOP 196 INTEGER I, IACOL, IAROW, IB, ICTXT, IIA, IL, IN, IPW, 197 $ IROFFA, J, JM1, L, LWMIN, MP0, MYCOL, MYROW, 198 $ NPCOL, NPROW, NQ0 199* .. 200* .. Local Arrays .. 201 INTEGER IDUM1( 1 ), IDUM2( 1 ) 202* .. 203* .. External Subroutines .. 204 EXTERNAL BLACS_GRIDINFO, CHK1MAT, INFOG1L, PCHK1MAT, 205 $ PB_TOPGET, PB_TOPSET, PXERBLA, PZLATRZ, 206 $ PZLARZB, PZLARZT 207* .. 208* .. External Functions .. 209 INTEGER ICEIL, INDXG2P, NUMROC 210 EXTERNAL ICEIL, INDXG2P, NUMROC 211* .. 212* .. Intrinsic Functions .. 213 INTRINSIC DBLE, DCMPLX, MAX, MIN, MOD 214* .. 215* .. Executable Statements .. 216* 217* Get grid parameters 218* 219 ICTXT = DESCA( CTXT_ ) 220 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 221* 222* Test the input parameters 223* 224 INFO = 0 225 IF( NPROW.EQ.-1 ) THEN 226 INFO = -(600+CTXT_) 227 ELSE 228 CALL CHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, INFO ) 229 IF( INFO.EQ.0 ) THEN 230 IROFFA = MOD( IA-1, DESCA( MB_ ) ) 231 IAROW = INDXG2P( IA, DESCA( MB_ ), MYROW, DESCA( RSRC_ ), 232 $ NPROW ) 233 IACOL = INDXG2P( JA, DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), 234 $ NPCOL ) 235 MP0 = NUMROC( M+IROFFA, DESCA( MB_ ), MYROW, IAROW, NPROW ) 236 NQ0 = NUMROC( N+MOD( JA-1, DESCA( NB_ ) ), DESCA( NB_ ), 237 $ MYCOL, IACOL, NPCOL ) 238 LWMIN = DESCA( MB_ ) * ( MP0 + NQ0 + DESCA( MB_ ) ) 239* 240 WORK( 1 ) = DCMPLX( DBLE( LWMIN ) ) 241 LQUERY = ( LWORK.EQ.-1 ) 242 IF( N.LT.M ) THEN 243 INFO = -2 244 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 245 INFO = -9 246 END IF 247 END IF 248 IF( LQUERY ) THEN 249 IDUM1( 1 ) = -1 250 ELSE 251 IDUM1( 1 ) = 1 252 END IF 253 IDUM2( 1 ) = 9 254 CALL PCHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, 1, IDUM1, IDUM2, 255 $ INFO ) 256 END IF 257* 258 IF( INFO.NE.0 ) THEN 259 CALL PXERBLA( ICTXT, 'PZTZRZF', -INFO ) 260 RETURN 261 ELSE IF( LQUERY ) THEN 262 RETURN 263 END IF 264* 265* Quick return if possible 266* 267 IF( M.EQ.0 .OR. N.EQ.0 ) 268 $ RETURN 269* 270 IF( M.EQ.N ) THEN 271* 272 CALL INFOG1L( IA, DESCA( MB_ ), NPROW, MYROW, DESCA( RSRC_ ), 273 $ IIA, IAROW ) 274 IF( MYROW.EQ.IAROW ) 275 $ MP0 = MP0 - IROFFA 276 DO 10 I = IIA, IIA+MP0-1 277 TAU( I ) = ZERO 278 10 CONTINUE 279* 280 ELSE 281* 282 L = N-M 283 JM1 = JA + MIN( M+1, N ) - 1 284 IPW = DESCA( MB_ ) * DESCA( MB_ ) + 1 285 IN = MIN( ICEIL( IA, DESCA( MB_ ) ) * DESCA( MB_ ), IA+M-1 ) 286 IL = MAX( ( (IA+M-2) / DESCA( MB_ ) ) * DESCA( MB_ ) + 1, IA ) 287 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 288 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 289 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ' ' ) 290 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'D-ring' ) 291* 292* Use blocked code initially 293* 294 DO 20 I = IL, IN+1, -DESCA( MB_ ) 295 IB = MIN( IA+M-I, DESCA( MB_ ) ) 296 J = JA + I - IA 297* 298* Compute the complete orthogonal factorization of the current 299* block A(i:i+ib-1,j:ja+n-1) 300* 301 CALL PZLATRZ( IB, JA+N-J, L, A, I, J, DESCA, TAU, WORK ) 302* 303 IF( I.GT.IA ) THEN 304* 305* Form the triangular factor of the block reflector 306* H = H(i+ib-1) . . . H(i+1) H(i) 307* 308 CALL PZLARZT( 'Backward', 'Rowwise', L, IB, A, I, JM1, 309 $ DESCA, TAU, WORK, WORK( IPW ) ) 310* 311* Apply H to A(ia:i-1,j:ja+n-1) from the right 312* 313 CALL PZLARZB( 'Right', 'No transpose', 'Backward', 314 $ 'Rowwise', I-IA, JA+N-J, IB, L, A, I, JM1, 315 $ DESCA, WORK, A, IA, J, DESCA, WORK( IPW ) ) 316 END IF 317* 318 20 CONTINUE 319* 320* Use unblocked code to factor the last or only block 321* 322 CALL PZLATRZ( IN-IA+1, N, N-M, A, IA, JA, DESCA, TAU, WORK ) 323* 324 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 325 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 326* 327 END IF 328* 329 WORK( 1 ) = DCMPLX( DBLE( LWMIN ) ) 330* 331 RETURN 332* 333* End of PZTZRZF 334* 335 END 336