      SUBROUTINE PZLASSQ( N, X, IX, JX, DESCX, INCX, SCALE, SUMSQ )
*
*  -- ScaLAPACK auxiliary routine (version 1.7) --
*     University of Tennessee, Knoxville, Oak Ridge National Laboratory,
*     and University of California, Berkeley.
*     May 1, 1997
*
*     .. Scalar Arguments ..
      INTEGER            IX, INCX, JX, N
      DOUBLE PRECISION   SCALE, SUMSQ
*     ..
*     .. Array Arguments ..
      INTEGER            DESCX( * )
      COMPLEX*16         X( * )
*     ..
*
*  Purpose
*  =======
*
*  PZLASSQ returns the values scl and smsq such that
*
*     ( scl**2 )*smsq = x( 1 )**2 +...+ x( n )**2 + ( scale**2 )*sumsq,
*
*  where x( i ) = sub( X ) = abs( X( IX+(JX-1)*DESCX(M_)+(i-1)*INCX ) ).
*  The value of sumsq is assumed to be at least unity and the value of
*  smsq will then satisfy
*
*     1.0 .le. smsq .le. ( sumsq + 2*n ).
*
*  scale is assumed to be non-negative and scl returns the value
*
*     scl = max( scale, abs( real( x( i ) ) ), abs( aimag( x( i ) ) ) ),
*            i
*
*  scale and sumsq must be supplied in SCALE and SUMSQ respectively.
*  SCALE and SUMSQ are overwritten by scl and smsq respectively.
*
*  The routine makes only one pass through the vector sub( X ).
*
*  Notes
*  =====
*
*  Each global data object is described by an associated description
*  vector.  This vector stores the information required to establish
*  the mapping between an object element and its corresponding process
*  and memory location.
*
*  Let A be a generic term for any 2D block cyclically distributed
*  array.  Such a global array has an associated description vector
*  DESCA.  In the following comments, the character _ should be read as
*  "of the global array".
*
*  NOTATION        STORED IN      EXPLANATION
*  --------------- -------------- --------------------------------------
*  DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
*                                 DTYPE_A = 1.
*  CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
*                                 the BLACS process grid A is distribu-
*                                 ted over.  The context itself is glo-
*                                 bal, but the handle (the integer
*                                 value) may vary.
*  M_A    (global) DESCA( M_ )    The number of rows in the global
*                                 array A.
*  N_A    (global) DESCA( N_ )    The number of columns in the global
*                                 array A.
*  MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
*                                 the rows of the array.
*  NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
*                                 the columns of the array.
*  RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
*                                 row of the array A is distributed.
*  CSRC_A (global) DESCA( CSRC_ ) The process column over which the
*                                 first column of the array A is
*                                 distributed.
*  LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
*                                 array.  LLD_A >= MAX(1,LOCr(M_A)).
*
*  Let K be the number of rows or columns of a distributed matrix,
*  and assume that its process grid has dimension p x q.
*  LOCr( K ) denotes the number of elements of K that a process
*  would receive if K were distributed over the p processes of its
*  process column.
*  Similarly, LOCc( K ) denotes the number of elements of K that a
*  process would receive if K were distributed over the q processes of
*  its process row.
*  The values of LOCr() and LOCc() may be determined via a call to the
*  ScaLAPACK tool function, NUMROC:
*          LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
*          LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
*  An upper bound for these quantities may be computed by:
*          LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
*          LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
*
*  Because vectors may be viewed as a subclass of matrices, a
*  distributed vector is considered to be a distributed matrix.
*
*  The results are only available in the scope of sub( X ), i.e. if
*  sub( X ) is distributed along a process row, the correct results are
*  only available in this process row of the grid.  Similarly, if
*  sub( X ) is distributed along a process column, the correct results
*  are only available in this process column of the grid.
*
*  Arguments
*  =========
*
*  N       (global input) INTEGER
*          The length of the distributed vector sub( X ).
*
*  X       (local input) COMPLEX*16 array
*          The local pieces of the vector for which a scaled sum of
*          squares is computed:
*             x( i ) = X( IX+(JX-1)*M_X+(i-1)*INCX ), 1 <= i <= n.
*
*  IX      (global input) INTEGER
*          The row index in the global array X indicating the first
*          row of sub( X ).
*
*  JX      (global input) INTEGER
*          The column index in the global array X indicating the
*          first column of sub( X ).
*
*  DESCX   (global and local input) INTEGER array of dimension DLEN_.
*          The array descriptor for the distributed matrix X.
*
*  INCX    (global input) INTEGER
*          The global increment for the elements of X.  Only two values
*          of INCX are supported in this version, namely 1 and M_X.
*          INCX must not be zero.
*
*  SCALE   (local input/local output) DOUBLE PRECISION
*          On entry, the value scale in the equation above.
*          On exit, SCALE is overwritten with scl, the scaling factor
*          for the sum of squares.
*
*  SUMSQ   (local input/local output) DOUBLE PRECISION
*          On entry, the value sumsq in the equation above.
*          On exit, SUMSQ is overwritten with smsq, the basic sum of
*          squares from which scl has been factored out.
*
*  =====================================================================
*
*     .. Parameters ..
      INTEGER            BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
     $                   LLD_, MB_, M_, NB_, N_, RSRC_
      PARAMETER          ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1,
     $                     CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
     $                     RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
      DOUBLE PRECISION   ZERO
      PARAMETER          ( ZERO = 0.0D+0 )
*     ..
*     .. Local Scalars ..
      INTEGER            I, ICOFF, ICTXT, IIX, IOFF, IROFF, IXCOL,
     $                   IXROW, JJX, LDX, MYCOL, MYROW, NP, NPCOL,
     $                   NPROW, NQ
      DOUBLE PRECISION   TEMP1
*     ..
*     .. Local Arrays ..
      DOUBLE PRECISION   WORK( 2 )
*     ..
*     .. External Subroutines ..
      EXTERNAL           BLACS_GRIDINFO, DCOMBSSQ, INFOG2L, PDTREECOMB
*     ..
*     .. External Functions ..
      INTEGER            NUMROC
      EXTERNAL           NUMROC
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          ABS, DBLE, DIMAG, MOD
*     ..
*     .. Executable Statements ..
*
*     Get grid parameters.
*
      ICTXT = DESCX( CTXT_ )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
*     Compute the local indexes.
*
      CALL INFOG2L( IX, JX, DESCX, NPROW, NPCOL, MYROW, MYCOL, IIX, JJX,
     $              IXROW, IXCOL )
*
      LDX = DESCX( LLD_ )
      IF( INCX.EQ.DESCX( M_ ) ) THEN
*
*        X is rowwise distributed.
*
         IF( MYROW.NE.IXROW )
     $      RETURN
         ICOFF = MOD( JX, DESCX( NB_ ) )
         NQ = NUMROC( N+ICOFF, DESCX( NB_ ), MYCOL, IXCOL, NPCOL )
         IF( MYCOL.EQ.IXCOL )
     $      NQ = NQ - ICOFF
*
*        Code inlined from LAPACK's ZLASSQ (saves a subroutine call).
*
         IF( NQ.GT.0 ) THEN
            IOFF = IIX + ( JJX - 1 ) * LDX
            DO 10 I = 1, NQ
               IF( DBLE( X( IOFF ) ).NE.ZERO ) THEN
                  TEMP1 = ABS( DBLE( X( IOFF ) ) )
                  IF( SCALE.LT.TEMP1 ) THEN
                     SUMSQ = 1 + SUMSQ * ( SCALE / TEMP1 )**2
                     SCALE = TEMP1
                  ELSE
                     SUMSQ = SUMSQ + ( TEMP1 / SCALE )**2
                  END IF
               END IF
               IF( DIMAG( X( IOFF ) ).NE.ZERO ) THEN
                  TEMP1 = ABS( DIMAG( X( IOFF ) ) )
                  IF( SCALE.LT.TEMP1 ) THEN
                     SUMSQ = 1 + SUMSQ * ( SCALE / TEMP1 )**2
                     SCALE = TEMP1
                  ELSE
                     SUMSQ = SUMSQ + ( TEMP1 / SCALE )**2
                  END IF
               END IF
               IOFF = IOFF + LDX
   10       CONTINUE
         END IF
*
*        Combine the local results into the global result.
*
         WORK( 1 ) = SCALE
         WORK( 2 ) = SUMSQ
*
         CALL PDTREECOMB( ICTXT, 'Rowwise', 2, WORK, -1, IXCOL,
     $                    DCOMBSSQ )
*
         SCALE = WORK( 1 )
         SUMSQ = WORK( 2 )
*
      ELSE IF( INCX.EQ.1 ) THEN
*
*        X is columnwise distributed.
*
         IF( MYCOL.NE.IXCOL )
     $      RETURN
         IROFF = MOD( IX, DESCX( MB_ ) )
         NP = NUMROC( N+IROFF, DESCX( MB_ ), MYROW, IXROW, NPROW )
         IF( MYROW.EQ.IXROW )
     $      NP = NP - IROFF
*
*        Code inlined from LAPACK's ZLASSQ (saves a subroutine call).
*
         IF( NP.GT.0 ) THEN
            IOFF = IIX + ( JJX - 1 ) * LDX
            DO 20 I = 1, NP
               IF( DBLE( X( IOFF ) ).NE.ZERO ) THEN
                  TEMP1 = ABS( DBLE( X( IOFF ) ) )
                  IF( SCALE.LT.TEMP1 ) THEN
                     SUMSQ = 1 + SUMSQ * ( SCALE / TEMP1 )**2
                     SCALE = TEMP1
                  ELSE
                     SUMSQ = SUMSQ + ( TEMP1 / SCALE )**2
                  END IF
               END IF
               IF( DIMAG( X( IOFF ) ).NE.ZERO ) THEN
                  TEMP1 = ABS( DIMAG( X( IOFF ) ) )
                  IF( SCALE.LT.TEMP1 ) THEN
                     SUMSQ = 1 + SUMSQ * ( SCALE / TEMP1 )**2
                     SCALE = TEMP1
                  ELSE
                     SUMSQ = SUMSQ + ( TEMP1 / SCALE )**2
                  END IF
               END IF
               IOFF = IOFF + 1
   20       CONTINUE
         END IF
*
*        Combine the local results into the global result.
*
         WORK( 1 ) = SCALE
         WORK( 2 ) = SUMSQ
*
         CALL PDTREECOMB( ICTXT, 'Columnwise', 2, WORK, -1, IXCOL,
     $                    DCOMBSSQ )
*
         SCALE = WORK( 1 )
         SUMSQ = WORK( 2 )
*
      END IF
*
      RETURN
*
*     End of PZLASSQ
*
      END
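The per-element recurrence that PZLASSQ inlines from LAPACK's ZLASSQ can be
illustrated outside Fortran.  Below is a minimal Python sketch (the helper name
`lassq_update` is mine, not part of ScaLAPACK or LAPACK) of how the pair
(scale, sumsq) is maintained so that scale**2 * sumsq equals the running sum of
squares of the real and imaginary parts, avoiding overflow and underflow for
extreme magnitudes:

```python
import math

def lassq_update(x, scale=0.0, sumsq=1.0):
    """One-pass scaled sum of squares over a complex sequence.

    Invariant on exit: scale**2 * sumsq ==
        sum of squares of all real/imaginary parts seen
        + (initial scale)**2 * (initial sumsq).
    """
    for z in x:
        # As in ZLASSQ, real and imaginary parts are handled separately.
        for part in (z.real, z.imag):
            if part != 0.0:
                t = abs(part)
                if scale < t:
                    # Re-express the accumulated sum in terms of the new,
                    # larger scale, then absorb the current element as 1.
                    sumsq = 1.0 + sumsq * (scale / t) ** 2
                    scale = t
                else:
                    sumsq += (t / scale) ** 2
    return scale, sumsq

v = [3 + 4j, 1e-3 - 2j, 0.5 + 0j]
scale, sumsq = lassq_update(v)
# scale is the largest part magnitude (4.0 here); the 2-norm of v is
# recovered as scale * sqrt(sumsq).
nrm = scale * math.sqrt(sumsq)
```

In the distributed routine, each process runs this recurrence over its local
block-cyclic pieces and the per-process (scale, sumsq) pairs are then merged by
PDTREECOMB/DCOMBSSQ across the relevant process row or column.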