=head1 NAME

histogram module - part of the Wise2 package

=head1 SYNOPSIS

This module contains the following objects

=over

=item Histogram


=back

=head1 DESCRIPTION

=head2 Object Histogram

=over

=item histogram

  Type [int* ] Scalar  counts of hits

=item min

  Type [int] Scalar  elem 0 of histogram == min

=item max

  Type [int] Scalar  last elem of histogram == max

=item highscore

  Type [int] Scalar  highest active elem has this score

=item lowscore

  Type [int] Scalar  lowest active elem has this score

=item lumpsize

  Type [int] Scalar  when resizing, overalloc by this

=item total

  Type [int] Scalar  total # of hits counted

=item expect

  Type [float*] Scalar  expected counts of hits

=item fit_type

  Type [int] Scalar  flag indicating distribution type

=item param[3]

  Type [float] Scalar  parameters used for fits

=item chisq

  Type [float] Scalar  chi-squared val for goodness of fit

=item chip

  Type [float] Scalar  P value for chi-squared



=back

This object came from Sean Eddy's excellent histogram package.
He donated it free of all restrictions to allow it to be used
in the Wise2 package without complicated licensing terms.
He is a *very* nice man.

It was made into a dynamite object so that

  a) external ports to scripting languages would be trivial
  b) cooperation with future versions of histogram.c would be possible.

Here is the rest of the documentation from Sean.

Keep a score histogram.

The main implementation issue here is that the range of
scores is unknown, and will go negative. histogram is
a 0..max-min array that represents the range min..max.
A given score is indexed in the histogram array as score-min.
The AddToHistogram function deals with dynamically
resizing the histogram array when necessary.


=head2 Member functions of Histogram

=over

=item UnfitHistogram

&Wise2::Histogram::UnfitHistogram(h)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source June 98.
  Copyright is LGPL. For more info read READMEs

  Documentation:

  Free only the theoretical fit part of a histogram.



  Argument h            [UNKN ] Undocumented argument [Histogram *]
  Return [UNKN ] Undocumented return value [void]


=item add

&Wise2::Histogram::add(h,sc)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source June 98.
  Copyright is LGPL. For more info read READMEs

  Documentation:

  Bump the appropriate counter in a histogram
  structure, given a score. The score is
  rounded off from float precision to the
  next lower integer.



  Argument h            [UNKN ] Undocumented argument [Histogram *]
  Argument sc           [UNKN ] Undocumented argument [float]
  Return [UNKN ] Undocumented return value [void]


=item show

&Wise2::Histogram::show(h,fp)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source June 98.
  Copyright is LGPL. For more info read READMEs

  Documentation:

  Print a "prettified" histogram to a file pointer.
  Deliberately a look-and-feel clone of Bill Pearson's
  excellent FASTA output.

  Argument h            [UNKN ] histogram to print [Histogram *]
  Argument fp           [UNKN ] open file to print to (stdout works) [FILE *]
  Return [UNKN ] Undocumented return value [void]


=item EVDBasicFit

&Wise2::Histogram::EVDBasicFit(h)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source June 98.
  Copyright is LGPL. For more info read READMEs

  Documentation:

  Fit a score histogram to the extreme value
  distribution. Set the parameters lambda
  and mu in the histogram structure. Fill in the
  expected values in the histogram. Calculate
  a chi-square test as a measure of goodness of fit.

  This is the basic version of ExtremeValueFitHistogram(),
  in a nonrobust form: simple linear regression with no
  outlier pruning.

  Methods:  Uses a linear regression fitting method [Collins88,Lawless82]



  Argument h            [UNKN ] histogram to fit [Histogram *]
  Return [UNKN ] Undocumented return value [void]


=item fit_EVD

&Wise2::Histogram::fit_EVD(h,censor,high_hint)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source June 98.
  Copyright is LGPL. For more info read READMEs

  Documentation:

  Purpose:  Fit a score histogram to the extreme value
            distribution. Set the parameters lambda
            and mu in the histogram structure. Calculate
            a chi-square test as a measure of goodness of fit.

  Methods:  Uses a maximum likelihood method [Lawless82].
            Lower outliers are removed by censoring the data below the peak.
            Upper outliers are removed iteratively using the method
            described by [Mott92].

  Argument h            [UNKN ] histogram to fit [Histogram *]
  Argument censor       [UNKN ] TRUE to censor data left of the peak [int]
  Argument high_hint    [UNKN ] score cutoff; above this are real hits that aren't fit [float]
  Return [UNKN ] 1 if fit is judged to be valid, else 0 if fit is invalid (too few seqs.) [int]


=item set_EVD

&Wise2::Histogram::set_EVD(h,mu,lambda,lowbound,highbound,wonka,ndegrees)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source June 98.
  Copyright is LGPL. For more info read READMEs

  Documentation:

  Instead of fitting the histogram to an EVD,
  simply set the EVD parameters from an external source.

  Note that the fudge factor "wonka" is used /only/
  for prettification of expected/theoretical curves
  in PrintASCIIHistogram displays.




  Argument h            [UNKN ] the histogram to set [Histogram *]
  Argument mu           [UNKN ] mu location parameter [float]
  Argument lambda       [UNKN ] lambda scale parameter [float]
  Argument lowbound     [UNKN ] low bound of the histogram that was fit [float]
  Argument highbound    [UNKN ] high bound of the histogram that was fit [float]
  Argument wonka        [UNKN ] fudge factor; fraction of hits estimated to be "EVD-like" [float]
  Argument ndegrees     [UNKN ] extra degrees of freedom to subtract in chi2 test [int]
  Return [UNKN ] Undocumented return value [void]


=item fit_Gaussian

&Wise2::Histogram::fit_Gaussian(h,high_hint)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source June 98.
  Copyright is LGPL. For more info read READMEs

  Documentation:

  Fit a score histogram to a Gaussian distribution.
  Set the parameters mean and sd in the histogram
  structure, as well as a chi-squared test for
  goodness of fit.

  Argument h            [UNKN ] histogram to fit [Histogram *]
  Argument high_hint    [UNKN ] score cutoff; above this are "real" hits that aren't fit [float]
  Return [UNKN ] 1 if fit is judged to be valid, else 0 if fit is invalid (too few seqs.) [int]


=item set_Gaussian

&Wise2::Histogram::set_Gaussian(h,mean,sd)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source June 98.
  Copyright is LGPL. For more info read READMEs

  Documentation:

  Instead of fitting the histogram to a Gaussian,
  simply set the Gaussian parameters from an external source.



  Argument h            [UNKN ] Undocumented argument [Histogram *]
  Argument mean         [UNKN ] Undocumented argument [float]
  Argument sd           [UNKN ] Undocumented argument [float]
  Return [UNKN ] Undocumented return value [void]


=item evalue

&Wise2::Histogram::evalue(his,score)

  This is a convenient shortcut for calculating
  expected values from the histogram of results



  Argument his          [UNKN ] Histogram object [Histogram *]
  Argument score        [UNKN ] score you want the evalue for [double]
  Return [UNKN ] Undocumented return value [double]


=item hard_link_Histogram

&Wise2::Histogram::hard_link_Histogram(obj)

  Bumps up the reference count of the object,
  meaning that multiple pointers can 'own' it



  Argument obj          [UNKN ] Object to be hard linked [Histogram *]
  Return [UNKN ] Undocumented return value [Histogram *]


=item alloc

&Wise2::Histogram::alloc(void)

  Allocates structure: assigns defaults if given



  Return [UNKN ] Undocumented return value [Histogram *]


=back

=over

=item new_Histogram

&Wise2::new_Histogram(min,max,lumpsize)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source June 98.
  Copyright is LGPL. For more info read READMEs

  Documentation:

  Allocate and return a histogram structure.
  min and max are your best guess. They need
  not be absolutely correct; the histogram
  will expand dynamically to accommodate scores
  that exceed these suggested bounds. The amount
  that the histogram grows by is set by "lumpsize".

  Was called AllocHistogram; new_Histogram is more wise2-ish



  Argument min          [UNKN ] minimum score (integer) [int]
  Argument max          [UNKN ] maximum score (integer) [int]
  Argument lumpsize     [UNKN ] when reallocating histogram, the reallocation amount [int]
  Return [UNKN ] Undocumented return value [Histogram *]


=back

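
As a rough illustration of what an expected value derived from a fitted or externally set EVD looks like, the sketch below evaluates a Gumbel-type tail probability from mu and lambda and scales it by the number of hits counted. It is plain Python, not the Wise2 bindings. The function names evd_pvalue and evd_evalue are hypothetical, and the formulas P(S >= x) = 1 - exp(-exp(-lambda (x - mu))) and E = total * P are assumptions about the distribution form, not something this documentation states for the evalue method.

```python
import math


def evd_pvalue(score, mu, lam):
    """P(S >= score) for an assumed Gumbel-type EVD with parameters mu and lambda."""
    y = -lam * (score - mu)
    if y > 700.0:
        # exp(y) would overflow a double; the tail probability is 1 to machine precision.
        return 1.0
    # -expm1(-t) equals 1 - exp(-t) but stays accurate when t is very small.
    return -math.expm1(-math.exp(y))


def evd_evalue(score, mu, lam, total):
    """Expected number of hits scoring >= score among `total` counted hits (assumed form)."""
    return total * evd_pvalue(score, mu, lam)
```

Under these assumptions a score equal to mu has a tail probability of 1 - exp(-1), and the probability falls off roughly as exp(-lambda (x - mu)) for scores well above mu.
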