=head1 NAME

histogram module - part of the Wise2 package

=head1 SYNOPSIS

This module contains the following objects:

=over

=item Histogram

=back

=head1 DESCRIPTION

=head2 Object Histogram

=over

=item histogram

 Type [int*] Scalar  counts of hits

=item min

 Type [int] Scalar  score represented by elem 0 of histogram

=item max

 Type [int] Scalar  score represented by the last elem of histogram

=item highscore

 Type [int] Scalar  highest active elem has this score

=item lowscore

 Type [int] Scalar  lowest active elem has this score

=item lumpsize

 Type [int] Scalar  when resizing, over-allocate by this many elems

=item total

 Type [int] Scalar  total number of hits counted

=item expect

 Type [float*] Scalar  expected counts of hits

=item fit_type

 Type [int] Scalar  flag indicating distribution type

=item param[3]

 Type [float] Scalar  parameters used for fits

=item chisq

 Type [float] Scalar  chi-squared value for goodness of fit

=item chip

 Type [float] Scalar  P-value for chi-squared

=back

This object came from Sean Eddy's excellent histogram package.
He donated it free of all restrictions so that it could be used
in the Wise2 package without complicated licensing terms.
He is a *very* nice man.

It was made into a Dynamite object so that
   a) external ports to scripting languages would be trivial, and
   b) cooperation with future versions of histogram.c would be possible.

Here is the rest of the documentation from Sean.

Keep a score histogram.

The main implementation issue here is that the range of
scores is not known in advance and may extend into negative values.
histogram is a 0..max-min array that represents the score range min..max;
a given score is stored in the histogram array at index score-min.
The AddToHistogram function deals with dynamically
resizing the histogram array when necessary.
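
A minimal C sketch of the score-min indexing and on-demand resizing described above follows. It is an illustration under stated assumptions, not the Wise2 or HMMER source: the SketchHist type and the sketch_hist_* functions are hypothetical names, and the growth rule (extend the array lumpsize elements past the out-of-range score) is an assumption based on the lumpsize field.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the Histogram struct: only the fields needed
 * to show score-min indexing and resizing. */
typedef struct {
    int *histogram;  /* histogram[score - min] counts hits at that score */
    int  min;        /* score represented by histogram[0] */
    int  max;        /* score represented by the last element */
    int  lumpsize;   /* extra elements allocated on each resize */
    int  total;      /* total number of hits counted */
} SketchHist;

SketchHist *sketch_hist_new(int min, int max, int lumpsize)
{
    assert(max >= min && lumpsize >= 0);
    SketchHist *h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->histogram = calloc((size_t)(max - min + 1), sizeof(int));
    if (h->histogram == NULL) { free(h); return NULL; }
    h->min = min;
    h->max = max;
    h->lumpsize = lumpsize;
    h->total = 0;
    return h;
}

/* Count one hit at an integer score, growing the array if the score lies
 * outside min..max. Returns 0 on success, -1 if allocation fails. */
int sketch_hist_add(SketchHist *h, int score)
{
    assert(h != NULL);
    if (score < h->min || score > h->max) {
        int newmin = score < h->min ? score - h->lumpsize : h->min;
        int newmax = score > h->max ? score + h->lumpsize : h->max;
        int *grown = calloc((size_t)(newmax - newmin + 1), sizeof(int));
        if (grown == NULL) return -1;
        /* The count for score s moves from index s - min to s - newmin. */
        memcpy(grown + (h->min - newmin), h->histogram,
               (size_t)(h->max - h->min + 1) * sizeof(int));
        free(h->histogram);
        h->histogram = grown;
        h->min = newmin;
        h->max = newmax;
    }
    h->histogram[score - h->min]++;
    h->total++;
    return 0;
}

void sketch_hist_free(SketchHist *h)
{
    if (h == NULL) return;
    free(h->histogram);
    free(h);
}
```

For example, starting from sketch_hist_new(0, 10, 5) and adding a score of -4 moves min to -9; every existing count keeps its score because it is copied to index score - newmin. Over-allocating by lumpsize trades a little memory for fewer reallocations when scores keep drifting past the current bounds.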

=head2 Member functions of Histogram

=over

=item UnfitHistogram

&Wise2::Histogram::UnfitHistogram(h)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source in June 98.
  Copyright is LGPL. For more information, see the READMEs.

  Documentation:

  Free only the theoretical fit part of a histogram.

  Argument h            [UNKN ] Undocumented argument [Histogram *]
  Return [UNKN ] Undocumented return value [void]


=item add

&Wise2::Histogram::add(h,sc)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source in June 98.
  Copyright is LGPL. For more information, see the READMEs.

  Documentation:

  Bump the appropriate counter in a histogram
  structure, given a score. The score is
  rounded down from float precision to the
  next lower integer.

  Argument h            [UNKN ] Undocumented argument [Histogram *]
  Argument sc           [UNKN ] Undocumented argument [float]
  Return [UNKN ] Undocumented return value [void]

=item show

&Wise2::Histogram::show(h,fp)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source in June 98.
  Copyright is LGPL. For more information, see the READMEs.

  Documentation:

  Print a "prettified" histogram to a file pointer.
  Deliberately a look-and-feel clone of Bill Pearson's
  excellent FASTA output.

  Argument h            [UNKN ] histogram to print [Histogram *]
  Argument fp           [UNKN ] open file to print to (stdout works) [FILE *]
  Return [UNKN ] Undocumented return value [void]
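
The exact FASTA-style layout is not specified here, so the following is only a rough C sketch of the idea: one row per score, with a bar scaled to the largest count. sketch_print_histogram is a hypothetical function that takes a plain counts array rather than a Histogram *, and its column format is our own choice.

```c
#include <assert.h>
#include <stdio.h>

/* Print one row per score: the score, its count, and a bar of '=' whose
 * length is the count scaled so the largest count fills `width` columns.
 * counts[i] holds the number of hits at score min + i. */
void sketch_print_histogram(FILE *fp, const int *counts, int n, int min,
                            int width)
{
    assert(fp != NULL && counts != NULL && n >= 0 && width > 0);
    int peak = 0;
    for (int i = 0; i < n; i++)
        if (counts[i] > peak) peak = counts[i];

    for (int i = 0; i < n; i++) {
        /* Scale to the peak, rounding to the nearest column. */
        long len = peak > 0
            ? ((long)counts[i] * width + peak / 2) / peak
            : 0;
        fprintf(fp, "%5d %6d ", min + i, counts[i]);
        for (long j = 0; j < len; j++) fputc('=', fp);
        fputc('\n', fp);
    }
}
```

Passing stdout as fp prints to the terminal, mirroring the "stdout works" note for show.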

=item EVDBasicFit

&Wise2::Histogram::EVDBasicFit(h)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source in June 98.
  Copyright is LGPL. For more information, see the READMEs.

  Documentation:

  Fit a score histogram to the extreme value
  distribution. Set the parameters lambda
  and mu in the histogram structure. Fill in the
  expected values in the histogram. Calculate
  a chi-square test as a measure of goodness of fit.

  This is the basic version of ExtremeValueFitHistogram(),
  in a nonrobust form: simple linear regression with no
  outlier pruning.

  Methods:  Uses a linear regression fitting method [Collins88, Lawless82]

  Argument h            [UNKN ] histogram to fit [Histogram *]
  Return [UNKN ] Undocumented return value [void]

=item fit_EVD

&Wise2::Histogram::fit_EVD(h,censor,high_hint)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source in June 98.
  Copyright is LGPL. For more information, see the READMEs.

  Documentation:

  Purpose:  Fit a score histogram to the extreme value
  distribution. Set the parameters lambda
  and mu in the histogram structure. Calculate
  a chi-square test as a measure of goodness of fit.

  Methods:  Uses a maximum likelihood method [Lawless82].
  Lower outliers are removed by censoring the data below the peak.
  Upper outliers are removed iteratively using the method
  described by [Mott92].

  Argument h            [UNKN ] histogram to fit [Histogram *]
  Argument censor       [UNKN ] TRUE to censor data left of the peak [int]
  Argument high_hint    [UNKN ] score cutoff; scores above this are real hits that aren't fit [float]
  Return [UNKN ] 1 if the fit is judged to be valid, else 0 if the fit is invalid (too few seqs.) [int]

=item set_EVD

&Wise2::Histogram::set_EVD(h,mu,lambda,lowbound,highbound,wonka,ndegrees)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source in June 98.
  Copyright is LGPL. For more information, see the READMEs.

  Documentation:

  Instead of fitting the histogram to an EVD,
  simply set the EVD parameters from an external source.

  Note that the fudge factor "wonka" is used /only/
  for prettification of expected/theoretical curves
  in PrintASCIIHistogram displays.

  Argument h            [UNKN ] the histogram to set [Histogram *]
  Argument mu           [UNKN ] mu location parameter [float]
  Argument lambda       [UNKN ] lambda scale parameter [float]
  Argument lowbound     [UNKN ] low bound of the histogram that was fit [float]
  Argument highbound    [UNKN ] high bound of the histogram that was fit [float]
  Argument wonka        [UNKN ] fudge factor; fraction of hits estimated to be "EVD-like" [float]
  Argument ndegrees     [UNKN ] extra degrees of freedom to subtract in the chi-squared test [int]
  Return [UNKN ] Undocumented return value [void]

=item fit_Gaussian

&Wise2::Histogram::fit_Gaussian(h,high_hint)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source in June 98.
  Copyright is LGPL. For more information, see the READMEs.

  Documentation:

  Fit a score histogram to a Gaussian distribution.
  Set the parameters mean and sd in the histogram
  structure, and calculate a chi-squared test for
  goodness of fit.

  Argument h            [UNKN ] histogram to fit [Histogram *]
  Argument high_hint    [UNKN ] score cutoff; scores above this are 'real' hits that aren't fit [float]
  Return [UNKN ] 1 if the fit is judged to be valid, else 0 if the fit is invalid (too few seqs.) [int]

=item set_Gaussian

&Wise2::Histogram::set_Gaussian(h,mean,sd)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source in June 98.
  Copyright is LGPL. For more information, see the READMEs.

  Documentation:

  Instead of fitting the histogram to a Gaussian,
  simply set the Gaussian parameters from an external source.

  Argument h            [UNKN ] Undocumented argument [Histogram *]
  Argument mean         [UNKN ] Undocumented argument [float]
  Argument sd           [UNKN ] Undocumented argument [float]
  Return [UNKN ] Undocumented return value [void]

=item evalue

&Wise2::Histogram::evalue(his,score)

  This is a convenient shortcut for calculating
  expected values (E-values) from the histogram of results.

  Argument his          [UNKN ] Histogram object [Histogram *]
  Argument score        [UNKN ] score you want the evalue for [double]
  Return [UNKN ] Undocumented return value [double]

=item hard_link_Histogram

&Wise2::Histogram::hard_link_Histogram(obj)

  Increments the reference count of the object,
  meaning that multiple pointers can 'own' it.

  Argument obj          [UNKN ] Object to be hard linked [Histogram *]
  Return [UNKN ] Undocumented return value [Histogram *]
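
In a typical reference-counting scheme, allocation sets the count to one, each hard link increments it, and each free decrements it. Memory is released only when the count reaches zero. The C sketch below shows that pattern on a hypothetical SketchObj type; the field and function names are illustrative, not the ones Wise2 uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
typedef struct {
    int  refcount;  /* number of owners holding a pointer */
    int *data;      /* owned payload, freed with the last reference */
} SketchObj;

SketchObj *sketch_obj_alloc(void)
{
    SketchObj *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->refcount = 1;   /* the creator is the first owner */
    o->data = NULL;    /* default value */
    return o;
}

/* Share ownership: bump the count and hand back the same pointer. */
SketchObj *sketch_obj_hard_link(SketchObj *o)
{
    assert(o != NULL);
    o->refcount++;
    return o;
}

/* Drop one reference; free only when the last owner lets go.
 * Always returns NULL so callers can write p = sketch_obj_free(p). */
SketchObj *sketch_obj_free(SketchObj *o)
{
    if (o == NULL) return NULL;
    if (--o->refcount > 0) return NULL;
    free(o->data);
    free(o);
    return NULL;
}
```

Because every owner calls free exactly once, no owner needs to know whether it holds the last reference.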

=item alloc

&Wise2::Histogram::alloc(void)

  Allocates the structure and assigns defaults if given.

  Return [UNKN ] Undocumented return value [Histogram *]

=back

=over

=item new_Histogram

&Wise2::new_Histogram(min,max,lumpsize)

  This function was written by Sean Eddy
  as part of his HMMer2 histogram.c module

  Converted by Ewan Birney to Dynamite source in June 98.
  Copyright is LGPL. For more information, see the READMEs.

  Documentation:

  Allocate and return a histogram structure.
  min and max are your best guess. They need
  not be absolutely correct; the histogram
  will expand dynamically to accommodate scores
  that exceed these suggested bounds. The amount
  that the histogram grows by is set by "lumpsize".

  This was called AllocHistogram; new_Histogram is more Wise2-ish.

  Argument min          [UNKN ] minimum score (integer) [int]
  Argument max          [UNKN ] maximum score (integer) [int]
  Argument lumpsize     [UNKN ] when reallocating histogram, the reallocation amount [int]
  Return [UNKN ] Undocumented return value [Histogram *]

=back
