1# Copyright (c) 1999-2002 Gary Strangman; All Rights Reserved.
2#
3# This software is distributable under the terms of the GNU
4# General Public License (GPL) v2, the text of which can be found at
5# http://www.gnu.org/copyleft/gpl.html. Installing, importing or otherwise
6# using this module constitutes acceptance of the terms of this License.
7#
8# Disclaimer
9#
10# This software is provided "as-is".  There are no expressed or implied
11# warranties of any kind, including, but not limited to, the warranties
# of merchantability and fitness for a given application.  In no event
13# shall Gary Strangman be liable for any direct, indirect, incidental,
14# special, exemplary or consequential damages (including, but not limited
15# to, loss of use, data or profits, or business interruption) however
16# caused and on any theory of liability, whether in contract, strict
17# liability or tort (including negligence or otherwise) arising in any way
18# out of the use of this software, even if advised of the possibility of
19# such damage.
20#
21# Comments and/or additions are welcome (send e-mail to:
22# strang@nmr.mgh.harvard.edu).
23#
24"""
25stats.py module
26
27(Requires pstat.py module.)
28
29#################################################
30#######  Written by:  Gary Strangman  ###########
31#######  Last modified:  May 10, 2002 ###########
32#################################################
33
A collection of basic statistical functions for Python.  The function
35names appear below.
36
37IMPORTANT:  There are really *3* sets of functions.  The first set has an 'l'
38prefix, which can be used with list or tuple arguments.  The second set has
39an 'a' prefix, which can accept NumPy array arguments.  These latter
40functions are defined only when NumPy is available on the system.  The third
41type has NO prefix (i.e., has the name that appears below).  Functions of
42this set are members of a "Dispatch" class, c/o David Ascher.  This class
43allows different functions to be called depending on the type of the passed
44arguments.  Thus, stats.mean is a member of the Dispatch class and
45stats.mean(range(20)) will call stats.lmean(range(20)) while
46stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)).
47This is a handy way to keep consistent function names when different
48argument types require different functions to be called.  Having
implemented the Dispatch class, however, means that to get info on
a given function, you must use the REAL function name ... that is,
"print stats.lmean.__doc__" or "print stats.amean.__doc__" works fine,
52while "print stats.mean.__doc__" will print the doc for the Dispatch
53class.  NUMPY FUNCTIONS ('a' prefix) generally have more argument options
54but should otherwise be consistent with the corresponding list functions.
55
56Disclaimers:  The function list is obviously incomplete and, worse, the
57functions are not optimized.  All functions have been tested (some more
58so than others), but they are far from bulletproof.  Thus, as with any
59free software, no warranty or guarantee is expressed or implied. :-)  A
60few extra functions that don't appear in the list below can be found by
61interested treasure-hunters.  These functions don't necessarily have
both list and array versions but were deemed useful.
63
64CENTRAL TENDENCY:  geometricmean
65                   harmonicmean
66                   mean
67                   median
68                   medianscore
69                   mode
70
71MOMENTS:  moment
72          variation
73          skew
74          kurtosis
75          skewtest   (for Numpy arrays only)
76          kurtosistest (for Numpy arrays only)
77          normaltest (for Numpy arrays only)
78
79ALTERED VERSIONS:  tmean  (for Numpy arrays only)
80                   tvar   (for Numpy arrays only)
81                   tmin   (for Numpy arrays only)
82                   tmax   (for Numpy arrays only)
83                   tstdev (for Numpy arrays only)
84                   tsem   (for Numpy arrays only)
85                   describe
86
87FREQUENCY STATS:  itemfreq
88                  scoreatpercentile
89                  percentileofscore
90                  histogram
91                  cumfreq
92                  relfreq
93
94VARIABILITY:  obrientransform
95              samplevar
96              samplestdev
97              signaltonoise (for Numpy arrays only)
98              var
99              stdev
100              sterr
101              sem
102              z
103              zs
104              zmap (for Numpy arrays only)
105
106TRIMMING FCNS:  threshold (for Numpy arrays only)
107                trimboth
108                trim1
109                round (round all vals to 'n' decimals; Numpy only)
110
111CORRELATION FCNS:  covariance  (for Numpy arrays only)
112                   correlation (for Numpy arrays only)
113                   paired
114                   pearsonr
115                   spearmanr
116                   pointbiserialr
117                   kendalltau
118                   linregress
119
120INFERENTIAL STATS:  ttest_1samp
121                    ttest_ind
122                    ttest_rel
123                    chisquare
124                    ks_2samp
125                    mannwhitneyu
126                    ranksums
127                    wilcoxont
128                    kruskalwallish
129                    friedmanchisquare
130
131PROBABILITY CALCS:  chisqprob
132                    erfcc
133                    zprob
134                    ksprob
135                    fprob
136                    betacf
137                    gammln
138                    betai
139
140ANOVA FUNCTIONS:  F_oneway
141                  F_value
142
143SUPPORT FUNCTIONS:  writecc
144                    incr
145                    sign  (for Numpy arrays only)
146                    sum
147                    cumsum
148                    ss
149                    summult
150                    sumdiffsquared
151                    square_of_sums
152                    shellsort
153                    rankdata
154                    outputpairedstats
155                    findwithin
156"""
157# CHANGE LOG:
158# ===========
159# 02-11-19 ... fixed attest_ind and attest_rel for div-by-zero Overflows
160# 02-05-10 ... fixed lchisqprob indentation (failed when df=even)
161# 00-12-28 ... removed aanova() to separate module, fixed licensing to
162# match Python License, fixed doc string & imports
163# 00-04-13 ... pulled all "global" statements, except from aanova()
164# added/fixed lots of documentation, removed io.py dependency
165# changed to version 0.5
166# 99-11-13 ... added asign() function
167# 99-11-01 ... changed version to 0.4 ... enough incremental changes now
168# 99-10-25 ... added acovariance and acorrelation functions
169# 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors
170# added aglm function (crude, but will be improved)
171# 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to
172# all handle lists of 'dimension's and keepdims
173# REMOVED ar0, ar2, ar3, ar4 and replaced them with around
174# reinserted fixes for abetai to avoid math overflows
175# 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to
176# handle multi-dimensional arrays (whew!)
177# 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990)
178# added anormaltest per same reference
179# re-wrote azprob to calc arrays of probs all at once
180# 99-08-22 ... edited attest_ind printing section so arrays could be rounded
181# 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on
182# short/byte arrays (mean of #s btw 100-300 = -150??)
183# 99-08-09 ... fixed asum so that the None case works for Byte arrays
184# 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays
185# 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap)
186# 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0])
187# 04/11/99 ... added asignaltonoise, athreshold functions, changed all
188# max/min in array section to N.maximum/N.minimum,
189# fixed square_of_sums to prevent integer overflow
190# 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums
191# 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions
192# 02/28/99 ... Fixed aobrientransform to return an array rather than a list
193# 01/15/99 ... Essentially ceased updating list-versions of functions (!!!)
194# 01/13/99 ... CHANGED TO VERSION 0.3
195# fixed bug in a/lmannwhitneyu p-value calculation
196# 12/31/98 ... fixed variable-name bug in ldescribe
197# 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix)
198# 12/16/98 ... changed amedianscore to return float (not array) for 1 score
199# 12/14/98 ... added atmin and atmax functions
200# removed umath from import line (not needed)
201# l/ageometricmean modified to reduce chance of overflows (take
202# nth root first, then multiply)
203# 12/07/98 ... added __version__variable (now 0.2)
204# removed all 'stats.' from anova() fcn
205# 12/06/98 ... changed those functions (except shellsort) that altered
206# arguments in-place ... cumsum, ranksort, ...
207# updated (and fixed some) doc-strings
208# 12/01/98 ... added anova() function (requires NumPy)
209# incorporated Dispatch class
210# 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean
211# added 'asum' function (added functionality to N.add.reduce)
212# fixed both moment and amoment (two errors)
213# changed name of skewness and askewness to skew and askew
214# fixed (a)histogram (which sometimes counted points <lowerlimit)
215
216import copy
217import math
218import string
219
220from . import pstat               # required 3rd party module
221
222
223__version__ = 0.6
224
225# DISPATCH CODE
226
227
228class Dispatch:
229    """
230The Dispatch class, care of David Ascher, allows different functions to
231be called depending on the argument types.  This way, there can be one
232function name regardless of the argument type.  To access function doc
233in stats.py module, prefix the function with an 'l' or 'a' for list or
234array arguments, respectively.  That is, print stats.lmean.__doc__ or
235print stats.amean.__doc__ or whatever.
236"""
237
238    def __init__(self, *tuples):
239        self._dispatch = {}
240        for func, types in tuples:
241            for t in types:
                if t in self._dispatch:
243                    raise ValueError("can't have two dispatches on "+str(t))
244                self._dispatch[t] = func
245        self._types = list(self._dispatch.keys())
246
247    def __call__(self, arg1, *args, **kw):
248        if type(arg1) not in self._types:
249            raise TypeError("don't know how to dispatch %s arguments" % type(arg1))
250        return self._dispatch[type(arg1)](*(arg1,) + args, **kw)
251
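
# A minimal, illustrative sketch of how the Dispatch class is meant to be
# used (the real dispatch instances -- mean, var, etc. -- are created
# elsewhere in the full module; the lines below are a hypothetical example,
# not part of this file's setup):
#
#     mean = Dispatch((lmean, (list, tuple)), )
#     mean([1, 2, 3, 4])      # arg is a list  -> calls lmean, returns 2.5
#     mean((1, 2, 3, 4))      # arg is a tuple -> also dispatches to lmean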
252
253# LIST-BASED FUNCTIONS
254
255# Define these regardless
256
257# CENTRAL TENDENCY
258
259def lgeometricmean(inlist):
260    """
261Calculates the geometric mean of the values in the passed list.
262That is:  n-th root of (x1 * x2 * ... * xn).  Assumes a '1D' list.
263
264Usage:   lgeometricmean(inlist)
265"""
266    mult = 1.0
267    one_over_n = 1.0/len(inlist)
268    for item in inlist:
269        mult = mult * pow(item, one_over_n)
270    return mult
271
272
273def lharmonicmean(inlist):
274    """
275Calculates the harmonic mean of the values in the passed list.
276That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Assumes a '1D' list.
277
278Usage:   lharmonicmean(inlist)
279"""
280    sum = 0
281    for item in inlist:
282        sum = sum + 1.0/item
283    return len(inlist) / sum
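
# Quick illustrative checks of the two means above (a usage sketch only;
# printed values are rounded):
#
#     lgeometricmean([2, 8])       # -> 4.0   (sqrt(2*8))
#     lharmonicmean([2, 3, 6])     # -> 3.0   (3 / (1/2 + 1/3 + 1/6))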
284
285
286def lmean(inlist):
287    """
Returns the arithmetic mean of the values in the passed list.
289Assumes a '1D' list, but will function on the 1st dim of an array(!).
290
291Usage:   lmean(inlist)
292"""
293    sum = 0
294    for item in inlist:
295        sum = sum + item
296    return sum/float(len(inlist))
297
298
299def lmedian(inlist, numbins=1000):
300    """
301Returns the computed median value of a list of numbers, given the
302number of bins to use for the histogram (more bins brings the computed value
303closer to the median score, default number of bins = 1000).  See G.W.
304Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics.
305
306Usage:   lmedian (inlist, numbins=1000)
307"""
308    (hist, smallest, binsize, extras) = histogram(inlist, numbins)  # make histog
309    cumhist = cumsum(hist)              # make cumulative histogram
310    for i in range(len(cumhist)):        # get 1st(!) index holding 50%ile score
311        if cumhist[i] >= len(inlist)/2.0:
312            cfbin = i
313            break
    LRL = smallest + binsize*cfbin        # get lower real limit of that bin
315    cfbelow = cumhist[cfbin-1]
316    freq = float(hist[cfbin])                # frequency IN the 50%ile bin
317    median = LRL + ((len(inlist)/2.0 - cfbelow)/float(freq))*binsize  # median formula
318    return median
319
320
321def lmedianscore(inlist):
322    """
323Returns the 'middle' score of the passed list.  If there is an even
324number of scores, the mean of the 2 middle scores is returned.
325
326Usage:   lmedianscore(inlist)
327"""
328
329    newlist = sorted(copy.deepcopy(inlist))
    if len(newlist) % 2 == 0:   # if even number of scores, average middle 2
        index = len(newlist) // 2  # integer division correct
        median = float(newlist[index] + newlist[index-1]) / 2
    else:
        index = len(newlist) // 2  # int division gives mid value when count from 0
        median = newlist[index]
336    return median
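
# Illustration of the even/odd behaviour described above (a hedged sketch,
# not part of the module's tests):
#
#     lmedianscore([1, 3, 7])        # odd count  -> 3 (middle score)
#     lmedianscore([1, 3, 7, 9])     # even count -> 5.0 (mean of 3 and 7)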
337
338
339def lmode(inlist):
340    """
341Returns a list of the modal (most common) score(s) in the passed
342list.  If there is more than one such score, all are returned.  The
343bin-count for the mode(s) is also returned.
344
345Usage:   lmode(inlist)
346Returns: bin-count for mode(s), a list of modal value(s)
347"""
348
349    scores = sorted(pstat.unique(inlist))
350    freq = []
351    for item in scores:
352        freq.append(inlist.count(item))
353    maxfreq = max(freq)
354    mode = []
355    stillmore = 1
356    while stillmore:
357        try:
358            indx = freq.index(maxfreq)
359            mode.append(scores[indx])
360            del freq[indx]
361            del scores[indx]
362        except ValueError:
363            stillmore = 0
364    return maxfreq, mode
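
# Example of the return format described above (illustrative only):
#
#     lmode([1, 2, 2, 3, 3])    # -> (2, [2, 3])  two modes, each seen twice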
365
366
367# MOMENTS
368
369def lmoment(inlist, moment=1):
370    """
371Calculates the nth moment about the mean for a sample (defaults to
372the 1st moment).  Used to calculate coefficients of skewness and kurtosis.
373
374Usage:   lmoment(inlist,moment=1)
375Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
376"""
377    if moment == 1:
378        return 0.0
379    else:
380        mn = mean(inlist)
381        n = len(inlist)
382        s = 0
383        for x in inlist:
384            s = s + (x-mn)**moment
385        return s/float(n)
386
387
388def lvariation(inlist):
389    """
390Returns the coefficient of variation, as defined in CRC Standard
391Probability and Statistics, p.6.
392
393Usage:   lvariation(inlist)
394"""
395    return 100.0*samplestdev(inlist)/float(mean(inlist))
396
397
398def lskew(inlist):
399    """
400Returns the skewness of a distribution, as defined in Numerical
Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)
402
403Usage:   lskew(inlist)
404"""
405    return moment(inlist, 3)/pow(moment(inlist, 2), 1.5)
406
407
408def lkurtosis(inlist):
409    """
410Returns the kurtosis of a distribution, as defined in Numerical
Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)
412
413Usage:   lkurtosis(inlist)
414"""
415    return moment(inlist, 4)/pow(moment(inlist, 2), 2.0)
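
# Rough sanity checks for the moment-based measures above (illustrative;
# note this kurtosis definition is NOT excess kurtosis, i.e. no "-3" term):
#
#     lskew([1, 2, 3, 4, 5])        # symmetric data -> 0.0
#     lkurtosis([1, 2, 3, 4, 5])    # -> approx. 1.7 (flat data, platykurtic)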
416
417
418def ldescribe(inlist):
419    """
420Returns some descriptive statistics of the passed list (assumed to be 1D).
421
422Usage:   ldescribe(inlist)
423Returns: n, mean, standard deviation, skew, kurtosis
424"""
425    n = len(inlist)
426    mm = (min(inlist), max(inlist))
427    m = mean(inlist)
428    sd = stdev(inlist)
429    sk = skew(inlist)
430    kurt = kurtosis(inlist)
431    return n, mm, m, sd, sk, kurt
432
433
434# FREQUENCY STATS
435
436def litemfreq(inlist):
437    """
438Returns a list of pairs.  Each pair consists of one of the scores in inlist
and its frequency count.  Assumes a 1D list is passed.
440
441Usage:   litemfreq(inlist)
442Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
443"""
444    scores = sorted(pstat.unique(inlist))
445    freq = []
446    for item in scores:
447        freq.append(inlist.count(item))
448    return pstat.abut(scores, freq)
449
450
451def lscoreatpercentile(inlist, percent):
452    """
453Returns the score at a given percentile relative to the distribution
454given by inlist.
455
456Usage:   lscoreatpercentile(inlist,percent)
457"""
458    if percent > 1:
459        print("\nDividing percent>1 by 100 in lscoreatpercentile().\n")
460        percent = percent / 100.0
461    targetcf = percent*len(inlist)
462    h, lrl, binsize, extras = histogram(inlist)
463    cumhist = cumsum(copy.deepcopy(h))
464    for i in range(len(cumhist)):
465        if cumhist[i] >= targetcf:
466            break
467    score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
468    return score
469
470
471def lpercentileofscore(inlist, score, histbins=10, defaultlimits=None):
472    """
473Returns the percentile value of a score relative to the distribution
474given by inlist.  Formula depends on the values used to histogram the data(!).
475
476Usage:   lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
477"""
478
479    h, lrl, binsize, extras = histogram(inlist, histbins, defaultlimits)
480    cumhist = cumsum(copy.deepcopy(h))
481    i = int((score - lrl)/float(binsize))
482    pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100
483    return pct
484
485
486def lhistogram(inlist, numbins=10, defaultreallimits=None, printextras=0):
487    """
488Returns (i) a list of histogram bin counts, (ii) the smallest value
489of the histogram binning, and (iii) the bin width (the last 2 are not
490necessarily integers).  Default number of bins is 10.  If no sequence object
491is given for defaultreallimits, the routine picks (usually non-pretty) bins
492spanning all the numbers in the inlist.
493
Usage:   lhistogram (inlist, numbins=10, defaultreallimits=None, printextras=0)
495Returns: list of bin values, lowerreallimit, binsize, extrapoints
496"""
497    if (defaultreallimits is not None):
498        if type(defaultreallimits) not in [list, tuple] or len(defaultreallimits) == 1:  # only one limit given, assumed to be lower one & upper is calc'd
499            lowerreallimit = defaultreallimits
500            upperreallimit = 1.0001 * max(inlist)
501        else:  # assume both limits given
502            lowerreallimit = defaultreallimits[0]
503            upperreallimit = defaultreallimits[1]
504        binsize = (upperreallimit-lowerreallimit)/float(numbins)
505    else:     # no limits given for histogram, both must be calc'd
506        estbinwidth = (max(inlist)-min(inlist))/float(numbins) + 1  # 1=>cover all
507        binsize = (max(inlist)-min(inlist)+estbinwidth)/float(numbins)
508        lowerreallimit = min(inlist) - binsize/2  # lower real limit,1st bin
509    bins = [0]*(numbins)
510    extrapoints = 0
511    for num in inlist:
512        try:
513            if (num-lowerreallimit) < 0:
514                extrapoints = extrapoints + 1
515            else:
516                bintoincrement = int((num-lowerreallimit)/float(binsize))
517                bins[bintoincrement] = bins[bintoincrement] + 1
518        except Exception:
519            extrapoints = extrapoints + 1
520    if (extrapoints > 0 and printextras == 1):
521        print('\nPoints outside given histogram range =', extrapoints)
522    return (bins, lowerreallimit, binsize, extrapoints)
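
# Example with explicit limits, so the bins come out "pretty" (a hedged
# usage sketch with made-up data):
#
#     lhistogram([1, 2, 2, 3, 5], numbins=4, defaultreallimits=(0, 8))
#     # -> ([1, 3, 1, 0], 0, 2.0, 0)   i.e. bin counts, lower real limit,
#     #    bin width, and number of out-of-range points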
523
524
525def lcumfreq(inlist, numbins=10, defaultreallimits=None):
526    """
527Returns a cumulative frequency histogram, using the histogram function.
528
529Usage:   lcumfreq(inlist,numbins=10,defaultreallimits=None)
530Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
531"""
532    h, l, b, e = histogram(inlist, numbins, defaultreallimits)
533    cumhist = cumsum(copy.deepcopy(h))
534    return cumhist, l, b, e
535
536
537def lrelfreq(inlist, numbins=10, defaultreallimits=None):
538    """
539Returns a relative frequency histogram, using the histogram function.
540
541Usage:   lrelfreq(inlist,numbins=10,defaultreallimits=None)
Returns: list of relative freq bin values, lowerreallimit, binsize, extrapoints
543"""
544    h, l, b, e = histogram(inlist, numbins, defaultreallimits)
545    for i in range(len(h)):
546        h[i] = h[i]/float(len(inlist))
547    return h, l, b, e
548
549
550# VARIABILITY FUNCTIONS
551
552def lobrientransform(*args):
553    """
554Computes a transform on input data (any number of columns).  Used to
555test for homogeneity of variance prior to running one-way stats.  From
556Maxwell and Delaney, p.112.
557
558Usage:   lobrientransform(*args)
559Returns: transformed data for use in an ANOVA
560"""
561    TINY = 1e-10
562    k = len(args)
563    n = [0.0]*k
564    v = [0.0]*k
565    m = [0.0]*k
566    nargs = []
567    for i in range(k):
568        nargs.append(copy.deepcopy(args[i]))
569        n[i] = float(len(nargs[i]))
570        v[i] = var(nargs[i])
571        m[i] = mean(nargs[i])
572    for j in range(k):
        for i in range(int(n[j])):   # n[j] is stored as a float above
574            t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2
575            t2 = 0.5*v[j]*(n[j]-1.0)
576            t3 = (n[j]-1.0)*(n[j]-2.0)
577            nargs[j][i] = (t1-t2) / float(t3)
578    check = 1
579    for j in range(k):
580        if v[j] - mean(nargs[j]) > TINY:
581            check = 0
582    if check != 1:
583        raise ValueError('Problem in obrientransform.')
584    else:
585        return nargs
586
587
588def lsamplevar(inlist):
589    """
590Returns the variance of the values in the passed list using
591N for the denominator (i.e., DESCRIBES the sample variance only).
592
593Usage:   lsamplevar(inlist)
594"""
595    n = len(inlist)
596    mn = mean(inlist)
597    deviations = []
598    for item in inlist:
599        deviations.append(item-mn)
600    return ss(deviations)/float(n)
601
602
603def lsamplestdev(inlist):
604    """
605Returns the standard deviation of the values in the passed list using
606N for the denominator (i.e., DESCRIBES the sample stdev only).
607
608Usage:   lsamplestdev(inlist)
609"""
610    return math.sqrt(samplevar(inlist))
611
612
613def lvar(inlist):
614    """
615Returns the variance of the values in the passed list using N-1
616for the denominator (i.e., for estimating population variance).
617
618Usage:   lvar(inlist)
619"""
620    n = len(inlist)
621    mn = mean(inlist)
622    deviations = [0]*len(inlist)
623    for i in range(len(inlist)):
624        deviations[i] = inlist[i] - mn
625    return ss(deviations)/float(n-1)
626
627
628def lstdev(inlist):
629    """
630Returns the standard deviation of the values in the passed list
631using N-1 in the denominator (i.e., to estimate population stdev).
632
633Usage:   lstdev(inlist)
634"""
635    return math.sqrt(var(inlist))
636
637
638def lsterr(inlist):
639    """
640Returns the standard error of the values in the passed list using N-1
641in the denominator (i.e., to estimate population standard error).
642
643Usage:   lsterr(inlist)
644"""
645    return stdev(inlist) / float(math.sqrt(len(inlist)))
646
647
648def lsem(inlist):
649    """
650Returns the estimated standard error of the mean (sx-bar) of the
651values in the passed list.  sem = stdev / sqrt(n)
652
653Usage:   lsem(inlist)
654"""
655    sd = stdev(inlist)
656    n = len(inlist)
657    return sd/math.sqrt(n)
658
659
660def lz(inlist, score):
661    """
662Returns the z-score for a given input score, given that score and the
663list from which that score came.  Not appropriate for population calculations.
664
665Usage:   lz(inlist, score)
666"""
667    z = (score-mean(inlist))/samplestdev(inlist)
668    return z
669
670
671def lzs(inlist):
672    """
673Returns a list of z-scores, one for each score in the passed list.
674
675Usage:   lzs(inlist)
676"""
677    zscores = []
678    for item in inlist:
679        zscores.append(z(inlist, item))
680    return zscores
681
682
683# TRIMMING FUNCTIONS
684
685def ltrimboth(l, proportiontocut):
686    """
687Slices off the passed proportion of items from BOTH ends of the passed
688list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
10% of scores).  Assumes list is sorted by magnitude.  Slices off LESS if
690proportion results in a non-integer slice index (i.e., conservatively
691slices off proportiontocut).
692
693Usage:   ltrimboth (l,proportiontocut)
694Returns: trimmed version of list l
695"""
696    lowercut = int(proportiontocut*len(l))
697    uppercut = len(l) - lowercut
698    return l[lowercut:uppercut]
699
700
701def ltrim1(l, proportiontocut, tail='right'):
702    """
703Slices off the passed proportion of items from ONE end of the passed
704list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
70510% of scores).  Slices off LESS if proportion results in a non-integer
706slice index (i.e., conservatively slices off proportiontocut).
707
708Usage:   ltrim1 (l,proportiontocut,tail='right')  or set tail='left'
709Returns: trimmed version of list l
710"""
711    if tail == 'right':
712        lowercut = 0
713        uppercut = len(l) - int(proportiontocut*len(l))
714    elif tail == 'left':
715        lowercut = int(proportiontocut*len(l))
716        uppercut = len(l)
717    return l[lowercut:uppercut]
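
# Illustration of the conservative trimming described above (a sketch only;
# both functions assume the list is already sorted by magnitude):
#
#     ltrimboth(list(range(10)), 0.1)    # -> [1, 2, 3, 4, 5, 6, 7, 8]
#     ltrim1(list(range(10)), 0.25)      # -> [0, 1, 2, 3, 4, 5, 6, 7]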
718
719
720# CORRELATION FUNCTIONS
721
722def lpaired(x, y):
723    """
724Interactively determines the type of data and then runs the
appropriate statistic for paired group data.
726
727Usage:   lpaired(x,y)
728Returns: appropriate statistic name, value, and probability
729"""
730    samples = ''
731    while samples not in ['i', 'r', 'I', 'R', 'c', 'C']:
732        print('\nIndependent or related samples, or correlation (i,r,c): ', end=' ')
733        samples = input()
734
735    if samples in ['i', 'I', 'r', 'R']:
736        print('\nComparing variances ...', end=' ')
737# USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112
738        r = obrientransform(x, y)
739        f, p = F_oneway(pstat.colex(r, 0), pstat.colex(r, 1))
740        if p < 0.05:
741            vartype = 'unequal, p='+str(round(p, 4))
742        else:
743            vartype = 'equal'
744        print(vartype)
745        if samples in ['i', 'I']:
746            if vartype[0] == 'e':
747                t, p = ttest_ind(x, y, 0)
748                print('\nIndependent samples t-test:  ', round(t, 4), round(p, 4))
749            else:
750                if len(x) > 20 or len(y) > 20:
751                    z, p = ranksums(x, y)
752                    print('\nRank Sums test (NONparametric, n>20):  ', round(z, 4), round(p, 4))
753                else:
754                    u, p = mannwhitneyu(x, y)
755                    print('\nMann-Whitney U-test (NONparametric, ns<20):  ', round(u, 4), round(p, 4))
756
757        else:  # RELATED SAMPLES
758            if vartype[0] == 'e':
759                t, p = ttest_rel(x, y, 0)
760                print('\nRelated samples t-test:  ', round(t, 4), round(p, 4))
761            else:
                t, p = wilcoxont(x, y)
763                print('\nWilcoxon T-test (NONparametric):  ', round(t, 4), round(p, 4))
764    else:  # CORRELATION ANALYSIS
765        corrtype = ''
766        while corrtype not in ['c', 'C', 'r', 'R', 'd', 'D']:
767            print('\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', end=' ')
768            corrtype = input()
769        if corrtype in ['c', 'C']:
770            m, b, r, p, see = linregress(x, y)
771            print('\nLinear regression for continuous variables ...')
772            lol = [['Slope', 'Intercept', 'r', 'Prob', 'SEestimate'], [round(m, 4), round(b, 4), round(r, 4), round(p, 4), round(see, 4)]]
773            pstat.printcc(lol)
774        elif corrtype in ['r', 'R']:
775            r, p = spearmanr(x, y)
776            print('\nCorrelation for ranked variables ...')
777            print("Spearman's r: ", round(r, 4), round(p, 4))
778        else:  # DICHOTOMOUS
779            r, p = pointbiserialr(x, y)
780            print('\nAssuming x contains a dichotomous variable ...')
781            print('Point Biserial r: ', round(r, 4), round(p, 4))
782    print('\n\n')
783    return None
784
785
786def lpearsonr(x, y):
787    """
788Calculates a Pearson correlation coefficient and the associated
789probability value.  Taken from Heiman's Basic Statistics for the Behav.
790Sci (2nd), p.195.
791
792Usage:   lpearsonr(x,y)      where x and y are equal-length lists
793Returns: Pearson's r value, two-tailed p-value
794"""
795    TINY = 1.0e-30
796    if len(x) != len(y):
797        raise ValueError('Input values not paired in pearsonr.  Aborting.')
798    n = len(x)
799    x = [float(_) for _ in x]
800    y = [float(_) for _ in y]
801    r_num = n*(summult(x, y)) - sum(x)*sum(y)
802    r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
803    r = (r_num / r_den)  # denominator already a float
804    df = n-2
805    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
806    prob = betai(0.5*df, 0.5, df/float(df+t*t))
807    return r, prob
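
# A tiny worked example for pearsonr (hedged; the p-value is only
# meaningful for larger samples):
#
#     lpearsonr([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly linear
#     # -> r = 1.0, two-tailed p essentially 0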
808
809
810def lspearmanr(x, y):
811    """
812Calculates a Spearman rank-order correlation coefficient.  Taken
813from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
814
815Usage:   lspearmanr(x,y)      where x and y are equal-length lists
816Returns: Spearman's r, two-tailed p-value
817"""
818    if len(x) != len(y):
819        raise ValueError('Input values not paired in spearmanr.  Aborting.')
820    n = len(x)
821    rankx = rankdata(x)
822    ranky = rankdata(y)
823    dsq = sumdiffsquared(rankx, ranky)
824    rs = 1 - 6*dsq / float(n*(n**2-1))
825    t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
826    df = n-2
827    probrs = betai(0.5*df, 0.5, df/(df+t*t))  # t already a float
828# probability values for rs are from part 2 of the spearman function in
# Numerical Recipes, p.510.  They are close to tables, but not exact. (?)
830    return rs, probrs
831
832
833def lpointbiserialr(x, y):
834    """
835Calculates a point-biserial correlation coefficient and the associated
836probability value.  Taken from Heiman's Basic Statistics for the Behav.
837Sci (1st), p.194.
838
839Usage:   lpointbiserialr(x,y)      where x,y are equal-length lists
840Returns: Point-biserial r, two-tailed p-value
841"""
842    TINY = 1e-30
843    if len(x) != len(y):
844        raise ValueError('INPUT VALUES NOT PAIRED IN pointbiserialr.  ABORTING.')
845    data = pstat.abut(x, y)
846    categories = pstat.unique(x)
847    if len(categories) != 2:
848        raise ValueError("Exactly 2 categories required for pointbiserialr().")
849    else:   # there are 2 categories, continue
850        codemap = pstat.abut(categories, range(2))
851        pstat.recode(data, codemap, 0)  # recoded
852        x = pstat.linexand(data, 0, categories[0])
853        y = pstat.linexand(data, 0, categories[1])
854        xmean = mean(pstat.colex(x, 1))
855        ymean = mean(pstat.colex(y, 1))
856        n = len(data)
857        adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
858        rpb = (ymean - xmean)/samplestdev(pstat.colex(data, 1))*adjust
859        df = n-2
860        t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
861        prob = betai(0.5*df, 0.5, df/(df+t*t))  # t already a float
862        return rpb, prob
863
864
865def lkendalltau(x, y):
866    """
867Calculates Kendall's tau ... correlation of ordinal data.  Adapted
from function kendl1 in Numerical Recipes.  Needs good test-routine.@@@
869
870Usage:   lkendalltau(x,y)
871Returns: Kendall's tau, two-tailed p-value
872"""
873    n1 = 0
874    n2 = 0
875    iss = 0
876    for j in range(len(x)-1):
        for k in range(j+1, len(y)):  # compare each distinct pair once
878            a1 = x[j] - x[k]
879            a2 = y[j] - y[k]
880            aa = a1 * a2
881            if (aa):             # neither list has a tie
882                n1 = n1 + 1
883                n2 = n2 + 1
884                if aa > 0:
885                    iss = iss + 1
886                else:
887                    iss = iss - 1
888            else:
889                if (a1):
890                    n1 = n1 + 1
891                else:
892                    n2 = n2 + 1
893    tau = iss / math.sqrt(n1*n2)
894    svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
895    z = tau / math.sqrt(svar)
896    prob = erfcc(abs(z)/1.4142136)
897    return tau, prob
898
899
900def llinregress(x, y):
901    """
902Calculates a regression line on x,y pairs.
903
904Usage:   llinregress(x,y)      x,y are equal-length lists of x-y coordinates
905Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate
906"""
907    TINY = 1.0e-20
908    if len(x) != len(y):
909        raise ValueError('Input values not paired in linregress.  Aborting.')
910    n = len(x)
911    x = [float(_) for _ in x]
912    y = [float(_) for _ in y]
913    xmean = mean(x)
914    ymean = mean(y)
915    r_num = float(n*(summult(x, y)) - sum(x)*sum(y))
916    r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
917    r = r_num / r_den
918    df = n-2
919    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
920    prob = betai(0.5*df, 0.5, df/(df+t*t))
921    slope = r_num / float(n*ss(x) - square_of_sums(x))
922    intercept = ymean - slope*xmean
923    sterrest = math.sqrt(1-r*r)*samplestdev(y)
924    return slope, intercept, r, prob, sterrest
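
# Worked example of the regression quantities returned above (a sketch with
# exact, hand-checkable numbers):
#
#     llinregress([1, 2, 3, 4], [3, 5, 7, 9])    # y = 2x + 1 exactly
#     # -> slope 2.0, intercept 1.0, r 1.0, p ~ 0, est. std error 0.0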
925
926
927# INFERENTIAL STATISTICS
928
929def lttest_1samp(a, popmean, printit=0, name='Sample', writemode='a'):
930    """
Calculates the t-obtained for the single-sample T-test on ONE group
932of scores a, given a population mean.  If printit=1, results are printed
933to the screen.  If printit='filename', the results are output to 'filename'
934using the given writemode (default=append).  Returns t-value, and prob.
935
Usage:   lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
937Returns: t-value, two-tailed prob
938"""
939    x = mean(a)
940    v = var(a)
941    n = len(a)
942    df = n-1
943    svar = ((n-1)*v)/float(df)
944    t = (x-popmean)/math.sqrt(svar*(1.0/n))
945    prob = betai(0.5*df, 0.5, float(df)/(df+t*t))
946
947    if printit != 0:
948        statname = 'Single-sample T-test.'
949        outputpairedstats(printit, writemode,
950                          'Population', '--', popmean, 0, 0, 0,
951                          name, n, x, v, min(a), max(a),
952                          statname, t, prob)
953    return t, prob
954
955
956def lttest_ind(a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
957    """
958Calculates the t-obtained T-test on TWO INDEPENDENT samples of
scores a and b.  From Numerical Recipes, p.483.  If printit=1, results
960are printed to the screen.  If printit='filename', the results are output
961to 'filename' using the given writemode (default=append).  Returns t-value,
962and prob.
963
964Usage:   lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a')
965Returns: t-value, two-tailed prob
966"""
967    x1 = mean(a)
968    x2 = mean(b)
969    v1 = stdev(a)**2
970    v2 = stdev(b)**2
971    n1 = len(a)
972    n2 = len(b)
973    df = n1+n2-2
974    svar = ((n1-1)*v1+(n2-1)*v2)/float(df)
975    t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2))
976    prob = betai(0.5*df, 0.5, df/(df+t*t))
977
978    if printit != 0:
979        statname = 'Independent samples T-test.'
980        outputpairedstats(printit, writemode,
981                          name1, n1, x1, v1, min(a), max(a),
982                          name2, n2, x2, v2, min(b), max(b),
983                          statname, t, prob)
984    return t, prob
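
# Example of the independent-samples t-test (illustrative values; the
# p-value comes from the betai() routine further below):
#
#     lttest_ind([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
#     # -> t approx. -5.0, two-tailed p approx. 0.001 (df = 8)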
985
986
987def lttest_rel(a, b, printit=0, name1='Sample1', name2='Sample2', writemode='a'):
988    """
989Calculates the t-obtained T-test on TWO RELATED samples of scores,
a and b.  From Numerical Recipes, p.483.  If printit=1, results are
991printed to the screen.  If printit='filename', the results are output to
992'filename' using the given writemode (default=append).  Returns t-value,
993and prob.
994
995Usage:   lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a')
996Returns: t-value, two-tailed prob
997"""
998    if len(a) != len(b):
999        raise ValueError('Unequal length lists in ttest_rel.')
1000    x1 = mean(a)
1001    x2 = mean(b)
1002    v1 = var(a)
1003    v2 = var(b)
1004    n = len(a)
1005    cov = 0
1006    for i in range(len(a)):
1007        cov = cov + (a[i]-x1) * (b[i]-x2)
1008    df = n-1
1009    cov = cov / float(df)
1010    sd = math.sqrt((v1+v2 - 2.0*cov)/float(n))
1011    t = (x1-x2)/sd
1012    prob = betai(0.5*df, 0.5, df/(df+t*t))
1013
1014    if printit != 0:
1015        statname = 'Related samples T-test.'
1016        outputpairedstats(printit, writemode,
1017                          name1, n, x1, v1, min(a), max(a),
1018                          name2, n, x2, v2, min(b), max(b),
1019                          statname, t, prob)
1020    return t, prob
1021
1022
1023def lchisquare(f_obs, f_exp=None):
1024    """
1025Calculates a one-way chi square for list of observed frequencies and returns
1026the result.  If no expected frequencies are given, the total N is assumed to
1027be equally distributed across all groups.
1028
1029Usage:   lchisquare(f_obs, f_exp=None)   f_obs = list of observed cell freq.
1030Returns: chisquare-statistic, associated p-value
1031"""
1032    k = len(f_obs)                 # number of groups
1033    if f_exp is None:
1034        f_exp = [sum(f_obs)/float(k)] * len(f_obs)  # create k bins with = freq.
1035    chisq = 0
1036    for i in range(len(f_obs)):
1037        chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i])
1038    return chisq, chisqprob(chisq, k-1)
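
# Example of the goodness-of-fit test above (hedged sketch):
#
#     lchisquare([10, 10, 10, 10])    # observed == expected (equal split)
#     # -> chi-square 0.0, p = 1.0
#     lchisquare([5, 15, 10, 10])     # same N, uneven split -> chisq 5.0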
1039
1040
1041def lks_2samp(data1, data2):
1042    """
Computes the Kolmogorov-Smirnov statistic on 2 samples.  From
Numerical Recipes in C, page 493.
1045
1046Usage:   lks_2samp(data1,data2)   data1&2 are lists of values for 2 conditions
1047Returns: KS D-value, associated p-value
1048"""
1049    j1 = 0
1050    j2 = 0
1051    fn1 = 0.0
1052    fn2 = 0.0
1053    n1 = len(data1)
1054    n2 = len(data2)
1055    en1 = n1
1056    en2 = n2
1057    d = 0.0
1058    data1.sort()
1059    data2.sort()
1060    while j1 < n1 and j2 < n2:
1061        d1 = data1[j1]
1062        d2 = data2[j2]
1063        if d1 <= d2:
1064            fn1 = (j1)/float(en1)
1065            j1 = j1 + 1
1066        if d2 <= d1:
1067            fn2 = (j2)/float(en2)
1068            j2 = j2 + 1
1069        dt = (fn2-fn1)
1070        if math.fabs(dt) > math.fabs(d):
1071            d = dt
1072    try:
1073        en = math.sqrt(en1*en2/float(en1+en2))
1074        prob = ksprob((en+0.12+0.11/en)*abs(d))
1075    except Exception:
1076        prob = 1.0
1077    return d, prob
1078
1079
1080def lmannwhitneyu(x, y):
1081    """
1082Calculates a Mann-Whitney U statistic on the provided scores and
1083returns the result.  Use only when the n in each condition is < 20 and
1084you have 2 independent samples of ranks.  NOTE: Mann-Whitney U is
1085significant if the u-obtained is LESS THAN or equal to the critical
1086value of U found in the tables.  Equivalent to Kruskal-Wallis H with
1087just 2 groups.
1088
Usage:   lmannwhitneyu(x,y)
1090Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
1091"""
1092    n1 = len(x)
1093    n2 = len(y)
1094    ranked = rankdata(x+y)
1095    rankx = ranked[0:n1]       # get the x-ranks
1096    u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)  # calc U for x
1097    u2 = n1*n2 - u1                            # remainder is U for y
1098    bigu = max(u1, u2)
1099    smallu = min(u1, u2)
1100    T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores
1101    if T == 0:
1102        raise ValueError('All numbers are identical in lmannwhitneyu')
1103    sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
1104    z = abs((bigu-n1*n2/2.0) / sd)  # normal approximation for prob calc
1105    return smallu, 1.0 - zprob(z)
1106
1107
1108def ltiecorrect(rankvals):
1109    """
1110Corrects for ties in Mann Whitney U and Kruskal Wallis H tests.  See
1111Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences.
1112New York: McGraw-Hill.  Code adapted from |Stat rankind.c code.
1113
1114Usage:   ltiecorrect(rankvals)
1115Returns: T correction factor for U or H
1116"""
1117    sorted, posn = shellsort(rankvals)
1118    n = len(sorted)
1119    T = 0.0
1120    i = 0
1121    while (i < n-1):
1122        if sorted[i] == sorted[i+1]:
1123            nties = 1
1124            while (i < n-1) and (sorted[i] == sorted[i+1]):
1125                nties = nties + 1
1126                i = i + 1
1127            T = T + nties**3 - nties
1128        i = i+1
1129    T = T / float(n**3-n)
1130    return 1.0 - T
1131
1132
1133def lranksums(x, y):
1134    """
1135Calculates the rank sums statistic on the provided scores and
1136returns the result.  Use only when the n in each condition is > 20 and you
1137have 2 independent samples of ranks.
1138
1139Usage:   lranksums(x,y)
1140Returns: a z-statistic, two-tailed p-value
1141"""
1142    n1 = len(x)
1143    n2 = len(y)
1144    alldata = x+y
1145    ranked = rankdata(alldata)
1146    x = ranked[:n1]
1147    y = ranked[n1:]
1148    s = sum(x)
1149    expected = n1*(n1+n2+1) / 2.0
1150    z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
1151    prob = 2*(1.0 - zprob(abs(z)))
1152    return z, prob
1153
1154
1155def lwilcoxont(x, y):
1156    """
1157Calculates the Wilcoxon T-test for related samples and returns the
1158result.  A non-parametric T-test.
1159
1160Usage:   lwilcoxont(x,y)
1161Returns: a t-statistic, two-tail probability estimate
1162"""
1163    if len(x) != len(y):
1164        raise ValueError('Unequal N in wilcoxont.  Aborting.')
1165    d = []
1166    for i in range(len(x)):
1167        diff = x[i] - y[i]
1168        if diff != 0:
1169            d.append(diff)
1170    count = len(d)
1171    absd = [abs(_) for _ in d]
1172    absranked = rankdata(absd)
1173    r_plus = 0.0
1174    r_minus = 0.0
1175    for i in range(len(absd)):
1176        if d[i] < 0:
1177            r_minus = r_minus + absranked[i]
1178        else:
1179            r_plus = r_plus + absranked[i]
1180    wt = min(r_plus, r_minus)
1181    mn = count * (count+1) * 0.25
1182    se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
1183    z = math.fabs(wt-mn) / se
1184    prob = 2*(1.0 - zprob(abs(z)))
1185    return wt, prob
1186
1187
1188def lkruskalwallish(*args):
1189    """
1190The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
1191groups, requiring at least 5 subjects in each group.  This function
1192calculates the Kruskal-Wallis H-test for 3 or more independent samples
1193and returns the result.
1194
1195Usage:   lkruskalwallish(*args)
1196Returns: H-statistic (corrected for ties), associated p-value
1197"""
1198    args = list(args)
1199    n = [0]*len(args)
1200    all = []
1201    n = [len(_) for _ in args]
1202    for i in range(len(args)):
1203        all = all + args[i]
1204    ranked = rankdata(all)
1205    T = tiecorrect(ranked)
1206    for i in range(len(args)):
1207        args[i] = ranked[0:n[i]]
1208        del ranked[0:n[i]]
1209    rsums = []
1210    for i in range(len(args)):
1211        rsums.append(sum(args[i])**2)
1212        rsums[i] = rsums[i] / float(n[i])
1213    ssbn = sum(rsums)
1214    totaln = sum(n)
1215    h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
1216    df = len(args) - 1
1217    if T == 0:
1218        raise ValueError('All numbers are identical in lkruskalwallish')
1219    h = h / float(T)
1220    return h, chisqprob(h, df)
1221
1222
1223def lfriedmanchisquare(*args):
1224    """
1225Friedman Chi-Square is a non-parametric, one-way within-subjects
1226ANOVA.  This function calculates the Friedman Chi-square test for repeated
1227measures and returns the result, along with the associated probability
value.  It assumes 3 or more repeated measures.  With only 3 levels, a
minimum of 10 subjects in the study is required; with 4 levels, 5
subjects per level are required(??).
1231
1232Usage:   lfriedmanchisquare(*args)
1233Returns: chi-square statistic, associated p-value
1234"""
1235    k = len(args)
1236    if k < 3:
1237        raise ValueError('Less than 3 levels.  Friedman test not appropriate.')
1238    n = len(args[0])
1239    data = pstat.abut(*tuple(args))
1240    for i in range(len(data)):
1241        data[i] = rankdata(data[i])
1242    ssbn = 0
1243    for i in range(k):
        ssbn = ssbn + sum(pstat.colex(data, i))**2  # sum the RANKS within each condition
1245    chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
1246    return chisq, chisqprob(chisq, k-1)
1247
1248
1249# PROBABILITY CALCULATIONS
1250
1251def lchisqprob(chisq, df):
1252    """
1253Returns the (1-tailed) probability value associated with the provided
1254chi-square value and df.  Adapted from chisq.c in Gary Perlman's |Stat.
1255
1256Usage:   lchisqprob(chisq,df)
1257"""
1258    BIG = 20.0
1259
1260    def ex(x):
1261        BIG = 20.0
1262        if x < -BIG:
1263            return 0.0
1264        else:
1265            return math.exp(x)
1266
1267    if chisq <= 0 or df < 1:
1268        return 1.0
1269    a = 0.5 * chisq
1270    if df % 2 == 0:
1271        even = 1
1272    else:
1273        even = 0
1274    if df > 1:
1275        y = ex(-a)
1276    if even:
1277        s = y
1278    else:
1279        s = 2.0 * zprob(-math.sqrt(chisq))
1280    if (df > 2):
1281        chisq = 0.5 * (df - 1.0)
1282        if even:
1283            z = 1.0
1284        else:
1285            z = 0.5
1286        if a > BIG:
1287            if even:
1288                e = 0.0
1289            else:
1290                e = math.log(math.sqrt(math.pi))
1291            c = math.log(a)
1292            while (z <= chisq):
1293                e = math.log(z) + e
1294                s = s + ex(c*z-a-e)
1295                z = z + 1.0
1296            return s
1297        else:
1298            if even:
1299                e = 1.0
1300            else:
1301                e = 1.0 / math.sqrt(math.pi) / math.sqrt(a)
1302            c = 0.0
1303            while (z <= chisq):
1304                e = e * (a/float(z))
1305                c = c + e
1306                z = z + 1.0
1307            return (c*y+s)
1308    else:
1309        return s
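
# Quick checks against standard chi-square tables (illustrative):
#
#     lchisqprob(3.84, 1)     # -> approx. 0.05
#     lchisqprob(7.81, 3)     # -> approx. 0.05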
1310
1311
1312def lerfcc(x):
1313    """
1314Returns the complementary error function erfc(x) with fractional
error everywhere less than 1.2e-7.  Adapted from Numerical Recipes.
1316
1317Usage:   lerfcc(x)
1318"""
1319    z = abs(x)
1320    t = 1.0 / (1.0+0.5*z)
1321    ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
1322    if x >= 0:
1323        return ans
1324    else:
1325        return 2.0 - ans
1326
1327
1328def lzprob(z):
1329    """
1330Returns the area under the normal curve 'to the left of' the given z value.
1331Thus,
1332    for z<0, zprob(z) = 1-tail probability
1333    for z>0, 1.0-zprob(z) = 1-tail probability
1334    for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
1335Adapted from z.c in Gary Perlman's |Stat.
1336
1337Usage:   lzprob(z)
1338"""
1339    Z_MAX = 6.0    # maximum meaningful z-value
1340    if z == 0.0:
1341        x = 0.0
1342    else:
1343        y = 0.5 * math.fabs(z)
1344        if y >= (Z_MAX*0.5):
1345            x = 1.0
1346        elif (y < 1.0):
1347            w = y*y
1348            x = ((((((((0.000124818987 * w
1349                        - 0.001075204047) * w + 0.005198775019) * w
1350                      - 0.019198292004) * w + 0.059054035642) * w
1351                    - 0.151968751364) * w + 0.319152932694) * w
1352                  - 0.531923007300) * w + 0.797884560593) * y * 2.0
1353        else:
1354            y = y - 2.0
1355            x = (((((((((((((-0.000045255659 * y
1356                             + 0.000152529290) * y - 0.000019538132) * y
1357                           - 0.000676904986) * y + 0.001390604284) * y
1358                         - 0.000794620820) * y - 0.002034254874) * y
1359                       + 0.006549791214) * y - 0.010557625006) * y
1360                     + 0.011630447319) * y - 0.009279453341) * y
1361                   + 0.005353579108) * y - 0.002141268741) * y
1362                 + 0.000535310849) * y + 0.999936657524
1363    if z > 0.0:
1364        prob = ((x+1.0)*0.5)
1365    else:
1366        prob = ((1.0-x)*0.5)
1367    return prob
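
# Sanity checks against the standard normal table (illustrative):
#
#     lzprob(0.0)      # -> 0.5
#     lzprob(1.96)     # -> approx. 0.975
#     lzprob(-1.96)    # -> approx. 0.025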
1368
1369
1370def lksprob(alam):
1371    """
Computes a Kolmogorov-Smirnov test significance level.  Adapted from
Numerical Recipes.
1374
1375Usage:   lksprob(alam)
1376"""
1377    fac = 2.0
1378    sum = 0.0
1379    termbf = 0.0
1380    a2 = -2.0*alam*alam
1381    for j in range(1, 201):
1382        term = fac*math.exp(a2*j*j)
1383        sum = sum + term
1384        if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum):
1385            return sum
1386        fac = -fac
1387        termbf = math.fabs(term)
1388    return 1.0             # Get here only if fails to converge; was 0.0!!
1389
1390
1391def lfprob(dfnum, dfden, F):
1392    """
1393Returns the (1-tailed) significance level (p-value) of an F
1394statistic given the degrees of freedom for the numerator (dfR-dfF) and
1395the degrees of freedom for the denominator (dfF).
1396
1397Usage:   lfprob(dfnum, dfden, F)   where usually dfnum=dfbn, dfden=dfwn
1398"""
1399    p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
1400    return p
1401
1402
1403def lbetacf(a, b, x):
1404    """
1405This function evaluates the continued fraction form of the incomplete
Beta function, betai.  (Adapted from: Numerical Recipes in C.)
1407
1408Usage:   lbetacf(a,b,x)
1409"""
1410    ITMAX = 200
1411    EPS = 3.0e-7
1412
1413    bm = az = am = 1.0
1414    qab = a+b
1415    qap = a+1.0
1416    qam = a-1.0
1417    bz = 1.0-qab*x/qap
1418    for i in range(ITMAX+1):
1419        em = float(i+1)
1420        tem = em + em
1421        d = em*(b-em)*x/((qam+tem)*(a+tem))
1422        ap = az + d*am
1423        bp = bz+d*bm
1424        d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
1425        app = ap+d*az
1426        bpp = bp+d*bz
1427        aold = az
1428        am = ap/bpp
1429        bm = bp/bpp
1430        az = app/bpp
1431        bz = 1.0
1432        if (abs(az-aold) < (EPS*abs(az))):
1433            return az
1434    print('a or b too big, or ITMAX too small in Betacf.')
1435
1436
1437def lgammln(xx):
1438    """
Returns the natural log of the gamma function of xx.
    Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
(Adapted from: Numerical Recipes in C.)
1442
1443Usage:   lgammln(xx)
1444"""
1445
1446    coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
1447             0.120858003e-2, -0.536382e-5]
1448    x = xx - 1.0
1449    tmp = x + 5.5
1450    tmp = tmp - (x+0.5)*math.log(tmp)
1451    ser = 1.0
1452    for j in range(len(coeff)):
1453        x = x + 1
1454        ser = ser + coeff[j]/x
1455    return -tmp + math.log(2.50662827465*ser)
1456
1457
1458def lbetai(a, b, x):
1459    """
1460Returns the incomplete beta function:
1461
1462    I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
1463
1464where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
1465function of a.  The continued fraction formulation is implemented here,
using the betacf function.  (Adapted from: Numerical Recipes in C.)
1467
1468Usage:   lbetai(a,b,x)
1469"""
1470    if (x < 0.0 or x > 1.0):
1471        raise ValueError('Bad x in lbetai')
1472    if (x == 0.0 or x == 1.0):
1473        bt = 0.0
1474    else:
1475        bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b
1476                      * math.log(1.0-x))
1477    if (x < (a+1.0)/(a+b+2.0)):
1478        return bt*betacf(a, b, x)/float(a)
1479    else:
1480        return 1.0-bt*betacf(b, a, 1.0-x)/float(b)
1481
1482
1483# ANOVA CALCULATIONS
1484
1485def lF_oneway(*lists):
1486    """
1487Performs a 1-way ANOVA, returning an F-value and probability given
1488any number of groups.  From Heiman, pp.394-7.
1489
1490Usage:   F_oneway(*lists)    where *lists is any number of lists, one per
1491                                  treatment group
1492Returns: F value, one-tailed p-value
1493"""
    a = len(lists)           # ANOVA on 'a' groups, each in its own list
    alldata = []
    for i in range(len(lists)):
        alldata = alldata + lists[i]
    bign = len(alldata)
    # use the list-based support functions so this 'l' version needs no NumPy
    sstot = ss(alldata) - (square_of_sums(alldata)/float(bign))
    ssbn = 0
    for lst in lists:
        ssbn = ssbn + square_of_sums(lst)/float(len(lst))
    ssbn = ssbn - (square_of_sums(alldata)/float(bign))
1505    sswn = sstot-ssbn
1506    dfbn = a-1
1507    dfwn = bign - a
1508    msb = ssbn/float(dfbn)
1509    msw = sswn/float(dfwn)
1510    f = msb/msw
1511    prob = fprob(dfbn, dfwn, f)
1512    return f, prob
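
# Example of the one-way ANOVA above (a hedged sketch; identical groups
# give no between-group variance):
#
#     lF_oneway([1, 2, 3], [1, 2, 3])      # -> F = 0.0, p = 1.0
#     lF_oneway([1, 2, 3], [7, 8, 9])      # large F, small p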
1513
1514
1515def lF_value(ER, EF, dfnum, dfden):
1516    """
1517Returns an F-statistic given the following:
1518        ER  = error associated with the null hypothesis (the Restricted model)
1519        EF  = error associated with the alternate hypothesis (the Full model)
1520        dfR-dfF = degrees of freedom of the numerator
1521        dfF = degrees of freedom associated with the denominator/Full model
1522
1523Usage:   lF_value(ER,EF,dfnum,dfden)
1524"""
1525    return ((ER-EF)/float(dfnum) / (EF/float(dfden)))
1526
1527
1528# SUPPORT FUNCTIONS
1529
1530def writecc(listoflists, file, writetype='w', extra=2):
1531    """
1532Writes a list of lists to a file in columns, customized by the max
1533size of items within the columns (max size of items in col, +2 characters)
1534to specified file.  File-overwrite is the default.
1535
1536Usage:   writecc (listoflists,file,writetype='w',extra=2)
1537Returns: None
1538"""
1539    if type(listoflists[0]) not in [list, tuple]:
1540        listoflists = [listoflists]
1541    outfile = open(file, writetype)
1542    rowstokill = []
1543    list2print = copy.deepcopy(listoflists)
1544    for i in range(len(listoflists)):
1545        if listoflists[i] == ['\n'] or listoflists[i] == '\n' or listoflists[i] == 'dashes':
1546            rowstokill = rowstokill + [i]
1547    rowstokill.reverse()
1548    for row in rowstokill:
1549        del list2print[row]
1550    maxsize = [0]*len(list2print[0])
1551    for col in range(len(list2print[0])):
1552        items = pstat.colex(list2print, col)
1553        items = [pstat.makestr(_) for _ in items]
1554        maxsize[col] = max(map(len, items)) + extra
1555    for row in listoflists:
1556        if row == ['\n'] or row == '\n':
1557            outfile.write('\n')
1558        elif row == ['dashes'] or row == 'dashes':
1559            dashes = [0]*len(maxsize)
1560            for j in range(len(maxsize)):
1561                dashes[j] = '-'*(maxsize[j]-2)
1562            outfile.write(pstat.lineincustcols(dashes, maxsize))
1563        else:
1564            outfile.write(pstat.lineincustcols(row, maxsize))
1565        outfile.write('\n')
1566    outfile.close()
1567    return None
1568
1569
1570def lincr(l, cap):        # to increment a list up to a max-list of 'cap'
1571    """
1572Simulate a counting system from an n-dimensional list.
1573
1574Usage:   lincr(l,cap)   l=list to increment, cap=max values for each list pos'n
1575Returns: next set of values for list l, OR -1 (if overflow)
1576"""
1577    l[0] = l[0] + 1     # e.g., [0,0,0] --> [2,4,3] (=cap)
1578    for i in range(len(l)):
1579        if l[i] > cap[i] and i < len(l)-1:  # if carryover AND not done
1580            l[i] = 0
1581            l[i+1] = l[i+1] + 1
1582        elif l[i] > cap[i] and i == len(l)-1:  # overflow past last column, must be finished
1583            l = -1
1584    return l
1585
1586
1587def lsum(inlist):
1588    """
1589Returns the sum of the items in the passed list.
1590
1591Usage:   lsum(inlist)
1592"""
1593    s = 0
1594    for item in inlist:
1595        s = s + item
1596    return s
1597
1598
1599def lcumsum(inlist):
1600    """
1601Returns a list consisting of the cumulative sum of the items in the
1602passed list.
1603
1604Usage:   lcumsum(inlist)
1605"""
1606    newlist = copy.deepcopy(inlist)
1607    for i in range(1, len(newlist)):
1608        newlist[i] = newlist[i] + newlist[i-1]
1609    return newlist
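# Quick illustration:  lcumsum([1, 2, 3, 4])  ==>  [1, 3, 6, 10]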
1610
1611
1612def lss(inlist):
1613    """
1614Squares each value in the passed list, adds up these squares and
1615returns the result.
1616
1617Usage:   lss(inlist)
1618"""
1619    ss = 0
1620    for item in inlist:
1621        ss = ss + item*item
1622    return ss
1623
1624
1625def lsummult(list1, list2):
1626    """
1627Multiplies elements in list1 and list2, element by element, and
1628returns the sum of all resulting multiplications.  Must provide equal
1629length lists.
1630
1631Usage:   lsummult(list1,list2)
1632"""
1633    if len(list1) != len(list2):
1634        raise ValueError("Lists not equal length in summult.")
1635    s = 0
1636    for item1, item2 in pstat.abut(list1, list2):
1637        s = s + item1*item2
1638    return s
1639
1640
1641def lsumdiffsquared(x, y):
1642    """
1643Takes pairwise differences of the values in lists x and y, squares
1644these differences, and returns the sum of these squares.
1645
1646Usage:   lsumdiffsquared(x,y)
1647Returns: sum[(x[i]-y[i])**2]
1648"""
1649    sds = 0
1650    for i in range(len(x)):
1651        sds = sds + (x[i]-y[i])**2
1652    return sds
1653
1654
1655def lsquare_of_sums(inlist):
1656    """
1657Adds the values in the passed list, squares the sum, and returns
1658the result.
1659
1660Usage:   lsquare_of_sums(inlist)
1661Returns: sum(inlist[i])**2
1662"""
1663    s = sum(inlist)
1664    return float(s)*s
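# Quick illustration:  lsquare_of_sums([1, 2, 3])  ==>  36.0   (i.e., (1+2+3)**2)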
1665
1666
1667def lshellsort(inlist):
1668    """
1669Shellsort algorithm.  Sorts a 1D-list.
1670
1671Usage:   lshellsort(inlist)
1672Returns: sorted-inlist, sorting-index-vector (for original list)
1673"""
1674    n = len(inlist)
1675    svec = copy.deepcopy(inlist)
1676    ivec = list(range(n))
    gap = n // 2   # integer division needed
1678    while gap > 0:
1679        for i in range(gap, n):
1680            for j in range(i-gap, -1, -gap):
1681                while j >= 0 and svec[j] > svec[j+gap]:
1682                    temp = svec[j]
1683                    svec[j] = svec[j+gap]
1684                    svec[j+gap] = temp
1685                    itemp = ivec[j]
1686                    ivec[j] = ivec[j+gap]
1687                    ivec[j+gap] = itemp
        gap = gap // 2  # integer division needed
1689# svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]]
1690    return svec, ivec
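# Illustration:  lshellsort([30, 10, 20])  ==>  ([10, 20, 30], [1, 2, 0]),
# i.e., svec holds the sorted values and svec[i] == inlist[ivec[i]] for every i.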
1691
1692
1693def lrankdata(inlist):
1694    """
Ranks the data in inlist, dealing with ties appropriately.  Assumes
1696a 1D inlist.  Adapted from Gary Perlman's |Stat ranksort.
1697
1698Usage:   lrankdata(inlist)
1699Returns: a list of length equal to inlist, containing rank scores
1700"""
1701    n = len(inlist)
1702    svec, ivec = shellsort(inlist)
1703    sumranks = 0
1704    dupcount = 0
1705    newlist = [0]*n
1706    for i in range(n):
1707        sumranks = sumranks + i
1708        dupcount = dupcount + 1
1709        if i == n-1 or svec[i] != svec[i+1]:
1710            averank = sumranks / float(dupcount) + 1
1711            for j in range(i-dupcount+1, i+1):
1712                newlist[ivec[j]] = averank
1713            sumranks = 0
1714            dupcount = 0
1715    return newlist
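# Illustration (tied scores share the average of the ranks they span):
#   lrankdata([20, 10, 30, 10])  ==>  [3.0, 1.5, 4.0, 1.5]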
1716
1717
1718def outputpairedstats(fname, writemode, name1, n1, m1, se1, min1, max1, name2, n2, m2, se2, min2, max2, statname, stat, prob):
1719    """
Prints, or writes to a file, summary stats for two groups, using the name, n,
1721mean, sterr, min and max for each group, as well as the statistic name,
1722its value, and the associated p-value.
1723
1724Usage:   outputpairedstats(fname,writemode,
1725                           name1,n1,mean1,stderr1,min1,max1,
1726                           name2,n2,mean2,stderr2,min2,max2,
1727                           statname,stat,prob)
1728Returns: None
1729"""
1730    suffix = ''                       # for *s after the p-value
1731    try:
1732        prob.shape
1733        prob = prob[0]
1734    except Exception:
1735        pass
1736    if prob < 0.001:
1737        suffix = '  ***'
1738    elif prob < 0.01:
1739        suffix = '  **'
1740    elif prob < 0.05:
1741        suffix = '  *'
1742    title = [['Name', 'N', 'Mean', 'SD', 'Min', 'Max']]
1743    lofl = title+[[name1, n1, round(m1, 3), round(math.sqrt(se1), 3), min1, max1],
1744                  [name2, n2, round(m2, 3), round(math.sqrt(se2), 3), min2, max2]]
1745    if not isinstance(fname, str) or len(fname) == 0:
1746        print()
1747        print(statname)
1748        print()
1749        pstat.printcc(lofl)
1750        print()
1751        try:
1752            if stat.shape == ():
1753                stat = stat[0]
1754            if prob.shape == ():
1755                prob = prob[0]
1756        except Exception:
1757            pass
1758        print('Test statistic = ', round(stat, 3), '   p = ', round(prob, 3), suffix)
1759        print()
1760    else:
1761        file = open(fname, writemode)
1762        file.write('\n'+statname+'\n\n')
1763        file.close()
1764        writecc(lofl, fname, 'a')
1765        file = open(fname, 'a')
1766        try:
1767            if stat.shape == ():
1768                stat = stat[0]
1769            if prob.shape == ():
1770                prob = prob[0]
1771        except Exception:
1772            pass
1773        file.write(pstat.list2string(['\nTest statistic = ', round(stat, 4), '   p = ', round(prob, 4), suffix, '\n\n']))
1774        file.close()
1775    return None
1776
1777
1778def lfindwithin(data):
1779    """
1780Returns an integer representing a binary vector, where 1=within-
subject factor and 0=between-subjects.  Input is the entire data 2D list (i.e.,
column 0=random factor, column -1=measured values; those two columns are skipped).
1783Note: input data is in |Stat format ... a list of lists ("2D list") with
1784one row per measured value, first column=subject identifier, last column=
1785score, one in-between column per factor (these columns contain level
1786designations on each factor).  See also stats.anova.__doc__.
1787
1788Usage:   lfindwithin(data)     data in |Stat format
1789"""
1790
1791    numfact = len(data[0])-1
1792    withinvec = 0
1793    for col in range(1, numfact):
1794        examplelevel = pstat.unique(pstat.colex(data, col))[0]
1795        rows = pstat.linexand(data, col, examplelevel)  # get 1 level of this factor
1796        factsubjs = pstat.unique(pstat.colex(rows, 0))
1797        allsubjs = pstat.unique(pstat.colex(data, 0))
        if len(factsubjs) == len(allsubjs):  # every subject appears at this level => within-subjects factor
1799            withinvec = withinvec + (1 << col)
1800    return withinvec
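# Illustrative sketch: with factors in columns 1-3, a return value of 6 (binary
# 0110) means columns 1 and 2 were flagged as within-subject factors and column 3
# as between-subjects, since each within factor contributes its (1 << col) bit.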
1801
1802
1803# DISPATCH LISTS AND TUPLES TO ABOVE FCNS
1804
1805# CENTRAL TENDENCY:
1806geometricmean = Dispatch((lgeometricmean, (list, tuple)), )
1807harmonicmean = Dispatch((lharmonicmean, (list, tuple)), )
1808mean = Dispatch((lmean, (list, tuple)), )
1809median = Dispatch((lmedian, (list, tuple)), )
1810medianscore = Dispatch((lmedianscore, (list, tuple)), )
1811mode = Dispatch((lmode, (list, tuple)), )
1812
1813# MOMENTS:
1814moment = Dispatch((lmoment, (list, tuple)), )
1815variation = Dispatch((lvariation, (list, tuple)), )
1816skew = Dispatch((lskew, (list, tuple)), )
1817kurtosis = Dispatch((lkurtosis, (list, tuple)), )
1818describe = Dispatch((ldescribe, (list, tuple)), )
1819
1820# FREQUENCY STATISTICS:
1821itemfreq = Dispatch((litemfreq, (list, tuple)), )
1822scoreatpercentile = Dispatch((lscoreatpercentile, (list, tuple)), )
1823percentileofscore = Dispatch((lpercentileofscore, (list, tuple)), )
1824histogram = Dispatch((lhistogram, (list, tuple)), )
1825cumfreq = Dispatch((lcumfreq, (list, tuple)), )
1826relfreq = Dispatch((lrelfreq, (list, tuple)), )
1827
1828# VARIABILITY:
1829obrientransform = Dispatch((lobrientransform, (list, tuple)), )
1830samplevar = Dispatch((lsamplevar, (list, tuple)), )
1831samplestdev = Dispatch((lsamplestdev, (list, tuple)), )
1832var = Dispatch((lvar, (list, tuple)), )
1833stdev = Dispatch((lstdev, (list, tuple)), )
1834sterr = Dispatch((lsterr, (list, tuple)), )
1835sem = Dispatch((lsem, (list, tuple)), )
1836z = Dispatch((lz, (list, tuple)), )
1837zs = Dispatch((lzs, (list, tuple)), )
1838
1839# TRIMMING FCNS:
1840trimboth = Dispatch((ltrimboth, (list, tuple)), )
1841trim1 = Dispatch((ltrim1, (list, tuple)), )
1842
1843# CORRELATION FCNS:
1844paired = Dispatch((lpaired, (list, tuple)), )
1845pearsonr = Dispatch((lpearsonr, (list, tuple)), )
1846spearmanr = Dispatch((lspearmanr, (list, tuple)), )
1847pointbiserialr = Dispatch((lpointbiserialr, (list, tuple)), )
1848kendalltau = Dispatch((lkendalltau, (list, tuple)), )
1849linregress = Dispatch((llinregress, (list, tuple)), )
1850
1851# INFERENTIAL STATS:
1852ttest_1samp = Dispatch((lttest_1samp, (list, tuple)), )
1853ttest_ind = Dispatch((lttest_ind, (list, tuple)), )
1854ttest_rel = Dispatch((lttest_rel, (list, tuple)), )
1855chisquare = Dispatch((lchisquare, (list, tuple)), )
1856ks_2samp = Dispatch((lks_2samp, (list, tuple)), )
1857mannwhitneyu = Dispatch((lmannwhitneyu, (list, tuple)), )
1858ranksums = Dispatch((lranksums, (list, tuple)), )
1859tiecorrect = Dispatch((ltiecorrect, (list, tuple)), )
1860wilcoxont = Dispatch((lwilcoxont, (list, tuple)), )
1861kruskalwallish = Dispatch((lkruskalwallish, (list, tuple)), )
1862friedmanchisquare = Dispatch((lfriedmanchisquare, (list, tuple)), )
1863
1864# PROBABILITY CALCS:
1865chisqprob = Dispatch((lchisqprob, (int, float)), )
1866zprob = Dispatch((lzprob, (int, float)), )
1867ksprob = Dispatch((lksprob, (int, float)), )
1868fprob = Dispatch((lfprob, (int, float)), )
1869betacf = Dispatch((lbetacf, (int, float)), )
1870betai = Dispatch((lbetai, (int, float)), )
1871erfcc = Dispatch((lerfcc, (int, float)), )
1872gammln = Dispatch((lgammln, (int, float)), )
1873
1874# ANOVA FUNCTIONS:
1875F_oneway = Dispatch((lF_oneway, (list, tuple)), )
1876F_value = Dispatch((lF_value, (list, tuple)), )
1877
1878# SUPPORT FUNCTIONS:
1879incr = Dispatch((lincr, (list, tuple)), )
1880sum = Dispatch((lsum, (list, tuple)), )
1881cumsum = Dispatch((lcumsum, (list, tuple)), )
1882ss = Dispatch((lss, (list, tuple)), )
1883summult = Dispatch((lsummult, (list, tuple)), )
1884square_of_sums = Dispatch((lsquare_of_sums, (list, tuple)), )
1885sumdiffsquared = Dispatch((lsumdiffsquared, (list, tuple)), )
1886shellsort = Dispatch((lshellsort, (list, tuple)), )
1887rankdata = Dispatch((lrankdata, (list, tuple)), )
1888findwithin = Dispatch((lfindwithin, (list, tuple)), )
1889
1890
# =============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
1910
1911try:                         # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE
1912    import Numeric
1913    N = Numeric
1914    import LinearAlgebra
1915    LA = LinearAlgebra
1916
1917# ACENTRAL TENDENCY
1918
1919    def ageometricmean(inarray, dimension=None, keepdims=0):
1920        """
1921    Calculates the geometric mean of the values in the passed array.
1922    That is:  n-th root of (x1 * x2 * ... * xn).  Defaults to ALL values in
1923    the passed array.  Use dimension=None to flatten array first.  REMEMBER: if
1924    dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
1925    if dimension is a sequence, it collapses over all specified dimensions.  If
1926    keepdims is set to 1, the resulting array will have as many dimensions as
1927    inarray, with only 1 'level' per dim that was collapsed over.
1928
1929    Usage:   ageometricmean(inarray,dimension=None,keepdims=0)
1930    Returns: geometric mean computed over dim(s) listed in dimension
1931    """
1932        inarray = N.array(inarray, N.Float)
1933        if dimension is None:
1934            inarray = N.ravel(inarray)
1935            size = len(inarray)
1936            mult = N.power(inarray, 1.0/size)
1937            mult = N.multiply.reduce(mult)
1938        elif type(dimension) in [int, float]:
1939            size = inarray.shape[dimension]
1940            mult = N.power(inarray, 1.0/size)
1941            mult = N.multiply.reduce(mult, dimension)
1942            if keepdims == 1:
1943                shp = list(inarray.shape)
1944                shp[dimension] = 1
                mult = N.reshape(mult, shp)
1946        else:  # must be a SEQUENCE of dims to average over
1947            dims = sorted(dimension)
1948            dims.reverse()
1949            size = N.array(N.multiply.reduce(N.take(inarray.shape, dims)), N.Float)
1950            mult = N.power(inarray, 1.0/size)
1951            for dim in dims:
1952                mult = N.multiply.reduce(mult, dim)
1953            if keepdims == 1:
1954                shp = list(inarray.shape)
1955                for dim in dims:
1956                    shp[dim] = 1
1957                mult = N.reshape(mult, shp)
1958        return mult
1959
1960    def aharmonicmean(inarray, dimension=None, keepdims=0):
1961        """
1962    Calculates the harmonic mean of the values in the passed array.
1963    That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Defaults to ALL values in
1964    the passed array.  Use dimension=None to flatten array first.  REMEMBER: if
1965    dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
1966    if dimension is a sequence, it collapses over all specified dimensions.  If
1967    keepdims is set to 1, the resulting array will have as many dimensions as
1968    inarray, with only 1 'level' per dim that was collapsed over.
1969
1970    Usage:   aharmonicmean(inarray,dimension=None,keepdims=0)
1971    Returns: harmonic mean computed over dim(s) in dimension
1972    """
1973        inarray = inarray.astype(N.Float)
1974        if dimension is None:
1975            inarray = N.ravel(inarray)
1976            size = len(inarray)
1977            s = N.add.reduce(1.0 / inarray)
1978        elif type(dimension) in [int, float]:
1979            size = float(inarray.shape[dimension])
1980            s = N.add.reduce(1.0/inarray, dimension)
1981            if keepdims == 1:
1982                shp = list(inarray.shape)
1983                shp[dimension] = 1
1984                s = N.reshape(s, shp)
1985        else:  # must be a SEQUENCE of dims to average over
1986            dims = sorted(dimension)
1987            nondims = []
1988            for i in range(len(inarray.shape)):
1989                if i not in dims:
1990                    nondims.append(i)
1991            tinarray = N.transpose(inarray, nondims+dims)  # put keep-dims first
1992            idx = [0] * len(nondims)
1993            if idx == []:
1994                size = len(N.ravel(inarray))
1995                s = asum(1.0 / inarray)
1996                if keepdims == 1:
1997                    s = N.reshape([s], N.ones(len(inarray.shape)))
1998            else:
1999                idx[0] = -1
2000                loopcap = N.array(tinarray.shape[0:len(nondims)]) - 1
2001                s = N.zeros(loopcap+1, N.Float)
2002                while incr(idx, loopcap) != -1:
2003                    s[idx] = asum(1.0/tinarray[idx])
2004                size = N.multiply.reduce(N.take(inarray.shape, dims))
2005                if keepdims == 1:
2006                    shp = list(inarray.shape)
2007                    for dim in dims:
2008                        shp[dim] = 1
2009                    s = N.reshape(s, shp)
2010        return size / s
2011
2012    def amean(inarray, dimension=None, keepdims=0):
2013        """
    Calculates the arithmetic mean of the values in the passed array.
2015    That is:  1/n * (x1 + x2 + ... + xn).  Defaults to ALL values in the
2016    passed array.  Use dimension=None to flatten array first.  REMEMBER: if
2017    dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2018    if dimension is a sequence, it collapses over all specified dimensions.  If
2019    keepdims is set to 1, the resulting array will have as many dimensions as
2020    inarray, with only 1 'level' per dim that was collapsed over.
2021
2022    Usage:   amean(inarray,dimension=None,keepdims=0)
    Returns: arithmetic mean calculated over dim(s) in dimension
2024    """
2025        if inarray.typecode() in ['l', 's', 'b']:
2026            inarray = inarray.astype(N.Float)
2027        if dimension is None:
2028            inarray = N.ravel(inarray)
2029            sum = N.add.reduce(inarray)
2030            denom = float(len(inarray))
2031        elif type(dimension) in [int, float]:
2032            sum = asum(inarray, dimension)
2033            denom = float(inarray.shape[dimension])
2034            if keepdims == 1:
2035                shp = list(inarray.shape)
2036                shp[dimension] = 1
2037                sum = N.reshape(sum, shp)
2038        else:  # must be a TUPLE of dims to average over
2039            dims = sorted(dimension)
2040            dims.reverse()
2041            sum = inarray * 1.0
2042            for dim in dims:
2043                sum = N.add.reduce(sum, dim)
2044            denom = N.array(N.multiply.reduce(N.take(inarray.shape, dims)), N.Float)
2045            if keepdims == 1:
2046                shp = list(inarray.shape)
2047                for dim in dims:
2048                    shp[dim] = 1
2049                sum = N.reshape(sum, shp)
2050        return sum/denom
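    # Illustrative note on keepdims: for a 2x3 array x, amean(x, 1) has shape (2,)
    # while amean(x, 1, keepdims=1) has shape (2, 1), which is the shape needed to
    # subtract the per-row mean back out of x (as avar() does further below).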
2051
2052    def amedian(inarray, numbins=1000):
2053        """
2054    Calculates the COMPUTED median value of an array of numbers, given the
2055    number of bins to use for the histogram (more bins approaches finding the
2056    precise median value of the array; default number of bins = 1000).  From
2057    G.W. Heiman's Basic Stats, or CRC Probability & Statistics.
2058    NOTE:  THIS ROUTINE ALWAYS uses the entire passed array (flattens it first).
2059
2060    Usage:   amedian(inarray,numbins=1000)
2061    Returns: median calculated over ALL values in inarray
2062    """
2063        inarray = N.ravel(inarray)
2064        (hist, smallest, binsize, extras) = ahistogram(inarray, numbins)
2065        cumhist = N.cumsum(hist)            # make cumulative histogram
2066        otherbins = N.greater_equal(cumhist, len(inarray)/2.0)
2067        otherbins = list(otherbins)         # list of 0/1s, 1s start at median bin
2068        cfbin = otherbins.index(1)                # get 1st(!) index holding 50%ile score
2069        LRL = smallest + binsize*cfbin        # get lower read limit of that bin
2070        cfbelow = N.add.reduce(hist[0:cfbin])        # cum. freq. below bin
2071        freq = hist[cfbin]                        # frequency IN the 50%ile bin
2072        median = LRL + ((len(inarray)/2.0-cfbelow)/float(freq))*binsize  # MEDIAN
2073        return median
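    # Sketch of the method with made-up data: for N.array([1., 2., 2., 3., 4.]) the
    # routine builds a histogram, walks the cumulative counts up to N/2, and then
    # interpolates linearly inside that bin, so the result approximates 2.0 with an
    # accuracy set by numbins.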
2074
2075    def amedianscore(inarray, dimension=None):
2076        """
2077    Returns the 'middle' score of the passed array.  If there is an even
2078    number of scores, the mean of the 2 middle scores is returned.  Can function
2079    with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can
2080    be None, to pre-flatten the array, or else dimension must equal 0).
2081
2082    Usage:   amedianscore(inarray,dimension=None)
2083    Returns: 'middle' score of the array, or the mean of the 2 middle scores
2084    """
2085        if dimension is None:
2086            inarray = N.ravel(inarray)
2087            dimension = 0
2088        inarray = N.sort(inarray, dimension)
2089        if inarray.shape[dimension] % 2 == 0:   # if even number of elements
            indx = inarray.shape[dimension] // 2   # integer division
2091            median = N.asarray(inarray[indx]+inarray[indx-1]) / 2.0
2092        else:
            indx = inarray.shape[dimension] // 2  # integer division
2094            median = N.take(inarray, [indx], dimension)
2095            if median.shape == (1,):
2096                median = median[0]
2097        return median
2098
2099    def amode(a, dimension=None):
2100        """
2101    Returns an array of the modal (most common) score in the passed array.
2102    If there is more than one such score, ONLY THE FIRST is returned.
2103    The bin-count for the modal values is also returned.  Operates on whole
2104    array (dimension=None), or on a given dimension.
2105
2106    Usage:   amode(a, dimension=None)
2107    Returns: array of bin-counts for mode(s), array of corresponding modal values
2108    """
2109
2110        if dimension is None:
2111            a = N.ravel(a)
2112            dimension = 0
2113        scores = pstat.aunique(N.ravel(a))       # get ALL unique values
2114        testshape = list(a.shape)
2115        testshape[dimension] = 1
2116        oldmostfreq = N.zeros(testshape)
2117        oldcounts = N.zeros(testshape)
2118        for score in scores:
2119            template = N.equal(a, score)
2120            counts = asum(template, dimension, 1)
2121            mostfrequent = N.where(N.greater(counts, oldcounts), score, oldmostfreq)
2122            oldcounts = N.where(N.greater(counts, oldcounts), counts, oldcounts)
2123            oldmostfreq = mostfrequent
2124        return oldcounts, mostfrequent
2125
2126    def atmean(a, limits=None, inclusive=(1, 1)):
2127        """
2128   Returns the arithmetic mean of all values in an array, ignoring values
2129   strictly outside the sequence passed to 'limits'.   Note: either limit
2130   in the sequence, or the value of limits itself, can be set to None.  The
2131   inclusive list/tuple determines whether the lower and upper limiting bounds
2132   (respectively) are open/exclusive (0) or closed/inclusive (1).
2133
2134   Usage:   atmean(a,limits=None,inclusive=(1,1))
2135   """
2136        if a.typecode() in ['l', 's', 'b']:
2137            a = a.astype(N.Float)
2138        if limits is None:
2139            return mean(a)
2140        assert type(limits) in [list, tuple, N.ArrayType], "Wrong type for limits in atmean"
2141        if inclusive[0]:
2142            lowerfcn = N.greater_equal
2143        else:
2144            lowerfcn = N.greater
2145        if inclusive[1]:
2146            upperfcn = N.less_equal
2147        else:
2148            upperfcn = N.less
2149        if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2150            raise ValueError("No array values within given limits (atmean).")
2151        elif limits[0] is None and limits[1] is not None:
2152            mask = upperfcn(a, limits[1])
2153        elif limits[0] is not None and limits[1] is None:
2154            mask = lowerfcn(a, limits[0])
2155        elif limits[0] is not None and limits[1] is not None:
2156            mask = lowerfcn(a, limits[0])*upperfcn(a, limits[1])
2157        s = float(N.add.reduce(N.ravel(a*mask)))
2158        n = float(N.add.reduce(N.ravel(mask)))
2159        return s/n
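    # Usage sketch (hypothetical data): atmean(a, limits=(0.0, 100.0)) averages only
    # the values of a lying in [0, 100]; passing inclusive=(0, 1) instead would drop
    # values exactly equal to the lower limit of 0.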
2160
2161    def atvar(a, limits=None, inclusive=(1, 1)):
2162        """
2163   Returns the sample variance of values in an array, (i.e., using N-1),
2164   ignoring values strictly outside the sequence passed to 'limits'.
2165   Note: either limit in the sequence, or the value of limits itself,
2166   can be set to None.  The inclusive list/tuple determines whether the lower
2167   and upper limiting bounds (respectively) are open/exclusive (0) or
2168   closed/inclusive (1).
2169
2170   Usage:   atvar(a,limits=None,inclusive=(1,1))
2171   """
2172        a = a.astype(N.Float)
        if limits is None or limits == [None, None]:
            n = float(len(N.ravel(a)))
            term1 = N.add.reduce(N.ravel(a*a))
            term2 = N.add.reduce(N.ravel(a))**2 / n
            return (term1 - term2) / (n - 1)
2179        assert type(limits) in [list, tuple, N.ArrayType], "Wrong type for limits in atvar"
2180        if inclusive[0]:
2181            lowerfcn = N.greater_equal
2182        else:
2183            lowerfcn = N.greater
2184        if inclusive[1]:
2185            upperfcn = N.less_equal
2186        else:
2187            upperfcn = N.less
2188        if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2189            raise ValueError("No array values within given limits (atvar).")
2190        elif limits[0] is None and limits[1] is not None:
2191            mask = upperfcn(a, limits[1])
2192        elif limits[0] is not None and limits[1] is None:
2193            mask = lowerfcn(a, limits[0])
2194        elif limits[0] is not None and limits[1] is not None:
2195            mask = lowerfcn(a, limits[0])*upperfcn(a, limits[1])
        term1 = N.add.reduce(N.ravel(a*a*mask))
        n = float(N.add.reduce(N.ravel(mask)))
        term2 = N.add.reduce(N.ravel(a*mask))**2 / n
        return (term1 - term2) / (n - 1)
2201
2202    def atmin(a, lowerlimit=None, dimension=None, inclusive=1):
2203        """
   Returns the minimum value of a, along dimension, including only values greater
   than (or equal to, if inclusive=1) lowerlimit.  If the limit is set to None,
2206   all values in the array are used.
2207
2208   Usage:   atmin(a,lowerlimit=None,dimension=None,inclusive=1)
2209   """
        if inclusive:
            lowerfcn = N.greater_equal
        else:
            lowerfcn = N.greater
2214        if dimension is None:
2215            a = N.ravel(a)
2216            dimension = 0
2217        if lowerlimit is None:
2218            lowerlimit = N.minimum.reduce(N.ravel(a))-11
2219        biggest = N.maximum.reduce(N.ravel(a))
2220        ta = N.where(lowerfcn(a, lowerlimit), a, biggest)
2221        return N.minimum.reduce(ta, dimension)
2222
2223    def atmax(a, upperlimit, dimension=None, inclusive=1):
2224        """
   Returns the maximum value of a, along dimension, including only values less
   than (or equal to, if inclusive=1) upperlimit.  If the limit is set to None,
2227   a limit larger than the max value in the array is used.
2228
2229   Usage:   atmax(a,upperlimit,dimension=None,inclusive=1)
2230   """
        if inclusive:
            upperfcn = N.less_equal
        else:
            upperfcn = N.less
2235        if dimension is None:
2236            a = N.ravel(a)
2237            dimension = 0
2238        if upperlimit is None:
2239            upperlimit = N.maximum.reduce(N.ravel(a))+1
2240        smallest = N.minimum.reduce(N.ravel(a))
2241        ta = N.where(upperfcn(a, upperlimit), a, smallest)
2242        return N.maximum.reduce(ta, dimension)
2243
2244    def atstdev(a, limits=None, inclusive=(1, 1)):
2245        """
2246   Returns the standard deviation of all values in an array, ignoring values
2247   strictly outside the sequence passed to 'limits'.   Note: either limit
2248   in the sequence, or the value of limits itself, can be set to None.  The
2249   inclusive list/tuple determines whether the lower and upper limiting bounds
2250   (respectively) are open/exclusive (0) or closed/inclusive (1).
2251
2252   Usage:   atstdev(a,limits=None,inclusive=(1,1))
2253   """
2254        return N.sqrt(tvar(a, limits, inclusive))
2255
2256    def atsem(a, limits=None, inclusive=(1, 1)):
2257        """
2258   Returns the standard error of the mean for the values in an array,
2259   (i.e., using N for the denominator), ignoring values strictly outside
2260   the sequence passed to 'limits'.   Note: either limit in the sequence,
2261   or the value of limits itself, can be set to None.  The inclusive list/tuple
2262   determines whether the lower and upper limiting bounds (respectively) are
2263   open/exclusive (0) or closed/inclusive (1).
2264
2265   Usage:   atsem(a,limits=None,inclusive=(1,1))
2266   """
2267        sd = tstdev(a, limits, inclusive)
        if limits is None or limits == [None, None]:
            n = float(len(N.ravel(a)))
            return sd/math.sqrt(n)
2270        assert type(limits) in [list, tuple, N.ArrayType], "Wrong type for limits in atsem"
2271        if inclusive[0]:
2272            lowerfcn = N.greater_equal
2273        else:
2274            lowerfcn = N.greater
2275        if inclusive[1]:
2276            upperfcn = N.less_equal
2277        else:
2278            upperfcn = N.less
2279        if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2280            raise ValueError("No array values within given limits (atsem).")
2281        elif limits[0] is None and limits[1] is not None:
2282            mask = upperfcn(a, limits[1])
2283        elif limits[0] is not None and limits[1] is None:
2284            mask = lowerfcn(a, limits[0])
2285        elif limits[0] is not None and limits[1] is not None:
2286            mask = lowerfcn(a, limits[0])*upperfcn(a, limits[1])
2287        N.add.reduce(N.ravel(a*a*mask))
2288        n = float(N.add.reduce(N.ravel(mask)))
2289        return sd/math.sqrt(n)
2290
2291# AMOMENTS
2292
2293    def amoment(a, moment=1, dimension=None):
2294        """
2295        Calculates the nth moment about the mean for a sample (defaults to the
2296        1st moment).  Generally used to calculate coefficients of skewness and
2297        kurtosis.  Dimension can equal None (ravel array first), an integer
2298        (the dimension over which to operate), or a sequence (operate over
2299        multiple dimensions).
2300
2301        Usage:   amoment(a,moment=1,dimension=None)
2302        Returns: appropriate moment along given dimension
2303        """
2304        if dimension is None:
2305            a = N.ravel(a)
2306            dimension = 0
2307        if moment == 1:
2308            return 0.0
2309        else:
2310            mn = amean(a, dimension, 1)  # 1=keepdims
2311            s = N.power((a-mn), moment)
2312            return amean(s, dimension)
2313
2314    def avariation(a, dimension=None):
2315        """
2316        Returns the coefficient of variation, as defined in CRC Standard
2317        Probability and Statistics, p.6. Dimension can equal None (ravel array
2318        first), an integer (the dimension over which to operate), or a
2319        sequence (operate over multiple dimensions).
2320
2321        Usage:   avariation(a,dimension=None)
2322        """
2323        return 100.0*asamplestdev(a, dimension)/amean(a, dimension)
2324
2325    def askew(a, dimension=None):
2326        """
        Returns the skewness of a distribution (normal ==> 0.0; >0 means extra
        weight in the right tail).  Use askewtest() to see if it's close enough.
2329        Dimension can equal None (ravel array first), an integer (the
2330        dimension over which to operate), or a sequence (operate over multiple
2331        dimensions).
2332
2333        Usage:   askew(a, dimension=None)
2334        Returns: skew of vals in a along dimension, returning ZERO where all vals equal
2335        """
2336        denom = N.power(amoment(a, 2, dimension), 1.5)
2337        zero = N.equal(denom, 0)
2338        if isinstance(denom, N.ArrayType) and asum(zero) != 0:
2339            print("Number of zeros in askew: ", asum(zero))
2340        denom = denom + zero  # prevent divide-by-zero
2341        return N.where(zero, 0, amoment(a, 3, dimension)/denom)
2342
2343    def akurtosis(a, dimension=None):
2344        """
2345        Returns the kurtosis of a distribution (normal ==> 3.0; >3 means
2346        heavier in the tails, and usually more peaked).  Use akurtosistest()
2347        to see if it's close enough.  Dimension can equal None (ravel array
2348        first), an integer (the dimension over which to operate), or a
2349        sequence (operate over multiple dimensions).
2350
2351        Usage:   akurtosis(a,dimension=None)
2352        Returns: kurtosis of values in a along dimension, and ZERO where all vals equal
2353        """
2354        denom = N.power(amoment(a, 2, dimension), 2)
2355        zero = N.equal(denom, 0)
2356        if isinstance(denom, N.ArrayType) and asum(zero) != 0:
2357            print("Number of zeros in akurtosis: ", asum(zero))
2358        denom = denom + zero  # prevent divide-by-zero
2359        return N.where(zero, 0, amoment(a, 4, dimension)/denom)
2360
2361    def adescribe(inarray, dimension=None):
2362        """
2363        Returns several descriptive statistics of the passed array.  Dimension
2364        can equal None (ravel array first), an integer (the dimension over
2365        which to operate), or a sequence (operate over multiple dimensions).
2366
2367        Usage:   adescribe(inarray,dimension=None)
2368        Returns: n, (min,max), mean, standard deviation, skew, kurtosis
2369        """
2370        if dimension is None:
2371            inarray = N.ravel(inarray)
2372            dimension = 0
2373        n = inarray.shape[dimension]
2374        mm = (N.minimum.reduce(inarray), N.maximum.reduce(inarray))
2375        m = amean(inarray, dimension)
2376        sd = astdev(inarray, dimension)
2377        skew = askew(inarray, dimension)
2378        kurt = akurtosis(inarray, dimension)
2379        return n, mm, m, sd, skew, kurt
2380
2381# NORMALITY TESTS
2382
2383    def askewtest(a, dimension=None):
2384        """
2385        Tests whether the skew is significantly different from a normal
2386        distribution.  Dimension can equal None (ravel array first), an
2387        integer (the dimension over which to operate), or a sequence (operate
2388        over multiple dimensions).
2389
2390        Usage:   askewtest(a,dimension=None)
2391        Returns: z-score and 2-tail z-probability
2392        """
2393        if dimension is None:
2394            a = N.ravel(a)
2395            dimension = 0
2396        b2 = askew(a, dimension)
2397        n = float(a.shape[dimension])
2398        y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)))
2399        beta2 = (3.0*(n*n+27*n-70)*(n+1)*(n+3)) / ((n-2.0)*(n+5)*(n+7)*(n+9))
2400        W2 = -1 + N.sqrt(2*(beta2-1))
2401        delta = 1/N.sqrt(N.log(N.sqrt(W2)))
2402        alpha = N.sqrt(2/(W2-1))
2403        y = N.where(N.equal(y, 0), 1, y)
2404        Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1))
2405        return Z, (1.0-zprob(Z))*2
2406
2407    def akurtosistest(a, dimension=None):
2408        """
        Tests whether a dataset has normal kurtosis (i.e.,
        kurtosis = 3(n-1)/(n+1)).  Valid only for n>20.  Dimension can equal None
2411        (ravel array first), an integer (the dimension over which to operate),
2412        or a sequence (operate over multiple dimensions).
2413
2414        Usage:   akurtosistest(a,dimension=None)
        Returns: z-score and 2-tail z-probability; returns 0 where the test is undefined
2416        """
2417        if dimension is None:
2418            a = N.ravel(a)
2419            dimension = 0
2420        n = float(a.shape[dimension])
2421        if n < 20:
2422            print("akurtosistest only valid for n>=20 ... continuing anyway, n=", n)
2423        b2 = akurtosis(a, dimension)
2424        E = 3.0*(n-1) / (n+1)
2425        varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5))
2426        x = (b2-E)/N.sqrt(varb2)
2427        sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))
2428                                                           / (n*(n-2)*(n-3)))
2429        A = 6.0 + 8.0/sqrtbeta1 * (2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2)))
2430        term1 = 1 - 2/(9.0*A)
2431        denom = 1 + x*N.sqrt(2/(A-4.0))
2432        denom = N.where(N.less(denom, 0), 99, denom)
2433        term2 = N.where(N.equal(denom, 0), term1, N.power((1-2.0/A)/denom, 1/3.0))
2434        Z = (term1 - term2) / N.sqrt(2/(9.0*A))
2435        Z = N.where(N.equal(denom, 99), 0, Z)
2436        return Z, (1.0-zprob(Z))*2
2437
2438    def anormaltest(a, dimension=None):
2439        """
        Tests whether the skew and/or kurtosis of a dataset differ from those of
        a normal distribution.  Can operate over multiple dimensions.  Dimension can equal
2442        None (ravel array first), an integer (the dimension over which to
2443        operate), or a sequence (operate over multiple dimensions).
2444
2445        Usage:   anormaltest(a,dimension=None)
2446        Returns: z-score and 2-tail probability
2447        """
2448        if dimension is None:
2449            a = N.ravel(a)
2450            dimension = 0
2451        s, p = askewtest(a, dimension)
2452        k, p = akurtosistest(a, dimension)
2453        k2 = N.power(s, 2) + N.power(k, 2)
2454        return k2, achisqprob(k2, 2)
2455
2456# AFREQUENCY FUNCTIONS
2457
2458    def aitemfreq(a):
2459        """
2460        Returns a 2D array of item frequencies.  Column 1 contains item values,
2461        column 2 contains their respective counts.  Assumes a 1D array is passed.
2462
2463        Usage:   aitemfreq(a)
        Returns: a 2D frequency table (col 0=scores, col 1=frequencies)
2465        """
2466        scores = pstat.aunique(a)
2467        scores = N.sort(scores)
2468        freq = N.zeros(len(scores))
2469        for i in range(len(scores)):
2470            freq[i] = N.add.reduce(N.equal(a, scores[i]))
2471        return N.array(pstat.aabut(scores, freq))
2472
2473    def ascoreatpercentile(inarray, percent):
2474        """
2475        Usage:   ascoreatpercentile(inarray,percent)   0<percent<100
2476        Returns: score at given percentile, relative to inarray distribution
2477        """
2478        percent = percent / 100.0
2479        targetcf = percent*len(inarray)
2480        h, lrl, binsize, extras = histogram(inarray)
2481        cumhist = cumsum(h*1)
2482        for i in range(len(cumhist)):
2483            if cumhist[i] >= targetcf:
2484                break
2485        score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
2486        return score
2487
2488    def apercentileofscore(inarray, score, histbins=10, defaultlimits=None):
2489        """
2490    Note: result of this function depends on the values used to histogram
2491    the data(!).
2492
2493    Usage:   apercentileofscore(inarray,score,histbins=10,defaultlimits=None)
2494    Returns: percentile-position of score (0-100) relative to inarray
2495    """
2496        h, lrl, binsize, extras = histogram(inarray, histbins, defaultlimits)
2497        cumhist = cumsum(h*1)
2498        i = int((score - lrl)/float(binsize))
2499        pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100
2500        return pct
2501
2502    def ahistogram(inarray, numbins=10, defaultlimits=None, printextras=1):
2503        """
2504    Returns (i) an array of histogram bin counts, (ii) the smallest value
2505    of the histogram binning, and (iii) the bin width (the last 2 are not
2506    necessarily integers).  Default number of bins is 10.  Defaultlimits
2507    can be None (the routine picks bins spanning all the numbers in the
2508    inarray) or a 2-sequence (lowerlimit, upperlimit).  Returns all of the
2509    following: array of bin values, lowerreallimit, binsize, extrapoints.
2510
2511    Usage:   ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1)
    Returns: (array of bin counts, bin-minimum, bin-width, #-points-outside-range)
2513    """
2514        inarray = N.ravel(inarray)               # flatten any >1D arrays
2515        if (defaultlimits is not None):
2516            lowerreallimit = defaultlimits[0]
2517            upperreallimit = defaultlimits[1]
2518            binsize = (upperreallimit-lowerreallimit) / float(numbins)
2519        else:
2520            Min = N.minimum.reduce(inarray)
2521            Max = N.maximum.reduce(inarray)
2522            estbinwidth = float(Max - Min)/float(numbins) + 1
2523            binsize = (Max-Min+estbinwidth)/float(numbins)
2524            lowerreallimit = Min - binsize/2.0  # lower real limit,1st bin
2525        bins = N.zeros(numbins)
2526        extrapoints = 0
2527        for num in inarray:
2528            try:
2529                if (num-lowerreallimit) < 0:
2530                    extrapoints = extrapoints + 1
2531                else:
2532                    bintoincrement = int((num-lowerreallimit) / float(binsize))
2533                    bins[bintoincrement] = bins[bintoincrement] + 1
2534            except Exception:  # point outside lower/upper limits
2535                extrapoints = extrapoints + 1
2536        if (extrapoints > 0 and printextras == 1):
2537            print('\nPoints outside given histogram range =', extrapoints)
2538        return (bins, lowerreallimit, binsize, extrapoints)
2539
2540    def acumfreq(a, numbins=10, defaultreallimits=None):
2541        """
2542        Returns a cumulative frequency histogram, using the histogram function.
2543        Defaultreallimits can be None (use all data), or a 2-sequence containing
2544        lower and upper limits on values to include.
2545
2546        Usage:   acumfreq(a,numbins=10,defaultreallimits=None)
2547        Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
2548        """
2549        h, l, b, e = histogram(a, numbins, defaultreallimits)
2550        cumhist = cumsum(h*1)
2551        return cumhist, l, b, e
2552
2553    def arelfreq(a, numbins=10, defaultreallimits=None):
2554        """
2555        Returns a relative frequency histogram, using the histogram function.
2556        Defaultreallimits can be None (use all data), or a 2-sequence containing
2557        lower and upper limits on values to include.
2558
2559        Usage:   arelfreq(a,numbins=10,defaultreallimits=None)
        Returns: array of relative-frequency bin values, lowerreallimit, binsize, extrapoints
2561        """
2562        h, l, b, e = histogram(a, numbins, defaultreallimits)
2563        h = N.array(h/float(a.shape[0]))
2564        return h, l, b, e
2565
2566# AVARIABILITY FUNCTIONS
2567
2568    def aobrientransform(*args):
2569        """
2570        Computes a transform on input data (any number of columns).  Used to
2571        test for homogeneity of variance prior to running one-way stats.  Each
        array in *args is one level of a factor.  If an F_oneway() run on the
        transformed data comes out significant, the variances are unequal.  From
2574        Maxwell and Delaney, p.112.
2575
2576        Usage:   aobrientransform(*args)    *args = 1D arrays, one per level of factor
2577        Returns: transformed data for use in an ANOVA
2578        """
2579        TINY = 1e-10
2580        k = len(args)
2581        n = N.zeros(k, N.Float)
2582        v = N.zeros(k, N.Float)
2583        m = N.zeros(k, N.Float)
2584        nargs = []
2585        for i in range(k):
2586            nargs.append(args[i].astype(N.Float))
2587            n[i] = float(len(nargs[i]))
2588            v[i] = var(nargs[i])
2589            m[i] = mean(nargs[i])
2590        for j in range(k):
            for i in range(int(n[j])):
2592                t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2
2593                t2 = 0.5*v[j]*(n[j]-1.0)
2594                t3 = (n[j]-1.0)*(n[j]-2.0)
2595                nargs[j][i] = (t1-t2) / float(t3)
2596        check = 1
2597        for j in range(k):
2598            if v[j] - mean(nargs[j]) > TINY:
2599                check = 0
2600        if check != 1:
2601            raise ValueError('Lack of convergence in obrientransform.')
2602        else:
2603            return N.array(nargs)
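    # Usage sketch (hypothetical 1D arrays g1 and g2, one per group), following the
    # F_oneway() route the docstring describes:
    #   r = aobrientransform(g1, g2)
    #   f, p = F_oneway(list(r[0]), list(r[1]))   # a small p suggests unequal variances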
2604
2605    def asamplevar(inarray, dimension=None, keepdims=0):
2606        """
    Returns the sample variance of the values in the passed
2608    array (i.e., using N).  Dimension can equal None (ravel array first),
2609    an integer (the dimension over which to operate), or a sequence
2610    (operate over multiple dimensions).  Set keepdims=1 to return an array
2611    with the same number of dimensions as inarray.
2612
2613    Usage:   asamplevar(inarray,dimension=None,keepdims=0)
2614    """
2615        if dimension is None:
2616            inarray = N.ravel(inarray)
2617            dimension = 0
2618        if dimension == 1:
2619            mn = amean(inarray, dimension)[:, N.NewAxis]
2620        else:
2621            mn = amean(inarray, dimension, keepdims=1)
2622        deviations = inarray - mn
2623        if isinstance(dimension, list):
2624            n = 1
2625            for d in dimension:
2626                n = n*inarray.shape[d]
2627        else:
2628            n = inarray.shape[dimension]
2629        svar = ass(deviations, dimension, keepdims) / float(n)
2630        return svar
2631
2632    def asamplestdev(inarray, dimension=None, keepdims=0):
2633        """
2634    Returns the sample standard deviation of the values in the passed
2635    array (i.e., using N).  Dimension can equal None (ravel array first),
2636    an integer (the dimension over which to operate), or a sequence
2637    (operate over multiple dimensions).  Set keepdims=1 to return an array
2638    with the same number of dimensions as inarray.
2639
2640    Usage:   asamplestdev(inarray,dimension=None,keepdims=0)
2641    """
2642        return N.sqrt(asamplevar(inarray, dimension, keepdims))
2643
2644    def asignaltonoise(instack, dimension=0):
2645        """
2646    Calculates signal-to-noise.  Dimension can equal None (ravel array
2647    first), an integer (the dimension over which to operate), or a
2648    sequence (operate over multiple dimensions).
2649
    Usage:   asignaltonoise(instack,dimension=0)
2651    Returns: array containing the value of (mean/stdev) along dimension,
2652             or 0 when stdev=0
2653    """
2654        m = mean(instack, dimension)
2655        sd = stdev(instack, dimension)
2656        return N.where(N.equal(sd, 0), 0, m/sd)
2657
2658    def avar(inarray, dimension=None, keepdims=0):
2659        """
2660    Returns the estimated population variance of the values in the passed
2661    array (i.e., N-1).  Dimension can equal None (ravel array first), an
2662    integer (the dimension over which to operate), or a sequence (operate
2663    over multiple dimensions).  Set keepdims=1 to return an array with the
2664    same number of dimensions as inarray.
2665
2666    Usage:   avar(inarray,dimension=None,keepdims=0)
2667    """
2668        if dimension is None:
2669            inarray = N.ravel(inarray)
2670            dimension = 0
2671        mn = amean(inarray, dimension, 1)
2672        deviations = inarray - mn
2673        if isinstance(dimension, list):
2674            n = 1
2675            for d in dimension:
2676                n = n*inarray.shape[d]
2677        else:
2678            n = inarray.shape[dimension]
2679        var = ass(deviations, dimension, keepdims)/float(n-1)
2680        return var
2681
2682    def astdev(inarray, dimension=None, keepdims=0):
2683        """
2684    Returns the estimated population standard deviation of the values in
2685    the passed array (i.e., N-1).  Dimension can equal None (ravel array
2686    first), an integer (the dimension over which to operate), or a
2687    sequence (operate over multiple dimensions).  Set keepdims=1 to return
2688    an array with the same number of dimensions as inarray.
2689
2690    Usage:   astdev(inarray,dimension=None,keepdims=0)
2691    """
2692        return N.sqrt(avar(inarray, dimension, keepdims))
2693
2694    def asterr(inarray, dimension=None, keepdims=0):
2695        """
2696    Returns the estimated population standard error of the values in the
2697    passed array (i.e., N-1).  Dimension can equal None (ravel array
2698    first), an integer (the dimension over which to operate), or a
2699    sequence (operate over multiple dimensions).  Set keepdims=1 to return
2700    an array with the same number of dimensions as inarray.
2701
2702    Usage:   asterr(inarray,dimension=None,keepdims=0)
2703    """
2704        if dimension is None:
2705            inarray = N.ravel(inarray)
2706            dimension = 0
2707        return astdev(inarray, dimension, keepdims) / float(N.sqrt(inarray.shape[dimension]))
2708
2709    def asem(inarray, dimension=None, keepdims=0):
2710        """
2711    Returns the standard error of the mean (i.e., using N) of the values
2712    in the passed array.  Dimension can equal None (ravel array first), an
2713    integer (the dimension over which to operate), or a sequence (operate
2714    over multiple dimensions).  Set keepdims=1 to return an array with the
2715    same number of dimensions as inarray.
2716
2717    Usage:   asem(inarray,dimension=None, keepdims=0)
2718    """
2719        if dimension is None:
2720            inarray = N.ravel(inarray)
2721            dimension = 0
2722        if isinstance(dimension, list):
2723            n = 1
2724            for d in dimension:
2725                n = n*inarray.shape[d]
2726        else:
2727            n = inarray.shape[dimension]
2728        s = asamplestdev(inarray, dimension, keepdims) / N.sqrt(n-1)
2729        return s
2730
2731    def az(a, score):
2732        """
    Returns the z-score of a given input score, given the array from which
2734    that score came.  Not appropriate for population calculations, nor for
2735    arrays > 1D.
2736
2737    Usage:   az(a, score)
2738    """
2739        z = (score-amean(a)) / asamplestdev(a)
2740        return z
2741
2742    def azs(a):
2743        """
2744    Returns a 1D array of z-scores, one for each score in the passed array,
2745    computed relative to the passed array.
2746
2747    Usage:   azs(a)
2748    """
2749        zscores = []
2750        for item in a:
2751            zscores.append(z(a, item))
2752        return N.array(zscores)
2753
2754    def azmap(scores, compare, dimension=0):
2755        """
    Returns an array of z-scores with the same shape as scores (e.g., [x,y]),
    standardized against the array passed to compare (e.g., [time,x,y]).
    Assumes collapsing over dim 0 of the compare array.
2759
    Usage:   azmap(scores, compare, dimension=0)
2761    """
2762        mns = amean(compare, dimension)
2763        sstd = asamplestdev(compare, 0)
2764        return (scores - mns) / sstd
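    # Usage sketch (hypothetical names): with timeseries of shape (120, 64, 64) and
    # img of shape (64, 64),
    #   zimg = azmap(img, timeseries)
    # standardizes each (x, y) cell of img against that cell's mean and sample stdev
    # taken across the first (time) dimension of timeseries.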
2765
2766# ATRIMMING FUNCTIONS
2767
2768    def around(a, digits=1):
2769        """
2770        Rounds all values in array a to 'digits' decimal places.
2771
2772        Usage:   around(a,digits)
2773        Returns: a, where each value is rounded to 'digits' decimals
2774        """
2775        def ar(x, d=digits):
2776            return round(x, d)
2777
2778        if not isinstance(a, N.ArrayType):
2779            try:
2780                a = N.array(a)
2781            except Exception:
2782                a = N.array(a, 'O')
2783        shp = a.shape
2784        if a.typecode() in ['f', 'F', 'd', 'D']:
2785            b = N.ravel(a)
2786            b = N.array([ar(_) for _ in b])
2787            b.shape = shp
2788        elif a.typecode() in ['o', 'O']:
2789            b = N.ravel(a)*1
2790            for i in range(len(b)):
2791                if isinstance(b[i], float):
2792                    b[i] = round(b[i], digits)
2793            b.shape = shp
2794        else:  # not a float, double or Object array
2795            b = a*1
2796        return b
2797
2798    def athreshold(a, threshmin=None, threshmax=None, newval=0):
2799        """
    Like Numeric.clip() except that values <threshmin or >threshmax are replaced
2801    by newval instead of by threshmin/threshmax (respectively).
2802
2803    Usage:   athreshold(a,threshmin=None,threshmax=None,newval=0)
2804    Returns: a, with values <threshmin or >threshmax replaced with newval
2805    """
2806        mask = N.zeros(a.shape)
2807        if threshmin is not None:
2808            mask = mask + N.where(N.less(a, threshmin), 1, 0)
2809        if threshmax is not None:
2810            mask = mask + N.where(N.greater(a, threshmax), 1, 0)
2811        mask = N.clip(mask, 0, 1)
2812        return N.where(mask, newval, a)
2813
2814    def atrimboth(a, proportiontocut):
2815        """
2816    Slices off the passed proportion of items from BOTH ends of the passed
2817    array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND
    'rightmost' 10% of scores).  You must pre-sort the array if you want
2819    "proper" trimming.  Slices off LESS if proportion results in a
2820    non-integer slice index (i.e., conservatively slices off
2821    proportiontocut).
2822
2823    Usage:   atrimboth (a,proportiontocut)
2824    Returns: trimmed version of array a
2825    """
2826        lowercut = int(proportiontocut*len(a))
2827        uppercut = len(a) - lowercut
2828        return a[lowercut:uppercut]
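    # Quick illustration (array assumed pre-sorted):
    #   atrimboth(N.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), 0.2)  ==>  array([3, 4, 5, 6, 7, 8])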
2829
2830    def atrim1(a, proportiontocut, tail='right'):
2831        """
2832        Slices off the passed proportion of items from ONE end of the passed
2833        array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
2834        10% of scores).  Slices off LESS if proportion results in a non-integer
2835        slice index (i.e., conservatively slices off proportiontocut).
2836
2837        Usage:   atrim1(a,proportiontocut,tail='right')  or set tail='left'
2838        Returns: trimmed version of array a
2839        """
        if tail.lower() == 'right':
            lowercut = 0
            uppercut = len(a) - int(proportiontocut*len(a))
        elif tail.lower() == 'left':
2844            lowercut = int(proportiontocut*len(a))
2845            uppercut = len(a)
2846        return a[lowercut:uppercut]
2847
2848# ACORRELATION FUNCTIONS
2849
2850    def acovariance(X):
2851        """
2852        Computes the covariance matrix of a matrix X.  Requires a 2D matrix input.
2853
2854        Usage:   acovariance(X)
2855        Returns: covariance matrix of X
2856        """
2857        if len(X.shape) != 2:
2858            raise TypeError("acovariance requires 2D matrices")
2859        n = X.shape[0]
2860        mX = amean(X, 0)
2861        return N.dot(N.transpose(X), X) / float(n) - N.multiply.outer(mX, mX)
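    # Illustrative note: rows of X are observations and columns are variables, so
    # C = acovariance(X) gives C[i][j] = mean(x_i*x_j) - mean(x_i)*mean(x_j), i.e.,
    # the 1/n (population) covariance of variables i and j.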
2862
2863    def acorrelation(X):
2864        """
2865        Computes the correlation matrix of a matrix X.  Requires a 2D matrix input.
2866
2867        Usage:   acorrelation(X)
2868        Returns: correlation matrix of X
2869        """
2870        C = acovariance(X)
2871        V = N.diagonal(C)
2872        return C / N.sqrt(N.multiply.outer(V, V))
2873
2874    def apaired(x, y):
2875        """
2876        Interactively determines the type of data in x and y, and then runs the
        appropriate statistic for paired group data.
2878
2879        Usage:   apaired(x,y)     x,y = the two arrays of values to be compared
2880        Returns: appropriate statistic name, value, and probability
2881        """
2882        samples = ''
2883        while samples not in ['i', 'r', 'I', 'R', 'c', 'C']:
2884            print('\nIndependent or related samples, or correlation (i,r,c): ', end=' ')
2885            samples = input()
2886
2887        if samples in ['i', 'I', 'r', 'R']:
2888            print('\nComparing variances ...', end=' ')
2889            # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112
2890            r = obrientransform(x, y)
2891            f, p = F_oneway(pstat.colex(r, 0), pstat.colex(r, 1))
2892            if p < 0.05:
2893                vartype = 'unequal, p='+str(round(p, 4))
2894            else:
2895                vartype = 'equal'
2896            print(vartype)
2897            if samples in ['i', 'I']:
2898                if vartype[0] == 'e':
2899                    t, p = ttest_ind(x, y, None, 0)
2900                    print('\nIndependent samples t-test:  ', round(t, 4), round(p, 4))
2901                else:
2902                    if len(x) > 20 or len(y) > 20:
2903                        z, p = ranksums(x, y)
2904                        print('\nRank Sums test (NONparametric, n>20):  ', round(z, 4), round(p, 4))
2905                    else:
2906                        u, p = mannwhitneyu(x, y)
2907                        print('\nMann-Whitney U-test (NONparametric, ns<20):  ', round(u, 4), round(p, 4))
2908
2909            else:  # RELATED SAMPLES
2910                if vartype[0] == 'e':
2911                    t, p = ttest_rel(x, y, 0)
2912                    print('\nRelated samples t-test:  ', round(t, 4), round(p, 4))
2913                else:
                    t, p = wilcoxont(x, y)
2915                    print('\nWilcoxon T-test (NONparametric):  ', round(t, 4), round(p, 4))
2916        else:  # CORRELATION ANALYSIS
2917            corrtype = ''
2918            while corrtype not in ['c', 'C', 'r', 'R', 'd', 'D']:
2919                print('\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', end=' ')
2920                corrtype = input()
2921            if corrtype in ['c', 'C']:
2922                m, b, r, p, see = linregress(x, y)
2923                print('\nLinear regression for continuous variables ...')
2924                lol = [['Slope', 'Intercept', 'r', 'Prob', 'SEestimate'], [round(m, 4), round(b, 4), round(r, 4), round(p, 4), round(see, 4)]]
2925                pstat.printcc(lol)
2926            elif corrtype in ['r', 'R']:
2927                r, p = spearmanr(x, y)
2928                print('\nCorrelation for ranked variables ...')
2929                print("Spearman's r: ", round(r, 4), round(p, 4))
2930            else:  # DICHOTOMOUS
2931                r, p = pointbiserialr(x, y)
2932                print('\nAssuming x contains a dichotomous variable ...')
2933                print('Point Biserial r: ', round(r, 4), round(p, 4))
2934        print('\n\n')
2935        return None
2936
2937    def apearsonr(x, y, verbose=1):
2938        """
        Calculates a Pearson correlation coefficient and the two-tailed
        probability value.  Taken from Heiman's Basic Statistics for the
        Behav. Sci (2nd), p.195.
2941
2942        Usage:   apearsonr(x,y,verbose=1)      where x,y are equal length arrays
2943        Returns: Pearson's r, two-tailed p-value
2944        """
2945        TINY = 1.0e-20
2946        n = len(x)
2947        r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
2948        r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
2949        r = (r_num / r_den)
2950        df = n-2
2951        t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
2952        prob = abetai(0.5*df, 0.5, df/(df+t*t), verbose)
2953        return r, prob
2954
2955    def aspearmanr(x, y):
2956        """
2957        Calculates a Spearman rank-order correlation coefficient.  Taken
2958        from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
2959
2960        Usage:   aspearmanr(x,y)      where x,y are equal-length arrays
2961        Returns: Spearman's r, two-tailed p-value
2962        """
2963        n = len(x)
2964        rankx = rankdata(x)
2965        ranky = rankdata(y)
2966        dsq = N.add.reduce((rankx-ranky)**2)
2967        rs = 1 - 6*dsq / float(n*(n**2-1))
2968        t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
2969        df = n-2
2970        probrs = abetai(0.5*df, 0.5, df/(df+t*t))
        # probability values for rs are from part 2 of the spearman function in
        # Numerical Recipes, p.510.  They are close to table values, but not exact.(?)
2973        return rs, probrs
2974
2975    def apointbiserialr(x, y):
2976        """
2977        Calculates a point-biserial correlation coefficient and the associated
2978        probability value.  Taken from Heiman's Basic Statistics for the Behav.
2979        Sci (1st), p.194.
2980
2981        Usage:   apointbiserialr(x,y)      where x,y are equal length arrays
2982        Returns: Point-biserial r, two-tailed p-value
2983        """
2984        TINY = 1e-30
2985        categories = pstat.aunique(x)
2986        data = pstat.aabut(x, y)
2987        if len(categories) != 2:
2988            raise ValueError("Exactly 2 categories required (in x) for pointbiserialr().")
2989        else:   # there are 2 categories, continue
2990            codemap = pstat.aabut(categories, N.arange(2))
2991            pstat.arecode(data, codemap, 0)  # recoded
2992            x = pstat.alinexand(data, 0, categories[0])
2993            y = pstat.alinexand(data, 0, categories[1])
2994            xmean = amean(pstat.acolex(x, 1))
2995            ymean = amean(pstat.acolex(y, 1))
2996            n = len(data)
2997            adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
2998            rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data, 1))*adjust
2999            df = n-2
3000            t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
3001            prob = abetai(0.5*df, 0.5, df/(df+t*t))
3002            return rpb, prob
3003
3004    def akendalltau(x, y):
3005        """
3006    Calculates Kendall's tau ... correlation of ordinal data.  Adapted
    from function kendl1 in Numerical Recipes.  Needs good test-cases.@@@
3008
3009    Usage:   akendalltau(x,y)
3010    Returns: Kendall's tau, two-tailed p-value
3011    """
3012        n1 = 0
3013        n2 = 0
3014        iss = 0
3015        for j in range(len(x)-1):
            for k in range(j+1, len(y)):  # start at j+1 so each pair is compared once
3017                a1 = x[j] - x[k]
3018                a2 = y[j] - y[k]
3019                aa = a1 * a2
3020                if (aa):             # neither array has a tie
3021                    n1 = n1 + 1
3022                    n2 = n2 + 1
3023                    if aa > 0:
3024                        iss = iss + 1
3025                    else:
3026                        iss = iss - 1
3027                else:
3028                    if (a1):
3029                        n1 = n1 + 1
3030                    else:
3031                        n2 = n2 + 1
3032        tau = iss / math.sqrt(n1*n2)
3033        svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
3034        z = tau / math.sqrt(svar)
3035        prob = erfcc(abs(z)/1.4142136)
3036        return tau, prob
3037
3038    def alinregress(*args):
3039        """
3040    Calculates a regression line on two arrays, x and y, corresponding to x,y
3041    pairs.  If a single 2D array is passed, alinregress finds dim with 2 levels
3042    and splits data into x,y pairs along that dim.
3043
3044    Usage:   alinregress(*args)    args=2 equal-length arrays, or one 2D array
3045    Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate
3046    """
3047        TINY = 1.0e-20
        if len(args) == 1:  # a single (possibly 2D) array was passed
3049            args = args[0]
3050            if len(args) == 2:
3051                x = args[0]
3052                y = args[1]
3053            else:
3054                x = args[:, 0]
3055                y = args[:, 1]
3056        else:
3057            x = args[0]
3058            y = args[1]
3059        n = len(x)
3060        xmean = amean(x)
3061        ymean = amean(y)
3062        r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
3063        r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
3064        r = r_num / r_den
3065        df = n-2
3066        t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3067        prob = abetai(0.5*df, 0.5, df/(df+t*t))
3068        slope = r_num / (float(n)*ass(x) - asquare_of_sums(x))
3069        intercept = ymean - slope*xmean
3070        sterrest = math.sqrt(1-r*r)*asamplestdev(y)
3071        return slope, intercept, r, prob, sterrest
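
    # Illustrative usage (sketch): points on the exact line y = 2x + 1 recover
    # slope 2, intercept 1, and r = 1:
    #   x = N.array([1., 2., 3., 4., 5.])
    #   y = 2.0*x + 1.0
    #   alinregress(x, y)   # -> approx. (2.0, 1.0, 1.0, p near 0, 0.0)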
3072
3073# AINFERENTIAL STATISTICS
3074
3075    def attest_1samp(a, popmean, printit=0, name='Sample', writemode='a'):
3076        """
        Calculates the t-obtained for the single-sample t-test on ONE group
3078        of scores a, given a population mean.  If printit=1, results are printed
3079        to the screen.  If printit='filename', the results are output to 'filename'
3080        using the given writemode (default=append).  Returns t-value, and prob.
3081
        Usage:   attest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
3083        Returns: t-value, two-tailed prob
3084        """
3085        if not isinstance(a, N.ArrayType):
3086            a = N.array(a)
3087        x = amean(a)
3088        v = avar(a)
3089        n = len(a)
3090        df = n-1
3091        svar = ((n-1)*v) / float(df)
3092        t = (x-popmean)/math.sqrt(svar*(1.0/n))
3093        prob = abetai(0.5*df, 0.5, df/(df+t*t))
3094
3095        if printit != 0:
3096            statname = 'Single-sample T-test.'
3097            outputpairedstats(printit, writemode,
3098                              'Population', '--', popmean, 0, 0, 0,
3099                              name, n, x, v, N.minimum.reduce(N.ravel(a)),
3100                              N.maximum.reduce(N.ravel(a)),
3101                              statname, t, prob)
3102        return t, prob
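
    # Illustrative usage (sketch): test whether sample scores differ from a
    # hypothesized population mean of 50:
    #   scores = N.array([48., 52., 51., 47., 53., 49.])
    #   t, p = attest_1samp(scores, 50)   # two-tailed p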
3103
3104    def attest_ind(a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
3105        """
3106    Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores
    a, and b.  From Numerical Recipes, p.483.  If printit=1, results are
3108    printed to the screen.  If printit='filename', the results are output
3109    to 'filename' using the given writemode (default=append).  Dimension
3110    can equal None (ravel array first), or an integer (the dimension over
3111    which to operate on a and b).
3112
3113    Usage:   attest_ind (a,b,dimension=None,printit=0,
                         name1='Samp1',name2='Samp2',writemode='a')
3115    Returns: t-value, two-tailed p-value
3116    """
3117        if dimension is None:
3118            a = N.ravel(a)
3119            b = N.ravel(b)
3120            dimension = 0
3121        x1 = amean(a, dimension)
3122        x2 = amean(b, dimension)
3123        v1 = avar(a, dimension)
3124        v2 = avar(b, dimension)
3125        n1 = a.shape[dimension]
3126        n2 = b.shape[dimension]
3127        df = n1+n2-2
3128        svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
3129        zerodivproblem = N.equal(svar, 0)
3130        svar = N.where(zerodivproblem, 1, svar)  # avoid zero-division in 1st place
3131        t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2))  # N-D COMPUTATION HERE!!!!!!
3132        t = N.where(zerodivproblem, 1.0, t)     # replace NaN/wrong t-values with 1.0
3133        probs = abetai(0.5*df, 0.5, float(df)/(df+t*t))
3134
3135        if isinstance(t, N.ArrayType):
3136            probs = N.reshape(probs, t.shape)
3137        if len(probs) == 1:
3138            probs = probs[0]
3139
3140        if printit != 0:
3141            if isinstance(t, N.ArrayType):
3142                t = t[0]
3143            if isinstance(probs, N.ArrayType):
3144                probs = probs[0]
3145            statname = 'Independent samples T-test.'
3146            outputpairedstats(printit, writemode,
3147                              name1, n1, x1, v1, N.minimum.reduce(N.ravel(a)),
3148                              N.maximum.reduce(N.ravel(a)),
3149                              name2, n2, x2, v2, N.minimum.reduce(N.ravel(b)),
3150                              N.maximum.reduce(N.ravel(b)),
3151                              statname, t, probs)
3152            return
3153        return t, probs
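
    # Illustrative usage (sketch): compare two independent groups; with
    # dimension=None both arrays are raveled first:
    #   grp1 = N.array([3., 4., 5., 6., 7.])
    #   grp2 = N.array([6., 7., 8., 9., 10.])
    #   t, p = attest_ind(grp1, grp2)   # two-tailed p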
3154
3155    def attest_rel(a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
3156        """
3157    Calculates the t-obtained T-test on TWO RELATED samples of scores, a
    and b.  From Numerical Recipes, p.483.  If printit=1, results are
3159    printed to the screen.  If printit='filename', the results are output
3160    to 'filename' using the given writemode (default=append).  Dimension
3161    can equal None (ravel array first), or an integer (the dimension over
3162    which to operate on a and b).
3163
3164    Usage:   attest_rel(a,b,dimension=None,printit=0,
3165                        name1='Samp1',name2='Samp2',writemode='a')
3166    Returns: t-value, two-tailed p-value
3167    """
3168        if dimension is None:
3169            a = N.ravel(a)
3170            b = N.ravel(b)
3171            dimension = 0
3172        if len(a) != len(b):
3173            raise ValueError('Unequal length arrays.')
3174        x1 = amean(a, dimension)
3175        x2 = amean(b, dimension)
3176        v1 = avar(a, dimension)
3177        v2 = avar(b, dimension)
3178        n = a.shape[dimension]
3179        df = float(n-1)
3180        d = (a-b).astype('d')
3181
3182        denom = N.sqrt((n*N.add.reduce(d*d, dimension) - N.add.reduce(d, dimension)**2) / df)
3183        zerodivproblem = N.equal(denom, 0)
3184        denom = N.where(zerodivproblem, 1, denom)  # avoid zero-division in 1st place
3185        t = N.add.reduce(d, dimension) / denom      # N-D COMPUTATION HERE!!!!!!
3186        t = N.where(zerodivproblem, 1.0, t)     # replace NaN/wrong t-values with 1.0
3187        probs = abetai(0.5*df, 0.5, float(df)/(df+t*t))
3188        if isinstance(t, N.ArrayType):
3189            probs = N.reshape(probs, t.shape)
3190        if len(probs) == 1:
3191            probs = probs[0]
3192
3193        if printit != 0:
3194            statname = 'Related samples T-test.'
3195            outputpairedstats(printit, writemode,
3196                              name1, n, x1, v1, N.minimum.reduce(N.ravel(a)),
3197                              N.maximum.reduce(N.ravel(a)),
3198                              name2, n, x2, v2, N.minimum.reduce(N.ravel(b)),
3199                              N.maximum.reduce(N.ravel(b)),
3200                              statname, t, probs)
3201            return
3202        return t, probs
3203
3204    def achisquare(f_obs, f_exp=None):
3205        """
    Calculates a one-way chi square for an array of observed frequencies and returns
3207    the result.  If no expected frequencies are given, the total N is assumed to
3208    be equally distributed across all groups.
3209
3210    Usage:   achisquare(f_obs, f_exp=None)   f_obs = array of observed cell freq.
3211    Returns: chisquare-statistic, associated p-value
3212    """
3213
3214        k = len(f_obs)
3215        if f_exp is None:
3216            f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs), N.Float)
3217        f_exp = f_exp.astype(N.Float)
3218        chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp)
3219        return chisq, chisqprob(chisq, k-1)
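
    # Illustrative usage (sketch): with observed frequencies exactly equal to the
    # (default, uniform) expected frequencies, chi-square is 0 and p is 1:
    #   achisquare(N.array([10., 10., 10., 10.]))   # -> (0.0, 1.0)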
3220
3221    def aks_2samp(data1, data2):
3222        """
    Computes the Kolmogorov-Smirnov statistic on 2 samples.  Modified from
    Numerical Recipes in C, page 493.  Returns KS D-value, prob.  Not ufunc-
3225    like.
3226
3227    Usage:   aks_2samp(data1,data2)  where data1 and data2 are 1D arrays
3228    Returns: KS D-value, p-value
3229    """
3230        j1 = 0    # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE
3231        j2 = 0    # N.zeros(data2.shape[1:])
3232        fn1 = 0.0  # N.zeros(data1.shape[1:],N.Float)
3233        fn2 = 0.0  # N.zeros(data2.shape[1:],N.Float)
3234        n1 = data1.shape[0]
3235        n2 = data2.shape[0]
3236        en1 = n1*1
3237        en2 = n2*1
3238        d = N.zeros(data1.shape[1:], N.Float)
3239        data1 = N.sort(data1, 0)
3240        data2 = N.sort(data2, 0)
3241        while j1 < n1 and j2 < n2:
3242            d1 = data1[j1]
3243            d2 = data2[j2]
3244            if d1 <= d2:
3245                fn1 = (j1)/float(en1)
3246                j1 = j1 + 1
3247            if d2 <= d1:
3248                fn2 = (j2)/float(en2)
3249                j2 = j2 + 1
3250            dt = (fn2-fn1)
3251            if abs(dt) > abs(d):
3252                d = dt
3253        try:
3254            en = math.sqrt(en1*en2/float(en1+en2))
3255            prob = aksprob((en+0.12+0.11/en)*N.fabs(d))
3256        except Exception:
3257            prob = 1.0
3258        return d, prob
3259
3260    def amannwhitneyu(x, y):
3261        """
3262        Calculates a Mann-Whitney U statistic on the provided scores and
3263        returns the result.  Use only when the n in each condition is < 20 and
3264        you have 2 independent samples of ranks.  REMEMBER: Mann-Whitney U is
3265        significant if the u-obtained is LESS THAN or equal to the critical
3266        value of U.
3267
3268        Usage:   amannwhitneyu(x,y)     where x,y are arrays of values for 2 conditions
3269        Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
3270        """
3271        n1 = len(x)
3272        n2 = len(y)
3273        ranked = rankdata(N.concatenate((x, y)))
3274        rankx = ranked[0:n1]       # get the x-ranks
3275        u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)  # calc U for x
3276        u2 = n1*n2 - u1                            # remainder is U for y
3277        bigu = max(u1, u2)
3278        smallu = min(u1, u2)
3279        T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores
3280        if T == 0:
3281            raise ValueError('All numbers are identical in amannwhitneyu')
3282        sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
3283        z = abs((bigu-n1*n2/2.0) / sd)  # normal approximation for prob calc
3284        return smallu, 1.0 - zprob(z)
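
    # Illustrative usage (sketch; intended for small independent samples, n < 20):
    #   x = N.array([1., 2., 3., 4., 5.])
    #   y = N.array([6., 7., 8., 9., 10.])
    #   u, p = amannwhitneyu(x, y)   # u is the smaller of U1, U2; p is one-tailed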
3285
3286    def atiecorrect(rankvals):
3287        """
3288        Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests.
3289        See Siegel, S. (1956) Nonparametric Statistics for the Behavioral
3290        Sciences.  New York: McGraw-Hill.  Code adapted from |Stat rankind.c
3291        code.
3292
3293        Usage:   atiecorrect(rankvals)
3294        Returns: T correction factor for U or H
3295        """
3296        sorted, posn = ashellsort(N.array(rankvals))
3297        n = len(sorted)
3298        T = 0.0
3299        i = 0
3300        while (i < n-1):
3301            if sorted[i] == sorted[i+1]:
3302                nties = 1
3303                while (i < n-1) and (sorted[i] == sorted[i+1]):
3304                    nties = nties + 1
3305                    i = i + 1
3306                T = T + nties**3 - nties
3307            i = i+1
3308        T = T / float(n**3-n)
3309        return 1.0 - T
3310
3311    def aranksums(x, y):
3312        """
3313    Calculates the rank sums statistic on the provided scores and returns
3314    the result.
3315
3316    Usage:   aranksums(x,y)     where x,y are arrays of values for 2 conditions
3317    Returns: z-statistic, two-tailed p-value
3318    """
3319        n1 = len(x)
3320        n2 = len(y)
3321        alldata = N.concatenate((x, y))
3322        ranked = arankdata(alldata)
3323        x = ranked[:n1]
3324        y = ranked[n1:]
3325        s = sum(x)
3326        expected = n1*(n1+n2+1) / 2.0
3327        z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
3328        prob = 2*(1.0 - zprob(abs(z)))
3329        return z, prob
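
    # Illustrative usage (sketch; large-sample normal approximation):
    #   x = N.array([1., 3., 5., 7., 9., 11.])
    #   y = N.array([2., 4., 6., 8., 10., 12.])
    #   z, p = aranksums(x, y)   # two-tailed p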
3330
3331    def awilcoxont(x, y):
3332        """
3333    Calculates the Wilcoxon T-test for related samples and returns the
3334    result.  A non-parametric T-test.
3335
3336    Usage:   awilcoxont(x,y)     where x,y are equal-length arrays for 2 conditions
3337    Returns: t-statistic, two-tailed p-value
3338    """
3339        if len(x) != len(y):
3340            raise ValueError('Unequal N in awilcoxont.  Aborting.')
3341        d = x-y
3342        d = N.compress(N.not_equal(d, 0), d)  # Keep all non-zero differences
3343        count = len(d)
3344        absd = abs(d)
3345        absranked = arankdata(absd)
3346        r_plus = 0.0
3347        r_minus = 0.0
3348        for i in range(len(absd)):
3349            if d[i] < 0:
3350                r_minus = r_minus + absranked[i]
3351            else:
3352                r_plus = r_plus + absranked[i]
3353        wt = min(r_plus, r_minus)
3354        mn = count * (count+1) * 0.25
3355        se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
3356        z = math.fabs(wt-mn) / se
3358        prob = 2*(1.0 - zprob(abs(z)))
3359        return wt, prob
3360
3361    def akruskalwallish(*args):
3362        """
3363    The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
3364    groups, requiring at least 5 subjects in each group.  This function
3365    calculates the Kruskal-Wallis H and associated p-value for 3 or more
3366    independent samples.
3367
3368    Usage:   akruskalwallish(*args)     args are separate arrays for 3+ conditions
3369    Returns: H-statistic (corrected for ties), associated p-value
3370    """
        assert len(args) >= 3, "Need at least 3 groups in stats.akruskalwallish()"
        args = list(args)
        n = [len(_) for _ in args]
3375        all = []
3376        for i in range(len(args)):
3377            all = all + args[i].tolist()
3378        ranked = rankdata(all)
3379        T = tiecorrect(ranked)
3380        for i in range(len(args)):
3381            args[i] = ranked[0:n[i]]
3382            del ranked[0:n[i]]
3383        rsums = []
3384        for i in range(len(args)):
3385            rsums.append(sum(args[i])**2)
3386            rsums[i] = rsums[i] / float(n[i])
3387        ssbn = sum(rsums)
3388        totaln = sum(n)
3389        h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
3390        df = len(args) - 1
3391        if T == 0:
3392            raise ValueError('All numbers are identical in akruskalwallish')
3393        h = h / float(T)
3394        return h, chisqprob(h, df)
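
    # Illustrative usage (sketch; three or more independent groups):
    #   g1 = N.array([6., 7., 8., 9., 10.])
    #   g2 = N.array([1., 2., 3., 4., 5.])
    #   g3 = N.array([11., 12., 13., 14., 15.])
    #   h, p = akruskalwallish(g1, g2, g3)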
3395
3396    def afriedmanchisquare(*args):
3397        """
3398    Friedman Chi-Square is a non-parametric, one-way within-subjects
3399    ANOVA.  This function calculates the Friedman Chi-square test for
3400    repeated measures and returns the result, along with the associated
    probability value.  It assumes 3 or more repeated measures.  With only 3
    levels, a minimum of 10 subjects is required; with 4 levels, at least 5
    subjects per level are needed(??).
3404
3405    Usage:   afriedmanchisquare(*args)   args are separate arrays for 2+ conditions
3406    Returns: chi-square statistic, associated p-value
3407    """
3408        k = len(args)
3409        if k < 3:
3410            raise ValueError('\nLess than 3 levels.  Friedman test not appropriate.\n')
3411        n = len(args[0])
3412        data = pstat.aabut(*args)
3413        data = data.astype(N.Float)
3414        for i in range(len(data)):
3415            data[i] = arankdata(data[i])
        ssbn = asum(asum(data, 0)**2)  # sum of squared rank-sums, one per condition
3417        chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
3418        return chisq, chisqprob(chisq, k-1)
3419
3420# APROBABILITY CALCULATIONS
3421
3422    def achisqprob(chisq, df):
3423        """
3424    Returns the (1-tail) probability value associated with the provided chi-square
3425    value and df.  Heavily modified from chisq.c in Gary Perlman's |Stat.  Can
3426    handle multiple dimensions.
3427
3428    Usage:   achisqprob(chisq,df)    chisq=chisquare stat., df=degrees of freedom
3429    """
3430        BIG = 200.0
3431
3432        def ex(x):
3433            BIG = 200.0
3434            exponents = N.where(N.less(x, -BIG), -BIG, x)
3435            return N.exp(exponents)
3436
3437        if not isinstance(chisq, N.ArrayType):
3438            chisq = N.array([chisq])
3439        if df < 1:
            return N.ones(chisq.shape, N.Float)
3441        probs = N.zeros(chisq.shape, N.Float)
3442        probs = N.where(N.less_equal(chisq, 0), 1.0, probs)  # set prob=1 for chisq<0
3443        a = 0.5 * chisq
3444        if df > 1:
3445            y = ex(-a)
3446        if df % 2 == 0:
3447            even = 1
3448            s = y*1
3449            s2 = s*1
3450        else:
3451            even = 0
3452            s = 2.0 * azprob(-N.sqrt(chisq))
3453            s2 = s*1
3454        if (df > 2):
3455            chisq = 0.5 * (df - 1.0)
3456            if even:
3457                z = N.ones(probs.shape, N.Float)
3458            else:
3459                z = 0.5 * N.ones(probs.shape, N.Float)
3460            if even:
3461                e = N.zeros(probs.shape, N.Float)
3462            else:
3463                e = N.log(N.sqrt(N.pi)) * N.ones(probs.shape, N.Float)
3464            c = N.log(a)
3465            mask = N.zeros(probs.shape)
3466            a_big = N.greater(a, BIG)
3467            a_big_frozen = -1 * N.ones(probs.shape, N.Float)
3468            totalelements = N.multiply.reduce(N.array(probs.shape))
3469            while asum(mask) != totalelements:
3470                e = N.log(z) + e
3471                s = s + ex(c*z-a-e)
3472                z = z + 1.0
3473#            print z, e, s
3474                newmask = N.greater(z, chisq)
3475                a_big_frozen = N.where(newmask*N.equal(mask, 0)*a_big, s, a_big_frozen)
3476                mask = N.clip(newmask+mask, 0, 1)
3477            if even:
3478                z = N.ones(probs.shape, N.Float)
3479                e = N.ones(probs.shape, N.Float)
3480            else:
3481                z = 0.5 * N.ones(probs.shape, N.Float)
3482                e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape, N.Float)
3483            c = 0.0
3484            mask = N.zeros(probs.shape)
3485            a_notbig_frozen = -1 * N.ones(probs.shape, N.Float)
3486            while asum(mask) != totalelements:
3487                e = e * (a/z.astype(N.Float))
3488                c = c + e
3489                z = z + 1.0
3490#            print '#2', z, e, c, s, c*y+s2
3491                newmask = N.greater(z, chisq)
3492                a_notbig_frozen = N.where(newmask*N.equal(mask, 0)*(1-a_big),
3493                                          c*y+s2, a_notbig_frozen)
3494                mask = N.clip(newmask+mask, 0, 1)
3495            probs = N.where(N.equal(probs, 1), 1,
3496                            N.where(N.greater(a, BIG), a_big_frozen, a_notbig_frozen))
3497            return probs
3498        else:
3499            return s
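
    # Illustrative usage (sketch): the classic critical value chisq = 3.84 with
    # df = 1 should give a 1-tailed p of about 0.05:
    #   achisqprob(N.array([3.84]), 1)   # -> approx. array([0.05])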
3500
3501    def aerfcc(x):
3502        """
3503    Returns the complementary error function erfc(x) with fractional error
    everywhere less than 1.2e-7.  Adapted from Numerical Recipes.  Can
3505    handle multiple dimensions.
3506
3507    Usage:   aerfcc(x)
3508    """
3509        z = abs(x)
3510        t = 1.0 / (1.0+0.5*z)
3511        ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
3512        return N.where(N.greater_equal(x, 0), ans, 2.0-ans)
3513
3514    def azprob(z):
3515        """
3516    Returns the area under the normal curve 'to the left of' the given z value.
3517    Thus,
3518        for z<0, zprob(z) = 1-tail probability
3519        for z>0, 1.0-zprob(z) = 1-tail probability
3520        for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
3521    Adapted from z.c in Gary Perlman's |Stat.  Can handle multiple dimensions.
3522
3523    Usage:   azprob(z)    where z is a z-value
3524    """
3525        def yfunc(y):
3526            x = (((((((((((((-0.000045255659 * y
3527                             + 0.000152529290) * y - 0.000019538132) * y
3528                           - 0.000676904986) * y + 0.001390604284) * y
3529                         - 0.000794620820) * y - 0.002034254874) * y
3530                       + 0.006549791214) * y - 0.010557625006) * y
3531                     + 0.011630447319) * y - 0.009279453341) * y
3532                   + 0.005353579108) * y - 0.002141268741) * y
3533                 + 0.000535310849) * y + 0.999936657524
3534            return x
3535
3536        def wfunc(w):
3537            x = ((((((((0.000124818987 * w
3538                        - 0.001075204047) * w + 0.005198775019) * w
3539                      - 0.019198292004) * w + 0.059054035642) * w
3540                    - 0.151968751364) * w + 0.319152932694) * w
3541                  - 0.531923007300) * w + 0.797884560593) * N.sqrt(w) * 2.0
3542            return x
3543
3544        Z_MAX = 6.0    # maximum meaningful z-value
3545        x = N.zeros(z.shape, N.Float)  # initialize
3546        y = 0.5 * N.fabs(z)
3547        x = N.where(N.less(y, 1.0), wfunc(y*y), yfunc(y-2.0))  # get x's
3548        x = N.where(N.greater(y, Z_MAX*0.5), 1.0, x)          # kill those with big Z
3549        prob = N.where(N.greater(z, 0), (x+1)*0.5, (1-x)*0.5)
3550        return prob
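
    # Illustrative usage (sketch): area to the left of z = 1.96 on the standard
    # normal curve is about 0.975:
    #   azprob(N.array([1.96]))   # -> approx. array([0.975])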
3551
3552    def aksprob(alam):
3553        """
3554   Returns the probability value for a K-S statistic computed via ks_2samp.
   Adapted from Numerical Recipes.  Can handle multiple dimensions.
3556
3557   Usage:   aksprob(alam)
3558   """
        if isinstance(alam, N.ArrayType):
            frozen = -1 * N.ones(alam.shape, N.Float64)
            alam = alam.astype(N.Float64)
            arrayflag = 1
        else:
            arrayflag = 0
            frozen = N.array(-1.)
            alam = N.array(alam, N.Float64)
3566        mask = N.zeros(alam.shape)
3567        fac = 2.0 * N.ones(alam.shape, N.Float)
3568        sum = N.zeros(alam.shape, N.Float)
3569        termbf = N.zeros(alam.shape, N.Float)
3570        a2 = N.array(-2.0*alam*alam, N.Float64)
3571        totalelements = N.multiply.reduce(N.array(mask.shape))
3572        for j in range(1, 201):
3573            if asum(mask) == totalelements:
3574                break
3575            exponents = (a2*j*j)
3576            overflowmask = N.less(exponents, -746)
3577            frozen = N.where(overflowmask, 0, frozen)
3578            mask = mask+overflowmask
3579            term = fac*N.exp(exponents)
3580            sum = sum + term
3581            newmask = N.where(N.less_equal(abs(term), (0.001*termbf))
3582                              + N.less(abs(term), 1.0e-8*sum), 1, 0)
3583            frozen = N.where(newmask*N.equal(mask, 0), sum, frozen)
3584            mask = N.clip(mask+newmask, 0, 1)
3585            fac = -fac
3586            termbf = abs(term)
3587        if arrayflag:
3588            return N.where(N.equal(frozen, -1), 1.0, frozen)  # 1.0 if doesn't converge
3589        else:
3590            return N.where(N.equal(frozen, -1), 1.0, frozen)[0]  # 1.0 if doesn't converge
3591
3592    def afprob(dfnum, dfden, F):
3593        """
3594    Returns the 1-tailed significance level (p-value) of an F statistic
3595    given the degrees of freedom for the numerator (dfR-dfF) and the degrees
3596    of freedom for the denominator (dfF).  Can handle multiple dims for F.
3597
3598    Usage:   afprob(dfnum, dfden, F)   where usually dfnum=dfbn, dfden=dfwn
3599    """
3600        if isinstance(F, N.ArrayType):
3601            return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F))
3602        else:
3603            return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
3604
3605    def abetacf(a, b, x, verbose=1):
3606        """
3607    Evaluates the continued fraction form of the incomplete Beta function,
    betai.  (Adapted from: Numerical Recipes in C.)  Can handle multiple
3609    dimensions for x.
3610
3611    Usage:   abetacf(a,b,x,verbose=1)
3612    """
3613        ITMAX = 200
3614        EPS = 3.0e-7
3615
3616        arrayflag = 1
3617        if isinstance(x, N.ArrayType):
3618            frozen = N.ones(x.shape, N.Float) * -1  # start out w/ -1s, should replace all
3619        else:
3620            arrayflag = 0
3621            frozen = N.array([-1])
3622            x = N.array([x])
3623        mask = N.zeros(x.shape)
3624        bm = az = am = 1.0
3625        qab = a+b
3626        qap = a+1.0
3627        qam = a-1.0
3628        bz = 1.0-qab*x/qap
3629        for i in range(ITMAX+1):
3630            if N.sum(N.ravel(N.equal(frozen, -1))) == 0:
3631                break
3632            em = float(i+1)
3633            tem = em + em
3634            d = em*(b-em)*x/((qam+tem)*(a+tem))
3635            ap = az + d*am
3636            bp = bz+d*bm
3637            d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
3638            app = ap+d*az
3639            bpp = bp+d*bz
3640            aold = az*1
3641            am = ap/bpp
3642            bm = bp/bpp
3643            az = app/bpp
3644            bz = 1.0
3645            newmask = N.less(abs(az-aold), EPS*abs(az))
3646            frozen = N.where(newmask*N.equal(mask, 0), az, frozen)
3647            mask = N.clip(mask+newmask, 0, 1)
3648        noconverge = asum(N.equal(frozen, -1))
3649        if noconverge != 0 and verbose:
3650            print('a or b too big, or ITMAX too small in Betacf for ', noconverge, ' elements')
3651        if arrayflag:
3652            return frozen
3653        else:
3654            return frozen[0]
3655
3656    def agammln(xx):
3657        """
    Returns the natural log of the gamma function of xx, where
        Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
    Adapted from: Numerical Recipes in C.  Can handle multiple dims ... but
3661    probably doesn't normally have to.
3662
3663    Usage:   agammln(xx)
3664    """
3665        coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
3666                 0.120858003e-2, -0.536382e-5]
3667        x = xx - 1.0
3668        tmp = x + 5.5
3669        tmp = tmp - (x+0.5)*N.log(tmp)
3670        ser = 1.0
3671        for j in range(len(coeff)):
3672            x = x + 1
3673            ser = ser + coeff[j]/x
3674        return -tmp + N.log(2.50662827465*ser)
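
    # Illustrative usage (sketch): for integer arguments, exp(agammln(n)) should
    # approximate (n-1)!, e.g. Gamma(6) = 120:
    #   agammln(6.0)   # -> approx. 4.787 (= log(120))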
3675
3676    def abetai(a, b, x, verbose=1):
3677        """
3678        Returns the incomplete beta function:
3679
3680            I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
3681
3682        where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
3683        function of a.  The continued fraction formulation is implemented
        here, using the betacf function.  (Adapted from: Numerical Recipes in
3685        C.)  Can handle multiple dimensions.
3686
3687        Usage:   abetai(a,b,x,verbose=1)
3688        """
3689        TINY = 1e-15
3690        if isinstance(a, N.ArrayType):
3691            if asum(N.less(x, 0)+N.greater(x, 1)) != 0:
3692                raise ValueError('Bad x in abetai')
3693        x = N.where(N.equal(x, 0), TINY, x)
3694        x = N.where(N.equal(x, 1.0), 1-TINY, x)
3695
3696        bt = N.where(N.equal(x, 0)+N.equal(x, 1), 0, -1)
3697        exponents = (gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b * N.log(1.0-x))
        # -740 (below) is near the most negative exponent exp() can take before underflow
3699        exponents = N.where(N.less(exponents, -740), -740, exponents)
3700        bt = N.exp(exponents)
3701        if isinstance(x, N.ArrayType):
3702            ans = N.where(N.less(x, (a+1)/(a+b+2.0)),
3703                          bt*abetacf(a, b, x, verbose)/float(a),
3704                          1.0-bt*abetacf(b, a, 1.0-x, verbose)/float(b))
3705        else:
3706            if x < (a+1)/(a+b+2.0):
3707                ans = bt*abetacf(a, b, x, verbose)/float(a)
3708            else:
3709                ans = 1.0-bt*abetacf(b, a, 1.0-x, verbose)/float(b)
3710        return ans
3711
3712# AANOVA CALCULATIONS
3713
3714    import LinearAlgebra
3715    LA = LinearAlgebra
3716
3717    def aglm(data, para):
3718        """
3719    Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken
3720    from:
3721        Peterson et al. Statistical limitations in functional neuroimaging
3722        I. Non-inferential methods and statistical models.  Phil Trans Royal Soc
3723        Lond B 354: 1239-1260.
3724
3725    Usage:   aglm(data,para)
3726    Returns: statistic, p-value ???
3727    """
3728        if len(para) != len(data):
3729            print("data and para must be same length in aglm")
3730            return
3731        n = len(para)
3732        p = pstat.aunique(para)
3733        x = N.zeros((n, len(p)))  # design matrix
3734        for l in range(len(p)):
3735            x[:, l] = N.equal(para, p[l])
3736        b = N.dot(N.dot(LA.inverse(N.dot(N.transpose(x), x)),  # i.e., b=inv(X'X)X'Y
3737                        N.transpose(x)),
3738                  data)
3739        diffs = (data - N.dot(x, b))
3740        s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs)
3741
3742        if len(p) == 2:  # ttest_ind
3743            c = N.array([1, -1])
3744            df = n-2
3745            fact = asum(1.0/asum(x, 0))  # i.e., 1/n1 + 1/n2 + 1/n3 ...
3746            t = N.dot(c, b) / N.sqrt(s_sq*fact)
3747            probs = abetai(0.5*df, 0.5, float(df)/(df+t*t))
3748            return t, probs
3749
3750    def aF_oneway(*args):
3751        """
3752        Performs a 1-way ANOVA, returning an F-value and probability given
3753        any number of groups.  From Heiman, pp.394-7.
3754
3755        Usage:   aF_oneway (*args)    where *args is 2 or more arrays, one per
3756                                        treatment group
3757        Returns: f-value, probability
3758        """
        na = len(args)  # ANOVA on 'na' groups, each in its own array
        alldata = N.concatenate(args)
3762        bign = len(alldata)
3763        sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
3764        ssbn = 0
3765        for a in args:
3766            ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a))
3767        ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
3768        sswn = sstot-ssbn
3769        dfbn = na-1
3770        dfwn = bign - na
3771        msb = ssbn/float(dfbn)
3772        msw = sswn/float(dfwn)
3773        f = msb/msw
3774        prob = fprob(dfbn, dfwn, f)
3775        return f, prob
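
    # Illustrative usage (sketch; one array per treatment group):
    #   g1 = N.array([4., 5., 6., 5., 4.])
    #   g2 = N.array([7., 8., 9., 8., 7.])
    #   g3 = N.array([1., 2., 3., 2., 1.])
    #   f, p = aF_oneway(g1, g2, g3)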
3776
3777    def aF_value(ER, EF, dfR, dfF):
3778        """
3779    Returns an F-statistic given the following:
3780            ER  = error associated with the null hypothesis (the Restricted model)
3781            EF  = error associated with the alternate hypothesis (the Full model)
            dfR = degrees of freedom associated with the Restricted model
            dfF = degrees of freedom associated with the Full model
3784    """
3785        return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF)))
3786
3787    def outputfstats(Enum, Eden, dfnum, dfden, f, prob):
3788        Enum = round(Enum, 3)
3789        Eden = round(Eden, 3)
        dfnum = round(dfnum, 3)
3791        dfden = round(dfden, 3)
3792        f = round(f, 3)
3793        prob = round(prob, 3)
3794        suffix = ''                       # for *s after the p-value
3795        if prob < 0.001:
3796            suffix = '  ***'
3797        elif prob < 0.01:
3798            suffix = '  **'
3799        elif prob < 0.05:
3800            suffix = '  *'
3801        title = [['EF/ER', 'DF', 'Mean Square', 'F-value', 'prob', '']]
3802        lofl = title+[[Enum, dfnum, round(Enum/float(dfnum), 3), f, prob, suffix],
3803                      [Eden, dfden, round(Eden/float(dfden), 3), '', '', '']]
3804        pstat.printcc(lofl)
3805        return
3806
3807    def F_value_multivariate(ER, EF, dfnum, dfden):
3808        """
3809   Returns an F-statistic given the following:
3810           ER  = error associated with the null hypothesis (the Restricted model)
3811           EF  = error associated with the alternate hypothesis (the Full model)
           dfnum = degrees of freedom gained by the Full model (dfR - dfF)
           dfden = degrees of freedom associated with the Full model (dfF)
3814   where ER and EF are matrices from a multivariate F calculation.
3815   """
3816        if type(ER) in [int, float]:
3817            ER = N.array([[ER]])
3818        if type(EF) in [int, float]:
3819            EF = N.array([[EF]])
3820        n_um = (LA.determinant(ER) - LA.determinant(EF)) / float(dfnum)
3821        d_en = LA.determinant(EF) / float(dfden)
3822        return n_um / d_en
3823
3824# ASUPPORT FUNCTIONS
3825
3826    def asign(a):
3827        """
3828    Usage:   asign(a)
3829    Returns: array shape of a, with -1 where a<0 and +1 where a>=0
3830    """
3831        a = N.asarray(a)
3832        if ((isinstance(a, float)) or (isinstance(a, int))):
3833            return a-a-N.less(a, 0)+N.greater(a, 0)
3834        else:
3835            return N.zeros(N.shape(a))-N.less(a, 0)+N.greater(a, 0)
3836
3837    def asum(a, dimension=None, keepdims=0):
3838        """
3839   An alternative to the Numeric.add.reduce function, which allows one to
3840   (1) collapse over multiple dimensions at once, and/or (2) to retain
   all dimensions in the original array (squashing each summed dimension
   down to size 1).
3842   Dimension can equal None (ravel array first), an integer (the
3843   dimension over which to operate), or a sequence (operate over multiple
3844   dimensions).  If keepdims=1, the resulting array will have as many
3845   dimensions as the input array.
3846
3847   Usage:   asum(a, dimension=None, keepdims=0)
3848   Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1
3849   """
3850        if isinstance(a, N.ArrayType) and a.typecode() in ['l', 's', 'b']:
3851            a = a.astype(N.Float)
3852        if dimension is None:
3853            s = N.sum(N.ravel(a))
3854        elif type(dimension) in [int, float]:
3855            s = N.add.reduce(a, dimension)
3856            if keepdims == 1:
3857                shp = list(a.shape)
3858                shp[dimension] = 1
3859                s = N.reshape(s, shp)
3860        else:  # must be a SEQUENCE of dims to sum over
3861            dims = sorted(dimension)
3862            dims.reverse()
3863            s = a * 1.0
3864            for dim in dims:
3865                s = N.add.reduce(s, dim)
3866            if keepdims == 1:
3867                shp = list(a.shape)
3868                for dim in dims:
3869                    shp[dim] = 1
3870                s = N.reshape(s, shp)
3871        return s
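
    # Illustrative usage (sketch): integer arrays are cast to floats before summing:
    #   a = N.array([[1, 2], [3, 4]])
    #   asum(a)                      # -> 10.0 (raveled)
    #   asum(a, 1)                   # -> array([3., 7.])
    #   asum(a, 1, keepdims=1)       # -> array([[3.], [7.]])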
3872
3873    def acumsum(a, dimension=None):
3874        """
3875    Returns an array consisting of the cumulative sum of the items in the
3876    passed array.  Dimension can equal None (ravel array first), an
3877    integer (the dimension over which to operate), or a sequence (operate
3878    over multiple dimensions, but this last one just barely makes sense).
3879
3880    Usage:   acumsum(a,dimension=None)
3881    """
3882        if dimension is None:
3883            a = N.ravel(a)
3884            dimension = 0
3885        if type(dimension) in [list, tuple, N.ArrayType]:
3886            dimension = sorted(dimension)
3887            dimension.reverse()
3888            for d in dimension:
3889                a = N.add.accumulate(a, d)
3890            return a
3891        else:
3892            return N.add.accumulate(a, dimension)
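
    # Illustrative usage (sketch):
    #   acumsum(N.array([1., 2., 3., 4.]))   # -> array([1., 3., 6., 10.])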
3893
3894    def ass(inarray, dimension=None, keepdims=0):
3895        """
3896    Squares each value in the passed array, adds these squares & returns
3897    the result.  Unfortunate function name. :-) Defaults to ALL values in
3898    the array.  Dimension can equal None (ravel array first), an integer
3899    (the dimension over which to operate), or a sequence (operate over
3900    multiple dimensions).  Set keepdims=1 to maintain the original number
3901    of dimensions.
3902
3903    Usage:   ass(inarray, dimension=None, keepdims=0)
3904    Returns: sum-along-'dimension' for (inarray*inarray)
3905    """
3906        if dimension is None:
3907            inarray = N.ravel(inarray)
3908            dimension = 0
3909        return asum(inarray*inarray, dimension, keepdims)
3910
3911    def asummult(array1, array2, dimension=None, keepdims=0):
3912        """
3913    Multiplies elements in array1 and array2, element by element, and
3914    returns the sum (along 'dimension') of all resulting multiplications.
3915    Dimension can equal None (ravel array first), an integer (the
3916    dimension over which to operate), or a sequence (operate over multiple
3917    dimensions).  A trivial function, but included for completeness.
3918
3919    Usage:   asummult(array1,array2,dimension=None,keepdims=0)
3920    """
3921        if dimension is None:
3922            array1 = N.ravel(array1)
3923            array2 = N.ravel(array2)
3924            dimension = 0
3925        return asum(array1*array2, dimension, keepdims)
3926
3927    def asquare_of_sums(inarray, dimension=None, keepdims=0):
3928        """
3929    Adds the values in the passed array, squares that sum, and returns the
3930    result.  Dimension can equal None (ravel array first), an integer (the
3931    dimension over which to operate), or a sequence (operate over multiple
3932    dimensions).  If keepdims=1, the returned array will have the same
3933    NUMBER of dimensions as the original.
3934
3935    Usage:   asquare_of_sums(inarray, dimension=None, keepdims=0)
3936    Returns: the square of the sum over dim(s) in dimension
3937    """
3938        if dimension is None:
3939            inarray = N.ravel(inarray)
3940            dimension = 0
3941        s = asum(inarray, dimension, keepdims)
3942        if isinstance(s, N.ArrayType):
3943            return s.astype(N.Float)*s
3944        else:
3945            return float(s)*s
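
    # Illustrative usage (sketch): sum first, then square:
    #   asquare_of_sums(N.array([1., 2., 3.]))   # -> 36.0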
3946
3947    def asumdiffsquared(a, b, dimension=None, keepdims=0):
3948        """
3949    Takes pairwise differences of the values in arrays a and b, squares
3950    these differences, and returns the sum of these squares.  Dimension
3951    can equal None (ravel array first), an integer (the dimension over
3952    which to operate), or a sequence (operate over multiple dimensions).
    If keepdims=1, the returned array keeps the same number of dimensions as a and b.
3954
3955    Usage:   asumdiffsquared(a,b)
3956    Returns: sum[ravel(a-b)**2]
3957    """
3958        if dimension is None:
            a = N.ravel(a)
            b = N.ravel(b)
3960            dimension = 0
3961        return asum((a-b)**2, dimension, keepdims)
3962
3963    def ashellsort(inarray):
3964        """
3965    Shellsort algorithm.  Sorts a 1D-array.
3966
3967    Usage:   ashellsort(inarray)
3968    Returns: sorted-inarray, sorting-index-vector (for original array)
3969    """
3970        n = len(inarray)
3971        svec = inarray * 1.0
3972        ivec = list(range(n))
        gap = n // 2   # integer division needed
3974        while gap > 0:
3975            for i in range(gap, n):
3976                for j in range(i-gap, -1, -gap):
3977                    while j >= 0 and svec[j] > svec[j+gap]:
3978                        temp = svec[j]
3979                        svec[j] = svec[j+gap]
3980                        svec[j+gap] = temp
3981                        itemp = ivec[j]
3982                        ivec[j] = ivec[j+gap]
3983                        ivec[j+gap] = itemp
            gap = gap // 2  # integer division needed
3985#    svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]]
3986        return svec, ivec
3987
3988    def arankdata(inarray):
3989        """
    Ranks the data in inarray, dealing with ties appropriately.  Assumes
3991    a 1D inarray.  Adapted from Gary Perlman's |Stat ranksort.
3992
3993    Usage:   arankdata(inarray)
3994    Returns: array of length equal to inarray, containing rank scores
3995    """
3996        n = len(inarray)
3997        svec, ivec = ashellsort(inarray)
3998        sumranks = 0
3999        dupcount = 0
4000        newarray = N.zeros(n, N.Float)
4001        for i in range(n):
4002            sumranks = sumranks + i
4003            dupcount = dupcount + 1
4004            if i == n-1 or svec[i] != svec[i+1]:
4005                averank = sumranks / float(dupcount) + 1
4006                for j in range(i-dupcount+1, i+1):
4007                    newarray[ivec[j]] = averank
4008                sumranks = 0
4009                dupcount = 0
4010        return newarray
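
    # Illustrative usage (sketch): tied values share the average of their ranks:
    #   arankdata(N.array([3., 1., 4., 1.]))   # -> array([3., 1.5, 4., 1.5])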
4011
4012    def afindwithin(data):
4013        """
4014    Returns a binary vector, 1=within-subject factor, 0=between.  Input
    equals the entire data array (i.e., column 1 = random factor, last
    column = measured values).
4017
4018    Usage:   afindwithin(data)     data in |Stat format
4019    """
4020        numfact = len(data[0])-2
4021        withinvec = [0]*numfact
4022        for col in range(1, numfact+1):
4023            rows = pstat.linexand(data, col, pstat.unique(pstat.colex(data, 1))[0])  # get 1 level of this factor
4024            if len(pstat.unique(pstat.colex(rows, 0))) < len(rows):   # if fewer subjects than scores on this factor
4025                withinvec[col-1] = 1
4026        return withinvec
4027
4028    # RE-DEFINE DISPATCHES TO INCLUDE ARRAYS
4029
4030# CENTRAL TENDENCY:
4031    geometricmean = Dispatch((lgeometricmean, (list, tuple)), (ageometricmean, (N.ArrayType,)))
4032    harmonicmean = Dispatch((lharmonicmean, (list, tuple)), (aharmonicmean, (N.ArrayType,)))
4033    mean = Dispatch((lmean, (list, tuple)), (amean, (N.ArrayType,)))
4034    median = Dispatch((lmedian, (list, tuple)), (amedian, (N.ArrayType,)))
4035    medianscore = Dispatch((lmedianscore, (list, tuple)), (amedianscore, (N.ArrayType,)))
4036    mode = Dispatch((lmode, (list, tuple)), (amode, (N.ArrayType,)))
4037    tmean = Dispatch((atmean, (N.ArrayType,)))
4038    tvar = Dispatch((atvar, (N.ArrayType,)))
4039    tstdev = Dispatch((atstdev, (N.ArrayType,)))
4040    tsem = Dispatch((atsem, (N.ArrayType,)))
4041
4042# VARIATION:
4043    moment = Dispatch((lmoment, (list, tuple)), (amoment, (N.ArrayType,)))
4044    variation = Dispatch((lvariation, (list, tuple)), (avariation, (N.ArrayType,)))
4045    skew = Dispatch((lskew, (list, tuple)), (askew, (N.ArrayType,)))
4046    kurtosis = Dispatch((lkurtosis, (list, tuple)), (akurtosis, (N.ArrayType,)))
4047    describe = Dispatch((ldescribe, (list, tuple)), (adescribe, (N.ArrayType,)))
4048
4049# DISTRIBUTION TESTS
4050
4051    skewtest = Dispatch((askewtest, (list, tuple)), (askewtest, (N.ArrayType,)))
4052    kurtosistest = Dispatch((akurtosistest, (list, tuple)), (akurtosistest, (N.ArrayType,)))
4053    normaltest = Dispatch((anormaltest, (list, tuple)), (anormaltest, (N.ArrayType,)))
4054
4055# FREQUENCY STATS:
4056    itemfreq = Dispatch((litemfreq, (list, tuple)), (aitemfreq, (N.ArrayType,)))
4057    scoreatpercentile = Dispatch((lscoreatpercentile, (list, tuple)), (ascoreatpercentile, (N.ArrayType,)))
4058    percentileofscore = Dispatch((lpercentileofscore, (list, tuple)), (apercentileofscore, (N.ArrayType,)))
4059    histogram = Dispatch((lhistogram, (list, tuple)), (ahistogram, (N.ArrayType,)))
4060    cumfreq = Dispatch((lcumfreq, (list, tuple)), (acumfreq, (N.ArrayType,)))
4061    relfreq = Dispatch((lrelfreq, (list, tuple)), (arelfreq, (N.ArrayType,)))
4062
4063# VARIABILITY:
4064    obrientransform = Dispatch((lobrientransform, (list, tuple)), (aobrientransform, (N.ArrayType,)))
4065    samplevar = Dispatch((lsamplevar, (list, tuple)), (asamplevar, (N.ArrayType,)))
4066    samplestdev = Dispatch((lsamplestdev, (list, tuple)), (asamplestdev, (N.ArrayType,)))
4067    signaltonoise = Dispatch((asignaltonoise, (N.ArrayType,)),)
4068    var = Dispatch((lvar, (list, tuple)), (avar, (N.ArrayType,)))
4069    stdev = Dispatch((lstdev, (list, tuple)), (astdev, (N.ArrayType,)))
4070    sterr = Dispatch((lsterr, (list, tuple)), (asterr, (N.ArrayType,)))
4071    sem = Dispatch((lsem, (list, tuple)), (asem, (N.ArrayType,)))
4072    z = Dispatch((lz, (list, tuple)), (az, (N.ArrayType,)))
4073    zs = Dispatch((lzs, (list, tuple)), (azs, (N.ArrayType,)))
4074
4075# TRIMMING FCNS:
4076    threshold = Dispatch((athreshold, (N.ArrayType,)),)
4077    trimboth = Dispatch((ltrimboth, (list, tuple)), (atrimboth, (N.ArrayType,)))
4078    trim1 = Dispatch((ltrim1, (list, tuple)), (atrim1, (N.ArrayType,)))
4079
4080# CORRELATION FCNS:
4081    paired = Dispatch((lpaired, (list, tuple)), (apaired, (N.ArrayType,)))
4082    pearsonr = Dispatch((lpearsonr, (list, tuple)), (apearsonr, (N.ArrayType,)))
4083    spearmanr = Dispatch((lspearmanr, (list, tuple)), (aspearmanr, (N.ArrayType,)))
4084    pointbiserialr = Dispatch((lpointbiserialr, (list, tuple)), (apointbiserialr, (N.ArrayType,)))
4085    kendalltau = Dispatch((lkendalltau, (list, tuple)), (akendalltau, (N.ArrayType,)))
4086    linregress = Dispatch((llinregress, (list, tuple)), (alinregress, (N.ArrayType,)))
4087
4088# INFERENTIAL STATS:
4089    ttest_1samp = Dispatch((lttest_1samp, (list, tuple)), (attest_1samp, (N.ArrayType,)))
4090    ttest_ind = Dispatch((lttest_ind, (list, tuple)), (attest_ind, (N.ArrayType,)))
4091    ttest_rel = Dispatch((lttest_rel, (list, tuple)), (attest_rel, (N.ArrayType,)))
4092    chisquare = Dispatch((lchisquare, (list, tuple)), (achisquare, (N.ArrayType,)))
4093    ks_2samp = Dispatch((lks_2samp, (list, tuple)), (aks_2samp, (N.ArrayType,)))
4094    mannwhitneyu = Dispatch((lmannwhitneyu, (list, tuple)), (amannwhitneyu, (N.ArrayType,)))
4095    tiecorrect = Dispatch((ltiecorrect, (list, tuple)), (atiecorrect, (N.ArrayType,)))
4096    ranksums = Dispatch((lranksums, (list, tuple)), (aranksums, (N.ArrayType,)))
4097    wilcoxont = Dispatch((lwilcoxont, (list, tuple)), (awilcoxont, (N.ArrayType,)))
4098    kruskalwallish = Dispatch((lkruskalwallish, (list, tuple)), (akruskalwallish, (N.ArrayType,)))
4099    friedmanchisquare = Dispatch((lfriedmanchisquare, (list, tuple)), (afriedmanchisquare, (N.ArrayType,)))
4100
4101# PROBABILITY CALCS:
4102    chisqprob = Dispatch((lchisqprob, (int, float)), (achisqprob, (N.ArrayType,)))
4103    zprob = Dispatch((lzprob, (int, float)), (azprob, (N.ArrayType,)))
4104    ksprob = Dispatch((lksprob, (int, float)), (aksprob, (N.ArrayType,)))
4105    fprob = Dispatch((lfprob, (int, float)), (afprob, (N.ArrayType,)))
4106    betacf = Dispatch((lbetacf, (int, float)), (abetacf, (N.ArrayType,)))
4107    betai = Dispatch((lbetai, (int, float)), (abetai, (N.ArrayType,)))
4108    erfcc = Dispatch((lerfcc, (int, float)), (aerfcc, (N.ArrayType,)))
4109    gammln = Dispatch((lgammln, (int, float)), (agammln, (N.ArrayType,)))
4110
4111# ANOVA FUNCTIONS:
4112    F_oneway = Dispatch((lF_oneway, (list, tuple)), (aF_oneway, (N.ArrayType,)))
4113    F_value = Dispatch((lF_value, (list, tuple)), (aF_value, (N.ArrayType,)))
4114
4115# SUPPORT FUNCTIONS:
4116    incr = Dispatch((lincr, (list, tuple, N.ArrayType)), )
4117    sum = Dispatch((lsum, (list, tuple)), (asum, (N.ArrayType,)))
4118    cumsum = Dispatch((lcumsum, (list, tuple)), (acumsum, (N.ArrayType,)))
4119    ss = Dispatch((lss, (list, tuple)), (ass, (N.ArrayType,)))
4120    summult = Dispatch((lsummult, (list, tuple)), (asummult, (N.ArrayType,)))
4121    square_of_sums = Dispatch((lsquare_of_sums, (list, tuple)), (asquare_of_sums, (N.ArrayType,)))
4122    sumdiffsquared = Dispatch((lsumdiffsquared, (list, tuple)), (asumdiffsquared, (N.ArrayType,)))
4123    shellsort = Dispatch((lshellsort, (list, tuple)), (ashellsort, (N.ArrayType,)))
4124    rankdata = Dispatch((lrankdata, (list, tuple)), (arankdata, (N.ArrayType,)))
4125    findwithin = Dispatch((lfindwithin, (list, tuple)), (afindwithin, (N.ArrayType,)))
4126
4127# END OF NUMERIC FUNCTION BLOCK
4128
4129# END OF STATISTICAL FUNCTIONS
4130
4131except ImportError:
4132    pass
4133