1# Copyright (c) 1999-2002 Gary Strangman; All Rights Reserved. 2# 3# This software is distributable under the terms of the GNU 4# General Public License (GPL) v2, the text of which can be found at 5# http://www.gnu.org/copyleft/gpl.html. Installing, importing or otherwise 6# using this module constitutes acceptance of the terms of this License. 7# 8# Disclaimer 9# 10# This software is provided "as-is". There are no expressed or implied 11# warranties of any kind, including, but not limited to, the warranties 12# of merchantability and fittness for a given application. In no event 13# shall Gary Strangman be liable for any direct, indirect, incidental, 14# special, exemplary or consequential damages (including, but not limited 15# to, loss of use, data or profits, or business interruption) however 16# caused and on any theory of liability, whether in contract, strict 17# liability or tort (including negligence or otherwise) arising in any way 18# out of the use of this software, even if advised of the possibility of 19# such damage. 20# 21# Comments and/or additions are welcome (send e-mail to: 22# strang@nmr.mgh.harvard.edu). 23# 24""" 25stats.py module 26 27(Requires pstat.py module.) 28 29################################################# 30####### Written by: Gary Strangman ########### 31####### Last modified: May 10, 2002 ########### 32################################################# 33 34A collection of basic statistical functions for python. The function 35names appear below. 36 37IMPORTANT: There are really *3* sets of functions. The first set has an 'l' 38prefix, which can be used with list or tuple arguments. The second set has 39an 'a' prefix, which can accept NumPy array arguments. These latter 40functions are defined only when NumPy is available on the system. The third 41type has NO prefix (i.e., has the name that appears below). Functions of 42this set are members of a "Dispatch" class, c/o David Ascher. This class 43allows different functions to be called depending on the type of the passed 44arguments. Thus, stats.mean is a member of the Dispatch class and 45stats.mean(range(20)) will call stats.lmean(range(20)) while 46stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)). 47This is a handy way to keep consistent function names when different 48argument types require different functions to be called. Having 49implementated the Dispatch class, however, means that to get info on 50a given function, you must use the REAL function name ... that is 51"print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine, 52while "print stats.mean.__doc__" will print the doc for the Dispatch 53class. NUMPY FUNCTIONS ('a' prefix) generally have more argument options 54but should otherwise be consistent with the corresponding list functions. 55 56Disclaimers: The function list is obviously incomplete and, worse, the 57functions are not optimized. All functions have been tested (some more 58so than others), but they are far from bulletproof. Thus, as with any 59free software, no warranty or guarantee is expressed or implied. :-) A 60few extra functions that don't appear in the list below can be found by 61interested treasure-hunters. These functions don't necessarily have 62both list and array versions but were deemed useful 63 64CENTRAL TENDENCY: geometricmean 65 harmonicmean 66 mean 67 median 68 medianscore 69 mode 70 71MOMENTS: moment 72 variation 73 skew 74 kurtosis 75 skewtest (for Numpy arrays only) 76 kurtosistest (for Numpy arrays only) 77 normaltest (for Numpy arrays only) 78 79ALTERED VERSIONS: tmean (for Numpy arrays only) 80 tvar (for Numpy arrays only) 81 tmin (for Numpy arrays only) 82 tmax (for Numpy arrays only) 83 tstdev (for Numpy arrays only) 84 tsem (for Numpy arrays only) 85 describe 86 87FREQUENCY STATS: itemfreq 88 scoreatpercentile 89 percentileofscore 90 histogram 91 cumfreq 92 relfreq 93 94VARIABILITY: obrientransform 95 samplevar 96 samplestdev 97 signaltonoise (for Numpy arrays only) 98 var 99 stdev 100 sterr 101 sem 102 z 103 zs 104 zmap (for Numpy arrays only) 105 106TRIMMING FCNS: threshold (for Numpy arrays only) 107 trimboth 108 trim1 109 round (round all vals to 'n' decimals; Numpy only) 110 111CORRELATION FCNS: covariance (for Numpy arrays only) 112 correlation (for Numpy arrays only) 113 paired 114 pearsonr 115 spearmanr 116 pointbiserialr 117 kendalltau 118 linregress 119 120INFERENTIAL STATS: ttest_1samp 121 ttest_ind 122 ttest_rel 123 chisquare 124 ks_2samp 125 mannwhitneyu 126 ranksums 127 wilcoxont 128 kruskalwallish 129 friedmanchisquare 130 131PROBABILITY CALCS: chisqprob 132 erfcc 133 zprob 134 ksprob 135 fprob 136 betacf 137 gammln 138 betai 139 140ANOVA FUNCTIONS: F_oneway 141 F_value 142 143SUPPORT FUNCTIONS: writecc 144 incr 145 sign (for Numpy arrays only) 146 sum 147 cumsum 148 ss 149 summult 150 sumdiffsquared 151 square_of_sums 152 shellsort 153 rankdata 154 outputpairedstats 155 findwithin 156""" 157# CHANGE LOG: 158# =========== 159# 02-11-19 ... fixed attest_ind and attest_rel for div-by-zero Overflows 160# 02-05-10 ... fixed lchisqprob indentation (failed when df=even) 161# 00-12-28 ... removed aanova() to separate module, fixed licensing to 162# match Python License, fixed doc string & imports 163# 00-04-13 ... pulled all "global" statements, except from aanova() 164# added/fixed lots of documentation, removed io.py dependency 165# changed to version 0.5 166# 99-11-13 ... added asign() function 167# 99-11-01 ... changed version to 0.4 ... enough incremental changes now 168# 99-10-25 ... added acovariance and acorrelation functions 169# 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors 170# added aglm function (crude, but will be improved) 171# 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to 172# all handle lists of 'dimension's and keepdims 173# REMOVED ar0, ar2, ar3, ar4 and replaced them with around 174# reinserted fixes for abetai to avoid math overflows 175# 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to 176# handle multi-dimensional arrays (whew!) 177# 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990) 178# added anormaltest per same reference 179# re-wrote azprob to calc arrays of probs all at once 180# 99-08-22 ... edited attest_ind printing section so arrays could be rounded 181# 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on 182# short/byte arrays (mean of #s btw 100-300 = -150??) 183# 99-08-09 ... fixed asum so that the None case works for Byte arrays 184# 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays 185# 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap) 186# 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0]) 187# 04/11/99 ... added asignaltonoise, athreshold functions, changed all 188# max/min in array section to N.maximum/N.minimum, 189# fixed square_of_sums to prevent integer overflow 190# 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums 191# 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions 192# 02/28/99 ... Fixed aobrientransform to return an array rather than a list 193# 01/15/99 ... Essentially ceased updating list-versions of functions (!!!) 194# 01/13/99 ... CHANGED TO VERSION 0.3 195# fixed bug in a/lmannwhitneyu p-value calculation 196# 12/31/98 ... fixed variable-name bug in ldescribe 197# 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix) 198# 12/16/98 ... changed amedianscore to return float (not array) for 1 score 199# 12/14/98 ... added atmin and atmax functions 200# removed umath from import line (not needed) 201# l/ageometricmean modified to reduce chance of overflows (take 202# nth root first, then multiply) 203# 12/07/98 ... added __version__variable (now 0.2) 204# removed all 'stats.' from anova() fcn 205# 12/06/98 ... changed those functions (except shellsort) that altered 206# arguments in-place ... cumsum, ranksort, ... 207# updated (and fixed some) doc-strings 208# 12/01/98 ... added anova() function (requires NumPy) 209# incorporated Dispatch class 210# 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean 211# added 'asum' function (added functionality to N.add.reduce) 212# fixed both moment and amoment (two errors) 213# changed name of skewness and askewness to skew and askew 214# fixed (a)histogram (which sometimes counted points <lowerlimit) 215 216import copy 217import math 218import string 219 220from . import pstat # required 3rd party module 221 222 223__version__ = 0.6 224 225# DISPATCH CODE 226 227 228class Dispatch: 229 """ 230The Dispatch class, care of David Ascher, allows different functions to 231be called depending on the argument types. This way, there can be one 232function name regardless of the argument type. To access function doc 233in stats.py module, prefix the function with an 'l' or 'a' for list or 234array arguments, respectively. That is, print stats.lmean.__doc__ or 235print stats.amean.__doc__ or whatever. 236""" 237 238 def __init__(self, *tuples): 239 self._dispatch = {} 240 for func, types in tuples: 241 for t in types: 242 if t in self._dispatch.keys(): 243 raise ValueError("can't have two dispatches on "+str(t)) 244 self._dispatch[t] = func 245 self._types = list(self._dispatch.keys()) 246 247 def __call__(self, arg1, *args, **kw): 248 if type(arg1) not in self._types: 249 raise TypeError("don't know how to dispatch %s arguments" % type(arg1)) 250 return self._dispatch[type(arg1)](*(arg1,) + args, **kw) 251 252 253# LIST-BASED FUNCTIONS 254 255# Define these regardless 256 257# CENTRAL TENDENCY 258 259def lgeometricmean(inlist): 260 """ 261Calculates the geometric mean of the values in the passed list. 262That is: n-th root of (x1 * x2 * ... * xn). Assumes a '1D' list. 263 264Usage: lgeometricmean(inlist) 265""" 266 mult = 1.0 267 one_over_n = 1.0/len(inlist) 268 for item in inlist: 269 mult = mult * pow(item, one_over_n) 270 return mult 271 272 273def lharmonicmean(inlist): 274 """ 275Calculates the harmonic mean of the values in the passed list. 276That is: n / (1/x1 + 1/x2 + ... + 1/xn). Assumes a '1D' list. 277 278Usage: lharmonicmean(inlist) 279""" 280 sum = 0 281 for item in inlist: 282 sum = sum + 1.0/item 283 return len(inlist) / sum 284 285 286def lmean(inlist): 287 """ 288Returns the arithematic mean of the values in the passed list. 289Assumes a '1D' list, but will function on the 1st dim of an array(!). 290 291Usage: lmean(inlist) 292""" 293 sum = 0 294 for item in inlist: 295 sum = sum + item 296 return sum/float(len(inlist)) 297 298 299def lmedian(inlist, numbins=1000): 300 """ 301Returns the computed median value of a list of numbers, given the 302number of bins to use for the histogram (more bins brings the computed value 303closer to the median score, default number of bins = 1000). See G.W. 304Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics. 305 306Usage: lmedian (inlist, numbins=1000) 307""" 308 (hist, smallest, binsize, extras) = histogram(inlist, numbins) # make histog 309 cumhist = cumsum(hist) # make cumulative histogram 310 for i in range(len(cumhist)): # get 1st(!) index holding 50%ile score 311 if cumhist[i] >= len(inlist)/2.0: 312 cfbin = i 313 break 314 LRL = smallest + binsize*cfbin # get lower read limit of that bin 315 cfbelow = cumhist[cfbin-1] 316 freq = float(hist[cfbin]) # frequency IN the 50%ile bin 317 median = LRL + ((len(inlist)/2.0 - cfbelow)/float(freq))*binsize # median formula 318 return median 319 320 321def lmedianscore(inlist): 322 """ 323Returns the 'middle' score of the passed list. If there is an even 324number of scores, the mean of the 2 middle scores is returned. 325 326Usage: lmedianscore(inlist) 327""" 328 329 newlist = sorted(copy.deepcopy(inlist)) 330 if len(newlist) % 2 == 0: # if even number of scores, average middle 2 331 index = len(newlist)/2 # integer division correct 332 median = float(newlist[index] + newlist[index-1]) / 2 333 else: 334 index = len(newlist)/2 # int divsion gives mid value when count from 0 335 median = newlist[index] 336 return median 337 338 339def lmode(inlist): 340 """ 341Returns a list of the modal (most common) score(s) in the passed 342list. If there is more than one such score, all are returned. The 343bin-count for the mode(s) is also returned. 344 345Usage: lmode(inlist) 346Returns: bin-count for mode(s), a list of modal value(s) 347""" 348 349 scores = sorted(pstat.unique(inlist)) 350 freq = [] 351 for item in scores: 352 freq.append(inlist.count(item)) 353 maxfreq = max(freq) 354 mode = [] 355 stillmore = 1 356 while stillmore: 357 try: 358 indx = freq.index(maxfreq) 359 mode.append(scores[indx]) 360 del freq[indx] 361 del scores[indx] 362 except ValueError: 363 stillmore = 0 364 return maxfreq, mode 365 366 367# MOMENTS 368 369def lmoment(inlist, moment=1): 370 """ 371Calculates the nth moment about the mean for a sample (defaults to 372the 1st moment). Used to calculate coefficients of skewness and kurtosis. 373 374Usage: lmoment(inlist,moment=1) 375Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r) 376""" 377 if moment == 1: 378 return 0.0 379 else: 380 mn = mean(inlist) 381 n = len(inlist) 382 s = 0 383 for x in inlist: 384 s = s + (x-mn)**moment 385 return s/float(n) 386 387 388def lvariation(inlist): 389 """ 390Returns the coefficient of variation, as defined in CRC Standard 391Probability and Statistics, p.6. 392 393Usage: lvariation(inlist) 394""" 395 return 100.0*samplestdev(inlist)/float(mean(inlist)) 396 397 398def lskew(inlist): 399 """ 400Returns the skewness of a distribution, as defined in Numerical 401Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.) 402 403Usage: lskew(inlist) 404""" 405 return moment(inlist, 3)/pow(moment(inlist, 2), 1.5) 406 407 408def lkurtosis(inlist): 409 """ 410Returns the kurtosis of a distribution, as defined in Numerical 411Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.) 412 413Usage: lkurtosis(inlist) 414""" 415 return moment(inlist, 4)/pow(moment(inlist, 2), 2.0) 416 417 418def ldescribe(inlist): 419 """ 420Returns some descriptive statistics of the passed list (assumed to be 1D). 421 422Usage: ldescribe(inlist) 423Returns: n, mean, standard deviation, skew, kurtosis 424""" 425 n = len(inlist) 426 mm = (min(inlist), max(inlist)) 427 m = mean(inlist) 428 sd = stdev(inlist) 429 sk = skew(inlist) 430 kurt = kurtosis(inlist) 431 return n, mm, m, sd, sk, kurt 432 433 434# FREQUENCY STATS 435 436def litemfreq(inlist): 437 """ 438Returns a list of pairs. Each pair consists of one of the scores in inlist 439and it's frequency count. Assumes a 1D list is passed. 440 441Usage: litemfreq(inlist) 442Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies) 443""" 444 scores = sorted(pstat.unique(inlist)) 445 freq = [] 446 for item in scores: 447 freq.append(inlist.count(item)) 448 return pstat.abut(scores, freq) 449 450 451def lscoreatpercentile(inlist, percent): 452 """ 453Returns the score at a given percentile relative to the distribution 454given by inlist. 455 456Usage: lscoreatpercentile(inlist,percent) 457""" 458 if percent > 1: 459 print("\nDividing percent>1 by 100 in lscoreatpercentile().\n") 460 percent = percent / 100.0 461 targetcf = percent*len(inlist) 462 h, lrl, binsize, extras = histogram(inlist) 463 cumhist = cumsum(copy.deepcopy(h)) 464 for i in range(len(cumhist)): 465 if cumhist[i] >= targetcf: 466 break 467 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i) 468 return score 469 470 471def lpercentileofscore(inlist, score, histbins=10, defaultlimits=None): 472 """ 473Returns the percentile value of a score relative to the distribution 474given by inlist. Formula depends on the values used to histogram the data(!). 475 476Usage: lpercentileofscore(inlist,score,histbins=10,defaultlimits=None) 477""" 478 479 h, lrl, binsize, extras = histogram(inlist, histbins, defaultlimits) 480 cumhist = cumsum(copy.deepcopy(h)) 481 i = int((score - lrl)/float(binsize)) 482 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100 483 return pct 484 485 486def lhistogram(inlist, numbins=10, defaultreallimits=None, printextras=0): 487 """ 488Returns (i) a list of histogram bin counts, (ii) the smallest value 489of the histogram binning, and (iii) the bin width (the last 2 are not 490necessarily integers). Default number of bins is 10. If no sequence object 491is given for defaultreallimits, the routine picks (usually non-pretty) bins 492spanning all the numbers in the inlist. 493 494Usage: lhistogram (inlist, numbins=10, defaultreallimits=None,suppressoutput=0) 495Returns: list of bin values, lowerreallimit, binsize, extrapoints 496""" 497 if (defaultreallimits is not None): 498 if type(defaultreallimits) not in [list, tuple] or len(defaultreallimits) == 1: # only one limit given, assumed to be lower one & upper is calc'd 499 lowerreallimit = defaultreallimits 500 upperreallimit = 1.0001 * max(inlist) 501 else: # assume both limits given 502 lowerreallimit = defaultreallimits[0] 503 upperreallimit = defaultreallimits[1] 504 binsize = (upperreallimit-lowerreallimit)/float(numbins) 505 else: # no limits given for histogram, both must be calc'd 506 estbinwidth = (max(inlist)-min(inlist))/float(numbins) + 1 # 1=>cover all 507 binsize = (max(inlist)-min(inlist)+estbinwidth)/float(numbins) 508 lowerreallimit = min(inlist) - binsize/2 # lower real limit,1st bin 509 bins = [0]*(numbins) 510 extrapoints = 0 511 for num in inlist: 512 try: 513 if (num-lowerreallimit) < 0: 514 extrapoints = extrapoints + 1 515 else: 516 bintoincrement = int((num-lowerreallimit)/float(binsize)) 517 bins[bintoincrement] = bins[bintoincrement] + 1 518 except Exception: 519 extrapoints = extrapoints + 1 520 if (extrapoints > 0 and printextras == 1): 521 print('\nPoints outside given histogram range =', extrapoints) 522 return (bins, lowerreallimit, binsize, extrapoints) 523 524 525def lcumfreq(inlist, numbins=10, defaultreallimits=None): 526 """ 527Returns a cumulative frequency histogram, using the histogram function. 528 529Usage: lcumfreq(inlist,numbins=10,defaultreallimits=None) 530Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints 531""" 532 h, l, b, e = histogram(inlist, numbins, defaultreallimits) 533 cumhist = cumsum(copy.deepcopy(h)) 534 return cumhist, l, b, e 535 536 537def lrelfreq(inlist, numbins=10, defaultreallimits=None): 538 """ 539Returns a relative frequency histogram, using the histogram function. 540 541Usage: lrelfreq(inlist,numbins=10,defaultreallimits=None) 542Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints 543""" 544 h, l, b, e = histogram(inlist, numbins, defaultreallimits) 545 for i in range(len(h)): 546 h[i] = h[i]/float(len(inlist)) 547 return h, l, b, e 548 549 550# VARIABILITY FUNCTIONS 551 552def lobrientransform(*args): 553 """ 554Computes a transform on input data (any number of columns). Used to 555test for homogeneity of variance prior to running one-way stats. From 556Maxwell and Delaney, p.112. 557 558Usage: lobrientransform(*args) 559Returns: transformed data for use in an ANOVA 560""" 561 TINY = 1e-10 562 k = len(args) 563 n = [0.0]*k 564 v = [0.0]*k 565 m = [0.0]*k 566 nargs = [] 567 for i in range(k): 568 nargs.append(copy.deepcopy(args[i])) 569 n[i] = float(len(nargs[i])) 570 v[i] = var(nargs[i]) 571 m[i] = mean(nargs[i]) 572 for j in range(k): 573 for i in range(n[j]): 574 t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2 575 t2 = 0.5*v[j]*(n[j]-1.0) 576 t3 = (n[j]-1.0)*(n[j]-2.0) 577 nargs[j][i] = (t1-t2) / float(t3) 578 check = 1 579 for j in range(k): 580 if v[j] - mean(nargs[j]) > TINY: 581 check = 0 582 if check != 1: 583 raise ValueError('Problem in obrientransform.') 584 else: 585 return nargs 586 587 588def lsamplevar(inlist): 589 """ 590Returns the variance of the values in the passed list using 591N for the denominator (i.e., DESCRIBES the sample variance only). 592 593Usage: lsamplevar(inlist) 594""" 595 n = len(inlist) 596 mn = mean(inlist) 597 deviations = [] 598 for item in inlist: 599 deviations.append(item-mn) 600 return ss(deviations)/float(n) 601 602 603def lsamplestdev(inlist): 604 """ 605Returns the standard deviation of the values in the passed list using 606N for the denominator (i.e., DESCRIBES the sample stdev only). 607 608Usage: lsamplestdev(inlist) 609""" 610 return math.sqrt(samplevar(inlist)) 611 612 613def lvar(inlist): 614 """ 615Returns the variance of the values in the passed list using N-1 616for the denominator (i.e., for estimating population variance). 617 618Usage: lvar(inlist) 619""" 620 n = len(inlist) 621 mn = mean(inlist) 622 deviations = [0]*len(inlist) 623 for i in range(len(inlist)): 624 deviations[i] = inlist[i] - mn 625 return ss(deviations)/float(n-1) 626 627 628def lstdev(inlist): 629 """ 630Returns the standard deviation of the values in the passed list 631using N-1 in the denominator (i.e., to estimate population stdev). 632 633Usage: lstdev(inlist) 634""" 635 return math.sqrt(var(inlist)) 636 637 638def lsterr(inlist): 639 """ 640Returns the standard error of the values in the passed list using N-1 641in the denominator (i.e., to estimate population standard error). 642 643Usage: lsterr(inlist) 644""" 645 return stdev(inlist) / float(math.sqrt(len(inlist))) 646 647 648def lsem(inlist): 649 """ 650Returns the estimated standard error of the mean (sx-bar) of the 651values in the passed list. sem = stdev / sqrt(n) 652 653Usage: lsem(inlist) 654""" 655 sd = stdev(inlist) 656 n = len(inlist) 657 return sd/math.sqrt(n) 658 659 660def lz(inlist, score): 661 """ 662Returns the z-score for a given input score, given that score and the 663list from which that score came. Not appropriate for population calculations. 664 665Usage: lz(inlist, score) 666""" 667 z = (score-mean(inlist))/samplestdev(inlist) 668 return z 669 670 671def lzs(inlist): 672 """ 673Returns a list of z-scores, one for each score in the passed list. 674 675Usage: lzs(inlist) 676""" 677 zscores = [] 678 for item in inlist: 679 zscores.append(z(inlist, item)) 680 return zscores 681 682 683# TRIMMING FUNCTIONS 684 685def ltrimboth(l, proportiontocut): 686 """ 687Slices off the passed proportion of items from BOTH ends of the passed 688list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost' 68910% of scores. Assumes list is sorted by magnitude. Slices off LESS if 690proportion results in a non-integer slice index (i.e., conservatively 691slices off proportiontocut). 692 693Usage: ltrimboth (l,proportiontocut) 694Returns: trimmed version of list l 695""" 696 lowercut = int(proportiontocut*len(l)) 697 uppercut = len(l) - lowercut 698 return l[lowercut:uppercut] 699 700 701def ltrim1(l, proportiontocut, tail='right'): 702 """ 703Slices off the passed proportion of items from ONE end of the passed 704list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost' 70510% of scores). Slices off LESS if proportion results in a non-integer 706slice index (i.e., conservatively slices off proportiontocut). 707 708Usage: ltrim1 (l,proportiontocut,tail='right') or set tail='left' 709Returns: trimmed version of list l 710""" 711 if tail == 'right': 712 lowercut = 0 713 uppercut = len(l) - int(proportiontocut*len(l)) 714 elif tail == 'left': 715 lowercut = int(proportiontocut*len(l)) 716 uppercut = len(l) 717 return l[lowercut:uppercut] 718 719 720# CORRELATION FUNCTIONS 721 722def lpaired(x, y): 723 """ 724Interactively determines the type of data and then runs the 725appropriated statistic for paired group data. 726 727Usage: lpaired(x,y) 728Returns: appropriate statistic name, value, and probability 729""" 730 samples = '' 731 while samples not in ['i', 'r', 'I', 'R', 'c', 'C']: 732 print('\nIndependent or related samples, or correlation (i,r,c): ', end=' ') 733 samples = input() 734 735 if samples in ['i', 'I', 'r', 'R']: 736 print('\nComparing variances ...', end=' ') 737# USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 738 r = obrientransform(x, y) 739 f, p = F_oneway(pstat.colex(r, 0), pstat.colex(r, 1)) 740 if p < 0.05: 741 vartype = 'unequal, p='+str(round(p, 4)) 742 else: 743 vartype = 'equal' 744 print(vartype) 745 if samples in ['i', 'I']: 746 if vartype[0] == 'e': 747 t, p = ttest_ind(x, y, 0) 748 print('\nIndependent samples t-test: ', round(t, 4), round(p, 4)) 749 else: 750 if len(x) > 20 or len(y) > 20: 751 z, p = ranksums(x, y) 752 print('\nRank Sums test (NONparametric, n>20): ', round(z, 4), round(p, 4)) 753 else: 754 u, p = mannwhitneyu(x, y) 755 print('\nMann-Whitney U-test (NONparametric, ns<20): ', round(u, 4), round(p, 4)) 756 757 else: # RELATED SAMPLES 758 if vartype[0] == 'e': 759 t, p = ttest_rel(x, y, 0) 760 print('\nRelated samples t-test: ', round(t, 4), round(p, 4)) 761 else: 762 t, p = ranksums(x, y) 763 print('\nWilcoxon T-test (NONparametric): ', round(t, 4), round(p, 4)) 764 else: # CORRELATION ANALYSIS 765 corrtype = '' 766 while corrtype not in ['c', 'C', 'r', 'R', 'd', 'D']: 767 print('\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', end=' ') 768 corrtype = input() 769 if corrtype in ['c', 'C']: 770 m, b, r, p, see = linregress(x, y) 771 print('\nLinear regression for continuous variables ...') 772 lol = [['Slope', 'Intercept', 'r', 'Prob', 'SEestimate'], [round(m, 4), round(b, 4), round(r, 4), round(p, 4), round(see, 4)]] 773 pstat.printcc(lol) 774 elif corrtype in ['r', 'R']: 775 r, p = spearmanr(x, y) 776 print('\nCorrelation for ranked variables ...') 777 print("Spearman's r: ", round(r, 4), round(p, 4)) 778 else: # DICHOTOMOUS 779 r, p = pointbiserialr(x, y) 780 print('\nAssuming x contains a dichotomous variable ...') 781 print('Point Biserial r: ', round(r, 4), round(p, 4)) 782 print('\n\n') 783 return None 784 785 786def lpearsonr(x, y): 787 """ 788Calculates a Pearson correlation coefficient and the associated 789probability value. Taken from Heiman's Basic Statistics for the Behav. 790Sci (2nd), p.195. 791 792Usage: lpearsonr(x,y) where x and y are equal-length lists 793Returns: Pearson's r value, two-tailed p-value 794""" 795 TINY = 1.0e-30 796 if len(x) != len(y): 797 raise ValueError('Input values not paired in pearsonr. Aborting.') 798 n = len(x) 799 x = [float(_) for _ in x] 800 y = [float(_) for _ in y] 801 r_num = n*(summult(x, y)) - sum(x)*sum(y) 802 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y))) 803 r = (r_num / r_den) # denominator already a float 804 df = n-2 805 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 806 prob = betai(0.5*df, 0.5, df/float(df+t*t)) 807 return r, prob 808 809 810def lspearmanr(x, y): 811 """ 812Calculates a Spearman rank-order correlation coefficient. Taken 813from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. 814 815Usage: lspearmanr(x,y) where x and y are equal-length lists 816Returns: Spearman's r, two-tailed p-value 817""" 818 if len(x) != len(y): 819 raise ValueError('Input values not paired in spearmanr. Aborting.') 820 n = len(x) 821 rankx = rankdata(x) 822 ranky = rankdata(y) 823 dsq = sumdiffsquared(rankx, ranky) 824 rs = 1 - 6*dsq / float(n*(n**2-1)) 825 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) 826 df = n-2 827 probrs = betai(0.5*df, 0.5, df/(df+t*t)) # t already a float 828# probability values for rs are from part 2 of the spearman function in 829# Numerical Recipies, p.510. They are close to tables, but not exact. (?) 830 return rs, probrs 831 832 833def lpointbiserialr(x, y): 834 """ 835Calculates a point-biserial correlation coefficient and the associated 836probability value. Taken from Heiman's Basic Statistics for the Behav. 837Sci (1st), p.194. 838 839Usage: lpointbiserialr(x,y) where x,y are equal-length lists 840Returns: Point-biserial r, two-tailed p-value 841""" 842 TINY = 1e-30 843 if len(x) != len(y): 844 raise ValueError('INPUT VALUES NOT PAIRED IN pointbiserialr. ABORTING.') 845 data = pstat.abut(x, y) 846 categories = pstat.unique(x) 847 if len(categories) != 2: 848 raise ValueError("Exactly 2 categories required for pointbiserialr().") 849 else: # there are 2 categories, continue 850 codemap = pstat.abut(categories, range(2)) 851 pstat.recode(data, codemap, 0) # recoded 852 x = pstat.linexand(data, 0, categories[0]) 853 y = pstat.linexand(data, 0, categories[1]) 854 xmean = mean(pstat.colex(x, 1)) 855 ymean = mean(pstat.colex(y, 1)) 856 n = len(data) 857 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) 858 rpb = (ymean - xmean)/samplestdev(pstat.colex(data, 1))*adjust 859 df = n-2 860 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) 861 prob = betai(0.5*df, 0.5, df/(df+t*t)) # t already a float 862 return rpb, prob 863 864 865def lkendalltau(x, y): 866 """ 867Calculates Kendall's tau ... correlation of ordinal data. Adapted 868from function kendl1 in Numerical Recipies. Needs good test-routine.@@@ 869 870Usage: lkendalltau(x,y) 871Returns: Kendall's tau, two-tailed p-value 872""" 873 n1 = 0 874 n2 = 0 875 iss = 0 876 for j in range(len(x)-1): 877 for k in range(j, len(y)): 878 a1 = x[j] - x[k] 879 a2 = y[j] - y[k] 880 aa = a1 * a2 881 if (aa): # neither list has a tie 882 n1 = n1 + 1 883 n2 = n2 + 1 884 if aa > 0: 885 iss = iss + 1 886 else: 887 iss = iss - 1 888 else: 889 if (a1): 890 n1 = n1 + 1 891 else: 892 n2 = n2 + 1 893 tau = iss / math.sqrt(n1*n2) 894 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) 895 z = tau / math.sqrt(svar) 896 prob = erfcc(abs(z)/1.4142136) 897 return tau, prob 898 899 900def llinregress(x, y): 901 """ 902Calculates a regression line on x,y pairs. 903 904Usage: llinregress(x,y) x,y are equal-length lists of x-y coordinates 905Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate 906""" 907 TINY = 1.0e-20 908 if len(x) != len(y): 909 raise ValueError('Input values not paired in linregress. Aborting.') 910 n = len(x) 911 x = [float(_) for _ in x] 912 y = [float(_) for _ in y] 913 xmean = mean(x) 914 ymean = mean(y) 915 r_num = float(n*(summult(x, y)) - sum(x)*sum(y)) 916 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y))) 917 r = r_num / r_den 918 df = n-2 919 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 920 prob = betai(0.5*df, 0.5, df/(df+t*t)) 921 slope = r_num / float(n*ss(x) - square_of_sums(x)) 922 intercept = ymean - slope*xmean 923 sterrest = math.sqrt(1-r*r)*samplestdev(y) 924 return slope, intercept, r, prob, sterrest 925 926 927# INFERENTIAL STATISTICS 928 929def lttest_1samp(a, popmean, printit=0, name='Sample', writemode='a'): 930 """ 931Calculates the t-obtained for the independent samples T-test on ONE group 932of scores a, given a population mean. If printit=1, results are printed 933to the screen. If printit='filename', the results are output to 'filename' 934using the given writemode (default=append). Returns t-value, and prob. 935 936Usage: lttest_1samp(a,popmean,Name='Sample',printit=0,writemode='a') 937Returns: t-value, two-tailed prob 938""" 939 x = mean(a) 940 v = var(a) 941 n = len(a) 942 df = n-1 943 svar = ((n-1)*v)/float(df) 944 t = (x-popmean)/math.sqrt(svar*(1.0/n)) 945 prob = betai(0.5*df, 0.5, float(df)/(df+t*t)) 946 947 if printit != 0: 948 statname = 'Single-sample T-test.' 949 outputpairedstats(printit, writemode, 950 'Population', '--', popmean, 0, 0, 0, 951 name, n, x, v, min(a), max(a), 952 statname, t, prob) 953 return t, prob 954 955 956def lttest_ind(a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'): 957 """ 958Calculates the t-obtained T-test on TWO INDEPENDENT samples of 959scores a, and b. From Numerical Recipies, p.483. If printit=1, results 960are printed to the screen. If printit='filename', the results are output 961to 'filename' using the given writemode (default=append). Returns t-value, 962and prob. 963 964Usage: lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a') 965Returns: t-value, two-tailed prob 966""" 967 x1 = mean(a) 968 x2 = mean(b) 969 v1 = stdev(a)**2 970 v2 = stdev(b)**2 971 n1 = len(a) 972 n2 = len(b) 973 df = n1+n2-2 974 svar = ((n1-1)*v1+(n2-1)*v2)/float(df) 975 t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2)) 976 prob = betai(0.5*df, 0.5, df/(df+t*t)) 977 978 if printit != 0: 979 statname = 'Independent samples T-test.' 980 outputpairedstats(printit, writemode, 981 name1, n1, x1, v1, min(a), max(a), 982 name2, n2, x2, v2, min(b), max(b), 983 statname, t, prob) 984 return t, prob 985 986 987def lttest_rel(a, b, printit=0, name1='Sample1', name2='Sample2', writemode='a'): 988 """ 989Calculates the t-obtained T-test on TWO RELATED samples of scores, 990a and b. From Numerical Recipies, p.483. If printit=1, results are 991printed to the screen. If printit='filename', the results are output to 992'filename' using the given writemode (default=append). Returns t-value, 993and prob. 994 995Usage: lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a') 996Returns: t-value, two-tailed prob 997""" 998 if len(a) != len(b): 999 raise ValueError('Unequal length lists in ttest_rel.') 1000 x1 = mean(a) 1001 x2 = mean(b) 1002 v1 = var(a) 1003 v2 = var(b) 1004 n = len(a) 1005 cov = 0 1006 for i in range(len(a)): 1007 cov = cov + (a[i]-x1) * (b[i]-x2) 1008 df = n-1 1009 cov = cov / float(df) 1010 sd = math.sqrt((v1+v2 - 2.0*cov)/float(n)) 1011 t = (x1-x2)/sd 1012 prob = betai(0.5*df, 0.5, df/(df+t*t)) 1013 1014 if printit != 0: 1015 statname = 'Related samples T-test.' 1016 outputpairedstats(printit, writemode, 1017 name1, n, x1, v1, min(a), max(a), 1018 name2, n, x2, v2, min(b), max(b), 1019 statname, t, prob) 1020 return t, prob 1021 1022 1023def lchisquare(f_obs, f_exp=None): 1024 """ 1025Calculates a one-way chi square for list of observed frequencies and returns 1026the result. If no expected frequencies are given, the total N is assumed to 1027be equally distributed across all groups. 1028 1029Usage: lchisquare(f_obs, f_exp=None) f_obs = list of observed cell freq. 1030Returns: chisquare-statistic, associated p-value 1031""" 1032 k = len(f_obs) # number of groups 1033 if f_exp is None: 1034 f_exp = [sum(f_obs)/float(k)] * len(f_obs) # create k bins with = freq. 1035 chisq = 0 1036 for i in range(len(f_obs)): 1037 chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i]) 1038 return chisq, chisqprob(chisq, k-1) 1039 1040 1041def lks_2samp(data1, data2): 1042 """ 1043Computes the Kolmogorov-Smirnof statistic on 2 samples. From 1044Numerical Recipies in C, page 493. 1045 1046Usage: lks_2samp(data1,data2) data1&2 are lists of values for 2 conditions 1047Returns: KS D-value, associated p-value 1048""" 1049 j1 = 0 1050 j2 = 0 1051 fn1 = 0.0 1052 fn2 = 0.0 1053 n1 = len(data1) 1054 n2 = len(data2) 1055 en1 = n1 1056 en2 = n2 1057 d = 0.0 1058 data1.sort() 1059 data2.sort() 1060 while j1 < n1 and j2 < n2: 1061 d1 = data1[j1] 1062 d2 = data2[j2] 1063 if d1 <= d2: 1064 fn1 = (j1)/float(en1) 1065 j1 = j1 + 1 1066 if d2 <= d1: 1067 fn2 = (j2)/float(en2) 1068 j2 = j2 + 1 1069 dt = (fn2-fn1) 1070 if math.fabs(dt) > math.fabs(d): 1071 d = dt 1072 try: 1073 en = math.sqrt(en1*en2/float(en1+en2)) 1074 prob = ksprob((en+0.12+0.11/en)*abs(d)) 1075 except Exception: 1076 prob = 1.0 1077 return d, prob 1078 1079 1080def lmannwhitneyu(x, y): 1081 """ 1082Calculates a Mann-Whitney U statistic on the provided scores and 1083returns the result. Use only when the n in each condition is < 20 and 1084you have 2 independent samples of ranks. NOTE: Mann-Whitney U is 1085significant if the u-obtained is LESS THAN or equal to the critical 1086value of U found in the tables. Equivalent to Kruskal-Wallis H with 1087just 2 groups. 1088 1089Usage: lmannwhitneyu(data) 1090Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) 1091""" 1092 n1 = len(x) 1093 n2 = len(y) 1094 ranked = rankdata(x+y) 1095 rankx = ranked[0:n1] # get the x-ranks 1096 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x 1097 u2 = n1*n2 - u1 # remainder is U for y 1098 bigu = max(u1, u2) 1099 smallu = min(u1, u2) 1100 T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores 1101 if T == 0: 1102 raise ValueError('All numbers are identical in lmannwhitneyu') 1103 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) 1104 z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc 1105 return smallu, 1.0 - zprob(z) 1106 1107 1108def ltiecorrect(rankvals): 1109 """ 1110Corrects for ties in Mann Whitney U and Kruskal Wallis H tests. See 1111Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences. 1112New York: McGraw-Hill. Code adapted from |Stat rankind.c code. 1113 1114Usage: ltiecorrect(rankvals) 1115Returns: T correction factor for U or H 1116""" 1117 sorted, posn = shellsort(rankvals) 1118 n = len(sorted) 1119 T = 0.0 1120 i = 0 1121 while (i < n-1): 1122 if sorted[i] == sorted[i+1]: 1123 nties = 1 1124 while (i < n-1) and (sorted[i] == sorted[i+1]): 1125 nties = nties + 1 1126 i = i + 1 1127 T = T + nties**3 - nties 1128 i = i+1 1129 T = T / float(n**3-n) 1130 return 1.0 - T 1131 1132 1133def lranksums(x, y): 1134 """ 1135Calculates the rank sums statistic on the provided scores and 1136returns the result. Use only when the n in each condition is > 20 and you 1137have 2 independent samples of ranks. 1138 1139Usage: lranksums(x,y) 1140Returns: a z-statistic, two-tailed p-value 1141""" 1142 n1 = len(x) 1143 n2 = len(y) 1144 alldata = x+y 1145 ranked = rankdata(alldata) 1146 x = ranked[:n1] 1147 y = ranked[n1:] 1148 s = sum(x) 1149 expected = n1*(n1+n2+1) / 2.0 1150 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) 1151 prob = 2*(1.0 - zprob(abs(z))) 1152 return z, prob 1153 1154 1155def lwilcoxont(x, y): 1156 """ 1157Calculates the Wilcoxon T-test for related samples and returns the 1158result. A non-parametric T-test. 1159 1160Usage: lwilcoxont(x,y) 1161Returns: a t-statistic, two-tail probability estimate 1162""" 1163 if len(x) != len(y): 1164 raise ValueError('Unequal N in wilcoxont. Aborting.') 1165 d = [] 1166 for i in range(len(x)): 1167 diff = x[i] - y[i] 1168 if diff != 0: 1169 d.append(diff) 1170 count = len(d) 1171 absd = [abs(_) for _ in d] 1172 absranked = rankdata(absd) 1173 r_plus = 0.0 1174 r_minus = 0.0 1175 for i in range(len(absd)): 1176 if d[i] < 0: 1177 r_minus = r_minus + absranked[i] 1178 else: 1179 r_plus = r_plus + absranked[i] 1180 wt = min(r_plus, r_minus) 1181 mn = count * (count+1) * 0.25 1182 se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0) 1183 z = math.fabs(wt-mn) / se 1184 prob = 2*(1.0 - zprob(abs(z))) 1185 return wt, prob 1186 1187 1188def lkruskalwallish(*args): 1189 """ 1190The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more 1191groups, requiring at least 5 subjects in each group. This function 1192calculates the Kruskal-Wallis H-test for 3 or more independent samples 1193and returns the result. 1194 1195Usage: lkruskalwallish(*args) 1196Returns: H-statistic (corrected for ties), associated p-value 1197""" 1198 args = list(args) 1199 n = [0]*len(args) 1200 all = [] 1201 n = [len(_) for _ in args] 1202 for i in range(len(args)): 1203 all = all + args[i] 1204 ranked = rankdata(all) 1205 T = tiecorrect(ranked) 1206 for i in range(len(args)): 1207 args[i] = ranked[0:n[i]] 1208 del ranked[0:n[i]] 1209 rsums = [] 1210 for i in range(len(args)): 1211 rsums.append(sum(args[i])**2) 1212 rsums[i] = rsums[i] / float(n[i]) 1213 ssbn = sum(rsums) 1214 totaln = sum(n) 1215 h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) 1216 df = len(args) - 1 1217 if T == 0: 1218 raise ValueError('All numbers are identical in lkruskalwallish') 1219 h = h / float(T) 1220 return h, chisqprob(h, df) 1221 1222 1223def lfriedmanchisquare(*args): 1224 """ 1225Friedman Chi-Square is a non-parametric, one-way within-subjects 1226ANOVA. This function calculates the Friedman Chi-square test for repeated 1227measures and returns the result, along with the associated probability 1228value. It assumes 3 or more repeated measures. Only 3 levels requires a 1229minimum of 10 subjects in the study. Four levels requires 5 subjects per 1230level(??). 1231 1232Usage: lfriedmanchisquare(*args) 1233Returns: chi-square statistic, associated p-value 1234""" 1235 k = len(args) 1236 if k < 3: 1237 raise ValueError('Less than 3 levels. Friedman test not appropriate.') 1238 n = len(args[0]) 1239 data = pstat.abut(*tuple(args)) 1240 for i in range(len(data)): 1241 data[i] = rankdata(data[i]) 1242 ssbn = 0 1243 for i in range(k): 1244 ssbn = ssbn + sum(args[i])**2 1245 chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) 1246 return chisq, chisqprob(chisq, k-1) 1247 1248 1249# PROBABILITY CALCULATIONS 1250 1251def lchisqprob(chisq, df): 1252 """ 1253Returns the (1-tailed) probability value associated with the provided 1254chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat. 1255 1256Usage: lchisqprob(chisq,df) 1257""" 1258 BIG = 20.0 1259 1260 def ex(x): 1261 BIG = 20.0 1262 if x < -BIG: 1263 return 0.0 1264 else: 1265 return math.exp(x) 1266 1267 if chisq <= 0 or df < 1: 1268 return 1.0 1269 a = 0.5 * chisq 1270 if df % 2 == 0: 1271 even = 1 1272 else: 1273 even = 0 1274 if df > 1: 1275 y = ex(-a) 1276 if even: 1277 s = y 1278 else: 1279 s = 2.0 * zprob(-math.sqrt(chisq)) 1280 if (df > 2): 1281 chisq = 0.5 * (df - 1.0) 1282 if even: 1283 z = 1.0 1284 else: 1285 z = 0.5 1286 if a > BIG: 1287 if even: 1288 e = 0.0 1289 else: 1290 e = math.log(math.sqrt(math.pi)) 1291 c = math.log(a) 1292 while (z <= chisq): 1293 e = math.log(z) + e 1294 s = s + ex(c*z-a-e) 1295 z = z + 1.0 1296 return s 1297 else: 1298 if even: 1299 e = 1.0 1300 else: 1301 e = 1.0 / math.sqrt(math.pi) / math.sqrt(a) 1302 c = 0.0 1303 while (z <= chisq): 1304 e = e * (a/float(z)) 1305 c = c + e 1306 z = z + 1.0 1307 return (c*y+s) 1308 else: 1309 return s 1310 1311 1312def lerfcc(x): 1313 """ 1314Returns the complementary error function erfc(x) with fractional 1315error everywhere less than 1.2e-7. Adapted from Numerical Recipies. 1316 1317Usage: lerfcc(x) 1318""" 1319 z = abs(x) 1320 t = 1.0 / (1.0+0.5*z) 1321 ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) 1322 if x >= 0: 1323 return ans 1324 else: 1325 return 2.0 - ans 1326 1327 1328def lzprob(z): 1329 """ 1330Returns the area under the normal curve 'to the left of' the given z value. 1331Thus, 1332 for z<0, zprob(z) = 1-tail probability 1333 for z>0, 1.0-zprob(z) = 1-tail probability 1334 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability 1335Adapted from z.c in Gary Perlman's |Stat. 1336 1337Usage: lzprob(z) 1338""" 1339 Z_MAX = 6.0 # maximum meaningful z-value 1340 if z == 0.0: 1341 x = 0.0 1342 else: 1343 y = 0.5 * math.fabs(z) 1344 if y >= (Z_MAX*0.5): 1345 x = 1.0 1346 elif (y < 1.0): 1347 w = y*y 1348 x = ((((((((0.000124818987 * w 1349 - 0.001075204047) * w + 0.005198775019) * w 1350 - 0.019198292004) * w + 0.059054035642) * w 1351 - 0.151968751364) * w + 0.319152932694) * w 1352 - 0.531923007300) * w + 0.797884560593) * y * 2.0 1353 else: 1354 y = y - 2.0 1355 x = (((((((((((((-0.000045255659 * y 1356 + 0.000152529290) * y - 0.000019538132) * y 1357 - 0.000676904986) * y + 0.001390604284) * y 1358 - 0.000794620820) * y - 0.002034254874) * y 1359 + 0.006549791214) * y - 0.010557625006) * y 1360 + 0.011630447319) * y - 0.009279453341) * y 1361 + 0.005353579108) * y - 0.002141268741) * y 1362 + 0.000535310849) * y + 0.999936657524 1363 if z > 0.0: 1364 prob = ((x+1.0)*0.5) 1365 else: 1366 prob = ((1.0-x)*0.5) 1367 return prob 1368 1369 1370def lksprob(alam): 1371 """ 1372Computes a Kolmolgorov-Smirnov t-test significance level. Adapted from 1373Numerical Recipies. 1374 1375Usage: lksprob(alam) 1376""" 1377 fac = 2.0 1378 sum = 0.0 1379 termbf = 0.0 1380 a2 = -2.0*alam*alam 1381 for j in range(1, 201): 1382 term = fac*math.exp(a2*j*j) 1383 sum = sum + term 1384 if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum): 1385 return sum 1386 fac = -fac 1387 termbf = math.fabs(term) 1388 return 1.0 # Get here only if fails to converge; was 0.0!! 1389 1390 1391def lfprob(dfnum, dfden, F): 1392 """ 1393Returns the (1-tailed) significance level (p-value) of an F 1394statistic given the degrees of freedom for the numerator (dfR-dfF) and 1395the degrees of freedom for the denominator (dfF). 1396 1397Usage: lfprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn 1398""" 1399 p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) 1400 return p 1401 1402 1403def lbetacf(a, b, x): 1404 """ 1405This function evaluates the continued fraction form of the incomplete 1406Beta function, betai. (Adapted from: Numerical Recipies in C.) 1407 1408Usage: lbetacf(a,b,x) 1409""" 1410 ITMAX = 200 1411 EPS = 3.0e-7 1412 1413 bm = az = am = 1.0 1414 qab = a+b 1415 qap = a+1.0 1416 qam = a-1.0 1417 bz = 1.0-qab*x/qap 1418 for i in range(ITMAX+1): 1419 em = float(i+1) 1420 tem = em + em 1421 d = em*(b-em)*x/((qam+tem)*(a+tem)) 1422 ap = az + d*am 1423 bp = bz+d*bm 1424 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) 1425 app = ap+d*az 1426 bpp = bp+d*bz 1427 aold = az 1428 am = ap/bpp 1429 bm = bp/bpp 1430 az = app/bpp 1431 bz = 1.0 1432 if (abs(az-aold) < (EPS*abs(az))): 1433 return az 1434 print('a or b too big, or ITMAX too small in Betacf.') 1435 1436 1437def lgammln(xx): 1438 """ 1439Returns the gamma function of xx. 1440 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt. 1441(Adapted from: Numerical Recipies in C.) 1442 1443Usage: lgammln(xx) 1444""" 1445 1446 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, 1447 0.120858003e-2, -0.536382e-5] 1448 x = xx - 1.0 1449 tmp = x + 5.5 1450 tmp = tmp - (x+0.5)*math.log(tmp) 1451 ser = 1.0 1452 for j in range(len(coeff)): 1453 x = x + 1 1454 ser = ser + coeff[j]/x 1455 return -tmp + math.log(2.50662827465*ser) 1456 1457 1458def lbetai(a, b, x): 1459 """ 1460Returns the incomplete beta function: 1461 1462 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) 1463 1464where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma 1465function of a. The continued fraction formulation is implemented here, 1466using the betacf function. (Adapted from: Numerical Recipies in C.) 1467 1468Usage: lbetai(a,b,x) 1469""" 1470 if (x < 0.0 or x > 1.0): 1471 raise ValueError('Bad x in lbetai') 1472 if (x == 0.0 or x == 1.0): 1473 bt = 0.0 1474 else: 1475 bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b 1476 * math.log(1.0-x)) 1477 if (x < (a+1.0)/(a+b+2.0)): 1478 return bt*betacf(a, b, x)/float(a) 1479 else: 1480 return 1.0-bt*betacf(b, a, 1.0-x)/float(b) 1481 1482 1483# ANOVA CALCULATIONS 1484 1485def lF_oneway(*lists): 1486 """ 1487Performs a 1-way ANOVA, returning an F-value and probability given 1488any number of groups. From Heiman, pp.394-7. 1489 1490Usage: F_oneway(*lists) where *lists is any number of lists, one per 1491 treatment group 1492Returns: F value, one-tailed p-value 1493""" 1494 a = len(lists) # ANOVA on 'a' groups, each in it's own list 1495 alldata = [] 1496 for i in range(len(lists)): 1497 alldata = alldata + lists[i] 1498 alldata = N.array(alldata) 1499 bign = len(alldata) 1500 sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) 1501 ssbn = 0 1502 for list in lists: 1503 ssbn = ssbn + asquare_of_sums(N.array(list))/float(len(list)) 1504 ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) 1505 sswn = sstot-ssbn 1506 dfbn = a-1 1507 dfwn = bign - a 1508 msb = ssbn/float(dfbn) 1509 msw = sswn/float(dfwn) 1510 f = msb/msw 1511 prob = fprob(dfbn, dfwn, f) 1512 return f, prob 1513 1514 1515def lF_value(ER, EF, dfnum, dfden): 1516 """ 1517Returns an F-statistic given the following: 1518 ER = error associated with the null hypothesis (the Restricted model) 1519 EF = error associated with the alternate hypothesis (the Full model) 1520 dfR-dfF = degrees of freedom of the numerator 1521 dfF = degrees of freedom associated with the denominator/Full model 1522 1523Usage: lF_value(ER,EF,dfnum,dfden) 1524""" 1525 return ((ER-EF)/float(dfnum) / (EF/float(dfden))) 1526 1527 1528# SUPPORT FUNCTIONS 1529 1530def writecc(listoflists, file, writetype='w', extra=2): 1531 """ 1532Writes a list of lists to a file in columns, customized by the max 1533size of items within the columns (max size of items in col, +2 characters) 1534to specified file. File-overwrite is the default. 1535 1536Usage: writecc (listoflists,file,writetype='w',extra=2) 1537Returns: None 1538""" 1539 if type(listoflists[0]) not in [list, tuple]: 1540 listoflists = [listoflists] 1541 outfile = open(file, writetype) 1542 rowstokill = [] 1543 list2print = copy.deepcopy(listoflists) 1544 for i in range(len(listoflists)): 1545 if listoflists[i] == ['\n'] or listoflists[i] == '\n' or listoflists[i] == 'dashes': 1546 rowstokill = rowstokill + [i] 1547 rowstokill.reverse() 1548 for row in rowstokill: 1549 del list2print[row] 1550 maxsize = [0]*len(list2print[0]) 1551 for col in range(len(list2print[0])): 1552 items = pstat.colex(list2print, col) 1553 items = [pstat.makestr(_) for _ in items] 1554 maxsize[col] = max(map(len, items)) + extra 1555 for row in listoflists: 1556 if row == ['\n'] or row == '\n': 1557 outfile.write('\n') 1558 elif row == ['dashes'] or row == 'dashes': 1559 dashes = [0]*len(maxsize) 1560 for j in range(len(maxsize)): 1561 dashes[j] = '-'*(maxsize[j]-2) 1562 outfile.write(pstat.lineincustcols(dashes, maxsize)) 1563 else: 1564 outfile.write(pstat.lineincustcols(row, maxsize)) 1565 outfile.write('\n') 1566 outfile.close() 1567 return None 1568 1569 1570def lincr(l, cap): # to increment a list up to a max-list of 'cap' 1571 """ 1572Simulate a counting system from an n-dimensional list. 1573 1574Usage: lincr(l,cap) l=list to increment, cap=max values for each list pos'n 1575Returns: next set of values for list l, OR -1 (if overflow) 1576""" 1577 l[0] = l[0] + 1 # e.g., [0,0,0] --> [2,4,3] (=cap) 1578 for i in range(len(l)): 1579 if l[i] > cap[i] and i < len(l)-1: # if carryover AND not done 1580 l[i] = 0 1581 l[i+1] = l[i+1] + 1 1582 elif l[i] > cap[i] and i == len(l)-1: # overflow past last column, must be finished 1583 l = -1 1584 return l 1585 1586 1587def lsum(inlist): 1588 """ 1589Returns the sum of the items in the passed list. 1590 1591Usage: lsum(inlist) 1592""" 1593 s = 0 1594 for item in inlist: 1595 s = s + item 1596 return s 1597 1598 1599def lcumsum(inlist): 1600 """ 1601Returns a list consisting of the cumulative sum of the items in the 1602passed list. 1603 1604Usage: lcumsum(inlist) 1605""" 1606 newlist = copy.deepcopy(inlist) 1607 for i in range(1, len(newlist)): 1608 newlist[i] = newlist[i] + newlist[i-1] 1609 return newlist 1610 1611 1612def lss(inlist): 1613 """ 1614Squares each value in the passed list, adds up these squares and 1615returns the result. 1616 1617Usage: lss(inlist) 1618""" 1619 ss = 0 1620 for item in inlist: 1621 ss = ss + item*item 1622 return ss 1623 1624 1625def lsummult(list1, list2): 1626 """ 1627Multiplies elements in list1 and list2, element by element, and 1628returns the sum of all resulting multiplications. Must provide equal 1629length lists. 1630 1631Usage: lsummult(list1,list2) 1632""" 1633 if len(list1) != len(list2): 1634 raise ValueError("Lists not equal length in summult.") 1635 s = 0 1636 for item1, item2 in pstat.abut(list1, list2): 1637 s = s + item1*item2 1638 return s 1639 1640 1641def lsumdiffsquared(x, y): 1642 """ 1643Takes pairwise differences of the values in lists x and y, squares 1644these differences, and returns the sum of these squares. 1645 1646Usage: lsumdiffsquared(x,y) 1647Returns: sum[(x[i]-y[i])**2] 1648""" 1649 sds = 0 1650 for i in range(len(x)): 1651 sds = sds + (x[i]-y[i])**2 1652 return sds 1653 1654 1655def lsquare_of_sums(inlist): 1656 """ 1657Adds the values in the passed list, squares the sum, and returns 1658the result. 1659 1660Usage: lsquare_of_sums(inlist) 1661Returns: sum(inlist[i])**2 1662""" 1663 s = sum(inlist) 1664 return float(s)*s 1665 1666 1667def lshellsort(inlist): 1668 """ 1669Shellsort algorithm. Sorts a 1D-list. 1670 1671Usage: lshellsort(inlist) 1672Returns: sorted-inlist, sorting-index-vector (for original list) 1673""" 1674 n = len(inlist) 1675 svec = copy.deepcopy(inlist) 1676 ivec = list(range(n)) 1677 gap = n/2 # integer division needed 1678 while gap > 0: 1679 for i in range(gap, n): 1680 for j in range(i-gap, -1, -gap): 1681 while j >= 0 and svec[j] > svec[j+gap]: 1682 temp = svec[j] 1683 svec[j] = svec[j+gap] 1684 svec[j+gap] = temp 1685 itemp = ivec[j] 1686 ivec[j] = ivec[j+gap] 1687 ivec[j+gap] = itemp 1688 gap = gap / 2 # integer division needed 1689# svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]] 1690 return svec, ivec 1691 1692 1693def lrankdata(inlist): 1694 """ 1695Ranks the data in inlist, dealing with ties appropritely. Assumes 1696a 1D inlist. Adapted from Gary Perlman's |Stat ranksort. 1697 1698Usage: lrankdata(inlist) 1699Returns: a list of length equal to inlist, containing rank scores 1700""" 1701 n = len(inlist) 1702 svec, ivec = shellsort(inlist) 1703 sumranks = 0 1704 dupcount = 0 1705 newlist = [0]*n 1706 for i in range(n): 1707 sumranks = sumranks + i 1708 dupcount = dupcount + 1 1709 if i == n-1 or svec[i] != svec[i+1]: 1710 averank = sumranks / float(dupcount) + 1 1711 for j in range(i-dupcount+1, i+1): 1712 newlist[ivec[j]] = averank 1713 sumranks = 0 1714 dupcount = 0 1715 return newlist 1716 1717 1718def outputpairedstats(fname, writemode, name1, n1, m1, se1, min1, max1, name2, n2, m2, se2, min2, max2, statname, stat, prob): 1719 """ 1720Prints or write to a file stats for two groups, using the name, n, 1721mean, sterr, min and max for each group, as well as the statistic name, 1722its value, and the associated p-value. 1723 1724Usage: outputpairedstats(fname,writemode, 1725 name1,n1,mean1,stderr1,min1,max1, 1726 name2,n2,mean2,stderr2,min2,max2, 1727 statname,stat,prob) 1728Returns: None 1729""" 1730 suffix = '' # for *s after the p-value 1731 try: 1732 prob.shape 1733 prob = prob[0] 1734 except Exception: 1735 pass 1736 if prob < 0.001: 1737 suffix = ' ***' 1738 elif prob < 0.01: 1739 suffix = ' **' 1740 elif prob < 0.05: 1741 suffix = ' *' 1742 title = [['Name', 'N', 'Mean', 'SD', 'Min', 'Max']] 1743 lofl = title+[[name1, n1, round(m1, 3), round(math.sqrt(se1), 3), min1, max1], 1744 [name2, n2, round(m2, 3), round(math.sqrt(se2), 3), min2, max2]] 1745 if not isinstance(fname, str) or len(fname) == 0: 1746 print() 1747 print(statname) 1748 print() 1749 pstat.printcc(lofl) 1750 print() 1751 try: 1752 if stat.shape == (): 1753 stat = stat[0] 1754 if prob.shape == (): 1755 prob = prob[0] 1756 except Exception: 1757 pass 1758 print('Test statistic = ', round(stat, 3), ' p = ', round(prob, 3), suffix) 1759 print() 1760 else: 1761 file = open(fname, writemode) 1762 file.write('\n'+statname+'\n\n') 1763 file.close() 1764 writecc(lofl, fname, 'a') 1765 file = open(fname, 'a') 1766 try: 1767 if stat.shape == (): 1768 stat = stat[0] 1769 if prob.shape == (): 1770 prob = prob[0] 1771 except Exception: 1772 pass 1773 file.write(pstat.list2string(['\nTest statistic = ', round(stat, 4), ' p = ', round(prob, 4), suffix, '\n\n'])) 1774 file.close() 1775 return None 1776 1777 1778def lfindwithin(data): 1779 """ 1780Returns an integer representing a binary vector, where 1=within- 1781subject factor, 0=between. Input equals the entire data 2D list (i.e., 1782column 0=random factor, column -1=measured values (those two are skipped). 1783Note: input data is in |Stat format ... a list of lists ("2D list") with 1784one row per measured value, first column=subject identifier, last column= 1785score, one in-between column per factor (these columns contain level 1786designations on each factor). See also stats.anova.__doc__. 1787 1788Usage: lfindwithin(data) data in |Stat format 1789""" 1790 1791 numfact = len(data[0])-1 1792 withinvec = 0 1793 for col in range(1, numfact): 1794 examplelevel = pstat.unique(pstat.colex(data, col))[0] 1795 rows = pstat.linexand(data, col, examplelevel) # get 1 level of this factor 1796 factsubjs = pstat.unique(pstat.colex(rows, 0)) 1797 allsubjs = pstat.unique(pstat.colex(data, 0)) 1798 if len(factsubjs) == len(allsubjs): # fewer Ss than scores on this factor? 1799 withinvec = withinvec + (1 << col) 1800 return withinvec 1801 1802 1803# DISPATCH LISTS AND TUPLES TO ABOVE FCNS 1804 1805# CENTRAL TENDENCY: 1806geometricmean = Dispatch((lgeometricmean, (list, tuple)), ) 1807harmonicmean = Dispatch((lharmonicmean, (list, tuple)), ) 1808mean = Dispatch((lmean, (list, tuple)), ) 1809median = Dispatch((lmedian, (list, tuple)), ) 1810medianscore = Dispatch((lmedianscore, (list, tuple)), ) 1811mode = Dispatch((lmode, (list, tuple)), ) 1812 1813# MOMENTS: 1814moment = Dispatch((lmoment, (list, tuple)), ) 1815variation = Dispatch((lvariation, (list, tuple)), ) 1816skew = Dispatch((lskew, (list, tuple)), ) 1817kurtosis = Dispatch((lkurtosis, (list, tuple)), ) 1818describe = Dispatch((ldescribe, (list, tuple)), ) 1819 1820# FREQUENCY STATISTICS: 1821itemfreq = Dispatch((litemfreq, (list, tuple)), ) 1822scoreatpercentile = Dispatch((lscoreatpercentile, (list, tuple)), ) 1823percentileofscore = Dispatch((lpercentileofscore, (list, tuple)), ) 1824histogram = Dispatch((lhistogram, (list, tuple)), ) 1825cumfreq = Dispatch((lcumfreq, (list, tuple)), ) 1826relfreq = Dispatch((lrelfreq, (list, tuple)), ) 1827 1828# VARIABILITY: 1829obrientransform = Dispatch((lobrientransform, (list, tuple)), ) 1830samplevar = Dispatch((lsamplevar, (list, tuple)), ) 1831samplestdev = Dispatch((lsamplestdev, (list, tuple)), ) 1832var = Dispatch((lvar, (list, tuple)), ) 1833stdev = Dispatch((lstdev, (list, tuple)), ) 1834sterr = Dispatch((lsterr, (list, tuple)), ) 1835sem = Dispatch((lsem, (list, tuple)), ) 1836z = Dispatch((lz, (list, tuple)), ) 1837zs = Dispatch((lzs, (list, tuple)), ) 1838 1839# TRIMMING FCNS: 1840trimboth = Dispatch((ltrimboth, (list, tuple)), ) 1841trim1 = Dispatch((ltrim1, (list, tuple)), ) 1842 1843# CORRELATION FCNS: 1844paired = Dispatch((lpaired, (list, tuple)), ) 1845pearsonr = Dispatch((lpearsonr, (list, tuple)), ) 1846spearmanr = Dispatch((lspearmanr, (list, tuple)), ) 1847pointbiserialr = Dispatch((lpointbiserialr, (list, tuple)), ) 1848kendalltau = Dispatch((lkendalltau, (list, tuple)), ) 1849linregress = Dispatch((llinregress, (list, tuple)), ) 1850 1851# INFERENTIAL STATS: 1852ttest_1samp = Dispatch((lttest_1samp, (list, tuple)), ) 1853ttest_ind = Dispatch((lttest_ind, (list, tuple)), ) 1854ttest_rel = Dispatch((lttest_rel, (list, tuple)), ) 1855chisquare = Dispatch((lchisquare, (list, tuple)), ) 1856ks_2samp = Dispatch((lks_2samp, (list, tuple)), ) 1857mannwhitneyu = Dispatch((lmannwhitneyu, (list, tuple)), ) 1858ranksums = Dispatch((lranksums, (list, tuple)), ) 1859tiecorrect = Dispatch((ltiecorrect, (list, tuple)), ) 1860wilcoxont = Dispatch((lwilcoxont, (list, tuple)), ) 1861kruskalwallish = Dispatch((lkruskalwallish, (list, tuple)), ) 1862friedmanchisquare = Dispatch((lfriedmanchisquare, (list, tuple)), ) 1863 1864# PROBABILITY CALCS: 1865chisqprob = Dispatch((lchisqprob, (int, float)), ) 1866zprob = Dispatch((lzprob, (int, float)), ) 1867ksprob = Dispatch((lksprob, (int, float)), ) 1868fprob = Dispatch((lfprob, (int, float)), ) 1869betacf = Dispatch((lbetacf, (int, float)), ) 1870betai = Dispatch((lbetai, (int, float)), ) 1871erfcc = Dispatch((lerfcc, (int, float)), ) 1872gammln = Dispatch((lgammln, (int, float)), ) 1873 1874# ANOVA FUNCTIONS: 1875F_oneway = Dispatch((lF_oneway, (list, tuple)), ) 1876F_value = Dispatch((lF_value, (list, tuple)), ) 1877 1878# SUPPORT FUNCTIONS: 1879incr = Dispatch((lincr, (list, tuple)), ) 1880sum = Dispatch((lsum, (list, tuple)), ) 1881cumsum = Dispatch((lcumsum, (list, tuple)), ) 1882ss = Dispatch((lss, (list, tuple)), ) 1883summult = Dispatch((lsummult, (list, tuple)), ) 1884square_of_sums = Dispatch((lsquare_of_sums, (list, tuple)), ) 1885sumdiffsquared = Dispatch((lsumdiffsquared, (list, tuple)), ) 1886shellsort = Dispatch((lshellsort, (list, tuple)), ) 1887rankdata = Dispatch((lrankdata, (list, tuple)), ) 1888findwithin = Dispatch((lfindwithin, (list, tuple)), ) 1889 1890 1891# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1892# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1893# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1894# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1895# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1896# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1897# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1898# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1899# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1900# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1901# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1902# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1903# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1904# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1905# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1906# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1907# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1908# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1909# ============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1910 1911try: # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE 1912 import Numeric 1913 N = Numeric 1914 import LinearAlgebra 1915 LA = LinearAlgebra 1916 1917# ACENTRAL TENDENCY 1918 1919 def ageometricmean(inarray, dimension=None, keepdims=0): 1920 """ 1921 Calculates the geometric mean of the values in the passed array. 1922 That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in 1923 the passed array. Use dimension=None to flatten array first. REMEMBER: if 1924 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 1925 if dimension is a sequence, it collapses over all specified dimensions. If 1926 keepdims is set to 1, the resulting array will have as many dimensions as 1927 inarray, with only 1 'level' per dim that was collapsed over. 1928 1929 Usage: ageometricmean(inarray,dimension=None,keepdims=0) 1930 Returns: geometric mean computed over dim(s) listed in dimension 1931 """ 1932 inarray = N.array(inarray, N.Float) 1933 if dimension is None: 1934 inarray = N.ravel(inarray) 1935 size = len(inarray) 1936 mult = N.power(inarray, 1.0/size) 1937 mult = N.multiply.reduce(mult) 1938 elif type(dimension) in [int, float]: 1939 size = inarray.shape[dimension] 1940 mult = N.power(inarray, 1.0/size) 1941 mult = N.multiply.reduce(mult, dimension) 1942 if keepdims == 1: 1943 shp = list(inarray.shape) 1944 shp[dimension] = 1 1945 N.reshape(sum, shp) 1946 else: # must be a SEQUENCE of dims to average over 1947 dims = sorted(dimension) 1948 dims.reverse() 1949 size = N.array(N.multiply.reduce(N.take(inarray.shape, dims)), N.Float) 1950 mult = N.power(inarray, 1.0/size) 1951 for dim in dims: 1952 mult = N.multiply.reduce(mult, dim) 1953 if keepdims == 1: 1954 shp = list(inarray.shape) 1955 for dim in dims: 1956 shp[dim] = 1 1957 mult = N.reshape(mult, shp) 1958 return mult 1959 1960 def aharmonicmean(inarray, dimension=None, keepdims=0): 1961 """ 1962 Calculates the harmonic mean of the values in the passed array. 1963 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in 1964 the passed array. Use dimension=None to flatten array first. REMEMBER: if 1965 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 1966 if dimension is a sequence, it collapses over all specified dimensions. If 1967 keepdims is set to 1, the resulting array will have as many dimensions as 1968 inarray, with only 1 'level' per dim that was collapsed over. 1969 1970 Usage: aharmonicmean(inarray,dimension=None,keepdims=0) 1971 Returns: harmonic mean computed over dim(s) in dimension 1972 """ 1973 inarray = inarray.astype(N.Float) 1974 if dimension is None: 1975 inarray = N.ravel(inarray) 1976 size = len(inarray) 1977 s = N.add.reduce(1.0 / inarray) 1978 elif type(dimension) in [int, float]: 1979 size = float(inarray.shape[dimension]) 1980 s = N.add.reduce(1.0/inarray, dimension) 1981 if keepdims == 1: 1982 shp = list(inarray.shape) 1983 shp[dimension] = 1 1984 s = N.reshape(s, shp) 1985 else: # must be a SEQUENCE of dims to average over 1986 dims = sorted(dimension) 1987 nondims = [] 1988 for i in range(len(inarray.shape)): 1989 if i not in dims: 1990 nondims.append(i) 1991 tinarray = N.transpose(inarray, nondims+dims) # put keep-dims first 1992 idx = [0] * len(nondims) 1993 if idx == []: 1994 size = len(N.ravel(inarray)) 1995 s = asum(1.0 / inarray) 1996 if keepdims == 1: 1997 s = N.reshape([s], N.ones(len(inarray.shape))) 1998 else: 1999 idx[0] = -1 2000 loopcap = N.array(tinarray.shape[0:len(nondims)]) - 1 2001 s = N.zeros(loopcap+1, N.Float) 2002 while incr(idx, loopcap) != -1: 2003 s[idx] = asum(1.0/tinarray[idx]) 2004 size = N.multiply.reduce(N.take(inarray.shape, dims)) 2005 if keepdims == 1: 2006 shp = list(inarray.shape) 2007 for dim in dims: 2008 shp[dim] = 1 2009 s = N.reshape(s, shp) 2010 return size / s 2011 2012 def amean(inarray, dimension=None, keepdims=0): 2013 """ 2014 Calculates the arithmatic mean of the values in the passed array. 2015 That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the 2016 passed array. Use dimension=None to flatten array first. REMEMBER: if 2017 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 2018 if dimension is a sequence, it collapses over all specified dimensions. If 2019 keepdims is set to 1, the resulting array will have as many dimensions as 2020 inarray, with only 1 'level' per dim that was collapsed over. 2021 2022 Usage: amean(inarray,dimension=None,keepdims=0) 2023 Returns: arithematic mean calculated over dim(s) in dimension 2024 """ 2025 if inarray.typecode() in ['l', 's', 'b']: 2026 inarray = inarray.astype(N.Float) 2027 if dimension is None: 2028 inarray = N.ravel(inarray) 2029 sum = N.add.reduce(inarray) 2030 denom = float(len(inarray)) 2031 elif type(dimension) in [int, float]: 2032 sum = asum(inarray, dimension) 2033 denom = float(inarray.shape[dimension]) 2034 if keepdims == 1: 2035 shp = list(inarray.shape) 2036 shp[dimension] = 1 2037 sum = N.reshape(sum, shp) 2038 else: # must be a TUPLE of dims to average over 2039 dims = sorted(dimension) 2040 dims.reverse() 2041 sum = inarray * 1.0 2042 for dim in dims: 2043 sum = N.add.reduce(sum, dim) 2044 denom = N.array(N.multiply.reduce(N.take(inarray.shape, dims)), N.Float) 2045 if keepdims == 1: 2046 shp = list(inarray.shape) 2047 for dim in dims: 2048 shp[dim] = 1 2049 sum = N.reshape(sum, shp) 2050 return sum/denom 2051 2052 def amedian(inarray, numbins=1000): 2053 """ 2054 Calculates the COMPUTED median value of an array of numbers, given the 2055 number of bins to use for the histogram (more bins approaches finding the 2056 precise median value of the array; default number of bins = 1000). From 2057 G.W. Heiman's Basic Stats, or CRC Probability & Statistics. 2058 NOTE: THIS ROUTINE ALWAYS uses the entire passed array (flattens it first). 2059 2060 Usage: amedian(inarray,numbins=1000) 2061 Returns: median calculated over ALL values in inarray 2062 """ 2063 inarray = N.ravel(inarray) 2064 (hist, smallest, binsize, extras) = ahistogram(inarray, numbins) 2065 cumhist = N.cumsum(hist) # make cumulative histogram 2066 otherbins = N.greater_equal(cumhist, len(inarray)/2.0) 2067 otherbins = list(otherbins) # list of 0/1s, 1s start at median bin 2068 cfbin = otherbins.index(1) # get 1st(!) index holding 50%ile score 2069 LRL = smallest + binsize*cfbin # get lower read limit of that bin 2070 cfbelow = N.add.reduce(hist[0:cfbin]) # cum. freq. below bin 2071 freq = hist[cfbin] # frequency IN the 50%ile bin 2072 median = LRL + ((len(inarray)/2.0-cfbelow)/float(freq))*binsize # MEDIAN 2073 return median 2074 2075 def amedianscore(inarray, dimension=None): 2076 """ 2077 Returns the 'middle' score of the passed array. If there is an even 2078 number of scores, the mean of the 2 middle scores is returned. Can function 2079 with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can 2080 be None, to pre-flatten the array, or else dimension must equal 0). 2081 2082 Usage: amedianscore(inarray,dimension=None) 2083 Returns: 'middle' score of the array, or the mean of the 2 middle scores 2084 """ 2085 if dimension is None: 2086 inarray = N.ravel(inarray) 2087 dimension = 0 2088 inarray = N.sort(inarray, dimension) 2089 if inarray.shape[dimension] % 2 == 0: # if even number of elements 2090 indx = inarray.shape[dimension]/2 # integer division correct 2091 median = N.asarray(inarray[indx]+inarray[indx-1]) / 2.0 2092 else: 2093 indx = inarray.shape[dimension] / 2 # integer division correct 2094 median = N.take(inarray, [indx], dimension) 2095 if median.shape == (1,): 2096 median = median[0] 2097 return median 2098 2099 def amode(a, dimension=None): 2100 """ 2101 Returns an array of the modal (most common) score in the passed array. 2102 If there is more than one such score, ONLY THE FIRST is returned. 2103 The bin-count for the modal values is also returned. Operates on whole 2104 array (dimension=None), or on a given dimension. 2105 2106 Usage: amode(a, dimension=None) 2107 Returns: array of bin-counts for mode(s), array of corresponding modal values 2108 """ 2109 2110 if dimension is None: 2111 a = N.ravel(a) 2112 dimension = 0 2113 scores = pstat.aunique(N.ravel(a)) # get ALL unique values 2114 testshape = list(a.shape) 2115 testshape[dimension] = 1 2116 oldmostfreq = N.zeros(testshape) 2117 oldcounts = N.zeros(testshape) 2118 for score in scores: 2119 template = N.equal(a, score) 2120 counts = asum(template, dimension, 1) 2121 mostfrequent = N.where(N.greater(counts, oldcounts), score, oldmostfreq) 2122 oldcounts = N.where(N.greater(counts, oldcounts), counts, oldcounts) 2123 oldmostfreq = mostfrequent 2124 return oldcounts, mostfrequent 2125 2126 def atmean(a, limits=None, inclusive=(1, 1)): 2127 """ 2128 Returns the arithmetic mean of all values in an array, ignoring values 2129 strictly outside the sequence passed to 'limits'. Note: either limit 2130 in the sequence, or the value of limits itself, can be set to None. The 2131 inclusive list/tuple determines whether the lower and upper limiting bounds 2132 (respectively) are open/exclusive (0) or closed/inclusive (1). 2133 2134 Usage: atmean(a,limits=None,inclusive=(1,1)) 2135 """ 2136 if a.typecode() in ['l', 's', 'b']: 2137 a = a.astype(N.Float) 2138 if limits is None: 2139 return mean(a) 2140 assert type(limits) in [list, tuple, N.ArrayType], "Wrong type for limits in atmean" 2141 if inclusive[0]: 2142 lowerfcn = N.greater_equal 2143 else: 2144 lowerfcn = N.greater 2145 if inclusive[1]: 2146 upperfcn = N.less_equal 2147 else: 2148 upperfcn = N.less 2149 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2150 raise ValueError("No array values within given limits (atmean).") 2151 elif limits[0] is None and limits[1] is not None: 2152 mask = upperfcn(a, limits[1]) 2153 elif limits[0] is not None and limits[1] is None: 2154 mask = lowerfcn(a, limits[0]) 2155 elif limits[0] is not None and limits[1] is not None: 2156 mask = lowerfcn(a, limits[0])*upperfcn(a, limits[1]) 2157 s = float(N.add.reduce(N.ravel(a*mask))) 2158 n = float(N.add.reduce(N.ravel(mask))) 2159 return s/n 2160 2161 def atvar(a, limits=None, inclusive=(1, 1)): 2162 """ 2163 Returns the sample variance of values in an array, (i.e., using N-1), 2164 ignoring values strictly outside the sequence passed to 'limits'. 2165 Note: either limit in the sequence, or the value of limits itself, 2166 can be set to None. The inclusive list/tuple determines whether the lower 2167 and upper limiting bounds (respectively) are open/exclusive (0) or 2168 closed/inclusive (1). 2169 2170 Usage: atvar(a,limits=None,inclusive=(1,1)) 2171 """ 2172 a = a.astype(N.Float) 2173 if limits is None or limits == [None, None]: 2174 term1 = N.add.reduce(N.ravel(a*a)) 2175 n = float(len(N.ravel(a))) - 1 2176 term2 = N.add.reduce(N.ravel(a))**2 / n 2177 print(term1, term2, n) 2178 return (term1 - term2) / n 2179 assert type(limits) in [list, tuple, N.ArrayType], "Wrong type for limits in atvar" 2180 if inclusive[0]: 2181 lowerfcn = N.greater_equal 2182 else: 2183 lowerfcn = N.greater 2184 if inclusive[1]: 2185 upperfcn = N.less_equal 2186 else: 2187 upperfcn = N.less 2188 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2189 raise ValueError("No array values within given limits (atvar).") 2190 elif limits[0] is None and limits[1] is not None: 2191 mask = upperfcn(a, limits[1]) 2192 elif limits[0] is not None and limits[1] is None: 2193 mask = lowerfcn(a, limits[0]) 2194 elif limits[0] is not None and limits[1] is not None: 2195 mask = lowerfcn(a, limits[0])*upperfcn(a, limits[1]) 2196 term1 = N.add.reduce(N.ravel(a*a*mask)) 2197 n = float(N.add.reduce(N.ravel(mask))) - 1 2198 term2 = N.add.reduce(N.ravel(a*mask))**2 / n 2199 print(term1, term2, n) 2200 return (term1 - term2) / n 2201 2202 def atmin(a, lowerlimit=None, dimension=None, inclusive=1): 2203 """ 2204 Returns the minimum value of a, along dimension, including only values less 2205 than (or equal to, if inclusive=1) lowerlimit. If the limit is set to None, 2206 all values in the array are used. 2207 2208 Usage: atmin(a,lowerlimit=None,dimension=None,inclusive=1) 2209 """ 2210 if inclusive: 2211 lowerfcn = N.greater 2212 else: 2213 lowerfcn = N.greater_equal 2214 if dimension is None: 2215 a = N.ravel(a) 2216 dimension = 0 2217 if lowerlimit is None: 2218 lowerlimit = N.minimum.reduce(N.ravel(a))-11 2219 biggest = N.maximum.reduce(N.ravel(a)) 2220 ta = N.where(lowerfcn(a, lowerlimit), a, biggest) 2221 return N.minimum.reduce(ta, dimension) 2222 2223 def atmax(a, upperlimit, dimension=None, inclusive=1): 2224 """ 2225 Returns the maximum value of a, along dimension, including only values greater 2226 than (or equal to, if inclusive=1) upperlimit. If the limit is set to None, 2227 a limit larger than the max value in the array is used. 2228 2229 Usage: atmax(a,upperlimit,dimension=None,inclusive=1) 2230 """ 2231 if inclusive: 2232 upperfcn = N.less 2233 else: 2234 upperfcn = N.less_equal 2235 if dimension is None: 2236 a = N.ravel(a) 2237 dimension = 0 2238 if upperlimit is None: 2239 upperlimit = N.maximum.reduce(N.ravel(a))+1 2240 smallest = N.minimum.reduce(N.ravel(a)) 2241 ta = N.where(upperfcn(a, upperlimit), a, smallest) 2242 return N.maximum.reduce(ta, dimension) 2243 2244 def atstdev(a, limits=None, inclusive=(1, 1)): 2245 """ 2246 Returns the standard deviation of all values in an array, ignoring values 2247 strictly outside the sequence passed to 'limits'. Note: either limit 2248 in the sequence, or the value of limits itself, can be set to None. The 2249 inclusive list/tuple determines whether the lower and upper limiting bounds 2250 (respectively) are open/exclusive (0) or closed/inclusive (1). 2251 2252 Usage: atstdev(a,limits=None,inclusive=(1,1)) 2253 """ 2254 return N.sqrt(tvar(a, limits, inclusive)) 2255 2256 def atsem(a, limits=None, inclusive=(1, 1)): 2257 """ 2258 Returns the standard error of the mean for the values in an array, 2259 (i.e., using N for the denominator), ignoring values strictly outside 2260 the sequence passed to 'limits'. Note: either limit in the sequence, 2261 or the value of limits itself, can be set to None. The inclusive list/tuple 2262 determines whether the lower and upper limiting bounds (respectively) are 2263 open/exclusive (0) or closed/inclusive (1). 2264 2265 Usage: atsem(a,limits=None,inclusive=(1,1)) 2266 """ 2267 sd = tstdev(a, limits, inclusive) 2268 if limits is None or limits == [None, None]: 2269 n = float(len(N.ravel(a))) 2270 assert type(limits) in [list, tuple, N.ArrayType], "Wrong type for limits in atsem" 2271 if inclusive[0]: 2272 lowerfcn = N.greater_equal 2273 else: 2274 lowerfcn = N.greater 2275 if inclusive[1]: 2276 upperfcn = N.less_equal 2277 else: 2278 upperfcn = N.less 2279 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2280 raise ValueError("No array values within given limits (atsem).") 2281 elif limits[0] is None and limits[1] is not None: 2282 mask = upperfcn(a, limits[1]) 2283 elif limits[0] is not None and limits[1] is None: 2284 mask = lowerfcn(a, limits[0]) 2285 elif limits[0] is not None and limits[1] is not None: 2286 mask = lowerfcn(a, limits[0])*upperfcn(a, limits[1]) 2287 N.add.reduce(N.ravel(a*a*mask)) 2288 n = float(N.add.reduce(N.ravel(mask))) 2289 return sd/math.sqrt(n) 2290 2291# AMOMENTS 2292 2293 def amoment(a, moment=1, dimension=None): 2294 """ 2295 Calculates the nth moment about the mean for a sample (defaults to the 2296 1st moment). Generally used to calculate coefficients of skewness and 2297 kurtosis. Dimension can equal None (ravel array first), an integer 2298 (the dimension over which to operate), or a sequence (operate over 2299 multiple dimensions). 2300 2301 Usage: amoment(a,moment=1,dimension=None) 2302 Returns: appropriate moment along given dimension 2303 """ 2304 if dimension is None: 2305 a = N.ravel(a) 2306 dimension = 0 2307 if moment == 1: 2308 return 0.0 2309 else: 2310 mn = amean(a, dimension, 1) # 1=keepdims 2311 s = N.power((a-mn), moment) 2312 return amean(s, dimension) 2313 2314 def avariation(a, dimension=None): 2315 """ 2316 Returns the coefficient of variation, as defined in CRC Standard 2317 Probability and Statistics, p.6. Dimension can equal None (ravel array 2318 first), an integer (the dimension over which to operate), or a 2319 sequence (operate over multiple dimensions). 2320 2321 Usage: avariation(a,dimension=None) 2322 """ 2323 return 100.0*asamplestdev(a, dimension)/amean(a, dimension) 2324 2325 def askew(a, dimension=None): 2326 """ 2327 Returns the skewness of a distribution (normal ==> 0.0; >0 means extra 2328 weight in left tail). Use askewtest() to see if it's close enough. 2329 Dimension can equal None (ravel array first), an integer (the 2330 dimension over which to operate), or a sequence (operate over multiple 2331 dimensions). 2332 2333 Usage: askew(a, dimension=None) 2334 Returns: skew of vals in a along dimension, returning ZERO where all vals equal 2335 """ 2336 denom = N.power(amoment(a, 2, dimension), 1.5) 2337 zero = N.equal(denom, 0) 2338 if isinstance(denom, N.ArrayType) and asum(zero) != 0: 2339 print("Number of zeros in askew: ", asum(zero)) 2340 denom = denom + zero # prevent divide-by-zero 2341 return N.where(zero, 0, amoment(a, 3, dimension)/denom) 2342 2343 def akurtosis(a, dimension=None): 2344 """ 2345 Returns the kurtosis of a distribution (normal ==> 3.0; >3 means 2346 heavier in the tails, and usually more peaked). Use akurtosistest() 2347 to see if it's close enough. Dimension can equal None (ravel array 2348 first), an integer (the dimension over which to operate), or a 2349 sequence (operate over multiple dimensions). 2350 2351 Usage: akurtosis(a,dimension=None) 2352 Returns: kurtosis of values in a along dimension, and ZERO where all vals equal 2353 """ 2354 denom = N.power(amoment(a, 2, dimension), 2) 2355 zero = N.equal(denom, 0) 2356 if isinstance(denom, N.ArrayType) and asum(zero) != 0: 2357 print("Number of zeros in akurtosis: ", asum(zero)) 2358 denom = denom + zero # prevent divide-by-zero 2359 return N.where(zero, 0, amoment(a, 4, dimension)/denom) 2360 2361 def adescribe(inarray, dimension=None): 2362 """ 2363 Returns several descriptive statistics of the passed array. Dimension 2364 can equal None (ravel array first), an integer (the dimension over 2365 which to operate), or a sequence (operate over multiple dimensions). 2366 2367 Usage: adescribe(inarray,dimension=None) 2368 Returns: n, (min,max), mean, standard deviation, skew, kurtosis 2369 """ 2370 if dimension is None: 2371 inarray = N.ravel(inarray) 2372 dimension = 0 2373 n = inarray.shape[dimension] 2374 mm = (N.minimum.reduce(inarray), N.maximum.reduce(inarray)) 2375 m = amean(inarray, dimension) 2376 sd = astdev(inarray, dimension) 2377 skew = askew(inarray, dimension) 2378 kurt = akurtosis(inarray, dimension) 2379 return n, mm, m, sd, skew, kurt 2380 2381# NORMALITY TESTS 2382 2383 def askewtest(a, dimension=None): 2384 """ 2385 Tests whether the skew is significantly different from a normal 2386 distribution. Dimension can equal None (ravel array first), an 2387 integer (the dimension over which to operate), or a sequence (operate 2388 over multiple dimensions). 2389 2390 Usage: askewtest(a,dimension=None) 2391 Returns: z-score and 2-tail z-probability 2392 """ 2393 if dimension is None: 2394 a = N.ravel(a) 2395 dimension = 0 2396 b2 = askew(a, dimension) 2397 n = float(a.shape[dimension]) 2398 y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2))) 2399 beta2 = (3.0*(n*n+27*n-70)*(n+1)*(n+3)) / ((n-2.0)*(n+5)*(n+7)*(n+9)) 2400 W2 = -1 + N.sqrt(2*(beta2-1)) 2401 delta = 1/N.sqrt(N.log(N.sqrt(W2))) 2402 alpha = N.sqrt(2/(W2-1)) 2403 y = N.where(N.equal(y, 0), 1, y) 2404 Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1)) 2405 return Z, (1.0-zprob(Z))*2 2406 2407 def akurtosistest(a, dimension=None): 2408 """ 2409 Tests whether a dataset has normal kurtosis (i.e., 2410 kurtosis=3(n-1)/(n+1)) Valid only for n>20. Dimension can equal None 2411 (ravel array first), an integer (the dimension over which to operate), 2412 or a sequence (operate over multiple dimensions). 2413 2414 Usage: akurtosistest(a,dimension=None) 2415 Returns: z-score and 2-tail z-probability, returns 0 for bad pixels 2416 """ 2417 if dimension is None: 2418 a = N.ravel(a) 2419 dimension = 0 2420 n = float(a.shape[dimension]) 2421 if n < 20: 2422 print("akurtosistest only valid for n>=20 ... continuing anyway, n=", n) 2423 b2 = akurtosis(a, dimension) 2424 E = 3.0*(n-1) / (n+1) 2425 varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5)) 2426 x = (b2-E)/N.sqrt(varb2) 2427 sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5)) 2428 / (n*(n-2)*(n-3))) 2429 A = 6.0 + 8.0/sqrtbeta1 * (2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2))) 2430 term1 = 1 - 2/(9.0*A) 2431 denom = 1 + x*N.sqrt(2/(A-4.0)) 2432 denom = N.where(N.less(denom, 0), 99, denom) 2433 term2 = N.where(N.equal(denom, 0), term1, N.power((1-2.0/A)/denom, 1/3.0)) 2434 Z = (term1 - term2) / N.sqrt(2/(9.0*A)) 2435 Z = N.where(N.equal(denom, 99), 0, Z) 2436 return Z, (1.0-zprob(Z))*2 2437 2438 def anormaltest(a, dimension=None): 2439 """ 2440 Tests whether skew and/OR kurtosis of dataset differs from normal 2441 curve. Can operate over multiple dimensions. Dimension can equal 2442 None (ravel array first), an integer (the dimension over which to 2443 operate), or a sequence (operate over multiple dimensions). 2444 2445 Usage: anormaltest(a,dimension=None) 2446 Returns: z-score and 2-tail probability 2447 """ 2448 if dimension is None: 2449 a = N.ravel(a) 2450 dimension = 0 2451 s, p = askewtest(a, dimension) 2452 k, p = akurtosistest(a, dimension) 2453 k2 = N.power(s, 2) + N.power(k, 2) 2454 return k2, achisqprob(k2, 2) 2455 2456# AFREQUENCY FUNCTIONS 2457 2458 def aitemfreq(a): 2459 """ 2460 Returns a 2D array of item frequencies. Column 1 contains item values, 2461 column 2 contains their respective counts. Assumes a 1D array is passed. 2462 2463 Usage: aitemfreq(a) 2464 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies) 2465 """ 2466 scores = pstat.aunique(a) 2467 scores = N.sort(scores) 2468 freq = N.zeros(len(scores)) 2469 for i in range(len(scores)): 2470 freq[i] = N.add.reduce(N.equal(a, scores[i])) 2471 return N.array(pstat.aabut(scores, freq)) 2472 2473 def ascoreatpercentile(inarray, percent): 2474 """ 2475 Usage: ascoreatpercentile(inarray,percent) 0<percent<100 2476 Returns: score at given percentile, relative to inarray distribution 2477 """ 2478 percent = percent / 100.0 2479 targetcf = percent*len(inarray) 2480 h, lrl, binsize, extras = histogram(inarray) 2481 cumhist = cumsum(h*1) 2482 for i in range(len(cumhist)): 2483 if cumhist[i] >= targetcf: 2484 break 2485 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i) 2486 return score 2487 2488 def apercentileofscore(inarray, score, histbins=10, defaultlimits=None): 2489 """ 2490 Note: result of this function depends on the values used to histogram 2491 the data(!). 2492 2493 Usage: apercentileofscore(inarray,score,histbins=10,defaultlimits=None) 2494 Returns: percentile-position of score (0-100) relative to inarray 2495 """ 2496 h, lrl, binsize, extras = histogram(inarray, histbins, defaultlimits) 2497 cumhist = cumsum(h*1) 2498 i = int((score - lrl)/float(binsize)) 2499 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100 2500 return pct 2501 2502 def ahistogram(inarray, numbins=10, defaultlimits=None, printextras=1): 2503 """ 2504 Returns (i) an array of histogram bin counts, (ii) the smallest value 2505 of the histogram binning, and (iii) the bin width (the last 2 are not 2506 necessarily integers). Default number of bins is 10. Defaultlimits 2507 can be None (the routine picks bins spanning all the numbers in the 2508 inarray) or a 2-sequence (lowerlimit, upperlimit). Returns all of the 2509 following: array of bin values, lowerreallimit, binsize, extrapoints. 2510 2511 Usage: ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1) 2512 Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range) 2513 """ 2514 inarray = N.ravel(inarray) # flatten any >1D arrays 2515 if (defaultlimits is not None): 2516 lowerreallimit = defaultlimits[0] 2517 upperreallimit = defaultlimits[1] 2518 binsize = (upperreallimit-lowerreallimit) / float(numbins) 2519 else: 2520 Min = N.minimum.reduce(inarray) 2521 Max = N.maximum.reduce(inarray) 2522 estbinwidth = float(Max - Min)/float(numbins) + 1 2523 binsize = (Max-Min+estbinwidth)/float(numbins) 2524 lowerreallimit = Min - binsize/2.0 # lower real limit,1st bin 2525 bins = N.zeros(numbins) 2526 extrapoints = 0 2527 for num in inarray: 2528 try: 2529 if (num-lowerreallimit) < 0: 2530 extrapoints = extrapoints + 1 2531 else: 2532 bintoincrement = int((num-lowerreallimit) / float(binsize)) 2533 bins[bintoincrement] = bins[bintoincrement] + 1 2534 except Exception: # point outside lower/upper limits 2535 extrapoints = extrapoints + 1 2536 if (extrapoints > 0 and printextras == 1): 2537 print('\nPoints outside given histogram range =', extrapoints) 2538 return (bins, lowerreallimit, binsize, extrapoints) 2539 2540 def acumfreq(a, numbins=10, defaultreallimits=None): 2541 """ 2542 Returns a cumulative frequency histogram, using the histogram function. 2543 Defaultreallimits can be None (use all data), or a 2-sequence containing 2544 lower and upper limits on values to include. 2545 2546 Usage: acumfreq(a,numbins=10,defaultreallimits=None) 2547 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints 2548 """ 2549 h, l, b, e = histogram(a, numbins, defaultreallimits) 2550 cumhist = cumsum(h*1) 2551 return cumhist, l, b, e 2552 2553 def arelfreq(a, numbins=10, defaultreallimits=None): 2554 """ 2555 Returns a relative frequency histogram, using the histogram function. 2556 Defaultreallimits can be None (use all data), or a 2-sequence containing 2557 lower and upper limits on values to include. 2558 2559 Usage: arelfreq(a,numbins=10,defaultreallimits=None) 2560 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints 2561 """ 2562 h, l, b, e = histogram(a, numbins, defaultreallimits) 2563 h = N.array(h/float(a.shape[0])) 2564 return h, l, b, e 2565 2566# AVARIABILITY FUNCTIONS 2567 2568 def aobrientransform(*args): 2569 """ 2570 Computes a transform on input data (any number of columns). Used to 2571 test for homogeneity of variance prior to running one-way stats. Each 2572 array in *args is one level of a factor. If an F_oneway() run on the 2573 transformed data and found significant, variances are unequal. From 2574 Maxwell and Delaney, p.112. 2575 2576 Usage: aobrientransform(*args) *args = 1D arrays, one per level of factor 2577 Returns: transformed data for use in an ANOVA 2578 """ 2579 TINY = 1e-10 2580 k = len(args) 2581 n = N.zeros(k, N.Float) 2582 v = N.zeros(k, N.Float) 2583 m = N.zeros(k, N.Float) 2584 nargs = [] 2585 for i in range(k): 2586 nargs.append(args[i].astype(N.Float)) 2587 n[i] = float(len(nargs[i])) 2588 v[i] = var(nargs[i]) 2589 m[i] = mean(nargs[i]) 2590 for j in range(k): 2591 for i in range(n[j]): 2592 t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2 2593 t2 = 0.5*v[j]*(n[j]-1.0) 2594 t3 = (n[j]-1.0)*(n[j]-2.0) 2595 nargs[j][i] = (t1-t2) / float(t3) 2596 check = 1 2597 for j in range(k): 2598 if v[j] - mean(nargs[j]) > TINY: 2599 check = 0 2600 if check != 1: 2601 raise ValueError('Lack of convergence in obrientransform.') 2602 else: 2603 return N.array(nargs) 2604 2605 def asamplevar(inarray, dimension=None, keepdims=0): 2606 """ 2607 Returns the sample standard deviation of the values in the passed 2608 array (i.e., using N). Dimension can equal None (ravel array first), 2609 an integer (the dimension over which to operate), or a sequence 2610 (operate over multiple dimensions). Set keepdims=1 to return an array 2611 with the same number of dimensions as inarray. 2612 2613 Usage: asamplevar(inarray,dimension=None,keepdims=0) 2614 """ 2615 if dimension is None: 2616 inarray = N.ravel(inarray) 2617 dimension = 0 2618 if dimension == 1: 2619 mn = amean(inarray, dimension)[:, N.NewAxis] 2620 else: 2621 mn = amean(inarray, dimension, keepdims=1) 2622 deviations = inarray - mn 2623 if isinstance(dimension, list): 2624 n = 1 2625 for d in dimension: 2626 n = n*inarray.shape[d] 2627 else: 2628 n = inarray.shape[dimension] 2629 svar = ass(deviations, dimension, keepdims) / float(n) 2630 return svar 2631 2632 def asamplestdev(inarray, dimension=None, keepdims=0): 2633 """ 2634 Returns the sample standard deviation of the values in the passed 2635 array (i.e., using N). Dimension can equal None (ravel array first), 2636 an integer (the dimension over which to operate), or a sequence 2637 (operate over multiple dimensions). Set keepdims=1 to return an array 2638 with the same number of dimensions as inarray. 2639 2640 Usage: asamplestdev(inarray,dimension=None,keepdims=0) 2641 """ 2642 return N.sqrt(asamplevar(inarray, dimension, keepdims)) 2643 2644 def asignaltonoise(instack, dimension=0): 2645 """ 2646 Calculates signal-to-noise. Dimension can equal None (ravel array 2647 first), an integer (the dimension over which to operate), or a 2648 sequence (operate over multiple dimensions). 2649 2650 Usage: asignaltonoise(instack,dimension=0): 2651 Returns: array containing the value of (mean/stdev) along dimension, 2652 or 0 when stdev=0 2653 """ 2654 m = mean(instack, dimension) 2655 sd = stdev(instack, dimension) 2656 return N.where(N.equal(sd, 0), 0, m/sd) 2657 2658 def avar(inarray, dimension=None, keepdims=0): 2659 """ 2660 Returns the estimated population variance of the values in the passed 2661 array (i.e., N-1). Dimension can equal None (ravel array first), an 2662 integer (the dimension over which to operate), or a sequence (operate 2663 over multiple dimensions). Set keepdims=1 to return an array with the 2664 same number of dimensions as inarray. 2665 2666 Usage: avar(inarray,dimension=None,keepdims=0) 2667 """ 2668 if dimension is None: 2669 inarray = N.ravel(inarray) 2670 dimension = 0 2671 mn = amean(inarray, dimension, 1) 2672 deviations = inarray - mn 2673 if isinstance(dimension, list): 2674 n = 1 2675 for d in dimension: 2676 n = n*inarray.shape[d] 2677 else: 2678 n = inarray.shape[dimension] 2679 var = ass(deviations, dimension, keepdims)/float(n-1) 2680 return var 2681 2682 def astdev(inarray, dimension=None, keepdims=0): 2683 """ 2684 Returns the estimated population standard deviation of the values in 2685 the passed array (i.e., N-1). Dimension can equal None (ravel array 2686 first), an integer (the dimension over which to operate), or a 2687 sequence (operate over multiple dimensions). Set keepdims=1 to return 2688 an array with the same number of dimensions as inarray. 2689 2690 Usage: astdev(inarray,dimension=None,keepdims=0) 2691 """ 2692 return N.sqrt(avar(inarray, dimension, keepdims)) 2693 2694 def asterr(inarray, dimension=None, keepdims=0): 2695 """ 2696 Returns the estimated population standard error of the values in the 2697 passed array (i.e., N-1). Dimension can equal None (ravel array 2698 first), an integer (the dimension over which to operate), or a 2699 sequence (operate over multiple dimensions). Set keepdims=1 to return 2700 an array with the same number of dimensions as inarray. 2701 2702 Usage: asterr(inarray,dimension=None,keepdims=0) 2703 """ 2704 if dimension is None: 2705 inarray = N.ravel(inarray) 2706 dimension = 0 2707 return astdev(inarray, dimension, keepdims) / float(N.sqrt(inarray.shape[dimension])) 2708 2709 def asem(inarray, dimension=None, keepdims=0): 2710 """ 2711 Returns the standard error of the mean (i.e., using N) of the values 2712 in the passed array. Dimension can equal None (ravel array first), an 2713 integer (the dimension over which to operate), or a sequence (operate 2714 over multiple dimensions). Set keepdims=1 to return an array with the 2715 same number of dimensions as inarray. 2716 2717 Usage: asem(inarray,dimension=None, keepdims=0) 2718 """ 2719 if dimension is None: 2720 inarray = N.ravel(inarray) 2721 dimension = 0 2722 if isinstance(dimension, list): 2723 n = 1 2724 for d in dimension: 2725 n = n*inarray.shape[d] 2726 else: 2727 n = inarray.shape[dimension] 2728 s = asamplestdev(inarray, dimension, keepdims) / N.sqrt(n-1) 2729 return s 2730 2731 def az(a, score): 2732 """ 2733 Returns the z-score of a given input score, given thearray from which 2734 that score came. Not appropriate for population calculations, nor for 2735 arrays > 1D. 2736 2737 Usage: az(a, score) 2738 """ 2739 z = (score-amean(a)) / asamplestdev(a) 2740 return z 2741 2742 def azs(a): 2743 """ 2744 Returns a 1D array of z-scores, one for each score in the passed array, 2745 computed relative to the passed array. 2746 2747 Usage: azs(a) 2748 """ 2749 zscores = [] 2750 for item in a: 2751 zscores.append(z(a, item)) 2752 return N.array(zscores) 2753 2754 def azmap(scores, compare, dimension=0): 2755 """ 2756 Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to 2757 array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0 2758 of the compare array. 2759 2760 Usage: azs(scores, compare, dimension=0) 2761 """ 2762 mns = amean(compare, dimension) 2763 sstd = asamplestdev(compare, 0) 2764 return (scores - mns) / sstd 2765 2766# ATRIMMING FUNCTIONS 2767 2768 def around(a, digits=1): 2769 """ 2770 Rounds all values in array a to 'digits' decimal places. 2771 2772 Usage: around(a,digits) 2773 Returns: a, where each value is rounded to 'digits' decimals 2774 """ 2775 def ar(x, d=digits): 2776 return round(x, d) 2777 2778 if not isinstance(a, N.ArrayType): 2779 try: 2780 a = N.array(a) 2781 except Exception: 2782 a = N.array(a, 'O') 2783 shp = a.shape 2784 if a.typecode() in ['f', 'F', 'd', 'D']: 2785 b = N.ravel(a) 2786 b = N.array([ar(_) for _ in b]) 2787 b.shape = shp 2788 elif a.typecode() in ['o', 'O']: 2789 b = N.ravel(a)*1 2790 for i in range(len(b)): 2791 if isinstance(b[i], float): 2792 b[i] = round(b[i], digits) 2793 b.shape = shp 2794 else: # not a float, double or Object array 2795 b = a*1 2796 return b 2797 2798 def athreshold(a, threshmin=None, threshmax=None, newval=0): 2799 """ 2800 Like Numeric.clip() except that values <threshmid or >threshmax are replaced 2801 by newval instead of by threshmin/threshmax (respectively). 2802 2803 Usage: athreshold(a,threshmin=None,threshmax=None,newval=0) 2804 Returns: a, with values <threshmin or >threshmax replaced with newval 2805 """ 2806 mask = N.zeros(a.shape) 2807 if threshmin is not None: 2808 mask = mask + N.where(N.less(a, threshmin), 1, 0) 2809 if threshmax is not None: 2810 mask = mask + N.where(N.greater(a, threshmax), 1, 0) 2811 mask = N.clip(mask, 0, 1) 2812 return N.where(mask, newval, a) 2813 2814 def atrimboth(a, proportiontocut): 2815 """ 2816 Slices off the passed proportion of items from BOTH ends of the passed 2817 array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 2818 'rightmost' 10% of scores. You must pre-sort the array if you want 2819 "proper" trimming. Slices off LESS if proportion results in a 2820 non-integer slice index (i.e., conservatively slices off 2821 proportiontocut). 2822 2823 Usage: atrimboth (a,proportiontocut) 2824 Returns: trimmed version of array a 2825 """ 2826 lowercut = int(proportiontocut*len(a)) 2827 uppercut = len(a) - lowercut 2828 return a[lowercut:uppercut] 2829 2830 def atrim1(a, proportiontocut, tail='right'): 2831 """ 2832 Slices off the passed proportion of items from ONE end of the passed 2833 array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost' 2834 10% of scores). Slices off LESS if proportion results in a non-integer 2835 slice index (i.e., conservatively slices off proportiontocut). 2836 2837 Usage: atrim1(a,proportiontocut,tail='right') or set tail='left' 2838 Returns: trimmed version of array a 2839 """ 2840 if string.lower(tail) == 'right': 2841 lowercut = 0 2842 uppercut = len(a) - int(proportiontocut*len(a)) 2843 elif string.lower(tail) == 'left': 2844 lowercut = int(proportiontocut*len(a)) 2845 uppercut = len(a) 2846 return a[lowercut:uppercut] 2847 2848# ACORRELATION FUNCTIONS 2849 2850 def acovariance(X): 2851 """ 2852 Computes the covariance matrix of a matrix X. Requires a 2D matrix input. 2853 2854 Usage: acovariance(X) 2855 Returns: covariance matrix of X 2856 """ 2857 if len(X.shape) != 2: 2858 raise TypeError("acovariance requires 2D matrices") 2859 n = X.shape[0] 2860 mX = amean(X, 0) 2861 return N.dot(N.transpose(X), X) / float(n) - N.multiply.outer(mX, mX) 2862 2863 def acorrelation(X): 2864 """ 2865 Computes the correlation matrix of a matrix X. Requires a 2D matrix input. 2866 2867 Usage: acorrelation(X) 2868 Returns: correlation matrix of X 2869 """ 2870 C = acovariance(X) 2871 V = N.diagonal(C) 2872 return C / N.sqrt(N.multiply.outer(V, V)) 2873 2874 def apaired(x, y): 2875 """ 2876 Interactively determines the type of data in x and y, and then runs the 2877 appropriated statistic for paired group data. 2878 2879 Usage: apaired(x,y) x,y = the two arrays of values to be compared 2880 Returns: appropriate statistic name, value, and probability 2881 """ 2882 samples = '' 2883 while samples not in ['i', 'r', 'I', 'R', 'c', 'C']: 2884 print('\nIndependent or related samples, or correlation (i,r,c): ', end=' ') 2885 samples = input() 2886 2887 if samples in ['i', 'I', 'r', 'R']: 2888 print('\nComparing variances ...', end=' ') 2889 # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 2890 r = obrientransform(x, y) 2891 f, p = F_oneway(pstat.colex(r, 0), pstat.colex(r, 1)) 2892 if p < 0.05: 2893 vartype = 'unequal, p='+str(round(p, 4)) 2894 else: 2895 vartype = 'equal' 2896 print(vartype) 2897 if samples in ['i', 'I']: 2898 if vartype[0] == 'e': 2899 t, p = ttest_ind(x, y, None, 0) 2900 print('\nIndependent samples t-test: ', round(t, 4), round(p, 4)) 2901 else: 2902 if len(x) > 20 or len(y) > 20: 2903 z, p = ranksums(x, y) 2904 print('\nRank Sums test (NONparametric, n>20): ', round(z, 4), round(p, 4)) 2905 else: 2906 u, p = mannwhitneyu(x, y) 2907 print('\nMann-Whitney U-test (NONparametric, ns<20): ', round(u, 4), round(p, 4)) 2908 2909 else: # RELATED SAMPLES 2910 if vartype[0] == 'e': 2911 t, p = ttest_rel(x, y, 0) 2912 print('\nRelated samples t-test: ', round(t, 4), round(p, 4)) 2913 else: 2914 t, p = ranksums(x, y) 2915 print('\nWilcoxon T-test (NONparametric): ', round(t, 4), round(p, 4)) 2916 else: # CORRELATION ANALYSIS 2917 corrtype = '' 2918 while corrtype not in ['c', 'C', 'r', 'R', 'd', 'D']: 2919 print('\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', end=' ') 2920 corrtype = input() 2921 if corrtype in ['c', 'C']: 2922 m, b, r, p, see = linregress(x, y) 2923 print('\nLinear regression for continuous variables ...') 2924 lol = [['Slope', 'Intercept', 'r', 'Prob', 'SEestimate'], [round(m, 4), round(b, 4), round(r, 4), round(p, 4), round(see, 4)]] 2925 pstat.printcc(lol) 2926 elif corrtype in ['r', 'R']: 2927 r, p = spearmanr(x, y) 2928 print('\nCorrelation for ranked variables ...') 2929 print("Spearman's r: ", round(r, 4), round(p, 4)) 2930 else: # DICHOTOMOUS 2931 r, p = pointbiserialr(x, y) 2932 print('\nAssuming x contains a dichotomous variable ...') 2933 print('Point Biserial r: ', round(r, 4), round(p, 4)) 2934 print('\n\n') 2935 return None 2936 2937 def apearsonr(x, y, verbose=1): 2938 """ 2939 Calculates a Pearson correlation coefficient and returns p. Taken 2940 from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195. 2941 2942 Usage: apearsonr(x,y,verbose=1) where x,y are equal length arrays 2943 Returns: Pearson's r, two-tailed p-value 2944 """ 2945 TINY = 1.0e-20 2946 n = len(x) 2947 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) 2948 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) 2949 r = (r_num / r_den) 2950 df = n-2 2951 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 2952 prob = abetai(0.5*df, 0.5, df/(df+t*t), verbose) 2953 return r, prob 2954 2955 def aspearmanr(x, y): 2956 """ 2957 Calculates a Spearman rank-order correlation coefficient. Taken 2958 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. 2959 2960 Usage: aspearmanr(x,y) where x,y are equal-length arrays 2961 Returns: Spearman's r, two-tailed p-value 2962 """ 2963 n = len(x) 2964 rankx = rankdata(x) 2965 ranky = rankdata(y) 2966 dsq = N.add.reduce((rankx-ranky)**2) 2967 rs = 1 - 6*dsq / float(n*(n**2-1)) 2968 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) 2969 df = n-2 2970 probrs = abetai(0.5*df, 0.5, df/(df+t*t)) 2971 # probability values for rs are from part 2 of the spearman function in 2972 # Numerical Recipies, p.510. They close to tables, but not exact.(?) 2973 return rs, probrs 2974 2975 def apointbiserialr(x, y): 2976 """ 2977 Calculates a point-biserial correlation coefficient and the associated 2978 probability value. Taken from Heiman's Basic Statistics for the Behav. 2979 Sci (1st), p.194. 2980 2981 Usage: apointbiserialr(x,y) where x,y are equal length arrays 2982 Returns: Point-biserial r, two-tailed p-value 2983 """ 2984 TINY = 1e-30 2985 categories = pstat.aunique(x) 2986 data = pstat.aabut(x, y) 2987 if len(categories) != 2: 2988 raise ValueError("Exactly 2 categories required (in x) for pointbiserialr().") 2989 else: # there are 2 categories, continue 2990 codemap = pstat.aabut(categories, N.arange(2)) 2991 pstat.arecode(data, codemap, 0) # recoded 2992 x = pstat.alinexand(data, 0, categories[0]) 2993 y = pstat.alinexand(data, 0, categories[1]) 2994 xmean = amean(pstat.acolex(x, 1)) 2995 ymean = amean(pstat.acolex(y, 1)) 2996 n = len(data) 2997 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) 2998 rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data, 1))*adjust 2999 df = n-2 3000 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) 3001 prob = abetai(0.5*df, 0.5, df/(df+t*t)) 3002 return rpb, prob 3003 3004 def akendalltau(x, y): 3005 """ 3006 Calculates Kendall's tau ... correlation of ordinal data. Adapted 3007 from function kendl1 in Numerical Recipies. Needs good test-cases.@@@ 3008 3009 Usage: akendalltau(x,y) 3010 Returns: Kendall's tau, two-tailed p-value 3011 """ 3012 n1 = 0 3013 n2 = 0 3014 iss = 0 3015 for j in range(len(x)-1): 3016 for k in range(j, len(y)): 3017 a1 = x[j] - x[k] 3018 a2 = y[j] - y[k] 3019 aa = a1 * a2 3020 if (aa): # neither array has a tie 3021 n1 = n1 + 1 3022 n2 = n2 + 1 3023 if aa > 0: 3024 iss = iss + 1 3025 else: 3026 iss = iss - 1 3027 else: 3028 if (a1): 3029 n1 = n1 + 1 3030 else: 3031 n2 = n2 + 1 3032 tau = iss / math.sqrt(n1*n2) 3033 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) 3034 z = tau / math.sqrt(svar) 3035 prob = erfcc(abs(z)/1.4142136) 3036 return tau, prob 3037 3038 def alinregress(*args): 3039 """ 3040 Calculates a regression line on two arrays, x and y, corresponding to x,y 3041 pairs. If a single 2D array is passed, alinregress finds dim with 2 levels 3042 and splits data into x,y pairs along that dim. 3043 3044 Usage: alinregress(*args) args=2 equal-length arrays, or one 2D array 3045 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate 3046 """ 3047 TINY = 1.0e-20 3048 if len(args) == 1: # more than 1D array? 3049 args = args[0] 3050 if len(args) == 2: 3051 x = args[0] 3052 y = args[1] 3053 else: 3054 x = args[:, 0] 3055 y = args[:, 1] 3056 else: 3057 x = args[0] 3058 y = args[1] 3059 n = len(x) 3060 xmean = amean(x) 3061 ymean = amean(y) 3062 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) 3063 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) 3064 r = r_num / r_den 3065 df = n-2 3066 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 3067 prob = abetai(0.5*df, 0.5, df/(df+t*t)) 3068 slope = r_num / (float(n)*ass(x) - asquare_of_sums(x)) 3069 intercept = ymean - slope*xmean 3070 sterrest = math.sqrt(1-r*r)*asamplestdev(y) 3071 return slope, intercept, r, prob, sterrest 3072 3073# AINFERENTIAL STATISTICS 3074 3075 def attest_1samp(a, popmean, printit=0, name='Sample', writemode='a'): 3076 """ 3077 Calculates the t-obtained for the independent samples T-test on ONE group 3078 of scores a, given a population mean. If printit=1, results are printed 3079 to the screen. If printit='filename', the results are output to 'filename' 3080 using the given writemode (default=append). Returns t-value, and prob. 3081 3082 Usage: attest_1samp(a,popmean,Name='Sample',printit=0,writemode='a') 3083 Returns: t-value, two-tailed prob 3084 """ 3085 if not isinstance(a, N.ArrayType): 3086 a = N.array(a) 3087 x = amean(a) 3088 v = avar(a) 3089 n = len(a) 3090 df = n-1 3091 svar = ((n-1)*v) / float(df) 3092 t = (x-popmean)/math.sqrt(svar*(1.0/n)) 3093 prob = abetai(0.5*df, 0.5, df/(df+t*t)) 3094 3095 if printit != 0: 3096 statname = 'Single-sample T-test.' 3097 outputpairedstats(printit, writemode, 3098 'Population', '--', popmean, 0, 0, 0, 3099 name, n, x, v, N.minimum.reduce(N.ravel(a)), 3100 N.maximum.reduce(N.ravel(a)), 3101 statname, t, prob) 3102 return t, prob 3103 3104 def attest_ind(a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2', writemode='a'): 3105 """ 3106 Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores 3107 a, and b. From Numerical Recipies, p.483. If printit=1, results are 3108 printed to the screen. If printit='filename', the results are output 3109 to 'filename' using the given writemode (default=append). Dimension 3110 can equal None (ravel array first), or an integer (the dimension over 3111 which to operate on a and b). 3112 3113 Usage: attest_ind (a,b,dimension=None,printit=0, 3114 Name1='Samp1',Name2='Samp2',writemode='a') 3115 Returns: t-value, two-tailed p-value 3116 """ 3117 if dimension is None: 3118 a = N.ravel(a) 3119 b = N.ravel(b) 3120 dimension = 0 3121 x1 = amean(a, dimension) 3122 x2 = amean(b, dimension) 3123 v1 = avar(a, dimension) 3124 v2 = avar(b, dimension) 3125 n1 = a.shape[dimension] 3126 n2 = b.shape[dimension] 3127 df = n1+n2-2 3128 svar = ((n1-1)*v1+(n2-1)*v2) / float(df) 3129 zerodivproblem = N.equal(svar, 0) 3130 svar = N.where(zerodivproblem, 1, svar) # avoid zero-division in 1st place 3131 t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2)) # N-D COMPUTATION HERE!!!!!! 3132 t = N.where(zerodivproblem, 1.0, t) # replace NaN/wrong t-values with 1.0 3133 probs = abetai(0.5*df, 0.5, float(df)/(df+t*t)) 3134 3135 if isinstance(t, N.ArrayType): 3136 probs = N.reshape(probs, t.shape) 3137 if len(probs) == 1: 3138 probs = probs[0] 3139 3140 if printit != 0: 3141 if isinstance(t, N.ArrayType): 3142 t = t[0] 3143 if isinstance(probs, N.ArrayType): 3144 probs = probs[0] 3145 statname = 'Independent samples T-test.' 3146 outputpairedstats(printit, writemode, 3147 name1, n1, x1, v1, N.minimum.reduce(N.ravel(a)), 3148 N.maximum.reduce(N.ravel(a)), 3149 name2, n2, x2, v2, N.minimum.reduce(N.ravel(b)), 3150 N.maximum.reduce(N.ravel(b)), 3151 statname, t, probs) 3152 return 3153 return t, probs 3154 3155 def attest_rel(a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2', writemode='a'): 3156 """ 3157 Calculates the t-obtained T-test on TWO RELATED samples of scores, a 3158 and b. From Numerical Recipies, p.483. If printit=1, results are 3159 printed to the screen. If printit='filename', the results are output 3160 to 'filename' using the given writemode (default=append). Dimension 3161 can equal None (ravel array first), or an integer (the dimension over 3162 which to operate on a and b). 3163 3164 Usage: attest_rel(a,b,dimension=None,printit=0, 3165 name1='Samp1',name2='Samp2',writemode='a') 3166 Returns: t-value, two-tailed p-value 3167 """ 3168 if dimension is None: 3169 a = N.ravel(a) 3170 b = N.ravel(b) 3171 dimension = 0 3172 if len(a) != len(b): 3173 raise ValueError('Unequal length arrays.') 3174 x1 = amean(a, dimension) 3175 x2 = amean(b, dimension) 3176 v1 = avar(a, dimension) 3177 v2 = avar(b, dimension) 3178 n = a.shape[dimension] 3179 df = float(n-1) 3180 d = (a-b).astype('d') 3181 3182 denom = N.sqrt((n*N.add.reduce(d*d, dimension) - N.add.reduce(d, dimension)**2) / df) 3183 zerodivproblem = N.equal(denom, 0) 3184 denom = N.where(zerodivproblem, 1, denom) # avoid zero-division in 1st place 3185 t = N.add.reduce(d, dimension) / denom # N-D COMPUTATION HERE!!!!!! 3186 t = N.where(zerodivproblem, 1.0, t) # replace NaN/wrong t-values with 1.0 3187 probs = abetai(0.5*df, 0.5, float(df)/(df+t*t)) 3188 if isinstance(t, N.ArrayType): 3189 probs = N.reshape(probs, t.shape) 3190 if len(probs) == 1: 3191 probs = probs[0] 3192 3193 if printit != 0: 3194 statname = 'Related samples T-test.' 3195 outputpairedstats(printit, writemode, 3196 name1, n, x1, v1, N.minimum.reduce(N.ravel(a)), 3197 N.maximum.reduce(N.ravel(a)), 3198 name2, n, x2, v2, N.minimum.reduce(N.ravel(b)), 3199 N.maximum.reduce(N.ravel(b)), 3200 statname, t, probs) 3201 return 3202 return t, probs 3203 3204 def achisquare(f_obs, f_exp=None): 3205 """ 3206 Calculates a one-way chi square for array of observed frequencies and returns 3207 the result. If no expected frequencies are given, the total N is assumed to 3208 be equally distributed across all groups. 3209 3210 Usage: achisquare(f_obs, f_exp=None) f_obs = array of observed cell freq. 3211 Returns: chisquare-statistic, associated p-value 3212 """ 3213 3214 k = len(f_obs) 3215 if f_exp is None: 3216 f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs), N.Float) 3217 f_exp = f_exp.astype(N.Float) 3218 chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp) 3219 return chisq, chisqprob(chisq, k-1) 3220 3221 def aks_2samp(data1, data2): 3222 """ 3223 Computes the Kolmogorov-Smirnof statistic on 2 samples. Modified from 3224 Numerical Recipies in C, page 493. Returns KS D-value, prob. Not ufunc- 3225 like. 3226 3227 Usage: aks_2samp(data1,data2) where data1 and data2 are 1D arrays 3228 Returns: KS D-value, p-value 3229 """ 3230 j1 = 0 # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE 3231 j2 = 0 # N.zeros(data2.shape[1:]) 3232 fn1 = 0.0 # N.zeros(data1.shape[1:],N.Float) 3233 fn2 = 0.0 # N.zeros(data2.shape[1:],N.Float) 3234 n1 = data1.shape[0] 3235 n2 = data2.shape[0] 3236 en1 = n1*1 3237 en2 = n2*1 3238 d = N.zeros(data1.shape[1:], N.Float) 3239 data1 = N.sort(data1, 0) 3240 data2 = N.sort(data2, 0) 3241 while j1 < n1 and j2 < n2: 3242 d1 = data1[j1] 3243 d2 = data2[j2] 3244 if d1 <= d2: 3245 fn1 = (j1)/float(en1) 3246 j1 = j1 + 1 3247 if d2 <= d1: 3248 fn2 = (j2)/float(en2) 3249 j2 = j2 + 1 3250 dt = (fn2-fn1) 3251 if abs(dt) > abs(d): 3252 d = dt 3253 try: 3254 en = math.sqrt(en1*en2/float(en1+en2)) 3255 prob = aksprob((en+0.12+0.11/en)*N.fabs(d)) 3256 except Exception: 3257 prob = 1.0 3258 return d, prob 3259 3260 def amannwhitneyu(x, y): 3261 """ 3262 Calculates a Mann-Whitney U statistic on the provided scores and 3263 returns the result. Use only when the n in each condition is < 20 and 3264 you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is 3265 significant if the u-obtained is LESS THAN or equal to the critical 3266 value of U. 3267 3268 Usage: amannwhitneyu(x,y) where x,y are arrays of values for 2 conditions 3269 Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) 3270 """ 3271 n1 = len(x) 3272 n2 = len(y) 3273 ranked = rankdata(N.concatenate((x, y))) 3274 rankx = ranked[0:n1] # get the x-ranks 3275 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x 3276 u2 = n1*n2 - u1 # remainder is U for y 3277 bigu = max(u1, u2) 3278 smallu = min(u1, u2) 3279 T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores 3280 if T == 0: 3281 raise ValueError('All numbers are identical in amannwhitneyu') 3282 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) 3283 z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc 3284 return smallu, 1.0 - zprob(z) 3285 3286 def atiecorrect(rankvals): 3287 """ 3288 Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests. 3289 See Siegel, S. (1956) Nonparametric Statistics for the Behavioral 3290 Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c 3291 code. 3292 3293 Usage: atiecorrect(rankvals) 3294 Returns: T correction factor for U or H 3295 """ 3296 sorted, posn = ashellsort(N.array(rankvals)) 3297 n = len(sorted) 3298 T = 0.0 3299 i = 0 3300 while (i < n-1): 3301 if sorted[i] == sorted[i+1]: 3302 nties = 1 3303 while (i < n-1) and (sorted[i] == sorted[i+1]): 3304 nties = nties + 1 3305 i = i + 1 3306 T = T + nties**3 - nties 3307 i = i+1 3308 T = T / float(n**3-n) 3309 return 1.0 - T 3310 3311 def aranksums(x, y): 3312 """ 3313 Calculates the rank sums statistic on the provided scores and returns 3314 the result. 3315 3316 Usage: aranksums(x,y) where x,y are arrays of values for 2 conditions 3317 Returns: z-statistic, two-tailed p-value 3318 """ 3319 n1 = len(x) 3320 n2 = len(y) 3321 alldata = N.concatenate((x, y)) 3322 ranked = arankdata(alldata) 3323 x = ranked[:n1] 3324 y = ranked[n1:] 3325 s = sum(x) 3326 expected = n1*(n1+n2+1) / 2.0 3327 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) 3328 prob = 2*(1.0 - zprob(abs(z))) 3329 return z, prob 3330 3331 def awilcoxont(x, y): 3332 """ 3333 Calculates the Wilcoxon T-test for related samples and returns the 3334 result. A non-parametric T-test. 3335 3336 Usage: awilcoxont(x,y) where x,y are equal-length arrays for 2 conditions 3337 Returns: t-statistic, two-tailed p-value 3338 """ 3339 if len(x) != len(y): 3340 raise ValueError('Unequal N in awilcoxont. Aborting.') 3341 d = x-y 3342 d = N.compress(N.not_equal(d, 0), d) # Keep all non-zero differences 3343 count = len(d) 3344 absd = abs(d) 3345 absranked = arankdata(absd) 3346 r_plus = 0.0 3347 r_minus = 0.0 3348 for i in range(len(absd)): 3349 if d[i] < 0: 3350 r_minus = r_minus + absranked[i] 3351 else: 3352 r_plus = r_plus + absranked[i] 3353 wt = min(r_plus, r_minus) 3354 mn = count * (count+1) * 0.25 3355 se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0) 3356 z = math.fabs(wt-mn) / se 3357 z = math.fabs(wt-mn) / se 3358 prob = 2*(1.0 - zprob(abs(z))) 3359 return wt, prob 3360 3361 def akruskalwallish(*args): 3362 """ 3363 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more 3364 groups, requiring at least 5 subjects in each group. This function 3365 calculates the Kruskal-Wallis H and associated p-value for 3 or more 3366 independent samples. 3367 3368 Usage: akruskalwallish(*args) args are separate arrays for 3+ conditions 3369 Returns: H-statistic (corrected for ties), associated p-value 3370 """ 3371 assert len(args) == 3, "Need at least 3 groups in stats.akruskalwallish()" 3372 args = list(args) 3373 n = [0]*len(args) 3374 n = [len(_) for _ in args] 3375 all = [] 3376 for i in range(len(args)): 3377 all = all + args[i].tolist() 3378 ranked = rankdata(all) 3379 T = tiecorrect(ranked) 3380 for i in range(len(args)): 3381 args[i] = ranked[0:n[i]] 3382 del ranked[0:n[i]] 3383 rsums = [] 3384 for i in range(len(args)): 3385 rsums.append(sum(args[i])**2) 3386 rsums[i] = rsums[i] / float(n[i]) 3387 ssbn = sum(rsums) 3388 totaln = sum(n) 3389 h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) 3390 df = len(args) - 1 3391 if T == 0: 3392 raise ValueError('All numbers are identical in akruskalwallish') 3393 h = h / float(T) 3394 return h, chisqprob(h, df) 3395 3396 def afriedmanchisquare(*args): 3397 """ 3398 Friedman Chi-Square is a non-parametric, one-way within-subjects 3399 ANOVA. This function calculates the Friedman Chi-square test for 3400 repeated measures and returns the result, along with the associated 3401 probability value. It assumes 3 or more repeated measures. Only 3 3402 levels requires a minimum of 10 subjects in the study. Four levels 3403 requires 5 subjects per level(??). 3404 3405 Usage: afriedmanchisquare(*args) args are separate arrays for 2+ conditions 3406 Returns: chi-square statistic, associated p-value 3407 """ 3408 k = len(args) 3409 if k < 3: 3410 raise ValueError('\nLess than 3 levels. Friedman test not appropriate.\n') 3411 n = len(args[0]) 3412 data = pstat.aabut(*args) 3413 data = data.astype(N.Float) 3414 for i in range(len(data)): 3415 data[i] = arankdata(data[i]) 3416 ssbn = asum(asum(args, 1)**2) 3417 chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) 3418 return chisq, chisqprob(chisq, k-1) 3419 3420# APROBABILITY CALCULATIONS 3421 3422 def achisqprob(chisq, df): 3423 """ 3424 Returns the (1-tail) probability value associated with the provided chi-square 3425 value and df. Heavily modified from chisq.c in Gary Perlman's |Stat. Can 3426 handle multiple dimensions. 3427 3428 Usage: achisqprob(chisq,df) chisq=chisquare stat., df=degrees of freedom 3429 """ 3430 BIG = 200.0 3431 3432 def ex(x): 3433 BIG = 200.0 3434 exponents = N.where(N.less(x, -BIG), -BIG, x) 3435 return N.exp(exponents) 3436 3437 if not isinstance(chisq, N.ArrayType): 3438 chisq = N.array([chisq]) 3439 if df < 1: 3440 return N.ones(chisq.shape, N.float) 3441 probs = N.zeros(chisq.shape, N.Float) 3442 probs = N.where(N.less_equal(chisq, 0), 1.0, probs) # set prob=1 for chisq<0 3443 a = 0.5 * chisq 3444 if df > 1: 3445 y = ex(-a) 3446 if df % 2 == 0: 3447 even = 1 3448 s = y*1 3449 s2 = s*1 3450 else: 3451 even = 0 3452 s = 2.0 * azprob(-N.sqrt(chisq)) 3453 s2 = s*1 3454 if (df > 2): 3455 chisq = 0.5 * (df - 1.0) 3456 if even: 3457 z = N.ones(probs.shape, N.Float) 3458 else: 3459 z = 0.5 * N.ones(probs.shape, N.Float) 3460 if even: 3461 e = N.zeros(probs.shape, N.Float) 3462 else: 3463 e = N.log(N.sqrt(N.pi)) * N.ones(probs.shape, N.Float) 3464 c = N.log(a) 3465 mask = N.zeros(probs.shape) 3466 a_big = N.greater(a, BIG) 3467 a_big_frozen = -1 * N.ones(probs.shape, N.Float) 3468 totalelements = N.multiply.reduce(N.array(probs.shape)) 3469 while asum(mask) != totalelements: 3470 e = N.log(z) + e 3471 s = s + ex(c*z-a-e) 3472 z = z + 1.0 3473# print z, e, s 3474 newmask = N.greater(z, chisq) 3475 a_big_frozen = N.where(newmask*N.equal(mask, 0)*a_big, s, a_big_frozen) 3476 mask = N.clip(newmask+mask, 0, 1) 3477 if even: 3478 z = N.ones(probs.shape, N.Float) 3479 e = N.ones(probs.shape, N.Float) 3480 else: 3481 z = 0.5 * N.ones(probs.shape, N.Float) 3482 e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape, N.Float) 3483 c = 0.0 3484 mask = N.zeros(probs.shape) 3485 a_notbig_frozen = -1 * N.ones(probs.shape, N.Float) 3486 while asum(mask) != totalelements: 3487 e = e * (a/z.astype(N.Float)) 3488 c = c + e 3489 z = z + 1.0 3490# print '#2', z, e, c, s, c*y+s2 3491 newmask = N.greater(z, chisq) 3492 a_notbig_frozen = N.where(newmask*N.equal(mask, 0)*(1-a_big), 3493 c*y+s2, a_notbig_frozen) 3494 mask = N.clip(newmask+mask, 0, 1) 3495 probs = N.where(N.equal(probs, 1), 1, 3496 N.where(N.greater(a, BIG), a_big_frozen, a_notbig_frozen)) 3497 return probs 3498 else: 3499 return s 3500 3501 def aerfcc(x): 3502 """ 3503 Returns the complementary error function erfc(x) with fractional error 3504 everywhere less than 1.2e-7. Adapted from Numerical Recipies. Can 3505 handle multiple dimensions. 3506 3507 Usage: aerfcc(x) 3508 """ 3509 z = abs(x) 3510 t = 1.0 / (1.0+0.5*z) 3511 ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) 3512 return N.where(N.greater_equal(x, 0), ans, 2.0-ans) 3513 3514 def azprob(z): 3515 """ 3516 Returns the area under the normal curve 'to the left of' the given z value. 3517 Thus, 3518 for z<0, zprob(z) = 1-tail probability 3519 for z>0, 1.0-zprob(z) = 1-tail probability 3520 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability 3521 Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions. 3522 3523 Usage: azprob(z) where z is a z-value 3524 """ 3525 def yfunc(y): 3526 x = (((((((((((((-0.000045255659 * y 3527 + 0.000152529290) * y - 0.000019538132) * y 3528 - 0.000676904986) * y + 0.001390604284) * y 3529 - 0.000794620820) * y - 0.002034254874) * y 3530 + 0.006549791214) * y - 0.010557625006) * y 3531 + 0.011630447319) * y - 0.009279453341) * y 3532 + 0.005353579108) * y - 0.002141268741) * y 3533 + 0.000535310849) * y + 0.999936657524 3534 return x 3535 3536 def wfunc(w): 3537 x = ((((((((0.000124818987 * w 3538 - 0.001075204047) * w + 0.005198775019) * w 3539 - 0.019198292004) * w + 0.059054035642) * w 3540 - 0.151968751364) * w + 0.319152932694) * w 3541 - 0.531923007300) * w + 0.797884560593) * N.sqrt(w) * 2.0 3542 return x 3543 3544 Z_MAX = 6.0 # maximum meaningful z-value 3545 x = N.zeros(z.shape, N.Float) # initialize 3546 y = 0.5 * N.fabs(z) 3547 x = N.where(N.less(y, 1.0), wfunc(y*y), yfunc(y-2.0)) # get x's 3548 x = N.where(N.greater(y, Z_MAX*0.5), 1.0, x) # kill those with big Z 3549 prob = N.where(N.greater(z, 0), (x+1)*0.5, (1-x)*0.5) 3550 return prob 3551 3552 def aksprob(alam): 3553 """ 3554 Returns the probability value for a K-S statistic computed via ks_2samp. 3555 Adapted from Numerical Recipies. Can handle multiple dimensions. 3556 3557 Usage: aksprob(alam) 3558 """ 3559 if isinstance(alam, N.ArrayType): 3560 frozen = -1 * N.ones(alam.shape, N.Float64) 3561 alam = alam.astype(N.Float64) 3562 arrayflag = 1 3563 else: 3564 frozen = N.array(-1.) 3565 alam = N.array(alam, N.Float64) 3566 mask = N.zeros(alam.shape) 3567 fac = 2.0 * N.ones(alam.shape, N.Float) 3568 sum = N.zeros(alam.shape, N.Float) 3569 termbf = N.zeros(alam.shape, N.Float) 3570 a2 = N.array(-2.0*alam*alam, N.Float64) 3571 totalelements = N.multiply.reduce(N.array(mask.shape)) 3572 for j in range(1, 201): 3573 if asum(mask) == totalelements: 3574 break 3575 exponents = (a2*j*j) 3576 overflowmask = N.less(exponents, -746) 3577 frozen = N.where(overflowmask, 0, frozen) 3578 mask = mask+overflowmask 3579 term = fac*N.exp(exponents) 3580 sum = sum + term 3581 newmask = N.where(N.less_equal(abs(term), (0.001*termbf)) 3582 + N.less(abs(term), 1.0e-8*sum), 1, 0) 3583 frozen = N.where(newmask*N.equal(mask, 0), sum, frozen) 3584 mask = N.clip(mask+newmask, 0, 1) 3585 fac = -fac 3586 termbf = abs(term) 3587 if arrayflag: 3588 return N.where(N.equal(frozen, -1), 1.0, frozen) # 1.0 if doesn't converge 3589 else: 3590 return N.where(N.equal(frozen, -1), 1.0, frozen)[0] # 1.0 if doesn't converge 3591 3592 def afprob(dfnum, dfden, F): 3593 """ 3594 Returns the 1-tailed significance level (p-value) of an F statistic 3595 given the degrees of freedom for the numerator (dfR-dfF) and the degrees 3596 of freedom for the denominator (dfF). Can handle multiple dims for F. 3597 3598 Usage: afprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn 3599 """ 3600 if isinstance(F, N.ArrayType): 3601 return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F)) 3602 else: 3603 return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) 3604 3605 def abetacf(a, b, x, verbose=1): 3606 """ 3607 Evaluates the continued fraction form of the incomplete Beta function, 3608 betai. (Adapted from: Numerical Recipies in C.) Can handle multiple 3609 dimensions for x. 3610 3611 Usage: abetacf(a,b,x,verbose=1) 3612 """ 3613 ITMAX = 200 3614 EPS = 3.0e-7 3615 3616 arrayflag = 1 3617 if isinstance(x, N.ArrayType): 3618 frozen = N.ones(x.shape, N.Float) * -1 # start out w/ -1s, should replace all 3619 else: 3620 arrayflag = 0 3621 frozen = N.array([-1]) 3622 x = N.array([x]) 3623 mask = N.zeros(x.shape) 3624 bm = az = am = 1.0 3625 qab = a+b 3626 qap = a+1.0 3627 qam = a-1.0 3628 bz = 1.0-qab*x/qap 3629 for i in range(ITMAX+1): 3630 if N.sum(N.ravel(N.equal(frozen, -1))) == 0: 3631 break 3632 em = float(i+1) 3633 tem = em + em 3634 d = em*(b-em)*x/((qam+tem)*(a+tem)) 3635 ap = az + d*am 3636 bp = bz+d*bm 3637 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) 3638 app = ap+d*az 3639 bpp = bp+d*bz 3640 aold = az*1 3641 am = ap/bpp 3642 bm = bp/bpp 3643 az = app/bpp 3644 bz = 1.0 3645 newmask = N.less(abs(az-aold), EPS*abs(az)) 3646 frozen = N.where(newmask*N.equal(mask, 0), az, frozen) 3647 mask = N.clip(mask+newmask, 0, 1) 3648 noconverge = asum(N.equal(frozen, -1)) 3649 if noconverge != 0 and verbose: 3650 print('a or b too big, or ITMAX too small in Betacf for ', noconverge, ' elements') 3651 if arrayflag: 3652 return frozen 3653 else: 3654 return frozen[0] 3655 3656 def agammln(xx): 3657 """ 3658 Returns the gamma function of xx. 3659 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt. 3660 Adapted from: Numerical Recipies in C. Can handle multiple dims ... but 3661 probably doesn't normally have to. 3662 3663 Usage: agammln(xx) 3664 """ 3665 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, 3666 0.120858003e-2, -0.536382e-5] 3667 x = xx - 1.0 3668 tmp = x + 5.5 3669 tmp = tmp - (x+0.5)*N.log(tmp) 3670 ser = 1.0 3671 for j in range(len(coeff)): 3672 x = x + 1 3673 ser = ser + coeff[j]/x 3674 return -tmp + N.log(2.50662827465*ser) 3675 3676 def abetai(a, b, x, verbose=1): 3677 """ 3678 Returns the incomplete beta function: 3679 3680 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) 3681 3682 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma 3683 function of a. The continued fraction formulation is implemented 3684 here, using the betacf function. (Adapted from: Numerical Recipies in 3685 C.) Can handle multiple dimensions. 3686 3687 Usage: abetai(a,b,x,verbose=1) 3688 """ 3689 TINY = 1e-15 3690 if isinstance(a, N.ArrayType): 3691 if asum(N.less(x, 0)+N.greater(x, 1)) != 0: 3692 raise ValueError('Bad x in abetai') 3693 x = N.where(N.equal(x, 0), TINY, x) 3694 x = N.where(N.equal(x, 1.0), 1-TINY, x) 3695 3696 bt = N.where(N.equal(x, 0)+N.equal(x, 1), 0, -1) 3697 exponents = (gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b * N.log(1.0-x)) 3698 # 746 (below) is the MAX POSSIBLE BEFORE OVERFLOW 3699 exponents = N.where(N.less(exponents, -740), -740, exponents) 3700 bt = N.exp(exponents) 3701 if isinstance(x, N.ArrayType): 3702 ans = N.where(N.less(x, (a+1)/(a+b+2.0)), 3703 bt*abetacf(a, b, x, verbose)/float(a), 3704 1.0-bt*abetacf(b, a, 1.0-x, verbose)/float(b)) 3705 else: 3706 if x < (a+1)/(a+b+2.0): 3707 ans = bt*abetacf(a, b, x, verbose)/float(a) 3708 else: 3709 ans = 1.0-bt*abetacf(b, a, 1.0-x, verbose)/float(b) 3710 return ans 3711 3712# AANOVA CALCULATIONS 3713 3714 import LinearAlgebra 3715 LA = LinearAlgebra 3716 3717 def aglm(data, para): 3718 """ 3719 Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken 3720 from: 3721 Peterson et al. Statistical limitations in functional neuroimaging 3722 I. Non-inferential methods and statistical models. Phil Trans Royal Soc 3723 Lond B 354: 1239-1260. 3724 3725 Usage: aglm(data,para) 3726 Returns: statistic, p-value ??? 3727 """ 3728 if len(para) != len(data): 3729 print("data and para must be same length in aglm") 3730 return 3731 n = len(para) 3732 p = pstat.aunique(para) 3733 x = N.zeros((n, len(p))) # design matrix 3734 for l in range(len(p)): 3735 x[:, l] = N.equal(para, p[l]) 3736 b = N.dot(N.dot(LA.inverse(N.dot(N.transpose(x), x)), # i.e., b=inv(X'X)X'Y 3737 N.transpose(x)), 3738 data) 3739 diffs = (data - N.dot(x, b)) 3740 s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs) 3741 3742 if len(p) == 2: # ttest_ind 3743 c = N.array([1, -1]) 3744 df = n-2 3745 fact = asum(1.0/asum(x, 0)) # i.e., 1/n1 + 1/n2 + 1/n3 ... 3746 t = N.dot(c, b) / N.sqrt(s_sq*fact) 3747 probs = abetai(0.5*df, 0.5, float(df)/(df+t*t)) 3748 return t, probs 3749 3750 def aF_oneway(*args): 3751 """ 3752 Performs a 1-way ANOVA, returning an F-value and probability given 3753 any number of groups. From Heiman, pp.394-7. 3754 3755 Usage: aF_oneway (*args) where *args is 2 or more arrays, one per 3756 treatment group 3757 Returns: f-value, probability 3758 """ 3759 na = len(args) # ANOVA on 'na' groups, each in it's own array 3760 alldata = [] 3761 alldata = N.concatenate(args) 3762 bign = len(alldata) 3763 sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) 3764 ssbn = 0 3765 for a in args: 3766 ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a)) 3767 ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) 3768 sswn = sstot-ssbn 3769 dfbn = na-1 3770 dfwn = bign - na 3771 msb = ssbn/float(dfbn) 3772 msw = sswn/float(dfwn) 3773 f = msb/msw 3774 prob = fprob(dfbn, dfwn, f) 3775 return f, prob 3776 3777 def aF_value(ER, EF, dfR, dfF): 3778 """ 3779 Returns an F-statistic given the following: 3780 ER = error associated with the null hypothesis (the Restricted model) 3781 EF = error associated with the alternate hypothesis (the Full model) 3782 dfR = degrees of freedom the Restricted model 3783 dfF = degrees of freedom associated with the Restricted model 3784 """ 3785 return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF))) 3786 3787 def outputfstats(Enum, Eden, dfnum, dfden, f, prob): 3788 Enum = round(Enum, 3) 3789 Eden = round(Eden, 3) 3790 dfnum = round(Enum, 3) 3791 dfden = round(dfden, 3) 3792 f = round(f, 3) 3793 prob = round(prob, 3) 3794 suffix = '' # for *s after the p-value 3795 if prob < 0.001: 3796 suffix = ' ***' 3797 elif prob < 0.01: 3798 suffix = ' **' 3799 elif prob < 0.05: 3800 suffix = ' *' 3801 title = [['EF/ER', 'DF', 'Mean Square', 'F-value', 'prob', '']] 3802 lofl = title+[[Enum, dfnum, round(Enum/float(dfnum), 3), f, prob, suffix], 3803 [Eden, dfden, round(Eden/float(dfden), 3), '', '', '']] 3804 pstat.printcc(lofl) 3805 return 3806 3807 def F_value_multivariate(ER, EF, dfnum, dfden): 3808 """ 3809 Returns an F-statistic given the following: 3810 ER = error associated with the null hypothesis (the Restricted model) 3811 EF = error associated with the alternate hypothesis (the Full model) 3812 dfR = degrees of freedom the Restricted model 3813 dfF = degrees of freedom associated with the Restricted model 3814 where ER and EF are matrices from a multivariate F calculation. 3815 """ 3816 if type(ER) in [int, float]: 3817 ER = N.array([[ER]]) 3818 if type(EF) in [int, float]: 3819 EF = N.array([[EF]]) 3820 n_um = (LA.determinant(ER) - LA.determinant(EF)) / float(dfnum) 3821 d_en = LA.determinant(EF) / float(dfden) 3822 return n_um / d_en 3823 3824# ASUPPORT FUNCTIONS 3825 3826 def asign(a): 3827 """ 3828 Usage: asign(a) 3829 Returns: array shape of a, with -1 where a<0 and +1 where a>=0 3830 """ 3831 a = N.asarray(a) 3832 if ((isinstance(a, float)) or (isinstance(a, int))): 3833 return a-a-N.less(a, 0)+N.greater(a, 0) 3834 else: 3835 return N.zeros(N.shape(a))-N.less(a, 0)+N.greater(a, 0) 3836 3837 def asum(a, dimension=None, keepdims=0): 3838 """ 3839 An alternative to the Numeric.add.reduce function, which allows one to 3840 (1) collapse over multiple dimensions at once, and/or (2) to retain 3841 all dimensions in the original array (squashing one down to size. 3842 Dimension can equal None (ravel array first), an integer (the 3843 dimension over which to operate), or a sequence (operate over multiple 3844 dimensions). If keepdims=1, the resulting array will have as many 3845 dimensions as the input array. 3846 3847 Usage: asum(a, dimension=None, keepdims=0) 3848 Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1 3849 """ 3850 if isinstance(a, N.ArrayType) and a.typecode() in ['l', 's', 'b']: 3851 a = a.astype(N.Float) 3852 if dimension is None: 3853 s = N.sum(N.ravel(a)) 3854 elif type(dimension) in [int, float]: 3855 s = N.add.reduce(a, dimension) 3856 if keepdims == 1: 3857 shp = list(a.shape) 3858 shp[dimension] = 1 3859 s = N.reshape(s, shp) 3860 else: # must be a SEQUENCE of dims to sum over 3861 dims = sorted(dimension) 3862 dims.reverse() 3863 s = a * 1.0 3864 for dim in dims: 3865 s = N.add.reduce(s, dim) 3866 if keepdims == 1: 3867 shp = list(a.shape) 3868 for dim in dims: 3869 shp[dim] = 1 3870 s = N.reshape(s, shp) 3871 return s 3872 3873 def acumsum(a, dimension=None): 3874 """ 3875 Returns an array consisting of the cumulative sum of the items in the 3876 passed array. Dimension can equal None (ravel array first), an 3877 integer (the dimension over which to operate), or a sequence (operate 3878 over multiple dimensions, but this last one just barely makes sense). 3879 3880 Usage: acumsum(a,dimension=None) 3881 """ 3882 if dimension is None: 3883 a = N.ravel(a) 3884 dimension = 0 3885 if type(dimension) in [list, tuple, N.ArrayType]: 3886 dimension = sorted(dimension) 3887 dimension.reverse() 3888 for d in dimension: 3889 a = N.add.accumulate(a, d) 3890 return a 3891 else: 3892 return N.add.accumulate(a, dimension) 3893 3894 def ass(inarray, dimension=None, keepdims=0): 3895 """ 3896 Squares each value in the passed array, adds these squares & returns 3897 the result. Unfortunate function name. :-) Defaults to ALL values in 3898 the array. Dimension can equal None (ravel array first), an integer 3899 (the dimension over which to operate), or a sequence (operate over 3900 multiple dimensions). Set keepdims=1 to maintain the original number 3901 of dimensions. 3902 3903 Usage: ass(inarray, dimension=None, keepdims=0) 3904 Returns: sum-along-'dimension' for (inarray*inarray) 3905 """ 3906 if dimension is None: 3907 inarray = N.ravel(inarray) 3908 dimension = 0 3909 return asum(inarray*inarray, dimension, keepdims) 3910 3911 def asummult(array1, array2, dimension=None, keepdims=0): 3912 """ 3913 Multiplies elements in array1 and array2, element by element, and 3914 returns the sum (along 'dimension') of all resulting multiplications. 3915 Dimension can equal None (ravel array first), an integer (the 3916 dimension over which to operate), or a sequence (operate over multiple 3917 dimensions). A trivial function, but included for completeness. 3918 3919 Usage: asummult(array1,array2,dimension=None,keepdims=0) 3920 """ 3921 if dimension is None: 3922 array1 = N.ravel(array1) 3923 array2 = N.ravel(array2) 3924 dimension = 0 3925 return asum(array1*array2, dimension, keepdims) 3926 3927 def asquare_of_sums(inarray, dimension=None, keepdims=0): 3928 """ 3929 Adds the values in the passed array, squares that sum, and returns the 3930 result. Dimension can equal None (ravel array first), an integer (the 3931 dimension over which to operate), or a sequence (operate over multiple 3932 dimensions). If keepdims=1, the returned array will have the same 3933 NUMBER of dimensions as the original. 3934 3935 Usage: asquare_of_sums(inarray, dimension=None, keepdims=0) 3936 Returns: the square of the sum over dim(s) in dimension 3937 """ 3938 if dimension is None: 3939 inarray = N.ravel(inarray) 3940 dimension = 0 3941 s = asum(inarray, dimension, keepdims) 3942 if isinstance(s, N.ArrayType): 3943 return s.astype(N.Float)*s 3944 else: 3945 return float(s)*s 3946 3947 def asumdiffsquared(a, b, dimension=None, keepdims=0): 3948 """ 3949 Takes pairwise differences of the values in arrays a and b, squares 3950 these differences, and returns the sum of these squares. Dimension 3951 can equal None (ravel array first), an integer (the dimension over 3952 which to operate), or a sequence (operate over multiple dimensions). 3953 keepdims=1 means the return shape = len(a.shape) = len(b.shape) 3954 3955 Usage: asumdiffsquared(a,b) 3956 Returns: sum[ravel(a-b)**2] 3957 """ 3958 if dimension is None: 3959 N.ravel(a) # inarray 3960 dimension = 0 3961 return asum((a-b)**2, dimension, keepdims) 3962 3963 def ashellsort(inarray): 3964 """ 3965 Shellsort algorithm. Sorts a 1D-array. 3966 3967 Usage: ashellsort(inarray) 3968 Returns: sorted-inarray, sorting-index-vector (for original array) 3969 """ 3970 n = len(inarray) 3971 svec = inarray * 1.0 3972 ivec = list(range(n)) 3973 gap = n/2 # integer division needed 3974 while gap > 0: 3975 for i in range(gap, n): 3976 for j in range(i-gap, -1, -gap): 3977 while j >= 0 and svec[j] > svec[j+gap]: 3978 temp = svec[j] 3979 svec[j] = svec[j+gap] 3980 svec[j+gap] = temp 3981 itemp = ivec[j] 3982 ivec[j] = ivec[j+gap] 3983 ivec[j+gap] = itemp 3984 gap = gap / 2 # integer division needed 3985# svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]] 3986 return svec, ivec 3987 3988 def arankdata(inarray): 3989 """ 3990 Ranks the data in inarray, dealing with ties appropritely. Assumes 3991 a 1D inarray. Adapted from Gary Perlman's |Stat ranksort. 3992 3993 Usage: arankdata(inarray) 3994 Returns: array of length equal to inarray, containing rank scores 3995 """ 3996 n = len(inarray) 3997 svec, ivec = ashellsort(inarray) 3998 sumranks = 0 3999 dupcount = 0 4000 newarray = N.zeros(n, N.Float) 4001 for i in range(n): 4002 sumranks = sumranks + i 4003 dupcount = dupcount + 1 4004 if i == n-1 or svec[i] != svec[i+1]: 4005 averank = sumranks / float(dupcount) + 1 4006 for j in range(i-dupcount+1, i+1): 4007 newarray[ivec[j]] = averank 4008 sumranks = 0 4009 dupcount = 0 4010 return newarray 4011 4012 def afindwithin(data): 4013 """ 4014 Returns a binary vector, 1=within-subject factor, 0=between. Input 4015 equals the entire data array (i.e., column 1=random factor, last 4016 column = measured values. 4017 4018 Usage: afindwithin(data) data in |Stat format 4019 """ 4020 numfact = len(data[0])-2 4021 withinvec = [0]*numfact 4022 for col in range(1, numfact+1): 4023 rows = pstat.linexand(data, col, pstat.unique(pstat.colex(data, 1))[0]) # get 1 level of this factor 4024 if len(pstat.unique(pstat.colex(rows, 0))) < len(rows): # if fewer subjects than scores on this factor 4025 withinvec[col-1] = 1 4026 return withinvec 4027 4028 # RE-DEFINE DISPATCHES TO INCLUDE ARRAYS 4029 4030# CENTRAL TENDENCY: 4031 geometricmean = Dispatch((lgeometricmean, (list, tuple)), (ageometricmean, (N.ArrayType,))) 4032 harmonicmean = Dispatch((lharmonicmean, (list, tuple)), (aharmonicmean, (N.ArrayType,))) 4033 mean = Dispatch((lmean, (list, tuple)), (amean, (N.ArrayType,))) 4034 median = Dispatch((lmedian, (list, tuple)), (amedian, (N.ArrayType,))) 4035 medianscore = Dispatch((lmedianscore, (list, tuple)), (amedianscore, (N.ArrayType,))) 4036 mode = Dispatch((lmode, (list, tuple)), (amode, (N.ArrayType,))) 4037 tmean = Dispatch((atmean, (N.ArrayType,))) 4038 tvar = Dispatch((atvar, (N.ArrayType,))) 4039 tstdev = Dispatch((atstdev, (N.ArrayType,))) 4040 tsem = Dispatch((atsem, (N.ArrayType,))) 4041 4042# VARIATION: 4043 moment = Dispatch((lmoment, (list, tuple)), (amoment, (N.ArrayType,))) 4044 variation = Dispatch((lvariation, (list, tuple)), (avariation, (N.ArrayType,))) 4045 skew = Dispatch((lskew, (list, tuple)), (askew, (N.ArrayType,))) 4046 kurtosis = Dispatch((lkurtosis, (list, tuple)), (akurtosis, (N.ArrayType,))) 4047 describe = Dispatch((ldescribe, (list, tuple)), (adescribe, (N.ArrayType,))) 4048 4049# DISTRIBUTION TESTS 4050 4051 skewtest = Dispatch((askewtest, (list, tuple)), (askewtest, (N.ArrayType,))) 4052 kurtosistest = Dispatch((akurtosistest, (list, tuple)), (akurtosistest, (N.ArrayType,))) 4053 normaltest = Dispatch((anormaltest, (list, tuple)), (anormaltest, (N.ArrayType,))) 4054 4055# FREQUENCY STATS: 4056 itemfreq = Dispatch((litemfreq, (list, tuple)), (aitemfreq, (N.ArrayType,))) 4057 scoreatpercentile = Dispatch((lscoreatpercentile, (list, tuple)), (ascoreatpercentile, (N.ArrayType,))) 4058 percentileofscore = Dispatch((lpercentileofscore, (list, tuple)), (apercentileofscore, (N.ArrayType,))) 4059 histogram = Dispatch((lhistogram, (list, tuple)), (ahistogram, (N.ArrayType,))) 4060 cumfreq = Dispatch((lcumfreq, (list, tuple)), (acumfreq, (N.ArrayType,))) 4061 relfreq = Dispatch((lrelfreq, (list, tuple)), (arelfreq, (N.ArrayType,))) 4062 4063# VARIABILITY: 4064 obrientransform = Dispatch((lobrientransform, (list, tuple)), (aobrientransform, (N.ArrayType,))) 4065 samplevar = Dispatch((lsamplevar, (list, tuple)), (asamplevar, (N.ArrayType,))) 4066 samplestdev = Dispatch((lsamplestdev, (list, tuple)), (asamplestdev, (N.ArrayType,))) 4067 signaltonoise = Dispatch((asignaltonoise, (N.ArrayType,)),) 4068 var = Dispatch((lvar, (list, tuple)), (avar, (N.ArrayType,))) 4069 stdev = Dispatch((lstdev, (list, tuple)), (astdev, (N.ArrayType,))) 4070 sterr = Dispatch((lsterr, (list, tuple)), (asterr, (N.ArrayType,))) 4071 sem = Dispatch((lsem, (list, tuple)), (asem, (N.ArrayType,))) 4072 z = Dispatch((lz, (list, tuple)), (az, (N.ArrayType,))) 4073 zs = Dispatch((lzs, (list, tuple)), (azs, (N.ArrayType,))) 4074 4075# TRIMMING FCNS: 4076 threshold = Dispatch((athreshold, (N.ArrayType,)),) 4077 trimboth = Dispatch((ltrimboth, (list, tuple)), (atrimboth, (N.ArrayType,))) 4078 trim1 = Dispatch((ltrim1, (list, tuple)), (atrim1, (N.ArrayType,))) 4079 4080# CORRELATION FCNS: 4081 paired = Dispatch((lpaired, (list, tuple)), (apaired, (N.ArrayType,))) 4082 pearsonr = Dispatch((lpearsonr, (list, tuple)), (apearsonr, (N.ArrayType,))) 4083 spearmanr = Dispatch((lspearmanr, (list, tuple)), (aspearmanr, (N.ArrayType,))) 4084 pointbiserialr = Dispatch((lpointbiserialr, (list, tuple)), (apointbiserialr, (N.ArrayType,))) 4085 kendalltau = Dispatch((lkendalltau, (list, tuple)), (akendalltau, (N.ArrayType,))) 4086 linregress = Dispatch((llinregress, (list, tuple)), (alinregress, (N.ArrayType,))) 4087 4088# INFERENTIAL STATS: 4089 ttest_1samp = Dispatch((lttest_1samp, (list, tuple)), (attest_1samp, (N.ArrayType,))) 4090 ttest_ind = Dispatch((lttest_ind, (list, tuple)), (attest_ind, (N.ArrayType,))) 4091 ttest_rel = Dispatch((lttest_rel, (list, tuple)), (attest_rel, (N.ArrayType,))) 4092 chisquare = Dispatch((lchisquare, (list, tuple)), (achisquare, (N.ArrayType,))) 4093 ks_2samp = Dispatch((lks_2samp, (list, tuple)), (aks_2samp, (N.ArrayType,))) 4094 mannwhitneyu = Dispatch((lmannwhitneyu, (list, tuple)), (amannwhitneyu, (N.ArrayType,))) 4095 tiecorrect = Dispatch((ltiecorrect, (list, tuple)), (atiecorrect, (N.ArrayType,))) 4096 ranksums = Dispatch((lranksums, (list, tuple)), (aranksums, (N.ArrayType,))) 4097 wilcoxont = Dispatch((lwilcoxont, (list, tuple)), (awilcoxont, (N.ArrayType,))) 4098 kruskalwallish = Dispatch((lkruskalwallish, (list, tuple)), (akruskalwallish, (N.ArrayType,))) 4099 friedmanchisquare = Dispatch((lfriedmanchisquare, (list, tuple)), (afriedmanchisquare, (N.ArrayType,))) 4100 4101# PROBABILITY CALCS: 4102 chisqprob = Dispatch((lchisqprob, (int, float)), (achisqprob, (N.ArrayType,))) 4103 zprob = Dispatch((lzprob, (int, float)), (azprob, (N.ArrayType,))) 4104 ksprob = Dispatch((lksprob, (int, float)), (aksprob, (N.ArrayType,))) 4105 fprob = Dispatch((lfprob, (int, float)), (afprob, (N.ArrayType,))) 4106 betacf = Dispatch((lbetacf, (int, float)), (abetacf, (N.ArrayType,))) 4107 betai = Dispatch((lbetai, (int, float)), (abetai, (N.ArrayType,))) 4108 erfcc = Dispatch((lerfcc, (int, float)), (aerfcc, (N.ArrayType,))) 4109 gammln = Dispatch((lgammln, (int, float)), (agammln, (N.ArrayType,))) 4110 4111# ANOVA FUNCTIONS: 4112 F_oneway = Dispatch((lF_oneway, (list, tuple)), (aF_oneway, (N.ArrayType,))) 4113 F_value = Dispatch((lF_value, (list, tuple)), (aF_value, (N.ArrayType,))) 4114 4115# SUPPORT FUNCTIONS: 4116 incr = Dispatch((lincr, (list, tuple, N.ArrayType)), ) 4117 sum = Dispatch((lsum, (list, tuple)), (asum, (N.ArrayType,))) 4118 cumsum = Dispatch((lcumsum, (list, tuple)), (acumsum, (N.ArrayType,))) 4119 ss = Dispatch((lss, (list, tuple)), (ass, (N.ArrayType,))) 4120 summult = Dispatch((lsummult, (list, tuple)), (asummult, (N.ArrayType,))) 4121 square_of_sums = Dispatch((lsquare_of_sums, (list, tuple)), (asquare_of_sums, (N.ArrayType,))) 4122 sumdiffsquared = Dispatch((lsumdiffsquared, (list, tuple)), (asumdiffsquared, (N.ArrayType,))) 4123 shellsort = Dispatch((lshellsort, (list, tuple)), (ashellsort, (N.ArrayType,))) 4124 rankdata = Dispatch((lrankdata, (list, tuple)), (arankdata, (N.ArrayType,))) 4125 findwithin = Dispatch((lfindwithin, (list, tuple)), (afindwithin, (N.ArrayType,))) 4126 4127# END OF NUMERIC FUNCTION BLOCK 4128 4129# END OF STATISTICAL FUNCTIONS 4130 4131except ImportError: 4132 pass 4133