1.. module:: statsmodels.stats
2   :synopsis: Statistical methods and tests
3
4.. currentmodule:: statsmodels.stats
5
6.. _stats:
7
8
9Statistics :mod:`stats`
10=======================
11
This section collects various statistical tests and tools.
Some can be used independently of any models; others are intended as extensions to the
models and model results.
15
API Warning: The functions and objects in this category are spread out in
various modules and might still be moved around. We expect that in the future the
statistical tests will return class instances with more informative reporting
instead of only the raw numbers.
20
21
22.. _stattools:
23
24
25Residual Diagnostics and Specification Tests
26--------------------------------------------
27
28.. module:: statsmodels.stats.stattools
29   :synopsis: Statistical methods and tests that do not fit into other categories
30
31.. currentmodule:: statsmodels.stats.stattools
32
33.. autosummary::
34   :toctree: generated/
35
36   durbin_watson
37   jarque_bera
38   omni_normtest
39   medcouple
40   robust_skewness
41   robust_kurtosis
42   expected_robust_kurtosis
43
44.. module:: statsmodels.stats.diagnostic
45   :synopsis: Statistical methods and tests to diagnose model fit problems
46
47.. currentmodule:: statsmodels.stats.diagnostic
48
49.. autosummary::
50   :toctree: generated/
51
52   acorr_breusch_godfrey
53   acorr_ljungbox
54   acorr_lm
55
56   breaks_cusumolsresid
57   breaks_hansen
58   recursive_olsresiduals
59
60   compare_cox
61   compare_encompassing
62   compare_j
63
64   het_arch
65   het_breuschpagan
66   het_goldfeldquandt
67   het_white
68   spec_white
69
70   linear_harvey_collier
71   linear_lm
72   linear_rainbow
73   linear_reset
74
75
76Outliers and influence measures
77~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
78
79.. module:: statsmodels.stats.outliers_influence
80   :synopsis: Statistical methods and measures for outliers and influence
81
82.. currentmodule:: statsmodels.stats.outliers_influence
83
84.. autosummary::
85   :toctree: generated/
86
87   OLSInfluence
88   GLMInfluence
89   MLEInfluence
90   variance_inflation_factor
91
See also the :ref:`notes on regression diagnostics <diagnostics>`.
93
94Sandwich Robust Covariances
95---------------------------
96
97The following functions calculate covariance matrices and standard errors for
98the parameter estimates that are robust to heteroscedasticity and
99autocorrelation in the errors. Similar to the methods that are available
100for the LinearModelResults, these methods are designed for use with OLS.
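
As a minimal sketch of how the standalone functions relate to the results
attributes, the example below fits an OLS model on simulated heteroscedastic
data and computes HC0 and HAC standard errors. The data, the seed and the
choice of ``nlags=4`` are illustrative assumptions, not recommendations.

.. code-block:: python

   import numpy as np
   import statsmodels.api as sm
   from statsmodels.stats import sandwich_covariance as sw

   # Simulated data (assumption): the error scale grows with |x|,
   # so the errors are heteroscedastic.
   rng = np.random.default_rng(0)
   nobs = 200
   x = rng.normal(size=nobs)
   y = 1.0 + 2.0 * x + rng.normal(size=nobs) * (0.5 + np.abs(x))

   results = sm.OLS(y, sm.add_constant(x)).fit()

   # Heteroscedasticity-robust (White) covariance and its standard errors.
   cov_hc0 = sw.cov_hc0(results)
   se_hc0 = sw.se_cov(cov_hc0)

   # Heteroscedasticity- and autocorrelation-robust covariance with 4 lags.
   cov_hac = sw.cov_hac(results, nlags=4)
   se_hac = sw.se_cov(cov_hac)

Here ``se_hc0`` should agree with the ``HC0_se`` attribute of the OLS results.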
101
102.. currentmodule:: statsmodels.stats
103
104.. autosummary::
105   :toctree: generated/
106
107   sandwich_covariance.cov_hac
108   sandwich_covariance.cov_nw_panel
109   sandwich_covariance.cov_nw_groupsum
110   sandwich_covariance.cov_cluster
111   sandwich_covariance.cov_cluster_2groups
112   sandwich_covariance.cov_white_simple
113
The following are standalone versions of the heteroscedasticity robust
standard errors attached to LinearModelResults:
116
117.. autosummary::
118   :toctree: generated/
119
120   sandwich_covariance.cov_hc0
121   sandwich_covariance.cov_hc1
122   sandwich_covariance.cov_hc2
123   sandwich_covariance.cov_hc3
124
125   sandwich_covariance.se_cov
126
127
128Goodness of Fit Tests and Measures
129----------------------------------
130
Some tests for goodness of fit for univariate distributions.
132
133.. module:: statsmodels.stats.gof
134   :synopsis: Goodness of fit measures and tests
135
136.. currentmodule:: statsmodels.stats.gof
137
138.. autosummary::
139   :toctree: generated/
140
141   powerdiscrepancy
142   gof_chisquare_discrete
143   gof_binning_discrete
144   chisquare_effectsize
145
146.. currentmodule:: statsmodels.stats.diagnostic
147
148.. autosummary::
149   :toctree: generated/
150
151   anderson_statistic
152   normal_ad
153   kstest_exponential
154   kstest_fit
155   kstest_normal
156   lilliefors
157
158Non-Parametric Tests
159--------------------
160
161.. module:: statsmodels.sandbox.stats.runs
162   :synopsis: Experimental statistical methods and tests to analyze runs
163
164.. currentmodule:: statsmodels.sandbox.stats.runs
165
166.. autosummary::
167   :toctree: generated/
168
169   mcnemar
170   symmetry_bowker
171   median_test_ksample
172   runstest_1samp
173   runstest_2samp
174   cochrans_q
175   Runs
176
177.. currentmodule:: statsmodels.stats.descriptivestats
178
179.. autosummary::
180   :toctree: generated/
181
182   sign_test
183
184.. currentmodule:: statsmodels.stats.nonparametric
185
186.. autosummary::
187   :toctree: generated/
188
189   rank_compare_2indep
190   rank_compare_2ordinal
191   RankCompareResult
192   cohensd2problarger
193   prob_larger_continuous
194   rankdata_2samp
195
196
197Descriptive Statistics
198----------------------
199
200.. module:: statsmodels.stats.descriptivestats
201   :synopsis: Descriptive statistics
202
203.. currentmodule:: statsmodels.stats.descriptivestats
204
205.. autosummary::
206   :toctree: generated/
207
208   describe
209   Description
210
211.. _interrater:
212
213Interrater Reliability and Agreement
214------------------------------------
215
The main function that statsmodels currently has available for interrater
agreement measures and tests is Cohen's Kappa. Fleiss' Kappa is currently
only implemented as a measure, without associated results statistics.
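
The difference between the two can be sketched as follows, assuming a small
made-up contingency table for two raters and a made-up ratings array for three
raters. ``cohens_kappa`` returns a results instance, while ``fleiss_kappa``
returns only the statistic.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.inter_rater import (
       aggregate_raters,
       cohens_kappa,
       fleiss_kappa,
   )

   # Two raters, two categories (illustrative counts):
   # rows are the first rater, columns the second rater.
   table = np.array([[20, 5],
                     [10, 15]])
   res = cohens_kappa(table)
   # Observed agreement 0.7, chance agreement 0.5: (0.7 - 0.5) / (1 - 0.5)
   kappa = res.kappa

   # Three raters: subjects in rows, raters in columns, categories 0, 1, 2.
   ratings = np.array([[0, 0, 0],
                       [1, 1, 1],
                       [2, 2, 1],
                       [0, 1, 0],
                       [2, 2, 2],
                       [1, 1, 0]])
   # Convert raw ratings to counts per subject and category.
   counts, categories = aggregate_raters(ratings)
   fk = fleiss_kappa(counts)

The results instance ``res`` also carries inferential statistics for Cohen's
Kappa; ``fk`` is a plain number.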
219
220.. module:: statsmodels.stats.inter_rater
221.. currentmodule:: statsmodels.stats.inter_rater
222
223.. autosummary::
224   :toctree: generated/
225
226   cohens_kappa
227   fleiss_kappa
228   to_table
229   aggregate_raters
230
231Multiple Tests and Multiple Comparison Procedures
232-------------------------------------------------
233
`multipletests` is a function for p-value correction; its methods include the
FDR-based corrections that are also available directly in `fdrcorrection`.
`tukeyhsd` performs simultaneous testing for the comparison of (independent) means.
These three functions are verified.
GroupsStats and MultiComparison are convenience classes for multiple comparisons, similar
to one-way ANOVA, but they are still in development.
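
A short sketch of the verified functions, using made-up p-values and simulated
group data (both are assumptions for illustration only):

.. code-block:: python

   import numpy as np
   from statsmodels.stats.multicomp import pairwise_tukeyhsd
   from statsmodels.stats.multitest import fdrcorrection, multipletests

   pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.27])

   # Bonferroni: each p-value is multiplied by the number of tests, capped at 1.
   reject_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05,
                                             method="bonferroni")

   # Benjamini-Hochberg FDR, once through multipletests and once directly.
   reject_bh, p_bh, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
   reject_fdr, p_fdr = fdrcorrection(pvals, alpha=0.05)

   # Tukey HSD for all pairwise comparisons of three simulated groups.
   rng = np.random.default_rng(0)
   values = np.concatenate([rng.normal(0.0, 1.0, size=20),
                            rng.normal(0.0, 1.0, size=20),
                            rng.normal(3.0, 1.0, size=20)])
   groups = np.repeat(["a", "b", "c"], 20)
   tukey = pairwise_tukeyhsd(values, groups, alpha=0.05)

Calling ``tukey.summary()`` gives a table with one row per pair of groups.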
240
241.. module:: statsmodels.sandbox.stats.multicomp
242   :synopsis: Experimental methods for controlling size while performing multiple comparisons
243
244
245.. currentmodule:: statsmodels.stats.multitest
246
247.. autosummary::
248   :toctree: generated/
249
250   multipletests
251   fdrcorrection
252
253.. currentmodule:: statsmodels.sandbox.stats.multicomp
254
255.. autosummary::
256   :toctree: generated/
257
258   GroupsStats
259   MultiComparison
260   TukeyHSDResults
261
262.. module:: statsmodels.stats.multicomp
263   :synopsis: Methods for controlling size while performing multiple comparisons
264
265.. currentmodule:: statsmodels.stats.multicomp
266
267.. autosummary::
268   :toctree: generated/
269
270   pairwise_tukeyhsd
271
272.. module:: statsmodels.stats.multitest
273   :synopsis: Multiple testing p-value and FDR adjustments
274
275.. currentmodule:: statsmodels.stats.multitest
276
277.. autosummary::
278   :toctree: generated/
279
280   local_fdr
281   fdrcorrection_twostage
282   NullDistribution
283   RegressionFDR
284
285.. module:: statsmodels.stats.knockoff_regeffects
286   :synopsis: Regression Knock-Off Effects
287
288.. currentmodule:: statsmodels.stats.knockoff_regeffects
289
290.. autosummary::
291   :toctree: generated/
292
293   CorrelationEffects
294   OLSEffects
295   ForwardEffects
297   RegModelEffects
298
The following functions are not (yet) public:
300
301.. currentmodule:: statsmodels.sandbox.stats.multicomp
302
303.. autosummary::
304   :toctree: generated/
305
306   varcorrection_pairs_unbalanced
307   varcorrection_pairs_unequal
308   varcorrection_unbalanced
309   varcorrection_unequal
310
311   StepDown
312   catstack
313   ccols
314   compare_ordered
315   distance_st_range
316   ecdf
317   get_tukeyQcrit
318   homogeneous_subsets
319   maxzero
320   maxzerodown
321   mcfdr
322   qcrit
323   randmvn
324   rankdata
325   rejectionline
326   set_partition
327   set_remove_subs
328   tiecorrect
329
330.. _tost:
331
332Basic Statistics and t-Tests with frequency weights
333---------------------------------------------------
334
335Besides basic statistics, like mean, variance, covariance and correlation for
336data with case weights, the classes here provide one and two sample tests
337for means. The t-tests have more options than those in scipy.stats, but are
338more restrictive in the shape of the arrays. Confidence intervals for means
339are provided based on the same assumptions as the t-tests.
340
341Additionally, tests for equivalence of means are available for one sample and
342for two, either paired or independent, samples. These tests are based on TOST,
343two one-sided tests, which have as null hypothesis that the means are not
344"close" to each other.
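
The following sketch illustrates both ideas under simple assumptions: made-up
values with frequency weights, and two simulated samples with equivalence
bounds of -1 and 1 chosen only for illustration.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.weightstats import DescrStatsW, ttest_ind, ttost_ind

   # Frequency weights: each value occurs ``freq`` times.
   values = np.array([1.0, 2.0, 3.0, 4.0])
   freq = np.array([3, 1, 2, 4])
   d = DescrStatsW(values, weights=freq)

   # The weighted statistics match those of the expanded sample.
   expanded = np.repeat(values, freq)
   mean_w, var_w = d.mean, d.var

   # Two independent simulated samples.
   rng = np.random.default_rng(12345)
   x1 = rng.normal(loc=0.0, size=50)
   x2 = rng.normal(loc=0.1, size=50)

   # Standard two-sample t-test (pooled variance by default).
   tstat, pvalue, df = ttest_ind(x1, x2)

   # TOST: the null hypothesis is that the difference in means lies
   # outside the interval (-1, 1).
   p_tost, lower, upper = ttost_ind(x1, x2, -1.0, 1.0)

``lower`` and ``upper`` hold the results of the two one-sided tests, and
``p_tost`` is the larger of their two p-values.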
345
346.. module:: statsmodels.stats.weightstats
347   :synopsis: Weighted statistics
348
349.. currentmodule:: statsmodels.stats.weightstats
350
351.. autosummary::
352   :toctree: generated/
353
354   DescrStatsW
355   CompareMeans
356   ttest_ind
357   ttost_ind
358   ttost_paired
359   ztest
360   ztost
361   zconfint
362
weightstats also contains tests and confidence intervals based on summary
data.
365
366.. currentmodule:: statsmodels.stats.weightstats
367
368.. autosummary::
369   :toctree: generated/
370
371   _tconfint_generic
372   _tstat_generic
373   _zconfint_generic
374   _zstat_generic
375   _zstat_generic2
376
377
378Power and Sample Size Calculations
379----------------------------------
380
The :mod:`power` module currently implements power and sample size calculations
for t-tests, normal-based tests, F-tests and the chi-square goodness of fit test.
The implementation is class based, but the module also provides
three shortcut functions, ``tt_solve_power``, ``tt_ind_solve_power`` and
``zt_ind_solve_power``, to solve for any one of the parameters of the power
equations.
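
As a sketch of how the shortcut functions work, the parameter that is left out
(or set to ``None``) is the one that is solved for. The effect size of 0.5 and
the other inputs are illustrative assumptions.

.. code-block:: python

   from statsmodels.stats.power import TTestIndPower, tt_ind_solve_power

   # Leave out nobs1 to solve for the sample size of the first group.
   n1 = tt_ind_solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                           ratio=1.0, alternative="two-sided")

   # Leave out power to compute the power for a given sample size.
   power_at_64 = tt_ind_solve_power(effect_size=0.5, nobs1=64, alpha=0.05,
                                    ratio=1.0, alternative="two-sided")

   # The class-based interface evaluates the same power function.
   check = TTestIndPower().power(effect_size=0.5, nobs1=n1, alpha=0.05,
                                 ratio=1.0)

The solved sample size is generally not an integer and would be rounded up in
practice.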
387
388
389.. module:: statsmodels.stats.power
390   :synopsis: Power and size calculations for common tests
391
392.. currentmodule:: statsmodels.stats.power
393
394.. autosummary::
395   :toctree: generated/
396
397   TTestIndPower
398   TTestPower
399   GofChisquarePower
400   NormalIndPower
401   FTestAnovaPower
402   FTestPower
403   normal_power_het
404   normal_sample_size_one_tail
405   tt_solve_power
406   tt_ind_solve_power
407   zt_ind_solve_power
408
409
410.. _proportion_stats:
411
412Proportion
413----------
414
Hypothesis tests, confidence intervals and an effect size measure are available
for proportions. The effect size can be used with ``NormalIndPower``.
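
A minimal sketch, assuming made-up counts of 45 and 30 successes out of 100
trials each, that combines a test, a confidence interval and a sample size
calculation based on the effect size:

.. code-block:: python

   import numpy as np
   from statsmodels.stats.power import NormalIndPower
   from statsmodels.stats.proportion import (
       proportion_confint,
       proportion_effectsize,
       proportions_ztest,
   )

   count = np.array([45, 30])
   nobs = np.array([100, 100])

   # z-test for equality of the two proportions.
   zstat, pvalue = proportions_ztest(count, nobs)

   # Wilson confidence interval for the first proportion.
   low, upp = proportion_confint(45, 100, alpha=0.05, method="wilson")

   # Effect size based on the arcsine square root transformation.
   es = proportion_effectsize(0.45, 0.30)

   # Sample size per group for 80% power at this effect size.
   n1 = NormalIndPower().solve_power(effect_size=es, alpha=0.05, power=0.8,
                                     ratio=1.0)

The ``method`` argument of ``proportion_confint`` selects the type of interval;
``"wilson"`` is used here only as an example.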
417
418.. module:: statsmodels.stats.proportion
419   :synopsis: Tests for proportions
420
421.. currentmodule:: statsmodels.stats.proportion
422
423.. autosummary::
424   :toctree: generated
425
426   proportion_confint
427   proportion_effectsize
428
429   binom_test
430   binom_test_reject_interval
431   binom_tost
432   binom_tost_reject_interval
433
434   multinomial_proportions_confint
435
436   proportions_ztest
437   proportions_ztost
438   proportions_chisquare
439   proportions_chisquare_allpairs
440   proportions_chisquare_pairscontrol
441
443   power_binom_tost
444   power_ztost_prop
445   samplesize_confint_proportion
446
Statistics for two independent samples.

Status: experimental, API might change, added in 0.12
449
450.. autosummary::
451   :toctree: generated
452
453   test_proportions_2indep
454   confint_proportions_2indep
455   power_proportions_2indep
456   tost_proportions_2indep
457   samplesize_proportions_2indep_onetail
458   score_test_proportions_2indep
459   _score_confint_inversion
460
461
462Rates
463-----
464
465Statistical functions for rates. This currently includes hypothesis tests for
466two independent samples.
467
468Status: experimental, API might change, added in 0.12
469
470.. module:: statsmodels.stats.rates
471   :synopsis: Tests for Poisson rates
472
473.. currentmodule:: statsmodels.stats.rates
474
475.. autosummary::
476   :toctree: generated
477
478   test_poisson_2indep
479   etest_poisson_2indep
480   tost_poisson_2indep
481
482
483Multivariate
484------------
485
486Statistical functions for multivariate samples.
487
This includes hypothesis tests and confidence intervals for the mean of a sample
of multivariate observations and hypothesis tests for the structure of a
covariance matrix.
491
492Status: experimental, API might change, added in 0.12
493
494.. module:: statsmodels.stats.multivariate
495   :synopsis: Statistical functions for multivariate samples.
496
497.. currentmodule:: statsmodels.stats.multivariate
498
499.. autosummary::
500   :toctree: generated
501
502   test_mvmean
503   confint_mvmean
504   confint_mvmean_fromstats
505   test_mvmean_2indep
506   test_cov
507   test_cov_blockdiagonal
508   test_cov_diagonal
509   test_cov_oneway
510   test_cov_spherical
511
512
513.. _oneway_stats:
514
515Oneway Anova
516------------
517
Hypothesis tests, confidence intervals and effect sizes for oneway analysis of
k samples.
520
521Status: experimental, API might change, added in 0.12
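
A minimal sketch with three simulated groups (the data are an assumption for
illustration) that compares the standard ANOVA, which assumes equal variances,
with the Welch version for unequal variances:

.. code-block:: python

   import numpy as np
   from scipy import stats
   from statsmodels.stats.oneway import anova_oneway

   rng = np.random.default_rng(0)
   g1 = rng.normal(0.0, 1.0, size=30)
   g2 = rng.normal(0.5, 1.0, size=25)
   g3 = rng.normal(1.0, 2.0, size=35)

   # Standard ANOVA F-test assuming equal variances across groups.
   res_equal = anova_oneway([g1, g2, g3], use_var="equal")

   # Welch ANOVA, which allows unequal variances.
   res_welch = anova_oneway([g1, g2, g3], use_var="unequal")

   # Reference values for the equal variance case from scipy.
   f_scipy, p_scipy = stats.f_oneway(g1, g2, g3)

The equal variance result should agree with ``scipy.stats.f_oneway``; both
results expose ``statistic`` and ``pvalue`` attributes.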
522
523.. module:: statsmodels.stats.oneway
524   :synopsis: Statistical functions for oneway analysis, Anova.
525
526.. currentmodule:: statsmodels.stats.oneway
527
528.. autosummary::
529   :toctree: generated
530
531
532   anova_oneway
533   anova_generic
534   equivalence_oneway
535   equivalence_oneway_generic
536   power_equivalence_oneway
537   _power_equivalence_oneway_emp
538
539   test_scale_oneway
540   equivalence_scale_oneway
541
542   confint_effectsize_oneway
543   confint_noncentrality
544   convert_effectsize_fsqu
545   effectsize_oneway
546   f2_to_wellek
547   fstat_to_wellek
548   wellek_to_f2
549   _fstat2effectsize
550
551   scale_transform
552   simulate_power_equivalence_oneway
553
554
555.. _robust_stats:
556
557Robust, Trimmed Statistics
558--------------------------
559
Statistics for samples that are trimmed at a fixed fraction. This includes
the class ``TrimmedMean`` for one sample statistics. It is used in `stats.oneway`
for trimmed "Yuen" Anova.
563
564Status: experimental, API might change, added in 0.12
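
As a small sketch with made-up data containing one outlier, trimming 10% of
the observations from each tail gives the following:

.. code-block:: python

   import numpy as np
   from scipy import stats
   from statsmodels.stats.robust_compare import TrimmedMean

   # Ten observations; the last one is an outlier (illustrative data).
   data = np.array([2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 3.6, 25.0])

   # Trim 10% of the observations from each tail, here one per tail.
   tm = TrimmedMean(data, 0.1)

   trimmed = tm.mean_trimmed        # mean of the 8 central observations
   winsorized = tm.mean_winsorized  # tails replaced by the nearest kept value
   plain = data.mean()              # pulled upward by the outlier

The trimmed mean here agrees with ``scipy.stats.trim_mean(data, 0.1)``.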
565
566.. module:: statsmodels.stats.robust_compare
567   :synopsis: Trimmed sample statistics.
568
569.. currentmodule:: statsmodels.stats.robust_compare
570
571.. autosummary::
572   :toctree: generated
573
574   TrimmedMean
575   scale_transform
576   trim_mean
577   trimboth
578
579
580Moment Helpers
581--------------
582
When there are missing values, an estimated correlation or covariance matrix
might not be positive semi-definite. The following
functions can be used to find a correlation or covariance matrix that is
positive definite and close to the original matrix.
Additional functions estimate a spatial covariance matrix and a regularized
inverse covariance (precision) matrix.
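
A minimal sketch, assuming a made-up 3 x 3 matrix that looks like a correlation
matrix but has a negative eigenvalue. The threshold of ``1e-8`` is an
illustrative choice.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.correlation_tools import corr_clipped, corr_nearest

   # Symmetric with unit diagonal, but not positive semi-definite.
   corr = np.array([[1.0, 0.9, -0.9],
                    [0.9, 1.0, 0.9],
                    [-0.9, 0.9, 1.0]])
   eig_before = np.linalg.eigvalsh(corr)

   # Clip the eigenvalues at the threshold and rescale to a unit diagonal.
   clipped = corr_clipped(corr, threshold=1e-8)

   # Iterative search for a close correlation matrix.
   near = corr_nearest(corr, threshold=1e-8, n_fact=100)

   eig_clipped = np.linalg.eigvalsh(clipped)
   eig_near = np.linalg.eigvalsh(near)

``corr_clipped`` is a single eigenvalue adjustment, while ``corr_nearest``
iterates and typically stays closer to the original matrix.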
589
590.. module:: statsmodels.stats.correlation_tools
591   :synopsis: Procedures for ensuring correlations are positive semi-definite
592
593.. currentmodule:: statsmodels.stats.correlation_tools
594
595.. autosummary::
596   :toctree: generated/
597
598   corr_clipped
599   corr_nearest
600   corr_nearest_factor
601   corr_thresholded
602   cov_nearest
603   cov_nearest_factor_homog
604   FactoredPSDMatrix
605   kernel_covariance
606
607.. currentmodule:: statsmodels.stats.regularized_covariance
608
609.. autosummary::
610   :toctree: generated/
611
612   RegularizedInvCovariance
613
These are utility functions to convert between central and non-central moments, skew,
kurtosis and cumulants.
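
Two of these conversions can be sketched as follows, using a made-up covariance
matrix and the first four central moments of the standard normal distribution:

.. code-block:: python

   import numpy as np
   from statsmodels.stats.moment_helpers import (
       corr2cov,
       cov2corr,
       mc2mvsk,
       mvsk2mc,
   )

   # Covariance to correlation and back.
   cov = np.array([[4.0, 1.2],
                   [1.2, 9.0]])
   corr, std = cov2corr(cov, return_std=True)
   cov_back = corr2cov(corr, std)

   # Central moments (mean, 2nd, 3rd, 4th) of the standard normal
   # to mean, variance, skew and excess kurtosis, and back.
   mvsk = mc2mvsk((0.0, 1.0, 0.0, 3.0))
   mc = mvsk2mc(mvsk)

The kurtosis returned by ``mc2mvsk`` is excess kurtosis, so it is zero for the
normal distribution.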
616
617.. module:: statsmodels.stats.moment_helpers
618   :synopsis: Tools for converting moments
619
620.. currentmodule:: statsmodels.stats.moment_helpers
621
622.. autosummary::
623   :toctree: generated/
624
625   cum2mc
626   mc2mnc
627   mc2mvsk
628   mnc2cum
629   mnc2mc
630   mnc2mvsk
631   mvsk2mc
632   mvsk2mnc
633   cov2corr
634   corr2cov
635   se_cov
636
637
638Mediation Analysis
639------------------
640
Mediation analysis focuses on the relationships among three key variables:
an 'outcome', a 'treatment', and a 'mediator'. Since mediation analysis is a
form of causal inference, there are several assumptions involved that are
difficult or impossible to verify. Ideally, mediation analysis is conducted in
the context of an experiment in which the treatment is
randomly assigned. It is also common for people to conduct mediation analyses
using observational data in which the treatment may be thought of as an
'exposure'. The assumptions behind mediation analysis are even more difficult
to verify in an observational setting.
650
651.. module:: statsmodels.stats.mediation
652   :synopsis: Mediation analysis
653
654.. currentmodule:: statsmodels.stats.mediation
655
656.. autosummary::
657   :toctree: generated/
658
659   Mediation
660   MediationResults
661
662
663Oaxaca-Blinder Decomposition
664----------------------------
665
The Oaxaca-Blinder decomposition, also called the Blinder-Oaxaca decomposition, attempts
to explain gaps in the means of groups. It uses the linear models of two given regression
equations to show what is explained by regression coefficients and known data and what is
unexplained using the same data. There are two types of Oaxaca-Blinder decompositions, the
two-fold and the three-fold, both of which are used in the economics literature to discuss
differences between groups. This method helps classify discrimination or unobserved effects.
The implementation attempts to port the functionality of the oaxaca command in Stata to Python.
673
674.. module:: statsmodels.stats.oaxaca
675   :synopsis: Oaxaca-Blinder Decomposition
676
677.. currentmodule:: statsmodels.stats.oaxaca
678
679.. autosummary::
680   :toctree: generated/
681
682   OaxacaBlinder
683   OaxacaResults
684
685
686Distance Dependence Measures
687----------------------------
688
689Distance dependence measures and the Distance Covariance (dCov) test.
690
691.. module:: statsmodels.stats.dist_dependence_measures
692   :synopsis: Distance Dependence Measures
693
694.. currentmodule:: statsmodels.stats.dist_dependence_measures
695
696.. autosummary::
697   :toctree: generated/
698
699   distance_covariance_test
700   distance_statistics
701   distance_correlation
702   distance_covariance
703   distance_variance
704
705
706Meta-Analysis
707-------------
708
709Functions for basic meta-analysis of a collection of sample statistics.
710
711Examples can be found in the notebook
712
713 * `Meta-Analysis <examples/notebooks/generated/metaanalysis1.html>`__
714
715Status: experimental, API might change, added in 0.12
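
As a minimal sketch, assuming made-up effect sizes and variances for four
studies, the fixed effect estimate from ``combine_effects`` is the inverse
variance weighted mean of the study effects:

.. code-block:: python

   import numpy as np
   from statsmodels.stats.meta_analysis import combine_effects

   # Illustrative effect sizes and their variances for four studies.
   eff = np.array([0.30, 0.10, 0.25, 0.40])
   var_eff = np.array([0.04, 0.02, 0.05, 0.03])

   res = combine_effects(eff, var_eff)

   # Fixed effect estimate computed by statsmodels and by hand.
   fixed = res.mean_effect_fe
   manual = np.sum(eff / var_eff) / np.sum(1.0 / var_eff)

   # Table with the individual studies and the combined estimates.
   frame = res.summary_frame()

The random effects estimate additionally depends on the estimated between-study
variance.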
716
717.. module:: statsmodels.stats.meta_analysis
718   :synopsis: Meta-Analysis
719
720.. currentmodule:: statsmodels.stats.meta_analysis
721
722.. autosummary::
723   :toctree: generated/
724
725   combine_effects
726   effectsize_2proportions
727   effectsize_smd
728   CombineResults
729
730The module also includes internal functions to compute random effects
731variance.
732
733
734.. autosummary::
735   :toctree: generated/
736
737   _fit_tau_iter_mm
738   _fit_tau_iterative
739   _fit_tau_mm
740