.. module:: statsmodels.stats
   :synopsis: Statistical methods and tests

.. currentmodule:: statsmodels.stats

.. _stats:


Statistics :mod:`stats`
=======================

This section collects various statistical tests and tools.
Some can be used independently of any models, some are intended as extensions to the
models and model results.

API Warning: The functions and objects in this category are spread out in
various modules and might still be moved around. We expect that in the future the
statistical tests will return class instances with more informative reporting
instead of only the raw numbers.


.. _stattools:


Residual Diagnostics and Specification Tests
--------------------------------------------

.. module:: statsmodels.stats.stattools
   :synopsis: Statistical methods and tests that do not fit into other categories

.. currentmodule:: statsmodels.stats.stattools

.. autosummary::
   :toctree: generated/

   durbin_watson
   jarque_bera
   omni_normtest
   medcouple
   robust_skewness
   robust_kurtosis
   expected_robust_kurtosis

.. module:: statsmodels.stats.diagnostic
   :synopsis: Statistical methods and tests to diagnose model fit problems

.. currentmodule:: statsmodels.stats.diagnostic

.. autosummary::
   :toctree: generated/

   acorr_breusch_godfrey
   acorr_ljungbox
   acorr_lm

   breaks_cusumolsresid
   breaks_hansen
   recursive_olsresiduals

   compare_cox
   compare_encompassing
   compare_j

   het_arch
   het_breuschpagan
   het_goldfeldquandt
   het_white
   spec_white

   linear_harvey_collier
   linear_lm
   linear_rainbow
   linear_reset


Outliers and influence measures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. module:: statsmodels.stats.outliers_influence
   :synopsis: Statistical methods and measures for outliers and influence

.. currentmodule:: statsmodels.stats.outliers_influence

.. autosummary::
   :toctree: generated/

   OLSInfluence
   GLMInfluence
   MLEInfluence
   variance_inflation_factor

See also the :ref:`notes on regression diagnostics <diagnostics>`.

Sandwich Robust Covariances
---------------------------

The following functions calculate covariance matrices and standard errors for
the parameter estimates that are robust to heteroscedasticity and
autocorrelation in the errors. Similar to the methods that are available
for the LinearModelResults, these methods are designed for use with OLS.

.. currentmodule:: statsmodels.stats

.. autosummary::
   :toctree: generated/

   sandwich_covariance.cov_hac
   sandwich_covariance.cov_nw_panel
   sandwich_covariance.cov_nw_groupsum
   sandwich_covariance.cov_cluster
   sandwich_covariance.cov_cluster_2groups
   sandwich_covariance.cov_white_simple

The following are standalone versions of the heteroscedasticity robust
standard errors attached to LinearModelResults.

.. autosummary::
   :toctree: generated/

   sandwich_covariance.cov_hc0
   sandwich_covariance.cov_hc1
   sandwich_covariance.cov_hc2
   sandwich_covariance.cov_hc3

   sandwich_covariance.se_cov


Goodness of Fit Tests and Measures
----------------------------------

Some tests for goodness of fit for univariate distributions.

.. module:: statsmodels.stats.gof
   :synopsis: Goodness of fit measures and tests

.. currentmodule:: statsmodels.stats.gof

.. autosummary::
   :toctree: generated/

   powerdiscrepancy
   gof_chisquare_discrete
   gof_binning_discrete
   chisquare_effectsize

.. currentmodule:: statsmodels.stats.diagnostic

.. autosummary::
   :toctree: generated/

   anderson_statistic
   normal_ad
   kstest_exponential
   kstest_fit
   kstest_normal
   lilliefors

Non-Parametric Tests
--------------------

.. module:: statsmodels.sandbox.stats.runs
   :synopsis: Experimental statistical methods and tests to analyze runs

.. currentmodule:: statsmodels.sandbox.stats.runs

.. autosummary::
   :toctree: generated/

   mcnemar
   symmetry_bowker
   median_test_ksample
   runstest_1samp
   runstest_2samp
   cochrans_q
   Runs

.. currentmodule:: statsmodels.stats.descriptivestats

.. autosummary::
   :toctree: generated/

   sign_test

.. currentmodule:: statsmodels.stats.nonparametric

.. autosummary::
   :toctree: generated/

   rank_compare_2indep
   rank_compare_2ordinal
   RankCompareResult
   cohensd2problarger
   prob_larger_continuous
   rankdata_2samp


Descriptive Statistics
----------------------

.. module:: statsmodels.stats.descriptivestats
   :synopsis: Descriptive statistics

.. currentmodule:: statsmodels.stats.descriptivestats

.. autosummary::
   :toctree: generated/

   describe
   Description

.. _interrater:

Interrater Reliability and Agreement
------------------------------------

The main function that statsmodels has currently available for interrater
agreement measures and tests is Cohen's Kappa. Fleiss' Kappa is currently
only implemented as a measure, without associated results statistics.

.. module:: statsmodels.stats.inter_rater
.. currentmodule:: statsmodels.stats.inter_rater

.. autosummary::
   :toctree: generated/

   cohens_kappa
   fleiss_kappa
   to_table
   aggregate_raters

Multiple Tests and Multiple Comparison Procedures
-------------------------------------------------

`multipletests` is a function for p-value correction, which also includes p-value
correction based on FDR in `fdrcorrection`.
`tukeyhsd` performs simultaneous testing for the comparison of (independent) means.
These three functions are verified.
GroupsStats and MultiComparison are convenience classes for multiple comparisons similar
to one-way ANOVA, but are still in development.

.. module:: statsmodels.sandbox.stats.multicomp
   :synopsis: Experimental methods for controlling size while performing multiple comparisons


.. currentmodule:: statsmodels.stats.multitest

.. autosummary::
   :toctree: generated/

   multipletests
   fdrcorrection

.. currentmodule:: statsmodels.sandbox.stats.multicomp

.. autosummary::
   :toctree: generated/

   GroupsStats
   MultiComparison
   TukeyHSDResults

.. module:: statsmodels.stats.multicomp
   :synopsis: Methods for controlling size while performing multiple comparisons

.. currentmodule:: statsmodels.stats.multicomp

.. autosummary::
   :toctree: generated/

   pairwise_tukeyhsd

.. module:: statsmodels.stats.multitest
   :synopsis: Multiple testing p-value and FDR adjustments

.. currentmodule:: statsmodels.stats.multitest

.. autosummary::
   :toctree: generated/

   local_fdr
   fdrcorrection_twostage
   NullDistribution
   RegressionFDR

.. module:: statsmodels.stats.knockoff_regeffects
   :synopsis: Regression Knock-Off Effects

.. currentmodule:: statsmodels.stats.knockoff_regeffects

.. autosummary::
   :toctree: generated/

   CorrelationEffects
   OLSEffects
   ForwardEffects
   RegModelEffects

The following functions are not (yet) public.

.. currentmodule:: statsmodels.sandbox.stats.multicomp

.. autosummary::
   :toctree: generated/

   varcorrection_pairs_unbalanced
   varcorrection_pairs_unequal
   varcorrection_unbalanced
   varcorrection_unequal

   StepDown
   catstack
   ccols
   compare_ordered
   distance_st_range
   ecdf
   get_tukeyQcrit
   homogeneous_subsets
   maxzero
   maxzerodown
   mcfdr
   qcrit
   randmvn
   rankdata
   rejectionline
   set_partition
   set_remove_subs
   tiecorrect

.. _tost:

Basic Statistics and t-Tests with frequency weights
---------------------------------------------------

Besides basic statistics, like mean, variance, covariance and correlation for
data with case weights, the classes here provide one and two sample tests
for means. The t-tests have more options than those in scipy.stats, but are
more restrictive in the shape of the arrays. Confidence intervals for means
are provided based on the same assumptions as the t-tests.

Additionally, tests for equivalence of means are available for one sample and
for two, either paired or independent, samples. These tests are based on TOST,
two one-sided tests, which have as null hypothesis that the means are not
"close" to each other.

.. module:: statsmodels.stats.weightstats
   :synopsis: Weighted statistics

.. currentmodule:: statsmodels.stats.weightstats

.. autosummary::
   :toctree: generated/

   DescrStatsW
   CompareMeans
   ttest_ind
   ttost_ind
   ttost_paired
   ztest
   ztost
   zconfint

weightstats also contains tests and confidence intervals based on summary
data.

.. currentmodule:: statsmodels.stats.weightstats

.. autosummary::
   :toctree: generated/

   _tconfint_generic
   _tstat_generic
   _zconfint_generic
   _zstat_generic
   _zstat_generic2


Power and Sample Size Calculations
----------------------------------

The :mod:`power` module currently implements power and sample size calculations
for the t-tests, normal-based tests, F-tests and the chi-square goodness-of-fit test.
The implementation is class based, but the module also provides
three shortcut functions, ``tt_solve_power``, ``tt_ind_solve_power`` and
``zt_ind_solve_power`` to solve for any one of the parameters of the power
equations.


.. module:: statsmodels.stats.power
   :synopsis: Power and size calculations for common tests

.. currentmodule:: statsmodels.stats.power

.. autosummary::
   :toctree: generated/

   TTestIndPower
   TTestPower
   GofChisquarePower
   NormalIndPower
   FTestAnovaPower
   FTestPower
   normal_power_het
   normal_sample_size_one_tail
   tt_solve_power
   tt_ind_solve_power
   zt_ind_solve_power


.. _proportion_stats:

Proportion
----------

Also available are hypothesis tests, confidence intervals and effect sizes for
proportions that can be used with NormalIndPower.

.. module:: statsmodels.stats.proportion
   :synopsis: Tests for proportions

.. currentmodule:: statsmodels.stats.proportion

.. autosummary::
   :toctree: generated

   proportion_confint
   proportion_effectsize

   binom_test
   binom_test_reject_interval
   binom_tost
   binom_tost_reject_interval

   multinomial_proportions_confint

   proportions_ztest
   proportions_ztost
   proportions_chisquare
   proportions_chisquare_allpairs
   proportions_chisquare_pairscontrol

   power_binom_tost
   power_ztost_prop
   samplesize_confint_proportion

Statistics for two independent samples

Status: experimental, API might change, added in 0.12

.. autosummary::
   :toctree: generated

   test_proportions_2indep
   confint_proportions_2indep
   power_proportions_2indep
   tost_proportions_2indep
   samplesize_proportions_2indep_onetail
   score_test_proportions_2indep
   _score_confint_inversion


Rates
-----

Statistical functions for rates. This currently includes hypothesis tests for
two independent samples.

Status: experimental, API might change, added in 0.12

.. module:: statsmodels.stats.rates
   :synopsis: Tests for Poisson rates

.. currentmodule:: statsmodels.stats.rates

.. autosummary::
   :toctree: generated

   test_poisson_2indep
   etest_poisson_2indep
   tost_poisson_2indep


Multivariate
------------

Statistical functions for multivariate samples.

This includes hypothesis tests and confidence intervals for the mean of a sample
of multivariate observations and hypothesis tests for the structure of a
covariance matrix.

Status: experimental, API might change, added in 0.12

.. module:: statsmodels.stats.multivariate
   :synopsis: Statistical functions for multivariate samples.

.. currentmodule:: statsmodels.stats.multivariate

.. autosummary::
   :toctree: generated

   test_mvmean
   confint_mvmean
   confint_mvmean_fromstats
   test_mvmean_2indep
   test_cov
   test_cov_blockdiagonal
   test_cov_diagonal
   test_cov_oneway
   test_cov_spherical


.. _oneway_stats:

Oneway Anova
------------

Hypothesis tests, confidence intervals and effect sizes for oneway analysis of
k samples.

Status: experimental, API might change, added in 0.12

.. module:: statsmodels.stats.oneway
   :synopsis: Statistical functions for oneway analysis, Anova.

.. currentmodule:: statsmodels.stats.oneway

.. autosummary::
   :toctree: generated

   anova_oneway
   anova_generic
   equivalence_oneway
   equivalence_oneway_generic
   power_equivalence_oneway
   _power_equivalence_oneway_emp

   test_scale_oneway
   equivalence_scale_oneway

   confint_effectsize_oneway
   confint_noncentrality
   convert_effectsize_fsqu
   effectsize_oneway
   f2_to_wellek
   fstat_to_wellek
   wellek_to_f2
   _fstat2effectsize

   scale_transform
   simulate_power_equivalence_oneway


.. _robust_stats:

Robust, Trimmed Statistics
--------------------------

Statistics for samples that are trimmed at a fixed fraction. This includes
the class TrimmedMean for one-sample statistics. It is used in `stats.oneway`
for trimmed "Yuen" Anova.

Status: experimental, API might change, added in 0.12

.. module:: statsmodels.stats.robust_compare
   :synopsis: Trimmed sample statistics.

.. currentmodule:: statsmodels.stats.robust_compare

.. autosummary::
   :toctree: generated

   TrimmedMean
   scale_transform
   trim_mean
   trimboth


Moment Helpers
--------------

When there are missing values, a correlation or covariance matrix might not
be positive semi-definite.
The following
functions can be used to find a correlation or covariance matrix that is
positive definite and close to the original matrix.
Additional functions estimate a spatial covariance matrix and a regularized
inverse covariance or precision matrix.

.. module:: statsmodels.stats.correlation_tools
   :synopsis: Procedures for ensuring correlations are positive semi-definite

.. currentmodule:: statsmodels.stats.correlation_tools

.. autosummary::
   :toctree: generated/

   corr_clipped
   corr_nearest
   corr_nearest_factor
   corr_thresholded
   cov_nearest
   cov_nearest_factor_homog
   FactoredPSDMatrix
   kernel_covariance

.. currentmodule:: statsmodels.stats.regularized_covariance

.. autosummary::
   :toctree: generated/

   RegularizedInvCovariance

These are utility functions to convert between central and non-central moments, skew,
kurtosis and cumulants.

.. module:: statsmodels.stats.moment_helpers
   :synopsis: Tools for converting moments

.. currentmodule:: statsmodels.stats.moment_helpers

.. autosummary::
   :toctree: generated/

   cum2mc
   mc2mnc
   mc2mvsk
   mnc2cum
   mnc2mc
   mnc2mvsk
   mvsk2mc
   mvsk2mnc
   cov2corr
   corr2cov
   se_cov


Mediation Analysis
------------------

Mediation analysis focuses on the relationships among three key variables:
an 'outcome', a 'treatment', and a 'mediator'. Since mediation analysis is a
form of causal inference, there are several assumptions involved that are
difficult or impossible to verify. Ideally, mediation analysis is conducted in
the context of an experiment in which the treatment is
randomly assigned. It is also common for people to conduct mediation analyses
using observational data in which the treatment may be thought of as an
'exposure'.
The assumptions behind mediation analysis are even more difficult
to verify in an observational setting.

.. module:: statsmodels.stats.mediation
   :synopsis: Mediation analysis

.. currentmodule:: statsmodels.stats.mediation

.. autosummary::
   :toctree: generated/

   Mediation
   MediationResults


Oaxaca-Blinder Decomposition
----------------------------

The Oaxaca-Blinder (or Blinder-Oaxaca) decomposition attempts to explain
gaps in the means of groups. It uses the linear models of two given regression equations to
show what is explained by regression coefficients and known data and what is unexplained
using the same data. There are two types of Oaxaca-Blinder decompositions, the two-fold
and the three-fold, both of which are used in the economics literature to discuss
differences between groups. This method helps classify discrimination or unobserved effects.
This implementation attempts to port the functionality of the ``oaxaca`` command in Stata to Python.

.. module:: statsmodels.stats.oaxaca
   :synopsis: Oaxaca-Blinder Decomposition

.. currentmodule:: statsmodels.stats.oaxaca

.. autosummary::
   :toctree: generated/

   OaxacaBlinder
   OaxacaResults


Distance Dependence Measures
----------------------------

Distance dependence measures and the Distance Covariance (dCov) test.

.. module:: statsmodels.stats.dist_dependence_measures
   :synopsis: Distance Dependence Measures

.. currentmodule:: statsmodels.stats.dist_dependence_measures

.. autosummary::
   :toctree: generated/

   distance_covariance_test
   distance_statistics
   distance_correlation
   distance_covariance
   distance_variance


Meta-Analysis
-------------

Functions for basic meta-analysis of a collection of sample statistics.
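
As a minimal sketch of how per-study statistics can be pooled, the example
below passes effect estimates and their sampling variances to
``combine_effects`` and compares the fixed-effect estimate with a hand-computed
inverse-variance weighted mean. The study values and names are made up for
illustration, and the attribute ``mean_effect_fe`` is assumed to be available
on the returned ``CombineResults`` instance.

.. code-block:: python

   import numpy as np
   from statsmodels.stats.meta_analysis import combine_effects

   # Hypothetical per-study effect estimates and their sampling variances.
   effect = np.array([0.30, 0.10, 0.45, 0.20])
   variance = np.array([0.04, 0.02, 0.09, 0.03])
   names = ["study A", "study B", "study C", "study D"]

   # method_re selects the estimator for the between-study variance.
   res = combine_effects(effect, variance, method_re="chi2", row_names=names)

   # Fixed-effect pooled estimate: inverse-variance weighted mean.
   weights = 1.0 / variance
   manual_fe = np.sum(weights * effect) / np.sum(weights)

   # Per-study rows plus pooled fixed and random effects estimates.
   print(res.summary_frame())
   print(manual_fe, res.mean_effect_fe)
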

Examples can be found in the notebook

 * `Meta-Analysis <examples/notebooks/generated/metaanalysis1.html>`__

Status: experimental, API might change, added in 0.12

.. module:: statsmodels.stats.meta_analysis
   :synopsis: Meta-Analysis

.. currentmodule:: statsmodels.stats.meta_analysis

.. autosummary::
   :toctree: generated/

   combine_effects
   effectsize_2proportions
   effectsize_smd
   CombineResults

The module also includes internal functions to compute random effects
variance.


.. autosummary::
   :toctree: generated/

   _fit_tau_iter_mm
   _fit_tau_iterative
   _fit_tau_mm
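
To illustrate what a random effects variance estimate is, the sketch below
implements the textbook one-step method-of-moments (DerSimonian-Laird)
estimator in plain Python. It does not call the private helpers listed above
and is not claimed to reproduce their exact behavior (for example, the
truncation at zero is a choice made here); the function name
``dersimonian_laird_tau2`` is hypothetical.

.. code-block:: python

   def dersimonian_laird_tau2(effects, variances):
       """Textbook one-step moment estimate of the between-study variance."""
       k = len(effects)
       weights = [1.0 / v for v in variances]
       w_sum = sum(weights)

       # Fixed-effect (inverse-variance weighted) mean.
       mean_fe = sum(w * y for w, y in zip(weights, effects)) / w_sum

       # Cochran's Q: weighted squared deviations from the pooled mean.
       q = sum(w * (y - mean_fe) ** 2 for w, y in zip(weights, effects))

       # Scaling constant of the moment estimator.
       c = w_sum - sum(w * w for w in weights) / w_sum

       # Truncate at zero so the variance estimate is never negative.
       return max(0.0, (q - (k - 1)) / c)


   # Two hypothetical studies with clearly different effects.
   tau2_het = dersimonian_laird_tau2([0.0, 1.0], [0.01, 0.01])
   # Identical effects: no evidence of between-study variance.
   tau2_hom = dersimonian_laird_tau2([0.5, 0.5, 0.5], [0.02, 0.03, 0.05])
   print(tau2_het, tau2_hom)
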