:orphan:

.. module:: statsmodels.tsa.vector_ar.var_model
   :synopsis: Vector autoregressions

.. currentmodule:: statsmodels.tsa.vector_ar.var_model

.. _var:

Vector Autoregressions :mod:`tsa.vector_ar`
===========================================

:mod:`statsmodels.tsa.vector_ar` contains methods that are useful
for simultaneously modeling and analyzing multiple time series using
:ref:`Vector Autoregressions (VAR) <var>` and
:ref:`Vector Error Correction Models (VECM) <vecm>`.

.. _var_process:

VAR(p) processes
----------------

We are interested in modeling a :math:`T \times K` multivariate time series
:math:`Y`, where :math:`T` denotes the number of observations and :math:`K` the
number of variables. One way of estimating relationships between the time series
and their lagged values is the *vector autoregression process*:

.. math::

   Y_t = \nu + A_1 Y_{t-1} + \ldots + A_p Y_{t-p} + u_t

   u_t \sim {\sf Normal}(0, \Sigma_u)

where :math:`A_i` is a :math:`K \times K` coefficient matrix.

We follow in large part the methods and notation of `Lütkepohl (2005)
<https://www.springer.com/gb/book/9783540401728>`__,
which we will not develop here.

Model fitting
~~~~~~~~~~~~~

.. note::

   The classes referenced below are accessible via the
   :mod:`statsmodels.tsa.api` module.

To estimate a VAR model, one must first create the model using an `ndarray` of
homogeneous or structured dtype. When using a structured or record array, the
class will use the passed variable names. Otherwise they can be passed
explicitly:

.. ipython:: python
   :suppress:

   import pandas as pd
   pd.options.display.max_rows = 10
   import matplotlib
   import matplotlib.pyplot as plt
   matplotlib.style.use('ggplot')

.. ipython:: python
   :okwarning:

   # some example data
   import numpy as np
   import pandas
   import statsmodels.api as sm
   from statsmodels.tsa.api import VAR
   mdata = sm.datasets.macrodata.load_pandas().data

   # prepare the dates index
   dates = mdata[['year', 'quarter']].astype(int).astype(str)
   quarterly = dates["year"] + "Q" + dates["quarter"]
   from statsmodels.tsa.base.datetools import dates_from_str
   quarterly = dates_from_str(quarterly)

   mdata = mdata[['realgdp','realcons','realinv']]
   mdata.index = pandas.DatetimeIndex(quarterly)
   data = np.log(mdata).diff().dropna()

   # make a VAR model
   model = VAR(data)

.. note::

   The :class:`VAR` class assumes that the passed time series are
   stationary. Non-stationary or trending data can often be transformed to be
   stationary by first-differencing or some other method. For direct analysis of
   non-stationary time series, a standard stable VAR(p) model is not
   appropriate.

To perform the estimation, call the `fit` method with the desired lag
order. Alternatively, you can let the model select a lag order based on a
standard information criterion (see below):

.. ipython:: python
   :okwarning:

   results = model.fit(2)
   results.summary()

Several ways to visualize the data using `matplotlib` are available.

Plotting input time series:

.. ipython:: python
   :okwarning:

   @savefig var_plot_input.png
   results.plot()


Plotting the time series autocorrelation function:

.. ipython:: python

   @savefig var_plot_acorr.png
   results.plot_acorr()


Lag order selection
~~~~~~~~~~~~~~~~~~~

Choice of lag order can be a difficult problem. Standard analysis employs
likelihood-ratio tests or information criteria-based order selection. We have
implemented the latter, accessible through the :class:`VAR` class:

.. ipython:: python

   model.select_order(15)

When calling the `fit` function, one can pass a maximum number of lags and the
order criterion to use for order selection:

.. ipython:: python

   results = model.fit(maxlags=15, ic='aic')

Forecasting
~~~~~~~~~~~

The linear predictor is the optimal h-step ahead forecast in terms of
mean-squared error:

.. math::

   y_t(h) = \nu + A_1 y_t(h-1) + \cdots + A_p y_t(h-p)

We can use the `forecast` function to produce this forecast. Note that we have
to specify the "initial value" for the forecast:

.. ipython:: python

   lag_order = results.k_ar
   results.forecast(data.values[-lag_order:], 5)

The `forecast_interval` function will produce the above forecast along with
asymptotic standard errors.
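For example (a minimal sketch: the unpacked variable names are illustrative,
and the interval level defaults to `alpha=0.05`):

.. ipython:: python

   # point forecast plus lower/upper interval bounds at the default 5% level
   mid, lower, upper = results.forecast_interval(data.values[-lag_order:], 5)
   mid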
The forecast and its intervals can be visualized using the `plot_forecast`
function:

.. ipython:: python

   @savefig var_forecast.png
   results.plot_forecast(10)

Class Reference
~~~~~~~~~~~~~~~

.. module:: statsmodels.tsa.vector_ar
   :synopsis: Vector autoregressions and related tools

.. currentmodule:: statsmodels.tsa.vector_ar.var_model


.. autosummary::
   :toctree: generated/

   VAR
   VARProcess
   VARResults


Post-estimation Analysis
~~~~~~~~~~~~~~~~~~~~~~~~

Several process properties and additional results after
estimation are available for vector autoregressive processes.

.. currentmodule:: statsmodels.tsa.vector_ar.var_model
.. autosummary::
   :toctree: generated/

   LagOrderResults

.. currentmodule:: statsmodels.tsa.vector_ar.hypothesis_test_results
.. autosummary::
   :toctree: generated/

   HypothesisTestResults
   NormalityTestResults
   WhitenessTestResults


Impulse Response Analysis
-------------------------

*Impulse responses* are of interest in econometric studies: they are the
estimated responses to a unit impulse in one of the variables. They are computed
in practice using the MA(:math:`\infty`) representation of the VAR(p) process:

.. math::

   Y_t = \mu + \sum_{i=0}^\infty \Phi_i u_{t-i}

We can perform an impulse response analysis by calling the `irf` function on a
`VARResults` object:

.. ipython:: python
   :okwarning:

   irf = results.irf(10)

These can be visualized using the `plot` function, in either orthogonalized or
non-orthogonalized form. Asymptotic standard errors are plotted by default at
the 95% confidence level, which can be modified by the user.

.. note::

   Orthogonalization is done using the Cholesky decomposition of the estimated
   error covariance matrix :math:`\hat \Sigma_u` and hence interpretations may
   change depending on variable ordering.

.. ipython:: python

   @savefig var_irf.png
   irf.plot(orth=False)


Note that the `plot` function is flexible and can plot only variables of
interest if so desired:

.. ipython:: python

   @savefig var_realgdp.png
   irf.plot(impulse='realgdp')

The cumulative effects :math:`\Psi_n = \sum_{i=0}^n \Phi_i` can be plotted with
the long run effects as follows:

.. ipython:: python

   @savefig var_irf_cum.png
   irf.plot_cum_effects(orth=False)


.. currentmodule:: statsmodels.tsa.vector_ar.irf
.. autosummary::
   :toctree: generated/

   IRAnalysis

Forecast Error Variance Decomposition (FEVD)
--------------------------------------------

The contribution of innovations in component :math:`k` to the :math:`h`-step
ahead forecast error variance of component :math:`j` can be computed using the
orthogonalized impulse responses :math:`\Theta_i`:

.. math::

   \omega_{jk, h} = \sum_{i=0}^{h-1} (e_j^\prime \Theta_i e_k)^2 / \mathrm{MSE}_j(h)

   \mathrm{MSE}_j(h) = \sum_{i=0}^{h-1} e_j^\prime \Phi_i \Sigma_u \Phi_i^\prime e_j

These quantities are computed via the `fevd` function, up through a total
number of steps ahead:

.. ipython:: python

   fevd = results.fevd(5)
   fevd.summary()

They can also be visualized through the returned :class:`FEVD` object:

.. ipython:: python

   @savefig var_fevd.png
   results.fevd(20).plot()


.. currentmodule:: statsmodels.tsa.vector_ar.var_model
.. autosummary::
   :toctree: generated/

   FEVD

Statistical tests
-----------------

A number of different methods are provided to carry out hypothesis tests about
the model results and also the validity of the model assumptions (normality,
whiteness / "iid-ness" of errors, etc.).

Granger causality
~~~~~~~~~~~~~~~~~

One is often interested in whether a variable or group of variables is "causal"
for another variable, for some definition of "causal". In the context of VAR
models, one can test whether a set of variables is Granger-causal for the
variables in one of the VAR equations. We will not detail the mathematics or
definition of Granger causality here, but refer the reader to the literature.
The :class:`VARResults` object has the `test_causality` method for performing
either a Wald (:math:`\chi^2`) test or an F-test.

.. ipython:: python

   results.test_causality('realgdp', ['realinv', 'realcons'], kind='f')

Normality
~~~~~~~~~

As noted at the beginning of this document, the white noise component
:math:`u_t` is assumed to be normally distributed. While this assumption
is not required for parameter estimates to be consistent or asymptotically
normal, results are generally more reliable in finite samples when residuals
are Gaussian white noise. To test whether this assumption is consistent with
a data set, :class:`VARResults` offers the `test_normality` method.

.. ipython:: python

   results.test_normality()

Whiteness of residuals
~~~~~~~~~~~~~~~~~~~~~~

To test the whiteness of the estimation residuals (that is, the absence of
significant residual autocorrelation), one can use the `test_whiteness`
method of :class:`VARResults`.
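For example, a Portmanteau test for residual autocorrelation up to lag 12
(a minimal sketch; the choice of `nlags` is illustrative and must exceed the
fitted lag order):

.. ipython:: python

   results.test_whiteness(nlags=12)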
.. currentmodule:: statsmodels.tsa.vector_ar.hypothesis_test_results
.. autosummary::
   :toctree: generated/

   HypothesisTestResults
   CausalityTestResults
   NormalityTestResults
   WhitenessTestResults

.. _svar:

Structural Vector Autoregressions
---------------------------------

There is a matching set of classes that handles some types of structural VAR
models.

.. module:: statsmodels.tsa.vector_ar.svar_model
   :synopsis: Structural vector autoregressions and related tools

.. currentmodule:: statsmodels.tsa.vector_ar.svar_model
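As a brief sketch of usage (reusing `data` from the VAR example above; the
string ``'E'`` marks coefficients to be estimated, and the recursive structure
of ``A`` chosen here is purely illustrative):

.. ipython:: python
   :okwarning:

   from statsmodels.tsa.api import SVAR
   # restrict A to be lower triangular; 'E' marks free elements to estimate
   A = np.array([[1, 0, 0], ['E', 1, 0], ['E', 'E', 1]])
   svar_model = SVAR(data, svar_type='A', A=A)
   svar_results = svar_model.fit(maxlags=2)
   svar_results.A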
.. autosummary::
   :toctree: generated/

   SVAR
   SVARProcess
   SVARResults

.. _vecm:

Vector Error Correction Models (VECM)
-------------------------------------

Vector Error Correction Models are used to study short-run deviations from
one or more permanent stochastic trends (unit roots). A VECM models the
difference of a vector of time series by imposing structure that is implied
by the assumed number of stochastic trends. :class:`VECM` is used to
specify and estimate these models.

A VECM(:math:`k_{ar}-1`) has the following form:

.. math::

   \Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \ldots
                + \Gamma_{k_{ar}-1} \Delta y_{t-k_{ar}+1} + u_t

where

.. math::

   \Pi = \alpha \beta'

as described in chapter 7 of [1]_.

A VECM(:math:`k_{ar} - 1`) with deterministic terms has the form

.. math::

   \Delta y_t = \alpha \begin{pmatrix}\beta' & \eta'\end{pmatrix} \begin{pmatrix}y_{t-1} \\
                D^{co}_{t-1}\end{pmatrix} + \Gamma_1 \Delta y_{t-1} + \dots
                + \Gamma_{k_{ar}-1} \Delta y_{t-k_{ar}+1} + C D_t + u_t.

In :math:`D^{co}_{t-1}` we have the deterministic terms which are inside
the cointegration relation (or restricted to the cointegration relation).
:math:`\eta` is the corresponding estimator. To pass a deterministic term
inside the cointegration relation, we can use the `exog_coint` argument.
For the two special cases of an intercept and a linear trend there exists
a simpler way to declare these terms: we can pass ``"ci"`` and ``"li"``
respectively to the `deterministic` argument. So for an intercept inside
the cointegration relation we can either pass ``"ci"`` as `deterministic`
or `np.ones(len(data))` as `exog_coint` if `data` is passed as the
`endog` argument. This ensures that :math:`D_{t-1}^{co} = 1` for all
:math:`t`.

We can also use deterministic terms outside the cointegration relation.
These are defined in :math:`D_t` in the formula above with the
corresponding estimators in the matrix :math:`C`. We specify such terms by
passing them to the `exog` argument. For an intercept and/or linear trend
we can again use `deterministic` as an alternative. For an intercept we
pass ``"co"`` and for a linear trend we pass ``"lo"``, where the `o`
stands for `outside`.

The following table shows the five cases considered in [2]_. The last
column indicates which string to pass to the `deterministic` argument for
each of these cases. A usage example follows the table.

==== =============================== =================================== =================
Case Intercept                       Slope of the linear trend           `deterministic`
==== =============================== =================================== =================
I    0                               0                                   ``"nc"``
II   :math:`- \alpha \beta' \mu`     0                                   ``"ci"``
III  :math:`\neq 0`                  0                                   ``"co"``
IV   :math:`\neq 0`                  :math:`- \alpha \beta' \gamma`      ``"coli"``
V    :math:`\neq 0`                  :math:`\neq 0`                      ``"colo"``
==== =============================== =================================== =================
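A minimal sketch of fitting such a model, reusing `data` from the VAR example
above (the choices ``k_ar_diff=2`` and ``coint_rank=1`` are purely
illustrative; in practice they would come from `select_order` and
`select_coint_rank`):

.. ipython:: python
   :okwarning:

   from statsmodels.tsa.vector_ar.vecm import VECM
   # "ci": intercept restricted to the cointegration relation (case II)
   vecm_model = VECM(data, k_ar_diff=2, coint_rank=1, deterministic="ci")
   vecm_results = vecm_model.fit()
   vecm_results.summary()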
.. currentmodule:: statsmodels.tsa.vector_ar.vecm
.. autosummary::
   :toctree: generated/

   VECM
   coint_johansen
   JohansenTestResult
   select_order
   select_coint_rank
   VECMResults
   CointRankResults


References
----------
.. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.

.. [2] Johansen, S. 1995. *Likelihood-Based Inference in Cointegrated
   Vector Autoregressive Models*. Oxford University Press.