1:orphan:
2
3.. module:: statsmodels.tsa.vector_ar.var_model
4   :synopsis: Vector autoregressions
5
6.. currentmodule:: statsmodels.tsa.vector_ar.var_model
7
8.. _var:
9
10Vector Autoregressions :mod:`tsa.vector_ar`
11===========================================
12
13:mod:`statsmodels.tsa.vector_ar` contains methods that are useful
14for simultaneously modeling and analyzing multiple time series using
15:ref:`Vector Autoregressions (VAR) <var>` and
16:ref:`Vector Error Correction Models (VECM) <vecm>`.
17
18.. _var_process:
19
20VAR(p) processes
21----------------
22
23We are interested in modeling a :math:`T \times K` multivariate time series
24:math:`Y`, where :math:`T` denotes the number of observations and :math:`K` the
25number of variables. One way of estimating relationships between the time series
26and their lagged values is the *vector autoregression process*:
27
28.. math::
29
30   Y_t = \nu + A_1 Y_{t-1} + \ldots + A_p Y_{t-p} + u_t
31
32   u_t \sim {\sf Normal}(0, \Sigma_u)
33
34where :math:`A_i` is a :math:`K \times K` coefficient matrix.
35
36We follow in large part the methods and notation of `Lutkepohl (2005)
37<https://www.springer.com/gb/book/9783540401728>`__,
38which we will not develop here.
39
40Model fitting
41~~~~~~~~~~~~~
42
43.. note::
44
45    The classes referenced below are accessible via the
46    :mod:`statsmodels.tsa.api` module.
47
48To estimate a VAR model, one must first create the model using an `ndarray` of
49homogeneous or structured dtype. When using a structured or record array, the
50class will use the passed variable names. Otherwise they can be passed
51explicitly:
52
53.. ipython:: python
54   :suppress:
55
56   import pandas as pd
57   pd.options.display.max_rows = 10
58   import matplotlib
59   import matplotlib.pyplot as plt
60   matplotlib.style.use('ggplot')
61
62.. ipython:: python
63   :okwarning:
64
65   # some example data
66   import numpy as np
67   import pandas
68   import statsmodels.api as sm
69   from statsmodels.tsa.api import VAR
70   mdata = sm.datasets.macrodata.load_pandas().data
71
72   # prepare the dates index
73   dates = mdata[['year', 'quarter']].astype(int).astype(str)
74   quarterly = dates["year"] + "Q" + dates["quarter"]
75   from statsmodels.tsa.base.datetools import dates_from_str
76   quarterly = dates_from_str(quarterly)
77
78   mdata = mdata[['realgdp','realcons','realinv']]
79   mdata.index = pandas.DatetimeIndex(quarterly)
80   data = np.log(mdata).diff().dropna()
81
82   # make a VAR model
83   model = VAR(data)
84
85.. note::
86
87   The :class:`VAR` class assumes that the passed time series are
88   stationary. Non-stationary or trending data can often be transformed to be
89   stationary by first-differencing or some other method. For direct analysis of
90   non-stationary time series, a standard stable VAR(p) model is not
91   appropriate.
92
93To actually do the estimation, call the `fit` method with the desired lag
94order. Or you can have the model select a lag order based on a standard
95information criterion (see below):
96
97.. ipython:: python
98   :okwarning:
99
100   results = model.fit(2)
101   results.summary()
102
103Several ways to visualize the data using `matplotlib` are available.
104
105Plotting input time series:
106
107.. ipython:: python
108   :okwarning:
109
110   @savefig var_plot_input.png
111   results.plot()
112
113
114Plotting time series autocorrelation function:
115
116.. ipython:: python
117
118   @savefig var_plot_acorr.png
119   results.plot_acorr()
120
121
122Lag order selection
123~~~~~~~~~~~~~~~~~~~
124
125Choice of lag order can be a difficult problem. Standard analysis employs
126likelihood test or information criteria-based order selection. We have
127implemented the latter, accessible through the :class:`VAR` class:
128
129.. ipython:: python
130
131   model.select_order(15)
132
133When calling the `fit` function, one can pass a maximum number of lags and the
134order criterion to use for order selection:
135
136.. ipython:: python
137
138   results = model.fit(maxlags=15, ic='aic')
139
140Forecasting
141~~~~~~~~~~~
142
143The linear predictor is the optimal h-step ahead forecast in terms of
144mean-squared error:
145
146.. math::
147
148   y_t(h) = \nu + A_1 y_t(h − 1) + \cdots + A_p y_t(h − p)
149
150We can use the `forecast` function to produce this forecast. Note that we have
151to specify the "initial value" for the forecast:
152
153.. ipython:: python
154
155   lag_order = results.k_ar
156   results.forecast(data.values[-lag_order:], 5)
157
158The `forecast_interval` function will produce the above forecast along with
159asymptotic standard errors. These can be visualized using the `plot_forecast`
160function:
161
162.. ipython:: python
163
164   @savefig var_forecast.png
165   results.plot_forecast(10)
166
167Class Reference
168~~~~~~~~~~~~~~~
169
170.. module:: statsmodels.tsa.vector_ar
171   :synopsis: Vector autoregressions and related tools
172
173.. currentmodule:: statsmodels.tsa.vector_ar.var_model
174
175
176.. autosummary::
177   :toctree: generated/
178
179   VAR
180   VARProcess
181   VARResults
182
183
184Post-estimation Analysis
185~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
186
187Several process properties and additional results after
188estimation are available for vector autoregressive processes.
189
190.. currentmodule:: statsmodels.tsa.vector_ar.var_model
191.. autosummary::
192   :toctree: generated/
193
194   LagOrderResults
195
196.. currentmodule:: statsmodels.tsa.vector_ar.hypothesis_test_results
197.. autosummary::
198   :toctree: generated/
199
200   HypothesisTestResults
201   NormalityTestResults
202   WhitenessTestResults
203
204
205Impulse Response Analysis
206-------------------------
207
208*Impulse responses* are of interest in econometric studies: they are the
209estimated responses to a unit impulse in one of the variables. They are computed
210in practice using the MA(:math:`\infty`) representation of the VAR(p) process:
211
212.. math::
213
214    Y_t = \mu + \sum_{i=0}^\infty \Phi_i u_{t-i}
215
216We can perform an impulse response analysis by calling the `irf` function on a
217`VARResults` object:
218
219.. ipython:: python
220   :okwarning:
221
222   irf = results.irf(10)
223
224These can be visualized using the `plot` function, in either orthogonalized or
225non-orthogonalized form. Asymptotic standard errors are plotted by default at
226the 95% significance level, which can be modified by the user.
227
228.. note::
229
230    Orthogonalization is done using the Cholesky decomposition of the estimated
231    error covariance matrix :math:`\hat \Sigma_u` and hence interpretations may
232    change depending on variable ordering.
233
234.. ipython:: python
235
236   @savefig var_irf.png
237   irf.plot(orth=False)
238
239
240Note the `plot` function is flexible and can plot only variables of interest if
241so desired:
242
243.. ipython:: python
244
245   @savefig var_realgdp.png
246   irf.plot(impulse='realgdp')
247
248The cumulative effects :math:`\Psi_n = \sum_{i=0}^n \Phi_i` can be plotted with
249the long run effects as follows:
250
251.. ipython:: python
252
253   @savefig var_irf_cum.png
254   irf.plot_cum_effects(orth=False)
255
256
257.. currentmodule:: statsmodels.tsa.vector_ar.irf
258.. autosummary::
259   :toctree: generated/
260
261   IRAnalysis
262
263Forecast Error Variance Decomposition (FEVD)
264--------------------------------------------
265
266Forecast errors of component j on k in an i-step ahead forecast can be
267decomposed using the orthogonalized impulse responses :math:`\Theta_i`:
268
269.. math::
270
271    \omega_{jk, i} = \sum_{i=0}^{h-1} (e_j^\prime \Theta_i e_k)^2 / \mathrm{MSE}_j(h)
272
273    \mathrm{MSE}_j(h) = \sum_{i=0}^{h-1} e_j^\prime \Phi_i \Sigma_u \Phi_i^\prime e_j
274
275These are computed via the `fevd` function up through a total number of steps ahead:
276
277.. ipython:: python
278
279   fevd = results.fevd(5)
280   fevd.summary()
281
282They can also be visualized through the returned :class:`FEVD` object:
283
284.. ipython:: python
285
286   @savefig var_fevd.png
287   results.fevd(20).plot()
288
289
290.. currentmodule:: statsmodels.tsa.vector_ar.var_model
291.. autosummary::
292   :toctree: generated/
293
294   FEVD
295
296Statistical tests
297-----------------
298
299A number of different methods are provided to carry out hypothesis tests about
300the model results and also the validity of the model assumptions (normality,
301whiteness / "iid-ness" of errors, etc.).
302
303Granger causality
304~~~~~~~~~~~~~~~~~
305
306One is often interested in whether a variable or group of variables is "causal"
307for another variable, for some definition of "causal". In the context of VAR
308models, one can say that a set of variables are Granger-causal within one of the
309VAR equations. We will not detail the mathematics or definition of Granger
310causality, but leave it to the reader. The :class:`VARResults` object has the
311`test_causality` method for performing either a Wald (:math:`\chi^2`) test or an
312F-test.
313
314.. ipython:: python
315
316   results.test_causality('realgdp', ['realinv', 'realcons'], kind='f')
317
318Normality
319~~~~~~~~~
320
321As pointed out in the beginning of this document, the white noise component
322:math:`u_t` is assumed to be normally distributed. While this assumption
323is not required for parameter estimates to be consistent or asymptotically
324normal, results are generally more reliable in finite samples when residuals
325are Gaussian white noise. To test whether this assumption is consistent with
326a data set, :class:`VARResults` offers the `test_normality` method.
327
328.. ipython:: python
329
330    results.test_normality()
331
332Whiteness of residuals
333~~~~~~~~~~~~~~~~~~~~~~
334
335To test the whiteness of the estimation residuals (this means absence of
336significant residual autocorrelations) one can use the `test_whiteness`
337method of :class:`VARResults`.
338
339
340.. currentmodule:: statsmodels.tsa.vector_ar.hypothesis_test_results
341.. autosummary::
342   :toctree: generated/
343
344   HypothesisTestResults
345   CausalityTestResults
346   NormalityTestResults
347   WhitenessTestResults
348
349.. _svar:
350
351Structural Vector Autoregressions
352---------------------------------
353
354There are a matching set of classes that handle some types of Structural VAR models.
355
356.. module:: statsmodels.tsa.vector_ar.svar_model
357   :synopsis: Structural vector autoregressions and related tools
358
359.. currentmodule:: statsmodels.tsa.vector_ar.svar_model
360
361.. autosummary::
362   :toctree: generated/
363
364   SVAR
365   SVARProcess
366   SVARResults
367
368.. _vecm:
369
370Vector Error Correction Models (VECM)
371-------------------------------------
372
373Vector Error Correction Models are used to study short-run deviations from
374one or more permanent stochastic trends (unit roots). A VECM models the
375difference of a vector of time series by imposing structure that is implied
376by the assumed number of stochastic trends. :class:`VECM` is used to
377specify and estimate these models.
378
379A VECM(:math:`k_{ar}-1`) has the following form
380
381.. math::
382
383    \Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \ldots
384                   + \Gamma_{k_{ar}-1} \Delta y_{t-k_{ar}+1} + u_t
385
386where
387
388.. math::
389
390    \Pi = \alpha \beta'
391
392as described in chapter 7 of [1]_.
393
394A VECM(:math:`k_{ar} - 1`) with deterministic terms has the form
395
396.. math::
397
398   \Delta y_t = \alpha \begin{pmatrix}\beta' & \eta'\end{pmatrix} \begin{pmatrix}y_{t-1} \\
399                D^{co}_{t-1}\end{pmatrix} + \Gamma_1 \Delta y_{t-1} + \dots + \Gamma_{k_{ar}-1} \Delta y_{t-k_{ar}+1} + C D_t + u_t.
400
401In :math:`D^{co}_{t-1}` we have the deterministic terms which are inside
402the cointegration relation (or restricted to the cointegration relation).
403:math:`\eta` is the corresponding estimator. To pass a deterministic term
404inside the cointegration relation, we can use the `exog_coint` argument.
405For the two special cases of an intercept and a linear trend there exists
406a simpler way to declare these terms: we can pass ``"ci"`` and ``"li"``
407respectively to the `deterministic` argument. So for an intercept inside
408the cointegration relation we can either pass ``"ci"`` as `deterministic`
409or `np.ones(len(data))` as `exog_coint` if `data` is passed as the
410`endog` argument. This ensures that :math:`D_{t-1}^{co} = 1` for all
411:math:`t`.
412
413We can also use deterministic terms outside the cointegration relation.
414These are defined in :math:`D_t` in the formula above with the
415corresponding estimators in the matrix :math:`C`. We specify such terms by
416passing them to the `exog` argument. For an intercept and/or linear trend
417we again have the possibility to use `deterministic` alternatively. For
418an intercept we pass ``"co"`` and for a linear trend we pass ``"lo"`` where
419the `o` stands for `outside`.
420
421The following table shows the five cases considered in [2]_. The last
422column indicates which string to pass to the `deterministic` argument for
423each of these cases.
424
425====  ===============================  ===================================  =============
426Case  Intercept                        Slope of the linear trend            `deterministic`
427====  ===============================  ===================================  =============
428I     0                                0                                    ``"nc"``
429II    :math:`- \alpha \beta^T \mu`     0                                    ``"ci"``
430III   :math:`\neq 0`                   0                                    ``"co"``
431IV    :math:`\neq 0`                   :math:`- \alpha \beta^T \gamma`      ``"coli"``
432V     :math:`\neq 0`                   :math:`\neq 0`                       ``"colo"``
433====  ===============================  ===================================  =============
434
435.. currentmodule:: statsmodels.tsa.vector_ar.vecm
436.. autosummary::
437   :toctree: generated/
438
439   VECM
440   coint_johansen
441   JohansenTestResult
442   select_order
443   select_coint_rank
444   VECMResults
445   CointRankResults
446
447
448References
449----------
450.. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
451
452.. [2] Johansen, S. 1995. *Likelihood-Based Inference in Cointegrated *
453       *Vector Autoregressive Models*. Oxford University Press.
454