1\chapter{Dynamic panel models}
2\label{chap:dpanel}
3
4\newcommand{\by}{\mathbf{y}}
5\newcommand{\bx}{\mathbf{x}}
6\newcommand{\bv}{\mathbf{v}}
7\newcommand{\bX}{\mathbf{X}}
8\newcommand{\bW}{\mathbf{W}}
9\newcommand{\bZ}{\mathbf{Z}}
10\newcommand{\bA}{\mathbf{A}}
11\newcommand{\bM}{\mathbf{M}}
12\newcommand{\biota}{\bm{\iota}}
13
14\newenvironment%
15{altcode}%
16{\vspace{1ex}\small\leftmargin 1em}{\vspace{1ex}}
17
18The command for estimating dynamic panel models in gretl is
19\texttt{dpanel}. This command supports both the ``difference''
20estimator \citep{arellano-bond91} and the ``system'' estimator
21\citep{blundell-bond98}, which has become the method of choice in the
22applied literature.
23
24\section{Introduction}
25\label{sec:dpanel-intro}
26
27\subsection{Notation}
28\label{sec:dpanel-notation}
29
30A dynamic linear panel data model can be represented as follows
31(in notation based on \cite{arellano03}):
32\begin{equation}
33  \label{eq:dpd-def}
34  y_{it} = \alpha y_{i,t-1} + \beta'x_{it} + \eta_{i} + v_{it}
35\end{equation}
where $i=1,2,\ldots,N$ indexes the cross-section units and $t$ indexes
37time.
38
39The main idea behind the difference estimator is to sweep out the
40individual effect via differencing.  First-differencing eq.\
41(\ref{eq:dpd-def}) yields
42\begin{equation}
43  \label{eq:dpd-dif}
44  \Delta y_{it} = \alpha \Delta y_{i,t-1} + \beta'\Delta x_{it} +
45  \Delta v_{it} = \gamma' W_{it} + \Delta v_{it} ,
46\end{equation}
where $\gamma = (\alpha, \beta')'$ and $W_{it} = (\Delta y_{i,t-1},
\Delta x_{it}')'$. The error term of (\ref{eq:dpd-dif}) is, by
48construction, autocorrelated and also correlated with the lagged
49dependent variable, so an estimator that takes both issues into
50account is needed. The endogeneity issue is solved by noting that all
51values of $y_{i,t-k}$ with $k>1$ can be used as instruments for
52$\Delta y_{i,t-1}$: unobserved values of $y_{i,t-k}$ (whether missing
53or pre-sample) can safely be substituted with 0. In the language of
54GMM, this amounts to using the relation
55\begin{equation}
56  \label{eq:OC-dif}
57  E(\Delta v_{it} \cdot y_{i,t-k}) = 0, \quad k>1
58\end{equation}
59as an orthogonality condition.
60
61Autocorrelation is dealt with by noting that if $v_{it}$ is white
62noise, the covariance matrix of the vector whose typical element is
63$\Delta v_{it}$ is proportional to a matrix $H$ that has 2 on the main
64diagonal, $-1$ on the first subdiagonals and 0 elsewhere.  One-step
65GMM estimation of equation (\ref{eq:dpd-dif}) amounts to computing
66\begin{equation}
67\label{eq:dif-gmm}
68  \hat{\gamma} = \left[
69    \left( \sum_i \bW_i'\bZ_i \right) \bA_N
70    \left( \sum_i \bZ_i'\bW_i \right) \right]^{-1}
71    \left( \sum_i \bW_i'\bZ_i \right) \bA_N
72    \left( \sum_i \bZ_i'\Delta \by_i \right)
73\end{equation}
74where
75\begin{align*}
76  \Delta \by_i  & =
77     \left[ \begin{array}{ccc}
78         \Delta y_{i,3} & \cdots & \Delta y_{i,T}
79       \end{array} \right]' \\
80  \bW_i  & =
81     \left[ \begin{array}{ccc}
82         \Delta y_{i,2} & \cdots & \Delta y_{i,T-1} \\
83         \Delta x_{i,3} & \cdots & \Delta x_{i,T} \\
84       \end{array} \right]' \\
85  \bZ_i  & =
86     \left[ \begin{array}{ccccccc}
87         y_{i1} & 0 & 0 & \cdots & 0 & \Delta x_{i3}\\
88         0 & y_{i1} & y_{i2} & \cdots & 0 & \Delta x_{i4}\\
89         & & \vdots \\
90         0 & 0 & 0 & \cdots & y_{i, T-2} & \Delta x_{iT} \\
91       \end{array} \right]' \\
92  \intertext{and}
93  \bA_N & = \left( \sum_i \bZ_i' H \bZ_i \right)^{-1}
94\end{align*}
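
As a concrete illustration, take $T=5$, so that each unit contributes
the three differenced equations for $t=3,4,5$. Under the white-noise
assumption on $v_{it}$ the matrix $H$ is then
\[
  H = \left[ \begin{array}{rrr}
       2 & -1 &  0 \\
      -1 &  2 & -1 \\
       0 & -1 &  2
    \end{array} \right] ,
\]
since ${\rm Var}(\Delta v_{it}) = 2\sigma^2_v$ and
${\rm Cov}(\Delta v_{it}, \Delta v_{i,t-1}) = -\sigma^2_v$, while
differences two or more periods apart are uncorrelated.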
95
96Once the 1-step estimator is computed, the sample covariance matrix of
97the estimated residuals can be used instead of $H$ to obtain 2-step
98estimates, which are not only consistent but asymptotically
99efficient. (In principle the process may be iterated, but nobody seems
100to be interested.) Standard GMM theory applies, except for one thing:
101\cite{Windmeijer05} has computed finite-sample corrections to the
102asymptotic covariance matrix of the parameters, which are nowadays
103almost universally used.
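
As a sketch of the second step for the difference estimator (the
superscript notation is introduced here for illustration, and the
scale of the weights matrix is immaterial for the point estimates):
let $\hat{\mathbf{u}}_i = \Delta \by_i - \bW_i \hat{\gamma}$ denote the
vector of 1-step residuals for unit $i$. The 2-step estimator is then
obtained from (\ref{eq:dif-gmm}) on replacing $\bA_N$ with
\[
  \bA_N^{(2)} = \left( \sum_i \bZ_i' \hat{\mathbf{u}}_i
    \hat{\mathbf{u}}_i' \bZ_i \right)^{-1} .
\]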
104
105The difference estimator is consistent, but has been shown to have
106poor properties in finite samples when $\alpha$ is near one. People
107these days prefer the so-called ``system'' estimator, which
108complements the differenced data (with lagged levels used as
109instruments) with data in levels (using lagged differences as
110instruments). The system estimator relies on an extra orthogonality
111condition which has to do with the earliest value of the dependent
112variable $y_{i,1}$. The interested reader is referred to \citet[pp.\
113124--125]{blundell-bond98} for details, but here it suffices to say
114that this condition is satisfied in mean-stationary models and brings
115an improvement in efficiency that may be substantial in many cases.
116
117The set of orthogonality conditions exploited in the system approach
118is not very much larger than with the difference estimator since most
119of the possible orthogonality conditions associated with the equations
120in levels are redundant, given those already used for the equations in
121differences.
122
123The key equations of the system estimator can be written as
124
125\begin{equation}
126\label{eq:sys-gmm}
127  \tilde{\gamma} = \left[
128    \left( \sum_i \tilde{\bW}_i'\tilde{\bZ}_i \right) \bA_N
129    \left( \sum_i \tilde{\bZ}_i'\tilde{\bW}_i \right) \right]^{-1}
130    \left( \sum_i \tilde{\bW}_i'\tilde{\bZ}_i \right) \bA_N
131    \left( \sum_i \tilde{\bZ}_i'\Delta \tilde{\by}_i \right)
132\end{equation}
133where
134\begin{align*}
135  \Delta \tilde{\by}_i  & =
136     \left[ \begin{array}{ccccccc}
137         \Delta y_{i3} & \cdots & \Delta y_{iT} & y_{i3} & \cdots & y_{iT}
138       \end{array} \right]' \\
139  \tilde{\bW}_i  & =
140     \left[ \begin{array}{cccccc}
141         \Delta y_{i2} & \cdots & \Delta y_{i,T-1} & y_{i2} & \cdots & y_{i,T-1} \\
142         \Delta x_{i3} & \cdots & \Delta x_{iT}  & x_{i3} & \cdots & x_{iT} \\
143       \end{array} \right]' \\
144  \tilde{\bZ}_i  & =
145     \left[ \begin{array}{ccccccccc}
146         y_{i1} & 0 & 0       & \cdots & 0  & 0  & \cdots & 0 & \Delta x_{i,3}\\
147         0 & y_{i1} & y_{i2} & \cdots & 0  & 0  & \cdots & 0 & \Delta x_{i,4}\\
148         & & \vdots \\
149         0 & 0 & 0 & \cdots & y_{i, T-2} & 0  & \cdots & 0  & \Delta x_{iT}\\
150         & & \vdots \\
151         0 & 0 & 0 & \cdots & 0 & \Delta y_{i2} & \cdots & 0  & x_{i3}\\
152         & & \vdots \\
153         0 & 0 & 0 & \cdots & 0 & 0 & \cdots & \Delta y_{i,T-1}  & x_{iT}\\
154       \end{array} \right]' \\
155  \intertext{and}
156  \bA_N & = \left( \sum_i \tilde{\bZ}_i' H^* \tilde{\bZ}_i \right)^{-1}
157\end{align*}
158
159In this case choosing a precise form for the matrix $H^*$ for the
160first step is no trivial matter. Its north-west block should be as
161similar as possible to the covariance matrix of the vector $\Delta
162v_{it}$, so the same choice as the ``difference'' estimator is
163appropriate. Ideally, the south-east block should be proportional to
164the covariance matrix of the vector $\biota \eta_i + \bv$, that is
165$\sigma^2_{v} I + \sigma^2_{\eta} \biota \biota'$; but since
166$\sigma^2_{\eta}$ is unknown and any positive definite matrix renders
167the estimator consistent, people just use $I$. The off-diagonal blocks
168should, in principle, contain the covariances between $\Delta v_{is}$
169and $v_{it}$, which would be an identity matrix if $v_{it}$ is white
170noise. However, since the south-east block is typically given a
171conventional value anyway, the benefit in making this choice is not
172obvious. Some packages use $I$; others use a zero matrix.
173Asymptotically, it should not matter, but on real datasets the
174difference between the resulting estimates can be noticeable.
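
In block form, the choices just described can be summarized as follows
(the symbol $C$ for the off-diagonal block is introduced here purely
for illustration):
\[
  H^* = \left[ \begin{array}{cc}
      H & C \\
      C' & I
    \end{array} \right] ,
  \qquad C = I \quad \mbox{or} \quad C = 0 ,
\]
where $H$ is the matrix used for the difference estimator, and which
of the two values of $C$ applies depends on the software package.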
175
176\subsection{Rank deficiency}
177\label{sec:rankdef}
178
179Both the difference estimator (\ref{eq:dif-gmm}) and the system
180estimator (\ref{eq:sys-gmm}) depend for their existence on the
181invertibility of $\bA_N$. This matrix may turn out to be singular for
182several reasons. However, this does not mean that the estimator is not
183computable: in some cases, adjustments are possible such that the
184estimator does exist, but the user should be aware that in these cases
185not all software packages use the same strategy and replication of
186results may prove difficult or even impossible.
187
188A first reason why $\bA_N$ may be singular could be the unavailability
189of instruments, chiefly because of missing observations. This case is
190easy to handle. If a particular row of $\tilde{\bZ}_i$ is zero for all
191units, the corresponding orthogonality condition (or the corresponding
192instrument if you prefer) is automatically dropped; of course, the
193overidentification rank is adjusted for testing purposes.
194
195Even if no instruments are zero, however, $\bA_N$ could be rank
196deficient. A trivial case occurs if there are collinear instruments,
197but a less trivial case may arise when $T$ (the total number of time
198periods available) is not much smaller than $N$ (the number of units),
199as, for example, in some macro datasets where the units are
200countries. The total number of potentially usable orthogonality
201conditions is $O(T^2)$, which may well exceed $N$ in some cases. Of
202course $\bA_N$ is the sum of $N$ matrices which have, at most, rank $2T -
2033$ and therefore it could well happen that the sum is singular.
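
To see where the $O(T^2)$ figure comes from, consider the difference
estimator for the AR(1) case with no additional regressors. By
(\ref{eq:OC-dif}), the equation for period $t$ can use
$y_{i1},\ldots,y_{i,t-2}$, that is, $t-2$ instruments, so the total
number of conditions is
\[
  \sum_{t=3}^{T} (t-2) = \frac{(T-1)(T-2)}{2} .
\]
With $T=10$, for instance, this gives 36 conditions, which would
exceed the number of units in a hypothetical panel of 30 countries.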
204
205In all these cases, what we consider the ``proper'' way to go is to
206substitute the pseudo-inverse of $\bA_N$ (Moore--Penrose) for its regular
207inverse. Again, our choice is shared by some software packages, but
208not all, so replication may be hard.
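
The following fragment is a small illustration of the idea using
gretl's matrix functions; it is not meant to describe the internals of
\texttt{dpanel}. The matrix \texttt{S} is a made-up singular example:
it has no regular inverse, but \texttt{ginv} returns its
Moore--Penrose pseudo-inverse $P$, which satisfies $SPS = S$.
\begin{code}
matrix S = {1, 2; 2, 4}      # second row = 2 x first row, so rank 1
printf "rank of S = %d\n", rank(S)
matrix P = ginv(S)           # Moore-Penrose pseudo-inverse
print P
# check the defining property S*P*S = S (zero up to rounding error)
matrix chk = S*P*S - S
scalar dev = maxc(maxr(abs(chk)))
printf "max abs deviation = %g\n", dev
\end{code}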
209
210\subsection{Covariance matrix and standard errors}
211
212By default the standard errors shown by \texttt{dpanel} for 1-step
213estimation are robust, based on the heteroskedasticity-consistent
214variance estimator
215\[
216  \widehat{\rm Var}(\hat{\gamma}) =
217    \bM^{-1} \left(\sum_i\bW_i'\bZ_i\right)
218    \bA_N\hat{\mathbf{V}}_N\bA_N
219    \left(\sum_i\bZ_i'\bW_i\right) \bM^{-1}
220  \]
221  where $\bM = (\sum_i\bW_i'\bZ_i) \bA_N (\sum_i\bZ_i'\bW_i)$ and
222  $\hat{\mathbf{V}}_N = N^{-1} \sum_i
223  \bZ_i'\hat{\mathbf{u}}_i\hat{\mathbf{u}}_i'\bZ_i$, with
224  $\hat{\mathbf{u}}_i$ the vector of residuals in differences for
225  individual $i$.  In addition, as noted above, the variance estimator
226  for 2-step estimation employs the finite-sample correction of
227  \cite{Windmeijer05}.
228
229  When the \verb|--asymptotic| option is passed to \texttt{dpanel},
230  however, the 1-step variance estimator is simply
  $\hat{\sigma}_u^2 \bM^{-1}$, which is not
232  heteroskedasticity-consistent, and the Windmeijer correction is not
233  applied for 2-step estimation. Use of the asymptotic option is not
234  recommended unless you wish to replicate prior results that did not
235  report robust standard errors. In particular, tests based on the
236  asymptotic 2-step variance estimator are known to over-reject quite
237  substantially (standard errors too small).
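
As a sketch of how these choices map onto the command line, the four
combinations of 1-step or 2-step estimation with robust or asymptotic
standard errors can be requested as follows. The model, a plain AR(1)
for the series \texttt{n} in the Arellano--Bond dataset supplied with
gretl, is chosen here for illustration only.
\begin{code}
open abdata.gdt
dpanel 1 ; n                          # 1-step, robust (default)
dpanel 1 ; n --asymptotic             # 1-step, non-robust
dpanel 1 ; n --two-step               # 2-step, Windmeijer-corrected (default)
dpanel 1 ; n --two-step --asymptotic  # 2-step, uncorrected
\end{code}
Within each pair the point estimates should be identical; only the
reported standard errors differ.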
238
239\subsection{Treatment of missing values}
240
241Textbooks seldom bother with missing values, but in some cases their
242treatment may be far from obvious. This is especially true if missing
243values are interspersed between valid observations. For example,
244consider the plain difference estimator with one lag, so
245\[
246y_t = \alpha y_{t-1} + \eta + \epsilon_t
247\]
248where the $i$ index is omitted for clarity. Suppose you have an
249individual with $t=1\ldots5$, for which $y_3$ is missing. It may seem
250that the data for this individual are unusable, because
251differencing $y_t$ would produce something like
252\[
253\begin{array}{c|ccccc}
254  t & 1 & 2 & 3 & 4 & 5 \\
255  \hline
256  y_t & * & * & \circ & * & * \\
257  \Delta y_t & \circ & * & \circ & \circ & *
258\end{array}
259\]
260where $*$ = nonmissing and $\circ$ = missing. Estimation seems to be
261unfeasible, since there are no periods in which $\Delta y_t$ and
262$\Delta y_{t-1}$ are both observable.
263
264However, we can use a $k$-difference operator and get
265\[
266\Delta_k y_t = \alpha \Delta_k y_{t-1} + \Delta_k \epsilon_t
267\]
268where $\Delta_k = 1 - L^k$ and past levels of $y_t$ are perfectly
269valid instruments. In this example, we can choose $k=3$ and use $y_1$
270as an instrument, so this unit is in fact perfectly usable.
271
272Not all software packages seem to be aware of this possibility, so
273replicating published results may prove tricky if your dataset
274contains individuals with gaps between valid observations.
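
Spelling out the example above: with $y_3$ missing and $k=3$, the
only usable equation is the one for $t=5$,
\[
  y_5 - y_2 = \alpha (y_4 - y_1) + (\epsilon_5 - \epsilon_2) ,
\]
in which all four levels are observed. Assuming that $\epsilon_t$ is
serially uncorrelated, $y_1$ predates both $\epsilon_2$ and
$\epsilon_5$, so $E\left[ y_1 (\epsilon_5 - \epsilon_2) \right] = 0$
and $y_1$ qualifies as an instrument for $y_4 - y_1$.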
275
276\section{Usage}
277\label{sec:dpanel-usage}
278
279One of the concepts underlying the syntax of \texttt{dpanel} is that
280you get default values for several choices you may want to make, so
281that in a ``standard'' situation the command is very concise.  The
282simplest case of the model (\ref{eq:dpd-def}) is a plain AR(1)
283process:
284\begin{equation}
285\label{eq:dp1}
286  y_{i,t} = \alpha y_{i,t-1} + \eta_{i} + v_{it} .
287\end{equation}
288If you give the command
\begin{code}
  dpanel 1 ; y
\end{code}
292gretl assumes that you want to estimate (\ref{eq:dp1}) via the
293difference estimator (\ref{eq:dif-gmm}), using as many orthogonality
294conditions as possible.  The scalar \texttt{1} between \texttt{dpanel}
295and the semicolon indicates that only one lag of \texttt{y} is
296included as an explanatory variable; using \texttt{2} would give an
297AR(2) model. The syntax that gretl uses for the non-seasonal AR and MA
298lags in an ARMA model is also supported in this context. For
299example, if you want the first and third lags of \texttt{y} (but not
300the second) included as explanatory variables you can say
\begin{code}
  dpanel {1 3} ; y
\end{code}
304or you can use a pre-defined matrix for this purpose:
\begin{code}
  matrix ylags = {1, 3}
  dpanel ylags ; y
\end{code}
309To use a single lag of \texttt{y} other than the first you need to
310employ this mechanism:
\begin{code}
  dpanel {3} ; y # only lag 3 is included
  dpanel 3 ; y   # compare: lags 1, 2 and 3 are used
\end{code}
315
316To use the system estimator instead, you add the \verb|--system|
317option, as in
\begin{code}
  dpanel 1 ; y --system
\end{code}
321The level orthogonality conditions and the corresponding instrument
322are appended automatically (see eq.\ \ref{eq:sys-gmm}).
323
324\subsection{Regressors}
325
326If we want to introduce additional regressors, we list them after the
327dependent variable in the same way as other gretl commands, such as
328\texttt{ols}.  For the difference orthogonality relations,
329\texttt{dpanel} takes care of transforming the regressors in parallel
330with the dependent variable.
331
332One case of potential ambiguity is when an intercept is specified but
333the difference-only estimator is selected, as in
\begin{code}
  dpanel 1 ; y const
\end{code}
337In this case the default \texttt{dpanel} behavior, which agrees with
338Stata's \texttt{xtabond2}, is to drop the constant (since differencing
339reduces it to nothing but zeros). However, for compatibility with the
340DPD package for Ox, you can give the option \verb|--dpdstyle|, in
341which case the constant is retained (equivalent to including a linear
342trend in equation~\ref{eq:dpd-def}).  A similar point applies to the
343period-specific dummy variables which can be added in \texttt{dpanel}
344via the \verb|--time-dummies| option: in the differences-only case
345these dummies are entered in differenced form by default, but when the
346\verb|--dpdstyle| switch is applied they are entered in levels.
347
348The standard gretl syntax applies if you want to use lagged
349explanatory variables, so for example the command
\begin{code}
  dpanel 1 ; y const x(0 to -1) --system
\end{code}
353would result in estimation of the model
354\[
355  y_{it} = \alpha y_{i,t-1} +
356  \beta_0 + \beta_1 x_{it} + \beta_2 x_{i,t-1} +
357  \eta_{i} + v_{it} .
358\]
359
360
361\subsection{Instruments}
362
363The default rules for instruments are:
364\begin{itemize}
365\item lags of the dependent variable are instrumented using all
366  available orthogonality conditions; and
367\item additional regressors are considered exogenous, so they are used
368  as their own instruments.
369\end{itemize}
370
371If a different policy is wanted, the instruments should be specified
372in an additional list, separated from the regressors list by a
373semicolon. The syntax closely mirrors that for the \texttt{tsls}
374command, but in this context it is necessary to distinguish between
375``regular'' instruments and what are often called ``GMM-style''
376instruments (that is, instruments that are handled in the same
377block-diagonal manner as lags of the dependent variable, as described
378above).
379
380``Regular'' instruments are transformed in the same way as
381regressors, and the contemporaneous value of the transformed variable
382is used to form an orthogonality condition. Since regressors are
383treated as exogenous by default, it follows that these two commands
384estimate the same model:
385
\begin{code}
  dpanel 1 ; y z
  dpanel 1 ; y z ; z
\end{code}
390The instrument specification in the second case simply confirms what
391is implicit in the first: that \texttt{z} is exogenous. Note, though,
392that if you have some additional variable \texttt{z2} which you want
393to add as a regular instrument, it then becomes necessary to
394include \texttt{z} in the instrument list if it is to be treated
395as exogenous:
\begin{code}
  dpanel 1 ; y z ; z2   # z is now implicitly endogenous
  dpanel 1 ; y z ; z z2 # z is treated as exogenous
\end{code}
400
401The specification of ``GMM-style'' instruments is handled by the
402special constructs \texttt{GMM()} and \texttt{GMMlevel()}.  The first
403of these relates to instruments for the equations in differences, and
404the second to the equations in levels. The syntax for \texttt{GMM()}
405is
406
407\begin{altcode}
408\texttt{GMM(}\textsl{name}\texttt{,} \textsl{minlag}\texttt{,}
409\textsl{maxlag}\texttt{)}
410\end{altcode}
411
412\noindent
413where \textsl{name} is replaced by the name of a series (or the name
414of a list of series), and \textsl{minlag} and \textsl{maxlag} are
415replaced by the minimum and maximum lags to be used as
416instruments. The same goes for \texttt{GMMlevel()}.
417
418One common use of \texttt{GMM()} is to limit the number of lagged
419levels of the dependent variable used as instruments for the equations
420in differences. It's well known that although exploiting all possible
421orthogonality conditions yields maximal asymptotic efficiency, in
422finite samples it may be preferable to use a smaller subset (but see
423also \cite{OkuiJoE2009}).  For example, the specification
424
\begin{code}
  dpanel 1 ; y ; GMM(y, 2, 4)
\end{code}
428ensures that no lags of $y_t$ earlier than $t-4$ will be used as
429instruments.
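
A rough count shows what such a restriction buys. Assuming, as in the
layout of $\bZ_i$ shown earlier, that each period of the differenced
equations gets its own block of instruments, period $t$ can use $t-2$
lagged levels in the unrestricted case and at most three (lags 2 to 4)
under \texttt{GMM(y, 2, 4)}. The following lines simply add these up
for a hypothetical panel with $T=7$; all names are illustrative and no
dataset is needed.
\begin{code}
scalar T = 7         # hypothetical number of periods
scalar maxinst = 3   # GMM(y, 2, 4) allows lags 2, 3 and 4
scalar n_all = 0
scalar n_cap = 0
loop j=3..T
    n_all += j - 2                  # all available lagged levels
    n_cap += xmin(j - 2, maxinst)   # capped at three per period
endloop
printf "all available lags: %d conditions\n", n_all
printf "lags 2 to 4 only:   %d conditions\n", n_cap
\end{code}
With these values the totals are 15 and 12 respectively; the gap
widens quickly as $T$ grows.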
430
431A second use of \texttt{GMM()} is to exploit more fully the potential
432block-diagonal orthogonality conditions offered by an exogenous
433regressor, or a related variable that does not appear as a regressor.
434For example, in
435
\begin{code}
  dpanel 1 ; y x ; GMM(z, 2, 6)
\end{code}
439the variable \texttt{x} is considered an endogenous regressor, and up to
4405 lags of \texttt{z} are used as instruments.
441
442Note that in the following script fragment
\begin{code}
  dpanel 1 ; y z
  dpanel 1 ; y z ; GMM(z,0,0)
\end{code}
447the two estimation commands should not be expected to give the same
448result, as the sets of orthogonality relationships are subtly
449different.  In the latter case, you have $T-2$ separate orthogonality
450relationships pertaining to $z_{it}$, none of which has any
451implication for the other ones; in the former case, you only have one.
452In terms of the $\bZ_i$ matrix, the first form adds a single row to
453the bottom of the instruments matrix, while the second form adds a
454diagonal block with $T-2$ columns; that is,
\[
  \left[ \begin{array}{cccc}
         z_{i3} & z_{i4} & \cdots & z_{iT}
       \end{array} \right]
\]
versus
\[
  \left[ \begin{array}{cccc}
         z_{i3} & 0 & \cdots & 0 \\
         0 & z_{i4} & \cdots & 0 \\
          & \ddots & \ddots &  \\
         0 & 0 & \cdots & z_{iT}
       \end{array} \right]
\]
469
470\section{Replication of DPD results}
471\label{sec:DPD-replic}
472
473In this section we show how to replicate the results of some of the
474pioneering work with dynamic panel-data estimators by Arellano, Bond
475and Blundell.  As the DPD manual \citep*{DPDmanual} explains, it is
476difficult to replicate the original published results exactly, for two
477main reasons: not all of the data used in those studies are publicly
478available; and some of the choices made in the original software
479implementation of the estimators have been superseded.  Here,
480therefore, our focus is on replicating the results obtained using the
481current DPD package and reported in the DPD manual.
482
483The examples are based on the program files \texttt{abest1.ox},
484\texttt{abest3.ox} and \texttt{bbest1.ox}. These are included in the
485DPD package, along with the Arellano--Bond database files
486\texttt{abdata.bn7} and \texttt{abdata.in7}.\footnote{See
487  \url{http://www.doornik.com/download.html}.} The
488Arellano--Bond data are also provided with gretl, in the file
489\texttt{abdata.gdt}. In the following we do not show the output from
490DPD or gretl; it is somewhat voluminous, and is easily generated by
491the user. As of this writing the results from Ox/DPD and gretl are
492identical in all relevant respects for all of the examples
493shown.\footnote{To be specific, this is using Ox Console version 5.10,
494  version 1.24 of the DPD package, and gretl built from CVS as of
495  2010-10-23, all on Linux.}
496
497A complete Ox/DPD program to generate the results of interest takes
498this general form:
499
\begin{code}
#include <oxstd.h>
#import <packages/dpd/dpd>

main()
{
    decl dpd = new DPD();

    dpd.Load("abdata.in7");
    dpd.SetYear("YEAR");

    // model-specific code here

    delete dpd;
}
\end{code}
516%
517In the examples below we take this template for granted and show just
518the model-specific code.
519
520\subsection{Example 1}
521
522The following Ox/DPD code---drawn from \texttt{abest1.ox}---replicates
523column (b) of Table 4 in \cite{arellano-bond91}, an instance of the
524differences-only or GMM-DIF estimator. The dependent variable is the
525log of employment, \texttt{n}; the regressors include two lags of the
526dependent variable, current and lagged values of the log real-product
527wage, \texttt{w}, the current value of the log of gross capital,
528\texttt{k}, and current and lagged values of the log of industry
529output, \texttt{ys}. In addition the specification includes a constant
530and five year dummies; unlike the stochastic regressors, these
531deterministic terms are not differenced. In this specification the
532regressors \texttt{w}, \texttt{k} and \texttt{ys} are treated as
533exogenous and serve as their own instruments. In DPD syntax this
534requires entering these variables twice, on the \verb|X_VAR| and
535\verb|I_VAR| lines. The GMM-type (block-diagonal) instruments in this
536example are the second and subsequent lags of the level of \texttt{n}.
537Both 1-step and 2-step estimates are computed.
538
\begin{code}
dpd.SetOptions(FALSE); // don't use robust standard errors
dpd.Select(Y_VAR, {"n", 0, 2});
dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(I_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});

dpd.Gmm("n", 2, 99);
dpd.SetDummies(D_CONSTANT + D_TIME);

print("\n\n***** Arellano & Bond (1991), Table 4 (b)");
dpd.SetMethod(M_1STEP);
dpd.Estimate();
dpd.SetMethod(M_2STEP);
dpd.Estimate();
\end{code}
554
555Here is gretl code to do the same job:
556
\begin{code}
open abdata.gdt
list X = w w(-1) k ys ys(-1)
dpanel 2 ; n X const --time-dummies --asy --dpdstyle
dpanel 2 ; n X const --time-dummies --asy --two-step --dpdstyle
\end{code}
563
564Note that in gretl the switch to suppress robust standard errors is
565\verb|--asymptotic|, here abbreviated to \verb|--asy|.\footnote{Option
566  flags in gretl can always be truncated, down to the minimal unique
567  abbreviation.} The \verb|--dpdstyle| flag specifies that the
568constant and dummies should not be differenced, in the context of a
569GMM-DIF model. With gretl's \texttt{dpanel} command it is not
570necessary to specify the exogenous regressors as their own instruments
571since this is the default; similarly, the use of the second and all
572longer lags of the dependent variable as GMM-type instruments is the
573default and need not be stated explicitly.
574
575\subsection{Example 2}
576
577The DPD file \texttt{abest3.ox} contains a variant of the above that
578differs with regard to the choice of instruments: the variables
579\texttt{w} and \texttt{k} are now treated as predetermined, and are
580instrumented GMM-style using the second and third lags of their
581levels. This approximates column (c) of Table 4 in
582\cite{arellano-bond91}.  We have modified the code in
583\texttt{abest3.ox} slightly to allow the use of robust
584(Windmeijer-corrected) standard errors, which are the default in both
585DPD and gretl with 2-step estimation:
586
\begin{code}
dpd.Select(Y_VAR, {"n", 0, 2});
dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(I_VAR, {"ys", 0, 1});
dpd.SetDummies(D_CONSTANT + D_TIME);

dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 3);
dpd.Gmm("k", 2, 3);

print("\n***** Arellano & Bond (1991), Table 4 (c)\n");
print("        (but using different instruments!!)\n");
dpd.SetMethod(M_2STEP);
dpd.Estimate();
\end{code}
602
603The gretl code is as follows:
604
\begin{code}
open abdata.gdt
list X = w w(-1) k ys ys(-1)
list Ivars = ys ys(-1)
dpanel 2 ; n X const ; GMM(w,2,3) GMM(k,2,3) Ivars --time --two-step --dpd
\end{code}
611%
Note that since we are now calling for an instrument set other than
613the default (following the second semicolon), it is necessary to
614include the \texttt{Ivars} specification for the variable \texttt{ys}.
615However, it is not necessary to specify \texttt{GMM(n,2,99)} since
616this remains the default treatment of the dependent variable.
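
For illustration, the default can also be made explicit. On the basis
of the statement above, the following variant should produce the same
estimates as the command just shown:
\begin{code}
open abdata.gdt
list X = w w(-1) k ys ys(-1)
list Ivars = ys ys(-1)
# as before, but spelling out the default instruments for n
dpanel 2 ; n X const ; GMM(n,2,99) GMM(w,2,3) GMM(k,2,3) Ivars \
 --time --two-step --dpd
\end{code}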
617
618\subsection{Example 3}
619
620Our third example replicates the DPD output from \texttt{bbest1.ox}:
621this uses the same dataset as the previous examples but the model
622specifications are based on \cite{blundell-bond98}, and involve
623comparison of the GMM-DIF and GMM-SYS (``system'') estimators. The
624basic specification is slightly simplified in that the variable
625\texttt{ys} is not used and only one lag of the dependent variable
626appears as a regressor. The Ox/DPD code is:
627
\begin{code}
dpd.Select(Y_VAR, {"n", 0, 1});
dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 1});
dpd.SetDummies(D_CONSTANT + D_TIME);

print("\n\n***** Blundell & Bond (1998), Table 4: 1976-86 GMM-DIF");
dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 99);
dpd.Gmm("k", 2, 99);
dpd.SetMethod(M_2STEP);
dpd.Estimate();

print("\n\n***** Blundell & Bond (1998), Table 4: 1976-86 GMM-SYS");
dpd.GmmLevel("n", 1, 1);
dpd.GmmLevel("w", 1, 1);
dpd.GmmLevel("k", 1, 1);
dpd.SetMethod(M_2STEP);
dpd.Estimate();
\end{code}
647
648Here is the corresponding gretl code:
649
\begin{code}
open abdata.gdt
list X = w w(-1) k k(-1)
list Z = w k

# Blundell & Bond (1998), Table 4: 1976-86 GMM-DIF
dpanel 1 ; n X const ; GMM(Z,2,99) --time --two-step --dpd

# Blundell & Bond (1998), Table 4: 1976-86 GMM-SYS
dpanel 1 ; n X const ; GMM(Z,2,99) GMMlevel(Z,1,1) \
 --time --two-step --dpd --system
\end{code}
662
663Note the use of the \verb|--system| option flag to specify GMM-SYS,
664including the default treatment of the dependent variable, which
665corresponds to \texttt{GMMlevel(n,1,1)}. In this case we also want to
666use lagged differences of the regressors \texttt{w} and \texttt{k} as
667instruments for the levels equations so we need explicit
668\texttt{GMMlevel} entries for those variables. If you want something
669other than the default treatment for the dependent variable as an
670instrument for the levels equations, you should give an explicit
671\texttt{GMMlevel} specification for that variable---and in that case
672the \verb|--system| flag is redundant (but harmless).
673
674For the sake of completeness, note that if you specify at least one
675\texttt{GMMlevel} term, \texttt{dpanel} will then include equations in
676levels, but it will not automatically add a default \texttt{GMMlevel}
677specification for the dependent variable unless the \verb|--system|
678option is given.
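
The two cases can be sketched as follows, reusing the lists from the
GMM-SYS specification above. This is our reading of the rules just
stated rather than a tested replication: the first command should
include the levels equations without using \texttt{n} as a
\texttt{GMMlevel} instrument, while the second, with an explicit
\texttt{GMMlevel(n,1,1)} term, should match the \verb|--system| run
shown earlier.
\begin{code}
open abdata.gdt
list X = w w(-1) k k(-1)
list Z = w k

# levels equations included; no GMMlevel instruments for n
dpanel 1 ; n X const ; GMM(Z,2,99) GMMlevel(Z,1,1) \
 --time --two-step --dpd

# explicit GMMlevel term for n, so --system is not needed
dpanel 1 ; n X const ; GMM(Z,2,99) GMMlevel(Z,1,1) GMMlevel(n,1,1) \
 --time --two-step --dpd
\end{code}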
679
680\section{Cross-country growth example}
681\label{sec:dpanel-growth}
682
683The previous examples all used the Arellano--Bond dataset; for this
684example we use the dataset \texttt{CEL.gdt}, which is also included in
685the gretl distribution. As with the Arellano--Bond data, there are
686numerous missing values.  Details of the provenance of the data can be
687found by opening the dataset information window in the gretl GUI
688(\textsf{Data} menu, \textsf{Dataset info} item). This is a subset of
689the Barro--Lee 138-country panel dataset, an approximation to which is
690used in \citet*{CEL96} and \citet*{Bond2001}.\footnote{We say an
691  ``approximation'' because we have not been able to replicate exactly
692  the OLS results reported in the papers cited, though it seems from
693  the description of the data in \cite{CEL96} that we ought to be able
694  to do so.  We note that \cite{Bond2001} used data provided by
695  Professor Caselli yet did not manage to reproduce the latter's
696  results.}  Both of these papers explore the dynamic panel-data
697approach in relation to the issues of growth and convergence of per
698capita income across countries.
699
700The dependent variable is growth in real GDP per capita over
701successive five-year periods; the regressors are the log of the
702initial (five years prior) value of GDP per capita, the log-ratio of
703investment to GDP, $s$, in the prior five years, and the log of annual
704average population growth, $n$, over the prior five years plus 0.05 as
705stand-in for the rate of technical progress, $g$, plus the rate of
706depreciation, $\delta$ (with the last two terms assumed to be constant
707across both countries and periods).  The original model is
708\begin{equation}
709\label{eq:CEL96}
710\Delta_5 y_{it} = \beta y_{i,t-5} + \alpha s_{it} + \gamma (n_{it} +
711g + \delta) + \nu_t + \eta_i + \epsilon_{it}
712\end{equation}
713which allows for a time-specific disturbance $\nu_t$. The Solow model
714with Cobb--Douglas production function implies that $\gamma =
715-\alpha$, but this assumption is not imposed in estimation. The
716time-specific disturbance is eliminated by subtracting the period mean
717from each of the series.
718
719Equation (\ref{eq:CEL96}) can be transformed to an AR(1) dynamic
720panel-data model by adding $y_{i,t-5}$ to both sides, which gives
721\begin{equation}
722\label{eq:CEL96a}
723y_{it} = (1 + \beta) y_{i,t-5} + \alpha s_{it} + \gamma (n_{it} +
724g + \delta) + \eta_i + \epsilon_{it}
725\end{equation}
726where all variables are now assumed to be time-demeaned.
727
728In (rough) replication of \cite{Bond2001} we now proceed to estimate
729the following two models: (a) equation (\ref{eq:CEL96a}) via GMM-DIF,
730using as instruments the second and all longer lags of $y_{it}$,
731$s_{it}$ and $n_{it} + g + \delta$; and (b) equation
732(\ref{eq:CEL96a}) via GMM-SYS, using $\Delta y_{i,t-1}$, $\Delta
733s_{i,t-1}$ and $\Delta (n_{i,t-1} + g + \delta)$ as additional
734instruments in the levels equations. We report robust standard errors
735throughout. (As a purely notational matter, we now use ``$t-1$'' to
736refer to values five years prior to $t$, as in \cite{Bond2001}).

The gretl script to do this job is shown below. Note that the final
transformed versions of the variables (logs, with time-means
subtracted) are named \texttt{ly} ($y_{it}$), \texttt{linv} ($s_{it}$)
and \texttt{lngd} ($n_{it} + g + \delta$).
%
\begin{code}
open CEL.gdt

ngd = n + 0.05
ly = log(y)
linv = log(s)
lngd = log(ngd)

# take out time means
loop i=1..8
  smpl (time == i) --restrict --replace
  ly -= mean(ly)
  linv -= mean(linv)
  lngd -= mean(lngd)
endloop

smpl --full
list X = linv lngd
# 1-step GMM-DIF
dpanel 1 ; ly X ; GMM(X,2,99)
# 2-step GMM-DIF
dpanel 1 ; ly X ; GMM(X,2,99) --two-step
# GMM-SYS
dpanel 1 ; ly X ; GMM(X,2,99) GMMlevel(X,1,1) --two-step --sys
\end{code}

For comparison we estimated the same two models using Ox/DPD and the
Stata command \texttt{xtabond2}. (In each case we constructed a
comma-separated values dataset containing the data as transformed in
the gretl script shown above, using a missing-value code appropriate
to the target program.) For reference, the commands used with
Stata are reproduced below:
%
\begin{code}
#delimit ;
insheet using CEL.csv;
tsset unit time;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
  gmm(lngd, lag(2 99)) rob nolev;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
  gmm(lngd, lag(2 99)) rob nolev twostep;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
  gmm(lngd, lag(2 99)) rob nocons twostep;
\end{code}

For the GMM-DIF model all three programs find 382 usable observations
and 30 instruments, and yield identical parameter estimates and
robust standard errors (up to the number of digits printed, or more);
see Table~\ref{tab:growth-DIF}.\footnote{The coefficient shown for
  \texttt{ly(-1)} in the Tables is that reported directly by the
  software; for comparability with the original model (eq.\
  \ref{eq:CEL96}) it is necessary to subtract 1, which produces the
  expected negative value indicating conditional convergence in per
  capita income.}

\begin{table}[htbp]
\begin{center}
\begin{tabular}{lrrrr}
& \multicolumn{2}{c}{1-step} & \multicolumn{2}{c}{2-step} \\
& \multicolumn{1}{c}{coeff} & \multicolumn{1}{c}{std.\ error} &
  \multicolumn{1}{c}{coeff} & \multicolumn{1}{c}{std.\ error} \\
\texttt{ly(-1)} & 0.577564 & 0.1292 & 0.610056 & 0.1562 \\
\texttt{linv} & 0.0565469 & 0.07082 & 0.100952 & 0.07772 \\
\texttt{lngd} & $-$0.143950 & 0.2753 & $-$0.310041 & 0.2980 \\
\end{tabular}
\caption{GMM-DIF: Barro--Lee data}
\label{tab:growth-DIF}
\end{center}
\end{table}
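
As a worked instance of the subtraction needed for comparability with
eq.\ (\ref{eq:CEL96}), the \texttt{ly(-1)} coefficients in
Table~\ref{tab:growth-DIF} imply the following estimates of $\beta$
(the superscripts $(1)$ and $(2)$ are our own labels for the 1-step
and 2-step results):
\[
  \hat{\beta}^{(1)} = 0.577564 - 1 = -0.422436 , \qquad
  \hat{\beta}^{(2)} = 0.610056 - 1 = -0.389944 .
\]
Both values are negative, as conditional convergence requires. The
standard errors carry over unchanged, since subtracting a constant
does not affect the variance of the estimator.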

Results for GMM-SYS estimation are shown in
Table~\ref{tab:growth-SYS}. In this case we show two sets of gretl
results: those labeled ``gretl(1)'' were obtained using gretl's
\verb|--dpdstyle| option, while those labeled ``gretl(2)'' did not use
that option, the intent being to reproduce the $H$ matrices used by
Ox/DPD and \texttt{xtabond2} respectively.

\begin{table}[htbp]
\begin{center}
\begin{tabular}{lrrrr}
& \multicolumn{1}{c}{gretl(1)} &
  \multicolumn{1}{c}{Ox/DPD} &
  \multicolumn{1}{c}{gretl(2)} &
  \multicolumn{1}{c}{xtabond2} \\
\texttt{ly(-1)} & 0.9237 (0.0385) &
  0.9167 (0.0373) &
    0.9073 (0.0370) &
      0.9073 (0.0370) \\
\texttt{linv} & 0.1592 (0.0449) &
  0.1636 (0.0441) &
    0.1856 (0.0411) &
      0.1856 (0.0411) \\
\texttt{lngd} & $-$0.2370 (0.1485) &
  $-$0.2178 (0.1433) &
    $-$0.2355 (0.1501) &
      $-$0.2355 (0.1501)
\end{tabular}
\caption{2-step GMM-SYS: Barro--Lee data (standard errors in parentheses)}
\label{tab:growth-SYS}
\end{center}
\end{table}

In this case all three programs use 479 observations; gretl and
\texttt{xtabond2} use 41 instruments and produce the same estimates
(when using the same $H$ matrix) while Ox/DPD nominally uses
66.\footnote{This is a case of the issue described in
  section~\ref{sec:rankdef}: the full $\bA_N$ matrix turns out to be
  singular and special measures must be taken to produce estimates.}
It is noteworthy that with GMM-SYS plus ``messy'' missing
observations, the results depend on the precise array of instruments
used, which in turn depends on the details of the implementation of
the estimator.

\section{Auxiliary test statistics}
\label{sec:dpanel-aux}

We have concentrated above on the parameter estimates and standard
errors. Here we add a few words on the additional test statistics that
typically accompany both GMM-DIF and GMM-SYS estimation. These include
the Sargan test for overidentification, one or more Wald tests for the
joint significance of the regressors (and time dummies, if applicable)
and tests for first- and second-order autocorrelation of the residuals
from the equations in differences.

As in Ox/DPD, the Sargan test statistic reported by gretl is
\[
  S = \left(\sum_{i=1}^N \hat{\bv}^{*\prime}_i \bZ_i\right)
   \bA_N \left(\sum_{i=1}^N \bZ_i' \hat{\bv}^*_i\right)
\]
where the $\hat{\bv}^*_i$ are the transformed (e.g.\ differenced)
residuals for unit $i$.  Under the null hypothesis that the
instruments are valid, $S$ is asymptotically distributed as chi-square
with degrees of freedom equal to the number of overidentifying
restrictions.

In general we see a good level of agreement between gretl, DPD and
\texttt{xtabond2} with regard to these statistics, with a few
relatively minor exceptions. Specifically, \texttt{xtabond2} computes
both a ``Sargan test'' and a ``Hansen test'' for overidentification,
but what it calls the Hansen test is, apparently, what DPD calls the
Sargan test. (We have had difficulty determining from the
\texttt{xtabond2} documentation \citep{Roodman2006} exactly how its
Sargan test is computed.) In addition there are cases where the
degrees of freedom for the Sargan test differ between DPD and gretl;
this occurs when the $\bA_N$ matrix is singular
(section~\ref{sec:rankdef}). In concept the df equals the number of
instruments minus the number of parameters estimated; for the first of
these terms gretl uses the rank of $\bA_N$, while DPD appears to use
the full dimension of this matrix.
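
To make the rule concrete, take the GMM-DIF growth regression
reported in Table~\ref{tab:growth-DIF}, where 30 instruments are used
to estimate 3 parameters (the coefficients on \texttt{ly(-1)},
\texttt{linv} and \texttt{lngd}). Assuming that the 30 instruments are
linearly independent, so that their count coincides with the rank of
$\bA_N$ (an assumption on our part, since the df is not reported
above), the calculation is
\[
  \mathrm{df} = 30 - 3 = 27 .
\]
If $\bA_N$ were singular, gretl would use its rank in place of 30,
while DPD would apparently keep the full dimension; this is the source
of the discrepancy just described.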

Negative first-order autocorrelation of the residuals is expected by
construction of the estimator, so a significant value for the AR(1)
test does not indicate a problem. If the AR(2) test is significant,
however, this indicates violation of the maintained assumptions. Note
that valid AR tests cannot be produced when the \verb|--asymptotic|
option is specified in conjunction with one-step GMM-SYS estimation;
if you need the tests, either add the \verb|--two-step| option or drop
the asymptotic flag (which is recommended in any case).

\section{Post-estimation available statistics}
\label{sec:dpanel-post}

After estimation, the \dollar{model} accessor will return a bundle
containing several items that may be of interest: most should be
self-explanatory, but here's a partial list:

\begin{center}
\begin{tabular}{rp{0.6\textwidth}}
  \hline
  \textbf{Key} & \textbf{Content} \\
  \hline
  \texttt{AR1}, \texttt{AR2} & 1st and 2nd order autocorrelation test
                               statistics \\
  \texttt{sargan}, \texttt{sargan\_df} & Sargan test for
                                         overidentifying restrictions
                                         and corresponding degrees of freedom \\
  \texttt{wald}, \texttt{wald\_df} & Wald test for
                                     overall significance
                                     and corresponding degrees of
                                     freedom \\
  \texttt{GMMinst} & The matrix $\bZ$ of instruments (see equations
                     (\ref{eq:dpd-dif}) and (\ref{eq:sys-gmm})) \\
  \texttt{wgtmat} & The matrix $\bA$ of GMM weights (see equations
                    (\ref{eq:dpd-dif}) and (\ref{eq:sys-gmm})) \\
  \hline
\end{tabular}
\end{center}

Note, however, that \texttt{GMMinst} and \texttt{wgtmat} (which may be
quite large matrices) are not saved in the \dollar{model} bundle by
default; that requires use of the \option{keep-extra} option with the
\cmd{dpanel} command. Listing~\ref{ex:dpanel-rep} illustrates use
of these matrices to replicate via hansl commands the calculation of
the GMM estimator.
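
The scalar items can be read straight from the bundle. The following
is a minimal sketch, assuming the \texttt{abdata.gdt} sample file and
a specification like that in Listing~\ref{ex:dpanel-rep}; the bundle
name \texttt{b} is an arbitrary choice of ours. The check via
\texttt{inbundle} guards against the case where the extra matrices
were not requested.
%
\begin{code}
open abdata.gdt --quiet
list X = w w(-1) k k(-1)
list Z = w k

# estimate quietly, asking for the extra matrices to be kept
dpanel 1 ; n X const ; GMM(Z,2,99) --two-step --keep-extra --quiet

# copy the model bundle and read the scalar test statistics
bundle b = $model
printf "Sargan: %g on %d df\n", b.sargan, b.sargan_df
printf "Wald:   %g on %d df\n", b.wald, b.wald_df
printf "AR(1):  %g, AR(2): %g\n", b.AR1, b.AR2

# the large matrices are present only because of --keep-extra
if inbundle(b, "GMMinst")
    printf "GMMinst is %d x %d\n", rows(b.GMMinst), cols(b.GMMinst)
    printf "wgtmat is %d x %d\n", rows(b.wgtmat), cols(b.wgtmat)
endif
\end{code}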

\begin{script}[p]
  \scriptcaption{replication of built-in command via hansl commands}
  \label{ex:dpanel-rep}
\begin{scode}
set verbose off
open abdata.gdt

# compose list of regressors
list X = w w(-1) k k(-1)
list Z = w k

dpanel 1 ; n X const ; GMM(Z,2,99) --two-step --dpd --keep-extra

### --- re-do by hand ----------------------------

# fetch Z and A from model
A = $model.wgtmat
mZt = $model.GMMinst # note: transposed

# create data matrices
series valid = ok($uhat)
series ddep = diff(n)
series dldep = ddep(-1)
list dreg = diff(X)

smpl valid --dummy

matrix m_reg = {dldep} ~ {dreg} ~ 1
matrix m_dep = {ddep}

matrix uno = mZt * m_reg
matrix due = qform(uno', A)
matrix tre = (uno'A) * (mZt * m_dep)
matrix coef = due\tre

print coef
\end{scode}
\end{script}


\section{Memo: \texttt{dpanel} options}
\label{sec:dpanel-options}

\begin{center}
\begin{tabular}{lp{.7\textwidth}}
  \textit{flag} & \textit{effect} \\ [6pt]
  \verb|--asymptotic| & Suppresses the use of robust standard errors \\
  \verb|--two-step| & Calls for 2-step estimation (the default being 1-step) \\
  \verb|--system| & Calls for GMM-SYS, with default treatment of the
                    dependent variable, as in \texttt{GMMlevel(y,1,1)} \\
  \verb|--time-dummies| & Includes period-specific dummy variables \\
  \verb|--dpdstyle| & Computes the $H$ matrix as in DPD; also suppresses
                      differencing of automatic time dummies and omission of intercept
                      in the GMM-DIF case \\
  \verb|--verbose| & Prints confirmation of the GMM-style instruments
                     used; and when \verb|--two-step| is selected, prints
                     the 1-step estimates first \\
  \verb|--vcv| & Calls for printing of the covariance matrix \\
  \verb|--quiet| & Suppresses the printing of results \\
  \verb|--keep-extra| & Saves additional matrices in the \dollar{model}
                        bundle (see above) \\
\end{tabular}
\end{center}

The time dummies option supports the qualifier \texttt{noprint}, as
in
\begin{code}
--time-dummies=noprint
\end{code}

This means that although the dummies are included in the
specification, their coefficients, standard errors and so on are not
printed.
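
For instance, a call along the following lines includes time dummies
while keeping them out of the printed results. This is a sketch only:
the dataset and regressors are borrowed from
Listing~\ref{ex:dpanel-rep} purely for illustration.
%
\begin{code}
open abdata.gdt --quiet
list X = w w(-1) k k(-1)
list Z = w k
# time dummies enter the specification but are not printed
dpanel 1 ; n X const ; GMM(Z,2,99) --time-dummies=noprint
\end{code}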

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End:
