
\chapter{Dynamic panel models}
\label{chap:dpanel}

\newcommand{\by}{\mathbf{y}}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\bv}{\mathbf{v}}
\newcommand{\bX}{\mathbf{X}}
\newcommand{\bW}{\mathbf{W}}
\newcommand{\bZ}{\mathbf{Z}}
\newcommand{\bA}{\mathbf{A}}
\newcommand{\bM}{\mathbf{M}}
\newcommand{\biota}{\bm{\iota}}

\newenvironment%
{altcode}%
{\vspace{1ex}\small\leftmargin 1em}{\vspace{1ex}}

The command for estimating dynamic panel models in gretl is
\texttt{dpanel}. This command supports both the ``difference''
estimator \citep{arellano-bond91} and the ``system'' estimator
\citep{blundell-bond98}, which has become the method of choice in the
applied literature.

\section{Introduction}
\label{sec:dpanel-intro}

\subsection{Notation}
\label{sec:dpanel-notation}

A dynamic linear panel data model can be represented as follows
(in notation based on \cite{arellano03}):
\begin{equation}
  \label{eq:dpd-def}
  y_{it} = \alpha y_{i,t-1} + \beta'x_{it} + \eta_{i} + v_{it}
\end{equation}
where $i=1,2,\ldots,N$ indexes the cross-section units, $t$ indexes
time, $\eta_i$ is an unobserved individual effect and $v_{it}$ is an
idiosyncratic disturbance.

The main idea behind the difference estimator is to sweep out the
individual effect via differencing.  First-differencing eq.\
(\ref{eq:dpd-def}) yields
\begin{equation}
  \label{eq:dpd-dif}
  \Delta y_{it} = \alpha \Delta y_{i,t-1} + \beta'\Delta x_{it} +
  \Delta v_{it} = \gamma' W_{it} + \Delta v_{it} ,
\end{equation}
in obvious notation. The error term of (\ref{eq:dpd-dif}) is, by
construction, autocorrelated and also correlated with the lagged
dependent variable, so an estimator that takes both issues into
account is needed. The endogeneity issue is solved by noting that all
values of $y_{i,t-k}$ with $k>1$ can be used as instruments for
$\Delta y_{i,t-1}$: unobserved values of $y_{i,t-k}$ (whether missing
or pre-sample) can safely be substituted with 0. In the language of
GMM, this amounts to using the relation
\begin{equation}
  \label{eq:OC-dif}
  E(\Delta v_{it} \cdot y_{i,t-k}) = 0, \quad k>1
\end{equation}
as an orthogonality condition.
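
As a concrete illustration (a worked example under the assumption of a
balanced panel with $T=5$ and no missing observations), condition
(\ref{eq:OC-dif}) makes the following lagged levels available as
instruments for each differenced equation:
\[
\begin{array}{c|l}
  t & \mbox{instruments for } \Delta y_{i,t-1} \\
  \hline
  3 & y_{i1} \\
  4 & y_{i1},\; y_{i2} \\
  5 & y_{i1},\; y_{i2},\; y_{i3}
\end{array}
\]
This gives $1+2+3=6$ orthogonality conditions for the lagged dependent
variable; in general the count is $(T-2)(T-1)/2$, so it grows
quadratically with $T$.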

Autocorrelation is dealt with by noting that if $v_{it}$ is white
noise, the covariance matrix of the vector whose typical element is
$\Delta v_{it}$ is proportional to a matrix $H$ that has 2 on the main
diagonal, $-1$ on the first subdiagonals and 0 elsewhere.  One-step
GMM estimation of equation (\ref{eq:dpd-dif}) amounts to computing
\begin{equation}
\label{eq:dif-gmm}
  \hat{\gamma} = \left[
    \left( \sum_i \bW_i'\bZ_i \right) \bA_N
    \left( \sum_i \bZ_i'\bW_i \right) \right]^{-1}
    \left( \sum_i \bW_i'\bZ_i \right) \bA_N
    \left( \sum_i \bZ_i'\Delta \by_i \right)
\end{equation}
where
\begin{align*}
  \Delta \by_i  & =
     \left[ \begin{array}{ccc}
         \Delta y_{i,3} & \cdots & \Delta y_{i,T}
       \end{array} \right]' \\
  \bW_i  & =
     \left[ \begin{array}{ccc}
         \Delta y_{i,2} & \cdots & \Delta y_{i,T-1} \\
         \Delta x_{i,3} & \cdots & \Delta x_{i,T} \\
       \end{array} \right]' \\
  \bZ_i  & =
     \left[ \begin{array}{ccccccc}
         y_{i1} & 0 & 0 & \cdots & 0 & \Delta x_{i3}\\
         0 & y_{i1} & y_{i2} & \cdots & 0 & \Delta x_{i4}\\
         & & \vdots \\
         0 & 0 & 0 & \cdots & y_{i, T-2} & \Delta x_{iT} \\
       \end{array} \right]' \\
  \intertext{and}
  \bA_N & = \left( \sum_i \bZ_i' H \bZ_i \right)^{-1}
\end{align*}
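
The matrix $H$ described above is easy to construct in hansl. The
following is a minimal sketch, in which the dimension \texttt{m}
(standing for $T-2$, the number of differenced equations per unit) is
chosen arbitrarily for illustration; it is not meant to reproduce the
code that \texttt{dpanel} uses internally.
\begin{code}
# Illustrative only: H for m = T-2 differenced equations
scalar m = 4
scalar m1 = m - 1
matrix H = 2 * I(m)      # 2 on the main diagonal
loop j=1..m1
    H[j,j+1] = -1        # first superdiagonal
    H[j+1,j] = -1        # first subdiagonal
endloop
print H
\end{code}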

Once the 1-step estimator is computed, the sample covariance matrix of
the estimated residuals can be used instead of $H$ to obtain 2-step
estimates, which are not only consistent but asymptotically
efficient. (In principle the process may be iterated, but nobody seems
to be interested.) Standard GMM theory applies, except for one thing:
\cite{Windmeijer05} has computed finite-sample corrections to the
asymptotic covariance matrix of the parameters, which are nowadays
almost universally used.

The difference estimator is consistent, but has been shown to have
poor properties in finite samples when $\alpha$ is near one. People
these days prefer the so-called ``system'' estimator, which
complements the differenced data (with lagged levels used as
instruments) with data in levels (using lagged differences as
instruments). The system estimator relies on an extra orthogonality
condition which has to do with the earliest value of the dependent
variable $y_{i,1}$. The interested reader is referred to \citet[pp.\
124--125]{blundell-bond98} for details, but here it suffices to say
that this condition is satisfied in mean-stationary models and brings
an improvement in efficiency that may be substantial in many cases.

The set of orthogonality conditions exploited in the system approach
is not very much larger than with the difference estimator since most
of the possible orthogonality conditions associated with the equations
in levels are redundant, given those already used for the equations in
differences.

The key equations of the system estimator can be written as

\begin{equation}
\label{eq:sys-gmm}
  \tilde{\gamma} = \left[
    \left( \sum_i \tilde{\bW}_i'\tilde{\bZ}_i \right) \bA_N
    \left( \sum_i \tilde{\bZ}_i'\tilde{\bW}_i \right) \right]^{-1}
    \left( \sum_i \tilde{\bW}_i'\tilde{\bZ}_i \right) \bA_N
    \left( \sum_i \tilde{\bZ}_i'\Delta \tilde{\by}_i \right)
\end{equation}
where
\begin{align*}
  \Delta \tilde{\by}_i  & =
     \left[ \begin{array}{ccccccc}
         \Delta y_{i3} & \cdots & \Delta y_{iT} & y_{i3} & \cdots & y_{iT}
       \end{array} \right]' \\
  \tilde{\bW}_i  & =
     \left[ \begin{array}{cccccc}
         \Delta y_{i2} & \cdots & \Delta y_{i,T-1} & y_{i2} & \cdots & y_{i,T-1} \\
         \Delta x_{i3} & \cdots & \Delta x_{iT}  & x_{i3} & \cdots & x_{iT} \\
       \end{array} \right]' \\
  \tilde{\bZ}_i  & =
     \left[ \begin{array}{ccccccccc}
         y_{i1} & 0 & 0       & \cdots & 0  & 0  & \cdots & 0 & \Delta x_{i,3}\\
         0 & y_{i1} & y_{i2} & \cdots & 0  & 0  & \cdots & 0 & \Delta x_{i,4}\\
         & & \vdots \\
         0 & 0 & 0 & \cdots & y_{i, T-2} & 0  & \cdots & 0  & \Delta x_{iT}\\
         & & \vdots \\
         0 & 0 & 0 & \cdots & 0 & \Delta y_{i2} & \cdots & 0  & x_{i3}\\
         & & \vdots \\
         0 & 0 & 0 & \cdots & 0 & 0 & \cdots & \Delta y_{i,T-1}  & x_{iT}\\
       \end{array} \right]' \\
  \intertext{and}
  \bA_N & = \left( \sum_i \tilde{\bZ}_i' H^* \tilde{\bZ}_i \right)^{-1}
\end{align*}

In this case choosing a precise form for the matrix $H^*$ for the
first step is no trivial matter. Its north-west block should be as
similar as possible to the covariance matrix of the vector $\Delta
v_{it}$, so the same choice as the ``difference'' estimator is
appropriate. Ideally, the south-east block should be proportional to
the covariance matrix of the vector $\biota \eta_i + \bv$, that is
$\sigma^2_{v} I + \sigma^2_{\eta} \biota \biota'$; but since
$\sigma^2_{\eta}$ is unknown and any positive definite matrix renders
the estimator consistent, people just use $I$. The off-diagonal blocks
should, in principle, contain the covariances between $\Delta v_{is}$
and $v_{it}$, which would be an identity matrix if $v_{it}$ is white
noise. However, since the south-east block is typically given a
conventional value anyway, the benefit in making this choice is not
obvious. Some packages use $I$; others use a zero matrix.
Asymptotically, it should not matter, but on real datasets the
difference between the resulting estimates can be noticeable.
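
In block form, and taking the equations in differences and in levels
to be equal in number (as in the definition of $\Delta \tilde{\by}_i$
above), the conventional choices just described can be summarized as
\[
  H^* = \left[ \begin{array}{cc}
      H & C \\
      C' & I
    \end{array} \right] ,
  \qquad C = I \;\; \mbox{or} \;\; C = 0 ,
\]
where $H$ is the matrix used for the difference estimator and $C$ is
our shorthand (not standard notation) for the off-diagonal block.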

\subsection{Rank deficiency}
\label{sec:rankdef}

Both the difference estimator (\ref{eq:dif-gmm}) and the system
estimator (\ref{eq:sys-gmm}) depend for their existence on the
invertibility of $\bA_N$. This matrix may turn out to be singular for
several reasons. However, this does not mean that the estimator is not
computable: in some cases, adjustments are possible such that the
estimator does exist, but the user should be aware that in these cases
not all software packages use the same strategy and replication of
results may prove difficult or even impossible.

A first reason why $\bA_N$ may be singular could be the unavailability
of instruments, chiefly because of missing observations. This case is
easy to handle. If a particular row of $\tilde{\bZ}_i$ is zero for all
units, the corresponding orthogonality condition (or the corresponding
instrument if you prefer) is automatically dropped; of course, the
overidentification rank is adjusted for testing purposes.

Even if no instruments are zero, however, $\bA_N$ could be rank
deficient. A trivial case occurs if there are collinear instruments,
but a less trivial case may arise when $T$ (the total number of time
periods available) is not much smaller than $N$ (the number of units),
as, for example, in some macro datasets where the units are
countries. The total number of potentially usable orthogonality
conditions is $O(T^2)$, which may well exceed $N$ in some cases. Of
course $\bA_N$ is the sum of $N$ matrices which have, at most, rank $2T -
3$ and therefore it could well happen that the sum is singular.

In all these cases, what we consider the ``proper'' way to go is to
substitute the pseudo-inverse of $\bA_N$ (Moore--Penrose) for its regular
inverse. Again, our choice is shared by some software packages, but
not all, so replication may be hard.
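
The difference between the two kinds of inverse can be seen in a small
hansl experiment. This is an illustration only: the matrix \texttt{Z}
below is made up, with deliberately collinear columns, and the code is
not what \texttt{dpanel} does internally.
\begin{code}
# Illustrative only: a cross-product matrix with collinear columns
matrix Z = {1, 2; 2, 4; 3, 6}   # second column = 2 * first
matrix S = Z'Z                  # 2 x 2 and singular
printf "rank of S = %d\n", rank(S)
matrix Sg = ginv(S)             # Moore-Penrose pseudo-inverse
print Sg
\end{code}
Here \texttt{S} has rank 1 rather than 2, so an ordinary inverse does
not exist, while \texttt{ginv} still returns a well-defined matrix.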

\subsection{Covariance matrix and standard errors}

By default the standard errors shown by \texttt{dpanel} for 1-step
estimation are robust, based on the heteroskedasticity-consistent
variance estimator
\[
  \widehat{\rm Var}(\hat{\gamma}) =
    \bM^{-1} \left(\sum_i\bW_i'\bZ_i\right)
    \bA_N\hat{\mathbf{V}}_N\bA_N
    \left(\sum_i\bZ_i'\bW_i\right) \bM^{-1}
  \]
  where $\bM = (\sum_i\bW_i'\bZ_i) \bA_N (\sum_i\bZ_i'\bW_i)$ and
  $\hat{\mathbf{V}}_N = N^{-1} \sum_i
  \bZ_i'\hat{\mathbf{u}}_i\hat{\mathbf{u}}_i'\bZ_i$, with
  $\hat{\mathbf{u}}_i$ the vector of residuals in differences for
  individual $i$.  In addition, as noted above, the variance estimator
  for 2-step estimation employs the finite-sample correction of
  \cite{Windmeijer05}.

  When the \verb|--asymptotic| option is passed to \texttt{dpanel},
  however, the 1-step variance estimator is simply
  $\hat{\sigma}_u^2 \bM^{-1}$, which is not
  heteroskedasticity-consistent, and the Windmeijer correction is not
  applied for 2-step estimation. Use of the asymptotic option is not
  recommended unless you wish to replicate prior results that did not
  report robust standard errors. In particular, tests based on the
  asymptotic 2-step variance estimator are known to over-reject quite
  substantially (standard errors too small).
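
  In terms of commands, the four possibilities look as follows (the
  syntax is explained in section~\ref{sec:dpanel-usage}). This is a
  minimal sketch using the Arellano--Bond data file
  \texttt{abdata.gdt} supplied with gretl and a bare AR(1)
  specification for the employment variable \texttt{n}, chosen purely
  for illustration:
\begin{code}
open abdata.gdt
dpanel 1 ; n                          # 1-step, robust standard errors
dpanel 1 ; n --asymptotic             # 1-step, non-robust
dpanel 1 ; n --two-step               # 2-step, Windmeijer-corrected
dpanel 1 ; n --two-step --asymptotic  # 2-step, uncorrected
\end{code}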

\subsection{Treatment of missing values}

Textbooks seldom bother with missing values, but in some cases their
treatment may be far from obvious. This is especially true if missing
values are interspersed between valid observations. For example,
consider the plain difference estimator with one lag, so
\[
y_t = \alpha y_{t-1} + \eta + \epsilon_t
\]
where the $i$ index is omitted for clarity. Suppose you have an
individual with $t=1\ldots5$, for which $y_3$ is missing. It may seem
that the data for this individual are unusable, because
differencing $y_t$ would produce something like
\[
\begin{array}{c|ccccc}
  t & 1 & 2 & 3 & 4 & 5 \\
  \hline
  y_t & * & * & \circ & * & * \\
  \Delta y_t & \circ & * & \circ & \circ & *
\end{array}
\]
where $*$ = nonmissing and $\circ$ = missing. Estimation seems to be
infeasible, since there are no periods in which $\Delta y_t$ and
$\Delta y_{t-1}$ are both observable.

However, we can use a $k$-difference operator and get
\[
\Delta_k y_t = \alpha \Delta_k y_{t-1} + \Delta_k \epsilon_t
\]
where $\Delta_k = 1 - L^k$ and past levels of $y_t$ are perfectly
valid instruments. In this example, we can choose $k=3$ and use $y_1$
as an instrument, so this unit is in fact perfectly usable.
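
To spell this out for the individual considered above (a worked
illustration of the argument, using only the four observed values
$y_1$, $y_2$, $y_4$ and $y_5$): subtracting the equation for $t=2$
from the equation for $t=5$ gives
\[
  y_5 - y_2 = \alpha (y_4 - y_1) + (\epsilon_5 - \epsilon_2) ,
\]
that is, $\Delta_3 y_5 = \alpha \Delta_3 y_4 + \Delta_3 \epsilon_5$.
The individual effect $\eta$ cancels and every term on both sides is
observed. Provided $\epsilon_t$ is serially uncorrelated, $y_1$ is
uncorrelated with $\epsilon_5 - \epsilon_2$ and can serve as an
instrument for $\Delta_3 y_4$.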

Not all software packages seem to be aware of this possibility, so
replicating published results may prove tricky if your dataset
contains individuals with gaps between valid observations.

\section{Usage}
\label{sec:dpanel-usage}

One of the concepts underlying the syntax of \texttt{dpanel} is that
you get default values for several choices you may want to make, so
that in a ``standard'' situation the command is very concise.  The
simplest case of the model (\ref{eq:dpd-def}) is a plain AR(1)
process:
\begin{equation}
\label{eq:dp1}
  y_{i,t} = \alpha y_{i,t-1} + \eta_{i} + v_{it} .
\end{equation}
If you give the command
\begin{code}
  dpanel 1 ; y
\end{code}
gretl assumes that you want to estimate (\ref{eq:dp1}) via the
difference estimator (\ref{eq:dif-gmm}), using as many orthogonality
conditions as possible.  The scalar \texttt{1} between \texttt{dpanel}
and the semicolon indicates that only one lag of \texttt{y} is
included as an explanatory variable; using \texttt{2} would give an
AR(2) model. The syntax that gretl uses for the non-seasonal AR and MA
lags in an ARMA model is also supported in this context. For
example, if you want the first and third lags of \texttt{y} (but not
the second) included as explanatory variables you can say
\begin{code}
  dpanel {1 3} ; y
\end{code}
or you can use a pre-defined matrix for this purpose:
\begin{code}
  matrix ylags = {1, 3}
  dpanel ylags ; y
\end{code}
To use a single lag of \texttt{y} other than the first you need to
employ this mechanism:
\begin{code}
  dpanel {3} ; y # only lag 3 is included
  dpanel 3 ; y   # compare: lags 1, 2 and 3 are used
\end{code}

To use the system estimator instead, you add the \verb|--system|
option, as in
\begin{code}
  dpanel 1 ; y --system
\end{code}
The level orthogonality conditions and the corresponding instrument
are appended automatically (see eq.\ \ref{eq:sys-gmm}).

\subsection{Regressors}

If we want to introduce additional regressors, we list them after the
dependent variable in the same way as other gretl commands, such as
\texttt{ols}.  For the difference orthogonality relations,
\texttt{dpanel} takes care of transforming the regressors in parallel
with the dependent variable.

One case of potential ambiguity is when an intercept is specified but
the difference-only estimator is selected, as in
\begin{code}
  dpanel 1 ; y const
\end{code}
In this case the default \texttt{dpanel} behavior, which agrees with
Stata's \texttt{xtabond2}, is to drop the constant (since differencing
reduces it to nothing but zeros). However, for compatibility with the
DPD package for Ox, you can give the option \verb|--dpdstyle|, in
which case the constant is retained (equivalent to including a linear
trend in equation~\ref{eq:dpd-def}).  A similar point applies to the
period-specific dummy variables which can be added in \texttt{dpanel}
via the \verb|--time-dummies| option: in the differences-only case
these dummies are entered in differenced form by default, but when the
\verb|--dpdstyle| switch is applied they are entered in levels.

The standard gretl syntax applies if you want to use lagged
explanatory variables, so for example the command
\begin{code}
  dpanel 1 ; y const x(0 to -1) --system
\end{code}
would result in estimation of the model
\[
  y_{it} = \alpha y_{i,t-1} +
  \beta_0 + \beta_1 x_{it} + \beta_2 x_{i,t-1} +
  \eta_{i} + v_{it} .
\]
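
Since \texttt{dpanel} accepts named lists of regressors, the same
specification can also be written with the lags collected in a list.
This is a sketch that assumes series named \texttt{y} and \texttt{x}
exist in the current panel dataset:
\begin{code}
  list X = x x(-1)
  dpanel 1 ; y const X --system
\end{code}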


\subsection{Instruments}

The default rules for instruments are:
\begin{itemize}
\item lags of the dependent variable are instrumented using all
  available orthogonality conditions; and
\item additional regressors are considered exogenous, so they are used
  as their own instruments.
\end{itemize}

If a different policy is wanted, the instruments should be specified
in an additional list, separated from the regressors list by a
semicolon. The syntax closely mirrors that for the \texttt{tsls}
command, but in this context it is necessary to distinguish between
``regular'' instruments and what are often called ``GMM-style''
instruments (that is, instruments that are handled in the same
block-diagonal manner as lags of the dependent variable, as described
above).

``Regular'' instruments are transformed in the same way as
regressors, and the contemporaneous value of the transformed variable
is used to form an orthogonality condition. Since regressors are
treated as exogenous by default, it follows that these two commands
estimate the same model:

\begin{code}
  dpanel 1 ; y z
  dpanel 1 ; y z ; z
\end{code}
The instrument specification in the second case simply confirms what
is implicit in the first: that \texttt{z} is exogenous. Note, though,
that if you have some additional variable \texttt{z2} which you want
to add as a regular instrument, it then becomes necessary to
include \texttt{z} in the instrument list if it is to be treated
as exogenous:
\begin{code}
  dpanel 1 ; y z ; z2   # z is now implicitly endogenous
  dpanel 1 ; y z ; z z2 # z is treated as exogenous
\end{code}

The specification of ``GMM-style'' instruments is handled by the
special constructs \texttt{GMM()} and \texttt{GMMlevel()}.  The first
of these relates to instruments for the equations in differences, and
the second to the equations in levels. The syntax for \texttt{GMM()}
is

\begin{altcode}
\texttt{GMM(}\textsl{name}\texttt{,} \textsl{minlag}\texttt{,}
\textsl{maxlag}\texttt{)}
\end{altcode}

\noindent
where \textsl{name} is replaced by the name of a series (or the name
of a list of series), and \textsl{minlag} and \textsl{maxlag} are
replaced by the minimum and maximum lags to be used as
instruments. The same goes for \texttt{GMMlevel()}.

One common use of \texttt{GMM()} is to limit the number of lagged
levels of the dependent variable used as instruments for the equations
in differences. It's well known that although exploiting all possible
orthogonality conditions yields maximal asymptotic efficiency, in
finite samples it may be preferable to use a smaller subset (but see
also \cite{OkuiJoE2009}).  For example, the specification

\begin{code}
  dpanel 1 ; y ; GMM(y, 2, 4)
\end{code}
ensures that no lags of $y_t$ earlier than $t-4$ will be used as
instruments.
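
To make the effect of the lag limits concrete, the following table
lists the levels of $y$ that \texttt{GMM(y, 2, 4)} would make
available for the first few differenced equations (a worked
illustration, assuming a panel with at least seven periods and no
missing values):
\[
\begin{array}{c|l}
  t & \mbox{instruments} \\
  \hline
  3 & y_{i1} \\
  4 & y_{i1},\; y_{i2} \\
  5 & y_{i1},\; y_{i2},\; y_{i3} \\
  6 & y_{i2},\; y_{i3},\; y_{i4} \\
  7 & y_{i3},\; y_{i4},\; y_{i5}
\end{array}
\]
From $t=5$ onwards each equation has three instruments, whereas
without the upper limit the count would keep growing by one per
period.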

A second use of \texttt{GMM()} is to exploit more fully the potential
block-diagonal orthogonality conditions offered by an exogenous
regressor, or a related variable that does not appear as a regressor.
For example, in

\begin{code}
  dpanel 1 ; y x ; GMM(z, 2, 6)
\end{code}
the variable \texttt{x} is considered an endogenous regressor, and up to
5 lags of \texttt{z} are used as instruments.

Note that in the following script fragment
\begin{code}
  dpanel 1 ; y z
  dpanel 1 ; y z ; GMM(z,0,0)
\end{code}
the two estimation commands should not be expected to give the same
result, as the sets of orthogonality relationships are subtly
different.  In the latter case, you have $T-2$ separate orthogonality
relationships pertaining to $z_{it}$, none of which has any
implication for the other ones; in the former case, you only have one.
In terms of the $\bZ_i$ matrix, the first form adds a single row to
the bottom of the instruments matrix, while the second form adds a
diagonal block with $T-2$ columns; that is,
\[
  \left[ \begin{array}{cccc}
         z_{i3} & z_{i4} & \cdots & z_{iT}
       \end{array} \right]
\]
versus
\[
  \left[ \begin{array}{cccc}
         z_{i3} & 0 & \cdots & 0 \\
         0 & z_{i4} & \cdots & 0 \\
          & \ddots & \ddots &  \\
         0 & 0 & \cdots & z_{iT}
       \end{array} \right]
\]

\section{Replication of DPD results}
\label{sec:DPD-replic}

In this section we show how to replicate the results of some of the
pioneering work with dynamic panel-data estimators by Arellano, Bond
and Blundell.  As the DPD manual \citep*{DPDmanual} explains, it is
difficult to replicate the original published results exactly, for two
main reasons: not all of the data used in those studies are publicly
available; and some of the choices made in the original software
implementation of the estimators have been superseded.  Here,
therefore, our focus is on replicating the results obtained using the
current DPD package and reported in the DPD manual.

The examples are based on the program files \texttt{abest1.ox},
\texttt{abest3.ox} and \texttt{bbest1.ox}. These are included in the
DPD package, along with the Arellano--Bond database files
\texttt{abdata.bn7} and \texttt{abdata.in7}.\footnote{See
  \url{http://www.doornik.com/download.html}.} The
Arellano--Bond data are also provided with gretl, in the file
\texttt{abdata.gdt}. In the following we do not show the output from
DPD or gretl; it is somewhat voluminous, and is easily generated by
the user. As of this writing the results from Ox/DPD and gretl are
identical in all relevant respects for all of the examples
shown.\footnote{To be specific, this is using Ox Console version 5.10,
  version 1.24 of the DPD package, and gretl built from CVS as of
  2010-10-23, all on Linux.}

A complete Ox/DPD program to generate the results of interest takes
this general form:

\begin{code}
#include <oxstd.h>
#import <packages/dpd/dpd>

main()
{
    decl dpd = new DPD();

    dpd.Load("abdata.in7");
    dpd.SetYear("YEAR");

    // model-specific code here

    delete dpd;
}
\end{code}
%
In the examples below we take this template for granted and show just
the model-specific code.

\subsection{Example 1}

The following Ox/DPD code---drawn from \texttt{abest1.ox}---replicates
column (b) of Table 4 in \cite{arellano-bond91}, an instance of the
differences-only or GMM-DIF estimator. The dependent variable is the
log of employment, \texttt{n}; the regressors include two lags of the
dependent variable, current and lagged values of the log real-product
wage, \texttt{w}, the current value of the log of gross capital,
\texttt{k}, and current and lagged values of the log of industry
output, \texttt{ys}. In addition the specification includes a constant
and five year dummies; unlike the stochastic regressors, these
deterministic terms are not differenced. In this specification the
regressors \texttt{w}, \texttt{k} and \texttt{ys} are treated as
exogenous and serve as their own instruments. In DPD syntax this
requires entering these variables twice, on the \verb|X_VAR| and
\verb|I_VAR| lines. The GMM-type (block-diagonal) instruments in this
example are the second and subsequent lags of the level of \texttt{n}.
Both 1-step and 2-step estimates are computed.

\begin{code}
dpd.SetOptions(FALSE); // don't use robust standard errors
dpd.Select(Y_VAR, {"n", 0, 2});
dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(I_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});

dpd.Gmm("n", 2, 99);
dpd.SetDummies(D_CONSTANT + D_TIME);

print("\n\n***** Arellano & Bond (1991), Table 4 (b)");
dpd.SetMethod(M_1STEP);
dpd.Estimate();
dpd.SetMethod(M_2STEP);
dpd.Estimate();
\end{code}

Here is gretl code to do the same job:

\begin{code}
open abdata.gdt
list X = w w(-1) k ys ys(-1)
dpanel 2 ; n X const --time-dummies --asy --dpdstyle
dpanel 2 ; n X const --time-dummies --asy --two-step --dpdstyle
\end{code}

Note that in gretl the switch to suppress robust standard errors is
\verb|--asymptotic|, here abbreviated to \verb|--asy|.\footnote{Option
  flags in gretl can always be truncated, down to the minimal unique
  abbreviation.} The \verb|--dpdstyle| flag specifies that the
constant and dummies should not be differenced, in the context of a
GMM-DIF model. With gretl's \texttt{dpanel} command it is not
necessary to specify the exogenous regressors as their own instruments
since this is the default; similarly, the use of the second and all
longer lags of the dependent variable as GMM-type instruments is the
default and need not be stated explicitly.

\subsection{Example 2}

The DPD file \texttt{abest3.ox} contains a variant of the above that
differs with regard to the choice of instruments: the variables
\texttt{w} and \texttt{k} are now treated as predetermined, and are
instrumented GMM-style using the second and third lags of their
levels. This approximates column (c) of Table 4 in
\cite{arellano-bond91}.  We have modified the code in
\texttt{abest3.ox} slightly to allow the use of robust
(Windmeijer-corrected) standard errors, which are the default in both
DPD and gretl with 2-step estimation:

\begin{code}
dpd.Select(Y_VAR, {"n", 0, 2});
dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(I_VAR, {"ys", 0, 1});
dpd.SetDummies(D_CONSTANT + D_TIME);

dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 3);
dpd.Gmm("k", 2, 3);

print("\n***** Arellano & Bond (1991), Table 4 (c)\n");
print("        (but using different instruments!!)\n");
dpd.SetMethod(M_2STEP);
dpd.Estimate();
\end{code}

The gretl code is as follows:

\begin{code}
open abdata.gdt
list X = w w(-1) k ys ys(-1)
list Ivars = ys ys(-1)
dpanel 2 ; n X const ; GMM(w,2,3) GMM(k,2,3) Ivars --time --two-step --dpd
\end{code}
%
Note that since we are now calling for an instrument set other than
the default (following the second semicolon), it is necessary to
include the \texttt{Ivars} specification for the variable \texttt{ys}.
However, it is not necessary to specify \texttt{GMM(n,2,99)} since
this remains the default treatment of the dependent variable.

\subsection{Example 3}

Our third example replicates the DPD output from \texttt{bbest1.ox}:
this uses the same dataset as the previous examples but the model
specifications are based on \cite{blundell-bond98}, and involve
comparison of the GMM-DIF and GMM-SYS (``system'') estimators. The
basic specification is slightly simplified in that the variable
\texttt{ys} is not used and only one lag of the dependent variable
appears as a regressor. The Ox/DPD code is:

\begin{code}
dpd.Select(Y_VAR, {"n", 0, 1});
dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 1});
dpd.SetDummies(D_CONSTANT + D_TIME);

print("\n\n***** Blundell & Bond (1998), Table 4: 1976-86 GMM-DIF");
dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 99);
dpd.Gmm("k", 2, 99);
dpd.SetMethod(M_2STEP);
dpd.Estimate();

print("\n\n***** Blundell & Bond (1998), Table 4: 1976-86 GMM-SYS");
dpd.GmmLevel("n", 1, 1);
dpd.GmmLevel("w", 1, 1);
dpd.GmmLevel("k", 1, 1);
dpd.SetMethod(M_2STEP);
dpd.Estimate();
\end{code}

Here is the corresponding gretl code:

\begin{code}
open abdata.gdt
list X = w w(-1) k k(-1)
list Z = w k

# Blundell & Bond (1998), Table 4: 1976-86 GMM-DIF
dpanel 1 ; n X const ; GMM(Z,2,99) --time --two-step --dpd

# Blundell & Bond (1998), Table 4: 1976-86 GMM-SYS
dpanel 1 ; n X const ; GMM(Z,2,99) GMMlevel(Z,1,1) \
 --time --two-step --dpd --system
\end{code}

Note the use of the \verb|--system| option flag to specify GMM-SYS,
including the default treatment of the dependent variable, which
corresponds to \texttt{GMMlevel(n,1,1)}. In this case we also want to
use lagged differences of the regressors \texttt{w} and \texttt{k} as
instruments for the levels equations so we need explicit
\texttt{GMMlevel} entries for those variables. If you want something
other than the default treatment for the dependent variable as an
instrument for the levels equations, you should give an explicit
\texttt{GMMlevel} specification for that variable---and in that case
the \verb|--system| flag is redundant (but harmless).

For the sake of completeness, note that if you specify at least one
\texttt{GMMlevel} term, \texttt{dpanel} will then include equations in
levels, but it will not automatically add a default \texttt{GMMlevel}
specification for the dependent variable unless the \verb|--system|
option is given.
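
By way of illustration, the GMM-SYS command above could be rewritten
with the levels instruments for the dependent variable stated
explicitly. This is a sketch based on the description just given: it
reuses the lists \texttt{X} and \texttt{Z} from the script above and,
if that description is read correctly, should select the same
instrument set without the \verb|--system| flag.
\begin{code}
open abdata.gdt
list X = w w(-1) k k(-1)
list Z = w k

# explicit GMMlevel term for n, so --system is not needed
dpanel 1 ; n X const ; GMM(Z,2,99) GMMlevel(Z,1,1) GMMlevel(n,1,1) \
 --time --two-step --dpd
\end{code}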

\section{Cross-country growth example}
\label{sec:dpanel-growth}

The previous examples all used the Arellano--Bond dataset; for this
example we use the dataset \texttt{CEL.gdt}, which is also included in
the gretl distribution. As with the Arellano--Bond data, there are
numerous missing values.  Details of the provenance of the data can be
found by opening the dataset information window in the gretl GUI
(\textsf{Data} menu, \textsf{Dataset info} item). This is a subset of
the Barro--Lee 138-country panel dataset, an approximation to which is
used in \citet*{CEL96} and \citet*{Bond2001}.\footnote{We say an
  ``approximation'' because we have not been able to replicate exactly
  the OLS results reported in the papers cited, though it seems from
  the description of the data in \cite{CEL96} that we ought to be able
  to do so.  We note that \cite{Bond2001} used data provided by
  Professor Caselli yet did not manage to reproduce the latter's
  results.}  Both of these papers explore the dynamic panel-data
approach in relation to the issues of growth and convergence of per
capita income across countries.

The dependent variable is growth in real GDP per capita over
successive five-year periods; the regressors are the log of the
initial (five years prior) value of GDP per capita, the log-ratio of
investment to GDP, $s$, in the prior five years, and the log of annual
average population growth, $n$, over the prior five years plus 0.05 as
stand-in for the rate of technical progress, $g$, plus the rate of
depreciation, $\delta$ (with the last two terms assumed to be constant
across both countries and periods).  The original model is
\begin{equation}
\label{eq:CEL96}
\Delta_5 y_{it} = \beta y_{i,t-5} + \alpha s_{it} + \gamma (n_{it} +
g + \delta) + \nu_t + \eta_i + \epsilon_{it}
\end{equation}
which allows for a time-specific disturbance $\nu_t$. The Solow model
with Cobb--Douglas production function implies that $\gamma =
-\alpha$, but this assumption is not imposed in estimation. The
time-specific disturbance is eliminated by subtracting the period mean
from each of the series.
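
In symbols (our notation, not that of the papers cited), for a generic
variable $z$ the transformation is
\[
  \tilde{z}_{it} = z_{it} - \bar{z}_{t} , \qquad
  \bar{z}_{t} = \frac{1}{N_t} \sum_{i} z_{it} ,
\]
where we assume that the sum runs over the $N_t$ units for which
$z_{it}$ is observed in period $t$. Because $\nu_t$ is common to all
units in period $t$, it equals its own period mean and drops out when
equation (\ref{eq:CEL96}) is written in terms of the demeaned
variables.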

Equation (\ref{eq:CEL96}) can be transformed to an AR(1) dynamic
panel-data model by adding $y_{i,t-5}$ to both sides, which gives
\begin{equation}
\label{eq:CEL96a}
y_{it} = (1 + \beta) y_{i,t-5} + \alpha s_{it} + \gamma (n_{it} +
g + \delta) + \eta_i + \epsilon_{it}
\end{equation}
where all variables are now assumed to be time-demeaned.

In (rough) replication of \cite{Bond2001} we now proceed to estimate
the following two models: (a) equation (\ref{eq:CEL96a}) via GMM-DIF,
using as instruments the second and all longer lags of $y_{it}$,
$s_{it}$ and $n_{it} + g + \delta$; and (b) equation
(\ref{eq:CEL96a}) via GMM-SYS, using $\Delta y_{i,t-1}$, $\Delta
s_{i,t-1}$ and $\Delta (n_{i,t-1} + g + \delta)$ as additional
instruments in the levels equations. We report robust standard errors
throughout. (As a purely notational matter, we now use ``$t-1$'' to
refer to values five years prior to $t$, as in \cite{Bond2001}.)

The gretl script to do this job is shown below. Note that the final
transformed versions of the variables (logs, with time-means
subtracted) are named \texttt{ly} ($y_{it}$), \texttt{linv} ($s_{it}$)
and \texttt{lngd} ($n_{it} + g + \delta$).
%
\begin{code}
open CEL.gdt

ngd = n + 0.05
ly = log(y)
linv = log(s)
lngd = log(ngd)

# take out time means
loop i=1..8
  smpl (time == i) --restrict --replace
  ly -= mean(ly)
  linv -= mean(linv)
  lngd -= mean(lngd)
endloop

smpl --full
list X = linv lngd
# 1-step GMM-DIF
dpanel 1 ; ly X ; GMM(X,2,99)
# 2-step GMM-DIF
dpanel 1 ; ly X ; GMM(X,2,99) --two-step
# GMM-SYS
dpanel 1 ; ly X ; GMM(X,2,99) GMMlevel(X,1,1) --two-step --sys
\end{code}

For comparison we estimated the same two models using Ox/DPD and the
Stata command \texttt{xtabond2}. (In each case we constructed a
comma-separated values dataset containing the data as transformed in
the gretl script shown above, using a missing-value code appropriate
to the target program.) For reference, the commands used with
Stata are reproduced below:
%
\begin{code}
#delimit ;
insheet using CEL.csv;
tsset unit time;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
  gmm(lngd, lag(2 99)) rob nolev;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
  gmm(lngd, lag(2 99)) rob nolev twostep;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
  gmm(lngd, lag(2 99)) rob nocons twostep;
\end{code}

For the GMM-DIF model all three programs find 382 usable observations
and 30 instruments, and yield identical parameter estimates and
robust standard errors (up to the number of digits printed, or more);
see Table~\ref{tab:growth-DIF}.\footnote{The coefficient shown for
  \texttt{ly(-1)} in the Tables is that reported directly by the
  software; for comparability with the original model (eq.\
  \ref{eq:CEL96}) it is necessary to subtract 1, which produces the
  expected negative value indicating conditional convergence in per
  capita income.}

\begin{table}[htbp]
\begin{center}
\begin{tabular}{lrrrr}
& \multicolumn{2}{c}{1-step} & \multicolumn{2}{c}{2-step} \\
& \multicolumn{1}{c}{coeff} & \multicolumn{1}{c}{std.\ error} &
  \multicolumn{1}{c}{coeff} & \multicolumn{1}{c}{std.\ error} \\
\texttt{ly(-1)} & 0.577564 & 0.1292 & 0.610056 & 0.1562 \\
\texttt{linv} & 0.0565469 & 0.07082 & 0.100952 & 0.07772 \\
\texttt{lngd} & $-$0.143950 & 0.2753 & $-$0.310041 & 0.2980 \\
\end{tabular}
\caption{GMM-DIF: Barro--Lee data}
\label{tab:growth-DIF}
\end{center}
\end{table}
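
As a worked illustration of the adjustment described in the footnote
to the preceding paragraph, subtracting 1 from the \texttt{ly(-1)}
coefficients reported in Table~\ref{tab:growth-DIF} gives
\[
  0.577564 - 1 = -0.422436 \quad \mbox{(1-step)}, \qquad
  0.610056 - 1 = -0.389944 \quad \mbox{(2-step)},
\]
so both variants yield the expected negative sign. This is simple
arithmetic on the tabulated values; the standard errors are unaffected
by the subtraction of a constant.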

Results for GMM-SYS estimation are shown in
Table~\ref{tab:growth-SYS}. In this case we show two sets of gretl
results: those labeled ``gretl(1)'' were obtained using gretl's
\verb|--dpdstyle| option, while those labeled ``gretl(2)'' did not use
that option---the intent being to reproduce the $H$ matrices used by
Ox/DPD and \texttt{xtabond2} respectively.

\begin{table}[htbp]
\begin{center}
\begin{tabular}{lrrrr}
& \multicolumn{1}{c}{gretl(1)} &
  \multicolumn{1}{c}{Ox/DPD} &
  \multicolumn{1}{c}{gretl(2)} &
  \multicolumn{1}{c}{xtabond2} \\
\texttt{ly(-1)} & 0.9237 (0.0385) &
  0.9167 (0.0373) &
    0.9073 (0.0370) &
      0.9073 (0.0370) \\
\texttt{linv} & 0.1592 (0.0449) &
  0.1636 (0.0441) &
    0.1856 (0.0411) &
      0.1856 (0.0411) \\
\texttt{lngd} & $-$0.2370 (0.1485) &
  $-$0.2178 (0.1433) &
    $-$0.2355 (0.1501) &
      $-$0.2355 (0.1501)
\end{tabular}
\caption{2-step GMM-SYS: Barro--Lee data (standard errors in parentheses)}
\label{tab:growth-SYS}
\end{center}
\end{table}

In this case all three programs use 479 observations; gretl and
\texttt{xtabond2} use 41 instruments and produce the same estimates
(when using the same $H$ matrix) while Ox/DPD nominally uses
66.\footnote{This is a case of the issue described in
  section~\ref{sec:rankdef}: the full $\bA_N$ matrix turns out to be
  singular and special measures must be taken to produce estimates.}
It is noteworthy that with GMM-SYS plus ``messy'' missing
observations, the results depend on the precise array of instruments
used, which in turn depends on the details of the implementation of
the estimator.

\section{Auxiliary test statistics}
\label{sec:dpanel-aux}

We have concentrated above on the parameter estimates and standard
errors. Here we add a few words on the additional test statistics that
typically accompany both GMM-DIF and GMM-SYS estimation. These include
the Sargan test for overidentification, one or more Wald tests for the
joint significance of the regressors (and time dummies, if applicable)
and tests for first- and second-order autocorrelation of the residuals
from the equations in differences.

As in Ox/DPD, the Sargan test statistic reported by gretl is
\[
  S = \left(\sum_{i=1}^N \hat{\bv}^{*\prime}_i \bZ_i\right)
   \bA_N \left(\sum_{i=1}^N \bZ_i' \hat{\bv}^*_i\right)
\]
where the $\hat{\bv}^*_i$ are the transformed (e.g.\ differenced)
residuals for unit $i$.  Under the null hypothesis that the
instruments are valid, $S$ is asymptotically distributed as chi-square
with degrees of freedom equal to the number of overidentifying
restrictions.
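
A minimal sketch of how the $p$-value of the test can be recovered by
hand is given below. It assumes the \texttt{sargan} and
\texttt{sargan\_df} keys of the \dollar{model} bundle (see
section~\ref{sec:dpanel-post}) and borrows the \texttt{abdata.gdt}
specification used for illustration elsewhere in this chapter; the
names \texttt{regs}, \texttt{ivs}, \texttt{b}, \texttt{S}, \texttt{df}
and \texttt{pv} are arbitrary.
%
\begin{code}
open abdata.gdt --quiet

list regs = w w(-1) k k(-1)
list ivs = w k
dpanel 1 ; n regs const ; GMM(ivs,2,99) --two-step --quiet

# retrieve the test statistic and its degrees of freedom
bundle b = $model
scalar S = b.sargan
scalar df = b.sargan_df

# right-tail probability under the chi-square null distribution
scalar pv = pvalue(X, df, S)
printf "Sargan test: chi-square(%d) = %g, p-value %.4f\n", df, S, pv
\end{code}

A small $p$-value would count as evidence against the validity of the
instruments.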

In general we see a good level of agreement between gretl, DPD and
\texttt{xtabond2} with regard to these statistics, with a few
relatively minor exceptions. Specifically, \texttt{xtabond2} computes
both a ``Sargan test'' and a ``Hansen test'' for overidentification,
but what it calls the Hansen test is, apparently, what DPD calls the
Sargan test. (We have had difficulty determining from the
\texttt{xtabond2} documentation \citep{Roodman2006} exactly how its
Sargan test is computed.) In addition there are cases where the
degrees of freedom for the Sargan test differ between DPD and gretl;
this occurs when the $\bA_N$ matrix is singular
(section~\ref{sec:rankdef}). In concept the df equals the number of
instruments minus the number of parameters estimated; for the first of
these terms gretl uses the rank of $\bA_N$, while DPD appears to use
the full dimension of this matrix.

Negative first-order autocorrelation of the residuals is expected by
construction of the estimator, so a significant value for the AR(1)
test does not indicate a problem. If the AR(2) test is significant,
however, this indicates violation of the maintained assumptions. Note
that valid AR tests cannot be produced when the \verb|--asymptotic|
option is specified in conjunction with one-step GMM-SYS estimation;
if you need the tests, either add the \verb|--two-step| option or drop
the \verb|--asymptotic| flag (dropping it is recommended in any case).
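
The sketch below shows one way of turning the stored autocorrelation
statistics into two-sided $p$-values. It assumes that the \texttt{AR1}
and \texttt{AR2} members of the \dollar{model} bundle (see
section~\ref{sec:dpanel-post}) hold the $z$ statistics shown in the
printed output, taken to be standard normal under the null; the
specification and the variable names are purely illustrative.
%
\begin{code}
open abdata.gdt --quiet

list regs = w w(-1) k k(-1)
list ivs = w k
dpanel 1 ; n regs const ; GMM(ivs,2,99) --two-step --quiet

bundle b = $model
scalar z1 = b.AR1
scalar z2 = b.AR2

# two-sided p-values, assuming the statistics are N(0,1) under the null
scalar p1 = 2 * pvalue(z, abs(z1))
scalar p2 = 2 * pvalue(z, abs(z2))
printf "AR(1): z = %g, p-value %.4f\n", z1, p1
printf "AR(2): z = %g, p-value %.4f\n", z2, p2
\end{code}

Following the discussion above, a small value of \texttt{p1} is to be
expected, while a small value of \texttt{p2} would be a cause for
concern.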

\section{Post-estimation available statistics}
\label{sec:dpanel-post}

After estimation, the \dollar{model} accessor will return a bundle
containing several items that may be of interest: most should be
self-explanatory, but here's a partial list:

\begin{center}
\begin{tabular}{rp{0.6\textwidth}}
  \hline
  \textbf{Key} & \textbf{Content} \\
  \hline
  \texttt{AR1}, \texttt{AR2} & 1st and 2nd order autocorrelation test
                               statistics \\
  \texttt{sargan}, \texttt{sargan\_df} & Sargan test for
                                         overidentifying restrictions
                                         and corresponding degrees of freedom \\
  \texttt{wald}, \texttt{wald\_df} & Wald test for
                                     overall significance
                                     and corresponding degrees of
                                     freedom \\
  \texttt{GMMinst} & The matrix $\bZ$ of instruments (see equations
                     (\ref{eq:dpd-dif}) and (\ref{eq:sys-gmm})) \\
  \texttt{wgtmat} & The matrix $\bA$ of GMM weights (see equations
                    (\ref{eq:dpd-dif}) and (\ref{eq:sys-gmm})) \\
  \hline
\end{tabular}
\end{center}

Note, however, that \texttt{GMMinst} and \texttt{wgtmat} (which may be
quite large matrices) are not saved in the \dollar{model} bundle by
default; that requires use of the \option{keep-extra} option with the
\cmd{dpanel} command. Listing~\ref{ex:dpanel-rep} illustrates use
of these matrices to replicate via hansl commands the calculation of
the GMM estimator.
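
As a quick check of the behavior just described, the following sketch
estimates the same model with and without \option{keep-extra} and uses
the \texttt{inbundle} function to see whether \texttt{GMMinst} is
present in each case. The specification and the names \texttt{regs},
\texttt{ivs}, \texttt{b0} and \texttt{b1} are illustrative only.
%
\begin{code}
open abdata.gdt --quiet

list regs = w w(-1) k k(-1)
list ivs = w k

# default: the large matrices are not stored
dpanel 1 ; n regs const ; GMM(ivs,2,99) --two-step --quiet
bundle b0 = $model
scalar has0 = inbundle(b0, "GMMinst") != 0
printf "GMMinst stored by default? %d\n", has0

# with --keep-extra the matrices become available
dpanel 1 ; n regs const ; GMM(ivs,2,99) --two-step --quiet --keep-extra
bundle b1 = $model
matrix Zt = b1.GMMinst   # stored in transposed form
matrix A = b1.wgtmat
printf "GMMinst: %d x %d, wgtmat: %d x %d\n", rows(Zt), cols(Zt), rows(A), cols(A)
\end{code}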

\begin{script}[p]
  \scriptcaption{replication of built-in command via hansl commands}
  \label{ex:dpanel-rep}
\begin{scode}
set verbose off
open abdata.gdt

# compose list of regressors
list X = w w(-1) k k(-1)
list Z = w k

dpanel 1 ; n X const ; GMM(Z,2,99) --two-step --dpdstyle --keep-extra

### --- re-do by hand ----------------------------

# fetch Z and A from model
A = $model.wgtmat
mZt = $model.GMMinst # note: transposed

# create data matrices
series valid = ok($uhat)
series ddep = diff(n)
series dldep = ddep(-1)
list dreg = diff(X)

smpl valid --dummy

matrix m_reg = {dldep} ~ {dreg} ~ 1
matrix m_dep = {ddep}

matrix uno = mZt * m_reg
matrix due = qform(uno', A)
matrix tre = (uno'A) * (mZt * m_dep)
matrix coef = due\tre

print coef
\end{scode}
\end{script}


\section{Memo: \texttt{dpanel} options}
\label{sec:dpanel-options}

\begin{center}
\begin{tabular}{lp{.7\textwidth}}
  \textit{flag} & \textit{effect} \\ [6pt]
  \verb|--asymptotic| & Suppresses the use of robust standard errors \\
  \verb|--two-step| & Calls for 2-step estimation (the default being 1-step) \\
  \verb|--system| & Calls for GMM-SYS, with default treatment of the
                    dependent variable, as in \texttt{GMMlevel(y,1,1)} \\
  \verb|--time-dummies| & Includes period-specific dummy variables \\
  \verb|--dpdstyle| & Computes the $H$ matrix as in DPD; also suppresses
                      differencing of automatic time dummies and omission of intercept
                      in the GMM-DIF case\\
  \verb|--verbose| & Prints confirmation of the GMM-style instruments
                     used; and when \verb|--two-step| is selected, prints
                     the 1-step estimates first \\
  \verb|--vcv| & Calls for printing of the covariance matrix \\
  \verb|--quiet| & Suppresses the printing of results \\
  \verb|--keep-extra| & Saves additional matrices in the \dollar{model}
                        bundle (see section~\ref{sec:dpanel-post}) \\
\end{tabular}
\end{center}

The time dummies option supports the qualifier \texttt{noprint}, as
in
\begin{code}
--time-dummies=noprint
\end{code}

This means that although the dummies are included in the specification,
their coefficients, standard errors and so on are not printed.
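
For instance, a complete command using this qualifier might look as
follows. This is only a sketch: the \texttt{abdata.gdt} specification
is chosen for illustration and the list names are arbitrary.
%
\begin{code}
open abdata.gdt --quiet

list regs = w w(-1) k k(-1)
list ivs = w k

# time dummies enter the model but are left out of the printout
dpanel 1 ; n regs const ; GMM(ivs,2,99) --time-dummies=noprint
\end{code}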

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: