---
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
abstract: |
  This introduction to the `plm` package is a modified
  and extended version of @Croissant:Millo:2008, published in the
  *Journal of Statistical Software*.

  Panel data econometrics is obviously one of the main fields
  in the statistics profession, but most of the models used are difficult to
  estimate with only plain `R`. `plm` is a package for `R` which
  intends to make the estimation of linear panel models
  straightforward. `plm` provides functions to estimate a wide
  variety of models and to make (robust) inference.
title: 'Panel data econometrics in R:'
subtitle: 'the plm package'
author:
- name: Yves Croissant
- name: Giovanni Millo
vignette: >
  %\VignetteIndexEntry{Panel data econometrics in R: the plm package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}

---

# Introduction

Panel data econometrics is a continuously developing field. The
increasing availability of data observed on cross-sections of units
(like households, firms, countries etc.) *and* over time has given
rise to a number of estimation approaches exploiting this double
dimensionality to cope with some of the typical problems associated
with economic data, first of all that of unobserved heterogeneity.

Timewise observation of data from different observational units has
long been common in other fields of statistics (where they are often
termed *longitudinal* data). In the panel data field as well as in
others, the econometric approach is nevertheless peculiar with respect
to experimental contexts, as it emphasizes model specification and
testing and tackles a number of issues arising from the particular
statistical problems associated with economic data.

Thus, while a very comprehensive software framework for (among many
other features) maximum likelihood estimation of linear regression
models for longitudinal data, packages `nlme`
[@PINH:BATE:DEBR:SARK:07] and `lme4`
[@BATE:07], is available in the `R` (@R:2008)
environment and can be used, e.g., for estimation of random effects
panel models, its use is not intuitive for a practicing
econometrician, and maximum likelihood estimation is only one of the
possible approaches to panel data econometrics. Moreover, economic
panel data sets often happen to be *unbalanced* (i.e., they have a
different number of observations between groups), which case needs
some adaptation to the methods and is not compatible with those in
`nlme`.  Hence the need for a package doing panel data "from the
econometrician's viewpoint" and featuring at a minimum the basic
techniques econometricians are used to: random and fixed
effects estimation of static linear panel data models, variable
coefficients models, generalized method of moments estimation of
dynamic models; and the basic toolbox of specification and
misspecification diagnostics.

Furthermore, we felt there was a need for automation of some basic
data management tasks such as lagging, summing and, more in general,
`apply`ing (in the `R` sense) functions to the data, which,
although conceptually simple, become cumbersome and error-prone on
two-dimensional data, especially in the case of unbalanced panels.

This paper is organized as follows: Section [linear panel model](#linear-panel-model)
presents a very short overview of the typical model
taxonomy^[Comprehensive treatments are to be found in many econometrics textbooks,
e.g., @BALT:05, @BALT:13, @BALT:21 or @WOOL:02, @WOOL:10: the reader is referred to
these, especially to the first 9 chapters of @BALT:05, @BALT:13, @BALT:21.].
Section [software approach](#software-approach) discusses the
software approach used in the package. The next three sections present
the functionalities of the package in more detail: data management
(Section [managing data and formulae](#dataformula)), estimation
(Section [model estimation](#modelestimation)) and testing (Section [tests](#tests)),
giving a short description and illustrating them with
examples. Section [plm vs nlme and lme4](#nlme) compares the approach in `plm` to
that of `nlme` and `lme4`, highlighting the features of the
latter two that an econometrician might find most useful.
Section [conclusion](#conclusions) concludes the paper.

# The linear panel model{#linear-panel-model}

The basic linear panel models used in econometrics can be described
through suitable restrictions of the following general model:

\begin{equation*}
 y_{it}=\alpha_{it} +  \beta_{it}^\top x_{it} + u_{it}
\end{equation*}

where $i=1, ..., n$ is the individual (group, country ...) index,
$t=1, ..., T$ is the time index and $u_{it}$ a random disturbance
term of mean $0$.

Of course $u_{it}$ is not estimable with $N = n \times T$ data
points. A number of assumptions are usually made about the parameters,
the errors and the exogeneity of the regressors, giving rise to a
taxonomy of feasible models for panel data.

The most common one is parameter homogeneity, which means that
$\alpha_{it}=\alpha$ for all $i,t$ and $\beta_{it}=\beta$ for all
$i,t$. The resulting model

\begin{equation*}
 y_{it}=\alpha +  \beta^\top x_{it} + u_{it}
\end{equation*}

is a standard linear model pooling all the data across $i$ and $t$.

To model individual heterogeneity, one often assumes that the error
term has two separate components, one of which is specific to the
individual and doesn't change over time^[For the sake of exposition we are
considering only the individual effects case here. There may also be time
effects, which is a symmetric case, or both of them, so that the error has
three components: $u_{it}=\mu_{i}+\lambda_{t}+\epsilon_{it}$.].
This is called the unobserved effects model:

\begin{equation}
  (\#eq:errcomp)
 y_{it}=\alpha +  \beta^\top x_{it} + \mu_i + \epsilon_{it}
\end{equation}

The appropriate estimation method for this model depends on the
properties of the two error components. The idiosyncratic error
$\epsilon_{it}$ is usually assumed well-behaved and independent of
both the regressors $x_{it}$ and the individual error component
$\mu_i$. The individual component may be in turn either independent of
the regressors or correlated.

If it is correlated, the ordinary least squares (OLS) estimator of
$\beta$ would be inconsistent, so it is customary to treat the $\mu_i$
as a further set of $n$ parameters to be estimated, as if in the
general model $\alpha_{it}=\alpha_{i}$ for all $t$. This is called the
fixed effects (a.k.a. *within* or *least squares dummy variables*)
model, usually estimated by OLS on transformed data, and gives
consistent estimates for $\beta$.

If the individual-specific component $\mu_i$ is uncorrelated with the
regressors, a situation which is usually termed *random effects*, the
overall error $u_{it}$ also is, so the OLS estimator is
consistent. Nevertheless, the common error component over individuals
induces correlation across the composite error terms, making OLS
estimation inefficient, so one has to resort to some form of feasible
generalized least squares (GLS) estimators. This is based on the
estimation of the variance of the two error components, for which
there are a number of different procedures available.

If the individual component is missing altogether, pooled OLS is the
most efficient estimator for $\beta$. This set of assumptions is
usually labelled *pooling* model, although this actually refers to the
errors' properties and the appropriate estimation method rather than
the model itself.  If one relaxes the usual hypotheses of
well-behaved, white noise errors and allows for the idiosyncratic
error $\epsilon_{it}$ to be arbitrarily heteroskedastic and serially
correlated over time, a more general kind of feasible GLS is needed,
called the *unrestricted* or *general* GLS.  This specification can
also be augmented with individual-specific error components possibly
correlated with the regressors, in which case it is termed *fixed
effects* GLS.

Another way of estimating unobserved effects models is to remove the
time-invariant individual components by first-differencing the
data: lagging the model and subtracting, the time-invariant components
(the intercept and the individual error component) are eliminated, and
the model

\begin{equation*}
 \Delta y_{it}=  \beta^\top  \Delta x_{it} + \Delta u_{it}
\end{equation*}

(where $\Delta y_{it}=y_{it}-y_{i,t-1}$, $\Delta
x_{it}=x_{it}-x_{i,t-1}$ and, from \@ref(eq:errcomp), $\Delta
u_{it}=u_{it}-u_{i,t-1}=\Delta \epsilon_{it}$ for $t=2,...,T$) can be
consistently estimated by pooled OLS. This is called the *first-difference*
or FD estimator. Its relative efficiency, and hence the reason for choosing it
over other consistent alternatives, depends on the properties of the error
term. The FD estimator is usually preferred if the errors $u_{it}$ are
strongly persistent in time, because then the $\Delta u_{it}$ will
tend to be serially uncorrelated.

Lastly, the *between* model, which is computed on time (group)
averages of the data, discards all the information due to intragroup
variability but is consistent in some settings (e.g., non-stationarity)
where the others are not, and is often preferred to estimate long-run
relationships.

Variable coefficients models relax the assumption that
$\beta_{it}=\beta$ for all $i,t$. Fixed coefficients models allow the
coefficients to vary along one dimension, like $\beta_{it}=\beta_i$
for all $t$. Random coefficients models instead assume that
coefficients vary randomly around a common average, as
$\beta_{it}=\beta+\eta_{i}$ for all $t$, where $\eta_{i}$ is a group--
(time--) specific effect with mean zero.

The hypotheses on parameters and error terms (and hence the choice of
the most appropriate estimator) are usually tested by means of:


- *pooling* tests to check poolability, i.e., the hypothesis
  that the same coefficients apply across all individuals,
- if the homogeneity assumption over the coefficients is
  established, the next step is to establish the presence of
  unobserved effects, comparing the null of spherical residuals with
  the alternative of group (time) specific effects in the error term,
- the choice between fixed and random effects specifications is
  based on Hausman-type tests, comparing the two estimators under the
  null of no significant difference: if this is not rejected, the more
  efficient random effects estimator is chosen,
- even after this step, departures of the error structure from
  sphericity can further affect inference, so that either screening
  tests or robust diagnostics are needed.

Dynamic models, and in general lack of strict exogeneity of the
regressors, pose further problems to estimation which are usually
dealt with in the generalized method of moments (GMM)
framework.

These were, in our opinion, the basic requirements of a panel data
econometrics package for the `R` language and
environment. Some, as often happens with `R`, were already
fulfilled by packages developed for other branches of computational
statistics, while others (like the fixed effects or the between
estimators) were straightforward to compute after transforming the
data, but in every case there were either language inconsistencies
w.r.t. the standard econometric toolbox or subtleties to be dealt with
(like, for example, appropriate computation of standard errors for the
demeaned model, a common pitfall), so we felt there was need for an
"all in one" econometrics-oriented package allowing specification
searches, estimation and inference to be carried out in a natural way.

# Software approach{#software-approach}

## Data structure

Panel data have a special structure: each row of the data corresponds
to a specific individual and time period. In `plm` the `data`
argument may be an ordinary `data.frame` but, in this case, an
argument called `index` has to be added to indicate the structure
of the data.  This can be:

- `NULL` (the default value), it is then assumed
  that the first two columns contain the individual and the time
  index and that observations are ordered by individual and by time period,
- a character string, which should be the name of the individual
  index,
- a character vector of length two containing the names of the
  individual and the time index,
- an integer which is the number of individuals (only in case of a
  balanced panel with observations ordered by individual).

The `pdata.frame` function is then called internally, which returns a
`pdata.frame`: a `data.frame` with an attribute called
`index`. This attribute is a `data.frame` that contains the individual
and the time indexes.

It is also possible to call the `pdata.frame` function directly and
then use the resulting `pdata.frame` in the estimation functions, as
in the sketch below.
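
The following minimal sketch contrasts the two routes (shown for
illustration only, using the `Grunfeld` data shipped with `plm`; the
chunk is not evaluated here):

```{r eval=FALSE}
data("Grunfeld", package = "plm")
# route 1: an ordinary data.frame plus the index argument
plm(inv ~ value + capital, data = Grunfeld, index = c("firm", "year"))
# route 2: build the pdata.frame first, then pass it on
pG <- pdata.frame(Grunfeld, index = c("firm", "year"))
plm(inv ~ value + capital, data = pG)
```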

## Interface

### Estimation interface

Package `plm` provides various functions for panel data estimation, among them:

- `plm`: estimation of the
  basic panel models and instrumental variable panel models, *i.e.*, between and
  first-difference models and within and random effects models. Models are
  estimated internally using the `lm` function on transformed data,
- `pvcm`: estimation of models with variable coefficients,
- `pgmm`: estimation of generalized method of moments models,
- `pggls`: estimation of general feasible generalized least squares
  models,
- `pmg`: estimators for mean groups (MG), demeaned MG (DMG) and common
   correlated effects MG (CCEMG) for heterogeneous panel models,
- `pcce`: estimators for common correlated effects mean groups (CCEMG) and
   pooled (CCEP) for panel data with common factors,
- `pldv`: panel estimators for limited dependent variables.

The interface of these functions is consistent with the `lm()`
function. Namely, their first two arguments are `formula` and
`data` (which should be a `data.frame` and is
mandatory). Three additional arguments are common to these functions:

- `index`: this argument enables the estimation functions
  to identify the structure of the data, *i.e.*, the individual
  and the time period for each observation,
- `effect`: the kind of effects to include in the model,
  *i.e.*, individual effects, time effects or both^[Although in most models the
  individual and time effects cases are symmetric, there are exceptions:
  estimating the *first-difference* model on time effects is meaningless
  because cross-sections do not generally have a natural ordering, so trying
  `effect = "time"` stops with an error message as does `effect = "twoways"`
  which is not defined for first-difference models.],
- `model`: the kind of model to be estimated, most of the
  time a model with fixed effects or a model with random effects.

The results of these functions are stored in an object whose class
has the same name as the function. They all inherit from class
`panelmodel`. A `panelmodel` object contains:
`coefficients`, `residuals`, `fitted.values`,
`vcov`, `df.residual` and `call`, and functions
that extract these elements are provided, as sketched below.
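
As a minimal sketch (not evaluated here; `gw` is an illustrative
fitted model), the usual extractors apply:

```{r eval=FALSE}
gw <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
coef(gw)         # estimated coefficients
head(resid(gw))  # residuals
vcov(gw)         # coefficients' covariance matrix
df.residual(gw)  # residual degrees of freedom
```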

<!-- TODO: currently, plm objects do not contain fitted.values -->


### Testing interface

The diagnostic testing interface provides both `formula` and
`panelmodel` methods for most functions, with some exceptions.
The user may thus choose whether to employ results stored in a
previously estimated `panelmodel` object or to re-estimate it
for the sake of testing.

Although the first strategy is the most efficient one, diagnostic testing
on panel models mostly employs OLS residuals from pooling
model objects, whose estimation is computationally inexpensive.
Therefore most examples in the following are based on `formula`
methods, which are perhaps the cleanest for illustrative purposes.
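
As a minimal sketch of the two interfaces (not evaluated here), using
`plmtest` as an example:

```{r eval=FALSE}
g <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")
plmtest(g)                                       # panelmodel method
plmtest(inv ~ value + capital, data = Grunfeld)  # formula method, re-estimates
```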

<!--
In some cases where tests are to be done on model objects that may be,
in computational terms, more demanding to estimate (and that are also
likely to be already present in the workspace at that stage), an
exception is made to the rule and tests require a `panelmodel`
object argument. This is the case of Hausman tests and of
Wooldridge-type serial correlation tests.
-->

## Computational approach to estimation

The feasible GLS methods needed for efficient estimation of
unobserved effects models have a simple closed-form solution: once the
variance components have been estimated and hence the covariance
matrix of errors $\hat{V}$, model parameters can be estimated as

\begin{equation}
  (\#eq:naive)
  \hat{\beta}=(X^\top  \hat{V}^{-1} X)^{-1} (X^\top  \hat{V}^{-1} y)
\end{equation}

Nevertheless, in practice plain computation of $\hat{\beta}$ has long
been an intractable problem even for moderate-sized data sets because
of the need to invert the $N\times N$ $\hat{V}$ matrix. With the
advances in computer power, this is no longer so, and it is possible to
program the "naive" estimator \@ref(eq:naive) in `R` with standard
matrix algebra operators and have it working seamlessly for the
standard "guinea pigs", e.g., the Grunfeld data. Estimation with a
couple of thousand data points also becomes feasible on a modern
machine, although excruciatingly slow and definitely not suitable for
everyday econometric practice. Memory limits would also be very near
because of the storage needs related to the huge $\hat{V}$ matrix. An
established solution exists for the random effects model which reduces
the problem to an ordinary least squares computation.
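
In plain matrix algebra, the "naive" estimator \@ref(eq:naive) is a
one-liner; the sketch below (not evaluated here) uses simulated data
and a stand-in identity matrix for $\hat{V}$, so all names are
illustrative only:

```{r eval=FALSE}
set.seed(1)
N <- 200
X <- cbind(1, rnorm(N))          # regressors, including an intercept
y <- X %*% c(1, 0.5) + rnorm(N)  # simulated response
V.hat <- diag(N)                 # stand-in for the estimated N x N covariance
solve(t(X) %*% solve(V.hat) %*% X, t(X) %*% solve(V.hat) %*% y)
```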

### The (quasi--)demeaning framework

The estimation methods for the basic models in panel data
econometrics, the pooled OLS, random effects and fixed effects (or
within) models, can all be described inside the OLS estimation
framework. In fact, while pooled OLS simply pools data, the standard
way of estimating fixed effects models with, say, group (time) effects
entails transforming the data by subtracting the average over time
(group) from every variable, which is usually termed *time-demeaning*.
In the random effects case, the various feasible
GLS estimators which have been put forth to tackle the issue of serial
correlation induced by the group-invariant random effect have been
proven to be equivalent (as far as estimation of $\beta$s is
concerned) to OLS on *partially demeaned* data, where partial
demeaning is defined as:

\begin{equation}
 (\#eq:ldemmodel)
 y_{it} - \theta \bar{y}_i = ( X_{it} - \theta \bar{X}_{i} ) \beta + ( u_{it} - \theta \bar{u}_i )
\end{equation}

where $\theta=1-[\sigma_u^2 / (\sigma_u^2 + T \sigma_e^2)]^{1/2}$,
$\bar{y}$ and $\bar{X}$ denote time means of $y$ and $X$, and the
disturbance $u_{it} - \theta \bar{u}_i$ is homoskedastic and serially
uncorrelated. Thus the feasible RE estimate for $\beta$ may be
obtained by estimating $\hat{\theta}$ and running an OLS
regression on the transformed data with `lm()`. The other
estimators can be computed as special cases: for $\theta=1$ one gets
the fixed effects estimator, for $\theta=0$ the pooled OLS
one.
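
A minimal sketch of this equivalence (not evaluated here, and assuming
the `Grunfeld` data with $\hat{\theta}$ taken from `ercomp`):
quasi-demeaning by hand and running `lm()` should reproduce the slope
estimates of the random effects model.

```{r eval=FALSE}
pG    <- pdata.frame(Grunfeld)
theta <- ercomp(inv ~ value + capital, data = Grunfeld)$theta  # default "swar"
y  <- pG$inv     - theta * Between(pG$inv)
x1 <- pG$value   - theta * Between(pG$value)
x2 <- pG$capital - theta * Between(pG$capital)
coef(lm(y ~ x1 + x2))  # slopes match plm(..., model = "random")
```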

Moreover, instrumental variable estimators of all these models may also
be obtained using several calls to `lm()`.

For this reason the three above estimators have been grouped inside
the same function.

On the output side, a number of diagnostics and a very general
coefficients' covariance matrix estimator also benefit from this
framework, as they can be readily calculated applying the standard
OLS formulas to the demeaned data, which are contained inside
`plm` objects. This will be the subject of
subsection [inference in the panel model](#inference).

### The object oriented approach to general GLS computations

The covariance matrix of errors in general GLS models is too
generic to fit the quasi-demeaning framework, so this method calls for
a full-blown application of GLS as in \@ref(eq:naive). On the
other hand, this estimator relies heavily on $n$--asymptotics, making it
theoretically most suitable for situations which forbid it
computationally: e.g., "short" micropanels with thousands of
individuals observed over few time periods.
448
449`R` has general facilities for fast matrix computation based on object
450orientation: particular types of matrices (symmetric, sparse, dense
451etc.) are assigned the relevant class and the additional information
452on structure is used in the computations, sometimes with dramatic
453effects on performance (see @BATE:04) and packages `Matrix` (see
454@BATE:MAEC:2016) and `SparseM` (see @KOEN:NG:2016). Some optimized
455linear algebra routines are available in the `R` package `bdsmatrix`
456(see @THER:14) which exploit the particular block-diagonal and
457symmetric structure of $\hat{V}$ making it possible to implement a
458fast and reliable full-matrix solution to problems of any practically
459relevant size.
460
461The $\hat{V}$ matrix is constructed as an object of class
462`bdsmatrix`. The peculiar properties of this matrix class are used for
463efficiently storing the object in memory and then by ad-hoc versions
464of the `solve` and `crossprod` methods, dramatically reducing
465computing times and memory usage.  The resulting matrix is then used
466"the naive way" as in \@ref(eq:naive) to compute $\hat{\beta}$,
467resulting in speed comparable to that of the demeaning solution.
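
As a minimal sketch of the idea (not evaluated here, and assuming the
`bdsmatrix` package is available): a block-diagonal symmetric matrix
is stored compactly by its blocks, and `solve()` dispatches to a
method exploiting this structure.

```{r eval=FALSE}
library("bdsmatrix")
# two symmetric 2 x 2 blocks, each given by its lower triangle c(2, 1, 2)
V <- bdsmatrix(blocksize = c(2, 2), blocks = c(2, 1, 2, 2, 1, 2))
solve(V, rep(1, 4))  # block-wise solve; no dense N x N inverse is formed
```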

## Inference in the panel model{#inference}

General frameworks for restrictions and linear hypotheses testing are
available in the `R` environment^[See packages `lmtest`
(@HOTH:ZEIL:FARE:CUMM:MILL:MITC:2015) and `car` (@FOX:2016).].
These are based on the Wald test, constructed as $\hat{\beta}^\top \hat{V}^{-1}
\hat{\beta}$, where $\hat{\beta}$ and $\hat{V}$ are consistent
estimates of $\beta$ and $V(\beta)$. The Wald test may be used for
zero-restriction (i.e., significance) testing and, more generally, for
linear hypotheses in the form $(R \hat{\beta} - r)^\top  [R \hat{V} R^\top ]^{-1} (R
\hat{\beta} - r)$^[Moreover, `coeftest()` provides a compact way of looking at
coefficient estimates and significance diagnostics.].
To be applicable, the test functions require
extractor methods for coefficients' and covariance matrix estimates to
be defined for the model object to be tested. Model objects in
`plm` all have `coef()` and `vcov()` methods and are
therefore compatible with the above functions.

In the same framework, robust inference is accomplished substituting
("plugging in") a robust estimate of the coefficient covariance
matrix into the Wald statistic formula. In the panel context, the
estimator of choice is the White system estimator. This called for a
flexible method for computing robust coefficient covariance matrices
*à la White* for `plm` objects.

A general White system estimator for panel data is:

\begin{equation*}
  \hat{V}_R(\beta)=(X^\top X)^{-1} \sum_{i=1}^n{X_i^\top  E_i X_i} (X^\top X)^{-1}
\end{equation*}

where $E_i$ is a function of the residuals $\hat{e}_{it}, \; t=1,
\dots, T$ chosen according to the relevant heteroskedasticity and
correlation structure. Moreover, it turns out that the White
covariance matrix calculated on the demeaned model's regressors and
residuals (both part of `plm` objects) is a consistent estimator
of the relevant model's parameters' covariance matrix, thus the method
is readily applicable to models estimated by random or fixed effects,
first difference or pooled OLS methods. Different
pre-weighting schemes taken from package `sandwich`
(see @ZEIL:04; @LUML:ZEIL:2015) are also implemented to improve small-sample
performance. Robust estimators with any combination of covariance
structures and weighting schemes can be passed on to the testing
functions.
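
A minimal sketch of the plug-in approach (not evaluated here; `gw` is
an illustrative fitted within model): a robust covariance from
`vcovHC()` is passed to `coeftest()` from package `lmtest`.

```{r eval=FALSE}
library("lmtest")
gw <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
coeftest(gw, vcov = vcovHC(gw, method = "arellano"))  # cluster-robust inference
```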

# Managing data and formulae{#dataformula}

The package is now illustrated by application to some well-known
examples. It is loaded using

```{r echo=FALSE,results='hide'}
options(prompt= "R> ", useFancyQuotes = FALSE, scipen = 999)
library("knitr")
opts_chunk$set(message = FALSE, warning = FALSE)
```

```{r echo=TRUE, results='hide'}
library("plm")
```

The four data sets used are `EmplUK` which was used by
@AREL:BOND:91, the `Grunfeld` data [@KLEI:ZEIL:08]
which is used in several econometric books, the `Produc` data
used by @MUNN:90 and the `Wages` data used by @CORN:RUPE:88.

```{r }
data("EmplUK", package="plm")
data("Produc", package="plm")
data("Grunfeld", package="plm")
data("Wages", package="plm")
```

## Data structure

As observed above, the current version of `plm` is capable of
working with a regular `data.frame` without any further
transformation, provided that the individual and time indexes are in
the first two columns, as in all the example data sets but `Wages`.
If this weren't the case, an `index` optional argument would
have to be passed on to the estimating and testing functions.

```{r setdata1}
head(Grunfeld)
E <- pdata.frame(EmplUK, index=c("firm","year"), drop.index=TRUE, row.names=TRUE)
head(E)
head(attr(E, "index"))
```

Two further arguments are logical: `drop.index = TRUE` drops the
indexes from the `data.frame` and `row.names = TRUE` computes
"fancy" row names by pasting the individual and the time
indexes. While extracting a series from a `pdata.frame`, a
`pseries` is created, which is the original series with the
index attribute. This object has specific methods, like
`summary` and `as.matrix`. The former
indicates the total variation of the variable and the shares of this
variation due to the individual and the time dimensions. The
latter gives the matrix representation of the series, with, by default,
individuals as rows and times as columns.

```{r }
summary(E$emp)
head(as.matrix(E$emp))
```

## Data transformation

Panel data estimation requires applying different transformations to
raw series. If $x$ is a series of length $nT$ (where $n$ is the number
of individuals and $T$ is the number of time periods), the transformed
series $\tilde{x}$ is obtained as $\tilde{x}=Mx$ where $M$ is a
transformation matrix. Denoting by $j$ a vector of ones of length $T$ and
by $I_n$ the identity matrix of dimension $n$, we get:

- the between transformation: $P=\frac{1}{T}I_n\otimes jj'$
  returns a vector containing the individual means. The
  `Between` and `between` functions perform this
  operation, the first one returning a vector of length $nT$, the
  second one a vector of length $n$,
- the within transformation: $Q=I_{nT}-P$ returns a vector
  containing the values in deviation from the individual means. The
  `Within` function performs this operation,
- the first difference transformation $D=I_n \otimes d$ where

$d=\left(
\begin{array}{ccccccc}
1 & -1 & 0 & 0 & ... & 0 & 0 \\
0 &  1 & -1 & 0 & ... & 0 & 0 \\
0 & 0 & 1 & -1 & ... & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & ... & 1 & -1 \\
\end{array}
\right)$

is of dimension $(T-1,T)$.

Note that `R`'s `diff()` and `lag()` functions
don't compute these transformations correctly for panel data because
they are unable to identify when there is a change in individual in
the data. Therefore, specific methods for `pseries` objects have
been written in order to handle panel data correctly. Note that
compared to the `lag()` method for `ts` objects, the order
of lags is indicated by a positive integer. Moreover, 0 is a relevant
value and a vector argument may be provided:

```{r }
head(lag(E$emp, 0:2))
```

Further functions called `Between`, `between` and
`Within` are also provided to compute the between and the within
transformation. The `between` returns unique values, whereas
`Between` duplicates the values and returns a vector whose
length is the number of observations.

```{r }
head(diff(E$emp), 10)
head(lag(E$emp, 2), 10)
head(Within(E$emp))
head(between(E$emp), 4)
head(Between(E$emp), 10)
```

## Formulas

In some circumstances, standard `formula`s are not very
useful to describe a model, notably when using instrumental
variable estimators: to deal with these situations, we use the
`Formula` package.

The `Formula` package provides a class which enables the construction of
multi-part formulas, each part being separated by a pipe sign (`|`).

The two formulas below are identical:
```{r results='hide'}
emp ~ wage + capital | lag(wage, 1) + capital
emp ~ wage + capital | . -wage + lag(wage, 1)
```

In the second case, the `.` refers to the first part, which describes
the covariates, and this part is then "updated". This is particularly
convenient when there are only a few external instruments.

# Model estimation{#modelestimation}


## Estimation of the basic models with plm


Several models can be estimated with `plm` by filling the
`model` argument:

- the fixed effects model (`"within"`), the default,
- the pooling model (`"pooling"`),
- the first-difference model (`"fd"`),
- the between model (`"between"`),
- the error components model (`"random"`).

The basic use of `plm` is to indicate the model formula, the
data and the model to be estimated. For example, the fixed effects
model and the random effects model are estimated using:

```{r fe_re}
grun.fe <- plm(inv~value+capital, data = Grunfeld, model = "within")
grun.re <- plm(inv~value+capital, data = Grunfeld, model = "random")
```

Methods to display a summary of the model estimation are available via `summary`.
For example, for a `random` model, the `summary` method gives information
about the variance of the components of the errors and some test statistics.
Random effects of the estimated model can be extracted via `ranef`.

```{r summary_re}
summary(grun.re)
ranef(grun.re)
```

The fixed effects of a fixed effects model may be extracted easily using `fixef`.
An argument `type` indicates how fixed effects should be computed: in levels by
`type = "level"` (the default), in deviations from the overall mean by
`type = "dmean"` or in deviations from the first individual by
`type = "dfirst"`.

```{r }
fixef(grun.fe, type = "dmean")
```

The `fixef` function returns an object of class `fixef`. A
summary method is provided, which prints the effects (in deviation
from the overall intercept), their standard
errors and the test of equality to the overall intercept.

```{r }
summary(fixef(grun.fe, type = "dmean"))
```

In case of a two-ways fixed effects model, the argument `effect` of function
`fixef` selects which fixed effects to extract: `"individual"` for individual
fixed effects (the default for two-ways fixed effects models), `"time"`
for time fixed effects, and `"twoways"` for the sum of individual and time fixed
effects. For example, to extract the time fixed effects from a two-ways model:

```{r }
grun.twfe <- plm(inv~value+capital, data=Grunfeld, model="within", effect="twoways")
fixef(grun.twfe, effect = "time")
```

## More advanced use of plm

### Random effects estimators

As observed above, the random effects model is obtained as a linear
estimation on quasi-demeaned data. The parameter of this
transformation is obtained using preliminary estimations.

The estimator of this parameter is selected with the argument
`random.method`, which accepts the following values:

- `"swar"`: from @SWAM:AROR:72, the default value,
- `"walhus"`: from @WALL:HUSS:69,
- `"amemiya"`: from @AMEM:71,
- `"nerlove"`: from @NERLO:71,
- `"ht"`: for Hausman-Taylor-type instrumental variable (IV) estimation,
  discussed later, see Section [Instrumental variable estimator](#instrumental-variable-est).

For example, to use the `amemiya` estimator:

```{r }
grun.amem <- plm(inv~value+capital, data=Grunfeld,
                 model="random", random.method="amemiya")
```

The estimation of the variance of the error components is performed
using the `ercomp` function, which has a `method` and an
`effect` argument, and can be used by itself:

```{r }
ercomp(inv~value+capital, data=Grunfeld, method = "amemiya", effect = "twoways")
```

### Introducing time or two-ways effects

The default behavior of `plm` is to introduce individual
effects. Using the `effect` argument, one may also introduce:

- time effects (`effect = "time"`),
- individual and time effects (`effect = "twoways"`).

For example, to estimate a two-ways effect model for the
`Grunfeld` data:

```{r 2RE-amemiya}
grun.tways <- plm(inv~value+capital, data = Grunfeld, effect = "twoways",
                  model = "random", random.method = "amemiya")
summary(grun.tways)
```

In the "effects" section of the printed summary of the result, the variance
of the three elements of the error term and the three parameters used in the
transformation are printed.

### Unbalanced panels

Estimations by `plm` support unbalanced panel models.

The following example uses data from @HARR:RUBI:78 to
estimate a hedonic housing price function. It is reproduced in
@BALT:CHAN:94, table 2 (and in @BALT:05, pp. 172/4; @BALT:13, pp. 195/7,
tables 9.1/3).

```{r hedonic}
data("Hedonic", package = "plm")
Hed <- plm(mv~crim+zn+indus+chas+nox+rm+age+dis+rad+tax+ptratio+blacks+lstat,
           data = Hedonic, model = "random", index = "townid")
summary(Hed)
```

Measures for the unbalancedness of a panel data set or the data used in
estimated models are provided by function `punbalancedness`. It gives
the measures $\gamma$ and $\nu$ from @AHRE:PINC:81; for both measures, 1
represents balanced data, and the more unbalanced the data, the lower the value.

```{r hedonic-punbal}
punbalancedness(Hed)
```

### Instrumental variable estimators{#instrumental-variable-est}


All of the models presented above may be estimated using instrumental
variables. The instruments are specified at the end of the formula
after a `|` sign (pipe).

The instrumental variables estimator used is indicated with the
`inst.method` argument:

- `"bvk"`, from  @BALE:VARA:87, the default value,
- `"baltagi"`, from @BALT:81,
- `"am"`, from @AMEM:MACU:86,
- `"bms"`, from @BREU:MIZO:SCHM:89.

An illustration is in the following example from
@BALT:05, p. 120; @BALT:13, p. 137; @BALT:21, p. 165, table 7.3 ("G2SLS").

```{r G2SLS}
data("Crime", package = "plm")
cr <- plm(lcrmrte ~ lprbarr + lpolpc + lprbconv + lprbpris + lavgsen +
          ldensity + lwcon + lwtuc + lwtrd + lwfir + lwser + lwmfg + lwfed +
          lwsta + lwloc + lpctymle + lpctmin + region + smsa + factor(year)
          | . - lprbarr - lpolpc + ltaxpc + lmix,
          data = Crime, model = "random")
summary(cr)
```

The Hausman-Taylor model (see @HAUS:TAYL:81) may be estimated
with the `plm`^[Function `pht` is a deprecated way to estimate this type of model:
`ht <- pht(lwage~wks+south+smsa+married+exp+I(exp^2)+ bluecol+ind+union+sex+black+ed
| sex+black+bluecol+south+smsa+ind, data=Wages,index=595)`.]
function by setting parameters `random.method = "ht"` and
`inst.method = "baltagi"` like in
the example below. The following replicates @BALT:05, pp. 129/30;
@BALT:13, pp. 145/6, tables 7.4/5; @BALT:21, pp. 174/5, tables 7.5/6:

```{r hausman-taylor}
ht <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) +
              bluecol + ind + union + sex + black + ed |
              bluecol + south + smsa + ind + sex + black |
              wks + married + union + exp + I(exp ^ 2),
          data = Wages, index = 595,
          model = "random", random.method = "ht", inst.method = "baltagi")
summary(ht)
```

## Variable coefficients model

The `pvcm` function enables the estimation of variable
coefficients models. Time or individual effects are introduced if argument
`effect` is set to `"time"` or `"individual"` (the default value).

Coefficients are assumed to be fixed if `model="within"` or
random if `model="random"`. In the first case, a different
model is estimated for each individual (or time period). In the second
case, the Swamy model (see @SWAM:70) is estimated. It is a generalized
least squares model which uses the results of the previous
model. Denoting by $\hat{\beta}_i$ the vectors of coefficients obtained
for each individual, we get:

\begin{equation*}
\hat{\beta}=\left(\sum_{i=1}^n \left(\hat{\Delta}+\hat{\sigma}_i^2(X_i^\top X_i)^{-1}\right)^{-1}\right)^{-1}\sum_{i=1}^n\left(\hat{\Delta}+\hat{\sigma}_i^2(X_i^\top X_i)^{-1}\right)^{-1}\hat{\beta}_i
\end{equation*}

where $\hat{\sigma}_i^2$ is the unbiased estimator of the variance of
the errors for individual $i$ obtained from the preliminary estimation
and:

\begin{equation*}
  \hat{\Delta}=\frac{1}{n-1}\sum_{i=1}^n\left(\hat{\beta}_i-\frac{1}{n}\sum_{i=1}^n\hat{\beta}_i\right)
  \left(\hat{\beta}_i-\frac{1}{n}\sum_{i=1}^n\hat{\beta}_i\right)^\top -\frac{1}{n}\sum_{i=1}^n\hat{\sigma}_i^2(X_i^\top X_i)^{-1}
\end{equation*}

If this matrix is not positive-definite, the second term is dropped.

With the `Grunfeld` data, we get:

```{r grunfeld.within}
grun.varw <- pvcm(inv~value+capital, data=Grunfeld, model="within")
grun.varr <- pvcm(inv~value+capital, data=Grunfeld, model="random")
summary(grun.varr)
```

## Generalized method of moments estimator

The generalized method of moments is mainly used in panel data
econometrics to estimate dynamic models [@AREL:BOND:91; @HOLT:NEWE:ROSE:88]:

\begin{equation*}
y_{it}=\rho y_{it-1}+\beta^\top x_{it}+\mu_i+\epsilon_{it}
\end{equation*}

The model is first differenced to get rid of the individual effect:

\begin{equation*}
\Delta y_{it}=\rho \Delta  y_{it-1}+\beta^\top \Delta x_{it}+\Delta \epsilon_{it}
\end{equation*}

Least squares are inconsistent because $\Delta \epsilon_{it}$ is
correlated with $\Delta y_{it-1}$.  $y_{it-2}$ is a valid, but weak
instrument (see @ANDE:HSIA:81). The GMM estimator uses the fact that
the number of valid instruments is growing with $t$:

- $t=3$: $y_1$,
- $t=4$: $y_1,y_2$,
- $t=5$: $y_1,y_2,y_3$.

For individual $i$, the matrix of instruments is then:

\begin{equation*}
W_i=\left(
\begin{array}{cccccccccccc}
y_1 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 0 & 0 & 0 & x_{i3} \\
0 & y_1 & y_2 & 0 & 0 & 0 & ... & 0 & 0 & 0 & 0 & x_{i4} \\
0 & 0 & 0 & y_1 & y_2 & y_3 & ... & 0 & 0 & 0 & 0 & x_{i5} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & ... & y_1 & y_2 & ... & y_{T-2} & x_{iT} \\
\end{array}
\right)
\end{equation*}

The moment conditions are: $\sum_{i=1}^n W_i^\top e_i(\beta)$ where
$e_i(\beta)$ is the vector of residuals for individual $i$. The GMM
estimator minimizes:

\begin{equation*}
\left(\sum_{i=1}^n e_i(\beta)^\top W_i\right) A \left(\sum_{i=1}^n W_i^\top e_i(\beta)\right)
\end{equation*}

where $A$ is the weighting matrix of the moments.

One-step estimators are computed using a known weighting matrix. For
the model in first differences, one uses:

\begin{equation*}
A^{(1)}=\left(\sum_{i=1}^n W_i^\top H^{(1)}W_i\right)^{-1}
\end{equation*}

with:

\begin{equation*}
H^{(1)}=d^\top d=\left(
\begin{array}{ccccc}
2 & -1 & 0 & ... & 0\\
-1 & 2 & -1 & ... & 0\\
0 & -1 & 2 & ... & 0\\
\vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & -1  & 2\\
\end{array}
\right)
\end{equation*}

Two-steps estimators are obtained using
$H^{(2)}=\sum_{i=1}^n e^{(1)}_i e^{(1)\top }_i$
where $e_i^{(1)}$ are the residuals of the one-step estimate.

@BLUN:BOND:98 show that, under weak hypotheses on the data
generating process, supplementary moment conditions exist for the
equation in levels:

$$
y_{it} = \gamma y_{it-1}+\mu_i+\eta_{it}
$$

More precisely, they show that $\Delta y_{it-2}=y_{it-2}-y_{it-3}$ is
a valid instrument. The estimator is obtained using the residual
vector in difference and in level:

$$
e^+_i=(\Delta e_i, e_i)
$$

and the matrix of augmented moments:

$$
Z_i^+=\left(
\begin{array}{ccccc}
Z_i & 0   & 0      &  ... & 0  \\
0 & \Delta y_{i2} & 0  & ... & 0 \\
0 & 0 & \Delta y_{i3}   & ... & 0 \\
0 & 0 & 0 & ... & \Delta y_{iT-1}
\end{array}
\right)
$$

The moment conditions are then:

\begin{eqnarray*}
\left(\sum_{i=1}^n Z_i^{+\top} \left(\begin{array}{c}\bar{e}_i(\beta)\\ e_i(\beta)\end{array}\right)\right)^\top = \left(\sum_{i=1}^n y_{i1} \bar{e}_{i3},\sum_{i=1}^n  y_{i1}\bar{e}_{i4},\sum_{i=1}^n  y_{i2}\bar{e}_{i4}, ...,  \right.\\ \left. \sum_{i=1}^n y_{i1} \bar{e}_{iT}, \sum_{i=1}^n y_{i2} \bar{e}_{iT}, ...,\sum_{i=1}^n y_{iT-2} \bar{e}_{iT}, \sum_{i=1}^n \sum_{t=3}^T x_{it} \bar{e}_{it},\right.\\
\left.\sum_{i=1}^n e_{i3} \Delta y_{i2}, \sum_{i=1}^n e_{i4} \Delta y_{i3},  ... , \sum_{i=1}^n e_{iT} \Delta y_{iT-1} \right)^\top
\end{eqnarray*}

The GMM estimator is provided by the `pgmm`
function. The variables of the model and the lag structure are
described using a multi-part formula.

In a GMM estimation, there are "normal instruments" and
"GMM instruments". GMM instruments are indicated
in the second part of the formula. By default, all the variables of
the model that are not used as GMM instruments are used as
normal instruments, with the same lag structure; "normal"
instruments may also be indicated in the third part of the formula.

The `effect` argument is either `NULL`, `"individual"`
(the default), or `"twoways"`. In the first case, the model is
estimated in levels. In the second case, the model is estimated in
first differences to get rid of the individual effects. In the last
case, the model is estimated in first differences and time dummies are
included.

The `model` argument specifies whether a one-step or a
two-steps model is requested (`"onestep"` or
`"twosteps"`).

The following example is from @AREL:BOND:91. Employment is
explained by past values of employment (two lags), current and first
lag of wages and output, and current value of capital.

```{r gmm}
emp.gmm <- pgmm(log(emp)~lag(log(emp), 1:2)+lag(log(wage), 0:1)+log(capital)+
                lag(log(output), 0:1) | lag(log(emp), 2:99),
                data = EmplUK, effect = "twoways", model = "twosteps")
summary(emp.gmm)
```

The following example is from @BLUN:BOND:98. The "sys"
estimator is obtained using `transformation = "ld"` for level
and difference. The `robust` argument of the `summary`
method enables the use of the robust covariance matrix proposed by
@WIND:05. For all pgmm models, `robust = TRUE` is the default (but set
in this example explicitly).

```{r gmm2}
z2 <- pgmm(log(emp) ~ lag(log(emp), 1)+ lag(log(wage), 0:1) +
           lag(log(capital), 0:1) | lag(log(emp), 2:99) +
           lag(log(wage), 2:99) + lag(log(capital), 2:99),
           data = EmplUK, effect = "twoways", model = "onestep",
           transformation = "ld")
summary(z2, robust = TRUE)
```

## General FGLS models

General FGLS estimators are based on a two-step estimation
process: first an OLS model is estimated, then its residuals
$\hat{u}_{it}$ are used to estimate an error covariance matrix
more general than the random effects one for use in a
feasible-GLS analysis. Formally, the estimated error
covariance matrix is $\hat{V}=I_n \otimes \hat{\Omega}$,
with $$\hat{\Omega}=\sum_{i=1}^n \frac{\hat{u}_{i}
  \hat{u}_{i}^\top }{n} $$ where $\hat{u}_{i}$ is the vector of
residuals for individual $i$ (see @WOOL:02 10.4.3 and 10.5.5).

This framework allows the error covariance structure inside every
group (if `effect = "individual"`) of observations to be fully
unrestricted and is therefore robust against any type of intragroup
heteroskedasticity and serial correlation. This structure, conversely,
is assumed identical across groups and thus general
FGLS is inefficient under groupwise
heteroskedasticity. Cross-sectional correlation is excluded a priori.

Moreover, the number of variance parameters to be estimated with $N=n\times T$
data points is $T(T+1)/2$, which makes these estimators particularly
suited for situations where $n \gg T$, as e.g., in labour or household
income surveys, while problematic for "long" panels, where $\hat{V}$
tends to become singular and standard errors therefore become biased
downwards.

In a pooled time series context (`effect = "time"`),
symmetrically, this estimator is able to account for arbitrary
cross-sectional correlation, provided that the latter is
time-invariant (see @GREE:03 13.9.1--2, pp. 321--2). In this case
serial correlation has to be assumed away and the estimator is
consistent with respect to the time dimension, keeping $n$ fixed.

The function `pggls` estimates general FGLS models, with either
fixed or "random" effects^[The "random effect" is better termed "general FGLS"
model, as in fact it does not have a proper random effects structure, but we
keep this terminology for general language consistency.].

The "random effect" general FGLS is estimated by:

```{r pggls}
zz <- pggls(log(emp)~log(wage)+log(capital), data=EmplUK, model="pooling")
summary(zz)
```

The fixed effects `pggls` (see @WOOL:02, p. 276) is based
on the estimation of a within model in the first step; the rest follows as
above. It is estimated by:

```{r }
zz <- pggls(log(emp)~log(wage)+log(capital), data=EmplUK, model="within")
```

The `pggls` function is similar to `plm` in many
respects. An exception is that the estimate of the group covariance
matrix of errors (`zz$sigma`, a matrix, not shown) is reported in
the model objects instead of the usual estimated variances of the two
error components.

# Tests{#tests}

As sketched in Section [linear panel model](#linear-panel-model), specification
testing in panel models involves essentially testing for poolability,
for individual or time unobserved effects and for correlation between
these latter and the regressors (Hausman-type tests). As for the other
usual diagnostic checks, we provide a suite of serial correlation
tests, while not touching on the issue of heteroskedasticity
testing. Instead, we provide heteroskedasticity-robust covariance
estimators, to be described in
subsection [robust covariance matrix estimation](#robust).

## Tests of poolability

`pooltest` tests the hypothesis that the same coefficients
apply to each individual. It is a standard F test, based on the
comparison of a model obtained for the full sample and a model based
on the estimation of an equation for each individual. The first
argument of `pooltest` is a `plm` object.  The second
argument is a `pvcm` object obtained with
`model="within"`.  If the first argument is a pooling model, the
test applies to all the coefficients (including the intercepts); if it
is a within model, different intercepts are assumed.

To test the hypothesis that all the coefficients in the
`Grunfeld` example, excluding the intercepts, are equal, we use:

```{r }
znp <- pvcm(inv~value+capital, data=Grunfeld, model="within")
zplm <- plm(inv~value+capital, data=Grunfeld, model="within")
pooltest(zplm, znp)
```

The same test can be computed using a formula as first argument of the
`pooltest` function:

```{r results='hide'}
pooltest(inv~value+capital, data=Grunfeld, model="within")
```

## Tests for individual and time effects

`plmtest` implements Lagrange multiplier tests of individual
and/or time effects based on the results of the pooling model. Its
main argument is a `plm` object (the result of a pooling model)
or a formula.

Two additional arguments can be added to indicate the kind of test to
be computed. The argument `type` is one of:

- `"honda"`: @HOND:85, the default value,
- `"bp"`: @BREU:PAGA:80,
- `"kw"`: @KING:WU:97^[NB: Oneway King-Wu (`"kw"`) statistics
(`"individual"` and `"time"`) coincide with the respective  Honda statistics
(`"honda"`); however, the twoway statistics of `"kw"`  and `"honda"` differ.],
- `"ghm"`: @GOUR:HOLL:MONF:82.

The effects tested are indicated with the `effect` argument (one of
`"individual"`, `"time"`, or `"twoways"`). The test statistics
implemented are also suitable for unbalanced panels.^[The `"bp"` test
for unbalanced panels was derived in @BALT:LI:90, the `"kw"` test for
unbalanced panels in @BALT:CHAN:LI:98. The `"ghm"` test and the `"kw"`
test were extended to two--way effects in @BALT:CHAN:LI:92. For a
concise overview of all these statistics see @BALT:13 Sec. 4.2,
pp. 68--76 (for balanced panels) and Sec. 9.5, pp. 200--203 (for
unbalanced panels) or @BALT:21, Sec. 4.2, pp. 81-84 (balanced), Sec. 9.6,
pp. 243-246 (unbalanced).]

To test the presence of individual and time effects in the
`Grunfeld` example, using the @GOUR:HOLL:MONF:82 test,
we use:

```{r }
g <- plm(inv ~ value + capital, data=Grunfeld, model="pooling")
plmtest(g, effect="twoways", type="ghm")
```

or

```{r results='hide'}
plmtest(inv~value+capital, data=Grunfeld, effect="twoways", type="ghm")
```

`pFtest` computes F tests of effects based on the comparison of
the within and the pooling model. Its main arguments
are either two `plm` objects (a pooling and a within model) or a formula.

```{r }
gw <- plm(inv ~ value + capital, data=Grunfeld, effect="twoways", model="within")
gp <- plm(inv ~ value + capital, data=Grunfeld, model="pooling")
pFtest(gw, gp)
```

```{r results='hide'}
pFtest(inv~value+capital, data=Grunfeld, effect="twoways")
```

## Hausman test

`phtest` computes the Hausman test which is based on the
comparison of two sets of estimates (see @HAUS:78). Its main
arguments are two `panelmodel` objects or a
formula.  A classical application of the Hausman test for panel data
is to compare the fixed and the random effects models:

```{r }
gw <- plm(inv~value+capital, data=Grunfeld, model="within")
gr <- plm(inv~value+capital, data=Grunfeld, model="random")
phtest(gw, gr)
```

## Tests of serial correlation{#serialcor}


A model with individual effects has composite errors that are serially
correlated by definition. The presence of the time-invariant error
component^[Here we treat fixed and random effects alike, as components
of the error term, in accordance with the modern approach in econometrics
(see @WOOL:02, @WOOL:10).] gives rise to serial correlation which does
not die out over time, thus standard tests applied on pooled data
always end up rejecting the null of spherical residuals^[Neglecting
time effects may also lead to serial correlation in residuals (as
observed in @WOOL:02 10.4.1).].  There may also be serial correlation
of the "usual" kind in the idiosyncratic error terms, e.g., as an AR(1)
process. By "testing for serial correlation" we mean testing for this
latter kind of dependence.

For these reasons, the subjects of testing for individual error
components and for serially correlated idiosyncratic errors are
closely related. In particular, simple (*marginal*) tests for one
direction of departure from the hypothesis of spherical errors usually
have power against the other one: in case it is present, they are
substantially biased towards rejection. *Joint* tests are
correctly sized and have power against both directions, but usually do
not give any information about which one actually caused
rejection. *Conditional* tests for serial correlation that take
into account the error components are correctly sized under presence
of both departures from sphericity and have power only against the
alternative of interest. While most powerful if correctly specified,
the latter, based on the likelihood framework, are crucially dependent
on normality and homoskedasticity of the errors.

In `plm` we provide a number of joint, marginal and conditional
ML-based tests, plus some semiparametric alternatives which are
robust to heteroskedasticity and free from distributional
assumptions.
1254
1255### Unobserved effects test
1256
1257The unobserved effects test *à la Wooldridge*
1258(see @WOOL:02 10.4.4), is a semiparametric test for the null
1259hypothesis that $\sigma^2_{\mu}=0$, i.e. that there are no unobserved
1260effects in the residuals. Given that under the null the covariance
1261matrix of the residuals for each individual is diagonal, the test
1262statistic is based on the average of elements in the upper (or lower)
1263triangle of its estimate, diagonal excluded: $n^{-1/2} \sum_{i=1}^n
1264\sum_{t=1}^{T-1} \sum_{s=t+1}^T \hat{u}_{it} \hat{u}_{is}$ (where
1265$\hat{u}$ are the pooled OLS residuals), which must be
1266"statistically close" to zero under the null, scaled by its standard
1267deviation: $$W=\frac{ \sum_{i=1}^n \sum_{t=1}^{T-1} \sum_{s=t+1}^T
1268  \hat{u}_{it} \hat{u}_{is} }{ [{ \sum_{i=1}^n ( \sum_{t=1}^{T-1}
1269    \sum_{s=t+1}^T \hat{u}_{it} \hat{u}_{is} } )^2 ]^{1/2} }$$

This test is ($n$-) asymptotically distributed as a standard normal
regardless of the distribution of the errors. Nor does it rely on
homoskedasticity.

It has power both against the standard random effects specification,
where the unobserved effects are constant within every group, and
against any kind of serial correlation. As such, it "nests" both
random effects and serial correlation tests, trading some power
against more specific alternatives in exchange for robustness.

While not rejecting the null favours the use of pooled OLS, rejection
may follow from serial correlation of different kinds, and in
particular, quoting @WOOL:02, "should not be interpreted as
implying that the random effects error structure *must* be true".

Below, the test is applied to the data and model in @MUNN:90:

```{r wtest}
pwtest(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc)
```
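
The `effect` argument of `pwtest` (with values `"individual"`, the default, and `"time"`) selects which kind of unobserved effects the null refers to; a sketch for time effects:

```{r wtesttime}
## same test, but w.r.t. unobserved time effects
pwtest(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc,
       effect = "time")
```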

### Locally robust tests for serial correlation or random effects

The presence of random effects may affect tests for residual serial
correlation, and vice versa. One solution is to use a joint test,
which has power against both alternatives. A joint LM test for random
effects *and* serial correlation under normality and
homoskedasticity of the idiosyncratic errors has been derived by
@BALT:LI:91 and @BALT:LI:95 and is implemented as an
option in `pbsytest`:

```{r pbsytestJoint}
pbsytest(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, test="j")
```

Rejection of the joint test, though, gives no information on the
direction of the departure from the null hypothesis, i.e.: is
rejection due to the presence of serial correlation, of random effects
or of both?

@BERA:SOSA:YOON:01 (hereafter BSY) derive locally robust tests both
for individual random effects and for first-order serial correlation
in residuals as "corrected" versions of the standard LM test (see
`plmtest`). While still dependent on normality and
homoskedasticity, these are robust to *local* departures from the
hypotheses of, respectively, no serial correlation or no random
effects. The authors observe that, although suboptimal, these tests
may help detect the right direction of the departure from the null,
thus complementing the use of joint tests. Moreover, being based on
pooled OLS residuals, the BSY tests are computationally far less
demanding than likelihood-based conditional tests.

On the other hand, the statistical properties of these "locally
corrected" tests are inferior to those of the non-corrected
counterparts when the latter are correctly specified. If there is no
serial correlation, then the optimal test for random effects is the
likelihood-based LM test of Breusch and Godfrey (with refinements by
Honda, see `plmtest`), while if there are no random effects the
optimal test for serial correlation is, again, Breusch-Godfrey's
test^[$LM_3$ in @BALT:LI:95.]. If the presence of a
random effect is taken for granted, then the optimal test for serial
correlation is the likelihood-based conditional LM test of
@BALT:LI:95 (see `pbltest`).

The serial correlation version is the default:

```{r pbsytestAR}
pbsytest(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc)
```

The BSY test for random effects is implemented in the one-sided
version^[Corresponding to $RSO^*_{\mu}$ in the original paper.],
which takes heed that the variance of the random effect
must be non-negative:

```{r pbsytestRE}
pbsytest(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, test="re")
```
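
A two-sided (chi-squared) variant of the random effects test can also be requested; as we read the function's interface, this is governed by the `re.normal` switch:

```{r pbsytestRE2}
## two-sided chi-squared variant of the RE test (re.normal = FALSE, as we
## understand the interface; the one-sided normal version is the default)
pbsytest(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc,
         test="re", re.normal = FALSE)
```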

### Conditional LM test for AR(1) or MA(1) errors under random effects

@BALT:LI:91 and @BALT:LI:95 derive a Lagrange multiplier test for
serial correlation in the idiosyncratic component of the errors under
(normal, heteroskedastic) random effects. Under the null of
serially uncorrelated errors, the test turns out to be identical for
both the alternative of AR(1) and MA(1) processes. One- and two-sided
versions are provided, the one-sided having power against positive
serial correlation only. The two-sided version is the default; for the
one-sided version, the `alternative` option must be set to
`"onesided"`:

```{r pbltest}
pbltest(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp,
        data=Produc, alternative="onesided")
```
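
For comparison, the default two-sided version is obtained by simply omitting the `alternative` argument:

```{r pbltest2}
## default two-sided version of the same test
pbltest(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc)
```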

As usual, the LM test statistic is based on residuals from the maximum
likelihood estimate of the restricted model (random effects with
serially uncorrelated errors). In this case, though, the restricted
model can no longer be estimated by OLS, so the testing
function depends on `lme()` in the `nlme` package for
estimation of a random effects model by maximum likelihood. For this
reason, the test is applicable only to balanced panels.

No test has been implemented to date for the symmetric hypothesis of
no random effects in a model with errors following an AR(1) process,
but an asymptotically equivalent likelihood ratio test is available in
the `nlme` package (see Section [plm versus nlme and lme4](#nlme)).

### General serial correlation tests

A general testing procedure for serial correlation in fixed effects
(FE), random effects (RE) and pooled-OLS
panel models alike can be based on considerations in @WOOL:02, 10.7.2.

Recall that `plm` model objects are the result of OLS
estimation performed on "demeaned" data, where, in the case of
individual effects (else symmetric), this means time-demeaning for the
FE (`within`) model, quasi-time-demeaning for the RE
(`random`) model and original data, with no demeaning at all, for
the pooled OLS (`pooling`) model (see Section [software approach](#software-approach)).

For the random effects model, @WOOL:02 observes that
under the null of homoskedasticity and no serial correlation in the
idiosyncratic errors, the residuals from the quasi-demeaned regression
must be spherical as well. Else, as the individual effects are wiped
out in the demeaning, any remaining serial correlation must be due to
the idiosyncratic component. Hence, a simple way of testing for serial
correlation is to apply a standard serial correlation test to the
quasi-demeaned model. The same applies in a pooled model, w.r.t. the
original data.

The FE case needs some qualification. It is well-known that if the
original model's errors are uncorrelated then FE residuals are
negatively serially correlated, with $cor(\hat{u}_{it},
\hat{u}_{is})=-1/(T-1)$ for each $t,s$ (see @WOOL:02 10.5.4). This
correlation clearly dies out as $T$ increases, so this kind of AR test
is applicable to `within` model objects only for $T$ "sufficiently
large"^[Baltagi and Li derive a basically analogous $T$-asymptotic test
for first-order serial correlation in a FE panel model as a
Breusch-Godfrey LM test on within residuals (see @BALT:LI:95 par. 2.3
and formula 12). They also observe that the test on within residuals
can be used for testing on the RE model, as "the within transformation
[time-demeaning, in our terminology] wipes out the individual effects,
whether fixed or random". Generalizing the Durbin-Watson test to FE
models by applying it to fixed effects residuals is documented in
@BHAR:FRAN:NARE:82; a (modified) version for unbalanced and/or
non-consecutive panels is implemented in `pbnftest`, as is Baltagi-Wu's
LBI statistic (for both see @BALT:WU:99).]. Conversely, in short
panels the test gets severely biased towards rejection (or, as the
induced correlation is negative, towards acceptance in the case of the
one-sided DW test with `alternative="greater"`). See below for a
serial correlation test applicable to "short" FE panel models.
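
As mentioned in the footnote above, the modified BNF Durbin-Watson statistic and Baltagi-Wu's LBI statistic are available through `pbnftest`; a minimal sketch, reusing the `grun.fe` within model estimated earlier in this paper:

```{r pbnftest}
## modified BNF Durbin-Watson statistic for the FE model
pbnftest(grun.fe)
## Baltagi-Wu's LBI statistic
pbnftest(grun.fe, test = "lbi")
```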

`plm` objects retain the "demeaned" data, so the procedure is
straightforward for them. The wrapper functions `pbgtest` and
`pdwtest` re-estimate the relevant quasi-demeaned model by
OLS and apply, respectively, standard
Breusch-Godfrey and Durbin-Watson tests from package `lmtest`:

```{r generalAR}
pbgtest(grun.fe, order = 2)
```

The tests share the features of their OLS counterparts; in particular,
`pbgtest` allows testing for higher-order serial correlation,
which may prove useful, e.g., on quarterly data. Analogously, from
the point of view of software, as the functions are simple wrappers
around `bgtest` and `dwtest`, all arguments of the latter two apply
and may be passed on through the ellipsis (the `...` argument).
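
For instance, the Durbin-Watson counterpart is called analogously, and arguments of `lmtest::bgtest`, such as the `type` of statistic, pass through the ellipsis:

```{r generalAR2}
## Durbin-Watson counterpart on the same FE model
pdwtest(grun.fe)
## the F variant of the Breusch-Godfrey statistic, passed on
## through the ellipsis to lmtest::bgtest
pbgtest(grun.fe, order = 2, type = "F")
```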

### Wooldridge's test for serial correlation in "short" FE panels

For the reasons reported above, under the null of no serial
correlation in the errors, the residuals of a FE model must be
negatively serially correlated, with $cor(\hat{\epsilon}_{it},
\hat{\epsilon}_{is})=-1/(T-1)$ for each $t,s$. Wooldridge suggests basing a
test for this null hypothesis on a pooled regression of FE residuals
on themselves, lagged one period:
$$\hat{\epsilon}_{i,t}=\alpha + \delta \hat{\epsilon}_{i,t-1} + \eta_{i,t}$$
Rejecting the restriction $\delta = -1/(T-1)$ makes us conclude against
the original null of no serial correlation.

The building blocks available in `plm` make it easy to
construct a function carrying out this procedure: first the
FE model is estimated and the residuals retrieved, then they
are lagged and a `pooling` AR(1) model is estimated. The test
statistic is obtained by applying the above restriction on
$\delta$ and supplying a heteroskedasticity- and autocorrelation-consistent
covariance matrix (`vcovHC` with the appropriate options, in particular
`method="arellano"`)^[see subsection [robust covariance matrix estimation](#robust).].

```{r pwartest}
pwartest(log(emp) ~ log(wage) + log(capital), data=EmplUK)
```

The test is applicable to any FE panel model, and in particular to
"short" panels with small $T$ and large $n$.

### Wooldridge's first-difference-based test

In the context of the first difference model,
@WOOL:02, 10.6.3 proposes a serial correlation test that can
also be seen as a specification test to choose the most efficient
estimator between fixed effects (`within`) and first difference
(`fd`).

The starting point is the observation that if the idiosyncratic errors
of the original model $u_{it}$ are uncorrelated, the errors of the
(first) differenced model^[Here, $e_{it}$ for notational simplicity (and as in
Wooldridge): equivalent to $\Delta \epsilon_{it}$ in the general notation of
the paper.]
$e_{it} \equiv u_{it}-u_{i,t-1}$ will be correlated, with $cor(e_{it},
e_{i,t-1})=-0.5$, while any time-invariant effect, "fixed" or
"random", is wiped out in the differencing. So a serial correlation
test for models with individual effects of any kind can be based on
estimating the model
$$\hat{u}_{i,t}= \delta \hat{u}_{i,t-1} + \eta_{i,t}$$
and testing the restriction $\delta = -0.5$,
corresponding to the null of no serial correlation. @DRUK:03
provides Monte Carlo evidence of the good empirical properties of the
test.

On the other extreme (see @WOOL:02 10.6.1), if the differenced
errors $e_{it}$ are uncorrelated, as by definition $u_{it} = u_{i,t-1}
+ e_{it}$, then $u_{it}$ is a random walk. In this latter case, the
most efficient estimator is the first difference (`fd`) one; in
the former case, it is the fixed effects one (`within`).

The function `pwfdtest` allows testing either hypothesis: the
default behaviour `h0="fd"` is to test for serial correlation in
*first-differenced* errors:

```{r pwfdtest1}
pwfdtest(log(emp) ~ log(wage) + log(capital), data=EmplUK)
```

while with `h0="fe"` the null hypothesis becomes no serial
correlation in the *original* errors, making the test analogous to `pwartest`:

```{r pwfdtest2}
pwfdtest(log(emp) ~ log(wage) + log(capital), data=EmplUK, h0="fe")
```

Not rejecting one of the two is evidence in favour of using the
estimator corresponding to `h0`. Should the truth lie in the
middle (both rejected), whichever estimator is chosen will have
serially correlated errors: therefore it will be advisable to use the
autocorrelation-robust covariance estimators from the
subsection [robust covariance matrix estimation](#robust) in inference.
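
A minimal sketch of the latter advice, combining the `fd` estimator with Arellano-type robust standard errors (see the subsection referenced above):

```{r pwfdtestrobust}
## robust (Arellano) inference on the first-difference model, advisable
## should both nulls be rejected
fd <- plm(log(emp) ~ log(wage) + log(capital), data = EmplUK, model = "fd")
summary(fd, vcov = function(x) vcovHC(x, method = "arellano"))
```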

## Tests for cross-sectional dependence

Next to the more familiar issue of serial correlation, in recent
years a growing body of literature has been dealing with
cross-sectional dependence (henceforth: XSD) in panels, which
can arise, e.g., if individuals respond to common shocks (as in the
literature on *factor models*) or if spatial diffusion processes
are present, relating individuals in a way depending on a measure of
distance (*spatial models*).

The subject is huge, and here we touch only some general aspects of
misspecification testing and valid inference. If XSD is
present, the consequence is, at a minimum, inefficiency of the usual
estimators and invalid inference when using the standard covariance
matrix^[This is the case, e.g., in an unobserved effects model where XSD is
due to an unobservable factor structure, with factors that are uncorrelated
with the regressors. In this case the within or random estimators are still
consistent, although inefficient (see @DEHO:SARA:06).].
The plan is to have in `plm` both misspecification tests to detect
XSD and robust covariance matrices to perform valid inference
in its presence, like in the serial dependence case. For now, though,
only misspecification tests are included.


### CD and LM-type tests for global cross-sectional dependence

The function `pcdtest` implements a family of XSD
tests which can be applied in different settings, ranging from those
where $T$ grows large with $n$ fixed to "short" panels with a big
$n$ dimension and a few time periods. All are based on
(transformations of) the product-moment correlation coefficient of a
model's residuals, defined as

$$ \hat{\rho}_{ij}=\frac{\sum_{t=1}^T \hat{u}_{it} \hat{u}_{jt}}{(\sum_{t=1}^T \hat{u}^2_{it})^{1/2} (\sum_{t=1}^T \hat{u}^2_{jt})^{1/2} } $$

i.e., as averages over the time dimension of pairwise correlation
coefficients for each pair of cross-sectional units.

The Breusch-Pagan [@BREU:PAGA:80] LM test, based on the squares of
$\hat{\rho}_{ij}$, is valid for $T \rightarrow \infty$ with $n$ fixed;
it is defined as

$$LM=\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} T_{ij} \hat{\rho}_{ij}^2$$

where in the case of an unbalanced panel only pairwise complete
observations are considered, and $T_{ij}=min(T_i,T_j)$ with $T_i$
being the number of observations for individual $i$; else, if the
panel is balanced, $T_{ij}=T$ for each $i,j$. The test is distributed
as $\chi^2_{n(n-1)/2}$. It is inappropriate whenever the $n$ dimension
is "large". A scaled version, applicable also if $T \rightarrow
\infty$ and *then* $n \rightarrow \infty$ (as in some pooled time
series contexts), is defined as

$$SCLM=\sqrt{\frac{1}{n(n-1)}} ( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} T_{ij} \hat{\rho}_{ij}^2 -1 )$$

and distributed as a standard normal (see @PESA:04).

A bias-corrected scaled version, $BCSCLM$, for the *fixed effects model with
individual effects* only is also available; it is simply the $SCLM$ with a term
correcting for the bias (@BALT:FENG:KAO:12)^[The unbalanced version of this
statistic uses $\max(T_{ij})$ in place of $T$ in the bias-correction term.].
This statistic is also asymptotically distributed as standard normal.

$$BCSCLM=\sqrt{\frac{1}{n(n-1)}} ( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} T_{ij} \hat{\rho}_{ij}^2 -1)-\frac{n}{2(T-1)}$$
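
A sketch of how this statistic can be requested from `pcdtest` on the within model of the Grunfeld data; the value `"bcsclm"` for the `test` selector is our assumption about the interface:

```{r pcdtestbcsclm}
## bias-corrected scaled LM test on the within model
## (test selector "bcsclm" assumed)
pcdtest(inv ~ value + capital, data = Grunfeld, model = "within",
        test = "bcsclm")
```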

Pesaran's (@PESA:04, @PESA:15) $CD$ test

$$CD=\sqrt{\frac{2}{n(n-1)}} ( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \sqrt{T_{ij}} \hat{\rho}_{ij} )$$

based on $\hat{\rho}_{ij}$ without squaring (also distributed as a standard
normal) is appropriate both in $n$-- and in $T$--asymptotic
settings. It has remarkable properties in samples of any practically
relevant size and is robust to a variety of settings. The only big
drawback is that the test loses power against the alternative of
cross-sectional dependence if the latter is due to a factor structure
with factor loadings averaging zero, that is, some units react
positively to common shocks, others negatively.

The default version of the test is `"cd"`, yielding Pesaran's $CD$ test.
These tests are originally meant to use the residuals of separate
estimation of one time-series regression for each cross-sectional
unit, so this is the default behaviour of `pcdtest`.

```{r pcdtest1}
pcdtest(inv~value+capital, data=Grunfeld)
```

If a different model specification (`within`, `random`, ...)
is assumed consistent, one can resort to its residuals for
testing^[This is also the only solution when the time dimension's length is
insufficient for estimating the heterogeneous model.]
by specifying the relevant `model` type. The main
argument of this function may be either a model of class
`panelmodel` or a `formula` and a `data.frame`; in the
second case, unless `model` is set to `NULL`, all usual
parameters relative to the estimation of a `plm` model may be
passed on. The test is compatible with any consistent
`panelmodel` for the data at hand, with any specification of
`effect`. E.g., specifying `effect = "time"` or
`effect = "twoways"` allows testing for residual cross-sectional
dependence after the introduction of time fixed effects to account for
common shocks.

```{r pcdtest2}
pcdtest(inv~value+capital, data=Grunfeld, model="within")
```

If the time dimension is insufficient and `model=NULL`, the
function defaults to estimation of a `within` model and issues a
warning.

1643
1644### CD(p) test for local cross-sectional dependence
1645
1646A *local* variant of the $CD$ test, called $CD(p)$ test
1647[@PESA:04], takes into account an appropriate subset of *neighbouring*
1648cross-sectional units to check the null of no XSD against the alternative
1649of *local* XSD, i.e. dependence between neighbours only. To do so, the pairs
1650of neighbouring units are selected by means of a binary proximity matrix
1651like those used in spatial models. In the original paper, a regular
1652ordering of observations is assumed, so that the $m$-th
1653cross-sectional observation is a neighbour to the $(m-1)$-th and to
1654the $(m+1)$-th. Extending the $CD(p)$ test to irregular lattices, we
1655employ the binary proximity matrix as a selector for discarding the
1656correlation coefficients relative to pairs of observations that are
1657not neighbours in computing the $CD$ statistic.  The test is then
1658defined as
1659
1660$$CD=\sqrt{\frac{1}{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w(p)_{ij}}} ( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} [w(p)]_{ij} \sqrt{T_{ij}}\hat{\rho}_{ij} )$$
1661
1662where $[w(p)]_{ij}$ is the $(i,j)$-th element of the $p$-th order
1663proximity matrix, so that if $h,k$ are not neighbours, $[w(p)]_{hk}=0$
1664and $\hat{\rho}_{hk}$ gets "killed"; this is easily seen to reduce
1665to formula (14) in Pesaran [@PESA:04] for the special case
1666considered in that paper.  The same can be applied to the $LM$,
1667$SCLM$, and $BCSCLM$ tests.
1668
1669Therefore, the *local* version of either test can be computed
1670supplying an $n \times n$ matrix (of any kind coercible to
1671`logical`), providing information on whether any pair of
1672observations are neighbours or not, to the `w` argument. If
1673`w` is supplied, only neighbouring pairs will be used in
1674computing the test; else, `w` will default to `NULL` and all
1675observations will be used.  The matrix needs not really be binary, so
1676commonly used "row-standardized" matrices can be employed as well:
1677it is enough that neighbouring pairs correspond to nonzero elements in
1678`w` ^[The very comprehensive package `spdep` for spatial dependence analysis
1679(see @BIVA:08) contains features for creating, lagging and manipulating
1680*neighbour list* objects of class `nb`, that can be readily converted to and
1681from proximity matrices by means of the `nb2mat` function. Higher orders of
1682the $CD(p)$ test can be obtained by lagging the corresponding `nb`s
1683through `nblag`.].
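
A minimal sketch on the Grunfeld data, under the (purely illustrative) assumption of a regular ordering of the 10 firms in which firm $m$ neighbours firms $m-1$ and $m+1$:

```{r pcdtestlocal}
## first-order proximity matrix for a hypothetical regular
## ordering of the 10 Grunfeld firms
w <- matrix(0, nrow = 10, ncol = 10)
w[abs(row(w) - col(w)) == 1] <- 1
## CD(1) test using only neighbouring pairs
pcdtest(inv ~ value + capital, data = Grunfeld, model = "within", w = w)
```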

## Panel unit root tests

### Overview of functions for panel unit root testing


Below, we first give an overview of which tests are implemented in which
function; a theoretical treatment of a few of those tests follows later
on. The package `plm` offers several panel unit root tests contained in
three functions:

- `purtest` (Levin-Lin-Chu test, IPS test, several Fisher-type tests, Hadri's test),
- `cipstest` (cross-sectionally augmented IPS test), and
- `phansi` (Simes' test).

While `purtest` implements various tests which can be selected via its `test`
argument, `cipstest` and `phansi` are functions for a specific test each.

Function `purtest` offers the following tests by setting argument `test` to:

- `"levinlin"` (default), for the Levin-Lin-Chu test
              (@LEVIN:LIN:CHU:02), see below for
              a theoretical exposition ([Levin-Lin-Chu test](#levinlin)),
- `"ips"`, for the Im-Pesaran-Shin (IPS) test by @IM:PESAR:SHIN:03, see below for
              a theoretical exposition ([Im-Pesaran-Shin test](#ips)),
- `"madwu"`, for the inverse $\chi^2$ test by @MADDA:WU:99, also called
            P test by @CHOI:01,
- `"Pm"`, for the modified P test proposed by @CHOI:01 for large N,
- `"invnormal"`, for the inverse normal test (@CHOI:01),
- `"logit"`, for the logit test (@CHOI:01),
- `"hadri"`, for Hadri's test (@HADR:00).

The tests in `purtest` are often called first-generation panel unit root tests,
as they assume absence of cross-sectional correlation; all of them, except
Hadri's test, are based on the estimation of
augmented Dickey-Fuller (ADF) regressions for each time series. A
statistic is then computed using the t-statistics associated with
the lagged variable. By contrast, the Hadri residual-based LM statistic
is the cross-sectional average of individual KPSS statistics
(@KWIA:PHIL:SCHM:SHIN:92), standardized
by their asymptotic mean and standard deviation.
Among the tests in `purtest`, `"madwu"`, `"Pm"`, `"invnormal"`, and `"logit"` are
Fisher-type tests.^[The individual p-values for the Fisher-type tests
are approximated as described in @MACK:96 if the package
`urca` (@PFAFF:08) is available, otherwise as described in @MACK:94.]
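
As a quick illustration of a Fisher-type test, the following sketch (modelled after the package's documented examples) runs the Maddala-Wu test on the investment series of the Grunfeld data:

```{r purtestmadwu}
## Maddala-Wu (inverse chi-squared) test on the Grunfeld investment series
purtest(inv ~ 1, data = Grunfeld, index = "firm", pmax = 4,
        exo = "intercept", test = "madwu")
```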

`purtest` returns an object of class `"purtest"` which contains details
about the test performed, among them details about the individual
regressions/statistics for the test. Associated `summary` and `print.summary`
methods can be used to extract/display the additional information.

Function `cipstest` implements Pesaran's (@pes07) cross-sectionally
augmented version of the Im-Pesaran-Shin panel unit root test and is a
so-called second-generation panel unit root test.

Function `phansi` implements the idea of @HANCK:13 to apply Simes'
approach for intersections of individual hypothesis tests to panel unit root
testing, see below for a more thorough treatment
of [Simes’ approach for intersecting hypotheses](#phansi).


### Preliminary results

We consider the following model:

$$
y_{it} = \delta y_{it-1} + \sum_{L=1}^{p_i} \theta_{iL} \Delta
y_{it-L}+\alpha_{mi} d_{mt}+\epsilon_{it}
$$

The unit root hypothesis is $\delta = 1$. The model can be rewritten in
differences:

$$
\Delta y_{it} = \rho y_{it-1} + \sum_{L=1}^{p_i} \theta_{iL} \Delta
y_{it-L}+\alpha_{mi} d_{mt}+\epsilon_{it}
$$

so that, with $\rho = \delta - 1$, the unit root hypothesis is now $\rho = 0$.

Some of the unit root tests for panel data are based on preliminary
results obtained by running the above Augmented Dickey-Fuller (ADF)
regression.

First, we have to determine the optimal number of lags $p_i$ for each
time series. Several possibilities are available. They all have in
common that the maximum number of lags has to be chosen first. Then,
$p_i$ can be chosen by using:

- the Schwarz information criterion (SIC) (also known as the Bayesian information criterion (BIC)),
- the Akaike information criterion (AIC),
- Hall's method, which consists in removing higher-order lags as long as
  they are not significant.

The ADF regression is run on $T-p_i-1$ observations for
each individual, so that the total number of observations is $n\times
\tilde{T}$ where $\tilde{T}=T-\bar{p}-1$ and $\bar{p}$ is the average
number of lags. Call $e_{i}$ the vector of residuals.

Estimate the variance of the $\epsilon_i$ as:

$$
\hat{\sigma}_{\epsilon_i}^2 = \frac{\sum_{t=p_i+1}^{T} e_{it}^2}{df_i}
$$

### Levin-Lin-Chu model{#levinlin}

Then, as per @LEVIN:LIN:CHU:02, compute artificial regressions of $\Delta y_{it}$
and $y_{it-1}$ on $\Delta y_{it-L}$ and $d_{mt}$ and get the two vectors of
residuals $z_{it}$ and $v_{it}$.

Standardize these two residuals and run the pooled regression of
$z_{it}/\hat{\sigma}_i$ on $v_{it}/\hat{\sigma}_i$ to get
$\hat{\rho}$, its standard deviation $\hat{\sigma}({\hat{\rho}})$ and
the t-statistic $t_{\hat{\rho}}=\hat{\rho}/\hat{\sigma}({\hat{\rho}})$.

Compute the long-run variance of $y_i$:

$$
\hat{\sigma}_{yi}^2 = \frac{1}{T-1}\sum_{t=2}^T \Delta y_{it}^2 + 2
\sum_{L=1}^{\bar{K}}w_{\bar{K}L}\left[\frac{1}{T-1}\sum_{t=2+L}^T
  \Delta y_{it} \Delta y_{it-L}\right]
$$

Define $s_i$ as the ratio of the long- and short-run variance
and $\bar{s}$ as its mean over all the individuals of the sample:

$$
s_i = \frac{\hat{\sigma}_{yi}}{\hat{\sigma}_{\epsilon_i}}
$$

$$
\bar{s} = \frac{\sum_{i=1}^n s_i}{n}
$$

Then the adjusted statistic

$$
t^*_{\rho}=\frac{t_{\rho}- n \tilde{T} \bar{s} \hat{\sigma}_{\tilde{\epsilon}}^{-2}
\hat{\sigma}({\hat{\rho}}) \mu^*_{m\tilde{T}}}{\sigma^*_{m\tilde{T}}}
$$

follows a normal distribution under the unit root null
hypothesis. $\mu^*_{m\tilde{T}}$ and $\sigma^*_{m\tilde{T}}$ are
given in table 2 of the original paper and are also available in the
package.

An example of how the Levin-Lin-Chu test is performed with `purtest`, using a lag
of 2 and an intercept and a time trend as exogenous variables in the ADF
regressions:
```{r levinlin}
data("HousePricesUS", package = "pder")
lprice <- log(pdata.frame(HousePricesUS)$price)
(lev <- purtest(lprice, test = "levinlin", lags = 2, exo = "trend"))
summary(lev) ### gives details
```


### Im-Pesaran-Shin (IPS) test{#ips}

This test by @IM:PESAR:SHIN:03 does not require that $\rho$ is the same for all
the individuals. The null hypothesis is still that all the series have a
unit root, but the alternative is that some may have a unit root while
others have different values of $\rho_i < 0$.

The test is based on the average of the t-statistics on
$\rho_i$ obtained for each individual:

$$
\bar{t}=\frac{1}{n}\sum_{i=1}^n t_{\rho i}
$$

The statistic is then:

$$
z = \frac{\sqrt{n}\left(\bar{t}- E(\bar{t})\right)}{\sqrt{V(\bar{t})}}
$$

The moments $E(\bar{t})$ and $V(\bar{t})$ are
given in the original paper and are also available in the
package.

An example of the IPS test with `purtest` with the same settings as in the
previously performed Levin-Lin-Chu test is:
```{r ips}
purtest(lprice, test = "ips", lags = 2, exo = "trend")
```

### Simes' approach: intersecting hypotheses{#phansi}

A different approach to panel unit root testing can be drawn from the general
Simes test for intersections of individual hypothesis tests [@SIMES:86].
@HANCK:13 suggests applying this approach to panel unit root testing: the test
works by combining the p-values from single hypothesis tests (individual
unit root tests) into a global (intersected) hypothesis, controlling for the
multiplicity in testing. Thus, it works "on top" of any panel unit root test
which yields a p-value for each individual series. Unlike most other panel unit
root tests, this approach allows one to discriminate between individuals for which
the individual H0 (unit root present for the individual series) is rejected/is not
rejected, and it requires a pre-specified significance level. Further, the test is
robust versus general patterns of cross-sectional dependence.

The function `phansi` for this test takes as its main input either a numeric
vector containing the p-values of the individual tests or a `"purtest"` object
as produced by function `purtest` which holds a suitable pre-computed panel unit
root test (one that produces p-values per individual series). The significance
level is set by argument `alpha` (default: 5 %). The function's return value is
a list with a detailed evaluation of the applied Simes test. The associated
print method gives a verbal evaluation.

The following examples show both accepted ways of input. The first example
replicates @HANCK:13, table 11 (left side), who applied some panel unit root test
for a Purchasing Power Parity analysis per country (individual H0 hypotheses per
series) to get the individual p-values and then used Simes' approach for testing
the global (intersecting) hypothesis for the whole panel.
```{r phansi1}
### input is numeric (p-values), replicates Hanck (2013), Table 11 (left side)
pvals <- c(0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0050,0.0050,0.0050,
           0.0050,0.0175,0.0175,0.0200,0.0250,0.0400,0.0500,0.0575,0.2375,0.2475)
countries <- c("Argentina","Sweden","Norway","Mexico","Italy","Finland","France",
              "Germany","Belgium","U.K.","Brazil","Australia","Netherlands",
              "Portugal","Canada", "Spain","Denmark","Switzerland","Japan")
names(pvals) <- countries
h <- phansi(pvals)
print(h)
h$rejected # logical indicating the individuals with rejected individual H0
```

```{r phansi2, results='hide'}
### input is a (suitable) purtest object / different example
y <- data.frame(split(Grunfeld$inv, Grunfeld$firm))
obj <- purtest(y, pmax = 4, exo = "intercept", test = "madwu")
phansi(obj, alpha = 0.06) # test with significance level set to 6 %
```

## Robust covariance matrix estimation{#robust}

Robust estimators of the covariance matrix of coefficients are
provided, mostly for use in Wald-type tests, and this section provides
some basics and examples. A more comprehensive exposition of the theory
and of the capabilities that come with the `plm` package is given in @mil17b.

`vcovHC` estimates three "flavours" of White's heteroskedasticity-consistent
covariance matrix^[See @WHIT:80 and @WHIT:84b.] (known as the
*sandwich* estimator). Interestingly, in the
context of panel data the most general version also proves consistent
vs. serial correlation.

All types assume no correlation between errors of different groups
while allowing for heteroskedasticity across groups, so that the full
covariance matrix of errors is block-diagonal:
$V=\mathrm{diag}(\Omega_1, \dots, \Omega_n)$. As
for the *intragroup* error covariance matrix of every single
group of observations, `"white1"` allows for general
heteroskedasticity but no serial correlation, *i.e.*

\begin{equation}
 (\#eq:omegaW1)
 \Omega_i=
 \left[ \begin{array}{c c c c}
 \sigma_{i1}^2 & \dots & \dots & 0 \\
 0 & \sigma_{i2}^2 & & \vdots \\
 \vdots & & \ddots & 0 \\
 0 & \dots & \dots & \sigma_{iT}^2 \\
 \end{array} \right]
\end{equation}

while `"white2"` is `"white1"` restricted to a common
variance inside every group, estimated as
$\sigma_i^2=\sum_{t=1}^T{\hat{u}_{it}^2}/T$, so that $\Omega_i=I_T \otimes
\sigma_i^2$ (see @GREE:03, 13.7.1--2 and
@WOOL:02, 10.7.2); `"arellano"` (see ibid. and the original reference @AREL:87) allows
a fully general structure w.r.t. heteroskedasticity and serial correlation:

\begin{equation}
 (\#eq:omegaArellano)
 \Omega_i=
 \left[ \begin{array}{c c c c c}
 \sigma_{i1}^2 & \sigma_{i1,i2}  & \dots & \dots & \sigma_{i1,iT} \\
 \sigma_{i2,i1} & \sigma_{i2}^2 & & & \vdots \\
 \vdots & & \ddots & & \vdots \\
 \vdots & & & \sigma_{iT-1}^2 & \sigma_{iT-1,iT} \\
 \sigma_{iT,i1} & \dots & \dots & \sigma_{iT,iT-1} & \sigma_{iT}^2 \\
 \end{array} \right]
\end{equation}


The latter is, as already observed, consistent w.r.t. timewise
correlation of the errors but, conversely, unlike the White 1 and
2 methods, it relies on large-$n$ asymptotics with small $T$.

The fixed effects case, as already observed in Section
[tests of serial correlation](#serialcor) on serial correlation, is
complicated by the fact that the demeaning induces serial correlation
in the errors. The original White estimator (`"white1"`) turns out to be
inconsistent for fixed $T$ as $n$ grows, so in this case it is
advisable to use the `"arellano"` version (see @STOC:WATS:08).

The errors may be weighted according to the schemes proposed by
@MACK:WHIT:85 and @CRIB:04 to improve small-sample performance^[The
HC3 and HC4 weighting schemes are computationally expensive and may
hit memory limits for $nT$ in the thousands, where on the other hand
it makes little sense to apply small-sample corrections.].

The main use of `vcovHC` (and the other variance-covariance estimators provided
in the package: `vcovBK`, `vcovNW`, `vcovDC`, `vcovSCC`) is to pass it to plm's
own functions like `summary`, `pwaldtest`, and `phtest`, or together with testing
functions from the `lmtest` and `car` packages. All of these typically allow
passing the `vcov` or `vcov.` parameter either as a matrix or as a function
(see also @ZEIL:04). If one
is happy with the defaults, it is easiest to pass the function
itself^[For `coeftest` set `df = Inf` to have the coefficients' tests
be performed with the standard normal distribution instead of the t
distribution, as we deal with a random effects model here. For these
types of models, the precise distribution of the coefficient
estimates is unknown.]:

```{r vcovHC1}
re <- plm(inv~value+capital, data = Grunfeld, model = "random")
summary(re, vcov = vcovHC) # gives usual summary output but with robust test statistics

library("lmtest")
coeftest(re, vcovHC, df = Inf)
```

else one may do the covariance computation inside the call, thus passing on a matrix:

```{r vcovHC2, results='hide'}
summary(re, vcov = vcovHC(re, method="white2", type="HC3"), df = Inf)
coeftest(re, vcovHC(re, method="white2", type="HC3"), df = Inf)
```

For some tests, e.g., for multiple model comparisons by `waldtest`, one
should always provide a function^[Joint zero-restriction testing still
allows providing the `vcov` of the unrestricted model as a matrix, see
the documentation of package `lmtest`.]. In this case, optional
parameters are provided as shown below (see also @ZEIL:04, p. 12):

```{r waldtest-vcovHC}
waldtest(re, update(re, . ~ . -capital),
         vcov=function(x) vcovHC(x, method="white2", type="HC3"))
```

Moreover, `linearHypothesis` from package `car` may be used to test
for linear restrictions:

```{r car-vcovHC}
library("car")
linearHypothesis(re, "2*value=capital", vcov. = vcovHC)
```

Specific methods are also provided for `pcce` and `pgmm` objects; for the latter,
`vcovHC` provides the robust covariance matrix proposed by
@WIND:05 for generalized method of moments estimators.
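
As a sketch, this can be illustrated on the classic Arellano-Bond specification for the EmplUK data (the model below follows the `pgmm` example from the package's documentation; `summary.pgmm` reports Windmeijer-corrected standard errors when `robust = TRUE`):

```{r pgmmvcov}
## Arellano-Bond two-step GMM estimation, as in the documented pgmm example
ab <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + log(capital) +
             lag(log(output), 0:1) | lag(log(emp), 2:99),
           data = EmplUK, effect = "twoways", model = "twosteps")
## robust (Windmeijer-corrected) inference
summary(ab, robust = TRUE)
```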


# plm versus nlme and lme4{#nlme}

The models termed *panel* by the econometricians have counterparts in
the statistics literature on *mixed* models (or *hierarchical models*,
or *models for longitudinal data*), although there are both
differences in jargon and more substantial distinctions. This language
inconsistency between the two communities, together with the more
complicated general structure of statistical models for longitudinal
data and the associated notation in the software, is likely to scare
some practicing econometricians away from some potentially useful
features of the `R` environment, so it may be useful to provide here a
brief reconciliation between the typical panel data specifications
used in econometrics and the general framework used in statistics for
mixed models^[This discussion does not consider GMM models. One of the
basic reasons for econometricians not to choose maximum likelihood
methods in estimation is that the strict exogeneity of regressors
assumption required for consistency of the ML models reported in the
following is often inappropriate in economic settings.].

`R` is particularly strong on mixed models' estimation, thanks to the
long-standing `nlme` package (see @PINH:BATE:DEBR:SARK:07) and the
more recent `lme4` package, based on S4 classes (see @BATE:07)^[The
standard reference on the subject of mixed models in `S`/`R` is
@PINH:BATE:00.]. In the following we will refer to the more
established `nlme` to give some examples of "econometric" panel models
that can be estimated in a likelihood framework, also including some
likelihood ratio tests. Some of them are not feasible in `plm` and
make a useful complement to the econometric "toolbox" available in
`R`.

## Fundamental differences between the two approaches

Econometrics deals mostly with non-experimental data. Great emphasis
is put on specification procedures and misspecification testing. Model
specifications tend therefore to be very simple, while great attention
is paid to the issues of endogeneity of the regressors, dependence
structures in the errors and robustness of the estimators under
deviations from normality. The preferred approach is often semi- or
non-parametric, and heteroskedasticity-consistent techniques are
becoming standard practice both in estimation and testing.

For all these reasons, although the maximum likelihood framework is
important in testing^[Lagrange Multiplier tests based on the
likelihood principle are suitable for testing against more general
alternatives on the basis of a maintained model with spherical
residuals and find therefore application in testing for departures
from the classical hypotheses on the error term. The seminal reference
is @BREU:PAGA:80.] and sometimes used in estimation as well, panel
model estimation in econometrics is mostly accomplished in the
generalized least squares framework based on Aitken's Theorem and,
when possible, in its special case OLS, which are free from
distributional assumptions (although these kick in at the diagnostic
testing stage). On the contrary, longitudinal data models in `nlme`
and `lme4` are estimated by (restricted or unrestricted) maximum
likelihood. While under normality, homoskedasticity and no serial
correlation of the errors OLS is also the maximum likelihood
estimator, in all the other cases there are important differences.

The econometric GLS approach has closed-form analytical solutions
computable by standard linear algebra and, although the latter can
sometimes get computationally heavy on the machine, the expressions
for the estimators are usually rather simple. ML estimation of
longitudinal models, on the contrary, is based on numerical
optimization of nonlinear functions without closed-form solutions and
is thus dependent on approximations and convergence criteria. For
example, the "GLS" functionality in `nlme` is rather different
from its "econometric" counterpart. "Feasible GLS" estimation in
`plm` is based on a single two-step procedure, in which an
inefficient but consistent estimation method (typically OLS) is
employed first in order to get a consistent estimate of the errors'
covariance matrix, to be used in GLS at the second step; conversely,
"GLS" estimators in `nlme` are based on iteration
until convergence of two-step optimization of the relevant likelihood.

## Some false friends

The *fixed/random effects* terminology in econometrics is often
recognized to be misleading, as both are treated as random variates in
modern econometrics (see, e.g., @WOOL:02 10.2.1). It has been
recognized since Mundlak's classic paper (@MUND:78) that the
fundamental issue is whether the unobserved effects are correlated
with the regressors or not. If they are not, they can safely be left
in the error term, and the serial correlation they induce is cared for
by means of appropriate GLS transformations. On the contrary, in the
case of correlation, "fixed effects" methods such as least squares
dummy variables or time-demeaning are needed, which explicitly,
although inconsistently^[For fixed effects estimation, as the sample
grows (on the dimension on which the fixed effects are specified) so
does the number of parameters to be estimated. Estimation of
individual fixed effects is $T$-- (but not $n$--) consistent, and the
opposite.], estimate a group-- (or time--) invariant additional
parameter for each group (or time period).

Thus, from the point of view of model specification, having
*fixed effects* in an econometric model has the meaning of allowing the
intercept to vary with group, or time, or both, while the other
parameters are generally still assumed to be homogeneous. Having
*random effects* means having a group-- (or time--, or both) specific
component in the error term.

In the mixed models literature, on the contrary, *fixed effect*
indicates a parameter that is assumed constant, while
*random effects* are parameters that vary randomly around zero
according to a joint multivariate normal distribution.

So, the FE model in econometrics has no counterpart in the
mixed models framework, unless reducing it to OLS on a
specification with one dummy for each group (often termed
*least squares dummy variables*, or LSDV model) which can
trivially be estimated by OLS. The RE model is
instead a special case of a mixed model where only the intercept is
specified as a random effect, while the "random" type variable
coefficients model can be seen as one that has the same regressors in
the fixed and random sets. The unrestricted generalized least squares
can in turn be seen, in the `nlme` framework, as a standard linear
model with a general error covariance structure within the groups and
errors uncorrelated across groups.

## A common taxonomy

To reconcile the two terminologies, in the following we report the
specification of the panel models in `plm` according to the general
expression of a mixed model in Laird-Ware form [see the web appendix
to @FOX:02] and the `nlme` estimation commands for maximum likelihood
estimation of an equivalent specification^[In doing so, we stress that
"equivalence" concerns only the specification of the model, and
neither the appropriateness nor the relative efficiency of the
relevant estimation techniques, which will of course be dependent on
the context. Unlike their mixed model counterparts, the specifications
in `plm` are, strictly speaking, distribution-free. Nevertheless, for
the sake of exposition, in the following we present them in the
setting which ensures consistency and efficiency (e.g., we consider
the hypothesis of spherical errors part of the specification of pooled
OLS and so forth).].

### The Laird-Ware representation for mixed models

A general representation for the linear mixed effects model is given
in @LAIR:WARE:82:

$$
\begin{array}{rcl}
y_{ij} & = & \beta_1 x_{1ij} +  \dots + \beta_p x_{pij} + \\
 &  & b_{i1} z_{1ij} +  \dots + b_{iq} z_{qij} + \epsilon_{ij} \\
b_{ik} & \sim & N(0,\psi^2_k), \phantom{p} Cov(b_{ik},b_{ik'}) = \psi_{kk'} \\
\epsilon_{ij} & \sim & N(0,\sigma^2 \lambda_{ijj}), \phantom{p} Cov(\epsilon_{ij},\epsilon_{ij'}) = \sigma^2 \lambda_{ijj'} \\
\end{array}
$$

where the $x_1, \dots, x_p$ are the fixed effects regressors and the
$z_1, \dots, z_q$ are the random effects regressors, assumed to be
normally distributed across groups. The covariance of
the random effects coefficients $\psi_{kk'}$ is assumed constant
across groups and the covariances between the errors in group $i$,
$\sigma^2 \lambda_{ijj'}$, are described by the term $\lambda_{ijj'}$
representing the correlation structure of the errors within each group
(e.g., serial correlation over time) scaled by the common error
variance $\sigma^2$.

### Pooling and Within

The *pooling* specification in `plm` is equivalent to a classical
linear model (i.e., no random effects regressor and spherical errors:
$b_{iq}=0 \phantom{p} \forall i,q$, and $\lambda_{ijj'}=1$
for $j=j'$, $0$ else). The *within* one is the same with the
regressors' set augmented by $n-1$ group dummies. There is no point in
using `nlme` as parameters can be estimated by OLS, which is also ML.

### Random effects

In the Laird and Ware notation, the RE specification is a model with
only one random effects regressor: the intercept. Formally, $z_{1ij}=1
\phantom{p}\forall i,j$, $\phantom{p} z_{qij}=0 \phantom{p} \forall i,
\forall j, \forall q \neq 1$, and $\lambda_{ijj'}=1$ for $j=j'$, $0$
else. The composite error is therefore $u_{ij}=1b_{i1} +
\epsilon_{ij}$. Below we report coefficients of Grunfeld's model
estimated by GLS and then by ML:

```{r re2}
library(nlme)
reGLS <- plm(inv~value+capital, data=Grunfeld, model="random")

reML <- lme(inv~value+capital, data=Grunfeld, random=~1|firm)

coef(reGLS)

summary(reML)$coefficients$fixed
```

### Variable coefficients, "random"

Swamy's variable coefficients model [@SWAM:70] has coefficients
varying randomly (and independently of each other) around a set of
fixed values, so the equivalent specification is $z_{q}=x_{q}
\phantom{p} \forall q$, i.e., the fixed effects and the random effects
regressors are the same, and $\psi_{kk'}=\sigma_\mu^2 I_N$, and
$\lambda_{ijj}=1$, $\lambda_{ijj'}=0$ for $j \neq j'$, i.e., the
idiosyncratic errors are serially uncorrelated.

Estimation of a mixed model with random coefficients on all regressors
is rather demanding from the computational side. Some models from our
examples fail to converge. The example below is estimated on the
Grunfeld data and model with time effects.

```{r vcmrand}
vcm <- pvcm(inv~value+capital, data=Grunfeld, model="random", effect="time")

vcmML <- lme(inv~value+capital, data=Grunfeld, random=~value+capital|year)

coef(vcm)

summary(vcmML)$coefficients$fixed
```

### Variable coefficients, "within"

This specification actually entails separate estimation of $T$
different standard linear models, one for each time period in the data
(symmetrically, one per group with individual effects), so
the estimation approach is the same: OLS. In `nlme` this is done by
creating an `lmList` object, so that the two models below are
equivalent (output suppressed):

```{r vcmfixed}
vcmf <- pvcm(inv~value+capital, data=Grunfeld, model="within", effect="time")

vcmfML <- lmList(inv~value+capital|year, data=Grunfeld)
```

### General FGLS

The general, or unrestricted, feasible GLS (FGLS), `pggls` in the
`plm` nomenclature, is equivalent to a model with no random effects
regressors ($b_{iq}=0 \phantom{p} \forall i,q$) and an error
covariance structure which is unrestricted within groups apart from
the usual requirements. The function for estimating such models with
correlation in the errors but no random effects is `gls()`.

This very general serial correlation and heteroskedasticity structure
is not estimable for the original Grunfeld data, which have more time
periods than firms; we therefore restrict them to firms 4 to 6.

```{r gglsre}
sGrunfeld <- Grunfeld[Grunfeld$firm %in% 4:6, ]

ggls <- pggls(inv~value+capital, data=sGrunfeld, model="pooling")

gglsML <- gls(inv~value+capital, data=sGrunfeld,
              correlation=corSymm(form=~1|year))

coef(ggls)

summary(gglsML)$coefficients
```

The *within* case is analogous, with the regressor set augmented
by $n-1$ group dummies.
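
A minimal sketch of that within counterpart on the same restricted data set:

```{r gglswithin}
## FEGLS: the "within" flavour of the general FGLS estimator
gglsw <- pggls(inv ~ value + capital, data = sGrunfeld, model = "within")
coef(gglsw)
```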

## Some useful "econometric" models in nlme

Finally, amongst the many possible specifications estimable with
`nlme`, we report a couple of cases that might be especially
interesting to applied econometricians.

### AR(1) pooling or random effects panel

Linear models with groupwise structures of time-dependence^[Take heed
that here, in contrast to the usual meaning of serial correlation in
time series, we always speak of serial correlation *between the errors
of each group*.] may be fitted by `gls()`, specifying the correlation
structure in the `correlation` option^[Note that the time index is
coerced to numeric before the estimation.]:

```{r lmAR1}
Grunfeld$year <- as.numeric(as.character(Grunfeld$year))
lmAR1ML <- gls(inv~value+capital, data=Grunfeld,
               correlation=corAR1(0,form=~year|firm))
```


and analogously the random effects panel with, e.g., AR(1) errors
(see @BALT:05; @BALT:13; @BALT:21, ch. 5), which is a very common specification
in econometrics, may be fit by `lme` specifying an additional
random intercept:

```{r reAR1}
reAR1ML <- lme(inv~value+capital, data=Grunfeld, random=~1|firm,
               correlation=corAR1(0,form=~year|firm))
```

The regressors' coefficients and the error's serial correlation
coefficient may be retrieved this way:

```{r fetchcoefs}
summary(reAR1ML)$coefficients$fixed
coef(reAR1ML$modelStruct$corStruct, unconstrained=FALSE)
```

Significance statistics for the regressors' coefficients are to be
found in the usual `summary` object, while to get the significance
test of the serial correlation coefficient one can do a likelihood
ratio test as shown in the following.

### An LR test for serial correlation and one for random effects

A likelihood ratio test for serial correlation in the idiosyncratic
residuals can be done as a nested models test, by `anova()`,
comparing the model with spherical idiosyncratic residuals with the
more general alternative featuring AR(1) residuals. The test takes the
form of a zero restriction test on the autoregressive parameter.

This can be done on pooled or random effects models alike. First we
report the simpler case.

We already estimated the pooling AR(1) model above. The GLS model
without correlation in the residuals is the same as OLS, and one could
well use `lm()` for the restricted model. Here we estimate it
by `gls()`.

```{r LRar}
lmML <- gls(inv~value+capital, data=Grunfeld)
anova(lmML, lmAR1ML)
```

The AR(1) test on the random effects model is to be done in much the
same way, using the random effects model objects estimated above:

```{r LRarsubRE}
anova(reML, reAR1ML)
```

A likelihood ratio test for random effects compares the specifications
with and without random effects and spherical idiosyncratic errors:

```{r LRre}
anova(lmML, reML)
```

The random effects, AR(1) errors model in turn nests the AR(1) pooling
model, therefore a likelihood ratio test for random effects sub AR(1)
errors may be carried out, again, by comparing the two autoregressive
specifications:

```{r LRresubAR}
anova(lmAR1ML, reAR1ML)
```

whence we see that the Grunfeld model specification doesn't seem to need any
random effects once we control for serial correlation in the data.


# Conclusions{#conclusions}

With `plm` we aim at providing a comprehensive package containing the
standard functionalities that are needed for the management and the
econometric analysis of panel data. In particular, we provide:
functions for data transformation; estimators for pooled, random and
fixed effects static panel models and variable coefficients models,
general GLS for general covariance structures, and generalized method
of moments estimators for dynamic panels; specification and diagnostic
tests. Instrumental variables estimation is supported. Most estimators
allow working with unbalanced panels. While among the different
approaches to longitudinal data analysis we take the perspective of
the econometrician, the syntax is consistent with the basic linear
modeling tools, like the `lm` function.

On the input side, `formula` and `data` arguments are
used to specify the model to be estimated. Special functions are
provided to make writing formulas easier, and the structure of the
data is indicated with an `index` argument.

On the output side, the model objects (of the new class
`panelmodel`) are compatible with the general restriction
testing frameworks of packages `lmtest` and `car`.
Specialized methods are also provided for the calculation of robust
covariance matrices; heteroskedasticity- and correlation-consistent
testing is accomplished by passing these on to testing functions,
together with a `panelmodel` object.

The main functionalities of the package have been illustrated here by
applying them on some well-known data sets from the econometric
literature. The similarities and differences with the maximum
likelihood approach to longitudinal data have also been briefly
discussed.

We plan to expand the methods in this paper to systems of equations
and to the estimation of models with autoregressive errors. Additions
of covariance estimators robust vs. cross-sectional correlation are
also in the offing. Lastly, conditional visualization features in the
`R` environment seem to offer a promising toolbox for visual
diagnostics, which is another subject for future work.


# Acknowledgments {-}

While retaining responsibility for any error, we thank Jeffrey
Wooldridge, Achim Zeileis and three anonymous referees for useful
comments. We also acknowledge kind editing assistance by Lisa
Benedetti.

# Bibliography {-}