1%\VignetteIndexEntry{Follow-up data with the Lexis functions in Epi}
2
3\documentclass[a4paper,dvipsnames,twoside,12pt]{report}
4\newcommand{\Title}{Follow-up data with the\\ \texttt{Lexis} functions in
5  \texttt{Epi}}
6\newcommand{\Tit}{Follow-up with \texttt{Lexis}}
7\newcommand{\Version}{Version 6}
8\newcommand{\Dates}{August 2019}
9\newcommand{\Where}{SDCC}
10\newcommand{\Homepage}{\url{http://bendixcarstensen.com/} }
11\newcommand{\Faculty}{\begin{tabular}{rl}
12Bendix Carstensen
13  & Steno Diabetes Center Copenhagen, Gentofte, Denmark\\
14  & {\small \& Department of Biostatistics,
15               University of Copenhagen} \\
16  & \texttt{b@bxc.dk} \\
17  & \url{http://BendixCarstensen.com} \\[1em]
18                      \end{tabular}}
19
20\input{topreport}
21\renewcommand{\rwpre}{./flup}
22
23\chapter*{Introduction}
24\addcontentsline{toc}{chapter}{Introduction}
25
26This is an introduction to the \texttt{Lexis} machinery in the
27\texttt{Epi} package. The machinery is intended for representation and
28manipulation of follow-up data (event history data) from studies where
29exact dates of events are known. It accommodates follow-up through
30multiple states and on multiple time scales.
31
32This vignette uses an example from the \texttt{Epi} package to
33illustrate the set-up of a simple \texttt{Lexis} object (a data frame
34of follow-up intervals), as well as the subdivision of follow-up
35intervals needed for multistate representation and analysis of
36transition rates.
37
38The first chapter is exclusively on manipulation of the follow-up
39representation, but it points to the subsequent chapter where analysis
40is based on a \texttt{Lexis} representation with very small follow-up
41intervals.
42
43Chapter 2 uses analysis of mortality rates among Danish diabetes
44patients (available in the \texttt{Epi} package) currently on insulin
45treatment or not to illustrate the use of the the \texttt{Lexis}
46machinery.
47
48I owe much thanks to my colleague Lars Jorge Diaz for careful reading
49and many constructive suggestions.
50
51\section{History}
52
53The \texttt{Lexis} machinery in the \texttt{Epi} package was first
54conceived by Martyn Plummer\cite{Plummer.2011,Carstensen.2011a}, and
55since its first appearance in the \texttt{Epi} package in 2008 it has
56been expanded with a number of utilities. An overview of these can be
57found in the last chapter of this note: ``\texttt{Lexis} functions''.
58
59\chapter{Representation of follow-up data in the \texttt{Epi} package}
60
61In the \texttt{Epi}-package, follow-up data is represented by adding
62some extra variables to a data frame. Such a data frame is called a
63\texttt{Lexis} object. The tools for handling follow-up data then use
64the structure of this for special plots, tabulations and modeling.
65
66Follow-up data basically consists of a time of entry, a time of exit
67and an indication of the status at exit (normally either ``alive'' or
68``dead'') for each person. Implicitly is also assumed a status
69\emph{during} the follow-up (usually ``alive'').
70
71\begin{figure}[htbp]
72  \centering
73\setlength{\unitlength}{1pt}
74\begin{picture}(210,70)(0,75)
75%\scriptsize
76\thicklines
77 \put(  0,80){\makebox(0,0)[r]{Age-scale}}
78 \put( 50,80){\line(1,0){150}}
79 \put( 50,80){\line(0,1){5}}
80 \put(100,80){\line(0,1){5}}
81 \put(150,80){\line(0,1){5}}
82 \put(200,80){\line(0,1){5}}
83 \put( 50,77){\makebox(0,0)[t]{35}}
84 \put(100,77){\makebox(0,0)[t]{40}}
85 \put(150,77){\makebox(0,0)[t]{45}}
86 \put(200,77){\makebox(0,0)[t]{50}}
87
88 \put(  0,115){\makebox(0,0)[r]{Follow-up}}
89
90 \put( 80,105){\makebox(0,0)[r]{\small Two}}
91 \put( 90,105){\line(1,0){87}}
92 \put( 90,100){\line(0,1){10}}
93 \put(100,100){\line(0,1){10}}
94 \put(150,100){\line(0,1){10}}
95 \put(180,105){\circle{6}}
96 \put( 95,110){\makebox(0,0)[b]{1}}
97 \put(125,110){\makebox(0,0)[b]{5}}
98 \put(165,110){\makebox(0,0)[b]{3}}
99
100 \put( 50,130){\makebox(0,0)[r]{\small One}}
101 \put( 60,130){\line(1,0){70}}
102 \put( 60,125){\line(0,1){10}}
103 \put(100,125){\line(0,1){10}}
104 \put(130,130){\circle*{6}}
105 \put( 80,135){\makebox(0,0)[b]{4}}
106 \put(115,135){\makebox(0,0)[b]{3}}
107\end{picture}
108  \caption{\it Follow-up of two persons}
109  \label{fig:fu2}
110\end{figure}
111
112\section{Time scales}
113
114A time scale is a variable that varies deterministically \emph{within}
115each person during follow-up, \textit{e.g.}:
116\begin{itemize}
117  \item Age
118  \item Calendar time
119  \item Time since start of treatment
120  \item Time since relapse
121\end{itemize}
122All time scales advance at the same pace, so the time followed is the
123same on all time scales. Therefore, it will suffice to use only the
124entry point on each of the time scales, for example:
125\begin{itemize}
126  \item Age at entry
127  \item Date of entry
128  \item Time at treatment (\emph{at} treatment this is 0)
129  \item Time at relapse (\emph{at} relapse this is 0)
130\end{itemize}
131For illustration we need to load the \texttt{Epi} package:
132\begin{Schunk}
133\begin{Sinput}
134> library(Epi)
135> print( sessionInfo(), l=F )
136\end{Sinput}
137\begin{Soutput}
138R version 3.6.3 (2020-02-29)
139Platform: x86_64-pc-linux-gnu (64-bit)
140Running under: Ubuntu 14.04.6 LTS
141
142Matrix products: default
143BLAS:   /usr/lib/openblas-base/libopenblas.so.0
144LAPACK: /usr/lib/lapack/liblapack.so.3.0
145
146attached base packages:
147[1] utils     datasets  graphics  grDevices stats     methods   base
148
149other attached packages:
150[1] Epi_2.43
151
152loaded via a namespace (and not attached):
153 [1] Rcpp_1.0.3        magrittr_1.5      splines_3.6.3     MASS_7.3-53
154 [5] tidyselect_0.2.5  lattice_0.20-41   R6_2.4.1          rlang_0.4.4
155 [9] plyr_1.8.5        dplyr_0.8.4       cmprsk_2.2-7      tools_3.6.3
156[13] parallel_3.6.3    grid_3.6.3        data.table_1.12.0 nlme_3.1-149
157[17] mgcv_1.8-33       survival_3.2-7    assertthat_0.2.1  tibble_2.1.3
158[21] etm_1.0.4         numDeriv_2016.8-1 crayon_1.3.4      Matrix_1.2-18
159[25] purrr_0.3.0       glue_1.3.1        compiler_3.6.3    pillar_1.4.3
160[29] zoo_1.8-4         pkgconfig_2.0.3
161\end{Soutput}
162\end{Schunk}
163In the \texttt{Epi} package, follow-up in a cohort is represented in a
164\texttt{Lexis} object. As mentioned, a \texttt{Lexis} object is a data
165frame with some extra structure representing the follow-up. For the
166\texttt{DMlate} data --- follow-up of diabetes patients in Denmark
167recording date of birth, date of diabetes, date of insulin use, date
168of first oral drug use and date of death --- we can construct a
169\texttt{Lexis} object by:
170\begin{Schunk}
171\begin{Sinput}
172> data( DMlate )
173> head( DMlate )
174\end{Sinput}
175\begin{Soutput}
176       sex    dobth     dodm    dodth    dooad doins      dox
17750185    F 1940.256 1998.917       NA       NA    NA 2009.997
178307563   M 1939.218 2003.309       NA 2007.446    NA 2009.997
179294104   F 1918.301 2004.552       NA       NA    NA 2009.997
180336439   F 1965.225 2009.261       NA       NA    NA 2009.997
181245651   M 1932.877 2008.653       NA       NA    NA 2009.997
182216824   F 1927.870 2007.886 2009.923       NA    NA 2009.923
183\end{Soutput}
184\begin{Sinput}
185> dmL <- Lexis( entry = list( per=dodm,
186+                             age=dodm-dobth,
187+                             tfD=0 ),
188+                exit = list( per=dox ),
189+         exit.status = factor( !is.na(dodth), labels=c("DM","Dead") ),
190+                data = DMlate )
191\end{Sinput}
192\begin{Soutput}
193NOTE: entry.status has been set to "DM" for all.
194NOTE: Dropping  4  rows with duration of follow up < tol
195\end{Soutput}
196\begin{Sinput}
197> timeScales(dmL)
198\end{Sinput}
199\begin{Soutput}
200[1] "per" "age" "tfD"
201\end{Soutput}
202\end{Schunk}
203(The excluded persons are persons with date of diabetes equal to date
204of death.)
205
206The \texttt{entry} argument is a \emph{named} list with the entry
207points on each of the time scales we want to use. It defines the names
208of the time scales and the entry points of the follow-up of each
209person. The \texttt{exit} argument gives the exit time on \emph{one}
210of the time scales, so the name of the element in this list must match
211one of the names of the \texttt{entry} list. This is sufficient,
212because the follow-up time on all time scales is the same, in this
213case $\mathtt{dox}$-$\mathtt{dodm}$.
214
215The \texttt{exit.status} is a categorical variable (a \emph{factor})
216that indicates the exit status --- in this case whether the person
217(still) is in state \texttt{DM} or exits to \texttt{Dead} at the end
218of follow-up. In principle we should also indicate the
219\texttt{entry.status}, but the default is to assume that all persons
220enter in the \emph{first} of the mentioned \texttt{exit.state}s ---
221in this case \texttt{DM}, because $\mathtt{FALSE}<\mathtt{TRUE}$.
222
223Now take a look at the result:
224\begin{Schunk}
225\begin{Sinput}
226> str( dmL )
227\end{Sinput}
228\begin{Soutput}
229Classes ‘Lexis’ and 'data.frame':	9996 obs. of  14 variables:
230 $ per    : num  1999 2003 2005 2009 2009 ...
231 $ age    : num  58.7 64.1 86.3 44 75.8 ...
232 $ tfD    : num  0 0 0 0 0 0 0 0 0 0 ...
233 $ lex.dur: num  11.08 6.689 5.446 0.736 1.344 ...
234 $ lex.Cst: Factor w/ 2 levels "DM","Dead": 1 1 1 1 1 1 1 1 1 1 ...
235 $ lex.Xst: Factor w/ 2 levels "DM","Dead": 1 1 1 1 1 2 1 1 2 1 ...
236 $ lex.id : int  1 2 3 4 5 6 7 8 9 10 ...
237 $ sex    : Factor w/ 2 levels "M","F": 2 1 2 2 1 2 1 1 2 1 ...
238 $ dobth  : num  1940 1939 1918 1965 1933 ...
239 $ dodm   : num  1999 2003 2005 2009 2009 ...
240 $ dodth  : num  NA NA NA NA NA ...
241 $ dooad  : num  NA 2007 NA NA NA ...
242 $ doins  : num  NA NA NA NA NA NA NA NA NA NA ...
243 $ dox    : num  2010 2010 2010 2010 2010 ...
244 - attr(*, "time.scales")= chr  "per" "age" "tfD"
245 - attr(*, "time.since")= chr  "" "" ""
246 - attr(*, "breaks")=List of 3
247  ..$ per: NULL
248  ..$ age: NULL
249  ..$ tfD: NULL
250\end{Soutput}
251\begin{Sinput}
252> head( dmL )[,1:10]
253\end{Sinput}
254\begin{Soutput}
255            per      age tfD    lex.dur lex.Cst lex.Xst lex.id sex    dobth     dodm
25650185  1998.917 58.66119   0 11.0800821      DM      DM      1   F 1940.256 1998.917
257307563 2003.309 64.09035   0  6.6885695      DM      DM      2   M 1939.218 2003.309
258294104 2004.552 86.25051   0  5.4455852      DM      DM      3   F 1918.301 2004.552
259336439 2009.261 44.03559   0  0.7364819      DM      DM      4   F 1965.225 2009.261
260245651 2008.653 75.77550   0  1.3442847      DM      DM      5   M 1932.877 2008.653
261216824 2007.886 80.01643   0  2.0369610      DM    Dead      6   F 1927.870 2007.886
262\end{Soutput}
263\end{Schunk}
264The \texttt{Lexis} object \texttt{dmL} has a variable for each
265time scale which is the entry point on this time scale. The follow-up
266time is in the variable \texttt{lex.dur} (\texttt{dur}ation).
267Note that the exit status is in the variable \texttt{lex.Xst}
268(e\texttt{X}it \texttt{st}ate. The variable \texttt{lex.Cst} is the
269state where the follow-up takes place (\texttt{C}urrent
270\texttt{st}ate), in this case \texttt{DM} (alive with diabetes) for
271all persons. This implies that \emph{censored} observations are
272characterized by having $\mathtt{lex.Cst}=\mathtt{lex.Xst}$.
273
274There is a \texttt{summary} function for \texttt{Lexis} objects that
275lists the number of transitions and records as well as the total amount
276of follow-up time; it also (optionally) prints information about the names of the
277variables that constitute the time scales:
278\begin{Schunk}
279\begin{Sinput}
280> summary.Lexis( dmL, timeScales=TRUE )
281\end{Sinput}
282\begin{Soutput}
283Transitions:
284     To
285From   DM Dead  Records:  Events: Risk time:  Persons:
286  DM 7497 2499      9996     2499   54273.27      9996
287
288Timescales:
289per age tfD
290 ""  ""  ""
291\end{Soutput}
292\end{Schunk}
293It is possible to get a visualization of the follow-up along the
294time scales chosen by using the \texttt{plot} method for \texttt{Lexis}
295objects. \texttt{dmL} is an object of \emph{class} \texttt{Lexis}, so
296using the function \texttt{plot()} on it means that \R\ will look for
297the function \texttt{plot.Lexis} and use this function.
298\begin{Schunk}
299\begin{Sinput}
300> plot( dmL )
301\end{Sinput}
302\end{Schunk}
303The function allows quite a bit of control over the output, and a
304\texttt{points.Lexis} function allows plotting of the endpoints of
305follow-up:
306\begin{Schunk}
307\begin{Sinput}
308> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6 )
309> plot( dmL, 1:2, lwd=1, col=c("blue","red")[dmL$sex],
310+       grid=TRUE, lty.grid=1, col.grid=gray(0.7),
311+       xlim=1960+c(0,60), xaxs="i",
312+       ylim=  40+c(0,60), yaxs="i", las=1 )
313> points( dmL, 1:2, pch=c(NA,3)[dmL$lex.Xst],
314+         col="lightgray", lwd=3, cex=0.3 )
315> points( dmL, 1:2, pch=c(NA,3)[dmL$lex.Xst],
316+         col=c("blue","red")[dmL$sex], lwd=1, cex=0.3 )
317> box(bty='o')
318\end{Sinput}
319\end{Schunk}
320In the above code you will note that the values of the arguments
321\texttt{col} and \texttt{pch} are indexed by factors, using the
322convention in \R\ that the index is taken as \emph{number of the level}
323of the supplied factor. Thus \texttt{c("blue","red")[dmL\$sex]} is
324\texttt{"blue"} when \texttt{sex} is \texttt{M} (the first level).
325
326The results of these two plotting commands are in figure
327\ref{fig:Lexis-diagram}, p. \pageref{fig:Lexis-diagram}.
328\begin{figure}[tb]
329\centering
330\includegraphics[width=0.35\textwidth]{flup-dmL1}
331\includegraphics[width=0.63\textwidth]{flup-dmL2}
332\caption{\it Lexis diagram of the \textrm{\tt DMlate} dataset; left
333  panel is the default version, right panel: plot with some bells and
334  whistles. The red lines are for women, blue for men, crosses
335  indicate deaths.}
336\label{fig:Lexis-diagram}
337\end{figure}
338
339\section{Splitting the follow-up time along a time scale}
340
341In next chapter we shall conduct statistical analysis of mortality
342rates, and a prerequisite for parametric analysis of rates is that
343follow-up time is subdivided in smaller intervals, where we can
344reasonably assume that rates are constant.
345
346The follow-up time in a cohort can be subdivided (``split'') along a
347time scale, for example current age. This is achieved by the
348\texttt{splitLexis} (note that it is \emph{not} called
349\texttt{split.Lexis}). This requires that the time scale and the
350breakpoints on this time scale are supplied. Try:
351\begin{Schunk}
352\begin{Sinput}
353> dmS1 <- splitLexis( dmL, "age", breaks=seq(0,100,5) )
354> summary( dmL )
355\end{Sinput}
356\begin{Soutput}
357Transitions:
358     To
359From   DM Dead  Records:  Events: Risk time:  Persons:
360  DM 7497 2499      9996     2499   54273.27      9996
361\end{Soutput}
362\begin{Sinput}
363> summary( dmS1 )
364\end{Sinput}
365\begin{Soutput}
366Transitions:
367     To
368From    DM Dead  Records:  Events: Risk time:  Persons:
369  DM 18328 2499     20827     2499   54273.27      9996
370\end{Soutput}
371\end{Schunk}
372We see that the number of persons and events and the amount of
373follow-up is the same in the two data sets; only the number of records
374differ --- the extra records all have \texttt{lex.Cst}=\texttt{DM} and
375\texttt{lex.Xst}=\texttt{DM}.
376
377To see how records are split for each individual, it is useful to list
378the results for a few individuals (whom we selected with a view to the
379illustrative usefulness):
380\begin{Schunk}
381\begin{Sinput}
382> wh.id <- c(9,27,52,484)
383> subset( dmL , lex.id %in% wh.id )[,1:10]
384\end{Sinput}
385\begin{Soutput}
386            per      age tfD   lex.dur lex.Cst lex.Xst lex.id sex    dobth     dodm
387430048 1998.956 61.87269   0  9.508556      DM    Dead      9   F 1937.083 1998.956
38822671  2000.042 52.71184   0  9.954825      DM      DM     27   M 1947.331 2000.042
389338459 1998.249 61.85626   0 11.748118      DM      DM     52   F 1936.393 1998.249
390274124 1998.260 62.37919   0 10.929500      DM    Dead    484   F 1935.881 1998.260
391\end{Soutput}
392\begin{Sinput}
393> subset( dmS1, lex.id %in% wh.id )[,1:10]
394\end{Sinput}
395\begin{Soutput}
396     lex.id      per      age      tfD  lex.dur lex.Cst lex.Xst sex    dobth     dodm
39714        9 1998.956 61.87269 0.000000 3.127310      DM      DM   F 1937.083 1998.956
39815        9 2002.083 65.00000 3.127310 5.000000      DM      DM   F 1937.083 1998.956
39916        9 2007.083 70.00000 8.127310 1.381246      DM    Dead   F 1937.083 1998.956
40054       27 2000.042 52.71184 0.000000 2.288159      DM      DM   M 1947.331 2000.042
40155       27 2002.331 55.00000 2.288159 5.000000      DM      DM   M 1947.331 2000.042
40256       27 2007.331 60.00000 7.288159 2.666667      DM      DM   M 1947.331 2000.042
403108      52 1998.249 61.85626 0.000000 3.143737      DM      DM   F 1936.393 1998.249
404109      52 2001.393 65.00000 3.143737 5.000000      DM      DM   F 1936.393 1998.249
405110      52 2006.393 70.00000 8.143737 3.604381      DM      DM   F 1936.393 1998.249
4061004    484 1998.260 62.37919 0.000000 2.620808      DM      DM   F 1935.881 1998.260
4071005    484 2000.881 65.00000 2.620808 5.000000      DM      DM   F 1935.881 1998.260
4081006    484 2005.881 70.00000 7.620808 3.308693      DM    Dead   F 1935.881 1998.260
409\end{Soutput}
410\end{Schunk}
411The resulting object, \texttt{dmS1}, is again a \texttt{Lexis} object,
412and the follow-up may be split further along another time scale, for
413example diabetes duration, \texttt{tfD}. Subsequently we list the
414results for the chosen individuals:
415\begin{Schunk}
416\begin{Sinput}
417> dmS2 <- splitLexis( dmS1, "tfD", breaks=c(0,1,2,5,10,20,30,40) )
418> subset( dmS2, lex.id %in% wh.id )[,1:10]
419\end{Sinput}
420\begin{Soutput}
421     lex.id      per      age       tfD   lex.dur lex.Cst lex.Xst sex    dobth     dodm
42231        9 1998.956 61.87269  0.000000 1.0000000      DM      DM   F 1937.083 1998.956
42332        9 1999.956 62.87269  1.000000 1.0000000      DM      DM   F 1937.083 1998.956
42433        9 2000.956 63.87269  2.000000 1.1273101      DM      DM   F 1937.083 1998.956
42534        9 2002.083 65.00000  3.127310 1.8726899      DM      DM   F 1937.083 1998.956
42635        9 2003.956 66.87269  5.000000 3.1273101      DM      DM   F 1937.083 1998.956
42736        9 2007.083 70.00000  8.127310 1.3812457      DM    Dead   F 1937.083 1998.956
428111      27 2000.042 52.71184  0.000000 1.0000000      DM      DM   M 1947.331 2000.042
429112      27 2001.042 53.71184  1.000000 1.0000000      DM      DM   M 1947.331 2000.042
430113      27 2002.042 54.71184  2.000000 0.2881588      DM      DM   M 1947.331 2000.042
431114      27 2002.331 55.00000  2.288159 2.7118412      DM      DM   M 1947.331 2000.042
432115      27 2005.042 57.71184  5.000000 2.2881588      DM      DM   M 1947.331 2000.042
433116      27 2007.331 60.00000  7.288159 2.6666667      DM      DM   M 1947.331 2000.042
434229      52 1998.249 61.85626  0.000000 1.0000000      DM      DM   F 1936.393 1998.249
435230      52 1999.249 62.85626  1.000000 1.0000000      DM      DM   F 1936.393 1998.249
436231      52 2000.249 63.85626  2.000000 1.1437372      DM      DM   F 1936.393 1998.249
437232      52 2001.393 65.00000  3.143737 1.8562628      DM      DM   F 1936.393 1998.249
438233      52 2003.249 66.85626  5.000000 3.1437372      DM      DM   F 1936.393 1998.249
439234      52 2006.393 70.00000  8.143737 1.8562628      DM      DM   F 1936.393 1998.249
440235      52 2008.249 71.85626 10.000000 1.7481177      DM      DM   F 1936.393 1998.249
4412084    484 1998.260 62.37919  0.000000 1.0000000      DM      DM   F 1935.881 1998.260
4422085    484 1999.260 63.37919  1.000000 1.0000000      DM      DM   F 1935.881 1998.260
4432086    484 2000.260 64.37919  2.000000 0.6208077      DM      DM   F 1935.881 1998.260
4442087    484 2000.881 65.00000  2.620808 2.3791923      DM      DM   F 1935.881 1998.260
4452088    484 2003.260 67.37919  5.000000 2.6208077      DM      DM   F 1935.881 1998.260
4462089    484 2005.881 70.00000  7.620808 2.3791923      DM      DM   F 1935.881 1998.260
4472090    484 2008.260 72.37919 10.000000 0.9295003      DM    Dead   F 1935.881 1998.260
448\end{Soutput}
449\end{Schunk}
450A more efficient (and more intuitive) way of making this double split
451is to use the \texttt{splitMulti} function from the \texttt{popEpi}
452package:
453\begin{Schunk}
454\begin{Sinput}
455> library( popEpi )
456> dmM <- splitMulti( dmL, age = seq(0,100,5),
457+                         tfD = c(0,1,2,5,10,20,30,40),
458+                    drop = FALSE )
459> summary( dmS2 )
460\end{Sinput}
461\begin{Soutput}
462Transitions:
463     To
464From    DM Dead  Records:  Events: Risk time:  Persons:
465  DM 40897 2499     43396     2499   54273.27      9996
466\end{Soutput}
467\begin{Sinput}
468> summary( dmM )
469\end{Sinput}
470\begin{Soutput}
471Transitions:
472     To
473From    DM Dead  Records:  Events: Risk time:  Persons:
474  DM 40897 2499     43396     2499   54273.27      9996
475\end{Soutput}
476\end{Schunk}
477Note we used the argument \texttt{drop=FALSE} which will retain
478follow-up also outside the window defined by the breaks. Otherwise the
479default for \texttt{splitMulti} would be to drop follow-up outside
480\texttt{age} [0,100] and \texttt{tfD} [0,40]. This clipping behaviour
481is not available in \texttt{splitLexis}, nevertheless this may be
482exactly what we want in some situations.
483
484So we see that the two ways of splitting data yields the same amount of
485follow-up, but the results are not identical:
486\begin{Schunk}
487\begin{Sinput}
488> identical( dmS2, dmM )
489\end{Sinput}
490\begin{Soutput}
491[1] FALSE
492\end{Soutput}
493\begin{Sinput}
494> class( dmS2 )
495\end{Sinput}
496\begin{Soutput}
497[1] "Lexis"      "data.frame"
498\end{Soutput}
499\begin{Sinput}
500> class( dmM )
501\end{Sinput}
502\begin{Soutput}
503[1] "Lexis"      "data.table" "data.frame"
504\end{Soutput}
505\end{Schunk}
506As we see, this is because the \texttt{dmM} object also is a
507\texttt{data.table} object; the \texttt{splitMulti} uses the
508\texttt{data.table} machinery which makes the splitting substantially
509faster --- this is of particular interest if you operate on large data
510sets ($>100,000$ records).
511
512Thus the recommended way of splitting follow-up time is by
513\texttt{splitMulti}. But you should be aware that the result is a
514\texttt{data.table} object, which in some circumstances behaves
515slightly different from \texttt{data.frame}s. See the manual for
516\texttt{data.table}.
517
518\section{Cutting follow up time at dates of intermediate events}
519
520If we have a recording of the date of a specific event as for example
521recovery or relapse, we may classify follow-up time as being before or
522after this intermediate event, but it requires that follow-up records
523that straddle the event be cut in two and placed in separate records,
524one representing follow-up \emph{before} the intermediate event, and another
525representing follow-up \emph{after} the intermediate event. This is
526achieved with the function \texttt{cutLexis}, which takes three
527arguments: the time point of the intermediate event, the time scale
528that this point refers to, and the value of the (new) state following
529the date. Optionally, we may also define a new time scale with the
530argument \texttt{new.scale=}.
531
532We are interested in the time before and after inception of insulin
533use, which occurs at the date \texttt{doins}:
534\begin{Schunk}
535\begin{Sinput}
536> whc <- c(names(dmL)[1:7],"dodm","doins") # WHich Columns do we want to see?
537> subset( dmL, lex.id %in% wh.id )[,whc]
538\end{Sinput}
539\begin{Soutput}
540            per      age tfD   lex.dur lex.Cst lex.Xst lex.id     dodm    doins
541430048 1998.956 61.87269   0  9.508556      DM    Dead      9 1998.956       NA
54222671  2000.042 52.71184   0  9.954825      DM      DM     27 2000.042       NA
543338459 1998.249 61.85626   0 11.748118      DM      DM     52 1998.249 2004.804
544274124 1998.260 62.37919   0 10.929500      DM    Dead    484 1998.260 2003.960
545\end{Soutput}
546\begin{Sinput}
547> dmC <- cutLexis( data = dmL,
548+                   cut = dmL$doins,
549+             timescale = "per",
550+             new.state = "Ins",
551+             new.scale = "tfI",
552+      precursor.states = "DM" )
553> whc <- c(names(dmL)[1:8],"doins") # WHich Columns do we want to see?
554> subset( dmC, lex.id %in% wh.id )[,whc]
555\end{Sinput}
556\begin{Soutput}
557           per      age      tfD  lex.dur lex.Cst lex.Xst lex.id sex    doins
5589     1998.956 61.87269 0.000000 9.508556      DM    Dead      9   F       NA
55927    2000.042 52.71184 0.000000 9.954825      DM      DM     27   M       NA
56052    1998.249 61.85626 0.000000 6.554415      DM     Ins     52   F 2004.804
56110048 2004.804 68.41068 6.554415 5.193703     Ins     Ins     52   F 2004.804
562484   1998.260 62.37919 0.000000 5.700205      DM     Ins    484   F 2003.960
56310480 2003.960 68.07940 5.700205 5.229295     Ins    Dead    484   F 2003.960
564\end{Soutput}
565\end{Schunk}
566(The \texttt{precursor.states=} argument is explained below).
567
568Note that the process of cutting time is simplified by having all
569types of events referred to the calendar time scale. This is a
570generally applicable advice in handling follow-up data: Get all event
571times as \emph{dates}, location of events and follow-up on other time
572scales can then easily be derived from this.
573
574Note
575that individual 52 has had his follow-up cut at 6.55 years from
576diabetes diagnosis and individual 484 at 5.70 years from diabetes
577diagnosis. This dataset could then be split along the time scales as we
578did before with \texttt{dmL}.
579
580The result of this can also be achieved by cutting the split dataset
581\texttt{dmS2} instead of \texttt{dmL}:
582\begin{Schunk}
583\begin{Sinput}
584> dmS2C <- cutLexis( data = dmS2,
585+                     cut = dmS2$doins,
586+               timescale = "per",
587+               new.state = "Ins",
588+               new.scale = "tfI",
589+        precursor.states = "DM" )
590> subset( dmS2C, lex.id %in% wh.id )[,whc]
591\end{Sinput}
592\begin{Soutput}
593           per      age       tfD   lex.dur lex.Cst lex.Xst lex.id sex    doins
59431    1998.956 61.87269  0.000000 1.0000000      DM      DM      9   F       NA
59532    1999.956 62.87269  1.000000 1.0000000      DM      DM      9   F       NA
59633    2000.956 63.87269  2.000000 1.1273101      DM      DM      9   F       NA
59734    2002.083 65.00000  3.127310 1.8726899      DM      DM      9   F       NA
59835    2003.956 66.87269  5.000000 3.1273101      DM      DM      9   F       NA
59936    2007.083 70.00000  8.127310 1.3812457      DM    Dead      9   F       NA
600111   2000.042 52.71184  0.000000 1.0000000      DM      DM     27   M       NA
601112   2001.042 53.71184  1.000000 1.0000000      DM      DM     27   M       NA
602113   2002.042 54.71184  2.000000 0.2881588      DM      DM     27   M       NA
603114   2002.331 55.00000  2.288159 2.7118412      DM      DM     27   M       NA
604115   2005.042 57.71184  5.000000 2.2881588      DM      DM     27   M       NA
605116   2007.331 60.00000  7.288159 2.6666667      DM      DM     27   M       NA
606229   1998.249 61.85626  0.000000 1.0000000      DM      DM     52   F 2004.804
607230   1999.249 62.85626  1.000000 1.0000000      DM      DM     52   F 2004.804
608231   2000.249 63.85626  2.000000 1.1437372      DM      DM     52   F 2004.804
609232   2001.393 65.00000  3.143737 1.8562628      DM      DM     52   F 2004.804
610233   2003.249 66.85626  5.000000 1.5544148      DM     Ins     52   F 2004.804
61143629 2004.804 68.41068  6.554415 1.5893224     Ins     Ins     52   F 2004.804
61243630 2006.393 70.00000  8.143737 1.8562628     Ins     Ins     52   F 2004.804
61343631 2008.249 71.85626 10.000000 1.7481177     Ins     Ins     52   F 2004.804
6142084  1998.260 62.37919  0.000000 1.0000000      DM      DM    484   F 2003.960
6152085  1999.260 63.37919  1.000000 1.0000000      DM      DM    484   F 2003.960
6162086  2000.260 64.37919  2.000000 0.6208077      DM      DM    484   F 2003.960
6172087  2000.881 65.00000  2.620808 2.3791923      DM      DM    484   F 2003.960
6182088  2003.260 67.37919  5.000000 0.7002053      DM     Ins    484   F 2003.960
61945484 2003.960 68.07940  5.700205 1.9206023     Ins     Ins    484   F 2003.960
62045485 2005.881 70.00000  7.620808 2.3791923     Ins     Ins    484   F 2003.960
62145486 2008.260 72.37919 10.000000 0.9295003     Ins    Dead    484   F 2003.960
622\end{Soutput}
623\end{Schunk}
624Thus it does not matter in which order we use \texttt{splitLexis} and
625\texttt{cutLexis}. Mathematicians would say that \texttt{splitLexis}
626and \texttt{cutLexis} are commutative.
627
628Note in \texttt{lex.id}=484, that follow-up subsequent to the event is
629classified as being in state \texttt{Ins}, but that the final
630transition to state \texttt{Dead} is preserved. This is the point of
631the \texttt{precursor.states=} argument. It names the states (in this
632case \texttt{DM}) that will be over-written by \texttt{new.state} (in
633this case \texttt{Ins}), while the state \texttt{Dead} should not be
634updated even if it is after the time where the persons moves to state
635\texttt{Ins}. In other words, only state \texttt{DM} is a precursor to
636state \texttt{Ins}, state \texttt{Dead} is always subsequent to state
637\texttt{Ins}.
638
639Note that we defined a new time scale, \texttt{tfI}, using the argument
640\texttt{new.scale=tfI}. This has a special status relative to the other
641three time scales, it is defined as time since entry into a state,
642namely \texttt{Ins}, this is noted in the time scale part of the
643summary of \texttt{Lexis} object --- the information sits in the
644attribute \texttt{time.since} of the \texttt{Lexis} object, which can
645be accessed by the function \texttt{timeSince()} or through the \texttt{summary()}:
646\begin{Schunk}
647\begin{Sinput}
648> summary( dmS2C, timeScales=TRUE )
649\end{Sinput}
650\begin{Soutput}
651Transitions:
652     To
653From     DM  Ins Dead  Records:  Events: Risk time:  Persons:
654  DM  35135 1694 2048     38877     3742   45885.49      9899
655  Ins     0 5762  451      6213      451    8387.77      1791
656  Sum 35135 7456 2499     45090     4193   54273.27      9996
657
658Timescales:
659  per   age   tfD   tfI
660   ""    ""    "" "Ins"
661\end{Soutput}
662\end{Schunk}
663Finally we can get a quick overview of the states and transitions by
664using \texttt{boxes} --- \texttt{scale.R} scales transition rates to
665rates per 1000 PY:
666\begin{Schunk}
667\begin{Sinput}
668> boxes( dmC, boxpos=TRUE, scale.R=1000, show.BE=TRUE )
669\end{Sinput}
670\end{Schunk}
671\insfig{box1}{0.8}{States, person years, transitions and rates in the
672  cut dataset. The numbers \emph{in} the boxes are person-years and
673  the number of persons \texttt{B}eginning, resp. \texttt{E}nding
674  their follow-up in each state (triggered by \textrm{\tt
675    show.BE=TRUE}). The numbers at the arrows are the number of
676  transitions and transition rates per 1000 (triggered by \textrm{\tt
677  scale.R=1000}).}
678
679\chapter{Modeling rates from \texttt{Lexis} objects}
680
681\section{Covariates}
682
683In the dataset \texttt{dmS2C} there are three types of covariates that
684can be used to describe mortality rates:
685\begin{enumerate}
686\item time-dependent covariates
687\item time scales
688\item fixed covariates
689\end{enumerate}
690
691There is only one time-dependent covariate here, namely
692\texttt{lex.Cst}, the current state of the person's follow up; it
693takes the values \texttt{DM} and \texttt{Ins} according to whether the
694person has ever purchased insulin at a given time of follow-up.
695
696The time-scales are obvious candidates for explanatory variables for
697the rates, notably age and time from diagnosis (duration of diabetes)
698and insulin.
699
700\subsection{Time scales as covariates}
701
702If we want to model the effect of the time scale variables on
703occurrence rates, we will for each interval use either the value of
704the left endpoint in each interval or the middle. There is a function
705\texttt{timeBand} which returns either of these:
706\begin{Schunk}
707\begin{Sinput}
708> timeBand( dmS2C, "age", "middle" )[1:10]
709\end{Sinput}
710\begin{Soutput}
711 [1] 57.5 57.5 62.5 62.5 62.5 67.5 67.5 62.5 67.5 67.5
712\end{Soutput}
713\begin{Sinput}
714> # For nice printing and column labelling we use the data.frame() function:
715> data.frame( dmS2C[,c("per","age","tfD","lex.dur")],
716+             mid.age=timeBand( dmS2C, "age", "middle" ),
717+               mid.t=timeBand( dmS2C, "tfD", "middle" ),
718+              left.t=timeBand( dmS2C, "tfD", "left"   ),
719+             right.t=timeBand( dmS2C, "tfD", "right"  ),
720+              fact.t=timeBand( dmS2C, "tfD", "factor" ) )[1:15,]
721\end{Sinput}
722\begin{Soutput}
723        per      age        tfD    lex.dur mid.age mid.t left.t right.t  fact.t
7241  1998.917 58.66119  0.0000000 1.00000000    57.5   0.5      0       1   (0,1]
7252  1999.917 59.66119  1.0000000 0.33880903    57.5   1.5      1       2   (1,2]
7263  2000.256 60.00000  1.3388090 0.66119097    62.5   1.5      1       2   (1,2]
7274  2000.917 60.66119  2.0000000 3.00000000    62.5   3.5      2       5   (2,5]
7285  2003.917 63.66119  5.0000000 1.33880903    62.5   7.5      5      10  (5,10]
7296  2005.256 65.00000  6.3388090 3.66119097    67.5   7.5      5      10  (5,10]
7307  2008.917 68.66119 10.0000000 1.08008214    67.5  15.0     10      20 (10,20]
7318  2003.309 64.09035  0.0000000 0.90965092    62.5   0.5      0       1   (0,1]
7329  2004.218 65.00000  0.9096509 0.09034908    67.5   0.5      0       1   (0,1]
73310 2004.309 65.09035  1.0000000 1.00000000    67.5   1.5      1       2   (1,2]
73411 2005.309 66.09035  2.0000000 3.00000000    67.5   3.5      2       5   (2,5]
73512 2008.309 69.09035  5.0000000 0.90965092    67.5   7.5      5      10  (5,10]
73613 2009.218 70.00000  5.9096509 0.77891855    72.5   7.5      5      10  (5,10]
73714 2004.552 86.25051  0.0000000 1.00000000    87.5   0.5      0       1   (0,1]
73815 2005.552 87.25051  1.0000000 1.00000000    87.5   1.5      1       2   (1,2]
739\end{Soutput}
740\end{Schunk}
741Note that the values of these functions are characteristics of the
742intervals defined by \texttt{breaks=}, \emph{not} the midpoints nor
743left or right endpoints of the actual follow-up intervals (which would
744be \texttt{tfD} and \texttt{tfD+lex.dur}, respectively).
745
746These functions are intended for modeling time scale variables as
747factors (categorical variables) in which case the coding must be
748independent of the censoring and mortality pattern --- it should only
749depend on the chosen grouping of the time scale. Modeling time scales as
750\emph{quantitative} should not be based on these codings but directly
751on the values of the time-scale variables, notably the left endpoints
752of the intervals.
753
754\subsection{Differences between time scales}
755
756Apparently, the only fixed variable is \texttt{sex}, but formally the
757dates of birth (\texttt{dobth}), diagnosis (\texttt{dodm}) and first
758insulin purchase (\texttt{doins}) are fixed covariates too. They can
759be constructed as origins of time scales referred to the calendar time
760scale. Likewise, and possibly of greater interest, we can consider
761these origins on the age scale, calculated as the difference between
762age and another time scale.
763
764These would then be age at birth (hardly relevant since it is the same
765for all persons), age at diabetes
766diagnosis and age at insulin treatment.
767
768\subsection{Keeping the relation between time scales}
769
770The midpoint (as well as the right interval endpoint) should be used
771with caution if the variable age at diagnosis \texttt{dodm-dobth} is
772modeled too; the age at diabetes is logically equal to the difference
773between current age (\texttt{age}) and time since diabetes diagnosis
774(\texttt{tfD}):
775\begin{Schunk}
776\begin{Sinput}
777> summary( (dmS2$age-dmS2$tfD) - (dmS2$dodm-dmS2$dobth) )
778\end{Sinput}
779\begin{Soutput}
780   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
781      0       0       0       0       0       0
782\end{Soutput}
783\end{Schunk}
784This calculation refers to the \emph{start} of each interval --- which
785are in the time scale variables in the \texttt{Lexis} object. But when
786using the middle of the intervals, this relationship is not preserved:
787\begin{Schunk}
788\begin{Sinput}
789> summary( timeBand( dmS2, "age", "middle" ) -
790+          timeBand( dmS2, "tfD", "middle" ) - (dmS2$dodm-dmS2$dobth) )
791\end{Sinput}
792\begin{Soutput}
793   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
794-7.4870 -2.0862 -0.3765     Inf  1.3641     Inf
795\end{Soutput}
796\end{Schunk}
797If all three variables are to be included in a model, we must make
798sure that the \emph{substantial} relationship between the variables be
799maintained. One way is to recompute age at diabetes diagnosis from the two
800midpoint variables, but more straightforward would be to use the left
801endpoint of the intervals, that is the time scale variables in the
802\texttt{Lexis} object.
803
804If we dissolve the relationship between the variables \texttt{age},
805\texttt{tfD} and age at diagnosis by grouping we may obtain
806identifiability of the three separate effects, but it will be at the
807price of an arbitrary allocation of a linear trend between them.
808
809For the sake of clarity, consider current age, $a$, duration of
810disease, $d$ and age at diagnosis $e$, where
811\[
812 \text{current age} =
813 \text{age at diagnosis} +
814 \text{disease duration},
815 \quad \text{\ie} \quad a=e+d \quad
816 \Leftrightarrow \quad e+d-a=0
817\]
818If we model the effect of the quantitative variables $a$, $e$ and $d$
819on the log-rates by three functions $f$, $g$ and $h$:
820$ \log(\lambda)=f(a)+g(d)+h(e) $
821then for any $\kappa$:
822\begin{align*}
823   \log(\lambda) & = f(a)+g(d)+h(e)+\kappa(e+d-a)\\
824                 & =
825   \big(f(a)-\kappa a \big)+
826   \big(g(d)+\kappa d \big)+
827   \big(h(e)+\kappa e \big) \\
828& = \tilde f(a)+ \tilde g(d)+ \tilde h(e)
829\end{align*}
830In practical modeling this will turn up as a singular model matrix
831with one parameter aliased, corresponding to some arbitrarily chosen
832value of $\kappa$ (depending on software conventions for singular
833models). This phenomenon is well known from age-period-cohort models.
834
835Thus we see that we can move any slope around between the three terms,
836so if we achieve identifiability by using grouping of one of the
837variables we will in reality have settled for a particular value of
838$\kappa$, without known why we chose just that. The solution is to
839resort to predictions which are independent of the particular
840parametrization or choose a particular parametrization with explicit constraints.
841
842\section{Modeling of rates}
843
844As mentioned, the purpose of subdividing follow-up data in smaller
845intervals is to be able to model effects of time scale variables as
846parametric functions. When we split along a time scale we can get
847intervals that are as small as we want; if they are sufficiently
848small, an assumption of constant rates in each interval becomes
849reasonable.
850
851In a model that assumes a constant occurrence rate in each of the
852intervals the likelihood contribution from each interval is the same
853as the likelihood contribution from a Poisson variate $D$, say, with
854mean $\lambda \ell$ where $\lambda$ is the rate and $\ell$ is the
855interval length, and where the value of the variate $D$ is 1 or 0
856according to whether an event has occurred or not. Moreover, the
857likelihood contributions from all follow-up intervals from a single
858person are \emph{conditionally} independent (conditional on having
859survived till the start of the interval in question). This implies
860that the total contribution to the likelihood from a single person is
861a product of terms, and hence the same as the likelihood of a number
862of independent Poisson terms, one from each interval.
863
864Note that variables are neither Poisson distributed (\eg they can only
865ever assume values 0 or 1) nor independent --- it is only the
866likelihood for the follow-up data that happens to be the same as the
867likelihood from independent Poisson variates. Different models can
868have the same likelihood, a model cannot be inferred from the
869likelihood.
870
871Parametric modeling of the rates is obtained by using the
872\emph{values} of the time scales for each interval as \emph{quantitative}
873explanatory variables, using for example splines. And of course also
874the values of the fixed covariates and the time-dependent variables
875for each interval. Thus the model will be one where the rate is
876assumed constant in each (small) interval, but where a parametric form
877of the \emph{size} of the rate in each interval is imposed by the
878model, using the time scale as a quantitative covariate.
879
880\subsection{Interval length}
881
882In the first chapter we illustrated cutting and splitting by listing
883the results for a few individuals across a number of intervals. For
884illustrational purposes we used 5-year age bands to avoid excessive
885listings, but since the doubling time for mortality on the age scale
886is only slightly larger than 5 years, the assumption about constant
887rates in each interval would be pretty far fetched if we were to use 5
888year intervals.
889
890Thus, for modeling purposes we split the follow-up in 3 month
891intervals. When we use intervals of 3 months length it is superfluous
892to split along multiple time scales --- the precise location of
893tightly spaced splits will be irrelevant from any practical point of
894view. \texttt{splitLexis} and \texttt{splitMulti} will allocate the
895actual split times for all of the time scale variables, so these can be
896used directly in modeling.
897
898So we split the cut dataset in 3 months intervals along the age
899scale:
900\begin{Schunk}
901\begin{Sinput}
902> dmCs <- splitMulti( dmC, age = seq(0,110,1/4) )
903> summary( dmCs, t=T )
904\end{Sinput}
905\begin{Soutput}
906Transitions:
907     To
908From      DM   Ins Dead  Records:  Events: Risk time:  Persons:
909  DM  189669  1694 2048    193411     3742   45885.49      9899
910  Ins      0 34886  451     35337      451    8387.77      1791
911  Sum 189669 36580 2499    228748     4193   54273.27      9996
912
913Timescales:
914  per   age   tfD   tfI
915   ""    ""    "" "Ins"
916\end{Soutput}
917\end{Schunk}
918We see that we now have 228,748 records and 9996 persons, so about 23
919records per person. The total risk time is 54,275 years, a bit less
920than 3 months on average per record as expected.
921
922\subsection{Practicalities for splines}
923
924In this study we want to look at how mortality depend on age
925(\texttt{age}) and time since start of insulin use (\texttt{tfI}). If
926we want to use splines in the description we must allocate knots for
927anchoring the splines at each of the time scales, either by some
928\textit{ad hoc} method or by using some sort of penalized splines as
929for example by \texttt{gam}; the latter will not be treated here; it
930belongs in the realm of the \texttt{mgcv} package.
931
932Here we shall use the former approach and allocate 5 knots on each of
933the time-scales. We allocate knots so that we have the events evenly
934distributed between the knots. Since the insulin state starts at 0 for
935all individuals we include 0 as the first knot, such that any set of natural
936splines with these knots will have the value 0 at 0 on the time
937scale.
938\begin{Schunk}
939\begin{Sinput}
940> ( a.kn <- with( subset( dmCs, lex.Xst=="Dead" ),
941+                 quantile( age+lex.dur, (1:5-0.5)/5 ) ) )
942\end{Sinput}
943\begin{Soutput}
944     10%      30%      50%      70%      90%
94560.29350 71.31937 77.72758 82.72745 89.86393
946\end{Soutput}
947\begin{Sinput}
948> ( i.kn <- c( 0,
949+           with( subset( dmCs, lex.Xst=="Dead" & lex.Cst=="Ins" ),
950+                 quantile( tfI+lex.dur, (1:4)/5 ) ) ) )
951\end{Sinput}
952\begin{Soutput}
953                20%       40%       60%       80%
9540.0000000 0.3093771 1.1307324 2.5489391 4.9117043
955\end{Soutput}
956\end{Schunk}
957In the \texttt{Epi} package there is a convenience wrapper,
958\texttt{Ns}, for the \texttt{n}atural \texttt{s}pline generator
959\texttt{ns}, that takes the smallest and the largest of a set of
960supplied knots to be the boundary knots, so the explicit definition of
961the boundary knots becomes superfluous.
962
963Note that it is a feature of the \texttt{Ns} (via the features of
964\texttt{ns}) that any generated spline function is 0 at the leftmost
965knot.
966
967\subsection{Poisson models}
968
969A model that describes mortality rates as only a function of age would
970then be:
971\begin{Schunk}
972\begin{Sinput}
973> ma <- glm( (lex.Xst=="Dead") ~ Ns(age,knots=a.kn),
974+             family = poisson,
975+             offset = log(lex.dur),
976+               data = dmCs )
977> summary( ma )
978\end{Sinput}
979\begin{Soutput}
980Call:
981glm(formula = (lex.Xst == "Dead") ~ Ns(age, knots = a.kn), family = poisson,
982    data = dmCs, offset = log(lex.dur))
983
984Deviance Residuals:
985    Min       1Q   Median       3Q      Max
986-0.5883  -0.1688  -0.1144  -0.0766   4.5958
987
988Coefficients:
989                       Estimate Std. Error z value Pr(>|z|)
990(Intercept)            -3.82830    0.03861  -99.16   <2e-16
991Ns(age, knots = a.kn)1  1.36254    0.08723   15.62   <2e-16
992Ns(age, knots = a.kn)2  1.49446    0.06845   21.83   <2e-16
993Ns(age, knots = a.kn)3  2.63557    0.07058   37.34   <2e-16
994Ns(age, knots = a.kn)4  1.94173    0.05769   33.66   <2e-16
995
996(Dispersion parameter for poisson family taken to be 1)
997
998    Null deviance: 27719  on 228747  degrees of freedom
999Residual deviance: 25423  on 228743  degrees of freedom
1000AIC: 30431
1001
1002Number of Fisher Scoring iterations: 8
1003\end{Soutput}
1004\end{Schunk}
1005The offset, \texttt{log(lex.dur)} comes from the fact that the
1006likelihood for the follow-up data during $\ell$ time is the same as
1007that for independent Poisson variates with mean $\lambda \ell$, and
1008that the default link function for the Poisson family is the log, so
1009that we are using a linear model for the log-mean,
1010$\log(\lambda) + \log(\ell)$.  But when we want a model for the
1011log-rate ($\log(\lambda)$), the term $\log(\ell)$ must still be
1012included as a covariate, but with regression coefficient fixed to 1; a
1013so-called \emph{offset}. This is however a technicality; it just
1014exploits that the likelihood of a particular Poisson model and that of
1015the rates model is the same.
1016
1017In the \texttt{Epi} package is a \texttt{glm} family, \texttt{poisreg}
1018that has a more intuitive interface, where the response is a 2-column
1019matrix of events and person-time, respectively. This is in concert
1020with the fact that the outcome variable in follow-up studies is
1021bivariate: (event, risk time).
1022\begin{Schunk}
1023\begin{Sinput}
1024> Ma <- glm( cbind(lex.Xst=="Dead",lex.dur) ~ Ns(age,knots=a.kn),
1025+            family = poisreg, data = dmCs )
1026> summary( Ma )
1027\end{Sinput}
1028\begin{Soutput}
1029Call:
1030glm(formula = cbind(lex.Xst == "Dead", lex.dur) ~ Ns(age, knots = a.kn),
1031    family = poisreg, data = dmCs)
1032
1033Deviance Residuals:
1034    Min       1Q   Median       3Q      Max
1035-0.5883  -0.1688  -0.1144  -0.0766   4.5958
1036
1037Coefficients:
1038                       Estimate Std. Error z value Pr(>|z|)
1039(Intercept)            -3.82830    0.03861  -99.15   <2e-16
1040Ns(age, knots = a.kn)1  1.36254    0.08723   15.62   <2e-16
1041Ns(age, knots = a.kn)2  1.49446    0.06845   21.83   <2e-16
1042Ns(age, knots = a.kn)3  2.63557    0.07058   37.34   <2e-16
1043Ns(age, knots = a.kn)4  1.94173    0.05769   33.66   <2e-16
1044
1045(Dispersion parameter for poisson family taken to be 1)
1046
1047    Null deviance: 27719  on 228747  degrees of freedom
1048Residual deviance: 25423  on 228743  degrees of freedom
1049AIC: 30431
1050
1051Number of Fisher Scoring iterations: 7
1052\end{Soutput}
1053\end{Schunk}
1054Exploiting the multistate structure in the \texttt{Lexis} object
1055there is a multistate convenience wrapper for \texttt{glm} with the
1056\texttt{poisreg} family, that just requires specification of the
1057transitions in terms of \texttt{from} and \texttt{to}. Although it is
1058called \texttt{glm.Lexis} it is \emph{not} an S3 method for
1059\texttt{Lexis} objects:
1060\begin{Schunk}
1061\begin{Sinput}
1062> Xa <- glm.Lexis( dmCs, from="DM", to="Dead",
1063+                  formula = ~ Ns(age,knots=a.kn) )
1064\end{Sinput}
1065\begin{Soutput}
1066stats::glm Poisson analysis of Lexis object dmCs with log link:
1067Rates for the transition: DM->Dead
1068\end{Soutput}
1069\end{Schunk}
1070The result is a \texttt{glm} object but with an extra attribute, \texttt{Lexis}:
1071\begin{Schunk}
1072\begin{Sinput}
1073> attr( Xa, "Lexis" )
1074\end{Sinput}
1075\begin{Soutput}
1076$data
1077[1] "dmCs"
1078
1079$trans
1080[1] "DM->Dead"
1081
1082$formula
1083~Ns(age, knots = a.kn)
1084<environment: 0xb8adec8>
1085
1086$scale
1087[1] 1
1088\end{Soutput}
1089\end{Schunk}
1090There are similar wrappers for \texttt{gam} and \texttt{coxph} models,
1091\texttt{gam.Lexis} and \texttt{coxph.Lexis}, but these will not be
1092elaborated in detail.
1093
1094The \texttt{from=} and \texttt{to=} can even be omitted, in which case
1095all possible transitions \emph{into} any of the absorbing states is
1096modeled:
1097\begin{Schunk}
1098\begin{Sinput}
1099> xa <- glm.Lexis( dmCs, formula = ~ Ns(age,knots=a.kn) )
1100\end{Sinput}
1101\begin{Soutput}
1102stats::glm Poisson analysis of Lexis object dmCs with log link:
1103Rates for transitions: DM->Dead, Ins->Dead
1104\end{Soutput}
1105\end{Schunk}
1106We can check if the four models fitted are the same:
1107\begin{Schunk}
1108\begin{Sinput}
1109> c( deviance(ma), deviance(Ma), deviance(Xa), deviance(xa) )
1110\end{Sinput}
1111\begin{Soutput}
1112[1] 25422.92 25422.92 20902.31 25422.92
1113\end{Soutput}
1114\end{Schunk}
1115Oops! the model \texttt{Xa} is apparently not the same as the other
1116three? This is because the explicit specification
1117\verb|from="DM", to="Dead"|, omits modeling contributions from the
1118$\mathtt{Ins}\rightarrow\mathtt{Dead}$ transition --- the output
1119actually said so --- see also figure \ref{fig:box1} on
1120p. \pageref{fig:box1}. The other three models all use both transitions
1121--- and assume that the two transition rates are the same, \ie that
1122start of insulin has no effect on mortality. We shall relax this
1123assumption later.
1124
1125The parameters from the model do not have any direct interpretation
1126\textit{per se}, but we can compute the estimated mortality rates for
1127a range of ages using \texttt{ci.pred} with a suitably defined
1128prediction data frame.
1129
1130Note that if we use the \texttt{poisson} family of models, we must
1131specify all covariates in the model, including the variable in the
1132offset, \texttt{lex.dur} (remember that this was a covariate with
1133coefficient fixed at 1). We set the latter to 1000, because we want the
1134mortality rates per 1000 person-years. Using the \texttt{poisreg}
1135family, the prediction will ignore any value of \texttt{lex.dur}
1136specified in the prediction data frame, the returned rates will be per
1137unit in which \texttt{lex.dur} is recorded.
1138\begin{Schunk}
1139\begin{Sinput}
1140> nd <- data.frame( age=40:85, lex.dur=1000 )
1141> pr.0 <- ci.pred( ma, newdata = nd )      # mortality per 100 PY
1142> pr.a <- ci.pred( Ma, newdata = nd )*1000 # mortality per 100 PY
1143> summary(pr.0/pr.a)
1144\end{Sinput}
1145\begin{Soutput}
1146    Estimate      2.5%       97.5%
1147 Min.   :1   Min.   :1   Min.   :1
1148 1st Qu.:1   1st Qu.:1   1st Qu.:1
1149 Median :1   Median :1   Median :1
1150 Mean   :1   Mean   :1   Mean   :1
1151 3rd Qu.:1   3rd Qu.:1   3rd Qu.:1
1152 Max.   :1   Max.   :1   Max.   :1
1153\end{Soutput}
1154\begin{Sinput}
1155> matshade( nd$age, pr.a, plot=TRUE,
1156+           type="l", lty=1,
1157+           log="y", xlab="Age (years)",
1158+           ylab="DM mortality per 1000 PY")
1159\end{Sinput}
1160\end{Schunk}
1161\insfig{pr-a}{0.8}{Mortality among Danish diabetes patients by age
1162  with 95\% CI as shaded area. We see that the rates increase linearly
1163  on the log-scale, that is exponentially by age.}
1164
1165\section{Time dependent covariate}
1166
1167A Poisson model approach to mortality by insulin status, would be to
1168assume that the rate-ratio between patients on insulin and not on
1169insulin is a fixed quantity, independent of time since start of insulin,
1170independent of age. This is commonly termed a proportional hazards
1171assumption, because the rates (hazards) in the two groups are
1172proportional along the age (baseline time) scale.
1173\begin{Schunk}
1174\begin{Sinput}
1175> pm <- glm( cbind(lex.Xst=="Dead",lex.dur) ~ Ns(age,knots=a.kn)
1176+                                           + lex.Cst + sex,
1177+            family=poisreg, data = dmCs )
1178> round( ci.exp( pm ), 3 )
1179\end{Sinput}
1180\begin{Soutput}
1181                       exp(Est.)   2.5%  97.5%
1182(Intercept)                0.022  0.021  0.024
1183Ns(age, knots = a.kn)1     4.248  3.581  5.040
1184Ns(age, knots = a.kn)2     5.008  4.376  5.731
1185Ns(age, knots = a.kn)3    16.832 14.624 19.373
1186Ns(age, knots = a.kn)4     7.994  7.126  8.968
1187lex.CstIns                 1.985  1.791  2.200
1188sexF                       0.668  0.617  0.724
1189\end{Soutput}
1190\end{Schunk}
1191So we see that persons on insulin have about twice the mortality of
1192persons not on insulin and that women have 2/3 the mortality of men.
1193
1194\subsection{Time since insulin start}
1195
1196If we want to test whether the excess mortality depends on the time
1197since start if insulin treatment, we can add a spline terms in
1198\texttt{tfI}. But since \texttt{tfI} is a time scale defined as time
1199since entry into a new state (\texttt{Ins}), the variable \texttt{tfI}
1200will be missing for those in the \texttt{DM} state, so before modeling
1201we must set the \texttt{NA}s to 0, which we do with \texttt{tsNA20}
1202(acronym for \texttt{t}ime\texttt{s}cale \texttt{NA}s to zero):
1203\begin{Schunk}
1204\begin{Sinput}
1205> pm <- glm( cbind(lex.Xst=="Dead",lex.dur) ~ Ns(age,knots=a.kn)
1206+                                           + Ns(tfI,knots=i.kn)
1207+                                           + lex.Cst + sex,
1208+            family=poisreg, data = tsNA20(dmCs) )
1209\end{Sinput}
1210\end{Schunk}
1211As noted before we could do this simpler with \texttt{glm.Lexis}, even
1212without the \texttt{from=} and \texttt{to=} arguments, because we are
1213modeling all transitions \emph{into} the absorbing state
1214(\texttt{Dead}):
1215\begin{Schunk}
1216\begin{Sinput}
1217> Pm <- glm.Lexis( tsNA20(dmCs),
1218+                  form = ~ Ns(age,knots=a.kn)
1219+                         + Ns(tfI,knots=i.kn)
1220+                         + lex.Cst + sex )
1221\end{Sinput}
1222\begin{Soutput}
1223stats::glm Poisson analysis of Lexis object tsNA20(dmCs) with log link:
1224Rates for transitions: DM->Dead, Ins->Dead
1225\end{Soutput}
1226\begin{Sinput}
1227> c( deviance(Pm), deviance(pm) )
1228\end{Sinput}
1229\begin{Soutput}
1230[1] 25096.33 25096.33
1231\end{Soutput}
1232\begin{Sinput}
1233> identical( model.matrix(Pm), model.matrix(pm) )
1234\end{Sinput}
1235\begin{Soutput}
1236[1] TRUE
1237\end{Soutput}
1238\end{Schunk}
1239The coding of the effect of \texttt{tfI} is so that the value is 0 at
12400, so the meaning of the estimate of the effect of \texttt{lex.Cst} is
1241the RR between persons with and without insulin, immediately after
1242start of insulin:
1243\begin{Schunk}
1244\begin{Sinput}
1245> round( ci.exp( Pm, subset="ex" ), 3 )
1246\end{Sinput}
1247\begin{Soutput}
1248           exp(Est.)  2.5% 97.5%
1249lex.CstIns     5.632 4.430  7.16
1250sexF           0.674 0.622  0.73
1251\end{Soutput}
1252\end{Schunk}
1253We see that the effect of sex is pretty much the same as before, but
1254the effect of \texttt{lex.Cst} is much larger, it now refers to a
1255different quantity, namely the RR at \texttt{tfI}=0. If we want to see
1256the effect of time since insulin, it is best viewed jointly with the
1257effect of age:
1258\begin{Schunk}
1259\begin{Sinput}
1260> ndI <- data.frame( expand.grid( tfI=c(NA,seq(0,15,0.1)),
1261+                                 ai=seq(40,80,10) ),
1262+                    sex="M",
1263+                    lex.Cst="Ins" )
1264> ndI <- transform( ndI, age=ai+tfI )
1265> head( ndI )
1266\end{Sinput}
1267\begin{Soutput}
1268  tfI ai sex lex.Cst  age
12691  NA 40   M     Ins   NA
12702 0.0 40   M     Ins 40.0
12713 0.1 40   M     Ins 40.1
12724 0.2 40   M     Ins 40.2
12735 0.3 40   M     Ins 40.3
12746 0.4 40   M     Ins 40.4
1275\end{Soutput}
1276\begin{Sinput}
1277> ndA <- data.frame( age= seq(40,100,0.1), tfI=0,  lex.Cst="DM", sex="M" )
1278> pri <- ci.pred( Pm, ndI ) * 1000
1279> pra <- ci.pred( Pm, ndA ) * 1000
1280> matshade( ndI$age, pri, plot=TRUE, las=1,
1281+           xlab="Age (years)", ylab="DM mortality per 1000 PY",
1282+           log="y", lty=1, col="blue" )
1283> matshade( ndA$age, pra )
1284\end{Sinput}
1285\end{Schunk}
1286\insfig{ins-time}{0.8}{Mortality rates of persons on insulin, starting
1287insulin at ages 40,50,\ldots,80 (blue), compared with persons not on
1288insulin (black curve). Shaded areas are 95\% CI.}
1289
1290In figure \ref{fig:ins-time}, p. \pageref{fig:ins-time}, we see that
1291mortality is high just after insulin start, but falls by almost a
1292factor 3 during the first year. Also we see that there is a tendency
1293that mortality in a given age is smallest for those with the longest
1294duration of insulin use.
1295
1296\section{The Cox model}
1297
1298Note that in the Cox-model the age is used as response variable,
1299slightly counter-intuitive. Hence the age part of the linear predictors
1300is not in that model:
1301\begin{Schunk}
1302\begin{Sinput}
1303> library( survival )
1304> cm <- coxph( Surv(age,age+lex.dur,lex.Xst=="Dead") ~
1305+              Ns(tfI,knots=i.kn) + lex.Cst + sex,
1306+              data = tsNA20(dmCs) )
1307\end{Sinput}
1308\end{Schunk}
1309There is also a multistate wrapper for Cox models, requiring a
1310l.h.s. side for the \texttt{formula=} argument:
1311\begin{Schunk}
1312\begin{Sinput}
1313> Cm <- coxph.Lexis( tsNA20(dmCs),
1314+                    form= age ~ Ns(tfI,knots=i.kn) + lex.Cst + sex )
1315\end{Sinput}
1316\begin{Soutput}
1317survival::coxph analysis of Lexis object tsNA20(dmCs):
1318Rates for transitions DM->Dead, Ins->Dead
1319Baseline timescale: age
1320\end{Soutput}
1321\begin{Sinput}
1322> cbind( ci.exp( cm ), ci.exp( Cm ) )
1323\end{Sinput}
1324\begin{Soutput}
1325                       exp(Est.)       2.5%     97.5% exp(Est.)       2.5%     97.5%
1326Ns(tfI, knots = i.kn)1 0.2984062 0.19417148 0.4585960 0.2984062 0.19417148 0.4585960
1327Ns(tfI, knots = i.kn)2 0.3871151 0.29011380 0.5165495 0.3871151 0.29011380 0.5165495
1328Ns(tfI, knots = i.kn)3 0.1239128 0.06287008 0.2442238 0.1239128 0.06287008 0.2442238
1329Ns(tfI, knots = i.kn)4 0.4405121 0.34839015 0.5569932 0.4405121 0.34839015 0.5569932
1330lex.CstIns             5.6700284 4.45011220 7.2243623 5.6700284 4.45011220 7.2243623
1331lex.CstDead            1.0000000 1.00000000 1.0000000 1.0000000 1.00000000 1.0000000
1332sexF                   0.6753202 0.62316569 0.7318397 0.6753202 0.62316569 0.7318397
1333\end{Soutput}
1334\end{Schunk}
1335We can compare the estimates from the Cox model with those from the
1336Poisson model --- we must add \texttt{NA}s because the Cox-model does
1337not give the parameters for the baseline time scale (\texttt{age}), but
1338also remove one of the parameters, because \texttt{coxph} parametrizes
1339factors (in this case \texttt{lex.Cst}) by all defined levels and not
1340only by the levels present in the dataset at hand (note the line of
1341\texttt{1.0000000}s in the print above):
1342\begin{Schunk}
1343\begin{Sinput}
1344> round( cbind( ci.exp( Pm ),
1345+        rbind( matrix(NA,5,3),
1346+               ci.exp( cm )[-6,] ) ), 3 )
1347\end{Sinput}
1348\begin{Soutput}
1349                       exp(Est.)   2.5%  97.5% exp(Est.)  2.5% 97.5%
1350(Intercept)                0.022  0.021  0.024        NA    NA    NA
1351Ns(age, knots = a.kn)1     4.208  3.546  4.993        NA    NA    NA
1352Ns(age, knots = a.kn)2     5.012  4.380  5.736        NA    NA    NA
1353Ns(age, knots = a.kn)3    16.560 14.386 19.063        NA    NA    NA
1354Ns(age, knots = a.kn)4     7.921  7.061  8.885        NA    NA    NA
1355Ns(tfI, knots = i.kn)1     0.298  0.194  0.458     0.298 0.194 0.459
1356Ns(tfI, knots = i.kn)2     0.385  0.289  0.514     0.387 0.290 0.517
1357Ns(tfI, knots = i.kn)3     0.125  0.064  0.246     0.124 0.063 0.244
1358Ns(tfI, knots = i.kn)4     0.438  0.346  0.553     0.441 0.348 0.557
1359lex.CstIns                 5.632  4.430  7.160     5.670 4.450 7.224
1360sexF                       0.674  0.622  0.730     0.675 0.623 0.732
1361\end{Soutput}
1362\end{Schunk}
1363Thus we see that the Poisson and Cox gives pretty much the same
1364results. You may argue that Cox requires a smaller dataset, because
1365there is no need to subdivide data in small intervals \emph{before}
1366insulin use. But certainly the time \emph{after} insulin inception need
1367to be if the effect of this time should be modeled.
1368
1369The drawback of the Cox-modeling is that it is not possible to show
1370the absolute rates as we did in figure \ref{fig:ins-time} on
1371\pageref{fig:ins-time}.
1372
1373\section{Marginal effect of time since insulin}
1374
1375When we plot the marginal effect of \texttt{tfI} from the two models
1376we get pretty much the same; here we plot the RR relative to
1377\texttt{tfI}=2 years. Note that we are deriving the RR as the ratio of
1378two sets of predictions, from the data frames \texttt{nd} and
1379\texttt{nr} --- for further details consult the help page for
1380\texttt{ci.lin}, specifically the use of a list as the
1381\texttt{ctr.mat} argument:
1382\begin{Schunk}
1383\begin{Sinput}
1384> nd <- data.frame( tfI=seq(0,15,,151), lex.Cst="Ins", sex="M" )
1385> nr <- data.frame( tfI=    2         , lex.Cst="Ins", sex="M" )
1386> ppr <- ci.exp( pm, list(nd,nr), xvars="age" )
1387> cpr <- ci.exp( cm, list(nd,nr) )
1388> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" )
1389> matshade( nd$tfI, cbind(ppr,cpr), plot=T,
1390+           lty=c(1,2), log="y",
1391+           xlab="Time since insulin (years)", ylab="Rate ratio")
1392> abline( h=1, lty=3 )
1393\end{Sinput}
1394\end{Schunk}
1395\insfig{Ieff}{0.8}{The naked duration effects relative to 2 years of
1396  duration, black from Poisson model, red from Cox model. The two sets
1397  of estimates are identical, and so are the standard errors, so the
1398  two shaded areas overlap almost perfectly.}
1399
1400In figure \ref{fig:Ieff}, p. \pageref{fig:Ieff}, we see that the
1401duration effect is exactly the same from the two modeling approaches.
1402
1403We will also want the RR relative to the non-insulin users --- recall these
1404are coded 0 on the \texttt{tfI} variable:
1405\begin{Schunk}
1406\begin{Sinput}
1407> nd <- data.frame( tfI=seq(0,15,,151), lex.Cst="Ins", sex="M" )
1408> nr <- data.frame( tfI=    0         , lex.Cst="DM" , sex="M" )
1409> ppr <- ci.exp( pm, list(nd,nr), xvars="age" )
1410> cpr <- ci.exp( cm, list(nd,nr) )
1411> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" )
1412> matshade( nd$tfI, cbind(ppr,cpr),
1413+           xlab="Time since insulin (years)",
1414+           ylab="Rate ratio relative to non-Insulin",
1415+           lty=c(1,2), log="y", plot=T )
1416\end{Sinput}
1417\end{Schunk}
1418\insfig{IeffR}{0.8}{Insulin duration effect (state \textrm{\tt Ins})
1419  relative to no insulin (state \textrm{\tt DM}), black from Poisson
1420  model, red from Cox model. The \emph{shape} is the same as the
1421  previous figure, but the RR is now relative to non-insulin, instead
1422  of relative to insulin users at 2 years duration. The two sets of
1423  estimates are identical, and so are the standard errors, so the two
1424  shaded areas overlap almost perfectly.}
1425
1426In figure \ref{fig:IeffR}, p. \pageref{fig:IeffR}, we see the effect
1427of increasing duration of insulin use \emph{for a fixed age} which is
1428a bit artificial, so we would like to see the \emph{joint} effects of
1429age and insulin duration. What we cannot see is how the duration
1430affects mortality relative to \texttt{current} age (at the age
1431attained at the same time as the attained \texttt{tfI}).
1432
1433Another way of interpreting this curve is as the rate ratio relative to a
1434person not on insulin, so we see that the RR (or hazard ratio, HR as
1435some call it) is over 5 at the start of insulin (the \texttt{lex.Cst}
1436estimate), and decreases to about 1.5 in the long term.
1437
1438Both figure \ref{fig:Ieff} and \ref{fig:IeffR} indicate a declining
1439RR by insulin duration, but only from figure \ref{fig:ins-time} it is
1440visible that mortality actually is \emph{in}creasing by age after some
14412 years after insulin start. This point would not be available if we
1442had only fitted a Cox model where we did not have access to the
1443baseline hazard as a function of age.
1444
1445\section{Age$\times$duration interaction}
1446
1447The model we fitted assumes that the RR is the same regardless of the
1448age at start of insulin --- the hazards are multiplicative. Sometimes
1449this is termed the proportional hazards assumption: For \emph{any}
1450fixed age the HR is the same as a function of time since insulin, and
1451vice versa.
1452
1453A more correct term would be ``main effects model'' --- there is no
1454interaction between age (the baseline time scale) and other
1455covariates. So there is really no need for the term ``proportional
1456hazards''; well defined and precise statistical terms for it has
1457existed for aeons.
1458
1459\subsection{Age at insulin start}
1460
1461In order to check the consistency of the multiplicativity assumption
1462across the spectrum of age at insulin inception, we can fit an
1463interaction model. One approach to this would be using a non-linear
1464effect of age at insulin use (for convenience we use the same knots as
1465for age) --- note that the prediction data frames are the same as we
1466used above, because we do not compute age at insulin use as a separate
1467variable, but rather enter it as the difference between current age
1468(\texttt{age}) and insulin duration (\texttt{tfI}).
1469
1470At first glance we might think of doing:
1471\begin{Schunk}
1472\begin{Sinput}
1473> imx <- glm.Lexis( tsNA20(dmCs),
1474+                  formula = ~ Ns(age    ,knots=a.kn)
1475+                            + Ns(    tfI,knots=i.kn)
1476+                            + Ns(age-tfI,knots=a.kn)
1477+                            + lex.Cst + sex )
1478\end{Sinput}
1479\begin{Soutput}
1480stats::glm Poisson analysis of Lexis object tsNA20(dmCs) with log link:
1481Rates for transitions: DM->Dead, Ins->Dead
1482\end{Soutput}
1483\end{Schunk}
1484But this will fit a model with a rate-ratio between persons with and
1485without insulin that depends only on age at insulin start for the time
1486\emph{after} insulin start, the RR at \texttt{tfI}=0 will be the same
1487at any age, which really is not the type of interaction we wanted.
1488
1489We want the \texttt{age-tfI} term to be specific for the insulin
1490exposed so we may use one of two other approaches, that are
1491conceptually alike but mathematically different:
1492\begin{Schunk}
1493\begin{Sinput}
1494> Im <- glm.Lexis( tsNA20(dmCs),
1495+                  formula = ~ Ns(age    ,knots=a.kn)
1496+                            + Ns(    tfI,knots=i.kn)
1497+                            + Ns((age-tfI)*(lex.Cst=="Ins"),knots=a.kn)
1498+                            + lex.Cst + sex )
1499\end{Sinput}
1500\begin{Soutput}
1501stats::glm Poisson analysis of Lexis object tsNA20(dmCs) with log link:
1502Rates for transitions: DM->Dead, Ins->Dead
1503\end{Soutput}
1504\begin{Sinput}
1505> im <- glm.Lexis( tsNA20(dmCs),
1506+                  formula = ~ Ns(age    ,knots=a.kn)
1507+                            + Ns(    tfI,knots=i.kn)
1508+                            + lex.Cst:Ns(age-tfI,knots=a.kn)
1509+                            + lex.Cst + sex )
1510\end{Sinput}
1511\begin{Soutput}
1512stats::glm Poisson analysis of Lexis object tsNA20(dmCs) with log link:
1513Rates for transitions: DM->Dead, Ins->Dead
1514\end{Soutput}
1515\end{Schunk}
1516The first model (\texttt{Im}) has a common age-effect (\texttt{Ns(age,...}) for
1517persons with and without diabetes and a RR depending on insulin
1518duration \texttt{tfI} and age at insulin (\texttt{age-tfI}). Since the
1519linear effect of these two terms are in the model as well, a linear
1520trend in the RR by current age (\texttt{age}) is accommodated as well.
1521
1522The second model allows age-effects that differ non-linearly between
1523person with and without insulin, because the interaction term
1524\texttt{lex.Cst:Ns(age-tfI...} for persons not on insulin is merely an
1525age term (since \texttt{tfI} is coded 0 for all follow-up not on
1526insulin).
1527
1528We can compare the models fitted:
1529\begin{Schunk}
1530\begin{Sinput}
1531> anova( imx, Im, im, test='Chisq')
1532\end{Sinput}
1533\begin{Soutput}
1534Analysis of Deviance Table
1535
1536Model 1: cbind(trt(Lx$lex.Cst, Lx$lex.Xst) %in% trnam, Lx$lex.dur) ~ Ns(age,
1537    knots = a.kn) + Ns(tfI, knots = i.kn) + Ns(age - tfI, knots = a.kn) +
1538    lex.Cst + sex
1539Model 2: cbind(trt(Lx$lex.Cst, Lx$lex.Xst) %in% trnam, Lx$lex.dur) ~ Ns(age,
1540    knots = a.kn) + Ns(tfI, knots = i.kn) + Ns((age - tfI) *
1541    (lex.Cst == "Ins"), knots = a.kn) + lex.Cst + sex
1542Model 3: cbind(trt(Lx$lex.Cst, Lx$lex.Xst) %in% trnam, Lx$lex.dur) ~ Ns(age,
1543    knots = a.kn) + Ns(tfI, knots = i.kn) + lex.Cst:Ns(age -
1544    tfI, knots = a.kn) + lex.Cst + sex
1545  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
15461    228734      25096
15472    228733      25087  1   8.9631 0.002755
15483    228730      25082  3   4.6804 0.196749
1549\end{Soutput}
1550\end{Schunk}
1551so we see that the models indeed are different, and moreover that the
1552last model does not provide substantial further improvement, by
1553allowing non-linear RR along the age-scale.
1554
1555We can illustrate the different estimated rates from the three models
1556in figure \ref{fig:dur-int}, p. \pageref{fig:dur-int}:
1557\begin{Schunk}
1558\begin{Sinput}
1559> pxi <- ci.pred( imx, ndI )
1560> pxa <- ci.pred( imx, ndA )
1561> pIi <- ci.pred( Im , ndI )
1562> pIa <- ci.pred( Im , ndA )
1563> pii <- ci.pred( im , ndI )
1564> pia <- ci.pred( im , ndA )
1565> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" )
1566> matshade( ndI$age, cbind( pxi, pIi, pii)*1000, plot=T, log="y",
1567+           xlab="Age", ylab="Mortality per 1000 PY",
1568+           lty=1, lwd=2, col=c("blue","forestgreen","red"), alpha=0.1 )
1569> matshade( ndA$age, cbind( pxa, pIa, pia)*1000,
1570+           lty=1, lwd=2, col=c("blue","forestgreen","red"), alpha=0.1 )
1571\end{Sinput}
1572\end{Schunk}
1573\insfig{dur-int}{0.8}{Age at insulin as interaction between age and
1574  duration. Blue curves are from the naive interaction model
1575  \textrm{\tt imx} with identical $\RR$ at \textrm{\tt tfI}=0 at any
1576  age; green curves are from the interaction model with age at
1577  insulin, from the model \textrm{\tt Im} with only linear
1578  differences by age, and red lines from the full interaction model
1579  \textrm{\tt im}.}
1580
1581We can also plot the RRs only from these models (figure
1582\ref{fig:dur-int-RR}, p. \pageref{fig:dur-int-RR}); for this we need
1583the reference frames, and the machinery from \texttt{ci.exp} allowing
1584a list of two data frames:
1585\begin{Schunk}
1586\begin{Sinput}
1587> ndR <- transform( ndI, tfI=0, lex.Cst="DM" )
1588> cbind( head(ndI), head(ndR) )
1589\end{Sinput}
1590\begin{Soutput}
1591  tfI ai sex lex.Cst  age tfI ai sex lex.Cst  age
15921  NA 40   M     Ins   NA   0 40   M      DM   NA
15932 0.0 40   M     Ins 40.0   0 40   M      DM 40.0
15943 0.1 40   M     Ins 40.1   0 40   M      DM 40.1
15954 0.2 40   M     Ins 40.2   0 40   M      DM 40.2
15965 0.3 40   M     Ins 40.3   0 40   M      DM 40.3
15976 0.4 40   M     Ins 40.4   0 40   M      DM 40.4
1598\end{Soutput}
1599\begin{Sinput}
1600> Rxi <- ci.exp( imx, list(ndI,ndR) )
1601> Rii <- ci.exp( im , list(ndI,ndR) )
1602> RIi <- ci.exp( Im , list(ndI,ndR) )
1603> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" )
1604> matshade( ndI$age, cbind( Rxi, RIi, Rii), plot=T, log="y",
1605+           xlab="Age (years)", ylab="Rate ratio vs, non-Insulin",
1606+           lty=1, lwd=2, col=c("blue","forestgreen","red"), alpha=0.1 )
1607> abline( h=1 )
1608> abline( h=ci.exp(imx,subset="lex.Cst")[,1], lty="25", col="blue" )
1609\end{Sinput}
1610\end{Schunk}
1611\insfig{dur-int-RR}{0.9}{RR from three different interaction
1612  models. The horizontal dotted line is at the estimated effect of
1613  \textrm{\tt lex.Cst}, to illustrate that the first model (blue)
1614  constrains this initial HR to be constant across age. The green
1615  curves are the extended interaction model, and the red the full
1616  one.}
1617
1618\clearpage
1619
1620\subsection{General interaction}
1621
1622As a final illustration we may want to explore a different kind of
1623interaction, not defined from the duration --- here we simplify the
1624interaction by not using the second-last knot in the interaction terms
1625--- figure \ref{fig:splint}, p. \pageref{fig:splint}. Note again that
1626the prediction code is the same:
1627\begin{Schunk}
1628\begin{Sinput}
1629> gm <- glm.Lexis( tsNA20(dmCs),
1630+                  formula = ~ Ns(age,knots=a.kn)
1631+                            + Ns(tfI,knots=i.kn)
1632+                            + lex.Cst:Ns(age,knots=a.kn):Ns(tfI,knots=i.kn)
1633+                            + lex.Cst + sex )
1634\end{Sinput}
1635\begin{Soutput}
1636stats::glm Poisson analysis of Lexis object tsNA20(dmCs) with log link:
1637Rates for transitions: DM->Dead, Ins->Dead
1638\end{Soutput}
1639\begin{Sinput}
1640> pgi <- ci.pred( gm, ndI )
1641> pga <- ci.pred( gm, ndA )
1642> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" )
1643> matshade( ndI$age, cbind( pgi, pii )*1000,  plot=T,
1644+           lty=c("solid","21"), lend="butt", lwd=2, log="y",
1645+           xlab="Age (years)", ylab="Mortality rates per 1000 PY",
1646+           alpha=c(0.2,0.1), col=c("black","red") )
1647> matshade( ndA$age, cbind( pga, pia )*1000,
1648+           lty=c("solid","21"), lend="butt", lwd=2,
1649+           alpha=c(0.2,0.1), col=c("black","red") )
1650\end{Sinput}
1651\end{Schunk}
1652\insfig{splint}{0.8}{Spline-by-spline interaction between age and
1653  duration (model \textrm{\tt gm}, black), and the interaction using a
1654  non-linear effect of age at entry (model \textrm{\tt im}, red),
1655  corresponding to the red curves in figure \ref{fig:dur-int}.}
1656This is in figure \ref{fig:splint}, p. \pageref{fig:splint}.
1657
1658\subsection{Evaluating interactions}
1659
1660Here we see that the interaction effect is such that in the older ages
1661the length of insulin use has an increasing effect on mortality.
1662
1663Even though there is no statistically significant interaction between
1664age and time since start of insulin, it would be illustrative to show
1665the RR as a function of age at insulin and age at follow-up:
1666\begin{Schunk}
1667\begin{Sinput}
1668> ndR <- transform( ndI, lex.Cst="DM", tfI=0 )
1669> iRR <- ci.exp( im, ctr.mat=list(ndI,ndR) )
1670> gRR <- ci.exp( gm, ctr.mat=list(ndI,ndR) )
1671> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" )
1672> matshade( ndI$age, cbind(gRR,iRR), lty=1, log="y", plot=TRUE,
1673+           xlab="Age (years)", ylab="Rate ratio: Ins vs. non-Ins",
1674+           col=c("black","red") )
1675> abline( h=1 )
1676\end{Sinput}
1677\end{Schunk}
1678\insfig{RR-int}{0.8}{The effect of duration of insulin use at
1679  different ages of follow-up (and age at insulin start). Estimates
1680  are from the model with an interaction term using a non-linear
1681  effect of age at insulin start (model \textrm{\tt im}, red) and
1682  using a general spline interactions (model \textrm{\tt gm},
1683  black). It appears that the general interaction over-models a bit.}
1684This is in figure \ref{fig:RR-int}, p. \pageref{fig:RR-int}.
1685
1686The advantage of the parametric modeling (be that with age at insulin
1687or general spline interaction) is that it is straight-forward to
1688\emph{test} whether we have an interaction.
1689
1690\section{Separate models}
1691
1692In the above we insisted on making a joint model for the
1693\texttt{DM}$\rightarrow$\texttt{Dead} and the
1694\texttt{Ins}$\rightarrow$\texttt{Dead}
1695transitions, but with the complications demonstrated it would actually
1696have been more sensible to model the two transitions separately:
1697\begin{Schunk}
1698\begin{Sinput}
1699> dmd <- glm.Lexis( dmCs,
1700+                   from="DM", to="Dead",
1701+                   formula = ~ Ns(age,knots=a.kn)
1702+                             + sex )
1703\end{Sinput}
1704\begin{Soutput}
1705stats::glm Poisson analysis of Lexis object dmCs with log link:
1706Rates for the transition: DM->Dead
1707\end{Soutput}
1708\begin{Sinput}
1709> ind <- glm.Lexis( dmCs,
1710+                   from="Ins", to="Dead",
1711+                   formula = ~ Ns(age,knots=a.kn)
1712+                             + Ns(tfI,knots=i.kn)
1713+                             + Ns(age-tfI,knots=a.kn)
1714+                             + sex )
1715\end{Sinput}
1716\begin{Soutput}
1717stats::glm Poisson analysis of Lexis object dmCs with log link:
1718Rates for the transition: Ins->Dead
1719\end{Soutput}
1720\begin{Sinput}
1721> ini <- ci.pred( ind, ndI )
1722> dmi <- ci.pred( dmd, ndI )
1723> dma <- ci.pred( dmd, ndA )
1724\end{Sinput}
1725\end{Schunk}
1726The estimated mortality rates are shown in figure \ref{fig:sep-mort},
1727p. \pageref{fig:sep-mort}, using:
1728\begin{Schunk}
1729\begin{Sinput}
1730> par(mar=c(3,3,1,1),mgp=c(3,1,0)/1.6,las=1,bty="n")
1731> matshade( ndI$age, ini*1000, plot=TRUE, log="y",
1732+           xlab="Age (years)", ylab="Mortality rates per 1000 PY",
1733+           lwd=2, col="red" )
1734> matshade( ndA$age, dma*1000,
1735+           lwd=2, col="black" )
1736\end{Sinput}
1737\end{Schunk}
1738The estimated RRs are computed using that the estimates from the two
1739models are uncorrelated, and hence qualify for \texttt{ci.ratio} (this
1740and the previous graph
1741appear in figure \ref{fig:Ins-noIns}, p. \pageref{fig:Ins-noIns})
1742\begin{Schunk}
1743\begin{Sinput}
1744> par(mar=c(3,3,1,1),mgp=c(3,1,0)/1.6,las=1,bty="n")
1745> matshade( ndI$age, ci.ratio(ini,dmi), plot=TRUE, log="y",
1746+           xlab="Age (years)", ylab="RR insulin vs. no insulin",
1747+           lwd=2, col="red" )
1748> abline( h=1 )
1749\end{Sinput}
1750\end{Schunk}
1751\begin{figure}[tb]
1752\centering
1753\includegraphics[width=0.49\textwidth]{flup-sep-mort}
1754\includegraphics[width=0.49\textwidth]{flup-sep-HR}
1755\caption{\it Left panel: Mortality rates from separate models for the
1756  two mortality transitions; the \textrm{\tt
1757    DM}$\rightarrow$\textrm{\tt Dead} transition modeled by age alone;
1758  \textrm{\tt Ins}$\rightarrow$\textrm{\tt Dead} transition modeled
1759  with spline effects of current age, time since insulin and and age
1760  at insulin. \newline Right panel: Mortality HR of insulin vs. no insulin.}
1761\label{fig:Ins-noIns}
1762\end{figure}
1763
1764\chapter{More states}
1765
1766\section{Subdividing states}
1767
1768It may be of interest to subdivide the states following the
1769intermediate event according to whether the event has occurred or
1770not. This will enable us to address the question of the fraction of
1771the patients that ever go on insulin.
1772
1773This is done by the argument \texttt{split.states=TRUE}.
1774\begin{Schunk}
1775\begin{Sinput}
1776> dmCs <- cutLexis( data = dmS2,
1777+                     cut = dmS2$doins,
1778+               timescale = "per",
1779+               new.state = "Ins",
1780+               new.scale = "tfI",
1781+        precursor.states = "DM",
1782+            split.states = TRUE )
1783> summary( dmCs )
1784\end{Sinput}
1785\begin{Soutput}
1786Transitions:
1787     To
1788From     DM  Ins Dead Dead(Ins)  Records:  Events: Risk time:  Persons:
1789  DM  35135 1694 2048         0     38877     3742   45885.49      9899
1790  Ins     0 5762    0       451      6213      451    8387.77      1791
1791  Sum 35135 7456 2048       451     45090     4193   54273.27      9996
1792\end{Soutput}
1793\end{Schunk}
1794We can illustrate the numbers and the transitions (figure
1795\ref{fig:box4}, p. \pageref{fig:box4})
1796\begin{Schunk}
1797\begin{Sinput}
1798> boxes( dmCs, boxpos=list(x=c(15,15,85,85),
1799+                          y=c(85,15,85,15)),
1800+        scale.R=1000, show.BE=TRUE )
1801\end{Sinput}
1802\end{Schunk}
1803\insfig{box4}{0.7}{Transitions between 4 states: the numbers \emph{in}
1804  the boxes are person-years (middle), and below the number of persons
1805  who start, respectively end their follow-up in each of the states.}
1806
1807Note that it is only the mortality rates that we have been modeling,
1808namely the transitions \texttt{DM}$\rightarrow$\texttt{Dead}
1809and \texttt{Ins}$\rightarrow$\texttt{Dead(Ins)}.
1810If we were to model the cumulative risk of using insulin we would also
1811need a model for the DM$\rightarrow$Ins
1812transition. Subsequent to that we would then compute the probability
1813of being in each state conditional on suitable starting
1814conditions. With models where transition rates depend on several time
1815scales this is not a trivial task. This is treated in more detail in the
1816vignette on \texttt{simLexis}.
1817
1818\section{Multiple intermediate events}
1819
1820We may be interested in starting either insulin or OAD (oral
1821anti-diabetic drugs), thus giving rise to more states and more
1822time scales. This can be accomplished by the \texttt{mcutLexis}
1823function, that generalizes \texttt{cutLexis}:
1824\begin{Schunk}
1825\begin{Sinput}
1826> dmM <- mcutLexis( dmL,
1827+              timescale = "per",
1828+                     wh = c("doins","dooad"),
1829+             new.states = c("Ins","OAD"),
1830+             new.scales = c("tfI","tfO"),
1831+       precursor.states = "DM",
1832+           ties.resolve = TRUE )
1833\end{Sinput}
1834\begin{Soutput}
1835NOTE: 15 records with tied events times resolved (adding 0.01 random uniform),
1836      so results are only reproducible if the random number seed was set.
1837\end{Soutput}
1838\begin{Sinput}
1839> summary( dmM, t=T )
1840\end{Sinput}
1841\begin{Soutput}
1842Transitions:
1843     To
1844From        DM Dead  OAD  Ins OAD-Ins Ins-OAD  Records:  Events: Risk time:  Persons:
1845  DM      2830 1056 2957  689       0       0      7532     4702   22920.34      7532
1846  OAD        0  992 3327    0    1005       0      5324     1997   22965.24      5324
1847  Ins        0  152    0  462       0     172       786      324    3883.06       786
1848  OAD-Ins    0  266    0    0     739       0      1005      266    3770.53      1005
1849  Ins-OAD    0   33    0    0       0     139       172       33     734.09       172
1850  Sum     2830 2499 6284 1151    1744     311     14819     7322   54273.27      9996
1851
1852Timescales:
1853  per   age   tfD   tfI   tfO
1854   ""    ""    "" "Ins" "OAD"
1855\end{Soutput}
1856\end{Schunk}
1857We see that we now have two time scales defined as entry since into
1858states.
1859\begin{Schunk}
1860\begin{Sinput}
1861> wh <- c(subset(dmM,lex.Cst=="Ins-OAD")$lex.id[1:2],
1862+         subset(dmM,lex.Cst=="OAD-Ins")$lex.id[1:2])
1863> options( width=110 )
1864> print( subset( dmM, lex.id %in% wh )[,c('lex.id',names(dmM[1:8]),c("doins","dooad"))],
1865+        digits=6, row.names=FALSE )
1866\end{Sinput}
1867\begin{Soutput}
1868 lex.id        tfI     tfO     per     age       tfD    lex.dur lex.Cst lex.Xst   doins   dooad
1869     18         NA      NA 1996.75 61.7221 0.0000000 1.16906229      DM     OAD 2005.99 1997.92
1870     18         NA 0.00000 1997.92 62.8912 1.1690623 8.07939767     OAD OAD-Ins 2005.99 1997.92
1871     18 0.00000000 8.07940 2005.99 70.9706 9.2484600 4.00273785 OAD-Ins OAD-Ins 2005.99 1997.92
1872     25         NA      NA 2003.69 60.3422 0.0000000 1.88090349      DM     OAD 2008.64 2005.57
1873     25         NA 0.00000 2005.57 62.2231 1.8809035 3.06913073     OAD OAD-Ins 2008.64 2005.57
1874     25 0.00000000 3.06913 2008.64 65.2923 4.9500342 1.35797399 OAD-Ins OAD-Ins 2008.64 2005.57
1875     20         NA      NA 2009.25 53.2183 0.0000000 0.04071988      DM     Ins 2009.29 2009.29
1876     20 0.00000000      NA 2009.29 53.2591 0.0407199 0.00131847     Ins Ins-OAD 2009.29 2009.29
1877     20 0.00131847 0.00000 2009.29 53.2604 0.0420383 0.70813277 Ins-OAD Ins-OAD 2009.29 2009.29
1878     38         NA      NA 2008.37 63.9316 0.0000000 0.09308693      DM     Ins 2008.46 2008.67
1879     38 0.00000000      NA 2008.46 64.0246 0.0930869 0.21355236     Ins Ins-OAD 2008.46 2008.67
1880     38 0.21355236 0.00000 2008.67 64.2382 0.3066393 1.32511978 Ins-OAD    Dead 2008.46 2008.67
1881\end{Soutput}
1882\end{Schunk}
1883We can also illustrate the transitions to the different states, as in
1884figure \ref{fig:mbox}:
1885\begin{Schunk}
1886\begin{Sinput}
1887> boxes( dmM, boxpos=list(x=c(15,80,40,40,85,85),
1888+                         y=c(50,50,90,10,90,10)),
1889+             scale.R=1000, show.BE=TRUE )
1890\end{Sinput}
1891\end{Schunk}
1892\insfig{mbox}{1.0}{Boxes for the \textrm{\tt dmM} object. The numbers
1893  \emph{in} the boxes are person-years (middle), and below the number
1894  of persons who start, respectively end their follow-up in each of
1895  the states.}
1896We may not be interested in whether persons were prescribed insulin
1897before OAD or vice versa, in which case we would combine the levels
1898with both insulin and OAD to one, regardless of order (figure
1899\ref{fig:mboxr}):
1900\begin{Schunk}
1901\begin{Sinput}
1902> summary( dmMr <- Relevel( dmM, list('OAD+Ins'=5:6), first=FALSE) )
1903\end{Sinput}
1904\begin{Soutput}
1905Transitions:
1906     To
1907From        DM Dead  OAD  Ins OAD+Ins  Records:  Events: Risk time:  Persons:
1908  DM      2830 1056 2957  689       0      7532     4702   22920.34      7532
1909  OAD        0  992 3327    0    1005      5324     1997   22965.24      5324
1910  Ins        0  152    0  462     172       786      324    3883.06       786
1911  OAD+Ins    0  299    0    0     878      1177      299    4504.62      1177
1912  Sum     2830 2499 6284 1151    2055     14819     7322   54273.27      9996
1913\end{Soutput}
1914\begin{Sinput}
1915> boxes( dmMr, boxpos=list(x=c(15,50,15,85,85),
1916+                          y=c(85,50,15,85,15)),
1917+              scale.R=1000, show.BE=TRUE )
1918\end{Sinput}
1919\end{Schunk}
1920\insfig{mboxr}{1.0}{Boxes for the \textrm{\tt dmMr} object with
1921  collapsed states. The numbers \emph{in} the boxes are person-years
1922  (middle), and below the number of persons who start, respectively
1923  end their follow-up in each of the states.}
1924
1925Diagrams as those in figures
1926\ref{fig:mbox} and
1927\ref{fig:mboxr} gives an overview of the possible transitions,
1928which states it might be relevant to collapse, and which transitions
1929to model and how.
1930
1931The actual modeling of the transition rates is straightforward;
1932split the data along some timescale and then use \texttt{glm.Lexis} or
1933\texttt{gam.Lexis}, where it is possible to select the transitions
1934modelled. This is also possible with the \texttt{coxph.Lexis}
1935function, but it requires that a single time scale be selected as the
1936baseline time scale, and the effect of this will not be accessible.
1937
1938\chapter{\texttt{Lexis} functions}
1939
1940The \texttt{Lexis} machinery has evolved over time since it was first
1941introduced in a workable version in \texttt{Epi\_1.0.5} in August 2008.
1942
1943Over the years there have been additions of tools for handling
1944multistate data. Here is a list of the current functions relating to
1945\texttt{Lexis} objects with a very brief description; it does not
1946replace the documentation. Unless otherwise stated, functions named
1947\texttt{something.Lexis} (with a ``\texttt{.}'')  are S3 methods for
1948\texttt{Lexis} objects, so you can skip the ``\texttt{.Lexis}'' in
1949daily use.
1950
1951\setlist{noitemsep}
1952\begin{description}
1953
1954\item[Define]\ \\
1955\begin{description}
1956\item[\texttt{Lexis}] defines a \texttt{Lexis} object
1957\end{description}
1958
1959\item[Cut and split]\ \\
1960\begin{description}
1961\item[\texttt{cutLexis}] cut follow-up at intermediate event
1962\item[\texttt{mcutLexis}] cut follow-up at several intermediate events
1963\item[\texttt{countLexis}] cut follow-up at intermediate event count
1964  the no. events so far
1965\item[\texttt{splitLexis}] split follow up along a time scale
1966\item[\texttt{splitMulti}] split follow up along a time scale --- from
1967  the \texttt{popEpi} package, faster and has simpler syntax than
1968  \texttt{splitLexis}
1969\item[\texttt{addCov.Lexis}] add clinical measurements at a given date to a
1970  \texttt{Lexis} object
1971\end{description}
1972
1973\item[Boxes and plots]\ \\
1974\begin{description}
1975\item[\texttt{boxes.Lexis}] draw a diagram of states and transitions
1976\item[\texttt{plot.Lexis}] draw a standard Lexis diagram
1977\item[\texttt{points.Lexis}] add points to a Lexis diagram
1978\item[\texttt{lines.Lexis}] add lines to a Lexis diagram
1979\item[\texttt{PY.ann.Lexis}] annotate life lines in a Lexis diagram
1980\end{description}
1981
1982\item[Summarize and query]\ \\
1983\begin{description}
1984\item[\texttt{summary.Lexis}] overview of transitions, risk time etc.
1985\item[\texttt{levels.Lexis}] what are the states in the \texttt{Lexis} object
1986\item[\texttt{nid.Lexis}] number of persons in the \texttt{Lexis}
1987  object --- how many unique values of \texttt{lex.id} are present
1988\item[\texttt{entry}] entry time
1989\item[\texttt{exit}] exit time
1990\item[\texttt{status}] status at entry or exit
1991\item[\texttt{timeBand}] factor of time bands
1992\item[\texttt{timeScales}] what time scales are in the \texttt{Lexis} object
1993\item[\texttt{timeSince}] what time scales are defined as time since a given state
1994\item[\texttt{breaks}] what breaks are currently defined
1995\item[\texttt{absorbing}] what are the absorbing states
1996\item[\texttt{transient}] what are the transient states
1997\item[\texttt{preceding}, \texttt{before}] which states precede this
1998\item[\texttt{succeeding}, \texttt{after}] which states can follow this
1999\item[\texttt{tmat.Lexis}] transition matrix for the \texttt{Lexis} object
2000\end{description}
2001
2002\item[Manipulate]\ \\
2003\begin{description}
2004\item[\texttt{subset.Lexis}, \texttt{[}] subset of a \texttt{Lexis} object
2005\item[\texttt{merge.Lexis}] merges a \texttt{Lexis} objects with a
2006  \texttt{data.frame}
2007\item[\texttt{cbind.Lexis}] bind a \texttt{data.frame} to a \texttt{Lexis} object
2008\item[\texttt{rbind.Lexis}] put two \texttt{Lexis} objects head-to-foot
2009\item[\texttt{transform.Lexis}] transform and add variables
2010\item[\texttt{tsNA20}] turn \texttt{NA}s to 0s for time scales
2011\item[\texttt{Relevel.Lexis}, \texttt{factorize.Lexis}] reorder and
2012  combine states
2013\item[\texttt{rm.tr}] remove transitions from a \texttt{Lexis} object
2014\item[\texttt{bootLexis}] bootstrap sample of \emph{persons}
2015  (\texttt{lex.id}) in the \texttt{Lexis} object
2016\end{description}
2017
2018\item[Simulate]\ \\
2019\begin{description}
2020\item[\texttt{simLexis}] simulate a \texttt{Lexis} object from
2021  specified transition rate models
2022\item[\texttt{nState}, \texttt{pState}] count state occupancy from a
2023  simulated \texttt{Lexis} object
2024\item[\texttt{plot.pState}, \texttt{lines.pState}] plot state occupancy from a
2025  \texttt{pState} object
2026\end{description}
2027
2028\item[Stack]\ \\
2029\begin{description}
2030\item[\texttt{stack.Lexis}] make a stacked object for simultaneous
2031  analysis of transitions --- returns a \texttt{stacked.Lexis} object
2032\item[\texttt{subset.stacked.Lexis}] subsets of a \texttt{stacked.Lexis} object
2033\item[\texttt{transform.stacked.Lexis}] transform  a \texttt{stacked.Lexis} object
2034\end{description}
2035
2036\item[Interface to other packages]\ \\
2037\begin{description}
2038\item[\texttt{msdata.Lexis}] interface to \texttt{mstate} package
2039\item[\texttt{etm.Lexis}] interface to \texttt{etm} package
2040\item[\texttt{crr.Lexis}] interface to \texttt{cmprsk} package
2041\end{description}
2042
2043\item[Statistical models] --- these are \emph{not} S3 methods
2044\begin{description}
2045\item[\texttt{glm.Lexis}] fit a \texttt{glm} model using the
2046  \texttt{poisreg} family to (hopefully) time split data
2047\item[\texttt{gam.Lexis}] fit a \texttt{gam} model (from the
2048  \texttt{mgcv} package) using the \texttt{poisreg} family to
2049  (hopefully) time split data
2050\item[\texttt{coxph.Lexis}] fit a Cox model to follow-up in a
2051  \texttt{Lexis} object
2052\end{description}
2053
2054\end{description}
2055
2056\renewcommand{\bibname}{References}
2057
2058\bibliographystyle{plain}
2059\begin{thebibliography}{1}
2060
2061\bibitem{Carstensen.2011a}
2062B~Carstensen and M~Plummer.
2063\newblock Using {L}exis objects for multi-state models in {R}.
2064\newblock {\em Journal of Statistical Software}, 38(6):1--18, 1 2011.
2065
2066%% \bibitem{Iacobelli.2013}
2067%% S~Iacobelli and B~Carstensen.
2068%% \newblock {Multiple time scales in multi-state models}.
2069%% \newblock {\em Stat Med}, 32(30):5315--5327, Dec 2013.
2070
2071\bibitem{Plummer.2011}
2072M~Plummer and B~Carstensen.
2073\newblock Lexis: An {R} class for epidemiological studies with long-term
2074  follow-up.
2075\newblock {\em Journal of Statistical Software}, 38(5):1--12, 1 2011.
2076
2077\end{thebibliography}
2078
2079\addcontentsline{toc}{chapter}{\bibname}
2080
2081\end{document}
2082