1%\VignetteIndexEntry{Follow-up data with the Lexis functions in Epi} 2 3\documentclass[a4paper,dvipsnames,twoside,12pt]{report} 4\newcommand{\Title}{Follow-up data with the\\ \texttt{Lexis} functions in 5 \texttt{Epi}} 6\newcommand{\Tit}{Follow-up with \texttt{Lexis}} 7\newcommand{\Version}{Version 6} 8\newcommand{\Dates}{August 2019} 9\newcommand{\Where}{SDCC} 10\newcommand{\Homepage}{\url{http://bendixcarstensen.com/} } 11\newcommand{\Faculty}{\begin{tabular}{rl} 12Bendix Carstensen 13 & Steno Diabetes Center Copenhagen, Gentofte, Denmark\\ 14 & {\small \& Department of Biostatistics, 15 University of Copenhagen} \\ 16 & \texttt{b@bxc.dk} \\ 17 & \url{http://BendixCarstensen.com} \\[1em] 18 \end{tabular}} 19 20\input{topreport} 21\renewcommand{\rwpre}{./flup} 22 23\chapter*{Introduction} 24\addcontentsline{toc}{chapter}{Introduction} 25 26This is an introduction to the \texttt{Lexis} machinery in the 27\texttt{Epi} package. The machinery is intended for representation and 28manipulation of follow-up data (event history data) from studies where 29exact dates of events are known. It accommodates follow-up through 30multiple states and on multiple time scales. 31 32This vignette uses an example from the \texttt{Epi} package to 33illustrate the set-up of a simple \texttt{Lexis} object (a data frame 34of follow-up intervals), as well as the subdivision of follow-up 35intervals needed for multistate representation and analysis of 36transition rates. 37 38The first chapter is exclusively on manipulation of the follow-up 39representation, but it points to the subsequent chapter where analysis 40is based on a \texttt{Lexis} representation with very small follow-up 41intervals. 42 43Chapter 2 uses analysis of mortality rates among Danish diabetes 44patients (available in the \texttt{Epi} package) currently on insulin 45treatment or not to illustrate the use of the the \texttt{Lexis} 46machinery. 47 48I owe much thanks to my colleague Lars Jorge Diaz for careful reading 49and many constructive suggestions. 50 51\section{History} 52 53The \texttt{Lexis} machinery in the \texttt{Epi} package was first 54conceived by Martyn Plummer\cite{Plummer.2011,Carstensen.2011a}, and 55since its first appearance in the \texttt{Epi} package in 2008 it has 56been expanded with a number of utilities. An overview of these can be 57found in the last chapter of this note: ``\texttt{Lexis} functions''. 58 59\chapter{Representation of follow-up data in the \texttt{Epi} package} 60 61In the \texttt{Epi}-package, follow-up data is represented by adding 62some extra variables to a data frame. Such a data frame is called a 63\texttt{Lexis} object. The tools for handling follow-up data then use 64the structure of this for special plots, tabulations and modeling. 65 66Follow-up data basically consists of a time of entry, a time of exit 67and an indication of the status at exit (normally either ``alive'' or 68``dead'') for each person. Implicitly is also assumed a status 69\emph{during} the follow-up (usually ``alive''). 70 71\begin{figure}[htbp] 72 \centering 73\setlength{\unitlength}{1pt} 74\begin{picture}(210,70)(0,75) 75%\scriptsize 76\thicklines 77 \put( 0,80){\makebox(0,0)[r]{Age-scale}} 78 \put( 50,80){\line(1,0){150}} 79 \put( 50,80){\line(0,1){5}} 80 \put(100,80){\line(0,1){5}} 81 \put(150,80){\line(0,1){5}} 82 \put(200,80){\line(0,1){5}} 83 \put( 50,77){\makebox(0,0)[t]{35}} 84 \put(100,77){\makebox(0,0)[t]{40}} 85 \put(150,77){\makebox(0,0)[t]{45}} 86 \put(200,77){\makebox(0,0)[t]{50}} 87 88 \put( 0,115){\makebox(0,0)[r]{Follow-up}} 89 90 \put( 80,105){\makebox(0,0)[r]{\small Two}} 91 \put( 90,105){\line(1,0){87}} 92 \put( 90,100){\line(0,1){10}} 93 \put(100,100){\line(0,1){10}} 94 \put(150,100){\line(0,1){10}} 95 \put(180,105){\circle{6}} 96 \put( 95,110){\makebox(0,0)[b]{1}} 97 \put(125,110){\makebox(0,0)[b]{5}} 98 \put(165,110){\makebox(0,0)[b]{3}} 99 100 \put( 50,130){\makebox(0,0)[r]{\small One}} 101 \put( 60,130){\line(1,0){70}} 102 \put( 60,125){\line(0,1){10}} 103 \put(100,125){\line(0,1){10}} 104 \put(130,130){\circle*{6}} 105 \put( 80,135){\makebox(0,0)[b]{4}} 106 \put(115,135){\makebox(0,0)[b]{3}} 107\end{picture} 108 \caption{\it Follow-up of two persons} 109 \label{fig:fu2} 110\end{figure} 111 112\section{Time scales} 113 114A time scale is a variable that varies deterministically \emph{within} 115each person during follow-up, \textit{e.g.}: 116\begin{itemize} 117 \item Age 118 \item Calendar time 119 \item Time since start of treatment 120 \item Time since relapse 121\end{itemize} 122All time scales advance at the same pace, so the time followed is the 123same on all time scales. Therefore, it will suffice to use only the 124entry point on each of the time scales, for example: 125\begin{itemize} 126 \item Age at entry 127 \item Date of entry 128 \item Time at treatment (\emph{at} treatment this is 0) 129 \item Time at relapse (\emph{at} relapse this is 0) 130\end{itemize} 131For illustration we need to load the \texttt{Epi} package: 132\begin{Schunk} 133\begin{Sinput} 134> library(Epi) 135> print( sessionInfo(), l=F ) 136\end{Sinput} 137\begin{Soutput} 138R version 3.6.3 (2020-02-29) 139Platform: x86_64-pc-linux-gnu (64-bit) 140Running under: Ubuntu 14.04.6 LTS 141 142Matrix products: default 143BLAS: /usr/lib/openblas-base/libopenblas.so.0 144LAPACK: /usr/lib/lapack/liblapack.so.3.0 145 146attached base packages: 147[1] utils datasets graphics grDevices stats methods base 148 149other attached packages: 150[1] Epi_2.43 151 152loaded via a namespace (and not attached): 153 [1] Rcpp_1.0.3 magrittr_1.5 splines_3.6.3 MASS_7.3-53 154 [5] tidyselect_0.2.5 lattice_0.20-41 R6_2.4.1 rlang_0.4.4 155 [9] plyr_1.8.5 dplyr_0.8.4 cmprsk_2.2-7 tools_3.6.3 156[13] parallel_3.6.3 grid_3.6.3 data.table_1.12.0 nlme_3.1-149 157[17] mgcv_1.8-33 survival_3.2-7 assertthat_0.2.1 tibble_2.1.3 158[21] etm_1.0.4 numDeriv_2016.8-1 crayon_1.3.4 Matrix_1.2-18 159[25] purrr_0.3.0 glue_1.3.1 compiler_3.6.3 pillar_1.4.3 160[29] zoo_1.8-4 pkgconfig_2.0.3 161\end{Soutput} 162\end{Schunk} 163In the \texttt{Epi} package, follow-up in a cohort is represented in a 164\texttt{Lexis} object. As mentioned, a \texttt{Lexis} object is a data 165frame with some extra structure representing the follow-up. For the 166\texttt{DMlate} data --- follow-up of diabetes patients in Denmark 167recording date of birth, date of diabetes, date of insulin use, date 168of first oral drug use and date of death --- we can construct a 169\texttt{Lexis} object by: 170\begin{Schunk} 171\begin{Sinput} 172> data( DMlate ) 173> head( DMlate ) 174\end{Sinput} 175\begin{Soutput} 176 sex dobth dodm dodth dooad doins dox 17750185 F 1940.256 1998.917 NA NA NA 2009.997 178307563 M 1939.218 2003.309 NA 2007.446 NA 2009.997 179294104 F 1918.301 2004.552 NA NA NA 2009.997 180336439 F 1965.225 2009.261 NA NA NA 2009.997 181245651 M 1932.877 2008.653 NA NA NA 2009.997 182216824 F 1927.870 2007.886 2009.923 NA NA 2009.923 183\end{Soutput} 184\begin{Sinput} 185> dmL <- Lexis( entry = list( per=dodm, 186+ age=dodm-dobth, 187+ tfD=0 ), 188+ exit = list( per=dox ), 189+ exit.status = factor( !is.na(dodth), labels=c("DM","Dead") ), 190+ data = DMlate ) 191\end{Sinput} 192\begin{Soutput} 193NOTE: entry.status has been set to "DM" for all. 194NOTE: Dropping 4 rows with duration of follow up < tol 195\end{Soutput} 196\begin{Sinput} 197> timeScales(dmL) 198\end{Sinput} 199\begin{Soutput} 200[1] "per" "age" "tfD" 201\end{Soutput} 202\end{Schunk} 203(The excluded persons are persons with date of diabetes equal to date 204of death.) 205 206The \texttt{entry} argument is a \emph{named} list with the entry 207points on each of the time scales we want to use. It defines the names 208of the time scales and the entry points of the follow-up of each 209person. The \texttt{exit} argument gives the exit time on \emph{one} 210of the time scales, so the name of the element in this list must match 211one of the names of the \texttt{entry} list. This is sufficient, 212because the follow-up time on all time scales is the same, in this 213case $\mathtt{dox}$-$\mathtt{dodm}$. 214 215The \texttt{exit.status} is a categorical variable (a \emph{factor}) 216that indicates the exit status --- in this case whether the person 217(still) is in state \texttt{DM} or exits to \texttt{Dead} at the end 218of follow-up. In principle we should also indicate the 219\texttt{entry.status}, but the default is to assume that all persons 220enter in the \emph{first} of the mentioned \texttt{exit.state}s --- 221in this case \texttt{DM}, because $\mathtt{FALSE}<\mathtt{TRUE}$. 222 223Now take a look at the result: 224\begin{Schunk} 225\begin{Sinput} 226> str( dmL ) 227\end{Sinput} 228\begin{Soutput} 229Classes ‘Lexis’ and 'data.frame': 9996 obs. of 14 variables: 230 $ per : num 1999 2003 2005 2009 2009 ... 231 $ age : num 58.7 64.1 86.3 44 75.8 ... 232 $ tfD : num 0 0 0 0 0 0 0 0 0 0 ... 233 $ lex.dur: num 11.08 6.689 5.446 0.736 1.344 ... 234 $ lex.Cst: Factor w/ 2 levels "DM","Dead": 1 1 1 1 1 1 1 1 1 1 ... 235 $ lex.Xst: Factor w/ 2 levels "DM","Dead": 1 1 1 1 1 2 1 1 2 1 ... 236 $ lex.id : int 1 2 3 4 5 6 7 8 9 10 ... 237 $ sex : Factor w/ 2 levels "M","F": 2 1 2 2 1 2 1 1 2 1 ... 238 $ dobth : num 1940 1939 1918 1965 1933 ... 239 $ dodm : num 1999 2003 2005 2009 2009 ... 240 $ dodth : num NA NA NA NA NA ... 241 $ dooad : num NA 2007 NA NA NA ... 242 $ doins : num NA NA NA NA NA NA NA NA NA NA ... 243 $ dox : num 2010 2010 2010 2010 2010 ... 244 - attr(*, "time.scales")= chr "per" "age" "tfD" 245 - attr(*, "time.since")= chr "" "" "" 246 - attr(*, "breaks")=List of 3 247 ..$ per: NULL 248 ..$ age: NULL 249 ..$ tfD: NULL 250\end{Soutput} 251\begin{Sinput} 252> head( dmL )[,1:10] 253\end{Sinput} 254\begin{Soutput} 255 per age tfD lex.dur lex.Cst lex.Xst lex.id sex dobth dodm 25650185 1998.917 58.66119 0 11.0800821 DM DM 1 F 1940.256 1998.917 257307563 2003.309 64.09035 0 6.6885695 DM DM 2 M 1939.218 2003.309 258294104 2004.552 86.25051 0 5.4455852 DM DM 3 F 1918.301 2004.552 259336439 2009.261 44.03559 0 0.7364819 DM DM 4 F 1965.225 2009.261 260245651 2008.653 75.77550 0 1.3442847 DM DM 5 M 1932.877 2008.653 261216824 2007.886 80.01643 0 2.0369610 DM Dead 6 F 1927.870 2007.886 262\end{Soutput} 263\end{Schunk} 264The \texttt{Lexis} object \texttt{dmL} has a variable for each 265time scale which is the entry point on this time scale. The follow-up 266time is in the variable \texttt{lex.dur} (\texttt{dur}ation). 267Note that the exit status is in the variable \texttt{lex.Xst} 268(e\texttt{X}it \texttt{st}ate. The variable \texttt{lex.Cst} is the 269state where the follow-up takes place (\texttt{C}urrent 270\texttt{st}ate), in this case \texttt{DM} (alive with diabetes) for 271all persons. This implies that \emph{censored} observations are 272characterized by having $\mathtt{lex.Cst}=\mathtt{lex.Xst}$. 273 274There is a \texttt{summary} function for \texttt{Lexis} objects that 275lists the number of transitions and records as well as the total amount 276of follow-up time; it also (optionally) prints information about the names of the 277variables that constitute the time scales: 278\begin{Schunk} 279\begin{Sinput} 280> summary.Lexis( dmL, timeScales=TRUE ) 281\end{Sinput} 282\begin{Soutput} 283Transitions: 284 To 285From DM Dead Records: Events: Risk time: Persons: 286 DM 7497 2499 9996 2499 54273.27 9996 287 288Timescales: 289per age tfD 290 "" "" "" 291\end{Soutput} 292\end{Schunk} 293It is possible to get a visualization of the follow-up along the 294time scales chosen by using the \texttt{plot} method for \texttt{Lexis} 295objects. \texttt{dmL} is an object of \emph{class} \texttt{Lexis}, so 296using the function \texttt{plot()} on it means that \R\ will look for 297the function \texttt{plot.Lexis} and use this function. 298\begin{Schunk} 299\begin{Sinput} 300> plot( dmL ) 301\end{Sinput} 302\end{Schunk} 303The function allows quite a bit of control over the output, and a 304\texttt{points.Lexis} function allows plotting of the endpoints of 305follow-up: 306\begin{Schunk} 307\begin{Sinput} 308> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6 ) 309> plot( dmL, 1:2, lwd=1, col=c("blue","red")[dmL$sex], 310+ grid=TRUE, lty.grid=1, col.grid=gray(0.7), 311+ xlim=1960+c(0,60), xaxs="i", 312+ ylim= 40+c(0,60), yaxs="i", las=1 ) 313> points( dmL, 1:2, pch=c(NA,3)[dmL$lex.Xst], 314+ col="lightgray", lwd=3, cex=0.3 ) 315> points( dmL, 1:2, pch=c(NA,3)[dmL$lex.Xst], 316+ col=c("blue","red")[dmL$sex], lwd=1, cex=0.3 ) 317> box(bty='o') 318\end{Sinput} 319\end{Schunk} 320In the above code you will note that the values of the arguments 321\texttt{col} and \texttt{pch} are indexed by factors, using the 322convention in \R\ that the index is taken as \emph{number of the level} 323of the supplied factor. Thus \texttt{c("blue","red")[dmL\$sex]} is 324\texttt{"blue"} when \texttt{sex} is \texttt{M} (the first level). 325 326The results of these two plotting commands are in figure 327\ref{fig:Lexis-diagram}, p. \pageref{fig:Lexis-diagram}. 328\begin{figure}[tb] 329\centering 330\includegraphics[width=0.35\textwidth]{flup-dmL1} 331\includegraphics[width=0.63\textwidth]{flup-dmL2} 332\caption{\it Lexis diagram of the \textrm{\tt DMlate} dataset; left 333 panel is the default version, right panel: plot with some bells and 334 whistles. The red lines are for women, blue for men, crosses 335 indicate deaths.} 336\label{fig:Lexis-diagram} 337\end{figure} 338 339\section{Splitting the follow-up time along a time scale} 340 341In next chapter we shall conduct statistical analysis of mortality 342rates, and a prerequisite for parametric analysis of rates is that 343follow-up time is subdivided in smaller intervals, where we can 344reasonably assume that rates are constant. 345 346The follow-up time in a cohort can be subdivided (``split'') along a 347time scale, for example current age. This is achieved by the 348\texttt{splitLexis} (note that it is \emph{not} called 349\texttt{split.Lexis}). This requires that the time scale and the 350breakpoints on this time scale are supplied. Try: 351\begin{Schunk} 352\begin{Sinput} 353> dmS1 <- splitLexis( dmL, "age", breaks=seq(0,100,5) ) 354> summary( dmL ) 355\end{Sinput} 356\begin{Soutput} 357Transitions: 358 To 359From DM Dead Records: Events: Risk time: Persons: 360 DM 7497 2499 9996 2499 54273.27 9996 361\end{Soutput} 362\begin{Sinput} 363> summary( dmS1 ) 364\end{Sinput} 365\begin{Soutput} 366Transitions: 367 To 368From DM Dead Records: Events: Risk time: Persons: 369 DM 18328 2499 20827 2499 54273.27 9996 370\end{Soutput} 371\end{Schunk} 372We see that the number of persons and events and the amount of 373follow-up is the same in the two data sets; only the number of records 374differ --- the extra records all have \texttt{lex.Cst}=\texttt{DM} and 375\texttt{lex.Xst}=\texttt{DM}. 376 377To see how records are split for each individual, it is useful to list 378the results for a few individuals (whom we selected with a view to the 379illustrative usefulness): 380\begin{Schunk} 381\begin{Sinput} 382> wh.id <- c(9,27,52,484) 383> subset( dmL , lex.id %in% wh.id )[,1:10] 384\end{Sinput} 385\begin{Soutput} 386 per age tfD lex.dur lex.Cst lex.Xst lex.id sex dobth dodm 387430048 1998.956 61.87269 0 9.508556 DM Dead 9 F 1937.083 1998.956 38822671 2000.042 52.71184 0 9.954825 DM DM 27 M 1947.331 2000.042 389338459 1998.249 61.85626 0 11.748118 DM DM 52 F 1936.393 1998.249 390274124 1998.260 62.37919 0 10.929500 DM Dead 484 F 1935.881 1998.260 391\end{Soutput} 392\begin{Sinput} 393> subset( dmS1, lex.id %in% wh.id )[,1:10] 394\end{Sinput} 395\begin{Soutput} 396 lex.id per age tfD lex.dur lex.Cst lex.Xst sex dobth dodm 39714 9 1998.956 61.87269 0.000000 3.127310 DM DM F 1937.083 1998.956 39815 9 2002.083 65.00000 3.127310 5.000000 DM DM F 1937.083 1998.956 39916 9 2007.083 70.00000 8.127310 1.381246 DM Dead F 1937.083 1998.956 40054 27 2000.042 52.71184 0.000000 2.288159 DM DM M 1947.331 2000.042 40155 27 2002.331 55.00000 2.288159 5.000000 DM DM M 1947.331 2000.042 40256 27 2007.331 60.00000 7.288159 2.666667 DM DM M 1947.331 2000.042 403108 52 1998.249 61.85626 0.000000 3.143737 DM DM F 1936.393 1998.249 404109 52 2001.393 65.00000 3.143737 5.000000 DM DM F 1936.393 1998.249 405110 52 2006.393 70.00000 8.143737 3.604381 DM DM F 1936.393 1998.249 4061004 484 1998.260 62.37919 0.000000 2.620808 DM DM F 1935.881 1998.260 4071005 484 2000.881 65.00000 2.620808 5.000000 DM DM F 1935.881 1998.260 4081006 484 2005.881 70.00000 7.620808 3.308693 DM Dead F 1935.881 1998.260 409\end{Soutput} 410\end{Schunk} 411The resulting object, \texttt{dmS1}, is again a \texttt{Lexis} object, 412and the follow-up may be split further along another time scale, for 413example diabetes duration, \texttt{tfD}. Subsequently we list the 414results for the chosen individuals: 415\begin{Schunk} 416\begin{Sinput} 417> dmS2 <- splitLexis( dmS1, "tfD", breaks=c(0,1,2,5,10,20,30,40) ) 418> subset( dmS2, lex.id %in% wh.id )[,1:10] 419\end{Sinput} 420\begin{Soutput} 421 lex.id per age tfD lex.dur lex.Cst lex.Xst sex dobth dodm 42231 9 1998.956 61.87269 0.000000 1.0000000 DM DM F 1937.083 1998.956 42332 9 1999.956 62.87269 1.000000 1.0000000 DM DM F 1937.083 1998.956 42433 9 2000.956 63.87269 2.000000 1.1273101 DM DM F 1937.083 1998.956 42534 9 2002.083 65.00000 3.127310 1.8726899 DM DM F 1937.083 1998.956 42635 9 2003.956 66.87269 5.000000 3.1273101 DM DM F 1937.083 1998.956 42736 9 2007.083 70.00000 8.127310 1.3812457 DM Dead F 1937.083 1998.956 428111 27 2000.042 52.71184 0.000000 1.0000000 DM DM M 1947.331 2000.042 429112 27 2001.042 53.71184 1.000000 1.0000000 DM DM M 1947.331 2000.042 430113 27 2002.042 54.71184 2.000000 0.2881588 DM DM M 1947.331 2000.042 431114 27 2002.331 55.00000 2.288159 2.7118412 DM DM M 1947.331 2000.042 432115 27 2005.042 57.71184 5.000000 2.2881588 DM DM M 1947.331 2000.042 433116 27 2007.331 60.00000 7.288159 2.6666667 DM DM M 1947.331 2000.042 434229 52 1998.249 61.85626 0.000000 1.0000000 DM DM F 1936.393 1998.249 435230 52 1999.249 62.85626 1.000000 1.0000000 DM DM F 1936.393 1998.249 436231 52 2000.249 63.85626 2.000000 1.1437372 DM DM F 1936.393 1998.249 437232 52 2001.393 65.00000 3.143737 1.8562628 DM DM F 1936.393 1998.249 438233 52 2003.249 66.85626 5.000000 3.1437372 DM DM F 1936.393 1998.249 439234 52 2006.393 70.00000 8.143737 1.8562628 DM DM F 1936.393 1998.249 440235 52 2008.249 71.85626 10.000000 1.7481177 DM DM F 1936.393 1998.249 4412084 484 1998.260 62.37919 0.000000 1.0000000 DM DM F 1935.881 1998.260 4422085 484 1999.260 63.37919 1.000000 1.0000000 DM DM F 1935.881 1998.260 4432086 484 2000.260 64.37919 2.000000 0.6208077 DM DM F 1935.881 1998.260 4442087 484 2000.881 65.00000 2.620808 2.3791923 DM DM F 1935.881 1998.260 4452088 484 2003.260 67.37919 5.000000 2.6208077 DM DM F 1935.881 1998.260 4462089 484 2005.881 70.00000 7.620808 2.3791923 DM DM F 1935.881 1998.260 4472090 484 2008.260 72.37919 10.000000 0.9295003 DM Dead F 1935.881 1998.260 448\end{Soutput} 449\end{Schunk} 450A more efficient (and more intuitive) way of making this double split 451is to use the \texttt{splitMulti} function from the \texttt{popEpi} 452package: 453\begin{Schunk} 454\begin{Sinput} 455> library( popEpi ) 456> dmM <- splitMulti( dmL, age = seq(0,100,5), 457+ tfD = c(0,1,2,5,10,20,30,40), 458+ drop = FALSE ) 459> summary( dmS2 ) 460\end{Sinput} 461\begin{Soutput} 462Transitions: 463 To 464From DM Dead Records: Events: Risk time: Persons: 465 DM 40897 2499 43396 2499 54273.27 9996 466\end{Soutput} 467\begin{Sinput} 468> summary( dmM ) 469\end{Sinput} 470\begin{Soutput} 471Transitions: 472 To 473From DM Dead Records: Events: Risk time: Persons: 474 DM 40897 2499 43396 2499 54273.27 9996 475\end{Soutput} 476\end{Schunk} 477Note we used the argument \texttt{drop=FALSE} which will retain 478follow-up also outside the window defined by the breaks. Otherwise the 479default for \texttt{splitMulti} would be to drop follow-up outside 480\texttt{age} [0,100] and \texttt{tfD} [0,40]. This clipping behaviour 481is not available in \texttt{splitLexis}, nevertheless this may be 482exactly what we want in some situations. 483 484So we see that the two ways of splitting data yields the same amount of 485follow-up, but the results are not identical: 486\begin{Schunk} 487\begin{Sinput} 488> identical( dmS2, dmM ) 489\end{Sinput} 490\begin{Soutput} 491[1] FALSE 492\end{Soutput} 493\begin{Sinput} 494> class( dmS2 ) 495\end{Sinput} 496\begin{Soutput} 497[1] "Lexis" "data.frame" 498\end{Soutput} 499\begin{Sinput} 500> class( dmM ) 501\end{Sinput} 502\begin{Soutput} 503[1] "Lexis" "data.table" "data.frame" 504\end{Soutput} 505\end{Schunk} 506As we see, this is because the \texttt{dmM} object also is a 507\texttt{data.table} object; the \texttt{splitMulti} uses the 508\texttt{data.table} machinery which makes the splitting substantially 509faster --- this is of particular interest if you operate on large data 510sets ($>100,000$ records). 511 512Thus the recommended way of splitting follow-up time is by 513\texttt{splitMulti}. But you should be aware that the result is a 514\texttt{data.table} object, which in some circumstances behaves 515slightly different from \texttt{data.frame}s. See the manual for 516\texttt{data.table}. 517 518\section{Cutting follow up time at dates of intermediate events} 519 520If we have a recording of the date of a specific event as for example 521recovery or relapse, we may classify follow-up time as being before or 522after this intermediate event, but it requires that follow-up records 523that straddle the event be cut in two and placed in separate records, 524one representing follow-up \emph{before} the intermediate event, and another 525representing follow-up \emph{after} the intermediate event. This is 526achieved with the function \texttt{cutLexis}, which takes three 527arguments: the time point of the intermediate event, the time scale 528that this point refers to, and the value of the (new) state following 529the date. Optionally, we may also define a new time scale with the 530argument \texttt{new.scale=}. 531 532We are interested in the time before and after inception of insulin 533use, which occurs at the date \texttt{doins}: 534\begin{Schunk} 535\begin{Sinput} 536> whc <- c(names(dmL)[1:7],"dodm","doins") # WHich Columns do we want to see? 537> subset( dmL, lex.id %in% wh.id )[,whc] 538\end{Sinput} 539\begin{Soutput} 540 per age tfD lex.dur lex.Cst lex.Xst lex.id dodm doins 541430048 1998.956 61.87269 0 9.508556 DM Dead 9 1998.956 NA 54222671 2000.042 52.71184 0 9.954825 DM DM 27 2000.042 NA 543338459 1998.249 61.85626 0 11.748118 DM DM 52 1998.249 2004.804 544274124 1998.260 62.37919 0 10.929500 DM Dead 484 1998.260 2003.960 545\end{Soutput} 546\begin{Sinput} 547> dmC <- cutLexis( data = dmL, 548+ cut = dmL$doins, 549+ timescale = "per", 550+ new.state = "Ins", 551+ new.scale = "tfI", 552+ precursor.states = "DM" ) 553> whc <- c(names(dmL)[1:8],"doins") # WHich Columns do we want to see? 554> subset( dmC, lex.id %in% wh.id )[,whc] 555\end{Sinput} 556\begin{Soutput} 557 per age tfD lex.dur lex.Cst lex.Xst lex.id sex doins 5589 1998.956 61.87269 0.000000 9.508556 DM Dead 9 F NA 55927 2000.042 52.71184 0.000000 9.954825 DM DM 27 M NA 56052 1998.249 61.85626 0.000000 6.554415 DM Ins 52 F 2004.804 56110048 2004.804 68.41068 6.554415 5.193703 Ins Ins 52 F 2004.804 562484 1998.260 62.37919 0.000000 5.700205 DM Ins 484 F 2003.960 56310480 2003.960 68.07940 5.700205 5.229295 Ins Dead 484 F 2003.960 564\end{Soutput} 565\end{Schunk} 566(The \texttt{precursor.states=} argument is explained below). 567 568Note that the process of cutting time is simplified by having all 569types of events referred to the calendar time scale. This is a 570generally applicable advice in handling follow-up data: Get all event 571times as \emph{dates}, location of events and follow-up on other time 572scales can then easily be derived from this. 573 574Note 575that individual 52 has had his follow-up cut at 6.55 years from 576diabetes diagnosis and individual 484 at 5.70 years from diabetes 577diagnosis. This dataset could then be split along the time scales as we 578did before with \texttt{dmL}. 579 580The result of this can also be achieved by cutting the split dataset 581\texttt{dmS2} instead of \texttt{dmL}: 582\begin{Schunk} 583\begin{Sinput} 584> dmS2C <- cutLexis( data = dmS2, 585+ cut = dmS2$doins, 586+ timescale = "per", 587+ new.state = "Ins", 588+ new.scale = "tfI", 589+ precursor.states = "DM" ) 590> subset( dmS2C, lex.id %in% wh.id )[,whc] 591\end{Sinput} 592\begin{Soutput} 593 per age tfD lex.dur lex.Cst lex.Xst lex.id sex doins 59431 1998.956 61.87269 0.000000 1.0000000 DM DM 9 F NA 59532 1999.956 62.87269 1.000000 1.0000000 DM DM 9 F NA 59633 2000.956 63.87269 2.000000 1.1273101 DM DM 9 F NA 59734 2002.083 65.00000 3.127310 1.8726899 DM DM 9 F NA 59835 2003.956 66.87269 5.000000 3.1273101 DM DM 9 F NA 59936 2007.083 70.00000 8.127310 1.3812457 DM Dead 9 F NA 600111 2000.042 52.71184 0.000000 1.0000000 DM DM 27 M NA 601112 2001.042 53.71184 1.000000 1.0000000 DM DM 27 M NA 602113 2002.042 54.71184 2.000000 0.2881588 DM DM 27 M NA 603114 2002.331 55.00000 2.288159 2.7118412 DM DM 27 M NA 604115 2005.042 57.71184 5.000000 2.2881588 DM DM 27 M NA 605116 2007.331 60.00000 7.288159 2.6666667 DM DM 27 M NA 606229 1998.249 61.85626 0.000000 1.0000000 DM DM 52 F 2004.804 607230 1999.249 62.85626 1.000000 1.0000000 DM DM 52 F 2004.804 608231 2000.249 63.85626 2.000000 1.1437372 DM DM 52 F 2004.804 609232 2001.393 65.00000 3.143737 1.8562628 DM DM 52 F 2004.804 610233 2003.249 66.85626 5.000000 1.5544148 DM Ins 52 F 2004.804 61143629 2004.804 68.41068 6.554415 1.5893224 Ins Ins 52 F 2004.804 61243630 2006.393 70.00000 8.143737 1.8562628 Ins Ins 52 F 2004.804 61343631 2008.249 71.85626 10.000000 1.7481177 Ins Ins 52 F 2004.804 6142084 1998.260 62.37919 0.000000 1.0000000 DM DM 484 F 2003.960 6152085 1999.260 63.37919 1.000000 1.0000000 DM DM 484 F 2003.960 6162086 2000.260 64.37919 2.000000 0.6208077 DM DM 484 F 2003.960 6172087 2000.881 65.00000 2.620808 2.3791923 DM DM 484 F 2003.960 6182088 2003.260 67.37919 5.000000 0.7002053 DM Ins 484 F 2003.960 61945484 2003.960 68.07940 5.700205 1.9206023 Ins Ins 484 F 2003.960 62045485 2005.881 70.00000 7.620808 2.3791923 Ins Ins 484 F 2003.960 62145486 2008.260 72.37919 10.000000 0.9295003 Ins Dead 484 F 2003.960 622\end{Soutput} 623\end{Schunk} 624Thus it does not matter in which order we use \texttt{splitLexis} and 625\texttt{cutLexis}. Mathematicians would say that \texttt{splitLexis} 626and \texttt{cutLexis} are commutative. 627 628Note in \texttt{lex.id}=484, that follow-up subsequent to the event is 629classified as being in state \texttt{Ins}, but that the final 630transition to state \texttt{Dead} is preserved. This is the point of 631the \texttt{precursor.states=} argument. It names the states (in this 632case \texttt{DM}) that will be over-written by \texttt{new.state} (in 633this case \texttt{Ins}), while the state \texttt{Dead} should not be 634updated even if it is after the time where the persons moves to state 635\texttt{Ins}. In other words, only state \texttt{DM} is a precursor to 636state \texttt{Ins}, state \texttt{Dead} is always subsequent to state 637\texttt{Ins}. 638 639Note that we defined a new time scale, \texttt{tfI}, using the argument 640\texttt{new.scale=tfI}. This has a special status relative to the other 641three time scales, it is defined as time since entry into a state, 642namely \texttt{Ins}, this is noted in the time scale part of the 643summary of \texttt{Lexis} object --- the information sits in the 644attribute \texttt{time.since} of the \texttt{Lexis} object, which can 645be accessed by the function \texttt{timeSince()} or through the \texttt{summary()}: 646\begin{Schunk} 647\begin{Sinput} 648> summary( dmS2C, timeScales=TRUE ) 649\end{Sinput} 650\begin{Soutput} 651Transitions: 652 To 653From DM Ins Dead Records: Events: Risk time: Persons: 654 DM 35135 1694 2048 38877 3742 45885.49 9899 655 Ins 0 5762 451 6213 451 8387.77 1791 656 Sum 35135 7456 2499 45090 4193 54273.27 9996 657 658Timescales: 659 per age tfD tfI 660 "" "" "" "Ins" 661\end{Soutput} 662\end{Schunk} 663Finally we can get a quick overview of the states and transitions by 664using \texttt{boxes} --- \texttt{scale.R} scales transition rates to 665rates per 1000 PY: 666\begin{Schunk} 667\begin{Sinput} 668> boxes( dmC, boxpos=TRUE, scale.R=1000, show.BE=TRUE ) 669\end{Sinput} 670\end{Schunk} 671\insfig{box1}{0.8}{States, person years, transitions and rates in the 672 cut dataset. The numbers \emph{in} the boxes are person-years and 673 the number of persons \texttt{B}eginning, resp. \texttt{E}nding 674 their follow-up in each state (triggered by \textrm{\tt 675 show.BE=TRUE}). The numbers at the arrows are the number of 676 transitions and transition rates per 1000 (triggered by \textrm{\tt 677 scale.R=1000}).} 678 679\chapter{Modeling rates from \texttt{Lexis} objects} 680 681\section{Covariates} 682 683In the dataset \texttt{dmS2C} there are three types of covariates that 684can be used to describe mortality rates: 685\begin{enumerate} 686\item time-dependent covariates 687\item time scales 688\item fixed covariates 689\end{enumerate} 690 691There is only one time-dependent covariate here, namely 692\texttt{lex.Cst}, the current state of the person's follow up; it 693takes the values \texttt{DM} and \texttt{Ins} according to whether the 694person has ever purchased insulin at a given time of follow-up. 695 696The time-scales are obvious candidates for explanatory variables for 697the rates, notably age and time from diagnosis (duration of diabetes) 698and insulin. 699 700\subsection{Time scales as covariates} 701 702If we want to model the effect of the time scale variables on 703occurrence rates, we will for each interval use either the value of 704the left endpoint in each interval or the middle. There is a function 705\texttt{timeBand} which returns either of these: 706\begin{Schunk} 707\begin{Sinput} 708> timeBand( dmS2C, "age", "middle" )[1:10] 709\end{Sinput} 710\begin{Soutput} 711 [1] 57.5 57.5 62.5 62.5 62.5 67.5 67.5 62.5 67.5 67.5 712\end{Soutput} 713\begin{Sinput} 714> # For nice printing and column labelling we use the data.frame() function: 715> data.frame( dmS2C[,c("per","age","tfD","lex.dur")], 716+ mid.age=timeBand( dmS2C, "age", "middle" ), 717+ mid.t=timeBand( dmS2C, "tfD", "middle" ), 718+ left.t=timeBand( dmS2C, "tfD", "left" ), 719+ right.t=timeBand( dmS2C, "tfD", "right" ), 720+ fact.t=timeBand( dmS2C, "tfD", "factor" ) )[1:15,] 721\end{Sinput} 722\begin{Soutput} 723 per age tfD lex.dur mid.age mid.t left.t right.t fact.t 7241 1998.917 58.66119 0.0000000 1.00000000 57.5 0.5 0 1 (0,1] 7252 1999.917 59.66119 1.0000000 0.33880903 57.5 1.5 1 2 (1,2] 7263 2000.256 60.00000 1.3388090 0.66119097 62.5 1.5 1 2 (1,2] 7274 2000.917 60.66119 2.0000000 3.00000000 62.5 3.5 2 5 (2,5] 7285 2003.917 63.66119 5.0000000 1.33880903 62.5 7.5 5 10 (5,10] 7296 2005.256 65.00000 6.3388090 3.66119097 67.5 7.5 5 10 (5,10] 7307 2008.917 68.66119 10.0000000 1.08008214 67.5 15.0 10 20 (10,20] 7318 2003.309 64.09035 0.0000000 0.90965092 62.5 0.5 0 1 (0,1] 7329 2004.218 65.00000 0.9096509 0.09034908 67.5 0.5 0 1 (0,1] 73310 2004.309 65.09035 1.0000000 1.00000000 67.5 1.5 1 2 (1,2] 73411 2005.309 66.09035 2.0000000 3.00000000 67.5 3.5 2 5 (2,5] 73512 2008.309 69.09035 5.0000000 0.90965092 67.5 7.5 5 10 (5,10] 73613 2009.218 70.00000 5.9096509 0.77891855 72.5 7.5 5 10 (5,10] 73714 2004.552 86.25051 0.0000000 1.00000000 87.5 0.5 0 1 (0,1] 73815 2005.552 87.25051 1.0000000 1.00000000 87.5 1.5 1 2 (1,2] 739\end{Soutput} 740\end{Schunk} 741Note that the values of these functions are characteristics of the 742intervals defined by \texttt{breaks=}, \emph{not} the midpoints nor 743left or right endpoints of the actual follow-up intervals (which would 744be \texttt{tfD} and \texttt{tfD+lex.dur}, respectively). 745 746These functions are intended for modeling time scale variables as 747factors (categorical variables) in which case the coding must be 748independent of the censoring and mortality pattern --- it should only 749depend on the chosen grouping of the time scale. Modeling time scales as 750\emph{quantitative} should not be based on these codings but directly 751on the values of the time-scale variables, notably the left endpoints 752of the intervals. 753 754\subsection{Differences between time scales} 755 756Apparently, the only fixed variable is \texttt{sex}, but formally the 757dates of birth (\texttt{dobth}), diagnosis (\texttt{dodm}) and first 758insulin purchase (\texttt{doins}) are fixed covariates too. They can 759be constructed as origins of time scales referred to the calendar time 760scale. Likewise, and possibly of greater interest, we can consider 761these origins on the age scale, calculated as the difference between 762age and another time scale. 763 764These would then be age at birth (hardly relevant since it is the same 765for all persons), age at diabetes 766diagnosis and age at insulin treatment. 767 768\subsection{Keeping the relation between time scales} 769 770The midpoint (as well as the right interval endpoint) should be used 771with caution if the variable age at diagnosis \texttt{dodm-dobth} is 772modeled too; the age at diabetes is logically equal to the difference 773between current age (\texttt{age}) and time since diabetes diagnosis 774(\texttt{tfD}): 775\begin{Schunk} 776\begin{Sinput} 777> summary( (dmS2$age-dmS2$tfD) - (dmS2$dodm-dmS2$dobth) ) 778\end{Sinput} 779\begin{Soutput} 780 Min. 1st Qu. Median Mean 3rd Qu. Max. 781 0 0 0 0 0 0 782\end{Soutput} 783\end{Schunk} 784This calculation refers to the \emph{start} of each interval --- which 785are in the time scale variables in the \texttt{Lexis} object. But when 786using the middle of the intervals, this relationship is not preserved: 787\begin{Schunk} 788\begin{Sinput} 789> summary( timeBand( dmS2, "age", "middle" ) - 790+ timeBand( dmS2, "tfD", "middle" ) - (dmS2$dodm-dmS2$dobth) ) 791\end{Sinput} 792\begin{Soutput} 793 Min. 1st Qu. Median Mean 3rd Qu. Max. 794-7.4870 -2.0862 -0.3765 Inf 1.3641 Inf 795\end{Soutput} 796\end{Schunk} 797If all three variables are to be included in a model, we must make 798sure that the \emph{substantial} relationship between the variables be 799maintained. One way is to recompute age at diabetes diagnosis from the two 800midpoint variables, but more straightforward would be to use the left 801endpoint of the intervals, that is the time scale variables in the 802\texttt{Lexis} object. 803 804If we dissolve the relationship between the variables \texttt{age}, 805\texttt{tfD} and age at diagnosis by grouping we may obtain 806identifiability of the three separate effects, but it will be at the 807price of an arbitrary allocation of a linear trend between them. 808 809For the sake of clarity, consider current age, $a$, duration of 810disease, $d$ and age at diagnosis $e$, where 811\[ 812 \text{current age} = 813 \text{age at diagnosis} + 814 \text{disease duration}, 815 \quad \text{\ie} \quad a=e+d \quad 816 \Leftrightarrow \quad e+d-a=0 817\] 818If we model the effect of the quantitative variables $a$, $e$ and $d$ 819on the log-rates by three functions $f$, $g$ and $h$: 820$ \log(\lambda)=f(a)+g(d)+h(e) $ 821then for any $\kappa$: 822\begin{align*} 823 \log(\lambda) & = f(a)+g(d)+h(e)+\kappa(e+d-a)\\ 824 & = 825 \big(f(a)-\kappa a \big)+ 826 \big(g(d)+\kappa d \big)+ 827 \big(h(e)+\kappa e \big) \\ 828& = \tilde f(a)+ \tilde g(d)+ \tilde h(e) 829\end{align*} 830In practical modeling this will turn up as a singular model matrix 831with one parameter aliased, corresponding to some arbitrarily chosen 832value of $\kappa$ (depending on software conventions for singular 833models). This phenomenon is well known from age-period-cohort models. 834 835Thus we see that we can move any slope around between the three terms, 836so if we achieve identifiability by using grouping of one of the 837variables we will in reality have settled for a particular value of 838$\kappa$, without known why we chose just that. The solution is to 839resort to predictions which are independent of the particular 840parametrization or choose a particular parametrization with explicit constraints. 841 842\section{Modeling of rates} 843 844As mentioned, the purpose of subdividing follow-up data in smaller 845intervals is to be able to model effects of time scale variables as 846parametric functions. When we split along a time scale we can get 847intervals that are as small as we want; if they are sufficiently 848small, an assumption of constant rates in each interval becomes 849reasonable. 850 851In a model that assumes a constant occurrence rate in each of the 852intervals the likelihood contribution from each interval is the same 853as the likelihood contribution from a Poisson variate $D$, say, with 854mean $\lambda \ell$ where $\lambda$ is the rate and $\ell$ is the 855interval length, and where the value of the variate $D$ is 1 or 0 856according to whether an event has occurred or not. Moreover, the 857likelihood contributions from all follow-up intervals from a single 858person are \emph{conditionally} independent (conditional on having 859survived till the start of the interval in question). This implies 860that the total contribution to the likelihood from a single person is 861a product of terms, and hence the same as the likelihood of a number 862of independent Poisson terms, one from each interval. 863 864Note that variables are neither Poisson distributed (\eg they can only 865ever assume values 0 or 1) nor independent --- it is only the 866likelihood for the follow-up data that happens to be the same as the 867likelihood from independent Poisson variates. Different models can 868have the same likelihood, a model cannot be inferred from the 869likelihood. 870 871Parametric modeling of the rates is obtained by using the 872\emph{values} of the time scales for each interval as \emph{quantitative} 873explanatory variables, using for example splines. And of course also 874the values of the fixed covariates and the time-dependent variables 875for each interval. Thus the model will be one where the rate is 876assumed constant in each (small) interval, but where a parametric form 877of the \emph{size} of the rate in each interval is imposed by the 878model, using the time scale as a quantitative covariate. 879 880\subsection{Interval length} 881 882In the first chapter we illustrated cutting and splitting by listing 883the results for a few individuals across a number of intervals. For 884illustrational purposes we used 5-year age bands to avoid excessive 885listings, but since the doubling time for mortality on the age scale 886is only slightly larger than 5 years, the assumption about constant 887rates in each interval would be pretty far fetched if we were to use 5 888year intervals. 889 890Thus, for modeling purposes we split the follow-up in 3 month 891intervals. When we use intervals of 3 months length it is superfluous 892to split along multiple time scales --- the precise location of 893tightly spaced splits will be irrelevant from any practical point of 894view. \texttt{splitLexis} and \texttt{splitMulti} will allocate the 895actual split times for all of the time scale variables, so these can be 896used directly in modeling. 897 898So we split the cut dataset in 3 months intervals along the age 899scale: 900\begin{Schunk} 901\begin{Sinput} 902> dmCs <- splitMulti( dmC, age = seq(0,110,1/4) ) 903> summary( dmCs, t=T ) 904\end{Sinput} 905\begin{Soutput} 906Transitions: 907 To 908From DM Ins Dead Records: Events: Risk time: Persons: 909 DM 189669 1694 2048 193411 3742 45885.49 9899 910 Ins 0 34886 451 35337 451 8387.77 1791 911 Sum 189669 36580 2499 228748 4193 54273.27 9996 912 913Timescales: 914 per age tfD tfI 915 "" "" "" "Ins" 916\end{Soutput} 917\end{Schunk} 918We see that we now have 228,748 records and 9996 persons, so about 23 919records per person. The total risk time is 54,275 years, a bit less 920than 3 months on average per record as expected. 921 922\subsection{Practicalities for splines} 923 924In this study we want to look at how mortality depend on age 925(\texttt{age}) and time since start of insulin use (\texttt{tfI}). If 926we want to use splines in the description we must allocate knots for 927anchoring the splines at each of the time scales, either by some 928\textit{ad hoc} method or by using some sort of penalized splines as 929for example by \texttt{gam}; the latter will not be treated here; it 930belongs in the realm of the \texttt{mgcv} package. 931 932Here we shall use the former approach and allocate 5 knots on each of 933the time-scales. We allocate knots so that we have the events evenly 934distributed between the knots. Since the insulin state starts at 0 for 935all individuals we include 0 as the first knot, such that any set of natural 936splines with these knots will have the value 0 at 0 on the time 937scale. 938\begin{Schunk} 939\begin{Sinput} 940> ( a.kn <- with( subset( dmCs, lex.Xst=="Dead" ), 941+ quantile( age+lex.dur, (1:5-0.5)/5 ) ) ) 942\end{Sinput} 943\begin{Soutput} 944 10% 30% 50% 70% 90% 94560.29350 71.31937 77.72758 82.72745 89.86393 946\end{Soutput} 947\begin{Sinput} 948> ( i.kn <- c( 0, 949+ with( subset( dmCs, lex.Xst=="Dead" & lex.Cst=="Ins" ), 950+ quantile( tfI+lex.dur, (1:4)/5 ) ) ) ) 951\end{Sinput} 952\begin{Soutput} 953 20% 40% 60% 80% 9540.0000000 0.3093771 1.1307324 2.5489391 4.9117043 955\end{Soutput} 956\end{Schunk} 957In the \texttt{Epi} package there is a convenience wrapper, 958\texttt{Ns}, for the \texttt{n}atural \texttt{s}pline generator 959\texttt{ns}, that takes the smallest and the largest of a set of 960supplied knots to be the boundary knots, so the explicit definition of 961the boundary knots becomes superfluous. 962 963Note that it is a feature of the \texttt{Ns} (via the features of 964\texttt{ns}) that any generated spline function is 0 at the leftmost 965knot. 966 967\subsection{Poisson models} 968 969A model that describes mortality rates as only a function of age would 970then be: 971\begin{Schunk} 972\begin{Sinput} 973> ma <- glm( (lex.Xst=="Dead") ~ Ns(age,knots=a.kn), 974+ family = poisson, 975+ offset = log(lex.dur), 976+ data = dmCs ) 977> summary( ma ) 978\end{Sinput} 979\begin{Soutput} 980Call: 981glm(formula = (lex.Xst == "Dead") ~ Ns(age, knots = a.kn), family = poisson, 982 data = dmCs, offset = log(lex.dur)) 983 984Deviance Residuals: 985 Min 1Q Median 3Q Max 986-0.5883 -0.1688 -0.1144 -0.0766 4.5958 987 988Coefficients: 989 Estimate Std. Error z value Pr(>|z|) 990(Intercept) -3.82830 0.03861 -99.16 <2e-16 991Ns(age, knots = a.kn)1 1.36254 0.08723 15.62 <2e-16 992Ns(age, knots = a.kn)2 1.49446 0.06845 21.83 <2e-16 993Ns(age, knots = a.kn)3 2.63557 0.07058 37.34 <2e-16 994Ns(age, knots = a.kn)4 1.94173 0.05769 33.66 <2e-16 995 996(Dispersion parameter for poisson family taken to be 1) 997 998 Null deviance: 27719 on 228747 degrees of freedom 999Residual deviance: 25423 on 228743 degrees of freedom 1000AIC: 30431 1001 1002Number of Fisher Scoring iterations: 8 1003\end{Soutput} 1004\end{Schunk} 1005The offset, \texttt{log(lex.dur)} comes from the fact that the 1006likelihood for the follow-up data during $\ell$ time is the same as 1007that for independent Poisson variates with mean $\lambda \ell$, and 1008that the default link function for the Poisson family is the log, so 1009that we are using a linear model for the log-mean, 1010$\log(\lambda) + \log(\ell)$. But when we want a model for the 1011log-rate ($\log(\lambda)$), the term $\log(\ell)$ must still be 1012included as a covariate, but with regression coefficient fixed to 1; a 1013so-called \emph{offset}. This is however a technicality; it just 1014exploits that the likelihood of a particular Poisson model and that of 1015the rates model is the same. 1016 1017In the \texttt{Epi} package is a \texttt{glm} family, \texttt{poisreg} 1018that has a more intuitive interface, where the response is a 2-column 1019matrix of events and person-time, respectively. This is in concert 1020with the fact that the outcome variable in follow-up studies is 1021bivariate: (event, risk time). 1022\begin{Schunk} 1023\begin{Sinput} 1024> Ma <- glm( cbind(lex.Xst=="Dead",lex.dur) ~ Ns(age,knots=a.kn), 1025+ family = poisreg, data = dmCs ) 1026> summary( Ma ) 1027\end{Sinput} 1028\begin{Soutput} 1029Call: 1030glm(formula = cbind(lex.Xst == "Dead", lex.dur) ~ Ns(age, knots = a.kn), 1031 family = poisreg, data = dmCs) 1032 1033Deviance Residuals: 1034 Min 1Q Median 3Q Max 1035-0.5883 -0.1688 -0.1144 -0.0766 4.5958 1036 1037Coefficients: 1038 Estimate Std. Error z value Pr(>|z|) 1039(Intercept) -3.82830 0.03861 -99.15 <2e-16 1040Ns(age, knots = a.kn)1 1.36254 0.08723 15.62 <2e-16 1041Ns(age, knots = a.kn)2 1.49446 0.06845 21.83 <2e-16 1042Ns(age, knots = a.kn)3 2.63557 0.07058 37.34 <2e-16 1043Ns(age, knots = a.kn)4 1.94173 0.05769 33.66 <2e-16 1044 1045(Dispersion parameter for poisson family taken to be 1) 1046 1047 Null deviance: 27719 on 228747 degrees of freedom 1048Residual deviance: 25423 on 228743 degrees of freedom 1049AIC: 30431 1050 1051Number of Fisher Scoring iterations: 7 1052\end{Soutput} 1053\end{Schunk} 1054Exploiting the multistate structure in the \texttt{Lexis} object 1055there is a multistate convenience wrapper for \texttt{glm} with the 1056\texttt{poisreg} family, that just requires specification of the 1057transitions in terms of \texttt{from} and \texttt{to}. Although it is 1058called \texttt{glm.Lexis} it is \emph{not} an S3 method for 1059\texttt{Lexis} objects: 1060\begin{Schunk} 1061\begin{Sinput} 1062> Xa <- glm.Lexis( dmCs, from="DM", to="Dead", 1063+ formula = ~ Ns(age,knots=a.kn) ) 1064\end{Sinput} 1065\begin{Soutput} 1066stats::glm Poisson analysis of Lexis object dmCs with log link: 1067Rates for the transition: DM->Dead 1068\end{Soutput} 1069\end{Schunk} 1070The result is a \texttt{glm} object but with an extra attribute, \texttt{Lexis}: 1071\begin{Schunk} 1072\begin{Sinput} 1073> attr( Xa, "Lexis" ) 1074\end{Sinput} 1075\begin{Soutput} 1076$data 1077[1] "dmCs" 1078 1079$trans 1080[1] "DM->Dead" 1081 1082$formula 1083~Ns(age, knots = a.kn) 1084<environment: 0xb8adec8> 1085 1086$scale 1087[1] 1 1088\end{Soutput} 1089\end{Schunk} 1090There are similar wrappers for \texttt{gam} and \texttt{coxph} models, 1091\texttt{gam.Lexis} and \texttt{coxph.Lexis}, but these will not be 1092elaborated in detail. 1093 1094The \texttt{from=} and \texttt{to=} can even be omitted, in which case 1095all possible transitions \emph{into} any of the absorbing states is 1096modeled: 1097\begin{Schunk} 1098\begin{Sinput} 1099> xa <- glm.Lexis( dmCs, formula = ~ Ns(age,knots=a.kn) ) 1100\end{Sinput} 1101\begin{Soutput} 1102stats::glm Poisson analysis of Lexis object dmCs with log link: 1103Rates for transitions: DM->Dead, Ins->Dead 1104\end{Soutput} 1105\end{Schunk} 1106We can check if the four models fitted are the same: 1107\begin{Schunk} 1108\begin{Sinput} 1109> c( deviance(ma), deviance(Ma), deviance(Xa), deviance(xa) ) 1110\end{Sinput} 1111\begin{Soutput} 1112[1] 25422.92 25422.92 20902.31 25422.92 1113\end{Soutput} 1114\end{Schunk} 1115Oops! the model \texttt{Xa} is apparently not the same as the other 1116three? This is because the explicit specification 1117\verb|from="DM", to="Dead"|, omits modeling contributions from the 1118$\mathtt{Ins}\rightarrow\mathtt{Dead}$ transition --- the output 1119actually said so --- see also figure \ref{fig:box1} on 1120p. \pageref{fig:box1}. The other three models all use both transitions 1121--- and assume that the two transition rates are the same, \ie that 1122start of insulin has no effect on mortality. We shall relax this 1123assumption later. 1124 1125The parameters from the model do not have any direct interpretation 1126\textit{per se}, but we can compute the estimated mortality rates for 1127a range of ages using \texttt{ci.pred} with a suitably defined 1128prediction data frame. 1129 1130Note that if we use the \texttt{poisson} family of models, we must 1131specify all covariates in the model, including the variable in the 1132offset, \texttt{lex.dur} (remember that this was a covariate with 1133coefficient fixed at 1). We set the latter to 1000, because we want the 1134mortality rates per 1000 person-years. Using the \texttt{poisreg} 1135family, the prediction will ignore any value of \texttt{lex.dur} 1136specified in the prediction data frame, the returned rates will be per 1137unit in which \texttt{lex.dur} is recorded. 1138\begin{Schunk} 1139\begin{Sinput} 1140> nd <- data.frame( age=40:85, lex.dur=1000 ) 1141> pr.0 <- ci.pred( ma, newdata = nd ) # mortality per 100 PY 1142> pr.a <- ci.pred( Ma, newdata = nd )*1000 # mortality per 100 PY 1143> summary(pr.0/pr.a) 1144\end{Sinput} 1145\begin{Soutput} 1146 Estimate 2.5% 97.5% 1147 Min. :1 Min. :1 Min. :1 1148 1st Qu.:1 1st Qu.:1 1st Qu.:1 1149 Median :1 Median :1 Median :1 1150 Mean :1 Mean :1 Mean :1 1151 3rd Qu.:1 3rd Qu.:1 3rd Qu.:1 1152 Max. :1 Max. :1 Max. :1 1153\end{Soutput} 1154\begin{Sinput} 1155> matshade( nd$age, pr.a, plot=TRUE, 1156+ type="l", lty=1, 1157+ log="y", xlab="Age (years)", 1158+ ylab="DM mortality per 1000 PY") 1159\end{Sinput} 1160\end{Schunk} 1161\insfig{pr-a}{0.8}{Mortality among Danish diabetes patients by age 1162 with 95\% CI as shaded area. We see that the rates increase linearly 1163 on the log-scale, that is exponentially by age.} 1164 1165\section{Time dependent covariate} 1166 1167A Poisson model approach to mortality by insulin status, would be to 1168assume that the rate-ratio between patients on insulin and not on 1169insulin is a fixed quantity, independent of time since start of insulin, 1170independent of age. This is commonly termed a proportional hazards 1171assumption, because the rates (hazards) in the two groups are 1172proportional along the age (baseline time) scale. 1173\begin{Schunk} 1174\begin{Sinput} 1175> pm <- glm( cbind(lex.Xst=="Dead",lex.dur) ~ Ns(age,knots=a.kn) 1176+ + lex.Cst + sex, 1177+ family=poisreg, data = dmCs ) 1178> round( ci.exp( pm ), 3 ) 1179\end{Sinput} 1180\begin{Soutput} 1181 exp(Est.) 2.5% 97.5% 1182(Intercept) 0.022 0.021 0.024 1183Ns(age, knots = a.kn)1 4.248 3.581 5.040 1184Ns(age, knots = a.kn)2 5.008 4.376 5.731 1185Ns(age, knots = a.kn)3 16.832 14.624 19.373 1186Ns(age, knots = a.kn)4 7.994 7.126 8.968 1187lex.CstIns 1.985 1.791 2.200 1188sexF 0.668 0.617 0.724 1189\end{Soutput} 1190\end{Schunk} 1191So we see that persons on insulin have about twice the mortality of 1192persons not on insulin and that women have 2/3 the mortality of men. 1193 1194\subsection{Time since insulin start} 1195 1196If we want to test whether the excess mortality depends on the time 1197since start if insulin treatment, we can add a spline terms in 1198\texttt{tfI}. But since \texttt{tfI} is a time scale defined as time 1199since entry into a new state (\texttt{Ins}), the variable \texttt{tfI} 1200will be missing for those in the \texttt{DM} state, so before modeling 1201we must set the \texttt{NA}s to 0, which we do with \texttt{tsNA20} 1202(acronym for \texttt{t}ime\texttt{s}cale \texttt{NA}s to zero): 1203\begin{Schunk} 1204\begin{Sinput} 1205> pm <- glm( cbind(lex.Xst=="Dead",lex.dur) ~ Ns(age,knots=a.kn) 1206+ + Ns(tfI,knots=i.kn) 1207+ + lex.Cst + sex, 1208+ family=poisreg, data = tsNA20(dmCs) ) 1209\end{Sinput} 1210\end{Schunk} 1211As noted before we could do this simpler with \texttt{glm.Lexis}, even 1212without the \texttt{from=} and \texttt{to=} arguments, because we are 1213modeling all transitions \emph{into} the absorbing state 1214(\texttt{Dead}): 1215\begin{Schunk} 1216\begin{Sinput} 1217> Pm <- glm.Lexis( tsNA20(dmCs), 1218+ form = ~ Ns(age,knots=a.kn) 1219+ + Ns(tfI,knots=i.kn) 1220+ + lex.Cst + sex ) 1221\end{Sinput} 1222\begin{Soutput} 1223stats::glm Poisson analysis of Lexis object tsNA20(dmCs) with log link: 1224Rates for transitions: DM->Dead, Ins->Dead 1225\end{Soutput} 1226\begin{Sinput} 1227> c( deviance(Pm), deviance(pm) ) 1228\end{Sinput} 1229\begin{Soutput} 1230[1] 25096.33 25096.33 1231\end{Soutput} 1232\begin{Sinput} 1233> identical( model.matrix(Pm), model.matrix(pm) ) 1234\end{Sinput} 1235\begin{Soutput} 1236[1] TRUE 1237\end{Soutput} 1238\end{Schunk} 1239The coding of the effect of \texttt{tfI} is so that the value is 0 at 12400, so the meaning of the estimate of the effect of \texttt{lex.Cst} is 1241the RR between persons with and without insulin, immediately after 1242start of insulin: 1243\begin{Schunk} 1244\begin{Sinput} 1245> round( ci.exp( Pm, subset="ex" ), 3 ) 1246\end{Sinput} 1247\begin{Soutput} 1248 exp(Est.) 2.5% 97.5% 1249lex.CstIns 5.632 4.430 7.16 1250sexF 0.674 0.622 0.73 1251\end{Soutput} 1252\end{Schunk} 1253We see that the effect of sex is pretty much the same as before, but 1254the effect of \texttt{lex.Cst} is much larger, it now refers to a 1255different quantity, namely the RR at \texttt{tfI}=0. If we want to see 1256the effect of time since insulin, it is best viewed jointly with the 1257effect of age: 1258\begin{Schunk} 1259\begin{Sinput} 1260> ndI <- data.frame( expand.grid( tfI=c(NA,seq(0,15,0.1)), 1261+ ai=seq(40,80,10) ), 1262+ sex="M", 1263+ lex.Cst="Ins" ) 1264> ndI <- transform( ndI, age=ai+tfI ) 1265> head( ndI ) 1266\end{Sinput} 1267\begin{Soutput} 1268 tfI ai sex lex.Cst age 12691 NA 40 M Ins NA 12702 0.0 40 M Ins 40.0 12713 0.1 40 M Ins 40.1 12724 0.2 40 M Ins 40.2 12735 0.3 40 M Ins 40.3 12746 0.4 40 M Ins 40.4 1275\end{Soutput} 1276\begin{Sinput} 1277> ndA <- data.frame( age= seq(40,100,0.1), tfI=0, lex.Cst="DM", sex="M" ) 1278> pri <- ci.pred( Pm, ndI ) * 1000 1279> pra <- ci.pred( Pm, ndA ) * 1000 1280> matshade( ndI$age, pri, plot=TRUE, las=1, 1281+ xlab="Age (years)", ylab="DM mortality per 1000 PY", 1282+ log="y", lty=1, col="blue" ) 1283> matshade( ndA$age, pra ) 1284\end{Sinput} 1285\end{Schunk} 1286\insfig{ins-time}{0.8}{Mortality rates of persons on insulin, starting 1287insulin at ages 40,50,\ldots,80 (blue), compared with persons not on 1288insulin (black curve). Shaded areas are 95\% CI.} 1289 1290In figure \ref{fig:ins-time}, p. \pageref{fig:ins-time}, we see that 1291mortality is high just after insulin start, but falls by almost a 1292factor 3 during the first year. Also we see that there is a tendency 1293that mortality in a given age is smallest for those with the longest 1294duration of insulin use. 1295 1296\section{The Cox model} 1297 1298Note that in the Cox-model the age is used as response variable, 1299slightly counter-intuitive. Hence the age part of the linear predictors 1300is not in that model: 1301\begin{Schunk} 1302\begin{Sinput} 1303> library( survival ) 1304> cm <- coxph( Surv(age,age+lex.dur,lex.Xst=="Dead") ~ 1305+ Ns(tfI,knots=i.kn) + lex.Cst + sex, 1306+ data = tsNA20(dmCs) ) 1307\end{Sinput} 1308\end{Schunk} 1309There is also a multistate wrapper for Cox models, requiring a 1310l.h.s. side for the \texttt{formula=} argument: 1311\begin{Schunk} 1312\begin{Sinput} 1313> Cm <- coxph.Lexis( tsNA20(dmCs), 1314+ form= age ~ Ns(tfI,knots=i.kn) + lex.Cst + sex ) 1315\end{Sinput} 1316\begin{Soutput} 1317survival::coxph analysis of Lexis object tsNA20(dmCs): 1318Rates for transitions DM->Dead, Ins->Dead 1319Baseline timescale: age 1320\end{Soutput} 1321\begin{Sinput} 1322> cbind( ci.exp( cm ), ci.exp( Cm ) ) 1323\end{Sinput} 1324\begin{Soutput} 1325 exp(Est.) 2.5% 97.5% exp(Est.) 2.5% 97.5% 1326Ns(tfI, knots = i.kn)1 0.2984062 0.19417148 0.4585960 0.2984062 0.19417148 0.4585960 1327Ns(tfI, knots = i.kn)2 0.3871151 0.29011380 0.5165495 0.3871151 0.29011380 0.5165495 1328Ns(tfI, knots = i.kn)3 0.1239128 0.06287008 0.2442238 0.1239128 0.06287008 0.2442238 1329Ns(tfI, knots = i.kn)4 0.4405121 0.34839015 0.5569932 0.4405121 0.34839015 0.5569932 1330lex.CstIns 5.6700284 4.45011220 7.2243623 5.6700284 4.45011220 7.2243623 1331lex.CstDead 1.0000000 1.00000000 1.0000000 1.0000000 1.00000000 1.0000000 1332sexF 0.6753202 0.62316569 0.7318397 0.6753202 0.62316569 0.7318397 1333\end{Soutput} 1334\end{Schunk} 1335We can compare the estimates from the Cox model with those from the 1336Poisson model --- we must add \texttt{NA}s because the Cox-model does 1337not give the parameters for the baseline time scale (\texttt{age}), but 1338also remove one of the parameters, because \texttt{coxph} parametrizes 1339factors (in this case \texttt{lex.Cst}) by all defined levels and not 1340only by the levels present in the dataset at hand (note the line of 1341\texttt{1.0000000}s in the print above): 1342\begin{Schunk} 1343\begin{Sinput} 1344> round( cbind( ci.exp( Pm ), 1345+ rbind( matrix(NA,5,3), 1346+ ci.exp( cm )[-6,] ) ), 3 ) 1347\end{Sinput} 1348\begin{Soutput} 1349 exp(Est.) 2.5% 97.5% exp(Est.) 2.5% 97.5% 1350(Intercept) 0.022 0.021 0.024 NA NA NA 1351Ns(age, knots = a.kn)1 4.208 3.546 4.993 NA NA NA 1352Ns(age, knots = a.kn)2 5.012 4.380 5.736 NA NA NA 1353Ns(age, knots = a.kn)3 16.560 14.386 19.063 NA NA NA 1354Ns(age, knots = a.kn)4 7.921 7.061 8.885 NA NA NA 1355Ns(tfI, knots = i.kn)1 0.298 0.194 0.458 0.298 0.194 0.459 1356Ns(tfI, knots = i.kn)2 0.385 0.289 0.514 0.387 0.290 0.517 1357Ns(tfI, knots = i.kn)3 0.125 0.064 0.246 0.124 0.063 0.244 1358Ns(tfI, knots = i.kn)4 0.438 0.346 0.553 0.441 0.348 0.557 1359lex.CstIns 5.632 4.430 7.160 5.670 4.450 7.224 1360sexF 0.674 0.622 0.730 0.675 0.623 0.732 1361\end{Soutput} 1362\end{Schunk} 1363Thus we see that the Poisson and Cox gives pretty much the same 1364results. You may argue that Cox requires a smaller dataset, because 1365there is no need to subdivide data in small intervals \emph{before} 1366insulin use. But certainly the time \emph{after} insulin inception need 1367to be if the effect of this time should be modeled. 1368 1369The drawback of the Cox-modeling is that it is not possible to show 1370the absolute rates as we did in figure \ref{fig:ins-time} on 1371\pageref{fig:ins-time}. 1372 1373\section{Marginal effect of time since insulin} 1374 1375When we plot the marginal effect of \texttt{tfI} from the two models 1376we get pretty much the same; here we plot the RR relative to 1377\texttt{tfI}=2 years. Note that we are deriving the RR as the ratio of 1378two sets of predictions, from the data frames \texttt{nd} and 1379\texttt{nr} --- for further details consult the help page for 1380\texttt{ci.lin}, specifically the use of a list as the 1381\texttt{ctr.mat} argument: 1382\begin{Schunk} 1383\begin{Sinput} 1384> nd <- data.frame( tfI=seq(0,15,,151), lex.Cst="Ins", sex="M" ) 1385> nr <- data.frame( tfI= 2 , lex.Cst="Ins", sex="M" ) 1386> ppr <- ci.exp( pm, list(nd,nr), xvars="age" ) 1387> cpr <- ci.exp( cm, list(nd,nr) ) 1388> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" ) 1389> matshade( nd$tfI, cbind(ppr,cpr), plot=T, 1390+ lty=c(1,2), log="y", 1391+ xlab="Time since insulin (years)", ylab="Rate ratio") 1392> abline( h=1, lty=3 ) 1393\end{Sinput} 1394\end{Schunk} 1395\insfig{Ieff}{0.8}{The naked duration effects relative to 2 years of 1396 duration, black from Poisson model, red from Cox model. The two sets 1397 of estimates are identical, and so are the standard errors, so the 1398 two shaded areas overlap almost perfectly.} 1399 1400In figure \ref{fig:Ieff}, p. \pageref{fig:Ieff}, we see that the 1401duration effect is exactly the same from the two modeling approaches. 1402 1403We will also want the RR relative to the non-insulin users --- recall these 1404are coded 0 on the \texttt{tfI} variable: 1405\begin{Schunk} 1406\begin{Sinput} 1407> nd <- data.frame( tfI=seq(0,15,,151), lex.Cst="Ins", sex="M" ) 1408> nr <- data.frame( tfI= 0 , lex.Cst="DM" , sex="M" ) 1409> ppr <- ci.exp( pm, list(nd,nr), xvars="age" ) 1410> cpr <- ci.exp( cm, list(nd,nr) ) 1411> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" ) 1412> matshade( nd$tfI, cbind(ppr,cpr), 1413+ xlab="Time since insulin (years)", 1414+ ylab="Rate ratio relative to non-Insulin", 1415+ lty=c(1,2), log="y", plot=T ) 1416\end{Sinput} 1417\end{Schunk} 1418\insfig{IeffR}{0.8}{Insulin duration effect (state \textrm{\tt Ins}) 1419 relative to no insulin (state \textrm{\tt DM}), black from Poisson 1420 model, red from Cox model. The \emph{shape} is the same as the 1421 previous figure, but the RR is now relative to non-insulin, instead 1422 of relative to insulin users at 2 years duration. The two sets of 1423 estimates are identical, and so are the standard errors, so the two 1424 shaded areas overlap almost perfectly.} 1425 1426In figure \ref{fig:IeffR}, p. \pageref{fig:IeffR}, we see the effect 1427of increasing duration of insulin use \emph{for a fixed age} which is 1428a bit artificial, so we would like to see the \emph{joint} effects of 1429age and insulin duration. What we cannot see is how the duration 1430affects mortality relative to \texttt{current} age (at the age 1431attained at the same time as the attained \texttt{tfI}). 1432 1433Another way of interpreting this curve is as the rate ratio relative to a 1434person not on insulin, so we see that the RR (or hazard ratio, HR as 1435some call it) is over 5 at the start of insulin (the \texttt{lex.Cst} 1436estimate), and decreases to about 1.5 in the long term. 1437 1438Both figure \ref{fig:Ieff} and \ref{fig:IeffR} indicate a declining 1439RR by insulin duration, but only from figure \ref{fig:ins-time} it is 1440visible that mortality actually is \emph{in}creasing by age after some 14412 years after insulin start. This point would not be available if we 1442had only fitted a Cox model where we did not have access to the 1443baseline hazard as a function of age. 1444 1445\section{Age$\times$duration interaction} 1446 1447The model we fitted assumes that the RR is the same regardless of the 1448age at start of insulin --- the hazards are multiplicative. Sometimes 1449this is termed the proportional hazards assumption: For \emph{any} 1450fixed age the HR is the same as a function of time since insulin, and 1451vice versa. 1452 1453A more correct term would be ``main effects model'' --- there is no 1454interaction between age (the baseline time scale) and other 1455covariates. So there is really no need for the term ``proportional 1456hazards''; well defined and precise statistical terms for it has 1457existed for aeons. 1458 1459\subsection{Age at insulin start} 1460 1461In order to check the consistency of the multiplicativity assumption 1462across the spectrum of age at insulin inception, we can fit an 1463interaction model. One approach to this would be using a non-linear 1464effect of age at insulin use (for convenience we use the same knots as 1465for age) --- note that the prediction data frames are the same as we 1466used above, because we do not compute age at insulin use as a separate 1467variable, but rather enter it as the difference between current age 1468(\texttt{age}) and insulin duration (\texttt{tfI}). 1469 1470At first glance we might think of doing: 1471\begin{Schunk} 1472\begin{Sinput} 1473> imx <- glm.Lexis( tsNA20(dmCs), 1474+ formula = ~ Ns(age ,knots=a.kn) 1475+ + Ns( tfI,knots=i.kn) 1476+ + Ns(age-tfI,knots=a.kn) 1477+ + lex.Cst + sex ) 1478\end{Sinput} 1479\begin{Soutput} 1480stats::glm Poisson analysis of Lexis object tsNA20(dmCs) with log link: 1481Rates for transitions: DM->Dead, Ins->Dead 1482\end{Soutput} 1483\end{Schunk} 1484But this will fit a model with a rate-ratio between persons with and 1485without insulin that depends only on age at insulin start for the time 1486\emph{after} insulin start, the RR at \texttt{tfI}=0 will be the same 1487at any age, which really is not the type of interaction we wanted. 1488 1489We want the \texttt{age-tfI} term to be specific for the insulin 1490exposed so we may use one of two other approaches, that are 1491conceptually alike but mathematically different: 1492\begin{Schunk} 1493\begin{Sinput} 1494> Im <- glm.Lexis( tsNA20(dmCs), 1495+ formula = ~ Ns(age ,knots=a.kn) 1496+ + Ns( tfI,knots=i.kn) 1497+ + Ns((age-tfI)*(lex.Cst=="Ins"),knots=a.kn) 1498+ + lex.Cst + sex ) 1499\end{Sinput} 1500\begin{Soutput} 1501stats::glm Poisson analysis of Lexis object tsNA20(dmCs) with log link: 1502Rates for transitions: DM->Dead, Ins->Dead 1503\end{Soutput} 1504\begin{Sinput} 1505> im <- glm.Lexis( tsNA20(dmCs), 1506+ formula = ~ Ns(age ,knots=a.kn) 1507+ + Ns( tfI,knots=i.kn) 1508+ + lex.Cst:Ns(age-tfI,knots=a.kn) 1509+ + lex.Cst + sex ) 1510\end{Sinput} 1511\begin{Soutput} 1512stats::glm Poisson analysis of Lexis object tsNA20(dmCs) with log link: 1513Rates for transitions: DM->Dead, Ins->Dead 1514\end{Soutput} 1515\end{Schunk} 1516The first model (\texttt{Im}) has a common age-effect (\texttt{Ns(age,...}) for 1517persons with and without diabetes and a RR depending on insulin 1518duration \texttt{tfI} and age at insulin (\texttt{age-tfI}). Since the 1519linear effect of these two terms are in the model as well, a linear 1520trend in the RR by current age (\texttt{age}) is accommodated as well. 1521 1522The second model allows age-effects that differ non-linearly between 1523person with and without insulin, because the interaction term 1524\texttt{lex.Cst:Ns(age-tfI...} for persons not on insulin is merely an 1525age term (since \texttt{tfI} is coded 0 for all follow-up not on 1526insulin). 1527 1528We can compare the models fitted: 1529\begin{Schunk} 1530\begin{Sinput} 1531> anova( imx, Im, im, test='Chisq') 1532\end{Sinput} 1533\begin{Soutput} 1534Analysis of Deviance Table 1535 1536Model 1: cbind(trt(Lx$lex.Cst, Lx$lex.Xst) %in% trnam, Lx$lex.dur) ~ Ns(age, 1537 knots = a.kn) + Ns(tfI, knots = i.kn) + Ns(age - tfI, knots = a.kn) + 1538 lex.Cst + sex 1539Model 2: cbind(trt(Lx$lex.Cst, Lx$lex.Xst) %in% trnam, Lx$lex.dur) ~ Ns(age, 1540 knots = a.kn) + Ns(tfI, knots = i.kn) + Ns((age - tfI) * 1541 (lex.Cst == "Ins"), knots = a.kn) + lex.Cst + sex 1542Model 3: cbind(trt(Lx$lex.Cst, Lx$lex.Xst) %in% trnam, Lx$lex.dur) ~ Ns(age, 1543 knots = a.kn) + Ns(tfI, knots = i.kn) + lex.Cst:Ns(age - 1544 tfI, knots = a.kn) + lex.Cst + sex 1545 Resid. Df Resid. Dev Df Deviance Pr(>Chi) 15461 228734 25096 15472 228733 25087 1 8.9631 0.002755 15483 228730 25082 3 4.6804 0.196749 1549\end{Soutput} 1550\end{Schunk} 1551so we see that the models indeed are different, and moreover that the 1552last model does not provide substantial further improvement, by 1553allowing non-linear RR along the age-scale. 1554 1555We can illustrate the different estimated rates from the three models 1556in figure \ref{fig:dur-int}, p. \pageref{fig:dur-int}: 1557\begin{Schunk} 1558\begin{Sinput} 1559> pxi <- ci.pred( imx, ndI ) 1560> pxa <- ci.pred( imx, ndA ) 1561> pIi <- ci.pred( Im , ndI ) 1562> pIa <- ci.pred( Im , ndA ) 1563> pii <- ci.pred( im , ndI ) 1564> pia <- ci.pred( im , ndA ) 1565> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" ) 1566> matshade( ndI$age, cbind( pxi, pIi, pii)*1000, plot=T, log="y", 1567+ xlab="Age", ylab="Mortality per 1000 PY", 1568+ lty=1, lwd=2, col=c("blue","forestgreen","red"), alpha=0.1 ) 1569> matshade( ndA$age, cbind( pxa, pIa, pia)*1000, 1570+ lty=1, lwd=2, col=c("blue","forestgreen","red"), alpha=0.1 ) 1571\end{Sinput} 1572\end{Schunk} 1573\insfig{dur-int}{0.8}{Age at insulin as interaction between age and 1574 duration. Blue curves are from the naive interaction model 1575 \textrm{\tt imx} with identical $\RR$ at \textrm{\tt tfI}=0 at any 1576 age; green curves are from the interaction model with age at 1577 insulin, from the model \textrm{\tt Im} with only linear 1578 differences by age, and red lines from the full interaction model 1579 \textrm{\tt im}.} 1580 1581We can also plot the RRs only from these models (figure 1582\ref{fig:dur-int-RR}, p. \pageref{fig:dur-int-RR}); for this we need 1583the reference frames, and the machinery from \texttt{ci.exp} allowing 1584a list of two data frames: 1585\begin{Schunk} 1586\begin{Sinput} 1587> ndR <- transform( ndI, tfI=0, lex.Cst="DM" ) 1588> cbind( head(ndI), head(ndR) ) 1589\end{Sinput} 1590\begin{Soutput} 1591 tfI ai sex lex.Cst age tfI ai sex lex.Cst age 15921 NA 40 M Ins NA 0 40 M DM NA 15932 0.0 40 M Ins 40.0 0 40 M DM 40.0 15943 0.1 40 M Ins 40.1 0 40 M DM 40.1 15954 0.2 40 M Ins 40.2 0 40 M DM 40.2 15965 0.3 40 M Ins 40.3 0 40 M DM 40.3 15976 0.4 40 M Ins 40.4 0 40 M DM 40.4 1598\end{Soutput} 1599\begin{Sinput} 1600> Rxi <- ci.exp( imx, list(ndI,ndR) ) 1601> Rii <- ci.exp( im , list(ndI,ndR) ) 1602> RIi <- ci.exp( Im , list(ndI,ndR) ) 1603> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" ) 1604> matshade( ndI$age, cbind( Rxi, RIi, Rii), plot=T, log="y", 1605+ xlab="Age (years)", ylab="Rate ratio vs, non-Insulin", 1606+ lty=1, lwd=2, col=c("blue","forestgreen","red"), alpha=0.1 ) 1607> abline( h=1 ) 1608> abline( h=ci.exp(imx,subset="lex.Cst")[,1], lty="25", col="blue" ) 1609\end{Sinput} 1610\end{Schunk} 1611\insfig{dur-int-RR}{0.9}{RR from three different interaction 1612 models. The horizontal dotted line is at the estimated effect of 1613 \textrm{\tt lex.Cst}, to illustrate that the first model (blue) 1614 constrains this initial HR to be constant across age. The green 1615 curves are the extended interaction model, and the red the full 1616 one.} 1617 1618\clearpage 1619 1620\subsection{General interaction} 1621 1622As a final illustration we may want to explore a different kind of 1623interaction, not defined from the duration --- here we simplify the 1624interaction by not using the second-last knot in the interaction terms 1625--- figure \ref{fig:splint}, p. \pageref{fig:splint}. Note again that 1626the prediction code is the same: 1627\begin{Schunk} 1628\begin{Sinput} 1629> gm <- glm.Lexis( tsNA20(dmCs), 1630+ formula = ~ Ns(age,knots=a.kn) 1631+ + Ns(tfI,knots=i.kn) 1632+ + lex.Cst:Ns(age,knots=a.kn):Ns(tfI,knots=i.kn) 1633+ + lex.Cst + sex ) 1634\end{Sinput} 1635\begin{Soutput} 1636stats::glm Poisson analysis of Lexis object tsNA20(dmCs) with log link: 1637Rates for transitions: DM->Dead, Ins->Dead 1638\end{Soutput} 1639\begin{Sinput} 1640> pgi <- ci.pred( gm, ndI ) 1641> pga <- ci.pred( gm, ndA ) 1642> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" ) 1643> matshade( ndI$age, cbind( pgi, pii )*1000, plot=T, 1644+ lty=c("solid","21"), lend="butt", lwd=2, log="y", 1645+ xlab="Age (years)", ylab="Mortality rates per 1000 PY", 1646+ alpha=c(0.2,0.1), col=c("black","red") ) 1647> matshade( ndA$age, cbind( pga, pia )*1000, 1648+ lty=c("solid","21"), lend="butt", lwd=2, 1649+ alpha=c(0.2,0.1), col=c("black","red") ) 1650\end{Sinput} 1651\end{Schunk} 1652\insfig{splint}{0.8}{Spline-by-spline interaction between age and 1653 duration (model \textrm{\tt gm}, black), and the interaction using a 1654 non-linear effect of age at entry (model \textrm{\tt im}, red), 1655 corresponding to the red curves in figure \ref{fig:dur-int}.} 1656This is in figure \ref{fig:splint}, p. \pageref{fig:splint}. 1657 1658\subsection{Evaluating interactions} 1659 1660Here we see that the interaction effect is such that in the older ages 1661the length of insulin use has an increasing effect on mortality. 1662 1663Even though there is no statistically significant interaction between 1664age and time since start of insulin, it would be illustrative to show 1665the RR as a function of age at insulin and age at follow-up: 1666\begin{Schunk} 1667\begin{Sinput} 1668> ndR <- transform( ndI, lex.Cst="DM", tfI=0 ) 1669> iRR <- ci.exp( im, ctr.mat=list(ndI,ndR) ) 1670> gRR <- ci.exp( gm, ctr.mat=list(ndI,ndR) ) 1671> par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, las=1, bty="n" ) 1672> matshade( ndI$age, cbind(gRR,iRR), lty=1, log="y", plot=TRUE, 1673+ xlab="Age (years)", ylab="Rate ratio: Ins vs. non-Ins", 1674+ col=c("black","red") ) 1675> abline( h=1 ) 1676\end{Sinput} 1677\end{Schunk} 1678\insfig{RR-int}{0.8}{The effect of duration of insulin use at 1679 different ages of follow-up (and age at insulin start). Estimates 1680 are from the model with an interaction term using a non-linear 1681 effect of age at insulin start (model \textrm{\tt im}, red) and 1682 using a general spline interactions (model \textrm{\tt gm}, 1683 black). It appears that the general interaction over-models a bit.} 1684This is in figure \ref{fig:RR-int}, p. \pageref{fig:RR-int}. 1685 1686The advantage of the parametric modeling (be that with age at insulin 1687or general spline interaction) is that it is straight-forward to 1688\emph{test} whether we have an interaction. 1689 1690\section{Separate models} 1691 1692In the above we insisted on making a joint model for the 1693\texttt{DM}$\rightarrow$\texttt{Dead} and the 1694\texttt{Ins}$\rightarrow$\texttt{Dead} 1695transitions, but with the complications demonstrated it would actually 1696have been more sensible to model the two transitions separately: 1697\begin{Schunk} 1698\begin{Sinput} 1699> dmd <- glm.Lexis( dmCs, 1700+ from="DM", to="Dead", 1701+ formula = ~ Ns(age,knots=a.kn) 1702+ + sex ) 1703\end{Sinput} 1704\begin{Soutput} 1705stats::glm Poisson analysis of Lexis object dmCs with log link: 1706Rates for the transition: DM->Dead 1707\end{Soutput} 1708\begin{Sinput} 1709> ind <- glm.Lexis( dmCs, 1710+ from="Ins", to="Dead", 1711+ formula = ~ Ns(age,knots=a.kn) 1712+ + Ns(tfI,knots=i.kn) 1713+ + Ns(age-tfI,knots=a.kn) 1714+ + sex ) 1715\end{Sinput} 1716\begin{Soutput} 1717stats::glm Poisson analysis of Lexis object dmCs with log link: 1718Rates for the transition: Ins->Dead 1719\end{Soutput} 1720\begin{Sinput} 1721> ini <- ci.pred( ind, ndI ) 1722> dmi <- ci.pred( dmd, ndI ) 1723> dma <- ci.pred( dmd, ndA ) 1724\end{Sinput} 1725\end{Schunk} 1726The estimated mortality rates are shown in figure \ref{fig:sep-mort}, 1727p. \pageref{fig:sep-mort}, using: 1728\begin{Schunk} 1729\begin{Sinput} 1730> par(mar=c(3,3,1,1),mgp=c(3,1,0)/1.6,las=1,bty="n") 1731> matshade( ndI$age, ini*1000, plot=TRUE, log="y", 1732+ xlab="Age (years)", ylab="Mortality rates per 1000 PY", 1733+ lwd=2, col="red" ) 1734> matshade( ndA$age, dma*1000, 1735+ lwd=2, col="black" ) 1736\end{Sinput} 1737\end{Schunk} 1738The estimated RRs are computed using that the estimates from the two 1739models are uncorrelated, and hence qualify for \texttt{ci.ratio} (this 1740and the previous graph 1741appear in figure \ref{fig:Ins-noIns}, p. \pageref{fig:Ins-noIns}) 1742\begin{Schunk} 1743\begin{Sinput} 1744> par(mar=c(3,3,1,1),mgp=c(3,1,0)/1.6,las=1,bty="n") 1745> matshade( ndI$age, ci.ratio(ini,dmi), plot=TRUE, log="y", 1746+ xlab="Age (years)", ylab="RR insulin vs. no insulin", 1747+ lwd=2, col="red" ) 1748> abline( h=1 ) 1749\end{Sinput} 1750\end{Schunk} 1751\begin{figure}[tb] 1752\centering 1753\includegraphics[width=0.49\textwidth]{flup-sep-mort} 1754\includegraphics[width=0.49\textwidth]{flup-sep-HR} 1755\caption{\it Left panel: Mortality rates from separate models for the 1756 two mortality transitions; the \textrm{\tt 1757 DM}$\rightarrow$\textrm{\tt Dead} transition modeled by age alone; 1758 \textrm{\tt Ins}$\rightarrow$\textrm{\tt Dead} transition modeled 1759 with spline effects of current age, time since insulin and and age 1760 at insulin. \newline Right panel: Mortality HR of insulin vs. no insulin.} 1761\label{fig:Ins-noIns} 1762\end{figure} 1763 1764\chapter{More states} 1765 1766\section{Subdividing states} 1767 1768It may be of interest to subdivide the states following the 1769intermediate event according to whether the event has occurred or 1770not. This will enable us to address the question of the fraction of 1771the patients that ever go on insulin. 1772 1773This is done by the argument \texttt{split.states=TRUE}. 1774\begin{Schunk} 1775\begin{Sinput} 1776> dmCs <- cutLexis( data = dmS2, 1777+ cut = dmS2$doins, 1778+ timescale = "per", 1779+ new.state = "Ins", 1780+ new.scale = "tfI", 1781+ precursor.states = "DM", 1782+ split.states = TRUE ) 1783> summary( dmCs ) 1784\end{Sinput} 1785\begin{Soutput} 1786Transitions: 1787 To 1788From DM Ins Dead Dead(Ins) Records: Events: Risk time: Persons: 1789 DM 35135 1694 2048 0 38877 3742 45885.49 9899 1790 Ins 0 5762 0 451 6213 451 8387.77 1791 1791 Sum 35135 7456 2048 451 45090 4193 54273.27 9996 1792\end{Soutput} 1793\end{Schunk} 1794We can illustrate the numbers and the transitions (figure 1795\ref{fig:box4}, p. \pageref{fig:box4}) 1796\begin{Schunk} 1797\begin{Sinput} 1798> boxes( dmCs, boxpos=list(x=c(15,15,85,85), 1799+ y=c(85,15,85,15)), 1800+ scale.R=1000, show.BE=TRUE ) 1801\end{Sinput} 1802\end{Schunk} 1803\insfig{box4}{0.7}{Transitions between 4 states: the numbers \emph{in} 1804 the boxes are person-years (middle), and below the number of persons 1805 who start, respectively end their follow-up in each of the states.} 1806 1807Note that it is only the mortality rates that we have been modeling, 1808namely the transitions \texttt{DM}$\rightarrow$\texttt{Dead} 1809and \texttt{Ins}$\rightarrow$\texttt{Dead(Ins)}. 1810If we were to model the cumulative risk of using insulin we would also 1811need a model for the DM$\rightarrow$Ins 1812transition. Subsequent to that we would then compute the probability 1813of being in each state conditional on suitable starting 1814conditions. With models where transition rates depend on several time 1815scales this is not a trivial task. This is treated in more detail in the 1816vignette on \texttt{simLexis}. 1817 1818\section{Multiple intermediate events} 1819 1820We may be interested in starting either insulin or OAD (oral 1821anti-diabetic drugs), thus giving rise to more states and more 1822time scales. This can be accomplished by the \texttt{mcutLexis} 1823function, that generalizes \texttt{cutLexis}: 1824\begin{Schunk} 1825\begin{Sinput} 1826> dmM <- mcutLexis( dmL, 1827+ timescale = "per", 1828+ wh = c("doins","dooad"), 1829+ new.states = c("Ins","OAD"), 1830+ new.scales = c("tfI","tfO"), 1831+ precursor.states = "DM", 1832+ ties.resolve = TRUE ) 1833\end{Sinput} 1834\begin{Soutput} 1835NOTE: 15 records with tied events times resolved (adding 0.01 random uniform), 1836 so results are only reproducible if the random number seed was set. 1837\end{Soutput} 1838\begin{Sinput} 1839> summary( dmM, t=T ) 1840\end{Sinput} 1841\begin{Soutput} 1842Transitions: 1843 To 1844From DM Dead OAD Ins OAD-Ins Ins-OAD Records: Events: Risk time: Persons: 1845 DM 2830 1056 2957 689 0 0 7532 4702 22920.34 7532 1846 OAD 0 992 3327 0 1005 0 5324 1997 22965.24 5324 1847 Ins 0 152 0 462 0 172 786 324 3883.06 786 1848 OAD-Ins 0 266 0 0 739 0 1005 266 3770.53 1005 1849 Ins-OAD 0 33 0 0 0 139 172 33 734.09 172 1850 Sum 2830 2499 6284 1151 1744 311 14819 7322 54273.27 9996 1851 1852Timescales: 1853 per age tfD tfI tfO 1854 "" "" "" "Ins" "OAD" 1855\end{Soutput} 1856\end{Schunk} 1857We see that we now have two time scales defined as entry since into 1858states. 1859\begin{Schunk} 1860\begin{Sinput} 1861> wh <- c(subset(dmM,lex.Cst=="Ins-OAD")$lex.id[1:2], 1862+ subset(dmM,lex.Cst=="OAD-Ins")$lex.id[1:2]) 1863> options( width=110 ) 1864> print( subset( dmM, lex.id %in% wh )[,c('lex.id',names(dmM[1:8]),c("doins","dooad"))], 1865+ digits=6, row.names=FALSE ) 1866\end{Sinput} 1867\begin{Soutput} 1868 lex.id tfI tfO per age tfD lex.dur lex.Cst lex.Xst doins dooad 1869 18 NA NA 1996.75 61.7221 0.0000000 1.16906229 DM OAD 2005.99 1997.92 1870 18 NA 0.00000 1997.92 62.8912 1.1690623 8.07939767 OAD OAD-Ins 2005.99 1997.92 1871 18 0.00000000 8.07940 2005.99 70.9706 9.2484600 4.00273785 OAD-Ins OAD-Ins 2005.99 1997.92 1872 25 NA NA 2003.69 60.3422 0.0000000 1.88090349 DM OAD 2008.64 2005.57 1873 25 NA 0.00000 2005.57 62.2231 1.8809035 3.06913073 OAD OAD-Ins 2008.64 2005.57 1874 25 0.00000000 3.06913 2008.64 65.2923 4.9500342 1.35797399 OAD-Ins OAD-Ins 2008.64 2005.57 1875 20 NA NA 2009.25 53.2183 0.0000000 0.04071988 DM Ins 2009.29 2009.29 1876 20 0.00000000 NA 2009.29 53.2591 0.0407199 0.00131847 Ins Ins-OAD 2009.29 2009.29 1877 20 0.00131847 0.00000 2009.29 53.2604 0.0420383 0.70813277 Ins-OAD Ins-OAD 2009.29 2009.29 1878 38 NA NA 2008.37 63.9316 0.0000000 0.09308693 DM Ins 2008.46 2008.67 1879 38 0.00000000 NA 2008.46 64.0246 0.0930869 0.21355236 Ins Ins-OAD 2008.46 2008.67 1880 38 0.21355236 0.00000 2008.67 64.2382 0.3066393 1.32511978 Ins-OAD Dead 2008.46 2008.67 1881\end{Soutput} 1882\end{Schunk} 1883We can also illustrate the transitions to the different states, as in 1884figure \ref{fig:mbox}: 1885\begin{Schunk} 1886\begin{Sinput} 1887> boxes( dmM, boxpos=list(x=c(15,80,40,40,85,85), 1888+ y=c(50,50,90,10,90,10)), 1889+ scale.R=1000, show.BE=TRUE ) 1890\end{Sinput} 1891\end{Schunk} 1892\insfig{mbox}{1.0}{Boxes for the \textrm{\tt dmM} object. The numbers 1893 \emph{in} the boxes are person-years (middle), and below the number 1894 of persons who start, respectively end their follow-up in each of 1895 the states.} 1896We may not be interested in whether persons were prescribed insulin 1897before OAD or vice versa, in which case we would combine the levels 1898with both insulin and OAD to one, regardless of order (figure 1899\ref{fig:mboxr}): 1900\begin{Schunk} 1901\begin{Sinput} 1902> summary( dmMr <- Relevel( dmM, list('OAD+Ins'=5:6), first=FALSE) ) 1903\end{Sinput} 1904\begin{Soutput} 1905Transitions: 1906 To 1907From DM Dead OAD Ins OAD+Ins Records: Events: Risk time: Persons: 1908 DM 2830 1056 2957 689 0 7532 4702 22920.34 7532 1909 OAD 0 992 3327 0 1005 5324 1997 22965.24 5324 1910 Ins 0 152 0 462 172 786 324 3883.06 786 1911 OAD+Ins 0 299 0 0 878 1177 299 4504.62 1177 1912 Sum 2830 2499 6284 1151 2055 14819 7322 54273.27 9996 1913\end{Soutput} 1914\begin{Sinput} 1915> boxes( dmMr, boxpos=list(x=c(15,50,15,85,85), 1916+ y=c(85,50,15,85,15)), 1917+ scale.R=1000, show.BE=TRUE ) 1918\end{Sinput} 1919\end{Schunk} 1920\insfig{mboxr}{1.0}{Boxes for the \textrm{\tt dmMr} object with 1921 collapsed states. The numbers \emph{in} the boxes are person-years 1922 (middle), and below the number of persons who start, respectively 1923 end their follow-up in each of the states.} 1924 1925Diagrams as those in figures 1926\ref{fig:mbox} and 1927\ref{fig:mboxr} gives an overview of the possible transitions, 1928which states it might be relevant to collapse, and which transitions 1929to model and how. 1930 1931The actual modeling of the transition rates is straightforward; 1932split the data along some timescale and then use \texttt{glm.Lexis} or 1933\texttt{gam.Lexis}, where it is possible to select the transitions 1934modelled. This is also possible with the \texttt{coxph.Lexis} 1935function, but it requires that a single time scale be selected as the 1936baseline time scale, and the effect of this will not be accessible. 1937 1938\chapter{\texttt{Lexis} functions} 1939 1940The \texttt{Lexis} machinery has evolved over time since it was first 1941introduced in a workable version in \texttt{Epi\_1.0.5} in August 2008. 1942 1943Over the years there have been additions of tools for handling 1944multistate data. Here is a list of the current functions relating to 1945\texttt{Lexis} objects with a very brief description; it does not 1946replace the documentation. Unless otherwise stated, functions named 1947\texttt{something.Lexis} (with a ``\texttt{.}'') are S3 methods for 1948\texttt{Lexis} objects, so you can skip the ``\texttt{.Lexis}'' in 1949daily use. 1950 1951\setlist{noitemsep} 1952\begin{description} 1953 1954\item[Define]\ \\ 1955\begin{description} 1956\item[\texttt{Lexis}] defines a \texttt{Lexis} object 1957\end{description} 1958 1959\item[Cut and split]\ \\ 1960\begin{description} 1961\item[\texttt{cutLexis}] cut follow-up at intermediate event 1962\item[\texttt{mcutLexis}] cut follow-up at several intermediate events 1963\item[\texttt{countLexis}] cut follow-up at intermediate event count 1964 the no. events so far 1965\item[\texttt{splitLexis}] split follow up along a time scale 1966\item[\texttt{splitMulti}] split follow up along a time scale --- from 1967 the \texttt{popEpi} package, faster and has simpler syntax than 1968 \texttt{splitLexis} 1969\item[\texttt{addCov.Lexis}] add clinical measurements at a given date to a 1970 \texttt{Lexis} object 1971\end{description} 1972 1973\item[Boxes and plots]\ \\ 1974\begin{description} 1975\item[\texttt{boxes.Lexis}] draw a diagram of states and transitions 1976\item[\texttt{plot.Lexis}] draw a standard Lexis diagram 1977\item[\texttt{points.Lexis}] add points to a Lexis diagram 1978\item[\texttt{lines.Lexis}] add lines to a Lexis diagram 1979\item[\texttt{PY.ann.Lexis}] annotate life lines in a Lexis diagram 1980\end{description} 1981 1982\item[Summarize and query]\ \\ 1983\begin{description} 1984\item[\texttt{summary.Lexis}] overview of transitions, risk time etc. 1985\item[\texttt{levels.Lexis}] what are the states in the \texttt{Lexis} object 1986\item[\texttt{nid.Lexis}] number of persons in the \texttt{Lexis} 1987 object --- how many unique values of \texttt{lex.id} are present 1988\item[\texttt{entry}] entry time 1989\item[\texttt{exit}] exit time 1990\item[\texttt{status}] status at entry or exit 1991\item[\texttt{timeBand}] factor of time bands 1992\item[\texttt{timeScales}] what time scales are in the \texttt{Lexis} object 1993\item[\texttt{timeSince}] what time scales are defined as time since a given state 1994\item[\texttt{breaks}] what breaks are currently defined 1995\item[\texttt{absorbing}] what are the absorbing states 1996\item[\texttt{transient}] what are the transient states 1997\item[\texttt{preceding}, \texttt{before}] which states precede this 1998\item[\texttt{succeeding}, \texttt{after}] which states can follow this 1999\item[\texttt{tmat.Lexis}] transition matrix for the \texttt{Lexis} object 2000\end{description} 2001 2002\item[Manipulate]\ \\ 2003\begin{description} 2004\item[\texttt{subset.Lexis}, \texttt{[}] subset of a \texttt{Lexis} object 2005\item[\texttt{merge.Lexis}] merges a \texttt{Lexis} objects with a 2006 \texttt{data.frame} 2007\item[\texttt{cbind.Lexis}] bind a \texttt{data.frame} to a \texttt{Lexis} object 2008\item[\texttt{rbind.Lexis}] put two \texttt{Lexis} objects head-to-foot 2009\item[\texttt{transform.Lexis}] transform and add variables 2010\item[\texttt{tsNA20}] turn \texttt{NA}s to 0s for time scales 2011\item[\texttt{Relevel.Lexis}, \texttt{factorize.Lexis}] reorder and 2012 combine states 2013\item[\texttt{rm.tr}] remove transitions from a \texttt{Lexis} object 2014\item[\texttt{bootLexis}] bootstrap sample of \emph{persons} 2015 (\texttt{lex.id}) in the \texttt{Lexis} object 2016\end{description} 2017 2018\item[Simulate]\ \\ 2019\begin{description} 2020\item[\texttt{simLexis}] simulate a \texttt{Lexis} object from 2021 specified transition rate models 2022\item[\texttt{nState}, \texttt{pState}] count state occupancy from a 2023 simulated \texttt{Lexis} object 2024\item[\texttt{plot.pState}, \texttt{lines.pState}] plot state occupancy from a 2025 \texttt{pState} object 2026\end{description} 2027 2028\item[Stack]\ \\ 2029\begin{description} 2030\item[\texttt{stack.Lexis}] make a stacked object for simultaneous 2031 analysis of transitions --- returns a \texttt{stacked.Lexis} object 2032\item[\texttt{subset.stacked.Lexis}] subsets of a \texttt{stacked.Lexis} object 2033\item[\texttt{transform.stacked.Lexis}] transform a \texttt{stacked.Lexis} object 2034\end{description} 2035 2036\item[Interface to other packages]\ \\ 2037\begin{description} 2038\item[\texttt{msdata.Lexis}] interface to \texttt{mstate} package 2039\item[\texttt{etm.Lexis}] interface to \texttt{etm} package 2040\item[\texttt{crr.Lexis}] interface to \texttt{cmprsk} package 2041\end{description} 2042 2043\item[Statistical models] --- these are \emph{not} S3 methods 2044\begin{description} 2045\item[\texttt{glm.Lexis}] fit a \texttt{glm} model using the 2046 \texttt{poisreg} family to (hopefully) time split data 2047\item[\texttt{gam.Lexis}] fit a \texttt{gam} model (from the 2048 \texttt{mgcv} package) using the \texttt{poisreg} family to 2049 (hopefully) time split data 2050\item[\texttt{coxph.Lexis}] fit a Cox model to follow-up in a 2051 \texttt{Lexis} object 2052\end{description} 2053 2054\end{description} 2055 2056\renewcommand{\bibname}{References} 2057 2058\bibliographystyle{plain} 2059\begin{thebibliography}{1} 2060 2061\bibitem{Carstensen.2011a} 2062B~Carstensen and M~Plummer. 2063\newblock Using {L}exis objects for multi-state models in {R}. 2064\newblock {\em Journal of Statistical Software}, 38(6):1--18, 1 2011. 2065 2066%% \bibitem{Iacobelli.2013} 2067%% S~Iacobelli and B~Carstensen. 2068%% \newblock {Multiple time scales in multi-state models}. 2069%% \newblock {\em Stat Med}, 32(30):5315--5327, Dec 2013. 2070 2071\bibitem{Plummer.2011} 2072M~Plummer and B~Carstensen. 2073\newblock Lexis: An {R} class for epidemiological studies with long-term 2074 follow-up. 2075\newblock {\em Journal of Statistical Software}, 38(5):1--12, 1 2011. 2076 2077\end{thebibliography} 2078 2079\addcontentsline{toc}{chapter}{\bibname} 2080 2081\end{document} 2082