1 2\documentclass{article} 3 4\usepackage{indentfirst} 5 6\begin{document} 7 8\title{An MCMC Package for R} 9\author{Charles J. Geyer} 10\maketitle 11 12\section{Introduction} 13 14This package is a simple first attempt at a sensible \emph{general} 15MCMC package. It doesn't do much yet. It only does ``normal random-walk'' 16Metropolis for continuous distributions. 17No non-normal proposals. No Metropolis-Hastings or Metropolis-Hastings-Green. 18No discrete state. No dimension jumping. No simulated tempering. 19No perfect sampling. There's a lot left to do. Still, limited as it is, 20it does equilibrium distributions that no other R package does. 21 22Its basic idea is the following. Given an R function \verb@fred@ that 23calculates the unnormalized density of the desired equilibrium distribution 24of the Markov chain, or, better yet, \emph{log} unnormalized density, 25so we avoid overflow and underflow, the \verb@metrop@ function should 26generate a Markov chain having this stationary distribution. 27 28The package does not do any of the following. 29\begin{itemize} 30\item \textbf{Theory.} (What R package does?) It doesn't prove the 31Markov chain is irreducible or ergodic or positive recurrent or 32Harris recurrent or geometrically 33ergodic or uniformly ergodic or satisfies conditions for the central limit 34theorem. 35\item \textbf{Diagnostics.} There are no non-bogus Markov chain diagnostics 36(except for perfect sampling). This package doesn't do any bogus diagnostics 37(other R packages do them). 38\item \textbf{Calculus.} If the putative unnormalized density specified 39by \verb@fred@ is not integrable, then it does not specify an equilibrium 40distribution. But this package doesn't check that either. 41\end{itemize} 42 43Thus the only requirement the package has to satisfy is that given 44a function \verb@fred@ it correctly simulates a Markov chain that 45actually has \verb@fred@ as its equilibrium distribution 46(when \verb@fred@ actually does specify some equilibrium 47distribution) 48 49\section{Design Issues} 50 51\subsection{First Try} 52 53For a start we have a function with signature 54\begin{verbatim} 55metrop(lud, initial, niter, ...) 56\end{verbatim} 57such that when 58\begin{itemize} 59\item \verb@initial@ is a real vector, the initial state of the Markov 60chain, 61\item \verb@lud@ is a function, the log unnormalized density of the 62equilibrium distribution of the Markov chain, 63such that 64\begin{itemize} 65\item \verb@lud(initial, ...)@ works and produces a finite scalar value and 66\item \verb@lud(x, ...)@ works for any real vector \verb@x@ 67having the same length as \verb@initial@ and all elements finite and 68and produces a scalar value that is finite or \verb@-Inf@, 69\end{itemize} 70\end{itemize} 71then the function produces an \verb@niter@ by \verb@length(initial)@ matrix 72whose rows are the iterations of the Markov chain. 73 74\subsubsection{Checks} 75 76If 77\begin{verbatim} 78logh <- lud(initial, ...) 79\end{verbatim} 80then \verb@is.finite(logh)@ is \verb@TRUE@. 81 82Moreover, if \verb@x@ is any vector such that 83\verb@length(x) == length(initial)@ and \verb@all(is.finite(x))@ 84are \verb@TRUE@ and 85\begin{verbatim} 86logh <- lud(x, ...) 87\end{verbatim} 88then 89\begin{verbatim} 90is.finite(logh) | (logh == -Inf) 91\end{verbatim} 92is \verb@TRUE@. 93 94Points \verb@x@ having log unnormalized density \verb@-Inf@ have 95density zero (normalized or unnormalized, since a constant times zero is zero) 96hence cannot occur. Thus if 97\begin{verbatim} 98path <- metrop(fred, x, n, some, extra, arguments) 99\end{verbatim} 100then 101\begin{verbatim} 102all(is.finite(apply(path, 1, fred, some, extra, arguments))) 103\end{verbatim} 104is \verb@TRUE@. 105 106This is how we specify log unnormalized densities for distribution 107having support that is not all of Euclidean space. The value 108of the log unnormalized density off the support is \verb@-Inf@. 109 110Thus when coding a log unnormalized density, we should normally do 111something like 112\begin{verbatim} 113fred <- function(x, ...) 114{ 115 if (! is.numeric(x)) 116 stop("argument x not numeric") 117 if (length(x) != d) 118 stop("argument x wrong length") 119 if (! all(is.finite(x))) 120 stop("elements of argument x not all finite") 121 if (! is.in.the.support(x)) 122 return(-Inf) 123 return(log.unnormalized.density(x)) 124} 125\end{verbatim} 126where \verb@d@ is the dimension of the state space of the Markov chain 127(defined in the global environment or in the \verb@...@ arguments), 128\verb@is.in.the.support(x)@ returns 129\verb@TRUE@ if \verb@x@ is in the support of the desired equilibrium 130distribution and \verb@FALSE@ otherwise and \verb@log.unnormalized.density(x)@ 131calculates the log unnormalized density of the desired equilibrium 132distribution at the point \verb@x@, which 133is guaranteed to be finite because \verb@x@ is in the support if the 134this code is executed. 135 136Of course, you needn't actually have functions named 137\verb@is.in.the.support@ and \verb@log.unnormalized.density@. 138The point is that you use this logic. First you check 139whether \verb@x@ is in the support. If not return \verb@-Inf@. 140If it is, return a finite value. Do not crash. Do not return 141\verb@NA@, \verb@NaN@, or \verb@Inf@. If you do, then \verb@metrop@ 142crashes, and it's your fault. 143 144Of course, a crash is no big deal. Lots of first efforts in R crash. 145You just fix the problem and retry. Error messages are your friends. 146 147\subsection{Proposal} 148 149We also need to specify the proposal distribution (the preceding stuff 150assumed some default proposal). This can be any multivariate normal 151distribution on the Euclidean space of dimension \verb@length(initial)@ 152having mean zero. Thus it is specified by specifying its covariance 153matrix. 154 155But to avoid having to check whether the specified covariance 156matrix actually is a covariance matrix, we make the specification 157an arbitrary \verb@d@ by \verb@d@ matrix, call it \verb@scale@, 158where \verb@d@ is the dimension of the state space, specified by 159\verb@length(initial)@, 160and use the proposal \verb@x + scale %*% z@, where \verb@x@ is the 161current state and \verb@z@ is a standard normal random vector 162(``standard'' meaning its covariance matrix is the identity matrix). 163 164Thus we need to add this to the argument list of our function. 165It is now 166\begin{verbatim} 167metrop(lud, initial, niter, scale, ...) 168\end{verbatim} 169 170The covariance matrix specified by this is, of course, 171\verb@scale %*% t(scale)@. If you want the proposal to have covariance 172matrix \verb@melvin@, then specifying 173\verb@scale = t(chol(melvin))@ will do the job. 174(Of course, many other specifications will also do the job.) 175 176For convenience, we also allow \verb@scale@ to be a vector of 177length \verb@d@ and in this case take \verb@scale = sally@ 178to mean the same thing as \verb@scale = diag(sally)@. 179 180For convenience, we also allow \verb@scale@ to be a vector of 181length 1 and in this case take \verb@scale = sally@ 182to mean the same thing as \verb@scale = sally * diag(d)@ 183where \verb@d@ is still the dimension of the state space 184\verb@length(initial)@. 185 186We can use this last convenience option to give \verb@scale@ a default 187\begin{verbatim} 188metrop(lud, initial, niter, scale = 1, ...) 189\end{verbatim} 190 191In order to tell what is sensible scaling, we need to return the 192acceptance rate (the proportion of proposals that are accepted). 193The only criterion known for choosing sensible scaling is to adjust 194so that the acceptance rate is about 20\%. Of course, that recommendation 195was derived for a specific toy model that is not very much like what 196people do in real MCMC applications, so 20\% is only a very rough guideline. 197But acceptance rate is all we know to use, so that's what we will output. 198 199Thus the result of \verb@metrop@, assuming we write it in R will be 200something like 201\begin{verbatim} 202return(structure(list(path = path, rate = rate), 203 class = "mcmc")) 204\end{verbatim} 205and if we write it in C will be whatever does the same job. 206 207\subsection{Output I} 208 209Generally we don't want \verb@path@ to be as described above. 210It may be way too big. We might have \verb@d@, the dimension of the 211state space $10^3$ or even larger and we might have \verb@niter@ 212$10^7$ or even larger, the resulting \verb@path@ matrix would 213be $10^{10}$ doubles or $8 \times 10^{10}$ bytes. Too big to fit in 214my computer! 215 216Thus we facilitate subsampling and batching of the output. 217 218\subsubsection{Subsampling} 219 220If the Markov chain exhibits high autocorrelation, subsampling the 221chain may lose little information. (Most users way overdo the subsampling, 222but it's not the job of a computer program to keep users from overdoing 223things). Thus we add an argument \verb@nspac@ that specifies subsampling. 224Only every \verb@nspac@ iterate is output. 225 226\subsubsection{Batching} 227 228The method of batch means uses ``batches'' which are sums over consecutive 229blocks of output. For most purposes batching is better than subsampling. 230It loses no information while reducing the amount of output even more 231than subsampling. So we introduce arguments \verb@nbatch@ specifying 232the number of batches and \verb@blen@ specifying the length of the batches. 233 234Our function now has signature 235\begin{verbatim} 236metrop(lud, initial, nbatch, blen = 1, nspac = 1, scale = 1, 237 ...) 238\end{verbatim} 239Note that the argument \verb@niter@ formerly present has vanished. 240The number of iterations that will now be done is 241\verb@nbatch * blen * nspac@. If we accept the defaults 242\verb@blen = 1@ and \verb@nspac = 1@, then \verb@nbatch@ is the same 243as the former argument \verb@niter@. Otherwise, it is quite different. 244 245\subsection{Output II} 246 247The preceding section takes care of of the problem of \verb@niter@ being 248too big. This section deals with the dimension of the state space being 249too big. When the dimension of the state space is large, we generally 250do not want to output the whole state, but only some function of the 251state. 252 253Thus we need another function (besides \verb@lud@) to produce the 254output vector. Call it \verb@outfun@. The requirements on \verb@outfun@ 255are 256\begin{itemize} 257\item If \verb@is.finite(lud(x, ...))@, then \verb@outfun(x, ...)@ 258works (it does not crash) and produces a vector having all elements finite 259and always of the same length (say \verb@k@). 260\end{itemize} 261\verb@outfun@ will never be called in any other situation 262(that is, never when \verb@x@ is not in the support of the equilibrium 263distribution). 264 265Now we can describe the \verb@path@ component of the output. 266We'll use a little math here. Write $L$ for \verb@blen@ and 267$M$ for \verb@nspac@. Write $x_i$ for the $i$-th iterate of the 268Markov chain, and write $g$ for \verb@outfun@. Then \verb@path[@$j$\verb@, ]@ 269is the vector 270$$ 271 \frac{1}{L} \sum_{i = 1}^L g(x_{M [L (j - 1) + i]}) 272$$ 273 274For convenience, we also allow \verb@outfun@ to be a logical vector of 275length \verb@d@ or an integer vector having elements in \verb@1:d@ 276or in \verb@-(1:d)@ and in this case take \verb@outfun = fred@ 277to mean the same thing as \verb@outfun = function(x) x[fred]@. 278 279For convenience, we also allow \verb@outfun@ to be missing 280take this to mean the same thing as \verb@outfun = function(x) x@, 281that is, the ``outfun'' is the identity function and we are back 282to outputting the entire state. 283 284Our function now has signature 285\begin{verbatim} 286metrop(lud, initial, nbatch, blen = 1, nspac = 1, scale = 1, 287 outfun, ...) 288\end{verbatim} 289 290\subsection{Restarting} 291 292It should be possible to restart the Markov chain and get exactly the 293same results. It should be possible to continue the Markov chain and 294get exactly the same results. Thus we need to save the initial and 295final state of the Markov chain 296and the initial and final state of the random number generator 297(the R object \verb@.Random.seed@). 298 299Thus the result of \verb@metrop@ now looks like 300\begin{verbatim} 301return(structure(list(path = path, rate = rate, 302 initial = initial, final = final, 303 initial.seed = iseed, final.seed = .Random.seed), 304 class = "mcmc")) 305\end{verbatim} 306 307We also need to add arguments to \verb@metrop@. It now has signature 308\begin{verbatim} 309metrop(lud, initial, nbatch, blen = 1, nspac = 1, scale = 1, 310 outfun, object, restart = FALSE, ...) 311\end{verbatim} 312Here \verb@object@ is an R object of class \verb@"mcmc"@, the output 313a previous call to \verb@metrop@, from which we take either initial 314or final state and seed depending the value of \verb@restart@. 315 316While we are at it, it is convenient to allow any or all of the other 317arguments to be missing if \verb@object@ is supplied. We just take 318the argument from \verb@object@. Thus we can make calls like 319\begin{verbatim} 320out <- metrop(fred, x, 1e3, scale = 4, blen = 3) 321out.too <- metrop(object = out, nbatch = 1e4) 322\end{verbatim} 323 324Woof! I now see (how embarrasing) after four earlier versions how to 325use the R class system to make this convenient. We have three functions. 326\begin{verbatim} 327metrop.default <- function(o, ...) 328UseMethod("metrop") 329 330metrop.mcmc <- function(o, initial, nbatch, blen = 1, 331 nspac = 1, scale = 1, outfun, restart = FALSE, ...) 332{ 333 if (missing(nbatch)) nbatch <- o$nbatch 334 if (missing(blen)) blen <- o$blen 335 if (missing(nspac)) nspac <- o$nspac 336 if (missing(scale)) scale <- o$scale 337 if (missing(outfun)) outfun <- o$outfun 338 339 if (restart) { 340 .Random.seed <- o$final.seed 341 return(metrop.function(o$lud, o$final, nbatch, blen, 342 nspac, scale, outfun)) 343 } else { 344 .Random.seed <- o$initial.seed 345 return(metrop.function(o$lud, o$initial, nbatch, blen, 346 nspac, scale, outfun)) 347 } 348} 349 350metrop.function <- function(o, initial, nbatch, blen = 1, 351 nspac = 1, scale = 1, outfun, restart = FALSE, ...) 352{ 353 if (! exists(".Random.seed")) runif(1) 354 initial.seed <- .Random.seed 355 func1 <- function(state) o(state, ...) 356 func2 <- function(state) outfun(state, ...) 357 .Call("metrop", func1, initial, nbatch, blen, 358 nspac, scale, func2, environment(fun = func1), 359 environment(fun = func2)) 360} 361\end{verbatim} 362Note that \verb@restart@ is ignored in \verb@metrop.function@. 363We can't ``restart'' when we have no saved state in an \verb@"mcmc"@ 364object. 365 366Note also that our \verb@"mcmc"@ objects must now store a lot more stuff 367(and more to come in the next section). 368 369\subsection{Testing and Debugging} 370 371It is nearly impossible to test or debug a random algorithm 372(any Monte Carlo) from looking at its designed (useful to the user) 373output. In order to do any serious testing or debugging, it is necessary 374to look under the hood. For the Metropolis algorithm, we need to look 375at the current state, the proposal, the log odds ratio, the uniform 376random variate (if any) used in the Metropolis rejection, and the 377result (accept or reject) of the Metropolis rejection. 378 379Hence we need to add one final argument \verb@debug = FALSE@ to our 380functions and a lot of debugging output to the result. 381 382In debugging a Metropolis (etc.)\ algorithm there is a very important 383principle. Debugging should use Markov chain theory! Just enlarge 384the state space of the Markov chain to include 385\begin{itemize} 386\item the proposal (vector), 387\item details of the calculation of the Metropolis-Hastings-Green ratio 388 (for Metropolis this is just the log unnormalized density at the 389 current state and proposal, for others it includes proposal densities) 390 and the calculated ratio, 391\item the uniform random number (if any) used in the decision, and 392\item the decision (\verb@TRUE@ or \verb@FALSE@) in the Metropolis rejection. 393\end{itemize} 394With all that it is easy to tell whether the algorithm is doing the Right 395Thing. Without all that, it's nearly impossible. 396 397\end{document} 398 399