1
2\documentclass{article}
3
4\usepackage{indentfirst}
5
6\begin{document}
7
8\title{An MCMC Package for R}
9\author{Charles J. Geyer}
10\maketitle
11
12\section{Introduction}
13
14This package is a simple first attempt at a sensible \emph{general}
15MCMC package.  It doesn't do much yet.  It only does ``normal random-walk''
16Metropolis for continuous distributions.
17No non-normal proposals.  No Metropolis-Hastings or Metropolis-Hastings-Green.
18No discrete state.  No dimension jumping.  No simulated tempering.
19No perfect sampling.  There's a lot left to do.  Still, limited as it is,
20it does equilibrium distributions that no other R package does.
21
22Its basic idea is the following.  Given an R function \verb@fred@ that
23calculates the unnormalized density of the desired equilibrium distribution
24of the Markov chain, or, better yet, \emph{log} unnormalized density,
25so we avoid overflow and underflow, the \verb@metrop@ function should
26generate a Markov chain having this stationary distribution.
27
28The package does not do any of the following.
29\begin{itemize}
30\item \textbf{Theory.}  (What R package does?)  It doesn't prove the
31Markov chain is irreducible or ergodic or positive recurrent or
32Harris recurrent or geometrically
33ergodic or uniformly ergodic or satisfies conditions for the central limit
34theorem.
35\item \textbf{Diagnostics.}  There are no non-bogus Markov chain diagnostics
36(except for perfect sampling).  This package doesn't do any bogus diagnostics
37(other R packages do them).
38\item \textbf{Calculus.}  If the putative unnormalized density specified
39by \verb@fred@ is not integrable, then it does not specify an equilibrium
40distribution.  But this package doesn't check that either.
41\end{itemize}
42
43Thus the only requirement the package has to satisfy is that given
44a function \verb@fred@ it correctly simulates a Markov chain that
45actually has \verb@fred@ as its equilibrium distribution
46(when \verb@fred@ actually does specify some equilibrium
47distribution)
48
49\section{Design Issues}
50
51\subsection{First Try}
52
53For a start we have a function with signature
54\begin{verbatim}
55metrop(lud, initial, niter, ...)
56\end{verbatim}
57such that when
58\begin{itemize}
59\item \verb@initial@ is a real vector, the initial state of the Markov
60chain,
61\item \verb@lud@ is a function, the log unnormalized density of the
62equilibrium distribution of the Markov chain,
63such that
64\begin{itemize}
65\item \verb@lud(initial, ...)@ works and produces a finite scalar value and
66\item \verb@lud(x, ...)@ works for any real vector \verb@x@
67having the same length as \verb@initial@ and all elements finite and
68and produces a scalar value that is finite or \verb@-Inf@,
69\end{itemize}
70\end{itemize}
71then the function produces an \verb@niter@ by \verb@length(initial)@ matrix
72whose rows are the iterations of the Markov chain.
73
74\subsubsection{Checks}
75
76If
77\begin{verbatim}
78logh <- lud(initial, ...)
79\end{verbatim}
80then \verb@is.finite(logh)@ is \verb@TRUE@.
81
82Moreover, if \verb@x@ is any vector such that
83\verb@length(x) == length(initial)@ and \verb@all(is.finite(x))@
84are \verb@TRUE@ and
85\begin{verbatim}
86logh <- lud(x, ...)
87\end{verbatim}
88then
89\begin{verbatim}
90is.finite(logh) | (logh == -Inf)
91\end{verbatim}
92is \verb@TRUE@.
93
94Points \verb@x@ having log unnormalized density \verb@-Inf@ have
95density zero (normalized or unnormalized, since a constant times zero is zero)
96hence cannot occur.  Thus if
97\begin{verbatim}
98path <- metrop(fred, x, n, some, extra, arguments)
99\end{verbatim}
100then
101\begin{verbatim}
102all(is.finite(apply(path, 1, fred, some, extra, arguments)))
103\end{verbatim}
104is \verb@TRUE@.
105
106This is how we specify log unnormalized densities for distribution
107having support that is not all of Euclidean space.  The value
108of the log unnormalized density off the support is \verb@-Inf@.
109
110Thus when coding a log unnormalized density, we should normally do
111something like
112\begin{verbatim}
113fred <- function(x, ...)
114{
115    if (! is.numeric(x))
116        stop("argument x not numeric")
117    if (length(x) != d)
118        stop("argument x wrong length")
119    if (! all(is.finite(x)))
120        stop("elements of argument x not all finite")
121    if (! is.in.the.support(x))
122        return(-Inf)
123    return(log.unnormalized.density(x))
124}
125\end{verbatim}
126where \verb@d@ is the dimension of the state space of the Markov chain
127(defined in the global environment or in the \verb@...@ arguments),
128\verb@is.in.the.support(x)@ returns
129\verb@TRUE@ if \verb@x@ is in the support of the desired equilibrium
130distribution and \verb@FALSE@ otherwise and \verb@log.unnormalized.density(x)@
131calculates the log unnormalized density of the desired equilibrium
132distribution at the point \verb@x@, which
133is guaranteed to be finite because \verb@x@ is in the support if the
134this code is executed.
135
136Of course, you needn't actually have functions named
137\verb@is.in.the.support@ and \verb@log.unnormalized.density@.
138The point is that you use this logic.  First you check
139whether \verb@x@ is in the support.  If not return \verb@-Inf@.
140If it is, return a finite value.  Do not crash.  Do not return
141\verb@NA@, \verb@NaN@, or \verb@Inf@.  If you do, then \verb@metrop@
142crashes, and it's your fault.
143
144Of course, a crash is no big deal.  Lots of first efforts in R crash.
145You just fix the problem and retry.  Error messages are your friends.
146
147\subsection{Proposal}
148
149We also need to specify the proposal distribution (the preceding stuff
150assumed some default proposal).  This can be any multivariate normal
151distribution on the Euclidean space of dimension \verb@length(initial)@
152having mean zero.  Thus it is specified by specifying its covariance
153matrix.
154
155But to avoid having to check whether the specified covariance
156matrix actually is a covariance matrix, we make the specification
157an arbitrary \verb@d@ by \verb@d@ matrix, call it \verb@scale@,
158where \verb@d@ is the dimension of the state space, specified by
159\verb@length(initial)@,
160and use the proposal \verb@x + scale %*% z@, where \verb@x@ is the
161current state and \verb@z@ is a standard normal random vector
162(``standard'' meaning its covariance matrix is the identity matrix).
163
164Thus we need to add this to the argument list of our function.
165It is now
166\begin{verbatim}
167metrop(lud, initial, niter, scale, ...)
168\end{verbatim}
169
170The covariance matrix specified by this is, of course,
171\verb@scale %*% t(scale)@.  If you want the proposal to have covariance
172matrix \verb@melvin@, then specifying
173\verb@scale = t(chol(melvin))@ will do the job.
174(Of course, many other specifications will also do the job.)
175
176For convenience, we also allow \verb@scale@ to be a vector of
177length \verb@d@ and in this case take \verb@scale = sally@
178to mean the same thing as \verb@scale = diag(sally)@.
179
180For convenience, we also allow \verb@scale@ to be a vector of
181length 1 and in this case take \verb@scale = sally@
182to mean the same thing as \verb@scale = sally * diag(d)@
183where \verb@d@ is still the dimension of the state space
184\verb@length(initial)@.
185
186We can use this last convenience option to give \verb@scale@ a default
187\begin{verbatim}
188metrop(lud, initial, niter, scale = 1, ...)
189\end{verbatim}
190
191In order to tell what is sensible scaling, we need to return the
192acceptance rate (the proportion of proposals that are accepted).
193The only criterion known for choosing sensible scaling is to adjust
194so that the acceptance rate is about 20\%.  Of course, that recommendation
195was derived for a specific toy model that is not very much like what
196people do in real MCMC applications, so 20\% is only a very rough guideline.
197But acceptance rate is all we know to use, so that's what we will output.
198
199Thus the result of \verb@metrop@, assuming we write it in R will be
200something like
201\begin{verbatim}
202return(structure(list(path = path, rate = rate),
203    class = "mcmc"))
204\end{verbatim}
205and if we write it in C will be whatever does the same job.
206
207\subsection{Output I}
208
209Generally we don't want \verb@path@ to be as described above.
210It may be way too big.  We might have \verb@d@, the dimension of the
211state space $10^3$ or even larger and we might have \verb@niter@
212$10^7$ or even larger, the resulting \verb@path@ matrix would
213be $10^{10}$ doubles or $8 \times 10^{10}$ bytes.  Too big to fit in
214my computer!
215
216Thus we facilitate subsampling and batching of the output.
217
218\subsubsection{Subsampling}
219
220If the Markov chain exhibits high autocorrelation, subsampling the
221chain may lose little information.  (Most users way overdo the subsampling,
222but it's not the job of a computer program to keep users from overdoing
223things).  Thus we add an argument \verb@nspac@ that specifies subsampling.
224Only every \verb@nspac@ iterate is output.
225
226\subsubsection{Batching}
227
228The method of batch means uses ``batches'' which are sums over consecutive
229blocks of output.  For most purposes batching is better than subsampling.
230It loses no information while reducing the amount of output even more
231than subsampling.  So we introduce arguments \verb@nbatch@ specifying
232the number of batches and \verb@blen@ specifying the length of the batches.
233
234Our function now has signature
235\begin{verbatim}
236metrop(lud, initial, nbatch, blen = 1, nspac = 1, scale = 1,
237    ...)
238\end{verbatim}
239Note that the argument \verb@niter@ formerly present has vanished.
240The number of iterations that will now be done is
241\verb@nbatch * blen * nspac@.  If we accept the defaults
242\verb@blen = 1@ and \verb@nspac = 1@, then \verb@nbatch@ is the same
243as the former argument \verb@niter@.  Otherwise, it is quite different.
244
245\subsection{Output II}
246
247The preceding section takes care of of the problem of \verb@niter@ being
248too big.  This section deals with the dimension of the state space being
249too big.  When the dimension of the state space is large, we generally
250do not want to output the whole state, but only some function of the
251state.
252
253Thus we need another function (besides \verb@lud@) to produce the
254output vector.  Call it \verb@outfun@.  The requirements on \verb@outfun@
255are
256\begin{itemize}
257\item If \verb@is.finite(lud(x, ...))@, then \verb@outfun(x, ...)@
258works (it does not crash) and produces a vector having all elements finite
259and always of the same length (say \verb@k@).
260\end{itemize}
261\verb@outfun@ will never be called in any other situation
262(that is, never when \verb@x@ is not in the support of the equilibrium
263distribution).
264
265Now we can describe the \verb@path@ component of the output.
266We'll use a little math here.  Write $L$ for \verb@blen@ and
267$M$ for \verb@nspac@.  Write $x_i$ for the $i$-th iterate of the
268Markov chain, and write $g$ for \verb@outfun@.  Then \verb@path[@$j$\verb@, ]@
269is the vector
270$$
271   \frac{1}{L} \sum_{i = 1}^L g(x_{M [L (j - 1) + i]})
272$$
273
274For convenience, we also allow \verb@outfun@ to be a logical vector of
275length \verb@d@ or an integer vector having elements in \verb@1:d@
276or in \verb@-(1:d)@ and in this case take \verb@outfun = fred@
277to mean the same thing as \verb@outfun = function(x) x[fred]@.
278
279For convenience, we also allow \verb@outfun@ to be missing
280take this to mean the same thing as \verb@outfun = function(x) x@,
281that is, the ``outfun'' is the identity function and we are back
282to outputting the entire state.
283
284Our function now has signature
285\begin{verbatim}
286metrop(lud, initial, nbatch, blen = 1, nspac = 1, scale = 1,
287    outfun, ...)
288\end{verbatim}
289
290\subsection{Restarting}
291
292It should be possible to restart the Markov chain and get exactly the
293same results.  It should be possible to continue the Markov chain and
294get exactly the same results.  Thus we need to save the initial and
295final state of the Markov chain
296and the initial and final state of the random number generator
297(the R object \verb@.Random.seed@).
298
299Thus the result of \verb@metrop@ now looks like
300\begin{verbatim}
301return(structure(list(path = path, rate = rate,
302    initial = initial, final = final,
303    initial.seed = iseed, final.seed = .Random.seed),
304    class = "mcmc"))
305\end{verbatim}
306
307We also need to add arguments to \verb@metrop@.  It now has signature
308\begin{verbatim}
309metrop(lud, initial, nbatch, blen = 1, nspac = 1, scale = 1,
310    outfun, object, restart = FALSE, ...)
311\end{verbatim}
312Here \verb@object@ is an R object of class \verb@"mcmc"@, the output
313a previous call to \verb@metrop@, from which we take either initial
314or final state and seed depending the value of \verb@restart@.
315
316While we are at it, it is convenient to allow any or all of the other
317arguments to be missing if \verb@object@ is supplied.  We just take
318the argument from \verb@object@.  Thus we can make calls like
319\begin{verbatim}
320out <- metrop(fred, x, 1e3, scale = 4, blen = 3)
321out.too <- metrop(object = out, nbatch = 1e4)
322\end{verbatim}
323
324Woof!  I now see (how embarrasing) after four earlier versions how to
325use the R class system to make this convenient.  We have three functions.
326\begin{verbatim}
327metrop.default <- function(o, ...)
328UseMethod("metrop")
329
330metrop.mcmc <- function(o, initial, nbatch, blen = 1,
331    nspac = 1, scale = 1, outfun, restart = FALSE, ...)
332{
333    if (missing(nbatch)) nbatch <- o$nbatch
334    if (missing(blen)) blen <- o$blen
335    if (missing(nspac)) nspac <- o$nspac
336    if (missing(scale)) scale <- o$scale
337    if (missing(outfun)) outfun <- o$outfun
338
339    if (restart) {
340        .Random.seed <- o$final.seed
341        return(metrop.function(o$lud, o$final, nbatch, blen,
342            nspac, scale, outfun))
343    } else {
344        .Random.seed <- o$initial.seed
345        return(metrop.function(o$lud, o$initial, nbatch, blen,
346            nspac, scale, outfun))
347    }
348}
349
350metrop.function <- function(o, initial, nbatch, blen = 1,
351    nspac = 1, scale = 1, outfun, restart = FALSE, ...)
352{
353    if (! exists(".Random.seed")) runif(1)
354    initial.seed <- .Random.seed
355    func1 <- function(state) o(state, ...)
356    func2 <- function(state) outfun(state, ...)
357    .Call("metrop", func1, initial, nbatch, blen,
358        nspac, scale, func2, environment(fun = func1),
359        environment(fun = func2))
360}
361\end{verbatim}
362Note that \verb@restart@ is ignored in \verb@metrop.function@.
363We can't ``restart'' when we have no saved state in an \verb@"mcmc"@
364object.
365
366Note also that our \verb@"mcmc"@ objects must now store a lot more stuff
367(and more to come in the next section).
368
369\subsection{Testing and Debugging}
370
371It is nearly impossible to test or debug a random algorithm
372(any Monte Carlo) from looking at its designed (useful to the user)
373output.  In order to do any serious testing or debugging, it is necessary
374to look under the hood.  For the Metropolis algorithm, we need to look
375at the current state, the proposal, the log odds ratio, the uniform
376random variate (if any) used in the Metropolis rejection, and the
377result (accept or reject) of the Metropolis rejection.
378
379Hence we need to add one final argument \verb@debug = FALSE@ to our
380functions and a lot of debugging output to the result.
381
382In debugging a Metropolis (etc.)\ algorithm there is a very important
383principle.  Debugging should use Markov chain theory!  Just enlarge
384the state space of the Markov chain to include
385\begin{itemize}
386\item the proposal (vector),
387\item details of the calculation of the Metropolis-Hastings-Green ratio
388    (for Metropolis this is just the log unnormalized density at the
389    current state and proposal, for others it includes proposal densities)
390    and the calculated ratio,
391\item the uniform random number (if any) used in the decision, and
392\item the decision (\verb@TRUE@ or \verb@FALSE@) in the Metropolis rejection.
393\end{itemize}
394With all that it is easy to tell whether the algorithm is doing the Right
395Thing.  Without all that, it's nearly impossible.
396
397\end{document}
398
399