:orphan:

..
   _href from docs/source/index.rst

******************
Gaussian Processes
******************

GP Basics
=========

Sometimes an unknown parameter or variable in a model is not a scalar value or
a fixed-length vector, but a *function*. A Gaussian process (GP) can be used
as a prior probability distribution whose support is over the space of
continuous functions. A GP prior on the function :math:`f(x)` is usually written,

.. math::

  f(x) \sim \mathcal{GP}(m(x), \, k(x, x')) \,.

The function values are modeled as a draw from a multivariate normal
distribution that is parameterized by the mean function, :math:`m(x)`, and the
covariance function, :math:`k(x, x')`. Gaussian processes are a convenient
choice as priors over functions due to the marginalization and conditioning
properties of the multivariate normal distribution. Usually, the marginal
distribution over :math:`f(x)` is evaluated during the inference step. The
conditional distribution is then used for predicting the function values
:math:`f(x_*)` at new points, :math:`x_*`.

The joint distribution of :math:`f(x)` and :math:`f(x_*)` is multivariate
normal,

.. math::

  \begin{bmatrix} f(x) \\ f(x_*) \\ \end{bmatrix} \sim
  \text{N}\left(
    \begin{bmatrix} m(x)  \\ m(x_*)  \\ \end{bmatrix} \,,
    \begin{bmatrix} k(x, x)    & k(x, x_*)   \\
                    k(x_*, x)  & k(x_*, x_*) \\ \end{bmatrix}
  \right) \,.

Starting from the joint distribution, one obtains the marginal distribution
of :math:`f(x)` as :math:`\text{N}(m(x),\, k(x, x'))`. The conditional
distribution is

.. math::

  f(x_*) \mid f(x) \sim \text{N}\left( k(x_*, x) k(x, x)^{-1} [f(x) - m(x)] + m(x_*) ,\,
  k(x_*, x_*) - k(x_*, x) k(x, x)^{-1} k(x, x_*) \right) \,.
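To make the conditioning formula concrete, here is a minimal NumPy sketch of
the computation, assuming a zero mean function and a simple exponentiated
quadratic kernel. The names (:code:`kernel`, :code:`X_star`, and so on) are
illustrative only and not part of any library API::

    import numpy as np

    def kernel(A, B, ls=1.0):
        # exponentiated quadratic: k(a, b) = exp(-(a - b)^2 / (2 ls^2))
        return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)

    X = np.linspace(0, 1, 10)       # observed inputs
    f = np.sin(3 * X)               # observed values of f(x); m(x) = 0
    X_star = np.linspace(0, 1, 50)  # new input points x_*

    K = kernel(X, X)                # k(x, x)
    K_s = kernel(X_star, X)         # k(x_*, x)
    K_ss = kernel(X_star, X_star)   # k(x_*, x_*)

    # conditional mean and covariance of f(x_*) given f(x)
    K_inv = np.linalg.inv(K + 1e-8 * np.eye(len(X)))  # jitter for stability
    mu_star = K_s @ K_inv @ f
    cov_star = K_ss - K_s @ K_inv @ K_s.T

PyMC3 performs the equivalent linear algebra internally (using Cholesky
factorizations rather than explicit inverses), so this sketch only mirrors
the equations above.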
.. note::

  For more information on GPs, check out the book `Gaussian Processes for
  Machine Learning <http://www.gaussianprocess.org/gpml/>`_ by Rasmussen &
  Williams, or `this introduction <https://www.ics.uci.edu/~welling/teaching/KernelsICS273B/gpB.pdf>`_
  by D. MacKay.

PyMC3 is a great environment for working with fully Bayesian Gaussian Process
models. GPs in PyMC3 have a clear syntax and are highly composable; many
predefined covariance functions (or kernels) and mean functions are included,
along with several GP implementations. GPs are treated as distributions that
can be used within larger or hierarchical models, not just as standalone
regression models.

Mean and covariance functions
=============================

Those who have used the GPy or GPflow Python packages will find the syntax for
constructing mean and covariance functions somewhat familiar. When first
instantiated, the mean and covariance functions are parameterized, but not
given their inputs yet. The covariance functions must additionally be provided
with the number of dimensions of the input matrix, and a list that indexes
which of those dimensions they are to operate on. The reason for this design
is so that covariance functions can be constructed that are combinations of
other covariance functions.

For example, to construct an exponentiated quadratic covariance function that
operates on the second and third columns of a three-column matrix representing
three predictor variables::

    ls = [2, 5]  # the lengthscales
    cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2])

Here the :code:`ls`, or lengthscale, parameter is two dimensional, allowing the
second and third dimension to have different lengthscales. The reason we have to
specify :code:`input_dim`, the total number of columns of :code:`X`, and
:code:`active_dims`, which of those columns or dimensions the covariance
function will act on, is because :code:`cov_func` hasn't actually seen the
input data yet. The :code:`active_dims` argument is optional, and defaults to
all columns of the matrix of inputs.

Covariance functions in PyMC3 closely follow the algebraic rules for kernels,
which allows users to combine covariance functions into new ones. For example:

- The sum of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) + pm.gp.cov.ExpQuad(...)

- The product of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)

- The product (or sum) of a covariance function with a scalar is also a
  covariance function::

    cov_func = eta**2 * pm.gp.cov.Matern32(...)

After the covariance function is defined, it is evaluated by calling
:code:`cov_func(x, x)` (or :code:`mean_func(x)` for a mean function). Since
PyMC3 is built on top of Theano, it is relatively easy to define and experiment
with non-standard covariance and mean functions.
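For instance, the covariance matrix implied by a covariance function can be
inspected by materializing the Theano expression it returns with
:code:`.eval()`. A short sketch, with random input data used purely for
illustration::

    import numpy as np
    import pymc3 as pm

    # 100 inputs with three columns; the kernel acts only on the
    # second and third columns (active_dims=[1, 2])
    X = np.random.rand(100, 3)
    cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=[2, 5], active_dims=[1, 2])

    # calling the covariance function builds a Theano expression;
    # .eval() turns it into a concrete 100 x 100 NumPy array
    K = cov_func(X).eval()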
For more information check out the tutorial on covariance functions.


GP Implementations
==================

PyMC3 includes several GP implementations, including marginal and latent
variable models and also some fast approximations. Their usage all follows a
similar pattern: first, a GP is instantiated with a mean function and a
covariance function. Then, GP objects can be added together, allowing for
function characteristics to be carefully modeled and separated. Finally, one
of the `prior`, `marginal_likelihood`, or `conditional` methods is called on
the GP object to actually construct the PyMC3 random variable that represents
the function prior.

Using :code:`gp.Latent` as the example, the syntax to first specify the GP
is::

    gp = pm.gp.Latent(mean_func, cov_func)

The first argument is the mean function and the second is the covariance
function. We've made the GP object, but we haven't made clear which function
it is to be a prior for, what the inputs are, or what parameters it will be
conditioned on.

.. note::

  The :code:`gp.Marginal` class and similar don't have a :code:`prior` method.
  Instead they have a :code:`marginal_likelihood` method that is used similarly,
  but has additional required arguments, such as the observed data, noise,
  or other, depending on the implementation. See the notebooks for examples.
  The :code:`conditional` method works similarly.

Calling the `prior` method will create a PyMC3 random variable that represents
the latent function :math:`f(x) = \mathbf{f}`::

    f = gp.prior("f", X)

:code:`f` is a random variable that can be used within a PyMC3 model like any
other type of random variable. The first argument is the name of the random
variable representing the function we are placing the prior over.
The second argument is the inputs to the function that the prior is over,
:code:`X`. The inputs are usually known and present in the data, but they can
also be PyMC3 random variables. If the inputs are a Theano tensor or a
PyMC3 random variable, the :code:`shape` needs to be given.

Usually at this point, inference is performed on the model. The
:code:`conditional` method creates the conditional, or predictive,
distribution over the latent function at arbitrary :math:`x_*` input points,
:math:`f(x_*)`. To construct the conditional distribution we write::

    f_star = gp.conditional("f_star", X_star)
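Putting these steps together, a complete model built around :code:`gp.Latent`
might look like the following sketch. The data, priors, and Gaussian
likelihood here are placeholder choices for illustration, not a prescribed
workflow::

    import numpy as np
    import pymc3 as pm

    X = np.linspace(0, 10, 50)[:, None]  # inputs must be a column matrix
    y = np.random.randn(50)              # placeholder observations

    with pm.Model() as model:
        ls = pm.Gamma("ls", alpha=2, beta=1)
        cov_func = pm.gp.cov.ExpQuad(input_dim=1, ls=ls)

        # mean_func defaults to zero when omitted
        gp = pm.gp.Latent(cov_func=cov_func)

        # latent function values, one per row of X
        f = gp.prior("f", X)

        # f is used like any other random variable, here as the
        # mean of a Gaussian likelihood
        sigma = pm.HalfNormal("sigma", sigma=1)
        y_obs = pm.Normal("y_obs", mu=f, sigma=sigma, observed=y)

        trace = pm.sample(1000)

        # predictive distribution over f at new inputs
        X_star = np.linspace(0, 12, 30)[:, None]
        f_star = gp.conditional("f_star", X_star)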
Additive GPs
============

The GP implementation in PyMC3 is constructed so that it is easy to define
additive GPs and sample from individual GP components. We can write::

    gp1 = pm.gp.Marginal(mean_func1, cov_func1)
    gp2 = pm.gp.Marginal(mean_func2, cov_func2)
    gp3 = gp1 + gp2

The GP objects have to have the same type; a :code:`gp.Marginal` cannot
be added to a :code:`gp.Latent`.

Consider two independent, GP distributed functions, :math:`f_1(x) \sim
\mathcal{GP}\left(m_1(x),\, k_1(x, x')\right)` and :math:`f_2(x) \sim
\mathcal{GP}\left( m_2(x),\, k_2(x, x')\right)`. The joint distribution of
:math:`f_1,\, f_1^*,\, f_2,\, f_2^*,\, f_1 + f_2` and :math:`f_1^* + f_2^*` is

.. math::

  \begin{bmatrix} f_1 \\ f_1^* \\ f_2 \\ f_2^*
               \\ f_1 + f_2 \\ f_1^* + f_2^* \end{bmatrix} \sim
  \text{N}\left(
    \begin{bmatrix} m_1 \\ m_1^* \\ m_2 \\ m_2^* \\
                    m_1 + m_2 \\ m_1^* + m_2^* \\ \end{bmatrix} \,,\,
    \begin{bmatrix}
      K_1       & K_1^{*}  & 0         & 0        & K_1                 & K_1^{*}           \\
      K_1^{*^T} & K_1^{**} & 0         & 0        & K_1^{*}             & K_1^{**}          \\
      0         & 0        & K_2       & K_2^{*}  & K_2                 & K_2^{*}           \\
      0         & 0        & K_2^{*^T} & K_2^{**} & K_2^{*}             & K_2^{**}          \\
      K_1       & K_1^{*}  & K_2       & K_2^{*}  & K_1 + K_2           & K_1^{*} + K_2^{*} \\
      K_1^{*^T} & K_1^{**} & K_2^{*^T} & K_2^{**} & K_1^{*^T}+K_2^{*^T} & K_1^{**}+K_2^{**}
    \end{bmatrix}
  \right) \,.

Using the joint distribution to obtain the conditional distribution of
:math:`f_1^*` given the sum :math:`f_1 + f_2`, we get

.. math::

  f_1^* \mid f_1 + f_2 \sim \text{N}\left(
    m_1^* + K_1^{*^T}(K_1 + K_2)^{-1}\left[f_1 + f_2 - m_1 - m_2\right] \,,\,
    K_1^{**} - K_1^{*^T}(K_1 + K_2)^{-1}K_1^* \right) \,.

These equations show how to break down GP models into individual components to
see how each contributes to the data. For more information, check out `David
Duvenaud's PhD thesis <https://www.cs.toronto.edu/~duvenaud/thesis.pdf>`_.

The GP objects in PyMC3 keep track of these marginals automatically. The
following code sketch shows how to define the conditional distributions of
:math:`f_1^*` and :math:`f_2^*`. We use `gp.Marginal` in the example, but the
same works for other implementations. The first block fits the GP prior. We
denote :math:`f_1 + f_2` as just :math:`f` for brevity::

    with pm.Model() as model:
        gp1 = pm.gp.Marginal(mean_func1, cov_func1)
        gp2 = pm.gp.Marginal(mean_func2, cov_func2)

        # gp represents f1 + f2.
        gp = gp1 + gp2

        f = gp.marginal_likelihood("f", X, y, noise)

        trace = pm.sample(1000)

To construct the conditional distribution of :code:`gp1` or :code:`gp2`, we
also need to include the additional arguments, :code:`X`, :code:`y`, and
:code:`noise`::

    with model:
        # conditional distributions of f1 and f2
        f1_star = gp1.conditional("f1_star", X_star,
                                  given={"X": X, "y": y, "noise": noise, "gp": gp})
        f2_star = gp2.conditional("f2_star", X_star,
                                  given={"X": X, "y": y, "noise": noise, "gp": gp})

        # conditional of f1 + f2, `given` not required
        f_star = gp.conditional("f_star", X_star)

This second block produces the conditional distributions. Notice that extra
arguments are required for the conditionals of :math:`f_1` and :math:`f_2`,
but not for :math:`f`. This is because those arguments are cached when
:code:`.marginal_likelihood` is called on :code:`gp`.

.. note::

  When constructing conditionals, the additional arguments :code:`X`, :code:`y`,
  :code:`noise` and :code:`gp` must be provided as a dict called :code:`given`.

Since the :code:`marginal_likelihood` method wasn't called on :code:`gp1` or
:code:`gp2`, their conditionals need to be provided with the required inputs.
In the same fashion as the prior, :code:`f_star`, :code:`f1_star` and
:code:`f2_star` are random variables that can now be used like any other
random variable in PyMC3.

Check the notebooks for detailed demonstrations of the usage of GP functionality
in PyMC3.
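As one last sketch, predictive draws for each of the conditionals defined
above can be generated with :code:`pm.sample_posterior_predictive` (named
:code:`pm.sample_ppc` in older PyMC3 releases); the variable names follow the
previous blocks::

    with model:
        pred_samples = pm.sample_posterior_predictive(
            trace, var_names=["f_star", "f1_star", "f2_star"]
        )

    # one array of predictive draws per component
    f_total_draws = pred_samples["f_star"]
    f1_draws = pred_samples["f1_star"]
    f2_draws = pred_samples["f2_star"]

This makes it possible to plot and inspect the contribution of each additive
component separately.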