Blurb::
Use the second Kraskov algorithm to compute mutual information

Description::
This algorithm is derived in \cite Kra04. The mutual information between
\f$m\f$ random variables is approximated by

\f[
I_{2}(X_{1}, X_{2}, \ldots, X_{m}) = \psi(k) + (m-1)\psi(N) - \frac{m-1}{k} -
\left\langle \psi(n_{x_{1}}) + \psi(n_{x_{2}}) + \ldots + \psi(n_{x_{m}}) \right\rangle,
\f]

where \f$\psi\f$ is the digamma function, \f$k\f$ is the number of nearest
neighbors used, and \f$N\f$ is the
number of samples available for the joint distribution of the random variables.
For each point \f$z_{i} = (x_{1,i}, x_{2,i}, \ldots, x_{m,i})\f$ in the joint
distribution, \f$z_{i}\f$ and its \f$k\f$ nearest neighbors are projected into
each marginal subspace. For each subspace \f$j = 1, \ldots, m\f$,
\f$\epsilon_{j,i}\f$ is defined as the radius of the \f$l_{\infty}\f$-ball
containing all \f$k+1\f$ points. Then, \f$n_{x_{j,i}}\f$ is the number of points
in the \f$j\f$-th subspace within a distance of \f$\epsilon_{j,i}\f$ from the
point \f$x_{j,i}\f$. The angle brackets denote that the average of
\f$\psi(n_{x_{j,i}})\f$ is taken over all points \f$i = 1, \ldots, N\f$.
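
For illustration only, a minimal Python sketch of this estimator is given
below. This is not Dakota's implementation; the function name
\c ksg2_mutual_info, the use of SciPy, and the default \c k are assumptions
made for the example.

\verbatim
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg2_mutual_info(samples, k=6):
    """Estimate I_2 from a list of m arrays, each of shape (N, d_j)."""
    N = samples[0].shape[0]
    m = len(samples)
    joint = np.hstack(samples)

    # k nearest neighbors of each point in the joint space under the
    # l_inf metric; the query returns the point itself first.
    _, idx = cKDTree(joint).query(joint, k=k + 1, p=np.inf)

    mi = digamma(k) + (m - 1) * digamma(N) - (m - 1) / k
    for x in samples:
        # eps_i: radius of the l_inf-ball in this subspace containing
        # the projections of z_i and its k joint-space neighbors.
        eps = np.abs(x[idx] - x[:, None, :]).max(axis=(1, 2))
        tree = cKDTree(x)
        # n_i: points of this subspace within eps_i of x_i (self excluded).
        n = np.array([len(tree.query_ball_point(x[i], eps[i], p=np.inf)) - 1
                      for i in range(N)])
        mi -= digamma(n).mean()
    return mi
\endverbatim

For two scalar variables, a call would look like
\c ksg2_mutual_info([x.reshape(-1,1), y.reshape(-1,1)]).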

Topics::
Examples::
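The first example computes the mutual information of the posterior samples
from a QUESO DRAM calibration using the second Kraskov estimator: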
\verbatim
method
  bayes_calibration queso
    dram
    seed = 34785
    chain_samples = 1000
    posterior_stats mutual_info
      ksg2
\endverbatim
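The second example selects the KSG2 estimator for the mutual information
calculations performed within the experimental design algorithm: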
\verbatim
method
  bayes_calibration queso
    dram
    chain_samples = 1000 seed = 348
    experimental_design
      initial_samples = 5
      num_candidates = 10
      max_hifi_evaluations = 3
      ksg2
\endverbatim

Theory::
Faq::
See_Also::