1Blurb:: 2Use second Kraskov algorithm to compute mutual information 3 4Description:: 5This algorithm is derived in \cite Kra04 . The mutual information between 6\f$m\f$ random variables is approximated by 7 8\f[ 9I_{2}(X_{1}, X_{2}, \ldots, X_{m}) = \psi(k) + (m-1)\psi(N) - (m-1)/k - 10< \psi(n_{x_{1}}) + \psi(n_{x_{2}}) + \ldots + \psi(n_{x_{m}}) >, 11\f] 12 13where \f$\psi\f$ is the digamma function, \f$k\f$ is the number of nearest 14neighbors being used, and \f$N\f$ is the 15number of samples available for the joint distribution of the random variables. 16For each point \f$z_{i} = (x_{1,i}, x_{2,i}, \ldots, x_{m,i})\f$ in the joint 17distribution, \f$z_{i}\f$ and its \f$k\f$ nearest neighbors are projected into 18each marginal subpsace. For each subspace \f$ j = 1, \ldots, m\f$, 19\f$\epsilon_{j,i}\f$ is defined as the radius of the \f$l_{\infty}\f$-ball 20containing all \f$k+1\f$ points. Then, \f$n_{x_{j,i}}\f$ is the number of points 21in the \f$j\f$-th subspace within a distance of \f$\epsilon_{j,i}\f$ from the 22point \f$x_{j,i}\f$. The angular brackets denote that the average of 23\f$\psi(n_{x_{j,i}})\f$ is taken over all points \f$i = 1, \ldots, N\f$. 24 25Topics:: 26Examples:: 27\verbatim 28method 29 bayes_calibration queso 30 dram 31 seed = 34785 32 chain_samples = 1000 33 posterior_stats mutual_info 34 ksg2 35\endverbatim 36 37\verbatim 38method 39 bayes_calibration 40 queso 41 dram 42 chain_samples = 1000 seed = 348 43 experimental_design 44 initial_samples = 5 45 num_candidates = 10 46 max_hifi_evaluations = 3 47 ksg2 48\endverbatim 49 50 51Theory:: 52Faq:: 53See_Also:: 54