1\chapter{Uncertainty Quantification Capabilities}\label{uq}
2
3\section{Overview}\label{uq:overview}
4
5At a high level, uncertainty quantification (UQ) or nondeterministic
6analysis is the process of (1) characterizing input uncertainties, (2)
7forward propagating these uncertainties through a computational model,
8and (3) performing statistical or interval assessments on the
9resulting responses. This process determines the effect of
10uncertainties and assumptions on model outputs or results. In Dakota,
11uncertainty quantification methods primarily focus on the forward
12propagation and analysis parts of the process (2 and 3), where
13probabilistic or interval information on parametric inputs are mapped
14through the computational model to assess statistics or intervals on
15outputs. For an overview of these approaches for engineering
16applications, consult~\cite{Hal00}. Dakota also has emerging methods
17for inference or inverse UQ, such as Bayesian calibration. These
18methods help with (1) by obtaining a statistical characterization of
19input parameters based on provided data.
20
21UQ is related to sensitivity analysis in that the common goal is to
22gain an understanding of how variations in the parameters affect the
23response functions of the engineering design problem. However, for UQ,
24some or all of the components of the parameter vector, are considered
25to be uncertain as specified by particular probability distributions
26(e.g., normal, exponential, extreme value), or other uncertainty
27structures. By assigning specific distributional structure to the
28inputs, distributional structure for the outputs (i.e, response
29statistics) can be inferred.  This migrates from an analysis that is
30more {\em qualitative} in nature, in the case of sensitivity analysis,
31to an analysis that is more rigorously {\em quantitative}.
32
33UQ methods are often distinguished by their ability to propagate
34aleatory or epistemic input uncertainty characterizations, where
35aleatory uncertainties are irreducible variabilities inherent in
36nature and epistemic uncertainties are reducible uncertainties
37resulting from a lack of knowledge. Since sufficient data is generally
38available for aleatory uncertainties, probabilistic methods are
39commonly used for computing response distribution statistics based on
40input probability distribution specifications. Conversely, for
41epistemic uncertainties, any use of probability distributions is based
42on subjective knowledge rather than objective data, and we may
43alternatively explore nonprobabilistic methods based on interval
44specifications.
45
46\subsection{Summary of Dakota UQ Methods}\label{uq:overview:methods}
47
48Dakota contains capabilities for performing nondeterministic analysis
49with both types of input uncertainty. These UQ methods have been
50developed by Sandia Labs, in conjunction with collaborators in
51academia~\cite{Gha99,Gha91,Eld05,Tang10a}.
52
53The aleatory UQ methods in Dakota include various sampling-based
54approaches (e.g., Monte Carlo and Latin Hypercube sampling), local and
55global reliability methods, and stochastic expansion (polynomial chaos
56expansions and stochastic collocation) approaches. The epistemic UQ
57methods include local and global interval analysis and Dempster-Shafer
58evidence theory. These are summarized below and then described in more
59depth in subsequent sections of this chapter. Dakota additionally
60supports mixed aleatory/epistemic UQ via interval-valued probability,
61second-order probability, and Dempster-Shafer theory of
62evidence. These involve advanced model recursions and are described in
63Section~\ref{adv_models:mixed_uq}.
64
65%In addition, future extensions to the DDACE package will make it
66%applicable to general UQ problems, which will augment the Dakota/UQ
67%capabilities.
68%Uncertainty quantification methods (also referred to as
69%nondeterministic analysis methods) in the Dakota/UQ system involve the
70%computation of probabilistic information about response functions
71%based on sets of simulations taken from the specified probability
72%distributions for uncertain parameters. That is,
73
74%The impact on the response functions due to the probabilistic nature
75%of the parameters is often estimated using a sampling-based approach
76%such as Monte Carlo sampling or one of its variants (latin hypercube,
77%quasi-Monte Carlo, Markov-chain Monte Carlo, etc.). In these sampling
78%approaches, a random number generator is used to select different
79%values of the parameters with probability specified by their
80%probability distributions. This is the point that distinguishes UQ
81%sampling from DoE/DACE sampling, in that the former supports general
82%probabilistic descriptions of the parameter set and the latter
83%generally supports only a bounded parameter space description. A
84%particular set of parameter values is often called a \emph{sample
85%point}, or simply a \emph{sample}. With Monte Carlo and Latin
86%Hypercube sampling, the user may specify correlations among the input
87%sample points. After a user-selected number of sample points has been
88%generated, the response functions for each sample are evaluated. Then,
89%a statistical analysis is performed on the response function values to
90%yield information on their characteristics. While this approach is
91%straightforward, and readily amenable to parallel computing, it can be
92%computationally expensive depending on the accuracy requirements of
93%the statistical information (which links directly to the number of
94%sample points).
95
96%Finally, when the input uncertainties are poorly characterized, then
97%epistemic uncertainty methods, such as second-order probability or
98%Dempster-Shafer theory of evidence, can be used to compute intervals
99%of potential probability values. The second-order probability
100%approach performs an ensemble of aleatory UQ analyses, one for each
101%realization of the epistemic parameter set. The ensemble of CDF/CCDF
102%curves generates what is known as a ``horse-tail'' plot.
103%Dempster-Shafer, on the other hand, directly generates probability
104%bounds, known as the belief and plausibility functions.
105
106%This chapter has extensive details on various Dakota UQ methods, but
107%here is a high-level summary of available capabilities:
108
109% BMA TODO: Considerably shorten the following descriptions, moving
110% text to later sections if needed.
111
112\textbf{LHS (Latin Hypercube Sampling)}: This package provides both
113Monte Carlo (random) sampling and Latin Hypercube sampling methods,
114which can be used with probabilistic variables in Dakota that have the
115following distributions: normal, lognormal, uniform, loguniform,
116triangular, exponential, beta, gamma, gumbel, frechet, weibull, poisson,
117binomial, negative binomial, geometric, hypergeometric, and
118user-supplied histograms. In addition, LHS accounts for correlations
119among the variables~\cite{Ima84}, which can be used to accommodate a
120user-supplied correlation matrix or to minimize correlation when a
121correlation matrix is not supplied.
122%The LHS package currently serves
123%two purposes: (1) it can be used for uncertainty quantification by
124%sampling over uncertain variables characterized by probability
125%distributions, or (2) it can be used in a DACE mode in which any
126%design and state variables are treated as having uniform distributions
127%(see the \texttt{active all} view override in the Dakota Reference
128%Manual~\cite{RefMan}). The LHS package historically came in two
129%versions: ``old'' (circa 1980) and ``new'' (circa 1998), but presently
130%only the latter is supported in Dakota, requiring a Fortran 90
131%compiler. This ``new'' LHS is available under a separate GNU Lesser
132%General Public License and is distributed with Dakota.
133In addition to a standard sampling study, we support the capability to perform
134``incremental'' LHS, where a user can specify an initial LHS study
135of N samples, and then re-run an additional incremental study which
136will double the number of samples (to 2N, with the first N being
137carried from the initial study). The full incremental sample of
138size 2N is also a Latin Hypercube, with proper stratification and
139correlation. Statistics for each increment are reported separately at
140the end of the study.
141
142\textbf{Reliability Methods}: This suite of methods includes both
143local and global reliability methods. Local methods include first- and
144second-order versions of the Mean Value method (MVFOSM and MVSOSM) and
145a variety of most probable point (MPP) search methods, including the
146Advanced Mean Value method (AMV and AMV$^2$), the iterated Advanced
147Mean Value method (AMV+ and AMV$^2$+), the Two-point Adaptive
148Nonlinearity Approximation method (TANA-3), and the traditional First
149Order and Second Order Reliability Methods (FORM and
150SORM)~\cite{Hal00}. The MPP search methods may be used in forward
151(Reliability Index Approach (RIA)) or inverse (Performance Measure
152Approach (PMA)) modes, as dictated by the type of level mappings. Each
153of the MPP search techniques solve local optimization problems in
154order to locate the MPP, which is then used as the point about which
155approximate probabilities are integrated (using first- or second-order
156integrations in combination with refinements based on importance
157sampling).
158% Reliability mappings may involve computing reliability and
159% probability levels for prescribed response levels (forward
160% reliability analysis, commonly known as the reliability index
161% approach or RIA) or computing response levels for prescribed
162% reliability and probability levels (inverse reliability analysis,
163% commonly known as the performance measure approach or PMA).
164% Approximation-based MPP search methods (AMV, AMV$^2$, AMV+,
165% AMV$^2$+, and TANA) may be applied in either x-space or u-space, and
166% mappings may involve either cumulative or complementary cumulative
167% distribution functions.
168Global reliability methods are designed to handle
169nonsmooth and multimodal failure surfaces, by creating global
170approximations based on Gaussian process models. They accurately
171resolve a particular contour of a response function and then estimate
172probabilities using multimodal adaptive importance sampling.
173
174\textbf{Stochastic Expansion Methods}: %The objective of these
175%techniques is to characterize the response of systems whose governing
176%equations involve stochastic coefficients.
177The development of these techniques mirrors that of deterministic
178finite element analysis utilizing the notions of projection,
179orthogonality, and weak convergence~\cite{Gha99},~\cite{Gha91}. Rather
180than estimating point probabilities, they form an approximation to the
181functional relationship between response functions and their random
182inputs, which provides a more complete uncertainty representation for
183use in multi-code simulations. Expansion methods include polynomial
184chaos expansions (PCE), which employ multivariate orthogonal
185polynomials that are tailored to representing particular input
186probability distributions, and stochastic collocation (SC), which
187employs multivariate interpolation polynomials.  For PCE, expansion
188coefficients may be evaluated using a spectral projection approach
189(based on sampling, tensor-product quadrature, Smolyak sparse grid, or
190cubature methods for numerical integration) or a regression approach
191(least squares or compressive sensing). For SC, interpolants are
192formed over tensor-product or sparse grids and may be local or global,
193value-based or gradient-enhanced, and nodal or hierarchical. In global
194value-based cases (Lagrange polynomials), the barycentric formulation
195is used~\cite{BerTref04,Klimke05,Higham04} to improve numerical
196efficiency and stability.  Both sets of methods provide analytic
197response moments and variance-based metrics; however, CDF/CCDF
198probabilities are evaluated numerically by sampling on the expansion.
199
200\textbf{Importance Sampling}: Importance sampling is a method that
201allows one to estimate statistical quantities such as failure
202probabilities in a way that is more efficient than Monte Carlo
203sampling. The core idea in importance sampling is that one generates
204samples that are preferentially placed in important regions of the
205space (e.g. in or near the failure region or user-defined region
206of interest), then appropriately weights the samples to obtain an
207unbiased estimate of the failure probability.
208
209\textbf{Adaptive Sampling}: The goal in performing adaptive
210sampling is to construct a surrogate
211model that can be used as an accurate predictor of an expensive
212simulation. The aim is to build a surrogate that minimizes the error
213over the entire domain of interest using as little data as possible
214from the expensive simulation. The adaptive sampling methods start
215with an initial LHS sample, and then adaptively choose samples that
216optimize a particular criteria. For example, if a set of additional
217possible sample points are generated, one criteria is to pick the
218next sample point as the point which maximizes the minimum distance
219to the existing points (maximin). Another criteria is to pick the
220sample point where the surrogate indicates the most uncertainty
221in its prediction.
222
223Recently, Dakota added a new method to assess failure probabilities
224based on ideas from computational geometry.  Part of the idea
225underpinning this method is the idea of throwing ``darts'' which
226are higher dimensional objects than sample points (e.g. lines, planes, etc.)
227The POF (Probability-of-Failure) darts method uses these objects
228to estimate failure probabilities.
229
230\textbf{Interval Analysis}: Interval analysis is often used to model
231epistemic uncertainty. In interval analysis, one assumes that nothing
232is known about an epistemic uncertain variable except that its value lies
233somewhere within an interval. In this situation, it is NOT
234assumed that the value has a uniform probability of occurring
235within the interval. Instead, the interpretation is that
236any value within the interval is a possible value or a potential
237realization of that variable. In interval analysis, the
238uncertainty quantification problem is one of determining the
239resulting bounds on the output (defining the output interval)
240given interval bounds on the inputs. Again, any output response
241that falls within the output interval is a possible output
242with no frequency information assigned to it.
243
244We have the capability to perform interval analysis using either
245global or local methods. In the global approach, one uses either a
246global optimization method (based on a Gaussian process surrogate model)
247or a sampling method to assess the bounds. The
248local method uses gradient information in a derivative-based
249optimization approach, using either SQP (sequential quadratic
250programming) or a NIP (nonlinear interior point) method to obtain bounds.
251
252\textbf{Dempster-Shafer Theory of Evidence}: The objective of evidence
253theory is to model the effects of epistemic uncertainties. Epistemic
254uncertainty refers to the situation where one does not know enough
255to specify a probability distribution on a variable. Sometimes epistemic
256uncertainty is referred to as subjective, reducible, or lack of knowledge
257uncertainty. In contrast, aleatory uncertainty refers to the situation
258where one does have enough information to specify a probability distribution.
259In Dempster-Shafer theory of evidence, the uncertain input variables
260are modeled as sets of intervals. The user assigns a basic probability
261assignment (BPA) to each interval, indicating how likely it is that the
262uncertain input falls within the interval. The intervals may be
263overlapping, contiguous, or have gaps. The intervals and their associated
264BPAs are then propagated through the simulation to obtain cumulative
265distribution functions on belief and plausibility. Belief is the lower
266bound on a probability estimate that is consistent with the evidence, and
267plausibility is the upper bound on a probability estimate that is consistent
268with the evidence. In addition to the full evidence theory structure,
269we have a simplified capability for users wanting to perform pure
270interval analysis (e.g. what is the interval on the output given
271intervals on the input) using either global or local optimization methods.
272Interval analysis is often used to model epistemic variables in
273nested analyses, where probability theory is used to model aleatory variables.
274
275\textbf{Bayesian Calibration}: In Bayesian calibration, uncertain
276input parameters are initially characterized by a ``prior''
277distribution. A Bayesian calibration approach uses experimental data
278together with a likelihood function, which describes how well a
279realization of the parameters is supported by the data, to update this
280prior knowledge. The process yields a posterior distribution of the
281parameters most consistent with the data, such that running the model
282at samples from the posterior yields results consistent with the
283observational data.
284
285
286\subsection{Variables and Responses for UQ}\label{uq:overview:varsresp}
287
288Most of the UQ methods perform a forward uncertainty propagation in
289which probability or interval information for input parameters is
290mapped to probability or interval information for output response
291functions. The $m$ functions in the Dakota response data set are
292interpreted as $m$ general response functions by the Dakota methods
293(with no specific interpretation of the functions as for optimization
294and least squares).
295
296Within the variables specification, uncertain variable descriptions
297are employed to define the parameter probability distributions (see
298Section~\ref{variables:uncertain}). For Bayesian inference methods,
299these uncertain variable properties characterize the prior
300distribution. The continuous aleatory distribution types include:
301normal (Gaussian), lognormal, uniform, loguniform, triangular,
302exponential, beta, gamma, gumbel, frechet, weibull, and histogram
303bin. The discrete aleatory distribution types include: poisson,
304binomial, negative binomial, geometric, hypergeometric, and histogram
305point. The epistemic distribution type is interval for continuous
306variables. For epistemic discrete variables, there are three types:
307integer range, integer set, and real set.  When gradient and/or
308Hessian information is used in an uncertainty assessment, derivative
309components are normally computed with respect to the active continuous
310variables, which could be aleatory uncertain, epistemic uncertain,
311aleatory and epistemic uncertain, or all continuous variables,
312depending on the active view (see Section~\ref{variables:mixed}).
313
314\section{Sampling Methods}\label{uq:sampling}
315
316Sampling techniques are selected using the \texttt{sampling}
317method selection. This method generates sets of samples according to
318the probability distributions of the uncertain variables and maps them
319into corresponding sets of response functions, where the number of
320samples is specified by the \texttt{samples} integer specification.
321Means, standard deviations, coefficients of variation (COVs), and 95\%
322confidence intervals are computed for the response functions.
323Probabilities and reliabilities may be computed for
324\texttt{response\_levels} specifications, and response levels may be
325computed for either \texttt{probability\_levels} or
326\texttt{reliability\_levels} specifications (refer to the Method
327keywords section in the Dakota Reference Manual~\cite{RefMan} for
328additional information).
329
330Currently, traditional Monte Carlo (MC) and Latin hypercube sampling
331(LHS) are supported by Dakota and are chosen by specifying
332\texttt{sample\_type} as \texttt{random} or \texttt{lhs}. In Monte
333Carlo sampling, the samples are selected randomly according to the
334user-specified probability distributions. Latin hypercube sampling is
335a stratified sampling technique for which the range of each uncertain
336variable is divided into $N_{s}$ segments of equal probability, where
337$N_{s}$ is the number of samples requested. The relative lengths of
338the segments are determined by the nature of the specified probability
339distribution (e.g., uniform has segments of equal width, normal has
340small segments near the mean and larger segments in the tails). For
341each of the uncertain variables, a sample is selected randomly from
342each of these equal probability segments. These $N_{s}$ values for
343each of the individual parameters are then combined in a shuffling
344operation to create a set of $N_{s}$ parameter vectors with a
345specified correlation structure. A feature of the resulting sample set
346is that
347\emph{every row and column in the hypercube of partitions has exactly one sample}.
348Since the total number of samples is exactly equal
349to the number of partitions used for each uncertain variable, an
350arbitrary number of desired samples is easily accommodated (as
351compared to less flexible approaches in which the total number of
352samples is a product or exponential function of the number of
353intervals for each variable, i.e., many classical design of
354experiments methods).
355
356Advantages of sampling-based methods include their relatively simple
357implementation and their independence from the scientific disciplines
358involved in the analysis. The main drawback of these techniques is the
359large number of function evaluations needed to generate converged
360statistics, which can render such an analysis computationally very
361expensive, if not intractable, for real-world engineering
362applications. LHS techniques, in general, require fewer samples than
363traditional Monte Carlo for the same accuracy in statistics, but they
364still can be prohibitively expensive. For further information on the
365method and its relationship to other sampling techniques, one is
366referred to the works by McKay, et al.~\cite{Mck79}, Iman and
367Shortencarier~\cite{Ima84}, and Helton and Davis~\cite{Hel00}.
368Note that under certain separability conditions associated with the
369function to be sampled,
370Latin hypercube sampling provides a more accurate estimate of the mean
371value than does random sampling. That is, given an equal number of
372samples, the LHS estimate of the mean will have less variance than the
373mean value obtained through random sampling.
374
375Figure~\ref{dace:figure01} demonstrates Latin hypercube sampling on a
376two-variable parameter space. Here, the range of both parameters,
377$x_1$ and $x_2$, is $[0,1]$. Also, for this example both $x_1$
378and $x_2$ have uniform statistical distributions. For Latin
379hypercube sampling, the range of each parameter is divided into $p$
380``bins'' of equal probability. For parameters with uniform
381distributions, this corresponds to partitions of equal size. For $n$
382design parameters, this partitioning yields a total of $p^{n}$ bins in
383the parameter space. Next, $p$ samples are randomly selected in the
384parameter space, with the following restrictions: (a) each sample is
385randomly placed inside a bin, and (b) for all one-dimensional
386projections of the $p$ samples and bins, there will be one and only
387one sample in each bin. In a two-dimensional example such as that
388shown in Figure~\ref{dace:figure01}, these LHS rules guarantee that
389only one bin can be selected in each row and column. For $p=4$, there
390are four partitions in both $x_1$ and $x_2$. This gives a total of
39116 bins, of which four will be chosen according to the criteria
392described above. Note that there is more than one possible arrangement
393of bins that meet the LHS criteria. The dots in
394Figure~\ref{dace:figure01} represent the four sample sites in this
395example, where each sample is randomly located in its bin. There is
396no restriction on the number of bins in the range of each parameter,
397however, all parameters must have the same number of bins.
398
399\begin{figure}[htbp!]
400  \centering
401  \includegraphics[scale=0.35]{images/lhs_graphic}
402  \caption{An example of Latin hypercube sampling with four bins in
403    design parameters $x_1$ and $x_2$. The dots
404    are the sample sites.}
405  \label{dace:figure01}
406\end{figure}
407
408The actual algorithm for generating Latin hypercube samples is more
409complex than indicated by the description given above. For example,
410the Latin hypercube sampling method implemented in the LHS
411code~\cite{Swi04} takes into account a user-specified correlation
412structure when selecting the sample sites. For more details on the
413implementation of the LHS algorithm, see Reference~\cite{Swi04}.
414
415In addition to Monte Carlo vs. LHS design choices, Dakota sampling
416methods support options for incrementally-refined designs, generation
417of approximately determinant-optimal (D-optimal) designs, and
418selection of sample sizes to satisfy Wilks' criteria.
419
420\subsection{Uncertainty Quantification Example using Sampling Methods}\label{uq:uncertainty1}
421
422The input file in Figure~\ref{uq:figure01}
423demonstrates the use of Latin hypercube Monte Carlo sampling for
424assessing probability of failure as measured by specified response
425levels.  The two-variable Textbook example problem (see
426Equation~\ref{additional:textbook_f}) will be used to demonstrate
427the application of sampling methods for uncertainty quantification
428where it is assumed that $x_1$ and $x_2$ are uniform uncertain
429variables on the interval $[0,1]$.
430
431The number of samples to
432perform is controlled with the \texttt{samples} specification, the
433type of sampling algorithm to use is controlled with the
434\texttt{sample\_type} specification, the levels used for computing
435statistics on the response functions is specified with the
436\texttt{response\_levels} input, and the \texttt{seed} specification
437controls the sequence of the pseudo-random numbers generated by the
438sampling algorithms. The input samples generated are shown in
439Figure~\ref{uq:figure02} for the case where \texttt{samples} = 5 and
440\texttt{samples} = 10 for both \texttt{random} ($\circ$) and
441\texttt{lhs} ($+$) sample types.
442
443\begin{figure}[htbp!]
444  \centering \begin{bigbox} \begin{small}
445  \verbatimtabinput[8]{textbook_uq_sampling.in} \end{small} \end{bigbox}
446\caption{Dakota input file for UQ example using LHS --
447see \protect\path{dakota/share/dakota/examples/users/textbook_uq_sampling.in} }
448\label{uq:figure01}
449\end{figure}
450
451\begin{figure}[htbp!]
452  \centering
453  \subfigure{\includegraphics[scale=0.35]{images/input_samples5}}
454  \subfigure{\includegraphics[scale=0.35]{images/input_samples10}}
455  \caption{Distribution of input sample points for random ($\triangle$)
456    and lhs ($+$) sampling for \texttt{samples=5} and \texttt{10}.}
457  \label{uq:figure02}
458\end{figure}
459
460Latin hypercube sampling ensures full coverage of the range of the
461input variables, which is often a problem with Monte Carlo sampling
462when the number of samples is small. In the case of \texttt{samples =
4635}, poor stratification is evident in $x_1$ as four out of the five
464Monte Carlo samples are clustered in the range $0.35 < x_1 < 0.55$,
465and the regions $x_1 < 0.3$ and $0.6 < x_1 < 0.9$ are completely
466missed. For the case where \texttt{samples = 10}, some clustering in
467the Monte Carlo samples is again evident with \texttt{4} samples in
468the range $0.5 < x_1 < 0.55$. In both cases, the stratification with
469LHS is superior.
470
471The response function statistics returned by Dakota
472are shown in Figure~\ref{uq:figure03}. The first block of output
473specifies the response sample means, sample standard deviations, and
474skewness and kurtosis.  The second block of output displays confidence
475intervals on the means and standard deviations of the responses.
476The third block defines Probability Density Function (PDF) histograms
477of the samples: the histogram bins are defined by the lower and upper
478values of the bin and the corresponding density for that bin.
479Note that these bin endpoints correspond to the \texttt{response\_levels}
480and/or \texttt{probability\_levels} defined by the user in the Dakota
481input file.  If there are just a few levels, these histograms may be coarse.
482Dakota does not do anything to optimize the bin size or spacing.
483Finally, the last section of the output defines the Cumulative Distribution
484Function (CDF) pairs.  In this case,
485\texttt{distribution cumulative} was specified for the response
486functions, and Dakota presents the probability levels corresponding to the
487specified response levels (\texttt{response\_levels}) that were set.
488The
489default \texttt{compute probabilities} was used. Alternatively,
490Dakota could have provided CCDF pairings, reliability levels
491corresponding to prescribed response levels, or response levels
492corresponding to prescribed probability or reliability levels.
493
494\begin{figure}[htbp!]
495\centering
496\begin{bigbox}
497\begin{footnotesize}
498\begin{verbatim}
499Statistics based on 10 samples:
500
501Sample moment statistics for each response function:
502                            Mean           Std Dev          Skewness          Kurtosis
503 response_fn_1  3.8383990322e-01  4.0281539886e-01  1.2404952971e+00  6.5529797327e-01
504 response_fn_2  7.4798705803e-02  3.4686110941e-01  4.5716015887e-01 -5.8418924529e-01
505 response_fn_3  7.0946176558e-02  3.4153246532e-01  5.2851897926e-01 -8.2527332042e-01
506
50795% confidence intervals for each response function:
508                    LowerCI_Mean      UpperCI_Mean    LowerCI_StdDev    UpperCI_StdDev
509 response_fn_1  9.5683125821e-02  6.7199668063e-01  2.7707061315e-01  7.3538389383e-01
510 response_fn_2 -1.7333078422e-01  3.2292819583e-01  2.3858328290e-01  6.3323317325e-01
511 response_fn_3 -1.7337143113e-01  3.1526378424e-01  2.3491805390e-01  6.2350514636e-01
512
513Probability Density Function (PDF) histograms for each response function:
514PDF for response_fn_1:
515          Bin Lower          Bin Upper      Density Value
516          ---------          ---------      -------------
517   2.3066424677e-02   1.0000000000e-01   3.8994678038e+00
518   1.0000000000e-01   2.0000000000e-01   2.0000000000e+00
519   2.0000000000e-01   6.0000000000e-01   5.0000000000e-01
520   6.0000000000e-01   1.2250968624e+00   4.7992562123e-01
521PDF for response_fn_2:
522          Bin Lower          Bin Upper      Density Value
523          ---------          ---------      -------------
524  -3.5261164651e-01   1.0000000000e-01   1.1046998102e+00
525   1.0000000000e-01   2.0000000000e-01   2.0000000000e+00
526   2.0000000000e-01   6.0000000000e-01   5.0000000000e-01
527   6.0000000000e-01   6.9844576220e-01   1.0157877573e+00
528PDF for response_fn_3:
529          Bin Lower          Bin Upper      Density Value
530          ---------          ---------      -------------
531  -3.8118095128e-01   1.0000000000e-01   1.2469321539e+00
532   1.0000000000e-01   2.0000000000e-01   0.0000000000e+00
533   2.0000000000e-01   6.0000000000e-01   7.5000000000e-01
534   6.0000000000e-01   6.4526450977e-01   2.2092363423e+00
535
536Level mappings for each response function:
537Cumulative Distribution Function (CDF) for response_fn_1:
538     Response Level  Probability Level  Reliability Index  General Rel Index
539     --------------  -----------------  -----------------  -----------------
540   1.0000000000e-01   3.0000000000e-01
541   2.0000000000e-01   5.0000000000e-01
542   6.0000000000e-01   7.0000000000e-01
543Cumulative Distribution Function (CDF) for response_fn_2:
544     Response Level  Probability Level  Reliability Index  General Rel Index
545     --------------  -----------------  -----------------  -----------------
546   1.0000000000e-01   5.0000000000e-01
547   2.0000000000e-01   7.0000000000e-01
548   6.0000000000e-01   9.0000000000e-01
549Cumulative Distribution Function (CDF) for response_fn_3:
550     Response Level  Probability Level  Reliability Index  General Rel Index
551     --------------  -----------------  -----------------  -----------------
552   1.0000000000e-01   6.0000000000e-01
553   2.0000000000e-01   6.0000000000e-01
554   6.0000000000e-01   9.0000000000e-01
555\end{verbatim}
556\end{footnotesize}
557\end{bigbox}
558\caption{Dakota response function statistics from UQ sampling example.}
559\label{uq:figure03}
560\end{figure}
561
562In addition to obtaining statistical summary information of the type
563shown in Figure~\ref{uq:figure03}, the results of LHS sampling also
564include correlations. Four types of correlations are returned in the
565output: simple and partial ``raw'' correlations, and simple and
566partial ``rank'' correlations. The raw correlations refer to
567correlations performed on the actual input and output data. Rank
568correlations refer to correlations performed on the ranks of the data.
569Ranks are obtained by replacing the actual data by the ranked values,
570which are obtained by ordering the data in ascending order. For
571example, the smallest value in a set of input samples would be given a
572rank 1, the next smallest value a rank 2, etc. Rank correlations are
573useful when some of the inputs and outputs differ greatly in
574magnitude: then it is easier to compare if the smallest ranked input
575sample is correlated with the smallest ranked output, for example.
576
577Correlations are always calculated between two sets of sample data.
578One can calculate correlation coefficients between two input
579variables, between an input and an output variable (probably the most
580useful), or between two output variables. The simple correlation
581coefficients presented in the output tables are Pearson's correlation
582coefficient, which is defined for two variables $x$ and $y$ as:
583$\mathtt{Corr}(x,y) = \frac{\sum_{i}(x_{i}-\bar{x})(y_{i}-\bar{y})}
584{\sqrt{\sum_{i}(x_{i}-\bar{x})^2\sum_{i}(y_{i}-\bar{y})^2}}$.
585Partial correlation coefficients are similar to simple correlations,
586but a partial correlation coefficient between two variables measures
587their correlation while adjusting for the effects of the other
588variables. For example, say one has a problem with two inputs and one
589output; and the two inputs are highly correlated. Then the
590correlation of the second input and the output may be very low after
591accounting for the effect of the first input. The rank correlations
592in Dakota are obtained using Spearman's rank correlation. Spearman's
593rank is the same as the Pearson correlation coefficient except that it
594is calculated on the rank data.
595
596Figure~\ref{uq:figure04} shows an example of the correlation output
597provided by Dakota for the input file in Figure~\ref{uq:figure01}.
598Note that these correlations are presently only available when one
599specifies \texttt{lhs} as the sampling method under \texttt{sampling}.
600Also note that the simple and partial correlations should be similar in most
601cases (in terms of values of correlation coefficients). This is
602because we use a default ``restricted pairing'' method in the LHS
603routine which forces near-zero correlation amongst uncorrelated
604inputs.
605
606\begin{figure}[htbp!]
607\centering
608\begin{bigbox}
609\begin{small}
610\begin{verbatim}
611Simple Correlation Matrix between input and output:
612                       x1           x2 response_fn_1 response_fn_2 response_fn_3
613          x1  1.00000e+00
614          x2 -7.22482e-02  1.00000e+00
615response_fn_1 -7.04965e-01 -6.27351e-01  1.00000e+00
616response_fn_2  8.61628e-01 -5.31298e-01 -2.60486e-01  1.00000e+00
617response_fn_3 -5.83075e-01  8.33989e-01 -1.23374e-01 -8.92771e-01  1.00000e+00
618
619Partial Correlation Matrix between input and output:
620             response_fn_1 response_fn_2 response_fn_3
621          x1 -9.65994e-01  9.74285e-01 -9.49997e-01
622          x2 -9.58854e-01 -9.26578e-01  9.77252e-01
623
624Simple Rank Correlation Matrix between input and output:
625                       x1           x2 response_fn_1 response_fn_2 response_fn_3
626          x1  1.00000e+00
627          x2 -6.66667e-02  1.00000e+00
628response_fn_1 -6.60606e-01 -5.27273e-01  1.00000e+00
629response_fn_2  8.18182e-01 -6.00000e-01 -2.36364e-01  1.00000e+00
630response_fn_3 -6.24242e-01  7.93939e-01 -5.45455e-02 -9.27273e-01  1.00000e+00
631
632Partial Rank Correlation Matrix between input and output:
633             response_fn_1 response_fn_2 response_fn_3
634          x1 -8.20657e-01  9.74896e-01 -9.41760e-01
635          x2 -7.62704e-01 -9.50799e-01  9.65145e-01
636\end{verbatim}
637\end{small}
638\end{bigbox}
639\caption{Correlation results using LHS Sampling.}
640\label{uq:figure04}
641\end{figure}
642
643Finally, note that the LHS package can be used for design of
644experiments over design and state variables by including an active
645view override in the variables specification section of the Dakota
646input file (see Section~\ref{variables:mixedview}). Then, instead of
647iterating on only the uncertain variables, the LHS package will sample
648over all of the active variables. In the \texttt{active all} view,
649continuous design and continuous state variables are treated as having
650uniform probability distributions within their upper and lower bounds,
651discrete design and state variables are sampled uniformly from within
652their sets or ranges, and any uncertain variables are sampled within
653their specified probability distributions.
654
655\subsection{Incremental Sampling}\label{uq:incremental}
656
657In many situations, one may run an initial sample set and then need to
658perform further sampling to get better estimates of the mean,
659variance, and percentiles, and to obtain more comprehensive sample
660coverage. We call this capability incremental sampling.  Typically, a
661Dakota restart file (\path{dakota.rst}) would be available from the
662original sample, so only the newly generated samples would need to be
663evaluated.  Incremental sampling supports continuous uncertain
664variables and discrete uncertain variables such as discrete
665distributions (e.g.  binomial, Poisson, etc.) as well as histogram
666variables and uncertain set types.
667
668There are two cases, incremental random and incremental Latin
669hypercube sampling, with incremental LHS being the most common.  One
670major advantage of LHS incremental sampling is that it maintains the
671stratification and correlation structure of the original LHS sample.
672That is, if one generated two independent LHS samples and simply
673merged them, the calculation of the accuracy of statistical measures
674such as the mean and the variance would be slightly
675incorrect. However, in the incremental case, the full sample (double
676the original size) is a Latin Hypercube sample itself and statistical
677measures and their accuracy can be properly calculated. The
678incremental sampling capability is most useful when one is starting
679off with very small samples. Once the sample size is more than a few
680hundred, the benefit of incremental sampling diminishes.
681
682\begin{enumerate}
683
684\item Incremental random sampling: With incremental random sampling,
685  the original sample set with $N1$ samples must be generated using
686  \texttt{sample\_type = random} and \texttt{samples = N1}.  Then, the
687  user can duplicate the Dakota input file and add
688  \texttt{refinement\_samples = N2} with the number of new samples
689  $N2$ to be added.  Random incremental sampling does not require a
690  doubling of samples each time. Thus, the user can specify any number
691  of \texttt{refinement\_samples} (from an additional one sample to a
692  large integer).
693
694  For example, if the first sample has 50 samples, and 10 more samples
695  are desired, the second Dakota run should specify \texttt{samples =
696    50}, \texttt{refinement\_samples = 10}.  In this situation, only
697  10 new samples will be generated, and the final statistics will be
698  reported at the end of the study both for the initial 50 samples and
699  for the full sample of 60. The command line syntax for
700  running the second sample is \texttt{dakota -i input60.in -r
701    dakota.50.rst} where \texttt{input60.in} is the input file with
702  the refinement samples specification and \path{dakota.50.rst} is
703  the restart file containing the initial 50 samples.  Note that if
704  the restart file has a different name, that is fine; the correct
705  restart file name should be used.
706
707  This process can be repeated if desired,arbitrarily extending the
708  total sample size each time, e.g, \texttt{samples = 50},
709  \texttt{refinement\_samples = 10  3  73  102}.
710
711\item Incremental Latin hypercube sampling: With incremental LHS
712  sampling, the original sample set with $N1$ samples must be
713  generated using \texttt{sample\_type = lhs} and \texttt{samples =
714    N1}.  Then, the user can duplicate the Dakota input file and add
715  \texttt{refinement\_samples = N1}.  The sample size must double each
716  time, so the first set of refinement samples must be the same size
717  as the initial set.  That is, if one starts with a very small sample
718  size of 10, then one can use the incremental sampling capability to
719  generate sample sizes of 20, 40, 80, etc.
720
721  For example, if the first sample has 50 samples, in the second
722  Dakota run, the number of refinement samples should be set to 50 for
723  a total of 100. In this situation, only 50 new samples will be
724  generated, and at the end of the study final statistics will be reported
725  both for the initial 50 samples and for the full sample of 100. The command
726  line syntax for running the second sample
727  is \texttt{dakota -i input100.in -r dakota.50.rst}, where
728  \path{input100.in} is the input file with the incremental sampling
729  specification and \path{dakota.50.rst} is the restart file
730  containing the initial 50 samples.  Note that if the restart file
731  has a different name, that is fine; the correct restart file name
732  should be used.
733
734  This process can be repeated if desired, doubling the total sample
735  size each time, e.g, \texttt{samples = 50},
736  \texttt{refinement\_samples = 50  100  200  400}.
737
738\end{enumerate}
739
740\subsection{Principal Component Analysis}
741As of Dakota 6.3, we added a capability to perform
742Principal Component Analysis on field response data when
743using LHS sampling.  Principal components analysis (PCA)
744is a data reduction method and allows one to express an
745ensemble of field data with a set of principal components
746responsible for the spread of that data.
747
748Dakota can calculate the principal components of the response matrix of
749N samples * L responses (the field response of length L)
750using the keyword \texttt{principal\_components}.
751The Dakota implementation is under active development:  the PCA capability
752may ultimately be specified elsewhere or used in different ways.
753For now, it is performed as a post-processing analysis based on a set of
754Latin Hypercube samples.
755
756If the user specifies LHS sampling with field data responses and also
757specifies \texttt{principal\_components}, Dakota will calculate the principal
758components by calculating the eigenvalues and eigenvectors of a centered
759data matrix.
760Further, if the user specifies \texttt{percent\_variance\_explained} = 0.99,
761the number of components that accounts for at least 99 percent of the variance
762in the responses will be retained.  The default for this percentage is 0.95.
763In many applications, only a few principal components explain the majority
764of the variance, resulting in significant data reduction.
765The principal components are written to a file, \path{princ_comp.txt}.
766Dakota also uses the principal components to create a surrogate model by
767representing the overall response as weighted sum of M principal components,
768where the weights will be determined by Gaussian processes which are a function
769of the input uncertain variables.  This reduced form then can be used for
770sensitivity analysis, calibration, etc.
771
772\subsection{Wilks-based Sample Sizes}\label{uq:wilks}
773
774Most of the sampling methods require the user to specify the number of
775samples in advance.  However, if one specifies \texttt{random} sampling,
776one can use an approach developed by Wilks\cite{Wilks} to determine
777the number of samples that ensures a particular confidence level in
778a percentile of interest.  The Wilks method of computing the number
779of samples to execute for a random sampling study is based on order
780statistics, eg considering the outputs ordered from smallest to largest
781\cite{Wilks,Nutt04}.  Given a \texttt{probability\_level}, $\alpha$,
782and \texttt{confidence\_level}, $\beta$, the Wilks calculation determines
783the minimum number of samples required such that there is $(\beta*100)$\%
784confidence that the $(\alpha*100)$\%-ile of the uncertain distribution
785on model output will fall below the actual $(\alpha*100)$\%-ile given
786by the sample.  To be more specific, if we wish to calculate the $95\%$
787confidence limit on the $95^{th}$ percentile, Wilks indicates that 59
788samples are needed.  If we order the responses and take the largest one,
789that value defines a tolerance limit on the 95th percentile:  we have a
790situation where $95\%$ of the time, the $95^{th}$ percentile will fall at
791or below that sampled value.  This represents a \texttt{one\_sided\_upper}
792treatment applicable to the largest output value.  This treatment
793can be reversed to apply to the lowest output value by using the
794\texttt{one\_sided\_lower} option, and further expansion to include an
795interval containing both the smallest and the largest output values in
796the statistical statement can be specified via the \texttt{two\_sided}
797option.  Additional generalization to higher order statistics, eg a
798statement applied to the N largest outputs (\texttt{one\_sided\_upper}) or the
799N smallest and N largest outputs (\texttt{two\_sided}), can be specified
800using the \texttt{order} option along with value N.
801
802\section{Reliability Methods}\label{uq:reliability}
803
804Reliability methods provide an alternative approach to uncertainty
805quantification which can be less computationally demanding than
806sampling techniques. Reliability methods for uncertainty
807quantification are based on probabilistic approaches that compute
808approximate response function distribution statistics based on
809specified uncertain variable distributions. These response statistics
810include response mean, response standard deviation, and cumulative or
811complementary cumulative distribution functions (CDF/CCDF). These
812methods are often more efficient at computing statistics in the tails
813of the response distributions (events with low probability) than
814sampling based approaches since the number of samples required to
815resolve a low probability can be prohibitive.
816
817The methods all answer the fundamental question: ``Given a set of
818uncertain input variables, $\mathbf{X}$, and a scalar response
819function, $g$, what is the probability that the response function is
820below or above a certain level, $\bar{z}$?'' The former can be written
821as $P[g(\mathbf{X}) \le \bar{z}] = \mathit{F}_{g}(\bar{z})$ where
822$\mathit{F}_{g}(\bar{z})$ is the cumulative distribution function
823(CDF) of the uncertain response $g(\mathbf{X})$ over a set of response
824levels. The latter can be written as $P[g(\mathbf{X}) > \bar{z}]$ and
825defines the complementary cumulative distribution function (CCDF).
826
827This probability calculation involves a multi-dimensional integral
828over an irregularly shaped domain of interest, $\mathbf{D}$, where
829$g(\mathbf{X}) < z$ as displayed in Figure~\ref{uq:figure05} for the
830case of two variables. The reliability methods all involve the
831transformation of the user-specified uncertain variables,
832$\mathbf{X}$, with probability density function, $p(x_1,x_2)$, which
833can be non-normal and correlated, to a space of independent Gaussian
834random variables, $\mathbf{u}$, possessing a mean value of zero and
835unit variance (i.e., standard normal variables). The region of
836interest, $\mathbf{D}$, is also mapped to the transformed space to
837yield, $\mathbf{D_{u}}$ , where $g(\mathbf{U}) < z$ as shown in
838Figure~\ref{uq:figure06}. The Nataf transformation~\cite{Der86},
839which is identical to the Rosenblatt transformation~\cite{Ros52} in
840the case of independent random variables, is used in Dakota to
841accomplish this mapping. This transformation is performed to make the
842probability calculation more tractable. In the transformed space,
843probability contours are circular in nature as shown in
844Figure~\ref{uq:figure06} unlike in the original uncertain variable
845space, Figure~\ref{uq:figure05}. Also, the multi-dimensional integrals
846can be approximated by simple functions of a single parameter,
847$\beta$, called the reliability index. $\beta$ is the minimum
848Euclidean distance from the origin in the transformed space to the
849response surface. This point is also known as the most probable point
850(MPP) of failure. Note, however, the methodology is equally applicable
851for generic functions, not simply those corresponding to failure
852criteria; this nomenclature is due to the origin of these methods
853within the disciplines of structural safety and reliability.
854Note that there are local and global reliability methods. The majority
855of the methods available are local, meaning that a local optimization
856formulation is used to locate one MPP. In contrast, global methods
857can find multiple MPPs if they exist.
858\begin{figure}[htbp!]
859  \centering
860  \includegraphics[scale=0.75]{images/cdf_orig_graphic}
861  \caption{Graphical depiction of calculation of cumulative
862    distribution function in the original uncertain variable space.}
863  \label{uq:figure05}
864\end{figure}
865
866\begin{figure}[htbp!]
867  \centering
868  \includegraphics[scale=0.75]{images/cdf_tran_graphic}
869  \caption{Graphical depiction of integration for the calculation of
870    cumulative distribution function in the transformed uncertain
871    variable space.}
872  \label{uq:figure06}
873\end{figure}
874
875\subsection{Local Reliability Methods}\label{uq:reliability:local}
876
877The Dakota Theory Manual~\cite{TheoMan} provides the algorithmic
878details for the local reliability methods, including the Mean Value
879method and the family of most probable point (MPP) search methods.
880
881
882\subsubsection{Method mapping} \label{uq:reliability:local:map}
883
884Given settings for limit state approximation, approximation order,
885integration approach, and other details presented to this point, it is
886evident that the number of algorithmic combinations is high.
887Table~\ref{tab:rel_meth_map} provides a succinct mapping for some of
888these combinations to common method names from the reliability
889literature, where blue indicates the most well-known combinations and
890gray indicates other supported combinations.
891\begin{table}
892\centering
893\caption{Mapping from Dakota options to standard reliability methods.}
894\label{tab:rel_meth_map}
895\begin{tabular}{|c|c|c|}
896\hline
897& \multicolumn{2}{c|}{Order of approximation and integration} \\ \cline{2-3}
898MPP search      & First order & Second order                        \\ \hline
899none            & \cellcolor{blue}\textcolor{white}{MVFOSM}
900                & \cellcolor[gray]{0.5}\textcolor{black}{MVSOSM}   \\ \hline
901x\_taylor\_mean & \cellcolor{blue}\textcolor{white}{AMV}
902                & \cellcolor[gray]{0.5}\textcolor{black}{AMV$^2$}  \\ \hline
903u\_taylor\_mean & \cellcolor[gray]{0.5}\textcolor{black}{u-space AMV}
904                & \cellcolor[gray]{0.5}\textcolor{black}{u-space AMV$^2$} \\
905\hline
906x\_taylor\_mpp  & \cellcolor{blue}\textcolor{white}{AMV+}
907                & \cellcolor[gray]{0.5}\textcolor{black}{AMV$^2$+} \\ \hline
908u\_taylor\_mpp  & \cellcolor[gray]{0.5}\textcolor{black}{u-space AMV+}
909                & \cellcolor[gray]{0.5}\textcolor{black}{u-space AMV$^2$+} \\
910\hline
911x\_two\_point   & \cellcolor{blue}\textcolor{white}{TANA}
912                & \cellcolor[gray]{0.5}                             \\ \hline
913u\_two\_point   & \cellcolor[gray]{0.5}\textcolor{black}{u-space TANA}
914                & \cellcolor[gray]{0.5}                             \\ \hline
915no\_approx      & \cellcolor{blue}\textcolor{white}{FORM}
916                & \cellcolor{blue}\textcolor{white}{SORM}           \\ \hline
917\end{tabular}
918\end{table}
919
920Within the Dakota specification (refer to \texttt{local\_reliability}
921in the keywords section of the Reference Manual~\cite{RefMan})
922within the Reference Manual), the MPP search and integration order
923selections are explicit in the method specification, but the order of
924the approximation is inferred from the associated response
925specification (as is done with local taylor series approximations
926described in Section~\ref{models:surf:taylor}). Thus, reliability
927methods do not have to be synchronized in approximation and
928integration order as shown in the table; however, it is often
929desirable to do so.
930
931
932\subsection{Global Reliability Methods}\label{uq:reliability:global}
933
934Global reliability methods are designed to handle nonsmooth and
935multimodal failure surfaces, by creating global approximations based
936on Gaussian process models. They accurately resolve a particular
937contour of a response function and then estimate probabilities using
938multimodal adaptive importance sampling.
939
940The global reliability method in Dakota is called
941Efficient Global Reliability Analysis (EGRA) ~\cite{Bichon2008}.
942The name is due to its
943roots in efficient global optimization (EGO) ~\cite{Jon98,Hua06}.
944The main idea in EGO-type optimization methods is that a global
945approximation is made of the underlying function. This approximation,
946which is a Gaussian process model, is used to guide the search by finding
947points which maximize the expected improvement function (EIF).
948The EIF is used to select the location at which a new training point should be
949added to the Gaussian process model by maximizing the amount of improvement
950in the objective function that can be expected by adding that point.
951A point could be expected to produce an improvement in the objective function
952if its predicted value is better than the current best solution, or if the
953uncertainty in its prediction is such that the probability of it producing
954a better solution is high.
955Because the uncertainty is higher in regions of the design space with fewer
956observations, this provides a balance between exploiting areas of the
957design space that predict good solutions, and exploring areas where more
958information is needed.
959
960The general procedure of these EGO-type methods is:
961\begin{enumerate}
962\item Build an initial Gaussian process model of the objective function.
963%\item Use cross validation to ensure that the GP model is satisfactory.
964\item Find the point that maximizes the EIF.
965      If the EIF value at this point is sufficiently small, stop.
966\item Evaluate the objective function at the point where the EIF is maximized.
967      Update the Gaussian process model using this new point.
968      Go to Step 2.
969\end{enumerate}
970
971Gaussian process (GP) models are used because
972they provide not just a predicted value at an unsampled point, but also an
973estimate of the prediction variance.
974This variance gives an indication of the uncertainty in the GP model, which
975results from the construction of the covariance function.
976This function is based on the idea that when input points are near one another,
977the correlation between their corresponding outputs will be high.
978As a result, the uncertainty associated with the model's predictions will be
979small for input points which are near the points used to train the model,
980and will increase as one moves further from the training points.
981
982The expected improvement function is used in EGO algorithms
983to select the location at which a new training point should be added.
984The EIF is defined as the expectation that any point in the search
985space will provide a better solution than the current best solution
986based on the expected values and variances predicted by the GP model.
987It is important to understand how the use of this EIF leads to optimal
988solutions. The EIF indicates how much the objective function value at
989a new potential location
990is expected to be less than the predicted value at the current best solution.
991Because the GP model provides a Gaussian distribution at each predicted
992point, expectations can be calculated.
993Points with good expected values and even a small variance will
994have a significant expectation of producing a better solution (exploitation),
995but so will points that have relatively poor expected values and greater
996variance (exploration).
997
998The application of EGO to reliability analysis, however, is made more
999complicated due to the inclusion of equality constraints.
1000In forward reliability analysis, the response function appears as a
1001constraint rather than the objective. That is, we want to satisfy
1002the constraint that the response equals a threshold value
1003and is on the limit state:  $G({\bf u})\!=\!\bar{z}$.
1004Therefore, the EIF function was modified to focus on feasibility,
1005and instead of using an expected improvement function, we use an
1006expected feasibility function (EFF) ~\cite{Bichon2008}.
1007The EFF provides an indication of how well the response is expected
1008to satisfy the equality constraint.
1009Points where the expected value is close to the threshold
1010($\mu_G\!\approx\!\bar{z}$) and points with a large uncertainty in the
1011prediction will have large expected feasibility values.
1012
1013The general outline of the EGRA algorithm is as follows:  LHS sampling
1014is used to generate a small number of samples from the true response
1015function. Then, an initial Gaussian process model is constructed.
1016Based on the EFF, the point with maximum EFF is found using
1017the global optimizer DIRECT. The true response function is then
1018evaluated at this new point, and this point is added to the sample set
1019and the process of building a new GP model and maximizing the EFF is
1020repeated until the maximum EFF is small. At this stage, the GP model
1021is accurate in the vicinity of the limit state. The GP model
1022is then used to calculate the probability of failure
1023using multimodal importance sampling, which is explained below.
1024
1025One method to calculate the probability of failure is
1026to directly perform the probability
1027integration numerically by sampling the response function.
1028Sampling methods can be
1029prohibitively expensive because they generally require a large
1030number of response function evaluations.
1031Importance sampling methods reduce this expense by focusing the samples in
1032the important regions of the uncertain space.
1033They do this by centering the sampling density function at the MPP rather
1034than at the mean.
1035This ensures the samples will lie the region of interest,
1036thus increasing the efficiency of the sampling method.
1037Adaptive importance sampling (AIS) further improves the efficiency by
1038adaptively updating the sampling density function.
1039Multimodal adaptive importance sampling~\cite{Dey98} is a
1040variation of AIS that allows for the use of multiple sampling densities
1041making it better suited for cases where multiple sections of the limit state
1042are highly probable.
1043
1044Note that importance sampling methods require that the location of at least
1045one MPP be known because it is used to center the initial sampling density.
1046However, current gradient-based, local search methods used in MPP search may
1047fail to converge or may converge to poor solutions for
1048highly nonlinear problems, possibly making these methods inapplicable.
1049The EGRA algorithm described above does
1050not depend on the availability of accurate gradient information, making
1051convergence more reliable for nonsmooth response functions.
1052Moreover, EGRA has the ability to locate multiple failure points, which
1053can provide multiple starting points and thus a good multimodal sampling density for the initial steps of multimodal AIS. The probability assessment
1054using multimodal AIS thus incorporates probability of failure at
1055multiple points.
1056
1057\subsection{Uncertainty Quantification Examples using Reliability Analysis} \label{uq:reliability:ex}
1058
1059%Reliability methods provide an alternative approach to uncertainty
1060%quantification which can be less computationally demanding than
1061%sampling techniques.
1062
1063In summary, the user can choose to perform either forward (RIA) or
1064inverse (PMA) mappings when performing a reliability analysis. With
1065either approach, there are a variety of methods from which to choose
1066in terms of limit state approximations (MVFOSM, MVSOSM, x-/u-space
1067AMV, x-/u-space AMV$^2$, x-/u-space AMV+, x-/u-space AMV$^2$+,
1068x-/u-space TANA, and FORM/SORM), probability integrations
1069(first-order or second-order), limit state Hessian selection
1070(analytic, finite difference, BFGS, or SR1), and MPP optimization
1071algorithm (SQP or NIP) selections.
1072
1073All reliability methods output approximate values of the CDF/CCDF
1074response-probability-reliability levels for prescribed response levels
1075(RIA) or prescribed probability or reliability levels (PMA). In
1076addition, mean value methods output estimates of the response means
1077and standard deviations as well as importance factors that attribute
1078variance among the set of uncertain variables (provided a nonzero
1079response variance estimate).
1080
1081\subsubsection{Mean-value Reliability with Textbook}
1082\label{uq:examples:mv}
1083
1084Figure~\ref{uq:examples:mv_input} shows the Dakota input file for an
1085example problem that demonstrates the simplest reliability method,
1086called the mean value method (also referred to as the Mean Value First
1087Order Second Moment method). It is specified with method keyword
1088\texttt{local\_reliability}. This method calculates the mean and
1089variance of the response function based on information about the mean
1090and variance of the inputs and gradient information at the mean of the
1091inputs. The mean value method is extremely cheap computationally (only
1092five runs were required for the textbook function), but can be quite
1093inaccurate, especially for nonlinear problems and/or problems with
1094uncertain inputs that are significantly non-normal. More detail on the
1095mean value method can be found in the Local Reliability Methods
1096section of the Dakota Theory Manual~\cite{TheoMan}, and more detail on
1097reliability methods in general (including the more advanced methods)
1098is found in Section~\ref{uq:reliability}.
1099
1100Example output from the mean value method is displayed in
1101Figure~\ref{uq:examples:mv_results}. Note that since the mean of both
1102inputs is 1, the mean value of the output for response 1 is zero.
1103However, the mean values of the constraints are both 0.5.  The mean
1104value results indicate that variable x1 is more important in
1105constraint 1 while x2 is more important in constraint 2, which is the
1106case based on Equation~\ref{additional:textbook_f}.  The importance
1107factors are not available for the first response as the standard
1108deviation is zero.
1109
1110\begin{figure}[htbp!]
1111  \centering
1112  \begin{bigbox}
1113    \begin{small}
1114      \verbatimtabinput[8]{textbook_uq_meanvalue.in}
1115    \end{small}
1116  \end{bigbox}
1117  \caption{Mean Value Reliability Method: the Dakota input file -- see
1118    \protect\path{dakota/share/dakota/examples/users/textbook_uq_meanvalue.in}
1119  }
1120  \label{uq:examples:mv_input}
1121\end{figure}
1122
1123\begin{figure}[htbp!]
1124\centering
1125\begin{bigbox}
1126\begin{small}
1127\begin{verbatim}
1128MV Statistics for response_fn_1:
1129  Approximate Mean Response                  =  0.0000000000e+00
1130  Approximate Standard Deviation of Response =  0.0000000000e+00
1131  Importance Factors not available.
1132MV Statistics for response_fn_2:
1133  Approximate Mean Response                  =  5.0000000000e-01
1134  Approximate Standard Deviation of Response =  1.0307764064e+00
1135  Importance Factor for TF1ln                =  9.4117647059e-01
1136  Importance Factor for TF2ln                =  5.8823529412e-02
1137MV Statistics for response_fn_3:
1138  Approximate Mean Response                  =  5.0000000000e-01
1139  Approximate Standard Deviation of Response =  1.0307764064e+00
1140  Importance Factor for TF1ln                =  5.8823529412e-02
1141  Importance Factor for TF2ln                =  9.4117647059e-01
1142\end{verbatim}
1143\end{small}
1144\end{bigbox}
1145\caption{Results of the Mean Value Method on the Textbook Function}
1146\label{uq:examples:mv_results}
1147\end{figure}
1148
1149\subsubsection{FORM Reliability with Lognormal Ratio}
1150
1151This example quantifies the uncertainty in the ``log ratio'' response
1152function:
1153\begin{equation}
1154g(x_1,x_2) = \frac{x_1}{x_2}
1155\end{equation}
1156by computing approximate response statistics using reliability
1157analysis to determine the response cumulative distribution function:
1158\begin{equation}
1159P[g(x_1,x_2) < \bar{z}]
1160\end{equation}
1161where $X_1$ and $X_2$ are identically distributed lognormal random
1162variables with means of \texttt{1}, standard deviations of
1163\texttt{0.5}, and correlation coefficient of \texttt{0.3}.
1164
1165A Dakota input file showing RIA using FORM (option 7 in limit state
1166approximations combined with first-order integration) is listed in
1167Figure~\ref{uq:rel_input_form}.
1168The user first specifies the \texttt{local\_reliability}
1169method, followed by the MPP search approach and integration order. In
1170this example, we specify \texttt{mpp\_search no\_approx} and utilize
1171the default first-order integration to select FORM. Finally, the user
1172specifies response levels or probability/reliability levels to
1173determine if the problem will be solved using an RIA approach or a PMA
1174approach. In the example figure of~\ref{uq:rel_input_form}, we use
1175RIA by specifying a range of \texttt{response\_levels} for the
1176problem. The resulting output for this input is shown in
1177Figure~\ref{uq:rel_output_form}, with probability and reliability
1178levels listed for each response level. Figure~\ref{uq:rel_form_compare}
1179shows that FORM compares favorably to an exact analytic solution for
1180this problem. Also note that FORM does have some error in the
1181calculation of CDF values for this problem, but it is a very small
1182error (on the order of e-11), much smaller than the error obtained
1183when using a Mean Value method, which will be discussed next.
1184\begin{figure}[htbp!]
1185  \centering
1186  \begin{bigbox}
1187    \begin{small}
1188      \verbatimtabinput[8]{logratio_uq_reliability.in}
1189    \end{small}
1190  \end{bigbox}
1191\caption{Dakota input file for Reliability UQ example using FORM --
1192see \protect\path{dakota/share/dakota/examples/users/logratio_uq_reliability.in} }
1193\label{uq:rel_input_form}
1194\end{figure}
1195
1196\begin{figure}[htbp!]
1197\centering
1198\begin{bigbox}
1199\begin{small}
1200\begin{verbatim}
1201Cumulative Distribution Function (CDF) for response_fn_1:
1202     Response Level  Probability Level  Reliability Index
1203     --------------  -----------------  -----------------
1204   4.0000000000e-01   4.7624085962e-02   1.6683404020e+00
1205   5.0000000000e-01   1.0346525475e-01   1.2620507942e+00
1206   5.5000000000e-01   1.3818404972e-01   1.0885143628e+00
1207   6.0000000000e-01   1.7616275822e-01   9.3008801339e-01
1208   6.5000000000e-01   2.1641741368e-01   7.8434989943e-01
1209   7.0000000000e-01   2.5803428381e-01   6.4941748143e-01
1210   7.5000000000e-01   3.0020938124e-01   5.2379840558e-01
1211   8.0000000000e-01   3.4226491013e-01   4.0628960782e-01
1212   8.5000000000e-01   3.8365052982e-01   2.9590705956e-01
1213   9.0000000000e-01   4.2393548232e-01   1.9183562480e-01
1214   1.0000000000e+00   5.0000000000e-01   6.8682233460e-12
1215   1.0500000000e+00   5.3539344228e-01  -8.8834907167e-02
1216   1.1500000000e+00   6.0043460094e-01  -2.5447217462e-01
1217   1.2000000000e+00   6.3004131827e-01  -3.3196278078e-01
1218   1.2500000000e+00   6.5773508987e-01  -4.0628960782e-01
1219   1.3000000000e+00   6.8356844630e-01  -4.7770089473e-01
1220   1.3500000000e+00   7.0761025532e-01  -5.4641676380e-01
1221   1.4000000000e+00   7.2994058691e-01  -6.1263331274e-01
1222   1.5000000000e+00   7.6981945355e-01  -7.3825238860e-01
1223   1.5500000000e+00   7.8755158269e-01  -7.9795460350e-01
1224   1.6000000000e+00   8.0393505584e-01  -8.5576118635e-01
1225   1.6500000000e+00   8.1906005158e-01  -9.1178881995e-01
1226   1.7000000000e+00   8.3301386860e-01  -9.6614373461e-01
1227   1.7500000000e+00   8.4588021938e-01  -1.0189229206e+00
1228\end{verbatim}
1229\end{small}
1230\end{bigbox}
1231\caption{Output from Reliability UQ example using FORM.}
1232\label{uq:rel_output_form}
1233\end{figure}
1234
1235\begin{figure}[htbp!]
1236  \centering
1237  \includegraphics[scale=0.5]{images/cdf_form}
1238\caption{Comparison of the cumulative distribution function (CDF) computed by
1239FORM, the Mean Value method, and the exact CDF for $g(x_1,x_2)=\frac{x_1}{x_2}$}
1240\label{uq:rel_form_compare}
1241\end{figure}
1242
1243If the user specifies \texttt{local\_reliability} as a method with no
1244additional specification on how to do the MPP search (for example, by
1245commenting out {\tt mpp\_search no\_approx} in
1246Figure~\ref{uq:rel_input_form}), then no MPP search is done: the Mean
1247Value method is used. The mean value results are shown in
1248Figure~\ref{uq:rel_output_mv} and consist of approximate mean and
1249standard deviation of the response, the importance factors for each
1250uncertain variable, and approximate probability/reliability levels for
1251the prescribed response levels that have been inferred from the
1252approximate mean and standard deviation (see Mean Value section in
1253Reliability Methods Chapter of Dakota Theory
1254Manual~\cite{TheoMan}). It is evident that the statistics are
1255considerably different from the fully converged FORM results; however,
1256these rough approximations are also much less expensive to
1257calculate. The importance factors are a measure of the sensitivity of
1258the response function(s) to the uncertain input variables. A
1259comparison of the mean value results with the FORM results is shown in
1260Figure~\ref{uq:rel_form_compare}. The mean value results are not
1261accurate near the tail values of the CDF, and can differ from the
1262exact solution by as much as 0.11 in CDF estimates. A comprehensive
1263comparison of various reliability methods applied to the logratio
1264problem is provided in ~\cite{Eld06a}.
1265
1266\begin{figure}[htbp!]
1267\begin{bigbox}
1268\begin{small}
1269\begin{verbatim}
1270MV Statistics for response_fn_1:
1271  Approximate Mean Response                  =  1.0000000000e+00
1272  Approximate Standard Deviation of Response =  5.9160798127e-01
1273  Importance Factor for TF1ln                =  7.1428570714e-01
1274  Importance Factor for TF2ln                =  7.1428572143e-01
1275  Importance Factor for TF1ln     TF2ln      = -4.2857142857e-01
1276Cumulative Distribution Function (CDF) for response_fn_1:
1277     Response Level  Probability Level  Reliability Index  General Rel Index
1278     --------------  -----------------  -----------------  -----------------
1279   4.0000000000e-01   1.5524721837e-01   1.0141851006e+00   1.0141851006e+00
1280   5.0000000000e-01   1.9901236093e-01   8.4515425050e-01   8.4515425050e-01
1281   5.5000000000e-01   2.2343641149e-01   7.6063882545e-01   7.6063882545e-01
1282   6.0000000000e-01   2.4948115037e-01   6.7612340040e-01   6.7612340040e-01
1283   6.5000000000e-01   2.7705656603e-01   5.9160797535e-01   5.9160797535e-01
1284   7.0000000000e-01   3.0604494093e-01   5.0709255030e-01   5.0709255030e-01
1285   7.5000000000e-01   3.3630190949e-01   4.2257712525e-01   4.2257712525e-01
1286   8.0000000000e-01   3.6765834596e-01   3.3806170020e-01   3.3806170020e-01
1287   8.5000000000e-01   3.9992305332e-01   2.5354627515e-01   2.5354627515e-01
1288   9.0000000000e-01   4.3288618783e-01   1.6903085010e-01   1.6903085010e-01
1289   1.0000000000e+00   5.0000000000e-01   0.0000000000e+00   0.0000000000e+00
1290   1.0500000000e+00   5.3367668035e-01  -8.4515425050e-02  -8.4515425050e-02
1291   1.1500000000e+00   6.0007694668e-01  -2.5354627515e-01  -2.5354627515e-01
1292   1.2000000000e+00   6.3234165404e-01  -3.3806170020e-01  -3.3806170020e-01
1293   1.2500000000e+00   6.6369809051e-01  -4.2257712525e-01  -4.2257712525e-01
1294   1.3000000000e+00   6.9395505907e-01  -5.0709255030e-01  -5.0709255030e-01
1295   1.3500000000e+00   7.2294343397e-01  -5.9160797535e-01  -5.9160797535e-01
1296   1.4000000000e+00   7.5051884963e-01  -6.7612340040e-01  -6.7612340040e-01
1297   1.5000000000e+00   8.0098763907e-01  -8.4515425050e-01  -8.4515425050e-01
1298   1.5500000000e+00   8.2372893005e-01  -9.2966967555e-01  -9.2966967555e-01
1299   1.6000000000e+00   8.4475278163e-01  -1.0141851006e+00  -1.0141851006e+00
1300   1.6500000000e+00   8.6405064339e-01  -1.0987005257e+00  -1.0987005257e+00
1301   1.7000000000e+00   8.8163821351e-01  -1.1832159507e+00  -1.1832159507e+00
1302   1.7500000000e+00   8.9755305196e-01  -1.2677313758e+00  -1.2677313758e+00
1303\end{verbatim}
1304\end{small}
1305\end{bigbox}
1306\caption{Output from Reliability UQ example using mean value.}
1307\label{uq:rel_output_mv}
1308\end{figure}
1309
1310Additional reliability analysis and design results are provided in
1311Sections~\ref{additional:logratio}-\ref{additional:steel_column}.
1312
1313
1314\section{Stochastic Expansion Methods}\label{uq:expansion}
1315
1316
1317%The objective of these techniques is to characterize the response of
1318%systems whose governing equations involve stochastic coefficients.
1319The development of these techniques mirrors that of deterministic
1320finite element analysis through the utilization of the concepts of
1321projection, orthogonality, and weak convergence. The polynomial chaos
1322expansion is based on a multidimensional orthogonal polynomial
1323approximation and the stochastic collocation approach is based on a
1324multidimensional interpolation polynomial approximation, both formed
1325in terms of standardized random variables. A distinguishing feature of
1326these two methodologies is that the final solution is expressed as a
1327functional mapping, and not merely as a set of statistics as is the
1328case for many nondeterministic methodologies.  This makes these
1329techniques particularly attractive for use in multi-physics
1330applications which link different analysis packages.  The first
1331stochastic expansion method is the polynomial chaos expansion
1332(PCE)~\cite{Gha99,Gha91}. For smooth functions (i.e.,
1333analytic, infinitely-differentiable) in $L^2$ (i.e., possessing finite
1334variance), exponential convergence rates can be obtained under order
1335refinement for integrated statistical quantities of interest such as
1336mean, variance, and probability. Dakota implements the generalized
1337PCE approach using the Wiener-Askey
1338scheme~\cite{XiuKarn02}, in which Hermite, Legendre, Laguerre, Jacobi,
1339and generalized Laguerre orthogonal polynomials are used for modeling
1340the effect of continuous random variables described by normal,
1341uniform, exponential, beta, and gamma probability distributions,
1342respectively\footnote{Orthogonal polynomial selections also exist for
1343  discrete probability distributions, but are not yet supported in
1344  Dakota.}. These orthogonal polynomial selections are optimal for
1345these distribution types since the inner product weighting function
1346corresponds\footnote{Identical support range; weight differs by at
1347  most a constant factor.} to the probability density functions for
1348these continuous distributions. Orthogonal polynomials can be computed
1349for any positive weight function, so these five classical orthogonal
1350polynomials may be augmented with numerically-generated polynomials
1351for other probability distributions (e.g., for lognormal, extreme
1352value, and histogram distributions).  When independent standard random
1353variables are used (or computed through transformation), the variable
1354expansions are uncoupled, allowing the polynomial orthogonality
1355properties to be applied on a per-dimension basis. This allows one to
1356mix and match the polynomial basis used for each variable without
1357interference with the spectral projection scheme for the response.
1358
1359In non-intrusive PCE, simulations are used as black boxes and the
1360calculation of chaos expansion coefficients for response metrics of
1361interest is based on a set of simulation response evaluations. To
1362calculate these response PCE coefficients, two primary classes of
1363approaches have been proposed: spectral projection and regression. The
1364spectral projection approach projects the response against each basis
1365function using inner products and employs the polynomial orthogonality
1366properties to extract each coefficient. Each inner product involves a
1367multidimensional integral over the support range of the weighting
1368function, which can be evaluated numerically using sampling,
1369tensor-product quadrature, Smolyak sparse grid~\cite{Smolyak_63}, or
1370cubature~\cite{stroud} approaches. The regression approach finds a set
1371of PCE coefficients which best match a set of response values obtained
1372from either a design of computer experiments (``point
1373collocation''~\cite{pt_colloc1}) or from a randomly selected subset of
1374tensor Gauss points (``probabilistic
1375collocation''~\cite{Tat95}). Various methods can be used to solve the
1376resulting linear system, including least squares methods for
1377over-determined systems and compressed sensing methods for
1378under-determined systems. Details of these methods are documented in
1379the Linear regression section of the Dakota Theory
1380Manual~\cite{TheoMan} and the necessary specifications needed to
1381activate these techniques are listed in the keyword section of the
1382Dakota Reference Manual~\cite{RefMan}.
1383
1384Stochastic collocation (SC) is another stochastic expansion technique
1385for UQ that is closely related to PCE. As for PCE, exponential
1386convergence rates can be obtained under order refinement for
1387integrated statistical quantities of interest, provided that the
1388response functions are smooth with finite variance. The primary
1389distinction is that, whereas PCE estimates coefficients for known
1390multivariate orthogonal polynomial basis functions, SC forms
1391multivariate interpolation polynomial bases for known coefficients.
1392The interpolation polynomials may be either local or global and either
1393value-based or gradient-enhanced (four combinations: Lagrange,
1394Hermite, piecewise linear spline, and piecewise cubic spline), and may
1395be used within nodal or hierarchical interpolation
1396formulations. Interpolation is performed on structured grids such as
1397tensor-product or sparse grids. Starting from a tensor-product
1398multidimensional interpolation polynomial in the value-based case
1399(Lagrange or piecewise linear spline), we have the feature that the
1400$i^{th}$ interpolation polynomial has a value of 1 at collocation
1401point $i$ and a value of 0 for all other collocation points, leading
1402to the use of expansion coefficients that are just the response values
1403at each of the collocation points. In the gradient-enhanced case
1404(Hermite or piecewise cubic spline), SC includes both type 1 and type
14052 interpolation polynomials, where the former interpolate the values
1406while producing zero gradients and the latter interpolate the
1407gradients while producing zero values (refer to~\cite{TheoMan} for
1408additional details). Sparse interpolants are weighted sums of these
1409tensor interpolants;
1410%and retain the use of response values as expansion coefficients;
1411however, they are only interpolatory for sparse grids based on fully
1412nested rules and will exhibit some interpolation error at the
1413collocation points for sparse grids based on non-nested rules.
1414A key to maximizing performance with SC is performing collocation
1415using the Gauss points and weights from the same optimal orthogonal
1416polynomials used in PCE.
1417For use of standard Gauss integration rules (not nested variants such
1418as Gauss-Patterson or Genz-Keister) within tensor-product quadrature,
1419tensor PCE expansions and tensor SC interpolants are equivalent in
1420that identical polynomial approximations are
1421generated~\cite{ConstTPQ}. Moreover, this equivalence can be extended
1422to sparse grids based on standard Gauss rules, provided that a sparse
1423PCE is formed based on a weighted sum of tensor expansions~\cite{ConstSSG}.
1424
1425The Dakota Theory Manual~\cite{TheoMan} provides full algorithmic
1426details for the PCE and SC methods.
1427
1428
1429\subsection{Uncertainty Quantification Examples using Stochastic Expansions} \label{uq:stoch_exp:ex}
1430
1431\subsubsection{Polynomial Chaos Expansion for Rosenbrock}
1432\label{uq:stoch_exp:ex:pce}
1433
1434%The term ``Polynomial Chaos'' refers to the representation of a stochastic
1435%process as a polynomial expansion in random (or stochastic) variables. This
1436%representation acts as a response surface that maps stochastic inputs to
1437%stochastic outputs. Desired statistics can then be obtained from the
1438%response surface either analytically or by re-sampling the fast surrogate.
1439%Additional details regarding the method are provided in
1440
1441A typical Dakota input file for performing an uncertainty
1442quantification using PCE is shown in
1443Figure~\ref{uq:examples:pce_input}.  In this example, we compute CDF
1444probabilities for six response levels of Rosenbrock's function. Since
1445Rosenbrock is a fourth order polynomial and we employ a fourth-order
1446expansion using an optimal basis (Legendre for uniform random
1447variables), we can readily obtain a polynomial expansion which exactly
1448matches the Rosenbrock function. In this example, we select Gaussian
1449quadratures using an anisotropic approach (fifth-order quadrature in
1450$x_1$ and third-order quadrature in $x_2$), resulting in a total of 15
1451function evaluations to compute the PCE coefficients.
1452
1453\begin{figure}[htbp!]
1454  \centering
1455  \begin{bigbox}
1456    \begin{small}
1457      \verbatimtabinput[8]{rosen_uq_pce.in}
1458    \end{small}
1459  \end{bigbox}
1460\caption{Dakota input file for performing UQ using polynomial chaos expansions --
1461see \protect\path{dakota/share/dakota/examples/users/rosen_uq_pce.in} }
1462\label{uq:examples:pce_input}
1463\end{figure}
1464
1465The tensor product quadature points upon which the expansion is calculated
1466are shown in Figure~\ref{uq:examples:rosen_pce_points}.
1467The tensor product generates
1468all combinations of values from each individual dimension: it is an
1469all-way pairing of points.
1470
1471\begin{figure}[htbp!]
1472  \centering
1473  \includegraphics[height=2.5in]{images/rosen_pce_pts}
1474  \caption{Rosenbrock polynomial chaos example: tensor product quadrature points.}
1475  \label{uq:examples:rosen_pce_points}
1476\end{figure}
1477
1478Once the expansion coefficients have been calculated, some statistics
1479are available analytically and others must be evaluated numerically.
1480For the numerical portion, the input file specifies the use of 10000
1481samples, which will be evaluated on the expansion to compute the CDF
1482probabilities. In Figure~\ref{uq:examples:pce_out}, excerpts from the
1483results summary are presented, where we first see a summary of the PCE
1484coefficients which exactly reproduce Rosenbrock for a Legendre
1485polynomial basis. The analytic statistics for mean, standard
1486deviation, and COV are then presented. For example, the mean is 455.66
1487and the standard deviation is 606.56. The moments are followed by
1488global sensitivity indices (Sobol' indices).This example shows that
1489variable x1 has the largest main effect (0.497) as compared with
1490variable x2 (0.296) or the interaction between x1 and x2 (0.206).
1491After the global sensitivity indices, the local sensitivities are
1492presented, evaluated at the mean values.  Finally, we see the
1493numerical results for the CDF probabilities based on 10000 samples
1494performed on the expansion. For example, the probability that the
1495Rosenbrock function is less than 100 over these two uncertain
1496variables is 0.342. Note that this is a very similar estimate to what
1497was obtained using 200 Monte Carlo samples, with fewer true function
1498evaluations.
1499\begin{figure}[htbp!]
1500\centering
1501\begin{bigbox}
1502%\begin{footnotesize}
1503\begin{scriptsize}
1504\begin{verbatim}
1505Polynomial Chaos coefficients for response_fn_1:
1506        coefficient   u1   u2
1507        ----------- ---- ----
1508   4.5566666667e+02   P0   P0
1509  -4.0000000000e+00   P1   P0
1510   9.1695238095e+02   P2   P0
1511  -9.9475983006e-14   P3   P0
1512   3.6571428571e+02   P4   P0
1513  -5.3333333333e+02   P0   P1
1514  -3.9968028887e-14   P1   P1
1515  -1.0666666667e+03   P2   P1
1516  -3.3573144265e-13   P3   P1
1517   1.2829737273e-12   P4   P1
1518   2.6666666667e+02   P0   P2
1519   2.2648549702e-13   P1   P2
1520   4.8849813084e-13   P2   P2
1521   2.8754776338e-13   P3   P2
1522  -2.8477220582e-13   P4   P2
1523-------------------------------------------------------------------
1524Statistics derived analytically from polynomial expansion:
1525
1526Moment-based statistics for each response function:
1527                            Mean           Std Dev          Skewness          Kurtosis
1528response_fn_1
1529  expansion:    4.5566666667e+02  6.0656024184e+02
1530  numerical:    4.5566666667e+02  6.0656024184e+02  1.9633285271e+00  3.3633861456e+00
1531
1532Covariance among response functions:
1533[[  3.6791532698e+05 ]]
1534
1535Local sensitivities for each response function evaluated at uncertain variable means:
1536response_fn_1:
1537 [ -2.0000000000e+00  2.4055757386e-13 ]
1538
1539Global sensitivity indices for each response function:
1540response_fn_1 Sobol indices:
1541                                  Main             Total
1542                      4.9746891383e-01  7.0363551328e-01 x1
1543                      2.9636448672e-01  5.0253108617e-01 x2
1544                           Interaction
1545                      2.0616659946e-01 x1 x2
1546
1547Statistics based on 10000 samples performed on polynomial expansion:
1548
1549Probability Density Function (PDF) histograms for each response function:
1550PDF for response_fn_1:
1551          Bin Lower          Bin Upper      Density Value
1552          ---------          ---------      -------------
1553   6.8311107124e-03   1.0000000000e-01   2.0393073423e-02
1554   1.0000000000e-01   1.0000000000e+00   1.3000000000e-02
1555   1.0000000000e+00   5.0000000000e+01   4.7000000000e-03
1556   5.0000000000e+01   1.0000000000e+02   1.9680000000e-03
1557   1.0000000000e+02   5.0000000000e+02   9.2150000000e-04
1558   5.0000000000e+02   1.0000000000e+03   2.8300000000e-04
1559   1.0000000000e+03   3.5755437782e+03   5.7308286215e-05
1560
1561Level mappings for each response function:
1562Cumulative Distribution Function (CDF) for response_fn_1:
1563     Response Level  Probability Level  Reliability Index  General Rel Index
1564     --------------  -----------------  -----------------  -----------------
1565   1.0000000000e-01   1.9000000000e-03
1566   1.0000000000e+00   1.3600000000e-02
1567   5.0000000000e+01   2.4390000000e-01
1568   1.0000000000e+02   3.4230000000e-01
1569   5.0000000000e+02   7.1090000000e-01
1570   1.0000000000e+03   8.5240000000e-01
1571-------------------------------------------------------------------
1572\end{verbatim}
1573\end{scriptsize}
1574%\end{footnotesize}
1575\end{bigbox}
1576\caption{Excerpt of UQ output for polynomial chaos example.}
1577\label{uq:examples:pce_out}
1578\end{figure}
1579
1580\subsubsection{Uncertainty Quantification Example using Stochastic Collocation}
1581\label{uq:stoch_exp:ex:sc}
1582
1583%A typical Dakota input file for performing an uncertainty
1584%quantification using polynomial chaos expansions is shown in
1585%Section~\ref{uq:stoch_exp:ex:pce}, which illustrates PCE defined from an
1586%anisotropic tensor-product quadrature grid. The uncertain variables
1587%are uniforms, so the expansion is built using classical Legendre
1588%polynomials.
1589
1590Compared to the previous PCE example, this section presents a more
1591sophisticated example, where we use stochastic collocation built on an
1592anisotropic sparse grid defined from numerically-generated orthogonal
1593polynomials. The uncertain variables are lognormal in this example and
1594the orthogonal polynomials are generated from Gauss-Wigert recursion
1595coefficients~\cite{simpson_gw} in combination with the Golub-Welsch
1596procedure~\cite{GolubWelsch69}.  The input file is shown in
1597Figure~\ref{uq:figure11}.  Note that the dimension preference of
1598$(2,1)$ is inverted to define a $\gamma$ weighting vector of $(0.5,1)$
1599(and $\underline{\gamma}$ of $0.5$) for use in the anisotropic Smolyak
1600index set constraint (see Smolyak sparse grids section in Stochastic
1601Expansion Methods chapter in Dakota Theory Manual~\cite{TheoMan}). In
1602this example, we compute CDF probabilities for six response levels of
1603Rosenbrock's function. This example requires 19 function evaluations
1604to calculate the interpolating polynomials in stochastic collocation
1605and the resulting expansion exactly reproduces Rosenbrock's function.
1606The placement of the points generated by the sparse grid is shown in
1607Figure~\ref{uq:figure11b}.
1608
1609\begin{figure}[htbp!]
1610  \centering
1611  \begin{bigbox}
1612    \begin{small}
1613      \verbatimtabinput[8]{rosen_uq_sc.in}
1614    \end{small}
1615  \end{bigbox}
1616\caption{Dakota input file for performing UQ using stochastic collocation --
1617see \protect\path{dakota/share/dakota/examples/users/rosen_uq_sc.in} }
1618\label{uq:figure11}
1619\end{figure}
1620
1621\begin{figure}[htbp!]
1622  \centering
1623  \includegraphics[height=2.5in]{images/rosen_sc_pts}
1624  \caption{Rosenbrock stochastic collocation example: sparse grid points.}
1625  \label{uq:figure11b}
1626\end{figure}
1627
1628Once the expansion coefficients have been calculated, some statistics
1629are available analytically and others must be evaluated numerically.
1630For the numerical portion, the input file specifies the use of 10000
1631samples, which will be evaluated on the expansion to compute the CDF
1632probabilities. In Figure~\ref{uq:figure12}, excerpts from the results
1633summary are presented. We first see the moment statistics for mean,
1634standard deviation, skewness, and kurtosis computed by numerical
1635integration (see Analytic moments section in Stochastic Expansion
1636Methods chapter in Dakota Theory Manual~\cite{TheoMan}), where the
1637numerical row corresponds to integration using the original response
1638values and the expansion row corresponds to integration using values
1639from the interpolant. The response covariance (collapsing to a single
1640variance value for one response function) and global sensitivity
1641indices (Sobol' indices) are presented next. This example shows that
1642variable x1 has the largest main effect (0.99) as compared with
1643variable x2 (0.0007) or the interaction between x1 and x2 (0.005).
1644%After the global sensitivity indices, the local, analytic random
1645%variable sensitivities are presented, as computed from
1646%Eqs.~\ref{eq:dR_dx}-\ref{eq:dR_dxi_sc}, evaluated at the mean values.
1647Finally, we see the numerical results for the CDF probabilities based
1648on 10000 samples performed on the expansion. For example, the probability
1649that the Rosenbrock function is less than 100 is 0.7233. Note that these
1650results are significantly different than the ones presented in
1651Section~\ref{uq:stoch_exp:ex:pce} because of
1652the different assumptions about the inputs: uniform[-2,2] versus
1653lognormals with means of 1.0 and standard deviations of 0.5.
1654\begin{figure}[htbp!]
1655\centering
1656\begin{bigbox}
1657\begin{footnotesize}
1658\begin{verbatim}
1659Statistics derived analytically from polynomial expansion:
1660
1661Moment-based statistics for each response function:
1662                            Mean           Std Dev          Skewness          Kurtosis
1663response_fn_1
1664  expansion:    2.5671972656e+02  2.0484189184e+03  2.7419241630e+02  1.9594567379e+06
1665  numerical:    2.5671972656e+02  2.0484189184e+03  2.7419241630e+02  1.9594567379e+06
1666
1667Covariance among response functions:
1668[[  4.1960200651e+06 ]]
1669
1670Global sensitivity indices for each response function:
1671response_fn_1 Sobol indices:
1672                                  Main             Total
1673                      9.9391978710e-01  9.9928724777e-01 x1
1674                      7.1275222945e-04  6.0802128961e-03 x2
1675                           Interaction
1676                      5.3674606667e-03 x1 x2
1677
1678Statistics based on 10000 samples performed on polynomial expansion:
1679
1680Level mappings for each response function:
1681Cumulative Distribution Function (CDF) for response_fn_1:
1682     Response Level  Probability Level  Reliability Index  General Rel Index
1683     --------------  -----------------  -----------------  -----------------
1684   1.0000000000e-01   1.8100000000e-02
1685   1.0000000000e+00   8.7800000000e-02
1686   5.0000000000e+01   5.8410000000e-01
1687   1.0000000000e+02   7.2330000000e-01
1688   5.0000000000e+02   9.2010000000e-01
1689   1.0000000000e+03   9.5660000000e-01
1690\end{verbatim}
1691\end{footnotesize}
1692\end{bigbox}
1693\caption{Excerpt of UQ output for stochastic collocation example.}
1694\label{uq:figure12}
1695\end{figure}
1696
1697\section{Importance Sampling Methods}\label{uq:importance}
1698
1699Importance sampling is a method that allows one to estimate statistical
1700quantities such as failure probabilities (e.g. the probability that
1701a response quantity will exceed a threshold or fall below a threshold value)
1702in a way that is more efficient than Monte Carlo sampling. The core idea
1703in importance sampling is that one generates samples that preferentially
1704samples important regions in the space (e.g. in or near the failure region
1705or user-defined region of interest), and then appropriately weights
1706the samples to obtain an unbiased estimate of the failure probability
1707~\cite{Srinivasan2002}.
1708In importance sampling, the samples are generated from a density which is
1709called the importance density:  it is not the original probability density
1710of the input distributions. The importance density should be centered near the
1711failure region of interest. For black-box simulations such as those commonly
1712interfaced with Dakota, it is difficult to specify the importance density a priori:
1713the user often does not know where the failure region lies, especially in a high-dimensional
1714space.~\cite{Swiler2010}
1715
1716More formally, we define the objective of importance sampling as calculating the probability, $P$, that the output will exceed a threshold level. This is a failure
1717probability, where the failure probability is defined as some scalar function,
1718$y\left(\textbf{X}\right)$, exceeding a threshold, $T$,
1719where the inputs, $\textbf{X}$, are randomly distributed with density, $\rho\left(\textbf{X}\right)$.
1720When evaluating $y\left(\textbf{X}\right)$ is sufficiently expensive or $P$ is sufficiently small, Monte Carlo (MC) sampling methods to estimate $P$ will be infeasible due to the large number of function evaluations required
1721for a specified accuracy.
1722
1723The probability of failure can be thought of as the mean rate of occurrence
1724of failure. The Monte Carlo (MC) estimate of $P$ is therefore the sample
1725mean of the indicator function, $I\left(\textbf{X}\right)$,
1726\begin{equation}
1727P_{MC}=\frac{1}{N}\sum_{i=1}^{N}I\left(\mathbf{X_i}\right)\ \ \textbf{X}\sim \rho\left(\textbf{X}\right),
1728\label{mc_ind}
1729\end{equation}
1730where $N$ samples, $\mathbf{X_i}$, are drawn from
1731$\rho\left(\textbf{X}\right)$,
1732and the indicator function $I\left(\textbf{X}\right)$
1733is 1 if failure occurs and zero otherwise.
1734
1735Importance sampling draws samples from the importance density
1736$\rho'\left(\textbf{X}\right)$ and scales the sample mean by the importance density:
1737\begin{equation}
1738P_{IS}=\frac{1}{N}\sum_{i=1}^N \left(I\left(\mathbf{X_i}\right)\frac{\rho\left(\mathbf{X_i}\right)}{\rho'\left(\mathbf{X_i}\right)}\right)\ \ \textbf{X}\sim\rho'\left(\textbf{X}\right).\label{eqn:ispfail}
1739\end{equation}
1740This reduces the asymptotic error variance from:
1741\begin{equation}
1742\sigma_{err_{MC}}^2=\frac{{\rm E}\left[\left(I\left(\textbf{X}\right)-P\right)^2\right]}{N}
1743\end{equation}
1744to
1745\begin{equation}
1746\sigma_{err_{IS}}^2=\frac{{\rm E}\left[\left(I\left(\textbf{X}\right)\frac{\rho\left(\textbf{X}\right)}{\rho'\left(\textbf{X}\right)}
1747-P\right)^2\right]}{N}.
1748\label{eqn:iserrorvar}
1749\end{equation}
1750Inspection of Eq. ~\ref{eqn:iserrorvar} reveals $\sigma_{err_{IS}}^2=0$ if
1751$\rho'\left(\textbf{X}\right)$ equals the ideal importance density
1752$\rho^*\left(\textbf{X}\right)$,
1753\begin{equation}
1754\rho^*\left(\textbf{X}\right)=\frac{I\left(\textbf{X}\right)\rho\left(\textbf{X}\right)}{P}.\end{equation}
1755
1756However, $\rho^*\left(\textbf{X}\right)$ is unknown a priori because
1757$I\left(\textbf{X}\right)$ is only known where it has been evaluated.
1758Therefore, the required $P$ in the denominator is also unknown:  this is what we are trying to estimate.
1759
1760If importance sampling is to be effective, the practitioner must be able to
1761choose a good $\rho'\left(\textbf{X}\right)$ without already knowing
1762$I\left(\textbf{X}\right)$ everywhere.
1763There is a danger: a poor choice for $\rho'\left(\textbf{X}\right)$ can put most of the samples in
1764unimportant regions and make $\sigma_{err_{IS}}^2$ much greater than
1765$\sigma_{err_{MC}}^2$.
1766In particular, importance sampling can be challenging for very low probability events in high-dimensional spaces where
1767the output $y$ is calculated by a simulation. In these cases, usually one
1768does not know anything a priori about where the failure region exists
1769in input space.
1770We have developed two importance sampling approaches which do not
1771rely on the user explicitly specifying an importance density.
1772
1773\subsection{Importance Sampling Method based on Reliability Approach}\label{uq:importance_rel}
1774The first method is based on ideas in reliability modeling ~\ref{uq:reliability:local}.
1775An initial Latin Hypercube sampling is performed to generate an initial set of samples.
1776These initial samples are augmented with samples from an importance density as follows:
1777The variables are transformed to standard normal space. In the transformed space,
1778the importance density is a set of normal densities centered around points which
1779are in the failure region. Note that this is similar in spirit to the reliability
1780methods, in which importance sampling is centered around a Most Probable Point (MPP).
1781In the case of the LHS samples, the importance sampling density will simply by
1782a mixture of normal distributions centered around points in the failure region.
1783
1784This method is specified by the keyword \texttt{importance\_sampling}.
1785The options for importance sampling are as follows:  \texttt{import}
1786centers a sampling
1787density at one of the initial LHS samples identified in the failure region.
1788It then generates the importance samples, weights them by their probability of occurence
1789given the original density, and calculates the required probability (CDF or CCDF level).
1790\texttt{adapt\_import} is the same as \texttt{import} but is performed iteratively until the
1791failure probability estimate converges.
1792\texttt{mm\_adapt\_import} starts with all of the samples located in the failure region
1793to build a multimodal sampling density. First, it uses a small number of samples around
1794each of the initial samples in the failure region. Note that these samples
1795are allocated to the different points based on their relative probabilities of occurrence:
1796more probable points get more samples. This early part of the approach is done to
1797search for ``representative'' points. Once these are located, the multimodal sampling density is set and then the multi-modal adaptive method proceeds similarly to the
1798adaptive method (sample until convergence).
1799
1800\subsection{Gaussian Process Adaptive Importance Sampling Method}\label{uq:gpais}
1801The second importance sampling method in Dakota is the one we recommend,
1802at least for problems that have a relatively small number of input variables (e.g.
1803less than 10). This method, Gaussian Process Adaptive Importance Sampling,
1804is outlined in the paper ~\cite{Dalbey2014}.
1805This method  starts with an initial set of LHS samples and adds samples one at a time,
1806with the goal of adaptively improving the estimate of the ideal importance density
1807during the process. The approach uses a mixture of component densities. An
1808iterative process is used
1809to construct the sequence of improving component densities. At each
1810iteration, a Gaussian process (GP) surrogate is used to help identify areas
1811in the space where failure is likely to occur. The GPs are not used to
1812directly calculate the failure probability; they are only used to approximate
1813the importance density. Thus, the Gaussian process adaptive importance
1814sampling algorithm overcomes limitations involving using a potentially
1815inaccurate surrogate model directly in importance sampling calculations.
1816
1817This method is specified with the keyword \texttt{gpais}. There are three
1818main controls which govern the behavior of the algorithm.
1819\texttt{samples} specifies the initial number of Latin Hypercube samples
1820which are used to create the initial Gaussian process surrogate.
1821\texttt{emulator\_samples} specifies the number of samples taken on the
1822latest Gaussian process model each iteration of the algorithm.
1823These samples are used in the construction of the next importance
1824sampling density. The default is 10,000 samples. The third control
1825is \texttt{max\_iterations}, which controls the number of iterations
1826of the algorithm. Each iteration, one additional sample of the ``true''
1827simulation is taken. Thus, if \texttt{samples} were set at 100 and
1828\texttt{max\_iterations} were set to 200, there would be a total of
1829300 function evaluations of the simulator model taken.
1830
1831\section{Adaptive Sampling Methods}\label{uq:adaptive}
1832The goal in performing adaptive sampling is to construct a surrogate model that
1833can be used as an accurate predictor to some expensive simulation, thus it is
1834to one's advantage to build a surrogate that minimizes the error over the entire
1835domain of interest using as little data as possible from the expensive
1836simulation. The adaptive part alludes to the fact that the surrogate will be
1837refined by focusing samples of the expensive simulation on particular areas of
1838interest rather than rely on random selection or standard space-filling
1839techniques.
1840
1841\subsection{Adaptive sampling based on surrogates}\label{uq:adaptive:surrogate}
1842
1843At a high-level, the adaptive sampling pipeline is a four-step process:
1844\begin{enumerate}
1845\item Evaluate the expensive simulation (referred to as the true model) at
1846initial sample points
1847\item Fit/refit a surrogate model
1848\item Create a candidate set and score based on information from surrogate
1849\item Select a candidate point to evaluate the true model and Repeat 2-4
1850\end{enumerate}
1851
1852In terms of the Dakota implementation, the adaptive sampling method
1853currently uses Latin Hypercube sampling (LHS) to generate the initial
1854points in Step 1 above. For Step 2, we use a Gaussian process model.
1855The user can specify the scoring metric used to select the
1856next point (or points) to evaluate and add to the set.
1857We have investigated several scoring metrics with which to evaluate
1858candidate points for Step 3. There are some classical ones such as distance
1859(e.g. add a point which maximizes the minimum distance to all of
1860the existing points). This distance metric tends to generate
1861points that are space-filling. We have investigated several
1862methods that involve interesting topological features of the
1863space (e.g. points that are near saddle points). These are
1864an area of active investigation but are not currently included
1865in Dakota. The fitness metrics for scoring
1866candidate points currently include:
1867\begin{description}
1868\item[Predicted Variance]
1869First introduced in \cite{MacKay} and later
1870used in \cite{Seo}, this method uses the predicted
1871variance of the Gaussian process surrogate as the score of a candidate
1872point. Thus, the adaptively chosen points will be in areas of highest
1873uncertainty according to the Gaussian process model.
1874\item[Distance]
1875A candidate's score is the Euclidean distance in domain space between the
1876candidate and its nearest neighbor in the set of points already evaluated on the
1877true model. Therefore, the most undersampled area of the domain will always be
1878selected. The adaptivity of this method could be brought to question as it would
1879chose the exact same points regardless of the surrogate model used. However, it
1880is useful to use to compare other adaptive metrics to one that relies purely on
1881space-filling in an equivalent context.
1882\item[Gradient]
1883%DPM: PROBABLY WANT TO CHANGE THE NAME OF THIS METRIC
1884Similar to the above metric, a candidate's nearest neighbor is determined as in
1885the distance metric, only now the score is the absolute value of the difference
1886in range space of the two points. The range space values used are predicted
1887from the surrogate model. Though this method is called the gradient metric, it
1888actually does not take into account how close the candidate and its neighbor are
1889in domain space. This method attempts to evenly fill the range space of the
1890surrogate.
1891\end{description}
1892
1893Note that in our approach, a Latin Hypercube sample is generated (a new one,
1894different from the initial sample) and the surrogate model is evaluated
1895at this points. These are the ``candidate points'' that are then evaluated
1896according to the fitness metric outlined above. The number of candidates used
1897in practice should be high enough to fill most
1898of the input domain: we recommend at least hundreds of points for a low-
1899dimensional problem.
1900All of the candidates (samples on the emulator) are
1901given a score and then the highest-scoring candidate is selected to be evaluated
1902on the true model.
1903
1904The adaptive sampling method also can generate batches of points
1905to add at a time. With batch or  multi-point
1906selection, the true model can be evaluated in parallel and thus
1907increase throughput before refitting our surrogate model. This proposes a new
1908challenge as the problem of choosing a single point and choosing multiple points
1909off a surrogate are fundamentally different. Selecting the $n$ best scoring
1910candidates is more than likely to generate a set of points clustered in one
1911area which will not be conducive to adapting the surrogate.
1912We have implemented several strategies for batch selection of points:
1913\begin{description}
1914\item[\bf Naive Selection]
1915This strategy will select the $n$ highest scoring candidates regardless of their
1916position. This tends to group an entire round of points in the same area.
1917\item[\bf Distance Penalized Re-weighted Scoring]
1918In this strategy, the highest
1919scoring candidate is selected and then all
1920remaining candidates are re-scored with a distance penalization factor added in
1921to the score. Only points selected within a round are used for the distance
1922penalization. The factor is the same as used in the distance penalization
1923 scoring metrics from \cite{Maljovec}. First, compute all of the minimum
1924distances from each remaining candidate to the selected candidates. Then,
1925determine the median value of these distances. If the smallest distance, $d$,
1926between a point and the selected set is less than the computed median distance
1927its score is unaltered, otherwise the score is multiplied by a value $\rho$
1928determined by the following equation:
1929\begin{equation}
1930\rho = 1.5*d - 0.5*d^3
1931\end{equation}
1932\item[\bf Topological Maxima of Scoring Function]
1933In this strategy we look at the topology of the scoring function and select the
1934$n$ highest maxima in the topology. To determine local maxima, we construct the
1935approximate Morse-Smale complex. If the number of local maxima is less than $n$,
1936 we revert to the distance strategy above. As a further extension, one may
1937want to filter low-persistence maxima, but to keep the framework general, we
1938chose to omit this feature as defining a threshold for what deems a critical
1939point as "low persistence" can vary drastically from problem to problem.
1940\item[\bf Constant Liar]
1941We adapt the constant liar strategy presented in \cite{Ginsbourger} with the
1942scoring metrics. The strategy first selects
1943the highest scoring candidate, and then refits the surrogate using a ``lie'' value
1944at the point selected and repeating until $n$ points have been selected
1945whereupon the lie values are removed from the surrogate and the selected points
1946are evaluated on the true model and the surrogate is refit with these values.
1947\end{description}
1948
1949The adaptive sampling method is specified by the method keyword
1950\texttt{adaptive\_sampling}. There are many controls, including
1951the number of candidate samples to investigate each iteration
1952(\texttt{emulator\_samples}), the fitness metric used in scoring
1953candidates (\texttt{fitness\_metric}), and the number of iterations
1954to perform the adaptive sampling (\texttt{max\_iterations}).
1955For batch selection of points, one specifies a \texttt{batch\_selection}
1956strategy and a \texttt{batch\_size}.
1957The details of the specification are provided in the Dakota
1958reference manual.
1959
1960\subsection{Adaptive sampling based on dart throwing}\label{uq:adaptive:darts}
1961\texttt{pof\_darts} is a novel method for estimating the tail probability
1962(Probability of Failure) based on random sphere-packing in the uncertain
1963parameter space. Random points are sequentially sampled from the domain and
1964consequently surrounded by protecting spheres, with the constraint that each new
1965sphere center has to be outside all prior spheres~\cite{ebeida2016pof}. The
1966radius of each sphere is chosen such that the entire sphere lies either in the
1967failure or the non-failure region. This radius depends of the function
1968evaluation at the disk center, the failure threshold and an estimate of the
1969function gradient at the disk center. After exhausting the sampling budget
1970specified by \texttt{build\_samples}, which is the number of spheres per failure
1971threshold, the domain is decomposed into two regions.  These regions correspond
1972to failure and non-failure categories, each represented by the union of the
1973spheres of each type. The volume of the union of failure spheres gives a lower
1974bound on the required estimate of the probability of failure, while the volume
1975of the union of the non-failure spheres subtracted from the volume of the domain
1976gives an upper estimate. After all the spheres are constructed, we construct a
1977surrogate model, specified via a \texttt{model\_pointer}, and sample the
1978surrogate model extensively to estimate the probability of failure for each
1979threshold.
1980
1981\texttt{pof\_darts} handles multiple response functions and allows each to have
1982multiple failure thresholds. For each failure threshold \texttt{pof\_darts} will
1983insert a number of spheres specified by the user-input parameter "samples".
1984However, estimating the probability of failure for each failure threshold would
1985utilize the total number of disks sampled for all failure thresholds. For each
1986failure threshold, the sphere radii changes to generate the right spatial
1987decomposition. The POF-Darts method is specified by the method keyword
1988\texttt{pof\_darts}. The sample budget is specified by \texttt{build\_samples}.
1989By default, the method employs a local approach to estimate the Lipschitz
1990constant per sphere.
1991%The method can generally support local or global approaches to estimate the
1992%Lipschitz constant per sphere, given by (\texttt{lipschitz local} or
1993%\texttt{lipschitz global}). However, only the local approach is currently
1994%supported and is the default if not specified by the user.
1995
1996The surrogate model used by the \texttt{pof\_darts} method for extensive
1997sampling is specified using a \texttt{model\_pointer}, and its parameters are
1998therefore defined in that model. It can typically be any global surrogate in
1999Dakota (e.g., Gaussian process, polynomial chaos expansion, polynomial
2000regression, etc). POF-Darts can also use piecewise-decomposed surrogates which
2001build local pieces of the surrogate over different domain patches. The piecewise
2002decomposition option is a new capability added to Dakota to help construct
2003surrogates in high-dimensional spaces, using known function evaluations as well
2004as gradient and Hessian information, if available. The piecewise decomposition
2005option is declared using the keyword \texttt{domain\_decomp} and currently
2006supports polynomial, Gaussian Process (GP), and Radial Basis Functions (RBF)
2007surroagte models only. For example: a polynomial regression global surrogate is
2008specified with \texttt{model polynomial}, its order is selected using
2009\texttt{surrogate\_order}, and the piecewise decomposition option is specified
2010with \texttt{domain\_decomp}. The \texttt{domain\_decomp} option is parametrized
2011by a \texttt{cell\_type} set by default to Voronoi cells, an optional number of
2012\texttt{support\_layers}, and an optional \texttt{discontinuity\_detection}
2013capability. See~\ref{models:surf:piecewise_decomp} for more details.
2014
2015\section{Epistemic Nondeterministic Methods}\label{uq:epistemic}
2016
2017Uncertainty quantification is often used as part of the risk
2018assessment of performance, reliability, and safety of engineered
2019systems. Increasingly, uncertainty is separated into two categories
2020for analysis purposes: aleatory and epistemic
2021uncertainty~\cite{Obe03,Hel07}. Aleatory uncertainty is also referred to as
2022variability, irreducible or inherent uncertainty, or uncertainty due
2023to chance. Examples of aleatory uncertainty include the height of
2024individuals in a population, or the temperature in a processing
2025environment. Aleatory uncertainty is usually modeled with probability
2026distributions, and sampling methods such as Latin Hypercube sampling
2027in Dakota can be used to model aleatory uncertainty. In contrast,
2028epistemic uncertainty refers to lack of knowledge or lack of
2029information about a particular aspect of the simulation model,
2030including the system and environment being modeled. An increase in
2031knowledge or information relating to epistemic uncertainty will lead
2032to a reduction in the predicted uncertainty of the system response or
2033performance. For epistemic uncertain variables, typically one does not
2034know enough to specify a probability distribution on a variable.
2035Epistemic uncertainty is referred to as subjective, reducible, or lack
2036of knowledge uncertainty. Examples of epistemic uncertainty include
2037little or no experimental data for a fixed but unknown physical
2038parameter, incomplete understanding of complex physical phenomena,
2039uncertainty about the correct model form to use, etc.
2040
2041There are many approaches which have been developed to model epistemic
2042uncertainty, including fuzzy set theory, possibility theory, and
2043evidence theory. It is also possible to use simple interval analysis in
2044an epistemic context. Interval analysis and evidence theory are
2045described in more detail below.
2046
2047\subsection{Interval Methods for Epistemic Analysis}\label{uq:interval}
2048
2049In interval analysis, one assumes that nothing is known about
2050an epistemic uncertain variable except that its value lies
2051somewhere within an interval. In this situation, it is NOT
2052assumed that the value has a uniform probability of occuring
2053within the interval. Instead, the interpretation is that
2054any value within the interval is a possible value or a potential
2055realization of that variable. In interval analysis, the
2056uncertainty quantification problem is one of determining the
2057resulting bounds on the output (defining the output interval)
2058given interval bounds on the inputs. Again, any output response
2059that falls within the output interval is a possible output
2060with no frequency information assigned to it.
2061
2062We have the capability to perform interval analysis using either
2063\dakotakw{global_interval_est} or \dakotakw{local_interval_est}.  In
2064the global approach, one uses either a global optimization method or a
2065sampling method to assess the bounds.  \texttt{global\_interval\_est}
2066allows the user to specify either \texttt{lhs}, which performs Latin
2067Hypercube Sampling and takes the minimum and maximum of the samples as
2068the bounds (no optimization is performed) or \texttt{ego}. In the case
2069of \texttt{ego}, the efficient global optimization method is used to
2070calculate bounds. The ego method is described in
2071Section~\ref{opt:methods:gradientfree:global}.  If the problem is
2072amenable to local optimization methods (e.g. can provide derivatives
2073or use finite difference method to calculate derivatives), then one
2074can use local methods to calculate these
2075bounds. \texttt{local\_interval\_est} allows the user to specify
2076either \texttt{sqp} which is sequential quadratic programming, or
2077\texttt{nip} which is a nonlinear interior point method.
2078
2079Note that when performing interval analysis, it is necessary to define
2080interval uncertain variables as described in
2081Section~\ref{variables:uncertain}. For interval analysis, one must
2082define only one interval per input variable, in contrast with
2083Dempster-Shafer evidence theory, where an input can have several
2084possible intervals. Interval analysis can be considered a special
2085case of Dempster-Shafer evidence theory where each input is defined by
2086one input interval with a basic probability assignment of one. In
2087Dakota, however, the methods are separate and semantic differences
2088exist in the output presentation. If you are performing a pure
2089interval analysis, we recommend using either
2090\texttt{global\_interval\_est} or \texttt{local\_interval\_est}
2091instead of \texttt{global\_evidence} or \texttt{local\_evidence}, for
2092reasons of simplicity. %An example of interval estimation is found in
2093%the \path{dakota/share/dakota/examples/users/cantilever_uq_global_interval.in},
2094%and also in Section~\ref{uq:examples:interval}.
2095
2096%Note that we have kept separate implementations of interval analysis and
2097%Dempster-Shafer evidence theory because our users often want to couple
2098%interval analysis on an ``outer loop'' with an aleatory, probabilistic
2099%analysis on an ``inner loop'' for nested, second-order probability
2100%calculations. See Section~\ref{adv_models:mixed_uq} for additional
2101%details on these nested approaches.
2102These interval methods can also be used as the outer loop within an
2103interval-valued probability analysis for propagating mixed aleatory
2104and epistemic uncertainty -- refer to
2105Section~\ref{adv_models:mixed_uq:ivp} for additional details.
2106
2107%\subsubsection{Interval Analysis for Cantilever}\label{uq:examples:interval}
2108
2109%Interval analysis is often used to model epistemic uncertainty.
2110%In interval analysis, the
2111%uncertainty quantification problem is one of determining the
2112%resulting bounds on the output (defining the output interval)
2113%given interval bounds on the inputs.
2114
2115%We can do interval analysis using either
2116%\texttt{global\_interval\_est} or \texttt{local\_interval\_est}.
2117%In the global approach, one uses either a global optimization
2118%method or a sampling method to assess the bounds, whereas the
2119%local method uses gradient information in a derivative-based
2120%optimization approach.
2121
2122An example of interval estimation
2123is shown in Figure~\ref{uq:examples:interval_input}, with example results in
2124Figure~\ref{uq:examples:interval_out}. This example is a demonstration
2125of calculating interval bounds for three outputs of the cantilever beam
2126problem. The cantilever beam problem is described in detail in
2127Section~\ref{additional:cantilever}. Given input intervals of [1,10] on
2128beam width and beam thickness, we can see that the interval estimate of
2129beam weight is approximately [1,100].
2130
2131\begin{figure}[htbp!]
2132  \centering
2133  \begin{bigbox}
2134    \begin{small}
2135      \verbatimtabinput[8]{cantilever_uq_global_interval.in}
2136    \end{small}
2137  \end{bigbox}
2138\caption{Dakota input file for performing UQ using interval analysis --
2139see \protect\path{dakota/share/dakota/examples/users/cantilever_uq_global_interval.in} }
2140\label{uq:examples:interval_input}
2141\end{figure}
2142
2143\begin{figure}[htbp!]
2144\centering
2145\begin{bigbox}
2146\begin{small}
2147\begin{verbatim}
2148------------------------------------------------------------------
2149Min and Max estimated values for each response function:
2150weight:  Min = 1.0000169352e+00  Max = 9.9999491948e+01
2151stress:  Min = -9.7749994284e-01  Max = 2.1499428450e+01
2152displ:  Min = -9.9315672724e-01  Max = 6.7429714485e+01
2153-----------------------------------------------------------------
2154\end{verbatim}
2155\end{small}
2156\end{bigbox}
2157\caption{Excerpt of UQ output for interval example.}
2158\label{uq:examples:interval_out}
2159\end{figure}
2160
2161
2162\subsection{Dempster-Shafer Theory of Evidence}\label{uq:dempshaf}
2163
2164We have chosen to pursue evidence theory at Sandia as a way to model
2165epistemic uncertainty, in part because evidence theory is a
2166generalization of probability theory. Evidence theory is also
2167referred to as Dempster-Shafer theory or the theory of random
2168sets~\cite{Obe03}. This section focuses on the use of Dempster-Shafer
2169evidence theory for propagating epistemic uncertainties. When
2170aleatory uncertainties are also present, we may choose either to
2171discretize the aleatory probability distributions into sets of
2172intervals and treat them as well-characterized epistemic variables, or
2173we may choose to segregate the aleatory uncertainties and treat them
2174within an inner loop. A nested Dempster-Shafer approach for
2175propagating mixed aleatory and epistemic uncertainty is described in
2176Section~\ref{adv_models:mixed_uq:dste}.
2177%We also use a technique called second-order probability to perform
2178%uncertainty quantification when there is both epistemic and aleatory
2179%uncertainty present. Second-order probability is a nested technique
2180%with two levels of uncertainty quantification. The outer level UQ is
2181%typically linked to epistemic uncertainties and the inner level UQ is
2182%commonly associated with aleatory uncertainties. A common approach
2183%used is to sample possible realizations of epistemic variables in the
2184%outer loop, then send these to the inner loop for additional sampling
2185%over the aleatory variables. In this way one generates ``families''
2186%or ensembles of cumulative distribution functions, where each
2187%individual CDF is based on aleatory uncertainty, and the ensemble is
2188%based on epistemic uncertainty. See Section~\ref{adv_models:mixed_uq}
2189%for more details.
2190
2191In evidence theory, there are two complementary measures of
2192uncertainty: belief and plausibility. Together, belief and
2193plausibility can be thought of as defining lower and upper bounds,
2194respectively, on probabilities. Belief and plausibility define the
2195lower and upper limits or intervals on probability values. Typical
2196plots of cumulative and complementary cumulative belief and
2197plausibility functions are shown in Figure~\ref{uq:figure15}~\cite{Hel07}.
2198\begin{figure}[htbp!]
2199  \centering
2200  \includegraphics[scale=0.8]{images/belief_plaus}
2201  \caption{Example cumulative belief and plausibility distribution functions on left; complementary cumulative belief and plausibility distribution functions on right}
2202  \label{uq:figure15}
2203\end{figure}
2204In evidence theory, it is not possible to specify one probability value.
2205Instead, there is a range of values that is consistent with the
2206evidence. The range of values is defined by belief and
2207plausibility. Note that no statement or claim is made about one value
2208within an interval being more or less likely than any other value.
2209
2210In Dempster-Shafer evidence theory, the uncertain input variables are
2211modeled as sets of intervals. The user assigns a basic probability
2212assignment (BPA) to each interval, indicating how likely it is that
2213the uncertain input falls within the interval. The BPAs for a
2214particular uncertain input variable must sum to one. The intervals
2215may be overlapping, contiguous, or have gaps. In Dakota, an interval
2216uncertain variable is specified as \dakotakw{interval_uncertain}. When
2217one defines an interval type variable in Dakota, it is also necessary
2218to specify the number of intervals defined for each variable with
2219\dakotakw{iuv_num_intervals} as well the basic probability assignments
2220per interval, \dakotakw{iuv_interval_probs}, and the associated bounds
2221per each interval, \dakotakw{iuv_interval_bounds}.
2222Figure~\ref{uq:figure16} shows the input specification for interval
2223uncertain variables.
2224The example has two epistemic uncertain interval variables.
2225The first uncertain
2226variable has three intervals and the second has two. The basic
2227probability assignments for the first variable are 0.5, 0.1, and 0.4,
2228while the BPAs for the second variable are 0.7 and 0.3. Note that it
2229is possible (and often the case) to define an interval uncertain
2230variable with only ONE interval. This means that you only know that
2231the possible value of that variable falls within the interval, and the
2232BPA for that interval would be 1.0. In the case we have shown, the
2233interval bounds on the first interval for the first variable are 0.6
2234and 0.9, and the bounds for the second interval for the first variable
2235are 0.1 to 0.5, etc.
2236
2237\begin{figure}[htbp!]
2238  \centering
2239  \begin{bigbox}
2240    \begin{small}
2241      \verbatimtabinput[8]{textbook_uq_glob_evidence.in}
2242    \end{small}
2243  \end{bigbox}
2244\caption{Dakota input file for UQ example using Evidence Theory --
2245see \protect\path{dakota/share/dakota/examples/users/textbook_uq_glob_evidence.in} }
2246\label{uq:figure16}
2247\end{figure}
2248
2249Once the intervals, the BPAs, and the interval bounds are defined,
2250the user can run an epistemic analysis by specifying the method as
2251either \texttt{global\_evidence} or
2252\texttt{local\_evidence} in the Dakota input file.
2253Both of these methods perform Dempster-Shafer calculations:
2254the difference is that the local method uses a local optimization
2255algorithm to calculate the interval bounds and the global
2256method uses either sampling or a global optimization approach to
2257calculate an interval bound. These differences are discussed in
2258more detail below.
2259The intervals and their associated BPAs are then propagated through
2260the simulation to obtain cumulative distribution functions on belief
2261and plausibility. As mentioned above, belief is the lower bound on a
2262probability estimate that is consistent with the evidence, and
2263plausibility is the upper bound on a probability estimate that is
2264consistent with the evidence.
2265
2266Figure~\ref{uq:figure17} shows
2267results for the first response function obtained when running the
2268example in Figure~\ref{uq:figure16}. In this example, there are 6
2269output intervals (as a result of the 2 interval input variables with 3
2270and 2 intervals, respectively).
2271The output intervals are ordered to obtain cumulative
2272bound functions for both belief and plausibility. The
2273cumulative distribution function is presented for both belief (CBF) and
2274plausibility (CPF). The CBF value is the cumulative belief
2275corresponding to a certain output value. For example, the belief that
2276the output value is less than or equal to 0.2 for response 1 is 0.27,
2277and the plausibility that the output is less than
2278or equal to 0.2 is 1 for response 1. The belief that the
2279output value is less than 0.6217 is 0.75, while the plausbility that
2280the output is less than  0.0806 is 0.75.
2281The CBF and CPF may be plotted on a graph and
2282interpreted as bounding the cumulative distribution
2283function (CDF), which is the probability that the output is less than or equal
2284to a certain value. The interval bounds on probability values show
2285the value of epistemic uncertainty analysis: the intervals are usually
2286much larger than expected, giving one a truer picture of the total
2287output uncertainty caused by lack of knowledge or information about
2288the epistemic input quantities.
2289
2290\begin{figure}[htbp!]
2291\centering
2292\begin{bigbox}
2293\begin{small}
2294\begin{verbatim}
2295Belief and Plausibility for each response function:
2296Cumulative Belief/Plausibility Functions (CBF/CPF) for response_fn_1:
2297     Response Level  Belief Prob Level   Plaus Prob Level
2298     --------------  -----------------   ----------------
2299   1.0000000000e-03   0.0000000000e+00   0.0000000000e+00
2300   3.0000000000e-02   0.0000000000e+00   2.7000000000e-01
2301   2.0000000000e-01   2.7000000000e-01   1.0000000000e+00
2302   8.0000000000e-01   9.3000000000e-01   1.0000000000e+00
2303  Probability Level  Belief Resp Level   Plaus Resp Level
2304  -----------------  -----------------   ----------------
2305   2.5000000000e-01   2.6187288772e-01   6.2609206069e-02
2306   5.0000000000e-01   2.9829775860e-01   6.3736734971e-02
2307   7.5000000000e-01   6.2173551556e-01   8.0596931719e-02
2308\end{verbatim}
2309\end{small}
2310\end{bigbox}
2311\caption{Results of an Epistemic Uncertainty Quantification using Evidence Theory.}
2312\label{uq:figure17}
2313\end{figure}
2314
2315As in other nondeterministic methods, with \texttt{local\_evidence}
2316or \texttt{global\_evidence},
2317one can specify probability levels and response levels.
2318If response levels are specified, the belief and plausibility
2319function values corresponding to those response levels are calculated
2320(see Belief Prob Level and Plaus Prob Level in the tables shown in
2321Figure~\ref{uq:figure17}). Similarly, if probability levels are
2322specified, these are first interpreted to be belief values, and the
2323corresponding response levels are calculated (see Belief Resp Level);
2324then they are interpreted to be plausibility values and the
2325corresponding response levels are calculated (see Plaus Resp Level in
2326the table in Figure~\ref{uq:figure17}). We have recently added the
2327capability to support generalized reliability mappings in
2328the evidence methods. If the user specifies a generalized
2329reliability level, it will be first converted to a probability,
2330then interpreted as a belief and plausibility and the corresponding
2331response levels will be calculated. Likewise, if response levels
2332are specified, the corresponding belief and plausibility values
2333will be mapped to bounds on the generalized reliability levels.
2334
2335To elaborate on the differences between \texttt{global\_evidence} and
2336\texttt{local\_evidence}: both of these methods take the
2337Dempster-Shafer structures specified on the inputs and calculate a
2338resulting Dempster-Shafer structure on the outputs (e.g. a cumulative
2339belief and plausibility function).  To calculate the belief and
2340plausibility measures, it is necessary to calculate the minimum and
2341maximum of the response function in each ``interval cell
2342combination.''  For example, in a two variable problem, if the first
2343variable had three intervals and associated BPAs assigned and the
2344second variable had two intervals and associated BPAs assigned, there
2345would be 6 interval cells in total.  In each of these six cells, one
2346needs to identify a minimum and maximum value of the response
2347function. This is easy to do if the function is monotonic in both
2348variables, but in general it is not. We offer the capability to use
2349local optimization methods to calculate these bounds:
2350\texttt{local\_evidence} allows the user to specify either
2351\texttt{sqp} which is sequential quadratic programming, or
2352\texttt{nip} which is a nonlinear interior point method. We also offer
2353the capability to use global methods to assess these interval cell
2354bounds. \texttt{global\_evidence} allows the user to specify either
2355\texttt{lhs}, which performs Latin Hypercube Sampling and takes the
2356minimum and maximum of the samples within each cell as the bounds (no
2357optimization is performed) or \texttt{ego}. In the case of
2358\texttt{ego}, the efficient global optimization method is used to
2359calculate bounds. The ego method is described in
2360Section~\ref{opt:methods:gradientfree:global}.  Note that for a
2361situation with many uncertain variables, each with a fairly
2362complicated Dempster-Shafer structure described by many intervals,
2363there will be a huge number of interval calls, and the overall process
2364of performing Dempster-Shafer analysis will be extremely expensive.
2365Reference~\cite{Tang10b} provides more details about the
2366implementation of the optimization methods to perform Dempster-Shafer
2367calculations, as well as comparisons on test problems.
2368
2369\section{Bayesian Calibration Methods}\label{uq:bayesian}
2370
2371In Bayesian calibration a ``prior distribution'' on a parameter is
2372updated through a Bayesian framework involving experimental data and a
2373likelihood function.  Bayesian inference theory is best left to other
2374sources ~\cite{Kenn01} and only a brief summary is given here.  In
2375Bayesian methods, uncertain parameters are characterized by
2376probability density functions. These probability density functions
2377define the permissible parameter values - the support, as well as the
2378relative plausibility of each permissible parameter value. In the
2379context of calibration or any inference step, the probability density
2380function that describes knowledge before the incorporation of data is
2381called the prior, $f_{\boldsymbol{\Theta}}\left( \boldsymbol{\theta}
2382\right)$.
2383
2384Note: In Dakota, the prior distribution is characterized by the
2385properties of the active uncertain variables. Correlated priors are
2386only supported for unbounded normal, untruncated lognormal, uniform,
2387exponential, gumbel, frechet, and weibull distributions and require a
2388probability transformation by specifying
2389\dakotakw{standardized_space}.
2390
2391When data are available, the likelihood function describes how well
2392each parameter value is supported by the data. Bayes
2393Theorem~\cite{Jaynes}, shown in Equation~\ref{eq:BayesThm}, is used
2394for inference: to derive the plausible parameter values, based on the
2395prior probability density and the data $\boldsymbol{d}$. The result is
2396the posterior probability density function of the parameters
2397$f_{\boldsymbol{{\Theta |D}}}\left( \boldsymbol{{\theta |d}}
2398\right)$. It is interpreted the same way as the prior, but includes
2399the information derived from the data.
2400
2401\begin{equation}
2402{f_{\boldsymbol{\Theta |D}}}\left( \boldsymbol{\theta |d} \right) = \frac{{{f_{\boldsymbol{\Theta}}}\left( \boldsymbol{\theta}  \right)\mathcal{L}\left( \boldsymbol{\theta;d} \right)}}{{{f_{\boldsymbol{D}}}\left( \boldsymbol{d} \right)}}. \label{eq:BayesThm}
2403\end{equation}
2404
2405
2406%\begin{equation}
2407%  \label{eqn:BayesThm}
2408%  {f_{\Theta |D}}\left( {\theta |d} \right) = \frac{{{f_\Theta }\left( \theta  \right)\mathcal{L}\left( {\theta ;d} \right)}}{{{f_D}\left( d \right)}}
2409%\end{equation}
2410
2411The likelihood function is used to describe how well a model's
2412predictions are supported by the data.  The likelihood function can be
2413written generally as:
2414\begin{equation*}
2415  \mathcal{L}\left(\boldsymbol{{\theta ;d}} \right) =  \mathcal{F}(q(\boldsymbol{\theta)} -\boldsymbol{d}),
2416\end{equation*}
2417where $\boldsymbol{\theta}$ are the parameters of model quantity of
2418interest $q$.  The form of the function $\mathcal{F}$ can greatly
2419influence the results.  The specific likelihood function used in
2420Dakota is based on Gaussian probability density functions. This means
2421that we assume the difference between the model quantity
2422(e.g. quantity of interest returned from a computer simulation) and
2423the experimental observations are Gaussian:
2424\begin{equation}\label{eq:model}
2425d_i = q_i(\boldsymbol{\theta}) + \epsilon_i,
2426\end{equation}
2427where $\epsilon_i$ is a random variable that can encompass both
2428measurement errors on $d_i$ and modeling errors associated with the
2429simulation quantity of interest $q_i$, for each of $n$ observations.
2430
2431If we assume that all experiments and observations are independent,
2432then the probabilistic model defined by Eq.~(\ref{eq:model}) results
2433in a likelihood function for $\boldsymbol{\theta}$ that is the product
2434of $n$ normal probability density functions:
2435%as shown in Equation~\ref{eqn:Likelihood}.
2436\begin{equation}\label{eqn:Likelihood}
2437\mathcal{L}(\boldsymbol{{\theta};d}) = \prod_{i=1}^n
2438\frac{1}{\sigma_d \sqrt{2\pi}} \exp
2439\left[ - \frac{\left(d_i-q_i(\boldsymbol{{\theta}})\right)^2}{2{\sigma_d}^2} \right],
2440%\mathcal{L}\left( {{\theta};d} \right) = \prod\limits_{i = 1}^n {\frac{1}{{\sigma \sqrt {2\pi } }}  \exp \left[  - \frac{\left(d_i - \mathcal{M}({\theta})\right)^2}{2\sigma^2} \right]
2441\end{equation}
2442where ${\sigma_d}^2$ refers to the measurement error of the data,
2443assumed constant across all data observations in this case.
2444
2445We also support the more general case of a full covariance matrix,
2446$\boldsymbol{\Sigma_d}$, that specifies the covariance between each
2447observation $i$ and $j$.  In this case, the likelihood is commonly
2448written in log form, where the log-likelihood
2449is: %given in Equation~\ref{eqn:LogLikelihood}.
2450\begin{equation}\label{eqn:LogLikelihood}
2451\log{\mathcal{L}(\boldsymbol{{\theta};d})} \propto % =-0.5 {R}^{T} {\Sigma_d}^{-1} {R}
2452-\frac{1}{2} \boldsymbol{r}^T \boldsymbol{\Sigma_d}^{-1} \boldsymbol{r},
2453\end{equation}
2454where $\boldsymbol{r}$ is the vector of residuals between the data
2455points and the model quantity of interest,
2456$q(\boldsymbol{\theta})-\boldsymbol{d}$.
2457
2458Dakota admits four \texttt{experiment\_variance\_type} options to specify the
2459measurement error covariance: \texttt{none} for no measurement error
2460specified (in this case, the variance is assumed to be one),
2461\texttt{scalar} where a constant value ${\sigma_d}^2$ is given for all
2462observations, \texttt{diagonal} where a value is specified for the
2463diagonal elements of the covariance matrix $\boldsymbol{\Sigma_d}$
2464meaning that each observation has its own measurement error but there
2465are no cross terms, and \texttt{matrix} where the full covariance
2466matrix $\boldsymbol{\Sigma_d}$ is specified.  The \texttt{diagonal}
2467and \texttt{matrix} terms are only available for field response data.
2468In contrast to earlier versions of Dakota, all measurement error
2469variance should be specified with units of variance/covariance, not
2470standard deviation.
2471
2472Markov Chain Monte Carlo (MCMC) is the prototypical method used to
2473estimate posterior parameter densities, given the observational data
2474and the priors. There are many references that describe the basic
2475algorithm~\cite{Gilks} and this is an active research area.  MCMC
2476algorithms may require hundreds of thousands of steps to converge,
2477depending on dimensionality, response nonlinearity, and the desired
2478set of posterior statistics.  Since each iteration involves an
2479evaluation of the model to obtain $q(\boldsymbol{\theta})$, surrogate
2480models of expensive simulations are often employed to make the MCMC
2481process tractable.
2482
2483Dakota offers five approaches for Bayesian calibration: QUESO, DREAM,
2484GPMSA, MUQ, and WASABI.  They are specified with the
2485\texttt{bayes\_calibration} keyword in combination with the
2486\texttt{queso}, \texttt{dream}, \texttt{gpmsa}, \texttt{muq}, or \texttt{wasabi}
2487selections, respectively.  The QUESO and GPMSA methods use components
2488from the QUESO library (Quantification of Uncertainty for Estimation,
2489Simulation, and Optimization) developed at The University of Texas at
2490Austin.  It implements the Delayed Rejection and Adaptive
2491Metropolis~\cite{Haario} (DRAM) algorithm, among others.  Algorithm
2492variants selectively combine the delayed rejection and adaptive
2493elements.  The QUESO/GPMSA capability is based on the GPMSA Matlab
2494toolbox developed at Los Alamos National Laboratory and uses tightly
2495integrated Gaussian process models during calibration.  The Dakota
2496implementation of QUES0/GPMSA is in a prototype stage.  DREAM uses an
2497implementation of DiffeRential Evolution Adaptive Metropolis developed
2498by John Burkardt.  The DREAM approach runs concurrent chains for
2499global exploration, and automatically tunes the proposal covariance
2500during the process by a self-adaptive randomized subspace
2501sampling~\cite{Vrugt}.  MUQ uses components from the MIT
2502Uncertainty Quantification library and also implements Metropolis Hastings
2503and Adaptive Metropolis. The prototype WASABI method is an MCMC-free
2504Bayesian calibration approach.  QUESO/DRAM
2505and variants are the most well-developed within Dakota.
2506
2507\subsection{QUESO}\label{uq:bayesian:queso}
2508The QUESO library includes several sampling algorithm variants.  One
2509can use a standard Metropolis-Hastings algorithm
2510(\texttt{metropolis\_hastings}), adaptive Metropolis
2511(\texttt{adaptive\_metropolis}) for adapting the proposal covariance
2512during the sampling, delayed rejection (\texttt{delayed\_rejection})
2513for backtracking from sample rejections, the full DRAM (\texttt{dram})
2514which involves both delayed rejection and adaptive Metropolis, or a
2515multi-level algorithm (\texttt{multilevel}).  This last option is not
2516yet production-ready in Dakota.
2517
2518With any choice of sampling algorithm, one may manually set the burn in
2519period for the MCMC chain with \texttt{burn\_in\_samples}. If a
2520\texttt{sub\_sampling\_period} is specified, the MCMC chain is further
2521filtered such that only the sample at the beginning of each period is in
2522the final MCMC chain. The \texttt{sub\_sampling\_period} should therefore
2523be greater than or equal to the correlation length of the samples.
2524
2525With the QUESO method, one may run the MCMC sampling on the simulation
2526model directly.  However, if the model is expensive, use of a
2527surrogate (emulator) model is recommended.  Options include a Gaussian
2528process, a polynomial chaos expansion, or a stochastic collocation
2529expansion.
2530
2531The proposal covariance refers to the covariance structure of a
2532multivariate normal distribution, which governs sample generation in
2533the chain.  One may specify \texttt{proposal\_covariance}, followed by
2534\texttt{prior} (the default), \texttt{values}, \texttt{filename}, or
2535\texttt{derivatives}.  With the \texttt{prior} setting, the proposal
2536covariance will be set to the variance of the prior distributions of
2537the parameters being calibrated.  When specifying the proposal
2538covariance with input file values or from a separate file, the user
2539may specify only the diagonals of the covariance matrix or the full
2540covariance matrix.
2541
2542The derivatives option will use derivatives from the simulation or
2543emulator model to form or approximate the Hessian of the misfit
2544function (the negative log likelihood).  Especially when derivative
2545information is available inexpensively (e.g. from an emulator), the
2546derivative-based proposal covariance forms a more accurate proposal
2547distribution, resulting in lower rejection rates and faster chain
2548mixing.  When using an emulator, the derivative-based proposal
2549covariance should be updated periodically using the
2550\texttt{posterior\_adaptive} specification.  This will add simulation
2551truth evaluations in areas of high-likelihood to the emulator training
2552data, thus refining the Hessian.  For more detail about
2553derivative-based formulations involving the misfit Hessian, refer to
2554the Theory Manual.
2555
2556An additional control for QUESO is to perform a logit transformation
2557(\texttt{logit\_transform}) which performs an internal variable
2558transformation from bounded domains to unbounded
2559domains. %in order to reduce sample rejection due to an out-of-bounds condition.
2560This option can be helpful when regions of high posterior density
2561exist in the corners of a multi-dimensional bounded domain.  In these
2562cases, it may be difficult to generate feasible samples from the
2563proposal density, and transformation to unbounded domains may
2564greatly reduce sample rejection rates.
2565
2566The \texttt{pre\_solve} option will perform a deterministic
2567gradient-based optimization before the MCMC sampling to get a good
2568starting point for the chain.  This pre-solve seeks to maximize the
2569log-posterior (the log-likelihood minus the log-prior) to identify the
2570maximum a posteriori probability point, also called the MAP point.
2571The Markov Chain will then start at the MAP point, which can
2572circumvent a lot of initial searching for high posterior probability
2573points.  The pre-solve option can be used with an emulator or with no
2574emulator.
2575
2576Credible and prediction intervals will be calculated if
2577\texttt{probability\_levels} is specified. Credible intervals propagate
2578uncertainties in parameter density information to the quantity of interest
2579and quantify how well the model fits the provided data, while prediction
2580intervals propagate both parameter and experimental measurement uncertainties
2581and contain the next experimental or simulated observation with the specified
2582probability. Further details can be found in \cite{Smith2013}. If
2583\texttt{probability\_levels} is specified, credible intervals will always be
2584calculated. Prediction intervals will only be calculated if
2585\texttt{experiment\_variance\_type} is also specified in the \texttt{responses}
2586block.
2587By specifying \texttt{posterior\_stats}, information-theoretic metrics may be
2588calculated using the posterior distribution of parameters. If the
2589\texttt{kl\_divergence} option is selected, the Kullback-Leibler Divergence
2590will be calculated between the posterior and the prior distributions such that
2591\begin{equation}
2592D_{KL} = \int {f_{\boldsymbol{\Theta |D}}}\left( \boldsymbol{\theta |d} \right)
2593\log \frac{ {f_{\boldsymbol{\Theta |D}}}\left( \boldsymbol{\theta |d} \right) }
2594{{f_{\boldsymbol{\Theta}}}\left( \boldsymbol{\theta}  \right)}
2595d\boldsymbol{\theta}.
2596\end{equation}
2597This quantity represents the amount of information gained about the parameters
2598during the Bayesian update. Further details regarding the calculation and use
2599of $D_{KL}$ can be found in~\cite{TheoMan}.
2600
2601\subsection{DREAM}
2602For the DREAM method, one can define the number of chains used with the
2603\texttt{chains} specification.  The total number of generations per chain in
2604DREAM is the number of samples divided by the number of chains.
2605%The minimum number of chains is three.
2606The number of chains randomly selected to be used in the crossover
2607each time a crossover occurs is \texttt{crossover\_chain\_pairs}.
2608There is an extra adaptation during burn-in, in which DREAM estimates a
2609distribution of crossover probabilities that favors large jumps over
2610smaller ones in each of the chains.
2611Normalization is required to ensure that all of the input dimensions contribute
2612equally.  In this process, a discrete number of candidate points for
2613each crossover value is generated, which can be specified with \texttt{num\_cr}.
2614The \texttt{gr\_threshold} control is the convergence tolerance for the Gelman-Rubin
2615statistic, which governs the convergence of the multiple chain
2616process.  The integer \texttt{jump\_step} forces a long jump every
2617\texttt{jump\_step} generations.
2618For more details about these parameters, refer to \cite{Vrugt}.
2619Credible and prediction intervals can be calculated by specifying
2620\texttt{probability\_levels}, and statistics regarding the posterior may be
2621calculated by specifying \texttt{posterior\_stats}, as described in
2622Section~\ref{uq:bayesian:queso}.
2623
2624\subsection{GPMSA}
2625Core to GPMSA is the construction of a Gaussian process emulator from
2626simulation runs collected at various settings of input parameters. The
2627emulator is a statistical model of the system response, and it is used
2628to incorporate the observational data to improve system predictions
2629and constrain or calibrate the unknown parameters. The GPMSA code
2630draws heavily on the theory developed in the seminal Bayesian
2631calibration paper by Kennedy and O'Hagan~\cite{Kenn01}. The particular
2632approach developed by the Los Alamos group, and implemented in QUESO
2633and therefore Dakota, is provided in~\cite{Hig08}.  It includes an
2634embedded discrepancy model and the ability to estimate various
2635hyper-parameters of the Gaussian process, observation error model, and
2636discrepancy model.  Dakota's GPMSA capability is an experimental
2637prototype with a number of limitations.  See the Dakota Reference
2638Manual~\cite{RefMan} for more information.
2639
2640\subsection{MUQ}
2641MUQ is the MIT Uncertainty Quantification library.  See
2642\url{https://bitbucket.org/mituq/muq2/src/master/} and
2643\url{https://mituq.bitbucket.io/index.html} for additional
2644documentation.  Dakota currently exposes two MCMC approaches from MUQ:
2645Metropolis-Hastings and Adaptive Metropolis. Dakota's MUQ integration
2646is preliminary, anticipated to extend to use MUQ components for
2647Hamiltonian Monte Carlo and Langevin-based sampling.  MUQ is an
2648experimental Dakota capability, and as such, it is not turned on by
2649default, and must be explicitly enabled when compiling Dakota.
2650
2651\subsection{WASABI}
2652WASABI differs from the other Bayesian approaches in that it is
2653not an MCMC-based approach.  Instead, it is based on the idea of
2654``consistent Bayes'' which is outlined in ~\cite{Butler2017}.
2655This approach to stochastic inference uses measure-theoretic
2656principles to construct a probability measure or density on model parameters
2657that is consistent with the model and the data.  The idea is that
2658the probability measure on the parameters, when ``pushed-foward'' through
2659the computational model, will give results that match the probability
2660measure on the observational data.
2661
2662We use a similar notation as with the Bayesian methods, but the interpretation
2663is different here.  The goal is to identify the posterior density on the
2664parameters, $\pi_{post}({\theta})$, which is equal to the prior density on the
2665parameters times a ratio.  The numerator of the ratio, $\pi_{D}^{obs}$,
2666describes the relative likelihood that the output of the model corresponds
2667to the observed data ${D}$:  this is the density of the data evaluated at
2668the model output.
2669$q(\boldsymbol{\theta})$ refers to the model output.
2670$\pi_{D}^{q_{prior}}$ refers to the push-forward of the prior through the model
2671and represents a forward propagation of uncertainty.
2672\begin{equation}
2673\pi_{post}(\boldsymbol{\theta})=\pi_{prior}(\boldsymbol{\theta})\frac{\pi_{D}^{obs}(q(\boldsymbol{\theta}))}{\pi_{D}^{q_{prior}}(q(\boldsymbol{\theta}))}.
2674\label{eq:consistentBayesEq}
2675\end{equation}
2676
2677The Theory Manual~\cite{TheoMan} has more detail about the assumptions and mathematical foundations
2678for this method.  Note a major difference in interpretation of the posterior results with respect to
2679a standard Bayesian approach:  In a standard Bayesian approach, the posterior reflects an updated state
2680of information about the prior distribution on parameter values induced by the observational data.
2681In consistent Bayes, the posterior reflects a stochastic mapping of parameter values such that the posterior
2682parameters, when pushed-forward through the model, give results that are consistent with the density
2683of the observational data.
2684WASABI is a prototype capability.  See the Dakota Reference
2685Manual~\cite{RefMan} for more information.
2686
2687\subsection{Feature Comparison}
2688Table~\ref{tab:bayes_comparison} compares the options available
2689with the QUESO, DREAM, GPMSA, MUQ, and WASABI implementations in Dakota.
2690
2691\begin{table}
2692\centering
2693\caption{Capabilities of Bayesian methods in Dakota}
2694\label{tab:bayes_comparison}
2695\begin{tabulary}{\textwidth}{|L|C|C|C|C|C|}
2696\hline
2697Capability           &  QUESO  & MUQ  & GPMSA & DREAM & WASABI \\
2698\hline
2699Prior Distributions  &  Any continuous variable type
2700                     &  Any continuous variable type
2701                     &  Any continuous variable type
2702                     & Uniform only & Uniform only \\
2703\hline
2704Inference Type       & MCMC with DR, AM, DRAM, or MH
2705                     & MCMC with MH or AM
2706                     & MCMC with DR, AM, DRAM, or MH
2707                     & MCMC with Differential Evolution Adaptive Metropolis & MCMC-free interval analysis \\
2708\hline
2709Can use PCE/SC Emulator        &  Yes            & Yes              & Yes                    & Yes  & Yes \\
2710\hline
2711Can use GP Emulator            &  Yes            &  Yes            & Yes (required)  & Yes                    & Yes \\
2712\hline
2713Likelihood-based adaptive emulator update  &  Yes            &  No             &  No             & No                     & No \\
2714\hline
2715Initialize with MAP pre-solve  &  Yes            &  Yes            &  No             & No                     & No \\
2716\hline
2717Proposal covariance options    &  prior, user, derivative-based    & n/a                    &  prior, user  & n/a                    & n/a \\
2718\hline
2719Can calibrate error covariance multipliers &  Yes            &  Yes            &  Yes (internal) & Yes                    & No          \\
2720\hline
2721Supports standardized space    &  Yes            &  Yes            & Yes             & Yes                    & Yes           \\
2722\hline
2723Logit transform                &  Yes            &  Yes            & Yes             &  No                    & No            \\
2724\hline
2725Posterior export               &  samples        &  samples        & samples         &  samples               & samples, density \\
2726\hline
2727\end{tabulary}
2728\end{table}
2729
2730
2731\subsection{Bayesian Calibration Example}\label{uq:bayesian:ex}
2732
2733To run a QUESO-based Bayesian calibration in Dakota, create a Dakota
2734input file such as the one shown in Figure~\ref{uq:figure18}.  Here,
2735the QUESO DRAM (delayed rejection adaptive metropolis) solver is
2736selected. The number of samples = 1000 indicates how many points to
2737generate in the acceptance chain\footnote{If delayed rejection is
2738  active, the number of simulation evaluations will typically be
2739  higher due to backtracking.}.  This example uses the
2740\texttt{mod\_cantilever} algebraic model, so an emulator is not
2741warranted.  The proposal covariance used has diagonal element values
2742of 1.e6 and 0.1. Two credible and prediction intervals will be
2743calculated for each model output: a 5/95 interval and a 10/90
2744interval. The calibration terms in the responses section refers to the
2745number of outputs that will be used in the calibration process: in
2746this case, it is just two. The calibration data file has the
2747observational data: in this case, it is a freeform file (e.g. no
2748header or annotation) with ten experiments. For each experiment, there
2749are two experiment values, one for stress and one for displacement,
2750followed by two variance values for the error associated with that
2751experiment for each quantity of interest.
2752
2753\begin{figure}[htbp!]
2754  \centering
2755  \begin{bigbox}
2756    \begin{small}
2757      \verbatimtabinput[8]{queso_uq.in}
2758    \end{small}
2759  \end{bigbox}
2760  \caption{Dakota input file for Bayesian calibration}
2761\label{uq:figure18}
2762\end{figure}
2763
2764When the input file shown in~\ref{uq:figure18} is run, Dakota will run
2765the MCMC algorithm and generate a posterior sample of
2766$\boldsymbol{\theta}$ in accordance with Bayes
2767Theorem~\ref{eq:BayesThm} and the likelihood
2768function~\ref{eqn:Likelihood}. Dakota's final output summary reports
2769an evaluation count summary and the best sample point visited during
2770the MCMC process as measured by maximum posterior probability (an
2771estimate of the MAP point).  The final output also summarizes the
2772moments of the posterior samples from the chain (e.g.  mean of the
2773chain, standard deviation of the chain samples, etc.), as well as the
2774credible and prediction intervals for each model output.
2775
2776Auxilliary output is also generated to a directory called
2777\path{QuesoDiagnostics/} in the directory from which Dakota is run.
2778The file \path{display_sub0.txt} contains diagnostic information
2779regarding the MCMC process.  The Matlab files contained in the
2780\path{QuesoDiagnostics/} directory contain the chain files.  The
2781files to load in Matlab are \path{raw_chain.m} or
2782\path{filtered_chain.m}, containing the full chain or the filtered
2783chain values of the parameters\footnote{The full chain will be output
2784  in cases of adaptive posterior refinement or proposal updating,
2785  since these use cases access the entire acceptance chain to identify
2786  refinement data or restarting points, respectively.}.  In addition,
2787the accepted chain values that Dakota generates are written to a file
2788in the run directory called (by default)
2789\path{dakota_mcmc_tabular.dat}. The first columns of this file are
2790the posterior values of the input variables. If \texttt{burn\_in} or
2791\texttt{sub\_sampling\_period} are specified, the filtered acceptance
2792chain is instead written to the file
2793\path{dakota_mcmc_tabular.dat}. This file contains the posterior
2794values of the filtered MCMC chain, as well as the values of state
2795variables and the resulting model responses. Finally, if one wants to
2796see the likelihood of each point, specifying verbose output in the
2797method section will result in the likelihoods being printed.
2798
2799\subsection{Chain Diagnostics}
2800
2801The convergence of the chain produced by MCMC may require many
2802thousands of steps, if not millions, as discussed earlier in this
2803section. Assessing the convergence of MCMC chains is an active area of
2804research, and the implementation of metrics for chain convergence is
2805undergoing active development in Dakota, and can be triggered during a
2806Bayesian calibration study through the use of the keyword
2807\texttt{chain\_diagnostics}.
2808
2809As of Dakota 6.10,  \texttt{confidence\_intervals} is the only
2810diagnostic implemented.
2811
2812Suppose $g$ is a function that represents some characteristic (e.g.
2813moment) of an underlying distribution, such as the mean or variance.
2814Then under the standard assumptions of an MCMC chain,  the true value
2815can be approximated by taking the ensemble mean over the MCMC chain.
2816The confidence interval for the true moment calculated using the
2817asymptotically valid interval estimator is given by~\cite{Fle10}
2818\begin{equation}
2819  \bar{g}_{n} \pm t_{*} \frac{\hat{\sigma}_{n}}{\sqrt{n}},
2820\end{equation}
2821where $\bar{g}_{n}$ is the estimated moment (i.e. mean or variance),
2822$t_{*}$ is the Student's $t$-value for the 95th quantile, $n$ is the
2823MCMC chain length, and $\hat{\sigma}_{n}$ is an estimate of the
2824standard error whose square is obtained using batch means estimation.
2825To obtain the estimate $\hat{\sigma}_{n}$, the Markov chain produced
2826during calibration is broken up into ``batches," the sample moment is
2827calculated for each batch, and $\hat{\sigma}_{n}$ is subsequently
2828obtained as an unbiased estimate of the standard deviation in the
2829moment calculations over the batches. The confidence intervals
2830produced are 95\% confidence intervals, and they are calculated for
2831the mean and variance (first and second moments) for each parameter
2832and each response. Further details regarding the default settings for
2833these calculations can be found in the Dakota Theory
2834Manual~\cite{TheoMan}.
2835
2836Confidence intervals may be used as a chain diagnostic by setting
2837fixed-width stopping rules~\cite{Rob18}. For example, if the interval
2838width is below some threshold value, that may indicate that enough
2839samples have been drawn. Common choices for the threshold value
2840include:
2841\begin{itemize}
2842  \item Fixed width: $\epsilon$
2843  \item Relative magnitude: $\epsilon \| \bar{g}_{n} \|$
2844  \item Relative standard deviation: $\epsilon \| \hat{\sigma}_{n} \|$
2845\end{itemize}
2846If the chosen threshold is exceeded, \texttt{samples} may need to be
2847increased, say by 10\%, and the diagnostics reevaluated for signs of
2848chain convergence. Furthermore, if \texttt{output} is set to
2849\texttt{debug}, the sample moment for each batch (for each parameter
2850and response) is output to the screen. The user can then analyze the
2851convergence of these batch means in order to deduce whether the MCMC
2852chain has converged.
2853
2854\subsection{Calibrating the Observation Error Model}
2855
2856As discussed in Section~\ref{input:calib_data}, Dakota accepts user
2857information on the covariance $\Sigma_d$ among observation errors and
2858includes this in the likelihood formulation:
2859\begin{equation*}
2860\log{\mathcal{L}(\boldsymbol{{\theta};d})} \propto % =-0.5 {R}^{T} {\Sigma_d}^{-1} {R}
2861-\frac{1}{2} \boldsymbol{r}^T \boldsymbol{\Sigma_d}^{-1} \boldsymbol{r}.
2862\end{equation*}
2863
2864In some cases, it can be helpful to fine tune the assumptions in this
2865covariance during the calibration process.  Dakota supports
2866calibrating one or more multipliers on the blocks of the error
2867covariance.  So if $\Sigma_d$ is block diagonal such that $\Sigma_d =
2868\mbox{diag}({\Sigma_d}_1, ..., {\Sigma_d}_N)$, we instead formulate as
2869$\Sigma_d = \mbox{diag}(m_1{\Sigma_d}_1, ..., m_P{\Sigma_d}_P)$ and
2870calibrate the multipliers $m_i$ as hyper-parameters in the Bayesian
2871inference process.
2872
2873The supported modes for calibrating observation error multipliers are
2874shown in Figure \ref{fig:uq:obs_err_mult}: \dakotakw{one},
2875\dakotakw{per_experiment}, \dakotakw{per_response}, and \dakotakw{both}.
2876Here, the two major blocks denote two experiments, while the inner
2877blocks denote five response groups (two scalar, three field).  The
2878priors on the hyper-parameters $m_i$ are taken to be inverse gamma
2879distributions, with mean and mode approximately 1.0 and standard
2880deviation approximately 0.1.
2881
2882\begin{figure}[htbp!]
2883  \centering
2884  \begin{subfigmatrix}{2}
2885    \subfigure[]{\includegraphics[scale=0.5]{images/CalibrateOne}}
2886    \subfigure[]{\includegraphics[scale=0.5]{images/CalibratePerExperiment}}
2887    \subfigure[]{\includegraphics[scale=0.5]{images/CalibratePerResponse}}
2888    \subfigure[]{\includegraphics[scale=0.5]{images/CalibrateBoth}}
2889  \end{subfigmatrix}
2890  \caption{Calibrating observational error covariance multipliers: (a)
2891    one multiplier on whole user-provided covariance structure, (b)
2892    multiplier per-experiment, (c) multiplier per-response, and (d)
2893    both..}
2894  \label{fig:uq:obs_err_mult}
2895\end{figure}
2896
2897\subsection{Scaling and Weighting of Residuals}
2898
2899Dakota's scaling options, described in
2900Section~\ref{opt:additional:scaling}, can be used on Bayesian
2901calibration problems, using the \dakotakw{calibration_term_scales}
2902keyword, to scale the residuals between the data points and the model
2903predictions, if desired. Additionally, Bayesian calibration
2904residuals-squared can be weighted via the \dakotakw{calibration_terms
2905weights} specification. Neither set of weights nor scales are adjusted
2906during calibration.  When response scaling is active, it is applied
2907after error variance weighting and before \dakotakw{weights}
2908application. The \dakotakw{calibration_terms} keyword documentation in
2909the Dakota Reference Manual~\cite{RefMan} has more detail about
2910weighting and scaling of the residual terms.
2911
2912\subsection{Model Evidence}
2913In some situations, there are multiple models that may represent a phenomenon
2914and the user is left with the task to determine which is most appropriate given
2915the available data. In this case, Bayesian model selection may help.  Suppose that
2916the user has a set of models, $\mathcal{M}$={$M_1,M_2...M_m$} from which to
2917choose.  In the Bayesian setting, the parameters of each of these models may be
2918updated according to Bayes' rule:
2919\begin{equation}
2920\pi_{post}(\boldsymbol{\theta_i}|D,M_i)=\pi_{prior}(\boldsymbol{\theta_i}|M_i)\frac{\pi(D|\boldsymbol{\theta_i},M_i)}{\pi(D|M_i)}
2921\end{equation}
2922where the dependence on the model has been made explicit.  The denominator is used
2923as the likelihood of a specific model of interest in a version of Bayes' rule which calculates
2924the posterior model plausibility as:
2925\begin{equation}
2926\pi_{post}(M_i|D)=\pi_{prior}(M_i)\frac{\pi(D|M_i)}{\pi(D)}
2927\label{eq:model_plausibility}
2928\end{equation}
2929In this equation, the posterior model probability given the data is
2930also referred to as model plausibility.  The prior model plausibility,
2931$\pi(M_i)$, is usually taken to be uniform, meaning equally likely for all
2932models, but it does not have to be.  $\pi(D)$ is a normalizing factor such
2933that the sum of all model plausibilities is 1.  In this context, model selection
2934involves choosing the model with the highest posterior model plausibility.
2935Model evidence is defined as the likelihood in
2936Equation~\ref{eq:model_plausibility}, denoted by $\pi(D|M_i)$.  Model evidence
2937is determined by averaging the likelihood of its model parameters over all possible
2938values of the model parameters, according to their prior distributions.  It is
2939also called the marginal likelihood of the model.  Model evidence
2940is defined as:
2941\begin{equation}
2942\pi(D|M_i)=\int \pi(D|\boldsymbol{\theta_i},M_i)\pi_{prior}(\boldsymbol{\theta_i}|M_i)d \boldsymbol{\theta_i}
2943\label{eq:uq:model_evidence}
2944\end{equation}
2945
2946There are many ways to calculate model evidence. There are currently two methods
2947implemented in Dakota. The user first specifies \texttt{model\_evidence}, then either
2948\texttt{mc\_approx} and/or \texttt{laplace\_approx} depending on the
2949method(s) used to calculate model evidence.
2950\begin{enumerate}
2951\item Monte Carlo approximation.  This involves sampling from the prior
2952distribution of the parameters, calculating the corresponding likelihood values
2953at those samples, and  estimating the integral given in Eq.~\ref{eq:uq:model_evidence}
2954by brute force.  The number of samples used in the sampling of the integral is
2955determined by \dakotakw{evidence_samples}. Although this method is easy,
2956it is not efficient because each sample of the prior density requires
2957an evaluation of the simulation model to compute the corresponding likelihood.
2958Additionally, many prior samples will have very low (near zero) likelihood,
2959so millions of samples may be required for accurate computation of the integral.
2960\item Laplace approximation.  This approach is based on the Laplace approximation,
2961as outlined in~\cite{Wasserman}.  It has the assumption that the posterior distribution
2962is nearly Gaussian, which is not always a safe assumption.
2963Then, with maximum a posteriori (MAP) point $\hat{\boldsymbol{\theta}}$,
2964the Laplace approximation of model evidence is:
2965
2966\begin{equation}
2967\int \pi(D|\boldsymbol{\theta_i},M_i)\pi_{prior}(\boldsymbol{\theta_i}|M_i)d \boldsymbol{\theta_i} \approx \pi(D|\hat{\boldsymbol{\theta}},M_i)\pi(\hat{\boldsymbol{\theta}}|M_i)(2\pi)^{N_i/2}{\|\det(H(\hat{\boldsymbol{\theta}}))\|}^{-1/2}
2968\end{equation}
2969where $N_i$ is the number of unknown parameters in the i-th model and
2970$H$ is the negative Hessian of the log-posterior evaluated at the MAP point
2971$\hat{\boldsymbol{\theta}}$.
2972Therefore, this implementation only requires the evaluation of the model likelihood
2973and the Hessian of the log-posterior density at the MAP point.
2974
2975\end{enumerate}
2976
2977\subsection{Model Discrepancy}
2978%kam
2979Whether in a Bayesian setting or otherwise, the goal of model calibration
2980is to minimize the difference between the observational data $d_i$ and
2981the corresponding model response $q_i(\boldsymbol{\theta})$. In the
2982presence of scenario or configuration variables $x$, Eq.~\ref{eq:model} can
2983be modified,
2984\begin{equation}
2985d_i(x) = q_i\left(\boldsymbol{\theta}, x\right) + \epsilon_i,
2986\end{equation}
2987with the ensuing equations of the likelihood and Bayes' Theorem updated
2988likewise. The configuration variables represent experimental settings, such
2989as temperature or pressure, that may vary between experiments.
2990
2991However, it is often the case that the agreement between the
2992data and the model after calibration is not sufficiently close. This is
2993generally attributed to model form or structural error, and can be corrected
2994to some extent with the use of model discrepancy. The Kennedy and
2995O'Hagan~\cite{Kenn01} formulation takes the form
2996\begin{equation}
2997d_i(x) = q_i\left(\boldsymbol{\theta}, x\right) + \delta_i(x) + \epsilon_i,
2998\end{equation}
2999where $\delta_i(x)$ represents the model discrepancy. For scalar responses, the
3000model discrepancy is \textit{only} a function of the configuration variables.
3001Furthermore, one discrepancy model is calculated for \textit{each} observable
3002$d_i$, $i = 1, \ldots, n$, yielding $\delta_1, \ldots, \delta_n$. For field
3003responses, a single, global $\delta$ is a function of the configuration
3004variables as well as the independent field coordinates, which are usually
3005points in time or space. The construction of the model discrepancy in cases
3006with mixed scalar and field responses has not been tested.
3007
3008The current implementation of the model discrepancy capability in Dakota
3009serves as a post-processing mechanism after the completion of a Bayesian update.
3010If \texttt{model\_discrepancy} is specified in the input file, Dakota will
3011perform the Bayesian update as detailed in the section above, and then begin
3012the process of approximating $\delta$. For each scalar observable $d_i$ and for
3013each configuration $x_j$,
3014\begin{equation}
3015\delta_i \left( x_j \right) = d_i \left( x_j \right) -
3016q_i \left(\boldsymbol{\theta}^*, x_j \right),
3017\end{equation}
3018where $\boldsymbol{\theta}^*$ is the average of the calibrated posterior
3019distribution of the model parameters. The $i^{th}$ discrepancy function
3020will be built over the computed $\delta_i \left( x_j \right)$, $j = 1, \ldots,
3021m$. For field observable $d$, the discrepancy is calculated for each
3022independent field coordinate $t_{i}$ and for each configuration $x_{j}$,
3023\begin{equation}
3024  \delta(t_{i}, x_{j}) = d(t_{i}, x_{j}) - q(\boldsymbol{\theta}^{*}, t_{i},
3025  x_{j}).
3026\end{equation}
3027The global discrepancy function is then built over the computed $\delta(t_{i},
3028x_{j})$, $i = 1, \ldots, n$, $j = 1, \ldots, m$. For simplicity in future
3029notation, we let $\delta_{i}(x_i) = \delta(t_i, x_i)$.
3030
3031The field discrepancy function is built using a Gaussian process regression
3032model with a quadratic trend function. If instead the responses are scalars,
3033more options for the regression model are available. Within the Dakota input
3034file, the user may specify the \texttt{discrepancy\_type} to be either a
3035Gaussian process or polynomial regression model with the
3036\texttt{gaussian\_process} or \texttt{polynomial} commands, respectively.
3037Additionally, the order of the trend function may be selected using the
3038\texttt{correction\_order} command and choosing one of \texttt{constant},
3039\texttt{linear}, or \texttt{quadratic}. Any specifications using these keywords
3040will apply to all $\delta_i$. By default, Dakota will build a Gaussian process
3041discrepancy model with a quadratic trend function. Information regarding how
3042polynomial and Gaussian process models are built can be found in
3043Sections~\ref{models:surf:polynomial} and~\ref{models:surf:kriging},
3044respectively.
3045
3046The user may specify new ``prediction" configurations at which the
3047corrected model should be calculated. For each response and for each new
3048configuration, $q_i(\boldsymbol{\theta}, x_{k,new}) + \delta_i(x_{k,new})$
3049will be computed. The prediction configurations can be specified in one of
3050three ways. If \texttt{num\_prediction\_configs} is included, Dakota will
3051uniformly distribute the indicated number of prediction configurations
3052throughout the domain of the configuration variable that is given in the
3053\texttt{variables} block of the input file. Alternatively, the user may
3054explicitly list desired prediction configuration locations within the input
3055file following the \texttt{prediction\_configs} keyword, or in an external
3056file to be read in with the \texttt{import\_prediction\_configs} option. If
3057none of these three options is selected, Dakota will automatically calculate
3058corrected model predictions at ten configurations in the scalar response case,
3059with the predictions spread uniformly in the configuration variable domain. In
3060the case of field responses, corrected model predictions are calculated for each
3061value of the input configuration variable(s).
3062
3063Calculations corresponding to each prediction configuration and to each
3064observable will be output to tabular files. The responses from the discrepancy
3065model itself is output to \path{dakota_discrepancy_tabular.dat}. Those
3066from the corrected model are output to \path{dakota_corrected_tabular.dat}.
3067The user may specify the output file names for the discrepancy and corrected
3068model tabular files using the \texttt{export\_discrepancy\_file} and
3069\texttt{export\_corrected\_model\_file} keywords, respectively.
3070
3071Variance information corresponding to each specified configuration location
3072and for each observable is also computed. In a prediction setting for scalar
3073responses, the variance calculated from the discrepancy model is additively
3074combined with the variance information provided with the experimental data,
3075such that
3076\begin{equation}
3077\label{eq:discrep_var}
3078\Sigma_{total,i}(x) = \Sigma_{\delta, i}(x) + \sigma^{2}_{exp,i} I
3079\end{equation}
3080for each observable $i$. Further details of how the variance
3081$\Sigma_{\delta,i}(x)$ is computed for Gaussian process and polynomial
3082regression models can be found in the Dakota Theory Manual~\cite{TheoMan}. The
3083experimental variance provided for parameter calibration may vary for the
3084same observable from experiment to experiment, thus $\sigma^{2}_{exp,i}$ is
3085taken to be the maximum variance given for each observable. That is,
3086\begin{equation}
3087\sigma^2_{exp,i} = \max_{j} \sigma^2_{i}(x_j),
3088\end{equation}
3089where $\sigma^2_{i}(x_j)$ is the variance provided for the $i^{th}$ observable
3090$d_i$, computed or measured with the configuration variable $x_j$.
3091
3092When each corrected model value $q_i(\boldsymbol{\theta}^{*}, x_{k, new}) +
3093\delta_i(x_{k,new})$ is considered, the variance calculated
3094via~\ref{eq:discrep_var} provides a prediction interval, similar to those
3095described in Section~\ref{uq:bayesian:queso}. Including $\sigma^{2}_{exp,i}$
3096in the variance calculation accounts for the uncertainty in the model
3097predictions that arise due to uncertainties in the calibration data. These
3098prediction variances are output to the file
3099\path{dakota_discrepancy_variance_tabular.dat} by default. The name of
3100this file can be modified using the \texttt{export\_corrected\_variance\_file}
3101keyword in the input script. If the response is a field, the variance
3102information written to this file is the variance of the Gaussian process alone.
3103Future work includes calculation of combined experimental variance and
3104discrepancy model variance for field responses.
3105
3106Additional details and an illustrative example of these calculations are given
3107in the Dakota Theory Manual~\cite{TheoMan}.
3108
3109\subsection{Bayesian Experimental Design}
3110\label{sec:bayes_expdesign}
3111
3112The goal of experimental design is to add observational data to the Bayesian
3113update that informs model parameters and reduces their uncertainties. In
3114Bayesian approaches, data from physical experiments is typically used to
3115calibrate a model. However, another common practice is to use responses or
3116output from a high-fidelity model as ``truth data" in place of experimental
3117data in a low-fidelity model calibration. This can be done in with a single
3118Bayesian calibration, or it can be done iteratively with the use of
3119experimental design, where an initial set of high-fidelity runs is
3120augmented sequentially to find the next ``best" high-fidelity design point at
3121which to run the high-fidelity model to add to the calibration data. The
3122low-fidelity posterior parameter distribution is then updated again using
3123Bayesian calibration. The mutual information is used as the selection criterion
3124to guide the process of high-fidelity data acquisition.
3125
3126In Dakota, design conditions, such as temperature or spatial location, can be
3127specified using so-called configuration
3128variables. The design selection algorithm implemented in Dakota uses a
3129user-specified high-fidelity code to produce the ``experimental" or
3130observational data that is used in the calibration of the desired low-fidelity
3131model. The high-fidelity model is dependent only upon the design or
3132configuration variables while the low-fidelity model depends on both the
3133design variables and uncertain model parameters.
3134
3135An example Dakota input file that implements this Bayesian experimental design
3136algorithm is shown in
3137Figures~\ref{figure:uq_expdesign1}-\ref{figure:uq_expdesign2}. Note that there
3138are three \texttt{model} blocks, one describing the model hierarchy and one
3139each for the high-fidelity and low-fidelity models. There are two
3140\texttt{variables}, \texttt{interface}, and \texttt{responses} blocks such that
3141each model has its own specifications. The low-fidelity \texttt{variables}
3142block contains information about both the design variables, which are specified
3143with \texttt{continuous\_state}, and the parameters to be updated via Bayes'
3144Theorem~\ref{eq:BayesThm}, which are specified using one of the aleatory
3145uncertain variable types discussed in Section~\ref{variables:uncertain:cauv}.
3146In the high-fidelity \texttt{variables} block, only the
3147\texttt{continuous\_state} parameters are included. The specifications of the
3148design variables should be consistent in both blocks. Each \texttt{interface}
3149block should point to the appropriate high- or low-fidelity code, and the
3150\texttt{responses} blocks should contain consistent details about the responses
3151from each code. For example, both of the models should return the same number
3152of \texttt{calibration\_terms}.
3153
3154\stepcounter{figure}
3155\renewcommand{\thelstlisting}{\thechapter.\arabic{figure}}
3156\lstinputlisting
3157[
3158float=htbp,
3159lastline=53,
3160caption={Dakota input file for Bayesian Experimental Design.},
3161label={figure:uq_expdesign1},
3162captionpos=b,
3163basicstyle=\small,
3164frame=single
3165]
3166{bayes_experimental_design.in}
3167\stepcounter{figure}
3168\lstinputlisting
3169[
3170float=htbp,
3171firstline=54,
3172caption={Dakota input file for Bayesian Experimental Design.},
3173label={figure:uq_expdesign2},
3174captionpos=b,
3175basicstyle=\small,
3176frame=single
3177]
3178{bayes_experimental_design.in}
3179
3180The mutual information experimental design algorithm is selected by specifying
3181\texttt{bayesian\_calibration}, \texttt{queso}, and
3182\texttt{experimental\_design} within the \texttt{method} block of the input
3183file, and the first \texttt{model} block should contain the
3184\texttt{hierarchical} specification of the \texttt{surrogate} keyword. The
3185algorithm starts by performing a Bayesian calibration using a number of data
3186points, specified in Dakota by \texttt{initial\_samples}. These initial data
3187points can be pulled from external data using the
3188\texttt{calibration\_data\_file} keyword in the high-fidelity \texttt{response}
3189block. In this case, \texttt{num\_config\_variables} should be specified and
3190set to the number of configuration variables captured in the \texttt{variables}
3191blocks. Furthermore, for use in Bayesian experimental design,
3192\texttt{calibration\_data\_file} should contain the configuration variables
3193and the corresponding high-fidelity model responses. Scalar variance
3194information may be included for the calibration data through the use of the
3195\texttt{experimental\_variance\_type} or \texttt{simulation\_variance} command
3196within the high-fidelity \texttt{responses} block. The former is applied to any
3197user-provided data, such as through the \texttt{calibration\_data\_file}
3198keyword, while the latter applies only to those high-fidelity model responses
3199produced by the high-fidelity code run by Dakota. Further information can be
3200found in the Dakota Reference Manual~\cite{RefMan}. If the number of points
3201taken from this file is less than \texttt{initial\_samples}, or if no such
3202file is provided, Latin Hypercube Sampling is used to draw samples of the
3203design space, and the high-fidelity model is run at these points to supplement
3204the user-specified data. After this initial calibration, a set of design
3205conditions (i.e.\ configuration variables) of size \texttt{num\_candidates} is
3206proposed. Users may specify these candidate points through the
3207\texttt{import\_candidate\_points\_file} command. Again, if the number of
3208points in this file is less than \texttt{num\_candidates}, or if no such file
3209is provided, Latin Hypercube Sampling is used to draw samples of the design
3210space.
3211
3212From these candidate designs, that which maximizes the mutual information with
3213respect to the low-fidelity model parameters is deemed ``optimal." The mutual
3214information is approximated using the low-fidelity model and a $k$-nearest
3215neighbor algorithm, as detailed in~\cite{Lew16}. This optimal design is used in
3216the high-fidelity model to create a new observation, which is appended to the
3217initial data. This updated data is used to recalculate the Bayesian posterior,
3218and the process repeats until one of three stopping criteria are met.
3219Multiple optimal designs may be selected concurrently by specifying
3220\texttt{batch\_size} in the input script. These designs are selected using
3221the greedy algorithm described in detail in~\cite{TheoMan}. In this case, the
3222high-fidelity model is run at all batch-selected optimal designs before the
3223Bayesian posterior is recalculated with the updated data for an ensuing
3224iteration of the experimental design algorithm.
3225
3226There are two algorithms that may be used to calculate the mutual information,
3227both of which are derived in~\cite{Kra04}. The first algorithm discussed
3228therein is used as the default algorithm within Dakota; the second may be
3229selected by including the keyword \texttt{ksg2} in the Dakota input script.
3230Furthermore, the user may choose to include, during the computation of the
3231mutual information, a stochastic error term on the low-fidelity model
3232responses. This is done by specifying \texttt{simulation\_variance} in the
3233\texttt{responses} block corresponding to the low-fidelity model. See the
3234Dakota Theory Manual~\cite{TheoMan} for more information regarding the
3235implementation of the mutual information calculations.
3236
3237There are three criteria by which this algorithm is considered complete.
3238The user may specify \dakotakw{max_hifi_evaluations}, which limits the number
3239of high-fidelity model simulations Dakota will run. Note that this does not
3240include any simulations needed to perform the initial Bayesian calibration of
3241the low-fidelity model parameters. Alternatively, if the change in the mutual
3242information from one iteration to the next is sufficiently small or if all
3243candidate points have been exhausted, the algorithm will terminate.
3244
3245Progress of the algorithm will be reported to the screen with the rest of the
3246Dakota output. Furthermore, a summary of the algorithm's results, including,
3247for each iteration, the optimal design, the mutual information, and the
3248corresponding high-fidelity model response, can be found in the file
3249\path{experimental_design_output.txt}.
3250
3251\subsubsection{One-at-a-time Implementation}
3252
3253There may be some applications for which the high-fidelity model must be run
3254independently of Dakota. This algorithm may still be implemented in this case,
3255however, it requires some extra work by the user to ensure that the
3256high-fidelity model information is properly communicated to Dakota, as a
3257"dummy" high-fidelity model code must be supplied to Dakota. The data to be
3258used in the initial Bayesian calibration should be gathered from the
3259high-fidelity model or physical experiment and imported via the
3260\texttt{calibration\_data\_file} in the high-fidelity \texttt{responses} block,
3261and extra care should be taken to ensure that \texttt{initial\_samples} matches
3262the number of experiments in this file. It is also best, for this use-case, to
3263use \texttt{import\_candidate\_points\_file}, with \texttt{num\_candidates}
3264exactly matching the number of candidate points in the file.
3265
3266By setting \texttt{max\_hifi\_evaluations} to zero, Dakota will run the initial
3267calibration of the low-fidelity model, select the optimal design (or multiple
3268optimal designs when \texttt{batch\_size} is greater than 1) from
3269those provided in \texttt{import\_candidate\_points\_file}, and exit
3270\textit{without} running the ``dummy" high-fidelity model code. The selected
3271design(s) will be output to the screen, as well as to
3272\path{experimental_design_output.txt}, as detailed above. The high-fidelity
3273model may then be run offline with the newly selected design point(s).
3274
3275The user must update \texttt{calibration\_data\_file} with the new
3276high-fidelity data when it becomes available, as well as remove the previously
3277selected design point(s) from \texttt{import\_candidate\_points\_file}. Within
3278the Dakota input file, \texttt{initial\_samples}, \texttt{num\_experiments},
3279and \texttt{num\_candidates} should be correspondingly updated. Dakota may then
3280be run again to yield the next optimal experimental design(s). It should be
3281noted that the stopping criteria will not be automatically evaluated by Dakota
3282when one-at-a-time implementation is used. The user must determine when the
3283algorithm should be terminated.
3284
3285\section{Uncertainty Quantification Usage Guidelines} \label{usage:uq}
3286
3287The choice of uncertainty quantification method depends on how the
3288input uncertainty is characterized, the computational budget, and the
3289desired output accuracy.  The recommendations for UQ methods are
3290summarized in Table~\ref{usage:guideuq} and are discussed in the
3291remainder of the section.
3292
3293\begin{table}[hbp]
3294\centering
3295\caption{Guidelines for UQ method selection.} \label{usage:guideuq}\vspace{2mm}
3296\begin{tabular}{|c|c|c|}
3297\hline
3298\textbf{Method} & \textbf{Desired Problem} & \textbf{Applicable Methods} \\
3299\textbf{Classification} & \textbf{Characteristics} & \\
3300\hline
3301Sampling & nonsmooth, multimodal response functions;       & sampling
3302(Monte Carlo or LHS) \\
3303         & response evaluations are relatively inexpensive & \\
3304\hline
3305Local       & smooth, unimodal response functions; & local\_reliability
3306(MV, AMV/AMV$^2$,\\
3307reliability & larger sets of random variables;     & AMV+/AMV$^2$+, TANA,
3308FORM/SORM) \\
3309            & estimation of tail probabilities     & \\
3310\hline
3311Global      & smooth or limited nonsmooth response; & global\_reliability \\
3312reliability & multimodal response; low dimensional; & \\
3313            & estimation of tail probabilities      & \\
3314\hline
3315Adaptive    & smooth or limited nonsmooth response; & importance\_sampling, \\
3316Sampling    & multimodal response; low dimensional; & gpais, adaptive\_sampling, \\
3317            & estimation of tail probabilities      & pof\_darts \\
3318\hline
3319Stochastic & smooth or limited nonsmooth response; & polynomial\_chaos, \\
3320expansions & multimodal response; low dimensional; & stoch\_collocation\\
3321           & estimation of moments or moment-based metrics & \\
3322\hline
3323Epistemic & uncertainties are poorly characterized &
3324interval: local\_interval\_est, \\
3325 & & global\_interval\_est, sampling; \\
3326 & & BPA: local\_evidence, global\_evidence \\
3327\hline
3328Mixed UQ  & some uncertainties are poorly characterized &
3329nested UQ (IVP, SOP, DSTE) with epistemic \\
3330 & & outer loop and aleatory inner loop, sampling \\
3331\hline
3332\end{tabular}
3333\end{table}
3334
3335
3336{\bf Sampling Methods} \\
3337Sampling-based methods are the most robust uncertainty techniques
3338available, are applicable to almost all simulations, and possess
3339rigorous error bounds; consequently, they should be used whenever the
3340function is relatively inexpensive to compute and adequate sampling
3341can be performed. In the case of terascale computational simulations,
3342however, the number of function evaluations required by traditional
3343techniques such as Monte Carlo and Latin hypercube sampling (LHS)
3344quickly becomes prohibitive, especially if tail statistics are
3345needed. %Additional sampling options include quasi-Monte Carlo (QMC)
3346%sampling and importance sampling (IS),
3347%or Markov Chain Monte Carlo (MCMC) sampling
3348%and incremental sampling may also be used to incrementally add samples
3349%to an existing sample set.
3350
3351Alternatively, one can apply the traditional sampling techniques to a
3352surrogate function approximating the expensive computational
3353simulation (see Section~\ref{adv_models:sbuq}). However, if this
3354approach is selected, the user should be aware that it is very
3355difficult to assess the accuracy of the results obtained. Unlike
3356the case of surrogate-based local minimization (see
3357Section~\ref{adv_meth:sbm:sblm}), there is no simple pointwise calculation to
3358verify the accuracy of the approximate results. This is due to the
3359functional nature of uncertainty quantification, i.e. the accuracy of
3360the surrogate over the entire parameter space needs to be considered,
3361not just around a candidate optimum as in the case of surrogate-based
3362local. This issue especially manifests itself when trying to estimate
3363low probability events such as the catastrophic failure of a system.
3364
3365{\bf Reliability Methods} \\
3366Local reliability methods (e.g., MV, AMV/AMV$^2$, AMV+/AMV$^2$+, TANA,
3367and FORM/SORM) are more computationally efficient in general than the
3368sampling methods and are effective when applied to reasonably
3369well-behaved response functions; i.e., functions that are smooth,
3370unimodal, and only mildly nonlinear. They can be used to provide
3371qualitative sensitivity information concerning which uncertain
3372variables are important (with relatively few function evaluations), or
3373compute full cumulative or complementary cumulative response functions
3374(with additional computational effort). Since they rely on gradient
3375calculations to compute local optima (most probable points of
3376failure), they scale well for increasing numbers of random variables,
3377but issues with nonsmooth, discontinuous, and multimodal response
3378functions are relevant concerns. In addition, even if there is a
3379single MPP and it is calculated accurately, first-order and
3380second-order integrations may fail to accurately capture the shape of
3381the failure domain. In these cases, adaptive importance sampling
3382around the MPP can be helpful. Overall, local reliability methods
3383should be used with some care and their accuracy should be verified
3384whenever possible.
3385
3386An effective alternative to local reliability analysis when confronted
3387with nonsmooth, multimodal, and/or highly nonlinear response functions
3388is efficient global reliability analysis (EGRA). This technique
3389employs Gaussian process global surrogate models to accurately resolve
3390the failure domain and then employs multimodal adaptive importance
3391sampling to resolve the probabilities. For relatively low dimensional
3392problems (i.e, on the order of 10 variables), this method displays the
3393efficiency of local reliability analysis with the accuracy of
3394exhaustive sampling. While extremely promising, this method is still
3395relatively new and is the subject of ongoing refinements as we deploy
3396it to additional applications.
3397
3398{\bf Adaptive Sampling Methods} \\
3399There are now a number of methods in Dakota which are tailored to
3400estimating tail probabilities.
3401These methods include both standard importance sampling and
3402Gaussian Process Adaptive Importance Sampling, as well as
3403adaptive sampling and the POF-darts method. These methods
3404are suitable for smooth or limited non-smooth responses,
3405and work well in low dimensions. GPAIS and POF-darts utilize
3406a Gaussian process surrogate model.
3407
3408{\bf Stochastic Expansions Methods} \\
3409The next class of UQ methods available in Dakota is comprised of
3410stochastic expansion methods (polynomial chaos and stochastic
3411collocation), which are general purpose techniques provided that the
3412response functions possess finite second order moments. Further, these
3413methods capture the underlying functional relationship between a key
3414response metric and its random variables, rather than just
3415approximating statistics such as mean and standard deviation. This
3416class of methods parallels traditional variational methods in
3417mechanics; in that vein, efforts are underway to compute rigorous
3418error bounds of the approximations produced by the methods. Another
3419strength of these methods is their potential use in a multiphysics
3420environment as a means to propagate the uncertainty through a series
3421of simulations while retaining as much information as possible at each
3422stage of the analysis. The current challenge in the development of
3423these methods, as for other global surrogate-based methods, is
3424effective scaling for large numbers of random variables. Recent
3425advances in adaptive collocation and sparsity detection methods
3426address some of the scaling issues for stochastic expansions.
3427
3428{\bf Epistemic Uncertainty Quantification Methods} \\
3429The final class of UQ methods available in Dakota are focused on
3430epistemic uncertainties, or uncertainties resulting from a lack of
3431knowledge. In these problems, the assignment of input probability
3432distributions when data is sparse can be somewhat suspect. One
3433approach to handling epistemic uncertainties is interval analysis
3434(\texttt{local\_interval\_est} and \texttt{global\_interval\_est}),
3435where a set of intervals on inputs, one interval for each input
3436variable, is mapped to a set of intervals on outputs.  To perform this
3437process efficiently, optimization methods can be used.  Another
3438related technique is Dempster-Shafer theory of evidence (Dakota
3439methods \texttt{local\_evidence} and \texttt{global\_evidence}), where
3440multiple intervals per input variable (which can be overlapping,
3441contiguous, or disjoint) are propagated, again potentially using
3442optimization methods.  The choice between local or global optimization
3443methods for interval computation is governed by the same issues
3444described in Section~\ref{opt:usage}.
3445
3446{\bf Mixed Aleatoric and Epistemic Methods} \\
3447For problems with a mixture of epistemic and aleatoric uncertainties,
3448it is desirable to segregate the two uncertainty types within a nested
3449analysis, allowing stronger probabilistic inferences for the portion
3450of the problem where they are appropriate. In this nested approach, an
3451outer epistemic level selects realizations of epistemic parameters
3452(augmented variables) and/or realizations of random variable
3453distribution parameters (inserted variables). These realizations
3454define the objective probabilistic analysis to be performed on the
3455inner aleatoric level. In the case where the outer loop involves
3456propagation of subjective probability, the nested approach is known as
3457second-order probability and the study generates a family of CDF/CCDF
3458respresentations known as a ``horse tail'' plot.  In the case where
3459the outer loop is an interval propagation approach
3460(\texttt{local\_interval\_est} or \texttt{global\_interval\_est}), the
3461nested approach is known as interval-valued probability (see also
3462Section~\ref{models:nested}) . In the case where the outer loop is an
3463evidence-based approach (\texttt{local\_evidence} or
3464\texttt{global\_evidence}), the approach generates epistemic belief
3465and plausibility bounds on aleatory statistics.
3466
3467
3468\begin{comment}
3469\section{Future Nondeterministic Methods}\label{uq:future}
3470
3471Uncertainty analysis methods under investigation for future inclusion
3472into the Dakota framework include extensions to the stochastic
3473expansion methods and sampling capabilities currently supported.
3474In particular, smart adaptive methods that can mitigate the curse
3475of dimensionality are a research emphasis.
3476%
3477%Advanced ``smart sampling'' techniques such as bootstrap sampling (BS)
3478%and Markov chain Monte Carlo simulation (McMC) are being considered.
3479%We also have an active research focus on adaptive sparse grid methods,
3480%to more efficiently construct stochastic expansions.
3481%
3482%Efforts have been initiated to allow for non-traditional
3483%representations of uncertainty. We have implemented Dempster-Shafer
3484%theory of evidence, and other non-traditional approaches may follow.
3485%
3486We are also currently emphasizing the development of Bayesian methods,
3487specifically focusing on surrogate-modeling, discrepancy,
3488post-processing, and multi-model extensions to the Bayesian
3489calibration capabilities.
3490%
3491%, where a ``prior
3492%distribution'' on a parameter is updated through a Bayesian framework
3493%involving experimental data and a likelihood function.
3494%
3495%Finally, the tractability and efficacy of the more intrusive
3496%variant of stochastic finite element/polynomial chaos expansion
3497%methods, previously mentioned, is being assessed for possible
3498%implementation in Dakota.
3499\end{comment}
3500
3501%%  LocalWords:  MUQ MCMC Langevin
3502