1\chapter{Advanced Methods}\label{adv_meth}
2
3\section{Overview}\label{adv_meth:overview}
4
5A variety of ``meta-algorithm'' capabilities have been developed in
6order to provide a mechanism for employing individual iterators and
7models as reusable components within higher-level solution approaches.
8%It was driven by the observed need for ``meta-optimization'' and other
9%high level systems analysis procedures in real-world engineering
10%design problems.
11This capability allows the use of existing iterative algorithm and
12computational model software components as building blocks to
13accomplish more sophisticated studies, such as hybrid,
14multistart, Pareto, or surrogate-based minimization.  Further
15multi-component capabilities are enabled by the model recursion
16capabilities described in Chapter~\ref{models} with specific examples
17in Chapter~\ref{adv_models}.
18%When these model recursion specifications are sufficient to completely
19%describe a multi-iterator, multi-model solution approach, then a
20%separate meta-iterator specification is not used (see
21%Chapter~\ref{adv_models} for examples).
22
23\section{Hybrid Minimization}\label{adv_meth:hybrid}
24
In the hybrid minimization method (keyword: \texttt{hybrid}), a sequence of minimization methods is applied to find an optimal design
27point. The goal of this method is to exploit the strengths of
28different minimization algorithms through different stages of the
29minimization process. Global/local optimization hybrids (e.g., genetic
30algorithms combined with nonlinear programming) are a common example
31in which the desire for a global optimum is balanced with the need for
32efficient navigation to a local optimum. An important related feature
33is that the sequence of minimization algorithms can employ models of
34varying fidelity. In the global/local case, for example, it would
35often be advantageous to use a low-fidelity model in the global search
36phase, followed by use of a more refined model in the local search
37phase.
38
The specification for hybrid minimization involves a list of
method identifier strings, and each of the corresponding method
specifications is responsible for identifying the model
specification (which may in turn identify variables, interface, and
responses specifications) that the method will use (see the Dakota
Reference Manual~\cite{RefMan} and the example discussed below).
45Currently, only the sequential hybrid approach is available. The
46\texttt{embedded} and \texttt{collaborative} approaches are
47not fully functional at this time.
48
49In the \texttt{sequential} hybrid minimization approach, a sequence
50of minimization methods is invoked in the order specified in the
51Dakota input file. After each method completes execution,
52the best solution or solutions from that method are used as the
53starting point(s) for the following method.
The number of solutions transferred
is determined by how many solutions the preceding method can generate and by
the value the user specifies for the individual method keyword \texttt{final\_solutions}.
Currently, only a few methods return multiple solutions, such as the
genetic algorithms (e.g., \texttt{moga} and \texttt{coliny\_ea}) and the
sampling methods.  In these cases,
the specified number of solutions from the previous
method is used to initialize the subsequent method.  If the subsequent
method cannot accept multiple input points (currently only a few methods,
such as the genetic algorithms in JEGA, allow multiple input points), then
multiple instances of the subsequent method are generated, each
initialized by one of the optimal solutions from the previous method.
For example, if LHS sampling were run as the first method with
10 final solutions, and the DOT conjugate gradient method
were run second, 10 instances of \texttt{dot\_frcg} would be
started, each with a separate LHS sample solution as its initial point.
70Method switching is governed
71by the separate convergence controls of each method; that is,
72\emph{each method is allowed to run to its own internal definition of
73completion without interference}. Individual method completion may be
74determined by convergence criteria (e.g.,
75\texttt{convergence\_tolerance}) or iteration limits (e.g.,
76\texttt{max\_iterations}).
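As an abbreviated sketch of how these pieces fit together (the method
identifiers, sub-method selections, and \texttt{final\_solutions} value below
are illustrative only, not taken from a shipped example file), the
coordinating keyword blocks for a two-stage sequential hybrid might look like:
\begin{small}
\begin{verbatim}
method
  id_method = 'HYBRID'
  hybrid sequential
    method_list = 'GLOBAL' 'LOCAL'

method
  id_method = 'GLOBAL'
  coliny_ea
    final_solutions = 3        # pass the 3 best points to 'LOCAL'
    model_pointer = 'M1'

method
  id_method = 'LOCAL'
  optpp_q_newton
    model_pointer = 'M2'
\end{verbatim}
\end{small}
Here the \texttt{coliny\_ea} global search would run to completion first, and
since the gradient-based local method accepts only a single starting point,
three instances of \texttt{optpp\_q\_newton} would then be started, one from
each returned solution.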
77
78%The \texttt{adaptive} option
79%is similar, with the difference that the progress of each method is
80%monitored and method switching is enforced according to
81%externally-defined relative progress metrics.
82
83%The \texttt{embedded} approach is restricted to special tightly-coupled
84%hybrid algorithms in which local searches are used periodically to
85%accelerate a global search.  These hybrids do not contain a discrete
86%method switch, but rather repeatedly apply a local algorithm within
87%the context of the global algorithm.
88
89Figure~\ref{adv_meth:figure01} shows a Dakota input file that specifies
90a sequential hybrid optimization method to solve the
91``textbook'' optimization test problem.
92The \path{textbook_hybrid_strat.in} file
93provided in \path{dakota/share/dakota/examples/users} starts with a
94\dakotakw{coliny_ea} solution which feeds its best point into a
95\dakotakw{coliny_pattern_search} optimization which feeds its best
96point into \dakotakw{optpp_newton}. While this approach is overkill for
97such a simple problem, it is useful for demonstrating the coordination
98between multiple sub-methods in the hybrid minimization algorithm.
99
100The three optimization methods are identified using the
101\texttt{method\_list} specification in the hybrid method section of the
102input file. The identifier strings listed in the specification are
103`\texttt{GA}' for genetic algorithm, `\texttt{PS}' for pattern search,
104and `\texttt{NLP}' for nonlinear programming. Following the hybrid method
105keyword block are the three corresponding method keyword blocks. Note
106that each method has a tag following the \texttt{id\_method} keyword
107that corresponds to one of the method names listed in the hybrid method
108keyword block. By following the identifier tags from \texttt{method}
109to \texttt{model} and from \texttt{model} to \texttt{variables},
110\texttt{interface}, and \texttt{responses}, it is easy to see the
111specification linkages for this problem. The GA optimizer runs first
112and uses model `\texttt{M1}' which includes variables `\texttt{V1}',
113interface `\texttt{I1}', and responses `\texttt{R1}'.
Note that in this specification, \texttt{final\_solutions=1},
so only the single best solution is returned from the GA.
Changing this to \texttt{final\_solutions=5}, for example, would pass
five solutions from the GA to the pattern search.
Once the GA is complete, the PS optimizer starts from the
119best GA result and again
120uses model `\texttt{M1}'. Since both GA and PS are nongradient-based
121optimization methods, there is no need for gradient or Hessian
122information in the `\texttt{R1}' response keyword block. The NLP
123optimizer runs last, using the best result from the PS method as its
124starting point.  It uses model `\texttt{M2}' which includes the same
125`\texttt{V1}' and `\texttt{I1}' keyword blocks, but uses the responses
126keyword block `\texttt{R2}' since the full Newton optimizer used in
127this example (\texttt{optpp\_newton}) needs analytic gradient and
128Hessian data to perform its search.
129\begin{figure}
130  \centering
131  \begin{bigbox}
132    \begin{tiny}
133      \verbatimtabinput[8]{textbook_hybrid_strat.in}
134    \end{tiny}
135  \end{bigbox}
136  \caption{Dakota input file for a sequential hybrid optimization method --
137see \protect\path{dakota/share/dakota/examples/users/textbook_hybrid_strat.in} }
138  \label{adv_meth:figure01}
139\end{figure}
140
141\section{Multistart Local Minimization}\label{adv_meth:multistart}
142
143A simple, heuristic, global minimization technique is to use many
144local minimization runs, each of which is started from a different
145initial point in the parameter space. This is known as multistart
146local minimization. This is an attractive method in situations where
147multiple local optima are known or expected to exist in the parameter
148space. However, there is no theoretical guarantee that the global
149optimum will be found. This approach combines the efficiency of local
150minimization methods with a user-specified global stratification
151(using a specified \texttt{starting\_points} list, a number of
152specified \texttt{random\_starts}, or both; see the Dakota Reference
153Manual~\cite{RefMan} for additional specification details). Since
154solutions for different starting points are independent, parallel
155computing may be used to concurrently run the local minimizations.
156
157An example input file for multistart local optimization on the
158``quasi\_sine'' test function (see \path{quasi_sine_fcn.C} in
159\path{dakota_source/test}) is shown in Figure~\ref{adv_meth:figure02}.
160The method keyword block in the input file contains the keyword
\texttt{multi\_start}, along with the set of starting points (three random
and five listed) that will be used for the optimization runs. The other
163keyword blocks in the input file are similar to what would be used in
164a single optimization run.
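An abbreviated sketch of such a method block is shown below (the sub-method
pointer name and seed value are illustrative; the complete file appears in
Figure~\ref{adv_meth:figure02}):
\begin{small}
\begin{verbatim}
method
  multi_start
    method_pointer = 'NLP'      # local optimizer defined in its own block
    random_starts = 3
      seed = 123                # illustrative seed for repeatability
    starting_points = -0.8 -0.8
                      -0.8  0.8
                       0.8 -0.8
                       0.8  0.8
                       0.0  0.0
\end{verbatim}
\end{small}
The five listed points correspond to the first five rows of the results
summary shown in Figure~\ref{adv_meth:figure03}.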
165
166\begin{figure}
167  \centering
168  \begin{bigbox}
169    \begin{small}
170      \verbatimtabinput[8]{qsf_multistart_strat.in}
171    \end{small}
172  \end{bigbox}
173  \caption{Dakota input file for a multistart local optimization method --
174see \protect\path{dakota/share/dakota/examples/users/qsf_multistart_strat.in} }
175  \label{adv_meth:figure02}
176\end{figure}
177
178The \texttt{quasi\_sine} test function has multiple local minima, but
179there is an overall trend in the function that tends toward the global
minimum at $(x_1,x_2)=(0.177,0.177)$. See~\cite{Giu00} for more
181information on this test function. Figure~\ref{adv_meth:figure03} shows
the results summary for the eight local optimizations performed. From
the five specified starting points and the three random starting points
(as identified by the \texttt{x1} and \texttt{x2} headers), the eight
local optima (as identified by the \texttt{x1*} and
\texttt{x2*} headers) are all different, and only one of the local
optimizations finds the global minimum.
188
189\begin{figure}
190\centering
191\begin{bigbox}
192\begin{footnotesize}
193\begin{verbatim}
194<<<<< Results summary:
195   set_id             x1             x2            x1*            x2*         obj_fn
196        1           -0.8           -0.8  -0.8543728666  -0.8543728666   0.5584096919
197        2           -0.8            0.8  -0.9998398719    0.177092822    0.291406596
198        3            0.8           -0.8    0.177092822  -0.9998398719    0.291406596
199        4            0.8            0.8   0.1770928217   0.1770928217   0.0602471946
200        5              0              0  0.03572926375  0.03572926375  0.08730499239
201        6  -0.7767971993  0.01810943539  -0.7024118387  0.03572951143   0.3165522387
202        7  -0.3291571008  -0.7697378755   0.3167607374  -0.4009188363   0.2471403213
203        8   0.8704730469   0.7720679005    0.177092899   0.3167611757  0.08256082751
204\end{verbatim}
205\end{footnotesize}
206\end{bigbox}
207\caption{Dakota results summary for a multistart local optimization method.}
208\label{adv_meth:figure03}
209\end{figure}
210
211\section{Pareto Optimization}\label{adv_meth:pareto}
212
213The Pareto optimization method (keyword: \dakotakw{pareto_set}) is
214one of three multiobjective optimization capabilities discussed in
215Section~\ref{opt:additional:multiobjective}. In the Pareto
216optimization method, multiple sets of multiobjective weightings are
217evaluated. The user can specify these weighting sets in the method
218keyword block using a \dakotakw{multi_objective_weight_sets} list, a
219number of \dakotakw{random_weight_sets}, or both (see the Dakota
220Reference Manual~\cite{RefMan} for additional specification details).
221
222Dakota performs one multiobjective optimization problem for each set
223of multiobjective weights. The collection of computed optimal
224solutions form a Pareto set, which can be useful in making trade-off
225decisions in engineering design. Since solutions for different
226multiobjective weights are independent, parallel computing may be used
227to concurrently execute the multiobjective optimization problems.
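An abbreviated sketch of such a method block, using the four weighting sets
that appear in the results summary below, follows (the sub-method pointer
name is illustrative; consult the Reference Manual~\cite{RefMan} for the full
specification):
\begin{small}
\begin{verbatim}
method
  pareto_set
    method_pointer = 'NLP'      # optimizer run once per weighting set
    multi_objective_weight_sets =
      1.     0.     0.
      0.     1.     0.
      0.     0.     1.
      0.333  0.333  0.333
\end{verbatim}
\end{small}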
228
Figure~\ref{adv_meth:figure04} shows a Dakota input file for a Pareto-set
optimization of the ``textbook'' problem, and Figure~\ref{adv_meth:figure05}
shows the corresponding results summary. For the four multiobjective
weighting sets (as identified by the \texttt{w1}, \texttt{w2},
232\texttt{w3} headers), the local optima (as identified by the
233\texttt{x1}, \texttt{x2} headers) are all different and correspond to
234individual objective function values of ($f_1,f_2,f_3$) =
235(0.0,0.5,0.5), (13.1,-1.2,8.16), (532.,33.6,-2.9), and (0.125,0.0,0.0)
236(note: the composite objective function is tabulated under the
237\texttt{obj\_fn} header).  The first three solutions reflect exclusive
238optimization of each of the individual objective functions in turn,
239whereas the final solution reflects a balanced weighting and the
240lowest sum of the three objectives.  Plotting these ($f_1,f_2,f_3$)
241triplets on a 3-dimensional plot results in a Pareto surface (not
242shown), which is useful for visualizing the trade-offs in the
243competing objectives.
244
245\begin{figure}
246  \centering
247  \begin{bigbox}
248    \begin{small}
249      \verbatimtabinput[8]{textbook_pareto_strat.in}
250    \end{small}
251  \end{bigbox}
252  \caption{Dakota input file for the Pareto optimization method --
253see \protect\path{dakota/share/dakota/examples/users/textbook_pareto_strat.in} }
254  \label{adv_meth:figure04}
255\end{figure}
256
257\begin{figure}
258\centering
259\begin{bigbox}
260\begin{scriptsize}
261\begin{verbatim}
262<<<<< Results summary:
263   set_id             w1             w2             w3             x1             x2         obj_fn
264        1              1              0              0   0.9996554048    0.997046351 7.612301561e-11
265        2              0              1              0            0.5            2.9           -1.2
266        3              0              0              1            5.8 1.12747589e-11           -2.9
267        4          0.333          0.333          0.333            0.5   0.5000000041       0.041625
268\end{verbatim}
269\end{scriptsize}
270\end{bigbox}
271\caption{Dakota results summary for the Pareto-set optimization
272  method.}
273\label{adv_meth:figure05}
274\end{figure}
275
276\section{Mixed Integer Nonlinear Programming (MINLP)}\label{adv_meth:minlp}
277
278Many nonlinear optimization problems involve a combination of discrete
279and continuous variables. These are known as mixed integer nonlinear
280programming (MINLP) problems. A typical MINLP optimization problem is
281formulated as follows:
282
283\begin{eqnarray}
284  \hbox{minimize:} & & f(\mathbf{x,d})\nonumber\\
285  \hbox{subject to:} & & \mathbf{g}_{L} \leq \mathbf{g(x,d)}
286    \leq \mathbf{g}_{U}\nonumber\\
287  & & \mathbf{h(x,d)}=\mathbf{h}_{t}\label{adv_meth:equation01}\\
288  & & \mathbf{x}_{L} \leq \mathbf{x} \leq \mathbf{x}_{U}\nonumber\\
289  & & \mathbf{d} \in \{-2,-1,0,1,2\}\nonumber
290\end{eqnarray}
291
292where $\mathbf{d}$ is a vector whose elements are integer values. In
293situations where the discrete variables can be temporarily relaxed
294(i.e., noncategorical discrete variables, see
295Section~\ref{variables:design:ddv}), the branch-and-bound algorithm
296can be applied. Categorical variables (e.g., true/false variables,
297feature counts, etc.) that are not relaxable cannot be used with the
298branch and bound method.  During the branch and bound process, the
299discrete variables are treated as continuous variables and the
300integrality conditions on these variables are incrementally enforced
301through a sequence of optimization subproblems.  By the end of this
302process, an optimal solution that is feasible with respect to the
303integrality conditions is computed.
304
305Dakota's branch and bound method (keyword:
306\texttt{branch\_and\_bound}) can solve optimization problems having
307either discrete or mixed continuous/discrete variables. This method
308uses the parallel branch-and-bound algorithm from the PEBBL software
309%package~\cite{Eck97,Eck01} to generate a series of optimization
310package~\cite{Eck09} to generate a series of optimization
311subproblems (``branches''). These subproblems are solved as continuous
312variable problems using any of Dakota's nonlinear optimization
313algorithms (e.g., DOT, NPSOL). When a solution to a branch is feasible
314with respect to the integrality constraints, it provides an upper
315bound on the optimal objective function, which can be used to prune
316branches with higher objective functions that are not yet
317feasible. Since solutions for different branches are independent,
318parallel computing may be used to concurrently execute the
319optimization subproblems.
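As an abbreviated sketch (the sub-method pointer name and optimizer choice
are illustrative; check the Reference Manual~\cite{RefMan} for the full
specification), the branch-and-bound method block simply identifies the
continuous sub-optimizer applied to each branch:
\begin{small}
\begin{verbatim}
method
  branch_and_bound
    method_pointer = 'NLP'      # continuous optimizer applied to each branch

method
  id_method = 'NLP'
  npsol_sqp
\end{verbatim}
\end{small}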
320
321PEBBL, by itself, targets the solution of mixed integer linear
322programming (MILP) problems, and through coupling with Dakota's
323nonlinear optimizers, is extended to solution of MINLP problems. In
324the case of MILP problems, the upper bound obtained with a feasible
325solution is an exact bound and the branch and bound process is
326provably convergent to the global minimum. For nonlinear problems
327which may exhibit nonconvexity or multimodality, the process is
328heuristic in general, since there may be good solutions that are
329missed during the solution of a particular branch. However, the
330process still computes a series of locally optimal solutions, and is
331therefore a natural extension of the results from local optimization
332techniques for continuous domains. Only with rigorous global
333optimization of each branch can a global minimum be guaranteed when
334performing branch and bound on nonlinear problems of unknown
335structure.
336
In cases where there are only a few discrete variables and the
discrete values are drawn from a small set, it may be reasonable
to perform a separate optimization for each possible
combination of the discrete variables. However, this brute force
341approach becomes computationally intractable if these conditions are
342not met. The branch-and-bound algorithm will generally require
343solution of fewer subproblems than the brute force method, although it
344will still be significantly more expensive than solving a purely
345continuous design problem.
346
347\subsection{Example MINLP Problem}\label{adv_meth:minlp:example}
348
349As an example, consider the following MINLP problem~\cite{Eld99}:
350
351\begin{eqnarray}
352  \hbox{minimize:} & &
353  f(\mathbf{x})=\sum_{i=1}^{6}(x_{i}-1.4)^{4}\nonumber\\
  \hbox{subject to:} & & g_{1}=x_{1}^{2}-\frac{x_{2}}{2} \leq 0\nonumber\\
355  & & g_{2}=x_{2}^{2}-\frac{x_{1}}{2} \leq 0\label{adv_meth:equation02}\\
356  & & -10 \leq x_{1},x_{2},x_{3},x_{4} \leq 10\nonumber\\
357  & & x_{5},x_{6} \in \{0,1,2,3,4\}\nonumber
358\end{eqnarray}
359
360This problem is a variant of the textbook test problem described in
361Section~\ref{additional:textbook}. In addition to the introduction of
two integer variables, a modified value of $1.4$ is used inside the
quartic sum so that the relaxed continuous solution is non-integral.
364%Figure~\ref{adv_meth:figure06} shows a Dakota input file for solving this
365%problem. This input file is named \path{dakota_bandb.in} in the
366%\path{dakota/share/dakota/test} directory. Note the specification for the
367%discrete variables, where lower and upper bounds are given. The
368%discrete variables can take on any integer value within these bounds.
369
370%\begin{figure}
371%  \centering
372%  \begin{bigbox}
373%    \begin{small}
374%      \verbatimtabinput[8]{dakota_bandb.in}
375%    \end{small}
376%  \end{bigbox}
377%  \caption{Dakota input file for the branch-and-bound method for
378%    solving MINLP optimization problems.}
379%  \label{adv_meth:figure06}
380%\end{figure}
381
382Figure~\ref{adv_meth:figure07} shows the sequence of branches generated
383for this problem.  The first optimization subproblem relaxes the
384integrality constraint on parameters $x_{5}$ and $x_{6}$, so that $0
385\leq x_{5} \leq 4$ and $0 \leq x_{6} \leq 4$. The values for $x_{5}$
386and $x_{6}$ at the solution to this first subproblem are
387$x_{5}=x_{6}=1.4$.  Since $x_{5}$ and $x_{6}$ must be integers, the
388next step in the solution process ``branches'' on parameter $x_{5}$ to
389create two new optimization subproblems; one with $0 \leq x_{5} \leq
3901$ and the other with $2 \leq x_{5} \leq 4$.  Note that, at this first
391branching, the bounds on $x_{6}$ are still $0 \leq x_{6} \leq 4$.
Next, the two new optimization subproblems are solved.  Since they are
independent, they can be solved in parallel.  The branch-and-bound
process continues, operating on both $x_{5}$ and $x_{6}$, until an
optimization subproblem is solved in which $x_{5}$ and $x_{6}$ are
integer-valued. At the solution to this subproblem, the optimal values
for $x_{5}$ and $x_{6}$ are $x_{5}=x_{6}=1$.
398
399\begin{figure}
400  \centering
401  \includegraphics[scale=0.75]{images/branch_history}
402  \caption{Branching history for example MINLP optimization problem.}
403  \label{adv_meth:figure07}
404\end{figure}
405
In this example problem, the branch-and-bound algorithm executes as
few as five and no more than seven optimization subproblems to reach
the solution. For comparison, the brute force approach would require
25 optimization problems to be solved (i.e., five possible values for
each of $x_{5}$ and $x_{6}$ gives $5 \times 5 = 25$ combinations).
411
412In the example given above, the discrete variables are integer-valued.
413In some cases, the discrete variables may be real-valued, such as $x
414\in \{0.0,0.5,1.0,1.5,2.0\}$.  The branch-and-bound algorithm is
415restricted to work with integer values. Therefore, it is up to the
416user to perform a transformation between the discrete integer values
417from Dakota and the discrete real values that are passed to the
418simulation code (see Section~\ref{variables:design:ddv}).  When
419integrality is not being relaxed, a common mapping is to use the
420integer value from Dakota as the index into a vector of discrete real
421values.  However, when integrality is relaxed, additional logic for
422interpolating between the discrete real values is needed.
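For example, one simple possibility (an illustration of such user-supplied
logic, not a Dakota-provided capability) is linear interpolation between
adjacent list entries: for the discrete real list $\{0.0,0.5,1.0,1.5,2.0\}$,
a relaxed value falling 60\% of the way between the entries $1.0$ and $1.5$
would map to $1.0 + 0.6\,(1.5-1.0) = 1.3$.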
423% Note: it should be straightforward to extend MINLP to support
424% general discrete variables, if PICO would support it.  Does this
425% come up in MILP for logistics, etc.?
426
427\section{Surrogate-Based Minimization}\label{adv_meth:sbm}
428
Surrogate models approximate an original, high-fidelity ``truth''
model, typically at reduced computational cost.  In Dakota, several
431surrogate model selections are possible, which are categorized as data
432fits, multifidelity models, and reduced-order models, as described in
433Section~\ref{models:surrogate}.  In the context of minimization
434(optimization or calibration), surrogate models can speed convergence
435by reducing function evaluation cost or smoothing noisy response
436functions.  Three categories of surrogate-based minimization are
437discussed in this chapter:
438\begin{itemize}
439\item Trust region-managed surrogate-based local minimization, with
440  data fit surrogate, multifidelity models, or reduced-order models.
441
442\item Surrogate-based global minimization, where a single surrogate is
443  built (and optionally iteratively updated) over the whole design
444  space.
445
446\item Efficient global minimization: nongradient-based constrained and
447  unconstrained optimization and nonlinear least squares based on
448  Gaussian process models, guided by an expected improvement function.
449\end{itemize}
450
451\subsection{Surrogate-Based Local Minimization}\label{adv_meth:sbm:sblm}
452
453In the surrogate-based local minimization method (keyword:
454\texttt{surrogate\_based\_local}) the minimization algorithm operates
455on a surrogate model instead of directly operating on the
456computationally expensive simulation model. The surrogate model can be
457based on data fits, multifidelity models, or reduced-order models, as
458described in Section~\ref{models:surrogate}. Since the surrogate will
459generally have a limited range of accuracy, the surrogate-based local
460algorithm periodically checks the accuracy of the surrogate model
461against the original simulation model and adaptively manages the
462extent of the approximate optimization cycles using a trust region
463approach.
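An abbreviated sketch of the controlling method block follows (the pointer
names, trust region control spellings, and numeric values shown are
illustrative and should be verified against the Reference
Manual~\cite{RefMan}; a complete example appears later in
Figure~\ref{sbm:sblm_rosen}):
\begin{small}
\begin{verbatim}
method
  id_method = 'SBLO'
  surrogate_based_local
    model_pointer = 'SURROGATE'      # surrogate model defined in a model block
    approx_method_pointer = 'NLP'    # optimizer applied to the surrogate
    max_iterations = 50
    trust_region
      initial_size = 0.10            # fraction of the global variable bounds
      contraction_factor = 0.5
      expansion_factor = 1.50
\end{verbatim}
\end{small}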
464
465%The surrogate-based local method in
466%Dakota can be implemented using heuristic rules (less expensive) or
467%provably-convergent rules (more expensive). The heuristic approach
468%is particularly effective on real-world engineering design problems
469%that contain nonsmooth features (e.g., slope discontinuities,
470%numerical noise) where gradient-based optimization methods often have
471%trouble, and where the computational expense of the simulation
472%precludes the use of nongradient-based methods.
473
474Refer to the Dakota Theory Manual~\cite{TheoMan} for algorithmic
475details on iterate acceptance, merit function formulations,
476convergence assessment, and constraint relaxation.
477
478
479\subsubsection{SBO with Data Fits}\label{adv_meth:sbm:sblm:surface}
480
481When performing SBO with local, multipoint, and global data fit
482surrogates, it is necessary to regenerate or update the data fit for
483each new trust region.  In the global data fit case, this can mean
484performing a new design of experiments on the original high-fidelity
model for each trust region, which can effectively limit the approach
to problems with, at most, tens of variables.
487Figure~\ref{fig:sbo_df} displays this case.  However, an important
488benefit of the global sampling is that the global data fits can tame
489poorly-behaved, nonsmooth, discontinuous response variations within
490the original model into smooth, differentiable, easily navigated
491surrogates.  This allows SBO with global data fits to extract the
492relevant global design trends from noisy simulation data.
493
494\begin{figure}
495  \centering
496  \includegraphics[width=.4\textwidth]{images/sbo_df}
497  \caption{SBO iteration progression for global data fits.}
498  \label{fig:sbo_df}
499\end{figure}
500When enforcing local consistency between a global data fit surrogate
501and a high-fidelity model at a point, care must be taken to balance
502this local consistency requirement with the global accuracy of the
503surrogate.  In particular, performing a correction on an existing
504global data fit in order to enforce local consistency can skew the
505data fit and destroy its global accuracy.  One approach for achieving
506this balance is to include the consistency requirement within the data
507fit process by constraining the global data fit calculation (e.g.,
508using constrained linear least squares).  This allows the data fit to
509satisfy the consistency requirement while still addressing global
510accuracy with its remaining degrees of freedom.
511% Use figure from Theresa's paper?  Use equations from notes?
512Embedding the consistency within the data fit also reduces the
513sampling requirements.  For example, a quadratic polynomial normally
514requires at least $(n+1)(n+2)/2$ samples for $n$ variables to perform
515the fit.  However, with an embedded first-order consistency constraint
516at a single point, the minimum number of samples is reduced by $n+1$
517to $(n^2+n)/2$.
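For example, with $n=2$ design variables, an unconstrained quadratic fit
requires at least $(2+1)(2+2)/2=6$ samples, whereas embedding first-order
consistency at the trust region center lowers the minimum to $(2^2+2)/2=3$.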
518% With gradient information in each sample, this can be further
519% reduced to ceil(n+2/2) samples.
520%This corresponds to defining the terms of a symmetric Hessian matrix
521%and points to an alternate approach.  Rather than enforcing
522%consistency through constrained least squares, one can embed
523%consistency directly by employing a Taylor series centered at the
524%point of local consistency enforcement and globally estimating the
525%higher order terms.  In the quadratic polynomial example, a
526%second-order Taylor series with globally estimated Hessian terms
527%requires the same $(n^2+n)/2$ samples and directly satisfies
528%first-order consistency.  To further reduce sampling requirements in
529%this case, one can choose to perform only partial updates (e.g., the
530%diagonal) of the Hessian matrix~\cite{Per02}.
531
532% Additional research area: Exploiting variance estimators to guide
533% global search (e.g., kriging)
534
535In the local and multipoint data fit cases, the iteration progression
536will appear as in Fig.~\ref{fig:sbo_mh}.  Both cases involve a single
537new evaluation of the original high-fidelity model per trust region,
538with the distinction that multipoint approximations reuse information
539from previous SBO iterates.  Like model hierarchy surrogates, these
540techniques scale to larger numbers of design variables.  Unlike model
541hierarchy surrogates, they generally do not require surrogate
542corrections, since the matching conditions are embedded in the
543surrogate form (as discussed for the global Taylor series approach
544above).  The primary disadvantage to these surrogates is that the
545region of accuracy tends to be smaller than for global data fits and
546multifidelity surrogates, requiring more SBO cycles with smaller trust
547regions.
548%In SBO with surface fit functions, a sequence of optimization
549%subproblems are evaluated, each of which is confined to a subset of
550%the parameter space known as a ``trust region.'' Inside each trust
551%region, Dakota's data sampling methods are used to evaluate the
552%response quantities at a small number (order $10^{1}$ to $10^{2}$) of
553%design points. Next, multidimensional surface fitting is performed to
554%create a surrogate function for each of the response quantities.
555%Finally, optimization is performed using the surrogate functions in
556%lieu of the actual response quantities, and the optimizer's search is
557%limited to the region inside the trust region bounds. A validation
558%procedure is then applied to compare the predicted improvement in the
559%response quantities to the actual improvement in the response
560%quantities. Based on the results of this validation, the optimum
561%design point is either accepted or rejected and the size of the trust
562%region is either expanded, contracted, or left unchanged. The sequence
563%of optimization subproblems continues until the SBO convergence
564%criteria are satisfied
565More information on the design of experiments methods is available in
566Chapter~\ref{dace}, and the data fit surrogates are described in
567Section~\ref{models:surrogate:datafit}.
568
569Figure~\ref{sbm:sblm_rosen} shows a Dakota input file that implements
570surrogate-based optimization on Rosenbrock's function.
571The first method keyword block contains the SBO
572keyword \texttt{surrogate\_based\_local}, plus the commands for
573specifying the trust region size and scaling factors. The optimization
574portion of SBO, using the CONMIN Fletcher-Reeves conjugate gradient method,
575is specified in the following keyword blocks for
576\texttt{method}, \texttt{model}, \texttt{variables}, and
577\texttt{responses}.  The model used by the optimization method
578specifies that a global surrogate will be used to map variables into
579responses (no \texttt{interface} specification is used by the
surrogate model). The global surrogate is constructed using a DACE
method, which is identified by the \texttt{`SAMPLING'} identifier.
582This data sampling portion of SBO is specified in the final set of
583keyword blocks for \texttt{method}, \texttt{model},
584\texttt{interface}, and \texttt{responses} (the earlier
585\texttt{variables} specification is reused). This example problem uses
586the Latin hypercube sampling method in the LHS software to select 10
587design points in each trust region. A single surrogate model is
588constructed for the objective function using a quadratic polynomial.
589The initial trust region is centered at the design point
590$(x_1,x_2)=(-1.2,1.0)$, and extends $\pm 0.4$ (10\% of the global
591bounds) from this point in the $x_1$ and $x_2$ coordinate directions.
592\begin{figure}
593  \begin{bigbox}
594    \begin{tiny}
595      \verbatimtabinput[8]{rosen_opt_sbo.in}
596    \end{tiny}
597  \end{bigbox}
598  \caption{Dakota input file for the surrogate-based local optimization
599    example --
600see \protect\path{dakota/share/dakota/examples/users/rosen_opt_sbo.in} }
601  \label{sbm:sblm_rosen}
602\end{figure}
603
604If this input file is executed in Dakota, it will converge to the
605optimal design point at $(x_{1},x_{2})=(1,1)$ in approximately 800
606function evaluations. While this solution is correct, it is obtained
607at a much higher cost than a traditional gradient-based optimizer
608(e.g., see the results obtained in Section~\ref{tutorial:examples:optimization}).
609This demonstrates that the SBO method with global data fits is not
610really intended for use with smooth continuous optimization problems;
611direct gradient-based optimization can be more efficient for such
612applications. Rather, SBO with global data fits is best-suited for the
613types of problems that occur in engineering design where the response
614quantities may be discontinuous, nonsmooth, or may have multiple local
615optima~\cite{Giu02}. In these types of engineering design problems,
616traditional gradient-based optimizers often are ineffective, whereas
617global data fits can extract the global trends of interest despite the
618presence of local nonsmoothness (for an example problem with multiple
619local optima, look in \path{dakota/share/dakota/test} for the file
620\path{dakota_sbo_sine_fcn.in}~\cite{Giu00}).
621
The surrogate-based local minimizer is only mathematically
guaranteed to find a local minimum. However, in practice, SBO can often find
the global minimum.  Due to the random sampling used within the
SBO algorithm, the SBO method will solve a given problem slightly differently
each time it is run (unless the user specifies a particular random
number seed in the Dakota input file, as is done in Figure~\ref{sbm:sblm_rosen}).
In our experience with the quasi-sine function mentioned above, running this
problem 10 times with the same starting conditions but different seeds
locates the global minimum in roughly 70--80\% of the trials.
This is good performance for what is mathematically only a local optimization method.
632
633\subsubsection{SBO with Multifidelity Models}\label{adv_meth:sbm:sblm:multifidelity}
634
635When performing SBO with model hierarchies, the low-fidelity model is
636normally fixed, requiring only a single high-fidelity evaluation to
637compute a new correction for each new trust region.
638Figure~\ref{fig:sbo_mh} displays this case.  This renders the
639multifidelity SBO technique more scalable to larger numbers of design
640variables since the number of high-fidelity evaluations per iteration
641(assuming no finite differencing for derivatives) is independent of
642the scale of the design problem.  However, the ability to smooth
643poorly-behaved response variations in the high-fidelity model is lost,
644and the technique becomes dependent on having a well-behaved
low-fidelity model\footnote{It is also possible to use a hybrid data
fit/multifidelity approach in which a smooth data fit of a noisy
low-fidelity model is used in combination with a high-fidelity model.}.  In
addition, the parameterizations for the low- and high-fidelity models
649may differ, requiring the use of a mapping between these
650parameterizations.  Space mapping, corrected space mapping, POD
651mapping, and hybrid POD space mapping are being explored for this
652purpose~\cite{Rob06a,Rob06b}.
653
654\begin{wrapfigure}{r}{.3\textwidth}
655  \centering
656  \includegraphics[width=.3\textwidth]{images/sbo_mh}
657  \caption{SBO iteration progression for model hierarchies.}
658  \label{fig:sbo_mh}
659\end{wrapfigure}
660%\begin{figure}
661%\epsfxsize 3in
662%\centerline{\epsfbox{sbo_mh.eps}}
663%\caption{SBO iteration progression for model hierarchies.}
664%\label{fig:sbo_mh}
665%\end{figure}
666
667When applying corrections to the low-fidelity model, there is no
668concern for balancing global accuracy with the local consistency
669requirements.  However, with only a single high-fidelity model evaluation
670at the center of each trust region, it is critical to use the best
671correction possible on the low-fidelity model in order to achieve
672rapid convergence rates to the optimum of the high-fidelity
673model~\cite{Eld04}.
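An abbreviated sketch of a model hierarchy surrogate specification with an
additive second-order correction follows (the model identifiers are
illustrative, and the exact keyword spellings should be checked against the
Reference Manual~\cite{RefMan} for the Dakota version in use):
\begin{small}
\begin{verbatim}
model
  id_model = 'HIERARCH'
  surrogate hierarchical
    ordered_model_fidelities = 'LOFI' 'HIFI'
    correction additive second_order
\end{verbatim}
\end{small}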
674
675%SBO can also be applied with multifidelity, or hierarchical, models,
676%i.e., where one has available both a high-fidelity computational model
677%and a low-fidelity computational model. This situation can occur when
678%the low-fidelity model neglects some physical phenomena (e.g.,
679%viscosity, heat transfer, etc.) that are included in the high-fidelity
680%model, or when the low-fidelity model has a lower resolution
681%computational mesh than the high-fidelity model. In many cases, the
682%low-fidelity model can serve as a surrogate for the high-fidelity
683%model during the optimization process. Thus, the low-fidelity model
684%can be used in SBO in a manner similar to the use of surface fit models
685%described in Section~\ref{adv_meth:sbm:sblm:surface}. A key difference
686%in SBO with hierarchical surrogates is that a design of experiments
687%using the high-fidelity model is not required; rather high-fidelity
688%evaluations are only needed at the center of the current trust-region
689%and the predicted optimum point in order to correct the low-fidelity
690%model and verify improvement, respectively. Another difference is that
691%one of the four types of correction described in
692%Section~\ref{adv_meth:sbm:sblm:surface} is required for SBO with
693%multifidelity models.
694
695A multifidelity test problem named
696\path{dakota_sbo_hierarchical.in} is available in
697\path{dakota/share/dakota/test} to demonstrate this SBO approach. This test
problem uses the Rosenbrock function as the high-fidelity model and a
function named ``lf\_rosenbrock'' as the low-fidelity model. Here,
lf\_rosenbrock is a variant of the Rosenbrock function (see
\path{dakota_source/test/lf_rosenbrock.C} for its formulation) whose
minimum is at $(x_1,x_2)=(0.80,0.44)$, whereas the minimum of the
original Rosenbrock function is at $(x_1,x_2)=(1,1)$. Multifidelity SBO
locates the high-fidelity minimum in 11 high-fidelity evaluations for
additive second-order corrections and in 208 high-fidelity evaluations
for additive first-order corrections, but fails for zeroth-order
additive corrections by converging to the low-fidelity minimum.
708
709\subsubsection{SBO with Reduced Order Models}\label{adv_meth:sbm:sblm:rom}
710
711When performing SBO with reduced-order models (ROMs), the ROM is
712mathematically generated from the high-fidelity model.  A critical
713issue in this ROM generation is the ability to capture the effect of
714parametric changes within the ROM.  Two approaches to parametric ROM
715are extended ROM (E-ROM) and spanning ROM (S-ROM)
716techniques~\cite{Wei06}.  Closely related techniques include tensor
717singular value decomposition (SVD) methods~\cite{Lat00}.  In the
718single-point and multipoint E-ROM cases, the SBO iteration can appear
719as in Fig.~\ref{fig:sbo_mh}, whereas in the S-ROM, global E-ROM, and
720tensor SVD cases, the SBO iteration will appear as in
721Fig.~\ref{fig:sbo_df}.  In addition to the high-fidelity model
722analysis requirements, procedures for updating the system matrices and
723basis vectors are also required.
724
725Relative to data fits and multifidelity models, ROMs have some
726attractive advantages.  Compared to data fits such as regression-based
727polynomial models, they are more physics-based and would be expected
728to be more predictive (e.g., in extrapolating away from the immediate
data).  Compared to multifidelity models, ROMs may be more practical
in that they do not require multiple computational models or meshes,
which are not always available.  The primary disadvantage is the potential
732invasiveness to the simulation code for projecting the system using
733the reduced basis.
734
735
736\subsection{Surrogate-Based Global Minimization}\label{adv_meth:sbm:sbgm}
737
738Surrogate-based global minimization differs from the surrogate-based
739local minimization approach discussed in the previous section in
740several ways: it is not a trust-region approach; initially there is
741one global surrogate constructed over a set of sample points and the
742optimizer operates on that surrogate (as opposed to adaptively
743selecting points and re-building a surrogate in each trust region);
744and there is no guarantee of convergence.
745
746The \texttt{surrogate\_based\_global} method was developed to address
747two needs.  The first is the case where a user wishes to use existing
748function evaluations or a fixed sample size (perhaps based on
749computational cost and allocation of resources) to build a surrogate
750once and optimize on it.  In this case (a single global optimization
751on a surrogate model), the set of surrogate building points is
752determined in advance as opposed to the trust-region local surrogate
753optimization in which the number of ``true'' function evaluations
754depends on the location and size of the trust region, the goodness of
755the surrogate within the trust-region, and problem characteristics.
756
757In the second \texttt{surrogate\_based\_global} use case, we want to
758update the surrogate, but globally.  That is, we add points to the
759sample set used to create the surrogate, rebuild the surrogate, and
760then perform another global optimization on the new surrogate.  Thus,
761surrogate-based global optimization can be used in an iterative
762scheme.  In one iteration, minimizers of the surrogate model are
763found, and a selected subset of these are passed to the next
764iteration.  In the next iteration, these surrogate points are
765evaluated with the ``truth'' model, and then added to the set of
766points upon which the next surrogate is constructed.  This presents a
767more accurate surrogate to the minimizer at each subsequent iteration,
768presumably driving to optimality quickly.  Note that a global
769surrogate is constructed using the same bounds in each iteration.
770This approach has no guarantee of convergence.
771
772The surrogate-based global method was originally designed for MOGA (a
773multi-objective genetic algorithm).  Since genetic algorithms often
774need thousands or tens of thousands of points to produce optimal or
775near-optimal solutions, surrogates can help by reducing the necessary
776truth model evaluations.  Instead of creating one set of surrogates
777for the individual objectives and running the optimization algorithm
778on the surrogate once, the idea is to select points along the
779(surrogate) Pareto frontier, which can be used to supplement the
780existing points.  In this way, one does not need to use many points
781initially to get a very accurate surrogate.  The surrogate becomes
782more accurate as the iterations progress.
783
784Most single objective optimization methods will return only a single
785optimal point.  In that case, only one point from the surrogate model
will be evaluated with the ``true'' function and added to the point set
upon which the surrogate is based.  In this case, it will take many
788iterations of the surrogate-based global optimization for the approach
789to converge, and its utility may not be as great as for the
790multi-objective case when multiple optimal solutions are passed from
791one iteration to the next to supplement the surrogate.  Note that the
792user has the option of appending the optimal points from the surrogate
793model to the current set of truth points or using the optimal points
794from the surrogate model to replace the optimal set of points from the
795previous iteration.  Although appending to the set is the default
796behavior, at this time we strongly recommend using the option
797\texttt{replace\_points} because it appears to be more accurate and
798robust.
799
800When using the surrogate-based global method, we first recommend
801running one optimization on a single surrogate model. That is, set
802\texttt{max\_iterations} to 1.  This will allow one to get a sense of
803where the optima are located and also what surrogate types are the
most accurate to use for the problem.  Note that by fixing the seed of
the sample on which the surrogate is built, one can take a Dakota
input file, change the surrogate type, and re-run the problem without
any additional function evaluations by using the Dakota restart file,
which picks up the existing function evaluations, creates the new
surrogate type, and runs the optimization on that new surrogate.
Also note that one can specify that surrogates be built for all
primary functions and constraints, or for only a subset of these
functions and constraints.  This allows one to use the ``truth'' model
directly for some of the response functions, perhaps because they are
much less expensive than the others.  Finally, a
815diagnostic threshold can be used to stop the method if the surrogate
816is so poor that it is unlikely to provide useful points.  If the
817goodness-of-fit has an R-squared value less than 0.5, meaning that
818less than half the variance of the output can be explained or
819accounted for by the surrogate model, the surrogate-based global
820optimization stops and outputs an error message.  This is an arbitrary
821threshold, but generally one would want to have an R-squared value as
822close to 1.0 as possible, and an R-squared value below 0.5 indicates a
823very poor fit.
824
825For the surrogate-based global method, we initially recommend a small
826number of maximum iterations, such as 3--5, to get a sense of how the
827optimization is evolving as the surrogate gets updated globally.  If
828it appears to be changing significantly, then a larger number (used in
829combination with restart) may be needed.
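An abbreviated sketch reflecting these recommendations follows (the pointer
names are illustrative, and keyword spellings should be verified against the
Reference Manual~\cite{RefMan}; Figure~\ref{sbm:sbgm_moga} below shows a
complete example):
\begin{small}
\begin{verbatim}
method
  id_method = 'SBGO'
  surrogate_based_global
    model_pointer = 'SURROGATE'   # global surrogate defined in a model block
    method_pointer = 'MOGA'       # optimizer applied to each rebuilt surrogate
    max_iterations = 1            # start with a single build-and-optimize pass
    replace_points                # recommended surrogate update option
\end{verbatim}
\end{small}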
830
831Figure~\ref{sbm:sbgm_moga} shows a Dakota input file that implements
832surrogate-based global optimization on a multi-objective test function.
833The first method keyword block contains the
keyword \texttt{surrogate\_based\_global}, plus the commands specifying
a maximum of five iterations and the option to replace
points in the global surrogate construction. The method block identified
as MOGA specifies a multi-objective genetic algorithm optimizer and its
controls.  The model keyword block specifies a surrogate model;
in this case, a \texttt{gaussian\_process} model is used as the surrogate.
The \texttt{dace\_method\_pointer} specifies that the surrogate will be
built on 100 Latin hypercube samples generated with a seed of 531.
The remainder of the input specification deals with the interface
to the actual analysis driver and the two responses returned
as objective functions from that driver.
845
846\begin{figure}
847  \begin{bigbox}
848    \begin{scriptsize}
849      \verbatimtabinput[8]{mogatest1_opt_sbo.in}
850    \end{scriptsize}
851  \end{bigbox}
  \caption{Dakota input file for the surrogate-based global optimization example with MOGA --
see \protect\path{dakota/share/dakota/examples/users/mogatest1_opt_sbo.in} }
854  \label{sbm:sbgm_moga}
855\end{figure}
856