\chapter{Advanced Methods}\label{adv_meth}

\section{Overview}\label{adv_meth:overview}

A variety of ``meta-algorithm'' capabilities have been developed in order to provide a mechanism for employing individual iterators and models as reusable components within higher-level solution approaches.
%It was driven by the observed need for ``meta-optimization'' and other
%high level systems analysis procedures in real-world engineering
%design problems.
This capability allows the use of existing iterative algorithm and computational model software components as building blocks to accomplish more sophisticated studies, such as hybrid, multistart, Pareto, or surrogate-based minimization. Further multi-component capabilities are enabled by the model recursion capabilities described in Chapter~\ref{models}, with specific examples in Chapter~\ref{adv_models}.
%When these model recursion specifications are sufficient to completely
%describe a multi-iterator, multi-model solution approach, then a
%separate meta-iterator specification is not used (see
%Chapter~\ref{adv_models} for examples).

\section{Hybrid Minimization}\label{adv_meth:hybrid}

In the hybrid minimization method (keyword: \texttt{hybrid}), a sequence of minimization methods is applied to find an optimal design point. The goal of this method is to exploit the strengths of different minimization algorithms through different stages of the minimization process. Global/local optimization hybrids (e.g., genetic algorithms combined with nonlinear programming) are a common example in which the desire for a global optimum is balanced with the need for efficient navigation to a local optimum. An important related feature is that the sequence of minimization algorithms can employ models of varying fidelity. In the global/local case, for example, it would often be advantageous to use a low-fidelity model in the global search phase, followed by use of a more refined model in the local search phase.

The specification for hybrid minimization involves a list of method identifier strings, and each of the corresponding method specifications has the responsibility for identifying the model specification (which may in turn identify variables, interface, and responses specifications) that each method will use (see the Dakota Reference Manual~\cite{RefMan} and the example discussed below). Currently, only the sequential hybrid approach is available; the \texttt{embedded} and \texttt{collaborative} approaches are not fully functional at this time.

In the \texttt{sequential} hybrid minimization approach, a sequence of minimization methods is invoked in the order specified in the Dakota input file. After each method completes execution, the best solution or solutions from that method are used as the starting point(s) for the following method. The number of solutions transferred is determined by how many solutions that method can generate and by the value the user specifies with the individual method keyword \texttt{final\_solutions}. Currently, only a few methods, such as the genetic algorithms (e.g., \texttt{moga} and \texttt{coliny\_ea}) and the sampling methods, return multiple solutions. In this case, the specified number of solutions from the previous method will be used to initialize the subsequent method.
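
The handoff logic of the sequential approach can be outlined schematically. The following Python sketch is purely illustrative and is not Dakota code; the method callables and the \texttt{accepts\_multiple\_starts} flag are hypothetical stand-ins for the behavior described in this section.

\begin{small}
\begin{verbatim}
# Schematic of the sequential hybrid handoff (illustration only, not Dakota code).
# Each "method" is modeled as a callable that takes a list of starting points and
# returns a list of final solutions, mimicking final_solutions behavior.

def run_sequential_hybrid(methods, initial_points):
    """Run each method in sequence, seeding it with the previous method's solutions."""
    points = initial_points
    for method, accepts_multiple_starts in methods:
        if accepts_multiple_starts or len(points) <= 1:
            # The method can ingest all transferred points at once.
            points = method(points)
        else:
            # Otherwise spawn one instance of the method per transferred point
            # and pool the resulting solutions.
            pooled = []
            for p in points:
                pooled.extend(method([p]))
            points = pooled
    return points

# Tiny stand-in demonstration: a "global" stage returning several candidates,
# followed by a single-start "local" polish stage.
ga    = lambda starts: [(-0.9, 0.2), (0.1, 0.1), (0.8, -0.3)]  # pretend GA: 3 solutions
local = lambda starts: [starts[0]]                             # pretend local method
print(run_sequential_hybrid([(ga, True), (local, False)], [(0.0, 0.0)]))
\end{verbatim}
\end{small}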

If the subsequent method cannot accept multiple input points (currently only a few methods, such as the genetic algorithms in JEGA, allow multiple input points), then multiple instances of the subsequent method are generated, each one initialized by one of the optimal solutions from the previous method. For example, if LHS sampling were run as the first method with 10 final solutions and the DOT conjugate gradient method were the second method, then 10 instances of \texttt{dot\_frcg} would be started, each with a separate LHS sample solution as its initial point. Method switching is governed by the separate convergence controls of each method; that is, \emph{each method is allowed to run to its own internal definition of completion without interference}. Individual method completion may be determined by convergence criteria (e.g., \texttt{convergence\_tolerance}) or iteration limits (e.g., \texttt{max\_iterations}).

%The \texttt{adaptive} option
%is similar, with the difference that the progress of each method is
%monitored and method switching is enforced according to
%externally-defined relative progress metrics.

%The \texttt{embedded} approach is restricted to special tightly-coupled
%hybrid algorithms in which local searches are used periodically to
%accelerate a global search. These hybrids do not contain a discrete
%method switch, but rather repeatedly apply a local algorithm within
%the context of the global algorithm.

Figure~\ref{adv_meth:figure01} shows a Dakota input file that specifies a sequential hybrid optimization method to solve the ``textbook'' optimization test problem. The \path{textbook_hybrid_strat.in} file provided in \path{dakota/share/dakota/examples/users} starts with a \dakotakw{coliny_ea} solution, feeds its best point into a \dakotakw{coliny_pattern_search} optimization, and finally feeds the best pattern-search point into \dakotakw{optpp_newton}. While this approach is overkill for such a simple problem, it is useful for demonstrating the coordination between multiple sub-methods in the hybrid minimization algorithm.

The three optimization methods are identified using the \texttt{method\_list} specification in the hybrid method section of the input file. The identifier strings listed in the specification are `\texttt{GA}' for genetic algorithm, `\texttt{PS}' for pattern search, and `\texttt{NLP}' for nonlinear programming. Following the hybrid method keyword block are the three corresponding method keyword blocks. Note that each method has a tag following the \texttt{id\_method} keyword that corresponds to one of the method names listed in the hybrid method keyword block. By following the identifier tags from \texttt{method} to \texttt{model} and from \texttt{model} to \texttt{variables}, \texttt{interface}, and \texttt{responses}, it is easy to see the specification linkages for this problem. The GA optimizer runs first and uses model `\texttt{M1}', which includes variables `\texttt{V1}', interface `\texttt{I1}', and responses `\texttt{R1}'. Note that in the specification, \texttt{final\_solutions=1}, so only one (the best) solution is returned from the GA. However, it is possible to change this to \texttt{final\_solutions=5}, for example, and have five solutions passed from the GA to the pattern search. Once the GA is complete, the PS optimizer starts from the best GA result and again uses model `\texttt{M1}'.
Since both GA and PS are nongradient-based optimization methods, there is no need for gradient or Hessian information in the `\texttt{R1}' response keyword block. The NLP optimizer runs last, using the best result from the PS method as its starting point. It uses model `\texttt{M2}', which includes the same `\texttt{V1}' and `\texttt{I1}' keyword blocks, but uses the responses keyword block `\texttt{R2}' since the full Newton optimizer used in this example (\texttt{optpp\_newton}) needs analytic gradient and Hessian data to perform its search.
\begin{figure}
  \centering
  \begin{bigbox}
    \begin{tiny}
      \verbatimtabinput[8]{textbook_hybrid_strat.in}
    \end{tiny}
  \end{bigbox}
  \caption{Dakota input file for a sequential hybrid optimization method --
see \protect\path{dakota/share/dakota/examples/users/textbook_hybrid_strat.in} }
  \label{adv_meth:figure01}
\end{figure}

\section{Multistart Local Minimization}\label{adv_meth:multistart}

A simple, heuristic, global minimization technique is to perform many local minimization runs, each of which is started from a different initial point in the parameter space. This is known as multistart local minimization. It is an attractive method in situations where multiple local optima are known or expected to exist in the parameter space; however, there is no theoretical guarantee that the global optimum will be found. This approach combines the efficiency of local minimization methods with a user-specified global stratification (using a specified \texttt{starting\_points} list, a number of specified \texttt{random\_starts}, or both; see the Dakota Reference Manual~\cite{RefMan} for additional specification details). Since solutions for different starting points are independent, parallel computing may be used to concurrently run the local minimizations.

An example input file for multistart local optimization on the ``quasi\_sine'' test function (see \path{quasi_sine_fcn.C} in \path{dakota_source/test}) is shown in Figure~\ref{adv_meth:figure02}. The method keyword block in the input file contains the keyword \texttt{multi\_start}, along with the set of starting points (three random and five listed) that will be used for the optimization runs. The other keyword blocks in the input file are similar to what would be used in a single optimization run.

\begin{figure}
  \centering
  \begin{bigbox}
    \begin{small}
      \verbatimtabinput[8]{qsf_multistart_strat.in}
    \end{small}
  \end{bigbox}
  \caption{Dakota input file for a multistart local optimization method --
see \protect\path{dakota/share/dakota/examples/users/qsf_multistart_strat.in} }
  \label{adv_meth:figure02}
\end{figure}

The \texttt{quasi\_sine} test function has multiple local minima, but there is an overall trend in the function that tends toward the global minimum at $(x_1,x_2)=(0.177,0.177)$. See~\cite{Giu00} for more information on this test function. Figure~\ref{adv_meth:figure03} shows the results summary for the eight local optimizations performed. From the five specified starting points and the three random starting points (as identified by the \texttt{x1}, \texttt{x2} headers), the eight local optima (as identified by the \texttt{x1*}, \texttt{x2*} headers) are all different, and only one of the local optimizations finds the global minimum.
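
The overall pattern is simple: independent local solves launched from a set of specified and random starting points, with the best result retained. The following Python sketch illustrates the idea using SciPy; it is an illustration only, not Dakota code, and the multimodal objective shown is a stand-in rather than Dakota's \texttt{quasi\_sine} function.

\begin{small}
\begin{verbatim}
# Conceptual multistart driver (illustration only): launch independent local
# minimizations from several starting points and keep the best result.
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Stand-in multimodal function with many local minima (not quasi_sine_fcn).
    return np.sum((x - 0.177)**2) + 0.1 * np.sum(np.cos(20.0 * x))

starts = [np.array(p) for p in [(-0.8, -0.8), (-0.8, 0.8), (0.8, -0.8),
                                (0.8, 0.8), (0.0, 0.0)]]      # listed starting points
rng = np.random.default_rng(1)
starts += [rng.uniform(-1.0, 1.0, size=2) for _ in range(3)]  # random starts

# Each solve is independent, so these could be run concurrently.
results = [minimize(objective, x0, method="CG") for x0 in starts]
best = min(results, key=lambda r: r.fun)
print("best point:", best.x, "objective:", best.fun)
\end{verbatim}
\end{small}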

\begin{figure}
\centering
\begin{bigbox}
\begin{footnotesize}
\begin{verbatim}
<<<<< Results summary:
   set_id             x1             x2            x1*            x2*         obj_fn
        1           -0.8           -0.8  -0.8543728666  -0.8543728666   0.5584096919
        2           -0.8            0.8  -0.9998398719    0.177092822    0.291406596
        3            0.8           -0.8    0.177092822  -0.9998398719    0.291406596
        4            0.8            0.8   0.1770928217   0.1770928217   0.0602471946
        5              0              0  0.03572926375  0.03572926375  0.08730499239
        6  -0.7767971993  0.01810943539  -0.7024118387  0.03572951143   0.3165522387
        7  -0.3291571008  -0.7697378755   0.3167607374  -0.4009188363   0.2471403213
        8   0.8704730469   0.7720679005    0.177092899   0.3167611757  0.08256082751
\end{verbatim}
\end{footnotesize}
\end{bigbox}
\caption{Dakota results summary for a multistart local optimization method.}
\label{adv_meth:figure03}
\end{figure}

\section{Pareto Optimization}\label{adv_meth:pareto}

The Pareto optimization method (keyword: \dakotakw{pareto_set}) is one of three multiobjective optimization capabilities discussed in Section~\ref{opt:additional:multiobjective}. In the Pareto optimization method, multiple sets of multiobjective weightings are evaluated. The user can specify these weighting sets in the method keyword block using a \dakotakw{multi_objective_weight_sets} list, a number of \dakotakw{random_weight_sets}, or both (see the Dakota Reference Manual~\cite{RefMan} for additional specification details).

Dakota performs one multiobjective optimization problem for each set of multiobjective weights. The collection of computed optimal solutions forms a Pareto set, which can be useful in making trade-off decisions in engineering design. Since solutions for different multiobjective weights are independent, parallel computing may be used to concurrently execute the multiobjective optimization problems. An example input file is shown in Figure~\ref{adv_meth:figure04}.

Figure~\ref{adv_meth:figure05} shows the results summary for the Pareto-set optimization method. For the four multiobjective weighting sets (as identified by the \texttt{w1}, \texttt{w2}, \texttt{w3} headers), the local optima (as identified by the \texttt{x1}, \texttt{x2} headers) are all different and correspond to individual objective function values of ($f_1,f_2,f_3$) = (0.0,0.5,0.5), (13.1,-1.2,8.16), (532.,33.6,-2.9), and (0.125,0.0,0.0) (note: the composite objective function is tabulated under the \texttt{obj\_fn} header). The first three solutions reflect exclusive optimization of each of the individual objective functions in turn, whereas the final solution reflects a balanced weighting and the lowest sum of the three objectives. Plotting these ($f_1,f_2,f_3$) triplets on a 3-dimensional plot results in a Pareto surface (not shown), which is useful for visualizing the trade-offs in the competing objectives.
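
Conceptually, each weighting set defines one single-objective subproblem whose solution contributes one point to the Pareto set. The following Python sketch illustrates such a weighted-sum sweep; it is an illustration only, not Dakota code, and the objective functions are simple stand-ins rather than the textbook problem used in the example.

\begin{small}
\begin{verbatim}
# Weighted-sum sweep for generating Pareto points (illustration only).
import numpy as np
from scipy.optimize import minimize

def objectives(x):
    # Stand-in objectives, not the textbook problem.
    f1 = (x[0] - 1.0)**4 + (x[1] - 1.0)**4
    f2 = (x[0] - 0.5)**2
    f3 = (x[1] - 0.5)**2
    return np.array([f1, f2, f3])

weight_sets = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
               (0.333, 0.333, 0.333)]

pareto_points = []
for w in weight_sets:                        # each solve is independent (parallelizable)
    composite = lambda x, w=w: float(np.dot(w, objectives(x)))
    res = minimize(composite, x0=np.array([0.0, 0.0]))
    pareto_points.append((w, res.x, objectives(res.x)))

for w, x, f in pareto_points:
    print("weights", w, "-> x*", np.round(x, 4), "f(x*)", np.round(f, 4))
\end{verbatim}
\end{small}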

\begin{figure}
  \centering
  \begin{bigbox}
    \begin{small}
      \verbatimtabinput[8]{textbook_pareto_strat.in}
    \end{small}
  \end{bigbox}
  \caption{Dakota input file for the Pareto optimization method --
see \protect\path{dakota/share/dakota/examples/users/textbook_pareto_strat.in} }
  \label{adv_meth:figure04}
\end{figure}

\begin{figure}
\centering
\begin{bigbox}
\begin{scriptsize}
\begin{verbatim}
<<<<< Results summary:
   set_id      w1      w2      w3            x1              x2           obj_fn
        1       1       0       0  0.9996554048     0.997046351  7.612301561e-11
        2       0       1       0           0.5             2.9             -1.2
        3       0       0       1           5.8  1.12747589e-11             -2.9
        4   0.333   0.333   0.333           0.5    0.5000000041         0.041625
\end{verbatim}
\end{scriptsize}
\end{bigbox}
\caption{Dakota results summary for the Pareto-set optimization method.}
\label{adv_meth:figure05}
\end{figure}

\section{Mixed Integer Nonlinear Programming (MINLP)}\label{adv_meth:minlp}

Many nonlinear optimization problems involve a combination of discrete and continuous variables. These are known as mixed integer nonlinear programming (MINLP) problems. A typical MINLP optimization problem is formulated as follows:

\begin{eqnarray}
  \hbox{minimize:} & & f(\mathbf{x,d})\nonumber\\
  \hbox{subject to:} & & \mathbf{g}_{L} \leq \mathbf{g(x,d)} \leq \mathbf{g}_{U}\nonumber\\
  & & \mathbf{h(x,d)}=\mathbf{h}_{t}\label{adv_meth:equation01}\\
  & & \mathbf{x}_{L} \leq \mathbf{x} \leq \mathbf{x}_{U}\nonumber\\
  & & \mathbf{d} \in \{-2,-1,0,1,2\}\nonumber
\end{eqnarray}

where $\mathbf{d}$ is a vector whose elements are integer values. In situations where the discrete variables can be temporarily relaxed (i.e., noncategorical discrete variables; see Section~\ref{variables:design:ddv}), the branch-and-bound algorithm can be applied. Categorical variables (e.g., true/false variables, feature counts, etc.) that are not relaxable cannot be used with the branch-and-bound method. During the branch-and-bound process, the discrete variables are treated as continuous variables and the integrality conditions on these variables are incrementally enforced through a sequence of optimization subproblems. By the end of this process, an optimal solution that is feasible with respect to the integrality conditions is computed.

Dakota's branch-and-bound method (keyword: \texttt{branch\_and\_bound}) can solve optimization problems having either discrete or mixed continuous/discrete variables. This method uses the parallel branch-and-bound algorithm from the PEBBL software
%package~\cite{Eck97,Eck01} to generate a series of optimization
package~\cite{Eck09} to generate a series of optimization subproblems (``branches''). These subproblems are solved as continuous variable problems using any of Dakota's nonlinear optimization algorithms (e.g., DOT, NPSOL). When a solution to a branch is feasible with respect to the integrality constraints, it provides an upper bound on the optimal objective function, which can be used to prune branches with higher objective functions that are not yet feasible. Since solutions for different branches are independent, parallel computing may be used to concurrently execute the optimization subproblems.

PEBBL, by itself, targets the solution of mixed integer linear programming (MILP) problems; through coupling with Dakota's nonlinear optimizers, it is extended to the solution of MINLP problems.
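
The relax-branch-prune cycle described above can be illustrated with a minimal sketch for a single relaxable discrete variable. The sketch below is illustrative Python only; it is not PEBBL or Dakota code, which handle the general mixed-variable case and can evaluate branches in parallel.

\begin{small}
\begin{verbatim}
# Minimal branch-and-bound sketch for one relaxable discrete variable
# (illustration only).
import math
from scipy.optimize import minimize_scalar

def branch_and_bound(f, lo, hi):
    """Relax-branch-prune loop for a single integer-valued variable."""
    best_x, best_f = None, math.inf          # incumbent integer-feasible solution
    branches = [(lo, hi)]                    # open branches, stored as bound pairs
    while branches:
        a, b = branches.pop()
        relaxed = minimize_scalar(f, bounds=(a, b), method="bounded")  # relaxation
        if relaxed.fun >= best_f:
            continue                         # prune: branch cannot beat the incumbent
        xi = round(relaxed.x)
        if abs(relaxed.x - xi) < 1e-4:       # integer feasible: new upper bound
            if f(xi) < best_f:
                best_x, best_f = xi, f(xi)
        else:                                # branch on the fractional value
            branches.append((a, math.floor(relaxed.x)))
            branches.append((math.ceil(relaxed.x), b))
    return best_x, best_f

# One-variable analogue of the quartic objective in the example problem below.
print(branch_and_bound(lambda x: (x - 1.4)**4, 0, 4))
\end{verbatim}
\end{small}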

In the case of MILP problems, the upper bound obtained with a feasible solution is an exact bound, and the branch-and-bound process is provably convergent to the global minimum. For nonlinear problems, which may exhibit nonconvexity or multimodality, the process is heuristic in general, since there may be good solutions that are missed during the solution of a particular branch. However, the process still computes a series of locally optimal solutions and is therefore a natural extension of the results from local optimization techniques for continuous domains. Only with rigorous global optimization of each branch can a global minimum be guaranteed when performing branch and bound on nonlinear problems of unknown structure.

In cases where there are only a few discrete variables and the discrete values are drawn from a small set, it may be reasonable to solve a separate optimization problem for each possible combination of the discrete variables. However, this brute-force approach becomes computationally intractable if these conditions are not met. The branch-and-bound algorithm will generally require solution of fewer subproblems than the brute-force method, although it will still be significantly more expensive than solving a purely continuous design problem.

\subsection{Example MINLP Problem}\label{adv_meth:minlp:example}

As an example, consider the following MINLP problem~\cite{Eld99}:

\begin{eqnarray}
  \hbox{minimize:} & & f(\mathbf{x})=\sum_{i=1}^{6}(x_{i}-1.4)^{4}\nonumber\\
  \hbox{subject to:} & & g_{1}=x_{1}^{2}-\frac{x_{2}}{2} \leq 0\nonumber\\
  & & g_{2}=x_{2}^{2}-\frac{x_{1}}{2} \leq 0\label{adv_meth:equation02}\\
  & & -10 \leq x_{1},x_{2},x_{3},x_{4} \leq 10\nonumber\\
  & & x_{5},x_{6} \in \{0,1,2,3,4\}\nonumber
\end{eqnarray}

This problem is a variant of the textbook test problem described in Section~\ref{additional:textbook}. In addition to the introduction of two integer variables, a modified value of $1.4$ is used inside the quartic sum so that the relaxed continuous solution is non-integral.
%Figure~\ref{adv_meth:figure06} shows a Dakota input file for solving this
%problem. This input file is named \path{dakota_bandb.in} in the
%\path{dakota/share/dakota/test} directory. Note the specification for the
%discrete variables, where lower and upper bounds are given. The
%discrete variables can take on any integer value within these bounds.

%\begin{figure}
%  \centering
%  \begin{bigbox}
%    \begin{small}
%      \verbatimtabinput[8]{dakota_bandb.in}
%    \end{small}
%  \end{bigbox}
%  \caption{Dakota input file for the branch-and-bound method for
%    solving MINLP optimization problems.}
%  \label{adv_meth:figure06}
%\end{figure}

Figure~\ref{adv_meth:figure07} shows the sequence of branches generated for this problem. The first optimization subproblem relaxes the integrality constraint on parameters $x_{5}$ and $x_{6}$, so that $0 \leq x_{5} \leq 4$ and $0 \leq x_{6} \leq 4$. The values for $x_{5}$ and $x_{6}$ at the solution to this first subproblem are $x_{5}=x_{6}=1.4$. Since $x_{5}$ and $x_{6}$ must be integers, the next step in the solution process ``branches'' on parameter $x_{5}$ to create two new optimization subproblems: one with $0 \leq x_{5} \leq 1$ and the other with $2 \leq x_{5} \leq 4$.
Note that, at this first branching, the bounds on $x_{6}$ are still $0 \leq x_{6} \leq 4$. Next, the two new optimization subproblems are solved. Since they are independent, they can be performed in parallel. The branch-and-bound process continues, operating on both $x_{5}$ and $x_{6}$, until an optimization subproblem is solved in which $x_{5}$ and $x_{6}$ are integer-valued. At the solution to this problem, the optimal values for $x_{5}$ and $x_{6}$ are $x_{5}=x_{6}=1$.

\begin{figure}
  \centering
  \includegraphics[scale=0.75]{images/branch_history}
  \caption{Branching history for example MINLP optimization problem.}
  \label{adv_meth:figure07}
\end{figure}

In this example problem, the branch-and-bound algorithm executes as few as five and no more than seven optimization subproblems to reach the solution. For comparison, the brute-force approach would require 25 optimization problems to be solved (i.e., five possible values for each of $x_{5}$ and $x_{6}$).

In the example given above, the discrete variables are integer-valued. In some cases, the discrete variables may be real-valued, such as $x \in \{0.0,0.5,1.0,1.5,2.0\}$. The branch-and-bound algorithm is restricted to work with integer values. Therefore, it is up to the user to perform a transformation between the discrete integer values from Dakota and the discrete real values that are passed to the simulation code (see Section~\ref{variables:design:ddv}). When integrality is not being relaxed, a common mapping is to use the integer value from Dakota as the index into a vector of discrete real values. However, when integrality is relaxed, additional logic for interpolating between the discrete real values is needed.
% Note: it should be straightforward to extend MINLP to support
% general discrete variables, if PICO would support it. Does this
% come up in MILP for logistics, etc.?

\section{Surrogate-Based Minimization}\label{adv_meth:sbm}

Surrogate models approximate an original, high-fidelity ``truth'' model, typically at reduced computational cost. In Dakota, several surrogate model selections are possible; these are categorized as data fits, multifidelity models, and reduced-order models, as described in Section~\ref{models:surrogate}. In the context of minimization (optimization or calibration), surrogate models can speed convergence by reducing function evaluation cost or smoothing noisy response functions. Three categories of surrogate-based minimization are discussed in this chapter:
\begin{itemize}
\item Trust region-managed surrogate-based local minimization, with data fit surrogates, multifidelity models, or reduced-order models.

\item Surrogate-based global minimization, where a single surrogate is built (and optionally iteratively updated) over the whole design space.

\item Efficient global minimization: nongradient-based constrained and unconstrained optimization and nonlinear least squares based on Gaussian process models, guided by an expected improvement function.
\end{itemize}

\subsection{Surrogate-Based Local Minimization}\label{adv_meth:sbm:sblm}

In the surrogate-based local minimization method (keyword: \texttt{surrogate\_based\_local}), the minimization algorithm operates on a surrogate model instead of directly operating on the computationally expensive simulation model.
The surrogate model can be based on data fits, multifidelity models, or reduced-order models, as described in Section~\ref{models:surrogate}. Since the surrogate will generally have a limited range of accuracy, the surrogate-based local algorithm periodically checks the accuracy of the surrogate model against the original simulation model and adaptively manages the extent of the approximate optimization cycles using a trust region approach.

%The surrogate-based local method in
%Dakota can be implemented using heuristic rules (less expensive) or
%provably-convergent rules (more expensive). The heuristic approach
%is particularly effective on real-world engineering design problems
%that contain nonsmooth features (e.g., slope discontinuities,
%numerical noise) where gradient-based optimization methods often have
%trouble, and where the computational expense of the simulation
%precludes the use of nongradient-based methods.

Refer to the Dakota Theory Manual~\cite{TheoMan} for algorithmic details on iterate acceptance, merit function formulations, convergence assessment, and constraint relaxation.

\subsubsection{SBO with Data Fits}\label{adv_meth:sbm:sblm:surface}

When performing SBO with local, multipoint, and global data fit surrogates, it is necessary to regenerate or update the data fit for each new trust region. In the global data fit case, this can mean performing a new design of experiments on the original high-fidelity model for each trust region, which can effectively limit the approach to problems with, at most, tens of variables. Figure~\ref{fig:sbo_df} displays this case. However, an important benefit of the global sampling is that the global data fits can tame poorly-behaved, nonsmooth, discontinuous response variations within the original model into smooth, differentiable, easily navigated surrogates. This allows SBO with global data fits to extract the relevant global design trends from noisy simulation data.

\begin{figure}
  \centering
  \includegraphics[width=.4\textwidth]{images/sbo_df}
  \caption{SBO iteration progression for global data fits.}
  \label{fig:sbo_df}
\end{figure}
When enforcing local consistency between a global data fit surrogate and a high-fidelity model at a point, care must be taken to balance this local consistency requirement with the global accuracy of the surrogate. In particular, performing a correction on an existing global data fit in order to enforce local consistency can skew the data fit and destroy its global accuracy. One approach for achieving this balance is to include the consistency requirement within the data fit process by constraining the global data fit calculation (e.g., using constrained linear least squares). This allows the data fit to satisfy the consistency requirement while still addressing global accuracy with its remaining degrees of freedom.
% Use figure from Theresa's paper? Use equations from notes?
Embedding the consistency within the data fit also reduces the sampling requirements. For example, a quadratic polynomial normally requires at least $(n+1)(n+2)/2$ samples for $n$ variables to perform the fit. However, with an embedded first-order consistency constraint at a single point, the minimum number of samples is reduced by $n+1$ to $(n^2+n)/2$.
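
As a concrete illustration of constraining the fit, the following Python sketch builds a quadratic polynomial surrogate whose value and gradient are forced to match a ``truth'' model at a center point by solving an equality-constrained linear least squares problem through its KKT system. This is an illustration under simplifying assumptions, not Dakota's surrogate construction; the Rosenbrock stand-in and sample counts are chosen only for demonstration.

\begin{small}
\begin{verbatim}
# Quadratic data fit with an embedded first-order consistency constraint at a
# center point, via equality-constrained linear least squares (illustration only).
import numpy as np

def quad_basis(x):
    x1, x2 = x
    return np.array([1.0, x1, x2, x1*x1, x2*x2, x1*x2])

def quad_basis_grad(x):
    x1, x2 = x
    return np.array([[0, 1, 0, 2*x1, 0,    x2],
                     [0, 0, 1, 0,    2*x2, x1]], dtype=float)

def truth(x):            # stand-in "truth" model (Rosenbrock)
    return (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2

def truth_grad(x):
    return np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                     200*(x[1] - x[0]**2)])

center = np.array([-1.2, 1.0])
rng = np.random.default_rng(0)
samples = center + rng.uniform(-0.4, 0.4, size=(8, 2))    # points in the trust region

A = np.array([quad_basis(s) for s in samples])            # least-squares matrix
b = np.array([truth(s) for s in samples])
C = np.vstack([quad_basis(center), quad_basis_grad(center)])   # consistency rows
d = np.concatenate([[truth(center)], truth_grad(center)])

# KKT system for: minimize ||A c - b||^2  subject to  C c = d
n, m = A.shape[1], C.shape[0]
K = np.block([[2*A.T @ A, C.T], [C, np.zeros((m, m))]])
rhs = np.concatenate([2*A.T @ b, d])
coef = np.linalg.solve(K, rhs)[:n]
print("value and gradient match at the center:",
      np.allclose(quad_basis(center) @ coef, truth(center)),
      np.allclose(quad_basis_grad(center) @ coef, truth_grad(center)))
\end{verbatim}
\end{small}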

% With gradient information in each sample, this can be further
% reduced to ceil(n+2/2) samples.
%This corresponds to defining the terms of a symmetric Hessian matrix
%and points to an alternate approach. Rather than enforcing
%consistency through constrained least squares, one can embed
%consistency directly by employing a Taylor series centered at the
%point of local consistency enforcement and globally estimating the
%higher order terms. In the quadratic polynomial example, a
%second-order Taylor series with globally estimated Hessian terms
%requires the same $(n^2+n)/2$ samples and directly satisfies
%first-order consistency. To further reduce sampling requirements in
%this case, one can choose to perform only partial updates (e.g., the
%diagonal) of the Hessian matrix~\cite{Per02}.

% Additional research area: Exploiting variance estimators to guide
% global search (e.g., kriging)

In the local and multipoint data fit cases, the iteration progression will appear as in Fig.~\ref{fig:sbo_mh}. Both cases involve a single new evaluation of the original high-fidelity model per trust region, with the distinction that multipoint approximations reuse information from previous SBO iterates. Like model hierarchy surrogates, these techniques scale to larger numbers of design variables. Unlike model hierarchy surrogates, they generally do not require surrogate corrections, since the matching conditions are embedded in the surrogate form (e.g., a Taylor series expansion reproduces the truth model's value and derivatives at its expansion point by construction). The primary disadvantage of these surrogates is that the region of accuracy tends to be smaller than for global data fits and multifidelity surrogates, requiring more SBO cycles with smaller trust regions.
%In SBO with surface fit functions, a sequence of optimization
%subproblems are evaluated, each of which is confined to a subset of
%the parameter space known as a ``trust region.'' Inside each trust
%region, Dakota's data sampling methods are used to evaluate the
%response quantities at a small number (order $10^{1}$ to $10^{2}$) of
%design points. Next, multidimensional surface fitting is performed to
%create a surrogate function for each of the response quantities.
%Finally, optimization is performed using the surrogate functions in
%lieu of the actual response quantities, and the optimizer's search is
%limited to the region inside the trust region bounds. A validation
%procedure is then applied to compare the predicted improvement in the
%response quantities to the actual improvement in the response
%quantities. Based on the results of this validation, the optimum
%design point is either accepted or rejected and the size of the trust
%region is either expanded, contracted, or left unchanged. The sequence
%of optimization subproblems continues until the SBO convergence
%criteria are satisfied
More information on the design of experiments methods is available in Chapter~\ref{dace}, and the data fit surrogates are described in Section~\ref{models:surrogate:datafit}.

Figure~\ref{sbm:sblm_rosen} shows a Dakota input file that implements surrogate-based optimization on Rosenbrock's function. The first method keyword block contains the SBO keyword \texttt{surrogate\_based\_local}, plus the commands for specifying the trust region size and scaling factors.
The optimization portion of SBO, using the CONMIN Fletcher-Reeves conjugate gradient method, is specified in the following keyword blocks for \texttt{method}, \texttt{model}, \texttt{variables}, and \texttt{responses}. The model used by the optimization method specifies that a global surrogate will be used to map variables into responses (no \texttt{interface} specification is used by the surrogate model). The global surrogate is constructed using a DACE method, which is identified with the `\texttt{SAMPLING}' identifier. This data sampling portion of SBO is specified in the final set of keyword blocks for \texttt{method}, \texttt{model}, \texttt{interface}, and \texttt{responses} (the earlier \texttt{variables} specification is reused). This example problem uses the Latin hypercube sampling method in the LHS software to select 10 design points in each trust region. A single surrogate model is constructed for the objective function using a quadratic polynomial. The initial trust region is centered at the design point $(x_1,x_2)=(-1.2,1.0)$ and extends $\pm 0.4$ (10\% of the global bounds) from this point in the $x_1$ and $x_2$ coordinate directions.
\begin{figure}
  \begin{bigbox}
    \begin{tiny}
      \verbatimtabinput[8]{rosen_opt_sbo.in}
    \end{tiny}
  \end{bigbox}
  \caption{Dakota input file for the surrogate-based local optimization example --
see \protect\path{dakota/share/dakota/examples/users/rosen_opt_sbo.in} }
  \label{sbm:sblm_rosen}
\end{figure}

If this input file is executed in Dakota, it will converge to the optimal design point at $(x_{1},x_{2})=(1,1)$ in approximately 800 function evaluations. While this solution is correct, it is obtained at a much higher cost than with a traditional gradient-based optimizer (e.g., see the results obtained in Section~\ref{tutorial:examples:optimization}). This demonstrates that the SBO method with global data fits is not intended for use with smooth, continuous optimization problems; direct gradient-based optimization can be more efficient for such applications. Rather, SBO with global data fits is best suited for the types of problems that occur in engineering design, where the response quantities may be discontinuous, nonsmooth, or may have multiple local optima~\cite{Giu02}. In these types of engineering design problems, traditional gradient-based optimizers are often ineffective, whereas global data fits can extract the global trends of interest despite the presence of local nonsmoothness (for an example problem with multiple local optima, look in \path{dakota/share/dakota/test} for the file \path{dakota_sbo_sine_fcn.in}~\cite{Giu00}).

The surrogate-based local minimizer is mathematically guaranteed to find only a local minimum. However, in practice, SBO can often find the global minimum. Because of the random sampling used within the SBO algorithm, the method will solve a given problem slightly differently each time it is run (unless the user specifies a particular random number seed in the Dakota input file, as is shown in Figure~\ref{sbm:sblm_rosen}). Our experience with the quasi-sine function mentioned above is that if this problem is run 10 times with the same starting conditions but different seeds, the global minimum is found in about 70--80\% of the trials. This is good performance for what is mathematically only a local optimization method.
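
To summarize the mechanics discussed in this subsection, the following Python sketch shows one way a trust-region-managed data fit SBO loop can be organized: sample within the trust region, fit a surrogate, minimize the surrogate within the region, and then accept or reject the step and resize the region based on the ratio of actual to predicted improvement. This is an illustration only; Dakota's acceptance logic, merit functions, and sizing rules are described in the Theory Manual~\cite{TheoMan}, and the simple quadratic fit and resizing constants here are assumptions made for the sketch.

\begin{small}
\begin{verbatim}
# Schematic trust-region SBO loop with a global data fit (illustration only).
import numpy as np
from scipy.optimize import minimize

def truth(x):                                    # expensive "truth" model stand-in
    return (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2

def fit_quadratic(X, y):                         # simple global data fit in the region
    basis = lambda x: np.array([1, x[0], x[1], x[0]**2, x[1]**2, x[0]*x[1]])
    A = np.array([basis(x) for x in X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda x: basis(x) @ coef

center, radius = np.array([-1.2, 1.0]), 0.4
rng = np.random.default_rng(0)
for k in range(20):
    X = center + rng.uniform(-radius, radius, size=(10, 2))  # samples in the region
    surrogate = fit_quadratic(X, np.array([truth(x) for x in X]))
    bounds = list(zip(center - radius, center + radius))
    cand = minimize(surrogate, center, bounds=bounds).x      # optimize the surrogate
    actual = truth(center) - truth(cand)                     # trust-region ratio
    predicted = surrogate(center) - surrogate(cand)
    rho = actual / predicted if predicted > 0 else -1.0
    if rho > 0:                                              # accept the step
        center = cand
    radius *= 2.0 if rho > 0.75 else (0.5 if rho < 0.25 else 1.0)  # resize region
print("final trust-region center:", center)
\end{verbatim}
\end{small}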

\subsubsection{SBO with Multifidelity Models}\label{adv_meth:sbm:sblm:multifidelity}

When performing SBO with model hierarchies, the low-fidelity model is normally fixed, requiring only a single high-fidelity evaluation to compute a new correction for each new trust region. Figure~\ref{fig:sbo_mh} displays this case. This renders the multifidelity SBO technique more scalable to larger numbers of design variables, since the number of high-fidelity evaluations per iteration (assuming no finite differencing for derivatives) is independent of the scale of the design problem. However, the ability to smooth poorly-behaved response variations in the high-fidelity model is lost, and the technique becomes dependent on having a well-behaved low-fidelity model\footnote{It is also possible to use a hybrid data fit/multifidelity approach in which a smooth data fit of a noisy low-fidelity model is used in combination with a high-fidelity model.}. In addition, the parameterizations for the low- and high-fidelity models may differ, requiring the use of a mapping between these parameterizations. Space mapping, corrected space mapping, POD mapping, and hybrid POD space mapping are being explored for this purpose~\cite{Rob06a,Rob06b}.

\begin{wrapfigure}{r}{.3\textwidth}
  \centering
  \includegraphics[width=.3\textwidth]{images/sbo_mh}
  \caption{SBO iteration progression for model hierarchies.}
  \label{fig:sbo_mh}
\end{wrapfigure}
%\begin{figure}
%\epsfxsize 3in
%\centerline{\epsfbox{sbo_mh.eps}}
%\caption{SBO iteration progression for model hierarchies.}
%\label{fig:sbo_mh}
%\end{figure}

When applying corrections to the low-fidelity model, there is no concern for balancing global accuracy with the local consistency requirements. However, with only a single high-fidelity model evaluation at the center of each trust region, it is critical to use the best correction possible on the low-fidelity model in order to achieve rapid convergence rates to the optimum of the high-fidelity model~\cite{Eld04}.

%SBO can also be applied with multifidelity, or hierarchical, models,
%i.e., where one has available both a high-fidelity computational model
%and a low-fidelity computational model. This situation can occur when
%the low-fidelity model neglects some physical phenomena (e.g.,
%viscosity, heat transfer, etc.) that are included in the high-fidelity
%model, or when the low-fidelity model has a lower resolution
%computational mesh than the high-fidelity model. In many cases, the
%low-fidelity model can serve as a surrogate for the high-fidelity
%model during the optimization process. Thus, the low-fidelity model
%can be used in SBO in a manner similar to the use of surface fit models
%described in Section~\ref{adv_meth:sbm:sblm:surface}. A key difference
%in SBO with hierarchical surrogates is that a design of experiments
%using the high-fidelity model is not required; rather high-fidelity
%evaluations are only needed at the center of the current trust-region
%and the predicted optimum point in order to correct the low-fidelity
%model and verify improvement, respectively. Another difference is that
%one of the four types of correction described in
%Section~\ref{adv_meth:sbm:sblm:surface} is required for SBO with
%multifidelity models.
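
The additive corrections mentioned above can be written down compactly: a zeroth-order correction shifts the low-fidelity model so that its value matches the high-fidelity model at the trust-region center, and a first-order correction also matches the gradient. The following Python sketch is illustrative only; the low- and high-fidelity functions and their gradients are simple stand-ins rather than Dakota models or the lf\_rosenbrock test function discussed below.

\begin{small}
\begin{verbatim}
# Additive zeroth- and first-order corrections to a low-fidelity model
# (illustration only).
import numpy as np

def corrected_lofi_zeroth(lo_fi, hi_fi, center):
    offset = hi_fi(center) - lo_fi(center)            # match value at the center
    return lambda x: lo_fi(x) + offset

def corrected_lofi_first(lo_fi, hi_fi, lo_grad, hi_grad, center):
    offset = hi_fi(center) - lo_fi(center)            # match value at the center
    slope = hi_grad(center) - lo_grad(center)         # match gradient at the center
    return lambda x: lo_fi(x) + offset + slope @ (np.asarray(x) - center)

# Stand-in models (the high-fidelity model is Rosenbrock; the low-fidelity model
# is a shifted, rescaled variant chosen only for illustration).
hi_fi   = lambda x: (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2
lo_fi   = lambda x: (0.8 - x[0])**2 + 90*(x[1] - x[0]**2)**2
hi_grad = lambda x: np.array([-2*(1-x[0]) - 400*x[0]*(x[1]-x[0]**2), 200*(x[1]-x[0]**2)])
lo_grad = lambda x: np.array([-2*(0.8-x[0]) - 360*x[0]*(x[1]-x[0]**2), 180*(x[1]-x[0]**2)])

center = np.array([-1.2, 1.0])
surrogate = corrected_lofi_first(lo_fi, hi_fi, lo_grad, hi_grad, center)
print(surrogate(center), hi_fi(center))   # values agree at the center by construction
\end{verbatim}
\end{small}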

A multifidelity test problem named \path{dakota_sbo_hierarchical.in} is available in \path{dakota/share/dakota/test} to demonstrate this SBO approach. This test problem uses the Rosenbrock function as the high-fidelity model and a function named ``lf\_rosenbrock'' as the low-fidelity model. Here, lf\_rosenbrock is a variant of the Rosenbrock function (see \path{dakota_source/test/lf_rosenbrock.C} for the formulation) with its minimum at $(x_1,x_2)=(0.80,0.44)$, whereas the minimum of the original Rosenbrock function is at $(x_1,x_2)=(1,1)$. Multifidelity SBO locates the high-fidelity minimum in 11 high-fidelity evaluations for additive second-order corrections and in 208 high-fidelity evaluations for additive first-order corrections, but fails for zeroth-order additive corrections by converging to the low-fidelity minimum.

\subsubsection{SBO with Reduced Order Models}\label{adv_meth:sbm:sblm:rom}

When performing SBO with reduced-order models (ROMs), the ROM is mathematically generated from the high-fidelity model. A critical issue in this ROM generation is the ability to capture the effect of parametric changes within the ROM. Two approaches to parametric ROMs are extended ROM (E-ROM) and spanning ROM (S-ROM) techniques~\cite{Wei06}. Closely related techniques include tensor singular value decomposition (SVD) methods~\cite{Lat00}. In the single-point and multipoint E-ROM cases, the SBO iteration can appear as in Fig.~\ref{fig:sbo_mh}, whereas in the S-ROM, global E-ROM, and tensor SVD cases, the SBO iteration will appear as in Fig.~\ref{fig:sbo_df}. In addition to the high-fidelity model analysis requirements, procedures for updating the system matrices and basis vectors are also required.

Relative to data fits and multifidelity models, ROMs have some attractive advantages. Compared to data fits such as regression-based polynomial models, they are more physics-based and would be expected to be more predictive (e.g., in extrapolating away from the immediate data). Compared to multifidelity models, ROMs may be more practical in that they do not require multiple computational models or meshes, which are not always available. The primary disadvantage is potential invasiveness to the simulation code for projecting the system using the reduced basis.

\subsection{Surrogate-Based Global Minimization}\label{adv_meth:sbm:sbgm}

Surrogate-based global minimization differs from the surrogate-based local minimization approach discussed in the previous section in several ways: it is not a trust-region approach; initially, a single global surrogate is constructed over a set of sample points and the optimizer operates on that surrogate (as opposed to adaptively selecting points and rebuilding a surrogate in each trust region); and there is no guarantee of convergence.

The \texttt{surrogate\_based\_global} method was developed to address two needs. The first is the case where a user wishes to use existing function evaluations or a fixed sample size (perhaps based on computational cost and allocation of resources) to build a surrogate once and optimize on it.
In this case (a single global optimization on a surrogate model), the set of surrogate-building points is determined in advance, as opposed to the trust-region local surrogate optimization, in which the number of ``true'' function evaluations depends on the location and size of the trust region, the goodness of the surrogate within the trust region, and problem characteristics.

In the second \texttt{surrogate\_based\_global} use case, we want to update the surrogate, but globally. That is, we add points to the sample set used to create the surrogate, rebuild the surrogate, and then perform another global optimization on the new surrogate. Thus, surrogate-based global optimization can be used in an iterative scheme (a schematic sketch of this cycle appears at the end of this section). In one iteration, minimizers of the surrogate model are found, and a selected subset of these is passed to the next iteration. In the next iteration, these surrogate points are evaluated with the ``truth'' model and then added to the set of points upon which the next surrogate is constructed. This presents a more accurate surrogate to the minimizer at each subsequent iteration, presumably driving toward optimality more quickly. Note that a global surrogate is constructed using the same bounds in each iteration. This approach has no guarantee of convergence.

The surrogate-based global method was originally designed for MOGA (a multi-objective genetic algorithm). Since genetic algorithms often need thousands or tens of thousands of points to produce optimal or near-optimal solutions, surrogates can help by reducing the necessary truth model evaluations. Instead of creating one set of surrogates for the individual objectives and running the optimization algorithm on the surrogate once, the idea is to select points along the (surrogate) Pareto frontier, which can be used to supplement the existing points. In this way, one does not need to use many points initially to get a very accurate surrogate. The surrogate becomes more accurate as the iterations progress.

Most single-objective optimization methods will return only a single optimal point. In that case, only one point from the surrogate model will be evaluated with the ``true'' function and added to the point set upon which the surrogate is based. In this case, it will take many iterations of surrogate-based global optimization for the approach to converge, and its utility may not be as great as for the multi-objective case, where multiple optimal solutions are passed from one iteration to the next to supplement the surrogate. Note that the user has the option of appending the optimal points from the surrogate model to the current set of truth points or using the optimal points from the surrogate model to replace the optimal set of points from the previous iteration. Although appending to the set is the default behavior, at this time we strongly recommend using the option \texttt{replace\_points} because it appears to be more accurate and robust.

When using the surrogate-based global method, we first recommend running one optimization on a single surrogate model; that is, set \texttt{max\_iterations} to 1. This will allow one to get a sense of where the optima are located and also which surrogate types are the most accurate to use for the problem.
Note that by fixing the seed of the sample on which the surrogate is built, one can take a Dakota input file, change the surrogate type, and re-run the problem without any additional function evaluations by specifying the use of the Dakota restart file, which will pick up the existing function evaluations, create the new surrogate type, and run the optimization on that new surrogate. Also note that one can specify that surrogates be built for all primary functions and constraints or for only a subset of these functions and constraints. This allows one to use a ``truth'' model directly for some of the response functions, perhaps because they are much less expensive to evaluate than others. Finally, a diagnostic threshold can be used to stop the method if the surrogate is so poor that it is unlikely to provide useful points. If the goodness of fit has an R-squared value less than 0.5, meaning that less than half the variance of the output can be explained or accounted for by the surrogate model, the surrogate-based global optimization stops and outputs an error message. This is an arbitrary threshold, but generally one would want an R-squared value as close to 1.0 as possible, and an R-squared value below 0.5 indicates a very poor fit.

For the surrogate-based global method, we initially recommend a small number of maximum iterations, such as 3--5, to get a sense of how the optimization is evolving as the surrogate gets updated globally. If it appears to be changing significantly, then a larger number (used in combination with restart) may be needed.

Figure~\ref{sbm:sbgm_moga} shows a Dakota input file that implements surrogate-based global optimization on a multi-objective test function. The first method keyword block contains the keyword \texttt{surrogate\_based\_global}, plus the commands for specifying a maximum of five iterations and the option to replace points in the global surrogate construction. The method block identified as MOGA specifies a multi-objective genetic algorithm optimizer and its controls. The model keyword block specifies a surrogate model; in this case, a \texttt{gaussian\_process} model is used as the surrogate. The \texttt{dace\_method\_pointer} specifies that the surrogate will be built on 100 Latin hypercube samples with a seed of 531. The remainder of the input specification deals with the interface to the actual analysis driver and the two responses returned as objective functions from that driver.

\begin{figure}
  \begin{bigbox}
    \begin{scriptsize}
      \verbatimtabinput[8]{mogatest1_opt_sbo.in}
    \end{scriptsize}
  \end{bigbox}
  \caption{MOGA example --
see \protect\path{dakota/share/dakota/examples/users/mogatest1_opt_sbo.in} }
  \label{sbm:sbgm_moga}
\end{figure}
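
As a closing illustration, the global update cycle described earlier in this section (build a surrogate over all current samples, optimize on it, evaluate the truth model at the surrogate optimum, and augment the build set before refitting) can be sketched as follows. The sketch is schematic Python rather than Dakota code, applied to a single-objective stand-in problem; a simple quadratic fit stands in for the \texttt{gaussian\_process} surrogate, and the sample sizes and bounds are assumptions chosen for the example.

\begin{small}
\begin{verbatim}
# Schematic of the surrogate-based global update cycle (illustration only).
import numpy as np
from scipy.optimize import minimize

def truth(x):                                        # stand-in "truth" model
    return (x[0] - 0.2)**2 + (x[1] + 0.3)**2 + 0.1*np.sin(8*x[0])

def fit_surrogate(X, y):                             # quadratic fit stand-in
    basis = lambda x: np.array([1, x[0], x[1], x[0]**2, x[1]**2, x[0]*x[1]])
    A = np.array([basis(x) for x in X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda x: basis(x) @ coef

rng = np.random.default_rng(531)
X = rng.uniform(-1, 1, size=(20, 2))                 # initial global sample
y = np.array([truth(x) for x in X])
for it in range(5):                                  # max_iterations analogue
    surrogate = fit_surrogate(X, y)
    xstar = minimize(surrogate, np.zeros(2), bounds=[(-1, 1), (-1, 1)]).x
    X = np.vstack([X, xstar])                        # appending shown here; the
    y = np.append(y, truth(xstar))                   # replace_points option would
                                                     # swap the previous optima instead
print("best sampled point:", X[np.argmin(y)], "value:", y.min())
\end{verbatim}
\end{small}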