\chapter{Nonlinear least squares}
\label{chap:nls}

\section{Introduction and examples}
\label{nls-intro}

Gretl supports nonlinear least squares (NLS) using a variant of
the Levenberg--Marquardt algorithm.  The user must supply a
specification of the regression function; prior to giving this
specification the parameters to be estimated must be ``declared'' and
given initial values.  Optionally, the user may supply analytical
derivatives of the regression function with respect to each of the
parameters.  If derivatives are not given, the user must instead give
a list of the parameters to be estimated (separated by spaces or
commas), preceded by the keyword \cmd{params}.  The tolerance
(criterion for terminating the iterative estimation procedure) can be
adjusted using the \cmd{set} command.

The syntax for specifying the function to be estimated consists of the
name of the dependent variable, followed by an expression to generate
it.  This is illustrated in the following two examples, with
accompanying derivatives.

\begin{code}
# Consumption function from Greene
nls C = alpha + beta * Y^gamma
    deriv alpha = 1
    deriv beta = Y^gamma
    deriv gamma = beta * Y^gamma * log(Y)
end nls

# Nonlinear function from Russell Davidson
nls y = alpha + beta * x1 + (1/beta) * x2
    deriv alpha = 1
    deriv beta = x1 - x2/(beta*beta)
end nls --vcv
\end{code}

Note the command words \cmd{nls} (which introduces the regression
function), \cmd{deriv} (which introduces the specification of a
derivative), and \cmd{end nls}, which terminates the specification and
calls for estimation.  If the \cmd{--vcv} flag is appended to the last
line, the covariance matrix of the parameter estimates is printed.

\section{Initializing the parameters}
\label{nls-param}

The parameters of the regression function must be given initial values
prior to the \cmd{nls} command.
(In the GUI program this may be done
via the menu item ``Variable, Define new variable''.)

In some cases, where the nonlinear function is a generalization of (or
a restricted form of) a linear model, it may be convenient to run an
\cmd{ols} and initialize the parameters from the OLS coefficient
estimates.  In relation to the first example above, one might do:
%
\begin{code}
ols C 0 Y
alpha = $coeff(0)
beta = $coeff(Y)
gamma = 1
\end{code}

And in relation to the second example one might do:
%
\begin{code}
ols y 0 x1 x2
alpha = $coeff(0)
beta = $coeff(x1)
\end{code}

\section{NLS dialog window}
\label{nls-gui}

It is probably most convenient to compose the commands for NLS
estimation in the form of a gretl script, but you can also do so
interactively, by selecting the item ``Nonlinear Least Squares'' under
the ``Model, Nonlinear models'' menu.  This opens a dialog box where
you can type the function specification (possibly prefaced by
statements to set the initial parameter values) and the derivatives,
if available.  An example of this is shown in
Figure~\ref{fig-nls-dialog}.  Note that in this context you do not
have to supply the \cmd{nls} and \cmd{end nls} tags.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics[scale=0.5]{figures/nls_window}
  \end{center}
  \caption{NLS dialog box}
  \label{fig-nls-dialog}
\end{figure}

\section{Analytical and numerical derivatives}
\label{nls-deriv}

If you are able to figure out the derivatives of the regression
function with respect to the parameters, it is advisable to supply
those derivatives as shown in the examples above.  If that is not
possible, gretl will compute approximate numerical derivatives.
However, the properties of the NLS algorithm may not be so good in
this case (see section~\ref{nls-accuracy}).
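Putting the preceding sections together, a complete run of the Greene
consumption function example, with OLS-based initialization and
analytical derivatives, might look as follows.  (This is a sketch: it
assumes the series \texttt{C} and \texttt{Y} are already present in the
current dataset.)
%
\begin{code}
# initialize the parameters from an auxiliary OLS regression
ols C 0 Y --quiet
alpha = $coeff(0)
beta = $coeff(Y)
gamma = 1

# nonlinear estimation with analytical derivatives
nls C = alpha + beta * Y^gamma
    deriv alpha = 1
    deriv beta = Y^gamma
    deriv gamma = beta * Y^gamma * log(Y)
end nls
\end{code}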
Numerical derivatives are requested by means of the \cmd{params}
statement, which should be followed by the names of the parameters to
be estimated.  In this case, the examples above would read as follows:

\begin{code}
# Greene
nls C = alpha + beta * Y^gamma
    params alpha beta gamma
end nls
\end{code}

\begin{code}
# Davidson
nls y = alpha + beta * x1 + (1/beta) * x2
    params alpha beta
end nls
\end{code}

If analytical derivatives are supplied, they are checked for
consistency with the given nonlinear function.  If the derivatives are
clearly incorrect, estimation is aborted with an error message.  If
the derivatives are ``suspicious'' a warning message is issued but
estimation proceeds.  This warning may sometimes be triggered by
incorrect derivatives, but it may also be triggered by a high degree
of collinearity among the derivatives.

Note that you cannot mix analytical and numerical derivatives: you
should supply expressions for all of the derivatives or none.

\section{Advanced use}
\label{sec:nls-generalize}

The \cmd{nls} block can also contain more sophisticated constructs.
First, it can handle intermediate expressions; this makes it possible
to construct the conditional mean expression as a multi-step job, thus
enhancing the modularity and readability of the code.  Second, more
complex objects, such as lists and matrices, can be used for this
purpose.

For example, suppose that we want to estimate a Probit Binary Response
model via NLS.  The specification is
\begin{equation}
\label{eq:probit-nls}
y_i = \Phi\left[g(\bm{x}_i)\right] + u_i, \qquad g(\bm{x}_i) = b_0 +
b_1 x_{1,i} + b_2 x_{2,i} = \bm{b}'\bm{x}_i
\end{equation}
Note: this is not the recommended way to estimate a probit model: the
$u_i$ term is heteroskedastic by construction and ML estimation is
much preferable here.
Still, NLS is a consistent estimator of the
parameter vector $\bm{b}$, although its covariance matrix will have to
be adjusted to compensate for heteroskedasticity: this is accomplished
via the \option{robust} switch.

\begin{script}[htbp]
  \scriptcaption{NLS estimation of a Probit model}
  \label{script:nls-probit-numder}
  \begin{scode}
open greene25_1.gdt
list X = const age income ownrent selfempl

# initialisation
ols cardhldr X --quiet
matrix b = $coeff / $sigma

# proceed with NLS estimation
nls cardhldr = cnorm(ndx)
    series ndx = lincomb(X, b)
    params b
end nls --robust

# compare with ML probit
probit cardhldr X --p-values
  \end{scode}
\end{script}

The example in script \ref{script:nls-probit-numder} can be enhanced by
using analytical derivatives: since
\[
\frac{\partial \Phi\left[g(\bm{x}_i)\right]}{\partial b_j} =
\varphi(\bm{b}'\bm{x}_i) \cdot x_{ij}
\]
one could substitute the \cmd{params} line in the script with the
two-liner
\begin{code}
series f = dnorm(ndx)
deriv b = {f} .* {X}
\end{code}
and have \cmd{nls} use analytically computed derivatives, which are
quicker and usually more reliable.

\section{Controlling termination}
\label{nls-toler}

The NLS estimation procedure is an iterative process.  Iteration is
terminated when the criterion for convergence is met or when the
maximum number of iterations is reached, whichever comes first.

Let $k$ denote the number of parameters being estimated.  The maximum
number of iterations is $100 \times (k+1)$ when analytical derivatives
are given, and $200 \times (k+1)$ when numerical derivatives are used.

Let $\epsilon$ denote a small number.
The iteration is deemed to have
converged if at least one of the following conditions is satisfied:

\begin{itemize}
\item Both the actual and predicted relative reductions in the error
  sum of squares are at most $\epsilon$.
\item The relative error between two consecutive iterates is at most
  $\epsilon$.
\end{itemize}

The default value of $\epsilon$ is the machine precision to the power
3/4,\footnote{On a 32-bit Intel Pentium machine a likely value for
  this parameter is $1.82\times 10^{-12}$.} but it can be adjusted
using the \cmd{set} command with the parameter \verb+nls_toler+.  For
example
%
\begin{code}
set nls_toler .0001
\end{code}
%
will relax the value of $\epsilon$ to 0.0001.

\section{Details on the code}
\label{nls-code}

The underlying engine for NLS estimation is based on the \app{minpack}
suite of functions, available from
\href{http://www.netlib.org/minpack/}{netlib.org}.  Specifically, the
following \app{minpack} functions are called:

\begin{center}
  \begin{tabular}{ll}
    \verb+lmder+ & Levenberg--Marquardt algorithm with analytical
                   derivatives \\
    \verb+chkder+ & Check the supplied analytical derivatives \\
    \verb+lmdif+ & Levenberg--Marquardt algorithm with numerical
                   derivatives \\
    \verb+fdjac2+ & Compute final approximate Jacobian when using
                   numerical derivatives \\
    \verb+dpmpar+ & Determine the machine precision \\
  \end{tabular}
\end{center}

On successful completion of the Levenberg--Marquardt iteration, a
Gauss--Newton regression is used to calculate the covariance matrix
for the parameter estimates.  If the \option{robust} flag is given, a
robust variant is computed.  The documentation for the \cmd{set}
command explains the specific options available in this regard.
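As an illustration of how these pieces fit together, one might combine
a tighter convergence tolerance with a robust covariance matrix in a
single script.  The following sketch reuses the Davidson example from
section~\ref{nls-intro}; it assumes the series \texttt{y}, \texttt{x1}
and \texttt{x2} exist and that the parameters have already been
initialized.
%
\begin{code}
# tighten the convergence criterion before estimation
set nls_toler 1.0e-12

nls y = alpha + beta * x1 + (1/beta) * x2
    deriv alpha = 1
    deriv beta = x1 - x2/(beta*beta)
end nls --robust
\end{code}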
Since NLS results are asymptotic, there is room for debate over
whether or not a correction for degrees of freedom should be applied
when calculating the standard error of the regression (and the
standard errors of the parameter estimates).  For comparability with
OLS, and in light of the reasoning given in
\cite{davidson-mackinnon93}, the estimates shown in gretl
\emph{do} use a degrees of freedom correction.

\section{Numerical accuracy}
\label{nls-accuracy}

Table \ref{tab-nls} shows the results of running the gretl NLS
procedure on the 27 Statistical Reference Datasets made available by
the U.S. National Institute of Standards and Technology (NIST) for
testing nonlinear regression software.\footnote{For a discussion of
  gretl's accuracy in the estimation of linear models, see
  Appendix~\ref{app-accuracy}.}  For each dataset, two sets of
starting values for the parameters are given in the test files, so the
full test comprises 54 runs.  Two full tests were performed, one using
all analytical derivatives and one using all numerical approximations.
In each case the default tolerance was used.\footnote{The data shown
  in the table were gathered from a pre-release build of gretl
  version 1.0.9, compiled with \app{gcc} 3.3, linked against
  \app{glibc} 2.3.2, and run under Linux on an i686 PC (IBM ThinkPad
  A21m).}

Out of the 54 runs, gretl failed to produce a solution
in 4 cases when using analytical derivatives, and in 5 cases when
using numerical approximation.
Of the four failures in analytical
derivatives mode, two were due to non-convergence of the
Levenberg--Marquardt algorithm after the maximum number of iterations
(on \verb+MGH09+ and \verb+Bennett5+, both described by NIST as of
``Higher difficulty'') and two were due to the generation of range
errors (out-of-bounds floating point values) when computing the
Jacobian (on \verb+BoxBOD+ and \verb+MGH17+, described as of ``Higher
difficulty'' and ``Average difficulty'' respectively).  The additional
failure in numerical approximation mode was on \verb+MGH10+ (``Higher
difficulty'', maximum number of iterations reached).

The table gives information on several aspects of the tests: the
number of outright failures, the average number of iterations taken to
produce a solution, and two sorts of measure of the accuracy of the
estimates, for both the parameters and the standard errors of the
parameters.

For each of the 54 runs in each mode, if the run produced a solution,
the parameter estimates obtained by gretl were compared with the
NIST certified values.  We define the ``minimum correct figures'' for
a given run as the number of significant figures to which the
\emph{least accurate} gretl estimate agreed with the certified
value, for that run.  The table shows both the average and the worst
case value of this variable across all the runs that produced a
solution.  The same information is shown for the estimated standard
errors.\footnote{For the standard errors, I excluded one outlier from
  the statistics shown in the table, namely \verb+Lanczos1+.  This is
  an odd case, using generated data with an almost-exact fit: the
  standard errors are 9 or 10 orders of magnitude smaller than the
  coefficients.
In this instance gretl could reproduce the
  certified standard errors to only 3 figures (analytical derivatives)
  and 2 figures (numerical derivatives).}

The second measure of accuracy shown is the percentage of cases,
taking into account all parameters from all successful runs, in which
the gretl estimate agreed with the certified value to at least
the 6 significant figures which are printed by default in the
gretl regression output.

\begin{table}[htbp]
  \caption{Nonlinear regression: the NIST tests}
  \label{tab-nls}
  \begin{center}
    \begin{tabular}{lcc}
      & \textit{Analytical derivatives} &
        \textit{Numerical derivatives} \\ [4pt]
      Failures in 54 tests & 4 & 5 \\
      Average iterations & 32 & 127 \\
      Mean of min. correct figures, parameters & 8.120 & 6.980 \\
      Worst of min. correct figures, parameters & 4 & 3 \\
      Mean of min. correct figures, standard errors & 8.000 & 5.673 \\
      Worst of min. correct figures, standard errors & 5 & 2 \\
      Percent correct to at least 6 figures, parameters & 96.5 & 91.9 \\
      Percent correct to at least 6 figures, standard errors & 97.7 & 77.3 \\
    \end{tabular}
  \end{center}
\end{table}

Using analytical derivatives, the worst case values for both
parameters and standard errors were improved to 6 correct figures on
the test machine when the tolerance was tightened to 1.0e$-$14.
Using numerical derivatives, the same tightening of the tolerance
raised the worst values to 5 correct figures for the parameters and
3 figures for standard errors, at a cost of one additional failure
of convergence.

Note the overall superiority of analytical derivatives: on average,
solutions to the test problems were obtained with substantially
fewer iterations and the results were more accurate (most notably
for the estimated standard errors).
Note also that the six-digit
results printed by gretl are not 100 percent reliable for
difficult nonlinear problems (in particular when using numerical
derivatives).  Having registered this caveat, the percentage of
cases where the results were good to six digits or better seems high
enough to justify their printing in this form.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: