\chapter{Nonlinear least squares}
\label{chap:nls}

\section{Introduction and examples}
\label{nls-intro}

Gretl supports nonlinear least squares (NLS) using a variant of
the Levenberg--Marquardt algorithm.  The user must supply a
specification of the regression function; prior to giving this
specification the parameters to be estimated must be ``declared'' and
given initial values.  Optionally, the user may supply analytical
derivatives of the regression function with respect to each of the
parameters.  If derivatives are not given, the user must instead give
a list of the parameters to be estimated (separated by spaces or
commas), preceded by the keyword \cmd{params}.  The tolerance
(criterion for terminating the iterative estimation procedure) can be
adjusted using the \cmd{set} command.

The syntax for specifying the function to be estimated consists of the
name of the dependent variable, followed by an expression to generate
it. This is illustrated in the following two examples, with
accompanying derivatives.

\begin{code}
# Consumption function from Greene
nls C = alpha + beta * Y^gamma
    deriv alpha = 1
    deriv beta = Y^gamma
    deriv gamma = beta * Y^gamma * log(Y)
end nls

# Nonlinear function from Russell Davidson
nls y = alpha + beta * x1 + (1/beta) * x2
    deriv alpha = 1
    deriv beta = x1 - x2/(beta*beta)
end nls --vcv
\end{code}

Note the command words \cmd{nls} (which introduces the regression
function), \cmd{deriv} (which introduces the specification of a
derivative), and \cmd{end nls}, which terminates the specification and
calls for estimation. If the \cmd{--vcv} flag is appended to the last
line, the covariance matrix of the parameter estimates is printed.
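For readers who want to experiment with the same computation outside gretl, the first example can be sketched in Python using \textsf{scipy}'s Levenberg--Marquardt implementation. The data below are synthetic stand-ins for Greene's consumption series, and the true parameter values are made up for illustration; the Jacobian supplies the same analytical derivatives as the \cmd{deriv} lines above.

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data (hypothetical; stands in for Greene's consumption data)
rng = np.random.default_rng(0)
Y = rng.uniform(100.0, 1000.0, 200)
alpha0, beta0, gamma0 = 10.0, 0.8, 1.1  # made-up "true" values
C = alpha0 + beta0 * Y**gamma0 + rng.normal(0.0, 1.0, Y.size)

def resid(p):
    alpha, beta, gamma = p
    return C - (alpha + beta * Y**gamma)

def jac(p):
    alpha, beta, gamma = p
    # Jacobian of the residuals: the negatives of the 'deriv' lines
    return -np.column_stack([np.ones_like(Y),
                             Y**gamma,
                             beta * Y**gamma * np.log(Y)])

# method="lm" selects the minpack Levenberg-Marquardt code
fit = least_squares(resid, x0=[1.0, 1.0, 1.0], jac=jac, method="lm")
print(fit.x)  # estimates of alpha, beta, gamma
```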

\section{Initializing the parameters}
\label{nls-param}

The parameters of the regression function must be given initial values
prior to the \cmd{nls} command. (In the GUI program this may be done
via the menu item ``Variable, Define new variable''.)

In some cases, where the nonlinear function is a generalization of (or
a restricted form of) a linear model, it may be convenient to run an
\cmd{ols} and initialize the parameters from the OLS coefficient
estimates.  In relation to the first example above, one might do:
%
\begin{code}
ols C 0 Y
alpha = $coeff(0)
beta = $coeff(Y)
gamma = 1
\end{code}

And in relation to the second example one might do:
%
\begin{code}
ols y 0 x1 x2
alpha = $coeff(0)
beta = $coeff(x1)
\end{code}

\section{NLS dialog window}
\label{nls-gui}

It is probably most convenient to compose the commands for NLS
estimation in the form of a gretl script, but you can also do so
interactively, by selecting the item ``Nonlinear Least Squares'' under
the ``Model, Nonlinear models'' menu.  This opens a dialog box where
you can type the function specification (possibly prefaced by
statements to set the initial parameter values) and the derivatives,
if available.  An example of this is shown in
Figure~\ref{fig-nls-dialog}.  Note that in this context you do not
have to supply the \cmd{nls} and \cmd{end nls} tags.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics[scale=0.5]{figures/nls_window}
  \end{center}
  \caption{NLS dialog box}
  \label{fig-nls-dialog}
\end{figure}

\section{Analytical and numerical derivatives}
\label{nls-deriv}

If you are able to figure out the derivatives of the regression
function with respect to the parameters, it is advisable to supply
those derivatives as shown in the examples above.  If that is not
possible, gretl will compute approximate numerical derivatives.
However, the properties of the NLS algorithm may not be so good in
this case (see section~\ref{nls-accuracy}).
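A forward-difference approximation of this kind can be sketched as follows. This is a generic scheme in the spirit of minpack's \verb+fdjac2+, not gretl's exact stepping rule, and the names here are purely illustrative.

```python
import numpy as np

def fdjac(resid, p, eps=np.sqrt(np.finfo(float).eps)):
    """Forward-difference Jacobian of a residual function.

    resid: maps a parameter vector to a vector of residuals.
    p: point at which to approximate the Jacobian.
    """
    p = np.asarray(p, dtype=float)
    r0 = np.asarray(resid(p), dtype=float)
    J = np.empty((r0.size, p.size))
    for j in range(p.size):
        h = eps * max(abs(p[j]), 1.0)  # step scaled to the parameter size
        pj = p.copy()
        pj[j] += h
        J[:, j] = (np.asarray(resid(pj)) - r0) / h
    return J

# Quick check: for residuals linear in the parameters the forward
# difference is essentially exact
x = np.linspace(0.0, 1.0, 10)
resid_lin = lambda p: -(p[0] + p[1] * np.exp(x))
J_fd = fdjac(resid_lin, [0.5, 2.0])
J_exact = -np.column_stack([np.ones_like(x), np.exp(x)])
print(np.max(np.abs(J_fd - J_exact)))
```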

To use numerical derivatives, supply the \cmd{params} statement,
followed by a list of the names of the parameters to be
estimated. In this case, the examples above would read as follows:

\begin{code}
# Greene
nls C = alpha + beta * Y^gamma
    params alpha beta gamma
end nls
\end{code}

\begin{code}
# Davidson
nls y = alpha + beta * x1 + (1/beta) * x2
    params alpha beta
end nls
\end{code}
If analytical derivatives are supplied, they are checked for
consistency with the given nonlinear function.  If the derivatives are
clearly incorrect, estimation is aborted with an error message.  If the
derivatives are ``suspicious'' a warning message is issued but
estimation proceeds.  This warning may sometimes be triggered by
incorrect derivatives, but it may also be triggered by a high degree
of collinearity among the derivatives.

Note that you cannot mix analytical and numerical derivatives: you
should supply expressions for all of the derivatives or none.

\section{Advanced use}
\label{sec:nls-generalize}

The \cmd{nls} block can also contain more sophisticated constructs.
First, it can handle intermediate expressions; this makes it possible
to construct the conditional mean expression as a multi-step job, thus
enhancing the modularity and readability of the code. Second, more complex
objects, such as lists and matrices, can be used for this purpose.

For example, suppose that we want to estimate a probit binary-response
model via NLS. The specification is
\begin{equation}
\label{eq:probit-nls}
y_i = \Phi\left[g(\bm{x}_i)\right] + u_i, \qquad g(\bm{x}_i) = b_0 +
b_1 x_{1,i} + b_2 x_{2,i} = \bm{b}'\bm{x}_i
\end{equation}
Note: this is not the recommended way to estimate a probit model: the
$u_i$ term is heteroskedastic by construction and ML estimation is
much preferable here. Still, NLS is a consistent estimator of the
parameter vector $\bm{b}$, although its covariance matrix will have to
be adjusted to compensate for heteroskedasticity: this is accomplished
via the \option{robust} switch.

\begin{script}[htbp]
  \scriptcaption{NLS estimation of a Probit model}
  \label{script:nls-probit-numder}
  \begin{scode}
    open greene25_1.gdt
    list X = const age income ownrent selfempl

    # initialisation
    ols cardhldr X --quiet
    matrix b = $coeff / $sigma

    # proceed with NLS estimation
    nls cardhldr = cnorm(ndx)
        series ndx = lincomb(X, b)
        params b
    end nls --robust

    # compare with ML probit
    probit cardhldr X --p-values
  \end{scode}
\end{script}
176
177The example in script \ref{script:nls-probit-numder} can be enhanced by
178using analytical derivatives: since
179\[
180\frac{\partial g(\bm{x}_i)}{\partial b_j} = \varphi(\bm{b}'\bm{x}_i) \cdot x_{ij}
181\]
182one could substitute the \cmd{params} line in the script with the two-liner
183\begin{code}
184  series f = dnorm(ndx)
185  deriv b = {f} .* {X}
186\end{code}
187and have \cmd{nls} use analytically-computed derivatives, which are
188quicker and usually more reliable.
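The analytical expression is easy to verify numerically. The following Python sketch (illustrative only, with randomly generated regressors rather than the card-holder data) compares the $\varphi(\bm{b}'\bm{x}_i)\cdot x_{ij}$ Jacobian with a forward-difference approximation:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # const + 2 regressors
b = np.array([0.2, -0.5, 1.0])  # arbitrary parameter point

def mean_fn(b):
    return norm.cdf(X @ b)  # the conditional mean Phi(x_i'b)

# Analytical Jacobian: phi(x_i'b) * x_ij, as in the 'deriv' line
J_analytic = norm.pdf(X @ b)[:, None] * X

# Forward-difference comparison
h = 1e-6
J_fd = np.column_stack([(mean_fn(b + h * np.eye(3)[j]) - mean_fn(b)) / h
                        for j in range(3)])
print(np.max(np.abs(J_analytic - J_fd)))  # discrepancy should be tiny
```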

\section{Controlling termination}
\label{nls-toler}

The NLS estimation procedure is an iterative process.  Iteration is
terminated when the criterion for convergence is met or when the
maximum number of iterations is reached, whichever comes first.

Let $k$ denote the number of parameters being estimated.  The maximum
number of iterations is $100 \times (k+1)$ when analytical derivatives
are given, and $200 \times (k+1)$ when numerical derivatives are used.

Let $\epsilon$ denote a small number.  The iteration is deemed to have
converged if at least one of the following conditions is satisfied:

\begin{itemize}
\item Both the actual and predicted relative reductions in the error
  sum of squares are at most $\epsilon$.
\item The relative error between two consecutive iterates is at most
  $\epsilon$.
\end{itemize}
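In outline, the stopping logic can be sketched as follows. This is a simplified illustration; the actual tests inside minpack's \verb+lmder+/\verb+lmdif+ are more elaborate.

```python
import numpy as np

# Default tolerance: machine precision to the power 3/4
# (about 1.82e-12 for IEEE double precision)
EPS = np.finfo(float).eps ** 0.75

def converged(ssr_old, ssr_new, ssr_pred, p_old, p_new, tol=EPS):
    """Simplified sketch of the two stopping rules."""
    actual_red = 1.0 - ssr_new / ssr_old  # realized relative reduction in SSR
    pred_red = 1.0 - ssr_pred / ssr_old   # reduction predicted by the local model
    small_reduction = abs(actual_red) <= tol and pred_red <= tol
    step = np.linalg.norm(np.subtract(p_new, p_old))
    small_step = step <= tol * np.linalg.norm(p_new)  # relative change in iterates
    return small_reduction or small_step

# Large SSR reduction and a large step: neither rule fires yet
print(converged(10.0, 5.0, 4.0, [1.0, 2.0], [1.5, 2.5]))  # prints False
```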

The default value of $\epsilon$ is the machine precision to the power
3/4,\footnote{On a 32-bit Intel Pentium machine a likely value for
  this parameter is $1.82\times 10^{-12}$.} but it can be adjusted
using the \cmd{set} command with the parameter \verb+nls_toler+.  For
example
%
\begin{code}
set nls_toler .0001
\end{code}
%
will relax the value of $\epsilon$ to 0.0001.

\section{Details on the code}
\label{nls-code}

The underlying engine for NLS estimation is based on the \app{minpack}
suite of functions, available from
\href{http://www.netlib.org/minpack/}{netlib.org}.  Specifically, the
following \app{minpack} functions are called:

\begin{center}
  \begin{tabular}{ll}
    \verb+lmder+ & Levenberg--Marquardt algorithm with analytical
    derivatives
    \\
    \verb+chkder+ & Check the supplied analytical derivatives
    \\
    \verb+lmdif+ & Levenberg--Marquardt algorithm with numerical
    derivatives
    \\
    \verb+fdjac2+ & Compute final approximate Jacobian when using
    numerical derivatives
    \\
    \verb+dpmpar+ & Determine the machine precision
    \\
  \end{tabular}
\end{center}

On successful completion of the Levenberg--Marquardt iteration, a
Gauss--Newton regression is used to calculate the covariance matrix
for the parameter estimates.  If the \option{robust} flag is given, a
robust variant is computed.  The documentation for the \cmd{set}
command explains the specific options available in this regard.

Since NLS results are asymptotic, there is room for debate over
whether or not a correction for degrees of freedom should be applied
when calculating the standard error of the regression (and the
standard errors of the parameter estimates).  For comparability with
OLS, and in light of the reasoning given in
\cite{davidson-mackinnon93}, the estimates shown in gretl
\emph{do} use a degrees of freedom correction.
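The practical effect of the correction is small unless the sample size is small relative to the number of parameters. A minimal numerical illustration, with made-up residuals:

```python
import numpy as np

# Standard error of the regression with and without the degrees of
# freedom correction (illustrative numbers only)
u = np.array([0.5, -1.2, 0.3, 0.9, -0.6])  # NLS residuals
n, k = u.size, 2                           # observations, estimated parameters
ssr = np.sum(u**2)
s_corrected = np.sqrt(ssr / (n - k))   # with the correction, as gretl reports
s_asymptotic = np.sqrt(ssr / n)        # the purely asymptotic alternative
print(s_corrected, s_asymptotic)
```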

\section{Numerical accuracy}
\label{nls-accuracy}

Table \ref{tab-nls} shows the results of running the gretl NLS
procedure on the 27 Statistical Reference Datasets made available by
the U.S. National Institute of Standards and Technology (NIST) for
testing nonlinear regression software.\footnote{For a discussion of
  gretl's accuracy in the estimation of linear models, see
  Appendix~\ref{app-accuracy}.}  For each dataset, two sets of
starting values for the parameters are given in the test files, so the
full test comprises 54 runs.  Two full tests were performed, one using
all analytical derivatives and one using all numerical approximations.
In each case the default tolerance was used.\footnote{The data shown
  in the table were gathered from a pre-release build of gretl
  version 1.0.9, compiled with \app{gcc} 3.3, linked against
  \app{glibc} 2.3.2, and run under Linux on an i686 PC (IBM ThinkPad
  A21m).}
280
281Out of the 54 runs, gretl failed to produce a solution
282in 4 cases when using analytical derivatives, and in 5 cases when
283using numeric approximation. Of the four failures in analytical
284derivatives mode, two were due to non-convergence of the
285Levenberg--Marquardt algorithm after the maximum number of iterations
286(on \verb+MGH09+ and \verb+Bennett5+, both described by NIST as of
287``Higher difficulty'') and two were due to generation of range errors
288(out-of-bounds floating point values) when computing the Jacobian (on
289\verb+BoxBOD+ and \verb+MGH17+, described as of ``Higher difficulty''
290and ``Average difficulty'' respectively).  The additional failure in
291numerical approximation mode was on \verb+MGH10+ (``Higher
292difficulty'', maximum number of iterations reached).

The table gives information on several aspects of the tests: the
number of outright failures, the average number of iterations taken to
produce a solution and two sorts of measure of the accuracy of the
estimates for both the parameters and the standard errors of the
parameters.

For each of the 54 runs in each mode, if the run produced a solution
the parameter estimates obtained by gretl were compared with the
NIST certified values.  We define the ``minimum correct figures'' for
a given run as the number of significant figures to which the
\emph{least accurate} gretl estimate agreed with the certified
value, for that run. The table shows both the average and the worst
case value of this variable across all the runs that produced a
solution.  The same information is shown for the estimated standard
errors.\footnote{For the standard errors, I excluded one outlier from
  the statistics shown in the table, namely \verb+Lanczos1+.  This is
  an odd case, using generated data with an almost-exact fit: the
  standard errors are 9 or 10 orders of magnitude smaller than the
  coefficients.  In this instance gretl could reproduce the
  certified standard errors to only 3 figures (analytical derivatives)
  and 2 figures (numerical derivatives).}

The second measure of accuracy shown is the percentage of cases,
taking into account all parameters from all successful runs, in which
the gretl estimate agreed with the certified value to at least
the 6 significant figures which are printed by default in the
gretl regression output.

\begin{table}[htbp]
 \caption{Nonlinear regression: the NIST tests}
 \label{tab-nls}
  \begin{center}
    \begin{tabular}{lcc}
      & \textit{Analytical derivatives} &
      \textit{Numerical derivatives} \\ [4pt]
        Failures in 54 tests & 4 & 5\\
        Average iterations & 32 & 127\\
        Mean of min. correct figures, & 8.120 & 6.980\\
        parameters \\
        Worst of min. correct figures, & 4 & 3\\
        parameters \\
        Mean of min. correct figures, & 8.000 & 5.673\\
        standard errors \\
        Worst of min. correct figures, & 5 & 2\\
        standard errors \\
        Percent correct to at least 6 figures, & 96.5 & 91.9\\
        parameters \\
        Percent correct to at least 6 figures, & 97.7 & 77.3\\
        standard errors \\
      \end{tabular}
    \end{center}
  \end{table}

  Using analytical derivatives, the worst case values for both
  parameters and standard errors were improved to 6 correct figures on
  the test machine when the tolerance was tightened to 1.0e$-$14.
  Using numerical derivatives, the same tightening of the tolerance
  raised the worst values to 5 correct figures for the parameters and
  3 figures for standard errors, at a cost of one additional failure
  of convergence.

  Note the overall superiority of analytical derivatives: on average
  solutions to the test problems were obtained with substantially
  fewer iterations and the results were more accurate (most notably
  for the estimated standard errors).  Note also that the six-digit
  results printed by gretl are not 100 percent reliable for
  difficult nonlinear problems (in particular when using numerical
  derivatives).  Having registered this caveat, the percentage of
  cases where the results were good to six digits or better seems high
  enough to justify their printing in this form.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: