1headings 10
2Tests 24
3add
4adf
5bds
6bkw
7chow
8coeffsum
9coint
10cusum
11difftest
12johansen
13kpss
14leverage
15levinlin
16meantest
17modtest
18normtest
19omit
20panspec
21qlrtest
22reset
23restrict
24runs
25vartest
26vif
27Graphs 10
28boxplot
29gnuplot
30graphpg
31hfplot
32panplot
33plot
34qqplot
35rmplot
36scatters
37textplot
38Statistics 14
39anova
40corr
41corrgm
42fractint
43freq
44hurst
45mahal
46pca
47pergm
48pvalue
49spearman
50summary
51xcorrgm
52xtab
53Dataset 18
54append
55data
56dataset
57delete
58genr
59info
60join
61labels
62markers
63nulldata
64open
65rename
66setinfo
67setmiss
68setobs
69smpl
70store
71varlist
72Estimation 34
73ar
74ar1
75arch
76arima
77arma
78biprobit
79dpanel
80duration
81equation
82estimate
83garch
84gmm
85heckit
86hsk
87intreg
88lad
89logistic
90logit
91midasreg
92mle
93mpols
94negbin
95nls
96ols
97panel
98poisson
99probit
100quantreg
101system
102tobit
103tsls
104var
105vecm
106wls
107Programming 19
108break
109catch
110clear
111elif
112else
113end
114endif
115endloop
116flush
117foreign
118funcerr
119function
120if
121include
122loop
123makepkg
124run
125set
126setopt
127Transformations 10
128diff
129discrete
130dummify
131lags
132ldiff
133logs
134orthdev
135sdiff
136square
137stdize
138Utilities 6
139eval
140help
141modeltab
142pkg
143quit
144shell
145Printing 7
146eqnprint
147modprint
148outfile
149print
150printf
151sprintf
152tabprint
153Prediction 1
154fcast
155
156# add Tests
157
158Argument:   varlist
159Options:    --lm (do an LM test, OLS only)
160            --quiet (print only the basic test result)
161            --silent (don't print anything)
162            --vcv (print covariance matrix for augmented model)
163            --both (IV estimation only, see below)
164Examples:   add 5 7 9
165            add xx yy zz --quiet
166
167Must be invoked after an estimation command. Performs a joint test for the
168addition of the specified variables to the last model, the results of which
169may be retrieved using the accessors "$test" and "$pvalue".
170
171By default an augmented version of the original model is estimated,
172including the variables in varlist. The test is a Wald test on the augmented
173model, which replaces the original as the "current model" for the purposes
174of, for example, retrieving the residuals as $uhat or doing further tests.
175
176Alternatively, given the --lm option (available only for the models
177estimated via OLS), an LM test is performed. An auxiliary regression is run
178in which the dependent variable is the residual from the last model and the
179independent variables are those from the last model plus varlist. Under the
180null hypothesis that the added variables have no additional explanatory
181power, the sample size times the unadjusted R-squared from this regression
182is distributed as chi-square with degrees of freedom equal to the number of
183added regressors. In this case the original model is not replaced.
184
185The --both option is specific to two-stage least squares: it specifies that
186the new variables should be added both to the list of regressors and the
187list of instruments, the default in this case being to add to the regressors
188only.
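
As a rough illustration of the two test variants, the following sketch
assumes the Ramanathan sample file data4-1 supplied with gretl (series
price, sqft, bedrms, baths); any other OLS model would serve equally well.

	open data4-1
	ols price 0 sqft
	# LM variant: the original model remains the "current model"
	add bedrms baths --lm --quiet
	printf "LM: test = %g, p-value = %g\n", $test, $pvalue
	# default Wald variant: the augmented model becomes the "current model"
	add bedrms baths --quiet
	printf "Wald: test = %g, p-value = %g\n", $test, $pvalue
	# residuals now come from the augmented model
	series u = $uhat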
189
190Menu path:    Model window, /Tests/Add variables
191
192# adf Tests
193
194Arguments:  order varlist
195Options:    --nc (test without a constant)
196            --c (with constant only)
197            --ct (with constant and trend)
198            --ctt (with constant, trend and trend squared)
199            --seasonals (include seasonal dummy variables)
200            --gls (de-mean or de-trend using GLS)
201            --verbose (print regression results)
202            --quiet (suppress printing of results)
203            --difference (use first difference of variable)
204            --test-down[=criterion] (automatic lag order)
205            --perron-qu (see below)
206Examples:   adf 0 y
207            adf 2 y --nc --c --ct
208            adf 12 y --c --test-down
209            See also jgm-1996.inp
210
211The options shown above and the discussion which follows mostly pertain to
212the use of the adf command with regular time series data. For use of this
213command with panel data please see the section titled "Panel data" below.
214
215This command computes a set of Dickey-Fuller tests on each of the listed
216variables, the null hypothesis being that the variable in question has a
217unit root. (But if the --difference flag is given, the first difference of
218the variable is taken prior to testing, and the discussion below must be
219taken as referring to the transformed variable.)
220
221By default, two variants of the test are shown: one based on a regression
222containing a constant and one using a constant and linear trend. You can
223control the variants that are presented by specifying one or more of the
224option flags --nc, --c, --ct, --ctt.
225
226The --gls option can be used in conjunction with one or other of the flags
227--c and --ct. The effect of this option is that the series to be tested is
228demeaned or detrended using the GLS procedure proposed by Elliott,
229Rothenberg and Stock (1996), which gives a test of greater power than the
230standard Dickey-Fuller approach. This option is not compatible with --nc,
231--ctt or --seasonals.
232
In all cases the dependent variable in the test regression is the first
difference of the specified series, y, and the key independent variable is
the first lag of y. The regression is constructed such that the coefficient
on lagged y equals the root in question, a, minus 1. For example, the model
with a constant may be written as
238
239  (1 - L)y(t) = b0 + (a-1)y(t-1) + e(t)
240
241Under the null hypothesis of a unit root the coefficient on lagged y equals
242zero. Under the alternative that y is stationary this coefficient is
243negative. So the test is inherently one-sided.
244
245Selecting the lag order
246
247The simplest version of the Dickey-Fuller test assumes that the error term
248in the test regression is serially uncorrelated. In practice this is
249unlikely to be the case and the specification is often extended by including
250one or more lags of the dependent variable, giving an Augmented
251Dickey-Fuller (ADF) test. The order argument governs the number of such
252lags, k, possibly depending on the sample size, T.
253
254  For a fixed, user-specified k: give a non-negative value for order.
255
256  For T-dependent k: give order as -1. The order is then set following the
257  recommendation of Schwert (1989), namely the integer part of
258  12(T/100)^0.25.
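
The Schwert rule is easy to check by hand. The short sketch below uses an
assumed sample size of T = 200, for which 12 * 2^0.25 is about 14.27, so
the lag order is 14.

	scalar T = 200
	# integer part of 12(T/100)^0.25
	scalar kmax = int(12 * (T/100)^0.25)
	printf "T = %d, Schwert maximum lag = %d\n", T, kmax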
259
260In general, however, we don't know how many lags will be required to
261"whiten" the Dickey-Fuller residual. It's therefore common to specify the
262maximum value of k and let the data decide the actual number of lags to
263include. This can be done via the --test-down option. The criterion for
264selecting optimal k may be set using the parameter to this option, which
265should be one of AIC, BIC or tstat, AIC being the default.
266
267When testing down via AIC or BIC, the final lag order for the ADF equation
268is that which optimizes the chosen information criterion (Akaike or Schwarz
269Bayesian). The exact procedure depends on whether or not the --gls option is
270given. When GLS is specified, AIC and BIC are the "modified" versions
271described in Ng and Perron (2001), otherwise they are the standard versions.
272In the GLS case a refinement is available. If the additional option
273--perron-qu is given, lag-order selection is performed via the revised
274method recommended by Perron and Qu (2007). In this case the data are first
275demeaned or detrended via OLS; GLS is applied once the lag order is
276determined.
277
278When testing down via the t-statistic method is called for, the procedure is
279as follows:
280
2811. Estimate the Dickey-Fuller regression with k lags of the dependent
282   variable.
283
2842. Is the last lag significant? If so, execute the test with lag order k.
285   Otherwise, let k = k - 1; if k equals 0, execute the test with lag order
286   0, else go to step 1.
287
288In the context of step 2 above, "significant" means that the t-statistic for
289the last lag has an asymptotic two-sided p-value, against the normal
290distribution, of 0.10 or less.
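
The two steps can be imitated in a script, as in the sketch below. This is
an illustration only and not gretl's internal code: the simulated random
walk y, the starting order of 8 and all variable names are assumptions, and
unlike a careful implementation it does not hold the estimation sample fixed
as k changes.

	nulldata 200
	setobs 1 1 --time-series
	set seed 12345
	series y = cum(normal())   # simulated random walk
	series dy = diff(y)
	scalar k = 8               # assumed maximum lag order
	loop while k > 0
	    # step 1: Dickey-Fuller regression with k lags of dy
	    list dylags = lags(k, dy)
	    ols dy 0 y(-1) dylags --quiet
	    # step 2: two-sided normal p-value for the last lag
	    scalar tk = $coeff[$ncoeff] / $stderr[$ncoeff]
	    if 2 * pvalue(z, abs(tk)) <= 0.10
	        break
	    endif
	    k -= 1
	endloop
	printf "lag order selected by testing down: %d\n", k

In practice the same selection is requested directly with a command such as
"adf 8 y --c --test-down=tstat".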
291
292To sum up, if we accept the various arguments of Perron, Ng, Qu and Schwert
293referenced above, the favored command for testing a series y is likely to
294be:
295
296	adf -1 y --c --gls --test-down --perron-qu
297
298(Or substitute --ct for --c if the series seems to display a trend.) The lag
299order for the test will then be determined by testing down via modified AIC
300from the Schwert maximum, with the Perron-Qu refinement.
301
302P-values for the Dickey-Fuller tests are based on response-surface
303estimates. When GLS is not applied these are taken from MacKinnon (1996).
304Otherwise they are taken from Cottrell (2015) or, when testing down is
305performed, Sephton (2021). The P-values are specific to the sample size
306unless they are labeled as asymptotic.
307
308Panel data
309
310When the adf command is used with panel data, to produce a panel unit root
311test, the applicable options and the results shown are somewhat different.
312
313First, while you may give a list of variables for testing in the regular
314time-series case, with panel data only one variable may be tested per
315command. Second, the options governing the inclusion of deterministic terms
316become mutually exclusive: you must choose between no-constant, constant
317only, and constant plus trend; the default is constant only. In addition,
318the --seasonals option is not available. Third, the --verbose option has a
319different meaning: it produces a brief account of the test for each
320individual time series (the default being to show only the overall result).
321
322The overall test (null hypothesis: the series in question has a unit root
323for all the panel units) is calculated in one or both of two ways: using the
324method of Im, Pesaran and Shin (Journal of Econometrics, 2003) or that of
325Choi (Journal of International Money and Finance, 2001). The Choi test
326requires that P-values are available for the individual tests; if this is
327not the case (depending on the options selected) it is omitted. The
328particular statistic given for the Im, Pesaran, Shin test varies as follows:
329if the lag order for the test is non-zero their W statistic is shown;
330otherwise if the time-series lengths differ by individual, their Z
331statistic; otherwise their t-bar statistic. See also the "levinlin" command.
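
A minimal panel example, assuming the Grunfeld sample panel distributed with
gretl (with a series named invest), might look like this:

	open grunfeld
	# one variable only; constant-only case; per-unit details shown
	adf 1 invest --c --verbose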
332
333Menu path:    /Variable/Unit root tests/Augmented Dickey-Fuller test
334
335# anova Statistics
336
337Arguments:  response treatment [ block ]
338Option:     --quiet (don't print results)
339
340Analysis of Variance: response is a series measuring some effect of interest
341and treatment must be a discrete variable that codes for two or more types
342of treatment (or non-treatment). For two-way ANOVA, the block variable
343(which should also be discrete) codes for the values of some control
344variable.
345
346Unless the --quiet option is given, this command prints a table showing the
347sums of squares and mean squares along with an F-test. The F-test and its
348p-value can be retrieved using the accessors "$test" and "$pvalue"
349respectively.
350
The null hypothesis for the F-test is that the mean response is invariant
with respect to the treatment type, or, in other words, that the treatment
has no effect. Strictly speaking, the test is valid only if the variance of
the response is the same for all treatment types.
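
The following sketch runs a one-way ANOVA on artificial data and retrieves
the test result. The series names, the three treatment levels and the
effect size are all invented for the illustration.

	nulldata 90
	set seed 4321
	# hypothetical treatment coded 1, 2 or 3
	series xt = 1 + floor(3 * uniform())
	discrete xt
	series y = 0.5 * xt + normal()
	anova y xt --quiet
	printf "F = %g, p-value = %g\n", $test, $pvalue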
355
356Note that the results shown by this command are in fact a subset of the
357information given by the following procedure, which is easily implemented in
358gretl. Create a set of dummy variables coding for all but one of the
359treatment types. For two-way ANOVA, in addition create a set of dummies
360coding for all but one of the "blocks". Then regress response on a constant
361and the dummies using "ols". For a one-way design the ANOVA table is printed
362via the --anova option to ols. In the two-way case the relevant F-test is
363found by using the "omit" command. For example (assuming y is the response,
364xt codes for the treatment, and xb codes for blocks):
365
366	# one-way
367	list dxt = dummify(xt)
368	ols y 0 dxt --anova
369	# two-way
370	list dxb = dummify(xb)
371	ols y 0 dxt dxb
372	# test joint significance of dxt
373	omit dxt --quiet
374
375Menu path:    /Model/Other linear models/ANOVA
376
377# append Dataset
378
379Argument:   filename
380Options:    --time-series (see below)
381            --fixed-sample (see below)
382            --update-overlap (see below)
383            --quiet (don't print anything)
384            See below for additional specialized options
385
386Opens a data file and appends the content to the current dataset, if the new
387data are compatible. The program will try to detect the format of the data
388file (native, plain text, CSV, Gnumeric, Excel, etc.).
389
390The appended data may take the form of either additional observations on
391series already present in the dataset, and/or new series. In the case of
392adding series, compatibility requires either (a) that the number of
393observations for the new data equals that for the current data, or (b) that
394the new data carries clear observation information so that gretl can work
395out how to place the values.
396
397One case that is not supported is where the new data start earlier and also
398end later than the original data. To add new series in such a case you can
399use the --fixed-sample option; this has the effect of suppressing the adding
400of observations, and so restricting the operation to the addition of new
401series.
402
403A special feature is supported for appending to a panel dataset. Let n
404denote the number of cross-sectional units in the panel, T denote the number
405of time periods, and m denote the number of observations for the new data.
406If m = n the new data are taken to be time-invariant, and are copied into
407place for each time period. On the other hand, if m = T the data are treated
408as non-varying across the panel units, and are copied into place for each
409unit. If the panel is "square", and m equals both n and T, an ambiguity
410arises. The default in this case is to treat the new data as time-invariant,
411but you can force gretl to treat the new data as time series via the
412--time-series option. (This option is ignored in all other cases.)
413
414When a data file is selected for appending, there may be an area of overlap
415with the existing dataset; that is, one or more series may have one or more
416observations in common across the two sources. If the option
417--update-overlap is given, the append operation will replace any overlapping
418observations with the values from the selected data file, otherwise the
419values currently in place will be unaffected.
420
421The additional specialized options --sheet, --coloffset, --rowoffset and
422--fixed-cols work in the same way as with "open"; see that command for
423explanations.
424
425See also "join" for more sophisticated handling of multiple data sources.
426
427Menu path:    /File/Append data
428
429# ar Estimation
430
431Arguments:  lags ; depvar indepvars
432Options:    --vcv (print covariance matrix)
433            --quiet (don't print parameter estimates)
434Example:    ar 1 3 4 ; y 0 x1 x2 x3
435
436Computes parameter estimates using the generalized Cochrane-Orcutt iterative
437procedure; see Section 9.5 of Ramanathan (2002). Iteration is terminated
438when successive error sums of squares do not differ by more than 0.005
439percent or after 20 iterations.
440
441"lags" is a list of lags in the residuals, terminated by a semicolon. In the
442above example, the error term is specified as
443
444  u(t) = rho(1)*u(t-1) + rho(3)*u(t-3) + rho(4)*u(t-4)
445
446Menu path:    /Model/Univariate time series/AR Errors (GLS)
447
448# ar1 Estimation
449
450Arguments:  depvar indepvars
451Options:    --hilu (use Hildreth-Lu procedure)
452            --pwe (use Prais-Winsten estimator)
453            --vcv (print covariance matrix)
454            --no-corc (do not fine-tune results with Cochrane-Orcutt)
455            --loose (use looser convergence criterion)
456            --quiet (don't print anything)
457Examples:   ar1 1 0 2 4 6 7
458            ar1 y 0 xlist --pwe
459            ar1 y 0 xlist --hilu --no-corc
460
461Computes feasible GLS estimates for a model in which the error term is
462assumed to follow a first-order autoregressive process.
463
464The default method is the Cochrane-Orcutt iterative procedure; see for
465example section 9.4 of Ramanathan (2002). The criterion for convergence is
466that successive estimates of the autocorrelation coefficient do not differ
467by more than 1e-6, or if the --loose option is given, by more than 0.001. If
468this is not achieved within 100 iterations an error is flagged.
469
470If the --pwe option is given, the Prais-Winsten estimator is used. This
471involves an iteration similar to Cochrane-Orcutt; the difference is that
472while Cochrane-Orcutt discards the first observation, Prais-Winsten makes
473use of it. See, for example, Chapter 13 of Greene (2000) for details.
474
475If the --hilu option is given, the Hildreth-Lu search procedure is used. The
476results are then fine-tuned using the Cochrane-Orcutt method, unless the
477--no-corc flag is specified. The --no-corc option is ignored for estimators
478other than Hildreth-Lu.
479
480Menu path:    /Model/Univariate time series/AR Errors (GLS)
481
482# arch Estimation
483
484Arguments:  order depvar indepvars
485Option:     --quiet (don't print anything)
486Example:    arch 4 y 0 x1 x2 x3
487
488This command is retained at present for backward compatibility, but you are
489better off using the maximum likelihood estimator offered by the "garch"
490command; for a plain ARCH model, set the first GARCH parameter to 0.
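
For instance, assuming the same series names as in the example above, a
plain ARCH(4) model could be requested via garch along these lines (the
first order, 0, is the GARCH term and the second, 4, the ARCH term):

	garch 0 4 ; y 0 x1 x2 x3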
491
Estimates the given model specification allowing for ARCH (Autoregressive
Conditional Heteroskedasticity). The model is first estimated via OLS, then
an auxiliary regression is run, in which the squared residual from the first
stage is regressed on its own lagged values. The final step is weighted
least squares estimation, using as weights the reciprocals of the fitted
error variances from the auxiliary regression. (If the predicted variance of
any observation in the auxiliary regression is not positive, then the
corresponding squared residual is used instead.)
500
501The alpha values displayed below the coefficients are the estimated
502parameters of the ARCH process from the auxiliary regression.
503
504See also "garch" and "modtest" (the --arch option).
505
506# arima Estimation
507
508Arguments:  p d q [ ; P D Q ] ; depvar [ indepvars ]
509Options:    --verbose (print details of iterations)
510            --quiet (don't print out results)
511            --vcv (print covariance matrix)
512            --hessian (see below)
513            --opg (see below)
514            --nc (do not include a constant)
515            --conditional (use conditional maximum likelihood)
516            --x-12-arima (use X-12-ARIMA, or X13, for estimation)
517            --lbfgs (use L-BFGS-B maximizer)
518            --y-diff-only (ARIMAX special, see below)
519Examples:   arima 1 0 2 ; y
520            arima 2 0 2 ; y 0 x1 x2 --verbose
521            arima 0 1 1 ; 0 1 1 ; y --nc
522            See also armaloop.inp, bjg.inp
523
524Note: arma is an acceptable alias for this command.
525
526If no indepvars list is given, estimates a univariate ARIMA (Autoregressive,
527Integrated, Moving Average) model. The values p, d and q represent the
528autoregressive (AR) order, the differencing order, and the moving average
529(MA) order respectively. These values may be given in numerical form, or as
530the names of pre-existing scalar variables. A d value of 1, for instance,
531means that the first difference of the dependent variable should be taken
532before estimating the ARMA parameters.
533
534If you wish to include only specific AR or MA lags in the model (as opposed
535to all lags up to a given order) you can substitute for p and/or q either
536(a) the name of a pre-defined matrix containing a set of integer values or
537(b) an expression such as {1,4}; that is, a set of lags separated by commas
538and enclosed in braces.
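
The two forms can be sketched as follows, using a simulated AR(1) series;
the data, the seed and the matrix name arlags are assumptions made for the
illustration.

	nulldata 240
	setobs 12 1990:01 --time-series
	set seed 777
	series y = 0
	series y = 0.5 * y(-1) + normal()
	# (b) lags given in braces: AR terms at lags 1 and 4 only
	arima {1,4} 0 0 ; y
	# (a) the same specification via a pre-defined matrix
	matrix arlags = {1, 4}
	arima arlags 0 0 ; y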
539
540The optional integer values P, D and Q represent the seasonal AR order, the
541order for seasonal differencing, and the seasonal MA order, respectively.
542These are applicable only if the data have a frequency greater than 1 (for
543example, quarterly or monthly data). These orders may be given in numerical
544form or as scalar variables.
545
546In the univariate case the default is to include an intercept in the model
547but this can be suppressed with the --nc flag. If indepvars are added, the
548model becomes ARMAX; in this case the constant should be included explicitly
549if you want an intercept (as in the second example above).
550
551An alternative form of syntax is available for this command: if you do not
552want to apply differencing (either seasonal or non-seasonal), you may omit
553the d and D fields altogether, rather than explicitly entering 0. In
554addition, arma is a synonym or alias for arima. Thus for example the
555following command is a valid way to specify an ARMA(2, 1) model:
556
557	arma 2 1 ; y
558
559The default is to use the "native" gretl ARMA functionality, with estimation
560by exact ML; estimation via conditional ML is available as an option. (If
561X-12-ARIMA is installed you have the option of using it instead of native
562code. Note that the newer X13 works as a drop-in replacement in exactly the
563same way.) For details regarding these options, please see chapter 31 of the
564Gretl User's Guide.
565
When native exact ML code is used, estimated standard errors are by default
based on a numerical approximation to the (negative inverse of the) Hessian,
with a fallback to the outer product of the gradient (OPG) if calculation of
the numerical Hessian should fail. Two (mutually exclusive) option flags can
be used to force the issue: the --opg option forces use of the OPG method,
with no attempt to compute the Hessian, while the --hessian flag disables
the fallback to OPG. Note that failure of the numerical Hessian computation
is generally an indicator of a misspecified model.
574
575The option --lbfgs is specific to estimation using native ARMA code and
576exact ML: it calls for use of the "limited memory" L-BFGS-B algorithm in
577place of the regular BFGS maximizer. This may help in some instances where
578convergence is difficult to achieve.
579
580The option --y-diff-only is specific to estimation of ARIMAX models (models
581with a non-zero order of integration and including exogenous regressors),
582and applies only when gretl's native exact ML is used. For such models the
583default behavior is to difference both the dependent variable and the
584regressors, but when this option is specified only the dependent variable is
585differenced, the regressors remaining in level form.
586
587The AIC value given in connection with ARIMA models is calculated according
588to the definition used in X-12-ARIMA, namely
589
590  AIC = -2L + 2k
591
592where L is the log-likelihood and k is the total number of parameters
593estimated. Note that X-12-ARIMA does not produce information criteria such
594as AIC when estimation is by conditional ML.
595
596The AR and MA roots shown in connection with ARMA estimation are based on
597the following representation of an ARMA(p, q) process:
598
	(1 - a_1*L - a_2*L^2 - ... - a_p*L^p) Y_t =
	    c + (1 + b_1*L + b_2*L^2 + ... + b_q*L^q) e_t

The AR roots are therefore the solutions to

	1 - a_1*z - a_2*z^2 - ... - a_p*z^p = 0
605
606and stability requires that these roots lie outside the unit circle.
607
608The "frequency" figure printed in connection with AR and MA roots is the
609lambda value that solves z = r * exp(i*2*pi*lambda) where z is the root in
610question and r is its modulus.
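
As a worked illustration (the coefficient values are invented for the
example, not taken from any estimation output), consider an AR(2) process
with a_1 = 0.5 and a_2 = -0.25. The AR roots solve

	1 - 0.5*z + 0.25*z^2 = 0

Multiplying through by 4 gives z^2 - 2*z + 4 = 0, so that

	z = 1 +/- i*sqrt(3)

Each root has modulus r = sqrt(1 + 3) = 2, which lies outside the unit
circle, so the process is stable. For the root with positive imaginary part
the argument is atan(sqrt(3)) = pi/3, hence

	lambda = (pi/3) / (2*pi) = 1/6, or about 0.1667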
611
612Menu path:    /Model/Univariate time series/ARIMA
613
614# arma Estimation
615
616See "arima"; arma is an alias.
617
618# bds Tests
619
620Arguments:  order x
621Options:    --corr1=rho (see below)
622            --sdcrit=multiple (see below)
623            --boot=N (see below)
624            --matrix=m (use matrix input)
625            --quiet (suppress printing of results)
626Examples:   bds 5 x
627            bds 3 --matrix=m
628            bds 4 --sdcrit=2.0
629
630Performs the BDS (Brock, Dechert, Scheinkman and LeBaron, 1996) test for
631nonlinearity of the series x. In an econometric context this is typically
632used to test a regression residual for violation of the IID condition. The
633test is based on a set of correlation integrals, designed to detect
634nonlinearity of progressively higher dimensionality, and the order argument
635sets the number of such integrals. This must be at least 2; the first
636integral establishes a baseline but does not support a test. The BDS test is
637of the portmanteau type: able to detect all manner of departures from
638linearity but not informative about how exactly the condition was violated.
639
640Instead of giving x as a series, the --matrix option can be used to specify
641a matrix as input. The matrix must be a vector (column or row).
642
643Criterion for closeness
644
645The correlation integrals are based on a measure of "closeness" of data
646points, where two points are considered close if they lie within ε of each
647other. The test requires a specification of ε. By default gretl follows the
648recommendation of Kanzler (1999): ε is chosen such that the first-order
649correlation integral is around 0.7. A common alternative (requiring less
650computation) is to specify ε as a multiple of the standard deviation of the
651target series. The --sdcrit option supports the latter method; in the third
652example above ε is set to twice the standard deviation of x. The --corr1
653option implies use of Kanzler's method but allows for a target correlation
654other than 0.7. It should be clear that these two options are mutually
655exclusive.
656
657Bootstrapping
658
BDS test statistics are asymptotically distributed as N(0,1) but the test
over-rejects quite markedly in small to moderate-sized samples. For that
reason P-values are by default obtained via bootstrapping when x is of
length less than 600 (but by reference to the normal distribution
otherwise). If you want to use the bootstrap for larger samples you can
force the issue by giving a non-zero value for the --boot option.
Conversely, if you don't want bootstrapping for smaller samples, give a zero
value for --boot.
667
668When bootstrapping is performed the default number of iterations is 1999,
669but you can specify a different number by giving a value greater than 1 with
670--boot.
671
672Accessor matrix
673
674On successful completion of this command, "$result" retrieves the test
675results in the form of a matrix with two rows and order - 1 columns. The
676first row contains test statistics and the second P-values for each of the
677per-dimension tests under the null that x is linear/IID.
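
A small sketch of retrieving the results, using simulated IID data (the
series, seed and matrix name R are assumptions), with bootstrapping switched
off to keep the run short:

	nulldata 500
	setobs 1 1 --time-series
	set seed 2024
	series x = normal()
	bds 4 x --boot=0 --quiet
	# with order 4, R should have 2 rows and 3 columns
	matrix R = $result
	printf "rows = %d, columns = %d\n", rows(R), cols(R)
	print R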
678
679# biprobit Estimation
680
681Arguments:  depvar1 depvar2 indepvars1 [ ; indepvars2 ]
682Options:    --vcv (print covariance matrix)
683            --robust (robust standard errors)
684            --cluster=clustvar (see "logit" for explanation)
685            --opg (see below)
686            --save-xbeta (see below)
687            --verbose (print extra information)
688Examples:   biprobit y1 y2 0 x1 x2
689            biprobit y1 y2 0 x11 x12 ; 0 x21 x22
690            See also biprobit.inp
691
692Estimates a bivariate probit model, using the Newton-Raphson method to
693maximize the likelihood.
694
695The argument list starts with the two (binary) dependent variables, followed
696by a list of regressors. If a second list is given, separated by a
697semicolon, this is interpreted as a set of regressors specific to the second
698equation, with indepvars1 being specific to the first equation; otherwise
699indepvars1 is taken to represent a common set of regressors.
700
701By default, standard errors are computed using the analytical Hessian at
702convergence. But if the --opg option is given the covariance matrix is based
703on the Outer Product of the Gradient (OPG), or if the --robust option is
704given QML standard errors are calculated, using a "sandwich" of the inverse
705of the Hessian and the OPG.
706
Note that the estimate of rho, the correlation of the error terms across the
two equations, is included in the coefficient vector; it's the last element
in the accessors $coeff, $stderr and $vcv.
710
711After successful estimation, the accessor $uhat retrieves a matrix with two
712columns holding the generalized residuals for the two equations; that is,
713the expected values of the disturbances conditional on the observed outcomes
714and covariates. By default $yhat retrieves a matrix with four columns,
715holding the estimated probabilities of the four possible joint outcomes for
716(y_1, y_2), in the order (1,1), (1,0), (0,1), (0,0). Alternatively, if the
717option --save-xbeta is given, $yhat has two columns and holds the values of
718the index functions for the respective equations.
719
720The output includes a test of the null hypothesis that the disturbances in
721the two equations are uncorrelated. This is a likelihood ratio test unless
722the QML variance estimator is requested, in which case it's a Wald test.
723
724# bkw Tests
725
726Option:     --quiet (don't print anything)
727Examples:   longley.inp
728
729Must follow the estimation of a model which includes at least two
730independent variables. Calculates and displays diagnostic information
731pertaining to collinearity, namely the BKW Table, based on the work of
732Belsley, Kuh and Welsch (1980). This table presents a sophisticated analysis
733of the degree and sources of collinearity, via eigenanalysis of the inverse
734correlation matrix. For a thorough account of the BKW approach with
735reference to gretl, and with several examples, see Adkins, Waters and Hill
736(2015).
737
738Following this command the "$result" accessor may be used to retrieve the
739BKW table as a matrix. See also the "vif" command for a simpler approach to
740diagnosing collinearity.

There is also a function named "bkw" which offers greater flexibility.

Menu path:    Model window, /Analysis/Collinearity

# boxplot Graphs

Argument:   varlist
Options:    --notches (show 90 percent interval for median)
            --factorized (see below)
            --panel (see below)
            --matrix=name (plot columns of named matrix)
            --output=filename (send output to specified file)

These plots display the distribution of a variable. The central box encloses
the middle 50 percent of the data, i.e. it is bounded by the first and third
quartiles. The "whiskers" extend from each end of the box for a range equal
to 1.5 times the interquartile range. Observations outside that range are
considered outliers and represented via dots. A line is drawn across the box
at the median. A "+" sign is used to indicate the mean. If the option of
showing a confidence interval for the median is selected, this is computed
via the bootstrap method and shown in the form of dashed horizontal lines
above and/or below the median.

The --factorized option allows you to examine the distribution of a chosen
variable conditional on the value of some discrete factor. For example, if a
data set contains wages and a gender dummy variable you can select the wage
variable as the target and gender as the factor, to see side-by-side
boxplots of male and female wages, as in

	boxplot wage gender --factorized

Note that in this case you must specify exactly two variables, with the
factor given second.

If the current data set is a panel, and just one variable is specified, the
--panel option produces a series of side-by-side boxplots, one for each
panel "unit" or group.

Generally, the argument varlist is required, and refers to one or more
series in the current dataset (given either by name or ID number). But if a
named matrix is supplied via the --matrix option this argument becomes
optional: by default a plot is drawn for each column of the specified
matrix.

Gretl's boxplots are generated using gnuplot, and it is possible to specify
the plot more fully by appending additional gnuplot commands, enclosed in
braces. For details, please see the help for the "gnuplot" command.

In interactive mode the result is displayed immediately. In batch mode the
default behavior is that a gnuplot command file is written in the user's
working directory, with a name on the pattern gpttmpN.plt, starting with N =
01. The actual plots may be generated later using gnuplot (under MS Windows,
wgnuplot). This behavior can be modified by use of the --output=filename
option. For details, please see the "gnuplot" command.
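
The following sketch combines the --factorized, --matrix and --output
options. The simulated series, the matrix M and the output file names are
hypothetical, chosen only so that the script is self-contained.

```hansl
nulldata 200
set seed 4321
series wage = exp(2 + 0.5 * normal())
series gender = uniform() > 0.5
discrete gender
# side-by-side boxplots of wage for each value of the factor
boxplot wage gender --factorized --output=wage_by_gender.pdf
# one boxplot per column of a matrix, with notches for the median
matrix M = mnormal(100, 3)
boxplot --matrix=M --notches --output=columns.pdf
```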

Menu path:    /View/Graph specified vars/Boxplots

# break Programming

Break out of a loop. This command can be used only within a loop; it causes
command execution to break out of the current (innermost) loop. See also
"loop".

# catch Programming

Syntax:     catch command

This is not a command in its own right but can be used as a prefix to most
regular commands: the effect is to prevent termination of a script if an
error occurs in executing the command. If an error does occur, this is
registered in an internal error code which can be accessed as $error (a zero
value indicates success). The value of $error should always be checked
immediately after using catch, and appropriate action taken if the command
failed.
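
A minimal sketch of this pattern follows. The file name is hypothetical and
is assumed not to exist, so that the error branch is taken; the error code
is copied into a scalar right away so that later commands cannot disturb it.

```hansl
# attempt to open a file that is assumed not to exist
catch open no_such_file.gdt
scalar err = $error
if err
    printf "open failed: error code %d\n", err
else
    print "dataset opened"
endif
```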

The catch keyword cannot be used before if, elif or endif. In addition it
should not be used on calls to user-defined functions; it is intended for
use only with gretl commands and calls to "built-in" functions or operators.
Furthermore, catch cannot be used in conjunction with "back-arrow"
assignment of models or plots to session icons (see chapter 3 of the Gretl
User's Guide).

# chow Tests

Variants:   chow obs
            chow dummyvar --dummy
Options:    --dummy (use a pre-existing dummy variable)
            --quiet (don't print estimates for augmented model)
            --limit-to=list (limit test to subset of regressors)
Examples:   chow 25
            chow 1988:1
            chow female --dummy

Must follow an OLS regression. If an observation number or date is given,
provides a test for the null hypothesis of no structural break at the given
split point. The procedure is to create a dummy variable which equals 1 from
the split point specified by obs to the end of the sample, 0 otherwise, and
also interaction terms between this dummy and the original regressors. If a
dummy variable is given, tests the null hypothesis of structural homogeneity
with respect to that dummy. Again, interaction terms are added. In either
case an augmented regression is run including the additional terms.

By default an F statistic is calculated, taking the augmented regression as
the unrestricted model and the original as the restricted. But if the
original model used a robust estimator for the covariance matrix, the test
statistic is a Wald chi-square value based on a robust estimator of the
covariance matrix for the augmented regression.

The --limit-to option can be used to limit the set of interactions with the
split dummy variable to a subset of the original regressors. The parameter
for this option must be a named list, all of whose members are among the
original regressors. The list should not include the constant.
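
As an illustration, the script below runs the test at a known break point
and then restricts the interactions to a single regressor. The simulated
data, the break at observation 61 and the list name L are assumptions made
for the example.

```hansl
nulldata 100
set seed 99
series x1 = normal()
series x2 = normal()
# the slope on x1 shifts from observation 61 onward
series y = 1 + 2*x1 + x2 + 1.5*x1*(index > 60) + normal()
ols y const x1 x2 --quiet
# split dummy interacted with all the regressors
chow 61
# split dummy interacted with x1 only
list L = x1
chow 61 --limit-to=L --quiet
```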

Menu path:    Model window, /Tests/Chow test

# clear Programming

Options:    --dataset (clear dataset only)
            --functions (clear functions (only))

By default this command clears the current dataset (if any) plus all saved
variables (scalars, matrices, etc.) out of memory. Note that opening a new
dataset, or using the "nulldata" command to create an empty dataset, also
has this effect, so explicit use of "clear" is not usually necessary.

If the --dataset option is given, then only the dataset is cleared (plus any
named lists of series); other saved objects such as matrices, scalars and
bundles are preserved.

If the --functions option is given, then any user-defined functions, and any
functions defined by packages that have been loaded, are cleared out of
memory. The dataset and other variables are not affected.

# coeffsum Tests

Argument:   varlist
Option:     --quiet (don't print anything)
Examples:   coeffsum xt xt_1 xr_2
            See also restrict.inp

Must follow a regression. Calculates the sum of the coefficients on the
variables in varlist. Prints this sum along with its standard error and the
p-value for the null hypothesis that the sum is zero.

Note the difference between this and "omit", which tests the null hypothesis
that the coefficients on a specified subset of independent variables are all
equal to zero.

The --quiet option may be useful if one just wants access to the "$test" and
"$pvalue" values that are recorded on successful completion.
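
A short sketch of the quiet usage, with simulated data assumed for
illustration:

```hansl
nulldata 100
set seed 5
series x1 = normal()
series x2 = normal()
series y = 1 + 0.6*x1 + 0.4*x2 + normal()
ols y const x1 x2 --quiet
# test the null that the coefficients on x1 and x2 sum to zero
coeffsum x1 x2 --quiet
printf "test statistic = %g, p-value = %g\n", $test, $pvalue
```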

Menu path:    Model window, /Tests/Sum of coefficients

# coint Tests

Arguments:  order depvar indepvars
Options:    --nc (do not include a constant)
            --ct (include constant and trend)
            --ctt (include constant and quadratic trend)
            --seasonals (include seasonal dummy variables)
            --skip-df (no DF tests on individual variables)
            --test-down[=criterion] (automatic lag order)
            --verbose (print extra details of regressions)
            --silent (don't print anything)
Examples:   coint 4 y x1 x2
            coint 0 y x1 x2 --ct --skip-df

The Engle-Granger (1987) cointegration test. The default procedure is: (1)
carry out Dickey-Fuller tests on the null hypothesis that each of the
variables listed has a unit root; (2) estimate the cointegrating regression;
and (3) run a DF test on the residuals from the cointegrating regression. If
the --skip-df flag is given, step (1) is omitted.

If the specified lag order is positive all the Dickey-Fuller tests use that
order, with this qualification: if the --test-down option is given, the
given value is taken as the maximum and the actual lag order used in each
case is obtained by testing down. See the "adf" command for details of this
procedure.

By default, the cointegrating regression contains a constant. If you wish to
suppress the constant, add the --nc flag. If you wish to augment the list of
deterministic terms in the cointegrating regression with a linear or
quadratic trend, add the --ct or --ctt flag. These option flags are mutually
exclusive. You also have the option of adding seasonal dummy variables (in
the case of quarterly or monthly data).
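
The following sketch shows the lag-order and deterministic-term options at
work. It assumes a simulated pair of series in which y is cointegrated with
the random walk x by construction.

```hansl
nulldata 200
setobs 1 1 --time-series
set seed 7
series x = cum(normal())
series y = 2 + 0.5*x + normal()
# maximum lag 4, testing down; skip the initial DF tests
coint 4 y x --test-down --skip-df
# fixed lag order 2, constant and trend in the cointegrating regression
coint 2 y x --ct
```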

P-values for this test are based on MacKinnon (1996). The relevant code is
included by kind permission of the author.

For the cointegration tests due to Søren Johansen, see "johansen".

Menu path:    /Model/Multivariate time series

# corr Statistics

Variants:   corr [ varlist ]
            corr --matrix=matname
Options:    --uniform (ensure uniform sample)
            --spearman (Spearman's rho)
            --kendall (Kendall's tau)
            --verbose (print rankings)
            --plot=mode-or-filename (see below)
            --triangle (only plot lower half, see below)
Examples:   corr y x1 x2 x3
            corr ylist --uniform
            corr x y --spearman
            corr --matrix=X --plot=display

By default, prints the pairwise correlation coefficients (Pearson's
product-moment correlation) for the variables in varlist, or for all
variables in the data set if varlist is not given. The standard behavior is
to use all available observations for computing each pairwise coefficient,
but if the --uniform option is given the sample is limited (if necessary) so
that the same set of observations is used for all the coefficients. This
option has an effect only if there are differing numbers of missing values
for the variables used.

The (mutually exclusive) options --spearman and --kendall produce,
respectively, Spearman's rank correlation rho and Kendall's rank correlation
tau in place of the default Pearson coefficient. When either of these
options is given, varlist should contain just two variables.

When a rank correlation is computed, the --verbose option can be used to
print the original and ranked data (otherwise this option is ignored).

If varlist contains more than two series and the program is not in batch
mode, a "heatmap" plot of the correlation matrix is shown. This can be
adjusted via the --plot option. The acceptable parameters to this option are
none (to suppress the plot); display (to display a plot even when in batch
mode); or a file name. The effect of providing a file name is as described
for the --output option of the "gnuplot" command. When plotting is active
the option --triangle can be used to show only the lower triangle of the
matrix plot.

If the alternative form is given, using a named matrix rather than a list of
series, the --spearman and --kendall options are not available -- but see
the "npcorr" function.

The "$result" accessor can be used to obtain the correlations as a matrix.
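
For instance, under the assumption of three simulated series x, y and z, the
correlation matrix can be captured as follows, with the plot suppressed:

```hansl
nulldata 60
set seed 11
series x = normal()
series y = 0.5*x + normal()
series z = normal()
# Pearson correlations, no heatmap
corr x y z --plot=none
matrix C = $result
print C
# rank correlation requires exactly two variables
corr x y --spearman
```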

Menu path:    /View/Correlation matrix
Other access: Main window pop-up menu (multiple selection)

# corrgm Statistics

Arguments:  series [ order ]
Options:    --bartlett (use Bartlett standard errors)
            --plot=mode-or-filename (see below)
            --quiet (suppress the plot)
Example:    corrgm x 12

Prints the values of the autocorrelation function (ACF) for series, which
may be specified by name or number. The values are defined as rho(u_t,
u_t-s) where u_t is the t^th observation of the variable u and s denotes the
number of lags.

The partial autocorrelations (PACF, calculated using the Durbin-Levinson
algorithm) are also shown: these are net of the effects of intervening lags.
In addition the Ljung-Box Q statistic is printed. This may be used to test
the null hypothesis that the series is "white noise"; it is asymptotically
distributed as chi-square with degrees of freedom equal to the number of
lags used.

Asterisks are used to indicate statistical significance of the individual
autocorrelations. By default this is assessed using a standard error of one
over the square root of the sample size, but if the --bartlett option is
given then Bartlett standard errors are used for the ACF. This option also
governs the confidence band drawn in the ACF plot, if applicable.

If an order value is specified the length of the correlogram is limited to
at most that number of lags, otherwise the length is determined
automatically, as a function of the frequency of the data and the number of
observations.

By default, a plot of the correlogram is produced: a gnuplot graph in
interactive mode or an ASCII graphic in batch mode. This can be adjusted via
the --plot option. The acceptable parameters to this option are none (to
suppress the plot); ascii (to produce a text graphic even when in
interactive mode); display (to produce a gnuplot graph even when in batch
mode); or a file name. The effect of providing a file name is as described
for the --output option of the "gnuplot" command.

Upon successful completion, the accessors "$test" and "$pvalue" contain the
corresponding figures of the Ljung-Box test for the maximum order displayed.
Note that if you just want to compute the Q statistic, you'll probably want
to use the "ljungbox" function instead.
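
A brief sketch of both routes to the Q statistic, assuming a simulated
random walk as the series of interest:

```hansl
nulldata 120
setobs 12 2010:01 --time-series
set seed 21
series x = cum(normal())
# correlogram up to lag 12, Bartlett standard errors, no plot
corrgm x 12 --bartlett --plot=none
printf "Ljung-Box Q = %g, p-value = %g\n", $test, $pvalue
# the Q statistic alone, via the function
scalar Q = ljungbox(x, 12)
printf "Q from ljungbox() = %g\n", Q
```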

Menu path:    /Variable/Correlogram
Other access: Main window pop-up menu (single selection)

# cusum Tests

Options:    --squares (perform the CUSUMSQ test)
            --quiet (just print the Harvey-Collier test)
            --plot=mode-or-filename (see below)

Must follow the estimation of a model via OLS. Performs the CUSUM test -- or
if the --squares option is given, the CUSUMSQ test -- for parameter
stability. A series of one-step ahead forecast errors is obtained by running
a series of regressions: the first regression uses the first k observations
and is used to generate a prediction of the dependent variable at
observation k + 1; the second uses the first k + 1 observations and
generates a prediction for observation k + 2, and so on (where k is the
number of parameters in the original model).

The cumulated sum of the scaled forecast errors, or the squares of these
errors, is printed. The null hypothesis of parameter stability is rejected
at the 5 percent significance level if the cumulated sum strays outside of
the 95 percent confidence band.

In the case of the CUSUM test, the Harvey-Collier t-statistic for testing
the null hypothesis of parameter stability is also printed. See Greene's
Econometric Analysis for details. For the CUSUMSQ test, the 95 percent
confidence band is calculated using the algorithm given in Edgerton and
Wells (1994).

By default, if the program is not in batch mode a plot of the cumulated
series and confidence band is shown. This can be adjusted via the --plot
option. The acceptable parameters to this option are none (to suppress the
plot); display (to display a plot even when in batch mode); or a file name.
The effect of providing a file name is as described for the --output option
of the "gnuplot" command.
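
As a minimal sketch, assuming a simulated model with stable parameters, both
variants of the test can be run without plots as follows:

```hansl
nulldata 100
setobs 1 1 --time-series
set seed 31
series x = normal()
series y = 1 + 2*x + normal()
ols y const x --quiet
# CUSUM test, including the Harvey-Collier statistic
cusum --plot=none
# CUSUMSQ test
cusum --squares --plot=none
```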

Menu path:    Model window, /Tests/CUSUM(SQ)

# data Dataset

Argument:   varlist
Options:    --compact=method (specify compaction method)
            --quiet (don't report results except on error)
            --name=identifier (rename imported series)
            --odbc (import from ODBC database)
            --no-align (ODBC-specific, see below)

Reads the variables in varlist from a database file (native gretl, RATS 4.0
or PcGive), which must have been opened previously using the "open" command.
The data command can also be used to import series from DB.NOMICS or from an
ODBC database; for details on those variants see gretl + DB.NOMICS or
chapter 42 of the Gretl User's Guide, respectively.

The data frequency and sample range may be established via the "setobs" and
"smpl" commands prior to using this command. Here's an example:

	open fedstl.bin
	setobs 12 2000:01
	smpl ; 2019:12
	data unrate cpiaucsl

The commands above open the database named fedstl.bin (which is supplied
with gretl), establish a monthly dataset starting in January 2000 and ending
in December of 2019, and then import the series named unrate (unemployment
rate) and cpiaucsl (all-items CPI).

If setobs and smpl are not specified in this way, the data frequency and
sample range are set using the first variable read from the database.

If the series to be read are of higher frequency than the working dataset,
you may specify a compaction method as below:

	data LHUR PUNEW --compact=average

The five available compaction methods are "average" (takes the mean of the
high frequency observations), "last" (uses the last observation), "first",
"sum" and "spread". If no method is specified, the default is to use the
average. The "spread" method is special: no information is lost, rather it
is spread across multiple series, one per sub-period. So for example when
adding a monthly series to a quarterly dataset three series are created, one
for each month of the quarter; their names bear the suffixes m01, m02 and
m03.

If the series to be read are of lower frequency than the working dataset the
values of the added data are simply repeated as required, but note that the
"tdisagg" function can then be used for distribution or interpolation
("temporal disaggregation").

In the case of native gretl databases (only), the "glob" characters * and ?
can be used in varlist to import series that match the given pattern. For
example, the following will import all series in the database whose names
begin with cpi:

	data cpi*

The --name option can be used to set a name for the imported series other
than the original name in the database. The parameter must be a valid gretl
identifier. This option is restricted to the case where a single series is
specified for importation.
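
A sketch of the --name option, reusing the fedstl.bin database and the
unrate series from the example above; the new name u_rate is hypothetical.

```hansl
open fedstl.bin
setobs 12 2000:01
smpl ; 2019:12
# import a single series under a different name
data unrate --name=u_rate
```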

The --no-align option applies only to importation of series via ODBC. By
default we require that the ODBC query returns information telling gretl on
which rows of the dataset to place the incoming data -- or at least that the
number of incoming values matches either the length of the dataset or the
length of the current sample range. Setting the --no-align option relaxes
this requirement: failing the conditions just mentioned, incoming values are
simply placed consecutively starting at the first row of the dataset. If
there are fewer such values than rows in the dataset the trailing rows are
filled with NAs; if there are more such values than rows the extra values
are discarded. For more on ODBC importation see chapter 42 of the Gretl
User's Guide.

Menu path:    /File/Databases

# dataset Dataset

Arguments:  keyword parameters
Option:     --panel-time (see addobs below)
Examples:   dataset addobs 24
            dataset addobs 2 --panel-time
            dataset insobs 10
            dataset compact 1
            dataset compact 4 last
            dataset expand
            dataset transpose
            dataset sortby x1
            dataset resample 500
            dataset renumber x 4
            dataset pad-daily 7
            dataset clear

Performs various operations on the data set as a whole, depending on the
given keyword, which must be addobs, insobs, clear, compact, expand,
transpose, sortby, dsortby, resample, renumber or pad-daily. Note: with the
exception of clear, these actions are not available when the dataset is
currently subsampled by selection of cases on some Boolean criterion.

addobs: Must be followed by a positive integer, call it n. Adds n extra
observations to the end of the working dataset. This is primarily intended
for forecasting purposes. The values of most variables over the additional
range will be set to missing, but certain deterministic variables are
recognized and extended, namely, a simple linear trend and periodic dummy
variables. If the dataset takes the form of a panel, the --panel-time flag
can be used to lengthen the time series for each cross-sectional unit (the
default action being to add n such units).

insobs: Must be followed by a positive integer no greater than the current
number of observations. Inserts a single observation at the specified
position. All subsequent data are shifted by one place and the dataset is
extended by one observation. All variables apart from the constant are given
missing values at the new observation. This action is not available for
panel datasets.

clear: No parameter required. Clears out the current data, returning gretl
to its initial "empty" state.

compact: Must be followed by a positive integer representing a new data
frequency, which should be lower than the current frequency (for example, a
value of 4 when the current frequency is 12 indicates compaction from
monthly to quarterly). This command is available for time series data only;
it compacts all the series in the data set to the new frequency. A second
parameter may be given, namely one of sum, first, last or spread, to
specify, respectively, compaction using the sum of the higher-frequency
values, start-of-period values, end-of-period values, or spreading of the
higher-frequency values across multiple series (one per sub-period). The
default is to compact by averaging.

expand: This command is only available for annual or quarterly time series
data: annual data can be expanded to quarterly or monthly, and quarterly
data to monthly. All series in the data set are padded out to the new
frequency by repeating the existing values. If the original dataset is
annual the default expansion is to quarterly but expand can be followed by
12 to request monthly.

transpose: No additional parameter required. Transposes the current data
set. That is, each observation (row) in the current data set will be treated
as a variable (column), and each variable as an observation. This command
may be useful if data have been read from some external source in which the
rows of the data table represent variables.

sortby: The name of a single series or list is required. If one series is
given, the observations on all variables in the dataset are re-ordered by
increasing value of the specified series. If a list is given, the sort
proceeds hierarchically: if the observations are tied in sort order with
respect to the first key variable then the second key is used to break the
tie, and so on until the tie is broken or the keys are exhausted. Note that
this command is available only for undated data.

dsortby: Works as sortby except that the re-ordering is by decreasing value
of the key series.

resample: Constructs a new dataset by random sampling, with replacement, of
the rows of the current dataset. One argument is required, namely the number
of rows to include. This may be less than, equal to, or greater than the
number of observations in the original data. The original dataset can be
retrieved via the command smpl full.

renumber: Requires the name of an existing series followed by an integer
between 1 and the number of series in the dataset minus one. Moves the
specified series to the specified position in the dataset, renumbering the
other series accordingly. (Position 0 is occupied by the constant, which
cannot be moved.)

pad-daily: Valid only if the current dataset contains dated daily data with
an incomplete calendar. The effect is to pad the data out to a complete
calendar by inserting blank rows (that is, rows containing nothing but NAs).
This option requires an integer parameter, namely the number of days per
week, which must be 5, 6 or 7, and must be greater than or equal to the
current data frequency. On successful completion, the data calendar will be
"complete" relative to this value. For example if days-per-week is 5 then
all weekdays will be represented, whether or not any data are available for
those days.
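
A few of these actions can be sketched as follows. The small simulated
datasets are assumptions made for illustration only.

```hansl
# quarterly time series: extend, then compact to annual
nulldata 24
setobs 4 2015:1 --time-series
set seed 41
series x = normal()
genr time
# add 4 observations: x is missing there, the trend is extended
dataset addobs 4
print x time --byobs
# compact to annual frequency using end-of-period values
dataset compact 1 last

# sorting is available only for undated data
nulldata 10
series u = uniform()
dataset sortby u
print u --byobs
```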

Menu path:    /Data

# delete Dataset

Variants:   delete varlist
            delete varname
            delete --type=type-name
            delete pkgname
Options:    --db (delete series from database)
            --force (see below)

This command is an all-purpose destructor. It should be used with caution;
no confirmation is asked.

In the first form above, varlist is a list of series, given by name or ID
number. Note that when you delete series any series with higher ID numbers
than those on the deletion list will be re-numbered. If the --db option is
given, this command deletes the listed series not from the current dataset
but from a gretl database, assuming that a database has been opened, and the
user has write permission for the file in question. See also the "open"
command.

In the second form, the name of a scalar, matrix, string or bundle may be
given for deletion. The --db option is not applicable in this case. Note
that series and variables of other types should not be mixed in a given call
to delete.

In the third form, the --type option must be accompanied by one of the
following type-names: matrix, bundle, string, list, scalar or array. The
effect is to delete all variables of the given type. In this case no
argument other than the option should be given.

The fourth form can be used to unload a function package. In this case the
.gfn suffix must be supplied, as in

	delete somepkg.gfn

Note that this does not delete the package file, it just unloads the package
from memory.

Deleting variables in a loop

In general it is not permitted to delete variables in the context of a loop,
since this may threaten the integrity of the loop code. However, if you are
confident that deleting a certain variable is safe you can override this
prohibition by appending the --force flag to the delete command.
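
The first three forms, and forced deletion within a loop, can be sketched as
follows; all variable names here are hypothetical.

```hansl
nulldata 10
series x = normal()
series y = normal()
scalar s = 1
matrix A = I(2)
matrix B = ones(2, 2)
# first form: a list of series
delete x y
# second form: a single variable that is not a series
delete s
# third form: removes all matrices, here both A and B
delete --type=matrix
# inside a loop, deletion must be forced
loop i=1..3
    series tmp = normal()
    delete tmp --force
endloop
```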

Menu path:    Main window pop-up (single selection)

# diff Transformations

Argument:   varlist
Examples:   penngrow.inp, sw_ch12.inp, sw_ch14.inp

The first difference of each variable in varlist is obtained and the result
stored in a new variable with the prefix d_. Thus "diff x y" creates the new
variables

	d_x = x(t) - x(t-1)
	d_y = y(t) - y(t-1)
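
A small worked case, assuming a six-observation series of squares:

```hansl
nulldata 6
setobs 1 1 --time-series
# x takes the values 1, 4, 9, 16, 25, 36
series x = index^2
diff x
# d_x is missing at the first observation, then 3, 5, 7, 9, 11
print x d_x --byobs
```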

Menu path:    /Add/First differences of selected variables

# difftest Tests

Arguments:  series1 series2
Options:    --sign (Sign test, the default)
            --rank-sum (Wilcoxon rank-sum test)
            --signed-rank (Wilcoxon signed-rank test)
            --verbose (print extra output)
            --quiet (suppress printed output)
Examples:   ooballot.inp

Carries out a nonparametric test for a difference between two populations or
groups, the specific test depending on the option selected.

With the --sign option, the Sign test is performed. This test is based on
the fact that if two samples, x and y, are drawn randomly from the same
distribution, the probability that x_i > y_i, for each observation i, should
equal 0.5. The test statistic is w, the number of observations for which x_i
> y_i. Under the null hypothesis this follows the Binomial distribution with
parameters (n, 0.5), where n is the number of observations.

With the --rank-sum option, the Wilcoxon rank-sum test is performed. This
test proceeds by ranking the observations from both samples jointly, from
smallest to largest, then finding the sum of the ranks of the observations
from one of the samples. The two samples do not have to be of the same size,
and if they differ the smaller sample is used in calculating the rank-sum.
Under the null hypothesis that the samples are drawn from populations with
the same median, the probability distribution of the rank-sum can be
computed for any given sample sizes; and for reasonably large samples a
close Normal approximation exists.

With the --signed-rank option, the Wilcoxon signed-rank test is performed.
This is designed for matched data pairs such as, for example, the values of
a variable for a sample of individuals before and after some treatment. The
test proceeds by finding the differences between the paired observations,
x_i - y_i, ranking these differences by absolute value, then assigning to
each pair a signed rank, the sign agreeing with the sign of the difference.
One then calculates W_+, the sum of the positive signed ranks. As with the
rank-sum test, this statistic has a well-defined distribution under the null
that the median difference is zero, which converges to the Normal for
samples of reasonable size.

For the Wilcoxon tests, if the --verbose option is given then the ranking is
printed. (This option has no effect if the Sign test is selected.)

On successful completion the accessors "$test" and "$pvalue" are available.
If one just wants to obtain these values the --quiet flag can be appended to
the command.
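
As a sketch for matched pairs, assume simulated "before" and "after"
measurements named pre and post, with a built-in shift of 0.5:

```hansl
nulldata 30
set seed 51
series pre = normal()
series post = pre + 0.5 + 0.3*normal()
# signed-rank test, results retrieved via the accessors
difftest pre post --signed-rank --quiet
printf "test = %g, p-value = %g\n", $test, $pvalue
# the default Sign test, with printed output
difftest pre post --sign
```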

# discrete Transformations

Argument:   varlist
Option:     --reverse (mark variables as continuous)
Examples:   ooballot.inp, oprobit.inp

Marks each variable in varlist as being discrete. By default all variables
are treated as continuous; marking a variable as discrete affects the way
the variable is handled in frequency plots, and also allows you to select
the variable for the command "dummify".

If the --reverse flag is given, the operation is reversed; that is, the
variables in varlist are marked as being continuous.
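
A minimal sketch, assuming a simulated series grp that takes the integer
values 1 to 3:

```hansl
nulldata 50
set seed 61
# integer codes 1, 2, 3
series grp = ceil(3 * uniform())
discrete grp
# create dummy variables from the discrete series
dummify grp
# mark grp as continuous again
discrete grp --reverse
```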
1365
1366Menu path:    /Variable/Edit attributes
1367
1368# dpanel Estimation
1369
1370Argument:   p ; depvar indepvars [ ; instruments ]
1371Options:    --quiet (don't show estimated model)
1372            --vcv (print covariance matrix)
1373            --two-step (perform 2-step GMM estimation)
1374            --system (add equations in levels)
1375            --time-dummies (add time dummy variables)
1376            --dpdstyle (emulate DPD package for Ox)
1377            --asymptotic (uncorrected asymptotic standard errors)
1378            --keep-extra (see below)
1379Examples:   dpanel 2 ; y x1 x2
1380            dpanel 2 ; y x1 x2 --system
1381            dpanel {2 3} ; y x1 x2 ; x1
1382            dpanel 1 ; y x1 x2 ; x1 GMM(x2,2,3)
1383            See also bbond98.inp
1384
1385Carries out estimation of dynamic panel data models (that is, panel models
1386including one or more lags of the dependent variable) using either the
1387GMM-DIF or GMM-SYS method.
1388
1389The parameter p represents the order of the autoregression for the dependent
1390variable. In the simplest case this is a scalar value, but a pre-defined
1391matrix may be given for this argument, to specify a set of (possibly
1392non-contiguous) lags to be used.
1393
1394The dependent variable and regressors should be given in levels form; they
1395will be differenced automatically (since this estimator uses differencing to
1396cancel out the individual effects).
1397
1398The last (optional) field in the command is for specifying instruments. If
1399no instruments are given, it is assumed that all the independent variables
1400are strictly exogenous. If you specify any instruments, you should include
1401in the list any strictly exogenous independent variables. For predetermined
1402regressors, you can use the GMM function to include a specified range of
1403lags in block-diagonal fashion. This is illustrated in the third example
1404above. The first argument to GMM is the name of the variable in question,
1405the second is the minimum lag to be used as an instrument, and the third is
1406the maximum lag. The same syntax can be used with the GMMlevel function to
1407specify GMM-type instruments for the equations in levels.
1408
1409By default the results of 1-step estimation are reported (with robust
1410standard errors). You may select 2-step estimation as an option. In both
1411cases tests for autocorrelation of orders 1 and 2 are provided, as well as
1412the Sargan overidentification test and a Wald test for the joint
1413significance of the regressors. Note that in this differenced model
1414first-order autocorrelation is not a threat to the validity of the model,
1415but second-order autocorrelation violates the maintained statistical
1416assumptions.
1417
1418In the case of 2-step estimation, standard errors are by default computed
1419using the finite-sample correction suggested by Windmeijer (2005). The
1420standard asymptotic standard errors associated with the 2-step estimator are
1421generally reckoned to be an unreliable guide to inference, but if for some
1422reason you want to see them you can use the --asymptotic option to turn off
1423the Windmeijer correction.
1424
1425If the --time-dummies option is given, a set of time dummy variables is
1426added to the specified regressors. The number of dummies is one less than
1427the maximum number of periods used in estimation, to avoid perfect
1428collinearity with the constant. The dummies are entered in differenced form
1429unless the --dpdstyle option is given, in which case they are entered in
1430levels.
1431
1432As with other estimation commands, a "$model" bundle is available after
1433estimation. In the case of dpanel, the --keep-extra option can be used to
1434save additional information in this bundle, namely the GMM weight and
1435instrument matrices.
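
The following is a minimal sketch of how these pieces fit together, not a
recommended specification. It assumes the Arellano-Bond sample file
abdata.gdt supplied with gretl and its series n, w and k; the choice of w as
predetermined and k as strictly exogenous is purely illustrative.

	open abdata.gdt
	# one lag of n; k strictly exogenous, w instrumented by its lags 2 to 3
	dpanel 1 ; n w k ; k GMM(w,2,3) --two-step --keep-extra
	# copy the model bundle; with --keep-extra it also holds the GMM
	# weight and instrument matrices
	bundle b = $model
	print b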
1436
1437For further details and examples, please see chapter 24 of the Gretl User's
1438Guide.
1439
1440Menu path:    /Model/Panel/Dynamic panel model
1441
1442# dummify Transformations
1443
1444Argument:   varlist
1445Options:    --drop-first (omit lowest value from encoding)
1446            --drop-last (omit highest value from encoding)
1447
1448For any suitable variables in varlist, creates a set of dummy variables
1449coding for the distinct values of that variable. Suitable variables are
1450those that have been explicitly marked as discrete, or those that take on a
1451fairly small number of values all of which are "fairly round" (multiples of
14520.25).
1453
1454By default a dummy variable is added for each distinct value of the variable
1455in question. For example if a discrete variable x has 5 distinct values, 5
1456dummy variables will be added to the data set, with names Dx_1, Dx_2 and so
1457on. The first dummy variable will have value 1 for observations where x
1458takes on its smallest value, 0 otherwise; the next dummy will have value 1
1459when x takes on its second-smallest value, and so on. If one of the option
1460flags --drop-first or --drop-last is added, then either the lowest or the
1461highest value of each variable is omitted from the encoding (which may be
1462useful for avoiding the "dummy variable trap").
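
As a small illustration, the sketch below builds an artificial series with
three distinct values and encodes it. The dataset and the series name x are
made up for the example.

	nulldata 12
	genr index
	series x = ceil(index/4)    # takes the values 1, 2 and 3
	discrete x                  # mark x explicitly as discrete
	dummify x --drop-first      # omit the dummy for the lowest value
	varlist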
1463
1464This command can also be embedded in the context of a regression
1465specification. For example, the following line specifies a model where y is
1466regressed on the set of dummy variables coding for x. (Option flags cannot
1467be passed to "dummify" in this context.)
1468
1469	ols y dummify(x)
1470
1471Other access: Main window pop-up menu (single selection)
1472
1473# duration Estimation
1474
1475Arguments:  depvar indepvars [ ; censvar ]
1476Options:    --exponential (use exponential distribution)
1477            --loglogistic (use log-logistic distribution)
1478            --lognormal (use log-normal distribution)
1479            --medians (fitted values are medians)
1480            --robust (robust (QML) standard errors)
1481            --cluster=clustvar (see "logit" for explanation)
1482            --vcv (print covariance matrix)
1483            --verbose (print details of iterations)
1484            --quiet (don't print anything)
1485Examples:   duration y 0 x1 x2
1486            duration y 0 x1 x2 ; cens
1487            See also weibull.inp
1488
1489Estimates a duration model: the dependent variable (which must be positive)
1490represents the duration of some state of affairs, for example the length of
1491spells of unemployment for a cross-section of respondents. By default the
1492Weibull distribution is used but the exponential, log-logistic and
1493log-normal distributions are also available.
1494
1495If some of the duration measurements are right-censored (e.g. an
1496individual's spell of unemployment has not come to an end within the period
1497of observation) then you should supply the trailing argument censvar, a
1498series in which non-zero values indicate right-censored cases.
1499
1500By default the fitted values obtained via the accessor $yhat are the
1501conditional means of the durations, but if the --medians option is given
1502then $yhat provides the conditional medians instead.
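
For instance, the following sketch uses artificial, uncensored data (the
series names and parameter values are invented for illustration) to fit a
log-normal duration model and save the conditional medians:

	nulldata 200
	set seed 1234
	series x1 = normal()
	series y = exp(1 + 0.5*x1 + normal())   # strictly positive durations
	duration y 0 x1 --lognormal --medians
	series ymed = $yhat                     # conditional medians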
1503
1504Please see chapter 38 of the Gretl User's Guide for details.
1505
1506Menu path:    /Model/Limited dependent variable/Duration data
1507
1508# elif Programming
1509
1510See "if".
1511
1512# else Programming
1513
1514See "if". Note that "else" requires a line to itself, before the following
1515conditional command. You can append a comment, as in
1516
1517	else # OK, do something different
1518
1519But you cannot append a command, as in
1520
1521	else x = 5 # wrong!
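
A complete conditional block, with the command placed on the line after
"else", might look like this (the scalar x is just an illustrative value):

	scalar x = 3
	if x > 5
	    printf "large\n"
	elif x > 2
	    printf "medium\n"
	else # a trailing comment is allowed here
	    printf "small\n"
	endif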
1522
1523# end Programming
1524
1525Ends a block of commands of some sort. For example, "end system" terminates
1526an equation "system".
1527
1528# endif Programming
1529
1530See "if".
1531
1532# endloop Programming
1533
1534Marks the end of a command loop. See "loop".
1535
1536# eqnprint Printing
1537
Options:    --complete (create a complete document)
1539            --output=filename (send output to specified file)
1540
1541Must follow the estimation of a model. Prints the estimated model in the
form of a LaTeX equation. If a filename is specified using the --output
option, output goes to that file; otherwise it goes to a file with a name of
1544the form equation_N.tex, where N is the number of models estimated to date
1545in the current session. See also "tabprint".
1546
1547The output file will be written in the currently set "workdir", unless the
1548filename string contains a full path specification.
1549
1550If the --complete flag is given, the LaTeX file is a complete document,
1551ready for processing; otherwise it must be included in a document.
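
A short sketch of typical usage follows. It assumes the sample file data4-1
supplied with gretl, with series price and sqft; the output filename is
arbitrary.

	open data4-1
	ols price const sqft
	# write a stand-alone LaTeX document to the current workdir
	eqnprint --output=price_eq.tex --complete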
1552
1553Menu path:    Model window, /LaTeX
1554
1555# equation Estimation
1556
1557Arguments:  depvar indepvars
1558Example:    equation y x1 x2 x3 const
1559
1560Specifies an equation within a system of equations (see "system"). The
1561syntax for specifying an equation within an SUR system is the same as that
1562for, e.g., "ols". For an equation within a Three-Stage Least Squares system
1563you may either (a) give an OLS-type equation specification and provide a
1564common list of instruments using the "instr" keyword (again, see "system"),
1565or (b) use the same equation syntax as for "tsls".
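
As a sketch, a two-equation SUR system could be set up as follows. The
system name and the series y1, y2, x1, x2 and x3 are hypothetical and must
exist in your dataset for this to run.

	system name="Sys1"
	    equation y1 const x1 x2
	    equation y2 const x1 x3
	end system
	estimate Sys1 method=sur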
1566
1567# estimate Estimation
1568
1569Arguments:  [ systemname ] [ estimator ]
1570Options:    --iterate (iterate to convergence)
1571            --no-df-corr (no degrees of freedom correction)
1572            --geomean (see below)
1573            --quiet (don't print results)
1574            --verbose (print details of iterations)
1575Examples:   estimate "Klein Model 1" method=fiml
1576            estimate Sys1 method=sur
1577            estimate Sys1 method=sur --iterate
1578
1579Calls for estimation of a system of equations, which must have been
1580previously defined using the "system" command. The name of the system should
1581be given first, surrounded by double quotes if the name contains spaces. The
1582estimator, which must be one of "ols", "tsls", "sur", "3sls", "fiml" or
1583"liml", is preceded by the string method=. These arguments are optional if
1584the system in question has already been estimated and occupies the place of
1585the "last model"; in that case the estimator defaults to the previously used
1586value.
1587
1588If the system in question has had a set of restrictions applied (see the
1589"restrict" command), estimation will be subject to the specified
1590restrictions.
1591
1592If the estimation method is "sur" or "3sls" and the --iterate flag is given,
1593the estimator will be iterated. In the case of SUR, if the procedure
1594converges the results are maximum likelihood estimates. Iteration of
1595three-stage least squares, however, does not in general converge on the
1596full-information maximum likelihood results. The --iterate flag is ignored
1597for other methods of estimation.
1598
1599If the equation-by-equation estimators "ols" or "tsls" are chosen, the
1600default is to apply a degrees of freedom correction when calculating
1601standard errors. This can be suppressed using the --no-df-corr flag. This
1602flag has no effect with the other estimators; no degrees of freedom
1603correction is applied in any case.
1604
1605By default, the formula used in calculating the elements of the
1606cross-equation covariance matrix is
1607
1608  sigma(i,j) = u(i)' * u(j) / T
1609
1610If the --geomean flag is given, a degrees of freedom correction is applied:
1611the formula is
1612
1613  sigma(i,j) = u(i)' * u(j) / sqrt((T - ki) * (T - kj))
1614
1615where the ks denote the number of independent parameters in each equation.
1616
1617If the --verbose option is given and an iterative method is specified,
1618details of the iterations are printed.
1619
1620# eval Utilities
1621
1622Argument:   expression
1623Examples:   eval x
1624            eval inv(X'X)
1625            eval sqrt($pi)
1626
1627This command makes gretl act like a glorified calculator. The program
1628evaluates expression and prints its value. The argument may be the name of a
1629variable, or something more complicated. In any case, it should be an
1630expression which could stand as the right-hand side of an assignment
1631statement.
1632
1633In interactive use (for instance in the gretl console) an equals sign works
1634as shorthand for eval, as in
1635
1636	=sqrt(x)
1637
1638(with or without a space following "="). But this variant is not accepted in
1639scripting mode since it could easily mask coding errors.
1640
1641In most contexts "print" can be used in place of eval to much the same
1642effect. See also "printf" for the case where you wish to combine textual and
1643numerical output.
1644
1645# fcast Prediction
1646
1647Variants:   fcast [startobs endobs] [vname]
1648            fcast [startobs endobs] steps-ahead [vname] --recursive
1649Options:    --dynamic (create dynamic forecast)
1650            --static (create static forecast)
1651            --out-of-sample (generate post-sample forecast)
1652            --no-stats (don't print forecast statistics)
1653            --stats-only (only print forecast statistics)
1654            --quiet (don't print anything)
1655            --recursive (see below)
1656            --plot=filename (see below)
1657Examples:   fcast 1997:1 2001:4 f1
1658            fcast fit2
1659            fcast 2004:1 2008:3 4 rfcast --recursive
1660            See also gdp_midas.inp
1661
1662Must follow an estimation command. Forecasts are generated for a certain
1663range of observations: if startobs and endobs are given, for that range (if
1664possible); otherwise if the --out-of-sample option is given, for
1665observations following the range over which the model was estimated;
1666otherwise over the currently defined sample range. If an out-of-sample
1667forecast is requested but no relevant observations are available, an error
1668is flagged. Depending on the nature of the model, standard errors may also
1669be generated; see below. Also see below for the special effect of the
1670--recursive option.
1671
1672If the last model estimated is a single equation, then the optional vname
1673argument has the following effect: the forecast values are not printed, but
1674are saved to the dataset under the given name. If the last model is a system
1675of equations, vname has a different effect, namely selecting a particular
1676endogenous variable for forecasting (the default being to produce forecasts
1677for all the endogenous variables). In the system case, or if vname is not
1678given, the forecast values can be retrieved using the accessor "$fcast", and
1679the standard errors, if available, via "$fcse".
1680
1681The choice between a static and a dynamic forecast applies only in the case
1682of dynamic models, with an autoregressive error process or including one or
1683more lagged values of the dependent variable as regressors. Static forecasts
1684are one step ahead, based on realized values from the previous period, while
1685dynamic forecasts employ the chain rule of forecasting. For example, if a
1686forecast for y in 2008 requires as input a value of y for 2007, a static
1687forecast is impossible without actual data for 2007. A dynamic forecast for
16882008 is possible if a prior forecast can be substituted for y in 2007.
1689
1690The default is to give a static forecast for any portion of the forecast
1691range that lies within the sample range over which the model was estimated,
1692and a dynamic forecast (if relevant) out of sample. The --dynamic option
1693requests a dynamic forecast from the earliest possible date, and the
1694--static option requests a static forecast even out of sample.
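
The difference between the two kinds of forecast can be seen with a sketch
such as the following, which uses artificial quarterly data (all names and
values are invented for illustration). The model is estimated on a
subsample and then forecast over the remaining observations.

	nulldata 120
	setobs 4 1990:1 --time-series
	set seed 4321
	series x = normal()
	series y = 1 + 0.5*x + normal()
	smpl 1990:1 2014:4
	ols y const y(-1) x
	smpl --full
	# out of sample: chain-rule forecast using prior forecasts of y
	fcast 2015:1 2019:4 --dynamic
	# out of sample: one-step-ahead forecast using realized values of y
	fcast 2015:1 2019:4 --static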
1695
1696The --recursive option is presently available only for single-equation
1697models estimated via OLS. When this option is given the forecasts are
1698recursive. That is, each forecast is generated from an estimate of the given
1699model using data from a fixed starting point (namely, the start of the
1700sample range for the original estimation) up to the forecast date minus k,
1701where k is the number of steps ahead, which must be given in the steps-ahead
1702argument. The forecasts are always dynamic if this is applicable. Note that
1703the steps-ahead argument should be given only in conjunction with the
1704--recursive option.
1705
1706The --plot option (available only in the case of single-equation estimation)
1707calls for a plot file to be produced, containing a graphical representation
1708of the forecast. The suffix of the filename argument to this option controls
1709the format of the plot: .eps for EPS, .pdf for PDF, .png for PNG, .plt for a
1710gnuplot command file. The dummy filename display can be used to force
1711display of the plot in a window. For example,
1712
1713	fcast --plot=fc.pdf
1714
will generate a graphic in PDF format. Absolute pathnames are respected;
otherwise files are written to the gretl working directory.
1717
1718The nature of the forecast standard errors (if available) depends on the
1719nature of the model and the forecast. For static linear models standard
1720errors are computed using the method outlined by Davidson and MacKinnon
1721(2004); they incorporate both uncertainty due to the error process and
1722parameter uncertainty (summarized in the covariance matrix of the parameter
1723estimates). For dynamic models, forecast standard errors are computed only
1724in the case of a dynamic forecast, and they do not incorporate parameter
1725uncertainty. For nonlinear models, forecast standard errors are not
1726presently available.
1727
1728Menu path:    Model window, /Analysis/Forecasts
1729
1730# flush Programming
1731
1732This simple command (no arguments, no options) is intended for use in
1733time-consuming scripts that may be executed via the gretl GUI (it is ignored
1734by the command-line program), to give the user a visual indication that
1735things are moving along and gretl is not "frozen".
1736
1737Ordinarily if you launch a script in the GUI no output is shown until its
1738execution is completed, but the effect of invoking flush is as follows:
1739
1740  On the first invocation, gretl opens a window, displays the output so far,
1741  and appends the message "Processing...".
1742
1743  On subsequent invocations the text shown in the output window is updated,
1744  and a new "processing" message is appended.
1745
1746When execution of the script is completed any remaining output is
1747automatically flushed to the text window.
1748
1749Please note, there is no point in using flush in scripts that take less than
1750(say) 5 seconds to execute. Also note that this command should not be used
1751at a point in the script where there is no further output to be printed, as
1752the "processing" message will then be misleading to the user.
1753
1754The following illustrates the intended use of flush:
1755
       set echo off
       scalar n = 10
       loop i=1..n
           # do some time-consuming operation
           loop 100 --quiet
               a = mnormal(200,200)
               b = inv(a)
           endloop
           # print some results
           printf "Iteration %2d done\n", i
           if i < n
               flush
           endif
       endloop
1770
1771# foreign Programming
1772
1773Syntax:     foreign language=lang
1774Options:    --send-data[=list] (pre-load data; see below)
1775            --quiet (suppress output from foreign program)
1776
1777This command opens a special mode in which commands to be executed by
1778another program are accepted. You exit this mode with end foreign; at this
1779point the stacked commands are executed.
1780
1781At present the "foreign" programs supported in this way are GNU R
1782(language=R), Python, Julia, GNU Octave (language=Octave), Jurgen Doornik's
1783Ox and Stata. Language names are recognized on a case-insensitive basis.
1784
1785In connection with R, Octave and Stata the --send-data option has the effect
1786of making data from gretl's workspace available within the target program.
1787By default the entire dataset is sent, but you can limit the data to be sent
1788by giving the name of a predefined list of series. For example:
1789
1790	list Rlist = x1 x2 x3
1791	foreign language=R --send-data=Rlist
1792
1793See chapter 44 of the Gretl User's Guide for details and examples.
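
A complete block might look like the sketch below. It assumes that series
x1, x2 and x3 exist, that R is installed, and that the data sent from gretl
appear in R as a data frame named gretldata; check the User's Guide for the
details that apply to your version.

	list Rlist = x1 x2 x3
	foreign language=R --send-data=Rlist
	    summary(gretldata)
	end foreign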
1794
1795# fractint Statistics
1796
1797Arguments:  series [ order ]
1798Options:    --gph (do Geweke and Porter-Hudak test)
1799            --all (do both tests)
1800            --quiet (don't print results)
1801
1802Tests the specified series for fractional integration ("long memory"). The
1803null hypothesis is that the integration order of the series is zero. By
1804default the local Whittle estimator (Robinson, 1995) is used but if the
1805--gph option is given the GPH test (Geweke and Porter-Hudak, 1983) is
1806performed instead. If the --all flag is given then the results of both tests
1807are printed.
1808
1809For details on this sort of test, see Phillips and Shimotsu (2004).
1810
1811If the optional order argument is not given the order for the test(s) is set
1812automatically as the lesser of T/2 and T^0.6.
1813
1814The estimated fractional integration orders and their standard errors are
available via the "$result" accessor. With the --all option, the local
Whittle estimate will be in the first row and the GPH estimate in the second
1817one.
1818
1819The results of the test can be retrieved using the accessors "$test" and
"$pvalue". These values are based on the local Whittle estimator unless the
1821--gph option is given.
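
As an illustration, the following sketch runs both tests on an artificial
white-noise series (invented for the example) and then retrieves the stored
results:

	nulldata 200
	setobs 1 1 --time-series
	set seed 99
	series y = normal()
	fractint y --all
	matrix r = $result     # estimates and standard errors
	print r
	printf "test = %g, p-value = %g\n", $test, $pvalue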
1822
1823Menu path:    /Variable/Unit root tests/Fractional integration
1824
1825# freq Statistics
1826
1827Argument:   var
1828Options:    --nbins=n (specify number of bins)
1829            --min=minval (specify minimum, see below)
1830            --binwidth=width (specify bin width, see below)
1831            --normal (test for the normal distribution)
1832            --gamma (test for gamma distribution)
1833            --silent (don't print anything)
1834            --matrix=name (use column of named matrix)
1835            --plot=mode-or-filename (see below)
1836            --quiet (suppress the plot)
1837Examples:   freq x
1838            freq x --normal
1839            freq x --nbins=5
1840            freq x --min=0 --binwidth=0.10
1841
1842With no options given, displays the frequency distribution for the series
1843var (given by name or number), with the number of bins and their size chosen
1844automatically.
1845
1846If the --matrix option is given, var (which must be an integer) is instead
1847interpreted as a 1-based index that selects a column from the named matrix.
1848If the matrix in question is in fact a column vector, the var argument may
1849be omitted.
1850
1851To control the presentation of the distribution you may specify either the
1852number of bins or the minimum value plus the width of the bins, as shown in
1853the last two examples above. The --min option sets the lower limit of the
1854left-most bin.
1855
1856If the --normal option is given, the Doornik-Hansen chi-square test for
1857normality is computed. If the --gamma option is given, the test for
1858normality is replaced by Locke's nonparametric test for the null hypothesis
1859that the variable follows the gamma distribution; see Locke (1976), Shapiro
1860and Chen (2001). Note that the parameterization of the gamma distribution
1861used in gretl is (shape, scale).
1862
1863By default, if the program is not in batch mode a plot of the distribution
1864is shown. This can be adjusted via the --plot option. The acceptable
1865parameters to this option are none (to suppress the plot); display (to
1866display a plot even when in batch mode); or a file name. The effect of
1867providing a file name is as described for the --output option of the
1868"gnuplot" command.
1869
1870The --silent flag suppresses the usual text output. This might be used in
1871conjunction with one or other of the distribution test options: the test
1872statistic and its p-value are recorded, and can be retrieved using the
1873accessors "$test" and "$pvalue". It might also be used along with the --plot
1874option if you just want a histogram and don't care to see the accompanying
1875text.
1876
1877Note that gretl does not have a function that matches this command, but it
1878is possible to use the "aggregate" function to achieve the same purpose. In
1879addition, the frequency distribution constructed by freq can be obtained in
1880matrix form via the "$result" accessor.
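
For example, this sketch (using an artificial series x) runs the normality
test without printing anything, then retrieves the test results and the
frequency distribution:

	nulldata 100
	set seed 2024
	series x = normal()
	freq x --normal --silent
	printf "test = %g, p-value = %g\n", $test, $pvalue
	matrix F = $result     # the frequency distribution in matrix form
	print F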
1881
1882Menu path:    /Variable/Frequency distribution
1883
1884# funcerr Programming
1885
1886Argument:   [ message ]
1887
1888Applicable only in the context of a user-defined function (see "function").
1889Causes execution of the current function to terminate with an error
1890condition flagged. An exception is the special MPI mode for parallelized
1891program execution, where only the associated string is printed.
1892
1893The optional message argument can take the form of a string literal or the
1894name of a string variable; if present it is printed as part of the error
1895message shown to the caller of the function.
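
A minimal sketch of the intended use, with a hypothetical function name:

	function scalar safe_log (scalar x)
	    if x <= 0
	        funcerr "safe_log: argument must be positive"
	    endif
	    return log(x)
	end function

	scalar z = safe_log(2)     # fine
	# scalar w = safe_log(-1)  # would stop with the error message above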
1896
1897See also the closely related function, "errorif".
1898
1899# function Programming
1900
1901Argument:   fnname
1902
1903Opens a block of statements in which a function is defined. This block must
1904be closed with end function. (An exception is the case when a user-defined
1905function shall be deleted, which is achieved by the single command line
1906function foo delete for a function named "foo".) See chapter 14 of the Gretl
1907User's Guide for details.
1908
1909# garch Estimation
1910
1911Arguments:  p q ; depvar [ indepvars ]
1912Options:    --robust (robust standard errors)
1913            --verbose (print details of iterations)
1914            --quiet (don't print anything)
1915            --vcv (print covariance matrix)
1916            --nc (do not include a constant)
1917            --stdresid (standardize the residuals)
1918            --fcp (use Fiorentini, Calzolari, Panattoni algorithm)
1919            --arma-init (initial variance parameters from ARMA)
1920Examples:   garch 1 1 ; y
1921            garch 1 1 ; y 0 x1 x2 --robust
1922            See also garch.inp, sw_ch14.inp
1923
1924Estimates a GARCH model (GARCH = Generalized Autoregressive Conditional
1925Heteroskedasticity), either a univariate model or, if indepvars are
1926specified, including the given exogenous variables. The integer values p and
1927q (which may be given in numerical form or as the names of pre-existing
1928scalar variables) represent the lag orders in the conditional variance
1929equation:
1930
1931  h(t) = a(0) + sum(i=1 to q) a(i)*u(t-i)^2 + sum(j=1 to p) b(j)*h(t-j)
1932
1933The parameter p therefore represents the Generalized (or "AR") order, while
1934q represents the regular ARCH (or "MA") order. If p is non-zero, q must also
be non-zero; otherwise the model is unidentified. However, you can estimate a
1936regular ARCH model by setting q to a positive value and p to zero. The sum
1937of p and q must be no greater than 5. Note that a constant is automatically
1938included in the mean equation unless the --nc option is given.
1939
1940By default native gretl code is used in estimation of GARCH models, but you
1941also have the option of using the algorithm of Fiorentini, Calzolari and
1942Panattoni (1996). The former uses the BFGS maximizer while the latter uses
1943the information matrix to maximize the likelihood, with fine-tuning via the
1944Hessian.
1945
1946Several variant estimators of the covariance matrix are available with this
1947command. By default, the Hessian is used unless the --robust option is
1948given, in which case the QML (White) covariance matrix is used. Other
1949possibilities (e.g. the information matrix, or the Bollerslev-Wooldridge
1950estimator) can be specified using the "set" command.
1951
1952By default, the estimates of the variance parameters are initialized using
1953the unconditional error variance from initial OLS estimation for the
1954constant, and small positive values for the coefficients on the past values
1955of the squared error and the error variance. The flag --arma-init calls for
1956the starting values of these parameters to be set using an initial ARMA
1957model, exploiting the relationship between GARCH and ARMA set out in Chapter
195821 of Hamilton's Time Series Analysis. In some cases this may improve the
1959chances of convergence.
1960
1961The GARCH residuals and estimated conditional variance can be retrieved as
1962$uhat and $h respectively. For example, to get the conditional variance:
1963
1964	series ht = $h
1965
1966If the --stdresid option is given, the $uhat values are divided by the
1967square root of h_t.
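
For instance, assuming a time series named y is present in the dataset (the
name is hypothetical), the standardized residuals and the conditional
variance could be saved like this:

	garch 1 1 ; y --stdresid
	series e_std = $uhat   # residuals divided by sqrt(h_t)
	series ht = $h         # estimated conditional variance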
1968
1969Menu path:    /Model/Univariate time series/GARCH
1970
1971# genr Dataset
1972
1973Arguments:  newvar = formula
1974
1975NOTE: this command has undergone numerous changes and enhancements since the
1976following help text was written, so for comprehensive and updated info on
1977this command you'll want to refer to chapter 10 of the Gretl User's Guide.
1978On the other hand, this help does not contain anything actually erroneous,
1979so take the following as "you have this, plus more".
1980
1981In the appropriate context, series, scalar, matrix, string and bundle are
1982synonyms for this command.
1983
1984Creates new variables, often via transformations of existing variables. See
1985also "diff", "logs", "lags", "ldiff", "sdiff" and "square" for shortcuts. In
1986the context of a genr formula, existing variables must be referenced by
1987name, not ID number. The formula should be a well-formed combination of
1988variable names, constants, operators and functions (described below). Note
1989that further details on some aspects of this command can be found in chapter
199010 of the Gretl User's Guide.
1991
1992A genr command may yield either a series or a scalar result. For example,
1993the formula x2 = x * 2 naturally yields a series if the variable x is a
1994series and a scalar if x is a scalar. The formulae x = 0 and mx = mean(x)
1995naturally return scalars. Under some circumstances you may want to have a
1996scalar result expanded into a series or vector. You can do this by using
1997series as an "alias" for the genr command. For example, series x = 0
1998produces a series all of whose values are set to 0. You can also use scalar
1999as an alias for genr. It is not possible to coerce a vector result into a
2000scalar, but use of this keyword indicates that the result should be a
2001scalar: if it is not, an error occurs.
2002
2003When a formula yields a series result, the range over which the result is
2004written to the target variable depends on the current sample setting. It is
2005possible, therefore, to define a series piecewise using the smpl command in
2006conjunction with genr.
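
A sketch of such a piecewise definition, using an artificial dataset of 20
observations:

	nulldata 20
	series x = 0
	smpl 1 10
	x = 1          # written to observations 1 to 10 only
	smpl 11 20
	x = 2          # written to observations 11 to 20 only
	smpl --full
	print x --byobs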
2007
2008Supported arithmetical operators are, in order of precedence: ^
2009(exponentiation); *, / and % (modulus or remainder); + and -.
2010
2011The available Boolean operators are (again, in order of precedence): !
2012(negation), && (logical AND), || (logical OR), >, <, == (is equal to), >=
2013(greater than or equal), <= (less than or equal) and != (not equal). The
2014Boolean operators can be used in constructing dummy variables: for instance
2015(x > 10) returns 1 if x > 10, 0 otherwise.
2016
2017Built-in constants are pi and NA. The latter is the missing value code: you
2018can initialize a variable to the missing value with scalar x = NA.
2019
2020The genr command supports a wide range of mathematical and statistical
2021functions, including all the common ones plus several that are special to
2022econometrics. In addition it offers access to numerous internal variables
2023that are defined in the course of running regressions, doing hypothesis
2024tests, and so on. For a listing of functions and accessors, type "help
2025functions".
2026
2027Besides the operators and functions noted above there are some special uses
2028of "genr":
2029
2030  "genr time" creates a time trend variable (1,2,3,...) called "time". "genr
2031  index" does the same thing except that the variable is called index.
2032
2033  "genr dummy" creates dummy variables up to the periodicity of the data. In
2034  the case of quarterly data (periodicity 4), the program creates dq1 = 1
2035  for first quarter and 0 in other quarters, dq2 = 1 for the second quarter
2036  and 0 in other quarters, and so on. With monthly data the dummies are
2037  named dm1, dm2, and so on. With other frequencies the names are dummy_1,
2038  dummy_2, etc.
2039
2040  "genr unitdum" and "genr timedum" create sets of special dummy variables
2041  for use with panel data. The first codes for the cross-sectional units and
2042  the second for the time period of the observations.
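
As an illustration of the first two special forms, the following sketch
sets up two years of artificial quarterly data and creates a trend and the
quarterly dummies:

	nulldata 8
	setobs 4 2020:1 --time-series
	genr time
	genr dummy
	print time dq1 dq2 dq3 dq4 --byobs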
2043
2044Note: In the command-line program, "genr" commands that retrieve
2045model-related data always reference the model that was estimated most
2046recently. This is also true in the GUI program, if one uses "genr" in the
2047"gretl console" or enters a formula using the "Define new variable" option
2048under the Add menu in the main window. With the GUI, however, you have the
2049option of retrieving data from any model currently displayed in a window
2050(whether or not it's the most recent model). You do this under the "Save"
2051menu in the model's window.
2052
2053The special variable obs serves as an index of the observations. For
2054instance series dum = (obs==15) will generate a dummy variable that has
2055value 1 for observation 15, 0 otherwise. You can also use this variable to
2056pick out particular observations by date or name. For example, series d =
2057(obs>1986:4), series d = (obs>"2008-04-01"), or series d = (obs=="CA"). If
2058daily dates or observation labels are used in this context, they should be
2059enclosed in double quotes. Quarterly and monthly dates (with a colon) may be
2060used unquoted. Note that in the case of annual time series data, the year is
2061not distinguishable syntactically from a plain integer; therefore if you
2062wish to compare observations against obs by year you must use the function
2063obsnum to convert the year to a 1-based index value, as in series d =
2064(obs>obsnum(1986)).
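
The following sketch illustrates the use of obs with quarterly dates. The
dataset is artificial and the series names are arbitrary; only the
comparison syntax is taken from the text above.

	nulldata 40
	setobs 4 1980:1 --time-series
	series dum = (obs == 15)     # 1 at the 15th observation, 0 elsewhere
	series d = (obs > 1986:4)    # 1 from 1987:1 onward, 0 before
	print dum d --byobs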
2065
2066Scalar values can be pulled from a series in the context of a genr formula,
2067using the syntax varname[obs]. The obs value can be given by number or date.
2068Examples: x[5], CPI[1996:01]. For daily data, the form YYYY-MM-DD should be
2069used, e.g. ibm[1970-01-23].
2070
2071An individual observation in a series can be modified via genr. To do this,
2072a valid observation number or date, in square brackets, must be appended to
2073the name of the variable on the left-hand side of the formula. For example,
2074genr x[3] = 30 or genr x[1950:04] = 303.7.
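
A short sketch of both operations, reading and then modifying an individual
observation, on an artificial series (the name x and the values are purely
illustrative):

	nulldata 10
	series x = 10 * index
	scalar v = x[5]     # read the 5th observation
	genr x[3] = 30      # overwrite the 3rd observation
	printf "v = %g, x[3] = %g\n", v, x[3]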
2075
  Formula                Comment
  -------                -------
  y = x1^3               x1 cubed
  y = ln((x1+x2)/x3)
  z = x>y                z(t) = 1 if x(t) > y(t), otherwise 0
  y = x(-2)              x lagged 2 periods
  y = x(+2)              x led 2 periods
  y = diff(x)            y(t) = x(t) - x(t-1)
  y = ldiff(x)           y(t) = log x(t) - log x(t-1), the instantaneous rate
                         of growth of x
  y = sort(x)            sorts x in increasing order and stores in y
  y = dsort(x)           sorts x in decreasing order and stores in y
  y = int(x)             truncate x and store its integer value as y
  y = abs(x)             store the absolute values of x
  y = sum(x)             sum x values excluding missing NA entries
  y = cum(x)             cumulation: y(t) = the sum from s=1 to s=t of x(s)
  aa = $ess              set aa equal to the Error Sum of Squares from last
                         regression
  x = $coeff(sqft)       grab the estimated coefficient on the variable sqft
                         from the last regression
  rho4 = $rho(4)         grab the 4th-order autoregressive coefficient from
                         the last model (presumes an ar model)
  cvx1x2 = $vcv(x1, x2)  grab the estimated coefficient covariance of vars x1
                         and x2 from the last model
  foo = uniform()        uniform pseudo-random variable in range 0-1
  bar = 3 * normal()     normal pseudo-random variable, mu = 0, sigma = 3
  samp = ok(x)           = 1 for observations where x is not missing.
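
To show a few of these formulas in context, here is a small sketch. It
assumes that the sample file data4-1 (with the series price and sqft) is
available in your gretl installation; the names on the left-hand side are
arbitrary.

	open data4-1
	ols price 0 sqft --quiet
	scalar aa = $ess            # error sum of squares from the OLS above
	scalar b = $coeff(sqft)     # estimated coefficient on sqft
	series big = (sqft > 2000)  # 1 where sqft exceeds 2000, otherwise 0
	series lp = ln(price)
	printf "ESS = %g, slope = %g\n", aa, b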
2103
2104Menu path:    /Add/Define new variable
2105Other access: Main window pop-up menu
2106
2107# gmm Estimation
2108
2109Options:    --two-step (two step estimation)
2110            --iterate (iterated GMM)
2111            --vcv (print covariance matrix)
2112            --verbose (print details of iterations)
2113            --quiet (don't print anything)
2114            --lbfgs (use L-BFGS-B instead of regular BFGS)
2115Examples:   hall_cbapm.inp
2116
2117Performs Generalized Method of Moments (GMM) estimation using the BFGS
2118(Broyden, Fletcher, Goldfarb, Shanno) algorithm. You must specify one or
2119more commands for updating the relevant quantities (typically GMM
2120residuals), one or more sets of orthogonality conditions, an initial matrix
2121of weights, and a listing of the parameters to be estimated, all enclosed
2122between the tags gmm and end gmm. Any options should be appended to the end
2123gmm line.
2124
2125Please see chapter 27 of the Gretl User's Guide for details on this command.
2126Here we just illustrate with a simple example.
2127
	gmm e = y - X*b
	  orthog e ; W
	  weights V
	  params b
	end gmm
2133
2134In the example above we assume that y and X are data matrices, b is an
2135appropriately sized vector of parameter values, W is a matrix of
2136instruments, and V is a suitable matrix of weights. The statement
2137
	orthog e ; W
2139
2140indicates that the residual vector e is in principle orthogonal to each of
2141the instruments composing the columns of W.
2142
2143Parameter names
2144
2145In estimating a nonlinear model it is often convenient to name the
2146parameters tersely. In printing the results, however, it may be desirable to
2147use more informative labels. This can be achieved via the additional keyword
2148param_names within the command block. For a model with k parameters the
2149argument following this keyword should be a double-quoted string literal
2150holding k space-separated names, the name of a string variable that holds k
2151such names, or the name of an array of k strings.
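
As a fuller sketch, the script below sets up the objects that a gmm block
needs before it is invoked, using the case of OLS treated as GMM (the
regressors serve as their own instruments). It assumes the sample file
data4-1 is available; the use of lincomb() to form the fitted values, the
identity matrix as initial weights and the labels given to param_names are
illustrative choices, not requirements.

	open data4-1
	list X = const sqft
	matrix b = zeros(nelem(X), 1)    # starting values for the parameters
	matrix V = I(nelem(X))           # initial weights matrix
	series e = 0                     # placeholder for the GMM residual
	gmm e = price - lincomb(X, b)
	  orthog e ; X
	  weights V
	  params b
	  param_names "b_const b_sqft"
	end gmm --vcv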
2152
2153Menu path:    /Model/Instrumental variables/GMM
2154
2155# gnuplot Graphs
2156
2157Arguments:  yvars xvar [ dumvar ]
2158Options:    --with-lines[=varspec] (use lines, not points)
2159            --with-lp[=varspec] (use lines and points)
2160            --with-impulses[=varspec] (use vertical lines)
2161            --with-steps[=varspec] (use perpendicular line segments)
2162            --time-series (plot against time)
2163            --single-yaxis (force use of just one y-axis)
2164            --ylogscale[=base] (use log scale for vertical axis)
2165            --dummy (see below)
2166            --fit=fitspec (see below)
2167            --font=fontspec (see below)
2168            --band=bandspec (see below)
2169            --band-style=style (see below)
2170            --matrix=name (plot columns of named matrix)
2171            --output=filename (send output to specified file)
2172            --input=filename (take input from specified file)
2173Examples:   gnuplot y1 y2 x
2174            gnuplot x --time-series --with-lines
2175            gnuplot wages educ gender --dummy
2176            gnuplot y x --fit=quadratic
2177            gnuplot y1 y2 x --with-lines=y2
2178
2179The variables in the list yvars are graphed against xvar. For a time series
2180plot you may either give time as xvar or use the option flag --time-series.
2181See also the "plot" and "panplot" commands.
2182
By default, data-points are shown as points; this can be overridden by
giving one of the options --with-lines, --with-lp, --with-impulses or
--with-steps. If more than one variable is to be plotted on the y axis, the
effect of these options may be confined to a subset of the variables by
using the varspec parameter. This should take the form of a comma-separated
listing of the names or numbers of the variables to which the chosen style
applies. For instance, the final example above shows how to plot y1 and y2
against x, such that y2 is represented by a line but y1 by points.
2192
If the --dummy option is selected, exactly three variables should be given:
a single y variable, an x variable, and dumvar, a discrete variable. The
effect is to plot the y variable against xvar with the points shown in
different colors depending on the value of dumvar at the given observation.
2197
2198You can choose the scale for the y axis to be logarithmic rather than linear
2199by using the --ylogscale option, together with a base parameter. For
2200example,
2201
	gnuplot y x --ylogscale=2
2203
2204plots the data such that the vertical axis is expressed as powers of 2. If
2205the base is omitted, it defaults to 10.
2206
2207Taking data from a matrix
2208
2209Generally, the arguments yvars and xvar are required, and refer to series in
2210the current dataset (given either by name or ID number). But if a named
2211matrix is supplied via the --matrix option these arguments become optional:
2212if the specified matrix has k columns, by default the first k - 1 columns
2213are treated as the yvars and the last column as xvar. If the --time-series
2214option is given, however, all k columns are plotted against time. If you
2215wish to plot selected columns of the matrix, you should specify yvars and
2216xvar in the form of 1-based column numbers. For example if you want a
2217scatterplot of column 2 of matrix M against column 1, you can do:
2218
	gnuplot 2 1 --matrix=M
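
A minimal sketch of both modes of matrix plotting follows. The matrix M is
filled with random normal values purely for illustration, and
--output=display is added so that the plots are shown even in batch mode.

	set seed 2718
	matrix M = mnormal(50, 2)
	# scatterplot of column 2 against column 1
	gnuplot 2 1 --matrix=M --output=display
	# both columns plotted against time
	gnuplot --matrix=M --time-series --with-lines --output=display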
2220
2221Showing a line of best fit
2222
2223The --fit option is applicable only for bivariate scatterplots and single
2224time-series plots. The default behavior for a scatterplot is to show the OLS
2225fit if the slope coefficient is significant at the 10 percent level, while
2226the default behavior for time-series is not to show any fitted line. You can
2227call for different behavior by using this option along with one of the
2228following fitspec parameter values. Note that if the plot is a single time
2229series the place of x is taken by time.
2230
2231  linear: show the OLS fit regardless of its level of statistical
2232  significance.
2233
2234  none: don't show any fitted line.
2235
2236  inverse, quadratic, cubic, semilog or linlog: show a fitted line based on
2237  a regression of the specified type. By semilog, we mean a regression of
2238  log y on x; the fitted line represents the conditional expectation of y,
2239  obtained by exponentiation. By linlog we mean a regression of y on the log
2240  of x.
2241
  loess: show the fit from a robust locally weighted regression (also
  sometimes known as "lowess").
2244
2245Plotting a band
2246
2247The --band option can be used for plotting zero or more series along with a
2248"band" of some sort (typically representing a confidence interval). This
2249option requires two comma-separated parameters: the name or ID number of a
2250series representing the center of the band, and the name or ID of a series
2251giving the width of the band: the effect is to draw a band with y
2252coordinates equal to center minus width and center plus width. An optional
2253third parameter (again, comma-separated) can be used to give a multiplier
2254for the width dimension, in the form of a numerical constant or the name of
a scalar variable. For example, the following command plots y along with
a band of plus or minus 1.96 times se_y:
2257
	gnuplot y --time-series --band=y,se_y,1.96 --with-lines
2259
2260When the --band option is given, the companion option --band-style can be
2261used to control the band's representation. By default the upper and lower
2262limits are shown as solid lines, but the parameters fill, dash, bars or step
2263cause the band to be drawn as a shaded area, using dashed lines, using error
2264bars or using steps, respectively. In addition a color specification can be
2265appended (following a comma) or substituted. Here are some style examples:
2266
	gnuplot ... --band-style=fill
	gnuplot ... --band-style=dash,0xbbddff
	gnuplot ... --band-style=,black
	gnuplot ... --band-style=bars,blue
2271
2272The first example produces a shaded area in the default color; the second
2273switches to dashed lines with a specified blue-gray color; the third uses
2274solid black lines; and the last shows blue bars. Note that colors can be
2275given as either hexadecimal RGB values or by name; you can access the list
2276of color-names recognized by gnuplot by issuing the command "show
2277colornames" in gnuplot itself, or in the gretl console by doing
2278
	eval readfile("@gretldir/data/gnuplot/gpcolors.txt")
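
To tie the --band and --band-style options together, here is a
self-contained sketch. The series y (a simulated random walk) and se_y (an
invented "standard error" that grows over the sample) are artificial and
serve only to give the band something to show.

	nulldata 60
	setobs 4 2005:1 --time-series
	set seed 31
	series y = cum(normal())
	series se_y = 0.5 + 0.02 * index
	gnuplot y --time-series --with-lines --band=y,se_y,1.96 \
	  --band-style=fill --output=display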
2280
2281Recession bars
2282
2283The "band" options described above can also be used to add "recession bars"
2284to a plot. By this we mean vertical bars occupying the full y-dimension of
2285the plot and indicating the presence (bar) or absence (no bar) of some
2286qualitative feature in a time-series plot. Such bars are commonly used to
2287flag periods of recession; they could also be used to indicate periods of
2288war, or anything that can be coded in a 0/1 dummy variable.
2289
2290In this context the --band option requires a single parameter: the
2291identifier of a series with values 0 and 1, where 1 indicates "on" and 0
2292"off". The --band-style option may be used to specify a color for the bars,
2293given in hexadecimal form or as the name of a color known to gnuplot (see
2294the previous section). An example showing a single bar is given below:
2295
	open AWM17 --quiet
	series dum = obs >= 1990:1 && obs <= 1994:2
	gnuplot YER URX --with-lines --time-series \
	  --band=dum --band-style=0xcccccc --output=display \
	  {set key top left;}
2301
2302Controlling the output
2303
2304In interactive mode the plot is displayed immediately. In batch mode the
2305default behavior is that a gnuplot command file is written in the user's
2306working directory, with a name on the pattern gpttmpN.plt, starting with N =
230701. The actual plots may be generated later using gnuplot (under MS Windows,
2308wgnuplot). This behavior can be modified by use of the --output=filename
2309option. This option controls the filename used, and at the same time allows
2310you to specify a particular output format via the three-letter extension of
the file name, as follows: .eps results in the production of an Encapsulated
PostScript (EPS) file; .pdf produces PDF; .png produces PNG; .emf calls for
EMF (Enhanced MetaFile); .fig calls for an Xfig file; and .svg calls for
SVG (Scalable Vector Graphics). If the dummy filename "display" is given
2315then the plot is shown on screen as in interactive mode. If a filename with
2316any extension other than those just mentioned is given, a gnuplot command
2317file is written.
2318
2319Specifying a font
2320
2321The --font option can be used to specify a particular font for the plot. The
2322fontspec parameter should take the form of the name of a font, optionally
2323followed by a size in points separated from the name by a comma or space,
2324all wrapped in double quotes, as in
2325
	--font="serif,12"
2327
2328Note that the fonts available to gnuplot will vary by platform, and if
2329you're writing a plot command that is intended to be portable it is best to
2330restrict the font name to the generic sans or serif.
2331
2332Adding gnuplot commands
2333
2334A further option to this command is available: following the specification
2335of the variables to be plotted and the option flag (if any), you may add
2336literal gnuplot commands to control the appearance of the plot (for example,
2337setting the plot title and/or the axis ranges). These commands should be
2338enclosed in braces, and each gnuplot command must be terminated with a
2339semi-colon. A backslash may be used to continue a set of gnuplot commands
2340over more than one line. Here is an example of the syntax:
2341
	{ set title 'My Title'; set yrange [0:1000]; }
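
Putting several of the above options together, a complete command might
look like the following sketch. It assumes the sample file data4-1 is
available; the output filename, title and y-range are arbitrary choices.

	open data4-1
	gnuplot price sqft --fit=quadratic --font="serif,12" \
	  --output=price_fit.pdf \
	  { set title 'Price versus area'; set yrange [0:600]; }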
2343
2344Menu path:    /View/Graph specified vars
2345Other access: Main window pop-up menu, graph button on toolbar
2346
2347# graphpg Graphs
2348
2349Variants:   graphpg add
2350            graphpg fontscale value
2351            graphpg show
2352            graphpg free
2353            graphpg --output=filename
2354
2355The session "graph page" will work only if you have the LaTeX typesetting
2356system installed, and are able to generate and view PDF or PostScript
2357output.
2358
2359In the session icon window, you can drag up to eight graphs onto the graph
2360page icon. When you double-click on the graph page (or right-click and
2361select "Display"), a page containing the selected graphs will be composed
2362and opened in a suitable viewer. From there you should be able to print the
2363page.
2364
2365To clear the graph page, right-click on its icon and select "Clear".
2366
2367Note that on systems other than MS Windows, you may have to adjust the
2368setting for the program used to view PDF or PostScript files. Find that
2369under the "Programs" tab in the gretl Preferences dialog box (under the
2370Tools menu in the main window).
2371
2372It's also possible to operate on the graph page via script, or using the
2373console (in the GUI program). The following commands and options are
2374supported:
2375
2376To add a graph to the graph page, issue the command graphpg add after saving
2377a named graph, as in
2378
	grf1 <- gnuplot Y X
	graphpg add
2381
2382To display the graph page: graphpg show.
2383
2384To clear the graph page: graphpg free.
2385
2386To adjust the scale of the font used in the graph page, use graphpg
2387fontscale scale, where scale is a multiplier (with a default of 1.0). Thus
2388to make the font size 50 percent bigger than the default you can do
2389
	graphpg fontscale 1.5
2391
2392To call for printing of the graph page to file, use the flag --output= plus
2393a filename; the filename should have the suffix ".pdf", ".ps" or ".eps". For
2394example:
2395
	graphpg --output="myfile.pdf"
2397
2398The output file will be written in the currently set "workdir", unless the
2399filename string contains a full path specification.
2400
2401In this context the output uses colored lines by default; to use dot/dash
2402patterns instead of colors you can append the --monochrome flag.
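
The individual commands above can be combined into a single sequence, as in
the following sketch. It assumes the GUI program with a working LaTeX
installation and the sample file data4-1; the graph names and the output
filename are arbitrary.

	open data4-1
	grf1 <- gnuplot price sqft
	graphpg add
	grf2 <- gnuplot price bedrms
	graphpg add
	graphpg fontscale 1.2
	graphpg --output="pricepage.pdf"
	graphpg free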
2403
2404# heckit Estimation
2405
2406Arguments:  depvar indepvars ; selection equation
2407Options:    --quiet (suppress printing of results)
2408            --two-step (perform two-step estimation)
2409            --vcv (print covariance matrix)
2410            --opg (OPG standard errors)
2411            --robust (QML standard errors)
2412            --cluster=clustvar (see "logit" for explanation)
2413            --verbose (print extra output)
2414Examples:   heckit y 0 x1 x2 ; ys 0 x3 x4
2415            See also heckit.inp
2416
2417Heckman-type selection model. In the specification, the list before the
2418semicolon represents the outcome equation, and the second list represents
2419the selection equation. The dependent variable in the selection equation (ys
2420in the example above) must be a binary variable.
2421
2422By default, the parameters are estimated by maximum likelihood. The
2423covariance matrix of the parameters is computed using the negative inverse
2424of the Hessian. If two-step estimation is desired, use the --two-step
2425option. In this case, the covariance matrix of the parameters of the outcome
2426equation is appropriately adjusted as per Heckman (1979).
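
The sketch below runs both estimators on simulated data. The data
generating process (correlated disturbances u and v, with x3 appearing only
in the selection equation) and all names are invented for illustration; the
outcome y is set to NA wherever the selection indicator ys is zero.

	nulldata 500
	set seed 1234
	series x1 = normal()
	series x3 = normal()
	series u = normal()
	series v = 0.5 * u + normal()
	series ys = (0.5 + x1 + x3 + v > 0)    # binary selection indicator
	series y = ys ? 1 + 2 * x1 + u : NA    # outcome seen only if selected
	heckit y 0 x1 ; ys 0 x1 x3             # maximum likelihood (default)
	heckit y 0 x1 ; ys 0 x1 x3 --two-step  # two-step estimation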
2427
2428Menu path:    /Model/Limited dependent variable/Heckit
2429
2430# help Utilities
2431
2432Variants:   help
2433            help functions
2434            help command
2435            help function
2436Option:     --func (select functions help)
2437
2438If no arguments are given, prints a list of available commands. If the
2439single argument "functions" is given, prints a list of available functions
2440(see "genr").
2441
2442help command describes command (e.g. help smpl). help function describes
2443function (e.g. help ldet). Some functions have the same names as related
2444commands (e.g. diff): in that case the default is to print help for the
2445command, but you can get help on the function by using the --func option.
2446
2447Menu path:    /Help
2448
2449# hfplot Graphs
2450
2451Arguments:  hflist [ ; lflist ]
2452Options:    --with-lines (plot with lines)
2453            --time-series (put time on x-axis)
2454            --output=filename (send output to specified file)
2455
2456Provides a means of plotting a high-frequency series, possibly along with
2457one or more series observed at the base frequency of the dataset. The first
2458argument should be a "MIDAS list"; the optional additional lflist terms,
2459following a semicolon, should be regular ("low-frequency") series.
2460
2461For details on the effect of the --output option, please see the "gnuplot"
2462command.
2463
2464# hsk Estimation
2465
2466Arguments:  depvar indepvars
2467Options:    --no-squares (see below)
2468            --vcv (print covariance matrix)
2469            --quiet (don't print anything)
2470
2471This command is applicable where heteroskedasticity is present in the form
2472of an unknown function of the regressors which can be approximated by a
2473quadratic relationship. In that context it offers the possibility of
2474consistent standard errors and more efficient parameter estimates as
2475compared with OLS.
2476
2477The procedure involves (a) OLS estimation of the model of interest, followed
2478by (b) an auxiliary regression to generate an estimate of the error
2479variance, then finally (c) weighted least squares, using as weight the
2480reciprocal of the estimated variance.
2481
2482In the auxiliary regression (b) we regress the log of the squared residuals
2483from the first OLS on the original regressors and their squares (by
2484default), or just on the original regressors (if the --no-squares option is
2485given). The log transformation is performed to ensure that the estimated
2486variances are all non-negative. Call the fitted values from this regression
2487u^*. The weight series for the final WLS is then formed as 1/exp(u^*).
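
Steps (a) to (c) can be sketched "by hand" as follows, for comparison with
the built-in command. The script assumes the sample file data4-1 is
available; the names lu2, sq_sqft and w are arbitrary. The manual version
should in principle reproduce the hsk estimates, but this has not been
verified here.

	open data4-1
	# (a) OLS on the model of interest
	ols price 0 sqft --quiet
	series lu2 = log($uhat^2)
	# (b) auxiliary regression on the regressors and their squares
	series sq_sqft = sqft^2
	ols lu2 0 sqft sq_sqft --quiet
	series w = 1/exp($yhat)
	# (c) weighted least squares using the estimated weights
	wls w price 0 sqft
	# the built-in equivalent
	hsk price 0 sqft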
2488
2489Menu path:    /Model/Other linear models/Heteroskedasticity corrected
2490
2491# hurst Statistics
2492
2493Argument:   series
2494Option:     --plot=mode-or-filename (see below)
2495
2496Calculates the Hurst exponent (a measure of persistence or long memory) for
2497a time-series variable having at least 128 observations. The result
2498(together with its standard error) can be retrieved via the "$result"
2499accessor.
2500
2501The Hurst exponent is discussed by Mandelbrot (1983). In theoretical terms
2502it is the exponent, H, in the relationship
2503
  RS(x) = an^H
2505
2506where RS is the "rescaled range" of the variable x in samples of size n and
2507a is a constant. The rescaled range is the range (maximum minus minimum) of
2508the cumulated value or partial sum of x over the sample period (after
2509subtraction of the sample mean), divided by the sample standard deviation.
2510
2511As a reference point, if x is white noise (zero mean, zero persistence) then
2512the range of its cumulated "wandering" (which forms a random walk), scaled
2513by the standard deviation, grows as the square root of the sample size,
2514giving an expected Hurst exponent of 0.5. Values of the exponent
2515significantly in excess of 0.5 indicate persistence, and values less than
25160.5 indicate anti-persistence (negative autocorrelation). In principle the
2517exponent is bounded by 0 and 1, although in finite samples it is possible to
2518get an estimated exponent greater than 1.
2519
2520In gretl, the exponent is estimated using binary sub-sampling: we start with
2521the entire data range, then the two halves of the range, then the four
2522quarters, and so on. For sample sizes smaller than the data range, the RS
2523value is the mean across the available samples. The exponent is then
2524estimated as the slope coefficient in a regression of the log of RS on the
2525log of sample size.
2526
2527By default, if the program is not in batch mode a plot of the rescaled range
2528is shown. This can be adjusted via the --plot option. The acceptable
2529parameters to this option are none (to suppress the plot); display (to
2530display a plot even when in batch mode); or a file name. The effect of
2531providing a file name is as described for the --output option of the
2532"gnuplot" command.
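
A minimal usage sketch, with simulated white noise (for which the exponent
should be in the neighborhood of 0.5). The sample size of 512 is an
arbitrary choice that satisfies the minimum of 128 observations.

	nulldata 512
	setobs 1 1 --time-series
	set seed 42
	series x = normal()
	hurst x --plot=none
	eval $result    # the estimate together with its standard error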
2533
2534Menu path:    /Variable/Hurst exponent
2535
2536# if Programming
2537
2538Flow control for command execution. Three sorts of construction are
2539supported, as follows.
2540
	# simple form
	if condition
	    commands
	endif

	# two branches
	if condition
	    commands1
	else
	    commands2
	endif

	# three or more branches
	if condition1
	    commands1
	elif condition2
	    commands2
	else
	    commands3
	endif
2561
2562"condition" must be a Boolean expression, for the syntax of which see
2563"genr". More than one "elif" block may be included. In addition, if ...
2564endif blocks may be nested.
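
Here is a concrete sketch that combines the three-branch form with a nested
block; the scalar x and the messages are arbitrary.

	scalar x = 7
	if x < 0
	    printf "negative\n"
	elif x == 0
	    printf "zero\n"
	else
	    # nested if ... endif block
	    if x % 2 == 0
	        printf "positive and even\n"
	    else
	        printf "positive and odd\n"
	    endif
	endif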
2565
2566# include Programming
2567
2568Argument:   filename
2569Option:     --force (force re-reading from file)
2570Examples:   include myfile.inp
2571            include sols.gfn
2572
2573Intended for use in a command script, primarily for including definitions of
2574functions. filename should have the extension inp (a plain-text script) or
gfn (a gretl function package). The commands in filename are executed, then
control is returned to the main script.
2577
2578The --force option is specific to gfn files: its effect is to force gretl to
2579re-read the function package from file even if it is already loaded into
2580memory. (Plain inp files are always read and processed in response to this
2581command.)
2582
2583See also "run".
2584
2585# info Dataset
2586
2587Prints out any supplementary information stored with the current datafile.
2588
2589Menu path:    /Data/Dataset info
2590Other access: Data browser windows
2591
2592# intreg Estimation
2593
2594Arguments:  minvar maxvar indepvars
2595Options:    --quiet (suppress printing of results)
2596            --verbose (print details of iterations)
2597            --robust (robust standard errors)
2598            --opg (see below)
2599            --cluster=clustvar (see "logit" for explanation)
2600Examples:   intreg lo hi const x1 x2
2601            See also wtp.inp
2602
2603Estimates an interval regression model. This model arises when the dependent
2604variable is imperfectly observed for some (possibly all) observations. In
2605other words, the data generating process is assumed to be
2606
  y* = x b + u
2608
2609but we only observe m <= y* <= M (the interval may be left- or
2610right-unbounded). Note that for some observations m may equal M. The
2611variables minvar and maxvar must contain NAs for left- and right-unbounded
2612observations, respectively.
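
The coding of minvar and maxvar can be illustrated with simulated data. In
the sketch below the latent variable ystar is observed only in three
brackets: at or below 0, between 0 and 2, and above 2. The thresholds, the
parameter values and all names are invented for the example.

	nulldata 200
	set seed 101
	series x1 = normal()
	series ystar = 1 + 0.5 * x1 + normal()
	# middle and upper brackets
	series lo = (ystar > 2) ? 2 : 0
	series hi = (ystar > 2) ? NA : 2    # NA marks right-unbounded cases
	# lower bracket
	lo = (ystar <= 0) ? NA : lo         # NA marks left-unbounded cases
	hi = (ystar <= 0) ? 0 : hi
	intreg lo hi const x1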
2613
2614The model is estimated by maximum likelihood, assuming normality of the
2615disturbance term.
2616
2617By default, standard errors are computed using the negative inverse of the
2618Hessian. If the --robust flag is given, then QML or Huber-White standard
2619errors are calculated instead. In this case the estimated covariance matrix
2620is a "sandwich" of the inverse of the estimated Hessian and the outer
2621product of the gradient. Alternatively, the --opg option can be given, in
2622which case standard errors are based on the outer product of the gradient
2623alone.
2624
2625Menu path:    /Model/Limited dependent variable/Interval regression
2626
2627# johansen Tests
2628
2629Arguments:  order ylist [ ; xlist ] [ ; rxlist ]
2630Options:    --nc (no constant)
2631            --rc (restricted constant)
2632            --uc (unrestricted constant)
2633            --crt (constant and restricted trend)
2634            --ct (constant and unrestricted trend)
2635            --seasonals (include centered seasonal dummies)
2636            --asy (record asymptotic p-values)
2637            --quiet (print just the tests)
2638            --silent (don't print anything)
2639            --verbose (print details of auxiliary regressions)
2640Examples:   johansen 2 y x
2641            johansen 4 y x1 x2 --verbose
2642            johansen 3 y x1 x2 --rc
2643            See also hamilton.inp, denmark.inp
2644
2645Carries out the Johansen test for cointegration among the variables in ylist
2646for the given lag order. For details of this test see chapter 33 of the
2647Gretl User's Guide or Hamilton (1994), Chapter 20. P-values are computed via
2648Doornik's gamma approximation (Doornik, 1998). Two sets of p-values are
2649shown for the trace test, straight asymptotic values and values adjusted for
2650the sample size. By default the "$pvalue" accessor gets the adjusted
2651variant, but the --asy flag may be used to record the asymptotic values
2652instead.
2653
2654The inclusion of deterministic terms in the model is controlled by the
2655option flags. The default if no option is specified is to include an
2656"unrestricted constant", which allows for the presence of a non-zero
2657intercept in the cointegrating relations as well as a trend in the levels of
2658the endogenous variables. In the literature stemming from the work of
2659Johansen (see for example his 1995 book) this is often referred to as "case
3". The options --nc, --rc, --crt and --ct produce cases 1, 2, 4 and 5
respectively, while --uc explicitly selects the default case 3; these five
options are mutually exclusive. The meaning of these cases and the criteria
for selecting a case are explained in chapter 33 of the Gretl User's Guide.
2664
2665The optional lists xlist and rxlist allow you to control for specified
2666exogenous variables: these enter the system either unrestrictedly (xlist) or
2667restricted to the cointegration space (rxlist). These lists are separated
2668from ylist and from each other by semicolons.
2669
2670The --seasonals option, which may be combined with any of the other options,
2671specifies the inclusion of a set of centered seasonal dummy variables. This
2672option is available only for quarterly or monthly data.
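
As an illustration of combining a deterministic-terms option with
--seasonals, consider the following sketch. It assumes that the sample file
denmark, which holds quarterly data, is available and contains the series
named; the lag order of 2 is an arbitrary choice.

	open denmark
	johansen 2 LRM LRY IBO IDE --rc --seasonals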
2673
2674The following table is offered as a guide to the interpretation of the
2675results shown for the test, for the 3-variable case. H0 denotes the null
2676hypothesis, H1 the alternative hypothesis, and c the number of cointegrating
2677relations.
2678
         Rank     Trace test         Lmax test
                  H0     H1          H0     H1
         ---------------------------------------
          0      c = 0  c = 3       c = 0  c = 1
          1      c = 1  c = 3       c = 1  c = 2
          2      c = 2  c = 3       c = 2  c = 3
         ---------------------------------------
2686
2687See also the "vecm" command, and "coint" if you want the Engle-Granger
2688cointegration test.
2689
2690Menu path:    /Model/Multivariate time series
2691
2692# join Dataset
2693
2694Arguments:  filename varname
2695Options:    --data=column-name (see below)
2696            --filter=expression (see below)
2697            --ikey=inner-key (see below)
2698            --okey=outer-key (see below)
2699            --aggr=method (see below)
2700            --tkey=column-name,format-string (see below)
2701            --verbose (report on progress)
2702
2703This command imports one or more data series from the source filename (which
2704must be either a delimited text data file or a "native" gretl data file)
2705under the name varname. For details please see chapter 7 of the Gretl User's
2706Guide; here we just give a brief summary of the available options. See also
2707"append" for simpler joining operations.
2708
2709The --data option can be used to specify the column heading of the data in
2710the source file, if this differs from the name by which the data should be
2711known in gretl.
2712
2713The --filter option can be used to specify a criterion for filtering the
2714source data (that is, selecting a subset of observations).
2715
2716The --ikey and --okey options can be used to specify a mapping between
2717observations in the current dataset and observations in the source data (for
2718example, individuals can be matched against the household to which they
2719belong).
2720
2721The --aggr option is used when the mapping between observations in the
2722current dataset and the source is not one-to-one.
2723
2724The --tkey option is applicable only when the current dataset has a
2725time-series structure. It can be used to specify the name of a column
2726containing dates to be matched to the dataset and/or the format in which
2727dates are represented in that column.
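
The following sketch shows the --ikey and --data options at work. To keep
it self-contained it first writes a small "household" file using the
"store" command, then builds a dataset of individuals and imports the
household income for each one. The file name hholds.csv and all series
names are invented, and the script assumes that "store" and "join" resolve
the file name against the same working directory.

	# create the source file: 3 households
	nulldata 3
	series hid = index
	series income = 1000 * index
	store hholds.csv hid income
	# current dataset: 6 individuals, 2 per household
	nulldata 6
	series hid = ceil(index/2)
	# match on hid; take the source column "income" as hh_income
	join hholds.csv hh_income --ikey=hid --data=income
	print hid hh_income --byobs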
2728
2729Importing more than one series at once
2730
2731The "join" command can handle the importation of several series at once.
2732This happens when (a) the varname argument is a space-separated list of
2733names rather than a single name, or (b) when it points to an array of
2734strings: the elements of this array should be the names of the series to
2735import.
2736
This method has some limitations, however. The --data option is not
available, so when importing multiple series you are obliged to accept their
"outer" names. The other options apply uniformly to all the series imported
via a given command.
2741
2742# kpss Tests
2743
2744Arguments:  order varlist
2745Options:    --trend (include a trend)
2746            --seasonals (include seasonal dummies)
2747            --verbose (print regression results)
2748            --quiet (suppress printing of results)
2749            --difference (use first difference of variable)
2750Examples:   kpss 8 y
2751            kpss 4 x1 --trend
2752
2753For use of this command with panel data please see the final section in this
2754entry.
2755
Computes the KPSS test (Kwiatkowski et al., Journal of Econometrics, 1992)
2757for stationarity, for each of the specified variables (or their first
2758difference, if the --difference option is selected). The null hypothesis is
2759that the variable in question is stationary, either around a level or, if
2760the --trend option is given, around a deterministic linear trend.
2761
2762The order argument determines the size of the window used for Bartlett
2763smoothing. If a negative value is given this is taken as a signal to use an
2764automatic window size of 4(T/100)^0.25, where T is the sample size.
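
The automatic window formula can be evaluated by hand to see what a negative order implies. The following is a minimal sketch on simulated data; the series y and the scalars T and w are invented for the illustration, and how gretl rounds the result to an integer is not shown here.

```hansl
nulldata 200
setobs 1 1 --special-time-series
set seed 12345
series y = normal()

# Automatic Bartlett window: 4(T/100)^0.25
scalar T = $nobs
scalar w = 4 * (T/100)^0.25
printf "T = %d, automatic window = %g\n", T, w

kpss -1 y          # negative order: let gretl pick the window
kpss 4 y --trend   # explicit window, trend-stationarity null
```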
2765
2766If the --verbose option is chosen the results of the auxiliary regression
2767are printed, along with the estimated variance of the random walk component
2768of the variable.
2769
2770The critical values shown for the test statistic are based on response
2771surfaces estimated in the manner set out by Sephton (Economics Letters,
27721995), which are more accurate for small samples than the values given in
2773the original KPSS article. When the test statistic lies between the 10
2774percent and 1 percent critical values a p-value is shown; this is obtained
2775by linear interpolation and should not be taken too literally. See the
2776"kpsscrit" function for a means of obtaining these critical values
2777programmatically.
2778
2779Panel data
2780
2781When the kpss command is used with panel data, to produce a panel unit root
2782test, the applicable options and the results shown are somewhat different.
2783While you may give a list of variables for testing in the regular
2784time-series case, with panel data only one variable may be tested per
2785command. And the --verbose option has a different meaning: it produces a
2786brief account of the test for each individual time series (the default being
2787to show only the overall result).
2788
2789When possible, the overall test (null hypothesis: the series in question is
2790stationary for all the panel units) is calculated using the method of Choi
2791(Journal of International Money and Finance, 2001). This is not always
2792straightforward, the difficulty being that while the Choi test is based on
2793the p-values of the tests on the individual series, we do not currently have
2794a means of calculating p-values for the KPSS test statistic; we must rely on
2795a few critical values.
2796
2797If the test statistic for a given series falls between the 10 percent and 1
2798percent critical values, we are able to interpolate a p-value. But if the
2799test falls short of the 10 percent value, or exceeds the 1 percent value, we
2800cannot interpolate and can at best place a bound on the global Choi test. If
2801the individual test statistic falls short of the 10 percent value for some
2802units but exceeds the 1 percent value for others, we cannot even compute a
2803bound for the global test.
2804
2805Menu path:    /Variable/Unit root tests/KPSS test
2806
2807# labels Dataset
2808
2809Variants:   labels [ varlist ]
2810            labels --to-file=filename
2811            labels --from-file=filename
2812            labels --delete
2813Examples:   oprobit.inp
2814
2815In the first form, prints out the informative labels (if present) for the
2816series in varlist, or for all series in the dataset if varlist is not
2817specified.
2818
2819With the option --to-file, writes to the named file the labels for all
2820series in the dataset, one per line. If no labels are present an error is
2821flagged; if some series have labels and others do not, a blank line is
2822printed for series with no label. The output file will be written in the
2823currently set "workdir", unless the filename string contains a full path
2824specification.
2825
2826With the option --from-file, reads the specified file (which should be plain
2827text) and assigns labels to the series in the dataset, reading one label per
2828line and taking blank lines to indicate blank labels.
2829
2830The --delete option does what you'd expect: it removes all the series labels
2831from the dataset.
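
The four variants can be combined into a simple round trip, sketched below under the assumption that the sample file data4-1 (supplied with gretl) is available and that labels.txt, a name chosen for this example, can be written in the current workdir.

```hansl
open data4-1 --quiet
labels price sqft                  # show labels for two series
labels --to-file=labels.txt        # save all labels, one per line
labels --delete                    # remove them from the dataset
labels --from-file=labels.txt      # restore them from the file
labels                             # show labels for all series
```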
2832
2833Menu path:    /Data/Variable labels
2834
2835# lad Estimation
2836
2837Arguments:  depvar indepvars
2838Options:    --vcv (print covariance matrix)
2839            --no-vcv (don't compute covariance matrix)
2840            --quiet (don't print anything)
2841
2842Calculates a regression that minimizes the sum of the absolute deviations of
2843the observed from the fitted values of the dependent variable. Coefficient
2844estimates are derived using the Barrodale-Roberts simplex algorithm; a
2845warning is printed if the solution is not unique.
2846
2847Standard errors are derived using the bootstrap procedure with 500 drawings.
2848The covariance matrix for the parameter estimates, printed when the --vcv
2849flag is given, is based on the same bootstrap. Since this is quite an
2850expensive operation, the --no-vcv option is provided for the case where the
2851covariance matrix is not required; when this option is given standard errors
2852will not be available.
2853
2854Note that this method can be slow when the sample is large or there are many
2855regressors; in that case it may be preferable to use the "quantreg" command.
2856Given a dependent variable y and a list of regressors X, the following
2857commands are basically equivalent, except that the quantreg method uses the
2858faster Frisch-Newton algorithm and provides analytical rather than
2859bootstrapped standard errors.
2860
2861	lad y const X
2862	quantreg 0.5 y const X
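
A runnable version of this comparison might look as follows. It assumes the sample file data4-1 supplied with gretl; the list name X simply mirrors the notation used above.

```hansl
open data4-1 --quiet
list X = sqft bedrms

# Median regression two ways: bootstrapped vs analytical standard errors
lad price const X
quantreg 0.5 price const X
```

The coefficient estimates should agree closely, while the standard errors will generally differ because they are computed by different methods.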
2863
2864Menu path:    /Model/Robust estimation/Least Absolute Deviation
2865
2866# lags Transformations
2867
2868Arguments:  [ order ; ] laglist
2869Option:     --bylag (order terms by lag)
2870Examples:   lags x y
2871            lags 12 ; x y
2872            lags 4 ; x1 x2 x3 --bylag
2873            See also sw_ch12.inp, sw_ch14.inp
2874
2875Creates new series which are lagged values of each of the series in varlist.
2876By default the number of lags created equals the periodicity of the data.
2877For example, if the periodicity is 4 (quarterly), the command "lags x"
2878creates
2879
2880	x_1 = x(t-1)
2881	x_2 = x(t-2)
2882	x_3 = x(t-3)
2883	x_4 = x(t-4)
2884
2885The number of lags created can be controlled by the optional first parameter
2886(which, if present, must be followed by a semicolon).
2887
2888The --bylag option is meaningful only if varlist contains more than one
2889series and the maximum lag order is greater than 1. By default the lagged
2890terms are added to the dataset by variable: first all lags of the first
2891series, then all lags of the second series, and so on. But if --bylag is
2892given, the ordering is by lags: first lag 1 of all the listed series, then
lag 2 of all the listed series, and so on.
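
The effect of --bylag is easiest to see by inspecting the series listing afterwards. This is a small sketch on simulated quarterly data; the series x and y are placeholders.

```hansl
nulldata 12
setobs 4 2000:1 --time-series
set seed 7
series x = normal()
series y = normal()

# Two lags of each series, ordered by lag rather than by variable
lags 2 ; x y --bylag
varlist
```

With --bylag the new series should be listed as x_1, y_1, x_2, y_2; without it the listing would be x_1, x_2, y_1, y_2.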
2894
2895Menu path:    /Add/Lags of selected variables
2896
2897# ldiff Transformations
2898
2899Argument:   varlist
2900
2901The first difference of the natural log of each series in varlist is
2902obtained and the result stored in a new series with the prefix ld_. Thus
2903"ldiff x y" creates the new variables
2904
2905	ld_x = log(x) - log(x(-1))
2906	ld_y = log(y) - log(y(-1))
2907
2908Menu path:    /Add/Log differences of selected variables
2909
2910# leverage Tests
2911
2912Options:    --save (save the resulting series)
2913            --overwrite (OK to overwrite existing series)
2914            --quiet (don't print results)
2915            --plot=mode-or-filename (see below)
2916Examples:   leverage.inp
2917
2918Must follow an "ols" command. Calculates the leverage (h, which must lie in
2919the range 0 to 1) for each data point in the sample on which the previous
2920model was estimated. Displays the residual (u) for each observation along
2921with its leverage and a measure of its influence on the estimates, uh/(1 -
2922h). "Leverage points" for which the value of h exceeds 2k/n (where k is the
2923number of parameters being estimated and n is the sample size) are flagged
2924with an asterisk. For details on the concepts of leverage and influence see
2925Davidson and MacKinnon (1993), Chapter 2.
2926
2927DFFITS values are also computed: these are "studentized residuals"
2928(predicted residuals divided by their standard errors) multiplied by
2929sqrt[h/(1 - h)]. For discussions of studentized residuals and DFFITS see
2930chapter 12 of Maddala's Introduction to Econometrics or Belsley, Kuh and
2931Welsch (1980).
2932
2933Briefly, a "predicted residual" is the difference between the observed value
2934of the dependent variable at observation t, and the fitted value for
2935observation t obtained from a regression in which that observation is
2936omitted (or a dummy variable with value 1 for observation t alone has been
2937added); the studentized residual is obtained by dividing the predicted
2938residual by its standard error.
2939
2940If the --save flag is given with this command, the leverage, influence and
2941DFFITS values are added to the current data set; in this context the --quiet
2942flag may be used to suppress the printing of results. The default names of
2943the saved series are, respectively, lever, influ and dffits. If series of
2944these names already exist, what happens depends on whether the --overwrite
2945option is given. If so, the existing series are overwritten; if not, the
2946names will be adjusted to ensure uniqueness. In the latter case the newly
2947generated series will always be the highest-numbered three series in the
2948dataset.
2949
2950After execution, the "$test" accessor returns the cross-validation
2951criterion, which is defined as the sum of squared deviations of the
2952dependent variable from its forecast value, the forecast for each
2953observation being based on a sample from which that observation is excluded.
(This is known as the leave-one-out estimator.) For a broader discussion of
2955the cross-validation criterion, see Davidson and MacKinnon's Econometric
2956Theory and Methods, pages 685-686, and the references therein.
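
A minimal usage sketch, assuming the sample file data4-1 supplied with gretl: the series are saved under their default names and the cross-validation criterion is retrieved via "$test". The scalar name cv is arbitrary.

```hansl
open data4-1 --quiet
ols price const sqft --quiet

# Save leverage, influence and DFFITS without printing the table
leverage --save --quiet
scalar cv = $test
printf "cross-validation criterion = %g\n", cv
print lever influ dffits --byobs
```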
2957
2958By default, if this command is invoked interactively a plot of the leverage
2959and influence values is shown. This can be adjusted via the --plot option.
2960The acceptable parameters to this option are none (to suppress the plot);
2961display (to display a plot even when in script mode); or a file name. The
2962effect of providing a file name is as described for the --output option of
2963the "gnuplot" command.
2964
2965Menu path:    Model window, /Analysis/Influential observations
2966
2967# levinlin Tests
2968
2969Arguments:  order series
2970Options:    --nc (test without a constant)
2971            --ct (with constant and trend)
2972            --quiet (suppress printing of results)
2973            --verbose (print per-unit results)
2974Examples:   levinlin 0 y
2975            levinlin 2 y --ct
2976            levinlin {2,2,3,3,4,4} y
2977
2978Carries out the panel unit-root test described by Levin, Lin and Chu (2002).
2979The null hypothesis is that all of the individual time series exhibit a unit
2980root, and the alternative is that none of the series has a unit root. (That
2981is, a common AR(1) coefficient is assumed, although in other respects the
2982statistical properties of the series are allowed to vary across
2983individuals.)
2984
2985By default the test ADF regressions include a constant; to suppress the
2986constant use the --nc option, or to add a linear trend use the --ct option.
2987(See the "adf" command for explanation of ADF regressions.)
2988
2989The (non-negative) order for the test (governing the number of lags of the
2990dependent variable to include in the ADF regressions) may be given in either
2991of two forms. If a scalar value is given, this is applied to all the
2992individuals in the panel. The alternative is to provide a matrix containing
2993a specific lag order for each individual; this must be a vector with as many
2994elements as there are individuals in the current sample range. Such a matrix
2995can be specified by name, or constructed using braces as illustrated in the
2996last example above.
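
Both ways of giving the order can be tried on an artificial balanced panel. In the sketch below the panel dimensions (5 units, 30 periods), the series y and the matrix p are all invented for the illustration.

```hansl
nulldata 150
setobs 30 1:1 --stacked-time-series   # 5 units x 30 periods
set seed 99
series y = normal()

# Common lag order for all units
levinlin 1 y

# Unit-specific lag orders: one element per individual
matrix p = {1, 1, 2, 2, 3}
levinlin p y --ct
```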
2997
2998When the --verbose option is given, the following results are printed for
2999each unit in the panel: delta, the coefficient on the lagged level in each
3000ADF regression; s2e, the estimated variance of the innovations; and s2y, the
3001estimated long-run variance of the differenced series.
3002
3003Note that panel unit-root tests can also be conducted using the "adf" and
3004"kpss" commands.
3005
3006Menu path:    /Variable/Unit root tests/Levin-Lin-Chu test
3007
3008# logistic Estimation
3009
3010Arguments:  depvar indepvars
3011Options:    --ymax=value (specify maximum of dependent variable)
3012            --robust (robust standard errors)
3013            --cluster=clustvar (see "logit" for explanation)
3014            --vcv (print covariance matrix)
3015            --fixed-effects (see below)
3016            --quiet (don't print anything)
3017Examples:   logistic y const x
3018            logistic y const x --ymax=50
3019
3020Logistic regression: carries out an OLS regression using the logistic
3021transformation of the dependent variable,
3022
3023  log(y/(y* - y))
3024
3025In the case of panel data the specification may include individual fixed
3026effects.
3027
3028The dependent variable must be strictly positive. If all its values lie
between 0 and 1, the default is to use a y* value (the asymptotic maximum
of the dependent variable) of 1; if its values lie between 0 and 100, the
default y* is 100.
3032
3033If you wish to set a different maximum, use the --ymax option. Note that the
3034supplied value must be greater than all of the observed values of the
3035dependent variable.
3036
3037The fitted values and residuals from the regression are automatically
3038adjusted using the inverse of the logistic transformation:
3039
3040  y =~ E(y* / (1 + exp(-x)))
3041
3042where x represents either a fitted value or a residual from the OLS
3043regression using the logistic dependent variable. The reported values are
3044therefore comparable with the original dependent variable. The need for
3045approximation arises because the inverse transformation is nonlinear and
3046therefore does not conserve expectation.
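
To make the transformation concrete, the following sketch fits the same model twice on simulated data: once with logistic and once by applying the transformation by hand and running OLS. The data-generating process, the value y* = 50 and the names ly, b1 and b2 are assumptions made for the example.

```hansl
nulldata 80
set seed 2718
series x = normal()
# Simulated dependent variable bounded between 0 and 50
series y = 50 / (1 + exp(-(0.2 + 0.8*x + 0.3*normal())))

logistic y const x --ymax=50
matrix b1 = $coeff

# The same regression done by hand
series ly = log(y / (50 - y))
ols ly const x
matrix b2 = $coeff
print b1 b2
```

The two coefficient vectors are expected to coincide; what differs is that logistic reports fitted values and residuals on the scale of the original y.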
3047
3048The --fixed-effects option is applicable only if the dataset takes the form
3049of a panel. In that case we subtract the group means from the logistic
3050transform of the dependent variable and estimation proceeds as usual for
3051fixed effects.
3052
3053Note that if the dependent variable is binary, you should use the "logit"
3054command instead.
3055
3056Menu path:    /Model/Limited dependent variable/Logistic
3057Menu path:    /Model/Panel/FE logistic
3058
3059# logit Estimation
3060
3061Arguments:  depvar indepvars
3062Options:    --robust (robust standard errors)
3063            --cluster=clustvar (clustered standard errors)
3064            --multinomial (estimate multinomial logit)
3065            --vcv (print covariance matrix)
3066            --verbose (print details of iterations)
3067            --quiet (don't print results)
3068            --p-values (show p-values instead of slopes)
3069            --estrella (select pseudo-R-squared variant)
3070Examples:   keane.inp, oprobit.inp
3071
3072If the dependent variable is a binary variable (all values are 0 or 1)
3073maximum likelihood estimates of the coefficients on indepvars are obtained
3074via the Newton-Raphson method. As the model is nonlinear the slopes depend
3075on the values of the independent variables. By default the slopes with
3076respect to each of the independent variables are calculated (at the means of
3077those variables) and these slopes replace the usual p-values in the
3078regression output. This behavior can be suppressed by giving the --p-values
3079option. The chi-square statistic tests the null hypothesis that all
3080coefficients are zero apart from the constant.
3081
3082By default, standard errors are computed using the negative inverse of the
3083Hessian. If the --robust flag is given, then QML or Huber-White standard
3084errors are calculated instead. In this case the estimated covariance matrix
3085is a "sandwich" of the inverse of the estimated Hessian and the outer
3086product of the gradient; see chapter 10 of Davidson and MacKinnon (2004).
3087But if the --cluster option is given, then "cluster-robust" standard errors
3088are produced; see chapter 22 of the Gretl User's Guide for details.
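
The following sketch shows the default output (slopes at the means) next to the --p-values variant with robust standard errors. The data are simulated, and the names x and y as well as the chosen coefficients are arbitrary.

```hansl
nulldata 300
set seed 31415
series x = normal()
# Binary outcome with P(y = 1) = 1/(1 + exp(-(0.5 + x)))
series y = uniform() < 1 / (1 + exp(-(0.5 + x)))

logit y const x                       # slopes shown by default
logit y const x --p-values --robust   # p-values, QML standard errors
```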
3089
3090By default the pseudo-R-squared statistic suggested by McFadden (1974) is
3091shown, but in the binary case if the --estrella option is given, the variant
3092recommended by Estrella (1998) is shown instead. This variant arguably
3093mimics more closely the properties of the regular R^2 in the context of
3094least-squares estimation.
3095
3096If the dependent variable is not binary but is discrete, then by default it
3097is interpreted as an ordinal response, and Ordered Logit estimates are
3098obtained. However, if the --multinomial option is given, the dependent
3099variable is interpreted as an unordered response, and Multinomial Logit
3100estimates are produced. (In either case, if the variable selected as
3101dependent is not discrete an error is flagged.) In the multinomial case, the
3102accessor $mnlprobs is available after estimation, to get a matrix containing
3103the estimated probabilities of the outcomes at each observation
3104(observations in rows, outcomes in columns).
3105
3106If you want to use logit for analysis of proportions (where the dependent
3107variable is the proportion of cases having a certain characteristic, at each
3108observation, rather than a 1 or 0 variable indicating whether the
3109characteristic is present or not) you should not use the "logit" command,
3110but rather construct the logit variable, as in
3111
3112	series lgt_p = log(p/(1 - p))
3113
3114and use this as the dependent variable in an OLS regression. See chapter 12
3115of Ramanathan (2002).
3116
3117Menu path:    /Model/Limited dependent variable/Logit
3118
3119# logs Transformations
3120
3121Argument:   varlist
3122
3123The natural log of each of the series in varlist is obtained and the result
3124stored in a new series with the prefix l_ ("el" underscore). For example,
3125"logs x y" creates the new variables l_x = ln(x) and l_y = ln(y).
3126
3127Menu path:    /Add/Logs of selected variables
3128
3129# loop Programming
3130
3131Argument:   control
3132Options:    --progressive (enable special forms of certain commands)
3133            --verbose (echo commands and show confirmatory messages)
3134Examples:   loop 1000
3135            loop 1000 --progressive
3136            loop while essdiff > .00001
3137            loop i=1991..2000 --verbose
3138            loop for (r=-.99; r<=.99; r+=.01)
3139            loop foreach i xlist
3140            See also armaloop.inp, keane.inp
3141
3142This command opens a special mode in which the program accepts commands to
3143be executed repeatedly. You exit the mode of entering loop commands with
3144"endloop": at this point the stacked commands are executed.
3145
3146The parameter "control" may take any of five forms, as shown in the
3147examples: an integer number of times to repeat the commands within the loop;
"while" plus a boolean condition; a range of integer values for an index
variable; "for" plus three expressions in parentheses, separated by
3150semicolons (which emulates the for statement in the C programming language);
3151or "foreach" plus an index variable and a list.
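
The five forms of control are sketched below in one short script. The variable names (s, i, r, x1, x2, xlist) are chosen for the example only.

```hansl
nulldata 10
set seed 1
series x1 = normal()
series x2 = normal()
list xlist = x1 x2

scalar s = 0
loop 3                       # fixed number of iterations
    s += 1
endloop

loop while s < 10            # boolean condition
    s *= 2
endloop

loop i=1..3                  # integer range for an index variable
    printf "i = %d\n", i
endloop

loop for (r=0; r<=1; r+=0.25)    # C-style "for"
    printf "r = %g\n", r
endloop

loop foreach i xlist         # index variable plus a list
    printf "mean of $i = %g\n", mean($i)
endloop
```

In the foreach case the string $i is replaced by the name of each list member in turn, so mean($i) is evaluated as mean(x1) and then mean(x2).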
3152
3153See chapter 13 of the Gretl User's Guide for further details and examples.
3154The effect of the --progressive option (which is designed for use in Monte
3155Carlo simulations) is explained there. Not all gretl commands may be used
3156within a loop; the commands available in this context are also set out
3157there.
3158
3159By default, execution of commands proceeds more quietly within loops than in
3160other contexts. If you want more feedback on what's going on in a loop, give
3161the --verbose option.
3162
3163# mahal Statistics
3164
3165Argument:   varlist
3166Options:    --quiet (don't print anything)
3167            --save (add distances to the dataset)
3168            --vcv (print covariance matrix)
3169
3170Computes the Mahalanobis distances between the series in varlist. The
3171Mahalanobis distance is the distance between two points in a k-dimensional
3172space, scaled by the statistical variation in each dimension of the space.
3173For example, if p and q are two observations on a set of k variables with
3174covariance matrix C, then the Mahalanobis distance between the observations
3175is given by
3176
3177  sqrt((p - q)' * C-inverse * (p - q))
3178
3179where (p - q) is a k-vector. This reduces to Euclidean distance if the
3180covariance matrix is the identity matrix.
3181
3182The space for which distances are computed is defined by the selected
3183variables. For each observation in the current sample range, the distance is
3184computed between the observation and the centroid of the selected variables.
3185This distance is the multidimensional counterpart of a standard z-score, and
3186can be used to judge whether a given observation "belongs" with a group of
3187other observations.
3188
3189If the --vcv option is given, the covariance matrix and its inverse are
3190printed. If the --save option is given, the distances are saved to the
3191dataset under the name mdist (or mdist1, mdist2 and so on if there is
3192already a variable of that name).
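
As a cross-check on the definition, the distances from the centroid can also be computed with matrix functions. This is only a sketch: it assumes the sample file data4-1 supplied with gretl, and it assumes that the command uses the same sample covariance matrix as mcov(); if the conventions differ, the two columns would differ by a constant factor.

```hansl
open data4-1 --quiet
list L = price sqft bedrms
mahal L --save

# Hand computation: sqrt((x - xbar)' C^{-1} (x - xbar)) per observation
matrix X = {L}
matrix D = cdemean(X)            # deviations from column means
matrix Cinv = invpd(mcov(X))     # inverse of the covariance matrix
matrix d = sqrt(sumr((D * Cinv) .* D))
series mcheck = d
print mdist mcheck --byobs
```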
3193
3194Menu path:    /View/Mahalanobis distances
3195
3196# makepkg Programming
3197
3198Argument:   filename
3199Options:    --index (write auxiliary index file)
3200            --translations (write auxiliary strings file)
3201            --quiet (operate quietly)
3202
3203Supports creation of a gretl function package via the command line. The mode
3204of operation of this command depends on the extension of filename, which
3205must be either .gfn or .zip.
3206
3207Gfn mode
3208
3209Writes a gfn file. It is assumed that a package specification file, with the
3210same basename as filename but with the extension .spec, is accessible, along
3211with any auxiliary files that it references. It is also assumed that all the
3212functions to be packaged have been read into memory.
3213
3214Zip mode
3215
3216Writes a zip package file (gfn plus other materials). If a gfn file of the
3217same basename as filename is found, gretl checks for corresponding inp and
3218spec files: if these are both found and at least one of them is newer than
3219the gfn file then the gfn is rebuilt, otherwise the existing gfn is used. If
3220no such file is found, gretl first attempts to build the gfn.
3221
3222Gfn options
3223
3224The option flags support the writing of auxiliary files, intended for use
3225with gretl "addons". The index file is a short XML document containing basic
3226information about the package; it has the same basename as the package and
3227the extension .xml. The translations file contains strings from the package
3228that may be suitable for translation, in C format; for package foo this file
3229is named foo-i18n.c. These files are not produced if the command is
3230operating in zip mode and a pre-existing gfn file is used.
3231
3232For details on all of this, see the gretl Function Package Guide.
3233
3234Menu path:    /File/Function packages/New package
3235
3236# markers Dataset
3237
3238Variants:   markers --to-file=filename
3239            markers --to-array=name
3240            markers --from-file=filename
3241            markers --delete
3242
3243The options --to-file and --to-array provide means of saving the observation
3244marker strings from the current dataset, either to a named file or a named
3245array. If no such strings are present an error is flagged. In the file case
3246output will be written in the current "workdir" unless the filename string
3247contains a full path specification. The markers are written one per line. In
3248the array case, if name is the identifier of an existing array of strings it
3249will be overwritten, otherwise a new array will be created.
3250
3251With the option --from-file, reads the specified file (which should be plain
3252text) and assigns observation markers to the rows in the dataset, reading
3253one marker per line. In general there should be at least as many markers in
3254the file as observations in the dataset, but if the dataset is a panel it is
3255also acceptable if the number of markers in the file matches the number of
cross-sectional units (in which case the markers are repeated for each time
period).
3258
3259The --delete option does what you'd expect: it removes the observation
3260marker strings from the dataset.
3261
3262Menu path:    /Data/Observation markers
3263
3264# meantest Tests
3265
3266Arguments:  series1 series2
3267Option:     --unequal-vars (assume variances are unequal)
3268
3269Calculates the t statistic for the null hypothesis that the population means
3270are equal for the variables series1 and series2, and shows its p-value.
3271
3272By default the test statistic is calculated on the assumption that the
3273variances are equal for the two variables. With the --unequal-vars option
3274the variances are assumed to be different; in this case the degrees of
3275freedom for the test statistic are approximated as per Satterthwaite (1946).
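
A short sketch of both variants on simulated data, where the second series is given a larger variance so that the two results can be expected to differ. The series names a and b are placeholders.

```hansl
nulldata 60
set seed 101
series a = normal()
series b = 0.5 + 2*normal()

meantest a b                  # pooled-variance t-test
meantest a b --unequal-vars   # Satterthwaite degrees of freedom
```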
3276
3277Menu path:    /Tools/Test statistic calculator
3278
3279# midasreg Estimation
3280
3281Arguments:  depvar indepvars ; MIDAS-terms
3282Options:    --vcv (print covariance matrix)
3283            --robust (robust standard errors)
3284            --quiet (suppress printing of results)
3285            --levenberg (see below)
3286Examples:   midasreg y 0 y(-1) ; mds(X, 1, 9, 1, theta)
3287            midasreg y 0 y(-1) ; mds(X, 1, 9, 0)
3288            midasreg y 0 y(-1) ; mdsl(XL, 2, theta)
3289            See also gdp_midas.inp
3290
3291Carries out least-squares estimation (either NLS or OLS, depending on the
3292specification) of a MIDAS (Mixed Data Sampling) model. Such models include
3293one or more independent variables that are observed at a higher frequency
3294than the dependent variable; for a good brief introduction see Armesto,
3295Engemann and Owyang (2010).
3296
3297The variables in indepvars should be of the same frequency as the dependent
3298variable. This list should usually include const or 0 (intercept) and
3299typically includes one or more lags of the dependent variable. The
3300high-frequency terms are given after a semicolon; each one takes the form of
3301a number of comma-separated arguments within parentheses, prefixed by either
3302mds or mdsl.
3303
3304mds: this variant generally requires 5 arguments, as follows: the name of a
3305"MIDAS list", two integers giving the minimum and maximum high-frequency
lags, an integer between 0 and 5 (or string, see below) specifying the type
3307of parameterization to use, and the name of a vector holding initial values
3308of the parameters. The example below calls for lags 3 to 11 of the
3309high-frequency series represented by the list X, using parameterization type
33101 (exponential Almon, see below) with initializer theta.
3311
3312	mds(X, 3, 11, 1, theta)
3313
3314mdsl: generally requires 3 arguments: the name of a list of MIDAS lags, an
3315integer (or string, see below) to specify the type of parameterization and
3316the name of an initialization vector. In this case the minimum and maximum
lags are implicit in the initial list argument. In the example below XLags
3318should be a list which already holds all the required lags; such a list can
3319be constructed using the "hflags" function.
3320
3321	mdsl(XLags, 1, theta)
3322
3323The supported types of parameterization are shown below; in the context of
3324mds and mdsl specifications these may be given in the form of numeric codes
3325or the double-quoted strings shown after the numbers.
3326
33270 or "umidas": unrestricted MIDAS or U-MIDAS (each lag has its own
3328coefficient)
3329
33301 or "nealmon": normalized exponential Almon; requires at least one
3331parameter, commonly uses two
3332
33332 or "beta0": normalized beta with a zero last lag; requires exactly two
3334parameters
3335
33363 or "betan": normalized beta with non-zero last lag; requires exactly three
3337parameters
3338
33394 or "almonp": (non-normalized) Almon polynomial; requires at least one
3340parameter
3341
33425 or "beta1": as beta0, but with the first parameter fixed at 1, leaving a
3343single free parameter.
3344
3345When the parameterization is U-MIDAS, the final initializer argument is not
3346required. In other cases you can request an automatic initialization by
3347substituting one or other of these two forms for the name of an initial
3348parameter vector:
3349
3350  The keyword null: this is accepted if the parameterization has a fixed
3351  number of terms (the beta cases, with 2 or 3 parameters). It's also
3352  accepted for the exponential Almon case, implying the default of 2
3353  parameters.
3354
3355  An integer value giving the required number of parameters.
3356
3357The estimation method used by this command depends on the specification of
3358the high-frequency terms. In the case of U-MIDAS the method is OLS,
3359otherwise it is nonlinear least squares (NLS). When the normalized
3360exponential Almon or normalized beta parameterization is specified, the
3361default NLS method is a combination of constrained BFGS and OLS, but the
3362--levenberg option can be given to force use of the Levenberg-Marquardt
3363algorithm.
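
The sketch below puts these pieces together, loosely following the MIDAS example in the Gretl User's Guide. It assumes the sample file gdp_midas.gdt is installed, and that it contains a quarterly series qgdp and a MIDAS list of monthly series whose names begin with payems; these names, and the sample range, are recalled from that example and should be checked against your copy of the data.

```hansl
open gdp_midas.gdt --quiet

# Low-frequency dependent variable: quarterly GDP growth in percent
series dy = 100 * ldiff(qgdp)

# High-frequency regressor: lags 3 to 11 of monthly log differences
list X = payems*
list dXL = hflags(3, 11, hfldiff(X, 100))

smpl 1985:1 2009:1

# U-MIDAS: estimated by OLS, no initializer needed
midasreg dy 0 dy(-1) ; mdsl(dXL, 0)

# Normalized beta with zero last lag, automatic initialization
midasreg dy 0 dy(-1) ; mdsl(dXL, "beta0", null)
```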
3364
3365Menu path:    /Model/Univariate time series/MIDAS
3366
3367# mle Estimation
3368
3369Arguments:  log-likelihood function [ derivatives ]
3370Options:    --quiet (don't show estimated model)
3371            --vcv (print covariance matrix)
3372            --hessian (base covariance matrix on the Hessian)
3373            --robust[=hac] (QML or HAC covariance matrix)
3374            --cluster=clustvar (cluster-robust covariance matrix)
3375            --verbose (print details of iterations)
3376            --no-gradient-check (see below)
3377            --auxiliary (see below)
3378            --lbfgs (use L-BFGS-B instead of regular BFGS)
3379Examples:   weibull.inp, biprobit_via_ghk.inp, frontier.inp, keane.inp
3380
3381Performs Maximum Likelihood (ML) estimation using either the BFGS (Broyden,
3382Fletcher, Goldfarb, Shanno) algorithm or Newton's method. The user must
3383specify the log-likelihood function. The parameters of this function must be
3384declared and given starting values prior to estimation. Optionally, the user
3385may specify the derivatives of the log-likelihood function with respect to
3386each of the parameters; if analytical derivatives are not supplied, a
3387numerical approximation is computed.
3388
3389This help text assumes use of the default BFGS maximizer. For information on
3390using Newton's method please see chapter 26 of the Gretl User's Guide.
3391
3392Simple example: Suppose we have a series X with values 0 or 1 and we wish to
3393obtain the maximum likelihood estimate of the probability, p, that X = 1.
3394(In this simple case we can guess in advance that the ML estimate of p will
3395simply equal the proportion of Xs equal to 1 in the sample.)
3396
3397The parameter p must first be added to the dataset and given an initial
3398value. For example, scalar p = 0.5.
3399
3400We then construct the MLE command block:
3401
3402	mle loglik = X*log(p) + (1-X)*log(1-p)
3403	  deriv p = X/p - (1-X)/(1-p)
3404	end mle
3405
3406The first line above specifies the log-likelihood function. It starts with
3407the keyword mle, then a dependent variable is specified and an expression
3408for the log-likelihood is given (using the same syntax as in the "genr"
3409command). The next line (which is optional) starts with the keyword deriv
3410and supplies the derivative of the log-likelihood function with respect to
the parameter p. If no derivatives are given, you should include a statement
using the keyword params to identify the free parameters: these are listed
on one line, separated by spaces, and can be scalars, vectors, or any
combination of the two. For example, the above could be changed to:
3416
3417	mle loglik = X*log(p) + (1-X)*log(1-p)
3418	  params p
3419	end mle
3420
3421in which case numerical derivatives would be used.
3422
3423Note that any option flags should be appended to the ending line of the MLE
3424block. For example:
3425
3426	mle loglik = X*log(p) + (1-X)*log(1-p)
3427	  params p
3428	end mle --quiet
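
As a minimal sketch, the following script runs the example from start to
finish on simulated data. The sample size, the seed and the true probability
of 0.3 are arbitrary choices made for illustration and are not part of the
example above. After estimation the scalar p holds the ML estimate, which
should agree closely with the sample proportion.

	nulldata 200
	set seed 1234
	# artificial 0/1 series with Pr(X = 1) = 0.3
	series X = uniform() < 0.3
	# declare the parameter and give it a starting value
	scalar p = 0.5
	mle loglik = X*log(p) + (1-X)*log(1-p)
	  params p
	end mle --quiet
	printf "ML estimate = %g, sample proportion = %g\n", p, mean(X)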
3429
3430Covariance matrix and standard errors
3431
3432If the log-likelihood function returns a series or vector giving
3433per-observation values then estimated standard errors are by default based
3434on the Outer Product of the Gradient (OPG), while if the --hessian option is
3435given they are instead based on the negative inverse of the Hessian, which
3436is approximated numerically. If the --robust option is given, a QML
3437estimator is used (namely, a sandwich of the negative inverse of the Hessian
3438and the OPG). If the hac parameter is added to this option the OPG is
3439augmented in the manner of Newey and West to allow for serial correlation of
3440the gradient. (This only makes sense with time-series data.) However, if the
3441log-likelihood function just returns a scalar value, the OPG is not
3442available (and so neither is the QML estimator), and standard errors are of
3443necessity computed using the numerical Hessian.
3444
If you just want the primary parameter estimates you can give the
--auxiliary option, which suppresses computation of the covariance matrix
and standard errors; this saves some CPU time and memory.
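
The variants described in this section can be sketched as follows, reusing
the Bernoulli example on simulated data (the data-generating lines are
assumptions made for illustration). The log-likelihood here is given per
observation, so the OPG, Hessian and QML estimators should all be available.

	nulldata 200
	set seed 1234
	series X = uniform() < 0.3
	scalar p = 0.5
	# default: standard errors based on the OPG
	mle loglik = X*log(p) + (1-X)*log(1-p)
	  params p
	end mle
	# standard errors based on the numerical Hessian
	mle loglik = X*log(p) + (1-X)*log(1-p)
	  params p
	end mle --hessian
	# QML "sandwich" standard errors
	mle loglik = X*log(p) + (1-X)*log(1-p)
	  params p
	end mle --robust
	# point estimates only, no covariance matrix
	mle loglik = X*log(p) + (1-X)*log(1-p)
	  params p
	end mle --auxiliary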
3448
3449Checking analytical derivatives
3450
3451If you supply analytical derivatives, by default gretl runs a numerical
3452check on their plausibility. Occasionally this may produce false positives,
3453instances where correct derivatives appear to be wrong and estimation is
3454refused. To counter this, or to achieve a little extra speed, you can give
3455the option --no-gradient-check. Obviously, you should do this only if you
3456are confident that the gradient you have specified is right.
3457
3458Parameter names
3459
3460In estimating a nonlinear model it is often convenient to name the
3461parameters tersely. In printing the results, however, it may be desirable to
3462use more informative labels. This can be achieved via the additional keyword
3463param_names within the command block. For a model with k parameters the
3464argument following this keyword should be a double-quoted string literal
3465holding k space-separated names, the name of a string variable that holds k
3466such names, or the name of an array of k strings.
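
A short sketch of param_names in use, assuming a simple normal model fitted
to simulated data. The series y, the scalars m and s and the labels "mean"
and "std_dev" are all hypothetical names chosen for this illustration.

	nulldata 300
	set seed 5678
	series y = 2 + 1.5 * normal()
	# terse parameter names, started at the sample moments
	scalar m = mean(y)
	scalar s = sd(y)
	mle ll = -0.5*log(2*$pi) - log(s) - 0.5*((y - m)/s)^2
	  params m s
	  param_names "mean std_dev"
	end mle

The coefficient table should then show the labels mean and std_dev in place
of m and s.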
3467
3468For an in-depth description of "mle" please refer to chapter 26 of the Gretl
3469User's Guide.
3470
3471Menu path:    /Model/Maximum likelihood
3472
3473# modeltab Utilities
3474
3475Variants:   modeltab add
3476            modeltab show
3477            modeltab free
3478            modeltab --output=filename
3479
3480Manipulates the gretl "model table". See chapter 3 of the Gretl User's Guide
3481for details. The sub-commands have the following effects: "add" adds the
3482last model estimated to the model table, if possible; "show" displays the
3483model table in a window; and "free" clears the table.
3484
3485To call for printing of the model table, use the flag --output= plus a
3486filename. If the filename has the suffix ".tex", the output will be in TeX
3487format; if the suffix is ".rtf" the output will be RTF; otherwise it will be
3488plain text. In the case of TeX output the default is to produce a
3489"fragment", suitable for inclusion in a document; if you want a stand-alone
3490document instead, use the --complete option, for example
3491
3492	modeltab --output="myfile.tex" --complete
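
The sub-commands might be combined as in the following sketch, which
assumes the sample data file data4-1 (with series price, sqft and bedrms)
is available. The output filename is an arbitrary choice.

	open data4-1 --quiet
	# start from an empty model table
	modeltab free
	ols price 0 sqft --quiet
	modeltab add
	ols price 0 sqft bedrms --quiet
	modeltab add
	# display the table, then write it as a TeX fragment
	modeltab show
	modeltab --output="mytable.tex"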
3493
3494Menu path:    Session icon window, Model table icon
3495
3496# modprint Printing
3497
3498Arguments:  coeffmat names [ addstats ]
3499Option:     --output=filename (send output to specified file)
3500
3501Prints the coefficient table and optional additional statistics for a model
3502estimated "by hand". Mainly useful for user-written functions.
3503
3504The argument coeffmat should be a k by 2 matrix containing k coefficients
3505and k associated standard errors. The names argument should supply at least
3506k names for labeling the coefficients; it can take the form of a string
3507literal (in double quotes) or string variable, in which case the names
3508should be separated by commas or spaces, or it may be given as a named array
3509of strings.
3510
3511The optional argument addstats is a vector containing p additional
3512statistics to be printed under the coefficient table. If this argument is
3513given, then names should contain k + p names, the additional p names to be
3514associated with the extra statistics.
3515
3516If addstats is not provided and the coeffmat matrix has row names attached,
3517then the names argument can be omitted.
3518
3519To put the output into a file, use the flag --output= plus a filename. If
3520the filename has the suffix ".tex", the output will be in TeX format; if the
3521suffix is ".rtf" the output will be RTF; otherwise it will be plain text. In
3522the case of TeX output the default is to produce a "fragment", suitable for
3523inclusion in a document; if you want a stand-alone document instead, use the
3524--complete option.
3525
3526The output file will be written in the currently set "workdir", unless the
3527filename string contains a full path specification.
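
The following sketch estimates a regression "by hand" via the mols function
and passes the results to modprint. It assumes the sample data file data4-1;
the matrix names and the choice of additional statistics (the number of
observations and the sum of squared residuals) are illustrative only.

	open data4-1 --quiet
	matrix y = {price}
	matrix X = {const, sqft}
	matrix U = {}
	matrix V = {}
	# coefficients, with residuals in U and covariance matrix in V
	matrix b = mols(y, X, &U, &V)
	# k x 2 matrix: coefficients and standard errors
	matrix cs = b ~ sqrt(diag(V))
	# p = 2 additional statistics
	matrix stats = rows(y) | (U'U)
	# k + p = 4 names
	modprint cs "const sqft nobs SSR" stats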
3528
3529# modtest Tests
3530
3531Argument:   [ order ]
3532Options:    --normality (normality of residual)
3533            --logs (nonlinearity, logs)
3534            --squares (nonlinearity, squares)
3535            --autocorr (serial correlation)
3536            --arch (ARCH)
3537            --white (heteroskedasticity, White's test)
3538            --white-nocross (White's test, squares only)
3539            --breusch-pagan (heteroskedasticity, Breusch-Pagan)
3540            --robust (robust variance estimate for Breusch-Pagan)
3541            --panel (heteroskedasticity, groupwise)
3542            --comfac (common factor restriction, AR1 models only)
3543            --xdepend (cross-sectional dependence, panel data only)
3544            --quiet (don't print details)
3545            --silent (don't print anything)
3546Examples:   credscore.inp
3547
3548Must immediately follow an estimation command. The discussion below applies
3549to usage of the command following estimation of a single-equation model; see
3550chapter 32 of the Gretl User's Guide for an account of how "modtest"
3551operates after estimation of a VAR.
3552
3553Depending on the option given, this command carries out one of the
3554following: the Doornik-Hansen test for the normality of the error term; a
3555Lagrange Multiplier test for nonlinearity (logs or squares); White's test
3556(with or without cross-products) or the Breusch-Pagan test (Breusch and
3557Pagan, 1979) for heteroskedasticity; the LMF test for serial correlation
3558(Kiviet, 1986); a test for ARCH (Autoregressive Conditional
3559Heteroskedasticity; see also the "arch" command); a test of the common
3560factor restriction implied by AR(1) estimation; or a test for
cross-sectional dependence in panel-data models. Apart from the normality,
common factor and cross-sectional dependence tests, most of these options
are available only for models estimated via OLS, but see below for details
regarding two-stage least squares.
3565
The optional order argument is relevant only if the --autocorr or --arch
option is selected. The default is to run these tests using a lag order
equal to the periodicity of the data, but this can be adjusted by supplying
a specific lag order.
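
For instance, with quarterly data the default lag order would be 4. The
sketch below assumes the quarterly sample data file data9-7 (with series
QNC, PRICE and INCOME); the particular lag orders are arbitrary.

	open data9-7 --quiet
	ols QNC 0 PRICE INCOME --quiet
	# default lag order: the periodicity of the data
	modtest --autocorr --quiet
	# explicit lag order of 2
	modtest 2 --autocorr --quiet
	# ARCH test of order 1
	modtest 1 --arch --quiet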
3570
3571The --robust option applies only when the Breusch-Pagan test is selected;
3572its effect is to use the robust variance estimator proposed by Koenker
3573(1981), making the test less sensitive to the assumption of normality.
3574
3575The --panel option is available only when the model is estimated on panel
3576data: in this case a test for groupwise heteroskedasticity is performed
3577(that is, for a differing error variance across the cross-sectional units).
3578
3579The --comfac option is available only when the model is estimated via an
3580AR(1) method such as Hildreth-Lu. The auxiliary regression takes the form of
3581a relatively unrestricted dynamic model, which is used to test the common
3582factor restriction implicit in the AR(1) specification.
3583
3584The --xdepend option is available only for models estimated on panel data.
3585The test statistic is that developed by Pesaran (2004). The null hypothesis
3586is that the error term is independently distributed across the
3587cross-sectional units or individuals.
3588
3589By default, the program prints the auxiliary regression on which the test
3590statistic is based, where applicable. This may be suppressed by using the
3591--quiet flag (minimal printed output) or the --silent flag (no printed
3592output). The test statistic and its p-value may be retrieved using the
3593accessors "$test" and "$pvalue" respectively.
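
A minimal sketch of running a test quietly and retrieving the result,
assuming the sample data file data4-1; the scalar names are hypothetical.

	open data4-1 --quiet
	ols price 0 sqft bedrms --quiet
	# White's test, minimal printed output
	modtest --white --quiet
	scalar wstat = $test
	scalar wpval = $pvalue
	printf "White's test: statistic = %g, p-value = %g\n", wstat, wpval
	# Breusch-Pagan test with Koenker's robust variance estimator
	modtest --breusch-pagan --robust --silent
	printf "Breusch-Pagan (robust): p-value = %g\n", $pvalue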
3594
3595When a model has been estimated by two-stage least squares (see "tsls"), the
3596LM principle breaks down and gretl offers some equivalents: the --autocorr
3597option computes Godfrey's test for autocorrelation (Godfrey, 1994) while the
3598--white option yields the HET1 heteroskedasticity test (Pesaran and Taylor,
35991999).
3600
3601For additional diagnostic tests on models, see "chow", "cusum", "reset" and
3602"qlrtest".
3603
3604Menu path:    Model window, /Tests
3605
3606# mpols Estimation
3607
3608Arguments:  depvar indepvars
3609Options:    --vcv (print covariance matrix)
3610            --simple-print (do not print auxiliary statistics)
3611            --quiet (suppress printing of results)
3612
3613Computes OLS estimates for the specified model using multiple precision
3614floating-point arithmetic, with the help of the Gnu Multiple Precision (GMP)
3615library. By default 256 bits of precision are used for the calculations, but
3616this can be increased via the environment variable GRETL_MP_BITS. For
3617example, when using the bash shell one could issue the following command,
3618before starting gretl, to set a precision of 1024 bits.
3619
3620	export GRETL_MP_BITS=1024
3621
3622A rather arcane option is available for this command (primarily for testing
3623purposes): if the indepvars list is followed by a semicolon and a further
3624list of numbers, those numbers are taken as powers of x to be added to the
3625regression, where x is the last variable in indepvars. These additional
3626terms are computed and stored in multiple precision. In the following
3627example y is regressed on x and the second, third and fourth powers of x:
3628
3629	mpols y 0 x ; 2 3 4
3630
3631Menu path:    /Model/Other linear models/High precision OLS
3632
3633# negbin Estimation
3634
3635Arguments:  depvar indepvars [ ; offset ]
3636Options:    --model1 (use NegBin 1 model)
3637            --robust (QML covariance matrix)
3638            --cluster=clustvar (see "logit" for explanation)
3639            --opg (see below)
3640            --vcv (print covariance matrix)
3641            --verbose (print details of iterations)
3642            --quiet (don't print results)
3643Examples:   camtriv.inp
3644
3645Estimates a Negative Binomial model. The dependent variable is taken to
3646represent a count of the occurrence of events of some sort, and must have
3647only non-negative integer values. By default the model NegBin 2 is used, in
3648which the conditional variance of the count is given by mu(1 + αmu), where
3649mu denotes the conditional mean. But if the --model1 option is given the
3650conditional variance is mu(1 + α).
3651
3652The optional offset series works in the same way as for the "poisson"
3653command. The Poisson model is a restricted form of the Negative Binomial in
3654which α = 0 by construction.
3655
3656By default, standard errors are computed using a numerical approximation to
3657the Hessian at convergence. But if the --opg option is given the covariance
3658matrix is based on the Outer Product of the Gradient (OPG), or if the
3659--robust option is given QML standard errors are calculated, using a
3660"sandwich" of the inverse of the Hessian and the OPG.
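
As an illustration, the following sketch simulates overdispersed count data
and estimates both variants of the model. The data-generating process (a
Poisson count whose mean is scaled by a gamma variate with mean 1 and
variance 0.5, so that α is 0.5 under NegBin 2) and all names are assumptions
made for this example.

	nulldata 500
	set seed 4321
	series x = normal()
	# gamma heterogeneity: shape 2, scale 0.5
	series h = randgen(G, 2, 0.5)
	series y = randgen(P, exp(0.5 + 0.3*x) * h)
	# NegBin 2 (the default), Hessian-based standard errors
	negbin y 0 x
	# NegBin 1 with OPG-based standard errors
	negbin y 0 x --model1 --opg
	# NegBin 2 with QML standard errors
	negbin y 0 x --robust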
3661
3662Menu path:    /Model/Limited dependent variable/Count data
3663
3664# nls Estimation
3665
3666Arguments:  function [ derivatives ]
3667Options:    --quiet (don't show estimated model)
3668            --robust (robust standard errors)
3669            --vcv (print covariance matrix)
3670            --verbose (print details of iterations)
3671            --no-gradient-check (see below)
3672Examples:   wg_nls.inp, ects_nls.inp
3673
3674Performs Nonlinear Least Squares (NLS) estimation using a modified version
3675of the Levenberg-Marquardt algorithm. You must supply a function
3676specification. The parameters of this function must be declared and given
3677starting values prior to estimation. Optionally, you may specify the
3678derivatives of the regression function with respect to each of the
3679parameters. If you do not supply derivatives you should instead give a list
3680of the parameters to be estimated (separated by spaces or commas), preceded
3681by the keyword params. In the latter case a numerical approximation to the
3682Jacobian is computed.
3683
3684It is easiest to show what is required by example. The following is a
3685complete script to estimate the nonlinear consumption function set out in
3686William Greene's Econometric Analysis (Chapter 11 of the 4th edition, or
3687Chapter 9 of the 5th). The numbers to the left of the lines are for
3688reference and are not part of the commands. Note that any option flags, such
3689as --vcv for printing the covariance matrix of the parameter estimates,
3690should be appended to the final command, end nls.
3691
3692	1   open greene11_3.gdt
3693	2   ols C 0 Y
3694	3   scalar a = $coeff(0)
3695	4   scalar b = $coeff(Y)
3696	5   scalar g = 1.0
3697	6   nls C = a + b * Y^g
3698	7    deriv a = 1
3699	8    deriv b = Y^g
3700	9    deriv g = b * Y^g * log(Y)
3701	10  end nls --vcv
3702
It is often convenient to initialize the parameters by reference to a
related linear model; that is accomplished here on lines 2 to 5. The
parameters a, b and g could be set to any initial values (not necessarily
based on a model estimated with OLS), although convergence of the NLS
procedure is not guaranteed for an arbitrary starting point.
3708
3709The actual NLS commands occupy lines 6 to 10. On line 6 the "nls" command is
3710given: a dependent variable is specified, followed by an equals sign,
3711followed by a function specification. The syntax for the expression on the
3712right is the same as that for the "genr" command. The next three lines
3713specify the derivatives of the regression function with respect to each of
3714the parameters in turn. Each line begins with the keyword "deriv", gives the
3715name of a parameter, an equals sign, and an expression whereby the
3716derivative can be calculated. As an alternative to supplying analytical
3717derivatives, you could substitute the following for lines 7 to 9:
3718
3719	params a b g
3720
3721Line 10, "end nls", completes the command and calls for estimation. Any
3722options should be appended to this line.
3723
3724If you supply analytical derivatives, by default gretl runs a numerical
3725check on their plausibility. Occasionally this may produce false positives,
3726instances where correct derivatives appear to be wrong and estimation is
3727refused. To counter this, or to achieve a little extra speed, you can give
3728the option --no-gradient-check. Obviously, you should do this only if you
3729are confident that the gradient you have specified is right.
3730
3731Parameter names
3732
3733In estimating a nonlinear model it is often convenient to name the
3734parameters tersely. In printing the results, however, it may be desirable to
3735use more informative labels. This can be achieved via the additional keyword
3736param_names within the command block. For a model with k parameters the
3737argument following this keyword should be a double-quoted string literal
3738holding k space-separated names, the name of a string variable that holds k
3739such names, or the name of an array of k strings.
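
As a sketch, the consumption function example above could be estimated with
numerical derivatives and more informative labels as follows. The labels
"alpha beta gamma" and the use of --robust are illustrative choices.

	open greene11_3.gdt
	ols C 0 Y --quiet
	scalar a = $coeff(0)
	scalar b = $coeff(Y)
	scalar g = 1.0
	nls C = a + b * Y^g
	  params a b g
	  param_names "alpha beta gamma"
	end nls --robust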
3740
3741For further details on NLS estimation please see chapter 25 of the Gretl
3742User's Guide.
3743
3744Menu path:    /Model/Nonlinear Least Squares
3745
3746# normtest Tests
3747
3748Argument:   series
3749Options:    --dhansen (Doornik-Hansen test, the default)
3750            --swilk (Shapiro-Wilk test)
3751            --lillie (Lilliefors test)
3752            --jbera (Jarque-Bera test)
3753            --all (do all tests)
3754            --quiet (suppress printed output)
3755
3756Carries out a test for normality for the given series. The specific test is
3757controlled by the option flags (but if no flag is given, the Doornik-Hansen
3758test is performed). Note: the Doornik-Hansen and Shapiro-Wilk tests are
3759recommended over the others, on account of their superior small-sample
3760properties.
3761
3762The test statistic and its p-value may be retrieved using the accessors
3763"$test" and "$pvalue". Please note that if the --all option is given, the
3764result recorded is that from the Doornik-Hansen test.
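
A brief sketch on simulated data (the series name, sample size and seed are
arbitrary):

	nulldata 200
	set seed 2718
	series z = normal()
	# Doornik-Hansen test, no printed output
	normtest z --quiet
	printf "Doornik-Hansen: statistic = %g, p-value = %g\n", $test, $pvalue
	# run and print all four tests
	normtest z --all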
3765
3766Menu path:    /Variable/Normality test
3767
3768# nulldata Dataset
3769
3770Argument:   series_length
3771Option:     --preserve (preserve variables other than series)
3772Example:    nulldata 500
3773
3774Establishes a "blank" data set, containing only a constant and an index
3775variable, with periodicity 1 and the specified number of observations. This
3776may be used for simulation purposes: functions such as "uniform()" and
3777"normal()" will generate artificial series from scratch to fill out the data
3778set. This command may be useful in conjunction with "loop". See also the
3779"seed" option to the "set" command.
3780
3781By default, this command cleans out all data in gretl's current workspace:
3782not only series but also matrices, scalars, strings, etc. If you give the
3783--preserve option, however, any currently defined variables other than
3784series are retained.
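
The effect of --preserve can be sketched as follows; the matrix M and the
series e are hypothetical names used only for illustration.

	# define a non-series variable first
	matrix M = I(3)
	# new blank dataset; M survives thanks to --preserve
	nulldata 100 --preserve
	set seed 1357
	series e = normal()
	print M
	summary e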
3785
3786Menu path:    /File/New data set
3787
3788# ols Estimation
3789
3790Arguments:  depvar indepvars
3791Options:    --vcv (print covariance matrix)
3792            --robust (robust standard errors)
3793            --cluster=clustvar (clustered standard errors)
3794            --jackknife (see below)
3795            --simple-print (do not print auxiliary statistics)
3796            --quiet (suppress printing of results)
3797            --anova (print an ANOVA table)
3798            --no-df-corr (suppress degrees of freedom correction)
3799            --print-final (see below)
3800Examples:   ols 1 0 2 4 6 7
3801            ols y 0 x1 x2 x3 --vcv
3802            ols y 0 x1 x2 x3 --quiet
3803
3804Computes ordinary least squares (OLS) estimates with depvar as the dependent
3805variable and indepvars as the list of independent variables. Variables may
3806be specified by name or number; use the number zero for a constant term.
3807
Besides coefficient estimates and standard errors, the program also prints
p-values for the t-statistics (two-tailed) and the F-statistic. A p-value
below 0.01 indicates statistical significance at the 1 percent level and is
marked with ***; ** marks a p-value between 0.01 and 0.05, and * marks a
p-value between 0.05 and 0.10. Model selection statistics (the Akaike
3813Information Criterion or AIC and Schwarz's Bayesian Information Criterion)
3814are also printed. The formula used for the AIC is that given by Akaike
3815(1974), namely minus two times the maximized log-likelihood plus two times
3816the number of parameters estimated.
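
The AIC formula can be checked by hand along the following lines. This
sketch assumes the sample data file data4-1, and it assumes that the number
of parameters counted is the number of regression coefficients (as given by
the accessor $ncoeff); the two printed values can then be compared.

	open data4-1 --quiet
	ols price 0 sqft bedrms --quiet
	# minus two times the log-likelihood plus two times k
	scalar aic_check = -2*$lnl + 2*$ncoeff
	printf "AIC reported = %g, computed = %g\n", $aic, aic_check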
3817
3818If the option --no-df-corr is given, the usual degrees of freedom correction
3819is not applied when calculating the estimated error variance (and hence also
3820the standard errors of the parameter estimates).
3821
3822The option --print-final is applicable only in the context of a "loop". It
3823arranges for the regression to be run silently on all but the final
3824iteration of the loop. See chapter 13 of the Gretl User's Guide for details.
3825
3826Various internal variables may be retrieved following estimation. For
3827example
3828
3829	series uh = $uhat
3830
3831saves the residuals under the name uh. See the "accessors" section of the
3832gretl function reference for details.
3833
3834The specific formula ("HC" version) used for generating robust standard
3835errors when the --robust option is given can be adjusted via the "set"
3836command. The --jackknife option has the effect of selecting an hc_version of
3a. The --cluster option overrides the selection of HC version, and produces
robust
3838standard errors by grouping the observations by the distinct values of
3839clustvar; see chapter 22 of the Gretl User's Guide for details.
3840
3841Menu path:    /Model/Ordinary Least Squares
3842Other access: Beta-hat button on toolbar
3843
3844# omit Tests
3845
3846Argument:   varlist
3847Options:    --test-only (don't replace the current model)
3848            --chi-square (give chi-square form of Wald test)
3849            --quiet (print only the basic test result)
3850            --silent (don't print anything)
3851            --vcv (print covariance matrix for reduced model)
3852            --auto[=alpha] (sequential elimination, see below)
3853Examples:   omit 5 7 9
3854            omit seasonals --quiet
3855            omit --auto
3856            omit --auto=0.05
3857            See also restrict.inp, sw_ch12.inp, sw_ch14.inp
3858
3859This command must follow an estimation command. In its primary form, it
3860calculates a Wald test for the joint significance of the variables in
3861varlist, which should be a subset (though not necessarily a proper subset)
3862of the independent variables in the model last estimated. The results of the
3863test may be retrieved using the accessors "$test" and "$pvalue".
3864
3865Unless the restriction removes all the original regressors, by default the
3866restricted model is estimated and it replaces the original as the "current
3867model" for the purposes of, for example, retrieving the residuals as $uhat
3868or doing further tests. This behavior may be suppressed via the --test-only
3869option.
3870
3871By default the F-form of the Wald test is recorded; the --chi-square option
3872may be used to record the chi-square form instead.
3873
If the restricted model is both estimated and printed, the --vcv option has
the effect of printing its covariance matrix; otherwise this option is
ignored.
3877
3878Alternatively, if the --auto flag is given, sequential elimination is
3879performed: at each step the variable with the highest p-value is omitted,
3880until all remaining variables have a p-value no greater than some cutoff.
3881The default cutoff is 10 percent (two-sided); this can be adjusted by
3882appending "=" and a value between 0 and 1 (with no spaces), as in the fourth
3883example above. If varlist is given this process is confined to the listed
3884variables, otherwise all regressors aside from the constant are treated as
3885candidates for omission. Note that the --auto and --test-only options cannot
3886be combined.
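
The two modes of operation might be used as in this sketch, which assumes
the sample data file data4-1 (with series price, sqft, bedrms and baths).

	open data4-1 --quiet
	ols price 0 sqft bedrms baths --quiet
	# Wald test only; the original model stays current
	omit bedrms baths --test-only --quiet
	printf "Wald F = %g, p-value = %g\n", $test, $pvalue
	# sequential elimination with a 5 percent cutoff
	omit --auto=0.05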
3887
3888Menu path:    Model window, /Tests/Omit variables
3889
3890# open Dataset
3891
3892Argument:   filename
3893Options:    --quiet (don't print list of series)
3894            --preserve (preserve variables other than series)
3895            --select=selection (read only the specified series, see below)
3896            --frompkg=pkgname (see below)
3897            --all-cols (see below)
3898            --www (use a database on the gretl server)
3899            --odbc (use an ODBC database)
3900            See below for additional specialized options
3901Examples:   open data4-1
3902            open voter.dta
3903            open fedbog.bin --www
3904            open dbnomics
3905
3906Opens a data file or database -- see chapter 4 of the Gretl User's Guide for
3907an explanation of this distinction. The effect is somewhat different in the
3908two cases. When a data file is opened, its content is read into gretl's
3909workspace, replacing the current dataset (if any). To add data to the
3910current dataset instead of replacing, see "append" or (for greater
3911flexibility) "join". When a database is opened this does not immediately
3912load any data; rather, it sets the source for subsequent invocations of the
3913"data" command, which is used to import selected series. For specifics
3914regarding databases see the section headed "Opening a database" below.
3915
3916If filename is not given as a full path, gretl will search some relevant
3917paths to try to find the file, with "workdir" as a first choice. If no
3918filename suffix is given (as in the first example above), gretl assumes a
3919native datafile with suffix .gdt. Based on the name of the file and various
3920heuristics, gretl will try to detect the format of the data file (native,
3921plain text, CSV, MS Excel, Stata, SPSS, etc.).
3922
3923If the --frompkg option is used, gretl will look for the specified data file
3924in the subdirectory associated with the function package specified by
3925pkgname.
3926
3927If the filename argument takes the form of a URI starting with http:// or
3928https://, then gretl will attempt to download the indicated data file before
3929opening it.
3930
3931By default, opening a new data file clears the current gretl session, which
3932includes deletion of all named variables, including matrices, scalars and
3933strings. If you wish to keep your currently defined variables (other than
3934series, which are necessarily cleared out), use the --preserve option.
3935
3936Spreadsheet files
3937
3938When opening a data file in a spreadsheet format (Gnumeric, Open Document or
3939MS Excel), you may give up to three additional parameters following the
3940filename. First, you can select a particular worksheet within the file. This
3941is done either by giving its (1-based) number, using the syntax, e.g.,
3942--sheet=2, or, if you know the name of the sheet, by giving the name in
3943double quotes, as in --sheet="MacroData". The default is to read the first
3944worksheet. You can also specify a column and/or row offset into the
3945worksheet via, e.g.,
3946
3947	--coloffset=3 --rowoffset=2
3948
3949which would cause gretl to ignore the first 3 columns and the first 2 rows.
3950The default is an offset of 0 in both dimensions, that is, to start reading
3951at the top-left cell.
3952
3953Delimited text files
3954
3955With plain text files, gretl generally expects to find the data columns
3956delimited in some standard manner (generally via comma, tab, space or
3957semicolon). By default gretl looks for observation labels or dates in the
3958first column if its heading is empty or is a suggestive string such as
3959"year", "date" or "obs". You can prevent gretl from treating the first
3960column specially by giving the --all-cols option.
3961
3962Fixed format text
3963
3964A "fixed format" text data file is one without column delimiters, but in
3965which the data are laid out according to a known set of specifications such
3966as "variable k occupies 8 columns starting at column 24". To read such
3967files, you should append a string --fixed-cols=colspec, where colspec is
3968composed of comma-separated integers. These integers are interpreted as a
3969set of pairs. The first element of each pair denotes a starting column,
3970measured in bytes from the beginning of the line with 1 indicating the first
3971byte; and the second element indicates how many bytes should be read for the
3972given field. So, for example, if you say
3973
3974	open fixed.txt --fixed-cols=1,6,20,3
3975
3976then for variable 1 gretl will read 6 bytes starting at column 1; and for
3977variable 2, 3 bytes starting at column 20. Lines that are blank, or that
3978begin with #, are ignored, but otherwise the column-reading template is
3979applied, and if anything other than a valid numerical value is found an
3980error is flagged. If the data are read successfully, the variables will be
3981named v1, v2, etc. It's up to the user to provide meaningful names and/or
3982descriptions using the commands "rename" and/or "setinfo".
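
Continuing the example, a follow-up might look like the sketch below. The
file fixed.txt is the hypothetical file from the example above, and the new
names and descriptions are invented for illustration.

	open fixed.txt --fixed-cols=1,6,20,3
	# replace the default names v1 and v2
	rename v1 income
	rename v2 age
	setinfo income --description="Household income"
	setinfo age --description="Age in years"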
3983
3984By default, when you import a file that contains string-valued series, a
3985text box will open showing you the contents of string_table.txt, a file
3986which contains the mapping between strings and their numeric coding. You can
3987suppress this behavior via the --quiet option.
3988
3989Loading selected series
3990
3991Use of open with a data file argument (as opposed to the database case, see
3992below) generally implies loading all series from the specified file.
3993However, in the case of native gretl files (gdt and gdtb) only, it is
3994possible to specify by name a subset of series to load. This is done via the
3995--select option, which requires an accompanying argument in one of three
3996forms: the name of a single series; a list of names, separated by spaces and
3997enclosed in double quotes; or the name of an array of strings. Examples:
3998
3999	# single series
4000	open somefile.gdt --select=x1
4001	# more than one series
4002	open somefile.gdt --select="x1 x5 x27"
4003	# alternative method
4004	strings Sel = defarray("x1", "x5", "x27")
4005	open somefile.gdt --select=Sel
4006
4007Opening a database
4008
4009As mentioned above, the open command can be used to open a database file for
4010subsequent reading via the "data" command. Supported file-types are native
4011gretl databases, RATS 4.0 and PcGive.
4012
4013Besides reading a file of one of these types on the local machine, three
4014further cases are supported. First, if the --www option is given, gretl will
4015try to access a native gretl database of the given name on the gretl server
4016-- for instance the Federal Reserve interest rates database fedbog.bin in
4017the third example shown above. Second, the command "open dbnomics" can be
4018used to set DB.NOMICS as the source for database reads; on this see dbnomics
4019for gretl. Third, if the --odbc option is given gretl will try to access an
4020ODBC database. This option is explained at length in chapter 42 of the Gretl
4021User's Guide.
4022
4023Menu path:    /File/Open data
4024Other access: Drag a data file onto gretl's main window
4025
4026# orthdev Transformations
4027
4028Argument:   varlist
4029
4030Applicable with panel data only. A series of forward orthogonal deviations
4031is obtained for each variable in varlist and stored in a new variable with
4032the prefix o_. Thus "orthdev x y" creates the new variables o_x and o_y.
4033
4034The values are stored one step ahead of their true temporal location (that
4035is, o_x at observation t holds the deviation that, strictly speaking,
4036belongs at t - 1). This is for compatibility with first differences: one
4037loses the first observation in each time series, not the last.
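
As a minimal sketch of the above, the following lines create and inspect orthogonal deviations for two series. The script assumes that the sample panel dataset abdata.gdt (with series n and w) is available in your gretl installation; any panel dataset would do in its place.

```hansl
# assumption: abdata.gdt, a sample panel dataset, is installed with gretl
open abdata.gdt

# creates o_n and o_w
orthdev n w

# compare the original and transformed values for the first few rows;
# per the note above, the first observation of each unit is the one lost
print n o_n --byobs --range=1:9
```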
4038
4039# outfile Printing
4040
4041Variants:   outfile filename
4042            outfile --buffer=strvar
4043            outfile --tempfile=strvar
4044Options:    --append (append to file, first variant only)
4045            --quiet (see below)
4046            --buffer (see below)
4047            --tempfile (see below)
4048
4049The outfile command starts a block in which any printed output is diverted
4050to a file or buffer (or just discarded, if you wish). Such a block is
4051terminated by the command "end outfile", after which output reverts to the
4052default stream.
4053
4054Diversion to a named file
4055
4056The first variant shown above sends output to a file named by the filename
4057argument. By default a new file is created (or an existing one is
4058overwritten). The output file will be written in the currently set
4059"workdir", unless the filename string contains a full path specification to
4060the contrary. If you wish to append output to an existing file instead, use
4061the --append flag.
4062
4063Some special variations on this theme are available. If you give the keyword
4064null in place of a real filename the effect is to suppress all printed
output until redirection is ended. If either of the keywords stdout or
stderr is given in place of a regular filename the effect is to redirect
output to standard output or standard error output respectively.
4068
4069A simple example follows, where the output from a particular regression is
4070written to a named file.
4071
4072	open data4-10
4073	outfile regress.txt
4074	  ols ENROLL 0 CATHOL INCOME COLLEGE
4075	end outfile
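
Two variations on this example may be useful; they are sketches rather than prescriptions. The first uses the null keyword to discard the printed output while keeping the model accessors available; the second uses --append to add output to regress.txt rather than overwriting it. The second model specification is chosen purely for illustration.

```hansl
open data4-10

# discard the printed output, then report a single statistic
outfile null
  ols ENROLL 0 CATHOL INCOME COLLEGE
end outfile
printf "R-squared = %g\n", $rsq

# append a second (illustrative) model to the existing file
outfile regress.txt --append
  ols ENROLL 0 CATHOL INCOME
end outfile
```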
4076
4077Diversion to a string buffer
4078
4079The --buffer option is used to store output in a string variable. The
4080required parameter for this option must be the name of an existing string
4081variable, whose content will be over-written. We show below the example
4082given above, revised to write to a string. In this case printing model_out
4083will display the redirected output.
4084
4085	open data4-10
4086	string model_out = ""
4087	outfile --buffer=model_out
4088	  ols ENROLL 0 CATHOL INCOME COLLEGE
4089	end outfile
4090	print model_out
4091
4092Diversion to a temporary file
4093
4094The --tempfile option is used to direct output to a temporary file, with an
4095automatically constructed name that is guaranteed to be unique, in the
4096user's "dot" directory. As in the redirection to buffer case, the option
4097parameter should be the name of a string variable: in this case its content
4098is over-written with the name of the temporary file. Please note: files
4099written to the dot directory are cleaned up on exit from the program, so
don't use this form if you want the output to be preserved after your gretl
4101session.
4102
We repeat the simple example from above, with a couple of extra lines to
illustrate two points: the string variable (here mytemp) tells you where the
output went, and you can retrieve the output using the "readfile" function.
4106
4107	open data4-10
4108	string mytemp
4109	outfile --tempfile=mytemp
4110	  ols ENROLL 0 CATHOL INCOME COLLEGE
4111	end outfile
4112	printf "Output went to %s\n", mytemp
4113	printf "The output was:\n%s\n", readfile(mytemp)
4114
4115Quietness
4116
4117The effect of the --quiet option is to turn off the echoing of commands and
4118the printing of auxiliary messages while output is redirected. It is
4119equivalent to doing
4120
4121	set echo off
4122	set messages off
4123
4124except that when redirection is ended the original values of the echo and
4125messages variables are restored. This option is available in all cases.
4126
4127Levels of redirection
4128
4129In general only one file can be opened in this way at any given time, so
4130calls to this command cannot be nested. However, use of this command is
4131permitted inside user-defined functions (provided the output file is also
4132closed from inside the same function) such that output can be temporarily
4133diverted and then given back to an original output file, in case outfile is
4134currently in use by the caller. For example, the code
4135
4136	function void f (string s)
4137	    outfile inner.txt
4138	      print s
4139	    end outfile
4140	end function
4141
4142	outfile outer.txt --quiet
4143	  print "Outside"
4144	  f("Inside")
4145	  print "Outside again"
4146	end outfile
4147
4148will produce a file called "outer.txt" containing the two lines
4149
4150	Outside
4151	Outside again
4152
4153and a file called "inner.txt" containing the line
4154
4155	Inside
4156
4157# panel Estimation
4158
4159Arguments:  depvar indepvars
4160Options:    --vcv (print covariance matrix)
4161            --fixed-effects (estimate with group fixed effects)
4162            --random-effects (random effects or GLS model)
4163            --nerlove (use the Nerlove transformation)
4164            --pooled (estimate via pooled OLS)
4165            --between (estimate the between-groups model)
4166            --robust (robust standard errors; see below)
4167            --time-dummies (include time dummy variables)
4168            --unit-weights (weighted least squares)
4169            --iterate (iterative estimation)
4170            --matrix-diff (compute Hausman test via matrix difference)
4171            --unbalanced=method (random effects only, see below)
4172            --quiet (less verbose output)
4173            --verbose (more verbose output)
4174Examples:   penngrow.inp
4175
4176Estimates a panel model. By default the fixed effects estimator is used;
4177this is implemented by subtracting the group or unit means from the original
4178data.
4179
4180If the --random-effects flag is given, random effects estimates are
4181computed, by default using the method of Swamy and Arora (1972). In this
4182case (only) the option --matrix-diff forces use of the matrix-difference
4183method (as opposed to the regression method) for carrying out the Hausman
4184test for the consistency of the random effects estimator. Also specific to
4185the random effects estimator is the --nerlove flag, which selects the method
4186of Nerlove (1971) as opposed to Swamy and Arora.
4187
4188Alternatively, if the --unit-weights flag is given, the model is estimated
4189via weighted least squares, with the weights based on the residual variance
4190for the respective cross-sectional units in the sample. In this case (only)
4191the --iterate flag may be added to produce iterative estimates: if the
4192iteration converges, the resulting estimates are Maximum Likelihood.
4193
4194As a further alternative, if the --between flag is given, the between-groups
4195model is estimated (that is, an OLS regression using the group means).
4196
4197The default means of calculating robust standard errors in panel-data models
4198is the Arellano HAC estimator, but Beck-Katz "Panel Corrected Standard
4199Errors" can be selected via the command set pcse on. When the robust option
4200is specified the joint F test on the fixed effects is performed using the
4201robust method of Welch (1951).
4202
4203The --unbalanced option is available only for random effects models: it can
4204be used to choose an ANOVA method for use with an unbalanced panel. By
4205default gretl uses the Swamy-Arora method as for balanced panels, except
4206that the harmonic mean of the individual time-series lengths is used in
4207place of a common T. Under this option you can specify either bc, to use the
4208method of Baltagi and Chang (1994), or stata, to emulate the sa option to
4209the xtreg command in Stata.
4210
4211For more details on panel estimation, please see chapter 23 of the Gretl
4212User's Guide.
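
The following sketch runs the main variants described above on one specification. It assumes that the sample panel dataset abdata.gdt (with series n, w and k) is available; the choice of regressors is illustrative only.

```hansl
# assumption: abdata.gdt, a sample panel dataset, is installed with gretl
open abdata.gdt

# fixed effects, the default estimator
panel n 0 w k

# fixed effects with robust standard errors and time dummies
panel n 0 w k --robust --time-dummies

# random effects, Swamy-Arora by default
panel n 0 w k --random-effects

# random effects via the Nerlove transformation
panel n 0 w k --random-effects --nerlove

# random effects, Baltagi-Chang method for an unbalanced panel
panel n 0 w k --random-effects --unbalanced=bc

# between-groups model and pooled OLS
panel n 0 w k --between
panel n 0 w k --pooled
```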
4213
4214Menu path:    /Model/Panel
4215
4216# panplot Graphs
4217
4218Argument:   plotvar
4219Options:    --means (time series, group means)
4220            --overlay (plot per group, overlaid, N <= 130)
4221            --sequence (plot per group, in sequence, N <= 130)
4222            --grid (plot per group, in grid, N <= 16)
4223            --stack (plot per group, stacked, N <= 6)
4224            --boxplots (boxplot per group, in sequence, N <= 150)
4225            --boxplot (single boxplot, all groups)
4226            --output=filename (send output to specified file)
4227Examples:   panplot x --overlay
4228            panplot x --means --output=display
4229
4230Graphing command specific to panel data: the series plotvar is plotted in a
4231mode specified by one or other of the options.
4232
4233Apart from the --means and --boxplot options the plot explicitly represents
4234variation in both the time-series and cross-sectional dimensions. Such plots
4235are limited in respect of the number of groups (also known as individuals or
4236units) in the current sample range of the panel. For example, the --overlay
4237option, which shows a time series for each group in a single plot, is
4238available only when the number of groups, N, is 130 or less. (Otherwise the
4239graphic becomes too dense to be informative.) If a panel is too large to
4240permit the desired plot specification one can select a reduced range of
4241groups or units temporarily, as in
4242
4243	smpl 1 100 --unit
4244	panplot x --overlay
4245	smpl full
4246
4247The --output=filename option can be used to control the form and destination
4248of the output; see the "gnuplot" command for details.
4249
4250Other access: Main window pop-up menu (single selection)
4251
4252# panspec Tests
4253
4254Options:    --nerlove (use Nerlove method for random effects)
            --matrix-diff (use matrix-difference method for Hausman test)
            --quiet (suppress printed output)
4257
4258This command is available only after estimating a panel-data model via OLS.
4259It tests the simple pooled specification against the most common
4260alternatives, fixed effects and random effects.
4261
4262The fixed effects specification allows the intercept of the regression to
4263vary across the cross-sectional units. A Wald F-test is reported for the
null hypothesis that the intercepts do not differ. The random effects
4265specification decomposes the residual variance into two parts, one part
4266specific to the cross-sectional unit and the other specific to the
4267particular observation. (This estimator can be computed only if the number
4268of cross-sectional units in the data set exceeds the number of parameters to
4269be estimated.) The Breusch-Pagan LM statistic tests the null hypothesis that
4270pooled OLS is adequate against the random effects alternative.
4271
4272Pooled OLS may be rejected against both of the alternatives. Provided the
4273unit- or group-specific error is uncorrelated with the independent
4274variables, the random effects estimator is more efficient than fixed
4275effects; otherwise the random effects estimator is inconsistent and fixed
4276effects are to be preferred. The null hypothesis for the Hausman test is
4277that the group-specific error is not so correlated (and therefore the random
4278effects estimator is preferable). A low p-value for this test counts against
4279random effects and in favor of fixed effects.
4280
4281The first two options for this command pertain to random effects estimation.
4282By default the method of Swamy and Arora is used, and the Hausman test
4283statistic is calculated using the regression method. The options enable the
4284use of Nerlove's alternative variance estimator, and/or the
4285matrix-difference approach to the Hausman statistic.
4286
4287On successful completion the accessors "$test" and "$pvalue" retrieve
42883-vectors holding test statistics and p-values for the three tests noted
4289above: poolability (Wald), poolability (Breusch-Pagan), and Hausman. If you
4290just want the results in this form you can give the --quiet option to skip
4291printed output.
4292
4293Note that after estimating the random effects specification via the "panel"
4294command, the Hausman test is automatically carried out and the results can
4295be retrieved via the "$hausman" accessor.
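
A minimal sketch of retrieving the three test results programmatically is given below. It assumes that the sample panel dataset abdata.gdt (with series n, w and k) is available; the names stats and pvals are arbitrary.

```hansl
# assumption: abdata.gdt, a sample panel dataset, is installed with gretl
open abdata.gdt

# panspec requires a prior pooled OLS model on panel data
ols n 0 w k --quiet
panspec --quiet

# order of elements: Wald poolability, Breusch-Pagan, Hausman
matrix stats = $test
matrix pvals = $pvalue
printf "Wald F:        %g (p-value %g)\n", stats[1], pvals[1]
printf "Breusch-Pagan: %g (p-value %g)\n", stats[2], pvals[2]
printf "Hausman:       %g (p-value %g)\n", stats[3], pvals[3]
```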
4296
4297Menu path:    Model window, /Tests/Panel specification
4298
4299# pca Statistics
4300
4301Argument:   varlist
4302Options:    --covariance (use the covariance matrix)
4303            --save[=n] (save major components)
4304            --save-all (save all components)
4305            --quiet (don't print results)
4306
4307Principal Components Analysis. Unless the --quiet option is given, prints
4308the eigenvalues of the correlation matrix (or the covariance matrix if the
4309--covariance option is given) for the variables in varlist, along with the
4310proportion of the joint variance accounted for by each component. Also
4311prints the corresponding eigenvectors or "component loadings".
4312
4313If you give the --save-all option then all components are saved to the
4314dataset as series, with names PC1, PC2 and so on. These artificial variables
4315are formed as the sum of (component loading) times (standardized X_i), where
4316X_i denotes the ith variable in varlist.
4317
4318If you give the --save option without a parameter value, components with
4319eigenvalues greater than the mean (which means greater than 1.0 if the
4320analysis is based on the correlation matrix) are saved to the dataset as
4321described above. If you provide a value for n with this option then the most
4322important n components are saved.
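
By way of illustration, the sketch below saves components under both forms of the --save option. It reuses the data4-10 file and series names that appear in the "outfile" examples; the choice of variables has no particular significance.

```hansl
open data4-10

# analysis based on the correlation matrix; keep the first 2 components
pca CATHOL INCOME COLLEGE --save=2
print PC1 PC2 --byobs --range=1:5

# covariance-based analysis, saving only the components whose
# eigenvalues exceed the mean eigenvalue, without printing results
pca CATHOL INCOME COLLEGE --covariance --save --quiet
```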
4323
4324See also the "princomp" function.
4325
4326Menu path:    /View/Principal components
4327
4328# pergm Statistics
4329
4330Arguments:  series [ bandwidth ]
4331Options:    --bartlett (use Bartlett lag window)
4332            --log (use log scale)
4333            --radians (show frequency in radians)
4334            --degrees (show frequency in degrees)
4335            --plot=mode-or-filename (see below)
4336
4337Computes and displays the spectrum of the specified series. By default the
4338sample periodogram is given, but optionally a Bartlett lag window is used in
4339estimating the spectrum (see, for example, Greene's Econometric Analysis for
4340a discussion of this). The default width of the Bartlett window is twice the
4341square root of the sample size but this can be set manually using the
4342bandwidth parameter, up to a maximum of half the sample size.
4343
4344If the --log option is given the spectrum is represented on a logarithmic
4345scale.
4346
4347The (mutually exclusive) options --radians and --degrees influence the
4348appearance of the frequency axis when the periodogram is graphed. By default
4349the frequency is scaled by the number of periods in the sample, but these
4350options cause the axis to be labeled from 0 to pi radians or from 0 to
180 degrees, respectively.
4352
4353By default, if the program is not in batch mode a plot of the periodogram is
4354shown. This can be adjusted via the --plot option. The acceptable parameters
4355to this option are none (to suppress the plot); display (to display a plot
4356even when in batch mode); or a file name. The effect of providing a file
4357name is as described for the --output option of the "gnuplot" command.
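
The options can be tried out on simulated data, as in the following sketch. The AR(2) coefficients and the bandwidth of 12 are arbitrary choices for illustration.

```hansl
# artificial time series: AR(2) process driven by normal noise
nulldata 200
setobs 1 1 --time-series
set seed 12345
series e = normal()
series y = filter(e, 1, {1.8, -0.9})

# sample periodogram, text output only
pergm y --plot=none

# Bartlett window with bandwidth 12, frequency axis in radians
pergm y 12 --bartlett --radians --plot=none

# log scale, forcing display of the plot even in batch mode
pergm y --log --plot=display
```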
4358
4359Menu path:    /Variable/Periodogram
4360Other access: Main window pop-up menu (single selection)
4361
4362# pkg Utilities
4363
4364Arguments:  action pkgname
4365Options:    --local (install from local file)
4366            --quiet (see below)
4367            --verbose (see below)
4368Examples:   pkg install armax
4369            pkg install /path/to/myfile.gfn --local
4370            pkg query ghosts
4371            pkg unload armax
4372
4373This command provides a means of installing, unloading, querying or deleting
4374gretl function packages. The action argument must be one of install, query,
4375unload, remove or index.
4376
4377install: In the most basic form, with no option flag and the pkgname
4378argument given as the "plain" name of a gretl function package (as in the
4379first example above), the effect is to download the specified package from
4380the gretl server (unless pkgname starts with http://) and install it on the
4381local machine. In this case it is not necessary to supply a filename
4382extension. If the --local option is given, however, pkgname should be the
4383path to an uninstalled package file on the local machine, with the correct
4384extension (.gfn or .zip). In this case the effect is to copy the file into
4385place (gfn), or unzip it into place (zip), "into place" meaning where the
4386"include" command will find it.
4387
4388query: The default effect is to print basic information about the specified
4389package (author, version, etc.). But if the --quiet option is appended
4390nothing is printed; the package information is instead stored in the form of
4391a gretl bundle, which can be accessed via "$result". If no information can
4392be found this bundle will be empty.
4393
unload: pkgname should be given in plain form, without path or suffix, as in
the last example above. The effect is to unload the package in question from
4396gretl's memory, if it is currently loaded, and also to remove it from the
4397GUI menu to which it is attached, if any.
4398
4399remove: performs the actions noted for unload and in addition deletes the
4400file(s) associated with the package from disk.
4401
4402index: is a special case in which pkgname must be replaced by the keyword
4403"addons": the effect is to update the index of the standard packages known
4404as addons. Such updating is performed automatically from time to time but in
4405some cases a manual update may be useful. In this case the --verbose flag
4406produces a printout of where gretl has searched and what it has found. To be
4407clear, here's the way to get full indexing output:
4408
4409	pkg index addons --verbose
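
As a sketch of the quiet form of the query action, the lines below store the package information in a bundle and check whether anything was found. The package name armax is taken from the examples above; the bundle name b is arbitrary.

```hansl
# nothing is printed; the information goes into $result
pkg query armax --quiet
bundle b = $result

if nelem(b) == 0
    print "No information found for this package"
else
    print b
endif
```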
4410
4411Menu path:    /File/Function packages/On server
4412
4413# plot Graphs
4414
4415Argument:   [ data ]
4416Options:    --with-lines[=varspec] (use lines, not points)
4417            --with-lp[=varspec] (use lines and points)
4418            --with-impulses[=varspec] (use vertical lines)
4419            --with-steps[=varspec] (use horizontal and vertical line segments)
4420            --time-series (plot against time)
4421            --single-yaxis (force use of just one y-axis)
4422            --ylogscale[=base] (use log scale for vertical axis)
4423            --dummy (see below)
4424            --fit=fitspec (see below)
4425            --band=bandspec (see below)
4426            --band-style=style (see below)
4427            --output=filename (send output to specified file)
4428Examples:   nile.inp
4429
4430The plot block provides an alternative to the "gnuplot" command which may be
4431more convenient when you are producing an elaborate plot (with several
4432options and/or gnuplot commands to be inserted into the plot file). In
4433addition to the following explanation, please also refer to chapter 6 of the
4434Gretl User's Guide for some further examples.
4435
4436A plot block starts with the command-word plot. This is commonly followed by
4437a data argument, which specifies data to be plotted: this should be the name
4438of a list, a matrix, or a single series. If no input data are specified the
4439block must contain at least one directive to plot a formula instead; such
4440directives may be given via literal or printf lines (see below).
4441
4442If a list or matrix is given, the last element (list) or column (matrix) is
4443assumed to be the x-axis variable and the other(s) the y-axis variable(s),
4444unless the --time-series option is given in which case all the specified
4445data go on the y axis.
4446
4447The option of supplying a single series name is restricted to time-series
4448data, in which case it is assumed that a time-series plot is wanted;
4449otherwise an error is flagged.
4450
4451The starting line may be prefixed with the "savename <-" apparatus to save a
4452plot as an icon in the GUI program. The block ends with end plot.
4453
4454Inside the block you have zero or more lines of these types, identified by
4455an initial keyword:
4456
4457  option: specify a single option.
4458
4459  options: specify multiple options on a single line, separated by spaces.
4460
4461  literal: a command to be passed to gnuplot literally.
4462
4463  printf: a printf statement whose result will be passed to gnuplot
4464  literally.
4465
4466Note that when you specify an option using the option or options keywords,
4467it is not necessary to supply the customary double-dash before the option
4468specifier. For details on the effects of the various options please see
4469"gnuplot" (but see below for some specifics on using the --band option in
4470the plot context).
4471
4472The intended use of the plot block is best illustrated by example:
4473
4474	string title = "My title"
4475	string xname = "My x-variable"
4476	plot plotmat
4477	    options with-lines fit=none
4478	    literal set linetype 3 lc rgb "#0000ff"
4479	    literal set nokey
4480	    printf "set title \"%s\"", title
4481	    printf "set xlabel \"%s\"", xname
4482	end plot --output=display
4483
4484This example assumes that plotmat is the name of a matrix with at least 2
4485columns (or a list with at least two members). Note that it is considered
4486good practice to place the --output option (only) on the last line of the
4487block; other options should be placed within the block.
4488
4489Plotting a band with matrix data
4490
4491The --band and --band-style options mostly work as described in the help for
4492"gnuplot", with the following exception: when the data to be plotted are
4493given in the form of a matrix, the first parameter to --band must be given
4494as the name of a matrix with two columns (holding, respectively, the center
4495and the width of the band). This parameter takes the place of the two values
4496(series names or ID numbers, or matrix columns) required by the gnuplot
4497version of this option. An illustration follows:
4498
4499	scalar n = 100
4500	matrix x = seq(1,n)'
4501	matrix y = x + filter(mnormal(n,1), 1, {1.8, -0.9})
4502	matrix B = y ~ muniform(n,1)
4503	plot y
4504	    options time-series with-lines
4505	    options band=B,10 band-style=fill
4506	end plot --output=display
4507
4508Plotting without data
4509
4510The following example shows a simple case of specifying a plot without a
4511data source.
4512
4513	plot
4514	    literal set title 'CRRA utility'
4515	    literal set xlabel 'c'
4516	    literal set ylabel 'u(c)'
4517	    literal set xrange[1:3]
4518	    literal set key top left
4519	    literal crra(x,s) = (x**(1-s) - 1)/(1-s)
4520	    printf "plot crra(x, 0) t 'sigma=0', \\"
4521	    printf " log(x) t 'sigma=1', \\"
	    printf " crra(x,3) t 'sigma=3'"
4523	end plot --output=display
4524
4525# poisson Estimation
4526
4527Arguments:  depvar indepvars [ ; offset ]
4528Options:    --robust (robust standard errors)
4529            --cluster=clustvar (see "logit" for explanation)
4530            --vcv (print covariance matrix)
4531            --verbose (print details of iterations)
4532            --quiet (don't print results)
4533Examples:   poisson y 0 x1 x2
4534            poisson y 0 x1 x2 ; S
4535            See also camtriv.inp, greene19_3.inp
4536
Estimates a Poisson regression. The dependent variable is taken to represent
4538the occurrence of events of some sort, and must take on only non-negative
4539integer values.
4540
4541If a discrete random variable Y follows the Poisson distribution, then
4542
4543  Pr(Y = y) = exp(-v) * v^y / y!
4544
4545for y = 0, 1, 2,.... The mean and variance of the distribution are both
4546equal to v. In the Poisson regression model, the parameter v is represented
4547as a function of one or more independent variables. The most common version
4548(and the only one supported by gretl) has
4549
4550  v = exp(b0 + b1*x1 + b2*x2 + ...)
4551
4552or in other words the log of v is a linear function of the independent
4553variables.
4554
4555Optionally, you may add an "offset" variable to the specification. This is a
4556scale variable, the log of which is added to the linear regression function
4557(implicitly, with a coefficient of 1.0). This makes sense if you expect the
4558number of occurrences of the event in question to be proportional, other
4559things equal, to some known factor. For example, the number of traffic
4560accidents might be supposed to be proportional to traffic volume, other
4561things equal, and in that case traffic volume could be specified as an
4562"offset" in a Poisson model of the accident rate. The offset variable must
4563be strictly positive.
4564
4565By default, standard errors are computed using the negative inverse of the
4566Hessian. If the --robust flag is given, then QML or Huber-White standard
4567errors are calculated instead. In this case the estimated covariance matrix
4568is a "sandwich" of the inverse of the estimated Hessian and the outer
4569product of the gradient.
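
The role of the offset can be checked on artificial data, as in the sketch below. The data-generating values (0.5 and 0.3), the seed and the series names are assumptions made for illustration; the use of randgen with a series-valued Poisson mean is likewise assumed to be supported by your gretl version.

```hansl
nulldata 500
set seed 12345

series x1 = normal()
# strictly positive "exposure" variable, to serve as the offset
series S = 1 + 9 * uniform()
# log(v) = 0.5 + 0.3*x1 + log(S), so S enters with coefficient 1
series v = S * exp(0.5 + 0.3 * x1)
series y = randgen(P, v)

# the estimates should be close to 0.5 and 0.3
poisson y 0 x1 ; S

# same model with QML (Huber-White) standard errors
poisson y 0 x1 ; S --robust
```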
4570
4571See also "negbin".
4572
4573Menu path:    /Model/Limited dependent variable/Count data
4574
4575# print Printing
4576
4577Variants:   print varlist
4578            print
4579            print object-names
4580            print string-literal
4581Options:    --byobs (by observations)
4582            --no-dates (use simple observation numbers)
4583            --range=start:stop (see below)
4584            --midas (see below)
4585            --tree (specific to bundles; see below)
4586Examples:   print x1 x2 --byobs
4587            print my_matrix
4588            print "This is a string"
4589            print my_array --range=3:6
4590            print hflist --midas
4591
4592Please note that print is a rather "basic" command (primarily intended for
4593printing the values of series); see "printf" and "eval" for more advanced,
4594and less restrictive, alternatives.
4595
4596In the first variant shown above (also see the first example), varlist
4597should be a list of series (either a named list or a list specified via the
4598names or ID numbers of series, separated by spaces). In that case this
4599command prints the values of the listed series. By default the data are
4600printed "by variable", but if the --byobs flag is added they are printed by
4601observation. When printing by observation, the default is to show the date
4602(with time-series data) or the observation marker string (if any) at the
4603start of each line. The --no-dates option suppresses the printing of dates
4604or markers; a simple observation number is shown instead. See the final
4605paragraph of this entry for the effect of the --midas option (which applies
4606only to a named list of series).
4607
4608If no argument is given (the second variant shown above) then the action is
4609similar to the first case except that all series in the current dataset are
printed. The supported options are as described above.
4611
4612The third variant (with the object-names argument; see the second example)
4613expects a space-separated list of names of primary gretl objects other than
4614series (scalars, matrices, strings, bundles, arrays). The value(s) of these
4615objects are displayed. In the case of bundles, their members are sorted by
4616type and alphabetically.
4617
4618In the fourth form (third example), string-literal should be a string
4619enclosed in double-quotes (and there should be nothing else following on the
4620command line). The string in question is printed, followed by a newline
4621character.
4622
4623The --range option can be used to control the amount of information printed.
4624The start and stop (integer) values refer to observations for series and
4625lists, rows for matrices, elements for arrays, and lines of text for
4626strings. In all cases the minimum start value is 1 and the maximum stop
4627value is the "row-wise size" of the object in question. Negative values for
4628these indices are taken to indicate a count back from the end. The indices
4629may be given in numeric form or as the names of predefined scalar variables.
4630If start is omitted that is taken as an implicit 1 and if stop is omitted
4631that means go all the way to the end. Note that with series and lists the
4632indices are relative to the current sample range.
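
The following sketch shows the --range option applied to a matrix, where the indices refer to rows. The matrix M and the scalars r1 and r2 are created here purely for illustration.

```hansl
matrix M = mnormal(10, 3)

# rows 3 to 6
print M --range=3:6

# start omitted: rows 1 to 4
print M --range=:4

# stop omitted: row 8 to the last row
print M --range=8:

# indices given as predefined scalars
scalar r1 = 2
scalar r2 = 5
print M --range=r1:r2
```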
4633
4634The --tree option is specific to the printing of a gretl bundle: the effect
4635is that if the specified bundle contains further bundles, or arrays of
4636bundles, their contents are listed. Otherwise only the top-level members of
4637the bundle are listed.
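
A small sketch of the difference, using two bundles constructed for the purpose (the names inner and outer and their members are arbitrary):

```hansl
bundle inner = defbundle("a", 1, "b", "some text")
bundle outer = defbundle("x", 3.14, "sub", inner)

# top-level members only
print outer

# also lists the contents of the nested bundle
print outer --tree
```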
4638
4639The --midas option is specific to the printing of a list of series, and
4640moreover it is specific to datasets that contain one or more high-frequency
4641series, each represented by a "MIDAS list". If one such list is given as
4642argument and this option is appended, the series is printed by observation
4643at its "native" frequency.
4644
4645Menu path:    /Data/Display values
4646
4647# printf Printing
4648
4649Arguments:  format , args
4650
4651Prints scalar values, series, matrices, or strings under the control of a
4652format string (providing a subset of the printf function in the C
4653programming language). Recognized numeric formats are %e, %E, %f, %g, %G, %d
4654and %x, in each case with the various modifiers available in C. Examples:
4655the format %.10g prints a value to 10 significant figures; %12.6f prints a
4656value to 6 decimal places, with a width of 12 characters. Note, however,
4657that in gretl the format %g is a good default choice for all numerical
4658values; you don't need to get too complicated. The format %s should be used
4659for strings.
4660
4661The format string itself must be enclosed in double quotes. The values to be
4662printed must follow the format string, separated by commas. These values
should take the form of either (a) the names of variables, (b) expressions
that yield some sort of printable result, or (c) the special functions
4665varname() or date(). The following example prints the values of two
4666variables plus that of a calculated expression:
4667
4668	ols 1 0 2 3
4669	scalar b = $coeff[2]
4670	scalar se_b = $stderr[2]
4671	printf "b = %.8g, standard error %.8g, t = %.4f\n",
4672          b, se_b, b/se_b
4673
4674The next lines illustrate the use of the varname and date functions, which
4675respectively print the name of a variable, given its ID number, and a date
4676string, given a 1-based observation number.
4677
4678	printf "The name of variable %d is %s\n", i, varname(i)
4679	printf "The date of observation %d is %s\n", j, date(j)
4680
4681If a matrix argument is given in association with a numeric format, the
4682entire matrix is printed using the specified format for each element. The
4683same applies to series, except that the range of values printed is governed
4684by the current sample setting.
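
The sketch below applies a numeric format to a matrix and then to a series under a restricted sample. The objects m and x are created here for illustration only.

```hansl
# each element of the matrix is printed using %8.3f
matrix m = {1, 2; 3, 4}
printf "%8.3f\n", m

# for a series, only the current sample range is printed
nulldata 6
series x = index
smpl 2 4
printf "%g\n", x
smpl full
```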
4685
4686The maximum length of a format string is 127 characters. The escape
4687sequences \n (newline), \t (tab), \v (vertical tab) and \\ (literal
4688backslash) are recognized. To print a literal percent sign, use %%.
4689
As in C, numerical values that form part of the format (width and/or
4691precision) may be given directly as numbers, as in %10.4f, or they may be
4692given as variables. In the latter case, one puts asterisks into the format
4693string and supplies corresponding arguments in order. For example,
4694
4695	scalar width = 12
4696	scalar precision = 6
4697	printf "x = %*.*f\n", width, precision, x
4698
4699# probit Estimation
4700
4701Arguments:  depvar indepvars
4702Options:    --robust (robust standard errors)
4703            --cluster=clustvar (see "logit" for explanation)
4704            --vcv (print covariance matrix)
4705            --verbose (print details of iterations)
4706            --quiet (don't print results)
4707            --p-values (show p-values instead of slopes)
4708            --estrella (select pseudo-R-squared variant)
4709            --random-effects (estimates a random effects panel probit model)
4710            --quadpoints=k (number of quadrature points for RE estimation)
4711Examples:   ooballot.inp, oprobit.inp, reprobit.inp
4712
If the dependent variable is a binary variable (all values are 0 or 1),
maximum likelihood estimates of the coefficients on indepvars are obtained
via the Newton-Raphson method. As the model is nonlinear, the slopes depend
on the values of the independent variables. By default the slopes with
4717respect to each of the independent variables are calculated (at the means of
4718those variables) and these slopes replace the usual p-values in the
4719regression output. This behavior can be suppressed by giving the --p-values
4720option. The chi-square statistic tests the null hypothesis that all
4721coefficients are zero apart from the constant.
4722
4723By default, standard errors are computed using the negative inverse of the
4724Hessian. If the --robust flag is given, then QML or Huber-White standard
4725errors are calculated instead. In this case the estimated covariance matrix
4726is a "sandwich" of the inverse of the estimated Hessian and the outer
4727product of the gradient. See chapter 10 of Davidson and MacKinnon for
4728details.
4729
4730By default the pseudo-R-squared statistic suggested by McFadden (1974) is
4731shown, but in the binary case if the --estrella option is given, the variant
4732recommended by Estrella (1998) is shown instead. This variant arguably
4733mimics more closely the properties of the regular R^2 in the context of
4734least-squares estimation.
4735
4736If the dependent variable is not binary but is discrete, then Ordered Probit
4737estimates are obtained. (If the variable selected as dependent is not
4738discrete, an error is flagged.)
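
The binary case described above can be tried on simulated data. This is only
a sketch: the series names x and y, the sample size, the seed and the
coefficients are all assumptions chosen for the example.

	nulldata 200
	set seed 1234
	series x = normal()
	# y equals 1 when the latent index is positive, 0 otherwise
	series y = (0.5 + x + normal()) > 0
	# default output: slopes at the means in place of p-values
	probit y 0 x
	# p-values, QML standard errors and the Estrella pseudo-R-squared
	probit y 0 x --p-values --robust --estrella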
4739
4740Probit for panel data
4741
4742With the --random-effects option, the error term is assumed to be composed
4743of two normally distributed components: one time-invariant term that is
4744specific to the cross-sectional unit or "individual" (and is known as the
4745individual effect); and one term that is specific to the particular
4746observation.
4747
4748Evaluation of the likelihood for this model involves the use of
4749Gauss-Hermite quadrature for approximating the value of expectations of
4750functions of normal variates. The number of quadrature points used can be
4751chosen through the --quadpoints option (the default is 32). Using more
4752points will increase the accuracy of the results, but at the cost of longer
4753compute time; with many quadrature points and a large dataset estimation may
4754be quite time consuming.
4755
4756Besides the usual parameter estimates (and associated statistics) relating
4757to the included regressors, certain additional information is presented on
4758estimation of this sort of model:
4759
4760  lnsigma2: the maximum likelihood estimate of the log of the variance of
4761  the individual effect;
4762
4763  sigma_u: the estimated standard deviation of the individual effect; and
4764
4765  rho: the estimated share of the individual effect in the composite error
4766  variance (also known as the intra-class correlation).
4767
4768The Likelihood Ratio test of the null hypothesis that rho equals zero
4769provides a means of assessing whether the random effects specification is
needed. If the null is not rejected, this suggests that a simple pooled
4771probit specification is adequate.
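
A hedged sketch of random effects probit on an artificial panel of 50 units
observed over 10 periods. The series names, the seed and the way the
individual effect is generated (via the pmean function) are assumptions made
for illustration; fewer quadrature points than the default are requested
simply to keep the run short.

	nulldata 500
	setobs 10 1:1 --stacked-time-series
	set seed 4321
	# per-unit mean of a normal series: a time-invariant individual effect
	series a = sqrt(10) * pmean(normal())
	series x = normal()
	series y = (x + a + normal()) > 0
	probit y 0 x --random-effects --quadpoints=16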
4772
4773Menu path:    /Model/Limited dependent variable/Probit
4774
4775# pvalue Statistics
4776
4777Arguments:  dist [ params ] xval
4778Examples:   pvalue z zscore
4779            pvalue t 25 3.0
4780            pvalue X 3 5.6
4781            pvalue F 4 58 fval
4782            pvalue G shape scale x
4783            pvalue B bprob 10 6
4784            pvalue P lambda x
4785            pvalue W shape scale x
4786            See also mrw.inp, restrict.inp
4787
4788Computes the area to the right of xval in the specified distribution (z for
4789Gaussian, t for Student's t, X for chi-square, F for F, G for gamma, B for
4790binomial, P for Poisson, exp for Exponential, W for Weibull).
4791
4792Depending on the distribution, the following information must be given,
4793before the xval: for the t and chi-square distributions, the degrees of
4794freedom; for F, the numerator and denominator degrees of freedom; for gamma,
4795the shape and scale parameters; for the binomial distribution, the "success"
4796probability and the number of trials; for the Poisson distribution, the
4797parameter lambda (which is both the mean and the variance); for the
4798Exponential, a scale parameter; and for the Weibull, shape and scale
4799parameters. As shown in the examples above, the numerical parameters may be
4800given in numeric form or as the names of variables.
4801
4802The parameters for the gamma distribution are sometimes given as mean and
4803variance rather than shape and scale. The mean is the product of the shape
4804and the scale; the variance is the product of the shape and the square of
4805the scale. So the scale may be found as the variance divided by the mean,
4806and the shape as the mean divided by the scale.
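
For instance, under the assumption of a gamma distribution with mean 6 and
variance 12 (values picked purely for illustration), the conversion to shape
and scale can be sketched as follows; the scalar names are hypothetical:

	scalar m = 6
	scalar v = 12
	# scale = variance / mean = 2
	scalar g_scale = v / m
	# shape = mean / scale = 3
	scalar g_shape = m / g_scale
	# area to the right of 10 under gamma(shape 3, scale 2)
	pvalue G g_shape g_scale 10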
4807
4808Menu path:    /Tools/P-value finder
4809
4810# qlrtest Tests
4811
4812Options:    --limit-to=list (limit test to subset of regressors)
4813            --plot=mode-or-filename (see below)
4814            --quiet (suppress printed output)
4815
4816For a model estimated on time-series data via OLS, performs the Quandt
4817likelihood ratio (QLR) test for a structural break at an unknown point in
4818time, with 15 percent trimming at the beginning and end of the sample
4819period.
4820
4821For each potential break point within the central 70 percent of the
4822observations, a Chow test is performed. See "chow" for details; as with the
4823regular Chow test, this is a robust Wald test if the original model was
4824estimated with the --robust option, an F-test otherwise. The QLR statistic
4825is then the maximum of the individual test statistics.
4826
4827An asymptotic p-value is obtained using the method of Bruce Hansen (1997).
4828
4829Besides the standard hypothesis test accessors "$test" and "$pvalue",
4830"$qlrbreak" can be used to retrieve the index of the observation at which
4831the test statistic is maximized.
4832
4833The --limit-to option can be used to limit the set of interactions with the
4834split dummy variable in the Chow tests to a subset of the original
4835regressors. The parameter for this option must be a named list, all of whose
4836members are among the original regressors. The list should not include the
4837constant.
4838
4839When this command is run interactively (only), a plot of the Chow test
4840statistic is displayed by default. This can be adjusted via the --plot
4841option. The acceptable parameters to this option are none (to suppress the
4842plot); display (to display a plot even when not in interactive mode); or a
4843file name. The effect of providing a file name is as described for the
4844--output option of the "gnuplot" command.
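
The following sketch runs the test on simulated quarterly data with an
intercept shift built in at observation 61. The series names, the seed and
the list name L are assumptions made for the example.

	nulldata 120
	setobs 4 1990:1 --time-series
	set seed 99
	series x = normal()
	# the intercept shifts upward from observation 61 onward
	series y = 1 + 0.5*x + 2*(index > 60) + normal()
	ols y 0 x --quiet
	# run the test without a plot, then retrieve the results
	qlrtest --plot=none
	printf "QLR = %g, p-value = %g\n", $test, $pvalue
	printf "statistic maximized at observation %d\n", $qlrbreak
	# limit the Chow interactions to the regressor x
	list L = x
	qlrtest --limit-to=L --plot=none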
4845
4846Menu path:    Model window, /Tests/QLR test
4847
4848# qqplot Graphs
4849
4850Variants:   qqplot y
4851            qqplot y x
4852Options:    --z-scores (see below)
4853            --raw (see below)
4854            --output=filename (send plot to specified file)
4855
4856Given just one series argument, displays a plot of the empirical quantiles
4857of the selected series (given by name or ID number) against the quantiles of
4858the normal distribution. The series must include at least 20 valid
4859observations in the current sample range. By default the empirical quantiles
4860are plotted against quantiles of the normal distribution having the same
4861mean and variance as the sample data, but two alternatives are available: if
4862the --z-scores option is given the data are standardized, while if the --raw
4863option is given the "raw" empirical quantiles are plotted against the
4864quantiles of the standard normal distribution.
4865
4866The option --output has the effect of sending the output to the specified
4867file; use "display" to force output to the screen. See the "gnuplot" command
4868for more detail on this option.
4869
4870Given two series arguments, y and x, displays a plot of the empirical
4871quantiles of y against those of x. The data values are not standardized.
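
A minimal sketch of both variants on simulated data (series names and seed
are assumptions; a working gnuplot installation is needed for the plots to
appear):

	nulldata 100
	set seed 7
	series y = 2 + 3 * normal()
	series x = uniform()
	# one series: standardized data against the normal quantiles
	qqplot y --z-scores --output=display
	# two series: empirical quantiles of y against those of x
	qqplot y x --output=display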
4872
4873Menu path:    /Variable/Normal Q-Q plot
4874Menu path:    /View/Graph specified vars/Q-Q plot
4875
4876# quantreg Estimation
4877
4878Arguments:  tau depvar indepvars
4879Options:    --robust (robust standard errors)
4880            --intervals[=level] (compute confidence intervals)
4881            --vcv (print covariance matrix)
4882            --quiet (suppress printing of results)
4883Examples:   quantreg 0.25 y 0 xlist
4884            quantreg 0.5 y 0 xlist --intervals
4885            quantreg 0.5 y 0 xlist --intervals=.95
4886            quantreg tauvec y 0 xlist --robust
4887            See also mrw_qr.inp
4888
4889Quantile regression. The first argument, tau, is the conditional quantile
4890for which estimates are wanted. It may be given either as a numerical value
4891or as the name of a pre-defined scalar variable; the value must be in the
4892range 0.01 to 0.99. (Alternatively, a vector of values may be given for tau;
4893see below for details.) The second and subsequent arguments compose a
4894regression list on the same pattern as "ols".
4895
4896Without the --intervals option, standard errors are printed for the quantile
4897estimates. By default, these are computed according to the asymptotic
4898formula given by Koenker and Bassett (1978), but if the --robust option is
4899given, standard errors that are robust with respect to heteroskedasticity
4900are calculated using the method of Koenker and Zhao (1994).
4901
4902When the --intervals option is chosen, confidence intervals are given for
4903the parameter estimates instead of standard errors. These intervals are
4904computed using the rank inversion method, and in general they are
4905asymmetrical about the point estimates. The specifics of the calculation are
affected by the --robust option: without this, the intervals are computed
4907on the assumption of IID errors (Koenker, 1994); with it, they use the
4908robust estimator developed by Koenker and Machado (1999).
4909
4910By default, 90 percent confidence intervals are produced. You can change
4911this by appending a confidence level (expressed as a decimal fraction) to
4912the intervals option, as in --intervals=0.95.
4913
4914Vector-valued tau: instead of supplying a scalar, you may give the name of a
4915pre-defined matrix. In this case estimates are computed for all the given
4916tau values and the results are printed in a special format, showing the
4917sequence of quantile estimates for each regressor in turn.
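
As an illustration, here is a sketch using simulated data in which the error
spread grows with the regressor, so that the slope differs across quantiles.
The series names, seed and coefficient values are assumptions.

	nulldata 300
	set seed 2024
	series x = uniform()
	# heteroskedastic errors: the spread increases with x
	series y = 1 + 2*x + (0.5 + x) * normal()
	# median regression with 95 percent confidence intervals
	quantreg 0.5 y 0 x --intervals=0.95
	# several quantiles at once via a pre-defined vector
	matrix tauvec = {0.25, 0.5, 0.75}
	quantreg tauvec y 0 x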
4918
4919Menu path:    /Model/Robust estimation/Quantile regression
4920
4921# quit Utilities
4922
4923Exits from gretl's current modality.
4924
4925  When called from a script, execution of the script is terminated. If the
4926  context is gretlcli in batch mode, gretlcli itself exits, otherwise the
4927  program reverts to interactive mode.
4928
4929  When called from the GUI console, the console window is closed.
4930
4931  When called from gretlcli in interactive mode the program exits.
4932
4933Note that this command cannot be called within functions or loops.
4934
4935In no case does the quit command cause the gretl GUI program to exit. That
4936is done via the Quit item under the File menu, or Ctrl+Q, or by clicking the
4937close control on the title-bar of the main gretl window.
4938
4939# rename Dataset
4940
4941Arguments:  series newname
4942Option:     --quiet (suppress printed output)
4943
Changes the name of series (identified by name or ID number) to newname. The
new name must be at most 31 characters long, must start with a letter, and
must be composed of only letters, digits, and the underscore character. In
addition, it must not be the name of an existing object of any kind.
4948
4949Menu path:    /Variable/Edit attributes
4950Other access: Main window pop-up menu (single selection)
4951
4952# reset Tests
4953
4954Options:    --quiet (don't print the auxiliary regression)
4955            --silent (don't print anything)
4956            --squares-only (compute the test using only the squares)
4957            --cubes-only (compute the test using only the cubes)
4958
4959Must follow the estimation of a model via OLS. Carries out Ramsey's RESET
4960test for model specification (nonlinearity) by adding the squares and/or the
4961cubes of the fitted values to the regression and calculating the F statistic
4962for the null hypothesis that the coefficients on the added terms are zero.
4963
4964Both the squares and the cubes are added unless one of the options
4965--squares-only or --cubes-only is given.
4966
4967The --silent option may be used if one plans to make use of the "$test"
4968and/or "$pvalue" accessors to grab the results of the test.
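
A sketch of this usage on simulated data, where the true relationship is
quadratic and the fitted model is linear (series names, seed and
coefficients are assumptions):

	nulldata 150
	set seed 11
	series x = normal()
	# the linear model below omits the squared term
	series y = 1 + x + 0.5*x^2 + normal()
	ols y 0 x --quiet
	reset --squares-only --silent
	printf "RESET F = %g, p-value = %g\n", $test, $pvalue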
4969
4970Menu path:    Model window, /Tests/Ramsey's RESET
4971
4972# restrict Tests
4973
4974Options:    --quiet (don't print restricted estimates)
4975            --silent (don't print anything)
4976            --wald (system estimators only - see below)
4977            --bootstrap (bootstrap the test if possible)
4978            --full (OLS and VECMs only, see below)
4979Examples:   hamilton.inp, restrict.inp
4980
4981Imposes a set of (usually linear) restrictions on either (a) the model last
4982estimated or (b) a system of equations previously defined and named. In all
4983cases the set of restrictions should be started with the keyword "restrict"
4984and terminated with "end restrict".
4985
4986In the single equation case the restrictions are always implicitly to be
4987applied to the last model, and they are evaluated as soon as the restrict
4988block is closed.
4989
4990In the case of a system of equations (defined via the "system" command), the
4991initial "restrict" may be followed by the name of a previously defined
4992system of equations. If this is omitted and the last model was a system then
4993the restrictions are applied to the last model. By default the restrictions
4994are evaluated when the system is next estimated, using the "estimate"
4995command. But if the --wald option is given the restriction is tested right
4996away, via a Wald chi-square test on the covariance matrix. Note that this
4997option will produce an error if a system has been defined but not yet
4998estimated.
4999
5000Depending on the context, the restrictions to be tested may be expressed in
5001various ways. The simplest form is as follows: each restriction is given as
5002an equation, with a linear combination of parameters on the left and a
5003scalar value to the right of the equals sign (either a numerical constant or
5004the name of a scalar variable).
5005
5006In the single-equation case, parameters may be referenced in the form b[i],
5007where i represents the position in the list of regressors (starting at 1),
5008or b[varname], where varname is the name of the regressor in question. In
5009the system case, parameters are referenced using b plus two numbers in
5010square brackets. The leading number represents the position of the equation
5011within the system and the second number indicates position in the list of
5012regressors. For example b[2,1] denotes the first parameter in the second
5013equation, and b[3,2] the second parameter in the third equation. The b terms
5014in the equation representing a restriction may be prefixed with a numeric
5015multiplier, for example 3.5*b[4].
5016
5017Here is an example of a set of restrictions for a previously estimated
5018model:
5019
	restrict
	 b[1] = 0
	 b[2] - b[3] = 0
	 b[4] + 2*b[5] = 1
	end restrict
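
Restrictions of this kind can equally be written using regressor names. The
following self-contained sketch relies on simulated data; the series names,
seed and coefficient values are assumptions made for illustration.

	nulldata 100
	set seed 5
	series x1 = normal()
	series x2 = normal()
	series y = 1 + 0.7*x1 + 0.3*x2 + normal()
	ols y 0 x1 x2
	# test that the two slope coefficients sum to one
	restrict
	 b[x1] + b[x2] = 1
	end restrict
	printf "test statistic = %g, p-value = %g\n", $test, $pvalue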
5025
5026And here is an example of a set of restrictions to be applied to a named
5027system. (If the name of the system does not contain spaces, the surrounding
5028quotes are not required.)
5029
	restrict "System 1"
	 b[1,1] = 0
	 b[1,2] - b[2,2] = 0
	 b[3,4] + 2*b[3,5] = 1
	end restrict
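
To show where such a block fits in a complete workflow, here is a sketch with
a two-equation system on simulated data. The system name sys1, the series
names and the choice of SUR as estimator are assumptions made for the
example. The restriction equates the coefficients on x2 (third in each
regressor list, after the constant and x1) across the two equations, and it
is imposed when the system is estimated.

	nulldata 100
	set seed 8
	series x1 = normal()
	series x2 = normal()
	series y1 = 1 + x1 + 0.5*x2 + normal()
	series y2 = 2 - x1 + 0.5*x2 + normal()
	# define the system without estimating it
	system name="sys1"
	 equation y1 0 x1 x2
	 equation y2 0 x1 x2
	end system
	# cross-equation restriction on the x2 coefficients
	restrict "sys1"
	 b[1,3] - b[2,3] = 0
	end restrict
	# the restriction is evaluated at this point
	estimate "sys1" method=sur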
5035
5036In the single-equation case the restrictions are by default evaluated via a
5037Wald test, using the covariance matrix of the model in question. If the
5038original model was estimated via OLS then the restricted coefficient
5039estimates are printed; to suppress this, append the --quiet option flag to
5040the initial restrict command. As an alternative to the Wald test, for models
5041estimated via OLS or WLS only, you can give the --bootstrap option to
5042perform a bootstrapped test of the restriction.
5043
5044In the system case, the test statistic depends on the estimator chosen: a
5045Likelihood Ratio test if the system is estimated using a Maximum Likelihood
5046method, or an asymptotic F-test otherwise.
5047
5048There are three alternatives to the method of expressing restrictions
5049described above. First, a set of g linear restrictions on a k-vector of
parameters, beta, may be written compactly as Rbeta - q = 0, where R is a g
5051x k matrix and q is a g-vector. You can specify a restriction by giving the
5052names of pre-defined, conformable matrices to be used as R and q, as in
5053
	restrict
	  R = Rmat
	  q = qvec
	end restrict
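
As a concrete sketch, with k = 3 parameters and the single restriction
b[2] - b[3] = 0 (so g = 1), R is a 1 x 3 matrix and q a 1-vector. The data
are simulated and all names are assumptions made for the example.

	nulldata 100
	set seed 5
	series x1 = normal()
	series x2 = normal()
	series y = 1 + 0.5*x1 + 0.5*x2 + normal()
	ols y 0 x1 x2 --quiet
	# row of R picks out b[2] - b[3]; q holds the right-hand side
	matrix Rmat = {0, 1, -1}
	matrix qvec = {0}
	restrict
	  R = Rmat
	  q = qvec
	end restrict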
5058
5059Second, in a variant that may be useful when restrict is used within a
5060function, you can construct the set of restriction statements in the form of
5061an array of strings. You then use the inject keyword with the name of the
5062array. Here's a simple example:
5063
	strings RS = array(2)
	RS[1] = "b[1,2] = 0"
	RS[2] = "b[2,1] = 0"
	restrict
	  inject RS
	end restrict
5070
5071In actual usage of this method one would likely use "sprintf" to construct
5072the strings, based on input to a function.
5073
5074Lastly, if you wish to test a nonlinear restriction (this is currently
5075available for single-equation models only) you should give the restriction
5076as the name of a function, preceded by "rfunc = ", as in
5077
	restrict
	  rfunc = myfunction
	end restrict
5081
5082The constraint function should take a single const matrix argument; this
5083will be automatically filled out with the parameter vector. And it should
5084return a vector which is zero under the null hypothesis, non-zero otherwise.
5085The length of the vector is the number of restrictions. This function is
5086used as a "callback" by gretl's numerical Jacobian routine, which calculates
5087a Wald test statistic via the delta method.
5088
5089Here is a simple example of a function suitable for testing one nonlinear
5090restriction, namely that two pairs of parameter values have a common ratio.
5091
	function matrix restr (const matrix b)
	  matrix v = b[1]/b[2] - b[4]/b[5]
	  return v
	end function
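
For a complete, if artificial, illustration of the rfunc mechanism, the
sketch below tests a different nonlinear restriction, namely that the
product of the two slope coefficients equals one. The function name
prod_restr, the simulated data and the coefficient values are assumptions
made for the example.

	# returns zero under the null b[2]*b[3] = 1
	function matrix prod_restr (const matrix b)
	  matrix v = b[2]*b[3] - 1
	  return v
	end function

	nulldata 200
	set seed 3
	series x1 = normal()
	series x2 = normal()
	series y = 1 + 2*x1 + 0.5*x2 + normal()
	ols y 0 x1 x2 --quiet
	restrict
	  rfunc = prod_restr
	end restrict
	printf "test statistic = %g, p-value = %g\n", $test, $pvalue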
5096
5097On successful completion of the restrict command the accessors "$test" and
5098"$pvalue" give the test statistic and its p-value.
5099
5100When testing restrictions on a single-equation model estimated via OLS, or
5101on a VECM, the --full option can be used to set the restricted estimates as
5102the "last model" for the purposes of further testing or the use of accessors
5103such as $coeff and $vcv. Note that some special considerations apply in the
5104case of testing restrictions on Vector Error Correction Models. Please see
5105chapter 33 of the Gretl User's Guide for details.
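
A minimal sketch of the --full option in the OLS case, using simulated data
(all names and values are assumptions): after the restrict block, $coeff
refers to the restricted estimates.

	nulldata 100
	set seed 5
	series x1 = normal()
	series x2 = normal()
	series y = 1 + 0.7*x1 + 0.3*x2 + normal()
	ols y 0 x1 x2 --quiet
	restrict --full
	 b[x1] + b[x2] = 1
	end restrict
	# the restricted model is now the "last model"
	matrix br = $coeff
	print br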
5106
5107Menu path:    Model window, /Tests/Linear restrictions
5108
5109# rmplot Graphs
5110
5111Argument:   series
5112Options:    --trim (see below)
5113            --quiet (suppress printed output)
5114            --output=filename (see below)
5115
5116Range-mean plot: this command creates a simple graph to help in deciding
5117whether a time series, y(t), has constant variance or not. We take the full
5118sample t=1,...,T and divide it into small subsamples of arbitrary size k.
5119The first subsample is formed by y(1),...,y(k), the second is y(k+1), ...,
5120y(2k), and so on. For each subsample we calculate the sample mean and range
5121(= maximum minus minimum), and we construct a graph with the means on the
5122horizontal axis and the ranges on the vertical. So each subsample is
5123represented by a point in this plane. If the variance of the series is
5124constant we would expect the subsample range to be independent of the
5125subsample mean; if we see the points approximate an upward-sloping line this
5126suggests the variance of the series is increasing in its mean; and if the
5127points approximate a downward sloping line this suggests the variance is
5128decreasing in the mean.
5129
5130Besides the graph, gretl displays the means and ranges for each subsample,
5131along with the slope coefficient for an OLS regression of the range on the
5132mean and the p-value for the null hypothesis that this slope is zero. If the
5133slope coefficient is significant at the 10 percent significance level then
5134the fitted line from the regression of range on mean is shown on the graph.
5135The t-statistic for the null, and the corresponding p-value, are recorded
5136and may be retrieved using the accessors "$test" and "$pvalue" respectively.
5137
5138If the --trim option is given, the minimum and maximum values in each
5139sub-sample are discarded before calculating the mean and range. This makes
5140it less likely that outliers will distort the analysis.
5141
5142If the --quiet option is given, no graph is shown and no output is printed;
5143only the t-statistic and p-value are recorded. Otherwise the form of the
5144plot can be controlled via the --output option; this works as described in
5145connection with the "gnuplot" command.
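
For example, the following sketch builds a monthly series whose level and
spread both grow over time, then retrieves the recorded statistics without
producing a graph. The series name, seed and growth rate are assumptions.

	nulldata 120
	setobs 12 2000:01 --time-series
	set seed 21
	# multiplicative noise: the spread rises with the level
	series y = exp(0.02*index + 0.1*normal())
	rmplot y --quiet
	printf "t = %g, p-value = %g\n", $test, $pvalue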
5146
5147Menu path:    /Variable/Range-mean graph
5148
5149# run Programming
5150
5151Argument:   filename
5152
5153Executes the commands in filename then returns control to the interactive
5154prompt. This command is intended for use with the command-line program
5155gretlcli, or at the "gretl console" in the GUI program.
5156
5157See also "include".
5158
5159Menu path:    Run icon in script window
5160
5161# runs Tests
5162
5163Argument:   series
5164Options:    --difference (use first difference of variable)
5165            --equal (positive and negative values are equiprobable)
5166
5167Carries out the nonparametric "runs" test for randomness of the specified
5168series, where runs are defined as sequences of consecutive positive or
5169negative values. If you want to test for randomness of deviations from the
5170median, for a variable named x1 with a non-zero median, you can do the
5171following:
5172
	series signx1 = x1 - median(x1)
	runs signx1
5175
5176If the --difference option is given, the variable is differenced prior to
5177the analysis, hence the runs are interpreted as sequences of consecutive
5178increases or decreases in the value of the variable.
5179
5180If the --equal option is given, the null hypothesis incorporates the
5181assumption that positive and negative values are equiprobable, otherwise the
5182test statistic is invariant with respect to the "fairness" of the process
5183generating the sequence, and the test focuses on independence alone.
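
A short sketch of the two options, applied to a simulated series (the name x
and the seed are assumptions):

	nulldata 100
	set seed 17
	series x = normal()
	# null includes equal probability of positive and negative values
	runs x --equal
	# runs of consecutive increases or decreases
	runs x --difference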
5184
5185Menu path:    /Tools/Nonparametric tests
5186
5187# scatters Graphs
5188
5189Arguments:  yvar ; xvars  or yvars ; xvar
5190Options:    --with-lines (create line graphs)
5191            --matrix=name (plot columns of named matrix)
5192            --output=filename (send output to specified file)
5193Examples:   scatters 1 ; 2 3 4 5
5194            scatters 1 2 3 4 5 6 ; 7
5195            scatters y1 y2 y3 ; x --with-lines
5196
5197Generates pairwise graphs of yvar against all the variables in xvars, or of
5198all the variables in yvars against xvar. The first example above puts
5199variable 1 on the y-axis and draws four graphs, the first having variable 2
5200on the x-axis, the second variable 3 on the x-axis, and so on. The second
5201example plots each of variables 1 through 6 against variable 7 on the
5202x-axis. Scanning a set of such plots can be a useful step in exploratory
5203data analysis. The maximum number of plots is 16; any extra variable in the
5204list will be ignored.
5205
5206By default the graphs are scatterplots, but if you give the --with-lines
5207flag they will be line graphs.
5208
5209For details on usage of the --output option, please see the "gnuplot"
5210command.
5211
5212If a named matrix is specified as the data source the x and y lists should
5213be given as 1-based column numbers; or alternatively, if no such numbers are
5214given, all the columns are plotted against time or an index variable.
5215
5216If the dataset is time-series, then the second sub-list can be omitted, in
5217which case it will implicitly be taken as "time", so you can plot multiple
time series in separate sub-graphs.
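
As a sketch of the matrix variant (the matrix name M and its random contents
are assumptions, and a working gnuplot installation is presumed), the
following plots column 1 against columns 2 and 3 in turn:

	set seed 13
	matrix M = mnormal(50, 3)
	# x and y lists are given as 1-based column numbers
	scatters 1 ; 2 3 --matrix=M --output=display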
5219
5220Menu path:    /View/Multiple graphs
5221
5222# sdiff Transformations
5223
5224Argument:   varlist
5225
5226The seasonal difference of each variable in varlist is obtained and the
5227result stored in a new variable with the prefix sd_. This command is
5228available only for seasonal time series.
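
For quarterly data the seasonal difference spans four periods. The sketch
below, on simulated data with an assumed series name y, sets the new series
beside a hand-computed equivalent for comparison:

	nulldata 40
	setobs 4 2010:1 --time-series
	set seed 31
	series y = index + normal()
	# creates the series sd_y
	sdiff y
	# the same quantity computed by hand
	series check = y - y(-4)
	print sd_y check --byobs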
5229
5230Menu path:    /Add/Seasonal differences of selected variables
5231
5232# set Programming
5233
5234Variants:   set variable value
5235            set --to-file=filename
5236            set --from-file=filename
5237            set stopwatch
5238            set
5239Examples:   set svd on
5240            set csv_delim tab
5241            set horizon 10
5242            set --to-file=mysettings.inp
5243
5244The most common use of this command is the first variant shown above, where
5245it is used to set the value of a selected program parameter. This is
5246discussed in detail below. The other uses are: with --to-file, to write a
5247script file containing all the current parameter settings; with --from-file
5248to read a script file containing parameter settings and apply them to the
5249current session; with stopwatch to zero the gretl "stopwatch" which can be
5250used to measure CPU time (see the entry for the "$stopwatch" accessor); or,
5251if the word set is given alone, to print the current settings.
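
As an illustration of the stopwatch variant, the following sketch times an
arbitrary matrix computation; the matrix names and dimensions are
assumptions chosen only to give the program some work to do.

	set stopwatch
	matrix A = mnormal(300, 300)
	matrix B = inv(A'A)
	# time elapsed since the stopwatch was zeroed
	printf "elapsed: %g seconds\n", $stopwatch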
5252
Values set via this command remain in force for the duration of the gretl
5254session unless they are changed by a further call to "set". The parameters
5255that can be set in this way are enumerated below. Note that the settings of
5256hc_version, hac_lag and hac_kernel are used when the --robust option is
5257given to an estimation command.
5258
5259The available settings are grouped under the following categories: program
5260interaction and behavior, numerical methods, random number generation,
5261robust estimation, filtering, time series estimation, and interaction with
5262GNU R.
5263
5264Program interaction and behavior
5265
5266These settings are used for controlling various aspects of the way gretl
5267interacts with the user.
5268
5269  workdir: path. Sets the default directory for writing and reading files,
5270  whenever full paths are not specified.
5271
5272  use_cwd: on or off (the default). Governs the setting of workdir at
5273  start-up: if it's on, the working directory is inherited from the shell,
5274  otherwise it is set to whatever was selected in the previous gretl
5275  session.
5276
5277  echo: off or on (the default). Suppress or resume the echoing of commands
5278  in gretl's output.
5279
5280  messages: off or on (the default). Suppress or resume the printing of
5281  non-error messages associated with various commands, for example when a
5282  new variable is generated or when the sample range is changed.
5283
5284  verbose: off, on (the default) or comments. Acts as a "master switch" for
5285  echo and messages (see above), turning them both off or on simultaneously.
5286  The comments argument turns off echo and messages but preserves printing
5287  of comments in a script.
5288
5289  warnings: off or on (the default). Suppress or resume the printing of
5290  warning messages issued when arithmetical operations produce non-finite
5291  values.
5292
5293  csv_delim: either comma (the default), space, tab or semicolon. Sets the
5294  column delimiter used when saving data to file in CSV format.
5295
5296  csv_write_na: the string used to represent missing values when writing
5297  data to file in CSV format. Maximum 7 characters; the default is NA.
5298
5299  csv_read_na: the string taken to represent missing values (NAs) when
5300  reading data in CSV format. Maximum 7 characters. The default depends on
5301  whether a data column is found to contain numerical data (mostly) or
5302  string values. For numerical data the following are taken as indicating
5303  NAs: an empty cell, or any of the strings NA, N.A., na, n.a., N/A, #N/A,
5304  NaN, .NaN, ., .., -999, and -9999. For string-valued data only a blank
5305  cell, or a cell containing an empty string, is counted as NA. These
5306  defaults can be reimposed by giving default as the value for csv_read_na.
5307  To specify that only empty cells are read as NAs, give a value of "". Note
5308  that empty cells are always read as NAs regardless of the setting of this
5309  variable.
5310
5311  csv_digits: a positive integer specifying the number of significant digits
5312  to use when writing data in CSV format. By default up to 15 digits are
5313  used depending on the precision of the original data. Note that CSV output
5314  employs the C library's fprintf function with "%g" conversion, which means
5315  that trailing zeros are dropped.
5316
5317  display_digits: an integer from 3 to 6, specifying the number of
5318  significant digits to use when displaying regression coefficients and
5319  standard errors (the default being 6). This setting can also be used to
5320  limit the number of digits shown by the "summary" command; in this case
5321  the default (and also the maximum) is 5, or 4 when the --simple option is
5322  given.
5323
5324  mwrite_g: on or off (the default). When writing a matrix to file as text,
5325  gretl by default uses scientific notation with 18-digit precision, hence
5326  ensuring that the stored values are a faithful representation of the
5327  numbers in memory. When writing primary data with no more than 6 digits of
5328  precision it may be preferable to use %g format for a more compact and
5329  human-readable file; you can make this switch via set mwrite_g on.
5330
5331  force_decpoint: on or off (the default). Force gretl to use the decimal
5332  point character, in a locale where another character (most likely the
5333  comma) is the standard decimal separator.
5334
5335  loop_maxiter: one non-negative integer value (default 100000). Sets the
5336  maximum number of iterations that a while loop is allowed before halting
5337  (see "loop"). Note that this setting only affects the while variant; its
5338  purpose is to guard against inadvertently infinite loops. Setting this
5339  value to 0 has the effect of disabling the limit; use with caution.
5340
5341  max_verbose: off (the default), on or full. Controls the verbosity of
5342  commands and functions that use numerical optimization methods. The on
5343  choice applies only to functions (such as "BFGSmax" and "NRmax") which
5344  work silently by default; the effect is to print basic iteration
5345  information. The full setting can be used to trigger more detailed output,
5346  including parameter values and their respective gradient for the objective
5347  function at each iteration. This choice applies both to functions of the
5348  above-mentioned sort and to commands that rely on numerical optimization
5349  such as "arima", "probit" and "mle". In the case of commands the effect is
5350  to make their --verbose option produce more detail. See also chapter 37 of
5351  the Gretl User's Guide.
5352
5353  debug: 1, 2 or 0 (the default). This is for use with user-defined
5354  functions. Setting debug to 1 is equivalent to turning messages on within
5355  all such functions; setting this variable to 2 has the additional effect
5356  of turning on max_verbose within all functions.
5357
5358  shell_ok: on or off (the default). Enable launching external programs from
5359  gretl via the system shell. This is disabled by default for security
5360  reasons, and can only be enabled via the graphical user interface
5361  (Tools/Preferences/General). However, once set to on, this setting will
5362  remain active for future sessions until explicitly disabled.
5363
5364  bfgs_verbskip: one integer. This setting affects the behavior of the
5365  --verbose option to those commands that use BFGS as an optimization
  algorithm and is used to compact output. If bfgs_verbskip is set to, say,
5367  3, then the --verbose switch will only print iterations 3, 6, 9 and so on.
5368
5369  skip_missing: on (the default) or off. Controls gretl's behavior when
  constructing a matrix from data series: the default is to skip data rows
  that contain one or more missing values, but if skip_missing is set off
  missing values are converted to NaNs.
5373
5374  matrix_mask: the name of a series, or the keyword null. Offers greater
5375  control than skip_missing when constructing matrices from series: the data
5376  rows selected for matrices are those with non-zero (and non-missing)
5377  values in the specified series. The selected mask remains in force until
5378  it is replaced, or removed via the null keyword.
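
  The following sketch contrasts skip_missing and matrix_mask on an
  artificial dataset. The series names x, y and pick are made up for the
  illustration, and the comments restate the behavior described above
  rather than verified output:

	nulldata 6
	series x = (index == 3) ? NA : index
	series y = index^2
	# default: the row holding the missing value is skipped
	matrix A = {x, y}
	# with skip_missing off all rows are kept, the NA becoming NaN
	set skip_missing off
	matrix B = {x, y}
	set skip_missing on
	# with a mask, only rows where pick is non-zero are selected
	series pick = (index > 3)
	set matrix_mask pick
	matrix C = {x, y}
	set matrix_mask null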
5379
5380  quantile_type: must be one of Q6 (the default), Q7 or Q8. Selects the
5381  specific method used by the "quantile" function. For details see Hyndman
5382  and Fan (1996) or the Wikipedia entry at
5383  https://en.wikipedia.org/wiki/Quantile.
5384
5385  huge: a large positive number (by default, 1.0E100). This setting controls
5386  the value returned by the accessor "$huge".
5387
5388  assert: off (the default), warn or stop. Controls the consequences of
5389  failure (return value of 0) from the "assert" function.
5390
5391  datacols: an integer from 1 to 15, with default value 5. Sets the maximum
5392  number of series shown side-by-side when data are displayed by
5393  observation.
5394
5395  plot_collection: on, auto or off. This setting affects the way plots are
5396  displayed during interactive use. If it's on, plots of the same pixel size
  are gathered in a "plot collection", that is, a single output window in
5398  which you can browse through the various plots going back and forth. With
5399  the off setting, instead, a different window for each plot will be
5400  generated, as in older gretl versions. Finally, the auto setting has the
5401  effect of enabling the plot collection mode only for graphs that are
5402  generated within 1.25 seconds from one another (for example, as a result
5403  of executing plotting commands in a loop).
5404
5405Numerical methods
5406
5407These settings are used for controlling the numerical algorithms that gretl
5408uses for estimation.
5409
5410  optimizer: either auto (the default), BFGS or newton. Sets the
5411  optimization algorithm used for various ML estimators, in cases where both
5412  BFGS and Newton-Raphson are applicable. The default is to use
5413  Newton-Raphson where an analytical Hessian is available, otherwise BFGS.
5414
5415  bhhh_maxiter: one integer, the maximum number of iterations for gretl's
5416  internal BHHH routine, which is used in the "arma" command for conditional
5417  ML estimation. If convergence is not achieved after bhhh_maxiter, the
5418  program returns an error. The default is set at 500.
5419
5420  bhhh_toler: one floating point value, or the string default. This is used
5421  in gretl's internal BHHH routine to check if convergence has occurred. The
5422  algorithm stops iterating as soon as the increment in the log-likelihood
5423  between iterations is smaller than bhhh_toler. The default value is
5424  1.0E-06; this value may be re-established by typing default in place of a
5425  numeric value.
5426
5427  bfgs_maxiter: one integer, the maximum number of iterations for gretl's
5428  BFGS routine, which is used for "mle", "gmm" and several specific
5429  estimators. If convergence is not achieved in the specified number of
5430  iterations, the program returns an error. The default value depends on the
5431  context, but is typically of the order of 500.
5432
5433  bfgs_toler: one floating point value, or the string default. This is used
5434  in gretl's BFGS routine to check if convergence has occurred. The
5435  algorithm stops as soon as the relative improvement in the objective
5436  function between iterations is smaller than bfgs_toler. The default value
5437  is the machine precision to the power 3/4; this value may be
5438  re-established by typing default in place of a numeric value.
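
  For illustration only (the numeric values are arbitrary, not
  recommendations), the BFGS limits might be adjusted before a difficult
  estimation and the tolerance restored afterwards:

	set bfgs_maxiter 1000
	set bfgs_toler 1.0e-10
	# ... run the mle or gmm block of interest here ...
	set bfgs_toler default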
5439
5440  bfgs_maxgrad: one floating point value. This is used in gretl's BFGS
5441  routine to check if the norm of the gradient is reasonably close to zero
5442  when the bfgs_toler criterion is met. A warning is printed if the norm of
5443  the gradient exceeds 1; an error is flagged if the norm exceeds
5444  bfgs_maxgrad. At present the default is the permissive value of 5.0.
5445
5446  bfgs_richardson: on or off (the default). Use Richardson extrapolation
5447  when computing numerical derivatives in the context of BFGS maximization.
5448
5449  initvals: the name of a predefined matrix. Allows manual setting of the
5450  initial parameter vector for certain estimation commands that involve
5451  numerical optimization: arma, garch, logit and probit, tobit and intreg,
5452  biprobit, duration, poisson, negbin, and also when imposing certain sorts
5453  of restriction associated with VECMs. Unlike other settings, initvals is
5454  not persistent: it resets to the default initializer after its first use.
5455  For details in connection with ARMA estimation see chapter 31 of the Gretl
5456  User's Guide.
5457
5458  lbfgs: on or off (the default). Use the limited-memory version of BFGS
5459  (L-BFGS-B) instead of the ordinary algorithm. This may be advantageous
5460  when the function to be maximized is not globally concave.
5461
5462  lbfgs_mem: an integer value in the range 3 to 20 (with a default value of
5463  8). This determines the number of corrections used in the limited memory
5464  matrix when L-BFGS-B is employed.
5465
5466  nls_toler: a floating-point value. Sets the tolerance used in judging
5467  whether or not convergence has occurred in nonlinear least squares
5468  estimation using the "nls" command. The default value is the machine
5469  precision to the power 3/4; this value may be re-established by typing
5470  default in place of a numeric value.
5471
5472  svd: on or off (the default). Use SVD rather than Cholesky or QR
5473  decomposition in least squares calculations. This option applies to the
5474  mols function as well as various internal calculations, but not to the
5475  regular "ols" command.
5476
5477  force_qr: on or off (the default). This applies to the "ols" command. By
5478  default this command computes OLS estimates using Cholesky decomposition
5479  (the fastest method), with a fallback to QR if the data seem too
5480  ill-conditioned. You can use force_qr to skip the Cholesky step; in
5481  "doubtful" cases this may ensure greater accuracy.
5482
5483  fcp: on or off (the default). Use the algorithm of Fiorentini, Calzolari
5484  and Panattoni rather than native gretl code when computing GARCH
5485  estimates.
5486
5487  gmm_maxiter: one integer, the maximum number of iterations for gretl's
5488  "gmm" command when in iterated mode (as opposed to one- or two-step). The
5489  default value is 250.
5490
5491  nadarwat_trim: one integer, the trim parameter used in the "nadarwat"
5492  function.
5493
5494  fdjac_quality: one integer (0, 1 or 2), the algorithm used by the "fdjac"
5495  function; the default is 0.
5496
5497Random number generation
5498
5499  seed: an unsigned integer. Sets the seed for the pseudo-random number
5500  generator. By default this is set from the system time; if you want to
5501  generate repeatable sequences of random numbers you must set the seed
5502  manually.
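
  A short sketch of repeatable random draws; the seed value and the
  series names u1 and u2 are arbitrary. Resetting the same seed before
  each draw should reproduce the same sequence:

	nulldata 50
	set seed 12345
	series u1 = normal()
	set seed 12345
	series u2 = normal()
	# a difference of zero indicates that the two draws coincide
	printf "max abs difference: %g\n", max(abs(u1 - u2))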
5503
5504Robust estimation
5505
5506  bootrep: an integer. Sets the number of replications for the "restrict"
5507  command with the --bootstrap option.
5508
  garch_vcv: unset, hessian, im (information matrix), op (outer product
5510  matrix), qml (QML estimator), bw (Bollerslev-Wooldridge). Specifies the
5511  variant that will be used for estimating the coefficient covariance
5512  matrix, for GARCH models. If unset is given (the default) then the Hessian
5513  is used unless the "robust" option is given for the garch command, in
5514  which case QML is used.
5515
5516  arma_vcv: hessian (the default) or op (outer product matrix). Specifies
5517  the variant to be used when computing the covariance matrix for ARIMA
5518  models.
5519
5520  force_hc: off (the default) or on. By default, with time-series data and
5521  when the --robust option is given with ols, the HAC estimator is used. If
5522  you set force_hc to "on", this forces calculation of the regular
5523  Heteroskedasticity Consistent Covariance Matrix (HCCM), which does not
5524  take autocorrelation into account. Note that VARs are treated as a special
5525  case: when the --robust option is given the default method is regular
5526  HCCM, but the --robust-hac flag can be used to force the use of a HAC
5527  estimator.
5528
5529  robust_z: off (the default) or on. This controls the distribution used
5530  when calculating p-values based on robust standard errors in the context
5531  of least-squares estimators. By default gretl uses the Student t
5532  distribution but if robust_z is turned on the normal distribution is used.
5533
5534  hac_lag: nw1 (the default), nw2, nw3 or an integer. Sets the maximum lag
5535  value or bandwidth, p, used when calculating HAC (Heteroskedasticity and
5536  Autocorrelation Consistent) standard errors using the Newey-West approach,
5537  for time series data. nw1 and nw2 represent two variant automatic
5538  calculations based on the sample size, T: for nw1, p = 0.75 * T^(1/3), and
5539  for nw2, p = 4 * (T/100)^(2/9). nw3 calls for data-based bandwidth
5540  selection. See also qs_bandwidth and hac_prewhiten below.
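
  To get a feel for the two automatic rules, the formulas above can be
  evaluated directly. This is only a sketch: the sample size T = 200 is
  an arbitrary choice, and the raw values printed here are not integers,
  so the lag gretl actually uses may differ by rounding:

	scalar T = 200
	scalar p1 = 0.75 * T^(1/3)
	scalar p2 = 4 * (T/100)^(2/9)
	printf "nw1: p = %g, nw2: p = %g\n", p1, p2
	# select the second rule for subsequent HAC calculations
	set hac_lag nw2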
5541
5542  hac_kernel: bartlett (the default), parzen, or qs (Quadratic Spectral).
5543  Sets the kernel, or pattern of weights, used when calculating HAC standard
5544  errors.
5545
5546  hac_prewhiten: on or off (the default). Use Andrews-Monahan prewhitening
5547  and re-coloring when computing HAC standard errors. This also implies use
5548  of data-based bandwidth selection.
5549
5550  hc_version: 0 (the default), 1, 2, 3 or 3a. Sets the variant used when
5551  calculating Heteroskedasticity Consistent standard errors with
5552  cross-sectional data. The first four options correspond to the HC0, HC1,
5553  HC2 and HC3 discussed by Davidson and MacKinnon in Econometric Theory and
5554  Methods, chapter 5. HC0 produces what are usually called "White's standard
5555  errors". Variant 3a is the MacKinnon-White "jackknife" procedure.
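
  As an illustrative sketch on artificial cross-sectional data (the
  series x and y are invented for the purpose), HC3 standard errors
  could be requested as follows:

	nulldata 100
	series x = normal()
	series y = 1 + 0.5*x + normal()
	set hc_version 3
	ols y const x --robust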
5556
5557  pcse: off (the default) or on. By default, when estimating a model using
5558  pooled OLS on panel data with the --robust option, the Arellano estimator
5559  is used for the covariance matrix. If you set pcse to "on", this forces
5560  use of the Beck and Katz Panel Corrected Standard Errors (which do not
5561  take autocorrelation into account).
5562
5563  qs_bandwidth: Bandwidth for HAC estimation in the case where the Quadratic
5564  Spectral kernel is selected. (Unlike the Bartlett and Parzen kernels, the
5565  QS bandwidth need not be an integer.)
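
  A possible combination of the HAC settings above, on artificial
  monthly data. The series names and the bandwidth of 3.5 are
  assumptions made purely for illustration:

	nulldata 120
	setobs 12 2000:01 --time-series
	series x = normal()
	series y = 1 + 0.5*x + normal()
	set hac_kernel qs
	# non-integer bandwidths are acceptable with the QS kernel
	set qs_bandwidth 3.5
	ols y const x --robust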
5566
5567Time series
5568
5569  horizon: one integer (the default is based on the frequency of the data).
5570  Sets the horizon for impulse responses and forecast variance
5571  decompositions in the context of vector autoregressions.
5572
5573  vecm_norm: phillips (the default), diag, first or none. Used in the
5574  context of VECM estimation via the "vecm" command for identifying the
  cointegration vectors. See chapter 33 of the Gretl User's Guide for
5576  details.
5577
5578  boot_iters: one integer, B. Sets the number of bootstrap iterations used
5579  when computing impulse response functions with confidence intervals. The
5580  default is 1999. It is recommended that B + 1 is evenly divisible by
  100α/2, so for example with α = 0.1, B + 1 should be a multiple of 5. The
5582  minimum acceptable value is 499.
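
  The divisibility recommendation can be checked with a little
  arithmetic. In this sketch alpha, B and step are just local scalar
  names; a whole-number ratio means the recommendation is met:

	scalar alpha = 0.1
	scalar B = 1999
	scalar step = 100 * alpha / 2
	printf "(B + 1) / step = %g\n", (B + 1) / step
	set boot_iters 1999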
5583
5584Interaction with R
5585
5586  R_lib: on (the default) or off. When sending instructions to be executed
5587  by R, use the R shared library by preference to the R executable, if the
5588  library is available.
5589
5590  R_functions: off (the default) or on. Recognize functions defined in R as
5591  if they were native functions (the namespace prefix "R." is required). See
5592  chapter 44 of the Gretl User's Guide for details on this and the previous
5593  item.
5594
5595Miscellaneous
5596
5597  mpi_use_smt: on or off (the default). This switch affects the default
5598  number of processes launched in an mpi block within a script. If the
5599  switch is off the default number of processes equals the number of
5600  physical cores on the local machine; if it's on the default is the maximum
5601  number of threads, which will be twice the number of physical cores if the
5602  cores support SMT (Simultaneous MultiThreading, also known as
5603  Hyper-Threading). This applies only if the user has not specified a number
5604  of processes, either directly or indirectly (by specifying a hosts file
5605  for use with MPI).
5606
5607  graph_theme: a string, one of altpoints, classic, dark2 (the current
5608  default), ethan, iwanthue or sober. This sets the "theme" used for graphs
5609  produced by gretl. The classic option reverts to the single theme that was
5610  in force prior to version 2020c of gretl.
5611
5612# setinfo Dataset
5613
5614Argument:   series
5615Options:    --description=string (set description)
5616            --graph-name=string (set graph name)
5617            --discrete (mark series as discrete)
5618            --continuous (mark series as continuous)
5619            --coded (mark as an encoding)
5620            --numeric (mark as not an encoding)
5621            --midas (mark as component of high-frequency data)
5622Examples:   setinfo x1 --description="Description of x1"
5623            setinfo y --graph-name="Some string"
5624            setinfo z --discrete
5625
5626If the options --description or --graph-name are invoked the argument must
5627be a single series, otherwise it may be a list of series in which case it
operates on all members of the list. This command can set the attributes
described below.
5630
5631If the --description flag is given followed by a string in double quotes,
5632that string is used to set the variable's descriptive label. This label is
5633shown in response to the "labels" command, and is also shown in the main
5634window of the GUI program.
5635
5636If the --graph-name flag is given followed by a quoted string, that string
5637will be used in place of the variable's name in graphs.
5638
5639If one or other of the --discrete or --continuous option flags is given, the
5640variable's numerical character is set accordingly. The default is to treat
5641all series as continuous; setting a series as discrete affects the way the
variable is handled in other commands and functions, such as "freq" or
"dummify".
5644
5645If one or other of the --coded or --numeric option flags is given, the
5646status of the given series is set accordingly. The default is to treat all
5647numerical values as meaningful as such, at least in an ordinal sense;
5648setting a series as coded means that the numerical values are an arbitrary
5649encoding of qualitative characteristics.
5650
5651The --midas option sets a flag indicating that a given series holds data of
5652a higher frequency than the base frequency of the dataset; for example, the
5653dataset is quarterly and the series holds values for month 1, 2 or 3 of each
5654quarter. (MIDAS = Mixed Data Sampling.)
5655
5656Menu path:    /Variable/Edit attributes
5657Other access: Main window pop-up menu
5658
5659# setmiss Dataset
5660
5661Arguments:  value [ varlist ]
5662Examples:   setmiss -1
5663            setmiss 100 x2
5664
5665Get the program to interpret some specific numerical data value (the first
5666parameter to the command) as a code for "missing", in the case of imported
5667data. If this value is the only parameter, as in the first example above,
5668the interpretation will be applied to all series in the data set. If "value"
5669is followed by a list of variables, by name or number, the interpretation is
5670confined to the specified variable(s). Thus in the second example the data
5671value 100 is interpreted as a code for "missing", but only for the variable
5672x2.
5673
5674Menu path:    /Data/Set missing value code
5675
5676# setobs Dataset
5677
5678Variants:   setobs periodicity startobs
5679            setobs unitvar timevar --panel-vars
5680Options:    --cross-section (interpret as cross section)
5681            --time-series (interpret as time series)
5682            --special-time-series (see below)
5683            --stacked-cross-section (interpret as panel data)
5684            --stacked-time-series (interpret as panel data)
5685            --panel-vars (use index variables, see below)
5686            --panel-time (see below)
5687            --panel-groups (see below)
5688Examples:   setobs 4 1990:1 --time-series
5689            setobs 12 1978:03
5690            setobs 1 1 --cross-section
5691            setobs 20 1:1 --stacked-time-series
5692            setobs unit year --panel-vars
5693
5694This command forces the program to interpret the current data set as having
5695a specified structure.
5696
5697In the first form of the command the periodicity, which must be an integer,
5698represents frequency in the case of time-series data (1 = annual; 4 =
5699quarterly; 12 = monthly; 52 = weekly; 5, 6, or 7 = daily; 24 = hourly). In
5700the case of panel data the periodicity means the number of lines per data
5701block: this corresponds to the number of cross-sectional units in the case
5702of stacked cross-sections, or the number of time periods in the case of
5703stacked time series. In the case of simple cross-sectional data the
5704periodicity should be set to 1.
5705
5706The starting observation represents the starting date in the case of time
5707series data. Years may be given with two or four digits; subperiods (for
5708example, quarters or months) should be separated from the year with a colon.
5709In the case of panel data the starting observation should be given as 1:1;
5710and in the case of cross-sectional data, as 1. Starting observations for
5711daily or weekly data should be given in the form YYYY-MM-DD (or simply as 1
5712for undated data).
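
As a sketch of the conventions just described, using empty datasets
whose sizes and starting date are arbitrary choices:

	# 60 rows read as stacked time series with 20 periods per unit,
	# which implies 3 cross-sectional units
	nulldata 60
	setobs 20 1:1 --stacked-time-series
	# 7-day daily data, with the start given as YYYY-MM-DD
	nulldata 60
	setobs 7 2020-01-01 --time-series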
5713
5714Certain time-series periodicities have standard interpretations -- for
5715example, 12 = monthly and 4 = quarterly. If you have unusual time-series
5716data to which the standard interpretation does not apply, you can signal
5717this by giving the --special-time-series option. In that case gretl will not
5718(for example) report your frequency-12 data as being monthly.
5719
5720If no explicit option flag is given to indicate the structure of the data
5721the program will attempt to guess the structure from the information given.
5722
5723The second form of the command (which requires the --panel-vars flag) may be
5724used to impose a panel interpretation when the data set contains variables
5725that uniquely identify the cross-sectional units and the time periods. The
5726data set will be sorted as stacked time series, by ascending values of the
5727units variable, unitvar.
5728
5729Panel-specific options
5730
5731The --panel-time and --panel-groups options can only be used with a dataset
5732which has already been defined as a panel.
5733
5734The purpose of --panel-time is to set extra information regarding the time
5735dimension of the panel. This should be given on the pattern of the first
5736form of setobs noted above. For example, the following may be used to
5737indicate that the time dimension of a panel is quarterly, starting in the
5738first quarter of 1990.
5739
5740	setobs 4 1990:1 --panel-time
5741
5742The purpose of --panel-groups is to create a string-valued series holding
5743names for the groups (individuals, cross-sectional units) in the panel.
5744(This will be used where appropriate in panel graphs.) With this option you
5745supply either one or two arguments as follows.
5746
5747First case: the (single) argument is the name of a string-valued series. If
5748the number of distinct values equals the number of groups in the panel this
5749series is used to define the group names. If necessary, the numerical
5750content of the series will be adjusted such that the values are all 1s for
5751the first group, all 2s for the second, and so on. If the number of string
5752values doesn't match the number of groups an error is flagged.
5753
5754Second case: the first argument is the name of a series and the second is a
5755string literal or variable holding a name for each group. The series will be
5756created if it does not already exist. If the second argument is a string
5757literal or string variable the group names should be separated by spaces; if
5758a name includes spaces it should be wrapped in backslash-escaped
5759double-quotes. Alternatively the second argument may be an array of strings.
5760
5761For example, the following will create a series named country in which the
5762names in cstrs are each repeated T times, T being the time-series length of
5763the panel.
5764
5765	string cstrs = sprintf("France Germany Italy \"United Kingdom\"")
5766	setobs country cstrs --panel-groups
5767
5768Menu path:    /Data/Dataset structure
5769
5770# setopt Programming
5771
5772Arguments:  command [ action ] options
5773Examples:   setopt mle --hessian
5774            setopt ols persist --quiet
5775            setopt ols clear
5776            See also gdp_midas.inp
5777
5778This command enables the pre-setting of options for a specified command.
5779Ordinarily this is not required, but it may be useful for the writers of
5780hansl functions when they wish to make certain command options conditional
5781on the value of an argument supplied by the caller.
5782
5783For example, suppose a function offers a boolean "quiet" switch, whose
5784intended effect is to suppress the printing of results from a certain
5785regression executed within the function. In that case one might write:
5786
5787	if quiet
5788	  setopt ols --quiet
5789	endif
5790	ols ...
5791
5792The --quiet option will then be applied to the next ols command if and only
5793if the variable quiet has a non-zero value.
5794
5795By default, options set in this way apply only to the following instance of
5796command; they are not persistent. However if you give persist as the value
5797for action the options will continue to apply to the given command until
5798further notice. The antidote to the persist action is clear: this erases any
5799stored setting for the specified command.
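
A minimal sketch of the persist and clear actions, using artificial
data (the series x, z and y are invented for the illustration):

	nulldata 40
	series x = normal()
	series z = normal()
	series y = 1 + 2*x + normal()
	# --quiet now applies to every ols command until cleared
	setopt ols persist --quiet
	ols y const x
	ols y const z
	# subsequent ols commands print their results as usual
	setopt ols clear
	ols y const x z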
5800
5801It should be noted that options set via setopt are compounded with any
5802options attached to the target command directly. So for example one might
5803append the --hessian option to an mle command unconditionally but use setopt
5804to add --quiet conditionally.
5805
5806# shell Utilities
5807
5808Argument:   shellcommand
5809Examples:   ! ls -al
5810            ! notepad
5811            launch notepad
5812
5813An exclamation mark, "!", or the keyword "launch", at the beginning of a
5814command line is interpreted as an escape to the user's shell. Thus arbitrary
5815shell commands can be executed from within gretl. When "!" is used, the
5816external command is executed synchronously. That is, gretl waits for it to
5817complete before proceeding. If you want to start another program from within
5818gretl and not wait for its completion (asynchronous operation), use "launch"
5819instead.
5820
5821For reasons of security this facility is not enabled by default. To activate
5822it, check the box titled "Allow shell commands" under
5823Tools/Preferences/General in the GUI program. This also makes shell commands
5824available in the command-line program (and is the only way to do so).
5825
5826# smpl Dataset
5827
5828Variants:   smpl startobs endobs
5829            smpl +i -j
5830            smpl dumvar --dummy
5831            smpl condition --restrict
5832            smpl --no-missing [ varlist ]
5833            smpl --no-all-missing [ varlist ]
5834            smpl --contiguous [ varlist ]
5835            smpl n --random
5836            smpl full
5837Options:    --dummy (argument is a dummy variable)
5838            --restrict (apply boolean restriction)
5839            --replace (replace any existing boolean restriction)
5840            --no-missing (restrict to valid observations)
5841            --no-all-missing (omit empty observations (see below))
5842            --contiguous (see below)
5843            --random (form random sub-sample)
5844            --permanent (see below)
5845            --balanced (panel data: try to retain balanced panel)
5846            --unit (panel data: sample in cross-sectional dimension)
5847            --quiet (don't report sample range)
5848Examples:   smpl 3 10
5849            smpl 1960:2 1982:4
5850            smpl +1 -1
5851            smpl x > 3000 --restrict
5852            smpl y > 3000 --restrict --replace
5853            smpl 100 --random
5854
5855Resets the sample range. The new range can be defined in several ways. In
5856the first alternate form (and the first two examples) above, startobs and
5857endobs must be consistent with the periodicity of the data. Either one may
5858be replaced by a semicolon to leave the value unchanged. In the second form,
5859the integers i and j (which may be positive or negative, and should be
5860signed) are taken as offsets relative to the existing sample range. In the
third form dumvar must be an indicator variable with values 0 or 1 at each
5862observation; the sample will be restricted to observations where the value
5863is 1. The fourth form, using --restrict, restricts the sample to
5864observations that satisfy the given Boolean condition (which is specified
5865according to the syntax of the "genr" command).
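
The first two forms might be combined as in the following sketch, which
assumes an artificial quarterly dataset of 100 observations starting in
1990:1:

	nulldata 100
	setobs 4 1990:1 --time-series
	# move the start to 1995:1, leaving the end unchanged
	smpl 1995:1 ;
	# then drop four observations at each end of the current range
	smpl +4 -4
	smpl full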
5866
5867The options --no-missing and --no-all-missing may be used to exclude from
5868the sample observations for which data are missing. The first variant
5869excludes those rows in the dataset for which at least one variable has a
5870missing value, while the second excludes just those rows on which all
5871variables have missing values. In each case the test is confined to the
5872variables in varlist if this argument is given, otherwise it is applied to
5873all series -- with the qualification that in the case of --no-all-missing
5874and no varlist, the generic variables index and time are ignored.
5875
5876The --contiguous form of smpl is intended for use with time series data. The
5877effect is to trim any observations at the start and end of the current
5878sample range that contain missing values (either for the variables in
5879varlist, or for all data series if no varlist is given). Then a check is
5880performed to see if there are any missing values in the remaining range; if
5881so, an error is flagged.
5882
5883With the --random flag, the specified number of cases are selected from the
5884current dataset at random (without replacement). If you wish to be able to
5885replicate this selection you should set the seed for the random number
5886generator first (see the "set" command).
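
For instance (the seed and the sample sizes here are arbitrary):

	nulldata 200
	set seed 4321
	# draw 20 of the 200 cases; rerunning with the same seed
	# should give the same selection
	smpl 20 --random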
5887
5888The final form, smpl full, restores the full data range.
5889
5890Note that sample restrictions are, by default, cumulative: the baseline for
5891any smpl command is the current sample. If you wish the command to act so as
5892to replace any existing restriction you can add the option flag --replace to
5893the end of the command. (But this option is not compatible with the
5894--contiguous option.)
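
The difference between cumulative and replacing restrictions can be
sketched as follows; the uniform series x and y are made up for the
example:

	nulldata 100
	series x = uniform()
	series y = uniform()
	smpl x > 0.5 --restrict
	# cumulative: observations must now satisfy both conditions
	smpl y > 0.5 --restrict
	# replacing: only the condition on y applies
	smpl y > 0.5 --restrict --replace
	smpl full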
5895
5896The internal variable obs may be used with the --restrict form of smpl to
5897exclude particular observations from the sample. For example
5898
5899	smpl obs!=4 --restrict
5900
5901will drop just the fourth observation. If the data points are identified by
5902labels,
5903
5904	smpl obs!="USA" --restrict
5905
5906will drop the observation with label "USA".
5907
5908One point should be noted about the --dummy, --restrict and --no-missing
5909forms of smpl: "structural" information in the data file (regarding the time
5910series or panel nature of the data) is likely to be lost when this command
5911is issued. You may reimpose structure with the "setobs" command. A related
5912option, for use with panel data, is the --balanced flag: this requests that
5913a balanced panel is reconstituted after sub-sampling, via the insertion of
5914"missing rows" if need be. But note that it is not always possible to comply
5915with this request.
5916
5917The --unit option is specific to panel data: it allows you to specify a
5918range of "individuals" directly. For example:
5919
5920	# limit the sample to the first 50 individuals
5921	smpl 1 50 --unit
5922
5923By default, restrictions on the current sample range can be undone: you can
5924restore the full dataset via smpl full. However, the --permanent flag can be
5925used to substitute the restricted dataset for the original. If you give the
5926--permanent option with no other arguments or options the effect is to
5927shrink the dataset to the current sample range.
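
A brief sketch of a permanent restriction on artificial data (the
series x is invented for the illustration):

	nulldata 100
	series x = uniform()
	# the excluded observations are discarded, so a later
	# "smpl full" cannot bring them back
	smpl x > 0.5 --restrict --permanent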
5928
5929Please see chapter 5 of the Gretl User's Guide for further details.
5930
5931Menu path:    /Sample
5932
5933# spearman Statistics
5934
5935Arguments:  series1 series2
5936Option:     --verbose (print ranked data)
5937
5938Prints Spearman's rank correlation coefficient for the series series1 and
series2. The variables do not have to be ranked manually in advance; the
command takes care of this.
5941
5942The automatic ranking is from largest to smallest (i.e. the largest data
5943value gets rank 1). If you need to invert this ranking, create a new
5944variable which is the negative of the original. For example:
5945
5946	series altx = -x
5947	spearman altx y
5948
5949Menu path:    /Tools/Nonparametric tests/Correlation
5950
5951# sprintf Printing
5952
5953Obsolete command: please use the "sprintf" function instead.
5954
5955# square Transformations
5956
5957Argument:   varlist
5958Option:     --cross (generate cross-products as well as squares)
5959
5960Generates new series which are squares of the series in varlist (plus
5961cross-products if the --cross option is given). For example, "square x y"
5962will generate sq_x = x squared, sq_y = y squared and (optionally) x_y = x
times y. A dummy variable is not squared, since its square would be
identical to the original series.
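
A minimal sketch on artificial data; the series x and y are invented,
and the names in the comment follow the naming pattern described above:

	nulldata 20
	series x = normal()
	series y = normal()
	# expected to create sq_x, sq_y and the cross-product x_y
	square x y --cross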
5965
5966Menu path:    /Add/Squares of selected variables
5967
5968# stdize Transformations
5969
5970Argument:   varlist
5971Options:    --no-df-corr (no degrees of freedom correction)
5972            --center-only (don't divide by s.d.)
5973
5974By default a standardized version of each of the series in varlist is
5975obtained and the result stored in a new series with the prefix s_. For
5976example, "stdize x y" creates the new series s_x and s_y, each of which is
5977centered and divided by its sample standard deviation (with a degrees of
5978freedom correction of 1).
5979
5980If the --no-df-corr option is given no degrees of freedom correction is
5981applied; the standard deviation used is the maximum likelihood estimator. If
5982--center-only is given the series just have their means subtracted, and in
5983that case the output names have prefix c_ rather than s_.
5984
5985The functionality of this command is available in somewhat more flexible
5986form via the "stdize" function.
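
The variants described above might be used as follows; the dataset
data4-1 and its series price and sqft are assumptions made for the sake
of the example:

	open data4-1
	# standardize with the default df correction: creates s_price, s_sqft
	stdize price sqft
	# subtract the mean only: creates c_price
	stdize price --center-only
	# check: s_price should have mean 0 and s.d. 1, c_price mean 0
	summary s_price c_price --simple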
5987
5988Menu path:    /Add/Standardize selected variables
5989
5990# store Dataset
5991
5992Arguments:  filename [ varlist ]
5993Options:    --omit-obs (see below, on CSV format)
5994            --no-header (see below, on CSV format)
5995            --gnu-octave (use GNU Octave format)
5996            --gnu-R (format friendly for read.table)
5997            --gzipped[=level] (apply gzip compression)
            --jmulti (use JMulTi ASCII format)
5999            --dat (use PcGive ASCII format)
6000            --decimal-comma (use comma as decimal character)
6001            --database (use gretl database format)
6002            --overwrite (see below, on database format)
6003            --comment=string (see below)
6004            --matrix=matrix-name (see below)
6005            --compat (gdtb compatibility, see below)
6006
6007Save data to filename. By default all currently defined series are saved but
6008the optional varlist argument can be used to select a subset of series. If
6009the dataset is sub-sampled, only the observations in the current sample
6010range are saved.
6011
6012The output file will be written in the currently set "workdir", unless the
6013filename string contains a full path specification.
6014
6015Note that the store command behaves in a special manner in the context of a
6016"progressive loop"; see chapter 13 of the Gretl User's Guide for details.
6017
6018Native formats
6019
If filename has extension .gdt or .gdtb this implies saving the data in one
6021of gretl's native formats. In addition, if no extension is given .gdt is
6022taken to be implicit and the suffix is added automatically. The gdt format
6023is XML, optionally gzip-compressed, while the gdtb format is binary. The
6024former is recommended for datasets of moderate size (say, up to several
6025hundred kilobytes of data); the binary format is much faster for very large
6026datasets.
6027
6028The gdtb format was revised in gretl 2021a (producing a huge write/read
6029speed-up for super-large datasets). But if you wish to write a binary data
6030file readable by earlier gretl (2018c or higher) you should append the
6031--compat option.
6032
6033When data are saved in gdt format the --gzipped option may be used for data
6034compression. The optional parameter for this flag controls the level of
6035compression (from 0 to 9): higher levels produce a smaller file, but
6036compression takes longer. The default level is 1; a level of 0 means that no
6037compression is applied.
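
For instance (a sketch only; the dataset data4-1 and the output file names
are arbitrary choices for illustration):

	open data4-1
	# XML format with maximum gzip compression
	store houses.gdt --gzipped=9
	# binary format, saving just two of the series
	store houses.gdtb price sqft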
6038
6039Other formats
6040
6041The format in which the data are written may be controlled to a degree by
6042the extension or suffix of filename, as follows:
6043
6044  .csv: comma-separated values (CSV).
6045
6046  .txt or .asc: space-separated values.
6047
6048  .m: GNU Octave matrix format.
6049
6050  .dta: Stata dta format (version 113).
6051
6052The format-related option flags shown above can be used to force the choice
6053of format independently of the filename (or to get gretl to write in the
6054formats of PcGive or JMulTi).
6055
6056CSV options
6057
6058The option flags --omit-obs and --no-header are specific to saving data in
6059CSV format. By default, if the data are time series or panel, or if the
6060dataset includes specific observation markers, the output file includes a
6061first column identifying the observations (e.g. by date). If the --omit-obs
6062flag is given this column is omitted. The --no-header flag suppresses the
6063usual printing of the names of the variables at the top of the columns.
6064
6065The option flag --decimal-comma is also confined to CSV. Its effect is to
6066replace the decimal point with decimal comma; in addition the column
6067separator is forced to be a semicolon rather than a comma.
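
The following sketch shows the CSV-specific flags at work on a small
artificial quarterly dataset; the series name and the file names are
made up for the example:

	nulldata 8
	setobs 4 2020:1 --time-series
	series x = normal()
	# default: observation column plus a header row
	store x_default.csv x
	# bare numbers: no observation column, no variable names
	store x_plain.csv x --omit-obs --no-header
	# decimal comma, with semicolon as column separator
	store x_comma.csv x --decimal-comma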
6068
6069Storing to a database
6070
6071The option of saving in gretl database format is intended to help with the
6072construction of large sets of series with mixed frequencies and ranges of
6073observations. At present this option is available only for annual, quarterly
6074or monthly time-series data. If you save to a file that already exists, the
6075default action is to append the newly saved series to the existing content
6076of the database. In this context it is an error if one or more of the
6077variables to be saved has the same name as a variable that is already
6078present in the database. The --overwrite flag has the effect that, if there
6079are variable names in common, the newly saved variable replaces the variable
6080of the same name in the original dataset.
6081
6082The --comment option is available when saving data as a database or as CSV.
6083The required parameter is a double-quoted one-line string, attached to the
6084option flag with an equals sign. The string is inserted as a comment into
6085the database index file or at the top of the CSV output.
6086
6087Writing a matrix as a dataset
6088
6089The --matrix option requires a parameter, the name of a (non-empty) matrix.
The effect of store is then to turn the matrix into a dataset
6091"in the background" and write it to file as such. Matrix columns become
6092series; their names are taken from column-names attached to the matrix, if
6093any, or by default are assigned as v1, v2 and so on. If the matrix has row
6094names attached these are used as "observation markers" in the dataset.
6095
6096Note that matrices can be written to file in their own right, see the
6097"mwrite" function. But in some cases it may be useful to write them in
6098dataset mode.
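
A minimal sketch of the --matrix variant follows. The matrix name, the
column names and the file name are hypothetical, and the "cnameset"
function is used here on the assumption that it is available in the
gretl version at hand:

	matrix m = mnormal(10, 2)
	# attach column names, which become the series names
	cnameset(m, "alpha beta")
	# write the matrix to file as if it were a dataset
	store m_data.csv --matrix=m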
6099
6100Menu path:    /File/Save data; /File/Export data
6101
6102# summary Statistics
6103
6104Variants:   summary [ varlist ]
6105            summary --matrix=matname
6106Options:    --simple (basic statistics only)
6107            --weight=wvar (weighting variable)
6108            --by=byvar (see below)
6109Examples:   frontier.inp
6110
6111In its first form, this command prints summary statistics for the variables
6112in varlist, or for all the variables in the data set if varlist is omitted.
6113By default, output consists of the mean, standard deviation (sd),
6114coefficient of variation (= sd/mean), median, minimum, maximum, skewness
6115coefficient, and excess kurtosis. If the --simple option is given, output is
6116restricted to the mean, minimum, maximum and standard deviation.
6117
6118If the --by option is given (in which case the parameter byvar should be the
6119name of a discrete variable), then statistics are printed for sub-samples
6120corresponding to the distinct values taken on by byvar. For example, if
6121byvar is a (binary) dummy variable, statistics are given for the cases byvar
6122= 0 and byvar = 1. Note: at present, this option is incompatible with the
6123--weight option.
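
As an illustration, assuming the sample file data4-1, in which bedrms
takes on only a few distinct values:

	open data4-1
	# mark bedrms as discrete so that it qualifies as a "by" variable
	discrete bedrms
	# statistics for price within each bedrms group
	summary price --by=bedrms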
6124
6125If the alternative form is given, using a named matrix, then summary
6126statistics are printed for each column of the matrix. The --by option is not
6127available in this case.
6128
6129The table of statistics produced by summary can be retrieved in matrix form
6130via the "$result" accessor.
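
For example (the dataset and the matrix name S are assumptions of this
sketch):

	open data4-1
	summary price sqft --simple
	# grab the printed table as a matrix, one row per variable
	matrix S = $result
	print S
	# the alternative form: statistics for the columns of a matrix
	matrix m = mnormal(50, 3)
	summary --matrix=m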
6131
6132Menu path:    /View/Summary statistics
6133Other access: Main window pop-up menu
6134
6135# system Estimation
6136
6137Variants:   system method=estimator
6138            sysname <- system
6139Examples:   "Klein Model 1" <- system
6140            system method=sur
6141            system method=3sls
6142            See also klein.inp, kmenta.inp, greene14_2.inp
6143
6144Starts a system of equations. Either of two forms of the command may be
6145given, depending on whether you wish to save the system for estimation in
6146more than one way or just estimate the system once.
6147
6148To save the system you should assign it a name, as in the first example (if
6149the name contains spaces it must be surrounded by double quotes). In this
6150case you estimate the system using the "estimate" command. With a saved
6151system of equations, you are able to impose restrictions (including
6152cross-equation restrictions) using the "restrict" command.
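
A sketch of this workflow on artificial data is shown below. The system
name mysys and all series names are invented for the example; the
restriction uses the b[equation,parameter] notation described under
"restrict".

	nulldata 100
	series x1 = normal()
	series x2 = normal()
	series y1 = 1 + 0.5*x1 + normal()
	series y2 = 2 + 0.5*x2 + normal()
	# define and save the system without estimating it
	mysys <- system
	    equation y1 const x1
	    equation y2 const x2
	end system
	# estimate the saved system in two different ways
	estimate mysys method=ols
	estimate mysys method=sur
	# impose equal slopes across the two equations, then re-estimate
	restrict mysys
	    b[1,2] - b[2,2] = 0
	end restrict
	estimate mysys method=sur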
6153
6154Alternatively you can specify an estimator for the system using method=
6155followed by a string identifying one of the supported estimators: "ols"
(Ordinary Least Squares), "tsls" (Two-Stage Least Squares), "sur" (Seemingly
6157Unrelated Regressions), "3sls" (Three-Stage Least Squares), "fiml" (Full
6158Information Maximum Likelihood) or "liml" (Limited Information Maximum
6159Likelihood). In this case the system is estimated once its definition is
6160complete.
6161
6162An equation system is terminated by the line "end system". Within the system
6163four sorts of statement may be given, as follows.
6164
6165  "equation": specify an equation within the system.
6166
6167  "instr": for a system to be estimated via Three-Stage Least Squares, a
6168  list of instruments (by variable name or number). Alternatively, you can
6169  put this information into the "equation" line using the same syntax as in
6170  the "tsls" command.
6171
6172  "endog": for a system of simultaneous equations, a list of endogenous
6173  variables. This is primarily intended for use with FIML estimation, but
6174  with Three-Stage Least Squares this approach may be used instead of giving
6175  an "instr" list; then all the variables not identified as endogenous will
6176  be used as instruments.
6177
6178  "identity": for use with FIML, an identity linking two or more of the
6179  variables in the system. This sort of statement is ignored when an
6180  estimator other than FIML is used.
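
To illustrate the "equation" and "instr" statements, here is a sketch
using an artificial supply and demand model; the data-generating process
and all names are assumptions made purely for the example:

	nulldata 200
	series inc = normal()
	series cost = normal()
	series u1 = normal()
	series u2 = normal()
	# demand: q = 10 - p + inc + u1; supply: q = 2 + p - cost + u2
	# solving the two equations for price and quantity gives:
	series p = 4 + 0.5*(inc + cost + u1 - u2)
	series q = 2 + p - cost + u2
	system method=3sls
	    equation q const p inc
	    equation q const p cost
	    instr const inc cost
	end system

In place of the "instr" line one could write "endog q p", in which case
the remaining variables serve as instruments.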
6181
6182After estimation using the "system" or "estimate" commands the following
6183accessors can be used to retrieve additional information:
6184
6185  $uhat: the matrix of residuals, one column per equation.
6186
6187  $yhat: matrix of fitted values, one column per equation.
6188
6189  $coeff: column vector of coefficients (all the coefficients from the first
6190  equation, followed by those from the second equation, and so on).
6191
6192  $vcv: covariance matrix of the coefficients. If there are k elements in
6193  the $coeff vector, this matrix is k by k.
6194
6195  $sigma: cross-equation residual covariance matrix.
6196
6197  $sysGamma, $sysA and $sysB: structural-form coefficient matrices (see
6198  below).
6199
6200If you want to retrieve the residuals or fitted values for a specific
6201equation as a data series, select a column from the $uhat or $yhat matrix
6202and assign it to a series, as in
6203
6204	series uh1 = $uhat[,1]
6205
6206The structural-form matrices correspond to the following representation of a
6207simultaneous equations model:
6208
6209  Gamma y(t) = A y(t-1) + B x(t) + e(t)
6210
6211If there are n endogenous variables and k exogenous variables, Gamma is an n
6212x n matrix and B is n x k. If the system contains no lags of the endogenous
6213variables then the A matrix is not present. If the maximum lag of an
6214endogenous regressor is p, the A matrix is n x np.
6215
6216Menu path:    /Model/Simultaneous equations
6217
6218# tabprint Printing
6219
6220Options:    --output=filename (send output to specified file)
            --format="f1|f2|f3|f4" (specify custom TeX format)
6222            --complete (TeX-related, see below)
6223
6224Must follow the estimation of a model. Prints the model in tabular form. The
6225format is governed by the extension of the specified filename: ".tex" for
6226LaTeX, ".rtf" for RTF (Microsoft's Rich Text Format), or ".csv" for
6227comma-separated. The file will be written in the currently set "workdir",
6228unless filename contains a full path specification.
6229
6230If CSV format is selected, values are comma-separated unless the decimal
6231comma is in force, in which case the separator is the semicolon.
6232
6233Options specific to LaTeX output
6234
6235If the --complete flag is given the LaTeX file is a complete document, ready
6236for processing; otherwise it must be included in a document.
6237
If you wish to alter the appearance of the tabular output, you can specify a
6239custom row format using the --format flag. The format string must be
6240enclosed in double quotes and must be tied to the flag with an equals sign.
6241The pattern for the format string is as follows. There are four fields,
6242representing the coefficient, standard error, t-ratio and p-value
6243respectively. These fields should be separated by vertical bars; they may
6244contain a printf-type specification for the formatting of the numeric value
6245in question, or may be left blank to suppress the printing of that column
6246(subject to the constraint that you can't leave all the columns blank). Here
6247are a few examples:
6248
6249	--format="%.4f|%.4f|%.4f|%.4f"
6250	--format="%.4f|%.4f|%.3f|"
6251	--format="%.5f|%.4f||%.4f"
6252	--format="%.8g|%.8g||%.4f"
6253
6254The first of these specifications prints the values in all columns using 4
6255decimal places. The second suppresses the p-value and prints the t-ratio to
62563 places. The third omits the t-ratio. The last one again omits the t, and
6257prints both coefficient and standard error to 8 significant figures.
6258
6259Once you set a custom format in this way, it is remembered and used for the
6260duration of the gretl session. To revert to the default format you can use
6261the special variant --format=default.
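
Putting the pieces together, a session might look like this (a sketch;
the dataset data4-1 and the output file names are arbitrary):

	open data4-1
	ols price const sqft bedrms
	# stand-alone LaTeX document, p-value column suppressed
	tabprint --output=model.tex --complete --format="%.4f|%.4f|%.3f|"
	# the same model as CSV; the custom format applies to TeX only
	tabprint --output=model.csv
	# go back to the default row format for later TeX output
	tabprint --output=model2.tex --format=default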
6262
6263Menu path:    Model window, /LaTeX
6264
6265# textplot Graphs
6266
6267Argument:   varlist
6268Options:    --time-series (plot by observation)
6269            --one-scale (force a single scale)
6270            --tall (use 40 rows)
6271
6272Quick and simple ASCII graphics. Without the --time-series flag, varlist
6273must contain at least two series, the last of which is taken as the variable
6274for the x axis, and a scatter plot is produced. In this case the --tall
6275option may be used to produce a graph in which the y axis is represented by
627640 rows of characters (the default is 20 rows).
6277
With the --time-series flag, a plot by observation is produced. In this case
the option --one-scale may be used to force the use of a single scale;
otherwise if varlist contains more than one series the data may be scaled.
Each line represents an observation, with the data values plotted
horizontally.
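
The two modes can be tried as follows, assuming the sample file data4-1
with series price and sqft:

	open data4-1
	# scatter plot: price on the y axis, sqft (given last) on the x axis
	textplot price sqft
	# same plot using 40 rows for the y axis
	textplot price sqft --tall
	# plot by observation, both series on a common scale
	textplot price sqft --time-series --one-scale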
6282
6283See also "gnuplot".
6284
6285# tobit Estimation
6286
6287Arguments:  depvar indepvars
6288Options:    --llimit=lval (specify left bound)
6289            --rlimit=rval (specify right bound)
6290            --vcv (print covariance matrix)
6291            --robust (robust standard errors)
6292            --opg (see below)
6293            --cluster=clustvar (see "logit" for explanation)
6294            --verbose (print details of iterations)
6295            --quiet (don't print results)
6296
6297Estimates a Tobit model, which may be appropriate when the dependent
6298variable is "censored". For example, positive and zero values of purchases
6299of durable goods on the part of individual households are observed, and no
6300negative values, yet decisions on such purchases may be thought of as
6301outcomes of an underlying, unobserved disposition to purchase that may be
6302negative in some cases.
6303
6304By default it is assumed that the dependent variable is censored at zero on
the left and is uncensored on the right. However, you can use the options
6306--llimit and --rlimit to specify a different pattern of censoring. Note that
6307if you specify a right bound only, the assumption is then that the dependent
6308variable is uncensored on the left.
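
The following sketch generates an artificial latent variable and then
estimates under two censoring patterns; all names and parameter values
are invented for the example:

	nulldata 200
	series x = normal()
	# latent variable, not observed in practice
	series ystar = 1 + x + normal()
	# default case: censored at zero on the left
	series y = (ystar > 0) ? ystar : 0
	tobit y const x
	# censored at 3 on the right only, uncensored on the left
	series y2 = (ystar < 3) ? ystar : 3
	tobit y2 const x --rlimit=3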
6309
6310The Tobit model is a special case of interval regression. Please see the
6311"intreg" command for further details, including an account of the --robust
6312and --opg options.
6313
6314Menu path:    /Model/Limited dependent variable/Tobit
6315
6316# tsls Estimation
6317
6318Arguments:  depvar indepvars ; instruments
6319Options:    --no-tests (don't do diagnostic tests)
6320            --vcv (print covariance matrix)
6321            --quiet (don't print results)
6322            --no-df-corr (no degrees-of-freedom correction)
6323            --robust (robust standard errors)
6324            --cluster=clustvar (clustered standard errors)
6325            --liml (use Limited Information Maximum Likelihood)
6326            --gmm (use the Generalized Method of Moments)
6327Examples:   tsls y1 0 y2 y3 x1 x2 ; 0 x1 x2 x3 x4 x5 x6
6328            See also penngrow.inp
6329
6330Computes Instrumental Variables (IV) estimates, by default using two-stage
6331least squares (TSLS) but see below for further options. The dependent
6332variable is depvar, indepvars is the list of regressors (which is presumed
6333to include at least one endogenous variable); and instruments is the list of
6334instruments (exogenous and/or predetermined variables). If the instruments
6335list is not at least as long as indepvars, the model is not identified.
6336
6337In the above example, the ys are endogenous and the xs are the exogenous
6338variables. Note that exogenous regressors should appear in both lists.
6339
6340Output for two-stage least squares estimates includes the Hausman test and,
6341if the model is over-identified, the Sargan over-identification test. In the
6342Hausman test, the null hypothesis is that OLS estimates are consistent, or
6343in other words estimation by means of instrumental variables is not really
6344required. A model of this sort is over-identified if there are more
6345instruments than are strictly required. The Sargan test is based on an
6346auxiliary regression of the residuals from the two-stage least squares model
6347on the full list of instruments. The null hypothesis is that all the
6348instruments are valid, and suspicion is thrown on this hypothesis if the
6349auxiliary regression has a significant degree of explanatory power. For a
6350good explanation of both tests see chapter 8 of Davidson and MacKinnon
6351(2004).
6352
6353For both TSLS and LIML estimation, an additional test result is shown
6354provided that the model is estimated under the assumption of i.i.d. errors
6355(that is, the --robust option is not selected). This is a test for weakness
6356of the instruments. Weak instruments can lead to serious problems in IV
6357regression: biased estimates and/or incorrect size of hypothesis tests based
6358on the covariance matrix, with rejection rates well in excess of the nominal
6359significance level (Stock, Wright and Yogo, 2002). The test statistic is the
6360first-stage F-test if the model contains just one endogenous regressor,
6361otherwise it is the smallest eigenvalue of the matrix counterpart of the
6362first stage F. Critical values based on the Monte Carlo analysis of Stock
6363and Yogo (2003) are shown when available.
6364
6365The R-squared value printed for models estimated via two-stage least squares
6366is the square of the correlation between the dependent variable and the
6367fitted values.
6368
6369For details on the effects of the --robust and --cluster options, please see
6370the help for "ols".
6371
6372As alternatives to TSLS, the model may be estimated via Limited Information
6373Maximum Likelihood (the --liml option) or via the Generalized Method of
6374Moments (--gmm option). Note that if the model is just identified these
6375methods should produce the same results as TSLS, but if it is
6376over-identified the results will differ in general.
6377
6378If GMM estimation is selected, the following additional options become
6379available:
6380
6381  --two-step: perform two-step GMM rather than the default of one-step.
6382
  --iterate: iterate GMM to convergence.
6384
6385  --weights=Wmat: specify a square matrix of weights to be used when
6386  computing the GMM criterion function. The dimension of this matrix must
6387  equal the number of instruments. The default is an appropriately sized
6388  identity matrix.
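
As an illustration of the three estimators, the sketch below builds an
artificial over-identified model with one endogenous regressor (y2), one
exogenous regressor (x1) and two extra instruments (z1, z2). The data
and names are assumptions of the example.

	nulldata 500
	set seed 12345
	series z1 = normal()
	series z2 = normal()
	series x1 = normal()
	series u = normal()
	# y2 is correlated with the error term u, hence endogenous
	series y2 = z1 + z2 + 0.5*u + normal()
	series y1 = 1 + y2 + x1 + u
	# const and x1 appear in both lists since they are exogenous
	tsls y1 const y2 x1 ; const x1 z1 z2
	tsls y1 const y2 x1 ; const x1 z1 z2 --liml
	tsls y1 const y2 x1 ; const x1 z1 z2 --gmm --two-step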
6389
6390Menu path:    /Model/Instrumental variables
6391
6392# var Estimation
6393
6394Arguments:  order ylist [ ; xlist ]
6395Options:    --nc (do not include a constant)
6396            --trend (include a linear trend)
6397            --seasonals (include seasonal dummy variables)
6398            --robust (robust standard errors)
6399            --robust-hac (HAC standard errors)
6400            --quiet (skip output of individual equations)
6401            --silent (don't print anything)
6402            --impulse-responses (print impulse responses)
6403            --variance-decomp (print variance decompositions)
6404            --lagselect (show criteria for lag selection)
6405            --minlag=minimum lag (lag selection only, see below)
6406Examples:   var 4 x1 x2 x3 ; time mydum
6407            var 4 x1 x2 x3 --seasonals
6408            var 12 x1 x2 x3 --lagselect
6409            See also sw_ch14.inp
6410
6411Sets up and estimates (using OLS) a vector autoregression (VAR). The first
6412argument specifies the lag order -- or the maximum lag order in case the
6413--lagselect option is given (see below). The order may be given numerically,
6414or as the name of a pre-existing scalar variable. Then follows the setup for
6415the first equation. Do not include lags among the elements of ylist -- they
6416will be added automatically. The semi-colon separates the stochastic
6417variables, for which order lags will be included, from any exogenous
6418variables in xlist. Note that a constant is included automatically unless
6419you give the --nc flag, a trend can be added with the --trend flag, and
6420seasonal dummy variables may be added using the --seasonals flag.
6421
6422While a VAR specification usually includes all lags from 1 to a given
6423maximum, it is possible to select a specific set of lags. To do this,
6424substitute for the regular (scalar) order argument either the name of a
6425predefined vector or a comma-separated list of lags, enclosed in braces. We
6426show below two ways of specifying that a VAR should include lags 1, 2 and 4
6427(but not lag 3):
6428
6429	var {1,2,4} ylist
6430	matrix p = {1,2,4}
6431	var p ylist
6432
6433A separate regression is reported for each variable in ylist. Output for
6434each equation includes F-tests for zero restrictions on all lags of each of
the variables, an F-test for the significance of the maximum lag, and, if
the --variance-decomp and --impulse-responses flags are given, forecast
variance decompositions and impulse responses respectively.
6438
6439Forecast variance decompositions and impulse responses are based on the
6440Cholesky decomposition of the contemporaneous covariance matrix, and in this
6441context the order in which the (stochastic) variables are given matters. The
6442first variable in the list is assumed to be "most exogenous" within-period.
6443The horizon for variance decompositions and impulse responses can be set
6444using the "set" command. For retrieval of a specified impulse response
6445function in matrix form, see the "irf" function.
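
For example, using the denmark dataset supplied with gretl (the series
names LRM, LRY and IBO are taken to be present in that file, and the
horizon of 12 is an arbitrary choice):

	open denmark.gdt
	# horizon for impulse responses and variance decompositions
	set horizon 12
	var 2 LRM LRY IBO --impulse-responses --variance-decomp
	# response of variable 1 (LRM) to a shock in variable 3 (IBO)
	matrix resp = irf(1, 3)
	print resp

Reordering the series in ylist changes the Cholesky ordering and hence,
in general, the responses obtained.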
6446
6447If the --robust option is given, standard errors are corrected for
6448heteroskedasticity. Alternatively, the --robust-hac option can be given to
6449produce standard errors that are robust with respect to both
6450heteroskedasticity and autocorrelation (HAC). In general the latter
6451correction should not be needed if the VAR includes sufficient lags.
6452
6453If the --lagselect option is given, the first parameter to the var command
6454is taken as the maximum lag order. Output consists of a table showing the
6455values of the Akaike (AIC), Schwarz (BIC) and Hannan-Quinn (HQC) information
6456criteria, by default computed from VARs of order 1 to the given maximum.
6457This is intended to help with the selection of the optimal lag order. The
6458usual VAR output is not presented. The table of information criteria may be
6459retrieved as a matrix via the "$test" accessor. In this context (only) the
6460--minlag option can be used to adjust the minimum lag order. Set this to 0
6461to allow for the possibility that the optimal lag order is zero, meaning
6462that a VAR is not really called for at all. Conversely you could set
6463--minlag=4 if you believe you need at least 4 lags, thereby saving a little
6464compute time.
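
A sketch of lag selection, again assuming the denmark dataset and its
series names; the maximum order of 6 is arbitrary:

	open denmark.gdt
	# compare information criteria for orders 1 to 6
	var 6 LRM LRY IBO IDE --lagselect
	matrix ic = $test
	print ic
	# also consider order zero
	var 6 LRM LRY IBO IDE --lagselect --minlag=0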
6465
6466Menu path:    /Model/Multivariate time series
6467
6468# varlist Dataset
6469
6470Option:     --type=typename (scope of listing)
6471
6472By default, prints a listing of the series in the current dataset (if any);
6473"ls" may be used as an alias.
6474
6475If the --type option is given, it should be followed (after an equals sign)
6476by one of the following typenames: series, scalar, matrix, list, string,
6477bundle, array or accessor. The effect is to print the names of all currently
6478defined objects of the named type.
6479
6480As a special case, if the typename is accessor, the names printed are those
6481of the internal variables currently available as "accessors", such as
6482"$nobs" and "$uhat", regardless of their specific type.
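
For instance (the object names k and m are arbitrary):

	nulldata 10
	scalar k = 3
	matrix m = I(2)
	# series in the dataset
	varlist
	# names of objects of a given type
	varlist --type=scalar
	varlist --type=matrix
	# accessors currently available
	varlist --type=accessor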
6483
6484# vartest Tests
6485
6486Arguments:  series1 series2
6487
6488Calculates the F statistic for the null hypothesis that the population
6489variances for the variables series1 and series2 are equal, and shows its
p-value. The test statistic and the p-value can be retrieved through the
accessors "$test" and "$pvalue", respectively. The following code

	open AWM18.gdt
	vartest EEN EXR
	eval $test
	eval $pvalue

computes the test and shows how to retrieve the test statistic and
corresponding p-value afterwards:
6500
6501		Equality of variances test
6502
6503		EEN: Number of observations = 192
6504		EXR: Number of observations = 188
6505		Ratio of sample variances = 3.70707
6506		Null hypothesis: The two population variances are equal
6507		Test statistic: F(191,187) = 3.70707
6508		p-value (two-tailed) = 1.94866e-18
6509
6510		3.7070716
6511		1.9486605e-18
6512
6513Menu path:    /Tools/Test statistic calculator
6514
6515# vecm Estimation
6516
6517Arguments:  order rank ylist [ ; xlist ] [ ; rxlist ]
6518Options:    --nc (no constant)
6519            --rc (restricted constant)
6520            --uc (unrestricted constant)
6521            --crt (constant and restricted trend)
6522            --ct (constant and unrestricted trend)
6523            --seasonals (include centered seasonal dummies)
6524            --quiet (skip output of individual equations)
6525            --silent (don't print anything)
6526            --impulse-responses (print impulse responses)
6527            --variance-decomp (print variance decompositions)
6528Examples:   vecm 4 1 Y1 Y2 Y3
6529            vecm 3 2 Y1 Y2 Y3 --rc
6530            vecm 3 2 Y1 Y2 Y3 ; X1 --rc
6531            See also denmark.inp, hamilton.inp
6532
6533A VECM is a form of vector autoregression or VAR (see "var"), applicable
6534where the variables in the model are individually integrated of order 1
6535(that is, are random walks, with or without drift), but exhibit
6536cointegration. This command is closely related to the Johansen test for
6537cointegration (see "johansen").
6538
6539The order parameter to this command represents the lag order of the VAR
6540system. The number of lags in the VECM itself (where the dependent variable
6541is given as a first difference) is one less than order.
6542
6543The rank parameter represents the cointegration rank, or in other words the
6544number of cointegrating vectors. This must be greater than zero and less
6545than or equal to (generally, less than) the number of endogenous variables
6546given in ylist.
6547
6548ylist supplies the list of endogenous variables, in levels. The inclusion of
6549deterministic terms in the model is controlled by the option flags. The
6550default if no option is specified is to include an "unrestricted constant",
6551which allows for the presence of a non-zero intercept in the cointegrating
6552relations as well as a trend in the levels of the endogenous variables. In
6553the literature stemming from the work of Johansen (see for example his 1995
book) this is often referred to as "case 3". The options --nc, --rc, --crt
and --ct produce cases 1, 2, 4 and 5 respectively, while --uc selects the
default case 3 explicitly; these five options are mutually exclusive. The
meaning of these cases and the criteria for selecting a case are explained
in chapter 33 of the Gretl User's Guide.
6558
6559The optional lists xlist and rxlist allow you to specify sets of exogenous
6560variables which enter the model either unrestrictedly (xlist) or restricted
6561to the cointegration space (rxlist). These lists are separated from ylist
6562and from each other by semicolons.
6563
6564The --seasonals option, which may be combined with any of the other options,
6565specifies the inclusion of a set of centered seasonal dummy variables. This
6566option is available only for quarterly or monthly data.
6567
6568The first example above specifies a VECM with lag order 4 and a single
6569cointegrating vector. The endogenous variables are Y1, Y2 and Y3. The second
6570example uses the same variables but specifies a lag order of 3 and two
6571cointegrating vectors; it also specifies a "restricted constant", which is
6572appropriate if the cointegrating vectors may have a non-zero intercept but
6573the Y variables have no trend.
6574
6575Following estimation of a VECM some special accessors are available:
$jalpha, $jbeta and $jvbeta retrieve, respectively, the alpha and beta
matrices and the estimated variance of beta. For retrieval of a specified
impulse response function in matrix form, see the "irf" function.
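
The accessors might be used as in the following sketch, which assumes
the denmark dataset and its series names; the matrix names are
arbitrary:

	open denmark.gdt
	vecm 2 1 LRM LRY IBO IDE --rc --seasonals
	# adjustment coefficients and cointegrating vector
	matrix a = $jalpha
	matrix b = $jbeta
	# estimated variance of beta
	matrix vb = $jvbeta
	print a b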
6579
6580Menu path:    /Model/Multivariate time series
6581
6582# vif Tests
6583
6584Option:     --quiet (don't print anything)
6585Examples:   longley.inp
6586
6587Must follow the estimation of a model which includes at least two
6588independent variables. Calculates and displays diagnostic information
6589pertaining to collinearity.
6590
6591The Variance Inflation Factor or VIF for regressor j is defined as
6592
  1/(1 - R_j^2)
6594
6595where R_j is the coefficient of multiple correlation between regressor j and
6596the other regressors. The factor has a minimum value of 1.0 when the
6597variable in question is orthogonal to the other independent variables.
6598Neter, Wasserman, and Kutner (1990) suggest inspecting the largest VIF as a
6599diagnostic for collinearity; a value greater than 10 is sometimes taken as
6600indicating a problematic degree of collinearity.
6601
6602Following this command the "$result" accessor may be used to retrieve a
6603column vector holding the VIFs. For a more sophisticated approach to
6604diagnosing collinearity, see the "bkw" command.
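
To connect the formula with the output, the sketch below computes one
VIF by hand from an auxiliary regression and compares it with the value
reported by the command. The dataset data4-1 and its series names are
assumptions of the example.

	open data4-1
	ols price const sqft bedrms baths --quiet
	vif
	matrix v = $result
	# auxiliary regression of sqft on the other regressors
	ols sqft const bedrms baths --quiet
	scalar vif_sqft = 1/(1 - $rsq)
	# v[1] pertains to sqft, the first non-constant regressor
	printf "by hand: %g, from vif: %g\n", vif_sqft, v[1]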

Menu path:    Model window, /Analysis/Collinearity

# wls Estimation

Arguments:  wtvar depvar indepvars
Options:    --vcv (print covariance matrix)
            --robust (robust standard errors)
            --quiet (suppress printing of results)
            --allow-zeros (see below)

Computes weighted least squares (WLS) estimates using wtvar as the weight,
depvar as the dependent variable, and indepvars as the list of independent
variables. Let w denote the positive square root of wtvar; then WLS is
basically equivalent to an OLS regression of w * depvar on w * indepvars.
The R-squared, however, is calculated in a special manner, namely as

  R^2 = 1 - ESS / WTSS

where ESS is the error sum of squares (sum of squared residuals) from the
weighted regression and WTSS denotes the "weighted total sum of squares",
which equals the sum of squared residuals from a regression of the weighted
dependent variable on the weighted constant alone.
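
The equivalence and the R-squared formula described above can be checked
with a short script along the following lines. This is a sketch only: the
data file data4-1 is assumed to be installed, and the weight series wt is
hypothetical, chosen purely for illustration. It also assumes that "$ess"
after wls refers to the weighted regression.

```hansl
open data4-1
series wt = 1/sqft            # hypothetical weight, illustration only

wls wt price const sqft --quiet
matrix b_wls = $coeff
scalar r2_wls = $rsq

# the same regression by hand: scale everything by w = sqrt(wt)
series w = sqrt(wt)
series yw = w * price
series xw = w * sqft
ols yw w xw --quiet           # w plays the role of the weighted constant
matrix b_ols = $coeff
scalar ess = $ess

# WTSS: weighted dependent variable on the weighted constant alone
ols yw w --quiet
scalar wtss = $ess
scalar r2_hand = 1 - ess/wtss

matrix cmp = b_wls ~ b_ols
print cmp
printf "R-squared: %g (wls), %g (by hand)\n", r2_wls, r2_hand
```

If the description holds, the two columns of cmp and the two R-squared
values should agree up to rounding.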

As a special case, if wtvar is a 0/1 dummy variable, WLS estimation is
equivalent to OLS on a sample that excludes all observations with value zero
for wtvar. Otherwise including weights of zero is considered an error, but
if you really want to mix zero weights with positive ones you can append the
--allow-zeros option.
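
The dummy-variable case can be illustrated as follows. The sketch assumes
the sample file data4-1 is installed and that bedrms equals 4 for some but
not all observations; the dummy d is constructed only for the purpose of the
example.

```hansl
open data4-1
series d = (bedrms == 4)      # 0/1 dummy used as the weight

wls d price const sqft --quiet
matrix b_wls = $coeff

smpl d --dummy                # keep only observations with d = 1
ols price const sqft --quiet
matrix b_ols = $coeff
smpl full

matrix cmp = b_wls ~ b_ols    # the two columns should coincide
print cmp
```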

For weighted least squares estimation applied to panel data and based on the
unit-specific error variances please see the "panel" command with the
--unit-weights option.

Menu path:    /Model/Other linear models/Weighted Least Squares

# xcorrgm Statistics

Arguments:  series1 series2 [ order ]
Options:    --plot=mode-or-filename (see below)
            --quiet (suppress plot)
Example:    xcorrgm x y 12

Prints and graphs the cross-correlogram for series1 and series2, which may
be specified by name or number. The values are the sample correlation
coefficients between the current value of series1 and successive leads and
lags of series2.

If an order value is specified the length of the cross-correlogram is
limited to at most that number of leads and lags; otherwise the length is
determined automatically, as a function of the frequency of the data and the
number of observations.

By default, a plot of the cross-correlogram is produced: a gnuplot graph in
interactive mode or an ASCII graphic in batch mode. This can be adjusted via
the --plot option. The acceptable parameters to this option are none (to
suppress the plot); ascii (to produce a text graphic even when in
interactive mode); display (to produce a gnuplot graph even when in batch
mode); or a file name. The effect of providing a file name is as described
for the --output option of the "gnuplot" command.
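
A minimal sketch of the order argument and the --plot option, using
artificial data so that it does not depend on any data file. The series x
and y, the seed and the two-period delay are all invented for the example.

```hansl
nulldata 120
setobs 12 2010:01 --time-series
set seed 1234
series x = normal()
series y = x(-2) + 0.5 * normal()   # y follows x with a two-period delay
smpl 2010:03 2019:12                # skip the two missing values of y

xcorrgm x y 6 --plot=none           # printed table only, no graph
xcorrgm x y 6 --plot=ascii          # text graphic, even in interactive mode
```

Given how y is constructed, one would expect the largest coefficient to
appear at a displacement of two periods.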

Menu path:    /View/Cross-correlogram
Other access: Main window pop-up menu (multiple selection)

# xtab Statistics

Arguments:  ylist [ ; xlist ]
Options:    --row (display row percentages)
            --column (display column percentages)
            --zeros (display zero entries)
            --no-totals (suppress printing of marginal counts)
            --matrix=matname (use frequencies from named matrix)
            --quiet (suppress printed output)
            --tex[=filename] (output as LaTeX)
            --equal (see the LaTeX case below)
Examples:   xtab 1 2
            xtab 1 ; 2 3 4
            xtab --matrix=A
            xtab 1 2 --tex="xtab.tex"
            See also ooballot.inp

Given just the ylist argument, computes (and by default prints) a
contingency table or cross-tabulation for each combination of the variables
included in the list. If a second list xlist is given, each variable in
ylist is cross-tabulated by row against each variable in xlist (by column).
Variables in these lists can be referenced by name or by number. Note that
all the variables must have been marked as discrete. Alternatively, if the
--matrix option is given, the named matrix is treated as a precomputed set
of frequencies, to be displayed as a cross-tabulation (see also the "mxtab"
function). In this case the list argument(s) should be omitted.

By default the cell entries are given as frequency counts. The --row and
--column options (which are mutually exclusive) replace the counts with the
percentages for each row or column, respectively. By default, cells with a
zero count are left blank but the --zeros option has the effect of showing
zero counts explicitly, which may be useful for importing the table into
another program, such as a spreadsheet.
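
The forms of invocation described above might be exercised as in the
following sketch. The series a, b and c, the seed and the matrix A are
artificial and serve only to illustrate the syntax.

```hansl
nulldata 200
set seed 4321
series a = uniform() > 0.5
series b = uniform() > 0.3
series c = 1 + floor(3 * uniform())   # takes the values 1, 2 and 3
discrete a b c                        # xtab requires discrete variables

xtab a b                  # one list: each combination within the list
xtab a ; b c --row        # two lists: a by row against b, then against c
xtab a c --column --zeros # column percentages, zero cells shown

matrix A = {20, 5; 7, 18}
xtab --matrix=A           # display precomputed frequencies
```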

Pearson's chi-square test for independence is shown if the expected
frequency under independence is at least 1.0e-7 for all cells. A common rule
of thumb for the validity of this statistic is that at least 80 percent of
cells should have expected frequencies of 5 or greater; if this criterion is
not met a warning is printed.

If the contingency table is 2 by 2, Fisher's Exact Test for independence is
shown. Note that this test is based on the assumption that the row and
column totals are fixed, which may or may not be appropriate depending on
how the data were generated. The left p-value should be used when the
alternative to independence is negative association (values tend to cluster
in the lower left and upper right cells), the right p-value when the
alternative is positive association. The two-tailed p-value for this test is
calculated by method (b) in section 2.1 of Agresti (1992): it is the sum of
the probabilities of all possible tables with the given row and column
totals and a probability no greater than that of the observed table.

The bivariate case

In the case of a bivariate cross-tabulation (only one list is given, and it
has two members) certain results are stored. The contingency table may be
retrieved in matrix form via the "$result" accessor. In addition, if the
minimum expected value condition is met, the Pearson chi-square test and its
p-value may be retrieved via the "$test" and "$pvalue" accessors. If it's
these results that are of interest, the --quiet option can be used to
suppress the usual printout.
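
Retrieval of the stored results in the bivariate case might look like this.
The data are artificial: the series a and b and the seed are invented for
the example, with b constructed so that it is positively associated with a.

```hansl
nulldata 200
set seed 4321
series a = uniform() > 0.5
series b = (uniform() + 0.3 * a) > 0.5   # more likely to be 1 when a is 1
discrete a b

xtab a b --quiet          # compute the table without printing it
matrix T = $result        # the contingency table
printf "chi-square = %g, p-value = %g\n", $test, $pvalue
print T
```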

LaTeX output

If the --tex option is given the cross-tabulation is printed in the form of
a LaTeX tabular environment, either inline (from where it may be copied and
pasted) or, if the filename parameter is appended, to the specified file.
(If filename does not specify a full path the file is written in the
currently set "workdir".) No test statistic is computed. The additional
option --equal can be used to flag, by printing in boldface, the count or
percentage for cells in which the row and column variables have the same
numerical value. This option is ignored unless the --tex option is given,
and also when one or both of the cross-tabulated variables are
string-valued.
