%
% $Id$
%
\label{sec:getstart}

This section provides an overview of NWChem input and program
architecture, and the syntax used to describe the input.  See Sections
\ref{sec:simplesample} and \ref{sec:realsample} for examples of NWChem
input files with detailed explanations.

NWChem consists of independent modules that perform the various
functions of the code.  Examples of modules include the input parser,
SCF energy, SCF analytic gradient, and DFT energy.  Data is passed
between modules and saved for restart using a disk-resident database
or dumpfile (see Section \ref{sec:arch}).

The input to NWChem is composed of commands, called directives, which
define data (such as basis sets, geometries, and filenames) and the
actions to be performed on that data.  Directives are processed in the order
presented in the input file, with the exception of certain start-up
directives (see Section \ref{sec:inputstructure}), which provide
critical job control information and are processed before all other
input.  Most directives are specific to a particular module and define
data that is used by that module only.  A few directives (see Section
\ref{sec:toplevel}) potentially affect all modules, for instance by
specifying the total electric charge on the system.

There are two types of directives.  Simple directives consist of one
line of input, which may contain multiple fields.  Compound directives
group together multiple simple directives that are in some way
related and are terminated with an \verb+END+ directive.  See the
sample inputs (Sections \ref{sec:simplesample}, \ref{sec:realsample})
and the input syntax specification (Section \ref{sec:syntax}).

All input is free format and case is ignored except for actual data
(e.g., names/tags of centers, titles). Directives or blocks of
module-specific directives (i.e., compound directives) can appear in
any order, with the exception of the \verb+TASK+ directive (see
Sections \ref{sec:inputstructure} and \ref{sec:task}), which is used to
invoke an NWChem module.  All input for a given task must
precede the \verb+TASK+ directive.  This input specification rule
allows the concatenation of multiple tasks in a single NWChem input
file.

To make the input as short and simple as possible, most options have
default values.  The user needs to supply input only for those items that
have no defaults, or for items that must be different from the defaults
for the particular application.  In the discussion of each directive, the
defaults are noted, where applicable.

The input file structure is described in the following sections, and
illustrated with two examples.  The input format and syntax for directives
are also described in detail.
\section{Input File Structure}
\label{sec:inputstructure}

The structure of an input file reflects the internal structure of
NWChem.  At the beginning of a calculation, NWChem needs to determine
how much memory to use, the name of the database, whether it is a new or
restarted job, where to put scratch/permanent files,
and so on.  It is not necessary to put this information at the top of the
input file, however.  NWChem will read through the {\em entire} input
file looking for the start-up directives.  In this first pass, all other
directives are ignored.

The start-up directives are
\begin{itemize}
\item \verb+START+
\item \verb+RESTART+
\item \verb+SCRATCH_DIR+
\item \verb+PERMANENT_DIR+
\item \verb=MEMORY=
\item \verb=ECHO=
\end{itemize}

After the input file has been scanned for the start-up directives, it
is rewound and read sequentially.  Input is processed either by the
top-level parser (for the directives listed in Section
\ref{sec:toplevel}, such as \verb+TITLE+, \verb+SET+, \ldots) or by
the parsers for specific computational modules (e.g., SCF, DFT,
\ldots).  Any directives that have already been processed (e.g.,
\verb+MEMORY+) are ignored.  Input is read until a \verb+TASK+
directive (see Section \ref{sec:task}) is encountered.  A \verb+TASK+
directive requests that a calculation be performed and specifies the level
of theory and the operation to be performed.  Input processing then
stops and the specified task is executed.  The position of the
\verb+TASK+ directive in effect marks the end of the input for that
task.  Processing of the input resumes upon the successful completion
of the task, and the results of that task are available to subsequent
tasks in the same input file.
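
The two-pass processing described above can be pictured with a short
Python sketch.  It is an illustration only and is not taken from the
NWChem parser: the names \verb+STARTUP+, \verb+first_word+, and
\verb+process+ are hypothetical, and the sketch assumes one directive per
line (no semicolons or continuation lines).

\begin{verbatim}
# Hypothetical sketch of the two-pass input scan (not NWChem source code).
STARTUP = {"start", "restart", "scratch_dir", "permanent_dir",
           "memory", "echo"}

def first_word(line):
    """Return the lower-cased directive name of a line, or '' if blank."""
    fields = line.split()
    return fields[0].lower() if fields else ""

def process(lines):
    """Return (startup, tasks) for a list of input lines."""
    # Pass 1: scan the entire file for start-up directives only.
    startup = [ln for ln in lines if first_word(ln) in STARTUP]
    # Pass 2: rewind, skip start-up directives, and close one batch of
    # input at every TASK directive.
    tasks, pending = [], []
    for ln in lines:
        word = first_word(ln)
        if not word or word in STARTUP:
            continue
        pending.append(ln)
        if word == "task":
            tasks.append(pending)
            pending = []
    return startup, tasks
\end{verbatim}

Each entry of \verb+tasks+ holds only the lines read since the previous
\verb+TASK+ directive; in NWChem itself, settings from earlier input
remain in the database for later tasks.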
The name of the input file is usually provided as an argument to the
execute command for NWChem.  That is, the execute command looks
something like the following:

\begin{verbatim}
  nwchem input_file
\end{verbatim}

The default name for the input file is \verb+nwchem.nw+.  If an input
file name \verb+input_file+ is specified without an extension, the
code assumes \verb+.nw+ as a default extension, and the input filename
becomes \verb+input_file.nw+.  If the code cannot locate a file named
either \verb+input_file+ or \verb+input_file.nw+ (or \verb+nwchem.nw+
if no file name is provided), an error is reported and execution
terminates.  The following two sections present input files that
illustrate the directive syntax and input file format for NWChem
applications.
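
The file-name lookup described above can be sketched in Python as
follows.  This is a hedged illustration, not the NWChem implementation:
the function \verb+resolve_input+ and its \verb+exists+ parameter are
hypothetical, and the order in which the two candidate names are tried
is an assumption.

\begin{verbatim}
import os

def resolve_input(name=None, exists=os.path.isfile):
    """Return the input file to read, or raise FileNotFoundError."""
    if name is None:
        candidates = ["nwchem.nw"]          # default input file name
    else:
        candidates = [name]
        if not os.path.splitext(name)[1]:   # no extension given
            candidates.append(name + ".nw")
    for path in candidates:                 # assumed search order
        if exists(path):
            return path
    raise FileNotFoundError("no input file among: "
                            + ", ".join(candidates))
\end{verbatim}

For example, \verb+resolve_input("n2")+ returns \verb+n2.nw+ when only
that file exists in the current directory.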
\section{Simple Input File --- SCF geometry optimization}
\label{sec:simplesample}

A simple example of an NWChem input file is an SCF geometry optimization of
the nitrogen molecule, using a Dunning cc-pvdz basis set.  This input
file contains the bare minimum of information the user must specify
to run this type of problem (fewer than ten lines of input),
as follows:
\begin{verbatim}
  title "Nitrogen cc-pvdz SCF geometry optimization"
  geometry
    n 0 0 0
    n 0 0 1.08
  end
  basis
    n library cc-pvdz
  end
  task scf optimize
\end{verbatim}

Examining the input line by line, it can be seen that it contains
only four directives: \verb+TITLE+, \verb+GEOMETRY+, \verb+BASIS+, and
\verb+TASK+.  The \verb+TITLE+ directive is optional, and is provided
as a means for the user to more easily identify outputs from different
jobs.  An initial geometry is specified in Cartesian coordinates and
{\angstroms} by means of the \verb+GEOMETRY+ directive.  The Dunning
cc-pvdz basis is obtained from the NWChem basis library, as specified
by the \verb+BASIS+ directive input.  The \verb+TASK+ directive requests
an SCF geometry optimization.

The \verb+GEOMETRY+ directive (Section \ref{sec:geom}) defaults to Cartesian
coordinates and {\angstroms} (options include atomic units and
Z-matrix format; see Section \ref{sec:Z-matrix}).  The input blocks for
the \verb+BASIS+ and \verb+GEOMETRY+ directives are structured in similar
fashion, i.e., name, keyword, \ldots, end (in this simple example, there
are no keywords).  The \verb+BASIS+ input block {\em must} contain basis
set information for every atom type in the geometry with which it will
be used.
Refer to Sections \ref{sec:basis} and \ref{sec:ecp}, and Appendix
\ref{sec:knownbasis} for a description of available basis sets and a
discussion of how to define new ones.

The last line of this sample input file ({\tt task scf optimize})
tells the program to optimize the molecular geometry by minimizing
the SCF energy.  (For a description of possible tasks and the format
of the \verb+TASK+ directive, refer to Section \ref{sec:task}.)

If the input is stored in the file \verb+n2.nw+, the command to run
this job on a typical UNIX workstation is as follows:

\begin{verbatim}
  nwchem n2
\end{verbatim}

NWChem writes its output to UNIX standard output, and error messages are
sent to both standard output and standard error.
\section{Water Molecule Sample Input File}
\label{sec:realsample}

A more complex sample problem is the optimization of a positively
charged water molecule using second-order M{\o}ller-Plesset
perturbation theory (MP2), followed by a computation of frequencies at
the optimized geometry.  A preliminary SCF geometry optimization is
performed using a computationally inexpensive basis set (STO-3G).
This yields a good starting guess for the optimal geometry, and any
Hessian information generated will be used in the next optimization
step.  Then the optimization is finished using MP2 and a basis set
with polarization functions.  The final task is to calculate the
MP2 vibrational frequencies.  The input file to accomplish these three
tasks is as follows:

\begin{verbatim}
start h2o_freq

charge 1

geometry units angstroms
  O       0.0  0.0  0.0
  H       0.0  0.0  1.0
  H       0.0  1.0  0.0
end

basis
  H library sto-3g
  O library sto-3g
end

scf
  uhf; doublet
  print low
end

title "H2O+ : STO-3G UHF geometry optimization"

task scf optimize

basis
  H library 6-31g**
  O library 6-31g**
end

title "H2O+ : 6-31g** UMP2 geometry optimization"

task mp2 optimize

mp2; print none; end
scf; print none; end

title "H2O+ : 6-31g** UMP2 frequencies"

task mp2 freq
\end{verbatim}
The \verb+START+ directive (Section \ref{sec:start}) tells NWChem that
this run is to be started from the beginning.  This directive need not
be at the beginning of the input file, but it is commonly placed there.
Existing database or vector files are to be ignored or overwritten.
The entry \verb+h2o_freq+ on the \verb+START+ line is the prefix to be used
for all files created by the calculation.  This convention allows
different jobs to run in the same directory or to share the same
scratch directory (see Section \ref{sec:dirs}), as long as they use
different prefix names in this field.

As in the first sample problem, the geometry is given in Cartesian
coordinates.  In this case, the units are specified as {\angstroms}.
(Since this is the default, explicit specification of the units is not
actually necessary.)  The {\tt CHARGE} directive defines the
total charge of the system.  This calculation is to be done on an ion
with charge +1.

A small basis set (STO-3G) is specified for the initial geometry
optimization.  Next, the multiple lines of the first {\tt SCF}
directive in the {\tt scf \ldots end} block specify details about the
SCF calculation to be performed.  Unrestricted Hartree-Fock is chosen
here (by specifying the keyword {\tt uhf}), rather than the default,
restricted open-shell high-spin Hartree-Fock (ROHF).  This is
necessary for the subsequent MP2 calculation, because only UMP2 is
currently available for open-shell systems (see Section
\ref{sec:functionality}).  For open-shell systems, the spin
multiplicity has to be specified (using {\tt doublet} in this case),
or it defaults to {\tt singlet}.  The print level is set to {\tt low}
to avoid verbose output for the starting basis calculations.

All input up to this point affects only the settings in the runtime
database.  The program takes its information from this database, so
the sequence of directives up to the first \verb+TASK+ directive is
irrelevant.  An exchange of order of the different blocks or
directives would not affect the result.  The {\tt TASK} directive,
however, must be specified after all relevant input for a given
problem.  The {\tt TASK} directive causes the code to perform the
specified calculation using the parameters set in the preceding
directives. In this case, the first task is an SCF calculation with
geometry optimization, specified with the input {\tt scf} and
{\tt optimize}.  (See Section \ref{sec:task} for a list of available
tasks and operations.)

After the completion of any task, settings in the database are used in
subsequent tasks without change, unless they are overridden by new
input directives.  In this example, before the second task
(\verb+task mp2 optimize+),
a better basis set (6-31G**) is defined and the title
is changed.  The second {\tt TASK} directive invokes an MP2 geometry
optimization.

Once the MP2 optimization is completed, the geometry obtained in the
calculation is used to perform a frequency calculation.  This task is
invoked by the keyword \verb+freq+ in the final \verb+TASK+ directive,
\verb+task mp2 freq+.  The second derivatives of the energy are
calculated as numerical derivatives of analytical gradients. The
intermediate energies and gradients are not of interest in
this case, so output from the SCF and MP2 modules is disabled with the
\verb+PRINT+ directives.
\section{Input Format and Syntax for Directives}
\label{sec:syntax}

This section describes the input format and the syntax used in the
rest of this documentation to describe the format of directives.  The
input format for the directives used in NWChem is similar to that of
UNIX shells, which is also used in other chemistry packages, most
notably GAMESS-UK.  An input line is parsed into tokens, or fields,
separated by whitespace (blanks or tabs).  Any token that contains whitespace
must be enclosed in double quotes in order to be processed correctly.
For example, the basis set with the descriptive name
\verb+modified Dunning DZ+ must appear in a directive as
\verb+"modified Dunning DZ"+, since the name consists of three separate words.
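
Because the format resembles that of UNIX shells, the effect of the
double quotes can be illustrated with Python's standard
\verb+shlex.split+ function, which follows POSIX shell rules.  This is
only an approximation of the NWChem tokenizer, and the helper name
\verb+tokenize+ is hypothetical.

\begin{verbatim}
import shlex

def tokenize(line):
    """Split a line into whitespace-separated tokens, keeping each
    double-quoted string together as a single token."""
    return shlex.split(line)

print(tokenize('basis "modified Dunning DZ"'))
# prints ['basis', 'modified Dunning DZ']
\end{verbatim}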
\subsection{Input Format}

A (physical) line in the input file is terminated with a newline
character (also known as a `return' or `enter' character).  A
semicolon (\verb+;+) can also be used to indicate the end of an input
line, allowing a single physical line of input to contain multiple
logical lines of input.  For example, five lines of input for the
\verb+GEOMETRY+ directive can be entered as follows:
\begin{verbatim}
  geometry
   O 0  0     0
   H 0  1.430 1.107
   H 0 -1.430 1.107
  end
\end{verbatim}
These same five lines could be entered on a single line, as
\begin{verbatim}
  geometry; O 0 0 0; H 0 1.430 1.107; H 0 -1.430 1.107; end
\end{verbatim}
This one physical input line comprises five logical
input lines.  Each logical or physical input line must be no longer
than 1023 characters.

In the input file:
\begin{itemize}
\item a string, token, or field is a sequence of ASCII characters
  (NOTE: if the string includes blanks or tabs (i.e., whitespace),
  the entire string must be enclosed in double quotes).
\item \verb+\+ (backslash) at the end of a line concatenates it with
  the next line.  Note that a space character is automatically
  inserted at this point so that it is {\em not} possible to split
  tokens across lines.  A backslash is also used to quote special
  characters such as whitespace, semicolons, and hash symbols so as
  to avoid their special meaning (NOTE: these special symbols must be
  quoted with the backslash even when enclosed within double quotes).
\item \verb+;+ (semicolon) is used to mark the end of a logical input
  line within a physical line of input.
\item \verb+#+ (the hash or pound symbol) is the comment character.
  All characters following \verb+#+ (up to the end of the physical
  line) are ignored.
\item If {\em any} input line (excluding Python programs, Section
\ref{sec:python}) begins with the string \verb+INCLUDE+ (ignoring
case) and is followed by a valid file name, then the data in that file
are read as if they were included into the current input file at the
current line.  Up to three levels of nested include files are
supported.  The user should note that inputting a basis set from the
standard basis library (Section \ref{sec:basis}) uses one level of
include.
\item Data is read from the input file until an end-of-file is detected, or
until the string \verb+EOF+ (ignoring case) is encountered at the
beginning of an input line.
\end{itemize}
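
Several of the rules above (comments, continuation lines, semicolons,
and \verb+EOF+) can be combined in a short Python sketch that turns
physical lines into logical lines.  It is a simplified illustration
under stated assumptions, not the NWChem parser: the function name
\verb+logical_lines+ is hypothetical, and backslash quoting of special
characters, double quotes, and \verb+INCLUDE+ are deliberately not
handled.

\begin{verbatim}
def logical_lines(text):
    """Yield the logical input lines contained in the text of a file."""
    pending = ""                               # text carried over by '\'
    for physical in text.splitlines():
        physical = physical.split("#", 1)[0]   # drop the comment, if any
        stripped = physical.rstrip()
        if stripped.endswith("\\"):
            # Continuation: join with the next line, inserting a space.
            pending += stripped[:-1] + " "
            continue
        for piece in (pending + physical).split(";"):
            piece = piece.strip()
            if not piece:
                continue
            if piece.split()[0].upper() == "EOF":
                return                         # stop reading input
            yield piece
        pending = ""
\end{verbatim}

Applied to the single-line \verb+geometry+ example above, the generator
yields five logical lines, from \verb+geometry+ through \verb+end+.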
\subsection{Format and syntax of directives}

Directives consist of a directive name, keywords, and optional input,
and may contain one line or many.  Simple directives consist of a
single line of input with one or more fields.  Compound directives can
have multiple input lines, and can also include other optional simple
and compound directives.  A compound directive is terminated with an
END directive.  The directives START (see Section \ref{sec:start}) and
ECHO (see Section \ref{sec:echo}) are examples of simple directives.
The directive GEOMETRY (see Section \ref{sec:geom}) is an example of a
compound directive.

Some limited checking of the input for self-consistency is performed
by the input module, but most defaults are imposed by the application
modules at runtime.  It is therefore usually impossible to determine
beforehand whether or not all selected options are consistent with
each other.
\sloppy

In the rest of this document, the following notation and syntax
conventions are used in the generic descriptions of the NWChem input.
\begin{itemize}
\item a directive name always appears in all-capitals, and in computer
  typeface (e.g., \verb+GEOMETRY+, \verb+BASIS+, \verb+SCF+).  Note
  that the case of directives and keywords is ignored in the actual
  input.
\item a keyword always appears in lower case, in computer typeface
  (e.g., {\tt swap}, {\tt print}, {\tt units}, {\tt bqbq}).
\item variable names always appear in lower case, in computer
  typeface, and enclosed in angle brackets to distinguish them from
  keywords (e.g., {\tt <input\_filename>}, {\tt <basisname>},
  {\tt <tag>}).
\item \verb+$variable$+ is used to indicate the substitution of the
  value of a variable.
\item \verb+()+ is used to group items (the parentheses and other
  special symbols should not appear in the input).
\item \verb+||+ separates exclusive options, parameters, or formats.
\item \verb+[ ]+ enclose optional entries that have a default value.
\item \verb+< >+ enclose a type, a name of a value to be specified, or
  a default value, if any.
\item \verb+\+ is used to concatenate lines in a description.
\item \verb+...+ is used to indicate indefinite continuation of a
  list.
\end{itemize}

\fussy

An input parameter is identified in the description of the directive
by prefacing the name of the item with the type of data expected,
i.e.,
\begin{itemize}
\item \verb+string+ -- an ASCII character string
\item \verb+integer+ -- integer value(s) for a variable or an array
\item \verb+logical+ -- true/false logical variable
\item \verb+real+ -- real floating point value(s) for a variable or an array
\item \verb+double+ -- synonymous with real
\end{itemize}

If an input item is not prefaced by one of these type names,
it is assumed to be of type ``string''.

In addition, integer lists may be specified using Fortran triplet
notation, which interprets \verb+lo:hi:inc+ as \verb+lo+, \verb=lo+inc=,
\verb=lo+2*inc=, \ldots, \verb+hi+.  For example, where a list of
integers is expected in the input, the following two lines are
equivalent:
\begin{verbatim}
   7 10 21:27:2 1:3 99
   7 10 21 23 25 27 1 2 3 99
\end{verbatim}
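The expansion of triplet notation can be checked with a few lines of
Python.  This is a minimal sketch rather than NWChem code: the function
name \verb+expand_triplets+ is hypothetical, and the sketch assumes
positive increments and treats a missing increment as 1.

\begin{verbatim}
def expand_triplets(fields):
    """Expand integer fields written as lo, lo:hi, or lo:hi:inc."""
    values = []
    for field in fields.split():
        parts = [int(p) for p in field.split(":")]
        if len(parts) == 1:
            values.append(parts[0])            # plain integer
        else:
            lo, hi = parts[0], parts[1]
            inc = parts[2] if len(parts) == 3 else 1
            values.extend(range(lo, hi + 1, inc))   # hi is inclusive
    return values

print(expand_triplets("7 10 21:27:2 1:3 99"))
# prints [7, 10, 21, 23, 25, 27, 1, 2, 3, 99]
\end{verbatim}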
(In Fortran triplet notation, the increment defaults to 1 if omitted;
e.g., \verb+1:3+ is equivalent to \verb+1:3:1+.)

The directive \verb+VECTORS+ (Section \ref{sec:vectors}) is presented here
as an example of an NWChem input directive.  The general form of the
directive is as follows:
\begin{verbatim}
  VECTORS [input (<string input_movecs default atomic>) || \
                   (project <string basisname> <string filename>)] \
          [swap [(alpha||beta)] <integer vec1 vec2> ...] \
          [output <string output_movecs default $file_prefix$.movecs>]
\end{verbatim}

This directive contains three optional keywords, as indicated by the
three main sets of square brackets enclosing the keywords \verb+input+,
\verb+swap+, and \verb+output+.  The keyword \verb+input+ allows the
user to specify the source of the molecular orbital vectors.
There are two mutually exclusive options for
specifying the vectors, as indicated by the \verb+||+ symbol
separating the option descriptions:
\begin{verbatim}
  (<string input_movecs default atomic>) || \
                  (project <string basisname> <string filename>) \
\end{verbatim}

The first option, \verb+(<string input_movecs default atomic>)+,
allows the user to specify an ASCII character string for the parameter
{\tt input\_movecs}.  If no entry is specified, the code uses the
default \verb+atomic+ (i.e., atomic guess).  The second option,
{\tt(project <string basisname> <string filename>)}, contains the
keyword \verb+project+, which takes two string arguments.  When this
keyword is used, the vectors in file \verb+<filename>+ will be
projected from the (smaller) basis \verb+<basisname>+ into the current
atomic orbital (AO) basis.

The second keyword, \verb+swap+, allows the user to re-order the
starting vectors, specifying the pairs of vectors to be swapped.  As
many pairs as the user wishes to have swapped can be listed for
{\tt <integer vec1 vec2 ... >}.  The optional keywords \verb+alpha+ and
\verb+beta+ allow the user to swap the alpha or beta spin orbitals.

The third keyword, \verb+output+, allows the user to tell the code
where to store the vectors, by specifying an ASCII string for the
parameter {\tt output\_movecs}.  If no entry is specified for this
parameter, the default is to write the vectors back into either the
user-specified MO vectors input file or, if this is not available,
the file \verb+$file_prefix$.movecs+.

A particular example of the \verb+VECTORS+ directive is shown below.
It specifies both the \verb+input+ and \verb+output+ keywords, but
does not use the \verb+swap+ option.
\begin{verbatim}
  vectors input project "small basis" small_basis.movecs \
          output large_basis.movecs
\end{verbatim}
This directive tells the code to generate input vectors by projecting
from vectors in a smaller basis named \verb+"small basis"+, which is
stored in the file \verb+small_basis.movecs+.  The output vectors will
be stored in the file \verb+large_basis.movecs+.

The order of keyed optional entries within a directive should not
matter, unless noted otherwise in the specific instructions for a
particular directive.
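
The order independence of keyed entries can be illustrated with a small
Python sketch that groups the fields of a \verb+VECTORS+ line under the
keyword that precedes them.  It is a toy model and not the NWChem
parser: the names \verb+KEYWORDS+ and \verb+parse_vectors+ are
hypothetical, quoting is approximated with \verb+shlex.split+, and the
sketch assumes that no field value coincides with a keyword.

\begin{verbatim}
import shlex

KEYWORDS = ("input", "swap", "output")

def parse_vectors(line):
    """Map each keyword of a VECTORS line to the fields following it."""
    tokens = shlex.split(line)
    if not tokens or tokens[0].lower() != "vectors":
        raise ValueError("not a VECTORS directive")
    options, current = {}, None
    for tok in tokens[1:]:
        if tok.lower() in KEYWORDS:
            current = tok.lower()          # start a new keyed entry
            options[current] = []
        elif current is None:
            raise ValueError("field before any keyword: " + tok)
        else:
            options[current].append(tok)
    return options

a = parse_vectors('vectors input project "small basis" '
                  'small_basis.movecs output large_basis.movecs')
b = parse_vectors('vectors output large_basis.movecs '
                  'input project "small basis" small_basis.movecs')
assert a == b   # same settings, whichever keyword comes first
\end{verbatim}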