doc/user/getstart.tex

%
% $Id$
%
\label{sec:getstart}

This section provides an overview of NWChem input and program
architecture, and the syntax used to describe the input.  See Sections
\ref{sec:simplesample} and \ref{sec:realsample} for examples of NWChem
input files with detailed explanation.

NWChem consists of independent modules that perform the various
functions of the code.  Examples of modules include the input parser,
SCF energy, SCF analytic gradient, DFT energy, etc..  Data is passed
between modules and saved for restart using a disk-resident database
or dumpfile (see Section \ref{sec:arch}).

The input to NWChem is composed of commands, called directives, which
define data (such as basis sets, geometries, and filenames) and the
actions to be performed on that data.  Directives are processed in the order
presented in the input file, with the exception of certain start-up
directives (see Section \ref{sec:inputstructure}) which provide
critical job control information, and are processed before all other
input.  Most directives are specific to a particular module and define
data that is used by that module only.  A few directives (see Section
\ref{sec:toplevel}) potentially affect all modules, for instance by
specifying the total electric charge on the system.

There are two types of directives.  Simple directives consist of one
line of input, which may contain multiple fields.  Compound directives
group together multiple simple directives that are in some way
related and are terminated with an \verb+END+ directive.  See the
sample inputs (Sections \ref{sec:simplesample}, \ref{sec:realsample})
and the input syntax specification (Section \ref{sec:syntax}).

All input is free format and case is ignored except for actual data
(e.g., names/tags of centers, titles). Directives or blocks of
module-specific directives (i.e., compound directives) can appear in
any order, with the exception of the \verb+TASK+ directive (see
sections \ref{sec:inputstructure} and \ref{sec:task}) which is used to
invoke an NWChem module.  All input for a given task must
precede the \verb+TASK+ directive.  This input specification rule
allows the concatenation of multiple tasks in a single NWChem input
file.

To make the input as short and simple as possible, most options have
default values.  The user needs to supply input only for those items that
have no defaults, or for items that must be different from the defaults
for the particular application.  In the discussion of each directive, the
defaults are noted, where applicable.

The input file structure is described in the following sections, and
illustrated with two examples.  The input format and syntax for directives
is also described in detail.

\section{Input File Structure}
\label{sec:inputstructure}

The structure of an input file reflects the internal structure of
NWChem.  At the beginning of a calculation, NWChem needs to determine
how much memory to use, the name of the database, whether it is a new or
restarted job, where to put scratch/permanent files,
etc..  It is not necessary to put this information at the top of the
input file, however.  NWChem will read through the {\em entire} input
file looking for the start-up directives.  In this first pass, all other
directives are ignored.

The start-up directives are
\begin{itemize}
\item \verb+START+
\item \verb+RESTART+
\item \verb+SCRATCH_DIR+
\item \verb+PERMANENT_DIR+
\item \verb=MEMORY=
\item \verb=ECHO=
\end{itemize}

After the input file has been scanned for the start-up directives, it
is rewound and read sequentially.  Input is processed either by the
top-level parser (for the directives listed in Section
\ref{sec:toplevel}, such as \verb+TITLE+, \verb+SET+, \ldots) or by
the parsers for specific computational modules (e.g., SCF, DFT,
\ldots).  Any directives that have already been processed (e.g.,
\verb+MEMORY+) are ignored.  Input is read until a \verb+TASK+
directive (see Section \ref{sec:task}) is encountered.  A \verb+TASK+
directive requests that a calculation be performed and specifies the level
of theory and the operation to be performed.  Input processing then
stops and the specified task is executed.  The position of the
\verb+TASK+ directive in effect marks the end of the input for that
task.  Processing of the input resumes upon the successful completion
of the task, and the results of that task are available to subsequent
tasks in the same input file.

The name of the input file is usually provided as an argument to the
execute command for NWChem.  That is, the execute command looks
something like the following;

\begin{verbatim}
  nwchem input_file
\end{verbatim}

The default name for the input file is \verb+nwchem.nw+.  If an input
file name \verb+input_file+ is specified without an extension, the
code assumes \verb+.nw+ as a default extension, and the input filename
becomes \verb+input_file.nw+.  If the code cannot locate a file named
either \verb+input_file+ or \verb+input_file.nw+ (or \verb+nwchem.nw+
if no file name is provided), an error is reported and execution
terminates.  The following section presents two input files to
illustrate the directive syntax and input file format for NWChem
applications.

\section{Simple Input File --- SCF geometry optimization}
\label{sec:simplesample}

A simple example of an NWChem input file is an SCF geometry optimization of
the nitrogen molecule, using a Dunning cc-pvdz basis set.  This input
file contains the bare minimum of information the user must specify
to run this type of problem --- fewer than ten lines of input,
as follows:
\begin{verbatim}
  title "Nitrogen cc-pvdz SCF geometry optimization"
  geometry
    n 0 0 0
    n 0 0 1.08
  end
  basis
    n library cc-pvdz
  end
  task scf optimize
\end{verbatim}

Examining the input line by line, it can be seen that it contains
only four directives; \verb+TITLE+, \verb+GEOMETRY+, \verb+BASIS+, and
\verb+TASK+.  The \verb+TITLE+ directive is optional, and is provided
as a means for the user to more easily identify outputs from different
jobs.  An initial geometry is specified in Cartesian coordinates and
{\angstroms} by means of the \verb+GEOMETRY+ directive.  The Dunning
cc-pvdz basis is obtained from the NWChem basis library, as specified
by the \verb+BASIS+ directive input.  The \verb+TASK+ directive requests
an SCF geometry optimization.

The \verb+GEOMETRY+ directive (Section \ref{sec:geom}) defaults to Cartesian
coordinates and {\angstroms} (options include atomic units and
Z-matrix format; see Section \ref{sec:Z-matrix}).  The input blocks for the  \verb+BASIS+
and \verb+GEOMETRY+ directives are structured in similar fashion,
i.e., name, keyword, \ldots, end (In this simple example, there are no keywords).  The \verb+BASIS+ input block {\em must} contain basis set information for
every atom type in the geometry with which it will be used.
Refer to Sections \ref{sec:basis} and \ref{sec:ecp}, and Appendix
\ref{sec:knownbasis} for a description of available basis sets and a
discussion of how to define new ones.

The last line of this sample input file ({\tt task scf optimize})
tells the program to optimize the molecular geometry by minimizing
the SCF energy.  (For a description of possible tasks and the format
of the \verb+TASK+ directive, refer to Section \ref{sec:task}.)

If the input is stored in the file \verb+n2.nw+, the command to run
this job on a typical UNIX workstation is as follows:

\begin{verbatim}
  nwchem n2
\end{verbatim}

NWChem output is to UNIX standard output, and error messages are sent to
both standard output and standard error.

\section{Water Molecule Sample Input File}
\label{sec:realsample}

A more complex sample problem is the optimization of a positively
charged water molecule using second-order M{\o}ller-Plesset
perturbation theory (MP2), followed by a computation of frequencies at
the optimized geometry.  A preliminary SCF geometry optimization is
performed using a computationally inexpensive basis set (STO-3G).
This yields a good starting guess for the optimal geometry, and any
Hessian information generated will be used in the next optimization
step.  Then the optimization is finished using MP2 and a basis set
with polarization functions.  The final task is to calculate the
MP2 vibrational frequencies.  The input file to accomplish these three
tasks is as follows:

\begin{verbatim}
start h2o_freq

charge 1

geometry units angstroms
  O       0.0  0.0  0.0
  H       0.0  0.0  1.0
  H       0.0  1.0  0.0
end

basis
  H library sto-3g
  O library sto-3g
end

scf
  uhf; doublet
  print low
end

title "H2O+ : STO-3G UHF geometry optimization"

task scf optimize

basis
  H library 6-31g**
  O library 6-31g**
end

title "H2O+ : 6-31g** UMP2 geometry optimization"

task mp2 optimize

mp2; print none; end
scf; print none; end

title "H2O+ : 6-31g** UMP2 frequencies"

task mp2 freq
\end{verbatim}

The \verb+START+ directive (Section \ref{sec:start}) tells NWChem that
this run is to be started from the beginning.  This directive need not
be at the beginning of the input file, but it is commonly placed there.
Existing database or vector files are to be ignored or overwritten.
The entry \verb+h2o_freq+ on the \verb+START+ line is the prefix to be used
for all files created by the calculation.  This convention allows
different jobs to run in the same directory or to share the same
scratch directory (see Section \ref{sec:dirs}), as long as they use
different prefix names in this field.

As in the first sample problem, the geometry is given in Cartesian
coordinates.  In this case, the units are specified as {\angstroms}.
(Since this is the default, explicit specification of the units is not
actually necessary, however.)  The {\tt CHARGE} directive defines the
total charge of the system.  This calculation is to be done on an ion
with charge +1.

A small basis set (STO-3G) is specified for the intial geometry
optimization.  Next, the multiple lines of the first {\tt SCF}
directive in the {\tt scf \ldots end} block specify details about the
SCF calculation to be performed.  Unrestricted Hartree-Fock is chosen
here (by specifying the keyword {\tt uhf}), rather than the default,
restricted open-shell high-spin Hartree-Fock (ROHF).  This is
necessary for the subsequent MP2 calculation, because only UMP2 is
currently available for open-shell systems (see Section
\ref{sec:functionality}).  For open-shell systems, the spin
multiplicity has to be specified (using {\tt doublet} in this case),
or it defaults to {\tt singlet}.  The print level is set to {\tt low}
to avoid verbose output for the starting basis calculations.

All input up to this point affects only the settings in the runtime
database.  The program takes its information from this database, so
the sequence of directives up to the first \verb+TASK+ directive is
irrelevant.  An exchange of order of the different blocks or
directives would not affect the result.  The {\tt TASK} directive,
however, must be specified after all relevant input for a given
problem.  The {\tt TASK} directive causes the code to perform the
specified calculation using the parameters set in the preceding
directives. In this case, the first task is an SCF calculation with
geometry optimization, specified with the input {\tt scf} and {\tt
  optimize}.  (See Section \ref{sec:task} for a list of available
tasks and operations.)

After the completion of any task, settings in the database are used in
subsequent tasks without change, unless they are overridden by new
input directives.  In this example, before the second task
(\verb+task mp2 optimize+),
 a better basis set (6-31G**) is defined and the title
is changed.  The second {\tt TASK} directive invokes an MP2 geometry
optimization.

Once the MP2 optimization is completed, the geometry obtained in the
calculation is used to perform a frequency calculation.  This task is
invoked by the keyword \verb+freq+ in the final \verb+TASK+ directive,
\verb+task mp2 freq+.  The second derivatives of the energy are
calculated as numerical derivatives of analytical gradients. The
intermediate energies and gradients are not of interest in
this case, so output from the SCF and MP2 modules is disabled with the
\verb+PRINT+ directives.

\section{Input Format and Syntax for Directives}
\label{sec:syntax}

This section describes the input format and the syntax used in the
rest of this documentation to describe the format of directives.  The
input format for the directives used in NWChem is similar to that of
UNIX shells, which is also used in other chemistry packages, most
notably GAMESS-UK.  An input line is parsed into whitespace (blanks or
tabs) separating tokens or fields.  Any token that contains whitespace
must be enclosed in double quotes in order to be processed correctly.
For example, the basis set with the descriptive name
\verb+modified Dunning DZ+ must appear in a directive as
\verb+"modified Dunning DZ"+, since the name consists of three separate words.

\subsection{Input Format}

A (physical) line in the input file is terminated with a newline
character (also known as a `return' or `enter' character).  A
semicolon (\verb+;+) can be also used to indicate the end of an input
line, allowing a single physical line of input to contain multiple
logical lines of input.  For example, five lines of input for the
\verb+GEOMETRY+ directive can be entered as follows;
\begin{verbatim}
  geometry
   O 0  0     0
   H 0  1.430 1.107
   H 0 -1.430 1.107
  end
\end{verbatim}
These same five lines could be entered on a single line, as
\begin{verbatim}
  geometry; O 0 0 0; H 0 1.430 1.107; H 0 -1.430 1.107; end
\end{verbatim}
This one physical input line comprises five logical
input lines.  Each logical or physical input line must be no longer
than 1023 characters.

In the input file:
\begin{itemize}
\item a string, token, or field is a sequence of ASCII characters
  (NOTE: if the string includes blanks or tabs (i.e., white space),
  the entire string must be enclosed in double quotes).
\item \verb+\+ (backslash) at the end of a line concatenates it with
  the next line.  Note that a space character is automatically
  inserted at this point so that it is {\em not} possible to split
  tokens across lines.  A backslash is also used to quote special
  characters such as whitespace, semi-colons, and hash symbols so as
  to avoid their special meaning (NOTE: these special symbols must be
  quoted with the backslash even when enclosed within double quotes).
\item \verb+;+ (semicolon) is used to mark the end of a logical input
  line within a physical line of input.
\item \verb+#+ (the hash or pound symbol) is the comment character.
  All characters following \verb+#+ (up to the end of the physical
  line) are ignored.
\item If {\em any} input line (excluding Python programs, Section
\ref{sec:python}) begins with the string \verb+INCLUDE+ (ignoring
case) and is followed by a valid file name, then the data in that file
are read as if they were included into the current input file at the
current line.  Up to three levels of nested include files are
supported.  The user should note that inputting a basis set from the
standard basis library (Section \ref{sec:basis}) uses one level of
include.
\item Data is read from the input file until an end-of-file is detected, or
until the string \verb+EOF+ (ignoring case) is encountered at the
beginning of an input line.
\end{itemize}

\subsection{Format and syntax of directives}

Directives consist of a directive name, keywords, and optional input,
and may contain one line or many.  Simple directives consist of a
single line of input with one or more fields.  Compound directives can
have multiple input lines, and can also include other optional simple
and compound directives.  A compound directive is terminated with an
END directive.  The directives START (see Section \ref{sec:start}) and
ECHO (see Section \ref{sec:echo}) are examples of simple directives.
The directive GEOMETRY (see Section \ref{sec:geom}) is an example of a
compound directive.

Some limited checking of the input for self-consistency is performed
by the input module, but most defaults are imposed by the application
modules at runtime.  It is therefore usually impossible to determine
beforehand whether or not all selected options are consistent with
each other.

\sloppy

In the rest of this document, the following notation and syntax
conventions are used in the generic descriptions of the NWChem input.
\begin{itemize}
\item a directive name always appears in all-capitals, and in computer
  typeface (e.g., \verb+GEOMETRY+, \verb+BASIS+, \verb+SCF+).  Note
  that the case of directives and keywords is ignored in the actual
  input.
\item a keyword always appears in lower case, in computer typeface
  (e.g., {\tt swap}, {\tt print}, {\tt units}, {\tt bqbq}).
\item variable names always appear in lower case, in computer
  typeface, and enclosed in angle brackets to distinguish them from
  keywords (e.g., {\tt <input\_filename>}, {\tt <basisname>}, {\tt
    <tag>}).
\item \verb+$variable$+ is used to indicate the substitution of the
  value of a variable.
\item \verb+()+ is used to group items (the parentheses and other
  special symbols should not appear in the input).
\item \verb+||+ separate exclusive options, parameters, or formats.
\item \verb+[ ]+ enclose optional entries that have a default value.
\item \verb+< >+ enclose a type, a name of a value to be specified, or
  a default value, if any.
\item \verb+\+ is used to concatenate lines in a description.
\item \verb+...+ is used to indicate indefinite continuation of a
  list.
\end{itemize}

\fussy

An input parameter is identified in the description of the directive
by prefacing the name of the item with the type of data expected,
i.e.,
\begin{itemize}
\item \verb+string +  -- an ASCII character string
\item \verb+integer+ --  integer value(s) for a variable or an array
\item \verb+logical+ --  true/false logical variable
\item \verb+real   +  -- real floating point value(s) for a variable or an array
\item \verb+double + -- synonymous with real
\end{itemize}

If an input item is not prefaced by one of these type names,
it is assumed to be of type ``string''.

In addition, integer lists may be specified using Fortran triplet
notation, which interprets \verb+lo:hi:inc+ as \verb+lo+, \verb=lo+inc=,
\verb=lo+2*inc=, \ldots, \verb+hi+.  For example, where a list of
integers is expected in the input, the following two lines are
equivalent
\begin{verbatim}
   7 10 21:27:2 1:3 99
   7 10 21 23 25 27 1 2 3 99
\end{verbatim}
(In Fortran triplet notation,  the increment, if unstated, is 1; e.g., 1:3 = 1:3:1.)

The directive \verb+VECTORS+ (Section \ref{sec:vectors}) is presented here
as an example of an NWChem input directive.  The general form of the
directive is as follows:
\begin{verbatim}
  VECTORS [input (<string input_movecs default atomic>) || \
                   (project <string basisname> <string filename>)] \
          [swap [(alpha||beta)] <integer vec1 vec2> ...] \
          [output <string output_movecs default $file_prefix$.movecs>]
\end{verbatim}

This directive contains three optional keywords, as indicated by the
three main sets of square brackets enclosing the keywords \verb+input+,
\verb+swap+, and \verb+output+.  The keyword \verb+input+ allows the
user to specify the source of the molecular orbital vectors.
There are two mutually exclusive options for
specifying the vectors, as indicated by the \verb+||+ symbol
separating the option descriptions;
\begin{verbatim}
  (<string input_movecs default atomic>) || \
                  (project <string basisname> <string filename>) \
\end{verbatim}

The first option, \verb+(<string input_movecs default atomic>)+,
allows the user to specify an ASCII character string for the parameter
{\tt input\_movecs}.  If no entry is specified, the code uses the
default \verb+atomic+ (i.e., atomic guess).  The second option,
{\tt(project <string basisname> <string filename>)}, contains the
keyword \verb+project+, which takes two string arguments.  When this
keyword is used, the vectors in file \verb+<filename>+ will be
projected from the (smaller) basis \verb+<basisname>+ into the current
atomic orbital (AO) basis.

The second keyword, \verb+swap+, allows the user to re-order the
starting vectors, specifying the pairs of vectors to be swapped.  As
many pairs as the user wishes to have swapped can be listed for {\tt
  <integer vec1 vec2 ... >}.  The optional keywords \verb+alpha+ and
\verb+beta+ allow the user to swap the alpha or beta spin orbitals.

The third keyword, \verb+output+, allows the user to tell the code
where to store the vectors, by specifying an ASCII string for the
parameter {\tt output\_movecs}.  If no entry is specified for this
parameter, the default is to write the vectors back into either the
user- specified MO vectors input file or, if this is not available,
the file \verb+$file_prefix$.movecs+.

A particular example of the \verb+VECTORS+ directive is shown below.
It specifies both the \verb+input+ and \verb+output+ keywords, but
does not use the \verb+swap+ option.
\begin{verbatim}
  vectors input project "small basis" small_basis.movecs \
          output large_basis.movecs
\end{verbatim}
This directive tells the code to generate input vectors by projecting
from vectors in a smaller basis named \verb+"small basis"+, which is
stored in the file \verb+small_basis.movecs+.  The output vectors will
be stored in the file \verb+large_basis.movecs+.

The order of keyed optional entries within a directive should not
matter, unless noted otherwise in the specific instructions for a
particular directive.