%
% $Id$
%
\label{sec:getstart}

This section provides an overview of NWChem input and program
architecture, and the syntax used to describe the input.  See Sections
\ref{sec:simplesample} and \ref{sec:realsample} for examples of NWChem
input files with detailed explanation.

NWChem consists of independent modules that perform the various
functions of the code.  Examples of modules include the input parser,
SCF energy, SCF analytic gradient, DFT energy, etc.  Data is passed
between modules and saved for restart using a disk-resident database
or dumpfile (see Section \ref{sec:arch}).

The input to NWChem is composed of commands, called directives, which
define data (such as basis sets, geometries, and filenames) and the
actions to be performed on that data.  Directives are processed in the order
presented in the input file, with the exception of certain start-up
directives (see Section \ref{sec:inputstructure}) which provide
critical job control information, and are processed before all other
input.  Most directives are specific to a particular module and define
data that is used by that module only.  A few directives (see Section
\ref{sec:toplevel}) potentially affect all modules, for instance by
specifying the total electric charge on the system.

There are two types of directives.  Simple directives consist of one
line of input, which may contain multiple fields.  Compound directives
group together multiple simple directives that are in some way
related and are terminated with an \verb+END+ directive.  See the
sample inputs (Sections \ref{sec:simplesample}, \ref{sec:realsample})
and the input syntax specification (Section \ref{sec:syntax}).

All input is free format and case is ignored except for actual data
(e.g., names/tags of centers, titles).
Directives or blocks of
module-specific directives (i.e., compound directives) can appear in
any order, with the exception of the \verb+TASK+ directive (see
Sections \ref{sec:inputstructure} and \ref{sec:task}) which is used to
invoke an NWChem module.  All input for a given task must
precede the \verb+TASK+ directive.  This input specification rule
allows the concatenation of multiple tasks in a single NWChem input
file.

To make the input as short and simple as possible, most options have
default values.  The user needs to supply input only for those items that
have no defaults, or for items that must be different from the defaults
for the particular application.  In the discussion of each directive, the
defaults are noted, where applicable.

The input file structure is described in the following sections, and
illustrated with two examples.  The input format and syntax for directives
are also described in detail.

\section{Input File Structure}
\label{sec:inputstructure}

The structure of an input file reflects the internal structure of
NWChem.  At the beginning of a calculation, NWChem needs to determine
how much memory to use, the name of the database, whether it is a new or
restarted job, where to put scratch/permanent files,
etc.  It is not necessary to put this information at the top of the
input file, however.  NWChem will read through the {\em entire} input
file looking for the start-up directives.  In this first pass, all other
directives are ignored.

The start-up directives are
\begin{itemize}
\item \verb+START+
\item \verb+RESTART+
\item \verb+SCRATCH_DIR+
\item \verb+PERMANENT_DIR+
\item \verb=MEMORY=
\item \verb=ECHO=
\end{itemize}

After the input file has been scanned for the start-up directives, it
is rewound and read sequentially.
Input is processed either by the
top-level parser (for the directives listed in Section
\ref{sec:toplevel}, such as \verb+TITLE+, \verb+SET+, \ldots) or by
the parsers for specific computational modules (e.g., SCF, DFT,
\ldots).  Any directives that have already been processed (e.g.,
\verb+MEMORY+) are ignored.  Input is read until a \verb+TASK+
directive (see Section \ref{sec:task}) is encountered.  A \verb+TASK+
directive requests that a calculation be performed and specifies the level
of theory and the operation to be performed.  Input processing then
stops and the specified task is executed.  The position of the
\verb+TASK+ directive in effect marks the end of the input for that
task.  Processing of the input resumes upon the successful completion
of the task, and the results of that task are available to subsequent
tasks in the same input file.

The name of the input file is usually provided as an argument to the
execute command for NWChem.  That is, the execute command looks
something like the following:

\begin{verbatim}
  nwchem input_file
\end{verbatim}

The default name for the input file is \verb+nwchem.nw+.  If an input
file name \verb+input_file+ is specified without an extension, the
code assumes \verb+.nw+ as a default extension, and the input filename
becomes \verb+input_file.nw+.  If the code cannot locate a file named
either \verb+input_file+ or \verb+input_file.nw+ (or \verb+nwchem.nw+
if no file name is provided), an error is reported and execution
terminates.  The following section presents two input files to
illustrate the directive syntax and input file format for NWChem
applications.

\section{Simple Input File --- SCF geometry optimization}
\label{sec:simplesample}

A simple example of an NWChem input file is an SCF geometry optimization of
the nitrogen molecule, using a Dunning cc-pvdz basis set.
This input
file contains the bare minimum of information the user must specify
to run this type of problem --- fewer than ten lines of input,
as follows:
\begin{verbatim}
  title "Nitrogen cc-pvdz SCF geometry optimization"
  geometry
    n 0 0 0
    n 0 0 1.08
  end
  basis
    n library cc-pvdz
  end
  task scf optimize
\end{verbatim}

Examining the input line by line, it can be seen that it contains
only four directives: \verb+TITLE+, \verb+GEOMETRY+, \verb+BASIS+, and
\verb+TASK+.  The \verb+TITLE+ directive is optional, and is provided
as a means for the user to more easily identify outputs from different
jobs.  An initial geometry is specified in Cartesian coordinates and
{\angstroms} by means of the \verb+GEOMETRY+ directive.  The Dunning
cc-pvdz basis is obtained from the NWChem basis library, as specified
by the \verb+BASIS+ directive input.  The \verb+TASK+ directive requests
an SCF geometry optimization.

The \verb+GEOMETRY+ directive (Section \ref{sec:geom}) defaults to Cartesian
coordinates and {\angstroms} (options include atomic units and
Z-matrix format; see Section \ref{sec:Z-matrix}).  The input blocks for the \verb+BASIS+
and \verb+GEOMETRY+ directives are structured in similar fashion,
i.e., name, keyword, \ldots, end (in this simple example, there are no
keywords).  The \verb+BASIS+ input block {\em must} contain basis set information for
every atom type in the geometry with which it will be used.
Refer to Sections \ref{sec:basis} and \ref{sec:ecp}, and Appendix
\ref{sec:knownbasis} for a description of available basis sets and a
discussion of how to define new ones.

The last line of this sample input file ({\tt task scf optimize})
tells the program to optimize the molecular geometry by minimizing
the SCF energy.  (For a description of possible tasks and the format
of the \verb+TASK+ directive, refer to Section \ref{sec:task}.)
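
As an illustration of the name, keyword, \ldots, end structure, the
geometry block of this example can be written with the optional
\verb+units+ keyword, in the same way as in the water sample input of
Section \ref{sec:realsample}.  This is only a sketch; because
{\angstroms} is the default, it is expected to be equivalent to the
keyword-free block shown in the input file above.

\begin{verbatim}
  geometry units angstroms
    n 0 0 0
    n 0 0 1.08
  end
\end{verbatim}
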

If the input is stored in the file \verb+n2.nw+, the command to run
this job on a typical UNIX workstation is as follows:

\begin{verbatim}
  nwchem n2
\end{verbatim}

NWChem output is written to UNIX standard output, and error messages are
sent to both standard output and standard error.

\section{Water Molecule Sample Input File}
\label{sec:realsample}

A more complex sample problem is the optimization of a positively
charged water molecule using second-order M{\o}ller-Plesset
perturbation theory (MP2), followed by a computation of frequencies at
the optimized geometry.  A preliminary SCF geometry optimization is
performed using a computationally inexpensive basis set (STO-3G).
This yields a good starting guess for the optimal geometry, and any
Hessian information generated will be used in the next optimization
step.  Then the optimization is finished using MP2 and a basis set
with polarization functions.  The final task is to calculate the
MP2 vibrational frequencies.  The input file to accomplish these three
tasks is as follows:

\begin{verbatim}
start h2o_freq

charge 1

geometry units angstroms
  O       0.0  0.0  0.0
  H       0.0  0.0  1.0
  H       0.0  1.0  0.0
end

basis
  H library sto-3g
  O library sto-3g
end

scf
  uhf; doublet
  print low
end

title "H2O+ : STO-3G UHF geometry optimization"

task scf optimize

basis
  H library 6-31g**
  O library 6-31g**
end

title "H2O+ : 6-31g** UMP2 geometry optimization"

task mp2 optimize

mp2; print none; end
scf; print none; end

title "H2O+ : 6-31g** UMP2 frequencies"

task mp2 freq
\end{verbatim}

The \verb+START+ directive (Section \ref{sec:start}) tells NWChem that
this run is to be started from the beginning.  This directive need not
be at the beginning of the input file, but it is commonly placed there.
Existing database or vector files are to be ignored or overwritten.
The entry \verb+h2o_freq+ on the \verb+START+ line is the prefix to be used
for all files created by the calculation.  This convention allows
different jobs to run in the same directory or to share the same
scratch directory (see Section \ref{sec:dirs}), as long as they use
different prefix names in this field.

As in the first sample problem, the geometry is given in Cartesian
coordinates.  In this case, the units are specified as {\angstroms}.
(Since this is the default, explicit specification of the units is not
actually necessary, however.)  The {\tt CHARGE} directive defines the
total charge of the system.  This calculation is to be done on an ion
with charge +1.

A small basis set (STO-3G) is specified for the initial geometry
optimization.  Next, the multiple lines of the first {\tt SCF}
directive in the {\tt scf \ldots end} block specify details about the
SCF calculation to be performed.  Unrestricted Hartree-Fock is chosen
here (by specifying the keyword {\tt uhf}), rather than the default,
restricted open-shell high-spin Hartree-Fock (ROHF).  This is
necessary for the subsequent MP2 calculation, because only UMP2 is
currently available for open-shell systems (see Section
\ref{sec:functionality}).  For open-shell systems, the spin
multiplicity has to be specified (using {\tt doublet} in this case),
or it defaults to {\tt singlet}.  The print level is set to {\tt low}
to avoid verbose output for the starting basis calculations.

All input up to this point affects only the settings in the runtime
database.  The program takes its information from this database, so
the sequence of directives up to the first \verb+TASK+ directive is
irrelevant.  An exchange of order of the different blocks or
directives would not affect the result.
The {\tt TASK} directive,
however, must be specified after all relevant input for a given
problem.  The {\tt TASK} directive causes the code to perform the
specified calculation using the parameters set in the preceding
directives.  In this case, the first task is an SCF calculation with
geometry optimization, specified with the input {\tt scf} and
{\tt optimize}.  (See Section \ref{sec:task} for a list of available
tasks and operations.)

After the completion of any task, settings in the database are used in
subsequent tasks without change, unless they are overridden by new
input directives.  In this example, before the second task
(\verb+task mp2 optimize+),
a better basis set (6-31G**) is defined and the title
is changed.  The second {\tt TASK} directive invokes an MP2 geometry
optimization.

Once the MP2 optimization is completed, the geometry obtained in the
calculation is used to perform a frequency calculation.  This task is
invoked by the keyword \verb+freq+ in the final \verb+TASK+ directive,
\verb+task mp2 freq+.  The second derivatives of the energy are
calculated as numerical derivatives of analytical gradients.  The
intermediate energies and gradients are not of interest in
this case, so output from the SCF and MP2 modules is disabled with the
\verb+PRINT+ directives.

\section{Input Format and Syntax for Directives}
\label{sec:syntax}

This section describes the input format and the syntax used in the
rest of this documentation to describe the format of directives.  The
input format for the directives used in NWChem is similar to that of
UNIX shells, which is also used in other chemistry packages, most
notably GAMESS-UK.  An input line is parsed into tokens or fields
separated by whitespace (blanks or tabs).  Any token that contains whitespace
must be enclosed in double quotes in order to be processed correctly.
For example, the basis set with the descriptive name
\verb+modified Dunning DZ+ must appear in a directive as
\verb+"modified Dunning DZ"+, since the name consists of three separate words.

\subsection{Input Format}

A (physical) line in the input file is terminated with a newline
character (also known as a `return' or `enter' character).  A
semicolon (\verb+;+) can also be used to indicate the end of an input
line, allowing a single physical line of input to contain multiple
logical lines of input.  For example, five lines of input for the
\verb+GEOMETRY+ directive can be entered as follows:
\begin{verbatim}
  geometry
   O 0  0     0
   H 0  1.430 1.107
   H 0 -1.430 1.107
  end
\end{verbatim}
These same five lines could be entered on a single line, as
\begin{verbatim}
  geometry; O 0 0 0; H 0 1.430 1.107; H 0 -1.430 1.107; end
\end{verbatim}
This one physical input line comprises five logical
input lines.  Each logical or physical input line must be no longer
than 1023 characters.

In the input file:
\begin{itemize}
\item a string, token, or field is a sequence of ASCII characters
  (NOTE: if the string includes blanks or tabs (i.e., white space),
  the entire string must be enclosed in double quotes).
\item \verb+\+ (backslash) at the end of a line concatenates it with
  the next line.  Note that a space character is automatically
  inserted at this point so that it is {\em not} possible to split
  tokens across lines.  A backslash is also used to quote special
  characters such as whitespace, semi-colons, and hash symbols so as
  to avoid their special meaning (NOTE: these special symbols must be
  quoted with the backslash even when enclosed within double quotes).
\item \verb+;+ (semicolon) is used to mark the end of a logical input
  line within a physical line of input.
\item \verb+#+ (the hash or pound symbol) is the comment character.
  All characters following \verb+#+ (up to the end of the physical
  line) are ignored.
\item If {\em any} input line (excluding Python programs, Section
\ref{sec:python}) begins with the string \verb+INCLUDE+ (ignoring
case) and is followed by a valid file name, then the data in that file
are read as if they were included into the current input file at the
current line.  Up to three levels of nested include files are
supported.  The user should note that inputting a basis set from the
standard basis library (Section \ref{sec:basis}) uses one level of
include.
\item Data is read from the input file until an end-of-file is detected, or
until the string \verb+EOF+ (ignoring case) is encountered at the
beginning of an input line.
\end{itemize}

\subsection{Format and syntax of directives}

Directives consist of a directive name, keywords, and optional input,
and may contain one line or many.  Simple directives consist of a
single line of input with one or more fields.  Compound directives can
have multiple input lines, and can also include other optional simple
and compound directives.  A compound directive is terminated with an
END directive.  The directives START (see Section \ref{sec:start}) and
ECHO (see Section \ref{sec:echo}) are examples of simple directives.
The directive GEOMETRY (see Section \ref{sec:geom}) is an example of a
compound directive.

Some limited checking of the input for self-consistency is performed
by the input module, but most defaults are imposed by the application
modules at runtime.  It is therefore usually impossible to determine
beforehand whether or not all selected options are consistent with
each other.

\sloppy

In the rest of this document, the following notation and syntax
conventions are used in the generic descriptions of the NWChem input.
\begin{itemize}
\item a directive name always appears in all-capitals, and in computer
  typeface (e.g., \verb+GEOMETRY+, \verb+BASIS+, \verb+SCF+).  Note
  that the case of directives and keywords is ignored in the actual
  input.
\item a keyword always appears in lower case, in computer typeface
  (e.g., {\tt swap}, {\tt print}, {\tt units}, {\tt bqbq}).
\item variable names always appear in lower case, in computer
  typeface, and enclosed in angle brackets to distinguish them from
  keywords (e.g., {\tt <input\_filename>}, {\tt <basisname>},
  {\tt <tag>}).
\item \verb+$variable$+ is used to indicate the substitution of the
  value of a variable.
\item \verb+()+ is used to group items (the parentheses and other
  special symbols should not appear in the input).
\item \verb+||+ separate exclusive options, parameters, or formats.
\item \verb+[ ]+ enclose optional entries that have a default value.
\item \verb+< >+ enclose a type, a name of a value to be specified, or
  a default value, if any.
\item \verb+\+ is used to concatenate lines in a description.
\item \verb+...+ is used to indicate indefinite continuation of a
  list.
\end{itemize}

\fussy

An input parameter is identified in the description of the directive
by prefacing the name of the item with the type of data expected,
i.e.,
\begin{itemize}
\item \verb+string + -- an ASCII character string
\item \verb+integer+ -- integer value(s) for a variable or an array
\item \verb+logical+ -- true/false logical variable
\item \verb+real +  -- real floating point value(s) for a variable or an array
\item \verb+double + -- synonymous with real
\end{itemize}

If an input item is not prefaced by one of these type names,
it is assumed to be of type ``string''.
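
As a small illustration of these conventions, the top-level
\verb+CHARGE+ directive used in the sample input of Section
\ref{sec:realsample} could be described in this notation as shown
below.  This is a sketch only; the parameter name {\tt charge} and the
default value of zero are assumptions made here for illustration, and
the authoritative description is the one given with the top-level
directives (Section \ref{sec:toplevel}).

\begin{verbatim}
  CHARGE <real charge default 0>
\end{verbatim}

Read with the rules above, \verb+CHARGE+ is the directive name and the
angle brackets enclose a single parameter of type {\tt real}; the input
line \verb+charge 1+ supplies the value 1 for that parameter.
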

In addition, integer lists may be specified using Fortran triplet
notation, which interprets \verb+lo:hi:inc+ as \verb+lo+, \verb=lo+inc=,
\verb=lo+2*inc=, \ldots, \verb+hi+.  For example, where a list of
integers is expected in the input, the following two lines are
equivalent:
\begin{verbatim}
  7 10 21:27:2 1:3 99
  7 10 21 23 25 27 1 2 3 99
\end{verbatim}
(In Fortran triplet notation, the increment, if unstated, is 1; e.g., 1:3 = 1:3:1.)

The directive \verb+VECTORS+ (Section \ref{sec:vectors}) is presented here
as an example of an NWChem input directive.  The general form of the
directive is as follows:
\begin{verbatim}
  VECTORS [input (<string input_movecs default atomic>) || \
                 (project <string basisname> <string filename>)] \
          [swap [(alpha||beta)] <integer vec1 vec2> ...] \
          [output <string output_movecs default $file_prefix$.movecs>]
\end{verbatim}

This directive contains three optional keywords, as indicated by the
three main sets of square brackets enclosing the keywords \verb+input+,
\verb+swap+, and \verb+output+.  The keyword \verb+input+ allows the
user to specify the source of the molecular orbital vectors.
There are two mutually exclusive options for
specifying the vectors, as indicated by the \verb+||+ symbol
separating the option descriptions:
\begin{verbatim}
   (<string input_movecs default atomic>) || \
   (project <string basisname> <string filename>) \
\end{verbatim}

The first option, \verb+(<string input_movecs default atomic>)+,
allows the user to specify an ASCII character string for the parameter
{\tt input\_movecs}.  If no entry is specified, the code uses the
default \verb+atomic+ (i.e., atomic guess).  The second option,
{\tt(project <string basisname> <string filename>)}, contains the
keyword \verb+project+, which takes two string arguments.
When this
keyword is used, the vectors in file \verb+<filename>+ will be
projected from the (smaller) basis \verb+<basisname>+ into the current
atomic orbital (AO) basis.

The second keyword, \verb+swap+, allows the user to re-order the
starting vectors, specifying the pairs of vectors to be swapped.  As
many pairs as the user wishes to have swapped can be listed for
{\tt <integer vec1 vec2 ... >}.  The optional keywords \verb+alpha+ and
\verb+beta+ allow the user to swap the alpha or beta spin orbitals.

The third keyword, \verb+output+, allows the user to tell the code
where to store the vectors, by specifying an ASCII string for the
parameter {\tt output\_movecs}.  If no entry is specified for this
parameter, the default is to write the vectors back into either the
user-specified MO vectors input file or, if this is not available,
the file \verb+$file_prefix$.movecs+.

A particular example of the \verb+VECTORS+ directive is shown below.
It specifies both the \verb+input+ and \verb+output+ keywords, but
does not use the \verb+swap+ option.
\begin{verbatim}
  vectors input project "small basis" small_basis.movecs \
          output large_basis.movecs
\end{verbatim}
This directive tells the code to generate input vectors by projecting
from vectors in a smaller basis named \verb+"small basis"+, which is
stored in the file \verb+small_basis.movecs+.  The output vectors will
be stored in the file \verb+large_basis.movecs+.

The order of keyed optional entries within a directive should not
matter, unless noted otherwise in the specific instructions for a
particular directive.
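
As a further sketch following the general form of the \verb+VECTORS+
directive, the line below uses the \verb+swap+ keyword to exchange the
fifth and sixth beta-spin orbitals of vectors read from a file.  The
file names \verb+start.movecs+ and \verb+swapped.movecs+ and the orbital
indices are hypothetical, chosen only for illustration.

\begin{verbatim}
  vectors input start.movecs swap beta 5 6 \
          output swapped.movecs
\end{verbatim}

By the rule on keyed optional entries stated above, the same directive
written with the \verb+output+ entry before the \verb+swap+ entry is
expected to have the same effect, unless the detailed description of
\verb+VECTORS+ (Section \ref{sec:vectors}) states otherwise.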