Dev_Spec_Change.dox - OpenGrok cross reference for /dports/science/dakota/dakota-6.13.0-release-public.src-UI/docs/Dev_Spec_Change.dox

namespace Dakota {

/** \page SpecChange Instructions for Modifying Dakota's Input Specification

\htmlonly
<b>Specification Change Table of Contents</b>
<ul>

<li> <a href="SpecChange.html#ModXMLSpec"> XML input specification</a>
  <ul>
  <li> <a href="SpecChange.html#ModXMLSpecReq"> XML Build Requirements</a>
  <li> <a href="SpecChange.html#ModXMLSpecTools"> XML Editing Tools</a>
  <li> <a href="SpecChange.html#ModXMLSpecFeatures"> XML Features (with map to NIDR)</a>
  </ul>

<li> <a href="SpecChange.html#RebuildGenFiles"> Rebuild generated files</a>

<li> <a href="SpecChange.html#UpdateNIDRPDDB"> Update parser source NIDRProblemDescDB.cpp</a>

<li> <a href="SpecChange.html#UpdateData"> Update Corresponding Data Classes</a>
  <ul>
  <li> <a href="SpecChange.html#UpdateDatap1"> Update the Data class header file</a>
  <li> <a href="SpecChange.html#UpdateDatap2"> Update the .cpp file</a>
  </ul>
<li> <a href="SpecChange.html#UpdatePDDB"> Update database source ProblemDescDB.cpp</a>
  <ul>
  <li> <a href="SpecChange.html#UpdatePDDBp1"> Augment/update get_<data_type>() functions</a>
  </ul>

<li> <a href="SpecChange.html#UseFns"> Use get_<data_type>() Functions</a>

<li> <a href="SpecChange.html#UpdateDocs"> Update the Documentation</a>

</ul>
\endhtmlonly

To modify %Dakota's input specification (for maintenance or addition of
new input syntax), specification maintenance mode must be enabled at
%Dakota configure time with the \c -DENABLE_SPEC_MAINT option, e.g.,
\code
  ./cmake -DENABLE_SPEC_MAINT:BOOL=ON ..
\endcode
This will enable regeneration of NIDR and %Dakota components which must
be updated following a spec change.


\section ModXMLSpec XML Input Specification

The authoritative source for valid %Dakota input grammar is \c
dakota/src/dakota.xml.  The schema defining valid content for this XML
file is in \c dakota/src/dakota.xsd.  NIDR remains %Dakota's user
input file parser, so \c dakota.xml is translated to \c
dakota/src/dakota.input.nspec during the %Dakota build process.  To
update the XML input definition: <ul>

<li> Make sure ENABLE_SPEC_MAINT is enabled in your build and necessary
     Java development tools are installed (see below).</li>

<li> Edit the XML spec in \c dakota.xml.</li>

<li> Perform a make in \c dakota.build/src which will regenerate \c
     dakota.source/src/dakota.input.nspec and related file.</li>

<li> Review that any changes induced in the \c dakota.input.nspec file
     are as expected.</li>

<li> Proceed with verifying code changes and making downstream parse
     handler changes as normal (described below).</li>

<li> Commit the modified \c dakota.xml, \c dakota.input.nspec, and
     other files generated to \c dakota.source/src along with your
     other code changes.</li>

</ul>


\subsection ModXMLSpecReq XML Build Requirements

Editing the XML and then compiling %Dakota requires

 - Java Development Kit (JDK) providing the Java compiler javac.  Java
   6 (version 1.6) or newer should work, with Java 8 recommended.  Can
   satisfy on RHEL6 with RPM packages \c java-1.8.0-openjdk-devel and
   \c java-1.8.0-openjdk.  This is needed to build the Java-based XML
   to NIDR translator.  If this becomes too burdensome, we can check
   in the generated \c xml2nidr.jar file.

\subsection ModXMLSpecTools XML Editing Tools

The following tools will make editing dakota.input.xml easier.

<ul>

<li> <b>Recommended: Eclipse Web Tools Platform.</b> Includes both
     graphical and text editors.

  <ol>
  <li> Download Eclipse Standard (Classic)</li>
  <li> Configure proxy if needed, setting to manual:
       Window > Preferences > General > Network Connection > Proxy
  <li> Install Web Tools Platform
       - Help > Install New Software
       - Work With: Kepler - http://download.eclipse.org/releases/kepler
       - Search "Eclipse X" and install two packages under Web, XML, Java
         - Eclipse XML Editors and Tools
         - Eclipse XSL Developer Tools
       - Optionally install C/C++ Development Tools
  </li>

  <li> Optional: add Subclipse for subversion (Subversive is the other
       major competing tool and I don't think requires JavaHL)
       Help > Install New Software
       * Work With: http://subclipse.tigris.org/update_1.6.x
       * Install Subclipse
       * On Linux: yum install subversion-javahl.x86_64
  </li>

  <li> Alternately install Eclipse for Java or Eclipse Java EE
       development which includes webtools, then optionally add
       subclipse and C/C++ dev</li>

  </ol>
</li>

<li> <b>Alternate: Emacs or your usual editor.</b> For example, Emacs
     supports an Nxml mode.  You can tell it where to find the schema,
     edit XML, and have it perform validation against the schema.  See
     help at
     http://www.gnu.org/software/emacs/manual/html_mono/nxml-mode.html
     </li>

<li> <b>Other Suggested Alternates:</b> XMLSpy, DreamWeaver, XML Copy
     Editor</li>

</ul>


\subsection ModXMLSpecFeatures XML Features (with map to NIDR)

Out of necessity, %Dakota XML \c dakota.xml closely mirrors
\c dakota.input.nspec.  Valid %Dakota input grammar is constrained by
\c dakota.xml, an XML document which must validate against \c
dakota.xsd.  The top-level element of interest is \c <input>, which is
comprised of a sequence of content elements (keywords, alternates,
etc.), which may themselves contain additional child content elements.
The key content types are:

<ul>

<li> <b>Keyword (\c <keyword>):</b>, specified with the \c <keyword>
     element whose definition is given by keywordType in \c
     dakota.xsd.  The required attributes are:
     <ul>

     <li><b>name:</b> The keyword name (lower case with underscores)
         as it will be given in user input; must follow same
         uniqueness rules are historical NIDR.  User input is allowed
         in mixed case, but the XML must use lower case names. </li>

	 Since the NIDR parser allows keyword abbreviation, you \e
         must not add a keyword that could be misinterpreted as
         an abbreviation for a different keyword within the same
         top-level keyword, such as "environment" and "method".  For
         example, adding the keyword "expansion" within the method
         specification would be a mistake if the keyword
         "expansion_factor" already was being used in this block.

	 The NIDR input is somewhat order-dependent, allowing the same
	 keyword to be reused multiple times in the specification.
	 This often happens with aliases, such as \c lower_bounds, \c
	 upper_bounds and \c initial_point.  Ambiguities are resolved
	 by attaching a keyword to the most recently seen context in
	 which it could appear, if such exists, or to the first
	 relevant context that subsequently comes along in the input
	 file.

     <li><b>code:</b> The verbatim NIDR handler to be invoked when
         this keyword parsed.  In NIDR this was specified with
         {N_macro(...)}.</li>

     </ul>

     Optional/useful parser-related elements/attributes in order of
     importance are: <ul>

     <li><b>param sub-element:</b>Parameters and data types: A keyword
     	 may have an associated parameter element with a specified
     	 data type: \c <param \c type="PARAMTYPE" />.  NIDR data types
     	 remain the same (INTEGER, REAL, STRING and LISTs thereof, but
     	 new data types INPUT_FILE and OUTPUT_FILE add convenience for
     	 the GUI, mapping to STRING for NIDR purposes.  Parameters can
     	 also include attributes \c constraint, \c in_taglist, or \c
     	 taglist, which are used to help validate the user-specified
     	 parameter value.  For example <code>constraint >= 0 LEN
     	 normal_uncertain</code></li>

     <li><b>alias sub-element:</b> Historical aliases for this keyword
         (can appear multiple times).  Alias has a single attribute
         <b>name</b> which must be lower case with underscores.</li>

     <li><b>id:</b> Unique ID for the keyword, usually name with an
         integer appended, but not currently used/enforced.</li>

     <li><b>minOccurs:</b> Minimum occurrences of the keyword in
         current context (set to 1 for required, 0 for optional)</li>

     <li><b>maxOccurs:</b> Maximum occurrences of the keyword in
         current context (for example environment may appear at most
         once)</li>

     </ul>

     And optional/useful GUI-related attributes are:
     <ul>

     <li><b>help:</b> (Don't add this attribute the new keywords!) A
         pointer to the corresponding reference manual section
         (deprecated as not needed with new reference manual format
         which mirrors keyword hierarchy).</li>

     <li><b>label:</b> a short, friendly label string for the keyword
         in the GUI.  Format these like titles, e.g., "Initial Point
         for Search".</li>

     <li><b>group:</b> Category or group for this keyword, e.g.,
         optimization vs. parameter study if they are to be groups for
         GUI purposes</li>

     </ul>

<li> <b>Alternation (\c <oneOf>):</b> Alternation of groups of content
     is done with the element \c <oneOf> which indicates that its
     immediate children are alternates.  In NIDR this was done with
     the pipe symbol: OptionA | OptionB.  oneOf allows the label
     attribute and its use is recommended. </li>

<li> <b>Required Group (\c <required>):</b> A required group can be specified by
     enclosing the contents in the \c <required> element.  In NIDR
     this was done by enclosing the content in parentheses: ( required
     group... ) </li>

<li> <b>Optional Group (\c <optional>):</b> An optional group can be
     specified by enclosing the contents in the \c <optional> element.
     In NIDR this was done by enclosing the content in brackets: [
     optional group... ] </li>

</ul>


\section RebuildGenFiles Rebuild Generated Files

When configured with \c -DENABLE_SPEC_MAINT, performing a make in \c
dakota.build/src will regenerate all files which derive from \c
dakota.xml, include dakota.input.nspec, NIDR_keywds.hpp, and
dakota.input.summary.  If you commit changes to a source repository,
be sure to commit any automatically generated files in addition to any
modified in the following steps.  It is not strictly necessary to run
make at this point in the sequence, and in fact may generate errors if
necessary handlers aren't yet available.

\warning Please do not manually modify generated files!


\section UpdateNIDRPDDB Update Parser Source NIDRProblemDescDB.cpp

Many keywords have data associated with them: an integer, a
floating-point number, a string, or arrays of such entities.  Data
requirements are specified in dakota.input.nspec by the tokens
INTEGER, REAL, STRING, INTEGERLIST, REALLIST, STRINGLIST.  (Some
keywords have no associated data and hence no such token.)  After each
keyword and data token, the dakota.input.nspec file specifies
functions that the NIDR parser should call to record the appearance of
the keyword and deal with any associated data.  The general form of
this specification is

{ startfcn, startdata, stopfcn, stopdata }

i.e., a brace-enclosed list of one to four functions and data
pointers, with trailing entities taken to be zero if not present; zero
for a function means no function will be called.  The startfcn must
deal with any associated data.  Otherwise, the distinction between
startfcn and stopfcn is relevant only to keywords that begin a group
of keywords (enclosed in parentheses or square brackets).  The
startfcn is called before other entities in the group are processed,
and the stop function is called after they are processed.  Top-level
keywords often have both startfcn and stopfcn; stopfcn is uncommon but
possible for lower-level keywords.  The startdata and (if needed)
stopdata values are usually pointers to little structures that provide
keyword-specific details to generic functions for startfcn and
stopfcn.  Some keywords that begin groups (such as "approx_problem"
within the top-level "environment" keyword) have no need of either a
startfcn or a stopfcn; this is indicated by "{0}".

Most of the things within braces in dakota.input.nspec are invocations
of macros defined in \c dakota.source/src/NIDRProblemDescDB.cpp.  The
macros simplify writing dakota.input.nspec and make it more readable.
Most macro invocations refer to little structures defined in
NIDRProblemDescDB.cpp, usually with the help of other macros, some of
which have different definitions in different parts of
NIDRProblemDescDB.cpp.  When adding a keyword to dakota.input.nspec,
you may need to add a structure definition or even introduce a new
data type.  NIDRProblemDescDB.cpp has sections corresponding to each
top-level keyword.  The top-level keywords are in alphabetical order,
and most entities in the section for a top-level keyword are also in
alphabetical order.  While not required, it is probably good practice
to maintain this structure, as it makes things easier to find.


Any integer, real, or string data associated with a keyword are
provided to the keyword's startfcn, whose second argument is a pointer
to a \c Values structure, defined in header file \c nidr.h.

<b>Example 1:</b> if you added the specification:
\verbatim
    [method_setting REAL {method_setting_start, &method_setting_details} ]
\endverbatim
you would provide a function
\code
        void NIDRProblemDescDB::
    method_setting_start(const char *keyname, Values *val, void **g, void *v)
    { ... }
\endcode
in NIDRProblemDescDB.cpp.  In this example, argument \c &method_setting_details
would be passed as \c v, \c val->n (the number of values) would be 1 and \c *val->r
would be the REAL value given for the \c method_setting keyword.  The
\c method_setting_start function would suitably store this value with the
help of \c method_setting_details.

For some top-level keywords, \c g
(the third argument to the startfcn and stopfcn) provides access to a relevant context.
For example, \c method_start (the startfcn for the top-level \c method keyword)
executes
\code
    DataMethod *dm = new DataMethod;
    *g = (void*)dm;
\endcode
(and supplies a couple of default values to \c dm).  The start functions for
lower-level keywords within the \c method keyword get access to \c dm
through their \c g arguments.  Here is an example:
\code
 void NIDRProblemDescDB::
method_str(const char *keyname, Values *val, void **g, void *v)
{
	(*(DataMethod**)g)->**(String DataMethod::**)v = *val->s;
	}
\endcode
In this example, \c v points to a pointer-to-member, and an assignment is made
to one of the components of the DataMethod object pointed to by \c *g.
The corresponding stopfcn for the top-level \c method keyword is
\code
 void NIDRProblemDescDB::
method_stop(const char *keyname, Values *val, void **g, void *v)
{
	DataMethod *p = *(DataMethod**)g;
	pDDBInstance->dataMethodList.insert(*p);
	delete p;
	}
\endcode
which copies the now populated DataMethod object to the right place
and cleans up.


<b>Example 2:</b> if you added the specification
\verbatim
    [method_setting REALLIST {{N_mdm(RealL,methodCoeffs)}
\endverbatim
then method_RealL (defined in NIDRProblemDescDB.cpp) would be called
as the startfcn, and methodCoeffs would be the name of a
(currently nonexistent) component of DataMethod.  The N_mdm macro
is defined in NIDRProblemDescDB.cpp; among other things, it turns
\c RealL into \c NIDRProblemDescDB::method_RealL.  This function is
used to process lists of REAL values for several keywords.  By looking
at the source, you can see that the list values are val->r[i]
for 0 <= \c i < val->n.


\section UpdateData Update Corresponding Data Classes

The Data classes (\ref DataEnvironment "DataEnvironment", \ref
DataMethod "DataMethod", \ref DataModel "DataModel", \ref
DataVariables "DataVariables", \ref DataInterface "DataInterface", and
\ref DataResponses "DataResponses") store the parsed user input data.
In this step, we extend the Data class definitions to
include any new attributes referred to in \c dakota.xml or \c NIDRProblemDescDB

\subsection UpdateDatap1 Update the Data Class Header File

Add a new attribute to the public data for each of the new
specifications.  Follow the style guide for class attribute naming
conventions (or mimic the existing code).

\subsection UpdateDatap2 Update the .cpp File

Define defaults for the new attributes in the constructor
initialization list (if not a container with a sensible default
constructor) in same order as they appear in the header.  Add the new
attributes to the write(MPIPackBuffer&), read(MPIUnpackBuffer&), and
write(ostream&) functions, paying careful attention to the use of a
consistent ordering.


\section UpdatePDDB Update Database Source ProblemDescDB.cpp

\subsection UpdatePDDBp1 Augment/update get_<data_type>() Functions

The next update step involves extending the database retrieval
functions in \c dakota.source/src/ProblemDescDB.cpp.  These retrieval
functions accept an identifier string and return a database attribute
of a particular type, e.g., a RealVector:

\code
    const RealVector& get_rv(const String& entry_name);
\endcode

The implementation of each of these functions contains tables of
possible entry_name values and associated pointer-to-member values.
There is one table for each relevant top-level keyword, with the
top-level keyword omitted from the names in the table.  Since binary
search is used to look for names in these tables, each table must be
kept in alphabetical order of its entry names.  For example,

\code
  ...
  else if ((L = Begins(entry_name, "model."))) {
    if (dbRep->methodDBLocked)
	Locked_db();

    #define P &DataModelRep::
    static KW<RealVector, DataModelRep> RVdmo[] = {	// must be sorted
	{"nested.primary_response_mapping", P primaryRespCoeffs},
	{"nested.secondary_response_mapping", P secondaryRespCoeffs},
	{"surrogate.kriging_conmin_seed", P krigingConminSeed},
	{"surrogate.kriging_correlations", P krigingCorrelations},
	{"surrogate.kriging_max_correlations", P krigingMaxCorrelations},
	{"surrogate.kriging_min_correlations", P krigingMinCorrelations}};
    #undef P

    KW<RealVector, DataModelRep> *kw;
    if ((kw = (KW<RealVector, DataModelRep>*)Binsearch(RVdmo, L)))
	return dbRep->dataModelIter->dataModelRep->*kw->p;
  }
\endcode

is the "model" portion of \ref ProblemDescDB::get_rv
"ProblemDescDB::get_rv()".  Based on entry_name, it returns the
relevant attribute from a \ref DataModel "DataModel" object.  Since
there may be multiple model specifications, the \c dataModelIter list
iterator identifies which node in the list of \ref DataModel
"DataModel" objects is used.  In particular, \c dataModelList contains
a list of all of the \c data_model objects, one for each time a
top-level \c model keyword was seen by the parser.  The particular
model object used for the data retrieval is managed by \c
dataModelIter, which is set in a \c set_db_list_nodes() operation that
will not be described here.

There may be multiple \ref DataMethod "DataMethod",
\ref DataModel "DataModel", \ref DataVariables "DataVariables",
\ref DataInterface "DataInterface", and/or
\ref DataResponses "DataResponses" objects.  However, only one
 specification is currently allowed so a list of
\ref DataEnvironment "DataEnvironment" objects is not needed.  Rather,
\ref ProblemDescDB::environmentSpec "ProblemDescDB::environmentSpec"
is the lone \ref DataEnvironment "DataEnvironment" object.

To augment the get_<data_type>() functions, add table entries with new
identifier strings and pointer-to-member values that address the
appropriate data attributes from the Data class object.  The style for
the identifier strings is a top-down hierarchical description, with
specification levels separated by periods and words separated with
underscores, e.g., \c
"keyword.group_specification.individual_specification".  Use the \c
dbRep->listIter->attribute syntax for variables, interface, responses,
and method specifications.  For example, the \c method_setting example
attribute would be added to \c get_drv() as:

\code
  {"method_name.method_setting", P methodSetting},
\endcode

inserted at the beginning of the \c RVdmo array shown above (since the
name in the existing first entry, i.e.,
"nested.primary_response_mapping", comes alphabetically after
"method_name.method_setting").


\section UseFns Use get_<data_type>() Functions

At this point, the new specifications have been mapped through all of
the database classes.  The only remaining step is to retrieve the new
data within the constructors of the classes that need it.  This is
done by invoking the get_<data_type>() function on the
\ref ProblemDescDB "ProblemDescDB"
object using the identifier string you selected in \ref UpdatePDDBp1.
For example:
\code
  const String& interface_type = problem_db.get_string("interface.type");
\endcode
passes the \c "interface.type" identifier string to the
\ref ProblemDescDB::get_string "ProblemDescDB::get_string()"
retrieval function, which returns the desired attribute from the active
\ref DataInterface "DataInterface" object.

\warning Use of the get_<data_type>() functions is restricted to class
constructors, since only in class constructors are the data list
iterators (i.e., \c dataMethodIter, \c dataModelIter, \c
dataVariablesIter, \c dataInterfaceIter, and \c dataResponsesIter)
guaranteed to be set correctly.  Outside of the constructors, the
database list nodes will correspond to the last set operation, and may
not return data from the desired list node.


\section UpdateDocs Update the Documentation

Doxygen comments should be added to the Data class headers for the new
attributes, and the reference manual sections describing the portions
of \c dakota.xml that have been modified should be updated by updating
files in \c dakota.source/docs/KeywordMetaData/.  d\c dakota.xml,
together with these metadata files generates the reference manual and
GUI context-aware help documentation.

*/

} // namespace Dakota