1<?xml version="1.0" standalone="yes"?>
2<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
3     "http://www.boost.org/tools/boostbook/dtd/boostbook.dtd"
4[
5    <!ENTITY % entities SYSTEM "program_options.ent" >
6    %entities;
7]>
8<section id="program_options.overview">
9  <title>Library Overview</title>
10
11  <para>In the tutorial section, we saw several examples of library usage.
12    Here we will describe the overall library design including the primary
13    components and their function.
14  </para>
15
16  <para>The library has three main components:
17    <itemizedlist>
18      <listitem>
19        <para>The options description component, which describes the allowed options
20          and what to do with the values of the options.
21        </para>
22      </listitem>
23      <listitem>
24        <para>The parsers component, which uses this information to find option names
25          and values in the input sources and return them.
26        </para>
27      </listitem>
28      <listitem>
29        <para>The storage component, which provides the
30          interface to access the value of an option. It also converts the string
31          representation of values that parsers return into desired C++ types.
32        </para>
33      </listitem>
34    </itemizedlist>
35  </para>
36
37  <para>To be a little more concrete, the <code>options_description</code>
38  class is from the options description component, the
39  <code>parse_command_line</code> function is from the parsers component, and the
40  <code>variables_map</code> class is from the storage component. </para>
41
42  <para>In the tutorial we've learned how those components can be used by the
43    <code>main</code> function to parse the command line and config
44    file. Before going into the details of each component, a few notes about
45    the world outside of <code>main</code>.
46  </para>
47
48  <para>
49    For that outside world, the storage component is the most important. It
50    provides a class which stores all option values and that class can be
51    freely passed around your program to modules which need access to the
52    options. All the other components can be used only in the place where
    the actual parsing is done. However, it might also make sense for the
54    individual program modules to describe their options and pass them to the
55    main module, which will merge all options. Of course, this is only
56    important when the number of options is large and declaring them in one
57    place becomes troublesome.
58  </para>
59
60<!--
61  <para>The design looks very simple and straight-forward, but it is worth
62  noting some important points:
63    <itemizedlist>
64      <listitem>
65        <para>The options description is not tied to specific source. Once
66        options are described, all parsers can use that description.</para>
67      </listitem>
68      <listitem>
69        <para>The parsers are intended to be fairly dumb. They just
70          split the input into (name, value) pairs, using strings to represent
71          names and values. No meaningful processing of values is done.
72        </para>
73      </listitem>
74      <listitem>
75        <para>The storage component is focused on storing options values. It
76        </para>
77      </listitem>
78
79
80    </itemizedlist>
81
82  </para>
83-->
84
85  <section>
86    <title>Options Description Component</title>
87
88    <para>The options description component has three main classes:
89      &option_description;, &value_semantic; and &options_description;. The
90      first two together describe a single option. The &option_description;
91      class contains the option's name, description and a pointer to &value_semantic;,
92      which, in turn, knows the type of the option's value and can parse the value,
93      apply the default value, and so on. The &options_description; class is a
94      container for instances of &option_description;.
95    </para>
96
97    <para>For almost every library, those classes could be created in a
98      conventional way: that is, you'd create new options using constructors and
99      then call the <code>add</code> method of &options_description;. However,
100      that's overly verbose for declaring 20 or 30 options. This concern led
      to the creation of the syntax that you've already seen:
102<programlisting>
103options_description desc;
104desc.add_options()
105    ("help", "produce help")
106    ("optimization", value&lt;int&gt;()->default_value(10), "optimization level")
107    ;
108</programlisting>
109    </para>
110
111    <para>The call to the <code>value</code> function creates an instance of
112      a class derived from the <code>value_semantic</code> class: <code>typed_value</code>.
113      That class contains the code to parse
114      values of a specific type, and contains a number of methods which can be
115      called by the user to specify additional information. (This
116      essentially emulates named parameters of the constructor.) Calls to
117      <code>operator()</code> on the object returned by <code>add_options</code>
118      forward arguments to the constructor of the <code>option_description</code>
119      class and add the new instance.
120    </para>
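    <para>The chaining syntax can be imitated in a few lines. The following is
      a minimal sketch, not the library's implementation:
      <code>simple_option</code>, <code>simple_options</code> and
      <code>option_adder</code> are hypothetical names, and only a name and a
      description are stored. It shows how the object returned by
      <code>add_options</code> can accept repeated <code>operator()</code>
      calls by returning a reference to itself.
    </para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for option_description: a name and a help text.
struct simple_option {
    std::string name;
    std::string description;
};

// Hypothetical stand-in for options_description.
class simple_options {
public:
    // Helper returned by add_options(). Each operator() call appends one
    // option and returns the helper itself, so calls can be chained.
    class option_adder {
    public:
        explicit option_adder(simple_options& owner) : owner_(owner) {}

        option_adder& operator()(const std::string& name,
                                 const std::string& description) {
            assert(!name.empty());
            owner_.options_.push_back(simple_option{name, description});
            return *this;
        }

    private:
        simple_options& owner_;
    };

    option_adder add_options() { return option_adder(*this); }

    const std::vector<simple_option>& options() const { return options_; }

private:
    std::vector<simple_option> options_;
};
```
]]></programlisting>
    <para>With these definitions,
      <code>desc.add_options()("help", "produce help")("optimization", "optimization level");</code>
      adds two entries to a <code>simple_options</code> object named
      <code>desc</code>.
    </para>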
121
122    <para>
      Note that in addition to the
      <code>value</code> function, the library provides the <code>bool_switch</code>
      function, and users can write their own functions which return
      other subclasses of <code>value_semantic</code> with
      different behaviour. For the remainder of this section, we'll talk only
      about the <code>value</code> function.
129    </para>
130
131    <para>The information about an option is divided into syntactic and
132      semantic. Syntactic information includes the name of the option and the
133      number of tokens which can be used to specify the value. This
134      information is used by parsers to group tokens into (name, value) pairs,
135      where value is just a vector of strings
136      (<code>std::vector&lt;std::string&gt;</code>). The semantic layer
137      is responsible for converting the value of the option into more usable C++
138      types.
139    </para>
140
    <para>This separation is an important part of the library design. The parsers
142      use only the syntactic layer, which takes away some of the freedom to
143      use overly complex structures. For example, it's not easy to parse
144      syntax like: <screen>calc --expression=1 + 2/3</screen> because it's not
145      possible to parse <screen>1 + 2/3</screen> without knowing that it's a C
146      expression. With a little help from the user the task becomes trivial,
147      and the syntax clear: <screen>calc --expression="1 + 2/3"</screen>
148    </para>
149
150    <section>
151      <title>Syntactic Information</title>
152      <para>The syntactic information is provided by the
153        <classname>boost::program_options::options_description</classname> class
154        and some methods of the
155        <classname>boost::program_options::value_semantic</classname> class
156        and includes:
157        <itemizedlist>
158          <listitem>
159            <para>
160              name of the option, used to identify the option inside the
161              program,
162            </para>
163          </listitem>
164          <listitem>
165            <para>
166              description of the option, which can be presented to the user,
167            </para>
168          </listitem>
169          <listitem>
170            <para>
              the allowed number of source tokens that comprise the option's
172              value, which is used during parsing.
173            </para>
174          </listitem>
175        </itemizedlist>
176      </para>
177
178      <para>Consider the following example:
179      <programlisting>
180options_description desc;
181desc.add_options()
182    ("help", "produce help message")
183    ("compression", value&lt;string&gt;(), "compression level")
184    ("verbose", value&lt;string&gt;()->implicit_value("0"), "verbosity level")
185    ("email", value&lt;string&gt;()->multitoken(), "email to send to")
186    ;
187      </programlisting>
      For the first option, we specify only the name and the
      description. No value can be specified in the parsed source.
      For the second option, the user must specify a value, using a single
191      token. For the third option, the user may either provide a single token
192      for the value, or no token at all. For the last option, the value can
193      span several tokens. For example, the following command line is OK:
194      <screen>
195          test --help --compression 10 --verbose --email beadle@mars beadle2@mars
196      </screen>
197      </para>
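      <para>The grouping of tokens by allowed count can be sketched as
      follows. This is a simplified illustration under stated assumptions, not
      the library's parser: <code>token_limits</code>,
      <code>parsed_option</code> and <code>group_tokens</code> are hypothetical
      names, only the <code>--name</code> form is recognized, and a value token
      is taken to be any token that does not start with <code>--</code>.
      </para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical syntactic description: how many tokens a value may span.
struct token_limits {
    std::size_t min_tokens;
    std::size_t max_tokens;
};

// A parsed option: the name plus the raw, unconverted value tokens.
struct parsed_option {
    std::string name;
    std::vector<std::string> value;
};

// Groups tokens into (name, value) pairs. The "--name=value" form, short
// names and positional options are deliberately left out of this sketch.
std::vector<parsed_option> group_tokens(
    const std::vector<std::string>& tokens,
    const std::map<std::string, token_limits>& known)
{
    std::vector<parsed_option> result;
    std::size_t i = 0;
    while (i < tokens.size()) {
        const std::string& token = tokens[i];
        if (token.compare(0, 2, "--") != 0)
            throw std::runtime_error("unexpected token: " + token);

        const std::string name = token.substr(2);
        const auto spec = known.find(name);
        if (spec == known.end())
            throw std::runtime_error("unknown option: " + name);
        assert(spec->second.min_tokens <= spec->second.max_tokens);
        ++i;

        // Consume following tokens until the next option or the limit.
        parsed_option option{name, {}};
        while (i < tokens.size() &&
               option.value.size() < spec->second.max_tokens &&
               tokens[i].compare(0, 2, "--") != 0) {
            option.value.push_back(tokens[i]);
            ++i;
        }
        if (option.value.size() < spec->second.min_tokens)
            throw std::runtime_error("missing value for: " + name);
        result.push_back(option);
    }
    return result;
}
```
]]></programlisting>
      <para>In this sketch, the four options of the example correspond to the
      limits (0, 0), (1, 1), (0, 1) and (1, some large number), and the command
      line shown above is grouped into four pairs, with "email" receiving two
      tokens.
      </para>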
198
199      <section>
200        <title>Description formatting</title>
201
202        <para>
203          Sometimes the description can get rather long, for example, when
          several values of an option need separate documentation. Below we
205          describe some simple formatting mechanisms you can use.
206        </para>
207
        <para>The description string has one or more paragraphs, separated by
        the newline character ('\n'). When an option is output, the library
        will compute the indentation for the option's description. Each
        paragraph is output as a separate line with that indentation. If
        a paragraph does not fit on one line, it is wrapped over multiple
        lines (which will have the same indentation).
214        </para>
215
        <para>You may specify additional indent for the first line by
217        inserting spaces at the beginning of a paragraph. For example:
218        <programlisting>
219options.add_options()
    ("help", "    A long help msg a long help msg a long help msg a long help "
             "msg a long help msg a long help msg a long help msg a long help msg ")
222    ;
223        </programlisting>
224        will specify a four-space indent for the first line. The output will
225        look like:
226        <screen>
227  --help                    A long help msg a long
228                        help msg a long help msg
229                        a long help msg a long
230                        help msg a long help msg
231                        a long help msg a long
232                        help msg
233
234        </screen>
235        </para>
236
        <para>When a line is wrapped, you may want an additional
        indent for the wrapped text. This can be done by
239        inserting a tabulator character ('\t') at the desired position. For
240        example:
241        <programlisting>
242options.add_options()
      ("well_formated", "As you can see this is a very well formatted "
                        "option description.\n"
                        "You can do this for example:\n\n"
                        "Values:\n"
                        "  Value1: \tdoes this and that, bla bla bla bla "
                        "bla bla bla bla bla bla bla bla bla bla bla\n"
                        "  Value2: \tdoes something else, bla bla bla bla "
                        "bla bla bla bla bla bla bla bla bla bla bla\n\n"
                        "    This paragraph has a first line indent only, "
                        "bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla");
253        </programlisting>
254        will produce:
255        <screen>
256  --well_formated       As you can see this is a
257                        very well formatted
258                        option description.
259                        You can do this for
260                        example:
261
262                        Values:
263                          Value1: does this and
264                                  that, bla bla
265                                  bla bla bla bla
266                                  bla bla bla bla
267                                  bla bla bla bla
268                                  bla
269                          Value2: does something
270                                  else, bla bla
271                                  bla bla bla bla
272                                  bla bla bla bla
273                                  bla bla bla bla
274                                  bla
275
276                            This paragraph has a
277                        first line indent only,
278                        bla bla bla bla bla bla
279                        bla bla bla bla bla bla
280                        bla bla bla
281        </screen>
282        The tab character is removed before output. Only one tabulator per
283        paragraph is allowed, otherwise an exception of type
284        program_options::error is thrown. Finally, the tabulator is ignored if
285        it is not on the first line of the paragraph or is on the last
286        possible position of the first line.
287        </para>
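        <para>The wrapping rules can be approximated with a short function.
        This is only a sketch under stated assumptions and may differ from the
        library's output in details: <code>wrap_paragraph</code> is a
        hypothetical name, the function handles a single paragraph, breaks
        only between words, and throws <code>std::invalid_argument</code>
        rather than the library's own exception type when a second tab is
        found.
        </para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Wraps one paragraph of a description into lines of at most `width`
// characters (a single longer word still gets a line of its own).
std::vector<std::string> wrap_paragraph(const std::string& paragraph,
                                        std::size_t width)
{
    assert(width > 0);
    std::string text = paragraph;

    // A tab marks the column where continuation lines start.
    std::size_t hanging_indent = 0;
    const std::size_t tab = text.find('\t');
    if (tab != std::string::npos) {
        hanging_indent = tab;
        text.erase(tab, 1);
        if (text.find('\t') != std::string::npos)
            throw std::invalid_argument("only one tab per paragraph");
    }

    // Leading spaces indent the first line only.
    std::size_t first_indent = text.find_first_not_of(' ');
    if (first_indent == std::string::npos)
        first_indent = 0;

    std::vector<std::string> lines;
    std::string line(first_indent, ' ');
    bool line_has_word = false;
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        // Start a new line when the next word would exceed the width.
        if (line_has_word && line.size() + 1 + word.size() > width) {
            lines.push_back(line);
            line.assign(hanging_indent, ' ');
            line_has_word = false;
        }
        if (line_has_word)
            line += ' ';
        line += word;
        line_has_word = true;
    }
    lines.push_back(line);
    return lines;
}
```
]]></programlisting>
        <para>Each returned line would then be printed after the indentation
        computed for the option's description column.
        </para>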
288
289      </section>
290
291    </section>
292
293    <section>
294      <title>Semantic Information</title>
295
296      <para>The semantic information is completely provided by the
297        <classname>boost::program_options::value_semantic</classname> class. For
298        example:
299<programlisting>
300options_description desc;
301desc.add_options()
302    ("compression", value&lt;int&gt;()->default_value(10), "compression level")
303    ("email", value&lt; vector&lt;string&gt; &gt;()
304        ->composing()->notifier(&amp;your_function), "email")
305    ;
306</programlisting>
        These declarations specify that the default value of the first option is 10,
        that the second option can appear several times and all instances should
        be merged, and that after parsing is done, the library will call the
        function <code>your_function</code>, passing the value of the
        "email" option as the argument.
312      </para>
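      <para>The interplay of a default value, merging and a notifier can be
      sketched with a small class. This is a hedged illustration rather than
      the library's code: <code>list_value</code> and its members are
      hypothetical, the value is always a list of strings, and each call to
      <code>store</code> stands for the values found in one source.
      </para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

typedef std::vector<std::string> string_list;

// Hypothetical, much simplified stand-in for a value_semantic-like class.
class list_value {
public:
    list_value() : composing_(false), assigned_(false) {}

    // Each setter returns `this`, so calls can be chained with `->`.
    list_value* default_value(const string_list& v) { default_ = v; return this; }
    list_value* composing() { composing_ = true; return this; }
    list_value* notifier(std::function<void(const string_list&)> f) {
        notifier_ = f;
        return this;
    }

    // Records the values found in one source.
    void store(const string_list& found) {
        if (!assigned_) {
            value_ = found;
            assigned_ = true;
        } else if (composing_) {
            // Merge with the values stored earlier.
            value_.insert(value_.end(), found.begin(), found.end());
        }
        // Otherwise the earlier source wins and `found` is ignored.
    }

    // Applies the default if nothing was stored, then runs the notifier.
    void notify() {
        if (!assigned_)
            value_ = default_;
        if (notifier_)
            notifier_(value_);
    }

    const string_list& value() const { return value_; }

private:
    bool composing_;
    bool assigned_;
    string_list default_;
    string_list value_;
    std::function<void(const string_list&)> notifier_;
};
```
]]></programlisting>
      <para>In this sketch, a composing value that is stored twice ends up
      with the values of both sources, and the notifier sees the merged list
      once <code>notify</code> is called.
      </para>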
313    </section>
314
315    <section>
316      <title>Positional Options</title>
317
318      <para>Our definition of option as (name, value) pairs is simple and
319        useful, but in one special case of the command line, there's a
320        problem. A command line can include a <firstterm>positional option</firstterm>,
321        which does not specify any name at all, for example:
322        <screen>
323          archiver --compression=9 /etc/passwd
324        </screen>
325        Here, the "/etc/passwd" element does not have any option name.
326      </para>
327
      <para>One solution is to ask users to extract positional options
        themselves and process them as they like. However, there's a nicer approach
330        -- provide a method to automatically assign the names for positional
331        options, so that the above command line can be interpreted the same way
332        as:
333        <screen>
334          archiver --compression=9 --input-file=/etc/passwd
335        </screen>
336      </para>
337
338      <para>The &positional_options_desc; class allows the command line
339        parser to assign the names. The class specifies how many positional options
340        are allowed, and for each allowed option, specifies the name. For example:
341<programlisting>
342positional_options_description pd; pd.add("input-file", 1);
</programlisting> specifies that exactly one positional option, the first,
        will be given the name "input-file".
345      </para>
346
347      <para>It's possible to specify that a number, or even all positional options, be
348        given the same name.
349<programlisting>
350positional_options_description pd;
351pd.add("output-file", 2).add("input-file", -1);
352</programlisting>
353        In the above example, the first two positional options will be associated
354        with name "output-file", and any others with the name "input-file".
355      </para>
356
357    <warning>
358      <para>The &positional_options_desc; class only specifies translation from
359      position to name, and the option name should still be registered with
360      an instance of the &options_description; class.</para>
361    </warning>
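      <para>The translation from position to name can be sketched as a simple
      lookup. The class below is a hypothetical, simplified counterpart written
      for illustration only: <code>positional_names</code> is an assumed name,
      positions are counted from zero, and an empty string means that no name
      is assigned.
      </para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified position-to-name table.
class positional_names {
public:
    // Assigns `name` to the next `max_count` positions; -1 means
    // "all remaining positions".
    positional_names& add(const std::string& name, int max_count) {
        assert(max_count == -1 || max_count > 0);
        entries_.push_back(std::make_pair(name, max_count));
        return *this;
    }

    // Returns the option name for a zero-based position, or an empty
    // string if no name is assigned to it.
    std::string name_for_position(std::size_t position) const {
        std::size_t first = 0;
        for (const auto& entry : entries_) {
            if (entry.second < 0)
                return entry.first;
            const std::size_t count = static_cast<std::size_t>(entry.second);
            if (position < first + count)
                return entry.first;
            first += count;
        }
        return std::string();
    }

private:
    std::vector<std::pair<std::string, int> > entries_;
};
```
]]></programlisting>
      <para>After <code>pd.add("output-file", 2).add("input-file", -1)</code>,
      positions 0 and 1 map to "output-file" and every later position maps to
      "input-file".
      </para>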
362
363
364    </section>
365
366    <!-- Note that the classes are not modified during parsing -->
367
368  </section>
369
370  <section>
371    <title>Parsers Component</title>
372
373    <para>The parsers component splits input sources into (name, value) pairs.
374      Each parser looks for possible options and consults the options
375      description component to determine if the option is known and how its value
376      is specified. In the simplest case, the name is explicitly specified,
      which allows the library to decide if such an option is known. If it is known, the
378      &value_semantic; instance determines how the value is specified. (If
379      it is not known, an exception is thrown.) Common
380      cases are when the value is explicitly specified by the user, and when
381      the value cannot be specified by the user, but the presence of the
382      option implies some value (for example, <code>true</code>). So, the
383      parser checks that the value is specified when needed and not specified
      when not needed, and returns a new (name, value) pair.
385    </para>
386
387    <para>
388      To invoke a parser you typically call a function, passing the options
389      description and command line or config file or something else.
390      The results of parsing are returned as an instance of the &parsed_options;
391      class. Typically, that object is passed directly to the storage
392      component. However, it also can be used directly, or undergo some additional
393      processing.
394    </para>
395
396    <para>
397      There are three exceptions to the above model -- all related to
398      traditional usage of the command line. While they require some support
399      from the options description component, the additional complexity is
400      tolerable.
401      <itemizedlist>
402        <listitem>
403          <para>The name specified on the command line may be
404            different from the option name -- it's common to provide a "short option
405            name" alias to a longer name. It's also common to allow an abbreviated name
406            to be specified on the command line.
407          </para>
408        </listitem>
409        <listitem>
          <para>Sometimes it's desirable to specify a value as several
411          tokens. For example, an option "--email-recipient" may be followed
412          by several emails, each as a separate command line token. This
413          behaviour is supported, though it can lead to parsing ambiguities
414          and is not enabled by default.
415          </para>
416        </listitem>
417        <listitem>
418          <para>The command line may contain positional options -- elements
419            which don't have any name. The command line parser provides a
420            mechanism to guess names for such options, as we've seen in the
421            tutorial.
422          </para>
423        </listitem>
424      </itemizedlist>
425    </para>
426
427  </section>
428
429
430  <section>
431    <title>Storage Component</title>
432
433    <para>The storage component is responsible for:
434      <itemizedlist>
435        <listitem>
436          <para>Storing the final values of an option into a special class and in
437            regular variables</para>
438        </listitem>
439        <listitem>
440          <para>Handling priorities among different sources.</para>
441        </listitem>
442
443        <listitem>
444          <para>Calling user-specified <code>notify</code> functions with the final
445         values of options.</para>
446        </listitem>
447      </itemizedlist>
448    </para>
449
450    <para>Let's consider an example:
451<programlisting>
452variables_map vm;
453store(parse_command_line(argc, argv, desc), vm);
454store(parse_config_file("example.cfg", desc), vm);
455notify(vm);
456</programlisting>
457      The <code>variables_map</code> class is used to store the option
458      values. The two calls to the <code>store</code> function add values
459      found on the command line and in the config file. Finally the call to
460      the <code>notify</code> function runs the user-specified notify
461      functions and stores the values into regular variables, if needed.
462    </para>
463
464    <para>The priority is handled in a simple way: the <code>store</code>
465      function will not change the value of an option if it's already
466      assigned. In this case, if the command line specifies the value for an
467      option, any value in the config file is ignored.
468    </para>
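    <para>The "first stored value wins" rule can be illustrated with a plain
      map. This is a minimal sketch, assuming values are kept as strings:
      <code>value_map</code> and <code>store_values</code> are hypothetical
      names and do not belong to the library.
    </para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> value_map;

// Copies values from `source` into `target`, keeping any value that
// `target` already holds for the same option name.
void store_values(const value_map& source, value_map& target) {
    // std::map::insert leaves existing keys untouched.
    target.insert(source.begin(), source.end());
    assert(target.size() >= source.size());
}
```
]]></programlisting>
    <para>Storing the command line values first and the config file values
      second therefore gives the command line the higher priority.
    </para>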
469
470    <warning>
471      <para>Don't forget to call the <code>notify</code> function after you've
472      stored all parsed values.</para>
473    </warning>
474
475  </section>
476
477  <section>
478    <title>Specific parsers</title>
479
480    <section>
481      <title>Configuration file parser</title>
482
483      <para>The &parse_config_file; function implements parsing
484      of simple INI-like configuration files. Configuration file
485      syntax is line based:
486      </para>
487      <itemizedlist>
488        <listitem><para>A line in the form:</para>
489        <screen>
490<replaceable>name</replaceable>=<replaceable>value</replaceable>
491        </screen>
492        <para>gives a value to an option.</para>
493        </listitem>
494        <listitem><para>A line in the form:</para>
495        <screen>
496[<replaceable>section name</replaceable>]
497        </screen>
498        <para>introduces a new section in the configuration file.</para>
499        </listitem>
500        <listitem><para>The <literal>#</literal> character introduces a
501        comment that spans until the end of the line.</para>
502        </listitem>
503      </itemizedlist>
504
505      <para>The option names are relative to the section names, so
506      the following configuration file part:</para>
507      <screen>
508[gui.accessibility]
509visual_bell=yes
510      </screen>
511      <para>is equivalent to</para>
512      <screen>
513gui.accessibility.visual_bell=yes
514      </screen>
      <para>provided that the option "gui.accessibility.visual_bell" has been added to the options description:</para>
516      <programlisting>
517options_description desc;
518desc.add_options()
519    ("gui.accessibility.visual_bell", value&lt;string&gt;(), "flash screen for bell")
520    ;
521    </programlisting>
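      <para>The way section names are combined with option names can be
      sketched as follows. This is an illustrative reader under stated
      assumptions, not the library's parser: <code>trim</code> and
      <code>read_config</code> are hypothetical names, and lines without an
      equal sign are silently skipped here.
      </para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Removes leading and trailing blanks.
std::string trim(const std::string& s) {
    const char* blanks = " \t\r";
    const std::size_t first = s.find_first_not_of(blanks);
    if (first == std::string::npos)
        return std::string();
    const std::size_t last = s.find_last_not_of(blanks);
    return s.substr(first, last - first + 1);
}

// Reads "name=value" lines, prefixing each name with the current section.
std::vector<std::pair<std::string, std::string> > read_config(std::istream& in) {
    std::vector<std::pair<std::string, std::string> > result;
    std::string prefix;
    std::string line;
    while (std::getline(in, line)) {
        // Strip the comment, if any, then the surrounding blanks.
        line = trim(line.substr(0, line.find('#')));
        if (line.empty())
            continue;

        // "[section name]" changes the prefix for the following options.
        if (line[0] == '[' && line[line.size() - 1] == ']') {
            prefix = trim(line.substr(1, line.size() - 2)) + ".";
            continue;
        }

        const std::size_t eq = line.find('=');
        if (eq == std::string::npos)
            continue;  // this sketch skips malformed lines
        assert(eq < line.size());
        result.push_back(std::make_pair(prefix + trim(line.substr(0, eq)),
                                        trim(line.substr(eq + 1))));
    }
    return result;
}
```
]]></programlisting>
      <para>Feeding the two-line example above to <code>read_config</code>
      yields the single pair ("gui.accessibility.visual_bell", "yes").
      </para>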
522    </section>
523
524    <section>
525      <title>Environment variables parser</title>
526
527      <para><firstterm>Environment variables</firstterm> are string variables
528      which are available to all programs via the <code>getenv</code> function
      of the C runtime library. The operating system allows initial values to be set
      for a given user, and the values can be further changed on the command
      line.  For example, on Windows one can use the
      <filename>autoexec.bat</filename> file or (on recent versions) the
      <filename>Control Panel/System/Advanced/Environment Variables</filename>
      dialog, and on Unix, the <filename>/etc/profile</filename>,
535      <filename>~/.profile</filename> and <filename>~/.bash_profile</filename>
536      files. Because environment variables can be set for the entire system,
537      they are particularly suitable for options which apply to all programs.
538      </para>
539
540      <para>The environment variables can be parsed with the
      &parse_environment; function. The function has several overloaded
      versions. The first parameter is always an &options_description;
      instance, and the second specifies which variables must be processed and
      which option names correspond to them. To describe the second
545      parameter we need to consider naming conventions for environment
546      variables.</para>
547
      <para>If you have an option that should be specified via an environment
      variable, you need to make up the variable's name. To avoid name clashes,
550      we suggest that you use a sufficiently unique prefix for environment
551      variables. Also, while option names are most likely in lower case,
552      environment variables conventionally use upper case. So, for an option
553      name <literal>proxy</literal> the environment variable might be called
      <envar>BOOST_PROXY</envar>. During parsing, we need to perform the reverse
      conversion of the names. This is accomplished by passing the chosen
556      prefix as the second parameter of the &parse_environment; function.
557      Say, if you pass <literal>BOOST_</literal> as the prefix, and there are
558      two variables, <envar>CVSROOT</envar> and <envar>BOOST_PROXY</envar>, the
559      first variable will be ignored, and the second one will be converted to
560      option <literal>proxy</literal>.
561      </para>
562
563      <para>The above logic is sufficient in many cases, but it is also
564      possible to pass, as the second parameter of the &parse_environment;
565      function, any function taking a <code>std::string</code> and returning
566      <code>std::string</code>. That function will be called for each
567      environment variable and should return either the name of the option, or
      an empty string if the variable should be ignored. An example showing this
569      method can be found in "example/env_options.cpp".
570      </para>
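      <para>The reverse conversion of names can be written as an ordinary
      function. The sketch below is an assumption-laden illustration and not
      the library's code: <code>option_name_for</code> is a hypothetical name.
      With the prefix fixed, it has the string-to-string shape of the
      name-mapping function described above: it returns the option name, or
      an empty string for variables that should be ignored.
      </para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Maps an environment variable name to an option name. Variables that do
// not start with `prefix` are ignored (empty result); otherwise the prefix
// is dropped and the remainder is converted to lower case.
std::string option_name_for(const std::string& variable,
                            const std::string& prefix)
{
    assert(!prefix.empty());
    if (variable.compare(0, prefix.size(), prefix) != 0)
        return std::string();

    std::string name = variable.substr(prefix.size());
    for (std::size_t i = 0; i < name.size(); ++i)
        name[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(name[i])));
    return name;
}
```
]]></programlisting>
      <para>With the prefix <literal>BOOST_</literal>, the variable
      <envar>BOOST_PROXY</envar> maps to <literal>proxy</literal> and
      <envar>CVSROOT</envar> maps to an empty string.
      </para>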
571
572    </section>
573  </section>
574
575  <section>
576    <title>Types</title>
577
    <para>Everything that is passed in on the command line, as an environment
    variable, or in a config file is a string. For values that need to be used
    as a non-string type, the library will attempt to convert the string to
    the correct type when the value is stored in the variables_map.</para>
582
583    <para>Integers and floating point values are converted using Boost's
584    lexical_cast. It will accept integer values such as "41" or "-42". It will
585    accept floating point numbers such as "51.1", "-52.1", "53.1234567890" (as
586    a double), "54", "55.", ".56", "57.1e5", "58.1E5", ".591e5", "60.1e-5",
587    "-61.1e5", "-62.1e-5", etc. Unfortunately, hex, octal, and binary
588    representations that are available in C++ literals are not supported by
589    lexical_cast, and thus will not work with program_options.</para>
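    <para>The effect of whole-string conversion can be approximated with
    standard streams. This sketch only imitates the behaviour described above
    and is not lexical_cast itself: <code>convert_whole</code> is a
    hypothetical name, and it reports failure through its return value instead
    of throwing.</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <sstream>
#include <string>

// Converts the whole of `text` to T using stream extraction. Returns false
// if `text` does not consist of exactly one value of type T.
template <class T>
bool convert_whole(const std::string& text, T& result) {
    std::istringstream in(text);
    in >> std::noskipws;  // leading whitespace is not accepted
    T value;
    if (!(in >> value))
        return false;
    // Anything left over (such as the "x1A" in "0x1A") is an error.
    if (in.peek() != std::char_traits<char>::eof())
        return false;
    result = value;
    return true;
}
```
]]></programlisting>
    <para>In this sketch, "41" and "-42" convert to <code>int</code>, while
    "0x1A" is rejected because the characters after the leading zero are left
    unread.</para>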
590
    <para>Booleans are special in that there are multiple ways to specify them.
    As with any other value type, a boolean can be declared as <code>("my-option",
    value&lt;bool&gt;())</code>, and then set as:</para>
594    <screen>
595example --my-option=true
596    </screen>
597    <para>However, more typical is that boolean values are set by the simple
598    presence of a switch. This is enabled by &bool_switch; as in <code>
599    ("other-option", bool_switch())</code>. This will cause the value to
600    default to false and it will become true if the switch is found:</para>
601    <screen>
example --other-option
603    </screen>
604    <para>When a boolean does take a parameter, there are several options.
605    Those that evaluate to true in C++ are: "true", "yes", "on", "1". Those
606    that evaluate to false in C++ are: "false", "no", "off", "0". In addition,
607    when reading from a config file, the option name with an equal sign and no
608    value after it will also evaluate to true.</para>
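    <para>The accepted spellings can be summarized in a small function. This
    is a minimal sketch based only on the list above: <code>parse_bool</code>
    is a hypothetical name, the comparison is done on the exact spellings
    shown, and <code>std::invalid_argument</code> stands in for whatever error
    the library reports.</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Interprets the spellings listed above. An empty string (an option name
// followed by an equal sign and no value) is treated as true.
bool parse_bool(const std::string& text) {
    if (text.empty() || text == "true" || text == "yes" ||
        text == "on" || text == "1")
        return true;
    if (text == "false" || text == "no" || text == "off" || text == "0")
        return false;
    throw std::invalid_argument("not a boolean value: " + text);
}
```
]]></programlisting>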
609  </section>
610
611  <section>
612    <title>Annotated List of Symbols</title>
613
614    <para>The following table describes all the important symbols in the
615      library, for quick access.</para>
616
617    <informaltable pgwide="1">
618
619      <tgroup cols="2">
620        <colspec colname='c1'/>
621        <colspec colname='c2'/>
622        <thead>
623
624          <row>
625            <entry>Symbol</entry>
626            <entry>Description</entry>
627          </row>
628        </thead>
629
630        <tbody>
631
632          <row>
633            <entry namest='c1' nameend='c2'>Options description component</entry>
634          </row>
635
636          <row>
637            <entry>&options_description;</entry>
638            <entry>describes a number of options</entry>
639          </row>
640          <row>
641            <entry>&value;</entry>
642            <entry>defines the option's value</entry>
643          </row>
644
645          <row>
646            <entry namest='c1' nameend='c2'>Parsers component</entry>
647          </row>
648
649          <row>
650            <entry>&parse_command_line;</entry>
            <entry>parses command line (simplified interface)</entry>
652          </row>
653
654          <row>
655            <entry>&basic_command_line_parser;</entry>
656            <entry>parses command line (extended interface)</entry>
657          </row>
658
659
660          <row>
661            <entry>&parse_config_file;</entry>
662            <entry>parses config file</entry>
663          </row>
664
665          <row>
666            <entry>&parse_environment;</entry>
667            <entry>parses environment</entry>
668          </row>
669
670          <row>
671            <entry namest='c1' nameend='c2'>Storage component</entry>
672          </row>
673
674          <row>
675            <entry>&variables_map;</entry>
676            <entry>storage for option values</entry>
677          </row>
678
679        </tbody>
680      </tgroup>
681
682    </informaltable>
683
684  </section>
685
686</section>
687
688<!--
689     Local Variables:
690     mode: nxml
691     sgml-indent-data: t
692     sgml-parent-document: ("program_options.xml" "section")
693     sgml-set-face: t
694     End:
695-->
696