<?xml version="1.0" standalone="yes"?>
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
     "http://www.boost.org/tools/boostbook/dtd/boostbook.dtd"
[
    <!ENTITY % entities SYSTEM "program_options.ent" >
    %entities;
]>
<section id="program_options.overview">
  <title>Library Overview</title>

  <para>In the tutorial section, we saw several examples of library usage.
    Here we will describe the overall library design including the primary
    components and their function.
  </para>

  <para>The library has three main components:
    <itemizedlist>
      <listitem>
        <para>The options description component, which describes the allowed options
          and what to do with the values of the options.
        </para>
      </listitem>
      <listitem>
        <para>The parsers component, which uses this information to find option names
          and values in the input sources and return them.
        </para>
      </listitem>
      <listitem>
        <para>The storage component, which provides the
          interface to access the value of an option. It also converts the string
          representation of values that parsers return into desired C++ types.
        </para>
      </listitem>
    </itemizedlist>
  </para>

  <para>To be a little more concrete, the <code>options_description</code>
  class is from the options description component, the
  <code>parse_command_line</code> function is from the parsers component, and the
  <code>variables_map</code> class is from the storage component. </para>

  <para>In the tutorial we've learned how those components can be used by the
    <code>main</code> function to parse the command line and config
    file. Before going into the details of each component, a few notes about
    the world outside of <code>main</code>.
  </para>

  <para>
    For that outside world, the storage component is the most important.
    It provides a class which stores all option values, and that class can be
    freely passed around your program to modules which need access to the
    options. All the other components can be used only in the place where
    the actual parsing is done. However, it might also make sense for the
    individual program modules to describe their options and pass them to the
    main module, which will merge all options. Of course, this is only
    important when the number of options is large and declaring them in one
    place becomes troublesome.
  </para>

<!--
  <para>The design looks very simple and straight-forward, but it is worth
  noting some important points:
  <itemizedlist>
    <listitem>
      <para>The options description is not tied to specific source. Once
      options are described, all parsers can use that description.</para>
    </listitem>
    <listitem>
      <para>The parsers are intended to be fairly dumb. They just
        split the input into (name, value) pairs, using strings to represent
        names and values. No meaningful processing of values is done.
      </para>
    </listitem>
    <listitem>
      <para>The storage component is focused on storing options values. It
      </para>
    </listitem>


  </itemizedlist>

  </para>
-->

  <section>
    <title>Options Description Component</title>

    <para>The options description component has three main classes:
      &option_description;, &value_semantic; and &options_description;. The
      first two together describe a single option. The &option_description;
      class contains the option's name, description and a pointer to &value_semantic;,
      which, in turn, knows the type of the option's value and can parse the value,
      apply the default value, and so on. The &options_description; class is a
      container for instances of &option_description;.
    </para>

    <para>For almost every library, those classes could be created in a
      conventional way: that is, you'd create new options using constructors and
      then call the <code>add</code> method of &options_description;. However,
      that's overly verbose for declaring 20 or 30 options. This concern led
      to the creation of the syntax that you've already seen:
<programlisting>
options_description desc;
desc.add_options()
    ("help", "produce help")
    ("optimization", value&lt;int&gt;()->default_value(10), "optimization level")
    ;
</programlisting>
    </para>

    <para>The call to the <code>value</code> function creates an instance of
      a class derived from the <code>value_semantic</code> class: <code>typed_value</code>.
      That class contains the code to parse
      values of a specific type, and contains a number of methods which can be
      called by the user to specify additional information. (This
      essentially emulates named parameters of the constructor.) Calls to
      <code>operator()</code> on the object returned by <code>add_options</code>
      forward arguments to the constructor of the <code>option_description</code>
      class and add the new instance.
    </para>

    <para>
      Note that in addition to the
      <code>value</code> function, the library provides the <code>bool_switch</code>
      function, and users can write their own functions which return
      other subclasses of <code>value_semantic</code> with
      different behaviour. For the remainder of this section, we'll talk only
      about the <code>value</code> function.
    </para>

    <para>The information about an option is divided into syntactic and
      semantic. Syntactic information includes the name of the option and the
      number of tokens which can be used to specify the value. This
      information is used by parsers to group tokens into (name, value) pairs,
      where value is just a vector of strings
      (<code>std::vector&lt;std::string&gt;</code>).
      The semantic layer
      is responsible for converting the value of the option into more usable C++
      types.
    </para>

    <para>This separation is an important part of the library design. The parsers
      use only the syntactic layer, which takes away some of the freedom to
      use overly complex structures. For example, it's not easy to parse
      syntax like: <screen>calc --expression=1 + 2/3</screen> because it's not
      possible to parse <screen>1 + 2/3</screen> without knowing that it's a C
      expression. With a little help from the user the task becomes trivial,
      and the syntax clear: <screen>calc --expression="1 + 2/3"</screen>
    </para>

    <section>
      <title>Syntactic Information</title>
      <para>The syntactic information is provided by the
        <classname>boost::program_options::options_description</classname> class
        and some methods of the
        <classname>boost::program_options::value_semantic</classname> class
        and includes:
        <itemizedlist>
          <listitem>
            <para>
              name of the option, used to identify the option inside the
              program,
            </para>
          </listitem>
          <listitem>
            <para>
              description of the option, which can be presented to the user,
            </para>
          </listitem>
          <listitem>
            <para>
              the allowed number of source tokens that comprise the option's
              value, which is used during parsing.
            </para>
          </listitem>
        </itemizedlist>
      </para>

      <para>Consider the following example:
      <programlisting>
options_description desc;
desc.add_options()
    ("help", "produce help message")
    ("compression", value&lt;string&gt;(), "compression level")
    ("verbose", value&lt;string&gt;()->implicit_value("0"), "verbosity level")
    ("email", value&lt;string&gt;()->multitoken(), "email to send to")
    ;
      </programlisting>
      For the first option, we specify only the name and the
      description. No value can be specified in the parsed source.
      For the second option, the user must specify a value, using a single
      token. For the third option, the user may either provide a single token
      for the value, or no token at all. For the last option, the value can
      span several tokens. For example, the following command line is OK:
      <screen>
          test --help --compression 10 --verbose --email beadle@mars beadle2@mars
      </screen>
      </para>

      <section>
        <title>Description formatting</title>

        <para>
          Sometimes the description can get rather long, for example, when
          several of an option's values need separate documentation. Below we
          describe some simple formatting mechanisms you can use.
        </para>

        <para>The description string has one or more paragraphs, separated by
        the newline character ('\n'). When an option is output, the library
        will compute the indentation for the option's description. Each
        paragraph is output as a separate line with that indentation. If
        a paragraph does not fit on one line, it is wrapped across multiple
        lines (which will have the same indentation).
        </para>

        <para>You may specify an additional indent for the first line by
        inserting spaces at the beginning of a paragraph. For example:
        <programlisting>
options.add_options()
    ("help", "    A long help msg a long help msg a long help msg a long help "
             "msg a long help msg a long help msg a long help msg a long help msg ")
    ;
        </programlisting>
        will specify a four-space indent for the first line. The output will
        look like:
        <screen>
  --help                    A long help msg a long
                        help msg a long help msg
                        a long help msg a long
                        help msg a long help msg
                        a long help msg a long
                        help msg

        </screen>
        </para>

        <para>When a line is wrapped, you may want an additional
        indent for the wrapped text. This can be done by
        inserting a tabulator character ('\t') at the desired position.
        For example:
        <programlisting>
options.add_options()
    ("well_formated", "As you can see this is a very well formatted "
                      "option description.\n"
                      "You can do this for example:\n\n"
                      "Values:\n"
                      "  Value1: \tdoes this and that, bla bla bla bla "
                      "bla bla bla bla bla bla bla bla bla bla bla\n"
                      "  Value2: \tdoes something else, bla bla bla bla "
                      "bla bla bla bla bla bla bla bla bla bla bla\n\n"
                      "    This paragraph has a first line indent only, "
                      "bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla");
        </programlisting>
        will produce:
        <screen>
  --well_formated       As you can see this is a
                        very well formatted
                        option description.
                        You can do this for
                        example:

                        Values:
                          Value1: does this and
                                  that, bla bla
                                  bla bla bla bla
                                  bla bla bla bla
                                  bla bla bla bla
                                  bla
                          Value2: does something
                                  else, bla bla
                                  bla bla bla bla
                                  bla bla bla bla
                                  bla bla bla bla
                                  bla

                            This paragraph has a
                        first line indent only,
                        bla bla bla bla bla bla
                        bla bla bla bla bla bla
                        bla bla bla
        </screen>
        The tab character is removed before output. Only one tabulator per
        paragraph is allowed; otherwise an exception of type
        <code>program_options::error</code> is thrown. Finally, the tabulator is ignored if
        it is not on the first line of the paragraph or is on the last
        possible position of the first line.
        </para>

      </section>

    </section>

    <section>
      <title>Semantic Information</title>

      <para>The semantic information is completely provided by the
        <classname>boost::program_options::value_semantic</classname> class.
        For example:
<programlisting>
options_description desc;
desc.add_options()
    ("compression", value&lt;int&gt;()->default_value(10), "compression level")
    ("email", value&lt; vector&lt;string&gt; &gt;()
        ->composing()->notifier(&amp;your_function), "email")
    ;
</programlisting>
        These declarations specify that the default value of the first option is 10,
        that the second option can appear several times and all instances should
        be merged, and that after parsing is done, the library will call
        the function <code>your_function</code>, passing the value of the
        "email" option as argument.
      </para>
    </section>

    <section>
      <title>Positional Options</title>

      <para>Our definition of an option as a (name, value) pair is simple and
        useful, but in one special case of the command line, there's a
        problem. A command line can include a <firstterm>positional option</firstterm>,
        which does not specify any name at all, for example:
        <screen>
          archiver --compression=9 /etc/passwd
        </screen>
        Here, the "/etc/passwd" element does not have any option name.
      </para>

      <para>One solution is to ask users to extract positional options
        themselves and process them as they like. However, there's a nicer approach:
        provide a method to automatically assign the names for positional
        options, so that the above command line can be interpreted the same way
        as:
        <screen>
          archiver --compression=9 --input-file=/etc/passwd
        </screen>
      </para>

      <para>The &positional_options_desc; class allows the command line
        parser to assign the names. The class specifies how many positional options
        are allowed, and for each allowed option, specifies the name. For example:
<programlisting>
positional_options_description pd; pd.add("input-file", 1);
</programlisting> specifies that exactly one positional option, the first,
        will be given the name "input-file".
      </para>

      <para>It's possible to specify that several, or even all, positional options be
        given the same name.
<programlisting>
positional_options_description pd;
pd.add("output-file", 2).add("input-file", -1);
</programlisting>
        In the above example, the first two positional options will be associated
        with the name "output-file", and any others with the name "input-file".
      </para>

    <warning>
      <para>The &positional_options_desc; class only specifies translation from
      position to name, and the option name should still be registered with
      an instance of the &options_description; class.</para>
    </warning>


    </section>

    <!-- Note that the classes are not modified during parsing -->

  </section>

  <section>
    <title>Parsers Component</title>

    <para>The parsers component splits input sources into (name, value) pairs.
      Each parser looks for possible options and consults the options
      description component to determine if the option is known and how its value
      is specified. In the simplest case, the name is explicitly specified,
      which allows the library to decide if such an option is known. If it is known, the
      &value_semantic; instance determines how the value is specified. (If
      it is not known, an exception is thrown.) Common
      cases are when the value is explicitly specified by the user, and when
      the value cannot be specified by the user, but the presence of the
      option implies some value (for example, <code>true</code>). So, the
      parser checks that the value is specified when needed and not specified
      when not needed, and returns a new (name, value) pair.
    </para>

    <para>
      To invoke a parser you typically call a function, passing the options
      description and the input source, such as the command line or a config file.
      The results of parsing are returned as an instance of the &parsed_options;
      class.
      Typically, that object is passed directly to the storage
      component. However, it can also be used directly, or undergo some additional
      processing.
    </para>

    <para>
      There are three exceptions to the above model, all related to
      traditional usage of the command line. While they require some support
      from the options description component, the additional complexity is
      tolerable.
      <itemizedlist>
        <listitem>
          <para>The name specified on the command line may be
            different from the option name: it's common to provide a "short option
            name" alias to a longer name. It's also common to allow an abbreviated name
            to be specified on the command line.
          </para>
        </listitem>
        <listitem>
          <para>Sometimes it's desirable to specify the value as several
          tokens. For example, an option "--email-recipient" may be followed
          by several emails, each as a separate command line token. This
          behaviour is supported, though it can lead to parsing ambiguities
          and is not enabled by default.
          </para>
        </listitem>
        <listitem>
          <para>The command line may contain positional options: elements
            which don't have any name. The command line parser provides a
            mechanism to guess names for such options, as we've seen in the
            tutorial.
          </para>
        </listitem>
      </itemizedlist>
    </para>

  </section>


  <section>
    <title>Storage Component</title>

    <para>The storage component is responsible for:
      <itemizedlist>
        <listitem>
          <para>Storing the final values of options in a special class and in
            regular variables.</para>
        </listitem>
        <listitem>
          <para>Handling priorities among different sources.</para>
        </listitem>

        <listitem>
          <para>Calling user-specified <code>notify</code> functions with the final
          values of options.</para>
        </listitem>
      </itemizedlist>
    </para>

    <para>Let's consider an example:
<programlisting>
variables_map vm;
store(parse_command_line(argc, argv, desc), vm);
store(parse_config_file("example.cfg", desc), vm);
notify(vm);
</programlisting>
      The <code>variables_map</code> class is used to store the option
      values. The two calls to the <code>store</code> function add values
      found on the command line and in the config file. Finally, the call to
      the <code>notify</code> function runs the user-specified notify
      functions and stores the values into regular variables, if needed.
    </para>

    <para>The priority is handled in a simple way: the <code>store</code>
      function will not change the value of an option if it's already
      assigned. In this example, if the command line specifies the value for an
      option, any value in the config file is ignored.
    </para>

    <warning>
      <para>Don't forget to call the <code>notify</code> function after you've
      stored all parsed values.</para>
    </warning>

  </section>

  <section>
    <title>Specific parsers</title>

    <section>
      <title>Configuration file parser</title>

      <para>The &parse_config_file; function implements parsing
      of simple INI-like configuration files.
      Configuration file
      syntax is line based:
      </para>
      <itemizedlist>
        <listitem><para>A line in the form:</para>
        <screen>
<replaceable>name</replaceable>=<replaceable>value</replaceable>
        </screen>
        <para>gives a value to an option.</para>
        </listitem>
        <listitem><para>A line in the form:</para>
        <screen>
[<replaceable>section name</replaceable>]
        </screen>
        <para>introduces a new section in the configuration file.</para>
        </listitem>
        <listitem><para>The <literal>#</literal> character introduces a
        comment that spans until the end of the line.</para>
        </listitem>
      </itemizedlist>

      <para>The option names are relative to the section names, so
      the following configuration file part:</para>
      <screen>
[gui.accessibility]
visual_bell=yes
      </screen>
      <para>is equivalent to</para>
      <screen>
gui.accessibility.visual_bell=yes
      </screen>
      <para>Both forms set the option "gui.accessibility.visual_bell", provided
      that it has been added to the options description:</para>
      <programlisting>
options_description desc;
desc.add_options()
    ("gui.accessibility.visual_bell", value&lt;string&gt;(), "flash screen for bell")
    ;
      </programlisting>
    </section>

    <section>
      <title>Environment variables parser</title>

      <para><firstterm>Environment variables</firstterm> are string variables
      which are available to all programs via the <code>getenv</code> function
      of the C runtime library. The operating system allows initial values to be set
      for a given user, and the values can be further changed on the command
      line. For example, on Windows one can use the
      <filename>autoexec.bat</filename> file or (on recent versions) the
      <filename>Control Panel/System/Advanced/Environment Variables</filename>
      dialog, and on Unix, the <filename>/etc/profile</filename>,
      <filename>~/.profile</filename> and <filename>~/.bash_profile</filename>
      files.
      Because environment variables can be set for the entire system,
      they are particularly suitable for options which apply to all programs.
      </para>

      <para>The environment variables can be parsed with the
      &parse_environment; function. The function has several overloaded
      versions. The first parameter is always an &options_description;
      instance, and the second specifies which variables must be processed and
      which option names correspond to them. To describe the second
      parameter we need to consider naming conventions for environment
      variables.</para>

      <para>If you have an option that should be specified via an environment
      variable, you need to make up the variable's name. To avoid name clashes,
      we suggest that you use a sufficiently unique prefix for environment
      variables. Also, while option names are most likely in lower case,
      environment variables conventionally use upper case. So, for an option
      name <literal>proxy</literal> the environment variable might be called
      <envar>BOOST_PROXY</envar>. During parsing, we need to perform the reverse
      conversion of the names. This is accomplished by passing the chosen
      prefix as the second parameter of the &parse_environment; function.
      Say, if you pass <literal>BOOST_</literal> as the prefix, and there are
      two variables, <envar>CVSROOT</envar> and <envar>BOOST_PROXY</envar>, the
      first variable will be ignored, and the second one will be converted to
      option <literal>proxy</literal>.
      </para>

      <para>The above logic is sufficient in many cases, but it is also
      possible to pass, as the second parameter of the &parse_environment;
      function, any function taking a <code>std::string</code> and returning
      <code>std::string</code>. That function will be called for each
      environment variable and should return either the name of the option, or
      an empty string if the variable should be ignored.
      An example showing this
      method can be found in "example/env_options.cpp".
      </para>

    </section>
  </section>

  <section>
    <title>Types</title>

    <para>Everything that is passed in on the command line, as an environment
    variable, or in a config file is a string. For a value that needs to be used
    as a non-string type, the <code>variables_map</code> will attempt to
    convert it to the correct type.</para>

    <para>Integers and floating point values are converted using Boost's
    lexical_cast. It will accept integer values such as "41" or "-42". It will
    accept floating point numbers such as "51.1", "-52.1", "53.1234567890" (as
    a double), "54", "55.", ".56", "57.1e5", "58.1E5", ".591e5", "60.1e-5",
    "-61.1e5", "-62.1e-5", etc. Unfortunately, hex, octal, and binary
    representations that are available in C++ literals are not supported by
    lexical_cast, and thus will not work with program_options.</para>

    <para>Booleans are special in that there are multiple ways to specify them.
    Like any other value type, a boolean can be specified as <code>("my-option",
    value&lt;bool&gt;())</code>, and then set as:</para>
    <screen>
example --my-option=true
    </screen>
    <para>However, more typical is that boolean values are set by the simple
    presence of a switch. This is enabled by &bool_switch; as in <code>
    ("other-option", bool_switch())</code>. This will cause the value to
    default to false and it will become true if the switch is found:</para>
    <screen>
example --other-option
    </screen>
    <para>When a boolean does take a parameter, there are several options.
    Those that evaluate to true in C++ are: "true", "yes", "on", "1". Those
    that evaluate to false in C++ are: "false", "no", "off", "0".
    In addition,
    when reading from a config file, the option name with an equal sign and no
    value after it will also evaluate to true.</para>
  </section>

  <section>
    <title>Annotated List of Symbols</title>

    <para>The following table describes all the important symbols in the
    library, for quick access.</para>

    <informaltable pgwide="1">

      <tgroup cols="2">
        <colspec colname='c1'/>
        <colspec colname='c2'/>
        <thead>

          <row>
            <entry>Symbol</entry>
            <entry>Description</entry>
          </row>
        </thead>

        <tbody>

          <row>
            <entry namest='c1' nameend='c2'>Options description component</entry>
          </row>

          <row>
            <entry>&options_description;</entry>
            <entry>describes a number of options</entry>
          </row>
          <row>
            <entry>&value;</entry>
            <entry>defines the option's value</entry>
          </row>

          <row>
            <entry namest='c1' nameend='c2'>Parsers component</entry>
          </row>

          <row>
            <entry>&parse_command_line;</entry>
            <entry>parses command line (simplified interface)</entry>
          </row>

          <row>
            <entry>&basic_command_line_parser;</entry>
            <entry>parses command line (extended interface)</entry>
          </row>


          <row>
            <entry>&parse_config_file;</entry>
            <entry>parses config file</entry>
          </row>

          <row>
            <entry>&parse_environment;</entry>
            <entry>parses environment</entry>
          </row>

          <row>
            <entry namest='c1' nameend='c2'>Storage component</entry>
          </row>

          <row>
            <entry>&variables_map;</entry>
            <entry>storage for option values</entry>
          </row>

        </tbody>
      </tgroup>

    </informaltable>

  </section>

</section>

<!--
     Local Variables:
     mode: nxml
     sgml-indent-data: t
     sgml-parent-document: ("program_options.xml" "section")
     sgml-set-face: t
     End:
-->