1[vset VERSION 1.1] 2[comment {-*- text -*- doctools manpage}] 3[manpage_begin pt::rde n [vset VERSION]] 4[include include/module.inc] 5[titledesc {Parsing Runtime Support, PARAM based}] 6[require pt::rde [opt [vset VERSION]]] 7[require snit] 8[require struct::stack 1.5] 9[require pt::ast 1.1] 10[description] 11[include include/ref_intro.inc] 12 13This package provides a class whose instances provide the runtime 14support for recursive descent parsers with backtracking, as is needed 15for the execution of, for example, parsing expression grammars. It 16implements the [manpage {PackRat Machine Specification}], as such that 17document is [emph required] reading to understand both this manpage, 18and the package itself. The description below does make numerous 19shorthand references to the PARAM's instructions and the various parts 20of its architectural state. 21 22[para] 23 24The package resides in the Execution section of the Core Layer of 25Parser Tools. 26[para][image arch_core_transform][para] 27 28[para] 29 30Note: This package not only has the standard Tcl implementation, but 31also an accelerator, i.e. a C implementation, based on Critcl. 32 33[para] 34 35[subsection {Class API}] 36 37The package exports the API described here. 38 39[list_begin definitions] 40 41[call [cmd ::pt::rde] [arg objectName]] 42 43The command creates a new runtime object for a recursive descent 44parser with backtracking and returns the fully qualified name of the 45object command as its result. The API of this object command is 46described in the section [sectref {Object API}]. It may be used to 47invoke various operations on the object. 48 49[list_end] 50 51[subsection {Object API}] 52 53All objects created by this package provide the following 63 methods 54for the manipulation and querying of their state, which is, in essence 55the architectural state of a PARAM. 56 57[para] 58 59First some general methods and the state accessors. 60 61[list_begin definitions] 62 63[call [arg objectName] [method destroy]] 64 65This method destroys the object, releasing all claimed memory, and 66deleting the associated object command. 67 68[call [arg objectName] [method reset] [arg chan]] 69 70This method resets the state of the runtme to its defaults, preparing 71it for the parsing of the character in the channel [arg chan], which 72becomes IN. 73 74[include include/channel_notes.inc] 75 76[call [arg objectName] [method complete]] 77 78This method completes parsing, either returning the AST made from the 79elements of ARS, or throwing an error containing the current ER. 80 81[call [arg objectName] [method chan]] 82 83This method returns the handle of the channel which is IN. 84 85[call [arg objectName] [method line]] 86 87This method returns the line number for the position IN is currently 88at. Note that this may not match with the line number for CL, due to 89backtracking. 90 91[call [arg objectName] [method column]] 92 93This method returns the column for the position IN is currently 94at. Note that this may not match with the column for CL, due to 95backtracking. 96 97[call [arg objectName] [method current]] 98 99This method returns CC. 100 101[call [arg objectName] [method location]] 102 103This method returns CL. 104 105[call [arg objectName] [method locations]] 106 107This method returns the LS. The topmost entry of the stack will be the 108first element of the returned list. 109 110[call [arg objectName] [method ok]] 111 112This method returns ST. 113 114[call [arg objectName] [method value]] 115 116This method returns SV. 117 118[call [arg objectName] [method error]] 119 120This method returns ER. This is either the empty string for an empty 121ER, or a list of 2 elements, the location the error is for, and a set 122of messages which specify which symbols were expected at the 123location. The messages are encoded as one of the possible atomic 124parsing expressions (special operators, terminal, range, and 125nonterminal operator). 126 127[call [arg objectName] [method errors]] 128 129This method returns ES. The topmost entry of the stack will be the 130first element of the returned list. Each entry is encoded as described 131for [method error]. 132 133[call [arg objectName] [method tokens] [opt "[arg from] [opt [arg to]]"]] 134 135This method returns the part of TC for the range of locations of IN 136starting at [arg from] and ending at [arg to]. If [arg to] is not 137specified it is taken as identical to [arg from]. If neither argument 138is specified the whole of TC is returned. 139 140[para] 141 142Each token in the returned list is a list of three elements itself, 143containing the character at the location, and the associated line and 144column numbers, in this order. 145 146[call [arg objectName] [method symbols]] 147 148This method returns a dictionary containing NC. Keys are two-element 149lists containing nonterminal symbol and location, in this order. The 150values are 4-tuples containing CL, ST, ER, and SV, in this order. ER 151is encoded as specified for the method [method error]. 152 153[call [arg objectName] [method known]] 154 155This method returns a list containing the keys of SC. They are 156encoded in the same manner as is done by method [method symbols]. 157 158[call [arg objectName] [method reducible]] 159 160This method returns ARS. The topmost entry of the stack will be the 161first element of the returned list 162 163[call [arg objectName] [method asts]] 164 165This method returns AS. The topmost entry of the stack will be the 166first element of the returned list 167 168[call [arg objectName] [method ast]] 169 170This is a convenience method returning the topmost element of ARS. 171 172[call [arg objectName] [method position] [arg loc]] 173 174This method returns the line and column numbers for the specified 175location of IN, assuming that this location has already been reached 176during the parsing process. 177 178[list_end] 179 180The following methods implement all PARAM instructions. They all have 181the prefix "i_". 182 183[para] 184 185The control flow is mainly provided by Tcl's builtin commands, like 186[cmd if], [cmd while], etc., plus a few guarded variants of PARAM 187instructions and Tcl commands.. That means that these instruction 188variants will do nothing if their guard condition is not 189fulfilled. They can be recognized by the prefix "i:ok_" and "i:fail_", 190which denote the value ST has to have for the instruction to execute. 191 192[para] 193 194The instructions are listed in the same order they occur in the 195[manpage {PackRat Machine Specification}], with the guard variants 196listed after their regular implementation, if any, or in their place. 197 198[list_begin definitions] 199 200[vset INS input_next][vset IA0 msg][include include/rde_1ins.inc] 201 202[vset INS test_alnum][include include/rde_0ins.inc] 203[vset INS test_alpha][include include/rde_0ins.inc] 204[vset INS test_ascii][include include/rde_0ins.inc] 205[vset INS test_char][vset IA0 char][include include/rde_1ins.inc] 206[vset INS test_ddigit][include include/rde_0ins.inc] 207[vset INS test_digit][include include/rde_0ins.inc] 208[vset INS test_graph][include include/rde_0ins.inc] 209[vset INS test_lower][include include/rde_0ins.inc] 210[vset INS test_print][include include/rde_0ins.inc] 211[vset INS test_punct][include include/rde_0ins.inc] 212[vset INS test_range][vset IA0 chars][vset IA1 chare][include include/rde_2ins.inc] 213[vset INS test_space][include include/rde_0ins.inc] 214[vset INS test_upper][include include/rde_0ins.inc] 215[vset INS test_wordchar][include include/rde_0ins.inc] 216[vset INS test_xdigit][include include/rde_0ins.inc] 217 218[vset INS error_clear][include include/rde_0ins.inc] 219[vset INS error_push][include include/rde_0ins.inc] 220[vset INS error_pop_merge][include include/rde_0ins.inc] 221[vset INS error_nonterminal][vset IA0 symbol][include include/rde_1ins.inc] 222 223[vset INS status_ok][include include/rde_0ins.inc] 224[vset INS status_fail][include include/rde_0ins.inc] 225[vset INS status_negate][include include/rde_0ins.inc] 226 227[vset INS loc_push][include include/rde_0ins.inc] 228[vset INS loc_pop_discard][include include/rde_0ins.inc] 229[vset INS loc_pop_rewind][include include/rde_0ins.inc] 230[vset G ok][vset INS loc_pop_rewind][include include/rde_0gins.inc] 231[vset IFAIL loc_pop_rewind] 232[vset IOK loc_pop_discard] 233[vset IOKX discard][include include/rde_0cins.inc] 234 235[vset INS symbol_restore][vset IA0 symbol][include include/rde_1ins.inc] 236 237[para] The boolean result of the check is returned as the result of 238the method and can be used with standard Tcl control flow commands. 239 240[vset INS symbol_save][vset IA0 symbol][include include/rde_1ins.inc] 241 242[vset INS value_clear][include include/rde_0ins.inc] 243[vset IFAIL value_clear] 244[vset IOK value_leaf] 245[vset IOKX leaf][include include/rde_0cins.inc] 246[vset IFAIL value_clear] 247[vset IOK value_reduce] 248[vset IOKX reduce][include include/rde_0cins.inc] 249 250[vset G ok][vset INS ast_value_push][include include/rde_0ginsb.inc] 251[vset INS ast_push][include include/rde_0ins.inc] 252[vset INS ast_pop_rewind][include include/rde_0ins.inc] 253[vset G fail][vset INS ast_pop_rewind][include include/rde_0gins.inc] 254[vset IFAIL ast_pop_rewind] 255[vset IOK ast_pop_discard] 256[vset IOKX discard][include include/rde_0cins.inc] 257[vset INS ast_pop_discard][include include/rde_0ins.inc] 258[vset IFAIL ast_pop_discard] 259[vset IOK ast_pop_rewind] 260[vset IOKX rewind][include include/rde_0cins.inc] 261 262[call [arg objectName] [method i:ok_continue]] 263 264This guarded method executes only for "ST == ok". Then it aborts the 265current iteration of the innermost loop in the calling Tcl procedure. 266 267[call [arg objectName] [method i:fail_continue]] 268 269This guarded method executes only for "ST == fail". Then it aborts the 270current iteration of the innermost loop in the calling Tcl procedure. 271 272[call [arg objectName] [method i:fail_return]] 273 274This guarded method executes only for "ST == fail". Then it aborts the 275calling Tcl procedure. 276 277[call [arg objectName] [method i:ok_return]] 278 279This guarded method executes only for "ST == ok". Then it aborts the 280calling Tcl procedure. 281 282[list_end] 283[para] 284 285The next set of methods are [term {super instructions}], meaning that 286each implements a longer sequence of instructions commonly used in 287parsers. The combinated instructions of the previous set, i.e. those 288with names matching the pattern "i_*/*", are actually super 289instructions as well, albeit with limited scope, handling 2 290instructions with their control flow. The upcoming set is much broader 291in scope, folding as much as six or more PARAM instructions into a 292single method call. 293 294[para] 295 296In this we can see the reasoning behind their use well: 297 298[list_begin enumerated] 299[enum] 300By using less instructions the generated parsers become smaller, as 301the common parts are now truly part of the common runtime, and not 302explicitly written in the parser's code over and over again. 303 304[enum] 305Using less instructions additionally reduces the overhead associated 306with calls into the runtime, i.e. the cost of method dispatch and of 307setting up the variable context. 308 309[enum] 310Another effect of the super instructions is that their internals can 311be optimized as well, especially regarding control flow, and stack 312use, as the runtime internals are accessible to all instructions 313folded into the sequence. 314[list_end] 315 316[para] 317 318[list_begin definitions] 319[call [arg objectName] [method si:void_state_push]] 320 321This method combines [example { 322i_loc_push 323i_error_clear 324i_error_push 325}] 326 327Parsers use it at the beginning of [term void] sequences and choices 328with a [term void] initial branch. 329 330[call [arg objectName] [method si:void2_state_push]] 331 332This method combines [example { 333i_loc_push 334i_error_clear 335i_error_push 336}] 337 338Parsers use it at the beginning of optional and repeated expressions. 339 340[call [arg objectName] [method si:value_state_push]] 341 342This method combines [example { 343i_ast_push 344i_loc_push 345i_error_clear 346i_error_push 347}] 348 349Parsers use it at the beginning of sequences generating an AST and 350choices with an initial branch generating an AST. 351 352[call [arg objectName] [method si:void_state_merge]] 353 354This method combines [example { 355i_error_pop_merge 356i_loc_pop_rewind/discard 357}] 358 359Parsers use it at the end of void sequences and choices whose last 360branch is void. 361 362[call [arg objectName] [method si:void_state_merge_ok]] 363 364This method combines [example { 365i_error_pop_merge 366i_loc_pop_rewind/discard 367i_status_ok 368}] 369 370Parsers use it at the end of optional expressions 371 372[call [arg objectName] [method si:value_state_merge]] 373 374This method combines [example { 375i_error_pop_merge 376i_ast_pop_rewind/discard 377i_loc_pop_rewind/discard 378}] 379 380Parsers use it at the end of sequences generating ASTs and choices 381whose last branch generates an AST 382 383[call [arg objectName] [method si:value_notahead_start]] 384 385This method combines [example { 386i_loc_push 387i_ast_push 388}] 389 390Parsers use it at the beginning of negative lookahead predicates which 391generate ASTs. 392 393[call [arg objectName] [method si:void_notahead_exit]] 394 395This method combines [example { 396i_loc_pop_rewind 397i_status_negate 398}] 399 400Parsers use it at the end of void negative lookahead predicates. 401 402[call [arg objectName] [method si:value_notahead_exit]] 403 404This method combines [example { 405i_ast_pop_discard/rewind 406i_loc_pop_rewind 407i_status_negate 408}] 409 410Parsers use it at the end of negative lookahead predicates which 411generate ASTs. 412 413[call [arg objectName] [method si:kleene_abort]] 414 415This method combines [example { 416i_loc_pop_rewind/discard 417i:fail_return 418}] 419 420Parsers use it to stop a positive repetition when its first, required, expression fails. 421 422[call [arg objectName] [method si:kleene_close]] 423 424This method combines [example { 425i_error_pop_merge 426i_loc_pop_rewind/discard 427i:fail_status_ok 428i:fail_return 429}] 430 431Parsers use it at the end of repetitions. 432 433[call [arg objectName] [method si:voidvoid_branch]] 434 435This method combines [example { 436i_error_pop_merge 437i:ok_loc_pop_discard 438i:ok_return 439i_loc_rewind 440i_error_push 441}] 442 443Parsers use it when transiting between branches of a choice when both are void. 444 445[call [arg objectName] [method si:voidvalue_branch]] 446 447This method combines [example { 448i_error_pop_merge 449i:ok_loc_pop_discard 450i:ok_return 451i_ast_push 452i_loc_rewind 453i_error_push 454}] 455 456Parsers use it when transiting between branches of a choice when the 457failing branch is void, and the next to test generates an AST. 458 459[call [arg objectName] [method si:valuevoid_branch]] 460 461This method combines [example { 462i_error_pop_merge 463i_ast_pop_rewind/discard 464i:ok_loc_pop_discard 465i:ok_return 466i_loc_rewind 467i_error_push 468}] 469 470Parsers use it when transiting between branches of a choice when the 471failing branch generates an AST, and the next to test is void. 472 473[call [arg objectName] [method si:valuevalue_branch]] 474 475This method combines [example { 476i_error_pop_merge 477i_ast_pop_discard 478i:ok_loc_pop_discard 479i:ok_return 480i_ast_rewind 481i_loc_rewind 482i_error_push 483}] 484 485Parsers use it when transiting between branches of a choice when both 486generate ASTs. 487 488[call [arg objectName] [method si:voidvoid_part]] 489 490This method combines [example { 491i_error_pop_merge 492i:fail_loc_pop_rewind 493i:fail_return 494i_error_push 495}] 496 497Parsers use it when transiting between parts of a sequence and both 498are void. 499 500[call [arg objectName] [method si:voidvalue_part]] 501 502This method combines [example { 503i_error_pop_merge 504i:fail_loc_pop_rewind 505i:fail_return 506i_ast_push 507i_error_push 508}] 509 510Parsers use it when transiting between parts of a sequence and the 511sucessfully matched part is void, and after it an AST is generated. 512 513[call [arg objectName] [method si:valuevalue_part]] 514 515This method combines [example { 516i_error_pop_merge 517i:fail_ast_pop_rewind 518i:fail_loc_pop_rewind 519i:fail_return 520i_error_push 521}] 522 523Parsers use it when transiting between parts of a sequence and both 524parts generate ASTs. 525 526[call [arg objectName] [method si:value_symbol_start] [arg symbol]] 527 528This method combines [example { 529if/found? i_symbol_restore $symbol 530i:found:ok_ast_value_push 531i:found_return 532i_loc_push 533i_ast_push 534}] 535 536Parsers use it at the beginning of a nonterminal symbol generating an 537AST, whose right-hand side may have generated an AST as well. 538 539[call [arg objectName] [method si:value_void_symbol_start] [arg symbol]] 540 541This method combines [example { 542if/found? i_symbol_restore $symbol 543i:found:ok_ast_value_push 544i:found_return 545i_loc_push 546i_ast_push 547}] 548 549Parsers use it at the beginning of a void nonterminal symbol whose 550right-hand side may generate an AST. 551 552[call [arg objectName] [method si:void_symbol_start] [arg symbol]] 553 554This method combines [example { 555if/found? i_symbol_restore $symbol 556i:found_return 557i_loc_push 558i_ast_push 559}] 560 561Parsers use it at the beginning of a nonterminal symbol generating an 562AST whose right-hand side is void. 563 564[call [arg objectName] [method si:void_void_symbol_start] [arg symbol]] 565 566This method combines [example { 567if/found? i_symbol_restore $symbol 568i:found_return 569i_loc_push 570}] 571 572Parsers use it at the beginning of a void nonterminal symbol whose 573right-hand side is void as well. 574 575[call [arg objectName] [method si:reduce_symbol_end] [arg symbol]] 576 577This method combines [example { 578i_value_clear/reduce $symbol 579i_symbol_save $symbol 580i_error_nonterminal $symbol 581i_ast_pop_rewind 582i_loc_pop_discard 583i:ok_ast_value_push 584}] 585 586Parsers use it at the end of a non-terminal symbol generating an AST 587using the AST generated by the right-hand side as child. 588 589[call [arg objectName] [method si:void_leaf_symbol_end] [arg symbol]] 590 591This method combines [example { 592i_value_clear/leaf $symbol 593i_symbol_save $symbol 594i_error_nonterminal $symbol 595i_loc_pop_discard 596i:ok_ast_value_push 597}] 598 599Parsers use it at the end of a non-terminal symbol generating an AST 600whose right-hand side is void. 601 602[call [arg objectName] [method si:value_leaf_symbol_end] [arg symbol]] 603 604This method combines [example { 605i_value_clear/leaf $symbol 606i_symbol_save $symbol 607i_error_nonterminal $symbol 608i_loc_pop_discard 609i_ast_pop_rewind 610i:ok_ast_value_push 611}] 612 613Parsers use it at the end of a non-terminal symbol generating an AST 614discarding the AST generated by the right-hand side. 615 616[call [arg objectName] [method si:value_clear_symbol_end] [arg symbol]] 617 618This method combines [example { 619i_value_clear 620i_symbol_save $symbol 621i_error_nonterminal $symbol 622i_loc_pop_discard 623i_ast_pop_rewind 624}] 625 626Parsers use it at the end of a void non-terminal symbol, discarding 627the AST generated by the right-hand side. 628 629[call [arg objectName] [method si:void_clear_symbol_end] [arg symbol]] 630 631This method combines [example { 632i_value_clear 633i_symbol_save $symbol 634i_error_nonterminal $symbol 635i_loc_pop_discard 636}] 637 638Parsers use it at the end of a void non-terminal symbol with a void 639right-hand side. 640 641[call [arg objectName] [method si:next_char] [arg tok]] 642[call [arg objectName] [method si:next_range] [arg toks] [arg toke]] 643[call [arg objectName] [method si:next_alnum]] 644[call [arg objectName] [method si:next_alpha]] 645[call [arg objectName] [method si:next_ascii]] 646[call [arg objectName] [method si:next_ddigit]] 647[call [arg objectName] [method si:next_digit]] 648[call [arg objectName] [method si:next_graph]] 649[call [arg objectName] [method si:next_lower]] 650[call [arg objectName] [method si:next_print]] 651[call [arg objectName] [method si:next_punct]] 652[call [arg objectName] [method si:next_space]] 653[call [arg objectName] [method si:next_upper]] 654[call [arg objectName] [method si:next_wordchar]] 655[call [arg objectName] [method si:next_xdigit]] 656 657These methods all combine [example { 658i_input_next $msg 659i:fail_return 660}] 661 662with the appropriate [cmd i_test_xxx] instruction. Parsers use them for 663handling atomic expressions. 664 665[list_end] 666[para] 667 668[include include/feedback.inc] 669[manpage_end] 670