Version 3.10
---------------------
01/31/17: beazley
    Changed grammar signature computation to not involve hashing
    functions. Parts are just combined into a big string.

10/07/16: beazley
    Fixed Issue #101: Incorrect shift-reduce conflict resolution with
    precedence specifier.

    PLY was incorrectly resolving shift-reduce conflicts in certain
    cases. For example, in the example/calc/calc.py example, you
    could trigger it doing this:

        calc > -3 - 4
        1                (correct answer should be -7)
        calc >

    Issue and suggested patch contributed by https://github.com/RomaVis

Version 3.9
---------------------
08/30/16: beazley
    Exposed the parser state number as the parser.state attribute
    in productions and error functions. For example:

        def p_somerule(p):
            '''
            rule : A B C
            '''
            print('State:', p.parser.state)

    May address issue #65 (publish current state in error callback).

08/30/16: beazley
    Fixed Issue #88. Python3 compatibility with ply/cpp.

08/30/16: beazley
    Fixed Issue #93. Ply can crash if SyntaxError is raised inside
    a production. Not actually sure if the original implementation
    worked as documented at all. Yacc has been modified to follow
    the spec as outlined in the CHANGES noted for 11/27/07 below.

08/30/16: beazley
    Fixed Issue #97. Failure with code validation when the original
    source files aren't present. Validation step now ignores
    the missing file.

08/30/16: beazley
    Minor fixes to version numbers.

Version 3.8
---------------------
10/02/15: beazley
    Fixed issues related to Python 3.5. Patch contributed by Barry Warsaw.

Version 3.7
---------------------
08/25/15: beazley
    Fixed problems when reading table files from pickled data.

05/07/15: beazley
    Fixed regression in handling of table modules if specified as module
    objects. See https://github.com/dabeaz/ply/issues/63

Version 3.6
---------------------
04/25/15: beazley
    If PLY is unable to create the 'parser.out' or 'parsetab.py' files due
    to permission issues, it now just issues a warning message and
    continues to operate. This could happen if a module using PLY
    is installed in a funny way where tables have to be regenerated, but
    for whatever reason, the user doesn't have write permission on
    the directory where PLY wants to put them.

04/24/15: beazley
    Fixed some issues related to use of packages and table file
    modules. Just to emphasize, PLY now generates its special
    files such as 'parsetab.py' and 'lextab.py' in the *SAME*
    directory as the source file that uses lex() and yacc().

    If for some reason, you want to change the name of the table
    module, use the tabmodule and lextab options:

        lexer = lex.lex(lextab='spamlextab')
        parser = yacc.yacc(tabmodule='spamparsetab')

    If you specify a simple name as shown, the module will still be
    created in the same directory as the file invoking lex() or yacc().
    If you want the table files to be placed into a different package,
    then give a fully qualified package name. For example:

        lexer = lex.lex(lextab='pkgname.files.lextab')
        parser = yacc.yacc(tabmodule='pkgname.files.parsetab')

    For this to work, 'pkgname.files' must already exist as a valid
    Python package (i.e., the directories must already exist and be
    set up with the proper __init__.py files, etc.).

Version 3.5
---------------------
04/21/15: beazley
    Added support for defaulted_states in the parser. A
    defaulted_state is a state where the only legal action is a
    reduction of a single grammar rule across all valid input
    tokens. For such states, the rule is reduced and the
    reading of the next lookahead token is delayed until it is
    actually needed at a later point in time.
    This delay in consuming the next lookahead token is a
    potentially important feature in advanced parsing
    applications that require tight interaction between the
    lexer and the parser. For example, a grammar rule can
    modify the lexer state upon reduction and have such changes
    take effect before the next input token is read.

    *** POTENTIAL INCOMPATIBILITY ***
    One potential danger of defaulted_states is that syntax
    errors might be deferred to a later point of processing
    than where they were detected in past versions of PLY.
    Thus, it's possible that your error handling could change
    slightly on the same inputs. defaulted_states do not change
    the overall parsing of the input (i.e., the same grammar is
    accepted).

    If for some reason, you need to disable defaulted states,
    you can do this:

        parser = yacc.yacc()
        parser.defaulted_states = {}

04/21/15: beazley
    Fixed debug logging in the parser. It wasn't properly reporting goto
    states on grammar rule reductions.

04/20/15: beazley
    Added the ability for actions to be defined for character literals
    (Issue #32). For example:

        literals = [ '{', '}' ]

        def t_lbrace(t):
            r'\{'
            # Some action
            t.type = '{'
            return t

        def t_rbrace(t):
            r'\}'
            # Some action
            t.type = '}'
            return t

04/19/15: beazley
    Import of the 'parsetab.py' file is now constrained to only consider the
    directory specified by the outputdir argument to yacc(). If not supplied,
    the import will only consider the directory in which the grammar is
    defined. This should greatly reduce problems with the wrong parsetab.py
    file being imported by mistake (for example, if it's found somewhere
    else on the path by accident).

    *** POTENTIAL INCOMPATIBILITY *** It's possible that this might break some
    packaging/deployment setup if PLY was instructed to place its parsetab.py
    in a different location.
    You'll have to specify a proper outputdir= argument
    to yacc() to fix this if needed.

04/19/15: beazley
    Changed default output directory to be the same as that in which the
    yacc grammar is defined. If your grammar is in a file 'calc.py',
    then the parsetab.py and parser.out files should be generated in the
    same directory as that file. The destination directory can be changed
    using the outputdir= argument to yacc().

04/19/15: beazley
    Changed the parsetab.py file signature slightly so that the parsetab won't
    regenerate if created on a different major version of Python (i.e., a
    parsetab created on Python 2 will work with Python 3).

04/16/15: beazley
    Fixed Issue #44: call_errorfunc() should return the result of errorfunc().

04/16/15: beazley
    Support for versions of Python <2.7 is officially dropped. PLY may work,
    but the unit tests require Python 2.7 or newer.

04/16/15: beazley
    Fixed bug related to calling yacc(start=...). PLY wasn't regenerating the
    table file correctly for this case.

04/16/15: beazley
    Added skipped tests for PyPy and Java. Related to use of Python's -O
    option.

05/29/13: beazley
    Added filter to make unit tests pass under 'python -3'.
    Reported by Neil Muller.

05/29/13: beazley
    Fixed CPP_INTEGER regex in ply/cpp.py (Issue 21).
    Reported by @vbraun.

05/29/13: beazley
    Fixed yacc validation bugs when from __future__ import unicode_literals
    is being used. Reported by Kenn Knowles.

05/29/13: beazley
    Added support for Travis-CI. Contributed by Kenn Knowles.

05/29/13: beazley
    Added a .gitignore file. Suggested by Kenn Knowles.

05/29/13: beazley
    Fixed validation problems for source files that include a
    different source code encoding specifier. Fix relies on
    the inspect module. Should work on Python 2.6 and newer.
    Not sure about older versions of Python.
    Contributed by Michael Droettboom.

05/21/13: beazley
    Fixed unit tests for yacc to eliminate random failures due to dict hash
    value randomization in Python 3.3.
    Reported by Arfrever.

10/15/12: beazley
    Fixed comment whitespace processing bugs in ply/cpp.py.
    Reported by Alexei Pososin.

10/15/12: beazley
    Fixed token names in ply/ctokens.py to match rule names.
    Reported by Alexei Pososin.

04/26/12: beazley
    Changes to functions available in panic mode error recovery. In previous
    versions of PLY, the following global functions were available for use
    in the p_error() rule:

        yacc.errok()       # Reset error state
        yacc.token()       # Get the next token
        yacc.restart()     # Reset the parsing stack

    The use of global variables was problematic for code involving multiple
    parsers and frankly was a poor design overall. These functions have been
    moved to methods of the parser instance created by the yacc() function.
    You should write code like this:

        def p_error(p):
            ...
            parser.errok()

        parser = yacc.yacc()

    *** POTENTIAL INCOMPATIBILITY *** The original global functions now issue
    a DeprecationWarning.

04/19/12: beazley
    Fixed some problems with line and position tracking and the use of error
    symbols. If you have a grammar rule involving an error rule like this:

        def p_assignment_bad(p):
            '''assignment : location EQUALS error SEMI'''
            ...

    You can now do line and position tracking on the error token.
    For example:

        def p_assignment_bad(p):
            '''assignment : location EQUALS error SEMI'''
            start_line = p.lineno(3)
            start_pos = p.lexpos(3)

    If the tracking=True option is supplied to parse(), you can additionally
    get spans:

        def p_assignment_bad(p):
            '''assignment : location EQUALS error SEMI'''
            start_line, end_line = p.linespan(3)
            start_pos, end_pos = p.lexspan(3)

    Note that error handling is still a hairy thing in PLY. This won't work
    unless your lexer is providing accurate information. Please report bugs.
    Suggested by a bug reported by Davis Herring.

04/18/12: beazley
    Change to doc string handling in lex module. Regex patterns are now first
    pulled from a function's .regex attribute. If that doesn't exist, then
    .doc is checked as a fallback. The @TOKEN decorator now sets the .regex
    attribute of a function instead of its doc string.
    Change suggested by Kristoffer Ellersgaard Koch.

04/18/12: beazley
    Fixed issue #1: Fixed _tabversion. It should use __tabversion__ instead
    of __version__.
    Reported by Daniele Tricoli.

04/18/12: beazley
    Fixed issue #8: Literals empty list causes IndexError.
    Reported by Walter Nissen.

04/18/12: beazley
    Fixed issue #12: Typo in code snippet in documentation.
    Reported by florianschanda.

04/18/12: beazley
    Fixed issue #10: Correctly escape t_XOREQUAL pattern.
    Reported by Andy Kittner.

Version 3.4
---------------------
02/17/11: beazley
    Minor patch to make cpp.py compatible with Python 3. Note: This
    is an experimental file not currently used by the rest of PLY.

02/17/11: beazley
    Fixed setup.py trove classifiers to properly list PLY as
    Python 3 compatible.

01/02/11: beazley
    Migration of repository to github.

Version 3.3
-----------------------------
08/25/09: beazley
    Fixed issue 15 related to the set_lineno() method in yacc.
    Reported by mdsherry.

08/25/09: beazley
    Fixed a bug related to regular expression compilation flags not being
    properly stored in lextab.py files created by the lexer when running
    in optimize mode. Reported by Bruce Frederiksen.

Version 3.2
-----------------------------
03/24/09: beazley
    Added an extra check to not print duplicated warning messages
    about reduce/reduce conflicts.

03/24/09: beazley
    Switched PLY over to a BSD license.

03/23/09: beazley
    Performance optimization. Discovered a few places to make
    speedups in LR table generation.

03/23/09: beazley
    New warning message. PLY now warns about rules never
    reduced due to reduce/reduce conflicts. Suggested by
    Bruce Frederiksen.

03/23/09: beazley
    Some clean-up of warning messages related to reduce/reduce errors.

03/23/09: beazley
    Added a new picklefile option to yacc() to write the parsing
    tables to a filename using the pickle module. Here is how
    it works:

        yacc(picklefile="parsetab.p")

    This option can be used if the normal parsetab.py file is
    extremely large. For example, on jython, it is impossible
    to read parsing tables if the parsetab.py exceeds a certain
    threshold.

    The filename supplied to the picklefile option is opened
    relative to the current working directory of the Python
    interpreter. If you need to refer to the file elsewhere,
    you will need to supply an absolute or relative path.

    For maximum portability, the pickle file is written
    using protocol 0.

03/13/09: beazley
    Fixed a bug in parser.out generation where the rule numbers
    were off by one.

03/13/09: beazley
    Fixed a string formatting bug with one of the error messages.
    Reported by Richard Reitmeyer.

Version 3.1
-----------------------------
02/28/09: beazley
    Fixed broken start argument to yacc().
    PLY-3.0 broke this feature by accident.

02/28/09: beazley
    Fixed debugging output. yacc() no longer reports shift/reduce
    or reduce/reduce conflicts if debugging is turned off. This
    restores behavior similar to PLY-2.5. Reported by Andrew Waters.

Version 3.0
-----------------------------
02/03/09: beazley
    Fixed missing lexer attribute on certain tokens when
    invoking the parser p_error() function. Reported by
    Bart Whiteley.

02/02/09: beazley
    The lex() command now does all error-reporting and diagnostics
    using the logging module interface. Pass in a Logger object
    using the errorlog parameter to specify a different logger.

02/02/09: beazley
    Refactored ply.lex to use a more object-oriented and organized
    approach to collecting lexer information.

02/01/09: beazley
    Removed the nowarn option from lex(). All output is controlled
    by passing in a logger object. Just pass in a logger with a high
    level setting to suppress output. This argument was never
    documented to begin with so hopefully no one was relying upon it.

02/01/09: beazley
    Discovered and removed a dead if-statement in the lexer. This
    resulted in a 6-7% speedup in lexing when I tested it.

01/13/09: beazley
    Minor change to the procedure for signalling a syntax error in a
    production rule. A normal SyntaxError exception should be raised
    instead of yacc.SyntaxError.

01/13/09: beazley
    Added a new method p.set_lineno(n, lineno) that can be used to set the
    line number of symbol n in grammar rules. This simplifies manual
    tracking of line numbers.

01/11/09: beazley
    Vastly improved debugging support for yacc.parse(). Instead of passing
    debug as an integer, you can supply a Logging object (see the logging
    module). Messages will be generated at the ERROR, INFO, and DEBUG
    logging levels, each level providing progressively more information.
    The debugging trace also shows states, grammar rules, values passed
    into grammar rules, and the result of each reduction.

01/09/09: beazley
    The yacc() command now does all error-reporting and diagnostics using
    the interface of the logging module. Use the errorlog parameter to
    specify a logging object for error messages. Use the debuglog parameter
    to specify a logging object for the 'parser.out' output.

01/09/09: beazley
    *HUGE* refactoring of the ply.yacc() implementation. The high-level
    user interface is backwards compatible, but the internals are completely
    reorganized into classes. No more global variables. The internals
    are also more extensible. For example, you can use the classes to
    construct a LALR(1) parser in an entirely different manner than
    what is currently the case. Documentation is forthcoming.

01/07/09: beazley
    Various cleanup and refactoring of yacc internals.

01/06/09: beazley
    Fixed a bug with precedence assignment. yacc was assigning the precedence
    of each rule based on the left-most token, when in fact, it should have
    been using the right-most token. Reported by Bruce Frederiksen.

11/27/08: beazley
    Numerous changes to support Python 3.0 including removal of deprecated
    statements (e.g., has_key) and the addition of compatibility code
    to emulate features from Python 2 that have been removed, but which
    are needed. Fixed the unit testing suite to work with Python 3.0.
    The code should be backwards compatible with Python 2.

11/26/08: beazley
    Loosened the rules on what kind of objects can be passed in as the
    "module" parameter to lex() and yacc(). Previously, you could only use
    a module or an instance. Now, PLY just uses dir() to get a list of
    symbols on whatever the object is without regard for its type.

11/26/08: beazley
    Changed all except: statements to be compatible with Python2.x/3.x
    syntax.
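    As an illustration of the syntax change described in the entry above
    (this sketch is not PLY code; the function and names are hypothetical):

```python
# The old Python 2-only form, no longer used in PLY, was:
#
#     except ValueError, e:        # comma syntax: a SyntaxError on Python 3
#
# The form that parses on both Python 2.x and 3.x uses 'as':

def to_int(text, default=None):
    try:
        return int(text)
    except ValueError as e:          # 'as' syntax works on Python 2.6+ and 3.x
        return default
```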
11/26/08: beazley
    Changed all raise Exception, value statements to raise Exception(value)
    for forward compatibility.

11/26/08: beazley
    Removed all print statements from lex and yacc, using sys.stdout and
    sys.stderr directly. Preparation for Python 3.0 support.

11/04/08: beazley
    Fixed a bug with referring to symbols on the parsing stack using
    negative indices.

05/29/08: beazley
    Completely revamped the testing system to use the unittest module for
    everything. Added additional tests to cover new errors/warnings.

Version 2.5
-----------------------------
05/28/08: beazley
    Fixed a bug with writing lex-tables in optimized mode and start states.
    Reported by Kevin Henry.

Version 2.4
-----------------------------
05/04/08: beazley
    A version number is now embedded in the table file signature so that
    yacc can more gracefully accommodate changes to the output format
    in the future.

05/04/08: beazley
    Removed undocumented .pushback() method on grammar productions. I'm
    not sure this ever worked and can't recall ever using it. Might have
    been an abandoned idea that never really got fleshed out. This
    feature was never described or tested so removing it is hopefully
    harmless.

05/04/08: beazley
    Added extra error checking to yacc() to detect precedence rules defined
    for undefined terminal symbols. This allows yacc() to detect a potential
    problem that can be really tricky to debug if no warning message or error
    message is generated about it.

05/04/08: beazley
    lex() now has an outputdir option that can specify the output directory
    for tables when running in optimize mode. For example:

        lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")

    The behavior of specifying a table module and output directory is now
    more aligned with the behavior of yacc().
05/04/08: beazley
    [Issue 9]
    Fixed filename bug when specifying the modulename in lex() and yacc().
    If you specified options such as the following:

        parser = yacc.yacc(tabmodule="foo.bar.parsetab", outputdir="foo/bar")

    yacc would create a file "foo.bar.parsetab.py" in the given directory.
    Now, it simply generates a file "parsetab.py" in that directory.
    Bug reported by cptbinho.

05/04/08: beazley
    Slight modification to lex() and yacc() to allow their table files
    to be loaded from a previously loaded module. This might make
    it easier to load the parsing tables from a complicated package
    structure. For example:

        import foo.bar.spam.parsetab as parsetab
        parser = yacc.yacc(tabmodule=parsetab)

    Note: lex and yacc will never regenerate the table file if used
    in this form---you will get a warning message instead.
    This idea suggested by Brian Clapper.

04/28/08: beazley
    Fixed a bug with p_error() functions not being picked up correctly
    when running in yacc(optimize=1) mode. Patch contributed by
    Bart Whiteley.

02/28/08: beazley
    Fixed a bug with 'nonassoc' precedence rules. Basically the
    nonassoc precedence was being ignored and not producing the correct
    run-time behavior in the parser.

02/16/08: beazley
    Slight relaxation of what the input() method to a lexer will
    accept as a string. Instead of testing the input to see
    if the input is a string or unicode string, it checks to see
    if the input object looks like it contains string data.
    This change makes it possible to pass string-like objects
    in as input. For example, the object returned by mmap:

        import mmap, os
        data = mmap.mmap(os.open(filename, os.O_RDONLY),
                         os.path.getsize(filename),
                         access=mmap.ACCESS_READ)
        lexer.input(data)

11/29/07: beazley
    Modification of ply.lex to allow token functions to be aliased.
    This is subtle, but it makes it easier to create libraries and
    to reuse token specifications. For example, suppose you defined
    a function like this:

        def number(t):
            r'\d+'
            t.value = int(t.value)
            return t

    This change would allow you to define a token rule as follows:

        t_NUMBER = number

    In this case, the token type will be set to 'NUMBER' and use
    the associated number() function to process tokens.

11/28/07: beazley
    Slight modification to lex and yacc to grab symbols from both
    the local and global dictionaries of the caller. This
    modification allows lexers and parsers to be defined using
    inner functions and closures.

11/28/07: beazley
    Performance optimization: The lexer.lexmatch and t.lexer
    attributes are no longer set for lexer tokens that are not
    defined by functions. The only normal use of these attributes
    would be in lexer rules that need to perform some kind of
    special processing. Thus, it doesn't make any sense to set
    them on every token.

    *** POTENTIAL INCOMPATIBILITY *** This might break code
    that is mucking around with internal lexer state in some
    sort of magical way.

11/27/07: beazley
    Added the ability to put the parser into error-handling mode
    from within a normal production. To do this, simply raise
    a yacc.SyntaxError exception like this:

        def p_some_production(p):
            'some_production : prod1 prod2'
            ...
            raise yacc.SyntaxError      # Signal an error

    A number of things happen after this occurs:

    - The last symbol shifted onto the symbol stack is discarded
      and parser state backed up to what it was before the
      rule reduction.

    - The current lookahead symbol is saved and replaced by
      the 'error' symbol.
    - The parser enters error recovery mode where it tries
      to either reduce the 'error' rule or it starts
      discarding items off of the stack until the parser
      resets.

    When an error is manually set, the parser does *not* call
    the p_error() function (if any is defined).
    *** NEW FEATURE *** Suggested on the mailing list.

11/27/07: beazley
    Fixed structure bug in examples/ansic. Reported by Dion Blazakis.

11/27/07: beazley
    Fixed a bug in the lexer related to start conditions and ignored
    token rules. If a rule was defined that changed state, but
    returned no token, the lexer could be left in an inconsistent
    state. Reported by

11/27/07: beazley
    Modified setup.py to support Python Eggs. Patch contributed by
    Simon Cross.

11/09/07: beazley
    Fixed a bug in error handling in yacc. If a syntax error occurred and
    the parser rolled the entire parse stack back, the parser would be left
    in an inconsistent state that would cause it to trigger incorrect
    actions on subsequent input. Reported by Ton Biegstraaten, Justin King,
    and others.

11/09/07: beazley
    Fixed a bug when passing empty input strings to yacc.parse(). This
    would result in an error message about "No input given". Reported
    by Andrew Dalke.

Version 2.3
-----------------------------
02/20/07: beazley
    Fixed a bug with character literals if the literal '.' appeared as the
    last symbol of a grammar rule. Reported by Ales Smrcka.

02/19/07: beazley
    Warning messages are now redirected to stderr instead of being printed
    to standard output.

02/19/07: beazley
    Added a warning message to lex.py if it detects a literal backslash
    character inside the t_ignore declaration. This is to help catch
    problems that might occur if someone accidentally defines t_ignore
    as a Python raw string.
    For example:

        t_ignore = r' \t'

    The idea for this is from an email I received from David Cimimi who
    reported bizarre behavior in lexing as a result of defining t_ignore
    as a raw string by accident.

02/18/07: beazley
    Performance improvements. Made some changes to the internal
    table organization and LR parser to improve parsing performance.

02/18/07: beazley
    Automatic tracking of line number and position information must now be
    enabled by a special flag to parse(). For example:

        yacc.parse(data, tracking=True)

    In many applications, it's just not that important to have the
    parser automatically track all line numbers. By making this an
    optional feature, it allows the parser to run significantly faster
    (more than a 20% speed increase in many cases). Note: positional
    information is always available for raw tokens---this change only
    applies to positional information associated with nonterminal
    grammar symbols.
    *** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
    Yacc no longer supports extended slices of grammar productions.
    However, it does support regular slices. For example:

        def p_foo(p):
            '''foo : a b c d e'''
            p[0] = p[1:3]

    This change is a performance improvement to the parser---it streamlines
    normal access to the grammar values since slices are now handled in
    a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
    Fixed a bug in the handling of token names when combined with
    start conditions. Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
    Added lexpos() and lexspan() methods to grammar symbols. These
    mirror the same functionality of lineno() and linespan().
    For example:

        def p_expr(p):
            'expr : expr PLUS expr'
            p.lexpos(1)    # Lexing position of left-hand expression
            p.lexpos(2)    # Lexing position of PLUS
            start, end = p.lexspan(3)   # Lexing range of right-hand expression

11/01/06: beazley
    Minor change to error handling. The recommended way to skip characters
    in the input is to use t.lexer.skip() as shown here:

        def t_error(t):
            print "Illegal character '%s'" % t.value[0]
            t.lexer.skip(1)

    The old approach of just using t.skip(1) will still work, but won't
    be documented.

10/31/06: beazley
    Discarded tokens can now be specified as simple strings instead of
    functions. To do this, simply include the text "ignore_" in the
    token declaration. For example:

        t_ignore_cppcomment = r'//.*'

    Previously, this had to be done with a function. For example:

        def t_ignore_cppcomment(t):
            r'//.*'
            pass

    If start conditions/states are being used, state names should appear
    before the "ignore_" text.

10/19/06: beazley
    The Lex module now provides support for flex-style start conditions
    as described at
    http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
    Please refer to this document to understand this change note. Refer to
    the PLY documentation for a PLY-specific explanation of how this works.

    To use start conditions, you first need to declare a set of states in
    your lexer file:

        states = (
            ('foo', 'exclusive'),
            ('bar', 'inclusive')
        )

    This serves the same role as the %s and %x specifiers in flex.

    Once a state has been declared, tokens for that state can be
    declared by defining rules of the form t_state_TOK.
    For example:

        t_PLUS = r'\+'          # Rule defined in INITIAL state
        t_foo_NUM = r'\d+'      # Rule defined in foo state
        t_bar_NUM = r'\d+'      # Rule defined in bar state

        t_foo_bar_NUM = r'\d+'  # Rule defined in both foo and bar
        t_ANY_NUM = r'\d+'      # Rule defined in all states

    In addition to defining tokens for each state, the t_ignore and t_error
    specifications can be customized for specific states. For example:

        t_foo_ignore = " "      # Ignored characters for foo state

        def t_bar_error(t):
            # Handle errors in bar state
            ...

    Within token rules, the following methods can be used to change states:

        def t_TOKNAME(t):
            t.lexer.begin('foo')        # Begin state 'foo'
            t.lexer.push_state('foo')   # Begin state 'foo', push old state
                                        # onto a stack
            t.lexer.pop_state()         # Restore previous state
            t.lexer.current_state()     # Returns name of current state

    These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
    yy_top_state() functions in flex.

    Start states can be used as one way to write sub-lexers.
    For example, the lexer or parser might instruct the lexer to start
    generating a different set of tokens depending on the context.

    example/yply/ylex.py shows the use of start states to grab C/C++
    code fragments out of traditional yacc specification files.

    *** NEW FEATURE *** Suggested by Daniel Larraz with whom I also
    discussed various aspects of the design.

10/19/06: beazley
    Minor change to the way in which yacc.py was reporting shift/reduce
    conflicts. Although the underlying LALR(1) algorithm was correct,
    PLY was under-reporting the number of conflicts compared to yacc/bison
    when precedence rules were in effect. This change should make PLY
    report the same number of conflicts as yacc.

10/19/06: beazley
    Modified yacc so that grammar rules could also include the '-'
    character.
    For example:

        def p_expr_list(p):
            'expression-list : expression-list expression'

    Suggested by Oldrich Jedlicka.

10/18/06: beazley
    Attribute lexer.lexmatch added so that token rules can access the re
    match object that was generated. For example:

        def t_FOO(t):
            r'some regex'
            m = t.lexer.lexmatch
            # Do something with m

    This may be useful if you want to access named groups specified within
    the regex for a specific token. Suggested by Oldrich Jedlicka.

10/16/06: beazley
    Changed the error message that results if an illegal character
    is encountered and no default error function is defined in lex.
    The exception is now more informative about the actual cause of
    the error.

Version 2.1
------------------------------
10/02/06: beazley
    The last Lexer object built by lex() can be found in lex.lexer.
    The last Parser object built by yacc() can be found in yacc.parser.

10/02/06: beazley
    New example added: examples/yply

    This example uses PLY to convert Unix-yacc specification files to
    PLY programs with the same grammar. This may be useful if you
    want to convert a grammar from bison/yacc to use with PLY.

10/02/06: beazley
    Added support for a start symbol to be specified in the yacc
    input file itself. Just do this:

        start = 'name'

    where 'name' matches some grammar rule. For example:

        def p_name(p):
            'name : A B C'
            ...

    This mirrors the functionality of the yacc %start specifier.

09/30/06: beazley
    Some new examples added:

        examples/GardenSnake : A simple indentation-based language similar
                               to Python. Shows how you might handle
                               whitespace. Contributed by Andrew Dalke.

        examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
                               Contributed by Dave against his better
                               judgement.
09/28/06: beazley
    Minor patch to allow named groups to be used in lex regular
    expression rules.  For example:

        t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''

    Patch submitted by Adam Ring.

09/28/06: beazley
    LALR(1) is now the default parsing method.  To use SLR, use
    yacc.yacc(method="SLR").  Note: there is no performance impact
    on parsing when using LALR(1) instead of SLR.  However, constructing
    the parsing tables will take a little longer.

09/26/06: beazley
    Change to line number tracking.  To modify line numbers, modify
    the line number of the lexer itself.  For example:

        def t_NEWLINE(t):
            r'\n'
            t.lexer.lineno += 1

    This modification is both a cleanup and a performance optimization.
    In past versions, lex was monitoring every token for changes in
    the line number.  This extra processing is unnecessary for the vast
    majority of tokens.  Thus, this new approach cleans it up a bit.

    *** POTENTIAL INCOMPATIBILITY ***
    You will need to change code in your lexer that updates the line
    number.  For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"

09/26/06: beazley
    Added the lexing position to tokens as an attribute lexpos.  This
    is the raw index into the input text at which a token appears.
    This information can be used to compute column numbers and other
    details (e.g., scan backwards from lexpos to the first newline
    to get a column position).

09/25/06: beazley
    Changed the name of the __copy__() method on the Lexer class
    to clone().  This is used to clone a Lexer object (e.g., if
    you're running different lexers at the same time).

09/21/06: beazley
    Limitations related to the use of the re module have been eliminated.
    Several users reported problems with regular expressions containing
    more than 100 named groups.
    To solve this, lex.py is now capable
    of automatically splitting its master regular expression into
    smaller expressions as needed.  This should, in theory, make it
    possible to specify an arbitrarily large number of tokens.

09/21/06: beazley
    Improved error checking in lex.py.  Rules that match the empty string
    are now rejected (otherwise they cause the lexer to enter an infinite
    loop).  An extra check for rules containing '#' has also been added.
    Since lex compiles regular expressions in verbose mode, '#' is
    interpreted as a regex comment, so it is critical to use '\#' instead.

09/18/06: beazley
    Added a @TOKEN decorator function to lex.py that can be used to
    define token rules where the documentation string might be computed
    in some way.

        digit      = r'([0-9])'
        nondigit   = r'([_A-Za-z])'
        identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

        from ply.lex import TOKEN

        @TOKEN(identifier)
        def t_ID(t):
            # Do whatever
            ...

    The @TOKEN decorator merely sets the documentation string of the
    associated token function as needed for lex to work.

    Note: An alternative solution is the following:

        def t_ID(t):
            # Do whatever
            ...

        t_ID.__doc__ = identifier

    Note: Decorators require the use of Python 2.4 or later.  If compatibility
    with older versions is needed, use the latter solution.

    The need for this feature was suggested by Cem Karan.

09/14/06: beazley
    Support for single-character literal tokens has been added to yacc.
    These literals must be enclosed in quotes.  For example:

        def p_expr_plus(p):
            "expr : expr '+' expr"
            ...

        def p_expr_minus(p):
            'expr : expr "-" expr'
            ...

    In addition to this, it is necessary to tell the lexer module about
    literal characters.  This is done by defining the variable 'literals'
    as a list of characters.
    This should be defined in the module that
    invokes the lex.lex() function.  For example:

        literals = ['+','-','*','/','(',')','=']

    or simply

        literals = '+-*/()='

    It is important to note that literals can only be a single character.
    When the lexer fails to match a token using its normal regular expression
    rules, it will check the current character against the literal list.
    If found, it will be returned with a token type set to match the literal
    character.  Otherwise, an illegal character will be signalled.

09/14/06: beazley
    Modified PLY to install itself as a proper Python package called 'ply'.
    This will make it a little more friendly to other modules.  This
    changes the usage of PLY only slightly.  Just do this to import the
    modules:

        import ply.lex as lex
        import ply.yacc as yacc

    Alternatively, you can do this:

        from ply import *

    which imports both the lex and yacc modules.
    Change suggested by Lee June.

09/13/06: beazley
    Changed the handling of negative indices when used in production rules.
    A negative production index now accesses already parsed symbols on the
    parsing stack.  For example:

        def p_foo(p):
            "foo : A B C D"
            print p[1]       # Value of 'A' symbol
            print p[2]       # Value of 'B' symbol
            print p[-1]      # Value of whatever symbol appears before A
                             # on the parsing stack.

            p[0] = some_val  # Sets the value of the 'foo' grammar symbol

    This behavior makes it easier to work with embedded actions within the
    parsing rules.  For example, in C-yacc, it is possible to write code like
    this:

        bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; }

    In this example, the printf() code executes immediately after A has been
    parsed.  Within the embedded action code, $1 refers to the A symbol on
    the stack.
    To perform the equivalent action in PLY, you need to write a pair
    of rules like this:

        def p_bar(p):
            "bar : A seen_A B"
            do_stuff

        def p_seen_A(p):
            "seen_A :"
            print "seen an A =", p[-1]

    The second rule "seen_A" is merely an empty production which should be
    reduced as soon as A is parsed in the "bar" rule above.  The
    negative index p[-1] is used to access whatever symbol appeared
    before the seen_A symbol.

    This feature also makes it possible to support inherited attributes.
    For example:

        def p_decl(p):
            "decl : scope name"

        def p_scope(p):
            """scope : GLOBAL
                     | LOCAL"""
            p[0] = p[1]

        def p_name(p):
            "name : ID"
            if p[-1] == "GLOBAL":
                # ...
            elif p[-1] == "LOCAL":
                # ...

    In this case, the name rule is inheriting an attribute from the
    scope declaration that precedes it.

    *** POTENTIAL INCOMPATIBILITY ***
    If you are currently using negative indices within existing grammar rules,
    your code will break.  This should be extremely rare, if not non-existent.
    The argument to various grammar rules is not usually processed in the
    same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
    Major cleanup and refactoring of the LR table generation code.  Both SLR
    and LALR(1) table generation are now performed by the same code base with
    only minor extensions for extra LALR(1) processing.

09/07/06: beazley
    Completely reimplemented the entire LALR(1) parsing engine to use the
    DeRemer and Pennello algorithm for calculating lookahead sets.  This
    significantly improves the performance of generating LALR(1) tables
    and has the added feature of actually working correctly!
    If you
    experienced weird behavior with LALR(1) in prior releases, this should
    hopefully resolve all of those problems.  Many thanks to
    Andrew Waters and Markus Schoepflin for submitting bug reports
    and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
    Fixed a problem related to the handling of default actions in LALR(1)
    parsing.  If you experienced subtle and/or bizarre behavior when trying
    to use the LALR(1) engine, this may correct those problems.  Patch
    contributed by Russ Cox.  Note: This patch has been superseded by
    revisions for LALR(1) parsing in Ply-2.0.

08/02/06: beazley
    Added support for slicing of productions in yacc.
    Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
    Fixed an infinite recursion problem in the ReduceToTerminals() function
    that would sometimes come up in LALR(1) table generation.  Reported by
    Markus Schoepflin.

03/01/06: beazley
    Added "reflags" argument to lex().  For example:

        lex.lex(reflags=re.UNICODE)

    This can be used to specify optional flags to the re.compile() function
    used inside the lexer.  This may be necessary for special situations such
    as processing Unicode (e.g., if you want escapes like \w and \b to consult
    the Unicode character property database).  The need for this was suggested
    by Andreas Jung.

03/01/06: beazley
    Fixed a bug with an uninitialized variable on repeated instantiations of
    parser objects when the write_tables=0 argument was used.  Reported by
    Michael Brown.

03/01/06: beazley
    Modified lex.py to accept Unicode strings both as the regular expressions
    for tokens and as input.  Hopefully this is the only change needed for
    Unicode support.  Patch contributed by Johan Dahl.
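
    As an aside on the reflags entry above (03/01/06): the flag is handed
    straight to re.compile().  The standalone sketch below (plain re, no
    PLY) shows what re.UNICODE controls; note that in Python 3, str
    patterns are Unicode-aware by default, so the contrast is drawn
    against re.ASCII.

```python
import re

text = 'año'   # contains the non-ASCII word character 'ñ'

# ASCII-only matching: \w+ stops at the first non-ASCII character.
ascii_match = re.match(r'\w+', text, re.ASCII)
print(ascii_match.group())     # -> a

# Unicode matching (what reflags=re.UNICODE requested under Python 2,
# and the default for str patterns in Python 3): \w consults the
# Unicode character property database.
unicode_match = re.match(r'\w+', text, re.UNICODE)
print(unicode_match.group())   # -> año
```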
03/01/06: beazley
    Modified the class-based interface to work with new-style or old-style
    classes.  Patch contributed by Michael Brown (although I tweaked it
    slightly so it would work with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
    Incorporated a patch contributed by Christopher Stawarz to fix an
    extremely devious bug in LALR(1) parser generation.  This patch should
    fix problems numerous people reported with LALR parsing.

05/27/05: beazley
    Fixed a problem with the lex.py copy constructor.  Reported by Dave
    Aitel, Aaron Lav, and Thad Austin.

05/27/05: beazley
    Added an outputdir option to yacc() to control the output directory.
    Contributed by Christopher Stawarz.

05/27/05: beazley
    Added a rununit.py test script to run tests using the Python unittest
    module.  Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
    Major enhancement.  LALR(1) parsing support is now working.
    This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
    and optimized by David Beazley.  To use LALR(1) parsing, do
    the following:

        yacc.yacc(method="LALR")

    Computing LALR(1) parsing tables takes about twice as long as
    the default SLR method.  However, LALR(1) allows you to handle
    more complex grammars.  For example, the ANSI C grammar
    (in example/ansic) has 13 shift-reduce conflicts with SLR, but
    only 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
    Added a __len__ method to parser production lists.  len(p) counts the
    left-hand-side symbol plus all right-hand-side symbols, so it can be
    used in parser rules like this:

        def p_somerule(p):
            """a : B C D
               | E F"""
            if len(p) == 4:
                # Must have been the first rule
            elif len(p) == 3:
                # Must be the second rule

    Suggested by Joshua Gerth and others.
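
    The len(p) dispatch in the 05/20/04 entry above can be exercised
    without PLY by mocking the production object; the MockProduction
    class below is invented purely for illustration (index 0 holds the
    left-hand-side value, so len() is the number of right-hand-side
    symbols plus one).

```python
# A throwaway stand-in for PLY's production object: index 0 is the
# left-hand side, indices 1..n are the right-hand-side symbols, so
# len() is the number of RHS symbols plus one.
class MockProduction:
    def __init__(self, symbols):
        self._symbols = symbols          # [lhs, rhs1, rhs2, ...]

    def __getitem__(self, i):
        return self._symbols[i]

    def __len__(self):
        return len(self._symbols)

def p_somerule(p):
    # "a : B C D" gives len(p) == 4; "a : E F" gives len(p) == 3
    if len(p) == 4:
        return 'matched B C D'
    elif len(p) == 3:
        return 'matched E F'

print(p_somerule(MockProduction(['a', 'B', 'C', 'D'])))   # -> matched B C D
print(p_somerule(MockProduction(['a', 'E', 'F'])))        # -> matched E F
```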
Version 1.4
------------------------------
04/23/04: beazley
    Incorporated a variety of patches contributed by Eric Raymond.
    These include:

    0. Cleans up some comments so they don't wrap on an 80-column display.
    1. Directs compiler errors to stderr where they belong.
    2. Implements and documents automatic line counting when \n is ignored.
    3. Changes the way progress messages are dumped when debugging is on.
       The new format is both less verbose and more informative than
       the old, including shift and reduce actions.

04/23/04: beazley
    Added a Python setup.py file to simplify installation.  Contributed
    by Adam Kerrison.

04/23/04: beazley
    Added patches contributed by Adam Kerrison.

    - Some output is now only shown when debugging is enabled.  This
      means that PLY will be completely silent when not in debugging mode.

    - An optional parameter "write_tables" can be passed to yacc() to
      control whether or not parsing tables are written.  By default,
      it is true, but it can be turned off if you don't want the yacc
      table file.  Note: disabling this will cause yacc() to regenerate
      the parsing table each time.

04/23/04: beazley
    Added patches contributed by David McNab.  This patch adds two
    features:

    - The parser can be supplied as a class instead of a module.
      For an example of this, see the example/classcalc directory.

    - Debugging output can be directed to a filename of the user's
      choice.  Use

          yacc(debugfile="somefile.out")

Version 1.3
------------------------------
12/10/02: jmdyck
    Various minor adjustments to the code that Dave checked in today.
    Updated test/yacc_{inf,unused}.exp to reflect today's changes.

12/10/02: beazley
    Incorporated a variety of minor bug fixes to empty production
    handling and infinite recursion checking.
    Contributed by
    Michael Dyck.

12/10/02: beazley
    Removed a bogus recover() method call in yacc.restart().

Version 1.2
------------------------------
11/27/02: beazley
    Lexer and parser objects are now available as an attribute
    of tokens and slices respectively.  For example:

        def t_NUMBER(t):
            r'\d+'
            print t.lexer

        def p_expr_plus(t):
            'expr : expr PLUS expr'
            print t.lexer
            print t.parser

    This can be used for state management (if needed).

10/31/02: beazley
    Modified yacc.py to work with Python optimize mode.  To make
    this work, you need to use

        yacc.yacc(optimize=1)

    Furthermore, you need to first run Python in normal mode
    to generate the necessary parsetab.py files.  After that,
    you can use python -O or python -OO.

    Note: optimized mode turns off a lot of error checking.
    Only use it when you are sure that your grammar is working.
    Make sure parsetab.py is up to date!

10/30/02: beazley
    Added cloning of Lexer objects.  For example:

        import copy
        l = lex.lex()
        lc = copy.copy(l)

        l.input("Some text")
        lc.input("Some other text")
        ...

    This might be useful if the same "lexer" is meant to
    be used in different contexts---or if multiple lexers
    are running concurrently.

10/30/02: beazley
    Fixed a subtle bug with first set computation and empty productions.
    Patch submitted by Michael Dyck.

10/30/02: beazley
    Fixed error messages to use "filename:line: message" instead
    of "filename:line. message".  This makes error reporting more
    friendly to emacs.  Patch submitted by François Pinard.

10/30/02: beazley
    Improvements to the parser.out file.  Terminals and nonterminals
    are now sorted instead of being printed in random order.
    Patch submitted by François Pinard.
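
    The cloning behavior described in the 10/30/02 entry above can be
    mimicked without PLY: a shallow copy of an object whose per-input
    state lives in instance attributes yields two independently advancing
    scanners.  The MiniLexer class below is invented for illustration
    only; a real PLY lexer should be cloned with the clone() method
    introduced in the 09/25/06 entry.

```python
import copy

# An invented stand-in for a lexer: it holds the input text and a
# scan position as instance attributes.
class MiniLexer:
    def input(self, text):
        self.text = text
        self.pos = 0

    def token(self):
        # Return the next character, or None at end of input.
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        return ch

l = MiniLexer()
l.input("abc")
lc = copy.copy(l)      # the clone gets its own attribute bindings
lc.input("xyz")        # feeding the copy leaves the original alone

print(l.token())       # -> a
print(lc.token())      # -> x
```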
10/30/02: beazley
    Improvements to parser.out file output.  Rules are now printed
    in a way that's easier to understand.  Contributed by Russ Cox.

10/30/02: beazley
    Added 'nonassoc' associativity support.  This can be used
    to disable the chaining of operators like a < b < c.
    To use, simply specify 'nonassoc' in the precedence table:

        precedence = (
            ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
            ('left', 'PLUS', 'MINUS'),
            ('left', 'TIMES', 'DIVIDE'),
            ('right', 'UMINUS'),                      # Unary minus operator
        )

    Patch contributed by Russ Cox.

10/30/02: beazley
    Modified the lexer to provide optional support for Python -O and -OO
    modes.  To make this work, Python *first* needs to be run in
    unoptimized mode.  This reads the lexing information and creates a
    file "lextab.py".  Then, run lex like this:

        # module foo.py
        ...
        lex.lex(optimize=1)

    Once the lextab file has been created, subsequent calls to
    lex.lex() will read data from the lextab file instead of using
    introspection.  In optimized mode (-O, -OO) everything should
    work normally despite the loss of doc strings.

    To change the name of the file 'lextab.py', use the following:

        lex.lex(lextab="footab")

    (this creates a file footab.py)

Version 1.1   October 25, 2001
------------------------------

10/25/01: beazley
    Modified the table generator to produce much more compact data.
    This should greatly reduce the size of the parsetab.py[c] file.
    Caveat: the tables still need to be constructed, so a little more
    work is done in parsetab on import.

10/25/01: beazley
    There may be a possible bug in the cycle detector that reports errors
    about infinite recursion.
    I'm having a little trouble tracking it
    down, but if you get this problem, you can disable the cycle
    detector as follows:

        yacc.yacc(check_recursion=0)

10/25/01: beazley
    Fixed a bug in lex.py that sometimes caused illegal characters to be
    reported incorrectly.  Reported by Sverre Jørgensen.

7/8/01: beazley
    Added a reference to the underlying lexer object when tokens are handled
    by functions.  The lexer is available as the 'lexer' attribute.  This
    was added to provide better lexing support for languages such as Fortran
    where certain types of tokens can't be conveniently expressed as regular
    expressions (and where the tokenizing function may want to perform a
    little backtracking).  Suggested by Pearu Peterson.

6/20/01: beazley
    Modified the yacc() function so that an optional starting symbol can be
    specified.  For example:

        yacc.yacc(start="statement")

    Normally yacc always treats the first production rule as the starting
    symbol.  However, if you are debugging your grammar it may be useful to
    specify an alternative starting symbol.  Idea suggested by Rich Salz.

Version 1.0  June 18, 2001
--------------------------
Initial public offering