% Copyright 2012 Jeffrey Kegler
% This file is part of Marpa::XS. Marpa::XS is free software: you can
% redistribute it and/or modify it under the terms of the GNU Lesser
% General Public License as published by the Free Software Foundation,
% either version 3 of the License, or (at your option) any later version.
%
% Marpa::XS is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
% Lesser General Public License for more details.
%
% You should have received a copy of the GNU Lesser
% General Public License along with Marpa::XS. If not, see
% http://www.gnu.org/licenses/.

\def\li{\item{$\bullet$}}

% Here is TeX material that gets inserted after \input cwebmac
\def\hang{\hangindent 3em\indent\ignorespaces}
\def\pb{$\.|\ldots\.|$} % C brackets (|...|)
\def\v{\char'174} % vertical (|) in typewriter font
\def\dleft{[\![} \def\dright{]\!]} % double brackets
\mathchardef\RA="3221 % right arrow
\mathchardef\BA="3224 % double arrow
\def\({} % ) kludge for alphabetizing certain section names
\def\TeXxstring{\\{\TEX/\_string}}
\def\skipxTeX{\\{skip\_\TEX/}}
\def\copyxTeX{\\{copy\_\TEX/}}

\let\K=\Longleftarrow

\secpagedepth=1

\def\title{Code for Marpa}
\def\topofcontents{\null\vfill
  \centerline{\titlefont Code for Marpa}
  \vfill}
\def\botofcontents{\vfill
\noindent
@i copyright_page_license.w
\bigskip
\leftline{\sc\today\ at \hours} % timestamps the contents page
}
% \datecontentspage

\pageno=\contentspagenumber \advance\pageno by 1
\let\maybe=\iftrue

\def\marpa_sub#1{{\bf #1}: }
\def\libmarpa/{{\tt libmarpa}}
\def\QED/{{\bf QED}}
\def\Theorem/{{\bf Theorem}}
\def\Proof/{{\bf Proof}}
\def\size#1{\v #1\v}
\def\gsize{\v g\v}
\def\wsize{\v w\v}

@q Unreserve the C++ keywords @>
@s asm normal
@s dynamic_cast normal
@s namespace normal
@s reinterpret_cast normal
@s try normal
@s bool normal
@s explicit normal
@s new normal
@s static_cast normal
@s typeid normal
@s catch normal
@s false normal
@s operator normal
@s template normal
@s typename normal
@s class normal
@s friend normal
@s private normal
@s this normal
@s using normal
@s const_cast normal
@s public normal
@s throw normal
@s virtual normal
@s delete normal
@s mutable normal
@s protected normal
@s true normal
@s wchar_t normal
@s and normal
@s bitand normal
@s compl normal
@s not_eq normal
@s or_eq normal
@s xor_eq normal
@s and_eq normal
@s bitor normal
@s not normal
@s or normal
@s xor normal

@s error normal
@s gconstpointer int
@s gpointer int
@s gint int
@s guint int
@s gboolean int
@s PSAR int
@s PSL int

@** License.
\bigskip\noindent
@i copyright_page_license.w

@** About This Document.
This document is very much under construction,
enough so that readers may question why I make it
available at all. Two reasons:
\li Despite its problems, it is the best way to read the source code
at this point.
\li Since it is essential to changing the code, not making it available
could be seen to violate the spirit of open source.
@ This will eventually become a real book describing the
code.
It is already approaching that in size.
Quality is another story.
Much rewriting and reorganization is being left until the end.
\par
Marpa is a very unusual C library -- no system calls, no floating
point and almost no arithmetic. A lot of data structures
and pointer twiddling.
I have found that many practices that are good in other
contexts are not good in this one.
\par
For example, I originally intended to avoid abbreviations entirely.
This is good practice -- in most cases all that abbreviations save is
some typing, at a very high cost in readability.
In |libmarpa|, however, spelling things out usually does
{\bf not} make them more readable.
To be sure, |To_AHFA_of_EIM_by_SYMID| is pretty incomprehensible.
But is
$$Aycock\_Horspool\_Finite\_Automaton\_To\_State\_of\_Earley\_Item\_by\_Symbol\_ID$$
better?
At this point, I have a lot of practice coming back to pages of both, cold,
and trying to figure them out.
Both are daunting, but the abbreviations are more elegant and look
better on the page, while unabbreviated names routinely pose almost insoluble
problems for Cweb's \TeX{} typesetting.
\par
Whichever is used, it must be kept systematic and
documented, and that is easier with the abbreviations.
In general, I believe abbreviations are used in code
far more than they should be. But they have their place
and |libmarpa| is one of them.
\par
Because I realized that abbreviations were going to be not
just better, but almost essential if I ever was to finish this
project, I changed from a ``no abbreviation" policy to one
of ``abbreviate when necessary and it is necessary a lot" halfway
through.
Thus the code is highly inconsistent in this respect.
At the moment,
that's true of a lot of my other coding conventions.
\par
To summarize, the reader who has not yet been scared off
needs to be aware that the coding conventions are not yet
consistent internally, and not yet consistent with their
documentation.
@
The Cweb is being written along with the code.
If the code works right off the bat, its accompanying text
will be a first draft.
The more trouble I had understanding an issue,
and writing the code,
the more thorough the documentation.

@** Design.
@*0 Layers.
|libmarpa|, the library described in this document, is intended as the bottom of potentially
four layers.
The layers are, from low to high:
\li |libmarpa|
\li The glue layer
\li The wrapper layer
\li The application

The glue layer will be in C and will call the |libmarpa| routines
in a way that makes them compatible with another language.
I expect this will usually be a 4GL (4th generation language),
such as Perl.
One example of a glue description language is SWIG.
Another is Perl XS, and currently that is
the only glue layer implemented for |libmarpa|.

|libmarpa| itself is not enormously user-
or application-friendly.
For example, in |libmarpa|, symbols do not have
names, just symbol structures and symbol ID's.
These are all that is needed for the data crunching,
but an application writer will usually want a friendlier
interface, including names for the symbols and
many other conveniences.
For this reason, applications will typically
use |libmarpa| through a {\bf wrapper package}.
Currently the only such package is in Perl.

The top layer is the application.
My expectation is that this will also be in a 4GL.
Currently, |libmarpa|'s only applications are
in Perl.

Not all these layers need be present.
For example, it is conceivable that someone might
write their application in C, in which case they could
manage with minimal or no
glue layers or package layers.

Interfaces between layers are named after the lower
of the two layers. For example, the interface between
|libmarpa| and the glue layer is the |libmarpa| interface.

@*0 Representing Objects.
Representation of objects is most commonly in one
of three forms: cookies, ID's or pointers to C structures.

@*1 Object ID's.
Object ID's are integers. They are always issued in sequence.
They are guaranteed unique.
(Note that in C,
pointers to identical objects do {\bf not} necessarily
compare equal.)
If desired, they can be checked easily without risking a memory
violation.

ID's are the only object representation
that can be used in any layer or any interface,
and they are the preferred representation
in the application layer
and the package interface.

Wraparound issues for object ID's are ignored.
By the time any object ID wraps, memory will have long
since overflowed.

@*1 Object Cookies.
Ideally, outside of the |libmarpa| layer,
all objects would be represented by their ID.
However, an exception is made for recognizers and grammars,
even though they do have ID's.
This is because looking up ID's for these global objects
is not thread-safe.

@ ID lookup for global objects could be made thread-safe,
but this involves locking data.
It is possible to do this portably, using Glib, but it seems simpler
and safer to expect the calling environment to respect the opaque
nature of the grammar and recognizer cookies.

``Respecting the opaque nature of a cookie",
means not
accessing its internal contents -- using the
cookie only as a cookie.
The overall idea is that,
if a programmer
writes trick-free higher-level code
using cookies,
any resulting errors occur
in the package or application layer.

The contents of Object Cookies are dependent on
the choice of higher-level language (HLL).
For this reason,
the cookies are never visible in the |libmarpa| layer.

In Perl's cookies, a major consideration is ensuring
that, during the lifetime of a cookie,
all the objects implied by the cookie also exist.
This means that so long as
a recognizer object cookie exists,
the underlying grammar cannot be destroyed.

@*1 Object pointers.
The most efficient representation of objects
is as pointers to structures.
These are the main representation of objects
in the |libmarpa| layer.
These must not be visible in the package and application
layers.

With regard to the visibility of object pointers in the
glue layer, the situation is more complicated.
At this writing, I expect to make pointers
to most structures
completely invisible except inside |libmarpa|.
The external accessors do allow the glue layer
some access
to |libmarpa|'s internal structures.
But in the case of the |_peek|
external accessors,
it is intuitive that the memory is owned
by the |libmarpa| layer,
and expected that any use of it will be quick.

In the case of object pointers, their expected ordinary
use is to be kept around to refer to the object.
But, for example, symbol object pointers must not
be freed by the glue layer, but will become invalid
when their associated grammar is destroyed.

This behavior is not completely unintuitive to an
experienced C programmer -- functions (like |ctime|)
which return
transient information in memory unowned by the caller
have a long tradition in UNIX.
But these are now deprecated.

But tracking the lifetime of symbol object pointers
in the glue layer
would be tricky, so as of this writing the thought is to
avoid the issue, for them and most other object pointers.
The exceptions are grammar and recognizer objects.
The base objects for these {\bf are} owned by
the glue layer, so these do not present the same
issues.
The glue layer creates
grammar and recognizer objects,
it owns them during their lifetime,
and it is up to the glue layer to destroy them.

@*0 Inlining.
Most of this code is expected to be frequently executed
and inlining is used a lot.
Enough so
that it is useful to define a macro to let me know when inlining is not
used in a private function.
@s PRIVATE_NOT_INLINE int
@d PRIVATE_NOT_INLINE static

@*0 Marpa Global Setup.

Marpa does no global initialization at the moment.
I'll try to keep it that way.
If I can't, I will need to deal with the issue
of thread safety.

@*0 Complexity.
Considerable attention is paid to time and,
where it is a serious issue, space complexity.
Complexity is considered from three points of view.
{\bf Practical worst-case complexity} is the complexity of the
actual implementation, in the worst case.
{\bf Practical average complexity} is the complexity of the
actual implementation under what are expected to be normal
circumstances.
Average complexity is of most interest to the typical user,
but worst-case considerations should not be ignored ---
in some applications,
one case of poor performance
can outweigh any number
of excellent ``average case" results.
@ Finally, there is {\bf theoretical complexity}.
This is the complexity I would claim in a write-up of the
Marpa algorithm for a Theory of Computation article.
Most of the time, this is the same as practical worst-case complexity.
Often, however, for theoretical complexity I consider
myself entitled to claim
the time complexity for a
better algorithm, even though that is not the one
used in the actual implementation.
@ Sorting is a good example of the circumstances under which
I take the liberty of claiming a time complexity I did not
implement.
In many places in |libmarpa|,
for sorting,
the most reasonable practical
implementation (sometimes the only reasonable practical implementation)
is an $O(n^2)$ sort.
When average list size is small, for example,
a hand-optimized insertion sort is often clearly superior
to all other alternatives.
Where average list size is larger,
a call to |g_qsort| is the appropriate response.
|g_qsort| is the result of considerable thought and experience;
the GNU project has decided to base it on quicksort,
and I do not care to second-guess them on this.
But quicksort and insertion sorts are both, theoretically, $O(n^2)$.
@ Clearly, in both cases, I could drop in a merge sort and achieve
a theoretical $O(n \log n)$ worst case.
Often just as clear is that, in all cases likely to occur in practice,
the merge sort would be inferior.
@ When I claim a complexity from a theoretical choice of algorithm,
rather than the actually implemented one, the following will always be
the case:
\li The existence of the theoretical algorithm must be generally accepted.
\li The complexity I claim for it must be generally accepted.
\li It must be clear that there are no obstacles to using the theoretical algorithm
whose solution is not straightforward.
@ I am a big believer in theory.
Often practical considerations didn't clearly indicate a choice of
algorithm.
In those circumstances, I usually
allowed theoretical superiority to be the deciding factor.
@ But there were cases
where the theoretically superior choice
was clearly going to be inferior in practice.
Sorting was one of them.
It would be possible to
go through |libmarpa| and replace all sorts with a merge sort.
But a slower library would be the result.

@** Coding conventions.
@*0 Naming conventions.

@*1 Reserved locals.
Certain symbol names are reserved for certain purposes.
They are not necessarily defined, but if defined they
must be used for the designated purpose.
An example is |g|, which is the grammar of most interest in
the context.
(In fact, no marpa routine uses more than one grammar.)
It is expected that the routines which refer to a grammar
will set |g| to that value.
This convention saves a lot of clutter in the form of
macro and subroutine arguments.
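
The saving in macro arguments can be illustrated with a minimal sketch.
The structure and the |Toy_| macros below are hypothetical stand-ins
invented for this illustration, not |libmarpa| definitions; the only point
they make is that a macro can pick up whatever variable named |g| is in
scope at the place where it is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in, far simpler than the real grammar structure. */
struct toy_grammar {
    const int *t_symbols;    /* symbol payloads, indexed by symbol ID */
    size_t t_symbol_count;   /* number of entries in t_symbols */
};

/* Neither macro takes a grammar argument: each expands to a use of
   whatever variable named g is in scope where the macro appears. */
#define Toy_SYM_Count (g->t_symbol_count)
#define Toy_SYM_by_ID(id) (g->t_symbols[(id)])

/* The routine follows the convention by naming its grammar parameter g,
   so the macros above work without passing the grammar explicitly. */
static int toy_symbol_sum(const struct toy_grammar *g)
{
    int sum = 0;
    size_t id;
    assert(g != NULL);
    for (id = 0; id < Toy_SYM_Count; id++) {
        sum += Toy_SYM_by_ID(id);
    }
    return sum;
}
```

If a routine does not define |g|, any use of these macros fails to compile,
which is the same effect the text describes for names that are deliberately
left undefined.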

In some cases, these constants may not be well-defined.
An example is |rule_count_of_g| while rules are being added
to the grammar.
In such cases, to minimize confusion, these names should be
left undefined.
This makes the macros which use them unusable, which
is a feature.

\li |g| is always the grammar of most interest in the context.
\li |r| is always the recognizer of most interest in the context.
\li |rule_count_of_g| is the number of rules in |g|.

@*1 Mixed Case Macros.
In programming in general, accessors are very common.
In |libmarpa|, the percentage of the logic that consists
of accessors is even higher than usual,
and their variety approaches the botanical.
Most of these accessors are simple or even trivial,
but some are not.
In an effort to make the code readable and maintainable,
I use macros for all accessors.
@ The standard C convention is that macros are all caps.
This is a good convention. I believe in it and almost
always follow it.
But in this code I have departed from it.
@ As has been noted in the email world,
when most of a page is in caps, that page becomes
much harder and less pleasant to read.
So in this code I have made macros mixed case.
Marpa's mixed case macros are easy to spot ---
they always start with a capital, and the ``major words"
also begin in capital letters.
``Verbs" and ``coverbs" in the macros begin with a lower
case letter.
All words are separated with an underscore,
as is the currently accepted practice to enhance readability.
@ The ``macros are all caps" convention is a long-standing one.
I understand that experienced C programmers will be suspicious
of my claim that this code is special in a way that justifies
breaking the convention.
Frankly, if I were a new reader coming to this code,
I would be suspicious as well.
But I would ask anyone who wishes to criticize to first do
the following:
Look at one of the many macro-heavy pages in this code
and ask yourself -- do you genuinely wish more of this
page was in caps?

@*1 External Names.
External names have |marpa_| or |MARPA_| as their prefix,
as appropriate under the capitalization conventions.
Many names begin with one of the major ``objects" of Marpa:
grammars, recognizers, symbols, etc.
Names of functions typically end with a verb.

@*1 Booleans.
Names of booleans are often
of the form |is_x|, where |x| is some
property. For example, the element of the symbol structure
which indicates whether the symbol is a terminal or not
is |is_terminal|.
Boolean names are chosen so that the |TRUE| or |FALSE|
value corresponds correctly to the question implied by the
name.
Names should be as
accurate as possible consistent with brevity.
Where possible, consistent with brevity and accuracy,
positive names (|is_found|) are preferred
to negative names (|is_not_lost|).

@*1 Function names.
For function names, some final verbs have special meanings.
In the description below |obj| stands for an object,
and |fld| for a field of that object.
In cases where there is no ambiguity about which
object a field might belong to, |obj| will often be omitted.

\li |obj_fld_get| returns field |fld|
of object |obj|.
It is an internal function, and often will be declared
|static inline|.

\li |obj_fld_put| assigns a value to field |fld|
of object |obj|.
It is an internal function, and often will be declared
|static inline|.

\li |marpa_obj_fld_look| returns field |fld|
of object |obj|.
It is an external equivalent of |obj_fld_get|.
The returned value is still owned by object |obj| -- it should
not be modified or freed.
In practice, the |look| verb is often omitted.

\li |marpa_obj_fld_peek| returns field |fld|
of object |obj|.
It is an external equivalent of |obj_fld_get|.
The returned value is still owned by object |obj| -- it should
not be modified or freed.

The difference between ``peek" and ``look" is somewhat
subjective.
``Look" functions are expected to be called in the normal
course of operation, including in production code.
``Peek" functions break the encapsulation rules.
Their use is expected to be limited
to debugging or tracing situations.

\li |marpa_obj_fld_set| sets field |fld|
of object |obj|.
It's the external equivalent of |obj_fld_put|.

\li |marpa_obj_fld_value| returns field |fld|
of object |obj|.
It is an external equivalent of |obj_fld_get|.
The returned value is owned by the caller.

@*0 Abbreviations and Vocabulary.
@ Unexplained abbreviations and non-standard vocabulary
pose unnecessary challenges.
They are a particular obstacle to those who are not native speakers
of English, and they are annoying to native speakers as well.
This section is intended to document
all abbreviations.
Also included is
any non-standard vocabulary
which is not explained in detail elsewhere in the
text.
By ``non-standard vocabulary",
I mean terms that
are not in a general dictionary, and
are also not in the standard reference works.
@ While development is underway, this section will be
incomplete and sometimes inaccurate.
\li alloc: Allocate.
\li assign: Find something, creating it when necessary.
\li bv: Bit Vector.
\li cmp: Compare.
Usually as |_cmp|, the suffix or ``verb" of a function name.
\li \_Object: As a suffix of a type name, this means an object,
as opposed to a pointer.
When there is a choice,
most complex types are considered to be pointers
to structures or unions, rather than the structure or
union itself.
When it's necessary to have a type which
refers to the actual structure
or union {\bf directly}, not via a pointer,
that type is called the ``object" form of the
type. As an example, look at the definitions
of |EIM| and |EIM_Object|.
\li EIM: Earley item.
\li |EIM_Object|: Earley item (object).
\li EIX: Earley item index.
\li ES: Earley set.
\li g: Grammar.
\li |_ix|, |_IX|, ix, IX: Index. Often used as a suffix.
\li Leo base item: The Earley item which ``causes" a Leo item to
be added. If a Leo chain is reconstructed from the Leo item,
\li Leo completion item: The Earley item which is the ``successor"
of a Leo item to
be added.
\li Leo LHS symbol: The LHS of a Leo completion item (see which).
\li Leo item: A ``transition item" as described in Leo1991.
These stand in for a Leo chain of one or more Earley items.
Leo items can stand in for all the Earley items of a right
recursion,
and it is the use of Leo items which makes this algorithm |O(n)|
for all LR-regular grammars.
In an Earley implementation
without Leo items, a parse with right recursion
can have the time complexity |O(n^2)|.
\li LIM: Leo item.
\li \_Object: Suffix indicating that the type is of an
actual object, and not a pointer as is usually the case.
\li PIM, pim: Postdot item.
\li p: A Pointer. Often as |_p|, as the end of a variable name, or as |p_| at
the beginning of one.
\li pp: A Pointer to pointer. Often as |_pp|, as the end of a variable name.
\li R, r: Recognizer.
\li RECCE, recce: Recognizer. Originally military slang for a
reconnaissance.
\li -s, -es: Plural. Note that the |es| suffix is often used even when
it is not good English, because it is easier to spot in text.
For example, the plural of |ES| is |ESes|.
\li |s_|: Prefix for a structure tag. Cweb does not format C code well
unless tag names are distinct from other names.
\li |t_|: Prefix for an element tag.
Cweb does not format C code well
unless tag names are distinct from others.
Since each structure and union in C has a different namespace,
this does not suffice to make different tags unique, but it does
suffice to let Cweb distinguish tags from other items, and that is the
object.
\li |u_|: Prefix for a union tag. Cweb does not format C code well
unless tag names are distinct from other names.

@** To Do.

Most of the to do list has been moved to Marpa::R2.

\li If I convert Marpa to use Marpa::XS,
and if I continue to implement the |tokens()| call,
make sure the ``interactive" flag works.

@** The Public Header File.
@*0 Version Constants.
@<Private global variables@> =
const guint marpa_major_version = MARPA_MAJOR_VERSION;
const guint marpa_minor_version = MARPA_MINOR_VERSION;
const guint marpa_micro_version = MARPA_MICRO_VERSION;
const guint marpa_interface_age = MARPA_INTERFACE_AGE;
const guint marpa_binary_age = MARPA_BINARY_AGE;
@ Return the version in a 3 element int array.
@<Function definitions@> =
void marpa_version(int* version) {
   version[0] = MARPA_MAJOR_VERSION;
   version[1] = MARPA_MINOR_VERSION;
   version[2] = MARPA_MICRO_VERSION;
}
@ @<Public function prototypes@> =
void marpa_version(int* version);

@*0 Header file.
|GLIB_VAR| is to
prefix variable declarations so that they
will be exported properly for Windows dlls.
@f GLIB_VAR const
@<Body of public header file@> =
GLIB_VAR const guint marpa_major_version;@/
GLIB_VAR const guint marpa_minor_version;@/
GLIB_VAR const guint marpa_micro_version;@/
GLIB_VAR const guint marpa_interface_age;@/
GLIB_VAR const guint marpa_binary_age;@#
#define MARPA_CHECK_VERSION(major,minor,micro) @| \
    @[ (MARPA_MAJOR_VERSION > (major) \
        @| || (MARPA_MAJOR_VERSION == (major) && MARPA_MINOR_VERSION > (minor)) \
        @| || (MARPA_MAJOR_VERSION == (major) && MARPA_MINOR_VERSION == (minor) \
        @|  && MARPA_MICRO_VERSION >= (micro)))
        @]@#
#define MARPA_CAT(a, b) @[ a ## b @]
@<Public defines@>@/
@<Public incomplete structures@>@/
@<Public typedefs@>@/@\
@<Callback typedefs@>@/
@<Public structures@>@/
@<Public function prototypes@>@/

@** Grammar (GRAMMAR) Code.
@<Public incomplete structures@> = struct marpa_g;
@ @<Private structures@> = struct marpa_g {
@<Widely aligned grammar elements@>@;
@<Int aligned grammar elements@>@;
@<Bit aligned grammar elements@>@;
};
typedef struct marpa_g GRAMMARD;
@ @<Private typedefs@> =
typedef struct marpa_g* GRAMMAR;
typedef const struct marpa_g* GRAMMAR_Const;

@ @<Function definitions@> =
struct marpa_g* marpa_g_new( void)
{ struct marpa_g* g = g_slice_new(struct marpa_g);
    @<Initialize grammar elements@>@;
   return g; }
@ @<Public function prototypes@> =
struct marpa_g* marpa_g_new(void);

@ @<Function definitions@> =
void marpa_g_free(struct marpa_g *g)
{ @<Destroy grammar elements@>@;
g_slice_free(struct marpa_g, g);
}
@ @<Public function prototypes@> =
void marpa_g_free(struct marpa_g *g);

@*0 The Grammar ID.
A unique ID for the grammar.
This must be unique not just per-thread,
but process-wide.
The counter which tracks grammar ID's
(|next_grammar_id|)
is (at this writing) the only global
non-constant, and requires special handling to
keep |libmarpa| MT-safe.
|next_grammar_id| is accessed only via
|glib|'s special atomic operations.
@ @<Int aligned grammar elements@> = gint t_id;
@ @<Public typedefs@> = typedef gint Marpa_Grammar_ID;
@ @<Private global variables@> = static gint next_grammar_id = 1;
@ @<Initialize grammar elements@> =
g->t_id = g_atomic_int_exchange_and_add(&next_grammar_id, 1);
@ @<Function definitions@> =
gint marpa_grammar_id(struct marpa_g* g) { return g->t_id; }
@ @<Public function prototypes@> =
gint marpa_grammar_id(struct marpa_g* g);

@*0 The Grammar's Symbol List.
This lists the symbols for the grammar,
with their
|Marpa_Symbol_ID| as the index.

@<Widely aligned grammar elements@> = GArray* t_symbols;
@ @<Initialize grammar elements@> =
g->t_symbols = g_array_new(FALSE, FALSE, sizeof(SYM));
@ @<Destroy grammar elements@> =
{  Marpa_Symbol_ID id; for (id = 0; id < (Marpa_Symbol_ID)g->t_symbols->len; id++)
{ symbol_free(SYM_by_ID(id)); } }
g_array_free(g->t_symbols, TRUE);

@ The trace accessor returns the GArray.
It remains ``owned" by the Grammar,
and must not be freed or modified.
@<Function definitions@> =
GArray *marpa_g_symbols_peek(struct marpa_g* g)
{ return g->t_symbols; }
@ @<Public function prototypes@> =
GArray *marpa_g_symbols_peek(struct marpa_g* g);

@ Symbol count accessor.
@d SYM_Count_of_G(g) ((g)->t_symbols->len)

@ Symbol by ID.
@d SYM_by_ID(id) (g_array_index(g->t_symbols, SYM, (id)))

@ Adds the symbol to the list of symbols kept by the Grammar
object.
@<Private inline functions@> =
static inline
void g_symbol_add(
    struct marpa_g *g,
    Marpa_Symbol_ID symid,
    SYM symbol)
{
    g_array_insert_val(g->t_symbols, (unsigned)symid, symbol);
}

@ Check that symbol is in valid range.
@<Function definitions@> =
static inline gint symbol_is_valid(
const struct marpa_g *g, const Marpa_Symbol_ID symid) {
return symid >= 0 && (guint)symid < g->t_symbols->len;
}
@ @<Private function prototypes@> =
static inline gint symbol_is_valid(
const struct marpa_g *g, const Marpa_Symbol_ID symid);

@*0 The Grammar's Rule List.
This lists the rules for the grammar,
with their |Marpa_Rule_ID| as the index.
@d RULE_Count_of_G(g) ((g)->t_rules->len)
@<Widely aligned grammar elements@> = GArray* t_rules;
@ @<Initialize grammar elements@> =
g->t_rules = g_array_new(FALSE, FALSE, sizeof(RULE));
@ @<Destroy grammar elements@> =
g_array_free(g->t_rules, TRUE);

@ The trace accessor returns the GArray.
It remains ``owned" by the Grammar,
and must not be freed or modified.
@<Function definitions@> =
GArray *marpa_g_rules_peek(struct marpa_g* g)
{ return g->t_rules; }
@ @<Public function prototypes@> =
GArray *marpa_g_rules_peek(struct marpa_g* g);

@ Internal accessor to find a rule by its id.
@d RULE_by_ID(g, id) (g_array_index((g)->t_rules, RULE, (id)))

@ Adds the rule to the list of rules kept by the Grammar
object.
@<Private inline functions@> =
static inline
void rule_add(
    struct marpa_g *g,
    RULEID rule_id,
    RULE rule)
{
    g_array_insert_val(g->t_rules, (unsigned)rule_id, rule);
    LV_Size_of_G(g) += 1 + Length_of_RULE(rule);
    g->t_max_rule_length = MAX(Length_of_RULE(rule), g->t_max_rule_length);
}

@ Check that rule is in valid range.
@d RULEID_of_G_is_Valid(g, rule_id)
    ((rule_id) >= 0 && (guint)(rule_id) < (g)->t_rules->len)

@*0 Default Value.
@d Default_Value_of_G(g) ((g)->t_default_value)
@<Widely aligned grammar elements@> = gpointer t_default_value;
@ @<Initialize grammar elements@> =
Default_Value_of_G(g) = NULL;
@ @<Public function prototypes@> =
gpointer marpa_default_value(struct marpa_g* g);
@ @<Function definitions@> =
gpointer marpa_default_value(struct marpa_g* g)
{ return Default_Value_of_G(g); }
@ @<Public function prototypes@> =
gboolean marpa_default_value_set(struct marpa_g*g, gpointer default_value);
@ @<Function definitions@> =
gboolean marpa_default_value_set(struct marpa_g*g, gpointer default_value)
{
    @<Return |FALSE| on failure@>@;
    @<Fail if grammar is precomputed@>@;
    Default_Value_of_G(g) = default_value;
    return TRUE;
}

@*0 Start Symbol.
@<Int aligned grammar elements@> = Marpa_Symbol_ID t_start_symid;
@ @<Initialize grammar elements@> =
g->t_start_symid = -1;
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_start_symbol(struct marpa_g* g)
{ return g->t_start_symid; }
@ @<Public function prototypes@> =
Marpa_Symbol_ID marpa_start_symbol(struct marpa_g* g);
@ Returns |TRUE| on success,
|FALSE| on failure.
@<Function definitions@> =
gboolean marpa_start_symbol_set(struct marpa_g*g, Marpa_Symbol_ID symid)
{
    @<Return |FALSE| on failure@>@;
    @<Fail if grammar is precomputed@>@;
    @<Fail if grammar |symid| is invalid@>@;
    g->t_start_symid = symid;
    return TRUE;
}
@ @<Public function prototypes@> =
gboolean marpa_start_symbol_set(struct marpa_g*g, Marpa_Symbol_ID id);

@*0 Start Rules.
These are the start rules, after the grammar is augmented.
Only one of these needs to be non-NULL.
@<Int aligned grammar elements@> =
RULE t_null_start_rule;
RULE t_proper_start_rule;
@ @<Initialize grammar elements@> =
g->t_null_start_rule = NULL;
g->t_proper_start_rule = NULL;

@*0 The Grammar's Size.
Intuitively,
I define a grammar's size as the total size, in symbols, of all of its
rules.
This includes both the LHS symbol and the RHS symbols.
Since every rule has exactly one LHS symbol,
the grammar's size is always equal to the total of
all the rule lengths, plus the total number of rules.

Unused rules are not included in the theoretical number,
but Marpa does not necessarily deduct rules from the
count as they are marked useless.
This means that the true size of the
grammar will always be this count or smaller.
The purpose of tracking grammar size is to allocate resources,
and for that purpose a high-ball estimate is adequate.
@d Size_of_G(g) ((g)->t_size)
@d LV_Size_of_G(g) ((g)->t_size)
@ @<Int aligned grammar elements@> = int t_size;
@ @<Initialize grammar elements@> =
LV_Size_of_G(g) = 0;

@*0 The Maximum Rule Length.
This is a high-ball estimate of the length of the
longest rule in the grammar.
The actual value will always be this number or smaller.
\par
The value is used for allocating resources.
Unused rules are not included in the theoretical number,
but Marpa does not adjust this number as rules
are marked useless.
@ @<Int aligned grammar elements@> = gint t_max_rule_length;
@ @<Initialize grammar elements@> =
g->t_max_rule_length = 0;

@*0 Grammar Boolean: Precomputed.
@ @<Public function prototypes@> =
gboolean marpa_is_precomputed(const struct marpa_g* const g);
@ @d G_is_Precomputed(g) ((g)->t_is_precomputed)
@<Bit aligned grammar elements@> = guint t_is_precomputed:1;
@ @<Initialize grammar elements@> =
g->t_is_precomputed = FALSE;
@ @<Function definitions@> =
gboolean marpa_is_precomputed(const struct marpa_g* const g)
{ return G_is_Precomputed(g); }

@*0 Grammar Boolean: Has Loop.
@<Bit aligned grammar elements@> = guint t_has_loop:1;
@ @<Initialize grammar elements@> =
g->t_has_loop = FALSE;
@ The internal accessor would be trivial, so there is none.
@<Function definitions@> =
gboolean marpa_has_loop(struct marpa_g* g)
{ return g->t_has_loop; }
@ @<Public function prototypes@> =
gboolean marpa_has_loop(struct marpa_g* g);

@*0 Grammar Boolean: LHS Terminal OK.
Traditionally, a BNF grammar did {\bf not} allow a
terminal symbol of the grammar to also be a LHS
symbol.
By default, this is allowed under Marpa.
@<Bit aligned grammar elements@> = guint t_is_lhs_terminal_ok:1;
@ @<Initialize grammar elements@> =
g->t_is_lhs_terminal_ok = TRUE;
@ The internal accessor would be trivial, so there is none.
@<Function definitions@> =
gboolean marpa_is_lhs_terminal_ok(struct marpa_g* g)
{ return g->t_is_lhs_terminal_ok; }
@ @<Public function prototypes@> =
gboolean marpa_is_lhs_terminal_ok(struct marpa_g* g);
@ Returns |TRUE| on success,
|FALSE| on failure.
@<Function definitions@> =
gboolean marpa_is_lhs_terminal_ok_set(
struct marpa_g*g, gboolean value)
{
    if (G_is_Precomputed(g)) {
        g->t_error = "precomputed";
        return FALSE;
    }
    g->t_is_lhs_terminal_ok = value;
    return TRUE;
}
@ @<Public function prototypes@> =
gboolean marpa_is_lhs_terminal_ok_set( struct marpa_g*g, gboolean value);

@*0 Terminal Boolean Vector.
A boolean vector, with bits set if the symbol is a
terminal.
This is not used as the working vector while doing
the census, because not all symbols have been added at
that point.
At grammar initialization, this vector cannot be sized.
It is initialized to |NULL| so that the destructor
can tell if there is a bit vector to be freed.
@<Widely aligned grammar elements@> = Bit_Vector t_bv_symid_is_terminal;
@ @<Initialize grammar elements@> = g->t_bv_symid_is_terminal = NULL;
@ @<Destroy grammar elements@> =
if (g->t_bv_symid_is_terminal) { bv_free(g->t_bv_symid_is_terminal); }

@*0 The Grammar's Context.
The ``context'' is a hash of miscellaneous data,
by keyword.
It is so called because its purpose is to
provide callbacks with ``context'' ---
data about
|libmarpa|'s state which is not conveniently
available in other forms.
@d Context_of_G(g) ((g)->t_context)
@<Widely aligned grammar elements@> = GHashTable* t_context;
@ @<Initialize grammar elements@> =
g->t_context = g_hash_table_new_full( g_str_hash, g_str_equal, NULL, g_free );
@ @<Destroy grammar elements@> = g_hash_table_destroy(Context_of_G(g));

@ @<Public defines@> =
#define MARPA_CONTEXT_INT 1@/
#define MARPA_CONTEXT_CONST 2@/
#define MARPA_IS_CONTEXT_INT(v) @| @[ ((v)->t_type == MARPA_CONTEXT_INT) @]@/
#define MARPA_CONTEXT_INT_VALUE(v) @| \
@[ ((v)->t_type == MARPA_CONTEXT_INT \
    ? ((struct marpa_context_int_value*)v)->t_data \
    : G_MININT) @]@/
#define MARPA_CONTEXT_STRING_VALUE(v) @| \
@[ ((v)->t_type == MARPA_CONTEXT_CONST \
    ? ((struct marpa_context_const_value*)v)->t_data \
    : NULL) @]@/
@ @<Public structures@> =
struct marpa_context_int_value {
    gint t_type;
    gint t_data;
};
@ @<Public structures@> =
struct marpa_context_const_value {
    gint t_type;
    const gchar* t_data;
};
@ @<Public structures@> =
union marpa_context_value {
    gint t_type;
    struct marpa_context_int_value t_int_value;
    struct marpa_context_const_value t_const_value;
};

@ Add an integer to the context.
These functions might be converted to be public.
For now they are only for use by |libmarpa| in setting
values to be read by the higher layers,
and are therefore internal.
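
As a rough illustration of how a higher layer might read one of these tagged values,
the sketch below mirrors the public structures with plain C types in place of the GLib ones.
Every name in it (|ctx_value|, |ctx_int_or_min|, |ctx_string_or_null|, the |CTX_| tags)
is a hypothetical stand-in and not part of |libmarpa|.
|INT_MIN| plays the role that |G_MININT| plays in the macros above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MARPA_CONTEXT_* tags. */
enum { CTX_INT = 1, CTX_CONST = 2 };

struct ctx_int_value   { int t_type; int t_data; };
struct ctx_const_value { int t_type; const char *t_data; };

/* Every member starts with the same int tag, so the tag can be read
   through the union before deciding which member is live. */
union ctx_value {
    int t_type;
    struct ctx_int_value t_int_value;
    struct ctx_const_value t_const_value;
};

/* Return the integer payload, or INT_MIN if the value is not an integer. */
static int ctx_int_or_min(const union ctx_value *v)
{
    assert(v != NULL);
    return v->t_type == CTX_INT ? v->t_int_value.t_data : INT_MIN;
}

/* Return the string payload, or NULL if the value is not a string. */
static const char *ctx_string_or_null(const union ctx_value *v)
{
    assert(v != NULL);
    return v->t_type == CTX_CONST ? v->t_const_value.t_data : NULL;
}
```

A caller would look the value up by key, then use whichever reader matches the type it expects.
A mismatched type yields the sentinel rather than a misread payload.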

The const qualifier on the key is deliberately discarded.
As implemented, the keys are treated as |const| by
|g_hash_table_insert|, but the compiler can't know
that this is my intention.
For type safety, I do want to keep the |const|
qualifier in other contexts.
@<Function definitions@> =
static inline
void g_context_int_add(struct marpa_g* g, const gchar* key, gint payload)
{
    struct marpa_context_int_value* value
        = g_new(struct marpa_context_int_value, 1);
    value->t_type = MARPA_CONTEXT_INT;
    value->t_data = payload;
    g_hash_table_insert(Context_of_G(g), (gpointer)key, value);
}
@ @<Private function prototypes@> =
static inline
void g_context_int_add(struct marpa_g* g, const gchar* key, gint value);
@ @<Function definitions@> =
static inline
void context_const_add(struct marpa_g* g, const gchar* key, const gchar* payload)
{
    struct marpa_context_const_value* value
        = g_new(struct marpa_context_const_value, 1);
    value->t_type = MARPA_CONTEXT_CONST;
    value->t_data = payload;
    g_hash_table_insert(Context_of_G(g), (gpointer)key, value);
}
@ @<Private function prototypes@> =
static inline
void context_const_add(struct marpa_g* g, const gchar* key, const gchar* value);

@ Clear the current context.
Used to create a ``clean slate'' in the context.
@<Function definitions@> =
static inline void g_context_clear(struct marpa_g* g) {
    g_hash_table_remove_all(Context_of_G(g)); }
@ @<Private function prototypes@> =
static inline void g_context_clear(struct marpa_g* g);

@ @<Function definitions@> =
union marpa_context_value* marpa_g_context_value(struct marpa_g* g, const gchar* key)
{ return g_hash_table_lookup(Context_of_G(g), key); }
@ @<Public function prototypes@> =
union marpa_context_value* marpa_g_context_value(struct marpa_g* g, const gchar* key);

@*0 The Grammar Obstacks.
Two obstacks with the same lifetime as the grammar.
This is a very efficient way of allocating memory which won't be
resized and which will have the same lifetime as the grammar.
One obstack is reserved for ``tricky'' operations
like |obs_free|,
which require coordination with other allocations.
The other obstack is reserved for ``safe'' operations---%
complete allocations which are never reversed.
The dual obstacks allow me to get tricky where it is useful,
while also allowing most obstack allocations to be done safely without
the need to carefully examine their context.
@<Widely aligned grammar elements@> =
struct obstack t_obs;
struct obstack t_obs_tricky;
@ @<Initialize grammar elements@> =
obstack_init(&g->t_obs);
obstack_init(&g->t_obs_tricky);
@ @<Destroy grammar elements@> =
obstack_free(&g->t_obs, NULL);
obstack_free(&g->t_obs_tricky, NULL);

@*0 The Grammar's Error ID.
This is an error flag for the grammar.
Error status is not necessarily cleared
on successful return, so that
it is only valid when an external
function has indicated there is an error,
and becomes invalid again when another external method
is called on the grammar.
Checking it at other times may reveal ``stale'' error
messages.
@<Public typedefs@> =
typedef const gchar* Marpa_Error_ID;
@ @<Widely aligned grammar elements@> = Marpa_Error_ID t_error;
@ @<Initialize grammar elements@> =
g->t_error = NULL;
@ There is no destructor.
The error strings are assumed to be
{\bf not} error messages, but ``cookies''.
These cookies are constants residing in static memory
(which may be read-only depending on implementation).
They cannot and should not be de-allocated.
@ @<Function definitions@> =
Marpa_Error_ID marpa_g_error(const struct marpa_g* g)
{ return g->t_error ?
g->t_error : "unknown error"; }
@ @<Public function prototypes@> =
Marpa_Error_ID marpa_g_error(const struct marpa_g* g);

@** Symbol (SYM) Code.
@s Marpa_Symbol_ID int
@<Public typedefs@> =
typedef gint Marpa_Symbol_ID;
@ @<Private typedefs@> =
typedef gint SYMID;
@ @<Private incomplete structures@> =
struct s_symbol;
typedef struct s_symbol* SYM;
typedef const struct s_symbol* SYM_Const;
@ The initial element is of type |gint| so that
a symbol structure may be used where or-nodes are
expected.
@<Private structures@> =
struct s_symbol {
    @<Widely aligned symbol elements@>@;
    @<Int aligned symbol elements@>@;
    @<Bit aligned symbol elements@>@;
};
typedef struct s_symbol SYM_Object;

@ @<Private function prototypes@> =
static inline
SYM symbol_new(struct marpa_g *g);
@ @<Function definitions@> =
static inline SYM
symbol_new (struct marpa_g *g)
{
    SYM symbol = g_malloc (sizeof (SYM_Object));
    @<Initialize symbol elements @>@/
    {
        SYMID id = ID_of_SYM(symbol);
        g_symbol_add (g, id, symbol);
    }
    return symbol;
}

@ @<Public function prototypes@> =
Marpa_Symbol_ID marpa_symbol_new(struct marpa_g *g);
@ @<Function definitions@> =
Marpa_Symbol_ID
marpa_symbol_new (struct marpa_g * g)
{
    SYMID id = ID_of_SYM(symbol_new (g));
    symbol_callback (g, id);
    return id;
}

@ @<Function definitions@> =
static inline void symbol_free(SYM symbol)
{ @<Free symbol elements@>@; g_free(symbol); }
@ @<Private function prototypes@> =
static inline void symbol_free(SYM symbol);

@ Symbol ID: This is the unique identifier for the symbol.
@d ID_of_SYM(sym) ((sym)->t_symbol_id)
@d LV_ID_of_SYM(sym) ID_of_SYM(sym)
@<Int aligned symbol elements@> = SYMID t_symbol_id;
@ @<Initialize symbol elements@> = LV_ID_of_SYM(symbol) = g->t_symbols->len;

@*0 Symbol LHS Rules Element.
This tracks the rules for which this symbol is the LHS.
It is an optimization --- the same information could be found
by scanning the rules every time this information is needed.
The implementation is a |GArray|.
@d SYMBOL_LHS_RULE_COUNT(symbol) ((symbol)->t_lhs->len)
@<Widely aligned symbol elements@> = GArray* t_lhs;
@ @<Initialize symbol elements@> =
symbol->t_lhs = g_array_new(FALSE, FALSE, sizeof(Marpa_Rule_ID));
@ @<Free symbol elements@> =
g_array_free(symbol->t_lhs, TRUE);
@ The trace accessor returns the GArray.
It remains ``owned'' by the Grammar,
and must not be freed or modified.
@<Function definitions@> =
GArray *marpa_symbol_lhs_peek(struct marpa_g* g, Marpa_Symbol_ID symid)
{ @<Return |NULL| on failure@>@;
@<Fail if grammar |symid| is invalid@>@;
return SYM_by_ID(symid)->t_lhs; }
@ @<Public function prototypes@> =
GArray *marpa_symbol_lhs_peek(struct marpa_g* g, Marpa_Symbol_ID symid);
@ @<Function definitions@> = static inline
void symbol_lhs_add(SYM symbol, Marpa_Rule_ID rule_id)
{ g_array_append_val(symbol->t_lhs, rule_id); }
void
marpa_symbol_lhs_add(struct marpa_g*g, Marpa_Symbol_ID symid, Marpa_Rule_ID rule_id)
{ symbol_lhs_add(SYM_by_ID(symid), rule_id); }
@ @<Private function prototypes@> =
void
marpa_symbol_lhs_add(struct marpa_g*g, Marpa_Symbol_ID symid, Marpa_Rule_ID rule_id);

@*0 Symbol RHS Rules Element.
This tracks the rules in which this symbol appears on the RHS.
It is an optimization --- the same information could be found
by scanning the rules every time this information is needed.
The implementation is a |GArray|.
@<Widely aligned symbol elements@> = GArray* t_rhs;
@ @<Initialize symbol elements@> =
symbol->t_rhs = g_array_new(FALSE, FALSE, sizeof(Marpa_Rule_ID));
@ @<Free symbol elements@> = g_array_free(symbol->t_rhs, TRUE);

@ The trace accessor returns the GArray.
It remains ``owned'' by the Grammar,
and must not be freed or modified.
@<Function definitions@> =
GArray *marpa_symbol_rhs_peek(struct marpa_g* g, Marpa_Symbol_ID symid)
{ @<Return |NULL| on failure@>@;
@<Fail if grammar |symid| is invalid@>@;
return SYM_by_ID(symid)->t_rhs; }
@ @<Public function prototypes@> =
GArray *marpa_symbol_rhs_peek(struct marpa_g* g, Marpa_Symbol_ID symid);
@ @<Function definitions@> = static inline
void symbol_rhs_add(SYM symbol, Marpa_Rule_ID rule_id)
{ g_array_append_val(symbol->t_rhs, rule_id); }
@ @<Private function prototypes@> = static inline
void symbol_rhs_add(SYM symbol, Marpa_Rule_ID rule_id);

@ Symbol Is Accessible Boolean
@<Bit aligned symbol elements@> = guint t_is_accessible:1;
@ @<Initialize symbol elements@> =
symbol->t_is_accessible = FALSE;
@ The trace accessor returns the Boolean value.
Right now this function uses a pointer
to the symbol function.
If that becomes private,
the prototype of this function
must be changed.
\par
The internal accessor would be trivial, so there is none.
@<Function definitions@> =
gboolean marpa_symbol_is_accessible(struct marpa_g* g, Marpa_Symbol_ID id)
{ return SYM_by_ID(id)->t_is_accessible; }
@ @<Public function prototypes@> =
gboolean marpa_symbol_is_accessible(struct marpa_g* g, Marpa_Symbol_ID id);

@ Symbol Is Counted Boolean
@<Bit aligned symbol elements@> = guint t_is_counted:1;
@ @<Initialize symbol elements@> =
symbol->t_is_counted = FALSE;
@ The trace accessor returns the Boolean value.
Right now this function uses a pointer
to the symbol function.
If that becomes private,
the prototype of this function
must be changed.
\par
The internal accessor would be trivial, so there is none.
@<Function definitions@> =
gboolean marpa_symbol_is_counted(struct marpa_g* g, Marpa_Symbol_ID id)
{ return SYM_by_ID(id)->t_is_counted; }
@ @<Public function prototypes@> =
gboolean marpa_symbol_is_counted(struct marpa_g* g, Marpa_Symbol_ID id);

@ Symbol Is Nullable Boolean
@<Bit aligned symbol elements@> = guint t_is_nullable:1;
@ @<Initialize symbol elements@> =
symbol->t_is_nullable = FALSE;
@ The trace accessor returns the Boolean value.
Right now this function uses a pointer
to the symbol function.
If that becomes private,
the prototype of this function
must be changed.
\par
The internal accessor would be trivial, so there is none.
@<Function definitions@> =
gboolean marpa_symbol_is_nullable(struct marpa_g* g, Marpa_Symbol_ID id)
{ return SYM_by_ID(id)->t_is_nullable; }
@ @<Public function prototypes@> =
gboolean marpa_symbol_is_nullable(struct marpa_g* g, Marpa_Symbol_ID id);

@ Symbol Is Nulling Boolean
@d SYM_is_Nulling(sym) ((sym)->t_is_nulling)
@<Bit aligned symbol elements@> = guint t_is_nulling:1;
@ @<Initialize symbol elements@> =
symbol->t_is_nulling = FALSE;
@ The trace accessor returns the Boolean value.
Right now this function uses a pointer
to the symbol function.
If that becomes private,
the prototype of this function
must be changed.
\par
The internal accessor would be trivial, so there is none.
@<Function definitions@> =
gint marpa_symbol_is_nulling(struct marpa_g* g, Marpa_Symbol_ID symid)
{ @<Return |-2| on failure@>@;
@<Fail if grammar |symid| is invalid@>@;
return SYM_is_Nulling(SYM_by_ID(symid)); }
@ @<Public function prototypes@> =
gint marpa_symbol_is_nulling(struct marpa_g* g, Marpa_Symbol_ID id);

@ Symbol Is Terminal Boolean
@<Bit aligned symbol elements@> = guint t_is_terminal:1;
@ @<Initialize symbol elements@> =
symbol->t_is_terminal = FALSE;
@ The trace accessor returns the Boolean value.
Right now this function uses a pointer
to the symbol function.
If that becomes private,
the prototype of this function
must be changed.
\par
The internal accessor would be trivial, so there is none.
@d SYM_is_Terminal(symbol) ((symbol)->t_is_terminal)
@d SYMID_is_Terminal(id) (SYM_is_Terminal(SYM_by_ID(id)))
@<Function definitions@> =
gboolean marpa_symbol_is_terminal(struct marpa_g* g, Marpa_Symbol_ID id)
{ return SYMID_is_Terminal(id); }
@ @<Public function prototypes@> =
gboolean marpa_symbol_is_terminal(struct marpa_g* g, Marpa_Symbol_ID id);
@ @<Function definitions@> =
void marpa_symbol_is_terminal_set(
struct marpa_g*g, Marpa_Symbol_ID id, gboolean value)
{ SYMID_is_Terminal(id) = value; }
@ @<Public function prototypes@> =
void marpa_symbol_is_terminal_set( struct marpa_g*g, Marpa_Symbol_ID id, gboolean value);

@ Symbol Is Productive Boolean
@<Bit aligned symbol elements@> = guint t_is_productive:1;
@ @<Initialize symbol elements@> =
symbol->t_is_productive = FALSE;
@ The trace accessor returns the Boolean value.
Right now this function uses a pointer
to the symbol function.
If that becomes private,
the prototype of this function
must be changed.
\par
The internal accessor would be trivial, so there is none.
@<Function definitions@> =
gboolean marpa_symbol_is_productive(struct marpa_g* g, Marpa_Symbol_ID id)
{ return SYM_by_ID(id)->t_is_productive; }
@ @<Public function prototypes@> =
gboolean marpa_symbol_is_productive(struct marpa_g* g, Marpa_Symbol_ID id);

@ Symbol Is Start Boolean
@<Bit aligned symbol elements@> = guint t_is_start:1;
@ @<Initialize symbol elements@> = symbol->t_is_start = FALSE;
@ Accessor: The trace accessor returns the Boolean value.
The internal accessor would be trivial, so there is none.
@<Function definitions@> =
static inline
gint symbol_is_start(SYM symbol)
{ return symbol->t_is_start; }
gint marpa_symbol_is_start( struct marpa_g*g, Marpa_Symbol_ID symid)
{ @<Return |-2| on failure@>@;
@<Fail if grammar |symid| is invalid@>@;
    return symbol_is_start(SYM_by_ID(symid));
}
@ @<Private function prototypes@> =
static inline
gint symbol_is_start(SYM symbol);
@ @<Public function prototypes@> =
gint marpa_symbol_is_start( struct marpa_g*g, Marpa_Symbol_ID id);

@ Symbol Aliasing:
This is the logic for aliasing symbols.
In the Aycock-Horspool algorithm, from which Marpa is derived,
it is essential that there be no ``proper nullable''
symbols. Therefore, all proper nullable symbols in
the original grammar are converted into two, aliased,
symbols: a non-nullable (or ``proper'') alias and a nulling alias.
@<Bit aligned symbol elements@> =
guint t_is_proper_alias:1;
guint t_is_nulling_alias:1;
@ @<Widely aligned symbol elements@> =
struct s_symbol* t_alias;
@ @<Initialize symbol elements@> =
symbol->t_is_proper_alias = FALSE;
symbol->t_is_nulling_alias = FALSE;
symbol->t_alias = NULL;

@ Proper Alias Trace Accessor:
If this symbol is a nulling symbol
with a proper alias, returns the proper alias.
Otherwise, returns |NULL|.
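
To make the aliasing scheme concrete, here is a minimal stand-alone model of it.
It is only a sketch: |toy_symbol| and the |toy_| functions are hypothetical names,
the structure keeps just the fields needed for aliasing,
and the caller supplies the new symbol instead of allocating it from the grammar.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a symbol; this is not the real struct s_symbol. */
struct toy_symbol {
    int id;
    struct toy_symbol *alias;
    unsigned is_proper_alias:1;
    unsigned is_nulling_alias:1;
    unsigned is_nullable:1;
    unsigned is_nulling:1;
};

/* Split a proper nullable symbol into two aliases.  The original symbol
   keeps its ID and becomes the proper (non-nullable) alias; `nulling`
   receives `new_id` and becomes the nulling alias. */
static void toy_alias_create(struct toy_symbol *symbol,
                             struct toy_symbol *nulling, int new_id)
{
    assert(symbol != NULL && nulling != NULL);
    symbol->is_proper_alias = 1;
    symbol->is_nulling = 0;
    symbol->is_nullable = 0;
    symbol->alias = nulling;

    nulling->id = new_id;
    nulling->is_nulling_alias = 1;
    nulling->is_nulling = 1;
    nulling->is_nullable = 1;
    nulling->alias = symbol;
}

/* The proper alias of a nulling alias, or NULL for any other symbol. */
static struct toy_symbol *toy_proper_alias(const struct toy_symbol *s)
{
    return s->is_nulling_alias ? s->alias : NULL;
}

/* The nulling alias of a proper alias, or NULL for any other symbol. */
static struct toy_symbol *toy_null_alias(const struct toy_symbol *s)
{
    return s->is_proper_alias ? s->alias : NULL;
}
```

The two flags make the |alias| pointer directional:
each accessor follows it only from the side it is meant for,
so asking a proper alias for its proper alias yields |NULL| rather than its nulling twin.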
@<Function definitions@> =
static inline
SYM symbol_proper_alias(SYM symbol)
{ return symbol->t_is_nulling_alias ? symbol->t_alias : NULL; }
Marpa_Symbol_ID marpa_symbol_proper_alias(struct marpa_g* g, Marpa_Symbol_ID symid)
{
SYM symbol;
SYM proper_alias;
@<Return |-2| on failure@>@;
@<Fail if grammar |symid| is invalid@>@;
symbol = SYM_by_ID(symid);
proper_alias = symbol_proper_alias(symbol);
return proper_alias == NULL ? -1 : ID_of_SYM(proper_alias);
}
@ @<Private function prototypes@> =
static inline SYM symbol_proper_alias(SYM symbol);
@ @<Public function prototypes@> =
Marpa_Symbol_ID marpa_symbol_proper_alias(struct marpa_g* g, Marpa_Symbol_ID symid);

@ Nulling Alias Trace Accessor:
If this symbol is a proper (non-nullable) symbol
with a nulling alias, returns the nulling alias.
Otherwise, returns |NULL|.
@<Function definitions@> =
static inline
SYM symbol_null_alias(SYM symbol)
{ return symbol->t_is_proper_alias ? symbol->t_alias : NULL; }
Marpa_Symbol_ID marpa_symbol_null_alias(struct marpa_g* g, Marpa_Symbol_ID symid)
{
SYM symbol;
SYM alias;
@<Return |-2| on failure@>@;
@<Fail if grammar |symid| is invalid@>@;
symbol = SYM_by_ID(symid);
alias = symbol_null_alias(symbol);
if (alias == NULL) {
    g_context_int_add(g, "symid", symid);
    g->t_error = "no alias";
    return -1;
}
return ID_of_SYM(alias);
}
@ @<Private function prototypes@> =
static inline SYM symbol_null_alias(SYM symbol);
@ @<Public function prototypes@> =
Marpa_Symbol_ID marpa_symbol_null_alias(struct marpa_g* g, Marpa_Symbol_ID symid);

@ Given a proper nullable symbol as its argument,
converts the argument into two ``aliases''.
The proper (non-nullable) alias will have the same symbol ID
as the argument.
The nulling alias will have a new symbol ID.
The return value is a pointer to the nulling alias.
@ @<Private function prototypes@> =
static inline
SYM symbol_alias_create(GRAMMAR g, SYM symbol);
@ @<Function definitions@> = static inline
SYM symbol_alias_create(GRAMMAR g, SYM symbol)
{
    SYM alias = symbol_new(g);
    symbol->t_is_proper_alias = TRUE;
    SYM_is_Nulling(symbol) = FALSE;
    symbol->t_is_nullable = FALSE;
    symbol->t_alias = alias;
    alias->t_is_nulling_alias = TRUE;
    SYM_is_Nulling(alias) = TRUE;
    alias->t_is_nullable = TRUE;
    alias->t_is_productive = TRUE;
    alias->t_is_accessible = symbol->t_is_accessible;
    alias->t_alias = symbol;
    return alias;
}

@ {\bf Symbol callbacks}: The user can define a callback
(with argument) which is invoked whenever a symbol
is created.
@ Function pointer declarations are
hard to type and impossible to read.
This typedef localizes the damage.
@<Callback typedefs@> =
typedef void (Marpa_Symbol_Callback)(struct marpa_g *g, Marpa_Symbol_ID id);
@ @<Widely aligned grammar elements@> =
    Marpa_Symbol_Callback* t_symbol_callback;
    gpointer t_symbol_callback_arg;
@ @<Initialize grammar elements@> =
g->t_symbol_callback_arg = NULL;
g->t_symbol_callback = NULL;
@ @<Function definitions@> =
void marpa_symbol_callback_set(struct marpa_g *g, Marpa_Symbol_Callback*cb)
{ g->t_symbol_callback = cb; }
void marpa_symbol_callback_arg_set(struct marpa_g *g, gpointer cb_arg)
{ g->t_symbol_callback_arg = cb_arg; }
gpointer marpa_symbol_callback_arg(struct marpa_g *g)
{ return g->t_symbol_callback_arg; }
@ @<Public function prototypes@> =
void marpa_symbol_callback_set(struct marpa_g *g, Marpa_Symbol_Callback*cb);
void marpa_symbol_callback_arg_set(struct marpa_g *g, gpointer cb_arg);
gpointer marpa_symbol_callback_arg(struct marpa_g *g);
@ Do the symbol callback.
{\bf To Do}: @^To Do@>
Look at the possibility of leaking memory if the callback
never returns, but the grammar is destroyed.
@<Function definitions@> =
static inline void symbol_callback(struct marpa_g *g, Marpa_Symbol_ID id)
{ Marpa_Symbol_Callback* cb = g->t_symbol_callback;
if (cb) { (*cb)(g, id); } }
@ @<Private function prototypes@> =
static inline void symbol_callback(struct marpa_g *g, Marpa_Symbol_ID id);

@** Rule (RULE) Code.
@s Marpa_Rule_ID int
@<Public typedefs@> =
typedef gint Marpa_Rule_ID;
@ @<Private structures@> =
struct s_rule {
    @<Int aligned rule elements@>@/
    @<Bit aligned rule elements@>@/
    @<Final rule elements@>@/
};
@
@s RULE int
@s RULEID int
@<Private typedefs@> =
struct s_rule;
typedef struct s_rule* RULE;
typedef Marpa_Rule_ID RULEID;

@*0 Rule Construction.
@ Set up the basic data.
This logic is intended to be common to all individual rules.
The name comes from the idea that this logic ``starts''
the initialization of a rule.
@ @<Private function prototypes@> =
PRIVATE_NOT_INLINE
RULE rule_start(GRAMMAR g,
SYMID lhs, SYMID *rhs, gint length);
@ GCC complains about inlining |rule_start| -- it is
not a tiny function, and it is repeated often.
@<Function definitions@> =
PRIVATE_NOT_INLINE
RULE rule_start(GRAMMAR g,
SYMID lhs, SYMID *rhs, gint length)
{
    @<Return |NULL| on failure@>@;
    RULE rule;
    const gint rule_sizeof = G_STRUCT_OFFSET (struct s_rule, t_symbols) +
        (length + 1) * sizeof (rule->t_symbols[0]);
    @<Return failure on invalid rule symbols@>@/
    rule = obstack_alloc (&g->t_obs, rule_sizeof);
    @<Initialize rule symbols@>@/
    @<Initialize rule elements@>@/
    rule_add(g, rule->t_id, rule);
    @<Add this rule to the symbol rule lists@>
    return rule;
}

@ @<Public function prototypes@> =
Marpa_Rule_ID marpa_rule_new(struct marpa_g *g,
Marpa_Symbol_ID lhs, Marpa_Symbol_ID *rhs, gint length);
@ @<Function definitions@> =
Marpa_Rule_ID marpa_rule_new(struct marpa_g *g,
Marpa_Symbol_ID lhs, Marpa_Symbol_ID *rhs, gint length)
{
    Marpa_Rule_ID rule_id;
    RULE rule;
    if (length > MAX_RHS_LENGTH) {
        g->t_error = (Marpa_Error_ID)"rhs too long";
        return -1;
    }
    if (is_rule_duplicate(g, lhs, rhs, length) == TRUE) {
        g->t_error = (Marpa_Error_ID)"duplicate rule";
        return -1;
    }
    rule = rule_start(g, lhs, rhs, length);
    if (!rule) { return -1; }@;
    rule_id = rule->t_id;
    rule_callback(g, rule_id);
    return rule_id;
}

@ @<Public function prototypes@> =
Marpa_Rule_ID marpa_sequence_new(struct marpa_g *g,
Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID rhs_id, Marpa_Symbol_ID separator_id,
gint min, gint flags );
@ @<Function definitions@> =
Marpa_Rule_ID marpa_sequence_new(struct marpa_g *g,
Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID rhs_id, Marpa_Symbol_ID separator_id,
gint min, gint flags )
{
    @<Return |-2| on failure@>@;
    Marpa_Rule_ID original_rule_id;
    RULE original_rule;
    Marpa_Symbol_ID internal_lhs_id, *temp_rhs;@;
    if (is_rule_duplicate(g, lhs_id, &rhs_id, 1) == TRUE) {
        g_context_clear(g);
        g->t_error = (Marpa_Error_ID)"duplicate rule";
        return failure_indicator;
    }

    @<Add the original rule for a sequence@>@;
    @<Check that the separator is valid or -1@>@;
    @<Mark the counted symbols@>@;
    if (min == 0) { @<Add the nulling rule for a sequence@>@; }
    min = 1;
    @<Create the internal LHS symbol@>@;
    @<Allocate the temporary rhs buffer@>@;
    @<Add the top rule for the sequence@>@;
    if (separator_id >= 0 && !(flags & MARPA_PROPER_SEPARATION)) {
        @<Add the alternate top rule for the sequence@>@;
    }
    @<Add the minimum rule for the sequence@>@;
    @<Add the iterating rule for the sequence@>@;
    @<Free the temporary rhs buffer@>@;
    return original_rule_id;
}
@ As a side effect, this checks the LHS and RHS symbols for validity.
@<Add the original rule for a sequence@> =
    original_rule = rule_start(g, lhs_id, &rhs_id, 1);
    if (!original_rule) {
        g_context_clear(g);
        g->t_error = "internal_error";
        return failure_indicator;
    }
    RULE_is_Used(original_rule) = 0;
    original_rule_id = original_rule->t_id;
    original_rule->t_is_discard = !(flags & MARPA_KEEP_SEPARATION)
        && separator_id >= 0;
    rule_callback(g, original_rule_id);

@ @<Check that the separator is valid or -1@> =
if (separator_id != -1 && !symbol_is_valid(g, separator_id)) {
    g_context_clear(g);
    g_context_int_add(g, "symid", separator_id);
    g->t_error = "bad separator";
    return failure_indicator;
}

@ @<Mark the counted symbols@> =
SYM_by_ID(rhs_id)->t_is_counted = 1;
if (separator_id >= 0) { SYM_by_ID(separator_id)->t_is_counted = 1; }
@ @<Add the nulling rule for a sequence@> =
    { RULE rule = rule_start(g, lhs_id, 0, 0);
    if (!rule) { @<Fail with internal grammar error@>@; }
    rule->t_is_semantic_equivalent = TRUE;
    rule->t_original = original_rule_id;
    rule_callback(g, rule->t_id);
    }
@ @<Create the
internal LHS symbol@> =
    internal_lhs_id = ID_of_SYM(symbol_new(g));
    symbol_callback(g, internal_lhs_id);
@ The actual size needed for the RHS buffer is determined by
the longer of the minimum rule and the iterating rule.
The iterating rule may require 3 RHS symbols, if there is
a separator.
(We have $min\ge 1$ at this point.)
The minimum rule will require $1 + 2 * (min - 1)$ symbols
with a separator, and $min$ symbols without.
The allocation below uses a simplified expression, which
overallocates.
Worst case is the minimum rule with a separator, in
which case it allocates room for 4 symbols too many.
@<Allocate the temporary rhs buffer@> =
temp_rhs = g_new(Marpa_Symbol_ID, (3 + (separator_id < 0 ? 1 : 2) * min));
@ @<Free the temporary rhs buffer@> = g_free(temp_rhs);
@ @<Add the top rule for the sequence@> =
{ RULE rule;
temp_rhs[0] = internal_lhs_id;
rule = rule_start(g, lhs_id, temp_rhs, 1);
if (!rule) { @<Fail with internal grammar error@>@; }
rule->t_original = original_rule_id;
rule->t_is_semantic_equivalent = TRUE;
/* Real symbol count remains at default of 0 */
RULE_is_Virtual_RHS(rule) = TRUE;
rule_callback(g, rule->t_id);
}
@ This ``alternate'' top rule is needed if a final separator is allowed.
@<Add the alternate top rule for the sequence@> =
{ RULE rule;
    temp_rhs[0] = internal_lhs_id;
    temp_rhs[1] = separator_id;
    rule = rule_start(g, lhs_id, temp_rhs, 2);
    if (!rule) { @<Fail with internal grammar error@>@; }
    rule->t_original = original_rule_id;
    rule->t_is_semantic_equivalent = TRUE;
    RULE_is_Virtual_RHS(rule) = TRUE;
    Real_SYM_Count_of_RULE(rule) = 1;
    rule_callback(g, rule->t_id);
}
@ The traditional way to write a sequence in BNF is with one
rule to represent the minimum, and another to deal with iteration.
That's the core of Marpa's rewrite.
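
As an illustration only, the two right hand sides can be built by a pair of small stand-alone helpers.
This is a sketch under stated assumptions: |build_minimum_rhs| and |build_iterating_rhs| are hypothetical names,
symbol IDs are plain |int| values,
a negative separator means ``no separator'',
and the caller supplies a buffer that is large enough.

```c
#include <assert.h>

/* Minimum rule: item, then (separator item) repeated min-1 times.
   Returns the number of RHS symbols written. */
static int build_minimum_rhs(int *rhs, int item, int separator, int min)
{
    int n = 0, i;
    assert(rhs != 0 && min >= 1);
    rhs[n++] = item;
    for (i = 0; i < min - 1; i++) {
        if (separator >= 0) rhs[n++] = separator;
        rhs[n++] = item;
    }
    return n;
}

/* Iterating rule: the internal LHS, an optional separator, one more item.
   Returns the number of RHS symbols written (2 or 3). */
static int build_iterating_rhs(int *rhs, int internal_lhs, int item,
                               int separator)
{
    int n = 0;
    assert(rhs != 0);
    rhs[n++] = internal_lhs;
    if (separator >= 0) rhs[n++] = separator;
    rhs[n++] = item;
    return n;
}
```

With a separator and a minimum of 3, the first helper writes 5 symbols, which matches the $1 + 2 * (min - 1)$ count;
the second never writes more than 3.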
@<Add the minimum rule for the sequence@> =
{ RULE rule;
gint rhs_ix, i;
    temp_rhs[0] = rhs_id;
    rhs_ix = 1;
    for (i = 0; i < min - 1; i++) {
        if (separator_id >= 0) temp_rhs[rhs_ix++] = separator_id;
        temp_rhs[rhs_ix++] = rhs_id;
    }
    rule = rule_start(g, internal_lhs_id, temp_rhs, rhs_ix);
    if (!rule) { @<Fail with internal grammar error@>@; }
    RULE_is_Virtual_LHS(rule) = 1;
    Real_SYM_Count_of_RULE(rule) = rhs_ix;
    rule_callback(g, rule->t_id);
}
@ @<Add the iterating rule for the sequence@> =
{ RULE rule;
gint rhs_ix = 0;
    temp_rhs[rhs_ix++] = internal_lhs_id;
    if (separator_id >= 0) temp_rhs[rhs_ix++] = separator_id;
    temp_rhs[rhs_ix++] = rhs_id;
    rule = rule_start(g, internal_lhs_id, temp_rhs, rhs_ix);
    if (!rule) { @<Fail with internal grammar error@>@; }
    RULE_is_Virtual_LHS(rule) = 1;
    RULE_is_Virtual_RHS(rule) = 1;
    Real_SYM_Count_of_RULE(rule) = rhs_ix - 1;
    rule_callback(g, rule->t_id);
}

@ Does this rule duplicate an already existing rule?
A duplicate is a rule with the same lhs symbol,
the same rhs length,
and the same symbol in each position on the rhs.

Note that this definition of duplicate applies to
sequences as well. That means that a sequence rule
can be a duplicate of a non-sequence rule of length 1,
if they have the same lhs symbols and the same rhs
symbol.
Also, that means you cannot define sequences
that differ only in the separator, or only in the
minimum count.

I do not think the
restrictions on sequence rules represent real limitations.
Multiple sequences with the same lhs and rhs would be
very confusing.
And users who really, really want them are free
to write the sequences out as BNF rules.
After all, sequence rules are only a shorthand.
And shorthand is counter-productive when it makes
you lose track of what you are trying to say.

The algorithm is to first get a list of all the rules
with the same LHS, which is very fast because
I have pre-computed it.
If there are no such rules, the new rule is
unique (not a duplicate).
If there are such rules, I look at them,
trying to find one that duplicates the new
rule.
For each old rule, I first compare its length to
the new rule, and then its right hand side
symbols, one by one.
If all these comparisons succeed, I conclude
that the old rule duplicates the new one
and return |TRUE|.
If, after having done the comparison for all
the ``same LHS" rules, I have found no duplicates,
then I conclude there is no duplicate of the new
rule, and return |FALSE|.
@ @<Private function prototypes@> =
static inline
gboolean is_rule_duplicate(struct marpa_g* g,
Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID* rhs_ids, gint length);
@ @<Function definitions@> =
static inline
gboolean is_rule_duplicate(struct marpa_g* g,
Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID* rhs_ids, gint length)
{
    gint ix;
    SYM lhs = SYM_by_ID(lhs_id);
    GArray* same_lhs_array = lhs->t_lhs;
    gint same_lhs_count = same_lhs_array->len;
    for (ix = 0; ix < same_lhs_count; ix++) {
        RULEID same_lhs_rule_id = ((RULEID *)(same_lhs_array->data))[ix];
        gint rhs_position;
        RULE rule = RULE_by_ID(g, same_lhs_rule_id);
        const gint rule_length = Length_of_RULE(rule);
        if (rule_length != length) { goto RULE_IS_NOT_DUPLICATE; }
        for (rhs_position = 0; rhs_position < rule_length; rhs_position++) {
            if (RHS_ID_of_RULE(rule, rhs_position) != rhs_ids[rhs_position]) {
                goto RULE_IS_NOT_DUPLICATE;
            }
        }
        return TRUE; /* This rule duplicates the new one */
        RULE_IS_NOT_DUPLICATE: ;
    }
    return FALSE; /* No duplicate rules were found */
}
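@ The duplicate test can be exercised without a grammar object.
What follows is a minimal standalone sketch, not the \libmarpa/ code:
|struct toy_rule| and |toy_is_duplicate| are hypothetical names,
plain |int| stands in for the symbol ID type,
and the sketch scans every rule
instead of using a pre-computed per-LHS list.
The comparison itself follows the description above:
same LHS, same length, same symbol in each RHS position.

```c
#include <assert.h>

/* Hypothetical, simplified rule: the LHS is symbols[0] and the
 * RHS is symbols[1..length], mirroring the t_symbols layout. */
struct toy_rule {
    int length;          /* number of RHS symbols */
    const int *symbols;  /* LHS followed by the RHS symbols */
};

/* Returns 1 if some existing rule has the same LHS, the same RHS
 * length, and the same symbol in every RHS position; 0 otherwise. */
static int toy_is_duplicate(const struct toy_rule *rules, int rule_count,
                            int lhs_id, const int *rhs_ids, int length)
{
    int ix;
    assert(length >= 0);
    for (ix = 0; ix < rule_count; ix++) {
        const struct toy_rule *rule = &rules[ix];
        int pos;
        if (rule->symbols[0] != lhs_id) continue;  /* different LHS */
        if (rule->length != length) continue;      /* different length */
        for (pos = 0; pos < length; pos++) {
            if (rule->symbols[pos + 1] != rhs_ids[pos]) break;
        }
        if (pos == length) return 1;  /* every position matched */
    }
    return 0;
}
```

Because only the LHS and RHS are compared, a sequence rule with RHS symbol 2
and a length-1 BNF rule with the same LHS and the same RHS symbol
count as duplicates in this sketch, as the text describes.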

@ Add the rules to the symbol's rule lists:
An obstack scratchpad might be useful for
the copy of the RHS symbols.
|alloca|, while tempting, should not be used
because an unusually long RHS could cause
a stack overflow.
Even if such a case is pathological,
a core dump is not the right response.
@<Add this rule to the symbol rule lists@> =
    symbol_lhs_add(SYM_by_ID(rule->t_symbols[0]), rule->t_id);@;
    if (Length_of_RULE(rule) > 0) {
        gint rh_list_ix;
        const guint alloc_size = Length_of_RULE(rule)*sizeof( SYMID);
        Marpa_Symbol_ID *rh_symbol_list = g_slice_alloc(alloc_size);
        gint rh_symbol_list_length = 1;
        @<Create |rh_symbol_list|,
            a duplicate-free list of the right hand side symbols@>@;
        for (rh_list_ix = 0;
            rh_list_ix < rh_symbol_list_length;
            rh_list_ix++) {
            symbol_rhs_add(
                SYM_by_ID(rh_symbol_list[rh_list_ix]),
                rule->t_id);
        }@;
        g_slice_free1(alloc_size, rh_symbol_list);
    }

@ \marpa_sub{Create a duplicate-free list of the right hand side symbols}
The algorithm is a
hand-coded
insertion sort, modified to not insert duplicates.
@ The first goal is to optimize for the usual case,
where both the average and root mean square of
the number of unique symbols on the RHS of a rule
is a small number -- usually less
than 10.
(Root mean square is more relevant than the average for
comparison with worst case performance.)
A hand-inlined insertion sort is perfect for
this.
\par It might be thought that the below could
be improved by finding the insertion point
with a binary search, but when the number of RHS symbols
for most rules is less than a certain number,
the higher-overhead binary search is worse,
not better.
This number is probably around 8, and in practice most rules
are shorter than that.
A reasonable alternative is to only use binary search above
a certain size, but in most cases that will produce no
measurable improvement.

@ A second goal is that behavior for unusual and pathological
cases be, if not optimal, reasonable.
Worst case for insertion sort is $O(n^2)$.
(This is why I used the root mean square, not a simple average.)
This would be approached if most of the right hand symbols were
in very long rules.
$O(n^2)$ is, in fact, no worse than the worst case of the quicksort
on which |qsort| is usually based.
The hand-coding here means it would take some effort to
construct a case in which
the theoretical advantage of another
sort algorithm would
show up in practice.
\par If anyone comes to care about very long right hand sides,
this algorithm can be changed to switch over to mergesort
when the right hand side exceeds a certain length.
The cost of an extra comparison is tiny, but then again,
so is the likelihood of any benefit from an alternative sort
algorithm.

@ The code assumes that the rhs has length greater than zero.
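\par
For readers who want to experiment with the technique separately,
here is a minimal standalone sketch of an insertion sort that skips duplicates.
It is an illustration under stated assumptions, not the \libmarpa/ code:
the name |sorted_unique| is hypothetical, plain |int| stands in for the
symbol ID type, and, unlike the code in this section, the sketch
also accepts an empty input.

```c
#include <assert.h>

/* Copies the distinct values of in[0..n-1] into out[] in ascending
 * order and returns how many were written.
 * out must have room for n values. */
static int sorted_unique(const int *in, int n, int *out)
{
    int count = 0;
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        const int value = in[i];
        int pos = count - 1;
        int is_duplicate = 0;
        int j;
        /* Walk down from the top until a smaller value is found. */
        while (pos >= 0) {
            if (out[pos] == value) { is_duplicate = 1; break; }
            if (out[pos] < value) break;
            pos--;
        }
        if (is_duplicate) continue;
        /* Shift the larger values up one slot, then insert. */
        for (j = count - 1; j > pos; j--) out[j + 1] = out[j];
        out[pos + 1] = value;
        count++;
    }
    return count;
}
```

For short inputs the linear walk from the top costs only a few comparisons,
which is the case the discussion above optimizes for.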
@<Create |rh_symbol_list|, a duplicate-free list of the right hand side symbols@> =
{
/* Handle the first symbol as a special case */
gint rhs_ix = Length_of_RULE (rule) - 1;
rh_symbol_list[0] = RHS_ID_of_RULE(rule, (unsigned)rhs_ix);
rh_symbol_list_length = 1;
rhs_ix--;
for (; rhs_ix >= 0; rhs_ix--) {
    gint higher_ix;
    Marpa_Symbol_ID new_symid = RHS_ID_of_RULE(rule, (unsigned)rhs_ix);
    gint next_highest_ix = rh_symbol_list_length - 1;
    while (next_highest_ix >= 0) {
        Marpa_Symbol_ID current_symid = rh_symbol_list[next_highest_ix];
        if (current_symid == new_symid) goto ignore_this_symbol;
        if (current_symid < new_symid) break;
        next_highest_ix--;
    }
    /* Shift the higher symbol ID's up one slot */
    for (higher_ix = rh_symbol_list_length-1;
            higher_ix > next_highest_ix;
            higher_ix--) {
        rh_symbol_list[higher_ix+1] = rh_symbol_list[higher_ix];
    }
    /* Insert the next symbol */
    rh_symbol_list[next_highest_ix+1] = new_symid;
    rh_symbol_list_length++;
    ignore_this_symbol: ;
}
}

@*0 Rule Symbols.
A rule takes the traditional form of
a left hand side (LHS), and a right hand side (RHS).
The {\bf length} of a rule is the length of the RHS ---
there is always exactly one LHS symbol.
Maximum length of the RHS is restricted.
I take off two more bits than necessary, as a fudge
factor.
This is only checked for new rules.
The rules generated internally by libmarpa
are shorter than
a small constant in length, and
rewrites of existing rules shorten them.
On a 32-bit machine, this still allows a RHS of over half a billion
symbols.
I believe
by the time 64-bit machines become universal,
nobody will have noticed this restriction.
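\par
The arithmetic behind the limit can be checked directly.
This is a minimal sketch which assumes a 32-bit |int|,
so that glib's |G_MAXINT| has the same value as the standard |INT32_MAX|;
the function name |max_rhs_length_32bit| is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: int is 32 bits wide, so G_MAXINT == INT32_MAX.
 * Shifting right by 2 drops two bits from the maximum:
 * (2^31 - 1) >> 2 == 2^29 - 1. */
static int32_t max_rhs_length_32bit(void)
{
    return INT32_MAX >> 2;
}
```

Under that assumption the limit is 536,870,911 symbols.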
1904@d MAX_RHS_LENGTH (G_MAXINT >> (2)) 1905@d Length_of_RULE(rule) ((rule)->t_rhs_length) 1906@<Int aligned rule elements@> = gint t_rhs_length; 1907@ The symbols come at the end of the |marpa_rule| structure, 1908so that they can be variable length. 1909@<Final rule elements@> = Marpa_Symbol_ID t_symbols[1]; 1910 1911@ @<Return failure on invalid rule symbols@> = 1912{ 1913 SYMID symid = lhs; 1914 @<Fail if grammar |symid| is invalid@>@; 1915} 1916{ gint rh_index; 1917 for (rh_index = 0; rh_index<length; rh_index++) { 1918 SYMID symid = rhs[rh_index]; 1919 @<Fail if grammar |symid| is invalid@>@; 1920 } 1921} 1922 1923@ @<Initialize rule symbols@> = 1924Length_of_RULE(rule) = length; 1925rule->t_symbols[0] = lhs; 1926{ gint i; for (i = 0; i<length; i++) { 1927 rule->t_symbols[i+1] = rhs[i]; } } 1928@ @<Function definitions@> = 1929static inline Marpa_Symbol_ID rule_lhs_get(RULE rule) { 1930 return rule->t_symbols[0]; } 1931@ @<Private function prototypes@> = 1932static inline Marpa_Symbol_ID rule_lhs_get(RULE rule); 1933@ @<Function definitions@> = 1934Marpa_Symbol_ID marpa_rule_lhs(struct marpa_g *g, Marpa_Rule_ID rule_id) { 1935 @<Return |-2| on failure@>@; 1936 @<Fail if grammar |rule_id| is invalid@>@; 1937 return rule_lhs_get(RULE_by_ID(g, rule_id)); } 1938@ @<Public function prototypes@> = 1939Marpa_Symbol_ID marpa_rule_lhs(struct marpa_g *g, Marpa_Rule_ID rule_id); 1940@ @<Function definitions@> = 1941static inline Marpa_Symbol_ID* rule_rhs_get(RULE rule) { 1942 return rule->t_symbols+1; } 1943@ @<Private function prototypes@> = 1944static inline Marpa_Symbol_ID* rule_rhs_get(RULE rule); 1945@ @<Public function prototypes@> = 1946Marpa_Symbol_ID marpa_rule_rh_symbol(struct marpa_g *g, Marpa_Rule_ID rule_id, gint ix); 1947@ @<Function definitions@> = 1948Marpa_Symbol_ID marpa_rule_rh_symbol(struct marpa_g *g, Marpa_Rule_ID rule_id, gint ix) { 1949 RULE rule; 1950 @<Return |-2| on failure@>@; 1951 @<Fail if grammar |rule_id| is invalid@>@; 1952 rule = 
RULE_by_ID(g, rule_id); 1953 if (Length_of_RULE(rule) <= ix) return -1; 1954 return RHS_ID_of_RULE(rule, ix); 1955} 1956@ @<Function definitions@> = 1957static inline gsize rule_length_get(RULE rule) { 1958 return Length_of_RULE(rule); } 1959@ @<Private function prototypes@> = 1960static inline gsize rule_length_get(RULE rule); 1961@ @<Function definitions@> = 1962gint marpa_rule_length(struct marpa_g *g, Marpa_Rule_ID rule_id) { 1963 @<Return |-2| on failure@>@; 1964 @<Fail if grammar |rule_id| is invalid@>@; 1965 return rule_length_get(RULE_by_ID(g, rule_id)); } 1966@ @<Public function prototypes@> = 1967gint marpa_rule_length(struct marpa_g *g, Marpa_Rule_ID rule_id); 1968 1969@*1 Symbols of the Rule. 1970@d LHS_ID_of_RULE(rule) ((rule)->t_symbols[0]) 1971@d RHS_ID_of_RULE(rule, position) 1972 ((rule)->t_symbols[(position)+1]) 1973 1974@*0 Rule ID. 1975The {\bf rule ID} is a number which 1976acts as the unique identifier for a rule. 1977@d ID_of_RULE(rule) ((rule)->t_id) 1978@<Int aligned rule elements@> = Marpa_Rule_ID t_id; 1979@ @<Initialize rule elements@> = rule->t_id = g->t_rules->len; 1980 1981@*0 Rule Boolean: Keep Separator. 1982When this rule is evaluated by the semantics, 1983do they want to see the separators? 1984Default is that they are thrown away. 1985Usually the role of the separators is only syntactic, 1986and that is what is wanted. 1987For non-sequence rules, this flag should be false. 1988@<Public defines@> = 1989#define MARPA_KEEP_SEPARATION @| @[0x1@]@/ 1990@ @<Bit aligned rule elements@> = guint t_is_discard:1; 1991@ @<Initialize rule elements@> = 1992rule->t_is_discard = FALSE; 1993@ @<Function definitions@> = 1994gboolean marpa_rule_is_discard_separation(struct marpa_g* g, Marpa_Rule_ID id) 1995{ return RULE_by_ID(g, id)->t_is_discard; } 1996@ @<Public function prototypes@> = 1997gboolean marpa_rule_is_discard_separation(struct marpa_g* g, Marpa_Rule_ID id); 1998 1999@*0 Rule Boolean: Proper Separation. 
In Marpa's terminology,
proper separation means that a sequence
cannot legally end with a separator.
In ``proper" separation,
the term separator is interpreted strictly,
as something which separates two list items.
A separator coming after the final list item does not separate
two items, and therefore traditionally was considered a syntax
error.
\par
Proper separation is often inconvenient,
or even counter-productive.
Increasingly, the
practice is to be ``liberal"
and to allow a separator to come after the last list
item.
Liberal separation is the default in Marpa.
\par
There is no bitfield for this, because proper separation is
a completely syntactic matter,
taken care of in the rewrite itself.
@<Public defines@> =
#define MARPA_PROPER_SEPARATION @| @[0x2@]@/

@*0 Accessible Rules.
@ A rule is accessible if its LHS is accessible.
@<Function definitions@> =
static inline gint rule_is_accessible(struct marpa_g* g, RULE rule)
{
Marpa_Symbol_ID lhs_id = LHS_ID_of_RULE(rule);
    return SYM_by_ID(lhs_id)->t_is_accessible; }
gint marpa_rule_is_accessible(struct marpa_g* g, Marpa_Rule_ID rule_id)
{
    @<Return |-2| on failure@>@;
RULE rule;
    @<Fail if grammar |rule_id| is invalid@>@;
rule = RULE_by_ID(g, rule_id);
return rule_is_accessible(g, rule);
}
@ @<Private function prototypes@> =
static inline gint rule_is_accessible(struct marpa_g* g, RULE rule);
@ @<Public function prototypes@> =
gint marpa_rule_is_accessible(struct marpa_g* g, Marpa_Rule_ID id);

@*0 Productive Rules.
@ A rule is productive if every symbol on its RHS is productive.
2046@<Function definitions@> = 2047static inline gint rule_is_productive(struct marpa_g* g, RULE rule) 2048{ 2049gint rh_ix; 2050for (rh_ix = 0; rh_ix < Length_of_RULE(rule); rh_ix++) { 2051 Marpa_Symbol_ID rhs_id = RHS_ID_of_RULE(rule, rh_ix); 2052 if ( !SYM_by_ID(rhs_id)->t_is_productive ) return FALSE; 2053} 2054return TRUE; } 2055gint marpa_rule_is_productive(struct marpa_g* g, Marpa_Rule_ID rule_id) 2056{ 2057 @<Return |-2| on failure@>@; 2058RULE rule; 2059 @<Fail if grammar |rule_id| is invalid@>@; 2060rule = RULE_by_ID(g, rule_id); 2061return rule_is_productive(g, rule); 2062} 2063@ @<Private function prototypes@> = 2064static inline gint rule_is_productive(struct marpa_g* g, RULE rule); 2065@ @<Public function prototypes@> = 2066gint marpa_rule_is_productive(struct marpa_g* g, Marpa_Rule_ID id); 2067 2068@*0 Loop Rule. 2069@ A rule is a loop rule if it non-trivially 2070produces the string of length one 2071which consists only of its LHS symbol. 2072``Non-trivially" means the zero-step derivation does not count -- the 2073derivation must have at least one step. 2074@<Bit aligned rule elements@> = guint t_is_loop:1; 2075@ @<Initialize rule elements@> = 2076rule->t_is_loop = FALSE; 2077@ This is the external accessor. 2078The internal accessor would be trivial, so there is none. 2079@<Function definitions@> = 2080gint marpa_rule_is_loop(struct marpa_g* g, Marpa_Rule_ID rule_id) 2081{ 2082 @<Return |-2| on failure@>@; 2083 @<Fail if grammar |rule_id| is invalid@>@; 2084return RULE_by_ID(g, rule_id)->t_is_loop; } 2085@ @<Public function prototypes@> = 2086gint marpa_rule_is_loop(struct marpa_g* g, Marpa_Rule_ID rule_id); 2087 2088@*0 Virtual Loop Rule. 2089@ When dealing with rules which result from the CHAF rewrite, 2090it is convenient to recognize the ``loop rule" property as belonging 2091to only one of the pieces. 2092The ``virtual loop rule" property exists for this purpose. 2093All virtual loop rules are loop rules, 2094but not vice versa. 
2095@<Bit aligned rule elements@> = guint t_is_virtual_loop:1; 2096@ @<Initialize rule elements@> = 2097rule->t_is_virtual_loop = FALSE; 2098@ This is the external accessor. 2099The internal accessor would be trivial, so there is none. 2100@<Function definitions@> = 2101gint marpa_rule_is_virtual_loop(struct marpa_g* g, Marpa_Rule_ID rule_id) 2102{ 2103 @<Return |-2| on failure@>@; 2104 @<Fail if grammar |rule_id| is invalid@>@; 2105return RULE_by_ID(g, rule_id)->t_is_virtual_loop; } 2106@ @<Public function prototypes@> = 2107gint marpa_rule_is_virtual_loop(struct marpa_g* g, Marpa_Rule_ID rule_id); 2108 2109@*0 Nulling Rules. 2110@ A rule is nulling if every symbol on its RHS is nulling. 2111Note that this can be vacuously true --- an empty rule is nulling. 2112@<Function definitions@> = 2113static inline gint 2114rule_is_nulling (GRAMMAR g, RULE rule) 2115{ 2116 gint rh_ix; 2117 for (rh_ix = 0; rh_ix < Length_of_RULE (rule); rh_ix++) 2118 { 2119 SYMID rhs_id = RHS_ID_of_RULE (rule, rh_ix); 2120 if (!SYM_is_Nulling(SYM_by_ID (rhs_id))) 2121 return FALSE; 2122 } 2123 return TRUE; 2124} 2125@ @<Private function prototypes@> = 2126static inline gint rule_is_nulling(GRAMMAR g, RULE rule); 2127 2128@*0 Is Rule Used?. 2129@d RULE_is_Used(rule) ((rule)->t_is_used) 2130@<Bit aligned rule elements@> = guint t_is_used:1; 2131@ @<Initialize rule elements@> = 2132RULE_is_Used(rule) = 1; 2133@ This is the external accessor. 2134The internal accessor would be trivial, so there is none. 2135@<Function definitions@> = 2136gint marpa_rule_is_used(struct marpa_g* g, Marpa_Rule_ID rule_id) 2137{ 2138 @<Return |-2| on failure@>@; 2139 @<Fail if grammar |rule_id| is invalid@>@; 2140return RULE_is_Used(RULE_by_ID(g, rule_id)); } 2141@ @<Public function prototypes@> = 2142gint marpa_rule_is_used(struct marpa_g* g, Marpa_Rule_ID rule_id); 2143 2144@*0 Is This a Start Rule?. 
@d RULE_is_Start(rule) ((rule)->t_is_start)
@<Bit aligned rule elements@> = guint t_is_start:1;
@ @<Initialize rule elements@> =
rule->t_is_start = FALSE;
@ This is the external accessor.
The internal accessor would be trivial, so there is none.
@<Function definitions@> =
gint marpa_rule_is_start(struct marpa_g* g, Marpa_Rule_ID rule_id)
{
    @<Return |-2| on failure@>@;
    @<Fail if grammar |rule_id| is invalid@>@;
return RULE_by_ID(g, rule_id)->t_is_start; }
@ @<Public function prototypes@> =
gint marpa_rule_is_start(struct marpa_g* g, Marpa_Rule_ID rule_id);

@*0 Rule Boolean: Virtual LHS.
This is for Marpa's ``internal semantics".
When Marpa rewrites rules, it does so in a way invisible to
the user's semantics.
It does this by marking rules so that it can reassemble
the results of rewritten rules to appear ``as if"
they were the result of evaluating the original,
un-rewritten rule.
\par
All Marpa's rewrites allow the rewritten rules to be
``dummied up" to look like the originals.
That this must be possible for any rewrite was one of
Marpa's design criteria.
It was an especially non-negotiable criterion, because
almost the only reason for parsing a grammar is to apply the
semantics specified for the original grammar.
@d RULE_is_Virtual_LHS(rule) ((rule)->t_is_virtual_lhs)
@<Bit aligned rule elements@> = guint t_is_virtual_lhs:1;
@ @<Initialize rule elements@> =
RULE_is_Virtual_LHS(rule) = FALSE;
@ The internal accessor would be trivial, so there is none.
2181@<Function definitions@> = 2182gboolean marpa_rule_is_virtual_lhs(struct marpa_g* g, Marpa_Rule_ID rule_id) 2183{ 2184@<Return |-2| on failure@>@; 2185@<Fail if grammar |rule_id| is invalid@>@; 2186return RULE_is_Virtual_LHS(RULE_by_ID(g, rule_id)); } 2187@ @<Public function prototypes@> = 2188gboolean marpa_rule_is_virtual_lhs(struct marpa_g* g, Marpa_Rule_ID rule_id); 2189 2190@*0 Rule Boolean: Virtual RHS. 2191@d RULE_is_Virtual_RHS(rule) ((rule)->t_is_virtual_rhs) 2192@<Bit aligned rule elements@> = guint t_is_virtual_rhs:1; 2193@ @<Initialize rule elements@> = 2194RULE_is_Virtual_RHS(rule) = FALSE; 2195@ The internal accessor would be trivial, so there is none. 2196@<Function definitions@> = 2197gboolean marpa_rule_is_virtual_rhs(struct marpa_g* g, Marpa_Rule_ID rule_id) 2198{ 2199@<Return |-2| on failure@>@; 2200@<Fail if grammar |rule_id| is invalid@>@; 2201return RULE_is_Virtual_RHS(RULE_by_ID(g, rule_id)); } 2202@ @<Public function prototypes@> = 2203gboolean marpa_rule_is_virtual_rhs(struct marpa_g* g, Marpa_Rule_ID rule_id); 2204 2205@*0 Virtual Start Position. 2206For a virtual rule, 2207this is the RHS position in the original rule 2208where this one starts. 2209@<Int aligned rule elements@> = gint t_virtual_start; 2210@ @<Initialize rule elements@> = rule->t_virtual_start = -1; 2211@ @<Function definitions@> = 2212guint marpa_virtual_start(struct marpa_g *g, Marpa_Rule_ID rule_id) 2213{ 2214@<Return |-2| on failure@>@; 2215@<Fail if grammar |rule_id| is invalid@>@; 2216return RULE_by_ID(g, rule_id)->t_virtual_start; 2217} 2218@ @<Public function prototypes@> = 2219guint marpa_virtual_start(struct marpa_g *g, Marpa_Rule_ID rule_id); 2220 2221@*0 Virtual End Position. 2222For a virtual rule, 2223this is the RHS position in the original rule 2224at which this one ends. 
2225@<Int aligned rule elements@> = gint t_virtual_end; 2226@ @<Initialize rule elements@> = rule->t_virtual_end = -1; 2227@ @<Function definitions@> = 2228guint marpa_virtual_end(struct marpa_g *g, Marpa_Rule_ID rule_id) 2229{ 2230@<Return |-2| on failure@>@; 2231@<Fail if grammar |rule_id| is invalid@>@; 2232return RULE_by_ID(g, rule_id)->t_virtual_end; 2233} 2234@ @<Public function prototypes@> = 2235guint marpa_virtual_end(struct marpa_g *g, Marpa_Rule_ID rule_id); 2236 2237@*0 Rule Callbacks. 2238The user can define a callback 2239(with argument) which is invoked whenever a rule 2240is created. 2241@ Function pointer declarations are 2242hard to type and impossible to read. 2243This typedef localizes the damage. 2244@<Callback typedefs@> = 2245typedef void (Marpa_Rule_Callback)(struct marpa_g *g, Marpa_Rule_ID id); 2246@ @<Widely aligned grammar elements@> = 2247 Marpa_Rule_Callback* t_rule_callback; 2248 gpointer t_rule_callback_arg; 2249@ @<Initialize grammar elements@> = 2250g->t_rule_callback_arg = NULL; 2251g->t_rule_callback = NULL; 2252@ @<Function definitions@> = 2253void marpa_rule_callback_set(struct marpa_g *g, Marpa_Rule_Callback*cb) 2254{ g->t_rule_callback = cb; } 2255@ @<Public function prototypes@> = 2256void marpa_rule_callback_set(struct marpa_g *g, Marpa_Rule_Callback*cb); 2257@ @<Function definitions@> = 2258void marpa_rule_callback_arg_set(struct marpa_g *g, gpointer cb_arg) 2259{ g->t_rule_callback_arg = cb_arg; } 2260@ @<Public function prototypes@> = 2261void marpa_rule_callback_arg_set(struct marpa_g *g, gpointer cb_arg); 2262@ @<Function definitions@> = 2263gpointer marpa_rule_callback_arg(struct marpa_g *g) 2264{ return g->t_rule_callback_arg; } 2265@ @<Public function prototypes@> = 2266gpointer marpa_rule_callback_arg(struct marpa_g *g); 2267@ Do the rule callback. 
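\par
The callback-plus-argument pattern can be tried out in isolation.
The following is a minimal standalone sketch, not the \libmarpa/ API:
|struct toy_g|, |toy_rule_callback| and |count_rules| are hypothetical
stand-ins for the grammar object, the internal dispatch and a user callback,
and plain |int| stands in for the rule ID type.
The user callback retrieves its argument from the grammar object,
which is the role the callback argument accessor plays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the grammar object. */
struct toy_g {
    void (*rule_callback)(struct toy_g *g, int rule_id);
    void *rule_callback_arg;
};

/* Dispatch: call the user's callback, if one has been set. */
static void toy_rule_callback(struct toy_g *g, int rule_id)
{
    if (g->rule_callback) { (*g->rule_callback)(g, rule_id); }
}

/* Example user callback: counts rule creations.
 * The counter is passed in through the callback argument. */
static void count_rules(struct toy_g *g, int rule_id)
{
    int *counter = (int *)g->rule_callback_arg;
    (void)rule_id;  /* the ID is not needed for counting */
    (*counter)++;
}
```

With no callback set the dispatch does nothing,
so a grammar that never registers one pays only for the null test.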
@<Private function prototypes@> =
static inline void rule_callback(struct marpa_g *g, Marpa_Rule_ID id);
@ {\bf To Do}: @^To Do@>
Look at the possibility of leaking memory if the callback
never returns, but the grammar is destroyed.
@<Function definitions@> =
static inline void rule_callback(struct marpa_g *g, Marpa_Rule_ID id)
{ Marpa_Rule_Callback* cb = g->t_rule_callback;
if (cb) { (*cb)(g, id); } }

@*0 Rule Original.
In many cases, Marpa will rewrite a rule.
If this rule is the result of a rewriting, this element contains
the ID of the original rule.
@ @<Int aligned rule elements@> = Marpa_Rule_ID t_original;
@ @<Initialize rule elements@> = rule->t_original = -1;
@ @<Function definitions@> =
Marpa_Rule_ID marpa_rule_original(struct marpa_g *g, Marpa_Rule_ID rule_id)
{
@<Return |-2| on failure@>@;
@<Fail if grammar |rule_id| is invalid@>@;
return RULE_by_ID(g, rule_id)->t_original;
}
@ @<Public function prototypes@> =
Marpa_Rule_ID marpa_rule_original(struct marpa_g *g, Marpa_Rule_ID rule_id);

@*0 Rule Real Symbol Count.
This is another data element used for the ``internal semantics" --
the logic to reassemble results of rewritten rules so that they
look as if they came from the original, un-rewritten rules.
The value of this field is meaningful if and only if
the rule has a virtual rhs or a virtual lhs.
2300@d Real_SYM_Count_of_RULE(rule) ((rule)->t_real_symbol_count) 2301@ @<Int aligned rule elements@> = gint t_real_symbol_count; 2302@ @<Initialize rule elements@> = Real_SYM_Count_of_RULE(rule) = 0; 2303@ @<Public function prototypes@> = 2304gint marpa_real_symbol_count(struct marpa_g *g, Marpa_Rule_ID rule_id); 2305@ @<Function definitions@> = 2306gint marpa_real_symbol_count(struct marpa_g *g, Marpa_Rule_ID rule_id) 2307{ 2308@<Return |-2| on failure@>@; 2309@<Fail if grammar |rule_id| is invalid@>@; 2310return Real_SYM_Count_of_RULE(RULE_by_ID(g, rule_id)); 2311} 2312 2313@*0 Semantic Equivalents. 2314@<Bit aligned rule elements@> = guint t_is_semantic_equivalent:1; 2315@ @<Initialize rule elements@> = 2316rule->t_is_semantic_equivalent = FALSE; 2317@ Semantic equivalence arises out of Marpa's rewritings. 2318When a rule is rewritten, 2319some (but not all!) of the resulting rules have the 2320same semantics as the original rule. 2321It is this ``original rule" that |semantic_equivalent()| returns. 2322 2323@ If this rule is the semantic equivalent of another rule, 2324this external accessor returns the ``original rule". 2325Otherwise it returns -1. 2326@<Public function prototypes@> = 2327Marpa_Rule_ID marpa_rule_semantic_equivalent(struct marpa_g* g, Marpa_Rule_ID id); 2328@ @<Function definitions@> = 2329Marpa_Rule_ID 2330marpa_rule_semantic_equivalent (struct marpa_g *g, Marpa_Rule_ID rule_id) 2331{ 2332 RULE rule; 2333@<Return |-2| on failure@>@; 2334@<Fail if grammar |rule_id| is invalid@>@; 2335 rule = RULE_by_ID (g, rule_id); 2336 if (RULE_is_Virtual_LHS(rule)) return -1; 2337 if (rule->t_is_semantic_equivalent) return rule->t_original; 2338 return rule_id; 2339} 2340 2341@** Symbol Instance (SYMI) Code. 
@<Private typedefs@> = typedef gint SYMI;
@ @d SYMI_Count_of_G(g) ((g)->t_symbol_instance_count)
@<Int aligned grammar elements@> =
gint t_symbol_instance_count;
@ |SYMI_of_Completed_RULE| assumes that the rule is
not zero length.
|SYMI_of_Last_AIM_of_RULE| will return -1 if the
rule has no proper symbols.
@d SYMI_of_RULE(rule) ((rule)->t_symbol_instance_base)
@d Last_Proper_SYMI_of_RULE(rule) ((rule)->t_last_proper_symi)
@d SYMI_of_Completed_RULE(rule)
    (SYMI_of_RULE(rule) + Length_of_RULE(rule)-1)
@d SYMI_of_AIM(aim) (symbol_instance_of_ahfa_item_get(aim))
@<Int aligned rule elements@> =
gint t_symbol_instance_base;
gint t_last_proper_symi;
@ @<Initialize rule elements@> =
Last_Proper_SYMI_of_RULE(rule) = -1;
@ @<Private function prototypes@> =
static inline gint symbol_instance_of_ahfa_item_get(AIM aim);
@ Symbol instances are for the {\bf predot} symbol.
In parsing the emphasis is on what is to come ---
on what follows the dot.
Symbol instances are used in evaluation.
In evaluation we are looking at what we have,
so the emphasis is on what precedes the dot position.
@ The symbol instance of a prediction is $-1$.
If the AHFA item is not a prediction, then it has a preceding
AHFA item for the same rule.
In that case the symbol instance is the
base symbol instance for
the rule, offset by the position of that preceding AHFA item.
@<Function definitions@> =
static inline gint
symbol_instance_of_ahfa_item_get (AIM aim)
{
  gint position = Position_of_AIM (aim);
  const gint null_count = Null_Count_of_AIM(aim);
  if (position < 0 || position - null_count > 0) {
      /* If this AHFA item is not a prediction */
      const RULE rule = RULE_of_AIM (aim);
      position = Position_of_AIM(aim-1);
      return SYMI_of_RULE(rule) + position;
  }
  return -1;
}

@** Precomputing the Grammar.
Marpa's logic divides roughly into three pieces -- grammar precomputation,
the actual parsing of input tokens,
and semantic evaluation.
Precomputing the grammar is complex enough to divide into several
stages of its own, which are
covered in the next few
sections.
This section describes the top-level method for precomputation,
which is external.

@<Function definitions@> =
struct marpa_g* marpa_precompute(struct marpa_g* g)
{
    if (!census(g)) return NULL;
    if (!CHAF_rewrite(g)) return NULL;
    if (!g_augment(g)) return NULL;
    loop_detect(g);
    create_AHFA_items(g);
    create_AHFA_states(g);
    @<Populate the Terminal Boolean Vector@>@;
    return g;
}
@ @<Public function prototypes@> =
struct marpa_g* marpa_precompute(struct marpa_g* g);

@** The Grammar Census.

@*0 Implementation: Inaccessible and Unproductive Rules.
The textbooks say that,
in order to automatically {\bf eliminate} inaccessible and unproductive
productions from a grammar, you have to first eliminate the
unproductive productions, {\bf then} the inaccessible ones.

In practice, this advice does not seem very helpful.
Imagine the (quite possible) case
of an unproductive start symbol.
Following the
correct procedure for automatically cleaning the grammar, I would
have to regard the start symbol and its productions as eliminated
and therefore go on to report every other production and symbol as
inaccessible. Almost certainly all these inaccessibility reports,
while theoretically correct, would be irrelevant.
What the user probably wants
is to make the start symbol productive.

In |libmarpa|,
inaccessibility is determined based on the assumption that
unproductive symbols will be made productive somehow,
and not eliminated.
2439The downside of this choice is that, in a few uncommon cases, 2440a user relying entirely 2441on the Marpa::XS warnings to clean up his grammar will have to go through 2442more than a single pass of the diagnostics. 2443(As of this writing, I personally have yet to encounter such a case.) 2444The upside is that in the more frequent cases, the user is spared 2445a lot of useless diagnostics. 2446 2447@<Function definitions@> = 2448static struct marpa_g* census(struct marpa_g* g) 2449{ 2450 @<Return |NULL| on failure@>@; 2451 @<Declare census variables@>@; 2452 @<Return |NULL| if empty grammar@>@; 2453 @<Return |NULL| if already precomputed@>@; 2454 @<Return |NULL| if bad start symbol@>@; 2455 @<Census LHS symbols@>@; 2456 @<Census terminals@>@; 2457 if (have_marked_terminals) { 2458 @<Fatal if LHS terminal when not allowed@>@; 2459 } else { 2460 @<Fatal if empty rule and unmarked terminals@>; 2461 if (g->t_is_lhs_terminal_ok) { 2462 @<Mark all symbols terminal@>@; 2463 } else { 2464 @<Mark non-LHS symbols terminal@>@; 2465 } 2466 } 2467 @<Census nullable symbols@>@; 2468 @<Census productive symbols@>@; 2469 @<Check that start symbol is productive@>@; 2470 @<Calculate reach matrix@>@; 2471 @<Census accessible symbols@>@; 2472 @<Census nulling symbols@>@; 2473 @<Free Boolean vectors@>@; 2474 @<Free Boolean matrixes@>@; 2475 g->t_is_precomputed = TRUE; 2476 return g; 2477} 2478@ @<Private function prototypes@> = 2479static struct marpa_g* census(struct marpa_g* g); 2480@ @<Declare census variables@> = 2481guint pre_rewrite_rule_count = g->t_rules->len; 2482guint pre_rewrite_symbol_count = g->t_symbols->len; 2483 2484@ @<Return |NULL| if empty grammar@> = 2485if (g->t_rules->len <= 0) { g->t_error = "no rules"; return NULL; } 2486@ The upper layers have a lot of latitude with this one. 2487There's no harm done, so the upper layers can simply ignore this one. 
2488On the other hand, the upper layer may see this as a sign of a major 2489logic error, and treat it as a fatal error. 2490Anything in between these two extremes is also possible. 2491@<Return |NULL| if already precomputed@> = 2492if (G_is_Precomputed(g)) { g->t_error = "precomputed"; return NULL; } 2493@ Loop over the rules, producing bit vector of LHS symbols, and of 2494symbols which are the LHS of empty rules. 2495While at it, set a flag to indicate if there are empty rules. 2496 2497@ @<Return |NULL| if bad start symbol@> = 2498if (original_start_symid < 0) { 2499 g_context_clear(g); 2500 g->t_error = "no start symbol"; 2501 return failure_indicator; 2502} 2503if (!symbol_is_valid(g, original_start_symid)) { 2504 g_context_clear(g); 2505 g_context_int_add(g, "symid", original_start_symid); 2506 g->t_error = "invalid start symbol"; 2507 return failure_indicator; 2508} 2509original_start_symbol = SYM_by_ID(original_start_symid); 2510if (original_start_symbol->t_lhs->len <= 0) { 2511 g_context_clear(g); 2512 g_context_int_add(g, "symid", original_start_symid); 2513 g->t_error = "start symbol not on LHS"; 2514 return failure_indicator; 2515} 2516 2517@ @<Declare census variables@> = 2518Marpa_Symbol_ID original_start_symid = g->t_start_symid; 2519SYM original_start_symbol; 2520 2521@ @<Census LHS symbols@> = 2522{ Marpa_Rule_ID rule_id; 2523lhs_v = bv_create(pre_rewrite_symbol_count); 2524empty_lhs_v = bv_shadow(lhs_v); 2525for (rule_id = 0; 2526 rule_id < (Marpa_Rule_ID)pre_rewrite_rule_count; 2527 rule_id++) { 2528 RULE rule = RULE_by_ID(g, rule_id); 2529 Marpa_Symbol_ID lhs_id = LHS_ID_of_RULE(rule); 2530 bv_bit_set(lhs_v, (guint)lhs_id); 2531 if (Length_of_RULE(rule) <= 0) { 2532 bv_bit_set(empty_lhs_v, (guint)lhs_id); 2533 have_empty_rule = 1; 2534 } 2535} 2536} 2537@ Loop over the symbols, producing the boolean vector of symbols 2538already marked as terminal, 2539and a flag which indicates if there are any. 
2540@<Census terminals@> = 2541{ Marpa_Symbol_ID symid; 2542terminal_v = bv_create(pre_rewrite_symbol_count); 2543for (symid = 0; 2544 symid < (Marpa_Symbol_ID)pre_rewrite_symbol_count; 2545 symid++) { 2546 SYM symbol = SYM_by_ID(symid); 2547 if (SYM_is_Terminal(symbol)) { 2548 bv_bit_set(terminal_v, (guint)symid); 2549 have_marked_terminals = 1; 2550 } 2551} } 2552@ @<Free Boolean vectors@> = 2553bv_free(terminal_v); 2554@ 2555@s Bit_Vector int 2556@<Declare census variables@> = 2557Bit_Vector terminal_v; 2558gboolean have_marked_terminals = 0; 2559 2560@ @<Fatal if empty rule and unmarked terminals@> = 2561if (have_empty_rule && g->t_is_lhs_terminal_ok) { 2562 g->t_error = "empty rule and unmarked terminals"; 2563 return NULL; 2564} 2565@ Any optimization should be for the non-error case, in which 2566there are no LHS terminals, and the entire list of symbols must 2567be scanned to discover this. 2568It is faster to stop scanning symbols on the first error, if there is 2569an error, but when that happens it is a fatal error, 2570and for that, this code is already plenty fast enough. 
2571@<Fatal if LHS terminal when not allowed@> = 2572if (!g->t_is_lhs_terminal_ok) { 2573 gboolean have_bad_lhs = 0; 2574 guint start = 0; 2575 guint min, max; 2576 Bit_Vector bad_lhs_v = bv_clone(terminal_v); 2577 bv_and(bad_lhs_v, bad_lhs_v, lhs_v); 2578 while ( bv_scan(bad_lhs_v, start, &min, &max) ) { 2579 Marpa_Symbol_ID i; 2580 for (i = (Marpa_Symbol_ID)min; i <= (Marpa_Symbol_ID)max; i++) { 2581 g_context_clear(g); 2582 g_context_int_add(g, "symid", i); 2583 grammar_message(g, "lhs is terminal"); 2584 } 2585 start = max+2; 2586 have_bad_lhs = 1; 2587 } 2588 bv_free(bad_lhs_v); 2589 if (have_bad_lhs) { 2590 g->t_error = "lhs is terminal"; 2591 return NULL; 2592 } 2593} 2594 2595@ @<Mark all symbols terminal@> = 2596{ Marpa_Symbol_ID symid; 2597bv_fill(terminal_v); 2598for (symid = 0; symid < (Marpa_Symbol_ID)g->t_symbols->len; symid++) 2599{ SYMID_is_Terminal(symid) = 1; } } 2600@ @<Mark non-LHS symbols terminal@> = 2601{ guint start = 0; 2602guint min, max; 2603bv_not(terminal_v, lhs_v); 2604while ( bv_scan(terminal_v, start, &min, &max) ) { 2605 Marpa_Symbol_ID symid; 2606 for (symid = (Marpa_Symbol_ID)min; symid <= (Marpa_Symbol_ID)max; symid++) { 2607 SYMID_is_Terminal(symid) = 1; 2608 } 2609 start = max+2; 2610} 2611} 2612@ @<Free Boolean vectors@> = 2613bv_free(lhs_v); 2614bv_free(empty_lhs_v); 2615@ @<Declare census variables@> = 2616Bit_Vector lhs_v; 2617Bit_Vector empty_lhs_v; 2618gboolean have_empty_rule = 0; 2619 2620@ @<Census nullable symbols@> = 2621nullable_v = bv_clone(empty_lhs_v); 2622rhs_closure(g, nullable_v); 2623{ guint min, max, start; 2624Marpa_Symbol_ID symid; 2625gint counted_nullables = 0; 2626 for ( start = 0; bv_scan(nullable_v, start, &min, &max); start = max+2 ) { 2627 for (symid = (Marpa_Symbol_ID)min; symid <= (Marpa_Symbol_ID)max; symid++) { 2628 SYM symbol = SYM_by_ID(symid); 2629 if (symbol->t_is_counted) { 2630 g_context_clear(g); 2631 g_context_int_add(g, "symid", symid); 2632 grammar_message(g, "counted nullable"); 2633 
        counted_nullables++;
      }
      symbol->t_is_nullable = 1;
} }
if (counted_nullables) {
    g->t_error = "counted nullable";
    return NULL;
}
}
@ @<Declare census variables@> =
Bit_Vector nullable_v;
@ @<Free Boolean vectors@> =
bv_free(nullable_v);

@ @<Census productive symbols@> =
productive_v = bv_shadow(nullable_v);
bv_or(productive_v, nullable_v, terminal_v);
rhs_closure(g, productive_v);
{ guint min, max, start;
Marpa_Symbol_ID symid;
  for ( start = 0; bv_scan(productive_v, start, &min, &max); start = max+2 ) {
    for (symid = (Marpa_Symbol_ID)min;
         symid <= (Marpa_Symbol_ID)max;
         symid++) {
      SYM symbol = SYM_by_ID(symid);
      symbol->t_is_productive = 1;
} }
}
@ @<Check that start symbol is productive@> =
if (!bv_bit_test(productive_v, (guint)g->t_start_symid))
{
    g_context_int_add(g, "symid", g->t_start_symid);
    g->t_error = "unproductive start symbol";
    return NULL;
}
@ @<Declare census variables@> =
Bit_Vector productive_v;
@ @<Free Boolean vectors@> =
bv_free(productive_v);

@ The reach matrix is an $n\times n$ matrix,
where $n$ is the number of symbols.
Bit $(i,j)$ is set in the reach matrix if and only if
symbol $i$ can reach symbol $j$.
\par
This logic could be put earlier, and a child array
for each rule could be efficiently calculated during
the initialization for the calculation of the reach
matrix.
A rule-child array is a list of the rule's RHS symbols,
in sequence and without duplicates.
There are places where traversing a rule-child array,
instead of the RHS, would be more efficient.
At this point,
however, it is not clear that a rule-child array
would be anything but a pointless or even counter-productive optimization.
It would only make a difference in grammars
where many of the right hand sides repeat symbols.
2691@<Calculate reach matrix@> = 2692reach_matrix 2693 = matrix_create(pre_rewrite_symbol_count, pre_rewrite_symbol_count); 2694{ guint symid, no_of_symbols = SYM_Count_of_G(g); 2695for (symid = 0; symid < no_of_symbols; symid++) { 2696 matrix_bit_set(reach_matrix, symid, symid); 2697} } 2698{ Marpa_Rule_ID rule_id; 2699guint no_of_rules = RULE_Count_of_G(g); 2700for (rule_id = 0; rule_id < (Marpa_Rule_ID)no_of_rules; rule_id++) { 2701 RULE rule = RULE_by_ID(g, rule_id); 2702 Marpa_Symbol_ID lhs_id = LHS_ID_of_RULE(rule); 2703 guint rhs_ix, rule_length = Length_of_RULE(rule); 2704 for (rhs_ix = 0; rhs_ix < rule_length; rhs_ix++) { 2705 matrix_bit_set(reach_matrix, 2706 (guint)lhs_id, (guint)RHS_ID_of_RULE(rule, rhs_ix)); 2707} } } 2708transitive_closure(reach_matrix); 2709@ @<Declare census variables@> = Bit_Matrix reach_matrix; 2710@ @<Free Boolean matrixes@> = 2711matrix_free(reach_matrix); 2712 2713@ @<Census accessible symbols@> = 2714accessible_v = matrix_row(reach_matrix, (guint)original_start_symid); 2715{ guint min, max, start; 2716Marpa_Symbol_ID symid; 2717 for ( start = 0; bv_scan(accessible_v, start, &min, &max); start = max+2 ) { 2718 for (symid = (Marpa_Symbol_ID)min; 2719 symid <= (Marpa_Symbol_ID)max; 2720 symid++) { 2721 SYM symbol = SYM_by_ID(symid); 2722 symbol->t_is_accessible = 1; 2723} } 2724} 2725@ |accessible_v| is a pointer into the |reach_matrix|. 2726Therefore there is no code to free it. 2727@<Declare census variables@> = 2728Bit_Vector accessible_v; 2729 2730@ A symbol is nulling if and only if it is a productive symbol which does not 2731reach a terminal symbol. 
@<Census nulling symbols@> =
{
  Bit_Vector reaches_terminal_v = bv_shadow (terminal_v);
  guint min, max, start;
  for (start = 0; bv_scan (productive_v, start, &min, &max); start = max + 2)
    {
      Marpa_Symbol_ID productive_id;
      for (productive_id = (Marpa_Symbol_ID) min;
           productive_id <= (Marpa_Symbol_ID) max; productive_id++)
        {
          bv_and (reaches_terminal_v, terminal_v,
                  matrix_row (reach_matrix, (guint) productive_id));
          if (bv_is_empty (reaches_terminal_v))
            SYM_is_Nulling(SYM_by_ID (productive_id)) = 1;
        }
    }
  bv_free (reaches_terminal_v);
}

@** The CHAF Rewrite.

Nullable symbols have been a difficulty for Earley implementations
since day zero.
Aycock and Horspool came up with a solution to this problem,
part of which involved rewriting the grammar to eliminate
all proper nullables.
Marpa's CHAF rewrite is built on the work of Aycock and
Horspool.

Marpa's CHAF rewrite is one of its two rewrites of the BNF.
The other
adds a new start symbol to the grammar.

@ The rewrite strategy for Marpa is new to it.
It is an elaboration on the one developed by Aycock and Horspool.
The basic idea behind Aycock and Horspool's NNF was to eliminate
proper nullables by replacing the rules with variants which
used only nulling and non-nulling symbols.
These had to be created for every possible combination
of nulling and non-nulling symbols.
This meant that the number of NNF rules was
potentially exponential
in the length of a rule of the original grammar.

@ Marpa's CHAF (Chomsky-Horspool-Aycock Form) eliminates
the problem of exponential explosion by first breaking rules
up into pieces, each piece containing no more than two proper nullables.
The number of rewritten rules in CHAF is linear in the length of
the original rule.
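@ As a rough illustration of the difference in growth, here is a
stand-alone sketch that is not part of |libmarpa|.
The helper names |nnf_variant_count| and |chaf_rule_count_proper| are
hypothetical.
The CHAF count assumes the simplest case: every non-final piece takes two
factors (a proper continuation), and no NN or N rule is suppressed.
Under those assumptions a rule with $k$ factors yields up to $2^k$ NNF
variants, but only $2k$ CHAF rules.

```c
#include <assert.h>

/* Hypothetical helper: number of NNF variants for a rule with
   factor_count proper nullables, one variant for every combination
   of nulling and non-nulling choices. */
static unsigned long nnf_variant_count(int factor_count)
{
    assert(factor_count >= 0 && factor_count < 32);
    return 1UL << factor_count;
}

/* Hypothetical helper: number of CHAF rules for the same rule,
   assuming every non-final piece consumes two factors and that
   all four (or two) variants of every piece are emitted. */
static unsigned long chaf_rule_count_proper(int factor_count)
{
    unsigned long rules = 0;
    int unprocessed = factor_count;
    assert(factor_count >= 0);
    if (factor_count == 0) return 0;    /* no factors: rule is not rewritten */
    while (unprocessed >= 3) {          /* non-final pieces: PP, PN, NP, NN */
        rules += 4;
        unprocessed -= 2;
    }
    rules += (unprocessed == 2) ? 4 : 2; /* final piece: four rules, or P and N */
    return rules;
}
```

For ten factors the two helpers return 1024 and 20 respectively.
Pieces with a nullable continuation consume only one factor each, so the
real count can be higher than this sketch reports, but it stays linear.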

@ The CHAF rewrite affects only rules with proper nullables.
In this context, the proper nullables are called ``factors".
Each piece of the original rule is rewritten into up to four
``factored pieces".
When there are two proper nullables, the potential CHAF rules
are
\li The PP rule: Both factors are replaced with non-nulling symbols.
\li The PN rule: The first factor is replaced with a non-nulling symbol,
and the second factor is replaced with a nulling symbol.
\li The NP rule: The first factor is replaced with a nulling symbol,
and the second factor is replaced with a non-nulling symbol.
\li The NN rule: Both factors are replaced with nulling symbols.

@ Sometimes the CHAF piece will have only one factor. A one-factor
piece is rewritten into at most two factored pieces:
\li The P rule: The factor is replaced with a non-nulling symbol.
\li The N rule: The factor is replaced with a nulling symbol.

@ In |CHAF_rewrite|, the rule count |no_of_rules| is taken before the loop over
the grammar's rules, even though rules are added in the loop.
This is not an error.
The CHAF rewrite is not recursive -- the new rules it creates
are not themselves subject to CHAF rewrite.
Rule IDs increase by one with each rule added,
so that all the new
rules will have IDs equal to or greater than |no_of_rules|.
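@ The snapshot idiom can be shown in isolation with a toy sketch.
Everything in it (|struct toy_grammar|, |toy_rule_add|, |toy_rewrite|)
is hypothetical and stands in for the real grammar structures.
The loop bound is read once, before the loop, so rules appended inside
the loop receive IDs at or above that bound and are never visited.

```c
#include <assert.h>

/* Toy stand-in for a grammar's rule list (hypothetical, not libmarpa's).
   The rule ID is simply the index into rule_length. */
#define TOY_MAX_RULES 64
struct toy_grammar {
    int rule_length[TOY_MAX_RULES];
    int rule_count;
};

/* Append a rule and return its ID, which is the previous rule count. */
static int toy_rule_add(struct toy_grammar *g, int length)
{
    assert(g->rule_count < TOY_MAX_RULES);
    g->rule_length[g->rule_count] = length;
    return g->rule_count++;
}

/* Visit only the rules present on entry.  Every rule longer than 2
   gets two replacement rules appended.  Returns the number visited. */
static int toy_rewrite(struct toy_grammar *g)
{
    const int no_of_rules = g->rule_count;  /* snapshot taken before the loop */
    int rule_id, visited = 0;
    for (rule_id = 0; rule_id < no_of_rules; rule_id++) {
        const int length = g->rule_length[rule_id];
        visited++;
        if (length > 2) {
            const int new_id = toy_rule_add(g, 2);
            assert(new_id >= no_of_rules);  /* new rules lie past the snapshot */
            toy_rule_add(g, length - 1);
        }
    }
    return visited;
}
```

Had the loop tested against |g->rule_count| instead of the snapshot, the
appended rules would be visited in turn, which is exactly the recursion
the rewrite is meant to avoid.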
2808@ @<Function definitions@> = 2809static inline struct marpa_g* CHAF_rewrite(struct marpa_g* g) 2810{ 2811 @<CHAF rewrite declarations@>@; 2812 @<CHAF rewrite allocations@>@; 2813 @<Alias proper nullables@>@; 2814 no_of_rules = RULE_Count_of_G(g); 2815 for (rule_id = 0; rule_id < no_of_rules; rule_id++) { 2816 RULE rule = RULE_by_ID(g, rule_id); 2817 const gint rule_length = Length_of_RULE(rule); 2818 gint nullable_suffix_ix = 0; 2819 @<Mark and skip unused rules@>@; 2820 @<Calculate CHAF rule statistics@>@; 2821 /* If there is no proper nullable in this rule, I am done */ 2822 if (factor_count <= 0) goto NEXT_RULE; 2823 @<Factor the rule into CHAF rules@>@; 2824 NEXT_RULE: ; 2825 } 2826 @<CHAF rewrite deallocations@>@; 2827 return g; 2828} 2829@ @<Private function prototypes@> = 2830static inline struct marpa_g* CHAF_rewrite(struct marpa_g* g); 2831@ @<CHAF rewrite declarations@> = 2832Marpa_Rule_ID rule_id; 2833gint no_of_rules; 2834 2835@ @<Mark and skip unused rules@> = 2836if (!RULE_is_Used(rule)) { goto NEXT_RULE; } 2837if (rule_is_nulling(g, rule)) { RULE_is_Used(rule) = 0; goto NEXT_RULE; } 2838if (!rule_is_accessible(g, rule)) { RULE_is_Used(rule) = 0; goto NEXT_RULE; } 2839if (!rule_is_productive(g, rule)) { RULE_is_Used(rule) = 0; goto NEXT_RULE; } 2840 2841@ For every accessible and productive proper nullable which 2842is not already aliased, alias it. 2843@<Alias proper nullables@> = 2844{ gint no_of_symbols = SYM_Count_of_G(g); 2845Marpa_Symbol_ID symid; 2846for (symid = 0; symid < no_of_symbols; symid++) { 2847 SYM symbol = SYM_by_ID(symid); 2848 SYM alias; 2849 if (!symbol->t_is_nullable) continue; 2850 if (SYM_is_Nulling(symbol)) continue; 2851 if (!symbol->t_is_accessible) continue; 2852 if (!symbol->t_is_productive) continue; 2853 if (symbol_null_alias(symbol)) continue; 2854 alias = symbol_alias_create(g, symbol); 2855 symbol_callback(g, ID_of_SYM(alias)); 2856} } 2857 2858@*0 Compute Statistics Needed to Rewrite the Rule. 
2859The term 2860``factor" is used to mean an instance of a proper nullable 2861symbol on the RHS of a rule. 2862This comes from the idea that replacing the proper nullables 2863with proper symbols and nulling symbols ``factors" pieces 2864of the rule being rewritten (the original rule) 2865into multiple CHAF rules. 2866@<Calculate CHAF rule statistics@> = 2867{ gint rhs_ix; 2868factor_count = 0; 2869for (rhs_ix = 0; rhs_ix < rule_length; rhs_ix++) { 2870 Marpa_Symbol_ID symid = RHS_ID_of_RULE(rule, rhs_ix); 2871 SYM symbol = SYM_by_ID(symid); 2872 if (SYM_is_Nulling(symbol)) continue; /* Do nothing for nulling symbols */ 2873 if (symbol_null_alias(symbol)) { 2874 /* If a proper nullable, record its position */ 2875 factor_positions[factor_count++] = rhs_ix; 2876 continue; 2877 }@# 2878 nullable_suffix_ix = rhs_ix+1; 2879/* If not a nullable symbol, move forward the index 2880 of the nullable suffix location */ 2881} } 2882@ @<CHAF rewrite declarations@> = 2883gint factor_count; 2884gint* factor_positions; 2885@ @<CHAF rewrite allocations@> = 2886factor_positions = g_new(gint, g->t_max_rule_length); 2887@ @<CHAF rewrite deallocations@> = 2888g_free(factor_positions); 2889 2890@*0 Divide the Rule into Pieces. 
2891@<Factor the rule into CHAF rules@> = 2892RULE_is_Used(rule) = 0; /* Mark the original rule unused */ 2893{ gint unprocessed_factor_count; /* The number of proper nullables for which CHAF rules have 2894yet to be written */ 2895gint factor_position_ix = 0; /* Current index into the list of factors */ 2896Marpa_Symbol_ID current_lhs_id = LHS_ID_of_RULE(rule); 2897gint piece_end, piece_start = 0; /* The positions, in the original rule, where 2898the new (virtual) rule starts and ends */ 2899for (unprocessed_factor_count = factor_count - factor_position_ix; 2900unprocessed_factor_count >= 3; 2901unprocessed_factor_count = factor_count - factor_position_ix) { 2902 @<Add non-final CHAF rules@>@; 2903} 2904if (unprocessed_factor_count == 2) { 2905 @<Add final CHAF rules for two factors@>@; 2906} else { 2907 @<Add final CHAF rules for one factor@>@; 2908} } 2909 2910@ @<Create a CHAF virtual symbol@> = { 2911 SYM chaf_virtual_symbol = symbol_new(g); 2912 chaf_virtual_symbol->t_is_accessible = 1; 2913 chaf_virtual_symbol->t_is_productive = 1; 2914 chaf_virtual_symid = ID_of_SYM(chaf_virtual_symbol); 2915 g_context_clear(g); 2916 g_context_int_add(g, "rule_id", rule_id); 2917 g_context_int_add(g, "lhs_id", LHS_ID_of_RULE(rule)); 2918 g_context_int_add(g, "virtual_end", (gint)piece_end); 2919 symbol_callback(g, chaf_virtual_symid); 2920} 2921 2922@*0 Temporary buffers for the CHAF right hand sides. 2923Two temporary buffers are used in factoring out CHAF rules. 2924|piece_rhs| is for the normal case, where only the symbols 2925of the current piece are on the RHS. 2926In certain cases, where the remainder of the rule is nulling, 2927further factoring is unnecessary and the CHAF rewrite simply 2928finishes out the rule with nulling symbols. 2929In such cases, the RHS is built in the 2930|remaining_rhs| buffer. 
@<CHAF rewrite declarations@> =
Marpa_Symbol_ID* piece_rhs;
Marpa_Symbol_ID* remaining_rhs;
@ @<CHAF rewrite allocations@> =
piece_rhs = g_new(Marpa_Symbol_ID, g->t_max_rule_length);
remaining_rhs = g_new(Marpa_Symbol_ID, g->t_max_rule_length);
@ @<CHAF rewrite deallocations@> =
g_free(piece_rhs);
g_free(remaining_rhs);

@*0 Factor A Non-Final Piece.
@ As long as I have 3 or more unprocessed factors, I am working on a non-final
rule.
@<Add non-final CHAF rules@> =
    Marpa_Symbol_ID chaf_virtual_symid;
    gint first_factor_position = factor_positions[factor_position_ix];
    gint first_factor_piece_position = first_factor_position - piece_start;
    gint second_factor_position = factor_positions[factor_position_ix+1];
    if (second_factor_position >= nullable_suffix_ix) {
        piece_end = second_factor_position-1;
        /* The last factor is in the nullable suffix, so the virtual RHS must be nullable */
        @<Create a CHAF virtual symbol@>@;
        @<Add CHAF rules for nullable continuation@>@;
        factor_position_ix++;
    } else {
        gint second_factor_piece_position = second_factor_position - piece_start;
        piece_end = second_factor_position;
        @<Create a CHAF virtual symbol@>@;
        @<Add CHAF rules for proper continuation@>@;
        factor_position_ix += 2;
    }
    current_lhs_id = chaf_virtual_symid;
    piece_start = piece_end+1;

@*0 Add CHAF Rules for Nullable Continuations.
For a piece that has a nullable continuation,
the virtual RHS counts
as one of the two allowed proper nullables.
That means the piece must
end before the second proper nullable (or factor).
@<Add CHAF rules for nullable continuation@> =
{
    gint remaining_rhs_length, piece_rhs_length;
    @<Add PP CHAF rule for nullable continuation@>;
    @<Add PN CHAF rule for nullable continuation@>;
    @<Add NP CHAF rule for nullable continuation@>;
    @<Add NN CHAF rule for nullable continuation@>;
}

@ Note that the first part of |remaining_rhs| is exactly the same
as the first part of |piece_rhs|, so I copy it here in preparation
for the PN rule.
@<Add PP CHAF rule for nullable continuation@> =
{
gint real_symbol_count = piece_end - piece_start + 1;
for (piece_rhs_length = 0; piece_rhs_length < real_symbol_count; piece_rhs_length++) {
    remaining_rhs[piece_rhs_length] =
        piece_rhs[piece_rhs_length] = RHS_ID_of_RULE(rule, piece_start+piece_rhs_length);
}
piece_rhs[piece_rhs_length++] = chaf_virtual_symid;
}
{ RULE chaf_rule;
    gint real_symbol_count = piece_rhs_length - 1;
    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
    @<Set CHAF rule flags and call back@>@;
}

@ @<Add PN CHAF rule for nullable continuation@> =
{
    gint chaf_rule_length = Length_of_RULE(rule) - piece_start;
    for (remaining_rhs_length = piece_rhs_length - 1;
        remaining_rhs_length < chaf_rule_length; remaining_rhs_length++)
    {
        Marpa_Symbol_ID original_id =
            RHS_ID_of_RULE (rule, piece_start + remaining_rhs_length);
        SYM alias = symbol_null_alias (SYM_by_ID (original_id));
        remaining_rhs[remaining_rhs_length] =
            alias ? ID_of_SYM (alias) : original_id;
    }
}
{
    RULE chaf_rule;
    gint real_symbol_count = remaining_rhs_length;
    chaf_rule =
        rule_start (g, current_lhs_id, remaining_rhs, remaining_rhs_length);
    @<Set CHAF rule flags and call back@>@;
}

@ Note, while I have the nulling alias for the first factor,
|remaining_rhs| is altered to be ready for the NN rule.
3021@<Add NP CHAF rule for nullable continuation@> = { 3022 Marpa_Symbol_ID proper_id = RHS_ID_of_RULE(rule, first_factor_position); 3023 SYM alias = symbol_null_alias(SYM_by_ID(proper_id)); 3024 remaining_rhs[first_factor_piece_position] = 3025 piece_rhs[first_factor_piece_position] = 3026 ID_of_SYM(alias); 3027} 3028{ RULE chaf_rule; 3029 gint real_symbol_count = piece_rhs_length-1; 3030 chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length); 3031 @<Set CHAF rule flags and call back@>@; 3032} 3033 3034@ If this piece is nullable (|piece_start| at or 3035after |nullable_suffix_ix|), I don't add an NN choice, 3036because nulling both factors makes the entire piece nulling, 3037and nulling rules cannot be fed directly to 3038the Marpa parse engine. 3039Note that |remaining_rhs| was altered above. 3040@<Add NN CHAF rule for nullable continuation@> = 3041if (piece_start < nullable_suffix_ix) { 3042 RULE chaf_rule; 3043 gint real_symbol_count = remaining_rhs_length; 3044 chaf_rule = rule_start(g, current_lhs_id, remaining_rhs, remaining_rhs_length); 3045 @<Set CHAF rule flags and call back@>@; 3046} 3047 3048@*0 Add CHAF Rules for Proper Continuations. 3049@ Open block and declarations. 3050@<Add CHAF rules for proper continuation@> = { 3051 gint piece_rhs_length; 3052RULE chaf_rule; 3053gint real_symbol_count; 3054Marpa_Symbol_ID first_factor_proper_id, second_factor_proper_id, 3055 first_factor_alias_id, second_factor_alias_id; 3056real_symbol_count = piece_end - piece_start + 1; 3057 3058@ The PP Rule. 3059@<Add CHAF rules for proper continuation@> = 3060 for (piece_rhs_length = 0; piece_rhs_length < real_symbol_count; piece_rhs_length++) { 3061 piece_rhs[piece_rhs_length] = RHS_ID_of_RULE(rule, piece_start+piece_rhs_length); 3062 } 3063 piece_rhs[piece_rhs_length++] = chaf_virtual_symid; 3064 chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length); 3065 @<Set CHAF rule flags and call back@>@; 3066 3067@ The PN Rule. 
3068@<Add CHAF rules for proper continuation@> = 3069 second_factor_proper_id = RHS_ID_of_RULE(rule, second_factor_position); 3070 piece_rhs[second_factor_piece_position] 3071 = second_factor_alias_id = alias_by_id(g, second_factor_proper_id); 3072 chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length); 3073 @<Set CHAF rule flags and call back@>@; 3074 3075@ The NP Rule. 3076@<Add CHAF rules for proper continuation@> = 3077 first_factor_proper_id = RHS_ID_of_RULE(rule, first_factor_position); 3078 piece_rhs[first_factor_piece_position] 3079 = first_factor_alias_id = alias_by_id(g, first_factor_proper_id); 3080 piece_rhs[second_factor_piece_position] = second_factor_proper_id; 3081 chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length); 3082 @<Set CHAF rule flags and call back@>@; 3083 3084@ The NN Rule. 3085@<Add CHAF rules for proper continuation@> = 3086 piece_rhs[second_factor_piece_position] = second_factor_alias_id; 3087 chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length); 3088 @<Set CHAF rule flags and call back@>@; 3089 3090@ Close the block 3091@<Add CHAF rules for proper continuation@> = } 3092 3093@*0 Add Final CHAF Rules for Two Factors. 3094Open block, declarations and setup. 3095@<Add final CHAF rules for two factors@> = { 3096gint first_factor_position = factor_positions[factor_position_ix]; 3097gint first_factor_piece_position = first_factor_position - piece_start; 3098gint second_factor_position = factor_positions[factor_position_ix+1]; 3099gint second_factor_piece_position = second_factor_position - piece_start; 3100gint real_symbol_count; 3101gint piece_rhs_length; 3102RULE chaf_rule; 3103Marpa_Symbol_ID first_factor_proper_id, second_factor_proper_id, 3104 first_factor_alias_id, second_factor_alias_id; 3105piece_end = Length_of_RULE(rule)-1; 3106real_symbol_count = piece_end - piece_start + 1; 3107 3108@ The PP Rule. 
3109@<Add final CHAF rules for two factors@> = 3110 for (piece_rhs_length = 0; piece_rhs_length < real_symbol_count; piece_rhs_length++) { 3111 piece_rhs[piece_rhs_length] = RHS_ID_of_RULE(rule, piece_start+piece_rhs_length); 3112 } 3113 chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length); 3114 @<Set CHAF rule flags and call back@>@; 3115 3116@ The PN Rule. 3117@<Add final CHAF rules for two factors@> = 3118 second_factor_proper_id = RHS_ID_of_RULE(rule, second_factor_position); 3119 piece_rhs[second_factor_piece_position] 3120 = second_factor_alias_id = alias_by_id(g, second_factor_proper_id); 3121 chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length); 3122 @<Set CHAF rule flags and call back@>@; 3123 3124@ The NP Rule. 3125@<Add final CHAF rules for two factors@> = 3126 first_factor_proper_id = RHS_ID_of_RULE(rule, first_factor_position); 3127 piece_rhs[first_factor_piece_position] 3128 = first_factor_alias_id = alias_by_id(g, first_factor_proper_id); 3129 piece_rhs[second_factor_piece_position] = second_factor_proper_id; 3130 chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length); 3131 @<Set CHAF rule flags and call back@>@; 3132 3133@ The NN Rule. This is added only if it would not turn this into 3134a nulling rule. 3135@<Add final CHAF rules for two factors@> = 3136if (piece_start < nullable_suffix_ix) { 3137 piece_rhs[second_factor_piece_position] = second_factor_alias_id; 3138 chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length); 3139 @<Set CHAF rule flags and call back@>@; 3140} 3141 3142@ Close the block 3143@<Add final CHAF rules for two factors@> = } 3144 3145@*0 Add Final CHAF Rules for One Factor. 
@<Add final CHAF rules for one factor@> = {
gint piece_rhs_length;
RULE chaf_rule;
Marpa_Symbol_ID first_factor_proper_id, first_factor_alias_id;
gint real_symbol_count;
gint first_factor_position = factor_positions[factor_position_ix];
gint first_factor_piece_position = factor_positions[factor_position_ix] - piece_start;
piece_end = Length_of_RULE(rule)-1;
real_symbol_count = piece_end - piece_start + 1;

@ The P Rule.
@<Add final CHAF rules for one factor@> =
    for (piece_rhs_length = 0; piece_rhs_length < real_symbol_count; piece_rhs_length++) {
        piece_rhs[piece_rhs_length] = RHS_ID_of_RULE(rule, piece_start+piece_rhs_length);
    }
    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
    @<Set CHAF rule flags and call back@>@;

@ The N Rule. This is added only if it would not turn this into
a nulling rule.
@<Add final CHAF rules for one factor@> =
if (piece_start < nullable_suffix_ix) {
    first_factor_proper_id = RHS_ID_of_RULE(rule, first_factor_position);
    first_factor_alias_id = alias_by_id(g, first_factor_proper_id);
    piece_rhs[first_factor_piece_position] = first_factor_alias_id;
    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
    @<Set CHAF rule flags and call back@>@;
}

@ Close the block
@<Add final CHAF rules for one factor@> = }

@ Some of the code for adding CHAF rules is common to
them all.
This includes the setting of many of the elements of the
rule structure, and performing the call back.
3182@<Set CHAF rule flags and call back@> = 3183RULE_is_Used (chaf_rule) = 1; 3184chaf_rule->t_original = rule_id; 3185RULE_is_Virtual_LHS(chaf_rule) = piece_start > 0; 3186chaf_rule->t_is_semantic_equivalent = !RULE_is_Virtual_LHS(chaf_rule); 3187RULE_is_Virtual_RHS(chaf_rule) = Length_of_RULE (chaf_rule) > real_symbol_count; 3188chaf_rule->t_virtual_start = piece_start; 3189chaf_rule->t_virtual_end = piece_start + real_symbol_count - 1; 3190Real_SYM_Count_of_RULE(chaf_rule) = real_symbol_count; 3191rule_callback (g, chaf_rule->t_id); 3192 3193@ This utility routine translates a proper symbol id to a nulling symbol ID. 3194It is assumed that the caller has ensured that 3195|proper_id| is valid and that an alias actually exists. 3196@<Function definitions@> = 3197static inline 3198Marpa_Symbol_ID alias_by_id(struct marpa_g* g, Marpa_Symbol_ID proper_id) { 3199 SYM alias = symbol_null_alias(SYM_by_ID(proper_id)); 3200 return ID_of_SYM(alias); 3201} 3202@ @<Private function prototypes@> = 3203static inline 3204Marpa_Symbol_ID alias_by_id(struct marpa_g* g, Marpa_Symbol_ID proper_id); 3205 3206@** Adding a New Start Symbol. 3207This is such a common rewrite that it has a special name 3208in the literature --- it is called ``augmenting the grammar". 
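@ In its textbook form, augmenting a grammar adds one fresh symbol $S'$
and one rule $S' \rightarrow S$, and makes $S'$ the start symbol.
The following is a minimal stand-alone sketch of that textbook step only.
The types and the function |toy_augment| are hypothetical, and the sketch
ignores the nulling start symbol, which |libmarpa| has to handle as well.

```c
#include <assert.h>

/* Toy grammar (hypothetical, not libmarpa's structures).
   Symbol IDs run from 0 to symbol_count-1. */
#define TOY_MAX_RULES 32
#define TOY_MAX_RHS 4
struct toy_rule {
    int lhs;
    int rhs[TOY_MAX_RHS];
    int length;
};
struct toy_grammar {
    struct toy_rule rules[TOY_MAX_RULES];
    int rule_count;
    int symbol_count;
    int start_id;
};

/* Augment the grammar: create a fresh symbol S', add the rule
   S' -> S for the old start symbol S, and make S' the start symbol.
   Returns the ID of the new start symbol. */
static int toy_augment(struct toy_grammar *g)
{
    struct toy_rule *new_start_rule;
    const int old_start = g->start_id;
    const int new_start = g->symbol_count++;  /* next unused symbol ID */
    assert(g->rule_count < TOY_MAX_RULES);
    new_start_rule = &g->rules[g->rule_count++];
    new_start_rule->lhs = new_start;
    new_start_rule->rhs[0] = old_start;
    new_start_rule->length = 1;
    g->start_id = new_start;
    return new_start;
}
```

After the call, the new start symbol appears on the LHS of exactly one
rule and on no RHS, which is the property the augmentation is meant to
guarantee.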
3209 3210@ @<Function definitions@> = 3211static inline 3212struct marpa_g* g_augment(struct marpa_g* g) { 3213 Marpa_Symbol_ID proper_new_start_id = -1; 3214 SYM proper_old_start = NULL; 3215 SYM nulling_old_start = NULL; 3216 SYM proper_new_start = NULL; 3217 SYM old_start = SYM_by_ID(g->t_start_symid); 3218 @<Find and classify the old start symbols@>@; 3219 if (proper_old_start) { @<Set up a new proper start rule@> } 3220 if (nulling_old_start) { @<Set up a new nulling start rule@> } 3221 return g; 3222} 3223@ @<Private function prototypes@> = 3224static inline struct marpa_g* g_augment(struct marpa_g* g); 3225 3226@ @<Find and classify the old start symbols@> = 3227if (SYM_is_Nulling(old_start)) { 3228 old_start->t_is_accessible = 0; 3229 nulling_old_start = old_start; 3230} else { 3231 proper_old_start = old_start; 3232 nulling_old_start = symbol_null_alias(old_start); 3233} 3234old_start->t_is_start = 0; 3235 3236@ @<Set up a new proper start rule@> = { 3237 RULE new_start_rule; 3238 proper_old_start->t_is_start = 0; 3239 proper_new_start = symbol_new (g); 3240 proper_new_start_id = ID_of_SYM(proper_new_start); 3241 g->t_start_symid = proper_new_start_id; 3242 proper_new_start->t_is_accessible = TRUE; 3243 proper_new_start->t_is_productive = TRUE; 3244 proper_new_start->t_is_start = TRUE; 3245 g_context_clear (g); 3246 g_context_int_add (g, "old_start_id", ID_of_SYM(old_start)); 3247 symbol_callback (g, proper_new_start_id); 3248 new_start_rule = rule_start (g, proper_new_start_id, &LV_ID_of_SYM(old_start), 1); 3249 new_start_rule->t_is_start = 1; 3250 RULE_is_Virtual_LHS(new_start_rule) = 1; 3251 Real_SYM_Count_of_RULE(new_start_rule) = 1; 3252 RULE_is_Used(new_start_rule) = 1; 3253 g->t_proper_start_rule = new_start_rule; 3254 rule_callback (g, new_start_rule->t_id); 3255} 3256 3257@ Set up the new nulling start rule, if the old start symbol was 3258nulling or had a null alias. A new nulling start symbol 3259must be created. 
It is an alias of the new proper start symbol, 3260if there is one. Otherwise it is a new, nulling, symbol. 3261@<Set up a new nulling start rule@> = { 3262 Marpa_Symbol_ID nulling_new_start_id; 3263 RULE new_start_rule; 3264 SYM nulling_new_start; 3265 if (proper_new_start) 3266 { /* There are two start symbols */ 3267 nulling_new_start = symbol_alias_create (g, proper_new_start); 3268 nulling_new_start_id = ID_of_SYM(nulling_new_start); 3269 } 3270 else 3271 { /* The only start symbol is a nulling symbol */ 3272 nulling_new_start = symbol_new (g); 3273 nulling_new_start_id = ID_of_SYM(nulling_new_start); 3274 g->t_start_symid = nulling_new_start_id; 3275 SYM_is_Nulling(nulling_new_start) = TRUE; 3276 nulling_new_start->t_is_nullable = TRUE; 3277 nulling_new_start->t_is_productive = TRUE; 3278 nulling_new_start->t_is_accessible = TRUE; 3279 } 3280 nulling_new_start->t_is_start = TRUE; 3281 g_context_clear (g); 3282 g_context_int_add (g, "old_start_id", ID_of_SYM(old_start)); 3283 symbol_callback (g, nulling_new_start_id); 3284 new_start_rule = rule_start (g, nulling_new_start_id, 0, 0); 3285 new_start_rule->t_is_start = 1; 3286 RULE_is_Virtual_LHS(new_start_rule) = 1; 3287 Real_SYM_Count_of_RULE(new_start_rule) = 1; 3288 RULE_is_Used(new_start_rule) = TRUE; 3289 g->t_null_start_rule = new_start_rule; 3290 rule_callback (g, new_start_rule->t_id); 3291} 3292 3293@** Loops. 3294Loops are rules which non-trivially derive their own LHS. 3295More precisely, a rule is a loop if and only if it 3296non-trivially derives a string which contains its LHS symbol 3297and is of length 1. 3298In my experience, 3299and according to Grune and Jacobs 2008 (pp. 48-49), 3300loops are never of practical use. 3301 3302@ Marpa allows loops, for two reasons. 3303First, I want to be able to claim that 3304Marpa handles {\bf all} context-free grammars. 
This is of real value to the user, because
it makes
it very easy for her
to know beforehand whether Marpa can
handle a particular grammar.
If she can write the grammar in BNF, then Marpa can handle it ---
it's that simple.
For Marpa to make this claim,
it must be able to handle grammars
with loops.

Second, a user's drafts of a grammar might contain loops.
A parser generator which did not handle them would force
the user's first order of business to be removing them.
That might be inconvenient.

@ The grammar precomputations and the recognition
phase have been set up so that
loops are a complete non-issue --- they are dealt with like
any other situation, without additional overhead.
However, loops do impose overhead and require special
handling in the evaluation phase.
It is unlikely that a user will want to leave one in
a production grammar.

@ Marpa detects all loops during its grammar
precomputation.
|libmarpa| assumes that parsing will go through as usual,
with the loops.
But it enables the upper layers to make other choices
by passing a message for every rule involved in a
loop,
as well as a final message with the count of loop rules.
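
The detection just described can be sketched in a few lines of standalone C.
This is only an illustration under assumptions, not |libmarpa|'s code:
the names |unit_matrix|, |unit_matrix_init|, |unit_matrix_close| and
|count_loop_rules| are hypothetical, the matrix spends one byte per bit,
and the closure is plain Warshall.
A rule is a loop rule exactly when, after the closure, it reaches itself.

```c
#include <assert.h>
#include <string.h>

#define MAX_RULES 16 /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a bit matrix: bit[i][j] is nonzero when
   rule i makes a unit transition to rule j. */
typedef struct {
  int n;
  unsigned char bit[MAX_RULES][MAX_RULES];
} unit_matrix;

static void unit_matrix_init(unit_matrix *m, int n)
{
  assert(n >= 0 && n <= MAX_RULES);
  m->n = n;
  memset(m->bit, 0, sizeof m->bit);
}

/* Warshall's algorithm: afterwards bit[i][j] is set if and only if
   rule j is reachable from rule i in one or more unit transitions. */
static void unit_matrix_close(unit_matrix *m)
{
  int i, j, k;
  for (k = 0; k < m->n; k++)
    for (i = 0; i < m->n; i++)
      if (m->bit[i][k])
        for (j = 0; j < m->n; j++)
          if (m->bit[k][j]) m->bit[i][j] = 1;
}

/* Flag each rule that reaches itself; return how many there are. */
static int count_loop_rules(const unit_matrix *m, unsigned char *is_loop)
{
  int i, count = 0;
  for (i = 0; i < m->n; i++) {
    is_loop[i] = m->bit[i][i];
    if (is_loop[i]) count++;
  }
  return count;
}
```

For the three rules 0: $A \RA B$, 1: $B \RA A$ and 2: $A \RA C$,
the direct transitions are $(0,1)$, $(1,0)$ and $(1,2)$.
After the closure, rules 0 and 1 are flagged as loop rules and rule 2 is not.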
3338 3339@<Function definitions@> = 3340static inline 3341void loop_detect(struct marpa_g* g) 3342{ gint no_of_rules = RULE_Count_of_G(g); 3343gint loop_rule_count = 0; 3344Bit_Matrix unit_transition_matrix 3345 = matrix_create( (guint)no_of_rules , (guint)no_of_rules); 3346@<Mark direct unit transitions in |unit_transition_matrix|@>@; 3347transitive_closure(unit_transition_matrix); 3348@<Mark loop rules@>@; 3349if (loop_rule_count) g->t_has_loop = TRUE; 3350@<Report loop rule count@>@; 3351matrix_free(unit_transition_matrix); 3352} 3353@ @<Private function prototypes@> = 3354static inline 3355void loop_detect(struct marpa_g* g); 3356 3357@ Note that direct transitions are marked in advance, 3358but not trivial ones. 3359That is, bit |(x,x)| is not set |TRUE| in advance. 3360In other words, for this purpose, 3361unit transitions are not in general reflexive. 3362@<Mark direct unit transitions in |unit_transition_matrix|@> = { 3363Marpa_Rule_ID rule_id; 3364for (rule_id = 0; rule_id < (Marpa_Rule_ID)no_of_rules; rule_id++) { 3365 RULE rule = RULE_by_ID(g, rule_id); 3366 Marpa_Symbol_ID proper_id; 3367 gint rhs_ix, rule_length; 3368 if (!RULE_is_Used(rule)) continue; 3369 rule_length = Length_of_RULE(rule); 3370 proper_id = -1; 3371 for (rhs_ix = 0; rhs_ix < rule_length; rhs_ix++) { 3372 Marpa_Symbol_ID symid = RHS_ID_of_RULE(rule, rhs_ix); 3373 SYM symbol = SYM_by_ID(symid); 3374 if (symbol->t_is_nullable) continue; /* After the CHAF rewrite, nullable $\E$ nulling */ 3375 if (proper_id >= 0) goto NEXT_RULE; /* More 3376 than one proper symbol -- not a unit rule */ 3377 proper_id = symid; 3378 } 3379 @# 3380 if (proper_id < 0) continue; /* A 3381 nulling start rule is allowed, so there may be no proper symbol */ 3382 { SYM rhs_symbol = SYM_by_ID(proper_id); 3383 GArray* lhs_rules = rhs_symbol->t_lhs; 3384 gint ix, no_of_lhs_rules = lhs_rules->len; 3385 for (ix = 0; ix < no_of_lhs_rules; ix++) { 3386 /* Direct loops ($A \RA A$) only need the $(rule_id, rule_id)$ bit 
set,
         but it is not clear that it is a win to special case them. */
      matrix_bit_set(unit_transition_matrix, (guint)rule_id,
        (guint)g_array_index(lhs_rules, Marpa_Rule_ID, ix));
    } }
  NEXT_RULE: ;
} }

@ Virtual loop rules are loop rules from the virtual point of view.
When the CHAF rewrite splits a rule into multiple pieces,
it is inconvenient to see each piece as a loop rule.
Therefore only certain of the CHAF pieces that are loop rules
are regarded as virtual loop rules.
All non-CHAF loop rules are virtual loop rules including,
at this point, sequence rules.
@<Mark loop rules@> = { Marpa_Rule_ID rule_id;
for (rule_id = 0; rule_id < (Marpa_Rule_ID)no_of_rules; rule_id++) {
  RULE rule;
  if (!matrix_bit_test(unit_transition_matrix, (guint)rule_id, (guint)rule_id))
    continue;
  loop_rule_count++;
  rule = RULE_by_ID(g, rule_id);
  rule->t_is_loop = TRUE;
  rule->t_is_virtual_loop = rule->t_virtual_start < 0 || !RULE_is_Virtual_RHS(rule);
  g_context_clear(g);
  g_context_int_add(g, "rule_id", rule_id);
  grammar_message(g, "loop rule");
} }

@ The higher layers can differ greatly in their treatment
of loop rules. It is perfectly reasonable for a higher layer to treat a loop
rule as a fatal error.
It is also reasonable for a higher layer to always silently allow them.
There are lots of possibilities in between these two extremes.
To assist the upper layers, the reporting is very thorough ---
there is not just a message for each loop rule, but also a final tally.
@<Report loop rule count@> =
g_context_clear(g);
g_context_int_add(g, "loop_rule_count", loop_rule_count);
grammar_message(g, "loop rule tally");

@** The Aycock-Horspool Finite Automata.

@*0 Some Statistics on AHFA states.
For Perl's grammar, the discovered states range in size from 1 to 20 items,
but the numbers are heavily skewed toward the low
end. Here are the item counts that appear, with the percent of the total
discovered AHFA states with that item count
in parentheses:
1 (67.05\%);
2 (25.67\%);
3 (2.87\%);
4 (2.68\%);
5 (0.19\%);
6 (0.38\%);
7 (0.19\%);
8 (0.57\%);
9 (0.19\%); and
20 (0.19\%).

@ As can be seen, well over 90\% of the total discovered states have
just one or two items.
The average size is 1.5235,
and the average of the $|size|^2$ is 3.9405.

@ For the HTML grammars I used, the totals are even more lopsided:
80.96\% of all discovered states have only 1 item.
All the others (19.04\%) have 2 items.
The average size is 1.1904,
and the average of the $|size|^2$ is 1.5712.

@ The size of predicted states tends to be much more
evenly distributed.
It also tends to be much larger, and
the average for practical grammars may be $O(s)$,
where $s$ is the size of the grammar.
This is the same as the theoretical worst case.

Here are the item counts of the predicted states for the Perl grammar.
The number of states with that item count is in parentheses:
1 item (3),
2 items (5),
3 items (4),
4 items (3),
5 items (1),
6 items (2),
7 items (2),
64 items (1),
71 items (1),
77 items (1),
79 items (1),
81 items (1),
83 items (1),
85 items (1),
88 items (1),
90 items (1),
98 items (1),
100 items (1),
102 items (1),
104 items (1),
106 items (1),
108 items (1),
111 items (1),
116 items (1),
127 items (1),
129 items (1),
132 items (1),
135 items (1),
136 items (1),
137 items (1),
141 items (1),
142 items (4),
143 items (2),
144 items (1),
149 items (1),
151 items (1),
156 items (1),
157 items (1),
220 items (1),
224 items (1),
225 items (1).
And here is the same data for some grammar of HTML:
1 item (95),
2 items (95),
4 items (95),
11 items (181),
14 items (181),
15 items (294),
16 items (112),
18 items (349),
19 items (120),
20 items (190),
21 items (63),
22 items (22),
24 items (8),
25 items (16),
26 items (16),
28 items (2),
29 items (16).


@** AHFA Item (AIM) Code.
AHFA states are sets of AHFA items.
AHFA items are named by analogy with LR(0) items.
LR(0) items play the same role in the LR(0) automaton that
AHFA items play in the AHFA ---
the states of the automata correspond to sets of the items.
Also like LR(0) items,
each AHFA item corresponds one-to-one to a duple,
the duple being a rule and a position in that rule.
@<Public typedefs@> =
typedef gint Marpa_AHFA_Item_ID;
@
@d Sort_Key_of_AIM(aim) ((aim)->t_sort_key)
@<Private structures@> =
struct s_AHFA_item {
  gint t_sort_key;
  @<Widely aligned AHFA item elements@>@;
  @<Int aligned AHFA item elements@>@;
};
@ @<Private incomplete structures@> =
struct s_AHFA_item;
typedef struct s_AHFA_item* AIM;
typedef Marpa_AHFA_Item_ID AIMID;

@ Pointers to two lists of AHFA items.
The one list contains the AHFA items themselves, in
AHFA item ID order.
The other is indexed by rule ID, and contains a pointer to
the first AHFA item for that rule.
@ Because AHFA items are in an array, the successor can
be found by incrementing the AIM pointer,
the predecessor can be found by decrementing it,
and AIM pointers can be portably compared.
A lot of code relies on these facts.
@d Next_AIM_of_AIM(aim) ((aim)+1)
@d AIM_by_ID(id) (g->t_AHFA_items+(id))
@<Widely aligned grammar elements@> =
  AIM t_AHFA_items;
  AIM* t_AHFA_items_by_rule;
@
@d AIM_Count_of_G(g) ((g)->t_aim_count)
@d LV_AIM_Count_of_G(g) AIM_Count_of_G(g)
@<Int aligned grammar elements@> =
  guint t_aim_count;
@ The space is allocated during precomputation.
Because the grammar may be destroyed before precomputation,
I test that |g->t_AHFA_items| is non-zero.
@ @<Initialize grammar elements@> =
g->t_AHFA_items = NULL;
g->t_AHFA_items_by_rule = NULL;
@ @<Destroy grammar elements@> =
if (g->t_AHFA_items) { g_free(g->t_AHFA_items); };
if (g->t_AHFA_items_by_rule) { g_free(g->t_AHFA_items_by_rule); };

@ Check that AHFA item ID is in valid range.
@<Function definitions@> =
static inline gboolean item_is_valid(
GRAMMAR_Const g, AIMID item_id) {
return item_id < (AIMID)AIM_Count_of_G(g) && item_id >= 0;
}
@ @<Private function prototypes@> =
static inline gboolean item_is_valid(
GRAMMAR_Const g, AIMID item_id);

@*0 Rule.
@d RULE_of_AIM(item) ((item)->t_rule)
@d RULEID_of_AIM(item) ID_of_RULE(RULE_of_AIM(item))
@d LHS_ID_of_AIM(item) (LHS_ID_of_RULE(RULE_of_AIM(item)))
@<Widely aligned AHFA item elements@> =
  RULE t_rule;

@*0 Position.
Position in the RHS, |-1| for a completion.
@d Position_of_AIM(aim) ((aim)->t_position)
@<Int aligned AHFA item elements@> =
gint t_position;

@*0 Postdot Symbol.
|-1| if the item is a completion.
@d Postdot_SYMID_of_AIM(item) ((item)->t_postdot)
@d AIM_is_Completion(aim) (Postdot_SYMID_of_AIM(aim) < 0)
@d AIM_has_Completed_Start_Rule(aim)
  (AIM_is_Completion(aim) && RULE_is_Start(RULE_of_AIM(aim)))
@<Int aligned AHFA item elements@> = Marpa_Symbol_ID t_postdot;

@*0 Leading Nulls.
In libmarpa's AHFA items, the dot position is never in front
of a nulling symbol. (Due to rewriting, every nullable symbol
is also a nulling symbol.)
This element contains the count of nulling symbols preceding
this AHFA item's dot position.
@d Null_Count_of_AIM(aim) ((aim)->t_leading_nulls)
@<Int aligned AHFA item elements@> =
gint t_leading_nulls;

@*0 AHFA Item External Accessors.
3623@<Function definitions@> = 3624guint marpa_AHFA_item_count(struct marpa_g* g) { 3625 @<Return |-2| on failure@>@/ 3626 @<Fail if grammar not precomputed@>@/ 3627 return AIM_Count_of_G(g); 3628} 3629@ @<Public function prototypes@> = 3630guint marpa_AHFA_item_count(struct marpa_g* g); 3631 3632@ @<Function definitions@> = 3633Marpa_Rule_ID marpa_AHFA_item_rule(struct marpa_g* g, 3634 Marpa_AHFA_Item_ID item_id) { 3635 @<Return |-2| on failure@>@/ 3636 @<Fail if grammar not precomputed@>@/ 3637 @<Fail if grammar |item_id| is invalid@>@/ 3638 return RULE_of_AIM(AIM_by_ID(item_id))->t_id; 3639} 3640@ @<Public function prototypes@> = 3641Marpa_Rule_ID marpa_AHFA_item_rule(struct marpa_g* g, Marpa_AHFA_Item_ID item_id); 3642 3643@ |-1| is the value for completions, so |-2| is the failure indicator. 3644@<Public function prototypes@> = 3645gint marpa_AHFA_item_position(struct marpa_g* g, Marpa_AHFA_Item_ID item_id); 3646@ @<Function definitions@> = 3647gint marpa_AHFA_item_position(struct marpa_g* g, 3648 Marpa_AHFA_Item_ID item_id) { 3649 @<Return |-2| on failure@>@/ 3650 @<Fail if grammar not precomputed@>@/ 3651 @<Fail if grammar |item_id| is invalid@>@/ 3652 return Position_of_AIM(AIM_by_ID(item_id)); 3653} 3654 3655@ |-1| is the value for completions, so |-2| is the failure indicator. 
@<Public function prototypes@> =
Marpa_Symbol_ID marpa_AHFA_item_postdot(struct marpa_g* g, Marpa_AHFA_Item_ID item_id);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_AHFA_item_postdot(struct marpa_g* g,
  Marpa_AHFA_Item_ID item_id) {
  @<Return |-2| on failure@>@/
  @<Fail if grammar not precomputed@>@/
  @<Fail if grammar |item_id| is invalid@>@/
  return Postdot_SYMID_of_AIM(AIM_by_ID(item_id));
}

@ @<Public function prototypes@> =
gint marpa_AHFA_item_sort_key(struct marpa_g* g, Marpa_AHFA_Item_ID item_id);
@ @<Function definitions@> =
gint marpa_AHFA_item_sort_key(struct marpa_g* g,
  Marpa_AHFA_Item_ID item_id) {
  @<Return |-2| on failure@>@/
  @<Fail if grammar not precomputed@>@/
  @<Fail if grammar |item_id| is invalid@>@/
  return Sort_Key_of_AIM(AIM_by_ID(item_id));
}

@** Creating the AHFA Items.
@ I do not use a |DSTACK| because I can initially size the
item stack to |Size_of_G(g)|, which is a reasonable allocation
and is guaranteed to be greater than
or equal to the final number of items.
That means that I can avoid the overhead of checking the array
size when adding each new AHFA item.
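
The allocation strategy just described can be sketched in standalone C.
This is a sketch under assumptions, not the code that follows:
it uses |malloc| and |realloc| where |libmarpa| uses GLib,
the names |item| and |build_items| are hypothetical,
and the upper bound is taken to be the sum of the RHS lengths plus one per rule.
Because the bound can never be exceeded, no capacity check is needed
when an item is appended, and the array is shrunk once at the end.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int rule;     /* index of the rule */
  int position; /* RHS position of the dot, -1 for a completion */
} item;

/* rule_len[r] is the RHS length of rule r.  rhs_is_nullable holds one
   flag per RHS symbol, for all the rules, concatenated in rule order. */
static item *build_items(const int *rule_len, int rule_count,
                         const unsigned char *rhs_is_nullable,
                         size_t *item_count)
{
  size_t bound = 0;
  item *base, *cur, *shrunk;
  int r, ix;

  for (r = 0; r < rule_count; r++) {
    assert(rule_len[r] >= 0);
    bound += (size_t)rule_len[r] + 1; /* one per position, one completion */
  }
  base = malloc((bound ? bound : 1) * sizeof *base);
  if (!base) return NULL;
  cur = base;
  for (r = 0; r < rule_count; r++) {
    for (ix = 0; ix < rule_len[r]; ix++) {
      if (rhs_is_nullable[ix]) continue; /* no item before a nullable */
      cur->rule = r; cur->position = ix; cur++;
    }
    cur->rule = r; cur->position = -1; cur++; /* the completion */
    rhs_is_nullable += rule_len[r];
  }
  *item_count = (size_t)(cur - base);
  /* Shrink to fit.  If the shrink fails, the original block is still valid. */
  shrunk = realloc(base, (*item_count ? *item_count : 1) * sizeof *base);
  return shrunk ? shrunk : base;
}
```

Note that the caller must take the returned pointer as the new base,
since the shrinking |realloc| is allowed to move the array.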
3685@<Function definitions@> = 3686static inline 3687void create_AHFA_items(GRAMMAR g) { 3688 RULEID rule_id; 3689 guint no_of_items; 3690 guint no_of_rules = RULE_Count_of_G(g); 3691 AIM base_item = g_new(struct s_AHFA_item, Size_of_G(g)); 3692 AIM current_item = base_item; 3693 guint symbol_instance_of_next_rule = 0; 3694 for (rule_id = 0; rule_id < (Marpa_Rule_ID)no_of_rules; rule_id++) { 3695 RULE rule = RULE_by_ID (g, rule_id); 3696 if (RULE_is_Used (rule)) { 3697 @<Create the AHFA items for a rule@>@; 3698 SYMI_of_RULE(rule) = symbol_instance_of_next_rule; 3699 symbol_instance_of_next_rule += Length_of_RULE(rule); 3700 } 3701 } 3702 SYMI_Count_of_G(g) = symbol_instance_of_next_rule; 3703 no_of_items = LV_AIM_Count_of_G(g) = current_item - base_item; 3704 g->t_AHFA_items = g_renew(struct s_AHFA_item, base_item, no_of_items); 3705 @<Set up the items-by-rule list@>@; 3706 @<Set up the AHFA item ids@>@; 3707} 3708@ @<Private function prototypes@> = 3709static inline void create_AHFA_items(struct marpa_g* g); 3710 3711@ @<Create the AHFA items for a rule@> = 3712{ 3713 gint leading_nulls = 0; 3714 gint rhs_ix; 3715 for (rhs_ix = 0; rhs_ix < Length_of_RULE(rule); rhs_ix++) 3716 { 3717 SYMID rh_symid = RHS_ID_of_RULE (rule, rhs_ix); 3718 SYM symbol = SYM_by_ID (rh_symid); 3719 if (!symbol->t_is_nullable) 3720 { 3721 Last_Proper_SYMI_of_RULE(rule) = symbol_instance_of_next_rule + rhs_ix; 3722 @<Create an AHFA item for a precompletion@>@; 3723 leading_nulls = 0; 3724 current_item++; 3725 } 3726 else 3727 { 3728 leading_nulls++; 3729 } 3730 } 3731 @<Create an AHFA item for a completion@>@; 3732 current_item++; 3733} 3734 3735@ @<Create an AHFA item for a precompletion@> = 3736{ 3737 RULE_of_AIM (current_item) = rule; 3738 Sort_Key_of_AIM (current_item) = current_item - base_item; 3739 Null_Count_of_AIM(current_item) = leading_nulls; 3740 Postdot_SYMID_of_AIM (current_item) = rh_symid; 3741 Position_of_AIM (current_item) = rhs_ix; 3742} 3743 3744@ @<Create an AHFA item 
for a completion@> =
{
  RULE_of_AIM (current_item) = rule;
  Sort_Key_of_AIM (current_item) = current_item - base_item;
  Null_Count_of_AIM(current_item) = leading_nulls;
  Postdot_SYMID_of_AIM (current_item) = -1;
  Position_of_AIM (current_item) = -1;
}

@ This is done after creating the AHFA items, because in
theory the |g_renew| might have moved them.
This is not likely since the |g_renew| shortened the array,
but if you are hoping for portability,
you want to follow the rules.
@<Set up the items-by-rule list@> =
{
  AIM *items_by_rule = g_new (AIM, no_of_rules);
  AIM items = g->t_AHFA_items;
  /* The highest ID of a rule whose AHFA items have been found */
  Marpa_Rule_ID highest_found_rule_id = -1;
  Marpa_AHFA_Item_ID item_id;
  /* |items_by_rule| must be NULL'd
     because not all entries will be populated */
  for (rule_id = 0; rule_id < (Marpa_Rule_ID) no_of_rules; rule_id++)
    {
      items_by_rule[rule_id] = NULL;
    }
  for (item_id = 0; item_id < (Marpa_AHFA_Item_ID) no_of_items; item_id++)
    {
      AIM item = items + item_id;
      Marpa_Rule_ID rule_id_for_item = RULE_of_AIM (item)->t_id;
      if (rule_id_for_item <= highest_found_rule_id)
        continue;
      items_by_rule[rule_id_for_item] = item;
      highest_found_rule_id = rule_id_for_item;
    }
  g->t_AHFA_items_by_rule = items_by_rule;
}

@ @<Private function prototypes@> =
static gint cmp_by_aimid (gconstpointer a,
  gconstpointer b, gpointer user_data);
@ This comparison function is used to sort a list of pointers to
AHFA items by AHFA item id,
which is their most natural order.
Once the AHFA states are created,
they are restored to this order.
For portability,
it requires the AIMs to be in an array.
@ @<Function definitions@> =
static gint cmp_by_aimid (gconstpointer ap,
  gconstpointer bp,
  gpointer user_data @, G_GNUC_UNUSED) {
  AIM a = *(AIM*)ap;
  AIM b = *(AIM*)bp;
  return a-b;
}

@ @<Private function prototypes@> =
static gint cmp_by_postdot_and_aimid (gconstpointer a,
  gconstpointer b, gpointer user_data);
@ The AHFA items were created with a temporary ID which sorts them
by rule, then by position within that rule. We need one that sorts the AHFA items
by (from major to minor) postdot symbol, then rule, then position.
A postdot symbol of $-1$ should sort high.
This comparison function is used in the logic to change the AHFA item IDs
from their temporary values to their final ones.
@ @<Function definitions@> =
static gint cmp_by_postdot_and_aimid (gconstpointer ap,
  gconstpointer bp, gpointer user_data @, G_GNUC_UNUSED) {
  AIM a = *(AIM*)ap;
  AIM b = *(AIM*)bp;
  gint a_postdot = Postdot_SYMID_of_AIM(a);
  gint b_postdot = Postdot_SYMID_of_AIM(b);
  if (a_postdot == b_postdot)
    return Sort_Key_of_AIM (a) - Sort_Key_of_AIM (b);
  if (a_postdot < 0) return 1;
  if (b_postdot < 0) return -1;
  return a_postdot-b_postdot;
}

@ Change the AHFA item IDs from their temporary form to their
final form.
Pointers to the AHFA items are copied to a temporary array
which is then sorted in the order required for the new ID.
As a result, the final AHFA item ID number will be the same as
the index in this temporary array.
A final loop then indexes through
the temporary array and writes the index to the pointed-to
AHFA item as its new, final ID.
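
The renumbering can be sketched in standalone C as follows.
This is an illustration under assumptions rather than the code of this section:
it uses the standard |qsort| where |libmarpa| uses GLib,
and |item|, |cmp_by_postdot_then_key| and |renumber| are hypothetical names.
The items themselves never move.
Only the temporary array of pointers is sorted,
and the sort keys are overwritten after the sort has finished,
so the comparison function always sees the temporary keys.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int sort_key; /* temporary key on entry, final ID on exit */
  int postdot;  /* postdot symbol ID, -1 for a completion */
} item;

/* Order by postdot symbol, with -1 sorting high, then by temporary key. */
static int cmp_by_postdot_then_key(const void *ap, const void *bp)
{
  const item *a = *(const item *const *)ap;
  const item *b = *(const item *const *)bp;
  if (a->postdot == b->postdot)
    return (a->sort_key > b->sort_key) - (a->sort_key < b->sort_key);
  if (a->postdot < 0) return 1;
  if (b->postdot < 0) return -1;
  return (a->postdot > b->postdot) - (a->postdot < b->postdot);
}

/* Give each of the n items its final ID.  Returns 0 on success,
   -1 if the temporary array cannot be allocated. */
static int renumber(item *items, size_t n)
{
  size_t i;
  item **sorted = malloc((n ? n : 1) * sizeof *sorted);
  if (!sorted) return -1;
  for (i = 0; i < n; i++) sorted[i] = items + i;
  qsort(sorted, n, sizeof *sorted, cmp_by_postdot_then_key);
  for (i = 0; i < n; i++) sorted[i]->sort_key = (int)i;
  free(sorted);
  return 0;
}
```

Since the temporary keys are distinct, the comparison never returns zero for
two different items, and it does not matter that |qsort| is not a stable sort.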
@<Set up the AHFA item ids@> =
{
  Marpa_AHFA_Item_ID item_id;
  AIM *sort_array = g_new (struct s_AHFA_item *, no_of_items);
  AIM items = g->t_AHFA_items;
  for (item_id = 0; item_id < (Marpa_AHFA_Item_ID) no_of_items; item_id++)
    {
      sort_array[item_id] = items + item_id;
    }
  g_qsort_with_data (sort_array,
      (gint) no_of_items, sizeof (AIM), cmp_by_postdot_and_aimid,
      (gpointer) NULL);
  for (item_id = 0; item_id < (Marpa_AHFA_Item_ID) no_of_items; item_id++)
    {
      Sort_Key_of_AIM (sort_array[item_id]) = item_id;
    }
  g_free (sort_array);
}

@** AHFA State (AHFA) Code.

This algorithm to create the AHFA states is new with |libmarpa|.
It is based on noting that the states to be created fall into
distinct classes, and that considerable optimization is possible
if the classes of AHFA states are optimized separately.
@ In their paper Aycock and Horspool divide the states of their
automaton into
what they call non-kernel and kernel states.
In the AHFA, kernel states are called discovered AHFA states.
Non-kernel states are called predicted AHFA states.
If an AHFA state contains a start rule
or an AHFA item for which at least some
non-nulling symbol has been recognized,
it is a {\bf discovered} AHFA state.
Otherwise, the AHFA state will contain only predictions,
and is a {\bf predicted} AHFA state.
@ Predicted AHFA states are so called because they only contain
items which predict, according to the grammar,
what might be found in the input.
Discovered AHFA states are so called because either they ``report''
the start of the input
or they ``report'' symbols actually found in the input.
There is only one case in which
a discovered AHFA state will contain a prediction ---
that is when the AHFA state contains an
AHFA item for the nulling start rule.
@ {\bf The Initial AHFA State}:
This is the only state which can
contain an AHFA item for a null rule.
It only takes one of three possible forms.
Listing the reasons that it makes sense to special-case
this class would take more space than the code to do it.
@ {\bf The Initial AHFA Prediction State}:
This state is paired with a special-cased state, so it would
require going out of our way to {\bf not} special-case this
state as well.
It does
share with the other initial state the property that it is not
necessary to check to ensure it does not duplicate an existing
state.
Other than that, the code is much like that to create any other
prediction state.
@ {\bf Discovered States with 1 item}:
These may be specially optimized for.
Sorting the items can be dispensed with.
Checking for duplicates can be done using an array indexed by
the ID of the only AHFA item.
Statistics for practical grammars show that most discovered states
contain only a single AHFA item, so there is a big payoff from
special-casing these.
@ {\bf Discovered States with 2 or more items}:
For non-singleton discovered states,
I use a hand-written insertion sort,
and check for duplicates using a hash with a customized key.
Further optimizations are possible, but
few discovered states fall into this case.
Also, discovered states of 2 items are a large enough class to justify
separating out, if a significant optimization for them could be
found.
@ {\bf Predicted States}:
These are treated differently from discovered states.
The items in these are always a subset of the initial items for rules,
and therefore each predicted state corresponds to a subset of the rules,
that is, to an element of the powerset of the rules.
This fact is used in precomputing rule bit vectors, by postdot symbol,
to speed up the construction of these.
An advantage of using bit vectors is that a radix sort of the items
happens as a side effect.
Because prediction states follow a very different distribution from
discovered states, they have their own hash for checking duplicates.

@<Public typedefs@> =
typedef gint Marpa_AHFA_State_ID;

@ {\bf Estimating the number of AHFA States}: Based on the numbers given previously
for Perl and HTML,
$2s$ is a good high-ball estimate of the number of AHFA states for
grammars of practical interest,
where $s$ is the size of the grammar.
I come up with this as follows.

Let the size of an AHFA state be the number of AHFA items it contains.
\li It is impossible for the number of AHFA items to be greater than
the size of the grammar.
\li It is impossible for the number of discovered states of size 1
to be greater than the number of AHFA items.
\li The number of discovered states of size 2 or greater
will typically be half the number of discovered states of size 1,
or less.
\li The number of predicted states will typically be
considerably less than half the number of discovered states.

The three classes of state just enumerated
(discovered states of size 1, discovered states of size 2 or greater,
and predicted states)
exhaust the possibilities for AHFA states.
The total is ${s \over 2} + {s \over 2} + s = 2s$.
Typically, the number of AHFA states should be less than this estimate.
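
Spelled out, and only as one reading of the estimate rather than a proof:
write $d_1$ for the number of discovered states of size 1,
$d_2$ for the number of discovered states of size 2 or greater,
and $p$ for the number of predicted states.
These three names are introduced here for illustration only.
The first two items give $d_1 \le s$,
the third gives, typically, $d_2 \le d_1/2 \le s/2$,
and the total of $2s$ amounts to allowing $p \le s/2$ for the predicted states,
so that
$$ d_1 + d_2 + p \;\le\; s + {s \over 2} + {s \over 2} \;=\; 2s. $$
For example, a grammar of size $s = 1000$ would be budgeted at
no more than 2000 AHFA states.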
3948 3949@d AHFA_of_G_by_ID(g, id) ((g)->t_AHFA+(id)) 3950@d AHFA_has_Completed_Start_Rule(ahfa) ((ahfa)->t_has_completed_start_rule) 3951@<Private incomplete structures@> = struct s_AHFA_state; 3952@ @<Private structures@> = 3953struct s_AHFA_state_key { 3954 Marpa_AHFA_State_ID t_id; 3955}; 3956struct s_AHFA_state { 3957 struct s_AHFA_state_key t_key; 3958 struct s_AHFA_state* t_empty_transition; 3959 @<Widely aligned AHFA state elements@>@; 3960 @<Int aligned AHFA state elements@>@; 3961 guint t_has_completed_start_rule:1; 3962 @<Bit aligned AHFA elements@>@; 3963}; 3964typedef struct s_AHFA_state AHFA_Object; 3965 3966@*0 Complete Symbols Container. 3967@ @d Complete_SYMIDs_of_AHFA(state) ((state)->t_complete_symbols) 3968@d LV_Complete_SYMIDs_of_AHFA(state) Complete_SYMIDs_of_AHFA(state) 3969@d Complete_SYM_Count_of_AHFA(state) ((state)->t_complete_symbol_count) 3970@d LV_Complete_SYM_Count_of_AHFA(state) Complete_SYM_Count_of_AHFA(state) 3971@<Int aligned AHFA state elements@> = 3972guint t_complete_symbol_count; 3973@ @<Widely aligned AHFA state elements@> = 3974SYMID* t_complete_symbols; 3975 3976@*0 AHFA Item Container. 3977@ @d AIMs_of_AHFA(ahfa) ((ahfa)->t_items) 3978@d AIM_of_AHFA_by_AEX(ahfa, aex) (AIMs_of_AHFA(ahfa)[aex]) 3979@d LV_AIMs_of_AHFA(ahfa) AIMs_of_AHFA(ahfa) 3980@d AIM_Count_of_AHFA(ahfa) ((ahfa)->t_item_count) 3981@d LV_AIM_Count_of_AHFA(ahfa) AIM_Count_of_AHFA(ahfa) 3982@d AEX_of_AHFA_by_AIM(ahfa, aim) aex_of_ahfa_by_aim_get((ahfa), (aim)) 3983@<Widely aligned AHFA state elements@> = 3984AIM* t_items; 3985@ @<Int aligned AHFA state elements@> = 3986guint t_item_count; 3987@ This function assumes that the caller knows that the AHFA item 3988is in the AHFA state. 3989@<Private function prototypes@> = 3990static inline AEX aex_of_ahfa_by_aim_get(AHFA ahfa, AIM aim_sought); 3991@ Binary search is overkill for discovered states, 3992not even repaying the overhead. 3993But prediction states can get larger, 3994and the overhead is always low. 
An alternative is to have different search routines based on the number
of AIMs, but that is more overhead.
Perhaps better to just search than
to spend cycles figuring out how to search.
@<Function definitions@> =
static inline AEX aex_of_ahfa_by_aim_get(AHFA ahfa, AIM sought_aim)
{
  AIM* const aims = AIMs_of_AHFA(ahfa);
  gint aim_count = AIM_Count_of_AHFA(ahfa);
  gint hi = aim_count - 1;
  gint lo = 0;
  while (hi >= lo) { // A binary search
    gint trial_aex = lo+(hi-lo)/2; // guards against overflow
    AIM trial_aim = aims[trial_aex];
    if (trial_aim == sought_aim) return trial_aex;
    if (trial_aim < sought_aim) {
      lo = trial_aex+1;
    } else {
      hi = trial_aex-1;
    }
  }
  return -1;
}

@*0 Is AHFA Predicted?.
@ This boolean indicates whether the
{\bf AHFA state} is predicted,
as opposed to whether it contains any predicted
AHFA items.
This makes a difference in AHFA state 0.
When the null parse is allowed,
AHFA state 0 will contain an AHFA item
which is {\bf both} a prediction
and a completion.
AHFA state 0 is, however, {\bf never}
a predicted AHFA state.
@d AHFA_is_Predicted(ahfa) ((ahfa)->t_is_predict)
@d LV_AHFA_is_Predicted(ahfa) AHFA_is_Predicted(ahfa)
@d EIM_is_Predicted(eim) AHFA_is_Predicted(AHFA_of_EIM(eim))
@<Bit aligned AHFA elements@> =
guint t_is_predict:1;

@ @<Private typedefs@> =
typedef struct s_AHFA_state* AHFA;
typedef gint AHFAID;

@ @<Widely aligned grammar elements@> = struct s_AHFA_state* t_AHFA;
@
@d AHFA_Count_of_G(g) ((g)->t_AHFA_len)
@<Int aligned grammar elements@> = gint t_AHFA_len;
@ @<Initialize grammar elements@> =
g->t_AHFA = NULL;
AHFA_Count_of_G(g) = 0;
@*0 Destructor.
4049@<Destroy grammar elements@> = if (g->t_AHFA) { 4050AHFAID id; 4051for (id = 0; id < AHFA_Count_of_G(g); id++) { 4052 AHFA ahfa_state = AHFA_of_G_by_ID(g, id); 4053 @<Free AHFA state@>@; 4054} 4055STOLEN_DQUEUE_DATA_FREE(g->t_AHFA); 4056} 4057 4058@ Most of the data is on the obstack, and will be freed with that. 4059@<Free AHFA state@> = { 4060 TRANS *ahfa_transitions = LV_TRANSs_of_AHFA (ahfa_state); 4061 if (ahfa_transitions) 4062 g_free (TRANSs_of_AHFA (ahfa_state)); 4063} 4064 4065@*0 ID of AHFA State. 4066@d ID_of_AHFA(state) ((state)->t_key.t_id) 4067 4068@*0 Validate AHFA ID. 4069Check that AHFA ID is in valid range. 4070@<Function definitions@> = 4071static inline gint AHFA_state_id_is_valid( 4072const struct marpa_g *g, AHFAID AHFA_state_id) { 4073return AHFA_state_id < AHFA_Count_of_G(g) && AHFA_state_id >= 0; 4074} 4075@ @<Private function prototypes@> = 4076static inline gint AHFA_state_id_is_valid( 4077const struct marpa_g *g, AHFAID AHFA_state_id); 4078 4079 4080@*0 Postdot Symbols. 4081@d Postdot_SYM_Count_of_AHFA(state) ((state)->t_postdot_sym_count) 4082@d LV_Postdot_SYM_Count_of_AHFA(state) Postdot_SYM_Count_of_AHFA(state) 4083@d Postdot_SYMID_Ary_of_AHFA(state) ((state)->t_postdot_symid_ary) 4084@d LV_Postdot_SYMID_Ary_of_AHFA(state) Postdot_SYMID_Ary_of_AHFA(state) 4085@<Widely aligned AHFA state elements@> = Marpa_Symbol_ID* t_postdot_symid_ary; 4086@ @<Int aligned AHFA state elements@> = guint t_postdot_sym_count; 4087 4088@*0 AHFA State External Accessors. 
4089@<Function definitions@> = 4090guint marpa_AHFA_state_count(struct marpa_g* g) { 4091 return AHFA_Count_of_G(g); 4092} 4093@ @<Public function prototypes@> = 4094guint marpa_AHFA_state_count(struct marpa_g* g); 4095 4096@ @<Function definitions@> = 4097gint 4098marpa_AHFA_state_item_count(struct marpa_g* g, AHFAID AHFA_state_id) 4099{ @<Return |-2| on failure@>@/ 4100 AHFA state; 4101 @<Fail if grammar not precomputed@>@/ 4102 @<Fail if grammar |AHFA_state_id| is invalid@>@/ 4103 state = AHFA_of_G_by_ID(g, AHFA_state_id); 4104 return state->t_item_count; 4105} 4106@ @<Public function prototypes@> = 4107gint marpa_AHFA_state_item_count(struct marpa_g* g, Marpa_AHFA_State_ID AHFA_state_id); 4108 4109@ @<Public function prototypes@> = 4110Marpa_AHFA_Item_ID marpa_AHFA_state_item(struct marpa_g* g, 4111 Marpa_AHFA_State_ID AHFA_state_id, 4112 guint item_ix); 4113@ @d AIMID_of_AHFA_by_AEX(g, ahfa, aex) 4114 ((ahfa)->t_items[aex] - (g)->t_AHFA_items) 4115@<Function definitions@> = 4116Marpa_AHFA_Item_ID marpa_AHFA_state_item(struct marpa_g* g, 4117 AHFAID AHFA_state_id, 4118 guint item_ix) { 4119 AHFA state; 4120 @<Return |-2| on failure@>@/ 4121 @<Fail if grammar not precomputed@>@/ 4122 @<Fail if grammar |AHFA_state_id| is invalid@>@/ 4123 state = AHFA_of_G_by_ID(g, AHFA_state_id); 4124 if (item_ix >= state->t_item_count) { 4125 g_context_clear(g); 4126 g_context_int_add(g, "item_ix", (gint)item_ix); 4127 g_context_int_add(g, "AHFA_state_id", AHFA_state_id); 4128 g->t_error = "invalid state item ix"; 4129 return failure_indicator; 4130 } 4131 return AIMID_of_AHFA_by_AEX(g, state, item_ix); 4132} 4133 4134@ @<Function definitions@> = 4135gint marpa_AHFA_state_is_predict(struct marpa_g* g, 4136 AHFAID AHFA_state_id) { 4137 AHFA state; 4138 @<Return |-2| on failure@>@/ 4139 @<Fail if grammar not precomputed@>@/ 4140 @<Fail if grammar |AHFA_state_id| is invalid@>@/ 4141 state = AHFA_of_G_by_ID(g, AHFA_state_id); 4142 return AHFA_is_Predicted(state); 4143} 4144@ 
@<Public function prototypes@> =
gint marpa_AHFA_state_is_predict(struct marpa_g* g,
        Marpa_AHFA_State_ID AHFA_state_id);

@*0 Completed Start Rule.
This external accessor returns the rule ID of
the completed start rule of an AHFA state.
Most often there is none, in which case
|-1| is returned.
On failure, |-2| is returned.
@ @<Public function prototypes@> =
Marpa_Rule_ID marpa_AHFA_completed_start_rule(struct marpa_g* g,
        Marpa_AHFA_State_ID AHFA_state_id);
@ I know that the completed start rule in this AHFA state is
unique, via the following theorem.
\Theorem/ No AHFA state contains more than one completed start rule.
\Proof/: As proved elsewhere in this document,
an AHFA state with a completed start rule is either AHFA state 0
or a 1-item discovered AHFA state.
Clearly the AHFA item which is the completed start rule is
unique in a 1-item AHFA state.
From its construction we know that
AHFA state 0 contains at most two rules:
a predicted non-null start rule
and a predicted null start rule.
A predicted non-null rule is not a completed rule.
Therefore only the predicted null start rule
can be a completed start rule in AHFA state 0.
\QED/.
@
{\bf To Do}: @^To Do@>
This function can probably be eliminated after conversion
is complete, along with the flag for whether a rule is a start rule
and the flag for tracking whether an AHFA has a completed start rule.

@<Function definitions@> =
Marpa_Rule_ID marpa_AHFA_completed_start_rule(struct marpa_g* g,
        Marpa_AHFA_State_ID AHFA_state_id) {
  const gint no_completed_start_rule = -1;
  @<Return |-2| on failure@>@;
  AHFA state;
  @<Fail if grammar not precomputed@>@;
  @<Fail if grammar |AHFA_state_id| is invalid@>@;
  state = AHFA_of_G_by_ID (g, AHFA_state_id);
  if (AHFA_has_Completed_Start_Rule(state)) {
      const gint ahfa_item_count = state->t_item_count;
      const AIM* ahfa_items = state->t_items;
      gint ahfa_ix;
      for (ahfa_ix = 0; ahfa_ix < ahfa_item_count; ahfa_ix++)
        {
          const AIM ahfa_item = ahfa_items[ahfa_ix];
          if (AIM_is_Completion (ahfa_item))
            {
              const RULE rule = RULE_of_AIM (ahfa_item);
              if (RULE_is_Start (rule))
                return ID_of_RULE (rule);
            }
        }
      @<Fail with internal grammar error@>@;
  }
  return no_completed_start_rule;
}

@*0 Leo LHS Symbol.
The Leo LHS symbol is the LHS of the AHFA state's rule,
if that state can be a Leo completion.
Otherwise it is |-1|.
The value of the Leo completion symbol is used to
determine if an Earley item
with this AHFA state is eligible to be a Leo completion.
@d Leo_LHS_ID_of_AHFA(state) ((state)->t_leo_lhs_sym)
@d LV_Leo_LHS_ID_of_AHFA(state) Leo_LHS_ID_of_AHFA(state)
@d AHFA_is_Leo_Completion(state) (Leo_LHS_ID_of_AHFA(state) >= 0)
@ @<Int aligned AHFA state elements@> = SYMID t_leo_lhs_sym;
@ @<Public function prototypes@> =
Marpa_Symbol_ID marpa_AHFA_state_leo_lhs_symbol(struct marpa_g* g,
        Marpa_AHFA_State_ID AHFA_state_id);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_AHFA_state_leo_lhs_symbol(struct marpa_g* g,
        Marpa_AHFA_State_ID AHFA_state_id) {
    @<Return |-2| on failure@>@;
    AHFA state;
    @<Fail if grammar not precomputed@>@;
    @<Fail if grammar |AHFA_state_id| is invalid@>@;
    state = AHFA_of_G_by_ID(g, AHFA_state_id);
    return Leo_LHS_ID_of_AHFA(state);
}

@*0 Internal Accessors.
@ The ordering of the AHFA states can be arbitrarily chosen
to be efficient to compute.
The only requirement is that states with identical sets
of items compare equal.
Here the length is the first subkey, because
that will be enough to order most predicted states.
The discovered states will be efficient to compute because
they will tend either to be short,
or quickly differentiated
by length.
\par
Note that this function is not used for discovered AHFA states of
size 1.
Checking those for duplicates is optimized, using an array
indexed by the ID of their only AHFA item.
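The ordering described above, length first and then the item sort keys in sequence, can be tried out in isolation. This is only a sketch under simplifying assumptions: |toy_state| and |toy_state_cmp| are hypothetical names, and each item is reduced to a bare integer sort key. It uses explicit comparisons instead of subtraction, which is a choice made for the sketch so that the sign of the result cannot be affected by integer overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a state: items are reduced to their sort keys. */
struct toy_state {
    size_t item_count;
    const int *sort_keys;   /* one key per item, already in ascending order */
};

/* Negative, zero or positive, in the manner of a qsort comparator.
   The length is the first subkey; the item keys break ties. */
static int toy_state_cmp(const struct toy_state *a, const struct toy_state *b)
{
    size_t i;
    assert(a != NULL && b != NULL);
    if (a->item_count != b->item_count)
        return a->item_count < b->item_count ? -1 : 1;
    for (i = 0; i < a->item_count; i++) {
        if (a->sort_keys[i] != b->sort_keys[i])
            return a->sort_keys[i] < b->sort_keys[i] ? -1 : 1;
    }
    return 0;   /* identical item sets compare equal */
}
```

Two states of different lengths are separated by the first test alone, so the key-by-key loop only runs for states of equal size.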
@<Private function prototypes@> =
static gint AHFA_state_cmp(gconstpointer a, gconstpointer b);
@ @<Function definitions@> =
static gint AHFA_state_cmp(
    gconstpointer ap,
    gconstpointer bp)
{
    guint i;
    AIM* items_a;
    AIM* items_b;
    const AHFA state_a = (AHFA)ap;
    const AHFA state_b = (AHFA)bp;
    guint length = state_a->t_item_count;
    gint subkey = length - state_b->t_item_count;
    if (subkey) return subkey;
    if (length != state_b->t_item_count) return FALSE;
    items_a = state_a->t_items;
    items_b = state_b->t_items;
    for (i = 0; i < length; i++) {
        subkey = Sort_Key_of_AIM (items_a[i]) - Sort_Key_of_AIM (items_b[i]);
        if (subkey) return subkey;
}
return 0;
}

@*0 AHFA State Mutators.
@ @<Private function prototypes@> =
PRIVATE_NOT_INLINE void create_AHFA_states(struct marpa_g* g);
@ @<Function definitions@> =
PRIVATE_NOT_INLINE
void create_AHFA_states(struct marpa_g* g) {
    @<Declare locals for creating AHFA states@>@;
    @<Initialize locals for creating AHFA states@>@;
   @<Construct prediction matrix@>@;
   @<Construct initial AHFA states@>@;
   while ((p_working_state = DQUEUE_NEXT(states, AHFA_Object))) {
       @<Process an AHFA state from the working stack@>@;
   }
   ahfas_of_g = g->t_AHFA = DQUEUE_BASE(states, AHFA_Object); /* ``Steals"
       the |DQUEUE|'s data */
   ahfa_count_of_g = AHFA_Count_of_G(g) = DQUEUE_END(states);
   @<Resize the transitions@>@;
   @<Resort the AIMs and populate the Leo base AEXes@>@;
   @<Populate the completed symbol data in the transitions@>@;
   @<Free locals for creating AHFA states@>@;
}

@ @<Declare locals for creating AHFA states@> =
   AHFA p_working_state;
   const guint initial_no_of_states = 2*Size_of_G(g);
   AIM AHFA_item_0_p = g->t_AHFA_items;
   const guint symbol_count_of_g = SYM_Count_of_G(g);
   const guint rule_count_of_g = RULE_Count_of_G(g);
   Bit_Matrix prediction_matrix;
   RULE* rule_by_sort_key = g_new(RULE, rule_count_of_g);
    GTree* duplicates;
    AHFA* singleton_duplicates;
   DQUEUE_DECLARE(states);
  struct obstack ahfa_work_obs;
  gint ahfa_count_of_g;
  AHFA ahfas_of_g;

@ @<Initialize locals for creating AHFA states@> =
    @<Initialize duplicates data structures@>@;
   DQUEUE_INIT(states, AHFA_Object, initial_no_of_states);

@ @<Initialize duplicates data structures@> =
{
  guint item_id;
  guint no_of_items_in_grammar = AIM_Count_of_G (g);
  obstack_init(&ahfa_work_obs);
  duplicates = g_tree_new (AHFA_state_cmp);
  singleton_duplicates = g_new (AHFA, no_of_items_in_grammar);
  for (item_id = 0; item_id < no_of_items_in_grammar; item_id++)
    {
      singleton_duplicates[item_id] = NULL; // All zero bits are not necessarily a NULL pointer
    }
}

@ @<Process an AHFA state from the working stack@> = {
guint no_of_items = p_working_state->t_item_count;
guint current_item_ix=0;
AIM*item_list;
Marpa_Symbol_ID working_symbol;
item_list = p_working_state->t_items;
working_symbol = Postdot_SYMID_of_AIM(item_list[0]); /*
    Every AHFA has at least one item */
if (working_symbol < 0) goto NEXT_AHFA_STATE; /*
    All items in this state are completions */
    while (1) { /* Loop over all items for this state */
        guint first_working_item_ix = current_item_ix;
        guint no_of_items_in_new_state;
        for (current_item_ix++;
                current_item_ix < no_of_items;
                current_item_ix++) {
            if (Postdot_SYMID_of_AIM(item_list[current_item_ix]) != working_symbol) break;
        }
        no_of_items_in_new_state = current_item_ix - first_working_item_ix;
        if (no_of_items_in_new_state == 1) {
            @<Create a 1-item discovered AHFA state@>@/
        } else {
            @<Create a discovered AHFA state with 2+ items@>@/
        }
        NEXT_WORKING_SYMBOL: ;
        if (current_item_ix >= no_of_items) break;
        working_symbol = Postdot_SYMID_of_AIM(item_list[current_item_ix]);
        if (working_symbol < 0) break;
    }@#
NEXT_AHFA_STATE: ;
}

@ @<Resize the transitions@> =
{
  gint ahfa_id;
  for (ahfa_id = 0; ahfa_id < ahfa_count_of_g; ahfa_id++) {
      guint symbol_id;
      AHFA ahfa = AHFA_of_G_by_ID(g, ahfa_id);
      TRANS* const transitions = TRANSs_of_AHFA(ahfa);
      for (symbol_id = 0; symbol_id < symbol_count_of_g; symbol_id++) {
           TRANS working_transition = transitions[symbol_id];
           if (working_transition) {
               gint completion_count = Completion_Count_of_TRANS(working_transition);
               gint sizeof_transition =
                   G_STRUCT_OFFSET (struct s_transition, t_aex) + completion_count *
                   sizeof (transitions[0]->t_aex[0]);
               TRANS new_transition = obstack_alloc(&g->t_obs, sizeof_transition);
               LV_To_AHFA_of_TRANS(new_transition) = To_AHFA_of_TRANS(working_transition);
               LV_Completion_Count_of_TRANS(new_transition) = 0;
               transitions[symbol_id] = new_transition;
           }
      }
  }
}

@ @<Populate the completed symbol data in the transitions@> =
{
  gint ahfa_id;
  for (ahfa_id = 0; ahfa_id < ahfa_count_of_g; ahfa_id++) {
      const AHFA ahfa = AHFA_of_G_by_ID(g, ahfa_id);
      TRANS* const transitions = TRANSs_of_AHFA(ahfa);
      if (Complete_SYM_Count_of_AHFA(ahfa) > 0) {
          AIM* aims = AIMs_of_AHFA(ahfa);
          gint aim_count = AIM_Count_of_AHFA(ahfa);
          AEX aex;
          for (aex = 0; aex < aim_count; aex++) {
              AIM ahfa_item = aims[aex];
              if (AIM_is_Completion(ahfa_item)) {
                  SYMID completed_symbol_id = LHS_ID_of_AIM(ahfa_item);
                  TRANS transition = transitions[completed_symbol_id];
                  AEX* aexes = AEXs_of_TRANS(transition);
                  gint aex_ix = LV_Completion_Count_of_TRANS(transition)++;
MARPA_OFF_DEBUG4("Added completion aex at %d for ahfa_id=%d sym=%d",
    aex_ix, ahfa_id, completed_symbol_id);
                  aexes[aex_ix] = aex;
              }
          }
      }
  }
}

@ For
every AHFA item which can be a Leo base, and any transition
(or postdot) symbol that leads to a Leo completion, put the AEX
into the |TRANS| structure, for memoization.
@<Resort the AIMs and populate the Leo base AEXes@> =
{
  gint ahfa_id;
  for (ahfa_id = 0; ahfa_id < ahfa_count_of_g; ahfa_id++)
    {
      AHFA ahfa = AHFA_of_G_by_ID(g, ahfa_id);
      TRANS* const transitions = TRANSs_of_AHFA(ahfa);
      AIM *aims = AIMs_of_AHFA (ahfa);
      gint aim_count = AIM_Count_of_AHFA (ahfa);
      AEX aex;
      g_qsort_with_data(aims, aim_count, sizeof (AIM*), cmp_by_aimid, NULL);
      for (aex = 0; aex < aim_count; aex++)
        {
          AIM ahfa_item = aims[aex];
          SYMID postdot = Postdot_SYMID_of_AIM (ahfa_item);
          if (postdot >= 0)
            {
              TRANS transition = transitions[postdot];
              AHFA to_ahfa = To_AHFA_of_TRANS (transition);
              if (!AHFA_is_Leo_Completion (to_ahfa))
                continue;
              Leo_Base_AEX_of_TRANS (transition) = aex;
            }
        }
    }
}

@ @<Free locals for creating AHFA states@> =
   g_free(rule_by_sort_key);
   matrix_free(prediction_matrix);
    @<Free duplicates data structures@>@;
   obstack_free(&ahfa_work_obs, NULL);

@ @<Free duplicates data structures@> =
g_free(singleton_duplicates);
g_tree_destroy(duplicates);

@ @<Construct initial AHFA states@> = {
  AHFA p_initial_state = DQUEUE_PUSH(states, AHFA_Object);@/
  Marpa_Rule_ID start_rule_id;
  AIM start_item;
  SYM start_symbol = SYM_by_ID(g->t_start_symid);
  SYM start_alias
    = symbol_null_alias(start_symbol);
  gint no_of_items_in_new_state = start_alias ?
2 : 1;
  AIM* item_list
    = obstack_alloc(&g->t_obs, no_of_items_in_new_state*sizeof(AIM));
  start_rule_id = g_array_index(start_symbol->t_lhs, Marpa_Rule_ID, 0); /* The start rule
    is the unique rule that has the start symbol as its LHS */
  start_item = g->t_AHFA_items_by_rule[start_rule_id]; /* The start item is the
    initial item for the start rule */
  item_list[0] = start_item;
  if (start_alias) {
    Marpa_Rule_ID alias_rule_id
      = g_array_index(start_alias->t_lhs, Marpa_Rule_ID, 0); /* Start alias
        rule is the unique rule that has
        the start alias as its LHS */
    item_list[1] = g->t_AHFA_items_by_rule[alias_rule_id];
  }
  p_initial_state->t_items = item_list;
  p_initial_state->t_item_count = no_of_items_in_new_state;
  p_initial_state->t_key.t_id = 0;
  LV_AHFA_is_Predicted(p_initial_state) = 0;
  LV_Leo_LHS_ID_of_AHFA(p_initial_state) = -1;
  LV_TRANSs_of_AHFA(p_initial_state) = transitions_new(g);
  p_initial_state->t_empty_transition = NULL;
  if (SYM_is_Nulling(start_symbol))
    {  // Special case the null parse
      SYMID* complete_symids = obstack_alloc (&g->t_obs, sizeof (SYMID));
      SYMID completed_symbol_id = ID_of_SYM(start_symbol);
      *complete_symids = completed_symbol_id;
      completion_count_inc (&ahfa_work_obs, p_initial_state, completed_symbol_id);
      LV_Complete_SYMIDs_of_AHFA(p_initial_state) = complete_symids;
      LV_Complete_SYM_Count_of_AHFA(p_initial_state) = 1;
      p_initial_state->t_has_completed_start_rule = 1;
      LV_Postdot_SYM_Count_of_AHFA(p_initial_state) = 0;
    }
  else
    {
      SYMID* postdot_symbol_ids;
      LV_Postdot_SYM_Count_of_AHFA(p_initial_state) = 1;
      postdot_symbol_ids = LV_Postdot_SYMID_Ary_of_AHFA(p_initial_state) =
        obstack_alloc (&g->t_obs, sizeof (SYMID));
      *postdot_symbol_ids = Postdot_SYMID_of_AIM(start_item);
      if (start_alias)
        {
          SYMID* complete_symids = obstack_alloc (&g->t_obs, sizeof
(SYMID));
          SYMID completed_symbol_id = ID_of_SYM(start_alias);
          *complete_symids = completed_symbol_id;
          completion_count_inc(&ahfa_work_obs, p_initial_state, completed_symbol_id);
          LV_Complete_SYMIDs_of_AHFA(p_initial_state) = complete_symids;
          LV_Complete_SYM_Count_of_AHFA(p_initial_state) = 1;
          p_initial_state->t_has_completed_start_rule = 1;
        }
      else
        {
          LV_Complete_SYM_Count_of_AHFA(p_initial_state) = 0;
          p_initial_state->t_has_completed_start_rule = 0;
        }
      p_initial_state->t_empty_transition =
        create_predicted_AHFA_state (g,
                                     matrix_row (prediction_matrix,
                                                 (guint)
                                                 Postdot_SYMID_of_AIM (start_item)),
                                     rule_by_sort_key, &states, duplicates);
    }
}

@* Discovered AHFA States.
@ {\bf Theorem}:
An AHFA state that contains a start rule completion is either
AHFA state 0 or a 1-item discovered state.
{\bf Proof}:
AHFA state 0 contains a start rule completion in any grammar
for which the null parse is valid.
AHFA state 0 also contains the non-null parse predicted rule.
\par
The grammar is augmented,
so that no other rule predicts the start rules.
This means that AHFA state 0 will contain the only predicted
start rules.
The form of the non-null predicted start rule
is $S' \leftarrow \cdot S$,
where $S'$ is the augmented start symbol and $S$ was
the start symbol in the original grammar.
This rule will be the only transition out of AHFA state 0.
Call the to-state of this transition, state $n$.
State $n$ will clearly contain a completed start rule
( $S' \leftarrow S \cdot$ ),
which will be the rule for the only AHFA item in AHFA state $n$.
\par
Since only state 0 contains
$S' \leftarrow \cdot S$,
only AHFA state $n$ will contain
$S' \leftarrow S \cdot$.
Therefore all AHFA states containing start rule completions
are either AHFA state 0, or 1-item discovered AHFA states.
{\bf QED}.
@<Create a 1-item discovered AHFA state@> = {
    AHFA p_new_state;
    AIM* new_state_item_list;
    AIM single_item_p = item_list[first_working_item_ix];
    Marpa_AHFA_Item_ID single_item_id;
    Marpa_Symbol_ID postdot;
    single_item_p++;    // Transition to next item for this rule
    single_item_id = single_item_p - AHFA_item_0_p;
    p_new_state = singleton_duplicates[single_item_id];
    if (p_new_state)
      {  /* Do not add, this is a duplicate */
        transition_add (&ahfa_work_obs, p_working_state, working_symbol, p_new_state);
        goto NEXT_WORKING_SYMBOL;
      }
    p_new_state = DQUEUE_PUSH (states, AHFA_Object);
    /* Create a new AHFA state */
    singleton_duplicates[single_item_id] = p_new_state;
    new_state_item_list = p_new_state->t_items =
        obstack_alloc (&g->t_obs, sizeof (AIM));
    new_state_item_list[0] = single_item_p;
    p_new_state->t_item_count = 1;
    LV_AHFA_is_Predicted(p_new_state) = 0;
    if (AIM_has_Completed_Start_Rule(single_item_p)) {
        p_new_state->t_has_completed_start_rule = 1;
    } else {
        p_new_state->t_has_completed_start_rule = 0;
    }
    LV_Leo_LHS_ID_of_AHFA(p_new_state) = -1;
    p_new_state->t_key.t_id = p_new_state - DQUEUE_BASE (states, AHFA_Object);
    LV_TRANSs_of_AHFA(p_new_state) = transitions_new(g);
    transition_add (&ahfa_work_obs, p_working_state, working_symbol, p_new_state);
    postdot = Postdot_SYMID_of_AIM(single_item_p);
    if (postdot >= 0)
      {
        LV_Complete_SYM_Count_of_AHFA(p_new_state) = 0;
        p_new_state->t_postdot_sym_count = 1;
        p_new_state->t_postdot_symid_ary =
          obstack_alloc (&g->t_obs, sizeof (SYMID));
        *(p_new_state->t_postdot_symid_ary) = postdot;
    /* If the sole item is not a completion
     attempt to create a predicted AHFA state as well */
        p_new_state->t_empty_transition =
          create_predicted_AHFA_state (g,
                                       matrix_row (prediction_matrix,
                                                   (guint) postdot),
                                       rule_by_sort_key,
                                       &states, duplicates);
      }
    else
      {
        SYMID lhs_id = LHS_ID_of_AIM(single_item_p);
        SYMID* complete_symids = obstack_alloc (&g->t_obs, sizeof (SYMID));
        *complete_symids = lhs_id;
        LV_Complete_SYMIDs_of_AHFA(p_new_state) = complete_symids;
        completion_count_inc(&ahfa_work_obs, p_new_state, lhs_id);
        LV_Complete_SYM_Count_of_AHFA(p_new_state) = 1;
        p_new_state->t_postdot_sym_count = 0;
        p_new_state->t_empty_transition = NULL;
        @<If this state can be a Leo completion,
        set the Leo completion symbol to |lhs_id|@>@;
      }
}

@
Assuming this is a 1-item completion, mark this state as
a Leo completion if the last non-nulling symbol is on a LHS.
(This eliminates rules which end in a terminal-only symbol from
consideration in the Leo logic.)
We know that there is a non-nulling symbol, because there is
one in every non-nulling rule, the only nulling rule will
be in AHFA state 0, and AHFA state 0 is
handled as a special case.
\par
As a note, the current logic makes an item a Leo completion
if the last non-nulling symbol is on a LHS.
With a bit more trouble, I could determine
which rules are right-recursive.
I would need to compute a transitive closure on the relationship
``X right-derives Y" and then consider a state to be
a Leo completion
only if the LHS of the rule in its only item right-derives its
last non-nulling symbol.

@ The expression below takes the first (and only) item in
the current state, and finds its closest previous non-nulling
symbol.
This will be the postdot symbol of the AHFA item just prior,
which can be found by simply decrementing the pointer.
If the predot symbol of an item is on the LHS of any rule,
then that state is a Leo completion.
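The pointer-decrement step can be seen on a toy layout. This sketch assumes, as the text above implies, that the items of one rule sit next to each other in a single array, so that the item before a completion carries the completion's predot symbol as its postdot symbol. |toy_item| and |toy_leo_lhs| are hypothetical names, and the caller is assumed to pass a completion that is not the first item of its rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an AHFA item. */
struct toy_item {
    int lhs;      /* LHS symbol ID of the item's rule */
    int postdot;  /* postdot symbol ID, or -1 for a completion */
};

/* lhs_rule_count[sym] is the number of rules with sym on their LHS.
   Returns the Leo LHS symbol for a 1-item completion state, or -1. */
static int toy_leo_lhs(const struct toy_item *completion,
                       const int *lhs_rule_count)
{
    const struct toy_item *previous;
    assert(completion != NULL && completion->postdot < 0);
    previous = completion - 1;   /* the item just before the completion */
    /* previous->postdot is the predot symbol of the completion */
    if (lhs_rule_count[previous->postdot] > 0)
        return completion->lhs;
    return -1;                   /* predot symbol is terminal-only */
}
```

With symbols 0 and 1 each on the LHS of one rule and symbol 2 a pure terminal, a rule `0 ::= 2 1` yields Leo LHS 0, while a rule `1 ::= 2` yields -1.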
@<If this state can be a Leo completion,
set the Leo completion symbol to |lhs_id|@> = {
  AIM previous_ahfa_item = single_item_p - 1;
  SYMID predot_symid = Postdot_SYMID_of_AIM(previous_ahfa_item);
  if (SYMBOL_LHS_RULE_COUNT (SYM_by_ID (predot_symid))
      > 0)
    {
        LV_Leo_LHS_ID_of_AHFA(p_new_state) = lhs_id;
    }
}

@ Discovered AHFA states are usually quite small
and the insertion sort here is probably optimal for the usual cases.
It is $O(n^2)$ for the large AHFA states, but at present there is
little value in coding for such cases.
Average complexity -- probably $O(1)$.
Implemented worst-case complexity: $O(n^2)$.
Theoretical complexity: $O(n \log n)$, because another sort can easily be
substituted for the insertion sort.
\par
Note the mixture of indexing and old-fashioned pointer twiddling
in the insertion sort.
I am usually of the opinion that the pointer twiddling should be left
to the optimizer, but in this case I think that a little bit of
pointer twiddling actually makes the code clearer than it would
be if written 100\% using indexes.
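For readers who want to try the insertion step on its own, here is a reduced sketch with plain integer keys in place of AHFA items. It keeps the same mix of an index for the insertion point and a pointer for the shift. The names |toy_sorted_insert| and |toy_insertion_sort| are hypothetical and do not occur in the library.

```c
#include <assert.h>
#include <stddef.h>

/* Insert new_key into list[0..count-1], which is already sorted,
   leaving list[0..count] sorted.  list must have room for count+1 keys. */
static void toy_sorted_insert(int *list, size_t count, int new_key)
{
    int pre_insertion_point_ix = (int)count - 1;
    assert(list != NULL);
    while (pre_insertion_point_ix >= 0) {
        int *current_p = list + pre_insertion_point_ix;
        if (new_key >= *current_p)
            break;                       /* found the slot */
        *(current_p + 1) = *current_p;   /* shift the larger key up */
        pre_insertion_point_ix--;
    }
    list[pre_insertion_point_ix + 1] = new_key;
}

/* Build a sorted copy of in[0..n-1] in out[0..n-1], one key at a time. */
static void toy_insertion_sort(const int *in, int *out, size_t n)
{
    size_t so_far;
    for (so_far = 0; so_far < n; so_far++)
        toy_sorted_insert(out, so_far, in[so_far]);
}
```

Keys that compare equal keep their input order, because the scan stops as soon as it meets a key that is not larger than the new one.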
@<Create a discovered AHFA state with 2+ items@> = {
AHFA p_new_state;
guint predecessor_ix;
guint no_of_new_items_so_far = 0;
AIM* item_list_for_new_state;
AHFA queued_AHFA_state;
p_new_state = DQUEUE_PUSH(states, AHFA_Object);
item_list_for_new_state = p_new_state->t_items = obstack_alloc(&g->t_obs_tricky,
    no_of_items_in_new_state * sizeof(AIM));
p_new_state->t_item_count = no_of_items_in_new_state;
for (predecessor_ix = first_working_item_ix;
     predecessor_ix < current_item_ix; predecessor_ix++)
  {
    gint pre_insertion_point_ix = no_of_new_items_so_far - 1;
    AIM new_item_p = item_list[predecessor_ix] + 1;    // Transition to the next item
    while (pre_insertion_point_ix >= 0)
      {  // Insert the new item, ordered by |sort_key|
        AIM *current_item_pp =
          item_list_for_new_state + pre_insertion_point_ix;
        if (Sort_Key_of_AIM (new_item_p) >=
            Sort_Key_of_AIM (*current_item_pp))
          break;
        *(current_item_pp + 1) = *current_item_pp;
        pre_insertion_point_ix--;
      }
    item_list_for_new_state[pre_insertion_point_ix + 1] = new_item_p;
    no_of_new_items_so_far++;
  }
queued_AHFA_state = assign_AHFA_state(p_new_state, duplicates);
if (queued_AHFA_state)
  {  // The new state would be a duplicate
// Back it out and go on to the next in the queue
    (void) DQUEUE_POP (states, AHFA_Object);
    obstack_free (&g->t_obs_tricky, item_list_for_new_state);
    transition_add (&ahfa_work_obs, p_working_state, working_symbol, queued_AHFA_state);
    /* |transition_add()| allocates obstack memory, but uses the
       ``non-tricky" obstack */
    goto NEXT_WORKING_SYMBOL;
  }
    // If we added the new state, finish up its data.
    p_new_state->t_key.t_id = p_new_state - DQUEUE_BASE(states, AHFA_Object);
    LV_AHFA_is_Predicted(p_new_state) = 0;
    p_new_state->t_has_completed_start_rule = 0;
    LV_Leo_LHS_ID_of_AHFA(p_new_state) =-1;
    LV_TRANSs_of_AHFA(p_new_state) = transitions_new(g);
    @<Calculate complete and postdot symbols for discovered state@>@/
    transition_add(&ahfa_work_obs, p_working_state, working_symbol, p_new_state);
    @<Calculate the predicted rule vector for this state
        and add the predicted AHFA state@>@/
}

@ @<Calculate complete and postdot symbols for discovered state@> =
{
  guint symbol_count = SYM_Count_of_G (g);
  guint item_ix;
  guint no_of_postdot_symbols;
  guint no_of_complete_symbols;
  Bit_Vector complete_v = bv_create (symbol_count);
  Bit_Vector postdot_v = bv_create (symbol_count);
  for (item_ix = 0; item_ix < no_of_items_in_new_state; item_ix++)
    {
      AIM item = item_list_for_new_state[item_ix];
      Marpa_Symbol_ID postdot = Postdot_SYMID_of_AIM (item);
      if (postdot < 0)
        {
          gint complete_symbol_id = LHS_ID_of_AIM (item);
          completion_count_inc (&ahfa_work_obs, p_new_state, complete_symbol_id);
          bv_bit_set (complete_v, (guint)complete_symbol_id );
        }
      else
        {
          bv_bit_set (postdot_v, (guint) postdot);
        }
    }
if ((no_of_postdot_symbols = p_new_state->t_postdot_sym_count =
     bv_count (postdot_v)))
  {
    guint min, max, start;
    Marpa_Symbol_ID *p_symbol = p_new_state->t_postdot_symid_ary =
      obstack_alloc (&g->t_obs,
                     no_of_postdot_symbols * sizeof (SYMID));
    for (start = 0; bv_scan (postdot_v, start, &min, &max); start = max + 2)
      {
        Marpa_Symbol_ID postdot;
        for (postdot = (Marpa_Symbol_ID) min;
             postdot <= (Marpa_Symbol_ID) max; postdot++)
          {
            *p_symbol++ = postdot;
          }
      }
  }
  if ((no_of_complete_symbols =
       LV_Complete_SYM_Count_of_AHFA (p_new_state) = bv_count
(complete_v)))
    {
      guint min, max, start;
      SYMID *complete_symids = obstack_alloc (&g->t_obs,
                                              no_of_complete_symbols *
                                              sizeof (SYMID));
      SYMID *p_symbol = complete_symids;
      LV_Complete_SYMIDs_of_AHFA (p_new_state) = complete_symids;
      for (start = 0; bv_scan (complete_v, start, &min, &max); start = max + 2)
        {
          SYMID complete_symbol_id;
          for (complete_symbol_id = (SYMID) min; complete_symbol_id <= (SYMID) max;
               complete_symbol_id++)
            {
              *p_symbol++ = complete_symbol_id;
            }
        }
    }
    bv_free (postdot_v);
    bv_free (complete_v);
}

@ Find the AHFA state in the argument,
creating it if it does not exist.
When it does not exist, insert it
in the sequence of states
and return |NULL|.
When it does exist, return a pointer to it.
@ @<Private function prototypes@> =
static inline AHFA assign_AHFA_state(
AHFA state_p, GTree* duplicates);
@ @<Function definitions@> =
static inline AHFA
assign_AHFA_state (AHFA sought_state, GTree* duplicates)
{
  const AHFA state_found = g_tree_lookup(duplicates, sought_state);
  if (state_found) return state_found;
  g_tree_insert(duplicates, sought_state, sought_state);
  return NULL;
}

@ @<Calculate the predicted rule vector for this state
and add the predicted AHFA state@> = {
guint item_ix;
Marpa_Symbol_ID postdot = -1; // Initialized to prevent GCC warning
for (item_ix = 0; item_ix < no_of_items_in_new_state; item_ix++) {
    postdot = Postdot_SYMID_of_AIM(item_list_for_new_state[item_ix]);
    if (postdot >= 0) break;
}
p_new_state->t_empty_transition = NULL;
if (postdot >= 0)
{  /* If any item is not a completion ...
*/
  Bit_Vector predicted_rule_vector
    = bv_shadow (matrix_row (prediction_matrix, (guint) postdot));
  for (item_ix = 0; item_ix < no_of_items_in_new_state; item_ix++)
    {
      /* ``or" the other non-complete items into the prediction rule vector */
      postdot = Postdot_SYMID_of_AIM (item_list_for_new_state[item_ix]);
      if (postdot < 0)
        continue;
      bv_or_assign (predicted_rule_vector,
                    matrix_row (prediction_matrix, (guint) postdot));
    }
  /* Add the predicted rule */
  p_new_state->t_empty_transition = create_predicted_AHFA_state (g,
                                       predicted_rule_vector,
                                       rule_by_sort_key,
                                       &states,
                                       duplicates);
  bv_free (predicted_rule_vector);
}
}

@*0 Predicted AHFA States.
The method for building predicted AHFA states is optimized using
precomputed bit vectors.
This should be very fast,
but it is possible that other methods might
be better, at least in some cases. The bit vectors are $O(s)$ in length, where $s$ is the
size of the grammar, and so is the time complexity of the method used.
@ It may be possible to look at a list of
only the AHFA items actually present in each state,
which might be $O(\log s)$ in the average case. An advantage of the bit vectors is they
implicitly perform a radix sort.
This would have to be performed explicitly for an enumerated
list of AHFA items, making the putative average case $O(\log s \cdot \log \log s)$.
@ In the worst case, however, the number of AHFA items in the predicted states is
$O(s)$, making the time complexity
of a list solution, $O(s \cdot \log s)$.
In normal cases,
the practical advantages of bit vectors are overwhelming and swamp the theoretical
time complexity.
The advantage of listing AHFA items is restricted to a putative ``average" case,
and even there would not kick in until the grammars became very large.
My conclusion is that alternatives to the bit vector implementation deserve
further investigation, but that at present, and overall,
bit vectors appear clearly superior to the alternatives.
@ For the predicted states, I construct a symbol-by-rule matrix
of predictions. First, I determine which symbols directly predict
others. Then I compute the transitive closure.
Finally, I convert this to a symbol-by-rule matrix.
The symbol-by-rule matrix will be used in constructing the prediction
states.

@ @<Construct prediction matrix@> = {
    Bit_Matrix symbol_by_symbol_matrix =
        matrix_create (symbol_count_of_g, symbol_count_of_g);
    @<Initialize the symbol-by-symbol matrix@>@/
    transitive_closure(symbol_by_symbol_matrix);
    @<Create the prediction matrix from the symbol-by-symbol matrix@>@/
    matrix_free(symbol_by_symbol_matrix);
}

@ @<Initialize the symbol-by-symbol matrix@> =
{
  RULEID rule_id;
  SYMID symid;
  AIM *items_by_rule = g->t_AHFA_items_by_rule;
  for (symid = 0; symid < (SYMID) symbol_count_of_g; symid++)
    {
      /* If a symbol appears on a LHS, it predicts itself.
*/
      SYM symbol = SYM_by_ID (symid);
      if (!SYMBOL_LHS_RULE_COUNT (symbol))
        continue;
      matrix_bit_set (symbol_by_symbol_matrix, (guint) symid, (guint) symid);
    }
  for (rule_id = 0; rule_id < (RULEID) rule_count_of_g; rule_id++)
    {
      SYMID from, to;
      /* Get the initial item for the rule */
      AIM item = items_by_rule[rule_id];
      /* Not all rules have items */
      if (!item)
        continue;
      from = LHS_ID_of_AIM (item);
      to = Postdot_SYMID_of_AIM (item);
      /* There is no symbol-to-symbol transition for a completion item */
      if (to < 0)
        continue;
      /* Set a bit in the matrix */
      matrix_bit_set (symbol_by_symbol_matrix, (guint) from, (guint) to);
    }
}

@ At this point I have a full matrix showing which symbol implies a prediction
of which others. To save repeated processing when building the AHFA prediction states,
I now convert it into a matrix from symbols to the rules they predict.
Specifically, if symbol |S1| predicts symbol |S2|, then symbol |S1|
predicts every rule
with |S2| on its LHS.
@<Create the prediction matrix from the symbol-by-symbol matrix@> = {
    AIM* items_by_rule = g->t_AHFA_items_by_rule;
    SYMID from_symid;
    guint* sort_key_by_rule_id = g_new(guint, rule_count_of_g);
    guint no_of_predictable_rules = 0;
    @<Populate |sort_key_by_rule_id| with first pass value;
        calculate |no_of_predictable_rules|@>@/
    @<Populate |rule_by_sort_key|@>@/
    @<Populate |sort_key_by_rule_id| with second pass value@>@/
    @<Populate the prediction matrix@>@/
    g_free(sort_key_by_rule_id);
}

@ For creating prediction AHFA states, we need to have an ordering of rules
by their postdot symbol.
A ``predictable rule" is one whose initial item has a postdot symbol.
The following facts hold:
\li A rule is predictable iff it is both used and non-nulling.
\li A rule is predictable iff it is a used rule which is not the nulling start rule.
\li A rule is predictable iff it has any item with a postdot symbol.
\par
Here we take a first pass at this, letting the value be the postdot symbol for
the predictable rules.
|G_MAXINT| is used for the others, so that they will sort high.
(|G_MAXINT| is used and not |G_MAXUINT|, because the sort routines
work with signed values.)
This first pass fully captures the order, but
our final result needs to be a unique ID for every ``predictable rule",
so that it can be used as the index in a bit vector.
@<Populate |sort_key_by_rule_id| with first pass value;
calculate |no_of_predictable_rules|@> =
{
  RULEID rule_id;
  for (rule_id = 0; rule_id < (RULEID) rule_count_of_g; rule_id++)
    {
      AIM item = items_by_rule[rule_id];
      SYMID postdot;
      if (!item)
        goto NOT_A_PREDICTABLE_RULE;
      postdot = Postdot_SYMID_of_AIM (item);
      if (postdot < 0)
        goto NOT_A_PREDICTABLE_RULE;
      sort_key_by_rule_id[rule_id] = postdot;
      no_of_predictable_rules++;
      continue;
    NOT_A_PREDICTABLE_RULE:
      sort_key_by_rule_id[rule_id] = G_MAXINT;
    }
}

@ @<Populate |rule_by_sort_key|@> =
{
  RULEID rule_id;
  for (rule_id = 0; rule_id < (RULEID) rule_count_of_g; rule_id++)
    {
      rule_by_sort_key[rule_id] = RULE_by_ID (g, rule_id);
    }
  g_qsort_with_data (rule_by_sort_key, (gint)rule_count_of_g,
                     sizeof (RULE), cmp_by_rule_sort_key,
                     (gpointer) sort_key_by_rule_id);
}

@ @<Function definitions@> = static gint
cmp_by_rule_sort_key(gconstpointer ap,
    gconstpointer bp, gpointer user_data) {
    RULE a = *(RULE*)ap;
    RULE b = *(RULE*)bp;
    guint* sort_key_by_rule_id = (guint*)user_data;
    Marpa_Rule_ID a_id = a->t_id;
    Marpa_Rule_ID b_id = b->t_id;
    guint sort_key_a = sort_key_by_rule_id[a_id];
    guint sort_key_b =
sort_key_by_rule_id[b_id]; 4978 if (sort_key_a == sort_key_b) return a_id - b_id; 4979 return sort_key_a - sort_key_b; 4980} 4981@ @<Private function prototypes@> = static 4982gint cmp_by_rule_sort_key(gconstpointer ap, 4983 gconstpointer bp, gpointer user_data); 4984 4985@ We have now sorted the rules into the final sort key order. 4986With this final version of the sort keys, 4987populate the index from rule id to sort key. 4988@<Populate |sort_key_by_rule_id| with second pass value@> = 4989{ 4990 guint sort_key; 4991 for (sort_key = 0; sort_key < rule_count_of_g; sort_key++) 4992 { 4993 RULE rule = rule_by_sort_key[sort_key]; 4994 sort_key_by_rule_id[rule->t_id] = sort_key; 4995 } 4996} 4997 4998@ @<Populate the prediction matrix@> = 4999{ 5000 prediction_matrix = matrix_create (symbol_count_of_g, no_of_predictable_rules); 5001 for (from_symid = 0; from_symid < (SYMID) symbol_count_of_g; 5002 from_symid++) 5003 { 5004 // for every row of the symbol-by-symbol matrix 5005 guint min, max, start; 5006 for (start = 0; 5007 bv_scan (matrix_row 5008 (symbol_by_symbol_matrix, (guint) from_symid), start, 5009 &min, &max); start = max + 2) 5010 { 5011 Marpa_Symbol_ID to_symid; 5012 for (to_symid = min; to_symid <= (Marpa_Symbol_ID) max; 5013 to_symid++) 5014 { 5015 // for every predicted symbol 5016 SYM to_symbol = SYM_by_ID (to_symid); 5017 GArray *lhs_rules = to_symbol->t_lhs; 5018 guint ix, no_of_lhs_rules = lhs_rules->len; 5019 for (ix = 0; ix < no_of_lhs_rules; ix++) 5020 { 5021 // For every rule with that symbol on its LHS 5022 Marpa_Rule_ID rule_with_this_lhs_symbol = 5023 g_array_index (lhs_rules, Marpa_Rule_ID, ix); 5024 guint sort_key = 5025 sort_key_by_rule_id[rule_with_this_lhs_symbol]; 5026 if (sort_key >= no_of_predictable_rules) 5027 continue; /* 5028 We only need to predict rules which have items */ 5029 matrix_bit_set (prediction_matrix, (guint) from_symid, 5030 sort_key); 5031 // Set the $(symbol, rule sort key)$ bit in the matrix 5032 } 5033 } 5034 } 
5035 } 5036} 5037 5038@ @<Private function prototypes@> = 5039static AHFA 5040create_predicted_AHFA_state( 5041 struct marpa_g* g, 5042 Bit_Vector prediction_rule_vector, 5043 RULE* rule_by_sort_key, 5044 DQUEUE states_p, 5045 GTree* duplicates 5046 ); 5047@ @<Function definitions@> = 5048static AHFA 5049create_predicted_AHFA_state( 5050 struct marpa_g* g, 5051 Bit_Vector prediction_rule_vector, 5052 RULE* rule_by_sort_key, 5053 DQUEUE states_p, 5054 GTree* duplicates 5055 ) { 5056AIM* item_list_for_new_state; 5057AHFA p_new_state; 5058guint item_list_ix = 0; 5059guint no_of_items_in_new_state = bv_count( prediction_rule_vector); 5060 if (no_of_items_in_new_state == 0) return NULL; 5061item_list_for_new_state = obstack_alloc (&g->t_obs, 5062 no_of_items_in_new_state * sizeof (AIM)); 5063{ 5064 guint start, min, max; 5065 for (start = 0; bv_scan (prediction_rule_vector, start, &min, &max); 5066 start = max + 2) 5067 { // Scan the prediction rule vector again, this time to populate the list 5068 guint rule_sort_key; 5069 for (rule_sort_key = min; rule_sort_key <= max; rule_sort_key++) 5070 { 5071 /* Add the initial item for the predicted rule */ 5072 RULE rule = rule_by_sort_key[rule_sort_key]; 5073 item_list_for_new_state[item_list_ix++] = 5074 g->t_AHFA_items_by_rule[rule->t_id]; 5075 } 5076 } 5077} 5078p_new_state = DQUEUE_PUSH((*states_p), AHFA_Object);@/ 5079 p_new_state->t_items = item_list_for_new_state; 5080 p_new_state->t_item_count = no_of_items_in_new_state; 5081 { AHFA queued_AHFA_state = assign_AHFA_state(p_new_state, duplicates); 5082 if (queued_AHFA_state) { 5083 /* The new state would be a duplicate. 
5084 Back it out and return the one that already exists */ 5085 (void)DQUEUE_POP((*states_p), AHFA_Object); 5086 obstack_free(&g->t_obs, item_list_for_new_state); 5087 return queued_AHFA_state; 5088 } 5089 } 5090 // The new state was added -- finish up its data 5091 p_new_state->t_key.t_id = p_new_state - DQUEUE_BASE((*states_p), AHFA_Object); 5092 LV_AHFA_is_Predicted(p_new_state) = 1; 5093 p_new_state->t_has_completed_start_rule = 0; 5094 LV_Leo_LHS_ID_of_AHFA(p_new_state) = -1; 5095 p_new_state->t_empty_transition = NULL; 5096 LV_TRANSs_of_AHFA(p_new_state) = transitions_new(g); 5097 LV_Complete_SYM_Count_of_AHFA(p_new_state) = 0; 5098 @<Calculate postdot symbols for predicted state@>@/ 5099 return p_new_state; 5100} 5101 5102@ @<Calculate postdot symbols for predicted state@> = 5103{ 5104 guint symbol_count = SYM_Count_of_G (g); 5105 guint item_ix; 5106 guint no_of_postdot_symbols; 5107 Bit_Vector postdot_v = bv_create (symbol_count); 5108 for (item_ix = 0; item_ix < no_of_items_in_new_state; item_ix++) 5109 { 5110 AIM item = item_list_for_new_state[item_ix]; 5111 SYMID postdot = Postdot_SYMID_of_AIM (item); 5112 if (postdot >= 0) 5113 bv_bit_set (postdot_v, (guint) postdot); 5114 } 5115 if ((no_of_postdot_symbols = p_new_state->t_postdot_sym_count = 5116 bv_count (postdot_v))) 5117 { 5118 guint min, max, start; 5119 Marpa_Symbol_ID *p_symbol = p_new_state->t_postdot_symid_ary = 5120 obstack_alloc (&g->t_obs, 5121 no_of_postdot_symbols * sizeof (SYMID)); 5122 for (start = 0; bv_scan (postdot_v, start, &min, &max); start = max + 2) 5123 { 5124 Marpa_Symbol_ID postdot; 5125 for (postdot = (Marpa_Symbol_ID) min; 5126 postdot <= (Marpa_Symbol_ID) max; postdot++) 5127 { 5128 *p_symbol++ = postdot; 5129 } 5130 } 5131 } 5132 bv_free (postdot_v); 5133} 5134 5135@** Transition (TRANS) Code. 5136This code deals with data which is accessed 5137as a function of AHFA state and symbol. 
The most important data
of this type are the AHFA state transitions,
which is why the per-AHFA-per-symbol data is called
``transition" data.
But per-AHFA symbol completion data is also
a function of AHFA state and symbol.
@ This operation is at the heart of the parse engine,
and worth a careful look.
Speed is probably optimal.
Time complexity is fine --- $O(1)$ in the length of the input.
@ But this solution is very space-intensive---%
perhaps $O(\v g\v^2)$.
Ordinarily, for code which is executed this heavily,
I would worry about a speed versus space tradeoff of this kind.
But these arrays are extremely sparse.
Many rows of the array have only one or two entries.
There are alternatives
which save a lot of space in return for a small overhead in time.
@ A very similar problem has been the subject of considerable
study---%
LALR and LR(0) state tables.
These also index by state and symbol, and their usage is very
similar to that expected for the AHFA lookups.
@ Bison's solution is probably worth study.
This is a kind of perfect hashing, and quite complex.
I do wonder if it would not be over-engineering
in the libmarpa context.
In practical applications, a binary search, or even
a linear search,
may be the fastest implementation for
the average case.
@ The trend is for memory to get cheap,
favoring the sparse 2-dimensional array
which is the present solution.
But I expect the trend will also be for grammars to get larger.
This would be a good issue to run some benchmarks on,
once I stabilize the C code implementation.
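@ The space-saving alternative mentioned above can be sketched as follows.
This is a minimal illustration, not |libmarpa| code:
the names |sparse_entry|, |dense_lookup|, |sparse_lookup| and the demo rows
are hypothetical, and states are reduced to plain integer IDs.
The dense row has one slot per symbol and is indexed directly.
The sparse row keeps only the entries that exist, sorted by symbol ID,
and finds them by binary search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; these are not libmarpa's. */
typedef int sym_id;

struct sparse_entry {
    sym_id symid;    /* symbol labelling the transition */
    int to_state;    /* ID of the target state */
};

/* Dense lookup: one slot per symbol, -1 meaning "no transition".
   Constant time, but every row costs one slot per symbol. */
static int dense_lookup(const int *row, sym_id symid)
{
    return row[symid];
}

/* Sparse lookup: entries are sorted by symid and found by binary search.
   Logarithmic time, but a row costs only as much as its real entries. */
static int sparse_lookup(const struct sparse_entry *entries, size_t count,
                         sym_id symid)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (entries[mid].symid == symid)
            return entries[mid].to_state;
        if (entries[mid].symid < symid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;    /* no transition on this symbol */
}

/* The same row in both representations: six symbols, two transitions. */
static const int demo_dense_row[6] = { -1, 7, -1, -1, 9, -1 };
static const struct sparse_entry demo_sparse_row[2] = { {1, 7}, {4, 9} };
```

For a row with only one or two entries, the sparse form stores just those
entries, at the cost of a few comparisons per lookup.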
@d TRANS_of_AHFA_by_SYMID(from_ahfa, id)
    (*(TRANSs_of_AHFA(from_ahfa)+(id)))
@d TRANS_of_EIM_by_SYMID(eim, id) TRANS_of_AHFA_by_SYMID(AHFA_of_EIM(eim), (id))
@d To_AHFA_of_TRANS(trans) (to_ahfa_of_transition_get(trans))
@d LV_To_AHFA_of_TRANS(trans) ((trans)->t_ur.t_to_ahfa)
@d Completion_Count_of_TRANS(trans)
    (completion_count_of_transition_get(trans))
@d LV_Completion_Count_of_TRANS(trans) ((trans)->t_ur.t_completion_count)
@d To_AHFA_of_AHFA_by_SYMID(from_ahfa, id)
     (To_AHFA_of_TRANS(TRANS_of_AHFA_by_SYMID((from_ahfa), (id))))
@d Completion_Count_of_AHFA_by_SYMID(from_ahfa, id)
     (Completion_Count_of_TRANS(TRANS_of_AHFA_by_SYMID((from_ahfa), (id))))
@d To_AHFA_of_EIM_by_SYMID(eim, id) To_AHFA_of_AHFA_by_SYMID(AHFA_of_EIM(eim), (id))
@d AEXs_of_TRANS(trans) ((trans)->t_aex)
@d Leo_Base_AEX_of_TRANS(trans) ((trans)->t_leo_base_aex)
@ @s TRANS int
@<Private incomplete structures@> =
struct s_transition;
typedef struct s_transition* TRANS;
struct s_ur_transition;
typedef struct s_ur_transition* URTRANS;
@ @<Private typedefs@> = typedef gint AEX;
@ @<Private structures@> =
struct s_ur_transition {
    AHFA t_to_ahfa;
    gint t_completion_count;
};
struct s_transition {
    struct s_ur_transition t_ur;
    AEX t_leo_base_aex;
    AEX t_aex[1];
};
@ @d TRANSs_of_AHFA(ahfa) ((ahfa)->t_transitions)
@d LV_TRANSs_of_AHFA(ahfa) TRANSs_of_AHFA(ahfa)
@<Widely aligned AHFA state elements@> =
    TRANS* t_transitions;
@ @<Private function prototypes@> =
static inline AHFA to_ahfa_of_transition_get(TRANS transition);
@ @<Function definitions@> =
static inline AHFA to_ahfa_of_transition_get(TRANS transition) {
     if (!transition) return NULL;
     return transition->t_ur.t_to_ahfa;
}
@ @<Private function prototypes@> =
static inline gint completion_count_of_transition_get(TRANS transition);
@ @<Function
definitions@> = 5222static inline gint completion_count_of_transition_get(TRANS transition) { 5223 if (!transition) return 0; 5224 return transition->t_ur.t_completion_count; 5225} 5226 5227@ @<Private function prototypes@> = 5228static inline 5229URTRANS transition_new(struct obstack *obstack, AHFA to_ahfa, gint aim_ix); 5230@ @<Function definitions@> = 5231static inline 5232URTRANS transition_new(struct obstack *obstack, AHFA to_ahfa, gint aim_ix) { 5233 URTRANS transition; 5234 transition = obstack_alloc (obstack, sizeof (transition[0])); 5235 transition->t_to_ahfa = to_ahfa; 5236 transition->t_completion_count = aim_ix; 5237 return transition; 5238} 5239 5240@ @<Private function prototypes@> = static inline 5241TRANS* transitions_new(struct marpa_g* g); 5242@ @<Function definitions@> = static inline 5243TRANS* transitions_new(struct marpa_g* g) { 5244 gint symbol_count = SYM_Count_of_G(g); 5245 gint symid = 0; 5246 TRANS* transitions; 5247 transitions = g_malloc(symbol_count * sizeof(transitions[0])); 5248 while (symid < symbol_count) transitions[symid++] = NULL; /* 5249 |g_malloc0| will not work because NULL is not guaranteed 5250 to be a bitwise zero. 
*/ 5251 return transitions; 5252} 5253 5254@ @<Private function prototypes@> = 5255static inline 5256void transition_add(struct obstack *obstack, AHFA from_ahfa, SYMID symid, AHFA to_ahfa); 5257@ @<Function definitions@> = 5258static inline 5259void transition_add(struct obstack *obstack, AHFA from_ahfa, SYMID symid, AHFA to_ahfa) 5260{ 5261 TRANS* transitions = TRANSs_of_AHFA(from_ahfa); 5262 TRANS transition = transitions[symid]; 5263 if (!transition) { 5264 transitions[symid] = (TRANS)transition_new(obstack, to_ahfa, 0); 5265 return; 5266 } 5267 LV_To_AHFA_of_TRANS(transition) = to_ahfa; 5268 return; 5269} 5270 5271@ @<Private function prototypes@> = 5272static inline 5273void completion_count_inc(struct obstack *obstack, AHFA from_ahfa, SYMID symid); 5274@ @<Function definitions@> = 5275static inline 5276void completion_count_inc(struct obstack *obstack, AHFA from_ahfa, SYMID symid) 5277{ 5278 TRANS* transitions = TRANSs_of_AHFA(from_ahfa); 5279 TRANS transition = transitions[symid]; 5280 if (!transition) { 5281 transitions[symid] = (TRANS)transition_new(obstack, NULL, 1); 5282 return; 5283 } 5284 LV_Completion_Count_of_TRANS(transition)++; 5285 return; 5286} 5287 5288@*0 Trace Functions. 
5289@<Public function prototypes@> = 5290gint marpa_AHFA_state_transitions(struct marpa_g* g, 5291 Marpa_AHFA_State_ID AHFA_state_id, 5292 GArray *result); 5293@ @<Function definitions@> = 5294gint marpa_AHFA_state_transitions(struct marpa_g* g, 5295 Marpa_AHFA_State_ID AHFA_state_id, 5296 GArray *result) { 5297 5298 @<Return |-2| on failure@>@; 5299 AHFA from_ahfa_state; 5300 TRANS* transitions; 5301 SYMID symid; 5302 gint symbol_count; 5303 5304 @<Fail if grammar not precomputed@>@; 5305 @<Fail if grammar |AHFA_state_id| is invalid@>@; 5306 @<Fail grammar if elements of |result| are not |sizeof(gint)|@>@; 5307 from_ahfa_state = AHFA_of_G_by_ID(g, AHFA_state_id); 5308 transitions = TRANSs_of_AHFA(from_ahfa_state); 5309 symbol_count = SYM_Count_of_G(g); 5310 g_array_set_size(result, 0); 5311 for (symid = 0; symid < symbol_count; symid++) { 5312 AHFA to_ahfa_state = To_AHFA_of_TRANS(transitions[symid]); 5313 if (!to_ahfa_state) continue; 5314 g_array_append_val (result, symid); 5315 g_array_append_val (result, ID_of_AHFA(to_ahfa_state)); 5316 } 5317 return result->len; 5318} 5319 5320@** Empty Transition Code. 5321@d Empty_Transition_of_AHFA(state) ((state)->t_empty_transition) 5322@*0 Trace Functions. 5323@<Public function prototypes@> = 5324@ @<Public function prototypes@> = 5325Marpa_AHFA_State_ID marpa_AHFA_state_empty_transition(struct marpa_g* g, 5326 Marpa_AHFA_State_ID AHFA_state_id); 5327@ In the external accessor, 5328-1 is a valid return value, indicating no empty transition. 
5329@<Function definitions@> = 5330AHFAID marpa_AHFA_state_empty_transition(struct marpa_g* g, 5331 AHFAID AHFA_state_id) { 5332 AHFA state; 5333 AHFA empty_transition_state; 5334 @<Return |-2| on failure@>@/ 5335 @<Fail if grammar not precomputed@>@/ 5336 @<Fail if grammar |AHFA_state_id| is invalid@>@/ 5337 state = AHFA_of_G_by_ID(g, AHFA_state_id); 5338 empty_transition_state = Empty_Transition_of_AHFA (state); 5339 if (empty_transition_state) 5340 return ID_of_AHFA (empty_transition_state); 5341 return -1; 5342} 5343 5344 5345@** Populating the Terminal Boolean Vector. 5346@<Populate the Terminal Boolean Vector@> = { 5347 gint symbol_count = SYM_Count_of_G(g); 5348 gint symid; 5349 Bit_Vector bv_is_terminal = bv_create( (guint)symbol_count ); 5350 g->t_bv_symid_is_terminal = bv_is_terminal; 5351 for (symid = 0; symid < symbol_count; symid++) { 5352 if (!SYMID_is_Terminal(symid)) continue; 5353 bv_bit_set(bv_is_terminal, (guint)symid); 5354 } 5355} 5356 5357@** Recognizer (RECCE) Code. 5358@<Public incomplete structures@> = 5359struct marpa_r; 5360@ @<Private typedefs@> = 5361typedef struct marpa_r* RECCE; 5362@ @<Recognizer structure@> = 5363struct marpa_r { 5364@<Widely aligned recognizer elements@>@/ 5365@<Int aligned recognizer elements@>@/ 5366@<Bit aligned recognizer elements@>@/ 5367}; 5368 5369@ @<Public function prototypes@> = 5370struct marpa_r* marpa_r_new( struct marpa_g* g ); 5371@ The grammar must not be deallocated for the life of the 5372recognizer. 5373In the event of an error creating the recognizer, 5374|NULL| is returned and the error status 5375of the {\bf grammar} is set. 5376For this reason, the grammar is not |const|. 
5377@<Function definitions@> = 5378struct marpa_r* marpa_r_new( struct marpa_g* g ) 5379{ RECCE r; 5380 gint symbol_count_of_g; 5381 @<Return |NULL| on failure@>@/ 5382 if (!G_is_Precomputed(g)) { 5383 g->t_error = "precomputed"; 5384 return failure_indicator; 5385 } 5386 r = g_slice_new(struct marpa_r); 5387 r->t_grammar = g; 5388 symbol_count_of_g = SYM_Count_of_G(g); 5389 @<Initialize recognizer obstack@>@; 5390 @<Initialize recognizer elements@>@; 5391 return r; } 5392 5393@ @<Function definitions@> = 5394void marpa_r_free(struct marpa_r *r) 5395{ 5396@<Destroy recognizer elements@>@; 5397if (r->t_sym_workarea) g_free(r->t_sym_workarea); 5398if (r->t_workarea2) g_free(r->t_workarea2); 5399@<Free working bit vectors for symbols@>@; 5400@<Destroy recognizer obstack@>@; 5401g_slice_free(struct marpa_r, r); 5402} 5403@ @<Public function prototypes@> = 5404void marpa_r_free(struct marpa_r *r); 5405 5406@*0 The Recognizer ID. 5407A unique ID for the recognizer. 5408This must be unique not just per-thread, 5409but process-wide. 5410The counter which tracks recognizer ID's 5411(|next_recce_id|) 5412is (at this writing) the only global 5413non-constant, and requires special handling to 5414keep |libmarpa| MT-safe. 5415(|next_recce_id|) is accessed only via 5416|glib|'s special atomic operations. 5417@ @<Int aligned recognizer elements@> = gint t_id; 5418@ @<Public typedefs@> = typedef gint Marpa_Recognizer_ID; 5419@ @<Private global variables@> = static gint next_recce_id = 1; 5420@ @<Initialize recognizer elements@> = 5421r->t_id = g_atomic_int_exchange_and_add(&next_recce_id, 1); 5422@ @<Function definitions@> = 5423gint marpa_r_id(struct marpa_r* r) { return r->t_id; } 5424@ @<Public function prototypes@> = 5425gint marpa_r_id(struct marpa_r* r); 5426 5427@*0 The Grammar for the Recognizer. 5428Initialized in |marpa_r_new|. 
@d G_of_R(r) ((r)->t_grammar)
@d AHFA_Count_of_R(r) AHFA_Count_of_G(G_of_R(r))
@ @<Widely aligned recognizer elements@> = const struct marpa_g *t_grammar;

@*0 Recognizer Phase.
The recognizer has phases, such as ``input"
and ``evaluation",
and states, such as ``exhausted".
The main distinction is that the
phases are mutually exclusive---%
entering one means leaving another.
``Exhausted" is not a phase, because a parser which is
exhausted may go into the evaluation phase, then
return to the input phase.
All that time it will remain ``exhausted".
@ {\bf To Do}: @^To Do@>
Once I refactor the objects, these phases will need to be
revisited.
|evaluation_phase| should probably be eliminated at that point,
assuming that the bocage object can be made independent of
the recognizer.
@<Public typedefs@> =
enum marpa_phase {
    no_such_phase = 0, // 0 is never a valid phase
    initial_phase,
    input_phase,
    evaluation_phase,
    error_phase
};
typedef enum marpa_phase Marpa_Phase;
@ @d Phase_of_R(r) ((r)->t_phase)
@<Int aligned recognizer elements@> =
Marpa_Phase t_phase;
@ @<Initialize recognizer elements@> =
Phase_of_R(r) = initial_phase;
@ @<Public function prototypes@> =
Marpa_Phase marpa_phase(struct marpa_r* r);
@ @<Function definitions@> =
Marpa_Phase marpa_phase(struct marpa_r* r)
{ return Phase_of_R(r); }

@*0 Earley Set Container.
@d First_ES_of_R(r) ((r)->t_first_earley_set)
@d LV_First_ES_of_R(r) First_ES_of_R(r)
@<Widely aligned recognizer elements@> =
ES t_first_earley_set;
ES t_latest_earley_set;
EARLEME t_current_earleme;
@ @<Initialize recognizer elements@> =
r->t_first_earley_set = NULL;
r->t_latest_earley_set = NULL;
r->t_current_earleme = -1;

@*0 Current Earleme.
5483@d Latest_ES_of_R(r) ((r)->t_latest_earley_set) 5484@d LV_Latest_ES_of_R(r) Latest_ES_of_R(r) 5485@d Current_Earleme_of_R(r) ((r)->t_current_earleme) 5486@d LV_Current_Earleme_of_R(r) (Current_Earleme_of_R(r)) 5487@ @<Public function prototypes@> = 5488guint marpa_current_earleme(struct marpa_r* r); 5489@ @<Function definitions@> = 5490guint marpa_current_earleme(struct marpa_r* r) 5491{ return Current_Earleme_of_R(r); } 5492 5493@ @d Current_ES_of_R(r) current_es_of_r(r) 5494@<Private function prototypes@> = 5495static inline ES current_es_of_r(RECCE r); 5496@ @<Function definitions@> = 5497static inline ES current_es_of_r(RECCE r) 5498{ 5499 const ES latest = Latest_ES_of_R(r); 5500 if (Earleme_of_ES(latest) == Current_Earleme_of_R(r)) return latest; 5501 return NULL; 5502} 5503 5504@*0 Earley Set Warning Threshold. 5505@d DEFAULT_EIM_WARNING_THRESHOLD (100) 5506@<Int aligned recognizer elements@> = guint t_earley_item_warning_threshold; 5507@ @<Initialize recognizer elements@> = 5508r->t_earley_item_warning_threshold = MAX(DEFAULT_EIM_WARNING_THRESHOLD, AIM_Count_of_G(g)*2); 5509@ @<Public function prototypes@> = 5510guint marpa_earley_item_warning_threshold(struct marpa_r* r); 5511@ @<Function definitions@> = 5512guint marpa_earley_item_warning_threshold(struct marpa_r* r) 5513{ return r->t_earley_item_warning_threshold; } 5514 5515@ @<Public function prototypes@> = 5516gboolean marpa_earley_item_warning_threshold_set(struct marpa_r*r, guint threshold); 5517@ Returns |TRUE| on success, 5518|FALSE| on failure. 5519@<Function definitions@> = 5520gboolean marpa_earley_item_warning_threshold_set(struct marpa_r*r, guint threshold) 5521{ 5522 r->t_earley_item_warning_threshold = threshold == 0 ? EIM_FATAL_THRESHOLD : threshold; 5523 return TRUE; 5524} 5525 5526@*0 Furthest Earleme. 5527The ``furthest" or highest-numbered earleme. 5528This is the earleme of the last Earley set that contains anything. 
Marpa allows variable length tokens,
so it needs to track how far out tokens might be found.
No complete or predicted Earley item will be found after the current earleme.
@d Furthest_Earleme_of_R(r) ((r)->t_furthest_earleme)
@d LV_Furthest_Earleme_of_R(r) Furthest_Earleme_of_R(r)
@<Int aligned recognizer elements@> = EARLEME t_furthest_earleme;
@ @<Initialize recognizer elements@> = r->t_furthest_earleme = 0;
@ @<Public function prototypes@> =
guint marpa_furthest_earleme(struct marpa_r* r);
@ @<Function definitions@> =
guint marpa_furthest_earleme(struct marpa_r* r)
{ return Furthest_Earleme_of_R(r); }

@*0 Symbol Workarea.
This is used in the completion
phase for each Earley set.
It is used in building the list of postdot items,
and when building the Leo items.
It is sized to hold one |gpointer| for
every symbol.
@
{\bf To Do}: @^To Do@>
It may be possible to free this space when the recognition phase
is finished.
@<Widely aligned recognizer elements@> = gpointer* t_sym_workarea;
@ @<Initialize recognizer elements@> = r->t_sym_workarea = NULL;
@ @<Allocate symbol workarea@> =
    r->t_sym_workarea = g_malloc(sym_workarea_size);

@*0 Workarea 2.
This is used in the completion
phase for each Earley set,
when building the Leo items.
It is sized to hold two |gpointer|'s for
every symbol.
@
{\bf To Do}: @^To Do@>
It may be possible to free this space when the recognition phase
is finished.
@<Widely aligned recognizer elements@> = gpointer* t_workarea2;
@ @<Initialize recognizer elements@> = r->t_workarea2 = NULL;
@ @<Allocate recognizer workareas@> =
{
  const guint sym_workarea_size = sizeof (gpointer) * symbol_count_of_g;
  @<Allocate symbol workarea@>@;
  r->t_workarea2 = g_malloc(2u * sym_workarea_size);
}

@*0 Working Bit Vectors for Symbols.
These are three bit vectors, sized to the number of symbols
in the grammar,
for utility purposes.
They are used in the completion
phase for each Earley set,
to keep track of the new postdot items and
Leo items.
@
{\bf To Do}: @^To Do@>
It may be possible to free this space when the recognition phase
is finished.
@<Widely aligned recognizer elements@> =
Bit_Vector t_bv_sym;
Bit_Vector t_bv_sym2;
Bit_Vector t_bv_sym3;
@ @<Initialize recognizer elements@> =
r->t_bv_sym = NULL;
r->t_bv_sym2 = NULL;
r->t_bv_sym3 = NULL;
@ @<Allocate recognizer's bit vectors for symbols@> = {
  r->t_bv_sym = bv_create( (guint)symbol_count_of_g );
  r->t_bv_sym2 = bv_create( (guint)symbol_count_of_g );
  r->t_bv_sym3 = bv_create( (guint)symbol_count_of_g );
}
@ @<Free working bit vectors for symbols@> =
if (r->t_bv_sym) bv_free(r->t_bv_sym);
if (r->t_bv_sym2) bv_free(r->t_bv_sym2);
if (r->t_bv_sym3) bv_free(r->t_bv_sym3);

@*0 Expected Symbol Boolean Vector.
A boolean vector by symbol ID,
with the bits set if the symbol is expected
at the current earleme.
This vector is not sized until input starts.
When the recognizer is created,
this bit vector is initialized to |NULL| so that the destructor
can tell if there is a bit vector to be freed.
@<Widely aligned recognizer elements@> = Bit_Vector t_bv_symid_is_expected;
@ @<Initialize recognizer elements@> = r->t_bv_symid_is_expected = NULL;
@ @<Allocate recognizer's bit vectors for symbols@> =
    r->t_bv_symid_is_expected = bv_create( (guint)symbol_count_of_g );
@ @<Free working bit vectors for symbols@> =
if (r->t_bv_symid_is_expected) { bv_free(r->t_bv_symid_is_expected); }
@ Returns |-2| if there was a failure.
There is a check that the expectations of this
function and its caller about the size of the |GArray| elements match.
This is a check worth making.
Mistakes happen,
a mismatch might arise as a portability issue,
and if I do not ``fail fast" here the ultimate problem
could be very hard to debug.
@<Public function prototypes@> =
gint marpa_terminals_expected(struct marpa_r* r, GArray* result);
@ @<Function definitions@> =
gint marpa_terminals_expected(struct marpa_r* r, GArray* result)
{
    @<Return |-2| on failure@>@;
    guint min, max, start;
    @<Fail recognizer if |GArray| elements are not |sizeof(gint)|@>@;
    g_array_set_size(result, 0);
    for (start = 0; bv_scan (r->t_bv_symid_is_expected, start, &min, &max);
	 start = max + 2)
      {
	gint symid;
	for (symid = (gint) min; symid <= (gint) max; symid++)
	  {
	    g_array_append_val (result, symid);
	  }
      }
    return (gint)result->len;
}

@*0 Leo-Related Booleans.
@*1 Turning Leo Logic Off and On.
A trace flag, set if we are using Leo items.
This flag is set by default.
It has two uses.
@ This flag is very useful for testing.
Since Leo items do not affect function, only efficiency,
it is possible for the Leo logic to be broken or
disabled without most tests noticing.
To make sure the Leo logic is intact,
one of |libmarpa|'s tests runs one pass
with Leo items off and another with Leo items on
and compares them.
@ This flag also allows the Leo logic
to be turned off in certain cases in which the Leo logic
actually slows things down.
The Leo logic could be turned off if the user knows there is
no right recursion, although the actual gain
would typically be small or not measurable.
@ A real gain would occur in the case of highly ambiguous
grammars, all or most of whose parses are actually evaluated.
Since those Earley items eliminated by the Leo logic
are actually recreated on an as-needed basis in the evaluation
phase, in cases when most of the Earley items are needed
for evaluation, the Leo logic would be eliminating Earley
items only to have to add most of them later.
In these cases,
the Leo logic would impose a small overhead.
@ The author's current view is that it is best
to start by assuming that the Leo logic should
be left on.
In the rare event that it turns out that the Leo
logic is counter-productive,
this flag can be used to test if turning the Leo
logic off is helpful.
@ It should be borne in mind that even when the Leo logic
imposes a small cost in typical cases,
it may act as a safeguard.
The time complexity explosions prevented by Leo logic can
easily mean the difference between an impractical computation
and a practical one.
In most applications, it is worth incurring a small
overhead in the average case to prevent failures,
even rare ones.
@ There are two booleans.
One is a flag that can be set and
unset externally,
indicating the application's intention to use Leo logic.
An internal boolean tracks whether the Leo logic is
actually enabled at any given point.
@ The reason for having two booleans
is that the Leo logic is only turned
on once Earley set 0 is complete.
While Earley set 0 is being processed the internal flag will always
be unset, while the external flag may be set or unset, as the user
decided.
After Earley set 0 is complete, both booleans will have the same value.
@ {\bf To Do}: @^To Do@>
Once the null parse is special-cased, one boolean may suffice.
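@ The relationship between the two booleans can be sketched as below.
This is a simplified model with hypothetical names
(|leo_flags|, |leo_active_during_set_0|, |leo_active_after_set_0|),
not the recognizer's actual code.
It only mirrors the rule just described:
the internal boolean is unset while Earley set 0 is being processed,
and takes the value of the external flag once Earley set 0 is complete.

```c
#include <assert.h>

/* Hypothetical model of the two Leo booleans; not libmarpa's structure. */
struct leo_flags {
    unsigned int use_leo_flag:1;   /* external: what the application asked for */
    unsigned int is_using_leo:1;   /* internal: whether Leo logic is active now */
};

/* State while Earley set 0 is being processed:
   the request is recorded, but Leo logic is not yet active. */
static struct leo_flags leo_flags_start(int requested)
{
    struct leo_flags flags;
    flags.use_leo_flag = requested ? 1u : 0u;
    flags.is_using_leo = 0u;
    return flags;
}

/* Once Earley set 0 is complete, the internal boolean
   takes on the value of the external flag. */
static struct leo_flags leo_flags_after_set_0(struct leo_flags flags)
{
    flags.is_using_leo = flags.use_leo_flag;
    return flags;
}

/* Is Leo logic active while Earley set 0 is still being built? */
static int leo_active_during_set_0(int requested)
{
    return (int) leo_flags_start(requested).is_using_leo;
}

/* Is Leo logic active after Earley set 0 has been completed? */
static int leo_active_after_set_0(int requested)
{
    return (int) leo_flags_after_set_0(leo_flags_start(requested)).is_using_leo;
}
```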
5709@<Bit aligned recognizer elements@> = 5710guint t_use_leo_flag:1; 5711guint t_is_using_leo:1; 5712@ @<Initialize recognizer elements@> = 5713r->t_use_leo_flag = 1; 5714r->t_is_using_leo = 0; 5715@ Returns 1 if the ``use Leo" flag is set, 57160 if not, 5717and |-2| if there was an error. 5718@<Public function prototypes@> = 5719gboolean marpa_is_use_leo(struct marpa_r* r); 5720@ @<Function definitions@> = 5721gint marpa_is_use_leo(struct marpa_r* r) 5722{ 5723 @<Return |-2| on failure@>@/ 5724 @<Fail if recognizer has fatal error@>@; 5725 return r->t_use_leo_flag ? 1 : 0; 5726} 5727@ Returns |TRUE| on success, 5728|FALSE| on failure. 5729@<Function definitions@> = 5730gboolean marpa_is_use_leo_set( 5731struct marpa_r*r, gboolean value) 5732{ 5733 @<Return |FALSE| on failure@>@/ 5734 @<Fail if recognizer has fatal error@>@; 5735 @<Fail if recognizer not initial@>@; 5736 r->t_use_leo_flag = value; 5737 return TRUE; 5738} 5739@ @<Public function prototypes@> = 5740gboolean marpa_is_use_leo_set( struct marpa_r*r, gboolean value); 5741 5742@*1 Is The Parser Exhausted?. 5743A parser is ``exhausted" if it cannot accept any more input. 5744Both successful and failed parses can be ``exhausted". 5745In many grammars, 5746the parse is always exhausted as soon as it succeeds. 5747And even if the parse is exhausted at a point 5748where there is no good parse, 5749there may be good parses at earlemes prior to the 5750earleme at which the parse became exhausted. 5751@d R_is_Exhausted(r) ((r)->t_is_exhausted) 5752@d LV_R_is_Exhausted(r) R_is_Exhausted(r) 5753@<Bit aligned recognizer elements@> = guint t_is_exhausted:1; 5754@ @<Initialize recognizer elements@> = r->t_is_exhausted = 0; 5755@ Exhaustion is a boolean, not a phase. 5756Once exhausted a parse stays exhausted, 5757even though the phase may change. 
@<Public function prototypes@> =
gboolean marpa_is_exhausted(struct marpa_r* r);
@ @<Function definitions@> =
gint marpa_is_exhausted(struct marpa_r* r)
{
   @<Return |-2| on failure@>@/
    @<Fail if recognizer has fatal error@>@;
    return r->t_is_exhausted ? 1 : 0;
}

@*0 The Recognizer's Context.
As in the grammar,
the ``context" is a hash of miscellaneous data,
by keyword,
whose
purpose is to
provide callbacks with
data about the recognizer's
state which is not conveniently
available in other forms.
@d Context_of_R(r) ((r)->t_context)
@<Widely aligned recognizer elements@> = GHashTable* t_context;
@ @<Initialize recognizer elements@> =
r->t_context = g_hash_table_new_full( g_str_hash, g_str_equal, NULL, g_free );
@ @<Destroy recognizer elements@> = g_hash_table_destroy(Context_of_R(r));

@ Add an integer to the context.
The const qualifier on the key is deliberately discarded.
As implemented, the keys are treated as const's by
|g_hash_table_insert|, but the compiler can't know
that is my intention.
For type safety, I do want to keep the |const|
qualifier in other contexts.
5791@<Function definitions@> = 5792static inline 5793void r_context_int_add(struct marpa_r* r, const gchar* key, gint payload) 5794{ 5795 struct marpa_context_int_value* value = g_new(struct marpa_context_int_value, 1); 5796 value->t_type = MARPA_CONTEXT_INT; 5797 value->t_data = payload; 5798 g_hash_table_insert(Context_of_R(r), (gpointer)key, value); 5799} 5800@ @<Private function prototypes@> = 5801static inline 5802void r_context_int_add(struct marpa_r* r, const gchar* key, gint value); 5803@ @<Function definitions@> = 5804static inline 5805void r_context_const_add(struct marpa_r* r, const gchar* key, const gchar* payload) 5806{ 5807 struct marpa_context_const_value* value = g_new(struct marpa_context_const_value, 1); 5808 value->t_type = MARPA_CONTEXT_CONST; 5809 value->t_data = payload; 5810 g_hash_table_insert(Context_of_R(r), (gpointer)key, value); 5811} 5812@ @<Private function prototypes@> = 5813static inline 5814void r_context_const_add(struct marpa_r* r, const gchar* key, const gchar* value); 5815 5816@ Clear the current context. 5817Used to create a ``clean slate" in the context. 5818@<Function definitions@> = 5819static inline void r_context_clear(struct marpa_r* r) { 5820 g_hash_table_remove_all(Context_of_R(r)); } 5821@ @<Private function prototypes@> = 5822static inline void r_context_clear(struct marpa_r* r); 5823 5824@ @<Function definitions@> = 5825union marpa_context_value* marpa_r_context_value(struct marpa_r* r, const gchar* key) 5826{ return g_hash_table_lookup(Context_of_R(r), key); } 5827@ @<Public function prototypes@> = 5828union marpa_context_value* marpa_r_context_value(struct marpa_r* r, const gchar* key); 5829 5830@*0 The Recognizer Obstack. 5831Create an obstack with the lifetime of the recognizer. 5832This is a very efficient way of allocating memory which won't be 5833resized and which will have the same lifetime as the recognizer. 
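@ As a reminder of how the obstack interface is used,
here is a minimal standalone sketch.
It assumes the GNU C library's {\tt obstack.h};
the function |obstack_demo| and its values are made up for illustration
and are not part of |libmarpa|.
Objects are allocated one after another and never resized,
and a single |obstack_free| call with |NULL| releases everything at once.

```c
#include <assert.h>
#include <stdlib.h>
#include <obstack.h>

/* The obstack macros must be told how to obtain and release chunks. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Allocate two objects on a private obstack, use them,
   then release the whole obstack in one call. */
static int obstack_demo(void)
{
    struct obstack obs;
    int *first;
    int *second;
    int sum;

    obstack_init(&obs);
    first = obstack_alloc(&obs, sizeof *first);        /* one int */
    second = obstack_alloc(&obs, 4 * sizeof *second);  /* four ints */
    *first = 40;
    second[3] = 2;
    sum = *first + second[3];
    obstack_free(&obs, NULL);   /* frees every object in the obstack */
    return sum;
}
```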
@<Widely aligned recognizer elements@> = struct obstack t_obs;
@ @<Initialize recognizer obstack@> = obstack_init(&r->t_obs);
@ @<Destroy recognizer obstack@> = obstack_free(&r->t_obs, NULL);

@*0 The Recognizer's Error ID.
This is an error flag for the recognizer.
Error status is not necessarily cleared
on successful return, so that
it is only valid when an external
function has indicated there is an error,
and becomes invalid again when another external method
is called on the recognizer.
Checking it at other times may reveal ``stale" error
messages.
@ @<Widely aligned recognizer elements@> =
Marpa_Error_ID t_error;
Marpa_Error_ID t_fatal_error;
@ @<Initialize recognizer elements@> =
r->t_error = NULL;
r->t_fatal_error = NULL;
@ There is no destructor.
The error strings are assumed to be
{\bf not} error messages, but ``cookies".
These cookies are constants residing in static memory
(which may be read-only depending on implementation).
They cannot and should not be de-allocated.
@ @<Function definitions@> =
Marpa_Error_ID marpa_r_error(const struct marpa_r* r)
{ return r->t_error ? r->t_error : "unknown error"; }
@ @<Public function prototypes@> =
Marpa_Error_ID marpa_r_error(const struct marpa_r* r);

@** Earlemes.
In most parsers, the input is modeled as a token stream ---
a sequence of tokens.
In this model the idea of location is not complex.
The first token is at location 0, the second at location 1,
etc.
@ Marpa allows ambiguous and variable length tokens, and requires
a more flexible idea of location, with a unit of length.
The unit of token length in Marpa is called an earleme.
The locations themselves are often called earlemes.
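To make the difference from the token-stream model concrete, here is a
minimal sketch in which each token carries a start location and a length,
both measured in earlemes.
The types and functions (|earleme_t|, |struct token_span|, |token_end|,
|furthest_earleme|) are hypothetical and exist only for this illustration.

```c
#include <assert.h>

typedef int earleme_t; /* hypothetical stand-in for an earleme location */

struct token_span {
    earleme_t start;  /* earleme at which the token begins */
    earleme_t length; /* token length in earlemes, at least 1 */
};

/* The location at which a token ends. */
earleme_t token_end(struct token_span t)
{
    assert(t.start >= 0 && t.length >= 1);
    return t.start + t.length;
}

/* The furthest location reached by any token in the array. */
earleme_t furthest_earleme(const struct token_span *tokens, int count)
{
    earleme_t furthest = 0;
    for (int i = 0; i < count; i++) {
        earleme_t end = token_end(tokens[i]);
        if (end > furthest) furthest = end;
    }
    return furthest;
}
```

In the token-stream model every length is 1, so the end of the token at
location |n| is always |n+1|.
With variable lengths, two tokens may start at the same location and end at
different ones, which is why a unit of length is needed.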
@ |EARLEME_THRESHOLD| is less than |G_MAXINT| so that
I can prevent overflow without getting fancy -- overflow
by addition is impossible as long as earlemes are below
the threshold.
@ I considered defining earlemes as |glong| or |gint64|.
But machines with 32-bit int's
will in a not very long time
become museum pieces.
And in the meantime this
definition of |EARLEME_THRESHOLD| probably allows as large a
parse as the memories on those machines will be
able to handle.
@d EARLEME_THRESHOLD (G_MAXINT/4)
@<Public typedefs@> = typedef gint Marpa_Earleme;
@ @<Private typedefs@> = typedef Marpa_Earleme EARLEME;

@** Earley Set (ES) Code.
@<Public typedefs@> = typedef gint Marpa_Earley_Set_ID;
@ @<Private typedefs@> = typedef Marpa_Earley_Set_ID ESID;
@ @d Next_ES_of_ES(set) ((set)->t_next_earley_set)
@d LV_Next_ES_of_ES(set) Next_ES_of_ES(set)
@d Postdot_SYM_Count_of_ES(set) ((set)->t_postdot_sym_count)
@d First_PIM_of_ES_by_SYMID(set, symid) (first_pim_of_es_by_symid((set), (symid)))
@d PIM_SYM_P_of_ES_by_SYMID(set, symid) (pim_sym_p_find((set), (symid)))
@<Private incomplete structures@> =
struct s_earley_set;
typedef struct s_earley_set *ES;
typedef const struct s_earley_set *ES_Const;
struct s_earley_set_key;
typedef struct s_earley_set_key *ESK;
@ @<Private structures@> =
struct s_earley_set_key {
  EARLEME t_earleme;
};
typedef struct s_earley_set_key ESK_Object;
@ @<Private structures@> =
struct s_earley_set {
  ESK_Object t_key;
  gint t_postdot_sym_count;
  @<Int aligned Earley set elements@>@;
  union u_postdot_item** t_postdot_ary;
  ES t_next_earley_set;
  @<Widely aligned Earley set elements@>@/
};

@*0 Earley Item Container.
@d EIM_Count_of_ES(set) ((set)->t_eim_count)
@<Int aligned Earley set elements@> =
gint t_eim_count;
@ @d EIMs_of_ES(set) ((set)->t_earley_items)
@<Widely aligned Earley set elements@> =
EIM* t_earley_items;

@*0 Ordinal.
The ordinal of the Earley set is
its number in sequence.
It is different from the earleme, because there may be
gaps in the earleme sequence.
There are never gaps in the sequence of ordinals.
@d ES_Count_of_R(r) ((r)->t_earley_set_count)
@d Ord_of_ES(set) ((set)->t_ordinal)
@<Int aligned Earley set elements@> =
  gint t_ordinal;
@ @d ES_Ord_is_Valid(r, ordinal)
  ((ordinal) >= 0 && (ordinal) < ES_Count_of_R(r))
@<Int aligned recognizer elements@> =
gint t_earley_set_count;
@ @<Initialize recognizer elements@> =
r->t_earley_set_count = 0;

@*0 Constructor.
@<Private function prototypes@> =
static inline ES earley_set_new (RECCE r, EARLEME id);
@ @<Function definitions@> =
static inline ES
earley_set_new( RECCE r, EARLEME id)
{
  ESK_Object key;
  ES set;
  set = obstack_alloc (&r->t_obs, sizeof (*set));
  key.t_earleme = id;
  set->t_key = key;
  set->t_postdot_ary = NULL;
  set->t_postdot_sym_count = 0;
  EIM_Count_of_ES(set) = 0;
  set->t_ordinal = r->t_earley_set_count++;
  EIMs_of_ES(set) = NULL;
  LV_Next_ES_of_ES(set) = NULL;
  @<Initialize Earley set PSL data@>@/
  return set;
}

@*0 Destructor.
@<Destroy recognizer elements@> =
{
  ES set;
  for (set = First_ES_of_R (r); set; set = Next_ES_of_ES (set))
    {
      if (EIMs_of_ES(set))
        g_free (EIMs_of_ES(set));
    }
}

@*0 ID of Earley Set.
@d Earleme_of_ES(set) ((set)->t_key.t_earleme)

@*0 Trace Functions.
Many of the
trace functions use
a ``trace Earley set" which is
tracked on a per-recognizer basis.
The ``trace Earley set" is tracked separately
from the current Earley set for the parse.
The two may coincide, but should not be confused.
@<Widely aligned recognizer elements@> =
struct s_earley_set* t_trace_earley_set;
@ @<Initialize recognizer elements@> =
r->t_trace_earley_set = NULL;

@ @<Public function prototypes@> =
Marpa_Earley_Set_ID marpa_trace_earley_set(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Earley_Set_ID marpa_trace_earley_set(struct marpa_r *r)
{
  @<Return |-2| on failure@>@;
  ES trace_earley_set = r->t_trace_earley_set;
  @<Fail recognizer if not trace-safe@>@;
  if (!trace_earley_set) {
      R_ERROR("no trace es");
      return failure_indicator;
  }
  return Ord_of_ES(trace_earley_set);
}

@ @<Public function prototypes@> =
Marpa_Earley_Set_ID marpa_latest_earley_set(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Earley_Set_ID marpa_latest_earley_set(struct marpa_r *r)
{
  @<Return |-2| on failure@>@;
  @<Fail recognizer if not trace-safe@>@;
  return Ord_of_ES(Latest_ES_of_R(r));
}

@ Given the ID (ordinal) of an Earley set,
return the earleme.
In the default, token-stream model, ID and earleme
are the same, but this is not the case in other input
models.
If the ordinal is out of bounds, this function
returns -1, which can be treated as a soft failure.
On other problems, it returns -2.
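A caller might separate the two kinds of failure as in the following
minimal sketch.
It does not call Marpa: |earleme_by_ordinal| is a hypothetical stand-in that
follows the same return conventions over a plain array, and the enumeration
names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two failure conventions. */
enum { ES_DOES_NOT_EXIST = -1, HARD_FAILURE = -2 };

/* Stand-in lookup: |earlemes[i]| is the earleme of the set with ordinal |i|. */
int earleme_by_ordinal(const int *earlemes, int set_count, int ordinal)
{
    if (earlemes == NULL || set_count < 0 || ordinal < 0)
        return HARD_FAILURE;      /* illegal argument */
    if (ordinal >= set_count)
        return ES_DOES_NOT_EXIST; /* legal ordinal, but no such set (yet) */
    return earlemes[ordinal];
}

/* Count the sets by probing until the soft failure is seen.
   A hard failure is passed up unchanged. */
int count_sets_by_probing(const int *earlemes, int set_count)
{
    int ordinal = 0;
    for (;;) {
        int result = earleme_by_ordinal(earlemes, set_count, ordinal);
        if (result == HARD_FAILURE) return HARD_FAILURE;
        if (result == ES_DOES_NOT_EXIST) return ordinal;
        assert(result >= 0); /* a real earleme is never negative */
        ordinal++;
    }
}
```

Treating -1 as a normal end-of-data signal lets a loop like this terminate
cleanly, while -2 remains reserved for conditions the caller cannot recover
from by simply stopping.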
@<Public function prototypes@> =
Marpa_Earleme marpa_earleme(struct marpa_r* r, Marpa_Earley_Set_ID set_id);
@ @<Function definitions@> =
Marpa_Earleme marpa_earleme(struct marpa_r* r, Marpa_Earley_Set_ID set_id)
{
  const gint es_does_not_exist = -1;
  @<Return |-2| on failure@>@;
  ES earley_set;
  @<Fail if recognizer initial@>@;
  @<Fail if recognizer has fatal error@>@;
  if (set_id < 0) {
      R_ERROR("invalid es ordinal");
      return failure_indicator;
  }
  r_update_earley_sets (r);
  if (!ES_Ord_is_Valid (r, set_id))
    {
      return es_does_not_exist;
    }
  earley_set = ES_of_R_by_Ord (r, set_id);
  return Earleme_of_ES (earley_set);
}

@ Note that this trace function returns the size
(the number of Earley items)
of the Earley set specified by |set_id|,
which is not necessarily the {\bf current Earley set}.
@ @<Public function prototypes@> =
gint marpa_earley_set_size(struct marpa_r *r, Marpa_Earley_Set_ID set_id);
@ @<Function definitions@> =
gint marpa_earley_set_size(struct marpa_r *r, Marpa_Earley_Set_ID set_id)
{
  @<Return |-2| on failure@>@;
  ES earley_set;
  @<Fail if recognizer initial@>@;
  @<Fail if recognizer has fatal error@>@;
  r_update_earley_sets (r);
  if (!ES_Ord_is_Valid (r, set_id))
    {
      R_ERROR ("invalid es ordinal");
      return failure_indicator;
    }
  earley_set = ES_of_R_by_Ord (r, set_id);
  return EIM_Count_of_ES (earley_set);
}

@** Earley Item (EIM) Code.
@ {\bf Optimization Principles:}
\li Optimization should favor unambiguous grammars,
but not heavily penalize ambiguous grammars.
\li Optimization should favor mildly ambiguous grammars,
but not heavily penalize very ambiguous grammars.
\li Optimization should focus on saving space,
perhaps even if at a slight cost in time.
@ Space savings are important
because in practical applications
there can easily be many millions of
Earley items and links.
If there are 1M copies of a structure,
each byte saved is a megabyte saved.

@ The solution arrived at is to optimize for Earley items
with a single source, storing that source in the item
itself.
For Earley items with multiple sources, a special structure
of linked lists is used.
When a second source is added,
the first source is copied into the lists,
and its original space used for pointers to the linked
lists.
@ This solution is optimized both
for the unambiguous case,
and for adding the third and additional
sources.
The only awkwardness takes place
when the second source is added, and the first one must
be recopied to make way for pointers to the linked lists.
@d EIM_FATAL_THRESHOLD (G_MAXINT/4)
@d Complete_SYMIDs_of_EIM(item)
  Complete_SYMIDs_of_AHFA(AHFA_of_EIM(item))
@d Complete_SYM_Count_of_EIM(item)
  Complete_SYM_Count_of_AHFA(AHFA_of_EIM(item))
@d Leo_LHS_ID_of_EIM(eim) Leo_LHS_ID_of_AHFA(AHFA_of_EIM(eim))
@ It might be slightly faster if this boolean is memoized in the Earley item
when the Earley item is initialized.
@d Earley_Item_is_Completion(item)
  (Complete_SYM_Count_of_EIM(item) > 0)
@<Public typedefs@> = typedef gint Marpa_Earley_Item_ID;
@ The ID of the Earley item is per-Earley-set, so that
to uniquely specify the Earley item you must also specify
the Earley set.
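Returning to the single-source optimization described earlier in this
chapter, the idea can be sketched as follows.
This is a simplified illustration rather than the actual structure:
it keeps a single linked list, which may differ from the real layout,
it uses |malloc| where the real code would use its own allocator,
and every name in it (|struct item|, |item_add_source| and so on) is
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct source {            /* hypothetical payload of one source */
    int predecessor;
    int cause;
};

struct source_link {
    struct source_link *next;
    struct source source;
};

enum source_kind { SRC_NONE, SRC_UNIQUE, SRC_AMBIGUOUS };

struct item {
    enum source_kind kind;
    union {                        /* the same space serves both cases */
        struct source unique;      /* valid when kind is SRC_UNIQUE */
        struct source_link *list;  /* valid when kind is SRC_AMBIGUOUS */
    } container;
};

static struct source_link *link_new(struct source s, struct source_link *next)
{
    struct source_link *link = malloc(sizeof *link);
    if (link) {
        link->source = s;
        link->next = next;
    }
    return link;
}

/* Add a source; returns 0 on success, -1 if memory runs out. */
int item_add_source(struct item *item, struct source s)
{
    assert(item != NULL);
    if (item->kind == SRC_NONE) {   /* first source lives in the item */
        item->container.unique = s;
        item->kind = SRC_UNIQUE;
        return 0;
    }
    if (item->kind == SRC_UNIQUE) { /* second source: recopy the first */
        struct source_link *first = link_new(item->container.unique, NULL);
        if (!first) return -1;
        item->container.list = first;
        item->kind = SRC_AMBIGUOUS;
    }
    struct source_link *added = link_new(s, item->container.list);
    if (!added) return -1;
    item->container.list = added;
    return 0;
}

int item_source_count(const struct item *item)
{
    int count = 0;
    assert(item != NULL);
    if (item->kind == SRC_UNIQUE) return 1;
    if (item->kind == SRC_AMBIGUOUS)
        for (const struct source_link *l = item->container.list; l; l = l->next)
            count++;
    return count;
}

void item_clear_sources(struct item *item)
{
    assert(item != NULL);
    if (item->kind == SRC_AMBIGUOUS) {
        struct source_link *l = item->container.list;
        while (l) {
            struct source_link *next = l->next;
            free(l);
            l = next;
        }
    }
    item->kind = SRC_NONE;
}
```

The unambiguous case costs no allocation at all, and the third and later
sources are a single list push; only the second source pays for the recopy.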
@d ES_of_EIM(item) ((item)->t_key.t_set)
@d ES_Ord_of_EIM(item) (Ord_of_ES(ES_of_EIM(item)))
@d Ord_of_EIM(item) ((item)->t_ordinal)
@d Earleme_of_EIM(item) Earleme_of_ES(ES_of_EIM(item))
@d AHFAID_of_EIM(item) (ID_of_AHFA(AHFA_of_EIM(item)))
@d AHFA_of_EIM(item) ((item)->t_key.t_state)
@d AIM_Count_of_EIM(item) (AIM_Count_of_AHFA(AHFA_of_EIM(item)))
@d Origin_Earleme_of_EIM(item) (Earleme_of_ES(Origin_of_EIM(item)))
@d Origin_Ord_of_EIM(item) (Ord_of_ES(Origin_of_EIM(item)))
@d Origin_of_EIM(item) ((item)->t_key.t_origin)
@d AIM_of_EIM_by_AEX(eim, aex) AIM_of_AHFA_by_AEX(AHFA_of_EIM(eim), (aex))
@d AEX_of_EIM_by_AIM(eim, aim) AEX_of_AHFA_by_AIM(AHFA_of_EIM(eim), (aim))
@<Private incomplete structures@> =
struct s_earley_item;
typedef struct s_earley_item* EIM;
typedef const struct s_earley_item* EIM_Const;
struct s_earley_item_key;
typedef struct s_earley_item_key* EIK;

@ @<Earley item structure@> =
struct s_earley_item_key {
  AHFA t_state;
  ES t_origin;
  ES t_set;
};
typedef struct s_earley_item_key EIK_Object;
struct s_earley_item {
  EIK_Object t_key;
  union u_source_container t_container;
  gint t_ordinal;
  @<Bit aligned Earley item elements@>@/
};
typedef struct s_earley_item EIM_Object;

@*0 Constructor.
Find an Earley item object, creating it if it does not exist.
Only in a couple of cases per parse (in AHFA state 0),
do we already
know that the Earley item is unique in the set.
These are not worth optimizing for.
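The find-or-create pattern can be sketched in isolation.
In this minimal illustration a table indexed by state ID remembers the most
recent item created for each state, and an entry is reused only if its set
and origin still match; otherwise it is treated as stale and replaced.
The fixed-size pool, the table size and all the |sk| names are hypothetical
and stand in for the real allocator and lookup structures.

```c
#include <assert.h>
#include <stddef.h>

#define SK_STATE_COUNT 8
#define SK_POOL_SIZE 32

struct sk_item {
    int set;    /* ordinal of the set the item belongs to */
    int origin; /* ordinal of the set where the item started */
    int state;  /* state ID */
};

struct sk_memo {
    struct sk_item *by_state[SK_STATE_COUNT]; /* latest item per state */
    struct sk_item pool[SK_POOL_SIZE];        /* storage, never freed singly */
    int created;                              /* items taken from the pool */
};

/* Return the item for (set, origin, state), creating it if needed.
   Returns NULL only when the pool is exhausted. */
struct sk_item *sk_item_assign(struct sk_memo *memo, int set, int origin, int state)
{
    assert(memo != NULL && state >= 0 && state < SK_STATE_COUNT);
    struct sk_item *found = memo->by_state[state];
    if (found && found->set == set && found->origin == origin)
        return found;                /* already exists: reuse it */
    if (memo->created >= SK_POOL_SIZE)
        return NULL;
    struct sk_item *item = &memo->pool[memo->created++];
    item->set = set;
    item->origin = origin;
    item->state = state;
    memo->by_state[state] = item;    /* overwrite any stale entry */
    return item;
}
```

Because stale entries are detected by comparison rather than cleared, the
table never needs to be reset between sets.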
@<Private function prototypes@> =
static inline EIM earley_item_create(const RECCE r,
    const EIK_Object key);
@ @<Function definitions@> =
static inline EIM earley_item_create(const RECCE r,
    const EIK_Object key)
{
  @<Return |NULL| on failure@>@;
  EIM new_item;
  EIM* top_of_work_stack;
  const ES set = key.t_set;
  const guint count = ++EIM_Count_of_ES(set);
  @<Check count against Earley item thresholds@>@;
  new_item = obstack_alloc (&r->t_obs, sizeof (*new_item));
  new_item->t_key = key;
  new_item->t_source_type = NO_SOURCE;
  Ord_of_EIM(new_item) = count - 1;
  top_of_work_stack = WORK_EIM_PUSH(r);
  *top_of_work_stack = new_item;
  return new_item;
}

@ @<Private function prototypes@> =
static inline
EIM earley_item_assign (const RECCE r, const ES set, const ES origin, const AHFA state);
@ @<Function definitions@> =
static inline EIM
earley_item_assign (const RECCE r, const ES set, const ES origin,
                    const AHFA state)
{
  EIK_Object key;
  EIM eim;
  PSL psl;
  AHFAID ahfa_id = ID_of_AHFA(state);
  PSL *psl_owner = &Dot_PSL_of_ES (origin);
  if (!*psl_owner)
    {
      psl_claim (psl_owner, Dot_PSAR_of_R(r));
    }
  psl = *psl_owner;
  eim = PSL_Datum (psl, ahfa_id);
  if (eim
      && Earleme_of_EIM (eim) == Earleme_of_ES (set)
      && Earleme_of_ES (Origin_of_EIM (eim)) == Earleme_of_ES (origin))
    {
      return eim;
    }
  key.t_origin = origin;
  key.t_state = state;
  key.t_set = set;
  eim = earley_item_create (r, key);
  PSL_Datum (psl, ahfa_id) = eim;
  return eim;
}

@ The fatal threshold always applies.
The warning threshold does not count against items added by a Leo expansion.
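The relationship between the two limits can be sketched as a small
classification function.
This is an illustration under stated assumptions, not the recognizer's logic:
the sketch tests the fatal limit first, which is one way to make it apply
regardless of the warning setting, it ignores the Leo expansion exemption,
and the names and the value |INT_MAX/4| (standing in for |G_MAXINT/4|)
are chosen for the example.

```c
#include <assert.h>
#include <limits.h>

enum threshold_result { COUNT_OK, COUNT_WARNING, COUNT_FATAL };

/* Stand-in for a fixed, non-configurable fatal limit. */
#define SKETCH_FATAL_THRESHOLD (INT_MAX / 4)

/* Classify an item count against a caller-supplied warning threshold
   and the fixed fatal threshold. */
enum threshold_result check_item_count(int count, int warning_threshold)
{
    assert(count >= 0);
    if (count >= SKETCH_FATAL_THRESHOLD)
        return COUNT_FATAL;   /* applies whatever the warning setting is */
    if (count >= warning_threshold)
        return COUNT_WARNING; /* report it, but keep going */
    return COUNT_OK;
}
```

A warning leaves the computation usable, while the fatal result is meant to
stop it before the counters can approach integer overflow.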
@<Check count against Earley item thresholds@> =
if (count >= r->t_earley_item_warning_threshold)
  {
    if (G_UNLIKELY(count >= EIM_FATAL_THRESHOLD))
      { /* Set the recognizer to a fatal error */
        r_context_clear (r);
        R_FATAL("eim count exceeds fatal threshold");
        return failure_indicator;
      }
    r_context_clear (r);
    r_message (r, "earley item count exceeds threshold");
}

@*0 Destructor.
No destructor. All Earley item elements are owned by other objects.
The Earley item itself is on the obstack.

@*0 Source of the Earley Item.
@d NO_SOURCE (0U)
@d SOURCE_IS_TOKEN (1U)
@d SOURCE_IS_COMPLETION (2U)
@d SOURCE_IS_LEO (3U)
@d SOURCE_IS_AMBIGUOUS (4U)
@d Source_Type_of_EIM(item) ((item)->t_source_type)
@d Earley_Item_has_No_Source(item) ((item)->t_source_type == NO_SOURCE)
@d Earley_Item_has_Token_Source(item) ((item)->t_source_type == SOURCE_IS_TOKEN)
@d Earley_Item_has_Complete_Source(item) ((item)->t_source_type == SOURCE_IS_COMPLETION)
@d Earley_Item_has_Leo_Source(item) ((item)->t_source_type == SOURCE_IS_LEO)
@d Earley_Item_is_Ambiguous(item) ((item)->t_source_type == SOURCE_IS_AMBIGUOUS)
@<Bit aligned Earley item elements@> =
guint t_source_type:3;

@ @<Private function prototypes@> =
static const char* invalid_source_type_message(guint type);
@ Not inline, because not used in critical paths.
This is for creating error messages.
@<Function definitions@> =
static const char* invalid_source_type_message(guint type) {
  switch (type) {
    case NO_SOURCE:
      return "invalid source type: none";
    case SOURCE_IS_TOKEN:
      return "invalid source type: token";
    case SOURCE_IS_COMPLETION:
      return "invalid source type: completion";
    case SOURCE_IS_LEO:
      return "invalid source type: leo";
    case SOURCE_IS_AMBIGUOUS:
      return "invalid source type: ambiguous";
  }
  return "unknown source type";
}

@*0 Trace Functions.
Many of the
trace functions use
a ``trace Earley item" which is
tracked on a per-recognizer basis.
@<Widely aligned recognizer elements@> =
EIM t_trace_earley_item;
@ @<Initialize recognizer elements@> =
r->t_trace_earley_item = NULL;
@ This function returns the AHFA state ID of an Earley item,
and sets the trace Earley item,
if it successfully finds an Earley item
in the trace Earley set with the specified
AHFA state ID and origin earleme.
If there is no such Earley item,
it returns |-1|,
and clears the trace Earley item.
On failure for other reasons,
it returns |-2|,
and clears the trace Earley item.
@ The trace Earley item is cleared if no matching
Earley item is found, and on failure.
The trace source link is always
cleared, regardless of success or failure.

@ This function sets
the trace Earley set to the one indicated
by the ID
of the argument.
On success,
the earleme of the new trace Earley set is
returned.
@ Various other trace data depends on the Earley
set, and must be consistent with it.
This function clears all such data,
unless it is called while the recognizer is in
a trace-unsafe state (initial, fatal, etc.)
or unless the Earley set requested by the
argument is already the trace Earley set.
On failure because the ID is for an
Earley set which does not
exist, |-1| is returned.
The upper levels may choose to treat this as a soft failure.
On failure because the ID is illegal (less than zero)
or for other failures, |-2| is returned.
The upper levels may choose to treat these as hard failures.
@ @<Public function prototypes@> =
Marpa_Earleme
marpa_earley_set_trace (struct marpa_r *r, Marpa_Earley_Set_ID set_id);
@ @<Function definitions@> =
Marpa_Earleme
marpa_earley_set_trace (struct marpa_r *r, Marpa_Earley_Set_ID set_id)
{
  ES earley_set;
  const gint es_does_not_exist = -1;
  @<Return |-2| on failure@>@/
  @<Fail recognizer if not trace-safe@>@;
  if (r->t_trace_earley_set && Ord_of_ES (r->t_trace_earley_set) == set_id)
    { /* If the set is already
         the trace Earley set,
         return successfully without resetting any of the dependent data */
      return Earleme_of_ES (r->t_trace_earley_set);
    }
  @<Clear trace Earley set dependent data@>@;
  if (set_id < 0)
    {
      R_ERROR ("invalid es ordinal");
      return failure_indicator;
    }
  r_update_earley_sets (r);
  if (set_id >= DSTACK_LENGTH (r->t_earley_set_stack))
    {
      return es_does_not_exist;
    }
  earley_set = ES_of_R_by_Ord (r, set_id);
  r->t_trace_earley_set = earley_set;
  return Earleme_of_ES(earley_set);
}

@ @<Clear trace Earley set dependent data@> = {
  r->t_trace_earley_set = NULL;
  trace_earley_item_clear(r);
  @<Clear trace postdot item data@>@;
}

@ @<Public function prototypes@> =
Marpa_AHFA_State_ID
marpa_earley_item_trace (struct marpa_r *r,
    Marpa_Earley_Item_ID item_id);
@ @<Function definitions@> =
Marpa_AHFA_State_ID
marpa_earley_item_trace (struct marpa_r *r, Marpa_Earley_Item_ID item_id)
{
  const gint eim_does_not_exist
    = -1;
  @<Return |-2| on failure@>@;
  ES trace_earley_set;
  EIM earley_item;
  EIM *earley_items;
  @<Fail recognizer if not trace-safe@>@;
  trace_earley_set = r->t_trace_earley_set;
  if (!trace_earley_set)
    {
      @<Clear trace Earley set dependent data@>@;
      R_ERROR ("no trace es");
      return failure_indicator;
    }
  trace_earley_item_clear (r);
  if (item_id < 0)
    {
      R_ERROR ("invalid eim ordinal");
      return failure_indicator;
    }
  if (item_id >= EIM_Count_of_ES (trace_earley_set))
    {
      return eim_does_not_exist;
    }
  earley_items = EIMs_of_ES (trace_earley_set);
  earley_item = earley_items[item_id];
  r->t_trace_earley_item = earley_item;
  return AHFAID_of_EIM (earley_item);
}

@ Clear all the data elements specifically
for the trace Earley item.
The difference between this code and
|trace_earley_item_clear| is
that |trace_earley_item_clear|
also clears the source link.
@<Clear trace Earley item data@> =
  r->t_trace_earley_item = NULL;

@ @<Private function prototypes@> =
static inline void trace_earley_item_clear(struct marpa_r* r);
@ @<Function definitions@> =
static inline void trace_earley_item_clear(struct marpa_r* r)
{
  @<Clear trace Earley item data@>@/
  trace_source_link_clear(r);
}

@ @<Private function prototypes@> =
Marpa_Earley_Set_ID marpa_earley_item_origin(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Earley_Set_ID marpa_earley_item_origin(struct marpa_r *r)
{
  @<Return |-2| on failure@>@;
  EIM item = r->t_trace_earley_item;
  @<Fail if recognizer initial@>@;
  if (!item) {
      @<Clear trace Earley item data@>@;
      R_ERROR("no trace eim");
      return failure_indicator;
  }
  return Origin_Ord_of_EIM(item);
}

@** Earley Index (EIX) Code.
Postdot items are of two kinds: Earley indexes
and Leo items.
The payload of an Earley index is simple:
a pointer to an Earley item.
The other elements of the EIX are overhead to
support the chain of postdot items for
a postdot symbol.
@d Next_PIM_of_EIX(eix) ((eix)->t_next)
@d LV_Next_PIM_of_EIX(eix) Next_PIM_of_EIX(eix)
@d EIM_of_EIX(eix) ((eix)->t_earley_item)
@d LV_EIM_of_EIX(eix) EIM_of_EIX(eix)
@d Postdot_SYMID_of_EIX(eix) ((eix)->t_postdot_symid)
@d LV_Postdot_SYMID_of_EIX(eix) Postdot_SYMID_of_EIX(eix)
@<Private incomplete structures@> =
struct s_earley_ix;
typedef struct s_earley_ix* EIX;
union u_postdot_item;
@ @<Private structures@> =
struct s_earley_ix {
  union u_postdot_item* t_next;
  SYMID t_postdot_symid;
  EIM t_earley_item; // Never NULL if this is an index item
};
typedef struct s_earley_ix EIX_Object;

@** Leo Item (LIM) Code.
Leo items originate from the ``transition items" of Joop Leo's 1991 paper.
They are set up so their first fields are identical to those of
the Earley item indexes,
so that they can be linked together in the same chain.
Because the Earley index is at the beginning of each Leo item,
LIMs can be treated as a kind of EIX.
@d EIX_of_LIM(lim) ((EIX)(lim))
@ Both Earley indexes and Leo items are
postdot items, so that Leo items also require
the fields to maintain the chain of postdot items.
For this reason, Leo items contain an Earley index,
but one
with a |NULL| Earley item pointer.
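The layout trick can be sketched with two small structures that share a
common first member and are told apart by a |NULL| payload pointer.
This is a minimal illustration with hypothetical names
(|struct index_entry|, |struct leo_entry|, |union postdot|),
not the actual definitions.

```c
#include <assert.h>
#include <stddef.h>

union postdot;

/* The plain index entry: chain link, symbol, and a payload pointer. */
struct index_entry {
    union postdot *next;
    int symbol;
    const void *item; /* never NULL for a plain index entry */
};

/* The richer entry begins with an index entry whose payload is NULL. */
struct leo_entry {
    struct index_entry base;
    int chain_length;
};

union postdot {
    struct leo_entry leo;
    struct index_entry index;
};

/* The NULL payload is what distinguishes the two kinds. */
int postdot_is_leo(const union postdot *p)
{
    assert(p != NULL);
    return p->index.item == NULL;
}

/* Walk one chain and count the Leo-style entries in it. */
int count_leo_entries(const union postdot *p)
{
    int count = 0;
    for (; p; p = p->index.next)
        if (postdot_is_leo(p))
            count++;
    return count;
}
```

Because the index entry sits at offset zero of the larger structure, code
that only needs the chain link or the symbol can treat every entry alike and
never has to ask which kind it is holding.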
@d Postdot_SYMID_of_LIM(leo) (Postdot_SYMID_of_EIX(EIX_of_LIM(leo)))
@d Next_PIM_of_LIM(leo) (Next_PIM_of_EIX(EIX_of_LIM(leo)))
@d LV_Next_PIM_of_LIM(leo) Next_PIM_of_LIM(leo)
@d Origin_of_LIM(leo) ((leo)->t_origin)
@d LV_Origin_of_LIM(leo) Origin_of_LIM(leo)
@d Top_AHFA_of_LIM(leo) ((leo)->t_top_ahfa)
@d LV_Top_AHFA_of_LIM(leo) Top_AHFA_of_LIM(leo)
@d Predecessor_LIM_of_LIM(leo) ((leo)->t_predecessor)
@d LV_Predecessor_LIM_of_LIM(leo) Predecessor_LIM_of_LIM(leo)
@d Base_EIM_of_LIM(leo) ((leo)->t_base)
@d LV_Base_EIM_of_LIM(leo) Base_EIM_of_LIM(leo)
@d ES_of_LIM(leo) ((leo)->t_set)
@d LV_ES_of_LIM(leo) ES_of_LIM(leo)
@d Chain_Length_of_LIM(leo) ((leo)->t_chain_length)
@d LV_Chain_Length_of_LIM(leo) Chain_Length_of_LIM(leo)
@d Earleme_of_LIM(lim) Earleme_of_ES(ES_of_LIM(lim))
@<Private incomplete structures@> =
struct s_leo_item;
typedef struct s_leo_item* LIM;
@ @<Private structures@> =
struct s_leo_item {
  EIX_Object t_earley_ix;
  ES t_origin;
  AHFA t_top_ahfa;
  LIM t_predecessor;
  EIM t_base;
  ES t_set;
  gint t_chain_length;
};
typedef struct s_leo_item LIM_Object;

@*0 Trace Functions.
The functions in this section are all accessors.
The trace Leo item is selected by setting the trace postdot item
to a Leo item.
@ @<Private function prototypes@> =
Marpa_Symbol_ID marpa_leo_predecessor_symbol(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_leo_predecessor_symbol(struct marpa_r *r)
{
  const Marpa_Symbol_ID no_predecessor = -1;
  @<Return |-2| on failure@>@;
  PIM postdot_item = r->t_trace_postdot_item;
  LIM predecessor_leo_item;
  @<Fail recognizer if not trace-safe@>@;
  if (!postdot_item) {
      R_ERROR("no trace pim");
      return failure_indicator;
  }
  if (EIM_of_PIM(postdot_item)) {
      R_ERROR("pim is not lim");
      return failure_indicator;
  }
  predecessor_leo_item = Predecessor_LIM_of_LIM(LIM_of_PIM(postdot_item));
  if (!predecessor_leo_item) return no_predecessor;
  return Postdot_SYMID_of_LIM(predecessor_leo_item);
}

@ @<Private function prototypes@> =
Marpa_Earley_Set_ID marpa_leo_base_origin(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Earley_Set_ID marpa_leo_base_origin(struct marpa_r *r)
{
  const EARLEME pim_is_not_a_leo_item = -1;
  @<Return |-2| on failure@>@;
  PIM postdot_item = r->t_trace_postdot_item;
  EIM base_earley_item;
  @<Fail recognizer if not trace-safe@>@;
  if (!postdot_item) {
      R_ERROR("no trace pim");
      return failure_indicator;
  }
  if (EIM_of_PIM(postdot_item)) return pim_is_not_a_leo_item;
  base_earley_item = Base_EIM_of_LIM(LIM_of_PIM(postdot_item));
  return Origin_Ord_of_EIM(base_earley_item);
}

@ @<Private function prototypes@> =
Marpa_AHFA_State_ID marpa_leo_base_state(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_AHFA_State_ID marpa_leo_base_state(struct marpa_r *r)
{
  const EARLEME pim_is_not_a_leo_item = -1;
  @<Return |-2| on failure@>@;
  PIM postdot_item = r->t_trace_postdot_item;
  EIM base_earley_item;
  @<Fail recognizer if not trace-safe@>@;
  if (!postdot_item) {
      R_ERROR("no trace pim");
      return failure_indicator;
  }
  if
    (EIM_of_PIM(postdot_item)) return pim_is_not_a_leo_item;
  base_earley_item = Base_EIM_of_LIM(LIM_of_PIM(postdot_item));
  return AHFAID_of_EIM(base_earley_item);
}

@ This function
returns the ``Leo expansion AHFA" of the current trace Leo item.
@<Private function prototypes@> =
Marpa_AHFA_State_ID marpa_leo_expansion_ahfa(struct marpa_r *r);
@ The {\bf Leo expansion AHFA} is the AHFA
of the {\bf Leo expansion Earley item}
for this Leo item.
{\bf Leo expansion Earley items}, when
the context makes the meaning clear,
are also called {\bf Leo expansion items}
or simply {\bf Leo expansions}.
@ Every Leo item has a unique Leo expansion Earley item,
because for this purpose
the process of
Leo expansion is seen from a non-recursive point of view.
In practice, Leo expansion is recursive,
and creation of the Leo expansion Earley item for
one Leo item
implies
the Leo expansion of all of the predecessors of that
Leo item.
@ Note that expansion of the Leo item at the top
of a Leo path is not needed---%
if a Leo item is the predecessor in
a Leo source for a Leo completion item,
the Leo completion item is the expansion of that Leo item.
@ @<Function definitions@> =
Marpa_AHFA_State_ID marpa_leo_expansion_ahfa(struct marpa_r *r)
{
  const EARLEME pim_is_not_a_leo_item = -1;
  @<Return |-2| on failure@>@;
  const PIM postdot_item = r->t_trace_postdot_item;
  @<Fail recognizer if not trace-safe@>@;
  if (!postdot_item)
    {
      R_ERROR ("no trace pim");
      return failure_indicator;
    }
  if (!EIM_of_PIM (postdot_item))
    {
      const LIM leo_item = LIM_of_PIM (postdot_item);
      const EIM base_earley_item = Base_EIM_of_LIM (leo_item);
      const SYMID postdot_symbol = Postdot_SYMID_of_LIM (leo_item);
      const AHFA to_ahfa = To_AHFA_of_EIM_by_SYMID (base_earley_item, postdot_symbol);
      return ID_of_AHFA(to_ahfa);
    }
  return pim_is_not_a_leo_item;
}


@** Postdot Item (PIM) code.
Postdot items are entries in an index,
by postdot symbol, of both the Earley items and the Leo items
for each Earley set.
@d LIM_of_PIM(pim) ((LIM)(pim))
@d EIX_of_PIM(pim) ((EIX)(pim))
@d Postdot_SYMID_of_PIM(pim) (Postdot_SYMID_of_EIX(EIX_of_PIM(pim)))
@d LV_Postdot_SYMID_of_PIM(pim) Postdot_SYMID_of_PIM(pim)
@d EIM_of_PIM(pim) (EIM_of_EIX(EIX_of_PIM(pim)))
@d LV_EIM_of_PIM(pim) EIM_of_PIM(pim)
@d Next_PIM_of_PIM(pim) (Next_PIM_of_EIX(EIX_of_PIM(pim)))
@d LV_Next_PIM_of_PIM(pim) Next_PIM_of_PIM(pim)

@ |PIM_of_LIM| assumes that PIM is in fact a LIM.
|PIM_is_LIM| is available to check this.
@d PIM_of_LIM(pim) ((PIM)(pim))
@d PIM_is_LIM(pim) (EIM_of_EIX(EIX_of_PIM(pim)) == NULL)
@s PIM int
@<Private structures@> =
union u_postdot_item {
  LIM_Object t_leo;
  EIX_Object t_earley;
};
typedef union u_postdot_item* PIM;

@*0 Symbol of a Postdot Item.
@d SYMID_of_Postdot_Item(postdot) ((postdot)->t_earley.transition_symid)

@ This function searches for the
first postdot item for an Earley set
and a symbol ID.
If successful, it
returns a pointer to the entry for that postdot item
in the Earley set's postdot array.
If it fails, it returns |NULL|.
@<Private function prototypes@> =
static inline PIM* pim_sym_p_find(ES set, SYMID symid);
@ @<Function definitions@> =
static inline PIM*
pim_sym_p_find (ES set, SYMID symid)
{
  gint lo = 0;
  gint hi = Postdot_SYM_Count_of_ES(set) - 1;
  PIM* postdot_array = set->t_postdot_ary;
  while (hi >= lo) { // A binary search
      gint trial = lo+(hi-lo)/2; // guards against overflow
      PIM trial_pim = postdot_array[trial];
      SYMID trial_symid = Postdot_SYMID_of_PIM(trial_pim);
      if (trial_symid == symid) return postdot_array+trial;
      if (trial_symid < symid) {
          lo = trial+1;
      } else {
          hi = trial-1;
      }
  }
  return NULL;
}
@ @<Private function prototypes@> =
static inline PIM first_pim_of_es_by_symid(ES set, SYMID symid);
@ @<Function definitions@> =
static inline PIM first_pim_of_es_by_symid(ES set, SYMID symid)
{
  PIM* pim_sym_p = pim_sym_p_find(set, symid);
  return pim_sym_p ? *pim_sym_p : NULL;
}

@*0 Trace Functions.
Many of the
trace functions use
a ``trace postdot item".
This is
tracked on a per-recognizer basis.
@<Widely aligned recognizer elements@> =
union u_postdot_item** t_trace_pim_sym_p;
union u_postdot_item* t_trace_postdot_item;
@ @<Initialize recognizer elements@> =
r->t_trace_pim_sym_p = NULL;
r->t_trace_postdot_item = NULL;
@ |marpa_postdot_symbol_trace|
takes a recognizer and a symbol ID
as arguments.
It sets the trace postdot item to the first
postdot item for the symbol ID.
If there is no postdot item
for that symbol ID,
it returns |-1|.
On failure for other reasons,
it returns |-2|
and clears the trace postdot item.
@<Public function prototypes@> =
Marpa_Symbol_ID
marpa_postdot_symbol_trace (struct marpa_r *r,
    Marpa_Symbol_ID symid);
@ @<Function definitions@> =
Marpa_Symbol_ID
marpa_postdot_symbol_trace (struct marpa_r *r,
    Marpa_Symbol_ID symid)
{
  @<Return |-2| on failure@>@;
  ES current_es = r->t_trace_earley_set;
  PIM* pim_sym_p;
  PIM pim;
  @<Clear trace postdot item data@>@;
  @<Fail recognizer if not trace-safe@>@;
  @<Fail if recognizer |symid| is invalid@>@;
  if (!current_es) {
      R_ERROR("no pim");
      return failure_indicator;
  }
  pim_sym_p = PIM_SYM_P_of_ES_by_SYMID(current_es, symid);
  /* The search returns |NULL| when there is no entry for |symid| */
  if (!pim_sym_p) return -1;
  pim = *pim_sym_p;
  if (!pim) return -1;
  r->t_trace_pim_sym_p = pim_sym_p;
  r->t_trace_postdot_item = pim;
  return symid;
}

@ @<Clear trace postdot item data@> =
r->t_trace_pim_sym_p = NULL;
r->t_trace_postdot_item = NULL;

@ Set trace postdot item to the first in the trace Earley set,
and return its postdot symbol ID.
If the trace Earley set has no postdot items, return -1 and
clear the trace postdot item.
On other failures, return -2 and clear the trace
postdot item.
@<Public function prototypes@> =
Marpa_Symbol_ID
marpa_first_postdot_item_trace (struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID
marpa_first_postdot_item_trace (struct marpa_r *r)
{
  @<Return |-2| on failure@>@;
  ES current_earley_set = r->t_trace_earley_set;
  PIM pim;
  PIM* pim_sym_p;
  @<Clear trace postdot item data@>@;
  @<Fail recognizer if not trace-safe@>@;
  if (!current_earley_set) {
      @<Clear trace Earley item data@>@;
      R_ERROR("no trace es");
      return failure_indicator;
  }
  if (current_earley_set->t_postdot_sym_count <= 0) return -1;
  pim_sym_p = current_earley_set->t_postdot_ary+0;
  pim = pim_sym_p[0];
  r->t_trace_pim_sym_p = pim_sym_p;
  r->t_trace_postdot_item = pim;
  return Postdot_SYMID_of_PIM(pim);
}

@ Set the trace postdot item to the one after
the current trace postdot item,
and return its postdot symbol ID.
If the current trace postdot item is the last,
return -1 and clear the trace postdot item.
On other failures, return -2 and clear the trace
postdot item.
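The first/next pair walks a two-level structure: an array with one slot per
postdot symbol, each slot heading a list of items for that symbol.
The traversal can be sketched on its own as follows.
This is an illustration under stated assumptions, not the library's code:
the |demo_| types and functions are hypothetical,
and every array slot is assumed to hold a non-empty list.

```c
#include <stddef.h>

/* Hypothetical, simplified postdot item: only what the traversal needs. */
struct demo_pim {
    int symid;
    struct demo_pim *next;      /* next item for the same symbol */
};

struct demo_cursor {
    struct demo_pim **ary;      /* one list head per postdot symbol */
    int sym_count;
    struct demo_pim **sym_p;    /* current slot in ary, or NULL */
    struct demo_pim *pim;       /* current item, or NULL */
};

/* Position on the first item; -1 if there are no postdot symbols. */
static int demo_first(struct demo_cursor *c)
{
    c->sym_p = NULL;
    c->pim = NULL;
    if (c->sym_count <= 0) return -1;
    c->sym_p = c->ary;
    c->pim = c->ary[0];
    return c->pim->symid;
}

/* Advance; -1 after the last item, -2 if there is no current item. */
static int demo_next(struct demo_cursor *c)
{
    struct demo_pim **sym_p = c->sym_p;
    struct demo_pim *pim = c->pim;
    c->sym_p = NULL;                 /* clear first, as the trace functions do */
    c->pim = NULL;
    if (!sym_p || !pim) return -2;
    pim = pim->next;
    if (!pim) {                      /* list exhausted: move to the next symbol */
        sym_p++;
        if (sym_p - c->ary >= c->sym_count) return -1;
        pim = *sym_p;
    }
    c->sym_p = sym_p;
    c->pim = pim;
    return pim->symid;
}
```

Because the cursor is cleared before any check is made, a call made after the
traversal has finished reports an error (|-2|) rather than repeating |-1|.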
@<Public function prototypes@> =
Marpa_Symbol_ID
marpa_next_postdot_item_trace (struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID
marpa_next_postdot_item_trace (struct marpa_r *r)
{
  const SYMID no_more_postdot_symbols = -1;
  @<Return |-2| on failure@>@;
  ES current_set = r->t_trace_earley_set;
  PIM pim;
  PIM* pim_sym_p;

  pim_sym_p = r->t_trace_pim_sym_p;
  pim = r->t_trace_postdot_item;
  @<Clear trace postdot item data@>@;
  if (!pim_sym_p || !pim) {
      R_ERROR("no trace pim");
      return failure_indicator;
  }
  @<Fail recognizer if not trace-safe@>@;
  if (!current_set) {
      R_ERROR("no trace es");
      return failure_indicator;
  }
  pim = Next_PIM_of_PIM(pim);
  if (!pim) { /* If no next postdot item for this symbol,
       then look at next symbol */
      pim_sym_p++;
      if (pim_sym_p - current_set->t_postdot_ary
          >= current_set->t_postdot_sym_count) {
          return no_more_postdot_symbols;
      }
      pim = *pim_sym_p;
  }
  r->t_trace_pim_sym_p = pim_sym_p;
  r->t_trace_postdot_item = pim;
  return Postdot_SYMID_of_PIM(pim);
}

@ @<Private function prototypes@> =
Marpa_AHFA_State_ID marpa_postdot_item_symbol(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_AHFA_State_ID marpa_postdot_item_symbol(struct marpa_r *r)
{
  @<Return |-2| on failure@>@;
  PIM postdot_item = r->t_trace_postdot_item;
  @<Fail recognizer if not trace-safe@>@;
  if (!postdot_item) {
      R_ERROR("no trace pim");
      return failure_indicator;
  }
  return Postdot_SYMID_of_PIM(postdot_item);
}


@** Source Objects.
These are distinguished by context.
@*0 The Relationship between Leo items and Ambiguity.
The relationship between Leo items and ambiguous sources bears
some explaining.
Leo sources must be unique, but only when their predecessor's
Earley set is considered.
That is, for every pairing of Earley item and Earley set,
there can be only one Leo source in that Earley item
with a predecessor in that Earley set.
But there may be other sources (both Leo and non-Leo),
as long as their predecessors
are in different Earley sets.
@ One way to look at these Leo ambiguities is as different
``factorings" of the Earley item.
Assume the last (or transition) symbol of an Earley item
is a token.
An Earley item will often have both a predecessor and a token,
and these can ``factor", or divide up, the distance between
an Earley item's origin and its current set in different ways.
@ The Earley item can have only one origin,
and only one transition symbol.
But that transition symbol does not have to start at the origin
and can start anywhere between the origin and the current
set of the Earley item.
For example, for an Earley item at earleme 14, with its origin at 10,
tokens may start at earlemes 10, 11, 12 and 13.
Each may have its own Leo source.
At those earlemes without a Leo source, there may be any number
of non-Leo sources.
@ In this way, an Earley item with a Leo source can be ambiguous.
The discussion above assumed the final symbol was a token.
The situation for completion Earley items is similar,
and these also can both have a Leo source and
be ambiguous.
@*0 Optimization.
There will be a lot of these structures in a long
parse, so space optimization is important.
I have some latitude in the number of linked lists
in an ambiguous source.
If an |int| is the same size as a |void*|,
then space for three |void*| in ambiguous sources
comes ``free".
If |void*| is $n$ bytes larger than an |int|,
then each unambiguous source uses $n$ bytes
more than it has to, although there are
compensating improvements in
speed and simplicity.
Any programmer trying to take advantage
of architectures where |int|
is shorter than |void*| will need to
assure herself that the space she saves in
the |s_ambiguous_source| struct was not simply wasted
by alignment within structures or during memory allocation.
@d Next_SRCL_of_SRCL(link) ((link)->t_next)
@d LV_Next_SRCL_of_SRCL(link) Next_SRCL_of_SRCL(link)
@ @<Private typedefs@> =
struct s_source;
typedef struct s_source* SRC;
@ @<Source object structure@>=
struct s_source {
    gpointer t_predecessor;
    union {
        gpointer t_completion;
        TOK t_token;
    } t_cause;
};

@ @<Private typedefs@> =
struct s_source_link;
typedef struct s_source_link* SRCL;
@ @<Source object structure@>=
struct s_source_link {
    SRCL t_next;
    struct s_source t_source;
};

@ @<Source object structure@>=
struct s_ambiguous_source {
    SRCL t_leo;
    SRCL t_token;
    SRCL t_completion;
};

@ @<Source object structure@>=
union u_source_container {
    struct s_ambiguous_source t_ambiguous;
    struct s_source t_unique;
};

@
@d Source_of_SRCL(link) ((link)->t_source)
@d Source_of_EIM(eim) ((eim)->t_container.t_unique)
@d Predecessor_of_Source(srcd) ((srcd).t_predecessor)
@d Predecessor_of_SRC(source) Predecessor_of_Source(*(source))
@d Predecessor_of_EIM(item) Predecessor_of_Source(Source_of_EIM(item))
@d Predecessor_of_SRCL(link) Predecessor_of_Source(Source_of_SRCL(link))
@d LV_Predecessor_of_SRCL(link) Predecessor_of_SRCL(link)
@d Cause_of_Source(srcd) ((srcd).t_cause.t_completion)
@d Cause_of_SRC(source) Cause_of_Source(*(source))
@d Cause_of_EIM(item) Cause_of_Source(Source_of_EIM(item))
@d Cause_of_SRCL(link) Cause_of_Source(Source_of_SRCL(link))
@d TOK_of_Source(srcd) ((srcd).t_cause.t_token)
@d TOK_of_SRC(source) TOK_of_Source(*(source))
@d TOK_of_EIM(eim) TOK_of_Source(Source_of_EIM(eim))
@d TOK_of_SRCL(link) TOK_of_Source(Source_of_SRCL(link))
@d SYMID_of_Source(srcd) SYMID_of_TOK(TOK_of_Source(srcd))
@d SYMID_of_SRC(source) SYMID_of_Source(*(source))
@d SYMID_of_EIM(eim) SYMID_of_Source(Source_of_EIM(eim))
@d SYMID_of_SRCL(link) SYMID_of_Source(Source_of_SRCL(link))

@ @d Cause_AHFA_State_ID_of_SRC(source)
    AHFAID_of_EIM((EIM)Cause_of_SRC(source))
@d Leo_Transition_SYMID_of_SRC(leo_source)
    Postdot_SYMID_of_LIM((LIM)Predecessor_of_SRC(leo_source))

@
@d First_Completion_Link_of_EIM(item) ((item)->t_container.t_ambiguous.t_completion)
@d LV_First_Completion_Link_of_EIM(item) First_Completion_Link_of_EIM(item)
@d First_Token_Link_of_EIM(item) ((item)->t_container.t_ambiguous.t_token)
@d LV_First_Token_Link_of_EIM(item) First_Token_Link_of_EIM(item)
@d First_Leo_SRCL_of_EIM(item) ((item)->t_container.t_ambiguous.t_leo)
@d LV_First_Leo_SRCL_of_EIM(item) First_Leo_SRCL_of_EIM(item)

@ @<Private function prototypes@> = static inline void
token_link_add (struct marpa_r *r,
                EIM item,
                EIM predecessor,
                TOK token);
@ @<Function definitions@> = static inline
void
token_link_add (struct marpa_r *r,
                EIM item,
                EIM predecessor,
                TOK token)
{
  SRCL new_link;
  guint previous_source_type = Source_Type_of_EIM (item);
  if (previous_source_type == NO_SOURCE)
    {
      Source_Type_of_EIM (item) = SOURCE_IS_TOKEN;
      item->t_container.t_unique.t_predecessor = predecessor;
      TOK_of_Source(item->t_container.t_unique) = token;
      return;
    }
  if (previous_source_type != SOURCE_IS_AMBIGUOUS)
    { // If the sourcing is not already ambiguous, make it so
      earley_item_ambiguate (r, item);
    }
  new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
  new_link->t_next = First_Token_Link_of_EIM (item);
  new_link->t_source.t_predecessor = predecessor;
  TOK_of_Source(new_link->t_source) = token;
  LV_First_Token_Link_of_EIM (item) = new_link;
}

@ @<Private function prototypes@> = static inline void
completion_link_add (struct marpa_r *r,
                EIM item,
                EIM predecessor,
                EIM cause);
@
Each possible cause
link is only visited once.
It may be paired with several different predecessors.
Each cause may complete several different LHS symbols
and Marpa::XS will seek predecessors for each at
the parent location.
Two different completed LHS symbols might be postdot
symbols for the same predecessor Earley item.
For this reason,
predecessor-cause pairs
might not be unique
within an Earley item.
@ Since a completion link consists entirely of
the predecessor-cause pair, this means duplicate
completion links are possible.
The maximum possible number of such duplicates is the
number of complete LHS symbols for the current AHFA state.
This is always a constant and typically a small one,
but it is also typically larger than 1.
@ This is not an issue for unambiguous parsing.
It {\bf is} an issue for iterating ambiguous parses.
The strategy currently taken is to do nothing about duplicates
in the recognition phase,
and to eliminate them in the evaluation phase.
Ultimately, duplicates must be eliminated by rule and
position -- eliminating duplicates by AHFA state is
{\bf not} sufficient.
Since I do not pull out the
individual rules and positions until the evaluation phase,
at this writing it seems to make sense to deal with
duplicates there.
@ As shown above, the number of duplicate completion links
is never more than $O(n)$ where $n$ is the number of Earley items.
For academic purposes, it
is probably possible to contrive a parse which generates
a lot of duplicates.
The actual numbers
I have encountered have always been very small,
even in grammars of only academic interest.
@ The carrying cost of the extra completion links can be safely
assumed to be very low,
in comparison with the cost of searching for them.
This means that the major consideration in deciding
where to eliminate duplicates
is time efficiency.
Duplicate completion links should be eliminated
at the point where that elimination can be accomplished
most efficiently.
@<Function definitions@> = static inline
void
completion_link_add (struct marpa_r *r,
                EIM item,
                EIM predecessor,
                EIM cause)
{
  SRCL new_link;
  guint previous_source_type = Source_Type_of_EIM (item);
  if (previous_source_type == NO_SOURCE)
    {
      Source_Type_of_EIM (item) = SOURCE_IS_COMPLETION;
      item->t_container.t_unique.t_predecessor = predecessor;
      Cause_of_Source(item->t_container.t_unique) = cause;
      return;
    }
  if (previous_source_type != SOURCE_IS_AMBIGUOUS)
    { // If the sourcing is not already ambiguous, make it so
      earley_item_ambiguate (r, item);
    }
  new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
  new_link->t_next = First_Completion_Link_of_EIM (item);
  new_link->t_source.t_predecessor = predecessor;
  Cause_of_Source(new_link->t_source) = cause;
  LV_First_Completion_Link_of_EIM (item) = new_link;
}

@ @<Function definitions@> = static inline
void
leo_link_add (struct marpa_r *r,
                EIM item,
                LIM predecessor,
                EIM cause)
{
  SRCL new_link;
  guint previous_source_type = Source_Type_of_EIM (item);
  if (previous_source_type == NO_SOURCE)
    {
      Source_Type_of_EIM (item) = SOURCE_IS_LEO;
      item->t_container.t_unique.t_predecessor = predecessor;
      Cause_of_Source(item->t_container.t_unique) = cause;
      return;
    }
  if (previous_source_type != SOURCE_IS_AMBIGUOUS)
    { // If the sourcing is not already ambiguous, make it so
      earley_item_ambiguate (r, item);
    }
  new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
  new_link->t_next = First_Leo_SRCL_of_EIM (item);
  new_link->t_source.t_predecessor = predecessor;
  Cause_of_Source(new_link->t_source) = cause;
  LV_First_Leo_SRCL_of_EIM(item) = new_link;
}
@ @<Private function prototypes@> = static inline void
leo_link_add (struct marpa_r *r,
                EIM item,
                LIM predecessor,
                EIM cause);

@ {\bf Convert an Earley item to an ambiguous one.}
|earley_item_ambiguate|
assumes it is called when there is exactly one source.
In other words, it assumes that the Earley item
is not unsourced,
and that it is not already ambiguous.
An ambiguous Earley item should have more than one source,
and
|earley_item_ambiguate|
assumes that a new source will be added as a follow-up.
@
Inlining |earley_item_ambiguate| might help in some
circumstances, but at this point
|earley_item_ambiguate| is not marked |inline|.
|earley_item_ambiguate|
is not short,
it is referenced in several places,
it is only called for ambiguous Earley items,
and even for these it is only called when the
Earley item first becomes ambiguous.
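The conversion is easiest to follow on a cut-down model.
The sketch below is an assumption-laden illustration, not the library's code:
the |demo_| types are hypothetical, only token and completion sources are modelled,
and |malloc| replaces the recognizer's obstack.
It shows the one delicate point, which is that the unique source and the list
heads share a union, so the unique source must be copied into a link
before the list heads are written.

```c
#include <stdlib.h>

enum demo_type { DEMO_NONE, DEMO_TOKEN, DEMO_COMPLETION, DEMO_AMBIGUOUS };

struct demo_source { void *predecessor; void *cause; };
struct demo_link { struct demo_link *next; struct demo_source source; };
struct demo_item {
    enum demo_type type;
    union {                          /* unique source OR list heads */
        struct demo_source unique;
        struct { struct demo_link *token; struct demo_link *completion; } ambiguous;
    } container;
};

/* Precondition: the item has exactly one source. */
static void demo_ambiguate(struct demo_item *item)
{
    struct demo_link *link = malloc(sizeof *link);
    enum demo_type previous = item->type;
    if (!link) abort();
    link->next = NULL;
    link->source = item->container.unique;   /* copy before the union is overwritten */
    item->type = DEMO_AMBIGUOUS;
    item->container.ambiguous.token = previous == DEMO_TOKEN ? link : NULL;
    item->container.ambiguous.completion = previous == DEMO_COMPLETION ? link : NULL;
}

/* Same shape as the link-adding functions above, for token sources only. */
static void demo_add_token(struct demo_item *item, void *predecessor, void *token)
{
    struct demo_link *link;
    if (item->type == DEMO_NONE) {           /* first source: store it in place */
        item->type = DEMO_TOKEN;
        item->container.unique.predecessor = predecessor;
        item->container.unique.cause = token;
        return;
    }
    if (item->type != DEMO_AMBIGUOUS) demo_ambiguate(item);
    link = malloc(sizeof *link);
    if (!link) abort();
    link->next = item->container.ambiguous.token;   /* push on the front */
    link->source.predecessor = predecessor;
    link->source.cause = token;
    item->container.ambiguous.token = link;
}
```

The sketch never frees its links; in the real code they live on an obstack
and are released with the recognizer.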
@<Function definitions@> = static
void earley_item_ambiguate (struct marpa_r * r, EIM item)
{
  guint previous_source_type = Source_Type_of_EIM (item);
  Source_Type_of_EIM (item) = SOURCE_IS_AMBIGUOUS;
  switch (previous_source_type)
    {
    case SOURCE_IS_TOKEN: @<Ambiguate token source@>@;
      return;
    case SOURCE_IS_COMPLETION: @<Ambiguate completion source@>@;
      return;
    case SOURCE_IS_LEO: @<Ambiguate Leo source@>@;
      return;
    }
}
@ @<Private function prototypes@> = static
void earley_item_ambiguate (struct marpa_r * r, EIM item);

@ @<Ambiguate token source@> = {
  SRCL new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
  new_link->t_next = NULL;
  new_link->t_source = item->t_container.t_unique;
  LV_First_Leo_SRCL_of_EIM (item) = NULL;
  LV_First_Completion_Link_of_EIM (item) = NULL;
  LV_First_Token_Link_of_EIM (item) = new_link;
}

@ @<Ambiguate completion source@> = {
  SRCL new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
  new_link->t_next = NULL;
  new_link->t_source = item->t_container.t_unique;
  LV_First_Leo_SRCL_of_EIM (item) = NULL;
  LV_First_Completion_Link_of_EIM (item) = new_link;
  LV_First_Token_Link_of_EIM (item) = NULL;
}

@ @<Ambiguate Leo source@> = {
  SRCL new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
  new_link->t_next = NULL;
  new_link->t_source = item->t_container.t_unique;
  LV_First_Leo_SRCL_of_EIM (item) = new_link;
  LV_First_Completion_Link_of_EIM (item) = NULL;
  LV_First_Token_Link_of_EIM (item) = NULL;
}

@*0 Trace Functions.
Many trace functions track a ``trace source link".
There is only one of these, shared among all types of
source link.
It is an error to call a trace function that is
inconsistent with the type of the current trace
source link.
@<Widely aligned recognizer elements@> =
SRC t_trace_source;
SRCL t_trace_next_source_link;
@ @<Bit aligned recognizer elements@> =
guint t_trace_source_type:3;
@ @<Initialize recognizer elements@> =
r->t_trace_source = NULL;
r->t_trace_next_source_link = NULL;
r->t_trace_source_type = NO_SOURCE;

@*1 Trace First Token Link.
@ Set the trace source link to a token link,
if there is one, otherwise clear the trace source link.
Returns the symbol ID if there was a token source link,
|-1| if there was none,
and |-2| on some other kind of failure.
@<Public function prototypes@> =
Marpa_Symbol_ID marpa_first_token_link_trace(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_first_token_link_trace(struct marpa_r *r)
{
   @<Return |-2| on failure@>@;
   SRC source;
   guint source_type;
   EIM item = r->t_trace_earley_item;
   @<Fail recognizer if not trace-safe@>@;
   @<Set |item|, failing if necessary@>@;
   source_type = Source_Type_of_EIM (item);
   switch (source_type)
     {
     case SOURCE_IS_TOKEN:
       r->t_trace_source_type = SOURCE_IS_TOKEN;
       source = &(item->t_container.t_unique);
       r->t_trace_source = source;
       r->t_trace_next_source_link = NULL;
       return SYMID_of_SRC (source);
     case SOURCE_IS_AMBIGUOUS:
       {
         SRCL full_link =
           First_Token_Link_of_EIM (item);
         if (full_link)
           {
             r->t_trace_source_type = SOURCE_IS_TOKEN;
             r->t_trace_next_source_link = Next_SRCL_of_SRCL (full_link);
             r->t_trace_source = &(full_link->t_source);
             return SYMID_of_SRCL (full_link);
           }
       }
     }
   trace_source_link_clear(r);
   return -1;
}

@*1 Trace Next Token Link.
@ Set the trace source link to the next token link,
if there is one.
Otherwise clear the trace source link.
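From the caller's side, the first/next pair forms a simple loop, and the
single shared cursor carries a type tag so that a mismatched ``next" call can be rejected.
The following sketch shows both points on a hypothetical miniature:
the |demo_| names are assumptions for illustration and model only the token list of one item.

```c
#include <stddef.h>

enum demo_kind { DEMO_NO_SOURCE, DEMO_IS_TOKEN, DEMO_IS_COMPLETION };

struct demo_srcl { struct demo_srcl *next; int id; };

struct demo_recce {
    struct demo_srcl *token_links;   /* head of the token link list */
    struct demo_srcl *next_link;     /* the one shared cursor */
    enum demo_kind kind;             /* what the cursor is tracing */
};

static void demo_clear(struct demo_recce *r)
{
    r->next_link = NULL;
    r->kind = DEMO_NO_SOURCE;
}

/* Returns the ID of the first token link, or -1 if there is none. */
static int demo_first_token_link(struct demo_recce *r)
{
    struct demo_srcl *link = r->token_links;
    if (!link) { demo_clear(r); return -1; }
    r->kind = DEMO_IS_TOKEN;
    r->next_link = link->next;
    return link->id;
}

/* Returns the next ID, -1 at the end, -2 if not tracing token links. */
static int demo_next_token_link(struct demo_recce *r)
{
    struct demo_srcl *link;
    if (r->kind != DEMO_IS_TOKEN) { demo_clear(r); return -2; }
    link = r->next_link;
    if (!link) { demo_clear(r); return -1; }
    r->next_link = link->next;
    return link->id;
}

/* Caller-side loop: count the token links, or report -2 on error. */
static int demo_count_token_links(struct demo_recce *r)
{
    int count = 0;
    int id;
    for (id = demo_first_token_link(r); id >= 0; id = demo_next_token_link(r))
        count++;
    return id == -1 ? count : -2;
}
```

The loop stops on any negative value and then checks which one it was,
which keeps the benign end-of-list case apart from real failures.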
@ Returns the symbol ID if there is
a next token source link,
|-1| if there was none,
and |-2| on some other kind of failure.
@<Public function prototypes@> =
Marpa_Symbol_ID marpa_next_token_link_trace(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_next_token_link_trace(struct marpa_r *r)
{
   @<Return |-2| on failure@>@;
   SRCL full_link;
   EIM item;
   @<Fail recognizer if not trace-safe@>@;
   @<Set |item|, failing if necessary@>@;
   if (r->t_trace_source_type != SOURCE_IS_TOKEN) {
       trace_source_link_clear(r);
       R_ERROR("not tracing token links");
       return failure_indicator;
   }
   if (!r->t_trace_next_source_link) {
       trace_source_link_clear(r);
       return -1;
   }
   full_link = r->t_trace_next_source_link;
   r->t_trace_next_source_link = Next_SRCL_of_SRCL (full_link);
   r->t_trace_source = &(full_link->t_source);
   return SYMID_of_SRCL (full_link);
}

@*1 Trace First Completion Link.
@ Set the trace source link to a completion link,
if there is one, otherwise clear the trace source link.
Returns the AHFA state ID of the cause
if there was a completion source link,
|-1| if there was none,
and |-2| on some other kind of failure.
@<Public function prototypes@> =
Marpa_Symbol_ID marpa_first_completion_link_trace(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_first_completion_link_trace(struct marpa_r *r)
{
   @<Return |-2| on failure@>@;
   SRC source;
   guint source_type;
   EIM item = r->t_trace_earley_item;
   @<Fail recognizer if not trace-safe@>@;
   @<Set |item|, failing if necessary@>@;
   switch ((source_type = Source_Type_of_EIM (item)))
     {
     case SOURCE_IS_COMPLETION:
       r->t_trace_source_type = SOURCE_IS_COMPLETION;
       source = &(item->t_container.t_unique);
       r->t_trace_source = source;
       r->t_trace_next_source_link = NULL;
       return Cause_AHFA_State_ID_of_SRC (source);
     case SOURCE_IS_AMBIGUOUS:
       {
         SRCL completion_link = First_Completion_Link_of_EIM (item);
         if (completion_link)
           {
             source = &(completion_link->t_source);
             r->t_trace_source_type = SOURCE_IS_COMPLETION;
             r->t_trace_next_source_link = Next_SRCL_of_SRCL (completion_link);
             r->t_trace_source = source;
             return Cause_AHFA_State_ID_of_SRC (source);
           }
       }
     }
   trace_source_link_clear(r);
   return -1;
}

@*1 Trace Next Completion Link.
@ Set the trace source link to the next completion link,
if there is one.
Otherwise clear the trace source link.
@ Returns the AHFA state ID of the cause if there is
a next completion source link,
|-1| if there was none,
and |-2| on some other kind of failure.
@<Public function prototypes@> =
Marpa_Symbol_ID marpa_next_completion_link_trace(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_next_completion_link_trace(struct marpa_r *r)
{
   @<Return |-2| on failure@>@;
   SRC source;
   SRCL completion_link;
   EIM item;
   @<Fail recognizer if not trace-safe@>@;
   @<Set |item|, failing if necessary@>@;
   if (r->t_trace_source_type != SOURCE_IS_COMPLETION) {
       trace_source_link_clear(r);
       R_ERROR("not tracing completion links");
       return failure_indicator;
   }
   if (!r->t_trace_next_source_link) {
       trace_source_link_clear(r);
       return -1;
   }
   completion_link = r->t_trace_next_source_link;
   r->t_trace_next_source_link = Next_SRCL_of_SRCL (r->t_trace_next_source_link);
   source = &(completion_link->t_source);
   r->t_trace_source = source;
   return Cause_AHFA_State_ID_of_SRC (source);
}

@*1 Trace First Leo Link.
@ Set the trace source link to a Leo link,
if there is one, otherwise clear the trace source link.
Returns the AHFA state ID of the cause
if there was a Leo source link,
|-1| if there was none,
and |-2| on some other kind of failure.
@<Public function prototypes@> =
Marpa_Symbol_ID marpa_first_leo_link_trace(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID
marpa_first_leo_link_trace (struct marpa_r *r)
{
  @<Return |-2| on failure@>@;
  SRC source;
  guint source_type;
  EIM item = r->t_trace_earley_item;
  @<Fail recognizer if not trace-safe@>@;
  @<Set |item|, failing if necessary@>@;
  switch ((source_type = Source_Type_of_EIM (item)))
    {
    case SOURCE_IS_LEO:
      r->t_trace_source_type = SOURCE_IS_LEO;
      source = &(item->t_container.t_unique);
      r->t_trace_source = source;
      r->t_trace_next_source_link = NULL;
      return Cause_AHFA_State_ID_of_SRC (source);
    case SOURCE_IS_AMBIGUOUS:
      {
        SRCL full_link =
          First_Leo_SRCL_of_EIM (item);
        if (full_link)
          {
            source = &(full_link->t_source);
            r->t_trace_source_type = SOURCE_IS_LEO;
            r->t_trace_next_source_link = (SRCL)
              Next_SRCL_of_SRCL (full_link);
            r->t_trace_source = source;
            return Cause_AHFA_State_ID_of_SRC (source);
          }
      }
    }
  trace_source_link_clear (r);
  return -1;
}

@*1 Trace Next Leo Link.
@ Set the trace source link to the next Leo link,
if there is one.
Otherwise clear the trace source link.
@ Returns the AHFA state ID of the cause if there is
a next Leo source link,
|-1| if there was none,
and |-2| on some other kind of failure.
@<Public function prototypes@> =
Marpa_Symbol_ID marpa_next_leo_link_trace(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID
marpa_next_leo_link_trace (struct marpa_r *r)
{
  @<Return |-2| on failure@>@/
  SRCL full_link;
  SRC source;
  EIM item;
  @<Fail recognizer if not trace-safe@>@/
  @<Set |item|, failing if necessary@>@/
  if (r->t_trace_source_type != SOURCE_IS_LEO)
    {
      trace_source_link_clear (r);
      R_ERROR("not tracing leo links");
      return failure_indicator;
    }
  if (!r->t_trace_next_source_link)
    {
      trace_source_link_clear (r);
      return -1;
    }
  full_link = r->t_trace_next_source_link;
  source = &(full_link->t_source);
  r->t_trace_source = source;
  r->t_trace_next_source_link =
    Next_SRCL_of_SRCL(r->t_trace_next_source_link);
  return Cause_AHFA_State_ID_of_SRC (source);
}

@ @<Set |item|, failing if necessary@> =
    item = r->t_trace_earley_item;
    if (!item) {
        trace_source_link_clear(r);
        R_ERROR("no eim");
        return failure_indicator;
    }

@*1 Clear Trace Source Link.
@ @<Private function prototypes@> =
static inline void trace_source_link_clear(struct marpa_r* r);
@ @<Function definitions@> =
static inline void trace_source_link_clear(struct marpa_r* r) {
    r->t_trace_next_source_link = NULL;
    r->t_trace_source = NULL;
    r->t_trace_source_type = NO_SOURCE;
}

@*1 Return the Predecessor AHFA State.
Returns the predecessor AHFA State,
or -1 if there is no predecessor.
If the recognizer is not trace-safe,
there is no trace source link,
the trace source link is a Leo source,
or there is some other failure,
|-2| is returned.
@<Public function prototypes@> =
Marpa_AHFA_State_ID marpa_source_predecessor_state(struct marpa_r *r);
@ @<Function definitions@> =
AHFAID marpa_source_predecessor_state(struct marpa_r *r)
{
   @<Return |-2| on failure@>@/
   guint source_type;
   SRC source;
   @<Fail recognizer if not trace-safe@>@/
   source_type = r->t_trace_source_type;
   @<Set source, failing if necessary@>@/
   switch (source_type)
     {
     case SOURCE_IS_TOKEN:
     case SOURCE_IS_COMPLETION: {
         EIM predecessor = Predecessor_of_SRC(source);
         if (!predecessor) return -1;
         return AHFAID_of_EIM(predecessor);
     }
     }
   R_ERROR(invalid_source_type_message(source_type));
   return failure_indicator;
}

@*1 Return the Token.
Returns the token.
The symbol ID is the return value,
and the token value is written to |*value_p|,
if |value_p| is non-null.
If the recognizer is not trace-safe,
there is no trace source link,
the trace source link is not a token source,
or there is some other failure,
|-2| is returned.
\par
There is no function to return just the token value
for two reasons.
First, since the token value can be anything,
an additional return value is needed to indicate errors,
which means the symbol ID comes at virtually zero cost.
Second, whenever the token value is
wanted, the symbol ID is almost always wanted as well.
@<Public function prototypes@> =
Marpa_Symbol_ID marpa_source_token(struct marpa_r *r, gpointer *value_p);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_source_token(struct marpa_r *r, gpointer *value_p)
{
   @<Return |-2| on failure@>@;
   guint source_type;
   SRC source;
   @<Fail recognizer if not trace-safe@>@;
   source_type = r->t_trace_source_type;
   @<Set source, failing if necessary@>@;
   if (source_type == SOURCE_IS_TOKEN) {
       const TOK token = TOK_of_SRC(source);
       if (value_p) *value_p = Value_of_TOK(token);
       return SYMID_of_TOK(token);
   }
   R_ERROR(invalid_source_type_message(source_type));
   return failure_indicator;
}

@*1 Return the Leo Transition Symbol.
The Leo transition symbol is defined only for sources
with a Leo predecessor.
The transition from a predecessor to the Earley item
containing a source will always be over exactly one symbol.
In the case of a Leo source, this symbol will be
the Leo transition symbol.
@ Returns the symbol ID of the Leo transition symbol.
If the recognizer is not trace-safe,
if there is no trace source link,
if the trace source link is not a Leo source,
or there is some other failure,
|-2| is returned.
@<Public function prototypes@> =
Marpa_Symbol_ID marpa_source_leo_transition_symbol(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_source_leo_transition_symbol(struct marpa_r *r)
{
   @<Return |-2| on failure@>@/
   guint source_type;
   SRC source;
   @<Fail recognizer if not trace-safe@>@/
   source_type = r->t_trace_source_type;
   @<Set source, failing if necessary@>@/
   switch (source_type)
     {
     case SOURCE_IS_LEO:
       return Leo_Transition_SYMID_of_SRC(source);
     }
   R_ERROR(invalid_source_type_message(source_type));
   return failure_indicator;
}

@*1 Return the Middle Earleme.
Every source has a ``middle earleme" defined.
Every source has
\li An origin (or start earleme).
\li An end earleme (the current set).
\li A ``middle earleme".
An Earley item can be thought of as covering a ``span"
from its origin to the current set.
For each source,
this span is divided into two pieces at the middle
earleme.
@ Informally, the middle earleme can be thought of as
dividing the span between the predecessor and either
the source's cause or its token.
If the source has no predecessor, the middle earleme
is the same as the origin.
If there is a predecessor, the middle earleme is
the current set of the predecessor.
If there is a cause, the middle earleme is always the same
as the origin of the cause.
If there is a token,
the middle earleme is always where the token starts.
@<Public function prototypes@> =
Marpa_Earley_Set_ID marpa_source_middle(struct marpa_r* r);
@ The ``predecessor set" is the earleme of the predecessor.
Returns |-1| if there is no predecessor.
If there are other failures, such as
there being no source link,
|-2| is returned.
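The informal definition can be checked with a small worked example.
The sketch below is illustrative only: |demo_eim| and the two helper functions
are hypothetical, and they follow the informal definition,
in which a missing predecessor puts the middle earleme at the origin.
(The public function reports that case as |-1| instead.)

```c
#include <stddef.h>

/* Hypothetical Earley item reduced to the two earlemes that matter here. */
struct demo_eim { int origin; int current; };

/* Middle earleme of a source of `item`: the predecessor's current
   earleme, or the item's own origin when there is no predecessor. */
static int demo_middle(const struct demo_eim *item,
                       const struct demo_eim *predecessor)
{
    return predecessor ? predecessor->current : item->origin;
}

/* Length of the part of the span covered by the cause or token. */
static int demo_cause_length(const struct demo_eim *item,
                             const struct demo_eim *predecessor)
{
    return item->current - demo_middle(item, predecessor);
}
```

For an item with origin 10 and current earleme 14, a predecessor ending at
earleme 12 gives a middle earleme of 12: the predecessor covers 10 to 12
and the cause or token covers 12 to 14.
With no predecessor, the cause or token covers the whole span from 10 to 14.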
@<Function definitions@> =
Marpa_Earley_Set_ID marpa_source_middle(struct marpa_r* r)
{
   @<Return |-2| on failure@>@/
   const EARLEME no_predecessor = -1;
   guint source_type;
   SRC source;
   @<Fail recognizer if not trace-safe@>@/
   source_type = r->t_trace_source_type;
   @<Set source, failing if necessary@>@/
   switch (source_type)
     {
     case SOURCE_IS_LEO:
       {
         LIM predecessor = Predecessor_of_SRC (source);
         if (!predecessor) return no_predecessor;
         return
           ES_Ord_of_EIM (Base_EIM_of_LIM (predecessor));
       }
     case SOURCE_IS_TOKEN:
     case SOURCE_IS_COMPLETION:
       {
         EIM predecessor = Predecessor_of_SRC (source);
         if (!predecessor) return no_predecessor;
         return ES_Ord_of_EIM (predecessor);
       }
     }
   R_ERROR(invalid_source_type_message (source_type));
   return failure_indicator;
}

@ @<Set source, failing if necessary@> =
    source = r->t_trace_source;
    if (!source) {
        R_ERROR("no trace source link");
        return failure_indicator;
    }

@** Token Code (TOK).
@ Tokens are duples of symbol ID and token value.
They do {\bf not} store location information,
so the same token
can occur many times in a parse.
On the other hand, duplicate tokens are also allowed.
How much, if any, trouble to take to avoid duplication
is up to the application --
duplicates have their cost, but so does the
tracking necessary to avoid them.
@ My strong preference is that token values
{\bf always} be integers, but
token values are |gpointer|'s to allow applications
full generality.
Using |glib|, integers can portably be stored in a
|gpointer|, but the reverse is not true.
@ In my preferred semantic scheme, the integers are
used by the higher levels to index the actual data.
In this way no direct pointer to any data ``owned''
by the higher level is ever under libmarpa's control.
Problems with mismatches between libmarpa and the
higher levels are almost impossible to avoid in
development
and once an application gets into maintenance mode
things become, if possible, worse.
@ ``But,'' you say, ``pointers are faster,
and mismatches occur whether
you index the data with an integer or directly.
So if you are in trouble either way, why not go
for speed?''
\par
The above objection is true, but overlooks a very
important issue. A bad pointer can cause very
serious problems --
a core dump, or even worse, undetected data corruption.
There is no good way to detect a bad pointer before it
does its damage.
\par
If an integer index, on the other hand, is out of bounds,
the higher levels can catch this and react.
Worst case, the higher level may have to throw a controlled
fatal error.
This is much better than a core dump
and far better than undetected data corruption.
@<Private incomplete structures@> =
struct s_token;
typedef struct s_token* TOK;
@ The |t_type| field is to allow |TOK|
objects to act as or-nodes.
@d Type_of_TOK(tok) ((tok)->t_type)
@d SYMID_of_TOK(tok) ((tok)->t_symbol_id)
@d Value_of_TOK(tok) ((tok)->t_value)
@<Private structures@> =
struct s_token {
    gint t_type;
    SYMID t_symbol_id;
    gpointer t_value;
};
typedef struct s_token TOK_Object;

@ An obstack dedicated to the tokens and an array
with default tokens for each symbol.
Currently,
the default tokens are used to provide
null values, since all non-tokens are given
values when read.
There is a special obstack for the tokens, to
separate the token stream from the rest of the recognizer
data.
Once the bocage is built, the token data is all that
it needs, and someday I may want to take advantage of
this fact by freeing up the rest of recognizer memory.
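The integer-index scheme argued for earlier in this chapter can be sketched from the
application's side.
This is a hypothetical illustration, not part of libmarpa:
the |demo_| names and the table of strings are invented for the example,
and |intptr_t| from standard C stands in for the |glib| conversion macros.
The round trip of a negative index through a pointer is assumed to work
as it does on mainstream platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-side table of semantic values. */
static const char *demo_values[] = { "zero", "one", "two" };
enum { DEMO_VALUE_COUNT = sizeof demo_values / sizeof demo_values[0] };

/* Pack an index into a pointer-sized token value, and unpack it again. */
static void *demo_index_to_value(intptr_t ix) { return (void *)ix; }
static intptr_t demo_value_to_index(void *value) { return (intptr_t)value; }

/* Returns the datum, or NULL if the index is out of bounds.
   The bounds check is what a raw pointer cannot offer. */
static const char *demo_lookup(void *token_value)
{
    intptr_t ix = demo_value_to_index(token_value);
    if (ix < 0 || ix >= DEMO_VALUE_COUNT) return NULL;
    return demo_values[ix];
}
```

An out-of-range value coming back from the parser is caught by the lookup,
and the application can then raise a controlled error of its own choosing.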
@d TOK_Obs_of_R(r) (&(r)->t_token_obs)
@d TOKs_by_SYMID_of_R(r) ((r)->t_tokens_by_symid)
@d TOK_Obs TOK_Obs_of_R(r)
@d TOK_by_ID_of_R(r, symbol_id) (TOKs_by_SYMID_of_R(r)[symbol_id])
@<Widely aligned recognizer elements@> =
struct obstack t_token_obs;
TOK *t_tokens_by_symid;
@ @<Initialize recognizer elements@> =
{
  gpointer default_value = Default_Value_of_G(g);
  gint i;
  TOK *tokens_by_symid;
  obstack_init (TOK_Obs);
  tokens_by_symid =
    obstack_alloc (TOK_Obs, sizeof (TOK) * symbol_count_of_g);
  for (i = 0; i < symbol_count_of_g; i++)
    {
      tokens_by_symid[i] = token_new (r, i, default_value);
    }
  TOKs_by_SYMID_of_R(r) = tokens_by_symid;
}
@ @<Destroy recognizer elements@> =
{
    TOK* tokens_by_symid = TOKs_by_SYMID_of_R(r);
    if (tokens_by_symid) {
        obstack_free(TOK_Obs, NULL);
        TOKs_by_SYMID_of_R(r) = NULL;
    }
}

@ @<Private function prototypes@> =
static inline
TOK token_new(struct marpa_r *r, SYMID symbol_id, gpointer value);
@ @<Function definitions@> =
static inline
TOK token_new(struct marpa_r *r, SYMID symbol_id, gpointer value)
{
  TOK token;
  token = obstack_alloc (TOK_Obs, sizeof(*token));
  Type_of_TOK(token) = TOKEN_OR_NODE;
  SYMID_of_TOK(token) = symbol_id;
  Value_of_TOK(token) = value;
  return token;
}

@ Recover |token| from the token obstack.
The intended use is to recover the one token
most recently added in case of an error.
@<Recover |token|@> = obstack_free (TOK_Obs, token);

@** Alternative Tokens (ALT) Code.
Because Marpa allows more than one token at every
earleme, Marpa's tokens are also called ``alternatives".
@<Private incomplete structures@> =
struct s_alternative;
typedef struct s_alternative* ALT;
typedef const struct s_alternative* ALT_Const;
@
@d TOK_of_ALT(alt) ((alt)->t_token)
@d SYMID_of_ALT(alt) SYMID_of_TOK(TOK_of_ALT(alt))
@d Start_ES_of_ALT(alt) ((alt)->t_start_earley_set)
@d Start_Earleme_of_ALT(alt) Earleme_of_ES(Start_ES_of_ALT(alt))
@d End_Earleme_of_ALT(alt) ((alt)->t_end_earleme)
@<Private structures@> =
struct s_alternative {
    TOK t_token;
    ES t_start_earley_set;
    EARLEME t_end_earleme;
};
typedef struct s_alternative ALT_Object;

@ @<Widely aligned recognizer elements@> =
DSTACK_DECLARE(t_alternatives);
@
{\bf To Do}: @^To Do@>
The value of |INITIAL_ALTERNATIVES_CAPACITY| is 1 for testing while this
code is being developed.
Once the code is stable it should be increased.
@d INITIAL_ALTERNATIVES_CAPACITY 1
@<Initialize recognizer elements@> =
DSTACK_INIT(r->t_alternatives, ALT_Object, INITIAL_ALTERNATIVES_CAPACITY);
@ @<Destroy recognizer elements@> = DSTACK_DESTROY(r->t_alternatives);

@ This function returns the index at which to insert a new
alternative, or -1 if the new alternative is a duplicate.
(Duplicate alternatives should not be inserted.)
@<Private function prototypes@> =
static inline gint alternative_insertion_point(RECCE r, ALT new_alternative);
@ A variation of binary search.
@<Function definitions@> =
static inline gint
alternative_insertion_point (RECCE r, ALT new_alternative)
{
  DSTACK alternatives = &r->t_alternatives;
  ALT alternative;
  gint hi = DSTACK_LENGTH(*alternatives) - 1;
  gint lo = 0;
  gint trial;
  // Special case when zero alternatives.
  if (hi < 0)
    return 0;
  alternative = DSTACK_BASE(*alternatives, ALT_Object);
  for (;;)
    {
      gint outcome;
      trial = lo + (hi - lo) / 2;
      outcome = alternative_cmp (new_alternative, alternative+trial);
      if (outcome == 0)
        return -1;
      if (outcome > 0)
        {
          lo = trial + 1;
        }
      else
        {
          hi = trial - 1;
        }
      if (hi < lo)
        return outcome > 0 ? trial + 1 : trial;
    }
}

@ This is the comparison function for sorting alternatives.
The alternatives array also acts as a stack, with the alternatives
ending at the lowest numbered earleme on top of the stack.
This allows alternatives to be popped off the stack as the
earlemes are processed in numerical order.
@<Private function prototypes@> =
static inline gint alternative_cmp(const ALT_Const a, const ALT_Const b);
@ So that the alternatives array can act as a stack,
the end earleme of the alternatives must be the major key,
and must sort in reverse order.
Of the remaining two keys,
the more minor key is the start earleme, because that way its slightly
costlier evaluation can sometimes be avoided.
@<Function definitions@> =
static inline gint alternative_cmp(const ALT_Const a, const ALT_Const b) {
     gint subkey = End_Earleme_of_ALT(b) - End_Earleme_of_ALT(a);
     if (subkey) return subkey;
     subkey = SYMID_of_ALT(a) - SYMID_of_ALT(b);
     if (subkey) return subkey;
     return Start_Earleme_of_ALT(a) - Start_Earleme_of_ALT(b);
}

@ This function pops an alternative from the stack, if it matches
the earleme argument.
If no alternative on the stack has its end earleme at the
earleme argument, |NULL| is returned.
The data pointed to by the return value may be overwritten when
new alternatives are added, so it must be used before the next
call that adds data to the alternatives stack.
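The sorted stack described in the preceding sections can be tried out in isolation.
What follows is a minimal, self-contained sketch, not the libmarpa code:
the names Alt, AltStack, alt_cmp, alt_insert and alt_pop are hypothetical,
a fixed-size array stands in for the DSTACK,
and plain integers stand in for the token and Earley set pointers.

```c
#include <stddef.h>

/* Hypothetical stand-in for ALT_Object: integers replace the
   TOK and ES pointers of the real structure. */
typedef struct { int end_earleme; int symbol_id; int start_earleme; } Alt;

#define ALT_CAPACITY 64
typedef struct { Alt items[ALT_CAPACITY]; int count; } AltStack;

/* The end earleme is the major key and sorts in reverse, so the
   alternative that ends soonest sits at the top (highest index). */
static int alt_cmp(const Alt *a, const Alt *b)
{
  int subkey = b->end_earleme - a->end_earleme;
  if (subkey) return subkey;
  subkey = a->symbol_id - b->symbol_id;
  if (subkey) return subkey;
  return a->start_earleme - b->start_earleme;
}

/* Binary search for the insertion point, then shift and insert.
   Returns the insertion index, -1 for a duplicate,
   or -2 if the fixed-size array of this sketch is full. */
static int alt_insert(AltStack *s, Alt new_alt)
{
  int lo = 0, hi = s->count - 1, ix;
  if (s->count >= ALT_CAPACITY) return -2;
  while (lo <= hi) {
    int trial = lo + (hi - lo) / 2;
    int outcome = alt_cmp(&new_alt, &s->items[trial]);
    if (outcome == 0) return -1;
    if (outcome > 0) lo = trial + 1; else hi = trial - 1;
  }
  /* lo is now the insertion point; make room for the new entry */
  for (ix = s->count; ix > lo; ix--) s->items[ix] = s->items[ix - 1];
  s->items[lo] = new_alt;
  s->count++;
  return lo;
}

/* Pop the top alternative if it ends at earleme, otherwise return NULL.
   The result is only valid until the next insertion. */
static const Alt *alt_pop(AltStack *s, int earleme)
{
  if (s->count <= 0) return NULL;
  if (s->items[s->count - 1].end_earleme != earleme) return NULL;
  return &s->items[--s->count];
}
```

Because the major key sorts in reverse, popping never has to search:
the only candidate for the current earleme is whatever is on top.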
@<Private function prototypes@> =
static inline ALT alternative_pop(RECCE r, EARLEME earleme);
@ @<Function definitions@> =
static inline ALT alternative_pop(RECCE r, EARLEME earleme)
{
    DSTACK alternatives = &r->t_alternatives;
    ALT top_of_stack = DSTACK_TOP(*alternatives, ALT_Object);
    if (!top_of_stack) return NULL;
    if (earleme != End_Earleme_of_ALT(top_of_stack)) return NULL;
    return DSTACK_POP(*alternatives, ALT_Object);
}

@ This function inserts an alternative into the stack,
in sorted order,
if the alternative is not a duplicate.
It returns -1 if the alternative is a duplicate,
and the insertion point (which must be zero or more) otherwise.
@<Private function prototypes@> =
static inline gint alternative_insert(RECCE r, ALT alternative);
@ @<Function definitions@> =
static inline gint alternative_insert(RECCE r, ALT new_alternative)
{
  ALT top_of_stack, base_of_stack;
  DSTACK alternatives = &r->t_alternatives;
  gint ix;
  gint insertion_point = alternative_insertion_point (r, new_alternative);
  if (insertion_point < 0)
    return insertion_point;
  top_of_stack = DSTACK_PUSH(*alternatives, ALT_Object); // may change base
  base_of_stack = DSTACK_BASE(*alternatives, ALT_Object); // base will not change after this
  for (ix = top_of_stack-base_of_stack; ix > insertion_point; ix--) {
       base_of_stack[ix] = base_of_stack[ix-1];
  }
  base_of_stack[insertion_point] = *new_alternative;
  return insertion_point;
}

@** Starting Recognizer Input.
@ @<Public function prototypes@> = gboolean marpa_start_input(struct marpa_r *r);
@ @<Function definitions@> = gboolean marpa_start_input(struct marpa_r *r)
{
    ES set0;
    EIM item;
    EIK_Object key;
    AHFA state;
  GRAMMAR_Const g = G_of_R(r);
  const gint symbol_count_of_g = SYM_Count_of_G(g);
    @<Return |FALSE| on failure@>@;
    @<Fail if recognizer not initial@>@;
    @<Allocate recognizer workareas@>@;
    psar_reset(Dot_PSAR_of_R(r));
    @<Allocate recognizer's bit vectors for symbols@>@;
    @<Initialize Earley item work stacks@>@;
    Phase_of_R(r) = input_phase;
    LV_Current_Earleme_of_R(r) = 0;
    set0 = earley_set_new(r, 0);
    LV_Latest_ES_of_R(r) = set0;
    LV_First_ES_of_R(r) = set0;
    state = AHFA_of_G_by_ID(g, 0);
    key.t_origin = set0;
    key.t_state = state;
    key.t_set = set0;
    item = earley_item_create(r, key);
    state = Empty_Transition_of_AHFA(state);
    if (state) {
        key.t_state = state;
        item = earley_item_create(r, key);
    }
    postdot_items_create(r, set0);
    earley_set_update_items(r, set0);
    r->t_is_using_leo = r->t_use_leo_flag;
    return TRUE;
}

@** Read a Token Alternative.
The ordinary semantics of a parser generator is a token-stream
semantics.
The input is a sequence of $n$ tokens.
Every token is of length 1.
The tokens fill the locations from 0 to $n-1$.
The first token goes into location 0,
the next into location 1,
and so on up to location $n-1$.
@ In Marpa terms, a token-stream
corresponds to reading exactly one token alternative at every location.
In Marpa, the input locations are also called earlemes.
@ Marpa allows other models of the input besides the token stream model.
Tokens may be ambiguous -- that is, more than one token may occur
at any location.
Tokens vary in length -- tokens may be of any length greater than
or equal to one.
This means tokens can span multiple earlemes.
As a consequence,
there may be no tokens at some earlemes.
@ |marpa_alternative|, by enforcing a limit on token length and on
the furthest location, indirectly enforces a limit on the
number of earley sets and the maximum earleme location.
If tokens ending at location $n$ cannot be scanned, then clearly
the parse can
never reach location $n$.
@ Whether token rejection is considered a failure is
a matter for the upper layers to define.
Retrying rejected tokens is one way to implement the
important ``Ruby Slippers" parsing technique.
On the other hand it is traditional,
and often quite reasonable,
to always treat rejection of a token as a fatal error.
@ Returns current earleme (which may be zero) on success.
If the token is rejected because it is not
expected, returns |-1|.
If the token is rejected as a duplicate,
returns |-3|.
On failure for other reasons, returns |-2|.
@ Rejection because a token is unexpected can be a common
occurrence in an application---%
an application may use this function to try out
various alternatives.
Rejection because a token is a duplicate is more likely to be
a hard failure, but it is possible that an application will
also see this as a normal data path.
The general failures reported with |-2| will typically be
treated by the application as fatal errors.
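The three kinds of negative return value call for different handling in the caller.
The sketch below shows one possible policy, under stated assumptions:
try_candidates, read_alternative_fn and demo_reader are hypothetical names,
the callback stands in for a call to |marpa_alternative| with the recognizer,
value and length already bound,
and treating a duplicate as harmless is a choice of this sketch, not a rule of the library.

```c
#include <stddef.h>

/* The return codes described in the text above. */
enum { ALT_UNEXPECTED = -1, ALT_FAILURE = -2, ALT_DUPLICATE = -3 };

/* Hypothetical callback type: reads one token alternative and returns
   the current earleme (zero or more) or one of the codes above. */
typedef int (*read_alternative_fn)(void *context, int token_id);

/* Try each candidate token ID in turn.  Returns the number of
   candidates accepted, or ALT_FAILURE on the first hard failure. */
static int try_candidates(read_alternative_fn read_alternative,
                          void *context,
                          const int *candidates, int candidate_count)
{
  int accepted = 0;
  int ix;
  for (ix = 0; ix < candidate_count; ix++) {
    int result = read_alternative(context, candidates[ix]);
    if (result >= 0) { accepted++; continue; }  /* success */
    if (result == ALT_UNEXPECTED) continue;     /* normal data path */
    if (result == ALT_DUPLICATE) continue;      /* harmless in this sketch */
    return ALT_FAILURE;                         /* anything else is fatal */
  }
  return accepted;
}

/* Stub reader, for demonstration only: accepts even token IDs,
   rejects odd ones as unexpected, fails hard on negative IDs. */
static int demo_reader(void *context, int token_id)
{
  (void)context;
  if (token_id < 0) return ALT_FAILURE;
  return (token_id % 2 == 0) ? 0 : ALT_UNEXPECTED;
}
```

An application that prefers the traditional policy would instead return
at the first |-1| and report a parse error.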
@<Public function prototypes@> = gboolean marpa_alternative(struct marpa_r *r,
Marpa_Symbol_ID token_id, gpointer value, gint length);
@ @<Function definitions@> =
gboolean marpa_alternative(struct marpa_r *r,
Marpa_Symbol_ID token_id, gpointer value, gint length) {
    @<Return |-2| on failure@>@;
    GRAMMAR_Const g = G_of_R(r);
    const gint duplicate_token_indicator = -3;
    const gint unexpected_token_indicator = -1;
    ES current_earley_set;
    const EARLEME current_earleme = Current_Earleme_of_R(r);
    EARLEME target_earleme;
    @<Fail if recognizer not in input phase@>@;
    @<Fail if recognizer exhausted@>@;
    @<|marpa_alternative| initial check for failure conditions@>@;
    @<Set |current_earley_set|, failing if token is unexpected@>@;
    @<Set |target_earleme| or fail@>@;
    @<Insert alternative into stack, failing if token is duplicate@>@;
    return current_earleme;
}

@ @<|marpa_alternative| initial check for failure conditions@> = {
    const SYM_Const token = SYM_by_ID(token_id);
    if (!SYM_is_Terminal(token)) {
        R_ERROR("token is not a terminal");
        return failure_indicator;
    }
    if (length <= 0) {
        R_ERROR("token length negative or zero");
        return failure_indicator;
    }
    if (length >= EARLEME_THRESHOLD) {
        R_ERROR("token too long");
        return failure_indicator;
    }
}

@ @<Set |target_earleme| or fail@> = {
    target_earleme = current_earleme + length;
    if (target_earleme >= EARLEME_THRESHOLD) {
        r_context_clear(r);
        r_context_int_add(r, "target_earleme", target_earleme);
        R_ERROR_CXT("parse too long");
        return failure_indicator;
    }
}

@ If no postdot item is found at the current Earley set for this
item, the token ID is unexpected, and |unexpected_token_indicator| is returned.
The application can treat this as a fatal error.
The application can also use this as a mechanism to test alternatives,
in which case, returning |unexpected_token_indicator| is a perfectly normal data path.
This last is part of an important technique:
``Ruby Slippers" parsing.
@<Set |current_earley_set|, failing if token is unexpected@> = {
    current_earley_set = Current_ES_of_R (r);
    if (!current_earley_set) return unexpected_token_indicator;
    if (!First_PIM_of_ES_by_SYMID (current_earley_set, token_id))
        return unexpected_token_indicator;
}

@ Insert an alternative into the alternatives stack,
detecting if we are attempting to add the same token twice.
Two tokens are considered the same if
\li they have the same token ID, and
\li they have the same length, and
\li they have the same origin.
Because $|origin|+|token_length| = |current_earleme|$,
two tokens at the same current earleme are the same if they
have the same token ID and origin.
By the same equation,
two tokens at the same current earleme are the same if they
have the same token ID and token length.
It is up to the higher layers to determine if rejection
of a duplicate token is a fatal error.
The Earley sets and items will not have been
altered by the attempt.
@<Insert alternative into stack, failing if token is duplicate@> =
{
  TOK token = token_new (r, token_id, value);
  ALT_Object alternative;
  if (Furthest_Earleme_of_R (r) < target_earleme)
    LV_Furthest_Earleme_of_R (r) = target_earleme;
  alternative.t_token = token;
  alternative.t_start_earley_set = current_earley_set;
  alternative.t_end_earleme = target_earleme;
  if (alternative_insert (r, &alternative) < 0)
    {
      @<Recover |token|@>@;
      return duplicate_token_indicator;
    }
}

@** Complete an Earley Set.
In the Aycock-Horspool variation of Earley's algorithm,
the two main phases are scanning and completion.
This section is devoted to the logic for completion.
@d Work_EIMs_of_R(r) DSTACK_BASE((r)->t_eim_work_stack, EIM)
@d Work_EIM_Count_of_R(r) DSTACK_LENGTH((r)->t_eim_work_stack)
@d WORK_EIMS_CLEAR(r) DSTACK_CLEAR((r)->t_eim_work_stack)
@d WORK_EIM_PUSH(r) DSTACK_PUSH((r)->t_eim_work_stack, EIM)
@<Widely aligned recognizer elements@> = DSTACK_DECLARE(t_eim_work_stack);
@ @<Initialize recognizer elements@> = DSTACK_SAFE(r->t_eim_work_stack);
@ @<Initialize Earley item work stacks@> =
    DSTACK_IS_INITIALIZED(r->t_eim_work_stack) ||
        DSTACK_INIT (r->t_eim_work_stack, EIM , 1024);
@ @<Destroy recognizer elements@> = DSTACK_DESTROY(r->t_eim_work_stack);

@ The completion stack is initialized to a very high-ball estimate of the
number of completions per Earley set.
It will grow if needed.
Large stacks may be needed for very ambiguous grammars.
@<Widely aligned recognizer elements@> = DSTACK_DECLARE(t_completion_stack);
@ @<Initialize recognizer elements@> = DSTACK_SAFE(r->t_completion_stack);
@ @<Initialize Earley item work stacks@> =
    DSTACK_IS_INITIALIZED(r->t_completion_stack) ||
        DSTACK_INIT (r->t_completion_stack, EIM , 1024);
@ @<Destroy recognizer elements@> = DSTACK_DESTROY(r->t_completion_stack);

@ The Earley set stack is declared here and marked as safe,
but it is not allocated at this point.
It is allocated by |r_update_earley_sets|, the first time that it is needed.
@<Widely aligned recognizer elements@> = DSTACK_DECLARE(t_earley_set_stack);
@ @<Initialize recognizer elements@> = DSTACK_SAFE(r->t_earley_set_stack);
@ @<Destroy recognizer elements@> = DSTACK_DESTROY(r->t_earley_set_stack);

@ This function returns the number of terminals expected on success.
On failure, it returns |-2|.
If the completion of the earleme left the parse exhausted, 0 is
returned.
@
While, if the completion of the earleme left the parse exhausted, 0 is
returned, the converse is not true if tokens may be longer than one earleme.
In those alternative input models, it is possible that no terminals are
expected at the current earleme, but other terminals might be expected
at later earlemes.
That means that the parse can be continued---%
it is not exhausted.
In those alternative input models,
if the distinction between zero terminals expected and an
exhausted parse is significant to the higher layers,
they must explicitly check the phase whenever this function
returns zero.
@<Public function prototypes@> =
Marpa_Earleme marpa_earleme_complete(struct marpa_r* r);
@ @<Function definitions@> =
Marpa_Earleme
marpa_earleme_complete(struct marpa_r* r)
{
  @<Return |-2| on failure@>@;
  EIM* cause_p;
  ES current_earley_set;
  EARLEME current_earleme;
  gint count_of_expected_terminals;
  @<Fail if recognizer not in input phase@>@;
  @<Fail if recognizer exhausted@>@;
  psar_dealloc(Dot_PSAR_of_R(r));
  bv_clear (r->t_bv_symid_is_expected);
  @<Initialize |current_earleme|@>@;
  @<Return 0 if no alternatives@>@;
  @<Initialize |current_earley_set|@>@;
  @<Scan from the alternative stack@>@;
  @<Pre-populate the completion stack@>@;
  while ((cause_p = DSTACK_POP(r->t_completion_stack, EIM))) {
      EIM cause = *cause_p;
      @<Add new Earley items for |cause|@>@;
  }
  postdot_items_create(r, current_earley_set);

  count_of_expected_terminals = bv_count (r->t_bv_symid_is_expected);
  if (count_of_expected_terminals <= 0
      && Earleme_of_ES (current_earley_set) >= Furthest_Earleme_of_R (r))
    { /* If no terminals are expected, and there are no Earley items in
           uncompleted Earley sets, we can make no further progress.
           The parse is ``exhausted".
       */
      LV_R_is_Exhausted(r) = 1;
    }
  earley_set_update_items(r, current_earley_set);
  return count_of_expected_terminals;
}

@ @<Initialize |current_earleme|@> = {
  current_earleme = ++(LV_Current_Earleme_of_R(r));
  if (current_earleme > Furthest_Earleme_of_R (r))
    {
        LV_R_is_Exhausted(r) = 1;
        R_ERROR("parse exhausted");
        return failure_indicator;
     }
}

@ Create a new Earley set. We know that it does not
exist.
@<Initialize |current_earley_set|@> = {
    current_earley_set = earley_set_new (r, current_earleme);
    LV_Next_ES_of_ES(Latest_ES_of_R(r)) = current_earley_set;
    LV_Latest_ES_of_R(r) = current_earley_set;
}

@ If there are no alternatives for this earleme
return 0 without creating an
Earley set.
The return value of 0 indicates that there are no terminals
which will be accepted at this earleme.
In the default (token stream) model of input,
this means that the parse is exhausted.
@<Return 0 if no alternatives@> = {
  ALT top_of_stack = DSTACK_TOP(r->t_alternatives, ALT_Object);
  if (!top_of_stack) return 0;
  if (current_earleme != End_Earleme_of_ALT(top_of_stack)) return 0;
}

@ @<Scan from the alternative stack@> =
{
  ALT alternative;
  while ((alternative = alternative_pop (r, current_earleme)))
    @<Scan an Earley item from alternative@>@;
}

@ @<Scan an Earley item from alternative@> =
{
  ES start_earley_set = Start_ES_of_ALT (alternative);
  TOK token = TOK_of_ALT (alternative);
  SYMID token_id = SYMID_of_TOK(token);
  PIM pim = First_PIM_of_ES_by_SYMID (start_earley_set, token_id);
  for ( ; pim ; pim = Next_PIM_of_PIM (pim)) {
      AHFA scanned_AHFA, prediction_AHFA;
      EIM scanned_earley_item;
      EIM predecessor = EIM_of_PIM (pim);
      if (!predecessor)
        continue; // Ignore Leo items when scanning
      scanned_AHFA = To_AHFA_of_EIM_by_SYMID (predecessor, token_id);
      scanned_earley_item = earley_item_assign (r,
                                                current_earley_set,
                                                Origin_of_EIM (predecessor),
                                                scanned_AHFA);
      token_link_add (r, scanned_earley_item, predecessor, token);
      prediction_AHFA = Empty_Transition_of_AHFA (scanned_AHFA);
      if (!prediction_AHFA) continue;
      scanned_earley_item = earley_item_assign (r,
                                                    current_earley_set,
                                                    current_earley_set,
                                                    prediction_AHFA);
    }
}

@ @<Pre-populate the completion stack@> = {
    EIM* work_earley_items = DSTACK_BASE (r->t_eim_work_stack, EIM );
    gint no_of_work_earley_items = DSTACK_LENGTH (r->t_eim_work_stack );
    gint ix;
    DSTACK_CLEAR(r->t_completion_stack);
    for (ix = 0;
         ix < no_of_work_earley_items;
         ix++) {
        EIM earley_item = work_earley_items[ix];
        EIM* tos;
        if (!Earley_Item_is_Completion (earley_item))
          continue;
        tos = DSTACK_PUSH (r->t_completion_stack, EIM);
        *tos = earley_item;
      }
    }

@ For the current
completion cause,
add those Earley items it ``causes".
@<Add new Earley items for |cause|@> =
{
  Marpa_Symbol_ID *complete_symbols = Complete_SYMIDs_of_EIM (cause);
  gint count = Complete_SYM_Count_of_EIM (cause);
  ES middle = Origin_of_EIM (cause);
  gint symbol_ix;
  for (symbol_ix = 0; symbol_ix < count; symbol_ix++)
    {
      Marpa_Symbol_ID complete_symbol = complete_symbols[symbol_ix];
      @<Add new Earley items for |complete_symbol| and |cause|@>@;
    }
}

@ @<Add new Earley items for |complete_symbol| and |cause|@> =
{
  PIM postdot_item;
  for (postdot_item = First_PIM_of_ES_by_SYMID (middle, complete_symbol);
       postdot_item; postdot_item = Next_PIM_of_PIM (postdot_item))
    {
      EIM predecessor = EIM_of_PIM (postdot_item);
      EIM effect;
      AHFA effect_AHFA_state;
      if (predecessor)
        { /* Not a Leo item */
          @<Add effect, plus any prediction, for non-Leo predecessor@>@;
        }
      else
        { /* A Leo item */
          @<Add effect of Leo item@>@;
          break; /* When I encounter a Leo item,
                    I skip everything else for this postdot
                    symbol */
        }
    }
}

@ @<Add effect, plus any prediction, for non-Leo predecessor@> =
{
   ES origin = Origin_of_EIM(predecessor);
   effect_AHFA_state = To_AHFA_of_EIM_by_SYMID(predecessor, complete_symbol);
   effect = earley_item_assign(r, current_earley_set,
                               origin, effect_AHFA_state);
   if (Earley_Item_has_No_Source(effect)) {
       /* If it has no source, then it is new */
       if (Earley_Item_is_Completion(effect)) {
           @<Push effect onto completion stack@>@;
       }
       @<Add Earley item predicted by completion, if there is one@>@;
   }
   completion_link_add(r, effect, predecessor, cause);
}

@ @<Push effect onto completion stack@> = {
    EIM* tos = DSTACK_PUSH (r->t_completion_stack, EIM);
    *tos = effect;
}



@ @<Add Earley item predicted by completion, if
there is one@> = {
  AHFA prediction_AHFA_state =
    Empty_Transition_of_AHFA (effect_AHFA_state);
  if (prediction_AHFA_state)
    {
      earley_item_assign (r, current_earley_set, current_earley_set,
                          prediction_AHFA_state);
    }
}

@ @<Add effect of Leo item@> = {
    LIM leo_item = LIM_of_PIM (postdot_item);
    ES origin = Origin_of_LIM (leo_item);
    effect_AHFA_state = Top_AHFA_of_LIM (leo_item);
    effect = earley_item_assign (r, current_earley_set,
                                 origin, effect_AHFA_state);
    if (Earley_Item_has_No_Source (effect))
      {
        /* If it has no source, then it is new */
        @<Push effect onto completion stack@>@;
      }
    leo_link_add (r, effect, leo_item, cause);
}

@ @<Private function prototypes@> =
static inline void earley_set_update_items(RECCE r, ES set);
@ @<Function definitions@> =
static inline void earley_set_update_items(RECCE r, ES set) {
    EIM* working_earley_items;
    EIM* finished_earley_items;
    gint working_earley_item_count;
    gint i;
    if (!EIMs_of_ES(set)) {
        EIMs_of_ES(set) = g_new(EIM, EIM_Count_of_ES(set));
    } else {
        EIMs_of_ES(set) = g_renew(EIM, EIMs_of_ES(set), EIM_Count_of_ES(set));
    }
    finished_earley_items = EIMs_of_ES(set);
    working_earley_items = Work_EIMs_of_R(r);
    working_earley_item_count = Work_EIM_Count_of_R(r);
    for (i = 0; i < working_earley_item_count; i++) {
         EIM earley_item = working_earley_items[i];
         gint ordinal = Ord_of_EIM(earley_item);
         finished_earley_items[ordinal] = earley_item;
    }
    WORK_EIMS_CLEAR(r);
}

@ @<Private function prototypes@> =
static inline void r_update_earley_sets(RECCE r);
@ @d P_ES_of_R_by_Ord(r, ord) DSTACK_INDEX((r)->t_earley_set_stack, ES, (ord))
@d ES_of_R_by_Ord(r, ord) (*P_ES_of_R_by_Ord((r), (ord)))
@<Function definitions@> =
static inline void r_update_earley_sets(RECCE r) {
    ES set;
    ES
    first_unstacked_earley_set;
    if (!DSTACK_IS_INITIALIZED(r->t_earley_set_stack)) {
        first_unstacked_earley_set = First_ES_of_R(r);
        DSTACK_INIT (r->t_earley_set_stack, ES,
                 MAX (1024, ES_Count_of_R(r)));
    } else {
         ES* top_of_stack = DSTACK_TOP(r->t_earley_set_stack, ES);
         first_unstacked_earley_set = Next_ES_of_ES(*top_of_stack);
    }
    for (set = first_unstacked_earley_set; set; set = Next_ES_of_ES(set)) {
          ES* top_of_stack = DSTACK_PUSH(r->t_earley_set_stack, ES);
          (*top_of_stack) = set;
    }
}

@** Create the Postdot Items.
@ This function inserts regular (non-Leo) postdot items into
the postdot list.
It is assumed that the caller has ensured this is not a duplicate.
@<Private function prototypes@> =
static void
postdot_items_create (struct marpa_r *r, ES set);
@ Not inlined, because of its size, and because it is used
twice -- once in initializing the Earley set 0,
and once for completing later Earley sets.
Earley set 0 is very much a special case, and it
might be a good idea to have
separate code to handle it,
in which case both could be inlined.
@ Leo items are not created for Earley set 0.
They are always optional, and add little at that point.
In that way I can avoid dealing with empty productions in
the Leo logic.
Empty productions only occur in dealing with the null parse,
and only in Earley set 0.
@<Function definitions@> =
static void
postdot_items_create (struct marpa_r *r, ES current_earley_set)
{
    gpointer * const pim_workarea = r->t_sym_workarea;
  GRAMMAR_Const g = G_of_R(r);
    EARLEME current_earley_set_id = Earleme_of_ES(current_earley_set);
    Bit_Vector bv_pim_symbols = r->t_bv_sym;
    Bit_Vector bv_lim_symbols = r->t_bv_sym2;
    bv_clear (bv_pim_symbols);
    bv_clear (bv_lim_symbols);
    @<Start EIXes in PIM workarea@>@;
    if (r->t_is_using_leo) {
        @<Start LIMs in PIM workarea@>@;
        @<Add predecessors to LIMs@>@;
    }
    @<Copy PIM workarea to postdot item array@>@;
    bv_and(r->t_bv_symid_is_expected, bv_pim_symbols, g->t_bv_symid_is_terminal);
}

@ This code creates the Earley indexes in the PIM workarea.
At this point there are no Leo items.
@<Start EIXes in PIM workarea@> = {
    EIM* work_earley_items = DSTACK_BASE (r->t_eim_work_stack, EIM );
    gint no_of_work_earley_items = DSTACK_LENGTH (r->t_eim_work_stack );
    gint ix;
    for (ix = 0;
         ix < no_of_work_earley_items;
         ix++) {
        EIM earley_item = work_earley_items[ix];
      AHFA state = AHFA_of_EIM (earley_item);
      gint symbol_ix;
      gint postdot_symbol_count = Postdot_SYM_Count_of_AHFA (state);
      Marpa_Symbol_ID *postdot_symbols =
        Postdot_SYMID_Ary_of_AHFA (state);
      for (symbol_ix = 0; symbol_ix < postdot_symbol_count; symbol_ix++)
        {
          PIM old_pim = NULL;
          PIM new_pim;
          Marpa_Symbol_ID symid;
          new_pim = obstack_alloc (&r->t_obs, sizeof (EIX_Object));
          symid = postdot_symbols[symbol_ix];
          LV_Postdot_SYMID_of_PIM(new_pim) = symid;
          LV_EIM_of_PIM(new_pim) = earley_item;
          if (bv_bit_test(bv_pim_symbols, (guint)symid))
              old_pim = pim_workarea[symid];
          if (old_pim) {
            LV_Next_PIM_of_PIM(new_pim) = old_pim;
          } else {
            LV_Next_PIM_of_PIM(new_pim) = NULL;
            current_earley_set->t_postdot_sym_count++;
          }
          pim_workarea[symid] =
            new_pim;
          bv_bit_set(bv_pim_symbols, (guint)symid);
        }
    }
}

@ This code creates the Leo items in the PIM workarea.
The Leo items do not contain predecessors or have the
predecessor-dependent information set at this point.
@ The origin and predecessor will be filled in later,
when the predecessor is known.
In the meantime the origin is set to |NULL|,
and that is used as an indicator that the fields
of this
Leo item have not been fully populated.
@d LIM_is_Populated(leo) (Origin_of_LIM(leo) != NULL)
@<Start LIMs in PIM workarea@> =
{
  guint min, max, start;
  for (start = 0; bv_scan (bv_pim_symbols, start, &min, &max);
       start = max + 2)
    {
      SYMID symid;
      for (symid = (SYMID) min; symid <= (SYMID) max; symid++)
        {
          PIM this_pim = pim_workarea[symid];
          if (!Next_PIM_of_PIM (this_pim))
            { /* Only create a Leo item if there is
                 exactly one EIX */
              EIM leo_base = EIM_of_PIM (this_pim);
              AHFA base_to_ahfa = To_AHFA_of_EIM_by_SYMID (leo_base, symid);
              if (AHFA_is_Leo_Completion (base_to_ahfa))
                {
                  @<Create a new, unpopulated, LIM@>@;
                }
            }
        }
    }
}

@ The Top AHFA of the new LIM is temporarily used
to memoize
the value of the AHFA to-state for the LIM's
base EIM.
That may become its actual value,
once it is populated.
@<Create a new, unpopulated, LIM@> = {
    LIM new_lim;
    new_lim = obstack_alloc(&r->t_obs, sizeof(*new_lim));
    Postdot_SYMID_of_LIM(new_lim) = symid;
    LV_EIM_of_PIM(new_lim) = NULL;
    LV_Predecessor_LIM_of_LIM(new_lim) = NULL;
    LV_Origin_of_LIM(new_lim) = NULL;
    LV_Chain_Length_of_LIM(new_lim) = -1;
    LV_Top_AHFA_of_LIM(new_lim) = base_to_ahfa;
    LV_Base_EIM_of_LIM(new_lim) = leo_base;
    LV_ES_of_LIM(new_lim) = current_earley_set;
    LV_Next_PIM_of_LIM(new_lim) = this_pim;
    pim_workarea[symid] = new_lim;
    bv_bit_set(bv_lim_symbols, (guint)symid);
}

@ This code fully populates the data in the LIMs.
It determines the Leo predecessors of the LIMs, if any,
then populates that datum and the predecessor-dependent
data.
@ The algorithm is fast, if not a model of simplicity.
The LIMs are processed in an outer loop in order by
symbol ID, as well as in an inner loop which processes
predecessor chains from bottom to top.
It is very much possible that the
same LIM will be encountered twice,
once in each loop.
The code always checks to see if a LIM is
already populated,
before populating it.
@ The outer loop ensures that all LIMs are eventually
populated. It uses the PIM workarea, guided by
a bit vector which indicates the LIM's.
@ It is possible for a LIM to be encountered which may have a predecessor,
but which cannot be immediately populated.
This is because predecessors link the LIMs in chains, and such chains
must be populated in order.
Any ``links" in the chain of LIMs which are in previous Earley sets
will already be populated.
But a chain of LIMs may all be in the current Earley set, the
one we are currently processing.
In this case, there is a chicken-and-egg issue, which is
resolved by arranging those LIMs in chain link order,
and processing them in that order.
This is the business of the inner loop.
@ When a LIM is encountered which cannot be populated immediately,
its chain is followed and copied into |lim_chain|, which is in
effect a stack.  The chain ends when it reaches
a LIM which can be populated immediately.
@ A special case is when the LIM chain cycles back to the LIM
which started the chain.
When this happens, the LIM chain is terminated.
The bottom of such a chain
(which, since it is a cycle, is also the top)
is populated with a predecessor of
|NULL| and appropriate predecessor-dependent data.
@ {\bf Theorem}: The number of links
in a LIM chain is never more than the number
of symbols in the grammar.
{\bf Proof}: A LIM chain consists of the predecessors of LIMs,
all of which are in the same Earley set.
A LIM is uniquely determined by a duple of Earley set and transition symbol.
This means, in a single Earley set, there is at most one LIM per symbol.
{\bf QED}.
@ {\bf Complexity}: Time complexity is $O(n)$, where $n$ is the number
of LIMs.  This can be shown as follows:
\li The outer loop processes each LIM exactly once.
\li A LIM is never put onto a LIM chain if it is already populated.
\li A LIM is never taken off a LIM chain without being populated.
\li Based on the previous two observations, we know that a LIM will
be put onto a LIM chain at most once.
\li Ignoring the inner loop processing, the amount of processing done for each
LIM in the outer loop is $O(1)$.
\li The amount of processing done for each LIM
in the inner loop is $O(1)$.
\li Total processing for all $n$ LIMs is therefore $n(O(1)+O(1))=O(n)$.
@ The |bv_ok_for_chain| is a vector of bits by symbol ID.
A bit is set if there is a LIM for that symbol ID that is OK for addition
to the LIM chain.
To be OK for addition to the LIM chain, the postdot item for the symbol
ID must
\li actually be a Leo item (LIM);
\li not have been populated; and
\li not have already been added to a LIM chain for this
Earley set.\par
@<Add predecessors to LIMs@> = {
  const Bit_Vector bv_ok_for_chain = r->t_bv_sym3;
  guint min, max, start;

  bv_copy(bv_ok_for_chain, bv_lim_symbols);
  for (start = 0; bv_scan (bv_lim_symbols, start, &min, &max);
       start = max + 2)
    { /* This is the outer loop.  It loops over the symbol IDs,
          visiting only the symbols with LIMs. */
      SYMID main_loop_symbol_id;
      for (main_loop_symbol_id = (SYMID) min;
          main_loop_symbol_id <= (SYMID) max;
          main_loop_symbol_id++)
        {
          LIM predecessor_lim;
          LIM lim_to_process = pim_workarea[main_loop_symbol_id];
          if (LIM_is_Populated(lim_to_process)) continue; /* LIM may
              have already been populated in the LIM chain loop */
          @<Find predecessor LIM of unpopulated LIM@>@;
          if (predecessor_lim && LIM_is_Populated(predecessor_lim)) {
            @<Populate |lim_to_process| from |predecessor_lim|@>@;
            continue;
          }
          if (!predecessor_lim) { /* If there is no predecessor LIM to
             populate, we know that we should populate from the base
             Earley item */
            @<Populate |lim_to_process| from its base Earley item@>@;
            continue;
          }
          @<Create and populate a LIM chain@>@;
        }
    }
}

@ Find the predecessor LIM from the PIM workarea.
If the predecessor
starts at the current Earley set, I need to look in
the PIM workarea.
Otherwise the PIM item array by symbol is already
set up and I can find it there.
@ The LHS of the completed rule and of the applicable rule
in the base item will be the same, because the two rules
are the same.
Given the |main_loop_symbol_id| we can look up either the
appropriate rule in the base Earley item's AHFA state,
or the Leo completion's AHFA state.
It is most convenient to find the LHS of the completed
rule as the
only possible Leo LHS of the Leo completion's AHFA state.
The AHFA state for the Leo completion is guaranteed
to have only one rule.
The base Earley item's AHFA state can have multiple
rules, and in its list of rules there can
be transitions to Leo
completions via several different symbols.
@ This code only works for unpopulated LIMs,
because it relies on the Top AHFA value containing
the base AHFA to-state.
In a populated LIM, this will not necessarily be the case.
@<Find predecessor LIM of unpopulated LIM@> = {
    const EIM base_eim = Base_EIM_of_LIM(lim_to_process);
    const ES predecessor_set = Origin_of_EIM(base_eim);
    const AHFA base_to_ahfa = Top_AHFA_of_LIM(lim_to_process);
    const SYMID predecessor_transition_symbol = Leo_LHS_ID_of_AHFA(base_to_ahfa);
    PIM predecessor_pim;
    if (Earleme_of_ES(predecessor_set) < current_earley_set_id) {
        predecessor_pim
        = First_PIM_of_ES_by_SYMID (predecessor_set, predecessor_transition_symbol);
    } else {
        predecessor_pim = pim_workarea[predecessor_transition_symbol];
    }
    predecessor_lim = PIM_is_LIM(predecessor_pim) ? LIM_of_PIM(predecessor_pim) : NULL;
}

@ @<Create and populate a LIM chain@> = {
  gpointer* const lim_chain = r->t_workarea2;
  gint lim_chain_ix;
  @<Create a LIM chain@>@;
  @<Populate the LIMs in the LIM chain@>@;
}

@ At this point we know that
\li |lim_to_process != NULL|
\li |lim_to_process| is not populated
\li |predecessor_lim != NULL|
\li |predecessor_lim| is not populated
@ Cycles can occur in the LIM chain.  They are broken by refusing to
put the same LIM on a LIM chain twice.
Since LIM chain links are one-to-one,
ensuring that the LIM on the bottom of the chain is never added to the LIM
chain a second time is enough to enforce this.
@ When I am about to add a LIM twice to the LIM chain, instead I break the
chain at that point.  The top of the chain will then have no LIM predecessor,
instead of being part of a cycle.  Since the LIM information is always optional,
and in that case would be useless, breaking the chain in this way causes no
problems.
@<Create a LIM chain@> = {
  SYMID postdot_symid_of_lim_to_process
    = Postdot_SYMID_of_LIM(lim_to_process);
  lim_chain_ix = 0;
  lim_chain[lim_chain_ix++] = LIM_of_PIM(lim_to_process);
  bv_bit_clear(bv_ok_for_chain, (guint)postdot_symid_of_lim_to_process);
  /* Make sure this LIM
     is not added to a LIM chain again for this Earley set */ @#
  while (1) {
    lim_to_process = predecessor_lim; /* I know at this point that
      |predecessor_lim| is unpopulated, so I also know that
      |lim_to_process| is unpopulated.  This means I also know that
      |lim_to_process| is in the current Earley set, because all LIMs
      in previous Earley sets are already
      populated.  */ @#

    postdot_symid_of_lim_to_process = Postdot_SYMID_of_LIM(lim_to_process);
    if (!bv_bit_test(bv_ok_for_chain, (guint)postdot_symid_of_lim_to_process)) {
      /* If I am about to add a previously added LIM to the LIM chain, I
         break the LIM chain at this point.
         The predecessor LIM has not yet been changed,
         so that it is still appropriate for
         the LIM at the top of the chain.
      */
      break;
    }

    @<Find predecessor LIM of unpopulated LIM@>@;

    lim_chain[lim_chain_ix++] = LIM_of_PIM(lim_to_process); /*
      |lim_to_process| is not populated, as shown above */

    bv_bit_clear(bv_ok_for_chain, (guint)postdot_symid_of_lim_to_process);
    /* Make sure this LIM
       is not added to a LIM chain again for this Earley set */ @#

    if (!predecessor_lim) break; /* |predecessor_lim = NULL|,
      so that we are forced to break the LIM chain before it */ @#

    if (LIM_is_Populated(predecessor_lim)) break;
    /* |predecessor_lim| is populated, so that if we
       break before |predecessor_lim|, we are ready to populate the entire LIM
       chain. */
  }
}

@ @<Populate the LIMs in the LIM chain@> =
for (lim_chain_ix--; lim_chain_ix >= 0; lim_chain_ix--) {
  lim_to_process = lim_chain[lim_chain_ix];
  if (predecessor_lim && LIM_is_Populated(predecessor_lim)) {
    @<Populate |lim_to_process| from |predecessor_lim|@>@;
  } else {
    @<Populate |lim_to_process| from its base Earley item@>@;
  }
  predecessor_lim = lim_to_process;
}

@ @<Populate |lim_to_process| from |predecessor_lim|@> = {
LV_Predecessor_LIM_of_LIM(lim_to_process) = predecessor_lim;
LV_Origin_of_LIM(lim_to_process) = Origin_of_LIM(predecessor_lim);
LV_Chain_Length_of_LIM(lim_to_process) =
    Chain_Length_of_LIM(predecessor_lim)+1;
LV_Top_AHFA_of_LIM(lim_to_process) = Top_AHFA_of_LIM(predecessor_lim);
}

@ If we have reached this code, either we do not have a predecessor
LIM, or we have one which is useless for populating |lim_to_process|.
If a predecessor LIM is not itself populated, it will be useless
for populating its successor.
An unpopulated predecessor LIM
may occur when there is a predecessor LIM
which proved impossible to populate because it is part of a cycle.
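@ As an illustration only, the chain logic of the preceding sections
can be sketched on a toy model.
Everything in this sketch is hypothetical and is not part of \libmarpa/:
the names |ToyLIM| and |toy_populate_all|,
the use of an integer symbol ID as the predecessor reference,
and the fixed bound |TOY_MAX_SYMS| are all assumptions made for the example.
The sketch folds the three cases of the outer loop into a single inner loop,
and it assumes that a chain length is the predecessor's chain length plus one.

```c
#define TOY_MAX_SYMS 8

/* A toy stand-in for a LIM: at most one per symbol ID. */
typedef struct toy_lim {
    int pred;          /* symbol ID of the predecessor LIM, or -1 for none */
    int populated;     /* nonzero once chain_length is valid */
    int chain_length;  /* -1 until populated */
} ToyLIM;

static void toy_populate_from_base(ToyLIM *lim) {
    lim->chain_length = 0;
    lim->populated = 1;
}

static void toy_populate_from_pred(ToyLIM *lim, const ToyLIM *pred) {
    lim->chain_length = pred->chain_length + 1;
    lim->populated = 1;
}

/* Populate every LIM in lims[0..n-1], following predecessor chains. */
void toy_populate_all(ToyLIM *lims, int n) {
    int ok_for_chain[TOY_MAX_SYMS];
    int chain[TOY_MAX_SYMS];   /* in effect a stack of symbol IDs */
    int sym;
    if (n > TOY_MAX_SYMS) return;
    for (sym = 0; sym < n; sym++) ok_for_chain[sym] = 1;
    for (sym = 0; sym < n; sym++) {          /* the outer loop */
        int cur = sym;
        int top = 0;
        int pred;
        if (lims[cur].populated) continue;   /* done earlier, in some chain */
        for (;;) {                           /* build the chain, bottom to top */
            chain[top++] = cur;
            ok_for_chain[cur] = 0;           /* never stack this LIM again */
            pred = lims[cur].pred;
            if (pred < 0 || lims[pred].populated) break;
            if (!ok_for_chain[pred]) {       /* a cycle: break the chain here */
                pred = -1;
                break;
            }
            cur = pred;
        }
        while (top > 0) {                    /* populate, top to bottom */
            ToyLIM *lim = &lims[chain[--top]];
            if (pred >= 0) toy_populate_from_pred(lim, &lims[pred]);
            else toy_populate_from_base(lim);
            pred = chain[top];               /* the LIM just populated */
        }
    }
}
```

Each toy LIM is stacked at most once and unstacked only when it is populated,
which is the same accounting used in the complexity argument above.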
@ The predecessor LIM and the top AHFA to-state were initialized
to the appropriate values for this case,
and do not need to be changed.
The predecessor LIM was initialized to |NULL|,
and the top AHFA to-state was initialized to the AHFA to-state
of the base EIM.
@<Populate |lim_to_process| from its base Earley item@> = {
  EIM base_eim = Base_EIM_of_LIM(lim_to_process);
  LV_Origin_of_LIM (lim_to_process) = Origin_of_EIM (base_eim);
  LV_Chain_Length_of_LIM(lim_to_process) = 0;
}

@ @<Copy PIM workarea to postdot item array@> = {
    PIM *postdot_array
        = current_earley_set->t_postdot_ary
        = obstack_alloc (&r->t_obs,
               current_earley_set->t_postdot_sym_count * sizeof (PIM));
    guint min, max, start;
    gint postdot_array_ix = 0;
    for (start = 0; bv_scan (bv_pim_symbols, start, &min, &max); start = max + 2) {
        SYMID symid;
        for (symid = (SYMID)min; symid <= (SYMID) max; symid++) {
            PIM this_pim = pim_workarea[symid];
            if (this_pim) postdot_array[postdot_array_ix++] = this_pim;
        }
    }
}

@** Expand the Leo Items.
\libmarpa/ expands Leo items on a ``lazy'' basis,
when it creates the parse bocage.
Some of the ``virtual'' Earley items in the Leo paths will also
be real Earley items.
Earley items in the Leo path may actually exist
for several reasons:
\li The Leo completion item itself always exists before
this function call.
It is counted in the total path lengths,
once for each Leo path.
This means that the total of the Leo path lengths will never be less
than the number of Leo paths.
\li Any Leo completion base items.
One of these exists for every path
whose base is a
completed Earley item, and not a token.
\li Any other Earley item in the Leo path which was already created
for other reasons.
If an Earley item in a Leo path already exists, a new Earley
item is not created ---
instead a source link is added to the present Earley item.

@** Evaluation --- Preliminary Notes.

@*0 Alternate Start Rules.
Note that a start symbol only works if it is
on the LHS of just one rule.
This is not an issue with the main start symbol, because
Marpa uses an augmented grammar.
It {\bf is} an issue for alternate start symbols, when
I implement those, because an arbitrary symbol might be
on the LHS of several rules.

@ Possibilities:
\li Require alternate start be specified as a rule, not a symbol.
\li Allow alternate start symbols, but only if they are on the LHS of a
single rule.
I don't like this because it limits the ability of grammar writers
to do on-the-fly experiments.
\li Both of the above.  That certainly covers the bases,
but it is just one more interface
complication.

@ Note that even when a start rule is supplied, that does
not necessarily point to a unique Earley item.
A completed rule can belong to several different AHFA states.
That is OK, because even so the origin, current earleme
and the links will all be identical for all such Earley items.

@*0 Statistics on Completed LHS Symbols per AHFA State.
An AHFA state may contain completions for more than one LHS,
but that is rare in practical use, and the number of completed
LHS symbols in the exceptions remains low.
The very complex perl AHFA contains 271 states with completions.
Of these 268 have only one completed symbol.
The other three AHFA states complete only two different LHS symbols.
Two states have completions with both
a |term_hi| and an |indirob| on the LHS.
One state has completions for both a
|sideff| and an |mexpr|.
@ My HTML test grammars make the
same point more strongly.
My HTML parser generates grammars on the fly.
These HTML grammars can differ from each other,
because Marpa takes the HTML input into account when
generating the grammar.
In my HTML test suite,
of the 14,782 AHFA states, every
single one has only one completed LHS symbol.

@*0 CHAF Duplicate And-Nodes.
There are three ways in which the same and-node can occur multiple
times as the descendant of a single or-node.
@ First, an or-node can have several different Earley items as
its source.  This is dealt with by noticing that in building the
or-node, we only use the source links of an Earley item, and
that these are always identical.  Therefore we can arbitrarily
select any one of the possible source Earley items to be
the or-node's ``unique'' Earley item source.
@ The second source of duplication is duplicate source links
for the same Earley item.
I prevent token source links from duplicating,
and the Leo logic does not allow duplicate Leo source links.
@ Completion source links could be prevented from duplicating by
making the transition symbol part of its ``signature'',
and making sure the source link transition symbol matches
the predot symbol of the or-node.
This would only impose a small overhead.
But given that I need to look for duplicates from other
sources, there does not seem to be enough of a payoff to justify
even a small overhead.
@ A third source of duplication occurs
when different source links
have different AHFA states in their predecessors, but
share the same AHFA item.
There will be
pairs of these source links which share the same middle earleme,
because if an AHFA item (dotted rule) in one is justified at a
location, the same AHFA item in the other must be, also.
This happens frequently enough to be an issue even for practical
grammars.

@*0 Sources of Leo Path Items.
A Leo path consists of a series of Earley items:
\li at the bottom, exactly one Leo base item;
\li at the top, exactly one Leo completion item;
\li in between, zero or more Leo path items.
@ Leo base items and Leo completion items can have a variety
of non-Leo sources.
Leo completion items can have multiple Leo sources,
though no other source can have the same middle earleme
as a Leo source.
@ When expanded, Leo path items can have multiple sources.
However, the sources of a single Leo path item
will result from the same Leo predecessor.
As consequences:
\li All the sources of an expanded Leo path item will have the same
Earley item predecessor,
the Leo base item of the Leo predecessor.
\li All these sources will also have the same middle
earleme, the Earley set of the Leo predecessor.
\li Every source of the Leo path item will have a cause,
and the transition symbol of the Leo predecessor
will be on the LHS of at least one completion in all of those causes.
\li The Leo transition symbol will be the postdot symbol in exactly
one AHFA item in the AHFA state of the Earley item predecessor.

@** Ur-Node (UR) Code.
Ur is a German word for ``primordial'', which is used
a lot in academic writing to designate precursors---%
for example, scholars who believe that Shakespeare's
{\it Hamlet} is based on another, now lost, play,
call this play the ur-Hamlet.
My ur-nodes are precursors of and-nodes and or-nodes.
@<Private incomplete structures@> =
struct s_ur_node_stack;
struct s_ur_node;
typedef struct s_ur_node_stack* URS;
typedef struct s_ur_node* UR;
typedef const struct s_ur_node* UR_Const;
@
@
{\bf To Do}: @^To Do@>
It may make sense to reuse this stack
for the alternatives.
In that case some of these structures
will need to be changed.
@d Prev_UR_of_UR(ur) ((ur)->t_prev)
@d LV_Prev_UR_of_UR(ur) Prev_UR_of_UR(ur)
@d Next_UR_of_UR(ur) ((ur)->t_next)
@d LV_Next_UR_of_UR(ur) Next_UR_of_UR(ur)
@d EIM_of_UR(ur) ((ur)->t_earley_item)
@d LV_EIM_of_UR(ur) EIM_of_UR(ur)
@d AEX_of_UR(ur) ((ur)->t_aex)
@d LV_AEX_of_UR(ur) AEX_of_UR(ur)

@<Private structures@> =
struct s_ur_node_stack {
   struct obstack t_obs;
   UR t_base;
   UR t_top;
};
struct s_ur_node {
   UR t_prev;
   UR t_next;
   EIM t_earley_item;
   AEX t_aex;
};
@ @d URS_of_R(r) (&(r)->t_ur_node_stack)
@<Widely aligned recognizer elements@> =
struct s_ur_node_stack t_ur_node_stack;
@
{\bf To Do}: @^To Do@>
The lifetime of this stack should be reexamined once its uses
are settled.
@<Initialize recognizer elements@> =
    ur_node_stack_init(URS_of_R(r));
@ @<Destroy recognizer elements@> =
    ur_node_stack_destroy(URS_of_R(r));

@ @<Private function prototypes@> =
static inline void ur_node_stack_init(URS stack);
@ @<Function definitions@> =
static inline void ur_node_stack_init(URS stack) {
MARPA_OFF_DEBUG2("ur_node_stack_init %s", G_STRLOC);
    obstack_init(&stack->t_obs);
    stack->t_base = ur_node_new(stack, 0);
    ur_node_stack_reset(stack);
}

@ @<Private function prototypes@> =
static inline void ur_node_stack_reset(URS stack);
@ @<Function definitions@> =
static inline void ur_node_stack_reset(URS stack) {
    stack->t_top = stack->t_base;
}

@ @<Private function prototypes@> =
static inline void ur_node_stack_destroy(URS stack);
@ @<Function definitions@> =
static inline void ur_node_stack_destroy(URS stack) {
MARPA_OFF_DEBUG2("ur_node_stack_destroy %s", G_STRLOC);
    if (stack->t_base) obstack_free(&stack->t_obs, NULL);
    stack->t_base = NULL;
MARPA_OFF_DEBUG2("ur_node_stack_destroy %s", G_STRLOC);
}

@ @<Private function
prototypes@> =
static inline UR ur_node_new(URS stack, UR prev);
@ @<Function definitions@> =
static inline UR ur_node_new(URS stack, UR prev) {
    UR new_ur_node;
    new_ur_node = obstack_alloc(&stack->t_obs, sizeof(new_ur_node[0]));
    LV_Next_UR_of_UR(new_ur_node) = 0;
    LV_Prev_UR_of_UR(new_ur_node) = prev;
    return new_ur_node;
}

@ @<Private function prototypes@> =
static inline void ur_node_push(URS stack, EIM earley_item, AEX aex);
@ @<Function definitions@> =
static inline void
ur_node_push (URS stack, EIM earley_item, AEX aex)
{
  UR top = stack->t_top;
  UR new_top = Next_UR_of_UR (top);
  LV_EIM_of_UR (top) = earley_item;
  LV_AEX_of_UR (top) = aex;
  if (!new_top)
    {
      new_top = ur_node_new (stack, top);
      LV_Next_UR_of_UR (top) = new_top;
    }
  stack->t_top = new_top;
}

@ @<Private function prototypes@> =
static inline UR ur_node_pop(URS stack);
@ @<Function definitions@> =
static inline UR
ur_node_pop (URS stack)
{
  UR new_top = Prev_UR_of_UR (stack->t_top);
  if (!new_top) return NULL;
  stack->t_top = new_top;
  return new_top;
}

@ |predecessor_aim| and |predot|
are guaranteed to be defined,
since predictions and the null parse AHFA item are
never on the stack.
@<Populate the PSIA data@>=
{
  UR_Const ur_node;
  const URS ur_node_stack = URS_of_R(r);
  ur_node_stack_reset(ur_node_stack);
  {
     const EIM ur_earley_item = start_eim;
     const AIM ur_aim = start_aim;
     const AEX ur_aex = start_aex;
     @<Push ur-node if new@>@;
  }
  while ((ur_node = ur_node_pop(ur_node_stack)))
    {
      const EIM_Const parent_earley_item = EIM_of_UR(ur_node);
      const AEX parent_aex = AEX_of_UR(ur_node);
      const AIM parent_aim = AIM_of_EIM_by_AEX (parent_earley_item, parent_aex);
      MARPA_ASSERT(parent_aim >= AIM_by_ID(1))@;
      const AIM predecessor_aim = parent_aim - 1;
      /* Note that the postdot symbol of the predecessor is NOT necessarily the
         predot symbol, because there may be nulling symbols in between. */
      guint source_type = Source_Type_of_EIM (parent_earley_item);
      MARPA_ASSERT(!EIM_is_Predicted(parent_earley_item))@;
      @<Push child Earley items from token sources@>@;
      @<Push child Earley items from completion sources@>@;
      @<Push child Earley items from Leo sources@>@;
    }
    @<Unset the PSIA for the start rule prediction@>@;
}

@ The start rule prediction is a special case ---
it is the one AHFA prediction item not in a
predicted AHFA state.
It's dealt with by letting its entry in the
PSIA be set spuriously, then unsetting it.
Not very elegant, but this deals with it at a constant
cost per parse.
@<Unset the PSIA for the start rule prediction@> = {
  const ES first_earley_set = ES_of_R_by_Ord (r, 0);
  OR** const nodes_by_item = per_es_data[0].t_aexes_by_item;
  const EIM* const eims_of_es = EIMs_of_ES(first_earley_set);
  const gint item_count = EIM_Count_of_ES (first_earley_set);
  gint item_ordinal;
  for (item_ordinal = 0; item_ordinal < item_count; item_ordinal++)
    {
      OR* const nodes_by_aex = nodes_by_item[item_ordinal];
      if (nodes_by_aex) {
        const EIM earley_item = eims_of_es[item_ordinal];
        const Marpa_AHFA_State_ID ahfa_id = AHFAID_of_EIM(earley_item);
        /* The prediction start rule will be in AHFA state 0 */
        if (ahfa_id) continue;
        {
          const gint aim_count_of_item = AIM_Count_of_EIM(earley_item);
          AEX aex;
          for (aex = 0; aex < aim_count_of_item; aex++) {
            AIM ahfa_item = AIM_of_EIM_by_AEX(earley_item, aex);
            if (Position_of_AIM(ahfa_item) == 0) {
              /* Don't bother with the null count ---
                 there are no nulling symbols in the start rule */
              nodes_by_aex[aex] = NULL;
              goto FINISHED_UNSET;
            }
          }
        }
      }
    }
    FINISHED_UNSET: ;
}

@ @<Push ur-node if new@> = {
    if (!psia_test_and_set
        (&bocage_setup_obs, per_es_data, ur_earley_item, ur_aex))
      {
        ur_node_push (ur_node_stack, ur_earley_item, ur_aex);
        or_node_estimate += 1 + Null_Count_of_AIM(ur_aim);
      }
}

@ The |PSIA| is a container of data that is per Earley-set, per Earley item,
and per AEX.  Thus, Per-Set-Item-Aex, or PSIA.
This function ensures that the appropriate |PSIA| boolean is set,
and returns that boolean's value prior to the call.
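The test-and-set idea can be sketched on its own, outside of \libmarpa/.
The sketch below is a simplified assumption, not the real PSIA:
it keeps one level of indirection (per item) rather than two (per set, per item),
it uses |malloc| rather than an obstack,
and the names |ToyPSIA|, |toy_psia_init|, |toy_psia_test_and_set|
and |toy_psia_free| are hypothetical.
As in the real code, a row is allocated only when its item is first touched.

```c
#include <stdlib.h>

typedef struct toy_psia {
    unsigned char **rows;   /* one lazily allocated row of booleans per item */
    int item_count;
    int slots_per_item;
} ToyPSIA;

/* Returns 1 on success, 0 if allocation failed. */
int toy_psia_init(ToyPSIA *t, int item_count, int slots_per_item) {
    int i;
    t->item_count = item_count;
    t->slots_per_item = slots_per_item;
    t->rows = malloc((size_t)item_count * sizeof *t->rows);
    if (!t->rows) return 0;
    for (i = 0; i < item_count; i++) t->rows[i] = NULL;
    return 1;
}

/* Ensure the boolean for (item, slot) is set.
   Returns its value before the call (0 or 1), or -1 if allocation failed. */
int toy_psia_test_and_set(ToyPSIA *t, int item, int slot) {
    unsigned char *row = t->rows[item];
    if (!row) {
        /* First touch of this item: create an all-clear row. */
        row = calloc((size_t)t->slots_per_item, 1);
        if (!row) return -1;
        t->rows[item] = row;
    }
    if (row[slot]) return 1;
    row[slot] = 1;
    return 0;
}

void toy_psia_free(ToyPSIA *t) {
    int i;
    if (!t->rows) return;
    for (i = 0; i < t->item_count; i++) free(t->rows[i]);
    free(t->rows);
    t->rows = NULL;
}
```

A caller can then write |if (!toy_psia_test_and_set(...))| to do work
exactly once per (item, slot) pair, in the manner of the ur-node push above.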
@<Private function prototypes@> =
static inline gint psia_test_and_set(
    struct obstack* obs,
    struct s_bocage_setup_per_es* per_es_data,
    EIM earley_item,
    AEX ahfa_element_ix);
@ @<Function definitions@> =
static inline gint psia_test_and_set(
    struct obstack* obs,
    struct s_bocage_setup_per_es* per_es_data,
    EIM earley_item,
    AEX ahfa_element_ix)
{
  const gint aim_count_of_item = AIM_Count_of_EIM(earley_item);
  const Marpa_Earley_Set_ID set_ordinal = ES_Ord_of_EIM(earley_item);
  OR** nodes_by_item = per_es_data[set_ordinal].t_aexes_by_item;
  const gint item_ordinal = Ord_of_EIM(earley_item);
  OR* nodes_by_aex = nodes_by_item[item_ordinal];
MARPA_ASSERT(ahfa_element_ix < aim_count_of_item)@;
  if (!nodes_by_aex) {
      AEX aex;
      nodes_by_aex = nodes_by_item[item_ordinal] =
        obstack_alloc(obs, aim_count_of_item*sizeof(OR));
      for (aex = 0; aex < aim_count_of_item; aex++) {
        nodes_by_aex[aex] = NULL;
      }
  }
  if (!nodes_by_aex[ahfa_element_ix]) {
      nodes_by_aex[ahfa_element_ix] = dummy_or_node;
      return 0;
  }
  return 1;
}

@ @<Push child Earley items from token sources@> =
{
  SRCL source_link = NULL;
  EIM predecessor_earley_item = NULL;
  switch (source_type)
    {
    case SOURCE_IS_TOKEN:
      predecessor_earley_item = Predecessor_of_EIM (parent_earley_item);
      break;
    case SOURCE_IS_AMBIGUOUS:
      source_link = First_Token_Link_of_EIM (parent_earley_item);
      if (source_link)
        {
          predecessor_earley_item = Predecessor_of_SRCL (source_link);
          source_link = Next_SRCL_of_SRCL (source_link);
        }
    }
    for (;;)
      {
        if (predecessor_earley_item)
          {
            if (EIM_is_Predicted(predecessor_earley_item)) {
                Set_boolean_in_PSIA_for_initial_nulls(predecessor_earley_item, predecessor_aim);
            } else {
                const EIM ur_earley_item = predecessor_earley_item;
                const AEX ur_aex =
                  AEX_of_EIM_by_AIM (predecessor_earley_item, predecessor_aim);
                const AIM ur_aim = predecessor_aim;
                @<Push ur-node if new@>@;
            }
          }
        if (!source_link)
          break;
        predecessor_earley_item = Predecessor_of_SRCL (source_link);
        source_link = Next_SRCL_of_SRCL (source_link);
      }
}

@ If there are initial nulls, set a boolean in the PSIA
so that I will know to create the chain of or-nodes for them.
We don't need to stack the prediction, because it can have
no other descendants.
@d Set_boolean_in_PSIA_for_initial_nulls(eim, aim) {
    if (Position_of_AIM(aim) > 0) {
        const gint null_count = Null_Count_of_AIM(aim);
        if (null_count) {
            AEX aex = AEX_of_EIM_by_AIM((eim),
                (aim));
            or_node_estimate += null_count;
            psia_test_and_set(&bocage_setup_obs, per_es_data,
                (eim), aex);
        }
    }
}

@ @<Push child Earley items from completion sources@> =
{
  SRCL source_link = NULL;
  EIM predecessor_earley_item = NULL;
  EIM cause_earley_item = NULL;
  const SYMID transition_symbol_id = Postdot_SYMID_of_AIM(predecessor_aim);
  switch (source_type)
    {
    case SOURCE_IS_COMPLETION:
      predecessor_earley_item = Predecessor_of_EIM (parent_earley_item);
      cause_earley_item = Cause_of_EIM (parent_earley_item);
      break;
    case SOURCE_IS_AMBIGUOUS:
      source_link = First_Completion_Link_of_EIM (parent_earley_item);
      if (source_link)
        {
          predecessor_earley_item = Predecessor_of_SRCL (source_link);
          cause_earley_item = Cause_of_SRCL (source_link);
          source_link = Next_SRCL_of_SRCL (source_link);
        }
      break;
    }
  while (cause_earley_item)
    {
      if (predecessor_earley_item)
        {
          if (EIM_is_Predicted (predecessor_earley_item))
            {
              Set_boolean_in_PSIA_for_initial_nulls(predecessor_earley_item, predecessor_aim);
            }
          else
            {
              const EIM ur_earley_item = predecessor_earley_item;
              const
              AEX ur_aex =
                AEX_of_EIM_by_AIM (predecessor_earley_item, predecessor_aim);
              const AIM ur_aim = predecessor_aim;
              @<Push ur-node if new@>@;
            }
        }
      {
        const TRANS cause_completion_data =
          TRANS_of_EIM_by_SYMID (cause_earley_item, transition_symbol_id);
        const gint aex_count = Completion_Count_of_TRANS (cause_completion_data);
        const AEX * const aexes = AEXs_of_TRANS (cause_completion_data);
        const EIM ur_earley_item = cause_earley_item;
        gint ix;
        for (ix = 0; ix < aex_count; ix++) {
            const AEX ur_aex = aexes[ix];
            const AIM ur_aim = AIM_of_EIM_by_AEX(ur_earley_item, ur_aex);
            @<Push ur-node if new@>@;
        }
      }
      if (!source_link) break;
      predecessor_earley_item = Predecessor_of_SRCL (source_link);
      cause_earley_item = Cause_of_SRCL (source_link);
      source_link = Next_SRCL_of_SRCL (source_link);
    }
}

@ @<Push child Earley items from Leo sources@> =
{
  SRCL source_link = NULL;
  EIM cause_earley_item = NULL;
  LIM leo_predecessor = NULL;
  switch (source_type)
    {
    case SOURCE_IS_LEO:
      leo_predecessor = Predecessor_of_EIM (parent_earley_item);
      cause_earley_item = Cause_of_EIM (parent_earley_item);
      break;
    case SOURCE_IS_AMBIGUOUS:
      source_link = First_Leo_SRCL_of_EIM (parent_earley_item);
      if (source_link)
        {
          leo_predecessor = Predecessor_of_SRCL (source_link);
          cause_earley_item = Cause_of_SRCL (source_link);
          source_link = Next_SRCL_of_SRCL (source_link);
        }
      break;
    }
  while (cause_earley_item)
    {
      const SYMID transition_symbol_id = Postdot_SYMID_of_LIM(leo_predecessor);
      const TRANS cause_completion_data =
        TRANS_of_EIM_by_SYMID (cause_earley_item, transition_symbol_id);
      const gint aex_count = Completion_Count_of_TRANS (cause_completion_data);
      const AEX * const aexes = AEXs_of_TRANS (cause_completion_data);
      gint ix;
      EIM ur_earley_item =
        cause_earley_item;
      for (ix = 0; ix < aex_count; ix++) {
          const AEX ur_aex = aexes[ix];
          const AIM ur_aim = AIM_of_EIM_by_AEX(ur_earley_item, ur_aex);
          @<Push ur-node if new@>@;
      }
      while (leo_predecessor) {
        SYMID postdot = Postdot_SYMID_of_LIM (leo_predecessor);
        EIM leo_base = Base_EIM_of_LIM (leo_predecessor);
        TRANS transition = TRANS_of_EIM_by_SYMID (leo_base, postdot);
        const AEX ur_aex = Leo_Base_AEX_of_TRANS (transition);
        const AIM ur_aim = AIM_of_EIM_by_AEX(leo_base, ur_aex);
        ur_earley_item = leo_base;
        /* Increment the
           estimate to account for the Leo path or-nodes */
        or_node_estimate += 1 + Null_Count_of_AIM(ur_aim+1);
        if (EIM_is_Predicted (ur_earley_item))
          {
            Set_boolean_in_PSIA_for_initial_nulls(ur_earley_item, ur_aim);
          } else {
            @<Push ur-node if new@>@;
          }
        leo_predecessor = Predecessor_LIM_of_LIM(leo_predecessor);
      }
      if (!source_link) break;
      leo_predecessor = Predecessor_of_SRCL (source_link);
      cause_earley_item = Cause_of_SRCL (source_link);
      source_link = Next_SRCL_of_SRCL (source_link);
    }
}

@** Or-Node (OR) Code.
The or-nodes are part of the parse bocage
and are similar to the or-nodes of a standard parse forest.
Unlike a parse forest,
a parse bocage can contain cycles.

@<Public typedefs@> =
typedef gint Marpa_Or_Node_ID;
@ @<Private typedefs@> =
typedef Marpa_Or_Node_ID ORID;

@*0 Relationship of Earley Items to Or-Nodes.
Several Earley items may be the source of the same or-node,
but the or-node only keeps track of one.  This is sufficient,
because the Earley item is tracked by the or-node only for its
links and,
by the following theorem,
the links for every Earley item which is the source
of the same or-node must be the same.

@ {\bf Theorem}: If two Earley items are sources of the same or-node,
they have the same links.
{\bf Outline of Proof}:
No or-node results from a predicted Earley
item, so every Earley item which is the source of an or-node
is itself the result of a transition over a symbol from
another Earley item.
So I can restrict my discussion to discovered Earley items.
For the same reason, I can assume all source links have
predecessors defined.

@ {\bf Shared Predot Lemma}: An AHFA state is either predicted,
or all its LR0 items share the same predot symbol.
{\bf Proof}: Straightforward, based on the construction of
an AHFA.

@ {\bf EIM Lemma}: If two Earley items are sources of the same or-node,
they share the same origin ES, the same current ES and the same
predot symbol.
{\bf Proof of Lemma}:
Showing that the Earley items share the same origin and current
ES is straightforward, based on the or-node's construction.
They share at least one LR0 item in their AHFA states---%
the LR0 item which defines the or-node.
Because they share at least one LR0 item and because, by the
Shared Predot Lemma, every LR0
item in a discovered AHFA state has the same predot symbol,
the two Earley items also
share the same predot symbol.

@ {\bf Completion Source Lemma}:
A discovered Earley item has a completion source link if and only if
the origin ES of the link's predecessor,
the current ES of the link's cause
and the transition symbol match, respectively,
the origin ES, current ES and predot symbol of the discovered EIM.
{\bf Proof}: Based on the construction of EIMs.

@ {\bf Token Source Lemma}:
A discovered Earley item has a token source link if and only if
the origin ES of the link's predecessor, the current ES of the link's cause
and the token symbol match, respectively,
the origin ES, current ES and predot symbol of the discovered EIM.
{\bf Proof}: Based on the construction of EIMs.

@ Source links are either completion source links or token source links.
The theorem for completion source links follows from the EIM Lemma and the
Completion Source Lemma.
The theorem for token source links follows from the EIM Lemma and the
Token Source Lemma.
{\bf QED}.

@ @<Private incomplete structures@> =
union u_or_node;
typedef union u_or_node* OR;
@ For final or-nodes, the type is kept in the same word as the position.
@s OR int
Position is |DUMMY_OR_NODE| for dummy or-nodes,
and |TOKEN_OR_NODE| if the or-node is actually a symbol.
Otherwise, position is the dot position.
@d DUMMY_OR_NODE -1
@d TOKEN_OR_NODE -2
@d OR_is_Token(or) (Type_of_OR(or) == TOKEN_OR_NODE)
@d Position_of_OR(or) ((or)->t_final.t_position)
@d Type_of_OR(or) ((or)->t_final.t_position)
@d RULE_of_OR(or) ((or)->t_final.t_rule)
@d Origin_Ord_of_OR(or) ((or)->t_final.t_start_set_ordinal)
@d ID_of_OR(or) ((or)->t_final.t_id)
@d ES_Ord_of_OR(or) ((or)->t_draft.t_end_set_ordinal)
@d DANDs_of_OR(or) ((or)->t_draft.t_draft_and_node)
@d First_ANDID_of_OR(or) ((or)->t_final.t_first_and_node_id)
@d AND_Count_of_OR(or) ((or)->t_final.t_and_node_count)
@ C89 guarantees that common initial sequences
may be accessed via different members of a union.
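As a minimal sketch of the guarantee just cited, and not the actual or-node layout, the following stand-alone C fragment stores a node through one union member and reads the shared leading fields through another. The names |sketch_draft|, |sketch_final|, |sketch_or| and |end_ordinal_via_final| are hypothetical, invented for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the draft and final or-nodes.
   Both structures begin with the same two int fields, which form
   their common initial sequence. */
struct sketch_draft {
  int t_position;
  int t_end_set_ordinal;
  void *t_draft_and_node;       /* draft-only field */
};

struct sketch_final {
  int t_position;
  int t_end_set_ordinal;
  int t_first_and_node_id;      /* final-only fields */
  int t_and_node_count;
};

union sketch_or {
  struct sketch_draft t_draft;
  struct sketch_final t_final;
};

/* Store the node as a draft, then read fields of the common initial
   sequence through the final member.  The union currently holds the
   draft structure, so inspecting the shared prefix is permitted. */
static int end_ordinal_via_final(int position, int end_set_ordinal)
{
  union sketch_or node;
  node.t_draft.t_position = position;
  node.t_draft.t_end_set_ordinal = end_set_ordinal;
  node.t_draft.t_draft_and_node = NULL;
  assert(node.t_final.t_position == position);
  return node.t_final.t_end_set_ordinal;
}
```

The accessor macros rely on the same pattern: |Position_of_OR| goes through |t_final| while |ES_Ord_of_OR| goes through |t_draft|, and both fields lie in the common initial sequence.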
9377@<Or-node common initial sequence@> = 9378gint t_position; 9379gint t_end_set_ordinal; 9380RULE t_rule; 9381gint t_start_set_ordinal; 9382ORID t_id; 9383@ @<Private structures@> = 9384struct s_draft_or_node 9385{ 9386 @<Or-node common initial sequence@>@; 9387 DAND t_draft_and_node; 9388}; 9389@ @<Private structures@> = 9390struct s_final_or_node 9391{ 9392 @<Or-node common initial sequence@>@; 9393 gint t_first_and_node_id; 9394 gint t_and_node_count; 9395}; 9396@ 9397@d TOK_of_OR(or) (&(or)->t_token) 9398@d SYMID_of_OR(or) SYMID_of_TOK(TOK_of_OR(or)) 9399@d Value_of_OR(or) Value_of_TOK(TOK_of_OR(or)) 9400@<Private structures@> = 9401union u_or_node { 9402 struct s_draft_or_node t_draft; 9403 struct s_final_or_node t_final; 9404 struct s_token t_token; 9405}; 9406typedef union u_or_node OR_Object; 9407 9408@ @<Private global variables@> = 9409static const gint dummy_or_node_type = DUMMY_OR_NODE; 9410static const OR dummy_or_node = (OR)&dummy_or_node_type; 9411 9412@ @d ORs_of_B(b) ((b)->t_or_nodes) 9413@d OR_of_B_by_ID(b, id) (ORs_of_B(b)[(id)]) 9414@d OR_Count_of_B(b) ((b)->t_or_node_count) 9415@d ANDs_of_B(b) ((b)->t_and_nodes) 9416@d AND_Count_of_B(b) ((b)->t_and_node_count) 9417@d Top_ORID_of_B(b) ((b)->t_top_or_node_id) 9418@<Widely aligned bocage elements@> = 9419OR* t_or_nodes; 9420AND t_and_nodes; 9421@ @<Int aligned bocage elements@> = 9422gint t_or_node_count; 9423gint t_and_node_count; 9424ORID t_top_or_node_id; 9425 9426@ @<Initialize bocage elements@> = 9427ORs_of_B(b) = NULL; 9428OR_Count_of_B(b) = 0; 9429ANDs_of_B(b) = NULL; 9430AND_Count_of_B(b) = 0; 9431 9432@ @<Destroy bocage elements, main phase@> = 9433{ 9434 OR* or_nodes = ORs_of_B (b); 9435 AND and_nodes = ANDs_of_B (b); 9436 if (or_nodes) 9437 { 9438 g_free (or_nodes); 9439 ORs_of_B (b) = NULL; 9440 } 9441 if (and_nodes) 9442 { 9443 g_free (and_nodes); 9444 ANDs_of_B (b) = NULL; 9445 } 9446} 9447 9448@*0 Create the Or-Nodes. 
9449@<Create the or-nodes for all earley sets@> = 9450{ 9451 PSAR_Object or_per_es_arena; 9452 const PSAR or_psar = &or_per_es_arena; 9453 gint work_earley_set_ordinal; 9454 OR last_or_node = NULL ; 9455 ORs_of_B (b) = g_new (OR, or_node_estimate); 9456 psar_init (or_psar, SYMI_Count_of_G (g)); 9457 for (work_earley_set_ordinal = 0; 9458 work_earley_set_ordinal < earley_set_count_of_r; 9459 work_earley_set_ordinal++) 9460 { 9461 const ES_Const earley_set = ES_of_R_by_Ord (r, work_earley_set_ordinal); 9462 EIM* const eims_of_es = EIMs_of_ES(earley_set); 9463 const gint item_count = EIM_Count_of_ES (earley_set); 9464 PSL this_earley_set_psl; 9465 OR** const nodes_by_item = per_es_data[work_earley_set_ordinal].t_aexes_by_item; 9466 psar_dealloc(or_psar); 9467#define PSL_ES_ORD work_earley_set_ordinal 9468#define CLAIMED_PSL this_earley_set_psl 9469 @<Claim the or-node PSL for |PSL_ES_ORD| as |CLAIMED_PSL|@>@; 9470 @<Create the or-nodes for |work_earley_set_ordinal|@>@; 9471 @<Create the draft and-nodes for |work_earley_set_ordinal|@>@; 9472 } 9473 psar_destroy (or_psar); 9474 ORs_of_B(b) = g_renew (OR, ORs_of_B(b), OR_Count_of_B(b)); 9475} 9476 9477@ @<Create the or-nodes for |work_earley_set_ordinal|@> = 9478{ 9479 gint item_ordinal; 9480 for (item_ordinal = 0; item_ordinal < item_count; item_ordinal++) 9481 { 9482 OR* const work_nodes_by_aex = nodes_by_item[item_ordinal]; 9483 if (work_nodes_by_aex) { 9484 const EIM work_earley_item = eims_of_es[item_ordinal]; 9485 const gint work_ahfa_item_count = AIM_Count_of_EIM(work_earley_item); 9486 AEX work_aex; 9487 const gint work_origin_ordinal = Ord_of_ES (Origin_of_EIM (work_earley_item)); 9488 for (work_aex = 0; work_aex < work_ahfa_item_count; work_aex++) { 9489 if (!work_nodes_by_aex[work_aex]) continue; 9490 @<Create the or-nodes 9491 for |work_earley_item| and |work_aex|@>@; 9492 } 9493 } 9494 } 9495} 9496 9497@ @<Create the or-nodes for |work_earley_item| and |work_aex|@> = 9498{ 9499 AIM ahfa_item = 
    AIM_of_EIM_by_AEX(work_earley_item, work_aex);
  SYMI ahfa_item_symbol_instance;
  OR psia_or_node = NULL;
  ahfa_item_symbol_instance = SYMI_of_AIM(ahfa_item);
  {
    PSL or_psl;
#define PSL_ES_ORD work_origin_ordinal
#define CLAIMED_PSL or_psl
    @<Claim the or-node PSL for |PSL_ES_ORD| as |CLAIMED_PSL|@>@;
    @<Add main or-node@>@;
    @<Add nulling token or-nodes@>@;
  }
  /* Replace the dummy or-node with
  the last one added */
  MARPA_ASSERT (psia_or_node)@;
  work_nodes_by_aex[work_aex] = psia_or_node;
  @<Add Leo or-nodes@>@;
}

@*0 Non-Leo Or-Nodes.
@ Add the main or-node---%
the one that corresponds directly to this AHFA item.
Predicted AHFA items are the exception:
or-nodes are not added for them.
@<Add main or-node@> =
{
MARPA_OFF_DEBUG3("%s ahfa_item_symbol_instance = %d", G_STRLOC, ahfa_item_symbol_instance);
  if (ahfa_item_symbol_instance >= 0)
    {
      OR or_node;
MARPA_ASSERT(ahfa_item_symbol_instance < SYMI_Count_of_G(g))@;
      or_node = PSL_Datum (or_psl, ahfa_item_symbol_instance);
      if (!or_node || ES_Ord_of_OR(or_node) != work_earley_set_ordinal)
        {
          const RULE rule = RULE_of_AIM(ahfa_item);
          @<Set |last_or_node| to a new or-node@>@;
          or_node = last_or_node;
          PSL_Datum (or_psl, ahfa_item_symbol_instance) = last_or_node;
          Origin_Ord_of_OR(or_node) = Origin_Ord_of_EIM(work_earley_item);
          ES_Ord_of_OR(or_node) = work_earley_set_ordinal;
          RULE_of_OR(or_node) = rule;
          Position_of_OR (or_node) =
              ahfa_item_symbol_instance - SYMI_of_RULE (rule) + 1;
          DANDs_of_OR(or_node) = NULL;
        }
      psia_or_node = or_node;
    }
}

@ The resizing of the or-node array here presents an issue.
It should never be invoked, which means it is never tested.
That raises the question of whether to trust the logic
and delete the code,
or to arrange to test it.
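One way of arranging such a test, sketched here under assumptions and not taken from the Marpa code, is to make the initial estimate a parameter, so that a test can deliberately underestimate and force the doubling path. The sketch uses plain |realloc| where the real code uses |g_renew|, and the names |ptr_array|, |ptr_array_push| and |capacity_after_pushes| are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of pointers, standing in for the
   or-node array and its estimate. */
struct ptr_array {
  void **items;
  int count;
  int capacity;
};

/* Append one item, doubling the capacity when the estimate proves
   too small.  Returns 1 on success, 0 if reallocation fails. */
static int ptr_array_push(struct ptr_array *a, void *item)
{
  if (a->count >= a->capacity) {
    const int new_capacity = a->capacity * 2;
    void **grown =
      (void **)realloc(a->items, sizeof(void *) * (size_t)new_capacity);
    if (!grown) return 0;
    a->items = grown;
    a->capacity = new_capacity;
  }
  a->items[a->count++] = item;
  return 1;
}

/* Start from a given estimate, push the requested number of items,
   and report the final capacity, or -1 on allocation failure.
   A small estimate forces the resize path to run. */
static int capacity_after_pushes(int estimate, int pushes)
{
  static int payload;
  struct ptr_array a;
  int i, capacity;
  assert(estimate > 0);
  a.items = (void **)malloc(sizeof(void *) * (size_t)estimate);
  if (!a.items) return -1;
  a.count = 0;
  a.capacity = estimate;
  for (i = 0; i < pushes; i++) {
    if (!ptr_array_push(&a, &payload)) {
      free(a.items);
      return -1;
    }
  }
  assert(a.count == pushes);
  capacity = a.capacity;
  free(a.items);
  return capacity;
}
```

With an estimate of 2 and five pushes the array doubles twice, so the resize branch is exercised without waiting for a grammar that defeats the real estimate.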
@<Set |last_or_node| to a new or-node@> =
{
  const gint or_node_id = OR_Count_of_B (b)++;
  OR *or_nodes_of_b = ORs_of_B (b);
  last_or_node = (OR)obstack_alloc (&OBS_of_B(b), sizeof(OR_Object));
  ID_of_OR(last_or_node) = or_node_id;
  if (G_UNLIKELY(or_node_id >= or_node_estimate))
    {
      MARPA_ASSERT(0);
      or_node_estimate *= 2;
      ORs_of_B (b) = or_nodes_of_b =
        g_renew (OR, or_nodes_of_b, or_node_estimate);
    }
  or_nodes_of_b[or_node_id] = last_or_node;
}


@ In the following logic, the order matters.
The or-node added last, whether here or in the logic for
adding the main or-node, will be used as the or-node
in the PSIA.
@ In building the final or-node, the predecessor can be
determined using the PSIA for $|symbol_instance|-1$.
The exception is where there is no predecessor,
and this is the case if |Position_of_OR(or_node) == 0|.
@<Add nulling token or-nodes@> =
{
  const gint null_count = Null_Count_of_AIM (ahfa_item);
  if (null_count > 0)
    {
      const RULE rule = RULE_of_AIM (ahfa_item);
      const gint symbol_instance_of_rule = SYMI_of_RULE(rule);
      const gint first_null_symbol_instance =
          ahfa_item_symbol_instance < 0 ? symbol_instance_of_rule : ahfa_item_symbol_instance + 1;
      gint i;
      for (i = 0; i < null_count; i++)
        {
          const gint symbol_instance = first_null_symbol_instance + i;
          OR or_node = PSL_Datum (or_psl, symbol_instance);
MARPA_OFF_DEBUG3("adding nulling token or-node EIM = %s aex=%d",
    eim_tag(work_earley_item), work_aex);
          if (!or_node || ES_Ord_of_OR (or_node) != work_earley_set_ordinal) {
                DAND draft_and_node;
                const gint rhs_ix = symbol_instance - SYMI_of_RULE(rule);
                const OR predecessor = rhs_ix ?
last_or_node : NULL; 9598 const OR cause = (OR)TOK_by_ID_of_R( r, RHS_ID_of_RULE (rule, rhs_ix ) ); 9599 @<Set |last_or_node| to a new or-node@>@; 9600 or_node = PSL_Datum (or_psl, symbol_instance) = last_or_node ; 9601 Origin_Ord_of_OR (or_node) = work_origin_ordinal; 9602 ES_Ord_of_OR (or_node) = work_earley_set_ordinal; 9603 RULE_of_OR (or_node) = rule; 9604MARPA_OFF_DEBUG3("Added rule %p to or-node %p", RULE_of_OR(or_node), or_node); 9605 Position_of_OR (or_node) = rhs_ix + 1; 9606MARPA_ASSERT(Position_of_OR(or_node) <= 1 || predecessor); 9607 draft_and_node = DANDs_of_OR (or_node) = 9608 draft_and_node_new (&bocage_setup_obs, predecessor, 9609 cause); 9610MARPA_OFF_DEBUG3("or = %p, setting DAND = %p", or_node, DANDs_of_OR(or_node)); 9611 Next_DAND_of_DAND (draft_and_node) = NULL; 9612 } 9613 psia_or_node = or_node; 9614 } 9615 } 9616} 9617 9618@*0 Leo Or-Nodes. 9619@<Add Leo or-nodes@> = { 9620 SRCL source_link = NULL; 9621 EIM cause_earley_item = NULL; 9622 LIM leo_predecessor = NULL; 9623 switch (Source_Type_of_EIM(work_earley_item)) 9624 { 9625 case SOURCE_IS_LEO: 9626 leo_predecessor = Predecessor_of_EIM (work_earley_item); 9627 cause_earley_item = Cause_of_EIM (work_earley_item); 9628 break; 9629 case SOURCE_IS_AMBIGUOUS: 9630 source_link = First_Leo_SRCL_of_EIM (work_earley_item); 9631 if (source_link) 9632 { 9633 leo_predecessor = Predecessor_of_SRCL (source_link); 9634 cause_earley_item = Cause_of_SRCL (source_link); 9635 source_link = Next_SRCL_of_SRCL (source_link); 9636 } 9637 break; 9638 } 9639 if (leo_predecessor) { 9640 for (;;) { /* for each Leo source link */ 9641 @<Add or-nodes for chain starting with |leo_predecessor|@>@; 9642 if (!source_link) break; 9643 leo_predecessor = Predecessor_of_SRCL (source_link); 9644 cause_earley_item = Cause_of_SRCL (source_link); 9645 source_link = Next_SRCL_of_SRCL (source_link); 9646 } 9647 } 9648} 9649 9650@ The main loop in this code deliberately skips the first leo predecessor. 
The successor of the first leo predecessor is the base of the Leo path,
which already exists, and therefore the first leo predecessor is not
expanded.
@ The unwrapping of the information for the Leo path item is quite the
process, and some memoization might be useful.
But it is not clear that memoization does more than move
the processing from one place to another, increasing space
requirements in the process.
@<Add or-nodes for chain starting with |leo_predecessor|@> =
{
  LIM this_leo_item = leo_predecessor;
  LIM previous_leo_item = this_leo_item;
  while ((this_leo_item = Predecessor_LIM_of_LIM (this_leo_item)))
    {
      const gint ordinal_of_set_of_this_leo_item = Ord_of_ES(ES_of_LIM(this_leo_item));
      const AIM path_ahfa_item = Path_AIM_of_LIM(previous_leo_item);
      const RULE path_rule = RULE_of_AIM(path_ahfa_item);
      const gint symbol_instance_of_path_ahfa_item = SYMI_of_AIM(path_ahfa_item);
      @<Add main Leo path or-node@>@;
      @<Add Leo path nulling token or-nodes@>@;
      previous_leo_item = this_leo_item;
    }
}

@ Get the base data for a Leo item -- its base Earley item
and the index of the relevant AHFA item.
9677@<Private function prototypes@> = 9678static inline AEX lim_base_data_get(LIM leo_item, EIM* p_base); 9679@ @<Function definitions@> = 9680static inline AEX lim_base_data_get(LIM leo_item, EIM* p_base) 9681{ 9682 const SYMID postdot = Postdot_SYMID_of_LIM (leo_item); 9683 const EIM base = Base_EIM_of_LIM(leo_item); 9684 const TRANS transition = TRANS_of_EIM_by_SYMID (base, postdot); 9685 *p_base = base; 9686 return Leo_Base_AEX_of_TRANS (transition); 9687} 9688 9689@ @d Path_AIM_of_LIM(lim) (base_aim_of_lim(lim)+1) 9690@d Base_AIM_of_LIM(lim) (base_aim_of_lim(lim)) 9691@<Private function prototypes@> = 9692static inline AIM base_aim_of_lim(LIM leo_item); 9693@ @<Function definitions@> = 9694static inline AIM base_aim_of_lim(LIM leo_item) 9695{ 9696 EIM base; 9697 const AEX base_aex = lim_base_data_get(leo_item, &base); 9698 return AIM_of_EIM_by_AEX(base, base_aex); 9699} 9700 9701@ Adds the main Leo path or-node---% 9702the non-nulling or-node which 9703corresponds to the leo predecessor. 
9704@<Add main Leo path or-node@> = 9705{ 9706 { 9707 OR or_node; 9708 PSL leo_psl; 9709#define PSL_ES_ORD ordinal_of_set_of_this_leo_item 9710#define CLAIMED_PSL leo_psl 9711 @<Claim the or-node PSL for |PSL_ES_ORD| as |CLAIMED_PSL|@>@; 9712 or_node = PSL_Datum (leo_psl, symbol_instance_of_path_ahfa_item); 9713 if (!or_node || ES_Ord_of_OR(or_node) != work_earley_set_ordinal) 9714 { 9715 @<Set |last_or_node| to a new or-node@>@; 9716 PSL_Datum (leo_psl, symbol_instance_of_path_ahfa_item) = or_node = last_or_node; 9717 Origin_Ord_of_OR(or_node) = ordinal_of_set_of_this_leo_item; 9718 ES_Ord_of_OR(or_node) = work_earley_set_ordinal; 9719 RULE_of_OR(or_node) = path_rule; 9720 Position_of_OR (or_node) = 9721 symbol_instance_of_path_ahfa_item - SYMI_of_RULE (path_rule) + 1; 9722MARPA_OFF_DEBUG3("Created or-node %s at %s", or_tag(or_node), G_STRLOC); 9723 DANDs_of_OR(or_node) = NULL; 9724MARPA_OFF_DEBUG3("or = %p, setting DAND = %p", or_node, DANDs_of_OR(or_node)); 9725 } 9726 } 9727} 9728 9729@ In building the final or-node, the predecessor can be 9730determined using the PSIA for $|symbol_instance|-1$. 9731There will always be a predecessor, since these nulling 9732or-nodes follow a completion. 9733@<Add Leo path nulling token or-nodes@> = 9734{ 9735 gint i; 9736 const gint null_count = Null_Count_of_AIM (path_ahfa_item); 9737 for (i = 1; i <= null_count; i++) 9738 { 9739 const gint symbol_instance = symbol_instance_of_path_ahfa_item + i; 9740 OR or_node = PSL_Datum (this_earley_set_psl, symbol_instance); 9741 MARPA_ASSERT (symbol_instance < SYMI_Count_of_G (g)) @; 9742 if (!or_node || ES_Ord_of_OR (or_node) != work_earley_set_ordinal) 9743 { 9744 DAND draft_and_node; 9745 const gint rhs_ix = symbol_instance - SYMI_of_RULE(path_rule); 9746 const OR predecessor = rhs_ix ? 
last_or_node : NULL; 9747 const OR cause = (OR)TOK_by_ID_of_R( r, RHS_ID_of_RULE (path_rule, rhs_ix)) ; 9748 MARPA_ASSERT (symbol_instance < Length_of_RULE (path_rule)) @; 9749 MARPA_ASSERT (symbol_instance >= 0) @; 9750 @<Set |last_or_node| to a new or-node@>@; 9751 PSL_Datum (this_earley_set_psl, symbol_instance) = or_node = last_or_node; 9752 Origin_Ord_of_OR (or_node) = ordinal_of_set_of_this_leo_item; 9753 ES_Ord_of_OR (or_node) = work_earley_set_ordinal; 9754 RULE_of_OR (or_node) = path_rule; 9755 Position_of_OR (or_node) = rhs_ix + 1; 9756MARPA_ASSERT(Position_of_OR(or_node) <= 1 || predecessor); 9757 DANDs_of_OR (or_node) = draft_and_node = 9758 draft_and_node_new (&bocage_setup_obs, predecessor, cause); 9759 MARPA_OFF_DEBUG3 ("or = %p, setting DAND = %p", or_node, 9760 DANDs_of_OR (or_node)); 9761 Next_DAND_of_DAND (draft_and_node) = NULL; 9762 } 9763 MARPA_ASSERT (Position_of_OR (or_node) <= 9764 SYMI_of_RULE (path_rule) + Length_of_RULE (path_rule)) @; 9765 MARPA_ASSERT (Position_of_OR (or_node) >= SYMI_of_RULE (path_rule)) @; 9766 } 9767} 9768 9769@** Whole Element ID (WHEID) Code. 9770The "whole elements" of the grammar are the symbols 9771and the completed rules. 9772{\bf To Do}: @^To Do@> 9773Note that this puts a limit on the number of symbols 9774and rules in a grammar --- their total must fit in an 9775int. 9776@d WHEID_of_SYMID(symid) (rule_count_of_g+(symid)) 9777@d WHEID_of_RULEID(ruleid) (ruleid) 9778@d WHEID_of_RULE(rule) WHEID_of_RULEID(ID_of_RULE(rule)) 9779@d WHEID_of_OR(or) ( 9780 wheid = OR_is_Token(or) ? 9781 WHEID_of_SYMID(SYMID_of_OR(or)) : 9782 WHEID_of_RULE(RULE_of_OR(or)) 9783 ) 9784 9785@<Private typedefs@> = 9786typedef gint WHEID; 9787 9788 9789@** Draft And-Node (DAND) Code. 9790The draft and-nodes are used while the bocage is 9791being built. 9792Both draft and final and-nodes contain the predecessor 9793and cause. 9794Draft and-nodes need to be in a linked list, 9795so they have a link to the next and-node. 
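The list discipline can be sketched in isolation. The following is a minimal illustration under assumptions, not the Marpa implementation: it uses |malloc| where the real code allocates from an obstack, it identifies or-nodes by integer IDs instead of pointers, and the names |sketch_dand|, |sketch_dand_add|, |sketch_dand_count| and |sketch_dand_free| are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical draft and-node: predecessor and cause, plus the link
   that chains the draft and-nodes of one parent or-node together. */
struct sketch_dand {
  struct sketch_dand *next;
  int predecessor_id;           /* -1 stands for "no predecessor" */
  int cause_id;
};

/* Prepend a new draft and-node and return the new head of the list.
   If allocation fails the list is returned unchanged. */
static struct sketch_dand *sketch_dand_add(struct sketch_dand *head,
                                           int predecessor_id, int cause_id)
{
  struct sketch_dand *node = (struct sketch_dand *)malloc(sizeof *node);
  if (!node) return head;
  node->predecessor_id = predecessor_id;
  node->cause_id = cause_id;
  node->next = head;
  return node;
}

/* Count the draft and-nodes on a list. */
static int sketch_dand_count(const struct sketch_dand *head)
{
  int count = 0;
  for (; head; head = head->next) count++;
  return count;
}

/* Release every node on a list. */
static void sketch_dand_free(struct sketch_dand *head)
{
  while (head) {
    struct sketch_dand *next = head->next;
    free(head);
    head = next;
  }
}
```

Prepending keeps insertion constant-time, at the cost that the most recently added draft and-node is always first on the list.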
@<Private incomplete structures@> =
struct s_draft_and_node;
typedef struct s_draft_and_node* DAND;
@
@d Next_DAND_of_DAND(dand) ((dand)->t_next)
@d Predecessor_OR_of_DAND(dand) ((dand)->t_predecessor)
@d Cause_OR_of_DAND(dand) ((dand)->t_cause)
@<Private structures@> =
struct s_draft_and_node {
    DAND t_next;
    OR t_predecessor;
    OR t_cause;
};
typedef struct s_draft_and_node DAND_Object;

@ @<Private function prototypes@> =
static inline
DAND draft_and_node_new(struct obstack *obs, OR predecessor, OR cause);
@ @<Function definitions@> =
static inline
DAND draft_and_node_new(struct obstack *obs, OR predecessor, OR cause)
{
    DAND draft_and_node = obstack_alloc (obs, sizeof(DAND_Object));
    Predecessor_OR_of_DAND(draft_and_node) = predecessor;
    Cause_OR_of_DAND(draft_and_node) = cause;
    MARPA_ASSERT(cause);
    return draft_and_node;
}

@ Currently, I do not check draft and-nodes for duplicates.
This will be done when they are copied to final and-nodes.
In the future, it may be more efficient to do a linear search for
duplicates until the number of draft and-nodes reaches a small
constant $n$.
(Optimal $n$ is perhaps something like 7.)
Alternatively, it could always check for duplicates, but limit
the search to the first $n$ draft and-nodes.
@ In that case, the logic to copy the final and-nodes can
rely on chains of length less than $n$ being non-duplicated,
and the PSARs can be reserved for the unusual case where this
is not sufficient.
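The bounded search suggested above might look like the following sketch. It is an assumption-laden illustration, not code from Marpa: or-nodes are represented by integer IDs, allocation is by |malloc| instead of an obstack, and the names |bounded_dand|, |bounded_dand_seen|, |bounded_dand_add|, |bounded_dand_free| and the constant |BOUNDED_SEARCH_LIMIT| are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical search bound; the text suggests a value near 7. */
#define BOUNDED_SEARCH_LIMIT 7

struct bounded_dand {
  struct bounded_dand *next;
  int predecessor_id;
  int cause_id;
};

/* Return 1 if one of the first "limit" nodes on the list already
   holds this predecessor and cause pair, 0 otherwise. */
static int bounded_dand_seen(const struct bounded_dand *head,
                             int predecessor_id, int cause_id, int limit)
{
  int examined = 0;
  while (head && examined < limit) {
    if (head->predecessor_id == predecessor_id && head->cause_id == cause_id)
      return 1;
    head = head->next;
    examined++;
  }
  return 0;
}

/* Prepend a draft and-node unless the bounded search finds a duplicate.
   Returns 1 if a node was added, 0 if it was skipped or allocation failed. */
static int bounded_dand_add(struct bounded_dand **p_head,
                            int predecessor_id, int cause_id)
{
  struct bounded_dand *node;
  if (bounded_dand_seen(*p_head, predecessor_id, cause_id,
                        BOUNDED_SEARCH_LIMIT))
    return 0;
  node = (struct bounded_dand *)malloc(sizeof *node);
  if (!node) return 0;
  node->next = *p_head;
  node->predecessor_id = predecessor_id;
  node->cause_id = cause_id;
  *p_head = node;
  return 1;
}

/* Release every node on a list. */
static void bounded_dand_free(struct bounded_dand *head)
{
  while (head) {
    struct bounded_dand *next = head->next;
    free(head);
    head = next;
  }
}
```

Because new nodes are prepended, the first $n$ nodes examined are the $n$ most recently added, so a duplicate of an older node can still slip through once a list grows past the limit. That residue is what the PSAR-based pass would be reserved for.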
@<Private function prototypes@> =
static inline
void draft_and_node_add(struct obstack *obs, OR parent, OR predecessor, OR cause);
@ @<Function definitions@> =
static inline
void draft_and_node_add(struct obstack *obs, OR parent, OR predecessor, OR cause)
{
    MARPA_ASSERT(Position_of_OR(parent) <= 1 || predecessor)@;
    const DAND new = draft_and_node_new(obs, predecessor, cause);
    Next_DAND_of_DAND(new) = DANDs_of_OR(parent);
    DANDs_of_OR(parent) = new;
}

@ @<Create the draft and-nodes for |work_earley_set_ordinal|@> =
{
    gint item_ordinal;
    for (item_ordinal = 0; item_ordinal < item_count; item_ordinal++)
    {
        OR* const nodes_by_aex = nodes_by_item[item_ordinal];
        if (nodes_by_aex) {
            const EIM work_earley_item = eims_of_es[item_ordinal];
            const gint work_ahfa_item_count = AIM_Count_of_EIM(work_earley_item);
            const gint work_origin_ordinal = Ord_of_ES (Origin_of_EIM (work_earley_item));
            AEX work_aex;
            for (work_aex = 0; work_aex < work_ahfa_item_count; work_aex++) {
                OR or_node = nodes_by_aex[work_aex];
                Move_OR_to_Proper_OR(or_node);
                if (or_node) {
                    @<Create draft and-nodes for |or_node|@>@;
                }
            }
        }
    }
}

@ From an or-node, which may be nulling, determine its proper
predecessor. Set |or_node| to 0 if there is none.
9874@d Move_OR_to_Proper_OR(or_node) { 9875 while (or_node) { 9876 DAND draft_and_node = DANDs_of_OR(or_node); 9877 OR predecessor_or; 9878 if (!draft_and_node) break; 9879 predecessor_or = Predecessor_OR_of_DAND (draft_and_node); 9880 if (predecessor_or && 9881 ES_Ord_of_OR (predecessor_or) != work_earley_set_ordinal) 9882 break; 9883 or_node = predecessor_or; 9884 } 9885} 9886 9887@ @<Create draft and-nodes for |or_node|@> = 9888{ 9889 guint work_source_type = Source_Type_of_EIM (work_earley_item); 9890 const AIM work_ahfa_item = AIM_of_EIM_by_AEX (work_earley_item, work_aex); 9891 MARPA_ASSERT (work_ahfa_item >= AIM_by_ID (1))@; 9892 const AIM work_predecessor_aim = work_ahfa_item - 1; 9893 const gint work_symbol_instance = SYMI_of_AIM (work_ahfa_item); 9894 OR work_proper_or_node; 9895 Set_OR_from_Ord_and_SYMI (work_proper_or_node, work_origin_ordinal, 9896 work_symbol_instance); 9897 9898 @<Create Leo draft and-nodes@>@; 9899 @<Create draft and-nodes for token sources@>@; 9900 @<Create draft and-nodes for completion sources@>@; 9901} 9902 9903@ @<Create Leo draft and-nodes@> = { 9904 SRCL source_link = NULL; 9905 EIM cause_earley_item = NULL; 9906 LIM leo_predecessor = NULL; 9907 switch (Source_Type_of_EIM(work_earley_item)) 9908 { 9909 case SOURCE_IS_LEO: 9910 leo_predecessor = Predecessor_of_EIM (work_earley_item); 9911 cause_earley_item = Cause_of_EIM (work_earley_item); 9912 break; 9913 case SOURCE_IS_AMBIGUOUS: 9914 source_link = First_Leo_SRCL_of_EIM (work_earley_item); 9915 if (source_link) 9916 { 9917 leo_predecessor = Predecessor_of_SRCL (source_link); 9918 cause_earley_item = Cause_of_SRCL (source_link); 9919 source_link = Next_SRCL_of_SRCL (source_link); 9920 } 9921 break; 9922 } 9923 if (leo_predecessor) { 9924 for (;;) { /* for each Leo source link */ 9925 @<Add draft and-nodes for chain starting with |leo_predecessor|@>@; 9926 if (!source_link) break; 9927 leo_predecessor = Predecessor_of_SRCL (source_link); 9928 cause_earley_item = Cause_of_SRCL 
(source_link); 9929 source_link = Next_SRCL_of_SRCL (source_link); 9930 } 9931 } 9932} 9933 9934@ Note that in a trivial path the bottom is also the top. 9935@<Add draft and-nodes for chain starting with |leo_predecessor|@> = 9936{ 9937 /* The rule for the Leo path Earley item */ 9938 RULE path_rule = NULL; 9939 /* The rule for the previous Leo path Earley item */ 9940 RULE previous_path_rule; 9941 LIM path_leo_item = leo_predecessor; 9942 LIM higher_path_leo_item = Predecessor_LIM_of_LIM(path_leo_item); 9943 /* A boolean to indicate whether is true is there is some 9944 section of a non-trivial path left unprocessed. */ 9945 OR dand_predecessor; 9946 OR path_or_node; 9947 EIM base_earley_item; 9948 AEX base_aex = lim_base_data_get(path_leo_item, &base_earley_item); 9949 Set_OR_from_EIM_and_AEX(dand_predecessor, base_earley_item, base_aex); 9950 @<Set |path_or_node|@>@; 9951 @<Add draft and-nodes to the bottom or-node@>@; 9952 previous_path_rule = path_rule; 9953 while (higher_path_leo_item) { 9954 path_leo_item = higher_path_leo_item; 9955 higher_path_leo_item = Predecessor_LIM_of_LIM(path_leo_item); 9956 base_aex = lim_base_data_get(path_leo_item, &base_earley_item); 9957 Set_OR_from_EIM_and_AEX(dand_predecessor, base_earley_item, base_aex); 9958 @<Set |path_or_node|@>@; 9959 @<Add the draft and-nodes to an upper Leo path or-node@>@; 9960 previous_path_rule = path_rule; 9961 } 9962} 9963 9964@ @<Set |path_or_node|@> = 9965{ 9966 if (higher_path_leo_item) { 9967 @<Use Leo base data to set |path_or_node|@>@; 9968 } else { 9969 path_or_node = work_proper_or_node; 9970 } 9971} 9972 9973@ @d Set_OR_from_Ord_and_SYMI(or_node, origin, symbol_instance) { 9974 const PSL or_psl_at_origin = per_es_data[(origin)].t_or_psl; 9975 (or_node) = PSL_Datum (or_psl_at_origin, (symbol_instance)); 9976} 9977 9978@ @<Add draft and-nodes to the bottom or-node@> = 9979{ 9980 const SYMID transition_symbol_id = Postdot_SYMID_of_LIM (leo_predecessor); 9981 const TRANS cause_completion_data 
= 9982 TRANS_of_EIM_by_SYMID (cause_earley_item, transition_symbol_id); 9983 const gint aex_count = Completion_Count_of_TRANS (cause_completion_data); 9984 const AEX *const aexes = AEXs_of_TRANS (cause_completion_data); 9985 gint ix; 9986 for (ix = 0; ix < aex_count; ix++) 9987 { 9988 const AEX cause_aex = aexes[ix]; 9989 OR dand_cause; 9990 Set_OR_from_EIM_and_AEX(dand_cause, cause_earley_item, cause_aex); 9991 draft_and_node_add (&bocage_setup_obs, path_or_node, 9992 dand_predecessor, dand_cause); 9993 } 9994} 9995 9996@ It is assumed that there is an or-node entry for 9997|psia_eim| and |psia_aex|. 9998@d Set_OR_from_EIM_and_AEX(psia_or, psia_eim, psia_aex) { 9999 const EIM psia_earley_item = psia_eim; 10000 const gint psia_earley_set_ordinal = ES_Ord_of_EIM (psia_earley_item); 10001 OR **const psia_nodes_by_item = 10002 per_es_data[psia_earley_set_ordinal].t_aexes_by_item; 10003 const gint psia_item_ordinal = Ord_of_EIM (psia_earley_item); 10004 OR *const psia_nodes_by_aex = psia_nodes_by_item[psia_item_ordinal]; 10005 psia_or = psia_nodes_by_aex ? 
psia_nodes_by_aex[psia_aex] : NULL; 10006} 10007 10008@ @<Use Leo base data to set |path_or_node|@> = 10009{ 10010 gint symbol_instance; 10011 const gint origin_ordinal = Origin_Ord_of_EIM (base_earley_item); 10012 const AIM aim = AIM_of_EIM_by_AEX (base_earley_item, base_aex); 10013 path_rule = RULE_of_AIM (aim); 10014 symbol_instance = Last_Proper_SYMI_of_RULE (path_rule); 10015 Set_OR_from_Ord_and_SYMI (path_or_node, origin_ordinal, symbol_instance); 10016} 10017 10018@ @<Add the draft and-nodes to an upper Leo path or-node@> = 10019{ 10020 OR dand_cause; 10021 const SYMI symbol_instance = SYMI_of_Completed_RULE(previous_path_rule); 10022 const gint origin_ordinal = Ord_of_ES(ES_of_LIM(path_leo_item)); 10023 Set_OR_from_Ord_and_SYMI(dand_cause, origin_ordinal, symbol_instance); 10024 draft_and_node_add (&bocage_setup_obs, path_or_node, 10025 dand_predecessor, dand_cause); 10026} 10027 10028@ @<Create draft and-nodes for token sources@> = 10029{ 10030 SRCL source_link = NULL; 10031 EIM predecessor_earley_item = NULL; 10032 TOK token = NULL; 10033 switch (work_source_type) 10034 { 10035 case SOURCE_IS_TOKEN: 10036 predecessor_earley_item = Predecessor_of_EIM (work_earley_item); 10037 token = TOK_of_EIM(work_earley_item); 10038 break; 10039 case SOURCE_IS_AMBIGUOUS: 10040 source_link = First_Token_Link_of_EIM (work_earley_item); 10041 if (source_link) 10042 { 10043 predecessor_earley_item = Predecessor_of_SRCL (source_link); 10044 token = TOK_of_SRCL(source_link); 10045 source_link = Next_SRCL_of_SRCL (source_link); 10046 } 10047 } 10048 while (token) 10049 { 10050 @<Add draft and-node for token source@>@; 10051 if (!source_link) break; 10052 predecessor_earley_item = Predecessor_of_SRCL (source_link); 10053 token = TOK_of_SRCL(source_link); 10054 source_link = Next_SRCL_of_SRCL (source_link); 10055 } 10056} 10057 10058@ @<Add draft and-node for token source@> = 10059{ 10060 OR dand_predecessor; 10061 @<Set |dand_predecessor|@>@; 10062 draft_and_node_add 
(&bocage_setup_obs, work_proper_or_node, 10063 dand_predecessor, (OR)token); 10064} 10065 10066@ @<Set |dand_predecessor|@> = 10067{ 10068 if (Position_of_AIM(work_predecessor_aim) < 1) { 10069 dand_predecessor = NULL; 10070 } else { 10071 const AEX predecessor_aex = 10072 AEX_of_EIM_by_AIM (predecessor_earley_item, work_predecessor_aim); 10073 Set_OR_from_EIM_and_AEX(dand_predecessor, predecessor_earley_item, predecessor_aex); 10074 } 10075} 10076 10077@ @<Create draft and-nodes for completion sources@> = 10078{ 10079 SRCL source_link = NULL; 10080 EIM predecessor_earley_item = NULL; 10081 EIM cause_earley_item = NULL; 10082 const SYMID transition_symbol_id = Postdot_SYMID_of_AIM(work_predecessor_aim); 10083 switch (work_source_type) 10084 { 10085 case SOURCE_IS_COMPLETION: 10086 predecessor_earley_item = Predecessor_of_EIM (work_earley_item); 10087 cause_earley_item = Cause_of_EIM (work_earley_item); 10088 break; 10089 case SOURCE_IS_AMBIGUOUS: 10090 source_link = First_Completion_Link_of_EIM (work_earley_item); 10091 if (source_link) 10092 { 10093 predecessor_earley_item = Predecessor_of_SRCL (source_link); 10094 cause_earley_item = Cause_of_SRCL (source_link); 10095 source_link = Next_SRCL_of_SRCL (source_link); 10096 } 10097 break; 10098 } 10099 while (cause_earley_item) 10100 { 10101 const TRANS cause_completion_data = 10102 TRANS_of_EIM_by_SYMID (cause_earley_item, transition_symbol_id); 10103 const gint aex_count = Completion_Count_of_TRANS (cause_completion_data); 10104 const AEX * const aexes = AEXs_of_TRANS (cause_completion_data); 10105 gint ix; 10106 for (ix = 0; ix < aex_count; ix++) { 10107 const AEX cause_aex = aexes[ix]; 10108 @<Add draft and-node for completion source@>@; 10109 } 10110 if (!source_link) break; 10111 predecessor_earley_item = Predecessor_of_SRCL (source_link); 10112 cause_earley_item = Cause_of_SRCL (source_link); 10113 source_link = Next_SRCL_of_SRCL (source_link); 10114 } 10115} 10116 10117@ @<Add draft and-node for completion 
source@> =
{
  OR dand_predecessor;
  OR dand_cause;
  const gint middle_ordinal = Origin_Ord_of_EIM(cause_earley_item);
  const AIM cause_ahfa_item = AIM_of_EIM_by_AEX(cause_earley_item, cause_aex);
  const SYMI cause_symbol_instance =
      SYMI_of_Completed_RULE(RULE_of_AIM(cause_ahfa_item));
  @<Set |dand_predecessor|@>@;
  Set_OR_from_Ord_and_SYMI(dand_cause, middle_ordinal, cause_symbol_instance);
  draft_and_node_add (&bocage_setup_obs, work_proper_or_node,
          dand_predecessor, dand_cause);
}

@ @<Mark duplicate draft and-nodes@> =
{
  OR * const or_nodes_of_b = ORs_of_B (b);
  const gint or_node_count_of_b = OR_Count_of_B(b);
  PSAR_Object and_per_es_arena;
  const PSAR and_psar = &and_per_es_arena;
  gint or_node_id = 0;
  psar_init (and_psar, rule_count_of_g+symbol_count_of_g);
  while (or_node_id < or_node_count_of_b) {
      const OR work_or_node = or_nodes_of_b[or_node_id];
    @<Mark the duplicate draft and-nodes for |work_or_node|@>@;
    or_node_id++;
  }
  psar_destroy (and_psar);
}

@ I think the PSL's for and-nodes and the PSL's for or-nodes
are never in use at the same time,
so the same field might be used for both.
More significantly, a simple $O(n^2)$ sort of the
draft and-nodes would spot duplicates more efficiently in 99\%
of cases, although it would not be $O(n)$ as the PSL's are.
The best of both worlds could be had by using the sort when
there are fewer than, say, 7 and-nodes, and the PSL's otherwise.
@ The use of PSL's is slightly different here.
The PSL is not needed to find the draft and-nodes -- it's
essentially just a boolean to indicate whether it exists.
But "stale" booleans must still be detected.
The solution adopted is to put the parent or-node
into the PSL.
If the PSL contains the current parent or-node,
the draft and-node is a duplicate within that or-node.
Otherwise, it's the first such draft and-node.
@<Mark the duplicate draft and-nodes for |work_or_node|@> =
{
  DAND dand = DANDs_of_OR (work_or_node);
  DAND next_dand = Next_DAND_of_DAND (dand);
  ORID work_or_node_id = ID_of_OR(work_or_node);
  /* Only if there is more than one draft and-node */
  if (next_dand)
    {
      gint origin_ordinal = Origin_Ord_of_OR (work_or_node);
      psar_dealloc(and_psar);
      while (dand)
        {
          OR psl_or_node;
          OR predecessor = Predecessor_OR_of_DAND (dand);
          WHEID wheid = WHEID_of_OR(Cause_OR_of_DAND(dand));
          const gint middle_ordinal =
            predecessor ? ES_Ord_of_OR (predecessor) : origin_ordinal;
          PSL and_psl;
          PSL *psl_owner = &per_es_data[middle_ordinal].t_and_psl;
          /* The or-node used as a boolean in the PSL */
          if (!*psl_owner) psl_claim (psl_owner, and_psar);
          and_psl = *psl_owner;
          psl_or_node = PSL_Datum(and_psl, wheid);
          if (psl_or_node && ID_of_OR(psl_or_node) == work_or_node_id)
          {
            /* Mark this draft and-node as a duplicate */
            Cause_OR_of_DAND(dand) = NULL;
          } else {
            /* Increment the count of unique draft and-nodes */
            PSL_Datum(and_psl, wheid) = work_or_node;
            unique_draft_and_node_count++;
          }
          dand = Next_DAND_of_DAND (dand);
        }
    } else {
      unique_draft_and_node_count++;
    }
}

@** And-Node (AND) Code.
The and-nodes are part of the parse bocage.
They are analogous to the and-nodes of a standard parse forest,
except that they are binary -- restricted to two children.
This means that the parse bocage stores the parse in a kind
of Chomsky Normal Form.
As another difference between it and a parse forest,
the parse bocage can contain cycles.

@<Public typedefs@> =
typedef gint Marpa_And_Node_ID;
@ @<Private typedefs@> =
typedef Marpa_And_Node_ID ANDID;

@ @<Private incomplete structures@> =
struct s_and_node;
typedef struct s_and_node* AND;
@
@d OR_of_AND(and) ((and)->t_current)
@d Predecessor_OR_of_AND(and) ((and)->t_predecessor)
@d Cause_OR_of_AND(and) ((and)->t_cause)
@<Private structures@> =
struct s_and_node {
    OR t_current;
    OR t_predecessor;
    OR t_cause;
};
typedef struct s_and_node AND_Object;

@ @<Create the final and-nodes for all earley sets@> =
{
  gint unique_draft_and_node_count = 0;
  @<Mark duplicate draft and-nodes@>@;
  @<Create the final and-node array@>@;
}

@ @<Create the final and-node array@> =
{
  const gint or_count_of_b = OR_Count_of_B (b);
  gint or_node_id;
  gint and_node_id = 0;
  const OR *ors_of_b = ORs_of_B (b);
  const AND ands_of_b = ANDs_of_B (b) =
    g_new (AND_Object, unique_draft_and_node_count);
  for (or_node_id = 0; or_node_id < or_count_of_b; or_node_id++)
    {
      gint and_count_of_parent_or = 0;
      const OR or_node = ors_of_b[or_node_id];
      DAND dand = DANDs_of_OR (or_node);
      First_ANDID_of_OR(or_node) = and_node_id;
      while (dand)
        {
          const OR cause_or_node = Cause_OR_of_DAND (dand);
          if (cause_or_node)
            { /* Duplicate draft and-nodes
              were marked by nulling the cause or-node */
              const AND and_node = ands_of_b + and_node_id;
              OR_of_AND (and_node) = or_node;
              Predecessor_OR_of_AND (and_node) =
                Predecessor_OR_of_DAND (dand);
              Cause_OR_of_AND (and_node) = cause_or_node;
              and_node_id++;
              and_count_of_parent_or++;
            }
            dand = Next_DAND_of_DAND(dand);
        }
        AND_Count_of_OR(or_node) = and_count_of_parent_or;
    }
    AND_Count_of_B (b) = and_node_id;
    MARPA_ASSERT(and_node_id ==
unique_draft_and_node_count); 10272} 10273 10274@*0 Trace Functions. 10275 10276@ @<Private function prototypes@> = 10277gint marpa_and_node_count(struct marpa_r *r); 10278@ @<Function definitions@> = 10279gint marpa_and_node_count(struct marpa_r *r) 10280{ 10281 BOC b = B_of_R(r); 10282 @<Return |-2| on failure@>@; 10283 @<Fail if recognizer has fatal error@>@; 10284 if (!b) { 10285 R_ERROR("no bocage"); 10286 return failure_indicator; 10287 } 10288 return AND_Count_of_B(b); 10289} 10290 10291@ @<Check |r| and |and_node_id|; set |and_node|@> = { 10292 BOC b = B_of_R(r); 10293 AND and_nodes; 10294 @<Fail if recognizer has fatal error@>@; 10295 if (!b) { 10296 R_ERROR("no bocage"); 10297 return failure_indicator; 10298 } 10299 and_nodes = ANDs_of_B(b); 10300 if (!and_nodes) { 10301 R_ERROR("no and nodes"); 10302 return failure_indicator; 10303 } 10304 if (and_node_id < 0) { 10305 R_ERROR("bad and node id"); 10306 return failure_indicator; 10307 } 10308 if (and_node_id >= AND_Count_of_B(b)) { 10309 return -1; 10310 } 10311 and_node = and_nodes + and_node_id; 10312} 10313 10314@ @<Private function prototypes@> = 10315gint marpa_and_node_parent(struct marpa_r *r, int and_node_id); 10316@ @<Function definitions@> = 10317gint marpa_and_node_parent(struct marpa_r *r, int and_node_id) 10318{ 10319 AND and_node; 10320 @<Return |-2| on failure@>@; 10321 @<Check |r| and |and_node_id|; set |and_node|@>@; 10322 return ID_of_OR (OR_of_AND (and_node)); 10323} 10324 10325@ @<Private function prototypes@> = 10326gint marpa_and_node_predecessor(struct marpa_r *r, int and_node_id); 10327@ @<Function definitions@> = 10328gint marpa_and_node_predecessor(struct marpa_r *r, int and_node_id) 10329{ 10330 AND and_node; 10331 @<Return |-2| on failure@>@; 10332 @<Check |r| and |and_node_id|; set |and_node|@>@; 10333 { 10334 const OR predecessor_or = Predecessor_OR_of_AND (and_node); 10335 const ORID predecessor_or_id = 10336 predecessor_or ? 
ID_of_OR (predecessor_or) : -1; 10337 return predecessor_or_id; 10338 } 10339} 10340 10341@ @<Private function prototypes@> = 10342gint marpa_and_node_cause(struct marpa_r *r, int and_node_id); 10343@ @<Function definitions@> = 10344gint marpa_and_node_cause(struct marpa_r *r, int and_node_id) 10345{ 10346 AND and_node; 10347 @<Return |-2| on failure@>@; 10348 @<Check |r| and |and_node_id|; set |and_node|@>@; 10349 { 10350 const OR cause_or = Cause_OR_of_AND (and_node); 10351 const ORID cause_or_id = 10352 OR_is_Token(cause_or) ? -1 : ID_of_OR (cause_or); 10353 return cause_or_id; 10354 } 10355} 10356 10357@ @<Private function prototypes@> = 10358gint marpa_and_node_symbol(struct marpa_r *r, int and_node_id); 10359@ @<Function definitions@> = 10360gint marpa_and_node_symbol(struct marpa_r *r, int and_node_id) 10361{ 10362 AND and_node; 10363 @<Return |-2| on failure@>@; 10364 @<Check |r| and |and_node_id|; set |and_node|@>@; 10365 { 10366 const OR cause_or = Cause_OR_of_AND (and_node); 10367 const SYMID symbol_id = 10368 OR_is_Token(cause_or) ? SYMID_of_OR(cause_or) : -1; 10369 return symbol_id; 10370 } 10371} 10372 10373@ Returns the data for the token of the and-node. 10374The symbol id is the return value, 10375and the token value is placed 10376in the location pointed 10377to by |value_p|, if that is non-null. 10378If |and_node_id| is not the ID of an and-node 10379whose cause is a token, 10380returns -1, 10381without changing |*value_p|. 10382On hard failure, returns -2 without changing 10383|*value_p|. 10384\par 10385There is no function to simply return the token value -- 10386because of the need to indicate errors, it is just as 10387easy to return the symbol ID as well. 
@<Public function prototypes@> =
Marpa_Symbol_ID marpa_and_node_token(struct marpa_r *r,
    Marpa_And_Node_ID and_node_id, gpointer* value_p);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_and_node_token(struct marpa_r *r,
    Marpa_And_Node_ID and_node_id, gpointer* value_p)
{
  AND and_node;
  @<Return |-2| on failure@>@;
  @<Check |r| and |and_node_id|; set |and_node|@>@;
  return and_node_token(and_node, value_p);
}
@ @<Private function prototypes@> =
SYMID and_node_token(AND and_node, gpointer* value_p);
@ @<Function definitions@> =
SYMID and_node_token(AND and_node, gpointer* value_p)
{
  const OR cause_or = Cause_OR_of_AND (and_node);
  if (OR_is_Token (cause_or))
    {
      const TOK token = TOK_of_OR (cause_or);
      if (value_p)
        *value_p = Value_of_TOK (token);
      return SYMID_of_TOK (token);
    }
  return -1;
}

@** Parse Bocage Code (BOC).
@ Pre-initialization makes the elements safe for the deallocation logic
to be called. Often this means setting the value to zero, so that the deallocation
logic knows when {\bf not} to try deallocating a not-yet-initialized value.
@<Private incomplete structures@> =
struct s_bocage;
typedef struct s_bocage* BOC;
@ @<Bocage structure@> =
struct s_bocage {
    @<Widely aligned bocage elements@>@;
    @<Int aligned bocage elements@>@;
    @<Bit aligned bocage elements@>@;
};
typedef struct s_bocage BOC_Object;
@ @d B_of_R(r) ((r)->t_bocage)
@<Widely aligned recognizer elements@> =
BOC t_bocage;
@ @<Initialize recognizer elements@> =
B_of_R(r) = NULL;

@*0 The Bocage Obstack.
An obstack with the lifetime of the bocage.
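As a rough illustration of what ``an obstack with the lifetime of its owner'' means in practice,
here is a small standalone sketch.
It assumes the GNU obstack interface from glibc's {\tt obstack.h};
the struct and the three toy functions are hypothetical names invented for the example,
not part of this library.

```c
#include <assert.h>
#include <obstack.h>
#include <stdlib.h>

/* GNU obstacks expect these two macros to be defined by the user. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical owner object: the flag records whether obs may be freed. */
struct toy_bocage {
  struct obstack obs;
  unsigned int is_obstack_initialized:1;
};

static void toy_bocage_init(struct toy_bocage *b)
{
  b->is_obstack_initialized = 1;
  obstack_init(&b->obs);
}

/* Memory handed out here is never freed piece by piece;
   it goes away when the whole obstack does. */
static int *toy_bocage_alloc_ints(struct toy_bocage *b, int count)
{
  assert(b->is_obstack_initialized);
  return obstack_alloc(&b->obs, sizeof(int) * (size_t)count);
}

/* Safe to call repeatedly: the flag guards the free. */
static void toy_bocage_destroy(struct toy_bocage *b)
{
  if (b->is_obstack_initialized) {
    obstack_free(&b->obs, NULL);
    b->is_obstack_initialized = 0;
  }
}
```

The point of the flag is that the destructor can run at any time,
including twice in a row, without touching an obstack that is not live.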
@d OBS_of_B(b) ((b)->t_obs)
@<Widely aligned bocage elements@> =
struct obstack t_obs;
@ @<Bit aligned bocage elements@> =
unsigned int is_obstack_initialized:1;
@ @<Initialize bocage elements@> =
b->is_obstack_initialized = 1;
obstack_init(&OBS_of_B(b));
@ @<Destroy bocage elements, final phase@> =
if (b->is_obstack_initialized) {
    obstack_free(&OBS_of_B(b), NULL);
    b->is_obstack_initialized = 0;
}

@*0 Bocage Construction.
@ This function returns 0 for a null parse,
and the ID of the start or-node for a non-null parse.
If there is no parse, -1 is returned.
On other failures, -2 is returned.
Note that, even though 0 is a valid or-node ID,
this does not conflict with returning 0 for a null parse.
Or-node 0 must be in the first Earley set,
and any parse whose top or-node is in the first
Earley set must be a null parse.
@<Public function prototypes@> =
gint marpa_bocage_new(struct marpa_r* r, Marpa_Rule_ID rule_id, Marpa_Earley_Set_ID ordinal);
@ @<Function definitions@> =
gint marpa_bocage_new(struct marpa_r* r, Marpa_Rule_ID rule_id, Marpa_Earley_Set_ID ordinal) {
    @<Return |-2| on failure@>@;
    ORID top_or_node_id = failure_indicator;
    const gint no_parse = -1;
    @<Declare bocage locals@>@;
    r_update_earley_sets(r);
    @<Return if function guards fail;
        set |end_of_parse_es| and |completed_start_rule|@>@;
    b = B_of_R(r) = g_slice_new(BOC_Object);
MARPA_DEBUG3("%s new bocage B_of_R=%p", G_STRLOC, B_of_R(r));
    @<Initialize bocage elements@>@;
    @<Deal with null parse as a special case@>@;
    @<Find |start_eim|, |start_aim| and |start_aex|@>@;
    if (!start_eim) goto SOFT_ERROR;
    Phase_of_R(r) = evaluation_phase;
    obstack_init(&bocage_setup_obs);
    @<Allocate bocage setup working data@>@;
    @<Populate the PSIA data@>@;
10486 @<Create the or-nodes for all earley sets@>@; 10487 @<Create the final and-nodes for all earley sets@>@; 10488 @<Set |top_or_node_id|@>@; 10489 obstack_free(&bocage_setup_obs, NULL); 10490 Top_ORID_of_B(b) = top_or_node_id; 10491 return top_or_node_id; 10492 SOFT_ERROR: ; 10493 @<Destroy bocage elements, all phases@>; 10494 return no_parse; 10495} 10496 10497@ @<Declare bocage locals@> = 10498const GRAMMAR_Const g = G_of_R(r); 10499const gint rule_count_of_g = RULE_Count_of_G(g); 10500const gint symbol_count_of_g = SYM_Count_of_G(g); 10501BOC b; 10502ES end_of_parse_es; 10503RULE completed_start_rule; 10504EIM start_eim = NULL; 10505AIM start_aim = NULL; 10506AEX start_aex = -1; 10507struct obstack bocage_setup_obs; 10508gint total_earley_items_in_parse; 10509gint or_node_estimate = 0; 10510const gint earley_set_count_of_r = ES_Count_of_R (r); 10511 10512@ @<Private incomplete structures@> = 10513struct s_bocage_setup_per_es; 10514@ @<Private structures@> = 10515struct s_bocage_setup_per_es { 10516 OR ** t_aexes_by_item; 10517 PSL t_or_psl; 10518 PSL t_and_psl; 10519}; 10520@ @<Declare bocage locals@> = 10521struct s_bocage_setup_per_es* per_es_data = NULL; 10522 10523@ @<Return if function guards fail; 10524set |end_of_parse_es| and |completed_start_rule|@> = 10525{ 10526 EARLEME end_of_parse_earleme; 10527 @<Fail if recognizer has fatal error@>@; 10528 if (B_of_R(r)) { 10529 R_ERROR ("bocage in use"); 10530 return failure_indicator; 10531 } 10532 switch (Phase_of_R (r)) 10533 { 10534 default: 10535 R_ERROR ("recce not evaluation-ready"); 10536 return failure_indicator; 10537 case input_phase: 10538 case evaluation_phase: 10539 break; 10540 } 10541 10542MARPA_OFF_DEBUG2("ordinal=%d", ordinal); 10543 if (ordinal == -1) 10544 { 10545 end_of_parse_es = Current_ES_of_R (r); 10546 } 10547 else 10548 { // ordinal != -1 10549 if (!ES_Ord_is_Valid (r, ordinal)) 10550 { 10551 R_ERROR ("invalid es ordinal"); 10552 return failure_indicator; 10553 } 10554 
end_of_parse_es = ES_of_R_by_Ord (r, ordinal); 10555 } 10556 10557 if (!end_of_parse_es) 10558 return no_parse; 10559 ordinal = Ord_of_ES(end_of_parse_es); 10560 end_of_parse_earleme = Earleme_of_ES (end_of_parse_es); 10561 if (rule_id == -1) { 10562 completed_start_rule = 10563 end_of_parse_earleme ? g->t_proper_start_rule : g->t_null_start_rule; 10564 if (!completed_start_rule) 10565 return no_parse; 10566 } else { 10567 if (!RULEID_of_G_is_Valid (g, rule_id)) 10568 { 10569 R_ERROR ("invalid rule id"); 10570 return failure_indicator; 10571 } 10572 completed_start_rule = RULE_by_ID (g, rule_id); 10573 } 10574MARPA_OFF_DEBUG2("ordinal=%d", ordinal); 10575} 10576 10577@ @<Deal with null parse as a special case@> = 10578{ 10579 if (ordinal == 0) { // If this is a null parse 10580 gint rule_length = Length_of_RULE(completed_start_rule); 10581 OR* or_nodes = ORs_of_B (b) = g_new (OR, 1); 10582 AND and_nodes = ANDs_of_B (b) = g_new (AND_Object, 1); 10583 OR or_node = or_nodes[0] = (OR)obstack_alloc (&OBS_of_B(b), sizeof(OR_Object)); 10584 ORID null_or_node_id = 0; 10585 Top_ORID_of_B(b) = null_or_node_id; 10586 10587 OR_Count_of_B(b) = 1; 10588 AND_Count_of_B(b) = 1; 10589 10590 RULE_of_OR(or_node) = completed_start_rule; 10591 Position_of_OR(or_node) = rule_length; 10592 Origin_Ord_of_OR(or_node) = 0; 10593 ID_of_OR(or_node) = null_or_node_id; 10594 ES_Ord_of_OR(or_node) = 0; 10595 First_ANDID_of_OR(or_node) = 0; 10596 AND_Count_of_OR(or_node) = 1; 10597 10598 OR_of_AND(and_nodes) = or_node; 10599 Predecessor_OR_of_AND(and_nodes) = NULL; 10600 Cause_OR_of_AND (and_nodes) = 10601 (OR)TOK_by_ID_of_R (r, RHS_ID_of_RULE (completed_start_rule, rule_length - 1)); 10602 10603 return null_or_node_id; 10604 } 10605} 10606 10607@ 10608@<Allocate bocage setup working data@>= 10609{ 10610 guint ix; 10611 guint earley_set_count = ES_Count_of_R (r); 10612 total_earley_items_in_parse = 0; 10613 per_es_data = 10614 obstack_alloc (&bocage_setup_obs, 10615 sizeof (struct 
s_bocage_setup_per_es) * earley_set_count);
  for (ix = 0; ix < earley_set_count; ix++)
    {
      const ES_Const earley_set = ES_of_R_by_Ord (r, ix);
      const guint item_count = EIM_Count_of_ES (earley_set);
      total_earley_items_in_parse += item_count;
        {
          struct s_bocage_setup_per_es *per_es = per_es_data + ix;
          OR ** const per_eim_eixes = per_es->t_aexes_by_item =
            obstack_alloc (&bocage_setup_obs, sizeof (OR *) * item_count);
          guint item_ordinal;
          per_es->t_or_psl = NULL;
          per_es->t_and_psl = NULL;
          for (item_ordinal = 0; item_ordinal < item_count; item_ordinal++)
            {
              per_eim_eixes[item_ordinal] = NULL;
            }
        }
    }
}

@ Predicted AHFA states can be skipped since they
contain no completions.
Note that AHFA state 0 is not marked as a predicted AHFA state,
even though it can contain a predicted AHFA item.
@ A linear search of the AHFA items is used.
As shown elsewhere in this document,
discovered AHFA states for practical grammars tend to be
very small---%
less than two AHFA items.
Size of the AHFA state is a function of the grammar, so
any reasonable search is $O(1)$ in terms of the length of
the input.
@ The search for the start Earley item is done once
per parse---%
$O(s)$, where $s$ is the size of the end of parse Earley set.
This makes it very hard to justify any precomputations to
help the search, because if they have to be done once per
Earley set, that is an $O(\wsize \cdot s')$ overhead,
where $\wsize$ is the length of the input, and where
$s'$ is the average size of an Earley set.
It is hard to believe that, for practical grammars,
$O(\wsize \cdot s') \le O(s)$, which
is what it would take for any per-Earley set overhead
to make sense.
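\par
To put rough numbers on this, here is an illustration with made-up
figures, not measurements.
Suppose the input has $\wsize = 1000$ tokens,
the average Earley set holds $s' = 20$ items,
and the end of parse Earley set also holds $s = 20$ items.
The one-time search then costs on the order of $s = 20$ steps,
while a precomputation done once per Earley set costs on the order of
$$\wsize \cdot s' = 1000 \cdot 20 = 20000$$
steps.
Under these assumptions the precomputation would have to save a factor of
1000 on a search that runs only once per parse.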
10660@<Find |start_eim|, |start_aim| and |start_aex|@> = 10661{ 10662 gint eim_ix; 10663 EIM* const earley_items = EIMs_of_ES(end_of_parse_es); 10664 const RULEID sought_rule_id = ID_of_RULE(completed_start_rule); 10665 const gint earley_item_count = EIM_Count_of_ES(end_of_parse_es); 10666 for (eim_ix = 0; eim_ix < earley_item_count; eim_ix++) { 10667 const EIM earley_item = earley_items[eim_ix]; 10668 const AHFA ahfa_state = AHFA_of_EIM(earley_item); 10669 if (Origin_Earleme_of_EIM(earley_item) > 0) continue; // Not a start EIM 10670 if (!AHFA_is_Predicted(ahfa_state)) { 10671 gint aex; 10672 AIM* const ahfa_items = AIMs_of_AHFA(ahfa_state); 10673 const gint ahfa_item_count = AIM_Count_of_AHFA(ahfa_state); 10674 for (aex = 0; aex < ahfa_item_count; aex++) { 10675 const AIM ahfa_item = ahfa_items[aex]; 10676 if (RULEID_of_AIM(ahfa_item) == sought_rule_id) { 10677 start_aim = ahfa_item; 10678 start_eim = earley_item; 10679 start_aex = aex; 10680 break; 10681 } 10682 } 10683 } 10684 if (start_eim) break; 10685 } 10686} 10687 10688@ @<Set |top_or_node_id|@> = { 10689 const ESID end_of_parse_ordinal = Ord_of_ES(end_of_parse_es); 10690 OR** const nodes_by_item = per_es_data[end_of_parse_ordinal].t_aexes_by_item; 10691 const gint start_earley_item_ordinal = Ord_of_EIM(start_eim); 10692 OR* const nodes_by_aex = nodes_by_item[start_earley_item_ordinal]; 10693 const OR top_or_node = nodes_by_aex[start_aex]; 10694 top_or_node_id = ID_of_OR(top_or_node); 10695} 10696 10697@*0 Bocage Destruction. 10698@<Destroy bocage elements, all phases@> = 10699@<Destroy bocage elements, main phase@>; 10700@<Destroy bocage elements, final phase@>; 10701 10702@ Destroy the bocage elements when I destroy the recognizer. 10703@<Destroy recognizer elements@> = bocage_destroy(r); 10704 10705@ This function is safe to call even 10706if the bocage already has been freed, 10707or was never initialized. 
10708@<Public function prototypes@> = 10709gint marpa_bocage_free(struct marpa_r* r); 10710@ @<Function definitions@> = 10711gint marpa_bocage_free(struct marpa_r* r) { 10712 @<Return |-2| on failure@>@; 10713 @<Fail if recognizer has fatal error@>@; 10714 if (Phase_of_R(r) == evaluation_phase) { /* Reset phase if evaluating. 10715 Otherwise leave phase untouched */ 10716 Phase_of_R(r) = input_phase; 10717 } 10718 bocage_destroy(r); 10719 return 1; 10720} 10721 10722@ @<Private function prototypes@> = 10723static inline void bocage_destroy(struct marpa_r* r); 10724@ @<Function definitions@> = 10725static inline void bocage_destroy(struct marpa_r* r) 10726{ 10727 BOC b = B_of_R(r); 10728MARPA_DEBUG3("%s B_of_R=%p", G_STRLOC, B_of_R(r)); 10729 if (b) { 10730 @<Destroy bocage elements, all phases@>; 10731 g_slice_free(BOC_Object, b); 10732 B_of_R(r) = NULL; 10733 } 10734MARPA_DEBUG3("%s B_of_R=%p", G_STRLOC, B_of_R(r)); 10735} 10736 10737@*0 Trace Functions. 10738 10739@ This is common logic in the or-node trace functions. 10740@<Check |r| and |or_node_id|; set |or_node|@> = { 10741 BOC b = B_of_R(r); 10742 OR* or_nodes; 10743 @<Fail if recognizer has fatal error@>@; 10744 if (!b) { 10745 R_ERROR("no bocage"); 10746 return failure_indicator; 10747 } 10748 or_nodes = ORs_of_B(b); 10749 if (!or_nodes) { 10750 R_ERROR("no or nodes"); 10751 return failure_indicator; 10752 } 10753 if (or_node_id < 0) { 10754 R_ERROR("bad or node id"); 10755 return failure_indicator; 10756 } 10757 if (or_node_id >= OR_Count_of_B(b)) { 10758 return -1; 10759 } 10760 or_node = or_nodes[or_node_id]; 10761} 10762 10763@ Return the ordinal of the current (final) Earley set of 10764the or-node. 
10765@<Private function prototypes@> = 10766gint marpa_or_node_set(struct marpa_r *r, int or_node_id); 10767@ @<Function definitions@> = 10768gint marpa_or_node_set(struct marpa_r *r, int or_node_id) 10769{ 10770 OR or_node; 10771 @<Return |-2| on failure@>@; 10772 @<Check |r| and |or_node_id|; set |or_node|@>@; 10773 return ES_Ord_of_OR(or_node); 10774} 10775 10776@ @<Private function prototypes@> = 10777gint marpa_or_node_origin(struct marpa_r *r, int or_node_id); 10778@ @<Function definitions@> = 10779gint marpa_or_node_origin(struct marpa_r *r, int or_node_id) 10780{ 10781 OR or_node; 10782 @<Return |-2| on failure@>@; 10783 @<Check |r| and |or_node_id|; set |or_node|@>@; 10784 return Origin_Ord_of_OR(or_node); 10785} 10786 10787@ @<Private function prototypes@> = 10788gint marpa_or_node_rule(struct marpa_r *r, int or_node_id); 10789@ @<Function definitions@> = 10790gint marpa_or_node_rule(struct marpa_r *r, int or_node_id) 10791{ 10792 OR or_node; 10793 @<Return |-2| on failure@>@; 10794 @<Check |r| and |or_node_id|; set |or_node|@>@; 10795 return ID_of_RULE(RULE_of_OR(or_node)); 10796} 10797 10798@ @<Private function prototypes@> = 10799gint marpa_or_node_position(struct marpa_r *r, int or_node_id); 10800@ @<Function definitions@> = 10801gint marpa_or_node_position(struct marpa_r *r, int or_node_id) 10802{ 10803 OR or_node; 10804 @<Return |-2| on failure@>@; 10805 @<Check |r| and |or_node_id|; set |or_node|@>@; 10806 return Position_of_OR(or_node); 10807} 10808 10809@ @<Private function prototypes@> = 10810gint marpa_or_node_first_and(struct marpa_r *r, int or_node_id); 10811@ @<Function definitions@> = 10812gint marpa_or_node_first_and(struct marpa_r *r, int or_node_id) 10813{ 10814 OR or_node; 10815 @<Return |-2| on failure@>@; 10816 @<Check |r| and |or_node_id|; set |or_node|@>@; 10817 return First_ANDID_of_OR(or_node); 10818} 10819 10820@ @<Private function prototypes@> = 10821gint marpa_or_node_last_and(struct marpa_r *r, int or_node_id); 10822@ 
@<Function definitions@> =
gint marpa_or_node_last_and(struct marpa_r *r, int or_node_id)
{
  OR or_node;
  @<Return |-2| on failure@>@;
  @<Check |r| and |or_node_id|; set |or_node|@>@;
  return First_ANDID_of_OR(or_node)
      + AND_Count_of_OR(or_node) - 1;
}

@ @<Private function prototypes@> =
gint marpa_or_node_and_count(struct marpa_r *r, int or_node_id);
@ @<Function definitions@> =
gint marpa_or_node_and_count(struct marpa_r *r, int or_node_id)
{
  OR or_node;
  @<Return |-2| on failure@>@;
  @<Check |r| and |or_node_id|; set |or_node|@>@;
  return AND_Count_of_OR(or_node);
}

@** Parse Tree (TREE) Code.
Within Marpa,
when it makes sense in context,
"tree" means a parse tree.
Trees are, of course, a very common data structure,
and are used for all sorts of things.
But the most important trees in Marpa's universe
are its parse trees.
\par
Marpa's parse trees are produced by iterating
the Marpa bocage.
Therefore, Marpa parse trees are also bocage iterators.
@<Private incomplete structures@> =
struct s_tree;
typedef struct s_tree* TREE;
@ An exhausted bocage iterator (or parse tree)
does not need a worklist
or a stack, so they are destroyed.
If the bocage iterator has a parse count,
but no stack,
it is exhausted.
10864@d TREE_is_Initialized(tree) ((tree)->t_parse_count >= 0) 10865@d TREE_is_Exhausted(tree) (TREE_is_Initialized(tree) 10866 && !FSTACK_IS_INITIALIZED((tree)->t_fork_stack)) 10867@d VAL_of_TREE(tree) (&(tree)->t_val) 10868@d Size_of_TREE(tree) FSTACK_LENGTH((tree)->t_fork_stack) 10869@d FORK_of_TREE_by_IX(tree, fork_id) 10870 FSTACK_INDEX((tree)->t_fork_stack, FORK_Object, fork_id) 10871@<Private structures@> = 10872@<FORK structure@>@; 10873@<VAL structure@>@; 10874struct s_tree { 10875 FSTACK_DECLARE(t_fork_stack, FORK_Object)@; 10876 FSTACK_DECLARE(t_fork_worklist, gint)@; 10877 Bit_Vector t_and_node_in_use; 10878 gint t_parse_count; 10879 VAL_Object t_val; 10880}; 10881typedef struct s_tree TREE_Object; 10882 10883@ @<Private function prototypes@> = 10884static inline void tree_exhaust(TREE tree); 10885@ @<Function definitions@> = 10886static inline void tree_exhaust(TREE tree) 10887{ 10888 if (FSTACK_IS_INITIALIZED(tree->t_fork_stack)) 10889 { 10890 FSTACK_DESTROY(tree->t_fork_stack); 10891 FSTACK_SAFE(tree->t_fork_stack); 10892 } 10893 if (FSTACK_IS_INITIALIZED(tree->t_fork_worklist)) 10894 { 10895 FSTACK_DESTROY(tree->t_fork_worklist); 10896 FSTACK_SAFE(tree->t_fork_worklist); 10897 } 10898 if (tree->t_and_node_in_use) { 10899 bv_free (tree->t_and_node_in_use); 10900 tree->t_and_node_in_use = NULL; 10901 } 10902} 10903 10904@ @<Private function prototypes@> = 10905static inline void tree_safe(TREE tree); 10906@ @<Function definitions@> = 10907static inline void tree_safe(TREE tree) 10908{ 10909 FSTACK_SAFE(tree->t_fork_stack); 10910 FSTACK_SAFE(tree->t_fork_worklist); 10911 tree->t_and_node_in_use = NULL; 10912 tree->t_parse_count = -1; 10913 val_safe(VAL_of_TREE(tree)); 10914} 10915 10916@ Returns the size of the tree. 10917If the bocage iterator is exhausted, returns -1. 10918On error, returns -2. 
10919@<Public function prototypes@> = 10920int marpa_tree_new(struct marpa_r* r); 10921@ @<Function definitions@> = 10922int marpa_tree_new(struct marpa_r* r) 10923{ 10924 BOC b; 10925 TREE tree; 10926 gint first_tree_of_series = 0; 10927 @<Return |-2| on failure@>@; 10928 @<Fail if recognizer has fatal error@>@; 10929 @<Set |b| to bocage; fail if none@>@; 10930 tree = TREE_of_RANK(RANK_of_B(b)); 10931 if (TREE_is_Exhausted(tree)) { 10932 return -1; 10933 } 10934 val_destroy(VAL_of_TREE(tree)); 10935 if (!TREE_is_Initialized(tree)) 10936 { 10937 first_tree_of_series = 1; 10938 @<Initialize the tree iterator; 10939 return -1 if fails 10940 @>@; 10941 } 10942 while (1) { 10943 const AND ands_of_b = ANDs_of_B(b); 10944 if (!first_tree_of_series) { 10945 @<Start a new iteration of the tree@>@; 10946 } 10947 first_tree_of_series = 0; 10948 @<Finish tree if possible@>@; 10949 } 10950 TREE_IS_FINISHED: ; 10951 tree->t_parse_count++; 10952 return FSTACK_LENGTH(tree->t_fork_stack); 10953 TREE_IS_EXHAUSTED: ; 10954 tree_exhaust(tree); 10955 return -1; 10956} 10957 10958@*0 Claiming and Releasing And-nodes. 10959To avoid cycles, the same and node is not allowed to occur twice 10960in the parse tree. 10961A bit vector, accessed by these functions, enforces this. 10962@<Private function prototypes@> = 10963static inline void tree_and_node_claim(TREE tree, ANDID and_node_id); 10964static inline void tree_and_node_release(TREE tree, ANDID and_node_id); 10965static inline gint tree_and_node_try(TREE tree, ANDID and_node_id); 10966@ Claim the and-node by setting its bit. 10967@<Function definitions@> = 10968static inline void tree_and_node_claim(TREE tree, ANDID and_node_id) 10969{ 10970 bv_bit_set(tree->t_and_node_in_use, (guint)and_node_id); 10971} 10972@ Release the and-node by unsetting its bit. 
10973@<Function definitions@> = 10974static inline void tree_and_node_release(TREE tree, ANDID and_node_id) 10975{ 10976 bv_bit_clear(tree->t_and_node_in_use, (guint)and_node_id); 10977} 10978@ Try to claim the and-node. 10979If it was already claimed, return 0, otherwise claim it (that is, 10980set the bit) and return 1. 10981@<Function definitions@> = 10982static inline gint tree_and_node_try(TREE tree, ANDID and_node_id) 10983{ 10984 return !bv_bit_test_and_set(tree->t_and_node_in_use, (guint)and_node_id); 10985} 10986 10987@ @<Initialize the tree iterator; 10988return -1 if fails@> = 10989{ 10990 ORID top_or_id = Top_ORID_of_B(b); 10991 OR top_or_node = OR_of_B_by_ID(b, top_or_id); 10992 FORK fork; 10993 gint choice; 10994 const gint and_count = AND_Count_of_B (b); 10995 tree->t_parse_count = 0; 10996 tree->t_and_node_in_use = bv_create ((guint) and_count); 10997 FSTACK_INIT (tree->t_fork_stack, FORK_Object, and_count); 10998 FSTACK_INIT (tree->t_fork_worklist, gint, and_count); 10999 choice = or_node_next_choice(b, tree, top_or_node, 0); 11000 /* Due to skipping, even the top or-node can have no 11001 valid choices, in which case there is no parse */ 11002 if (choice < 0) goto TREE_IS_EXHAUSTED; 11003 fork = FSTACK_PUSH (tree->t_fork_stack); 11004 OR_of_FORK(fork) = top_or_node; 11005 Choice_of_FORK(fork) = choice; 11006 Parent_of_FORK(fork) = -1; 11007 FORK_Cause_is_Ready(fork) = 0; 11008 FORK_is_Cause(fork) = 0; 11009 FORK_Predecessor_is_Ready(fork) = 0; 11010 FORK_is_Predecessor(fork) = 0; 11011 *(FSTACK_PUSH (tree->t_fork_worklist)) = 0; 11012} 11013 11014@ Look for a fork to iterate. 11015If there is one, set it to the next choice. 11016Otherwise, the tree is exhausted. 
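The claim bits and the search for a fork to iterate can be sketched
together in plain C.
This is an illustration under assumed names
(everything prefixed with toy is hypothetical),
not the code in this document:
a byte array stands in for the bit vector,
each fork carries its own list of and-node IDs,
and rebuilding the worklist after a successful iteration is left out.

```c
#include <assert.h>

/* Try to claim an and-node: 0 if it was already claimed, 1 otherwise. */
static int toy_and_node_try(unsigned char *in_use, int and_node_id)
{
  if (in_use[and_node_id]) return 0;
  in_use[and_node_id] = 1;
  return 1;
}

static void toy_and_node_release(unsigned char *in_use, int and_node_id)
{
  in_use[and_node_id] = 0;
}

struct toy_fork {
  const int *and_ids; /* and-nodes of this fork's or-node, in order */
  int and_count;
  int choice;         /* index into and_ids of the current choice */
};

/* First choice at or after start_choice whose and-node can be claimed,
   or -1 if there is none. */
static int toy_next_choice(unsigned char *in_use,
                           const struct toy_fork *fork, int start_choice)
{
  int choice;
  assert(start_choice >= 0);
  for (choice = start_choice; choice < fork->and_count; choice++) {
    if (toy_and_node_try(in_use, fork->and_ids[choice])) return choice;
  }
  return -1;
}

/* Move the topmost fork that still has a later choice to that choice,
   popping the forks above it.  Returns the new stack length;
   0 means the iterator is exhausted. */
static int toy_iterate(unsigned char *in_use, struct toy_fork *stack, int length)
{
  while (length > 0) {
    struct toy_fork *top = &stack[length - 1];
    int choice;
    toy_and_node_release(in_use, top->and_ids[top->choice]);
    choice = toy_next_choice(in_use, top, top->choice + 1);
    if (choice >= 0) {
      top->choice = choice;
      return length;
    }
    length--; /* no choices left here: pop and try the fork below */
  }
  return 0;
}
```

Releasing the current and-node before looking for the next choice matters:
it frees that and-node for use elsewhere in the next tree,
while the claim bits still keep any one tree from using an and-node twice.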
11017@<Start a new iteration of the tree@> = { 11018 while (1) { 11019 FORK iteration_candidate = FSTACK_TOP(tree->t_fork_stack, FORK_Object); 11020 gint choice; 11021 if (!iteration_candidate) break; 11022 choice = Choice_of_FORK(iteration_candidate); 11023 MARPA_ASSERT(choice >= 0); 11024 { 11025 OR or_node = OR_of_FORK(iteration_candidate); 11026 ANDID and_node_id = and_order_get(b, or_node, choice); 11027 tree_and_node_release(tree, and_node_id); 11028 choice = or_node_next_choice(b, tree, or_node, choice+1); 11029 } 11030 if (choice >= 0) { 11031 /* We have found a fork we can iterate. 11032 Set the new choice, 11033 dirty the child bits in the current working fork, 11034 and break out of the loop. 11035 */ 11036 Choice_of_FORK(iteration_candidate) = choice; 11037 FORK_Cause_is_Ready(iteration_candidate) = 0; 11038 FORK_Predecessor_is_Ready(iteration_candidate) = 0; 11039 break; 11040 } 11041 { 11042 /* Dirty the corresponding bit in the parent */ 11043 const gint parent_fork_ix = Parent_of_FORK(iteration_candidate); 11044 if (parent_fork_ix >= 0) { 11045 FORK parent_fork = FORK_of_TREE_by_IX(tree, parent_fork_ix); 11046 if (FORK_is_Cause(iteration_candidate)) { 11047 FORK_Cause_is_Ready(parent_fork) = 0; 11048 } 11049 if (FORK_is_Predecessor(iteration_candidate)) { 11050 FORK_Predecessor_is_Ready(parent_fork) = 0; 11051 } 11052 } 11053 11054 /* Continue with the next item on the stack */ 11055 FSTACK_POP(tree->t_fork_stack); 11056 } 11057 } 11058 { 11059 gint stack_length = FSTACK_LENGTH(tree->t_fork_stack); 11060 gint i; 11061 if (stack_length <= 0) goto TREE_IS_EXHAUSTED; 11062 FSTACK_CLEAR(tree->t_fork_worklist); 11063 for (i = 0; i < stack_length; i++) { 11064 *(FSTACK_PUSH(tree->t_fork_worklist)) = i; 11065 } 11066 } 11067} 11068 11069@ @<Finish tree if possible@> = { 11070 while (1) { 11071 FORKID* p_work_fork_id; 11072 FORK work_fork; 11073 ANDID work_and_node_id; 11074 AND work_and_node; 11075 OR work_or_node; 11076 OR child_or_node = NULL; 11077 gint 
choice; 11078 gint child_is_cause = 0; 11079 gint child_is_predecessor = 0; 11080 p_work_fork_id = FSTACK_TOP(tree->t_fork_worklist, FORKID); 11081 if (!p_work_fork_id) { 11082 goto TREE_IS_FINISHED; 11083 } 11084 work_fork = FORK_of_TREE_by_IX(tree, *p_work_fork_id); 11085 work_or_node = OR_of_FORK(work_fork); 11086 work_and_node_id = and_order_get(b, work_or_node, Choice_of_FORK(work_fork)); 11087 work_and_node = ands_of_b + work_and_node_id; 11088 if (!FORK_Cause_is_Ready(work_fork)) { 11089 child_or_node = Cause_OR_of_AND(work_and_node); 11090 if (child_or_node && OR_is_Token(child_or_node)) child_or_node = NULL; 11091 if (child_or_node) { 11092 child_is_cause = 1; 11093 } else { 11094 FORK_Cause_is_Ready(work_fork) = 1; 11095 } 11096 } 11097 if (!child_or_node && !FORK_Predecessor_is_Ready(work_fork)) { 11098 child_or_node = Predecessor_OR_of_AND(work_and_node); 11099 if (child_or_node) { 11100 child_is_predecessor = 1; 11101 } else { 11102 FORK_Predecessor_is_Ready(work_fork) = 1; 11103 } 11104 } 11105 if (!child_or_node) { 11106 FSTACK_POP(tree->t_fork_worklist); 11107 goto NEXT_FORK_ON_WORKLIST; 11108 } 11109 choice = or_node_next_choice(b, tree, child_or_node, 0); 11110 if (choice < 0) goto NEXT_TREE; 11111 @<Add new fork to tree@>; 11112 NEXT_FORK_ON_WORKLIST: ; 11113 } 11114 NEXT_TREE: ; 11115} 11116 11117@ @<Private function prototypes@> = 11118static inline gint or_node_next_choice(BOC b, TREE tree, OR or_node, gint start_choice); 11119@ @<Function definitions@> = 11120static inline gint or_node_next_choice(BOC b, TREE tree, OR or_node, gint start_choice) 11121{ 11122 gint choice = start_choice; 11123 while (1) { 11124 ANDID and_node_id = and_order_get(b, or_node, choice); 11125 if (and_node_id < 0) return -1; 11126 if (tree_and_node_try(tree, and_node_id)) return choice; 11127 choice++; 11128 } 11129 return -1; 11130} 11131 11132@ @<Add new fork to tree@> = 11133{ 11134 FORKID new_fork_id = FSTACK_LENGTH(tree->t_fork_stack); 11135 FORK new_fork = 
FSTACK_PUSH(tree->t_fork_stack);
    *(FSTACK_PUSH(tree->t_fork_worklist)) = new_fork_id;
    Parent_of_FORK(new_fork) = *p_work_fork_id;
    Choice_of_FORK(new_fork) = choice;
    OR_of_FORK(new_fork) = child_or_node;
    FORK_Cause_is_Ready(new_fork) = 0;
    if ( ( FORK_is_Cause(new_fork) = child_is_cause ) ) {
      FORK_Cause_is_Ready(work_fork) = 1;
    }
    FORK_Predecessor_is_Ready(new_fork) = 0;
    if ( ( FORK_is_Predecessor(new_fork) = child_is_predecessor ) ) {
      FORK_Predecessor_is_Ready(work_fork) = 1;
    }
}

@ @<Set |b| to bocage; fail if none@> =
{
    b = B_of_R(r);
    if (!b) {
        R_ERROR ("no bocage");
        return failure_indicator;
    }
}

@ @<Private function prototypes@> =
static inline void tree_destroy(TREE tree);
@ @<Function definitions@> =
static inline void tree_destroy(TREE tree)
{
  tree_exhaust(tree);
  tree->t_parse_count = -1;
MARPA_DEBUG4("%s tree=%p parse_count=%d", G_STRLOC, tree, tree->t_parse_count);
}

@ Soft failure (-1) if no bocage, so that this function
can also be used to check for the existence of the bocage.
@<Public function prototypes@> =
gint marpa_parse_count(struct marpa_r* r);
@ @<Function definitions@> =
gint marpa_parse_count(struct marpa_r* r)
{
    BOC b;
    TREE tree;
  @<Return |-2| on failure@>@;
  @<Fail if recognizer has fatal error@>@;
  b = B_of_R(r);
  if (!b) {
      return -1;
  }
  tree = TREE_of_RANK(RANK_of_B(b));
MARPA_DEBUG3("%s b=%p", G_STRLOC, b);
MARPA_DEBUG4("%s tree=%p parse_count=%d", G_STRLOC, tree, tree->t_parse_count);
  return tree->t_parse_count;
}

@ Return the size of the parse tree.
This is the number of |FORK| entries in its stack.
If there is a serious error,
or if the tree is uninitialized, return -2.
If the tree is exhausted, return -1.
@<Private function prototypes@> =
gint marpa_tree_size(struct marpa_r *r);
@ @<Function definitions@> =
gint marpa_tree_size(struct marpa_r *r)
{
  @<Return |-2| on failure@>@;
  BOC b = B_of_R(r);
  TREE tree;
  @<Fail if recognizer has fatal error@>@;
  if (!b) {
    R_ERROR("no bocage");
    return failure_indicator;
  }
  tree = TREE_of_RANK(RANK_of_B(b));
  if (!TREE_is_Initialized(tree)) {
    R_ERROR("tree not initialized");
    return failure_indicator;
  }
  if (TREE_is_Exhausted(tree)) {
    return -1;
  }
  return FSTACK_LENGTH(tree->t_fork_stack);
}

@** Bocage Ranking (RANK) Code.
@<Private incomplete structures@> =
struct s_bocage_rank;
typedef struct s_bocage_rank* RANK;
@
|t_and_node_orderings| is used as the "safe boolean"
for the obstack. They have the same lifetime, so
that it is safe to destroy the obstack if
|t_and_node_orderings| is not null.
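The "safe boolean" idea can be illustrated outside of libmarpa.
What follows is only a minimal sketch under stated assumptions:
the |sketch_rank| structure and its three functions are hypothetical names,
not libmarpa code, and plain |calloc| and |free| stand in for the obstack.
The point is that the pointer is the only guard:
the pool is touched on destruction only if the pointer is non-null.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the ranker; not libmarpa code.
   orderings doubles as the flag saying whether pool is live. */
struct sketch_rank {
    void *pool;       /* stands in for the obstack; indeterminate until set up */
    int **orderings;  /* non-NULL exactly while pool is live */
};

/* Put the structure into a state that is safe to destroy. */
static void sketch_rank_safe(struct sketch_rank *rank)
{
    rank->orderings = NULL;
}

/* Set up the pool on first use; return 1 on success, 0 on allocation failure. */
static int sketch_rank_setup(struct sketch_rank *rank, size_t count)
{
    if (rank->orderings) return 1;  /* already set up */
    rank->pool = calloc(count ? count : 1, sizeof(int *));
    if (!rank->pool) return 0;
    rank->orderings = rank->pool;
    return 1;
}

/* Destroy the pool only if the guard pointer says it exists. */
static void sketch_rank_destroy(struct sketch_rank *rank)
{
    if (rank->orderings) {
        rank->orderings = NULL;
        free(rank->pool);
    }
}
```

Because the guard is reset before the pool is released,
calling the destroy function twice, or calling it on a structure that was never set up,
is harmless in this sketch.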
@d TREE_of_RANK(rank) (&(rank)->t_tree)
@d OBS_of_RANK(rank) ((rank)->t_obs)
@<Private structures@> =
struct s_bocage_rank {
  struct obstack t_obs;
  Bit_Vector t_and_node_in_use;
  ANDID** t_and_node_orderings;
  TREE_Object t_tree;
};
typedef struct s_bocage_rank RANK_Object;

@
@d RANK_of_B(b) (&(b)->t_rank)
@<Widely aligned bocage elements@> =
RANK_Object t_rank;
@ @<Initialize bocage elements@> =
MARPA_DEBUG3("%s rank_safe where b=%p", G_STRLOC, b);
rank_safe(RANK_of_B(b));
@ @<Private function prototypes@> =
static inline void rank_safe(RANK rank);
@ @<Function definitions@> =
static inline void rank_safe(RANK rank)
{
  rank->t_and_node_in_use = NULL;
  rank->t_and_node_orderings = NULL;
  tree_safe(TREE_of_RANK(rank));
}

@ @<Destroy bocage elements, main phase@> =
rank_destroy(RANK_of_B(b));
@ @<Private function prototypes@> =
static inline void rank_freeze(RANK rank);
static inline void rank_destroy(RANK rank);
@ @<Function definitions@> =
static inline void rank_freeze(RANK rank)
{
  if (rank->t_and_node_in_use)
    {
      bv_free (rank->t_and_node_in_use);
      rank->t_and_node_in_use = NULL;
    }
}
static inline void rank_destroy(RANK rank)
{
  tree_destroy(TREE_of_RANK(rank));
  rank_freeze(rank);
  if (rank->t_and_node_orderings) {
    rank->t_and_node_orderings = NULL;
    obstack_free(&OBS_of_RANK(rank), NULL);
  }
}

@*0 The RANK Obstack.
An obstack with the lifetime of the bocage ranker.

@*0 Set the Order of And-nodes.
This function
sets the order in which the and-nodes of an
or-node are used.
It is an error if an and-node ID is not the
immediate child of the specified or-node,
or if the and-node is specified twice,
or if an ordering has already been specified for
the or-node.
@<Public function prototypes@> =
gint marpa_and_order_set(struct marpa_r *r,
    Marpa_Or_Node_ID or_node_id,
    Marpa_And_Node_ID* and_node_ids,
    gint length);
@ For a given bocage,
this function may not be used to order
the same or-node more than once.
In other words, after you have once specified an order
for the and-nodes within an or-node,
you cannot change it.
Some applications might find this inconvenient,
and will have to resort to their own buffering
to prevent multiple changes.
But most applications won't care, and
will benefit from the faster memory allocation
this restriction allows.

@ Using a bit vector for
the index of an and-node within an or-node,
instead of the and-node ID, would seem to allow
a space efficiency: the size of the bit vector
could be reduced to the maximum number of descendants
of any or-node.
But in fact, improvements from this approach are elusive.

In the worst cases, these counts are the same, or
almost the same.
Any attempt to economize on space seems to always
be counter-productive in terms of speed.
And since
allocating a bit vector for the worst case does
not increase the memory high water mark,
it would seem to be the most reasonable tradeoff.

This in turn suggests there is no advantage in using
a within-or-node index to index the bit vector,
instead of using the and-node ID to index the bit vector.
Using the and-node ID does have the advantage that the bit
vector does not need to be cleared for each or-node.
@ The first position in each |and_node_orderings| array is not
actually an |ANDID|, but a count.
A purist might insist this needs to be reflected in a structure,
but to my mind doing this portably makes the code more obscure,
not less.
@<Function definitions@> =
gint marpa_and_order_set(struct marpa_r *r,
    Marpa_Or_Node_ID or_node_id,
    Marpa_And_Node_ID* and_node_ids,
    gint length)
{
  OR or_node;
  RANK rank;
  @<Return |-2| on failure@>@;
  @<Check |r| and |or_node_id|; set |or_node|@>@;
  { BOC b = B_of_R(r);
    ANDID** and_node_orderings;
    Bit_Vector and_node_in_use;
    struct obstack *obs;
    ANDID first_and_node_id;
    ANDID and_count_of_or;
    if (!b) {
      R_ERROR("no bocage");
      return failure_indicator;
    }
    rank = RANK_of_B(b);
    and_node_orderings = rank->t_and_node_orderings;
    and_node_in_use = rank->t_and_node_in_use;
    obs = &OBS_of_RANK(rank);
    if (and_node_orderings && !and_node_in_use)
      {
        R_ERROR("ranker frozen");
        return failure_indicator;
      }
    if (!and_node_orderings)
      {
        gint and_id;
        const gint and_count_of_r = AND_Count_of_B (b);
        obstack_init(obs);
        rank->t_and_node_orderings =
          and_node_orderings =
          obstack_alloc (obs, sizeof (ANDID *) * and_count_of_r);
        for (and_id = 0; and_id < and_count_of_r; and_id++)
          {
            and_node_orderings[and_id] = (ANDID *) NULL;
          }
        rank->t_and_node_in_use =
          and_node_in_use = bv_create ((guint)and_count_of_r);
      }
    first_and_node_id = First_ANDID_of_OR(or_node);
    and_count_of_or = AND_Count_of_OR(or_node);
    {
      gint and_ix;
      for (and_ix = 0; and_ix < length; and_ix++)
        {
          ANDID and_node_id = and_node_ids[and_ix];
          if (and_node_id < first_and_node_id ||
                  and_node_id - first_and_node_id >= and_count_of_or) {
            R_ERROR ("and node not in or node");
            return failure_indicator;
          }
          if (bv_bit_test (and_node_in_use, (guint)and_node_id))
            {
              R_ERROR ("dup and node");
              return failure_indicator;
            }
          bv_bit_set (and_node_in_use, (guint)and_node_id);
        }
    }
    if (and_node_orderings[or_node_id]) {
      R_ERROR ("or node already ordered");
      return failure_indicator;
    }
    {
      ANDID *orderings = obstack_alloc (obs, sizeof (ANDID) * (length + 1));
      gint i;
      and_node_orderings[or_node_id] = orderings;
      *orderings++ = length;
      for (i = 0; i < length; i++)
        {
          *orderings++ = and_node_ids[i];
        }
    }
  }
  return 1;
}

@*0 Get an And-node by Order within its Or-Node.
@ @<Private function prototypes@> =
static inline ANDID and_order_get(BOC b, OR or_node, gint ix);
@ @<Public function prototypes@> =
Marpa_And_Node_ID marpa_and_order_get(struct marpa_r *r, Marpa_Or_Node_ID or_node_id, gint ix);
@ @<Function definitions@> =
static inline ANDID and_order_get(BOC b, OR or_node, gint ix)
{
  RANK rank;
  ANDID **and_node_orderings;
  if (ix >= AND_Count_of_OR (or_node))
    {
      return -1;
    }
  rank = RANK_of_B (b);
  and_node_orderings = rank->t_and_node_orderings;
  if (and_node_orderings)
    {
      ORID or_node_id = ID_of_OR(or_node);
      ANDID *ordering = and_node_orderings[or_node_id];
      if (ordering)
        {
          gint length = ordering[0];
          if (ix >= length)
            return -1;
          return ordering[1 + ix];
        }
    }
  return First_ANDID_of_OR(or_node) + ix;
}

Marpa_And_Node_ID marpa_and_order_get(struct marpa_r *r, Marpa_Or_Node_ID or_node_id, gint ix)
{
  OR or_node;
  @<Return |-2| on failure@>@;
  @<Check |r| and |or_node_id|; set |or_node|@>@;
  if (ix < 0) {
    R_ERROR("negative and ix");
    return failure_indicator;
  }
  {
    BOC b = B_of_R (r);
    if (!b)
      {
        R_ERROR ("no bocage");
        return failure_indicator;
      }
    return
and_order_get(b, or_node, ix);
  }
}

@** Fork (FORK) Code.
In Marpa, a fork is any node of a parse tree.
In discussing Marpa's parse trees,
a leaf node is a special kind of |FORK|.
This terminology, while not unprecedented,
is unusual -- the usual term is "node".
The problem is that within Marpa,
the word "node" is already heavily overloaded.
So what most texts call "tree nodes" are here
called "forks".
@<Public typedefs@> =
typedef gint Marpa_Fork_ID;
@ @<Private typedefs@> =
typedef Marpa_Fork_ID FORKID;
@ @s FORK int
@<Private incomplete structures@> =
struct s_fork;
typedef struct s_fork* FORK;
@ @d OR_of_FORK(fork) ((fork)->t_or_node)
@d Choice_of_FORK(fork) ((fork)->t_choice)
@d Parent_of_FORK(fork) ((fork)->t_parent)
@d FORK_Cause_is_Ready(fork) ((fork)->t_is_cause_ready)
@d FORK_is_Cause(fork) ((fork)->t_is_cause_of_parent)
@d FORK_Predecessor_is_Ready(fork) ((fork)->t_is_predecessor_ready)
@d FORK_is_Predecessor(fork) ((fork)->t_is_predecessor_of_parent)
@s FORK_Object int
@<FORK structure@> =
struct s_fork {
  OR t_or_node;
  gint t_choice;
  FORKID t_parent;
  guint t_is_cause_ready:1;
  guint t_is_predecessor_ready:1;
  guint t_is_cause_of_parent:1;
  guint t_is_predecessor_of_parent:1;
};
typedef struct s_fork FORK_Object;

@*0 Trace Functions.

@ This is common logic in the |FORK| trace functions.
@<Check |r| and |fork_id|;
set |fork|@> = {
  FORK base_fork;
  BOC b = B_of_R(r);
  TREE tree;
  @<Fail if recognizer has fatal error@>@;
  if (!b) {
    R_ERROR("no bocage");
    return failure_indicator;
  }
  tree = TREE_of_RANK(RANK_of_B(b));
  if (!TREE_is_Initialized(tree)) {
    R_ERROR("tree not initialized");
    return failure_indicator;
  }
  if (TREE_is_Exhausted(tree)) {
    R_ERROR("bocage iteration exhausted");
    return failure_indicator;
  }
  base_fork = FSTACK_BASE(tree->t_fork_stack, FORK_Object);
  if (fork_id < 0) {
    R_ERROR("bad fork id");
    return failure_indicator;
  }
  if (fork_id >= FSTACK_LENGTH(tree->t_fork_stack)) {
    return -1;
  }
  fork = base_fork + fork_id;
}

@ Return the ID of the or-node for |fork_id|.
@<Private function prototypes@> =
gint marpa_fork_or_node(struct marpa_r *r, int fork_id);
@ @<Function definitions@> =
gint marpa_fork_or_node(struct marpa_r *r, int fork_id)
{
  FORK fork;
  @<Return |-2| on failure@>@;
  @<Check |r| and |fork_id|; set |fork|@>@;
  return ID_of_OR(OR_of_FORK(fork));
}

@ Return the current choice for |fork_id|.
@<Private function prototypes@> =
gint marpa_fork_choice(struct marpa_r *r, int fork_id);
@ @<Function definitions@> =
gint marpa_fork_choice(struct marpa_r *r, int fork_id)
{
  FORK fork;
  @<Return |-2| on failure@>@;
  @<Check |r| and |fork_id|; set |fork|@>@;
  return Choice_of_FORK(fork);
}

@ Return the parent fork's ID for |fork_id|.
As with the other fork trace functions,
-1 is returned if |fork_id| is not the ID of
a fork on the stack,
but -1 can also be a valid value.
If that's an issue, the |fork_id| needs
to be checked with one of the trace functions
where -1 is never a valid value ---
for example, |marpa_fork_or_node|.
@<Private function prototypes@> =
gint marpa_fork_parent(struct marpa_r *r, int fork_id);
@ @<Function definitions@> =
gint marpa_fork_parent(struct marpa_r *r, int fork_id)
{
  FORK fork;
  @<Return |-2| on failure@>@;
  @<Check |r| and |fork_id|; set |fork|@>@;
  return Parent_of_FORK(fork);
}

@ Return the cause-is-ready bit for |fork_id|.
@<Private function prototypes@> =
gint marpa_fork_cause_is_ready(struct marpa_r *r, int fork_id);
@ @<Function definitions@> =
gint marpa_fork_cause_is_ready(struct marpa_r *r, int fork_id)
{
  FORK fork;
  @<Return |-2| on failure@>@;
  @<Check |r| and |fork_id|; set |fork|@>@;
  return FORK_Cause_is_Ready(fork);
}

@ Return the predecessor-is-ready bit for |fork_id|.
@<Private function prototypes@> =
gint marpa_fork_predecessor_is_ready(struct marpa_r *r, int fork_id);
@ @<Function definitions@> =
gint marpa_fork_predecessor_is_ready(struct marpa_r *r, int fork_id)
{
  FORK fork;
  @<Return |-2| on failure@>@;
  @<Check |r| and |fork_id|; set |fork|@>@;
  return FORK_Predecessor_is_Ready(fork);
}

@ Return the is-cause bit for |fork_id|.
@<Private function prototypes@> =
gint marpa_fork_is_cause(struct marpa_r *r, int fork_id);
@ @<Function definitions@> =
gint marpa_fork_is_cause(struct marpa_r *r, int fork_id)
{
  FORK fork;
  @<Return |-2| on failure@>@;
  @<Check |r| and |fork_id|; set |fork|@>@;
  return FORK_is_Cause(fork);
}

@ Return the is-predecessor bit for |fork_id|.
@<Private function prototypes@> =
gint marpa_fork_is_predecessor(struct marpa_r *r, int fork_id);
@ @<Function definitions@> =
gint marpa_fork_is_predecessor(struct marpa_r *r, int fork_id)
{
  FORK fork;
  @<Return |-2| on failure@>@;
  @<Check |r| and |fork_id|; set |fork|@>@;
  return FORK_is_Predecessor(fork);
}

@** Event (EVE) Code.
@
@d SYMID_of_EVE(eve) ((eve)->marpa_token_id)
@d Value_of_EVE(eve) ((eve)->marpa_value)
@d RULEID_of_EVE(eve) ((eve)->marpa_rule_id)
@d Arg0_of_EVE(eve) ((eve)->marpa_arg_0)
@d ArgN_of_EVE(eve) ((eve)->marpa_arg_n)
@<Public structures@> =
struct marpa_event {
  Marpa_Symbol_ID marpa_token_id;
  gpointer marpa_value;
  Marpa_Rule_ID marpa_rule_id;
  gint marpa_arg_0;
  gint marpa_arg_n;
};
typedef struct marpa_event Marpa_Event;
@ @<Private typedefs@> =
typedef Marpa_Event *EVE;

@** Evaluation (VAL) Code.
This code helps
compute a value for
a parse tree.
I say "helps" because evaluating a parse tree
involves semantics, and libmarpa has only
limited knowledge of the semantics.
This code is really just routines to assist
the higher level in tracking the evaluation stack.
\par
The main reason for this code is to hide libmarpa's
internal rewrites from the semantics.
If it were not for that, it would probably be
just as easy to provide a parse tree to the
higher level and let it decide how to
evaluate it.
@<Private incomplete structures@> =
struct s_value;
typedef struct s_value* VAL;
@ This structure tracks the top of the evaluation
stack, but does {\bf not} maintain the
evaluation stack itself ---
that is left for the upper layers to do.
It does, however, maintain a stack of the counts
of symbols in the
original (or "virtual") rules.
This enables libmarpa to make the rewriting of
the grammar invisible to the semantics.
@d VAL_is_Active(val) ((val)->t_active)
@d VAL_is_Trace(val) ((val)->t_trace)
@d FORK_of_VAL(val) ((val)->t_fork)
@d TOS_of_VAL(val) ((val)->t_tos)
@d VStack_of_VAL(val) ((val)->t_virtual_stack)
@<VAL structure@> =
struct s_value {
  DSTACK_DECLARE(t_virtual_stack);
  FORKID t_fork;
  gint t_tos;
  guint t_trace:1;
  guint t_active:1;
};
typedef struct s_value VAL_Object;

@ @<Private function prototypes@> =
static inline void val_safe(VAL val);
@ @<Function definitions@> =
static inline void val_safe(VAL val)
{
  DSTACK_SAFE(val->t_virtual_stack);
  VAL_is_Active(val) = 0;
  VAL_is_Trace(val) = 0;
  TOS_of_VAL(val) = -1;
  FORK_of_VAL(val) = -1;
}

@ @<Public function prototypes@> =
int marpa_val_new(struct marpa_r* r);
@ A dynamic stack is used here instead of a fixed
stack for two reasons.
First, there are only a few stack moves per call
of |marpa_val_event|.
Since at least one subroutine call occurs every few
virtual stack moves,
virtual stack moves are not really within a tight CPU
loop.
Therefore shaving off the few instructions it
takes to check stack size is less important than it is
in other places.
@ Second, the fixed stack, to accommodate the worst
case, would have to be many times larger than
what will usually be needed.
I calculate the
worst case for virtual stack size as follows.
The virtual stack only grows once for each virtual
rule.
To be virtual, a rule must divide into at least two
"real", or rewritten, rules, so the worst case is that half
of all applications of real rules grow the virtual
stack.
The number of applications of real rules is
the size of the parse tree, $\size{|tree|}$.
So, if the fixed stack is sized per tree,
it must be $\size{|tree|}/2+1$.
@ I set the initial size of
the dynamic stack to be
$\size{|tree|}/1024$,
with a minimum of 1024.
1024 is chosen because
in some modern configurations
a smaller allocation may require
extra work.
The purpose of the $\size{|tree|}/1024$ is
to guarantee that this code is $O(n)$.
$\size{|tree|}/1024$ is a fixed fraction
of the worst case size, so the number of
stack reallocations is $O(1)$.
@<Function definitions@> =
int marpa_val_new(struct marpa_r* r)
{
  BOC b;
  TREE tree;
  @<Return |-2| on failure@>@;
  @<Fail if recognizer has fatal error@>@;
  @<Set |b| to bocage; fail if none@>@;
  tree = TREE_of_RANK(RANK_of_B(b));
  if (TREE_is_Exhausted(tree)) {
    return -1;
  }
  if (!TREE_is_Initialized(tree))
    {
      R_ERROR ("tree not initialized");
      return failure_indicator;
    }
  {
    VAL val = VAL_of_TREE (tree);
    const gint minimum_stack_size = (8192 / sizeof (gint));
    const gint initial_stack_size =
      MAX (Size_of_TREE (tree) / 1024, minimum_stack_size);
    val_destroy (val);
    DSTACK_INIT (VStack_of_VAL (val), gint, initial_stack_size);
    VAL_is_Active(val) = 1;
  }
  return 1;
}

@ @<Private function prototypes@> =
static inline void val_destroy(VAL val);
@ @<Function definitions@> =
static inline void val_destroy(VAL val)
{

  if (DSTACK_IS_INITIALIZED(val->t_virtual_stack))
    {
      DSTACK_DESTROY(val->t_virtual_stack);
      DSTACK_SAFE(val->t_virtual_stack);
    }
  val_safe(val);
}

@ @<Set |b|, |tree|, |val|;
return on failure@> = {
  @<Fail if recognizer has fatal error@>@;
  b = B_of_R(r);
  if (!b) {
    return failure_indicator;
  }
  tree = TREE_of_RANK(RANK_of_B(b));
  val = VAL_of_TREE(tree);
  if (!VAL_is_Active(val)) {
    return failure_indicator;
  }
}

@ @<Public function prototypes@> =
gint marpa_val_trace(struct marpa_r* r, gint flag);
@ @<Function definitions@> =
gint marpa_val_trace(struct marpa_r* r, gint flag)
{
  BOC b;
  TREE tree;
  VAL val;
  @<Return |-2| on failure@>@;
  @<Set |b|, |tree|, |val|; return on failure@>@;
  VAL_is_Trace(val) = flag;
  return 1;
}

@ @<Public function prototypes@> =
Marpa_Fork_ID marpa_val_fork(struct marpa_r* r);
@ @<Function definitions@> =
Marpa_Fork_ID marpa_val_fork(struct marpa_r* r)
{
  BOC b;
  TREE tree;
  VAL val;
  @<Return |-2| on failure@>@;
  @<Set |b|, |tree|, |val|; return on failure@>@;
  return FORK_of_VAL(val);
}

@ @<Public function prototypes@> =
Marpa_Fork_ID marpa_val_event(struct marpa_r* r, Marpa_Event* event);
@ @<Function definitions@> =
Marpa_Fork_ID marpa_val_event(struct marpa_r* r, Marpa_Event* event)
{
  BOC b;
  TREE tree;
  VAL val;
  AND and_nodes;
  gint semantic_rule_id = -1;
  gint token_id = -1;
  gpointer token_value = NULL;
  gint arg_0 = -1;
  gint arg_n = -1;
  FORKID fork_ix;
  gint continue_with_next_fork;

  /* event is not changed in case of hard failure */
  @<Return |-2| on failure@>@;
  @<Set |b|, |tree|, |val|; return on failure@>@;
  and_nodes = ANDs_of_B(b);

  arg_0 = arg_n = TOS_of_VAL(val);
  fork_ix = FORK_of_VAL(val);
  if (fork_ix < 0) {
    fork_ix = Size_of_TREE(tree);
  }
  continue_with_next_fork = !VAL_is_Trace(val);

  while (1) {
    OR or;
    RULE fork_rule;
    fork_ix--;
    if (fork_ix < 0) goto RETURN_SOFT_ERROR;
    {
      ANDID and_node_id;
      AND and_node;
      const FORK fork = FORK_of_TREE_by_IX(tree, fork_ix);
      const gint choice =
Choice_of_FORK(fork);
      or = OR_of_FORK(fork);
      and_node_id = and_order_get(b, or, choice);
      and_node = and_nodes + and_node_id;
      token_id = and_node_token(and_node, &token_value);
    }
    if (token_id >= 0) {
      arg_0 = ++arg_n;
      continue_with_next_fork = 0;
    }
    fork_rule = RULE_of_OR(or);
    if (Position_of_OR(or) == Length_of_RULE(fork_rule)) {
      gint virtual_rhs = RULE_is_Virtual_RHS(fork_rule);
      gint virtual_lhs = RULE_is_Virtual_LHS(fork_rule);
      gint real_symbol_count;
      const DSTACK virtual_stack = &VStack_of_VAL(val);
      if (virtual_lhs) {
        real_symbol_count = Real_SYM_Count_of_RULE(fork_rule);
        if (virtual_rhs) {
          *(DSTACK_TOP(*virtual_stack, gint)) += real_symbol_count;
        } else {
          *DSTACK_PUSH(*virtual_stack, gint) = real_symbol_count;
        }
        goto NEXT_FORK;
      }
      if (virtual_rhs) {
        real_symbol_count = Real_SYM_Count_of_RULE(fork_rule);
        real_symbol_count += *DSTACK_POP(*virtual_stack, gint);
      } else {
        real_symbol_count = Length_of_RULE(fork_rule);
      }
      arg_0 = arg_n - real_symbol_count + 1;
      semantic_rule_id =
        fork_rule->t_is_semantic_equivalent ?
          fork_rule->t_original : ID_of_RULE(fork_rule);
      continue_with_next_fork = 0;
    }
    NEXT_FORK: ;
    if (!continue_with_next_fork) break;
  }

  @<Write results to |val| and |event|@>@;
  return FORK_of_VAL(val);

  RETURN_SOFT_ERROR: ;
  @<Write results to |val| and |event|@>@;
  return -1;

}

@ @<Write results to |val| and |event|@> =
{
  SYMID_of_EVE(event) = token_id;
  Value_of_EVE(event) = token_value;
  RULEID_of_EVE(event) = semantic_rule_id;
  TOS_of_VAL(val) = Arg0_of_EVE(event) = arg_0;
  FORK_of_VAL(val) = fork_ix;
  ArgN_of_EVE(event) = arg_n;
}

@** Boolean Vectors.
Marpa's boolean vectors are adapted from
Steffen Beyer's Bit-Vector package on CPAN.
This is a combined Perl package and C library for handling
bit vectors.
Someone seeking a general bit vector package should
look at Steffen's instead.
|libmarpa|'s boolean vectors are tightly tied in
with its own needs and environment.
@<Private typedefs@> =
typedef guint Bit_Vector_Word;
typedef Bit_Vector_Word* Bit_Vector;
@ Some defines and constants.
@d BV_BITS(bv) *(bv-3)
@d BV_SIZE(bv) *(bv-2)
@d BV_MASK(bv) *(bv-1)
@<Private global variables@> =
static const guint bv_wordbits = sizeof(Bit_Vector_Word)*8u;
static const guint bv_modmask = sizeof(Bit_Vector_Word)*8u-1u;
static const guint bv_hiddenwords = 3;
static const guint bv_lsb = 1u;
static const guint bv_msb = (1u << (sizeof(Bit_Vector_Word)*8u-1u));

@ Given a number of bits, compute the size.
@<Function definitions@> =
static inline guint bv_bits_to_size(guint bits)
{
  return (bits+bv_modmask)/bv_wordbits;
}
@ @<Private function prototypes@> =
static inline guint bv_bits_to_size(guint bits);
@ Given a number of bits, compute the unused-bit mask.
@<Function definitions@> =
static inline guint bv_bits_to_unused_mask(guint bits)
{
  guint mask = bits & bv_modmask;
  if (mask) mask = (guint) ~(~0uL << mask); else mask = (guint) ~0uL;
  return(mask);
}
@ @<Private function prototypes@> =
static inline guint bv_bits_to_unused_mask(guint bits);

@*0 Create a Boolean Vector.
@<Private function prototypes@> =
static inline Bit_Vector bv_create(guint bits);
@ Always start with an all-zero vector.
Note this code is a bit tricky ---
the pointer returned is to the data.
This is offset from the |g_malloc|'d space,
by |bv_hiddenwords|.
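The hidden-word layout can be sketched in plain C.
This is a minimal illustration under stated assumptions, not the libmarpa code:
the |sketch_| names are hypothetical, and |calloc| and |free| stand in for
the GLib allocator.
The header words sit just before the data, so they are read
with negative indexes from the pointer handed to the caller.

```c
#include <stdlib.h>

/* Hypothetical names; a sketch of the hidden-header layout, not libmarpa code. */
enum { SKETCH_HIDDEN_WORDS = 3 };
#define SKETCH_WORD_BITS ((unsigned int)(sizeof(unsigned int) * 8u))

/* Allocate a zeroed vector of `bits` bits.  The returned pointer addresses
   the data words; the three words before it hold the bit count, the word
   count, and the mask for the last data word. */
static unsigned int *sketch_bv_create(unsigned int bits)
{
    unsigned int size = (bits + SKETCH_WORD_BITS - 1u) / SKETCH_WORD_BITS;
    unsigned int used = bits % SKETCH_WORD_BITS;
    unsigned int *addr = calloc((size_t)size + SKETCH_HIDDEN_WORDS, sizeof *addr);
    if (!addr) return NULL;
    addr[0] = bits;
    addr[1] = size;
    addr[2] = used ? ~(~0u << used) : ~0u;
    return addr + SKETCH_HIDDEN_WORDS;
}

/* Step back over the hidden words before handing the block to free(). */
static void sketch_bv_free(unsigned int *bv)
{
    if (bv) free(bv - SKETCH_HIDDEN_WORDS);
}
```

Given |bv = sketch_bv_create(40u)|, the caller finds the bit count at |bv[-3]|,
the word count at |bv[-2]| and the mask at |bv[-1]|.
With a 32-bit |unsigned int| that is 40, 2 and |0xff| respectively.
Passing |bv| itself to |free| would be an error,
which is why a matching free function is needed.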
@<Function definitions@> =
static inline Bit_Vector bv_create(guint bits)
{
  guint size = bv_bits_to_size(bits);
  guint bytes = (size + bv_hiddenwords) * sizeof(Bit_Vector_Word);
  guint* addr = (Bit_Vector) g_malloc0((size_t) bytes);
  *addr++ = bits;
  *addr++ = size;
  *addr++ = bv_bits_to_unused_mask(bits);
  return addr;
}

@*0 Create a Boolean Vector on an Obstack.
@<Private function prototypes@> =
static inline Bit_Vector bv_obs_create(struct obstack *obs, guint bits);
@ Always start with an all-zero vector.
Note this code is a bit tricky ---
the pointer returned is to the data.
This is offset from the obstack-allocated space,
by |bv_hiddenwords|.
@<Function definitions@> =
static inline Bit_Vector
bv_obs_create (struct obstack *obs, guint bits)
{
  guint size = bv_bits_to_size (bits);
  guint bytes = (size + bv_hiddenwords) * sizeof (Bit_Vector_Word);
  guint *addr = (Bit_Vector) obstack_alloc (obs, (size_t) bytes);
  *addr++ = bits;
  *addr++ = size;
  *addr++ = bv_bits_to_unused_mask (bits);
  if (size > 0) {
    Bit_Vector bv = addr;
    while (size--) *bv++ = 0u;
  }
  return addr;
}


@*0 Shadow a Boolean Vector.
Create another vector the same size as the original, but with
all bits unset.
@<Function definitions@> =
static inline Bit_Vector bv_shadow(Bit_Vector bv)
{
  return bv_create(BV_BITS(bv));
}
@ @<Private function prototypes@> =
static inline Bit_Vector bv_shadow(Bit_Vector bv);

@*0 Copy a Boolean Vector.
Copy the contents of one boolean vector into another.
The two vectors are assumed to be the same size.
No new vector is allocated.
@<Function definitions@> = static inline
Bit_Vector bv_copy(Bit_Vector bv_to, Bit_Vector bv_from)
{
  guint *p_to = bv_to;
  const guint bits = BV_BITS(bv_to);
  if (bits > 0)
    {
      gint count = BV_SIZE(bv_to);
      while (count--) *p_to++ = *bv_from++;
    }
  return(bv_to);
}
@ @<Private function prototypes@> =
static inline
Bit_Vector bv_copy(Bit_Vector bv_to, Bit_Vector bv_from);

@*0 Clone a Boolean Vector.
Given a boolean vector, creates a new vector which is
an exact duplicate.
This call allocates a new vector, which must be freed with |bv_free|.
@<Function definitions@> = static inline
Bit_Vector bv_clone(Bit_Vector bv)
{
  return bv_copy(bv_shadow(bv), bv);
}
@ @<Private function prototypes@> =
static inline
Bit_Vector bv_clone(Bit_Vector bv);

@*0 Free a Boolean Vector.
@<Function definitions@> =
static inline void bv_free(Bit_Vector vector) {
  vector -= bv_hiddenwords;
  g_free(vector);
}
@ @<Private function prototypes@> =
static inline void bv_free(Bit_Vector vector);

@*0 The Number of Bytes in a Boolean Vector.
@<Function definitions@> =
static inline gint bv_bytes(Bit_Vector bv) {
  return (BV_SIZE(bv)+bv_hiddenwords)*sizeof(Bit_Vector_Word);
}
@ @<Private function prototypes@> =
static inline gint bv_bytes(Bit_Vector bv);

@*0 Fill a Boolean Vector.
@<Function definitions@> =
static inline void bv_fill(Bit_Vector bv)
{
  guint size = BV_SIZE(bv);
  const guint mask = BV_MASK(bv); /* fetch the mask while |bv| still points at the first data word */
  if (size <= 0) return;
  while (size--) *bv++ = ~0u;
  --bv;
  *bv &= mask;
}
@ @<Private function prototypes@> =
static inline void bv_fill(Bit_Vector bv);

@*0 Clear a Boolean Vector.
@ @<Private function prototypes@> =
static inline void bv_clear(Bit_Vector bv);
@ @<Function definitions@> =
static inline void bv_clear(Bit_Vector bv)
{
  guint size = BV_SIZE(bv);
  if (size <= 0) return;
  while (size--) *bv++ = 0u;
}

@ This function "overclears" ---
it clears "too many bits".
It clears a prefix of the bit vector faster
than an interval clear, at the expense of often
clearing more bits than were requested.
In some situations clearing the extra bits is OK.
@<Private function prototypes@> =
static inline void bv_over_clear(Bit_Vector bv, guint bit);
@ @<Function definitions@> =
static inline void bv_over_clear(Bit_Vector bv, guint bit)
{
  guint length = bit/bv_wordbits+1;
  while (length--) *bv++ = 0u;
}

@*0 Set a Boolean Vector Bit.
@ @<Function definitions@> =
static inline void bv_bit_set(Bit_Vector vector, guint bit) {
  *(vector+(bit/bv_wordbits)) |= (bv_lsb << (bit%bv_wordbits));
}
@ @<Private function prototypes@> =
static inline void bv_bit_set(Bit_Vector vector, guint bit);

@*0 Clear a Boolean Vector Bit.
@<Function definitions@> =
static inline void bv_bit_clear(Bit_Vector vector, guint bit) {
  *(vector+(bit/bv_wordbits)) &= ~ (bv_lsb << (bit%bv_wordbits));
}
@ @<Private function prototypes@> =
static inline void bv_bit_clear(Bit_Vector vector, guint bit);

@*0 Test a Boolean Vector Bit.
@<Function definitions@> =
static inline gboolean bv_bit_test(Bit_Vector vector, guint bit) {
  return (*(vector+(bit/bv_wordbits)) & (bv_lsb << (bit%bv_wordbits))) != 0u;
}
@ @<Private function prototypes@> =
static inline gboolean bv_bit_test(Bit_Vector vector, guint bit);

@*0 Test and Set a Boolean Vector Bit.
Ensure that a bit is set, returning its previous value to the caller.
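A typical use of test-and-set is duplicate detection,
where one call both records an ID and reports whether it was seen before.
The following is a minimal sketch under stated assumptions:
the |sketch_| names are hypothetical, the vector is a bare array of words
without hidden header words, and every ID is assumed to be
below the capacity of the array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; a sketch of test-and-set on a bare word array. */
#define SKETCH_WORD_BITS ((unsigned int)(sizeof(unsigned int) * 8u))

/* Set bit `bit` in `words` and report whether it was already set. */
static bool sketch_test_and_set(unsigned int *words, unsigned int bit)
{
    unsigned int *addr = words + bit / SKETCH_WORD_BITS;
    unsigned int mask = 1u << (bit % SKETCH_WORD_BITS);
    bool was_set = (*addr & mask) != 0u;
    *addr |= mask;
    return was_set;
}

/* Count the distinct IDs in ids[0..n-1], using `seen` as scratch space.
   `seen` must start out all zero and be large enough for every ID. */
static size_t sketch_count_distinct(unsigned int *seen,
                                    const unsigned int *ids, size_t n)
{
    size_t distinct = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        if (!sketch_test_and_set(seen, ids[i])) distinct++;
    }
    return distinct;
}
```

Folding the test and the set into one call means the word holding the bit
is located only once per ID.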
@ @<Private function prototypes@> =
static inline gboolean bv_bit_test_and_set(Bit_Vector vector, guint bit);
@ @<Function definitions@> =
static inline gboolean
bv_bit_test_and_set (Bit_Vector vector, guint bit)
{
  Bit_Vector addr = vector + (bit / bv_wordbits);
  guint mask = bv_lsb << (bit % bv_wordbits);
  if ((*addr & mask) != 0u)
    return 1;
  *addr |= mask;
  return 0;
}

@*0 Set a Boolean Vector to all Ones.
@*0 Test a Boolean Vector for all Zeroes.
@<Function definitions@> =
static inline
gboolean bv_is_empty(Bit_Vector addr)
{
  guint size = BV_SIZE(addr);
  gboolean r = TRUE;
  if (size > 0) {
    *(addr+size-1) &= BV_MASK(addr);
    while (r && (size-- > 0)) r = ( *addr++ == 0 );
  }
  return(r);
}
@ @<Private function prototypes@> =
static inline
gboolean bv_is_empty(Bit_Vector addr);

@*0 Bitwise-negate a Boolean Vector.
@<Function definitions@>=
static inline void bv_not(Bit_Vector X, Bit_Vector Y)
{
  guint size = BV_SIZE(X);
  guint mask = BV_MASK(X);
  while (size-- > 0) *X++ = ~*Y++;
  *(--X) &= mask;
}
@ @<Private function prototypes@> =
static inline void bv_not(Bit_Vector X, Bit_Vector Y);

@*0 Bitwise-and a Boolean Vector.
@<Function definitions@>=
static inline void bv_and(Bit_Vector X, Bit_Vector Y, Bit_Vector Z)
{
  guint size = BV_SIZE(X);
  guint mask = BV_MASK(X);
  while (size-- > 0) *X++ = *Y++ & *Z++;
  *(--X) &= mask;
}
@ @<Private function prototypes@> =
static inline void bv_and(Bit_Vector X, Bit_Vector Y, Bit_Vector Z);

@*0 Bitwise-or a Boolean Vector.
12201@<Function definitions@>= 12202static inline void bv_or(Bit_Vector X, Bit_Vector Y, Bit_Vector Z) 12203{ 12204 guint size = BV_SIZE(X); 12205 guint mask = BV_MASK(X); 12206 while (size-- > 0) *X++ = *Y++ | *Z++; 12207 *(--X) &= mask; 12208} 12209@ @<Private function prototypes@> = 12210static inline void bv_or(Bit_Vector X, Bit_Vector Y, Bit_Vector Z); 12211 12212@*0 Bitwise-or-assign a Boolean Vector. 12213@<Function definitions@>= 12214static inline void bv_or_assign(Bit_Vector X, Bit_Vector Y) 12215{ 12216 guint size = BV_SIZE(X); 12217 guint mask = BV_MASK(X); 12218 while (size-- > 0) *X++ |= *Y++; 12219 *(--X) &= mask; 12220} 12221@ @<Private function prototypes@> = 12222static inline void bv_or_assign(Bit_Vector X, Bit_Vector Y); 12223 12224@*0 Scan a Boolean Vector. 12225@<Function definitions@>= 12226static inline 12227gboolean bv_scan(Bit_Vector bv, guint start, 12228 guint* min, guint* max) 12229{ 12230 guint size = BV_SIZE(bv); 12231 guint mask = BV_MASK(bv); 12232 guint offset; 12233 guint bitmask; 12234 guint value; 12235 gboolean empty; 12236 12237 if (size == 0) return FALSE; 12238 if (start >= BV_BITS(bv)) return FALSE; 12239 *min = start; 12240 *max = start; 12241 offset = start / bv_wordbits; 12242 *(bv+size-1) &= mask; 12243 bv += offset; 12244 size -= offset; 12245 bitmask = (guint)1 << (start & bv_modmask); 12246 mask = ~ (bitmask | (bitmask - (guint)1)); 12247 value = *bv++; 12248 if ((value & bitmask) == 0) 12249 { 12250 value &= mask; 12251 if (value == 0) 12252 { 12253 offset++; 12254 empty = TRUE; 12255 while (empty && (--size > 0)) 12256 { 12257 if ((value = *bv++)) empty = FALSE; else offset++; 12258 } 12259 if (empty) return FALSE; 12260 } 12261 start = offset * bv_wordbits; 12262 bitmask = bv_lsb; 12263 mask = value; 12264 while (!(mask & bv_lsb)) 12265 { 12266 bitmask <<= 1; 12267 mask >>= 1; 12268 start++; 12269 } 12270 mask = ~ (bitmask | (bitmask - 1)); 12271 *min = start; 12272 *max = start; 12273 } 12274 value = ~ value; 
12275 value &= mask; 12276 if (value == 0) 12277 { 12278 offset++; 12279 empty = TRUE; 12280 while (empty && (--size > 0)) 12281 { 12282 if ((value = ~ *bv++)) empty = FALSE; else offset++; 12283 } 12284 if (empty) value = bv_lsb; 12285 } 12286 start = offset * bv_wordbits; 12287 while (! (value & bv_lsb)) 12288 { 12289 value >>= 1; 12290 start++; 12291 } 12292 *max = --start; 12293 return TRUE; 12294} 12295@ @<Private function prototypes@> = 12296static inline 12297gboolean bv_scan( 12298 Bit_Vector bv, guint start, guint* min, guint* max); 12299 12300@*0 Count the bits in a Boolean Vector. 12301@<Function definitions@>= 12302static inline guint 12303bv_count (Bit_Vector v) 12304{ 12305 guint start, min, max; 12306 guint count = 0; 12307 for (start = 0; bv_scan (v, start, &min, &max); start = max + 2) 12308 { 12309 count += max - min + 1; 12310 } 12311 return count; 12312} 12313@ @<Private function prototypes@> = 12314static inline guint bv_count (Bit_Vector v); 12315 12316@*0 The RHS Closure of a Vector. 12317Despite the fact that they are actually tied closely to their 12318use in |libmarpa|, most of the logic of boolean vectors has 12319a ``pure math" appearance. 12320This routine has a direct connection with the grammar. 12321\par 12322Several properties of symbols that need to be determined 12323have the property that, if 12324all the symbols on the RHS of any rule have that property, 12325so does its LHS symbol. 12326@ The RHS closure looks a lot like the transitive closure, 12327but there are several major differences. 12328The biggest difference is that 12329the RHS closure deals with properties and takes a {\bf vector} to another 12330vector; 12331the transitive closure is for a relation and takes a transition {\bf matrix} 12332to another transition matrix. 12333@ There are two properties of the RHS closure to note. 12334First, it is reflexive. 12335Any symbol in a set is in the RHS closure of that set. 12336@ Second, the RHS closure is vacuously true. 
12337For any RHS closure property, 12338every symbol which is on the LHS of an empty rule has that property. 12339This means the RHS closure operation can only be used for 12340properties which can meaningfully be regarded as vacuously 12341true. 12342In |libmarpa|, two important symbol properties are 12343RHS closure properties: 12344the property of being productive, 12345and the property of being nullable. 12346 12347@*0 Produce the RHS Closure of a Vector. 12348This routine takes a symbol vector and a grammar, 12349and turns the original vector into the RHS closure of that vector. 12350The original vector is destroyed. 12351\par 12352If I decide rules should have a unique right hand symbol list, 12353this is one place to use it. 12354Duplicate symbols on the RHS are visited uselessly. 12355@<Function definitions@> = 12356static void 12357rhs_closure (struct marpa_g *g, Bit_Vector bv) 12358{ 12359 guint min, max, start = 0; 12360 Marpa_Symbol_ID *top_of_stack = NULL; 12361 FSTACK_DECLARE (stack, Marpa_Symbol_ID)@; 12362 FSTACK_INIT (stack, Marpa_Symbol_ID, SYM_Count_of_G(g)); 12363 while (bv_scan (bv, start, &min, &max)) 12364 { 12365 guint symid; 12366 for (symid = min; symid <= max; symid++) 12367 { 12368 *(FSTACK_PUSH (stack)) = symid; 12369 } 12370 start = max + 2; 12371 } 12372 while ((top_of_stack = FSTACK_POP (stack))) 12373 { 12374 guint rule_ix; 12375 GArray *rules = SYM_by_ID (*top_of_stack)->t_rhs; 12376 for (rule_ix = 0; rule_ix < rules->len; rule_ix++) 12377 { 12378 Marpa_Rule_ID rule_id = 12379 g_array_index (rules, Marpa_Rule_ID, rule_ix); 12380 RULE rule = RULE_by_ID (g, rule_id); 12381 guint rule_length; 12382 guint rh_ix; 12383 Marpa_Symbol_ID lhs_id = LHS_ID_of_RULE (rule); 12384 if (bv_bit_test (bv, (guint) lhs_id)) 12385 goto NEXT_RULE; 12386 rule_length = Length_of_RULE(rule); 12387 for (rh_ix = 0; rh_ix < rule_length; rh_ix++) 12388 { 12389 if (!bv_bit_test (bv, (guint) RHS_ID_of_RULE (rule, rh_ix))) 12390 goto NEXT_RULE; 12391 } 12392 /*
If I am here, the bits for the RHS symbols are all 12393 * set, but the one for the LHS symbol is not. 12394 */ 12395 bv_bit_set (bv, (guint) lhs_id); 12396 *(FSTACK_PUSH (stack)) = lhs_id; 12397 NEXT_RULE:; 12398 } 12399 } 12400 FSTACK_DESTROY (stack); 12401} 12402@ @<Private function prototypes@> = 12403static void rhs_closure(struct marpa_g* g, Bit_Vector bv); 12404 12405@** Boolean Matrixes. 12406Marpa's Boolean matrixes are implemented differently 12407from the matrixes in 12408Steffen Beyer's Bit-Vector package on CPAN, 12409but, like Beyer's matrixes, are built on that package. 12410Beyer's matrixes are a single Boolean vector 12411which special routines index by row and column. 12412Marpa's matrixes are arrays of vectors. 12413 12414Since there are ``hidden words" before the data 12415in each vector, Marpa must repeat these for each 12416row of a matrix. Consequences: 12417\li Marpa matrixes use a few extra bytes per row of space. 12418\li Marpa's matrix pointers cannot be used as vectors. 12419\li Marpa's rows {\bf can} be used as vectors. 12420\li Marpa's matrix pointers point to the beginning of 12421the allocated space. |Bit_Vector| pointers use trickery 12422and include ``hidden words" before the pointer. 12423@ Note that |typedef|'s for |Bit_Matrix| 12424and |Bit_Vector| are identical. 12425@s Bit_Matrix int 12426@<Private typedefs@> = 12427typedef Bit_Vector_Word* Bit_Matrix; 12428 12429@*0 Create a Boolean Matrix. 12430@ Here the pointer returned is the actual start of the 12431|g_malloc|'d space. 12432This is {\bf not} the case with vectors, whose pointer is offset for 12433the ``hidden words".
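\par
The difference between a matrix pointer and a row pointer may be easier to see in a small standalone model.
This is a sketch under stated assumptions, not the |libmarpa| code: it assumes three hidden words per row
(bit count, word count, and a mask that the sketch leaves at zero), uses |calloc| in place of |g_malloc0|,
and every |sketch_| name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

enum { SKETCH_HIDDEN = 3 };  /* assumed number of hidden words per row */
#define SKETCH_WORDBITS ((unsigned int)(sizeof(unsigned int) * CHAR_BIT))

/* Allocate rows * (hidden + data) zeroed words and fill in each row's
   header.  The pointer returned is the start of the allocation. */
static unsigned int *sketch_matrix_create(unsigned int rows, unsigned int columns)
{
  unsigned int data_words = (columns + SKETCH_WORDBITS - 1) / SKETCH_WORDBITS;
  unsigned int stride = data_words + SKETCH_HIDDEN;
  unsigned int *base = calloc((size_t)rows * stride, sizeof *base);
  unsigned int row;
  if (!base) return NULL;
  for (row = 0; row < rows; row++) {
    base[row * stride] = columns;         /* hidden word: bit count */
    base[row * stride + 1] = data_words;  /* hidden word: data word count */
    /* The third hidden word (a mask for the last data word) stays zero
       in this sketch. */
  }
  return base;
}

/* A row pointer skips that row's hidden words, so the hidden words sit
   at negative offsets from it, as they would for a standalone vector. */
static unsigned int *sketch_matrix_row(unsigned int *base, unsigned int row)
{
  unsigned int *row0 = base + SKETCH_HIDDEN;
  unsigned int stride = row0[-2] + SKETCH_HIDDEN;
  return row0 + row * stride;
}
```

In this model the matrix pointer is what must be passed to |free|, while only row pointers may be handed to vector-style routines.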
12434@<Function definitions@> = 12435static inline Bit_Matrix matrix_create(guint rows, guint columns) 12436{ 12437 guint bv_data_words = bv_bits_to_size(columns); 12438 guint row_bytes = (bv_data_words + bv_hiddenwords) * sizeof(Bit_Vector_Word); 12439 guint bv_mask = bv_bits_to_unused_mask(columns); 12440 Bit_Vector_Word* matrix_addr = g_malloc0((size_t)(row_bytes * rows)); 12441 guint row; 12442 for (row = 0; row < rows; row++) { 12443 guint row_start = row*(bv_data_words+bv_hiddenwords); 12444 matrix_addr[row_start] = columns; 12445 matrix_addr[row_start+1] = bv_data_words; 12446 matrix_addr[row_start+2] = bv_mask; 12447 } 12448 return matrix_addr; 12449} 12450@ @<Private function prototypes@> = 12451static inline Bit_Matrix matrix_create(guint rows, guint columns); 12452 12453@*0 Free a Boolean Matrix. 12454@<Function definitions@> = 12455static inline void matrix_free(Bit_Matrix matrix) { 12456 g_free(matrix); 12457} 12458@ @<Private function prototypes@> = 12459static inline void matrix_free(Bit_Matrix matrix); 12460 12461@*0 Find the Number of Columns in a Boolean Matrix. 12462The column count returned is for the first row. 12463It is assumed that 12464all rows have the same number of columns. 12465Note that, in this implementation, the matrix has no 12466idea internally of how many rows it has. 12467@<Function definitions@> = 12468static inline gint matrix_columns(Bit_Matrix matrix) { 12469 Bit_Vector row0 = matrix+bv_hiddenwords; 12470 return BV_BITS(row0); 12471} 12472@ @<Private function prototypes@> = 12473static inline gint matrix_columns(Bit_Matrix matrix); 12474 12475@*0 Find a Row of a Boolean Matrix. 12476Here's where the slight extra overhead of repeating 12477identical ``hidden word" data for each row of a matrix 12478pays off. 12479This simply returns a pointer into the matrix. 12480This is adequate if the data is not changed. 12481If it is changed, the vector should be cloned. 
12482There is a bit of arithmetic, to deal with the 12483hidden words offset. 12484@<Function definitions@> = 12485static inline Bit_Vector matrix_row(Bit_Matrix matrix, guint row) { 12486 Bit_Vector row0 = matrix+bv_hiddenwords; 12487 guint words_per_row = BV_SIZE(row0)+bv_hiddenwords; 12488 return row0 + row*words_per_row; 12489} 12490@ @<Private function prototypes@> = 12491static inline Bit_Vector matrix_row(Bit_Matrix matrix, guint row); 12492 12493@*0 Set a Boolean Matrix Bit. 12494@ @<Function definitions@> = 12495static inline void matrix_bit_set(Bit_Matrix matrix, guint row, guint column) { 12496 Bit_Vector vector = matrix_row(matrix, row); 12497 bv_bit_set(vector, column); 12498} 12499@ @<Private function prototypes@> = 12500static inline void matrix_bit_set(Bit_Matrix matrix, guint row, guint column); 12501 12502@*0 Clear a Boolean Matrix Bit. 12503@ @<Function definitions@> = 12504static inline void matrix_bit_clear(Bit_Matrix matrix, guint row, guint column) { 12505 Bit_Vector vector = matrix_row(matrix, row); 12506 bv_bit_clear(vector, column); 12507} 12508@ @<Private function prototypes@> = 12509static inline void matrix_bit_clear(Bit_Matrix matrix, guint row, guint column); 12510 12511@*0 Test a Boolean Matrix Bit. 12512@ @<Function definitions@> = 12513static inline gboolean matrix_bit_test(Bit_Matrix matrix, guint row, guint column) { 12514 Bit_Vector vector = matrix_row(matrix, row); 12515 return bv_bit_test(vector, column); 12516} 12517@ @<Private function prototypes@> = 12518static inline gboolean matrix_bit_test(Bit_Matrix matrix, guint row, guint column); 12519 12520@*0 Produce the Transitive Closure of a Boolean Matrix. 12521This routine takes a matrix representing a relation 12522and produces a matrix that represents the transitive closure 12523of the relation. 12524The matrix is assumed to be square. 12525The input matrix will be destroyed. 
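\par
For readers who want to check results on a small relation, the closure can also be computed with Warshall's algorithm.
Note that this is a different technique from the worklist of transitions used by the routine in this section;
it is offered only as an easily verified reference, on a dense array of bytes rather than a |Bit_Matrix|.
The names |SKETCH_N| and |sketch_transitive_closure| are hypothetical.

```c
#include <assert.h>

enum { SKETCH_N = 4 };  /* the relation is over SKETCH_N elements */

/* Warshall's algorithm: afterward rel[i][j] is 1 exactly when j can be
   reached from i in one or more steps.  The input is overwritten. */
static void sketch_transitive_closure(unsigned char rel[SKETCH_N][SKETCH_N])
{
  int i, j, k;
  for (k = 0; k < SKETCH_N; k++)
    for (i = 0; i < SKETCH_N; i++)
      if (rel[i][k])
        for (j = 0; j < SKETCH_N; j++)
          if (rel[k][j])
            rel[i][j] = 1;
}
```

Starting from the chain 0 to 1, 1 to 2, 2 to 3, the closure adds 0 to 2, 0 to 3 and 1 to 3, and nothing else.
In particular, the closure is not reflexive unless the input already was.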
12526@<Function definitions@> = 12527static void transitive_closure(Bit_Matrix matrix) 12528{ 12529 struct transition { guint from, to; } * top_of_stack = NULL; 12530 guint size = matrix_columns(matrix); 12531 guint row; 12532 DSTACK_DECLARE(stack); 12533 DSTACK_INIT(stack, struct transition, 1024); 12534 for (row = 0; row < size; row++) { 12535 guint min, max, start; 12536 Bit_Vector row_vector = matrix_row(matrix, row); 12537 for ( start = 0; bv_scan(row_vector, start, &min, &max); start = max+2 ) { 12538 guint column; 12539 for (column = min; column <= max; column++) { 12540 struct transition *t = DSTACK_PUSH(stack, struct transition); 12541 t->from = row; 12542 t->to = column; 12543 } } } 12544 while ((top_of_stack = DSTACK_POP(stack, struct transition))) { 12545 guint old_from = top_of_stack->from; 12546 guint old_to = top_of_stack->to; 12547 guint new_ix; 12548 for (new_ix = 0; new_ix < size; new_ix++) { 12549 /* Optimizations based on reuse of the same row are 12550 probably best left to the compiler's optimizer. 12551 */ 12552 if (!matrix_bit_test(matrix, new_ix, old_to) && 12553 matrix_bit_test(matrix, new_ix, old_from)) { 12554 struct transition *t = (DSTACK_PUSH(stack, struct transition)); 12555 matrix_bit_set(matrix, new_ix, old_to); 12556 t->from = new_ix; 12557 t->to = old_to; 12558 } 12559 if (!matrix_bit_test(matrix, old_from, new_ix) && 12560 matrix_bit_test(matrix, old_to, new_ix)) { 12561 struct transition *t = (DSTACK_PUSH(stack, struct transition)); 12562 matrix_bit_set(matrix, old_from, new_ix); 12563 t->from = old_from; 12564 t->to = new_ix; 12565 } 12566 } 12567 } 12568 DSTACK_DESTROY(stack); 12569} 12570@ @<Private function prototypes@> = 12571static void transitive_closure(Bit_Matrix matrix); 12572 12573@** Efficient Stacks and Queues. 12574@ The interface for these macros is somewhat hackish, 12575in that the user often 12576must be aware of the implementation of the 12577macros. 
12578Arguably, using these macros is not 12579all that much easier than 12580hand-writing each instance. 12581But the most important goal was safety -- by 12582writing this stuff once I have a greater assurance 12583that it is tested and bug-free. 12584Another important goal was that there be 12585no compromise on efficiency, 12586when compared to hand-written code. 12587 12588@*0 Fixed Size Stacks. 12589|libmarpa| uses stacks and worklists extensively. 12590Often a reasonable maximum size is known when they are 12591set up, in which case they can be made very fast. 12592@d FSTACK_DECLARE(stack, type) struct { gint t_count; type* t_base; } stack; 12593@d FSTACK_CLEAR(stack) ((stack).t_count = 0) 12594@d FSTACK_INIT(stack, type, n) (FSTACK_CLEAR(stack), ((stack).t_base = g_new(type, n))) 12595@d FSTACK_SAFE(stack) ((stack).t_base = NULL) 12596@d FSTACK_BASE(stack, type) ((type *)(stack).t_base) 12597@d FSTACK_INDEX(this, type, ix) (FSTACK_BASE((this), type)+(ix)) 12598@d FSTACK_TOP(this, type) (FSTACK_LENGTH(this) <= 0 12599 ? NULL 12600 : FSTACK_INDEX((this), type, FSTACK_LENGTH(this)-1)) 12601@d FSTACK_LENGTH(stack) ((stack).t_count) 12602@d FSTACK_PUSH(stack) ((stack).t_base+(stack).t_count++) 12603@d FSTACK_POP(stack) ((stack).t_count <= 0 ? NULL : (stack).t_base+(--(stack).t_count)) 12604@d FSTACK_IS_INITIALIZED(stack) ((stack).t_base) 12605@d FSTACK_DESTROY(stack) (g_free((stack).t_base)) 12606 12607@*0 Dynamic Stacks. 12608|libmarpa| uses stacks and worklists extensively. 12609This stack interface resizes itself dynamically. 12610There are two disadvantages. 12611 12612\li There is more overhead --- 12613overflow must be checked for with each push, 12614and the resizings, while fast, do take time. 12615 12616\li The stack may be moved after any |DSTACK_PUSH| 12617operation, making all pointers into it invalid. 12618Data must be retrieved from the stack before the 12619next |DSTACK_PUSH|.
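\par
The second disadvantage is the one most likely to cause a silent bug, so here is a minimal sketch of the safe pattern.
It assumes a stack of |int| grown with the standard |realloc| rather than |g_realloc|,
and the |sketch_| names are hypothetical; it is not the |DSTACK| implementation.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_stack { int count; int capacity; int *base; };

static int sketch_init(struct sketch_stack *s, int initial)
{
  s->count = 0;
  s->capacity = initial;
  s->base = malloc((size_t)initial * sizeof *s->base);
  return s->base != NULL;
}

/* A push may call realloc, so a pointer obtained from an earlier push
   or pop may dangle once this returns. */
static int *sketch_push(struct sketch_stack *s)
{
  if (s->count >= s->capacity) {
    int *moved = realloc(s->base, (size_t)s->capacity * 2 * sizeof *s->base);
    if (!moved) return NULL;
    s->base = moved;
    s->capacity *= 2;
  }
  return s->base + s->count++;
}

static int *sketch_pop(struct sketch_stack *s)
{
  return s->count <= 0 ? NULL : s->base + --s->count;
}

/* Safe pattern: copy the popped datum into a local before pushing again. */
static int sketch_demo(void)
{
  struct sketch_stack s;
  int *slot;
  int value;
  if (!sketch_init(&s, 1)) return -1;
  slot = sketch_push(&s);
  if (!slot) { free(s.base); return -1; }
  *slot = 7;
  value = *sketch_pop(&s);  /* copy out now; do not keep the pointer */
  slot = sketch_push(&s);
  if (slot) *slot = 8;
  slot = sketch_push(&s);   /* capacity 1 is exceeded: the buffer may move */
  if (slot) *slot = 9;
  free(s.base);
  return value;
}
```

The copy into |value| is the whole point: reading through the popped pointer after the later pushes would be undefined behavior whenever the buffer had been moved.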
12620 12621@d DSTACK_DECLARE(this) struct s_dstack this 12622@d DSTACK_INIT(this, type, initial_size) 12623 (((this).t_count = 0), 12624 ((this).t_base = g_new(type, ((this).t_capacity = (initial_size))))) 12625 12626@ |DSTACK_SAFE| is for cases where the dstack is not 12627immediately initialized to a useful value, 12628and might never be. 12629All fields are zeroed so that when the containing object 12630is destroyed, the deallocation logic knows that no 12631memory has been allocated and therefore no attempt 12632to free memory should be made. 12633@d DSTACK_IS_INITIALIZED(this) ((this).t_base) 12634@d DSTACK_SAFE(this) 12635 (((this).t_count = (this).t_capacity = 0), ((this).t_base = NULL)) 12636 12637@ A stack reinitialized by 12638|DSTACK_CLEAR| contains 0 elements, 12639but has the same capacity as it had before the reinitialization. 12640This saves the cost of reallocating the dstack's buffer, 12641and leaves its capacity at what is hopefully 12642a stable, high-water mark, which will make future 12643resizings unnecessary. 12644@d DSTACK_CLEAR(this) ((this).t_count = 0) 12645@d DSTACK_PUSH(this, type) 12646 (((this).t_count >= (this).t_capacity ? dstack_resize(&(this), sizeof(type)) : 0), 12647 ((type *)(this).t_base+(this).t_count++)) 12648@d DSTACK_POP(this, type) ((this).t_count <= 0 ? NULL : 12649 ( (type*)(this).t_base+(--(this).t_count))) 12650@d DSTACK_INDEX(this, type, ix) (DSTACK_BASE((this), type)+(ix)) 12651@d DSTACK_TOP(this, type) (DSTACK_LENGTH(this) <= 0 12652 ? NULL 12653 : DSTACK_INDEX((this), type, DSTACK_LENGTH(this)-1)) 12654@d DSTACK_BASE(this, type) ((type *)(this).t_base) 12655@d DSTACK_LENGTH(this) ((this).t_count) 12656 12657@ 12658|DSTACK|'s can have their data ``stolen", by other containers. 12659The |STOLEN_DSTACK_DATA_FREE| macro is intended 12660to help the ``thief" container 12661deallocate the data it now has ``stolen". 
12662@d STOLEN_DSTACK_DATA_FREE(data) ((data) && (g_free(data), 1)) 12663@d DSTACK_DESTROY(this) STOLEN_DSTACK_DATA_FREE(this.t_base) 12664 12665@<Private incomplete structures@> = 12666struct s_dstack; 12667typedef struct s_dstack* DSTACK; 12668@ @<Private utility structures@> = 12669struct s_dstack { gint t_count; gint t_capacity; gpointer t_base; }; 12670@ @<Function definitions@> = 12671static inline gpointer dstack_resize(struct s_dstack* this, gsize type_bytes) { 12672 this->t_capacity *= 2; 12673 this->t_base = g_realloc(this->t_base, this->t_capacity*type_bytes); 12674 return this->t_base; 12675} 12676@ @<Private function prototypes@> = 12677static inline gpointer dstack_resize(struct s_dstack* this, gsize type_size); 12678 12679@*0 Dynamic Queues. 12680This is simply a dynamic stack extended with a second 12681index. 12682There is no destructor at this point, because so far all uses 12683of this let another container ``steal" the data from this one. 12684When one exists, it will simply call the dynamic stack destructor. 12685Instead I define a destructor for the ``thief" container to use 12686when it needs to free the data. 12687 12688@d DQUEUE_DECLARE(this) struct s_dqueue this 12689@d DQUEUE_INIT(this, type, initial_size) 12690 ((this.t_current=0), DSTACK_INIT(this.t_stack, type, initial_size)) 12691@d DQUEUE_PUSH(this, type) DSTACK_PUSH(this.t_stack, type) 12692@d DQUEUE_POP(this, type) DSTACK_POP(this.t_stack, type) 12693@d DQUEUE_NEXT(this, type) (this.t_current >= DSTACK_LENGTH(this.t_stack) 12694 ?
NULL 12695 : (DSTACK_BASE(this.t_stack, type))+this.t_current++) 12696@d DQUEUE_BASE(this, type) DSTACK_BASE(this.t_stack, type) 12697@d DQUEUE_END(this) DSTACK_LENGTH(this.t_stack) 12698@d STOLEN_DQUEUE_DATA_FREE(data) STOLEN_DSTACK_DATA_FREE(data) 12699 12700@<Private incomplete structures@> = 12701struct s_dqueue; 12702typedef struct s_dqueue* DQUEUE; 12703@ @<Private structures@> = 12704struct s_dqueue { gint t_current; struct s_dstack t_stack; }; 12705 12706@** Per-Earley-Set List (PSL) Code. 12707There are several cases where Marpa needs to 12708look up a triple $\langle s,s',k \rangle$, 12709where $s$ and $s'$ are earlemes, and $0<k<n$, 12710where $n$ is a reasonably small constant, 12711such as the number of AHFA items. 12712Earley items, or-nodes and and-nodes are examples. 12713@ Lookup for Earley items needs to be $O(1)$ 12714to justify Marpa's time complexity claims. 12715Setup of the parse 12716bocage for evaluation is not 12717parsing in the strict sense, 12718but it makes sense to have it meet the same time complexity claims. 12719@ 12720To obtain $O(1)$, 12721Marpa uses a special data structure, the Per-Earley-Set List. 12722The Per-Earley-Set Lists rely on the following being true: 12723\li It can be arranged so 12724that only one $s'$ is being considered at a time, 12725so that we are in fact looking up a duple $\langle s,k \rangle$. 12726\li In all cases of interest 12727we will have pointers available that take 12728us directly to all of the 12729Earley sets involved, 12730so that lookup of the data for an Earley set is $O(1)$. 12731\li The value of $k$ is always less than a constant. 12732Therefore any reasonable algorithm 12733for the search and insertion of $k$ is $O(1)$. 12734@ The idea is that each Earley set has a list of values 12735for all the keys $k$. 12736We arrange to consider only one Earley set $s$ at a time. 12737A pointer takes us to the Earley set $s'$ in $O(1)$ time. 12738Each Earley set has a list of values indexed by $k$.
12739Since this list is of a size less than a constant, 12740search and insertion in it is $O(1)$. 12741Thus each search and insertion for the triple 12742$\langle s,s',k \rangle$ takes $O(1)$ time. 12743@ In understanding how the PSL's are used, it is important 12744to keep in mind that the PSL's are kept in Earley sets as 12745a convenience, and that the semantic relation of the Earley set 12746to the data structure being tracked by the PSL is not important 12747in the choice of where the PSL goes. 12748All data structures tracked by PSL's belong 12749semantically more to 12750the Earley set of their dot earleme than any other, 12751but for the time complexity hack to work, 12752that must be held constant while another Earley set is 12753the one which varies. 12754In the case of Earley items and or-nodes, the varying 12755Earley set is the origin. 12756In the case of and-nodes, the origin Earley set is also 12757held constant, and the Earley set of the middle earleme 12758is the variable. 12759@ The PSL's are kept in a linked list. 12760Each contains |Size_of_PSL| |gpointer|'s. 12761|t_owner| is the address of the location 12762that ``owns" this PSL. 12763That location will be NULL'ed 12764when deallocating. 12765@<Private incomplete structures@> = 12766struct s_per_earley_set_list; 12767typedef struct s_per_earley_set_list *PSL; 12768@ @d Sizeof_PSL(psar) 12769 (sizeof(PSL_Object) + (psar->t_psl_length - 1) * sizeof(gpointer)) 12770@d PSL_Datum(psl, i) ((psl)->t_data[(i)]) 12771@<Private structures@> = 12772struct s_per_earley_set_list { 12773 PSL t_prev; 12774 PSL t_next; 12775 PSL* t_owner; 12776 gpointer t_data[1]; 12777}; 12778typedef struct s_per_earley_set_list PSL_Object; 12779@ The per-Earley-set lists are allocated from per-Earley-set arenas.
12780@<Private incomplete structures@> = 12781struct s_per_earley_set_arena; 12782typedef struct s_per_earley_set_arena *PSAR; 12783@ The ``dot" PSAR is to track earley items whose origin 12784or current earleme is at the ``dot" location, 12785that is, the current Earley set. 12786The ``predict" PSAR 12787is to track earley items for predictions 12788at locations other than the current earleme. 12789The ``predict" PSAR 12790is used for predictions which result from 12791scanned items. 12792Since they are predictions, their current Earley set 12793and origin are at the same earleme. 12794This earleme will be somewhere after the current earleme. 12795@<Private structures@> = 12796struct s_per_earley_set_arena { 12797 gint t_psl_length; 12798 PSL t_first_psl; 12799 PSL t_first_free_psl; 12800}; 12801typedef struct s_per_earley_set_arena PSAR_Object; 12802@ @d Dot_PSAR_of_R(r) (&(r)->t_dot_psar_object) 12803@<Widely aligned recognizer elements@> = 12804PSAR_Object t_dot_psar_object; 12805@ @<Initialize recognizer elements@> = 12806 psar_init(Dot_PSAR_of_R(r), AHFA_Count_of_R (r)); 12807@ @<Destroy recognizer elements@> = 12808 psar_destroy(Dot_PSAR_of_R(r)); 12809@ @<Private function prototypes@> = 12810static inline void psar_init(const PSAR psar, gint length); 12811static inline void psar_destroy(const PSAR psar); 12812static inline PSL psl_new(const PSAR psar); 12813@ @<Function definitions@> = 12814static inline void 12815psar_init (const PSAR psar, gint length) 12816{ 12817 psar->t_psl_length = length; 12818 psar->t_first_psl = psar->t_first_free_psl = psl_new (psar); 12819} 12820@ @<Function definitions@> = 12821static inline void psar_destroy(const PSAR psar) 12822{ 12823 PSL psl = psar->t_first_psl; 12824MARPA_OFF_DEBUG3("%s psl=%p", G_STRLOC, psl); 12825 while (psl) 12826 { 12827 PSL next_psl = psl->t_next; 12828 PSL *owner = psl->t_owner; 12829MARPA_OFF_DEBUG3("%s owner=%p", G_STRLOC, owner); 12830 if (owner) 12831 *owner = NULL; 12832 g_slice_free1 
(Sizeof_PSL (psar), psl); 12833 psl = next_psl; 12834MARPA_OFF_DEBUG3("%s psl=%p", G_STRLOC, psl); 12835 } 12836} 12837@ @<Function definitions@> = 12838static inline PSL psl_new(const PSAR psar) { 12839 gint i; 12840 PSL new_psl = g_slice_alloc(Sizeof_PSL(psar)); 12841 new_psl->t_next = NULL; 12842 new_psl->t_prev = NULL; 12843 new_psl->t_owner = NULL; 12844 for (i = 0; i < psar->t_psl_length; i++) { 12845 PSL_Datum(new_psl, i) = NULL; 12846 } 12847 return new_psl; 12848} 12849@ 12850{\bf To Do}: @^To Do@> 12851This is temporary data 12852and perhaps should be kept track of on a per-phase 12853obstack. 12854@d Dot_PSL_of_ES(es) ((es)->t_dot_psl) 12855@<Widely aligned Earley set elements@> = 12856 PSL t_dot_psl; 12857@ @<Initialize Earley set PSL data@> = 12858{ set->t_dot_psl = NULL; } 12859 12860@ A PSAR reset nulls out the data in the PSL's. 12861It is a moderately expensive operation, usually 12862avoided by having the logic check for ``stale" data. 12863But when the PSAR is needed for 12864a different type of PSL data, 12865one which will require different stale-detection logic, 12866the old PSL data need to be nulled. 12867@<Private function prototypes@> = 12868static inline void psar_reset(const PSAR psar); 12869@ @<Function definitions@> = 12870static inline void psar_reset(const PSAR psar) { 12871 PSL psl = psar->t_first_psl; 12872 while (psl && psl->t_owner) { 12873 gint i; 12874 for (i = 0; i < psar->t_psl_length; i++) { 12875 PSL_Datum(psl, i) = NULL; 12876 } 12877 psl = psl->t_next; 12878 } 12879 psar_dealloc(psar); 12880} 12881 12882@ A PSAR dealloc removes an owner's claim to all of 12883its PSLs, 12884and puts them back on the free list. 12885It does {\bf not} null out the stale PSL items.
12886@<Private function prototypes@> = 12887static inline void psar_dealloc(const PSAR psar); 12888@ @<Function definitions@> = 12889static inline void psar_dealloc(const PSAR psar) { 12890 PSL psl = psar->t_first_psl; 12891 while (psl) { 12892 PSL* owner = psl->t_owner; 12893 if (!owner) break; 12894 (*owner) = NULL; 12895 psl->t_owner = NULL; 12896 psl = psl->t_next; 12897 } 12898 psar->t_first_free_psl = psar->t_first_psl; 12899} 12900 12901@ This function ``claims" a PSL. 12902The address of the claimed PSL and the PSAR 12903from which to claim it are arguments. 12904The caller must ensure that 12905there is not a PSL already 12906at the claiming address. 12907@<Private function prototypes@> = 12908static inline void psl_claim( 12909 PSL* const psl_owner, const PSAR psar); 12910@ @<Function definitions@> = 12911static inline void psl_claim( 12912 PSL* const psl_owner, const PSAR psar) { 12913 PSL new_psl = psl_alloc(psar); 12914 (*psl_owner) = new_psl; 12915 new_psl->t_owner = psl_owner; 12916} 12917 12918@ @<Claim the or-node PSL for |PSL_ES_ORD| as |CLAIMED_PSL|@> = 12919{ 12920 PSL *psl_owner = &per_es_data[PSL_ES_ORD].t_or_psl; 12921 if (!*psl_owner) 12922 psl_claim (psl_owner, or_psar); 12923 (CLAIMED_PSL) = *psl_owner; 12924} 12925#undef PSL_ES_ORD 12926#undef CLAIMED_PSL 12927 12928@ This function ``allocates" a PSL. 12929It gets a free PSL from the PSAR. 12930There must always be at least one free PSL in a PSAR. 12931This function replaces the allocated PSL with 12932a new free PSL when necessary. 
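\par
The invariant that a free element always exists is what keeps allocation branch-light.
A minimal sketch of the same idea follows, assuming a bare doubly linked list with no owner pointers or data slots,
and with |malloc| and |free| standing in for |g_slice_alloc| and |g_slice_free1|.
All |sketch_| names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_node { struct sketch_node *prev, *next; };
struct sketch_arena { struct sketch_node *first, *first_free; };

static struct sketch_node *sketch_node_new(void)
{
  struct sketch_node *n = malloc(sizeof *n);
  if (n) { n->prev = NULL; n->next = NULL; }
  return n;
}

/* The arena starts with exactly one node, which is free. */
static int sketch_arena_init(struct sketch_arena *a)
{
  a->first = a->first_free = sketch_node_new();
  return a->first != NULL;
}

/* Hand out the first free node.  If it was the last node in the list,
   append a fresh one so that a free node still exists afterward. */
static struct sketch_node *sketch_alloc(struct sketch_arena *a)
{
  struct sketch_node *free_node = a->first_free;
  struct sketch_node *next = free_node->next;
  if (!next) {
    next = sketch_node_new();
    if (!next) return NULL;
    free_node->next = next;
    next->prev = free_node;
  }
  a->first_free = next;
  return free_node;
}

/* Return every node to the free list without releasing any memory. */
static void sketch_dealloc_all(struct sketch_arena *a)
{
  a->first_free = a->first;
}

static void sketch_arena_destroy(struct sketch_arena *a)
{
  struct sketch_node *n = a->first;
  while (n) {
    struct sketch_node *next = n->next;
    free(n);
    n = next;
  }
  a->first = a->first_free = NULL;
}
```

Because nodes are only ever appended and the free pointer only moves forward between resets, the list grows to a high-water mark and is then reused without further allocation.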
12933@ @<Private function prototypes@> = 12934static inline PSL psl_alloc(const PSAR psar); 12935@ @<Function definitions@> = 12936static inline PSL psl_alloc(const PSAR psar) { 12937 PSL free_psl = psar->t_first_free_psl; 12938 PSL next_psl = free_psl->t_next; 12939 if (!next_psl) { 12940 next_psl = free_psl->t_next = psl_new(psar); 12941 next_psl->t_prev = free_psl; 12942 } 12943 psar->t_first_free_psl = next_psl; 12944 return free_psl; 12945} 12946 12947@** Memory Allocation. 12948 12949@ By default, 12950a memory allocation failure 12951inside the Marpa library is a fatal error. 12952If this is a problem, the application can 12953configure |g_malloc| to use its own allocator 12954which does something else on failure. 12955What else an application can do is not at all clear, 12956which is why the usual practice 12957is to treat memory allocation errors as 12958fatal, irrecoverable problems. 12959 12960@ An error 12961in memory allocation will be logged 12962in the domain that |g_malloc| 12963is using, not in the domain being used by Marpa. 12964 12965@ |libmarpa| uses |g_malloc|, either directly or indirectly. 12966Indirect use of |g_malloc| comes via obstacks and |g_slice|. 12967Both of these are more efficient, but both also 12968limit the ability to resize memory. 12969Obstacks also sharply limit the ability 12970to control the lifetime of the memory. 12971\par 12972It should be noted that the libraries used by |libmarpa| may 12973also allocate memory, using their own methods. 12974This allocation is often also |g_malloc| based. 12975\par 12976Obstacks are particularly useful for |libmarpa|. 12977Much of the memory allocated in |libmarpa| is 12978\li In individual allocations less than 4K, often considerably less. 12979\li Once created, is kept for the entire life of either the grammar or the recognizer. 12980\li Once created, is never resized. 12981For these, obstacks are perfect. 12982Each |libmarpa| grammar has an obstack.
12983Small allocations needed for the lifetime of the grammar 12984are allocated on these as the grammar object is built. 12985All these allocations are conveniently and quickly deallocated when 12986the grammar's obstack is destroyed along with its parent grammar. 12987@d obstack_chunk_alloc g_malloc 12988@d obstack_chunk_free g_free 12989 12990@*0 Why the obstacks are renamed. 12991Regretfully, I realized I could not simply include the 12992GNU obstacks, because of three obstacles. 12993First, the error handling is not thread-safe. In fact, 12994since it relies on a global error handler, it is not even 12995safe for use by multiple libraries within one thread. 12996Since 12997the obstack ``error handling" consisted of exactly one 12998``out of memory" message, which Marpa will never use because 12999it uses |g_malloc|, this risk comes at no benefit whatsoever. 13000Removing the error handling was far easier than leaving it 13001in. 13002 13003@ Second, there were also portability complications 13004caused by the unneeded features of obstacks. 13005\li The GNU obstacks had a complex set of |ifdef|'s intended 13006to allow the same code to be part of GNU libc, 13007or not part of it, and the portability aspect of these 13008was daunting. 13009\li GNU obstack's lone error message was dragging in 13010GNU's internationalization. 13011(|libmarpa| avoids internationalization by leaving all 13012messaging and naming to the higher layers.) 13013It was far easier to rip out these features than to 13014deal with the issues they raised, 13015especially the portability 13016issues. 13017 13018@ Third, if I did choose to try to use GNU obstacks in its 13019original form, |libmarpa| would have to deal with issues 13020of interposing identical function names in the linking process. 13021I aim at portability, even to systems that I have no 13022direct access to. 13023This is, of course, a real challenge when 13024it comes to debugging.
13025It was not cheering to think of the prospect 13026of multiple 13027libraries with obstack functions being resolved by the linkers 13028of widely different systems. 13029If, for example, a function that I intended to be used was not the 13030one linked, the bug would usually be a silent one. 13031 13032@ Porting to systems with no native obstack meant that I was 13033already in the business of maintaining my own obstacks code, 13034whether I liked it or not. 13035The only reasonable alternative seemed to be 13036to create my own version of obstacks, 13037essentially copying the GNU implementation, 13038but eliminating the unnecessary 13039but problematic features. 13040Namespace issues could then be dealt with by 13041renaming the external functions. 13042 13043@** External Failure Reports. 13044Most of 13045|libmarpa|'s external functions return failure under 13046one or more circumstances --- for 13047example, they may have been called incorrectly. 13048Many of the external routines share failure logic in 13049common. 13050I found it convenient to gather much of this logic here. 13051 13052@ External routines will differ in the exact value 13053they return on failure. 13054Routines returning a pointer will return a |NULL|. 13055External routines which return an integer value 13056will return |-2| as a general failure 13057indicator, 13058so that |-1| can be reserved for special purposes. 13059@ The circumstances under 13060which |-1| is returned are described in the section 13061for each external function call. 13062Typical meanings of |-1| are 13063``not defined", or ``does not exist". 13064 13065@ The final decision about the meaning of 13066return values is up to the higher layers. 13067A general failure return 13068(|NULL| or |-2|) will 13069typically be a hard failure. 13070A |-1| return may reasonably be 13071interpreted as a normal 13072return value, a soft failure, 13073or a hard failure, 13074depending on the context.

@ For this reason,
all the logic in this section expects |failure_indicator|
to be set in the scope in which it is used.
All failures treated in this section are general failures,
so that |-1| is not used as a return value.

@ Routines with nothing else to return often use |FALSE| as the failure indicator.
@<Return |FALSE| on failure@> = const gboolean failure_indicator = FALSE;
@ Routines returning pointers often use |NULL| as the failure indicator.
@<Return |NULL| on failure@> = const gpointer failure_indicator = NULL;
@ Routines returning an integer value use |-2| as the
general failure indicator.
@<Return |-2| on failure@> = const int failure_indicator = -2;

@*0 Grammar Failures.
|g| is assumed to be the value of the relevant grammar,
when one is required.
@<Fail if grammar is precomputed@> =
if (G_is_Precomputed(g)) {
    g_context_clear(g);
    g->t_error = "grammar precomputed";
    return failure_indicator;
}
@ @<Fail if grammar not precomputed@> =
if (!G_is_Precomputed(g)) {
    g_context_clear(g);
    g->t_error = "grammar not precomputed";
    return failure_indicator;
}
@ @<Fail if grammar |symid| is invalid@> =
if (!symbol_is_valid(g, symid)) {
    g_context_clear(g);
    g_context_int_add(g, "symid", symid);
    g->t_error = "invalid symbol id";
    return failure_indicator;
}
@ @<Fail if grammar |rule_id| is invalid@> =
if (!RULEID_of_G_is_Valid(g, rule_id)) {
    g_context_clear(g);
    g_context_int_add(g, "rule_id", rule_id);
    g->t_error = "invalid rule id";
    return failure_indicator;
}
@ @<Fail if grammar |item_id| is invalid@> =
if (!item_is_valid(g, item_id)) {
    g_context_clear(g);
    g_context_int_add(g, "item_id", item_id);
    g->t_error = "invalid item id";
    return failure_indicator;
}
@ @<Fail if grammar |AHFA_state_id| is
invalid@> =
if (!AHFA_state_id_is_valid(g, AHFA_state_id)) {
    g_context_clear(g);
    g_context_int_add(g, "AHFA_state_id", AHFA_state_id);
    g->t_error = "invalid AHFA state id";
    return failure_indicator;
}
@ @<Fail grammar if elements of |result| are not |sizeof(gint)|@> =
if (sizeof(gint) != g_array_get_element_size(result)) {
    g_context_clear(g);
    g_context_int_add(g, "expected size", sizeof(gint));
    g->t_error = "garray size mismatch";
    return failure_indicator;
}
@ @<Fail with internal grammar error@> = {
    g_context_clear(g);
    g->t_error = "internal error";
    return failure_indicator;
}

@*0 Recognizer Failures.
|r| is assumed to be the value of the relevant recognizer,
when one is required.
@<Fail if recognizer not initial@> =
if (Phase_of_R(r) != initial_phase) {
    R_ERROR("not initial recce phase");
    return failure_indicator;
}
@ @<Fail if recognizer initial@> =
if (Phase_of_R(r) == initial_phase) {
    R_ERROR("initial recce phase");
    return failure_indicator;
}
@ @<Fail if recognizer exhausted@> =
if (R_is_Exhausted(r)) {
    R_ERROR("recce exhausted");
    return failure_indicator;
}
@ @<Fail if recognizer not in input phase@> =
if (Phase_of_R(r) != input_phase) {
    R_ERROR("recce not in input phase");
    return failure_indicator;
}
@ @<Fail recognizer if not trace-safe@> =
switch (Phase_of_R(r)) {
default:
    R_ERROR("recce not trace-safe");
    return failure_indicator;
case input_phase:
case evaluation_phase:
break;
}
@ @<Fail if recognizer has fatal error@> =
if (Phase_of_R(r) == error_phase) {
    R_ERROR(r->t_fatal_error);
    return failure_indicator;
}
@ @<Fail if recognizer |symid| is invalid@> =
if (!symbol_is_valid(G_of_R(r), symid)) {
    r_context_clear(r);
    r_context_int_add(r, "symid", symid);
    R_ERROR_CXT("invalid symid");
    return failure_indicator;
}
@ @<Fail recognizer if |GArray| elements are not |sizeof(gint)|@> =
if (sizeof(gint) != g_array_get_element_size(result)) {
    r_context_clear(r);
    r_context_int_add(r, "expected size", sizeof(gint));
    R_ERROR_CXT("garray size mismatch");
    return failure_indicator;
}

@ The central error routine for the recognizer.
There are two flags which control its behavior.
One flag makes an error recognizer-fatal.
When there is a recognizer-fatal error, all
subsequent
invocations of external functions for that recognizer
object will fail.
It is a design goal of |libmarpa| to leave as much discretion
about error handling to the higher layers as possible.
Because of this, even the most severe errors
are not necessarily made recognizer-fatal.
|libmarpa| makes an
error recognizer-fatal only when the integrity of the
recognizer object is so thoroughly compromised
that |libmarpa|'s external functions cannot proceed
without risking internal memory errors,
such as bus errors and segmentation violations.
``Recognizer-fatal'' status is thus
not a means of dictating to the higher layers that a
|libmarpa| condition must be application-fatal,
but a way of preventing a recognizer error from becoming
application-fatal without the application's consent.
@d FATAL_FLAG (0x1u)
@ Another flag indicates that the caller set up the
context.
By default, |r_error| clears the context.
@d CONTEXT_FLAG (0x2u)
@ Several convenience macros are provided.
These are easier and less error-prone
than specifying the flags.
Not being error-prone
is important since there are many calls to |r_error|
in the code.
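\par
To make the effect of the two flags concrete, here is a small standalone sketch.
It is not |libmarpa| code:
|struct demo_r|, |demo_error| and the |context_count| field are
hypothetical stand-ins for the recognizer object, for |r_error| and for
the recognizer's context, and the message callback is left out.
Only the flag values and the two rules
(the context is cleared unless |CONTEXT_FLAG| is set,
the error is recorded as fatal only if |FATAL_FLAG| is set)
come from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define FATAL_FLAG (0x1u)
#define CONTEXT_FLAG (0x2u)

/* Hypothetical stand-in for the recognizer; not the real struct marpa_r. */
struct demo_r {
  const char *error;        /* most recent error message */
  const char *fatal_error;  /* set only by recognizer-fatal errors */
  int context_count;        /* models "how much context the caller set up" */
};

/* Model of the flag handling: clear the context unless the caller
   says it set one up, and record fatality only on request. */
static void
demo_error (struct demo_r *r, const char *message, unsigned int flags)
{
  if (!(flags & CONTEXT_FLAG))
    r->context_count = 0;
  r->error = message;
  if (flags & FATAL_FLAG)
    r->fatal_error = r->error;
}
```

With no flags the context is cleared and the error is not fatal;
the four flag combinations correspond to the four convenience macros.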
@d R_ERROR(message) (r_error(r, (message), 0u))
@d R_ERROR_CXT(message) (r_error(r, (message), CONTEXT_FLAG))
@d R_FATAL(message) (r_error(r, (message), FATAL_FLAG))
@d R_FATAL_CXT(message) (r_error(r, (message), CONTEXT_FLAG|FATAL_FLAG))
@<Private function prototypes@> =
static void r_error( struct marpa_r* r, Marpa_Message_ID message, guint flags );
@ Not inlined. |r_error|
occurs in the code quite often,
but |r_error|
should actually be invoked only in exceptional circumstances.
In this case space clearly is much more important than speed.
@<Function definitions@> =
static void r_error( struct marpa_r* r, Marpa_Message_ID message, guint flags ) {
    if (!(flags & CONTEXT_FLAG)) r_context_clear(r);
    r->t_error = message;
    if (flags & FATAL_FLAG) r->t_fatal_error = r->t_error;
    r_message(r, message);
}

@** Messages and Logging.
The main messaging system for |libmarpa| relies on callbacks
to upper layers.
But there are many cases in which it is not appropriate
to rely on the upper layers.
These cases include
serious internal problems,
memory allocation failures,
and debugging.

\par As a fallback messaging and logging system,
|libmarpa| uses |glib|'s Message Logging framework.
When the message domain is
under |libmarpa|'s control,
Marpa sets the domain to |"Marpa"|.
In many cases, such as memory allocation failures,
the domain will be as set by |glib|.
@ Set the logging domain.
@<Logging domain@> =
#undef G_LOG_DOMAIN@/
#define G_LOG_DOMAIN "Marpa"@/

@*0 Message callbacks.
The user can define a callback
(with argument) which is invoked whenever |libmarpa|
has a message for the upper layers.
Note that a lot of strings are used for convenience
in these messages.
These should be considered ``cookies'',
as if they were file names or variable names.
They should not be regarded as part of the user
interface, even if some default or fallback routines
may sometimes expose them to the user.
And they should
not be subject to internationalization or localization.

These message cookies are always null-terminated in
the 7-bit ASCII character set.
This is a lowest common denominator, and is not a choice
binding on the upper layers,
which may use one of the Unicode encodings or anything
else.
Cookies often are mnemonics in the English language,
but this should not be regarded
as a reason to subject them to translation ---
at least not unless you are also translating the variable
names and file names.

The intent is to have all internationalization,
localization and string encoding issues dealt with
by the upper layers.
@<Public typedefs@> =
typedef const gchar* Marpa_Message_ID;

@* Grammar Messages.
@ Function pointer declarations are
hard to type and impossible to read.
This typedef localizes the damage.
@<Callback typedefs@> =
typedef void (Marpa_G_Message_Callback)(struct marpa_g *g, Marpa_Message_ID id);
@ @<Widely aligned grammar elements@> =
    Marpa_G_Message_Callback* t_message_callback;
    gpointer t_message_callback_arg;
@ @<Initialize grammar elements@> =
g->t_message_callback_arg = NULL;
g->t_message_callback = NULL;
@ @<Function definitions@> =
void marpa_g_message_callback_set(struct marpa_g *g, Marpa_G_Message_Callback*cb)
{ g->t_message_callback = cb; }
void marpa_g_message_callback_arg_set(struct marpa_g *g, gpointer cb_arg)
{ g->t_message_callback_arg = cb_arg; }
gpointer marpa_g_message_callback_arg(struct marpa_g *g)
{ return g->t_message_callback_arg; }
@ @<Public function prototypes@> =
void marpa_g_message_callback_set(struct marpa_g *g, Marpa_G_Message_Callback*cb);
void marpa_g_message_callback_arg_set(struct marpa_g *g, gpointer cb_arg);
gpointer marpa_g_message_callback_arg(struct marpa_g *g);
@ Do the message callback.
The name of this function is spelled out to avoid a conflict with a
|glib| function.
Note that the memory management assumes that the
callback either exits or returns control to |libmarpa|.
A |longjmp| out of a callback will probably cause a memory leak.
@<Function definitions@> =
static inline void grammar_message(struct marpa_g *g, Marpa_Message_ID id)
{ Marpa_G_Message_Callback* cb = g->t_message_callback;
if (cb) { (*cb)(g, id); } }
@ @<Private function prototypes@> =
static inline void grammar_message(struct marpa_g *g, Marpa_Message_ID id);

@* Recognizer Messages.
@ Essentially the same as grammar messages,
except they live in and use the recognizer object.
@<Callback typedefs@> =
typedef void (Marpa_R_Message_Callback)(struct marpa_r *r, Marpa_Message_ID id);
@ @d Message_Callback_of_R(r) ((r)->t_message_callback)
@d Message_Callback_Arg_of_R(r) ((r)->t_message_callback_arg)
@<Widely aligned recognizer elements@> =
    Marpa_R_Message_Callback* t_message_callback;
    gpointer t_message_callback_arg;
@ @<Initialize recognizer elements@> =
r->t_message_callback_arg = NULL;
r->t_message_callback = NULL;
@ @<Function definitions@> =
void marpa_r_message_callback_set(struct marpa_r *r, Marpa_R_Message_Callback*cb)
{ r->t_message_callback = cb; }
void marpa_r_message_callback_arg_set(struct marpa_r *r, gpointer cb_arg)
{ r->t_message_callback_arg = cb_arg; }
gpointer marpa_r_message_callback_arg(struct marpa_r *r)
{ return Message_Callback_Arg_of_R(r); }
@ @<Public function prototypes@> =
void marpa_r_message_callback_set(struct marpa_r *r, Marpa_R_Message_Callback*cb);
void marpa_r_message_callback_arg_set(struct marpa_r *r, gpointer cb_arg);
gpointer marpa_r_message_callback_arg(struct marpa_r *r);
@ @<Function definitions@> =
static inline void r_message(struct marpa_r *r, Marpa_Message_ID id)
{ Marpa_R_Message_Callback* cb = Message_Callback_of_R(r);
if (cb) { (*cb)(r, id); } }
@ @<Private function prototypes@> =
static inline void r_message(struct marpa_r *r, Marpa_Message_ID id);

@** Debugging.
The |MARPA_DEBUG| flag enables intrusive debugging logic.
``Intrusive'' debugging includes things which would
be annoying in production, such as detailed messages about
internal matters on |STDERR|.
@d MARPA_OFF_DEBUG1(a)
@d MARPA_OFF_DEBUG2(a, b)
@d MARPA_OFF_DEBUG3(a, b, c)
@d MARPA_OFF_DEBUG4(a, b, c, d)
@d MARPA_OFF_DEBUG5(a, b, c, d, e)
@d MARPA_OFF_ASSERT(expr)
@<Debug macros@> =
#define MARPA_DEBUG @[ 0 @]
#define MARPA_ENABLE_ASSERT @[ 0 @]
#if MARPA_DEBUG
#define MARPA_DEBUG1(a) @[ g_debug((a)); @]
#define MARPA_DEBUG2(a, b) @[ g_debug((a),(b)); @]
#define MARPA_DEBUG3(a, b, c) @[ g_debug((a),(b),(c)); @]
#define MARPA_DEBUG4(a, b, c, d) @[ g_debug((a),(b),(c),(d)); @]
#define MARPA_DEBUG5(a, b, c, d, e) @[ g_debug((a),(b),(c),(d),(e)); @]
#define MARPA_ASSERT(expr) do { if G_LIKELY (expr) ; else \
       g_error ("%s: assertion failed %s", G_STRLOC, #expr); } while (0);
#else /* if not |MARPA_DEBUG| */
#define MARPA_DEBUG1(a) @[@]
#define MARPA_DEBUG2(a, b) @[@]
#define MARPA_DEBUG3(a, b, c) @[@]
#define MARPA_DEBUG4(a, b, c, d) @[@]
#define MARPA_DEBUG5(a, b, c, d, e) @[@]
#define MARPA_ASSERT(exp) @[@]
#endif

#if MARPA_ENABLE_ASSERT
#undef MARPA_ASSERT
#define MARPA_ASSERT(expr) do { if G_LIKELY (expr) ; else \
       g_error ("%s: assertion failed %s", G_STRLOC, #expr); } while (0);
#endif

@*0 Earley Item Tag.
A function to print a descriptive tag for
an Earley item.
@<Private function prototypes@> =
#if MARPA_DEBUG
PRIVATE_NOT_INLINE gchar* eim_tag_safe(gchar *buffer, EIM eim);
PRIVATE_NOT_INLINE gchar* eim_tag(EIM eim);
#endif
@ It is passed a buffer to keep it thread-safe.
@<Function definitions@> =
#if MARPA_DEBUG
PRIVATE_NOT_INLINE gchar *
eim_tag_safe (gchar * buffer, EIM eim)
{
  sprintf (buffer, "S%d@@%d-%d",
           AHFAID_of_EIM (eim), Origin_Earleme_of_EIM (eim),
           Earleme_of_EIM (eim));
  return buffer;
}

static char DEBUG_eim_tag_buffer[1000];
PRIVATE_NOT_INLINE gchar*
eim_tag (EIM eim)
{
  return eim_tag_safe (DEBUG_eim_tag_buffer, eim);
}
#endif

@*0 Leo Item Tag.
A function to print a descriptive tag for
a Leo item.
@<Private function prototypes@> =
#if MARPA_DEBUG
PRIVATE_NOT_INLINE gchar* lim_tag_safe (gchar *buffer, LIM lim);
PRIVATE_NOT_INLINE gchar* lim_tag (LIM lim);
#endif
@ This function is passed a buffer to keep it thread-safe.
@<Function definitions@> =
#if MARPA_DEBUG
PRIVATE_NOT_INLINE gchar*
lim_tag_safe (gchar *buffer, LIM lim)
{
  sprintf (buffer, "L%d@@%d",
           Postdot_SYMID_of_LIM (lim), Earleme_of_LIM (lim));
  return buffer;
}

static char DEBUG_lim_tag_buffer[1000];
PRIVATE_NOT_INLINE gchar*
lim_tag (LIM lim)
{
  return lim_tag_safe (DEBUG_lim_tag_buffer, lim);
}
#endif

@*0 Or-Node Tag.
Functions to print a descriptive tag for
an or-node item.
One is thread-safe, the other is
more convenient but not thread-safe.
@<Private function prototypes@> =
#if MARPA_DEBUG
PRIVATE_NOT_INLINE const gchar* or_tag_safe(gchar *buffer, OR or);
PRIVATE_NOT_INLINE const gchar* or_tag(OR or);
#endif
@ It is passed a buffer to keep it thread-safe.
@<Function definitions@> =
#if MARPA_DEBUG
PRIVATE_NOT_INLINE const gchar *
or_tag_safe (gchar * buffer, OR or)
{
  if (!or) return "NULL";
  if (OR_is_Token(or)) return "TOKEN";
  if (Type_of_OR(or) == DUMMY_OR_NODE) return "DUMMY";
  sprintf (buffer, "R%d:%d@@%d-%d",
           ID_of_RULE(RULE_of_OR (or)), Position_of_OR (or),
           Origin_Ord_of_OR (or),
           ES_Ord_of_OR (or));
  return buffer;
}

static char DEBUG_or_tag_buffer[1000];
PRIVATE_NOT_INLINE const gchar*
or_tag (OR or)
{
  return or_tag_safe (DEBUG_or_tag_buffer, or);
}
#endif

@*0 AHFA Item Tag.
Functions to print a descriptive tag for
an AHFA item.
One is passed a buffer to keep it thread-safe.
The other uses a global buffer,
which is not thread-safe, but
convenient when debugging in a non-threaded environment.
@<Private function prototypes@> =
#if MARPA_DEBUG
PRIVATE_NOT_INLINE const gchar* aim_tag_safe(gchar *buffer, AIM aim);
PRIVATE_NOT_INLINE const gchar* aim_tag(AIM aim);
#endif
@ @<Function definitions@> =
#if MARPA_DEBUG
PRIVATE_NOT_INLINE const gchar *
aim_tag_safe (gchar * buffer, AIM aim)
{
  if (!aim) return "NULL";
  const gint aim_position = Position_of_AIM (aim);
  if (aim_position >= 0) {
      sprintf (buffer, "R%d@@%d", RULEID_of_AIM (aim), Position_of_AIM (aim));
  } else {
      sprintf (buffer, "R%d@@end", RULEID_of_AIM (aim));
  }
  return buffer;
}

static char DEBUG_aim_tag_buffer[1000];
PRIVATE_NOT_INLINE const gchar*
aim_tag (AIM aim)
{
  return aim_tag_safe (DEBUG_aim_tag_buffer, aim);
}
#endif


@** File Layout.
@ The output files are {\bf not} source files,
but I add the license to them anyway,
as close to the top as possible.
@ Also, it is helpful to someone first
trying to orient herself,
if built source files contain a comment
to that effect and a warning
that they are
not intended to be edited directly.
So I add such a comment.

@*0 |marpa.c| Layout.
@q This is a hack to get the @>
@q license language nearer the top of the files. @>
@ The physical structure of the |marpa.c| file
\tenpoint
@c
@=/*@>@/
@= * Copyright 2012 Jeffrey Kegler@>@/
@= * This file is part of Marpa::XS. Marpa::XS is free software: you can@>@/
@= * redistribute it and/or modify it under the terms of the GNU Lesser@>@/
@= * General Public License as published by the Free Software Foundation,@>@/
@= * either version 3 of the License, or (at your option) any later version.@>@/
@= *@>@/
@= * Marpa::XS is distributed in the hope that it will be useful,@>@/
@= * but WITHOUT ANY WARRANTY; without even the implied warranty of@>@/
@= * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU@>@/
@= * Lesser General Public License for more details.@>@/
@= *@>@/
@= * You should have received a copy of the GNU Lesser@>@/
@= * General Public License along with Marpa::XS. If not, see@>@/
@= * http://www.gnu.org/licenses/.@>@/
@= */@>@/
@=/*@>@/
@= * DO NOT EDIT DIRECTLY@>@/
@= * This file is written by ctangle@>@/
@= * It is not intended to be modified directly@>@/
@= */@>@/

@ \twelvepoint @c
#include "config.h"
#include "marpa.h"
@<Debug macros@>
@h
#include "marpa_obs.h"
@<Logging domain@>@;
@<Private incomplete structures@>@;
@<Private typedefs@>@;
@<Private global variables@>@;
@<Private utility structures@>@;
@<Private structures@>@;
@<Recognizer structure@>@;
@<Source object structure@>@;
@<Earley item structure@>@;
@<Bocage structure@>@;
@<Private function prototypes@>@;
@<Private inline functions@>@;
@<Function definitions@>@;

@*0 |marpa.h| Layout.
@q This is a separate section in order to get the @>
@q license language nearer the top of the files. @>
@q It's hackish, but in a good cause. @>
@ The physical structure of the |marpa.h| file
\tenpoint
@(marpa.h@> =
@=/*@>@/
@= * Copyright 2012 Jeffrey Kegler@>@/
@= * This file is part of Marpa::XS. Marpa::XS is free software: you can@>@/
@= * redistribute it and/or modify it under the terms of the GNU Lesser@>@/
@= * General Public License as published by the Free Software Foundation,@>@/
@= * either version 3 of the License, or (at your option) any later version.@>@/
@= *@>@/
@= * Marpa::XS is distributed in the hope that it will be useful,@>@/
@= * but WITHOUT ANY WARRANTY; without even the implied warranty of@>@/
@= * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU@>@/
@= * Lesser General Public License for more details.@>@/
@= *@>@/
@= * You should have received a copy of the GNU Lesser@>@/
@= * General Public License along with Marpa::XS. If not, see@>@/
@= * http://www.gnu.org/licenses/.@>@/
@= */@>@/
@=/*@>@/
@= * DO NOT EDIT DIRECTLY@>@/
@= * This file is written by ctangle@>@/
@= * It is not intended to be modified directly@>@/
@= */@>@/

@ \twelvepoint
@(marpa.h@> =
#ifndef __MARPA_H__
#define __MARPA_H__ @/
#include <stdio.h>
#include <glib.h>
@<Body of public header file@>
#endif /* |__MARPA_H__| */

@** Proofs.

For |libmarpa|, more than inspection of
the code is desirable to establish confidence
that it works as intended.
For some non-obvious points, proofs are useful
to increase the level of confidence.

@*0 Leo completion states are AHFA singletons.

@ {\bf Motivation:}
|libmarpa| combines Joop Leo's enhancements to the
Earley algorithm with those of Aycock and Horspool.
While it was clear such a thing would be
possible, given enough effort, it was {\bf not}
obvious that the combined algorithm would preserve
the efficiencies of the algorithms from which it
was derived.

This proof establishes the key fact needed to show that
the Leo algorithm is compatible
with the Aycock and Horspool algorithms.
The following is an outline,
which assumes familiarity with the underlying algorithms.

@ {\bf Theorem:} In |libmarpa|,
all Leo completion states are in their own LR(0) state.

@ {\bf Proof:}
In |libmarpa|, every
Leo completion LR(0) item will have a non-nulling symbol,
by Leo's definitions.
Therefore, every Leo completion will have a final non-nulling
symbol.
Call the Leo completion item's final non-nulling symbol, $S$.

Call the LR(0) DFA state containing the Leo completion item $C$.
Call the Leo completion LR(0) item $C1$.
Suppose, for reduction to absurdity,
that another LR(0) item is combined with
the Leo completion LR(0) item in the LR(0) DFA.
Call this second LR(0) item $C2$.

If so,
there must be an LR(0) DFA state,
$C_{predecessor}$, where two of the
LR(0) items, after a transition on symbol $S$,
produce both $C1$ and $C2$.
That means that in $C_{predecessor}$,
there are two LR(0) items with $S$ as the postdot symbol,
and that these two items are predecessors of $C1$ and $C2$.
Call them $P1$ and $P2$.
$P1 \neq P2$, because $C1 \neq C2$ and different LR(0)
items always have different predecessors.

Therefore $C_{predecessor}$ will contain $P1$ and $P2$,
two LR(0) items, both
with $S$ as the postdot symbol.
But by Leo's definitions, the transition on the postdot
symbol into a Leo completion state
must be unique.
Therefore $C_{predecessor}$ cannot exist.
This completes the reduction to absurdity,
and the proof.
QED.

@ {\bf Theorem:}
All Leo completion states are in their own AHFA state.

{\bf Proof:}
By the theorem above, all Leo completion states are in
their own state in the LR(0) DFA.
The conversion to an epsilon-DFA will not add any items to this
state, because the only item in it is a completion item.
And conversion to a split epsilon-DFA will not add items.
So the Leo completion item will remain in its own state as
the AHFA is constructed.
QED.

@** Index.
