1% Copyright 2012 Jeffrey Kegler
2% This file is part of Marpa::XS.  Marpa::XS is free software: you can
3% redistribute it and/or modify it under the terms of the GNU Lesser
4% General Public License as published by the Free Software Foundation,
5% either version 3 of the License, or (at your option) any later version.
6%
7% Marpa::XS is distributed in the hope that it will be useful,
8% but WITHOUT ANY WARRANTY; without even the implied warranty of
9% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
10% Lesser General Public License for more details.
11%
12% You should have received a copy of the GNU Lesser
13% General Public License along with Marpa::XS.  If not, see
14% http://www.gnu.org/licenses/.
15
16\def\li{\item{$\bullet$}}
17
18% Here is TeX material that gets inserted after \input cwebmac
19\def\hang{\hangindent 3em\indent\ignorespaces}
20\def\pb{$\.|\ldots\.|$} % C brackets (|...|)
21\def\v{\char'174} % vertical (|) in typewriter font
22\def\dleft{[\![} \def\dright{]\!]} % double brackets
23\mathchardef\RA="3221 % right arrow
24\mathchardef\BA="3224 % double arrow
25\def\({} % ) kludge for alphabetizing certain section names
26\def\TeXxstring{\\{\TEX/\_string}}
27\def\skipxTeX{\\{skip\_\TEX/}}
28\def\copyxTeX{\\{copy\_\TEX/}}
29
30\let\K=\Longleftarrow
31
32\secpagedepth=1
33
34\def\title{Code for Marpa}
35\def\topofcontents{\null\vfill
36  \centerline{\titlefont Code for Marpa}
37  \vfill}
38\def\botofcontents{\vfill
39\noindent
40@i copyright_page_license.w
41\bigskip
42\leftline{\sc\today\ at \hours} % timestamps the contents page
43}
44% \datecontentspage
45
46\pageno=\contentspagenumber \advance\pageno by 1
47\let\maybe=\iftrue
48
49\def\marpa_sub#1{{\bf #1}: }
50\def\libmarpa/{{\tt libmarpa}}
51\def\QED/{{\bf QED}}
52\def\Theorem/{{\bf Theorem}}
\def\Proof/{{\bf Proof}}
54\def\size#1{\v #1\v}
55\def\gsize{\v g\v}
56\def\wsize{\v w\v}
57
58@q Unreserve the C++ keywords @>
59@s asm normal
60@s dynamic_cast normal
61@s namespace normal
62@s reinterpret_cast normal
63@s try normal
64@s bool normal
65@s explicit normal
66@s new normal
67@s static_cast normal
68@s typeid normal
69@s catch normal
70@s false normal
71@s operator normal
72@s template normal
73@s typename normal
74@s class normal
75@s friend normal
76@s private normal
77@s this normal
78@s using normal
79@s const_cast normal
80@s public normal
81@s throw normal
82@s virtual normal
83@s delete normal
84@s mutable normal
85@s protected normal
86@s true normal
87@s wchar_t normal
88@s and normal
89@s bitand normal
90@s compl normal
91@s not_eq normal
92@s or_eq normal
93@s xor_eq normal
94@s and_eq normal
95@s bitor normal
96@s not normal
97@s or normal
98@s xor normal
99
100@s error normal
101@s gconstpointer int
102@s gpointer int
103@s gint int
104@s guint int
105@s gboolean int
106@s PSAR int
107@s PSL int
108
109@** License.
110\bigskip\noindent
111@i copyright_page_license.w
112
113@** About This Document.
114This document is very much under construction,
115enough so that readers may question why I make it
116available at all.  Two reasons:
117\li Despite its problems, it is the best way to read the source code
118at this point.
\li Since it is essential to changing the code, not making it available
could be seen as violating the spirit of open source.
121@ This will eventually become a real book describing the
122code.
123It is already approaching that in size.
124Quality is another story.
125Much rewriting and reorganization is being left until the end.
126\par
127Marpa is a very unusual C library -- no system calls, no floating
128point and almost no arithmetic.  A lot of data structures
129and pointer twiddling.
I have found that a lot of coding practices that are good in other
contexts are not good in this one.
132\par
For example, I intended to avoid abbreviations entirely.
134This is good practice -- in most cases all abbreviations save is
135some typing, at a very high cost in readability.
136In |libmarpa|, however, spelling things out usually does
137{\bf not} make them more readable.
138To be sure, |To_AHFA_of_EIM_by_SYMID| is pretty incomprehensible.
139But is
140$$Aycock\_Horspool\_Finite\_Automaton\_To\_State\_of\_Earley\_Item\_by\_Symbol\_ID$$
141better?
142At this point, I have a lot of practice coming back to pages of both, cold,
143and trying to figure them out.
Both are daunting, but the abbreviations are more elegant and look
better on the page, while unabbreviated names routinely pose almost insoluble
problems for Cweb's \TeX{} typesetting.
147\par
148Whichever is used, it must be kept systematic and
149documented, and that is easier with the abbreviations.
150In general, I believe abbreviations are used in code
151far more than they should be.  But they have their place
152and |libmarpa| is one of them.
153\par
154Because I realized that abbreviations were going to be not
155just better, but almost essential if I ever was to finish this
156project, I changed from a ``no abbreviation" policy to one
157of ``abbreviate when necessary and it is necessary a lot" half
158way through.
159Thus the code is highly inconsistent in this respect.
160At the moment,
161that's true of a lot of my other coding conventions.
162\par
To summarize, the reader who has not yet been scared off
needs to be aware that the coding conventions are not yet
consistent internally, and not yet consistent with their
documentation.
167@
168The Cweb is being written along with the code.
169If the code works right off the bat, its accompanying text
170will be a first draft.
171The more trouble I had understanding an issue,
172and writing the code,
173the more thorough the documentation.
174
175@** Design.
176@*0 Layers.
177|libmarpa|, the library described in this document, is intended as the bottom of potentially
178four layers.
The layers are, from low to high:
180\li |libmarpa|
181\li The glue layer
182\li The wrapper layer
183\li The application
184
The glue layer will be in C and will call the |libmarpa| routines
in a way that makes them compatible with another language.
I expect this will usually be a 4GL (4th generation language),
such as Perl.
One example of a glue description language is SWIG.
Another is Perl XS, and currently that is
the only glue layer implemented for |libmarpa|.
192
193|libmarpa| itself is not enormously user-
194or application-friendly.
195For example, in |libmarpa|, symbols do not have
196names, just symbol structures and symbol ID's.
197These are all that is needed for the data crunching,
198but an application writer will usually want a friendlier
199interface, including names for the symbols and
200many other conveniences.
201For this reason, applications will typically
202use |libmarpa| through a {\bf wrapper package}.
203Currently the only such package is in Perl.
204
205The top layer is the application.
206My expectation is that this will also be in a 4GL.
Currently, |libmarpa|'s only applications are
in Perl.
209
210Not all these layers need be present.
211For example, it is conceivable that someone might
212write their application in C, in which case they could
manage with minimal or no
glue or package layers.
215
Interfaces between layers are named after the lower
217of the two layers.  For example the interface between
218|libmarpa| and the glue layer is the |libmarpa| interface.
219
220@*0 Representing Objects.
221Representation of objects is most commonly in one
222of three forms: cookies, ID's or pointers to C structures.
223
224@*1 Object ID's.
225Object ID's are integers.  They are always issued in sequence.
226They are guaranteed unique.
227(Note that in C,
228pointers to identical objects do {\bf not} necessarily
229compare equal.)
230If desired, they can be checked easily without risking a memory
231violation.
232
233ID's are the only object representation
234that can be used in any layer or any interface,
235and they are the preferred representation
236in the application layer
237and the package interface.
238
239Wraparound issues for object ID's are ignored.
240By the time any object ID wraps, memory will have long
241since overflowed.
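
The claim that an ID can be checked without risking a memory violation
can be illustrated with a minimal sketch.
It assumes that ID's are issued in sequence starting from zero.
The names |Example_ID|, |example_table| and |example_id_is_valid|
are hypothetical and are not part of |libmarpa|.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these names are not part of libmarpa. */
typedef int Example_ID;

struct example_table {
    size_t count;   /* number of IDs issued so far: 0 .. count-1 */
};

/* Returns 1 if id was issued by table, 0 otherwise.
   Only the integer itself is examined, so a bad ID cannot
   cause a memory fault the way a bad pointer could. */
static int example_id_is_valid(const struct example_table *table,
                               Example_ID id)
{
    return id >= 0 && (size_t)id < table->count;
}
```

A pointer offers no comparable test: the only way to find out whether it is
valid is to dereference it.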
242
243@*1 Object Cookies.
244Ideally, outside of the |libmarpa| layer,
245all objects would be represented by their ID.
However, an exception is made for recognizers and grammars,
247even though they do have ID's.
248This is because looking up ID's for these global objects
249is not thread-safe.
250
@ ID lookup for global objects could be made thread-safe,
but this involves locking data.
It is possible to do this portably, using Glib, but it seems simpler
and safer to expect the calling environment to respect the opaque
nature of the grammar and recognizer cookies.
256
257``Respecting the opaque nature of a cookie",
258means not
259accessing its internal contents -- using the
260cookie only as a cookie.
261The overall idea is that,
if a programmer
263writes trick-free higher-level code
264using cookies,
265any resulting errors occur
266in the package or application layer.
267
268The contents of Object Cookies are dependent on
269the choice of higher-level language (HLL).
270For this reason,
the cookies are never visible in the |libmarpa| layer.
272
273In Perl's cookies, a major consideration is ensuring
274that, during the lifetime of a cookie,
275all the objects implied by the cookie also exist.
276This means that so long as
277a recognizer object cookie exists,
278the underlying grammar cannot be destroyed.
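
One way to obtain this lifetime guarantee is reference counting.
The following is a sketch only, written in plain C with hypothetical names
(|example_grammar|, |example_recce_cookie|).
It is not how the Perl glue layer is implemented,
and it is not thread-safe.

```c
#include <stdlib.h>

/* Hypothetical types; not libmarpa's. */
struct example_grammar { int refcount; };
struct example_recce_cookie { struct example_grammar *grammar; };

/* The creator of the grammar holds the first reference. */
static struct example_grammar *example_grammar_new(void)
{
    struct example_grammar *g = malloc(sizeof *g);
    if (g) g->refcount = 1;
    return g;
}

/* Drop one reference; the grammar is freed when none remain. */
static void example_grammar_unref(struct example_grammar *g)
{
    if (--g->refcount == 0) free(g);
}

/* A recognizer cookie takes its own reference to the grammar. */
static struct example_recce_cookie *
example_recce_cookie_new(struct example_grammar *g)
{
    struct example_recce_cookie *cookie = malloc(sizeof *cookie);
    if (!cookie) return NULL;
    g->refcount++;
    cookie->grammar = g;
    return cookie;
}

/* Destroying the cookie releases the grammar reference it held. */
static void example_recce_cookie_free(struct example_recce_cookie *cookie)
{
    example_grammar_unref(cookie->grammar);
    free(cookie);
}
```

With this arrangement the grammar survives until both its creator and every
recognizer cookie have released it.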
279
280@*1 Object pointers.
The most efficient representation of objects
is pointers to structures.
283These are the main representation of objects
284in the |libmarpa| layer.
285These must not be visible in the package and application
286layers.
287
288With regard to the visibility of object pointers in the
289glue layer, the situation is more complicated.
290At this writing, I expect to make pointers
291to most structures
292completely invisible except inside |libmarpa|.
293The external accessors do allow the glue layer
294some access
295to |libmarpa|'s internal structures.
296But in the case of the |_peek|
297external accessors,
298it is intuitive that the memory is owned
299by the |libmarpa| layer,
300and expected that any use of it will be quick.
301
In the case of object pointers, their expected ordinary
use is to be kept around to refer to the object.
But, for example, symbol object pointers must not
be freed by the glue layer, yet they become invalid
when their associated grammar is destroyed.
307
308This behavior is not completely unintuitive to an
309experienced C programmer -- functions (like |ctime|)
310which return
311transient information in memory unowned by the caller
312have a long tradition in UNIX.
313But these are now deprecated.
314
315But tracking the lifetime of symbol object pointers
316in the glue layer
would be tricky, so as of this writing the thought is to
318avoid the issue, for it and most other object pointers.
319The exceptions are grammar and recognizer objects.
320The base objects for these {\bf are} owned by
321the glue layer, so these do not present the same
322issues.
323The glue layer creates
324grammar and recognizer objects,
325it owns them during their lifetime,
326and it is up to the glue layer to destroy them.
327
328@*0 Inlining.
Most of this code is expected to be frequently executed
330and inlining is used a lot.
331Enough so
332that it is useful to define a macro to let me know when inlining is not
333used in a private function.
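
As a small illustration, the marker reads as follows at the point of use.
Only |PRIVATE_NOT_INLINE| comes from this document.
The two functions are hypothetical examples.

```c
#define PRIVATE_NOT_INLINE static

/* Small, frequently executed helper: a candidate for inlining. */
static inline int example_twice(int x)
{
    return 2 * x;
}

/* Private helper that is deliberately not inlined.
   The macro expands to plain "static", but it documents the decision. */
PRIVATE_NOT_INLINE int example_sum_to(int n)
{
    int total = 0;
    int i;
    for (i = 1; i <= n; i++) total += i;
    return total;
}
```

A search for |PRIVATE_NOT_INLINE| then lists every private function
for which inlining was considered and rejected.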
334@s PRIVATE_NOT_INLINE int
335@d PRIVATE_NOT_INLINE static
336
337@*0 Marpa Global Setup.
338
339Marpa does no global initialization at the moment.
340I'll try to keep it that way.
341If I can't, I will need to deal with the issue
342of thread safety.
343
344@*0 Complexity.
345Considerable attention is paid to time and,
346where it is a serious issue, space complexity.
347Complexity is considered from three points of view.
{\bf Practical worst-case complexity} is the complexity of the
actual implementation, in the worst case.
350{\bf Practical average complexity} is the complexity of the
351actual implementation under what are expected to be normal
352circumstances.
353Average complexity is of most interest to the typical user,
354but worst-case considerations should not be ignored ---
355in some applications,
356one case of poor performance
can outweigh any number of
excellent ``average case" results.
359@ Finally, there is {\bf theoretical complexity}.
360This is the complexity I would claim in a write-up of the
361Marpa algorithm for a Theory of Computation article.
362Most of the time, this is the same as practical worst-case complexity.
363Often, however, for theoretical complexity I consider
364myself entitled to claim
365the time complexity for a
better algorithm, even though that is not the one
367used in the actual implementation.
368@ Sorting is a good example of under what circumstances
369I take the liberty of claiming a time complexity I did not
370implement.
371In many places in |libmarpa|,
372for sorting,
373the most reasonable practical
374implementation (sometimes the only reasonable practical implementation)
375is an $O(n^2)$ sort.
376When average list size is small, for example,
377a hand-optimized insertion sort is often clearly superior
378to all other alternatives.
379Where average list size is larger,
380a call to |g_qsort| is the appropriate response.
|g_qsort| is the result of considerable thought and experience.
The GNU project has decided to base it on quicksort,
and I do not care to second-guess them on this.
384But quicksort and insertion sorts are both, theoretically, $O(n^2)$.
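
For reference, the kind of insertion sort meant here can be sketched as follows.
This is a generic textbook version over an |int| array,
not one of the hand-optimized sorts in |libmarpa|.
The name |example_insertion_sort| is hypothetical.

```c
#include <stddef.h>

/* Sorts a[0..n-1] into ascending order, in place.
   Worst case O(n^2) comparisons, but very little overhead
   per element, which is what matters for short lists. */
static void example_insertion_sort(int *a, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        /* Shift larger elements one slot to the right. */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```

On input that is already nearly sorted the inner loop exits almost at once,
so the running time approaches linear.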
385@ Clearly, in both cases, I could drop in a merge sort and achieve
386a theoretical $O(n \log n)$ worst case.
Often just as clear is that in all cases likely to occur in practice,
388the merge sort would be inferior.
389@ When I claim a complexity from a theoretical choice of algorithm,
390rather than the actually implemented one, the following will always be
391the case:
392\li The existence of the theoretical algorithm must be generally accepted.
393\li The complexity I claim for it must be generally accepted.
394\li It must be clear that there are no obstacles to using the theoretical algorithm
395whose solution is not straightforward.
396@ I am a big believer in theory.
Often practical considerations didn't clearly indicate a choice of
algorithm.
399In those circumstances, I usually
400allowed theoretical superiority to be the deciding factor.
401@ But there were cases
402where the theoretically superior choice
403was clearly going to be inferior in practice.
404Sorting was one of them.
405It would be possible to
406go through |libmarpa| and replace all sorts with a merge sort.
407But a slower library would be the result.
408
409@** Coding conventions.
410@*0 Naming conventions.
411
412@*1 Reserved locals.
413Certain symbol names are reserved for certain purposes.
414They are not necessarily defined, but if defined they
415must be used for the designated purpose.
416An example is |g|, which is the grammar of most interest in
417the context.
418(In fact, no marpa routine uses more than one grammar.)
419It is expected that the routines which refer to a grammar
420will set |g| to that value.
421This convention saves a lot of clutter in the form of
422macro and subroutine arguments.
423
424In some cases, these constants may not be well-defined.
425An example is |rule_count_of_g| while rules are being added
426to the grammar.
427In such cases, to minimize confusion, these names should be
428left undefined.
This makes the macros which use them unusable, which
430is a feature.
431
432\li |g| is always the grammar of most interest in the context.
433\li |r| is always the recognizer of most interest in the context.
434\li |rule_count_of_g| is the number of rules in |g|.
435
436@*1 Mixed Case Macros.
437In programming in general, accessors are very common.
In |libmarpa|, the percentage of the logic that consists
439of accessors is even higher than usual,
440and their variety approaches the botanical.
441Most of these accessors are simple or even trivial,
442but some are not.
443In an effort to make the code readable and maintainable,
444I use macros for all accessors.
445@ The standard C convention is that macros are all caps.
446This is a good convention.  I believe in it and almost
447always follow it.
448But in this code I have departed from it.
449@ As has been noted in the email world,
450when most of a page is in caps, that page becomes
451much harder and less pleasant to read.
452So in this code I have made macros mixed case.
453Marpa's mixed case macros are easy to spot ---
454they always start with a capital, and the ``major words"
455also begin in capital letters.
456``Verbs" and ``coverbs" in the macros begin with a lower
457case letter.
458All words are separated with an underscore,
459as is the currently accepted practice to enhance readability.
460@ The ``macros are all caps" convention is a long standing one.
461I understand that experienced C programmers will be suspicious
462of my claim that this code is special in a way that justifies
463breaking the convention.
464Frankly, if I were a new reader coming to this code,
465I would be suspicious as well.
466But I would ask anyone who wishes to criticize to first do
467the following:
468Look at one of the many macro-heavy pages in this code
469and ask yourself -- do you genuinely wish more of this
470page was in caps?
471
472@*1 External Names.
473External Names have |marpa_| or |MARPA_| as their prefix,
474as appropriate under the capitalization conventions.
475Many names begin with one of the major ``objects" of Marpa:
476grammars, recognizers, symbols, etc.
477Names of functions typically end with a verb.
478
479@*1 Booleans.
480Names of booleans are often
481of the form |is_x|, where |x| is some
482property.  For example, the element of the symbol structure
483which indicates whether the symbol is a terminal or not,
484is |is_terminal|.
485Boolean names are chosen so that the |TRUE| or |FALSE|
486value corresponds correctly to the question implied by the
487name.
488Names should be as
489accurate as possible consistent with brevity.
490Where possible, consistent with brevity and accuracy,
491positive names (|is_found|) are preferred
492to negative names (|is_not_lost|).
493
494@*1 Function names.
495For function names, some final verbs have special meanings.
496In the description below |obj| stands for an object,
497and |fld| for a field of that object.
In cases where there is no ambiguity about which
499object a field might belong to, |obj| will often be omitted.
500
501\li |obj_fld_get| returns field |fld|
502of object |obj|.
503It is an internal function, and often will be declared
504|static inline|.
505
506\li |obj_fld_put| assigns a value to field |fld|
507of object |obj|.
508It is an internal function, and often will be declared
509|static inline|.
510
511\li |marpa_obj_fld_look| returns field |fld|
512of object |obj|.
513It is an external equivalent of |obj_fld_get|.
514The returned value is still owned by object |obj| -- it should
515not be modified or freed.
516In practice, the |look| verb is often omitted.
517
518\li |marpa_obj_fld_peek| returns field |fld|
519of object |obj|.
520It is an external equivalent of |obj_fld_get|.
521The returned value is still owned by object |obj| -- it should
522not be modified or freed.
523
524The difference between ``peek" and ``look" is somewhat
525subjective.
526``Look" functions are expected to be called in the normal
527course of operation, including in production code.
528``Peek" functions break the encapsulation rules.
529Their use is expected to be limited
530to debugging or tracing situations.
531
532\li |marpa_obj_fld_set| sets field |fld|
533of object |obj|.
534It's the external equivalent of |obj_fld_put|.
535
536\li |marpa_obj_fld_value| returns field |fld|
537of object |obj|.
538It is an external equivalent of |obj_fld_get|.
539The returned value is owned by the caller.
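
The ownership rules attached to these verbs can be made concrete
with a sketch.
The object, its |name| field and the |example_| prefix are all hypothetical.
Real external functions would carry the |marpa_| prefix.
The sketch assumes the field is a non-null, zero-terminated string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object with a single string field. */
struct example_obj { char *t_name; };

/* Internal accessors: "get" and "put". */
static inline char *obj_name_get(const struct example_obj *obj)
{
    return obj->t_name;
}

static inline void obj_name_put(struct example_obj *obj, char *name)
{
    obj->t_name = name;
}

/* External "look": the result is still owned by obj.
   The caller must not modify or free it. */
const char *example_obj_name_look(const struct example_obj *obj)
{
    return obj_name_get(obj);
}

/* External "value": the result is a fresh copy owned by the caller,
   who must free it.  Returns NULL if allocation fails. */
char *example_obj_name_value(const struct example_obj *obj)
{
    const char *name = obj_name_get(obj);
    size_t size = strlen(name) + 1;
    char *copy = malloc(size);
    if (copy) memcpy(copy, name, size);
    return copy;
}
```

The two external functions return the same characters.
What differs is who is responsible for the memory afterwards.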
540
541@*0 Abbreviations and Vocabulary.
542@ Unexplained abbreviations and non-standard vocabulary
543pose unnecessary challenges.
They are a particular obstacle to those who are not native speakers
of English, and they are annoying to the natives as well.
This section is intended to document
all abbreviations.
Also included is
any non-standard vocabulary
which is not explained in detail elsewhere in the
text.
552By ``non-standard vocabulary",
553I mean terms that
554are not in a general dictionary, and
555are also not in the standard reference works.
556@ While development is underway, this section will be
557incomplete and sometimes inaccurate.
558\li alloc: Allocate.
559\li assign: Find something, creating it when necessary.
560\li bv: Bit Vector.
561\li cmp: Compare.
562Usually as |_cmp|, the suffix or ``verb" of a function name.
563\li \_Object: As a suffix of a type name, this means an object,
564as opposed to a pointer.
565When there is a choice,
566most complex types are considered to be pointers
567to structures or unions, rather than the structure or
568union itself.
569When it's necessary to have a type which
570refers to the actual structure
571or union {\bf directly}, not via a pointer,
572that type is called the ``object" form of the
573type.  As an example, look at the definitions
574of |EIM| and |EIM_Object|.
575\li EIM: Earley item.
576\li |EIM_Object|: Earley item (object).
577\li EIX: Earley item index.
578\li ES: Earley set.
579\li g: Grammar.
580\li |_ix|, |_IX|, ix, IX: Index.  Often used as a suffix.
581\li Leo base item: The Earley item which ``causes" a Leo item to
be added.  If a Leo chain is reconstructed from the Leo item,
583\li Leo completion item: The Earley item which is the ``successor"
584of a Leo item to
585be added.
586\li Leo LHS symbol: The LHS of a Leo completion item (see which).
587\li Leo item: A ``transition item" as described in Leo1991.
These stand in for a Leo chain of one or more Earley items.
Leo items can stand in for all the Earley items of a right
recursion,
and it is the use of Leo items which makes this algorithm $O(n)$
for all LR-regular grammars.
In an Earley implementation
without Leo items, a parse with right recursion
can have the time complexity $O(n^2)$.
596\li LIM: Leo item.
597\li \_Object: Suffix indicating that the type is of an
598actual object, and not a pointer as is usually the case.
599\li PIM, pim: Postdot item.
600\li p: A Pointer.  Often as |_p|, as the end of a variable name, or as |p_| at
601the beginning of one.
602\li pp: A Pointer to pointer.  Often as |_pp|, as the end of a variable name.
603\li R, r: Recognizer.
604\li RECCE, recce: Recognizer.  Originally military slang for a
605reconnaissance.
606\li -s, -es: Plural.  Note that the |es| suffix is often used even when
607it is not good English, because it is easier to spot in text.
608For example, the plural of |ES| is |ESes|.
\li |s_|: Prefix for a structure tag.  Cweb does not format C code well
unless tag names are distinct from other names.
\li |t_|: Prefix for an element tag.  Cweb does not format C code well
unless tag names are distinct from others.
Since each structure and union in C has a different namespace,
this does not suffice to make different tags unique, but it does
suffice to let Cweb distinguish tags from other items, and that is the
object.
\li |u_|: Prefix for a union tag.  Cweb does not format C code well
unless tag names are distinct from other names.
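
Several of these conventions can be seen together in a short sketch.
The type |EXI| and its fields are hypothetical,
chosen to mirror the relationship between |EIM| and |EIM_Object|
without reproducing their actual definitions.

```c
/* s_ marks the structure tag, t_ marks each element. */
struct s_example_item {
    int t_origin;
    int t_position;
};

/* The plain name is the usual form: a pointer to the structure. */
typedef struct s_example_item *EXI;

/* The _Object suffix names the structure itself. */
typedef struct s_example_item EXI_Object;

/* Accessors take the pointer form. */
static int example_origin_of(EXI item)
{
    return item->t_origin;
}
```

Storage is declared with the |_Object| form, and everything else
passes the pointer form around.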
619
620@** To Do.
621
622Most of the to do list has been moved to Marpa::R2.
623
624\li If I convert Marpa to use Marpa::XS,
625and if I continue to implement the |tokens()| call,
626make sure the ``interactive" flag works.
627
628@** The Public Header File.
629@*0 Version Constants.
630@<Private global variables@> =
631const guint marpa_major_version = MARPA_MAJOR_VERSION;
632const guint marpa_minor_version = MARPA_MINOR_VERSION;
633const guint marpa_micro_version = MARPA_MICRO_VERSION;
634const guint marpa_interface_age = MARPA_INTERFACE_AGE;
635const guint marpa_binary_age = MARPA_BINARY_AGE;
636@ Return the version in a 3 element int array
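
The caller supplies an array with room for at least three |int| values.
The following is a minimal sketch of how such a function and a
compile-time check in the style of |MARPA_CHECK_VERSION| fit together.
It uses hypothetical |EXAMPLE_| constants with made-up values
instead of the real |MARPA_| ones.

```c
/* Made-up version numbers, for illustration only. */
#define EXAMPLE_MAJOR_VERSION 1
#define EXAMPLE_MINOR_VERSION 4
#define EXAMPLE_MICRO_VERSION 2

/* Same shape as MARPA_CHECK_VERSION: true when the compiled-in
   version is at least major.minor.micro. */
#define EXAMPLE_CHECK_VERSION(major, minor, micro) \
    (EXAMPLE_MAJOR_VERSION > (major) \
     || (EXAMPLE_MAJOR_VERSION == (major) \
         && EXAMPLE_MINOR_VERSION > (minor)) \
     || (EXAMPLE_MAJOR_VERSION == (major) \
         && EXAMPLE_MINOR_VERSION == (minor) \
         && EXAMPLE_MICRO_VERSION >= (micro)))

/* Fills version[0..2] with major, minor and micro. */
void example_version(int *version)
{
    version[0] = EXAMPLE_MAJOR_VERSION;
    version[1] = EXAMPLE_MINOR_VERSION;
    version[2] = EXAMPLE_MICRO_VERSION;
}
```

The macro is evaluated against the headers at compile time,
while the function reports what was built into the library,
so the two can disagree if headers and library are mismatched.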
637@<Function definitions@> =
638void marpa_version(int* version) {
639        version[0] = MARPA_MAJOR_VERSION;
        version[1] = MARPA_MINOR_VERSION;
641        version[2] = MARPA_MICRO_VERSION;
642}
643@ @<Public function prototypes@> =
644void marpa_version(int* version);
645
646@*0 Header file.
647|GLIB_VAR| is to
648prefix variable declarations so that they
649will be exported properly for Windows dlls.
650@f GLIB_VAR const
651@<Body of public header file@> =
652GLIB_VAR const guint marpa_major_version;@/
653GLIB_VAR const guint marpa_minor_version;@/
654GLIB_VAR const guint marpa_micro_version;@/
655GLIB_VAR const guint marpa_interface_age;@/
656GLIB_VAR const guint marpa_binary_age;@#
657#define MARPA_CHECK_VERSION(major,minor,micro) @| \
658    @[ (MARPA_MAJOR_VERSION > (major) \
659        @| || (MARPA_MAJOR_VERSION == (major) && MARPA_MINOR_VERSION > (minor)) \
660        @| || (MARPA_MAJOR_VERSION == (major) && MARPA_MINOR_VERSION == (minor) \
661        @|  && MARPA_MICRO_VERSION >= (micro)))
662        @]@#
663#define MARPA_CAT(a, b) @[ a ## b @]
664@<Public defines@>@/
665@<Public incomplete structures@>@/
666@<Public typedefs@>@/@\
667@<Callback typedefs@>@/
668@<Public structures@>@/
669@<Public function prototypes@>@/
670
671@** Grammar (GRAMMAR) Code.
672@<Public incomplete structures@> = struct marpa_g;
673@ @<Private structures@> = struct marpa_g {
674@<Widely aligned grammar elements@>@;
675@<Int aligned grammar elements@>@;
676@<Bit aligned grammar elements@>@;
677};
678typedef struct marpa_g GRAMMARD;
679@ @<Private typedefs@> =
680typedef struct marpa_g* GRAMMAR;
681typedef const struct marpa_g* GRAMMAR_Const;
682
683@ @<Function definitions@> =
684struct marpa_g* marpa_g_new( void)
685{ struct marpa_g* g = g_slice_new(struct marpa_g);
686    @<Initialize grammar elements@>@;
687   return g; }
688@ @<Public function prototypes@> =
689struct marpa_g* marpa_g_new(void);
690
691@ @<Function definitions@> =
692void marpa_g_free(struct marpa_g *g)
693{ @<Destroy grammar elements@>@;
694g_slice_free(struct marpa_g, g);
695}
696@ @<Public function prototypes@> =
697void marpa_g_free(struct marpa_g *g);
698
699@*0 The Grammar ID.
700A unique ID for the grammar.
701This must be unique not just per-thread,
702but process-wide.
703The counter which tracks grammar ID's
704(|next_grammar_id|)
705is (at this writing) the only global
706non-constant, and requires special handling to
707keep |libmarpa| MT-safe.
|next_grammar_id| is accessed only via
709|glib|'s special atomic operations.
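
As an illustration only, the same idea can be sketched with C11 atomics
in place of |glib|'s atomic operations.
This swaps in a different library and assumes a C11 compiler.
The names |example_next_id| and |example_id_issue| are hypothetical.

```c
#include <stdatomic.h>

/* Process-wide counter; the first ID issued is 1. */
static atomic_int example_next_id = 1;

/* Returns a fresh ID.  The fetch-and-add is a single atomic step,
   so two threads calling this at once never receive the same value. */
static int example_id_issue(void)
{
    return atomic_fetch_add(&example_next_id, 1);
}
```

A plain |next_id++| would not be safe here, because the read and the write
could interleave between threads.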
710@ @<Int aligned grammar elements@> = gint t_id;
711@ @<Public typedefs@> = typedef gint Marpa_Grammar_ID;
712@ @<Private global variables@> = static gint next_grammar_id = 1;
713@ @<Initialize grammar elements@> =
714g->t_id = g_atomic_int_exchange_and_add(&next_grammar_id, 1);
715@ @<Function definitions@> =
716gint marpa_grammar_id(struct marpa_g* g) { return g->t_id; }
717@ @<Public function prototypes@> =
718gint marpa_grammar_id(struct marpa_g* g);
719
720@*0 The Grammar's Symbol List.
721This lists the symbols for the grammar,
722with their
723|Marpa_Symbol_ID| as the index.
724
725@<Widely aligned grammar elements@> = GArray* t_symbols;
726@ @<Initialize grammar elements@> =
727g->t_symbols = g_array_new(FALSE, FALSE, sizeof(SYM));
728@ @<Destroy grammar elements@> =
729{  Marpa_Symbol_ID id; for (id = 0; id < (Marpa_Symbol_ID)g->t_symbols->len; id++)
730{ symbol_free(SYM_by_ID(id)); } }
731g_array_free(g->t_symbols, TRUE);
732
733@ The trace accessor returns the GArray.
734It remains ``owned" by the Grammar,
735and must not be freed or modified.
736@<Function definitions@> =
737GArray *marpa_g_symbols_peek(struct marpa_g* g)
738{ return g->t_symbols; }
739@ @<Public function prototypes@> =
740GArray *marpa_g_symbols_peek(struct marpa_g* g);
741
@ Symbol count accessor.
743@d SYM_Count_of_G(g) ((g)->t_symbols->len)
744
745@ Symbol by ID.
746@d SYM_by_ID(id) (g_array_index(g->t_symbols, SYM, (id)))
747
748@ Adds the symbol to the list of symbols kept by the Grammar
749object.
750@<Private inline functions@> =
751static inline
752void g_symbol_add(
753    struct marpa_g *g,
754    Marpa_Symbol_ID symid,
755    SYM symbol)
756{
757    g_array_insert_val(g->t_symbols, (unsigned)symid, symbol);
758}
759
760@ Check that symbol is in valid range.
761@<Function definitions@> =
762static inline gint symbol_is_valid(
763const struct marpa_g *g, const Marpa_Symbol_ID symid) {
764return symid >= 0 && (guint)symid < g->t_symbols->len;
765}
766@ @<Private function prototypes@> =
767static inline gint symbol_is_valid(
768const struct marpa_g *g, const Marpa_Symbol_ID symid);
769
770@*0 The Grammar's Rule List.
771This lists the rules for the grammar,
772with their |Marpa_Rule_ID| as the index.
773@d RULE_Count_of_G(g) ((g)->t_rules->len)
774@<Widely aligned grammar elements@> = GArray* t_rules;
775@ @<Initialize grammar elements@> =
776g->t_rules = g_array_new(FALSE, FALSE, sizeof(RULE));
777@ @<Destroy grammar elements@> =
778g_array_free(g->t_rules, TRUE);
779
780@ The trace accessor returns the GArray.
781It remains ``owned" by the Grammar,
782and must not be freed or modified.
783@<Function definitions@> =
784GArray *marpa_g_rules_peek(struct marpa_g* g)
785{ return g->t_rules; }
786@ @<Public function prototypes@> =
787GArray *marpa_g_rules_peek(struct marpa_g* g);
788
789@ Internal accessor to find a rule by its id.
790@d RULE_by_ID(g, id) (g_array_index((g)->t_rules, RULE, (id)))
791
792@ Adds the rule to the list of rules kept by the Grammar
793object.
794@<Private inline functions@> =
795static inline
796void rule_add(
797    struct marpa_g *g,
798    RULEID rule_id,
799    RULE rule)
800{
801    g_array_insert_val(g->t_rules, (unsigned)rule_id, rule);
802    LV_Size_of_G(g) += 1 + Length_of_RULE(rule);
803    g->t_max_rule_length = MAX(Length_of_RULE(rule), g->t_max_rule_length);
804}
805
806@ Check that rule is in valid range.
807@d RULEID_of_G_is_Valid(g, rule_id)
808    ((rule_id) >= 0 && (guint)(rule_id) < (g)->t_rules->len)
809
810@*0 Default Value.
811@d Default_Value_of_G(g) ((g)->t_default_value)
812@<Widely aligned grammar elements@> = gpointer t_default_value;
813@ @<Initialize grammar elements@> =
814Default_Value_of_G(g) = NULL;
815@ @<Public function prototypes@> =
816gpointer marpa_default_value(struct marpa_g* g);
817@ @<Function definitions@> =
818gpointer marpa_default_value(struct marpa_g* g)
819{ return Default_Value_of_G(g); }
820@ @<Public function prototypes@> =
821gboolean marpa_default_value_set(struct marpa_g*g, gpointer default_value);
822@ @<Function definitions@> =
823gboolean marpa_default_value_set(struct marpa_g*g, gpointer default_value)
824{
825   @<Return |FALSE| on failure@>@;
826    @<Fail if grammar is precomputed@>@;
827    Default_Value_of_G(g) = default_value;
828    return TRUE;
829}
830
831@*0 Start Symbol.
832@<Int aligned grammar elements@> = Marpa_Symbol_ID t_start_symid;
833@ @<Initialize grammar elements@> =
834g->t_start_symid = -1;
835@ @<Function definitions@> =
836Marpa_Symbol_ID marpa_start_symbol(struct marpa_g* g)
837{ return g->t_start_symid; }
838@ @<Public function prototypes@> =
839Marpa_Symbol_ID marpa_start_symbol(struct marpa_g* g);
840@ Returns |TRUE| on success,
841|FALSE| on failure.
842@<Function definitions@> =
843gboolean marpa_start_symbol_set(struct marpa_g*g, Marpa_Symbol_ID symid)
844{
845   @<Return |FALSE| on failure@>@;
846    @<Fail if grammar is precomputed@>@;
847    @<Fail if grammar |symid| is invalid@>@;
848    g->t_start_symid = symid;
849    return TRUE;
850}
851@ @<Public function prototypes@> =
852gboolean marpa_start_symbol_set(struct marpa_g*g, Marpa_Symbol_ID id);
853
854@*0 Start Rules.
855These are the start rules, after the grammar is augmented.
856Only one of these needs to be non-NULL.
857@<Int aligned grammar elements@> =
858RULE t_null_start_rule;
859RULE t_proper_start_rule;
860@ @<Initialize grammar elements@> =
861g->t_null_start_rule = NULL;
862g->t_proper_start_rule = NULL;
863
864@*0 The Grammar's Size.
865Intuitively,
866I define a grammar's size as the total size, in symbols, of all of its
867rules.
This includes both the LHS symbol and the RHS symbols.
Since every rule has exactly one LHS symbol,
the grammar's size is always equal to the total of
all the rule lengths, plus the total number of rules.
872
Unused rules are not included in the theoretical number,
but Marpa does not necessarily deduct rules from the
count as they are marked useless.
This means that the true size of the
grammar will always be this count or smaller.
The purpose of tracking grammar size is to allocate resources,
and for that purpose a high-ball estimate is adequate.
882@d Size_of_G(g) ((g)->t_size)
883@d LV_Size_of_G(g) ((g)->t_size)
884@ @<Int aligned grammar elements@> = int t_size;
885@ @<Initialize grammar elements@> =
886LV_Size_of_G(g) = 0;
887
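@ The size definition above can be sketched as a small stand-alone function.
This is an illustration only: |demo_grammar_size| and its array of RHS lengths are
hypothetical, whereas the real code accumulates the count incrementally in |rule_add|.

```c
#include <assert.h>
#include <stddef.h>

/* Grammar size in symbols: each rule contributes 1 (its LHS) plus its RHS length. */
static int demo_grammar_size(const int *rhs_lengths, size_t rule_count)
{
    int size = 0;
    size_t i;
    assert(rhs_lengths != NULL || rule_count == 0);
    for (i = 0; i < rule_count; i++) {
        size += 1 + rhs_lengths[i];
    }
    return size;
}
```

For three rules with RHS lengths 2, 0 and 3, the size is the total length 5 plus the rule count 3, that is, 8.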
888@*0 The Maximum Rule Length.
889This is a high-ball estimate of the length of the
890longest rule in the grammar.
891The actual value will always be this number or smaller.
892\par
893The value is used for allocating resources.
894Unused rules are not included in the theoretical number,
895but Marpa does not adjust this number as rules
896are marked useless.
897@ @<Int aligned grammar elements@> = gint t_max_rule_length;
898@ @<Initialize grammar elements@> =
899g->t_max_rule_length = 0;
900
901@*0 Grammar Boolean: Precomputed.
902@ @<Public function prototypes@> =
903gboolean marpa_is_precomputed(const struct marpa_g* const g);
904@ @d G_is_Precomputed(g) ((g)->t_is_precomputed)
905@<Bit aligned grammar elements@> = guint t_is_precomputed:1;
906@ @<Initialize grammar elements@> =
907g->t_is_precomputed = FALSE;
908@ @<Function definitions@> =
909gboolean marpa_is_precomputed(const struct marpa_g* const g)
910{ return G_is_Precomputed(g); }
911
912@*0 Grammar Boolean: Has Loop.
913@<Bit aligned grammar elements@> = guint t_has_loop:1;
914@ @<Initialize grammar elements@> =
915g->t_has_loop = FALSE;
916@ The internal accessor would be trivial, so there is none.
917@<Function definitions@> =
918gboolean marpa_has_loop(struct marpa_g* g)
919{ return g->t_has_loop; }
920@ @<Public function prototypes@> =
921gboolean marpa_has_loop(struct marpa_g* g);
922
923@*0 Grammar Boolean: LHS Terminal OK.
924Traditionally, a BNF grammar did {\bf not} allow a symbol
925which was a terminal symbol of the grammar, to also be a LHS
926symbol.
927By default, this is allowed under Marpa.
928@<Bit aligned grammar elements@> = guint t_is_lhs_terminal_ok:1;
929@ @<Initialize grammar elements@> =
930g->t_is_lhs_terminal_ok = TRUE;
931@ The internal accessor would be trivial, so there is none.
932@<Function definitions@> =
933gboolean marpa_is_lhs_terminal_ok(struct marpa_g* g)
934{ return g->t_is_lhs_terminal_ok; }
935@ @<Public function prototypes@> =
936gboolean marpa_is_lhs_terminal_ok(struct marpa_g* g);
937@ Returns |TRUE| on success,
938|FALSE| on failure.
939@<Function definitions@> =
940gboolean marpa_is_lhs_terminal_ok_set(
941struct marpa_g*g, gboolean value)
942{
943    if (G_is_Precomputed(g)) {
944        g->t_error = "precomputed";
945	return FALSE;
946    }
947    g->t_is_lhs_terminal_ok = value;
948    return TRUE;
949}
950@ @<Public function prototypes@> =
951gboolean marpa_is_lhs_terminal_ok_set( struct marpa_g*g, gboolean value);
952
953@*0 Terminal Boolean Vector.
A boolean vector, with bits set if the symbol is a
955terminal.
956This is not used as the working vector while doing
957the census, because not all symbols have been added at
958that point.
959At grammar initialization, this vector cannot be sized.
960It is initialized to |NULL| so that the destructor
961can tell if there is a bit vector to be freed.
962@<Widely aligned grammar elements@> = Bit_Vector t_bv_symid_is_terminal;
963@ @<Initialize grammar elements@> = g->t_bv_symid_is_terminal = NULL;
964@ @<Destroy grammar elements@> =
965if (g->t_bv_symid_is_terminal) { bv_free(g->t_bv_symid_is_terminal); }
966
967@*0 The Grammar's Context.
968The ``context" is a hash of miscellaneous data,
969by keyword.
970It is so called because its purpose is to
971provide callbacks with ``context" ---
972data about
973|libmarpa|'s state which is not conveniently
974available in other forms.
975@d Context_of_G(g) ((g)->t_context)
976@<Widely aligned grammar elements@> = GHashTable* t_context;
977@ @<Initialize grammar elements@> =
978g->t_context = g_hash_table_new_full( g_str_hash, g_str_equal, NULL, g_free );
979@ @<Destroy grammar elements@> = g_hash_table_destroy(Context_of_G(g));
980
981@ @<Public defines@> =
982#define MARPA_CONTEXT_INT 1@/
983#define MARPA_CONTEXT_CONST 2@/
984#define MARPA_IS_CONTEXT_INT(v) @| @[ ((v)->t_type == MARPA_CONTEXT_INT) @]@/
985#define MARPA_CONTEXT_INT_VALUE(v) @| \
986@[ ((v)->t_type == MARPA_CONTEXT_INT \
987    ? ((struct marpa_context_int_value*)v)->t_data \
988    : G_MININT) @]@/
989#define MARPA_CONTEXT_STRING_VALUE(v) @| \
990@[ ((v)->t_type == MARPA_CONTEXT_CONST \
991    ? ((struct marpa_context_const_value*)v)->t_data \
992    : NULL) @]@/
993@ @<Public structures@> =
994struct marpa_context_int_value {
995   gint t_type;
996   gint t_data;
997};
998@ @<Public structures@> =
999struct marpa_context_const_value {
1000   gint t_type;
1001   const gchar* t_data;
1002};
1003@ @<Public structures@> =
1004union marpa_context_value {
1005   gint t_type;
1006   struct marpa_context_int_value t_int_value;
1007   struct marpa_context_const_value t_const_value;
1008};
1009
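@ A minimal sketch of how a caller might read a tagged context value.
It restates the public structures with plain C types (|int| for |gint|, |INT_MIN| for |G_MININT|)
so that it compiles without GLib; all |demo_| and |DEMO_| names are hypothetical.
The accessors mirror the public macros: a value of the wrong type yields a sentinel,
not a misread of the payload.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_CONTEXT_INT 1
#define DEMO_CONTEXT_CONST 2

struct demo_int_value { int t_type; int t_data; };
struct demo_const_value { int t_type; const char *t_data; };

/* Every member begins with the same int tag, so t_type can be read first. */
union demo_value {
    int t_type;
    struct demo_int_value t_int_value;
    struct demo_const_value t_const_value;
};

/* Integer payload, or INT_MIN if the value is not an integer. */
static int demo_int_of(const union demo_value *v)
{
    assert(v != NULL);
    return v->t_type == DEMO_CONTEXT_INT ? v->t_int_value.t_data : INT_MIN;
}

/* String payload, or NULL if the value is not a constant string. */
static const char *demo_string_of(const union demo_value *v)
{
    assert(v != NULL);
    return v->t_type == DEMO_CONTEXT_CONST ? v->t_const_value.t_data : NULL;
}
```

A caller therefore checks the result against the sentinel (or tests the tag first) before using the payload.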
1010@ Add an integer to the context.
1011These functions might be converted to be public.
1012For now they are only for use by |libmarpa| in setting
1013values to be read by the higher layers,
and are therefore internal.
1015
1016The const qualifier on the key is deliberately discarded.
1017As implemented, the keys are treated as const's by
1018|g_hash_table_insert|, but the compiler can't know
1019that is my intention.
1020For type safety, I do want to keep the |const|
1021qualifier in other contexts.
1022@<Function definitions@> =
1023static inline
1024void g_context_int_add(struct marpa_g* g, const gchar* key, gint payload)
1025{
1026    struct marpa_context_int_value* value
1027	= g_new(struct marpa_context_int_value, 1);
1028    value->t_type = MARPA_CONTEXT_INT;
1029    value->t_data = payload;
1030    g_hash_table_insert(Context_of_G(g), (gpointer)key, value);
1031}
1032@ @<Private function prototypes@> =
1033static inline
1034void g_context_int_add(struct marpa_g* g, const gchar* key, gint value);
1035@ @<Function definitions@> =
1036static inline
1037void context_const_add(struct marpa_g* g, const gchar* key, const gchar* payload)
1038{
1039    struct marpa_context_const_value* value
1040	= g_new(struct marpa_context_const_value, 1);
1041    value->t_type = MARPA_CONTEXT_CONST;
1042    value->t_data = payload;
1043    g_hash_table_insert(Context_of_G(g), (gpointer)key, value);
1044}
1045@ @<Private function prototypes@> =
1046static inline
1047void context_const_add(struct marpa_g* g, const gchar* key, const gchar* value);
1048
1049@ Clear the current context.
1050Used to create a ``clean slate" in the context.
1051@<Function definitions@> =
1052static inline void g_context_clear(struct marpa_g* g) {
1053    g_hash_table_remove_all(Context_of_G(g)); }
1054@ @<Private function prototypes@> =
1055static inline void g_context_clear(struct marpa_g* g);
1056
1057@ @<Function definitions@> =
1058union marpa_context_value* marpa_g_context_value(struct marpa_g* g, const gchar* key)
1059{ return g_hash_table_lookup(Context_of_G(g), key); }
1060@ @<Public function prototypes@> =
1061union marpa_context_value* marpa_g_context_value(struct marpa_g* g, const gchar* key);
1062
1063@*0 The Grammar Obstacks.
1064Two obstacks with the same lifetime as the grammar.
1065This is a very efficient way of allocating memory which won't be
1066resized and which will have the same lifetime as the grammar.
One obstack is reserved for ``tricky" operations
1068like |obs_free|,
1069which require coordination with other allocations.
1070The other obstack is reserved for ``safe" operations---%
1071complete allocations which are never reversed.
1072The dual obstacks allow me to get tricky where it is useful,
while also allowing most obstack allocations to be done safely without
1074the need to carefully examine their context.
1075@<Widely aligned grammar elements@> =
1076struct obstack t_obs;
1077struct obstack t_obs_tricky;
1078@ @<Initialize grammar elements@> =
1079obstack_init(&g->t_obs);
1080obstack_init(&g->t_obs_tricky);
1081@ @<Destroy grammar elements@> =
1082obstack_free(&g->t_obs, NULL);
1083obstack_free(&g->t_obs_tricky, NULL);
1084
1085@*0 The Grammar's Error ID.
1086This is an error flag for the grammar.
1087Error status is not necessarily cleared
1088on successful return, so that
1089it is only valid when an external
1090function has indicated there is an error,
1091and becomes invalid again when another external method
1092is called on the grammar.
1093Checking it at other times may reveal ``stale" error
1094messages.
1095@<Public typedefs@> =
1096typedef const gchar* Marpa_Error_ID;
1097@ @<Widely aligned grammar elements@> = Marpa_Error_ID t_error;
1098@ @<Initialize grammar elements@> =
1099g->t_error = NULL;
1100@ There is no destructor.
The error strings are assumed to be
1102{\bf not} error messages, but ``cookies".
1103These cookies are constants residing in static memory
1104(which may be read-only depending on implementation).
1105They cannot and should not be de-allocated.
1106@ @<Function definitions@> =
1107Marpa_Error_ID marpa_g_error(const struct marpa_g* g)
1108{ return g->t_error ? g->t_error : "unknown error"; }
1109@ @<Public function prototypes@> =
1110Marpa_Error_ID marpa_g_error(const struct marpa_g* g);
1111
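@ The ``stale" behavior described above can be sketched in plain C.
The structure |demo_grammar| and the setter |demo_set_something| are hypothetical;
only the pattern (a cookie set on failure, never cleared on success, and a
non-|NULL| fallback in the accessor) is taken from the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *Demo_Error_ID;

struct demo_grammar { Demo_Error_ID t_error; int is_precomputed; };

/* Mirrors marpa_g_error: never returns NULL. */
static Demo_Error_ID demo_error(const struct demo_grammar *g)
{
    assert(g != NULL);
    return g->t_error ? g->t_error : "unknown error";
}

/* Hypothetical setter: fails with a cookie once the grammar is precomputed.
   On success it leaves any earlier cookie in place. */
static int demo_set_something(struct demo_grammar *g)
{
    assert(g != NULL);
    if (g->is_precomputed) { g->t_error = "precomputed"; return 0; }
    return 1;
}

/* Cookies are compared by content, not by address. */
static int demo_error_is(const struct demo_grammar *g, const char *cookie)
{
    return strcmp(demo_error(g), cookie) == 0;
}
```

After a failed call the cookie reads as |"precomputed"|, and it still does after a later successful call,
which is why the error should be consulted only when a function has just reported failure.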
1112@** Symbol (SYM) Code.
1113@s Marpa_Symbol_ID int
1114@<Public typedefs@> =
1115typedef gint Marpa_Symbol_ID;
1116@ @<Private typedefs@> =
1117typedef gint SYMID;
1118@ @<Private incomplete structures@> =
1119struct s_symbol;
1120typedef struct s_symbol* SYM;
1121typedef const struct s_symbol* SYM_Const;
@ The initial element is of type |gint| so that
the symbol structure may be used where or-nodes are
expected.
1125@<Private structures@> =
1126struct s_symbol {
1127    @<Widely aligned symbol elements@>@;
1128    @<Int aligned symbol elements@>@;
1129    @<Bit aligned symbol elements@>@;
1130};
1131typedef struct s_symbol SYM_Object;
1132
1133@ @<Private function prototypes@> =
1134static inline
1135SYM symbol_new(struct marpa_g *g);
1136@ @<Function definitions@> =
1137static inline SYM
1138symbol_new (struct marpa_g *g)
1139{
1140  SYM symbol = g_malloc (sizeof (SYM_Object));
1141  @<Initialize symbol elements @>@/
1142  {
1143    SYMID id = ID_of_SYM(symbol);
1144    g_symbol_add (g, id, symbol);
1145  }
1146  return symbol;
1147}
1148
1149@ @<Public function prototypes@> =
1150Marpa_Symbol_ID marpa_symbol_new(struct marpa_g *g);
1151@ @<Function definitions@> =
1152Marpa_Symbol_ID
1153marpa_symbol_new (struct marpa_g * g)
1154{
1155  SYMID id = ID_of_SYM(symbol_new (g));
1156  symbol_callback (g, id);
1157  return id;
1158}
1159
1160@ @<Function definitions@> =
1161static inline void symbol_free(SYM symbol)
1162{ @<Free symbol elements@>@; g_free(symbol); }
1163@ @<Private function prototypes@> =
1164static inline void symbol_free(SYM symbol);
1165
1166@ Symbol ID: This is the unique identifier for the symbol.
1167@d ID_of_SYM(sym) ((sym)->t_symbol_id)
1168@d LV_ID_of_SYM(sym) ID_of_SYM(sym)
1169@<Int aligned symbol elements@> = SYMID t_symbol_id;
1170@ @<Initialize symbol elements@> = LV_ID_of_SYM(symbol) = g->t_symbols->len;
1171
1172@*0 Symbol LHS Rules Element.
1173This tracks the rules for which this symbol is the LHS.
1174It is an optimization --- the same information could be found
1175by scanning the rules every time this information is needed.
1176The implementation is a |GArray|.
1177@d SYMBOL_LHS_RULE_COUNT(symbol) ((symbol)->t_lhs->len)
1178@<Widely aligned symbol elements@> = GArray* t_lhs;
1179@ @<Initialize symbol elements@> =
1180symbol->t_lhs = g_array_new(FALSE, FALSE, sizeof(Marpa_Rule_ID));
1181@ @<Free symbol elements@> =
1182g_array_free(symbol->t_lhs, TRUE);
1183@ The trace accessor returns the GArray.
1184It remains ``owned" by the Grammar,
1185and must not be freed or modified.
1186@<Function definitions@> =
1187GArray *marpa_symbol_lhs_peek(struct marpa_g* g, Marpa_Symbol_ID symid)
1188{ @<Return |NULL| on failure@>@;
1189@<Fail if grammar |symid| is invalid@>@;
1190return SYM_by_ID(symid)->t_lhs; }
1191@ @<Public function prototypes@> =
1192GArray *marpa_symbol_lhs_peek(struct marpa_g* g, Marpa_Symbol_ID symid);
1193@ @<Function definitions@> = static inline
1194void symbol_lhs_add(SYM symbol, Marpa_Rule_ID rule_id)
1195{ g_array_append_val(symbol->t_lhs, rule_id); }
1196void
1197marpa_symbol_lhs_add(struct marpa_g*g, Marpa_Symbol_ID symid, Marpa_Rule_ID rule_id)
1198{ symbol_lhs_add(SYM_by_ID(symid), rule_id); }
1199@ @<Private function prototypes@> =
1200void
1201marpa_symbol_lhs_add(struct marpa_g*g, Marpa_Symbol_ID symid, Marpa_Rule_ID rule_id);
1202
1203@*0 Symbol RHS Rules Element.
This tracks the rules in which this symbol appears on the RHS.
1205It is an optimization --- the same information could be found
1206by scanning the rules every time this information is needed.
1207The implementation is a |GArray|.
1208@<Widely aligned symbol elements@> = GArray* t_rhs;
1209@ @<Initialize symbol elements@> =
1210symbol->t_rhs = g_array_new(FALSE, FALSE, sizeof(Marpa_Rule_ID));
1211@ @<Free symbol elements@> = g_array_free(symbol->t_rhs, TRUE);
1212
1213@ The trace accessor returns the GArray.
1214It remains ``owned" by the Grammar,
1215and must not be freed or modified.
1216@<Function definitions@> =
1217GArray *marpa_symbol_rhs_peek(struct marpa_g* g, Marpa_Symbol_ID symid)
1218{ @<Return |NULL| on failure@>@;
1219@<Fail if grammar |symid| is invalid@>@;
1220return SYM_by_ID(symid)->t_rhs; }
1221@ @<Public function prototypes@> =
1222GArray *marpa_symbol_rhs_peek(struct marpa_g* g, Marpa_Symbol_ID symid);
1223@ @<Function definitions@> = static inline
1224void symbol_rhs_add(SYM symbol, Marpa_Rule_ID rule_id)
1225{ g_array_append_val(symbol->t_rhs, rule_id); }
1226@ @<Private function prototypes@> = static inline
1227void symbol_rhs_add(SYM symbol, Marpa_Rule_ID rule_id);
1228
1229@ Symbol Is Accessible Boolean
1230@<Bit aligned symbol elements@> = guint t_is_accessible:1;
1231@ @<Initialize symbol elements@> =
1232symbol->t_is_accessible = FALSE;
1233@ The trace accessor returns the Boolean value.
1234Right now this function uses a pointer
1235to the symbol function.
1236If that becomes private,
1237the prototype of this function
1238must be changed.
1239\par
1240The internal accessor would be trivial, so there is none.
1241@<Function definitions@> =
1242gboolean marpa_symbol_is_accessible(struct marpa_g* g, Marpa_Symbol_ID id)
1243{ return SYM_by_ID(id)->t_is_accessible; }
1244@ @<Public function prototypes@> =
1245gboolean marpa_symbol_is_accessible(struct marpa_g* g, Marpa_Symbol_ID id);
1246
1247@ Symbol Is Counted Boolean
1248@<Bit aligned symbol elements@> = guint t_is_counted:1;
1249@ @<Initialize symbol elements@> =
1250symbol->t_is_counted = FALSE;
1251@ The trace accessor returns the Boolean value.
1252Right now this function uses a pointer
1253to the symbol function.
1254If that becomes private,
1255the prototype of this function
1256must be changed.
1257\par
1258The internal accessor would be trivial, so there is none.
1259@<Function definitions@> =
1260gboolean marpa_symbol_is_counted(struct marpa_g* g, Marpa_Symbol_ID id)
1261{ return SYM_by_ID(id)->t_is_counted; }
1262@ @<Public function prototypes@> =
1263gboolean marpa_symbol_is_counted(struct marpa_g* g, Marpa_Symbol_ID id);
1264
1265@ Symbol Is Nullable Boolean
1266@<Bit aligned symbol elements@> = guint t_is_nullable:1;
1267@ @<Initialize symbol elements@> =
1268symbol->t_is_nullable = FALSE;
1269@ The trace accessor returns the Boolean value.
1270Right now this function uses a pointer
1271to the symbol function.
1272If that becomes private,
1273the prototype of this function
1274must be changed.
1275\par
1276The internal accessor would be trivial, so there is none.
1277@<Function definitions@> =
1278gboolean marpa_symbol_is_nullable(struct marpa_g* g, Marpa_Symbol_ID id)
1279{ return SYM_by_ID(id)->t_is_nullable; }
1280@ @<Public function prototypes@> =
1281gboolean marpa_symbol_is_nullable(struct marpa_g* g, Marpa_Symbol_ID id);
1282
1283@ Symbol Is Nulling Boolean
1284@d SYM_is_Nulling(sym) ((sym)->t_is_nulling)
1285@<Bit aligned symbol elements@> = guint t_is_nulling:1;
1286@ @<Initialize symbol elements@> =
1287symbol->t_is_nulling = FALSE;
1288@ The trace accessor returns the Boolean value.
1289Right now this function uses a pointer
1290to the symbol function.
1291If that becomes private,
1292the prototype of this function
1293must be changed.
1294\par
1295The internal accessor would be trivial, so there is none.
1296@<Function definitions@> =
1297gint marpa_symbol_is_nulling(struct marpa_g* g, Marpa_Symbol_ID symid)
1298{ @<Return |-2| on failure@>@;
1299@<Fail if grammar |symid| is invalid@>@;
1300return SYM_is_Nulling(SYM_by_ID(symid)); }
1301@ @<Public function prototypes@> =
1302gint marpa_symbol_is_nulling(struct marpa_g* g, Marpa_Symbol_ID id);
1303
1304@ Symbol Is Terminal Boolean
1305@<Bit aligned symbol elements@> = guint t_is_terminal:1;
1306@ @<Initialize symbol elements@> =
1307symbol->t_is_terminal = FALSE;
1308@ The trace accessor returns the Boolean value.
1309Right now this function uses a pointer
1310to the symbol function.
1311If that becomes private,
1312the prototype of this function
1313must be changed.
1314\par
1315The internal accessor would be trivial, so there is none.
1316@d SYM_is_Terminal(symbol) ((symbol)->t_is_terminal)
1317@d SYMID_is_Terminal(id) (SYM_is_Terminal(SYM_by_ID(id)))
1318@<Function definitions@> =
1319gboolean marpa_symbol_is_terminal(struct marpa_g* g, Marpa_Symbol_ID id)
1320{ return SYMID_is_Terminal(id); }
1321@ @<Public function prototypes@> =
1322gboolean marpa_symbol_is_terminal(struct marpa_g* g, Marpa_Symbol_ID id);
1323@ @<Function definitions@> =
1324void marpa_symbol_is_terminal_set(
1325struct marpa_g*g, Marpa_Symbol_ID id, gboolean value)
1326{ SYMID_is_Terminal(id) = value; }
1327@ @<Public function prototypes@> =
1328void marpa_symbol_is_terminal_set( struct marpa_g*g, Marpa_Symbol_ID id, gboolean value);
1329
1330@ Symbol Is Productive Boolean
1331@<Bit aligned symbol elements@> = guint t_is_productive:1;
1332@ @<Initialize symbol elements@> =
1333symbol->t_is_productive = FALSE;
1334@ The trace accessor returns the Boolean value.
1335Right now this function uses a pointer
1336to the symbol function.
1337If that becomes private,
1338the prototype of this function
1339must be changed.
1340\par
1341The internal accessor would be trivial, so there is none.
1342@<Function definitions@> =
1343gboolean marpa_symbol_is_productive(struct marpa_g* g, Marpa_Symbol_ID id)
1344{ return SYM_by_ID(id)->t_is_productive; }
1345@ @<Public function prototypes@> =
1346gboolean marpa_symbol_is_productive(struct marpa_g* g, Marpa_Symbol_ID id);
1347
1348@ Symbol Is Start Boolean
1349@<Bit aligned symbol elements@> = guint t_is_start:1;
1350@ @<Initialize symbol elements@> = symbol->t_is_start = FALSE;
1351@ Accessor: The trace accessor returns the Boolean value.
1352The internal accessor would be trivial, so there is none.
1353@<Function definitions@> =
1354static inline
1355gint symbol_is_start(SYM symbol)
1356{ return symbol->t_is_start; }
1357gint marpa_symbol_is_start( struct marpa_g*g, Marpa_Symbol_ID symid)
1358{ @<Return |-2| on failure@>@;
1359@<Fail if grammar |symid| is invalid@>@;
1360   return symbol_is_start(SYM_by_ID(symid));
1361}
1362@ @<Private function prototypes@> =
1363static inline
1364gint symbol_is_start(SYM symbol);
1365@ @<Public function prototypes@> =
1366gint marpa_symbol_is_start( struct marpa_g*g, Marpa_Symbol_ID id);
1367
1368@ Symbol Aliasing:
1369This is the logic for aliasing symbols.
1370In the Aycock-Horspool algorithm, from which Marpa is derived,
1371it is essential that there be no ``proper nullable"
1372symbols.  Therefore, all proper nullable symbols in
1373the original grammar are converted into two, aliased,
1374symbols: a non-nullable (or ``proper") alias and a nulling alias.
1375@<Bit aligned symbol elements@> =
1376guint t_is_proper_alias:1;
1377guint t_is_nulling_alias:1;
1378@ @<Widely aligned symbol elements@> =
1379struct s_symbol* t_alias;
1380@ @<Initialize symbol elements@> =
1381symbol->t_is_proper_alias = FALSE;
1382symbol->t_is_nulling_alias = FALSE;
1383symbol->t_alias = NULL;
1384
1385@ Proper Alias Trace Accessor:
1386If this symbol is a nulling symbol
1387with a proper alias, returns the proper alias.
1388Otherwise, returns |NULL|.
1389@<Function definitions@> =
1390static inline
1391SYM symbol_proper_alias(SYM symbol)
1392{ return symbol->t_is_nulling_alias ? symbol->t_alias : NULL; }
1393Marpa_Symbol_ID marpa_symbol_proper_alias(struct marpa_g* g, Marpa_Symbol_ID symid)
1394{
1395SYM symbol;
1396SYM proper_alias;
1397@<Return |-2| on failure@>@;
1398@<Fail if grammar |symid| is invalid@>@;
1399symbol = SYM_by_ID(symid);
1400proper_alias = symbol_proper_alias(symbol);
1401return proper_alias == NULL ? -1 : ID_of_SYM(proper_alias);
1402}
1403@ @<Private function prototypes@> =
1404static inline SYM symbol_proper_alias(SYM symbol);
1405@ @<Public function prototypes@> =
1406Marpa_Symbol_ID marpa_symbol_proper_alias(struct marpa_g* g, Marpa_Symbol_ID symid);
1407
1408@ Nulling Alias Trace Accessor:
1409If this symbol is a proper (non-nullable) symbol
1410with a nulling alias, returns the nulling alias.
1411Otherwise, returns |NULL|.
1412@<Function definitions@> =
1413static inline
1414SYM symbol_null_alias(SYM symbol)
1415{ return symbol->t_is_proper_alias ? symbol->t_alias : NULL; }
1416Marpa_Symbol_ID marpa_symbol_null_alias(struct marpa_g* g, Marpa_Symbol_ID symid)
1417{
1418SYM symbol;
1419SYM alias;
1420@<Return |-2| on failure@>@;
1421@<Fail if grammar |symid| is invalid@>@;
1422symbol = SYM_by_ID(symid);
1423alias = symbol_null_alias(symbol);
1424if (alias == NULL) {
1425    g_context_int_add(g, "symid", symid);
1426    g->t_error = "no alias";
1427    return -1;
1428}
1429return ID_of_SYM(alias);
1430}
1431@ @<Private function prototypes@> =
1432static inline SYM symbol_null_alias(SYM symbol);
1433@ @<Public function prototypes@> =
1434Marpa_Symbol_ID marpa_symbol_null_alias(struct marpa_g* g, Marpa_Symbol_ID symid);
1435
1436@ Given a proper nullable symbol as its argument,
1437converts the argument into two ``aliases".
1438The proper (non-nullable) alias will have the same symbol ID
as the argument.
1440The nulling alias will have a new symbol ID.
1441The return value is a pointer to the nulling alias.
1442@ @<Private function prototypes@> =
1443static inline
1444SYM symbol_alias_create(GRAMMAR g, SYM symbol);
1445@ @<Function definitions@> = static inline
1446SYM symbol_alias_create(GRAMMAR g, SYM symbol)
1447{
1448    SYM alias = symbol_new(g);
1449    symbol->t_is_proper_alias = TRUE;
1450    SYM_is_Nulling(symbol) = FALSE;
1451    symbol->t_is_nullable = FALSE;
1452    symbol->t_alias = alias;
1453    alias->t_is_nulling_alias = TRUE;
1454    SYM_is_Nulling(alias) = TRUE;
1455    alias->t_is_nullable = TRUE;
1456    alias->t_is_productive = TRUE;
1457    alias->t_is_accessible = symbol->t_is_accessible;
1458    alias->t_alias = symbol;
1459    return alias;
1460}
1461
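@ A stand-alone sketch of the aliasing step, with simplified types.
The |demo_sym| structure keeps only the fields that aliasing touches, and the caller
supplies the storage and the ID for the new symbol; both are assumptions made to keep
the example free of the grammar object, and none of the |demo_| names exist in |libmarpa|.

```c
#include <assert.h>
#include <stddef.h>

struct demo_sym {
    int id;
    unsigned is_nullable:1;
    unsigned is_nulling:1;
    unsigned is_proper_alias:1;
    unsigned is_nulling_alias:1;
    struct demo_sym *alias;
};

/* Split a proper nullable symbol in two.  The original keeps its ID and
   becomes the proper (non-nullable) alias; the new symbol is the nulling alias. */
static struct demo_sym *demo_alias_create(struct demo_sym *symbol,
                                          struct demo_sym *storage, int new_id)
{
    assert(symbol != NULL && storage != NULL);
    storage->id = new_id;
    storage->is_nullable = 1;
    storage->is_nulling = 1;
    storage->is_proper_alias = 0;
    storage->is_nulling_alias = 1;
    storage->alias = symbol;
    symbol->is_proper_alias = 1;
    symbol->is_nulling = 0;
    symbol->is_nullable = 0;
    symbol->alias = storage;
    return storage;
}

/* Nulling alias of a proper alias, else NULL. */
static struct demo_sym *demo_null_alias(const struct demo_sym *symbol)
{ return symbol->is_proper_alias ? symbol->alias : NULL; }

/* Proper alias of a nulling alias, else NULL. */
static struct demo_sym *demo_proper_alias(const struct demo_sym *symbol)
{ return symbol->is_nulling_alias ? symbol->alias : NULL; }
```

The two accessors are inverses on an aliased pair, and each returns |NULL| when asked in the wrong direction.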
1462@ {\bf Symbol callbacks}:  The user can define a callback
1463(with argument) which is invoked whenever a symbol
1464is created.
1465@ Function pointer declarations are
1466hard to type and impossible to read.
1467This typedef localizes the damage.
1468@<Callback typedefs@> =
1469typedef void (Marpa_Symbol_Callback)(struct marpa_g *g, Marpa_Symbol_ID id);
1470@ @<Widely aligned grammar elements@> =
1471    Marpa_Symbol_Callback* t_symbol_callback;
1472    gpointer t_symbol_callback_arg;
1473@ @<Initialize grammar elements@> =
1474g->t_symbol_callback_arg = NULL;
1475g->t_symbol_callback = NULL;
1476@ @<Function definitions@> =
1477void marpa_symbol_callback_set(struct marpa_g *g, Marpa_Symbol_Callback*cb)
1478{ g->t_symbol_callback = cb; }
1479void marpa_symbol_callback_arg_set(struct marpa_g *g, gpointer cb_arg)
1480{ g->t_symbol_callback_arg = cb_arg; }
1481gpointer marpa_symbol_callback_arg(struct marpa_g *g)
1482{ return g->t_symbol_callback_arg; }
1483@ @<Public function prototypes@> =
1484void marpa_symbol_callback_set(struct marpa_g *g, Marpa_Symbol_Callback*cb);
1485void marpa_symbol_callback_arg_set(struct marpa_g *g, gpointer cb_arg);
1486gpointer marpa_symbol_callback_arg(struct marpa_g *g);
1487@ Do the symbol callback.
1488{\bf To Do}: @^To Do@>
1489Look at the possibility of leaking memory if the callback
1490never returns, but the grammar is destroyed.
1491@<Function definitions@> =
1492static inline void symbol_callback(struct marpa_g *g, Marpa_Symbol_ID id)
1493{ Marpa_Symbol_Callback* cb = g->t_symbol_callback;
1494if (cb) { (*cb)(g, id); } }
1495@ @<Private function prototypes@> =
1496static inline void symbol_callback(struct marpa_g *g, Marpa_Symbol_ID id);
1497
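@ A minimal sketch of the callback mechanism in plain C.
The |demo_g| structure and |demo_record_last| callback are hypothetical;
the sketch keeps the same shape as the code above: a typedef of a function type,
a stored pointer to it, and a guarded call when a symbol is created.

```c
#include <assert.h>
#include <stddef.h>

struct demo_g;

/* A function type (not a pointer type); it is stored and passed as a pointer. */
typedef void (Demo_Symbol_Callback)(struct demo_g *g, int id);

struct demo_g {
    Demo_Symbol_Callback *cb;
    void *cb_arg;
    int symbol_count;
};

/* Invoke the callback only if one has been set. */
static void demo_symbol_callback(struct demo_g *g, int id)
{
    Demo_Symbol_Callback *cb = g->cb;
    if (cb) { (*cb)(g, id); }
}

/* Create a symbol ID, then notify the user. */
static int demo_symbol_new(struct demo_g *g)
{
    int id;
    assert(g != NULL);
    id = g->symbol_count++;
    demo_symbol_callback(g, id);
    return id;
}

/* Example callback: store the new ID in the int that cb_arg points to. */
static void demo_record_last(struct demo_g *g, int id)
{
    *(int *)g->cb_arg = id;
}
```

The callback receives only the grammar and the ID; it reaches its own data through the stored argument, as |demo_record_last| does.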
1498@** Rule (RULE) Code.
1499@s Marpa_Rule_ID int
1500@<Public typedefs@> =
1501typedef gint Marpa_Rule_ID;
1502@ @<Private structures@> =
1503struct s_rule {
1504    @<Int aligned rule elements@>@/
1505    @<Bit aligned rule elements@>@/
1506    @<Final rule elements@>@/
1507};
1508@
1509@s RULE int
1510@s RULEID int
1511@<Private typedefs@> =
1512struct s_rule;
1513typedef struct s_rule* RULE;
1514typedef Marpa_Rule_ID RULEID;
1515
1516@*0 Rule Construction.
1517@ Set up the basic data.
1518This logic is intended to be common to all individual rules.
1519The name comes from the idea that this logic ``starts"
1520the initialization of a rule.
1521@ @<Private function prototypes@> =
1522PRIVATE_NOT_INLINE
1523RULE rule_start(GRAMMAR g,
1524SYMID lhs, SYMID *rhs, gint length);
1525@ GCC complains about inlining |rule_start| -- it is
1526not a tiny function, and it is repeated often.
1527@<Function definitions@> =
1528PRIVATE_NOT_INLINE
1529RULE rule_start(GRAMMAR g,
1530SYMID lhs, SYMID *rhs, gint length)
1531{
1532    @<Return |NULL| on failure@>@;
1533    RULE rule;
1534    const gint rule_sizeof = G_STRUCT_OFFSET (struct s_rule, t_symbols) +
1535        (length + 1) * sizeof (rule->t_symbols[0]);
1536    @<Return failure on invalid rule symbols@>@/
1537    rule = obstack_alloc (&g->t_obs, rule_sizeof);
1538    @<Initialize rule symbols@>@/
1539    @<Initialize rule elements@>@/
1540    rule_add(g, rule->t_id, rule);
1541    @<Add this rule to the symbol rule lists@>
1542   return rule;
1543}
1544
1545@ @<Public function prototypes@> =
1546Marpa_Rule_ID marpa_rule_new(struct marpa_g *g,
1547Marpa_Symbol_ID lhs, Marpa_Symbol_ID *rhs, gint length);
1548@ @<Function definitions@> =
1549Marpa_Rule_ID marpa_rule_new(struct marpa_g *g,
1550Marpa_Symbol_ID lhs, Marpa_Symbol_ID *rhs, gint length)
1551{
1552    Marpa_Rule_ID rule_id;
1553    RULE rule;
1554    if (length > MAX_RHS_LENGTH) {
1555	g->t_error = (Marpa_Error_ID)"rhs too long";
1556        return -1;
1557    }
1558    if (is_rule_duplicate(g, lhs, rhs, length) == TRUE) {
1559	g->t_error = (Marpa_Error_ID)"duplicate rule";
1560        return -1;
1561    }
1562    rule = rule_start(g, lhs, rhs, length);
1563    if (!rule) { return -1; }@;
1564    rule_id = rule->t_id;
1565    rule_callback(g, rule_id);
1566    return rule_id;
1567}
1568
1569@ @<Public function prototypes@> =
1570Marpa_Rule_ID marpa_sequence_new(struct marpa_g *g,
1571Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID rhs_id, Marpa_Symbol_ID separator_id,
1572gint min, gint flags );
1573@ @<Function definitions@> =
1574Marpa_Rule_ID marpa_sequence_new(struct marpa_g *g,
1575Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID rhs_id, Marpa_Symbol_ID separator_id,
1576gint min, gint flags )
1577{
1578    @<Return |-2| on failure@>@;
1579    Marpa_Rule_ID original_rule_id;
1580    RULE original_rule;
1581    Marpa_Symbol_ID internal_lhs_id, *temp_rhs;@;
1582    if (is_rule_duplicate(g, lhs_id, &rhs_id, 1) == TRUE) {
1583	g_context_clear(g);
1584	g->t_error = (Marpa_Error_ID)"duplicate rule";
1585        return failure_indicator;
1586    }
1587
1588    @<Add the original rule for a sequence@>@;
1589    @<Check that the separator is valid or -1@>@;
1590    @<Mark the counted symbols@>@;
1591    if (min == 0) { @<Add the nulling rule for a sequence@>@; }
1592    min = 1;
1593    @<Create the internal LHS symbol@>@;
1594    @<Allocate the temporary rhs buffer@>@;
1595    @<Add the top rule for the sequence@>@;
1596    if (separator_id >= 0 && !(flags & MARPA_PROPER_SEPARATION)) {
1597	@<Add the alternate top rule for the sequence@>@;
1598    }
1599    @<Add the minimum rule for the sequence@>@;
1600    @<Add the iterating rule for the sequence@>@;
1601    @<Free the temporary rhs buffer@>@;
1602    return original_rule_id;
1603}
1604@ As a side effect, this checks the LHS and RHS symbols for validity.
1605@<Add the original rule for a sequence@> =
1606    original_rule = rule_start(g, lhs_id, &rhs_id, 1);
1607    if (!original_rule) {
1608	g_context_clear(g);
1609	g->t_error = "internal_error";
1610	return failure_indicator;
1611    }
1612    RULE_is_Used(original_rule) = 0;
1613    original_rule_id = original_rule->t_id;
1614    original_rule->t_is_discard = !(flags & MARPA_KEEP_SEPARATION)
1615      && separator_id >= 0;
1616    rule_callback(g, original_rule_id);
1617
1618@ @<Check that the separator is valid or -1@> =
1619if (separator_id != -1 && !symbol_is_valid(g, separator_id)) {
1620    g_context_clear(g);
1621    g_context_int_add(g, "symid", separator_id);
1622    g->t_error = "bad separator";
1623    return failure_indicator;
1624}
1625
1626@ @<Mark the counted symbols@> =
1627SYM_by_ID(rhs_id)->t_is_counted = 1;
1628if (separator_id >= 0) { SYM_by_ID(separator_id)->t_is_counted = 1; }
1629@ @<Add the nulling rule for a sequence@> =
1630	{ RULE rule = rule_start(g, lhs_id, 0, 0);
1631	if (!rule) { @<Fail with internal grammar error@>@; }
1632	rule->t_is_semantic_equivalent = TRUE;
1633	rule->t_original = original_rule_id;
1634	rule_callback(g, rule->t_id);
1635	}
1636@ @<Create the internal LHS symbol@> =
1637    internal_lhs_id = ID_of_SYM(symbol_new(g));
1638    symbol_callback(g, internal_lhs_id);
1639@ The actual size needed for the RHS buffer is determined by
1640the longer of minimum rule and the iterating rule.
1641The iterating rule may require 3 RHS symbols, if there is
1642a separator.
(We have $min\ge 1$ at this point.)
The minimum rule will require $1 + 2 * (min - 1)$ symbols
with a separator, and $min$ symbols without.
The allocation below uses a simplified expression, which
overallocates.
Worst case is the minimum rule with a separator, in
which case it allocates 4 symbol IDs too many.
1650@<Allocate the temporary rhs buffer@> =
1651temp_rhs = g_new(Marpa_Symbol_ID, (3 + (separator_id < 0 ? 1 : 2) * min));
1652@ @<Free the temporary rhs buffer@> = g_free(temp_rhs);
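@ The buffer arithmetic can be checked with a short sketch.
The two functions are hypothetical helpers, not part of |libmarpa|:
|demo_allocated| repeats the expression passed to |g_new|, and |demo_needed|
computes the longer of the minimum rule and the iterating rule.
Both count array elements, not bytes.

```c
#include <assert.h>

/* Number of elements requested by the allocation expression. */
static int demo_allocated(int has_separator, int min)
{
    return 3 + (has_separator ? 2 : 1) * min;
}

/* Number of elements actually needed: the longer of the two rules. */
static int demo_needed(int has_separator, int min)
{
    int iterating = has_separator ? 3 : 2;
    int minimum = has_separator ? 1 + 2 * (min - 1) : min;
    assert(min >= 1);
    return minimum > iterating ? minimum : iterating;
}
```

With a separator and $min=4$, 11 elements are allocated and 7 are needed, a surplus of 4;
without a separator the surplus for the same $min$ is 3.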
1653@ @<Add the top rule for the sequence@> =
1654{ RULE rule;
1655temp_rhs[0] = internal_lhs_id;
1656rule = rule_start(g, lhs_id, temp_rhs, 1);
1657if (!rule) { @<Fail with internal grammar error@>@; }
1658rule->t_original = original_rule_id;
1659rule->t_is_semantic_equivalent = TRUE;
1660/* Real symbol count remains at default of 0 */
1661RULE_is_Virtual_RHS(rule) = TRUE;
1662rule_callback(g, rule->t_id);
1663}
1664@ This ``alternate" top rule is needed if a final separator is allowed.
1665@<Add the alternate top rule for the sequence@> =
1666{ RULE rule;
1667    temp_rhs[0] = internal_lhs_id;
1668    temp_rhs[1] = separator_id;
1669    rule = rule_start(g, lhs_id, temp_rhs, 2);
1670    if (!rule) { @<Fail with internal grammar error@>@; }
1671    rule->t_original = original_rule_id;
1672    rule->t_is_semantic_equivalent = TRUE;
1673    RULE_is_Virtual_RHS(rule) = TRUE;
1674    Real_SYM_Count_of_RULE(rule) = 1;
1675    rule_callback(g, rule->t_id);
1676}
1677@ The traditional way to write a sequence in BNF is with one
1678rule to represent the minimum, and another to deal with iteration.
1679That's the core of Marpa's rewrite.
1680@<Add the minimum rule for the sequence@> =
1681{ RULE rule;
1682gint rhs_ix, i;
1683    temp_rhs[0] = rhs_id;
1684    rhs_ix = 1;
1685    for (i = 0; i < min - 1; i++) {
1686        if (separator_id >= 0) temp_rhs[rhs_ix++] = separator_id;
1687        temp_rhs[rhs_ix++] = rhs_id;
1688    }
1689    rule = rule_start(g, internal_lhs_id, temp_rhs, rhs_ix);
1690    if (!rule) { @<Fail with internal grammar error@>@; }
1691    RULE_is_Virtual_LHS(rule) = 1;
1692    Real_SYM_Count_of_RULE(rule) = rhs_ix;
1693    rule_callback(g, rule->t_id);
1694}
1695@ @<Add the iterating rule for the sequence@> =
1696{ RULE rule;
1697gint rhs_ix = 0;
1698    temp_rhs[rhs_ix++] = internal_lhs_id;
1699    if (separator_id >= 0) temp_rhs[rhs_ix++] = separator_id;
1700    temp_rhs[rhs_ix++] = rhs_id;
1701    rule = rule_start(g, internal_lhs_id, temp_rhs, rhs_ix);
1702    if (!rule) { @<Fail with internal grammar error@>@; }
1703    RULE_is_Virtual_LHS(rule) = 1;
1704    RULE_is_Virtual_RHS(rule) = 1;
1705    Real_SYM_Count_of_RULE(rule) = rhs_ix - 1;
1706    rule_callback(g, rule->t_id);
1707}
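@ As an illustration of the two core rules, here is a stand-alone sketch
that fills a caller-supplied buffer with the right hand sides only.
The function names are hypothetical and plain |int| symbol IDs are assumed;
rule creation, the virtual flags and the callbacks are left out.
A negative separator ID means that there is no separator.

```c
#include <assert.h>

/* Fill |rhs| with the minimum rule: item (separator item)*.
   Returns the number of symbols written. */
int build_minimum_rhs(int *rhs, int item_id, int separator_id, int min)
{
    int ix = 0, i;
    assert(min >= 1);
    rhs[ix++] = item_id;
    for (i = 0; i < min - 1; i++) {
        if (separator_id >= 0) rhs[ix++] = separator_id;
        rhs[ix++] = item_id;
    }
    return ix;
}

/* Fill |rhs| with the iterating rule: internal_lhs [separator] item.
   Returns the number of symbols written. */
int build_iterating_rhs(int *rhs, int internal_lhs_id,
                        int item_id, int separator_id)
{
    int ix = 0;
    rhs[ix++] = internal_lhs_id;
    if (separator_id >= 0) rhs[ix++] = separator_id;
    rhs[ix++] = item_id;
    return ix;
}
```

For an item 7, a separator 9 and a minimum of 3, the minimum rule has
the five-symbol RHS 7 9 7 9 7.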
1708
1709@ Does this rule duplicate an already existing rule?
1710A duplicate is a rule with the same lhs symbol,
1711the same rhs length,
1712and the same symbol in each position on the rhs.
1713
1714Note that this definition of duplicate applies to
1715sequences as well.  That means that a sequence rule
1716can be a duplicate of a non-sequence rule of length 1,
1717if they have the same lhs symbols and the same rhs
1718symbol.
1719Also, that means you cannot define sequences
1720that differ only in the separator, or only in the
1721minimum count.
1722
1723I do not think the
1724restrictions on sequence rules represent real limitations.
1725Multiple sequences with the same lhs and rhs would be
1726very confusing.
And users who really want such sequences are free
1728to write the sequences out as BNF rules.
1729After all, sequence rules are only a shorthand.
1730And shorthand is counter-productive when it makes
1731you lose track of what you are trying to say.
1732
The algorithm is to first get a list of all the rules
1734with the same LHS, which is very fast because
1735I have pre-computed it.
1736If there are no such rules, the new rule is
1737unique (not a duplicate).
1738If there are such rules, I look at them,
1739trying to find one that duplicates the new
1740rule.
1741For each old rule, I first compare its length to
1742the new rule, and then its right hand side
1743symbols, one by one.
1744If all these comparisons succeed, I conclude
1745that the old rule duplicates the new one
1746and return |TRUE|.
1747If, after having done the comparison for all
1748the ``same LHS" rules, I have found no duplicates,
1749then I conclude there is no duplicate of the new
1750rule, and return |FALSE|.
1751@ @<Private function prototypes@> =
1752static inline
1753gboolean is_rule_duplicate(struct marpa_g* g,
1754Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID* rhs_ids, gint length);
1755@ @<Function definitions@> =
1756static inline
1757gboolean is_rule_duplicate(struct marpa_g* g,
1758Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID* rhs_ids, gint length)
1759{
1760    gint ix;
1761    SYM lhs = SYM_by_ID(lhs_id);
1762    GArray* same_lhs_array = lhs->t_lhs;
1763    gint same_lhs_count = same_lhs_array->len;
1764    for (ix = 0; ix < same_lhs_count; ix++) {
1765	RULEID same_lhs_rule_id = ((RULEID *)(same_lhs_array->data))[ix];
1766	gint rhs_position;
1767	RULE rule = RULE_by_ID(g, same_lhs_rule_id);
1768	const gint rule_length = Length_of_RULE(rule);
1769	if (rule_length != length) { goto RULE_IS_NOT_DUPLICATE; }
1770	for (rhs_position = 0; rhs_position < rule_length; rhs_position++) {
1771	    if (RHS_ID_of_RULE(rule, rhs_position) != rhs_ids[rhs_position]) {
1772	        goto RULE_IS_NOT_DUPLICATE;
1773	    }
1774	}
1775	return TRUE; /* This rule duplicates the new one */
1776	RULE_IS_NOT_DUPLICATE: ;
1777    }
1778    return FALSE; /* No duplicate rules were found */
1779}
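@ The duplicate test can be tried in isolation with a sketch like
the following.
It is an assumption-laden stand-in, not the \libmarpa/ code:
|struct toy_rule| and |toy_is_duplicate| are hypothetical,
and the sketch scans every rule linearly instead of
using a precomputed list of the rules with the same LHS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a rule: LHS, RHS length, RHS symbols. */
struct toy_rule {
    int lhs;
    int length;
    const int *rhs;
};

/* Return 1 if some rule in |rules| has the same LHS, the same RHS length
   and the same symbol at every RHS position; otherwise return 0. */
int toy_is_duplicate(const struct toy_rule *rules, size_t rule_count,
                     int lhs, const int *rhs, int length)
{
    size_t r;
    for (r = 0; r < rule_count; r++) {
        int pos;
        if (rules[r].lhs != lhs) continue;
        if (rules[r].length != length) continue;
        for (pos = 0; pos < length; pos++) {
            if (rules[r].rhs[pos] != rhs[pos]) break;
        }
        if (pos == length) return 1; /* every position matched */
    }
    return 0;
}
```

Note that the separator and the minimum count play no part in the comparison,
which is why sequences that differ only in those respects count as duplicates.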
1780
1781@ Add the rules to the symbol's rule lists:
1782An obstack scratchpad might be useful for
1783the copy of the RHS symbols.
|alloca|, while tempting, should not be used
because an unusually long RHS could cause
a stack overflow.
Even if such a case is pathological,
1788a core dump is not the right response.
1789@<Add this rule to the symbol rule lists@> =
1790    symbol_lhs_add(SYM_by_ID(rule->t_symbols[0]), rule->t_id);@;
1791    if (Length_of_RULE(rule) > 0) {
1792	gint rh_list_ix;
1793	const guint alloc_size = Length_of_RULE(rule)*sizeof( SYMID);
1794	Marpa_Symbol_ID *rh_symbol_list = g_slice_alloc(alloc_size);
1795	gint rh_symbol_list_length = 1;
1796	@<Create |rh_symbol_list|,
1797	a duplicate-free list of the right hand side symbols@>@;
1798       for (rh_list_ix = 0;
1799	   rh_list_ix < rh_symbol_list_length;
1800	   rh_list_ix++) {
1801	    symbol_rhs_add(
1802		SYM_by_ID(rh_symbol_list[rh_list_ix]),
1803		rule->t_id);
1804       }@;
1805       g_slice_free1(alloc_size, rh_symbol_list);
1806    }
1807
1808@ \marpa_sub{Create a duplicate-free list of the right hand side symbols}
1809The algorithm is a
1810hand-coded
1811insertion sort, modified to not insert duplicates.
@ The first goal is to optimize for the usual case,
where both the average and the root mean square of the
number of unique symbols on the RHS of a rule
are small -- usually less
than 10.
(Root mean square is more relevant than the average for
comparison with worst case performance.)
A hand-inlined insertion sort is perfect for
this.
\par It might be thought that the code below could
be improved by finding the insertion point
with a binary search, but when the number of RHS symbols
for most rules is less than a certain number,
the higher-overhead binary search is worse,
not better.
1828This number is probably around 8, and in practice most rules
1829are shorter than that.
1830A reasonable alternative is to only use binary search above
1831a certain size, but in most cases that will produce no
1832measurable improvement.
1833
1834@ A second goal is that behavior for unusual and pathological
1835cases be, if not optimal, reasonable.
Worst case for insertion sort is $O(n^2)$.
1837(This is why I used the root mean square, not a simple average.)
1838This would be approached if most of the right hand symbols were
1839in very long rules.
$O(n^2)$ is, in fact, no worse than the worst case of the quicksort
on which |qsort| is usually based.
1842The hand-coding here means it would take some effort to
1843construct a case in which
1844the theoretical advantage of another
1845sort algorithm would
1846show up in practice.
1847\par If anyone comes to care about very long right hand sides,
1848this algorithm can be changed to switch over to mergesort
1849when the right hand side exceeds a certain length.
The cost of an extra comparison is tiny, but then again,
the likelihood of any benefit from an alternative sort
algorithm would also
be tiny.
1854
1855@ The code assumes that the rhs has length greater than zero.
1856@<Create |rh_symbol_list|, a duplicate-free list of the right hand side symbols@> =
1857{
1858/* Handle the first symbol as a special case */
1859gint rhs_ix = Length_of_RULE (rule) - 1;
1860rh_symbol_list[0] = RHS_ID_of_RULE(rule, (unsigned)rhs_ix);
1861rh_symbol_list_length = 1;
1862rhs_ix--;
1863for (; rhs_ix >= 0; rhs_ix--) {
1864    gint higher_ix;
1865    Marpa_Symbol_ID new_symid = RHS_ID_of_RULE(rule, (unsigned)rhs_ix);
1866    gint next_highest_ix = rh_symbol_list_length - 1;
1867    while (next_highest_ix >= 0) {
1868	Marpa_Symbol_ID current_symid = rh_symbol_list[next_highest_ix];
1869	if (current_symid == new_symid) goto ignore_this_symbol;
1870	if (current_symid < new_symid) break;
1871        next_highest_ix--;
1872    }
1873    /* Shift the higher symbol ID's up one slot */
1874    for (higher_ix = rh_symbol_list_length-1;
1875	    higher_ix > next_highest_ix;
1876	    higher_ix--) {
1877        rh_symbol_list[higher_ix+1] = rh_symbol_list[higher_ix];
1878    }
1879    /* Insert the next symbol */
1880    rh_symbol_list[next_highest_ix+1] = new_symid;
1881    rh_symbol_list_length++;
1882    ignore_this_symbol: ;
1883}
1884}
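@ The same technique, an insertion sort that refuses duplicates,
can be shown on a plain array.
This is a sketch under simplifying assumptions:
|sorted_unique| is a hypothetical name,
the input is an ordinary |int| array instead of a rule,
and the input is read front to back,
where the code above starts from the last RHS symbol.
The resulting list is the same either way.

```c
#include <assert.h>

/* Copy the distinct values of |in| into |out| in ascending order.
   |out| must have room for |n| entries.  Returns the count written. */
int sorted_unique(const int *in, int n, int *out)
{
    int out_len = 0, i;
    for (i = 0; i < n; i++) {
        int value = in[i];
        int pos = out_len - 1;
        int j;
        /* Scan down from the top for the insertion point */
        while (pos >= 0 && out[pos] > value) pos--;
        /* An equal entry at the insertion point means a duplicate */
        if (pos >= 0 && out[pos] == value) continue;
        /* Shift the larger entries up one slot, then insert */
        for (j = out_len - 1; j > pos; j--) out[j + 1] = out[j];
        out[pos + 1] = value;
        out_len++;
    }
    return out_len;
}
```

For the input 5 3 5 1 3 the output list is 1 3 5.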
1885
1886@*0 Rule Symbols.
A rule takes the traditional form of
1888a left hand side (LHS), and a right hand side (RHS).
1889The {\bf length} of a rule is the length of the RHS ---
1890there is always exactly one LHS symbol.
1891Maximum length of the RHS is restricted.
1892I take off two more bits than necessary, as a fudge
1893factor.
1894This is only checked for new rules.
1895The rules generated internally by libmarpa
1896are shorter than
1897a small constant in length, and
1898rewrites of existing rules shorten them.
On a 32-bit machine, this still allows a RHS of over half a billion
symbols.
1901I believe
1902by the time 64-bit machines become universal,
1903nobody will have noticed this restriction.
1904@d MAX_RHS_LENGTH (G_MAXINT >> (2))
1905@d Length_of_RULE(rule) ((rule)->t_rhs_length)
1906@<Int aligned rule elements@> = gint t_rhs_length;
1907@ The symbols come at the end of the |marpa_rule| structure,
1908so that they can be variable length.
1909@<Final rule elements@> = Marpa_Symbol_ID t_symbols[1];
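@ The layout, with the LHS in slot 0 and the RHS symbols after it,
can be sketched outside of \libmarpa/ as follows.
Everything here is hypothetical illustration:
|struct flex_rule|, |flex_rule_new| and |TOY_MAX_RHS_LENGTH| are made-up names,
|malloc| stands in for the allocator,
and a C99 flexible array member is used where the structure above
declares a trailing array of length 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Same shape as the limit above: two bits taken off the largest int */
#define TOY_MAX_RHS_LENGTH (INT_MAX >> 2)

struct flex_rule {
    int rhs_length;
    int symbols[];   /* symbols[0] is the LHS; the RHS follows it */
};

/* Allocate a rule with room for the LHS and |length| RHS symbols.
   Returns NULL if |length| is out of range or allocation fails. */
struct flex_rule *flex_rule_new(int lhs, const int *rhs, int length)
{
    struct flex_rule *rule;
    int i;
    if (length < 0 || length > TOY_MAX_RHS_LENGTH) return NULL;
    rule = malloc(sizeof *rule
                  + ((size_t)length + 1) * sizeof rule->symbols[0]);
    if (!rule) return NULL;
    rule->rhs_length = length;
    rule->symbols[0] = lhs;
    for (i = 0; i < length; i++) rule->symbols[i + 1] = rhs[i];
    return rule;
}
```

The caller releases the rule with |free|.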
1910
1911@ @<Return failure on invalid rule symbols@> =
1912{
1913    SYMID symid = lhs;
1914    @<Fail if grammar |symid| is invalid@>@;
1915}
1916{ gint rh_index;
1917    for (rh_index = 0; rh_index<length; rh_index++) {
1918	SYMID symid = rhs[rh_index];
1919	@<Fail if grammar |symid| is invalid@>@;
1920    }
1921}
1922
1923@ @<Initialize rule symbols@> =
1924Length_of_RULE(rule) = length;
1925rule->t_symbols[0] = lhs;
1926{ gint i; for (i = 0; i<length; i++) {
1927    rule->t_symbols[i+1] = rhs[i]; } }
1928@ @<Function definitions@> =
1929static inline Marpa_Symbol_ID rule_lhs_get(RULE rule) {
1930    return rule->t_symbols[0]; }
1931@ @<Private function prototypes@> =
1932static inline Marpa_Symbol_ID rule_lhs_get(RULE rule);
1933@ @<Function definitions@> =
1934Marpa_Symbol_ID marpa_rule_lhs(struct marpa_g *g, Marpa_Rule_ID rule_id) {
1935    @<Return |-2| on failure@>@;
1936    @<Fail if grammar |rule_id| is invalid@>@;
1937    return rule_lhs_get(RULE_by_ID(g, rule_id)); }
1938@ @<Public function prototypes@> =
1939Marpa_Symbol_ID marpa_rule_lhs(struct marpa_g *g, Marpa_Rule_ID rule_id);
1940@ @<Function definitions@> =
1941static inline Marpa_Symbol_ID* rule_rhs_get(RULE rule) {
1942    return rule->t_symbols+1; }
1943@ @<Private function prototypes@> =
1944static inline Marpa_Symbol_ID* rule_rhs_get(RULE rule);
1945@ @<Public function prototypes@> =
1946Marpa_Symbol_ID marpa_rule_rh_symbol(struct marpa_g *g, Marpa_Rule_ID rule_id, gint ix);
1947@ @<Function definitions@> =
1948Marpa_Symbol_ID marpa_rule_rh_symbol(struct marpa_g *g, Marpa_Rule_ID rule_id, gint ix) {
1949    RULE rule;
1950    @<Return |-2| on failure@>@;
1951    @<Fail if grammar |rule_id| is invalid@>@;
1952    rule = RULE_by_ID(g, rule_id);
1953    if (Length_of_RULE(rule) <= ix) return -1;
1954    return RHS_ID_of_RULE(rule, ix);
1955}
1956@ @<Function definitions@> =
1957static inline gsize rule_length_get(RULE rule) {
1958    return Length_of_RULE(rule); }
1959@ @<Private function prototypes@> =
1960static inline gsize rule_length_get(RULE rule);
1961@ @<Function definitions@> =
1962gint marpa_rule_length(struct marpa_g *g, Marpa_Rule_ID rule_id) {
1963    @<Return |-2| on failure@>@;
1964    @<Fail if grammar |rule_id| is invalid@>@;
1965    return rule_length_get(RULE_by_ID(g, rule_id)); }
1966@ @<Public function prototypes@> =
1967gint marpa_rule_length(struct marpa_g *g, Marpa_Rule_ID rule_id);
1968
1969@*1 Symbols of the Rule.
1970@d LHS_ID_of_RULE(rule) ((rule)->t_symbols[0])
1971@d RHS_ID_of_RULE(rule, position)
1972    ((rule)->t_symbols[(position)+1])
1973
1974@*0 Rule ID.
1975The {\bf rule ID} is a number which
1976acts as the unique identifier for a rule.
1977@d ID_of_RULE(rule) ((rule)->t_id)
1978@<Int aligned rule elements@> = Marpa_Rule_ID t_id;
1979@ @<Initialize rule elements@> = rule->t_id = g->t_rules->len;
1980
1981@*0 Rule Boolean: Keep Separator.
1982When this rule is evaluated by the semantics,
1983do they want to see the separators?
1984Default is that they are thrown away.
1985Usually the role of the separators is only syntactic,
1986and that is what is wanted.
1987For non-sequence rules, this flag should be false.
1988@<Public defines@> =
1989#define MARPA_KEEP_SEPARATION @| @[0x1@]@/
1990@ @<Bit aligned rule elements@> = guint t_is_discard:1;
1991@ @<Initialize rule elements@> =
1992rule->t_is_discard = FALSE;
1993@ @<Function definitions@> =
1994gboolean marpa_rule_is_discard_separation(struct marpa_g* g, Marpa_Rule_ID id)
1995{ return RULE_by_ID(g, id)->t_is_discard; }
1996@ @<Public function prototypes@> =
1997gboolean marpa_rule_is_discard_separation(struct marpa_g* g, Marpa_Rule_ID id);
1998
1999@*0 Rule Boolean: Proper Separation.
2000In Marpa's terminology,
2001proper separation means that a sequence
2002cannot legally end with a separator.
2003In ``proper" separation,
2004the term separator is interpreted strictly,
2005as something which separates two list items.
2006A separator coming after the final list item does not separate
2007two items, and therefore traditionally was considered a syntax
2008error.
2009\par
2010Proper separation is often inconvenient,
2011or even counter-productive.
2012Increasingly, the
2013practice is to be ``liberal"
2014and to allow a separator to come after the last list
2015item.
2016Liberal separation is the default in Marpa.
2017\par
There is no bitfield for this, because proper separation is
2019a completely syntactic matter,
2020taken care of in the rewrite itself.
2021@<Public defines@> =
2022#define MARPA_PROPER_SEPARATION @| @[0x2@]@/
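@ A small sketch of how the two flag bits can be tested.
The macro and function names here are hypothetical,
although the bit values mirror the public defines.
The first test follows the expression used when the sequence rule
sets its discard bit.
The second is an assumption based on the prose above:
a trailing separator is allowed only when there is a separator
and proper separation was not requested.

```c
#include <assert.h>

#define KEEP_SEPARATION   0x1  /* mirrors MARPA_KEEP_SEPARATION */
#define PROPER_SEPARATION 0x2  /* mirrors MARPA_PROPER_SEPARATION */

/* Separators are discarded unless the caller asked to keep them,
   and only when the sequence actually has a separator. */
int discards_separators(int flags, int separator_id)
{
    return !(flags & KEEP_SEPARATION) && separator_id >= 0;
}

/* Assumed rule: liberal separation is the default, so a trailing
   separator is allowed unless proper separation was requested. */
int allows_trailing_separator(int flags, int separator_id)
{
    return separator_id >= 0 && !(flags & PROPER_SEPARATION);
}
```

The two bits are independent, so they may be combined with a bitwise or.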
2023
2024@*0 Accessible Rules.
2025@ A rule is accessible if its LHS is accessible.
2026@<Function definitions@> =
2027static inline gint rule_is_accessible(struct marpa_g* g, RULE  rule)
2028{
2029Marpa_Symbol_ID lhs_id = LHS_ID_of_RULE(rule);
2030 return SYM_by_ID(lhs_id)->t_is_accessible; }
2031gint marpa_rule_is_accessible(struct marpa_g* g, Marpa_Rule_ID rule_id)
2032{
2033    @<Return |-2| on failure@>@;
2034RULE  rule;
2035    @<Fail if grammar |rule_id| is invalid@>@;
2036rule = RULE_by_ID(g, rule_id);
2037return rule_is_accessible(g, rule);
2038}
2039@ @<Private function prototypes@> =
2040static inline gint rule_is_accessible(struct marpa_g* g, RULE  rule);
2041@ @<Public function prototypes@> =
2042gint marpa_rule_is_accessible(struct marpa_g* g, Marpa_Rule_ID id);
2043
2044@*0 Productive Rules.
2045@ A rule is productive if every symbol on its RHS is productive.
2046@<Function definitions@> =
2047static inline gint rule_is_productive(struct marpa_g* g, RULE  rule)
2048{
2049gint rh_ix;
2050for (rh_ix = 0; rh_ix < Length_of_RULE(rule); rh_ix++) {
2051   Marpa_Symbol_ID rhs_id = RHS_ID_of_RULE(rule, rh_ix);
2052   if ( !SYM_by_ID(rhs_id)->t_is_productive ) return FALSE;
2053}
2054return TRUE; }
2055gint marpa_rule_is_productive(struct marpa_g* g, Marpa_Rule_ID rule_id)
2056{
2057    @<Return |-2| on failure@>@;
2058RULE  rule;
2059    @<Fail if grammar |rule_id| is invalid@>@;
2060rule = RULE_by_ID(g, rule_id);
2061return rule_is_productive(g, rule);
2062}
2063@ @<Private function prototypes@> =
2064static inline gint rule_is_productive(struct marpa_g* g, RULE  rule);
2065@ @<Public function prototypes@> =
2066gint marpa_rule_is_productive(struct marpa_g* g, Marpa_Rule_ID id);
2067
2068@*0 Loop Rule.
2069@ A rule is a loop rule if it non-trivially
2070produces the string of length one
2071which consists only of its LHS symbol.
2072``Non-trivially" means the zero-step derivation does not count -- the
2073derivation must have at least one step.
2074@<Bit aligned rule elements@> = guint t_is_loop:1;
2075@ @<Initialize rule elements@> =
2076rule->t_is_loop = FALSE;
2077@ This is the external accessor.
2078The internal accessor would be trivial, so there is none.
2079@<Function definitions@> =
2080gint marpa_rule_is_loop(struct marpa_g* g, Marpa_Rule_ID rule_id)
2081{
2082    @<Return |-2| on failure@>@;
2083    @<Fail if grammar |rule_id| is invalid@>@;
2084return RULE_by_ID(g, rule_id)->t_is_loop; }
2085@ @<Public function prototypes@> =
2086gint marpa_rule_is_loop(struct marpa_g* g, Marpa_Rule_ID rule_id);
2087
2088@*0 Virtual Loop Rule.
2089@ When dealing with rules which result from the CHAF rewrite,
2090it is convenient to recognize the ``loop rule" property as belonging
2091to only one of the pieces.
2092The ``virtual loop rule" property exists for this purpose.
2093All virtual loop rules are loop rules,
2094but not vice versa.
2095@<Bit aligned rule elements@> = guint t_is_virtual_loop:1;
2096@ @<Initialize rule elements@> =
2097rule->t_is_virtual_loop = FALSE;
2098@ This is the external accessor.
2099The internal accessor would be trivial, so there is none.
2100@<Function definitions@> =
2101gint marpa_rule_is_virtual_loop(struct marpa_g* g, Marpa_Rule_ID rule_id)
2102{
2103    @<Return |-2| on failure@>@;
2104    @<Fail if grammar |rule_id| is invalid@>@;
2105return RULE_by_ID(g, rule_id)->t_is_virtual_loop; }
2106@ @<Public function prototypes@> =
2107gint marpa_rule_is_virtual_loop(struct marpa_g* g, Marpa_Rule_ID rule_id);
2108
2109@*0 Nulling Rules.
2110@ A rule is nulling if every symbol on its RHS is nulling.
2111Note that this can be vacuously true --- an empty rule is nulling.
2112@<Function definitions@> =
2113static inline gint
2114rule_is_nulling (GRAMMAR g, RULE rule)
2115{
2116  gint rh_ix;
2117  for (rh_ix = 0; rh_ix < Length_of_RULE (rule); rh_ix++)
2118    {
2119      SYMID rhs_id = RHS_ID_of_RULE (rule, rh_ix);
2120      if (!SYM_is_Nulling(SYM_by_ID (rhs_id)))
2121	return FALSE;
2122    }
2123  return TRUE;
2124}
2125@ @<Private function prototypes@> =
2126static inline gint rule_is_nulling(GRAMMAR g, RULE rule);
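@ The vacuous case is easy to see in a stand-alone sketch.
The name |rhs_is_nulling| and the lookup table |is_nulling|,
indexed by symbol ID, are hypothetical;
they stand in for the rule and symbol structures.

```c
#include <assert.h>

/* |is_nulling[s]| is nonzero if symbol |s| is nulling.
   Returns 1 if every symbol of the RHS is nulling, 0 otherwise. */
int rhs_is_nulling(const int *rhs, int length,
                   const unsigned char *is_nulling)
{
    int ix;
    for (ix = 0; ix < length; ix++) {
        if (!is_nulling[rhs[ix]]) return 0;
    }
    return 1; /* also reached when |length| is 0 */
}
```

With a length of zero the loop body never runs, so the empty rule is
reported as nulling.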
2127
2128@*0 Is Rule Used?.
2129@d RULE_is_Used(rule) ((rule)->t_is_used)
2130@<Bit aligned rule elements@> = guint t_is_used:1;
2131@ @<Initialize rule elements@> =
2132RULE_is_Used(rule) = 1;
2133@ This is the external accessor.
2134The internal accessor would be trivial, so there is none.
2135@<Function definitions@> =
2136gint marpa_rule_is_used(struct marpa_g* g, Marpa_Rule_ID rule_id)
2137{
2138    @<Return |-2| on failure@>@;
2139    @<Fail if grammar |rule_id| is invalid@>@;
2140return RULE_is_Used(RULE_by_ID(g, rule_id)); }
2141@ @<Public function prototypes@> =
2142gint marpa_rule_is_used(struct marpa_g* g, Marpa_Rule_ID rule_id);
2143
2144@*0 Is This a Start Rule?.
2145@d RULE_is_Start(rule) ((rule)->t_is_start)
2146@<Bit aligned rule elements@> = guint t_is_start:1;
2147@ @<Initialize rule elements@> =
2148rule->t_is_start = FALSE;
2149@ This is the external accessor.
2150The internal accessor would be trivial, so there is none.
2151@<Function definitions@> =
2152gint marpa_rule_is_start(struct marpa_g* g, Marpa_Rule_ID rule_id)
2153{
2154    @<Return |-2| on failure@>@;
2155    @<Fail if grammar |rule_id| is invalid@>@;
2156return RULE_by_ID(g, rule_id)->t_is_start; }
2157@ @<Public function prototypes@> =
2158gint marpa_rule_is_start(struct marpa_g* g, Marpa_Rule_ID rule_id);
2159
2160@*0 Rule Boolean: Virtual LHS.
2161This is for Marpa's ``internal semantics".
2162When Marpa rewrites rules, it does so in a way invisible to
2163the user's semantics.
2164It does this by marking rules so that it can reassemble
2165the results of rewritten rules to appear ``as if"
2166they were the result of evaluating the original,
2167un-rewritten rule.
2168\par
2169All Marpa's rewrites allow the rewritten rules to be
2170``dummied up" to look like the originals.
2171That this must be possible for any rewrite was one of
2172Marpa's design criteria.
It was an especially non-negotiable criterion, because
2174almost the only reason for parsing a grammar is to apply the
2175semantics specified for the original grammar.
2176@d RULE_is_Virtual_LHS(rule) ((rule)->t_is_virtual_lhs)
2177@<Bit aligned rule elements@> = guint t_is_virtual_lhs:1;
2178@ @<Initialize rule elements@> =
2179RULE_is_Virtual_LHS(rule) = FALSE;
2180@ The internal accessor would be trivial, so there is none.
2181@<Function definitions@> =
2182gboolean marpa_rule_is_virtual_lhs(struct marpa_g* g, Marpa_Rule_ID rule_id)
2183{
2184@<Return |-2| on failure@>@;
2185@<Fail if grammar |rule_id| is invalid@>@;
2186return RULE_is_Virtual_LHS(RULE_by_ID(g, rule_id)); }
2187@ @<Public function prototypes@> =
2188gboolean marpa_rule_is_virtual_lhs(struct marpa_g* g, Marpa_Rule_ID rule_id);
2189
2190@*0 Rule Boolean: Virtual RHS.
2191@d RULE_is_Virtual_RHS(rule) ((rule)->t_is_virtual_rhs)
2192@<Bit aligned rule elements@> = guint t_is_virtual_rhs:1;
2193@ @<Initialize rule elements@> =
2194RULE_is_Virtual_RHS(rule) = FALSE;
2195@ The internal accessor would be trivial, so there is none.
2196@<Function definitions@> =
2197gboolean marpa_rule_is_virtual_rhs(struct marpa_g* g, Marpa_Rule_ID rule_id)
2198{
2199@<Return |-2| on failure@>@;
2200@<Fail if grammar |rule_id| is invalid@>@;
2201return RULE_is_Virtual_RHS(RULE_by_ID(g, rule_id)); }
2202@ @<Public function prototypes@> =
2203gboolean marpa_rule_is_virtual_rhs(struct marpa_g* g, Marpa_Rule_ID rule_id);
2204
2205@*0 Virtual Start Position.
2206For a virtual rule,
2207this is the RHS position in the original rule
2208where this one starts.
2209@<Int aligned rule elements@> = gint t_virtual_start;
2210@ @<Initialize rule elements@> = rule->t_virtual_start = -1;
2211@ @<Function definitions@> =
2212guint marpa_virtual_start(struct marpa_g *g, Marpa_Rule_ID rule_id)
2213{
2214@<Return |-2| on failure@>@;
2215@<Fail if grammar |rule_id| is invalid@>@;
2216return RULE_by_ID(g, rule_id)->t_virtual_start;
2217}
2218@ @<Public function prototypes@> =
2219guint marpa_virtual_start(struct marpa_g *g, Marpa_Rule_ID rule_id);
2220
2221@*0 Virtual End Position.
2222For a virtual rule,
2223this is the RHS position in the original rule
2224at which this one ends.
2225@<Int aligned rule elements@> = gint t_virtual_end;
2226@ @<Initialize rule elements@> = rule->t_virtual_end = -1;
2227@ @<Function definitions@> =
2228guint marpa_virtual_end(struct marpa_g *g, Marpa_Rule_ID rule_id)
2229{
2230@<Return |-2| on failure@>@;
2231@<Fail if grammar |rule_id| is invalid@>@;
2232return RULE_by_ID(g, rule_id)->t_virtual_end;
2233}
2234@ @<Public function prototypes@> =
2235guint marpa_virtual_end(struct marpa_g *g, Marpa_Rule_ID rule_id);
2236
2237@*0 Rule Callbacks.
2238The user can define a callback
2239(with argument) which is invoked whenever a rule
2240is created.
2241@ Function pointer declarations are
2242hard to type and impossible to read.
2243This typedef localizes the damage.
2244@<Callback typedefs@> =
2245typedef void (Marpa_Rule_Callback)(struct marpa_g *g, Marpa_Rule_ID id);
2246@ @<Widely aligned grammar elements@> =
2247    Marpa_Rule_Callback* t_rule_callback;
2248    gpointer t_rule_callback_arg;
2249@ @<Initialize grammar elements@> =
2250g->t_rule_callback_arg = NULL;
2251g->t_rule_callback = NULL;
2252@ @<Function definitions@> =
2253void marpa_rule_callback_set(struct marpa_g *g, Marpa_Rule_Callback*cb)
2254{ g->t_rule_callback = cb; }
2255@ @<Public function prototypes@> =
2256void marpa_rule_callback_set(struct marpa_g *g, Marpa_Rule_Callback*cb);
2257@ @<Function definitions@> =
2258void marpa_rule_callback_arg_set(struct marpa_g *g, gpointer cb_arg)
2259{ g->t_rule_callback_arg = cb_arg; }
2260@ @<Public function prototypes@> =
2261void marpa_rule_callback_arg_set(struct marpa_g *g, gpointer cb_arg);
2262@ @<Function definitions@> =
2263gpointer marpa_rule_callback_arg(struct marpa_g *g)
2264{ return g->t_rule_callback_arg; }
2265@ @<Public function prototypes@> =
2266gpointer marpa_rule_callback_arg(struct marpa_g *g);
2267@ Do the rule callback.
2268@<Private function prototypes@> =
2269static inline void rule_callback(struct marpa_g *g, Marpa_Rule_ID id);
2270@ {\bf To Do}: @^To Do@>
Look at the possibility of leaking memory if the callback
2272never returns, but the grammar is destroyed.
2273@<Function definitions@> =
2274static inline void rule_callback(struct marpa_g *g, Marpa_Rule_ID id)
2275{ Marpa_Rule_Callback* cb = g->t_rule_callback;
2276if (cb) { (*cb)(g, id); } }
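@ The callback pattern can be sketched without the rest of the grammar.
All the names below are hypothetical;
|struct toy_grammar| holds only the callback and its argument.
The callback receives the grammar and the rule ID,
and fetches its own argument from the grammar.

```c
#include <assert.h>
#include <stddef.h>

struct toy_grammar;
typedef void (toy_rule_callback)(struct toy_grammar *g, int rule_id);

struct toy_grammar {
    toy_rule_callback *callback;
    void *callback_arg;
};

/* Invoke the callback, if one has been set. */
void toy_fire_rule_callback(struct toy_grammar *g, int rule_id)
{
    if (g->callback) (*g->callback)(g, rule_id);
}

/* Example callback: count rules in the int that the argument points to. */
void toy_count_rules(struct toy_grammar *g, int rule_id)
{
    int *counter = g->callback_arg;
    (void)rule_id; /* not needed for counting */
    if (counter) (*counter)++;
}
```

A grammar with no callback set simply skips the call.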
2277
2278@*0 Rule Original.
2279In many cases, Marpa will rewrite a rule.
2280If this rule is the result of a rewriting, this element contains
2281the ID of the original rule.
2282@ @<Int aligned rule elements@> = Marpa_Rule_ID t_original;
2283@ @<Initialize rule elements@> = rule->t_original = -1;
2284@ @<Function definitions@> =
2285Marpa_Rule_ID marpa_rule_original(struct marpa_g *g, Marpa_Rule_ID rule_id)
2286{
2287@<Return |-2| on failure@>@;
2288@<Fail if grammar |rule_id| is invalid@>@;
2289return RULE_by_ID(g, rule_id)->t_original;
2290}
2291@ @<Public function prototypes@> =
2292Marpa_Rule_ID marpa_rule_original(struct marpa_g *g, Marpa_Rule_ID rule_id);
2293
2294@*0 Rule Real Symbol Count.
2295This is another data element used for the ``internal semantics" --
2296the logic to reassemble results of rewritten rules so that they
2297look as if they came from the original, un-rewritten rules.
2298The value of this field is meaningful if and only if
2299the rule has a virtual rhs or a virtual lhs.
2300@d Real_SYM_Count_of_RULE(rule) ((rule)->t_real_symbol_count)
2301@ @<Int aligned rule elements@> = gint t_real_symbol_count;
2302@ @<Initialize rule elements@> = Real_SYM_Count_of_RULE(rule) = 0;
2303@ @<Public function prototypes@> =
2304gint marpa_real_symbol_count(struct marpa_g *g, Marpa_Rule_ID rule_id);
2305@ @<Function definitions@> =
2306gint marpa_real_symbol_count(struct marpa_g *g, Marpa_Rule_ID rule_id)
2307{
2308@<Return |-2| on failure@>@;
2309@<Fail if grammar |rule_id| is invalid@>@;
2310return Real_SYM_Count_of_RULE(RULE_by_ID(g, rule_id));
2311}
2312
2313@*0 Semantic Equivalents.
2314@<Bit aligned rule elements@> = guint t_is_semantic_equivalent:1;
2315@ @<Initialize rule elements@> =
2316rule->t_is_semantic_equivalent = FALSE;
2317@ Semantic equivalence arises out of Marpa's rewritings.
2318When a rule is rewritten,
2319some (but not all!) of the resulting rules have the
2320same semantics as the original rule.
It is this ``original rule" that |marpa_rule_semantic_equivalent()| returns.
2322
@ If this rule is the semantic equivalent of another rule,
this external accessor returns the ``original rule".
If the rule has a virtual LHS, it returns -1.
Otherwise the rule is its own semantic equivalent,
and its own ID is returned.
2326@<Public function prototypes@> =
2327Marpa_Rule_ID marpa_rule_semantic_equivalent(struct marpa_g* g, Marpa_Rule_ID id);
2328@ @<Function definitions@> =
2329Marpa_Rule_ID
2330marpa_rule_semantic_equivalent (struct marpa_g *g, Marpa_Rule_ID rule_id)
2331{
2332  RULE rule;
2333@<Return |-2| on failure@>@;
2334@<Fail if grammar |rule_id| is invalid@>@;
2335  rule = RULE_by_ID (g, rule_id);
2336  if (RULE_is_Virtual_LHS(rule)) return -1;
2337  if (rule->t_is_semantic_equivalent) return rule->t_original;
2338  return rule_id;
2339}
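@ The three outcomes of the accessor can be summarized in a sketch.
The structure |struct toy_sem_rule| and the function
|toy_semantic_equivalent| are hypothetical;
they keep only the fields that the decision depends on
and omit the validity check on the rule ID.

```c
#include <assert.h>

struct toy_sem_rule {
    int id;                      /* this rule's own ID */
    int original;                /* ID of the rule it was rewritten from */
    int is_virtual_lhs;          /* nonzero if the LHS is internal */
    int is_semantic_equivalent;  /* nonzero if it shares the original's semantics */
};

int toy_semantic_equivalent(const struct toy_sem_rule *rule)
{
    if (rule->is_virtual_lhs) return -1;         /* no user-visible semantics */
    if (rule->is_semantic_equivalent) return rule->original;
    return rule->id;                             /* the rule stands for itself */
}
```

Note that the virtual LHS test comes first,
so it wins even if the semantic equivalent bit is also set.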
2340
2341@** Symbol Instance (SYMI) Code.
2342@<Private typedefs@> = typedef gint SYMI;
2343@ @d SYMI_Count_of_G(g) ((g)->t_symbol_instance_count)
2344@<Int aligned grammar elements@> =
2345gint t_symbol_instance_count;
2346@ |SYMI_of_Completed_RULE| assumes that the rule is
2347not zero length.
|Last_Proper_SYMI_of_RULE| will be -1 if the
2349rule has no proper symbols.
2350@d SYMI_of_RULE(rule) ((rule)->t_symbol_instance_base)
2351@d Last_Proper_SYMI_of_RULE(rule) ((rule)->t_last_proper_symi)
2352@d SYMI_of_Completed_RULE(rule)
2353    (SYMI_of_RULE(rule) + Length_of_RULE(rule)-1)
2354@d SYMI_of_AIM(aim) (symbol_instance_of_ahfa_item_get(aim))
2355@<Int aligned rule elements@> =
2356gint t_symbol_instance_base;
2357gint t_last_proper_symi;
2358@ @<Initialize rule elements@> =
2359Last_Proper_SYMI_of_RULE(rule) = -1;
2360@ @<Private function prototypes@> =
2361static inline gint symbol_instance_of_ahfa_item_get(AIM aim);
2362@ Symbol instances are for the {\bf predot} symbol.
2363In parsing the emphasis is on what is to come ---
2364on what follows the dot.
2365Symbol instances are used in evaluation.
2366In evaluation we are looking at what we have,
2367so the emphasis is on what precedes the dot position.
2368@ The symbol instance of a prediction is $-1$.
2369If the AHFA item is not a prediction, then it has a preceding
2370AHFA item for the same rule.
2371In that case the symbol instance is the
2372base symbol instance for
2373the rule, offset by the position of that preceding AHFA item.
2374@<Function definitions@> =
2375static inline gint
2376symbol_instance_of_ahfa_item_get (AIM aim)
2377{
2378  gint position = Position_of_AIM (aim);
2379  const gint null_count = Null_Count_of_AIM(aim);
2380  if (position < 0 || position - null_count > 0) {
      /* If this AHFA item is not a prediction */
2382      const RULE rule = RULE_of_AIM (aim);
2383      position = Position_of_AIM(aim-1);
2384      return SYMI_of_RULE(rule) + position;
2385  }
2386  return -1;
2387}
2388
2389@** Precomputing the Grammar.
2390Marpa's logic divides roughly into three pieces -- grammar precomputation,
2391the actual parsing of input tokens,
2392and semantic evaluation.
Precomputing the grammar is complex enough to divide into several
2394stages of its own, which are
2395covered in the next few
2396sections.
2397This section describes the top-level method for precomputation,
2398which is external.
2399
2400@<Function definitions@> =
2401struct marpa_g* marpa_precompute(struct marpa_g* g)
2402{
2403     if (!census(g)) return NULL;
2404     if (!CHAF_rewrite(g)) return NULL;
2405     if (!g_augment(g)) return NULL;
2406    loop_detect(g);
2407    create_AHFA_items(g);
2408    create_AHFA_states(g);
2409    @<Populate the Terminal Boolean Vector@>@;
2410     return g;
2411}
2412@ @<Public function prototypes@> =
2413struct marpa_g* marpa_precompute(struct marpa_g* g);
2414
2415@** The Grammar Census.
2416
@*0 Implementation: Inaccessible and Unproductive Rules.
2418The textbooks say that,
2419in order to automatically {\bf eliminate} inaccessible and unproductive
2420productions from a grammar, you have to first eliminate the
2421unproductive productions, {\bf then} the inaccessible ones.
2422
2423In practice, this advice does not seem very helpful.
2424Imagine the (quite possible) case
2425of an unproductive start symbol.
2426Following the
2427correct procedure for automatically cleaning the grammar, I would
2428have to regard the start symbol and its productions as eliminated
2429and therefore go on to report every other production and symbol as
inaccessible.  Almost certainly all these inaccessibility reports,
while theoretically correct, would be irrelevant.
What the user probably wants
is to make the start symbol productive.
2434
In |libmarpa|,
inaccessibility is determined based on the assumption that
unproductive symbols will be made productive somehow,
and not eliminated.
The downside of this choice is that, in a few uncommon cases,
a user relying entirely
on the Marpa::XS warnings to clean up a grammar will have to go through
more than a single pass of the diagnostics.
2443(As of this writing, I personally have yet to encounter such a case.)
2444The upside is that in the more frequent cases, the user is spared
2445a lot of useless diagnostics.
2446
2447@<Function definitions@> =
2448static struct marpa_g* census(struct marpa_g* g)
2449{
2450    @<Return |NULL| on failure@>@;
2451    @<Declare census variables@>@;
2452    @<Return |NULL| if  empty grammar@>@;
2453    @<Return |NULL| if already precomputed@>@;
2454    @<Return |NULL| if bad start symbol@>@;
2455    @<Census LHS symbols@>@;
2456    @<Census terminals@>@;
2457    if (have_marked_terminals) {
2458	@<Fatal if LHS terminal when not allowed@>@;
2459    } else {
2460	@<Fatal if empty rule and unmarked terminals@>;
2461	if (g->t_is_lhs_terminal_ok) {
2462	    @<Mark all symbols terminal@>@;
2463	} else {
2464	    @<Mark non-LHS symbols terminal@>@;
2465	}
2466    }
2467    @<Census nullable symbols@>@;
2468    @<Census productive symbols@>@;
2469    @<Check that start symbol is productive@>@;
2470    @<Calculate reach matrix@>@;
2471    @<Census accessible symbols@>@;
2472    @<Census nulling symbols@>@;
2473    @<Free Boolean vectors@>@;
2474    @<Free Boolean matrixes@>@;
2475    g->t_is_precomputed = TRUE;
2476    return g;
2477}
2478@ @<Private function prototypes@> =
2479static struct marpa_g* census(struct marpa_g* g);
2480@ @<Declare census variables@> =
2481guint pre_rewrite_rule_count = g->t_rules->len;
2482guint pre_rewrite_symbol_count = g->t_symbols->len;
2483
2484@ @<Return |NULL| if empty grammar@> =
2485if (g->t_rules->len <= 0) { g->t_error = "no rules"; return NULL; }
@ The upper layers have a lot of latitude in handling an attempt
to precompute a grammar which is already precomputed.
No harm is done, so the upper layers can simply ignore the error.
On the other hand, the upper layer may see this as a sign of a major
logic error, and treat it as a fatal error.
Anything in between these two extremes is also possible.
2491@<Return |NULL| if already precomputed@> =
2492if (G_is_Precomputed(g)) { g->t_error = "precomputed"; return NULL; }
@ Loop over the rules, producing a bit vector of LHS symbols, and
a bit vector of symbols which are the LHS of empty rules.
While at it, set a flag to indicate if there are empty rules.
2496
2497@ @<Return |NULL| if bad start symbol@> =
2498if (original_start_symid < 0) {
2499    g_context_clear(g);
2500    g->t_error = "no start symbol";
2501    return failure_indicator;
2502}
2503if (!symbol_is_valid(g, original_start_symid)) {
2504    g_context_clear(g);
2505    g_context_int_add(g, "symid", original_start_symid);
2506    g->t_error = "invalid start symbol";
2507    return failure_indicator;
2508}
2509original_start_symbol = SYM_by_ID(original_start_symid);
2510if (original_start_symbol->t_lhs->len <= 0) {
2511    g_context_clear(g);
2512    g_context_int_add(g, "symid", original_start_symid);
2513    g->t_error = "start symbol not on LHS";
2514    return failure_indicator;
2515}
2516
2517@ @<Declare census variables@> =
2518Marpa_Symbol_ID original_start_symid = g->t_start_symid;
2519SYM original_start_symbol;
2520
2521@ @<Census LHS symbols@> =
2522{ Marpa_Rule_ID rule_id;
2523lhs_v = bv_create(pre_rewrite_symbol_count);
2524empty_lhs_v = bv_shadow(lhs_v);
2525for (rule_id = 0;
2526	rule_id < (Marpa_Rule_ID)pre_rewrite_rule_count;
2527	rule_id++) {
2528    RULE  rule = RULE_by_ID(g, rule_id);
2529    Marpa_Symbol_ID lhs_id = LHS_ID_of_RULE(rule);
2530    bv_bit_set(lhs_v, (guint)lhs_id);
2531    if (Length_of_RULE(rule) <= 0) {
2532	bv_bit_set(empty_lhs_v, (guint)lhs_id);
2533	have_empty_rule = 1;
2534    }
2535}
2536}
2537@ Loop over the symbols, producing the boolean vector of symbols
2538already marked as terminal,
2539and a flag which indicates if there are any.
2540@<Census terminals@> =
2541{ Marpa_Symbol_ID symid;
2542terminal_v = bv_create(pre_rewrite_symbol_count);
2543for (symid = 0;
2544	symid < (Marpa_Symbol_ID)pre_rewrite_symbol_count;
2545	symid++) {
2546    SYM symbol = SYM_by_ID(symid);
2547    if (SYM_is_Terminal(symbol)) {
2548	bv_bit_set(terminal_v, (guint)symid);
2549	have_marked_terminals = 1;
2550    }
2551} }
2552@ @<Free Boolean vectors@> =
2553bv_free(terminal_v);
2554@
2555@s Bit_Vector int
2556@<Declare census variables@> =
2557Bit_Vector terminal_v;
2558gboolean have_marked_terminals = 0;
2559
2560@ @<Fatal if empty rule and unmarked terminals@> =
2561if (have_empty_rule && g->t_is_lhs_terminal_ok) {
2562     g->t_error = "empty rule and unmarked terminals";
2563    return NULL;
2564}
2565@ Any optimization should be for the non-error case, in which
2566there are no LHS terminals, and the entire list of symbols must
2567be scanned to discover this.
2568It is faster to stop scanning symbols on the first error, if there is
2569an error, but when that happens it is a fatal error,
2570and for that, this code is already plenty fast enough.
2571@<Fatal if LHS terminal when not allowed@> =
2572if (!g->t_is_lhs_terminal_ok) {
2573    gboolean have_bad_lhs = 0;
2574    guint start = 0;
2575    guint min, max;
2576    Bit_Vector bad_lhs_v = bv_clone(terminal_v);
2577    bv_and(bad_lhs_v, bad_lhs_v, lhs_v);
2578    while ( bv_scan(bad_lhs_v, start, &min, &max) ) {
2579	Marpa_Symbol_ID i;
2580	for (i = (Marpa_Symbol_ID)min; i <= (Marpa_Symbol_ID)max; i++) {
2581	    g_context_clear(g);
2582	    g_context_int_add(g, "symid", i);
2583	    grammar_message(g, "lhs is terminal");
2584	}
2585        start = max+2;
2586	have_bad_lhs = 1;
2587    }
2588    bv_free(bad_lhs_v);
2589    if (have_bad_lhs) {
2590        g->t_error = "lhs is terminal";
2591	return NULL;
2592    }
2593}
2594
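@ The scanning loops in the census advance with |start = max+2|
rather than |max+1|.
Here is a minimal sketch of why that step is safe, assuming that |bv_scan|
reports a maximal run of set bits as an inclusive interval
(its definition is not shown in this part of the code).
Under that assumption, the bit just past a run is known to be clear,
so the next run cannot begin before |max+2|.
The names |run_scan| and |count_by_runs| are hypothetical,
and the ``bit vector'' is a plain byte array, not Marpa's |Bit_Vector|.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bit vector: one byte per bit.
   Find the first maximal run of set bits at or after start.
   Returns 1 and stores the inclusive bounds in *min and *max,
   or returns 0 if no set bit remains. */
static int run_scan(const unsigned char *bits, size_t n, size_t start,
                    size_t *min, size_t *max)
{
    size_t i = start;
    assert(bits != NULL && min != NULL && max != NULL);
    while (i < n && !bits[i]) i++;          /* skip clear bits */
    if (i >= n) return 0;
    *min = i;
    while (i + 1 < n && bits[i + 1]) i++;   /* extend the run */
    *max = i;
    return 1;
}

/* Count set bits by walking runs, using the same max+2 step
   as the census loops. */
static size_t count_by_runs(const unsigned char *bits, size_t n)
{
    size_t start, min, max, count = 0;
    for (start = 0; run_scan(bits, n, start, &min, &max); start = max + 2)
        count += max - min + 1;
    return count;
}
```

Stepping by |max+1| would also be correct in this sketch;
|max+2| merely skips a bit whose value is already known.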
2595@ @<Mark all symbols terminal@> =
2596{ Marpa_Symbol_ID symid;
2597bv_fill(terminal_v);
2598for (symid = 0; symid < (Marpa_Symbol_ID)g->t_symbols->len; symid++)
2599{ SYMID_is_Terminal(symid) = 1; } }
2600@ @<Mark non-LHS symbols terminal@> =
2601{ guint start = 0;
2602guint min, max;
2603bv_not(terminal_v, lhs_v);
2604while ( bv_scan(terminal_v, start, &min, &max) ) {
2605    Marpa_Symbol_ID symid;
2606    for (symid = (Marpa_Symbol_ID)min; symid <= (Marpa_Symbol_ID)max; symid++) {
2607     SYMID_is_Terminal(symid) = 1;
2608    }
2609    start = max+2;
2610}
2611}
2612@ @<Free Boolean vectors@> =
2613bv_free(lhs_v);
2614bv_free(empty_lhs_v);
2615@ @<Declare census variables@> =
2616Bit_Vector lhs_v;
2617Bit_Vector empty_lhs_v;
2618gboolean have_empty_rule = 0;
2619
2620@ @<Census nullable symbols@> =
2621nullable_v = bv_clone(empty_lhs_v);
2622rhs_closure(g, nullable_v);
2623{ guint min, max, start;
2624Marpa_Symbol_ID symid;
2625gint counted_nullables = 0;
2626    for ( start = 0; bv_scan(nullable_v, start, &min, &max); start = max+2 ) {
2627	for (symid = (Marpa_Symbol_ID)min; symid <= (Marpa_Symbol_ID)max; symid++) {
2628	    SYM symbol = SYM_by_ID(symid);
2629	    if (symbol->t_is_counted) {
2630		g_context_clear(g);
2631		g_context_int_add(g, "symid", symid);
2632		grammar_message(g, "counted nullable");
2633		counted_nullables++;
2634	    }
2635	    symbol->t_is_nullable = 1;
2636} }
2637if (counted_nullables) {
2638    g->t_error = "counted nullable";
2639    return NULL;
2640}
2641}
2642@ @<Declare census variables@> =
2643Bit_Vector nullable_v;
2644@ @<Free Boolean vectors@> =
2645bv_free(nullable_v);
2646
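@ The nullable census seeds a vector with the LHS symbols of the
empty rules and then calls |rhs_closure|.
The sketch below shows one way such a closure could work,
assuming that it adds a rule's LHS to the set whenever every RHS symbol
is already in the set, and repeats until nothing changes.
|rhs_closure| itself is defined elsewhere and may be implemented
quite differently;
|toy_rule| and |toy_rhs_closure| are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RHS 4

/* Hypothetical toy rule: lhs ::= rhs[0] ... rhs[length-1] */
struct toy_rule {
    int lhs;
    int length;
    int rhs[MAX_RHS];
};

/* Close in_set under the rules: whenever every RHS symbol of a rule
   is in the set, add the rule's LHS.  Repeat until nothing changes. */
static void toy_rhs_closure(const struct toy_rule *rules, size_t rule_count,
                            unsigned char *in_set)
{
    int changed = 1;
    assert(rules != NULL && in_set != NULL);
    while (changed) {
        size_t r;
        changed = 0;
        for (r = 0; r < rule_count; r++) {
            int ix, all_in = 1;
            if (in_set[rules[r].lhs]) continue;   /* already a member */
            for (ix = 0; ix < rules[r].length; ix++) {
                if (!in_set[rules[r].rhs[ix]]) { all_in = 0; break; }
            }
            if (all_in) { in_set[rules[r].lhs] = 1; changed = 1; }
        }
    }
}
```

Seeding the same routine with the nullable and the terminal symbols together
yields the productive symbols, which mirrors the way the productive
census reuses |rhs_closure|.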
2647@ @<Census productive symbols@> =
2648productive_v = bv_shadow(nullable_v);
2649bv_or(productive_v, nullable_v, terminal_v);
2650rhs_closure(g, productive_v);
2651{ guint min, max, start;
2652Marpa_Symbol_ID symid;
2653    for ( start = 0; bv_scan(productive_v, start, &min, &max); start = max+2 ) {
2654	for (symid = (Marpa_Symbol_ID)min;
2655		symid <= (Marpa_Symbol_ID)max;
2656		symid++) {
2657	    SYM symbol = SYM_by_ID(symid);
2658	    symbol->t_is_productive = 1;
2659} }
2660}
2661@ @<Check that start symbol is productive@> =
2662if (!bv_bit_test(productive_v, (guint)g->t_start_symid))
2663{
2664    g_context_int_add(g, "symid", g->t_start_symid);
2665    g->t_error = "unproductive start symbol";
2666    return NULL;
2667}
2668@ @<Declare census variables@> =
2669Bit_Vector productive_v;
2670@ @<Free Boolean vectors@> =
2671bv_free(productive_v);
2672
@ The reach matrix is an $n\times n$ matrix,
where $n$ is the number of symbols.
Bit $(i,j)$ is set in the reach matrix if and only if
symbol $i$ can reach symbol $j$.
\par
This logic could be put earlier, and a child array
for each rule could be efficiently calculated during
the initialization for the calculation of the reach
matrix.
A rule-child array is a list of the rule's RHS symbols,
in sequence and without duplicates.
There are places where traversing a rule-child array,
instead of the RHS, would be more efficient.
At this point,
however, it is not clear whether use of a rule-child array
would be a pointless or even counter-productive optimization.
It would only make a difference in grammars
where many of the right hand sides repeat symbols.
2691@<Calculate reach matrix@> =
2692reach_matrix
2693    = matrix_create(pre_rewrite_symbol_count, pre_rewrite_symbol_count);
2694{ guint symid, no_of_symbols = SYM_Count_of_G(g);
2695for (symid = 0; symid < no_of_symbols; symid++) {
2696     matrix_bit_set(reach_matrix, symid, symid);
2697} }
2698{ Marpa_Rule_ID rule_id;
2699guint no_of_rules = RULE_Count_of_G(g);
2700for (rule_id = 0; rule_id < (Marpa_Rule_ID)no_of_rules; rule_id++) {
2701     RULE  rule = RULE_by_ID(g, rule_id);
2702     Marpa_Symbol_ID lhs_id = LHS_ID_of_RULE(rule);
2703     guint rhs_ix, rule_length = Length_of_RULE(rule);
2704     for (rhs_ix = 0; rhs_ix < rule_length; rhs_ix++) {
2705	 matrix_bit_set(reach_matrix,
2706	     (guint)lhs_id, (guint)RHS_ID_of_RULE(rule, rhs_ix));
2707} } }
2708transitive_closure(reach_matrix);
2709@ @<Declare census variables@> = Bit_Matrix reach_matrix;
2710@ @<Free Boolean matrixes@> =
2711matrix_free(reach_matrix);
2712
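@ As an illustration of the reach matrix, here is a minimal sketch
on a small byte matrix.
It uses Warshall's algorithm, chosen here only for brevity:
the |transitive_closure| function called above is not shown in this
part of the code and need not use this algorithm.
All the names in the sketch are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define N_SYMS 4

/* Clear the matrix, then make every symbol reach itself. */
static void toy_reach_init(unsigned char m[N_SYMS][N_SYMS])
{
    int i;
    memset(m, 0, (size_t)N_SYMS * N_SYMS);
    for (i = 0; i < N_SYMS; i++) m[i][i] = 1;
}

/* Record that lhs directly reaches rhs, as for one RHS symbol of a rule. */
static void toy_reach_edge(unsigned char m[N_SYMS][N_SYMS], int lhs, int rhs)
{
    assert(0 <= lhs && lhs < N_SYMS);
    assert(0 <= rhs && rhs < N_SYMS);
    m[lhs][rhs] = 1;
}

/* Warshall's algorithm: if i reaches k and k reaches j, then i reaches j. */
static void toy_transitive_closure(unsigned char m[N_SYMS][N_SYMS])
{
    int i, j, k;
    for (k = 0; k < N_SYMS; k++)
        for (i = 0; i < N_SYMS; i++)
            if (m[i][k])
                for (j = 0; j < N_SYMS; j++)
                    if (m[k][j]) m[i][j] = 1;
}
```

The row of such a matrix that belongs to the start symbol is what the
accessibility census reads.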
2713@ @<Census accessible symbols@> =
2714accessible_v = matrix_row(reach_matrix, (guint)original_start_symid);
2715{ guint min, max, start;
2716Marpa_Symbol_ID symid;
2717    for ( start = 0; bv_scan(accessible_v, start, &min, &max); start = max+2 ) {
2718	for (symid = (Marpa_Symbol_ID)min;
2719		symid <= (Marpa_Symbol_ID)max;
2720		symid++) {
2721	    SYM symbol = SYM_by_ID(symid);
2722	    symbol->t_is_accessible = 1;
2723} }
2724}
2725@ |accessible_v| is a pointer into the |reach_matrix|.
2726Therefore there is no code to free it.
2727@<Declare census variables@> =
2728Bit_Vector accessible_v;
2729
2730@ A symbol is nulling if and only if it is a productive symbol which does not
2731reach a terminal symbol.
2732@<Census nulling symbols@> =
2733{
2734  Bit_Vector reaches_terminal_v = bv_shadow (terminal_v);
2735  guint min, max, start;
2736  for (start = 0; bv_scan (productive_v, start, &min, &max); start = max + 2)
2737    {
2738      Marpa_Symbol_ID productive_id;
2739      for (productive_id = (Marpa_Symbol_ID) min;
2740	   productive_id <= (Marpa_Symbol_ID) max; productive_id++)
2741	{
2742	  bv_and (reaches_terminal_v, terminal_v,
2743		  matrix_row (reach_matrix, (guint) productive_id));
2744	  if (bv_is_empty (reaches_terminal_v))
2745	    SYM_is_Nulling(SYM_by_ID (productive_id)) = 1;
2746	}
2747    }
2748  bv_free (reaches_terminal_v);
2749}
2750
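@ The nulling test can be restated on plain arrays.
This is a minimal sketch, assuming a reflexive reach matrix stored
row by row in a flat byte array.
|toy_census_nulling| and its parameters are hypothetical
and are not part of |libmarpa|.

```c
#include <assert.h>
#include <stddef.h>

/* Mark symbol i nulling if it is productive and its reach row shares
   no symbol with the terminal set.  reach is n*n, row-major. */
static void toy_census_nulling(const unsigned char *reach,
                               const unsigned char *productive,
                               const unsigned char *terminal,
                               unsigned char *nulling, size_t n)
{
    size_t i, j;
    assert(reach != NULL && productive != NULL);
    assert(terminal != NULL && nulling != NULL);
    for (i = 0; i < n; i++) {
        int reaches_terminal = 0;
        for (j = 0; j < n; j++) {
            if (reach[i * n + j] && terminal[j]) {
                reaches_terminal = 1;
                break;
            }
        }
        nulling[i] = (unsigned char)(productive[i] && !reaches_terminal);
    }
}
```

Because the reach matrix has its diagonal set, a terminal reaches itself
and is therefore never marked nulling.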
2751@** The CHAF Rewrite.
2752
2753Nullable symbols have been a difficulty for Earley implementations
2754since day zero.
2755Aycock and Horspool came up with a solution to this problem,
2756part of which involved rewriting the grammar to eliminate
2757all proper nullables.
2758Marpa's CHAF rewrite is built on the work of Aycock and
2759Horspool.
2760
2761Marpa's CHAF rewrite is one of its two rewrites of the BNF.
2762The other
2763adds a new start symbol to the grammar.
2764
@ The rewrite strategy for Marpa is new to it.
It is an elaboration on the one developed by Aycock and Horspool.
The basic idea behind Aycock and Horspool's NNF was to eliminate
proper nullables by replacing the rules with variants which
used only nulling and non-nulling symbols.
These had to be created for every possible combination
of nulling and non-nulling symbols.
This meant that the number of NNF rules was
potentially exponential
in the length of the rule in the original grammar.
2775
2776@ Marpa's CHAF (Chomsky-Horspool-Aycock Form) eliminates
2777the problem of exponential explosion by first breaking rules
2778up into pieces, each piece containing no more than two proper nullables.
The number of rewritten rules in CHAF is linear in the length of
the original rule.
2781
2782@ The CHAF rewrite affects only rules with proper nullables.
In this context, the proper nullables are called ``factors''.
Each piece of the original rule is rewritten into up to four
``factored pieces''.
2786When there are two proper nullables, the potential CHAF rules
2787are
2788\li The PP rule:  Both factors are replaced with non-nulling symbols.
2789\li The PN rule:  The first factor is replaced with a non-nulling symbol,
2790and the second factor is replaced with a nulling symbol.
2791\li The NP rule: The first factor is replaced with a nulling symbol,
2792and the second factor is replaced with a non-nulling symbol.
2793\li The NN rule: Both factors are replaced with nulling symbols.
2794
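@ The four variants of a two-factor piece can be sketched as follows.
This is an illustration only, not the code Marpa uses.
|toy_alias| is a hypothetical mapping from a proper symbol ID to its
nulling alias (here simply the ID plus 100),
and |toy_factor_piece| is likewise hypothetical.

```c
#include <assert.h>
#include <string.h>

#define PIECE_MAX 8

/* Hypothetical alias map for illustration: the nulling alias of
   symbol ID n is taken to be n + 100. */
static int toy_alias(int proper_id) { return proper_id + 100; }

/* Write the PP, PN, NP and NN variants of a two-factor piece
   into out[0], out[1], out[2] and out[3], in that order.
   first and second are the positions of the two factors. */
static void toy_factor_piece(const int *piece, int length,
                             int first, int second,
                             int out[4][PIECE_MAX])
{
    int v;
    assert(length <= PIECE_MAX);
    assert(0 <= first && first < second && second < length);
    for (v = 0; v < 4; v++) {
        memcpy(out[v], piece, (size_t)length * sizeof(int));
        /* variants 2 and 3 (NP, NN) null the first factor */
        if (v >= 2) out[v][first] = toy_alias(piece[first]);
        /* variants 1 and 3 (PN, NN) null the second factor */
        if (v % 2 == 1) out[v][second] = toy_alias(piece[second]);
    }
}
```

The sketch always produces all four variants.
The real rewrite omits the NN variant whenever it would turn the
piece into a nulling rule.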
2795@ Sometimes the CHAF piece will have only one factor.  A one-factor
2796piece is rewritten into at most two factored pieces:
2797\li The P rule:  The factor is replaced with a non-nulling symbol.
2798\li The N rule:  The factor is replaced with a nulling symbol.
2799
@ In |CHAF_rewrite|, the rule count |no_of_rules| is taken before the loop over
the grammar's rules, even though rules are added in the loop.
This is not an error.
The CHAF rewrite is not recursive -- the new rules it creates
are not themselves subject to CHAF rewrite.
And rule IDs increase by one each time,
so that all the new
rules will have IDs equal to or greater than |no_of_rules|.
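@ The effect of taking the count before the loop can be seen in a
small sketch.
It assumes only what is said above: IDs are assigned in increasing
order, and new rules are appended while the loop runs.
The |toy_rules| structure and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 16

/* Hypothetical rule table: used[id] plays the role of a "used" flag. */
struct toy_rules {
    int used[CAP];
    size_t count;
};

/* Append a new, used rule; its ID is the old count. */
static void toy_append(struct toy_rules *rs)
{
    assert(rs != NULL && rs->count < CAP);
    rs->used[rs->count] = 1;
    rs->count++;
}

/* Rewrite each original rule once: mark it unused and append a
   replacement.  Returns the number of rules visited. */
static size_t toy_rewrite(struct toy_rules *rs)
{
    size_t id, visited = 0;
    const size_t original_count = rs->count;  /* snapshot before the loop */
    for (id = 0; id < original_count; id++) {
        rs->used[id] = 0;
        toy_append(rs);   /* grows rs->count, but not the loop bound */
        visited++;
    }
    return visited;
}
```

Had the loop tested against |rs->count| directly, it would also have visited
the rules it had just added.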
2808@ @<Function definitions@> =
2809static inline struct marpa_g* CHAF_rewrite(struct marpa_g* g)
2810{
2811    @<CHAF rewrite declarations@>@;
2812    @<CHAF rewrite allocations@>@;
2813     @<Alias proper nullables@>@;
2814    no_of_rules = RULE_Count_of_G(g);
2815    for (rule_id = 0; rule_id < no_of_rules; rule_id++) {
2816         RULE  rule = RULE_by_ID(g, rule_id);
2817	 const gint rule_length = Length_of_RULE(rule);
2818	 gint nullable_suffix_ix = 0;
2819	 @<Mark and skip unused rules@>@;
2820	 @<Calculate CHAF rule statistics@>@;
2821	 /* If there is no proper nullable in this rule, I am done */
2822	 if (factor_count <= 0) goto NEXT_RULE;
2823	 @<Factor the rule into CHAF rules@>@;
2824	 NEXT_RULE: ;
2825    }
2826    @<CHAF rewrite deallocations@>@;
2827    return g;
2828}
2829@ @<Private function prototypes@> =
2830static inline struct marpa_g* CHAF_rewrite(struct marpa_g* g);
2831@ @<CHAF rewrite declarations@> =
2832Marpa_Rule_ID rule_id;
2833gint no_of_rules;
2834
2835@ @<Mark and skip unused rules@> =
2836if (!RULE_is_Used(rule)) { goto NEXT_RULE; }
2837if (rule_is_nulling(g, rule)) { RULE_is_Used(rule) = 0; goto NEXT_RULE; }
2838if (!rule_is_accessible(g, rule)) { RULE_is_Used(rule) = 0; goto NEXT_RULE; }
2839if (!rule_is_productive(g, rule)) { RULE_is_Used(rule) = 0; goto NEXT_RULE; }
2840
2841@ For every accessible and productive proper nullable which
2842is not already aliased, alias it.
2843@<Alias proper nullables@> =
2844{ gint no_of_symbols = SYM_Count_of_G(g);
2845Marpa_Symbol_ID symid;
2846for (symid = 0; symid < no_of_symbols; symid++) {
2847     SYM symbol = SYM_by_ID(symid);
2848     SYM alias;
2849     if (!symbol->t_is_nullable) continue;
2850     if (SYM_is_Nulling(symbol)) continue;
2851     if (!symbol->t_is_accessible) continue;
2852     if (!symbol->t_is_productive) continue;
2853     if (symbol_null_alias(symbol)) continue;
2854    alias = symbol_alias_create(g, symbol);
2855    symbol_callback(g, ID_of_SYM(alias));
2856} }
2857
2858@*0 Compute Statistics Needed to Rewrite the Rule.
The term
``factor'' is used to mean an instance of a proper nullable
symbol on the RHS of a rule.
This comes from the idea that replacing the proper nullables
with proper symbols and nulling symbols ``factors'' pieces
2864of the rule being rewritten (the original rule)
2865into multiple CHAF rules.
2866@<Calculate CHAF rule statistics@> =
2867{ gint rhs_ix;
2868factor_count = 0;
2869for (rhs_ix = 0; rhs_ix < rule_length; rhs_ix++) {
2870     Marpa_Symbol_ID symid = RHS_ID_of_RULE(rule, rhs_ix);
2871     SYM symbol = SYM_by_ID(symid);
2872     if (SYM_is_Nulling(symbol)) continue; /* Do nothing for nulling symbols */
2873     if (symbol_null_alias(symbol)) {
2874     /* If a proper nullable, record its position */
2875	 factor_positions[factor_count++] = rhs_ix;
2876	 continue;
2877    }@#
2878     nullable_suffix_ix = rhs_ix+1;
2879/* If not a nullable symbol, move forward the index
2880 of the nullable suffix location */
2881} }
2882@ @<CHAF rewrite declarations@> =
2883gint factor_count;
2884gint* factor_positions;
2885@ @<CHAF rewrite allocations@> =
2886factor_positions = g_new(gint, g->t_max_rule_length);
2887@ @<CHAF rewrite deallocations@> =
2888g_free(factor_positions);
2889
2890@*0 Divide the Rule into Pieces.
2891@<Factor the rule into CHAF rules@> =
2892RULE_is_Used(rule) = 0; /* Mark the original rule unused */
2893{ gint unprocessed_factor_count; /* The number of proper nullables for which CHAF rules have
2894yet to be written */
2895gint factor_position_ix = 0; /* Current index into the list of factors */
2896Marpa_Symbol_ID current_lhs_id = LHS_ID_of_RULE(rule);
2897gint piece_end, piece_start = 0; /* The positions, in the original rule, where
2898the new (virtual) rule starts and ends */
2899for (unprocessed_factor_count = factor_count - factor_position_ix;
2900unprocessed_factor_count >= 3;
2901unprocessed_factor_count = factor_count - factor_position_ix) {
2902    @<Add non-final CHAF rules@>@;
2903}
2904if (unprocessed_factor_count == 2) {
2905	@<Add final CHAF rules for two factors@>@;
2906} else {
2907	@<Add final CHAF rules for one factor@>@;
2908} }
2909
2910@ @<Create a CHAF virtual symbol@> = {
2911    SYM chaf_virtual_symbol = symbol_new(g);
2912    chaf_virtual_symbol->t_is_accessible = 1;
2913    chaf_virtual_symbol->t_is_productive = 1;
2914    chaf_virtual_symid = ID_of_SYM(chaf_virtual_symbol);
2915    g_context_clear(g);
2916    g_context_int_add(g, "rule_id", rule_id);
2917    g_context_int_add(g, "lhs_id", LHS_ID_of_RULE(rule));
2918    g_context_int_add(g, "virtual_end", (gint)piece_end);
2919    symbol_callback(g, chaf_virtual_symid);
2920}
2921
2922@*0 Temporary buffers for the CHAF right hand sides.
2923Two temporary buffers are used in factoring out CHAF rules.
2924|piece_rhs| is for the normal case, where only the symbols
2925of the current piece are on the RHS.
2926In certain cases, where the remainder of the rule is nulling,
2927further factoring is unnecessary and the CHAF rewrite simply
2928finishes out the rule with nulling symbols.
2929In such cases, the RHS is built in the
2930|remaining_rhs| buffer.
2931@<CHAF rewrite declarations@> =
2932Marpa_Symbol_ID* piece_rhs;
2933Marpa_Symbol_ID* remaining_rhs;
2934@ @<CHAF rewrite allocations@> =
2935piece_rhs = g_new(Marpa_Symbol_ID, g->t_max_rule_length);
2936remaining_rhs = g_new(Marpa_Symbol_ID, g->t_max_rule_length);
2937@ @<CHAF rewrite deallocations@> =
2938g_free(piece_rhs);
2939g_free(remaining_rhs);
2940
2941@*0 Factor A Non-Final Piece.
@ As long as I have 3 or more unprocessed factors, I am working on a non-final
rule.
2944@<Add non-final CHAF rules@> =
2945    Marpa_Symbol_ID chaf_virtual_symid;
2946    gint first_factor_position = factor_positions[factor_position_ix];
2947    gint first_factor_piece_position = first_factor_position - piece_start;
2948    gint second_factor_position = factor_positions[factor_position_ix+1];
2949    if (second_factor_position >= nullable_suffix_ix) {
2950	piece_end = second_factor_position-1;
2951        /* The last factor is in the nullable suffix, so the virtual RHS must be nullable */
2952	@<Create a CHAF virtual symbol@>@;
2953	@<Add CHAF rules for nullable continuation@>@;
2954	factor_position_ix++;
2955    } else {
2956	gint second_factor_piece_position = second_factor_position - piece_start;
2957	piece_end = second_factor_position;
2958	@<Create a CHAF virtual symbol@>@;
2959	@<Add CHAF rules for proper continuation@>@;
2960	factor_position_ix += 2;
2961    }
2962    current_lhs_id = chaf_virtual_symid;
2963    piece_start = piece_end+1;
2964
2965@*0 Add CHAF Rules for Nullable Continuations.
2966For a piece that has a nullable continuation,
2967the virtual RHS counts
2968as one of the two allowed proper nullables.
2969That means the piece must
2970end before the second proper nullable (or factor).
2971@<Add CHAF rules for nullable continuation@> =
2972{
2973    gint remaining_rhs_length, piece_rhs_length;
2974    @<Add PP CHAF rule for nullable continuation@>;
2975    @<Add PN CHAF rule for nullable continuation@>;
2976    @<Add NP CHAF rule for nullable continuation@>;
2977    @<Add NN CHAF rule for nullable continuation@>;
2978}
2979
@ Note that the first part of |remaining_rhs| is exactly the same
as the first part of |piece_rhs|, so I copy it here in preparation
for the PN rule.
2983@<Add PP CHAF rule for nullable continuation@> =
2984{
2985gint real_symbol_count = piece_end - piece_start + 1;
2986for (piece_rhs_length = 0; piece_rhs_length < real_symbol_count; piece_rhs_length++) {
2987   remaining_rhs[piece_rhs_length] =
2988   piece_rhs[piece_rhs_length] = RHS_ID_of_RULE(rule, piece_start+piece_rhs_length);
2989}
2990piece_rhs[piece_rhs_length++] = chaf_virtual_symid;
2991}
2992{ RULE  chaf_rule;
2993    gint real_symbol_count = piece_rhs_length - 1;
2994    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
2995    @<Set CHAF rule flags and call back@>@;
2996}
2997
2998@ @<Add PN CHAF rule for nullable continuation@> =
2999{
3000  gint chaf_rule_length = Length_of_RULE(rule) - piece_start;
3001  for (remaining_rhs_length = piece_rhs_length - 1;
3002       remaining_rhs_length < chaf_rule_length; remaining_rhs_length++)
3003    {
3004      Marpa_Symbol_ID original_id =
3005	RHS_ID_of_RULE (rule, piece_start + remaining_rhs_length);
3006      SYM alias = symbol_null_alias (SYM_by_ID (original_id));
3007      remaining_rhs[remaining_rhs_length] =
3008	alias ? ID_of_SYM (alias) : original_id;
3009    }
3010}
3011{
3012  RULE chaf_rule;
3013  gint real_symbol_count = remaining_rhs_length;
3014  chaf_rule =
3015    rule_start (g, current_lhs_id, remaining_rhs, remaining_rhs_length);
3016  @<Set CHAF rule flags and call back@>@;
3017}
3018
3019@ Note, while I have the nulling alias for the first factor,
3020|remaining_rhs| is altered to be ready for the NN rule.
3021@<Add NP CHAF rule for nullable continuation@> = {
3022    Marpa_Symbol_ID proper_id = RHS_ID_of_RULE(rule, first_factor_position);
3023    SYM alias = symbol_null_alias(SYM_by_ID(proper_id));
3024    remaining_rhs[first_factor_piece_position] =
3025	piece_rhs[first_factor_piece_position] =
3026	ID_of_SYM(alias);
3027}
3028{ RULE  chaf_rule;
3029 gint real_symbol_count = piece_rhs_length-1;
3030    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3031    @<Set CHAF rule flags and call back@>@;
3032}
3033
3034@ If this piece is nullable (|piece_start| at or
3035after |nullable_suffix_ix|), I don't add an NN choice,
3036because nulling both factors makes the entire piece nulling,
3037and nulling rules cannot be fed directly to
3038the Marpa parse engine.
3039Note that |remaining_rhs| was altered above.
3040@<Add NN CHAF rule for nullable continuation@> =
3041if (piece_start < nullable_suffix_ix) {
3042 RULE  chaf_rule;
3043 gint real_symbol_count = remaining_rhs_length;
3044    chaf_rule = rule_start(g, current_lhs_id, remaining_rhs, remaining_rhs_length);
3045    @<Set CHAF rule flags and call back@>@;
3046}
3047
3048@*0 Add CHAF Rules for Proper Continuations.
3049@ Open block and declarations.
3050@<Add CHAF rules for proper continuation@> = {
3051    gint piece_rhs_length;
3052RULE  chaf_rule;
3053gint real_symbol_count;
3054Marpa_Symbol_ID first_factor_proper_id, second_factor_proper_id,
3055	first_factor_alias_id, second_factor_alias_id;
3056real_symbol_count = piece_end - piece_start + 1;
3057
3058@ The PP Rule.
3059@<Add CHAF rules for proper continuation@> =
3060    for (piece_rhs_length = 0; piece_rhs_length < real_symbol_count; piece_rhs_length++) {
3061	piece_rhs[piece_rhs_length] = RHS_ID_of_RULE(rule, piece_start+piece_rhs_length);
3062    }
3063    piece_rhs[piece_rhs_length++] = chaf_virtual_symid;
3064    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3065    @<Set CHAF rule flags and call back@>@;
3066
3067@ The PN Rule.
3068@<Add CHAF rules for proper continuation@> =
3069    second_factor_proper_id = RHS_ID_of_RULE(rule, second_factor_position);
3070    piece_rhs[second_factor_piece_position]
3071	= second_factor_alias_id = alias_by_id(g, second_factor_proper_id);
3072    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3073    @<Set CHAF rule flags and call back@>@;
3074
3075@ The NP Rule.
3076@<Add CHAF rules for proper continuation@> =
3077    first_factor_proper_id = RHS_ID_of_RULE(rule, first_factor_position);
3078    piece_rhs[first_factor_piece_position]
3079	= first_factor_alias_id = alias_by_id(g, first_factor_proper_id);
3080    piece_rhs[second_factor_piece_position] = second_factor_proper_id;
3081    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3082    @<Set CHAF rule flags and call back@>@;
3083
3084@ The NN Rule.
3085@<Add CHAF rules for proper continuation@> =
3086    piece_rhs[second_factor_piece_position] = second_factor_alias_id;
3087    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3088    @<Set CHAF rule flags and call back@>@;
3089
3090@ Close the block
3091@<Add CHAF rules for proper continuation@> = }
3092
3093@*0 Add Final CHAF Rules for Two Factors.
3094Open block, declarations and setup.
3095@<Add final CHAF rules for two factors@> = {
3096gint first_factor_position = factor_positions[factor_position_ix];
3097gint first_factor_piece_position = first_factor_position - piece_start;
3098gint second_factor_position = factor_positions[factor_position_ix+1];
3099gint second_factor_piece_position = second_factor_position - piece_start;
3100gint real_symbol_count;
3101gint piece_rhs_length;
3102RULE  chaf_rule;
3103Marpa_Symbol_ID first_factor_proper_id, second_factor_proper_id,
3104	first_factor_alias_id, second_factor_alias_id;
3105piece_end = Length_of_RULE(rule)-1;
3106real_symbol_count = piece_end - piece_start + 1;
3107
3108@ The PP Rule.
3109@<Add final CHAF rules for two factors@> =
3110    for (piece_rhs_length = 0; piece_rhs_length < real_symbol_count; piece_rhs_length++) {
3111	piece_rhs[piece_rhs_length] = RHS_ID_of_RULE(rule, piece_start+piece_rhs_length);
3112    }
3113    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3114    @<Set CHAF rule flags and call back@>@;
3115
3116@ The PN Rule.
3117@<Add final CHAF rules for two factors@> =
3118    second_factor_proper_id = RHS_ID_of_RULE(rule, second_factor_position);
3119    piece_rhs[second_factor_piece_position]
3120	= second_factor_alias_id = alias_by_id(g, second_factor_proper_id);
3121    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3122    @<Set CHAF rule flags and call back@>@;
3123
3124@ The NP Rule.
3125@<Add final CHAF rules for two factors@> =
3126    first_factor_proper_id = RHS_ID_of_RULE(rule, first_factor_position);
3127    piece_rhs[first_factor_piece_position]
3128	= first_factor_alias_id = alias_by_id(g, first_factor_proper_id);
3129    piece_rhs[second_factor_piece_position] = second_factor_proper_id;
3130    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3131    @<Set CHAF rule flags and call back@>@;
3132
3133@ The NN Rule.  This is added only if it would not turn this into
3134a nulling rule.
3135@<Add final CHAF rules for two factors@> =
3136if (piece_start < nullable_suffix_ix) {
3137    piece_rhs[second_factor_piece_position] = second_factor_alias_id;
3138    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3139    @<Set CHAF rule flags and call back@>@;
3140}
3141
3142@ Close the block
3143@<Add final CHAF rules for two factors@> = }
3144
3145@*0 Add Final CHAF Rules for One Factor.
3146@<Add final CHAF rules for one factor@> = {
3147gint piece_rhs_length;
3148RULE  chaf_rule;
3149Marpa_Symbol_ID first_factor_proper_id, first_factor_alias_id;
3150gint real_symbol_count;
3151gint first_factor_position = factor_positions[factor_position_ix];
3152gint first_factor_piece_position = factor_positions[factor_position_ix] - piece_start;
3153piece_end = Length_of_RULE(rule)-1;
3154real_symbol_count = piece_end - piece_start + 1;
3155
3156@ The P Rule.
3157@<Add final CHAF rules for one factor@> =
3158    for (piece_rhs_length = 0; piece_rhs_length < real_symbol_count; piece_rhs_length++) {
3159	piece_rhs[piece_rhs_length] = RHS_ID_of_RULE(rule, piece_start+piece_rhs_length);
3160    }
3161    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3162    @<Set CHAF rule flags and call back@>@;
3163
3164@ The N Rule.  This is added only if it would not turn this into
3165a nulling rule.
3166@<Add final CHAF rules for one factor@> =
3167if (piece_start < nullable_suffix_ix) {
3168    first_factor_proper_id = RHS_ID_of_RULE(rule, first_factor_position);
3169    first_factor_alias_id = alias_by_id(g, first_factor_proper_id);
3170    piece_rhs[first_factor_piece_position] = first_factor_alias_id;
3171    chaf_rule = rule_start(g, current_lhs_id, piece_rhs, piece_rhs_length);
3172    @<Set CHAF rule flags and call back@>@;
3173}
3174
3175@ Close the block
3176@<Add final CHAF rules for one factor@> = }
3177
@ Some of the code for adding CHAF rules is common to
them all.
This includes the setting of many of the elements of the
rule structure, and performing the call back.
3182@<Set CHAF rule flags and call back@> =
3183RULE_is_Used (chaf_rule) = 1;
3184chaf_rule->t_original = rule_id;
3185RULE_is_Virtual_LHS(chaf_rule) = piece_start > 0;
3186chaf_rule->t_is_semantic_equivalent = !RULE_is_Virtual_LHS(chaf_rule);
3187RULE_is_Virtual_RHS(chaf_rule) = Length_of_RULE (chaf_rule) > real_symbol_count;
3188chaf_rule->t_virtual_start = piece_start;
3189chaf_rule->t_virtual_end = piece_start + real_symbol_count - 1;
3190Real_SYM_Count_of_RULE(chaf_rule) = real_symbol_count;
3191rule_callback (g, chaf_rule->t_id);
3192
@ This utility routine translates a proper symbol ID to a nulling symbol ID.
3194It is assumed that the caller has ensured that
3195|proper_id| is valid and that an alias actually exists.
3196@<Function definitions@> =
3197static inline
3198Marpa_Symbol_ID alias_by_id(struct marpa_g* g, Marpa_Symbol_ID proper_id) {
3199     SYM alias = symbol_null_alias(SYM_by_ID(proper_id));
3200     return ID_of_SYM(alias);
3201}
3202@ @<Private function prototypes@> =
3203static inline
3204Marpa_Symbol_ID alias_by_id(struct marpa_g* g, Marpa_Symbol_ID proper_id);
3205
3206@** Adding a New Start Symbol.
This is such a common rewrite that it has a special name
in the literature --- it is called ``augmenting the grammar''.
3209
3210@ @<Function definitions@> =
3211static inline
3212struct marpa_g* g_augment(struct marpa_g* g) {
3213    Marpa_Symbol_ID proper_new_start_id = -1;
3214    SYM proper_old_start = NULL;
3215    SYM nulling_old_start = NULL;
3216    SYM proper_new_start = NULL;
3217    SYM old_start = SYM_by_ID(g->t_start_symid);
3218    @<Find and classify the old start symbols@>@;
3219    if (proper_old_start) { @<Set up a new proper start rule@> }
3220    if (nulling_old_start) { @<Set up a new nulling start rule@> }
3221    return g;
3222}
3223@ @<Private function prototypes@> =
3224static inline struct marpa_g* g_augment(struct marpa_g* g);
3225
3226@ @<Find and classify the old start symbols@> =
3227if (SYM_is_Nulling(old_start)) {
3228   old_start->t_is_accessible = 0;
3229    nulling_old_start = old_start;
3230} else {
3231    proper_old_start = old_start;
3232    nulling_old_start = symbol_null_alias(old_start);
3233}
3234old_start->t_is_start = 0;
3235
3236@ @<Set up a new proper start rule@> = {
3237  RULE new_start_rule;
3238  proper_old_start->t_is_start = 0;
3239  proper_new_start = symbol_new (g);
3240  proper_new_start_id = ID_of_SYM(proper_new_start);
3241  g->t_start_symid = proper_new_start_id;
3242  proper_new_start->t_is_accessible = TRUE;
3243  proper_new_start->t_is_productive = TRUE;
3244  proper_new_start->t_is_start = TRUE;
3245  g_context_clear (g);
3246  g_context_int_add (g, "old_start_id", ID_of_SYM(old_start));
3247  symbol_callback (g, proper_new_start_id);
3248  new_start_rule = rule_start (g, proper_new_start_id, &LV_ID_of_SYM(old_start), 1);
3249  new_start_rule->t_is_start = 1;
3250  RULE_is_Virtual_LHS(new_start_rule) = 1;
3251  Real_SYM_Count_of_RULE(new_start_rule) = 1;
3252  RULE_is_Used(new_start_rule) = 1;
3253  g->t_proper_start_rule = new_start_rule;
3254  rule_callback (g, new_start_rule->t_id);
3255}
3256
3257@ Set up the new nulling start rule, if the old start symbol was
3258nulling or had a null alias.  A new nulling start symbol
3259must be created.  It is an alias of the new proper start symbol,
3260if there is one.  Otherwise it is a new, nulling, symbol.
3261@<Set up a new nulling start rule@> = {
3262  Marpa_Symbol_ID nulling_new_start_id;
3263  RULE new_start_rule;
3264  SYM nulling_new_start;
3265  if (proper_new_start)
3266    {				/* There are two start symbols */
3267      nulling_new_start = symbol_alias_create (g, proper_new_start);
3268      nulling_new_start_id = ID_of_SYM(nulling_new_start);
3269    }
3270  else
3271    {				/* The only start symbol is a nulling symbol */
3272      nulling_new_start = symbol_new (g);
3273      nulling_new_start_id = ID_of_SYM(nulling_new_start);
3274      g->t_start_symid = nulling_new_start_id;
3275      SYM_is_Nulling(nulling_new_start) = TRUE;
3276      nulling_new_start->t_is_nullable = TRUE;
3277      nulling_new_start->t_is_productive = TRUE;
3278      nulling_new_start->t_is_accessible = TRUE;
3279    }
3280  nulling_new_start->t_is_start = TRUE;
3281  g_context_clear (g);
3282  g_context_int_add (g, "old_start_id", ID_of_SYM(old_start));
3283  symbol_callback (g, nulling_new_start_id);
3284  new_start_rule = rule_start (g, nulling_new_start_id, 0, 0);
3285  new_start_rule->t_is_start = 1;
3286  RULE_is_Virtual_LHS(new_start_rule) = 1;
3287  Real_SYM_Count_of_RULE(new_start_rule) = 1;
3288  RULE_is_Used(new_start_rule) = TRUE;
3289  g->t_null_start_rule = new_start_rule;
3290  rule_callback (g, new_start_rule->t_id);
3291}
3292
3293@** Loops.
3294Loops are rules which non-trivially derive their own LHS.
3295More precisely, a rule is a loop if and only if it
3296non-trivially derives a string which contains its LHS symbol
3297and is of length 1.
3298In my experience,
3299and according to Grune and Jacobs 2008 (pp. 48-49),
3300loops are never of practical use.
3301
3302@ Marpa allows loops, for two reasons.
3303First, I want to be able to claim that
3304Marpa handles {\bf all} context-free grammars.
3305This is of real value to the user, because
3306it makes
3307it very easy for her
3308to know beforehand whether Marpa can
3309handle a particular grammar.
3310If she can write the grammar in BNF, then Marpa can handle it ---
3311it's that simple.
3312For Marpa to make this claim,
3313it must be able to handle grammars
3314with loops.
3315
3316Second, a user's drafts of a grammar might contain cycles.
3317A parser generator which did not handle them would force
3318the user's first order of business to be removing them.
3319That might be inconvenient.
3320
3321@ The grammar precomputations and the recognition
3322phase have been set up so that
3323loops are a complete non-issue --- they are dealt with like
3324any other situation, without additional overhead.
3325However, loops do impose overhead and require special
3326handling in the evaluation phase.
3327It is unlikely that a user will want to leave one in
3328a production grammar.
3329
3330@ Marpa detects all loops during its grammar
3331precomputation.
3332|libmarpa| assumes that parsing will go through as usual,
3333with the loops.
3334But it enables the upper layers to make other choices
by passing a message for every rule involved in a
loop,
as well as a final message with the count of loop rules.
3338
3339@<Function definitions@> =
3340static inline
3341void loop_detect(struct marpa_g* g)
3342{ gint no_of_rules = RULE_Count_of_G(g);
3343gint loop_rule_count = 0;
3344Bit_Matrix unit_transition_matrix
3345    = matrix_create( (guint)no_of_rules , (guint)no_of_rules);
3346@<Mark direct unit transitions in |unit_transition_matrix|@>@;
3347transitive_closure(unit_transition_matrix);
3348@<Mark loop rules@>@;
3349if (loop_rule_count) g->t_has_loop = TRUE;
3350@<Report loop rule count@>@;
3351matrix_free(unit_transition_matrix);
3352}
3353@ @<Private function prototypes@> =
3354static inline
3355void loop_detect(struct marpa_g* g);
3356
3357@ Note that direct transitions are marked in advance,
3358but not trivial ones.
3359That is, bit |(x,x)| is not set |TRUE| in advance.
3360In other words, for this purpose,
3361unit transitions are not in general reflexive.
3362@<Mark direct unit transitions in |unit_transition_matrix|@> = {
3363Marpa_Rule_ID rule_id;
3364for (rule_id = 0; rule_id < (Marpa_Rule_ID)no_of_rules; rule_id++) {
3365     RULE  rule = RULE_by_ID(g, rule_id);
3366     Marpa_Symbol_ID proper_id;
3367     gint rhs_ix, rule_length;
3368     if (!RULE_is_Used(rule)) continue;
3369     rule_length = Length_of_RULE(rule);
3370     proper_id = -1;
3371     for (rhs_ix = 0; rhs_ix < rule_length; rhs_ix++) {
3372	 Marpa_Symbol_ID symid = RHS_ID_of_RULE(rule, rhs_ix);
3373	 SYM symbol = SYM_by_ID(symid);
3374	 if (symbol->t_is_nullable) continue; /* After the CHAF rewrite, nullable $\E$ nulling */
3375	 if (proper_id >= 0) goto NEXT_RULE; /* More
3376	     than one proper symbol -- not a unit rule */
3377	 proper_id = symid;
3378    }
3379    @#
3380    if (proper_id < 0) continue; /* A
3381	nulling start rule is allowed, so there may be no proper symbol */
3382     { SYM rhs_symbol = SYM_by_ID(proper_id);
3383     GArray* lhs_rules = rhs_symbol->t_lhs;
3384     gint ix, no_of_lhs_rules = lhs_rules->len;
3385     for (ix = 0; ix < no_of_lhs_rules; ix++) {
	 /* Direct loops ($A \RA A$) only need the |(rule_id, rule_id)| bit set,
	    but it is not clear that it is a win to special case them. */
3388	 matrix_bit_set(unit_transition_matrix, (guint)rule_id,
3389	     (guint)g_array_index(lhs_rules, Marpa_Rule_ID, ix));
3390     } }
3391     NEXT_RULE: ;
3392} }
3393
@ Virtual loop rules are loop rules from the virtual point of view.
When CHAF rules are rewritten into multiple pieces,
it is inconvenient to see each piece as a loop rule.
Therefore only certain of the CHAF pieces that are loop rules
are regarded as virtual loop rules.
All non-CHAF loop rules are virtual loop rules including,
at this point, sequence rules.
3401@<Mark loop rules@> = { Marpa_Rule_ID rule_id;
3402for (rule_id = 0; rule_id < (Marpa_Rule_ID)no_of_rules; rule_id++) {
3403    RULE  rule;
3404    if (!matrix_bit_test(unit_transition_matrix, (guint)rule_id, (guint)rule_id))
3405	continue;
3406    loop_rule_count++;
3407    rule = RULE_by_ID(g, rule_id);
3408    rule->t_is_loop = TRUE;
3409    rule->t_is_virtual_loop = rule->t_virtual_start < 0 || !RULE_is_Virtual_RHS(rule);
3410    g_context_clear(g);
3411    g_context_int_add(g, "rule_id", rule_id);
3412    grammar_message(g, "loop rule");
3413} }
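
@ The closure-and-diagonal test used above can be illustrated in isolation.
What follows is a minimal, self-contained sketch, not |libmarpa|'s code:
the fixed-size |unit_matrix| type and the functions |unit_matrix_init|,
|unit_matrix_close| and |mark_loop_rules| are hypothetical stand-ins
for the |Bit_Matrix| routines, and Warshall's algorithm is assumed
for the transitive closure.
A rule is counted as a loop rule exactly when its diagonal bit is set
after the closure.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_RULES 8 /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a bit matrix:
   reach[i][j] is true when rule i has a unit transition to rule j. */
typedef struct {
  int rule_count;
  bool reach[MAX_RULES][MAX_RULES];
} unit_matrix;

/* Start with no transitions at all, so the relation is not reflexive. */
void unit_matrix_init(unit_matrix *m, int rule_count) {
  assert(rule_count >= 0 && rule_count <= MAX_RULES);
  m->rule_count = rule_count;
  memset(m->reach, 0, sizeof m->reach);
}

/* Transitive closure by Warshall's algorithm. */
void unit_matrix_close(unit_matrix *m) {
  int n = m->rule_count;
  int i, j, k;
  for (k = 0; k < n; k++) {
    for (i = 0; i < n; i++) {
      if (!m->reach[i][k]) continue;
      for (j = 0; j < n; j++) {
        if (m->reach[k][j]) m->reach[i][j] = true;
      }
    }
  }
}

/* After closure, rule i is a loop rule when bit (i,i) is set.
   Fills is_loop[0..rule_count-1] and returns the number of loop rules. */
int mark_loop_rules(const unit_matrix *m, bool *is_loop) {
  int count = 0;
  int i;
  for (i = 0; i < m->rule_count; i++) {
    is_loop[i] = m->reach[i][i];
    if (is_loop[i]) count++;
  }
  return count;
}
```

For the rules $A \RA B$, $B \RA A$, $A \RA c$ and $C \RA A$,
numbered 0 to 3, the direct transitions are from rule 0 to rule 1,
and from each of rules 1 and 3 to rules 0 and 2.
After the closure only rules 0 and 1 have their diagonal bit set.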
3414
3415@ The higher layers can differ greatly in their treatment
3416of loop rules.  It is perfectly reasonable for a higher layer to treat a loop
3417rule as a fatal error.
3418It is also reasonable for a higher layer to always silently allow them.
3419There are lots of possibilities in between these two extremes.
3420To assist the upper layers, the reporting is very thorough ---
3421there is not just a message for each loop rule, but also a final tally.
3422@<Report loop rule count@> =
3423g_context_clear(g);
3424g_context_int_add(g, "loop_rule_count", loop_rule_count);
3425grammar_message(g, "loop rule tally");
3426
3427@** The Aycock-Horspool Finite Automata.
3428
3429@*0 Some Statistics on AHFA states.
3430For Perl's grammar, the discovered states range in size from 1 to 20 items,
3431but the numbers are heavily skewed toward the low
end.  Here are the item counts that appear, with the percent of the total
discovered AHFA states with that item count in parentheses:
34351   (67.05\%);
34362   (25.67\%);
34373   (2.87\%);
34384   (2.68\%);
34395   (0.19\%);
34406   (0.38\%);
34417   (0.19\%);
34428   (0.57\%);
34439   (0.19\%); and
344420   (0.19\%).
3445
3446@ As can be seen, well over 90\% of the total discovered states have
3447just one or two items.
3448The average size is 1.5235,
3449and the average of the $|size|^2$ is 3.9405.
3450
3451@ For the HTML grammars I used, the totals are even more lopsided:
345280.96\% of all discovered states have only 1 item.
3453All the others (19.04\%) have 2 items.
3454The average size is 1.1904,
3455and the average of the $|size|^2$ is 1.5712.
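
@ As a check on the HTML figures:
with 80.96\% of the discovered states of size 1 and 19.04\% of size 2,
both averages follow directly from the two percentages.
$$0.8096 \cdot 1 + 0.1904 \cdot 2 = 1.1904, \qquad
0.8096 \cdot 1^2 + 0.1904 \cdot 2^2 = 1.5712.$$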
3456
3457@ The number of predicted states tends to be much more
3458evenly distributed.
3459It also tends to be much larger, and
3460the average for practical grammars may be $O(s)$,
3461where $s$ is the size of the grammar.
3462This is the same as the theoretical worst case.
3463
Here are the item counts of the predicted states for the Perl grammar.
The number of states with that item count is in parentheses:
34661 item (3),
34672 items (5),
34683 items (4),
34694 items (3),
34705 items (1),
34716 items (2),
34727 items (2),
347364 items (1),
347471 items (1),
347577 items (1),
347679 items (1),
347781 items (1),
347883 items (1),
347985 items (1),
348088 items (1),
348190 items (1),
348298 items (1),
3483100 items (1),
3484102 items (1),
3485104 items (1),
3486106 items (1),
3487108 items (1),
3488111 items (1),
3489116 items (1),
3490127 items (1),
3491129 items (1),
3492132 items (1),
3493135 items (1),
3494136 items (1),
3495137 items (1),
3496141 items (1),
3497142 items (4),
3498143 items (2),
3499144 items (1),
3500149 items (1),
3501151 items (1),
3502156 items (1),
3503157 items (1),
3504220 items (1),
3505224 items (1),
3506225 items (1).
And here is the same data for an HTML grammar:
35081 item (95),
35092 items (95),
35104 items (95),
351111 items (181),
351214 items (181),
351315 items (294),
351416 items (112),
351518 items (349),
351619 items (120),
351720 items (190),
351821 items (63),
351922 items (22),
352024 items (8),
352125 items (16),
352226 items (16),
352328 items (2),
352429 items (16).
3525
3526
3527@** AHFA Item (AIM) Code.
3528AHFA states are sets of AHFA items.
3529AHFA items are named by analogy with LR(0) items.
3530LR(0) items play the same role in the LR(0) automaton that
3531AHFA items play in the AHFA ---
3532the states of the automata correspond to sets of the items.
Also like LR(0) items,
each AHFA item corresponds one-to-one to a duple,
the duple being a rule and a position in that rule.
3536@<Public typedefs@> =
3537typedef gint Marpa_AHFA_Item_ID;
3538@
3539@d Sort_Key_of_AIM(aim) ((aim)->t_sort_key)
3540@<Private structures@> =
3541struct s_AHFA_item {
3542    gint t_sort_key;
3543    @<Widely aligned AHFA item elements@>@;
3544    @<Int aligned AHFA item elements@>@;
3545};
3546@ @<Private incomplete structures@> =
3547struct s_AHFA_item;
3548typedef struct s_AHFA_item* AIM;
3549typedef Marpa_AHFA_Item_ID AIMID;
3550
@ Pointers to two lists of AHFA items.
3552The one list contains the AHFA items themselves, in
3553AHFA item ID order.
3554The other is indexed by rule ID, and contains a pointer to
3555the first AHFA item for that rule.
@ Because AHFA items are in an array, the successor can
be found by incrementing the AIM pointer,
the predecessor can be found by decrementing it,
and AIM pointers can be portably compared.
A lot of code relies on these facts.
3561@d Next_AIM_of_AIM(aim) ((aim)+1)
3562@d AIM_by_ID(id) (g->t_AHFA_items+(id))
3563@<Widely aligned grammar elements@> =
3564   AIM t_AHFA_items;
3565   AIM* t_AHFA_items_by_rule;
3566@
3567@d AIM_Count_of_G(g) ((g)->t_aim_count)
3568@d LV_AIM_Count_of_G(g) AIM_Count_of_G(g)
3569@<Int aligned grammar elements@> =
3570   guint t_aim_count;
3571@ The space is allocated during precomputation.
3572Because the grammar may be destroyed before precomputation,
3573I test that |g->t_AHFA_items| is non-zero.
3574@ @<Initialize grammar elements@> =
3575g->t_AHFA_items = NULL;
3576g->t_AHFA_items_by_rule = NULL;
3577@ @<Destroy grammar elements@> =
3578if (g->t_AHFA_items) { g_free(g->t_AHFA_items); };
3579if (g->t_AHFA_items_by_rule) { g_free(g->t_AHFA_items_by_rule); };
3580
3581@ Check that AHFA item ID is in valid range.
3582@<Function definitions@> =
3583static inline gboolean item_is_valid(
3584GRAMMAR_Const g, AIMID item_id) {
3585return item_id < (AIMID)AIM_Count_of_G(g) && item_id >= 0;
3586}
3587@ @<Private function prototypes@> =
3588static inline gboolean item_is_valid(
3589GRAMMAR_Const g, AIMID item_id);
3590
3591@*0 Rule.
3592@d RULE_of_AIM(item) ((item)->t_rule)
3593@d RULEID_of_AIM(item) ID_of_RULE(RULE_of_AIM(item))
3594@d LHS_ID_of_AIM(item) (LHS_ID_of_RULE(RULE_of_AIM(item)))
3595@<Widely aligned AHFA item elements@> =
3596    RULE t_rule;
3597
3598@*0 Position.
3599Position in the RHS, -1 for a completion.
3600@d Position_of_AIM(aim) ((aim)->t_position)
3601@<Int aligned AHFA item elements@> =
3602gint t_position;
3603
3604@*0 Postdot Symbol.
3605|-1| if the item is a completion.
3606@d Postdot_SYMID_of_AIM(item) ((item)->t_postdot)
3607@d AIM_is_Completion(aim) (Postdot_SYMID_of_AIM(aim) < 0)
3608@d AIM_has_Completed_Start_Rule(aim)
3609    (AIM_is_Completion(aim) && RULE_is_Start(RULE_of_AIM(aim)))
3610@<Int aligned AHFA item elements@> = Marpa_Symbol_ID t_postdot;
3611
3612@*0 Leading Nulls.
3613In libmarpa's AHFA items, the dot position is never in front
3614of a nulling symbol.  (Due to rewriting, every nullable symbol
3615is also a nulling symbol.)
This element contains the count of nulling symbols preceding
this AHFA item's dot position.
3618@d Null_Count_of_AIM(aim) ((aim)->t_leading_nulls)
3619@<Int aligned AHFA item elements@> =
3620gint t_leading_nulls;
3621
3622@*0 AHFA Item External Accessors.
3623@<Function definitions@> =
3624guint marpa_AHFA_item_count(struct marpa_g* g) {
3625    @<Return |-2| on failure@>@/
3626    @<Fail if grammar not precomputed@>@/
3627    return AIM_Count_of_G(g);
3628}
3629@ @<Public function prototypes@> =
3630guint marpa_AHFA_item_count(struct marpa_g* g);
3631
3632@ @<Function definitions@> =
3633Marpa_Rule_ID marpa_AHFA_item_rule(struct marpa_g* g,
3634	Marpa_AHFA_Item_ID item_id) {
3635    @<Return |-2| on failure@>@/
3636    @<Fail if grammar not precomputed@>@/
3637    @<Fail if grammar |item_id| is invalid@>@/
3638    return RULE_of_AIM(AIM_by_ID(item_id))->t_id;
3639}
3640@ @<Public function prototypes@> =
3641Marpa_Rule_ID marpa_AHFA_item_rule(struct marpa_g* g, Marpa_AHFA_Item_ID item_id);
3642
3643@ |-1| is the value for completions, so |-2| is the failure indicator.
3644@<Public function prototypes@> =
3645gint marpa_AHFA_item_position(struct marpa_g* g, Marpa_AHFA_Item_ID item_id);
3646@ @<Function definitions@> =
3647gint marpa_AHFA_item_position(struct marpa_g* g,
3648	Marpa_AHFA_Item_ID item_id) {
3649    @<Return |-2| on failure@>@/
3650    @<Fail if grammar not precomputed@>@/
3651    @<Fail if grammar |item_id| is invalid@>@/
3652    return Position_of_AIM(AIM_by_ID(item_id));
3653}
3654
3655@ |-1| is the value for completions, so |-2| is the failure indicator.
3656@<Public function prototypes@> =
3657Marpa_Symbol_ID marpa_AHFA_item_postdot(struct marpa_g* g, Marpa_AHFA_Item_ID item_id);
3658@ @<Function definitions@> =
3659Marpa_Symbol_ID marpa_AHFA_item_postdot(struct marpa_g* g,
3660	Marpa_AHFA_Item_ID item_id) {
3661    @<Return |-2| on failure@>@/
3662    @<Fail if grammar not precomputed@>@/
3663    @<Fail if grammar |item_id| is invalid@>@/
3664    return Postdot_SYMID_of_AIM(AIM_by_ID(item_id));
3665}
3666
3667@ @<Public function prototypes@> =
3668gint marpa_AHFA_item_sort_key(struct marpa_g* g, Marpa_AHFA_Item_ID item_id);
3669@ @<Function definitions@> =
3670gint marpa_AHFA_item_sort_key(struct marpa_g* g,
3671	Marpa_AHFA_Item_ID item_id) {
3672    @<Return |-2| on failure@>@/
3673    @<Fail if grammar not precomputed@>@/
3674    @<Fail if grammar |item_id| is invalid@>@/
3675    return Sort_Key_of_AIM(AIM_by_ID(item_id));
3676}
3677
3678@** Creating the AHFA Items.
3679@ I do not use a |DSTACK| because I can initially size the
3680item stack to |Size_of_G(g)|, which is a reasonable allocation,
3681but guaranteed to be greater than
or equal to the final number of items.
3683That means that I can avoid the overhead of checking the array
3684size when adding each new AHFA item.
3685@<Function definitions@> =
3686static inline
3687void create_AHFA_items(GRAMMAR g) {
3688    RULEID rule_id;
3689    guint no_of_items;
3690    guint no_of_rules = RULE_Count_of_G(g);
3691    AIM base_item = g_new(struct s_AHFA_item, Size_of_G(g));
3692    AIM current_item = base_item;
3693    guint symbol_instance_of_next_rule = 0;
3694    for (rule_id = 0; rule_id < (Marpa_Rule_ID)no_of_rules; rule_id++) {
3695      RULE rule = RULE_by_ID (g, rule_id);
3696      if (RULE_is_Used (rule)) {
3697	@<Create the AHFA items for a rule@>@;
3698	SYMI_of_RULE(rule) = symbol_instance_of_next_rule;
3699	symbol_instance_of_next_rule += Length_of_RULE(rule);
3700	}
3701    }
3702    SYMI_Count_of_G(g) = symbol_instance_of_next_rule;
3703    no_of_items = LV_AIM_Count_of_G(g) = current_item - base_item;
3704    g->t_AHFA_items = g_renew(struct s_AHFA_item, base_item, no_of_items);
3705    @<Set up the items-by-rule list@>@;
3706    @<Set up the AHFA item ids@>@;
3707}
3708@ @<Private function prototypes@> =
3709static inline void create_AHFA_items(struct marpa_g* g);
3710
3711@ @<Create the AHFA items for a rule@> =
3712{
3713  gint leading_nulls = 0;
3714  gint rhs_ix;
3715  for (rhs_ix = 0; rhs_ix < Length_of_RULE(rule); rhs_ix++)
3716    {
3717      SYMID rh_symid = RHS_ID_of_RULE (rule, rhs_ix);
3718      SYM symbol = SYM_by_ID (rh_symid);
3719      if (!symbol->t_is_nullable)
3720	{
3721	  Last_Proper_SYMI_of_RULE(rule) = symbol_instance_of_next_rule + rhs_ix;
3722	  @<Create an AHFA item for a precompletion@>@;
3723	  leading_nulls = 0;
3724	  current_item++;
3725	}
3726      else
3727	{
3728	  leading_nulls++;
3729	}
3730    }
3731  @<Create an AHFA item for a completion@>@;
3732  current_item++;
3733}
3734
3735@ @<Create an AHFA item for a precompletion@> =
3736{
3737  RULE_of_AIM (current_item) = rule;
3738  Sort_Key_of_AIM (current_item) = current_item - base_item;
3739  Null_Count_of_AIM(current_item) = leading_nulls;
3740  Postdot_SYMID_of_AIM (current_item) = rh_symid;
3741  Position_of_AIM (current_item) = rhs_ix;
3742}
3743
3744@ @<Create an AHFA item for a completion@> =
3745{
3746  RULE_of_AIM (current_item) = rule;
3747  Sort_Key_of_AIM (current_item) = current_item - base_item;
3748  Null_Count_of_AIM(current_item) = leading_nulls;
3749  Postdot_SYMID_of_AIM (current_item) = -1;
3750  Position_of_AIM (current_item) = -1;
3751}
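
@ The item layout produced by the three sections above can be sketched
on its own.
This is a simplified illustration under stated assumptions, not
|libmarpa|'s code:
|sketch_item| and |sketch_items_for_rule| are hypothetical names,
symbols are plain integers,
and |is_nulling| is an assumed array indexed by symbol ID.
One item is emitted for each non-nulling RHS symbol,
followed by one completion item,
and each item records how many nulling symbols were skipped just
before its dot.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int position;      /* RHS index of the postdot symbol, -1 for a completion */
  int postdot;       /* postdot symbol ID, -1 for a completion */
  int leading_nulls; /* nulling symbols skipped just before the dot */
} sketch_item;

/* Writes the items for one rule to out, which must have room for
   rhs_length + 1 entries, and returns the number of items written. */
size_t sketch_items_for_rule(const int *rhs, size_t rhs_length,
                             const int *is_nulling, sketch_item *out) {
  size_t count = 0;
  size_t ix;
  int leading_nulls = 0;
  assert(out != NULL);
  for (ix = 0; ix < rhs_length; ix++) {
    int symid = rhs[ix];
    if (is_nulling[symid]) {
      leading_nulls++; /* the dot never stops in front of a nulling symbol */
      continue;
    }
    out[count].position = (int)ix;
    out[count].postdot = symid;
    out[count].leading_nulls = leading_nulls;
    count++;
    leading_nulls = 0;
  }
  /* The completion item; it may also follow trailing nulling symbols. */
  out[count].position = -1;
  out[count].postdot = -1;
  out[count].leading_nulls = leading_nulls;
  count++;
  return count;
}
```

With symbols 0 and 2 nulling, the RHS 0, 1, 2, 3 yields three items:
one at position 1 and one at position 3, each with one leading null,
and a completion with none.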
3752
3753@ This is done after creating the AHFA items, because in
3754theory the |g_renew| might have moved them.
3755This is not likely since the |g_renew| shortened the array,
3756but if you are hoping for portability,
3757you want to follow the rules.
3758@<Set up the items-by-rule list@> =
3759{
3760  AIM *items_by_rule = g_new (AIM, no_of_rules);
3761  AIM items = g->t_AHFA_items;
3762  /* The highest ID of a rule whose AHFA items have been found */
3763  Marpa_Rule_ID highest_found_rule_id = -1;
3764  Marpa_AHFA_Item_ID item_id;
3765  /* |items_by_rule| must be NULL'd
3766      because not all entries will be populated */
3767  for (rule_id = 0; rule_id < (Marpa_Rule_ID) no_of_rules; rule_id++)
3768  {
3769      items_by_rule[rule_id] = NULL;
3770  }
3771  for (item_id = 0; item_id < (Marpa_AHFA_Item_ID) no_of_items; item_id++)
3772    {
3773      AIM item = items + item_id;
3774      Marpa_Rule_ID rule_id_for_item = RULE_of_AIM (item)->t_id;
3775      if (rule_id_for_item <= highest_found_rule_id)
3776	continue;
3777      items_by_rule[rule_id_for_item] = item;
3778      highest_found_rule_id = rule_id_for_item;
3779    }
3780  g->t_AHFA_items_by_rule = items_by_rule;
3781}
3782
3783@ @<Private function prototypes@> =
3784static gint cmp_by_aimid (gconstpointer a,
3785	gconstpointer b, gpointer user_data);
@ This function is used to sort a list of pointers to
3787AHFA items by AHFA item id,
3788which is their most natural order.
3789Once the AHFA states are created,
3790they are restored to this order.
3791For portability,
3792it requires the AIMs to be in an array.
3793@ @<Function definitions@> =
3794static gint cmp_by_aimid (gconstpointer ap,
3795	gconstpointer bp,
3796	gpointer user_data @, G_GNUC_UNUSED) {
3797    AIM a = *(AIM*)ap;
3798    AIM b = *(AIM*)bp;
3799    return a-b;
3800}
3801
3802@ @<Private function prototypes@> =
3803static gint cmp_by_postdot_and_aimid (gconstpointer a,
3804	gconstpointer b, gpointer user_data);
3805@ The AHFA items were created with a temporary ID which sorts them
by rule, then by position within that rule.  We need one that sorts the AHFA items
3807by (from major to minor) postdot symbol, then rule, then position.
3808A postdot symbol of $-1$ should sort high.
3809This comparison function is used in the logic to change the AHFA item ID's
3810from their temporary values to their final ones.
3811@ @<Function definitions@> =
3812static gint cmp_by_postdot_and_aimid (gconstpointer ap,
3813	gconstpointer bp, gpointer user_data @, G_GNUC_UNUSED) {
3814    AIM a = *(AIM*)ap;
3815    AIM b = *(AIM*)bp;
3816    gint a_postdot = Postdot_SYMID_of_AIM(a);
3817    gint b_postdot = Postdot_SYMID_of_AIM(b);
3818    if (a_postdot == b_postdot)
3819      return Sort_Key_of_AIM (a) - Sort_Key_of_AIM (b);
3820    if (a_postdot < 0) return 1;
3821    if (b_postdot < 0) return -1;
3822    return a_postdot-b_postdot;
3823}
3824
@ Change the AHFA item ID's from their temporary form to their
final form.
Pointers to the AHFA items are copied to a temporary array
which is then sorted in the order required for the new ID.
As a result, the final AHFA item ID number will be the same as
the index in this temporary array.
A final loop then indexes through
the temporary array and writes the index to the pointed-to
AHFA item as its new, final ID.
3834@<Set up the AHFA item ids@> =
3835{
3836  Marpa_AHFA_Item_ID item_id;
3837  AIM *sort_array = g_new (struct s_AHFA_item *, no_of_items);
3838  AIM items = g->t_AHFA_items;
3839  for (item_id = 0; item_id < (Marpa_AHFA_Item_ID) no_of_items; item_id++)
3840    {
3841      sort_array[item_id] = items + item_id;
3842    }
3843  g_qsort_with_data (sort_array,
3844		     (gint) no_of_items, sizeof (AIM), cmp_by_postdot_and_aimid,
3845		     (gpointer) NULL);
3846  for (item_id = 0; item_id < (Marpa_AHFA_Item_ID) no_of_items; item_id++)
3847    {
3848      Sort_Key_of_AIM (sort_array[item_id]) = item_id;
3849    }
3850  g_free (sort_array);
3851}
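
@ The renumbering just described can be tried outside of |libmarpa|.
The following is a minimal sketch under stated assumptions:
|sketch_aim|, |sketch_cmp| and |sketch_renumber| are hypothetical names,
and the standard |qsort| is used in place of |g_qsort_with_data|,
since the comparison needs no user data.
As in the comparison function above,
a postdot symbol of $-1$ sorts high,
and ties are broken by the temporary key.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int sort_key; /* temporary key on entry, final key on exit */
  int postdot;  /* postdot symbol ID, -1 for a completion */
} sketch_aim;

/* Compare two pointers to items: by postdot symbol, with -1 sorting
   high, then by the temporary key. */
int sketch_cmp(const void *ap, const void *bp) {
  const sketch_aim *a = *(const sketch_aim *const *)ap;
  const sketch_aim *b = *(const sketch_aim *const *)bp;
  if (a->postdot == b->postdot)
    return (a->sort_key > b->sort_key) - (a->sort_key < b->sort_key);
  if (a->postdot < 0) return 1;
  if (b->postdot < 0) return -1;
  return (a->postdot > b->postdot) - (a->postdot < b->postdot);
}

/* Sort an array of pointers to the items, then write each pointer's
   index back as the final key.  The items themselves do not move.
   Returns 0 on success, -1 if the temporary array cannot be allocated. */
int sketch_renumber(sketch_aim *items, size_t count) {
  sketch_aim **sorted;
  size_t ix;
  assert(items != NULL || count == 0);
  if (count == 0) return 0;
  sorted = malloc(count * sizeof *sorted);
  if (sorted == NULL) return -1;
  for (ix = 0; ix < count; ix++) sorted[ix] = items + ix;
  qsort(sorted, count, sizeof *sorted, sketch_cmp);
  for (ix = 0; ix < count; ix++) sorted[ix]->sort_key = (int)ix;
  free(sorted);
  return 0;
}
```

Because the temporary keys are distinct, the comparison is a total order,
so it does not matter that |qsort| is not guaranteed to be stable.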
3852
3853@** AHFA State (AHFA) Code.
3854
3855This algorithm to create the AHFA states is new with |libmarpa|.
3856It is based on noting that the states to be created fall into
3857distinct classes, and that considerable optimization is possible
3858if the classes of AHFA states are optimized separately.
@ In their paper Aycock and Horspool divide the states of their
automaton into
what they call non-kernel and kernel states.
In the AHFA, kernel states are called discovered AHFA states.
Non-kernel states are called predicted AHFA states.
If an AHFA state contains a start rule
or an AHFA item for which at least some
non-nulling symbol has been recognized,
it is a {\bf discovered} AHFA state.
Otherwise, the AHFA state will contain only predictions,
and is a {\bf predicted} AHFA state.
3870@ Predicted AHFA states are so called because they only contain
3871items which predict, according to the grammar,
3872what might be found in the input.
Discovered AHFA states are so called because either they ``report''
the start of the input
or they ``report'' symbols actually found in the input.
3876There is only one case in which
3877a discovered AHFA state will contain a prediction ---
3878that is when the AHFA state contains an
3879AHFA item for the nulling start rule.
3880@ {\bf The Initial AHFA State}:
3881This is the only state which can
3882contain an AHFA item for a null rule.
3883It only takes one of three possible forms.
3884Listing the reasons that it makes sense to special-case
3885this class would take more space than the code to do it.
3886@ {\bf The Initial AHFA Prediction State}:
3887This state is paired with a special-cased state, so it would
3888require going out of our way to {\bf not} special-case this
3889state as well.
3890It does
share with the other initial state the property that it is not
3892necessary to check to ensure it does not duplicate an existing
3893state.
3894Other than that, the code is much like that to create any other
3895prediction state.
3896@ {\bf Discovered States with 1 item}:
3897These may be specially optimized for.
3898Sorting the items can be dispensed with.
3899Checking for duplicates can be done using an array indexed by
3900the ID of the only AHFA item.
3901Statistics for practical grammars show that most discovered states
3902contain only a single AHFA item, so there is a big payoff from
3903special-casing these.
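
@ The array-indexed duplicate check for singleton states might look
something like the following.
This is only a sketch under stated assumptions:
|singleton_table| and its three functions are hypothetical names,
and states are reduced to bare integer IDs.
The point is that a lookup needs neither sorting nor hashing,
just one array access keyed by the ID of the state's only AHFA item.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int *state_by_item; /* state_by_item[item_id] is a state ID, or -1 if unseen */
  int item_count;
  int next_state_id;  /* ID to hand out to the next new state */
} singleton_table;

/* Returns 0 on success, -1 if the array cannot be allocated. */
int singleton_table_init(singleton_table *t, int item_count,
                         int first_state_id) {
  int ix;
  assert(t != NULL && item_count > 0);
  t->state_by_item = malloc((size_t)item_count * sizeof *t->state_by_item);
  if (t->state_by_item == NULL) return -1;
  for (ix = 0; ix < item_count; ix++) t->state_by_item[ix] = -1;
  t->item_count = item_count;
  t->next_state_id = first_state_id;
  return 0;
}

void singleton_table_free(singleton_table *t) {
  free(t->state_by_item);
  t->state_by_item = NULL;
}

/* Returns the ID of the singleton state whose only item is item_id,
   creating a new ID if the state has not been seen before.
   Sets *is_new to tell the caller which of the two happened. */
int singleton_state_for_item(singleton_table *t, int item_id, int *is_new) {
  assert(item_id >= 0 && item_id < t->item_count);
  *is_new = (t->state_by_item[item_id] < 0);
  if (*is_new) t->state_by_item[item_id] = t->next_state_id++;
  return t->state_by_item[item_id];
}
```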
3904@ {\bf Discovered States with 2 or more items}:
3905For non-singleton discovered states,
3906I use a hand-written insertion sort,
3907and check for duplicates using a hash with a customized key.
3908Further optimizations are possible, but
3909few discovered states fall into this case.
3910Also, discovered states of 2 items are a large enough class to justify
3911separating out, if a significant optimization for them could be
3912found.
3913@ {\bf Predicted States}:
3914These are treated differently from discovered states.
The items in these are always a subset of the initial items for rules,
and therefore each predicted state corresponds to a distinct subset
of the rules, that is, to an element of their powerset.
3917This fact is used in precomputing rule bit vectors, by postdot symbol,
3918to speed up the construction of these.
3919An advantage of using bit vectors is that a radix sort of the items
3920happens as a side effect.
3921Because prediction states follow a very different distribution from
3922discovered states, they have their own hash for checking duplicates.
3923
3924@<Public typedefs@> =
3925typedef gint Marpa_AHFA_State_ID;
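
@ The remark that bit vectors give a radix sort as a side effect can be
illustrated with a small sketch.
It rests on several assumptions:
the names |rule_set|, |rule_set_add|, |predicted_rule_set| and
|rule_set_list| are hypothetical,
the grammar is limited to 32 rules so that one machine word suffices,
and each entry of |rules_by_postdot| is assumed to hold the rules
predicted by one postdot symbol.
The union of the vectors is the same whatever order the postdot symbols
are visited in,
and reading the bits from low to high lists the rules in ID order.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_RULES 32 /* one 32-bit word per set in this sketch */

typedef uint32_t rule_set; /* bit i is set when rule i is in the set */

rule_set rule_set_add(rule_set set, int rule_id) {
  assert(rule_id >= 0 && rule_id < SKETCH_MAX_RULES);
  return set | ((rule_set)1 << rule_id);
}

/* Union of the precomputed vectors for each postdot symbol given. */
rule_set predicted_rule_set(const rule_set *rules_by_postdot,
                            const int *postdots, int postdot_count) {
  rule_set result = 0;
  int ix;
  for (ix = 0; ix < postdot_count; ix++)
    result |= rules_by_postdot[postdots[ix]];
  return result;
}

/* Writes the rule IDs in the set to out, in ascending order, and
   returns how many there are.  out needs room for SKETCH_MAX_RULES. */
int rule_set_list(rule_set set, int *out) {
  int count = 0;
  int rule_id;
  for (rule_id = 0; rule_id < SKETCH_MAX_RULES; rule_id++) {
    if (set & ((rule_set)1 << rule_id)) out[count++] = rule_id;
  }
  return count;
}
```

Two predicted states are duplicates exactly when their rule sets compare
equal, which is a single integer comparison in this sketch.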
3926
3927@ {\bf Estimating the number of AHFA States}: Based on the numbers given previously
3928for Perl and HTML,
3929$2s$ is a good high-ball estimate of the number of AHFA states for
3930grammars of practical interest,
3931where $s$ is the size of the grammar.
3932I come up with this as follows.
3933
3934Let the size of an AHFA state be the number of AHFA items it contains.
\li It is impossible for the number of AHFA items to be greater than
3936the size of the grammar.
3937\li It is impossible for the number of discovered states of size 1
3938to be greater than the number of AHFA items.
3939\li The number of discovered states of size 2 or greater
3940will typically be half the number of discovered states of size 1,
3941or less.
3942\li The number of predicted states will typically be
3943considerably less than half the number of discovered states.
3944
3945The three possibilities just enumerated exhaust the possibilities for AHFA states.
3946The total is ${s \over 2} + {s \over 2} + s = 2s$.
3947Typically, the number of AHFA states should be less than this estimate.
3948
3949@d AHFA_of_G_by_ID(g, id) ((g)->t_AHFA+(id))
3950@d AHFA_has_Completed_Start_Rule(ahfa) ((ahfa)->t_has_completed_start_rule)
3951@<Private incomplete structures@> = struct s_AHFA_state;
3952@ @<Private structures@> =
3953struct s_AHFA_state_key {
3954    Marpa_AHFA_State_ID t_id;
3955};
3956struct s_AHFA_state {
3957    struct s_AHFA_state_key t_key;
3958    struct s_AHFA_state* t_empty_transition;
3959    @<Widely aligned AHFA state elements@>@;
3960    @<Int aligned AHFA state elements@>@;
3961    guint t_has_completed_start_rule:1;
3962    @<Bit aligned AHFA elements@>@;
3963};
3964typedef struct s_AHFA_state AHFA_Object;
3965
3966@*0 Complete Symbols Container.
3967@ @d Complete_SYMIDs_of_AHFA(state) ((state)->t_complete_symbols)
3968@d LV_Complete_SYMIDs_of_AHFA(state) Complete_SYMIDs_of_AHFA(state)
3969@d Complete_SYM_Count_of_AHFA(state) ((state)->t_complete_symbol_count)
3970@d LV_Complete_SYM_Count_of_AHFA(state) Complete_SYM_Count_of_AHFA(state)
3971@<Int aligned AHFA state elements@> =
3972guint t_complete_symbol_count;
3973@ @<Widely aligned AHFA state elements@> =
3974SYMID* t_complete_symbols;
3975
3976@*0 AHFA Item Container.
3977@ @d AIMs_of_AHFA(ahfa) ((ahfa)->t_items)
3978@d AIM_of_AHFA_by_AEX(ahfa, aex) (AIMs_of_AHFA(ahfa)[aex])
3979@d LV_AIMs_of_AHFA(ahfa) AIMs_of_AHFA(ahfa)
3980@d AIM_Count_of_AHFA(ahfa) ((ahfa)->t_item_count)
3981@d LV_AIM_Count_of_AHFA(ahfa) AIM_Count_of_AHFA(ahfa)
3982@d AEX_of_AHFA_by_AIM(ahfa, aim) aex_of_ahfa_by_aim_get((ahfa), (aim))
3983@<Widely aligned AHFA state elements@> =
3984AIM* t_items;
3985@ @<Int aligned AHFA state elements@> =
3986guint t_item_count;
3987@ This function assumes that the caller knows that the AHFA item
3988is in the AHFA state.
3989@<Private function prototypes@> =
3990static inline AEX aex_of_ahfa_by_aim_get(AHFA ahfa, AIM aim_sought);
3991@ Binary search is overkill for discovered states,
3992not even repaying the overhead.
3993But prediction states can get larger,
3994and the overhead is always low.
An alternative is to have different search routines based on the number
of AHFA items, but that is more overhead.
3997Perhaps better to just search than
3998to spend cycles figuring out how to search.
3999@<Function definitions@> =
4000static inline AEX aex_of_ahfa_by_aim_get(AHFA ahfa, AIM sought_aim)
4001{
4002    AIM* const aims = AIMs_of_AHFA(ahfa);
4003    gint aim_count = AIM_Count_of_AHFA(ahfa);
4004    gint hi = aim_count - 1;
4005    gint lo = 0;
4006    while (hi >= lo) { // A binary search
4007       gint trial_aex = lo+(hi-lo)/2; // guards against overflow
4008       AIM trial_aim = aims[trial_aex];
4009       if (trial_aim == sought_aim) return trial_aex;
4010       if (trial_aim < sought_aim) {
4011           lo = trial_aex+1;
4012       } else {
4013           hi = trial_aex-1;
4014       }
4015  }
4016  return -1;
4017}
4018
4019@*0 Is AHFA Predicted?.
4020@ This boolean indicates whether the
4021{\bf AHFA state} is predicted,
4022as opposed to whether it contains any predicted
4023AHFA items.
This makes a difference in AHFA state 0.
When the null parse is allowed,
AHFA state 0 will contain an AHFA item
4027which is {\bf both} a prediction
4028and a completion.
4029AHFA state 0 is, however, {\bf never}
4030a predicted AHFA state.
4031@d AHFA_is_Predicted(ahfa) ((ahfa)->t_is_predict)
4032@d LV_AHFA_is_Predicted(ahfa) AHFA_is_Predicted(ahfa)
4033@d EIM_is_Predicted(eim) AHFA_is_Predicted(AHFA_of_EIM(eim))
4034@<Bit aligned AHFA elements@> =
4035guint t_is_predict:1;
4036
4037@ @<Private typedefs@> =
4038typedef struct s_AHFA_state* AHFA;
4039typedef gint AHFAID;
4040
4041@ @<Widely aligned grammar elements@> = struct s_AHFA_state* t_AHFA;
4042@
4043@d AHFA_Count_of_G(g) ((g)->t_AHFA_len)
4044@<Int aligned grammar elements@> = gint t_AHFA_len;
4045@ @<Initialize grammar elements@> =
4046g->t_AHFA = NULL;
4047AHFA_Count_of_G(g) = 0;
4048@*0 Destructor.
4049@<Destroy grammar elements@> = if (g->t_AHFA) {
4050AHFAID id;
4051for (id = 0; id < AHFA_Count_of_G(g); id++) {
4052   AHFA ahfa_state = AHFA_of_G_by_ID(g, id);
4053   @<Free AHFA state@>@;
4054}
4055STOLEN_DQUEUE_DATA_FREE(g->t_AHFA);
4056}
4057
4058@ Most of the data is on the obstack, and will be freed with that.
4059@<Free AHFA state@> = {
4060  TRANS *ahfa_transitions = LV_TRANSs_of_AHFA (ahfa_state);
4061  if (ahfa_transitions)
4062    g_free (TRANSs_of_AHFA (ahfa_state));
4063}
4064
4065@*0 ID of AHFA State.
4066@d ID_of_AHFA(state) ((state)->t_key.t_id)
4067
4068@*0 Validate AHFA ID.
4069Check that AHFA ID is in valid range.
4070@<Function definitions@> =
4071static inline gint AHFA_state_id_is_valid(
4072const struct marpa_g *g, AHFAID AHFA_state_id) {
4073return AHFA_state_id < AHFA_Count_of_G(g) && AHFA_state_id >= 0;
4074}
4075@ @<Private function prototypes@> =
4076static inline gint AHFA_state_id_is_valid(
4077const struct marpa_g *g, AHFAID AHFA_state_id);
4078
4079
4080@*0 Postdot Symbols.
4081@d Postdot_SYM_Count_of_AHFA(state) ((state)->t_postdot_sym_count)
4082@d LV_Postdot_SYM_Count_of_AHFA(state) Postdot_SYM_Count_of_AHFA(state)
4083@d Postdot_SYMID_Ary_of_AHFA(state) ((state)->t_postdot_symid_ary)
4084@d LV_Postdot_SYMID_Ary_of_AHFA(state) Postdot_SYMID_Ary_of_AHFA(state)
4085@<Widely aligned AHFA state elements@> = Marpa_Symbol_ID* t_postdot_symid_ary;
4086@ @<Int aligned AHFA state elements@> = guint t_postdot_sym_count;
4087
4088@*0 AHFA State External Accessors.
4089@<Function definitions@> =
4090guint marpa_AHFA_state_count(struct marpa_g* g) {
4091    return AHFA_Count_of_G(g);
4092}
4093@ @<Public function prototypes@> =
4094guint marpa_AHFA_state_count(struct marpa_g* g);
4095
4096@ @<Function definitions@> =
4097gint
4098marpa_AHFA_state_item_count(struct marpa_g* g, AHFAID AHFA_state_id)
4099{ @<Return |-2| on failure@>@/
4100    AHFA state;
4101    @<Fail if grammar not precomputed@>@/
4102    @<Fail if grammar |AHFA_state_id| is invalid@>@/
4103    state = AHFA_of_G_by_ID(g, AHFA_state_id);
4104    return state->t_item_count;
4105}
4106@ @<Public function prototypes@> =
4107gint marpa_AHFA_state_item_count(struct marpa_g* g, Marpa_AHFA_State_ID AHFA_state_id);
4108
4109@ @<Public function prototypes@> =
4110Marpa_AHFA_Item_ID marpa_AHFA_state_item(struct marpa_g* g,
4111     Marpa_AHFA_State_ID AHFA_state_id,
4112	guint item_ix);
4113@ @d AIMID_of_AHFA_by_AEX(g, ahfa, aex)
4114   ((ahfa)->t_items[aex] - (g)->t_AHFA_items)
4115@<Function definitions@> =
4116Marpa_AHFA_Item_ID marpa_AHFA_state_item(struct marpa_g* g,
4117     AHFAID AHFA_state_id,
4118	guint item_ix) {
4119    AHFA state;
4120    @<Return |-2| on failure@>@/
4121    @<Fail if grammar not precomputed@>@/
4122    @<Fail if grammar |AHFA_state_id| is invalid@>@/
4123    state = AHFA_of_G_by_ID(g, AHFA_state_id);
4124    if (item_ix >= state->t_item_count) {
4125	g_context_clear(g);
4126	g_context_int_add(g, "item_ix", (gint)item_ix);
4127	g_context_int_add(g, "AHFA_state_id", AHFA_state_id);
4128	g->t_error = "invalid state item ix";
4129	return failure_indicator;
4130    }
4131    return AIMID_of_AHFA_by_AEX(g, state, item_ix);
4132}
4133
4134@ @<Function definitions@> =
4135gint marpa_AHFA_state_is_predict(struct marpa_g* g,
4136	AHFAID AHFA_state_id) {
4137    AHFA state;
4138    @<Return |-2| on failure@>@/
4139    @<Fail if grammar not precomputed@>@/
4140    @<Fail if grammar |AHFA_state_id| is invalid@>@/
4141    state = AHFA_of_G_by_ID(g, AHFA_state_id);
4142    return AHFA_is_Predicted(state);
4143}
4144@ @<Public function prototypes@> =
4145gint marpa_AHFA_state_is_predict(struct marpa_g* g,
4146	Marpa_AHFA_State_ID AHFA_state_id);
4147
4148@*0 Completed Start Rule.
This external accessor returns the rule ID of
4150the completed start rule of an AHFA state.
4151Most often there is none, in which case
4152|-1| is returned.
4153For other failures, |-2| is returned.
4154@ @<Public function prototypes@> =
4155Marpa_Rule_ID marpa_AHFA_completed_start_rule(struct marpa_g* g,
4156	Marpa_AHFA_State_ID AHFA_state_id);
@ I know that the completed start rule in this AHFA state is
4158unique, via the following theorem.
4159\Theorem/ No AHFA state contains more than one completed start rule.
4160\Proof/: As proved elsewhere in this document,
4161an AHFA state with a completed start rule is either AHFA state 0
4162or a 1-item discovered AHFA state.
4163Clearly the AHFA item which is the completed start rule is
4164unique in a 1-item AHFA state.
4165From its construction we know that
4166AHFA state 0 contains at most two rules:
4167a predicted non-null start rule
4168and a predicted null start rule.
4169A predicted non-null rule is not a completed rule.
4170Therefore only the predicted null start rule
4171can be a completed start rule in AHFA state 0.
4172\QED/.
4173@
4174{\bf To Do}: @^To Do@>
4175This function can probably be eliminated after conversion
4176is complete, along with the flag for whether a rule is a start rule
4177and the flag for tracking whether an AHFA has a completed start rule.
4178
4179@<Function definitions@> =
4180Marpa_Rule_ID marpa_AHFA_completed_start_rule(struct marpa_g* g,
4181	Marpa_AHFA_State_ID AHFA_state_id) {
4182    const gint no_completed_start_rule = -1;
4183    @<Return |-2| on failure@>@;
4184    AHFA state;
4185    @<Fail if grammar not precomputed@>@;
4186    @<Fail if grammar |AHFA_state_id| is invalid@>@;
4187    state = AHFA_of_G_by_ID (g, AHFA_state_id);
4188    if (AHFA_has_Completed_Start_Rule(state)) {
4189	const gint ahfa_item_count = state->t_item_count;
4190	const AIM* ahfa_items = state->t_items;
4191	gint ahfa_ix;
4192	for (ahfa_ix = 0; ahfa_ix < ahfa_item_count; ahfa_ix++)
4193	  {
4194	    const AIM ahfa_item = ahfa_items[ahfa_ix];
4195	    if (AIM_is_Completion (ahfa_item))
4196	      {
4197		const RULE rule = RULE_of_AIM (ahfa_item);
4198		if (RULE_is_Start (rule))
4199		  return ID_of_RULE (rule);
4200	      }
4201	  }
4202      @<Fail with internal grammar error@>@;
4203  }
4204  return no_completed_start_rule;
4205}
4206
4207@*0 Leo LHS Symbol.
4208The Leo LHS symbol is the LHS of the AHFA state's rule,
4209if that state can be a Leo completion.
4210Otherwise it is |-1|.
4211The value of the Leo completion symbol is used to
4212determine if an Earley item
4213with this AHFA state is eligible to be a Leo completion.
4214@d Leo_LHS_ID_of_AHFA(state) ((state)->t_leo_lhs_sym)
4215@d LV_Leo_LHS_ID_of_AHFA(state) Leo_LHS_ID_of_AHFA(state)
4216@d AHFA_is_Leo_Completion(state) (Leo_LHS_ID_of_AHFA(state) >= 0)
4217@ @<Int aligned AHFA state elements@> = SYMID t_leo_lhs_sym;
4218@ @<Public function prototypes@> =
4219Marpa_Symbol_ID marpa_AHFA_state_leo_lhs_symbol(struct marpa_g* g,
4220	Marpa_AHFA_State_ID AHFA_state_id);
4221@ @<Function definitions@> =
4222Marpa_Symbol_ID marpa_AHFA_state_leo_lhs_symbol(struct marpa_g* g,
4223	Marpa_AHFA_State_ID AHFA_state_id) {
4224    @<Return |-2| on failure@>@;
4225    AHFA state;
4226    @<Fail if grammar not precomputed@>@;
4227    @<Fail if grammar |AHFA_state_id| is invalid@>@;
4228    state = AHFA_of_G_by_ID(g, AHFA_state_id);
4229    return Leo_LHS_ID_of_AHFA(state);
4230}
4231
4232@*0 Internal Accessors.
4233@ The ordering of the AHFA states can be arbitrarily chosen
4234to be efficient to compute.
4235The only requirement is that states with identical sets
4236of items compare equal.
4237Here the length is the first subkey, because
4238that will be enough to order most predicted states.
4239The discovered states will be efficient to compute because
4240they will tend either to be short,
4241or quickly differentiated
4242by length.
4243\par
4244Note that this function is not used for discovered AHFA states of
4245size 1.
4246Checking those for duplicates is optimized, using an array
4247indexed by the ID of their only AHFA item.
4248@<Private function prototypes@> =
4249static gint AHFA_state_cmp(gconstpointer a, gconstpointer b);
4250@ @<Function definitions@> =
4251static gint AHFA_state_cmp(
4252    gconstpointer ap,
4253    gconstpointer bp)
4254{
4255    guint i;
4256    AIM* items_a;
4257    AIM* items_b;
4258    const AHFA state_a = (AHFA)ap;
4259    const AHFA state_b = (AHFA)bp;
4260    guint length = state_a->t_item_count;
    gint subkey = length - state_b->t_item_count;
    if (subkey) return subkey; /* From here on the lengths are equal */
4264    items_a = state_a->t_items;
4265    items_b = state_b->t_items;
4266    for (i = 0; i < length; i++) {
4267    subkey = Sort_Key_of_AIM (items_a[i]) - Sort_Key_of_AIM (items_b[i]);
4268   if (subkey) return subkey;
4269}
4270return 0;
4271}
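@ A minimal sketch of the same ordering, length first and then
item by item, is given here for illustration.
The type |State| and the function |state_cmp| are hypothetical,
and plain integers stand in for the sort keys of the AHFA items.
Unlike the function above, the sketch uses explicit comparisons
rather than subtraction, which is a choice of the sketch and not
a claim about what the library should do.

```c
#include <stddef.h>

/* Hypothetical simplified state: a count and an array of sort keys. */
typedef struct {
    size_t count;
    const int *keys;
} State;

/* Order by length first, then by the first differing key.
   Returns <0, 0 or >0 in the manner of a qsort comparator. */
static int state_cmp(const State *a, const State *b)
{
    size_t i;
    if (a->count != b->count)
        return a->count < b->count ? -1 : 1;
    for (i = 0; i < a->count; i++) {
        if (a->keys[i] != b->keys[i])
            return a->keys[i] < b->keys[i] ? -1 : 1;
    }
    return 0;
}
```

Two states compare equal exactly when they have the same length
and the same keys in the same order, which is the one property the
duplicate check needs.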
4272
4273@*0 AHFA State Mutators.
4274@ @<Private function prototypes@> =
4275PRIVATE_NOT_INLINE void create_AHFA_states(struct marpa_g* g);
4276@ @<Function definitions@> =
4277PRIVATE_NOT_INLINE
4278void create_AHFA_states(struct marpa_g* g) {
4279    @<Declare locals for creating AHFA states@>@;
4280    @<Initialize locals for creating AHFA states@>@;
4281   @<Construct prediction matrix@>@;
4282   @<Construct initial AHFA states@>@;
4283   while ((p_working_state = DQUEUE_NEXT(states, AHFA_Object))) {
4284       @<Process an AHFA state from the working stack@>@;
4285   }
4286   ahfas_of_g = g->t_AHFA = DQUEUE_BASE(states, AHFA_Object); /* ``Steals"
4287       the |DQUEUE|'s data */
4288   ahfa_count_of_g = AHFA_Count_of_G(g) = DQUEUE_END(states);
4289   @<Resize the transitions@>@;
4290   @<Resort the AIMs and populate the Leo base AEXes@>@;
4291   @<Populate the completed symbol data in the transitions@>@;
4292   @<Free locals for creating AHFA states@>@;
4293}
4294
4295@ @<Declare locals for creating AHFA states@> =
4296   AHFA p_working_state;
4297   const guint initial_no_of_states = 2*Size_of_G(g);
4298   AIM AHFA_item_0_p = g->t_AHFA_items;
4299   const guint symbol_count_of_g = SYM_Count_of_G(g);
4300   const guint rule_count_of_g = RULE_Count_of_G(g);
4301   Bit_Matrix prediction_matrix;
4302   RULE* rule_by_sort_key = g_new(RULE, rule_count_of_g);
4303    GTree* duplicates;
4304    AHFA* singleton_duplicates;
4305   DQUEUE_DECLARE(states);
4306  struct obstack ahfa_work_obs;
4307  gint ahfa_count_of_g;
4308  AHFA ahfas_of_g;
4309
4310@ @<Initialize locals for creating AHFA states@> =
4311    @<Initialize duplicates data structures@>@;
4312   DQUEUE_INIT(states, AHFA_Object, initial_no_of_states);
4313
4314@ @<Initialize duplicates data structures@> =
4315{
4316  guint item_id;
4317  guint no_of_items_in_grammar = AIM_Count_of_G (g);
4318  obstack_init(&ahfa_work_obs);
4319  duplicates = g_tree_new (AHFA_state_cmp);
4320  singleton_duplicates = g_new (AHFA, no_of_items_in_grammar);
4321  for (item_id = 0; item_id < no_of_items_in_grammar; item_id++)
4322    {
4323      singleton_duplicates[item_id] = NULL;	// All zero bits are not necessarily a NULL pointer
4324    }
4325}
4326
4327@ @<Process an AHFA state from the working stack@> = {
4328guint no_of_items = p_working_state->t_item_count;
4329guint current_item_ix=0;
4330AIM*item_list;
4331Marpa_Symbol_ID working_symbol;
4332item_list = p_working_state->t_items;
4333working_symbol = Postdot_SYMID_of_AIM(item_list[0]); /*
4334    Every AHFA has at least one item */
4335if (working_symbol < 0) goto NEXT_AHFA_STATE; /*
4336    All items in this state are completions */
4337    while (1) { /* Loop over all items for this state */
4338	guint first_working_item_ix = current_item_ix;
4339	guint no_of_items_in_new_state;
4340	for (current_item_ix++;
4341		current_item_ix < no_of_items;
4342		current_item_ix++) {
4343	    if (Postdot_SYMID_of_AIM(item_list[current_item_ix]) != working_symbol) break;
4344	}
4345	no_of_items_in_new_state = current_item_ix - first_working_item_ix;
4346	if (no_of_items_in_new_state == 1) {
4347	    @<Create a 1-item discovered AHFA state@>@/
4348	} else {
4349	    @<Create a discovered AHFA state with 2+ items@>@/
4350	}
4351	NEXT_WORKING_SYMBOL: ;
4352	if (current_item_ix >= no_of_items) break;
4353	working_symbol = Postdot_SYMID_of_AIM(item_list[current_item_ix]);
4354	if (working_symbol < 0) break;
4355    }@#
4356NEXT_AHFA_STATE: ;
4357}
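@ The grouping performed by the loop above can be sketched in isolation.
The sketch assumes, as the loop itself appears to,
that items with the same postdot symbol are adjacent
and that completions (negative postdot symbol) come last.
The function |group_by_symbol| and its arguments are hypothetical;
each run it reports corresponds to one discovered state.

```c
#include <stddef.h>

/* Count the runs of equal, non-negative symbols at the front of syms.
   A negative value marks a completion; because completions sort last
   in this sketch, the first one ends the scan.
   run_lengths receives the length of each run and needs room for n entries. */
static size_t group_by_symbol(const int *syms, size_t n, size_t *run_lengths)
{
    size_t runs = 0;
    size_t ix = 0;
    while (ix < n && syms[ix] >= 0) {
        size_t first = ix;
        int working = syms[ix];
        for (ix++; ix < n; ix++) {
            if (syms[ix] != working) break;
        }
        run_lengths[runs++] = ix - first;
    }
    return runs;
}
```

A run of length 1 corresponds to the 1-item case, and any longer run
to the case with 2 or more items.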
4358
4359@ @<Resize the transitions@> =
4360{
4361     gint ahfa_id;
4362     for (ahfa_id = 0; ahfa_id < ahfa_count_of_g; ahfa_id++) {
4363	  guint symbol_id;
4364	  AHFA ahfa = AHFA_of_G_by_ID(g, ahfa_id);
4365          TRANS* const transitions = TRANSs_of_AHFA(ahfa);
4366	  for (symbol_id = 0; symbol_id < symbol_count_of_g; symbol_id++) {
4367	       TRANS working_transition = transitions[symbol_id];
4368	       if (working_transition) {
4369		   gint completion_count = Completion_Count_of_TRANS(working_transition);
4370		   gint sizeof_transition =
4371		       G_STRUCT_OFFSET (struct s_transition, t_aex) + completion_count *
4372		       sizeof (transitions[0]->t_aex[0]);
4373		   TRANS new_transition = obstack_alloc(&g->t_obs, sizeof_transition);
4374		   LV_To_AHFA_of_TRANS(new_transition) = To_AHFA_of_TRANS(working_transition);
4375		   LV_Completion_Count_of_TRANS(new_transition) = 0;
4376		   transitions[symbol_id] = new_transition;
4377	       }
4378	  }
4379	}
4380}
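@ The size calculation used when resizing can be illustrated
with a small stand-alone sketch.
The type |Transition| and both functions are hypothetical,
and |malloc| stands in for the obstack allocation.
The sketch uses a C99 flexible array member,
which is an assumption of the sketch and not necessarily how
the real structure is declared.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical transition record with a trailing variable-length array. */
typedef struct {
    int to_state;
    int completion_count;
    int aex[]; /* holds as many entries as were allocated */
} Transition;

/* Bytes needed for a record holding count trailing entries. */
static size_t transition_size(int count)
{
    return offsetof(Transition, aex) + (size_t)count * sizeof(int);
}

/* Allocate a right-sized record with its count reset to zero,
   so that a later pass can refill the entries one at a time. */
static Transition *transition_resized(int to_state, int count)
{
    Transition *t = malloc(transition_size(count));
    if (!t) return NULL;
    t->to_state = to_state;
    t->completion_count = 0;
    return t;
}
```

Resetting the count to zero mirrors the code above, where the
count is rebuilt as the completion data is populated.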
4381
4382@ @<Populate the completed symbol data in the transitions@> =
4383{
4384     gint ahfa_id;
4385     for (ahfa_id = 0; ahfa_id < ahfa_count_of_g; ahfa_id++) {
4386	  const AHFA ahfa = AHFA_of_G_by_ID(g, ahfa_id);
4387          TRANS* const transitions = TRANSs_of_AHFA(ahfa);
4388	  if (Complete_SYM_Count_of_AHFA(ahfa) > 0) {
4389	      AIM* aims = AIMs_of_AHFA(ahfa);
4390	      gint aim_count = AIM_Count_of_AHFA(ahfa);
4391	      AEX aex;
4392	      for (aex = 0; aex < aim_count; aex++) {
4393		  AIM ahfa_item = aims[aex];
4394		  if (AIM_is_Completion(ahfa_item)) {
4395		      SYMID completed_symbol_id = LHS_ID_of_AIM(ahfa_item);
4396		      TRANS transition = transitions[completed_symbol_id];
4397		      AEX* aexes = AEXs_of_TRANS(transition);
4398		      gint aex_ix = LV_Completion_Count_of_TRANS(transition)++;
4399MARPA_OFF_DEBUG4("Added completion aex at %d for ahfa_id=%d sym=%d",
4400    aex_ix, ahfa_id, completed_symbol_id);
4401		      aexes[aex_ix] = aex;
4402		  }
4403	      }
4404	  }
4405     }
4406}
4407
4408@ For every AHFA item which can be a Leo base, and any transition
4409(or postdot) symbol that leads to a Leo completion, put the AEX
4410into the |TRANS| structure, for memoization.
4411@<Resort the AIMs and populate the Leo base AEXes@> =
4412{
4413  gint ahfa_id;
4414  for (ahfa_id = 0; ahfa_id < ahfa_count_of_g; ahfa_id++)
4415    {
4416      AHFA ahfa = AHFA_of_G_by_ID(g, ahfa_id);
4417      TRANS* const transitions = TRANSs_of_AHFA(ahfa);
4418      AIM *aims = AIMs_of_AHFA (ahfa);
4419      gint aim_count = AIM_Count_of_AHFA (ahfa);
4420      AEX aex;
      g_qsort_with_data(aims, aim_count, sizeof (AIM), cmp_by_aimid, NULL);
4422      for (aex = 0; aex < aim_count; aex++)
4423	{
4424	  AIM ahfa_item = aims[aex];
4425	  SYMID postdot = Postdot_SYMID_of_AIM (ahfa_item);
4426	  if (postdot >= 0)
4427	    {
4428	      TRANS transition = transitions[postdot];
4429	      AHFA to_ahfa = To_AHFA_of_TRANS (transition);
4430	      if (!AHFA_is_Leo_Completion (to_ahfa))
4431		continue;
4432	      Leo_Base_AEX_of_TRANS (transition) = aex;
4433	    }
4434	}
4435    }
4436}
4437
4438@ @<Free locals for creating AHFA states@> =
4439   g_free(rule_by_sort_key);
4440   matrix_free(prediction_matrix);
4441    @<Free duplicates data structures@>@;
4442     obstack_free(&ahfa_work_obs, NULL);
4443
4444@ @<Free duplicates data structures@> =
4445g_free(singleton_duplicates);
4446g_tree_destroy(duplicates);
4447
4448@ @<Construct initial AHFA states@> = {
4449   AHFA p_initial_state = DQUEUE_PUSH(states, AHFA_Object);@/
4450   Marpa_Rule_ID start_rule_id;
4451   AIM start_item;
4452   SYM start_symbol = SYM_by_ID(g->t_start_symid);
4453   SYM start_alias
4454       = symbol_null_alias(start_symbol);
4455    gint no_of_items_in_new_state = start_alias ? 2 : 1;
4456    AIM* item_list
4457	= obstack_alloc(&g->t_obs, no_of_items_in_new_state*sizeof(AIM));
4458    start_rule_id = g_array_index(start_symbol->t_lhs, Marpa_Rule_ID, 0); /* The start rule
4459	is the unique rule that has the start symbol as its LHS */
4460    start_item = g->t_AHFA_items_by_rule[start_rule_id]; /* The start item is the
4461       initial item for the start rule */
4462    item_list[0] = start_item;
4463    if (start_alias) {
4464       Marpa_Rule_ID alias_rule_id
4465	    = g_array_index(start_alias->t_lhs, Marpa_Rule_ID, 0); /* Start alias
4466	    rule is the unique rule that has
4467	   the start alias as its LHS */
4468	item_list[1] = g->t_AHFA_items_by_rule[alias_rule_id];
4469    }
4470    p_initial_state->t_items = item_list;
4471    p_initial_state->t_item_count = no_of_items_in_new_state;
4472    p_initial_state->t_key.t_id = 0;
4473    LV_AHFA_is_Predicted(p_initial_state) = 0;
4474    LV_Leo_LHS_ID_of_AHFA(p_initial_state) = -1;
4475    LV_TRANSs_of_AHFA(p_initial_state) = transitions_new(g);
4476    p_initial_state->t_empty_transition = NULL;
4477    if (SYM_is_Nulling(start_symbol))
4478      {				// Special case the null parse
4479	SYMID* complete_symids = obstack_alloc (&g->t_obs, sizeof (SYMID));
4480	SYMID completed_symbol_id = ID_of_SYM(start_symbol);
4481	*complete_symids = completed_symbol_id;
4482	completion_count_inc (&ahfa_work_obs, p_initial_state, completed_symbol_id);
4483	LV_Complete_SYMIDs_of_AHFA(p_initial_state) = complete_symids;
4484	LV_Complete_SYM_Count_of_AHFA(p_initial_state) = 1;
4485	p_initial_state->t_has_completed_start_rule = 1;
4486	LV_Postdot_SYM_Count_of_AHFA(p_initial_state) = 0;
4487      }
4488    else
4489      {
4490	SYMID* postdot_symbol_ids;
4491	LV_Postdot_SYM_Count_of_AHFA(p_initial_state) = 1;
4492	postdot_symbol_ids = LV_Postdot_SYMID_Ary_of_AHFA(p_initial_state) =
4493	  obstack_alloc (&g->t_obs, sizeof (SYMID));
4494	*postdot_symbol_ids = Postdot_SYMID_of_AIM(start_item);
4495	if (start_alias)
4496	  {
4497	    SYMID* complete_symids = obstack_alloc (&g->t_obs, sizeof (SYMID));
4498	    SYMID completed_symbol_id = ID_of_SYM(start_alias);
4499	    *complete_symids = completed_symbol_id;
4500	    completion_count_inc(&ahfa_work_obs, p_initial_state, completed_symbol_id);
4501	    LV_Complete_SYMIDs_of_AHFA(p_initial_state) = complete_symids;
4502	    LV_Complete_SYM_Count_of_AHFA(p_initial_state) = 1;
4503	    p_initial_state->t_has_completed_start_rule = 1;
4504	  }
4505	else
4506	  {
4507	    LV_Complete_SYM_Count_of_AHFA(p_initial_state) = 0;
4508	    p_initial_state->t_has_completed_start_rule = 0;
4509	  }
4510	    p_initial_state->t_empty_transition =
4511	    create_predicted_AHFA_state (g,
4512			     matrix_row (prediction_matrix,
4513					 (guint)
4514					 Postdot_SYMID_of_AIM (start_item)),
4515			     rule_by_sort_key, &states, duplicates);
4516      }
4517}
4518
4519@* Discovered AHFA States.
4520@ {\bf Theorem}:
4521An AHFA state that contains a start rule completion is either
4522AHFA state 0 or a 1-item discovered state.
4523{\bf Proof}:
4524AHFA state 0 contains a start rule completion in any grammar
4525for which the null parse is valid.
4526AHFA state 0 also contains the non-null parse predicted rule.
4527\par
4528The grammar is augmented,
4529so that no other rule predicts the start rules.
4530This means that AHFA state 0 will contain the only predicted
4531start rules.
4532The form of the non-null predicted start rule
4533is $S' \leftarrow \cdot S$,
4534where $S'$ is the augmented start symbol and $S$ was
4535the start symbol in the original grammar.
4536This rule will be the only transition out of AHFA state 0.
4537Call the to-state of this transition, state $n$.
4538State $n$ will clearly contain a completed start rule
4539( $S' \leftarrow S \cdot$ ),
which will be the rule for the only AHFA item in AHFA state $n$.
4541\par
4542Since only state 0 contains
4543$S' \leftarrow \cdot S$,
4544only AHFA state $n$ will contain
4545$S' \leftarrow S \cdot$.
4546Therefore all AHFA states containing start rule completions
4547are either AHFA state 0, or 1-item discovered AHFA states.
4548{\bf QED}.
4549@<Create a 1-item discovered AHFA state@> = {
4550    AHFA p_new_state;
4551    AIM* new_state_item_list;
4552    AIM single_item_p = item_list[first_working_item_ix];
4553    Marpa_AHFA_Item_ID single_item_id;
4554    Marpa_Symbol_ID postdot;
4555    single_item_p++;		// Transition to next item for this rule
4556    single_item_id = single_item_p - AHFA_item_0_p;
4557    p_new_state = singleton_duplicates[single_item_id];
4558    if (p_new_state)
4559      {				/* Do not add, this is a duplicate */
4560	transition_add (&ahfa_work_obs, p_working_state, working_symbol, p_new_state);
4561	goto NEXT_WORKING_SYMBOL;
4562      }
4563    p_new_state = DQUEUE_PUSH (states, AHFA_Object);
4564    /* Create a new AHFA state */
4565    singleton_duplicates[single_item_id] = p_new_state;
4566    new_state_item_list = p_new_state->t_items =
4567	obstack_alloc (&g->t_obs, sizeof (AIM));
4568    new_state_item_list[0] = single_item_p;
4569    p_new_state->t_item_count = 1;
4570    LV_AHFA_is_Predicted(p_new_state) = 0;
4571    if (AIM_has_Completed_Start_Rule(single_item_p)) {
4572	p_new_state->t_has_completed_start_rule = 1;
4573    } else {
4574	p_new_state->t_has_completed_start_rule = 0;
4575    }
4576    LV_Leo_LHS_ID_of_AHFA(p_new_state) = -1;
4577    p_new_state->t_key.t_id = p_new_state - DQUEUE_BASE (states, AHFA_Object);
4578    LV_TRANSs_of_AHFA(p_new_state) = transitions_new(g);
4579    transition_add (&ahfa_work_obs, p_working_state, working_symbol, p_new_state);
4580    postdot = Postdot_SYMID_of_AIM(single_item_p);
4581    if (postdot >= 0)
4582      {
4583	LV_Complete_SYM_Count_of_AHFA(p_new_state) = 0;
4584	p_new_state->t_postdot_sym_count = 1;
4585	p_new_state->t_postdot_symid_ary =
4586	  obstack_alloc (&g->t_obs, sizeof (SYMID));
4587	*(p_new_state->t_postdot_symid_ary) = postdot;
4588    /* If the sole item is not a completion
4589     attempt to create a predicted AHFA state as well */
4590	p_new_state->t_empty_transition =
4591	  create_predicted_AHFA_state (g,
4592				       matrix_row (prediction_matrix,
4593						   (guint) postdot),
4594				       rule_by_sort_key, &states, duplicates);
4595      }
4596    else
4597      {
4598	SYMID lhs_id = LHS_ID_of_AIM(single_item_p);
4599	SYMID* complete_symids = obstack_alloc (&g->t_obs, sizeof (SYMID));
4600	*complete_symids = lhs_id;
4601	LV_Complete_SYMIDs_of_AHFA(p_new_state) = complete_symids;
4602	completion_count_inc(&ahfa_work_obs, p_new_state, lhs_id);
4603	LV_Complete_SYM_Count_of_AHFA(p_new_state) = 1;
4604	p_new_state->t_postdot_sym_count = 0;
4605	p_new_state->t_empty_transition = NULL;
4606	@<If this state can be a Leo completion,
4607	set the Leo completion symbol to |lhs_id|@>@;
4608  }
4609}
4610
4611@
4612Assuming this is a 1-item completion, mark this state as
4613a Leo completion if the last non-nulling symbol is on a LHS.
(This eliminates rules which end in a terminal-only symbol from
consideration in the Leo logic.)
We know that there is a non-nulling symbol, because there is
one in every non-nulling rule, the only nulling rule will
be in AHFA state 0, and AHFA state 0 is
handled as a special case.
\par
As a note, the current logic makes an item a Leo completion
if the last non-nulling symbol is on a LHS.
4623With a bit more trouble, I could determine
4624which rules are right-recursive.
4625I would need to compute a transitive closure on the relationship
4626``X right-derives Y" and then consider a state to be
4627a Leo completion
4628only if the LHS of the rule in its only item right-derives its
4629last non-nulling symbol.
4630
4631@ The expression below takes the first (and only) item in
4632the current state, and finds its closest previous non-nulling
4633symbol.
4634This will be the postdot symbol of the AHFA item just prior,
4635which can be found by simply decrementing the pointer.
4636If the predot symbol of an item is on the LHS of any rule,
4637then that state is a Leo completion.
4638@<If this state can be a Leo completion,
4639set the Leo completion symbol to |lhs_id|@> = {
4640  AIM previous_ahfa_item = single_item_p - 1;
4641  SYMID predot_symid = Postdot_SYMID_of_AIM(previous_ahfa_item);
4642  if (SYMBOL_LHS_RULE_COUNT (SYM_by_ID (predot_symid))
4643      > 0)
4644    {
4645	LV_Leo_LHS_ID_of_AHFA(p_new_state) = lhs_id;
4646    }
4647}
4648
4649@ Discovered AHFA states are usually quite small
4650and the insertion sort here is probably optimal for the usual cases.
4651It is $O(n^2)$ for the large AHFA states, but at present there is
4652little value in coding for such cases.
4653Average complexity -- probably $O(1)$.
4654Implemented worst-case complexity: $O(n^2)$.
4655Theoretical complexity: $O(n \log n)$, because another sort can easily be
4656substituted for the insertion sort.
4657\par
4658Note the mixture of indexing and old-fashioned pointer twiddling
4659in the insertion sort.
4660I am usually of the opinion that the pointer twiddling should be left
4661to the optimizer, but in this case I think that a little bit of
4662pointer twiddling actually makes the code clearer than it would
4663be if written 100\% using indexes.
4664@<Create a discovered AHFA state with 2+ items@> = {
4665AHFA p_new_state;
4666guint predecessor_ix;
4667guint no_of_new_items_so_far = 0;
4668AIM* item_list_for_new_state;
4669AHFA queued_AHFA_state;
4670p_new_state = DQUEUE_PUSH(states, AHFA_Object);
4671item_list_for_new_state = p_new_state->t_items = obstack_alloc(&g->t_obs_tricky,
4672    no_of_items_in_new_state * sizeof(AIM));
4673p_new_state->t_item_count = no_of_items_in_new_state;
4674for (predecessor_ix = first_working_item_ix;
4675     predecessor_ix < current_item_ix; predecessor_ix++)
4676  {
4677    gint pre_insertion_point_ix = no_of_new_items_so_far - 1;
4678    AIM new_item_p = item_list[predecessor_ix] + 1;	// Transition to the next item
4679    while (pre_insertion_point_ix >= 0)
4680      {				// Insert the new item, ordered by |sort_key|
4681	AIM *current_item_pp =
4682	  item_list_for_new_state + pre_insertion_point_ix;
4683	if (Sort_Key_of_AIM (new_item_p) >=
4684	    Sort_Key_of_AIM (*current_item_pp))
4685	  break;
4686	*(current_item_pp + 1) = *current_item_pp;
4687	pre_insertion_point_ix--;
4688      }
4689    item_list_for_new_state[pre_insertion_point_ix + 1] = new_item_p;
4690    no_of_new_items_so_far++;
4691  }
4692queued_AHFA_state = assign_AHFA_state(p_new_state, duplicates);
4693if (queued_AHFA_state)
4694  {				// The new state would be a duplicate
4695// Back it out and go on to the next in the queue
4696    (void) DQUEUE_POP (states, AHFA_Object);
4697    obstack_free (&g->t_obs_tricky, item_list_for_new_state);
4698    transition_add (&ahfa_work_obs, p_working_state, working_symbol, queued_AHFA_state);
4699    /* |transition_add()| allocates obstack memory, but uses the
4700       ``non-tricky" obstack */
4701    goto NEXT_WORKING_SYMBOL;
4702  }
4703    // If we added the new state, finish up its data.
4704    p_new_state->t_key.t_id = p_new_state - DQUEUE_BASE(states, AHFA_Object);
4705    LV_AHFA_is_Predicted(p_new_state) = 0;
4706    p_new_state->t_has_completed_start_rule = 0;
4707    LV_Leo_LHS_ID_of_AHFA(p_new_state) =-1;
4708    LV_TRANSs_of_AHFA(p_new_state) = transitions_new(g);
4709    @<Calculate complete and postdot symbols for discovered state@>@/
4710    transition_add(&ahfa_work_obs, p_working_state, working_symbol, p_new_state);
4711    @<Calculate the predicted rule vector for this state
4712        and add the predicted AHFA state@>@/
4713}
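@ For comparison, here is a sketch of the same insertion step
written entirely with indexes.
The function |insert_sorted| is hypothetical and plain integers
stand in for the sort keys.
It is meant only to show the shape of the algorithm,
not to replace the code above.

```c
#include <stddef.h>

/* Insert key into the sorted prefix out[0..*count-1], keeping it sorted,
   and bump *count. Equal keys keep their arrival order. */
static void insert_sorted(int *out, size_t *count, int key)
{
    size_t pos = *count;
    while (pos > 0 && out[pos - 1] > key) {
        out[pos] = out[pos - 1]; /* shift larger entries right */
        pos--;
    }
    out[pos] = key;
    (*count)++;
}
```

Each call costs time proportional to the number of entries shifted,
which gives the quadratic worst case mentioned above when the
input arrives in descending order.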
4714
4715@ @<Calculate complete and postdot symbols for discovered state@> =
4716{
4717  guint symbol_count = SYM_Count_of_G (g);
4718  guint item_ix;
4719  guint no_of_postdot_symbols;
4720  guint no_of_complete_symbols;
4721  Bit_Vector complete_v = bv_create (symbol_count);
4722  Bit_Vector postdot_v = bv_create (symbol_count);
4723  for (item_ix = 0; item_ix < no_of_items_in_new_state; item_ix++)
4724    {
4725      AIM item = item_list_for_new_state[item_ix];
4726      Marpa_Symbol_ID postdot = Postdot_SYMID_of_AIM (item);
4727      if (postdot < 0)
4728	{
4729	  gint complete_symbol_id = LHS_ID_of_AIM (item);
4730	  completion_count_inc (&ahfa_work_obs, p_new_state, complete_symbol_id);
4731	  bv_bit_set (complete_v, (guint)complete_symbol_id );
4732	}
4733      else
4734	{
4735	  bv_bit_set (postdot_v, (guint) postdot);
4736	}
4737    }
4738if ((no_of_postdot_symbols = p_new_state->t_postdot_sym_count =
4739     bv_count (postdot_v)))
4740  {
4741    guint min, max, start;
4742    Marpa_Symbol_ID *p_symbol = p_new_state->t_postdot_symid_ary =
4743      obstack_alloc (&g->t_obs,
4744		     no_of_postdot_symbols * sizeof (SYMID));
4745    for (start = 0; bv_scan (postdot_v, start, &min, &max); start = max + 2)
4746      {
4747	Marpa_Symbol_ID postdot;
4748	for (postdot = (Marpa_Symbol_ID) min;
4749	     postdot <= (Marpa_Symbol_ID) max; postdot++)
4750	  {
4751	    *p_symbol++ = postdot;
4752	  }
4753      }
4754  }
4755    if ((no_of_complete_symbols =
4756	 LV_Complete_SYM_Count_of_AHFA (p_new_state) = bv_count (complete_v)))
4757      {
4758	guint min, max, start;
4759	SYMID *complete_symids = obstack_alloc (&g->t_obs,
4760						no_of_complete_symbols *
4761						sizeof (SYMID));
4762	SYMID *p_symbol = complete_symids;
4763	LV_Complete_SYMIDs_of_AHFA (p_new_state) = complete_symids;
4764	for (start = 0; bv_scan (complete_v, start, &min, &max); start = max + 2)
4765	  {
4766	    SYMID complete_symbol_id;
4767	    for (complete_symbol_id = (SYMID) min; complete_symbol_id <= (SYMID) max;
4768		 complete_symbol_id++)
4769	      {
4770		*p_symbol++ = complete_symbol_id;
4771	      }
4772	  }
4773    }
4774    bv_free (postdot_v);
4775    bv_free (complete_v);
4776}
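@ The scanning loops above advance by |max + 2| after each run.
The following sketch shows why that step is safe.
The functions |scan_run| and |collect_set| are hypothetical,
and an array of one flag per byte stands in for the bit vector;
the real |bv_scan| may differ in its details.

```c
#include <stddef.h>

/* Find the next run of set flags at or after start.
   On success store the inclusive bounds in *min and *max and return 1;
   return 0 when no set flag remains. bits holds n flags, one per element. */
static int scan_run(const unsigned char *bits, size_t n, size_t start,
                    size_t *min, size_t *max)
{
    size_t i = start;
    while (i < n && !bits[i]) i++;
    if (i >= n) return 0;
    *min = i;
    while (i + 1 < n && bits[i + 1]) i++;
    *max = i;
    return 1;
}

/* Write the index of every set flag into out, in ascending order. */
static size_t collect_set(const unsigned char *bits, size_t n, size_t *out)
{
    size_t count = 0, start, min, max;
    /* Flag max + 1 is clear or out of range, so resume at max + 2. */
    for (start = 0; scan_run(bits, n, start, &min, &max); start = max + 2) {
        size_t i;
        for (i = min; i <= max; i++) out[count++] = i;
    }
    return count;
}
```

Because the runs are visited in ascending order, the collected
symbol IDs come out sorted without any separate sorting step.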
4777
4778@ Find the AHFA state in the argument,
4779creating it if it does not exist.
4780When it does not exist, insert it
4781in the sequence of states
4782and return |NULL|.
4783When it does exist, return a pointer to it.
4784@ @<Private function prototypes@> =
4785static inline AHFA assign_AHFA_state(
4786AHFA state_p, GTree* duplicates);
4787@ @<Function definitions@> =
4788static inline AHFA
4789assign_AHFA_state (AHFA sought_state, GTree* duplicates)
4790{
4791  const AHFA state_found = g_tree_lookup(duplicates, sought_state);
4792  if (state_found) return state_found;
4793  g_tree_insert(duplicates, sought_state, sought_state);
4794  return NULL;
4795}
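@ The return convention of |assign_AHFA_state| can be
illustrated with a small sketch.
The type |Registry| and the function |assign_key| are hypothetical,
and a linear array stands in for the balanced tree.
What matters is the convention: |NULL| means that the candidate was new
and has been recorded, while a non-null result points at the
entry that already existed.

```c
#include <stddef.h>

#define TABLE_MAX 16

/* Hypothetical registry keyed by an int; a linear array stands in for the tree. */
typedef struct {
    int keys[TABLE_MAX];
    size_t count;
} Registry;

/* Return a pointer to the stored key if it was already registered.
   Otherwise register it and return NULL, which tells the caller
   that its candidate is new and should be kept.
   The caller must not register more than TABLE_MAX distinct keys. */
static const int *assign_key(Registry *reg, int key)
{
    size_t i;
    for (i = 0; i < reg->count; i++) {
        if (reg->keys[i] == key) return &reg->keys[i];
    }
    reg->keys[reg->count++] = key;
    return NULL;
}
```

A caller that receives a non-null result backs out its own
candidate and uses the returned entry instead.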
4796
4797@ @<Calculate the predicted rule vector for this state
4798and add the predicted AHFA state@> = {
4799guint item_ix;
4800Marpa_Symbol_ID postdot = -1; // Initialized to prevent GCC warning
4801for (item_ix = 0; item_ix < no_of_items_in_new_state; item_ix++) {
4802    postdot = Postdot_SYMID_of_AIM(item_list_for_new_state[item_ix]);
4803    if (postdot >= 0) break;
4804}
4805p_new_state->t_empty_transition = NULL;
4806if (postdot >= 0)
4807{				/* If any item is not a completion ... */
4808  Bit_Vector predicted_rule_vector
4809    = bv_shadow (matrix_row (prediction_matrix, (guint) postdot));
4810  for (item_ix = 0; item_ix < no_of_items_in_new_state; item_ix++)
4811    {
4812      /* ``or" the other non-complete items into the prediction rule vector */
4813      postdot = Postdot_SYMID_of_AIM (item_list_for_new_state[item_ix]);
4814      if (postdot < 0)
4815	continue;
4816      bv_or_assign (predicted_rule_vector,
4817		    matrix_row (prediction_matrix, (guint) postdot));
4818    }
4819  /* Add the predicted rule */
4820  p_new_state->t_empty_transition = create_predicted_AHFA_state (g,
4821			 predicted_rule_vector,
4822			 rule_by_sort_key,
4823			 &states,
4824			 duplicates);
4825  bv_free (predicted_rule_vector);
4826}
4827}
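@ The calculation above is a union of matrix rows:
one row for each postdot symbol in the state.
The following sketch shows the idea on a single machine word.
It is an illustration only.
The name |predicted_rules| is hypothetical,
and a 32-bit word stands in for the |Bit_Vector| rows of the prediction matrix.

```c
#include <assert.h>
#include <stdint.h>

/* row_of_symbol[s] holds one bit per predictable rule:
   bit k is set if symbol s predicts the rule with sort key k.
   postdot[i] is the postdot symbol of item i, or -1 for a completion.
   Returns the union of the rows of all the postdot symbols. */
static uint32_t
predicted_rules (const uint32_t *row_of_symbol,
                 const int *postdot, int item_count)
{
  uint32_t vector = 0;
  int ix;
  for (ix = 0; ix < item_count; ix++)
    {
      if (postdot[ix] < 0)
        continue;               /* completions predict nothing */
      vector |= row_of_symbol[postdot[ix]];
    }
  return vector;
}
```

A result of zero corresponds to the case where no predicted state is needed.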
4828
4829@*0 Predicted AHFA States.
4830The method for building predicted AHFA states is optimized using
4831precomputed bit vectors.
This should be very fast,
but it is possible that other methods might
be better, at least in some cases.  The bit vectors are $O(s)$ in length, where $s$ is the
size of the grammar, and so is the time complexity of the method used.
4836@ It may be possible to look at a list of
4837only the AHFA items actually present in each state,
4838which might be $O(\log s)$ in the average case.  An advantage of the bit vectors is they
4839implicitly perform a radix sort.
4840This would have to be performed explicitly for an enumerated
4841list of AHFA items, making the putative average case $O(\log s \cdot \log \log s)$.
@ In the worst case, however, the number of AHFA items in the predicted states is
$O(s)$, making the time complexity
of a list solution $O(s \cdot \log s)$.
4845In normal cases,
4846the practical advantages of bit vectors are overwhelming and swamp the theoretical
4847time complexity.
4848The advantage of listing AHFA items is restricted to a putative ``average" case,
4849and even there would not kick in until the grammars became very large.
4850My conclusion is that alternatives to the bit vector implementation deserve
4851further investigation, but that at present, and overall,
4852bit vectors appear clearly superior to the alternatives.
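@ The implicit sort mentioned above can be seen in a small sketch.
This is not the |bv_scan| routine used in this file.
The helper |next_run| is hypothetical, works on a single 64-bit word,
and is written on the assumption that a scan reports each maximal run
of set bits as an inclusive interval.
Under that assumption the bit just after the end of a run is known to be clear,
which is presumably why the scanning loops in this file resume at |max + 2|.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: find the first run of set bits at or after start.
   On success store its inclusive bounds in *min and *max and return 1.
   Return 0 if no bit at or after start is set. */
static int
next_run (uint64_t bits, unsigned start, unsigned *min, unsigned *max)
{
  unsigned ix = start;
  while (ix < 64 && !((bits >> ix) & 1u))
    ix++;
  if (ix >= 64)
    return 0;
  *min = ix;
  while (ix + 1 < 64 && ((bits >> (ix + 1)) & 1u))
    ix++;
  *max = ix;
  return 1;
}

/* Set one bit per ID, then read the IDs back by scanning runs.
   The output is sorted and free of duplicates, whatever the input order.
   Returns the number of IDs written to out. */
static unsigned
sorted_unique (const unsigned *ids, unsigned count, unsigned *out)
{
  uint64_t bits = 0;
  unsigned ix, start, min, max, written = 0;
  for (ix = 0; ix < count; ix++)
    {
      assert (ids[ix] < 64);    /* the sketch handles one word only */
      bits |= (uint64_t) 1 << ids[ix];
    }
  for (start = 0; next_run (bits, start, &min, &max); start = max + 2)
    {
      unsigned id;
      for (id = min; id <= max; id++)
        out[written++] = id;
    }
  return written;
}
```

No comparison of IDs takes place anywhere: the ordering falls out of the bit positions.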
4853@ For the predicted states, I construct a symbol-by-rule matrix
4854of predictions.  First, I determine which symbols directly predict
4855others.  Then I compute the transitive closure.
4856Finally, I convert this to a symbol-by-rule matrix.
4857The symbol-by-rule matrix will be used in constructing the prediction
4858states.
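@ The closure step can be illustrated on a toy grammar.
The sketch below is not the |transitive_closure| routine called in this file,
whose implementation is not shown here.
It uses Warshall's algorithm on a small fixed-size array,
and the names |N_SYM| and |close_predictions| are hypothetical.

```c
#include <assert.h>

#define N_SYM 4                 /* hypothetical toy grammar size */

/* On entry, predicts[a][b] is nonzero if symbol a directly predicts b.
   On return, it is nonzero if a predicts b through any chain. */
static void
close_predictions (unsigned char predicts[N_SYM][N_SYM])
{
  int via, from, to;
  for (via = 0; via < N_SYM; via++)
    for (from = 0; from < N_SYM; from++)
      {
        if (!predicts[from][via])
          continue;
        for (to = 0; to < N_SYM; to++)
          if (predicts[via][to])
            predicts[from][to] = 1;
      }
}
```

If symbol 0 directly predicts symbol 1, and symbol 1 directly predicts symbol 2,
then after the call symbol 0 also predicts symbol 2.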
4859
4860@ @<Construct prediction matrix@> = {
4861    Bit_Matrix symbol_by_symbol_matrix =
4862	matrix_create (symbol_count_of_g, symbol_count_of_g);
4863    @<Initialize the symbol-by-symbol matrix@>@/
4864    transitive_closure(symbol_by_symbol_matrix);
4865    @<Create the prediction matrix from the symbol-by-symbol matrix@>@/
4866    matrix_free(symbol_by_symbol_matrix);
4867}
4868
4869@ @<Initialize the symbol-by-symbol matrix@> =
4870{
4871  RULEID rule_id;
4872  SYMID symid;
4873  AIM *items_by_rule = g->t_AHFA_items_by_rule;
4874  for (symid = 0; symid < (SYMID) symbol_count_of_g; symid++)
4875    {
4876      /* If a symbol appears on a LHS, it predicts itself. */
4877      SYM symbol = SYM_by_ID (symid);
4878      if (!SYMBOL_LHS_RULE_COUNT (symbol))
4879	continue;
4880      matrix_bit_set (symbol_by_symbol_matrix, (guint) symid, (guint) symid);
4881    }
4882  for (rule_id = 0; rule_id < (RULEID) rule_count_of_g; rule_id++)
4883    {
4884      SYMID from, to;
4885      /* Get the initial item for the rule */
4886      AIM item = items_by_rule[rule_id];
4887      /* Not all rules have items */
4888      if (!item)
4889	continue;
4890      from = LHS_ID_of_AIM (item);
4891      to = Postdot_SYMID_of_AIM (item);
4892      /* There is no symbol-to-symbol transition for a completion item */
4893      if (to < 0)
4894	continue;
4895      /* Set a bit in the matrix */
4896      matrix_bit_set (symbol_by_symbol_matrix, (guint) from, (guint) to);
4897    }
4898}
4899
4900@ At this point I have a full matrix showing which symbol implies a prediction
4901of which others.  To save repeated processing when building the AHFA prediction states,
4902I now convert it into a matrix from symbols to the rules they predict.
4903Specifically, if symbol |S1| predicts symbol |S2|, then symbol |S1|
4904predicts every rule
4905with |S2| on its LHS.
4906@<Create the prediction matrix from the symbol-by-symbol matrix@> = {
4907    AIM* items_by_rule = g->t_AHFA_items_by_rule;
4908    SYMID from_symid;
4909    guint* sort_key_by_rule_id = g_new(guint, rule_count_of_g);
4910    guint no_of_predictable_rules = 0;
4911    @<Populate |sort_key_by_rule_id| with first pass value;
4912	calculate |no_of_predictable_rules|@>@/
4913    @<Populate |rule_by_sort_key|@>@/
4914    @<Populate |sort_key_by_rule_id| with second pass value@>@/
4915    @<Populate the prediction matrix@>@/
4916    g_free(sort_key_by_rule_id);
4917}
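@ The conversion from symbols to rules can be sketched as follows.
This is an illustration under simplifying assumptions:
the columns are indexed directly by rule ID,
whereas the code in this file indexes them by sort key
and skips rules which are not predictable.
The names |symbols_to_rules|, |N_SYM| and |N_RULE| are hypothetical.

```c
#include <assert.h>

#define N_SYM 3                 /* hypothetical toy sizes */
#define N_RULE 4

/* closure[a][b]: symbol a predicts symbol b (already transitively closed).
   lhs_of_rule[r]: the LHS symbol of rule r.
   On return, by_rule[a][r] is nonzero if symbol a predicts rule r. */
static void
symbols_to_rules (unsigned char closure[N_SYM][N_SYM],
                  const int lhs_of_rule[N_RULE],
                  unsigned char by_rule[N_SYM][N_RULE])
{
  int from, rule;
  for (from = 0; from < N_SYM; from++)
    for (rule = 0; rule < N_RULE; rule++)
      {
        int lhs = lhs_of_rule[rule];
        assert (lhs >= 0 && lhs < N_SYM);
        by_rule[from][rule] = closure[from][lhs];
      }
}
```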
4918
4919@ For creating prediction AHFA states, we need to have an ordering of rules
4920by their postdot symbol.
4921A ``predictable rule" is one whose initial item has a postdot symbol.
4922The following facts hold:
4923\li A rule is predictable iff it is both used and non-nulling.
4924\li A rule is predictable iff it is a used rule which is not the nulling start rule.
4925\li A rule is predictable iff it has any item with a postdot symbol.
4926\par
4927Here we take a first pass at this, letting the value be the postdot symbol for
4928the predictable rules.
4929|G_MAXINT| is used for the others, so that they will sort high.
4930(|G_MAXINT| is used and not |G_MAXUINT|, because the sort routines
4931work with signed values.)
4932This first pass fully captures the order, but
our final result needs to be a unique ID for every ``predictable rule",
4934so that it can be used as the index in a bit vector.
4935@<Populate |sort_key_by_rule_id| with first pass value;
4936calculate |no_of_predictable_rules|@> =
4937{
4938  RULEID rule_id;
4939  for (rule_id = 0; rule_id < (RULEID) rule_count_of_g; rule_id++)
4940    {
4941      AIM item = items_by_rule[rule_id];
4942      SYMID postdot;
4943      if (!item)
4944	goto NOT_A_PREDICTABLE_RULE;
4945      postdot = Postdot_SYMID_of_AIM (item);
4946      if (postdot < 0)
4947	goto NOT_A_PREDICTABLE_RULE;
4948      sort_key_by_rule_id[rule_id] = postdot;
4949      no_of_predictable_rules++;
4950      continue;
4951    NOT_A_PREDICTABLE_RULE:
4952      sort_key_by_rule_id[rule_id] = G_MAXINT;
4953    }
4954}
4955
4956@ @<Populate |rule_by_sort_key|@> =
4957{
4958  RULEID rule_id;
4959  for (rule_id = 0; rule_id < (RULEID) rule_count_of_g; rule_id++)
4960    {
4961      rule_by_sort_key[rule_id] = RULE_by_ID (g, rule_id);
4962    }
4963  g_qsort_with_data (rule_by_sort_key, (gint)rule_count_of_g,
4964		     sizeof (RULE), cmp_by_rule_sort_key,
4965		     (gpointer) sort_key_by_rule_id);
4966}
4967
4968@ @<Function definitions@> = static gint
4969cmp_by_rule_sort_key(gconstpointer ap,
4970	gconstpointer bp, gpointer user_data) {
4971    RULE a = *(RULE*)ap;
4972    RULE b = *(RULE*)bp;
4973    guint* sort_key_by_rule_id = (guint*)user_data;
4974    Marpa_Rule_ID a_id = a->t_id;
4975    Marpa_Rule_ID b_id = b->t_id;
4976    guint sort_key_a = sort_key_by_rule_id[a_id];
4977    guint sort_key_b = sort_key_by_rule_id[b_id];
4978    if (sort_key_a == sort_key_b) return a_id - b_id;
4979    return sort_key_a - sort_key_b;
4980}
4981@ @<Private function prototypes@> = static
4982gint cmp_by_rule_sort_key(gconstpointer ap,
4983	gconstpointer bp, gpointer user_data);
4984
4985@ We have now sorted the rules into the final sort key order.
4986With this final version of the sort keys,
4987populate the index from rule id to sort key.
4988@<Populate |sort_key_by_rule_id| with second pass value@> =
4989{
4990  guint sort_key;
4991  for (sort_key = 0; sort_key < rule_count_of_g; sort_key++)
4992    {
4993      RULE rule = rule_by_sort_key[sort_key];
4994      sort_key_by_rule_id[rule->t_id] = sort_key;
4995    }
4996}
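@ The two passes can be followed end to end in a small sketch.
It is written under assumptions and is not the code above:
standard |qsort| replaces |g_qsort_with_data|,
so the first-pass keys live in a file-scope array,
rule IDs are sorted instead of |RULE| pointers,
and |INT_MAX| plays the part of |G_MAXINT|.
The names |assign_sort_keys|, |cmp_rule_ids| and |N_RULE| are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define N_RULE 5                /* hypothetical toy size */

/* First-pass keys: the postdot symbol of the rule's initial item,
   or INT_MAX for a rule which is not predictable.
   File scope, because standard qsort() has no user-data argument. */
static int first_pass_key[N_RULE];

static int
cmp_rule_ids (const void *ap, const void *bp)
{
  int a = *(const int *) ap;
  int b = *(const int *) bp;
  if (first_pass_key[a] != first_pass_key[b])
    return first_pass_key[a] < first_pass_key[b] ? -1 : 1;
  return (a > b) - (a < b);     /* break ties by rule ID */
}

/* postdot[r] is the postdot symbol of rule r, or -1 if it has none.
   On return, rule_by_key[k] is the rule whose final sort key is k,
   and key_by_rule[r] is the final sort key of rule r.
   Returns the number of predictable rules; their keys come first. */
static int
assign_sort_keys (const int postdot[N_RULE],
                  int rule_by_key[N_RULE], int key_by_rule[N_RULE])
{
  int rule, key, predictable = 0;
  for (rule = 0; rule < N_RULE; rule++)
    {
      if (postdot[rule] >= 0)
        {
          first_pass_key[rule] = postdot[rule];
          predictable++;
        }
      else
        first_pass_key[rule] = INT_MAX;
      rule_by_key[rule] = rule;
    }
  qsort (rule_by_key, N_RULE, sizeof rule_by_key[0], cmp_rule_ids);
  for (key = 0; key < N_RULE; key++)
    key_by_rule[rule_by_key[key]] = key;
  return predictable;
}
```

Because the unpredictable rules sort high,
a final key is a valid bit vector index exactly when it is less than the returned count.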
4997
4998@ @<Populate the prediction matrix@> =
4999{
5000  prediction_matrix = matrix_create (symbol_count_of_g, no_of_predictable_rules);
5001  for (from_symid = 0; from_symid < (SYMID) symbol_count_of_g;
5002       from_symid++)
5003    {
5004      // for every row of the symbol-by-symbol matrix
5005      guint min, max, start;
5006      for (start = 0;
5007	   bv_scan (matrix_row
5008		    (symbol_by_symbol_matrix, (guint) from_symid), start,
5009		    &min, &max); start = max + 2)
5010	{
5011	  Marpa_Symbol_ID to_symid;
5012	  for (to_symid = min; to_symid <= (Marpa_Symbol_ID) max;
5013	       to_symid++)
5014	    {
5015	      // for every predicted symbol
5016	      SYM to_symbol = SYM_by_ID (to_symid);
5017	      GArray *lhs_rules = to_symbol->t_lhs;
5018	      guint ix, no_of_lhs_rules = lhs_rules->len;
5019	      for (ix = 0; ix < no_of_lhs_rules; ix++)
5020		{
5021		  // For every rule with that symbol on its LHS
5022		  Marpa_Rule_ID rule_with_this_lhs_symbol =
5023		    g_array_index (lhs_rules, Marpa_Rule_ID, ix);
5024		  guint sort_key =
5025		    sort_key_by_rule_id[rule_with_this_lhs_symbol];
5026		  if (sort_key >= no_of_predictable_rules)
5027		    continue;	/*
5028				   We only need to predict rules which have items */
5029		  matrix_bit_set (prediction_matrix, (guint) from_symid,
5030				  sort_key);
5031		  // Set the $(symbol, rule sort key)$ bit in the matrix
5032		}
5033	    }
5034	}
5035    }
5036}
5037
5038@ @<Private function prototypes@> =
5039static AHFA
5040create_predicted_AHFA_state(
5041     struct marpa_g* g,
5042     Bit_Vector prediction_rule_vector,
5043     RULE* rule_by_sort_key,
5044     DQUEUE states_p,
5045     GTree* duplicates
5046     );
5047@ @<Function definitions@> =
5048static AHFA
5049create_predicted_AHFA_state(
5050     struct marpa_g* g,
5051     Bit_Vector prediction_rule_vector,
5052     RULE* rule_by_sort_key,
5053     DQUEUE states_p,
5054     GTree* duplicates
5055     ) {
5056AIM* item_list_for_new_state;
5057AHFA p_new_state;
5058guint item_list_ix = 0;
5059guint no_of_items_in_new_state = bv_count( prediction_rule_vector);
5060	if (no_of_items_in_new_state == 0) return NULL;
5061item_list_for_new_state = obstack_alloc (&g->t_obs,
5062	       no_of_items_in_new_state * sizeof (AIM));
5063{
5064  guint start, min, max;
5065  for (start = 0; bv_scan (prediction_rule_vector, start, &min, &max);
5066       start = max + 2)
5067    {				// Scan the prediction rule vector again, this time to populate the list
5068      guint rule_sort_key;
5069      for (rule_sort_key = min; rule_sort_key <= max; rule_sort_key++)
5070	{
5071	  /* Add the initial item for the predicted rule */
5072	  RULE rule = rule_by_sort_key[rule_sort_key];
5073	  item_list_for_new_state[item_list_ix++] =
5074	    g->t_AHFA_items_by_rule[rule->t_id];
5075	}
5076    }
5077}
5078p_new_state = DQUEUE_PUSH((*states_p), AHFA_Object);@/
5079    p_new_state->t_items = item_list_for_new_state;
5080    p_new_state->t_item_count = no_of_items_in_new_state;
5081    { AHFA queued_AHFA_state = assign_AHFA_state(p_new_state, duplicates);
5082        if (queued_AHFA_state) {
5083		 /* The new state would be a duplicate.
5084		 Back it out and return the one that already exists */
5085	    (void)DQUEUE_POP((*states_p), AHFA_Object);
5086	    obstack_free(&g->t_obs, item_list_for_new_state);
5087	    return queued_AHFA_state;
5088	}
5089    }
5090    // The new state was added -- finish up its data
5091    p_new_state->t_key.t_id = p_new_state - DQUEUE_BASE((*states_p), AHFA_Object);
5092    LV_AHFA_is_Predicted(p_new_state) = 1;
5093    p_new_state->t_has_completed_start_rule = 0;
5094    LV_Leo_LHS_ID_of_AHFA(p_new_state) = -1;
5095    p_new_state->t_empty_transition = NULL;
5096    LV_TRANSs_of_AHFA(p_new_state) = transitions_new(g);
5097    LV_Complete_SYM_Count_of_AHFA(p_new_state) = 0;
5098    @<Calculate postdot symbols for predicted state@>@/
5099    return p_new_state;
5100}
5101
5102@ @<Calculate postdot symbols for predicted state@> =
5103{
5104  guint symbol_count = SYM_Count_of_G (g);
5105  guint item_ix;
5106  guint no_of_postdot_symbols;
5107  Bit_Vector postdot_v = bv_create (symbol_count);
5108    for (item_ix = 0; item_ix < no_of_items_in_new_state; item_ix++)
5109      {
5110	AIM item = item_list_for_new_state[item_ix];
5111	SYMID postdot = Postdot_SYMID_of_AIM (item);
5112	if (postdot >= 0)
5113	  bv_bit_set (postdot_v, (guint) postdot);
5114      }
5115    if ((no_of_postdot_symbols = p_new_state->t_postdot_sym_count =
5116     bv_count (postdot_v)))
5117  {
5118    guint min, max, start;
5119    Marpa_Symbol_ID *p_symbol = p_new_state->t_postdot_symid_ary =
5120      obstack_alloc (&g->t_obs,
5121		     no_of_postdot_symbols * sizeof (SYMID));
5122    for (start = 0; bv_scan (postdot_v, start, &min, &max); start = max + 2)
5123      {
5124	Marpa_Symbol_ID postdot;
5125	for (postdot = (Marpa_Symbol_ID) min;
5126	     postdot <= (Marpa_Symbol_ID) max; postdot++)
5127	  {
5128	    *p_symbol++ = postdot;
5129	  }
5130      }
5131  }
5132    bv_free (postdot_v);
5133}
5134
5135@** Transition (TRANS) Code.
5136This code deals with data which is accessed
5137as a function of AHFA state and symbol.
5138The most important data
5139of this type are the AHFA state transitions,
5140which is why the per-AHFA-per-symbol data is called
5141``transition" data.
5142But per-AHFA symbol completion data is also
5143a function of AHFA state and symbol.
5144@ This operation is at the heart of the parse engine,
5145and worth a careful look.
5146Speed is probably optimal.
5147Time complexity is fine --- $O(1)$ in the length of the input.
@ But this solution is very space-intensive,
perhaps $O(\v g\v^2)$.
Ordinarily, for code which is executed this heavily,
I would worry about a speed versus space tradeoff of this kind.
But these arrays are extremely sparse.
Many rows of the array have only one or two entries.
5154There are alternatives
5155which save a lot of space in return for a small overhead in time.
5156@ A very similar problem has been the subject of considerable
5157study---%
5158LALR and LR(0) state tables.
5159These also index by state and symbol, and their usage is very
5160similar to that expected for the AHFA lookups.
5161@ Bison's solution is probably worth study.
5162This is a kind of perfect hashing, and quite complex.
5163I do wonder if it would not be over-engineering
5164in the libmarpa context.
In practical applications, a binary search, or even
a linear search,
may be the fastest implementation for
the average case.
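@ The binary search alternative mentioned above might look like the following.
This is a sketch of one possible compact layout, not something this file implements.
The names |sparse_entry| and |sparse_lookup| are hypothetical,
and state IDs stand in for the |AHFA| pointers of the real table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compact form of one row: only the symbols which
   actually have a transition, sorted by symbol ID. */
struct sparse_entry { int symid; int to_state; };

/* Return the target state for symid, or -1 if there is no transition. */
static int
sparse_lookup (const struct sparse_entry *row, size_t count, int symid)
{
  size_t lo = 0, hi = count;    /* search the half-open range [lo, hi) */
  assert (row != NULL || count == 0);
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (row[mid].symid == symid)
        return row[mid].to_state;
      if (row[mid].symid < symid)
        lo = mid + 1;
      else
        hi = mid;
    }
  return -1;
}
```

A row with one or two entries costs one or two comparisons here,
against one indexed load in the present 2-dimensional array.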
5169@ The trend is for memory to get cheap,
5170favoring the sparse 2-dimensional array
5171which is the present solution.
5172But I expect the trend will also be for grammars to get larger.
This would be a good issue to run some benchmarks on,
once I stabilize the C code implementation.
5175
5176@d TRANS_of_AHFA_by_SYMID(from_ahfa, id)
5177    (*(TRANSs_of_AHFA(from_ahfa)+(id)))
5178@d TRANS_of_EIM_by_SYMID(eim, id) TRANS_of_AHFA_by_SYMID(AHFA_of_EIM(eim), (id))
5179@d To_AHFA_of_TRANS(trans) (to_ahfa_of_transition_get(trans))
5180@d LV_To_AHFA_of_TRANS(trans) ((trans)->t_ur.t_to_ahfa)
5181@d Completion_Count_of_TRANS(trans)
5182    (completion_count_of_transition_get(trans))
5183@d LV_Completion_Count_of_TRANS(trans) ((trans)->t_ur.t_completion_count)
5184@d To_AHFA_of_AHFA_by_SYMID(from_ahfa, id)
5185     (To_AHFA_of_TRANS(TRANS_of_AHFA_by_SYMID((from_ahfa), (id))))
@d Completion_Count_of_AHFA_by_SYMID(from_ahfa, id)
     (Completion_Count_of_TRANS(TRANS_of_AHFA_by_SYMID((from_ahfa), (id))))
5188@d To_AHFA_of_EIM_by_SYMID(eim, id) To_AHFA_of_AHFA_by_SYMID(AHFA_of_EIM(eim), (id))
5189@d AEXs_of_TRANS(trans) ((trans)->t_aex)
5190@d Leo_Base_AEX_of_TRANS(trans) ((trans)->t_leo_base_aex)
5191@ @s TRANS int
5192@<Private incomplete structures@> =
5193struct s_transition;
5194typedef struct s_transition* TRANS;
5195struct s_ur_transition;
5196typedef struct s_ur_transition* URTRANS;
5197@ @<Private typedefs@> = typedef gint AEX;
5198@ @<Private structures@> =
5199struct s_ur_transition {
5200    AHFA t_to_ahfa;
5201    gint t_completion_count;
5202};
5203struct s_transition {
5204    struct s_ur_transition t_ur;
5205    AEX t_leo_base_aex;
5206    AEX t_aex[1];
5207};
5208@ @d TRANSs_of_AHFA(ahfa) ((ahfa)->t_transitions)
5209@d LV_TRANSs_of_AHFA(ahfa) TRANSs_of_AHFA(ahfa)
5210@<Widely aligned AHFA state elements@> =
5211    TRANS* t_transitions;
5212@ @<Private function prototypes@> =
5213static inline AHFA to_ahfa_of_transition_get(TRANS transition);
5214@ @<Function definitions@> =
5215static inline AHFA to_ahfa_of_transition_get(TRANS transition) {
5216     if (!transition) return NULL;
5217     return transition->t_ur.t_to_ahfa;
5218}
5219@ @<Private function prototypes@> =
5220static inline gint completion_count_of_transition_get(TRANS transition);
5221@ @<Function definitions@> =
5222static inline gint completion_count_of_transition_get(TRANS transition) {
5223     if (!transition) return 0;
5224     return transition->t_ur.t_completion_count;
5225}
5226
5227@ @<Private function prototypes@> =
5228static inline
5229URTRANS transition_new(struct obstack *obstack, AHFA to_ahfa, gint aim_ix);
5230@ @<Function definitions@> =
5231static inline
5232URTRANS transition_new(struct obstack *obstack, AHFA to_ahfa, gint aim_ix) {
5233     URTRANS transition;
5234     transition = obstack_alloc (obstack, sizeof (transition[0]));
5235     transition->t_to_ahfa = to_ahfa;
5236     transition->t_completion_count = aim_ix;
5237     return transition;
5238}
5239
5240@ @<Private function prototypes@> = static inline
5241TRANS* transitions_new(struct marpa_g* g);
5242@ @<Function definitions@> = static inline
5243TRANS* transitions_new(struct marpa_g* g) {
5244    gint symbol_count = SYM_Count_of_G(g);
5245    gint symid = 0;
5246    TRANS* transitions;
5247    transitions = g_malloc(symbol_count * sizeof(transitions[0]));
5248    while (symid < symbol_count) transitions[symid++] = NULL; /*
5249        |g_malloc0| will not work because NULL is not guaranteed
5250	to be a bitwise zero. */
5251    return transitions;
5252}
5253
5254@ @<Private function prototypes@> =
5255static inline
5256void transition_add(struct obstack *obstack, AHFA from_ahfa, SYMID symid, AHFA to_ahfa);
5257@ @<Function definitions@> =
5258static inline
5259void transition_add(struct obstack *obstack, AHFA from_ahfa, SYMID symid, AHFA to_ahfa)
5260{
5261    TRANS* transitions = TRANSs_of_AHFA(from_ahfa);
5262    TRANS transition = transitions[symid];
5263    if (!transition) {
5264        transitions[symid] = (TRANS)transition_new(obstack, to_ahfa, 0);
5265	return;
5266    }
5267    LV_To_AHFA_of_TRANS(transition) = to_ahfa;
5268    return;
5269}
5270
5271@ @<Private function prototypes@> =
5272static inline
5273void completion_count_inc(struct obstack *obstack, AHFA from_ahfa, SYMID symid);
5274@ @<Function definitions@> =
5275static inline
5276void completion_count_inc(struct obstack *obstack, AHFA from_ahfa, SYMID symid)
5277{
5278    TRANS* transitions = TRANSs_of_AHFA(from_ahfa);
5279    TRANS transition = transitions[symid];
5280    if (!transition) {
5281        transitions[symid] = (TRANS)transition_new(obstack, NULL, 1);
5282	return;
5283    }
5284    LV_Completion_Count_of_TRANS(transition)++;
5285    return;
5286}
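@ The two functions above share one pattern:
the per-symbol entry is created the first time either kind of data arrives,
and updated in place afterwards.
The sketch below shows that pattern in isolation.
It makes simplifying assumptions: |malloc| replaces the obstack,
a state ID of |-1| replaces the |NULL| target,
and the names |entry|, |entry_for|, |set_transition| and |count_completion|
are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define N_SYM 8                 /* hypothetical symbol count */

struct entry { int to_state; int completion_count; };

/* Return the entry for symid, creating an empty one on first use. */
static struct entry *
entry_for (struct entry **row, int symid)
{
  assert (symid >= 0 && symid < N_SYM);
  if (!row[symid])
    {
      struct entry *fresh = malloc (sizeof *fresh);
      if (!fresh)
        abort ();
      fresh->to_state = -1;     /* no transition recorded yet */
      fresh->completion_count = 0;
      row[symid] = fresh;
    }
  return row[symid];
}

/* Record a transition without disturbing the completion count. */
static void
set_transition (struct entry **row, int symid, int to_state)
{
  entry_for (row, symid)->to_state = to_state;
}

/* Count a completion without disturbing the transition. */
static void
count_completion (struct entry **row, int symid)
{
  entry_for (row, symid)->completion_count++;
}
```

Symbols which never receive either kind of data keep a |NULL| slot and cost no entry.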
5287
5288@*0 Trace Functions.
5289@<Public function prototypes@> =
5290gint marpa_AHFA_state_transitions(struct marpa_g* g,
5291    Marpa_AHFA_State_ID AHFA_state_id,
5292    GArray *result);
5293@ @<Function definitions@> =
5294gint marpa_AHFA_state_transitions(struct marpa_g* g,
5295    Marpa_AHFA_State_ID AHFA_state_id,
5296    GArray *result) {
5297
5298    @<Return |-2| on failure@>@;
5299    AHFA from_ahfa_state;
5300    TRANS* transitions;
5301    SYMID symid;
5302    gint symbol_count;
5303
5304    @<Fail if grammar not precomputed@>@;
5305    @<Fail if grammar |AHFA_state_id| is invalid@>@;
5306    @<Fail grammar if elements of |result| are not |sizeof(gint)|@>@;
5307    from_ahfa_state = AHFA_of_G_by_ID(g, AHFA_state_id);
5308    transitions = TRANSs_of_AHFA(from_ahfa_state);
5309    symbol_count = SYM_Count_of_G(g);
5310    g_array_set_size(result, 0);
5311    for (symid = 0; symid < symbol_count; symid++) {
5312        AHFA to_ahfa_state = To_AHFA_of_TRANS(transitions[symid]);
5313	if (!to_ahfa_state) continue;
5314	g_array_append_val (result, symid);
5315	g_array_append_val (result, ID_of_AHFA(to_ahfa_state));
5316    }
5317    return result->len;
5318}
5319
5320@** Empty Transition Code.
5321@d Empty_Transition_of_AHFA(state) ((state)->t_empty_transition)
5322@*0 Trace Functions.
5324@ @<Public function prototypes@> =
5325Marpa_AHFA_State_ID marpa_AHFA_state_empty_transition(struct marpa_g* g,
5326     Marpa_AHFA_State_ID AHFA_state_id);
@ In the external accessor,
|-1| is a valid return value, indicating no empty transition.
5329@<Function definitions@> =
5330AHFAID marpa_AHFA_state_empty_transition(struct marpa_g* g,
5331     AHFAID AHFA_state_id) {
5332    AHFA state;
5333    AHFA empty_transition_state;
5334    @<Return |-2| on failure@>@/
5335    @<Fail if grammar not precomputed@>@/
5336    @<Fail if grammar |AHFA_state_id| is invalid@>@/
5337    state = AHFA_of_G_by_ID(g, AHFA_state_id);
5338    empty_transition_state = Empty_Transition_of_AHFA (state);
5339    if (empty_transition_state)
5340      return ID_of_AHFA (empty_transition_state);
5341    return -1;
5342}
5343
5344
5345@** Populating the Terminal Boolean Vector.
5346@<Populate the Terminal Boolean Vector@> = {
5347    gint symbol_count = SYM_Count_of_G(g);
5348    gint symid;
5349    Bit_Vector bv_is_terminal = bv_create( (guint)symbol_count );
5350    g->t_bv_symid_is_terminal = bv_is_terminal;
5351    for (symid = 0; symid < symbol_count; symid++) {
5352      if (!SYMID_is_Terminal(symid)) continue;
5353      bv_bit_set(bv_is_terminal, (guint)symid);
5354    }
5355}
5356
5357@** Recognizer (RECCE) Code.
5358@<Public incomplete structures@> =
5359struct marpa_r;
5360@ @<Private typedefs@> =
5361typedef struct marpa_r* RECCE;
5362@ @<Recognizer structure@> =
5363struct marpa_r {
5364@<Widely aligned recognizer elements@>@/
5365@<Int aligned recognizer elements@>@/
5366@<Bit aligned recognizer elements@>@/
5367};
5368
5369@ @<Public function prototypes@> =
5370struct marpa_r* marpa_r_new( struct marpa_g* g );
5371@ The grammar must not be deallocated for the life of the
5372recognizer.
5373In the event of an error creating the recognizer,
5374|NULL| is returned and the error status
5375of the {\bf grammar} is set.
5376For this reason, the grammar is not |const|.
5377@<Function definitions@> =
5378struct marpa_r* marpa_r_new( struct marpa_g* g )
5379{ RECCE r;
5380    gint symbol_count_of_g;
5381    @<Return |NULL| on failure@>@/
5382    if (!G_is_Precomputed(g)) {
5383        g->t_error = "precomputed";
5384	return failure_indicator;
5385    }
5386    r = g_slice_new(struct marpa_r);
5387    r->t_grammar = g;
5388    symbol_count_of_g = SYM_Count_of_G(g);
5389    @<Initialize recognizer obstack@>@;
5390    @<Initialize recognizer elements@>@;
5391   return r; }
5392
5393@ @<Function definitions@> =
5394void marpa_r_free(struct marpa_r *r)
5395{
5396@<Destroy recognizer elements@>@;
5397if (r->t_sym_workarea) g_free(r->t_sym_workarea);
5398if (r->t_workarea2) g_free(r->t_workarea2);
5399@<Free working bit vectors for symbols@>@;
5400@<Destroy recognizer obstack@>@;
5401g_slice_free(struct marpa_r, r);
5402}
5403@ @<Public function prototypes@> =
5404void marpa_r_free(struct marpa_r *r);
5405
5406@*0 The Recognizer ID.
5407A unique ID for the recognizer.
5408This must be unique not just per-thread,
5409but process-wide.
5410The counter which tracks recognizer ID's
5411(|next_recce_id|)
5412is (at this writing) the only global
5413non-constant, and requires special handling to
5414keep |libmarpa| MT-safe.
|next_recce_id| is accessed only via
5416|glib|'s special atomic operations.
5417@ @<Int aligned recognizer elements@> = gint t_id;
5418@ @<Public typedefs@> = typedef gint Marpa_Recognizer_ID;
5419@ @<Private global variables@> = static gint next_recce_id = 1;
5420@ @<Initialize recognizer elements@> =
5421r->t_id = g_atomic_int_exchange_and_add(&next_recce_id, 1);
5422@ @<Function definitions@> =
5423gint marpa_r_id(struct marpa_r* r) { return r->t_id; }
5424@ @<Public function prototypes@> =
5425gint marpa_r_id(struct marpa_r* r);
5426
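@ The same idea can be expressed with the C11 standard atomics.
This is a sketch for comparison and not a proposed replacement:
it assumes a C11 compiler with |<stdatomic.h>|,
and the names |next_id| and |new_id| are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical process-wide counter; the first ID handed out is 1. */
static atomic_int next_id = 1;

/* Return a fresh ID.  atomic_fetch_add returns the value held before
   the addition, so concurrent callers never receive the same ID. */
static int
new_id (void)
{
  return atomic_fetch_add (&next_id, 1);
}
```

The read and the increment happen as one indivisible step,
which is the property a plain |next_id++| would lack.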
5427@*0 The Grammar for the Recognizer.
5428Initialized in |marpa_r_new|.
5429@d G_of_R(r) ((r)->t_grammar)
5430@d AHFA_Count_of_R(r) AHFA_Count_of_G(G_of_R(r))
5431@ @<Widely aligned recognizer elements@> = const struct marpa_g *t_grammar;
5432
5433@*0 Recognizer Phase.
5434The recognizer has phases, such as ``input"
5435and ``evaluation",
5436and states, such as ``exhausted".
5437The main distinction is that the
5438phases are mutually exclusive---%
5439entering one means leaving another.
``Exhausted" is not a phase, because when a parser is
exhausted it may have gone into the evaluation phase, then
returned to the input phase.
All that time it will remain ``exhausted".
5444@ {\bf To Do}: @^To Do@>
5445Once I refactor the objects, these phases will need to be
5446revisited.
5447|evaluation_phase| should probably be eliminated at that point,
5448assuming that the bocage object can be made independent of
5449the recognizer.
5450@<Public typedefs@> =
5451enum marpa_phase {
5452    no_such_phase = 0, // 0 is never a valid phase
5453    initial_phase,
5454    input_phase,
5455    evaluation_phase,
5456    error_phase
5457};
5458typedef enum marpa_phase Marpa_Phase;
5459@ @d Phase_of_R(r) ((r)->t_phase)
5460@<Int aligned recognizer elements@> =
5461Marpa_Phase t_phase;
5462@ @<Initialize recognizer elements@> =
5463Phase_of_R(r) = initial_phase;
5464@ @<Public function prototypes@> =
5465Marpa_Phase marpa_phase(struct marpa_r* r);
5466@ @<Function definitions@> =
5467Marpa_Phase marpa_phase(struct marpa_r* r)
5468{ return Phase_of_R(r); }
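@ The difference between a phase and a state can be made concrete.
In the sketch below, which uses hypothetical names throughout
and is not the recognizer structure of this file,
the phase is a single enumerated field, so setting it necessarily leaves the old phase,
while ``exhausted" is an independent flag which survives phase changes.

```c
#include <assert.h>

enum phase
{
  PHASE_NONE = 0,               /* never a valid phase */
  PHASE_INITIAL,
  PHASE_INPUT,
  PHASE_EVALUATION,
  PHASE_ERROR
};

/* Hypothetical recognizer: exactly one phase at a time,
   plus an independent "exhausted" flag. */
struct recce
{
  enum phase phase;
  unsigned is_exhausted:1;
};

/* Changing phase replaces the old phase and leaves the flag alone. */
static void
enter_phase (struct recce *r, enum phase next)
{
  assert (next != PHASE_NONE);
  r->phase = next;
}
```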
5469
5470@*0 Earley Set Container.
5471@d First_ES_of_R(r) ((r)->t_first_earley_set)
5472@d LV_First_ES_of_R(r) First_ES_of_R(r)
5473@<Widely aligned recognizer elements@> =
5474ES t_first_earley_set;
5475ES t_latest_earley_set;
5476EARLEME t_current_earleme;
5477@ @<Initialize recognizer elements@> =
5478r->t_first_earley_set = NULL;
5479r->t_latest_earley_set = NULL;
5480r->t_current_earleme = -1;
5481
5482@*0 Current Earleme.
5483@d Latest_ES_of_R(r) ((r)->t_latest_earley_set)
5484@d LV_Latest_ES_of_R(r) Latest_ES_of_R(r)
5485@d Current_Earleme_of_R(r) ((r)->t_current_earleme)
5486@d LV_Current_Earleme_of_R(r) (Current_Earleme_of_R(r))
5487@ @<Public function prototypes@> =
5488guint marpa_current_earleme(struct marpa_r* r);
5489@ @<Function definitions@> =
5490guint marpa_current_earleme(struct marpa_r* r)
5491{ return Current_Earleme_of_R(r); }
5492
5493@ @d Current_ES_of_R(r) current_es_of_r(r)
5494@<Private function prototypes@> =
5495static inline ES current_es_of_r(RECCE r);
5496@ @<Function definitions@> =
5497static inline ES current_es_of_r(RECCE r)
5498{
5499    const ES latest = Latest_ES_of_R(r);
5500    if (Earleme_of_ES(latest) == Current_Earleme_of_R(r)) return latest;
5501    return NULL;
5502}
5503
5504@*0 Earley Set Warning Threshold.
5505@d DEFAULT_EIM_WARNING_THRESHOLD (100)
5506@<Int aligned recognizer elements@> = guint t_earley_item_warning_threshold;
5507@ @<Initialize recognizer elements@> =
5508r->t_earley_item_warning_threshold = MAX(DEFAULT_EIM_WARNING_THRESHOLD, AIM_Count_of_G(g)*2);
5509@ @<Public function prototypes@> =
5510guint marpa_earley_item_warning_threshold(struct marpa_r* r);
5511@ @<Function definitions@> =
5512guint marpa_earley_item_warning_threshold(struct marpa_r* r)
5513{ return r->t_earley_item_warning_threshold; }
5514
5515@ @<Public function prototypes@> =
5516gboolean marpa_earley_item_warning_threshold_set(struct marpa_r*r, guint threshold);
5517@ Returns |TRUE| on success,
5518|FALSE| on failure.
5519@<Function definitions@> =
5520gboolean marpa_earley_item_warning_threshold_set(struct marpa_r*r, guint threshold)
5521{
5522    r->t_earley_item_warning_threshold = threshold == 0 ? EIM_FATAL_THRESHOLD : threshold;
5523    return TRUE;
5524}
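@ The two rules for the threshold can be restated side by side.
This is a sketch with hypothetical names.
In particular |FATAL_THRESHOLD| is only a stand-in:
the value of |EIM_FATAL_THRESHOLD| is not shown in this part of the file.

```c
#include <assert.h>
#include <limits.h>

#define DEFAULT_THRESHOLD 100u
/* Hypothetical stand-in for EIM_FATAL_THRESHOLD. */
#define FATAL_THRESHOLD (UINT_MAX / 4)

/* Default: twice the number of AHFA items, but never less than 100. */
static unsigned
default_threshold (unsigned aim_count)
{
  unsigned scaled = aim_count * 2;
  return scaled > DEFAULT_THRESHOLD ? scaled : DEFAULT_THRESHOLD;
}

/* Explicit setting: zero is mapped to the fatal threshold. */
static unsigned
requested_threshold (unsigned threshold)
{
  return threshold == 0 ? FATAL_THRESHOLD : threshold;
}
```

Under this reading, a request of zero presumably amounts to turning the warning off,
since the fatal limit would be reached at the same point.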
5525
5526@*0 Furthest Earleme.
5527The ``furthest" or highest-numbered earleme.
5528This is the earleme of the last Earley set that contains anything.
5529Marpa allows variable length tokens,
5530so it needs to track how far out tokens might be found.
5531No complete or predicted Earley item will be found after the current earleme.
5532@d Furthest_Earleme_of_R(r) ((r)->t_furthest_earleme)
5533@d LV_Furthest_Earleme_of_R(r) Furthest_Earleme_of_R(r)
5534@<Int aligned recognizer elements@> = EARLEME t_furthest_earleme;
5535@ @<Initialize recognizer elements@> = r->t_furthest_earleme = 0;
5536@ @<Public function prototypes@> =
5537guint marpa_furthest_earleme(struct marpa_r* r);
5538@ @<Function definitions@> =
5539guint marpa_furthest_earleme(struct marpa_r* r)
5540{ return Furthest_Earleme_of_R(r); }
5541
5542@*0 Symbol Workarea.
5543This is used in the completion
5544phase for each Earley set.
5545It is used in building the list of postdot items,
5546and when building the Leo items.
5547It is sized to hold one |gpointer| for
5548every symbol.
5549@
5550{\bf To Do}: @^To Do@>
5551It may be possible to free this space when the recognition phase
5552is finished.
5553@<Widely aligned recognizer elements@> = gpointer* t_sym_workarea;
5554@ @<Initialize recognizer elements@> = r->t_sym_workarea = NULL;
5555@ @<Allocate symbol workarea@> =
5556    r->t_sym_workarea = g_malloc(sym_workarea_size);
5557
5558@*0 Workarea 2.
5559This is used in the completion
phase for each Earley set,
when building the Leo items.
5562It is sized to hold two |gpointer|'s for
5563every symbol.
5564@
5565{\bf To Do}: @^To Do@>
5566It may be possible to free this space when the recognition phase
5567is finished.
5568@<Widely aligned recognizer elements@> = gpointer* t_workarea2;
5569@ @<Initialize recognizer elements@> = r->t_workarea2 = NULL;
5570@ @<Allocate recognizer workareas@> =
5571{
5572  const guint sym_workarea_size = sizeof (gpointer) * symbol_count_of_g;
5573  @<Allocate symbol workarea@>@;
5574  r->t_workarea2 = g_malloc(2u * sym_workarea_size);
5575}
5576
5577@*0 Working Bit Vectors for Symbols.
These are three bit vectors, sized to the number of symbols
5579in the grammar,
5580for utility purposes.
5581They are used in the completion
5582phase for each Earley set,
5583to keep track of the new postdot items and
5584Leo items.
5585@
5586{\bf To Do}: @^To Do@>
5587It may be possible to free this space when the recognition phase
5588is finished.
5589@<Widely aligned recognizer elements@> =
5590Bit_Vector t_bv_sym;
5591Bit_Vector t_bv_sym2;
5592Bit_Vector t_bv_sym3;
5593@ @<Initialize recognizer elements@> =
5594r->t_bv_sym = NULL;
5595r->t_bv_sym2 = NULL;
5596r->t_bv_sym3 = NULL;
5597@ @<Allocate recognizer's bit vectors for symbols@> = {
5598  r->t_bv_sym = bv_create( (guint)symbol_count_of_g );
5599  r->t_bv_sym2 = bv_create( (guint)symbol_count_of_g );
5600  r->t_bv_sym3 = bv_create( (guint)symbol_count_of_g );
5601}
5602@ @<Free working bit vectors for symbols@> =
5603if (r->t_bv_sym) bv_free(r->t_bv_sym);
5604if (r->t_bv_sym2) bv_free(r->t_bv_sym2);
5605if (r->t_bv_sym3) bv_free(r->t_bv_sym3);
5606
5607@*0 Expected Symbol Boolean Vector.
5608A boolean vector by symbol ID,
5609with the bits set if the symbol is expected
5610at the current earleme.
This vector is not sized until input starts.
5612When the recognizer is created,
5613this bit vector is initialized to |NULL| so that the destructor
5614can tell if there is a bit vector to be freed.
5615@<Widely aligned recognizer elements@> = Bit_Vector t_bv_symid_is_expected;
5616@ @<Initialize recognizer elements@> = r->t_bv_symid_is_expected = NULL;
5617@ @<Allocate recognizer's bit vectors for symbols@> =
5618    r->t_bv_symid_is_expected = bv_create( (guint)symbol_count_of_g );
5619@ @<Free working bit vectors for symbols@> =
5620if (r->t_bv_symid_is_expected) { bv_free(r->t_bv_symid_is_expected); }
5621@ Returns |-2| if there was a failure.
5622There is a check that the expectations of this
5623function and its caller about size of the |GArray| elements match.
5624This is a check worth making.
5625Mistakes happen,
5626a mismatch might arise as a portability issue,
5627and if I do not ``fail fast" here the ultimate problem
5628could be very hard to debug.
5629@<Public function prototypes@> =
5630gint marpa_terminals_expected(struct marpa_r* r, GArray* result);
5631@ @<Function definitions@> =
5632gint marpa_terminals_expected(struct marpa_r* r, GArray* result)
5633{
5634    @<Return |-2| on failure@>@;
5635    guint min, max, start;
5636    @<Fail recognizer if |GArray| elements are not |sizeof(gint)|@>@;
5637    g_array_set_size(result, 0);
5638    for (start = 0; bv_scan (r->t_bv_symid_is_expected, start, &min, &max);
5639	 start = max + 2)
5640      {
5641	gint symid;
5642	for (symid = (gint) min; symid <= (gint) max; symid++)
5643	  {
5644	    g_array_append_val (result, symid);
5645	  }
5646      }
5647    return (gint)result->len;
5648}
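@ The loop above steps through runs of set bits rather than single bits.
Once a run ending at |max| has been found,
bit |max+1| is known to be clear,
so the next scan can start at |max+2|.
What follows is a minimal sketch of that idiom, not the |libmarpa| code.
It assumes a plain array of flags in place of a |Bit_Vector|,
and |run_scan| and |set_indices| are hypothetical names,
with |run_scan| standing in for |bv_scan|.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for bv_scan: find the first run of set flags
   at or after start.  Returns 1 and stores the inclusive bounds of the
   run in *min and *max, or returns 0 if no flag is set from start on. */
static int
run_scan (const unsigned char *flags, unsigned int size,
          unsigned int start, unsigned int *min, unsigned int *max)
{
  unsigned int ix = start;
  assert (flags != NULL);
  while (ix < size && !flags[ix])
    ix++;
  if (ix >= size)
    return 0;
  *min = ix;
  while (ix + 1 < size && flags[ix + 1])
    ix++;
  *max = ix;
  return 1;
}

/* Copy the indices of all set flags into out, which must have room
   for size entries.  Returns the number of indices written. */
static unsigned int
set_indices (const unsigned char *flags, unsigned int size, int *out)
{
  unsigned int min, max, start, count = 0;
  /* Flag max + 1 is known to be clear, so resume at max + 2 */
  for (start = 0; run_scan (flags, size, start, &min, &max);
       start = max + 2)
    {
      unsigned int ix;
      for (ix = min; ix <= max; ix++)
        out[count++] = (int) ix;
    }
  return count;
}
```

The caller supplies the output array here,
where the real function appends to a |GArray|.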
5649
5650@*0 Leo-Related Booleans.
5651@*1 Turning Leo Logic Off and On.
5652A trace flag, set if we are using Leo items.
5653This flag is set by default.
5654It has two uses.
5655@ This flag is very useful for testing.
Since Leo items do not affect function, only efficiency,
it is possible for the Leo logic to be broken or
disabled without most tests noticing.
5659To make sure the Leo logic is intact,
5660one of |libmarpa|'s tests runs one pass
5661with Leo items off and another with Leo items on
5662and compares them.
5663@ This flag also allows the Leo logic
5664to be turned off in certain cases in which the Leo logic
5665actually slows things down.
The Leo logic could be turned off if the user knows there is
no right recursion, although the actual gain
would typically be small or not measurable.
5669@ A real gain would occur in the case of highly ambiguous
5670grammars, all or most of whose parses are actually evaluated.
Since those Earley items eliminated by the Leo logic
are actually recreated on an as-needed basis in the evaluation
phase, in cases when most of the Earley items are needed
for evaluation, the Leo logic would be eliminating Earley
items only to have to add most of them back later.
5676In these cases,
5677the Leo logic would impose a small overhead.
5678@ The author's current view is that it is best
5679to start by assuming that the Leo logic should
5680be left on.
In the rare event that it turns out that the Leo
5682logic is counter-productive,
5683this flag can be used to test if turning the Leo
5684logic off is helpful.
5685@ It should be borne in mind that even when the Leo logic
5686imposes a small cost in typical cases,
5687it may act as a safeguard.
5688The time complexity explosions prevented by Leo logic can
5689easily mean the difference between an impractical computation
5690and a practical one.
In most applications, it is worth incurring a small
5692overhead in the average case to prevent failures,
5693even rare ones.
5694@ There are two booleans.
5695One is a flag that can be set and
5696unset externally,
5697indicating the application's intention to use Leo logic.
5698An internal boolean tracks whether the Leo logic is
5699actually enabled at any given point.
5700@ The reason for having two booleans
5701is that the Leo logic is only turned
5702on once Earley set 0 is complete.
5703While Earley set 0 is being processed the internal flag will always
5704be unset, while the external flag may be set or unset, as the user
5705decided.
5706After Earley set 0 is complete, both booleans will have the same value.
5707@ {\bf To Do}: @^To Do@>
5708Once the null parse is special-cased, one boolean may suffice.
5709@<Bit aligned recognizer elements@> =
5710guint t_use_leo_flag:1;
5711guint t_is_using_leo:1;
5712@ @<Initialize recognizer elements@> =
5713r->t_use_leo_flag = 1;
5714r->t_is_using_leo = 0;
5715@ Returns 1 if the ``use Leo" flag is set,
57160 if not,
5717and |-2| if there was an error.
5718@<Public function prototypes@> =
gint marpa_is_use_leo(struct marpa_r* r);
5720@ @<Function definitions@> =
5721gint marpa_is_use_leo(struct marpa_r* r)
5722{
5723   @<Return |-2| on failure@>@/
5724    @<Fail if recognizer has fatal error@>@;
5725    return r->t_use_leo_flag ? 1 : 0;
5726}
5727@ Returns |TRUE| on success,
5728|FALSE| on failure.
5729@<Function definitions@> =
5730gboolean marpa_is_use_leo_set(
5731struct marpa_r*r, gboolean value)
5732{
5733   @<Return |FALSE| on failure@>@/
5734    @<Fail if recognizer has fatal error@>@;
5735    @<Fail if recognizer not initial@>@;
5736    r->t_use_leo_flag = value;
5737    return TRUE;
5738}
5739@ @<Public function prototypes@> =
5740gboolean marpa_is_use_leo_set( struct marpa_r*r, gboolean value);
5741
5742@*1 Is The Parser Exhausted?.
5743A parser is ``exhausted" if it cannot accept any more input.
5744Both successful and failed parses can be ``exhausted".
5745In many grammars,
5746the parse is always exhausted as soon as it succeeds.
5747And even if the parse is exhausted at a point
5748where there is no good parse,
5749there may be good parses at earlemes prior to the
5750earleme at which the parse became exhausted.
5751@d R_is_Exhausted(r) ((r)->t_is_exhausted)
5752@d LV_R_is_Exhausted(r) R_is_Exhausted(r)
5753@<Bit aligned recognizer elements@> = guint t_is_exhausted:1;
5754@ @<Initialize recognizer elements@> = r->t_is_exhausted = 0;
5755@ Exhaustion is a boolean, not a phase.
5756Once exhausted a parse stays exhausted,
5757even though the phase may change.
5758@<Public function prototypes@> =
gint marpa_is_exhausted(struct marpa_r* r);
5760@ @<Function definitions@> =
5761gint marpa_is_exhausted(struct marpa_r* r)
5762{
5763   @<Return |-2| on failure@>@/
5764    @<Fail if recognizer has fatal error@>@;
5765    return r->t_is_exhausted ? 1 : 0;
5766}
5767
5768@*0 The Recognizer's Context.
5769As in the grammar,
the ``context" is a hash of miscellaneous data,
5771by keyword,
5772whose
5773purpose is to
5774provide callbacks with
5775data about the recognizer's
5776state which is not conveniently
5777available in other forms.
5778@d Context_of_R(r) ((r)->t_context)
5779@<Widely aligned recognizer elements@> = GHashTable* t_context;
5780@ @<Initialize recognizer elements@> =
5781r->t_context = g_hash_table_new_full( g_str_hash, g_str_equal, NULL, g_free );
5782@ @<Destroy recognizer elements@> = g_hash_table_destroy(Context_of_R(r));
5783
5784@ Add an integer to the context.
5785The const qualifier on the key is deliberately discarded.
5786As implemented, the keys are treated as const's by
5787|g_hash_table_insert|, but the compiler can't know
5788that is my intention.
5789For type safety, I do want to keep the |const|
5790qualifier in other contexts.
5791@<Function definitions@> =
5792static inline
5793void r_context_int_add(struct marpa_r* r, const gchar* key, gint payload)
5794{
5795    struct marpa_context_int_value* value = g_new(struct marpa_context_int_value, 1);
5796    value->t_type = MARPA_CONTEXT_INT;
5797    value->t_data = payload;
5798    g_hash_table_insert(Context_of_R(r), (gpointer)key, value);
5799}
5800@ @<Private function prototypes@> =
5801static inline
5802void r_context_int_add(struct marpa_r* r, const gchar* key, gint value);
5803@ @<Function definitions@> =
5804static inline
5805void r_context_const_add(struct marpa_r* r, const gchar* key, const gchar* payload)
5806{
5807    struct marpa_context_const_value* value = g_new(struct marpa_context_const_value, 1);
5808    value->t_type = MARPA_CONTEXT_CONST;
5809    value->t_data = payload;
5810    g_hash_table_insert(Context_of_R(r), (gpointer)key, value);
5811}
5812@ @<Private function prototypes@> =
5813static inline
5814void r_context_const_add(struct marpa_r* r, const gchar* key, const gchar* value);
5815
5816@ Clear the current context.
5817Used to create a ``clean slate" in the context.
5818@<Function definitions@> =
5819static inline void r_context_clear(struct marpa_r* r) {
5820    g_hash_table_remove_all(Context_of_R(r)); }
5821@ @<Private function prototypes@> =
5822static inline void r_context_clear(struct marpa_r* r);
5823
5824@ @<Function definitions@> =
5825union marpa_context_value* marpa_r_context_value(struct marpa_r* r, const gchar* key)
5826{ return g_hash_table_lookup(Context_of_R(r), key); }
5827@ @<Public function prototypes@> =
5828union marpa_context_value* marpa_r_context_value(struct marpa_r* r, const gchar* key);
5829
5830@*0 The Recognizer Obstack.
5831Create an obstack with the lifetime of the recognizer.
5832This is a very efficient way of allocating memory which won't be
5833resized and which will have the same lifetime as the recognizer.
5834@<Widely aligned recognizer elements@> = struct obstack t_obs;
5835@ @<Initialize recognizer obstack@> = obstack_init(&r->t_obs);
5836@ @<Destroy recognizer obstack@> = obstack_free(&r->t_obs, NULL);
5837
5838@*0 The Recognizer's Error ID.
5839This is an error flag for the recognizer.
5840Error status is not necessarily cleared
5841on successful return, so that
5842it is only valid when an external
5843function has indicated there is an error,
5844and becomes invalid again when another external method
5845is called on the recognizer.
5846Checking it at other times may reveal ``stale" error
5847messages.
5848@ @<Widely aligned recognizer elements@> =
5849Marpa_Error_ID t_error;
5850Marpa_Error_ID t_fatal_error;
5851@ @<Initialize recognizer elements@> =
5852r->t_error = NULL;
5853r->t_fatal_error = NULL;
5854@ There is no destructor.
The error strings are assumed to be
5856{\bf not} error messages, but ``cookies".
5857These cookies are constants residing in static memory
5858(which may be read-only depending on implementation).
5859They cannot and should not be de-allocated.
5860@ @<Function definitions@> =
5861Marpa_Error_ID marpa_r_error(const struct marpa_r* r)
5862{ return r->t_error ? r->t_error : "unknown error"; }
5863@ @<Public function prototypes@> =
5864Marpa_Error_ID marpa_r_error(const struct marpa_r* r);
5865
5866@** Earlemes.
5867In most parsers, the input is modeled as a token stream ---
5868a sequence of tokens.
5869In this model the idea of location is not complex.
5870The first token is at location 0, the second at location 1,
5871etc.
5872@ Marpa allows ambiguous and variable length tokens, and requires
5873a more flexible idea of location, with a unit of length.
5874The unit of token length in Marpa is called an Earleme.
5875The locations themselves are often called earlemes.
5876@ |EARLEME_THRESHOLD| is less than |G_MAXINT| so that
5877I can prevent overflow without getting fancy -- overflow
5878by addition is impossible as long as earlemes are below
5879the threshold.
5880@ I considered defining earlemes as |glong| or |gint64|.
5881But machines with 32-bit int's
5882will in a not very long time
5883become museum pieces.
5884And in the meantime this
definition of |EARLEME_THRESHOLD| probably allows as large a
parse as the memories on those machines will be
5887able to handle.
5888@d EARLEME_THRESHOLD (G_MAXINT/4)
5889@<Public typedefs@> = typedef gint Marpa_Earleme;
5890@ @<Private typedefs@> = typedef Marpa_Earleme EARLEME;
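@ The overflow argument can be made concrete.
If two non-negative values are each below a quarter of the
maximum integer,
their sum is below half of the maximum and cannot wrap,
so the range check can be done after the addition.
What follows is a minimal sketch under stated assumptions:
|earleme_add| and |SKETCH_EARLEME_THRESHOLD| are hypothetical names,
and |INT_MAX| stands in for |G_MAXINT|.
It is not the check that |libmarpa| itself performs.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for EARLEME_THRESHOLD,
   with INT_MAX in place of GLib's G_MAXINT */
#define SKETCH_EARLEME_THRESHOLD (INT_MAX / 4)

/* Returns the location reached by adding a token length to the
   current location, or -1 if the length is not positive or the
   result would reach the threshold. */
static int
earleme_add (int current, int length)
{
  assert (current >= 0 && current < SKETCH_EARLEME_THRESHOLD);
  if (length <= 0 || length >= SKETCH_EARLEME_THRESHOLD)
    return -1;
  {
    /* Both operands are below INT_MAX / 4, so the sum is below
       INT_MAX / 2 and the addition cannot overflow */
    const int target = current + length;
    return target < SKETCH_EARLEME_THRESHOLD ? target : -1;
  }
}
```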
5891
5892@** Earley Set (ES) Code.
5893@<Public typedefs@> = typedef gint Marpa_Earley_Set_ID;
5894@ @<Private typedefs@> = typedef Marpa_Earley_Set_ID ESID;
5895@ @d Next_ES_of_ES(set) ((set)->t_next_earley_set)
5896@d LV_Next_ES_of_ES(set) Next_ES_of_ES(set)
5897@d Postdot_SYM_Count_of_ES(set) ((set)->t_postdot_sym_count)
5898@d First_PIM_of_ES_by_SYMID(set, symid) (first_pim_of_es_by_symid((set), (symid)))
5899@d PIM_SYM_P_of_ES_by_SYMID(set, symid) (pim_sym_p_find((set), (symid)))
5900@<Private incomplete structures@> =
5901struct s_earley_set;
5902typedef struct s_earley_set *ES;
5903typedef const struct s_earley_set *ES_Const;
5904struct s_earley_set_key;
5905typedef struct s_earley_set_key *ESK;
5906@ @<Private structures@> =
5907struct s_earley_set_key {
5908    EARLEME t_earleme;
5909};
5910typedef struct s_earley_set_key ESK_Object;
5911@ @<Private structures@> =
5912struct s_earley_set {
5913    ESK_Object t_key;
5914    gint t_postdot_sym_count;
5915    @<Int aligned Earley set elements@>@;
5916    union u_postdot_item** t_postdot_ary;
5917    ES t_next_earley_set;
5918    @<Widely aligned Earley set elements@>@/
5919};
5920
5921@*0 Earley Item Container.
5922@d EIM_Count_of_ES(set) ((set)->t_eim_count)
5923@<Int aligned Earley set elements@> =
5924gint t_eim_count;
5925@ @d EIMs_of_ES(set) ((set)->t_earley_items)
5926@<Widely aligned Earley set elements@> =
5927EIM* t_earley_items;
5928
5929@*0 Ordinal.
5930The ordinal of the Earley set---
5931its number in sequence.
5932It is different from the earleme, because there may be
5933gaps in the earleme sequence.
5934There are never gaps in the sequence of ordinals.
5935@d ES_Count_of_R(r) ((r)->t_earley_set_count)
5936@d Ord_of_ES(set) ((set)->t_ordinal)
5937@<Int aligned Earley set elements@> =
5938    gint t_ordinal;
5939@ @d ES_Ord_is_Valid(r, ordinal)
5940    ((ordinal) >= 0 && (ordinal) < ES_Count_of_R(r))
5941@<Int aligned recognizer elements@> =
5942gint t_earley_set_count;
5943@ @<Initialize recognizer elements@> =
5944r->t_earley_set_count = 0;
5945
5946@*0 Constructor.
5947@<Private function prototypes@> =
5948static inline ES earley_set_new (RECCE r, EARLEME id);
5949@ @<Function definitions@> =
5950static inline ES
5951earley_set_new( RECCE r, EARLEME id)
5952{
5953  ESK_Object key;
5954  ES set;
5955  set = obstack_alloc (&r->t_obs, sizeof (*set));
5956  key.t_earleme = id;
5957  set->t_key = key;
5958  set->t_postdot_ary = NULL;
5959  set->t_postdot_sym_count = 0;
5960  EIM_Count_of_ES(set) = 0;
5961  set->t_ordinal = r->t_earley_set_count++;
5962  EIMs_of_ES(set) = NULL;
5963  LV_Next_ES_of_ES(set) = NULL;
5964  @<Initialize Earley set PSL data@>@/
5965  return set;
5966}
5967
5968@*0 Destructor.
5969@<Destroy recognizer elements@> =
5970{
5971  ES set;
5972  for (set = First_ES_of_R (r); set; set = Next_ES_of_ES (set))
5973    {
5974      if (EIMs_of_ES(set))
5975	g_free (EIMs_of_ES(set));
5976    }
5977}
5978
5979@*0 ID of Earley Set.
5980@d Earleme_of_ES(set) ((set)->t_key.t_earleme)
5981
5982@*0 Trace Functions.
5983Many of the
5984trace functions use
5985a ``trace Earley set" which is
5986tracked on a per-recognizer basis.
5987The ``trace Earley set" is tracked separately
5988from the current Earley set for the parse.
5989The two may coincide, but should not be confused.
5990@<Widely aligned recognizer elements@> =
5991struct s_earley_set* t_trace_earley_set;
5992@ @<Initialize recognizer elements@> =
5993r->t_trace_earley_set = NULL;
5994
5995@ @<Public function prototypes@> =
5996Marpa_Earley_Set_ID marpa_trace_earley_set(struct marpa_r *r);
5997@ @<Function definitions@> =
5998Marpa_Earley_Set_ID marpa_trace_earley_set(struct marpa_r *r)
5999{
6000  @<Return |-2| on failure@>@;
6001  ES trace_earley_set = r->t_trace_earley_set;
6002  @<Fail recognizer if not trace-safe@>@;
6003  if (!trace_earley_set) {
6004      R_ERROR("no trace es");
6005      return failure_indicator;
6006  }
6007  return Ord_of_ES(trace_earley_set);
6008}
6009
6010@ @<Public function prototypes@> =
6011Marpa_Earley_Set_ID marpa_latest_earley_set(struct marpa_r *r);
6012@ @<Function definitions@> =
6013Marpa_Earley_Set_ID marpa_latest_earley_set(struct marpa_r *r)
6014{
6015  @<Return |-2| on failure@>@;
6016  @<Fail recognizer if not trace-safe@>@;
6017  return Ord_of_ES(Latest_ES_of_R(r));
6018}
6019
6020@ Given the ID (ordinal) of an Earley set,
6021return the earleme.
6022In the default, token-stream model, ID and earleme
6023are the same, but this is not the case in other input
6024models.
6025If the ordinal is out of bounds, this function
6026returns -1, which can be treated as a soft failure.
6027On other problems, it returns -2.
6028@<Public function prototypes@> =
6029Marpa_Earleme marpa_earleme(struct marpa_r* r, Marpa_Earley_Set_ID set_id);
6030@ @<Function definitions@> =
6031Marpa_Earleme marpa_earleme(struct marpa_r* r, Marpa_Earley_Set_ID set_id)
6032{
6033    const gint es_does_not_exist = -1;
6034    @<Return |-2| on failure@>@;
6035    ES earley_set;
6036    @<Fail if recognizer initial@>@;
6037    @<Fail if recognizer has fatal error@>@;
6038    if (set_id < 0) {
6039        R_ERROR("invalid es ordinal");
6040	return failure_indicator;
6041    }
6042    r_update_earley_sets (r);
6043    if (!ES_Ord_is_Valid (r, set_id))
6044      {
6045	return es_does_not_exist;
6046      }
6047    earley_set = ES_of_R_by_Ord (r, set_id);
6048    return Earleme_of_ES (earley_set);
6049}
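@ The difference between ordinals and earlemes,
and the soft and hard failure conventions,
can be sketched with a plain array of earlemes indexed by ordinal.
This is only an illustration.
The function |earleme_by_ordinal| and its array argument are
hypothetical, and the real function looks the Earley set up
in the recognizer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup: earlemes[i] is the earleme of the Earley set
   whose ordinal is i.  Ordinals have no gaps; earlemes may. */
static int
earleme_by_ordinal (const int *earlemes, int set_count, int ordinal)
{
  assert (earlemes != NULL && set_count >= 0);
  if (ordinal < 0)
    return -2;                  /* hard failure: illegal ordinal */
  if (ordinal >= set_count)
    return -1;                  /* soft failure: no such Earley set */
  return earlemes[ordinal];
}
```

With earlemes 0, 1, 4 and 5, for example, the ordinals are
0, 1, 2 and 3, so ordinal 2 maps to earleme 4.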
6050
@ Note that this trace function returns the size of the
Earley set whose ordinal is |set_id|,
{\bf not} the size of the trace Earley set.
6053@ @<Public function prototypes@> =
6054gint marpa_earley_set_size(struct marpa_r *r, Marpa_Earley_Set_ID set_id);
6055@ @<Function definitions@> =
6056gint marpa_earley_set_size(struct marpa_r *r, Marpa_Earley_Set_ID set_id)
6057{
6058    @<Return |-2| on failure@>@;
6059    ES earley_set;
6060    @<Fail if recognizer initial@>@;
6061    @<Fail if recognizer has fatal error@>@;
6062    r_update_earley_sets (r);
6063    if (!ES_Ord_is_Valid (r, set_id))
6064      {
6065	R_ERROR ("invalid es ordinal");
6066	return failure_indicator;
6067      }
6068    earley_set = ES_of_R_by_Ord (r, set_id);
6069    return EIM_Count_of_ES (earley_set);
6070}
6071
6072@** Earley Item (EIM) Code.
6073@ {\bf Optimization Principles:}
6074\li Optimization should favor unambiguous grammars,
6075but not heavily penalize ambiguous grammars.
6076\li Optimization should favor mildly ambiguous grammars,
6077but not heavily penalize very ambiguous grammars.
6078\li Optimization should focus on saving space,
6079perhaps even if at a slight cost in time.
6080@ Space savings are important
6081because in practical applications
6082there can easily be many millions of
6083Earley items and links.
6084If there are 1M copies of a structure,
each byte saved is 1M saved.
6086
6087@ The solution arrived at is to optimize for Earley items
6088with a single source, storing that source in the item
6089itself.
For Earley items with multiple sources, a special structure
6091of linked lists is used.
6092When a second source is added,
6093the first source is copied into the lists,
6094and its original space used for pointers to the linked
6095lists.
6096@ This solution is optimized both
6097for the unambiguous case,
6098and for adding the third and additional
6099sources.
6100The only awkwardness takes place
6101when the second source is added, and the first one must
6102be recopied to make way for pointers to the linked lists.
6103@d EIM_FATAL_THRESHOLD (G_MAXINT/4)
6104@d Complete_SYMIDs_of_EIM(item)
6105    Complete_SYMIDs_of_AHFA(AHFA_of_EIM(item))
6106@d Complete_SYM_Count_of_EIM(item)
6107    Complete_SYM_Count_of_AHFA(AHFA_of_EIM(item))
6108@d Leo_LHS_ID_of_EIM(eim) Leo_LHS_ID_of_AHFA(AHFA_of_EIM(eim))
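@ The single-source optimization can be sketched as follows.
This is a simplified illustration, not the |libmarpa| layout:
all the names are hypothetical,
a source is reduced to an |int|,
and one linked list stands in for the several lists
the real container keeps.
The first source lives in the item itself.
Adding a second source copies the first into the list
and reuses its space for the list head.

```c
#include <assert.h>
#include <stdlib.h>

/* All names here are hypothetical */
struct sketch_link
{
  struct sketch_link *next;
  int source;                   /* stands in for the real source data */
};

struct sketch_item
{
  int source_count;
  union
  {
    int unique_source;          /* valid while there is exactly one source */
    struct sketch_link *first;  /* valid once there are two or more */
  } container;
};

/* Allocate a link, aborting on failure */
static struct sketch_link *
link_new (int source, struct sketch_link *next)
{
  struct sketch_link *link = malloc (sizeof (*link));
  if (!link)
    abort ();
  link->source = source;
  link->next = next;
  return link;
}

static void
source_add (struct sketch_item *item, int source)
{
  assert (item != NULL);
  if (item->source_count == 0)
    {
      /* The unambiguous case: no allocation at all */
      item->container.unique_source = source;
    }
  else
    {
      if (item->source_count == 1)
        {
          /* The awkward step: copy the first source into the list,
             freeing its space for the list head */
          const int first_source = item->container.unique_source;
          item->container.first = link_new (first_source, NULL);
        }
      /* Third and later sources only need this push */
      item->container.first = link_new (source, item->container.first);
    }
  item->source_count++;
}
```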
6109@ It might be slightly faster if this boolean is memoized in the Earley item
6110when the Earley item is initialized.
6111@d Earley_Item_is_Completion(item)
6112    (Complete_SYM_Count_of_EIM(item) > 0)
6113@<Public typedefs@> = typedef gint Marpa_Earley_Item_ID;
6114@ The ID of the Earley item is per-Earley-set, so that
6115to uniquely specify the Earley item you must also specify
6116the Earley set.
6117@d ES_of_EIM(item) ((item)->t_key.t_set)
6118@d ES_Ord_of_EIM(item) (Ord_of_ES(ES_of_EIM(item)))
6119@d Ord_of_EIM(item) ((item)->t_ordinal)
6120@d Earleme_of_EIM(item) Earleme_of_ES(ES_of_EIM(item))
6121@d AHFAID_of_EIM(item) (ID_of_AHFA(AHFA_of_EIM(item)))
6122@d AHFA_of_EIM(item) ((item)->t_key.t_state)
6123@d AIM_Count_of_EIM(item) (AIM_Count_of_AHFA(AHFA_of_EIM(item)))
6124@d Origin_Earleme_of_EIM(item) (Earleme_of_ES(Origin_of_EIM(item)))
6125@d Origin_Ord_of_EIM(item) (Ord_of_ES(Origin_of_EIM(item)))
6126@d Origin_of_EIM(item) ((item)->t_key.t_origin)
6127@d AIM_of_EIM_by_AEX(eim, aex) AIM_of_AHFA_by_AEX(AHFA_of_EIM(eim), (aex))
6128@d AEX_of_EIM_by_AIM(eim, aim) AEX_of_AHFA_by_AIM(AHFA_of_EIM(eim), (aim))
6129@<Private incomplete structures@> =
6130struct s_earley_item;
6131typedef struct s_earley_item* EIM;
6132typedef const struct s_earley_item* EIM_Const;
6133struct s_earley_item_key;
6134typedef struct s_earley_item_key* EIK;
6135
6136@ @<Earley item structure@> =
6137struct s_earley_item_key {
6138     AHFA t_state;
6139     ES t_origin;
6140     ES t_set;
6141};
6142typedef struct s_earley_item_key EIK_Object;
6143struct s_earley_item {
6144     EIK_Object t_key;
6145     union u_source_container t_container;
6146     gint t_ordinal;
6147     @<Bit aligned Earley item elements@>@/
6148};
6149typedef struct s_earley_item EIM_Object;
6150
6151@*0 Constructor.
6152Find an Earley item object, creating it if it does not exist.
Only in a couple of cases per parse (in AHFA state 0)
6154do we already
6155know that the Earley item is unique in the set.
6156These are not worth optimizing for.
6157@<Private function prototypes@> =
6158static inline EIM earley_item_create(const RECCE r,
6159    const EIK_Object key);
6160@ @<Function definitions@> =
6161static inline EIM earley_item_create(const RECCE r,
6162    const EIK_Object key)
6163{
6164  @<Return |NULL| on failure@>@;
6165  EIM new_item;
6166  EIM* top_of_work_stack;
6167  const ES set = key.t_set;
6168  const guint count = ++EIM_Count_of_ES(set);
6169  @<Check count against Earley item thresholds@>@;
6170  new_item = obstack_alloc (&r->t_obs, sizeof (*new_item));
6171  new_item->t_key = key;
6172  new_item->t_source_type = NO_SOURCE;
6173  Ord_of_EIM(new_item) = count - 1;
6174  top_of_work_stack = WORK_EIM_PUSH(r);
6175  *top_of_work_stack = new_item;
6176  return new_item;
6177}
6178
6179@ @<Private function prototypes@> =
6180static inline
6181EIM earley_item_assign (const RECCE r, const ES set, const ES origin, const AHFA state);
6182@ @<Function definitions@> =
6183static inline EIM
6184earley_item_assign (const RECCE r, const ES set, const ES origin,
6185		    const AHFA state)
6186{
6187  EIK_Object key;
6188  EIM eim;
6189  PSL psl;
6190  AHFAID ahfa_id = ID_of_AHFA(state);
6191  PSL *psl_owner = &Dot_PSL_of_ES (origin);
6192  if (!*psl_owner)
6193    {
6194      psl_claim (psl_owner, Dot_PSAR_of_R(r));
6195    }
6196  psl = *psl_owner;
6197  eim = PSL_Datum (psl, ahfa_id);
6198  if (eim
6199      && Earleme_of_EIM (eim) == Earleme_of_ES (set)
6200      && Earleme_of_ES (Origin_of_EIM (eim)) == Earleme_of_ES (origin))
6201    {
6202      return eim;
6203    }
6204  key.t_origin = origin;
6205  key.t_state = state;
6206  key.t_set = set;
6207  eim = earley_item_create (r, key);
6208  PSL_Datum (psl, ahfa_id) = eim;
6209  return eim;
6210}
6211
6212@ The fatal threshold always applies.
6213The warning threshold does not count against items added by a Leo expansion.
6214@<Check count against Earley item thresholds@> =
6215if (count >= r->t_earley_item_warning_threshold)
6216    {
6217      if (G_UNLIKELY(count >= EIM_FATAL_THRESHOLD))
6218      { /* Set the recognizer to a fatal error */
6219	  r_context_clear (r);
6220	  R_FATAL("eim count exceeds fatal threshold");
6221	  return failure_indicator;
6222	}
6223	  r_context_clear (r);
6224	  r_message (r, "earley item count exceeds threshold");
6225}
6226
6227@*0 Destructor.
No destructor.  All Earley item elements are owned by other objects.
6229The Earley item itself is on the obstack.
6230
6231@*0 Source of the Earley Item.
6232@d NO_SOURCE (0U)
6233@d SOURCE_IS_TOKEN (1U)
6234@d SOURCE_IS_COMPLETION (2U)
6235@d SOURCE_IS_LEO (3U)
6236@d SOURCE_IS_AMBIGUOUS (4U)
6237@d Source_Type_of_EIM(item) ((item)->t_source_type)
6238@d Earley_Item_has_No_Source(item) ((item)->t_source_type == NO_SOURCE)
6239@d Earley_Item_has_Token_Source(item) ((item)->t_source_type == SOURCE_IS_TOKEN)
6240@d Earley_Item_has_Complete_Source(item) ((item)->t_source_type == SOURCE_IS_COMPLETION)
6241@d Earley_Item_has_Leo_Source(item) ((item)->t_source_type == SOURCE_IS_LEO)
6242@d Earley_Item_is_Ambiguous(item) ((item)->t_source_type == SOURCE_IS_AMBIGUOUS)
6243@<Bit aligned Earley item elements@> =
6244guint t_source_type:3;
6245
6246@ @<Private function prototypes@> =
6247static const char* invalid_source_type_message(guint type);
6248@ Not inline, because not used in critical paths.
6249This is for creating error messages.
6250@<Function definitions@> =
6251static const char* invalid_source_type_message(guint type) {
6252     switch (type) {
6253    case NO_SOURCE:
6254    return "invalid source type: none";
6255    case SOURCE_IS_TOKEN:
6256     return "invalid source type: token";
6257    case SOURCE_IS_COMPLETION:
6258     return "invalid source type: completion";
6259    case SOURCE_IS_LEO:
6260     return "invalid source type: leo";
6261    case SOURCE_IS_AMBIGUOUS:
6262     return "invalid source type: ambiguous";
6263     }
6264     return "unknown source type";
6265}
6266
6267@*0 Trace Functions.
6268Many of the
6269trace functions use
6270a ``trace Earley item" which is
6271tracked on a per-recognizer basis.
6272@<Widely aligned recognizer elements@> =
6273EIM t_trace_earley_item;
6274@ @<Initialize recognizer elements@> =
6275r->t_trace_earley_item = NULL;
6276@ This function returns the AHFA state ID of an Earley item,
6277and sets the trace Earley item,
6278if it successfully finds an Earley item
in the trace Earley set with the specified
Earley item ID (its ordinal within the set).
6281If there is no such Earley item,
6282it returns |-1|,
6283and clears the trace Earley item.
6284On failure for other reasons,
6285it returns |-2|,
6286and clears the trace Earley item.
6287@ The trace Earley item is cleared if no matching
6288Earley item is found, and on failure.
6289The trace source link is always
6290cleared, regardless of success or failure.
6291
6292@ This function sets
6293the trace Earley set to the one indicated
6294by the ID
6295of the argument.
6296On success,
6297the earleme of the new trace Earley set is
6298returned.
6299@ Various other trace data depends on the Earley
6300set, and must be consistent with it.
6301This function clears all such data,
6302unless it is called while the recognizer is in
6303a trace-unsafe state (initial, fatal, etc.)
or unless the Earley set requested by the
argument is already the trace Earley set.
On failure because the ID is for an
Earley set which does not
exist, |-1| is returned.
The upper levels may choose to treat this as a soft failure.
6311On failure because the ID is illegal (less than zero)
6312or for other failures, |-2| is returned.
6313The upper levels may choose to treat these as hard failures.
6314@ @<Public function prototypes@> =
6315Marpa_Earleme
6316marpa_earley_set_trace (struct marpa_r *r, Marpa_Earley_Set_ID set_id);
6317@ @<Function definitions@> =
6318Marpa_Earleme
6319marpa_earley_set_trace (struct marpa_r *r, Marpa_Earley_Set_ID set_id)
6320{
6321  ES earley_set;
6322  const gint es_does_not_exist = -1;
6323  @<Return |-2| on failure@>@/
6324  @<Fail recognizer if not trace-safe@>@;
6325    if (r->t_trace_earley_set && Ord_of_ES (r->t_trace_earley_set) == set_id)
      { /* If the set is already
	   the trace Earley set,
	   return successfully without resetting any of the dependent data */
6329	return Earleme_of_ES (r->t_trace_earley_set);
6330      }
6331  @<Clear trace Earley set dependent data@>@;
6332    if (set_id < 0)
6333    {
6334	R_ERROR ("invalid es ordinal");
6335	return failure_indicator;
6336    }
6337  r_update_earley_sets (r);
6338    if (set_id >= DSTACK_LENGTH (r->t_earley_set_stack))
6339      {
6340	return es_does_not_exist;
6341      }
6342    earley_set = ES_of_R_by_Ord (r, set_id);
6343  r->t_trace_earley_set = earley_set;
6344  return Earleme_of_ES(earley_set);
6345}
6346
6347@ @<Clear trace Earley set dependent data@> = {
6348  r->t_trace_earley_set = NULL;
6349  trace_earley_item_clear(r);
6350  @<Clear trace postdot item data@>@;
6351}
6352
6353@ @<Public function prototypes@> =
6354Marpa_AHFA_State_ID
6355marpa_earley_item_trace (struct marpa_r *r,
6356    Marpa_Earley_Item_ID item_id);
6357@ @<Function definitions@> =
6358Marpa_AHFA_State_ID
6359marpa_earley_item_trace (struct marpa_r *r, Marpa_Earley_Item_ID item_id)
6360{
6361  const gint eim_does_not_exist = -1;
6362  @<Return |-2| on failure@>@;
6363  ES trace_earley_set;
6364  EIM earley_item;
6365  EIM *earley_items;
6366  @<Fail recognizer if not trace-safe@>@;
6367  trace_earley_set = r->t_trace_earley_set;
6368  if (!trace_earley_set)
6369    {
6370      @<Clear trace Earley set dependent data@>@;
6371      R_ERROR ("no trace es");
6372      return failure_indicator;
6373    }
6374  trace_earley_item_clear (r);
6375  if (item_id < 0)
6376    {
6377      R_ERROR ("invalid eim ordinal");
6378      return failure_indicator;
6379    }
6380  if (item_id >= EIM_Count_of_ES (trace_earley_set))
6381    {
6382      return eim_does_not_exist;
6383    }
6384  earley_items = EIMs_of_ES (trace_earley_set);
6385  earley_item = earley_items[item_id];
6386  r->t_trace_earley_item = earley_item;
6387  return AHFAID_of_EIM (earley_item);
6388}
6389
6390@ Clear all the data elements specifically
6391for the trace Earley item.
6392The difference between this code and
6393|trace_earley_item_clear| is
6394that |trace_earley_item_clear|
6395also clears the source link.
6396@<Clear trace Earley item data@> =
6397      r->t_trace_earley_item = NULL;
6398
6399@ @<Private function prototypes@> =
6400static inline void trace_earley_item_clear(struct marpa_r* r);
6401@ @<Function definitions@> =
6402static inline void trace_earley_item_clear(struct marpa_r* r)
6403{
6404    @<Clear trace Earley item data@>@/
6405    trace_source_link_clear(r);
6406}
6407
6408@ @<Private function prototypes@> =
6409Marpa_Earley_Set_ID marpa_earley_item_origin(struct marpa_r *r);
6410@ @<Function definitions@> =
6411Marpa_Earley_Set_ID marpa_earley_item_origin(struct marpa_r *r)
6412{
6413  @<Return |-2| on failure@>@;
6414  EIM item = r->t_trace_earley_item;
6415  @<Fail if recognizer initial@>@;
6416  if (!item) {
6417      @<Clear trace Earley item data@>@;
6418      R_ERROR("no trace eim");
6419      return failure_indicator;
6420  }
6421  return Origin_Ord_of_EIM(item);
6422}
6423
6424@** Earley Index (EIX) Code.
6425Postdot items are of two kinds: Earley indexes
6426and Leo items.
6427The payload of an Earley index is simple:
6428a pointer to an Earley item.
6429The other elements of the EIX are overhead to
6430support the chain of postdot items for
6431a postdot symbol.
6432@d Next_PIM_of_EIX(eix) ((eix)->t_next)
6433@d LV_Next_PIM_of_EIX(eix) Next_PIM_of_EIX(eix)
6434@d EIM_of_EIX(eix) ((eix)->t_earley_item)
6435@d LV_EIM_of_EIX(eix) EIM_of_EIX(eix)
6436@d Postdot_SYMID_of_EIX(eix) ((eix)->t_postdot_symid)
6437@d LV_Postdot_SYMID_of_EIX(eix) Postdot_SYMID_of_EIX(eix)
6438@<Private incomplete structures@> =
6439struct s_earley_ix;
6440typedef struct s_earley_ix* EIX;
6441union u_postdot_item;
6442@ @<Private structures@> =
6443struct s_earley_ix {
6444     union u_postdot_item* t_next;
6445     SYMID t_postdot_symid;
6446     EIM t_earley_item; // Never NULL if this is an index item
6447};
6448typedef struct s_earley_ix EIX_Object;
6449
6450@** Leo Item (LIM) Code.
6451Leo items originate from the ``transition items" of Joop Leo's 1991 paper.
6452They are set up so their first fields are identical to those of
6453the Earley item indexes,
6454so that they can be linked together in the same chain.
6455Because the Earley index is at the beginning of each Leo item,
6456LIMs can be treated as a kind of EIX.
6457@d EIX_of_LIM(lim) ((EIX)(lim))
6458@ Both Earley indexes and Leo items are
6459postdot items, so that Leo items also require
6460the fields to maintain the chain of postdot items.
6461For this reason, Leo items contain an Earley index,
6462but one
6463with a |NULL| Earley item pointer.
6464@d Postdot_SYMID_of_LIM(leo) (Postdot_SYMID_of_EIX(EIX_of_LIM(leo)))
6465@d Next_PIM_of_LIM(leo) (Next_PIM_of_EIX(EIX_of_LIM(leo)))
6466@d LV_Next_PIM_of_LIM(leo) Next_PIM_of_LIM(leo)
6467@d Origin_of_LIM(leo) ((leo)->t_origin)
6468@d LV_Origin_of_LIM(leo) Origin_of_LIM(leo)
6469@d Top_AHFA_of_LIM(leo) ((leo)->t_top_ahfa)
6470@d LV_Top_AHFA_of_LIM(leo) Top_AHFA_of_LIM(leo)
6471@d Predecessor_LIM_of_LIM(leo) ((leo)->t_predecessor)
6472@d LV_Predecessor_LIM_of_LIM(leo) Predecessor_LIM_of_LIM(leo)
6473@d Base_EIM_of_LIM(leo) ((leo)->t_base)
6474@d LV_Base_EIM_of_LIM(leo) Base_EIM_of_LIM(leo)
6475@d ES_of_LIM(leo) ((leo)->t_set)
6476@d LV_ES_of_LIM(leo) ES_of_LIM(leo)
6477@d Chain_Length_of_LIM(leo) ((leo)->t_chain_length)
6478@d LV_Chain_Length_of_LIM(leo) Chain_Length_of_LIM(leo)
6479@d Earleme_of_LIM(lim) Earleme_of_ES(ES_of_LIM(lim))
6480@<Private incomplete structures@> =
6481struct s_leo_item;
6482typedef struct s_leo_item* LIM;
6483@ @<Private structures@> =
6484struct s_leo_item {
6485     EIX_Object t_earley_ix;
6486     ES t_origin;
6487     AHFA t_top_ahfa;
6488     LIM t_predecessor;
6489     EIM t_base;
6490     ES t_set;
6491     gint t_chain_length;
6492};
6493typedef struct s_leo_item LIM_Object;
6494
6495@*0 Trace Functions.
6496The functions in this section are all accessors.
6497The trace Leo item is selected by setting the trace postdot item
6498to a Leo item.
6499
6500@ @<Private function prototypes@> =
6501Marpa_Symbol_ID marpa_leo_predecessor_symbol(struct marpa_r *r);
6502@ @<Function definitions@> =
6503Marpa_Symbol_ID marpa_leo_predecessor_symbol(struct marpa_r *r)
6504{
6505  const Marpa_Symbol_ID no_predecessor = -1;
6506  @<Return |-2| on failure@>@;
6507  PIM postdot_item = r->t_trace_postdot_item;
6508  LIM predecessor_leo_item;
6509  @<Fail recognizer if not trace-safe@>@;
6510  if (!postdot_item) {
6511      R_ERROR("no trace pim");
6512      return failure_indicator;
6513  }
6514  if (EIM_of_PIM(postdot_item)) {
6515      R_ERROR("pim is not lim");
6516      return failure_indicator;
6517  }
6518  predecessor_leo_item = Predecessor_LIM_of_LIM(LIM_of_PIM(postdot_item));
6519  if (!predecessor_leo_item) return no_predecessor;
6520  return Postdot_SYMID_of_LIM(predecessor_leo_item);
6521}
6522
@ @<Private function prototypes@> =
Marpa_Earley_Set_ID marpa_leo_base_origin(struct marpa_r *r);
6524@ @<Function definitions@> =
6525Marpa_Earley_Set_ID marpa_leo_base_origin(struct marpa_r *r)
6526{
  const Marpa_Earley_Set_ID pim_is_not_a_leo_item = -1;
6528  @<Return |-2| on failure@>@;
6529  PIM postdot_item = r->t_trace_postdot_item;
6530  EIM base_earley_item;
6531  @<Fail recognizer if not trace-safe@>@;
6532  if (!postdot_item) {
6533      R_ERROR("no trace pim");
6534      return failure_indicator;
6535  }
6536  if (EIM_of_PIM(postdot_item)) return pim_is_not_a_leo_item;
6537  base_earley_item = Base_EIM_of_LIM(LIM_of_PIM(postdot_item));
6538  return Origin_Ord_of_EIM(base_earley_item);
6539}
6540
6541@ @<Private function prototypes@> =
6542Marpa_AHFA_State_ID marpa_leo_base_state(struct marpa_r *r);
6543@ @<Function definitions@> =
6544Marpa_AHFA_State_ID marpa_leo_base_state(struct marpa_r *r)
6545{
  const Marpa_AHFA_State_ID pim_is_not_a_leo_item = -1;
6547  @<Return |-2| on failure@>@;
6548  PIM postdot_item = r->t_trace_postdot_item;
6549  EIM base_earley_item;
6550  @<Fail recognizer if not trace-safe@>@;
6551  if (!postdot_item) {
6552      R_ERROR("no trace pim");
6553      return failure_indicator;
6554  }
6555  if (EIM_of_PIM(postdot_item)) return pim_is_not_a_leo_item;
6556  base_earley_item = Base_EIM_of_LIM(LIM_of_PIM(postdot_item));
6557  return AHFAID_of_EIM(base_earley_item);
6558}
6559
6560@ This function
6561returns the ``Leo expansion AHFA" of the current trace Leo item.
6562@<Private function prototypes@> =
6563Marpa_AHFA_State_ID marpa_leo_expansion_ahfa(struct marpa_r *r);
@ The {\bf Leo expansion AHFA} is the AHFA
of the {\bf Leo expansion Earley item}
for this Leo item.
6567{\bf Leo expansion Earley items}, when
6568the context makes the meaning clear,
6569are also called {\bf Leo expansion items}
6570or simply {\bf Leo expansions}.
6571@ Every Leo item has a unique Leo expansion Earley item,
6572because for this purpose
6573the process of
6574Leo expansion is seen from a non-recursive point of view.
6575In practice, Leo expansion is recursive,
and creation of the Leo expansion Earley item for
6577one Leo item
6578implies
6579the Leo expansion of all of the predecessors of that
6580Leo item.
6581@ Note that expansion of the Leo item at the top
6582of a Leo path is not needed---%
6583if a Leo item is the predecessor in
6584a Leo source for a Leo completion item,
6585the Leo completion item is the expansion of that Leo item.
6586@ @<Function definitions@> =
6587Marpa_AHFA_State_ID marpa_leo_expansion_ahfa(struct marpa_r *r)
6588{
    const Marpa_AHFA_State_ID pim_is_not_a_leo_item = -1;
6590    @<Return |-2| on failure@>@;
6591    const PIM postdot_item = r->t_trace_postdot_item;
6592    @<Fail recognizer if not trace-safe@>@;
6593    if (!postdot_item)
6594      {
6595	R_ERROR ("no trace pim");
6596	return failure_indicator;
6597      }
6598    if (!EIM_of_PIM (postdot_item))
6599      {
6600	const LIM leo_item = LIM_of_PIM (postdot_item);
6601	const EIM base_earley_item = Base_EIM_of_LIM (leo_item);
6602	const SYMID postdot_symbol = Postdot_SYMID_of_LIM (leo_item);
6603	const AHFA to_ahfa = To_AHFA_of_EIM_by_SYMID (base_earley_item, postdot_symbol);
6604	return ID_of_AHFA(to_ahfa);
6605      }
6606    return pim_is_not_a_leo_item;
6607}
6608
6609
6610@** Postdot Item (PIM) code.
6611Postdot items are entries in an index,
6612by postdot symbol, of both the Earley items and the Leo items
6613for each Earley set.
6614@d LIM_of_PIM(pim) ((LIM)(pim))
6615@d EIX_of_PIM(pim) ((EIX)(pim))
6616@d Postdot_SYMID_of_PIM(pim) (Postdot_SYMID_of_EIX(EIX_of_PIM(pim)))
6617@d LV_Postdot_SYMID_of_PIM(pim) Postdot_SYMID_of_PIM(pim)
6618@d EIM_of_PIM(pim) (EIM_of_EIX(EIX_of_PIM(pim)))
6619@d LV_EIM_of_PIM(pim) EIM_of_PIM(pim)
6620@d Next_PIM_of_PIM(pim) (Next_PIM_of_EIX(EIX_of_PIM(pim)))
6621@d LV_Next_PIM_of_PIM(pim) Next_PIM_of_PIM(pim)
6622
@ |LIM_of_PIM| assumes that the PIM is in fact a LIM.
|PIM_is_LIM| is available to check this.
|PIM_of_LIM| is the reverse cast, from a LIM to a PIM.
6625@d PIM_of_LIM(pim) ((PIM)(pim))
6626@d PIM_is_LIM(pim) (EIM_of_EIX(EIX_of_PIM(pim)) == NULL)
6627@s PIM int
6628@<Private structures@> =
6629union u_postdot_item {
6630    LIM_Object t_leo;
6631    EIX_Object t_earley;
6632};
6633typedef union u_postdot_item* PIM;
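@ The layout described above can be tried out in isolation.
What follows is a minimal standalone sketch, not part of \libmarpa/.
The names |demo_eix|, |demo_lim|, |demo_pim| and the functions built on them
are hypothetical, and a pointer to |int| stands in for an Earley item.
The sketch assumes, as in the text, that a Leo item begins with an Earley index
whose Earley item pointer is |NULL|.

```c
#include <assert.h>
#include <stddef.h>

union demo_pim;

/* Hypothetical stand-in for an Earley index. */
struct demo_eix {
    union demo_pim *next;    /* chain of postdot items for one symbol */
    int postdot_symid;
    const int *earley_item;  /* NULL marks a Leo item */
};

/* Hypothetical stand-in for a Leo item. */
struct demo_lim {
    struct demo_eix eix;     /* must be the first member */
    int chain_length;
};

union demo_pim {
    struct demo_lim leo;
    struct demo_eix earley;
};

/* A postdot item is a Leo item exactly when its Earley item pointer is NULL. */
static int demo_pim_is_lim(const union demo_pim *pim)
{
    return pim->earley.earley_item == NULL;
}

/* Walk one chain and count the Leo items in it. */
static int demo_count_leo(const union demo_pim *pim)
{
    int count = 0;
    for (; pim; pim = pim->earley.next)
        if (demo_pim_is_lim(pim)) count++;
    return count;
}

/* Build a chain of one Leo item followed by one Earley index. */
static int demo_chain_example(void)
{
    static const int item = 42;  /* stands in for an Earley item */
    union demo_pim leo_item, eix_item;
    eix_item.earley.next = NULL;
    eix_item.earley.postdot_symid = 3;
    eix_item.earley.earley_item = &item;
    leo_item.leo.eix.next = &eix_item;
    leo_item.leo.eix.postdot_symid = 3;
    leo_item.leo.eix.earley_item = NULL;
    leo_item.leo.chain_length = 1;
    return demo_count_leo(&leo_item);
}
```

Because the chain fields sit at the same offset in both members of the union,
the walk does not need to know which kind of item it is visiting
until it tests the Earley item pointer.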
6634
6635@*0 Symbol of a Postdot Item.
@d SYMID_of_Postdot_Item(postdot) ((postdot)->t_earley.t_postdot_symid)
6637
@ This function searches the postdot array of an Earley set
for a symbol ID.
If successful, it
returns a pointer to the array entry which holds
the first postdot item for that symbol ID.
If it fails, it returns |NULL|.
6644@<Private function prototypes@> =
6645static inline PIM* pim_sym_p_find(ES set, SYMID symid);
6646@ @<Function definitions@> =
6647static inline PIM*
6648pim_sym_p_find (ES set, SYMID symid)
6649{
6650  gint lo = 0;
6651  gint hi = Postdot_SYM_Count_of_ES(set) - 1;
6652  PIM* postdot_array = set->t_postdot_ary;
6653  while (hi >= lo) { // A binary search
6654       gint trial = lo+(hi-lo)/2; // guards against overflow
6655       PIM trial_pim = postdot_array[trial];
6656       SYMID trial_symid = Postdot_SYMID_of_PIM(trial_pim);
6657       if (trial_symid == symid) return postdot_array+trial;
6658       if (trial_symid < symid) {
6659           lo = trial+1;
6660       } else {
6661           hi = trial-1;
6662       }
6663  }
6664  return NULL;
6665}
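@ The same search can be shown on a plain array.
This is a standalone sketch under simplifying assumptions:
the array holds bare symbol IDs in ascending order rather than postdot items,
and the names |demo_slot_find| and |demo_slot_example| are hypothetical.
As in |pim_sym_p_find|, the result is a pointer to the matching slot,
not the value stored in it.

```c
#include <assert.h>
#include <stddef.h>

/* Find the slot holding symid in an array sorted by ascending
   symbol ID.  Returns a pointer to the slot, or NULL if absent. */
static const int *demo_slot_find(const int *symids, int count, int symid)
{
    int lo = 0;
    int hi = count - 1;
    while (hi >= lo) {
        int trial = lo + (hi - lo) / 2;  /* avoids overflow of lo + hi */
        if (symids[trial] == symid) return symids + trial;
        if (symids[trial] < symid) lo = trial + 1;
        else hi = trial - 1;
    }
    return NULL;
}

/* Look up symid in a fixed example array.
   Returns the index of its slot, or -1 when it is absent. */
static int demo_slot_example(int symid)
{
    static const int symids[] = { 2, 5, 9, 14 };
    const int *slot = demo_slot_find(symids, 4, symid);
    return slot ? (int)(slot - symids) : -1;
}
```

Returning the slot rather than its contents lets a caller step to the
neighboring slot later, which is what the trace functions rely on.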
6666@ @<Private function prototypes@> =
6667static inline PIM first_pim_of_es_by_symid(ES set, SYMID symid);
6668@ @<Function definitions@> =
6669static inline PIM first_pim_of_es_by_symid(ES set, SYMID symid)
6670{
6671   PIM* pim_sym_p = pim_sym_p_find(set, symid);
6672   return pim_sym_p ? *pim_sym_p : NULL;
6673}
6674
6675@*0 Trace Functions.
6676Many of the
6677trace functions use
6678a ``trace postdot item".
6679This is
6680tracked on a per-recognizer basis.
6681@<Widely aligned recognizer elements@> =
6682union u_postdot_item** t_trace_pim_sym_p;
6683union u_postdot_item* t_trace_postdot_item;
6684@ @<Initialize recognizer elements@> =
6685r->t_trace_pim_sym_p = NULL;
6686r->t_trace_postdot_item = NULL;
@ |marpa_postdot_symbol_trace|
takes a recognizer and a symbol ID
as arguments.
6690It sets the trace postdot item to the first
6691postdot item for the symbol ID.
6692If there is no postdot item
6693for that symbol ID,
6694it returns |-1|.
6695On failure for other reasons,
6696it returns |-2|
6697and clears the trace postdot item.
6698@<Public function prototypes@> =
6699Marpa_Symbol_ID
6700marpa_postdot_symbol_trace (struct marpa_r *r,
6701    Marpa_Symbol_ID symid);
6702@ @<Function definitions@> =
6703Marpa_Symbol_ID
6704marpa_postdot_symbol_trace (struct marpa_r *r,
6705    Marpa_Symbol_ID symid)
6706{
6707  @<Return |-2| on failure@>@;
6708  ES current_es = r->t_trace_earley_set;
6709  PIM* pim_sym_p;
6710  PIM pim;
6711  @<Clear trace postdot item data@>@;
6712  @<Fail recognizer if not trace-safe@>@;
6713  @<Fail if recognizer |symid| is invalid@>@;
6714  if (!current_es) {
6715      R_ERROR("no pim");
6716      return failure_indicator;
6717  }
  pim_sym_p = PIM_SYM_P_of_ES_by_SYMID(current_es, symid);
  if (!pim_sym_p) return -1; // Guard: the lookup may find no entry for |symid|
  pim = *pim_sym_p;
  if (!pim) return -1;
6721  r->t_trace_pim_sym_p = pim_sym_p;
6722  r->t_trace_postdot_item = pim;
6723  return symid;
6724}
6725
6726@ @<Clear trace postdot item data@> =
6727r->t_trace_pim_sym_p = NULL;
6728r->t_trace_postdot_item = NULL;
6729
6730@ Set trace postdot item to the first in the trace Earley set,
6731and return its postdot symbol ID.
6732If the trace Earley set has no postdot items, return -1 and
6733clear the trace postdot item.
6734On other failures, return -2 and clear the trace
6735postdot item.
6736@<Public function prototypes@> =
6737Marpa_Symbol_ID
6738marpa_first_postdot_item_trace (struct marpa_r *r);
6739@ @<Function definitions@> =
6740Marpa_Symbol_ID
6741marpa_first_postdot_item_trace (struct marpa_r *r)
6742{
6743  @<Return |-2| on failure@>@;
6744  ES current_earley_set = r->t_trace_earley_set;
6745  PIM pim;
6746  PIM* pim_sym_p;
6747  @<Clear trace postdot item data@>@;
6748  @<Fail recognizer if not trace-safe@>@;
6749  if (!current_earley_set) {
6750      @<Clear trace Earley item data@>@;
6751      R_ERROR("no trace es");
6752      return failure_indicator;
6753  }
6754  if (current_earley_set->t_postdot_sym_count <= 0) return -1;
6755  pim_sym_p = current_earley_set->t_postdot_ary+0;
6756  pim = pim_sym_p[0];
6757  r->t_trace_pim_sym_p = pim_sym_p;
6758  r->t_trace_postdot_item = pim;
6759  return Postdot_SYMID_of_PIM(pim);
6760}
6761
6762@ Set the trace postdot item to the one after
6763the current trace postdot item,
6764and return its postdot symbol ID.
6765If the current trace postdot item is the last,
6766return -1 and clear the trace postdot item.
6767On other failures, return -2 and clear the trace
6768postdot item.
6769@<Public function prototypes@> =
6770Marpa_Symbol_ID
6771marpa_next_postdot_item_trace (struct marpa_r *r);
6772@ @<Function definitions@> =
6773Marpa_Symbol_ID
6774marpa_next_postdot_item_trace (struct marpa_r *r)
6775{
6776  const SYMID no_more_postdot_symbols = -1;
6777  @<Return |-2| on failure@>@;
6778  ES current_set = r->t_trace_earley_set;
6779  PIM pim;
6780  PIM* pim_sym_p;
6781
6782  pim_sym_p = r->t_trace_pim_sym_p;
6783  pim = r->t_trace_postdot_item;
6784  @<Clear trace postdot item data@>@;
6785  if (!pim_sym_p || !pim) {
6786      R_ERROR("no trace pim");
6787      return failure_indicator;
6788  }
6789  @<Fail recognizer if not trace-safe@>@;
6790  if (!current_set) {
6791      R_ERROR("no trace es");
6792      return failure_indicator;
6793  }
6794  pim = Next_PIM_of_PIM(pim);
6795  if (!pim) { /* If no next postdot item for this symbol,
6796       then look at next symbol */
6797       pim_sym_p++;
6798       if (pim_sym_p - current_set->t_postdot_ary
6799	   >= current_set->t_postdot_sym_count) {
6800	   return no_more_postdot_symbols;
6801       }
6802      pim = *pim_sym_p;
6803  }
6804  r->t_trace_pim_sym_p = pim_sym_p;
6805  r->t_trace_postdot_item = pim;
6806  return Postdot_SYMID_of_PIM(pim);
6807}
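@ The first/next protocol used by these trace functions can be sketched
on its own.
This is a hypothetical standalone model, not the \libmarpa/ code:
|demo_item|, |demo_cursor|, |demo_first| and |demo_next| are invented names,
and the cursor plays the part of the trace fields in the recognizer.
It keeps the same conventions: a non-negative symbol ID on success,
|-1| when the items are exhausted, |-2| when there is no current item,
and a cleared cursor in both of the latter cases.

```c
#include <assert.h>
#include <stddef.h>

struct demo_item {
    struct demo_item *next;   /* next item for the same symbol */
    int symid;
};

struct demo_cursor {
    struct demo_item **slots; /* one chain head per symbol */
    int slot_count;
    struct demo_item **slot;  /* current slot, or NULL */
    struct demo_item *item;   /* current item, or NULL */
};

static void demo_cursor_clear(struct demo_cursor *c)
{
    c->slot = NULL;
    c->item = NULL;
}

/* Position on the first item.  Returns its symbol ID, or -1 if none. */
static int demo_first(struct demo_cursor *c)
{
    demo_cursor_clear(c);
    if (c->slot_count <= 0) return -1;
    c->slot = c->slots;
    c->item = *c->slot;
    return c->item->symid;
}

/* Advance.  Returns the symbol ID, -1 at the end,
   or -2 if there is no current item. */
static int demo_next(struct demo_cursor *c)
{
    struct demo_item **slot = c->slot;
    struct demo_item *item = c->item;
    demo_cursor_clear(c);
    if (!slot || !item) return -2;
    item = item->next;
    if (!item) {              /* chain exhausted: move to the next symbol */
        slot++;
        if (slot - c->slots >= c->slot_count) return -1;
        item = *slot;
    }
    c->slot = slot;
    c->item = item;
    return item->symid;
}

/* Visit two items for symbol 4 and one for symbol 7.
   Returns how many items were seen. */
static int demo_traverse_example(void)
{
    struct demo_item a2 = { NULL, 4 };
    struct demo_item a1 = { &a2, 4 };
    struct demo_item b1 = { NULL, 7 };
    struct demo_item *slots[2];
    struct demo_cursor c;
    int seen = 0;
    int symid;
    slots[0] = &a1;
    slots[1] = &b1;
    c.slots = slots;
    c.slot_count = 2;
    for (symid = demo_first(&c); symid >= 0; symid = demo_next(&c)) seen++;
    return seen;
}
```

Clearing the cursor before any check, as the trace functions do,
means that every early return leaves it in a known state.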
6808
@ @<Private function prototypes@> =
Marpa_Symbol_ID marpa_postdot_item_symbol(struct marpa_r *r);
@ @<Function definitions@> =
Marpa_Symbol_ID marpa_postdot_item_symbol(struct marpa_r *r)
6813{
6814  @<Return |-2| on failure@>@;
6815  PIM postdot_item = r->t_trace_postdot_item;
6816  @<Fail recognizer if not trace-safe@>@;
6817  if (!postdot_item) {
6818      R_ERROR("no trace pim");
6819      return failure_indicator;
6820  }
6821  return Postdot_SYMID_of_PIM(postdot_item);
6822}
6823
6824
6825@** Source Objects.
6826These are distinguished by context.
6827@*0 The Relationship between Leo items and Ambiguity.
6828The relationship between Leo items and ambiguous sources bears
6829some explaining.
6830Leo sources must be unique, but only when their predecessor's
6831Earley set is considered.
That is, for every pairing of Earley item and Earley set,
there can be only one Leo source in that Earley item
with a predecessor in that Earley set.
But there may be other sources (both Leo and non-Leo),
as long as their predecessors
are in different Earley sets.
6838@ One way to look at these Leo ambiguities is as different
6839``factorings" of the Earley item.
6840Assume the last (or transition) symbol of an Earley item
6841is a token.
6842An Earley item will often have both a predecessor and a token,
6843and these can ``factor", or divide up, the distance between
6844an Earley item's origin and its current set in different ways.
6845@ The Earley item can have only one origin,
6846and only one transition symbol.
6847But that transition symbol does not have to start at the origin
6848and can start anywhere between the origin and the current
6849set of the Earley item.
6850For example, for an Earley item at earleme 14, with its origin at 10,
6851tokens may start at earlemes 10, 11, 12 and 13.
6852Each may have its own Leo source.
6853At those earlemes without a Leo source, there may be any number
6854of non-Leo sources.
6855@ In this way, an Earley item with a Leo source can be ambiguous.
6856The discussion above assumed the final symbol was a token.
6857The situation for completion Earley items is similar,
6858and these also can both have a Leo source and
6859be ambiguous.
6860@*0 Optimization.
6861There will be a lot of these structures in a long
6862parse, so space optimization is important.
6863I have some latitude in the number of linked lists
in an ambiguous source.
6865If an |int| is the same size as a |void*|,
6866then space for three |void*| in ambiguous sources
6867comes ``free".
6868If |void*| is $n$ bytes larger than an |int|,
6869then each unambiguous source uses $n$ bytes
6870more than it has to, although there are
6871compensating improvements in
6872speed and simplicity.
6873Any programmer trying to take advantage
6874of architectures where |int|
6875is shorter than |void*| will need to
6876assure herself that the space she saves in
the |s_ambiguous_source| struct was not simply wasted
6878by alignment within structures or during memory allocation.
6879@d Next_SRCL_of_SRCL(link) ((link)->t_next)
6880@d LV_Next_SRCL_of_SRCL(link) Next_SRCL_of_SRCL(link)
6881@ @<Private typedefs@> =
6882struct s_source;
6883typedef struct s_source* SRC;
6884@ @<Source object structure@>=
6885struct s_source {
6886     gpointer t_predecessor;
6887     union {
6888	 gpointer t_completion;
6889	 TOK t_token;
6890     } t_cause;
6891};
6892
6893@ @<Private typedefs@> =
6894struct s_source_link;
6895typedef struct s_source_link* SRCL;
6896@ @<Source object structure@>=
6897struct s_source_link {
6898    SRCL t_next;
6899    struct s_source t_source;
6900};
6901
6902@ @<Source object structure@>=
6903struct s_ambiguous_source {
6904    SRCL t_leo;
6905    SRCL t_token;
6906    SRCL t_completion;
6907};
6908
6909@ @<Source object structure@>=
6910union u_source_container {
6911    struct s_ambiguous_source t_ambiguous;
6912    struct s_source t_unique;
6913};
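@ One way to check the space trade-off on a particular platform
is to measure it.
The following standalone sketch mirrors the shape of the structures above
with hypothetical names (|demo_source|, |demo_ambiguous|, |demo_container|)
and plain |void*| fields.
It is an illustration only, and it says nothing about the padding
that the enclosing Earley item may add.

```c
#include <assert.h>
#include <stddef.h>

struct demo_link;

/* Shaped like s_source: a predecessor and a cause. */
struct demo_source {
    void *predecessor;
    union {
        void *completion;
        void *token;
    } cause;
};

/* Shaped like s_ambiguous_source: three list heads. */
struct demo_ambiguous {
    struct demo_link *leo;
    struct demo_link *token;
    struct demo_link *completion;
};

/* Shaped like u_source_container. */
union demo_container {
    struct demo_ambiguous ambiguous;
    struct demo_source unique;
};

/* Bytes of the container that a unique source leaves unused. */
static size_t demo_unused_by_unique(void)
{
    return sizeof(union demo_container) - sizeof(struct demo_source);
}
```

The union is at least as large as its largest member,
so the value returned is the space an unambiguous item gives up
in exchange for keeping both layouts in one field.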
6914
6915@
6916@d Source_of_SRCL(link) ((link)->t_source)
6917@d Source_of_EIM(eim) ((eim)->t_container.t_unique)
6918@d Predecessor_of_Source(srcd) ((srcd).t_predecessor)
6919@d Predecessor_of_SRC(source) Predecessor_of_Source(*(source))
6920@d Predecessor_of_EIM(item) Predecessor_of_Source(Source_of_EIM(item))
6921@d Predecessor_of_SRCL(link) Predecessor_of_Source(Source_of_SRCL(link))
6922@d LV_Predecessor_of_SRCL(link) Predecessor_of_SRCL(link)
6923@d Cause_of_Source(srcd) ((srcd).t_cause.t_completion)
6924@d Cause_of_SRC(source) Cause_of_Source(*(source))
6925@d Cause_of_EIM(item) Cause_of_Source(Source_of_EIM(item))
6926@d Cause_of_SRCL(link) Cause_of_Source(Source_of_SRCL(link))
6927@d TOK_of_Source(srcd) ((srcd).t_cause.t_token)
6928@d TOK_of_SRC(source) TOK_of_Source(*(source))
6929@d TOK_of_EIM(eim) TOK_of_Source(Source_of_EIM(eim))
6930@d TOK_of_SRCL(link) TOK_of_Source(Source_of_SRCL(link))
6931@d SYMID_of_Source(srcd) SYMID_of_TOK(TOK_of_Source(srcd))
6932@d SYMID_of_SRC(source) SYMID_of_Source(*(source))
6933@d SYMID_of_EIM(eim) SYMID_of_Source(Source_of_EIM(eim))
6934@d SYMID_of_SRCL(link) SYMID_of_Source(Source_of_SRCL(link))
6935
6936@ @d Cause_AHFA_State_ID_of_SRC(source)
6937    AHFAID_of_EIM((EIM)Cause_of_SRC(source))
6938@d Leo_Transition_SYMID_of_SRC(leo_source)
6939    Postdot_SYMID_of_LIM((LIM)Predecessor_of_SRC(leo_source))
6940
6941@
6942@d First_Completion_Link_of_EIM(item) ((item)->t_container.t_ambiguous.t_completion)
6943@d LV_First_Completion_Link_of_EIM(item) First_Completion_Link_of_EIM(item)
6944@d First_Token_Link_of_EIM(item) ((item)->t_container.t_ambiguous.t_token)
6945@d LV_First_Token_Link_of_EIM(item) First_Token_Link_of_EIM(item)
6946@d First_Leo_SRCL_of_EIM(item) ((item)->t_container.t_ambiguous.t_leo)
6947@d LV_First_Leo_SRCL_of_EIM(item) First_Leo_SRCL_of_EIM(item)
6948
6949@ @<Private function prototypes@> = static inline void
6950token_link_add (struct marpa_r *r,
6951		EIM item,
6952		EIM predecessor,
6953		TOK token);
6954@ @<Function definitions@> = static inline
6955void
6956token_link_add (struct marpa_r *r,
6957		EIM item,
6958		EIM predecessor,
6959		TOK token)
6960{
6961  SRCL new_link;
6962  guint previous_source_type = Source_Type_of_EIM (item);
6963  if (previous_source_type == NO_SOURCE)
6964    {
6965      Source_Type_of_EIM (item) = SOURCE_IS_TOKEN;
6966      item->t_container.t_unique.t_predecessor = predecessor;
6967      TOK_of_Source(item->t_container.t_unique) = token;
6968      return;
6969    }
6970  if (previous_source_type != SOURCE_IS_AMBIGUOUS)
6971    { // If the sourcing is not already ambiguous, make it so
6972      earley_item_ambiguate (r, item);
6973    }
6974  new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
6975  new_link->t_next = First_Token_Link_of_EIM (item);
6976  new_link->t_source.t_predecessor = predecessor;
6977  TOK_of_Source(new_link->t_source) = token;
6978  LV_First_Token_Link_of_EIM (item) = new_link;
6979}
6980
6981@ @<Private function prototypes@> = static inline void
6982completion_link_add (struct marpa_r *r,
6983		EIM item,
6984		EIM predecessor,
6985		EIM cause);
6986@
6987Each possible cause
6988link is only visited once.
6989It may be paired with several different predecessors.
6990Each cause may complete several different LHS symbols
6991and Marpa::XS will seek predecessors for each at
6992the parent location.
6993Two different completed LHS symbols might be postdot
6994symbols for the same predecessor Earley item.
6995For this reason,
6996predecessor-cause pairs
6997might not be unique
6998within an Earley item.
6999@ Since a completion link consists entirely of
7000the predecessor-cause pair, this means duplicate
7001completion links are possible.
7002The maximum possible number of such duplicates is the
7003number of complete LHS symbols for the current AHFA state.
This is always a constant and typically a small one,
7005but it is also typically larger than 1.
7006@ This is not an issue for unambiguous parsing.
7007It {\bf is} an issue for iterating ambiguous parses.
7008The strategy currently taken is to do nothing about duplicates
7009in the recognition phase,
7010and to eliminate them in the evaluation phase.
7011Ultimately, duplicates must be eliminated by rule and
7012position -- eliminating duplicates by AHFA state is
7013{\bf not} sufficient.
7014Since I do not pull out the
7015individual rules and positions until the evaluation phase,
7016at this writing it seems to make sense to deal with
7017duplicates there.
7018@ As shown above, the number of duplicate completion links
7019is never more than $O(n)$ where $n$ is the number of Earley items.
7020For academic purposes, it
7021is probably possible to contrive a parse which generates
7022a lot of duplicates.
7023The actual numbers
7024I have encountered have always been very small,
7025even in grammars of only academic interest.
@ The carrying cost of the extra completion links can be safely
assumed to be very low,
in comparison with the cost of searching for them.
This means that the major consideration in deciding
where to eliminate duplicates
is time efficiency.
7032Duplicate completion links should be eliminated
7033at the point where that elimination can be accomplished
7034most efficiently.
7035@<Function definitions@> = static inline
7036void
7037completion_link_add (struct marpa_r *r,
7038		EIM item,
7039		EIM predecessor,
7040		EIM cause)
7041{
7042  SRCL new_link;
7043  guint previous_source_type = Source_Type_of_EIM (item);
7044  if (previous_source_type == NO_SOURCE)
7045    {
7046      Source_Type_of_EIM (item) = SOURCE_IS_COMPLETION;
7047      item->t_container.t_unique.t_predecessor = predecessor;
7048      Cause_of_Source(item->t_container.t_unique) = cause;
7049      return;
7050    }
7051  if (previous_source_type != SOURCE_IS_AMBIGUOUS)
7052    { // If the sourcing is not already ambiguous, make it so
7053      earley_item_ambiguate (r, item);
7054    }
7055  new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
7056  new_link->t_next = First_Completion_Link_of_EIM (item);
7057  new_link->t_source.t_predecessor = predecessor;
7058  Cause_of_Source(new_link->t_source) = cause;
7059  LV_First_Completion_Link_of_EIM (item) = new_link;
7060}
7061
7062@ @<Function definitions@> = static inline
7063void
7064leo_link_add (struct marpa_r *r,
7065		EIM item,
7066		LIM predecessor,
7067		EIM cause)
7068{
7069  SRCL new_link;
7070  guint previous_source_type = Source_Type_of_EIM (item);
7071  if (previous_source_type == NO_SOURCE)
7072    {
7073      Source_Type_of_EIM (item) = SOURCE_IS_LEO;
7074      item->t_container.t_unique.t_predecessor = predecessor;
7075      Cause_of_Source(item->t_container.t_unique) = cause;
7076      return;
7077    }
7078  if (previous_source_type != SOURCE_IS_AMBIGUOUS)
7079    { // If the sourcing is not already ambiguous, make it so
7080      earley_item_ambiguate (r, item);
7081    }
7082  new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
7083  new_link->t_next = First_Leo_SRCL_of_EIM (item);
7084  new_link->t_source.t_predecessor = predecessor;
7085  Cause_of_Source(new_link->t_source) = cause;
7086  LV_First_Leo_SRCL_of_EIM(item) = new_link;
7087}
7088@ @<Private function prototypes@> = static inline void
7089leo_link_add (struct marpa_r *r,
7090		EIM item,
7091		LIM predecessor,
7092		EIM cause);
7093
7094@ {\bf Convert an Earley item to an ambiguous one.}
7095|earley_item_ambiguate|
7096assumes it is called when there is exactly one source.
In other words, it assumes that the Earley item
is not unsourced,
and that it is not already ambiguous.
Ambiguous sources should have more than one source,
and
|earley_item_ambiguate|
assumes that a new source will be added as a followup.
7104@
7105Inlining |earley_item_ambiguate| might help in some
circumstances, but at this point
7107|earley_item_ambiguate| is not marked |inline|.
7108|earley_item_ambiguate|
7109is not short,
7110it is referenced in several places,
7111it is only called for ambiguous Earley items,
7112and even for these it is only called when the
7113Earley item first becomes ambiguous.
7114@<Function definitions@> = static
7115void earley_item_ambiguate (struct marpa_r * r, EIM item)
7116{
7117  guint previous_source_type = Source_Type_of_EIM (item);
7118  Source_Type_of_EIM (item) = SOURCE_IS_AMBIGUOUS;
7119  switch (previous_source_type)
7120    {
7121    case SOURCE_IS_TOKEN: @<Ambiguate token source@>@;
7122      return;
7123    case SOURCE_IS_COMPLETION: @<Ambiguate completion source@>@;
7124      return;
7125    case SOURCE_IS_LEO: @<Ambiguate Leo source@>@;
7126      return;
7127    }
7128}
7129@ @<Private function prototypes@> = static
7130void earley_item_ambiguate (struct marpa_r * r, EIM item);
7131
7132@ @<Ambiguate token source@> = {
7133  SRCL new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
7134  new_link->t_next = NULL;
7135  new_link->t_source = item->t_container.t_unique;
7136  LV_First_Leo_SRCL_of_EIM (item) = NULL;
7137  LV_First_Completion_Link_of_EIM (item) = NULL;
7138  LV_First_Token_Link_of_EIM (item) = new_link;
7139}
7140
7141@ @<Ambiguate completion source@> = {
7142  SRCL new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
7143  new_link->t_next = NULL;
7144  new_link->t_source = item->t_container.t_unique;
7145  LV_First_Leo_SRCL_of_EIM (item) = NULL;
7146  LV_First_Completion_Link_of_EIM (item) = new_link;
7147  LV_First_Token_Link_of_EIM (item) = NULL;
7148}
7149
7150@ @<Ambiguate Leo source@> = {
7151  SRCL new_link = obstack_alloc (&r->t_obs, sizeof (*new_link));
7152  new_link->t_next = NULL;
7153  new_link->t_source = item->t_container.t_unique;
7154  LV_First_Leo_SRCL_of_EIM (item) = new_link;
7155  LV_First_Completion_Link_of_EIM (item) = NULL;
7156  LV_First_Token_Link_of_EIM (item) = NULL;
7157}
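@ The promotion from a unique source to an ambiguous one can be modeled
in a few lines.
This is a simplified, hypothetical sketch, not the \libmarpa/ code:
it keeps a single list where the real container keeps three,
an |int| stands in for a predecessor-cause pair,
and |malloc| stands in for the obstack.
All the |demo_| names are invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

enum demo_type { DEMO_NO_SOURCE, DEMO_UNIQUE, DEMO_AMBIGUOUS };

struct demo_link {
    struct demo_link *next;
    int source;                  /* stands in for a predecessor-cause pair */
};

struct demo_item {
    enum demo_type type;
    union {
        int unique;              /* the single source, stored inline */
        struct demo_link *first; /* head of the list once ambiguous */
    } container;
};

/* Allocate a link.  malloc is an assumption made for the sketch. */
static struct demo_link *demo_link_new(int source, struct demo_link *next)
{
    struct demo_link *link = malloc(sizeof *link);
    if (!link) abort();
    link->source = source;
    link->next = next;
    return link;
}

static void demo_source_add(struct demo_item *item, int source)
{
    if (item->type == DEMO_NO_SOURCE) {
        item->type = DEMO_UNIQUE;
        item->container.unique = source;
        return;
    }
    if (item->type != DEMO_AMBIGUOUS) {
        /* Copy the inline source out before the union is overwritten. */
        int old = item->container.unique;
        item->type = DEMO_AMBIGUOUS;
        item->container.first = demo_link_new(old, NULL);
    }
    /* New links are pushed on the front of the list. */
    item->container.first = demo_link_new(source, item->container.first);
}

/* Add three sources, then count and free the resulting links. */
static int demo_ambiguate_example(void)
{
    struct demo_item item;
    struct demo_link *link;
    int count = 0;
    item.type = DEMO_NO_SOURCE;
    demo_source_add(&item, 1);
    demo_source_add(&item, 2);
    demo_source_add(&item, 3);
    link = item.container.first;
    while (link) {
        struct demo_link *next = link->next;
        count++;
        free(link);
        link = next;
    }
    return count;
}
```

The first source costs no allocation at all.
Only the second source pays for the conversion, and every later one
pays for a single link.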
7158
7159@*0 Trace Functions.
7160Many trace functions track a ``trace source link".
7161There is only one of these, shared among all types of
7162source link.
7163It is an error to call a trace function that is
7164inconsistent with the type of the current trace
7165source link.
7166@<Widely aligned recognizer elements@> =
7167SRC t_trace_source;
7168SRCL t_trace_next_source_link;
7169@ @<Bit aligned recognizer elements@> =
7170guint t_trace_source_type:3;
7171@ @<Initialize recognizer elements@> =
7172r->t_trace_source = NULL;
7173r->t_trace_next_source_link = NULL;
7174r->t_trace_source_type = NO_SOURCE;
7175
7176@*1 Trace First Token Link.
7177@ Set the trace source link to a token link,
7178if there is one, otherwise clear the trace source link.
7179Returns the symbol ID if there was a token source link,
7180|-1| if there was none,
7181and |-2| on some other kind of failure.
7182@<Public function prototypes@> =
7183Marpa_Symbol_ID marpa_first_token_link_trace(struct marpa_r *r);
7184@ @<Function definitions@> =
7185Marpa_Symbol_ID marpa_first_token_link_trace(struct marpa_r *r)
7186{
7187   @<Return |-2| on failure@>@;
7188   SRC source;
7189   guint source_type;
7190    EIM item = r->t_trace_earley_item;
7191    @<Fail recognizer if not trace-safe@>@;
7192    @<Set |item|, failing if necessary@>@;
7193    source_type = Source_Type_of_EIM (item);
7194    switch (source_type)
7195      {
7196      case SOURCE_IS_TOKEN:
7197	r->t_trace_source_type = SOURCE_IS_TOKEN;
7198	source = &(item->t_container.t_unique);
7199	r->t_trace_source = source;
7200	r->t_trace_next_source_link = NULL;
7201	return SYMID_of_SRC (source);
7202      case SOURCE_IS_AMBIGUOUS:
7203	{
7204	  SRCL full_link =
7205	    First_Token_Link_of_EIM (item);
7206	  if (full_link)
7207	    {
7208	      r->t_trace_source_type = SOURCE_IS_TOKEN;
7209	      r->t_trace_next_source_link = Next_SRCL_of_SRCL (full_link);
7210	      r->t_trace_source = &(full_link->t_source);
7211	      return SYMID_of_SRCL (full_link);
7212	    }
7213	}
7214      }
7215    trace_source_link_clear(r);
7216    return -1;
7217}
7218
7219@*1 Trace Next Token Link.
7220@ Set the trace source link to the next token link,
7221if there is one.
7222Otherwise clear the trace source link.
7223@ Returns the symbol ID if there is
7224a next token source link,
7225|-1| if there was none,
7226and |-2| on some other kind of failure.
7227@<Public function prototypes@> =
7228Marpa_Symbol_ID marpa_next_token_link_trace(struct marpa_r *r);
7229@ @<Function definitions@> =
7230Marpa_Symbol_ID marpa_next_token_link_trace(struct marpa_r *r)
7231{
7232   @<Return |-2| on failure@>@;
7233   SRCL full_link;
7234    EIM item;
7235    @<Fail recognizer if not trace-safe@>@;
7236    @<Set |item|, failing if necessary@>@;
7237    if (r->t_trace_source_type != SOURCE_IS_TOKEN) {
7238	trace_source_link_clear(r);
7239	R_ERROR("not tracing token links");
7240        return failure_indicator;
7241    }
7242    if (!r->t_trace_next_source_link) {
7243	trace_source_link_clear(r);
7244        return -1;
7245    }
7246    full_link = r->t_trace_next_source_link;
7247    r->t_trace_next_source_link = Next_SRCL_of_SRCL (full_link);
7248    r->t_trace_source = &(full_link->t_source);
7249    return SYMID_of_SRCL (full_link);
7250}
7251
7252@*1 Trace First Completion Link.
7253@ Set the trace source link to a completion link,
if there is one, otherwise clear the trace source link.
7255Returns the AHFA state ID of the cause
7256if there was a completion source link,
7257|-1| if there was none,
7258and |-2| on some other kind of failure.
7259@<Public function prototypes@> =
7260Marpa_Symbol_ID marpa_first_completion_link_trace(struct marpa_r *r);
7261@ @<Function definitions@> =
7262Marpa_Symbol_ID marpa_first_completion_link_trace(struct marpa_r *r)
7263{
7264   @<Return |-2| on failure@>@;
7265   SRC source;
7266   guint source_type;
7267    EIM item = r->t_trace_earley_item;
7268    @<Fail recognizer if not trace-safe@>@;
7269    @<Set |item|, failing if necessary@>@;
7270    switch ((source_type = Source_Type_of_EIM (item)))
7271      {
7272      case SOURCE_IS_COMPLETION:
7273	r->t_trace_source_type = SOURCE_IS_COMPLETION;
7274	source = &(item->t_container.t_unique);
7275	r->t_trace_source = source;
7276	r->t_trace_next_source_link = NULL;
7277	return Cause_AHFA_State_ID_of_SRC (source);
7278      case SOURCE_IS_AMBIGUOUS:
7279	{
7280	  SRCL completion_link = First_Completion_Link_of_EIM (item);
7281	  if (completion_link)
7282	    {
7283	      source = &(completion_link->t_source);
7284	      r->t_trace_source_type = SOURCE_IS_COMPLETION;
7285	      r->t_trace_next_source_link = Next_SRCL_of_SRCL (completion_link);
7286	      r->t_trace_source = source;
7287	      return Cause_AHFA_State_ID_of_SRC (source);
7288	    }
7289	}
7290      }
7291    trace_source_link_clear(r);
7292    return -1;
7293}
7294
7295@*1 Trace Next Completion Link.
7296@ Set the trace source link to the next completion link,
7297if there is one.
7298Otherwise clear the trace source link.
@ Returns the AHFA state ID of the cause if there is
a next completion source link,
7301|-1| if there was none,
7302and |-2| on some other kind of failure.
7303@<Public function prototypes@> =
7304Marpa_Symbol_ID marpa_next_completion_link_trace(struct marpa_r *r);
7305@ @<Function definitions@> =
7306Marpa_Symbol_ID marpa_next_completion_link_trace(struct marpa_r *r)
7307{
7308   @<Return |-2| on failure@>@;
7309   SRC source;
7310   SRCL completion_link;
7311    EIM item;
7312    @<Fail recognizer if not trace-safe@>@;
7313    @<Set |item|, failing if necessary@>@;
7314    if (r->t_trace_source_type != SOURCE_IS_COMPLETION) {
7315	trace_source_link_clear(r);
7316	R_ERROR("not tracing completion links");
7317        return failure_indicator;
7318    }
7319    if (!r->t_trace_next_source_link) {
7320	trace_source_link_clear(r);
7321        return -1;
7322    }
7323    completion_link = r->t_trace_next_source_link;
7324    r->t_trace_next_source_link = Next_SRCL_of_SRCL (r->t_trace_next_source_link);
7325    source = &(completion_link->t_source);
7326    r->t_trace_source = source;
7327    return Cause_AHFA_State_ID_of_SRC (source);
7328}
7329
7330@*1 Trace First Leo Link.
7331@ Set the trace source link to a Leo link,
if there is one, otherwise clear the trace source link.
7333Returns the AHFA state ID of the cause
7334if there was a Leo source link,
7335|-1| if there was none,
7336and |-2| on some other kind of failure.
7337@<Public function prototypes@> =
7338Marpa_Symbol_ID marpa_first_leo_link_trace(struct marpa_r *r);
7339@ @<Function definitions@> =
7340Marpa_Symbol_ID
7341marpa_first_leo_link_trace (struct marpa_r *r)
7342{
7343  @<Return |-2| on failure@>@;
7344  SRC source;
7345  guint source_type;
7346  EIM item = r->t_trace_earley_item;
7347  @<Fail recognizer if not trace-safe@>@;
7348  @<Set |item|, failing if necessary@>@;
7349  switch ((source_type = Source_Type_of_EIM (item)))
7350	{
7351	case SOURCE_IS_LEO:
7352	  r->t_trace_source_type = SOURCE_IS_LEO;
7353	  source = &(item->t_container.t_unique);
7354	  r->t_trace_source = source;
7355	  r->t_trace_next_source_link = NULL;
7356	  return Cause_AHFA_State_ID_of_SRC (source);
7357	case SOURCE_IS_AMBIGUOUS:
7358	  {
7359	    SRCL full_link =
7360	      First_Leo_SRCL_of_EIM (item);
7361	    if (full_link)
7362	      {
7363		source = &(full_link->t_source);
7364		r->t_trace_source_type = SOURCE_IS_LEO;
7365		r->t_trace_next_source_link = (SRCL)
7366		  Next_SRCL_of_SRCL (full_link);
7367		r->t_trace_source = source;
7368		return Cause_AHFA_State_ID_of_SRC (source);
7369	      }
7370	  }
7371	}
7372  trace_source_link_clear (r);
7373  return -1;
7374}
7375
7376@*1 Trace Next Leo Link.
7377@ Set the trace source link to the next Leo link,
7378if there is one.
7379Otherwise clear the trace source link.
@ Returns the AHFA state ID of the cause if there is
7381a next Leo source link,
7382|-1| if there was none,
7383and |-2| on some other kind of failure.
7384@<Public function prototypes@> =
7385Marpa_Symbol_ID marpa_next_leo_link_trace(struct marpa_r *r);
7386@ @<Function definitions@> =
7387Marpa_Symbol_ID
7388marpa_next_leo_link_trace (struct marpa_r *r)
7389{
7390  @<Return |-2| on failure@>@/
7391  SRCL full_link;
7392  SRC source;
7393  EIM item;
7394  @<Fail recognizer if not trace-safe@>@/
7395  @<Set |item|, failing if necessary@>@/
7396  if (r->t_trace_source_type != SOURCE_IS_LEO)
7397    {
7398      trace_source_link_clear (r);
7399      R_ERROR("not tracing leo links");
7400      return failure_indicator;
7401    }
7402  if (!r->t_trace_next_source_link)
7403    {
7404      trace_source_link_clear (r);
7405      return -1;
7406    }
7407  full_link = r->t_trace_next_source_link;
7408  source = &(full_link->t_source);
7409  r->t_trace_source = source;
7410  r->t_trace_next_source_link =
7411    Next_SRCL_of_SRCL(r->t_trace_next_source_link);
7412  return Cause_AHFA_State_ID_of_SRC (source);
7413}
7414
7415@ @<Set |item|, failing if necessary@> =
7416    item = r->t_trace_earley_item;
7417    if (!item) {
7418	trace_source_link_clear(r);
7419	R_ERROR("no eim");
7420        return failure_indicator;
7421    }
7422
7423@*1 Clear Trace Source Link.
7424@ @<Private function prototypes@> =
7425static inline void trace_source_link_clear(struct marpa_r* r);
7426@ @<Function definitions@> =
7427static inline void trace_source_link_clear(struct marpa_r* r) {
7428    r->t_trace_next_source_link = NULL;
7429    r->t_trace_source = NULL;
7430    r->t_trace_source_type = NO_SOURCE;
7431}
7432
7433@*1 Return the Predecessor AHFA State.
7434Returns the predecessor AHFA State,
7435or -1 if there is no predecessor.
If the recognizer is not trace-safe,
7437there is no trace source link,
7438the trace source link is a Leo source,
7439or there is some other failure,
7440|-2| is returned.
7441@<Public function prototypes@> =
7442Marpa_AHFA_State_ID marpa_source_predecessor_state(struct marpa_r *r);
7443@ @<Function definitions@> =
7444AHFAID marpa_source_predecessor_state(struct marpa_r *r)
7445{
7446   @<Return |-2| on failure@>@/
7447   guint source_type;
7448   SRC source;
7449    @<Fail recognizer if not trace-safe@>@/
7450   source_type = r->t_trace_source_type;
7451    @<Set source, failing if necessary@>@/
7452    switch (source_type)
7453    {
7454    case SOURCE_IS_TOKEN:
7455    case SOURCE_IS_COMPLETION: {
7456        EIM predecessor = Predecessor_of_SRC(source);
7457	if (!predecessor) return -1;
7458	return AHFAID_of_EIM(predecessor);
7459    }
7460    }
7461    R_ERROR(invalid_source_type_message(source_type));
7462    return failure_indicator;
7463}
7464
7465@*1 Return the Token.
7466Returns the token.
7467The symbol id is the return value,
7468and the value is written to |*value_p|,
7469if it is non-null.
7470If the recognizer is not trace-safe,
7471there is no trace source link,
7472if the trace source link is not a token source,
7473or there is some other failure,
7474|-2| is returned.
7475\par
7476There is no function to return just the token value
7477for two reasons.
7478First, since token value can be anything
7479an additional return value is needed to indicate errors,
7480which means the symbol ID comes at virtually zero cost.
7481Second, whenever the token value is
7482wanted, the symbol ID is almost always wanted as well.
7483@<Public function prototypes@> =
7484Marpa_Symbol_ID marpa_source_token(struct marpa_r *r, gpointer *value_p);
7485@ @<Function definitions@> =
7486Marpa_Symbol_ID marpa_source_token(struct marpa_r *r, gpointer *value_p)
7487{
7488   @<Return |-2| on failure@>@;
7489   guint source_type;
7490   SRC source;
7491    @<Fail recognizer if not trace-safe@>@;
7492   source_type = r->t_trace_source_type;
7493    @<Set source, failing if necessary@>@;
7494    if (source_type == SOURCE_IS_TOKEN) {
7495	const TOK token = TOK_of_SRC(source);
7496        if (value_p) *value_p = Value_of_TOK(token);
7497	return SYMID_of_TOK(token);
7498    }
7499    R_ERROR(invalid_source_type_message(source_type));
7500    return failure_indicator;
7501}
7502
7503@*1 Return the Leo Transition Symbol.
7504The Leo transition symbol is defined only for sources
7505with a Leo predecessor.
7506The transition from a predecessor to the Earley item
7507containing a source will always be over exactly one symbol.
7508In the case of a Leo source, this symbol will be
7509the Leo transition symbol.
7510@ Returns the symbol ID of the Leo transition symbol.
7511If the recognizer is not trace-safe,
7512if there is no trace source link,
7513if the trace source link is not a Leo source,
7514or there is some other failure,
7515|-2| is returned.
7516@<Public function prototypes@> =
7517Marpa_Symbol_ID marpa_source_leo_transition_symbol(struct marpa_r *r);
7518@ @<Function definitions@> =
7519Marpa_Symbol_ID marpa_source_leo_transition_symbol(struct marpa_r *r)
7520{
7521   @<Return |-2| on failure@>@/
7522   guint source_type;
7523   SRC source;
7524    @<Fail recognizer if not trace-safe@>@/
7525   source_type = r->t_trace_source_type;
7526    @<Set source, failing if necessary@>@/
7527    switch (source_type)
7528    {
7529    case SOURCE_IS_LEO:
7530	return Leo_Transition_SYMID_of_SRC(source);
7531    }
7532    R_ERROR(invalid_source_type_message(source_type));
7533    return failure_indicator;
7534}
7535
7536@*1 Return the Middle Earleme.
7537Every source has a ``middle earleme" defined.
7538Every source has
7539\li An origin (or start earleme).
7540\li An end earleme (the current set).
7541\li A ``middle earleme".
7542An Earley item can be thought of as covering a ``span"
7543from its origin to the current set.
7544For each source,
7545this span is divided into two pieces at the middle
7546earleme.
7547@ Informally, the middle earleme can be thought of as
7548dividing the span between the predecessor and either
7549the source's cause or its token.
7550If the source has no predecessor, the middle earleme
7551is the same as the origin.
7552If there is a predecessor, the middle earleme is
7553the current set of the predecessor.
7554If there is a cause, the middle earleme is always the same
7555as the origin of the cause.
7556If there is a token,
7557the middle earleme is always where the token starts.
7558@<Public function prototypes@> =
7559Marpa_Earley_Set_ID marpa_source_middle(struct marpa_r* r);
7560@ The ``predecessor set" is the earleme of the predecessor.
7561Returns |-1| if there is no predecessor.
7562If there are other failures, such as
7563there being no source link,
7564|-2| is returned.
7565@<Function definitions@> =
7566Marpa_Earley_Set_ID marpa_source_middle(struct marpa_r* r)
7567{
7568   @<Return |-2| on failure@>@/
7569   const EARLEME no_predecessor = -1;
7570   guint source_type;
7571   SRC source;
7572    @<Fail recognizer if not trace-safe@>@/
7573   source_type = r->t_trace_source_type;
7574    @<Set source, failing if necessary@>@/
7575    switch (source_type)
7576      {
7577      case SOURCE_IS_LEO:
7578	{
7579	  LIM predecessor = Predecessor_of_SRC (source);
7580	  if (!predecessor) return no_predecessor;
7581	  return
7582	    ES_Ord_of_EIM (Base_EIM_of_LIM (predecessor));
7583	}
7584      case SOURCE_IS_TOKEN:
7585      case SOURCE_IS_COMPLETION:
7586	{
7587	  EIM predecessor = Predecessor_of_SRC (source);
7588	  if (!predecessor) return no_predecessor;
7589	  return ES_Ord_of_EIM (predecessor);
7590	}
7591    }
7592    R_ERROR(invalid_source_type_message (source_type));
7593    return failure_indicator;
7594}
7595
7596@ @<Set source, failing if necessary@> =
7597    source = r->t_trace_source;
7598    if (!source) {
7599	R_ERROR("no trace source link");
7600        return failure_indicator;
7601    }
7602
7603@** Token Code (TOK).
7604@ Tokens are duples of symbol ID and token value.
7605They do {\bf not} store location information,
7606so the same token
7607can occur many times in a parse.
7608On the other hand, duplicate tokens are also allowed.
7609How much, if any, trouble to take to avoid duplication
7610is up to the application --
7611duplicates have their cost, but so does the
7612tracking necessary to avoid them.
7613@ My strong preference is that token values
7614{\bf always} be integers, but
7615token values are |gpointer|'s to allow applications
7616full generality.
7617Using |glib|, integers can portably be stored in a
7618|gpointer|, but the reverse is not true.
@ In my preferred semantic scheme, the integers are
used by the higher levels to index the actual data.
In this way no direct pointer to any data ``owned''
7622by the higher level is ever under libmarpa's control.
7623Problems with mismatches between libmarpa and the
7624higher levels are almost impossible to avoid in
7625development
7626and once an application gets in maintenance mode
7627things become, if possible, worse.
@ ``But,'' you say, ``pointers are faster,
and mismatches occur whether
you index the data with an integer or directly.
So if you are in trouble either way, why not go
for speed?''
7633\par
7634The above objection is true, but overlooks a very
7635important issue.  A bad pointer can cause very
7636serious problems --
7637a core dump, or even worse, undetected data corruption.
7638There is no good way to detect a bad pointer before it
does its damage.
7640\par
7641If an integer index, on the other hand, is out of bounds,
7642the higher levels can catch this and react.
7643Worst case, the higher level may have to throw a controlled
7644fatal error.
This is much better than a core dump
7646and far better than undetected data corruption.
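\par
As a minimal sketch of the scheme just described (it is not \libmarpa/ code):
the names |value_table|, |index_to_value|, |value_to_index| and
|lookup_token_data| are hypothetical, and the casts through |intptr_t|
are only a stand-in for |glib|'s |GINT_TO_POINTER| and |GPOINTER_TO_INT|.
The point is that an out-of-range index can be detected before any data is touched.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table owned by the higher level; not part of libmarpa. */
#define VALUE_TABLE_SIZE 3
static const char *const value_table[VALUE_TABLE_SIZE] = { "foo", "bar", "baz" };

/* Assumed stand-ins for glib's GINT_TO_POINTER and GPOINTER_TO_INT. */
void *index_to_value(int ix) { return (void *)(intptr_t)ix; }
int value_to_index(void *value) { return (int)(intptr_t)value; }

/* Map a token value back to the higher level's data.
   Returns NULL if the index is out of bounds, so the caller
   can raise a controlled error instead of following a bad pointer. */
const char *lookup_token_data(void *token_value)
{
    int ix = value_to_index(token_value);
    if (ix < 0) return NULL;
    if (ix >= VALUE_TABLE_SIZE) return NULL;
    return value_table[ix];
}
```

A caller would pass |index_to_value(ix)| as the token value and later
recover the data with |lookup_token_data|, treating a |NULL| result as a
controlled fatal error.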
7647@<Private incomplete structures@> =
7648struct s_token;
7649typedef struct s_token* TOK;
7650@ The |t_type| field is to allow |TOK|
7651objects to act as or-nodes.
7652@d Type_of_TOK(tok) ((tok)->t_type)
7653@d SYMID_of_TOK(tok) ((tok)->t_symbol_id)
7654@d Value_of_TOK(tok) ((tok)->t_value)
7655@<Private structures@> =
7656struct s_token {
7657    gint t_type;
7658    SYMID t_symbol_id;
7659    gpointer t_value;
7660};
7661typedef struct s_token TOK_Object;
7662
7663@ An obstack dedicated to the tokens and an array
7664with default tokens for each symbol.
7665Currently,
7666the default tokens are used to provide
7667null values, since all non-tokens are given
7668values when read.
There is a special obstack for the tokens,
to separate the token stream from the rest of the recognizer
7671data.
7672Once the bocage is built, the token data is all that
7673it needs, and someday I may want to take advantage of
7674this fact by freeing up the rest of recognizer memory.
7675@d TOK_Obs_of_R(r) (&(r)->t_token_obs)
7676@d TOKs_by_SYMID_of_R(r) ((r)->t_tokens_by_symid)
7677@d TOK_Obs TOK_Obs_of_R(r)
7678@d TOK_by_ID_of_R(r, symbol_id) (TOKs_by_SYMID_of_R(r)[symbol_id])
7679@<Widely aligned recognizer elements@> =
7680struct obstack t_token_obs;
7681TOK *t_tokens_by_symid;
7682@ @<Initialize recognizer elements@> =
7683{
7684  gpointer default_value = Default_Value_of_G(g);
7685  gint i;
7686  TOK *tokens_by_symid;
7687  obstack_init (TOK_Obs);
7688  tokens_by_symid =
7689    obstack_alloc (TOK_Obs, sizeof (TOK) * symbol_count_of_g);
7690  for (i = 0; i < symbol_count_of_g; i++)
7691    {
7692      tokens_by_symid[i] = token_new (r, i, default_value);
7693    }
7694  TOKs_by_SYMID_of_R(r) = tokens_by_symid;
7695}
7696@ @<Destroy recognizer elements@> =
7697{
7698    TOK* tokens_by_symid = TOKs_by_SYMID_of_R(r);
7699    if (tokens_by_symid) {
7700	obstack_free(TOK_Obs, NULL);
7701	TOKs_by_SYMID_of_R(r) = NULL;
7702    }
7703}
7704
7705@ @<Private function prototypes@> =
7706static inline
7707TOK token_new(struct marpa_r *r, SYMID symbol_id, gpointer value);
7708@ @<Function definitions@> =
7709static inline
7710TOK token_new(struct marpa_r *r, SYMID symbol_id, gpointer value)
7711{
7712  TOK token;
7713    token = obstack_alloc (TOK_Obs, sizeof(*token));
7714    Type_of_TOK(token) = TOKEN_OR_NODE;
7715    SYMID_of_TOK(token) = symbol_id;
7716    Value_of_TOK(token) = value;
7717  return token;
7718}
7719
7720@ Recover |token| from the token obstack.
7721The intended use is to recover the one token
7722most recently added in case of an error.
7723@<Recover |token|@> = obstack_free (TOK_Obs, token);
7724
7725@** Alternative Tokens (ALT) Code.
7726Because Marpa allows more than one token at every
7727earleme, Marpa's tokens are also called ``alternatives".
7728@<Private incomplete structures@> =
7729struct s_alternative;
7730typedef struct s_alternative* ALT;
7731typedef const struct s_alternative* ALT_Const;
7732@
7733@d TOK_of_ALT(alt) ((alt)->t_token)
7734@d SYMID_of_ALT(alt) SYMID_of_TOK(TOK_of_ALT(alt))
7735@d Start_ES_of_ALT(alt) ((alt)->t_start_earley_set)
7736@d Start_Earleme_of_ALT(alt) Earleme_of_ES(Start_ES_of_ALT(alt))
7737@d End_Earleme_of_ALT(alt) ((alt)->t_end_earleme)
7738@<Private structures@> =
7739struct s_alternative {
7740    TOK t_token;
7741    ES t_start_earley_set;
7742    EARLEME t_end_earleme;
7743};
7744typedef struct s_alternative ALT_Object;
7745
7746@ @<Widely aligned recognizer elements@> =
7747DSTACK_DECLARE(t_alternatives);
7748@
7749{\bf To Do}: @^To Do@>
7750The value of |INITIAL_ALTERNATIVES_CAPACITY| is 1 for testing while this
7751code is being developed.
7752Once the code is stable it should be increased.
7753@d INITIAL_ALTERNATIVES_CAPACITY 1
7754@<Initialize recognizer elements@> =
7755DSTACK_INIT(r->t_alternatives, ALT_Object, INITIAL_ALTERNATIVES_CAPACITY);
7756@ @<Destroy recognizer elements@> = DSTACK_DESTROY(r->t_alternatives);
7757
@ This function returns the index at which to insert a new
7759alternative, or -1 if the new alternative is a duplicate.
7760(Duplicate alternatives should not be inserted.)
7761@<Private function prototypes@> =
7762static inline gint alternative_insertion_point(RECCE r, ALT new_alternative);
7763@ A variation of binary search.
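\par
The variation can be seen in isolation in the following sketch,
which searches a plain ascending array of integers rather than the alternatives stack.
The name |insertion_point_sketch| is hypothetical.
As in the real function, an exact match is reported as |-1|
and an empty array yields insertion point 0.

```c
#include <assert.h>

/* Returns the index at which key should be inserted to keep
   the array sorted in ascending order, or -1 if key is already present. */
int insertion_point_sketch(const int *sorted, int length, int key)
{
    int lo = 0;
    int hi = length - 1;
    assert(length >= 0);
    while (lo <= hi) {
        int trial = lo + (hi - lo) / 2;
        if (key == sorted[trial]) return -1;   /* duplicate */
        if (key > sorted[trial]) {
            lo = trial + 1;
        } else {
            hi = trial - 1;
        }
    }
    /* When the range is empty, lo is the first position
       whose element is greater than key. */
    return lo;
}
```

Returning |lo| after the loop is equivalent to the
|outcome > 0 ? trial + 1 : trial| test used in the real function.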
7764@<Function definitions@> =
7765static inline gint
7766alternative_insertion_point (RECCE r, ALT new_alternative)
7767{
7768  DSTACK alternatives = &r->t_alternatives;
7769  ALT alternative;
7770  gint hi = DSTACK_LENGTH(*alternatives) - 1;
7771  gint lo = 0;
7772  gint trial;
7773  // Special case when zero alternatives.
7774  if (hi < 0)
7775    return 0;
7776  alternative = DSTACK_BASE(*alternatives, ALT_Object);
7777  for (;;)
7778    {
7779      gint outcome;
7780      trial = lo + (hi - lo) / 2;
7781      outcome = alternative_cmp (new_alternative, alternative+trial);
7782      if (outcome == 0)
7783	return -1;
7784      if (outcome > 0)
7785	{
7786	  lo = trial + 1;
7787	}
7788      else
7789	{
7790	  hi = trial - 1;
7791	}
7792      if (hi < lo)
7793	return outcome > 0 ? trial + 1 : trial;
7794    }
7795}
7796
7797@ This is the comparison function for sorting alternatives.
7798The alternatives array also acts as a stack, with the alternatives
7799ending at the lowest numbered earleme on top of the stack.
7800This allows alternatives to be popped off the stack as the
7801earlemes are processed in numerical order.
7802@<Private function prototypes@> =
7803static inline gint alternative_cmp(const ALT_Const a, const ALT_Const b);
7804@ So that the alternatives array can act as a stack,
7805the end earleme of the alternatives must be the major key,
7806and must sort in reverse order.
7807Of the remaining two keys,
7808the more minor key is the start earleme, because that way its slightly
7809costlier evaluation can sometimes be avoided.
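\par
As an illustration only, the same key ordering can be tried out on a simplified record.
Here |struct alt_sketch|, |alt_sketch_cmp| and |alt_sketch_sort| are hypothetical
stand-ins that hold plain integers instead of |TOK| and |ES| objects.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for ALT_Object. */
struct alt_sketch {
    int end_earleme;
    int symbol_id;
    int start_earleme;
};

/* Major key: end earleme, in reverse order.
   Then symbol ID, then start earleme as the most minor key. */
int alt_sketch_cmp(const void *ap, const void *bp)
{
    const struct alt_sketch *a = ap;
    const struct alt_sketch *b = bp;
    int subkey = b->end_earleme - a->end_earleme;
    if (subkey) return subkey;
    subkey = a->symbol_id - b->symbol_id;
    if (subkey) return subkey;
    return a->start_earleme - b->start_earleme;
}

/* After sorting, the last element (the top of the stack)
   is the one that ends at the lowest numbered earleme. */
void alt_sketch_sort(struct alt_sketch *alts, size_t count)
{
    qsort(alts, count, sizeof *alts, alt_sketch_cmp);
}
```

With this order, popping from the end of the array yields the alternatives
in increasing order of end earleme.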
7810@<Function definitions@> =
7811static inline gint alternative_cmp(const ALT_Const a, const ALT_Const b) {
7812     gint subkey = End_Earleme_of_ALT(b) - End_Earleme_of_ALT(a);
7813     if (subkey) return subkey;
7814     subkey = SYMID_of_ALT(a) - SYMID_of_ALT(b);
7815     if (subkey) return subkey;
7816     return Start_Earleme_of_ALT(a) - Start_Earleme_of_ALT(b);
7817}
7818
7819@ This function pops an alternative from the stack, if it matches
7820the earleme argument.
7821If no alternative on the stack has its end earleme at the
7822earleme argument, |NULL| is returned.
7823The data pointed to by the return value may be overwritten when
7824new alternatives are added, so it must be used before the next
7825call that adds data to the alternatives stack.
7826@<Private function prototypes@> =
7827static inline ALT alternative_pop(RECCE r, EARLEME earleme);
7828@ @<Function definitions@> =
7829static inline ALT alternative_pop(RECCE r, EARLEME earleme)
7830{
7831    DSTACK alternatives = &r->t_alternatives;
7832    ALT top_of_stack = DSTACK_TOP(*alternatives, ALT_Object);
7833    if (!top_of_stack) return NULL;
7834    if (earleme != End_Earleme_of_ALT(top_of_stack)) return NULL;
7835    return DSTACK_POP(*alternatives, ALT_Object);
7836}
7837
7838@ This function inserts an alternative into the stack,
7839in sorted order,
7840if the alternative is not a duplicate.
7841It returns -1 if the alternative is a duplicate,
7842and the insertion point (which must be zero or more) otherwise.
7843@<Private function prototypes@> =
7844static inline gint alternative_insert(RECCE r, ALT alternative);
7845@ @<Function definitions@> =
7846static inline gint alternative_insert(RECCE r, ALT new_alternative)
7847{
7848  ALT top_of_stack, base_of_stack;
7849  DSTACK alternatives = &r->t_alternatives;
7850  gint ix;
7851  gint insertion_point = alternative_insertion_point (r, new_alternative);
7852  if (insertion_point < 0)
7853    return insertion_point;
7854  top_of_stack = DSTACK_PUSH(*alternatives, ALT_Object); // may change base
7855  base_of_stack = DSTACK_BASE(*alternatives, ALT_Object); // base will not change after this
7856   for (ix = top_of_stack-base_of_stack; ix > insertion_point; ix--) {
7857       base_of_stack[ix] = base_of_stack[ix-1];
7858   }
7859   base_of_stack[insertion_point] = *new_alternative;
7860   return insertion_point;
7861}
7862
7863@** Starting Recognizer Input.
7864@ @<Public function prototypes@> = gboolean marpa_start_input(struct marpa_r *r);
7865@ @<Function definitions@> = gboolean marpa_start_input(struct marpa_r *r)
7866{
7867    ES set0;
7868    EIM item;
7869    EIK_Object key;
7870    AHFA state;
7871    GRAMMAR_Const g = G_of_R(r);
7872    const gint symbol_count_of_g = SYM_Count_of_G(g);
7873    @<Return |FALSE| on failure@>@;
7874    @<Fail if recognizer not initial@>@;
7875    @<Allocate recognizer workareas@>@;
7876    psar_reset(Dot_PSAR_of_R(r));
7877    @<Allocate recognizer's bit vectors for symbols@>@;
7878    @<Initialize Earley item work stacks@>@;
7879    Phase_of_R(r) = input_phase;
7880    LV_Current_Earleme_of_R(r) = 0;
7881    set0 = earley_set_new(r, 0);
7882    LV_Latest_ES_of_R(r) = set0;
7883    LV_First_ES_of_R(r) = set0;
7884    state = AHFA_of_G_by_ID(g, 0);
7885    key.t_origin = set0;
7886    key.t_state = state;
7887    key.t_set = set0;
7888    item = earley_item_create(r, key);
7889    state = Empty_Transition_of_AHFA(state);
7890    if (state) {
7891	key.t_state = state;
7892	item = earley_item_create(r, key);
7893    }
7894    postdot_items_create(r, set0);
7895    earley_set_update_items(r, set0);
7896    r->t_is_using_leo = r->t_use_leo_flag;
7897    return TRUE;
7898}
7899
7900@** Read a Token Alternative.
7901The ordinary semantics of a parser generator is a token-stream
7902semantics.
7903The input is a sequence of $n$ tokens.
7904Every token is of length 1.
7905The tokens fill the locations from 0 to $n-1$.
7906The first token goes into location 0,
7907the next into location 1,
7908and so on up to location $n-1$.
7909@ In Marpa terms, a token-stream
7910corresponds to reading exactly one token alternative at every location.
7911In Marpa, the input locations are also called earlemes.
7912@ Marpa allows other models of the input besides the token stream model.
7913Tokens may be ambiguous -- that is, more than one token may occur
7914at any location.
Tokens may vary in length -- tokens may be of any length greater than
7916or equal to one.
7917This means tokens can span multiple earlemes.
7918As a consequence,
7919there may be no tokens at some earlemes.
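\par
A small sketch may make the last point concrete.
It assumes tokens are given as parallel arrays of start earlemes and lengths,
and it counts the earlemes at which no token ends.
The function name and its parameters are hypothetical and are not part of \libmarpa/.

```c
#include <assert.h>

/* Count the earlemes in 1..furthest at which no token ends.
   Token i starts at starts[i] and ends at starts[i] + lengths[i]. */
int count_earlemes_without_token_end(const int *starts, const int *lengths,
                                     int token_count, int furthest)
{
    int empty = 0;
    int earleme;
    assert(token_count >= 0);
    for (earleme = 1; earleme <= furthest; earleme++) {
        int found = 0;
        int i;
        for (i = 0; i < token_count; i++) {
            if (starts[i] + lengths[i] == earleme) {
                found = 1;
                break;
            }
        }
        if (!found) empty++;
    }
    return empty;
}
```

In the token-stream model every token has length 1 and the count is zero.
With one token of length 3 at earleme 0 followed by a token of length 1,
no token ends at earlemes 1 and 2.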
7920@ |marpa_alternative|, by enforcing a limit on token length and on
7921the furthest location, indirectly enforces a limit on the
number of Earley sets and the maximum earleme location.
7923If tokens ending at location $n$ cannot be scanned, then clearly
7924the parse can
7925never reach location $n$.
7926@ Whether token rejection is considered a failure is
7927a matter for the upper layers to define.
7928Retrying rejected tokens is one way to implement the
7929important ``Ruby Slippers" parsing technique.
7930On the other hand it is traditional,
7931and often quite reasonable,
7932to always treat rejection of a token as a fatal error.
7933@ Returns current earleme (which may be zero) on success.
7934If the token is rejected because it is not
7935expected, returns |-1|.
If the token is rejected as a duplicate,
returns |-3|.
7938On failure for other reasons, returns |-2|.
@ Rejection because a token is unexpected can be a common
7940occurrence in an application---%
7941an application may use this function to try out
7942various alternatives.
7943Rejection because a token is a duplicate is more likely to be
7944a hard failure, but it is possible that an application will
7945also see this as a normal data path.
7946The general failures reported with |-2| will typically be
7947treated by the application as fatal errors.
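\par
One way an upper layer might sort these return values into cases is sketched here.
The enumeration and the function |classify_alternative_result| are hypothetical;
only the numeric codes come from the description above.

```c
/* Hypothetical classification of the result of marpa_alternative. */
enum alt_outcome {
    ALT_ACCEPTED,    /* zero or more: the current earleme */
    ALT_UNEXPECTED,  /* -1: token not expected here */
    ALT_DUPLICATE,   /* -3: same token already read */
    ALT_FATAL        /* -2, or any other negative value */
};

enum alt_outcome classify_alternative_result(int result)
{
    if (result >= 0) return ALT_ACCEPTED;
    if (result == -1) return ALT_UNEXPECTED;
    if (result == -3) return ALT_DUPLICATE;
    return ALT_FATAL;
}
```

An application that tries out alternatives would typically continue on
|ALT_UNEXPECTED|, and decide for itself whether |ALT_DUPLICATE| is an error.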
@<Public function prototypes@> = Marpa_Earleme marpa_alternative(struct marpa_r *r,
Marpa_Symbol_ID token_id, gpointer value, gint length);
@ @<Function definitions@> =
Marpa_Earleme marpa_alternative(struct marpa_r *r,
Marpa_Symbol_ID token_id, gpointer value, gint length) {
7953    @<Return |-2| on failure@>@;
7954    GRAMMAR_Const g = G_of_R(r);
7955    const gint duplicate_token_indicator = -3;
7956    const gint unexpected_token_indicator = -1;
7957    ES current_earley_set;
7958    const EARLEME current_earleme = Current_Earleme_of_R(r);
7959    EARLEME target_earleme;
7960    @<Fail if recognizer not in input phase@>@;
7961    @<Fail if recognizer exhausted@>@;
7962    @<|marpa_alternative| initial check for failure conditions@>@;
7963    @<Set |current_earley_set|, failing if token is unexpected@>@;
7964    @<Set |target_earleme| or fail@>@;
7965    @<Insert alternative into stack, failing if token is duplicate@>@;
7966    return current_earleme;
7967}
7968
7969@ @<|marpa_alternative| initial check for failure conditions@> = {
7970    const SYM_Const token = SYM_by_ID(token_id);
7971    if (!SYM_is_Terminal(token)) {
7972	R_ERROR("token is not a terminal");
7973	return failure_indicator;
7974    }
7975    if (length <= 0) {
7976	R_ERROR("token length negative or zero");
7977	return failure_indicator;
7978    }
7979    if (length >= EARLEME_THRESHOLD) {
7980	R_ERROR("token too long");
7981	return failure_indicator;
7982    }
7983}
7984
7985@ @<Set |target_earleme| or fail@> = {
7986    target_earleme = current_earleme + length;
7987    if (target_earleme >= EARLEME_THRESHOLD) {
7988	r_context_clear(r);
7989	r_context_int_add(r, "target_earleme", target_earleme);
7990	R_ERROR_CXT("parse too long");
7991	return failure_indicator;
7992    }
7993}
7994
7995@ If no postdot item is found at the current Earley set for this
token, the token ID is unexpected, and |unexpected_token_indicator| is returned.
7997The application can treat this as a fatal error.
7998The application can also use this as a mechanism to test alternatives,
7999in which case, returning |unexpected_token_indicator| is a perfectly normal data path.
8000This last is part of an important technique:
8001``Ruby Slippers" parsing.
8002@<Set |current_earley_set|, failing if token is unexpected@> = {
8003    current_earley_set = Current_ES_of_R (r);
8004    if (!current_earley_set) return unexpected_token_indicator;
8005    if (!First_PIM_of_ES_by_SYMID (current_earley_set, token_id))
8006	return unexpected_token_indicator;
8007}
8008
8009@ Insert an alternative into the alternatives stack,
8010detecting if we are attempting to add the same token twice.
8011Two tokens are considered the same if
8012\li they have the same token ID, and
8013\li they have the same length, and
8014\li they have the same origin.
8015Because $|origin|+|token_length| = |current_earleme|$,
two tokens at the same current earleme are the same if they
8017have the same token ID and origin.
8018By the same equation,
8019two tokens at the same current earleme are the same if they
8020have the same token ID and token length.
8021It is up to the higher layers to determine if rejection
8022of a duplicate token is a fatal error.
8023The Earley sets and items will not have been
8024altered by the attempt.
8025@<Insert alternative into stack, failing if token is duplicate@> =
8026{
8027  TOK token = token_new (r, token_id, value);
8028  ALT_Object alternative;
8029  if (Furthest_Earleme_of_R (r) < target_earleme)
8030    LV_Furthest_Earleme_of_R (r) = target_earleme;
8031  alternative.t_token = token;
8032  alternative.t_start_earley_set = current_earley_set;
8033  alternative.t_end_earleme = target_earleme;
8034  if (alternative_insert (r, &alternative) < 0)
8035  {
8036    @<Recover |token|@>@;
8037    return duplicate_token_indicator;
8038    }
8039}
8040
8041@** Complete an Earley Set.
8042In the Aycock-Horspool variation of Earley's algorithm,
8043the two main phases are scanning and completion.
8044This section is devoted to the logic for completion.
8045@d Work_EIMs_of_R(r) DSTACK_BASE((r)->t_eim_work_stack, EIM)
8046@d Work_EIM_Count_of_R(r) DSTACK_LENGTH((r)->t_eim_work_stack)
8047@d WORK_EIMS_CLEAR(r) DSTACK_CLEAR((r)->t_eim_work_stack)
8048@d WORK_EIM_PUSH(r) DSTACK_PUSH((r)->t_eim_work_stack, EIM)
8049@<Widely aligned recognizer elements@> = DSTACK_DECLARE(t_eim_work_stack);
8050@ @<Initialize recognizer elements@> = DSTACK_SAFE(r->t_eim_work_stack);
8051@ @<Initialize Earley item work stacks@> =
8052    DSTACK_IS_INITIALIZED(r->t_eim_work_stack) ||
8053	DSTACK_INIT (r->t_eim_work_stack, EIM , 1024);
8054@ @<Destroy recognizer elements@> = DSTACK_DESTROY(r->t_eim_work_stack);
8055
8056@ The completion stack is initialized to a very high-ball estimate of the
8057number of completions per Earley set.
8058It will grow if needed.
Large stacks may be needed for very ambiguous grammars.
8060@<Widely aligned recognizer elements@> = DSTACK_DECLARE(t_completion_stack);
8061@ @<Initialize recognizer elements@> = DSTACK_SAFE(r->t_completion_stack);
8062@ @<Initialize Earley item work stacks@> =
8063    DSTACK_IS_INITIALIZED(r->t_completion_stack) ||
8064    DSTACK_INIT (r->t_completion_stack, EIM , 1024);
8065@ @<Destroy recognizer elements@> = DSTACK_DESTROY(r->t_completion_stack);
8066
@ The Earley set stack is declared here.
When the recognizer is initialized it is only marked as safe,
not allocated.
It is destroyed along with the other recognizer elements.
8071@<Widely aligned recognizer elements@> = DSTACK_DECLARE(t_earley_set_stack);
8072@ @<Initialize recognizer elements@> = DSTACK_SAFE(r->t_earley_set_stack);
8073@ @<Destroy recognizer elements@> = DSTACK_DESTROY(r->t_earley_set_stack);
8074
8075@ This function returns the number of terminals expected on success.
8076On failure, it returns |-2|.
8077If the completion of the earleme left the parse exhausted, 0 is
8078returned.
8079@
If the completion of the earleme left the parse exhausted, 0 is
returned, but the converse is not true if tokens may be longer than one earleme.
8082In those alternative input models, it is possible that no terminals are
8083expected at the current earleme, but other terminals might be expected
8084at later earlemes.
8085That means that the parse can be continued---%
8086it is not exhausted.
8087In those alternative input models,
8088if the distinction between zero terminals expected and an
8089exhausted parse is significant to the higher layers,
8090they must explicitly check the phase whenever this function
8091returns zero.
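\par
The distinction can be stated as a small predicate.
This is a sketch of the condition only, with a hypothetical function name;
it takes the three quantities as plain integers rather than reading them from the recognizer.

```c
/* Nonzero if the parse can make no further progress:
   no terminals are expected at the current earleme, and
   no token already read ends at a later earleme. */
int parse_is_exhausted_sketch(int expected_terminals,
                              int current_earleme,
                              int furthest_earleme)
{
    return expected_terminals <= 0 && current_earleme >= furthest_earleme;
}
```

Zero expected terminals at earleme 3 is exhaustion if the furthest earleme is 3,
but not if some token already read ends at earleme 5.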
8092@<Public function prototypes@> =
8093Marpa_Earleme marpa_earleme_complete(struct marpa_r* r);
8094@ @<Function definitions@> =
8095Marpa_Earleme
8096marpa_earleme_complete(struct marpa_r* r)
8097{
8098  @<Return |-2| on failure@>@;
8099  EIM* cause_p;
8100  ES current_earley_set;
8101  EARLEME current_earleme;
8102  gint count_of_expected_terminals;
8103    @<Fail if recognizer not in input phase@>@;
8104    @<Fail if recognizer exhausted@>@;
8105  psar_dealloc(Dot_PSAR_of_R(r));
8106    bv_clear (r->t_bv_symid_is_expected);
8107    @<Initialize |current_earleme|@>@;
8108    @<Return 0 if no alternatives@>@;
8109    @<Initialize |current_earley_set|@>@;
8110    @<Scan from the alternative stack@>@;
8111    @<Pre-populate the completion stack@>@;
8112    while ((cause_p = DSTACK_POP(r->t_completion_stack, EIM))) {
8113      EIM cause = *cause_p;
8114        @<Add new Earley items for |cause|@>@;
8115    }
8116    postdot_items_create(r, current_earley_set);
8117
8118    count_of_expected_terminals = bv_count (r->t_bv_symid_is_expected);
8119    if (count_of_expected_terminals <= 0
8120	&& Earleme_of_ES (current_earley_set) >= Furthest_Earleme_of_R (r))
8121      { /* If no terminals are expected, and there are no Earley items in
8122           uncompleted Earley sets, we can make no further progress.
8123	   The parse is ``exhausted". */
8124	LV_R_is_Exhausted(r) = 1;
8125      }
8126    earley_set_update_items(r, current_earley_set);
8127    return count_of_expected_terminals;
8128}
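@ The distinction drawn above between ``zero terminals expected'' and
``parse exhausted'' can be sketched in isolation.
What follows is an illustration only, not part of \libmarpa/:
|parse_status| and |classify_earleme| are hypothetical names,
and plain integers stand in for the recognizer's current and furthest earlemes.
It mirrors the test made at the end of |marpa_earleme_complete|.

```c
#include <assert.h>

/* Hypothetical summary of what a higher layer learns after an
   earleme has been completed. */
enum parse_status { PARSE_CONTINUES, PARSE_WAITING, PARSE_EXHAUSTED };

/* expected_count: number of terminals expected at the current earleme.
   current, furthest: the current earleme, and the furthest earleme
   reached by any token accepted so far. */
enum parse_status
classify_earleme(int expected_count, int current, int furthest)
{
    assert(expected_count >= 0);
    if (expected_count > 0)
        return PARSE_CONTINUES;   /* tokens can be read here */
    if (current >= furthest)
        return PARSE_EXHAUSTED;   /* nothing expected now or later */
    return PARSE_WAITING;         /* nothing expected here, but a longer
                                     token still ends at a later earleme */
}
```

In the token stream model every token is one earleme long, so the
waiting case should never arise; it matters only when tokens may be longer.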
8129
8130@ @<Initialize |current_earleme|@> = {
8131  current_earleme = ++(LV_Current_Earleme_of_R(r));
8132  if (current_earleme > Furthest_Earleme_of_R (r))
8133    {
8134	LV_R_is_Exhausted(r) = 1;
8135	R_ERROR("parse exhausted");
8136	return failure_indicator;
8137     }
8138}
8139
8140@ Create a new Earley set.  We know that it does not
8141exist.
8142@<Initialize |current_earley_set|@> = {
8143    current_earley_set = earley_set_new (r, current_earleme);
8144    LV_Next_ES_of_ES(Latest_ES_of_R(r)) = current_earley_set;
8145    LV_Latest_ES_of_R(r) = current_earley_set;
8146}
8147
@ If there are no alternatives for this earleme,
8149return 0 without creating an
8150Earley set.
8151The return value of 0 indicates that there are no terminals
8152which will be accepted at this earleme.
8153In the default (token stream) model of input,
8154this means that the parse is exhausted.
8155@<Return 0 if no alternatives@> = {
8156    ALT top_of_stack = DSTACK_TOP(r->t_alternatives, ALT_Object);
8157    if (!top_of_stack) return 0;
8158    if (current_earleme != End_Earleme_of_ALT(top_of_stack)) return 0;
8159}
8160
8161@ @<Scan from the alternative stack@> =
8162{
8163  ALT alternative;
8164  while ((alternative = alternative_pop (r, current_earleme)))
8165    @<Scan an Earley item from alternative@>@;
8166}
8167
8168@ @<Scan an Earley item from alternative@> =
8169{
8170  ES start_earley_set = Start_ES_of_ALT (alternative);
8171  TOK token = TOK_of_ALT (alternative);
8172  SYMID token_id = SYMID_of_TOK(token);
8173  PIM pim = First_PIM_of_ES_by_SYMID (start_earley_set, token_id);
8174  for ( ; pim ; pim = Next_PIM_of_PIM (pim)) {
8175      AHFA scanned_AHFA, prediction_AHFA;
8176      EIM scanned_earley_item;
8177      EIM predecessor = EIM_of_PIM (pim);
8178      if (!predecessor)
8179	continue;		// Ignore Leo items when scanning
8180      scanned_AHFA = To_AHFA_of_EIM_by_SYMID (predecessor, token_id);
8181      scanned_earley_item = earley_item_assign (r,
8182						current_earley_set,
8183						Origin_of_EIM (predecessor),
8184						scanned_AHFA);
8185      token_link_add (r, scanned_earley_item, predecessor, token);
8186      prediction_AHFA = Empty_Transition_of_AHFA (scanned_AHFA);
8187      if (!prediction_AHFA) continue;
8188      scanned_earley_item = earley_item_assign (r,
8189						    current_earley_set,
8190						    current_earley_set,
8191						    prediction_AHFA);
8192    }
8193}
8194
8195@ @<Pre-populate the completion stack@> = {
8196    EIM* work_earley_items = DSTACK_BASE (r->t_eim_work_stack, EIM );
8197    gint no_of_work_earley_items = DSTACK_LENGTH (r->t_eim_work_stack );
8198    gint ix;
8199    DSTACK_CLEAR(r->t_completion_stack);
8200    for (ix = 0;
8201         ix < no_of_work_earley_items;
8202	 ix++) {
8203	EIM earley_item = work_earley_items[ix];
8204	EIM* tos;
8205	if (!Earley_Item_is_Completion (earley_item))
8206	  continue;
8207	tos = DSTACK_PUSH (r->t_completion_stack, EIM);
8208	*tos = earley_item;
8209      }
8210    }
8211
8212@ For the current completion cause,
8213add those Earley items it ``causes".
8214@<Add new Earley items for |cause|@> =
8215{
8216  Marpa_Symbol_ID *complete_symbols = Complete_SYMIDs_of_EIM (cause);
8217  gint count = Complete_SYM_Count_of_EIM (cause);
8218  ES middle = Origin_of_EIM (cause);
8219  gint symbol_ix;
8220  for (symbol_ix = 0; symbol_ix < count; symbol_ix++)
8221    {
8222      Marpa_Symbol_ID complete_symbol = complete_symbols[symbol_ix];
8223      @<Add new Earley items for |complete_symbol| and |cause|@>@;
8224    }
8225}
8226
8227@ @<Add new Earley items for |complete_symbol| and |cause|@> =
8228{
8229  PIM postdot_item;
8230  for (postdot_item = First_PIM_of_ES_by_SYMID (middle, complete_symbol);
8231       postdot_item; postdot_item = Next_PIM_of_PIM (postdot_item))
8232    {
8233      EIM predecessor = EIM_of_PIM (postdot_item);
8234      EIM effect;
8235      AHFA effect_AHFA_state;
8236      if (predecessor)
8237	{ /* Not a Leo item */
8238	  @<Add effect, plus any prediction, for non-Leo predecessor@>@;
8239	}
8240      else
8241	{			/* A Leo item */
8242	  @<Add effect of Leo item@>@;
8243	  break;		/* When I encounter a Leo item,
8244				   I skip everything else for this postdot
8245				   symbol */
8246	}
8247    }
8248}
8249
8250@ @<Add effect, plus any prediction, for non-Leo predecessor@> =
8251{
8252    ES origin = Origin_of_EIM(predecessor);
8253     effect_AHFA_state = To_AHFA_of_EIM_by_SYMID(predecessor, complete_symbol);
8254     effect = earley_item_assign(r, current_earley_set,
8255          origin, effect_AHFA_state);
8256     if (Earley_Item_has_No_Source(effect)) {
8257         /* If it has no source, then it is new */
8258         if (Earley_Item_is_Completion(effect)) {
8259	     @<Push effect onto completion stack@>@;
8260	 }
8261	 @<Add Earley item predicted by completion, if there is one@>@;
8262     }
8263     completion_link_add(r, effect, predecessor, cause);
8264}
8265
8266@ @<Push effect onto completion stack@> = {
8267    EIM* tos = DSTACK_PUSH (r->t_completion_stack, EIM);
8268    *tos = effect;
8269}
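@ The completion loop above is a worklist algorithm:
the stack is pre-populated, and an effect is pushed only when it is new,
so that the loop terminates even when completions cause each other in a cycle.
The following is a minimal sketch of that pattern under simplifying assumptions.
All of its names are hypothetical, items are small integers,
and each item ``causes'' at most one other item.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ITEMS 16

/* effect_of[i] is the item caused by item i, or -1 if there is none.
   On return is_present[i] is 1 for every item reachable from the
   initial items.  Returns the number of items added. */
int
toy_close_under_completion(const int *effect_of, int item_count,
                           const int *initial, int initial_count,
                           unsigned char *is_present)
{
    int stack[TOY_MAX_ITEMS];   /* each item is pushed at most once */
    int depth = 0;
    int added = 0;
    int ix;
    assert(item_count <= TOY_MAX_ITEMS);
    memset(is_present, 0, (size_t) item_count);
    for (ix = 0; ix < initial_count; ix++) {   /* pre-populate the stack */
        int item = initial[ix];
        if (is_present[item]) continue;
        is_present[item] = 1;
        stack[depth++] = item;
        added++;
    }
    while (depth > 0) {
        int cause = stack[--depth];
        int effect = effect_of[cause];
        if (effect < 0) continue;
        if (is_present[effect]) continue;   /* not new: the real code would
                                               only add another link here */
        is_present[effect] = 1;
        stack[depth++] = effect;            /* new, so it may cause more */
        added++;
    }
    return added;
}
```

Because an item is pushed only when it is first created,
the stack never holds more entries than there are items.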
8270
8271
8272
8273@ @<Add Earley item predicted by completion, if there is one@> = {
8274  AHFA prediction_AHFA_state =
8275    Empty_Transition_of_AHFA (effect_AHFA_state);
8276  if (prediction_AHFA_state)
8277    {
8278      earley_item_assign (r, current_earley_set, current_earley_set,
8279			  prediction_AHFA_state);
8280    }
8281}
8282
8283@ @<Add effect of Leo item@> = {
8284    LIM leo_item = LIM_of_PIM (postdot_item);
8285    ES origin = Origin_of_LIM (leo_item);
8286    effect_AHFA_state = Top_AHFA_of_LIM (leo_item);
8287    effect = earley_item_assign (r, current_earley_set,
8288				 origin, effect_AHFA_state);
8289    if (Earley_Item_has_No_Source (effect))
8290      {
8291	/* If it has no source, then it is new */
8292	@<Push effect onto completion stack@>@;
8293      }
8294    leo_link_add (r, effect, leo_item, cause);
8295}
8296
8297@ @<Private function prototypes@> =
8298static inline void earley_set_update_items(RECCE r, ES set);
8299@ @<Function definitions@> =
8300static inline void earley_set_update_items(RECCE r, ES set) {
8301    EIM* working_earley_items;
8302    EIM* finished_earley_items;
8303    gint working_earley_item_count;
8304    gint i;
8305    if (!EIMs_of_ES(set)) {
8306        EIMs_of_ES(set) = g_new(EIM, EIM_Count_of_ES(set));
8307    } else {
8308        EIMs_of_ES(set) = g_renew(EIM, EIMs_of_ES(set), EIM_Count_of_ES(set));
8309    }
8310    finished_earley_items = EIMs_of_ES(set);
8311    working_earley_items = Work_EIMs_of_R(r);
8312    working_earley_item_count = Work_EIM_Count_of_R(r);
8313    for (i = 0; i < working_earley_item_count; i++) {
8314	 EIM earley_item = working_earley_items[i];
8315	 gint ordinal = Ord_of_EIM(earley_item);
8316         finished_earley_items[ordinal] = earley_item;
8317    }
8318    WORK_EIMS_CLEAR(r);
8319}
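@ The update above files each working Earley item in the finished array
at the position given by its ordinal,
growing the array first so that every ordinal fits.
A minimal sketch of the same idea, using the standard |realloc|
in place of the GLib allocators.
The names |toy_item| and |toy_file_by_ordinal| are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_item { int ordinal; int payload; };

/* Grow (or create, if finished is NULL) an array of total_count item
   pointers, then store each working item at the index given by its
   ordinal.  Returns the array, or NULL if allocation fails. */
struct toy_item **
toy_file_by_ordinal(struct toy_item **finished, size_t total_count,
                    struct toy_item **working, size_t working_count)
{
    size_t i;
    struct toy_item **grown;
    assert(total_count > 0);
    grown = realloc(finished, total_count * sizeof *grown);
    if (!grown) return NULL;
    for (i = 0; i < working_count; i++) {
        struct toy_item *item = working[i];
        assert(item->ordinal >= 0);
        assert((size_t) item->ordinal < total_count);
        grown[item->ordinal] = item;
    }
    return grown;
}
```

Items filed by an earlier call keep their positions,
because an ordinal never changes once it is assigned.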
8320
8321@ @<Private function prototypes@> =
8322static inline void r_update_earley_sets(RECCE r);
8323@ @d P_ES_of_R_by_Ord(r, ord) DSTACK_INDEX((r)->t_earley_set_stack, ES, (ord))
8324@d ES_of_R_by_Ord(r, ord) (*P_ES_of_R_by_Ord((r), (ord)))
8325@<Function definitions@> =
8326static inline void r_update_earley_sets(RECCE r) {
8327    ES set;
8328    ES first_unstacked_earley_set;
8329    if (!DSTACK_IS_INITIALIZED(r->t_earley_set_stack)) {
8330	first_unstacked_earley_set = First_ES_of_R(r);
8331	DSTACK_INIT (r->t_earley_set_stack, ES,
8332		 MAX (1024, ES_Count_of_R(r)));
8333    } else {
8334	 ES* top_of_stack = DSTACK_TOP(r->t_earley_set_stack, ES);
8335	 first_unstacked_earley_set = Next_ES_of_ES(*top_of_stack);
8336    }
8337    for (set = first_unstacked_earley_set; set; set = Next_ES_of_ES(set)) {
8338          ES* top_of_stack = DSTACK_PUSH(r->t_earley_set_stack, ES);
8339	  (*top_of_stack) = set;
8340    }
8341}
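@ The function above brings an array of Earley sets up to date
with the linked list of Earley sets:
it resumes from the last set already in the array and
appends only the sets added since.
A minimal sketch of that incremental pattern follows.
The names |toy_set|, |toy_table| and |toy_table_update| are hypothetical,
and a |realloc|-based array stands in for the |DSTACK|.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_set { int id; struct toy_set *next; };

struct toy_table {
    struct toy_set **slots;
    size_t count;
    size_t capacity;
};

/* Append to the table every set on the list that is not yet in it.
   Returns 0 on success, -1 if allocation fails. */
int
toy_table_update(struct toy_table *table, struct toy_set *first)
{
    struct toy_set *set;
    assert(table != NULL);
    if (table->count == 0)
        set = first;                                  /* nothing stacked yet */
    else
        set = table->slots[table->count - 1]->next;   /* resume after top */
    for (; set; set = set->next) {
        if (table->count == table->capacity) {
            size_t new_capacity = table->capacity ? table->capacity * 2 : 4;
            struct toy_set **grown =
                realloc(table->slots, new_capacity * sizeof *grown);
            if (!grown) return -1;
            table->slots = grown;
            table->capacity = new_capacity;
        }
        table->slots[table->count++] = set;
    }
    return 0;
}
```

Calling the update again after more sets have been linked
costs time proportional to the number of new sets only.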
8342
8343@** Create the Postdot Items.
8344@ This function inserts regular (non-Leo) postdot items into
8345the postdot list.
8346It is assumed that the caller has ensured this is not a duplicate.
8347@<Private function prototypes@> =
8348static void
8349postdot_items_create (struct marpa_r *r, ES set);
8350@ Not inlined, because of its size, and because it is used
8351twice -- once in initializing the Earley set 0,
8352and once for completing later Earley sets.
8353Earley set 0 is very much a special case, and it
8354might be a good idea to have
8355separate code to handle it,
8356in which case both could be inlined.
8357@ Leo items are not created for Earley set 0.
8358They are always optional, and add little at that point.
8359In that way I can avoid dealing with empty productions in
8360the Leo logic.
8361Empty productions only occur in dealing with the null parse,
8362and only in Earley set 0.
8363@<Function definitions@> =
8364static void
8365postdot_items_create (struct marpa_r *r, ES current_earley_set)
8366{
8367    gpointer * const pim_workarea = r->t_sym_workarea;
8368    GRAMMAR_Const g = G_of_R(r);
8369    EARLEME current_earley_set_id = Earleme_of_ES(current_earley_set);
8370    Bit_Vector bv_pim_symbols = r->t_bv_sym;
8371    Bit_Vector bv_lim_symbols = r->t_bv_sym2;
8372    bv_clear (bv_pim_symbols);
8373    bv_clear (bv_lim_symbols);
8374    @<Start EIXes in PIM workarea@>@;
8375    if (r->t_is_using_leo) {
8376	@<Start LIMs in PIM workarea@>@;
8377	@<Add predecessors to LIMs@>@;
8378    }
8379    @<Copy PIM workarea to postdot item array@>@;
8380    bv_and(r->t_bv_symid_is_expected, bv_pim_symbols, g->t_bv_symid_is_terminal);
8381}
8382
8383@ This code creates the Earley indexes in the PIM workarea.
8384At this point there are no Leo items.
8385@<Start EIXes in PIM workarea@> = {
8386    EIM* work_earley_items = DSTACK_BASE (r->t_eim_work_stack, EIM );
8387    gint no_of_work_earley_items = DSTACK_LENGTH (r->t_eim_work_stack );
8388    gint ix;
8389    for (ix = 0;
8390         ix < no_of_work_earley_items;
8391	 ix++) {
8392	EIM earley_item = work_earley_items[ix];
8393      AHFA state = AHFA_of_EIM (earley_item);
8394      gint symbol_ix;
8395      gint postdot_symbol_count = Postdot_SYM_Count_of_AHFA (state);
8396      Marpa_Symbol_ID *postdot_symbols =
8397	Postdot_SYMID_Ary_of_AHFA (state);
8398      for (symbol_ix = 0; symbol_ix < postdot_symbol_count; symbol_ix++)
8399	{
8400	  PIM old_pim = NULL;
8401	  PIM new_pim;
8402	  Marpa_Symbol_ID symid;
8403	  new_pim = obstack_alloc (&r->t_obs, sizeof (EIX_Object));
8404	  symid = postdot_symbols[symbol_ix];
8405	  LV_Postdot_SYMID_of_PIM(new_pim) = symid;
8406	  LV_EIM_of_PIM(new_pim) = earley_item;
8407	  if (bv_bit_test(bv_pim_symbols, (guint)symid))
8408	      old_pim = pim_workarea[symid];
8409	  if (old_pim) {
8410	      LV_Next_PIM_of_PIM(new_pim) = old_pim;
8411	  } else {
8412	      LV_Next_PIM_of_PIM(new_pim) = NULL;
8413	      current_earley_set->t_postdot_sym_count++;
8414	  }
8415	  pim_workarea[symid] = new_pim;
8416	  bv_bit_set(bv_pim_symbols, (guint)symid);
8417	}
8418    }
8419}
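@ In the code above the PIM workarea is never cleared.
Instead the bit vector records which of its slots are valid
for the current Earley set,
and a slot whose bit is clear is treated as empty whatever it contains.
A minimal sketch of that technique follows.
The names are hypothetical, and a single |unsigned| word stands in for the bit vector.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SYMBOL_COUNT 8

struct toy_pim { int symbol; int item; struct toy_pim *next; };

/* Prepend node to the list kept for its symbol.  The seen bits mark the
   workarea slots that are valid; a slot whose bit is clear may hold
   stale data from an earlier pass and is ignored.  Returns 1 if this
   is the first node for its symbol, 0 otherwise. */
int
toy_workarea_prepend(struct toy_pim **workarea, unsigned *seen,
                     struct toy_pim *node)
{
    unsigned bit;
    int is_first;
    assert(node->symbol >= 0);
    assert(node->symbol < TOY_SYMBOL_COUNT);
    bit = 1u << node->symbol;
    is_first = (*seen & bit) == 0;
    node->next = is_first ? NULL : workarea[node->symbol];
    workarea[node->symbol] = node;
    *seen |= bit;
    return is_first;
}
```

The return value plays the part of the increment of the
postdot symbol count: it is 1 exactly once per symbol.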
8420
@ This code creates the Leo items (LIMs) in the PIM workarea.
The Leo items do not contain predecessors or have the
predecessor-dependent information set at this point.
@ The origin and predecessor will be filled in later,
when the predecessor is known.
The origin is set to |NULL|,
and that will be used as an indicator that the fields
of this
Leo item have not been fully populated.
8430@d LIM_is_Populated(leo) (Origin_of_LIM(leo) != NULL)
8431@<Start LIMs in PIM workarea@> =
8432{
8433  guint min, max, start;
8434  for (start = 0; bv_scan (bv_pim_symbols, start, &min, &max);
8435       start = max + 2)
8436    {
8437      SYMID symid;
8438      for (symid = (SYMID) min; symid <= (SYMID) max; symid++)
8439	{
8440	  PIM this_pim = pim_workarea[symid];
8441	  if (!Next_PIM_of_PIM (this_pim))
	    { /* Only create a Leo item if there is
	         exactly one EIX */
8444	      EIM leo_base = EIM_of_PIM (this_pim);
8445	      AHFA base_to_ahfa = To_AHFA_of_EIM_by_SYMID (leo_base, symid);
8446	      if (AHFA_is_Leo_Completion (base_to_ahfa))
8447		{
8448		  @<Create a new, unpopulated, LIM@>@;
8449		}
8450	    }
8451	}
8452    }
8453}
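@ The loops over |bv_scan| in this chapter advance with |start = max + 2|.
Assuming, as those loops suggest, that the scan reports a maximal run of set bits,
the bit at |max + 1| is already known to be clear,
so the next scan can safely begin one bit beyond it.
The sketch below illustrates the idiom on a 32-bit toy vector.
|toy_scan| is a hypothetical stand-in, not the \libmarpa/ bit vector code.

```c
#include <assert.h>

#define TOY_BITS 32u

/* Find the first maximal run of set bits at or after start.
   Returns 1 and sets *min and *max to the ends of the run,
   or returns 0 if no bit at or after start is set. */
int
toy_scan(unsigned long bits, unsigned start, unsigned *min, unsigned *max)
{
    unsigned ix = start;
    while (ix < TOY_BITS && !((bits >> ix) & 1ul)) ix++;
    if (ix >= TOY_BITS) return 0;
    *min = ix;
    while (ix + 1 < TOY_BITS && ((bits >> (ix + 1)) & 1ul)) ix++;
    *max = ix;
    return 1;
}

/* Count the set bits by walking the runs.  Each run is maximal, so
   bit max+1 is clear and the next scan can start at max+2. */
unsigned
toy_count_by_runs(unsigned long bits)
{
    unsigned min, max, start;
    unsigned count = 0;
    for (start = 0; toy_scan(bits, start, &min, &max); start = max + 2)
        count += max - min + 1;
    return count;
}
```

Starting the next scan at |max + 1| would give the same result;
skipping to |max + 2| merely avoids testing a bit whose value is known.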
8454
8455@ The Top AHFA of the new LIM is temporarily used
8456to memoize
8457the value of the AHFA to-state for the LIM's
8458base EIM.
8459That may become its actual value,
8460once it is populated.
8461@<Create a new, unpopulated, LIM@> = {
8462    LIM new_lim;
8463    new_lim = obstack_alloc(&r->t_obs, sizeof(*new_lim));
8464    Postdot_SYMID_of_LIM(new_lim) = symid;
8465    LV_EIM_of_PIM(new_lim) = NULL;
8466    LV_Predecessor_LIM_of_LIM(new_lim) = NULL;
8467    LV_Origin_of_LIM(new_lim) = NULL;
8468    LV_Chain_Length_of_LIM(new_lim) = -1;
8469    LV_Top_AHFA_of_LIM(new_lim) = base_to_ahfa;
8470    LV_Base_EIM_of_LIM(new_lim) = leo_base;
8471    LV_ES_of_LIM(new_lim) = current_earley_set;
8472    LV_Next_PIM_of_LIM(new_lim) = this_pim;
8473    pim_workarea[symid] = new_lim;
8474    bv_bit_set(bv_lim_symbols, (guint)symid);
8475}
8476
8477@ This code fully populates the data in the LIMs.
It determines the Leo predecessors of the LIMs, if any,
8479then populates that datum and the predecessor-dependent
8480data.
8481@ The algorithm is fast, if not a model of simplicity.
8482The LIMs are processed in an outer loop in order by
8483symbol ID, as well as in an inner loop which processes
8484predecessor chains from bottom to top.
8485It is very much possible that the
8486same LIM will be encountered twice,
8487once in each loop.
8488The code always checks to see if a LIM is
8489already populated,
8490before populating it.
8491@ The outer loop ensures that all LIMs are eventually
8492populated.  It uses the PIM workarea, guided by
a bit vector which indicates the LIMs.
8494@ It is possible for a LIM to be encountered which may have a predecessor,
8495but which cannot be immediately populated.
8496This is because predecessors link the LIMs in chains, and such chains
8497must be populated in order.
8498Any ``links" in the chain of LIMs which are in previous Earley sets
8499will already be populated.
8500But a chain of LIMs may all be in the current Earley set, the
8501one we are currently processing.
8502In this case, there is a chicken-and-egg issue, which is
8503resolved by arranging those LIMs in chain link order,
8504and processing them in that order.
8505This is the business of the inner loop.
8506@ When a LIM is encountered which cannot be populated immediately,
8507its chain is followed and copied into |lim_chain|, which is in
8508effect a stack.  The chain ends when it reaches
8509a LIM which can be populated immediately.
8510@ A special case is when the LIM chain cycles back to the LIM
8511which started the chain.
8512When this happens, the LIM chain is terminated.
8513The bottom of such a chain
8514(which, since it is a cycle, is also the top)
8515is populated with a predecessor of
8516|NULL| and appropriate predecessor-dependent data.
8517@ {\bf Theorem}: The number of links
8518in a LIM chain is never more than the number
8519of symbols in the grammar.
8520{\bf Proof}: A LIM chain consists of the predecessors of LIMs,
8521all of which are in the same Earley set.
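A LIM is uniquely determined by a duple of Earley set and transition symbol.
This means, in a single Earley set, there is at most one LIM per symbol.
Since no LIM is put onto a LIM chain more than once,
the chain cannot have more links than the grammar has symbols.
{\bf QED}.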
8525@ {\bf Complexity}: Time complexity is $O(n)$, where $n$ is the number
8526of LIMs.  This can be shown as follows:
8527\li The outer loop processes each LIM exactly once.
8528\li A LIM is never put onto a LIM chain if it is already populated.
8529\li A LIM is never taken off a LIM chain without being populated.
8530\li Based on the previous two observations, we know that a LIM will
8531be put onto a LIM chain at most once.
\li Ignoring the inner loop processing, the amount of processing done for each
LIM in the outer loop is $O(1)$.
8534\li The amount of processing done for each LIM
8535in the inner loop is $O(1)$.
8536\li Total processing for all $n$ LIMs is therefore $n(O(1)+O(1))=O(n)$.
8537@ The |bv_ok_for_chain| is a vector of bits by symbol ID.
8538A bit is set if there is a LIM for that symbol ID that is OK for addition
8539to the LIM chain.
8540To be OK for addition to the LIM chain, the postdot item for the symbol
8541ID must
\li actually be a Leo item (LIM);
\li not have been populated; and
\li not have already been added to a LIM chain for this
Earley set.\par
8546@<Add predecessors to LIMs@> = {
8547  const Bit_Vector bv_ok_for_chain = r->t_bv_sym3;
8548  guint min, max, start;
8549
8550  bv_copy(bv_ok_for_chain, bv_lim_symbols);
8551  for (start = 0; bv_scan (bv_lim_symbols, start, &min, &max);
8552       start = max + 2)
    { /* This is the outer loop.  It loops over the symbol IDs,
8554	  visiting only the symbols with LIMs. */
8555      SYMID main_loop_symbol_id;
8556      for (main_loop_symbol_id = (SYMID) min;
8557	  main_loop_symbol_id <= (SYMID) max;
8558	  main_loop_symbol_id++)
8559	{
8560	  LIM predecessor_lim;
8561	  LIM lim_to_process = pim_workarea[main_loop_symbol_id];
8562          if (LIM_is_Populated(lim_to_process)) continue; /* LIM may
8563	      have already been populated in the LIM chain loop */
8564	    @<Find predecessor LIM of unpopulated LIM@>@;
8565	    if (predecessor_lim && LIM_is_Populated(predecessor_lim)) {
8566	        @<Populate |lim_to_process| from |predecessor_lim|@>@;
8567		continue;
8568	    }
8569	    if (!predecessor_lim) { /* If there is no predecessor LIM to
8570	       populate, we know that we should populate from the base
8571	       Earley item */
8572	       @<Populate |lim_to_process| from its base Earley item@>@;
8573	       continue;
8574	    }
8575	   @<Create and populate a LIM chain@>@;
8576	}
8577    }
8578}
8579
8580@ Find the predecessor LIM from the PIM workarea.
8581If the predecessor
8582starts at the current Earley set, I need to look in
8583the PIM workarea.
8584Otherwise the PIM item array by symbol is already
8585set up and I can find it there.
8586@ The LHS of the completed rule and of the applicable rule
8587in the base item will be the same, because the two rules
8588are the same.
8589Given the |main_loop_symbol_id| we can look up either the
8590appropriate rule in the base Earley item's AHFA state,
8591or the Leo completion's AHFA state.
8592It is most convenient to find the LHS of the completed
8593rule as the
8594only possible Leo LHS of the Leo completion's AHFA state.
8595The AHFA state for the Leo completion is guaranteed
8596to have only one rule.
8597The base Earley item's AHFA state can have multiple
8598rules, and in its list of rules there can
8599be transitions to Leo
8600completions via several different symbols.
8601@ This code only works for unpopulated LIMs,
8602because it relies on the Top AHFA value containing
8603the base AHFA to-state.
8604In a populated LIM, this will not necessarily be the case.
8605@<Find predecessor LIM of unpopulated LIM@> = {
8606    const EIM base_eim = Base_EIM_of_LIM(lim_to_process);
8607    const ES predecessor_set = Origin_of_EIM(base_eim);
8608    const AHFA base_to_ahfa = Top_AHFA_of_LIM(lim_to_process);
8609    const SYMID predecessor_transition_symbol = Leo_LHS_ID_of_AHFA(base_to_ahfa);
8610    PIM predecessor_pim;
8611    if (Earleme_of_ES(predecessor_set) < current_earley_set_id) {
8612	predecessor_pim
8613	= First_PIM_of_ES_by_SYMID (predecessor_set, predecessor_transition_symbol);
8614    } else {
8615        predecessor_pim = pim_workarea[predecessor_transition_symbol];
8616    }
8617    predecessor_lim = PIM_is_LIM(predecessor_pim) ? LIM_of_PIM(predecessor_pim) : NULL;
8618}
8619
8620@ @<Create and populate a LIM chain@> = {
8621  gpointer* const lim_chain = r->t_workarea2;
8622  gint lim_chain_ix;
8623  @<Create a LIM chain@>@;
8624  @<Populate the LIMs in the LIM chain@>@;
8625}
8626
8627@ At this point we know that
8628\li |lim_to_process != NULL|
8629\li |lim_to_process| is not populated
8630\li |predecessor_lim != NULL|
8631\li |predecessor_lim| is not populated
@ Cycles can occur in the LIM chain.  They are broken by refusing to
put the same LIM on a LIM chain twice.  Since LIM chain links are one-to-one,
ensuring that the LIM on the bottom of the chain is never added to the LIM
chain a second time is enough to enforce this.
@ When I am about to add a LIM twice to the LIM chain, instead I break the
chain at that point.  The top of the chain will then have no LIM predecessor,
instead of being part of a cycle.  Since the LIM information is always optional,
and in that case would be useless, breaking the chain in this way causes no
problems.
8641@<Create a LIM chain@> = {
8642     SYMID postdot_symid_of_lim_to_process
8643	 = Postdot_SYMID_of_LIM(lim_to_process);
8644    lim_chain_ix = 0;
8645    lim_chain[lim_chain_ix++] = LIM_of_PIM(lim_to_process);
8646	bv_bit_clear(bv_ok_for_chain, (guint)postdot_symid_of_lim_to_process);
8647	/* Make sure this LIM
8648	is not added to a LIM chain again for this Earley set */ @#
8649    while (1) {
8650	 lim_to_process = predecessor_lim; /* I know at this point that
8651	     |predecessor_lim| is unpopulated, so I also know that
8652	     |lim_to_process| is unpopulated.  This means I also know that
8653	     |lim_to_process| is in the current Earley set, because all LIMs
8654	     in previous Earley sets are already
8655	     populated. */ @#
8656
8657	 postdot_symid_of_lim_to_process = Postdot_SYMID_of_LIM(lim_to_process);
8658	if (!bv_bit_test(bv_ok_for_chain, (guint)postdot_symid_of_lim_to_process)) {
8659	/* If I am about to add a previously added LIM to the LIM chain, I
8660	   break the LIM chain at this point.
8661	     The predecessor LIM has not yet been changed,
8662	     so that it is still appropriate for
8663	     the LIM at the top of the chain.  */
8664	    break;
8665	}
8666
8667        @<Find predecessor LIM of unpopulated LIM@>@;
8668
8669	lim_chain[lim_chain_ix++] = LIM_of_PIM(lim_to_process); /*
8670	    |lim_to_process| is not populated, as shown above */
8671
8672	bv_bit_clear(bv_ok_for_chain, (guint)postdot_symid_of_lim_to_process);
8673	/* Make sure this LIM
8674	is not added to a LIM chain again for this Earley set */ @#
8675
	if (!predecessor_lim) break; /* |predecessor_lim = NULL|,
	so that we are forced to break the LIM chain before it */ @#

	if (LIM_is_Populated(predecessor_lim)) break;
	/* |predecessor_lim| is populated, so that if we
	break before |predecessor_lim|, we are ready to populate the entire LIM
	   chain. */
8683    }
8684}
8685
8686@ @<Populate the LIMs in the LIM chain@> =
8687for (lim_chain_ix--; lim_chain_ix >= 0; lim_chain_ix--) {
8688    lim_to_process = lim_chain[lim_chain_ix];
8689    if (predecessor_lim && LIM_is_Populated(predecessor_lim)) {
8690	@<Populate |lim_to_process| from |predecessor_lim|@>@;
8691    } else {
8692	@<Populate |lim_to_process| from its base Earley item@>@;
8693    }
8694    predecessor_lim = lim_to_process;
8695}
8696
8697@ @<Populate |lim_to_process| from |predecessor_lim|@> = {
8698LV_Predecessor_LIM_of_LIM(lim_to_process) = predecessor_lim;
8699LV_Origin_of_LIM(lim_to_process) = Origin_of_LIM(predecessor_lim);
LV_Chain_Length_of_LIM(lim_to_process) =
    Chain_Length_of_LIM(predecessor_lim)+1;
8702LV_Top_AHFA_of_LIM(lim_to_process) = Top_AHFA_of_LIM(predecessor_lim);
8703}
8704
8705@ If we have reached this code, either we do not have a predecessor
8706LIM, or we have one which is useless for populating |lim_to_process|.
8707If a predecessor LIM is not itself populated, it will be useless
8708for populating its successor.
8709An unpopulated predecessor LIM
8710may occur when there is a predecessor LIM
8711which proved impossible to populate because it is part of a cycle.
8712@ The predecessor LIM and the top AHFA to-state were initialized
8713to the appropriate values for this case,
8714and do not need to be changed.
8715The predecessor LIM was initialized to |NULL|,
8716and the top AHFA to-state was initialized to the AHFA to-state
8717of the base EIM.
8718@<Populate |lim_to_process| from its base Earley item@> = {
8719  EIM base_eim = Base_EIM_of_LIM(lim_to_process);
8720  LV_Origin_of_LIM (lim_to_process) = Origin_of_EIM (base_eim);
8721  LV_Chain_Length_of_LIM(lim_to_process) =  0;
8722}
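@ The outer loop, the LIM chain and the breaking of cycles
described in the preceding sections can be sketched on a toy model.
This is an illustration under stated assumptions, not the \libmarpa/ code:
all names are hypothetical,
every LIM is in the current Earley set and is identified by a small index,
and the chain length is taken to be one more than that of the predecessor,
which is an assumption about the intended meaning of that field.

```c
#include <assert.h>

#define TOY_LIM_COUNT 8

struct toy_lim {
    int pred;          /* index of the candidate predecessor, or -1 */
    int is_populated;
    int chain_length;  /* meaningful once populated */
    int linked_pred;   /* predecessor actually linked, or -1 */
};

static void populate_from_base(struct toy_lim *lim)
{
    lim->linked_pred = -1;
    lim->chain_length = 0;
    lim->is_populated = 1;
}

static void populate_from_pred(struct toy_lim *lims, int ix, int pred)
{
    lims[ix].linked_pred = pred;
    lims[ix].chain_length = lims[pred].chain_length + 1;
    lims[ix].is_populated = 1;
}

void
toy_populate_lims(struct toy_lim *lims, int count)
{
    int ok_for_chain[TOY_LIM_COUNT];
    int chain[TOY_LIM_COUNT];   /* a LIM is stacked at most once */
    int outer;
    assert(count <= TOY_LIM_COUNT);
    for (outer = 0; outer < count; outer++) ok_for_chain[outer] = 1;
    for (outer = 0; outer < count; outer++) {
        int chain_ix = 0;
        int current = outer;
        int pred;
        if (lims[outer].is_populated) continue;
        for (;;) {   /* follow unpopulated predecessors, stacking them */
            chain[chain_ix++] = current;
            ok_for_chain[current] = 0;
            pred = lims[current].pred;
            if (pred < 0) break;                  /* no predecessor */
            if (lims[pred].is_populated) break;   /* ready to populate */
            if (!ok_for_chain[pred]) break;       /* cycle: break the chain */
            current = pred;
        }
        while (chain_ix > 0) {   /* populate from the top of the chain down */
            current = chain[--chain_ix];
            if (pred >= 0 && lims[pred].is_populated)
                populate_from_pred(lims, current, pred);
            else
                populate_from_base(&lims[current]);
            pred = current;
        }
    }
}
```

When a cycle is found, the LIM at the top of the chain is populated
from its base, as if it had no predecessor,
and every LIM below it is then populated from the one above.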
8723
8724@ @<Copy PIM workarea to postdot item array@> = {
8725    PIM *postdot_array
8726	= current_earley_set->t_postdot_ary
8727	= obstack_alloc (&r->t_obs,
8728	       current_earley_set->t_postdot_sym_count * sizeof (PIM));
8729    guint min, max, start;
8730    gint postdot_array_ix = 0;
8731    for (start = 0; bv_scan (bv_pim_symbols, start, &min, &max); start = max + 2) {
8732	SYMID symid;
8733	for (symid = (SYMID)min; symid <= (SYMID) max; symid++) {
8734            PIM this_pim = pim_workarea[symid];
8735	    if (this_pim) postdot_array[postdot_array_ix++] = this_pim;
8736	}
8737    }
8738}
8739
8740@** Expand the Leo Items.
\libmarpa/ expands Leo items on a ``lazy'' basis,
when it creates the parse bocage.
Some of the ``virtual'' Earley items in the Leo paths will also
8744be real Earley items.
8745Earley items in the Leo path may actually exist
8746for several reasons:
8747\li The Leo completion item itself always exists before
8748this function call.
8749It is counted in the total path lengths,
8750once for each Leo path.
8751This means that the total of the Leo path lengths will never be less
8752than the number of Leo paths.
\li Any Leo completion base items.
8754One of these exists for every path
8755whose base is a
8756completed Earley item, and not a token.
\li Any other Earley item in the Leo path which was already created
8758for other reasons.
8759If an Earley item in a Leo path already exists, a new Earley
8760item is not created ---
8761instead a source link is added to the present Earley item.
8762
8763@** Evaluation --- Preliminary Notes.
8764
8765@*0 Alternate Start Rules.
8766Note that a start symbol only works if it is
8767on the LHS of just one rule.
8768This is not an issue with the main start symbol, because
8769Marpa uses an augmented grammar.
8770It {\bf is} an issue for alternate start symbols, when
8771I implement those, because an arbitrary symbol might be
8772on the LHS of several rules.
8773
8774@ Possibilities:
8775\li Require alternate start be specified as a rule, not a symbol.
8776\li Allow alternate start symbols, but only if they are on the LHS of a
8777single rule.
I don't like this because it limits the ability of grammar writers
8779to do on-the-fly experiments.
8780\li Both of the above.  That certainly covers the bases,
8781but it is just one more interface
8782complication.
8783
@ Note that even when a start rule is supplied, that does
not necessarily point to a unique Earley item.
A completed rule can belong to several different AHFA states.
That is OK, because even so the origin, the current earleme,
and the links will all be identical for all such Earley items.
8789
8790@*0 Statistics on Completed LHS Symbols per AHFA State.
8791An AHFA state may contain completions for more than one LHS,
8792but that is rare in practical use, and the number of completed
8793LHS symbols in the exceptions remains low.
8794The very complex perl AHFA contains 271 states with completions.
8795Of these 268 have only one completed symbol.
The other three AHFA states each complete only two different LHS symbols.
8797Two states have completions with both
8798a |term_hi| and a |indirob| on the LHS.
8799One state has completions for both a
8800|sideff| and an |mexpr|.
8801@ My HTML test grammars make the
8802same point more strongly.
8803My HTML parser generates grammars on the fly.
These HTML grammars can differ from each other,
because Marpa takes the HTML input into account when
generating the grammar.
In my HTML test suite,
of the 14,782 AHFA states, every
single one has only one completed LHS symbol.
8810
8811@*0 CHAF Duplicate And-Nodes.
8812There are three ways in which the same and-node can occur multiple
8813times as the descendant of a single or-node.
8814@ First, an or-node can have several different Earley items as
8815its source.  This is dealt with by noticing that in building the
8816or-node, we only use the source links of an Earley item, and
8817that these are always identical.  Therefore we can arbitrarily
8818select any one of the possible source Earley items to be
8819the or-node's ``unique" Earley item source.
8820@ The second source of duplication is duplicate source links
8821for the same Earley item.
8822I prevent token source links from duplicating,
8823and the Leo logic does not allow duplicate Leo source links.
8824@ Completion source links could be prevented from duplicating by
8825making the transition symbol part of its ``signature",
8826and making sure the source link transition symbol matches
8827the predot symbol of the or-node.
8828This would only impose a small overhead.
8829But given that I need to look for duplicates from other
sources, there does not seem to be enough of a payoff to justify
8831even a small overhead.
8832@ A third source of duplication occurs
8833when different source links
have different AHFA states in their predecessors, but
share the same AHFA item.
8836There will be
8837pairs of these source links which share the same middle earleme,
8838because if an AHFA item (dotted rule) in one is justified at a
8839location, the same AHFA item in the other must be, also.
This happens frequently enough to be an issue even for practical
8841grammars.
8842
8843@*0 Sources of Leo Path Items.
8844A Leo path consists of a series of Earley items:
8845\li at the bottom, exactly one Leo base item;
8846\li at the top, exactly one Leo completion item;
8847\li in between, zero or more Leo path items.
8848@ Leo base items and Leo completion items can have a variety
8849of non-Leo sources.
8850Leo completion items can have multiple Leo sources,
8851though no other source can have the same middle earleme
8852as a Leo source.
8853@ When expanded, Leo path items can have multiple sources.
8854However, the sources of a single Leo path item
8855will result from the same Leo predecessor.
8856As consequences:
8857\li All the sources of an expanded Leo path item will have the same
8858Earley item predecessor,
8859the Leo base item of the Leo predecessor.
8860\li All these sources will also have the same middle
8861earleme, the Earley set of the Leo predecessor.
8862\li Every source of the Leo path item will have a cause
8863and the transition symbol of the Leo predecessor
8864will be on the LHS of at least one completion in all of those causes.
8865\li The Leo transition symbol will be the postdot symbol in exactly
8866one AHFA item in the AHFA state of the Earley item predecessor.
8867
8868@** Ur-Node (UR) Code.
Ur is a German prefix meaning ``primordial'', which is used
8870a lot in academic writing to designate precursors---%
8871for example, scholars who believe that Shakespeare's
8872{\it Hamlet} is based on another, now lost, play,
8873call this play the ur-Hamlet.
8874My ur-nodes are precursors of and-nodes and or-nodes.
8875@<Private incomplete structures@> =
8876struct s_ur_node_stack;
8877struct s_ur_node;
8878typedef struct s_ur_node_stack* URS;
8879typedef struct s_ur_node* UR;
8880typedef const struct s_ur_node* UR_Const;
8881@
8883{\bf To Do}: @^To Do@>
8884It may make sense to reuse this stack
8885for the alternatives.
8886In that case some of these structures
8887will need to be changed.
8888@d Prev_UR_of_UR(ur) ((ur)->t_prev)
8889@d LV_Prev_UR_of_UR(ur) Prev_UR_of_UR(ur)
8890@d Next_UR_of_UR(ur) ((ur)->t_next)
8891@d LV_Next_UR_of_UR(ur) Next_UR_of_UR(ur)
8892@d EIM_of_UR(ur) ((ur)->t_earley_item)
8893@d LV_EIM_of_UR(ur) EIM_of_UR(ur)
8894@d AEX_of_UR(ur) ((ur)->t_aex)
8895@d LV_AEX_of_UR(ur) AEX_of_UR(ur)
8896
8897@<Private structures@> =
8898struct s_ur_node_stack {
8899   struct obstack t_obs;
8900   UR t_base;
8901   UR t_top;
8902};
8903struct s_ur_node {
8904   UR t_prev;
8905   UR t_next;
8906   EIM t_earley_item;
8907   AEX t_aex;
8908};
8909@ @d URS_of_R(r) (&(r)->t_ur_node_stack)
8910@<Widely aligned recognizer elements@> =
8911struct s_ur_node_stack t_ur_node_stack;
8912@
8913{\bf To Do}: @^To Do@>
8914The lifetime of this stack should be reexamined once its uses
8915are settled.
8916@<Initialize recognizer elements@> =
8917    ur_node_stack_init(URS_of_R(r));
8918@ @<Destroy recognizer elements@> =
8919    ur_node_stack_destroy(URS_of_R(r));
8920
8921@ @<Private function prototypes@> =
8922static inline void ur_node_stack_init(URS stack);
8923@ @<Function definitions@> =
8924static inline void ur_node_stack_init(URS stack) {
8925MARPA_OFF_DEBUG2("ur_node_stack_init %s", G_STRLOC);
8926    obstack_init(&stack->t_obs);
8927    stack->t_base = ur_node_new(stack, 0);
8928    ur_node_stack_reset(stack);
8929}
8930
8931@ @<Private function prototypes@> =
8932static inline void ur_node_stack_reset(URS stack);
8933@ @<Function definitions@> =
8934static inline void ur_node_stack_reset(URS stack) {
8935    stack->t_top = stack->t_base;
8936}
8937
8938@ @<Private function prototypes@> =
8939static inline void ur_node_stack_destroy(URS stack);
8940@ @<Function definitions@> =
8941static inline void ur_node_stack_destroy(URS stack) {
8942MARPA_OFF_DEBUG2("ur_node_stack_destroy %s", G_STRLOC);
8943    if (stack->t_base) obstack_free(&stack->t_obs, NULL);
8944    stack->t_base = NULL;
8945MARPA_OFF_DEBUG2("ur_node_stack_destroy %s", G_STRLOC);
8946}
8947
8948@ @<Private function prototypes@> =
8949static inline UR ur_node_new(URS stack, UR prev);
8950@ @<Function definitions@> =
8951static inline UR ur_node_new(URS stack, UR prev) {
8952    UR new_ur_node;
8953    new_ur_node = obstack_alloc(&stack->t_obs, sizeof(new_ur_node[0]));
8954    LV_Next_UR_of_UR(new_ur_node) = 0;
8955    LV_Prev_UR_of_UR(new_ur_node) = prev;
8956    return new_ur_node;
8957}
8958
8959@ @<Private function prototypes@> =
8960static inline void ur_node_push(URS stack, EIM earley_item, AEX aex);
8961@ @<Function definitions@> =
8962static inline void
8963ur_node_push (URS stack, EIM earley_item, AEX aex)
8964{
8965  UR top = stack->t_top;
8966  UR new_top = Next_UR_of_UR (top);
8967  LV_EIM_of_UR (top) = earley_item;
8968  LV_AEX_of_UR (top) = aex;
8969  if (!new_top)
8970    {
8971      new_top = ur_node_new (stack, top);
8972      LV_Next_UR_of_UR (top) = new_top;
8973    }
8974  stack->t_top = new_top;
8975}
8976
8977@ @<Private function prototypes@> =
8978static inline UR ur_node_pop(URS stack);
8979@ @<Function definitions@> =
8980static inline UR
8981ur_node_pop (URS stack)
8982{
8983  UR new_top = Prev_UR_of_UR (stack->t_top);
8984  if (!new_top) return NULL;
8985  stack->t_top = new_top;
8986  return new_top;
8987}
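@ The stack discipline above can be illustrated with a small standalone
sketch.
It is not the code used by the recognizer:
it replaces the obstack with plain |malloc|, and the names
|struct node|, |stack_push| and so on are hypothetical stand-ins.
What it tries to show is that the top pointer always designates the next
free slot, that nodes are allocated only the first time a slot is reached,
and that a reset keeps the chain so that later pushes reuse it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ur-node: the real one carries an EIM and an AEX. */
struct node {
    struct node *prev;
    struct node *next;
    int payload;
};

struct node_stack {
    struct node *base; /* first node of the chain, kept until destroy */
    struct node *top;  /* always the next free slot, never a live entry */
};

static struct node *node_new(struct node *prev) {
    struct node *n = malloc(sizeof *n);
    if (!n) abort();
    n->prev = prev;
    n->next = NULL;
    n->payload = 0;
    return n;
}

static void stack_init(struct node_stack *s) {
    s->base = node_new(NULL);
    s->top = s->base;
}

/* Resetting keeps the chain, so later pushes reuse the nodes. */
static void stack_reset(struct node_stack *s) {
    s->top = s->base;
}

static void stack_push(struct node_stack *s, int payload) {
    struct node *top = s->top;
    top->payload = payload;
    if (!top->next) top->next = node_new(top); /* grow only when needed */
    s->top = top->next;
}

/* Returns NULL when the stack is empty. */
static const struct node *stack_pop(struct node_stack *s) {
    struct node *new_top = s->top->prev;
    if (!new_top) return NULL;
    s->top = new_top;
    return new_top;
}

static void stack_destroy(struct node_stack *s) {
    struct node *n = s->base;
    while (n) {
        struct node *next = n->next;
        free(n);
        n = next;
    }
    s->base = s->top = NULL;
}
```

A popped node stays valid until the next push overwrites its slot,
which is why the pop can return a pointer rather than a copy.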
8988
8989@ |predecessor_aim| and |predot|
8990are guaranteed to be defined,
8991since predictions and the null parse AHFA item are
8992never on the stack.
8993@<Populate the PSIA data@>=
8994{
8995    UR_Const ur_node;
8996    const URS ur_node_stack = URS_of_R(r);
8997    ur_node_stack_reset(ur_node_stack);
8998    {
8999       const EIM ur_earley_item = start_eim;
9000       const AIM ur_aim = start_aim;
9001       const AEX ur_aex = start_aex;
9002	@<Push ur-node if new@>@;
9003    }
9004    while ((ur_node = ur_node_pop(ur_node_stack)))
9005    {
9006        const EIM_Const parent_earley_item = EIM_of_UR(ur_node);
9007	const AEX parent_aex = AEX_of_UR(ur_node);
9008	const AIM parent_aim = AIM_of_EIM_by_AEX (parent_earley_item, parent_aex);
9009	MARPA_ASSERT(parent_aim >= AIM_by_ID(1))@;
9010	const AIM predecessor_aim = parent_aim - 1;
9011	/* Note that the postdot symbol of the predecessor is NOT necessarily the
9012	   predot symbol, because there may be nulling symbols in between. */
9013	guint source_type = Source_Type_of_EIM (parent_earley_item);
9014	MARPA_ASSERT(!EIM_is_Predicted(parent_earley_item))@;
9015	@<Push child Earley items from token sources@>@;
9016	@<Push child Earley items from completion sources@>@;
9017	@<Push child Earley items from Leo sources@>@;
9018    }
9019    @<Unset the PSIA for the start rule prediction@>@;
9020}
9021
@ The start rule prediction is a special case:
it is the one AHFA prediction item that is not in a
predicted AHFA state.
It is dealt with by letting its entry in the
PSIA be set spuriously, then unsetting it.
This is not very elegant, but it handles the case at a constant
cost per parse.
9029@<Unset the PSIA for the start rule prediction@> = {
9030    const ES first_earley_set = ES_of_R_by_Ord (r, 0);
9031    OR** const nodes_by_item = per_es_data[0].t_aexes_by_item;
9032    const EIM* const eims_of_es = EIMs_of_ES(first_earley_set);
9033    const gint item_count = EIM_Count_of_ES (first_earley_set);
9034    gint item_ordinal;
9035    for (item_ordinal = 0; item_ordinal < item_count; item_ordinal++)
9036    {
9037	OR* const nodes_by_aex = nodes_by_item[item_ordinal];
9038	if (nodes_by_aex) {
9039	    const EIM earley_item = eims_of_es[item_ordinal];
9040	    const Marpa_AHFA_State_ID ahfa_id = AHFAID_of_EIM(earley_item);
9041	    /* The prediction start rule will be in AHFA state 0 */
9042	    if (ahfa_id) continue;
9043	    {
9044		const gint aim_count_of_item = AIM_Count_of_EIM(earley_item);
9045		AEX aex;
9046		for (aex = 0; aex < aim_count_of_item; aex++) {
9047		    AIM ahfa_item = AIM_of_EIM_by_AEX(earley_item, aex);
9048		    if (Position_of_AIM(ahfa_item) == 0) {
9049			/* Don't bother with the null count ---
9050			there are no nulling symbols in the start rule */
9051			nodes_by_aex[aex] = NULL;
9052			goto FINISHED_UNSET;
9053		    }
9054		}
9055	    }
9056	}
9057    }
9058    FINISHED_UNSET: ;
9059}
9060
9061@ @<Push ur-node if new@> = {
9062    if (!psia_test_and_set
9063	(&bocage_setup_obs, per_es_data, ur_earley_item, ur_aex))
9064      {
9065	ur_node_push (ur_node_stack, ur_earley_item, ur_aex);
9066	or_node_estimate += 1 + Null_Count_of_AIM(ur_aim);
9067      }
9068}
9069
9070@ The |PSIA| is a container of data that is per Earley-set, per Earley item,
9071and per AEX.  Thus, Per-Set-Item-Aex, or PSIA.
9072This function ensures that the appropriate |PSIA| boolean is set,
9073and returns that boolean's value prior to the call.
9074@<Private function prototypes@> =
9075static inline gint psia_test_and_set(
9076    struct obstack* obs,
9077    struct s_bocage_setup_per_es* per_es_data,
9078    EIM earley_item,
9079    AEX ahfa_element_ix);
9080@ @<Function definitions@> =
9081static inline gint psia_test_and_set(
9082    struct obstack* obs,
9083    struct s_bocage_setup_per_es* per_es_data,
9084    EIM earley_item,
9085    AEX ahfa_element_ix)
9086{
9087    const gint aim_count_of_item = AIM_Count_of_EIM(earley_item);
9088    const Marpa_Earley_Set_ID set_ordinal = ES_Ord_of_EIM(earley_item);
9089    OR** nodes_by_item = per_es_data[set_ordinal].t_aexes_by_item;
9090    const gint item_ordinal = Ord_of_EIM(earley_item);
9091    OR* nodes_by_aex = nodes_by_item[item_ordinal];
9092MARPA_ASSERT(ahfa_element_ix < aim_count_of_item)@;
9093    if (!nodes_by_aex) {
9094	AEX aex;
9095        nodes_by_aex = nodes_by_item[item_ordinal] =
9096	    obstack_alloc(obs, aim_count_of_item*sizeof(OR));
9097	for (aex = 0; aex < aim_count_of_item; aex++) {
9098	    nodes_by_aex[aex] = NULL;
9099	}
9100    }
9101    if (!nodes_by_aex[ahfa_element_ix]) {
9102	nodes_by_aex[ahfa_element_ix] = dummy_or_node;
9103	return 0;
9104    }
9105    return 1;
9106}
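@ As an illustration only, the test-and-set idea can be reduced to a
two-level table whose rows are allocated on first use.
The sketch below is an assumption-laden simplification:
it stores plain flags instead of or-node pointers,
uses |calloc| instead of an obstack,
and the names |struct lazy_table| and |table_test_and_set| are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical two-level table: one row per item, allocated lazily. */
struct lazy_table {
    unsigned char **rows; /* rows[item] stays NULL until first touched */
    int item_count;
    int slots_per_item;
};

static void table_init(struct lazy_table *t, int item_count, int slots_per_item) {
    int i;
    t->rows = malloc((size_t)item_count * sizeof *t->rows);
    if (!t->rows) abort();
    for (i = 0; i < item_count; i++) t->rows[i] = NULL;
    t->item_count = item_count;
    t->slots_per_item = slots_per_item;
}

/* Ensures the flag for (item, slot) is set and
   returns the value it had before the call. */
static int table_test_and_set(struct lazy_table *t, int item, int slot) {
    assert(item >= 0 && item < t->item_count);
    assert(slot >= 0 && slot < t->slots_per_item);
    if (!t->rows[item]) {
        /* calloc zero-fills, so every flag in the new row starts unset */
        t->rows[item] = calloc((size_t)t->slots_per_item, 1);
        if (!t->rows[item]) abort();
    }
    if (!t->rows[item][slot]) {
        t->rows[item][slot] = 1;
        return 0;
    }
    return 1;
}

static void table_destroy(struct lazy_table *t) {
    int i;
    for (i = 0; i < t->item_count; i++) free(t->rows[i]);
    free(t->rows);
    t->rows = NULL;
}
```

A caller that pushes work only when the function returns 0
visits each (item, slot) pair at most once,
which is how the traversal above avoids pushing the same ur-node twice.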
9107
9108@ @<Push child Earley items from token sources@> =
9109{
9110  SRCL source_link = NULL;
9111  EIM predecessor_earley_item = NULL;
9112  switch (source_type)
9113    {
9114    case SOURCE_IS_TOKEN:
9115      predecessor_earley_item = Predecessor_of_EIM (parent_earley_item);
9116      break;
9117    case SOURCE_IS_AMBIGUOUS:
9118      source_link = First_Token_Link_of_EIM (parent_earley_item);
9119      if (source_link)
9120	{
9121	  predecessor_earley_item = Predecessor_of_SRCL (source_link);
9122	  source_link = Next_SRCL_of_SRCL (source_link);
9123	}
9124    }
9125    for (;;)
9126      {
9127	if (predecessor_earley_item)
9128	  {
9129	    if (EIM_is_Predicted(predecessor_earley_item)) {
9130		Set_boolean_in_PSIA_for_initial_nulls(predecessor_earley_item, predecessor_aim);
9131	    } else {
9132		const EIM ur_earley_item = predecessor_earley_item;
9133		const AEX ur_aex =
9134		  AEX_of_EIM_by_AIM (predecessor_earley_item, predecessor_aim);
9135		const AIM ur_aim = predecessor_aim;
9136		@<Push ur-node if new@>@;
9137	    }
9138	  }
9139	if (!source_link)
9140	  break;
9141	predecessor_earley_item = Predecessor_of_SRCL (source_link);
9142	source_link = Next_SRCL_of_SRCL (source_link);
9143      }
9144}
9145
9146@ If there are initial nulls, set a boolean in the PSIA
9147so that I will know to create the chain of or-nodes for them.
I don't need to stack the prediction, because it can have
no other descendants.
9150@d Set_boolean_in_PSIA_for_initial_nulls(eim, aim) {
9151    if (Position_of_AIM(aim) > 0) {
9152	const gint null_count = Null_Count_of_AIM(aim);
9153	if (null_count) {
9154	    AEX aex = AEX_of_EIM_by_AIM((eim),
9155		(aim));
9156	    or_node_estimate += null_count;
9157	    psia_test_and_set(&bocage_setup_obs, per_es_data,
9158		(eim), aex);
9159	}
9160    }
9161}
9162
9163@ @<Push child Earley items from completion sources@> =
9164{
9165  SRCL source_link = NULL;
9166  EIM predecessor_earley_item = NULL;
9167  EIM cause_earley_item = NULL;
9168  const SYMID transition_symbol_id = Postdot_SYMID_of_AIM(predecessor_aim);
9169  switch (source_type)
9170    {
9171    case SOURCE_IS_COMPLETION:
9172      predecessor_earley_item = Predecessor_of_EIM (parent_earley_item);
9173      cause_earley_item = Cause_of_EIM (parent_earley_item);
9174      break;
9175    case SOURCE_IS_AMBIGUOUS:
9176      source_link = First_Completion_Link_of_EIM (parent_earley_item);
9177      if (source_link)
9178	{
9179	  predecessor_earley_item = Predecessor_of_SRCL (source_link);
9180	  cause_earley_item = Cause_of_SRCL (source_link);
9181	  source_link = Next_SRCL_of_SRCL (source_link);
9182	}
9183	break;
9184    }
9185  while (cause_earley_item)
9186    {
9187	if (predecessor_earley_item)
9188	  {
9189	    if (EIM_is_Predicted (predecessor_earley_item))
9190	      {
9191		Set_boolean_in_PSIA_for_initial_nulls(predecessor_earley_item, predecessor_aim);
9192	      }
9193	    else
9194	      {
9195		const EIM ur_earley_item = predecessor_earley_item;
9196		const AEX ur_aex =
9197		  AEX_of_EIM_by_AIM (predecessor_earley_item, predecessor_aim);
9198		const AIM ur_aim = predecessor_aim;
9199		@<Push ur-node if new@>@;
9200	      }
9201	  }
9202    {
9203      const TRANS cause_completion_data =
9204	TRANS_of_EIM_by_SYMID (cause_earley_item, transition_symbol_id);
9205      const gint aex_count = Completion_Count_of_TRANS (cause_completion_data);
9206      const AEX * const aexes = AEXs_of_TRANS (cause_completion_data);
9207      const EIM ur_earley_item = cause_earley_item;
9208      gint ix;
9209      for (ix = 0; ix < aex_count; ix++) {
9210	  const AEX ur_aex = aexes[ix];
9211	  const AIM ur_aim = AIM_of_EIM_by_AEX(ur_earley_item, ur_aex);
9212	    @<Push ur-node if new@>@;
9213      }
9214    }
9215      if (!source_link) break;
9216      predecessor_earley_item = Predecessor_of_SRCL (source_link);
9217      cause_earley_item = Cause_of_SRCL (source_link);
9218      source_link = Next_SRCL_of_SRCL (source_link);
9219    }
9220}
9221
9222@ @<Push child Earley items from Leo sources@> =
9223{
9224  SRCL source_link = NULL;
9225  EIM cause_earley_item = NULL;
9226  LIM leo_predecessor = NULL;
9227  switch (source_type)
9228    {
9229    case SOURCE_IS_LEO:
9230      leo_predecessor = Predecessor_of_EIM (parent_earley_item);
9231      cause_earley_item = Cause_of_EIM (parent_earley_item);
9232      break;
9233    case SOURCE_IS_AMBIGUOUS:
9234      source_link = First_Leo_SRCL_of_EIM (parent_earley_item);
9235      if (source_link)
9236	{
9237	  leo_predecessor = Predecessor_of_SRCL (source_link);
9238	  cause_earley_item = Cause_of_SRCL (source_link);
9239	  source_link = Next_SRCL_of_SRCL (source_link);
9240	}
9241      break;
9242    }
9243  while (cause_earley_item)
9244    {
9245      const SYMID transition_symbol_id = Postdot_SYMID_of_LIM(leo_predecessor);
9246      const TRANS cause_completion_data =
9247	TRANS_of_EIM_by_SYMID (cause_earley_item, transition_symbol_id);
9248      const gint aex_count = Completion_Count_of_TRANS (cause_completion_data);
9249      const AEX * const aexes = AEXs_of_TRANS (cause_completion_data);
9250      gint ix;
9251      EIM ur_earley_item = cause_earley_item;
9252      for (ix = 0; ix < aex_count; ix++) {
9253	  const AEX ur_aex = aexes[ix];
9254	  const AIM ur_aim = AIM_of_EIM_by_AEX(ur_earley_item, ur_aex);
9255	  @<Push ur-node if new@>@;
9256      }
9257    while (leo_predecessor) {
9258      SYMID postdot = Postdot_SYMID_of_LIM (leo_predecessor);
9259      EIM leo_base = Base_EIM_of_LIM (leo_predecessor);
9260      TRANS transition = TRANS_of_EIM_by_SYMID (leo_base, postdot);
9261      const AEX ur_aex = Leo_Base_AEX_of_TRANS (transition);
9262      const AIM ur_aim = AIM_of_EIM_by_AEX(leo_base, ur_aex);
9263      ur_earley_item = leo_base;
9264      /* Increment the
9265      estimate to account for the Leo path or-nodes */
9266      or_node_estimate += 1 + Null_Count_of_AIM(ur_aim+1);
9267	if (EIM_is_Predicted (ur_earley_item))
9268	  {
9269	    Set_boolean_in_PSIA_for_initial_nulls(ur_earley_item, ur_aim);
9270	  } else {
9271	      @<Push ur-node if new@>@;
9272	  }
9273	leo_predecessor = Predecessor_LIM_of_LIM(leo_predecessor);
9274        }
9275	if (!source_link) break;
9276	  leo_predecessor = Predecessor_of_SRCL (source_link);
9277	  cause_earley_item = Cause_of_SRCL (source_link);
9278	  source_link = Next_SRCL_of_SRCL (source_link);
9279      }
9280}
9281
9282@** Or-Node (OR) Code.
9283The or-nodes are part of the parse bocage
9284and are similar to the or-nodes of a standard parse forest.
9285Unlike a parse forest,
9286a parse bocage can contain cycles.
9287
9288@<Public typedefs@> =
9289typedef gint Marpa_Or_Node_ID;
9290@ @<Private typedefs@> =
9291typedef Marpa_Or_Node_ID ORID;
9292
9293@*0 Relationship of Earley Items to Or-Nodes.
9294Several Earley items may be the source of the same or-node,
9295but the or-node only keeps track of one.  This is sufficient,
9296because the Earley item is tracked by the or-node only for its
9297links and,
9298by the following theorem,
9299the links for every Earley item which is the source
9300of the same or-node must be the same.
9301
9302@ {\bf Theorem}: If two Earley items are sources of the same or-node,
9303they have the same links.
9304{\bf Outline of Proof}:
9305No or-node results from a predicted Earley
9306item, so every Earley item which is the source of an or-node
9307is itself the result of a transition over a symbol from
9308another Earley item.
9309So I can restrict my discussion to discovered Earley items.
9310For the same reason, I can assume all source links have
9311predecessors defined.
9312
9313@ {\bf Shared Predot Lemma}: An AHFA state is either predicted,
9314or all its LR0 items share the same predot symbol.
9315{\bf Proof}:  Straightforward, based on the construction of
9316an AHFA.
9317
@ {\bf EIM Lemma}: If two Earley items are sources of the same or-node,
9319they share the same origin ES, the same current ES and the same
9320predot symbol.
9321{\bf Proof of Lemma}:
9322Showing that the Earley items share the same origin and current
9323ES is straightforward, based on the or-node's construction.
9324They share at least one LR0 item in their AHFA states---%
9325the LR0 item which defines the or-node.
9326Because they share at least one LR0 item and because, by the
9327Shared Predot Lemma, every LR0
9328item in a discovered AHFA state has the same predot symbol,
9329the two Earley items also
9330share the same predot symbol.
9331
9332@ {\bf Completion Source Lemma}:
9333A discovered Earley item has a completion source link if and only if
9334the origin ES of the link's predecessor,
9335the current ES of the link's cause
9336and the transition symbol match, respectively,
9337the origin ES, current ES and predot symbol of the discovered EIM.
9338{\bf Proof}: Based on the construction of EIMs.
9339
9340@ {\bf Token Source Lemma}:
9341A discovered Earley item has a token source link if and only if
the origin ES of the link's predecessor, the current ES of the link's cause
9343and the token symbol match, respectively,
9344the origin ES, current ES and predot symbol of the discovered EIM.
9345{\bf Proof}: Based on the construction of EIMs.
9346
9347@ Source links are either completion source links or token source links.
9348The theorem for completion source links follows from the EIM Lemma and the
9349Completion Source Lemma.
9350The theorem for token source links follows from the EIM Lemma and the
9351Token Source Lemma.
9352{\bf QED}.
9353
9354@ @<Private incomplete structures@> =
9355union u_or_node;
9356typedef union u_or_node* OR;
@ For final or-nodes, the type is stored in the same word as the position.
@s OR int
Position is |DUMMY_OR_NODE| for dummy or-nodes and
|TOKEN_OR_NODE| if the or-node is actually a symbol.
Otherwise, position is the dot position.
9363@d DUMMY_OR_NODE -1
9364@d TOKEN_OR_NODE -2
9365@d OR_is_Token(or) (Type_of_OR(or) == TOKEN_OR_NODE)
9366@d Position_of_OR(or) ((or)->t_final.t_position)
9367@d Type_of_OR(or) ((or)->t_final.t_position)
9368@d RULE_of_OR(or) ((or)->t_final.t_rule)
9369@d Origin_Ord_of_OR(or) ((or)->t_final.t_start_set_ordinal)
9370@d ID_of_OR(or) ((or)->t_final.t_id)
9371@d ES_Ord_of_OR(or) ((or)->t_draft.t_end_set_ordinal)
9372@d DANDs_of_OR(or) ((or)->t_draft.t_draft_and_node)
9373@d First_ANDID_of_OR(or) ((or)->t_final.t_first_and_node_id)
9374@d AND_Count_of_OR(or) ((or)->t_final.t_and_node_count)
@ C89 guarantees that, when a union contains several structures
which share a common initial sequence,
that sequence may be inspected through any of those members.
9377@<Or-node common initial sequence@> =
9378gint t_position;
9379gint t_end_set_ordinal;
9380RULE t_rule;
9381gint t_start_set_ordinal;
9382ORID t_id;
9383@ @<Private structures@> =
9384struct s_draft_or_node
9385{
9386    @<Or-node common initial sequence@>@;
9387  DAND t_draft_and_node;
9388};
9389@ @<Private structures@> =
9390struct s_final_or_node
9391{
9392    @<Or-node common initial sequence@>@;
9393    gint t_first_and_node_id;
9394    gint t_and_node_count;
9395};
9396@
9397@d TOK_of_OR(or) (&(or)->t_token)
9398@d SYMID_of_OR(or) SYMID_of_TOK(TOK_of_OR(or))
9399@d Value_of_OR(or) Value_of_TOK(TOK_of_OR(or))
9400@<Private structures@> =
9401union u_or_node {
9402    struct s_draft_or_node t_draft;
9403    struct s_final_or_node t_final;
9404    struct s_token t_token;
9405};
9406typedef union u_or_node OR_Object;
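@ The common-initial-sequence technique can be shown in isolation.
The following is a minimal sketch, not the or-node layout itself:
the structure and member names are hypothetical,
and only two shared fields are kept.
It writes a node through its draft view and then reads and completes it
through its final view.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical draft view: shared fields first, then draft-only data. */
struct draft_node {
    int position;
    int end_ordinal;
    void *draft_list;
};

/* Hypothetical final view: the same two leading fields, in the same order. */
struct final_node {
    int position;
    int end_ordinal;
    int first_child_id;
    int child_count;
};

union node {
    struct draft_node as_draft;
    struct final_node as_final;
};

/* Legal whichever view was written last, because position
   belongs to the common initial sequence. */
static int node_position(const union node *n) {
    return n->as_final.position;
}

/* Overwrites only the fields that follow the common initial sequence,
   so position and end_ordinal keep their values. */
static void node_finalize(union node *n, int first_child_id, int child_count) {
    n->as_final.first_child_id = first_child_id;
    n->as_final.child_count = child_count;
}
```

The guarantee covers only the leading members that match in type and order,
so a field added to one structure ahead of the shared ones would break it.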
9407
9408@ @<Private global variables@> =
9409static const gint dummy_or_node_type = DUMMY_OR_NODE;
9410static const OR dummy_or_node = (OR)&dummy_or_node_type;
9411
9412@ @d ORs_of_B(b) ((b)->t_or_nodes)
9413@d OR_of_B_by_ID(b, id) (ORs_of_B(b)[(id)])
9414@d OR_Count_of_B(b) ((b)->t_or_node_count)
9415@d ANDs_of_B(b) ((b)->t_and_nodes)
9416@d AND_Count_of_B(b) ((b)->t_and_node_count)
9417@d Top_ORID_of_B(b) ((b)->t_top_or_node_id)
9418@<Widely aligned bocage elements@> =
9419OR* t_or_nodes;
9420AND t_and_nodes;
9421@ @<Int aligned bocage elements@> =
9422gint t_or_node_count;
9423gint t_and_node_count;
9424ORID t_top_or_node_id;
9425
9426@ @<Initialize bocage elements@> =
9427ORs_of_B(b) = NULL;
9428OR_Count_of_B(b) = 0;
9429ANDs_of_B(b) = NULL;
9430AND_Count_of_B(b) = 0;
9431
9432@ @<Destroy bocage elements, main phase@> =
9433{
9434  OR* or_nodes = ORs_of_B (b);
9435  AND and_nodes = ANDs_of_B (b);
9436  if (or_nodes)
9437    {
9438      g_free (or_nodes);
9439      ORs_of_B (b) = NULL;
9440    }
9441  if (and_nodes)
9442    {
9443      g_free (and_nodes);
9444      ANDs_of_B (b) = NULL;
9445    }
9446}
9447
9448@*0 Create the Or-Nodes.
9449@<Create the or-nodes for all earley sets@> =
9450{
9451  PSAR_Object or_per_es_arena;
9452  const PSAR or_psar = &or_per_es_arena;
9453  gint work_earley_set_ordinal;
9454  OR last_or_node = NULL ;
9455  ORs_of_B (b) = g_new (OR, or_node_estimate);
9456  psar_init (or_psar, SYMI_Count_of_G (g));
9457  for (work_earley_set_ordinal = 0;
9458      work_earley_set_ordinal < earley_set_count_of_r;
9459      work_earley_set_ordinal++)
9460  {
9461      const ES_Const earley_set = ES_of_R_by_Ord (r, work_earley_set_ordinal);
9462    EIM* const eims_of_es = EIMs_of_ES(earley_set);
9463    const gint item_count = EIM_Count_of_ES (earley_set);
9464      PSL this_earley_set_psl;
9465    OR** const nodes_by_item = per_es_data[work_earley_set_ordinal].t_aexes_by_item;
9466      psar_dealloc(or_psar);
9467#define PSL_ES_ORD work_earley_set_ordinal
9468#define CLAIMED_PSL this_earley_set_psl
9469      @<Claim the or-node PSL for |PSL_ES_ORD| as |CLAIMED_PSL|@>@;
9470    @<Create the or-nodes for |work_earley_set_ordinal|@>@;
9471    @<Create the draft and-nodes for |work_earley_set_ordinal|@>@;
9472  }
9473  psar_destroy (or_psar);
9474  ORs_of_B(b) = g_renew (OR, ORs_of_B(b), OR_Count_of_B(b));
9475}
9476
9477@ @<Create the or-nodes for |work_earley_set_ordinal|@> =
9478{
9479    gint item_ordinal;
9480    for (item_ordinal = 0; item_ordinal < item_count; item_ordinal++)
9481    {
9482	OR* const work_nodes_by_aex = nodes_by_item[item_ordinal];
9483	if (work_nodes_by_aex) {
9484	    const EIM work_earley_item = eims_of_es[item_ordinal];
9485	    const gint work_ahfa_item_count = AIM_Count_of_EIM(work_earley_item);
9486	    AEX work_aex;
9487	      const gint work_origin_ordinal = Ord_of_ES (Origin_of_EIM (work_earley_item));
9488	    for (work_aex = 0; work_aex < work_ahfa_item_count; work_aex++) {
9489		if (!work_nodes_by_aex[work_aex]) continue;
9490		@<Create the or-nodes
9491		    for |work_earley_item| and |work_aex|@>@;
9492	    }
9493	}
9494    }
9495}
9496
9497@ @<Create the or-nodes for |work_earley_item| and |work_aex|@> =
9498{
9499  AIM ahfa_item = AIM_of_EIM_by_AEX(work_earley_item, work_aex);
9500  SYMI ahfa_item_symbol_instance;
9501  OR psia_or_node = NULL;
9502  ahfa_item_symbol_instance = SYMI_of_AIM(ahfa_item);
9503  {
9504	    PSL or_psl;
9505#define PSL_ES_ORD work_origin_ordinal
9506#define CLAIMED_PSL or_psl
9507	@<Claim the or-node PSL for |PSL_ES_ORD| as |CLAIMED_PSL|@>@;
9508	@<Add main or-node@>@;
9509	@<Add nulling token or-nodes@>@;
9510    }
9511    /* Replace the dummy or-node with
9512    the last one added */
9513    MARPA_ASSERT (psia_or_node)@;
9514    work_nodes_by_aex[work_aex] = psia_or_node;
9515    @<Add Leo or-nodes@>@;
9516}
9517
9518@*0 Non-Leo Or-Nodes.
9519@ Add the main or-node---%
9520the one that corresponds directly to this AHFA item.
The exceptions are predicted AHFA items:
or-nodes are not added for them.
9523@<Add main or-node@> =
9524{
9525MARPA_OFF_DEBUG3("%s ahfa_item_symbol_instance = %d", G_STRLOC, ahfa_item_symbol_instance);
9526  if (ahfa_item_symbol_instance >= 0)
9527    {
9528      OR or_node;
9529MARPA_ASSERT(ahfa_item_symbol_instance < SYMI_Count_of_G(g))@;
9530      or_node = PSL_Datum (or_psl, ahfa_item_symbol_instance);
9531      if (!or_node || ES_Ord_of_OR(or_node) != work_earley_set_ordinal)
9532	{
9533	  const RULE rule = RULE_of_AIM(ahfa_item);
9534	  @<Set |last_or_node| to a new or-node@>@;
9535	  or_node = last_or_node;
9536	  PSL_Datum (or_psl, ahfa_item_symbol_instance) = last_or_node;
9537	  Origin_Ord_of_OR(or_node) = Origin_Ord_of_EIM(work_earley_item);
9538	  ES_Ord_of_OR(or_node) = work_earley_set_ordinal;
9539	  RULE_of_OR(or_node) = rule;
9540	  Position_of_OR (or_node) =
9541	      ahfa_item_symbol_instance - SYMI_of_RULE (rule) + 1;
9542	  DANDs_of_OR(or_node) = NULL;
9543	}
9544	psia_or_node = or_node;
9545    }
9546}
9547
@ The resizing of the or-node array here presents an issue.
It should never be invoked, which means it is never tested.
The choice is therefore between trusting the logic
and deleting the code,
or arranging to test it.
9553@<Set |last_or_node| to a new or-node@> =
9554{
9555  const gint or_node_id = OR_Count_of_B (b)++;
9556  OR *or_nodes_of_b = ORs_of_B (b);
9557  last_or_node = (OR)obstack_alloc (&OBS_of_B(b), sizeof(OR_Object));
9558  ID_of_OR(last_or_node) = or_node_id;
9559  if (G_UNLIKELY(or_node_id >= or_node_estimate))
9560    {
9561      MARPA_ASSERT(0);
9562      or_node_estimate *= 2;
9563      ORs_of_B (b) = or_nodes_of_b =
9564	g_renew (OR, or_nodes_of_b, or_node_estimate);
9565    }
9566  or_nodes_of_b[or_node_id] = last_or_node;
9567}
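@ For readers who want to exercise the growth path separately,
here is a minimal sketch of the same append-and-double pattern.
It is an assumption-based illustration:
it uses |realloc| rather than |g_renew|,
and |struct ptr_array| and |ptr_array_append| are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of pointers, indexed by ID. */
struct ptr_array {
    void **items;
    int count;
    int capacity;
};

static void ptr_array_init(struct ptr_array *a, int estimate) {
    a->capacity = estimate > 0 ? estimate : 1;
    a->items = malloc((size_t)a->capacity * sizeof *a->items);
    if (!a->items) abort();
    a->count = 0;
}

/* Stores item under the next free ID and returns that ID.
   The capacity doubles whenever the ID would fall outside the array. */
static int ptr_array_append(struct ptr_array *a, void *item) {
    const int id = a->count++;
    if (id >= a->capacity) {
        void **grown;
        a->capacity *= 2;
        grown = realloc(a->items, (size_t)a->capacity * sizeof *a->items);
        if (!grown) abort();
        a->items = grown;
    }
    a->items[id] = item;
    return id;
}
```

Starting such an array with a deliberately small estimate
is one way to force the doubling branch to run under test.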
9568
9569
@ In the following logic, the order matters.
The or-node added last, whether by this logic or by the logic for
adding the main or-node, will be used as the or-node
in the PSIA.
9574@ In building the final or-node, the predecessor can be
9575determined using the PSIA for $|symbol_instance|-1$.
9576The exception is where there is no predecessor,
9577and this is the case if |Position_of_OR(or_node) == 0|.
9578@<Add nulling token or-nodes@> =
9579{
9580  const gint null_count = Null_Count_of_AIM (ahfa_item);
9581  if (null_count > 0)
9582    {
9583      const RULE rule = RULE_of_AIM (ahfa_item);
9584      const gint symbol_instance_of_rule = SYMI_of_RULE(rule);
9585      const gint first_null_symbol_instance =
9586	  ahfa_item_symbol_instance < 0 ? symbol_instance_of_rule : ahfa_item_symbol_instance + 1;
9587      gint i;
9588      for (i = 0; i < null_count; i++)
9589	{
9590	  const gint symbol_instance = first_null_symbol_instance + i;
9591	  OR or_node = PSL_Datum (or_psl, symbol_instance);
9592MARPA_OFF_DEBUG3("adding nulling token or-node EIM = %s aex=%d",
9593    eim_tag(work_earley_item), work_aex);
9594	  if (!or_node || ES_Ord_of_OR (or_node) != work_earley_set_ordinal) {
9595		DAND draft_and_node;
9596		const gint rhs_ix = symbol_instance - SYMI_of_RULE(rule);
9597		const OR predecessor = rhs_ix ? last_or_node : NULL;
9598		const OR cause = (OR)TOK_by_ID_of_R( r, RHS_ID_of_RULE (rule, rhs_ix ) );
9599		@<Set |last_or_node| to a new or-node@>@;
9600		or_node = PSL_Datum (or_psl, symbol_instance) = last_or_node ;
9601		Origin_Ord_of_OR (or_node) = work_origin_ordinal;
9602		ES_Ord_of_OR (or_node) = work_earley_set_ordinal;
9603		RULE_of_OR (or_node) = rule;
9604MARPA_OFF_DEBUG3("Added rule %p to or-node %p", RULE_of_OR(or_node), or_node);
9605		Position_of_OR (or_node) = rhs_ix + 1;
9606MARPA_ASSERT(Position_of_OR(or_node) <= 1 || predecessor);
9607		draft_and_node = DANDs_of_OR (or_node) =
9608		  draft_and_node_new (&bocage_setup_obs, predecessor,
9609		      cause);
9610MARPA_OFF_DEBUG3("or = %p, setting DAND = %p", or_node, DANDs_of_OR(or_node));
9611		Next_DAND_of_DAND (draft_and_node) = NULL;
9612	      }
9613	      psia_or_node = or_node;
9614	}
9615    }
9616}
9617
9618@*0 Leo Or-Nodes.
9619@<Add Leo or-nodes@> = {
9620  SRCL source_link = NULL;
9621  EIM cause_earley_item = NULL;
9622  LIM leo_predecessor = NULL;
9623  switch (Source_Type_of_EIM(work_earley_item))
9624    {
9625    case SOURCE_IS_LEO:
9626      leo_predecessor = Predecessor_of_EIM (work_earley_item);
9627      cause_earley_item = Cause_of_EIM (work_earley_item);
9628      break;
9629    case SOURCE_IS_AMBIGUOUS:
9630      source_link = First_Leo_SRCL_of_EIM (work_earley_item);
9631      if (source_link)
9632	{
9633	  leo_predecessor = Predecessor_of_SRCL (source_link);
9634	  cause_earley_item = Cause_of_SRCL (source_link);
9635	  source_link = Next_SRCL_of_SRCL (source_link);
9636	}
9637      break;
9638    }
9639    if (leo_predecessor) {
9640	for (;;) { /* for each Leo source link */
9641	    @<Add or-nodes for chain starting with |leo_predecessor|@>@;
9642	    if (!source_link) break;
9643	    leo_predecessor = Predecessor_of_SRCL (source_link);
9644	    cause_earley_item = Cause_of_SRCL (source_link);
9645	    source_link = Next_SRCL_of_SRCL (source_link);
9646	}
9647    }
9648}
9649
@ The main loop in this code deliberately skips the first Leo predecessor.
The successor of the first Leo predecessor is the base of the Leo path,
which already exists, and therefore the first Leo predecessor is not
expanded.
9654@ The unwrapping of the information for the Leo path item is quite the
9655process, and some memoization might be useful.
9656But it is not clear that memoization does more than move
9657the processing from one place to another, increasing space
9658requirements in the process.
9659@<Add or-nodes for chain starting with |leo_predecessor|@> =
9660{
9661  LIM this_leo_item = leo_predecessor;
9662  LIM previous_leo_item = this_leo_item;
9663  while ((this_leo_item = Predecessor_LIM_of_LIM (this_leo_item)))
9664    {
9665	const gint ordinal_of_set_of_this_leo_item = Ord_of_ES(ES_of_LIM(this_leo_item));
9666          const AIM path_ahfa_item = Path_AIM_of_LIM(previous_leo_item);
9667	  const RULE path_rule = RULE_of_AIM(path_ahfa_item);
9668	  const gint symbol_instance_of_path_ahfa_item = SYMI_of_AIM(path_ahfa_item);
9669	@<Add main Leo path or-node@>@;
9670	@<Add Leo path nulling token or-nodes@>@;
9671	previous_leo_item = this_leo_item;
9672    }
9673}
9674
@ Get the base data for a Leo item: its base Earley item
and the index of the relevant AHFA item.
9677@<Private function prototypes@> =
9678static inline AEX lim_base_data_get(LIM leo_item, EIM* p_base);
9679@ @<Function definitions@> =
9680static inline AEX lim_base_data_get(LIM leo_item, EIM* p_base)
9681{
9682      const SYMID postdot = Postdot_SYMID_of_LIM (leo_item);
9683      const EIM base = Base_EIM_of_LIM(leo_item);
9684      const TRANS transition = TRANS_of_EIM_by_SYMID (base, postdot);
9685      *p_base = base;
9686      return Leo_Base_AEX_of_TRANS (transition);
9687}
9688
9689@ @d Path_AIM_of_LIM(lim) (base_aim_of_lim(lim)+1)
9690@d Base_AIM_of_LIM(lim) (base_aim_of_lim(lim))
9691@<Private function prototypes@> =
9692static inline AIM base_aim_of_lim(LIM leo_item);
9693@ @<Function definitions@> =
9694static inline AIM base_aim_of_lim(LIM leo_item)
9695{
9696      EIM base;
9697      const AEX base_aex = lim_base_data_get(leo_item, &base);
9698      return AIM_of_EIM_by_AEX(base, base_aex);
9699}
9700
9701@ Adds the main Leo path or-node---%
9702the non-nulling or-node which
corresponds to the Leo predecessor.
9704@<Add main Leo path or-node@> =
9705{
9706    {
9707      OR or_node;
9708      PSL leo_psl;
9709#define PSL_ES_ORD ordinal_of_set_of_this_leo_item
9710#define CLAIMED_PSL leo_psl
9711	@<Claim the or-node PSL for |PSL_ES_ORD| as |CLAIMED_PSL|@>@;
9712      or_node = PSL_Datum (leo_psl, symbol_instance_of_path_ahfa_item);
9713      if (!or_node || ES_Ord_of_OR(or_node) != work_earley_set_ordinal)
9714	{
9715	  @<Set |last_or_node| to a new or-node@>@;
9716	  PSL_Datum (leo_psl, symbol_instance_of_path_ahfa_item) = or_node = last_or_node;
9717	  Origin_Ord_of_OR(or_node) = ordinal_of_set_of_this_leo_item;
9718	  ES_Ord_of_OR(or_node) = work_earley_set_ordinal;
9719	  RULE_of_OR(or_node) = path_rule;
9720	  Position_of_OR (or_node) =
9721	      symbol_instance_of_path_ahfa_item - SYMI_of_RULE (path_rule) + 1;
9722MARPA_OFF_DEBUG3("Created or-node %s at %s", or_tag(or_node), G_STRLOC);
9723	  DANDs_of_OR(or_node) = NULL;
9724MARPA_OFF_DEBUG3("or = %p, setting DAND = %p", or_node, DANDs_of_OR(or_node));
9725	}
9726    }
9727}
9728
9729@ In building the final or-node, the predecessor can be
9730determined using the PSIA for $|symbol_instance|-1$.
9731There will always be a predecessor, since these nulling
9732or-nodes follow a completion.
9733@<Add Leo path nulling token or-nodes@> =
9734{
9735  gint i;
9736  const gint null_count = Null_Count_of_AIM (path_ahfa_item);
9737  for (i = 1; i <= null_count; i++)
9738    {
9739      const gint symbol_instance = symbol_instance_of_path_ahfa_item + i;
9740      OR or_node = PSL_Datum (this_earley_set_psl, symbol_instance);
9741      MARPA_ASSERT (symbol_instance < SYMI_Count_of_G (g)) @;
9742      if (!or_node || ES_Ord_of_OR (or_node) != work_earley_set_ordinal)
9743	{
9744	  DAND draft_and_node;
9745	  const gint rhs_ix = symbol_instance - SYMI_of_RULE(path_rule);
9746	    const OR predecessor = rhs_ix ? last_or_node : NULL;
9747	  const OR cause = (OR)TOK_by_ID_of_R( r, RHS_ID_of_RULE (path_rule, rhs_ix)) ;
9748	  MARPA_ASSERT (symbol_instance < Length_of_RULE (path_rule)) @;
9749	  MARPA_ASSERT (symbol_instance >= 0) @;
9750	  @<Set |last_or_node| to a new or-node@>@;
9751	  PSL_Datum (this_earley_set_psl, symbol_instance) = or_node = last_or_node;
9752	  Origin_Ord_of_OR (or_node) = ordinal_of_set_of_this_leo_item;
9753	  ES_Ord_of_OR (or_node) = work_earley_set_ordinal;
9754	  RULE_of_OR (or_node) = path_rule;
9755	  Position_of_OR (or_node) = rhs_ix + 1;
9756MARPA_ASSERT(Position_of_OR(or_node) <= 1 || predecessor);
9757	  DANDs_of_OR (or_node) = draft_and_node =
9758	      draft_and_node_new (&bocage_setup_obs, predecessor, cause);
9759	  MARPA_OFF_DEBUG3 ("or = %p, setting DAND = %p", or_node,
9760			    DANDs_of_OR (or_node));
9761	  Next_DAND_of_DAND (draft_and_node) = NULL;
9762	}
9763      MARPA_ASSERT (Position_of_OR (or_node) <=
9764		    SYMI_of_RULE (path_rule) + Length_of_RULE (path_rule)) @;
9765      MARPA_ASSERT (Position_of_OR (or_node) >= SYMI_of_RULE (path_rule)) @;
9766    }
9767}
9768
9769@** Whole Element ID (WHEID) Code.
The ``whole elements'' of the grammar are the symbols
9771and the completed rules.
9772{\bf To Do}: @^To Do@>
9773Note that this puts a limit on the number of symbols
9774and rules in a grammar --- their total must fit in an
9775int.
9776@d WHEID_of_SYMID(symid) (rule_count_of_g+(symid))
9777@d WHEID_of_RULEID(ruleid) (ruleid)
9778@d WHEID_of_RULE(rule) WHEID_of_RULEID(ID_of_RULE(rule))
9779@d WHEID_of_OR(or) (
9780    wheid = OR_is_Token(or) ?
9781        WHEID_of_SYMID(SYMID_of_OR(or)) :
9782        WHEID_of_RULE(RULE_of_OR(or))
9783    )
9784
9785@<Private typedefs@> =
9786typedef gint WHEID;
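
@ As an illustration of the numbering above, here is a minimal standalone
sketch. It is not part of \libmarpa/: the function names and the explicit
|rule_count| parameter are assumptions made for the example.
Rule IDs map to themselves and symbol IDs are shifted past the rules,
so the two ranges cannot overlap as long as their total fits in an |int|.

```c
#include <assert.h>
#include <limits.h>

typedef int wheid_t; /* hypothetical stand-in for WHEID */

/* Rules occupy the range [0, rule_count). */
static wheid_t wheid_of_rule_id(int rule_id)
{
    return rule_id;
}

/* Symbols occupy the range [rule_count, rule_count + symbol_count). */
static wheid_t wheid_of_symbol_id(int rule_count, int symbol_id)
{
    /* The shifted ID must still fit in an int. */
    assert(symbol_id <= INT_MAX - rule_count);
    return rule_count + symbol_id;
}

/* Returns 1 if the combined ID space fits in an int, 0 otherwise. */
static int wheid_space_fits(int rule_count, int symbol_count)
{
    return symbol_count <= INT_MAX - rule_count;
}
```

With 10 rules, symbol 4 would receive whole element ID 14 and rule 3 would keep ID 3.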
9787
9788
9789@** Draft And-Node (DAND) Code.
9790The draft and-nodes are used while the bocage is
9791being built.
9792Both draft and final and-nodes contain the predecessor
9793and cause.
9794Draft and-nodes need to be in a linked list,
9795so they have a link to the next and-node.
9796@<Private incomplete structures@> =
9797struct s_draft_and_node;
9798typedef struct s_draft_and_node* DAND;
9799@
9800@d Next_DAND_of_DAND(dand) ((dand)->t_next)
9801@d Predecessor_OR_of_DAND(dand) ((dand)->t_predecessor)
9802@d Cause_OR_of_DAND(dand) ((dand)->t_cause)
9803@<Private structures@> =
9804struct s_draft_and_node {
9805    DAND t_next;
9806    OR t_predecessor;
9807    OR t_cause;
9808};
9809typedef struct s_draft_and_node DAND_Object;
9810
9811@ @<Private function prototypes@> =
9812static inline
9813DAND draft_and_node_new(struct obstack *obs, OR predecessor, OR cause);
9814@ @<Function definitions@> =
9815static inline
9816DAND draft_and_node_new(struct obstack *obs, OR predecessor, OR cause)
9817{
9818    DAND draft_and_node = obstack_alloc (obs, sizeof(DAND_Object));
9819    Predecessor_OR_of_DAND(draft_and_node) = predecessor;
9820    Cause_OR_of_DAND(draft_and_node) = cause;
9821    MARPA_ASSERT(cause);
9822    return draft_and_node;
9823}
9824
9825@ Currently, I do not check draft and-nodes for duplicates.
This will be done when they are copied to final and-nodes.
9827In the future, it may be more efficient to do a linear search for
9828duplicates until the number of draft and-nodes reaches a small
9829constant $n$.
9830(Optimal $n$ is perhaps something like 7.)
Alternatively, it could always check for duplicates, but limit
9832the search to the first $n$ draft and-nodes.
9833@ In that case, the logic to copy the final and-nodes can
9834rely on chains of length less than $n$ being non-duplicated,
9835and the PSARs can be reserved for the unusual case where this
9836is not sufficient.
9837@<Private function prototypes@> =
9838static inline
9839void draft_and_node_add(struct obstack *obs, OR parent, OR predecessor, OR cause);
9840@ @<Function definitions@> =
9841static inline
9842void draft_and_node_add(struct obstack *obs, OR parent, OR predecessor, OR cause)
9843{
    const DAND new = draft_and_node_new(obs, predecessor, cause);
    MARPA_ASSERT(Position_of_OR(parent) <= 1 || predecessor);
9846    Next_DAND_of_DAND(new) = DANDs_of_OR(parent);
9847    DANDs_of_OR(parent) = new;
9848}
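
@ The bounded duplicate check suggested above might look like the
following sketch. It is an illustration under stated assumptions, not the
current implementation: the names are hypothetical, |DUP_SCAN_LIMIT| stands
in for the constant $n$, |malloc| is used instead of an obstack, and
or-nodes are represented as opaque pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DUP_SCAN_LIMIT 7 /* hypothetical value of n */

struct dand_sketch {
    struct dand_sketch *next;
    const void *predecessor;
    const void *cause;
};

/* Push a new node on the front of the list at *head, unless an
   identical (predecessor, cause) pair is found among the first
   DUP_SCAN_LIMIT nodes.  Returns 1 if a node was added, 0 if a
   duplicate was found, and -1 if allocation failed. */
static int dand_sketch_add(struct dand_sketch **head,
                           const void *predecessor, const void *cause)
{
    struct dand_sketch *node;
    int scanned = 0;
    assert(head != NULL);
    for (node = *head; node && scanned < DUP_SCAN_LIMIT;
         node = node->next, scanned++) {
        if (node->predecessor == predecessor && node->cause == cause)
            return 0;
    }
    node = malloc(sizeof *node);
    if (!node)
        return -1;
    node->predecessor = predecessor;
    node->cause = cause;
    node->next = *head;
    *head = node;
    return 1;
}

/* Count the nodes in a list. */
static int dand_sketch_count(const struct dand_sketch *head)
{
    int count = 0;
    for (; head; head = head->next)
        count++;
    return count;
}

/* Release every node in a list. */
static void dand_sketch_free(struct dand_sketch *head)
{
    while (head) {
        struct dand_sketch *next = head->next;
        free(head);
        head = next;
    }
}
```

Because new nodes go on the front, the scan covers only the most recently
added entries, so longer chains would still need the later duplicate-marking pass.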
9849
9850@ @<Create the draft and-nodes for |work_earley_set_ordinal|@> =
9851{
9852    gint item_ordinal;
9853    for (item_ordinal = 0; item_ordinal < item_count; item_ordinal++)
9854    {
9855	OR* const nodes_by_aex = nodes_by_item[item_ordinal];
9856	if (nodes_by_aex) {
9857	    const EIM work_earley_item = eims_of_es[item_ordinal];
9858	    const gint work_ahfa_item_count = AIM_Count_of_EIM(work_earley_item);
9859	    const gint work_origin_ordinal = Ord_of_ES (Origin_of_EIM (work_earley_item));
9860	    AEX work_aex;
9861	    for (work_aex = 0; work_aex < work_ahfa_item_count; work_aex++) {
9862		OR or_node = nodes_by_aex[work_aex];
9863		Move_OR_to_Proper_OR(or_node);
9864		if (or_node) {
9865		    @<Create draft and-nodes for |or_node|@>@;
9866		}
9867	    }
9868	}
9869    }
9870}
9871
@ From an or-node, which may be nulling, determine its proper
predecessor.  Set |or_node| to 0 if there is none.
9874@d Move_OR_to_Proper_OR(or_node) {
9875    while (or_node)  {
9876	DAND draft_and_node = DANDs_of_OR(or_node);
9877	OR predecessor_or;
9878	if (!draft_and_node) break;
9879	predecessor_or = Predecessor_OR_of_DAND (draft_and_node);
9880	if (predecessor_or &&
9881	    ES_Ord_of_OR (predecessor_or) != work_earley_set_ordinal)
9882	  break;
9883	or_node = predecessor_or;
9884    }
9885}
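
@ The loop in |Move_OR_to_Proper_OR| may be easier to follow when written
as a function. The sketch below uses simplified, hypothetical structures
(|struct or_sketch| and |struct dand_sketch|) that carry only the fields
the loop touches; it is not the code that \libmarpa/ compiles.

```c
#include <assert.h>
#include <stddef.h>

struct or_sketch;

struct dand_sketch {
    struct or_sketch *predecessor;
};

struct or_sketch {
    int es_ordinal;            /* Earley set where the or-node ends */
    struct dand_sketch *dands; /* NULL until draft and-nodes are added */
};

/* Walk back from or_node for as long as the current node already has
   a draft and-node and its predecessor is either absent or lies in
   Earley set work_ordinal.  Returns the or-node where the walk stops,
   or NULL if the chain runs out of predecessors. */
static struct or_sketch *proper_or_node(struct or_sketch *or_node,
                                        int work_ordinal)
{
    while (or_node) {
        struct dand_sketch *dand = or_node->dands;
        struct or_sketch *pred;
        if (!dand)
            break; /* no draft and-node yet: stop here */
        pred = dand->predecessor;
        if (pred && pred->es_ordinal != work_ordinal)
            break; /* predecessor is in a different Earley set */
        or_node = pred;
    }
    return or_node;
}
```

In this sketch a node with no draft and-nodes ends the walk, and a missing
predecessor yields a null result.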
9886
9887@ @<Create draft and-nodes for |or_node|@> =
9888{
9889    guint work_source_type = Source_Type_of_EIM (work_earley_item);
9890    const AIM work_ahfa_item = AIM_of_EIM_by_AEX (work_earley_item, work_aex);
9891    MARPA_ASSERT (work_ahfa_item >= AIM_by_ID (1))@;
9892    const AIM work_predecessor_aim = work_ahfa_item - 1;
9893    const gint work_symbol_instance = SYMI_of_AIM (work_ahfa_item);
9894    OR work_proper_or_node;
9895    Set_OR_from_Ord_and_SYMI (work_proper_or_node, work_origin_ordinal,
9896			      work_symbol_instance);
9897
9898    @<Create Leo draft and-nodes@>@;
9899    @<Create draft and-nodes for token sources@>@;
9900    @<Create draft and-nodes for completion sources@>@;
9901}
9902
9903@ @<Create Leo draft and-nodes@> = {
9904  SRCL source_link = NULL;
9905  EIM cause_earley_item = NULL;
9906  LIM leo_predecessor = NULL;
9907  switch (Source_Type_of_EIM(work_earley_item))
9908    {
9909    case SOURCE_IS_LEO:
9910      leo_predecessor = Predecessor_of_EIM (work_earley_item);
9911      cause_earley_item = Cause_of_EIM (work_earley_item);
9912      break;
9913    case SOURCE_IS_AMBIGUOUS:
9914      source_link = First_Leo_SRCL_of_EIM (work_earley_item);
9915      if (source_link)
9916	{
9917	  leo_predecessor = Predecessor_of_SRCL (source_link);
9918	  cause_earley_item = Cause_of_SRCL (source_link);
9919	  source_link = Next_SRCL_of_SRCL (source_link);
9920	}
9921      break;
9922    }
9923    if (leo_predecessor) {
9924	for (;;) { /* for each Leo source link */
9925	    @<Add draft and-nodes for chain starting with |leo_predecessor|@>@;
9926	    if (!source_link) break;
9927	    leo_predecessor = Predecessor_of_SRCL (source_link);
9928	    cause_earley_item = Cause_of_SRCL (source_link);
9929	    source_link = Next_SRCL_of_SRCL (source_link);
9930	}
9931    }
9932}
9933
9934@ Note that in a trivial path the bottom is also the top.
9935@<Add draft and-nodes for chain starting with |leo_predecessor|@> =
9936{
9937    /* The rule for the Leo path Earley item */
9938    RULE path_rule = NULL;
9939    /* The rule for the previous Leo path Earley item */
9940    RULE previous_path_rule;
9941    LIM path_leo_item = leo_predecessor;
9942    LIM higher_path_leo_item = Predecessor_LIM_of_LIM(path_leo_item);
    /* Whether some section of a non-trivial path
       is left unprocessed. */
9945    OR dand_predecessor;
9946    OR path_or_node;
9947    EIM base_earley_item;
9948    AEX base_aex = lim_base_data_get(path_leo_item, &base_earley_item);
9949    Set_OR_from_EIM_and_AEX(dand_predecessor, base_earley_item, base_aex);
9950    @<Set |path_or_node|@>@;
9951    @<Add draft and-nodes to the bottom or-node@>@;
9952    previous_path_rule = path_rule;
9953    while (higher_path_leo_item) {
9954	path_leo_item = higher_path_leo_item;
9955	higher_path_leo_item = Predecessor_LIM_of_LIM(path_leo_item);
9956	base_aex = lim_base_data_get(path_leo_item, &base_earley_item);
9957	Set_OR_from_EIM_and_AEX(dand_predecessor, base_earley_item, base_aex);
9958	@<Set |path_or_node|@>@;
9959	@<Add the draft and-nodes to an upper Leo path or-node@>@;
9960	previous_path_rule = path_rule;
9961    }
9962}
9963
9964@ @<Set |path_or_node|@> =
9965{
9966  if (higher_path_leo_item) {
9967      @<Use Leo base data to set |path_or_node|@>@;
9968  } else {
9969      path_or_node = work_proper_or_node;
9970  }
9971}
9972
9973@ @d Set_OR_from_Ord_and_SYMI(or_node, origin, symbol_instance) {
9974  const PSL or_psl_at_origin = per_es_data[(origin)].t_or_psl;
9975  (or_node) = PSL_Datum (or_psl_at_origin, (symbol_instance));
9976}
9977
9978@ @<Add draft and-nodes to the bottom or-node@> =
9979{
9980  const SYMID transition_symbol_id = Postdot_SYMID_of_LIM (leo_predecessor);
9981  const TRANS cause_completion_data =
9982    TRANS_of_EIM_by_SYMID (cause_earley_item, transition_symbol_id);
9983  const gint aex_count = Completion_Count_of_TRANS (cause_completion_data);
9984  const AEX *const aexes = AEXs_of_TRANS (cause_completion_data);
9985  gint ix;
9986  for (ix = 0; ix < aex_count; ix++)
9987    {
9988      const AEX cause_aex = aexes[ix];
9989      OR dand_cause;
9990      Set_OR_from_EIM_and_AEX(dand_cause, cause_earley_item, cause_aex);
9991      draft_and_node_add (&bocage_setup_obs, path_or_node,
9992			  dand_predecessor, dand_cause);
9993    }
9994}
9995
9996@ It is assumed that there is an or-node entry for
9997|psia_eim| and |psia_aex|.
9998@d Set_OR_from_EIM_and_AEX(psia_or, psia_eim, psia_aex) {
9999  const EIM psia_earley_item = psia_eim;
10000  const gint psia_earley_set_ordinal = ES_Ord_of_EIM (psia_earley_item);
10001  OR **const psia_nodes_by_item =
10002    per_es_data[psia_earley_set_ordinal].t_aexes_by_item;
10003  const gint psia_item_ordinal = Ord_of_EIM (psia_earley_item);
10004  OR *const psia_nodes_by_aex = psia_nodes_by_item[psia_item_ordinal];
10005  psia_or = psia_nodes_by_aex ? psia_nodes_by_aex[psia_aex] : NULL;
10006}
10007
10008@ @<Use Leo base data to set |path_or_node|@> =
10009{
10010  gint symbol_instance;
10011  const gint origin_ordinal = Origin_Ord_of_EIM (base_earley_item);
10012  const AIM aim = AIM_of_EIM_by_AEX (base_earley_item, base_aex);
10013  path_rule = RULE_of_AIM (aim);
10014  symbol_instance = Last_Proper_SYMI_of_RULE (path_rule);
10015  Set_OR_from_Ord_and_SYMI (path_or_node, origin_ordinal, symbol_instance);
10016}
10017
10018@ @<Add the draft and-nodes to an upper Leo path or-node@> =
10019{
10020  OR dand_cause;
10021  const SYMI symbol_instance = SYMI_of_Completed_RULE(previous_path_rule);
10022  const gint origin_ordinal = Ord_of_ES(ES_of_LIM(path_leo_item));
10023  Set_OR_from_Ord_and_SYMI(dand_cause, origin_ordinal, symbol_instance);
10024  draft_and_node_add (&bocage_setup_obs, path_or_node,
10025	  dand_predecessor, dand_cause);
10026}
10027
10028@ @<Create draft and-nodes for token sources@> =
10029{
10030  SRCL source_link = NULL;
10031  EIM predecessor_earley_item = NULL;
10032  TOK token = NULL;
10033  switch (work_source_type)
10034    {
10035    case SOURCE_IS_TOKEN:
10036      predecessor_earley_item = Predecessor_of_EIM (work_earley_item);
10037      token = TOK_of_EIM(work_earley_item);
10038      break;
10039    case SOURCE_IS_AMBIGUOUS:
10040      source_link = First_Token_Link_of_EIM (work_earley_item);
10041      if (source_link)
10042	{
10043	  predecessor_earley_item = Predecessor_of_SRCL (source_link);
10044	  token = TOK_of_SRCL(source_link);
10045	  source_link = Next_SRCL_of_SRCL (source_link);
10046	}
10047    }
10048    while (token)
10049      {
10050	@<Add draft and-node for token source@>@;
10051	if (!source_link) break;
10052	predecessor_earley_item = Predecessor_of_SRCL (source_link);
10053        token = TOK_of_SRCL(source_link);
10054	source_link = Next_SRCL_of_SRCL (source_link);
10055      }
10056}
10057
10058@ @<Add draft and-node for token source@> =
10059{
10060  OR dand_predecessor;
10061  @<Set |dand_predecessor|@>@;
10062  draft_and_node_add (&bocage_setup_obs, work_proper_or_node,
10063	  dand_predecessor, (OR)token);
10064}
10065
10066@ @<Set |dand_predecessor|@> =
10067{
10068   if (Position_of_AIM(work_predecessor_aim) < 1) {
10069       dand_predecessor = NULL;
10070   } else {
10071	const AEX predecessor_aex =
10072	    AEX_of_EIM_by_AIM (predecessor_earley_item, work_predecessor_aim);
10073      Set_OR_from_EIM_and_AEX(dand_predecessor, predecessor_earley_item, predecessor_aex);
10074   }
10075}
10076
10077@ @<Create draft and-nodes for completion sources@> =
10078{
10079  SRCL source_link = NULL;
10080  EIM predecessor_earley_item = NULL;
10081  EIM cause_earley_item = NULL;
10082  const SYMID transition_symbol_id = Postdot_SYMID_of_AIM(work_predecessor_aim);
10083  switch (work_source_type)
10084    {
10085    case SOURCE_IS_COMPLETION:
10086      predecessor_earley_item = Predecessor_of_EIM (work_earley_item);
10087      cause_earley_item = Cause_of_EIM (work_earley_item);
10088      break;
10089    case SOURCE_IS_AMBIGUOUS:
10090      source_link = First_Completion_Link_of_EIM (work_earley_item);
10091      if (source_link)
10092	{
10093	  predecessor_earley_item = Predecessor_of_SRCL (source_link);
10094	  cause_earley_item = Cause_of_SRCL (source_link);
10095	  source_link = Next_SRCL_of_SRCL (source_link);
10096	}
10097	break;
10098    }
10099  while (cause_earley_item)
10100    {
10101      const TRANS cause_completion_data =
10102	TRANS_of_EIM_by_SYMID (cause_earley_item, transition_symbol_id);
10103      const gint aex_count = Completion_Count_of_TRANS (cause_completion_data);
10104      const AEX * const aexes = AEXs_of_TRANS (cause_completion_data);
10105      gint ix;
10106      for (ix = 0; ix < aex_count; ix++) {
10107	  const AEX cause_aex = aexes[ix];
10108	    @<Add draft and-node for completion source@>@;
10109      }
10110      if (!source_link) break;
10111      predecessor_earley_item = Predecessor_of_SRCL (source_link);
10112      cause_earley_item = Cause_of_SRCL (source_link);
10113      source_link = Next_SRCL_of_SRCL (source_link);
10114    }
10115}
10116
10117@ @<Add draft and-node for completion source@> =
10118{
10119  OR dand_predecessor;
10120  OR dand_cause;
10121  const gint middle_ordinal = Origin_Ord_of_EIM(cause_earley_item);
10122  const AIM cause_ahfa_item = AIM_of_EIM_by_AEX(cause_earley_item, cause_aex);
10123  const SYMI cause_symbol_instance =
10124      SYMI_of_Completed_RULE(RULE_of_AIM(cause_ahfa_item));
10125  @<Set |dand_predecessor|@>@;
10126  Set_OR_from_Ord_and_SYMI(dand_cause, middle_ordinal, cause_symbol_instance);
10127  draft_and_node_add (&bocage_setup_obs, work_proper_or_node,
10128	  dand_predecessor, dand_cause);
10129}
10130
10131@ @<Mark duplicate draft and-nodes@> =
10132{
10133  OR * const or_nodes_of_b = ORs_of_B (b);
10134  const gint or_node_count_of_b = OR_Count_of_B(b);
10135  PSAR_Object and_per_es_arena;
10136  const PSAR and_psar = &and_per_es_arena;
10137  gint or_node_id = 0;
10138  psar_init (and_psar, rule_count_of_g+symbol_count_of_g);
10139  while (or_node_id < or_node_count_of_b) {
10140      const OR work_or_node = or_nodes_of_b[or_node_id];
10141    @<Mark the duplicate draft and-nodes for |work_or_node|@>@;
10142    or_node_id++;
10143  }
10144  psar_destroy (and_psar);
10145}
10146
@ I think the and-node PSL's and or-node PSL's are not actually used at the
same time, so the same field might be used for both.
More significantly, a simple $O(n^2)$ sort of the
draft and-nodes would spot duplicates more efficiently in 99\%
of cases, although it would not be $O(n)$ as the PSL's are.
The best of both worlds could be had by using the sort when
there are fewer than, say, 7 and-nodes, and the PSL's otherwise.
10154@ The use of PSL's is slightly different here.
The PSL is not needed to find the draft and-nodes -- it is
essentially just a boolean to indicate whether one exists.
But ``stale'' booleans must still be detected.
The solution adopted is to put the parent or-node
10159into the PSL.
10160If the PSL contains the current parent or-node,
10161the draft and-node is a duplicate within that or-node.
10162Otherwise, it's the first such draft and-node.
10163@<Mark the duplicate draft and-nodes for |work_or_node|@> =
10164{
10165  DAND dand = DANDs_of_OR (work_or_node);
10166  DAND next_dand = Next_DAND_of_DAND (dand);
10167  ORID work_or_node_id = ID_of_OR(work_or_node);
10168  /* Only if there is more than one draft and-node */
10169  if (next_dand)
10170    {
10171      gint origin_ordinal = Origin_Ord_of_OR (work_or_node);
10172      psar_dealloc(and_psar);
10173      while (dand)
10174	{
10175	  OR psl_or_node;
10176	  OR predecessor = Predecessor_OR_of_DAND (dand);
10177	  WHEID wheid = WHEID_of_OR(Cause_OR_of_DAND(dand));
10178	  const gint middle_ordinal =
10179	    predecessor ? ES_Ord_of_OR (predecessor) : origin_ordinal;
10180	  PSL and_psl;
10181	  PSL *psl_owner = &per_es_data[middle_ordinal].t_and_psl;
10182	  /* The or-node used as a boolean in the PSL */
10183	  if (!*psl_owner) psl_claim (psl_owner, and_psar);
10184	  and_psl = *psl_owner;
10185	  psl_or_node = PSL_Datum(and_psl, wheid);
10186	  if (psl_or_node && ID_of_OR(psl_or_node) == work_or_node_id)
10187	  {
10188	      /* Mark this draft and-node as a duplicate */
10189	      Cause_OR_of_DAND(dand) = NULL;
10190	  } else {
10191	      /* Increment the count of unique draft and-nodes */
10192	      PSL_Datum(and_psl, wheid) = work_or_node;
10193	      unique_draft_and_node_count++;
10194	  }
10195	  dand = Next_DAND_of_DAND (dand);
10196	}
10197    } else {
10198	  unique_draft_and_node_count++;
10199    }
10200}
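
@ The stale-boolean technique described above can be sketched in isolation.
This is an illustration only, with hypothetical names: a table indexed by
key stores the ID of the last owner that saw the key, so the table never
needs clearing between owners. Owner IDs are assumed to start at 1, so that
0 can mean that the key was never seen.

```c
#include <assert.h>
#include <stddef.h>

#define KEY_COUNT 16 /* hypothetical size of the key space */

struct stamp_table {
    int last_owner[KEY_COUNT]; /* 0 means the key was never seen */
};

/* Returns 1 if key was already seen for owner_id, and 0 if this is
   the first sighting.  A stamp left by another owner counts as stale
   and is overwritten. */
static int seen_before(struct stamp_table *table, int owner_id, int key)
{
    assert(owner_id > 0);
    assert(key >= 0);
    assert(key < KEY_COUNT);
    if (table->last_owner[key] == owner_id)
        return 1;
    table->last_owner[key] = owner_id;
    return 0;
}

/* Count the distinct keys in keys[0..count-1] for one owner. */
static int count_unique(struct stamp_table *table, int owner_id,
                        const int *keys, size_t count)
{
    size_t i;
    int unique = 0;
    for (i = 0; i < count; i++) {
        if (!seen_before(table, owner_id, keys[i]))
            unique++;
    }
    return unique;
}
```

In the section above the parent or-node plays the role of the owner and the
whole element ID plays the role of the key.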
10201
10202@** And-Node (AND) Code.
The and-nodes are part of the parse bocage.
10204They are analogous to the and-nodes of a standard parse forest,
10205except that they are binary -- restricted to two children.
10206This means that the parse bocage stores the parse in a kind
10207of Chomsky Normal Form.
10208As another difference between it and a parse forest,
10209the parse bocage can contain cycles.
10210
10211@<Public typedefs@> =
10212typedef gint Marpa_And_Node_ID;
10213@ @<Private typedefs@> =
10214typedef Marpa_And_Node_ID ANDID;
10215
10216@ @<Private incomplete structures@> =
10217struct s_and_node;
10218typedef struct s_and_node* AND;
10219@
10220@d OR_of_AND(and) ((and)->t_current)
10221@d Predecessor_OR_of_AND(and) ((and)->t_predecessor)
10222@d Cause_OR_of_AND(and) ((and)->t_cause)
10223@<Private structures@> =
10224struct s_and_node {
10225    OR t_current;
10226    OR t_predecessor;
10227    OR t_cause;
10228};
10229typedef struct s_and_node AND_Object;
10230
10231@ @<Create the final and-nodes for all earley sets@> =
10232{
10233  gint unique_draft_and_node_count = 0;
10234  @<Mark duplicate draft and-nodes@>@;
10235  @<Create the final and-node array@>@;
10236}
10237
10238@ @<Create the final and-node array@> =
10239{
10240  const gint or_count_of_b = OR_Count_of_B (b);
10241  gint or_node_id;
10242  gint and_node_id = 0;
10243  const OR *ors_of_b = ORs_of_B (b);
10244  const AND ands_of_b = ANDs_of_B (b) =
10245    g_new (AND_Object, unique_draft_and_node_count);
10246  for (or_node_id = 0; or_node_id < or_count_of_b; or_node_id++)
10247    {
10248      gint and_count_of_parent_or = 0;
10249      const OR or_node = ors_of_b[or_node_id];
10250      DAND dand = DANDs_of_OR (or_node);
10251	First_ANDID_of_OR(or_node) = and_node_id;
10252      while (dand)
10253	{
10254	  const OR cause_or_node = Cause_OR_of_DAND (dand);
10255	  if (cause_or_node)
	    { /* Duplicate draft and-nodes
	    were marked by nulling the cause or-node */
10258	      const AND and_node = ands_of_b + and_node_id;
10259	      OR_of_AND (and_node) = or_node;
10260	      Predecessor_OR_of_AND (and_node) =
10261		Predecessor_OR_of_DAND (dand);
10262	      Cause_OR_of_AND (and_node) = cause_or_node;
10263	      and_node_id++;
10264	      and_count_of_parent_or++;
10265	    }
10266	    dand = Next_DAND_of_DAND(dand);
10267	}
10268	AND_Count_of_OR(or_node) = and_count_of_parent_or;
10269    }
10270    AND_Count_of_B (b) = and_node_id;
10271    MARPA_ASSERT(and_node_id == unique_draft_and_node_count);
10272}
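
@ The copying step above amounts to flattening per-parent linked lists into
one contiguous array, while recording for each parent the index of its first
entry and its entry count. The following is a minimal sketch with
hypothetical structure names, in which a zero |cause| stands in for the
nulled cause or-node that marks a duplicate.

```c
#include <assert.h>
#include <stddef.h>

struct draft_sketch {
    struct draft_sketch *next;
    int cause; /* 0 marks a duplicate to be skipped */
};

struct parent_sketch {
    struct draft_sketch *drafts;
    int first_index; /* index of this parent's first final entry */
    int count;       /* number of final entries for this parent */
};

struct final_sketch {
    int parent_id;
    int cause;
};

/* Copy every unmarked draft entry into out[], parent by parent.
   out[] must have room for all unmarked entries.
   Returns the number of entries written. */
static int flatten_drafts(struct parent_sketch *parents, int parent_count,
                          struct final_sketch *out)
{
    int parent_id;
    int out_ix = 0;
    assert(parent_count >= 0);
    for (parent_id = 0; parent_id < parent_count; parent_id++) {
        struct parent_sketch *parent = parents + parent_id;
        const struct draft_sketch *draft;
        parent->first_index = out_ix;
        parent->count = 0;
        for (draft = parent->drafts; draft; draft = draft->next) {
            if (!draft->cause)
                continue; /* skip marked duplicates */
            out[out_ix].parent_id = parent_id;
            out[out_ix].cause = draft->cause;
            out_ix++;
            parent->count++;
        }
    }
    return out_ix;
}
```

Each parent's entries end up adjacent in the output, so a first index and
a count are enough to find them again.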
10273
10274@*0 Trace Functions.
10275
10276@ @<Private function prototypes@> =
10277gint marpa_and_node_count(struct marpa_r *r);
10278@ @<Function definitions@> =
10279gint marpa_and_node_count(struct marpa_r *r)
10280{
10281  BOC b = B_of_R(r);
10282  @<Return |-2| on failure@>@;
10283  @<Fail if recognizer has fatal error@>@;
10284  if (!b) {
10285      R_ERROR("no bocage");
10286      return failure_indicator;
10287  }
10288  return AND_Count_of_B(b);
10289}
10290
10291@ @<Check |r| and |and_node_id|; set |and_node|@> = {
10292  BOC b = B_of_R(r);
10293  AND and_nodes;
10294  @<Fail if recognizer has fatal error@>@;
10295  if (!b) {
10296      R_ERROR("no bocage");
10297      return failure_indicator;
10298  }
10299  and_nodes = ANDs_of_B(b);
10300  if (!and_nodes) {
10301      R_ERROR("no and nodes");
10302      return failure_indicator;
10303  }
10304  if (and_node_id < 0) {
10305      R_ERROR("bad and node id");
10306      return failure_indicator;
10307  }
10308  if (and_node_id >= AND_Count_of_B(b)) {
10309      return -1;
10310  }
10311  and_node = and_nodes + and_node_id;
10312}
10313
10314@ @<Private function prototypes@> =
10315gint marpa_and_node_parent(struct marpa_r *r, int and_node_id);
10316@ @<Function definitions@> =
10317gint marpa_and_node_parent(struct marpa_r *r, int and_node_id)
10318{
10319  AND and_node;
10320  @<Return |-2| on failure@>@;
10321    @<Check |r| and |and_node_id|; set |and_node|@>@;
10322  return ID_of_OR (OR_of_AND (and_node));
10323}
10324
10325@ @<Private function prototypes@> =
10326gint marpa_and_node_predecessor(struct marpa_r *r, int and_node_id);
10327@ @<Function definitions@> =
10328gint marpa_and_node_predecessor(struct marpa_r *r, int and_node_id)
10329{
10330  AND and_node;
10331  @<Return |-2| on failure@>@;
10332    @<Check |r| and |and_node_id|; set |and_node|@>@;
10333    {
10334      const OR predecessor_or = Predecessor_OR_of_AND (and_node);
10335      const ORID predecessor_or_id =
10336	predecessor_or ? ID_of_OR (predecessor_or) : -1;
10337      return predecessor_or_id;
10338      }
10339}
10340
10341@ @<Private function prototypes@> =
10342gint marpa_and_node_cause(struct marpa_r *r, int and_node_id);
10343@ @<Function definitions@> =
10344gint marpa_and_node_cause(struct marpa_r *r, int and_node_id)
10345{
10346  AND and_node;
10347  @<Return |-2| on failure@>@;
10348    @<Check |r| and |and_node_id|; set |and_node|@>@;
10349    {
10350      const OR cause_or = Cause_OR_of_AND (and_node);
10351      const ORID cause_or_id =
10352	OR_is_Token(cause_or) ? -1 : ID_of_OR (cause_or);
10353      return cause_or_id;
10354    }
10355}
10356
10357@ @<Private function prototypes@> =
10358gint marpa_and_node_symbol(struct marpa_r *r, int and_node_id);
10359@ @<Function definitions@> =
10360gint marpa_and_node_symbol(struct marpa_r *r, int and_node_id)
10361{
10362  AND and_node;
10363  @<Return |-2| on failure@>@;
10364    @<Check |r| and |and_node_id|; set |and_node|@>@;
10365    {
10366      const OR cause_or = Cause_OR_of_AND (and_node);
10367      const SYMID symbol_id =
10368	OR_is_Token(cause_or) ? SYMID_of_OR(cause_or) : -1;
10369      return symbol_id;
10370    }
10371}
10372
10373@ Returns the data for the token of the and-node.
10374The symbol id is the return value,
10375and the token value is placed
10376in the location pointed
10377to by |value_p|, if that is non-null.
10378If |and_node_id| is not the ID of an and-node
10379whose cause is a token,
10380returns -1,
10381without changing |*value_p|.
10382On hard failure, returns -2 without changing
10383|*value_p|.
10384\par
10385There is no function to simply return the token value --
10386because of the need to indicate errors, it is just as
easy to return the symbol ID as well.
10389@<Public function prototypes@> =
10390Marpa_Symbol_ID marpa_and_node_token(struct marpa_r *r,
10391    Marpa_And_Node_ID and_node_id, gpointer* value_p);
10392@ @<Function definitions@> =
10393Marpa_Symbol_ID marpa_and_node_token(struct marpa_r *r,
10394    Marpa_And_Node_ID and_node_id, gpointer* value_p)
10395{
10396  AND and_node;
10397  @<Return |-2| on failure@>@;
10398    @<Check |r| and |and_node_id|; set |and_node|@>@;
10399    return and_node_token(and_node, value_p);
10400}
10401@ @<Private function prototypes@> =
10402SYMID and_node_token(AND and_node, gpointer* value_p);
10403@ @<Function definitions@> =
10404SYMID and_node_token(AND and_node, gpointer* value_p)
10405{
10406  const OR cause_or = Cause_OR_of_AND (and_node);
10407  if (OR_is_Token (cause_or))
10408    {
10409      const TOK token = TOK_of_OR (cause_or);
10410      if (value_p)
10411	*value_p = Value_of_TOK (token);
10412      return SYMID_of_TOK (token);
10413    }
10414    return -1;
10415}
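
@ A caller has to distinguish three outcomes from a function that follows
the convention described for |marpa_and_node_token|. The sketch below uses
a stub in place of the real function so that it can stand alone; the stub,
its token tables, and the |describe_token| helper are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOKEN_COUNT 3

/* Hypothetical data: entry i holds the symbol ID for and-node i,
   or -1 if the cause of that and-node is not a token. */
static const int stub_symbol_ids[TOKEN_COUNT] = { 4, -1, 9 };
static int stub_values[TOKEN_COUNT] = { 100, 0, 300 };

/* Stub with the same return convention: the symbol ID on success,
   -1 if there is no token, and -2 on hard failure.  *value_p is
   written only on success, and only if value_p is non-null. */
static int stub_and_node_token(int and_node_id, void **value_p)
{
    if (and_node_id < 0)
        return -2;
    if (and_node_id >= TOKEN_COUNT)
        return -1;
    if (stub_symbol_ids[and_node_id] < 0)
        return -1;
    if (value_p)
        *value_p = &stub_values[and_node_id];
    return stub_symbol_ids[and_node_id];
}

enum token_outcome { TOKEN_FOUND, TOKEN_ABSENT, TOKEN_ERROR };

/* Classify the result; *symbol_id is set only when a token is found. */
static enum token_outcome describe_token(int and_node_id, int *symbol_id,
                                         void **value_p)
{
    const int result = stub_and_node_token(and_node_id, value_p);
    assert(symbol_id != NULL);
    if (result == -2)
        return TOKEN_ERROR;
    if (result == -1)
        return TOKEN_ABSENT;
    *symbol_id = result;
    return TOKEN_FOUND;
}
```

Checking for the two negative results before using the return value keeps
the caller from mistaking a failure code for a symbol ID.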
10416
10417@** Parse Bocage Code (BOC).
10418@ Pre-initialization is making the elements safe for the deallocation logic
10419to be called.  Often it is setting the value to zero, so that the deallocation
logic knows when {\bf not} to try deallocating a not-yet-initialized value.
10421@<Private incomplete structures@> =
10422struct s_bocage;
10423typedef struct s_bocage* BOC;
10424@ @<Bocage structure@> =
10425struct s_bocage {
10426    @<Widely aligned bocage elements@>@;
10427    @<Int aligned bocage elements@>@;
10428    @<Bit aligned bocage elements@>@;
10429};
10430typedef struct s_bocage BOC_Object;
10431@ @d B_of_R(r) ((r)->t_bocage)
10432@<Widely aligned recognizer elements@> =
10433BOC t_bocage;
10434@ @<Initialize recognizer elements@> =
10435B_of_R(r) = NULL;
10436
10437@*0 The Bocage Obstack.
10438An obstack with the lifetime of the bocage.
10439@d OBS_of_B(b) ((b)->t_obs)
10440@<Widely aligned bocage elements@> =
10441struct obstack t_obs;
10442@ @<Bit aligned bocage elements@> =
10443unsigned int is_obstack_initialized:1;
10444@ @<Initialize bocage elements@> =
10445b->is_obstack_initialized = 1;
10446obstack_init(&OBS_of_B(b));
10447@ @<Destroy bocage elements, final phase@> =
10448if (b->is_obstack_initialized) {
10449    obstack_free(&OBS_of_B(b), NULL);
10450    b->is_obstack_initialized = 0;
10451}
10452
10453@*0 Bocage Construction.
10454@ This function returns 0 for a null parse,
10455and the ID of the start or-node for a non-null parse.
10456If there is no parse, -1 is returned.
10457On other failures, -2 is returned.
10458Note that, even though 0 is a valid or-node ID,
10459this does not conflict with returning 0 for a null parse.
10460Or-node 0 must be in the first Earley set,
10461and any parse whose top or-node is in the first
Earley set must be a null parse.
10465@<Public function prototypes@> =
10466gint marpa_bocage_new(struct marpa_r* r, Marpa_Rule_ID rule_id, Marpa_Earley_Set_ID ordinal);
10467@ @<Function definitions@> =
10468gint marpa_bocage_new(struct marpa_r* r, Marpa_Rule_ID rule_id, Marpa_Earley_Set_ID ordinal) {
10469    @<Return |-2| on failure@>@;
10470    ORID top_or_node_id = failure_indicator;
10471    const gint no_parse = -1;
10472    @<Declare bocage locals@>@;
10473    r_update_earley_sets(r);
10474    @<Return if function guards fail;
10475	set |end_of_parse_es| and |completed_start_rule|@>@;
10476    b = B_of_R(r) = g_slice_new(BOC_Object);
10477MARPA_DEBUG3("%s new bocage B_of_R=%p", G_STRLOC, B_of_R(r));
10478    @<Initialize bocage elements@>@;
10479    @<Deal with null parse as a special case@>@;
10480    @<Find |start_eim|, |start_aim| and |start_aex|@>@;
10481    if (!start_eim) goto SOFT_ERROR;
10482    Phase_of_R(r) = evaluation_phase;
10483    obstack_init(&bocage_setup_obs);
10484    @<Allocate bocage setup working data@>@;
10485    @<Populate the PSIA data@>@;
10486    @<Create the or-nodes for all earley sets@>@;
10487    @<Create the final and-nodes for all earley sets@>@;
10488    @<Set |top_or_node_id|@>@;
10489    obstack_free(&bocage_setup_obs, NULL);
10490    Top_ORID_of_B(b) = top_or_node_id;
10491    return top_or_node_id;
10492    SOFT_ERROR: ;
10493    @<Destroy bocage elements, all phases@>;
10494    return no_parse;
10495}
10496
10497@ @<Declare bocage locals@> =
10498const GRAMMAR_Const g = G_of_R(r);
10499const gint rule_count_of_g = RULE_Count_of_G(g);
10500const gint symbol_count_of_g = SYM_Count_of_G(g);
10501BOC b;
10502ES end_of_parse_es;
10503RULE completed_start_rule;
10504EIM start_eim = NULL;
10505AIM start_aim = NULL;
10506AEX start_aex = -1;
10507struct obstack bocage_setup_obs;
10508gint total_earley_items_in_parse;
10509gint or_node_estimate = 0;
10510const gint earley_set_count_of_r = ES_Count_of_R (r);
10511
10512@ @<Private incomplete structures@> =
10513struct s_bocage_setup_per_es;
10514@ @<Private structures@> =
10515struct s_bocage_setup_per_es {
10516     OR ** t_aexes_by_item;
10517     PSL t_or_psl;
10518     PSL t_and_psl;
10519};
10520@ @<Declare bocage locals@> =
10521struct s_bocage_setup_per_es* per_es_data = NULL;
10522
10523@ @<Return if function guards fail;
10524set |end_of_parse_es| and |completed_start_rule|@> =
10525{
10526    EARLEME end_of_parse_earleme;
10527    @<Fail if recognizer has fatal error@>@;
10528    if (B_of_R(r)) {
10529	R_ERROR ("bocage in use");
10530	return failure_indicator;
10531    }
10532    switch (Phase_of_R (r))
10533      {
10534      default:
10535	R_ERROR ("recce not evaluation-ready");
10536	return failure_indicator;
10537      case input_phase:
10538      case evaluation_phase:
10539	break;
10540      }
10541
10542MARPA_OFF_DEBUG2("ordinal=%d", ordinal);
10543    if (ordinal == -1)
10544      {
10545	end_of_parse_es = Current_ES_of_R (r);
10546      }
10547    else
10548      {				// ordinal != -1
10549	if (!ES_Ord_is_Valid (r, ordinal))
10550	  {
10551	    R_ERROR ("invalid es ordinal");
10552	    return failure_indicator;
10553	  }
10554	end_of_parse_es = ES_of_R_by_Ord (r, ordinal);
10555      }
10556
10557    if (!end_of_parse_es)
10558      return no_parse;
10559    ordinal = Ord_of_ES(end_of_parse_es);
10560    end_of_parse_earleme = Earleme_of_ES (end_of_parse_es);
10561    if (rule_id == -1) {
10562	completed_start_rule =
10563	  end_of_parse_earleme ? g->t_proper_start_rule : g->t_null_start_rule;
10564	if (!completed_start_rule)
10565	  return no_parse;
10566    } else {
10567      if (!RULEID_of_G_is_Valid (g, rule_id))
10568	{
10569	  R_ERROR ("invalid rule id");
10570	  return failure_indicator;
10571	}
10572      completed_start_rule = RULE_by_ID (g, rule_id);
10573    }
10574MARPA_OFF_DEBUG2("ordinal=%d", ordinal);
10575}
10576
10577@ @<Deal with null parse as a special case@> =
10578{
10579    if (ordinal == 0) {  // If this is a null parse
10580	gint rule_length = Length_of_RULE(completed_start_rule);
10581	OR* or_nodes = ORs_of_B (b) = g_new (OR, 1);
10582        AND and_nodes = ANDs_of_B (b) = g_new (AND_Object, 1);
10583	OR or_node = or_nodes[0] = (OR)obstack_alloc (&OBS_of_B(b), sizeof(OR_Object));
10584	ORID null_or_node_id = 0;
10585	Top_ORID_of_B(b) = null_or_node_id;
10586
10587	OR_Count_of_B(b) = 1;
10588	AND_Count_of_B(b) = 1;
10589
10590	RULE_of_OR(or_node) = completed_start_rule;
10591	Position_of_OR(or_node) = rule_length;
10592	Origin_Ord_of_OR(or_node) = 0;
10593	ID_of_OR(or_node) = null_or_node_id;
10594	ES_Ord_of_OR(or_node) = 0;
10595	First_ANDID_of_OR(or_node) = 0;
10596	AND_Count_of_OR(or_node) = 1;
10597
10598	OR_of_AND(and_nodes) = or_node;
10599	Predecessor_OR_of_AND(and_nodes) = NULL;
10600	Cause_OR_of_AND (and_nodes) =
10601	  (OR)TOK_by_ID_of_R (r, RHS_ID_of_RULE (completed_start_rule, rule_length - 1));
10602
10603	return null_or_node_id;
10604    }
10605}
10606
10607@
10608@<Allocate bocage setup working data@>=
10609{
10610  guint ix;
10611  guint earley_set_count = ES_Count_of_R (r);
10612  total_earley_items_in_parse = 0;
10613  per_es_data =
10614    obstack_alloc (&bocage_setup_obs,
10615		   sizeof (struct s_bocage_setup_per_es) * earley_set_count);
10616  for (ix = 0; ix < earley_set_count; ix++)
10617    {
10618      const ES_Const earley_set = ES_of_R_by_Ord (r, ix);
10619      const guint item_count = EIM_Count_of_ES (earley_set);
10620      total_earley_items_in_parse += item_count;
10621	{
10622	  struct s_bocage_setup_per_es *per_es = per_es_data + ix;
10623	  OR ** const per_eim_eixes = per_es->t_aexes_by_item =
10624	    obstack_alloc (&bocage_setup_obs, sizeof (OR *) * item_count);
10625	  guint item_ordinal;
10626	  per_es->t_or_psl = NULL;
10627	  per_es->t_and_psl = NULL;
10628	  for (item_ordinal = 0; item_ordinal < item_count; item_ordinal++)
10629	    {
10630	      per_eim_eixes[item_ordinal] = NULL;
10631	    }
10632	}
10633    }
10634}
10635
10636@ Predicted AHFA states can be skipped since they
10637contain no completions.
10638Note that AHFA state 0 is not marked as a predicted AHFA state,
10639even though it can contain a predicted AHFA item.
10640@ A linear search of the AHFA items is used.
10641As shown elsewhere in this document,
10642discovered AHFA states for practical grammars tend to be
10643very small---%
10644less than two AHFA items.
10645Size of the AHFA state is a function of the grammar, so
10646any reasonable search is $O(1)$ in terms of the length of
10647the input.
10648@ The search for the start Earley item is done once
10649per parse---%
10650$O(s)$, where $s$ is the size of the end of parse Earley set.
10651This makes it very hard to justify any precomputations to
10652help the search, because if they have to be done once per
Earley set, that is an $O(\wsize \cdot s')$ overhead,
where $\wsize$ is the length of the input, and where
$s'$ is the average size of an Earley set.
It is hard to believe that, for practical grammars,
$O(\wsize \cdot s') \le O(s)$, which
10658is what it would take for any per-Earley set overhead
10659to make sense.
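\par
As a rough illustration with made-up numbers (they are assumptions,
not measurements), suppose $\wsize = 1000$ and every Earley set holds
about ten items, so that $s = s' = 10$.
Then
$$ \wsize \cdot s' = 1000 \cdot 10 = 10000
   \qquad\hbox{versus}\qquad s = 10, $$
so a per-Earley-set precomputation would touch about a thousand times
as many items as the single linear search it is meant to speed up.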
10660@<Find |start_eim|, |start_aim| and |start_aex|@> =
10661{
10662    gint eim_ix;
10663    EIM* const earley_items = EIMs_of_ES(end_of_parse_es);
10664    const RULEID sought_rule_id = ID_of_RULE(completed_start_rule);
10665    const gint earley_item_count = EIM_Count_of_ES(end_of_parse_es);
10666    for (eim_ix = 0; eim_ix < earley_item_count; eim_ix++) {
10667        const EIM earley_item = earley_items[eim_ix];
10668	const AHFA ahfa_state = AHFA_of_EIM(earley_item);
10669	if (Origin_Earleme_of_EIM(earley_item) > 0) continue; // Not a start EIM
10670	if (!AHFA_is_Predicted(ahfa_state)) {
10671	    gint aex;
10672	    AIM* const ahfa_items = AIMs_of_AHFA(ahfa_state);
10673	    const gint ahfa_item_count = AIM_Count_of_AHFA(ahfa_state);
10674	    for (aex = 0; aex < ahfa_item_count; aex++) {
10675		 const AIM ahfa_item = ahfa_items[aex];
10676	         if (RULEID_of_AIM(ahfa_item) == sought_rule_id) {
10677		      start_aim = ahfa_item;
10678		      start_eim = earley_item;
10679		      start_aex = aex;
10680		      break;
10681		 }
10682	    }
10683	}
10684	if (start_eim) break;
10685    }
10686}
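@ The nested linear search above can be sketched in isolation.
This is a minimal sketch, not part of \libmarpa/:
the |sketch_aim|, |sketch_ahfa| and |sketch_eim| structures
and the function |sketch_find_start| are hypothetical,
simplified stand-ins for the real AHFA item, AHFA state and Earley item types.
It shows only the control flow:
skip items that do not start at earleme 0,
skip predicted states,
and stop at the first AHFA item for the sought rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Marpa's AIM, AHFA and EIM types. */
struct sketch_aim { int rule_id; };
struct sketch_ahfa {
    int is_predicted;                 /* predicted states hold no completions */
    int item_count;
    const struct sketch_aim *items;
};
struct sketch_eim {
    int origin_earleme;               /* 0 means the item starts at the start of input */
    const struct sketch_ahfa *state;
};

/* Scan one Earley set for an item that originates at earleme 0 and whose
   non-predicted AHFA state contains an AHFA item for sought_rule_id.
   Returns the index of the Earley item, or -1 if there is none.
   On success, *aex_out receives the index of the AHFA item in its state. */
static int sketch_find_start(const struct sketch_eim *eims, int eim_count,
                             int sought_rule_id, int *aex_out)
{
    int eim_ix;
    assert(aex_out != NULL);
    for (eim_ix = 0; eim_ix < eim_count; eim_ix++) {
        const struct sketch_ahfa *state = eims[eim_ix].state;
        int aex;
        if (eims[eim_ix].origin_earleme > 0) continue;  /* not a start item */
        if (state->is_predicted) continue;              /* no completions here */
        for (aex = 0; aex < state->item_count; aex++) {
            if (state->items[aex].rule_id == sought_rule_id) {
                *aex_out = aex;
                return eim_ix;
            }
        }
    }
    return -1;
}
```

Because the inner loop runs over the AHFA items of a single state,
its cost depends on the grammar and not on the input,
which is the point made in the discussion of the search.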
10687
10688@ @<Set |top_or_node_id|@> = {
10689    const ESID end_of_parse_ordinal = Ord_of_ES(end_of_parse_es);
10690    OR** const nodes_by_item = per_es_data[end_of_parse_ordinal].t_aexes_by_item;
10691    const gint start_earley_item_ordinal = Ord_of_EIM(start_eim);
10692    OR* const nodes_by_aex = nodes_by_item[start_earley_item_ordinal];
10693    const OR top_or_node = nodes_by_aex[start_aex];
10694    top_or_node_id = ID_of_OR(top_or_node);
10695}
10696
10697@*0 Bocage Destruction.
10698@<Destroy bocage elements, all phases@> =
10699@<Destroy bocage elements, main phase@>;
10700@<Destroy bocage elements, final phase@>;
10701
10702@ Destroy the bocage elements when I destroy the recognizer.
10703@<Destroy recognizer elements@> = bocage_destroy(r);
10704
10705@ This function is safe to call even
10706if the bocage already has been freed,
10707or was never initialized.
10708@<Public function prototypes@> =
10709gint marpa_bocage_free(struct marpa_r* r);
10710@ @<Function definitions@> =
10711gint marpa_bocage_free(struct marpa_r* r) {
10712    @<Return |-2| on failure@>@;
10713    @<Fail if recognizer has fatal error@>@;
10714    if (Phase_of_R(r) == evaluation_phase) { /* Reset phase if evaluating.
10715	    Otherwise leave phase untouched */
10716	Phase_of_R(r) = input_phase;
10717    }
10718    bocage_destroy(r);
10719    return 1;
10720}
10721
10722@ @<Private function prototypes@> =
10723static inline void bocage_destroy(struct marpa_r* r);
10724@ @<Function definitions@> =
10725static inline void bocage_destroy(struct marpa_r* r)
10726{
10727    BOC b = B_of_R(r);
10728MARPA_DEBUG3("%s B_of_R=%p", G_STRLOC, B_of_R(r));
10729    if (b) {
10730	@<Destroy bocage elements, all phases@>;
10731	g_slice_free(BOC_Object, b);
10732	B_of_R(r) = NULL;
10733    }
10734MARPA_DEBUG3("%s B_of_R=%p", G_STRLOC, B_of_R(r));
10735}
10736
10737@*0 Trace Functions.
10738
10739@ This is common logic in the or-node trace functions.
10740@<Check |r| and |or_node_id|; set |or_node|@> = {
10741  BOC b = B_of_R(r);
10742  OR* or_nodes;
10743  @<Fail if recognizer has fatal error@>@;
10744  if (!b) {
10745      R_ERROR("no bocage");
10746      return failure_indicator;
10747  }
10748  or_nodes = ORs_of_B(b);
10749  if (!or_nodes) {
10750      R_ERROR("no or nodes");
10751      return failure_indicator;
10752  }
10753  if (or_node_id < 0) {
10754      R_ERROR("bad or node id");
10755      return failure_indicator;
10756  }
10757  if (or_node_id >= OR_Count_of_B(b)) {
10758      return -1;
10759  }
10760  or_node = or_nodes[or_node_id];
10761}
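@ The return convention used by this common logic can be sketched
on its own.
The structure |sketch_bocage| and the function |sketch_or_node_check|
below are hypothetical and exist only for illustration;
the real code reports errors through |R_ERROR| and works on |OR| objects.
What the sketch keeps is the distinction between a hard failure ($-2$),
for a missing bocage or a negative ID,
and a soft failure ($-1$), for an ID that is merely past the last or-node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bocage: an array of or-node payloads. */
struct sketch_bocage {
    const int *or_nodes;   /* NULL if the or-nodes were never built */
    int or_count;
};

enum { SKETCH_SOFT_FAILURE = -1, SKETCH_HARD_FAILURE = -2 };

/* Look up or-node or_node_id.  On success, store its payload in *payload
   and return 0.  Return -2 for caller errors, and -1 when the ID is
   merely past the end of the or-node array. */
static int sketch_or_node_check(const struct sketch_bocage *b,
                                int or_node_id, int *payload)
{
    assert(payload != NULL);
    if (!b) return SKETCH_HARD_FAILURE;               /* "no bocage" */
    if (!b->or_nodes) return SKETCH_HARD_FAILURE;     /* "no or nodes" */
    if (or_node_id < 0) return SKETCH_HARD_FAILURE;   /* "bad or node id" */
    if (or_node_id >= b->or_count) return SKETCH_SOFT_FAILURE;
    *payload = b->or_nodes[or_node_id];
    return 0;
}
```

One plausible use of the soft failure is a trace loop that starts at ID 0
and keeps incrementing until it sees $-1$,
without having to ask for the or-node count first.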
10762
10763@ Return the ordinal of the current (final) Earley set of
10764the or-node.
10765@<Private function prototypes@> =
10766gint marpa_or_node_set(struct marpa_r *r, int or_node_id);
10767@ @<Function definitions@> =
10768gint marpa_or_node_set(struct marpa_r *r, int or_node_id)
10769{
10770  OR or_node;
10771  @<Return |-2| on failure@>@;
10772    @<Check |r| and |or_node_id|; set |or_node|@>@;
10773  return ES_Ord_of_OR(or_node);
10774}
10775
10776@ @<Private function prototypes@> =
10777gint marpa_or_node_origin(struct marpa_r *r, int or_node_id);
10778@ @<Function definitions@> =
10779gint marpa_or_node_origin(struct marpa_r *r, int or_node_id)
10780{
10781  OR or_node;
10782  @<Return |-2| on failure@>@;
10783    @<Check |r| and |or_node_id|; set |or_node|@>@;
10784  return Origin_Ord_of_OR(or_node);
10785}
10786
10787@ @<Private function prototypes@> =
10788gint marpa_or_node_rule(struct marpa_r *r, int or_node_id);
10789@ @<Function definitions@> =
10790gint marpa_or_node_rule(struct marpa_r *r, int or_node_id)
10791{
10792  OR or_node;
10793  @<Return |-2| on failure@>@;
10794    @<Check |r| and |or_node_id|; set |or_node|@>@;
10795  return ID_of_RULE(RULE_of_OR(or_node));
10796}
10797
10798@ @<Private function prototypes@> =
10799gint marpa_or_node_position(struct marpa_r *r, int or_node_id);
10800@ @<Function definitions@> =
10801gint marpa_or_node_position(struct marpa_r *r, int or_node_id)
10802{
10803  OR or_node;
10804  @<Return |-2| on failure@>@;
10805    @<Check |r| and |or_node_id|; set |or_node|@>@;
10806  return Position_of_OR(or_node);
10807}
10808
10809@ @<Private function prototypes@> =
10810gint marpa_or_node_first_and(struct marpa_r *r, int or_node_id);
10811@ @<Function definitions@> =
10812gint marpa_or_node_first_and(struct marpa_r *r, int or_node_id)
10813{
10814  OR or_node;
10815  @<Return |-2| on failure@>@;
10816    @<Check |r| and |or_node_id|; set |or_node|@>@;
10817  return First_ANDID_of_OR(or_node);
10818}
10819
10820@ @<Private function prototypes@> =
10821gint marpa_or_node_last_and(struct marpa_r *r, int or_node_id);
10822@ @<Function definitions@> =
10823gint marpa_or_node_last_and(struct marpa_r *r, int or_node_id)
10824{
10825  OR or_node;
10826  @<Return |-2| on failure@>@;
10827    @<Check |r| and |or_node_id|; set |or_node|@>@;
10828  return First_ANDID_of_OR(or_node)
10829      + AND_Count_of_OR(or_node) - 1;
10830}
10831
10832@ @<Private function prototypes@> =
10833gint marpa_or_node_and_count(struct marpa_r *r, int or_node_id);
10834@ @<Function definitions@> =
10835gint marpa_or_node_and_count(struct marpa_r *r, int or_node_id)
10836{
10837  OR or_node;
10838  @<Return |-2| on failure@>@;
10839    @<Check |r| and |or_node_id|; set |or_node|@>@;
10840  return AND_Count_of_OR(or_node);
10841}
10842
10843@** Parse Tree (TREE) Code.
10844Within Marpa,
10845when it makes sense in context,
10846"tree" means a parse tree.
10847Trees are, of course, a very common data structure,
10848and are used for all sorts of things.
10849But the most important trees in Marpa's universe
10850are its parse trees.
10851\par
10852Marpa's parse trees are produced by iterating
10853the Marpa bocage.
10854Therefore, Marpa parse trees are also bocage iterators.
10855@<Private incomplete structures@> =
10856struct s_tree;
10857typedef struct s_tree* TREE;
@ An exhausted bocage iterator (or parse tree)
does not need a worklist
or a stack, so they are destroyed.
If the bocage iterator has a parse count
but no stack,
it is exhausted.
10864@d TREE_is_Initialized(tree) ((tree)->t_parse_count >= 0)
10865@d TREE_is_Exhausted(tree) (TREE_is_Initialized(tree)
10866    && !FSTACK_IS_INITIALIZED((tree)->t_fork_stack))
10867@d VAL_of_TREE(tree) (&(tree)->t_val)
10868@d Size_of_TREE(tree) FSTACK_LENGTH((tree)->t_fork_stack)
10869@d FORK_of_TREE_by_IX(tree, fork_id)
10870    FSTACK_INDEX((tree)->t_fork_stack, FORK_Object, fork_id)
10871@<Private structures@> =
10872@<FORK structure@>@;
10873@<VAL structure@>@;
10874struct s_tree {
10875    FSTACK_DECLARE(t_fork_stack, FORK_Object)@;
10876    FSTACK_DECLARE(t_fork_worklist, gint)@;
10877    Bit_Vector t_and_node_in_use;
10878    gint t_parse_count;
10879    VAL_Object t_val;
10880};
10881typedef struct s_tree TREE_Object;
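@ The three states implied by |TREE_is_Initialized| and |TREE_is_Exhausted|
can be spelled out as a small classifier.
This is only a sketch:
|sketch_tree| is a hypothetical reduction of the tree to the two facts
those macros look at,
and |sketch_classify_tree| is not a \libmarpa/ function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of the tree to the two facts the macros inspect. */
struct sketch_tree {
    int parse_count;   /* -1 until the iterator is initialized */
    int has_stack;     /* nonzero while the fork stack is allocated */
};

enum sketch_state { SKETCH_UNINITIALIZED, SKETCH_ACTIVE, SKETCH_EXHAUSTED };

/* Classify a tree: no parse count means uninitialized; a parse count
   without a stack means exhausted; otherwise it can still be iterated. */
static enum sketch_state sketch_classify_tree(const struct sketch_tree *t)
{
    assert(t != NULL);
    if (t->parse_count < 0) return SKETCH_UNINITIALIZED;
    if (!t->has_stack) return SKETCH_EXHAUSTED;
    return SKETCH_ACTIVE;
}
```

In the real code the transitions come from |tree_safe|,
which sets the parse count to $-1$,
and from |tree_exhaust|, which destroys the stack
but leaves the parse count alone.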
10882
10883@ @<Private function prototypes@> =
10884static inline void tree_exhaust(TREE tree);
10885@ @<Function definitions@> =
10886static inline void tree_exhaust(TREE tree)
10887{
10888  if (FSTACK_IS_INITIALIZED(tree->t_fork_stack))
10889    {
10890      FSTACK_DESTROY(tree->t_fork_stack);
10891      FSTACK_SAFE(tree->t_fork_stack);
10892    }
10893  if (FSTACK_IS_INITIALIZED(tree->t_fork_worklist))
10894    {
10895      FSTACK_DESTROY(tree->t_fork_worklist);
10896      FSTACK_SAFE(tree->t_fork_worklist);
10897    }
10898    if (tree->t_and_node_in_use) {
10899	  bv_free (tree->t_and_node_in_use);
10900	tree->t_and_node_in_use = NULL;
10901    }
10902}
10903
10904@ @<Private function prototypes@> =
10905static inline void tree_safe(TREE tree);
10906@ @<Function definitions@> =
10907static inline void tree_safe(TREE tree)
10908{
10909    FSTACK_SAFE(tree->t_fork_stack);
10910    FSTACK_SAFE(tree->t_fork_worklist);
10911    tree->t_and_node_in_use = NULL;
10912    tree->t_parse_count = -1;
10913    val_safe(VAL_of_TREE(tree));
10914}
10915
10916@ Returns the size of the tree.
10917If the bocage iterator is exhausted, returns -1.
10918On error, returns -2.
10919@<Public function prototypes@> =
10920int marpa_tree_new(struct marpa_r* r);
10921@ @<Function definitions@> =
10922int marpa_tree_new(struct marpa_r* r)
10923{
10924    BOC b;
10925    TREE tree;
10926    gint first_tree_of_series = 0;
10927    @<Return |-2| on failure@>@;
10928    @<Fail if recognizer has fatal error@>@;
10929    @<Set |b| to bocage; fail if none@>@;
10930    tree = TREE_of_RANK(RANK_of_B(b));
10931    if (TREE_is_Exhausted(tree)) {
10932       return -1;
10933    }
10934    val_destroy(VAL_of_TREE(tree));
10935    if (!TREE_is_Initialized(tree))
10936      {
10937	first_tree_of_series = 1;
10938	@<Initialize the tree iterator;
10939	return -1 if fails
10940	@>@;
10941      }
10942      while (1) {
10943	 const AND ands_of_b = ANDs_of_B(b);
10944         if (!first_tree_of_series) {
10945	     @<Start a new iteration of the tree@>@;
10946	 }
10947	 first_tree_of_series = 0;
10948	 @<Finish tree if possible@>@;
10949     }
10950     TREE_IS_FINISHED: ;
10951    tree->t_parse_count++;
10952      return FSTACK_LENGTH(tree->t_fork_stack);
10953    TREE_IS_EXHAUSTED: ;
10954   tree_exhaust(tree);
10955   return -1;
10956}
10957
10958@*0 Claiming and Releasing And-nodes.
To avoid cycles, the same and-node is not allowed to occur twice
10960in the parse tree.
10961A bit vector, accessed by these functions, enforces this.
10962@<Private function prototypes@> =
10963static inline void tree_and_node_claim(TREE tree, ANDID and_node_id);
10964static inline void tree_and_node_release(TREE tree, ANDID and_node_id);
10965static inline gint tree_and_node_try(TREE tree, ANDID and_node_id);
10966@ Claim the and-node by setting its bit.
10967@<Function definitions@> =
10968static inline void tree_and_node_claim(TREE tree, ANDID and_node_id)
10969{
10970    bv_bit_set(tree->t_and_node_in_use, (guint)and_node_id);
10971}
10972@ Release the and-node by unsetting its bit.
10973@<Function definitions@> =
10974static inline void tree_and_node_release(TREE tree, ANDID and_node_id)
10975{
10976    bv_bit_clear(tree->t_and_node_in_use, (guint)and_node_id);
10977}
10978@ Try to claim the and-node.
10979If it was already claimed, return 0, otherwise claim it (that is,
10980set the bit) and return 1.
10981@<Function definitions@> =
10982static inline gint tree_and_node_try(TREE tree, ANDID and_node_id)
10983{
10984    return !bv_bit_test_and_set(tree->t_and_node_in_use, (guint)and_node_id);
10985}
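@ The claim, release and try operations can be sketched with a plain
array of machine words.
This is an illustration under stated assumptions, not Marpa's bit vector code:
|sketch_bv_create|, |sketch_try_claim| and |sketch_release| are hypothetical
names, and the sketch uses |calloc| where the real code uses |bv_create|.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define SKETCH_WORD_BITS (sizeof(unsigned int) * CHAR_BIT)

/* Allocate a zeroed bit vector able to hold `bits` bits.
   The caller releases it with free(). */
static unsigned int *sketch_bv_create(unsigned int bits)
{
    size_t words = (bits + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS;
    return calloc(words ? words : 1, sizeof(unsigned int));
}

/* Try to claim and-node `id`: return 0 if it was already claimed,
   otherwise set its bit and return 1. */
static int sketch_try_claim(unsigned int *bv, unsigned int id)
{
    unsigned int mask = 1u << (id % SKETCH_WORD_BITS);
    unsigned int *word = bv + id / SKETCH_WORD_BITS;
    assert(bv != NULL);
    if (*word & mask) return 0;
    *word |= mask;
    return 1;
}

/* Release and-node `id` by clearing its bit. */
static void sketch_release(unsigned int *bv, unsigned int id)
{
    assert(bv != NULL);
    bv[id / SKETCH_WORD_BITS] &= ~(1u << (id % SKETCH_WORD_BITS));
}
```

Folding the test and the set into one call means the tree builder needs
only a single operation per candidate and-node.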
10986
10987@ @<Initialize the tree iterator;
10988return -1 if fails@> =
10989{
10990    ORID top_or_id = Top_ORID_of_B(b);
10991    OR top_or_node = OR_of_B_by_ID(b, top_or_id);
10992  FORK fork;
10993  gint choice;
10994  const gint and_count = AND_Count_of_B (b);
10995  tree->t_parse_count = 0;
10996    tree->t_and_node_in_use = bv_create ((guint) and_count);
10997  FSTACK_INIT (tree->t_fork_stack, FORK_Object, and_count);
10998  FSTACK_INIT (tree->t_fork_worklist, gint, and_count);
10999    choice = or_node_next_choice(b, tree, top_or_node, 0);
11000	/* Due to skipping, even the top or-node can have no
11001	   valid choices, in which case there is no parse */
11002	if (choice < 0) goto TREE_IS_EXHAUSTED;
11003  fork = FSTACK_PUSH (tree->t_fork_stack);
11004    OR_of_FORK(fork) = top_or_node;
11005    Choice_of_FORK(fork) = choice;
11006    Parent_of_FORK(fork) = -1;
11007    FORK_Cause_is_Ready(fork) = 0;
11008    FORK_is_Cause(fork) = 0;
11009    FORK_Predecessor_is_Ready(fork) = 0;
11010    FORK_is_Predecessor(fork) = 0;
11011  *(FSTACK_PUSH (tree->t_fork_worklist)) = 0;
11012}
11013
11014@ Look for a fork to iterate.
11015If there is one, set it to the next choice.
11016Otherwise, the tree is exhausted.
11017@<Start a new iteration of the tree@> = {
11018    while (1) {
11019	FORK iteration_candidate = FSTACK_TOP(tree->t_fork_stack, FORK_Object);
11020	gint choice;
11021	if (!iteration_candidate) break;
11022	choice = Choice_of_FORK(iteration_candidate);
11023	MARPA_ASSERT(choice >= 0);
11024	{
11025	    OR or_node = OR_of_FORK(iteration_candidate);
11026	    ANDID and_node_id = and_order_get(b, or_node, choice);
11027	    tree_and_node_release(tree, and_node_id);
11028	    choice = or_node_next_choice(b, tree, or_node, choice+1);
11029	}
11030	if (choice >= 0) {
11031	    /* We have found a fork we can iterate.
11032	        Set the new choice,
11033		dirty the child bits in the current working fork,
11034		and break out of the loop.
11035	    */
11036	    Choice_of_FORK(iteration_candidate) = choice;
11037	    FORK_Cause_is_Ready(iteration_candidate) = 0;
11038	    FORK_Predecessor_is_Ready(iteration_candidate) = 0;
11039	    break;
11040	}
11041	{
11042	    /* Dirty the corresponding bit in the parent */
11043	    const gint parent_fork_ix = Parent_of_FORK(iteration_candidate);
11044	    if (parent_fork_ix >= 0) {
11045		FORK parent_fork = FORK_of_TREE_by_IX(tree, parent_fork_ix);
11046		if (FORK_is_Cause(iteration_candidate)) {
11047		    FORK_Cause_is_Ready(parent_fork) = 0;
11048		}
11049		if (FORK_is_Predecessor(iteration_candidate)) {
11050		    FORK_Predecessor_is_Ready(parent_fork) = 0;
11051		}
11052	    }
11053
11054	    /* Continue with the next item on the stack */
11055	    FSTACK_POP(tree->t_fork_stack);
11056	}
11057    }
11058    {
11059	gint stack_length = FSTACK_LENGTH(tree->t_fork_stack);
11060	gint i;
11061	if (stack_length <= 0) goto TREE_IS_EXHAUSTED;
11062	FSTACK_CLEAR(tree->t_fork_worklist);
11063	for (i = 0; i < stack_length; i++) {
11064	    *(FSTACK_PUSH(tree->t_fork_worklist)) = i;
11065	}
11066    }
11067}
11068
11069@ @<Finish tree if possible@> = {
11070    while (1) {
11071	FORKID* p_work_fork_id;
11072	FORK work_fork;
11073	ANDID work_and_node_id;
11074	AND work_and_node;
11075	OR work_or_node;
11076	OR child_or_node = NULL;
11077	gint choice;
11078	gint child_is_cause = 0;
11079	gint child_is_predecessor = 0;
11080	p_work_fork_id = FSTACK_TOP(tree->t_fork_worklist, FORKID);
11081	if (!p_work_fork_id) {
11082	    goto TREE_IS_FINISHED;
11083	}
11084	work_fork = FORK_of_TREE_by_IX(tree, *p_work_fork_id);
11085	work_or_node = OR_of_FORK(work_fork);
11086	work_and_node_id = and_order_get(b, work_or_node, Choice_of_FORK(work_fork));
11087	work_and_node = ands_of_b + work_and_node_id;
11088	if (!FORK_Cause_is_Ready(work_fork)) {
11089	    child_or_node = Cause_OR_of_AND(work_and_node);
11090	    if (child_or_node && OR_is_Token(child_or_node)) child_or_node = NULL;
11091	    if (child_or_node) {
11092		child_is_cause = 1;
11093	    } else {
11094		FORK_Cause_is_Ready(work_fork) = 1;
11095	    }
11096	}
11097	if (!child_or_node && !FORK_Predecessor_is_Ready(work_fork)) {
11098	    child_or_node = Predecessor_OR_of_AND(work_and_node);
11099	    if (child_or_node) {
11100		child_is_predecessor = 1;
11101	    } else {
11102		FORK_Predecessor_is_Ready(work_fork) = 1;
11103	    }
11104	}
11105	if (!child_or_node) {
11106	    FSTACK_POP(tree->t_fork_worklist);
11107	    goto NEXT_FORK_ON_WORKLIST;
11108	}
11109	choice = or_node_next_choice(b, tree, child_or_node, 0);
11110	if (choice < 0) goto NEXT_TREE;
11111	@<Add new fork to tree@>;
11112	NEXT_FORK_ON_WORKLIST: ;
11113    }
11114    NEXT_TREE: ;
11115}
11116
11117@ @<Private function prototypes@> =
11118static inline gint or_node_next_choice(BOC b, TREE tree, OR or_node, gint start_choice);
11119@ @<Function definitions@> =
11120static inline gint or_node_next_choice(BOC b, TREE tree, OR or_node, gint start_choice)
11121{
11122    gint choice = start_choice;
11123    while (1) {
11124	ANDID and_node_id = and_order_get(b, or_node, choice);
11125	if (and_node_id < 0) return -1;
11126	if (tree_and_node_try(tree, and_node_id)) return choice;
11127	choice++;
11128    }
11129    return -1;
11130}
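@ The scan performed by |or_node_next_choice| can be sketched without
the bocage.
In this sketch the ordered and-node IDs of one or-node are passed in as
a plain array, and a byte per and-node stands in for the bit vector;
both simplifications, and the name |sketch_next_choice|,
are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Starting at start_choice, find the first choice whose and-node is not
   yet in use, claim that and-node, and return the choice.
   Return -1 if every remaining choice is already claimed. */
static int sketch_next_choice(const int *and_ids, int and_count,
                              unsigned char *in_use, int start_choice)
{
    int choice;
    assert(and_ids != NULL && in_use != NULL && start_choice >= 0);
    for (choice = start_choice; choice < and_count; choice++) {
        const int and_id = and_ids[choice];
        if (!in_use[and_id]) {
            in_use[and_id] = 1;   /* claim it, as tree_and_node_try does */
            return choice;
        }
    }
    return -1;
}
```

Skipping claimed and-nodes is what allows an or-node,
even the top one, to end up with no valid choice at all.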
11131
11132@ @<Add new fork to tree@> =
11133{
11134   FORKID new_fork_id = FSTACK_LENGTH(tree->t_fork_stack);
11135   FORK new_fork = FSTACK_PUSH(tree->t_fork_stack);
11136    *(FSTACK_PUSH(tree->t_fork_worklist)) = new_fork_id;
11137    Parent_of_FORK(new_fork) = *p_work_fork_id;
11138    Choice_of_FORK(new_fork) = choice;
11139    OR_of_FORK(new_fork) = child_or_node;
11140    FORK_Cause_is_Ready(new_fork) = 0;
11141    if ( ( FORK_is_Cause(new_fork) = child_is_cause ) ) {
11142	FORK_Cause_is_Ready(work_fork) = 1;
11143    }
11144    FORK_Predecessor_is_Ready(new_fork) = 0;
11145    if ( ( FORK_is_Predecessor(new_fork) = child_is_predecessor ) ) {
11146	FORK_Predecessor_is_Ready(work_fork) = 1;
11147    }
11148}
11149
11150@ @<Set |b| to bocage; fail if none@> =
11151{
11152    b = B_of_R(r);
11153    if (!b) {
11154	R_ERROR ("no bocage");
11155	return failure_indicator;
11156    }
11157}
11158
11159@ @<Private function prototypes@> =
11160static inline void tree_destroy(TREE tree);
11161@ @<Function definitions@> =
11162static inline void tree_destroy(TREE tree)
11163{
11164    tree_exhaust(tree);
11165    tree->t_parse_count = -1;
11166MARPA_DEBUG4("%s tree=%p parse_count=%d", G_STRLOC, tree, tree->t_parse_count);
11167}
11168
11169@ Soft failure (-1) if no bocage, so that this function
can also be used to check for the existence of the bocage.
11171@<Public function prototypes@> =
11172gint marpa_parse_count(struct marpa_r* r);
11173@ @<Function definitions@> =
11174gint marpa_parse_count(struct marpa_r* r)
11175{
11176    BOC b;
11177    TREE tree;
11178    @<Return |-2| on failure@>@;
11179    @<Fail if recognizer has fatal error@>@;
11180    b = B_of_R(r);
11181    if (!b) {
11182	return -1;
11183    }
11184    tree = TREE_of_RANK(RANK_of_B(b));
11185MARPA_DEBUG3("%s b=%p", G_STRLOC, b);
11186MARPA_DEBUG4("%s tree=%p parse_count=%d", G_STRLOC, tree, tree->t_parse_count);
11187    return tree->t_parse_count;
11188}
11189
11190@ Return the size of the parse tree.
11191This is the number of |FORK| entries in its stack.
If there is a serious error,
11193or if the tree is uninitialized, return -2.
11194If the tree is exhausted, return -1.
11195@<Private function prototypes@> =
11196gint marpa_tree_size(struct marpa_r *r);
11197@ @<Function definitions@> =
11198gint marpa_tree_size(struct marpa_r *r)
11199{
11200  @<Return |-2| on failure@>@;
11201  BOC b = B_of_R(r);
11202  TREE tree;
11203  @<Fail if recognizer has fatal error@>@;
11204  if (!b) {
11205      R_ERROR("no bocage");
11206      return failure_indicator;
11207  }
11208  tree = TREE_of_RANK(RANK_of_B(b));
11209  if (!TREE_is_Initialized(tree)) {
11210      R_ERROR("tree not initialized");
11211      return failure_indicator;
11212  }
11213  if (TREE_is_Exhausted(tree)) {
11214      return -1;
11215  }
11216  return FSTACK_LENGTH(tree->t_fork_stack);
11217}
11218
11219@** Bocage Ranking (RANK) Code.
11220@<Private incomplete structures@> =
11221struct s_bocage_rank;
11222typedef struct s_bocage_rank* RANK;
11223@
11224|t_and_node_orderings| is used as the "safe boolean"
11225for the obstack.  They have the same lifetime, so
11226that it is safe to destroy the obstack if
11227|t_and_node_orderings| is not null.
11228@d TREE_of_RANK(rank) (&(rank)->t_tree)
11229@d OBS_of_RANK(rank) ((rank)->t_obs)
11230@<Private structures@> =
11231struct s_bocage_rank {
11232    struct obstack t_obs;
11233    Bit_Vector t_and_node_in_use;
11234    ANDID** t_and_node_orderings;
11235    TREE_Object t_tree;
11236};
11237typedef struct s_bocage_rank RANK_Object;
11238
11239@
11240@d RANK_of_B(b) (&(b)->t_rank)
11241@<Widely aligned bocage elements@> =
11242RANK_Object t_rank;
11243@ @<Initialize bocage elements@> =
11244MARPA_DEBUG3("%s rank_safe where b=%p", G_STRLOC, b);
11245rank_safe(RANK_of_B(b));
11246@ @<Private function prototypes@> =
11247static inline void rank_safe(RANK rank);
11248@ @<Function definitions@> =
11249static inline void rank_safe(RANK rank)
11250{
11251    rank->t_and_node_in_use = NULL;
11252    rank->t_and_node_orderings = NULL;
11253    tree_safe(TREE_of_RANK(rank));
11254}
11255
11256@ @<Destroy bocage elements, main phase@> =
11257rank_destroy(RANK_of_B(b));
11258@ @<Private function prototypes@> =
11259static inline void rank_freeze(RANK rank);
11260static inline void rank_destroy(RANK rank);
11261@ @<Function definitions@> =
11262static inline void rank_freeze(RANK rank)
11263{
11264  if (rank->t_and_node_in_use)
11265    {
11266      bv_free (rank->t_and_node_in_use);
11267	rank->t_and_node_in_use = NULL;
11268    }
11269}
11270static inline void rank_destroy(RANK rank)
11271{
11272  tree_destroy(TREE_of_RANK(rank));
11273  rank_freeze(rank);
11274  if (rank->t_and_node_orderings) {
11275      rank->t_and_node_orderings = NULL;
11276      obstack_free(&OBS_of_RANK(rank), NULL);
11277  }
11278}
11279
11280@*0 The RANK Obstack.
11281An obstack with the lifetime of the bocage ranker.
11282
11283@*0 Set the Order of And-nodes.
11284This function
11285sets the order in which the and-nodes of an
11286or-node are used.
11287It is an error if an and-node ID is not the
11288immediate child of the specified or-node,
11289or if the and-node is specified twice,
11290or if an ordering has already been specified for
11291the or-node.
11292@<Public function prototypes@> =
11293gint marpa_and_order_set(struct marpa_r *r,
11294    Marpa_Or_Node_ID or_node_id,
11295    Marpa_And_Node_ID* and_node_ids,
11296    gint length);
11297@ For a given bocage,
11298this function may not be used to order
11299the same or-node more than once.
11300In other words, after you have once specified an order
11301for the and-nodes within an or-node,
11302you cannot change it.
11303Some applications might find this inconvenient,
11304and will have to resort to their own buffering
11305to prevent multiple changes.
11306But most applications won't care, and
11307will benefit from the faster memory allocation
11308this restriction allows.
11309
11310@ Using a bit vector for
11311the index of an and-node within an or-node,
11312instead of the and-node ID, would seem to allow
a space efficiency: the size of the bit vector
could be reduced to the maximum number of descendants
of any or-node.
But in fact, improvements from this approach are elusive.
11317
11318In the worst cases, these counts are the same, or
11319almost the same.
11320Any attempt to economize on space seems to always
11321be counter-productive in terms of speed.
11322And since
11323allocating a bit vector for the worst case does
11324not increase the memory high water mark,
it seems to be the most reasonable tradeoff.

This in turn suggests there is no advantage in using
11328a within-or-node index to index the bit vector,
11329instead of using the and-node id to index the bit vector.
11330Using the and-node ID does have the advantage that the bit
11331vector does not need to be cleared for each or-node.
11332@ The first position in each |and_node_orderings| array is not
11333actually an |ANDID|, but a count.
11334A purist might insist this needs to be reflected in a structure,
11335but to my mind doing this portably makes the code more obscure,
11336not less.
11337@<Function definitions@> =
11338gint marpa_and_order_set(struct marpa_r *r,
11339    Marpa_Or_Node_ID or_node_id,
11340    Marpa_And_Node_ID* and_node_ids,
11341    gint length)
11342{
11343    OR or_node;
11344    RANK rank;
11345  @<Return |-2| on failure@>@;
11346    @<Check |r| and |or_node_id|; set |or_node|@>@;
11347    { BOC b = B_of_R(r);
11348      ANDID** and_node_orderings;
11349      Bit_Vector and_node_in_use;
11350      struct obstack *obs;
11351      ANDID first_and_node_id;
11352      ANDID and_count_of_or;
11353	  if (!b) {
11354	      R_ERROR("no bocage");
11355	      return failure_indicator;
11356	  }
11357	rank = RANK_of_B(b);
11358	and_node_orderings = rank->t_and_node_orderings;
11359	and_node_in_use = rank->t_and_node_in_use;
11360	obs = &OBS_of_RANK(rank);
11361	if (and_node_orderings && !and_node_in_use)
11362	{
11363	  R_ERROR("ranker frozen");
11364	  return failure_indicator;
11365	}
11366	if (!and_node_orderings)
11367	  {
11368	    gint and_id;
11369	    const gint and_count_of_r = AND_Count_of_B (b);
11370	    obstack_init(obs);
11371	    rank->t_and_node_orderings =
11372	      and_node_orderings =
11373	      obstack_alloc (obs, sizeof (ANDID *) * and_count_of_r);
11374	    for (and_id = 0; and_id < and_count_of_r; and_id++)
11375	      {
11376		and_node_orderings[and_id] = (ANDID *) NULL;
11377	      }
11378	     rank->t_and_node_in_use =
11379	     and_node_in_use = bv_create ((guint)and_count_of_r);
11380	  }
11381	  first_and_node_id = First_ANDID_of_OR(or_node);
11382	  and_count_of_or = AND_Count_of_OR(or_node);
11383	    {
11384	      gint and_ix;
11385	      for (and_ix = 0; and_ix < length; and_ix++)
11386		{
11387		  ANDID and_node_id = and_node_ids[and_ix];
11388		  if (and_node_id < first_and_node_id ||
11389			  and_node_id - first_and_node_id >= and_count_of_or) {
11390		      R_ERROR ("and node not in or node");
11391		      return failure_indicator;
11392		    }
11393		  if (bv_bit_test (and_node_in_use, (guint)and_node_id))
11394		    {
11395		      R_ERROR ("dup and node");
11396		      return failure_indicator;
11397		    }
11398		  bv_bit_set (and_node_in_use, (guint)and_node_id);
11399		}
11400	    }
11401	    if (and_node_orderings[or_node_id]) {
11402		      R_ERROR ("or node already ordered");
11403		      return failure_indicator;
11404	    }
11405	    {
11406	      ANDID *orderings = obstack_alloc (obs, sizeof (ANDID) * (length + 1));
11407	      gint i;
11408	      and_node_orderings[or_node_id] = orderings;
11409	      *orderings++ = length;
11410	      for (i = 0; i < length; i++)
11411		{
11412		  *orderings++ = and_node_ids[i];
11413		}
11414	    }
11415    }
11416  return 1;
11417}
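@ The validation and the count-prefixed layout can be sketched together.
This is a simplified illustration, not the \libmarpa/ implementation:
|sketch_make_ordering| is a hypothetical name,
it allocates with |malloc| rather than from an obstack,
and a byte per and-node stands in for the bit vector.

```c
#include <assert.h>
#include <stdlib.h>

/* Build a count-prefixed copy of ids[0..length-1].
   Every ID must lie in [first_id, first_id + count) and occur only once.
   `seen` needs at least first_id + count entries, all zero on entry.
   Returns NULL on a range error, a duplicate, or allocation failure.
   As in the code above, marks made before a failure stay set.
   The caller releases the result with free(). */
static int *sketch_make_ordering(const int *ids, int length,
                                 int first_id, int count,
                                 unsigned char *seen)
{
    int *ordering;
    int i;
    assert(ids != NULL && seen != NULL && length >= 0);
    for (i = 0; i < length; i++) {
        const int id = ids[i];
        if (id < first_id || id - first_id >= count) return NULL;
        if (seen[id]) return NULL;        /* the "dup and node" case */
        seen[id] = 1;
    }
    ordering = malloc(sizeof(int) * ((size_t)length + 1));
    if (!ordering) return NULL;
    ordering[0] = length;                 /* slot 0 is a count, not an ID */
    for (i = 0; i < length; i++) ordering[i + 1] = ids[i];
    return ordering;
}
```

Because the marks are indexed by and-node ID and the and-nodes of
different or-nodes do not overlap,
the same array can serve every or-node without being cleared in between.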
11418
11419@*0 Get an And-node by Order within its Or-Node.
11420@ @<Private function prototypes@> =
11421static inline ANDID and_order_get(BOC b, OR or_node, gint ix);
11422@ @<Public function prototypes@> =
11423Marpa_And_Node_ID marpa_and_order_get(struct marpa_r *r, Marpa_Or_Node_ID or_node_id, gint ix);
11424@ @<Function definitions@> =
11425static inline ANDID and_order_get(BOC b, OR or_node, gint ix)
11426{
11427  RANK rank;
11428  ANDID **and_node_orderings;
11429  if (ix >= AND_Count_of_OR (or_node))
11430    {
11431      return -1;
11432    }
11433  rank = RANK_of_B (b);
11434  and_node_orderings = rank->t_and_node_orderings;
11435  if (and_node_orderings)
11436    {
11437      ORID or_node_id = ID_of_OR(or_node);
11438      ANDID *ordering = and_node_orderings[or_node_id];
11439      if (ordering)
11440	{
11441	  gint length = ordering[0];
11442	  if (ix >= length)
11443	    return -1;
11444	  return ordering[1 + ix];
11445	}
11446    }
11447  return First_ANDID_of_OR(or_node) + ix;
11448}
11449
11450Marpa_And_Node_ID marpa_and_order_get(struct marpa_r *r, Marpa_Or_Node_ID or_node_id, gint ix)
11451{
11452    OR or_node;
11453  @<Return |-2| on failure@>@;
11454    @<Check |r| and |or_node_id|; set |or_node|@>@;
11455  if (ix < 0) {
11456      R_ERROR("negative and ix");
11457      return failure_indicator;
11458  }
11459    {
11460      BOC b = B_of_R (r);
11461      if (!b)
11462	{
11463	  R_ERROR ("no bocage");
11464	  return failure_indicator;
11465	}
11466	return and_order_get(b, or_node, ix);
11467	}
11468}
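@ The lookup rule in |and_order_get| is short enough to sketch directly.
The function |sketch_order_get| is hypothetical and takes the pieces
of an or-node as separate arguments:
a count-prefixed ordering (or |NULL| if none was set),
the first and-node ID, and the and-node count.

```c
#include <assert.h>
#include <stddef.h>

/* Return the and-node ID at position ix of an or-node.
   With an explicit ordering, ordering[0] is its length and the IDs
   follow; positions past that length yield -1.
   Without one, the and-nodes are taken in ID order from first_id. */
static int sketch_order_get(const int *ordering, int first_id,
                            int count, int ix)
{
    assert(ix >= 0);
    if (ix >= count) return -1;
    if (ordering) {
        if (ix >= ordering[0]) return -1;
        return ordering[1 + ix];
    }
    return first_id + ix;
}
```

Note that an explicit ordering may be shorter than the or-node's
and-node count, in which case the and-nodes left out of it
are never returned.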
11469
11470@** Fork (FORK) Code.
11471In Marpa, a fork is any node of a parse tree.
In discussing Marpa's parse trees,
11473a leaf node is a special kind of |FORK|.
11474This terminology, while not unprecedented,
11475is unusual -- the usual term is "node".
11476The problem is that within Marpa,
11477the word "node" is already heavily overloaded.
11478So what most texts call "tree nodes" are here
11479called "forks".
11480@<Public typedefs@> =
11481typedef gint Marpa_Fork_ID;
11482@ @<Private typedefs@> =
11483typedef Marpa_Fork_ID FORKID;
11484@ @s FORK int
11485@<Private incomplete structures@> =
11486struct s_fork;
11487typedef struct s_fork* FORK;
11488@ @d OR_of_FORK(fork) ((fork)->t_or_node)
11489@d Choice_of_FORK(fork) ((fork)->t_choice)
11490@d Parent_of_FORK(fork) ((fork)->t_parent)
11491@d FORK_Cause_is_Ready(fork) ((fork)->t_is_cause_ready)
11492@d FORK_is_Cause(fork) ((fork)->t_is_cause_of_parent)
11493@d FORK_Predecessor_is_Ready(fork) ((fork)->t_is_predecessor_ready)
11494@d FORK_is_Predecessor(fork) ((fork)->t_is_predecessor_of_parent)
11495@s FORK_Object int
11496@<FORK structure@> =
11497struct s_fork {
11498    OR t_or_node;
11499    gint t_choice;
11500    FORKID t_parent;
11501    guint t_is_cause_ready:1;
11502    guint t_is_predecessor_ready:1;
11503    guint t_is_cause_of_parent:1;
11504    guint t_is_predecessor_of_parent:1;
11505};
11506typedef struct s_fork FORK_Object;
11507
11508@*0 Trace Functions.
11509
11510@ This is common logic in the |FORK| trace functions.
11511@<Check |r| and |fork_id|;
11512set |fork|@> = {
11513  FORK base_fork;
11514  BOC b = B_of_R(r);
11515  TREE tree;
11516  @<Fail if recognizer has fatal error@>@;
11517  if (!b) {
11518      R_ERROR("no bocage");
11519      return failure_indicator;
11520  }
11521  tree = TREE_of_RANK(RANK_of_B(b));
11522  if (!TREE_is_Initialized(tree)) {
11523      R_ERROR("tree not initialized");
11524      return failure_indicator;
11525  }
11526  if (TREE_is_Exhausted(tree)) {
11527      R_ERROR("bocage iteration exhausted");
11528      return failure_indicator;
11529  }
11530  base_fork = FSTACK_BASE(tree->t_fork_stack, FORK_Object);
11531  if (fork_id < 0) {
11532      R_ERROR("bad fork id");
11533      return failure_indicator;
11534  }
11535  if (fork_id >= FSTACK_LENGTH(tree->t_fork_stack)) {
11536      return -1;
11537  }
11538  fork = base_fork + fork_id;
11539}
11540
11541@ Return the ID of the or-node for |fork_id|.
11542@<Private function prototypes@> =
11543gint marpa_fork_or_node(struct marpa_r *r, int fork_id);
11544@ @<Function definitions@> =
11545gint marpa_fork_or_node(struct marpa_r *r, int fork_id)
11546{
11547  FORK fork;
11548  @<Return |-2| on failure@>@;
11549   @<Check |r| and |fork_id|; set |fork|@>@;
11550  return ID_of_OR(OR_of_FORK(fork));
11551}
11552
11553@ Return the current choice for |fork_id|.
11554@<Private function prototypes@> =
11555gint marpa_fork_choice(struct marpa_r *r, int fork_id);
11556@ @<Function definitions@> =
11557gint marpa_fork_choice(struct marpa_r *r, int fork_id)
11558{
11559  FORK fork;
11560  @<Return |-2| on failure@>@;
11561   @<Check |r| and |fork_id|; set |fork|@>@;
11562    return Choice_of_FORK(fork);
11563}
11564
11565@ Return the parent fork's ID for |fork_id|.
11566As with the other fork trace functions,
11567-1 is returned if |fork_id| is not the ID of
11568a fork on the stack,
11569but -1 can also be a valid value.
11570If that's an issue, the |fork_id| needs
11571to be checked with one of the trace functions
11572where -1 is never a valid value ---
11573for example, |marpa_fork_or_node|.
11574@<Private function prototypes@> =
11575gint marpa_fork_parent(struct marpa_r *r, int fork_id);
11576@ @<Function definitions@> =
11577gint marpa_fork_parent(struct marpa_r *r, int fork_id)
11578{
11579  FORK fork;
11580  @<Return |-2| on failure@>@;
11581   @<Check |r| and |fork_id|; set |fork|@>@;
11582    return Parent_of_FORK(fork);
11583}
11584
11585@ Return the cause-is-ready bit for |fork_id|.
11586@<Private function prototypes@> =
11587gint marpa_fork_cause_is_ready(struct marpa_r *r, int fork_id);
11588@ @<Function definitions@> =
11589gint marpa_fork_cause_is_ready(struct marpa_r *r, int fork_id)
11590{
11591  FORK fork;
11592  @<Return |-2| on failure@>@;
11593   @<Check |r| and |fork_id|; set |fork|@>@;
11594    return FORK_Cause_is_Ready(fork);
11595}
11596
11597@ Return the predecessor-is-ready bit for |fork_id|.
11598@<Private function prototypes@> =
11599gint marpa_fork_predecessor_is_ready(struct marpa_r *r, int fork_id);
11600@ @<Function definitions@> =
11601gint marpa_fork_predecessor_is_ready(struct marpa_r *r, int fork_id)
11602{
11603  FORK fork;
11604  @<Return |-2| on failure@>@;
11605   @<Check |r| and |fork_id|; set |fork|@>@;
11606    return FORK_Predecessor_is_Ready(fork);
11607}
11608
11609@ Return the is-cause bit for |fork_id|.
11610@<Private function prototypes@> =
11611gint marpa_fork_is_cause(struct marpa_r *r, int fork_id);
11612@ @<Function definitions@> =
11613gint marpa_fork_is_cause(struct marpa_r *r, int fork_id)
11614{
11615  FORK fork;
11616  @<Return |-2| on failure@>@;
11617   @<Check |r| and |fork_id|; set |fork|@>@;
11618    return FORK_is_Cause(fork);
11619}
11620
11621@ Return the is-predecessor bit for |fork_id|.
11622@<Private function prototypes@> =
11623gint marpa_fork_is_predecessor(struct marpa_r *r, int fork_id);
11624@ @<Function definitions@> =
11625gint marpa_fork_is_predecessor(struct marpa_r *r, int fork_id)
11626{
11627  FORK fork;
11628  @<Return |-2| on failure@>@;
11629   @<Check |r| and |fork_id|; set |fork|@>@;
11630    return FORK_is_Predecessor(fork);
11631}
11632
11633@** Event (EVE) Code.
11634@
11635@d SYMID_of_EVE(eve) ((eve)->marpa_token_id)
11636@d Value_of_EVE(eve) ((eve)->marpa_value)
11637@d RULEID_of_EVE(eve) ((eve)->marpa_rule_id)
11638@d Arg0_of_EVE(eve) ((eve)->marpa_arg_0)
11639@d ArgN_of_EVE(eve) ((eve)->marpa_arg_n)
11640@<Public structures@> =
11641struct marpa_event {
11642    Marpa_Symbol_ID marpa_token_id;
11643    gpointer marpa_value;
11644    Marpa_Rule_ID marpa_rule_id;
11645    gint marpa_arg_0;
11646    gint marpa_arg_n;
11647};
11648typedef struct marpa_event Marpa_Event;
11649@ @<Private typedefs@> =
11650typedef Marpa_Event *EVE;
11651
11652@** Evaluation (VAL) Code.
11653This code helps
11654compute a value for
11655a parse tree.
11656I say "helps" because evaluating a parse tree
11657involves semantics, and libmarpa has only
11658limited knowledge of the semantics.
11659This code is really just routines to assist
11660the higher level in tracking the evaluation stack.
11661\par
11662The main reason for this code is to hide libmarpa's
11663internal rewrites from the semantics.
11664If it were not for that, it would probably be
11665just as easy to provide a parse tree to the
higher level and let that level decide how to
evaluate it.
11668@<Private incomplete structures@> =
11669struct s_value;
11670typedef struct s_value* VAL;
@ This structure tracks the top of the evaluation
stack, but does {\bf not} maintain the
evaluation stack itself.
That is left for the upper layers to do.
It does, however, maintain a stack of the counts
11676of symbols in the
11677original (or "virtual") rules.
11678This enables libmarpa to make the rewriting of
11679the grammar invisible to the semantics.
11680@d VAL_is_Active(val) ((val)->t_active)
11681@d VAL_is_Trace(val) ((val)->t_trace)
11682@d FORK_of_VAL(val) ((val)->t_fork)
11683@d TOS_of_VAL(val) ((val)->t_tos)
11684@d VStack_of_VAL(val) ((val)->t_virtual_stack)
11685@<VAL structure@> =
11686struct s_value {
11687    DSTACK_DECLARE(t_virtual_stack);
11688    FORKID t_fork;
11689    gint t_tos;
11690    guint t_trace:1;
11691    guint t_active:1;
11692};
11693typedef struct s_value VAL_Object;
11694
11695@ @<Private function prototypes@> =
11696static inline void val_safe(VAL val);
11697@ @<Function definitions@> =
11698static inline void val_safe(VAL val)
11699{
11700    DSTACK_SAFE(val->t_virtual_stack);
11701    VAL_is_Active(val) = 0;
11702    VAL_is_Trace(val) = 0;
11703    TOS_of_VAL(val) = -1;
11704    FORK_of_VAL(val) = -1;
11705}
11706
11707@ @<Public function prototypes@> =
11708int marpa_val_new(struct marpa_r* r);
11709@ A dynamic stack is used here instead of a fixed
11710stack for two reasons.
11711First, there are only a few stack moves per call
11712of |marpa_val_event|.
11713Since at least one subroutine call occurs every few
11714virtual stack moves,
11715virtual stack moves are not really within a tight CPU
11716loop.
11717Therefore shaving off the few instructions it
11718takes to check stack size is less important than it is
11719in other places.
@ Second, the fixed stack, to accommodate the worst
case, would have to be many times larger than
what will usually be needed.
I calculate the
worst case for virtual stack size as follows.
The virtual stack grows only once for each virtual
rule.
To be virtual, a rule must divide into at least two
``real'', or rewritten, rules, so the worst case is that half
of all applications of real rules grow the virtual
stack.
11731The number of applications of real rules is
11732the size of the parse tree, $\size{|tree|}$.
11733So, if the fixed stack is sized per tree,
11734it must be $\size{|tree|}/2+1$.
@ I set the initial size of
the dynamic stack to be
$\size{|tree|}/1024$,
with a minimum of 8192 bytes,
that is, |8192/sizeof(gint)| entries.
The 8192-byte minimum is used because
in some modern configurations
a smaller allocation may require
extra work.
11743The purpose of the $\size{|tree|}/1024$ is
11744to guarantee that this code is $O(n)$.
11745$\size{|tree|}/1024$ is a fixed fraction
11746of the worst case size, so the number of
11747stack reallocations is $O(1)$.
11748@<Function definitions@> =
11749int marpa_val_new(struct marpa_r* r)
11750{
11751    BOC b;
11752    TREE tree;
11753    @<Return |-2| on failure@>@;
11754    @<Fail if recognizer has fatal error@>@;
11755    @<Set |b| to bocage; fail if none@>@;
11756    tree = TREE_of_RANK(RANK_of_B(b));
11757    if (TREE_is_Exhausted(tree)) {
11758       return -1;
11759    }
11760    if (!TREE_is_Initialized(tree))
11761      {
11762	R_ERROR ("tree not initialized");
11763	return failure_indicator;
11764      }
11765    {
11766      VAL val = VAL_of_TREE (tree);
11767      const gint minimum_stack_size = (8192 / sizeof (gint));
11768	const gint initial_stack_size =
11769	MAX (Size_of_TREE (tree) / 1024, minimum_stack_size);
11770      val_destroy (val);
11771      DSTACK_INIT (VStack_of_VAL (val), gint, initial_stack_size);
11772      VAL_is_Active(val) = 1;
11773    }
11774    return 1;
11775}
11776
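@ The sizing argument above can be checked with a little arithmetic.
The following is a minimal sketch, not part of |libmarpa|:
the names |sketch_initial_size| and |sketch_doublings| are hypothetical,
and it assumes that the dynamic stack doubles its capacity each time it
is resized, which this section does not itself confirm.
It computes the initial capacity and counts how many doublings are
needed to reach the worst case of $\size{|tree|}/2+1$ entries.

```c
#include <assert.h>

/* Initial capacity: a fixed fraction of the tree size, with a floor.
   Both parameters are counts of stack entries. */
static long sketch_initial_size(long tree_size, long minimum)
{
    long fraction = tree_size / 1024;
    return fraction > minimum ? fraction : minimum;
}

/* Number of doublings needed to grow from the initial capacity to the
   worst case of tree_size/2 + 1 entries.  Doubling on resize is an
   assumption of this sketch.  Inputs are assumed small enough that
   doubling does not overflow a long. */
static int sketch_doublings(long tree_size, long minimum)
{
    long capacity = sketch_initial_size(tree_size, minimum);
    long worst_case = tree_size / 2 + 1;
    int doublings = 0;
    assert(tree_size >= 0 && minimum > 0);
    while (capacity < worst_case) {
        capacity *= 2;
        doublings++;
    }
    return doublings;
}
```

With a minimum of 2048 entries (8192 bytes of 4-byte integers, itself an
assumption about the platform), a tree of one million forks needs 8 doublings
and a tree of $2^{30}$ forks needs 10.
Because the initial size is a fixed fraction of the worst case,
the count stays bounded by a small constant however large the tree is.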
11777@ @<Private function prototypes@> =
11778static inline void val_destroy(VAL val);
11779@ @<Function definitions@> =
11780static inline void val_destroy(VAL val)
11781{
11782
11783  if (DSTACK_IS_INITIALIZED(val->t_virtual_stack))
11784    {
11785      DSTACK_DESTROY(val->t_virtual_stack);
11786      DSTACK_SAFE(val->t_virtual_stack);
11787    }
11788    val_safe(val);
11789}
11790
11791@ @<Set |b|, |tree|, |val|;
11792return on failure@> = {
11793    @<Fail if recognizer has fatal error@>@;
11794    b = B_of_R(r);
11795    if (!b) {
11796	return failure_indicator;
11797    }
11798    tree = TREE_of_RANK(RANK_of_B(b));
11799    val = VAL_of_TREE(tree);
11800    if (!VAL_is_Active(val)) {
11801	return failure_indicator;
11802    }
11803}
11804
11805@ @<Public function prototypes@> =
11806gint marpa_val_trace(struct marpa_r* r, gint flag);
11807@ @<Function definitions@> =
11808gint marpa_val_trace(struct marpa_r* r, gint flag)
11809{
11810    BOC b;
11811    TREE tree;
11812    VAL val;
11813    @<Return |-2| on failure@>@;
11814    @<Set |b|, |tree|, |val|; return on failure@>@;
11815    VAL_is_Trace(val) = flag;
11816    return 1;
11817}
11818
11819@ @<Public function prototypes@> =
11820Marpa_Fork_ID marpa_val_fork(struct marpa_r* r);
11821@ @<Function definitions@> =
11822Marpa_Fork_ID marpa_val_fork(struct marpa_r* r)
11823{
11824    BOC b;
11825    TREE tree;
11826    VAL val;
11827    @<Return |-2| on failure@>@;
11828    @<Set |b|, |tree|, |val|; return on failure@>@;
11829    return FORK_of_VAL(val);
11830}
11831
11832@ @<Public function prototypes@> =
11833Marpa_Fork_ID marpa_val_event(struct marpa_r* r, Marpa_Event* event);
11834@ @<Function definitions@> =
11835Marpa_Fork_ID marpa_val_event(struct marpa_r* r, Marpa_Event* event)
11836{
11837    BOC b;
11838    TREE tree;
11839    VAL val;
11840    AND and_nodes;
11841    gint semantic_rule_id = -1;
11842    gint token_id = -1;
11843    gpointer token_value = NULL;
11844    gint arg_0 = -1;
11845    gint arg_n = -1;
11846    FORKID fork_ix;
11847    gint continue_with_next_fork;
11848
11849    /* event is not changed in case of hard failure */
11850    @<Return |-2| on failure@>@;
11851    @<Set |b|, |tree|, |val|; return on failure@>@;
11852    and_nodes = ANDs_of_B(b);
11853
11854    arg_0 = arg_n = TOS_of_VAL(val);
11855    fork_ix = FORK_of_VAL(val);
11856    if (fork_ix < 0) {
11857	fork_ix = Size_of_TREE(tree);
11858    }
11859    continue_with_next_fork = !VAL_is_Trace(val);
11860
11861    while (1) {
11862	OR or;
11863	RULE fork_rule;
11864	fork_ix--;
11865	if (fork_ix < 0) goto RETURN_SOFT_ERROR;
11866	{
11867	    ANDID and_node_id;
11868	    AND and_node;
11869	    const FORK fork = FORK_of_TREE_by_IX(tree, fork_ix);
11870	    const gint choice = Choice_of_FORK(fork);
11871	    or = OR_of_FORK(fork);
11872	    and_node_id = and_order_get(b, or, choice);
11873	    and_node = and_nodes + and_node_id;
11874	    token_id = and_node_token(and_node, &token_value);
11875	}
11876	if (token_id >= 0) {
11877	    arg_0 = ++arg_n;
11878	    continue_with_next_fork = 0;
11879	}
11880	fork_rule = RULE_of_OR(or);
11881	if (Position_of_OR(or) == Length_of_RULE(fork_rule)) {
11882	    gint virtual_rhs = RULE_is_Virtual_RHS(fork_rule);
11883	    gint virtual_lhs = RULE_is_Virtual_LHS(fork_rule);
11884	    gint real_symbol_count;
11885	    const DSTACK virtual_stack = &VStack_of_VAL(val);
11886	    if (virtual_lhs) {
11887	        real_symbol_count = Real_SYM_Count_of_RULE(fork_rule);
11888		if (virtual_rhs) {
11889		    *(DSTACK_TOP(*virtual_stack, gint)) += real_symbol_count;
11890		} else {
11891		    *DSTACK_PUSH(*virtual_stack, gint) = real_symbol_count;
11892		}
11893		goto NEXT_FORK;
11894	    }
11895	    if (virtual_rhs) {
11896	        real_symbol_count = Real_SYM_Count_of_RULE(fork_rule);
11897		real_symbol_count += *DSTACK_POP(*virtual_stack, gint);
11898	    } else {
11899	        real_symbol_count = Length_of_RULE(fork_rule);
11900	    }
11901	    arg_0 = arg_n - real_symbol_count + 1;
11902	    semantic_rule_id =
11903	      fork_rule->t_is_semantic_equivalent ?
11904		  fork_rule->t_original : ID_of_RULE(fork_rule);
11905	    continue_with_next_fork = 0;
11906	}
11907	NEXT_FORK: ;
11908	if (!continue_with_next_fork) break;
11909    }
11910
11911    @<Write results to |val| and |event|@>@;
11912    return FORK_of_VAL(val);
11913
11914    RETURN_SOFT_ERROR: ;
11915    @<Write results to |val| and |event|@>@;
11916    return -1;
11917
11918}
11919
11920@ @<Write results to |val| and |event|@> =
11921{
11922    SYMID_of_EVE(event) = token_id;
11923    Value_of_EVE(event) = token_value;
11924    RULEID_of_EVE(event) = semantic_rule_id;
11925    TOS_of_VAL(val) = Arg0_of_EVE(event) = arg_0;
11926    FORK_of_VAL(val) = fork_ix;
11927    ArgN_of_EVE(event) = arg_n;
11928}
11929
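@ The bookkeeping that |marpa_val_event| does with the virtual stack
can be sketched in isolation.
This is a simplified illustration, not the |libmarpa| code:
|struct sketch_piece|, |sketch_step| and |SKETCH_STACK_MAX| are hypothetical names,
a fixed array stands in for the dynamic stack,
and for an ordinary (non-virtual) rule the field |real_symbol_count|
is assumed to hold the rule length.
A piece with a virtual LHS either pushes its count or adds it to the
top of the stack, and reports nothing to the semantics.
A piece with a real LHS reports the symbol count of the original rule.

```c
#include <assert.h>

/* Hypothetical description of one completed rule piece,
   presented in evaluation order. */
struct sketch_piece {
    int virtual_lhs;        /* LHS is an internal symbol from a rewrite */
    int virtual_rhs;        /* RHS contains an internal symbol */
    int real_symbol_count;  /* original-rule symbols in this piece */
};

enum { SKETCH_STACK_MAX = 16 };

/* Returns the symbol count of the original rule when a semantic rule
   completes, or -1 when the step is internal to the rewrite. */
static int sketch_step(int *stack, int *depth, const struct sketch_piece *piece)
{
    if (piece->virtual_lhs) {
        if (piece->virtual_rhs) {
            assert(*depth > 0);
            stack[*depth - 1] += piece->real_symbol_count;  /* extend the top count */
        } else {
            assert(*depth < SKETCH_STACK_MAX);
            stack[(*depth)++] = piece->real_symbol_count;   /* start a new count */
        }
        return -1;
    }
    if (piece->virtual_rhs) {
        assert(*depth > 0);
        return piece->real_symbol_count + stack[--(*depth)]; /* finish the count */
    }
    return piece->real_symbol_count;
}
```

For example, suppose a four-symbol rule is rewritten into three pieces that
contribute 2, 1 and 1 original symbols.
Evaluated innermost first, the first piece pushes 2, the second raises the top
of the stack to 3, and the last pops it and reports 4,
so the semantics sees the original rule with its four arguments.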
11930@** Boolean Vectors.
11931Marpa's boolean vectors are adapted from
11932Steffen Beyer's Bit-Vector package on CPAN.
11933This is a combined Perl package and C library for handling
11934bit vectors.
11935Someone seeking a general bit vector package should
11936look at Steffen's instead.
11937|libmarpa|'s boolean vectors are tightly tied in
11938with its own needs and environment.
11939@<Private typedefs@> =
11940typedef guint Bit_Vector_Word;
11941typedef Bit_Vector_Word* Bit_Vector;
11942@ Some defines and constants
11943@d BV_BITS(bv) *(bv-3)
11944@d BV_SIZE(bv) *(bv-2)
11945@d BV_MASK(bv) *(bv-1)
11946@<Private global variables@> =
11947static const guint bv_wordbits = sizeof(Bit_Vector_Word)*8u;
11948static const guint bv_modmask = sizeof(Bit_Vector_Word)*8u-1u;
11949static const guint bv_hiddenwords = 3;
11950static const guint bv_lsb = 1u;
11951static const guint bv_msb = (1u << (sizeof(Bit_Vector_Word)*8u-1u));
11952
11953@ Given a number of bits, compute the size.
11954@<Function definitions@> =
11955static inline guint bv_bits_to_size(guint bits)
11956{
11957    return (bits+bv_modmask)/bv_wordbits;
11958}
11959@ @<Private function prototypes@> =
11960static inline guint bv_bits_to_size(guint bits);
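@ Given a number of bits, compute the unused-bit mask.
This is the mask which, when ANDed with the last data word,
clears the bits that lie beyond the end of the vector.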
11962@<Function definitions@> =
11963static inline guint bv_bits_to_unused_mask(guint bits)
11964{
11965    guint mask = bits & bv_modmask;
11966    if (mask) mask = (guint) ~(~0uL << mask); else mask = (guint) ~0uL;
11967    return(mask);
11968}
11969@ @<Private function prototypes@> =
11970static inline guint bv_bits_to_unused_mask(guint bits);
11971
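@ A worked example may make the two computations above concrete.
This is a minimal sketch using plain |unsigned int| in place of
|Bit_Vector_Word|;
the names |sketch_bits_to_size|, |sketch_bits_to_mask| and |SKETCH_WORDBITS|
are hypothetical stand-ins, not |libmarpa| identifiers.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for bv_wordbits. */
enum { SKETCH_WORDBITS = sizeof(unsigned int) * CHAR_BIT };

/* Number of data words needed to hold `bits` bits, rounding up. */
static unsigned int sketch_bits_to_size(unsigned int bits)
{
    return (bits + SKETCH_WORDBITS - 1u) / SKETCH_WORDBITS;
}

/* Mask with a 1 for every bit of the last data word that is in use.
   When the bit count is a multiple of the word size, the whole
   last word is in use. */
static unsigned int sketch_bits_to_mask(unsigned int bits)
{
    unsigned int used = bits % SKETCH_WORDBITS;
    assert(used < SKETCH_WORDBITS);
    return used ? ~(~0u << used) : ~0u;
}
```

With 32-bit words, a 70-bit vector needs three data words,
and only the low six bits of the third word are in use,
so its mask is |0x3F|.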
11972@*0 Create a Boolean Vector.
11973@<Private function prototypes@> =
11974static inline Bit_Vector bv_create(guint bits);
11975@ Always start with an all-zero vector.
11976Note this code is a bit tricky ---
11977the pointer returned is to the data.
11978This is offset from the |g_malloc|'d space,
11979by |bv_hiddenwords|.
11980@<Function definitions@> =
11981static inline Bit_Vector bv_create(guint bits)
11982{
11983    guint size = bv_bits_to_size(bits);
    guint bytes = (size + bv_hiddenwords) * sizeof(Bit_Vector_Word);
11985    guint* addr = (Bit_Vector) g_malloc0((size_t) bytes);
11986    *addr++ = bits;
11987    *addr++ = size;
11988    *addr++ = bv_bits_to_unused_mask(bits);
11989    return addr;
11990}
11991
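@ The ``hidden words'' layout can be illustrated with a self-contained sketch.
It uses the standard |calloc| and |free| where the code above uses GLib,
and every name beginning with |sketch| or |SKETCH| is hypothetical.
The point is only that the pointer handed to the caller addresses the
data words, while the bit count, word count and mask sit in the three
words immediately before it.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef unsigned int sketch_word;
enum {
    SKETCH_HIDDEN = 3,
    SKETCH_BITS_PER_WORD = sizeof(sketch_word) * CHAR_BIT
};

/* Allocate a zeroed vector.  The returned pointer addresses the data;
   bv[-3], bv[-2] and bv[-1] hold the bit count, word count and mask. */
static sketch_word *sketch_bv_create(unsigned int bits)
{
    unsigned int size = (bits + SKETCH_BITS_PER_WORD - 1u) / SKETCH_BITS_PER_WORD;
    unsigned int used = bits % SKETCH_BITS_PER_WORD;
    sketch_word *addr = calloc(size + SKETCH_HIDDEN, sizeof(sketch_word));
    if (!addr) return NULL;
    *addr++ = bits;
    *addr++ = size;
    *addr++ = used ? ~(~0u << used) : ~0u;
    return addr;
}

/* Step back over the hidden words to recover the allocated address. */
static void sketch_bv_free(sketch_word *bv)
{
    if (bv) free(bv - SKETCH_HIDDEN);
}

static void sketch_bv_set(sketch_word *bv, unsigned int bit)
{
    assert(bit < bv[-3]);
    bv[bit / SKETCH_BITS_PER_WORD] |= 1u << (bit % SKETCH_BITS_PER_WORD);
}

static int sketch_bv_test(const sketch_word *bv, unsigned int bit)
{
    assert(bit < bv[-3]);
    return (bv[bit / SKETCH_BITS_PER_WORD] >> (bit % SKETCH_BITS_PER_WORD)) & 1u;
}
```

A caller creates a vector, sets and tests bits through the data pointer,
and must release it with the matching free routine,
never by passing the data pointer directly to the allocator's free function.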
11992@*0 Create a Boolean Vector on an Obstack.
11993@<Private function prototypes@> =
11994static inline Bit_Vector bv_obs_create(struct obstack *obs, guint bits);
11995@ Always start with an all-zero vector.
11996Note this code is a bit tricky ---
11997the pointer returned is to the data.
This is offset from the space allocated on the obstack,
11999by |bv_hiddenwords|.
12000@<Function definitions@> =
12001static inline Bit_Vector
12002bv_obs_create (struct obstack *obs, guint bits)
12003{
12004  guint size = bv_bits_to_size (bits);
  guint bytes = (size + bv_hiddenwords) * sizeof (Bit_Vector_Word);
12006  guint *addr = (Bit_Vector) obstack_alloc (obs, (size_t) bytes);
12007  *addr++ = bits;
12008  *addr++ = size;
12009  *addr++ = bv_bits_to_unused_mask (bits);
12010  if (size > 0) {
12011      Bit_Vector bv = addr;
12012      while (size--) *bv++ = 0u;
12013  }
12014  return addr;
12015}
12016
12017
12018@*0 Shadow a Boolean Vector.
12019Create another vector the same size as the original, but with
12020all bits unset.
12021@<Function definitions@> =
12022static inline Bit_Vector bv_shadow(Bit_Vector bv)
12023{
12024    return bv_create(BV_BITS(bv));
12025}
12026@ @<Private function prototypes@> =
12027static inline Bit_Vector bv_shadow(Bit_Vector bv);
12028
@*0 Copy a Boolean Vector.
Copy the contents of |bv_from| into |bv_to|.
The destination must already exist and be the same size as the source.
No allocation takes place; the return value is |bv_to|.
12033@<Function definitions@> = static inline
12034Bit_Vector bv_copy(Bit_Vector bv_to, Bit_Vector bv_from)
12035{
12036    guint *p_to = bv_to;
12037    const guint bits = BV_BITS(bv_to);
12038    if (bits > 0)
12039    {
12040        gint count = BV_SIZE(bv_to);
12041	while (count--) *p_to++ = *bv_from++;
12042    }
12043    return(bv_to);
12044}
12045@ @<Private function prototypes@> =
12046static inline
12047Bit_Vector bv_copy(Bit_Vector bv_to, Bit_Vector bv_from);
12048
12049@*0 Clone a Boolean Vector.
12050Given a boolean vector, creates a new vector which is
12051an exact duplicate.
12052This call allocates a new vector, which must be |g_free|'d.
12053@<Function definitions@> = static inline
12054Bit_Vector bv_clone(Bit_Vector bv)
12055{
12056    return bv_copy(bv_shadow(bv), bv);
12057}
12058@ @<Private function prototypes@> =
12059static inline
12060Bit_Vector bv_clone(Bit_Vector bv);
12061
12062@*0 Free a Boolean Vector.
12063@<Function definitions@> =
12064static inline void bv_free(Bit_Vector vector) {
12065    vector -= bv_hiddenwords;
12066    g_free(vector);
12067}
12068@ @<Private function prototypes@> =
12069static inline void bv_free(Bit_Vector vector);
12070
12071@*0 The Number of Bytes in a Boolean Vector.
12072@<Function definitions@> =
12073static inline gint bv_bytes(Bit_Vector bv) {
12074    return (BV_SIZE(bv)+bv_hiddenwords)*sizeof(Bit_Vector_Word);
12075}
12076@ @<Private function prototypes@> =
12077static inline gint bv_bytes(Bit_Vector bv);
12078
12079@*0 Fill a Boolean Vector.
12080@<Function definitions@> =
12081static inline void bv_fill(Bit_Vector bv)
12082{
    guint size = BV_SIZE(bv);
    const guint mask = BV_MASK(bv); /* read the mask before |bv| is advanced */
    if (size <= 0) return;
    while (size--) *bv++ = ~0u;
    *(--bv) &= mask;
12088}
12089@ @<Private function prototypes@> =
12090static inline void bv_fill(Bit_Vector bv);
12091
12092@*0 Clear a Boolean Vector.
12093@ @<Private function prototypes@> =
12094static inline void bv_clear(Bit_Vector bv);
12095@ @<Function definitions@> =
12096static inline void bv_clear(Bit_Vector bv)
12097{
12098    guint size = BV_SIZE(bv);
12099    if (size <= 0) return;
12100    while (size--) *bv++ = 0u;
12101}
12102
12103@ This function "overclears" ---
12104it clears "too many bits".
12105It clears a prefix of the bit vector faster
12106than an interval clear, at the expense of often
12107clearing more bits than were requested.
12108In some situations clearing the extra bits is OK.
12109@<Private function prototypes@> =
12110static inline void bv_over_clear(Bit_Vector bv, guint bit);
12111@ @<Function definitions@> =
12112static inline void bv_over_clear(Bit_Vector bv, guint bit)
12113{
12114    guint length = bit/bv_wordbits+1;
12115    while (length--) *bv++ = 0u;
12116}
12117
12118@*0 Set a Boolean Vector Bit.
12119@ @<Function definitions@> =
12120static inline void bv_bit_set(Bit_Vector vector, guint bit) {
12121    *(vector+(bit/bv_wordbits)) |= (bv_lsb << (bit%bv_wordbits));
12122}
12123@ @<Private function prototypes@> =
12124static inline void bv_bit_set(Bit_Vector vector, guint bit);
12125
12126@*0 Clear a Boolean Vector Bit.
12127@<Function definitions@> =
12128static inline void bv_bit_clear(Bit_Vector vector, guint bit) {
12129    *(vector+(bit/bv_wordbits)) &= ~ (bv_lsb << (bit%bv_wordbits));
12130}
12131@ @<Private function prototypes@> =
12132static inline void bv_bit_clear(Bit_Vector vector, guint bit);
12133
12134@*0 Test a Boolean Vector Bit.
12135@<Function definitions@> =
12136static inline gboolean bv_bit_test(Bit_Vector vector, guint bit) {
12137    return (*(vector+(bit/bv_wordbits)) & (bv_lsb << (bit%bv_wordbits))) != 0u;
12138}
12139@ @<Private function prototypes@> =
12140static inline gboolean bv_bit_test(Bit_Vector vector, guint bit);
12141
12142@*0 Test and Set a Boolean Vector Bit.
Ensure that a bit is set, and return its previous value to the caller.
12144@ @<Private function prototypes@> =
12145static inline gboolean bv_bit_test_and_set(Bit_Vector vector, guint bit);
12146@ @<Function definitions@> =
12147static inline gboolean
12148bv_bit_test_and_set (Bit_Vector vector, guint bit)
12149{
12150  Bit_Vector addr = vector + (bit / bv_wordbits);
12151  guint mask = bv_lsb << (bit % bv_wordbits);
12152  if ((*addr & mask) != 0u)
12153    return 1;
12154  *addr |= mask;
12155  return 0;
12156}
12157
12159@*0 Test a Boolean Vector for all Zeroes.
12160@<Function definitions@> =
12161static inline
12162gboolean bv_is_empty(Bit_Vector addr)
12163{
12164    guint  size = BV_SIZE(addr);
12165    gboolean r = TRUE;
12166    if (size > 0) {
12167        *(addr+size-1) &= BV_MASK(addr);
12168        while (r && (size-- > 0)) r = ( *addr++ == 0 );
12169    }
12170    return(r);
12171}
12172@ @<Private function prototypes@> =
12173static inline
12174gboolean bv_is_empty(Bit_Vector addr);
12175
12176@*0 Bitwise-negate a Boolean Vector.
12177@<Function definitions@>=
12178static inline void bv_not(Bit_Vector X, Bit_Vector Y)
12179{
12180    guint size = BV_SIZE(X);
12181    guint mask = BV_MASK(X);
12182    while (size-- > 0) *X++ = ~*Y++;
12183    *(--X) &= mask;
12184}
12185@ @<Private function prototypes@> =
12186static inline void bv_not(Bit_Vector X, Bit_Vector Y);
12187
12188@*0 Bitwise-and a Boolean Vector.
12189@<Function definitions@>=
12190static inline void bv_and(Bit_Vector X, Bit_Vector Y, Bit_Vector Z)
12191{
12192    guint size = BV_SIZE(X);
12193    guint mask = BV_MASK(X);
12194    while (size-- > 0) *X++ = *Y++ & *Z++;
12195    *(--X) &= mask;
12196}
12197@ @<Private function prototypes@> =
12198static inline void bv_and(Bit_Vector X, Bit_Vector Y, Bit_Vector Z);
12199
12200@*0 Bitwise-or a Boolean Vector.
12201@<Function definitions@>=
12202static inline void bv_or(Bit_Vector X, Bit_Vector Y, Bit_Vector Z)
12203{
12204    guint size = BV_SIZE(X);
12205    guint mask = BV_MASK(X);
12206    while (size-- > 0) *X++ = *Y++ | *Z++;
12207    *(--X) &= mask;
12208}
12209@ @<Private function prototypes@> =
12210static inline void bv_or(Bit_Vector X, Bit_Vector Y, Bit_Vector Z);
12211
12212@*0 Bitwise-or-assign a Boolean Vector.
12213@<Function definitions@>=
12214static inline void bv_or_assign(Bit_Vector X, Bit_Vector Y)
12215{
12216    guint size = BV_SIZE(X);
12217    guint mask = BV_MASK(X);
12218    while (size-- > 0) *X++ |= *Y++;
12219    *(--X) &= mask;
12220}
12221@ @<Private function prototypes@> =
12222static inline void bv_or_assign(Bit_Vector X, Bit_Vector Y);
12223
12224@*0 Scan a Boolean Vector.
12225@<Function definitions@>=
12226static inline
12227gboolean bv_scan(Bit_Vector bv, guint start,
12228                                    guint* min, guint* max)
12229{
12230    guint  size = BV_SIZE(bv);
12231    guint  mask = BV_MASK(bv);
12232    guint  offset;
12233    guint  bitmask;
12234    guint  value;
12235    gboolean empty;
12236
12237    if (size == 0) return FALSE;
12238    if (start >= BV_BITS(bv)) return FALSE;
12239    *min = start;
12240    *max = start;
12241    offset = start / bv_wordbits;
12242    *(bv+size-1) &= mask;
12243    bv += offset;
12244    size -= offset;
12245    bitmask = (guint)1 << (start & bv_modmask);
12246    mask = ~ (bitmask | (bitmask - (guint)1));
12247    value = *bv++;
12248    if ((value & bitmask) == 0)
12249    {
12250        value &= mask;
12251        if (value == 0)
12252        {
12253            offset++;
12254            empty = TRUE;
12255            while (empty && (--size > 0))
12256            {
12257                if ((value = *bv++)) empty = FALSE; else offset++;
12258            }
12259            if (empty) return FALSE;
12260        }
12261        start = offset * bv_wordbits;
12262        bitmask = bv_lsb;
12263        mask = value;
12264        while (!(mask & bv_lsb))
12265        {
12266            bitmask <<= 1;
12267            mask >>= 1;
12268            start++;
12269        }
12270        mask = ~ (bitmask | (bitmask - 1));
12271        *min = start;
12272        *max = start;
12273    }
12274    value = ~ value;
12275    value &= mask;
12276    if (value == 0)
12277    {
12278        offset++;
12279        empty = TRUE;
12280        while (empty && (--size > 0))
12281        {
12282            if ((value = ~ *bv++)) empty = FALSE; else offset++;
12283        }
12284        if (empty) value = bv_lsb;
12285    }
12286    start = offset * bv_wordbits;
12287    while (! (value & bv_lsb))
12288    {
12289        value >>= 1;
12290        start++;
12291    }
12292    *max = --start;
12293    return TRUE;
12294}
12295@ @<Private function prototypes@> =
12296static inline
12297gboolean bv_scan(
12298    Bit_Vector bv, guint start, guint* min, guint* max);
12299
12300@*0 Count the bits in a Boolean Vector.
12301@<Function definitions@>=
12302static inline guint
12303bv_count (Bit_Vector v)
12304{
12305  guint start, min, max;
12306  guint count = 0;
12307  for (start = 0; bv_scan (v, start, &min, &max); start = max + 2)
12308    {
12309      count += max - min + 1;
12310    }
12311    return count;
12312}
12313@ @<Private function prototypes@> =
12314static inline guint bv_count (Bit_Vector v);
12315
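@ The contract of |bv_scan|, and the way |bv_count| drives it,
may be easier to see in a deliberately simplified form.
The sketch below is not the word-at-a-time algorithm used above:
it examines one bit at a time and takes the bit count as a parameter
instead of reading it from hidden words.
The names |sketch_test|, |sketch_scan| and |sketch_count| are hypothetical.
What it preserves is the interface:
find the first run of set bits at or after |start|,
report its inclusive bounds, and resume at |max + 2|,
because the bit at |max + 1| is already known to be clear or past the end.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum { SKETCH_WB = sizeof(unsigned int) * CHAR_BIT };

static int sketch_test(const unsigned int *words, unsigned int bit)
{
    return (words[bit / SKETCH_WB] >> (bit % SKETCH_WB)) & 1u;
}

/* Find the first run of set bits at or after `start`.
   On success store its inclusive bounds in *min and *max and return 1;
   return 0 if no set bit remains. */
static int sketch_scan(const unsigned int *words, unsigned int bits,
                       unsigned int start, unsigned int *min, unsigned int *max)
{
    unsigned int i = start;
    assert(words != NULL && min != NULL && max != NULL);
    while (i < bits && !sketch_test(words, i)) i++;
    if (i >= bits) return 0;
    *min = i;
    while (i + 1 < bits && sketch_test(words, i + 1)) i++;
    *max = i;
    return 1;
}

/* Count set bits by summing the lengths of the runs. */
static unsigned int sketch_count(const unsigned int *words, unsigned int bits)
{
    unsigned int start, min, max, count = 0;
    for (start = 0; sketch_scan(words, bits, start, &min, &max); start = max + 2)
        count += max - min + 1;
    return count;
}
```

For the eight-bit vector with bits 0, 2, 3 and 4 set,
the scan reports the runs 0 to 0 and 2 to 4, and the count is 4.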
12316@*0 The RHS Closure of a Vector.
12317Despite the fact that they are actually tied closely to their
12318use in |libmarpa|, most of the logic of boolean vectors has
12319a ``pure math" appearance.
12320This routine has a direct connection with the grammar.
12321\par
12322Several properties of symbols that need to be determined
12323have the property that, if
12324all the symbols on the RHS of any rule have that property,
12325so does its LHS symbol.
12326@ The RHS closure looks a lot like the transitive closure,
12327but there are several major differences.
12328The biggest difference is that
12329the RHS closure deals with properties and takes a {\bf vector} to another
12330vector;
12331the transitive closure is for a relation and takes a transition {\bf matrix}
12332to another transition matrix.
12333@ There are two properties of the RHS closure to note.
12334First, it is reflexive.
12335Any symbol in a set is in the RHS closure of that set.
12336@ Second, the RHS closure is vacuously true.
12337For any RHS closure property,
12338every symbol which is on the LHS of an empty rule has that property.
12339This means the RHS closure operation can only be used for
12340properties which can meaningfully be regarded as vacuously
12341true.
12342In |libmarpa|, two important symbol properties are
RHS closure properties:
12344the property of being productive,
12345and the property of being nullable.
12346
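@ A toy example may help fix the idea of an RHS closure.
The sketch below is an illustration only:
it uses a plain fixed-point loop over an array of rules
instead of the work stack and bit vectors that |libmarpa| uses,
and the names |struct sketch_prod|, |sketch_rhs_closure| and |SKETCH_MAX_RHS|
are hypothetical.
A symbol is marked if it was marked on entry,
or if it is the LHS of a rule all of whose RHS symbols are marked.
In this sketch the vacuous case needs no special handling,
since an empty RHS has no unmarked symbols.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_MAX_RHS = 4 };

/* Hypothetical rule representation: symbols are small integers. */
struct sketch_prod {
    int lhs;
    int length;
    int rhs[SKETCH_MAX_RHS];
};

/* Repeatedly mark the LHS of every rule whose RHS symbols are all marked,
   until a full pass over the rules changes nothing.
   `marked` holds one flag per symbol and is updated in place. */
static void sketch_rhs_closure(const struct sketch_prod *rules, int rule_count,
                               unsigned char *marked)
{
    int changed = 1;
    assert(rules != NULL && marked != NULL);
    while (changed) {
        int r;
        changed = 0;
        for (r = 0; r < rule_count; r++) {
            int i, all_marked = 1;
            if (marked[rules[r].lhs]) continue;
            for (i = 0; i < rules[r].length; i++) {
                if (!marked[rules[r].rhs[i]]) { all_marked = 0; break; }
            }
            if (all_marked) {
                marked[rules[r].lhs] = 1;
                changed = 1;
            }
        }
    }
}
```

Take productivity as the property, with symbols $S=0$, $A=1$, $B=2$, $a=3$, $C=4$
and rules $S \to A\,B$, $A \to a$, $B \to \epsilon$ and $C \to C\,a$.
Starting with only the terminal $a$ marked,
the closure marks $A$, $B$ and $S$, and leaves $C$ unmarked,
because its only rule depends on $C$ itself.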
12347@*0 Produce the RHS Closure of a Vector.
12348This routine takes a symbol vector and a grammar,
12349and turns the original vector into the RHS closure of that vector.
The original vector is destroyed.
12351\par
12352If I decide rules should have a unique right hand symbol list,
12353this is one place to use it.
12354Duplicate symbols on the RHS are visited uselessly.
12355@<Function definitions@> =
12356static void
12357rhs_closure (struct marpa_g *g, Bit_Vector bv)
12358{
12359  guint min, max, start = 0;
12360  Marpa_Symbol_ID *top_of_stack = NULL;
12361  FSTACK_DECLARE (stack, Marpa_Symbol_ID)@;
12362  FSTACK_INIT (stack, Marpa_Symbol_ID, SYM_Count_of_G(g));
12363  while (bv_scan (bv, start, &min, &max))
12364    {
12365      guint symid;
12366      for (symid = min; symid <= max; symid++)
12367	{
12368	  *(FSTACK_PUSH (stack)) = symid;
12369	}
12370      start = max + 2;
12371    }
12372  while ((top_of_stack = FSTACK_POP (stack)))
12373    {
12374      guint rule_ix;
12375      GArray *rules = SYM_by_ID (*top_of_stack)->t_rhs;
12376      for (rule_ix = 0; rule_ix < rules->len; rule_ix++)
12377	{
12378	  Marpa_Rule_ID rule_id =
12379	    g_array_index (rules, Marpa_Rule_ID, rule_ix);
12380	  RULE rule = RULE_by_ID (g, rule_id);
12381	  guint rule_length;
12382	  guint rh_ix;
12383	  Marpa_Symbol_ID lhs_id = LHS_ID_of_RULE (rule);
12384	  if (bv_bit_test (bv, (guint) lhs_id))
12385	    goto NEXT_RULE;
12386	  rule_length = Length_of_RULE(rule);
12387	  for (rh_ix = 0; rh_ix < rule_length; rh_ix++)
12388	    {
12389	      if (!bv_bit_test (bv, (guint) RHS_ID_of_RULE (rule, rh_ix)))
12390		goto NEXT_RULE;
12391	    }
12392	  /* If I am here, the bits for the RHS symbols are all
12393	   * set, but the one for the LHS symbol is not.
12394	   */
12395	  bv_bit_set (bv, (guint) lhs_id);
12396	  *(FSTACK_PUSH (stack)) = lhs_id;
12397	NEXT_RULE:;
12398	}
12399    }
12400  FSTACK_DESTROY (stack);
12401}
12402@ @<Private function prototypes@> =
12403static void rhs_closure(struct marpa_g* g, Bit_Vector bv);
12404
12405@** Boolean Matrixes.
12406Marpa's Boolean matrixes are implemented differently
12407from the matrixes in
12408Steffen Beyer's Bit-Vector package on CPAN,
but, like Beyer's matrixes, are built on that package.
12410Beyer's matrixes are a single Boolean vector
12411which special routines index by row and column.
12412Marpa's matrixes are arrays of vectors.
12413
Since there are ``hidden words'' before the data
in each vector, Marpa must repeat these for each
row of a matrix.  Consequences:
12417\li Marpa matrixes use a few extra bytes per row of space.
12418\li Marpa's matrix pointers cannot be used as vectors.
12419\li Marpa's rows {\bf can} be used as vectors.
12420\li Marpa's matrix pointers point to the beginning of
12421the allocated space.  |Bit_Vector| pointers use trickery
12422and include ``hidden words" before the pointer.
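@ The layout described above can be illustrated with a small standalone sketch.
It is not part of |libmarpa|.
It assumes 32-bit words and three hidden words per row
(bit count, data word count, and a mask for the last data word),
and every name in it, such as |sketch_matrix_create| and |sketch_matrix_row|,
is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumptions for this sketch: 32-bit words, and three hidden words per
 * row (bit count, data word count, mask for the last data word). */
enum { HIDDEN_WORDS = 3, WORD_BITS = 32 };

typedef uint32_t word_t;

/* Number of data words needed to hold `bits` bits. */
static unsigned data_words(unsigned bits) {
    return (bits + WORD_BITS - 1) / WORD_BITS;
}

/* Allocate a zeroed matrix: each row is [hidden words][data words]. */
static word_t *sketch_matrix_create(unsigned rows, unsigned columns) {
    unsigned words_per_row = HIDDEN_WORDS + data_words(columns);
    word_t *m = calloc((size_t)rows * words_per_row, sizeof(word_t));
    unsigned row;
    if (!m) return NULL;
    for (row = 0; row < rows; row++) {
        word_t *hidden = m + (size_t)row * words_per_row;
        unsigned used = columns % WORD_BITS;
        hidden[0] = columns;
        hidden[1] = data_words(columns);
        hidden[2] = used ? ((word_t)1 << used) - 1 : ~(word_t)0;
    }
    return m;
}

/* A row pointer points just past that row's hidden words, so it has
 * the same shape as a standalone vector pointer. */
static word_t *sketch_matrix_row(word_t *m, unsigned row) {
    word_t *row0 = m + HIDDEN_WORDS;
    unsigned words_per_row = HIDDEN_WORDS + row0[-2];
    return row0 + (size_t)row * words_per_row;
}

static void sketch_bit_set(word_t *vector, unsigned bit) {
    vector[bit / WORD_BITS] |= (word_t)1 << (bit % WORD_BITS);
}

static int sketch_bit_test(const word_t *vector, unsigned bit) {
    return (vector[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1u;
}
```

The matrix pointer is the start of the allocation, while each row pointer is offset past its own hidden words.
That is why a row can be handed to vector routines but the matrix pointer cannot.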
@ Note that the |typedef|'s for |Bit_Matrix|
12424and |Bit_Vector| are identical.
12425@s Bit_Matrix int
12426@<Private typedefs@> =
12427typedef Bit_Vector_Word* Bit_Matrix;
12428
12429@*0 Create a Boolean Matrix.
12430@ Here the pointer returned is the actual start of the
12431|g_malloc|'d space.
12432This is {\bf not} the case with vectors, whose pointer is offset for
12433the ``hidden words".
12434@<Function definitions@> =
12435static inline Bit_Matrix matrix_create(guint rows, guint columns)
12436{
12437    guint bv_data_words = bv_bits_to_size(columns);
12438    guint row_bytes = (bv_data_words + bv_hiddenwords) * sizeof(Bit_Vector_Word);
12439    guint bv_mask = bv_bits_to_unused_mask(columns);
    Bit_Vector_Word* matrix_addr = g_malloc0((size_t)row_bytes * rows);
12441    guint row;
12442    for (row = 0; row < rows; row++) {
12443	guint row_start = row*(bv_data_words+bv_hiddenwords);
12444	matrix_addr[row_start] = columns;
12445	matrix_addr[row_start+1] = bv_data_words;
12446	matrix_addr[row_start+2] = bv_mask;
12447    }
12448    return matrix_addr;
12449}
12450@ @<Private function prototypes@> =
12451static inline Bit_Matrix matrix_create(guint rows, guint columns);
12452
12453@*0 Free a Boolean Matrix.
12454@<Function definitions@> =
12455static inline void matrix_free(Bit_Matrix matrix) {
12456    g_free(matrix);
12457}
12458@ @<Private function prototypes@> =
12459static inline void matrix_free(Bit_Matrix matrix);
12460
12461@*0 Find the Number of Columns in a Boolean Matrix.
12462The column count returned is for the first row.
12463It is assumed that
12464all rows have the same number of columns.
12465Note that, in this implementation, the matrix has no
12466idea internally of how many rows it has.
12467@<Function definitions@> =
12468static inline gint matrix_columns(Bit_Matrix matrix) {
12469    Bit_Vector row0 = matrix+bv_hiddenwords;
12470     return BV_BITS(row0);
12471}
12472@ @<Private function prototypes@> =
12473static inline gint matrix_columns(Bit_Matrix matrix);
12474
12475@*0 Find a Row of a Boolean Matrix.
12476Here's where the slight extra overhead of repeating
12477identical ``hidden word" data for each row of a matrix
12478pays off.
This simply returns a pointer into the matrix.
That is adequate as long as any change made through the
pointer is meant to change the matrix itself.
A caller that wants to alter the row without altering the matrix
should clone the vector first.
12482There is a bit of arithmetic, to deal with the
12483hidden words offset.
12484@<Function definitions@> =
12485static inline Bit_Vector matrix_row(Bit_Matrix matrix, guint row) {
12486    Bit_Vector row0 = matrix+bv_hiddenwords;
12487    guint words_per_row = BV_SIZE(row0)+bv_hiddenwords;
12488    return row0 + row*words_per_row;
12489}
12490@ @<Private function prototypes@> =
12491static inline Bit_Vector matrix_row(Bit_Matrix matrix, guint row);
12492
12493@*0 Set a Boolean Matrix Bit.
12494@ @<Function definitions@> =
12495static inline void matrix_bit_set(Bit_Matrix matrix, guint row, guint column) {
12496    Bit_Vector vector = matrix_row(matrix, row);
12497    bv_bit_set(vector, column);
12498}
12499@ @<Private function prototypes@> =
12500static inline void matrix_bit_set(Bit_Matrix matrix, guint row, guint column);
12501
12502@*0 Clear a Boolean Matrix Bit.
12503@ @<Function definitions@> =
12504static inline void matrix_bit_clear(Bit_Matrix matrix, guint row, guint column) {
12505    Bit_Vector vector = matrix_row(matrix, row);
12506    bv_bit_clear(vector, column);
12507}
12508@ @<Private function prototypes@> =
12509static inline void matrix_bit_clear(Bit_Matrix matrix, guint row, guint column);
12510
12511@*0 Test a Boolean Matrix Bit.
12512@ @<Function definitions@> =
12513static inline gboolean matrix_bit_test(Bit_Matrix matrix, guint row, guint column) {
12514    Bit_Vector vector = matrix_row(matrix, row);
12515    return bv_bit_test(vector, column);
12516}
12517@ @<Private function prototypes@> =
12518static inline gboolean matrix_bit_test(Bit_Matrix matrix, guint row, guint column);
12519
12520@*0 Produce the Transitive Closure of a Boolean Matrix.
12521This routine takes a matrix representing a relation
12522and produces a matrix that represents the transitive closure
12523of the relation.
12524The matrix is assumed to be square.
The closure is computed in place, so the contents of the input matrix are overwritten.
12526@<Function definitions@> =
12527static void transitive_closure(Bit_Matrix matrix)
12528{
12529      struct transition { guint from, to; } * top_of_stack = NULL;
12530      guint size = matrix_columns(matrix);
12531      guint row;
12532      DSTACK_DECLARE(stack);
12533      DSTACK_INIT(stack, struct transition, 1024);
12534      for (row = 0; row < size; row++) {
12535          guint min, max, start;
12536	  Bit_Vector row_vector = matrix_row(matrix, row);
12537	for ( start = 0; bv_scan(row_vector, start, &min, &max); start = max+2 ) {
12538	    guint column;
12539	    for (column = min; column <= max; column++) {
12540		struct transition *t = DSTACK_PUSH(stack, struct transition);
12541		t->from = row;
12542		t->to = column;
12543    } } }
12544    while ((top_of_stack = DSTACK_POP(stack, struct transition))) {
12545	guint old_from = top_of_stack->from;
12546	guint old_to = top_of_stack->to;
12547	guint new_ix;
12548	for (new_ix = 0; new_ix < size; new_ix++) {
12549	     /* Optimizations based on reuse of the same row are
12550	       probably best left to the compiler's optimizer.
12551	      */
12552	     if (!matrix_bit_test(matrix, new_ix, old_to) &&
12553	     matrix_bit_test(matrix, new_ix, old_from)) {
12554		 struct transition *t = (DSTACK_PUSH(stack, struct transition));
12555		  matrix_bit_set(matrix, new_ix, old_to);
12556		 t->from = new_ix;
12557		 t->to = old_to;
12558		}
12559	     if (!matrix_bit_test(matrix, old_from, new_ix) &&
12560	     matrix_bit_test(matrix, old_to, new_ix)) {
12561		 struct transition *t = (DSTACK_PUSH(stack, struct transition));
12562		  matrix_bit_set(matrix, old_from, new_ix);
12563		 t->from = old_from;
12564		 t->to = new_ix;
12565		}
12566	}
12567    }
12568      DSTACK_DESTROY(stack);
12569}
12570@ @<Private function prototypes@> =
12571static void transitive_closure(Bit_Matrix matrix);
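@ The worklist technique used by |transitive_closure| can be seen in isolation
in the following standalone sketch.
It is an illustration only: it uses a plain array of bytes instead of a |Bit_Matrix|,
a fixed dimension |N|, and a fixed-size array as the worklist,
all of which are assumptions made for brevity.

```c
#include <assert.h>

enum { N = 4 };   /* matrix dimension, fixed for the sketch */

struct transition { unsigned from, to; };

/* Overwrite `m` with its transitive closure, using a worklist. */
static void sketch_transitive_closure(unsigned char m[N][N]) {
    /* Each of the N*N bits is pushed at most once: either because it
     * was set on entry, or at the moment it is first set. */
    struct transition stack[N * N];
    unsigned count = 0;
    unsigned row, column;
    for (row = 0; row < N; row++)
        for (column = 0; column < N; column++)
            if (m[row][column]) {
                stack[count].from = row;
                stack[count].to = column;
                count++;
            }
    while (count > 0) {
        struct transition t = stack[--count];
        unsigned ix;
        for (ix = 0; ix < N; ix++) {
            /* ix -> from and from -> to imply ix -> to */
            if (m[ix][t.from] && !m[ix][t.to]) {
                m[ix][t.to] = 1;
                stack[count].from = ix;
                stack[count].to = t.to;
                count++;
            }
            /* from -> to and to -> ix imply from -> ix */
            if (m[t.to][ix] && !m[t.from][ix]) {
                m[t.from][ix] = 1;
                stack[count].from = t.from;
                stack[count].to = ix;
                count++;
            }
        }
    }
}
```

Because a transition is pushed only when its bit changes from clear to set,
the worklist can never hold more than $N^2$ entries,
which is why a fixed-size array suffices in the sketch.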
12572
12573@** Efficient Stacks and Queues.
12574@ The interface for these macros is somewhat hackish,
12575in that the user often
12576must be aware of the implementation of the
12577macros.
Arguably, using these macros is not
all that much easier than
hand-writing each instance.
But the most important goal was safety: by
writing this code once I have greater assurance
that it is tested and bug-free.
12584Another important goal was that there be
12585no compromise on efficiency,
12586when compared to hand-written code.
12587
12588@*0 Fixed Size Stacks.
12589|libmarpa| uses stacks and worklists extensively.
12590Often a reasonable maximum size is known when they are
12591set up, in which case they can be made very fast.
12592@d FSTACK_DECLARE(stack, type) struct { gint t_count; type* t_base; } stack;
12593@d FSTACK_CLEAR(stack) ((stack).t_count = 0)
12594@d FSTACK_INIT(stack, type, n) (FSTACK_CLEAR(stack), ((stack).t_base = g_new(type, n)))
12595@d FSTACK_SAFE(stack) ((stack).t_base = NULL)
12596@d FSTACK_BASE(stack, type) ((type *)(stack).t_base)
12597@d FSTACK_INDEX(this, type, ix) (FSTACK_BASE((this), type)+(ix))
12598@d FSTACK_TOP(this, type) (FSTACK_LENGTH(this) <= 0
12599   ? NULL
12600   : FSTACK_INDEX((this), type, FSTACK_LENGTH(this)-1))
12601@d FSTACK_LENGTH(stack) ((stack).t_count)
@d FSTACK_PUSH(stack) ((stack).t_base+(stack).t_count++)
12603@d FSTACK_POP(stack) ((stack).t_count <= 0 ? NULL : (stack).t_base+(--(stack).t_count))
12604@d FSTACK_IS_INITIALIZED(stack) ((stack).t_base)
12605@d FSTACK_DESTROY(stack) (g_free((stack).t_base))
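@ A minimal usage sketch of the fixed size stack idiom follows.
To keep it standalone it uses |malloc| and |free| in place of GLib,
and the |SK_FSTACK_|$\ldots$ macros and |sketch_fstack_reverse| are
hypothetical stand-ins, not part of |libmarpa|.
The caller is assumed to know that no more than |n| pushes will occur,
and that |n| is positive.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the FSTACK macros, with malloc/free instead of GLib. */
#define SK_FSTACK_DECLARE(stack, type) struct { int t_count; type *t_base; } stack
#define SK_FSTACK_INIT(stack, type, n) \
    ((stack).t_count = 0, (stack).t_base = malloc(sizeof(type) * (n)))
#define SK_FSTACK_PUSH(stack) ((stack).t_base + (stack).t_count++)
#define SK_FSTACK_POP(stack) \
    ((stack).t_count <= 0 ? NULL : (stack).t_base + (--(stack).t_count))
#define SK_FSTACK_DESTROY(stack) (free((stack).t_base))

/* Push the n values of `in`, then pop them all into `out`.
 * Returns the number of values popped, or -1 if allocation failed. */
static int sketch_fstack_reverse(const int *in, int *out, int n) {
    SK_FSTACK_DECLARE(stack, int);
    int *top;
    int i, popped = 0;
    SK_FSTACK_INIT(stack, int, n);
    if (!stack.t_base) return -1;
    for (i = 0; i < n; i++)
        *(SK_FSTACK_PUSH(stack)) = in[i];   /* no overflow check: n is the bound */
    while ((top = SK_FSTACK_POP(stack)))    /* NULL signals an empty stack */
        out[popped++] = *top;
    SK_FSTACK_DESTROY(stack);
    return popped;
}
```

The push does no bounds checking at all, which is where the speed comes from.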
12606
12607@*0 Dynamic Stacks.
12608|libmarpa| uses stacks and worklists extensively.
12609This stack interface resizes itself dynamically.
12610There are two disadvantages.
12611
12612\li There is more overhead ---
12613overflow must be checked for with each push,
12614and the resizings, while fast, do take time.
12615
12616\li The stack may be moved after any |DSTACK_PUSH|
12617operation, making all pointers into it invalid.
12618Data must be retrieved from the stack before the
12619next |DSTACK_PUSH|.
12620
12621@d DSTACK_DECLARE(this) struct s_dstack this
12622@d DSTACK_INIT(this, type, initial_size)
12623  (((this).t_count = 0),
12624  ((this).t_base = g_new(type, ((this).t_capacity = (initial_size)))))
12625
12626@ |DSTACK_SAFE| is for cases where the dstack is not
12627immediately initialized to a useful value,
12628and might never be.
12629All fields are zeroed so that when the containing object
12630is destroyed, the deallocation logic knows that no
12631memory has been allocated and therefore no attempt
12632to free memory should be made.
12633@d DSTACK_IS_INITIALIZED(this) ((this).t_base)
12634@d DSTACK_SAFE(this)
12635  (((this).t_count = (this).t_capacity = 0), ((this).t_base = NULL))
12636
12637@ A stack reinitialized by
12638|DSTACK_CLEAR| contains 0 elements,
12639but has the same capacity as it had before the reinitialization.
12640This saves the cost of reallocating the dstack's buffer,
12641and leaves its capacity at what is hopefully
12642a stable, high-water mark, which will make future
12643resizings unnecessary.
12644@d DSTACK_CLEAR(this) ((this).t_count = 0)
12645@d DSTACK_PUSH(this, type)
12646    (((this).t_count >= (this).t_capacity ? dstack_resize(&(this), sizeof(type)) : 0),
12647     ((type *)(this).t_base+(this).t_count++))
12648@d DSTACK_POP(this, type) ((this).t_count <= 0 ? NULL :
12649    ( (type*)(this).t_base+(--(this).t_count)))
12650@d DSTACK_INDEX(this, type, ix) (DSTACK_BASE((this), type)+(ix))
12651@d DSTACK_TOP(this, type) (DSTACK_LENGTH(this) <= 0
12652   ? NULL
12653   : DSTACK_INDEX((this), type, DSTACK_LENGTH(this)-1))
12654@d DSTACK_BASE(this, type) ((type *)(this).t_base)
12655@d DSTACK_LENGTH(this) ((this).t_count)
12656
12657@
|DSTACK|'s can have their data ``stolen" by other containers.
12659The |STOLEN_DSTACK_DATA_FREE| macro is intended
12660to help the ``thief" container
12661deallocate the data it now has ``stolen".
12662@d STOLEN_DSTACK_DATA_FREE(data) ((data) && (g_free(data), 1))
@d DSTACK_DESTROY(this) STOLEN_DSTACK_DATA_FREE((this).t_base)
12664
12665@<Private incomplete structures@> =
12666struct s_dstack;
12667typedef struct s_dstack* DSTACK;
12668@ @<Private utility structures@> =
12669struct s_dstack { gint t_count; gint t_capacity; gpointer t_base; };
12670@ @<Function definitions@> =
12671static inline gpointer dstack_resize(struct s_dstack* this, gsize type_bytes) {
12672    this->t_capacity *= 2;
12673    this->t_base = g_realloc(this->t_base, this->t_capacity*type_bytes);
12674    return this->t_base;
12675}
12676@ @<Private function prototypes@> =
12677static inline gpointer dstack_resize(struct s_dstack* this, gsize type_size);
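@ The following standalone sketch shows the doubling behavior of a dynamic stack
and the rule that a pushed pointer must be used before the next push.
It substitutes |malloc| and |realloc| for GLib, aborts on allocation failure,
and uses hypothetical names (|sk_dstack|, |SK_DSTACK_PUSH|, |sketch_dstack_fill|);
it is not the |libmarpa| implementation.

```c
#include <assert.h>
#include <stdlib.h>

struct sk_dstack { int t_count; int t_capacity; void *t_base; };

/* Double the capacity; the buffer may move.  Aborts on failure. */
static void *sk_dstack_resize(struct sk_dstack *s, size_t type_bytes) {
    void *p;
    s->t_capacity *= 2;
    p = realloc(s->t_base, (size_t)s->t_capacity * type_bytes);
    if (!p) abort();
    s->t_base = p;
    return p;
}

#define SK_DSTACK_INIT(s, type, n) \
    ((s).t_count = 0, (s).t_capacity = (n), \
     (s).t_base = malloc(sizeof(type) * (n)))
#define SK_DSTACK_PUSH(s, type) \
    (((s).t_count >= (s).t_capacity ? sk_dstack_resize(&(s), sizeof(type)) : 0), \
     ((type *)(s).t_base + (s).t_count++))
#define SK_DSTACK_POP(s, type) \
    ((s).t_count <= 0 ? NULL : ((type *)(s).t_base + (--(s).t_count)))

/* Push 0..n-1 onto a stack that starts with room for one element, then
 * pop everything.  Returns the sum of the popped values and reports the
 * final capacity through *capacity_out. */
static int sketch_dstack_fill(int n, int *capacity_out) {
    struct sk_dstack stack;
    int *top;
    int i, sum = 0;
    SK_DSTACK_INIT(stack, int, 1);
    if (!stack.t_base) abort();
    for (i = 0; i < n; i++) {
        /* Use the pointer at once: the next push may move the buffer. */
        *SK_DSTACK_PUSH(stack, int) = i;
    }
    while ((top = SK_DSTACK_POP(stack, int)))
        sum += *top;
    *capacity_out = stack.t_capacity;
    free(stack.t_base);
    return sum;
}
```

Note that doubling only works from a nonzero capacity,
so the sketch, like |DSTACK_INIT|, expects a positive initial size.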
12678
12679@*0 Dynamic Queues.
This is simply a dynamic stack extended with a second
index.
There is no destructor at this point, because so far all uses
of this let another container ``steal" the data from this one.
Instead I define a destructor for the ``thief" container to use
when it needs to free the data.
If a queue destructor is ever needed, it will simply call the dynamic stack destructor.
12687
12688@d DQUEUE_DECLARE(this) struct s_dqueue this
@d DQUEUE_INIT(this, type, initial_size)
    (((this).t_current=0), DSTACK_INIT((this).t_stack, type, initial_size))
@d DQUEUE_PUSH(this, type) DSTACK_PUSH((this).t_stack, type)
@d DQUEUE_POP(this, type) DSTACK_POP((this).t_stack, type)
@d DQUEUE_NEXT(this, type) ((this).t_current >= DSTACK_LENGTH((this).t_stack)
    ? NULL
    : (DSTACK_BASE((this).t_stack, type))+(this).t_current++)
@d DQUEUE_BASE(this, type) DSTACK_BASE((this).t_stack, type)
@d DQUEUE_END(this) DSTACK_LENGTH((this).t_stack)
12698@d STOLEN_DQUEUE_DATA_FREE(data) STOLEN_DSTACK_DATA_FREE(data)
12699
12700@<Private incomplete structures@> =
12701struct s_dqueue;
12702typedef struct s_dqueue* DQUEUE;
12703@ @<Private structures@> =
12704struct s_dqueue { gint t_current; struct s_dstack t_stack; };
12705
12706@** Per-Earley-Set List (PSL) Code.
12707There are several cases where Marpa needs to
12708look up a triple $\langle s,s',k \rangle$,
where $s$ and $s'$ are earlemes, and $0 \le k<n$,
12710where $n$ is a reasonably small constant,
12711such as the number of AHFA items.
12712Earley items, or-nodes and and-nodes are examples.
12713@ Lookup for Earley items needs to be $O(1)$
12714to justify Marpa's time complexity claims.
12715Setup of the parse
12716bocage for evaluation is not
12717parsing in the strict sense,
but it makes sense to have it meet the same time complexity claims.
12719@
12720To obtain $O(1)$,
12721Marpa uses a special data structure, the Per-Earley-Set List.
12722The Per-Earley-Set Lists rely on the following being true:
12723\li It can be arranged so
12724that only one $s'$ is being considered at a time,
12725so that we are in fact looking up a duple $\langle s,k \rangle$.
12726\li In all cases of interest
12727we will have pointers available that take
12728us directly to all of the
12729Earley sets involved,
12730so that lookup of the data for an Earley set is $O(1)$.
12731\li The value of $k$ is always less than a constant.
12732Therefore any reasonable algorithm
12733for the search and insertion of $k$ is $O(1)$.
@ The idea is that each Earley set has a list of values
indexed by the keys $k$.
We arrange to consider only one Earley set $s'$ at a time.
A pointer takes us to the Earley set $s$ in $O(1)$ time.
Since the list kept in $s$ is of a size less than a constant,
search and insertion in it is $O(1)$.
Thus each search and insertion for the triple
$\langle s,s',k \rangle$ takes $O(1)$ time.
12743@ In understanding how the PSL's are used, it is important
12744to keep in mind that the PSL's are kept in Earley sets as
12745a convenience, and that the semantic relation of the Earley set
12746to the data structure being tracked by the PSL is not important
12747in the choice of where the PSL goes.
12748All data structures tracked by PSL's belong
12749semantically more to
12750the Earley set of their dot earleme than any other,
12751but for the time complexity hack to work,
that must be held constant while another Earley set is
12753the one which varies.
12754In the case of Earley items and or-nodes, the varying
12755Earley set is the origin.
12756In the case of and-nodes, the origin Earley set is also
12757held constant, and the Earley set of the middle earleme
12758is the variable.
@ The PSL's are kept in a doubly linked list.
Each contains |t_psl_length| |gpointer|'s.
12761|t_owner| is the address of the location
12762that ``owns" this PSL.
12763That location will be NULL'ed
12764when deallocating.
12765@<Private incomplete structures@> =
12766struct s_per_earley_set_list;
12767typedef struct s_per_earley_set_list *PSL;
@ @d Sizeof_PSL(psar)
    (sizeof(PSL_Object) + ((psar)->t_psl_length - 1) * sizeof(gpointer))
12770@d PSL_Datum(psl, i) ((psl)->t_data[(i)])
12771@<Private structures@> =
12772struct s_per_earley_set_list {
12773    PSL t_prev;
12774    PSL t_next;
12775    PSL* t_owner;
12776    gpointer t_data[1];
12777};
12778typedef struct s_per_earley_set_list PSL_Object;
@ The per-Earley-set lists are allocated from per-Earley-set arenas.
12780@<Private incomplete structures@> =
12781struct s_per_earley_set_arena;
12782typedef struct s_per_earley_set_arena *PSAR;
@ The ``dot" PSAR is to track Earley items whose origin
or current earleme is at the ``dot" location,
that is, the current Earley set.
The ``predict" PSAR
is to track Earley items for predictions
12788at locations other than the current earleme.
12789The ``predict" PSAR
12790is used for predictions which result from
12791scanned items.
12792Since they are predictions, their current Earley set
12793and origin are at the same earleme.
12794This earleme will be somewhere after the current earleme.
12795@<Private structures@> =
12796struct s_per_earley_set_arena {
12797      gint t_psl_length;
12798      PSL t_first_psl;
12799      PSL t_first_free_psl;
12800};
12801typedef struct s_per_earley_set_arena PSAR_Object;
12802@ @d Dot_PSAR_of_R(r) (&(r)->t_dot_psar_object)
12803@<Widely aligned recognizer elements@> =
12804PSAR_Object t_dot_psar_object;
12805@ @<Initialize recognizer elements@> =
12806  psar_init(Dot_PSAR_of_R(r), AHFA_Count_of_R (r));
12807@ @<Destroy recognizer elements@> =
12808  psar_destroy(Dot_PSAR_of_R(r));
12809@ @<Private function prototypes@> =
12810static inline void psar_init(const PSAR psar, gint length);
12811static inline void psar_destroy(const PSAR psar);
12812static inline PSL psl_new(const PSAR psar);
12813@ @<Function definitions@> =
12814static inline void
12815psar_init (const PSAR psar, gint length)
12816{
12817  psar->t_psl_length = length;
12818  psar->t_first_psl = psar->t_first_free_psl = psl_new (psar);
12819}
12820@ @<Function definitions@> =
12821static inline void psar_destroy(const PSAR psar)
12822{
12823    PSL psl = psar->t_first_psl;
12824MARPA_OFF_DEBUG3("%s psl=%p", G_STRLOC, psl);
12825    while (psl)
12826      {
12827	PSL next_psl = psl->t_next;
12828	PSL *owner = psl->t_owner;
12829MARPA_OFF_DEBUG3("%s owner=%p", G_STRLOC, owner);
12830	if (owner)
12831	  *owner = NULL;
12832	g_slice_free1 (Sizeof_PSL (psar), psl);
12833	psl = next_psl;
12834MARPA_OFF_DEBUG3("%s psl=%p", G_STRLOC, psl);
12835      }
12836}
12837@ @<Function definitions@> =
12838static inline PSL psl_new(const PSAR psar) {
12839     gint i;
12840     PSL new_psl = g_slice_alloc(Sizeof_PSL(psar));
12841     new_psl->t_next = NULL;
12842     new_psl->t_prev = NULL;
12843     new_psl->t_owner = NULL;
12844    for (i = 0; i < psar->t_psl_length; i++) {
12845	PSL_Datum(new_psl, i) = NULL;
12846    }
12847     return new_psl;
12848}
12849@
12850{\bf To Do}: @^To Do@>
12851This is temporary data
and perhaps should be kept track of on a per-phase
12853obstack.
12854@d Dot_PSL_of_ES(es) ((es)->t_dot_psl)
12855@<Widely aligned Earley set elements@> =
12856    PSL t_dot_psl;
12857@ @<Initialize Earley set PSL data@> =
12858{ set->t_dot_psl = NULL; }
12859
12860@ A PSAR reset nulls out the data in the PSL's.
12861It is a moderately expensive operation, usually
12862avoided by having the logic check for ``stale" data.
But when the PSAR is needed for
a different type of PSL data,
12865one which will require different stale-detection logic,
12866the old PSL data need to be nulled.
12867@<Private function prototypes@> =
12868static inline void psar_reset(const PSAR psar);
12869@ @<Function definitions@> =
12870static inline void psar_reset(const PSAR psar) {
12871    PSL psl = psar->t_first_psl;
12872    while (psl && psl->t_owner) {
12873	gint i;
12874	for (i = 0; i < psar->t_psl_length; i++) {
12875	    PSL_Datum(psl, i) = NULL;
12876	}
12877	psl = psl->t_next;
12878    }
12879    psar_dealloc(psar);
12880}
12881
@ A PSAR dealloc removes the owners' claims to all of
the PSAR's PSLs,
12884and puts them back on the free list.
12885It does {\bf not} null out the stale PSL items.
12886@<Private function prototypes@> =
12887static inline void psar_dealloc(const PSAR psar);
12888@ @<Function definitions@> =
12889static inline void psar_dealloc(const PSAR psar) {
12890    PSL psl = psar->t_first_psl;
12891    while (psl) {
12892	PSL* owner = psl->t_owner;
12893	if (!owner) break;
12894	(*owner) = NULL;
12895	psl->t_owner = NULL;
12896	psl = psl->t_next;
12897    }
12898     psar->t_first_free_psl = psar->t_first_psl;
12899}
12900
12901@ This function ``claims" a PSL.
12902The address of the claimed PSL and the PSAR
12903from which to claim it are arguments.
12904The caller must ensure that
12905there is not a PSL already
12906at the claiming address.
12907@<Private function prototypes@> =
12908static inline void psl_claim(
12909    PSL* const psl_owner, const PSAR psar);
12910@ @<Function definitions@> =
12911static inline void psl_claim(
12912    PSL* const psl_owner, const PSAR psar) {
12913     PSL new_psl = psl_alloc(psar);
12914     (*psl_owner) = new_psl;
12915     new_psl->t_owner = psl_owner;
12916}
12917
12918@ @<Claim the or-node PSL for |PSL_ES_ORD| as |CLAIMED_PSL|@> =
12919{
12920      PSL *psl_owner = &per_es_data[PSL_ES_ORD].t_or_psl;
12921      if (!*psl_owner)
12922	psl_claim (psl_owner, or_psar);
12923      (CLAIMED_PSL) = *psl_owner;
12924}
12925#undef PSL_ES_ORD
12926#undef CLAIMED_PSL
12927
12928@ This function ``allocates" a PSL.
12929It gets a free PSL from the PSAR.
12930There must always be at least one free PSL in a PSAR.
12931This function replaces the allocated PSL with
12932a new free PSL when necessary.
12933@ @<Private function prototypes@> =
12934static inline PSL psl_alloc(const PSAR psar);
12935@ @<Function definitions@> =
12936static inline PSL psl_alloc(const PSAR psar) {
12937    PSL free_psl = psar->t_first_free_psl;
12938    PSL next_psl = free_psl->t_next;
12939    if (!next_psl) {
12940        next_psl = free_psl->t_next = psl_new(psar);
12941	next_psl->t_prev = free_psl;
12942    }
12943    psar->t_first_free_psl = next_psl;
12944    return free_psl;
12945}
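@ The claim, dealloc and reuse cycle of the PSL arena can be
summarized in a standalone sketch.
It is a simplification under stated assumptions: the PSL length is a compile-time
constant, the list is singly linked, |malloc| replaces |g_slice_alloc|,
and all the |sk_|$\ldots$ names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum { SK_PSL_LENGTH = 4 };   /* number of keys k; fixed for the sketch */

struct sk_psl {
    struct sk_psl *t_next;
    struct sk_psl **t_owner;      /* location currently holding this PSL */
    void *t_data[SK_PSL_LENGTH];  /* values indexed directly by key k */
};

struct sk_psar {
    struct sk_psl *t_first_psl;
    struct sk_psl *t_first_free_psl;
};

static struct sk_psl *sk_psl_new(void) {
    struct sk_psl *psl = malloc(sizeof *psl);
    int i;
    if (!psl) abort();
    psl->t_next = NULL;
    psl->t_owner = NULL;
    for (i = 0; i < SK_PSL_LENGTH; i++) psl->t_data[i] = NULL;
    return psl;
}

static void sk_psar_init(struct sk_psar *psar) {
    psar->t_first_psl = psar->t_first_free_psl = sk_psl_new();
}

/* Hand out the first free PSL, keeping at least one free PSL in the list. */
static void sk_psl_claim(struct sk_psl **owner, struct sk_psar *psar) {
    struct sk_psl *psl = psar->t_first_free_psl;
    if (!psl->t_next) psl->t_next = sk_psl_new();
    psar->t_first_free_psl = psl->t_next;
    *owner = psl;
    psl->t_owner = owner;
}

/* Return every claimed PSL to the free list; the data is left stale. */
static void sk_psar_dealloc(struct sk_psar *psar) {
    struct sk_psl *psl;
    for (psl = psar->t_first_psl; psl && psl->t_owner; psl = psl->t_next) {
        *psl->t_owner = NULL;
        psl->t_owner = NULL;
    }
    psar->t_first_free_psl = psar->t_first_psl;
}

static void sk_psar_destroy(struct sk_psar *psar) {
    struct sk_psl *psl = psar->t_first_psl;
    while (psl) {
        struct sk_psl *next = psl->t_next;
        if (psl->t_owner) *psl->t_owner = NULL;
        free(psl);
        psl = next;
    }
}
```

Claimed PSLs always form a prefix of the list, which is why both
the dealloc loop and the reset loop may stop at the first PSL without an owner.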
12946
12947@** Memory Allocation.
12948
12949@ By default,
12950a memory allocation failure
12951inside the Marpa library is a fatal error.
If this is a problem, the application can
configure |g_malloc| to use its own allocator
which does something else on failure.
What else an application can do is not at all clear,
which is why the usual practice
is to treat memory allocation errors as
fatal, irrecoverable problems.
12959
12960@ An error
12961in memory allocation will be logged
12962in the domain that |g_malloc|
12963is using, not in the domain being used by Marpa.
12964
12965@ |libmarpa| uses |g_malloc|, either directly or indirectly.
12966Indirect use of |g_malloc| comes via obstacks and |g_slice|.
12967Both of these are more efficient, but both also
12968limit the ability to resize memory.
12969Obstacks also sharply limit the ability
12970to control the lifetime of the memory.
12971\par
12972It should be noted that the libraries used by |libmarpa| may
12973also allocate memory, using their own methods.
12974This allocation is often also |g_malloc| based.
12975\par
Obstacks are particularly useful for |libmarpa|.
Much of the memory allocated in |libmarpa| is
\li In individual allocations less than 4K, often considerably less.
\li Once created, kept for the entire life of either the grammar or the recognizer.
\li Once created, never resized.
\par
For these, obstacks are perfect.
|libmarpa|'s grammar has an obstack.
Small allocations needed for the lifetime of the grammar
are allocated on it as the grammar object is built.
All these allocations are conveniently and quickly deallocated when
the grammar's obstack is destroyed along with its parent grammar.
12987@d obstack_chunk_alloc g_malloc
12988@d obstack_chunk_free g_free
12989
12990@*0 Why the obstacks are renamed.
Regretfully, I realized I could not simply include the
12992GNU obstacks, because of three obstacles.
12993First, the error handling is not thread-safe.  In fact,
12994since it relies on a global error handler, it is not even
12995safe for use by multiple libraries within one thread.
12996Since
12997the obstack ``error handling" consisted of exactly one
12998``out of memory" message, which Marpa will never use because
12999it uses |g_malloc|, this risk comes at no benefit whatsoever.
13000Removing the error handling was far easier than leaving it
13001in.
13002
13003@ Second, there were also portability complications
13004caused by the unneeded features of obstacks.
\li The GNU obstacks had a complex set of |ifdef|'s intended
13006to allow the same code to be part of GNU libc,
13007or not part of it, and the portability aspect of these
13008was daunting.
13009\li GNU obstack's lone error message was dragging in
13010GNU's internationalization.
13011(|libmarpa| avoids internationalization by leaving all
13012messaging and naming to the higher layers.)
13013It was far easier to rip out these features than to
13014deal with the issues they raised,
13015especially the portability
13016issues.
13017
@ Third, if I did choose to try to use GNU obstacks in their
13019original form, |libmarpa| would have to deal with issues
13020of interposing identical function names in the linking process.
13021I aim at portability, even to systems that I have no
13022direct access to.
13023This is, of course, a real challenge when
13024it comes to debugging.
13025It was not cheering to think of the prospect
13026of multiple
13027libraries with obstack functions being resolved by the linkers
13028of widely different systems.
13029If, for example, a function that I intended to be used was not the
13030one linked, the bug would usually be a silent one.
13031
13032@ Porting to systems with no native obstack meant that I was
13033already in the business of maintaining my own obstacks code,
13034whether I liked it or not.
13035The only reasonable alternative seemed to be
13036to create my own version of obstacks,
13037essentially copying the GNU implementation,
13038but eliminating the unnecessary
13039but problematic features.
13040Namespace issues could then be dealt with by
13041renaming the external functions.
13042
13043@** External Failure Reports.
13044Most of
13045|libmarpa|'s external functions return failure under
13046one or more circumstances --- for
13047example, they may have been called incorrectly.
13048Many of the external routines share failure logic in
13049common.
13050I found it convenient to gather much of this logic here.
13051
13052@ External routines will differ in the exact value
13053they return on failure.
13054Routines returning a pointer will return a |NULL|.
External routines which return an integer value
will return |-2| as a general failure
indicator,
so that |-1| can be reserved for special purposes.
13059@ The circumstances under
13060which |-1| is returned are described in the section
13061for each external function call.
13062Typical meanings of |-1| are
13063``not defined", or ``does not exist".
13064
13065@ The final decision about the meaning of
13066return values is up to the higher layers.
13067A general failure return
13068(|NULL| or |-2|) will
13069typically be a hard failure.
A |-1| return may reasonably be
13071interpreted as a normal
13072return value, a soft failure,
13073or a hard failure,
13074depending on the context.
13075
13076@ For this reason,
all the logic in this section expects |failure_indicator|
13078to be set in the scope in which it is used.
13079All failures treated in this section are general failures,
13080so that |-1| is not used as a return value.
13081
13082@ Routines with nothing else to return often use |FALSE| as the failure indicator.
13083@<Return |FALSE| on failure@> = const gboolean failure_indicator = FALSE;
13084@ Routines returning pointers often use |NULL| as the failure indicator.
13085@<Return |NULL| on failure@> = const gpointer failure_indicator = NULL;
@ Routines returning an integer value use |-2| as the
13087general failure indicator.
13088@<Return |-2| on failure@> = const int failure_indicator = -2;
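@ The |failure_indicator| convention can be shown in a standalone sketch.
Each routine declares its own |failure_indicator| of the right type,
and a shared check, written once, returns it.
The table |sk_symbol_value|, the macro |SK_FAIL_IF_SYMID_INVALID| and the two
routines are hypothetical, and a C preprocessor macro stands in here for a CWEB code section.

```c
#include <assert.h>
#include <stddef.h>

enum { SK_SYMBOL_COUNT = 3 };

/* Hypothetical table: a value per symbol, with -1 meaning "not defined". */
static const int sk_symbol_value[SK_SYMBOL_COUNT] = { 7, -1, 9 };
static const char *sk_error = NULL;

/* A shared check, written once and reused wherever `symid` and
 * `failure_indicator` are in scope. */
#define SK_FAIL_IF_SYMID_INVALID \
    if (symid < 0 || symid >= SK_SYMBOL_COUNT) { \
        sk_error = "invalid symbol id"; \
        return failure_indicator; \
    }

/* Integer-valued routine: -2 is the general failure, -1 stays available. */
static int sk_symbol_value_get(int symid) {
    const int failure_indicator = -2;
    SK_FAIL_IF_SYMID_INVALID
    return sk_symbol_value[symid];
}

/* Pointer-valued routine: NULL is the general failure. */
static const int *sk_symbol_value_ref(int symid) {
    const int *const failure_indicator = NULL;
    SK_FAIL_IF_SYMID_INVALID
    return &sk_symbol_value[symid];
}
```

The same check text serves both routines because it never names the failure value,
only the variable that holds it.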
13089
13090@*0 Grammar Failures.
13091|g| is assumed to be the value of the relevant grammar,
13092when one is required.
13093@<Fail if grammar is precomputed@> =
13094if (G_is_Precomputed(g)) {
13095    g_context_clear(g);
13096    g->t_error = "grammar precomputed";
13097    return failure_indicator;
13098}
13099@ @<Fail if grammar not precomputed@> =
13100if (!G_is_Precomputed(g)) {
13101    g_context_clear(g);
13102    g->t_error = "grammar not precomputed";
13103    return failure_indicator;
13104}
13105@ @<Fail if grammar |symid| is invalid@> =
13106if (!symbol_is_valid(g, symid)) {
13107    g_context_clear(g);
13108    g_context_int_add(g, "symid", symid);
13109    g->t_error = "invalid symbol id";
13110    return failure_indicator;
13111}
13112@ @<Fail if grammar |rule_id| is invalid@> =
13113if (!RULEID_of_G_is_Valid(g, rule_id)) {
13114    g_context_clear(g);
13115    g_context_int_add(g, "rule_id", rule_id);
13116    g->t_error = "invalid rule id";
13117    return failure_indicator;
13118}
13119@ @<Fail if grammar |item_id| is invalid@> =
13120if (!item_is_valid(g, item_id)) {
13121    g_context_clear(g);
13122    g_context_int_add(g, "item_id", item_id);
13123    g->t_error = "invalid item id";
13124    return failure_indicator;
13125}
13126@ @<Fail if grammar |AHFA_state_id| is invalid@> =
13127if (!AHFA_state_id_is_valid(g, AHFA_state_id)) {
13128    g_context_clear(g);
13129    g_context_int_add(g, "AHFA_state_id", AHFA_state_id);
13130    g->t_error = "invalid AHFA state id";
13131    return failure_indicator;
13132}
13133@ @<Fail grammar if elements of |result| are not |sizeof(gint)|@> =
13134if (sizeof(gint) != g_array_get_element_size(result)) {
13135     g_context_clear(g);
13136     g_context_int_add(g, "expected size", sizeof(gint));
13137     g->t_error = "garray size mismatch";
13138     return failure_indicator;
13139}
13140@ @<Fail with internal grammar error@> = {
13141    g_context_clear(g);
13142    g->t_error = "internal error";
13143    return failure_indicator;
13144}
13145
13146@*0 Recognizer Failures.
13147|r| is assumed to be the value of the relevant recognizer,
13148when one is required.
13149@<Fail if recognizer not initial@> =
13150if (Phase_of_R(r) != initial_phase) {
13151    R_ERROR("not initial recce phase");
13152    return failure_indicator;
13153}
13154@ @<Fail if recognizer initial@> =
13155if (Phase_of_R(r) == initial_phase) {
13156    R_ERROR("initial recce phase");
13157    return failure_indicator;
13158}
13159@ @<Fail if recognizer exhausted@> =
13160if (R_is_Exhausted(r)) {
13161    R_ERROR("recce exhausted");
13162    return failure_indicator;
13163}
13164@ @<Fail if recognizer not in input phase@> =
13165if (Phase_of_R(r) != input_phase) {
13166    R_ERROR("recce not in input phase");
13167    return failure_indicator;
13168}
13169@ @<Fail recognizer if not trace-safe@> =
13170switch (Phase_of_R(r)) {
13171default:
13172    R_ERROR("recce not trace-safe");
13173    return failure_indicator;
13174case input_phase:
13175case evaluation_phase:
13176break;
13177}
13178@ @<Fail if recognizer has fatal error@> =
13179if (Phase_of_R(r) == error_phase) {
13180    R_ERROR(r->t_fatal_error);
13181    return failure_indicator;
13182}
13183@ @<Fail if recognizer |symid| is invalid@> =
13184if (!symbol_is_valid(G_of_R(r), symid)) {
13185    r_context_clear(r);
13186    r_context_int_add(r, "symid", symid);
13187    R_ERROR_CXT("invalid symid");
13188    return failure_indicator;
13189}
13190@ @<Fail recognizer if |GArray| elements are not |sizeof(gint)|@> =
13191if (sizeof(gint) != g_array_get_element_size(result)) {
13192     r_context_clear(r);
13193     r_context_int_add(r, "expected size", sizeof(gint));
13194     R_ERROR_CXT("garray size mismatch");
13195     return failure_indicator;
13196}
13197
13198@ The central error routine for the recognizer.
13199There are two flags which control its behavior.
One flag makes an error recognizer-fatal.
13201When there is a recognizer-fatal error, all
13202subsequent
13203invocations of external functions for that recognizer
13204object will fail.
It is a design goal of |libmarpa| to leave as much discretion
13206about error handling to the higher layers as possible.
13207Because of this, even the most severe errors
13208are not necessarily made recognizer-fatal.
13209|libmarpa| makes an
13210error recognizer-fatal only when the integrity of the
recognizer object is so thoroughly compromised
that |libmarpa|'s external functions cannot proceed
without risking internal memory errors,
such as bus errors and segmentation violations.
``Recognizer-fatal'' status is thus
not a means of dictating to the higher layers that a
13217|libmarpa| condition must be application-fatal,
13218but a way of preventing a recognizer error from becoming
13219application-fatal without the application's consent.
13220@d FATAL_FLAG (0x1u)
13221@ Another flag indicates that the caller set up the
13222context.
13223By default, |r_error| clears the context.
13224@d CONTEXT_FLAG (0x2u)
13225@ Several convenience macros are provided.
13226These are easier and less error-prone
13227than specifying the flags.
13228Not being error-prone
13229is important since there are many calls to |r_error|
13230in the code.
13231@d R_ERROR(message) (r_error(r, (message), 0u))
13232@d R_ERROR_CXT(message) (r_error(r, (message), CONTEXT_FLAG))
13233@d R_FATAL(message) (r_error(r, (message), FATAL_FLAG))
13234@d R_FATAL_CXT(message) (r_error(r, (message), CONTEXT_FLAG|FATAL_FLAG))
13235@<Private function prototypes@> =
13236static void r_error( struct marpa_r* r, Marpa_Message_ID message, guint flags );
13237@ Not inlined.  |r_error|
13238occurs in the code quite often,
13239but |r_error|
13240should actually be invoked only in exceptional circumstances.
13241In this case space clearly is much more important than speed.
13242@<Function definitions@> =
13243static void r_error( struct marpa_r* r, Marpa_Message_ID message, guint flags ) {
13244    if (!(flags & CONTEXT_FLAG)) r_context_clear(r);
13245    r->t_error = message;
13246    if (flags & FATAL_FLAG) r->t_fatal_error = r->t_error;
13247    r_message(r, message);
13248}
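@ A minimal sketch of how the two flags interact, written as
standalone C rather than as part of |libmarpa|.
Every name beginning with |sk_| or |SK_| is hypothetical,
the error context is reduced to a counter,
and fatal status is tracked with a pointer instead of
the recognizer phase that the real checks test.
It is meant only to illustrate the flag logic of |r_error|.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the libmarpa definitions. */
#define SK_FATAL_FLAG   (0x1u)
#define SK_CONTEXT_FLAG (0x2u)

struct sk_recce {
    const char *error;        /* most recent error cookie, or NULL */
    const char *fatal_error;  /* set only by a recognizer-fatal error */
    int context_count;        /* stand-in for the error context */
};

/* Record an error, honoring the two flags. */
static void sk_error(struct sk_recce *r, const char *message, unsigned int flags)
{
    assert(r != NULL);
    /* By default the context is cleared; the context flag says the
       caller has already set it up and it must be kept. */
    if (!(flags & SK_CONTEXT_FLAG)) r->context_count = 0;
    r->error = message;
    if (flags & SK_FATAL_FLAG) r->fatal_error = r->error;
}

/* Simplified fatal check (an assumption of this sketch):
   nonzero once a recognizer-fatal error has been recorded. */
static int sk_is_fatal(const struct sk_recce *r)
{
    assert(r != NULL);
    return r->fatal_error != NULL;
}
```

In this sketch a call with only the context flag leaves the caller's
context in place, a call with no flags clears it,
and only the fatal flag marks the recognizer as unusable.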
13249
13250@** Messages and Logging.
13251The main messaging system for |libmarpa| relies on callbacks
13252to upper layers.
13253But there are many cases in which it is not appropriate
13254to rely on the upper layers.
13255These cases include
13256serious internal problems,
13257memory allocation failures,
13258and debugging.
13259
13260\par As a fallback messaging and logging system,
13261|libmarpa| uses |glib|'s Message Logging framework.
When the message domain is
13263under |libmarpa|'s control,
13264Marpa sets the domain to |"Marpa"|.
13265In many cases, such as memory allocation failures,
13266the domain will be as set by |glib|.
13267@ Set the Logging Domain
13268@<Logging domain@> =
13269#undef G_LOG_DOMAIN@/
13270#define G_LOG_DOMAIN "Marpa"@/
13271
13272@*0 Message callbacks.
13273The user can define a callback
13274(with argument) which is invoked whenever |libmarpa|
13275has a message for the upper layers.
Note that many strings are used for convenience
in these messages.
These should be considered ``cookies'',
as if they were file names or variable names.
13280They should not be regarded as part of the user
13281interface, even if some default or fallback routines
13282may sometimes expose them to the user.
13283And they should
13284not be subject to internationalization or localization.
13285
13286These message cookies are always null-terminated in
13287the 7-bit ASCII character set.
13288This is a lowest common denominator, and is not a choice
13289binding on the upper layers,
which may use one of the Unicode encodings or anything
13291else.
13292Cookies often are mnemonics in the English language,
13293but this should not be regarded
13294as a reason to subject them to translation ---
13295at least not unless you are also translating the variable
13296names and file names.
13297
13298The intent is to have all internationalization,
13299localization and string encoding issues dealt with
13300by the upper layers.
13301@<Public typedefs@> =
13302typedef const gchar* Marpa_Message_ID;
13303
13304@* Grammar Messages.
13305@ Function pointer declarations are
13306hard to type and impossible to read.
13307This typedef localizes the damage.
13308@<Callback typedefs@> =
13309typedef void (Marpa_G_Message_Callback)(struct marpa_g *g, Marpa_Message_ID id);
13310@ @<Widely aligned grammar elements@> =
13311    Marpa_G_Message_Callback* t_message_callback;
13312    gpointer t_message_callback_arg;
13313@ @<Initialize grammar elements@> =
13314g->t_message_callback_arg = NULL;
13315g->t_message_callback = NULL;
13316@ @<Function definitions@> =
13317void marpa_g_message_callback_set(struct marpa_g *g, Marpa_G_Message_Callback*cb)
13318{ g->t_message_callback = cb; }
13319void marpa_g_message_callback_arg_set(struct marpa_g *g, gpointer cb_arg)
13320{ g->t_message_callback_arg = cb_arg; }
13321gpointer marpa_g_message_callback_arg(struct marpa_g *g)
13322{ return g->t_message_callback_arg; }
13323@ @<Public function prototypes@> =
13324void marpa_g_message_callback_set(struct marpa_g *g, Marpa_G_Message_Callback*cb);
13325void marpa_g_message_callback_arg_set(struct marpa_g *g, gpointer cb_arg);
13326gpointer marpa_g_message_callback_arg(struct marpa_g *g);
13327@ Do the message callback.
13328The name of this function is spelled out to avoid a conflict with a
13329|glib| function.
13330Note that the memory management assumes that the
13331callback either exits or returns control to |libmarpa|.
13332A |longjmp| out of a callback will probably cause a memory leak.
13333@<Function definitions@> =
13334static inline void grammar_message(struct marpa_g *g, Marpa_Message_ID id)
13335{ Marpa_G_Message_Callback* cb = g->t_message_callback;
13336if (cb) { (*cb)(g, id); } }
13337@ @<Private function prototypes@> =
13338static inline void grammar_message(struct marpa_g *g, Marpa_Message_ID id);
13339
13340@* Recognizer Messages.
13341@ Essentially the same as grammar messages,
13342except they live in and use the recognizer object.
13343@<Callback typedefs@> =
13344typedef void (Marpa_R_Message_Callback)(struct marpa_r *r, Marpa_Message_ID id);
13345@ @d Message_Callback_of_R(r) ((r)->t_message_callback)
13346@d Message_Callback_Arg_of_R(r) ((r)->t_message_callback_arg)
13347@<Widely aligned recognizer elements@> =
13348    Marpa_R_Message_Callback* t_message_callback;
13349    gpointer t_message_callback_arg;
13350@ @<Initialize recognizer elements@> =
13351r->t_message_callback_arg = NULL;
13352r->t_message_callback = NULL;
13353@ @<Function definitions@> =
13354void marpa_r_message_callback_set(struct marpa_r *r, Marpa_R_Message_Callback*cb)
13355{ r->t_message_callback = cb; }
13356void marpa_r_message_callback_arg_set(struct marpa_r *r, gpointer cb_arg)
13357{ r->t_message_callback_arg = cb_arg; }
13358gpointer marpa_r_message_callback_arg(struct marpa_r *r)
13359{ return Message_Callback_Arg_of_R(r); }
13360@ @<Public function prototypes@> =
13361void marpa_r_message_callback_set(struct marpa_r *r, Marpa_R_Message_Callback*cb);
13362void marpa_r_message_callback_arg_set(struct marpa_r *r, gpointer cb_arg);
13363gpointer marpa_r_message_callback_arg(struct marpa_r *r);
13364@ @<Function definitions@> =
13365static inline void r_message(struct marpa_r *r, Marpa_Message_ID id)
13366{ Marpa_R_Message_Callback* cb = Message_Callback_of_R(r);
13367if (cb) { (*cb)(r, id); } }
13368@ @<Private function prototypes@> =
13369static inline void r_message(struct marpa_r *r, Marpa_Message_ID id);
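@ A minimal sketch of how an upper layer might use a message
callback together with its argument.
It is standalone C, not |libmarpa| code:
the structure |sk_r|, the log structure and all |sk_| functions
are hypothetical stand-ins, and message cookies are plain C strings.
The callback fetches its argument from the object it is handed,
which is one plausible use of the callback argument accessor.

```c
#include <assert.h>
#include <stddef.h>

struct sk_r;  /* hypothetical recognizer stand-in */
typedef void (Sk_Message_Callback)(struct sk_r *r, const char *id);

struct sk_r {
    Sk_Message_Callback *message_callback;
    void *message_callback_arg;
};

/* Deliver a message cookie; do nothing if no callback is set. */
static void sk_message(struct sk_r *r, const char *id)
{
    Sk_Message_Callback *cb;
    assert(r != NULL);
    cb = r->message_callback;
    if (cb) { (*cb)(r, id); }
}

/* Upper-layer state, reached through the callback argument. */
struct sk_log {
    int count;         /* number of messages seen */
    const char *last;  /* most recent cookie; not copied, not translated */
};

/* An upper-layer callback: count messages and remember the last one. */
static void sk_count_messages(struct sk_r *r, const char *id)
{
    struct sk_log *log = r->message_callback_arg;
    if (!log) return;
    log->count++;
    log->last = id;
}
```

The callback returns normally, which matches the assumption stated
for grammar messages that a callback either exits or returns control.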
13370
13371@** Debugging.
13372The |MARPA_DEBUG| flag enables intrusive debugging logic.
``Intrusive'' debugging includes things which would
13374be annoying in production, such as detailed messages about
13375internal matters on |STDERR|.
13376@d MARPA_OFF_DEBUG1(a)
13377@d MARPA_OFF_DEBUG2(a, b)
13378@d MARPA_OFF_DEBUG3(a, b, c)
13379@d MARPA_OFF_DEBUG4(a, b, c, d)
13380@d MARPA_OFF_DEBUG5(a, b, c, d, e)
13381@d MARPA_OFF_ASSERT(expr)
13382@<Debug macros@> =
13383#define MARPA_DEBUG @[ 0 @]
13384#define MARPA_ENABLE_ASSERT @[ 0 @]
13385#if MARPA_DEBUG
13386#define MARPA_DEBUG1(a) @[ g_debug((a)); @]
13387#define MARPA_DEBUG2(a, b) @[ g_debug((a),(b)); @]
13388#define MARPA_DEBUG3(a, b, c) @[ g_debug((a),(b),(c)); @]
13389#define MARPA_DEBUG4(a, b, c, d) @[ g_debug((a),(b),(c),(d)); @]
13390#define MARPA_DEBUG5(a, b, c, d, e) @[ g_debug((a),(b),(c),(d),(e)); @]
13391#define MARPA_ASSERT(expr) do { if G_LIKELY (expr) ; else \
13392       g_error ("%s: assertion failed %s", G_STRLOC, #expr); } while (0);
13393#else /* if not |MARPA_DEBUG| */
13394#define MARPA_DEBUG1(a) @[@]
13395#define MARPA_DEBUG2(a, b) @[@]
13396#define MARPA_DEBUG3(a, b, c) @[@]
13397#define MARPA_DEBUG4(a, b, c, d) @[@]
13398#define MARPA_DEBUG5(a, b, c, d, e) @[@]
13399#define MARPA_ASSERT(exp) @[@]
13400#endif
13401
13402#if MARPA_ENABLE_ASSERT
13403#undef MARPA_ASSERT
13404#define MARPA_ASSERT(expr) do { if G_LIKELY (expr) ; else \
13405       g_error ("%s: assertion failed %s", G_STRLOC, #expr); } while (0);
13406#endif
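@ A minimal sketch of the compile-time switch used above,
in standalone C.
The names |SK_DEBUG|, |SK_DEBUG2|, |sk_debug_lines| and |sk_sum_to|
are hypothetical, and |fprintf| stands in for the |glib| logging call
so that the example needs no libraries.
Unlike the macros above, this variant wraps the body in a
|do|-|while| without a trailing semicolon,
so the caller supplies the semicolon;
that is a choice of the sketch, not a description of |libmarpa|.

```c
#include <assert.h>
#include <stdio.h>

#ifndef SK_DEBUG
#define SK_DEBUG 0   /* set to 1 to enable the intrusive output */
#endif

static int sk_debug_lines = 0;  /* counts debug lines actually emitted */

#if SK_DEBUG
#define SK_DEBUG2(fmt, a) \
    do { fprintf(stderr, (fmt), (a)); sk_debug_lines++; } while (0)
#else
/* Debugging off: the call site compiles to an empty statement. */
#define SK_DEBUG2(fmt, a) do { } while (0)
#endif

/* Sum 1..n, reporting each partial sum when debugging is enabled. */
static int sk_sum_to(int n)
{
    int i, sum = 0;
    assert(n >= 0);
    for (i = 1; i <= n; i++) {
        sum += i;
        SK_DEBUG2("running sum: %d\n", sum);
    }
    return sum;
}
```

With |SK_DEBUG| left at zero the function behaves identically
but emits nothing, which is the property wanted in production builds.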
13407
13408@*0 Earley Item Tag.
13409A function to print a descriptive tag for
13410an Earley item.
13411@<Private function prototypes@> =
13412#if MARPA_DEBUG
13413PRIVATE_NOT_INLINE gchar* eim_tag_safe(gchar *buffer, EIM eim);
13414PRIVATE_NOT_INLINE gchar* eim_tag(EIM eim);
13415#endif
13416@ It is passed a buffer to keep it thread-safe.
13417@<Function definitions@> =
13418#if MARPA_DEBUG
13419PRIVATE_NOT_INLINE gchar *
13420eim_tag_safe (gchar * buffer, EIM eim)
13421{
13422  sprintf (buffer, "S%d@@%d-%d",
13423	   AHFAID_of_EIM (eim), Origin_Earleme_of_EIM (eim),
13424	   Earleme_of_EIM (eim));
13425  return buffer;
13426}
13427
13428static char DEBUG_eim_tag_buffer[1000];
13429PRIVATE_NOT_INLINE gchar*
13430eim_tag (EIM eim)
13431{
13432  return eim_tag_safe (DEBUG_eim_tag_buffer, eim);
13433}
13434#endif
13435
13436@*0 Leo Item Tag.
13437A function to print a descriptive tag for
a Leo item.
13439@<Private function prototypes@> =
13440#if MARPA_DEBUG
13441PRIVATE_NOT_INLINE gchar* lim_tag_safe (gchar *buffer, LIM lim);
13442PRIVATE_NOT_INLINE gchar* lim_tag (LIM lim);
13443#endif
This function is passed a buffer to keep it thread-safe.
13446@<Function definitions@> =
13447#if MARPA_DEBUG
13448PRIVATE_NOT_INLINE gchar*
13449lim_tag_safe (gchar *buffer, LIM lim)
13450{
13451  sprintf (buffer, "L%d@@%d",
13452	   Postdot_SYMID_of_LIM (lim), Earleme_of_LIM (lim));
13453	return buffer;
13454}
13455
13456static char DEBUG_lim_tag_buffer[1000];
13457PRIVATE_NOT_INLINE gchar*
13458lim_tag (LIM lim)
13459{
13460  return lim_tag_safe (DEBUG_lim_tag_buffer, lim);
13461}
13462#endif
13463
13464@*0 Or-Node Tag.
13465Functions to print a descriptive tag for
13466an or-node item.
13467One is thread-safe, the other is
13468more convenient but not thread-safe.
13469@<Private function prototypes@> =
13470#if MARPA_DEBUG
13471PRIVATE_NOT_INLINE const gchar* or_tag_safe(gchar *buffer, OR or);
13472PRIVATE_NOT_INLINE const gchar* or_tag(OR or);
13473#endif
13474@ It is passed a buffer to keep it thread-safe.
13475@<Function definitions@> =
13476#if MARPA_DEBUG
13477PRIVATE_NOT_INLINE const gchar *
13478or_tag_safe (gchar * buffer, OR or)
13479{
13480  if (!or) return "NULL";
13481  if (OR_is_Token(or)) return "TOKEN";
13482  if (Type_of_OR(or) == DUMMY_OR_NODE) return "DUMMY";
13483  sprintf (buffer, "R%d:%d@@%d-%d",
13484	   ID_of_RULE(RULE_of_OR (or)), Position_of_OR (or),
13485	   Origin_Ord_of_OR (or),
13486	   ES_Ord_of_OR (or));
13487  return buffer;
13488}
13489
13490static char DEBUG_or_tag_buffer[1000];
13491PRIVATE_NOT_INLINE const gchar*
13492or_tag (OR or)
13493{
13494  return or_tag_safe (DEBUG_or_tag_buffer, or);
13495}
13496#endif
13497
13498@*0 AHFA Item Tag.
13499Functions to print a descriptive tag for
13500an AHFA item.
13501One is passed a buffer to keep it thread-safe.
13502The other uses a global buffer,
13503which is not thread-safe, but
13504convenient when debugging in a non-threaded environment.
13505@<Private function prototypes@> =
13506#if MARPA_DEBUG
13507PRIVATE_NOT_INLINE const gchar* aim_tag_safe(gchar *buffer, AIM aim);
13508PRIVATE_NOT_INLINE const gchar* aim_tag(AIM aim);
13509#endif
13510@ @<Function definitions@> =
13511#if MARPA_DEBUG
13512PRIVATE_NOT_INLINE const gchar *
13513aim_tag_safe (gchar * buffer, AIM aim)
13514{
13515  if (!aim) return "NULL";
13516  const gint aim_position = Position_of_AIM (aim);
13517  if (aim_position >= 0) {
      sprintf (buffer, "R%d@@%d", RULEID_of_AIM (aim), aim_position);
13519  } else {
13520      sprintf (buffer, "R%d@@end", RULEID_of_AIM (aim));
13521  }
13522  return buffer;
13523}
13524
13525static char DEBUG_aim_tag_buffer[1000];
13526PRIVATE_NOT_INLINE const gchar*
13527aim_tag (AIM aim)
13528{
13529  return aim_tag_safe (DEBUG_aim_tag_buffer, aim);
13530}
13531#endif
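@ A minimal sketch of the two-variant pattern used by the tag
functions, in standalone C.
The names |sk_aim_tag_safe|, |sk_aim_tag| and |SK_TAG_SIZE| are
hypothetical, and the rule ID and position are passed as plain integers
instead of being read from an AHFA item.
The sketch also swaps in |snprintf| with an explicit size,
where the functions above use |sprintf|.
The format strings contain a single at-sign,
since the doubled form is needed only in CWEB source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SK_TAG_SIZE 64

/* Thread-safe variant: the caller owns the buffer. */
static const char *sk_aim_tag_safe(char *buffer, size_t size,
                                   int rule_id, int position)
{
    assert(buffer != NULL && size > 0);
    if (position >= 0) {
        snprintf(buffer, size, "R%d@%d", rule_id, position);
    } else {
        /* A negative position is shown as the end of the rule. */
        snprintf(buffer, size, "R%d@end", rule_id);
    }
    return buffer;
}

/* Convenience variant: one shared buffer, so not thread-safe,
   and each call overwrites the previous result. */
static char sk_aim_tag_buffer[SK_TAG_SIZE];
static const char *sk_aim_tag(int rule_id, int position)
{
    return sk_aim_tag_safe(sk_aim_tag_buffer, sizeof sk_aim_tag_buffer,
                           rule_id, position);
}
```

Two results of the convenience variant cannot be used in the same
expression, because both point at the same storage;
the buffer-passing variant has no such restriction.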
13532
13533
13534@** File Layout.
13535@ The output files are {\bf not} source files,
13536but I add the license to them anyway,
13537as close to the top as possible.
13538@ Also, it is helpful to someone first
13539trying to orient herself,
13540if built source files contain a comment
to that effect and a warning
that they are not intended to be edited directly.
13544So I add such a comment.
13545
13546@*0 |marpa.c| Layout.
13547@q This is a hack to get the @>
13548@q license language nearer the top of the files. @>
13549@ The physical structure of the |marpa.c| file
13550\tenpoint
13551@c
13552@=/*@>@/
13553@= * Copyright 2012 Jeffrey Kegler@>@/
13554@= * This file is part of Marpa::XS.  Marpa::XS is free software: you can@>@/
13555@= * redistribute it and/or modify it under the terms of the GNU Lesser@>@/
13556@= * General Public License as published by the Free Software Foundation,@>@/
13557@= * either version 3 of the License, or (at your option) any later version.@>@/
13558@= *@>@/
13559@= * Marpa::XS is distributed in the hope that it will be useful,@>@/
13560@= * but WITHOUT ANY WARRANTY; without even the implied warranty of@>@/
13561@= * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU@>@/
13562@= * Lesser General Public License for more details.@>@/
13563@= *@>@/
13564@= * You should have received a copy of the GNU Lesser@>@/
13565@= * General Public License along with Marpa::XS.  If not, see@>@/
13566@= * http://www.gnu.org/licenses/.@>@/
13567@= */@>@/
13568@=/*@>@/
13569@= * DO NOT EDIT DIRECTLY@>@/
13570@= * This file is written by ctangle@>@/
13571@= * It is not intended to be modified directly@>@/
13572@= */@>@/
13573
13574@ \twelvepoint @c
13575#include "config.h"
13576#include "marpa.h"
13577@<Debug macros@>
13578@h
13579#include "marpa_obs.h"
13580@<Logging domain@>@;
13581@<Private incomplete structures@>@;
13582@<Private typedefs@>@;
13583@<Private global variables@>@;
13584@<Private utility structures@>@;
13585@<Private structures@>@;
13586@<Recognizer structure@>@;
13587@<Source object structure@>@;
13588@<Earley item structure@>@;
13589@<Bocage structure@>@;
13590@<Private function prototypes@>@;
13591@<Private inline functions@>@;
13592@<Function definitions@>@;
13593
13594@*0 |marpa.h| Layout.
13595@q This is a separate section in order to get the @>
13596@q license language nearer the top of the files. @>
13597@q It's hackish, but in a good cause. @>
13598@ The physical structure of the |marpa.h| file
13599\tenpoint
13600@(marpa.h@> =
13601@=/*@>@/
13602@= * Copyright 2012 Jeffrey Kegler@>@/
13603@= * This file is part of Marpa::XS.  Marpa::XS is free software: you can@>@/
13604@= * redistribute it and/or modify it under the terms of the GNU Lesser@>@/
13605@= * General Public License as published by the Free Software Foundation,@>@/
13606@= * either version 3 of the License, or (at your option) any later version.@>@/
13607@= *@>@/
13608@= * Marpa::XS is distributed in the hope that it will be useful,@>@/
13609@= * but WITHOUT ANY WARRANTY; without even the implied warranty of@>@/
13610@= * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU@>@/
13611@= * Lesser General Public License for more details.@>@/
13612@= *@>@/
13613@= * You should have received a copy of the GNU Lesser@>@/
13614@= * General Public License along with Marpa::XS.  If not, see@>@/
13615@= * http://www.gnu.org/licenses/.@>@/
13616@= */@>@/
13617@=/*@>@/
13618@= * DO NOT EDIT DIRECTLY@>@/
13619@= * This file is written by ctangle@>@/
13620@= * It is not intended to be modified directly@>@/
13621@= */@>@/
13622
13623@ \twelvepoint
13624@(marpa.h@> =
13625#ifndef __MARPA_H__
13626#define __MARPA_H__ @/
13627#include <stdio.h>
13628#include <glib.h>
13629@<Body of public header file@>
#endif /* |__MARPA_H__| */
13631
13632@** Proofs.
13633
13634For |libmarpa|, more than inspection of
13635the code is desirable to establish confidence
13636that it works as intended.
13637For some non-obvious points, proofs are useful
13638to increase the level of confidence.
13639
13640@*0 Leo completion states are AHFA singletons.
13641
13642@ {\bf Motivation:}
13643|libmarpa| combines Joop Leo's enhancements to the
13644Earley algorithm with those of Aycock and Horspool.
13645While it was clear such a thing would be
13646possible, given enough effort, it was {\bf not}
13647obvious that the combined algorithm would preserve
13648the efficiencies of the algorithms from which it
13649was derived.
13650
13651This proof establishes the key fact to show that,
13652in fact, the Leo algorithm is compatible
13653with the Aycock and Horspool algorithms.
13654The following is an outline,
13655which assumes familiarity with the underlying algorithms.
13656
13657@ {\bf Theorem:} In |libmarpa|,
13658all Leo completion states are in their own LR(0) state.
13659
13660@ {\bf Proof:}
13661In |libmarpa|, every
13662Leo completion LR(0) item will have a non-nulling symbol,
by Leo's definitions.
13664Therefore, every Leo completion will have a final non-nulling
13665symbol.
Call the Leo completion item's final non-nulling symbol $S$.
13667
13668Call the LR(0) DFA state containing the Leo Completion item $C$.
13669Call the Leo completion LR(0) item $C1$.
13670Suppose, for reduction to absurdity,
13671that another LR(0) item is combined with
13672the Leo completion LR(0) item in the LR(0) DFA.
13673Call this second LR(0) item $C2$.
13674
13675If so,
there must be an LR(0) DFA state,
13677$C_{predecessor}$, where two of the
13678LR(0) items, after a transition on symbol $S$,
13679produce both $C1$ and $C2$.
13680That means that in $C_{predecessor}$,
there are two LR(0) items with $S$ as the postdot symbol,
13682and that these two items are predecessors of $C1$ and $C2$.
13683Call them $P1$ and $P2$.
13684$P1 \neq P2$, because $C1 \neq C2$ and different LR(0)
13685items always have different predecessors.
13686
13687Therefore $C_{predecessor}$ will contain $P1$ and $P2$,
13688two LR(0) items, both
13689with $S$ as the postdot symbol.
13690But by Leo's definitions, the transition on the postdot
13691symbol into a Leo completion state
13692must be unique.
13693Therefore $C_{predecessor}$ cannot exist.
13694This completes the reduction to absurdity,
13695and the proof.
13696QED.
13697
13698@ {\bf Theorem:}
13699All Leo completion states are in their own AHFA state.
13700
13701{\bf Proof:}
13702By the theorem above, all Leo completion states are in
13703their own state in the LR(0) DFA.
The conversion to an epsilon-DFA will not add any items to this
13705state, because the only item in it is a completion item.
13706And conversion to a split epsilon-DFA will not add items.
13707So the Leo completion item will remain in its own state as
13708the AHFA is constructed.
13709QED.
13710
13711@** Index.
13712
13713