.. Copyright (C) 2001-2019 NLTK Project
.. For license information, see LICENSE.TXT

=========================
 Feature Grammar Parsing
=========================

.. include:: ../../../nltk_book/definitions.rst

Grammars can be parsed from strings.

    >>> from __future__ import print_function
    >>> import nltk
    >>> from nltk import grammar, parse
    >>> g = """
    ... % start DP
    ... DP[AGR=?a] -> D[AGR=?a] N[AGR=?a]
    ... D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that'
    ... D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those'
    ... D[AGR=[NUM='pl', PERS=1]] -> 'we'
    ... D[AGR=[PERS=2]] -> 'you'
    ... N[AGR=[NUM='sg', GND='m']] -> 'boy'
    ... N[AGR=[NUM='pl', GND='m']] -> 'boys'
    ... N[AGR=[NUM='sg', GND='f']] -> 'girl'
    ... N[AGR=[NUM='pl', GND='f']] -> 'girls'
    ... N[AGR=[NUM='sg']] -> 'student'
    ... N[AGR=[NUM='pl']] -> 'students'
    ... """
    >>> fg = grammar.FeatureGrammar.fromstring(g)
    >>> tokens = 'these girls'.split()
    >>> parser = parse.FeatureEarleyChartParser(fg)
    >>> trees = parser.parse(tokens)
    >>> for tree in trees: print(tree)
    (DP[AGR=[GND='f', NUM='pl', PERS=3]]
      (D[AGR=[NUM='pl', PERS=3]] these)
      (N[AGR=[GND='f', NUM='pl']] girls))
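
Since the determiner and noun must unify on their AGR features, an input
that violates agreement receives no parse at all (a quick check; the
empty result below is what we would expect):

    >>> list(parser.parse('these girl'.split()))
    []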

In general, when we are trying to develop even a very small grammar,
it is convenient to put the rules in a file where they can be edited,
tested and revised. Let's assume that we have saved feat0cfg_ as a file named
``'feat0.fcfg'`` and placed it in the NLTK ``data`` directory. We can
inspect it as follows:

.. _feat0cfg: http://nltk.svn.sourceforge.net/svnroot/nltk/trunk/nltk/data/grammars/feat0.fcfg

    >>> nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')
    % start S
    # ###################
    # Grammar Productions
    # ###################
    # S expansion productions
    S -> NP[NUM=?n] VP[NUM=?n]
    # NP expansion productions
    NP[NUM=?n] -> N[NUM=?n]
    NP[NUM=?n] -> PropN[NUM=?n]
    NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
    NP[NUM=pl] -> N[NUM=pl]
    # VP expansion productions
    VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]
    VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
    # ###################
    # Lexical Productions
    # ###################
    Det[NUM=sg] -> 'this' | 'every'
    Det[NUM=pl] -> 'these' | 'all'
    Det -> 'the' | 'some' | 'several'
    PropN[NUM=sg]-> 'Kim' | 'Jody'
    N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'
    N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'
    IV[TENSE=pres,  NUM=sg] -> 'disappears' | 'walks'
    TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
    IV[TENSE=pres,  NUM=pl] -> 'disappear' | 'walk'
    TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
    IV[TENSE=past] -> 'disappeared' | 'walked'
    TV[TENSE=past] -> 'saw' | 'liked'

The function ``parse.load_parser`` reads this grammar back into NLTK,
ready for use in parsing.

    >>> cp = parse.load_parser('grammars/book_grammars/feat0.fcfg', trace=1)
    >>> sent = 'Kim likes children'
    >>> tokens = sent.split()
    >>> tokens
    ['Kim', 'likes', 'children']
    >>> trees = cp.parse(tokens)
    |.Kim .like.chil.|
    |[----]    .    .| [0:1] 'Kim'
    |.    [----]    .| [1:2] 'likes'
    |.    .    [----]| [2:3] 'children'
    |[----]    .    .| [0:1] PropN[NUM='sg'] -> 'Kim' *
    |[----]    .    .| [0:1] NP[NUM='sg'] -> PropN[NUM='sg'] *
    |[---->    .    .| [0:1] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}
    |.    [----]    .| [1:2] TV[NUM='sg', TENSE='pres'] -> 'likes' *
    |.    [---->    .| [1:2] VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[] {?n: 'sg', ?t: 'pres'}
    |.    .    [----]| [2:3] N[NUM='pl'] -> 'children' *
    |.    .    [----]| [2:3] NP[NUM='pl'] -> N[NUM='pl'] *
    |.    .    [---->| [2:3] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}
    |.    [---------]| [1:3] VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] *
    |[==============]| [0:3] S[] -> NP[NUM='sg'] VP[NUM='sg'] *
    >>> for tree in trees: print(tree)
    (S[]
      (NP[NUM='sg'] (PropN[NUM='sg'] Kim))
      (VP[NUM='sg', TENSE='pres']
        (TV[NUM='sg', TENSE='pres'] likes)
        (NP[NUM='pl'] (N[NUM='pl'] children))))

The parser works directly with
the underspecified productions given by the grammar. That is, the
Predictor rule does not attempt to compile out all admissible feature
combinations before trying to expand the non-terminals on the left hand
side of a production. However, when the Scanner matches an input word
against a lexical production that has been predicted, the new edge will
typically contain fully specified features; e.g., the edge
[PropN[`num`:feat: = `sg`:fval:] |rarr| 'Kim', (0, 1)]. Recall from
Chapter 8 that the Fundamental (or Completer) Rule in
standard CFGs is used to combine an incomplete edge that's expecting a
nonterminal *B* with a following, complete edge whose left hand side
matches *B*. In our current setting, rather than checking for a
complete match, we test whether the expected category *B* will
`unify`:dt: with the left hand side *B'* of a following complete
edge. We will explain in more detail in Section 9.2 how
unification works; for the moment, it is enough to know that as a
result of unification, any variable values of features in *B* will be
instantiated by constant values in the corresponding feature structure
in *B'*, and these instantiated values will be used in the new edge
added by the Completer. This instantiation can be seen, for example,
in the edge
[NP [`num`:feat:\ =\ `sg`:fval:] |rarr| PropN[`num`:feat:\ =\ `sg`:fval:] |dot|, (0, 1)]
in Example 9.2, where the feature `num`:feat: has been assigned the value `sg`:fval:.
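
As a minimal illustration of this instantiation step, we can unify an
underspecified feature structure (playing the role of *B*) with a fully
specified one (playing the role of *B'*), using the ``FeatStruct`` class
introduced just below; the variable ``?n`` is bound to the constant
value (a small sketch; the output shown is what we expect):

    >>> b = nltk.FeatStruct("[NUM=?n]")
    >>> b_prime = nltk.FeatStruct("[NUM='sg']")
    >>> print(nltk.unify(b, b_prime))
    [ NUM = 'sg' ]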

Feature structures in NLTK are declared with the ``FeatStruct()``
constructor. Atomic feature values can be strings or integers.

    >>> fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')
    >>> print(fs1)
    [ NUM   = 'sg'   ]
    [ TENSE = 'past' ]

We can think of a feature structure as being like a Python dictionary,
and access its values by indexing in the usual way.

    >>> fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')
    >>> print(fs1['GND'])
    fem
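
Since a feature structure behaves like a dictionary, membership tests
also work in the usual way (a small check, assuming the usual
``dict``-style behavior):

    >>> 'GND' in fs1
    True
    >>> 'CASE' in fs1
    False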

We can also define feature structures which have complex values, as
discussed earlier.

    >>> fs2 = nltk.FeatStruct(POS='N', AGR=fs1)
    >>> print(fs2)
    [       [ GND = 'fem' ] ]
    [ AGR = [ NUM = 'pl'  ] ]
    [       [ PER = 3     ] ]
    [                       ]
    [ POS = 'N'             ]
    >>> print(fs2['AGR'])
    [ GND = 'fem' ]
    [ NUM = 'pl'  ]
    [ PER = 3     ]
    >>> print(fs2['AGR']['PER'])
    3
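
A sequence of feature names, or feature path, can also be given as a
tuple when indexing (assuming NLTK's tuple-path support in
``FeatStruct`` indexing):

    >>> fs2['AGR', 'PER']
    3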

Feature structures can also be constructed by passing a bracketed string
representation to ``nltk.FeatStruct``. Note that in this case, atomic
feature values do not need to be enclosed in quotes.

    >>> f1 = nltk.FeatStruct("[NUMBER = sg]")
    >>> f2 = nltk.FeatStruct("[PERSON = 3]")
    >>> print(nltk.unify(f1, f2))
    [ NUMBER = 'sg' ]
    [ PERSON = 3    ]

    >>> f1 = nltk.FeatStruct("[A = [B = b, D = d]]")
    >>> f2 = nltk.FeatStruct("[A = [C = c, D = d]]")
    >>> print(nltk.unify(f1, f2))
    [     [ B = 'b' ] ]
    [ A = [ C = 'c' ] ]
    [     [ D = 'd' ] ]


Feature Structures as Graphs
----------------------------

Feature structures are not inherently tied to linguistic objects; they are
general-purpose structures for representing knowledge. For example, we
could encode information about a person in a feature structure:

    >>> person01 = nltk.FeatStruct("[NAME=Lee, TELNO='01 27 86 42 96', AGE=33]")
    >>> print(person01)
    [ AGE   = 33               ]
    [ NAME  = 'Lee'            ]
    [ TELNO = '01 27 86 42 96' ]

There are a number of notations for representing reentrancy in
matrix-style representations of feature structures. In NLTK, we adopt
the following convention: the first occurrence of a shared feature structure
is prefixed with an integer in parentheses, such as ``(1)``, and any
subsequent reference to that structure uses the notation
``->(1)``, as shown below.

    >>> fs = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
    ...                               SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
    >>> print(fs)
    [ ADDRESS = (1) [ NUMBER = 74           ] ]
    [               [ STREET = 'rue Pascal' ] ]
    [                                         ]
    [ NAME    = 'Lee'                         ]
    [                                         ]
    [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
    [           [ NAME    = 'Kim' ]           ]

There can be any number of tags within a single feature structure.

    >>> fs3 = nltk.FeatStruct("[A=(1)[B=b], C=(2)[], D->(1), E->(2)]")
    >>> print(fs3)
    [ A = (1) [ B = 'b' ] ]
    [                     ]
    [ C = (2) []          ]
    [                     ]
    [ D -> (1)            ]
    [ E -> (2)            ]

Unification merges the information in two feature structures:

    >>> fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
    >>> fs2 = nltk.FeatStruct(CITY='Paris')
    >>> print(nltk.unify(fs1, fs2))
    [ CITY   = 'Paris'      ]
    [ NUMBER = 74           ]
    [ STREET = 'rue Pascal' ]

Unification is commutative:

    >>> nltk.unify(fs1, fs2) == nltk.unify(fs2, fs1)
    True

Unification is associative:

    >>> fs3 = nltk.FeatStruct(TELNO='01 27 86 42 96')
    >>> nltk.unify(nltk.unify(fs1, fs2), fs3) == nltk.unify(fs1, nltk.unify(fs2, fs3))
    True
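
Unification is also idempotent: unifying a feature structure with itself
yields an equal structure (a quick check, output assumed):

    >>> nltk.unify(fs1, fs1) == fs1
    True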

Unification between `FS`:math:\ :subscript:`0` and `FS`:math:\
:subscript:`1` will fail if the two feature structures share a path |pi|,
but the value of |pi| in `FS`:math:\ :subscript:`0` is a distinct
atom from the value of |pi| in `FS`:math:\ :subscript:`1`. In NLTK,
this is implemented by setting the result of unification to be
``None``.

    >>> fs0 = nltk.FeatStruct(A='a')
    >>> fs1 = nltk.FeatStruct(A='b')
    >>> print(nltk.unify(fs0, fs1))
    None
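
Note that ``unify`` is non-destructive by default: the operands are left
untouched even when unification fails (a quick check):

    >>> print(fs0)
    [ A = 'a' ]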

Now, if we look at how unification interacts with structure-sharing,
things become more interesting. First, consider a version of the Lee/Kim
structure in which the two occurrences of the address are distinct
copies rather than a single shared object:

    >>> fs0 = nltk.FeatStruct("""[NAME=Lee,
    ...                                ADDRESS=[NUMBER=74,
    ...                                         STREET='rue Pascal'],
    ...                                SPOUSE= [NAME=Kim,
    ...                                         ADDRESS=[NUMBER=74,
    ...                                                  STREET='rue Pascal']]]""")
    >>> print(fs0)
    [ ADDRESS = [ NUMBER = 74           ]               ]
    [           [ STREET = 'rue Pascal' ]               ]
    [                                                   ]
    [ NAME    = 'Lee'                                   ]
    [                                                   ]
    [           [ ADDRESS = [ NUMBER = 74           ] ] ]
    [ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
    [           [                                     ] ]
    [           [ NAME    = 'Kim'                     ] ]

    >>> fs1 = nltk.FeatStruct("[SPOUSE=[ADDRESS=[CITY=Paris]]]")
    >>> print(nltk.unify(fs0, fs1))
    [ ADDRESS = [ NUMBER = 74           ]               ]
    [           [ STREET = 'rue Pascal' ]               ]
    [                                                   ]
    [ NAME    = 'Lee'                                   ]
    [                                                   ]
    [           [           [ CITY   = 'Paris'      ] ] ]
    [           [ ADDRESS = [ NUMBER = 74           ] ] ]
    [ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
    [           [                                     ] ]
    [           [ NAME    = 'Kim'                     ] ]

Since fs0 contains two distinct copies of the address, only the copy
inside SPOUSE has acquired the CITY feature. Now compare what happens
when the address is a single shared object, as in fs2:

    >>> fs2 = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
    ...                                SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
    >>> print(fs2)
    [ ADDRESS = (1) [ NUMBER = 74           ] ]
    [               [ STREET = 'rue Pascal' ] ]
    [                                         ]
    [ NAME    = 'Lee'                         ]
    [                                         ]
    [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
    [           [ NAME    = 'Kim' ]           ]

    >>> print(nltk.unify(fs2, fs1))
    [               [ CITY   = 'Paris'      ] ]
    [ ADDRESS = (1) [ NUMBER = 74           ] ]
    [               [ STREET = 'rue Pascal' ] ]
    [                                         ]
    [ NAME    = 'Lee'                         ]
    [                                         ]
    [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
    [           [ NAME    = 'Kim' ]           ]

Because the address is one shared object in fs2, the CITY feature is
added inside the structure tagged (1), and is therefore visible from
both the ADDRESS path and the SPOUSE ADDRESS path.

Structure sharing can also be induced by variables such as ``?x``: when
a variable is bound during unification, every feature whose value was
that variable ends up pointing at the same structure.

    >>> fs1 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
    >>> fs2 = nltk.FeatStruct("[ADDRESS1=?x, ADDRESS2=?x]")
    >>> print(fs2)
    [ ADDRESS1 = ?x ]
    [ ADDRESS2 = ?x ]
    >>> print(nltk.unify(fs1, fs2))
    [ ADDRESS1 = (1) [ NUMBER = 74           ] ]
    [                [ STREET = 'rue Pascal' ] ]
    [                                          ]
    [ ADDRESS2 -> (1)                          ]
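
The reentrancy tag in the result again denotes object identity (a quick
check, output assumed):

    >>> fs = nltk.unify(fs1, fs2)
    >>> fs['ADDRESS1'] is fs['ADDRESS2']
    True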

The ``feat1.fcfg`` grammar uses slash categories such as ``S[]/NP[]`` to
propagate a 'gap' introduced by a *wh*-phrase down to the position where
it is interpreted. Let's parse a question containing such a long-distance
dependency:

    >>> sent = 'who do you claim that you like'
    >>> tokens = sent.split()
    >>> cp = parse.load_parser('grammars/book_grammars/feat1.fcfg', trace=1)
    >>> trees = cp.parse(tokens)
    |.w.d.y.c.t.y.l.|
    |[-] . . . . . .| [0:1] 'who'
    |. [-] . . . . .| [1:2] 'do'
    |. . [-] . . . .| [2:3] 'you'
    |. . . [-] . . .| [3:4] 'claim'
    |. . . . [-] . .| [4:5] 'that'
    |. . . . . [-] .| [5:6] 'you'
    |. . . . . . [-]| [6:7] 'like'
    |# . . . . . . .| [0:0] NP[]/NP[] -> *
    |. # . . . . . .| [1:1] NP[]/NP[] -> *
    |. . # . . . . .| [2:2] NP[]/NP[] -> *
    |. . . # . . . .| [3:3] NP[]/NP[] -> *
    |. . . . # . . .| [4:4] NP[]/NP[] -> *
    |. . . . . # . .| [5:5] NP[]/NP[] -> *
    |. . . . . . # .| [6:6] NP[]/NP[] -> *
    |. . . . . . . #| [7:7] NP[]/NP[] -> *
    |[-] . . . . . .| [0:1] NP[+WH] -> 'who' *
    |[-> . . . . . .| [0:1] S[-INV] -> NP[] * VP[] {}
    |[-> . . . . . .| [0:1] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
    |[-> . . . . . .| [0:1] S[-INV] -> NP[] * S[]/NP[] {}
    |. [-] . . . . .| [1:2] V[+AUX] -> 'do' *
    |. [-> . . . . .| [1:2] S[+INV] -> V[+AUX] * NP[] VP[] {}
    |. [-> . . . . .| [1:2] S[+INV]/?x[] -> V[+AUX] * NP[] VP[]/?x[] {}
    |. [-> . . . . .| [1:2] VP[] -> V[+AUX] * VP[] {}
    |. [-> . . . . .| [1:2] VP[]/?x[] -> V[+AUX] * VP[]/?x[] {}
    |. . [-] . . . .| [2:3] NP[-WH] -> 'you' *
    |. . [-> . . . .| [2:3] S[-INV] -> NP[] * VP[] {}
    |. . [-> . . . .| [2:3] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
    |. . [-> . . . .| [2:3] S[-INV] -> NP[] * S[]/NP[] {}
    |. [---> . . . .| [1:3] S[+INV] -> V[+AUX] NP[] * VP[] {}
    |. [---> . . . .| [1:3] S[+INV]/?x[] -> V[+AUX] NP[] * VP[]/?x[] {}
    |. . . [-] . . .| [3:4] V[-AUX, SUBCAT='clause'] -> 'claim' *
    |. . . [-> . . .| [3:4] VP[] -> V[-AUX, SUBCAT='clause'] * SBar[] {}
    |. . . [-> . . .| [3:4] VP[]/?x[] -> V[-AUX, SUBCAT='clause'] * SBar[]/?x[] {}
    |. . . . [-] . .| [4:5] Comp[] -> 'that' *
    |. . . . [-> . .| [4:5] SBar[] -> Comp[] * S[-INV] {}
    |. . . . [-> . .| [4:5] SBar[]/?x[] -> Comp[] * S[-INV]/?x[] {}
    |. . . . . [-] .| [5:6] NP[-WH] -> 'you' *
    |. . . . . [-> .| [5:6] S[-INV] -> NP[] * VP[] {}
    |. . . . . [-> .| [5:6] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
    |. . . . . [-> .| [5:6] S[-INV] -> NP[] * S[]/NP[] {}
    |. . . . . . [-]| [6:7] V[-AUX, SUBCAT='trans'] -> 'like' *
    |. . . . . . [->| [6:7] VP[] -> V[-AUX, SUBCAT='trans'] * NP[] {}
    |. . . . . . [->| [6:7] VP[]/?x[] -> V[-AUX, SUBCAT='trans'] * NP[]/?x[] {}
    |. . . . . . [-]| [6:7] VP[]/NP[] -> V[-AUX, SUBCAT='trans'] NP[]/NP[] *
    |. . . . . [---]| [5:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
    |. . . . [-----]| [4:7] SBar[]/NP[] -> Comp[] S[-INV]/NP[] *
    |. . . [-------]| [3:7] VP[]/NP[] -> V[-AUX, SUBCAT='clause'] SBar[]/NP[] *
    |. . [---------]| [2:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
    |. [-----------]| [1:7] S[+INV]/NP[] -> V[+AUX] NP[] VP[]/NP[] *
    |[=============]| [0:7] S[-INV] -> NP[] S[]/NP[] *

    >>> trees = list(trees)
    >>> for tree in trees: print(tree)
    (S[-INV]
      (NP[+WH] who)
      (S[+INV]/NP[]
        (V[+AUX] do)
        (NP[-WH] you)
        (VP[]/NP[]
          (V[-AUX, SUBCAT='clause'] claim)
          (SBar[]/NP[]
            (Comp[] that)
            (S[-INV]/NP[]
              (NP[-WH] you)
              (VP[]/NP[] (V[-AUX, SUBCAT='trans'] like) (NP[]/NP[] )))))))

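The question receives exactly one analysis, in which the gap is filled
by an empty NP[]/NP[] constituent in object position:

    >>> len(trees)
    1
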
A different parser should give the same parse trees, but perhaps in a different order:

    >>> cp2 = parse.load_parser('grammars/book_grammars/feat1.fcfg', trace=1,
    ...                         parser=parse.FeatureEarleyChartParser)
    >>> trees2 = cp2.parse(tokens)
    |.w.d.y.c.t.y.l.|
    |[-] . . . . . .| [0:1] 'who'
    |. [-] . . . . .| [1:2] 'do'
    |. . [-] . . . .| [2:3] 'you'
    |. . . [-] . . .| [3:4] 'claim'
    |. . . . [-] . .| [4:5] 'that'
    |. . . . . [-] .| [5:6] 'you'
    |. . . . . . [-]| [6:7] 'like'
    |> . . . . . . .| [0:0] S[-INV] -> * NP[] VP[] {}
    |> . . . . . . .| [0:0] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
    |> . . . . . . .| [0:0] S[-INV] -> * NP[] S[]/NP[] {}
    |> . . . . . . .| [0:0] S[-INV] -> * Adv[+NEG] S[+INV] {}
    |> . . . . . . .| [0:0] S[+INV] -> * V[+AUX] NP[] VP[] {}
    |> . . . . . . .| [0:0] S[+INV]/?x[] -> * V[+AUX] NP[] VP[]/?x[] {}
    |> . . . . . . .| [0:0] NP[+WH] -> * 'who' {}
    |[-] . . . . . .| [0:1] NP[+WH] -> 'who' *
    |[-> . . . . . .| [0:1] S[-INV] -> NP[] * VP[] {}
    |[-> . . . . . .| [0:1] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
    |[-> . . . . . .| [0:1] S[-INV] -> NP[] * S[]/NP[] {}
    |. > . . . . . .| [1:1] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
    |. > . . . . . .| [1:1] S[+INV]/?x[] -> * V[+AUX] NP[] VP[]/?x[] {}
    |. > . . . . . .| [1:1] V[+AUX] -> * 'do' {}
    |. > . . . . . .| [1:1] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
    |. > . . . . . .| [1:1] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
    |. > . . . . . .| [1:1] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
    |. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='intrans'] {}
    |. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='trans'] NP[] {}
    |. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='clause'] SBar[] {}
    |. > . . . . . .| [1:1] VP[] -> * V[+AUX] VP[] {}
    |. [-] . . . . .| [1:2] V[+AUX] -> 'do' *
    |. [-> . . . . .| [1:2] S[+INV]/?x[] -> V[+AUX] * NP[] VP[]/?x[] {}
    |. [-> . . . . .| [1:2] VP[]/?x[] -> V[+AUX] * VP[]/?x[] {}
    |. [-> . . . . .| [1:2] VP[] -> V[+AUX] * VP[] {}
    |. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='intrans'] {}
    |. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='trans'] NP[] {}
    |. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='clause'] SBar[] {}
    |. . > . . . . .| [2:2] VP[] -> * V[+AUX] VP[] {}
    |. . > . . . . .| [2:2] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
    |. . > . . . . .| [2:2] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
    |. . > . . . . .| [2:2] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
    |. . > . . . . .| [2:2] NP[-WH] -> * 'you' {}
    |. . [-] . . . .| [2:3] NP[-WH] -> 'you' *
    |. [---> . . . .| [1:3] S[+INV]/?x[] -> V[+AUX] NP[] * VP[]/?x[] {}
    |. . . > . . . .| [3:3] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
    |. . . > . . . .| [3:3] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
    |. . . > . . . .| [3:3] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
    |. . . > . . . .| [3:3] V[-AUX, SUBCAT='clause'] -> * 'claim' {}
    |. . . [-] . . .| [3:4] V[-AUX, SUBCAT='clause'] -> 'claim' *
    |. . . [-> . . .| [3:4] VP[]/?x[] -> V[-AUX, SUBCAT='clause'] * SBar[]/?x[] {}
    |. . . . > . . .| [4:4] SBar[]/?x[] -> * Comp[] S[-INV]/?x[] {}
    |. . . . > . . .| [4:4] Comp[] -> * 'that' {}
    |. . . . [-] . .| [4:5] Comp[] -> 'that' *
    |. . . . [-> . .| [4:5] SBar[]/?x[] -> Comp[] * S[-INV]/?x[] {}
    |. . . . . > . .| [5:5] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
    |. . . . . > . .| [5:5] NP[-WH] -> * 'you' {}
    |. . . . . [-] .| [5:6] NP[-WH] -> 'you' *
    |. . . . . [-> .| [5:6] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
    |. . . . . . > .| [6:6] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
    |. . . . . . > .| [6:6] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
    |. . . . . . > .| [6:6] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
    |. . . . . . > .| [6:6] V[-AUX, SUBCAT='trans'] -> * 'like' {}
    |. . . . . . [-]| [6:7] V[-AUX, SUBCAT='trans'] -> 'like' *
    |. . . . . . [->| [6:7] VP[]/?x[] -> V[-AUX, SUBCAT='trans'] * NP[]/?x[] {}
    |. . . . . . . #| [7:7] NP[]/NP[] -> *
    |. . . . . . [-]| [6:7] VP[]/NP[] -> V[-AUX, SUBCAT='trans'] NP[]/NP[] *
    |. . . . . [---]| [5:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
    |. . . . [-----]| [4:7] SBar[]/NP[] -> Comp[] S[-INV]/NP[] *
    |. . . [-------]| [3:7] VP[]/NP[] -> V[-AUX, SUBCAT='clause'] SBar[]/NP[] *
    |. [-----------]| [1:7] S[+INV]/NP[] -> V[+AUX] NP[] VP[]/NP[] *
    |[=============]| [0:7] S[-INV] -> NP[] S[]/NP[] *

    >>> sorted(trees) == sorted(trees2)
    True

Let's load a German grammar, which enforces case as well as agreement:

    >>> cp = parse.load_parser('grammars/book_grammars/german.fcfg', trace=0)
    >>> sent = 'die Katze sieht den Hund'
    >>> tokens = sent.split()
    >>> trees = cp.parse(tokens)
    >>> for tree in trees: print(tree)
    (S[]
      (NP[AGR=[GND='fem', NUM='sg', PER=3], CASE='nom']
        (Det[AGR=[GND='fem', NUM='sg', PER=3], CASE='nom'] die)
        (N[AGR=[GND='fem', NUM='sg', PER=3]] Katze))
      (VP[AGR=[NUM='sg', PER=3]]
        (TV[AGR=[NUM='sg', PER=3], OBJCASE='acc'] sieht)
        (NP[AGR=[GND='masc', NUM='sg', PER=3], CASE='acc']
          (Det[AGR=[GND='masc', NUM='sg', PER=3], CASE='acc'] den)
          (N[AGR=[GND='masc', NUM='sg', PER=3]] Hund))))

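Case marking is enforced through unification as well: if the object
turns up with nominative morphology, unification with the verb's
``OBJCASE`` requirement fails and no tree is produced (a quick check,
assuming the grammar does not admit 'der' as an accusative determiner):

    >>> list(cp.parse('die Katze sieht der Hund'.split()))
    []
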
Grammar with Binding Operators
------------------------------

The `bindop.fcfg`_ grammar is a semantic grammar that uses lambda
calculus. Each element has a core semantics, which is a single lambda
calculus expression, and a set of binding operators, which bind
variables.

.. _bindop.fcfg: http://nltk.svn.sourceforge.net/svnroot/nltk/trunk/nltk/data/grammars/bindop.fcfg

In order to make the binding operators work correctly, they need to
instantiate their bound variable every time they are added to the
chart. To do this, we use a special subclass of ``Chart``, called
``InstantiateVarsChart``.

    >>> from nltk.parse.featurechart import InstantiateVarsChart
    >>> cp = parse.load_parser('grammars/sample_grammars/bindop.fcfg', trace=1,
    ...                        chart_class=InstantiateVarsChart)
    >>> print(cp.grammar())
    Grammar with 15 productions (start state = S[])
        S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] VP[SEM=[BO=?b2, CORE=?vp]]
        VP[SEM=[BO={?b1+?b2}, CORE=<?v(?obj)>]] -> TV[SEM=[BO=?b1, CORE=?v]] NP[SEM=[BO=?b2, CORE=?obj]]
        VP[SEM=?s] -> IV[SEM=?s]
        NP[SEM=[BO={?b1+?b2+{bo(?det(?n),@x)}}, CORE=<@x>]] -> Det[SEM=[BO=?b1, CORE=?det]] N[SEM=[BO=?b2, CORE=?n]]
        Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] -> 'a'
        N[SEM=[BO={/}, CORE=<dog>]] -> 'dog'
        N[SEM=[BO={/}, CORE=<dog>]] -> 'cat'
        N[SEM=[BO={/}, CORE=<dog>]] -> 'mouse'
        IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'barks'
        IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'eats'
        IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'walks'
        TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'feeds'
        TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'walks'
        NP[SEM=[BO={bo(\P.P(John),@x)}, CORE=<@x>]] -> 'john'
        NP[SEM=[BO={bo(\P.P(John),@x)}, CORE=<@x>]] -> 'alex'

A simple intransitive sentence. We first reset the variable counter in
``nltk.sem.logic``, so that the fresh variables allocated by
``InstantiateVarsChart`` come out the same on every run:

    >>> from nltk.sem import logic
    >>> logic._counter._value = 100

    >>> trees = cp.parse('john barks'.split())
    |. john.barks.|
    |[-----]     .| [0:1] 'john'
    |.     [-----]| [1:2] 'barks'
    |[-----]     .| [0:1] NP[SEM=[BO={bo(\P.P(John),z101)}, CORE=<z101>]] -> 'john' *
    |[----->     .| [0:1] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.P(John),z2)}, ?subj: <IndividualVariableExpression z2>}
    |.     [-----]| [1:2] IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'barks' *
    |.     [-----]| [1:2] VP[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] *
    |[===========]| [0:2] S[SEM=[BO={bo(\P.P(John),z2)}, CORE=<bark(z2)>]] -> NP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<z2>]] VP[SEM=[BO={/}, CORE=<\x.bark(x)>]] *
    >>> for tree in trees: print(tree)
    (S[SEM=[BO={bo(\P.P(John),z2)}, CORE=<bark(z2)>]]
      (NP[SEM=[BO={bo(\P.P(John),z101)}, CORE=<z101>]] john)
      (VP[SEM=[BO={/}, CORE=<\x.bark(x)>]]
        (IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] barks)))

A transitive sentence:

    >>> trees = cp.parse('john feeds a dog'.split())
    |.joh.fee. a .dog.|
    |[---]   .   .   .| [0:1] 'john'
    |.   [---]   .   .| [1:2] 'feeds'
    |.   .   [---]   .| [2:3] 'a'
    |.   .   .   [---]| [3:4] 'dog'
    |[---]   .   .   .| [0:1] NP[SEM=[BO={bo(\P.P(John),z102)}, CORE=<z102>]] -> 'john' *
    |[--->   .   .   .| [0:1] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.P(John),z2)}, ?subj: <IndividualVariableExpression z2>}
    |.   [---]   .   .| [1:2] TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'feeds' *
    |.   [--->   .   .| [1:2] VP[SEM=[BO={?b1+?b2}, CORE=<?v(?obj)>]] -> TV[SEM=[BO=?b1, CORE=?v]] * NP[SEM=[BO=?b2, CORE=?obj]] {?b1: {/}, ?v: <LambdaExpression \x y.feed(y,x)>}
    |.   .   [---]   .| [2:3] Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] -> 'a' *
    |.   .   [--->   .| [2:3] NP[SEM=[BO={?b1+?b2+{bo(?det(?n),@x)}}, CORE=<@x>]] -> Det[SEM=[BO=?b1, CORE=?det]] * N[SEM=[BO=?b2, CORE=?n]] {?b1: {/}, ?det: <LambdaExpression \Q P.exists x.(Q(x) & P(x))>}
    |.   .   .   [---]| [3:4] N[SEM=[BO={/}, CORE=<dog>]] -> 'dog' *
    |.   .   [-------]| [2:4] NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z103)}, CORE=<z103>]] -> Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] N[SEM=[BO={/}, CORE=<dog>]] *
    |.   .   [------->| [2:4] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.exists x.(dog(x) & P(x)),z2)}, ?subj: <IndividualVariableExpression z2>}
    |.   [-----------]| [1:4] VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]] -> TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<z2>]] *
    |[===============]| [0:4] S[SEM=[BO={bo(\P.P(John),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]] -> NP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<z2>]] VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<\y.feed(y,z3)>]] *

    >>> for tree in trees: print(tree)
    (S[SEM=[BO={bo(\P.P(John),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]]
      (NP[SEM=[BO={bo(\P.P(John),z102)}, CORE=<z102>]] john)
      (VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]]
        (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
        (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z103)}, CORE=<z103>]]
          (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
          (N[SEM=[BO={/}, CORE=<dog>]] dog))))

Turn down the verbosity:

    >>> cp = parse.load_parser('grammars/sample_grammars/bindop.fcfg', trace=0,
    ...                       chart_class=InstantiateVarsChart)

Reuse the same lexical item twice:

    >>> trees = cp.parse('john feeds john'.split())
    >>> for tree in trees: print(tree)
    (S[SEM=[BO={bo(\P.P(John),z2), bo(\P.P(John),z3)}, CORE=<feed(z2,z3)>]]
      (NP[SEM=[BO={bo(\P.P(John),z104)}, CORE=<z104>]] john)
      (VP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<\y.feed(y,z2)>]]
        (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
        (NP[SEM=[BO={bo(\P.P(John),z105)}, CORE=<z105>]] john)))

    >>> trees = cp.parse('a dog feeds a dog'.split())
    >>> for tree in trees: print(tree)
    (S[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]]
      (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z106)}, CORE=<z106>]]
        (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
        (N[SEM=[BO={/}, CORE=<dog>]] dog))
      (VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]]
        (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
        (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z107)}, CORE=<z107>]]
          (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
          (N[SEM=[BO={/}, CORE=<dog>]] dog))))