.. Copyright (C) 2001-2019 NLTK Project
.. For license information, see LICENSE.TXT

=========================
 Feature Grammar Parsing
=========================

.. include:: ../../../nltk_book/definitions.rst

Grammars can be parsed from strings.

    >>> from __future__ import print_function
    >>> import nltk
    >>> from nltk import grammar, parse
    >>> g = """
    ... % start DP
    ... DP[AGR=?a] -> D[AGR=?a] N[AGR=?a]
    ... D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that'
    ... D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those'
    ... D[AGR=[NUM='pl', PERS=1]] -> 'we'
    ... D[AGR=[PERS=2]] -> 'you'
    ... N[AGR=[NUM='sg', GND='m']] -> 'boy'
    ... N[AGR=[NUM='pl', GND='m']] -> 'boys'
    ... N[AGR=[NUM='sg', GND='f']] -> 'girl'
    ... N[AGR=[NUM='pl', GND='f']] -> 'girls'
    ... N[AGR=[NUM='sg']] -> 'student'
    ... N[AGR=[NUM='pl']] -> 'students'
    ... """
    >>> grammar = grammar.FeatureGrammar.fromstring(g)
    >>> tokens = 'these girls'.split()
    >>> parser = parse.FeatureEarleyChartParser(grammar)
    >>> trees = parser.parse(tokens)
    >>> for tree in trees: print(tree)
    (DP[AGR=[GND='f', NUM='pl', PERS=3]]
      (D[AGR=[NUM='pl', PERS=3]] these)
      (N[AGR=[GND='f', NUM='pl']] girls))
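Since the determiner and noun must share their ``AGR`` features, an
agreement mismatch leaves the chart without any complete ``DP`` edge, and
parsing yields no trees at all. As a quick check of our own (not part of
the original example set):

    >>> list(parser.parse('this girls'.split()))
    []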
In general, when we are trying to develop even a very small grammar,
it is convenient to put the rules in a file where they can be edited,
tested and revised. Let's assume that we have saved feat0cfg_ as a file named
``'feat0.fcfg'`` and placed it in the NLTK ``data`` directory. We can
inspect it as follows:

.. _feat0cfg: http://nltk.svn.sourceforge.net/svnroot/nltk/trunk/nltk/data/grammars/feat0.fcfg

    >>> nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')
    % start S
    # ###################
    # Grammar Productions
    # ###################
    # S expansion productions
    S -> NP[NUM=?n] VP[NUM=?n]
    # NP expansion productions
    NP[NUM=?n] -> N[NUM=?n]
    NP[NUM=?n] -> PropN[NUM=?n]
    NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
    NP[NUM=pl] -> N[NUM=pl]
    # VP expansion productions
    VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]
    VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
    # ###################
    # Lexical Productions
    # ###################
    Det[NUM=sg] -> 'this' | 'every'
    Det[NUM=pl] -> 'these' | 'all'
    Det -> 'the' | 'some' | 'several'
    PropN[NUM=sg]-> 'Kim' | 'Jody'
    N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'
    N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'
    IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
    TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
    IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
    TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
    IV[TENSE=past] -> 'disappeared' | 'walked'
    TV[TENSE=past] -> 'saw' | 'liked'

The function ``parse.load_parser`` allows us to read this grammar into
NLTK, ready for use in parsing.

    >>> cp = parse.load_parser('grammars/book_grammars/feat0.fcfg', trace=1)
    >>> sent = 'Kim likes children'
    >>> tokens = sent.split()
    >>> tokens
    ['Kim', 'likes', 'children']
    >>> trees = cp.parse(tokens)
    |.Kim .like.chil.|
    |[----]    .    .| [0:1] 'Kim'
    |.    [----]    .| [1:2] 'likes'
    |.    .    [----]| [2:3] 'children'
    |[----]    .    .| [0:1] PropN[NUM='sg'] -> 'Kim' *
    |[----]    .    .| [0:1] NP[NUM='sg'] -> PropN[NUM='sg'] *
    |[---->    .    .| [0:1] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}
    |.    [----]    .| [1:2] TV[NUM='sg', TENSE='pres'] -> 'likes' *
    |.    [---->    .| [1:2] VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[] {?n: 'sg', ?t: 'pres'}
    |.    .    [----]| [2:3] N[NUM='pl'] -> 'children' *
    |.    .    [----]| [2:3] NP[NUM='pl'] -> N[NUM='pl'] *
    |.    .    [---->| [2:3] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}
    |.    [---------]| [1:3] VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] *
    |[==============]| [0:3] S[] -> NP[NUM='sg'] VP[NUM='sg'] *
    >>> for tree in trees: print(tree)
    (S[]
      (NP[NUM='sg'] (PropN[NUM='sg'] Kim))
      (VP[NUM='sg', TENSE='pres']
        (TV[NUM='sg', TENSE='pres'] likes)
        (NP[NUM='pl'] (N[NUM='pl'] children))))
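Subject-verb agreement is enforced through the shared variable ``?n`` in
``S -> NP[NUM=?n] VP[NUM=?n]``: a sentence whose subject and verb disagree
in number receives no parse. Here is a small check of our own (``cp0`` is
our name for a fresh, non-tracing parser):

    >>> cp0 = parse.load_parser('grammars/book_grammars/feat0.fcfg')
    >>> list(cp0.parse('Kim like children'.split()))
    []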
The parser works directly with
the underspecified productions given by the grammar. That is, the
Predictor rule does not attempt to compile out all admissible feature
combinations before trying to expand the non-terminals on the left-hand
side of a production. However, when the Scanner matches an input word
against a lexical production that has been predicted, the new edge will
typically contain fully specified features; e.g., the edge
[PropN[`num`:feat: = `sg`:fval:] |rarr| 'Kim', (0, 1)]. Recall from
Chapter 8 that the Fundamental (or Completer) Rule in
standard CFGs is used to combine an incomplete edge that is expecting a
nonterminal *B* with a following, complete edge whose left-hand side
matches *B*. In our current setting, rather than checking for a
complete match, we test whether the expected category *B* will
`unify`:dt: with the left-hand side *B'* of a following complete
edge. We will explain in more detail in Section 9.2 how
unification works; for the moment, it is enough to know that as a
result of unification, any variable values of features in *B* will be
instantiated by constant values in the corresponding feature structure
in *B'*, and these instantiated values will be used in the new edge
added by the Completer. This instantiation can be seen, for example,
in the edge
[NP [`num`:feat:\ =\ `sg`:fval:] |rarr| PropN[`num`:feat:\ =\ `sg`:fval:] |dot|, (0, 1)]
in Example 9.2, where the feature `num`:feat: has been assigned the value `sg`:fval:.

Feature structures in NLTK are created with the ``FeatStruct``
constructor. Atomic feature values can be strings or integers.

    >>> fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')
    >>> print(fs1)
    [ NUM   = 'sg'   ]
    [ TENSE = 'past' ]

We can think of a feature structure as being like a Python dictionary,
and access its values by indexing in the usual way.

    >>> fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')
    >>> print(fs1['GND'])
    fem

We can also define feature structures which have complex values, as
discussed earlier.

    >>> fs2 = nltk.FeatStruct(POS='N', AGR=fs1)
    >>> print(fs2)
    [       [ GND = 'fem' ] ]
    [ AGR = [ NUM = 'pl'  ] ]
    [       [ PER = 3     ] ]
    [                       ]
    [ POS = 'N'             ]
    >>> print(fs2['AGR'])
    [ GND = 'fem' ]
    [ NUM = 'pl'  ]
    [ PER = 3     ]
    >>> print(fs2['AGR']['PER'])
    3

Feature structures can also be constructed by passing a bracketed string
to the ``nltk.FeatStruct`` constructor, which parses it. Note that in
this case, atomic feature values do not need to be enclosed in quotes.

    >>> f1 = nltk.FeatStruct("[NUMBER = sg]")
    >>> f2 = nltk.FeatStruct("[PERSON = 3]")
    >>> print(nltk.unify(f1, f2))
    [ NUMBER = 'sg' ]
    [ PERSON = 3    ]

    >>> f1 = nltk.FeatStruct("[A = [B = b, D = d]]")
    >>> f2 = nltk.FeatStruct("[A = [C = c, D = d]]")
    >>> print(nltk.unify(f1, f2))
    [     [ B = 'b' ] ]
    [ A = [ C = 'c' ] ]
    [     [ D = 'd' ] ]


Feature Structures as Graphs
----------------------------

Feature structures are not inherently tied to linguistic objects; they are
general-purpose structures for representing knowledge. For example, we
could encode information about a person in a feature structure:

    >>> person01 = nltk.FeatStruct("[NAME=Lee, TELNO='01 27 86 42 96',AGE=33]")
    >>> print(person01)
    [ AGE   = 33               ]
    [ NAME  = 'Lee'            ]
    [ TELNO = '01 27 86 42 96' ]

There are a number of notations for representing reentrancy in
matrix-style representations of feature structures. In NLTK, we adopt
the following convention: the first occurrence of a shared feature structure
is prefixed with an integer in parentheses, such as ``(1)``, and any
subsequent reference to that structure uses the notation
``->(1)``, as shown below.

    >>> fs = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
    ...                          SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
    >>> print(fs)
    [ ADDRESS = (1) [ NUMBER = 74           ] ]
    [               [ STREET = 'rue Pascal' ] ]
    [                                         ]
    [ NAME    = 'Lee'                         ]
    [                                         ]
    [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
    [           [ NAME    = 'Kim' ]           ]
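Shared structure is genuinely shared rather than copied. ``FeatStruct``
indexing also accepts feature paths (tuples of feature names), which lets
us verify the sharing directly; this small check is our own addition:

    >>> print(fs['SPOUSE', 'NAME'])
    Kim
    >>> fs['ADDRESS'] is fs['SPOUSE', 'ADDRESS']
    True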
There can be any number of tags within a single feature structure.

    >>> fs3 = nltk.FeatStruct("[A=(1)[B=b], C=(2)[], D->(1), E->(2)]")
    >>> print(fs3)
    [ A = (1) [ B = 'b' ] ]
    [                     ]
    [ C = (2) []          ]
    [                     ]
    [ D -> (1)            ]
    [ E -> (2)            ]

Unifying two feature structures merges their information into a single,
more specific structure:

    >>> fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
    >>> fs2 = nltk.FeatStruct(CITY='Paris')
    >>> print(nltk.unify(fs1, fs2))
    [ CITY   = 'Paris'      ]
    [ NUMBER = 74           ]
    [ STREET = 'rue Pascal' ]

Unification is symmetric:

    >>> nltk.unify(fs1, fs2) == nltk.unify(fs2, fs1)
    True

Unification is associative:

    >>> fs3 = nltk.FeatStruct(TELNO='01 27 86 42 96')
    >>> nltk.unify(nltk.unify(fs1, fs2), fs3) == nltk.unify(fs1, nltk.unify(fs2, fs3))
    True

Unification between `FS`:math:\ :subscript:`0` and `FS`:math:\
:subscript:`1` will fail if the two feature structures share a path |pi|,
but the value of |pi| in `FS`:math:\ :subscript:`0` is a distinct
atom from the value of |pi| in `FS`:math:\ :subscript:`1`. In NLTK,
this is implemented by setting the result of unification to be
``None``.

    >>> fs0 = nltk.FeatStruct(A='a')
    >>> fs1 = nltk.FeatStruct(A='b')
    >>> print(nltk.unify(fs0, fs1))
    None

Now, if we look at how unification interacts with structure-sharing,
things become really interesting.

    >>> fs0 = nltk.FeatStruct("""[NAME=Lee,
    ...                           ADDRESS=[NUMBER=74,
    ...                                    STREET='rue Pascal'],
    ...                           SPOUSE= [NAME=Kim,
    ...                                    ADDRESS=[NUMBER=74,
    ...                                             STREET='rue Pascal']]]""")
    >>> print(fs0)
    [ ADDRESS = [ NUMBER = 74           ]               ]
    [           [ STREET = 'rue Pascal' ]               ]
    [                                                   ]
    [ NAME    = 'Lee'                                   ]
    [                                                   ]
    [           [ ADDRESS = [ NUMBER = 74           ] ] ]
    [ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
    [           [                                     ] ]
    [           [ NAME    = 'Kim'                     ] ]

    >>> fs1 = nltk.FeatStruct("[SPOUSE=[ADDRESS=[CITY=Paris]]]")
    >>> print(nltk.unify(fs0, fs1))
    [ ADDRESS = [ NUMBER = 74           ]               ]
    [           [ STREET = 'rue Pascal' ]               ]
    [                                                   ]
    [ NAME    = 'Lee'                                   ]
    [                                                   ]
    [           [           [ CITY   = 'Paris'      ] ] ]
    [           [ ADDRESS = [ NUMBER = 74           ] ] ]
    [ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
    [           [                                     ] ]
    [           [ NAME    = 'Kim'                     ] ]

    >>> fs2 = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
    ...                           SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
    >>> print(fs2)
    [ ADDRESS = (1) [ NUMBER = 74           ] ]
    [               [ STREET = 'rue Pascal' ] ]
    [                                         ]
    [ NAME    = 'Lee'                         ]
    [                                         ]
    [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
    [           [ NAME    = 'Kim' ]           ]

    >>> print(nltk.unify(fs2, fs1))
    [               [ CITY   = 'Paris'      ] ]
    [ ADDRESS = (1) [ NUMBER = 74           ] ]
    [               [ STREET = 'rue Pascal' ] ]
    [                                         ]
    [ NAME    = 'Lee'                         ]
    [                                         ]
    [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
    [           [ NAME    = 'Kim' ]           ]

    >>> fs1 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
    >>> fs2 = nltk.FeatStruct("[ADDRESS1=?x, ADDRESS2=?x]")
    >>> print(fs2)
    [ ADDRESS1 = ?x ]
    [ ADDRESS2 = ?x ]
    >>> print(nltk.unify(fs1, fs2))
    [ ADDRESS1 = (1) [ NUMBER = 74           ] ]
    [                [ STREET = 'rue Pascal' ] ]
    [                                          ]
    [ ADDRESS2 -> (1)                          ]
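Because both addresses must be instantiated to the same value of ``?x``,
unification fails as soon as they carry conflicting information. A small
illustration of our own:

    >>> fs3 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74], ADDRESS2=[NUMBER=92]]")
    >>> print(nltk.unify(fs3, fs2))
    None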
Feature grammars can also handle unbounded dependencies, using slash
categories such as ``S[-INV]/NP`` (a sentence missing an ``NP`` somewhere
inside it). The grammar ``feat1.fcfg`` takes this approach; here we parse
a wh-question in which the object of 'like' has been fronted:

    >>> sent = 'who do you claim that you like'
    >>> tokens = sent.split()
    >>> cp = parse.load_parser('grammars/book_grammars/feat1.fcfg', trace=1)
    >>> trees = cp.parse(tokens)
    |.w.d.y.c.t.y.l.|
    |[-] . . . . . .| [0:1] 'who'
    |. [-] . . . . .| [1:2] 'do'
    |. . [-] . . . .| [2:3] 'you'
    |. . . [-] . . .| [3:4] 'claim'
    |. . . . [-] . .| [4:5] 'that'
    |. . . . . [-] .| [5:6] 'you'
    |. . . . . . [-]| [6:7] 'like'
    |# . . . . . . .| [0:0] NP[]/NP[] -> *
    |. # . . . . . .| [1:1] NP[]/NP[] -> *
    |. . # . . . . .| [2:2] NP[]/NP[] -> *
    |. . . # . . . .| [3:3] NP[]/NP[] -> *
    |. . . . # . . .| [4:4] NP[]/NP[] -> *
    |. . . . . # . .| [5:5] NP[]/NP[] -> *
    |. . . . . . # .| [6:6] NP[]/NP[] -> *
    |. . . . . . . #| [7:7] NP[]/NP[] -> *
    |[-] . . . . . .| [0:1] NP[+WH] -> 'who' *
    |[-> . . . . . .| [0:1] S[-INV] -> NP[] * VP[] {}
    |[-> . . . . . .| [0:1] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
    |[-> . . . . . .| [0:1] S[-INV] -> NP[] * S[]/NP[] {}
    |. [-] . . . . .| [1:2] V[+AUX] -> 'do' *
    |. [-> . . . . .| [1:2] S[+INV] -> V[+AUX] * NP[] VP[] {}
    |. [-> . . . . .| [1:2] S[+INV]/?x[] -> V[+AUX] * NP[] VP[]/?x[] {}
    |. [-> . . . . .| [1:2] VP[] -> V[+AUX] * VP[] {}
    |. [-> . . . . .| [1:2] VP[]/?x[] -> V[+AUX] * VP[]/?x[] {}
    |. . [-] . . . .| [2:3] NP[-WH] -> 'you' *
    |. . [-> . . . .| [2:3] S[-INV] -> NP[] * VP[] {}
    |. . [-> . . . .| [2:3] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
    |. . [-> . . . .| [2:3] S[-INV] -> NP[] * S[]/NP[] {}
    |. [---> . . . .| [1:3] S[+INV] -> V[+AUX] NP[] * VP[] {}
    |. [---> . . . .| [1:3] S[+INV]/?x[] -> V[+AUX] NP[] * VP[]/?x[] {}
    |. . . [-] . . .| [3:4] V[-AUX, SUBCAT='clause'] -> 'claim' *
    |. . . [-> . . .| [3:4] VP[] -> V[-AUX, SUBCAT='clause'] * SBar[] {}
    |. . . [-> . . .| [3:4] VP[]/?x[] -> V[-AUX, SUBCAT='clause'] * SBar[]/?x[] {}
    |. . . . [-] . .| [4:5] Comp[] -> 'that' *
    |. . . . [-> . .| [4:5] SBar[] -> Comp[] * S[-INV] {}
    |. . . . [-> . .| [4:5] SBar[]/?x[] -> Comp[] * S[-INV]/?x[] {}
    |. . . . . [-] .| [5:6] NP[-WH] -> 'you' *
    |. . . . . [-> .| [5:6] S[-INV] -> NP[] * VP[] {}
    |. . . . . [-> .| [5:6] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
    |. . . . . [-> .| [5:6] S[-INV] -> NP[] * S[]/NP[] {}
    |. . . . . . [-]| [6:7] V[-AUX, SUBCAT='trans'] -> 'like' *
    |. . . . . . [->| [6:7] VP[] -> V[-AUX, SUBCAT='trans'] * NP[] {}
    |. . . . . . [->| [6:7] VP[]/?x[] -> V[-AUX, SUBCAT='trans'] * NP[]/?x[] {}
    |. . . . . . [-]| [6:7] VP[]/NP[] -> V[-AUX, SUBCAT='trans'] NP[]/NP[] *
    |. . . . . [---]| [5:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
    |. . . . [-----]| [4:7] SBar[]/NP[] -> Comp[] S[-INV]/NP[] *
    |. . . [-------]| [3:7] VP[]/NP[] -> V[-AUX, SUBCAT='clause'] SBar[]/NP[] *
    |. . [---------]| [2:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
    |. [-----------]| [1:7] S[+INV]/NP[] -> V[+AUX] NP[] VP[]/NP[] *
    |[=============]| [0:7] S[-INV] -> NP[] S[]/NP[] *

    >>> trees = list(trees)
    >>> for tree in trees: print(tree)
    (S[-INV]
      (NP[+WH] who)
      (S[+INV]/NP[]
        (V[+AUX] do)
        (NP[-WH] you)
        (VP[]/NP[]
          (V[-AUX, SUBCAT='clause'] claim)
          (SBar[]/NP[]
            (Comp[] that)
            (S[-INV]/NP[]
              (NP[-WH] you)
              (VP[]/NP[] (V[-AUX, SUBCAT='trans'] like) (NP[]/NP[] )))))))
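The gap introduced by 'who' can be discharged in only one place, namely as
the object of 'like', so the question receives exactly one analysis under
this grammar. A quick check of our own:

    >>> len(trees)
    1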
A different parser should give the same parse trees, but perhaps in a different order:

    >>> cp2 = parse.load_parser('grammars/book_grammars/feat1.fcfg', trace=1,
    ...                         parser=parse.FeatureEarleyChartParser)
    >>> trees2 = cp2.parse(tokens)
    |.w.d.y.c.t.y.l.|
    |[-] . . . . . .| [0:1] 'who'
    |. [-] . . . . .| [1:2] 'do'
    |. . [-] . . . .| [2:3] 'you'
    |. . . [-] . . .| [3:4] 'claim'
    |. . . . [-] . .| [4:5] 'that'
    |. . . . . [-] .| [5:6] 'you'
    |. . . . . . [-]| [6:7] 'like'
    |> . . . . . . .| [0:0] S[-INV] -> * NP[] VP[] {}
    |> . . . . . . .| [0:0] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
    |> . . . . . . .| [0:0] S[-INV] -> * NP[] S[]/NP[] {}
    |> . . . . . . .| [0:0] S[-INV] -> * Adv[+NEG] S[+INV] {}
    |> . . . . . . .| [0:0] S[+INV] -> * V[+AUX] NP[] VP[] {}
    |> . . . . . . .| [0:0] S[+INV]/?x[] -> * V[+AUX] NP[] VP[]/?x[] {}
    |> . . . . . . .| [0:0] NP[+WH] -> * 'who' {}
    |[-] . . . . . .| [0:1] NP[+WH] -> 'who' *
    |[-> . . . . . .| [0:1] S[-INV] -> NP[] * VP[] {}
    |[-> . . . . . .| [0:1] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
    |[-> . . . . . .| [0:1] S[-INV] -> NP[] * S[]/NP[] {}
    |. > . . . . . .| [1:1] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
    |. > . . . . . .| [1:1] S[+INV]/?x[] -> * V[+AUX] NP[] VP[]/?x[] {}
    |. > . . . . . .| [1:1] V[+AUX] -> * 'do' {}
    |. > . . . . . .| [1:1] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
    |. > . . . . . .| [1:1] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
    |. > . . . . . .| [1:1] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
    |. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='intrans'] {}
    |. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='trans'] NP[] {}
    |. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='clause'] SBar[] {}
    |. > . . . . . .| [1:1] VP[] -> * V[+AUX] VP[] {}
    |. [-] . . . . .| [1:2] V[+AUX] -> 'do' *
    |. [-> . . . . .| [1:2] S[+INV]/?x[] -> V[+AUX] * NP[] VP[]/?x[] {}
    |. [-> . . . . .| [1:2] VP[]/?x[] -> V[+AUX] * VP[]/?x[] {}
    |. [-> . . . . .| [1:2] VP[] -> V[+AUX] * VP[] {}
    |. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='intrans'] {}
    |. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='trans'] NP[] {}
    |. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='clause'] SBar[] {}
    |. . > . . . . .| [2:2] VP[] -> * V[+AUX] VP[] {}
    |. . > . . . . .| [2:2] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
    |. . > . . . . .| [2:2] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
    |. . > . . . . .| [2:2] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
    |. . > . . . . .| [2:2] NP[-WH] -> * 'you' {}
    |. . [-] . . . .| [2:3] NP[-WH] -> 'you' *
    |. [---> . . . .| [1:3] S[+INV]/?x[] -> V[+AUX] NP[] * VP[]/?x[] {}
    |. . . > . . . .| [3:3] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
    |. . . > . . . .| [3:3] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
    |. . . > . . . .| [3:3] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
    |. . . > . . . .| [3:3] V[-AUX, SUBCAT='clause'] -> * 'claim' {}
    |. . . [-] . . .| [3:4] V[-AUX, SUBCAT='clause'] -> 'claim' *
    |. . . [-> . . .| [3:4] VP[]/?x[] -> V[-AUX, SUBCAT='clause'] * SBar[]/?x[] {}
    |. . . . > . . .| [4:4] SBar[]/?x[] -> * Comp[] S[-INV]/?x[] {}
    |. . . . > . . .| [4:4] Comp[] -> * 'that' {}
    |. . . . [-] . .| [4:5] Comp[] -> 'that' *
    |. . . . [-> . .| [4:5] SBar[]/?x[] -> Comp[] * S[-INV]/?x[] {}
    |. . . . . > . .| [5:5] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
    |. . . . . > . .| [5:5] NP[-WH] -> * 'you' {}
    |. . . . . [-] .| [5:6] NP[-WH] -> 'you' *
    |. . . . . [-> .| [5:6] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
    |. . . . . . > .| [6:6] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
    |. . . . . . > .| [6:6] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
    |. . . . . . > .| [6:6] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
    |. . . . . . > .| [6:6] V[-AUX, SUBCAT='trans'] -> * 'like' {}
    |. . . . . . [-]| [6:7] V[-AUX, SUBCAT='trans'] -> 'like' *
    |. . . . . . [->| [6:7] VP[]/?x[] -> V[-AUX, SUBCAT='trans'] * NP[]/?x[] {}
    |. . . . . . . #| [7:7] NP[]/NP[] -> *
    |. . . . . . [-]| [6:7] VP[]/NP[] -> V[-AUX, SUBCAT='trans'] NP[]/NP[] *
    |. . . . . [---]| [5:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
    |. . . . [-----]| [4:7] SBar[]/NP[] -> Comp[] S[-INV]/NP[] *
    |. . . [-------]| [3:7] VP[]/NP[] -> V[-AUX, SUBCAT='clause'] SBar[]/NP[] *
    |. [-----------]| [1:7] S[+INV]/NP[] -> V[+AUX] NP[] VP[]/NP[] *
    |[=============]| [0:7] S[-INV] -> NP[] S[]/NP[] *

    >>> sorted(trees) == sorted(trees2)
    True


Let's load a German grammar and parse a simple transitive sentence,
'die Katze sieht den Hund' ('the cat sees the dog'):

    >>> cp = parse.load_parser('grammars/book_grammars/german.fcfg', trace=0)
    >>> sent = 'die Katze sieht den Hund'
    >>> tokens = sent.split()
    >>> trees = cp.parse(tokens)
    >>> for tree in trees: print(tree)
    (S[]
      (NP[AGR=[GND='fem', NUM='sg', PER=3], CASE='nom']
        (Det[AGR=[GND='fem', NUM='sg', PER=3], CASE='nom'] die)
        (N[AGR=[GND='fem', NUM='sg', PER=3]] Katze))
      (VP[AGR=[NUM='sg', PER=3]]
        (TV[AGR=[NUM='sg', PER=3], OBJCASE='acc'] sieht)
        (NP[AGR=[GND='masc', NUM='sg', PER=3], CASE='acc']
          (Det[AGR=[GND='masc', NUM='sg', PER=3], CASE='acc'] den)
          (N[AGR=[GND='masc', NUM='sg', PER=3]] Hund))))
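The grammar also enforces case government: as the ``OBJCASE='acc'`` feature
above shows, the object of 'sieht' must be accusative, so substituting
nominative 'der Hund' for accusative 'den Hund' should leave the sentence
without a parse. This check is our own:

    >>> list(cp.parse('die Katze sieht der Hund'.split()))
    []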
Grammar with Binding Operators
------------------------------

The `bindop.fcfg`_ grammar is a semantic grammar that uses lambda
calculus. Each element has a core semantics, which is a single
lambda-calculus expression, and a set of binding operators, which bind
variables.

.. _bindop.fcfg: http://nltk.svn.sourceforge.net/svnroot/nltk/trunk/nltk/data/grammars/bindop.fcfg

In order to make the binding operators work correctly, they need to
instantiate their bound variable every time they are added to the
chart. To do this, we use a special subclass of `Chart`, called
`InstantiateVarsChart`.

    >>> from nltk.parse.featurechart import InstantiateVarsChart
    >>> cp = parse.load_parser('grammars/sample_grammars/bindop.fcfg', trace=1,
    ...                        chart_class=InstantiateVarsChart)
    >>> print(cp.grammar())
    Grammar with 15 productions (start state = S[])
        S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] VP[SEM=[BO=?b2, CORE=?vp]]
        VP[SEM=[BO={?b1+?b2}, CORE=<?v(?obj)>]] -> TV[SEM=[BO=?b1, CORE=?v]] NP[SEM=[BO=?b2, CORE=?obj]]
        VP[SEM=?s] -> IV[SEM=?s]
        NP[SEM=[BO={?b1+?b2+{bo(?det(?n),@x)}}, CORE=<@x>]] -> Det[SEM=[BO=?b1, CORE=?det]] N[SEM=[BO=?b2, CORE=?n]]
        Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] -> 'a'
        N[SEM=[BO={/}, CORE=<dog>]] -> 'dog'
        N[SEM=[BO={/}, CORE=<dog>]] -> 'cat'
        N[SEM=[BO={/}, CORE=<dog>]] -> 'mouse'
        IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'barks'
        IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'eats'
        IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'walks'
        TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'feeds'
        TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'walks'
        NP[SEM=[BO={bo(\P.P(John),@x)}, CORE=<@x>]] -> 'john'
        NP[SEM=[BO={bo(\P.P(John),@x)}, CORE=<@x>]] -> 'alex'

A simple intransitive sentence. We first reset the counter used to
generate fresh variable names, so that the variables in the output
below are predictable:

    >>> from nltk.sem import logic
    >>> logic._counter._value = 100

    >>> trees = cp.parse('john barks'.split())
    |. john.barks.|
    |[-----]     .| [0:1] 'john'
    |.     [-----]| [1:2] 'barks'
    |[-----]     .| [0:1] NP[SEM=[BO={bo(\P.P(John),z101)}, CORE=<z101>]] -> 'john' *
    |[----->     .| [0:1] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.P(John),z2)}, ?subj: <IndividualVariableExpression z2>}
    |.     [-----]| [1:2] IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'barks' *
    |.     [-----]| [1:2] VP[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] *
    |[===========]| [0:2] S[SEM=[BO={bo(\P.P(John),z2)}, CORE=<bark(z2)>]] -> NP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<z2>]] VP[SEM=[BO={/}, CORE=<\x.bark(x)>]] *
    >>> for tree in trees: print(tree)
    (S[SEM=[BO={bo(\P.P(John),z2)}, CORE=<bark(z2)>]]
      (NP[SEM=[BO={bo(\P.P(John),z101)}, CORE=<z101>]] john)
      (VP[SEM=[BO={/}, CORE=<\x.bark(x)>]]
        (IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] barks)))
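The node labels of these trees are themselves feature structures, so the
assembled semantics can be read off the root label: its ``CORE`` value is
an ordinary logic expression. A quick inspection of our own, reusing the
last ``tree`` bound by the loop above:

    >>> print(tree.label()['SEM']['CORE'])
    bark(z2)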
A transitive sentence:

    >>> trees = cp.parse('john feeds a dog'.split())
    |.joh.fee. a .dog.|
    |[---]   .   .   .| [0:1] 'john'
    |.   [---]   .   .| [1:2] 'feeds'
    |.   .   [---]   .| [2:3] 'a'
    |.   .   .   [---]| [3:4] 'dog'
    |[---]   .   .   .| [0:1] NP[SEM=[BO={bo(\P.P(John),z102)}, CORE=<z102>]] -> 'john' *
    |[--->   .   .   .| [0:1] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.P(John),z2)}, ?subj: <IndividualVariableExpression z2>}
    |.   [---]   .   .| [1:2] TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'feeds' *
    |.   [--->   .   .| [1:2] VP[SEM=[BO={?b1+?b2}, CORE=<?v(?obj)>]] -> TV[SEM=[BO=?b1, CORE=?v]] * NP[SEM=[BO=?b2, CORE=?obj]] {?b1: {/}, ?v: <LambdaExpression \x y.feed(y,x)>}
    |.   .   [---]   .| [2:3] Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] -> 'a' *
    |.   .   [--->   .| [2:3] NP[SEM=[BO={?b1+?b2+{bo(?det(?n),@x)}}, CORE=<@x>]] -> Det[SEM=[BO=?b1, CORE=?det]] * N[SEM=[BO=?b2, CORE=?n]] {?b1: {/}, ?det: <LambdaExpression \Q P.exists x.(Q(x) & P(x))>}
    |.   .   .   [---]| [3:4] N[SEM=[BO={/}, CORE=<dog>]] -> 'dog' *
    |.   .   [-------]| [2:4] NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z103)}, CORE=<z103>]] -> Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] N[SEM=[BO={/}, CORE=<dog>]] *
    |.   .   [------->| [2:4] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.exists x.(dog(x) & P(x)),z2)}, ?subj: <IndividualVariableExpression z2>}
    |.   [-----------]| [1:4] VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]] -> TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<z2>]] *
    |[===============]| [0:4] S[SEM=[BO={bo(\P.P(John),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]] -> NP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<z2>]] VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<\y.feed(y,z3)>]] *

    >>> for tree in trees: print(tree)
    (S[SEM=[BO={bo(\P.P(John),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]]
      (NP[SEM=[BO={bo(\P.P(John),z102)}, CORE=<z102>]] john)
      (VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]]
        (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
        (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z103)}, CORE=<z103>]]
          (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
          (N[SEM=[BO={/}, CORE=<dog>]] dog))))

Turn down the verbosity:

    >>> cp = parse.load_parser('grammars/sample_grammars/bindop.fcfg', trace=0,
    ...                        chart_class=InstantiateVarsChart)

Reuse the same lexical item twice:

    >>> trees = cp.parse('john feeds john'.split())
    >>> for tree in trees: print(tree)
    (S[SEM=[BO={bo(\P.P(John),z2), bo(\P.P(John),z3)}, CORE=<feed(z2,z3)>]]
      (NP[SEM=[BO={bo(\P.P(John),z104)}, CORE=<z104>]] john)
      (VP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<\y.feed(y,z2)>]]
        (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
        (NP[SEM=[BO={bo(\P.P(John),z105)}, CORE=<z105>]] john)))

    >>> trees = cp.parse('a dog feeds a dog'.split())
    >>> for tree in trees: print(tree)
    (S[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]]
      (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z106)}, CORE=<z106>]]
        (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
        (N[SEM=[BO={/}, CORE=<dog>]] dog))
      (VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]]
        (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
        (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z107)}, CORE=<z107>]]
          (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
          (N[SEM=[BO={/}, CORE=<dog>]] dog))))

Note that each occurrence of the reused item receives its own
instantiated variable (``z104`` vs. ``z105``, and ``z106`` vs. ``z107``),
so the two occurrences remain distinct in the resulting semantics; this is
exactly what `InstantiateVarsChart` provides.