1# PEG Parser Combinators Implemented in Rust 2 3This article introduces [pom](https://github.com/J-F-Liu/pom), a PEG parser combinator library implemented in Rust, using operator overloading without macros. 4 5## Why Rust? 6 7![Rust](rust.png) 8 9After I've learned C/C++ and C#, I found that choosing a new programming language can greatly affect a programmer's productivity. 10On one hand I keep sorting out new languages, there are hundreds of them, I examine and choose what I like best, my favorites are C#, Ruby, TypeScript and Rust. 11On the other hand I try to design a new language and implement a compiler by myself. 12 13I like the syntax provided by C#, but hate the huge .NET runtime. Dependency on CLR makes distribution of an application written in C# very hard. Compiling to native code is always what I longed for a programming language. 14In year 2003 I thought a compiler can get rid of garbage collector by generating free memory instructions in appropriate locations in the target program. 15But I didn't go deep into the design of the details of this mechanism, I decided to firstly write a working compiler, then improve the design of the language and implementation of the compiler bit by bit. 16 17The first stage of compilation is parsing. I tried some parser generators, but not satisfied with the result. 18Then I dig into the parsing theory, followed several books, implemented DFA, NFA, NFA to DFA conversion, LL(1), LR, LALR algorithms, 19then wrote a parser to parse BNF, EBNF or TBNF grammar file, and generate parser code corresponding to the grammar. 20 21The syntax/semantics analysis and code generation parts of a compiler are more difficult. 22I even tried to define a intermediate assembly language, at that time I didn't know LLVM. My effort of writing a compiler ceased for years, then Rust was born. 23 24At first glance, the Rust's syntax is a bit strange, why use `fn` instaed of `def`, why use `let mut` instead of `var`, I was not attracted by it. 25After read a publication on O'Reilly [*Why Rust?*](http://www.oreilly.com/programming/free/files/why-rust.pdf) I suddenly realized that this is language I'm trying to build, 26when you actually start using Rust you'll find that `fn` and `let mut` fits Rust's logic well. For me, **Rust is once a dream now a reality.** 27 28Rust has a steep learning curve, more challenging than any of the previous programming languages I learned. All this learning is worthwhile when you finally get your program working and polished. 29Object oriented class hierarchy is not good enough for code reuse, Rust's enum, tuple, struct and trait type system is a better solution. 30I still wondering whether the Rust compiler can be smart enough to elide all the lifetime parameters, they are mostly noise and obstacle when reading and writing programs. 31 32## What is PEG? 33 34When I discovered [PEG](http://bford.info/packrat/), I knew that all my previous work on LALR can be thrown away. 35I rewrote my parser generator using and working with PEG. Using this parser generator I created a [YAML parser](https://www.codeproject.com/Articles/28720/YAML-Parser-in-C) and a [Lua Interpreter](https://www.codeproject.com/Articles/228212/Lua-Interpreter). 36 37[Parsing Expression Grammars](http://en.wikipedia.org/wiki/Parsing_expression_grammar) (PEGs) are an alternative to [Context-Free Grammars](http://en.wikipedia.org/wiki/Context-free_grammar) (CFGs) for formally specifying syntax. 38CFG describe a rule system to generate language strings while PEG describe a rule system to recognize language strings. 39 40![CFG](cfg.png) 41 42![PEG](peg.png) 43 44Unlike CFGs, PEGs cannot be ambiguous; if a string parses, it has exactly one valid parse tree. 45We normally specify our languages directly by how to recognize it, so PEG is both closer match to syntax practices and more powerful than nondeterministic CFG. 46 47### Parsing expressions 48 49| Expr | Description | 50| ---------- | ----------------------------------- | 51| ε | the empty string | 52| a | terminal (a ∈ Σ) | 53| A | non-terminal (A ∈ N) | 54| e1 e2 | sequence | 55| e1 / e2 | prioritized choice | 56| e?, e*, e+ | optional, zero-or-more, one-or-more | 57| &e, !e | syntactic predicates | 58 59## What is parser combinator? 60 61When I heard of Parsec in the Haskell world, I got the concept of parser combinator for my first time. 62 63A *parser* is a function which takes a *string* (a series of *symbols*) as input, and returns matching result as *output*. 64 65A *combinator* is a higher-order function (a "functional") which takes zero or more functions (each of the same type) as input and returns a new function of the same type as output. 66 67A *parser combinator* is a higher-order function which takes parsers as input and returns a new parser as output. 68 69Parser combinators allow you write grammar rules and create a parser directly in the host language, without a separated parser generation step, so the whole procedure is more fluent. 70 71## How to implement parser combinators? 72 73I thought deeply about how to implement parser combinator using language constructs provided by Rust. In summary, there are four approaches: 74 751. Parser as closure 76 77 ```rust 78 pub fn empty<I>() -> impl Fn(&mut Input<I>) -> Result<()> { 79 |_: &mut Input<I>| Ok(()) 80 } 81 82 pub fn term<I>(t: I) -> impl Fn(&mut Input<I>) -> Result<I> { 83 ... 84 } 85 86 pub fn seq<'a, I>(tag: &'a [I]) -> impl Fn(&mut Input<I>) -> Result<&'a [I]> { 87 ... 88 } 89 ... 90 // To create a parser for integer 91 let parser = concatenate(optional(one_of("+-")), one_or_more(one_of("0123456789"))); 92 ``` 93 94 *Pros*: Less implementation code. 95 96 *Cons*: Cannot overload operators, poor readability. 97 982. Parser as struct 99 100 ```rust 101 pub struct Parser<I, O> { 102 method: Box<Fn(&mut Input<I>) -> Result<O>>, 103 } 104 105 impl<I, O> Parser<I, O> { 106 /// Create new parser. 107 pub fn new<P>(parse: P) -> Parser<I, O> 108 where P: Fn(&mut Input<I>) -> Result<O> + 'static 109 { 110 Parser { method: Box::new(parse) } 111 } 112 113 /// Apply the parser to parse input. 114 pub fn parse(&self, input: &mut Input<I>) -> Result<O> { 115 (self.method)(input) 116 } 117 ... 118 } 119 120 pub fn empty<I>() -> Parser<I, ()> { 121 Parser::new(|_: &mut Input<I>| Ok(())) 122 } 123 124 pub fn term<I>(t: I) -> Parser<I, I> { 125 ... 126 } 127 ... 128 impl<I: Copy, O, U> Add<Parser<I, U>> for Parser<I, O> { 129 type Output = Parser<I, (O, U)>; 130 131 fn add(self, other: Parser<I, U>) -> Self::Output 132 where I: 'static, 133 O: 'static, 134 U: 'static 135 { 136 Parser::new(move |input: &mut Input<I>| { 137 let start = input.position(); 138 let result = self.parse(input) 139 .and_then(|out1| other.parse(input).map(|out2| (out1, out2))); 140 if result.is_err() { 141 input.jump_to(start); 142 } 143 result 144 }) 145 } 146 } 147 ... 148 // To create a parser for integer 149 let parser = one_of("+-").opt() + one_of("0123456789").repeat(1..); 150 ``` 151 152 *Pros*: Can overload operators, elegant code. 153 154 *Cons*: Depends on compiler's zero-cost abstractions to optimize runtime performance. 155 156 Crate [pom](https://github.com/J-F-Liu/pom) is using this approach. 157 1583. Parser as trait 159 160 ```rust 161 pub trait Parser { 162 type I: ?Sized; 163 type O; 164 165 fn parse<'a>(&self, data: &'a Self::I) -> ParseResult<&'a Self::I, Self::O>; 166 } 167 168 pub trait ParserCombinator : Parser + Clone { 169 fn then<P: Parser<I=Self::I>>(&self, p: P) -> ChainedParser<Self,P> { 170 ChainedParser{first: self.clone(), second: p} 171 } 172 ... 173 } 174 175 pub fn opt<T: Parser>(t: T) -> OptionParser<T> { 176 OptionParser{parser: t} 177 } 178 179 pub fn recursive<I:?Sized,O, F: Fn() -> Box<Parser<I=I,O=O>>>(f: F) -> RecursiveParser<I,O,F> { 180 RecursiveParser{parser: Rc::new(f)} 181 } 182 183 ... 184 185 pub struct ChainedParser<A,B> { 186 first: A, 187 second: B, 188 } 189 ... 190 impl<C: ?Sized, A: Parser<I=C>, B: Parser<I=C>> Parser for ChainedParser<A, B> { 191 type I = C; 192 type O = (A::O,B::O); 193 194 fn parse<'a>(&self, data: &'a Self::I) -> ParseResult<&'a Self::I, Self::O>{ 195 match self.first.parse(data) { 196 Ok((a, d2)) => match self.second.parse(d2) { 197 Ok((b, remain)) => Ok(((a, b), remain)), 198 Err(err) => Err(err) 199 }, 200 Err(err) => Err(err) 201 } 202 } 203 } 204 205 impl<C: ?Sized, A: ParserCombinator<I=C>, B: ParserCombinator<I=C>> Clone for ChainedParser<A, B> { 206 ... 207 } 208 ... 209 ``` 210 211 *Pros*: Can overload operators. 212 213 *Cons*: Bloated code. 214 215 Crate [peruse](https://github.com/DanSimon/peruse) is using this approach. 216 2174. Parser as macro 218 219 ```rust 220 #[macro_export] 221 macro_rules! do_parse ( 222 (__impl $i:expr, $consumed:expr, ( $($rest:expr),* )) => ( 223 $crate::IResult::Done($i, ( $($rest),* )) 224 ); 225 226 (__impl $i:expr, $consumed:expr, $e:ident >> $($rest:tt)*) => ( 227 do_parse!(__impl $i, $consumed, call!($e) >> $($rest)*); 228 ); 229 (__impl $i:expr, $consumed:expr, $submac:ident!( $($args:tt)* ) >> $($rest:tt)*) => ( 230 { 231 match $submac!($i, $($args)*) { 232 $crate::IResult::Error(e) => $crate::IResult::Error(e), 233 $crate::IResult::Incomplete($crate::Needed::Unknown) => 234 $crate::IResult::Incomplete($crate::Needed::Unknown), 235 $crate::IResult::Incomplete($crate::Needed::Size(i)) => 236 $crate::IResult::Incomplete($crate::Needed::Size($consumed + i)), 237 $crate::IResult::Done(i,_) => { 238 do_parse!(__impl i, 239 $consumed + ($crate::InputLength::input_len(&($i)) - 240 $crate::InputLength::input_len(&i)), $($rest)*) 241 }, 242 } 243 } 244 ); 245 246 (__impl $i:expr, $consumed:expr, $field:ident : $e:ident >> $($rest:tt)*) => ( 247 do_parse!(__impl $i, $consumed, $field: call!($e) >> $($rest)*); 248 ); 249 250 (__impl $i:expr, $consumed:expr, $field:ident : $submac:ident!( $($args:tt)* ) >> $($rest:tt)*) => ( 251 { 252 match $submac!($i, $($args)*) { 253 $crate::IResult::Error(e) => $crate::IResult::Error(e), 254 $crate::IResult::Incomplete($crate::Needed::Unknown) => 255 $crate::IResult::Incomplete($crate::Needed::Unknown), 256 $crate::IResult::Incomplete($crate::Needed::Size(i)) => 257 $crate::IResult::Incomplete($crate::Needed::Size($consumed + i)), 258 $crate::IResult::Done(i,o) => { 259 let $field = o; 260 do_parse!(__impl i, 261 $consumed + ($crate::InputLength::input_len(&($i)) - 262 $crate::InputLength::input_len(&i)), $($rest)*) 263 }, 264 } 265 } 266 ); 267 268 // ending the chain 269 (__impl $i:expr, $consumed:expr, $e:ident >> ( $($rest:tt)* )) => ( 270 do_parse!(__impl $i, $consumed, call!($e) >> ( $($rest)* )); 271 ); 272 273 (__impl $i:expr, $consumed:expr, $submac:ident!( $($args:tt)* ) >> ( $($rest:tt)* )) => ( 274 match $submac!($i, $($args)*) { 275 $crate::IResult::Error(e) => $crate::IResult::Error(e), 276 $crate::IResult::Incomplete($crate::Needed::Unknown) => 277 $crate::IResult::Incomplete($crate::Needed::Unknown), 278 $crate::IResult::Incomplete($crate::Needed::Size(i)) => 279 $crate::IResult::Incomplete($crate::Needed::Size($consumed + i)), 280 $crate::IResult::Done(i,_) => { 281 $crate::IResult::Done(i, ( $($rest)* )) 282 }, 283 } 284 ); 285 286 (__impl $i:expr, $consumed:expr, $field:ident : $e:ident >> ( $($rest:tt)* )) => ( 287 do_parse!(__impl $i, $consumed, $field: call!($e) >> ( $($rest)* ) ); 288 ); 289 290 (__impl $i:expr, $consumed:expr, $field:ident : $submac:ident!( $($args:tt)* ) >> ( $($rest:tt)* )) => ( 291 match $submac!($i, $($args)*) { 292 $crate::IResult::Error(e) => $crate::IResult::Error(e), 293 $crate::IResult::Incomplete($crate::Needed::Unknown) => 294 $crate::IResult::Incomplete($crate::Needed::Unknown), 295 $crate::IResult::Incomplete($crate::Needed::Size(i)) => 296 $crate::IResult::Incomplete($crate::Needed::Size($consumed + i)), 297 $crate::IResult::Done(i,o) => { 298 let $field = o; 299 $crate::IResult::Done(i, ( $($rest)* )) 300 }, 301 } 302 ); 303 304 ($i:expr, $($rest:tt)*) => ( 305 { 306 do_parse!(__impl $i, 0usize, $($rest)*) 307 } 308 ); 309 ); 310 ... 311 // To create a parser for integer 312 named!(integer<&[u8], i64>, map!( 313 pair!( 314 opt!(sign), 315 map_res!(map_res!(digit, str::from_utf8), i64::from_str) 316 ), 317 |(sign, value): (Option<i64>, i64)| { sign.unwrap_or(1) * value } 318 )); 319 ``` 320 321 *Pros*: Can create DSL syntax, high performance. 322 323 *Cons*: Macros themselves are difficult to read, write and debug. 324 325 Crate [nom](https://github.com/Geal/nom) is using this approach. 326 327According to above comparison, parser as struct is the best approach. At first I choose to use nom to create a PDF parser, it turns out a special PDF feature blocked me. 328When parsing a PDF stream object, it's length may be a referenced object, hence the need to get the length from a reader. 329The `named! ` macro cannot accept extra parameters, there is no obvious way to read a length object inside a stream object parser. 330This is the primary reason why I started to develop pom. 331 332## List of predefined parsers and combinators in pom 333 334| Basic Parsers | Description | 335| -------------- | ---------------------------------------- | 336| empty() | Always succeeds, consume no input. | 337| end() | Match end of input. | 338| sym(t) | Match a single terminal symbol *t*. | 339| seq(s) | Match sequence of symbols. | 340| list(p,s) | Match list of *p*, separated by *s*. | 341| one_of(set) | Success when current input symbol is one of the set. | 342| none_of(set) | Success when current input symbol is none of the set. | 343| is_a(predicate) | Success when predicate return true on current input symbol. | 344| not_a(predicate) | Success when predicate return false on current input symbol. | 345| take(n) | Read *n* symbols. | 346| skip(n) | Skip *n* symbols. | 347| call(pf) | Call a parser factory, can used to create recursive parsers. | 348 349These are functions to create basic parsers. 350 351 352| Parser Combinators | Description | 353| ------------------ | ---------------------------------------- | 354| p | q | Match p or q, return result of the first success. | 355| p + q | Match p and q, if both success return a pair of results. | 356| p - q | Match p and q, if both success return result of p. | 357| p * q | Match p and q, if both success return result of q. | 358| p >> q | Parse p and get result P, then parse and return result of q(P). | 359| -p | Success when p success, doen't consume input. | 360| !p | Success when p fail, doen't consume input. | 361| p.opt() | Make parser optional. | 362| p.repeat(m..n) | `p.repeat(0..)` repeat p zero or more times<br>`p.repeat(1..)` repeat p one or more times<br>`p.repeat(1..4)` match p at least 1 and at most 3 times | 363| p.map(f) | Convert parser result to desired value. | 364| p.convert(f) | Convert parser result to desired value, fail in case of conversion error. | 365| p.pos() | Get input position after matching p. | 366| p.collect() | Collect all matched input symbols. | 367| p.discard() | Discard parser output. | 368| p.name(_) | Give parser a name to identify parsing errors. | 369 370These are operations to create new parsers based on other parsers. The choice of operators is established by their operator precedence, arity and "meaning". 371 372Use `*` to ignore the result of first operand on the start of an expression, `+` and `-` can fulfill the need on the rest of the expression. 373For example, `A * B * C - D + E - F` will return the results of C and E as a pair. 374 375## Using the code 376 377There are three ways to create a parser: 378 3791. As a variable, normally used to construct another parser. 380 381 ```rust 382 let integer = one_of(b"+-").opt() + one_of(b"0123456789").repeat(1..); 383 ``` 384 3852. As a closure, when referenced several times in constructing another parser. 386 387 ```rust 388 let integer = || one_of(b"+-").opt() + one_of(b"0123456789").repeat(1..); 389 let pair = sym(b'(') * integer() - sym(b',') + integer() - sym(b')'); 390 ``` 391 3923. As a function, provides a high level construct. 393 394 ```rust 395 fn integer() -> Parser<u8, (Option<u8>, Vec<u8>)> { 396 one_of(b"+-").opt() + one_of(b"0123456789").repeat(1..) 397 } 398 ``` 399 400## Example JSON Parser 401 402Let me explain the parser combinators in more detail by creating a JSON parser. Syntax diagrams can be found on [json.org](http://www.json.org/). 403 404```rust 405extern crate pom; 406use pom::{Parser, DataInput}; 407use pom::char_class::hex_digit; 408use pom::parser::*; 409 410use std::str::FromStr; 411use std::char::{decode_utf16, REPLACEMENT_CHARACTER}; 412use std::collections::HashMap; 413 414#[derive(Debug, PartialEq)] 415pub enum JsonValue { 416 Null, 417 Bool(bool), 418 Str(String), 419 Num(f64), 420 Array(Vec<JsonValue>), 421 Object(HashMap<String,JsonValue>) 422} 423``` 424 425Import predefined parser combinators and utility functions, define the JSON parser's output value as an enum. 426 427```rust 428fn space() -> Parser<u8, ()> { 429 one_of(b" \t\r\n").repeat(0..).discard() 430} 431``` 432 433Match zero or more space characters, the output is ignored. 434 435```rust 436fn number() -> Parser<u8, f64> { 437 let integer = one_of(b"123456789") - one_of(b"0123456789").repeat(0..) | sym(b'0'); 438 let frac = sym(b'.') + one_of(b"0123456789").repeat(1..); 439 let exp = one_of(b"eE") + one_of(b"+-").opt() + one_of(b"0123456789").repeat(1..); 440 let number = sym(b'-').opt() + integer + frac.opt() + exp.opt(); 441 number.collect().convert(String::from_utf8).convert(|s|f64::from_str(&s)) 442} 443``` 444 445Don't care each output of integer, frac or exp, collect() method get all the match character as a Vec<u8>, then it is converted to a string, and further converted to a float number. 446 447```rust 448fn string() -> Parser<u8, String> { 449 let special_char = sym(b'\\') | sym(b'/') | sym(b'"') 450 | sym(b'b').map(|_|b'\x08') | sym(b'f').map(|_|b'\x0C') 451 | sym(b'n').map(|_|b'\n') | sym(b'r').map(|_|b'\r') | sym(b't').map(|_|b'\t'); 452 let escape_sequence = sym(b'\\') * special_char; 453 let char_string = (none_of(b"\\\"") | escape_sequence).repeat(1..).convert(String::from_utf8); 454 let utf16_char = seq(b"\\u") * is_a(hex_digit).repeat(4).convert(String::from_utf8).convert(|digits|u16::from_str_radix(&digits, 16)); 455 let utf16_string = utf16_char.repeat(1..).map(|chars|decode_utf16(chars).map(|r| r.unwrap_or(REPLACEMENT_CHARACTER)).collect::<String>()); 456 let string = sym(b'"') * (char_string | utf16_string).repeat(0..) - sym(b'"'); 457 string.map(|strings|strings.concat()) 458} 459``` 460 461The bulk of code is written to parse escape sequences. 462According to [Wikipedia](https://en.wikipedia.org/wiki/JSON#Data_portability_issues), UTF-16 surrogate pairs is a detail missed by some JSON parsers. 463We implement this easily with Rust's Unicode support. 464 465```rust 466fn array() -> Parser<u8, Vec<JsonValue>> { 467 let elems = list(call(value), sym(b',') * space()); 468 sym(b'[') * space() * elems - sym(b']') 469} 470 471fn object() -> Parser<u8, HashMap<String, JsonValue>> { 472 let member = string() - space() - sym(b':') - space() + call(value); 473 let members = list(member, sym(b',') * space()); 474 let obj = sym(b'{') * space() * members - sym(b'}'); 475 obj.map(|members|members.into_iter().collect::<HashMap<_,_>>()) 476} 477 478fn value() -> Parser<u8, JsonValue> { 479 ( seq(b"null").map(|_|JsonValue::Null) 480 | seq(b"true").map(|_|JsonValue::Bool(true)) 481 | seq(b"false").map(|_|JsonValue::Bool(false)) 482 | number().map(|num|JsonValue::Num(num)) 483 | string().map(|text|JsonValue::Str(text)) 484 | array().map(|arr|JsonValue::Array(arr)) 485 | object().map(|obj|JsonValue::Object(obj)) 486 ) - space() 487} 488``` 489 490array and object are very straight to parse, notice `call(value)`, at the first attempt I write it as `value()`, then an infinite loop is created. Recursive parsing is solved by adding `call()` to `pom`. 491 492```rust 493pub fn json() -> Parser<u8, JsonValue> { 494 space() * value() - end() 495} 496``` 497 498The final JSON parser, declared as public. According to [RFC 7159](https://tools.ietf.org/html/rfc7159) a JSON text is a serialized value of any of the six types. 499`end()` is used to ensure there is no extra text in the input. 500 501```rust 502fn main() { 503 let test = br#" 504 { 505 "Image": { 506 "Width": 800, 507 "Height": 600, 508 "Title": "View from 15th Floor", 509 "Thumbnail": { 510 "Url": "http://www.example.com/image/481989943", 511 "Height": 125, 512 "Width": 100 513 }, 514 "Animated" : false, 515 "IDs": [116, 943, 234, 38793] 516 }, 517 "escaped characters": "\u2192\uD83D\uDE00\"\t\uD834\uDD1E" 518 }"#; 519 520 let mut input = DataInput::new(test); 521 println!("{:?}", json().parse(&mut input)); 522} 523``` 524 525Use the JSON parser to parse JSON text, the output is: 526 527``` 528cargo run --example json 529 Compiling pom v0.6.0 (file:///work/pom) 530 Finished debug [unoptimized + debuginfo] target(s) in 2.20 secs 531 Running `target/debug/examples/json` 532Ok(Object({"Image": Object({"Width": Num(800), "Title": Str("View from 15th Floor"), "Height": Num(600), "Animated": Bool(false), "IDs": Array([Num(116), Num(943), Num(234), Num(38793)]), "Thumbnail": Object({"Height 533": Num(125), "Url": Str("http://www.example.com/image/481989943"), "Width": Num(100)})}), "escaped characters": Str("→\"\t")})) 534``` 535 536The above parser assumes that the input bytes is UTF-8 encoded text; otherwise, you can use the [char version of JSON parser](https://github.com/J-F-Liu/pom/blob/master/examples/json_char.rs). 537 538`p >> q` is not covered in the JSON example. It is used to pass the output of `p` into parser creation of `p`. 539 540```rust 541let mut input = DataInput::new(b"5oooooooo"); 542let parser = one_of(b"0123456789").map(|c|c - b'0') >> |n| { 543 take(n as usize) + sym(b'o').repeat(0..) 544}; 545let output = parser.parse(&mut input); 546assert_eq!(output, Ok( (vec![b'o';5], vec![b'o';3]) )); 547``` 548 549The first character indicates the number of `o`s to parse, then the number is used in the closure `|n| take(n)`. 550 551## More examples 552 553- A [simple PDF parser](https://github.com/J-F-Liu/lopdf/blob/491dece5867a2b81878208bcb5e07ff1007c0d89/src/parser.rs), you can compare it with the equivalent [nom version](https://github.com/J-F-Liu/lopdf/blob/dff82c49fea9ac9ea23edf42ad80e480bd5edb46/src/parser.rs). 554- A [complete PDF parser](https://github.com/J-F-Liu/lopdf/blob/master/src/parser.rs) which can read length object when parsing stream object. 555 556## Conclusion 557 558I think I created something really cool, you can use pom to write all kinds of parsers elegantly. 559I helped pom to evolve version by version into what it is, and pom also helps me to grow my Rust programming skills a lot. 560Of course there is still room for improvement, any feed back is welcome. 561 562## Points of interest 563 564I try to add a `cache()` method to `Parser`. Memorize the result on given input position, return the result directly when called again, effectively implementing the Packrat Parsing algorithm. 565But there are two problems, 1) save result means mutate a Hashmap, so Parser's method field should be a Box of `FnMut`, 5662) Hashmap returns an reference of value for a given key, the value cannot be moved, so need to make the value cloneable. 567 568## Pain points where Rust needs to improve 569 5701. Implement trait for `[T]` should automatically implement `[T; N]`. 5712. The standard library should provide a char_at() method return the char and the number of bytes consumed, like: 572 ```rust 573 pub trait Encoding { 574 /// Get char at a byte index, return the char and the number of bytes read. 575 fn char_at(&self, data: &[u8], index: usize) -> Result<(char, u32)>; 576 } 577 ``` 5783. Can ellide 'static lifetime parameter, allow `Parser<'static, I, O>` written as `Parser<I, O>`. 5794. Should `impl Copy for closure`, so that FnOnce closure can be passed to map() inside Fn closure. 580 ```rust 581 pub fn map<U, F>(self, f: F) -> Parser<'a, I, U> 582 where F: FnOnce(O) -> U + Copy + 'a, 583 I: 'static, 584 O: 'static, 585 U: 'static 586 { 587 Parser::new(move |input: &mut Input<I>| { 588 self.parse(input).map(f) 589 }) 590 } 591 ``` 592 593## More Readings 594 595- [The Rust programming language, in the words of its practitioners](https://brson.github.io/fireflowers/) 596- [PEGs, Packrats and Parser Combinators](http://scg.unibe.ch/download/lectures/cc2011/10PEGs.pptx.pdf) 597- [An introduction to parsing text in Haskell with Parsec](http://unbui.lt/#!/post/haskell-parsec-basics/) 598