README.md
1# pom
2
3[![Crates.io](https://img.shields.io/crates/v/pom.svg)](https://crates.io/crates/pom)
4[![Build Status](https://travis-ci.org/J-F-Liu/pom.png)](https://travis-ci.org/J-F-Liu/pom)
5[![Docs](https://docs.rs/pom/badge.svg)](https://docs.rs/pom)
6[![Discord](https://img.shields.io/badge/discord-pom-red.svg)](https://discord.gg/CVy85pg)
7
8PEG parser combinators created using operator overloading without macros.
9
10## Document
11
12- [Tutorial](https://github.com/J-F-Liu/pom/blob/master/doc/article.md)
13- [API Reference](https://docs.rs/crate/pom/)
14
15## What is PEG?
16
17PEG stands for parsing expression grammar, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language.
18Unlike CFGs, PEGs cannot be ambiguous; if a string parses, it has exactly one valid parse tree.
19Each parsing function conceptually takes an input string as its argument, and yields one of the following results:
20- success, in which the function may optionally move forward or consume one or more characters of the input string supplied to it, or
21- failure, in which case no input is consumed.
22
23Read more on [Wikipedia](https://en.wikipedia.org/wiki/Parsing_expression_grammar).
24
25## What is parser combinator?
26
27A parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output.
28Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing.
29
30Parsers built using combinators are straightforward to construct, readable, modular, well-structured and easily maintainable.
31With operator overloading, a parser combinator can take the form of an infix operator, used to glue different parsers to form a complete rule.
32Parser combinators thereby enable parsers to be defined in an embedded style, in code which is similar in structure to the rules of the formal grammar.
33And the code is easier to debug than macros.
34
35The main advantage is that you don't need to go through any kind of code generation step, you're always using the vanilla language underneath.
36Aside from build issues (and the usual issues around error messages and debuggability, which in fairness are about as bad with macros as with code generation), it's usually easier to freely intermix grammar expressions and plain code.
37
38## List of predefined parsers and combinators
39
40|Basic Parsers|Description|
41| --- | --- |
42|empty()|Always succeeds, consume no input.|
43|end() |Match end of input.|
44|sym(t)|Match a single terminal symbol *t*.|
45|seq(s) |Match sequence of symbols.|
46|list(p,s) |Match list of *p*, separated by *s*.|
47|one_of(set) |Success when current input symbol is one of the set.|
48|none_of(set)|Success when current input symbol is none of the set.|
49|is_a(predicate) |Success when predicate return true on current input symbol.|
50|not_a(predicate)|Success when predicate return false on current input symbol.|
51|take(n)|Read *n* symbols.|
52|skip(n)|Skip *n* symbols.|
53|call(pf)|Call a parser factory, can used to create recursive parsers.|
54
55|Parser Combinators|Description|
56| --- | --- |
57| p | q | Match p or q, return result of the first success. |
58| p + q | Match p and q, if both success return a pair of results. |
59| p - q | Match p and q, if both success return result of p. |
60| p * q | Match p and q, if both success return result of q. |
61| p >> q | Parse p and get result P, then parse and return result of q(P). |
62| -p | Success when p success, doesn't consume input. |
63| !p | Success when p fail, doesn't consume input. |
64|p.opt()|Make parser optional. Returns an `Option`.|
65|p.repeat(m..n)| `p.repeat(0..)` repeat p zero or more times<br>`p.repeat(1..)` repeat p one or more times<br>`p.repeat(1..4)` match p at least 1 and at most 3 times<br>`p.repeat(5)` repeat p exactly 5 times|
66|p.map(f)|Convert parser result to desired value.|
67|p.convert(f)|Convert parser result to desired value, fail in case of conversion error.|
68|p.pos() |Get input position after matching p.|
69|p.collect()|Collect all matched input symbols.|
70|p.discard()|Discard parser output.|
71|p.name(_)|Give parser a name to identify parsing errors.|
72|p.expect(_)|Mark parser as expected, abort early when failed in ordered choice.|
73
74The choice of operators is established by their operator precedence, arity and "meaning".
75Use `*` to ignore the result of first operand on the start of an expression, `+` and `-` can fulfill the need on the rest of the expression.
76
77For example, `A * B * C - D + E - F` will return the results of C and E as a pair.
78
79## Example code
80```rust
81extern crate pom;
82use pom::parser::*;
83
84let input = b"abcde";
85let parser = sym(b'a') * none_of(b"AB") - sym(b'c') + seq(b"de");
86let output = parser.parse(input);
87assert_eq!(output, Ok( (b'b', vec![b'd', b'e']) ) );
88```
89
90### Example JSON parser
91```rust
92extern crate pom;
93use pom::parser::*;
94use pom::Parser;
95
96use std::collections::HashMap;
97use std::str::{self, FromStr};
98
99#[derive(Debug, PartialEq)]
100pub enum JsonValue {
101 Null,
102 Bool(bool),
103 Str(String),
104 Num(f64),
105 Array(Vec<JsonValue>),
106 Object(HashMap<String,JsonValue>)
107}
108
109fn space() -> Parser<u8, ()> {
110 one_of(b" \t\r\n").repeat(0..).discard()
111}
112
113fn number() -> Parser<u8, f64> {
114 let integer = one_of(b"123456789") - one_of(b"0123456789").repeat(0..) | sym(b'0');
115 let frac = sym(b'.') + one_of(b"0123456789").repeat(1..);
116 let exp = one_of(b"eE") + one_of(b"+-").opt() + one_of(b"0123456789").repeat(1..);
117 let number = sym(b'-').opt() + integer + frac.opt() + exp.opt();
118 number.collect().convert(str::from_utf8).convert(|s|f64::from_str(&s))
119}
120
121fn string() -> Parser<u8, String> {
122 let special_char = sym(b'\\') | sym(b'/') | sym(b'"')
123 | sym(b'b').map(|_|b'\x08') | sym(b'f').map(|_|b'\x0C')
124 | sym(b'n').map(|_|b'\n') | sym(b'r').map(|_|b'\r') | sym(b't').map(|_|b'\t');
125 let escape_sequence = sym(b'\\') * special_char;
126 let string = sym(b'"') * (none_of(b"\\\"") | escape_sequence).repeat(0..) - sym(b'"');
127 string.convert(String::from_utf8)
128}
129
130fn array() -> Parser<u8, Vec<JsonValue>> {
131 let elems = list(call(value), sym(b',') * space());
132 sym(b'[') * space() * elems - sym(b']')
133}
134
135fn object() -> Parser<u8, HashMap<String, JsonValue>> {
136 let member = string() - space() - sym(b':') - space() + call(value);
137 let members = list(member, sym(b',') * space());
138 let obj = sym(b'{') * space() * members - sym(b'}');
139 obj.map(|members|members.into_iter().collect::<HashMap<_,_>>())
140}
141
142fn value() -> Parser<u8, JsonValue> {
143 ( seq(b"null").map(|_|JsonValue::Null)
144 | seq(b"true").map(|_|JsonValue::Bool(true))
145 | seq(b"false").map(|_|JsonValue::Bool(false))
146 | number().map(|num|JsonValue::Num(num))
147 | string().map(|text|JsonValue::Str(text))
148 | array().map(|arr|JsonValue::Array(arr))
149 | object().map(|obj|JsonValue::Object(obj))
150 ) - space()
151}
152
153pub fn json() -> Parser<u8, JsonValue> {
154 space() * value() - end()
155}
156
157fn main() {
158 let input = br#"
159 {
160 "Image": {
161 "Width": 800,
162 "Height": 600,
163 "Title": "View from 15th Floor",
164 "Thumbnail": {
165 "Url": "http://www.example.com/image/481989943",
166 "Height": 125,
167 "Width": 100
168 },
169 "Animated" : false,
170 "IDs": [116, 943, 234, 38793]
171 }
172 }"#;
173
174 println!("{:?}", json().parse(input));
175}
176```
177You can run this example with the following command:
178```
179cargo run --example json
180```
181
182## Benchmark
183
184| Parser | Time to parse the same JSON file |
185|------------------|----------------------------------|
186| pom: json_byte | 621,319 ns/iter (+/- 20,318) |
187| pom: json_char | 627,110 ns/iter (+/- 11,463) |
188| [pest](https://github.com/dragostis/pest): json_char | 13,359 ns/iter (+/- 811)|
189