# Nom Recipes

These are short recipes for accomplishing common tasks with nom.

* [Whitespace](#whitespace)
  + [Wrapper combinators that eat whitespace before and after a parser](#wrapper-combinators-that-eat-whitespace-before-and-after-a-parser)
* [Comments](#comments)
  + [`// C++/EOL-style comments`](#-ceol-style-comments)
  + [`/* C-style comments */`](#-c-style-comments-)
* [Identifiers](#identifiers)
  + [Rust-Style Identifiers](#rust-style-identifiers)
* [Literal Values](#literal-values)
  + [Escaped Strings](#escaped-strings)
  + [Integers](#integers)
    - [Hexadecimal](#hexadecimal)
    - [Octal](#octal)
    - [Binary](#binary)
    - [Decimal](#decimal)
  + [Floating Point Numbers](#floating-point-numbers)
* [Implementing `FromStr`](#implementing-fromstr)

## Whitespace

### Wrapper combinators that eat whitespace before and after a parser

```rust
use nom::{
  IResult,
  error::ParseError,
  sequence::delimited,
  character::complete::multispace0,
};

/// A combinator that takes a parser `inner` and produces a parser that also consumes both leading and
/// trailing whitespace, returning the output of `inner`.
fn ws<'a, F, O, E: ParseError<&'a str>>(inner: F) -> impl FnMut(&'a str) -> IResult<&'a str, O, E>
  where
  F: FnMut(&'a str) -> IResult<&'a str, O, E>,
{
  delimited(
    multispace0,
    inner,
    multispace0
  )
}
```

To eat only trailing whitespace, replace `delimited(...)` with `terminated(inner, multispace0)`.
Likewise, to eat only leading whitespace, replace `delimited(...)` with `preceded(multispace0, inner)`.
You can use your own parser instead of `multispace0` if you want to skip a different set of lexemes.

## Comments

### `// C++/EOL-style comments`

This version uses `%` to start a comment, does not consume the newline character, and returns an
output of `()`. For `//` comments, replace `char('%')` with `tag("//")` from `nom::bytes::complete`.

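```rust
use nom::{
  IResult,
  error::ParseError,
  combinator::value,
  sequence::pair,
  bytes::complete::is_not,
  character::complete::char,
};

pub fn peol_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E>
{
  value(
    (), // Output is thrown away.
    pair(char('%'), is_not("\n\r"))
  )(i)
}
```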

### `/* C-style comments */`

Inline comments surrounded with the sentinel tags `(*` and `*)` (as in Pascal or OCaml); for C's `/*`
and `*/`, change the arguments of both `tag` calls and of `take_until`. This version returns an output
of `()` and does not handle nested comments.

```rust
use nom::{
  IResult,
  error::ParseError,
  combinator::value,
  sequence::tuple,
  bytes::complete::{tag, take_until},
};

pub fn pinline_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E> {
  value(
    (), // Output is thrown away.
    tuple((
      tag("(*"),
      take_until("*)"),
      tag("*)")
    ))
  )(i)
}
```

## Identifiers

### Rust-Style Identifiers

Identifiers that start with a letter or an underscore and continue with any mix of letters, digits,
and underscores can be parsed like this:

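```rust
use nom::{
  IResult,
  branch::alt,
  multi::many0,
  combinator::recognize,
  sequence::pair,
  character::complete::{alpha1, alphanumeric1},
  bytes::complete::tag,
};

pub fn identifier(input: &str) -> IResult<&str, &str> {
  recognize(
    pair(
      alt((alpha1, tag("_"))),
      many0(alt((alphanumeric1, tag("_"))))
    )
  )(input)
}
```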

Let's say we apply this to the identifier `hello_world123abc`. The first `alt` parser would
recognize `hello`, because `alpha1` consumes as many letters as it can. The `pair` combinator then
passes the rest, `_world123abc`, to `many0(alt((alphanumeric1, tag("_"))))`, which matches `_` and
then `world123abc`. On its own, `pair` would return a tuple of the results of its sub-parsers; the
`recognize` combinator instead returns the `&str` of input text that was consumed, which in this case
is the entire `hello_world123abc`.

## Literal Values

### Escaped Strings

This is [one of the examples](https://github.com/Geal/nom/blob/master/examples/string.rs) in the
examples directory.

### Integers

The following recipes all return string slices rather than integer values. Converting to an integer
value is demonstrated for hexadecimal integers; the other bases work the same way with a different
radix.

The parsers allow the grouping character `_`, so digits can be grouped, for example by byte:
`0xA4_3F_11_28`. Any number of `_` may follow each digit, but the first character after the prefix
must be a digit. If you prefer to exclude the `_` character, use `one_of(...)` on its own instead of
`terminated(one_of(...), many0(char('_')))`; the closure that converts the string slice to an integer
then becomes slightly simpler. Otherwise, strip the `_` from the returned slice before converting, as
demonstrated in the second hexadecimal parser.

If you wish to limit the number of digits in a valid integer literal, replace `many1` with
`many_m_n` in the recipes.

#### Hexadecimal

The parser outputs the string slice of the digits without the leading `0x`/`0X`.

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn hexadecimal(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0x"), tag("0X"))),
    recognize(
      many1(
        terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
      )
    )
  )(input)
}
```
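
If you want it to return the integer value instead, use `map_res`, which also turns a failed
conversion (such as an overflow) into a parse error:

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::{map_res, recognize},
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn hexadecimal_value(input: &str) -> IResult<&str, i64> {
  map_res(
    preceded(
      alt((tag("0x"), tag("0X"))),
      recognize(
        many1(
          terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
        )
      )
    ),
    // Strip the `_` separators, then convert the base-16 digits.
    |out: &str| i64::from_str_radix(&out.replace('_', ""), 16)
  )(input)
}
```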

#### Octal

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn octal(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0o"), tag("0O"))),
    recognize(
      many1(
        terminated(one_of("01234567"), many0(char('_')))
      )
    )
  )(input)
}
```

#### Binary

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn binary(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0b"), tag("0B"))),
    recognize(
      many1(
        terminated(one_of("01"), many0(char('_')))
      )
    )
  )(input)
}
```

#### Decimal

```rust
use nom::{
  IResult,
  multi::{many0, many1},
  combinator::recognize,
  sequence::terminated,
  character::complete::{char, one_of},
};

fn decimal(input: &str) -> IResult<&str, &str> {
  recognize(
    many1(
      terminated(one_of("0123456789"), many0(char('_')))
    )
  )(input)
}
```

### Floating Point Numbers

The following is adapted from [the Python parser by Valentin Lorentz (ProgVal)](https://github.com/ProgVal/rust-python-parser/blob/master/src/numbers.rs).
The order of the alternatives matters: case two must be tried before case three, otherwise
`42.42e42` would stop after `42.42` and leave `e42` unparsed.

```rust
use nom::{
  IResult,
  branch::alt,
  combinator::{opt, recognize},
  sequence::{preceded, tuple},
  character::complete::{char, one_of},
};

// `decimal` is the parser from the Decimal recipe above.
fn float(input: &str) -> IResult<&str, &str> {
  alt((
    // Case one: .42
    recognize(
      tuple((
        char('.'),
        decimal,
        opt(tuple((
          one_of("eE"),
          opt(one_of("+-")),
          decimal
        )))
      ))
    ),
    // Case two: 42e42 and 42.42e42
    recognize(
      tuple((
        decimal,
        opt(preceded(
          char('.'),
          decimal,
        )),
        one_of("eE"),
        opt(one_of("+-")),
        decimal
      ))
    ),
    // Case three: 42. and 42.42
    recognize(
      tuple((
        decimal,
        char('.'),
        opt(decimal)
      ))
    )
  ))(input)
}
```

## Implementing `FromStr`

The [FromStr trait](https://doc.rust-lang.org/std/str/trait.FromStr.html) provides
a common interface to parse from a string.

```rust
use nom::{
  IResult, Finish, error::Error,
  bytes::complete::{tag, take_while},
};
use std::str::FromStr;

// will recognize the name in "Hello, name!"
fn parse_name(input: &str) -> IResult<&str, &str> {
  let (i, _) = tag("Hello, ")(input)?;
  let (i, name) = take_while(|c: char| c.is_alphabetic())(i)?;
  let (i, _) = tag("!")(i)?;

  Ok((i, name))
}

// with FromStr, the result cannot be a reference to the input, it must be owned
#[derive(Debug)]
pub struct Name(pub String);

impl FromStr for Name {
  // the error must be owned as well
  type Err = Error<String>;

  fn from_str(s: &str) -> Result<Self, Self::Err> {
    match parse_name(s).finish() {
      Ok((_remaining, name)) => Ok(Name(name.to_string())),
      Err(Error { input, code }) => Err(Error {
        input: input.to_string(),
        code,
      })
    }
  }
}

fn main() {
  // parsed: Ok(Name("nom"))
  println!("parsed: {:?}", "Hello, nom!".parse::<Name>());

  // parsed: Err(Error { input: "123!", code: Tag })
  println!("parsed: {:?}", "Hello, 123!".parse::<Name>());
}
```