# Nom Recipes

These are short recipes for accomplishing common tasks with nom.

* [Whitespace](#whitespace)
  + [Wrapper combinators that eat whitespace before and after a parser](#wrapper-combinators-that-eat-whitespace-before-and-after-a-parser)
* [Comments](#comments)
  + [`// C++/EOL-style comments`](#-ceol-style-comments)
  + [`/* C-style comments */`](#-c-style-comments-)
* [Identifiers](#identifiers)
  + [`Rust-Style Identifiers`](#rust-style-identifiers)
* [Literal Values](#literal-values)
  + [Escaped Strings](#escaped-strings)
  + [Integers](#integers)
    - [Hexadecimal](#hexadecimal)
    - [Octal](#octal)
    - [Binary](#binary)
    - [Decimal](#decimal)
  + [Floating Point Numbers](#floating-point-numbers)
* [Implementing FromStr](#implementing-fromstr)

## Whitespace

### Wrapper combinators that eat whitespace before and after a parser

```rust
use nom::{
  IResult,
  error::ParseError,
  sequence::delimited,
  character::complete::multispace0,
};

/// A combinator that takes a parser `inner` and produces a parser that also consumes both leading and
/// trailing whitespace, returning the output of `inner`.
fn ws<'a, F: 'a, O, E: ParseError<&'a str>>(inner: F) -> impl FnMut(&'a str) -> IResult<&'a str, O, E>
  where
  F: Fn(&'a str) -> IResult<&'a str, O, E>,
{
  delimited(
    multispace0,
    inner,
    multispace0
  )
}
```

To eat only trailing whitespace, replace `delimited(...)` with `terminated(inner, multispace0)`.
Likewise, to eat only leading whitespace, replace `delimited(...)` with `preceded(multispace0,
inner)`. You can use your own parser instead of `multispace0` if you want to skip a different set
of lexemes.

## Comments

### `// C++/EOL-style comments`

This version uses `%` to start a comment, does not consume the newline character, and returns an
output of `()`.

```rust
use nom::{
  IResult,
  error::ParseError,
  combinator::value,
  sequence::pair,
  bytes::complete::is_not,
  character::complete::char,
};

pub fn peol_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E>
{
  value(
    (), // Output is thrown away.
    pair(char('%'), is_not("\n\r"))
  )(i)
}
```

### `/* C-style comments */`

This version parses inline comments delimited by the sentinel tags `(*` and `*)`. It returns an
output of `()` and does not handle nested comments.

```rust
use nom::{
  IResult,
  error::ParseError,
  combinator::value,
  sequence::tuple,
  bytes::complete::{tag, take_until},
};

pub fn pinline_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E> {
  value(
    (), // Output is thrown away.
    tuple((
      tag("(*"),
      take_until("*)"),
      tag("*)")
    ))
  )(i)
}
```
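
The recipe above stops at the first `*)`, so it cannot handle nesting. One way to support nested comments is to track the nesting depth by hand. The following is a minimal sketch written as a plain function over `&str` rather than with nom combinators. `skip_nested_comment` is a hypothetical name, and it returns the remaining input as an `Option` instead of an `IResult`.

```rust
/// Skips one `(* ... *)` comment at the start of `input`, allowing nesting.
/// Returns the remaining input, or `None` if `input` does not start with a
/// comment or the outermost comment is never closed.
/// Hypothetical helper written without nom, for illustration only.
fn skip_nested_comment(input: &str) -> Option<&str> {
  if !input.starts_with("(*") {
    return None;
  }
  let mut depth = 0usize;
  let mut rest = input;
  while !rest.is_empty() {
    if rest.starts_with("(*") {
      // An opening tag increases the nesting depth.
      depth += 1;
      rest = &rest[2..];
    } else if rest.starts_with("*)") {
      // A closing tag decreases it; depth is at least 1 here.
      depth -= 1;
      rest = &rest[2..];
      if depth == 0 {
        return Some(rest);
      }
    } else {
      // Advance by one whole character so slicing stays on a UTF-8 boundary.
      let ch_len = rest.chars().next().map_or(1, |c| c.len_utf8());
      rest = &rest[ch_len..];
    }
  }
  None // ran out of input before the outermost comment closed
}
```

To use this from a nom parser, the result would still need to be wrapped into an `IResult`, which is left out here.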

## Identifiers

### `Rust-Style Identifiers`

Identifiers that start with a letter or an underscore and otherwise contain letters, digits and
underscores can be parsed like this:

```rust
use nom::{
  IResult,
  branch::alt,
  multi::many0,
  combinator::recognize,
  sequence::pair,
  character::complete::{alpha1, alphanumeric1},
  bytes::complete::tag,
};

pub fn identifier(input: &str) -> IResult<&str, &str> {
  recognize(
    pair(
      alt((alpha1, tag("_"))),
      many0(alt((alphanumeric1, tag("_"))))
    )
  )(input)
}
```

Let's say we apply this to the identifier `hello_world123abc`. The first `alt` parser recognizes
`hello`, because `alpha1` matches the longest run of letters. The `pair` combinator then passes the
remaining `_world123abc` to the `many0` parser, which applies `alt((alphanumeric1, tag("_")))`
repeatedly: it matches `_`, then `world123abc`, and stops at the end of the input. On its own,
`pair` would return a tuple of the results of its sub-parsers. The `recognize` combinator discards
that tuple and instead produces a `&str` of the input text that was parsed, which in this case is
the entire `&str` `hello_world123abc`.

## Literal Values

### Escaped Strings

This is [one of the examples](https://github.com/Geal/nom/blob/master/examples/string.rs) in the
examples directory.

### Integers

The following recipes all return string slices rather than integer values. How to obtain an
integer value instead is demonstrated for hexadecimal integers. The others are similar.

The parsers allow the grouping character `_`, which allows one to group the digits by byte, for
example: `0xA4_3F_11_28`. If you prefer to exclude the `_` character, the lambda to convert from a
string slice to an integer value is slightly simpler. You can also strip the `_` from the string
slice that is returned, which is demonstrated in the second hexadecimal number parser.

If you wish to limit the number of digits in a valid integer literal, replace `many1` with
`many_m_n` in the recipes.

#### Hexadecimal

The parser outputs the string slice of the digits without the leading `0x`/`0X`.

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn hexadecimal(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0x"), tag("0X"))),
    recognize(
      many1(
        terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
      )
    )
  )(input)
}
```

If you want it to return the integer value instead, use `map_res`:

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::{map_res, recognize},
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn hexadecimal_value(input: &str) -> IResult<&str, i64> {
  map_res(
    preceded(
      alt((tag("0x"), tag("0X"))),
      recognize(
        many1(
          terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
        )
      )
    ),
    |out: &str| i64::from_str_radix(&str::replace(&out, "_", ""), 16)
  )(input)
}
```

#### Octal

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn octal(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0o"), tag("0O"))),
    recognize(
      many1(
        terminated(one_of("01234567"), many0(char('_')))
      )
    )
  )(input)
}
```

#### Binary

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn binary(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0b"), tag("0B"))),
    recognize(
      many1(
        terminated(one_of("01"), many0(char('_')))
      )
    )
  )(input)
}
```

#### Decimal

```rust
use nom::{
  IResult,
  multi::{many0, many1},
  combinator::recognize,
  sequence::terminated,
  character::complete::{char, one_of},
};

fn decimal(input: &str) -> IResult<&str, &str> {
  recognize(
    many1(
      terminated(one_of("0123456789"), many0(char('_')))
    )
  )(input)
}
```
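
The conversion shown for hexadecimal values carries over to the other radixes. As a sketch, the closure body can be factored into a small helper that takes the radix as a parameter. `digits_to_i64` is a hypothetical name and uses only the standard library.

```rust
use std::num::ParseIntError;

/// Converts a digit slice as returned by the recipes (prefix already removed,
/// `_` separators possibly present) into an `i64` in the given radix.
/// `digits_to_i64` is a hypothetical helper name, not part of nom.
fn digits_to_i64(digits: &str, radix: u32) -> Result<i64, ParseIntError> {
  // Drop the grouping characters before handing the digits to the standard library.
  i64::from_str_radix(&digits.replace('_', ""), radix)
}
```

Passing a closure such as `|s: &str| digits_to_i64(s, 8)` to `map_res` around the `octal` parser should then yield the integer value, in the same way as `hexadecimal_value` does for hexadecimal input.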

### Floating Point Numbers

The following is adapted from [the Python parser by Valentin Lorentz (ProgVal)](https://github.com/ProgVal/rust-python-parser/blob/master/src/numbers.rs).

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::{opt, recognize},
  sequence::{preceded, terminated, tuple},
  character::complete::{char, one_of},
};

fn float(input: &str) -> IResult<&str, &str> {
  alt((
    // Case one: .42
    recognize(
      tuple((
        char('.'),
        decimal,
        opt(tuple((
          one_of("eE"),
          opt(one_of("+-")),
          decimal
        )))
      ))
    )
    , // Case two: 42e42 and 42.42e42
    recognize(
      tuple((
        decimal,
        opt(preceded(
          char('.'),
          decimal,
        )),
        one_of("eE"),
        opt(one_of("+-")),
        decimal
      ))
    )
    , // Case three: 42. and 42.42
    recognize(
      tuple((
        decimal,
        char('.'),
        opt(decimal)
      ))
    )
  ))(input)
}

fn decimal(input: &str) -> IResult<&str, &str> {
  recognize(
    many1(
      terminated(one_of("0123456789"), many0(char('_')))
    )
  )(input)
}
```
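
Like the integer recipes, `float` returns a string slice. Below is a minimal sketch of turning that slice into an `f64`, assuming the `_` separators accepted by `decimal` should simply be dropped. `float_to_f64` is a hypothetical helper built on the standard library's `str::parse`, which accepts the `.42`, `42e42` and `42.` forms matched above.

```rust
use std::num::ParseFloatError;

/// Converts a slice recognized by the `float` recipe into an `f64`.
/// `float_to_f64` is a hypothetical helper name, not part of nom.
fn float_to_f64(text: &str) -> Result<f64, ParseFloatError> {
  // The standard library parser does not accept `_`, so remove it first.
  text.replace('_', "").parse::<f64>()
}
```

As with the integers, a helper like this could be passed to `map_res` around `float` to obtain the numeric value directly.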

## Implementing FromStr

The [FromStr trait](https://doc.rust-lang.org/std/str/trait.FromStr.html) provides
a common interface to parse from a string.

```rust
use nom::{
  IResult, Finish, error::Error,
  bytes::complete::{tag, take_while},
};
use std::str::FromStr;

// will recognize the name in "Hello, name!"
fn parse_name(input: &str) -> IResult<&str, &str> {
  let (i, _) = tag("Hello, ")(input)?;
  let (i, name) = take_while(|c: char| c.is_alphabetic())(i)?;
  let (i, _) = tag("!")(i)?;

  Ok((i, name))
}

// with FromStr, the result cannot be a reference to the input, it must be owned
#[derive(Debug)]
pub struct Name(pub String);

impl FromStr for Name {
  // the error must be owned as well
  type Err = Error<String>;

  fn from_str(s: &str) -> Result<Self, Self::Err> {
    match parse_name(s).finish() {
      Ok((_remaining, name)) => Ok(Name(name.to_string())),
      Err(Error { input, code }) => Err(Error {
        input: input.to_string(),
        code,
      })
    }
  }
}

fn main() {
  // parsed: Ok(Name("nom"))
  println!("parsed: {:?}", "Hello, nom!".parse::<Name>());

  // parsed: Err(Error { input: "123!", code: Tag })
  println!("parsed: {:?}", "Hello, 123!".parse::<Name>());
}
```