# Nom Recipes

These are short recipes for accomplishing common tasks with nom.

* [Whitespace](#whitespace)
  + [Wrapper combinators that eat whitespace before and after a parser](#wrapper-combinators-that-eat-whitespace-before-and-after-a-parser)
* [Comments](#comments)
  + [`// C++/EOL-style comments`](#-ceol-style-comments)
  + [`/* C-style comments */`](#-c-style-comments-)
* [Identifiers](#identifiers)
  + [`Rust-Style Identifiers`](#rust-style-identifiers)
* [Literal Values](#literal-values)
  + [Escaped Strings](#escaped-strings)
  + [Integers](#integers)
    - [Hexadecimal](#hexadecimal)
    - [Octal](#octal)
    - [Binary](#binary)
    - [Decimal](#decimal)
  + [Floating Point Numbers](#floating-point-numbers)
* [Implementing FromStr](#implementing-fromstr)

## Whitespace

### Wrapper combinators that eat whitespace before and after a parser

```rust
use nom::{
  IResult,
  error::ParseError,
  sequence::delimited,
  character::complete::multispace0,
};

/// A combinator that takes a parser `inner` and produces a parser that also consumes both leading and
/// trailing whitespace, returning the output of `inner`.
fn ws<'a, F: 'a, O, E: ParseError<&'a str>>(inner: F) -> impl FnMut(&'a str) -> IResult<&'a str, O, E>
where
  F: Fn(&'a str) -> IResult<&'a str, O, E>,
{
  delimited(
    multispace0,
    inner,
    multispace0
  )
}
```

To eat only trailing whitespace, replace `delimited(...)` with `terminated(inner, multispace0)`.
Likewise, to eat only leading whitespace, replace `delimited(...)` with `preceded(multispace0, inner)`
(import `terminated` or `preceded` from `nom::sequence` accordingly). You can use your own parser
instead of `multispace0` if you want to skip a different set of lexemes.

## Comments

### `// C++/EOL-style comments`

This version uses `%` to start a comment, does not consume the newline character, and returns an
output of `()`. To recognize `//` comments instead, replace `char('%')` with `tag("//")` from
`nom::bytes::complete`. Because `is_not` must match at least one character, an empty comment (the
marker immediately followed by a line break or the end of input) is not accepted.

```rust
use nom::{
  IResult,
  error::ParseError,
  combinator::value,
  sequence::pair,
  bytes::complete::is_not,
  character::complete::char,
};

pub fn peol_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E>
{
  value(
    (), // Output is thrown away.
    pair(char('%'), is_not("\n\r"))
  )(i)
}
```

### `/* C-style comments */`

This version parses inline comments delimited by the sentinel tags `(*` and `*)` (as in Pascal or
OCaml); for C-style comments, use `"/*"` and `"*/"` instead. It returns an output of `()` and does
not handle nested comments. An unterminated comment is an error, because `take_until` fails when the
closing tag is never found.

```rust
use nom::{
  IResult,
  error::ParseError,
  combinator::value,
  sequence::tuple,
  bytes::complete::{tag, take_until},
};

pub fn pinline_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E> {
  value(
    (), // Output is thrown away.
    tuple((
      tag("(*"),
      take_until("*)"),
      tag("*)")
    ))
  )(i)
}
```

## Identifiers

### `Rust-Style Identifiers`

Identifiers that start with a letter or an underscore and may contain underscores, letters and
numbers can be parsed like this:

```rust
use nom::{
  IResult,
  branch::alt,
  multi::many0,
  combinator::recognize,
  sequence::pair,
  character::complete::{alpha1, alphanumeric1},
  bytes::complete::tag,
};

pub fn identifier(input: &str) -> IResult<&str, &str> {
  recognize(
    pair(
      alt((alpha1, tag("_"))),
      many0(alt((alphanumeric1, tag("_"))))
    )
  )(input)
}
```

Let's say we apply this to the identifier `hello_world123abc`. The first `alt` parser recognizes
`hello`, because `alpha1` consumes as many letters as it can. The `pair` combinator then passes
`_world123abc` on to the `many0` parser, which repeatedly applies `alt((alphanumeric1, tag("_")))`
and so recognizes every remaining character. On its own, the `pair` combinator would return a tuple
of the results of its sub-parsers.
The `recognize` combinator instead returns the `&str` slice of the input that was consumed, which in
this case is the entire identifier `hello_world123abc`.

## Literal Values

### Escaped Strings

This is [one of the examples](https://github.com/Geal/nom/blob/master/examples/string.rs) in the
examples directory.

### Integers

The following recipes all return string slices rather than integer values. How to obtain an
integer value instead is demonstrated for hexadecimal integers; the others are similar.

The parsers allow the grouping character `_`, which lets you group the digits, for example by byte:
`0xA4_3F_11_28`. Because each digit may be followed by any number of underscores, repeated and
trailing underscores such as `0xA4__3F_` are accepted as well, but a leading one (`0x_A4`) is not.
If you prefer to exclude the `_` character, the closure that converts a string slice to an integer
value is slightly simpler. Otherwise, you can strip the `_` from the returned string slice, as
demonstrated in the second hexadecimal parser.

If you wish to limit the number of digits in a valid integer literal, replace `many1` with
`many_m_n` in the recipes. Note that `many_m_n` stops after its maximum count instead of failing, so
any further digits are left in the remaining input.

#### Hexadecimal

The parser outputs the string slice of the digits without the leading `0x`/`0X`.

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn hexadecimal(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0x"), tag("0X"))),
    recognize(
      many1(
        terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
      )
    )
  )(input)
}
```

If you want it to return the integer value instead, use `map_res`:

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::{map_res, recognize},
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn hexadecimal_value(input: &str) -> IResult<&str, i64> {
  map_res(
    preceded(
      alt((tag("0x"), tag("0X"))),
      recognize(
        many1(
          terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
        )
      )
    ),
    // Strip the grouping underscores, then convert. A value that does not fit in an `i64`
    // makes `from_str_radix` return an error, which `map_res` turns into a parser error.
    |out: &str| i64::from_str_radix(&out.replace('_', ""), 16)
  )(input)
}
```

#### Octal

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn octal(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0o"), tag("0O"))),
    recognize(
      many1(
        terminated(one_of("01234567"), many0(char('_')))
      )
    )
  )(input)
}
```

#### Binary

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn binary(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0b"), tag("0B"))),
    recognize(
      many1(
        terminated(
          one_of("01"),
          many0(char('_'))
        )
      )
    )
  )(input)
}
```

#### Decimal

```rust
use nom::{
  IResult,
  multi::{many0, many1},
  combinator::recognize,
  sequence::terminated,
  character::complete::{char, one_of},
};

fn decimal(input: &str) -> IResult<&str, &str> {
  recognize(
    many1(
      terminated(one_of("0123456789"), many0(char('_')))
    )
  )(input)
}
```

### Floating Point Numbers

The following is adapted from [the Python parser by Valentin Lorentz (ProgVal)](https://github.com/ProgVal/rust-python-parser/blob/master/src/numbers.rs).
The order of the alternatives matters: `alt` returns the first one that succeeds, so case two is
tried before case three, which would otherwise stop at `42.42` and leave `e42` unparsed.

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::{opt, recognize},
  sequence::{preceded, terminated, tuple},
  character::complete::{char, one_of},
};

fn float(input: &str) -> IResult<&str, &str> {
  alt((
    // Case one: .42
    recognize(
      tuple((
        char('.'),
        decimal,
        opt(tuple((
          one_of("eE"),
          opt(one_of("+-")),
          decimal
        )))
      ))
    ),
    // Case two: 42e42 and 42.42e42
    recognize(
      tuple((
        decimal,
        opt(preceded(
          char('.'),
          decimal
        )),
        one_of("eE"),
        opt(one_of("+-")),
        decimal
      ))
    ),
    // Case three: 42. and 42.42
    recognize(
      tuple((
        decimal,
        char('.'),
        opt(decimal)
      ))
    )
  ))(input)
}

fn decimal(input: &str) -> IResult<&str, &str> {
  recognize(
    many1(
      terminated(one_of("0123456789"), many0(char('_')))
    )
  )(input)
}
```

## Implementing `FromStr`

The [FromStr trait](https://doc.rust-lang.org/std/str/trait.FromStr.html) provides a common
interface for parsing a value from a string, and implementing it lets you call `str::parse` on your
type.

```rust
use nom::{
    IResult, Finish, error::Error,
    bytes::complete::{tag, take_while},
};
use std::str::FromStr;

// will recognize the name in "Hello, name!"
fn parse_name(input: &str) -> IResult<&str, &str> {
    let (i, _) = tag("Hello, ")(input)?;
    let (i, name) = take_while(|c: char| c.is_alphabetic())(i)?;
    let (i, _) = tag("!")(i)?;

    Ok((i, name))
}

// with FromStr, the result cannot be a reference to the input, it must be owned
#[derive(Debug)]
pub struct Name(pub String);

impl FromStr for Name {
    // the error must be owned as well
    type Err = Error<String>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match parse_name(s).finish() {
            // any input left over after the `!` is ignored here
            Ok((_remaining, name)) => Ok(Name(name.to_string())),
            Err(Error { input, code }) => Err(Error {
                input: input.to_string(),
                code,
            })
        }
    }
}

fn main() {
    // parsed: Ok(Name("nom"))
    println!("parsed: {:?}", "Hello, nom!".parse::<Name>());

    // parsed: Err(Error { input: "123!", code: Tag })
    println!("parsed: {:?}", "Hello, 123!".parse::<Name>());
}
```