# Nom Recipes

These are short recipes for accomplishing common tasks with nom.

* [Whitespace](#whitespace)
  + [Wrapper combinators that eat whitespace before and after a parser](#wrapper-combinators-that-eat-whitespace-before-and-after-a-parser)
* [Comments](#comments)
  + [`// C++/EOL-style comments`](#-ceol-style-comments)
  + [`/* C-style comments */`](#-c-style-comments-)
* [Identifiers](#identifiers)
  + [`Rust-Style Identifiers`](#rust-style-identifiers)
* [Literal Values](#literal-values)
  + [Escaped Strings](#escaped-strings)
  + [Integers](#integers)
    - [Hexadecimal](#hexadecimal)
    - [Octal](#octal)
    - [Binary](#binary)
    - [Decimal](#decimal)
  + [Floating Point Numbers](#floating-point-numbers)

## Whitespace

### Wrapper combinators that eat whitespace before and after a parser

```rust
use nom::{
    character::complete::multispace0,
    error::ParseError,
    sequence::delimited,
    IResult,
};

/// A combinator that takes a parser `inner` and produces a parser that also consumes both leading and
/// trailing whitespace, returning the output of `inner`.
fn ws<'a, F: 'a, O, E: ParseError<&'a str>>(inner: F) -> impl FnMut(&'a str) -> IResult<&'a str, O, E>
where
    F: Fn(&'a str) -> IResult<&'a str, O, E>,
{
    delimited(
        multispace0,
        inner,
        multispace0
    )
}
```

To eat only trailing whitespace, replace `delimited(...)` with `terminated(&inner, multispace0)`.
Likewise, to eat only leading whitespace, replace `delimited(...)` with `preceded(multispace0,
&inner)`. You can use your own parser instead of `multispace0` if you want to skip a different set
of lexemes.

## Comments

### `// C++/EOL-style comments`

This version uses `%` to start a comment, does not consume the newline character, and returns an
output of `()`. For C++-style comments, replace `char('%')` with `tag("//")`.

```rust
use nom::{
    bytes::complete::is_not, character::complete::char, combinator::value,
    error::ParseError, sequence::pair, IResult,
};

pub fn peol_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E>
{
    value(
        (), // Output is thrown away.
        pair(char('%'), is_not("\n\r"))
    )(i)
}
```

### `/* C-style comments */`

Inline comments surrounded with sentinel tags `(*` and `*)`. This version returns an output of `()`
and does not handle nested comments. For C-style comments, use `/*` and `*/` as the tags instead.

```rust
use nom::{
    bytes::complete::{tag, take_until}, combinator::value,
    error::ParseError, sequence::tuple, IResult,
};

pub fn pinline_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E> {
    value(
        (), // Output is thrown away.
        tuple((
            tag("(*"),
            take_until("*)"),
            tag("*)")
        ))
    )(i)
}
```

## Identifiers

### `Rust-Style Identifiers`

Identifiers that start with a letter (or underscore) and may contain underscores, letters and
numbers can be parsed like this:

```rust
use nom::{
    branch::alt, bytes::complete::tag,
    character::complete::{alpha1, alphanumeric1},
    combinator::recognize, multi::many0, sequence::pair, IResult,
};

pub fn identifier(input: &str) -> IResult<&str, &str> {
    recognize(
        pair(
            alt((alpha1, tag("_"))),
            many0(alt((alphanumeric1, tag("_"))))
        )
    )(input)
}
```

Let's say we apply this to the identifier `hello_world123abc`. The first `alt` parser would
recognize `hello`, because `alpha1` consumes as many letters as it can. The `pair` combinator then
passes the remainder, `_world123abc`, to the next parser, `many0(alt((alphanumeric1, tag("_"))))`,
which recognizes every remaining character. On its own, the `pair` combinator would return a tuple
of the results of its sub-parsers. Wrapping it in `recognize` instead produces a `&str` of the
input text that was parsed, which in this case is the entire `&str` `hello_world123abc`.

## Literal Values

### Escaped Strings

This is [one of the examples](https://github.com/Geal/nom/blob/master/examples/string.rs) in the
examples directory.

### Integers

The following recipes all return string slices rather than integer values. How to obtain an
integer value instead is demonstrated for hexadecimal integers; the others are similar.

The parsers allow the grouping character `_`, which allows one to group the digits by byte, for
example: `0xA4_3F_11_28`. If you prefer to exclude the `_` character, the lambda to convert from a
string slice to an integer value is slightly simpler. You can also strip the `_` from the string
slice that is returned, which is demonstrated in the second hexadecimal number parser.

If you wish to limit the number of digits in a valid integer literal, replace `many1` with
`many_m_n` in the recipes.

#### Hexadecimal

The parser outputs the string slice of the digits without the leading `0x`/`0X`.

```rust
use nom::{
    branch::alt, bytes::complete::tag,
    character::complete::{char, one_of},
    combinator::recognize, multi::{many0, many1},
    sequence::{preceded, terminated}, IResult,
};

fn hexadecimal(input: &str) -> IResult<&str, &str> { // <'a, E: ParseError<&'a str>>
    preceded(
        alt((tag("0x"), tag("0X"))),
        recognize(
            many1(
                terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
            )
        )
    )(input)
}
```

If you want it to return the integer value instead, use `map_res`:

```rust
// Needs the imports from `hexadecimal` above, plus:
use nom::combinator::map_res;

fn hexadecimal_value(input: &str) -> IResult<&str, i64> {
    map_res(
        preceded(
            alt((tag("0x"), tag("0X"))),
            recognize(
                many1(
                    terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
                )
            )
        ),
        // Strip the `_` grouping characters before converting.
        |out: &str| i64::from_str_radix(&str::replace(&out, "_", ""), 16)
    )(input)
}
```

#### Octal

```rust
// Uses the same imports as `hexadecimal` above.
fn octal(input: &str) -> IResult<&str, &str> {
    preceded(
        alt((tag("0o"), tag("0O"))),
        recognize(
            many1(
                terminated(one_of("01234567"), many0(char('_')))
            )
        )
    )(input)
}
```

#### Binary

```rust
// Uses the same imports as `hexadecimal` above.
fn binary(input: &str) -> IResult<&str, &str> {
    preceded(
        alt((tag("0b"), tag("0B"))),
        recognize(
            many1(
                terminated(one_of("01"), many0(char('_')))
            )
        )
    )(input)
}
```

#### Decimal

```rust
// The imports from `hexadecimal` above cover this recipe too.
fn decimal(input: &str) -> IResult<&str, &str> {
    recognize(
        many1(
            terminated(one_of("0123456789"), many0(char('_')))
        )
    )(input)
}
```

### Floating Point Numbers

The following is adapted from [the Python parser by Valentin Lorentz (ProgVal)](https://github.com/ProgVal/rust-python-parser/blob/master/src/numbers.rs).

```rust
use nom::{
    branch::alt, character::complete::{char, one_of},
    combinator::{opt, recognize},
    sequence::{preceded, tuple}, IResult,
};

// `decimal` is the recipe defined in the Decimal section above.
fn float(input: &str) -> IResult<&str, &str> {
    alt((
        // Case one: .42
        recognize(
            tuple((
                char('.'),
                decimal,
                opt(tuple((
                    one_of("eE"),
                    opt(one_of("+-")),
                    decimal
                )))
            ))
        )
        , // Case two: 42e42 and 42.42e42
        recognize(
            tuple((
                decimal,
                opt(preceded(
                    char('.'),
                    decimal,
                )),
                one_of("eE"),
                opt(one_of("+-")),
                decimal
            ))
        )
        , // Case three: 42. and 42.42
        recognize(
            tuple((
                decimal,
                char('.'),
                opt(decimal)
            ))
        )
    ))(input)
}
```

# Implementing FromStr

The [FromStr trait](https://doc.rust-lang.org/std/str/trait.FromStr.html) provides
a common interface to parse from a string.

```rust
use nom::{
    IResult, Finish, error::Error,
    bytes::complete::{tag, take_while},
};
use std::str::FromStr;

// will recognize the name in "Hello, name!"
fn parse_name(input: &str) -> IResult<&str, &str> {
    let (i, _) = tag("Hello, ")(input)?;
    let (i, name) = take_while(|c:char| c.is_alphabetic())(i)?;
    let (i, _) = tag("!")(i)?;

    Ok((i, name))
}

// with FromStr, the result cannot be a reference to the input, it must be owned
#[derive(Debug)]
pub struct Name(pub String);

impl FromStr for Name {
    // the error must be owned as well
    type Err = Error<String>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match parse_name(s).finish() {
            Ok((_remaining, name)) => Ok(Name(name.to_string())),
            Err(Error { input, code }) => Err(Error {
                input: input.to_string(),
                code,
            })
        }
    }
}

fn main() {
    // parsed: Ok(Name("nom"))
    println!("parsed: {:?}", "Hello, nom!".parse::<Name>());

    // parsed: Err(Error { input: "123!", code: Tag })
    println!("parsed: {:?}", "Hello, 123!".parse::<Name>());
}
```
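
Returning to the integer recipes: they return digit slices that may still contain `_` grouping characters, and the text notes that obtaining an integer value for octal, binary and decimal is similar to the hexadecimal case. The following is a minimal sketch of that conversion step using only the standard library. The helper name `digits_to_i64` is hypothetical (it is not part of nom), and it assumes the radix prefix has already been removed by the parser.

```rust
use std::num::ParseIntError;

/// Hypothetical helper (not part of nom): converts a digit slice such as
/// "A4_3F_11_28" into an integer, ignoring `_` grouping characters.
/// Assumes any radix prefix (`0x`, `0o`, `0b`) was already stripped.
fn digits_to_i64(digits: &str, radix: u32) -> Result<i64, ParseIntError> {
    // Drop the grouping characters, then let the standard library convert.
    let cleaned: String = digits.chars().filter(|c| *c != '_').collect();
    i64::from_str_radix(&cleaned, radix)
}
```

For example, `digits_to_i64("A4_3F_11_28", 16)` yields `Ok(0xA43F_1128)`. Inside a parser, a closure such as `|out: &str| digits_to_i64(out, 8)` could be passed to `map_res` around the `octal` recipe, in the same way `hexadecimal_value` wraps the hexadecimal one.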