# Megaparsec

[![License FreeBSD](https://img.shields.io/badge/license-FreeBSD-brightgreen.svg)](http://opensource.org/licenses/BSD-2-Clause)
[![Hackage](https://img.shields.io/hackage/v/megaparsec.svg?style=flat)](https://hackage.haskell.org/package/megaparsec)
[![Stackage Nightly](http://stackage.org/package/megaparsec/badge/nightly)](http://stackage.org/nightly/package/megaparsec)
[![Stackage LTS](http://stackage.org/package/megaparsec/badge/lts)](http://stackage.org/lts/package/megaparsec)
![CI](https://github.com/mrkkrp/megaparsec/workflows/CI/badge.svg?branch=master)

* [Features](#features)
    * [Core features](#core-features)
    * [Error messages](#error-messages)
    * [External lexers](#external-lexers)
    * [Character and binary parsing](#character-and-binary-parsing)
    * [Lexer](#lexer)
* [Documentation](#documentation)
* [Tutorials](#tutorials)
* [Performance](#performance)
* [Comparison with other solutions](#comparison-with-other-solutions)
    * [Megaparsec vs Attoparsec](#megaparsec-vs-attoparsec)
    * [Megaparsec vs Parsec](#megaparsec-vs-parsec)
    * [Megaparsec vs Trifecta](#megaparsec-vs-trifecta)
    * [Megaparsec vs Earley](#megaparsec-vs-earley)
* [Related packages](#related-packages)
* [Prominent projects that use Megaparsec](#prominent-projects-that-use-megaparsec)
* [Links to announcements and blog posts](#links-to-announcements-and-blog-posts)
* [Contribution](#contribution)
* [License](#license)

This is an industrial-strength monadic parser combinator library. Megaparsec
is a feature-rich package that tries to find a nice balance between speed,
flexibility, and quality of parse errors.

## Features

The project provides flexible solutions to satisfy common parsing needs. This
section describes them briefly. If you're looking for comprehensive
documentation, see the [section about documentation](#documentation).

### Core features

The package is built around `MonadParsec`, an MTL-style type class. Most
features work with all instances of `MonadParsec`. One can achieve various
effects by combining monad transformers, i.e. by building a monadic stack.
Since common monad transformers like `WriterT`, `StateT`, `ReaderT`, and
others are instances of the `MonadParsec` type class, one can also wrap
`ParsecT` *in* these monads, achieving, for example, backtracking state.

On the other hand, `ParsecT` is an instance of many type classes as well. The
most useful ones are `Monad`, `Applicative`, `Alternative`, and
`MonadParsec`.

Megaparsec includes all functionality that is typically available in
Parsec-like libraries and also features some special combinators:

* `parseError` allows us to end parsing and report an arbitrary parse error.
* `withRecovery` can be used to recover from parse errors “on-the-fly” and
  continue parsing. Once parsing is finished, several parse errors may be
  reported or ignored altogether.
* `observing` makes it possible to “observe” parse errors without ending
  parsing.

In addition to that, Megaparsec features high-performance combinators
similar to those found in [Attoparsec][attoparsec]:

* `tokens` makes it easy to parse several tokens in a row (`string` and
  `string'` are built on top of this primitive). This is about 100 times
  faster than matching a string token by token. `tokens` returns a “chunk” of
  the original input, meaning that if you parse `Text`, it'll return `Text`
  without repacking.
* `takeWhileP` and `takeWhile1P` are about 150 times faster than approaches
  involving `many`, `manyTill`, and other similar combinators.
* `takeP` allows us to grab n tokens from the stream and returns them as a
  “chunk” of the stream.

Megaparsec is about as fast as Attoparsec if you write your parser carefully
(see also [the section about performance](#performance)).
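
As a rough illustration of the combinators listed above, here is a minimal
sketch that uses `takeWhile1P` to grab a word as a single chunk and
`withRecovery` to skip a malformed item and keep parsing. The names `Parser`,
`word`, and `items`, as well as the comma-separated input format, are made up
for this example and are not part of the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import Data.Char (isAlpha)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

-- Hypothetical parser type for this sketch: no custom errors, Text input.
type Parser = Parsec Void Text

-- | One or more letters, returned as a single chunk of the input.
word :: Parser Text
word = takeWhile1P (Just "letter") isAlpha

-- | Comma-separated words. A malformed item is recorded as 'Left' and
-- skipped up to the next comma, so parsing continues past it.
items :: Parser [Either (ParseError Text Void) Text]
items = item `sepBy` char ','
  where
    item = withRecovery recover (Right <$> word)
    recover err = Left err <$ takeWhileP Nothing (/= ',')

main :: IO ()
main = case parse (items <* eof) "<example>" "ab,12,cd" of
  Left bundle -> putStr (errorBundlePretty bundle)
  Right xs -> mapM_ (putStrLn . either (const "<malformed item>") T.unpack) xs
```

Run as a program, this should print `ab`, `<malformed item>`, and `cd` on
separate lines: the second item fails to parse as a word, and the recovery
action consumes it instead of aborting the whole parse.
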

The library can currently work with the following types of input stream
out-of-the-box:

* `String = [Char]`
* `ByteString` (strict and lazy)
* `Text` (strict and lazy)

It's also possible to make it work with custom token streams by making them
an instance of the `Stream` type class.

### Error messages

* Megaparsec has typed error messages and the ability to signal custom parse
  errors that better suit the user's domain of interest.

* Since version 8, the location of parse errors can be independent of the
  current offset in the input stream. This is useful when you want a parse
  error to point to a particular position after performing some checks.

* Instead of a single parse error, Megaparsec produces a so-called
  `ParseErrorBundle` data type that helps to manage multi-error messages and
  pretty-print them easily and efficiently. Since version 8, reporting
  multiple parse errors at once has become much easier.

### External lexers

Megaparsec works well with streams of tokens produced by tools like Alex.
The design of the `Stream` type class has been changed significantly in
recent versions, but users can still work with custom streams of tokens.

### Character and binary parsing

Megaparsec has decent support for Unicode-aware character parsing. Functions
for character parsing live in the [`Text.Megaparsec.Char`][tm-char] module.
Similarly, there is the [`Text.Megaparsec.Byte`][tm-byte] module for parsing
streams of bytes.

### Lexer

[`Text.Megaparsec.Char.Lexer`][tm-char-lexer] is a module that should help
you write your lexer. If you have used `Parsec` in the past, this module
“fixes” its particularly inflexible `Text.Parsec.Token`.

[`Text.Megaparsec.Char.Lexer`][tm-char-lexer] is intended to be imported
using a qualified import; it's not included in [`Text.Megaparsec`][tm].
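
A qualified import of the lexer module might look like the following sketch,
which builds a space consumer and a few lexeme helpers on top of it. The
helper names (`sc`, `lexeme`, `symbol`, `integer`, `intList`) and the C-style
comment syntax are assumptions chosen for illustration; the module itself
does not prescribe them.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (space1)
import qualified Text.Megaparsec.Char.Lexer as L

-- Hypothetical parser type for this sketch: no custom errors, Text input.
type Parser = Parsec Void Text

-- | Space consumer: skips white space, line comments, and block comments.
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "//") (L.skipBlockComment "/*" "*/")

-- | Run a parser, then consume any trailing white space and comments.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

-- | Match a fixed piece of text, then consume trailing white space.
symbol :: Text -> Parser Text
symbol = L.symbol sc

-- | An unsigned decimal integer as a lexeme.
integer :: Parser Integer
integer = lexeme L.decimal

-- | A parenthesized, comma-separated list of integers, e.g. @(1, 2, 3)@.
intList :: Parser [Integer]
intList = between (symbol "(") (symbol ")") (integer `sepBy` symbol ",")

main :: IO ()
main = parseTest (sc *> intList <* eof) "( 1, 2 /* two */, 3 ) // done"
```

With this input, `parseTest` should print `[1,2,3]`; the comments and extra
spaces are absorbed by the space consumer after each lexeme.
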
The module doesn't impose how you should write your parser, but certain
approaches may be more elegant than others. An especially important theme is
parsing of white space, comments, and indentation.

The design of the module allows one to quickly solve simple tasks and doesn't
get in the way when the need to implement something less standard arises.

[`Text.Megaparsec.Byte.Lexer`][tm-byte-lexer] is also available for users
who wish to parse binary data.

## Documentation

Megaparsec is well-documented. See the [current version of Megaparsec
documentation on Hackage][hackage].

## Tutorials

You can find the most complete Megaparsec tutorial [here][the-tutorial]. It
should provide sufficient guidance to help you start with your parsing
tasks. The site also has instructions and tips for Parsec users who decide
to migrate to Megaparsec.

## Performance

Despite being flexible, Megaparsec is also fast. Here is how Megaparsec
compares to [Attoparsec][attoparsec] (the fastest widely used parsing
library in the Haskell ecosystem):

Test case         | Execution time | Allocated | Max residency
------------------|---------------:|----------:|-------------:
CSV (Attoparsec)  |       76.50 μs |   397,784 |        10,544
CSV (Megaparsec)  |       64.69 μs |   352,408 |         9,104
Log (Attoparsec)  |       302.8 μs | 1,150,032 |        10,912
Log (Megaparsec)  |       337.8 μs | 1,246,496 |        10,912
JSON (Attoparsec) |       18.20 μs |   128,368 |         9,032
JSON (Megaparsec) |       25.45 μs |   203,824 |         9,176

You can run the benchmarks yourself by executing:

```
$ nix-build -A benches.parsers-bench
$ cd result/bench
$ ./bench-memory
$ ./bench-speed
```

More information about benchmarking and development can be found
[here][hacking].

## Comparison with other solutions

There are quite a few libraries that can be used for parsing in Haskell.
Let's compare Megaparsec with some of them.

### Megaparsec vs Attoparsec

[Attoparsec][attoparsec] is another prominent Haskell library for parsing.
Although both libraries deal with parsing, it's usually easy to decide which
you will need in a particular project:

* *Attoparsec* is sometimes faster but not that feature-rich. It should be
  used when you want to process large amounts of data where performance
  matters more than quality of error messages.

* *Megaparsec* is good for parsing of source code or other human-readable
  texts. It has better error messages and it's implemented as a monad
  transformer.

So, if you work with something human-readable where the size of input data is
moderate, just go with Megaparsec; otherwise Attoparsec may be a better
choice.

### Megaparsec vs Parsec

Since Megaparsec is a fork of [Parsec][parsec], we are bound to list the
main differences between the two libraries:

* Better error messages. Megaparsec has typed error messages and custom
  error messages; it can also report multiple parse errors at once.

* Megaparsec can show the line on which a parse error happened as part of
  the parse error. This makes it a lot easier to figure out where the error
  happened.

* Some quirks and bugs of Parsec are fixed.

* Better support for Unicode parsing in [`Text.Megaparsec.Char`][tm-char].

* Megaparsec has more powerful combinators and can parse languages where
  indentation matters out-of-the-box.

* Better documentation.

* Megaparsec can recover from parse errors “on the fly” and continue
  parsing.

* Megaparsec allows us to conditionally process parse errors *inside the
  parser* before parsing is finished. In particular, it's possible to define
  regions in which parse errors, should they happen, will get a “context
  tag”, e.g. we could build a context stack like “in function definition
  foo”, “in expression x”, etc.

* Megaparsec is faster and supports efficient operations `tokens`,
  `takeWhileP`, `takeWhile1P`, `takeP`, like Attoparsec.

If you want to see a detailed change log, `CHANGELOG.md` may be helpful.
Also see [this original announcement][original-announcement] for another
comparison.

### Megaparsec vs Trifecta

[Trifecta][trifecta] is another Haskell library featuring good error
messages. These are the common reasons why Trifecta may be problematic to
use:

* Complicated, doesn't have any tutorials available, and documentation
  doesn't help at all.

* Trifecta can parse `String` and `ByteString` natively, but not `Text`.

* Depends on `lens`, which is a very heavy dependency. If you're not into
  `lens` and would like to keep your code “vanilla”, you may not like the
  API.

[Idris][idris] has switched from Trifecta to Megaparsec, which allowed it to
[have better error messages and fewer dependencies][idris-testimony].

### Megaparsec vs Earley

[Earley][earley] is a newer library that allows us to safely parse
context-free grammars (CFG). Megaparsec is a lower-level library compared to
Earley, but there are still enough reasons to choose it:

* Megaparsec is faster.

* Your grammar may not be context-free, or you may want to introduce some
  sort of state to the parsing process. Almost all non-trivial parsers
  require state. Even if your grammar is context-free, state may allow for
  additional niceties. Earley does not support that.

* Megaparsec's error messages are more flexible, allowing you to include
  arbitrary data in them, return multiple error messages, mark regions that
  affect any error that happens in those regions, etc.

In other words, Megaparsec is less safe but also more powerful.
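
To make the point about parser state more concrete, here is a minimal sketch
that layers `StateT` from the `mtl` package over `Parsec` to count
identifiers. The names `Parser`, `identifier`, `call`, and `term` and the toy
grammar are assumptions made for this example only. Because the state sits
outside the parser, it is rolled back whenever a branch backtracks.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import Control.Monad.State.Strict (StateT, modify', runStateT)
import Data.Functor (void)
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, letterChar, space)

-- Hypothetical parser type: an 'Int' counter wrapped around 'Parsec'.
type Parser = StateT Int (Parsec Void Text)

-- | Parse an identifier, skip trailing white space, and bump the counter.
identifier :: Parser String
identifier = some letterChar <* space <* modify' (+ 1)

-- | A call such as @f(x)@; counts both the function name and the argument.
call :: Parser ()
call = void (identifier *> char '(' *> identifier *> char ')')

-- | Either a call or a bare identifier. If 'call' fails after counting the
-- function name, the alternative starts again from the original counter.
term :: Parser ()
term = try call <|> void identifier

main :: IO ()
main = case parse (runStateT (term <* eof) 0) "<example>" "foo" of
  Left bundle -> putStr (errorBundlePretty bundle)
  Right ((), n) -> print n
```

For the input `foo` this should print `1` rather than `2`: the increment
performed inside the failed `call` branch is discarded when the parser falls
back to the second alternative.
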

## Related packages

The following packages are designed to be used with Megaparsec (open a PR if
you want to add something to the list):

* [`hspec-megaparsec`](https://hackage.haskell.org/package/hspec-megaparsec):
  utilities for testing Megaparsec parsers with
  [Hspec](https://hackage.haskell.org/package/hspec).
* [`replace-megaparsec`](https://hackage.haskell.org/package/replace-megaparsec):
  stream editing and find-and-replace with Megaparsec.
* [`cassava-megaparsec`](https://hackage.haskell.org/package/cassava-megaparsec):
  a Megaparsec parser of CSV files that plays nicely with
  [Cassava](https://hackage.haskell.org/package/cassava).
* [`tagsoup-megaparsec`](https://hackage.haskell.org/package/tagsoup-megaparsec):
  a library for easily using
  [TagSoup](https://hackage.haskell.org/package/tagsoup) as a token type in
  Megaparsec.

## Prominent projects that use Megaparsec

Some prominent projects that use Megaparsec:

* [Idris](https://github.com/idris-lang/Idris-dev): a general-purpose
  functional programming language with dependent types
* [Dhall](https://github.com/dhall-lang/dhall-haskell): an advanced
  configuration language
* [hnix](https://github.com/haskell-nix/hnix): a re-implementation of the Nix
  language in Haskell
* [Hledger](https://github.com/simonmichael/hledger): an accounting tool
* [MMark](https://github.com/mmark-md/mmark): a strict markdown processor for
  writers

## Links to announcements and blog posts

Here are some blog posts mainly announcing new features of the project and
describing what sort of things are now possible:

* [Megaparsec 8](https://markkarpov.com/post/megaparsec-8.html)
* [Megaparsec 7](https://markkarpov.com/post/megaparsec-7.html)
* [Evolution of error messages](https://markkarpov.com/post/evolution-of-error-messages.html)
* [A major upgrade to Megaparsec: more speed, more
  power](https://markkarpov.com/post/megaparsec-more-speed-more-power.html)
* [Latest additions to Megaparsec](https://markkarpov.com/post/latest-additions-to-megaparsec.html)
* [Announcing Megaparsec 5](https://markkarpov.com/post/announcing-megaparsec-5.html)
* [Megaparsec 4 and 5](https://markkarpov.com/post/megaparsec-4-and-5.html)
* [The original Megaparsec 4.0.0 announcement][original-announcement]

## Contribution

Issues (bugs, feature requests, or other feedback) may be reported in
[the GitHub issue tracker for this
project](https://github.com/mrkkrp/megaparsec/issues).

Pull requests are also welcome. If you would like to contribute to the
project, you may find [this document][hacking] helpful.

## License

Copyright © 2015–present Megaparsec contributors\
Copyright © 2007 Paolo Martini\
Copyright © 1999–2000 Daan Leijen

Distributed under the FreeBSD license.

[hackage]: https://hackage.haskell.org/package/megaparsec
[the-tutorial]: https://markkarpov.com/tutorial/megaparsec.html
[hacking]: ./HACKING.md

[tm]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec.html
[tm-char]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Char.html
[tm-byte]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Byte.html
[tm-char-lexer]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Char-Lexer.html
[tm-byte-lexer]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Byte-Lexer.html

[attoparsec]: https://hackage.haskell.org/package/attoparsec
[parsec]: https://hackage.haskell.org/package/parsec
[trifecta]: https://hackage.haskell.org/package/trifecta
[earley]: https://hackage.haskell.org/package/Earley
[idris]: https://www.idris-lang.org/
[idris-testimony]: https://twitter.com/edwinbrady/status/950084043282010117?s=09

[parsers-bench]: https://github.com/mrkkrp/parsers-bench
[fast-parser]: https://markkarpov.com/megaparsec/writing-a-fast-parser.html
[original-announcement]: https://mail.haskell.org/pipermail/haskell-cafe/2015-September/121530.html