# Megaparsec

[![License FreeBSD](https://img.shields.io/badge/license-FreeBSD-brightgreen.svg)](http://opensource.org/licenses/BSD-2-Clause)
[![Hackage](https://img.shields.io/hackage/v/megaparsec.svg?style=flat)](https://hackage.haskell.org/package/megaparsec)
[![Stackage Nightly](http://stackage.org/package/megaparsec/badge/nightly)](http://stackage.org/nightly/package/megaparsec)
[![Stackage LTS](http://stackage.org/package/megaparsec/badge/lts)](http://stackage.org/lts/package/megaparsec)
![CI](https://github.com/mrkkrp/megaparsec/workflows/CI/badge.svg?branch=master)

* [Features](#features)
    * [Core features](#core-features)
    * [Error messages](#error-messages)
    * [External lexers](#external-lexers)
    * [Character and binary parsing](#character-and-binary-parsing)
    * [Lexer](#lexer)
* [Documentation](#documentation)
* [Tutorials](#tutorials)
* [Performance](#performance)
* [Comparison with other solutions](#comparison-with-other-solutions)
    * [Megaparsec vs Attoparsec](#megaparsec-vs-attoparsec)
    * [Megaparsec vs Parsec](#megaparsec-vs-parsec)
    * [Megaparsec vs Trifecta](#megaparsec-vs-trifecta)
    * [Megaparsec vs Earley](#megaparsec-vs-earley)
* [Related packages](#related-packages)
* [Prominent projects that use Megaparsec](#prominent-projects-that-use-megaparsec)
* [Links to announcements and blog posts](#links-to-announcements-and-blog-posts)
* [Contribution](#contribution)
* [License](#license)

This is an industrial-strength monadic parser combinator library. Megaparsec
is a feature-rich package that tries to find a nice balance between speed,
flexibility, and quality of parse errors.

## Features

The project provides flexible solutions to satisfy common parsing needs. This
section describes them briefly. If you're looking for comprehensive
documentation, see the [section about documentation](#documentation).

### Core features

The package is built around `MonadParsec`, an MTL-style type class, and
`ParsecT`, the monad transformer that implements it. Most features work with
all instances of `MonadParsec`. Various effects can be achieved by combining
monad transformers, i.e. by building a monad stack. Since common monad
transformers like `WriterT`, `StateT`, `ReaderT` and others are instances of
the `MonadParsec` type class (provided the monad they wrap is one), one can
also wrap `ParsecT` *in* these monads, achieving, for example, backtracking
state.
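
Here is a minimal sketch of backtracking state. It assumes the `mtl` package for `StateT`, strict `Text` input, and no custom errors (`Void`). The names `keyword`, `attempts` and `runAttempts` are hypothetical. Every alternative bumps a counter before it tries to match. `StateT`'s `<|>` starts each branch from the same state, so the increment made by a failed branch is discarded along with its input:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Control.Monad.State.Strict (StateT, evalStateT, get, modify)
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (string)

-- ParsecT wrapped in StateT: an Int counter on top of the parser.
type Parser = StateT Int (Parsec Void Text)

-- Hypothetical helper: count the attempt, then try to match the keyword.
keyword :: Text -> Parser Text
keyword kw = modify (+ 1) *> string kw

-- Try two keywords, then return the counter. When "foo" fails, its
-- increment is rolled back together with the input position.
attempts :: Parser Int
attempts = (keyword "foo" <|> keyword "bar") *> get

-- Start the counter at 0; parseMaybe also demands end of input.
runAttempts :: Text -> Maybe Int
runAttempts = parseMaybe (evalStateT attempts 0)
```

With this sketch, `runAttempts "bar"` gives `Just 1` rather than `Just 2`, because the failed attempt on `"foo"` leaves no trace in the state.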

`ParsecT` itself is an instance of many type classes as well. The most
useful ones are `Monad`, `Applicative`, `Alternative`, and `MonadParsec`.

Megaparsec includes all functionality that is typically available in
Parsec-like libraries and also features some special combinators:

* `parseError` allows us to end parsing and report an arbitrary parse error.
* `withRecovery` can be used to recover from parse errors “on-the-fly” and
  continue parsing. Once parsing is finished, several parse errors may be
  reported or ignored altogether.
* `observing` makes it possible to “observe” parse errors without ending
  parsing.
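
The following is a minimal sketch of `withRecovery`. It assumes strict `Text` input and no custom errors (`Void`). The names `item`, `items` and `summarize` are hypothetical. Each line should hold a decimal number. A malformed line is skipped up to its newline, and its `ParseError` is kept as a value so that parsing can continue:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (newline)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void Text

-- One line holding a decimal number. On failure, the recovery parser runs
-- from where the error occurred, skips the rest of the line (including the
-- newline) and returns the error instead of aborting.
item :: Parser (Either (ParseError Text Void) Integer)
item = withRecovery recover (Right <$> L.decimal <* newline)
  where
    recover err = Left err <$ manyTill anySingle newline

-- Every line of the input, well-formed or not.
items :: Parser [Either (ParseError Text Void) Integer]
items = many item

-- Count the recovered errors and collect the numbers that did parse.
summarize :: Text -> Maybe (Int, [Integer])
summarize input = do
  results <- parseMaybe items input
  pure (length [e | Left e <- results], [n | Right n <- results])
```

Here `summarize "1\nx\n3\n"` yields `Just (1, [1, 3])`. `observing` is the lighter-weight relative: it hands the error back in `Left` without running any recovery parser.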

In addition to that, Megaparsec features high-performance combinators
similar to those found in [Attoparsec][attoparsec]:

* `tokens` makes it easy to parse several tokens in a row (`string` and
  `string'` are built on top of this primitive). This is about 100 times
  faster than matching a string token by token. `tokens` returns a “chunk”
  of the original input, meaning that if you parse `Text`, it'll return
  `Text` without repacking.
* `takeWhileP` and `takeWhile1P` are about 150 times faster than approaches
  involving `many`, `manyTill` and other similar combinators.
* `takeP` allows us to grab a given number of tokens from the stream and
  returns them as a “chunk” of the stream.
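
As an illustration, here is how a record in a hypothetical `name@YYYY` format could be parsed. This is a sketch that assumes strict `Text` input, and `name`, `year` and `record` are illustrative names. `takeWhile1P` and `takeP` return slices of the input `Text` directly, and `string` goes through `tokens`:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isAlpha)
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (string)

type Parser = Parsec Void Text

-- A non-empty run of letters, returned as one slice of the input Text
-- instead of being collected character by character and repacked.
name :: Parser Text
name = takeWhile1P (Just "letter") isAlpha

-- Exactly four tokens of any kind, returned as a single chunk.
year :: Parser Text
year = takeP (Just "year") 4

-- Hypothetical "name@YYYY" record; `string` is built on top of `tokens`.
record :: Parser (Text, Text)
record = (,) <$> name <* string "@" <*> year
```

`takeP` fails when fewer than four tokens remain, so `"abc@20"` is rejected. No digit check is made here; that is left out to keep the sketch short.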

Megaparsec is about as fast as Attoparsec if you write your parser carefully
(see [these tips on writing a fast parser][fast-parser] and [the section
about performance](#performance)).

The library can currently work with the following types of input stream
out-of-the-box:

* `String = [Char]`
* `ByteString` (strict and lazy)
* `Text` (strict and lazy)

It's also possible to make it work with custom token streams by making them
an instance of the `Stream` type class.

### Error messages

* Megaparsec has typed error messages and the ability to signal custom parse
  errors that better suit the user's domain of interest.

* Since version 8, the location of a parse error can be independent of the
  current offset in the input stream. This is useful when you want a parse
  error to point to a particular position after performing some checks.

* Instead of a single parse error, Megaparsec produces a so-called
  `ParseErrorBundle`, a data type that helps manage multi-error messages and
  pretty-print them easily and efficiently. Since version 8, reporting
  multiple parse errors at once has become much easier.

### External lexers

Megaparsec works well with streams of tokens produced by tools like Alex.
The design of the `Stream` type class has changed significantly in recent
versions, but users can still work with custom streams of tokens.
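
The following is a rough sketch of parsing a pre-lexed token list. The `Tok` type is hypothetical and stands in for whatever a lexer such as Alex produces. The sketch assumes a Megaparsec version (9.x) that provides the `Ord a => Stream [a]` instance; if your version lacks it, you would need a `Stream` instance of your own. `int` and `sumExpr` are illustrative names:

```haskell
import qualified Data.Set as Set
import Data.Void (Void)
import Text.Megaparsec

-- Hypothetical token type, standing in for the output of an external lexer.
data Tok = TInt Integer | TPlus
  deriving (Eq, Ord, Show)

-- Assumes the Ord a => Stream [a] instance, so a plain token list is a stream.
type Parser = Parsec Void [Tok]

-- Accept an integer literal token and extract its value.
int :: Parser Integer
int = token fromTok Set.empty
  where
    fromTok (TInt n) = Just n
    fromTok _ = Nothing

-- Sums such as 1 + 2 + 3, already split into tokens.
sumExpr :: Parser Integer
sumExpr = sum <$> int `sepBy1` single TPlus
```

`token` takes a matching function plus the set of expected items used in error messages. The empty set keeps the sketch short at the cost of less informative errors.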

### Character and binary parsing

Megaparsec has decent support for Unicode-aware character parsing. Functions
for character parsing live in the [`Text.Megaparsec.Char`][tm-char] module.
Similarly, there is the [`Text.Megaparsec.Byte`][tm-byte] module for parsing
streams of bytes.

### Lexer

[`Text.Megaparsec.Char.Lexer`][tm-char-lexer] is a module that should help
you write your lexer. If you have used `Parsec` in the past, this module
“fixes” its particularly inflexible `Text.Parsec.Token`.

[`Text.Megaparsec.Char.Lexer`][tm-char-lexer] is intended to be imported
qualified; it's not re-exported from [`Text.Megaparsec`][tm]. The module
doesn't impose how you should write your parser, but certain approaches may
be more elegant than others. An especially important theme is parsing of
white space, comments, and indentation.

The design of the module allows one to quickly solve simple tasks and
doesn't get in the way when the need to implement something less standard
arises.
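
A common pattern is to define a space consumer once and build `lexeme` and `symbol` on top of it. The sketch below assumes strict `Text` input and Haskell-style comments. The names `sc`, `lexeme`, `symbol`, `integer`, `intList` and `parseIntList` are conventional but hypothetical:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (space1)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void Text

-- Space consumer: white space, "--" line comments and "{- -}" block comments.
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "--") (L.skipBlockComment "{-" "-}")

-- Run a parser, then skip any trailing white space and comments.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

-- Match a fixed string, then skip trailing white space and comments.
symbol :: Text -> Parser Text
symbol = L.symbol sc

integer :: Parser Integer
integer = lexeme L.decimal

-- A parenthesised, comma-separated list such as "( 1, 2 {- two -}, 3 )".
intList :: Parser [Integer]
intList = between (symbol "(") (symbol ")") (integer `sepBy` symbol ",")

-- Skip leading white space once; every token handles what follows it.
parseIntList :: Text -> Maybe [Integer]
parseIntList = parseMaybe (sc *> intList)
```

Each token consumes the white space *after* itself, so only leading white space needs explicit handling. That is the job of `sc *>` in `parseIntList`.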

[`Text.Megaparsec.Byte.Lexer`][tm-byte-lexer] is also available for users
who wish to parse binary data.

## Documentation

Megaparsec is well-documented. See the [current version of Megaparsec
documentation on Hackage][hackage].

## Tutorials

You can find the most complete Megaparsec tutorial [here][the-tutorial]. It
should provide sufficient guidance to help you get started with your parsing
tasks. The site also has instructions and tips for Parsec users who decide
to migrate to Megaparsec.

## Performance

Despite being flexible, Megaparsec is also fast. Here is how Megaparsec
compares to [Attoparsec][attoparsec] (the fastest widely used parsing
library in the Haskell ecosystem):

Test case         | Execution time | Allocated (bytes) | Max residency (bytes)
------------------|---------------:|------------------:|---------------------:
CSV (Attoparsec)  |       76.50 μs |           397,784 |                10,544
CSV (Megaparsec)  |       64.69 μs |           352,408 |                 9,104
Log (Attoparsec)  |       302.8 μs |         1,150,032 |                10,912
Log (Megaparsec)  |       337.8 μs |         1,246,496 |                10,912
JSON (Attoparsec) |       18.20 μs |           128,368 |                 9,032
JSON (Megaparsec) |       25.45 μs |           203,824 |                 9,176

You can run the benchmarks yourself by executing:

```
$ nix-build -A benches.parsers-bench
$ cd result/bench
$ ./bench-memory
$ ./bench-speed
```

More information about benchmarking and development can be found
[here][hacking].

## Comparison with other solutions

There are quite a few libraries that can be used for parsing in Haskell.
Let's compare Megaparsec with some of them.

### Megaparsec vs Attoparsec

[Attoparsec][attoparsec] is another prominent Haskell library for parsing.
Although both libraries deal with parsing, it's usually easy to decide which
you will need in a particular project:

* *Attoparsec* is sometimes faster but not that feature-rich. It should be
  used when you want to process large amounts of data where performance
  matters more than quality of error messages.

* *Megaparsec* is good for parsing source code or other human-readable
  texts. It has better error messages and it's implemented as a monad
  transformer.

So, if you work with something human-readable where the size of the input
is moderate, just go with Megaparsec; otherwise Attoparsec may be a better
choice.

### Megaparsec vs Parsec

Since Megaparsec is a fork of [Parsec][parsec], we are bound to list the
main differences between the two libraries:

* Better error messages. Megaparsec has typed error messages and custom
  error messages; it can also report multiple parse errors at once.

* Megaparsec can show the line on which a parse error happened as part of
  the error message. This makes it a lot easier to figure out where the
  error happened.

* Some quirks and bugs of Parsec are fixed.

* Better support for Unicode parsing in [`Text.Megaparsec.Char`][tm-char].

* Megaparsec has more powerful combinators and can parse languages where
  indentation matters out-of-the-box.

* Better documentation.

* Megaparsec can recover from parse errors “on the fly” and continue
  parsing.

* Megaparsec allows you to conditionally process parse errors *inside your
  parser* before parsing is finished. In particular, it's possible to define
  regions in which parse errors, should they happen, will get a “context
  tag”, e.g. we could build a context stack like “in function definition
  foo”, “in expression x”, etc.

* Megaparsec is faster and supports efficient operations such as `tokens`,
  `takeWhileP`, `takeWhile1P`, and `takeP`, similar to those of Attoparsec.

If you want to see a detailed change log, `CHANGELOG.md` may be helpful.
Also see [this original announcement][original-announcement] for another
comparison.

### Megaparsec vs Trifecta

[Trifecta][trifecta] is another Haskell library featuring good error
messages. These are the common reasons why Trifecta may be problematic to
use:

* Complicated, doesn't have any tutorials available, and documentation
  doesn't help at all.

* Trifecta can parse `String` and `ByteString` natively, but not `Text`.

* Depends on `lens`, which is a very heavy dependency. If you're not into
  `lens` and would like to keep your code “vanilla”, you may not like the
  API.

[Idris][idris] has switched from Trifecta to Megaparsec, which allowed it to
[have better error messages and fewer dependencies][idris-testimony].

### Megaparsec vs Earley

[Earley][earley] is a newer library that allows us to safely parse
context-free grammars (CFG). Megaparsec is a lower-level library compared to
Earley, but there are still enough reasons to choose it:

* Megaparsec is faster.

* Your grammar may not be context-free or you may want to introduce some
  sort of state to the parsing process. Almost all non-trivial parsers
  require state. Even if your grammar is context-free, state may allow for
  additional niceties. Earley does not support that.

* Megaparsec's error messages are more flexible, allowing you to include
  arbitrary data in them, return multiple error messages, mark regions that
  affect any error that happens in those regions, etc.

In other words, Megaparsec is less safe but also more powerful.

## Related packages

The following packages are designed to be used with Megaparsec (open a PR if
you want to add something to the list):

* [`hspec-megaparsec`](https://hackage.haskell.org/package/hspec-megaparsec)—utilities
  for testing Megaparsec parsers with
  [Hspec](https://hackage.haskell.org/package/hspec).
* [`replace-megaparsec`](https://hackage.haskell.org/package/replace-megaparsec)—Stream
  editing and find-and-replace with Megaparsec.
* [`cassava-megaparsec`](https://hackage.haskell.org/package/cassava-megaparsec)—Megaparsec
  parser of CSV files that plays nicely with
  [Cassava](https://hackage.haskell.org/package/cassava).
* [`tagsoup-megaparsec`](https://hackage.haskell.org/package/tagsoup-megaparsec)—a
  library for easily using
  [TagSoup](https://hackage.haskell.org/package/tagsoup) as a token type in
  Megaparsec.

## Prominent projects that use Megaparsec

Some prominent projects that use Megaparsec:

* [Idris](https://github.com/idris-lang/Idris-dev)—a general-purpose
  functional programming language with dependent types
* [Dhall](https://github.com/dhall-lang/dhall-haskell)—an advanced
  configuration language
* [hnix](https://github.com/haskell-nix/hnix)—re-implementation of the Nix
  language in Haskell
* [Hledger](https://github.com/simonmichael/hledger)—an accounting tool
* [MMark](https://github.com/mmark-md/mmark)—strict markdown processor for
  writers

## Links to announcements and blog posts

Here are some blog posts mainly announcing new features of the project and
describing what sort of things are now possible:

* [Megaparsec 8](https://markkarpov.com/post/megaparsec-8.html)
* [Megaparsec 7](https://markkarpov.com/post/megaparsec-7.html)
* [Evolution of error messages](https://markkarpov.com/post/evolution-of-error-messages.html)
* [A major upgrade to Megaparsec: more speed, more power](https://markkarpov.com/post/megaparsec-more-speed-more-power.html)
* [Latest additions to Megaparsec](https://markkarpov.com/post/latest-additions-to-megaparsec.html)
* [Announcing Megaparsec 5](https://markkarpov.com/post/announcing-megaparsec-5.html)
* [Megaparsec 4 and 5](https://markkarpov.com/post/megaparsec-4-and-5.html)
* [The original Megaparsec 4.0.0 announcement][original-announcement]

## Contribution

Issues (bugs, feature requests or other feedback) may be reported in
[the GitHub issue tracker for this
project](https://github.com/mrkkrp/megaparsec/issues).

Pull requests are also welcome. If you would like to contribute to the
project, you may find [this document][hacking] helpful.

## License

Copyright © 2015–present Megaparsec contributors\
Copyright © 2007 Paolo Martini\
Copyright © 1999–2000 Daan Leijen

Distributed under the FreeBSD license.

[hackage]: https://hackage.haskell.org/package/megaparsec
[the-tutorial]: https://markkarpov.com/tutorial/megaparsec.html
[hacking]: ./HACKING.md

[tm]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec.html
[tm-char]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Char.html
[tm-byte]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Byte.html
[tm-char-lexer]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Char-Lexer.html
[tm-byte-lexer]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Byte-Lexer.html

[attoparsec]: https://hackage.haskell.org/package/attoparsec
[parsec]: https://hackage.haskell.org/package/parsec
[trifecta]: https://hackage.haskell.org/package/trifecta
[earley]: https://hackage.haskell.org/package/Earley
[idris]: https://www.idris-lang.org/
[idris-testimony]: https://twitter.com/edwinbrady/status/950084043282010117?s=09

[parsers-bench]: https://github.com/mrkkrp/parsers-bench
[fast-parser]: https://markkarpov.com/megaparsec/writing-a-fast-parser.html
[original-announcement]: https://mail.haskell.org/pipermail/haskell-cafe/2015-September/121530.html