# Lark - a parsing toolkit for Python

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Lark can parse all context-free languages. Put simply, it can parse almost any programming language out there, and to some degree most natural languages too.

**Who is it for?**

   - **Beginners**: Lark is very friendly for experimentation. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and do so efficiently. It also constructs an annotated parse-tree for you, using only the grammar and an input, and it gives you convenient and flexible tools to process that parse-tree.

   - **Experts**: Lark implements both Earley(SPPF) and LALR(1), and several different lexers, so you can trade off power and speed, according to your requirements. It also provides a variety of sophisticated features and utilities.

**What can it do?**

 - Parse all context-free grammars, and handle any ambiguity gracefully
 - Build an annotated parse-tree automagically, no construction code required.
 - Provide first-rate performance in terms of both Big-O complexity and measured run-time (considering that this is Python ;)
 - Run on every Python interpreter (it's pure-python)
 - Generate a stand-alone parser (for LALR(1) grammars)

And many more features. Read ahead and find out!

Most importantly, Lark will save you time and prevent you from getting parsing headaches.

### Quick links

- [Documentation @readthedocs](https://lark-parser.readthedocs.io/)
- [Cheatsheet (PDF)](/docs/_static/lark_cheatsheet.pdf)
- [Online IDE](https://lark-parser.github.io/ide)
- [Tutorial](/docs/json_tutorial.md) for writing a JSON parser.
- Blog post: [How to write a DSL with Lark](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/)
- [Gitter chat](https://gitter.im/lark-parser/Lobby)

### Install Lark

    $ pip install lark-parser --upgrade

Lark has no dependencies.

[![Tests](https://github.com/lark-parser/lark/actions/workflows/tests.yml/badge.svg)](https://github.com/lark-parser/lark/actions/workflows/tests.yml)

### Syntax Highlighting

Lark provides syntax highlighting for its grammar files (\*.lark):

- [Sublime Text & TextMate](https://github.com/lark-parser/lark_syntax)
- [vscode](https://github.com/lark-parser/vscode-lark)
- [Intellij & PyCharm](https://github.com/lark-parser/intellij-syntax-highlighting)
- [Vim](https://github.com/lark-parser/vim-lark-syntax)
- [Atom](https://github.com/Alhadis/language-grammars)

### Clones

These are implementations of Lark in other languages. They accept Lark grammars, and provide similar utilities.

- [Lerche (Julia)](https://github.com/jamesrhester/Lerche.jl) - an unofficial clone, written entirely in Julia.
- [Lark.js (Javascript)](https://github.com/lark-parser/lark.js) - a port of the stand-alone LALR(1) parser generator to JavaScript.

### Hello World

Here is a little program to parse "Hello, World!" (Or any other similar phrase):

```python
from lark import Lark

l = Lark('''start: WORD "," WORD "!"

            %import common.WORD   // imports from terminal library
            %ignore " "           // Disregard spaces in text
         ''')

print( l.parse("Hello, World!") )
```

And the output is:

```python
Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'World')])
```

Notice punctuation doesn't appear in the resulting tree. It's automatically filtered away by Lark.

### Fruit flies like bananas

Lark is great at handling ambiguity.
Here is the result of parsing the phrase "fruit flies like bananas":

![fruitflies.png](examples/fruitflies.png)

[Read the code here](https://github.com/lark-parser/lark/tree/master/examples/fruitflies.py), and see [more examples here](https://lark-parser.readthedocs.io/en/latest/examples/index.html).


## List of main features

 - Builds a parse-tree (AST) automagically, based on the structure of the grammar
 - **Earley** parser
    - Can parse all context-free grammars
    - Full support for ambiguous grammars
 - **LALR(1)** parser
    - Fast and light, competitive with PLY
    - Can generate a stand-alone parser ([read more](docs/tools.md#stand-alone-parser))
 - **CYK** parser, for highly ambiguous grammars
 - **EBNF** grammar
 - **Unicode** fully supported
 - **Python 2 & 3** compatible
 - Automatic line & column tracking
 - Standard library of terminals (strings, numbers, names, etc.)
 - Import grammars from Nearley.js ([read more](/docs/tools.md#importing-grammars-from-nearleyjs))
 - Extensive test suite [![codecov](https://codecov.io/gh/lark-parser/lark/branch/master/graph/badge.svg?token=lPxgVhCVPK)](https://codecov.io/gh/lark-parser/lark)
 - MyPy support using type stubs
 - And much more!

See the full list of [features here](https://lark-parser.readthedocs.io/en/latest/features.html)


### Comparison to other libraries

#### Performance comparison

In the benchmarks below, Lark is the fastest and lightest of the libraries compared (lower is better):

![Run-time Comparison](docs/_static/comparison_runtime.png)

![Memory Usage Comparison](docs/_static/comparison_memory.png)


Check out the [JSON tutorial](/docs/json_tutorial.md#conclusion) for more details on how the comparison was made.

*Note: I really wanted to add PLY to the benchmark, but I couldn't find a working JSON parser anywhere written in PLY.
If anyone can point me to one that actually works, I would be happy to add it!*

*Note 2: The parsimonious code has been optimized for this specific test, unlike the other benchmarks (Lark included). Its "real-world" performance may not be as good.*

#### Feature comparison

| Library | Algorithm | Grammar | Builds tree? | Supports ambiguity? | Can handle every CFG? | Line/Column tracking | Generates Stand-alone |
|:--------|:----------|:----|:--------|:------------|:------------|:----------|:----------|
| **Lark** | Earley/LALR(1) | EBNF | Yes! | Yes! | Yes! | Yes! | Yes! (LALR only) |
| [PLY](http://www.dabeaz.com/ply/) | LALR(1) | BNF | No | No | No | No | No |
| [PyParsing](https://github.com/pyparsing/pyparsing) | PEG | Combinators | No | No | No\* | No | No |
| [Parsley](https://pypi.python.org/pypi/Parsley) | PEG | EBNF | No | No | No\* | No | No |
| [Parsimonious](https://github.com/erikrose/parsimonious) | PEG | EBNF | Yes | No | No\* | No | No |
| [ANTLR](https://github.com/antlr/antlr4) | LL(*) | EBNF | Yes | No | Yes? | Yes | No |


(\* *PEGs cannot handle non-deterministic grammars.
Also, according to Wikipedia, it remains unanswered whether PEGs can really parse all deterministic CFGs*)


### Projects using Lark

 - [Poetry](https://github.com/python-poetry/poetry-core) - A utility for dependency management and packaging
 - [tartiflette](https://github.com/dailymotion/tartiflette) - a GraphQL server by Dailymotion
 - [Hypothesis](https://github.com/HypothesisWorks/hypothesis) - Library for property-based testing
 - [mappyfile](https://github.com/geographika/mappyfile) - a MapFile parser for working with MapServer configuration
 - [synapse](https://github.com/vertexproject/synapse) - an intelligence analysis platform
 - [Datacube-core](https://github.com/opendatacube/datacube-core) - Open Data Cube analyses continental scale Earth Observation data through time
 - [SPFlow](https://github.com/SPFlow/SPFlow) - Library for Sum-Product Networks
 - [Torchani](https://github.com/aiqm/torchani) - Accurate Neural Network Potential on PyTorch
 - [Command-Block-Assembly](https://github.com/simon816/Command-Block-Assembly) - An assembly language, and C compiler, for Minecraft commands
 - [EQL](https://github.com/endgameinc/eql) - Event Query Language
 - [Fabric-SDK-Py](https://github.com/hyperledger/fabric-sdk-py) - Hyperledger fabric SDK with Python 3.x
 - [required](https://github.com/shezadkhan137/required) - multi-field validation using docstrings
 - [miniwdl](https://github.com/chanzuckerberg/miniwdl) - A static analysis toolkit for the Workflow Description Language
 - [pytreeview](https://gitlab.com/parmenti/pytreeview) - a lightweight tree-based grammar explorer
 - [harmalysis](https://github.com/napulen/harmalysis) - A language for harmonic analysis and music theory
 - [gersemi](https://github.com/BlankSpruce/gersemi) - A CMake code formatter

Using Lark? Send me a message and I'll add your project!

## License

Lark uses the [MIT license](LICENSE).

(The standalone tool is under MPL2)

## Contribute

Lark is currently accepting pull-requests. See [How to develop Lark](/docs/how_to_develop.md)

## Sponsor

If you like Lark, and want to see it grow, please consider [sponsoring us!](https://github.com/sponsors/lark-parser)

## Contact the author

Questions about code are best asked on [gitter](https://gitter.im/lark-parser/Lobby) or in the issues.

For anything else, I can be reached by email at erezshin at gmail com.

 -- [Erez](https://github.com/erezsh)