• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

docs/H13-Feb-2019-3,8812,071

libpg_query/H13-Feb-2019-185,222126,955

pglast/H13-Feb-2019-10,8829,218

pglast.egg-info/H03-May-2022-548352

tests/H13-Feb-2019-2,9042,174

tools/H13-Feb-2019-482346

CHANGES.rstH A D13-Feb-20195.4 KiB267138

MANIFEST.inH A D16-Jun-2018522 1918

MakefileH A D16-Jun-20183.3 KiB12786

Makefile.releaseH A D16-Jun-20181.9 KiB6947

Makefile.virtualenvH A D16-Jun-20181.3 KiB5031

PKG-INFOH A D13-Feb-201919.2 KiB548352

README.rstH A D07-Feb-20198.8 KiB256191

setup.cfgH A D13-Feb-2019172 149

setup.pyH A D03-May-20222.7 KiB9772

README.rst

1.. -*- coding: utf-8 -*-
2.. :Project:   pglast -- PostgreSQL Languages AST
3.. :Created:   mer 02 ago 2017 14:49:24 CEST
4.. :Author:    Lele Gaifax <lele@metapensiero.it>
5.. :License:   GNU General Public License version 3 or later
6.. :Copyright: © 2017, 2018, 2019 Lele Gaifax
7..
8
9========
10 pglast
11========
12
13PostgreSQL Languages AST and statements prettifier
14==================================================
15
16:Author: Lele Gaifax
17:Contact: lele@metapensiero.it
18:License: `GNU General Public License version 3 or later`__
19:Status: |build| |doc|
20
21__ https://www.gnu.org/licenses/gpl.html
22.. |build| image:: https://travis-ci.org/lelit/pglast.svg?branch=master
23   :target: https://travis-ci.org/lelit/pglast
24   :alt: Build status
25.. |doc| image:: https://readthedocs.org/projects/pglast/badge/?version=latest
26   :target: https://readthedocs.org/projects/pglast/builds/
27   :alt: Documentation status
28
29This is a Python 3 module that exposes the *parse tree* of a PostgreSQL__ statement (extracted
30by the almost standard PG parser repackaged as a standalone static library by `libpg_query`__)
31as set of interconnected *nodes*, usually called an *abstract syntax tree*.
32
33__ https://www.postgresql.org/
34__ https://github.com/lfittl/libpg_query
35
36I needed a better SQL reformatter than the one implemented by `sqlparse`__, and was annoyed by
37a few glitches (subselects__ in particular) that ruins the otherwise excellent job it does,
38considering that it is a generic library that tries to swallow many different SQL dialects.
39
40__ https://pypi.org/project/sqlparse/
41__ https://github.com/andialbrecht/sqlparse/issues/334
42
43When I found `psqlparse`__ I decided to try implementing a PostgreSQL `focused tool`__: at the
44beginning it's been easier than I feared, but I quickly hit some shortcomings in that
45implementation, so I opted for writing my own solution restarting from scratch, with the
46following goals:
47
48__ https://github.com/alculquicondor/psqlparse
49__ https://github.com/alculquicondor/psqlparse/pull/22
50
51- target only Python 3.4+
52
53- target PostgreSQL 10
54
55- use a more dynamic approach to represent the *parse tree*, with a twofold advantage:
56
57  1. it is much less boring to code, because there's no need to write one Python class for each
58     PostgreSQL node tag
59
60  2. the representation is version agnostic, it can be adapted to newer/older Elephants in a
61     snap
62
63- allow exploration of parse tree in both directions, because I realized that some kinds of
64  nodes require that knowledge to determine their textual representation
65
66- avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL
67  documentation/sources\ [*]_ is available without the hassle of guessing how a symbol has been
68  mapped
69
70- use a `zero copy`__ approach, keeping the original parse tree returned from the underlying
71  libpg_query functions and have each node just borrow a reference to its own subtree
72
73__ https://en.wikipedia.org/wiki/Zero-copy
74
75.. [*] Currently what you can find in the following headers:
76
77       - `lockoptions.h`__
78       - `nodes.h`__
79       - `parsenodes.h`__
80       - `pg_class.h`__
81       - `primnodes.h`__
82
83__ https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/nodes/lockoptions.h;hb=HEAD
84__ https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/nodes/nodes.h;hb=HEAD
85__ https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/nodes/parsenodes.h;hb=HEAD
86__ https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/catalog/pg_class.h;hb=HEAD
87__ https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/nodes/primnodes.h;hb=HEAD
88
89Introduction
90------------
91
92At the lower level the module exposes two libpg_query functions, ``parse_sql()`` and
93``parse_plpgsql()``, that take respectively an ``SQL`` statement and a ``PLpgSQL`` statement
94and return a *parse tree* as a hierarchy of Python dictionaries, lists and scalar values. In
95some cases these scalars correspond to some C ``typedef enums``, that are automatically
96extracted from the PostgreSQL headers mentioned above and are available as ``pglast.enums``.
97
98At a higher level that tree is represented by three Python classes, a ``Node`` that represents
99a single node, a ``List`` that wraps a sequence of nodes and a ``Scalar`` for plain values such
100a *strings*, *integers*, *booleans* or *none*.
101
102Every node is identified by a *tag*, a string label that characterizes its content that is
103exposed as a set of *attributes* as well as with a dictionary-like interface (technically they
104implements both a ``__getattr__`` method and a ``__getitem__`` method). When asked for an
105attribute, the node returns an instance of the base classes, i.e. another ``Node``, or a
106``List`` or a ``Scalar``, depending on the data type of that item. When the node does not
107contain the requested attribute it returns a singleton ``Missing`` marker instance.
108
109A ``List`` wraps a plain Python ``list`` and may contains a sequence of ``Node`` instances, or
110in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.
111
112Finally, a ``Scalar`` carries a single value of some type, accessible through its ``value``
113attribute.
114
115On top of that, the module implements two serializations, one that transforms a ``Node`` into a
116*raw* textual representation and another that returns a *prettified* representation. The latter
117is exposed by the ``pgpp`` CLI tool, see below for an example.
118
119Installation
120------------
121
122As usual, the easiest way is with pip::
123
124  $ pip install pglast
125
126Alternatively you can clone the repository::
127
128  $ git clone https://github.com/lelit/pglast.git --recursive
129
130and install from there::
131
132  $ pip install ./pglast
133
134Development
135-----------
136
137There is a set of *makefiles* implementing the most common operations, a ``make help`` will
138show a brief table of contents. A comprehensive test suite, based on pytest__, covers 98% of
139the source lines.
140
141__ https://docs.pytest.org/en/latest/
142
143Examples of usage
144-----------------
145
146* Parse an ``SQL`` statement and get its *AST* root node::
147
148   >>> from pglast import Node, parse_sql
149   >>> root = Node(parse_sql('SELECT foo FROM bar'))
150   >>> print(root)
151   None=[1*{RawStmt}]
152
153* Recursively traverse the parse tree::
154
155   >>> for node in root.traverse():
156   ...   print(node)
157   ...
158   None[0]={RawStmt}
159   stmt={SelectStmt}
160   fromClause[0]={RangeVar}
161   inh=<True>
162   location=<16>
163   relname=<'bar'>
164   relpersistence=<'p'>
165   op=<0>
166   targetList[0]={ResTarget}
167   location=<7>
168   val={ColumnRef}
169   fields[0]={String}
170   str=<'foo'>
171   location=<7>
172
173  As you can see, the ``repr``\ esentation of each value is mnemonic: ``{some_tag}`` means a
174  ``Node`` with tag ``some_tag``, ``[X*{some_tag}]`` is a ``List`` containing `X` nodes of that
175  particular kind\ [*]_ and ``<value>`` is a ``Scalar``.
176
177* Get a particular node::
178
179   >>> from_clause = root[0].stmt.fromClause
180   >>> print(from_clause)
181   fromClause=[1*{RangeVar}]
182
183* Obtain some information about a node::
184
185   >>> range_var = from_clause[0]
186   >>> print(range_var.node_tag)
187   RangeVar
188   >>> print(range_var.attribute_names)
189   dict_keys(['relname', 'inh', 'relpersistence', 'location'])
190   >>> print(range_var.parent_node)
191   stmt={SelectStmt}
192
193* Iterate over nodes::
194
195   >>> for a in from_clause:
196   ...     print(a)
197   ...     for b in a:
198   ...         print(b)
199   ...
200   fromClause[0]={RangeVar}
201   inh=<True>
202   location=<16>
203   relname=<'bar'>
204   relpersistence=<'p'>
205
206* Reformat a SQL statement\ [*]_ from the command line::
207
208   $ echo "select a,b,c from sometable" | pgpp
209   SELECT a
210        , b
211        , c
212   FROM sometable
213
214   $ echo "select a,b,c from sometable" | pgpp -c
215   SELECT a,
216          b,
217          c
218   FROM sometable
219
220   $ echo 'update "table" set value=123 where value is null' | pgpp
221   UPDATE "table"
222   SET value = 123
223   WHERE value IS NULL
224
225   $ echo "
226   insert into t (id, description)
227   values (1, 'this is short enough'),
228          (2, 'this is too long, and will be splitted')" | pgpp -s 20
229   INSERT INTO t (id, description)
230   VALUES (1, 'this is short enough')
231        , (2, 'this is too long, an'
232              'd will be splitted')
233
234* Programmatically reformat a SQL statement::
235
236   >>> from pglast import prettify
237   >>> print(prettify('delete from sometable where value is null'))
238   DELETE FROM sometable
239   WHERE value IS NULL
240
241Documentation
242-------------
243
244Latest documentation is hosted by `Read the Docs`__ at http://pglast.readthedocs.io/en/latest/
245
246__ https://readthedocs.org/
247
248
249.. [*] This is an approximation, because in principle a list could contain different kinds of
250       nodes, or even sub-lists in some cases: the ``List`` representation arbitrarily shows
251       the tag of the first object.
252
253.. [*] Currently this covers most `DML` statements such as ``SELECT``\ s, ``INSERT``\ s,
254       ``DELETE``\ s and ``UPDATE``\ s, fulfilling my needs, but I'd like to extend it to
255       handle also `DDL` statements and, why not, `PLpgSQL` instructions too.
256