Development¶
There is a set of makefiles implementing the most common operations, a make help
will
show a brief table of contents. A comprehensive test suite, based on pytest, covers nearly
99% of the source lines.
History¶
For a more detailed evolution steps see Changes.
Version 1¶
I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.
When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:
target only Python 3.4+
target PostgreSQL 10+
use a more dynamic approach to represent the parse tree, with a twofold advantage:
it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag
the representation is version agnostic, it can be adapted to newer/older Elephants in a snap
allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation
avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources is available without the hassle of guessing how a symbol has been mapped
use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree
Version 2¶
In late 2019, Ronan opened PR #62 against libpg_query
, that reimplemented the
build machinery of the library to make it easier (read, semi-automatic) to support PostgreSQL
12, and PR #36 to bring pglast
in line.
Since that version of PostgreSQL inevitably introduced some backward incompatibilities, I
bumped the major version of pglast
to better reflect the fact.
This version only had some development releases, since PR #62
has been superseded.
Important
This version requires Python 3.6 or greater, due to usage of f-strings.
Version 3¶
In early 2021, Lukas put a considerable effort into evolving his library to target PostgreSQL
13. He introduced a richer protobuf-based AST serialization protocol, rewriting the
underlying machinery so that the same code is used to generate either a JSON
or a
protobuf
stream.
The approach has obvious advantages, but unfortunately both formats come with different
shortcomings, and I was not able to adapt pglast
. The JSON
serialization has changed in
a way that it is not sufficient anymore to rebuild the original AST
because some attributes
now carry an implicit structure, that requires additional information to understand the
content (see issue #82). OTOH, the Protobuf
format is clumsy, at least on the Python
side: the Google’s compiler creates a huge and unreadable module, while other implementations
(see pyrobuf, cprotobuf and betterproto) suffer of different issues (see issue
#210).
After several attempts, I decided to follow a more rewarding way and implement a native Python
wrapper layer on top of PG parser’s nodes, pglast.ast
.
Ronan and Hong helped a lot respectively with PR #72 and PR #77. Last but not least, https://bit.io/ kindly sponsored the project.