Development¶
There is a set of makefiles implementing the most common operations, a make help
will
show a brief table of contents. A comprehensive test suite, based on pytest, covers nearly
99% of the source lines.
History¶
For a more detailed evolution steps see Changes.
Version 1¶
I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.
When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:
target only Python 3.4+
target PostgreSQL 10+
use a more dynamic approach to represent the parse tree, with a twofold advantage:
it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag
the representation is version agnostic, it can be adapted to newer/older Elephants in a snap
allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation
avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources is available without the hassle of guessing how a symbol has been mapped
use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree
Version 2¶
In late 2019, Ronan opened PR #62 against libpg_query
, that reimplemented the
build machinery of the library to make it easier (read, semi-automatic) to support PostgreSQL
12, and PR #36 to bring pglast
in line.
Since that version of PostgreSQL inevitably introduced some backward incompatibilities, I
bumped the major version of pglast
to better reflect the fact.
This version only had some development releases, since PR #62
has been superseded.
Important
This version requires Python 3.6 or greater, due to usage of f-strings.
Version 3¶
In early 2021, Lukas put a considerable effort into evolving his library to target PostgreSQL
13. He introduced a richer protobuf-based AST serialization protocol, rewriting the
underlying machinery so that the same code is used to generate either a JSON
or a
protobuf
stream.
The approach has obvious advantages, but unfortunately both formats come with different
shortcomings, and I was not able to adapt pglast
. The JSON
serialization has changed in
a way that it is not sufficient anymore to rebuild the original AST
because some attributes
now carry an implicit structure, that requires additional information to understand the
content (see issue #82). OTOH, the Protobuf
format is clumsy, at least on the Python
side: the Google’s compiler creates a huge and unreadable module, while other implementations
(see pyrobuf, cprotobuf and betterproto) suffer of different issues (see issue
#210).
After several attempts, I decided to follow a more rewarding way and implement a native Python
wrapper layer on top of PG parser’s nodes, pglast.ast
.
Ronan and Hong helped a lot respectively with PR #72 and PR #77. Last but not least, https://bit.io/ kindly sponsored the project.
Version 4¶
The ultimate goal of this version is targeting PostgreSQL 14
, exploiting the combined
effort of Tessa Lisanti and Wolfgang Walther who upgraded libpg_query
to the latest
PG 14 parser that eventually has been finalized in the 14-latest branch.
While I was waiting for that to happen, I simplified the code getting rid of the wrapper
classes. They were required in version 1, when pglast
consumed the
JSON
-serialized parse tree emitted by libpg_query
exposing those structures as generic
Node
s distinguishable by their tag.
Version 3 retained them, although rewritten on top of the new concrete AST parser nodes, to make them aware of their ancestry, notion required by some printers to choose different representations.
Now the lineage is injected directly into the AST nodes by the printer machinery (cheaper than updating/computing it when setting/accessing each property) and all the printer functions receive one concrete AST node.
Version 5¶
This version tracks the 15-latest branch of libpg_query
and thus targets PostgreSQL
15
: other than handling new statements (MERGE to mention just one) and revised
syntaxes, there are currently no other significant differences from Version 4.
Version 6¶
This version tracks the 16-latest branch of libpg_query
and thus targets PostgreSQL
16
: other than handling new statements (most notably, SQL/JSON
constructor functions like
json_array()
or aggregators such as json_objectagg()
) and revised syntaxes, there are
currently no other significant differences from Version 5.