Examples of usage¶

Here are some example of how the module can be used.

Low level¶

The lowest level is a Python wrapper around each parse node returned by the PostgreSQL parser. Each node is represented by a corresponding Python class in the module pglast.ast — Python classes representing PG parser nodes.

Parse an `SQL` statement and get its AST root node¶

The function parse_sql() returns a tuple containing one or more RawStmt instances:

>>> from pglast import parse_sql
>>> root = parse_sql('select 1')
>>> print(root)
(<RawStmt stmt=<SelectStmt targetList=(<ResTarget val=<A_Const val=<Integer val=1>>>,) ...,)

The textual representation of a parse node carries all its not None attributes, recursively.

You can obtain the accepted attributes of any node by iterating it:

>>> rawstmt = root[0]
>>> print(tuple(rawstmt))
('stmt', 'stmt_location', 'stmt_len')

>>> from pglast import ast
>>> stmt = rawstmt.stmt
>>> assert isinstance(stmt, ast.SelectStmt)

Each node is also a callable, to serialize it into a hierarchy of elementary Python values such as dictionaries and tuples:

>>> from pprint import pprint
>>> pprint(stmt(depth=2, skip_none=True))
{'@': 'SelectStmt',
 'all': False,
 'limitOption': {'#': 'LimitOption',
                 'name': 'LIMIT_OPTION_DEFAULT',
                 'value': 0},
 'op': {'#': 'SetOperation', 'name': 'SETOP_NONE', 'value': 0},
 'targetList': ({'@': 'ResTarget', 'location': 7, 'val': …},)}

As you can see, each node is serialized to a dictionary containing at least on special key, @, with the tag name of the node; lists of nodes are converted to tuples, and Enum instances to a dictionary with a special # key carrying the name of data type, and two other keys name and value respectively with the name and value of the enum value.

Nodes can be compared to each other, and are considered equal when all their attributes match, ignoring those semantically irrelevant:

>>> other_stmt = parse_sql('select /* something here */ 1')[0].stmt
>>> print(other_stmt(depth=2, skip_none=True))
{'@': 'SelectStmt', 'targetList': ({'@': 'ResTarget', 'val': …, 'location': 28},), ...}
>>> stmt == other_stmt
True

Altering a node¶

Any attribute of a node is alterable, and some check is done on the assigned value:

>>> print(stmt.all)
False
>>> stmt.all = True
>>> print(stmt.all)
True

>>> stmt.all = "foo"
Traceback (most recent call last):
  ...
ValueError: Bad value for attribute SelectStmt.all, expected (<class 'bool'>, <class 'int'>), got <class 'str'>: 'foo'

Enum attributes can be set to either a plain string, which is looked up in the related class, or to a dictionary:

>>> stmt.limitOption = 'LIMIT_OPTION_COUNT'
>>> pprint(stmt(depth=1, skip_none=True))
{'@': 'SelectStmt',
 'all': True,
 'limitOption': {'#': 'LimitOption', 'name': 'LIMIT_OPTION_COUNT', 'value': 1},
 'op': {'#': 'SetOperation', 'name': 'SETOP_NONE', 'value': 0},
 'targetList': (…,)}

>>> stmt.limitOption = {'#': 'LimitOption', 'name': 'LIMIT_OPTION_WITH_TIES'}
>>> pprint(stmt(depth=1, skip_none=True))
{'@': 'SelectStmt',
 'all': True,
 'limitOption': {'#': 'LimitOption',
                 'name': 'LIMIT_OPTION_WITH_TIES',
                 'value': 2},
 'op': {'#': 'SetOperation', 'name': 'SETOP_NONE', 'value': 0},
 'targetList': (…,)}

Either way, assigning the wrong value raises an exception:

>>> stmt.limitOption = 'foo'
Traceback (most recent call last):
  ...
ValueError: Bad value for attribute SelectStmt.limitOption, (<class 'int'>, <class 'str'>, <class 'dict'>, <enum 'LimitOption'>), got 'foo'
>>> stmt.limitOption = {'#': 'JoinType', 'name': 'JOIN_INNER'}
Traceback (most recent call last):
  ...
ValueError: Bad value for attribute SelectStmt.limitOption, expected a (<class 'int'>, <class 'str'>, <class 'dict'>, <enum 'LimitOption'>), got {'#': 'JoinType', 'name': 'JOIN_INNER'}

Creating a node¶

You can easily create a new node in the usual way, possibly passing any recognized attribute as a parameter to the constructor:

>>> print(ast.SelectStmt())
<SelectStmt>
>>> print(ast.SelectStmt(all=1))
<SelectStmt all=True>
>>> ast.SelectStmt(non_existing_attribute=None)
Traceback (most recent call last):
  ...
TypeError: __init__() got an unexpected keyword argument 'non_existing_attribute'
>>> ast.SelectStmt(all="foo")
Traceback (most recent call last):
  ...
ValueError: Bad value for attribute SelectStmt.all, expected (<class 'bool'>, <class 'int'>), got <class 'str'>: 'foo'

Alternatively, you can pass a single dictionary as argument, with the special @ key valued with the correct node name:

>>> print(ast.SelectStmt({'@': 'SelectStmt', 'all': True}))
<SelectStmt all=True>
>>> print(ast.SelectStmt({'@': 'RawStmt', 'all': True}))
Traceback (most recent call last):
  ...
ValueError: Bad argument, wrong "@" value, expected 'SelectStmt', got 'RawStmt'

This basically means that you can reconstruct a syntax tree from the result of calling a node:

>>> clone = ast.SelectStmt(stmt())
>>> clone is stmt
False
>>> clone == stmt
True

Medium level¶

Parse an `SQL` statement and get its AST root node¶

>>> from pglast import Node
>>> root = Node(parse_sql('SELECT foo FROM bar'))
>>> print(root)
None=[1*{RawStmt}]

Get a particular node¶

>>> from_clause = root[0].stmt.fromClause
>>> print(from_clause)
fromClause=[1*{RangeVar}]

Obtain some information about a node¶

>>> range_var = from_clause[0]
>>> print(range_var.node_tag)
RangeVar
>>> print(range_var.attribute_names)
('catalogname', 'schemaname', 'relname', 'inh', 'relpersistence', 'alias', 'location')
>>> print(range_var.parent_node)
stmt={SelectStmt}

Iterate over nodes¶

>>> for a in from_clause:
...     print(a)
...     for b in a:
...         print(b)
...
fromClause[0]={RangeVar}
inh=<True>
location=<16>
relname=<'bar'>
relpersistence=<'p'>

Recursively `traverse` the parse tree¶

>>> for node in root.traverse():
...   print(node)
...
None[0]={RawStmt}
stmt={SelectStmt}
all=<False>
fromClause[0]={RangeVar}
inh=<True>
location=<16>
relname=<'bar'>
relpersistence=<'p'>
limitOption=<LimitOption.LIMIT_OPTION_DEFAULT: 0>
op=<SetOperation.SETOP_NONE: 0>
targetList[0]={ResTarget}
location=<7>
val={ColumnRef}
fields[0]={String}
val=<'foo'>
location=<7>
stmt_len=<0>
stmt_location=<0>

As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind* and <value> is a Scalar.

Programmatically `reformat` a `SQL` statement¶

The easy way¶

The prettify() takes a textual SQL statement and returns its equivalent once reprinted with a focus on readability.

>>> from pglast import prettify
>>> print(prettify('delete from sometable where value is null'))
DELETE FROM sometable
WHERE value IS NULL

>>> print(prettify('select a,b,c from sometable where value is null'))
SELECT a
     , b
     , c
FROM sometable
WHERE value IS NULL

>>> print(prettify('select a,b,c from sometable'
...                ' where value is null or value = 1',
...                comma_at_eoln=True))
SELECT a,
       b,
       c
FROM sometable
WHERE ((value IS NULL)
   OR (value = 1))

Under the cover¶

The function above is a simple wrapper to the IndentedStream class, that extends pglast.stream.RawStream adding a bit a aesthetic sense.

>>> from pglast.stream import IndentedStream, RawStream
>>> print(IndentedStream(comma_at_eoln=True)('select a,b,c from sometable'))
SELECT a,
       b,
       c
FROM sometable

>>> print(IndentedStream()(root))
SELECT foo
FROM bar

>>> sql = 'select a.x, b.y from a join b on a.bid = b.id'
>>> astnode = parse_sql(sql)[0].stmt
>>> astnode
<SelectStmt targetList=(<ResTarget val=<ColumnRef fields=(<String val='a'>, <String val='x'>)>>...
>>> print(RawStream()(astnode.fromClause))
a INNER JOIN b ON a.bid = b.id

`Visit` or modify the AST tree¶

>>> from collections import Counter
>>> from pglast.visitors import Visitor
>>>
>>> class Stats(Visitor):
...     def __call__(self, node):
...         self.counters = Counter()
...         super().__call__(node)
...         return self.counters
...
...     def visit(self, ancestors, node):
...         self.counters.update((node.__class__.__name__,))
...
>>> stats = Stats()
>>> print(stats(parse_sql('select 1')))
Counter({'RawStmt': 1, 'SelectStmt': 1, 'ResTarget': 1, 'A_Const': 1, 'Integer': 1})

>>> class NoisyVisitor(Visitor):
...     def visit(self, ancestors, node):
...         print(ancestors, ':', node(depth=0, skip_none=True))
...
>>> visitor = NoisyVisitor()
>>> visitor(parse_sql('select a, b from c'))
ROOT → 0 : {'@': 'RawStmt', 'stmt': …, 'stmt_location': 0, 'stmt_len': 0}
ROOT → 0 → stmt : {'@': 'SelectStmt', 'targetList': …, 'fromClause': …, ...
ROOT → 0 → stmt → targetList → 0 : {'@': 'ResTarget', 'val': …, 'location': 7}
ROOT → 0 → stmt → targetList → 1 : {'@': 'ResTarget', 'val': …, 'location': 10}
ROOT → 0 → stmt → fromClause → 0 : {'@': 'RangeVar', 'relname': 'c', 'inh': True, ...
ROOT → 0 → stmt → targetList → 0 → val : {'@': 'ColumnRef', 'fields': …, 'location': 7}
ROOT → 0 → stmt → targetList → 1 → val : {'@': 'ColumnRef', 'fields': …, 'location': 10}
ROOT → 0 → stmt → targetList → 0 → val → fields → 0 : {'@': 'String', 'val': 'a'}
ROOT → 0 → stmt → targetList → 1 → val → fields → 0 : {'@': 'String', 'val': 'b'}
(<RawStmt stmt=<SelectStmt ...

>>> from pglast import enums
>>> from pglast.visitors import Delete
>>>
>>> class DropNullConstraint(Visitor):
...     def visit_Constraint(self, ancestors, node):
...         if node.contype == enums.ConstrType.CONSTR_NULL:
...             return Delete
...
>>> raw = parse_sql('create table foo (a integer null, b integer not null)')
>>> DropNullConstraint()(raw)
(<RawStmt stmt=<CreateStmt ...
>>> print(RawStream()(raw))
CREATE TABLE foo (a integer, b integer NOT NULL)

Customize a `node printer`¶

>>> sql = 'update translations set italian=$2 where word=$1'
>>> print(prettify(sql))
UPDATE translations
SET italian = $2
WHERE word = $1
>>> from pglast.printers import node_printer
>>> @node_printer('ParamRef', override=True)
... def replace_param_ref(node, output):
...     output.write(repr(args[node.number.value - 1]))
...
>>> args = ['Hello', 'Ciao']
>>> print(prettify(sql, safety_belt=False))
UPDATE translations
SET italian = 'Ciao'
WHERE word = 'Hello'

`Iterate` over each statement¶

By default, the split() function uses the parser to do its job:

>>> from pglast import split
>>> for statement in split('select 1; select 2'):
...     print(statement)
...
select 1
select 2

and thus it raises an error if the statement contains errors:

>>> split('select 1 from; select 2')
Traceback (most recent call last):
  ...
pglast.parser.ParseError: syntax error at or near ";", at location 14

In this case, you can use a variant that uses the lexical scanner instead:

>>> for statement in split('select 1 from; select 2', with_parser=False):
...     print(statement)
...
select 1 from
select 2

Command line¶

Reformat a `SQL` statement¶

$ echo "select a,b,c from sometable" | pgpp
SELECT a
     , b
     , c
FROM sometable

$ pgpp -S "select a, case when a=1 then 'singular' else 'plural' end from test"
SELECT a
     , CASE
         WHEN (a = 1)
           THEN 'singular'
         ELSE 'plural'
       END
FROM test

$ echo 'update "table" set value=123 where value is null' | pgpp
UPDATE "table"
SET value = 123
WHERE value IS NULL

$ echo "
insert into t (id, description)
values (1, 'this is short enough'),
       (2, 'this is too long, and will be splitted')" | pgpp -s 20
INSERT INTO t (id, description)
VALUES (1, 'this is short enough')
     , (2, 'this is too long, an'
           'd will be splitted')

Get a more compact representation¶

$ pgpp --compact 30 -S "select a,b,c from st where a='longvalue1' and b='longvalue2'"
SELECT a, b, c
FROM st
WHERE (a = 'longvalue1')
  AND (b = 'longvalue2')

$ pgpp --compact 60 -S "select a,b,c from st where a='longvalue1' and b='longvalue2'"
SELECT a, b, c
FROM st
WHERE (a = 'longvalue1') AND (b = 'longvalue2')

Obtain the parse tree of a `SQL` statement¶

$ pgpp --parse-tree --statement "select 1"
[{'@': 'RawStmt',
  'stmt': {'@': 'SelectStmt',
           'all': False,
           'limitOption': <LimitOption.LIMIT_OPTION_DEFAULT: 0>,
           'op': <SetOperation.SETOP_NONE: 0>,
           'targetList': ({'@': 'ResTarget',
                           'location': 7,
                           'val': {'@': 'A_Const',
                                   'location': 7,
                                   'val': {'@': 'Integer', 'val': 1}}},)},
  'stmt_len': 0,
  'stmt_location': 0}]

Preserve comments¶

$ pgpp --preserve-comments -S "/* Header */ select 1"
/* Header */ SELECT 1

$ echo -e "--what?\nselect foo\n--where?\nfrom bar" | pgpp -C
--what?
SELECT foo
FROM
   --where?
bar

$ echo -e "--what?\nselect foo\n/*where?*/from bar\n--end" | pgpp -C
--what?
SELECT foo
FROM
   /*where?*/ bar
--end

Note

Preserving comments is always hard and far from a perfect science: not all AST nodes carry their exact location, so it is not possible to differentiate between SELECT * /*comment*/ FROM foo and SELECT * FROM /*comment*/ foo.

—

*: This is an approximation, because in principle a list can contain different kinds of nodes, or even sub-lists in some cases: the List representation arbitrarily shows the tag of the first object.

Functions vs SQL syntax¶

$ pgpp -S "select extract(hour from t1.modtime) from t1"
SELECT pg_catalog.date_part('hour', t1.modtime)
FROM t1

$ pgpp --special-functions -S "select extract(hour from t1.modtime) from t1"
SELECT EXTRACT(HOUR FROM t1.modtime)
FROM t1

$ echo "
select substring('123',2,3),
       regexp_split_to_array('x,x,x', ','),
       btrim('xxx'), trim('xxx'),
       POSITION('hour' in trim(substring('xyz hour ',1,6)))
" | pgpp
SELECT pg_catalog.substring('123', 2, 3)
     , regexp_split_to_array('x,x,x', ',')
     , btrim('xxx')
     , pg_catalog.btrim('xxx')
     , pg_catalog.position(pg_catalog.btrim(pg_catalog.substring('xyz hour ', 1, 6))
                         , 'hour')

$ echo "
select substring('123',2,3),
       regexp_split_to_array('x,x,x', ','),
       btrim('xxx'), trim('xxx'),
       POSITION('hour' in trim(substring('xyz hour ',1,6)))
" | pgpp -f --remove-pg_catalog-from-functions
SELECT substring('123', 2, 3)
     , regexp_split_to_array('x,x,x', ',')
     , btrim('xxx')
     , btrim('xxx')
     , pg_catalog.position(btrim(substring('xyz hour ', 1, 6))
                         , 'hour')

Examples of usage¶

Low level¶

Parse an `SQL` statement and get its AST root node¶

Altering a node¶

Creating a node¶

Medium level¶

Parse an `SQL` statement and get its AST root node¶

Get a particular node¶

Obtain some information about a node¶

Iterate over nodes¶

Recursively `traverse` the parse tree¶

Programmatically `reformat` a `SQL` statement¶

The easy way¶

Under the cover¶

`Visit` or modify the AST tree¶

Customize a `node printer`¶

`Iterate` over each statement¶

Command line¶

Reformat a `SQL` statement¶

Get a more compact representation¶

Obtain the parse tree of a `SQL` statement¶

Preserve comments¶

Functions vs SQL syntax¶

pglast

Navigation

Related Topics

Examples of usage¶

Low level¶

Parse an SQL statement and get its AST root node¶

Altering a node¶

Creating a node¶

Medium level¶

Parse an SQL statement and get its AST root node¶

Get a particular node¶

Obtain some information about a node¶

Iterate over nodes¶

Recursively traverse the parse tree¶

Programmatically reformat a SQL statement¶

The easy way¶

Under the cover¶

Visit or modify the AST tree¶

Customize a node printer¶

Iterate over each statement¶

Command line¶

Reformat a SQL statement¶

Get a more compact representation¶

Obtain the parse tree of a SQL statement¶

Preserve comments¶

Functions vs SQL syntax¶

Parse an `SQL` statement and get its AST root node¶

Parse an `SQL` statement and get its AST root node¶

Recursively `traverse` the parse tree¶

Programmatically `reformat` a `SQL` statement¶

`Visit` or modify the AST tree¶

Customize a `node printer`¶

`Iterate` over each statement¶

Reformat a `SQL` statement¶

Obtain the parse tree of a `SQL` statement¶