pglast.parser — The interface with libpg_query

This module is a C extension written in Cython that exposes a few functions from the underlying libpg_query library it links against.

pglast.parser.LONG_MAX

The highest integer that can be stored in a C long variable. It serves as a marker: for example, PG’s FetchStmt.howMany uses it through the FETCH_ALL constant.

exception pglast.parser.ParseError

Exception representing the error state returned by the parser.

exception pglast.parser.DeparseError

Exception representing the error state returned by the deparser.

class pglast.parser.Displacements(string)

Helper class used to find the index of a Unicode character from its offset in the corresponding UTF-8 encoded byte array.

Example:

>>> from pglast.parser import Displacements
>>> unicode = '€ 0.01'
>>> utf8 = unicode.encode('utf-8')
>>> d = Displacements(unicode)
>>> for offset in range(len(utf8)):
...   idx = d(offset)
...   print(f'{offset} [{utf8[offset]:2x}] -> {idx} [{unicode[idx]}]')
...
0 [e2] -> 0 [€]
1 [82] -> 0 [€]
2 [ac] -> 0 [€]
3 [20] -> 1 [ ]
4 [30] -> 2 [0]
5 [2e] -> 3 [.]
6 [30] -> 4 [0]
7 [31] -> 5 [1]

The underlying libpg_query library operates on UTF-8 strings: its parser functions emit tokens with a location that is actually the byte offset within the UTF-8 representation of the statement. With this class you can fix up those offsets, as in the following example:

>>> import json
>>> from pglast.parser import parse_sql_json
>>> stmt = 'select alias.bar as alìbàbà from foo as alias'
>>> parsed = json.loads(parse_sql_json(stmt))
>>> select = parsed['stmts'][0]['stmt']['SelectStmt']
>>> rangevar = select['fromClause'][0]['RangeVar']
>>> loc = rangevar['location']
>>> print(stmt[loc:loc+3])
 as
>>> d = Displacements(stmt)
>>> adjloc = d(loc)
>>> print(stmt[adjloc:adjloc+3])
foo
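
The mapping performed by Displacements can be illustrated in pure Python. The following is only a sketch of the idea, not the actual Cython implementation; the helper names build_char_starts and byte_offset_to_index are hypothetical and do not exist in pglast:

```python
from bisect import bisect_right


def build_char_starts(text):
    """Return the UTF-8 byte offset at which each character of text starts."""
    starts = []
    offset = 0
    for char in text:
        starts.append(offset)
        offset += len(char.encode('utf-8'))
    return starts


def byte_offset_to_index(starts, offset):
    """Map a UTF-8 byte offset to the index of the character that contains it."""
    # The wanted character is the last one starting at or before the offset
    return bisect_right(starts, offset) - 1


text = '€ 0.01'
starts = build_char_starts(text)
indexes = [byte_offset_to_index(starts, offset)
           for offset in range(len(text.encode('utf-8')))]
print(indexes)  # [0, 0, 0, 1, 2, 3, 4, 5]
```

The three bytes of the euro sign all map to index 0, matching the table printed by the first example above.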

pglast.parser.deparse_protobuf(buffer)
Parameters:

buffer (bytes) – a Protobuf buffer

Returns:

str

Return the SQL statement deparsed from the given buffer argument, typically one generated by parse_sql_protobuf().

pglast.parser.fingerprint(query)
Parameters:

query (str) – The SQL statement

Returns:

str

Fingerprint the given query, a string with the SQL statement(s), and return a hash digest that identifies similar queries. Queries that differ only in the queried object or in formatting produce the same digest.

pglast.parser.get_postgresql_version()
Returns:

tuple

Return the PostgreSQL version as a tuple (major, minor, patch).

pglast.parser.parse_sql(query)
Parameters:

query (str) – The SQL statement

Returns:

tuple

Parse the given query, a string with the SQL statement(s), and return the corresponding parse tree as a tuple of pglast.ast.RawStmt instances.

pglast.parser.parse_sql_json(query)
Parameters:

query (str) – The SQL statement

Returns:

str

Parse the given query, a string with the SQL statement(s), and return libpg_query’s JSON-serialized parse tree.

pglast.parser.parse_sql_protobuf(query)
Parameters:

query (str) – The SQL statement

Returns:

bytes

Parse the given query, a string with the SQL statement(s), and return libpg_query’s Protobuf-serialized parse tree.

pglast.parser.parse_plpgsql_json(query)
Parameters:

query (str) – The PL/pgSQL statement

Returns:

str

Parse the given query, a string with the PL/pgSQL statement(s), and return libpg_query’s JSON-serialized parse tree.

pglast.parser.scan(query)
Parameters:

query (str) – The SQL statement

Returns:

sequence of tuples

Split the given query into its tokens. Each token is a namedtuple with the following slots:

start (int)

the index of the first character of the token

end (int)

the index of the last character of the token (inclusive)

name (str)

the name of the token

kind (str)

the kind of the token

Example:

>>> from pglast.parser import scan
>>> stmt = 'select bar as alìbàbà from foo'
>>> tokens = scan(stmt)
>>> print(tokens[0])
Token(start=0, end=5, name='SELECT', kind='RESERVED_KEYWORD')
>>> print([stmt[t.start:t.end+1] for t in tokens])
['select', 'bar', 'as', 'alìbàbà', 'from', 'foo']

pglast.parser.split(query, with_parser=True, only_slices=False)
Parameters:
  • query (str) – The SQL statement

  • with_parser (bool) – Whether to use the parser or the scanner

  • only_slices (bool) – Return slices instead of statement’s text

Returns:

tuple

Split the given query string into a sequence of its individual SQL statements.

By default the parser performs the job. When with_parser is False the scanner variant is used instead, which is preferable when the statements may contain parse errors.

When only_slices is True, return a sequence of slice instances, one for each statement, instead of the statements’ text.

Note

Leading and trailing whitespace is removed from the statements.

Example:

>>> from pglast.parser import split
>>> split('select 1 for; select 2')
Traceback (most recent call last):
  ...
pglast.parser.ParseError: syntax error at or near ";", at index 12
>>> split('select 1 for; select 2', with_parser=False)
('select 1 for', 'select 2')
>>> stmts = "select 'fòò'; select 'bàr'"
>>> print([stmts[r] for r in split(stmts, only_slices=True)])
["select 'fòò'", "select 'bàr'"]