`pglast.parser` — The interface with libpg_query

This module is a C extension written in Cython that exposes a few functions from the underlying libpg_query library it links against.

pglast.parser.LONG_MAX: The highest integer that can be stored in a C long variable. It is used as a marker, for example in PG’s FetchStmt.howMany, which uses the constant FETCH_ALL.
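As a rough illustration of what "the highest integer that can be stored in a C long" means on the current platform, the following sketch derives the value with the standard ctypes module. It does not use pglast, and the names long_bits and long_max are illustrative only. The result is assumed, not verified here, to match LONG_MAX when pglast is built for the same platform.

```python
import ctypes

# Width of a C long on this platform, in bits (32 or 64 on common systems).
long_bits = 8 * ctypes.sizeof(ctypes.c_long)

# Largest value of a signed two's-complement integer of that width.
long_max = 2 ** (long_bits - 1) - 1

# Round-tripping through c_long leaves the value unchanged.
round_tripped = ctypes.c_long(long_max).value
```

The value depends on the platform: a C long is typically 64 bits wide on Linux and macOS, and 32 bits wide on Windows.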

exception pglast.parser.ParseError: Exception representing the error state returned by the parser.

exception pglast.parser.DeparseError: Exception representing the error state returned by the deparser.

class pglast.parser.Displacements(string)

Helper class used to find the index of a Unicode character from its byte offset in the corresponding UTF-8 encoded array.

Example:

>>> from pglast.parser import Displacements
>>> unicode = '€ 0.01'
>>> utf8 = unicode.encode('utf-8')
>>> d = Displacements(unicode)
>>> for offset in range(len(utf8)):
...   idx = d(offset)
...   print(f'{offset} [{utf8[offset]:2x}] -> {idx} [{unicode[idx]}]')
...
0 [e2] -> 0 [€]
1 [82] -> 0 [€]
2 [ac] -> 0 [€]
3 [20] -> 1 [ ]
4 [30] -> 2 [0]
5 [2e] -> 3 [.]
6 [30] -> 4 [0]
7 [31] -> 5 [1]

The underlying libpg_query library operates on UTF-8 strings: its parser functions emit tokens with a location that is actually the byte offset within the UTF-8 representation of the statement. With this class you can fix up those offsets, as in the following example:

>>> import json
>>> from pglast.parser import parse_sql_json
>>> stmt = 'select alias.bar as alìbàbà from foo as alias'
>>> parsed = json.loads(parse_sql_json(stmt))
>>> select = parsed['stmts'][0]['stmt']['SelectStmt']
>>> rangevar = select['fromClause'][0]['RangeVar']
>>> loc = rangevar['location']
>>> print(stmt[loc:loc+3])
 as
>>> d = Displacements(stmt)
>>> adjloc = d(loc)
>>> print(stmt[adjloc:adjloc+3])
foo
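To make the offset-to-index mapping concrete, here is a minimal pure-Python sketch of the same idea. It is not the actual implementation of Displacements, which lives in the Cython extension. The helper names build_char_starts and byte_offset_to_index are hypothetical.

```python
from bisect import bisect_right

def build_char_starts(text):
    """Return the UTF-8 byte offset at which each character of text starts."""
    starts = []
    offset = 0
    for ch in text:
        starts.append(offset)
        offset += len(ch.encode('utf-8'))
    return starts

def byte_offset_to_index(starts, offset):
    """Map a UTF-8 byte offset to the index of the character containing it."""
    # The wanted character is the last one starting at or before the offset.
    return bisect_right(starts, offset) - 1

text = '€ 0.01'
starts = build_char_starts(text)
indexes = [byte_offset_to_index(starts, o)
           for o in range(len(text.encode('utf-8')))]
```

For the string '€ 0.01' this yields the same mapping as the first example above: the three bytes of the euro sign all map to index 0, and every following byte maps to its own character.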

pglast.parser.deparse_protobuf(buffer)

Parameters: buffer (bytes) – a Protobuf buffer
Returns: str

Return the SQL statement deparsed from the given buffer, typically one produced by parse_sql_protobuf().

pglast.parser.fingerprint(query)

Parameters: query (str) – The SQL statement
Returns: str

Fingerprint the given query, a string with the SQL statement(s), and return a hash digest that can identify similar queries. For similar queries that are different only because of the queried object or formatting, the returned digest will be the same.

pglast.parser.get_postgresql_version()

Returns: a tuple

Return the PostgreSQL version as a tuple (major, minor, patch).

pglast.parser.parse_sql(query)

Parameters: query (str) – The SQL statement
Returns: tuple

Parse the given query, a string with the SQL statement(s), and return the corresponding parse tree as a tuple of pglast.ast.RawStmt instances.

pglast.parser.parse_sql_json(query)

Parameters: query (str) – The SQL statement
Returns: str

Parse the given query, a string with the SQL statement(s), and return libpg_query's JSON-serialized parse tree.

pglast.parser.parse_sql_protobuf(query)

Parameters: query (str) – The SQL statement
Returns: bytes

Parse the given query, a string with the SQL statement(s), and return libpg_query's Protobuf-serialized parse tree.

pglast.parser.parse_plpgsql_json(query)

Parameters: query (str) – The PL/pgSQL statement
Returns: str

Parse the given query, a string with the PL/pgSQL statement(s), and return libpg_query's JSON-serialized parse tree.

pglast.parser.scan(query)

Parameters: query (str) – The SQL statement
Returns: sequence of tuples

Split the given query into its tokens. Each token is a namedtuple with the following slots:

start (int): the index of the first character of the token
end (int): the index of the last character of the token (inclusive)
name (str): the name of the token
kind (str): the kind of the token

Example:

>>> from pglast.parser import scan
>>> stmt = 'select bar as alìbàbà from foo'
>>> tokens = scan(stmt)
>>> print(tokens[0])
Token(start=0, end=5, name='SELECT', kind='RESERVED_KEYWORD')
>>> print([stmt[t.start:t.end+1] for t in tokens])
['select', 'bar', 'as', 'alìbàbà', 'from', 'foo']
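Judging from the example above, end is the index of the token's last character rather than one past it, so a plain slice needs end + 1. The following sketch shows this using only the standard library. The Token here is a hand-built stand-in for what scan() returns, and token_text is a hypothetical helper, not part of pglast.

```python
from collections import namedtuple

# Stand-in for the tokens returned by scan(); slot names follow the list above.
Token = namedtuple('Token', 'start end name kind')

def token_text(stmt, token):
    """Return the source text covered by token (hypothetical helper)."""
    # end is inclusive, hence the + 1 when building the slice.
    return stmt[token.start:token.end + 1]

stmt = 'select bar as alìbàbà from foo'
first = Token(start=0, end=5, name='SELECT', kind='RESERVED_KEYWORD')

text = token_text(stmt, first)
length = first.end - first.start + 1
```

With the first token shown in the example, text is 'select' and length is 6.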

pglast.parser.split(query, with_parser=True, only_slices=False)

Parameters:

query (str) – The SQL statement
with_parser (bool) – Whether to use the parser or the scanner
only_slices (bool) – Return slices instead of statement’s text

Returns:

tuple

Split the given string, containing one or more SQL statements, into a sequence of the individual statements.

By default this uses the parser to perform the job; when with_parser is False the scanner variant is used instead, which is the better choice when the statements may contain parse errors.

When only_slices is True, return a sequence of slice instances, one for each statement, instead of the statements' text.

Note

Leading and trailing whitespace are removed from the statements.

Example:

>>> from pglast.parser import split
>>> split('select 1 for; select 2')
Traceback (most recent call last):
  ...
pglast.parser.ParseError: syntax error at or near ";", at index 12
>>> split('select 1 for; select 2', with_parser=False)
('select 1 for', 'select 2')
>>> stmts = "select 'fòò'; select 'bàr'"
>>> print([stmts[r] for r in split(stmts, only_slices=True)])
["select 'fòò'", "select 'bàr'"]
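One reason to prefer slices over plain text is that they keep each statement's position in the original string. The sketch below computes the line on which each statement starts. The slices are written by hand as a stand-in for the result of split(stmts, only_slices=True), so it runs without pglast. The variable names are illustrative only.

```python
stmts = "select 1;\nselect 2;\n\nselect 3"

# Hand-written stand-in for split(stmts, only_slices=True): one slice per
# statement, excluding the separating semicolons and surrounding whitespace.
slices = (slice(0, 8), slice(10, 18), slice(21, 29))

# Pair each statement with the 1-based line number where it begins, counting
# the newlines that precede the start of its slice.
located = [(stmts.count('\n', 0, s.start) + 1, stmts[s]) for s in slices]
```

Here located holds one (line, text) pair per statement, which could be used, for instance, to report errors against the original script.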
