pglast.parser - The interface with libpg_query
This module is a C extension written in Cython that exposes a few functions from the underlying libpg_query library it links against.
- pglast.parser.LONG_MAX
The highest integer that can be stored in a C long variable: it is used as a marker, for example in PG’s FetchStmt.howMany, which uses the constant FETCH_ALL.
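As a rough cross-check, the upper bound of a C long on the current platform can be derived with the standard ctypes module. This sketch does not import pglast; the helper name c_long_max is hypothetical, and it only assumes that LONG_MAX follows the platform's C long, as described above.

```python
import ctypes


def c_long_max() -> int:
    """Return the largest value a signed C long can hold on this platform."""
    # sizeof(long) is 4 bytes on some platforms and 8 on others.
    bits = 8 * ctypes.sizeof(ctypes.c_long)
    # One bit is reserved for the sign.
    return 2 ** (bits - 1) - 1


print(c_long_max())
```

The printed value depends on the platform, which is why code comparing against this marker should use the constant rather than a hard-coded number.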
- exception pglast.parser.ParseError
Exception representing the error state returned by the parser.
- exception pglast.parser.DeparseError
Exception representing the error state returned by the deparser.
- class pglast.parser.Displacements(string)
Helper class used to find the index of a Unicode character from its offset in the corresponding UTF-8 encoded array.
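The message of a ParseError can mention an index into the statement (the split() example in this module shows "at index 12"). A minimal sketch of how such an index might be shown to a user follows; point_at is a hypothetical helper, it takes the index as a plain argument because this section does not describe how to read it from the exception, and it assumes a single-line statement.

```python
def point_at(statement: str, index: int) -> str:
    """Return the statement followed by a line with a caret under `index`."""
    if not 0 <= index < len(statement):
        raise ValueError(f'index {index} is outside the statement')
    # Pad with spaces so the caret lines up with the offending character.
    return f'{statement}\n{" " * index}^'


print(point_at('select 1 for; select 2', 12))
```

Here the caret ends up under the semicolon, the character at index 12.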
Example:
>>> from pglast.parser import Displacements
>>> unicode = '€ 0.01'
>>> utf8 = unicode.encode('utf-8')
>>> d = Displacements(unicode)
>>> for offset in range(len(utf8)):
...     idx = d(offset)
...     print(f'{offset} [{utf8[offset]:2x}] -> {idx} [{unicode[idx]}]')
...
0 [e2] -> 0 [€]
1 [82] -> 0 [€]
2 [ac] -> 0 [€]
3 [20] -> 1 [ ]
4 [30] -> 2 [0]
5 [2e] -> 3 [.]
6 [30] -> 4 [0]
7 [31] -> 5 [1]
The underlying libpg_query library operates on UTF-8 strings: its parser functions emit tokens with a location that is actually the byte offset within the UTF-8 representation of the statement. With this class you can fix up those offsets, as in the following example:
>>> import json
>>> from pglast.parser import parse_sql_json
>>> stmt = 'select alias.bar as alìbàbà from foo as alias'
>>> parsed = json.loads(parse_sql_json(stmt))
>>> select = parsed['stmts'][0]['stmt']['SelectStmt']
>>> rangevar = select['fromClause'][0]['RangeVar']
>>> loc = rangevar['location']
>>> print(stmt[loc:loc+3])
 as
>>> d = Displacements(stmt)
>>> adjloc = d(loc)
>>> print(stmt[adjloc:adjloc+3])
foo
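To make the mapping concrete, here is a pure-Python sketch of the same idea. It is not the extension's implementation: the class name ByteToCharIndex is hypothetical, and it simply records the byte offset at which each character starts and then looks up the character containing a given offset.

```python
from bisect import bisect_right


class ByteToCharIndex:
    """Map offsets in the UTF-8 encoding of a string to character indexes."""

    def __init__(self, text: str):
        # starts[i] is the byte offset at which character i begins.
        self.starts = []
        offset = 0
        for char in text:
            self.starts.append(offset)
            offset += len(char.encode('utf-8'))

    def __call__(self, offset: int) -> int:
        # The character containing `offset` is the last one that starts
        # at or before it.
        return bisect_right(self.starts, offset) - 1


text = '€ 0.01'
to_index = ByteToCharIndex(text)
print([to_index(offset) for offset in range(len(text.encode('utf-8')))])
# prints [0, 0, 0, 1, 2, 3, 4, 5]
```

The three bytes of the euro sign all map to character 0, matching the output of the Displacements example above.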
- pglast.parser.deparse_protobuf(buffer)
- Parameters:
buffer (bytes) – a Protobuf buffer
- Returns:
str
Return the SQL statement corresponding to the given buffer argument, a Protobuf-serialized parse tree such as the one generated by parse_sql_protobuf().
- pglast.parser.fingerprint(query)
- Parameters:
query (str) – The SQL statement
- Returns:
str
Fingerprint the given query, a string with the SQL statement(s), and return a hash digest that can identify similar queries. Similar queries that differ only in the queried object or in formatting produce the same digest.
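One plausible use of such a digest is bucketing statements, for example lines from a query log. The sketch below is an illustration only: group_by_fingerprint is a hypothetical helper that receives the fingerprinting function as an argument, so it can be tried without pglast installed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_fingerprint(queries: Iterable[str],
                         fingerprint: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group SQL statements by the digest returned by `fingerprint`."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for query in queries:
        # Statements with the same digest land in the same bucket.
        groups[fingerprint(query)].append(query)
    return dict(groups)


# With pglast installed, the real function can be passed in:
#     from pglast.parser import fingerprint
#     groups = group_by_fingerprint(statements, fingerprint)
```

Passing the function in keeps the grouping logic independent of the parser, which also makes it easy to test with a stand-in.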
- pglast.parser.get_postgresql_version()
- Returns:
a tuple
Return the PostgreSQL version as a tuple (major, minor, patch).
- pglast.parser.parse_sql(query)
- Parameters:
query (str) – The SQL statement
- Returns:
tuple
Parse the given query, a string with the SQL statement(s), and return the corresponding parse tree as a tuple of pglast.ast.RawStmt instances.
- pglast.parser.parse_sql_json(query)
- Parameters:
query (str) – The SQL statement
- Returns:
str
Parse the given query, a string with the SQL statement(s), and return libpg_query’s JSON-serialized parse tree.
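Since the result is a plain JSON string, it can be decoded with the standard json module and walked generically. The sketch below collects every location value in a decoded tree; iter_locations is a hypothetical helper, and the sample document is hand-written (not real parser output), containing only the keys used in the Displacements example.

```python
import json
from typing import Any, Iterator, Tuple


def iter_locations(node: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, int]]:
    """Yield (path, value) for every 'location' key in a decoded JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == 'location' and isinstance(value, int):
                yield path + (key,), value
            else:
                yield from iter_locations(value, path + (key,))
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from iter_locations(item, path + (position,))


# Hand-written sample for illustration, NOT actual parse_sql_json() output.
sample = json.loads(
    '{"stmts": [{"stmt": {"SelectStmt": {"fromClause": '
    '[{"RangeVar": {"location": 36}}]}}}]}'
)
for path, location in iter_locations(sample):
    print(path, location)
```

The collected values are offsets into the UTF-8 encoding of the statement, so they would still need to go through Displacements before being used to slice the Python string.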
- pglast.parser.parse_sql_protobuf(query)
- Parameters:
query (str) – The SQL statement
- Returns:
bytes
Parse the given query, a string with the SQL statement(s), and return libpg_query’s Protobuf-serialized parse tree.
- pglast.parser.parse_plpgsql_json(query)
- Parameters:
query (str) – The PL/pgSQL statement
- Returns:
str
Parse the given query, a string with the PL/pgSQL statement(s), and return libpg_query’s JSON-serialized parse tree.
- pglast.parser.scan(query)
- Parameters:
query (str) – The SQL statement
- Returns:
sequence of tuples
Split the given query into its tokens. Each token is a namedtuple with the following slots:
- start (int) – the index of the start of the token
- end (int) – the index of the end of the token (inclusive)
- name (str) – the name of the token
- kind (str) – the kind of the token
Example:
>>> from pglast.parser import scan
>>> stmt = 'select bar as alìbàbà from foo'
>>> tokens = scan(stmt)
>>> print(tokens[0])
Token(start=0, end=5, name='SELECT', kind='RESERVED_KEYWORD')
>>> print([stmt[t.start:t.end+1] for t in tokens])
['select', 'bar', 'as', 'alìbàbà', 'from', 'foo']
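The example above slices with t.end+1, which suggests that end points at the last character of the token rather than one past it. A small sketch of a helper built on that observation follows; the Token namedtuple here is a hypothetical stand-in with the documented slots, and the only token used is the one printed in the example.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented slots of scan() tokens.
Token = namedtuple('Token', 'start end name kind')


def token_text(statement: str, token) -> str:
    """Return the source text of a token; `end` is inclusive, hence the +1."""
    return statement[token.start:token.end + 1]


stmt = 'select bar as alìbàbà from foo'
first = Token(start=0, end=5, name='SELECT', kind='RESERVED_KEYWORD')
print(token_text(stmt, first))  # prints select
```

Wrapping the slice in one place avoids the off-by-one mistake of writing statement[start:end].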
- pglast.parser.split(query, with_parser=True, only_slices=False)
- Parameters:
query (str) – The SQL statement
with_parser (bool) – Whether to use the parser or the scanner
only_slices (bool) – Return slices instead of the statements’ text
- Returns:
tuple
Split the given query, a string with one or more SQL statements, into a sequence of single statements.
By default this uses the parser to perform the job; when with_parser is False the scanner variant is used, which is the appropriate choice when the statements may contain parse errors.
When only_slices is True, return a sequence of slice instances, one for each statement, instead of the statements’ text.
Note
Leading and trailing whitespace is removed from the statements.
Example:
>>> from pglast.parser import split
>>> split('select 1 for; select 2')
Traceback (most recent call last):
  ...
pglast.parser.ParseError: syntax error at or near ";", at index 12
>>> split('select 1 for; select 2', with_parser=False)
('select 1 for', 'select 2')
>>> stmts = "select 'fòò'; select 'bàr'"
>>> print([stmts[r] for r in split(stmts, only_slices=True)])
["select 'fòò'", "select 'bàr'"]
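Slices keep the position of each statement within the original text, which the plain strings lose. As a sketch of why that can be useful, the hypothetical helper below pairs each statement with the line on which it starts; the slices are hand-built stand-ins for what split(source, only_slices=True) is assumed to return for this input.

```python
from typing import List, Sequence, Tuple


def statements_with_lines(source: str,
                          slices: Sequence[slice]) -> List[Tuple[int, str]]:
    """Pair each statement with the 1-based line number on which it starts."""
    result = []
    for piece in slices:
        # Count the newlines that precede the start of the statement.
        line = source.count('\n', 0, piece.start) + 1
        result.append((line, source[piece]))
    return result


source = "select 'fòò';\nselect 'bàr'"
# Hand-built slices, assumed to match split(source, only_slices=True).
slices = (slice(0, 12), slice(14, 26))
print(statements_with_lines(source, slices))
```

This kind of bookkeeping is handy when reporting problems in a multi-statement script, since each message can refer back to a line in the original file.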