.. -*- coding: utf-8 -*-
.. :Project:   pglast -- Parser module
.. :Created:   gio 10 ago 2017 10:19:26 CEST
.. :Author:    Lele Gaifax
.. :License:   GNU General Public License version 3 or later
.. :Copyright: © 2017, 2018, 2021, 2023 Lele Gaifax
..

===========================================================
 :mod:`pglast.parser` --- The interface with libpg_query
===========================================================

.. module:: pglast.parser
   :synopsis: The interface with libpg_query

This module is a C extension written in Cython__ that exposes a few functions from the
underlying ``libpg_query`` library it links against.

.. data:: LONG_MAX

   The highest integer that can be stored in a C ``long`` variable: it is used as a marker,
   for example in PostgreSQL's ``FetchStmt.howMany``, where it stands for the constant
   ``FETCH_ALL``.

.. exception:: ParseError

   Exception representing the error state returned by the parser.

.. exception:: DeparseError

   Exception representing the error state returned by the deparser.

.. class:: Displacements(string)

   Helper class used to find the index of a Unicode character from its offset in the
   corresponding UTF-8 encoded array.

   Example:

   .. doctest::

      >>> from pglast.parser import Displacements
      >>> unicode = '€ 0.01'
      >>> utf8 = unicode.encode('utf-8')
      >>> d = Displacements(unicode)
      >>> for offset in range(len(utf8)):
      ...   idx = d(offset)
      ...   print(f'{offset} [{utf8[offset]:2x}] -> {idx} [{unicode[idx]}]')
      ...
      0 [e2] -> 0 [€]
      1 [82] -> 0 [€]
      2 [ac] -> 0 [€]
      3 [20] -> 1 [ ]
      4 [30] -> 2 [0]
      5 [2e] -> 3 [.]
      6 [30] -> 4 [0]
      7 [31] -> 5 [1]

   The underlying ``libpg_query`` library operates on ``UTF-8`` strings: its parser functions
   emit tokens with a ``location`` that is actually the byte offset within the ``UTF-8``
   representation of the statement, not the index of the character. With this class you can
   fix up those offsets, as in the following example:

   .. doctest::

      >>> import json
      >>> from pglast.parser import parse_sql_json
      >>> stmt = 'select alias.bar as alìbàbà from foo as alias'
      >>> parsed = json.loads(parse_sql_json(stmt))
      >>> select = parsed['stmts'][0]['stmt']['SelectStmt']
      >>> rangevar = select['fromClause'][0]['RangeVar']
      >>> loc = rangevar['location']
      >>> print(stmt[loc:loc+3])
       as
      >>> d = Displacements(stmt)
      >>> adjloc = d(loc)
      >>> print(stmt[adjloc:adjloc+3])
      foo

.. function:: deparse_protobuf(buffer)

   :param bytes buffer: a ``Protobuf`` buffer
   :returns: str

   Return the ``SQL`` statement corresponding to the given `buffer`, typically one produced
   by :func:`parse_sql_protobuf()`.

.. function:: fingerprint(query)

   :param str query: The SQL statement
   :returns: str

   Fingerprint the given `query`, a string with the ``SQL`` statement(s), and return a hash
   digest that can identify similar queries. Queries that differ only in the queried object
   or in formatting yield the same digest.

.. function:: get_postgresql_version()

   :returns: a tuple

   Return the PostgreSQL version as a tuple (`major`, `minor`, `patch`).

.. function:: parse_sql(query)

   :param str query: The SQL statement
   :returns: tuple

   Parse the given `query`, a string with the ``SQL`` statement(s), and return the
   corresponding *parse tree* as a tuple of :class:`pglast.ast.RawStmt` instances.

.. function:: parse_sql_json(query)

   :param str query: The SQL statement
   :returns: str

   Parse the given `query`, a string with the ``SQL`` statement(s), and return the
   ``libpg_query``\ 's ``JSON``\ -serialized parse tree.

.. function:: parse_sql_protobuf(query)

   :param str query: The SQL statement
   :returns: bytes

   Parse the given `query`, a string with the ``SQL`` statement(s), and return the
   ``libpg_query``\ 's ``Protobuf``\ -serialized parse tree.

.. function:: parse_plpgsql_json(query)

   :param str query: The PLpgSQL statement
   :returns: str

   Parse the given `query`, a string with the ``plpgsql`` statement(s), and return the
   ``libpg_query``\ 's ``JSON``\ -serialized parse tree.

.. function:: scan(query)

   :param str query: The SQL statement
   :returns: sequence of tuples

   Split the given `query` into its *tokens*. Each token is a `namedtuple` with the
   following slots:

   start : int
     the index of the start of the token

   end : int
     the index of the end of the token (inclusive)

   name : str
     the name of the token

   kind : str
     the kind of the token

   Example:

   .. doctest::

      >>> from pglast.parser import scan
      >>> stmt = 'select bar as alìbàbà from foo'
      >>> tokens = scan(stmt)
      >>> print(tokens[0])
      Token(start=0, end=5, name='SELECT', kind='RESERVED_KEYWORD')
      >>> print([stmt[t.start:t.end+1] for t in tokens])
      ['select', 'bar', 'as', 'alìbàbà', 'from', 'foo']

.. function:: split(query, with_parser=True, only_slices=False)

   :param str query: The SQL statement
   :param bool with_parser: Whether to use the parser or the scanner
   :param bool only_slices: Return slices instead of the statements' text
   :returns: tuple

   Split the given string into a sequence of single ``SQL`` statements.

   By default this uses the *parser* to perform the job; when `with_parser` is ``False``
   the *scanner* variant is used, which is the appropriate choice when the statements may
   contain parse errors.

   When `only_slices` is ``True``, return a sequence of :class:`slice` instances, one for
   each statement, instead of the statements' text.

   .. note:: Leading and trailing whitespace is removed from the statements.

   Example:

   .. doctest::

      >>> from pglast.parser import split
      >>> split('select 1 for; select 2')
      Traceback (most recent call last):
        ...
      pglast.parser.ParseError: syntax error at or near ";", at index 12
      >>> split('select 1 for; select 2', with_parser=False)
      ('select 1 for', 'select 2')
      >>> stmts = "select 'fòò'; select 'bàr'"
      >>> print([stmts[r] for r in split(stmts, only_slices=True)])
      ["select 'fòò'", "select 'bàr'"]

__ http://cython.org/
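
The offset fixup that :class:`Displacements` performs can be illustrated with a small
pure-Python sketch. It only shows the idea of mapping a ``UTF-8`` byte offset back to the index
of the character that contains it. The helper names ``utf8_char_starts`` and ``char_index`` are
hypothetical and are not part of pglast, and the actual Cython implementation may well work
differently.

.. code-block:: python

   from bisect import bisect_right


   def utf8_char_starts(text):
       """Return, for each character of `text`, the byte offset where it
       starts in the UTF-8 encoding of `text` (hypothetical helper)."""
       starts = []
       offset = 0
       for char in text:
           starts.append(offset)
           offset += len(char.encode('utf-8'))
       return starts


   def char_index(starts, byte_offset):
       """Map a UTF-8 byte offset to the index of the character containing
       it, using a binary search over the start offsets (hypothetical helper)."""
       return bisect_right(starts, byte_offset) - 1


   # Same sample string as in the Displacements example: '€' takes three bytes.
   euro_starts = utf8_char_starts('€ 0.01')
   euro_indexes = [char_index(euro_starts, o) for o in range(8)]
   # euro_indexes == [0, 0, 0, 1, 2, 3, 4, 5]

   # Same statement as in the parse_sql_json example: byte offset 36 is where
   # 'foo' starts in the UTF-8 encoding, character index 33 in the str.
   stmt = 'select alias.bar as alìbàbà from foo as alias'
   foo_index = char_index(utf8_char_starts(stmt), 36)
   # stmt[foo_index:foo_index+3] == 'foo'

Precomputing the start offsets once and then bisecting keeps each lookup cheap, which matters
when many token locations from the same statement need to be adjusted.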