antlr4 python 3 print or dump tokens from plsql grammar

Question

I am using antlr4 in Python to read the following grammar:

https://github.com/antlr/grammars-v4/tree/master/plsql

The file grants.sql contains just "begin select 'bob' from dual; end;"

Simple code to print a Lisp-like tree:

from antlr4 import *
from PlSqlLexer import PlSqlLexer
from PlSqlParser import PlSqlParser
from PlSqlParserListener import PlSqlParserListener

# Named input_stream so the built-in input() is not shadowed.
input_stream = FileStream('grants.sql')
lexer = PlSqlLexer(input_stream)

stream = CommonTokenStream(lexer)
parser = PlSqlParser(stream)
tree = parser.sql_script()

print("Tree " + tree.toStringTree(recog=parser))

The output looks like this:

Tree (sql_script (unit_statement (anonymous_block BEGIN (seq_of_statements (statement (sql_statement (data_manipulation_language_statements (select_statement (subquery (subquery_basic_elements (query_block SELECT (selected_element (select_list_elements (expressions (expression (logical_expression (multiset_expression (relational_expression (compound_expression (concatenation (model_expression (unary_expression (atom (constant (quoted_string 'bob')))))))))))))) (from_clause FROM (table_ref_list (table_ref (table_ref_aux (table_ref_aux_internal (dml_table_expression_clause (tableview_name (identifier (id_expression (regular_id DUAL))))))))))))))))) ;) END ;)) )

I'd like Python code that lists all the rules and tokens above, not as a single Lisp-like expression but one per line, with the nesting depth shown, i.e.:

  .sql_script
  ..unit_statement
  ...anonymous_block
  ....BEGIN

and so on.

Can someone supply Python code that does this, or give me some hints? It would be gratefully appreciated.

Recommended answer

Here is a start:

from antlr4 import *
from antlr4.tree.Tree import TerminalNodeImpl
from PlSqlLexer import PlSqlLexer
from PlSqlParser import PlSqlParser

# Generate the lexer and parser like this:
#
#   java -jar antlr-4.7.1-complete.jar -Dlanguage=Python3 *.g4
#
def main():
    lexer = PlSqlLexer(InputStream("SELECT * FROM TABLE_NAME"))
    parser = PlSqlParser(CommonTokenStream(lexer))
    tree = parser.sql_script()
    traverse(tree, parser.ruleNames)

def traverse(tree, rule_names, indent = 0):
    if tree.getText() == "<EOF>":
        return
    elif isinstance(tree, TerminalNodeImpl):
        print("{0}TOKEN='{1}'".format("  " * indent, tree.getText()))
    else:
        print("{0}{1}".format("  " * indent, rule_names[tree.getRuleIndex()]))
        for child in tree.children or []:  # children is None for a rule that matched nothing
            traverse(child, rule_names, indent + 1)

if __name__ == '__main__':
    main()

This prints:

sql_script
  unit_statement
    data_manipulation_language_statements
      select_statement
        subquery
          subquery_basic_elements
            query_block
              TOKEN='SELECT'
              TOKEN='*'
              from_clause
                TOKEN='FROM'
                table_ref_list
                  table_ref
                    table_ref_aux
                      table_ref_aux_internal
                        dml_table_expression_clause
                          tableview_name
                            identifier
                              id_expression
                                regular_id
                                  TOKEN='TABLE_NAME'
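
The dotted format asked for in the question can be produced with the same kind of walk. Below is a minimal sketch, not a definitive implementation. It uses an explicit stack instead of recursion, since PL/SQL parse trees nest deeply, and it returns the lines rather than printing them. It assumes that rule contexts expose getRuleIndex(), getChildCount() and getChild(i) and that terminal nodes do not have getRuleIndex(); the isinstance check against TerminalNodeImpl used above is the stricter alternative. FakeRule, FakeToken and dump_tree are hypothetical names introduced here so the sketch runs without the generated parser.

```python
class FakeToken:
    """Hypothetical stand-in for a terminal node (only getText is used)."""
    def __init__(self, text):
        self.text = text

    def getText(self):
        return self.text


class FakeRule:
    """Hypothetical stand-in for a rule context."""
    def __init__(self, index, *children):
        self.index = index
        self.children = list(children)

    def getRuleIndex(self):
        return self.index

    def getChildCount(self):
        return len(self.children)

    def getChild(self, i):
        return self.children[i]


def dump_tree(root, rule_names):
    """Return one line per node: dots for the depth, then rule name or token text."""
    lines = []
    stack = [(root, 1)]  # explicit stack, so deep trees cannot hit the recursion limit
    while stack:
        node, depth = stack.pop()
        if hasattr(node, "getRuleIndex"):  # assumption: only rule contexts have this method
            lines.append("." * depth + rule_names[node.getRuleIndex()])
            # Push children in reverse so that they are popped in source order.
            for i in reversed(range(node.getChildCount())):
                stack.append((node.getChild(i), depth + 1))
        elif node.getText() != "<EOF>":  # skip the end-of-file token, as traverse() does
            lines.append("." * depth + node.getText())
    return lines


names = ["sql_script", "unit_statement", "anonymous_block"]
demo = FakeRule(0,
                FakeRule(1, FakeRule(2, FakeToken("BEGIN"), FakeToken("END"))),
                FakeToken("<EOF>"))
print("\n".join(dump_tree(demo, names)))
```

With the real parser, the call would presumably be dump_tree(tree, parser.ruleNames), mirroring the traverse(tree, parser.ruleNames) call above.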

Note that for the lexer and parser to work properly, I added the following Python classes:

# PlSqlBaseLexer.py
from antlr4 import *

class PlSqlBaseLexer(Lexer):

    def IsNewlineAtPos(self, pos):
        # LA() returns an integer code point (or -1 at end of input), not a string.
        la = self._input.LA(pos)
        return la == -1 or la == ord('\n')

and:

# PlSqlBaseParser.py
from antlr4 import *

class PlSqlBaseParser(Parser):

    _isVersion10 = False
    _isVersion12 = True

    def isVersion10(self):
        return self._isVersion10

    def isVersion12(self):
        return self._isVersion12

    def setVersion10(self, value):
        self._isVersion10 = value

    def setVersion12(self, value):
        self._isVersion12 = value

I placed both files in the same folder as the generated Python classes. I also needed to add the import statement "from PlSqlBaseLexer import PlSqlBaseLexer" to the generated PlSqlLexer.py, and to change the import statement in PlSqlParser.py from "from ./PlSqlBaseParser import PlSqlBaseParser" to "from PlSqlBaseParser import PlSqlBaseParser".
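
If the classes are regenerated often, these two manual fixes could be scripted. The following is only a sketch under the assumption that the generated files contain exactly the import lines quoted above; patch_lexer_source, patch_parser_source and patch_file are hypothetical helper names, not part of ANTLR.

```python
from pathlib import Path

LEXER_IMPORT = "from PlSqlBaseLexer import PlSqlBaseLexer"
BAD_PARSER_IMPORT = "from ./PlSqlBaseParser import PlSqlBaseParser"
GOOD_PARSER_IMPORT = "from PlSqlBaseParser import PlSqlBaseParser"


def patch_lexer_source(text):
    """Prepend the base-lexer import unless it is already present."""
    if LEXER_IMPORT in text:
        return text
    return LEXER_IMPORT + "\n" + text


def patch_parser_source(text):
    """Rewrite the invalid './' import into a plain module import."""
    return text.replace(BAD_PARSER_IMPORT, GOOD_PARSER_IMPORT)


def patch_file(path, patch):
    """Apply a text-patching function to a file in place."""
    path = Path(path)
    path.write_text(patch(path.read_text(encoding="utf-8")), encoding="utf-8")


# Intended use, after generating the classes (file names as in the answer):
#   patch_file("PlSqlLexer.py", patch_lexer_source)
#   patch_file("PlSqlParser.py", patch_parser_source)
```

Both functions are idempotent, so running the script twice leaves the files unchanged.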

Note that running the demo is rather slow. Unless you have a hard requirement to do this in Python, I recommend going with the (much!) faster Java or C# target instead.
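
If the parsing is done elsewhere and only the Lisp-style string from toStringTree is available, that string can be turned into the dotted listing without parsing the SQL again. This is a rough sketch under stated assumptions: token text contains no spaces or parentheses, the input is the bare tree string (without the "Tree " prefix printed in the question), and every rule node has at least one child, since a childless rule would be indistinguishable from a token here. lisp_tree_to_lines is a hypothetical name.

```python
def lisp_tree_to_lines(text):
    """Convert a '(rule child child ...)' string into dot-prefixed lines."""
    lines = []
    depth = 0
    expect_rule = False  # True right after '(' because the next item is a rule name
    # Pad the parentheses so that split() yields them as separate items.
    padded = text.replace("(", " ( ").replace(")", " ) ")
    for item in padded.split():
        if item == "(":
            depth += 1
            expect_rule = True
        elif item == ")":
            depth -= 1
            expect_rule = False
        elif expect_rule:
            lines.append("." * depth + item)        # rule name at its own depth
            expect_rule = False
        else:
            lines.append("." * (depth + 1) + item)  # token, one level below its rule
    return lines


sample = "(tableview_name (identifier (id_expression (regular_id DUAL))))"
print("\n".join(lisp_tree_to_lines(sample)))
```

The sample is a fragment of the tree string shown in the question; quoted strings with embedded spaces would need a real tokenizer instead of split().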
