Slow ANTLR4 generated Parser in Python, but fast in Java

Question

I am trying to convert an ANTLR3 grammar to an ANTLR4 grammar in order to use it with the antlr4-python2-runtime. The grammar is a C/C++ fuzzy parser.

After converting it (basically removing tree operators and semantic/syntactic predicates), I generated the Python2 files using:

java -jar antlr4.5-complete.jar -Dlanguage=Python2 CPPGrammar.g4

The code was generated without any errors, so I imported it into my Python project (I'm using PyCharm) to run some tests:

import sys, time
from antlr4 import FileStream, CommonTokenStream
from parser.CPPGrammarLexer import CPPGrammarLexer
from parser.CPPGrammarParser import CPPGrammarParser

# Wall-clock time in milliseconds.
currenttimemillis = lambda: int(round(time.time() * 1000))

def is_string(value):
    return isinstance(value, str)

def parsecommandstringline(argv):
    # Expect exactly one argument: the path of the file to parse.
    if len(argv) != 2:
        raise IndexError("Invalid args size.")
    if not is_string(argv[1]):
        raise TypeError("Argument must be str type.")
    return True

def doparsing(argv):
    parsecommandstringline(argv)  # raises on invalid arguments
    print("Arguments: OK - {0}".format(argv[1]))
    input_stream = FileStream(argv[1])
    lexer = CPPGrammarLexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = CPPGrammarParser(stream)
    print("*** Parser: START ***")
    start = currenttimemillis()
    tree = parser.code()  # only the parse itself is timed
    print("*** Parser: END *** - {0} ms.".format(currenttimemillis() - start))
    return tree

def main(argv):
    doparsing(argv)

if __name__ == '__main__':
    main(sys.argv)

The problem is that parsing is very slow. A file of ~200 lines takes more than 5 minutes to parse, while parsing the same file in antlrworks takes only 1-2 seconds. Analyzing the antlrworks tree, I noticed that the expr rule and all of its descendants are called very often, so I think I need to simplify or change these rules to make the parser faster.

Is my assumption correct or did I make some mistake while converting the grammar? What can be done to make parsing as fast as on antlrworks?

UPDATE: I exported the same grammar to Java and parsing took only 795 ms. The problem seems to be related to the Python implementation rather than to the grammar itself. Is there anything that can be done to speed up Python parsing?
I've read that Python can be 20-30 times slower than Java, but in my case Python is ~400 times slower!
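
The "~400 times" figure can be sanity-checked with quick arithmetic. The sketch below assumes that "more than 5 minutes" is taken as exactly 5 minutes, so the result is only a lower bound; the variable names are illustrative.

```python
# Rough check of the slowdown quoted in the question.
# Assumption: "more than 5 minutes" is treated as exactly 5 minutes (a lower bound).
python_ms = 5 * 60 * 1000   # Python 2 runtime, in milliseconds
java_ms = 795               # Java runtime, as reported in the update

slowdown = python_ms / float(java_ms)
print("Python is at least {0:.0f}x slower than Java".format(slowdown))
```

The ratio comes out a little under 400, which is consistent with the rounded number in the question and well beyond the 20-30x gap mentioned there.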

Recommended answer

I can confirm that the Python 2 and Python 3 runtimes have performance issues. With a few patches, I got a 10x speedup on the Python 3 runtime (~5 seconds down to ~400 ms). https://github.com/antlr/antlr4/pull/1010
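
The answer does not say how the slow spots in the runtime were found. One common way to look for them is to run the parse under Python's built-in cProfile module. The following is a minimal Python 3 sketch of that idea, not the method used in the pull request: `fake_parse` is a hypothetical stand-in for the real `parser.code()` call, and `profile_call` is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def fake_parse(n):
    """Hypothetical stand-in for parser.code(); it just burns CPU time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time and keep the 10 most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()


result, report = profile_call(fake_parse, 100000)
print(report)
```

With a real parser, replacing `fake_parse` with a function that calls `parser.code()` would list the runtime functions where most of the time is spent.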
