Serialization of ANTLR ParseTree

Question

I have a generated grammar that does two things:

  • Checks the syntax of a domain-specific language
  • Evaluates input against that domain-specific language

These two functions are separate; let's call them validate() and evaluate().

The validate() function builds the tree from a String input while ensuring it meets the requirements of the BNF for the language. The evaluate() function plugs in values to that tree to get a result (usually true or false).

What the code is currently doing is running validate() each time on the input, just to generate the tree that evaluate() uses. Some of the inputs take up to 60 seconds to be checked. What I would LIKE to do is serialize the results of validate() (assuming it meets the syntax requirements), store the serialized form in the backend database, and just load it from the database as part of evaluate().

I noticed that I can execute the method toStringTree() on the parse tree, and retrieve a LISP style tree. However, can I restore a LISP style tree to an ANTLR parse tree? If not, can anyone recommend another way to serialize and store the generated parse tree?

Thanks for your help.

Jason

Recommended answer

ANTLR 4's ParserRuleContext data structure (the specific implementation of ParseTree used by generated parsers to represent grammar rules in the parse tree) is not serializable by default. Open issue #233 on the project issue tracker covers the feature request. However, based on my experience with many applications using ANTLR for parsing, I'm not convinced serializing the parse trees would be useful in the long run. For each problem serializing the parse tree is meant to address, a better solution already exists.

Another option is to store a hash of the last known valid file in the database. After you use the parser to create a parse tree, you could skip the validation step if the input file has the same hash as the last time it was validated. This leverages two aspects of ANTLR 4:

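  1. For the same input file, running the parser twice will produce the same parse tree.
  2. ANTLR 4 parsers are extremely fast in nearly all cases (for example, the Java grammar can process roughly 20 MB of source code per second). The remaining cases tend to be caused by poorly structured grammar rules, which the new parser interpreter feature in ANTLRWorks 2.2 can analyze and suggest improvements for.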
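The hash-based approach can be sketched roughly as follows. This is only an illustration under stated assumptions: ValidationCache, isValid() and the in-memory set are hypothetical names standing in for your own code and your backend database, and the Predicate stands in for the expensive validate() step. The parser itself would still be run on each evaluation to rebuild the tree; only the validation pass is skipped.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper: remembers hashes of inputs that already passed validation.
final class ValidationCache {
    // Assumption: stands in for a database table of known-valid input hashes.
    private final Set<String> validHashes = new HashSet<>();
    // Assumption: stands in for the slow validate() step, returning true on success.
    private final Predicate<String> validator;
    private int validatorCalls = 0;

    ValidationCache(Predicate<String> validator) {
        this.validator = validator;
    }

    // Returns the SHA-256 digest of the input as lowercase hex.
    static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available", e);
        }
    }

    // Runs the validator only when this exact input has not been validated before.
    boolean isValid(String input) {
        String hash = sha256Hex(input);
        if (validHashes.contains(hash)) {
            return true; // same bytes as a previously validated input
        }
        validatorCalls++;
        boolean ok = validator.test(input);
        if (ok) {
            validHashes.add(hash); // only successful validations are remembered
        }
        return ok;
    }

    int validatorCalls() {
        return validatorCalls;
    }

    public static void main(String[] args) {
        // Toy validator: any non-empty input counts as valid.
        ValidationCache cache = new ValidationCache(s -> !s.isEmpty());
        System.out.println(cache.isValid("a AND b"));
        System.out.println(cache.isValid("a AND b"));
        System.out.println("validator calls: " + cache.validatorCalls());
    }
}
```

Failed validations are deliberately not cached here, so an input that was rejected is checked again the next time it is seen; whether that is the right choice depends on how your validate() reports errors.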

If you need performance beyond what you get with this, then a parse tree isn't the data structure you should be using. StringTemplate 4's enormous performance advantage over StringTemplate 3 came primarily from the fact that the interpreter switched from using ASTs (equivalent to parse trees for this reasoning) to a linear bytecode representation/interpreter. The ASTs for ST4 would never need to be serialized for performance reasons because the bytecode would be serialized instead. In fact, the C# port of StringTemplate 4 provides exactly this feature.
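To make the idea of a linear representation concrete, here is a minimal sketch that flattens a small boolean-expression tree into a postfix instruction list and evaluates it with a stack. It is not StringTemplate's actual bytecode format and does not use any ANTLR API; LinearCode, Node, compile() and run() are hypothetical names invented for this illustration. The point is that the instruction list is plain data, so it can be stored as text and reloaded without rebuilding a tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a boolean-expression tree compiled to postfix instructions.
final class LinearCode {
    // Minimal tree node; op is one of "var", "not", "and", "or".
    static final class Node {
        final String op;
        final String name;
        final Node left;
        final Node right;

        Node(String op, String name, Node left, Node right) {
            this.op = op;
            this.name = name;
            this.left = left;
            this.right = right;
        }

        static Node var(String name) { return new Node("var", name, null, null); }
        static Node not(Node e) { return new Node("not", null, e, null); }
        static Node and(Node l, Node r) { return new Node("and", null, l, r); }
        static Node or(Node l, Node r) { return new Node("or", null, l, r); }
    }

    // Post-order walk: emit operands first, then the operator.
    static void compile(Node n, List<String> out) {
        if (n.op.equals("var")) {
            out.add("load " + n.name);
            return;
        }
        compile(n.left, out);
        if (n.right != null) {
            compile(n.right, out);
        }
        out.add(n.op);
    }

    // Evaluates the instruction list with a stack; unknown variables count as false.
    static boolean run(List<String> code, Map<String, Boolean> env) {
        Deque<Boolean> stack = new ArrayDeque<>();
        for (String insn : code) {
            if (insn.startsWith("load ")) {
                stack.push(Boolean.TRUE.equals(env.get(insn.substring(5))));
            } else if (insn.equals("not")) {
                stack.push(!stack.pop());
            } else {
                boolean r = stack.pop();
                boolean l = stack.pop();
                stack.push(insn.equals("and") ? (l && r) : (l || r));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (a and not b) or c
        Node tree = Node.or(Node.and(Node.var("a"), Node.not(Node.var("b"))), Node.var("c"));
        List<String> code = new ArrayList<>();
        compile(tree, code);

        // The instruction list round-trips through a plain string (e.g. a database column).
        String stored = String.join(";", code);
        List<String> loaded = Arrays.asList(stored.split(";"));

        Map<String, Boolean> env = new HashMap<>();
        env.put("a", true);
        env.put("b", false);
        System.out.println(stored);
        System.out.println(run(loaded, env));
    }
}
```

In this sketch the compile step would run once, after validation succeeds, and evaluate() would only ever load and run the stored instruction list.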
