Serialization of ANTLR ParseTree

Problem description

I have a generated grammar that does two things:

  • Checks the syntax of a domain-specific language
  • Evaluates input against that domain-specific language

These two functions are separate; let's call them validate() and evaluate().

The validate() function builds the tree from a String input while ensuring it meets the requirements of the BNF for the language. The evaluate() function plugs in values to that tree to get a result (usually true or false).

What the code is currently doing is running validate() each time on the input, just to generate the tree that evaluate() uses. Some of the inputs take up to 60 seconds to be checked. What I would LIKE to do is serialize the results of validate() (assuming it meets the syntax requirements), store the serialized form in the backend database, and just load it from the database as part of evaluate().

I noticed that I can execute the method toStringTree() on the parse tree, and retrieve a LISP style tree. However, can I restore a LISP style tree to an ANTLR parse tree? If not, can anyone recommend another way to serialize and store the generated parse tree?

Thanks for your help.

Jason

Recommended answer

ANTLR 4's ParserRuleContext data structure (the specific implementation of ParseTree used by generated parsers to represent grammar rules in the parse tree) is not serializable by default. Open issue #233 on the project issue tracker covers the feature request. However, based on my experience with many applications using ANTLR for parsing, I'm not convinced serializing the parse trees would be useful in the long run. For each problem serializing the parse tree is meant to address, a better solution already exists.
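For completeness, regarding the LISP-style text from toStringTree(): I'm not aware of a built-in reader that turns that text back into ParserRuleContext objects, but it can be read into a plain tree of your own. The sketch below is only an illustration. `LispNode` is a hypothetical class, not part of ANTLR, and it assumes that labels and token text contain no spaces or parentheses, which real token text can easily violate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain tree node; NOT an ANTLR ParseTree. */
final class LispNode {
    final String label;
    final List<LispNode> children = new ArrayList<>();

    LispNode(String label) {
        this.label = label;
    }

    /** Parses text such as "(expr (term a) AND (term b))" into a tree. */
    static LispNode parse(String text) {
        int[] pos = {0};
        LispNode root = parseNode(text, pos);
        skipSpaces(text, pos);
        if (pos[0] != text.length()) {
            throw new IllegalArgumentException("trailing text at " + pos[0]);
        }
        return root;
    }

    private static LispNode parseNode(String s, int[] pos) {
        skipSpaces(s, pos);
        if (pos[0] >= s.length()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        if (s.charAt(pos[0]) != '(') {
            return new LispNode(readWord(s, pos)); // leaf: a bare word
        }
        pos[0]++; // consume '('
        skipSpaces(s, pos);
        LispNode node = new LispNode(readWord(s, pos)); // first word is the node label
        while (true) {
            skipSpaces(s, pos);
            if (pos[0] >= s.length()) {
                throw new IllegalArgumentException("missing ')'");
            }
            if (s.charAt(pos[0]) == ')') {
                pos[0]++; // consume ')'
                return node;
            }
            node.children.add(parseNode(s, pos));
        }
    }

    // Assumption: a word ends at a space or parenthesis.
    private static String readWord(String s, int[] pos) {
        int start = pos[0];
        while (pos[0] < s.length() && " ()".indexOf(s.charAt(pos[0])) < 0) {
            pos[0]++;
        }
        if (start == pos[0]) {
            throw new IllegalArgumentException("expected a label at " + start);
        }
        return s.substring(start, pos[0]);
    }

    private static void skipSpaces(String s, int[] pos) {
        while (pos[0] < s.length() && s.charAt(pos[0]) == ' ') {
            pos[0]++;
        }
    }

    /** Writes the tree back out in the same parenthesized form. */
    @Override
    public String toString() {
        if (children.isEmpty()) {
            return label;
        }
        StringBuilder sb = new StringBuilder("(").append(label);
        for (LispNode child : children) {
            sb.append(' ').append(child);
        }
        return sb.append(')').toString();
    }
}
```

A tree like this loses everything the real parse tree carries beyond labels (token types, positions, the generated context classes), so any evaluate() written against the generated contexts would need to be rewritten against it. That is part of why I don't recommend this route.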

Another option is to store a hash of the last known valid file in the database. After you use the parser to create a parse tree, you could skip the validation step if the input file has the same hash as the last time it was validated. This leverages two aspects of ANTLR 4:

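  1. Running the parser twice on the same input file will produce the same parse tree.
  2. ANTLR 4 parsers are extremely fast in nearly all cases (e.g. the Java grammar can process around 20MB of source per second). The remaining cases tend to be caused by poorly structured grammar rules, which the new parser interpreter feature in ANTLRWorks 2.2 can analyze and make suggestions for improvement.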
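The hash check could look roughly like the following. This is a minimal sketch, not code from any real project: `ValidationCache` is a hypothetical name, the in-memory map stands in for the database table, and the `expensiveValidator` argument stands in for whatever slow checks validate() performs after parsing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper: remembers the hash of the last input that passed validation. */
final class ValidationCache {
    // Stand-in for the database: key -> hex SHA-256 of the last known valid input.
    private final Map<String, String> lastValidHash = new HashMap<>();

    /** Returns the SHA-256 digest of the text (UTF-8 encoded) as lowercase hex. */
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 support is required of every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns whether the input is valid. The validator runs only when the
     * input's hash differs from the stored hash for this key.
     */
    boolean isValid(String key, String input, Predicate<String> expensiveValidator) {
        String hash = sha256Hex(input);
        if (hash.equals(lastValidHash.get(key))) {
            return true; // unchanged since the last successful validation
        }
        boolean ok = expensiveValidator.test(input);
        if (ok) {
            lastValidHash.put(key, hash); // remember only inputs that passed
        }
        return ok;
    }
}
```

Only the hash is stored, so the parse itself still runs every time; the sketch saves the validation work, not the parsing.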

If you need performance beyond what you get with this, then a parse tree isn't the data structure you should be using. StringTemplate 4's enormous performance advantage over StringTemplate 3 came primarily from the fact that the interpreter switched from using ASTs (equivalent to parse trees for this reasoning) to a linear bytecode representation/interpreter. The ASTs for ST4 would never need to be serialized for performance reasons because the bytecode would be serialized instead. In fact, the C# port of StringTemplate 4 provides exactly this feature.
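To make the idea of a linear representation concrete, here is a small sketch for boolean expressions. It is not how StringTemplate 4 is implemented. The instruction names (`LOAD`, `NOT`, `AND`, `OR`) and the `LinearProgram` class are invented for illustration, and it assumes a visitor or listener over the validated parse tree has already emitted the instructions in postfix order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical postfix program for boolean expressions; trivially storable as text. */
final class LinearProgram {
    private final List<String> instructions;

    LinearProgram(List<String> instructions) {
        this.instructions = new ArrayList<>(instructions);
    }

    /** One instruction per line, e.g. for a text column in the database. */
    String serialize() {
        return String.join("\n", instructions);
    }

    static LinearProgram deserialize(String stored) {
        return new LinearProgram(Arrays.asList(stored.split("\n")));
    }

    /** Runs the program on a stack; no tree and no parser are involved. */
    boolean evaluate(Map<String, Boolean> values) {
        Deque<Boolean> stack = new ArrayDeque<>();
        for (String ins : instructions) {
            if (ins.startsWith("LOAD ")) {
                String name = ins.substring(5);
                Boolean value = values.get(name);
                if (value == null) {
                    throw new IllegalArgumentException("no value for " + name);
                }
                stack.push(value);
            } else if (ins.equals("NOT")) {
                stack.push(!stack.pop());
            } else if (ins.equals("AND")) {
                boolean right = stack.pop();
                boolean left = stack.pop();
                stack.push(left && right);
            } else if (ins.equals("OR")) {
                boolean right = stack.pop();
                boolean left = stack.pop();
                stack.push(left || right);
            } else {
                throw new IllegalArgumentException("unknown instruction: " + ins);
            }
        }
        return stack.pop();
    }
}
```

For example, `(a AND NOT b) OR c` would be stored as the six lines `LOAD a`, `LOAD b`, `NOT`, `AND`, `LOAD c`, `OR`. In this arrangement validate() would produce and store the program once, and evaluate() would only deserialize and run it.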
