Scala解析器组合器,歧义语法&解析森林 [英] Scala Parser Combinator, Ambiguous Grammar & Parse Forest

查看:89
本文介绍了Scala解析器组合器,歧义语法&解析森林的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图让解析器从一个模棱两可的语法中返回所有可能的解析结果(解析森林),并通过针对用户上下文/历史记录和知识库对其进行评估,从解析森林中进行选择.出于性能原因,应该使用packrat解析器和搜索限制/上限参数来完成此操作,以限制应用生产规则以避免无限循环时递归调用的次数.

I am trying to get the parser to return all possible parse results (parse forest) from an ambiguous grammar and choose from the parse forest by evaluating them against user context / history and a knowledge base. For performance reason, this should probably be done with the packrat parser and a search limit / upper-bound parameter to limite the number of recursive calls in applying production rules to avoid infinite loops.

对于Scala及其解析器组合器来说,我是一个新手,我不知道该怎么做或是否可以做到.有人可以帮忙吗?非常感谢.

Being new to both Scala and its Parser Combinators, I can't figure out how to do this or whether it can be done at all. Could someone please help? Much appreciated.

最诚挚的问候, 托马斯·胡安

Best regards, Thomas Juan

推荐答案

您无法使用Scala的内置解析器组合器来做到这一点. Packrat组合器仅限于明确的语法.如果尝试处理歧义性,那么您将失去记忆的能力,即使对于明确的线索,解析复杂度也变为O(k ^ n).所以,不要那样做.

You cannot do this with Scala's built-in parser combinators. Packrat combinators are restricted to only unambiguous grammars. If you try to deal with ambiguity, you lose the ability to memoize and the parse complexity becomes O(k^n) even for unambiguous trails. So, don't do that.

您想要的是一个能够正确处理歧义的解析器组合器框架.据我所知,Scala唯一这样的框架是我的 GLL组合器.语法几乎与Scala的解析器组合器相同(主要区别是高arar函数正确运行 ),但是输出为Stream[A],其中A是结果类型.这样,您将获得一个解析森林.请注意,结果实际上不是SPPF(共享打包解析林),因此,如果您产生指数级的结果,则流将具有成比例的大小.这样做是为了保持结果类型的最终灵活性.

What you want is a parser combinator framework that correctly handles ambiguity. As far as I know, the only such framework for Scala is my GLL combinators. The syntax is almost identical to Scala's parser combinators (the main difference being that higher-arity functions work correctly), but the output is a Stream[A], where A is the result type. Thus, you get a parse forest. Note that the result is not actually a SPPF (shared packed parse forest), so if you produce an exponential number of results, the stream will have proportional size. This was done to maintain ultimate flexibility in the result type.

GLL组合符在最坏情况下也为O(n ^ 3).在一般情况下,解析线索是明确的,GLL组合器为O(n).恒定的时间开销目前有点过分,但是优化是一个正在进行的项目.我们在生产环境中使用 Precog 的GLL组合器,因此可以期望该库可以得到很好的维护.

GLL combinators is O(n^3) in the worst case, even for pathologically ambiguous grammars. In the average case, where the parse trail is unambiguous, GLL combinators is O(n). The constant time overhead is currently a bit excessive, but optimization is an ongoing project. We use GLL combinators in production at Precog, so you can expect the library to be well maintained going forward.

这篇关于Scala解析器组合器,歧义语法&解析森林的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆