LR(k)到LR(1)的语法转换 [英] LR(k) to LR(1) grammar conversion

查看:347
本文介绍了LR(k)到LR(1)的语法转换的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对以下来自维基百科的报价感到困惑:

I am confused by the following quote from Wikipedia:

换句话说,如果一种语言足够合理,可以允许 有效的一遍解析器,可以用LR(k)语法描述. 而且这种语法总是可以机械地转化为 等效(但更大)的LR(1)语法.因此,LR(1)解析方法是 从理论上讲,其功能足以处理任何合理的语言.在 实践中,许多编程语言的自然语法是 接近LR(1).[需要引用]

In other words, if a language was reasonable enough to allow an efficient one-pass parser, it could be described by an LR(k) grammar. And that grammar could always be mechanically transformed into an equivalent (but larger) LR(1) grammar. So an LR(1) parsing method was, in theory, powerful enough to handle any reasonable language. In practice, the natural grammars for many programming languages are close to being LR(1).[citation needed]

这意味着解析器生成器(如bison)非常强大(因为它可以处理LR(k)语法),只要它能够将LR(k)语法转换为LR(1)语法.是否存在一些此类示例,或有关如何执行此操作的秘诀?我想知道这一点,因为我的语法存在移位/减少冲突,但是我认为这是因为它是LR(2)语法,并且希望将其转换为LR(1)语法.附带的问题:C++是一种不合理的语言,因为我读过,bison生成的解析器无法对其进行解析.

This means that a parser generator, like bison, is very powerful (since it can handle LR(k) grammars), if one is able to convert a LR(k) grammar to a LR(1) grammar. Do some examples of this exist, or a recipe on how to do this? I'd like to know this since I have a shift/reduce conflict in my grammar, but I think this is because it is a LR(2) grammar and would like to convert it to a LR(1) grammar. Side question: is C++ an unreasonable language, since I've read, that bison-generated parsers cannot parse it.

推荐答案

有关为找到LR(k)语法的覆盖LR(1)语法的通用算法的参考,请参见

For references on the general purpose algorithm to find a covering LR(1) grammar for an LR(k) grammar, see Real-world LR(k > 1) grammars?

通用算法产生很大的语法;实际上,我非常确定所生成的PDA与LR(k) PDA的大小相同.但是,在特定情况下,可以提出更简单的解决方案.不过,一般原则适用:您需要通过无条件转移来推迟转移/减少决定,直到可以使用单个超前标记做出决定为止.

The general purpose algorithm produces quite large grammars; in fact, I'm pretty sure that the resulting PDA is the same size as the LR(k) PDA would be. However, in particular cases it's possible to come up with simpler solutions. The general principle applies, though: you need to defer the shift/reduce decision by unconditionally shifting until the decision can be made with a single lookahead token.

一个示例: C#的lambda表达式语法LALR(1)吗?

在不了解有关您语法的更多细节的情况下,我无能为力.

Without knowing more details about your grammar, I can't really help more than that.

关于C ++,解析起来很棘手的是预处理器和解析(和词法化)模板实例化的一些特殊情况.表达式的解析取决于符号的种类"(而不是类型)(在符号出现的上下文中)这一事实使得使用bison进行精确解析变得很复杂. [1]不合理"是我不愿意做出的价值判断;当然,使用不同的语法对工具的支持(如准确的语法着色器和制表符完成器)本来很简单,但是证据表明,编写(甚至阅读)良好的C ++代码并不难.

With regard to C++, the things that make it tricky to parse are the preprocessor and some corner cases in parsing (and lexing) template instantiations. The fact that the parse of an expression depends on the "kind" (not type) of a symbol (in the context in which the symbol occurs) makes precise parsing with bison complicated. [1] "Unreasonable" is a value judgement which I'm not comfortable making; certainly, tool support (like accurate syntax colourizers and tab-completers) would have been simple with a different grammar, but the evidence is that it is not that hard to write (or even read) good C++ code.

注意:

[1]经典的棘手解析(也适用于C)是(a)*b,如果a表示类型,则为dereference的强制转换,否则为乘法.如果您要在以下环境中编写它:c/(a)*b,那么很明显,如果不知道它是演员还是产品,就无法构建AST,因为这会影响AST的形状,

[1] The classic tricky parse, which also applies to C, is (a)*b, which is a cast of a dereference if a represents a type, and otherwise a multiplication. If you were to write it in the context: c/(a)*b, it would be clear that an AST cannot be constructed without knowing whether it's a cast or a product, since that affects the shape of the AST,

一个更特定于C ++的问题是:x<y>(z)(或x<y<z>>(3))根据x是否命名模板而不同地解析(并且可以说是标记化).

A more C++-specific issue is: x<y>(z) (or x<y<z>>(3)) which parse (and arguably tokenise) differently depending on whether x names a template or not.

这篇关于LR(k)到LR(1)的语法转换的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆