语法与算子关联性之间的关系 [英] Relation between grammar and operator associativity

查看:76
本文介绍了语法与算子关联性之间的关系的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

一些编译器书籍/文章/论文谈论语法的设计及其与运算符的关联性的关系.我是自上而下的忠实粉丝,尤其是递归下降,解析器以及到目前为止,我编写的大多数(如果不是全部)编译器都使用以下表达式语法:

Expr   ::= Term { ( "+" | "-" ) Term }
Term   ::= Factor { ( "*" | "/" ) Factor }
Factor ::= INTEGER | "(" Expr ")"

这是该BNF的EBNF表示形式:

Expr  ::= Term Expr'
Expr' ::= ( "+" | "-" ) Term Expr' | ε
Term  ::= Factor Term'
Term' ::= ( "*" | "/" ) Factor Term' | ε
Factor = INTEGER | "(" Expr ")"

据我所读,有人认为此语法是错误的",这是由于运算符关联性的变化(对于这4个运算符,从左到右)已由向右(而不是向左)增长的解析树证明.对于通过属性语法实现的解析器,这可能是正确的,因为l属性值要求先创建此值,然后再将其传递给子节点.但是,当使用普通递归下降解析器实现时,由我决定是先构造此节点,然后传递给子节点(自上而下),还是先创建子节点,然后将返回值添加为该节点的子节点(通过在此节点的构造函数中)(自下而上).我应该在这里遗漏某些东西,因为我不同意这种语法错误"的说法,并且该语法已在许多语言中使用,尤其是.维尔斯的.通常(或全部?)说它促进LR解析而不是LL解析的读物.

解决方案

我认为这里的问题是一种语言具有抽象语法,就像这样:

E ::= E + E | E - E | E * E | E / E | Int | (E)

,但这实际上是通过具体语法实现的,该语法用于指定关联性和优先级.因此,如果您要编写一个递归的体面解析,那么您将在过程中隐式地写入具体语法,这很好,尽管也可以准确地将其指定为短语结构语法!

要成为完整的具体语法,您的语法有几个问题.首先,您需要将生产添加到下一个层次",所以请稍微放松一下语法:

Expr ::= Term + Term | Term - Term | Term
Term ::= Factor * Factor | Factor / Factor | Factor
Factor ::= INTEGER | (Expr)

否则,无法从起始符号(在本例中为Expr)派生有效的句子.例如,如果没有这些额外的产生量,您将如何得出"1 * 2"?

Expr -> Term
     -> Factor * Factor
     -> 1 * Factor
     -> 1 * 2

我们可以看到其他语法以稍微不同的方式处理此问题:

Expr -> Term Expr'
     -> Factor Term' Expr'
     -> 1 Term' Expr'
     -> 1 * Factor Term' Expr'
     -> 1 * 2 Term' Expr'
     -> 1 * 2 ε Expr'
     -> 1 * 2 ε ε
      = 1 * 2

但这可以达到相同的效果.

您的解析器实际上是非关联的.要查看此内容,请问将如何解析E + E + E并发现无法解析.不管首先使用哪个+,我们都会在一侧获得E而在另一侧获得E + E,但是随后我们试图将E + E解析为Term,这是不可能的.同样,考虑从起始符号派生该表达式,这也是不可能的.

Expr -> Term + Term
     -> ? (can't get another + in here)

另一个语法是左联想ebcase,可以导出任意长的E + E + ... + E字符串.

总而言之,总而言之,在编写RDP时,您可以实现自己喜欢的抽象语法的任何 concrete 版本,并且您可能知道比我更多.但是,在尝试生成精确描述您的RDP的语法时会遇到这些问题.希望有帮助!

Some compiler books / articles / papers talk about design of a grammar and the relation of its operator's associativity. I'm a big fan of top-down, especially recursive descent, parsers and so far most (if not all) compilers I've written use the following expression grammar:

Expr   ::= Term { ( "+" | "-" ) Term }
Term   ::= Factor { ( "*" | "/" ) Factor }
Factor ::= INTEGER | "(" Expr ")"

which is an EBNF representation of this BNF:

Expr  ::= Term Expr'
Expr' ::= ( "+" | "-" ) Term Expr' | ε
Term  ::= Factor Term'
Term' ::= ( "*" | "/" ) Factor Term' | ε
Factor = INTEGER | "(" Expr ")"

According to what I read, some regards this grammar as being "wrong" due to the change of operator associativity (left to right for those 4 operators) proven by the growing parse tree to the right instead of left. For a parser implemented through attribute grammar, this might be true as l-attribute value requires that this value created first then passed to child nodes. however, when implementing with normal recursive descent parser, it's up to me whether to construct this node first then pass to child nodes (top-down) or let child nodes be created first then add the returned value as the children of this node (passed in this node's constructor) (bottom-up). There should be something I miss here because I don't agree with the statement saying this grammar is "wrong" and this grammar has been used in many languages esp. Wirthian ones. Usually (or all?) the reading that says it promotes LR parsing instead of LL.

解决方案

I think the issue here is that a language has an abstract syntax which is just like:

E ::= E + E | E - E | E * E | E / E | Int | (E)

but this is actually implemented via a concrete syntax which is used to specify associativity and precedence. So, if you're writing a recursive decent parse, you're implicitly writing the concrete syntax into it as you go along and that's fine, though it may be good to specify it exactly as a phrase-structured grammar as well!

There are a couple of issues with your grammar if it is to be a fully-fledged concrete grammar. First of all, you need to add productions to just 'go to the next level down', so relaxing your syntax a bit:

Expr ::= Term + Term | Term - Term | Term
Term ::= Factor * Factor | Factor / Factor | Factor
Factor ::= INTEGER | (Expr)

Otherwise there's no way to derive valid sentences starting from the start symbol (in this case Expr). For example, how would you derive '1 * 2' without those extra productions?

Expr -> Term
     -> Factor * Factor
     -> 1 * Factor
     -> 1 * 2

We can see the other grammar handles this in a slightly different way:

Expr -> Term Expr'
     -> Factor Term' Expr'
     -> 1 Term' Expr'
     -> 1 * Factor Term' Expr'
     -> 1 * 2 Term' Expr'
     -> 1 * 2 ε Expr'
     -> 1 * 2 ε ε
      = 1 * 2

but this achieves the same effect.

Your parser is actually non-associative. To see this ask how E + E + E would be parsed and find that it couldn't. Whichever + is consumed first, we get E on one side and E + E on the other, but then we're trying to parse E + E as a Term which is not possible. Equivalently, think about deriving that expression from the start symbol, again not possible.

Expr -> Term + Term
     -> ? (can't get another + in here)

The other grammar is left-associative ebcase an arbitrarily long sting of E + E + ... + E can be derived.

So anyway, to sum up, you're right that when writing the RDP, you can implement whatever concrete version of the abstract syntax you like and you probably know a lot more about that than me. But there are these issues when trying to produce the grammar which describes your RDP precisely. Hope that helps!

这篇关于语法与算子关联性之间的关系的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆