如何在(LA)LR解析器中获取标识符 [英] How to get identifiers scoped in a (LA)LR parser

查看:47
本文介绍了如何在(LA)LR解析器中获取标识符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

(LA)LR 解析器的一个缺点是,reduce 只在规则的末尾处理.这是具有作用域变量(如 javascript)的编程语言中的一个问题.

A disadvantage of a (LA)LR parser is that a reduce is only processed at the end of a rule. This is a problem in programming languages with scoped variables like javascript.

示例:

var a = 2;
function (a) {
   a = 4;
}

参见上面的代码示例.解析器可能看起来像:

See the above code example. A parser could look like:

program : instruction program {}
        | {}
        ;

instruction : command {}
            | function {}
            ;

command : "var" identifier "=" literal ";" {}
        ;

function : "function" "(" arguments ")" "{" program "}" {/*1*/}
         ;

arguments : identifier {}
          | identifier "," arguments {}
          | {}
          ;

现在很明显,解析器每次都会消耗一个标识符.可以在寄存器中注册标识符.然而,问题是函数(行 /*1*/)只在函数的末尾被考虑.因此,在函数中使用标识符(如 a = 4;)的指令不能在解析器时绑定到本地/全局标识符,因为它当时是未知的.

Now it is clear that each time an identifer is consumed by the parser. One can register the identifier in a register. The problem however is that a function (line /*1*/) is only considered at the end of the function. Instructions using the identifier (like a = 4;) in a function thus cannot be binded to the local/global identifier at parser time, since it is at that time unknown.

有什么好的做法可以解决这个问题,C#(标准库)提供哪些功能来处理这种情况?

What are good practices to tackle this problem, what functionality does C# (standard library) provide to handle such situations?

推荐答案

免责声明

您问题中的语法采用 yacc 格式,因此我假设您打算使用与 yacc 兼容的解析器生成器.因此,我不知道您对 C# 标准库有什么期望.这个答案一般适用于 yaccbison 和其他一些使用相同输入格式的解析器生成器.对于 jison,请参阅下面关于中间规则操作的注释.

Disclaimer

The grammar in your question is in yacc format, so I'm assuming you are planning to use a yacc-compatible parser generator. Consequently, I have no idea what you expect from the C# standard library. This answer applies in general terms to yacc, bison and some other parser generators which use the same input format. For jison, see the note below with respect to mid-rule actions.

yacc 及其派生类允许您编写具有中间规则操作的规则;这些操作在规则其余部分的任何减少之前执行.这使得插入设置和拆卸操作变得非常容易.例如:

yacc and its derivatives allow you to write rules with mid-rule actions; these actions are executed before any reductions in the rest of the rule. That makes it quite easy to insert set-up and tear-down actions. For example:

function: "function"    { start_scope(); }
          '(' arguments ')'
          '{' body '}'  { end_scope(); /* ... */ }
        ;

中间规则动作不会为上下文无关文法添加任何便利,因为它们可以用另一种方式编写(见下文),但它们有时更易于阅读和编写.中间规则动作与产生式的任何其他组件一样具有值,并且可以被后面的动作引用.例如,执行以下操作可能会很有用:

Mid-rule actions do not add any facility to a context free grammars, since they can be written in another way (see below), but they are sometimes easier to read and write. The mid-rule action has a value like any other component of a production, and it can be referenced by a later action. It might, for example, be useful to do something like this:

function: "function"    { $$ = new_scope(); }
          '(' arguments ')'
          '{' body '}'  { close_scope($2); /* ... */ }
        ;

即使在那里,如果您希望在初始解析期间将名称查找应用于当前作用域,也需要使用持久状态来记录当前作用域.

although even there it will be necessary to use persistent state to record the current scope if you want name lookup to be applied to the current scope during the initial parse.

此外,向 (LA)LR(1) 语法添加中间规则操作可能会删除 (LA)LR(1) 属性.bison 中有一个例子 手册,这与定义局部变量的问题并不完全巧合.

Furthermore, it is possible that the addition of a mid-rule action to an (LA)LR(1) grammar will remove the (LA)LR(1) property. There is an example in the bison manual, which not entirely coincidentally has to do precisely with the question of scoping local variables.

尽管这个策略很诱人,但它对许多语言的帮助并不像您预期​​的那么大.例如,在 javascript 中,在定义局部变量之前引用它是完全合法的.例如,考虑以下内容:

Tempting though this strategy is, there are many languages for which it will not help as much as you might expect. In javascript, for example, it is entirely legal to refer to a local variable before it is defined as such. Consider the following, for example:

a=42;
function geta() {
  var rv = a;
  var a = 43;
  return rv;
}
geta();

geta 的结果是 undefined 因为当 rva 初始化时,那个特定的 geta 范围内的一个 尚未赋值.如果下一行更改为 var b = 43; 那么 geta 将返回 42 因为现在 a 是对封闭范围内变量的引用.Python 的作用域大致相同(尽管语言在局部函数的作用域上有所不同),但远非通用;可能有更多语言的范围规则像Lua一样,可以从左到右范围,其中geta中第一次使用a是指外层的a,而内部 a 的作用域从它的声明开始,一直延伸到块的末尾.(当然,Lua 的语法略有不同,所以你不能只提供那个例子.)

The result of geta is undefined because when rv is initialized from a, that particular a, which is in the scope of geta, has not yet been assigned a value. If the next line were changed to var b = 43; then geta would return 42 because now the a is a reference to the variable in the enclosing scope. Python scoping is roughly the same (although the languages differ in how local functions are scoped), but it is far from universal; there are probably more languages whose scoping rules are like Lua, which can be scoped left to right, in which the first use of a in geta refers to the outer a, while the scope of the inner a starts at its declaration and extends to the end of the block. (Of course, Lua's syntax is slightly different, so you can't just feed it that example.)

C 和大多数 C 衍生物基于使用前声明"规则,但 C++ 不要求在内联类成员函数的主体中使用的名称在函数定义之前声明除非名称是模板或 typedef.因此,在 C++ 中,名称解析的语法部分可以从左到右完成,您可以决定 (foo)*(bar) 是演员还是产品,但您仍然需要重新审视表达式以进行名称解析.

C and most C derivatives are based on a "declare-before-use" rule, but C++ does not require names used in the bodies of inline class member functions to have been declared before the function's definition unless the name is a template or typedef. So in C++ the syntactic part of name resolution can be done left-to-right and you can decide whether (foo)*(bar) is a cast or a product, but you still need to revisit the expression in order to do name resolution.

因此通常最好在 AST 的后续步骤中将名称附加到作用域,在这种情况下,在 function 产生式之前,作用域不附加到函数不再重要减少.

So it is often better to attach names to scopes in a subsequent walk of the AST, in which case it no longer matters that the scope isn't attached to a function until the function production is reduced.

并非所有 LALR(1) 解析器生成器都允许中间规则操作.例如,jison(否则使用非常相似的语法格式)和 lemon 都没有实现该功能.(有一个相当长的jison 功能请求.) 但这并不重要,因为 MRA 只是语法糖.有时,它们可以通过具有唯一名称且右侧为空的产品来实现.换句话说,规则:

Not all LALR(1) parser generators allow mid-rule actions. Neither jison -- which otherwise uses a very similar grammar format -- nor lemon implement the feature, for example. (There is a fairly long-outstanding jison feature request.) But it doesn't matter so much, because MRAs are just syntactic sugar. Sometimes, they can be implemented with a uniquely-named production with an empty right-hand side. In other words, the rule:

X: a b c { /* mid-rule action */ } d e f { /* final action */ };

可以转换成类似的东西:

can be transformed into something like:

@32: /* empty */ { /* mid-rule action */ };
X: a b c @32 d e f { /* final action */ };

其中当@32减少时执行中间规则动作,即在de和<减少之前代码>f.

in which the mid-rule action will be executed when @32 is reduced, i.e. before the reductions of d, e and f.

很多时候,MRA 需要能够在右侧引用其左侧符号的语义值,因此将 MRA 视为语法糖会更准确rhs 的前缀:

Much of the time, the MRA needs to be able to reference the semantic values of the symbols to the left of it in the right-hand side, so it would be more accurate to consider the MRA to be syntactic sugar for a prefix of the rhs:

X: @32 d e f { /* final action (with semantic values renumbered) */ };
@32: a b c   { /* mid-rule action */ }

实际的bison实现更接近上面第一个建议,因为bison知道当@32减少时,栈顶的三个元素总是abc.这是有效的,因为创建的非终结符只出现在语法中的一个地方.(Bison 使您可以访问此堆栈探测功能,但除非您真的知道自己在做什么,否则请不要使用它.)因此,如果您需要等效于 MRA 的解决方案,则更有可能使用前缀解决方案jison 或柠檬.

The actual bison implementation is closer to the first suggestion above, because bison knows that when @32 is reduced, the top three stack elements will always be a, b and c. That works because the created non-terminal only appears in one place in the grammar. (Bison gives you access to this stack-probing feature, but please don't use it unless you really know what you're doing.) So you're more likely to use the prefix solution if you need the equivalent to an MRA in jison or lemon.

您可能仍然会遇到问题,因为最终操作可能会引用一个语义值,该值现在已包含在前缀规则中.为了处理这种情况,您可能需要利用前缀规则的语义值来传递一个或多个语义规则.幸运的是,这些情况很少见.

You might still run into problems, because the final action might refer to a semantic value which has now been subsumed by the prefix rule. To deal with such a case, you might need to pass one or more semantic rules through the prefix rule by making use of its semantic value. Fortunately, these cases are rare.

这篇关于如何在(LA)LR解析器中获取标识符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆