Bison/Flex,减少/减少,不同生产中的标识符 [英] Bison/Flex, reduce/reduce, identifier in different production

查看:111
本文介绍了Bison/Flex,减少/减少,不同生产中的标识符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在用bison/flex做一个解析器.

I am doing a parser in bison/flex.

这是我的代码的一部分:

This is part of my code:

我想实现赋值生产,因此标识符可以是boolean_expr或expr,其类型将由符号表检查. 因此它允许类似:

I want to implement the assignment production, so the identifier can be both boolean_expr or expr, its type will be checked by a symbol table. So it allows something like:

int a = 1;
boolean b = true;
if(b) ...

但是,如果我在term和boolean_expr中都包含标识符,则是减少/减少,有什么解决方案可以解决此问题吗?

However, it is reduce/reduce if I include identifier in both term and boolean_expr, any solution to solve this problem?

推荐答案

本质上,您要尝试的是将语义规则(类型信息)注入语法中.这是可能的,但这并不容易.更重要的是,这很少是一个好主意.如果语法和语义得到很好的描述,那几乎总是最好的.

Essentially, what you are trying to do is to inject semantic rules (type information) into your syntax. That's possible, but it is not easy. More importantly, it's rarely a good idea. It's almost always best if syntax and semantics are well delineated.

所有内容相同,如前所述,您的语法是明确且LALR(1).但是,后一个功能很脆弱,完成语法后您将很难维护它.

All the same, as presented your grammar is unambiguous and LALR(1). However, the latter feature is fragile, and you will have difficulty maintaining it as you complete the grammar.

例如,您没有在问题中包含分配语法,但是会

For example, you don't include your assignment syntax in your question, but it would

assignment: identifier '=' expr
          | identifier '=' boolean_expr
          ;

与所示语法的其余部分不同,该产生式是模棱两可的,因为:

Unlike the rest of the part of the grammar shown, that production is ambiguous, because:

x = y

如果不了解有关y的任何信息,则可以将y简化为termboolean_expr.

without knowing anything about y, y could be reduced to either term or boolean_expr.

一个可能更有趣的示例是在语法中添加括号.显而易见的方法是添加两个产品:

A possibly more interesting example is the addition of parentheses to the grammar. The obvious way of doing that would be to add two productions:

term: '(' expr ')'

boolean_expr: '(' boolean_expr ')'

生成的语法不是模棱两可的,但不再是LALR(1).考虑以下两个声明:

The resulting grammar is not ambiguous, but it is no longer LALR(1). Consider the two following declarations:

boolean x = (y) < 7
boolean x = (y)

在第一个中,y必须为int,以便可以将(y)简化为term;第二个y中的y必须为boolean,以便可以将(y)简化为boolean_expr.没有模棱两可;一旦看到<(或没有看到),就很清楚选择哪种缩小.但是<不是提前标记,实际上它可以与y任意距离:

In the first one, y must be an int so that (y) can be reduced to a term; in the second one y must be boolean so that (y) can be reduced to a boolean_expr. There is no ambiguity; once the < is seen (or not), it is entirely clear which reduction to choose. But < is not the lookahead token, and in fact it could be arbitrarily distant from y:

boolean x = ((((((((((((((((((((((y...

因此对于任何k来说,生成的明确语法都不是LALR(k).

So the resulting unambiguous grammar is not LALR(k) for any k.

解决问题的一种方法是在词法级别上注入类型信息,方法是允许扫描程序访问符号表.然后,扫描程序可以在符号表中查找已扫描的标识符令牌,并使用符号表中的信息来确定以下三种令牌类型(如果有更多数据类型,则为更多)中的一种:undefined_variableinteger_variableboolean_variable.那么您将拥有例如:

One way you could solve the problem would be to inject the type information at the lexical level, by giving the scanner access to the symbol table. Then the scanner could look a scanned identifier token in the symbol table and use the information in the symbol table to decide between one of three token types (or more, if you have more datatypes): undefined_variable, integer_variable, and boolean_variable. Then you would have, for example:

declaration: "int" undefined_variable '=' expr
           | "boolean" undefined_variable '=' boolean_expr
           ;

term: integer_variable
    | ...
    ;

boolean_expr: boolean_variable
            | ...
            ; 

这将起作用,但是显然这是不可扩展的:每次添加类型时,都必须扩展语法和词法描述,因为现在语义不仅与语法,甚至与词法分析混为一谈.一旦让语义开箱即用,它就会污染所有内容.

That will work but it should be obvious that this is not scalable: every time you add a type, you'll have to extend both the grammar and the lexical description, because the now the semantics is not only mixed up with the syntax, it has even gotten intermingled with the lexical analysis. Once you let semantics out of its box, it tends to contaminate everything.

对于某些语言,这实际上是最方便的解决方案:例如,如果区分typedef名称和标识符名称,则C解析要容易得多,这样您就可以知道(t)*x是强制转换还是乘法. (但是,对于C ++来说,它工作起来不那么容易,因为C ++具有更复杂的名称查找规则,并且还需要更多的语义分析才能找到正确的解析.)

There are languages for which this really is the most convenient solution: C parsing, for example, is much easier if typedef names and identifier names are distinguished so that you can tell whether (t)*x is a cast or a multiplication. (But it doesn't work so easily for C++, which has much more complicated name lookup rules, and also much more need for semantic analysis in order to find the correct parse.)

但是,老实说,我建议您不要使用C语言(更不用说C ++语言)作为设计语言的模型.编译器难以解析的语言对人类而言也难以解析. 最令人烦恼的分析" 仍然是C ++新手经常遇到的痛苦之源.绊倒了经验丰富的程序员:

But, honestly, I'd suggest that you do not use C -- and much less C++ -- as a model of how to design a language. Languages which are hard for compilers to parse are also hard for human beings to parse. The "most vexing parse" continues to be a regular source of pain for C++ newcomers, and even sometimes trips up relatively experienced programmers:

class X {
  public:
    X(int n = 0) : data_is_available_(n) {}
    operator bool() const { return data_is_available_; }
    // ...
  private:
    bool data_is_available_;
    // ...
};

X my_x_object();
// ...
if (!x) {
  // This code is unreachable. Can you see why?
}

简而言之,最好使用一种可以完全解析为AST而无需任何语义信息的语言.解析器生成AST后,您可以在单独的过程中进行语义分析,其中之一将检查类型约束.那是最干净的解决方案.没有显式输入,语法会稍作简化,因为expr现在可以是任何expr:

In short, you're best off with a language which can be parsed into an AST without any semantic information at all. Once the parser has produced the AST, you can do semantic analyses in separate passes, one of which will check type constraints. That's far and away the cleanest solution. Without explicit typing, the grammar is slightly simplified, because an expr now can be any expr:

 expr:        conjunction | expr "or" conjunction ;
 conjunction: comparison  | conjunction "and" comparison ;
 comparison:  product     | product '<' product ;
 product:     factor      | product '*' factor ;
 factor:      term        | factor '+' term ;
 term:        identifier
     |        constant
     |        '(' expr ')'
     ;

上面的每个操作将简单地创建一个新的AST节点并将$$设置为该新节点.解析结束时,将遍历AST以验证所有expr的类型均正确.

Each action in the above would simply create a new AST node and set $$ to the new node. At the end of the parse, the AST is walked to verify that all exprs have the correct type.

如果这对于您的项目来说似乎是多余的,则可以在归约操作中进行语义检查,从而有效地将AST漫游与解析混合在一起.对于立即评估而言,这似乎很方便,但是它还需要在解析器的语义类型中包含显式类型信息,这会增加不必要的开销(并且,如前所述,让语义干扰解析器的不雅观.)在这种情况下,每个操作都将看起来像这样:

If that seems like overkill for your project, you can do the semantic checks in the reduction actions, effectively intermingling the AST walk with the parse. That might seem convenient for immediate evaluation, but it also requires including explicit type information in the parser's semantic type, which adds unnecessary overhead (and, as mentioned, the inelegance of letting semantics interfere with the parser.) In that case, every action would look something like this:

expr : expr '+' expr { CheckArithmeticCompatibility($1, $3);
                       $$ = NewArithmeticNode('+', $1, $3);
                     }

这篇关于Bison/Flex,减少/减少,不同生产中的标识符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆