Bison:在GLR解析器中按类型/YYERROR区分变量 [英] Bison: distinguish variables by type / YYERROR in GLR-parsers

查看:183
本文介绍了Bison:在GLR解析器中按类型/YYERROR区分变量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用GNU野牛进行解析器的开发,并且面临着一个有趣的问题.我的实际问题略有不同,总体上没有那么有趣,因此我将以不同的方式陈述,这样,答案通常会更有用.

I'm working on a parser, using GNU bison, and I'm facing an interesting problem. My actual problem is a bit different and less interesting in general, so I will state it a bit differently, so that an answer would be more useful in general.

我需要根据表达式的类型来区分它们,例如算术表达式和字符串表达式.他们的顶级非终结者有一些共同的祖先,例如

I need to distinguish expressions depending on their type, like arithmetic expressions from string expressions. Their top-level nonterminals have some common ancestors, like

statement
    : TOK_PRINT expression TOK_SEMICOLON {...}
;
expression
    : arithmeticexpression {...}
    | stringexpression {...}
;

现在我需要能够在两种表达式中都包含变量

Now I need to be able to have variables in both kinds of expressions

arithmeticexpression
    : TOK_IDENTIFIER {...}
;
stringexpression
    : TOK_IDENTIFIER {...}
;

(在stringexpression情况下仅允许使用string-type变量,在算术表达式情况下仅允许使用int-或float类型的变量) 但这显然会导致R/R冲突,这是该语言固有的-无法解决,因为该语言是模棱两可的.

(only the variables of string-type should be allowed in the stringexpression case and only variables of int- or float-type should be allowed in the arithmeticexpression case) But this obviously causes an R/R conflict, which is inherent in the language - it is impossible to resolve, because the language is ambiguous.

当然,我可以降低语言的使用量,以便仅将常规的"Expression"对象传递到解析堆栈中,但随后我必须在操作中进行大量的手动类型检查,而我想要避免. 同样,在我的实际用例中,从语法到可变规则的路径是如此不同,以至于我不得不大量降低语言的使用量,以至于我失去了很多语法规则(即,失去了很多语法规则).结构信息),并且需要将解析引擎手写到一些动作中.

Of course I could water down the language, so that only general "Expression" objects are passed around on the parse-stack, but then I'd have to do a lot of manual type-checking in the actions, which I would like to avoid. Also, in my real use-case, the paths through the grammar to the variable-rules are so different, that I would have to water the language down so much that I would lose a lot of grammar-rules (i.e. lose a lot of structural information) and would need to hand-write parsing-engines into some actions.

我已经阅读了有关GLR解析的信息,这听起来可以解决我的问题.我正在考虑使用此功能:语法如上所述,并且在相应变量类型错误的路径中包含YYERROR.

I have read about GLR-parsing, which sounds like it could solve my problem. I'm considering to use this feature: have the grammar as above and YYERROR in the paths where the corresponding variable has the wrong type.

arithmeticexpression
    : TOK_IDENTIFIER {
          if(!dynamic_cast<IntVariable*>(
              symbol_table[*$<stringvalue>1]))
              YYERROR;
      }
;
stringexpression
    : TOK_IDENTIFIER {
          if(!dynamic_cast<StringVariable*>(
              symbol_table[*$<stringvalue>1]))
              YYERROR;
      }
;

《野牛手册》

  1. 在确定性GLR操作期间,YYERROR的效果是相同的 在确定性解析器中的作用.
  2. 延迟动作中的效果类似,但精确点在于 错误未定义;相反,解析器恢复为确定性 操作,选择一个未指定的堆栈以继续操作 语法错误.
  3. 在语义谓词(请参阅语义谓词)中, 非确定性分析,YYERROR默默地修剪所调用的分析 测试.
  1. During deterministic GLR operation, the effect of YYERROR is the same as its effect in a deterministic parser.
  2. The effect in a deferred action is similar, but the precise point of the error is undefined; instead, the parser reverts to deterministic operation, selecting an unspecified stack on which to continue with a syntax error.
  3. In a semantic predicate (see Semantic Predicates) during nondeterministic parsing, YYERROR silently prunes the parse that invoked the test.

我不确定我是否正确理解-我是这样理解的:

I'm not sure I understand this correctly - I understand it this way:

  1. 不适用于此处,因为GLR解析不是确定性的
  2. 这就是我在上面代码中的方式,但是不应该这样,因为它完全是随机的,哪个路径会被YYERROR杀死(如果我错了,请纠正我)
  3. 无法解决我的问题,因为语义谓词 (不是语义上的 actions !)必须位于规则的开头(如果我错了,请纠正我),这时从TOK_IDENTIFIER令牌获得的yylval无法访问(如果我输入错了,请纠正我),因此我无法查看符号表来找到变量的类型.
  1. doesn't apply here, because GLR-parsing isn't deterministic
  2. is how I have it in the code above, but shouldn't be done this way, because it is completely random which path is killed by the YYERROR (correct me if I'm wrong)
  3. wouldn't solve my problem, because semantic predicates (not semantic actions!) must be at the beginning of a rule (correct me if I'm wrong), at which point the yylval from the TOK_IDENTIFIER token isn't accessible (correct me if I'm wrong), so I can't look into the symbol table to find the type of the variable.

有人在这种情况下有经验吗?我对手册的理解错了吗?您将如何解决这个问题?对我来说,这个问题似乎很自然,我以为人们会经常碰到这个问题,以至于野牛会内置一个解决方案...

Does anyone have experience with such a situation? Is my understanding of the manual wrong? How would you approach this problem? To me, the problem seems to be so natural that I would assume people run into it often enough that bison would have a solution built-in...

推荐答案

此处的示例有效,但是当我将其与

The example here works, but when I put it together with the solution for Bison: GLR-parsing of valid expression fails without error message I ran into the following problem:

该变量可以由多个标识符确定.您可以有一个数组索引,也可以是另一个对象的成员.我通过使用另一个非终结符来对这种情况进行建模

The variable could be determined by more than one identifier. You could have an array-index or it could be a member of a different object. I modeled this situation by having another non-terminal like

lvalue
    : TOK_IDENTIFIER
    | lvalue '[' arithmeticexpression ']'
    | lvalue '.' TOK_IDENTIFIER

但是当我现在拥有

arithmeticexpression : lvalue
stringexpression : lvalue

,然后像上面一样尝试从非终结符"lvalue"访问对象,我遇到了分段错误.因此,看来这里的方法不适用于更复杂的情况.

and (try to) access the object from the non-terminal "lvalue" like above, I get a segmentation fault. So it seems that the method here doesn't work for more complex situations.

我现在要做的是,我使用语义ACTION访问对象,并且如果变量的类型错误,则将$$设置为nullptr.

What I did instead now is that I access the object in the semantic ACTION and set $$ to nullptr if the variable has the wrong type.

arithmeticexpression
    : lvalue
    {
        $$ = nullptr;
        if(dynamic_cast<IntVariable*>($lvalue->getVariable()))
            $$ = new ArithmeticExpressionIntVariable($lvalue);
    }

然后我必须像这样传播nullptr(这是很多其他代码)

Then I had to propagate the nullptr (this is quite a lot of additional code) like this

arithmeticexpression
    ...
    | arithmeticexpression[left] '+' arithmeticexpressionM[right]
    {
        $$ = nullptr;
        if($left && $right)
            $$ = new ArithmeticExpressionPlus($left, $right);
    }

现在,在

expression
    : arithmeticexpression
    | stringexpression
(note: I have "Expression* expr;" in the "%union{" declaration
and "%type <expression> expr" in the prologue)

我有一个歧义:可以用两种不同的方式来解析相同的输入文本,但是只有其中一种将具有值!= nullptr.此时,我需要一个自定义合并过程,该过程基本上只选择非null值.

I have an ambiguity: the same input text can be parsed in two different ways, but only one of them will have a value != nullptr. At this point, I need a custom merge procedure which basically just selects the non-null value.

为此,我在像这样的野牛文件的序言中进行了前瞻性声明

For this, I forward-declared it in the prologue of my bison file like this

static Expression* exprMerge (yy::semantic_type x0, yy::semantic_type x1);

并像这样在结尾中对其进行定义

and defined it in the epilogue like this

static Expression* exprMerge (YYSTYPE x0, YYSTYPE x1) \
{   
    /* otherwise we'd have a legitimate ambiguity */
    assert(!x0.expr || !x1.expr);
    return x0.expr ? x0.expr : x1.expr;
}

最后,我不得不告诉野牛使用此程序来解决歧义,

finally, I had to tell bison to use this procedure to resolve the ambiguity, using

expression
    : arithmeticexpression  %merge <exprMerge>
    | stringexpression %merge <exprMerge>

老实说,我认为这是相当大的努力,如果bison在尝试合并路径而不是排除路径时测试了语义谓词(至少是规则背后的谓词),则这是不必要的.在它们合并之前.

Honestly, I think this is quite a lot of effort, which wouldn't be necessary if bison tested semantic predicates (at least the ones which are behind the rule) when it tries to merge the paths, not to rule out paths before they merge.

但是至少这可以奏效,并且比收集令牌并对其进行手动排序要省力得多.

But at least this works and is far less effort than collecting the tokens and sorting them manually, which would have been the alternative.

这篇关于Bison:在GLR解析器中按类型/YYERROR区分变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆