创建单独的“布尔表达式".动态语言的规则 [英] Creating a separate "boolean expression" rule for a dynamic language

查看:112
本文介绍了创建单独的“布尔表达式".动态语言的规则的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在Bison中为一种简单的动态键入语言创建语法.我有一个通用" expression规则,该规则有点类似于C中的右值的概念;表达式出现在赋值的右侧,它们也可以作为参数等发送给函数.该规则的简化版本如下:

I'm creating a grammar in Bison for a simple dynamically-typed language. I have a "general" expression rule, which is somewhat akin to the concept of an rvalue in C; expressions appear on the right-hand side of an assignment, they can also be sent to functions as arguments etc. A greatly simplified version of the rule follows:

constantExpression
    : TOK_INTEGER_CONSTANT
    | TOK_FLOAT_CONSTANT
    | stringLiteral
    ;

expression
    : constantExpression
    | identifier
    | booleanExpression
    | booleanExpression TOK_QUESTION_MARK expression TOK_COLON expression
    | TOK_LPAREN expression TOK_RPAREN
    | expression TOK_PLUS expression
    | expression TOK_MINUS expression
    ;

我还有一个专用的布尔表达式规则;布尔表达式最常用于if语句中,但是任何其他需要二进制真值的上下文当然也可以:

I also have a dedicated boolean expression rule; boolean expressions are used most prominently in if statements, but any other context which requires a binary truth value is of course also fine:

booleanExpression
    : identifier
    | expression '<' expression
    | expression '<=' expression
    | expression '==' expression
    | expression '!=' expression
    | expression '>' expression
    | expression '>=' expression
    | booleanExpression '&&' booleanExpression
    | booleanExpression '||' booleanExpression
    | '!' booleanExpression
    | 'true'
    | 'false'
    ;

问题:显然,上述规则遭受大量的减少-减少冲突.根据上下文,标识符应简化为expression(例如,在诸如这样的语句中:myVar2 = myVar1)或booleanExpression(明显的示例:if (myBoolVar))中.

The problem: obviously the above rules suffer from a wealth of reduce-reduce conflicts. An identifier, depending on context, should be reduced either to an expression (for example, in a statement such as this: myVar2 = myVar1), or to a booleanExpression (obvious example: if (myBoolVar)).

不仅如此,还存在与booleanExpresssion减少为expression的事实相关的移位减少错误;当解析器解析了booleanExpression并且遇到&&令牌时,它可以移动并继续前进,或者减小为expression. booleanExpression需要简化为表达式以允许诸如

Not only that, but there are also shift-reduce errors related to the fact that a booleanExpresssion reduces to an expression; when the parser has parsed a booleanExpression, and it encounters a && token, it could either shift and keep going, or reduce to an expression. A booleanExpression needs to reduce to an expression to allow code such as

conditionVar = (var1 == 5) || (var2 > 10 && var3 < 20);
if (conditionVar) { ... }

我知道与运算符优先级相关的shift-reduce冲突,这在这里不是问题,我已经针对运算符使用%left规则对其进行了修复.

I'm aware of shift-reduce conflicts related to operator precedence, that isn't a problem here, I've fixed it using %left rules for the operators.

我的问题:此问题的最佳解决方案是什么?我的想法是

My question: what is the best solution to this problem? My ideas are to either

  1. 以这样的方式构建规则:尽管有冲突,但Bison 做了我想要的,却忽略了警告-相当丑陋, 不可维护的
  2. 删除单独的booleanExpression规则并移动 全部交给expression-好的,但是需要在 语义分析阶段;用我的语言,字符串不是隐式的 可转换为布尔值,因此if ("foo")之类的代码不合法​​,但 将被解析器接受
  3. 使用GLR解析器-感觉有点 这样一个简单的用例就太夸张了,但也许我错了
  1. structure the rules in such a way that despite the conflicts, Bison does what I want, and ignore the warnings - quite ugly and unmaintanable
  2. remove the separate booleanExpression rule and move it all to expression - okay, but requires more checks at the semantic analysis stage; in my language, strings are not implicitly convertible to booleans, so code like if ("foo") is not legal, but would be accepted by the parser
  3. use a GLR parser - feels a bit overkill for such a simple use-case, but perhaps I'm wrong

以上哪个是最好的主意?还有我没有考虑过的其他可能性吗?

Which of the above is the best idea? Are there any other possibilities I haven't considered?

推荐答案

传统观点是:不要尝试在语法中进行语义分析.

Conventional wisdom is: Don't try to do semantic analysis in your grammar.

首先,如您所见,即使可能,它也会使语法复杂化.相比之下,类型检查规则在AST上的树中执行时非常简单.

First, it complicates the grammar, even if it is possible, as you have seen. By contrast, the type-checking rules are very simple when performed in a tree walk over the AST.

第二,这实际上是不可能的.由于您的语言是动态的,因此您实际上并不知道任何变量的类型.因此,编译时检查可能会导致三种情况,而不是两种情况:好,不好和未知.这在语法上会更加复杂,但在语义分析上只会稍微复杂些.

Second, it is not really possible. Since your language is dynamic, you don't really know what the type of any variable is. So a compile-time check could result in three cases, not two: good, bad, and unknown. That will be even more complicated in the grammar, but only slightly more complicated in the semantic analysis.

但是,取决于您的语言的确切性质,可能有可能选择中间立场.通常,某些运算符(布尔运算符和比较)肯定会返回布尔值,而某些上下文肯定需要布尔值.因此,您可以添加boolean_expression非终结符,用于表示结果肯定为布尔值,值必须为布尔值.然后,您可以在语法中插入单个单元产品

However, depending on the precise nature of your language, it might be possible to choose a middle ground. Generally, some operators -- boolean operators and comparisons --- definitely return boolean values, while some contexts definitely require boolean values. So you could add a boolean_expression non-terminal, used to indicate where results will definitely be boolean and where values must be boolean. Then you could insert into your grammar a single unit production

 boolean_expression: expression

具有语义动作,该动作将运行时检查节点插入AST.

with a semantic action which inserts a runtime check node into the AST.

在语义分析过程中,如果确定此检查将始终成功,则可以消除此检查;如果确定其始终将失败,则可以产生错误.否则,最终将发出代码进行检查.

During semantic analysis, this check can be eliminated if it is determined that it would always succeed or an error produced if it is determined that it would always fail. Otherwise, code will eventually be emitted to do the check.

此解决方案的优势在于,语法随后显示了需要布尔值的上下文,而没有遭受为完全执行该要求所必需的拜占庭式修改.

The advantage of this solution is that the grammar then shows the contexts in which a boolean is required, without suffering from the Byzantine modifications necessary to fully enforce the requirement.

(在下面的示例中,我仅显示一个布尔运算符,一个比较运算符和一个算术运算符.显然,一种真正的语言会包含更多的布尔运算符,但它根本不会改变表示形式.我也没有麻烦序言,序言中必须包含运算符的优先级声明.)

(In the examples below, I only show one boolean operator, one comparison operator, and one arithmetic operator. Obviously a real language would have more of each, but it does not change the presentation at all. I also didn't bother with the prologue, which must include precedence declarations for the operators.)

program : stmt_list
stmt_list:%empty
        | stmt_list stmt
stmt    : assign
        | call
        | empty
        | while
        | '{' stmt_list '}'
assign  : IDENTIFIER '=' expr ';'
call    : expr '(' expr_list ')' ';'
        | expr '(' ')' ';'
empty   : ';'
while   : "while" '(' boolean_expr ')' stmt
expr_list
        : expr
        | expr_list ',' expr
boolean_expr
        : boolean_term
        | boolean_expr "or" boolean_expr
        | expr '<' expr
boolean_term
        : "true" | "false"
        | expr               { /* insert conversion from expr to boolean */ }

expr    : term
        | expr '+' expr
term    : INTEGER
        | IDENTIFIER
        | '(' expr ')'

但是它确实对语言有一些限制.在上述最简单的化身中,布尔值只能在布尔上下文中使用,这会阻止布尔值成为一等值.如上面的语法所示,它们不能用作赋值的右侧或函数调用中的参数.

But it does place some restrictions on the language. In the simplest incarnation as presented above, a boolean value can never be used other than in a boolean context, which prevents boolean values from being first-class values. They can't be used as the right-hand side of an assignment or as an argument in a function call, for example, as is clear from the above grammar.

此外,以上语法不允许在布尔表达式周围添加多余的括号.

Also, the above grammar doesn't allow redundant parentheses around boolean expressions.

这不是很令人满意,但是我们可以通过将布尔结果与布尔值分开来做得更好,但语法会稍微复杂些.

That's not very satisfactory, but we can do better by separating boolean results from boolean values at the cost of a slight complication in the grammar.

在大多数语言中,可以根据定义的规则从其他值创建布尔值.按照惯例,转换为布尔值true的值称为真实"值.这可能是非常方便的,尽管如果强制性的范围过宽,也会有些危险. [注1]为控制损害,我们可能只允许在强制布尔上下文中将自动强制布尔值设置为布尔值,而决不允许将布尔值自动强制为非布尔值.如果您愿意接受这些限制,那么我们仍然可以将语法用作记录布尔上下文和强制的工具.

In most languages, boolean values can be created according to defined rules from other values; by convention, a value which is converted to a boolean true is called a "truthy" value. This can be very convenient, although it can also be slightly dangerous if there is too much latitude in the nature of the coercion. [Note 1] To control the damage, we might only allow automatic coercion to boolean in an explicitly boolean context, and never allow a boolean to be automatically coerced to a non-boolean. If you are willing to accept those restrictions, then we can still use the grammar as a tool for documenting boolean contexts and coercions.

下面,我们介绍四个非终端,它们都代表某种表达风格:

In the following, we introduce four non-terminals, all representing some flavour of expression:

  • expr:非布尔表达式
  • boolean_expr:一个专门的布尔表达式;相应的结果列出了必须具有布尔结果的语法.
  • truthy_expr:可以强制转换为布尔表达式的布尔表达式或非布尔表达式.此非终端用于需要布尔值的地方. [注2]
  • either_expr:在上下文中可以是布尔表达式也可以是非布尔表达式,在这种情况下,它们可能都没有强制性出现(例如,赋值和函数自变量).
  • expr: a non-boolean expression
  • boolean_expr: a specifically boolean expression; the corresponding productions list the syntaxes which must have a boolean result.
  • truthy_expr: either a boolean expression or a non-boolean expression which could be coerced to a boolean expression. This non-terminal is used in places where a boolean value is required. [Note 2]
  • either_expr: either a boolean expression or a non-boolean expression in a context in which either might appear without coercion (assignments and function arguments, for example).

请注意,最后两个非终结词的产生完全相同,但语义却非常不同(因此语义动作也不同).因为它们可能出现的上下文是不相交的,所以不会产生冲突.

Note that the last two non-terminals have exactly the same productions, but very different semantics (and thus different semantic actions). Because the contexts in which they might appear are disjoint, no conflict results.

除了上述非终结点的定义及其在各种情况下的用法外,语法没有太大变化:

Other than the definition of the above non-terminals and their use in various contexts, the grammar is not much changed:

program : stmt_list
stmt_list:%empty
        | stmt_list stmt
stmt    : assign
        | call
        | empty
        | while
        | '{' stmt_list '}'
assign  : IDENTIFIER '=' either_expr ';'
call    : expr '(' expr_list ')' ';'
        | expr '(' ')' ';'
empty   : ';'
while   : "while" '(' truthy_expr ')' stmt
expr_list
        : either_expr
        | expr_list ',' either_expr

truthy_expr
        : boolean_expr
        | expr               { /* insert conversion from expr to boolean */ }

either_expr
        : boolean_expr
        | expr

boolean_expr
        : boolean_term
        | truthy_expr "or" truthy_expr
        | expr '<' expr
boolean_term
        : "true"
        | "false"
        | '(' boolean_expr ')'

expr    : term
        | expr '+' expr
term    : INTEGER
        | IDENTIFIER
        | '(' expr ')'

如果您认为以上内容太复杂,请遵循常规知识,并避免在语法中散布语义.另一方面,如果您认为它具有说明性的价值,并且您的语言所允许的限制是可接受的,则可以对其进行调整以适应您的目的.

If you believe the above is too complicated, then go with the conventional wisdom and avoid semantics interspersed in your grammar. If, on the other hand, you feel that it has expository value and your language is such that the restrictions are acceptable, then adapt it to your purposes.

  1. 该方案不取决于是否存在任何真实的"强制,但是如果布尔值是第一类,则将存在只能在运行时确定为布尔值的表达式(布尔变量,返回布尔值的函数)值等).考虑在运行时检查,布尔值上下文中使用的值是布尔值,是强制转换为真实性的一种形式,其中只有true为true,只有false为false,而所有其他值都会引发错误.

  1. The scheme does not depend on the existence of any "truthy" coercion, but if boolean values are first class, there will be expressions which can only be determined to be boolean at runtime (boolean variables, functions returning boolean values, etc.). Consider the run-time check that a value used in a boolean context is a boolean value to be a form of coercion into truthiness, where only true is true and only false is false, while all other values throw an error.

我个人非常喜欢有限的自动布尔强制.对我来说,一个文件对象在错误情况下是虚假的,或者在容器是非空的情况下它是真实的,这对我来说是完全合理的.将这些转换限制为显式布尔上下文(例如条件语句中的条件),使自动强制成为我可以接受的.但是我不坚持这个想法.如果您不喜欢它,请忽略该想法.

Personally, I've grown fond of limited automatic boolean coercions. It makes perfect sense to me that a file object be falsy if it is in an error condition, for example, or that a container be truthy if it is non-empty. Restricting these conversions to explicitly boolean contexts, such as the condition in a conditional statement, makes the automatic coercion acceptable to me. But I don't insist on the idea; if you don't like it, ignore the thought.

这不是一个很好的名字,但是truthy_or_falsy_expr似乎太长了,而boolish_expr看起来太怪异了.欢迎提出建议.

This isn't a very good name, but truthy_or_falsy_expr seemed too long and boolish_expr seemed too weird. Suggestions are welcome.

这篇关于创建单独的“布尔表达式".动态语言的规则的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆