是C#的lambda表达式语法LALR(1)? [英] Is C#'s lambda expression grammar LALR(1)?

查看:150
本文介绍了是C#的lambda表达式语法LALR(1)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我要问的问题是succintly在标题中给出。让我给有问题的语法的例子:

The question I wish to ask is succintly given in the title. Let me give an example of the grammar in question:

identifier_list
    : identifier
    | identifier_list identifier;

lambda_arguments
    : '(' identifier_list ')'
    | identifier;

lambda
    : lambda_arguments '=>' expression



然后,我们在普通的C表达式添加grammar-特别,

Then we add in the normal C expression grammar- particularly,

primary_expression
    : '(' expression ')'
    | identifier
    | lambda;



真正的问题是,这是文法LALR(1)可解析,也就是说,能够通过被解析自动化解析器生成?还是需要手工轧制或GLR分析器?请注意,我想这个款,而不是上下文相关的关键字的东西或其他任何部分具体了解。

The real question is, is this grammar LALR(1) parsable, i.e., capable of being parsed by automated parser generators? Or does it require a hand-rolled or GLR parser? Note that I wish to know specifically about this subsection, not the context-sensitive keywords stuff or any other section.

我在想什么,现在是,如果解析器看到'('标识'),这有两种有效的解析,所以如果解析器看到标识符,展望到'),它将不能够决定哪些解析树往下走。这可能仅仅是一个转变/减少冲突,不过,我可以通过指定一些任意的优先级(可能favouriing '('标识'))。

What I'm thinking right now is that if the parser sees '(' identifier ')', this has two valid parses, so if the parser sees identifier, looks ahead to ')', it won't be able to decide which parse tree to go down. This could just be a shift/reduce conflict, though, that I could eliminate by assigning some arbitrary precedence (probably favouriing '(' identifier ')' ).

编辑:其实,我正在考虑<击>窃取使用语法本款为一门新语言类似的功能。我已经在语法形式类似于JavaScript的匿名函数,但我的<击>豚鼠尖叫声反馈抱怨他们对很多用途太冗长,并指出了C#lambda表达式作为更理想的解决方案。我很担心,从这个解决方案产生的潜在的不确定性。所以,真的,我只是在该款感兴趣。像泛型和类型转换其他的东西都是非问题对我来说。

Actually, I was considering stealing using this subsection of the grammar for a similar feature in a new language. I already have anonymous functions similar to JavaScript in grammatical form but my guinea pigs squeaks feedback complains they are too verbose for many uses, and pointed out the C# lambda expressions as a more ideal solution. I was concerned about potential ambiguity resulting from this solution. So, really, I was only interested in that subsection. The other stuff like generics and casts are non-issues for me.

我的语法以前的版本是机械地解析的,我不想失去这个属性,我以前用机械发电机的经验告诉我,这是最好的检查这里而不是试图自己。对于我的手挽解析器,我当然可以简单的特殊情况(标识符向前看远一点比正常。

Previous editions of my grammar are mechanically parsable and I wouldn't want to lose this property, and my previous experience with a mechanical generator tells me that it's best to check here rather than try myself. For my hand-rolled parser, I could of course simply special case '(' identifier to look ahead a bit further than normal.

推荐答案

表达式文法用C#风格lambda表达式增加是不LALR(1),但它可能是LALR(2),因此,它是可能的(虽然不一定琐碎)产生相等的LALR(1)语法:参见下面编辑

An expression grammar augmented with C#-style lambdas is not LALR(1), but it's probably LALR(2). Consequently, it is possible (though not necessarily trivial) to produce an equivalent LALR(1) grammar: see edit below.

您会得到一个减少/降低输入冲突:

You're going to get a reduce/reduce conflict on the input:

( id )

因为 ID 既可以降低到 identifier_list 表达式 (间接地,在第二种情况下),解析器无法分辨哪一个基于一个超前记号()。

because id can either be reduced to identifier_list or to expression (indirectly, in the second case), and the parser cannot tell which one is correct based on one lookahead token ()).

它可以告诉基于两个先行标记,因为 identifier_list 减少只可能是第二个下一个标记是 => ,只要 => 是不是在你的语言操作,在表达式如果第二下一个标记是 =>还原是不可能的; 。所以我认为它可能是LALR(2),虽然我不能说肯定。

It could tell based on two lookahead tokens, since the identifier_list reduction is only possible if the second next token is =>, and as long as => is not an operator in your language, the expression reduction is not possible if the second next token is =>. So I think it's probably LALR(2), although I can't say that with certainty.

在有一个以上的标识符的情况是没有问题的,因为在

The case where there is more than one identifier is not problematic, since in

( id1 id2 )

ID1 ID2 不能被减少到一个表达式(在大多数的表达语言;你的可能,当然,不同)。在一个单一的加括号的标识符,后面紧跟着的情况下 => 也没有问题,只要'=>'不是一个有效的运营商

id1 id2 cannot be reduced to an expression (in most expression languages; yours may, of course, differ). The case where a single unparenthesized identifier is immediately followed by => is also not problematic provided that `=>' is not a valid operator.

修改

我忽略了我原来的答复提到,有没有这样的事,作为一个LALR(2 ) 语言。由LALR识别的语言(2)语法也由一些LALR(1)文法识别。事实上,有这种说法的一个建设性的证据,这使得这样的LALR(1)文法的机械制作,具有恢复原始分析树的过程一起。

I neglected to mention in my original answer that there is no such thing as an LALR(2) language. The language recognized by a LALR(2) grammar is also recognized by some LALR(1) grammar. In fact, there is a constructive proof of this assertion, which allows the mechanical creation of such an LALR(1) grammar, along with a procedure to recover the original parse tree.

在这种情况下,这是更简单的,以产生一个LALR(1)的语法中,由于如上所述,只有一个生产需要额外的超前。该解决方案是由一个令牌延迟的降低。换句话说,在原有的语法包括类似:

In this case, it's even simpler to generate an LALR(1) grammar, since as mentioned above there is only one production which requires additional lookahead. The solution is to delay the reduction by one token. In other words, in the original grammar includes something like:

primary:           '(' expression ')'
lambda_parameters: '(' id_list ')'

其中两个 ID_LIST 表达式导出终端 ID 。除了 ID ,这两个非终端的推导是不相交的,所以我们可以如下解决问题:

where both id_list and expression derive the terminal ID. Aside from ID, the derivations of these two non-terminals are disjoint, so we could resolve the issue as follows:

primary:           '(' expression_not_id ')'
       |           '(' ID ')'


lambda_parameters: '(' id_list_not_id ')'
                 | '(' ID ')' "



它仍然只是来划分生产为表达式 ID_LIST ,以便分离出 ID 情况下,这原来是是不是很难下面是一个简化的例子,它可以很容易地扩展;它仅限于加法,乘法和函数应用(我包括证明两个逗号分隔的列表是没有问题的):

It remains only to divide the productions for expression and id_list so as to separate out the ID case, which turns out to be not very difficult. Below is a simplified example, which could be easily extended; it's restricted to addition, multiplication and function application (which I included to demonstrate that the two comma-separated lists are not a problem):

%token ID LITERAL RIGHT_ARROW
%start expr
%%
primary: primary_not_id | ID ;
term:    term_not_id    | ID ;
sum:     sum_not_id     | ID ;
expr:    expr_not_id    | ID ;

expr_list: expr         | expr_list ',' expr ;
arguments: '(' ')'      | '(' expr_list ')' ;

ids: ID ',' ID          | ids ',' ID ;
parameters: '(' ID ')'  | '(' ids ')' ;

primary_not_id: LITERAL
              | '(' expr_not_id ')'
              | '(' ID ')'
              | primary arguments
              ;

term_not_id: primary_not_id
           | term '*' primary
           ;

sum_not_id: term_not_id
          | sum '+' term
          ;

expr_not_id: sum_not_id
           | parameters RIGHT_ARROW expr
           ;

请注意:在OP的语法产生具有多个参数的lambda表达式不会用逗号分隔的标识符序列:(AB)=> A + B 。我认为,实际的目的是使用逗号:(A,B)=> A + B ,这就是我在上面的语法一样。所不同的是很重要的,如果你的语言有一个逗号运算符,如C家呢,因为在这种情况下的表达可能是(expression_list'),它与冲突一个lambda参数列表。一个天真的实施将导致减少/减少在 expression_list 不能用有限的解决的第一个表达式冲突向前看,因为 expression_list 可以任意长

Note: The grammar in the OP produces lambdas with multiple parameters as a sequence of identifiers not separated by commas: (a b) => a + b. I think that the actual intention was to use commas: (a, b) => a + b, and that's what I did in the above grammar. The difference is important if your language has a comma operator, as the C family does, because in that case an expression might be '(' expression_list ')', which conflicts with a lambda parameter list. A naïve implementation would result in a reduce/reduce conflict on the first expression in the expression_list which cannot be resolved with finite lookahead, since the expression_list could be arbitrarily long.

有对这种情况下的解决方案,以及,虽然:它由分离的 ID_LIST expression_list ,类似如下:

There is a solution for this case as well, though: it consists of separating id_list from expression_list, something like the following:

id_list:         ID
       |         id_list ',' ID
       ;
expression_list_not_id_list: expression_not_id
                           | id_list ',' expression_not_id
                           | expression_list_not_id_list ',' expression
                           ;
expression_list: expression_list_not_id_list
               | id_list
               ;



我没有做一个完整的语法,虽然,因为我不知道什么样的目标语言要求

I didn't do a full grammar, though, since I have no idea what the target language requires.

这篇关于是C#的lambda表达式语法LALR(1)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆