Ebnf –这是LL(1)语法吗? [英] Ebnf – Is this an LL(1) grammar?

查看:284
本文介绍了Ebnf –这是LL(1)语法吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在维基百科上找到以下 EBNF ,描述了EBNF:

I found the following EBNF on wikipedia, describing EBNF:

letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
   | "H" | "I" | "J" | "K" | "L" | "M" | "N"
   | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
   | "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
symbol = "[" | "]" | "{" | "}" | "(" | ")" | "<" | ">"
   | "'" | '"' | "=" | "|" | "." | "," | ";" ;
character = letter | digit | symbol | "_" ;

identifier = letter , { letter | digit | "_" } ;
terminal = "'" , character , { character } , "'" 
     | '"' , character , { character } , '"' ;

lhs = identifier ;
rhs = identifier
 | terminal
 | "[" , rhs , "]"
 | "{" , rhs , "}"
 | "(" , rhs , ")"
 | rhs , "|" , rhs
 | rhs , "," , rhs ;

rule = lhs , "=" , rhs , ";" ;
grammar = { rule } ;

现在,由于我对解析器和语法的了解有限,因此我不知道这是否是LL(1)语法.我试图为此编写一个解析器,但是在尝试读取rhs时会失败,后者会重新读取自身,然后再次读取自身,,您明白了...

Now, because of my limited knowledge on parsers and grammars, I don't know if this is an LL(1) grammar. I tried to write a parser for it, but it fails when trying to read rhs, which reads itself again, which reads itself again, which oh, you got it...

  • 这是LL(1)语法吗?
  • 如果没有,如何将其变成一体(可能?)?

推荐答案

引用的维基百科摘录不是EBNF的正确EBNF语法.它也不是左解析的:的确,它是模棱两可的,因此根本不是绝对可解析的.

The quoted Wikipedia extract is not a correct EBNF grammar for EBNF. It's also not left-parseable: indeed, it is ambiguous, so it's not unambiguously parseable at all.

通常,术语LL(k)LR(k)(以及许多其他此类术语)适用于上下文无关文法(CFG)(以及扩展地,这些语法生成的语言) ). EBNF并不是描述CFG的形式主义.它被设计为描述上下文无关语言的形式主义,因此应该可以从给定的EBNF语法创建CFG(但请参见注释1),但是EBNF语法规则与单个语法之间没有直接对应关系.在CFG中进行生产.

In general, the terms LL(k) and LR(k) (and many other such terms) apply to Context-Free Grammars (CFGs) (and, by extension, the languages generated by those grammars). EBNF is not a formalism for describing CFGs. It is designed to be a formalism to describe context-free languages and therefore it should be possible to create a CFG from a given EBNF grammar (but see Note 1), but there is not a direct correspondence between an EBNF syntax rule and a single production in a CFG.

也就是说,通常可以使用一些标准转换直接创建CFG.例如:

That said, you can usually directly create a CFG by using some standard transformations. For example:

{ ... }

可以用生成的非末端M''替换,并加上以下内容:(ε是空字符串)

can be substituted with the generated non-terminal M'', with the addition of the following productions: (ε is the empty string)

M'  → ...
M'' → ε
M'' → M' M''

上述变换不会引入左递归,因此不会人为地使原始语法变为非LL(1).

The above transformation does not introduce left-recursion, so it does not artificially make the original grammar non-LL(1).

引文语法中最重要的错误[注2]是不明确的EBNF规则:

The most important error in the cited grammar [Note 2] is the ambigous EBNF rule:

rhs = identifier
    | terminal
    | "[" , rhs , "]"
    | "{" , rhs , "}"
    | "(" , rhs , ")"
    | rhs , "|" , rhs
    | rhs , "," , rhs
    ;

它也是左递归的,因此它将不对应于LL(1)CFG.但更重要的是,它不表示|,运算符的关联性或优先级. (从语义上讲,这些运算符没有定义的关联性,但是语法仍应指定一个;否则,就不可能明确创建一个解析树.这两个运算符之间的优先级在语义上很重要.)

It's also left-recursive, so it will not correspond to an LL(1) CFG. But more importantly, it does not indicate either the associativity or the precedence of the | and , operators. (Semantically, these operators do not have a defined associativity, but the syntax should still specify one; otherwise, it is impossible to unambiguously create a parse tree. The precedence between the two operators is important semantically.)

一组更好的规则是:

primary = identifier
        | terminal
        | "[" , rhs , "]"
        | "{" , rhs , "}"
        | "(" , rhs , ")"
        ;
factor  = primary , { "|" , primary } ;
rhs     = factor ,  { "," , factor } ;

这仍然是一个过分的简化,但是它涵盖了许多用例.它既不是模棱两可的,也不是左递归的. [3]

This is still an oversimplification, but it covers a large number of use cases. It's neither ambiguous nor left-recursive. [3]

注释

    但是,注释中指定的语法约束可能不容易转换为CFG.例如,针对EBNF的ISO标准EBNF定义了非终结符语法异常",如下所示:

    syntactic exception =
       ? a syntactic-factor that could be replaced
         by a syntactic-factor containing no
         meta-identifiers
       ?


    作为常规语言的例外.这很重要,因为两种无上下文语言之间的集合差异不一定是无上下文的,而无上下文语言与常规语言之间的差异是无上下文的.具有讽刺意味的是,描述此限制的特殊序列"不能表示为无上下文语法,因为它取决于元标识符的定义. (如果它说不包含元标识符的语法因素",则无需借助特殊的序列就可以很容易地编写出来,但是显然那不是意图.)
  1. Syntactic constraints specified in comments may not be easy to translate into CFGs, though. For example, the ISO standard EBNF for EBNF defines the non-terminal "syntactic exception" as follows:

    syntactic exception =
       ? a syntactic-factor that could be replaced
         by a syntactic-factor containing no
         meta-identifiers
       ?


    The intention of the above text is to restrict an exception to be a regular language. That's important, since the set difference between two context-free languages is not necessarily context-free, while the difference between a context-free language and a regular language is provably context-free. Ironically, the "special sequence" describing this restriction cannot be expressed as a context-free grammar because it depends on the definition of meta-identifiers. (If it had said "a syntactic-factor containing no meta-identifiers" it would be easy to write without resorting to a special sequence, but clearly that was not the intent.)

Wikipedia摘录中还有另一个重要错误.它将两种类型的带引号的字符串定义为具有相同的主体,但这是不正确的.双引号字符串不能包含双引号字符,单引号字符串不能包含单引号字符.因此,在这两个定义中都使用标识符character是不正确的.

There is another important error in the Wikipedia excerpt. It defines both types of quoted strings as having the same body, but that's not correct; a double-quoted string cannot include double-quote characters, and a single-quoted string cannot contain single-quote characters. So the use of the identifier character in both of those definitions is incorrect.

正式的EBNF语法允许primary为空.我省去了,因为通常不需要.

The formal EBNF grammar allows primary to be empty. I left that out, because it's not usually needed.

这篇关于Ebnf –这是LL(1)语法吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆