Ebnf –这是LL(1)语法吗? [英] Ebnf – Is this an LL(1) grammar?
问题描述
我在维基百科上找到以下 EBNF ,描述了EBNF:
I found the following EBNF on wikipedia, describing EBNF:
letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
symbol = "[" | "]" | "{" | "}" | "(" | ")" | "<" | ">"
| "'" | '"' | "=" | "|" | "." | "," | ";" ;
character = letter | digit | symbol | "_" ;
identifier = letter , { letter | digit | "_" } ;
terminal = "'" , character , { character } , "'"
| '"' , character , { character } , '"' ;
lhs = identifier ;
rhs = identifier
| terminal
| "[" , rhs , "]"
| "{" , rhs , "}"
| "(" , rhs , ")"
| rhs , "|" , rhs
| rhs , "," , rhs ;
rule = lhs , "=" , rhs , ";" ;
grammar = { rule } ;
现在,由于我对解析器和语法的了解有限,因此我不知道这是否是LL(1)语法.我试图为此编写一个解析器,但是在尝试读取rhs时会失败,后者会重新读取自身,然后再次读取自身,哦,您明白了...
Now, because of my limited knowledge on parsers and grammars, I don't know if this is an LL(1) grammar. I tried to write a parser for it, but it fails when trying to read rhs, which reads itself again, which reads itself again, which oh, you got it...
- 这是LL(1)语法吗?
- 如果没有,如何将其变成一体(可能?)?
推荐答案
引用的维基百科摘录不是EBNF的正确EBNF语法.它也不是左解析的:的确,它是模棱两可的,因此根本不是绝对可解析的.
The quoted Wikipedia extract is not a correct EBNF grammar for EBNF. It's also not left-parseable: indeed, it is ambiguous, so it's not unambiguously parseable at all.
通常,术语LL(k)
和LR(k)
(以及许多其他此类术语)适用于上下文无关文法(CFG)(以及扩展地,这些语法生成的语言) ). EBNF并不是描述CFG的形式主义.它被设计为描述上下文无关语言的形式主义,因此应该可以从给定的EBNF语法创建CFG(但请参见注释1),但是EBNF语法规则与单个语法之间没有直接对应关系.在CFG中进行生产.
In general, the terms LL(k)
and LR(k)
(and many other such terms) apply to Context-Free Grammars (CFGs) (and, by extension, the languages generated by those grammars). EBNF is not a formalism for describing CFGs. It is designed to be a formalism to describe context-free languages and therefore it should be possible to create a CFG from a given EBNF grammar (but see Note 1), but there is not a direct correspondence between an EBNF syntax rule and a single production in a CFG.
也就是说,通常可以使用一些标准转换直接创建CFG.例如:
That said, you can usually directly create a CFG by using some standard transformations. For example:
{ ... }
可以用生成的非末端M''替换,并加上以下内容:(ε
是空字符串)
can be substituted with the generated non-terminal M'', with the addition of the following productions: (ε
is the empty string)
M' → ...
M'' → ε
M'' → M' M''
上述变换不会引入左递归,因此不会人为地使原始语法变为非LL(1).
The above transformation does not introduce left-recursion, so it does not artificially make the original grammar non-LL(1).
引文语法中最重要的错误[注2]是不明确的EBNF规则:
The most important error in the cited grammar [Note 2] is the ambigous EBNF rule:
rhs = identifier
| terminal
| "[" , rhs , "]"
| "{" , rhs , "}"
| "(" , rhs , ")"
| rhs , "|" , rhs
| rhs , "," , rhs
;
它也是左递归的,因此它将不对应于LL(1)CFG.但更重要的是,它不表示|
和,
运算符的关联性或优先级. (从语义上讲,这些运算符没有定义的关联性,但是语法仍应指定一个;否则,就不可能明确创建一个解析树.这两个运算符之间的优先级在语义上很重要.)
It's also left-recursive, so it will not correspond to an LL(1) CFG. But more importantly, it does not indicate either the associativity or the precedence of the |
and ,
operators. (Semantically, these operators do not have a defined associativity, but the syntax should still specify one; otherwise, it is impossible to unambiguously create a parse tree. The precedence between the two operators is important semantically.)
一组更好的规则是:
primary = identifier
| terminal
| "[" , rhs , "]"
| "{" , rhs , "}"
| "(" , rhs , ")"
;
factor = primary , { "|" , primary } ;
rhs = factor , { "," , factor } ;
这仍然是一个过分的简化,但是它涵盖了许多用例.它既不是模棱两可的,也不是左递归的. [3]
This is still an oversimplification, but it covers a large number of use cases. It's neither ambiguous nor left-recursive. [3]
注释
-
但是,注释中指定的语法约束可能不容易转换为CFG.例如,针对EBNF的ISO标准EBNF定义了非终结符语法异常",如下所示:
syntactic exception =
? a syntactic-factor that could be replaced
by a syntactic-factor containing no
meta-identifiers
?
作为常规语言的例外.这很重要,因为两种无上下文语言之间的集合差异不一定是无上下文的,而无上下文语言与常规语言之间的差异是无上下文的.具有讽刺意味的是,描述此限制的特殊序列"不能表示为无上下文语法,因为它取决于元标识符的定义. (如果它说不包含元标识符的语法因素",则无需借助特殊的序列就可以很容易地编写出来,但是显然那不是意图.)
Syntactic constraints specified in comments may not be easy to translate into CFGs, though. For example, the ISO standard EBNF for EBNF defines the non-terminal "syntactic exception" as follows:
syntactic exception =
? a syntactic-factor that could be replaced
by a syntactic-factor containing no
meta-identifiers
?
The intention of the above text is to restrict an exception to be a regular language. That's important, since the set difference between two context-free languages is not necessarily context-free, while the difference between a context-free language and a regular language is provably context-free. Ironically, the "special sequence" describing this restriction cannot be expressed as a context-free grammar because it depends on the definition of meta-identifiers. (If it had said "a syntactic-factor containing no meta-identifiers" it would be easy to write without resorting to a special sequence, but clearly that was not the intent.)
Wikipedia摘录中还有另一个重要错误.它将两种类型的带引号的字符串定义为具有相同的主体,但这是不正确的.双引号字符串不能包含双引号字符,单引号字符串不能包含单引号字符.因此,在这两个定义中都使用标识符character
是不正确的.
There is another important error in the Wikipedia excerpt. It defines both types of quoted strings as having the same body, but that's not correct; a double-quoted string cannot include double-quote characters, and a single-quoted string cannot contain single-quote characters. So the use of the identifier character
in both of those definitions is incorrect.
正式的EBNF语法允许primary
为空.我省去了,因为通常不需要.
The formal EBNF grammar allows primary
to be empty. I left that out, because it's not usually needed.
这篇关于Ebnf –这是LL(1)语法吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!