CFG用于python样式的元组 [英] CFG for python-style tuples

查看:116
本文介绍了CFG用于python样式的元组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在无数次阅读了关于Stackoverflow上的如何使用Regex解析HTML的问题之后,我再次对语法产生了兴趣,抓起了大学脚本,几分钟后,我想知道自己过去如何我的考试。

After having read for the zillionth time a question about "How do I parse HTML with Regex" on Stackoverflow, I got myself interested again in grammars, grabbed my university scripts and after a few minutes I wondered how I've ever passed my exams.

作为一个简单的练习(我希望是这样的简单),我试图编写一个生成有效python元组的CFG(仅出于简单起见)使用标识符 a b c )。经过一段美好的时光,我现在想到了这个:

As a simple (well, "simple" I expected it to be) exercise I tried to write a CFG that produces valid python tuples (for simplicity's sake only using the identifiers a, b and c). After some good time I now came up with this:

G = ( {Tuple, TupleItem, Id}, {"a", "b", "c", ",", "(", ")"}, P, Tuple)

成为P:

Tuple → "(" TupleItem ")"
Tuple → "(" TupleItem Id ")"
Tuple → "(" TupleItem Tuple ")"
TupleItem → TupleItem TupleItem
TupleItem → Id ","
TupleItem → Tuple ","
Id → "a"
Id → "b"
Id → "c"

该语法应该产生例如(a,)(a,b)(a,b,)((a,),)((a,b,),(a,),),但不是(,a)()(a,bc)等。我不想产生多余的括号,例如((a),)((a,b))。实际上,有时是可选的(当一项以上时),有时是强制性的(当仅一项时)尾随逗号几乎使我丧命。

This grammar is supposed to produce e.g. (a,), (a,b), (a,b,), ((a,),), ((a,b,),(a,),), but not (,a), (), ,, (a,b c) etc. I do not want to produce superfluous parentheses like ((a),) or ((a,b)). Actually the sometimes optional (when more than one item) and sometimes obligatory (when only one item) trailing comma almost killed me.


  1. 这样做语法产生所有有效的python元组(仅使用 a b c )?

  2. 此语法产生的字符串不是有效的python元组吗?

  3. 此语法是否正确? (我不确定循环准则)

  4. 我的语法为什么这么长才那么奇怪?如何减少生产规则的数量? (不要使用句法糖,例如管道,因为它们只会将几条规则放在一行上。)

  1. Does this grammar produce all valid python tuples (using only a, b and c)?
  2. Does this grammar produce strings that are not valid python tuples?
  3. Is this grammar proper? (I am unsure about the cyclic criterion)
  4. Why is my grammar so freaking long? How can I reduce the number of production rules? (Not by using syntactic sugar like pipes, as those only put several rules onto one line.)

在此先感谢您的评论和答案。

Thanks in advance for your comments and answers.

推荐答案

实际上没有提到Python语法,我很确定您的语法会产生除一个之外的所有有效Python元组。 ((),空的元组),并且它不会产生任何不是Python元组的东西。

Without actually referring to the Python grammar, I'm pretty sure that your grammar produces all valid Python tuples except one ((), the empty tuple), and that it doesn't produce anything which is not a Python tuple. So to that extent, it's fine.

但是,解析的用处不大,因为

However, it's not much use for parsing because

TupleItem → TupleItem TupleItem

是指数模棱两可的。 (Dicho sea de paso,TupleItem并不是这个非终结符的描述性名称,它实际上是一个列表。)歧义语法在遵循上下文无关语法的所有规则的意义上是正确的,但它们应遵循歧义语法通常会更好。

is exponentially ambiguous. (Dicho sea de paso, TupleItem is not a very descriptive name for this non-terminal, which is really a list.) Ambiguous grammars are "proper" in the sense that they obey all the rules for context-free grammars, but unambiguous grammars are usually better.

很容易修复:

Tuple → "(" ")"
Tuple → "(" ItemList "," ")"
Tuple → "(" ItemList "," Item ")"
ItemList → Item
ItemList → ItemList "," Item
Item → Id
Item → Tuple

(我省去了 Id 的产生;在实际语法中, Id 将是一个终端,但没什么区别。 )

(I left out the Id productions; in practical grammars, Id would be a terminal, but it makes little difference.)

最后,为什么这个语法那么长? (我猜七部作品真的那么长吗??取决于您的标准。)

Finally, why is this grammar "so long"? (Is seven productions really "so freaking long?"? Depends on your criteria, I guess.)

简单的答案就是CFG就是这样。您可以添加语法糖来使右边的正则表达式(不仅是交替出现,而且还有Kleene star及其伴奏):

The simple answer is that CFGs are like that. You could add syntactic sugar to make the right-hand sides regular expressions (not just alternation, but also Kleene star and its companions):

Tuple → "(" [ ItemList "," Item? ]? ")"
ItemList → Item // ","
Item → Id | Tuple

在这里,我使用有用的插值运算符 // ,这在学术课程中很少教授,因此很少有实现:

Here I use the useful interpolate operator //, which is rarely taught in academic classes and consequently has surprisingly few implementations:

a // b = def a(ba) *

a // b =def a(ba)*

以上内容是否易于阅读,我留给读者。它与语法说明中常用的EBNF(扩展Backus-Naur格式)相似,特别是在RFC中。 (EBNF是使用插值运算符的少数形式主义之一,尽管其写法并不像我的那样明确。)

Whether or not the above is easier to read, I leave to the reader. It's similar to the EBNF (Extended Backus-Naur Form) commonly used in grammar expositions, particularly in RFCs. (EBNF is one of the few formalisms with an interpolate operator, although its not written as explicitly as mine.)

无论如何,除此之外,我不认为您的语法可以修剪。

Anyway, other than that, I don't believe that your grammar can be trimmed.

这篇关于CFG用于python样式的元组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆