pyparsing真的是一个递归下降解析器吗? [英] Is pyparsing really a recursive descent parser?

查看:85
本文介绍了pyparsing真的是一个递归下降解析器吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

pyparsing真的是一个递归下降解析器吗?我问这个是因为

有语法它不能解析我的递归下降解析器会解析,如果我写了一个。例如:

来自pyparsing import *


语法= OneOrMore(Word(alphas))+ Literal(''end'')

grammar.parseString(''First Second Third end'')

令人惊讶的是,这将无法解析!

现在,也许我不知道我是什么我正在谈论但不应该

递归下降解析器以递归方式下降所有可能的规则

组合来发现一个有效的?为什么pyparsing无法解析

这个?有没有办法让pyparsing解析这样的语法?

谢谢...

解决方案

周五,2007年11月2日06:05:13 +0000,只是环境的另一个受害者

道德写道:


pyparsing真的是一个递归血统解析器?我问这个是因为

有语法它不能解析我的递归下降解析器会解析,如果我写了一个。例如:


来自pyparsing import *


语法= OneOrMore(Word(alphas))+ Literal(''end'')

grammar.parseString(''First Second Third end'')


令人惊讶的是,这将无法解析!

现在,也许我不知道我在说什么,但是不应该...... b $ b递归下降解析器递归地通过所有可能的规则

组合来发现一个有效的?为什么pyparsing无法解析

这个?



Pyparsing不是递归下降解析器。它不会返回输入

流。 ``OneOrMore(Word(alphas))``part'吃' ''end''当它没有得到更多时,解析器移动到``Literal(''end'')``part

失败是因为''end''已经消失了。


有没有办法让pyparsing解析这样的语法?



否定前瞻可能:


语法=(OneOrMore(NotAny(Literal(''end''))+ Word (alphas))

+ Literal(''end''))


Ciao,

Marc''BlackJack'' Rintsch


2007-11-04,Kay Schluehr< ka ********** @ gmx.netwrote:


11月4日,03:07,Neil Cerutti< horp ... @ yahoo.comwrote:


> I不会把它描述为假装。你会怎么解析:

你好结束你好结束

WORD END WORD END和WORD WORD WORD END根据语法,它们都是有效的解释。

一旦你从语法中消除歧义,PyParsing就会开始正常工作。



我认为你对此是正确的。但是我不确定它有多重要。只要看看Pythons Grammar

http://svn.python.org/view/python/tr...46&view=markup


无特殊关键字处理这个语法将是

ambigous并且无法使用LL(1)解析器进行解析。



我同意。我不知道使用

PyParsing创建关键字是多么容易,或者如果有问题的语法仍然被作者认为是正确的。


语法编译器构建解析器表会为每个关键字创建一个

特殊标签。当

NAME标记进入解析器时,将过滤此标签。使用

所属的标签,例如''if''或''while''正确的陈述可以是

在恒定时间内选择。当我使用你的EBNF语法的解析器

生成器时,也会发生同样的情况。通过更多的改编

,也可以过滤NUMBER个令牌。但这将是

过度设计。


理论美在这里使用合理的默认值进行了妥协

假设保持语法简单(" ;约会

配置借用Rails口号)。



关键字几乎无处不在。我以前从来没有想过它们是不可能的。


标记化是Python中的另一个问题。由于显着的空格和线条延续,它确实有点特别值得b $ b特别是

标记化在概念上要简单得多,而且很可能会抛出各种各样的

词法分析阶段的选项和特殊情况处理。



它可能是PyParsing中的一个快速修复,它包含一个关键字
正则表达式或NotAny类型中使用(如前所述)负面预测。

< blockquote class =post_quotes>


>> goal = OneOrMore(NotAny(Literal(''end''))+ Word(alphas))+ Literal(''end '')
goal.parseString(''hello hello hello end'')



([''hello' ',''你好'',''你好'',''结束''],{})


>> goal.parseString(''hello end hello end'')



([' 'hello'',''end''],{})


不需要扫描仪/标记器! ;)


-

Neil Cerutti




" ; Neil Cerutti < ho ***** @ yahoo.com在留言中写道

新闻:wP ******************* @ newsreading01.news.tds 。 net ...


2007-11-04,环境道德的另一个受害者

< ih ***** **@hotmail.comwrote:


>>
" Neil Cerutti" < ho ***** @ yahoo.com写了留言
新闻:oz ******************* @ newsreading01.news.tds .net .. 。


>>>
语法中是否有歧义?

在EBNF中:

目标--WORD {WORD}结束

WORD是''[a-zA-Z] +''
结束是''结束''

我认为PyParsing无法猜出该语法的作曲家是什么意思。


一种解释符合语法,而另一种解释则不符合。你会认为与语法一致的解释是更好的选择,所以应该是程序。其次,即使它是含糊不清的...那么什么?
pyparsing'的当前行为是返回一个解析错误,
假装不能解析该字符串。理想情况下,或许它应该提醒你注意模糊性,但是,当然,返回_a_有效解析比假装字符串
根本无法解析更好。 ...



我不会把它描述为假装。你会怎么解析:


你好结束你好结束


WORD END WORD END和WORD WORD WORD END根据语法,这些都是有效的。
解释。



...如果解析器解析其中一个,那将是很好的,因为

它们都是正确的。有一个以上的正确答案与

没有答案是不一样的,这就是pyparsing声称的......


一旦从语法中删除歧义,PyParsing

就会开始正常工作。



这根本不是真的。试试这个:

语法= OneOrMore(Word(alphas))+ Literal(''end'')+ Literal(''。'')

grammar.parseString( ''First Second Third end。'')

...再次,这将无法解析。歧义在哪里?

此外,解析含糊不清的语法是一个有用的特性。并非所有被解析的
语法都是由进行解析的人设计的......


考虑编写一个递归的正确解析器用手解析

语言''[ab] + b''。


目标--ab_list''b''

ab_list - ''''list_tail

ab_list - ''b''list_tail

list_tail - ''a''list_tail

list_tail - ''b''list_tail

list_tail --null


上面有完全相同的bug(可能还有其他一些 - 我是

抱歉无法直接测试它作为PyParsing解决方案。


错误在于语法。它可以通过指定

''b''必须后跟EOF来修复,然后可以使用

多个前瞻字符进行编码。



我不完全理解你用来描述你的递归下降解析器的

产生的语法,所以我不仅没有跟着它

但是我无法弄清楚你的其他帖子。你能解释一下

更多细节吗?指向''null'的最后一部分特别是

令人困惑......

如前所述,它不仅仅是语法。还有

的情况是明确的,pyparsing不能简单解析和

没有理由。

此外,模棱两可的语法是生活中的事实,我们中的一些人需要用b $ b来解析它们。它通常也没问题。考虑一个前面的例子:

语法= OneOrMore(Word(alphas))+ Literal(''end'')

虽然你可能认为这本身就含糊不清,但是''通常不是。

也就是说,只要很少使用''结束''而不是在字符串的结尾处,这将是简单的解析,然而,pyparsing将一直

无法解析它...


Is pyparsing really a recursive descent parser? I ask this because
there are grammars it can''t parse that my recursive descent parser would
parse, should I have written one. For instance:
from pyparsing import *

grammar = OneOrMore(Word(alphas)) + Literal(''end'')
grammar.parseString(''First Second Third end'')
Amazingly enough, this will fail to parse!
Now, maybe I have no idea what I''m talking about but shouldn''t a
recursive descent parser recursively descend through all possible rule
combinations to discover one that works? Why does pyparsing fail to parse
this? Is there a way to get pyparsing to parse a grammar like this?
Thank you...

解决方案

On Fri, 02 Nov 2007 06:05:13 +0000, Just Another Victim of the Ambient
Morality wrote:

Is pyparsing really a recursive descent parser? I ask this because
there are grammars it can''t parse that my recursive descent parser would
parse, should I have written one. For instance:
from pyparsing import *

grammar = OneOrMore(Word(alphas)) + Literal(''end'')
grammar.parseString(''First Second Third end'')
Amazingly enough, this will fail to parse!
Now, maybe I have no idea what I''m talking about but shouldn''t a
recursive descent parser recursively descend through all possible rule
combinations to discover one that works? Why does pyparsing fail to parse
this?


Pyparsing is no recursive descent parser. It doesn''t go back in the input
stream. The ``OneOrMore(Word(alphas))`` part "eats" the ''end'' and when it
can''t get more, the parser moves to the ``Literal(''end'')`` part which
fails because the ''end'' is already gone.

Is there a way to get pyparsing to parse a grammar like this?

Negative lookahead maybe:

grammar = (OneOrMore(NotAny(Literal(''end'')) + Word(alphas))
+ Literal(''end''))

Ciao,
Marc ''BlackJack'' Rintsch


On 2007-11-04, Kay Schluehr <ka**********@gmx.netwrote:

On 4 Nov., 03:07, Neil Cerutti <horp...@yahoo.comwrote:

>I wouldn''t characterize it as pretending. How would you parse:

hello end hello end

"WORD END WORD END" and "WORD WORD WORD END" are both valid
interpretations, according to the grammar.

As soon as you remove the ambiguity from the grammar, PyParsing
starts to work correctly.


I think you are correct about this. But I''m not sure how much
it shall matter. Just take a look at Pythons Grammar

http://svn.python.org/view/python/tr...46&view=markup

Without special keyword treatment this grammar would be
ambigous and couldn''t be parsed using an LL(1) parser.

I agree. I don''t know how easy it is to create keywords using
PyParsing, or if the grammar in question would still be
considered correct by the author.

The "grammar compiler" which builds the parser tables creates a
special label for each keyword. This label is filtered when a
NAME token is feeded into the parser. With the label that
belongs to e.g. ''if'' or ''while'' the correct statement can be
selected in constant time. Same happens when I use the parser
generator with your EBNF grammar. With a little more adaption
also NUMBER token could be filtered. But this would be
overdesign.

Theoretical beauty is compromised here using reasonable default
assumptions for keeping the grammar simple ( "convention over
configuration" to borrow a Rails slogan ).

Keywords are practically ubiquitous. I never thought of them as
unbeautiful before.

Tokenization is another issue in Python. It is indeed somewhat
special due to significant whitespace and line continuation but
tokenization is conceptually much simpler and one would likely
throw all kinds of options and special case handling in the
lexical analysis phase.

It might be a quick fix in PyParsing, which includes a Keyword
type, but without the semantics that are needed in this case. You
have to use (as suggested earlier) negative lookahead in either a
Regex or with the NotAny type.

>>goal = OneOrMore(NotAny(Literal(''end'')) + Word(alphas)) + Literal(''end'')
goal.parseString(''hello hello hello end'')

([''hello'', ''hello'', ''hello'', ''end''], {})

>>goal.parseString(''hello end hello end'')

([''hello'', ''end''], {})

No scanner/tokenizer needed! ;)

--
Neil Cerutti



"Neil Cerutti" <ho*****@yahoo.comwrote in message
news:wP*******************@newsreading01.news.tds. net...

On 2007-11-04, Just Another Victim of the Ambient Morality
<ih*******@hotmail.comwrote:

>>
"Neil Cerutti" <ho*****@yahoo.comwrote in message
news:oz*******************@newsreading01.news.tds .net...

>>>
Is there not an ambiguity in the grammar?

In EBNF:

goal --WORD { WORD } END

WORD is ''[a-zA-Z]+''
END is ''end''

I think it is fine that PyParsing can''t guess what the composer
of that grammar meant.


One interpretation conforms to the grammar while the other
doesn''t. You would assume that the interpretation that agrees
with the grammar would be the preferable choice and so should
the program. Secondly, even if it is an ambiguity... so what?
pyparsing''s current behaviour is to return a parse error,
pretending that the string can''t be parsed. Ideally, perhaps
it should alert you to the ambiguity but, surely, it''s better
to return _a_ valid parsing than to pretend that the string
can''t be parsed at all...


I wouldn''t characterize it as pretending. How would you parse:

hello end hello end

"WORD END WORD END" and "WORD WORD WORD END" are both valid
interpretations, according to the grammar.

...and it would be nice if the parser were to parse one of them since
they are both right. Having more than one right answer is not the same as
having no answer, which is what pyparsing claims...

As soon as you remove the ambiguity from the grammar, PyParsing
starts to work correctly.

This is simply not true. Try this:
grammar = OneOrMore(Word(alphas)) + Literal(''end'') + Literal(''.'')
grammar.parseString(''First Second Third end.'')
...again, this will fail to parse. Where''s the ambiguity?
Besides, parsing ambiguous grammars is a useful feature. Not all
grammars being parsed are designed by those doing the parsing...

Consider writing a recursive decent parser by hand to parse the
language ''[ab]+b''.

goal --ab_list ''b''
ab_list --''a'' list_tail
ab_list --''b'' list_tail
list_tail --''a'' list_tail
list_tail --''b'' list_tail
list_tail --null
The above has the exact same bug (and probably some others--I''m
sorry unable to test it just now) as the PyParsing solution.

The error is in the grammar. It might be fixed by specifying that
''b'' must be followed by EOF, and then it could be coded by using
more than one character of lookahead.

I don''t exactly understand the syntax you used to describe the
productions of your recursive descent parser so not only did I not follow it
but I couldn''t make out the rest of your post. Could you explain in a
little more detail? The last part that points to ''null'' is especially
confusing...
As demonstrated earlier, it''s not just the grammar. There are
situations that are unambiguous that pyparsing can''t parse simply and
there''s no reason for it.
Besides, ambiguous grammars are a fact of life and some of us need to
parse them. It''s usually okay, too. Consider a previous example:
grammar = OneOrMore(Word(alphas)) + Literal(''end'')
While you may consider this inherently ambiguous, it''s usually not.
That is to say, as long as it is rare that ''end'' is used not at the end of
the string, this will simply parse and, yet, pyparsing will consistently
fail to parse it...


这篇关于pyparsing真的是一个递归下降解析器吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆