Python regular expressions just ain't PCRE


Problem description


I'm kind of disappointed with the re regular expressions module. In
particular, the lack of support for recursion ( (?R) or (?n) ) is a
major drawback to me. There are so many great things that can be
accomplished with regular expressions this way, such as validating a
mathematical expression or parsing a language with nested parens,
quoting or expressions.
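
For comparison, a check for balanced nested parentheses can be approximated with `re` alone by repeatedly deleting the innermost groups instead of recursing. This is only a sketch of a workaround, not an equivalent of `(?R)`, and the name `is_balanced` is made up for illustration:

```python
import re

# Matches one innermost group: parentheses with no parentheses inside.
INNERMOST = re.compile(r"\([^()]*\)")


def is_balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    while True:
        # Strip all innermost groups; stop once a pass removes nothing.
        text, count = INNERMOST.subn("", text)
        if count == 0:
            break
    # Any parenthesis left over is unmatched.
    return "(" not in text and ")" not in text
```

It only answers yes or no, so it does not replace a recursive pattern that also captures the nested parts.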

Another feature I'm missing is once-only subpatterns and possessive
quantifiers ( (?>...) and ?+ *+ ++ {...}+ ) which are great to avoid
deep recursion and inefficiency in some complex patterns with nested
quantifiers. Even java.util.regex supports them.
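
One commonly cited workaround, sketched here rather than offered as a replacement, is to emulate a once-only subpattern with a lookahead plus a backreference: `(?=(X))\1` behaves much like `(?>X)` because the engine does not backtrack into a lookahead once it has matched. The sample string and patterns below are illustrative only. (Python 3.11 and later accept `(?>...)` and possessive quantifiers in `re` directly.)

```python
import re

text = "aaab"

# Ordinary greedy quantifier: a+ gives one 'a' back so that 'ab' can match.
backtracking = re.match(r"a+ab", text)

# Emulated atomic group: the lookahead captures the longest run of 'a',
# the backreference \1 consumes exactly that run, and the engine does not
# backtrack into the lookahead to try a shorter run.
atomic_like = re.match(r"(?=(a+))\1ab", text)

print(backtracking is not None)  # True
print(atomic_like is None)       # True
```

The emulation costs an extra capturing group, so group numbers in the rest of the pattern shift by one.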

Are there any plans to support these features in re? These would be
great features for Python 2.6, they wouldn't clutter anything, and
they'd mean one less reason left to use Perl instead of Python.

Note: I know there are LALR parser generators/parsers for Python, but
the very reason why re exists is to provide a much simpler, more
productive way to parse or validate simple languages and process text.
(The pyparse/yappy/yapps/<insert your favourite Python parser
generator here> argument could have been used to skip regular
expression support in the language, or to deprecate re. Would you want
that? And following the same rule, why would we have Python when
there's C?)

Recommended answer





"Wiseman" <Wi*********@gmail.com> wrote in message
news:11*********************@e65g2000hsc.googlegroups.com...
| I'm kind of disappointed with the re regular expressions module.

I believe the current Python re module was written to replace the Python
wrapping of pcre in order to support unicode.

| In particular, the lack of support for recursion ( (?R) or (?n) ) is a
| major drawback to me.

I don't remember those being in the pcre Python once had. Perhaps they are
new.

| Are there any plans to support these features in re?

I have not seen any. You would have to ask the author. But I suspect that
this would be a non-trivial project outside his needs.

tjr


In <11*********************@e65g2000hsc.googlegroups.com>, Wiseman wrote:

Note: I know there are LALR parser generators/parsers for Python, but
the very reason why re exists is to provide a much simpler, more
productive way to parse or validate simple languages and process text.
(The pyparse/yappy/yapps/<insert your favourite Python parser
generator here> argument could have been used to skip regular
expression support in the language, or to deprecate re. Would you want
that? And following the same rule, why would we have Python when
there's C?)




I don't follow your reasoning here. `re` is useful for matching tokens
for a higher level parser and C is useful for writing parts that need
hardware access or "raw speed" where pure Python is too slow.

Regular expressions can become very unreadable compared to Python source
code or EBNF grammars but modeling the tokens in EBNF or Python objects
isn't as compact and readable as simple regular expressions. So both `re`
and higher level parsers are useful together and don't supersede each
other.

The same holds for C and Python. IMHO.

Ciao,
Marc 'BlackJack' Rintsch
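
The division of labour described in this reply, with `re` matching the tokens and a higher-level parser handling the nesting, can be sketched roughly as follows. It assumes a toy arithmetic grammar, and the names `TOKEN`, `tokenize` and `evaluate` are hypothetical, not taken from any library:

```python
import re

# One compact regex describes the tokens; the grammar lives in plain Python.
TOKEN = re.compile(r"\s*(?:(?P<num>\d+(?:\.\d+)?)|(?P<op>[-+*/()]))")


def tokenize(text):
    """Split text into ('num', '3') and ('op', '+') style pairs."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def evaluate(text):
    """Validate and evaluate an expression by recursive descent."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        # expr := term (('+' | '-') term)*
        value = term()
        while peek() in (("op", "+"), ("op", "-")):
            op = peek()[1]
            advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():
        # term := factor (('*' | '/') factor)*
        value = factor()
        while peek() in (("op", "*"), ("op", "/")):
            op = peek()[1]
            advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor():
        # factor := number | '(' expr ')'
        kind, lexeme = peek()
        if kind == "num":
            advance()
            return float(lexeme)
        if (kind, lexeme) == ("op", "("):
            advance()
            value = expr()  # the nesting is handled here, outside the regex
            if peek() != ("op", ")"):
                raise ValueError("expected ')'")
            advance()
            return value
        raise ValueError("expected a number or '('")

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result
```

Calling `evaluate("2 * (3 + 4)")` returns `14.0`, while unbalanced input such as `"(1 + 2"` raises `ValueError`.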


On May 5, 5:12 am, "Terry Reedy" <tjre...@udel.edu> wrote:

I believe the current Python re module was written to replace the Python
wrapping of pcre in order to support unicode.




I don't know how PCRE was back then, but right now it supports UTF-8
Unicode patterns and strings, and Unicode character properties. Maybe
it could be reintroduced into Python?



I don't remember those being in the pcre Python once had. Perhaps they are
new.




At least today, PCRE supports recursion and recursion check,
possessive quantifiers and once-only subpatterns (disables
backtracking in a subpattern), callouts (user functions to call at
given points), and other interesting, powerful features.

