How to get the "longest possible" match with Python's RE module?


Problem description

Basically, the problem is this:

>>> p = re.compile("do|dolittle")
>>> p.match("dolittle").group()
'do'

Python's NFA regexp engine tries only the first option, and happily
rests on that. There's another example:


>>> p = re.compile("one(self)?(selfsufficient)?")
>>> p.match("oneselfsufficient").group()
'oneself'

The Python regular expression engine doesn't exhaust all the
possibilities, but in my application I hope to get the longest possible
match, starting from a given point.

Is there a way to do this in Python?

Recommended answer

Licheng Fang wrote:
> Basically, the problem is this:
>
> >>> p = re.compile("do|dolittle")
> >>> p.match("dolittle").group()
> 'do'

From what I understand, this isn't Python-specific; it is the expected
behavior of that pattern in any implementation. You are using
alternation, which means "either, or", and you have the shorter
subexpression first, so the condition is satisfied by just 'do' and the
matching terminates.
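
To make the point about alternation order concrete, here is a minimal sketch. The helper name `longest_first` is made up for illustration, and the trick only applies when every alternative is a plain literal string.

```python
import re

text = "dolittle"

# Shorter alternative first: the engine accepts "do" and stops there.
short_first = re.compile("do|dolittle").match(text).group()

# Longer alternative first: "dolittle" is tried, and succeeds, before "do".
long_first = re.compile("dolittle|do").match(text).group()


def longest_first(words):
    """Build an alternation of literal words, longest first (hypothetical helper)."""
    ordered = sorted(words, key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in ordered))


sorted_match = longest_first(["do", "dolittle"]).match(text).group()

print(short_first)   # do
print(long_first)    # dolittle
print(sorted_match)  # dolittle
```

Sorting by length does not help once the alternatives contain operators such as `?` or `*`, because the length of the pattern text then says nothing about the length of what it matches.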


> There's another example:
>
> >>> p = re.compile("one(self)?(selfsufficient)?")
> >>> p.match("oneselfsufficient").group()
> 'oneself'

Again, I don't think this has anything to do with Python. Your pattern
basically means "match 'one' whether it is followed by 'self' or not,
and whether it is followed by 'selfsufficient' or not". For this
particular example, you'd want something like
"one(self)?(sufficient)?".

I think you could construct a pattern that would do what you want in
Python without any problem. If you post a (short) example of your data,
I'm sure someone could help you with it.

Regards,
Jordan
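
A quick check of the rewritten pattern suggested in this reply, as a sketch only. The variable names are illustrative, and the last line shows that the rewritten pattern does not accept exactly the same strings as the original.

```python
import re

text = "oneselfsufficient"

original = re.compile("one(self)?(selfsufficient)?")
factored = re.compile("one(self)?(sufficient)?")

# (self)? greedily takes "self"; "selfsufficient" then cannot match at
# "sufficient", so the optional second group is skipped.
original_result = original.match(text).group()

# With the shared prefix factored out, both optional groups match in turn.
factored_result = factored.match(text).group()

print(original_result)  # oneself
print(factored_result)  # oneselfsufficient

# Caveat: the factored pattern also accepts "onesufficient" in full,
# whereas the original pattern matches only "one" of that string.
print(factored.match("onesufficient").group())  # onesufficient
```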


Licheng Fang wrote:
> Basically, the problem is this:
>
> >>> p = re.compile("do|dolittle")
> >>> p.match("dolittle").group()
> 'do'
>
> Python's NFA regexp engine tries only the first option, and happily
> rests on that. There's another example:
>
> >>> p = re.compile("one(self)?(selfsufficient)?")
> >>> p.match("oneselfsufficient").group()
> 'oneself'
>
> The Python regular expression engine doesn't exhaust all the
> possibilities, but in my application I hope to get the longest possible
> match, starting from a given point.
>
> Is there a way to do this in Python?



This is the way regexps work; Python doesn't have anything to do with
it. The engine starts parsing the data with the given pattern. It
returns the matched string according to the pattern and doesn't go back
to find the other combinations.

So to get all the combinations, you would probably need to supply
different patterns each time.
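
The "different patterns each time" idea could be sketched as follows: try each candidate pattern at the same position and keep whichever match ends furthest to the right. The function name `longest_of` and the way the example patterns are split up are assumptions made for illustration, not something from the thread.

```python
import re


def longest_of(patterns, text, pos=0):
    """Return the longest match among several patterns tried at `pos`, or None."""
    matches = (re.compile(p).match(text, pos) for p in patterns)
    found = [m for m in matches if m is not None]
    # The match that ends furthest to the right is the longest one,
    # since all of them start at the same position.
    return max(found, key=lambda m: m.end(), default=None)


first = longest_of(["do", "dolittle"], "dolittle")
second = longest_of(["one(self)?", "one(selfsufficient)?"], "oneselfsufficient")

print(first.group())   # dolittle
print(second.group())  # oneselfsufficient
```

This needs the alternatives to be available as separate patterns, which may not be practical when the expression is generated or deeply nested.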




MonkeeSage wrote:
> Licheng Fang wrote:
> > Basically, the problem is this:
> >
> > >>> p = re.compile("do|dolittle")
> > >>> p.match("dolittle").group()
> > 'do'
>
> From what I understand, this isn't Python-specific; it is the expected
> behavior of that pattern in any implementation. You are using
> alternation, which means "either, or", and you have the shorter
> subexpression first, so the condition is satisfied by just 'do' and the
> matching terminates.


> > There's another example:
> >
> > >>> p = re.compile("one(self)?(selfsufficient)?")
> > >>> p.match("oneselfsufficient").group()
> > 'oneself'
>
> Again, I don't think this has anything to do with Python. Your pattern
> basically means "match 'one' whether it is followed by 'self' or not,
> and whether it is followed by 'selfsufficient' or not". For this
> particular example, you'd want something like
> "one(self)?(sufficient)?".
>
> I think you could construct a pattern that would do what you want in
> Python without any problem. If you post a (short) example of your data,
> I'm sure someone could help you with it.
>
> Regards,
> Jordan




Hi, according to these regexp engine discussions, it's NOT a behavior
shared by every implementation:

http://www.softec.st/en/OpenSource/D...onEngines.html

Python's NFA engine reads along the input string, matching it to the
pattern, and backtracking when needed. By contrast, a DFA engine, to my
understanding, constructs a DFA and uses it to munch as many characters
as possible. Maybe it's like this:

Pattern: one(self)?(selfsufficient)?

PYTHON'S NFA ENGINE:

          one          self, none         selfsufficient, none
(start)------->((1))------------>((2))----------------------->((3))

DFA ENGINE:

          one              self
(start)------->((123))------------>((23))
                  |
                  |
                  |  selfsufficient
                  --------------->((3))

I want to know if there is some way to make Python RE behave like grep
does, or do I have to change to another engine?
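
One way to approximate longest-match behavior without leaving the `re` module is to ask for a full match against every candidate end position, trying the longest slice first. This is a rough sketch and not how a DFA engine works internally. The function name `longest_match` is made up here, while `Pattern.fullmatch(string, pos, endpos)` is the standard library call (Python 3.4 and later).

```python
import re


def longest_match(pattern, text, pos=0):
    """Return the longest match of `pattern` starting exactly at `pos`, or None.

    Candidate end positions are tried from the end of the text inward.
    fullmatch() only succeeds if the whole slice text[pos:end] is consumed,
    so the backtracking engine keeps exploring alternatives until it finds
    a way to cover the slice or runs out of options.
    """
    compiled = re.compile(pattern)
    for end in range(len(text), pos - 1, -1):
        m = compiled.fullmatch(text, pos, end)
        if m is not None:
            return m
    return None


print(longest_match("do|dolittle", "dolittle").group())
# dolittle
print(longest_match("one(self)?(selfsufficient)?", "oneselfsufficient").group())
# oneselfsufficient
```

The cost is up to `len(text) - pos + 1` match attempts per call. Because `endpos` makes the engine treat the string as if it ended there, patterns that use `$` or lookahead may behave differently from a true longest-match engine.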

