Is there a maximum length of a regular expression in python?

Question

I have a regular expression that is approximately 100k bytes. (It is
basically a list of all known Norwegian postal numbers and the
corresponding place, with | in between.) I know this is not the intended
use for regular expressions, but it should nonetheless work.

The pattern is
ur'(N-|NO-)?(5259 HJELLESTAD|4026 STAVANGER|4027 STAVANGER........|8305 SVOLVÆR)'

The error message I get is:
RuntimeError: internal error in regular expression engine

Answers

And I'm not the least bit surprised. Your code is brittle (i.e. likely
to break) and cannot, for example, cope with multiple spaces between the
number and the word(s). Quite apart from breaking the interpreter :-)

I'd say your test was the clearest possible demonstration that there
*is* a limit.

Wouldn't it be better to have a dict keyed on the number and containing
the word (which you can construct from the same source you constructed
your horrendously long regexp)?

Then if you find something matching the pattern (untested)

ur'(N-|NO-)?((\d\d\d\d)\s*([A-Za-z ]+))'

or something like it that actually works (I invariably get regexps wrong
at least three times before I get them right) you can use the dict to
validate the number and name.

Quite apart from anything else, if the text line you are examining
doesn't have the right syntactic form then you are going to test
hundreds of options, none of which can possibly match. So matching the
syntax and then validating the data identified seems like a much more
sensible option (to me, at least).

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/
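
The "match the syntax, then validate against a dict" idea from the reply above could be sketched roughly as follows. This is a Python 3 adaptation, not code from the thread: the three-entry `POSTCODES` table, the `CANDIDATE` pattern and the function name `find_postal_addresses` are all assumptions made for illustration, and the pattern is a tightened variant of the untested one quoted above (upper-case names only, at least one whitespace character after the number).

```python
import re

# Hypothetical excerpt of the lookup table; a real one would be built from
# the same source as the 100k-byte pattern in the question.
POSTCODES = {
    "5259": "HJELLESTAD",
    "4026": "STAVANGER",
    "4027": "STAVANGER",
}

# Syntax only: optional N-/NO- prefix, four digits, whitespace, then one or
# more upper-case words. No postal data is embedded in the pattern itself.
CANDIDATE = re.compile(r"\b(N-|NO-)?(\d{4})\s+([A-ZÆØÅ]+(?: [A-ZÆØÅ]+)*)")


def find_postal_addresses(text):
    """Return (prefix, number, place) tuples whose number and name agree."""
    found = []
    for m in CANDIDATE.finditer(text):
        prefix, number, name = m.groups()
        place = POSTCODES.get(number)
        if place is None:
            continue  # syntactically fine, but not a known postal number
        # Accept an exact name, or a name followed by further upper-case words.
        if name == place or name.startswith(place + " "):
            found.append((prefix, number, place))
    return found
```

Because the pattern stays tiny, a line that does not look like a postal address is rejected after one cheap attempt instead of being tested against every alternative, and multiple spaces between number and name are tolerated.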


You're most likely exceeding the allowed code size (usually 64k).

However, putting all postal numbers in a single RE is a horrid abuse of the RE
engine. Why not just scan for "(N-|NO-)?(\d+)" and use a dictionary to check
if you have a valid match?

    import re

    postcodes = {
        "5269": "HJELLESTAD",
        ...
        "9999": "ØSTRE FJORDVIDDA",
    }

    # the trailing space in the pattern leaves m.end() at the place name
    for m in re.finditer(r"(N-|NO-)?(\d+) ", text):
        prefix, number = m.groups()
        try:
            place = postcodes[number]
        except KeyError:
            continue  # not a known postal number
        if not text.startswith(place, m.end()):
            continue  # number is known, but the place name does not follow
        # got a match!
        print prefix, number, place

</F>
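
For readers on Python 3, the scan-and-look-up loop above might be adapted as shown here. This is a sketch under assumptions rather than the poster's code: the table entries are borrowed from the pattern in the question, and the names `POSTCODES`, `NUMBER` and `scan_postcodes` are invented for the example. Results are collected in a list instead of being printed.

```python
import re

# Hypothetical sample table; entries borrowed from the pattern in the question.
POSTCODES = {
    "5259": "HJELLESTAD",
    "4026": "STAVANGER",
    "4027": "STAVANGER",
    "8305": "SVOLVÆR",
}

# Optional prefix, a run of digits, then a single space, so that m.end()
# points at the first character of the place name.
NUMBER = re.compile(r"(N-|NO-)?(\d+) ")


def scan_postcodes(text):
    """Return (prefix, number, place) for each number followed by its place."""
    matches = []
    for m in NUMBER.finditer(text):
        prefix, number = m.groups()
        place = POSTCODES.get(number)
        if place is None:
            continue  # digits that are not a known postal number
        if not text.startswith(place, m.end()):
            continue  # known number, but the expected place does not follow
        matches.append((prefix, number, place))
    return matches
```

The regular expression only has to find digit runs; the dictionary lookup does the real work, so the cost per candidate does not grow with the number of postal codes.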


I don't know of any stated maximum length, but I'm not at all surprised
this causes the regex compiler to blow up. This is clearly a case of regex
being the wrong tool for the job.

I'm guessing a dictionary, with the numeric codes as keys and the city
names as values (or perhaps the other way around) is what you want.
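
Both directions of the dictionary suggested above can be built in one pass. The sketch below assumes, purely for illustration, that the postal data is available as text with one "number place" pair per line; the sample `RAW` string and the name `build_tables` are hypothetical. Since several numbers can share a place name, the reverse mapping stores a list of numbers.

```python
from collections import defaultdict

# Hypothetical input format: one postal number and place name per line.
RAW = """\
5259 HJELLESTAD
4026 STAVANGER
4027 STAVANGER
"""


def build_tables(raw):
    """Return (number -> place, place -> list of numbers) from raw text."""
    by_code = {}
    by_place = defaultdict(list)
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first run of whitespace: number, then the full name.
        code, place = line.split(None, 1)
        place = place.strip()
        by_code[code] = place
        by_place[place].append(code)
    return by_code, dict(by_place)


by_code, by_place = build_tables(RAW)
```

Lookups in either table take roughly constant time on average, regardless of how many postal numbers are loaded.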

