re.search much slower than grep on some regular expressions

Problem description

What can be the cause of the large difference between re.search and
grep?

This script takes about 5 min to run on my computer:

#!/usr/bin/env python
import re

# Build a single row of 156000 'a' characters.
row = ""
for a in range(156000):
    row += "a"
# No '/' occurs in row, so the search can only fail.
print(re.search('[^ "=]*/', row))

While doing a simple grep:

grep '[^ "=]*/' input    (input contains 156,000 a's in one row)

doesn't even take a second.

Is this a bug in python?

Thanks...
Henning Thornblad

Recommended answer

Please re-read your Python code carefully. Don't you think there's a
subtle difference between reading a file and building 156000 string objects?


Bruno Desthuilliers wrote:

> Please re-read your Python code carefully. Don't you think there's a
> subtle difference between reading a file and building 156000 string
> objects?



Mmm... That aside, after testing it (building the string in a
somewhat more efficient way), the call to re.search does indeed take
ages to return. Please forget my previous post.
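
For reference, a minimal sketch of what "building the string in a somewhat more efficient way" could look like. The post does not show the code that was actually used, so the repetition operator below is an assumption. The point is only that the input can be constructed almost instantly, which leaves the re.search call itself as the slow part.

```python
# Assumed reconstruction: the post does not show how the string
# was actually built "more efficiently".
N = 156000

# One allocation instead of 156000 incremental concatenations.
row = "a" * N

# Equivalent alternative: join over a generator of single characters.
row_joined = "".join("a" for _ in range(N))

# Both approaches produce the same 156000-character row.
print(len(row), row == row_joined)
```

Either form yields the same row as the loop in the question, so it can be dropped into the original script unchanged.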


Henning_Thornblad wrote:

> What can be the cause of the large difference between re.search and
> grep?


grep uses a smarter algorithm ;)
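
One way to picture the difference, offered as a rough model rather than a description of either implementation: a backtracking search retries the pattern at every start position, and at each one it greedily consumes the rest of the row before backing off in search of a "/". The helper below (naive_search_steps is a made-up name, not part of re) simply counts character inspections under that assumption. An automaton-based matcher, which grep implementations commonly use, can avoid this repeated rescanning.

```python
def naive_search_steps(text, stop="/", excluded=' "='):
    """Count character inspections made by a naive backtracking search
    for the pattern [^ "=]*/ (hypothetical model, not CPython's code).

    Returns (steps, span), where span is (start, end) of the match or None.
    """
    steps = 0
    n = len(text)
    for start in range(n + 1):
        # Greedy part: consume as many non-excluded characters as possible.
        end = start
        while end < n and text[end] not in excluded:
            end += 1
            steps += 1
        # Backtracking part: look for the stop character at
        # end, end - 1, ..., start.
        pos = end
        while pos >= start:
            steps += 1
            if pos < n and text[pos] == stop:
                return steps, (start, pos + 1)
            pos -= 1
    return steps, None


# Rows of n 'a' characters with no '/' anywhere: the worst case.
for n in (10, 100, 1000):
    steps, span = naive_search_steps("a" * n)
    print(n, steps, span)
```

Under this model a row of n "a" characters costs (n + 1) squared inspections, so multiplying the row length by 10 multiplies the work by roughly 100. At n = 156000 that is on the order of 10^10 inspections, which is at least consistent with a run time measured in minutes.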


> This script takes about 5 min to run on my computer:
> #!/usr/bin/env python
> import re
>
> row = ""
> for a in range(156000):
>     row += "a"
> print(re.search('[^ "=]*/', row))
> While doing a simple grep:
> grep '[^ "=]*/' input (input contains 156,000 a's in one row)
> doesn't even take a second.
>
> Is this a bug in python?



You could call this a performance bug, but it's not common enough in real
code to get the necessary brain cycles from the core developers.
So you can either write a patch yourself or use a workaround.

re.search('[^ "=]*/', row) if "/" in row else None

might be good enough.

Peter
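
The workaround above can be wrapped in a small helper. The name guarded_search is made up for illustration. The idea is that every match of this pattern must end in "/", so a cheap substring test rules out rows that cannot match before the regex engine is invoked at all.

```python
import re

# The pattern from the question, compiled once.
PATTERN = re.compile(r'[^ "=]*/')


def guarded_search(row):
    """Return a match for PATTERN in row, or None (hypothetical helper)."""
    # Any match must end with "/", so skip the regex when none is present.
    if "/" not in row:
        return None
    return PATTERN.search(row)


row = "a" * 156000
print(guarded_search(row))                # no "/" in row, regex never runs
print(guarded_search("usr/bin").group())  # regex runs because "/" is present
```

This only guards the case where the row has no "/" at all. A row that does contain one still goes through re.search, so the guard is a shortcut for the failing case from the question, not a general fix for the slow search.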

