python vs. grep
Question
I've read a great paper about generators:
http://www.dabeaz.com/generators/index.html
The author says that it's easy to write analogs of common Linux tools such
as awk, grep, etc. He says that performance could be even better.
But I have run into problems writing a fast grep analog.
Here is my script:
import re

pat = re.compile("sometext")
f = open("bigfile", 'r')
# Generator expression: lazily yields only the lines that match.
flines = (line for line in f if pat.search(line))
c = 0
for x in flines:
    c += 1
print(c)
and bash:
grep "sometext" bigfile | wc -l
The Python code is 3-4 times slower on Windows. As I remember, the situation
is the same on Linux...
Setting buffering in open() even increases the time.
Is it possible to improve file reading performance?
Recommended answer
On Tue, May 6, 2008 at 1:42 PM, Anton Slesarev <sl************@gmail.com> wrote:
Is it possible to increase file reading performance?
Dunno about that, but this part:
flines = (line for line in f if pat.search(line))
c = 0
for x in flines:
    c += 1
print(c)
could be rewritten as just:
print(sum(1 for line in f if pat.search(line)))
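For readers on Python 3, the one-line count might be wrapped up roughly as below. This is only a minimal sketch and not code from the thread: the function name `count_matching_lines` and its parameters are hypothetical, and the file is opened in a `with` block so that it is closed afterwards.

```python
import re


def count_matching_lines(path, pattern):
    """Count the lines of the file at `path` that contain a regex match.

    Hypothetical helper for illustration; not part of the original post.
    """
    search = re.compile(pattern).search
    # The with-block closes the file even if reading fails part-way.
    with open(path, "r") as f:
        # sum() pulls lines from the generator one at a time, so the
        # whole file is never held in memory.
        return sum(1 for line in f if search(line))
```

Assuming a file named bigfile exists, `print(count_matching_lines("bigfile", "sometext"))` would print the same number as the loop in the question.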
Anton Slesarev <sl************@gmail.com> writes:
f = open("bigfile", 'r')
flines = (line for line in f if pat.search(line))
c = 0
for x in flines:
    c += 1
print(c)
It would be simpler (and probably faster) not to use a generator expression:
import re

# Bind the compiled pattern's search method once, outside the loop.
search = re.compile('sometext').search

c = 0
for line in open('bigfile'):
    if search(line):
        c += 1
Perhaps faster (because the number of name lookups is reduced), using
itertools.ifilter:
from itertools import ifilter

c = 0
# Iterate over the open file, not over the characters of the filename.
for line in ifilter(search, open('bigfile')):
    c += 1
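`itertools.ifilter` exists only in Python 2. In Python 3 the built-in `filter` is lazy in the same way, so a rough counterpart could look like the sketch below. The function name `count_with_filter` and its parameters are hypothetical, chosen for illustration.

```python
import re


def count_with_filter(path, pattern):
    """Count regex-matching lines using the built-in filter().

    Hypothetical helper; a Python 3 stand-in for the ifilter loop.
    """
    search = re.compile(pattern).search
    count = 0
    with open(path, "r") as f:
        # filter() yields only the lines for which search(line) is
        # truthy, i.e. returns a match object rather than None.
        for _ in filter(search, f):
            count += 1
    return count
```

Whether this is actually faster than the plain loop is something to measure on the real file rather than assume.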
If 'sometext' is just text (no regexp wildcards) then even simpler:
....
for line in ...:
    if 'sometext' in line:
        c += 1
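Spelled out in full, the plain-substring variant might look like this in Python 3. It is a sketch under the assumption that the search term contains no regexp syntax; the function name `count_substring_lines` is hypothetical.

```python
def count_substring_lines(path, text):
    """Count the lines of the file at `path` that contain `text` literally.

    Hypothetical helper; no regular expressions are involved, so
    characters such as '.' or '*' are matched as themselves.
    """
    with open(path, "r") as f:
        # The `in` operator does a literal substring test on each line.
        return sum(1 for line in f if text in line)
```

Note the difference from the regex versions: searching for `a.c` here matches only lines containing those three characters, not `abc`.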
I don't believe you'll easily beat grep + wc using Python though.
Perhaps faster?
sum(bool(search(line)) for line in open('bigfile'))
sum(1 for line in ifilter(search, open('bigfile')))
....etc...
All this is untested!
--
Arnaud
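Since the variants above are offered untested, one way to compare them is a small timing harness like the sketch below, written for Python 3. Everything in it is an assumption made for illustration: the function names, the `repeat` parameter, and the choice of `time.perf_counter`. It reports whatever it measures and makes no claim about which variant wins.

```python
import re
import time


def count_genexp(path, pattern):
    """Generator expression fed to sum()."""
    search = re.compile(pattern).search
    with open(path, "r") as f:
        return sum(1 for line in f if search(line))


def count_loop(path, pattern):
    """Plain for-loop with an explicit counter."""
    search = re.compile(pattern).search
    count = 0
    with open(path, "r") as f:
        for line in f:
            if search(line):
                count += 1
    return count


def count_filter(path, pattern):
    """Built-in filter(), the Python 3 counterpart of ifilter."""
    search = re.compile(pattern).search
    with open(path, "r") as f:
        return sum(1 for _ in filter(search, f))


def time_variants(path, pattern, repeat=3):
    """Return {function name: (count, best wall-clock seconds)}."""
    results = {}
    for func in (count_genexp, count_loop, count_filter):
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            count = func(path, pattern)
            best = min(best, time.perf_counter() - start)
        results[func.__name__] = (count, best)
    return results
```

Calling `time_variants("bigfile", "sometext")` would return one entry per variant. Repeated runs mostly read from the operating system's file cache, so the numbers reflect Python overhead more than disk speed, and a fair comparison with grep would need grep timed under the same warm-cache conditions.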
2008/5/6, Anton Slesarev <sl************@gmail.com>:
But I have run into problems writing a fast grep analog.
[...]
The Python code is 3-4 times slower on Windows. As I remember, the situation
is the same on Linux...
Setting buffering in open() even increases the time.
Is it possible to improve file reading performance?
The best advice would be not to try to beat grep, but if you really
want to, this is the right place ;)
Here is my code: