python vs. grep


Question

I've read a great paper about generators:
http://www.dabeaz.com/generators/index.html

The author says that it's easy to write analogs of common Linux tools such
as awk, grep, etc. He says that performance could be even better.

But I have a problem writing a grep analog that performs well.
Here is my script:

    import re
    pat = re.compile("sometext")

    f = open("bigfile", 'r')

    # lazily yield only the lines that match the pattern
    flines = (line for line in f if pat.search(line))
    c = 0
    for x in flines:
        c += 1
    print c

and bash:

    grep "sometext" bigfile | wc -l

The Python code is 3-4 times slower on Windows. And as I remember, the
situation is the same on Linux...

Buffering in open even increases the time.

Is it possible to increase file reading performance?
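
To check the 3-4x figure on a given machine, the Python side can be timed directly. The following is only a sketch, written for Python 3 rather than the Python 2 of the post; the helper names `count_with_generator` and `timed` are made up for illustration, and the path and pattern are whatever the caller passes in.

```python
import re
import time


def count_with_generator(path, pattern):
    """Count matching lines the same way the script in the post does."""
    pat = re.compile(pattern)
    with open(path, "r") as f:
        # generator expression: yields only the lines that match
        flines = (line for line in f if pat.search(line))
        c = 0
        for _ in flines:
            c += 1
    return c


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling `timed(count_with_generator, "bigfile", "sometext")` returns the count together with the wall-clock time, which can then be compared against timing the `grep | wc -l` pipeline on the same file.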

Recommended answer

On Tue, May 6, 2008 at 1:42 PM, Anton Slesarev <sl************@gmail.com> wrote:
> Is it possible to increase file reading performance?



Dunno about that, but this part:

    flines = (line for line in f if pat.search(line))
    c = 0
    for x in flines:
        c += 1
    print c



could be rewritten as just:

    print sum(1 for line in f if pat.search(line))
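
For readers on Python 3, where `print` is a function, the same one-liner might look like the sketch below. The function name `count_with_sum` is illustrative and not from the thread; the `with` block is an addition so the file is closed when counting finishes.

```python
import re


def count_with_sum(path, pattern):
    """Count matching lines by summing 1 for every line that matches."""
    pat = re.compile(pattern)
    with open(path, "r") as f:
        return sum(1 for line in f if pat.search(line))
```

Usage would be `print(count_with_sum("bigfile", "sometext"))`.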


Anton Slesarev <sl************@gmail.com> writes:

> f = open("bigfile", 'r')
>
> flines = (line for line in f if pat.search(line))
> c = 0
> for x in flines:
>     c += 1
> print c




It would be simpler (and probably faster) not to use a generator expression:

    search = re.compile('sometext').search

    c = 0
    for line in open('bigfile'):
        if search(line):
            c += 1

Perhaps faster (because the number of name lookups is reduced), using
itertools.ifilter:

    from itertools import ifilter

    c = 0
    for line in ifilter(search, open('bigfile')):
        c += 1

If 'sometext' is just text (no regexp wildcards) then even simpler:

    ...
    for line in ...:
        if 'sometext' in line:
            c += 1

I don't believe you'll easily beat grep + wc using Python though.

Perhaps faster?

    sum(bool(search(line)) for line in open('bigfile'))
    sum(1 for line in ifilter(search, open('bigfile')))

...etc...

All this is untested!
--
Arnaud
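
The variants in this reply are written for Python 2. As a rough sketch of how they might translate to Python 3: `itertools.ifilter` no longer exists there, and the built-in `filter` is lazy in the same way. The function names below (`count_loop`, `count_filter`, `count_substring`) are made up for illustration, and none of this says anything about which variant is fastest; that still has to be measured.

```python
import re


def count_loop(path, pattern):
    """Plain loop; the bound `search` method is looked up only once."""
    search = re.compile(pattern).search
    count = 0
    with open(path) as f:
        for line in f:
            if search(line):
                count += 1
    return count


def count_filter(path, pattern):
    """Built-in filter() keeps the lines for which search() finds a match."""
    search = re.compile(pattern).search
    with open(path) as f:
        return sum(1 for _ in filter(search, f))


def count_substring(path, text):
    """Literal substring test; only equivalent when `text` has no regex wildcards."""
    count = 0
    with open(path) as f:
        for line in f:
            if text in line:
                count += 1
    return count
```

For a plain-text pattern such as `"sometext"`, all three functions should return the same count on the same file, which makes for a quick sanity check before timing them.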


2008/5/6, Anton Slesarev <sl************@gmail.com>:
> But I have a problem writing a grep analog that performs well.
> [...]
> The Python code is 3-4 times slower on Windows. And as I remember, the
> situation is the same on Linux...
>
> Buffering in open even increases the time.
>
> Is it possible to increase file reading performance?

The best advice would be not to try to beat grep, but if you really
want to, this is the right place ;)

Here is my code:

