Efficient grep using Python?

Problem description

Just started thinking about learning Python.

Is there any place where I can get some free examples, especially for the
following kind of problem (it must be trivial for those using Python)?

I have files A and B, each containing say 100,000 lines (each line = one
string without any spaces).

I want to do

" A - (A intersection B) "

Essentially, I want to do an efficient grep, i.e. remove from A those lines
which are also present in file B.

Solution

>>>>> "sf" == sf <sf@sf.sf> writes:

sf> Just started thinking about learning python. Is there any
sf> place where I can get some free examples, especially for
sf> following kind of problem ( it must be trivial for those using
sf> python)

sf> I have files A, and B each containing say 100,000 lines (each
sf> line=one string without any space)

sf> I want to do

sf> " A - (A intersection B) "

sf> Essentially, want to do efficient grep, i.e. from A remove
sf> those lines which are also present in file B.

If you're only talking about 100K lines or so, and you have a
reasonably modern computer, you can do this all in memory. If order
doesn't matter (it probably does) you can use a set to get all the
lines in file B that are not in A:

from sets import Set
A = Set(file('test1.dat').readlines())
B = Set(file('test2.dat').readlines())
print B-A
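
The `sets` module and the `file()` builtin used above no longer exist in Python 3, where `set` is a built-in type. A minimal Python 3 sketch of the same unordered idea follows, written for the direction the question asks for (lines of A that are not in B). The function name `lines_only_in_a` and the sample data are illustrative assumptions, not part of the original post.

```python
import io


def lines_only_in_a(a_lines, b_lines):
    """Return the set of lines found in a_lines but not in b_lines.

    Order and duplicates are lost because the result is a set.
    """
    return set(a_lines) - set(b_lines)


# In-memory stand-ins for the two files. With real files, pass
# open("test1.dat") and open("test2.dat") instead (hypothetical names).
file_a = io.StringIO("apple\nbanana\ncherry\n")
file_b = io.StringIO("banana\ndate\n")

result = lines_only_in_a(file_a, file_b)
print(sorted(result))  # sorted only to make the printed order stable
```

Both arguments can be any iterable of lines, so open file objects work directly without calling `readlines()`.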

To preserve order, you should use a dictionary that maps lines to line
numbers. You can later use these numbers to sort

A = dict([(line, num) for num, line in enumerate(file('test1.dat'))])
B = dict([(line, num) for num, line in enumerate(file('test2.dat'))])

keep = [(num, line) for line, num in B.items() if not A.has_key(line)]
keep.sort()
for num, line in keep:
    print line,
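
For comparison, here is a hedged Python 3 sketch of the same line-number technique, again written for the direction the question asks for (keep lines of A that are absent from B). `dict.has_key` is gone in Python 3, so membership is tested with `in`. The name `ordered_difference` and the sample lists are assumptions made for illustration.

```python
def ordered_difference(a_lines, b_lines):
    """Return lines of a_lines that are absent from b_lines, in file order.

    A line repeated in a_lines is reported once, at its first position.
    """
    # Map each line of A to the number of its first occurrence.
    first_seen = {}
    for num, line in enumerate(a_lines):
        first_seen.setdefault(line, num)

    b_set = set(b_lines)
    keep = [(num, line) for line, num in first_seen.items() if line not in b_set]
    keep.sort()  # order by original line number
    return [line for num, line in keep]


a = ["pear\n", "apple\n", "fig\n", "apple\n"]
b = ["fig\n"]
print(ordered_difference(a, b))
```

Since Python 3.7 a `dict` keeps insertion order, so the explicit sort is redundant there; it is kept in this sketch to mirror the approach described above and to make the ordering explicit.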

Now someone else will come along and tell you all this functionality
is already in the standard library. But it's always fun to hack this
out yourself once because python makes such things so damned easy.

JDH


"sf" <sf@sf.sf> wrote:

> I have files A, and B each containing say 100,000 lines (each line=one
> string without any space)
>
> I want to do
>
> " A - (A intersection B) "
>
> Essentially, want to do efficient grep, i.e. from A remove those lines which
> are also present in file B.



that's an unusual definition of "grep", but the following seems to
do what you want:

afile = "a.txt"
bfile = "b.txt"

bdict = dict.fromkeys(open(bfile).readlines())

for line in open(afile):
    if line not in bdict:
        print line,
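
One detail the loop above depends on: lines are compared with their trailing newline attached, so a final line that lacks a newline in one file will not match the same text followed by a newline in the other. The following is a Python 3 sketch of the same streaming idea with the line terminator stripped before comparison, under the assumption that this normalisation is wanted. `filter_lines` and the in-memory sample data are hypothetical names introduced here.

```python
import io


def filter_lines(a_file, b_file):
    """Yield lines of a_file (without newline) that do not occur in b_file."""
    # Strip only the line terminator, so other whitespace still counts.
    seen_in_b = {line.rstrip("\n") for line in b_file}
    for line in a_file:
        text = line.rstrip("\n")
        if text not in seen_in_b:
            yield text


# Stand-ins for open("a.txt") and open("b.txt").
a_file = io.StringIO("one\ntwo\nthree")  # last line has no newline
b_file = io.StringIO("three\ntwo\n")

for text in filter_lines(a_file, b_file):
    print(text)
```

Only file B is held in memory; file A is read one line at a time, so the original order of A is preserved without any sorting.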

</F>


sf wrote:

> Just started thinking about learning python.
>
> Is there any place where I can get some free examples, especially for
> following kind of problem ( it must be trivial for those using python)
>
> I have files A, and B each containing say 100,000 lines (each line=one
> string without any space)
>
> I want to do
>
> " A - (A intersection B) "
>
> Essentially, want to do efficient grep, i.e. from A remove those lines which
> are also present in file B.



You could implement this elegantly using the new sets feature.
For reference, here is the Unix way to do it:

sort a b b | uniq -u
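
The pipeline works because b is listed twice: every line of b occurs at least twice in the sorted stream, and `uniq -u` prints only lines that are not repeated. Lines that occur only in a survive, provided they occur there exactly once, and the output comes out sorted rather than in file order. A rough Python 3 sketch of that counting logic, with `sort_uniq_u` and the sample lists as illustrative assumptions:

```python
from collections import Counter


def sort_uniq_u(a_lines, b_lines):
    """Emulate `sort a b b | uniq -u` on two lists of lines."""
    counts = Counter(a_lines)
    # b is counted twice, so any line present in b ends up with count >= 2.
    counts.update(b_lines)
    counts.update(b_lines)
    # Keep only lines seen exactly once overall, in sorted order.
    return sorted(line for line, n in counts.items() if n == 1)


a = ["delta", "alpha", "beta", "alpha", "gamma"]
b = ["beta", "epsilon"]
print(sort_uniq_u(a, b))  # ['delta', 'gamma']
```

Note that "alpha" is dropped even though it is not in b, because it is repeated within a. Python sorts strings by code point here, whereas `sort` follows the current locale, so the ordering can differ for non-ASCII text.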

--
Pádraig Brady - http://www.pixelbeat.org
--

