What is the most efficient way to match list items to lines in a large file in Python?

Question

I have a large file (5 GB) called my_file and a list called my_list. What is the most efficient way to read each line in the file and, whenever an item from my_list matches a field in that line, add an entry to a new list called matches that combines fields from the line with the matching item from my_list? Here is what I am trying to do:

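def calc(my_file, my_list):
    matches = []
    my_file.seek(0, 0)  # rewind in case the file was read before
    for i in my_file:
        i = i.rstrip('\n').split('\t')  # columns of the current line
        for v in my_list:  # scans the whole list for every line
            if v[1] == i[2]:
                item = v[0], i[1], i[3]
                matches.append(item)
    return matches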

Here are a few lines from my_file:

lion    4    blue    ch3
sheep   1    red     pq2
frog    9    green   xd7
donkey  2    aqua    zr8

Here is my_list:

intel    yellow
amd      green
msi      aqua    

The desired output, a list of lists, in the above example would be:

[['amd', 9, 'xd7'], ['msi', 2, 'zr8']]

My code currently works, albeit really slowly. Would using a generator or serialization help? Thanks.

Recommended answer

You could build a dictionary for looking up v. I also added a few further small optimizations:

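def calc(my_file, my_list):
    # Map each key (v[1]) to its name (v[0]) for fast lookup.
    vd = dict((v[1], v[0]) for v in my_list)

    my_file.seek(0, 0)
    for line in my_file:
        # Assumes exactly four tab-separated fields per line.
        f0, f1, f2, f3 = line.rstrip('\n').split('\t')
        v0 = vd.get(f2)
        if v0 is not None:
            yield (v0, f1, f3)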

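This should be much faster for a large my_list, because each line now needs one dictionary lookup instead of a scan over the whole list. Note that calc is now a generator, so wrap the call in list(...) if you need an actual list.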
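As a minimal, self-contained sketch of the dictionary approach, the following runs it on the sample data from the question. The in-memory file (io.StringIO) and the name calc_multi are assumptions for illustration. Grouping with defaultdict is an addition that is not in the original answer: it covers the case where several my_list items share the same key, which a plain dict would collapse to the last one.

```python
import io
from collections import defaultdict


def calc_multi(my_file, my_list):
    """Yield (name, f1, f3) for every line whose third column matches a key.

    Hypothetical variant of calc: groups my_list by key so that items
    sharing the same key are all reported.
    """
    names_by_key = defaultdict(list)
    for name, key in my_list:
        names_by_key[key].append(name)

    my_file.seek(0, 0)
    for line in my_file:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # .get() does not create missing entries in the defaultdict
        for name in names_by_key.get(fields[2], ()):
            yield (name, fields[1], fields[3])


# Sample data from the question, held in memory instead of a 5 GB file.
sample_file = io.StringIO(
    "lion\t4\tblue\tch3\n"
    "sheep\t1\tred\tpq2\n"
    "frog\t9\tgreen\txd7\n"
    "donkey\t2\taqua\tzr8\n"
)
sample_list = [("intel", "yellow"), ("amd", "green"), ("msi", "aqua")]

matches = list(calc_multi(sample_file, sample_list))
print(matches)
```

The numeric column comes back as a string ('9', not 9), because nothing in the code converts it; apply int() to that field if the desired output needs integers.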

Using get is faster than checking whether f2 is in vd and then accessing vd[f2], since a matching line then needs one dictionary lookup instead of two.
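To check this claim on your own machine, a small timing sketch such as the one below can help. The dictionary contents, the key list, and the function names with_get and with_in are made up for illustration. The relative timings are not guaranteed: they can depend on the Python version and on how many keys actually hit, so treat the numbers as a measurement and not as a rule.

```python
import timeit

# Hypothetical lookup table and a key stream with 50% hits.
vd = {"green": "amd", "aqua": "msi"}
keys = ["blue", "red", "green", "aqua"] * 1000


def with_get():
    """One dict.get call per key."""
    out = []
    for k in keys:
        v = vd.get(k)
        if v is not None:
            out.append(v)
    return out


def with_in():
    """Membership test followed by indexing on a hit."""
    out = []
    for k in keys:
        if k in vd:
            out.append(vd[k])
    return out


# Both variants must produce the same result before timing them.
assert with_get() == with_in()

print("get    :", timeit.timeit(with_get, number=200))
print("in + []:", timeit.timeit(with_in, number=200))
```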

For more speedup beyond these optimizations, I recommend Cython: http://www.cython.org
