Fastest way to re-read a file in Python?


Question

I've got a file which has a list of names and their positions (start - end).

My script iterates over that file and, for each name, reads another file with info, checks whether each line falls between those positions, and then calculates something from the matching lines.

At the moment, for every name in the first list (approx. 5000), it reads the whole second file (60MB) line by line, checking whether each line is between the start and end. What's the fastest way to collect the data that's between those parameters instead of rereading the whole file 5000 times?

Sample code of the second loop:

for line in file:
    position = int(line.split()[2])  # parse the third field once per line
    if start <= position <= end:
        do_something_with_line(line)  # placeholder for the per-line calculation

Loading the file into a list above the first loop and iterating over that improved the speed.

with open("filename.txt", 'r') as f:
    file2 = f.readlines()
for line in file:
    [...]
    for line2 in file2:
        [...]

Recommended answer

You can use the mmap module to map that file into memory, then iterate over it.

Example:

import mmap

# write a simple example file
with open("hello.txt", "wb") as f:
    f.write(b"Hello Python!\n")

with open("hello.txt", "r+b") as f:
    # memory-map the file, size 0 means whole file
    mm = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print(mm.readline())  # prints b"Hello Python!\n"
    # read content via slice notation
    print(mm[:5])  # prints b"Hello"
    # update content using slice notation;
    # note that new content must have same size
    mm[6:] = b" world!\n"
    # ... and read again using standard file methods
    mm.seek(0)
    print(mm.readline())  # prints b"Hello  world!\n"
    # close the map
    mm.close()
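
Memory-mapping avoids repeated disk reads, but each of the roughly 5000 names would still trigger a full scan of the data. As a complementary idea (not part of the answer above), here is a minimal sketch that reads the second file once, sorts its lines by position, and uses the standard bisect module to find each start/end range with a binary search. The function names load_positions and lines_between are hypothetical, and the sketch assumes, as in the question's int(line.split()[2]), that the third whitespace-separated field of every line is an integer position.

```python
import bisect


def load_positions(path):
    """Read the data file once; return (positions, lines) sorted by position.

    Assumption: the third whitespace-separated field of each line is an
    integer position, as in the question's int(line.split()[2]).
    """
    records = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or too-short lines
            records.append((int(fields[2]), line))
    # Sort by position only, so lines with equal positions keep file order.
    records.sort(key=lambda record: record[0])
    positions = [position for position, _ in records]
    lines = [line for _, line in records]
    return positions, lines


def lines_between(positions, lines, start, end):
    """Return the lines whose position p satisfies start <= p <= end."""
    lo = bisect.bisect_left(positions, start)   # first index with p >= start
    hi = bisect.bisect_right(positions, end)    # first index with p > end
    return lines[lo:hi]
```

With this layout, the outer loop over names would call load_positions once before it starts and lines_between(positions, lines, start, end) once per name, so each lookup costs a binary search plus the size of the matching slice rather than a pass over all 60MB. Note that the returned lines come back in position order, not in their original file order.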
