How to read the last MB of a very large text file

Question

I am trying to find a string near the end of a text file. The problem is that the file can vary greatly in size, from 3 MB to 4 GB. Every time I run a script to find this string in a file of around 3 GB, my computer runs out of memory. So I was wondering whether there is any way for Python to find the size of the file and then read only the last megabyte of it.

The code I am currently using is below but, as I said earlier, I do not seem to have enough memory to read such large files.

find_str = "ERROR"
file = open(file_directory)
last_few_lines = file.readlines()[-20:]  # readlines() loads the whole file into memory

error = False

for line in last_few_lines:
    if find_str in line:
        error = True

Recommended answer

Use file.seek():

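import os

find_str = b"ERROR"  # bytes literal: read() returns bytes in binary mode (Python 3)
error = False
# Open file with 'b' to specify binary mode
with open(file_directory, 'rb') as file:
    # Note minus sign; raises OSError if the file is smaller than 1 MB
    file.seek(-1024 * 1024, os.SEEK_END)
    if find_str in file.read():
        error = True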

You must specify binary mode when you open the file, or you will get 'undefined behavior'. Under Python 2 it might work anyway (it did for me), but under Python 3 seek() raises an io.UnsupportedOperation exception for a nonzero end-relative offset if the file was opened in the default text mode. Because read() returns bytes in binary mode under Python 3, the search string has to be a bytes object as well. Though it isn't clear from the Python 3 docs, the SEEK_* constants are still in the os module.

Update: Using the with statement for safer resource management, as suggested by Chris Betti.
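
The answer's snippet assumes the file is at least 1 MB long; on a shorter file the negative seek fails. The following is a minimal sketch of one way to guard against that, assuming Python 3 (where seek() returns the new position). The helper names read_tail and tail_contains are hypothetical and not part of the original answer.

```python
import os


def read_tail(path, num_bytes=1024 * 1024):
    """Return up to the last num_bytes bytes of the file at path."""
    with open(path, 'rb') as f:
        # In Python 3, seek() returns the new absolute position,
        # so seeking to the end gives the file size.
        size = f.seek(0, os.SEEK_END)
        # Clamp at 0 so files shorter than num_bytes are read in full.
        f.seek(max(size - num_bytes, 0))
        return f.read()


def tail_contains(path, needle, num_bytes=1024 * 1024):
    """Return True if needle (a bytes object) occurs in the file's tail."""
    return needle in read_tail(path, num_bytes)
```

With this in place, the check from the question could be written as error = tail_contains(file_directory, b"ERROR"). If the last 20 lines are needed instead, read_tail(file_directory).splitlines()[-20:] would give them as bytes, with the caveat that the first line of the tail may be cut off mid-line.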
