Read blocks from a file object until x bytes from the end


Question

I need to read 64 KB chunks in a loop and process them, but stop 16 bytes before the end of the file: the last 16 bytes are tag metadata.

The file might be very large, so I can't read all of it into RAM.

All the solutions I have found are a bit clumsy and/or unpythonic.

with open('myfile', 'rb') as f:
    while True:
        block = f.read(65536)
        if not block:
            break
        process_block(block)

If 16 <= len(block) < 65536, it's easy: it's the last block. So useful_data = block[:-16] and tag = block[-16:].

If len(block) == 65536, it could mean three things. The full block may be useful data. Or this 64 KB block may in fact be the last block, so useful_data = block[:-16] and tag = block[-16:]. Or this 64 KB block may be followed by another block of only a few bytes (let's say 3 bytes), in which case useful_data = block[:-13] and tag = block[-13:] + last_block[:3].

How can I deal with this in a nicer way than distinguishing all these cases?

Note:

  • the solution should work for a file opened with open(...), but also for an io.BytesIO() object, or for a remote file opened over SFTP (with pysftp).

I was thinking about using

f.seek(0,2)
length = f.tell()
f.seek(0)

and then, after each

block = f.read(65536)

we can tell how far we are from the end with length - f.tell(), but again the full solution does not look very elegant.
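The seek/tell bookkeeping described above can be hidden inside a generator so that the calling loop stays short. The following is only a sketch: the name iter_payload_blocks and its parameters are hypothetical, and it assumes the file object supports seek() and tell() the way open(...) files and io.BytesIO() objects do. Whether a given remote SFTP file object behaves the same way is an assumption that would need checking.

```python
import io
from typing import BinaryIO, Iterator


def iter_payload_blocks(f: BinaryIO, block_size: int = 65536,
                        tag_size: int = 16) -> Iterator[bytes]:
    """Yield blocks from the current position up to tag_size bytes before EOF.

    Hypothetical helper: assumes f is seekable (seek/tell work).
    """
    start = f.tell()
    f.seek(0, io.SEEK_END)            # jump to the end to learn the length
    payload_end = f.tell() - tag_size
    f.seek(start)                     # go back to where we started

    while f.tell() < payload_end:
        # Never read past the start of the trailing tag.
        block = f.read(min(block_size, payload_end - f.tell()))
        if not block:                 # file ended early (e.g. truncated)
            break
        yield block


# Usage with an in-memory file: 100000 payload bytes followed by a 16-byte tag.
f = io.BytesIO(b"a" * 100000 + b"T" * 16)
sizes = [len(block) for block in iter_payload_blocks(f)]
tag = f.read(16)                      # the position is now at the tag
```

Once the generator is exhausted, the file position sits exactly at the start of the tag, so a plain f.read(16) returns it.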

Recommended answer

You can just read min(65536, L-f.tell()-16) bytes in every iteration, where L is the file size.

Something like this:

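from pathlib import Path

L = Path('myfile').stat().st_size  # total file size in bytes

with open('myfile', 'rb') as f:
    # Stop 16 bytes before the end: those bytes are the tag.
    while f.tell() < L - 16:
        to_read_length = min(65536, L - f.tell() - 16)
        block = f.read(to_read_length)
        process_block(block)
    tag = f.read(16)  # the remaining 16 bytes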

I did not run this, but I hope you get the gist of it.
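A different technique, not the one used in the answer above, avoids needing the file size at all: always hold back the last 16 bytes seen so far, and only hand the bytes before them to the processing function. When read() finally returns nothing, whatever is held back is the tag, so the three cases from the question collapse into one. This is a minimal sketch: the name read_payload_then_tag and its parameters are hypothetical, and it assumes tag_size is greater than 0 and that read() returns an empty bytes object at end of file.

```python
import io
from typing import BinaryIO, Callable


def read_payload_then_tag(f: BinaryIO,
                          process_block: Callable[[bytes], None],
                          block_size: int = 65536,
                          tag_size: int = 16) -> bytes:
    """Feed everything except the last tag_size bytes to process_block.

    Hypothetical helper: needs only f.read(), no seek/tell and no file size.
    Returns the trailing tag.
    """
    held = b""  # the most recent bytes, which might still turn out to be the tag
    while True:
        chunk = f.read(block_size)
        if not chunk:                 # end of file
            break
        data = held + chunk
        # Everything except the last tag_size bytes is certainly payload.
        payload, held = data[:-tag_size], data[-tag_size:]
        if payload:
            process_block(payload)
    if len(held) < tag_size:
        raise ValueError("file is shorter than the tag")
    return held


# Usage: the awkward case from the question, a full 64 KB block plus 3 bytes.
processed = []
source = io.BytesIO(b"x" * (65536 - 13) + b"T" * 16)
tag = read_payload_then_tag(source, processed.append)
```

The trade-off is that the blocks passed to process_block are no longer aligned to 64 KB boundaries in the file (the first one is 16 bytes short), and each iteration copies the block once when concatenating.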
