"OSError:[Errno 22]无效的参数"当read()处理一个大文件时 [英] "OSError: [Errno 22] Invalid argument" when read()ing a huge file

Problem description

I'm trying to write a small script that prints the checksum of a file (using some code from https://gist.github.com/Zireael-N/ed36997fd1a967d78cb2):

import sys
import os
import hashlib

file = '/Users/Me/Downloads/2017-11-29-raspbian-stretch.img'

with open(file, 'rb') as f:
    contents = f.read()
    print('SHA256 of file is %s' % hashlib.sha256(contents).hexdigest())

But I'm getting the following error message:

Traceback (most recent call last):
  File "checksum.py", line 8, in <module>
    contents = f.read()
OSError: [Errno 22] Invalid argument

What am I doing wrong? I'm using Python 3 on macOS High Sierra.

Recommended answer

There have been several issues over the history of Python (most fixed in recent versions) with reading more than 2-4 GB at once from a file handle. An unfixable version of the problem also occurs on 32-bit builds of Python, which simply lack the virtual address space to allocate the buffer; that one is not I/O related, but it is seen most frequently when slurping large files. A workaround available for hashing is to update the hash in fixed-size chunks, which is a good idea anyway, since counting on RAM being larger than the file is a poor idea. The most straightforward approach is to change your code to:

with open(file, 'rb') as f:
    hasher = hashlib.sha256()  # Make empty hasher to update piecemeal
    while True:
        block = f.read(64 * (1 << 20)) # Read 64 MB at a time; big, but not memory busting
        if not block:  # Reached EOF
            break
        hasher.update(block)  # Update with new block
print('SHA256 of file is %s' % hasher.hexdigest())  # Finalize to compute digest
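For reuse, the loop above could be wrapped in a small function. This is only a sketch: the name sha256_of_file and the chunk_size parameter are made up for illustration and are not part of the original answer or of hashlib.

```python
import hashlib


def sha256_of_file(path, chunk_size=64 * (1 << 20)):
    """Return the SHA-256 hex digest of the file at path, read in chunks.

    sha256_of_file and chunk_size are hypothetical names for this sketch.
    """
    hasher = hashlib.sha256()  # Empty hasher, updated piecemeal
    with open(path, 'rb') as f:
        while True:
            block = f.read(chunk_size)  # Never asks for more than chunk_size bytes
            if not block:  # Empty bytes object means EOF
                break
            hasher.update(block)
    return hasher.hexdigest()
```

With the file variable from the question, it would be called as print('SHA256 of file is %s' % sha256_of_file(file)). Because no single read() asks for more than chunk_size bytes, the multi-gigabyte request that triggered the error never happens.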

If you're feeling fancy, you can "simplify" the loop using two-arg iter and some functools magic, replacing the whole of the while loop with:

# Needs "import functools" at the top of the script
for block in iter(functools.partial(f.read, 64 * (1 << 20)), b''):
    hasher.update(block)  # iter() stops when f.read returns b'' (EOF)

Or on Python 3.8+, the walrus operator (:=) makes it simpler, with no need for imports or unreadable code:

while block := f.read(64 * (1 << 20)):  # Assigns and tests result in conditional!
    hasher.update(block)
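On Python 3.11 and later, the manual loop may not be needed at all: hashlib.file_digest accepts a file object opened in binary mode and reads it in pieces internally. The sketch below is not from the original answer. It assumes a hypothetical helper name, sha256_hex, and falls back to a chunked loop on older interpreters (the 1 MB fallback chunk size is an arbitrary choice).

```python
import hashlib


def sha256_hex(path):
    """Hash a file without loading it whole (sha256_hex is a hypothetical name)."""
    with open(path, 'rb') as f:
        if hasattr(hashlib, 'file_digest'):  # Present on Python 3.11+
            return hashlib.file_digest(f, 'sha256').hexdigest()
        # Older Pythons: fall back to fixed-size chunks
        hasher = hashlib.sha256()
        for block in iter(lambda: f.read(1 << 20), b''):
            hasher.update(block)
        return hasher.hexdigest()
```

Either branch avoids the single huge read() that raised the OSError in the question.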
