Get MD5 hash of big files in Python

Question

I have used hashlib (which replaces md5 in Python 2.6/3.0), and it worked fine when I opened a file and put its content into the hashlib.md5() function.

The problem is with very big files, whose sizes can exceed the available RAM.

How can I get the MD5 hash of a file without loading the whole file into memory?

Answer

Break the file into 8192-byte chunks (or some other multiple of MD5's 64-byte block size) and feed them to MD5 consecutively using update().

This takes advantage of the fact that MD5 processes its input in 64-byte blocks (8192 is 64 × 128). Since you're not reading the entire file into memory, this won't use much more than 8192 bytes of memory at a time.

In Python 3.8+ you can do:

import hashlib
with open("your_filename.txt", "rb") as f:
    file_hash = hashlib.md5()
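    # read the file in 8192-byte chunks until f.read() returns b"" (EOF)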
    while chunk := f.read(8192):
        file_hash.update(chunk)
print(file_hash.digest())
print(file_hash.hexdigest())  # to get a printable str instead of bytes
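
For Python versions before 3.8 (no := assignment expressions), a minimal equivalent sketch of the same chunked approach, using iter() with a sentinel:

import hashlib

md5 = hashlib.md5()
with open("your_filename.txt", "rb") as f:
    # iter() calls f.read(8192) repeatedly and stops once it returns b"" (EOF)
    for chunk in iter(lambda: f.read(8192), b""):
        md5.update(chunk)
print(md5.hexdigest())

As an aside, Python 3.11 added hashlib.file_digest(), which wraps this same chunked-reading pattern in a single call, e.g. hashlib.file_digest(f, "md5") on a file opened in binary mode.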
