Hashing a file in Python


Question


I want python to read to the EOF so I can get an appropriate hash, whether it is sha1 or md5. Please help. Here is what I have so far:

import hashlib

inputFile = raw_input("Enter the name of the file:")
openedFile = open(inputFile)
readFile = openedFile.read()

md5Hash = hashlib.md5(readFile)
md5Hashed = md5Hash.hexdigest()

sha1Hash = hashlib.sha1(readFile)
sha1Hashed = sha1Hash.hexdigest()

print "File Name: %s" % inputFile
print "MD5: %r" % md5Hashed
print "SHA1: %r" % sha1Hashed

Answer

TL;DR: use buffers so you don't use tons of memory.


We get to the crux of your problem, I believe, when we consider the memory implications of working with very large files. We don't want this bad boy to churn through 2 gigs of ram for a 2 gigabyte file so, as pasztorpisti points out, we gotta deal with those bigger files in chunks!

import sys
import hashlib

# BUF_SIZE is totally arbitrary, change for your app!
BUF_SIZE = 65536  # let's read stuff in 64kb chunks!

md5 = hashlib.md5()
sha1 = hashlib.sha1()

with open(sys.argv[1], 'rb') as f:
    while True:
        data = f.read(BUF_SIZE)
        if not data:
            break
        md5.update(data)
        sha1.update(data)

print("MD5: {0}".format(md5.hexdigest()))
print("SHA1: {0}".format(sha1.hexdigest()))


What we've done is we're updating our hashes of this bad boy in 64kb chunks as we go along with hashlib's handy dandy update method. This way we use a lot less memory than the 2gb it would take to hash the guy all at once!
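The loop above can be packaged as a reusable helper. This is just a sketch of the same chunked approach; the function name and signature are mine, not part of the original answer:

```python
import hashlib

# Same chunked technique as above, wrapped in a function (name is my
# own choice). Memory use stays roughly constant at buf_size bytes,
# no matter how large the file is.
def hash_file(path, buf_size=65536):
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        while True:
            data = f.read(buf_size)
            if not data:  # an empty bytes object means we hit EOF
                break
            md5.update(data)
            sha1.update(data)
    return md5.hexdigest(), sha1.hexdigest()
```

On Python 3.11+ the standard library also offers `hashlib.file_digest(f, 'md5')`, which does this chunked reading for you.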


You can test this with:

$ mkfile 2g bigfile
$ python hashes.py bigfile
MD5: a981130cf2b7e09f4686dc273cf7187e
SHA1: 91d50642dd930e9542c39d36f0516d45f4e1af0d
$ md5 bigfile
MD5 (bigfile) = a981130cf2b7e09f4686dc273cf7187e
$ shasum bigfile
91d50642dd930e9542c39d36f0516d45f4e1af0d  bigfile

Hope that helps!


Also all of this is outlined in the linked question on the right hand side: Get MD5 hash of big files in Python


In general when writing Python it helps to get into the habit of following PEP 8. For example, in Python variables are typically underscore separated, not camelCased. But that's just style and no one really cares about those things except people who have to read bad style... which might be you reading this code years from now.
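As a small illustration of that point, here is the question's code with PEP 8 snake_case names (the sample file and its contents are mine, added only to make the snippet self-contained):

```python
import hashlib

# PEP 8 style: snake_case names instead of the question's camelCase
# (input_file instead of inputFile, md5_hashed instead of md5Hashed).
input_file = "example.txt"
with open(input_file, "wb") as f:   # create a small sample file to hash
    f.write(b"hello\n")

with open(input_file, "rb") as f:   # binary mode so hashing sees raw bytes
    contents = f.read()             # fine for small files only

md5_hashed = hashlib.md5(contents).hexdigest()
sha1_hashed = hashlib.sha1(contents).hexdigest()

print("File Name: %s" % input_file)
print("MD5: %s" % md5_hashed)
print("SHA1: %s" % sha1_hashed)
```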
