Hashing a file in Python


Question

I want python to read to the EOF so I can get an appropriate hash, whether it is sha1 or md5. Please help. Here is what I have so far:

import hashlib

inputFile = raw_input("Enter the name of the file:")
openedFile = open(inputFile)
readFile = openedFile.read()

md5Hash = hashlib.md5(readFile)
md5Hashed = md5Hash.hexdigest()

sha1Hash = hashlib.sha1(readFile)
sha1Hashed = sha1Hash.hexdigest()

print "File Name: %s" % inputFile
print "MD5: %r" % md5Hashed
print "SHA1: %r" % sha1Hashed

Recommended answer

TL;DR: use buffers so you don't use tons of memory.

We get to the crux of your problem, I believe, when we consider the memory implications of working with very large files. We don't want this bad boy to churn through 2 gigs of ram for a 2 gigabyte file so, as pasztorpisti points out, we gotta deal with those bigger files in chunks!

import sys
import hashlib

# BUF_SIZE is totally arbitrary, change for your app!
BUF_SIZE = 65536  # lets read stuff in 64kb chunks!

md5 = hashlib.md5()
sha1 = hashlib.sha1()

with open(sys.argv[1], 'rb') as f:
    while True:
        data = f.read(BUF_SIZE)
        if not data:
            break
        md5.update(data)
        sha1.update(data)

print("MD5: {0}".format(md5.hexdigest()))
print("SHA1: {0}".format(sha1.hexdigest()))

What we've done is we're updating our hashes of this bad boy in 64kb chunks as we go along with hashlib's handy dandy update method. This way we use a lot less memory than the 2gb it would take to hash the guy all at once!
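
If you want the same loop in reusable form, here is a minimal sketch. The function name hash_file, its parameters and its defaults are made up for illustration and are not part of the answer above; it assumes every name you pass is one that hashlib.new() accepts on your system.

```python
import hashlib


def hash_file(path, algorithms=("md5", "sha1"), buf_size=65536):
    """Return {algorithm_name: hex_digest} for the file at `path`.

    Hypothetical helper: the name, parameters and defaults are
    illustrative only.
    """
    # One hash object per requested algorithm, looked up by name.
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # iter() with a sentinel calls f.read(buf_size) until it returns b"".
        for chunk in iter(lambda: f.read(buf_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling hash_file("bigfile") would give you a dict with "md5" and "sha1" keys. The file is still read only once, no matter how many algorithms you ask for.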

You can test this with:

$ mkfile 2g bigfile
$ python hashes.py bigfile
MD5: a981130cf2b7e09f4686dc273cf7187e
SHA1: 91d50642dd930e9542c39d36f0516d45f4e1af0d
$ md5 bigfile
MD5 (bigfile) = a981130cf2b7e09f4686dc273cf7187e
$ shasum bigfile
91d50642dd930e9542c39d36f0516d45f4e1af0d  bigfile
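
If mkfile or the md5 command is not available on your system, you can do a similar sanity check in Python alone. This is only a sketch: the test size, the function names and the use of random bytes are my assumptions, not part of the original test. It writes a temporary file, hashes it in chunks, and compares the result with hashing the same bytes in one go.

```python
import hashlib
import os
import tempfile

BUF_SIZE = 65536  # same chunk size as the script above
# Assumed test size, deliberately not a multiple of BUF_SIZE.
TEST_SIZE = 5 * 1024 * 1024 + 123


def chunked_md5(path, buf_size=BUF_SIZE):
    """MD5 of a file, read buf_size bytes at a time."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            data = f.read(buf_size)
            if not data:
                break
            md5.update(data)
    return md5.hexdigest()


def check_chunked_matches_whole(size=TEST_SIZE):
    """Return True if chunked hashing agrees with one-shot hashing."""
    payload = os.urandom(size)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        return chunked_md5(path) == hashlib.md5(payload).hexdigest()
    finally:
        os.remove(path)  # clean up the temporary file


if __name__ == "__main__":
    print(check_chunked_matches_whole())
```

This one holds the whole payload in memory for the comparison, so keep the size small; it checks the loop, not the memory savings.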

Hope that helps!

Also all of this is outlined in the linked question on the right hand side: Get MD5 hash of big files in Python

In general when writing Python it helps to get into the habit of following PEP 8. For example, in Python variables are typically underscore_separated, not camelCased. But that's just style and no one really cares about those things except people who have to read bad style... which might be you reading this code years from now.
