Efficient checksum calculation on large files


Problem description

Hi all

Does anyone know of a fast way to calculate checksums for a large file?
I need a way to generate ETag keys for a webserver. ETags for large
files are not really necessary, but it would be nice if I could do it. I'm
using the Python hash function on the dynamically generated strings (like
page content), but for things like images I use shutil's
copyfileobj function, and the hash of a file object is just its
handler's memory address.

Does anyone know a Python utility I could use, perhaps
something like the md5sum utility on *nix systems?
--
--------------------------------------
Ola Natvig <ol********@infosense.no>
infoSense AS / development
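
The behaviour described in the question can be sketched as follows. This is only a minimal illustration, assuming a Python 3 interpreter (CPython) and the standard hashlib module, neither of which appears in the original post; the temporary file and its contents are made up for the example. The built-in hash() of a file object depends on the object's identity rather than on the bytes in the file, so two handles on the same file hash differently, while a digest of the contents stays the same.

```python
import hashlib
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"example image bytes")

with open(path, "rb") as f1, open(path, "rb") as f2:
    # hash() on a file object is identity-based: it says nothing about content.
    identity_hashes_differ = hash(f1) != hash(f2)
    # A digest of the bytes is the same no matter which handle reads them.
    digest1 = hashlib.sha1(f1.read()).hexdigest()
    digest2 = hashlib.sha1(f2.read()).hexdigest()

os.remove(path)
print(identity_hashes_differ, digest1 == digest2)
```

A content digest like this is what makes a usable ETag, since it changes only when the file's bytes change.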

Solution

Well, md5sum is usable on many systems. I run it on win32 and darwin.

I tried this in 2.4 with the new subprocess module:

import subprocess
import time

def md5sum(fn):
    # Run the external md5sum tool and return its raw output
    return subprocess.Popen(["md5sum.exe", fn],
                            stdout=subprocess.PIPE).communicate()[0]

t0 = time.time()
print(md5sum('test.rml'))
t1 = time.time()
print(t1 - t0)

and got

C:\Tmp>md5sum.py
b68e4efa5e5dbca37718414f6020f6ff *test.rml

0.0160000324249

Tried with the original:
C:\Tmp>timethis md5sum.exe test.rml

TimeThis : Command Line : md5sum.exe test.rml
TimeThis : Start Time : Tue Feb 08 16:12:26 2005

b68e4efa5e5dbca37718414f6020f6ff *test.rml

TimeThis : Command Line : md5sum.exe test.rml
TimeThis : Start Time : Tue Feb 08 16:12:26 2005
TimeThis : End Time : Tue Feb 08 16:12:26 2005
TimeThis : Elapsed Time : 00:00:00.437

C:\Tmp>ls -l test.rml
-rw-rw-rw- 1 user group 996688 Dec 31 09:57 test.rml

C:\Tmp>

--
Robin Becker
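
For comparison, the same kind of measurement can be made without starting an external process. The following is a rough sketch, assuming Python 3 and the standard hashlib module (not used in the post above). The data is an arbitrary stand-in with the same size as test.rml, so its digest will not match the one in the post, and the timing will vary by machine.

```python
import hashlib
import time

# Stand-in data the size of test.rml from the post (996688 bytes);
# the contents are arbitrary, so the digest differs from the post's.
data = b"x" * 996688

t0 = time.perf_counter()
digest = hashlib.md5(data).hexdigest()
elapsed = time.perf_counter() - t0

print(digest)
print("%.6f seconds" % elapsed)
```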


Is there a reason you can't use the sha module? Using a random large file I had
lying around:

sha.new(file("jdk-1_5_0-linux-i586.rpm").read()).hexdigest() # loads all into memory first

If you don't want to load the whole object into memory at once, you can always call out to the sha1sum utility yourself as well:

subprocess.Popen(["sha1sum", ".bashrc"], stdout=subprocess.PIPE).communicate()[0].split()[0]

'5c59906733bf780c446ea290646709a14750eaad'
--
Michael Hoffman
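
A third option, not spelled out in the thread, is to stay inside Python but feed the file to the hash object in chunks, so that memory use stays bounded without calling an external tool. The sketch below assumes Python 3 and the standard hashlib module, which replaced the older sha and md5 modules; the function name file_hexdigest, its parameters, and the 64 KiB chunk size are illustrative choices, not something from the posts.

```python
import hashlib
import os
import tempfile

def file_hexdigest(path, algorithm="sha1", chunk_size=64 * 1024):
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage with a throwaway file (contents are arbitrary example data).
fd, tmp_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"abc" * 100000)

chunked = file_hexdigest(tmp_path)
whole = hashlib.sha1(b"abc" * 100000).hexdigest()
os.remove(tmp_path)
print(chunked == whole)
```

Because update() can be called repeatedly, the chunked digest is the same as hashing the whole content in one call; the algorithm argument makes it easy to choose something other than MD5.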


Michael Hoffman wrote:

Is there a reason you can't use the sha module?

BTW, I'm using SHA-1 instead of MD5 because of the reported vulnerabilities
in MD5, which may not be important for your application, but I consider it
best to just avoid MD5 entirely in the future.
--
Michael Hoffman

