What is the less expensive hash algorithm?


Problem description

I do not know much about hash algorithms.

I need to compute the hash of an incoming file live in Java before forwarding the file to a remote system (a bit like S3), which requires a file hash in MD2/MD5/SHA-X. This hash is not computed for security reasons but simply as a consistency checksum.

I am able to compute this hash live while forwarding the file, using a DigestInputStream from the Java standard library, but I would like to know which algorithm is best to use to avoid the performance problems of using the DigestInputStream.
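For reference, the live computation described here can look roughly like the sketch below: the incoming stream is wrapped in a DigestInputStream so that the digest is updated transparently while the bytes are copied to the remote system, and the file is read only once. The class name, method name, and 8 KB buffer size are assumptions made for this example, not part of the original question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ForwardWithDigest {

    // Copies "in" to "out" and returns the MD5 digest of the bytes copied.
    // MD5 is used here only because the remote system accepts it; any other
    // MessageDigest algorithm name could be substituted.
    public static byte[] forward(InputStream in, OutputStream out)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = dis.read(buffer)) != -1) {
                out.write(buffer, 0, n); // forward the chunk unchanged
            }
        }
        return md.digest(); // digest of everything that was read
    }
}
```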

One of my former colleagues tested this and told us that computing the hash live can be quite expensive compared to using a unix command-line tool or hashing a file on disk.

Edit about premature optimization: I work at a company whose goal is to help other companies dematerialize their documents. This means we have a batch process that handles document transfers from other companies. We are targeting millions of documents per day in the future, and the execution time of this batch is actually sensitive for our business.

A hashing optimisation of 10 milliseconds per document, at 1 million documents per day, reduces the daily execution time by about 3 hours, which is pretty huge.
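For the arithmetic behind that figure: 10 ms × 1,000,000 documents = 10,000 seconds per day, which is roughly 2.8 hours.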

Recommended answer

If you simply want to detect accidental corruption during transmission, etc., then a simple (non-crypto) checksum should be sufficient. But note that (for example) a 16-bit checksum will fail to detect random corruption one time in 2^16. And it is no guard against someone deliberately modifying the data.

The Wikipedia page on checksums lists various options, including a number of commonly used (and cheap) ones such as Adler-32 and CRCs.
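As a rough sketch of that suggestion, java.util.zip already provides cheap checksums (CRC32, Adler32), and a CheckedInputStream can wrap the incoming stream the same way a DigestInputStream does. The class and method names below are assumptions for illustration, and a CRC-32 only helps if the remote system accepts it in place of MD2/MD5/SHA-X.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class ForwardWithCrc32 {

    // Copies "in" to "out" and returns the CRC-32 checksum of the bytes
    // copied. java.util.zip.Adler32 could be swapped in the same way and
    // is usually even cheaper to compute.
    public static long forward(InputStream in, OutputStream out) throws IOException {
        CRC32 crc = new CRC32();
        try (CheckedInputStream cis = new CheckedInputStream(in, crc)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = cis.read(buffer)) != -1) {
                out.write(buffer, 0, n); // forward the chunk unchanged
            }
        }
        return crc.getValue();
    }
}
```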

However, I agree with @ppeterka: this smells of "premature optimization".

