JavaScript for Files中的快速低冲突非加密哈希 [英] Fast low-collision non-crypto hash in JavaScript for Files

查看:119
本文介绍了JavaScript for Files中的快速低冲突非加密哈希的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找在JavaScript中实现低冲突的快速哈希。它不需要是加密哈希。我基本上使用它来查看某个文件是否已经上传(或部分上传)到用户的帐户,以便在大(视频)文件上保存一些上传时间。

I am looking for a fast hash with low collisions implemented in JavaScript. It doesn't need to be a crypto hash. I am basically using it as a way to see if a given file has already been uploaded (or partially uploaded) to a user's account to save them some upload time on large (video) files.

我正在使用新的HTML5文件API来读取文件的切片。然后我把它交给 SparkMD5 给我一个文件的哈希值。我喜欢SparkMD5允许我进行增量散列的事实,所以我不必在内存中读取整个内容。

I am using the new HTML5 File API to read in slices of the file. I then hand this off to SparkMD5 to give me a hash of the file. I like the fact that SparkMD5 allows me to do an incremental hash so I don't have to read the entire thing in memory.

总体而言,SparkMD5可以满足我的需求但是对于大文件,它可能需要一段时间才能得到我的哈希值(对于300MB文件大约需要30秒)。我希望减少这一点。我对哈希函数知之甚少,所以我不想寻找一些已经实现的库。

Overall, SparkMD5 works for my needs but for large files it can take a while to get me my hash (about 30 seconds for a 300MB file). I would ideally like to reduce this. I am not that knowledgeable on hash functions so I am not looking to port something and am ideally looking for an already implemented library.

推荐答案

基准测试各种散列算法,这是我找到的最快的选项:

I benchmarked various hashing algorithms, and here's the fastest options I found:


  • 如果您只需要32位摘要,请使用 iMurmurHash 。请注意,这将在大约2 ** 14(16,000)个哈希值后发生冲突。

  • If you only need 32 bit digests, use iMurmurHash. Note that this will give you collisions after about 2**14 (16,000) hashes.

使用 SparkMD5 如果你需要超过32位。我没有找到快速的64或128位Murmur实现,但SparkMD5的速度非常快(75 MB /秒)。

Use SparkMD5 if you need more than 32 bits. I didn't find a fast 64 or 128 bit Murmur implementation, but SparkMD5 was surprisingly fast (75 MB/sec).


  • 如果你需要增量更新,考虑将字符串连接到更大的块然后再将它们提供给SparkMD5,因为SparkMD5的增量散列似乎会受到一些适度的开销。

这些建议适用于纯JavaScript。我用字符串对它们进行了基准测试,但是SparkMD5也使用了ArrayBuffers。

These recommendations are for pure JavaScript. I benchmarked them with strings, but SparkMD5 takes ArrayBuffers as well.

如果你想在Node上使用快速哈希模块,你最好选项略有不同:

If you want fast hashing modules on Node, your best options are slightly different:


  • 如果您正在散列缓冲区:使用内置的加密模块。


  • 例外情况是:如果您不需要增量散列,需要超过500 MB /秒的吞吐量,就可以了本机npm依赖,使用 murmurhash-native 获得一些额外的性能。我没有测试小于128位的摘要大小,因为即使使用128位,散列也是如此之快,以至于它不太可能成为瓶颈。

  • The exception to this is: If you don't need incremental hashing, and you need more than 500 MB/sec throughput, and you're OK with a native npm dependency, use murmurhash-native for some extra performance. I didn't test digest sizes of less than 128 bit, as even with 128 bits the hashing is so fast that it's not likely to be a bottleneck.

(注意, murmurhash-native在技术上支持增量散列,但在此模式下它比Node的内置MD5算法慢。)

(Note that murmurhash-native technically supports incremental hashing, but it's slower than Node's built-in MD5 algorithm in this mode.)

如果您以非增量方式对单个字符串进行哈希处理,请将
转换为Buffer并查看上一个项目符号点。

If you are hashing a single string non-incrementally, convert it to a Buffer and see the preceding bullet point.

如果您逐渐散列字符串:

If you are hashing strings incrementally:


  • 如果您只需要32位,请使用iMurmurHash。请注意,这将在大约2 ** 14(16,000)个哈希值后发生冲突。

  • If you only need 32 bits, use iMurmurHash. Note that this will give you collisions after about 2**14 (16,000) hashes.

如果需要超过32位:使用内置加密使用MD5算法的模块。

If you need more than 32 bits: Use the built-in crypto module with the MD5 algorithm.


  • 我还建议你尝试将字符串连接成更大的块,因为字符串在默认情况下被隐式转换为Buffers你将它们传递给加密模块,每个缓冲区创建都很慢。性能通常会受到缓冲区创建和字符串连接的瓶颈,因为相比之下本机哈希函数非常快。

这篇关于JavaScript for Files中的快速低冲突非加密哈希的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆