JavaScript for Files中的快速低冲突非加密哈希 [英] Fast low-collision non-crypto hash in JavaScript for Files
问题描述
我正在寻找在JavaScript中实现低冲突的快速哈希。它不需要是加密哈希。我基本上使用它来查看某个文件是否已经上传(或部分上传)到用户的帐户,以便在大(视频)文件上保存一些上传时间。
I am looking for a fast hash with low collisions implemented in JavaScript. It doesn't need to be a crypto hash. I am basically using it as a way to see if a given file has already been uploaded (or partially uploaded) to a user's account to save them some upload time on large (video) files.
我正在使用新的HTML5文件API来读取文件的切片。然后我把它交给 SparkMD5 给我一个文件的哈希值。我喜欢SparkMD5允许我进行增量散列的事实,所以我不必在内存中读取整个内容。
I am using the new HTML5 File API to read in slices of the file. I then hand this off to SparkMD5 to give me a hash of the file. I like the fact that SparkMD5 allows me to do an incremental hash so I don't have to read the entire thing in memory.
总体而言,SparkMD5可以满足我的需求但是对于大文件,它可能需要一段时间才能得到我的哈希值(对于300MB文件大约需要30秒)。我希望减少这一点。我对哈希函数知之甚少,所以我不想寻找一些已经实现的库。
Overall, SparkMD5 works for my needs but for large files it can take a while to get me my hash (about 30 seconds for a 300MB file). I would ideally like to reduce this. I am not that knowledgeable on hash functions so I am not looking to port something and am ideally looking for an already implemented library.
推荐答案
我基准测试各种散列算法,这是我找到的最快的选项:
I benchmarked various hashing algorithms, and here's the fastest options I found:
-
如果您只需要32位摘要,请使用 iMurmurHash 。请注意,这将在大约2 ** 14(16,000)个哈希值后发生冲突。
If you only need 32 bit digests, use iMurmurHash. Note that this will give you collisions after about 2**14 (16,000) hashes.
使用 SparkMD5 如果你需要超过32位。我没有找到快速的64或128位Murmur实现,但SparkMD5的速度非常快(75 MB /秒)。
Use SparkMD5 if you need more than 32 bits. I didn't find a fast 64 or 128 bit Murmur implementation, but SparkMD5 was surprisingly fast (75 MB/sec).
- 如果你需要增量更新,考虑将字符串连接到更大的块然后再将它们提供给SparkMD5,因为SparkMD5的增量散列似乎会受到一些适度的开销。
这些建议适用于纯JavaScript。我用字符串对它们进行了基准测试,但是SparkMD5也使用了ArrayBuffers。
These recommendations are for pure JavaScript. I benchmarked them with strings, but SparkMD5 takes ArrayBuffers as well.
如果你想在Node上使用快速哈希模块,你最好选项略有不同:
If you want fast hashing modules on Node, your best options are slightly different:
-
如果您正在散列缓冲区:使用内置的加密模块。
-
例外情况是:如果您不需要增量散列,和需要超过500 MB /秒的吞吐量,和就可以了本机npm依赖,使用 murmurhash-native 获得一些额外的性能。我没有测试小于128位的摘要大小,因为即使使用128位,散列也是如此之快,以至于它不太可能成为瓶颈。
The exception to this is: If you don't need incremental hashing, and you need more than 500 MB/sec throughput, and you're OK with a native npm dependency, use murmurhash-native for some extra performance. I didn't test digest sizes of less than 128 bit, as even with 128 bits the hashing is so fast that it's not likely to be a bottleneck.
(注意, murmurhash-native在技术上支持增量散列,但在此模式下它比Node的内置MD5算法慢。)
(Note that murmurhash-native technically supports incremental hashing, but it's slower than Node's built-in MD5 algorithm in this mode.)
如果您以非增量方式对单个字符串进行哈希处理,请将
转换为Buffer并查看上一个项目符号点。
If you are hashing a single string non-incrementally, convert it to a Buffer and see the preceding bullet point.
如果您逐渐散列字符串:
If you are hashing strings incrementally:
-
如果您只需要32位,请使用iMurmurHash。请注意,这将在大约2 ** 14(16,000)个哈希值后发生冲突。
If you only need 32 bits, use iMurmurHash. Note that this will give you collisions after about 2**14 (16,000) hashes.
如果需要超过32位:使用内置加密使用MD5算法的模块。
If you need more than 32 bits: Use the built-in crypto module with the MD5 algorithm.
- 我还建议你尝试将字符串连接成更大的块,因为字符串在默认情况下被隐式转换为Buffers你将它们传递给加密模块,每个缓冲区创建都很慢。性能通常会受到缓冲区创建和字符串连接的瓶颈,因为相比之下本机哈希函数非常快。
这篇关于JavaScript for Files中的快速低冲突非加密哈希的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!