CRC32可以用作散列函数吗? [英] Can CRC32 be used as a hash function?

查看:635
本文介绍了CRC32可以用作散列函数吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

CRC32可以用作散列函数吗?这种方法的任何缺点?
任何tradedeoffs?

解决方案

CRC32非常好作为散列算法。 CRC的整点是用尽可能少的冲突散列字节流。也就是说,有几点需要考虑:


  • CRC是不安全的。为了安全散列,你需要一个计算量更大的算法。对于一个简单的桶加密器来说,安全性通常是一个非问题。


  • 不同的CRC版本存在不同的属性。确保你使用正确的算法,例如使用散列多项式0x11EDC6F41(CRC32C),这是最佳的通用选择。

  • 作为哈希速度/质量折衷,x86 CRC32指令很难击败。但是,这个指令在旧CPU中不存在,所以要小心可移植性的问题。




----编辑 - ---

Mark Adler为Bret Mulvey提供了一篇有用的文章,用于散列评估。使用本文提供的源代码,我对CRC32C和Jenkins96进行了桶测试。这些表格显示了一个真正的均匀分布比单独偶然测量结果更糟的可能性。所以,数字越高越好。作者认为0.05或更低为弱,0.01或更低为弱。我完全相信所有这些作者,并且只是报告结果。



我在所有CRC32C比Jenkins96表现更好的实例中放置了一个*。通过这个简单的计数,CRC32C是一个比Jenkins96更为统一的散列96次96次。 特别是如果您可以使用x86 CRC32指令,则速度性能折衷非常好。

 
CRC32C 0x1EDC6F41)

统一键文本键稀疏键

位低位高位低位高位低位高位
1 0.671 * 0.671 * 1.000 0.120 * 0.572 * 0.572
2 * 0.706 * 0.165 * 0.729 * 0.919 0.277 0.440
3 * 0.878 * 0.879 * 0.556 0.362 * 0.535 * 0.542
4 0.573 0.332 0.433 0.462 * 0.855 0.393
5 0.023 * 0.681 0.470 0.907 0.266 0.059
6 * 0.145 * 0.523 0.354 * 0.172 * 0.336 0.588
7 0.424 0.722 0.172 * 0.736 0.184 * 0.842
8 * 0.767 0.507 * 0.533 0.437 0.337 0.321
9 0.480 0.725 * 0.753 * 0.807 * 0.618 0.025
10 * 0.719 0.161 * 0.970 * 0.740 * 0.789 0.344
11 * 0.610 0.225 * 0.849 * 0.814 * 0.854 * 0.003
12 * 0.979 * 0.239 * 0.709 0.786 0.171 * 0.865
13 * 0.515 0.395 0.192 0.600 0.869 * 0.238 $ b $ 14 0.089 * 0.609 0.055 * 0.414 * 0.286 * 0.398
15 * 0.372 * 0.719 * 0.944 0.100 * 0.852 * 0.300
16 0.015 * 0.946 * 0.467 0.459 0.372 * 0.793

而对于Jenkins96而言,文章被认为是一个很好的散列:

 
Jenkins96

统一键文本键稀疏键

位低位高位低位高位低位高位
1 0.888 0.572 0.090 0.322 0.090 0.203
2 0.198 0.027 0.505 0.447 0.729 0.825
3 0.444 0.510 0.360 0.444 0.467 0.540
4 0.974 0.783 0.724 0.971 0.439 0.902
5 0.308 0.383 0.686 0.940 0.424 0.119
6 0.138 0.505 0.907 0.103 0.300 0.891
7 0.7 10 0.956 0.202 0.407 0.792 0.506
8 0.031 0.552 0.229 0.573 0.407 0.688
9 0.682 0.990 0.276 0.075 0.269 0.543
10 0.382 0.933 0.038 0.559 0.746 0.511
11 0.043 0.918 0.101 0.290 0.584 0.822
12 0.895 0.036 0.207 0.966 0.486 0.533
13 0.290 0.872 0.902 0.934 0.877 0.155
14 0.859 0.568 0.428 0.027 0.136 0.265
15 0.290 0.420 0.915 0.465 0.532 0.059
16 0.155 0.922 0.036 0.577 0.545 0.336


Can CRC32 be used as a hash function? Any drawbacks to this approach? Any tradedeoffs?

解决方案

CRC32 works very well as a hash algorithm. The whole point of a CRC is to hash a stream of bytes with as few collisions as possible. That said, there are a few points to consider:

  • CRC's are not secure. For secure hashing you need a much more computationally expensive algorithm. For a simple bucket hasher, security is usually a non-issue.

  • Different CRC flavors exist with different properties. Make sure you use the right algorithm, e.g. with hash polynomial 0x11EDC6F41 (CRC32C) which is the optimal general purpose choice.

  • As a hashing speed/quality trade-off, the x86 CRC32 instruction is tough to beat. However, this instruction doesn't exist in older CPU's so beware of portability problems.

---- EDIT ----

Mark Adler provided a link to a useful article for hash evaluation by Bret Mulvey. Using the source code provided in the article, I ran the "bucket test" for both CRC32C and Jenkins96. These tables show the probability that a truly uniform distribution would be worse than the measured result by chance alone. So, higher numbers are better. The author considered 0.05 or lower to be weak and 0.01 or lower to be very weak. I'm entirely trusting the author on all this and am just reporting results.

I placed an * by all the instances where CRC32C performed better than Jenkins96. By this simple tally, CRC32C was a more uniform hash than Jenkins96 54 of 96 times. Especially if you can use the x86 CRC32 instruction, the speed performance trade-off is excellent.

CRC32C (0x1EDC6F41)

       Uniform keys        Text keys         Sparse keys

Bits  Lower    Upper     Lower    Upper     Lower    Upper
 1    0.671   *0.671    *1.000    0.120    *0.572   *0.572
 2   *0.706   *0.165    *0.729   *0.919     0.277    0.440
 3   *0.878   *0.879    *0.556    0.362    *0.535   *0.542
 4    0.573    0.332     0.433    0.462    *0.855    0.393
 5    0.023   *0.681     0.470    0.907     0.266    0.059
 6   *0.145   *0.523     0.354   *0.172    *0.336    0.588
 7    0.424    0.722     0.172   *0.736     0.184   *0.842
 8   *0.767    0.507    *0.533    0.437     0.337    0.321
 9    0.480    0.725    *0.753   *0.807    *0.618    0.025
10   *0.719    0.161    *0.970   *0.740    *0.789    0.344
11   *0.610    0.225    *0.849   *0.814    *0.854   *0.003
12   *0.979   *0.239    *0.709    0.786     0.171   *0.865
13   *0.515    0.395     0.192    0.600     0.869   *0.238
14    0.089   *0.609     0.055   *0.414    *0.286   *0.398
15   *0.372   *0.719    *0.944    0.100    *0.852   *0.300
16    0.015   *0.946    *0.467    0.459     0.372   *0.793

And for Jenkins96, which the author of article considered to be an excellent hash:

Jenkins96

      Uniform keys         Text keys         Sparse keys

Bits  Lower    Upper     Lower    Upper     Lower    Upper
 1    0.888    0.572     0.090    0.322     0.090    0.203
 2    0.198    0.027     0.505    0.447     0.729    0.825
 3    0.444    0.510     0.360    0.444     0.467    0.540
 4    0.974    0.783     0.724    0.971     0.439    0.902
 5    0.308    0.383     0.686    0.940     0.424    0.119
 6    0.138    0.505     0.907    0.103     0.300    0.891
 7    0.710    0.956     0.202    0.407     0.792    0.506
 8    0.031    0.552     0.229    0.573     0.407    0.688
 9    0.682    0.990     0.276    0.075     0.269    0.543
10    0.382    0.933     0.038    0.559     0.746    0.511
11    0.043    0.918     0.101    0.290     0.584    0.822
12    0.895    0.036     0.207    0.966     0.486    0.533
13    0.290    0.872     0.902    0.934     0.877    0.155
14    0.859    0.568     0.428    0.027     0.136    0.265
15    0.290    0.420     0.915    0.465     0.532    0.059
16    0.155    0.922     0.036    0.577     0.545    0.336

这篇关于CRC32可以用作散列函数吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆