What is the possibility of CRC16 collisions on 20 bytes of data?


Problem description

I am developing a system which needs to store a hash for a structure that is 20 bytes (maybe less) in length. However, in order to optimize the process of looking up the hash in a series of hashes, we want to reduce the size of the hash as much as possible.

So my question is this: is there a relationship between the amount of data we feed into a crc16 hash and the probability of it colliding with another entry of the same length? If so, what is the optimal length for this?

18 of the bytes fall within the ASCII table (a-z, 0-9), and the remaining bytes range between 0 and 10.

Solution

The following simple script runs an infinite loop that fetches two random 20-byte sequences, calculates their CRC16 values and checks whether they collide. Letting this loop run continuously gives a de facto estimate of the collision percentage:

#!/usr/bin/env perl

use strict;
use warnings;
use Digest::CRC qw(crc16);

# Draw random 20-byte strings from /dev/urandom.
open(my $f, '<', '/dev/urandom') or die "cannot open /dev/urandom: $!";

my $n = 0;      # number of pairs tested
my $coll = 0;   # number of CRC16 collisions seen

while (1) {
    read $f, my $randstr1, 20;
    read $f, my $randstr2, 20;
    my $crc1 = crc16($randstr1);
    my $crc2 = crc16($randstr2);

    $n++;
    $coll++ if $crc1 == $crc2;

    printf "percent of collisions = %.6f%%\n", $coll * 100.0 / $n if ($n % 100000 == 0);
}
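As a quick sanity check on the digest size, crc16() from Digest::CRC returns a plain integer in the 0..65535 range; the exact value printed below depends on the CRC-16 variant the module implements, so it is illustrative only:

#!/usr/bin/env perl
use strict;
use warnings;
use Digest::CRC qw(crc16);

# crc16() returns an unsigned 16-bit integer, i.e. one of 2^16 possible values.
my $digest = crc16('hello world');
printf "crc16('hello world') = %u (0x%04x)\n", $digest, $digest;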

From what I get on my computer, the collision percentage seems to be around 0.0016% (about 1.6e-5, or roughly "1 in 65,000"), which is essentially what an ideally distributed 16-bit hash predicts: two independent inputs collide with probability 1/2^16 ≈ 0.0015%.
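For reference, the ideal pairwise rate quoted above is just 1/2^16 expressed as a percentage; a throwaway one-liner to confirm the arithmetic:

perl -e 'printf "ideal pairwise collision rate = %.4f%%\n", 100 / 2**16'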

UPDATE: I see you've clarified that the 20 bytes are not fully random bytes, but fall into the range [a-z0-9]. Here's an updated version that estimates collisions over that alphabet:

#!/usr/bin/env perl

use strict;
use warnings;
use Digest::CRC qw(crc16);

my $n = 0;      # number of pairs tested
my $coll = 0;   # number of CRC16 collisions seen
my @chars = ('a'..'z', '0'..'9');

# Build a random 20-character string over the [a-z0-9] alphabet.
sub randstr {
    my $res = '';
    foreach (1..20) { $res .= $chars[rand @chars]; }
    return $res;
}

while (1) {
    my $crc1 = crc16(randstr());
    my $crc2 = crc16(randstr());

    $n++;
    $coll++ if $crc1 == $crc2;

    printf "percent of collisions = %.4f%%\n", $coll * 100.0 / $n if ($n % 100000 == 0);
}

It yields approximately the same result, about 0.0016%.
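Since the hashes are ultimately stored and looked up in a series, the pairwise rate understates how quickly collisions appear as the table grows. A rough back-of-the-envelope sketch, assuming an ideal 16-bit hash with M = 2^16 equally likely values (the sample sizes below are arbitrary illustrations):

#!/usr/bin/env perl
use strict;
use warnings;

# Birthday approximation: P(at least one collision among N stored hashes)
# ~ 1 - exp(-N*(N-1) / (2*M)) for an ideal hash with M possible values.
my $M = 2**16;
for my $N (100, 300, 1_000, 10_000) {
    my $p = 1 - exp(-$N * ($N - 1) / (2 * $M));
    printf "N = %6d entries -> P(collision) ~ %5.1f%%\n", $N, 100 * $p;
}

Under that assumption, around 300 stored entries already give roughly a 50/50 chance of at least one duplicate 16-bit value.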
