Is the Fletcher16 checksum suitable for small data?


Problem description

Using the straightforward implementation from Wikipedia's Fletcher's checksum article, we get the same checksum for data such as "BCA" and "CAB", as well as for "BAC" and "ACB".

Is this expected?
Should not the Fletcher16 checksum account for the order of the blocks?

The deficiency can easily be fixed by OR'ing the index with the data as shown in the code below....

uint16_t fletcher16( uint8_t *data, int count )
{
   uint16_t sum1 = 0;
   uint16_t sum2 = 0;
   int index;

   for( index = 0; index < count; ++index )
   {
      //sum1 = (sum1 + data[index]) % 255; // Original
      sum1 = (sum1 + index | data[index]) % 255; // The "fix"
      sum2 = (sum2 + sum1) % 255;
   }

   return (sum2 << 8) | sum1;
}

Solution

... we get the same checksum... Is this expected?

Yes, it is possible. The checksum is only 16 bits wide (and 511 of those values never occur: 0x..FF and 0xFF.., leaving 255 × 255 = 65,025 possible results), so the 2^24 possible 3-character strings will certainly have collisions. See the pigeonhole principle.

Should not the Fletcher16 checksum account for the order of the blocks?

It does. It is just that the algorithm collides readily with certain similar inputs. Also see Hamming distance.

BTW, the original algorithm gives different results depending on whether the length (3) or the size (4, which also checksums the null character) of the string is passed. Either way, the 4 input strings yield just one pair of distinct results.

  printf("%x\n", fletcher16("BCA",3)); // 8ec6
  printf("%x\n", fletcher16("CAB",3)); // 8ec6 same
  printf("%x\n", fletcher16("BAC",3)); // 8cc6 
  printf("%x\n", fletcher16("ACB",3)); // 8cc6 same

  printf("%x\n", fletcher16("BCA",4)); // 55c6
  printf("%x\n", fletcher16("CAB",4)); // 55c6 same
  printf("%x\n", fletcher16("BAC",4)); // 53c6
  printf("%x\n", fletcher16("ACB",4)); // 53c6 same
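
The calls above pass string literals where the question's function expects a uint8_t pointer, so they need a cast to compile cleanly. A minimal self-contained sketch of the original (unmodified) algorithm, assuming a const-qualified pointer and a size_t count (both small departures from the question's signature), might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Original Fletcher-16 update rule from the question, without the OR "fix".
 * Assumption: const pointer and size_t count instead of the question's
 * (uint8_t *, int) signature. */
uint16_t fletcher16(const uint8_t *data, size_t count)
{
    uint16_t sum1 = 0;
    uint16_t sum2 = 0;

    for (size_t index = 0; index < count; ++index) {
        sum1 = (uint16_t)((sum1 + data[index]) % 255u);
        sum2 = (uint16_t)((sum2 + sum1) % 255u);
    }

    /* 1u * sum2 forces unsigned arithmetic before the shift. */
    return (uint16_t)((1u * sum2 << 8) | sum1);
}
```

With this version, fletcher16((const uint8_t *)"BCA", 3) and fletcher16((const uint8_t *)"CAB", 3) both evaluate to 0x8ec6, matching the listing above.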


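OP's suggested improvement also weakens the checksum: or-ing in the index disregards select bits at each stage. Note too that + binds more tightly than |, so sum1 + index | data[index] is evaluated as (sum1 + index) | data[index]. Suggest xor-ing or adding instead.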


Minor nits:

// return (sum2 << 8) | sum1;
return (1u*sum2 << 8) | sum1;

This change is not detrimental for any int/unsigned size, yet it avoids implementation-defined behavior when int/unsigned are 16-bit. It is best to ensure code does not left-shift into the sign bit.

some_int % 255 performs a signed remainder. On some devices, simple embedded ones for example, an unsigned remainder is as fast or faster. Nothing is lost with % 255u, and there is potential for improvement.

[Edit]

Although OP has "fixed" code for short strings, it defeats a design parameter of fletcher16(), that of speed of execution.


Details: If we set aside the %255, sum1 is data[0] + ... + data[count-1] and sum2 is data[0]*(count) + data[1]*(count-1) + ... + data[count-1]*(1). It becomes easy to create strings of length 1, 2, 3, etc. with low values that incur few, if any, %255 operations.

Notice that sum2 is the part that effectively creates different checksums based on order. sum1 is simply the sum of the array elements modulo 255 (in OP's 4 test cases that sum never even reaches 255), so it is the same for any 2 strings that differ only in order.
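
To see why the test strings collide, it can help to compute the position-weighted sum behind sum2 without any reduction. The helper below is a hypothetical illustration (weighted_sum is not part of the original code); it evaluates data[0]*count + data[1]*(count-1) + ... + data[count-1]*1 directly:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the unreduced weighted sum that sum2 equals
 * before any % 255 is applied. The first byte gets weight count,
 * the last byte gets weight 1. */
unsigned long weighted_sum(const uint8_t *data, size_t count)
{
    unsigned long total = 0;

    for (size_t i = 0; i < count; ++i) {
        total += (unsigned long)(count - i) * data[i];
    }
    return total;
}
```

For "BCA" this gives 3*66 + 2*67 + 65 = 397 and for "CAB" it gives 3*67 + 2*65 + 66 = 397, so the two strings already agree before the modulo is applied; "BAC" and "ACB" both give 395.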

To effectively "mix up/hash" short strings with low values, a different algorithm is needed.

Perhaps only use a variant when count < 8:

sum1 = (sum1 + index + data[index]) % 255;
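
As a rough way to compare the index-mixing ideas on OP's test strings, the sketch below parameterizes the sum1 update. The enum and the function name fletcher16_mix are hypothetical, introduced only for this illustration, and the code assumes short inputs (for example count < 8) so the index stays small:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector for how the index is mixed into each byte. */
enum mix_mode { MIX_NONE, MIX_ADD_INDEX, MIX_XOR_INDEX };

/* Fletcher-16 with a selectable sum1 update; intended for short inputs. */
uint16_t fletcher16_mix(const uint8_t *data, size_t count, enum mix_mode mode)
{
    unsigned sum1 = 0;
    unsigned sum2 = 0;

    for (size_t index = 0; index < count; ++index) {
        unsigned term = data[index];

        if (mode == MIX_ADD_INDEX) {
            term = data[index] + (unsigned)index;          /* add the index */
        } else if (mode == MIX_XOR_INDEX) {
            term = data[index] ^ (unsigned)(index & 0xFFu); /* xor the index */
        }
        sum1 = (sum1 + term) % 255u;
        sum2 = (sum2 + sum1) % 255u;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

On these four strings, adding the index shifts both sums by an amount that depends only on count, so "BCA" and "CAB" still collide (both give 0x92c9). Xor-ing the index separates "BCA" (0x8ec7) from "CAB" (0x8ac3), but "BAC" and "ACB" still share 0x88c3. That is consistent with the point above that short, low-valued strings call for a different algorithm.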
