CRC32碰撞概率 [英] CRC32 Collision Probability

查看:209
本文介绍了CRC32碰撞概率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经做了很多其他问题的检查,但是我对此仍然不确定.

I've done quite a bit of checking up on other questions and I'm still uncertain on the issue.

这是我的用例:

我有一个在线购物车.有时,某些客户发现订购过程太繁琐,或者有些客户无法在线下单,因此需要实际的PDF估算值(报价)才能购买产品.

I have an online shopping cart. Occassionaly, certain clients find the ordering process either too tedious, or there are some clients where an online order will not cut it, and they need an actual PDF estimate (quote) in order to purchase a product.

因此,我在一个模块中编码,该模块接收购物车中的内容,并将其整齐地布置为PDF估算值.

So I coded in a module that takes the shopping cart contents, and lays out neatly as a PDF estimate.

现在,因为此过程仅使用购物车中的内容,而不使用其他任何内容,甚至不使用数据库,所以我必须创建一个唯一的估算文件编号,以便客户应为报价单付款,他们可以使用引用在他们的付款说明中.

Now because this process only uses the cart contents, and nothing else is used, not even the database, I have to create a unique Estimate document number, so that should the client pay for the quote, they have a reference to use in their payment instruction.

当前,购物车会生成一个5位数字的购物车ID,该ID对于每个客户来说都是唯一的,取决于他们的会话.我已经获取了这个5位数的购物车ID,然后在其中添加了UNIX时间,这给了我一个很好的长号,用作估计"文档号.

The shopping cart currently generates a 5 digit cart ID, unique to each customer based on their session. I've taken this 5 digit cart ID, and I've then added UNIX time to it, which gives me a nice long number to use as the Estimate document number.

所以我最终得到这样的结果:363821482812537 [36382是购物车ID,而1482812537是在生成PDF估算值时的unix时间]

So I end up with something like this: 363821482812537 [36382 is the cart ID and 1482812537 is unix time at the time the PDF estimate was generated]

问题是它太长了,由于银行付款参考受到限制,这将成为一个问题.理想情况下,我希望将其保持在10个字符以内.

The problem with this is that it is too long, and WILL be an issue as bank payment references are limited. Ideally, I'd like to keep it to 10 characters or less.

我决定查看CRC32来缩短生成的估计数,并且似乎能够将估计数缩短到可接受的字符数.

I've decided to look at CRC32 to shorten the generated estimate numbers, and it seems capable of shortening the estimate number to an acceptable amount of characters.

但是,任何人都可以对我可能遇到的碰撞类型有所了解吗?

But, can anyone shed some light on what kind of collision I might be up against?

要考虑的几件事:

  1. 购物车ID始终为5位数字.

  1. Cart ID will always be 5 digits.

在2286年之前,Unix时间将始终为10位数字.

Unix time will always be 10 digits up until the year 2286.

[因此,我们总是以需要编码的15位数字结尾,并且不再需要其他数字]

[So we will always end up with 15 digits that needs to be encoded, and no more]

  1. 有一个适当的保护措施,如果偶然发生重复,则会引发错误,并提供该选项以重试并生成估计.通过将估算值保存到与估算值匹配的文件名(或本例中为估算数字的CRC32哈希)的方法来完成,然后首先检查是否存在带有哈希值的文件名.

  1. There is a safeguard in place, that if by some chance, a duplicate occurs, an error is thrown, and the the option is provided to retry and generate the estimate. This is done by the estimate saving to a filename matching the estimate number (or in this case, the CRC32 hash of the estimate number) - and then checking first to see if a filename with the hash exists.

由于对我的问题不重要的原因,目前暂时不允许客户自己生成估算值.因此只有管理员才能生成估算值.

Customers will for the moment not be allowed to generate estimates themselves, for reasons not important to my question. So it will only be admins who can generate estimates.

我的担心很简单,我会发现自己经常用15位CRC32哈希编码遇到冲突吗?还是很少会发生冲突?

My concern is simple, will I find myself running into collisions very often with my 15 digit to CRC32 hash encoding, or is it going to be pretty rare to run into collisions?

推荐答案

为什么不只维护一个估计数,而这个估计数在每次需要一个新的估计数时都会增加?您已经在有效地维护一个已用编号列表,以检查是否存在冲突,因此只需将计数器放在此处即可.然后,您只需要看一件事,而不是 n 件事.通过获取CRC,您正在丢弃可能尝试从估计数中提取的信息,因此首先没有必要从该信息中提取ID.您的方法似乎比需要的方法复杂得多.

Why not just maintain an estimate number that you simply increment each time you need a new one? You are already effectively maintaining a list of used numbers to check against for collisions, so just put your counter there. Then you only need to look at one thing instead of n things. By taking the CRC, you are discarding information you might try to extract from the estimate number, so there was no point in making the ID out of that information in the first place. Your approach seems way more complicated than it needs to be.

单个碰撞的概率为2 -32 .数据内容无关紧要,只要它大于32位(在这种情况下就是这种情况),这是因为CRC在混合这些位方面做得很好.但是,如果您以前生成了 n 个估算值,则有 n 个碰撞机会.因此,随着 n 的增长,发生碰撞的机会也相应增加. (请参阅生日问题.)因此,在仅进行77,164次估计后,他们的两个哈希值发生冲突的可能性就有50%.

The probability of an individual collision is 2-32. The data content doesn't matter, so long as it's more than 32 bits, which it is in this case, since a CRC does a very good job mixing the bits. However you have n chances at a collision if you have previously generated n estimates. So as n grows, the chance of a collision grows accordingly. (See the Birthday Problem.) As a result, after only 77,164 estimates there is a probability of 50% that two of their hashes collided.

这篇关于CRC32碰撞概率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆