您可以截断SHA1哈希多少并合理确定具有唯一ID? [英] How much can you truncate a SHA1 hash and be reasonably sure of having an unique ID?

查看:87
本文介绍了您可以截断SHA1哈希多少并合理确定具有唯一ID?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在制作一个存储文件的应用程序,并根据包括时间戳在内的一些内容的SHA1摘要为每个文件提供一个UID。摘要有很多字符,我希望允许用户使用完整摘要的前x个字符来识别文档。如果文档数可能在10K-100K左右,那么x的一个好值是什么?

I am making an application that stores documents and gives each one a UID based on a SHA1 digest of a few things including the timestamp. The digest has a lot of characters, and I want to allow users to identify the documents by using the first x characters of the full digest. What's a good value for x if the number of documents is maybe around 10K - 100K?

推荐答案

关于生日问题的维基百科,您可以将碰撞概率近似为 1-e ^(-n ^ 2 /(2 ^(b + 1))),其中 n 是文档计数, b 是位数。 用n = 100,000绘制该公式,看来您至少需要b> 45。我更倾向于使用64,使其成为一个不错的整数。也就是说,是否有应对冲突的计划(可能会稍微更改时间戳,还是添加随机数?)

Adapting the formulas on on wikipedia for the Birthday problem, you can approximate the probability of collision as 1 - e^(-n^2/(2^(b+1))), where n is the document count and b is the number of bits. Graphing this formula with n=100,000, it looks like you'll want b > 45 at least. I'd be more inclined to go with 64 to make it a nice and round number. That said, do have a plan to deal with collisions if they occur (maybe alter the timestamp slightly, or add a nonce?)

为此,如果sha1是不仅基于文档的内容,为什么不简单地将其设为随机ID?在这种情况下,冲突就不再是问题,因为您始终可以生成一个新的随机数并重试(但是一次尝试发生冲突的可能性是相同的。)

For that matter, if the sha1 is based on more than just the content of the document, why not simply make it a random ID? In this case collisions are less of a problem, as you can always generate a new random number and try again (the probability of a collision with a single try is the same, however).

这篇关于您可以截断SHA1哈希多少并合理确定具有唯一ID?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆