Hash Function MD5 30 length


Problem description

From a given string, I am generating a 32-digit hash code using MD5:

    import java.security.MessageDigest

    MessageDigest.getInstance("MD5")
                 .digest("SOME-BIG-STRING".getBytes("UTF-8"))
                 .map("%02x".format(_)).mkString

    //output: 47a8899bdd7213fb1baab6cd493474b4

Is it possible to generate a 30-digit hash instead of a 32-digit one, and what problems would that cause?

Is there another hash algorithm that produces 30-character output and keeps the collision probability low for 1 trillion unique strings?

Security is not important, uniqueness is required.

Solution

For generating unique IDs from strings, hash functions are never the correct answer.

What you would need is to define a one-to-one mapping of text strings (such as "v1.0.0") onto 30-character-long strings (such as "123123..."). This is also known as a bijection, although in your case an injection (a simple one-to-one mapping from inputs to outputs, not necessarily onto) may be enough. As the other answer at the time of this writing notes, hash functions don't necessarily ensure this mapping, but there are other possibilities, such as full-period linear congruential generators (if they take a seed that you can map one-to-one onto input string values), or other reversible functions.

However, if the set of possible input strings is larger than the set of possible output strings, then you can't map all input strings one-to-one with all output strings (without creating duplicates), due to the pigeonhole principle.
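As a rough illustration of a reversible function, the sketch below uses an affine map modulo 10^30. It assumes each input has already been mapped one-to-one to an integer below 10^30 (for example, a sequence number), and that IDs are 30 decimal digits. The object name `AffineId` and both constants are hypothetical choices for this example, not values from the question.

```scala
object AffineId {
  // Assumption: IDs are 30 decimal digits, so the ID space is 0 until 10^30.
  val Modulus: BigInt = BigInt(10).pow(30)

  // Hypothetical constants chosen for illustration only.
  // The multiplier must be coprime to 10^30 (odd and not a multiple of 5),
  // otherwise the map is not invertible.
  val Multiplier: BigInt = BigInt("618033988749894848204586834367")
  val Increment: BigInt = BigInt("271828182845904523536028747135")
  require(Multiplier.gcd(Modulus) == 1, "multiplier must be coprime to the modulus")

  // Maps an integer in [0, 10^30) to a distinct, zero-padded 30-digit string.
  def encode(n: BigInt): String = {
    require(n >= 0 && n < Modulus, "input outside the ID space")
    val digits = (Multiplier * n + Increment).mod(Modulus).toString
    "0" * (30 - digits.length) + digits
  }

  // Inverts encode, which is what makes the mapping one-to-one.
  def decode(id: String): BigInt =
    ((BigInt(id) - Increment) * Multiplier.modInverse(Modulus)).mod(Modulus)
}
```

Because `decode` recovers the original integer, two different inputs can never share an ID. The `require` in `encode` is where the pigeonhole limit shows up: inputs that do not fit in the ID space are rejected rather than silently duplicated.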

See also this question: How to generate a GUID with a custom alphabet, that behaves similar to an MD5 hash (in JavaScript)?

Indeed, if you use hash functions, the chance of collision will be close to zero but never exactly zero (meaning that the risk of duplicates will always be there). If we take MD5 as an example (which produces any of 2^128 hash codes), then roughly speaking, the chance of accidental collision becomes non-negligible only after 2^64 IDs are generated, which is well over 1 trillion.
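To put numbers on this, here is a minimal sketch of the usual birthday-bound approximation, together with a helper that truncates an MD5 hex digest. It assumes hash outputs behave like uniformly random values, and that "30 digits" means 30 hexadecimal digits (120 bits); the object and method names are made up for this example.

```scala
import java.security.MessageDigest

object HashCollision {
  // MD5 hex digest cut to the first `hexDigits` characters
  // (30 hex digits keep 120 of the 128 bits).
  def truncatedMd5Hex(input: String, hexDigits: Int): String =
    MessageDigest.getInstance("MD5")
      .digest(input.getBytes("UTF-8"))
      .map("%02x".format(_)).mkString
      .take(hexDigits)

  // Approximate probability of at least one collision among n uniformly
  // random values of the given bit length:
  //   p ≈ 1 - exp(-n(n-1) / 2^(bits+1))
  def collisionProbability(n: Double, bits: Int): Double =
    -java.lang.Math.expm1(-n * (n - 1) / math.pow(2.0, bits + 1))
}
```

Under these assumptions, `HashCollision.collisionProbability(1e12, 128)` comes out to roughly 1.5e-15 and `HashCollision.collisionProbability(1e12, 120)` to roughly 3.8e-13. Both are tiny, but neither is zero, which is the point made above.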

But MD5 and other hash functions are not the right way to do what you want to do. This is discussed next.


If you can't restrict the format of your input strings to 30 digits and can't compress them to 30 digits or less and can't tolerate the risk of duplicates, then the next best thing is to create a database table mapping your input strings to randomly generated IDs.

This database table should have two columns: one column holds your input strings (e.g., "<UUID>-NAME-<UUID>"), and the other column holds randomly generated IDs associated with those strings. Since random numbers don't ensure uniqueness, every time you create a new random ID you will need to check whether the random ID already exists in the database, and if it does exist, try a new random ID (but the chance that a duplicate is found will shrink as the size of the ID grows).
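The check-and-retry loop described above might look like the following sketch. It uses in-memory collections as a stand-in for the two database columns, which is an assumption made to keep the example self-contained; a real table would enforce uniqueness with constraints on both columns. The name `IdTable` and the choice of 30 decimal digits are likewise illustrative.

```scala
import java.security.SecureRandom
import scala.collection.mutable

object IdTable {
  private val random = new SecureRandom()

  // Hypothetical in-memory stand-ins for the two database columns.
  private val idByInput = mutable.Map.empty[String, String]
  private val usedIds = mutable.Set.empty[String]

  // Generates a random 30-digit decimal string.
  private def randomId(): String =
    Seq.fill(30)(random.nextInt(10)).mkString

  // Returns the ID already stored for this input, or assigns a fresh one,
  // retrying whenever the random candidate is already in use.
  def idFor(input: String): String =
    idByInput.getOrElseUpdate(input, {
      var candidate = randomId()
      while (!usedIds.add(candidate)) candidate = randomId()
      candidate
    })
}
```

Calling `IdTable.idFor` twice with the same string returns the same ID, and different strings always receive different IDs because every candidate is checked against the set of IDs already handed out.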
