Hash Function MD5 30 Length
Question
From a given string I am generating a 32-digit unique hash code using MD5:
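import java.nio.charset.StandardCharsets
import java.security.MessageDigest

MessageDigest.getInstance("MD5")
  .digest("SOME-BIG-STRING".getBytes(StandardCharsets.UTF_8)).map("%02x".format(_)).mkString
//output: 47a8899bdd7213fb1baab6cd493474b4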
Is it possible to generate a 30-digit hash instead of a 32-digit one, and what problems would that cause?
Is there another hash algorithm that produces 30-character output with a low enough collision probability for 1 trillion unique strings?
Security is not important; uniqueness is required.
Answer
For generating unique IDs from strings, hash functions are never the correct answer.
What you would need is to define a one-to-one mapping of text strings (such as "v1.0.0") onto 30-character-long strings (such as "123123..."). This is also known as a bijection, although in your case an injection (a simple one-to-one mapping from inputs to outputs, not necessarily onto) may be enough. As the other answer at the time of this writing notes, hash functions don't necessarily ensure this mapping, but there are other possibilities, such as full-period linear congruential generators (if they take a seed that you can map one-to-one onto input string values), or other reversible functions.
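One such reversible function can be sketched as follows. The step function of a linear congruential generator, f(x) = (a·x + c) mod m, is a bijection on [0, m) whenever gcd(a, m) = 1 (the stronger full-period conditions only matter when the generator is iterated). This is a minimal sketch, assuming each input string has already been given a unique sequence number (for example from a counter) and using m = 10^30 so every output is a 30-digit decimal string; the object name and the constants are hypothetical, chosen only for illustration.

```scala
// Sketch: f(x) = (A * x + C) mod M is a bijection on [0, M) when gcd(A, M) == 1,
// so distinct sequence numbers always map to distinct 30-digit IDs.
// ASSUMPTION: each input string already has a unique sequence number x in [0, M).
// The constants below are hypothetical, chosen only for illustration.
object AffinePermutation {
  val M: BigInt = BigInt(10).pow(30)           // 10^30 possible 30-digit IDs
  val A: BigInt = BigInt("722954210361898317") // hypothetical multiplier (odd, not divisible by 5)
  val C: BigInt = BigInt("123456789012345678") // hypothetical increment
  require(A.gcd(M) == 1, "multiplier must be coprime to the modulus")

  private val AInv: BigInt = A.modInverse(M)   // used to undo the mapping

  // Sequence number -> zero-padded 30-digit decimal string.
  def encode(x: BigInt): String = {
    require(x >= 0 && x < M, "sequence number out of range")
    "%030d".format((A * x + C).mod(M).bigInteger)
  }

  // 30-digit string -> original sequence number (inverse of encode).
  def decode(id: String): BigInt = (AInv * (BigInt(id) - C)).mod(M)
}
```

This only scrambles numbers that are already unique; the one-to-one assignment of sequence numbers to strings still has to be kept somewhere, such as a counter plus a lookup table.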
However, if the set of possible input strings is larger than the set of possible output strings, then you can't map all input strings one-to-one with all output strings (without creating duplicates), due to the pigeonhole principle.
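To make the pigeonhole limit concrete: 30 hex characters hold 120 bits, i.e. 15 bytes. Below is a minimal sketch of a true injection (not a hash) under a hypothetical layout in which the first hex digit stores the UTF-8 byte length (0 to 14) and the remaining 28 digits store the bytes, zero-padded. Storing the length keeps an input apart from the same input followed by a NUL byte; anything longer than 14 bytes is rejected because, under this layout, it cannot be fit in one-to-one. The object name is hypothetical.

```scala
import java.nio.charset.StandardCharsets

// Sketch of an injection from short strings into 30 hex characters.
// Hypothetical layout: 1 hex digit = byte length (0..14),
// then 28 hex digits = the UTF-8 bytes, right-padded with '0'.
object ShortStringInjection {
  val MaxBytes = 14

  // Returns None when the input is too long to be encoded one-to-one.
  def encode(s: String): Option[String] = {
    val bytes = s.getBytes(StandardCharsets.UTF_8)
    if (bytes.length > MaxBytes) None
    else {
      val body = bytes.map(b => "%02x".format(b)).mkString
      Some((Integer.toHexString(bytes.length) + body).padTo(30, '0'))
    }
  }

  // Recovers the original string, which proves no information was lost.
  def decode(id: String): String = {
    val len = Integer.parseInt(id.substring(0, 1), 16)
    val bytes = id.substring(1, 1 + 2 * len)
      .grouped(2)
      .map(h => Integer.parseInt(h, 16).toByte)
      .toArray
    new String(bytes, StandardCharsets.UTF_8)
  }
}
```

A 36-character UUID string alone is far longer than 14 bytes, so an encoding like this only helps when the input format can be restricted to very short strings.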
See also this question: How to generate a GUID with a custom alphabet, that behaves similar to an MD5 hash (in JavaScript)?
Indeed, if you use hash functions, the chance of collision will be close to zero but never exactly zero (meaning that the risk of duplicates will always be there). If we take MD5 as an example (which produces one of 2^128 possible hash codes), then roughly speaking, by the birthday bound, the chance of accidental collision becomes non-negligible only after about 2^64 IDs are generated, which is well over 1 trillion.
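As a rough sketch of what cutting MD5 down to 30 characters costs: keeping the first 30 hex digits keeps 120 of the 128 bits, and the birthday-bound approximation P ≈ 1 − exp(−n(n−1) / (2·N)) estimates the collision probability for n inputs over N possible outputs. This assumes MD5 outputs behave like uniformly random values; the object and function names are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object TruncatedMd5 {
  // Full 32-hex-digit MD5, as in the question's snippet.
  def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map("%02x".format(_))
      .mkString

  // First 30 hex digits = 120 of the 128 bits.
  def md5Hex30(s: String): String = md5Hex(s).take(30)

  // Birthday-bound estimate for n items over `space` equally likely outputs.
  // expm1 keeps precision when the probability is tiny.
  def collisionProbability(n: Double, space: Double): Double =
    -math.expm1(-n * (n - 1) / (2 * space))
}
```

For n = 10^12 this gives about 3.8 × 10^-13 for 30 hex digits (space 2^120), about 1.5 × 10^-15 for the full 128 bits, and about 5 × 10^-7 if the 30 digits are decimal (space 10^30). All are small estimates, and none is zero.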
But MD5 and other hash functions are not the right way to do what you want to do. This is discussed next.
If you can't restrict the format of your input strings to 30 digits and can't compress them to 30 digits or less and can't tolerate the risk of duplicates, then the next best thing is to create a database table mapping your input strings to randomly generated IDs.
This database table should have two columns: one column holds your input strings (e.g., "<UUID>-NAME-<UUID>"), and the other column holds randomly generated IDs associated with those strings. Since random numbers don't ensure uniqueness, every time you create a new random ID you will need to check whether the random ID already exists in the database, and if it does exist, try a new random ID (but the chance that a duplicate is found will shrink as the size of the ID grows).
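The table-based approach can be sketched in memory. This is a minimal sketch, assuming two mutable maps stand in for the two-column database table (a real table would instead carry a unique constraint on each column) and that IDs are 30 random decimal digits; `RandomIdRegistry`, `idFor` and `inputFor` are hypothetical names.

```scala
import java.security.SecureRandom
import scala.collection.mutable

// In-memory stand-in for the two-column table described above.
// ASSUMPTION: a real system would use a database table with a unique
// constraint on each column instead of these maps. Not thread-safe.
class RandomIdRegistry(idLength: Int = 30) {
  private val rng = new SecureRandom()
  private val idByInput = mutable.Map.empty[String, String] // input string -> ID
  private val inputById = mutable.Map.empty[String, String] // ID -> input string

  // Random string of decimal digits (hypothetical ID format).
  private def randomId(): String =
    Iterator.continually(('0' + rng.nextInt(10)).toChar).take(idLength).mkString

  // Returns the stored ID for `input`, or creates one, retrying
  // whenever the random candidate is already taken.
  def idFor(input: String): String =
    idByInput.getOrElseUpdate(input, {
      var candidate = randomId()
      while (inputById.contains(candidate)) candidate = randomId()
      inputById(candidate) = input
      candidate
    })

  // Reverse lookup, like querying the table by its ID column.
  def inputFor(id: String): Option[String] = inputById.get(id)
}
```

With a real database, the existence check and the insert should happen atomically (for example, by relying on the unique constraint and retrying when an insert is rejected); otherwise two concurrent requests could pick the same ID.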