PHP - What is a good way to produce a short alphanumeric string from a long md5 hash?

Problem description

This is for the purpose of having a nice short URL which refers to an md5 hash in a database. I would like to convert something like this:

a7d2cd9e0e09bebb6a520af48205ced1

into something like this:

hW9lM5f27

Those both contain about the same amount of information. The method doesn't have to be direct and reversible, but that would be nice (more flexible). At the least, I would want a randomly generated string with the hex hash as the seed so it is reproducible. I'm sure there are many possible answers; I am curious to see how people would do it in an elegant way.

Oh, this doesn't have to have perfect 1:1 correspondence with the original hash, but that would be a bonus (I guess I already implied that with the reversibility criterion). And I would like to avoid collisions if possible.

EDIT: I realized my initial calculations were totally wrong (thanks to the people answering here, though it took me a while to clue in), and you can't really reduce the string length very much by throwing all the lowercase and uppercase letters into the mix. So I guess I will want something that doesn't directly convert from hex to base 62.

Recommended answer

Here's a little function for consideration:

/** Return 22-char compressed version of 32-char hex string (eg from PHP md5). */
function compress_md5($md5_hash_str) {
    // (we start with 32-char $md5_hash_str eg "a7d2cd9e0e09bebb6a520af48205ced1")
    $md5_bin_str = "";
    foreach (str_split($md5_hash_str, 2) as $byte_str) { // ("a7", "d2", ...)
        $md5_bin_str .= chr(hexdec($byte_str));
    }
    // ($md5_bin_str is now a 16-byte string equivalent to $md5_hash_str)
    $md5_b64_str = base64_encode($md5_bin_str);
    // (now it's a 24-char string version of $md5_hash_str eg "p9LNng4JvrtqUgr0ggXO0Q==")
    $md5_b64_str = substr($md5_b64_str, 0, 22);
    // (but we know the last two chars will be ==, so drop them eg "p9LNng4JvrtqUgr0ggXO0Q")
    $url_safe_str = str_replace(array("+", "/"), array("-", "_"), $md5_b64_str);
    // (Base64 includes two non-URL safe chars, so we replace them with safe ones)
    return $url_safe_str;
}

Basically, you have 16 bytes of data in the MD5 hash string. It's 32 chars long because each byte is encoded as 2 hex digits (i.e. 00-FF). So we break the string up into bytes and build a 16-byte binary string from them. Because that is no longer human-readable or valid ASCII, we base-64 encode it back into readable chars. Base-64 causes a ~4/3 expansion (each output char carries only 6 bits, so it takes 4 chars, i.e. 32 bits of output, to encode 24 bits of input), so the 16 bytes become 24 chars. But because base-64 encoding pads its output to a multiple of 4 chars, the last 2 of those 24 chars are always "==" padding here, and we keep only the first 22. Finally, we replace the two non-URL-safe characters used by base-64 ("+" and "/") with URL-safe equivalents ("-" and "_").
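To put rough numbers on that expansion, here is a small sketch that counts how many characters are needed to hold all 128 bits of an MD5 in alphabets of different sizes. The helper name chars_needed is made up for illustration and is not part of the answer's function.

```php
<?php
/**
 * Hypothetical helper: smallest number of chars from an alphabet of
 * $alphabet_size symbols that can represent every $bits-bit value.
 */
function chars_needed($bits, $alphabet_size) {
    $target = pow(2.0, $bits);   // number of distinct values to cover
    $capacity = 1.0;             // values representable with $chars chars
    $chars = 0;
    while ($capacity < $target) {
        $capacity *= $alphabet_size;
        $chars++;
    }
    return $chars;
}

foreach (array(16, 36, 62, 64) as $size) {
    printf("base %d: %d chars\n", $size, chars_needed(128, $size));
}
```

This should report 32 chars for base 16, 25 for base 36, and 22 for both base 62 and base 64, so a purely alphanumeric (base 62) encoding of the full hash would be no shorter than the URL-safe base-64 form.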

This is fully reversible, but that is left as an exercise to the reader.
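For anyone who would rather skip the exercise, one possible inverse is sketched below. The name decompress_md5 is hypothetical, and the sketch assumes the input was produced by the steps described above (URL-safe substitution, two padding chars dropped).

```php
<?php
/** Hypothetical inverse: 22-char URL-safe string back to 32-char hex string. */
function decompress_md5($url_safe_str) {
    // Undo the URL-safe substitution ("-" back to "+", "_" back to "/").
    $md5_b64_str = strtr($url_safe_str, "-_", "+/");
    // Restore the two "=" padding chars that were dropped, then decode strictly.
    $md5_bin_str = base64_decode($md5_b64_str . "==", true);
    if ($md5_bin_str === false || strlen($md5_bin_str) !== 16) {
        return false; // not a valid compressed MD5
    }
    // Re-encode the 16 raw bytes as 32 lowercase hex digits.
    return bin2hex($md5_bin_str);
}

var_dump(decompress_md5("p9LNng4JvrtqUgr0ggXO0Q"));
// string(32) "a7d2cd9e0e09bebb6a520af48205ced1"
```

Returning false on malformed input is a design choice of this sketch; a caller handling user-supplied URLs would probably want to treat that case as "not found".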

I think this is the best you can do, unless you don't care about human-readable/ASCII, in which case you can just use $md5_bin_str directly.
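If the original input is still available (an assumption; it is not when only the hex digest is stored), PHP's md5() can return those 16 raw bytes directly through its second argument, which skips the hex-decoding loop. A minimal sketch with a placeholder input:

```php
<?php
// Assumption: $data is the original string that was hashed.
$data = "hello";

$md5_bin_str  = md5($data, true); // second argument true => 16 raw bytes
$md5_hash_str = md5($data);       // the usual 32-char hex form

var_dump(strlen($md5_bin_str));                    // int(16)
var_dump(bin2hex($md5_bin_str) === $md5_hash_str); // bool(true)
```

The raw form is the most compact, but it contains arbitrary byte values, so it would need escaping before it could appear in a URL.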

You can also use a prefix or other subset of this function's result if you don't need to preserve all the bits. Throwing out data is obviously the simplest way to shorten things! (But then it's not reversible.)
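As a rough sketch of the prefix idea, the hypothetical helper below (short_md5_code is an invented name) keeps only the first few characters of the 22-char code. It uses hex2bin(), which needs PHP 5.4 or later, in place of the manual byte loop.

```php
<?php
/** Hypothetical helper: first $length chars of the 22-char URL-safe code. */
function short_md5_code($md5_hash_str, $length = 9) {
    $bin  = hex2bin($md5_hash_str);           // 16 raw bytes
    $b64  = rtrim(base64_encode($bin), "=");  // 22 chars, padding removed
    $code = strtr($b64, "+/", "-_");          // URL-safe alphabet
    return substr($code, 0, $length);         // each kept char carries 6 bits
}

echo short_md5_code("a7d2cd9e0e09bebb6a520af48205ced1"), "\n"; // p9LNng4Jv
```

A 9-char prefix keeps 54 of the 128 bits, so by the usual birthday estimate collisions become likely somewhere around 2^27 (roughly 134 million) entries. If collisions must be avoided, the short code would need a uniqueness check against the database before being stored.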

P.S. For your input of "a7d2cd9e0e09bebb6a520af48205ced1" (32 chars), this function will return "p9LNng4JvrtqUgr0ggXO0Q" (22 chars).
