如何创建 URL 缩短器? [英] How do I create a URL shortener?
问题描述
我想创建一个 URL 缩短器服务,您可以在其中将长 URL 写入输入字段,该服务将 URL 缩短为http://www.example.org/abcdef
".
I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to "http://www.example.org/abcdef
".
除了abcdef
"之外,还可以使用包含 a-z、A-Z 和 0-9
的六个字符的任何其他字符串.这使得 56~570 亿个可能的字符串.
Instead of "abcdef
" there can be any other string with six characters containing a-z, A-Z and 0-9
. That makes 56~57 billion possible strings.
我的方法:
我有一个包含三列的数据库表:
I have a database table with three columns:
- id,整数,自增
- long,字符串,用户输入的长网址
- short、string、缩短的 URL(或仅六个字符)
然后我会将长 URL 插入表中.然后我会选择id
"的自动递增值并构建它的哈希.然后应将此哈希作为short
"插入.但是我应该构建什么样的哈希?像 MD5 这样的哈希算法会创建太长的字符串.我不使用这些算法,我想.自建算法也可以.
I would then insert the long URL into the table. Then I would select the auto-increment value for "id
" and build a hash of it. This hash should then be inserted as "short
". But what sort of hash should I build? Hash algorithms like MD5 create too long strings. I don't use these algorithms, I think. A self-built algorithm will work, too.
我的想法:
对于http://www.google.de/
",我得到自动递增 ID 239472
.然后我执行以下步骤:
For "http://www.google.de/
" I get the auto-increment id 239472
. Then I do the following steps:
short = '';
if divisible by 2, add "a"+the result to short
if divisible by 3, add "b"+the result to short
... until I have divisors for a-z and A-Z.
这可以重复,直到数字不再可整除.你认为这是一个好方法吗?你有更好的主意吗?
That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?
由于对这个话题的持续兴趣,我已经发布了一个有效的解决方案到 GitHub,带有 JavaScript 的实现, PHP, Python 和 Java.如果您愿意,请添加您的解决方案:)
Due to the ongoing interest in this topic, I've published an efficient solution to GitHub, with implementations for JavaScript, PHP, Python and Java. Add your solutions if you like :)
推荐答案
我会继续你的将数字转换为字符串"的方法.但是,您会意识到,如果您的 ID 是 质数且大于 52,则您提出的算法将失败.
I would continue your "convert number to string" approach. However, you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.
您需要一个双射函数 f.这是必要的,以便您可以为您的 f(123) = 'abc' 函数找到反函数 g('abc') = 123.这意味着:
You need a Bijective Function f. This is necessary so that you can find a inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:
- 不能有 x1, x2 (with x1 ≠ x2) 使得 f(x1) = f(x2),
- 并且对于每个y,您必须能够找到一个x,以便f(x) = y.
- There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
- and for every y you must be able to find an x so that f(x) = y.
- 想想我们要使用的字母表.就您而言,就是
[a-zA-Z0-9]
.它包含62 个字母. 采用自动生成的唯一数字键(例如 MySQL 表的自动递增
id
).
在这个例子中,我将使用 12510(125,底数为 10).
For this example, I will use 12510 (125 with a base of 10).
现在您必须将 12510 转换为 X62(基数为 62).
Now you have to convert 12510 to X62 (base 62).
12510 = 2×621 + 1×620 = [2,1]
12510 = 2×621 + 1×620 = [2,1]
这需要使用整数除法和取模.一个伪代码示例:
This requires the use of integer division and modulo. A pseudo-code example:
digits = []
while num > 0
remainder = modulo(num, 62)
digits.push(remainder)
num = divide(num, 62)
digits = digits.reverse
现在将索引 2 和 1 映射到您的字母表.这就是您的映射(例如使用数组)的样子:
Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array for example) could look like:
0 → a
1 → b
...
25 → z
...
52 → 0
61 → 9
使用 2 → c 和 1 → b,您将收到 cb62 作为缩短的 URL.
With 2 → c and 1 → b, you will receive cb62 as the shortened URL.
http://shor.ty/cb
如何将缩短的 URL 解析为初始 ID
反过来更容易.您只需在字母表中进行反向查找即可.
How to resolve a shortened URL to the initial ID
The reverse is even easier. You just do a reverse lookup in your alphabet.
e9a62 将解析为字母表中的第 4、61 和 0 个字母".
e9a62 will be resolved to "4th, 61st, and 0th letter in the alphabet".
e9a62 = [4,61,0]
= 4×622 + 61×621 + 0×620 = 1915810
e9a62 = [4,61,0]
= 4×622 + 61×621 + 0×620 = 1915810
现在使用 WHERE id = 19158
找到您的数据库记录并进行重定向.
Now find your database-record with WHERE id = 19158
and do the redirect.