URL text compression (not shortening) and storage in MySQL

Question

I have a url table in MySQL with only two fields: an id and a varchar(255) column for the url. It currently holds more than 50 million URLs, and my boss has just given me a hint about an expansion of our current project that will add more URLs to that table. The expected number is around 150 million by the middle of next year.

The database is currently about 6GB, so I can safely say that if things are left as they are it will grow past 20GB, which is not good. So I am looking for a solution that reduces the disk space needed to store the URLs.

I also want to make clear that this table is not busy and there are not many queries against it at the moment, so I am mainly looking to save disk space. More importantly, I want to explore new ideas for compressing short text and storing it in MySQL.

However, the table may be accessed heavily in the future, so it is better to optimize it well before that time comes.

I spent quite a bit of time trying to convert each URL into numeric form and store it as a BIGINT, but the 64-bit limit meant it did not work out well. The BIT data type has the same problem: it is also limited to 64 bits.

My idea behind converting to numeric form is this: an 8-byte BIGINT stores 19 digits, so if each digit points to a character in a character set of all possible characters, it could store 19 characters in 8 bytes, provided the set contains only 10 characters. In a real-world scenario there are 52 English letters and 10 digits plus a few symbols, so the character set is around 100 characters. So in the worst case a BIGINT can still point to about 6 characters. This is not a final verdict; it still needs some work to determine exactly whether each position has to cover 10+, 30+, or 80+ possible characters, but you get the idea of what I am thinking about.
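The capacity estimate above can be checked with a short calculation. The sketch below is only an illustration: it assumes an unsigned 64-bit value and a positional encoding in which every character takes one "digit" in base N, where N is the alphabet size. The function name `max_chars` is hypothetical.

```python
def max_chars(alphabet_size: int, bits: int = 64) -> int:
    """Return the largest k such that alphabet_size**k <= 2**bits."""
    capacity = 2 ** bits  # number of distinct values the integer can hold
    k = 0
    while alphabet_size ** (k + 1) <= capacity:
        k += 1
    return k


# Alphabet sizes mentioned in the question: digits only, A-Z a-z 0-9,
# and a rough 100-symbol set.
for size in (10, 62, 100):
    print(size, max_chars(size))
```

Under these assumptions, a 10-symbol alphabet fits 19 characters, a 62-symbol alphabet fits 10, and a 100-symbol alphabet fits 9, so the estimate of about 6 characters is on the conservative side. Either way, one BIGINT is far too small for a whole URL.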

One more important point: URLs vary in length, and I also want to save disk space on short URLs, so I do not want to use a fixed-length column type.

I have also looked into text compression algorithms such as smaz and Huffman coding, but I am not really convinced, because they rely on some sort of dictionary of words, and I am looking for a clean method.

And I do not want to use a binary data type, because it also takes too much space, measured in bytes just like varchar.

Recommended answer

If you are looking for 128-bit integers, you can use BINARY(16), where 16 is the length in bytes. You can extend it to 64 bytes (512 bits), so it does not take more space than the BIT data type. You can think of the binary data type as an extension of the BIT data type, but in its string variant.
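As a rough illustration of storing a large integer in a 16-byte binary value, the sketch below packs a short string into exactly 16 bytes and unpacks it again. The alphabet, the function names `pack` and `unpack`, and the leading sentinel value of 1 (used so that leading characters are not lost) are assumptions made for this example and are not part of the answer.

```python
import string

# Hypothetical alphabet: letters, digits and a few URL punctuation marks.
ALPHABET = string.digits + string.ascii_letters + "-._~:/?#&=%"
BASE = len(ALPHABET)


def pack(text: str, width: int = 16) -> bytes:
    """Encode text as one big integer and return it as `width` bytes."""
    value = 1  # sentinel so that leading characters survive the round trip
    for ch in text:
        value = value * BASE + ALPHABET.index(ch)  # ValueError on unknown char
    return value.to_bytes(width, "big")  # OverflowError if text is too long


def unpack(blob: bytes) -> str:
    """Reverse pack(): peel characters off until only the sentinel is left."""
    value = int.from_bytes(blob, "big")
    chars = []
    while value > 1:
        value, index = divmod(value, BASE)
        chars.append(ALPHABET[index])
    return "".join(reversed(chars))


packed = pack("example.com/a1")
print(len(packed), unpack(packed))
```

With this alphabet a 16-byte value holds only about 20 characters, so a fixed BINARY(16) column would fit very short URLs at best. A variable-length binary column would be needed for longer ones.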

Having said that, I would suggest dictionary algorithms for compressing URLs and short strings, blended with the technique used by URL shortening services: use three-character codes drawn from A-Z, a-z, and 0-9 to replace long dictionary words. You would have more combinations available than available words: 62 x 62 x 62 = 238,328.
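A minimal sketch of this dictionary idea follows. The phrase list, the marker character `\x01` that flags a code, and all function names are hypothetical choices made for illustration. A real dictionary would be built from the most frequent substrings in the actual URL table.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
BASE = len(ALPHABET)  # 62, so BASE**3 = 238328 possible codes
MARK = "\x01"         # assumed never to occur in a stored URL


def code_for(index: int) -> str:
    """Map a dictionary index to a three-character base-62 code."""
    if not 0 <= index < BASE ** 3:
        raise ValueError("index out of range for a three-character code")
    first, rest = divmod(index, BASE * BASE)
    second, third = divmod(rest, BASE)
    return ALPHABET[first] + ALPHABET[second] + ALPHABET[third]


# Hypothetical dictionary of frequent URL fragments.
PHRASES = ["http://www.", "https://www.", ".com/", "index.html"]
ENCODE = {phrase: MARK + code_for(i) for i, phrase in enumerate(PHRASES)}
DECODE = {code: phrase for phrase, code in ENCODE.items()}
LONGEST_FIRST = sorted(PHRASES, key=len, reverse=True)


def compress(url: str) -> str:
    """Scan left to right, replacing dictionary phrases with marked codes."""
    out, i = [], 0
    while i < len(url):
        for phrase in LONGEST_FIRST:
            if url.startswith(phrase, i):
                out.append(ENCODE[phrase])
                i += len(phrase)
                break
        else:  # no phrase matched at this position: copy one character
            out.append(url[i])
            i += 1
    return "".join(out)


def decompress(packed: str) -> str:
    """Expand every marker plus three-character code back to its phrase."""
    out, i = [], 0
    while i < len(packed):
        if packed[i] == MARK:
            out.append(DECODE[packed[i:i + 4]])
            i += 4
        else:
            out.append(packed[i])
            i += 1
    return "".join(out)


url = "http://www.example.com/index.html"
packed = compress(url)
print(len(url), len(packed), decompress(packed) == url)
```

Each replacement costs four characters here (the marker plus the code), so only phrases longer than four characters save space. The dictionary itself has to be stored once and kept stable, because changing it would invalidate rows that were already compressed.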

I am not sure what level of compression you would achieve, but implementing URL compression this way is not a bad idea.
