How to compress small strings

Question

I have an SQLite database full of a huge number of URLs. It takes up a huge amount of disk space, and accessing it causes many disk seeks and is slow. The average URL path length is 97 bytes (host names repeat a lot, so I moved them to a foreign-keyed table). Is there any good way of compressing them? Most compression algorithms work well with big documents, not with "documents" of less than 100 bytes on average, but even a 20% reduction would be very useful. Are there any compression algorithms that would work? It doesn't have to be anything standard.

Recommended answer

Use the compress algorithm, but with a shared dictionary.
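
The answer is about LZW, but the same shared-dictionary idea is easy to try with DEFLATE, which Python's standard zlib module supports through the zdict parameter of compressobj/decompressobj. This is a swapped-in technique, not the LZW approach described in the answer. The sample paths and the helper names compress_path/decompress_path are hypothetical; in practice the dictionary would come from a representative sample of the stored paths and must be kept unchanged for as long as any value compressed with it exists.

```python
import zlib

# Hypothetical sample of URL paths (assumption); use real paths from the database.
SAMPLE_PATHS = [
    "/questions/1138345/how-to-compress-small-strings",
    "/questions/2011/how-do-i-parse-json-in-python",
    "/users/12345/profile?tab=answers",
    "/tags/python/info",
]

# Shared dictionary: the exact same bytes are needed to compress and to decompress.
SHARED_DICT = "".join(SAMPLE_PATHS).encode("utf-8")


def compress_path(path: str) -> bytes:
    # wbits=-15 selects raw DEFLATE: no zlib header or checksum, saving bytes per value.
    c = zlib.compressobj(level=9, wbits=-15, zdict=SHARED_DICT)
    return c.compress(path.encode("utf-8")) + c.flush()


def decompress_path(blob: bytes) -> str:
    # For raw streams the dictionary is supplied up front, matching the compressor.
    d = zlib.decompressobj(wbits=-15, zdict=SHARED_DICT)
    return (d.decompress(blob) + d.flush()).decode("utf-8")
```

Each URL is compressed independently, so single rows can still be read without touching their neighbours; the dictionary supplies the "history" that a 97-byte string lacks on its own.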

I've done something like this before, using the LZC/LZW algorithm as used by the Unix compress command.

The trick to getting good compression on short strings is to use a dictionary built from a representative sample of the URLs you are compressing.
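
If the shared-dictionary idea is tried with zlib's zdict parameter, one possible way to turn a sample of URLs into a dictionary is to keep the substrings that recur across the sample. The sketch below is an assumption, not the answer's method: build_dictionary, ngram_len and max_size are hypothetical, and fixed-length n-gram counting is a deliberately crude heuristic. It follows the zlib documentation's advice to place the most common subsequences at the end of the dictionary; note that DEFLATE can only reach back 32 KiB, so content beyond that is of no use.

```python
from collections import Counter


def build_dictionary(sample_paths, ngram_len=8, max_size=4096):
    """Hypothetical helper: collect frequently repeated substrings from sample URL paths."""
    counts = Counter()
    for path in sample_paths:
        data = path.encode("utf-8")
        # Count every fixed-length substring (overlapping) of each path.
        for i in range(len(data) - ngram_len + 1):
            counts[data[i:i + ngram_len]] += 1
    # Keep only substrings seen more than once, most frequent first.
    frequent = [gram for gram, n in counts.most_common() if n > 1]
    chosen, size = [], 0
    for gram in frequent:
        if size + len(gram) > max_size:
            break
        chosen.append(gram)
        size += len(gram)
    # zlib favours dictionary content near the end, so put the most frequent last.
    return b"".join(reversed(chosen))
```

The result can be passed as zdict to both zlib.compressobj and zlib.decompressobj.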

You should easily get a 20% reduction.

Edit: LZC is a variant of LZW. You only need LZW, because you only need a static dictionary; LZC adds support for resetting the dictionary/table when it fills up.
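
A minimal sketch of the static-dictionary LZW approach, under stated assumptions: the LZW string table is built once by running LZW over sample paths, then frozen and used unchanged to encode every stored path, so no reset logic (the LZC addition) is needed. The names train_lzw_table, lzw_encode, lzw_decode and packed_size are hypothetical, the 4096-entry limit (12-bit codes) is an assumed choice, and the actual bit packing of codes into bytes is not shown. This is not the Unix compress implementation, which also uses variable code widths.

```python
def train_lzw_table(sample_paths, max_codes=4096):
    """Build an LZW string table from sample data; it is then used frozen."""
    table = {bytes([i]): i for i in range(256)}  # every single byte is always encodable
    for path in sample_paths:
        current = b""
        for byte in path.encode("utf-8"):
            candidate = current + bytes([byte])
            if candidate in table:
                current = candidate
            else:
                # Standard LZW step: learn the new phrase while there is room.
                if len(table) < max_codes:
                    table[candidate] = len(table)
                current = bytes([byte])
    return table


def lzw_encode(path, table):
    """Greedy longest match against the frozen table; the table is not modified."""
    codes, current = [], b""
    for byte in path.encode("utf-8"):
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes, table):
    # Inverse table built per call for brevity; cache it in real use.
    inverse = {code: phrase for phrase, code in table.items()}
    return b"".join(inverse[c] for c in codes).decode("utf-8")


def packed_size(codes, table):
    """Bytes needed if each code were stored with a fixed bit width."""
    bits = (len(table) - 1).bit_length()
    return (len(codes) * bits + 7) // 8
```

Because LZW only ever adds a phrase whose prefix is already in the table, greedy matching against the frozen table always finds a valid code, and the decoder only needs the same table to map codes back to bytes.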
