Compressing text before storing it in the database

Question

I need to store a very large amount of text in a MySQL database. There will be millions of records with a LONGTEXT field, and the database will be huge.

So I want to ask: is there a safe way to compress text before storing it in a TEXT field to save space, while still being able to extract it again when needed?

Something like:

$archived_text = compress_text($huge_text);
// saving $archived_text to database here
// ...

// ...
// getting compressed text from database
$archived_text = get_text_from_db();
$huge_text = uncompress_text($archived_text);

Is there a way to do this with PHP or MySQL? All the texts are UTF-8 encoded.

Update

My application is a large literature website where users can add their texts. Here is the table I have:

CREATE TABLE `book_parts` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `book_id` int(11) NOT NULL,
  `title` varchar(200) DEFAULT NULL,
  `content` longtext,
  `order_num` int(11) DEFAULT NULL,
  `views` int(10) unsigned DEFAULT '0',
  `add_date` datetime DEFAULT NULL,
  `is_public` tinyint(3) unsigned NOT NULL DEFAULT '1',
  `published_as_draft` tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `key_order_num` (`order_num`),
  KEY `add_date` (`add_date`),
  KEY `key_book_id` (`book_id`,`is_public`,`order_num`),
  CONSTRAINT FOREIGN KEY (`book_id`) REFERENCES `books` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 

Currently it has about 800k records and weighs 4 GB; 99% of queries are SELECTs. I have every reason to think these numbers will grow dramatically. I would rather not store the texts in files, because there is quite heavy logic around them and my website gets quite a few hits.

Recommended answer

Are you going to index these texts? How heavy is the read load on them? And the insert load?

You can use InnoDB data compression, a transparent and modern approach. See the docs for more info.

If you have really huge texts (say, each text is above 10 MB), then it is a good idea not to store them in MySQL. Store the texts gzip-compressed in the file system and keep only pointers and metadata in MySQL. You can then easily expand your storage in the future and move it to, e.g., a DFS.

Update: another plus of storing texts outside MySQL: the DB stays small and fast. Minus: a high probability of data inconsistency, since the files and their database rows are not updated in a single transaction.
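
To make the "pointers and metadata" idea concrete, here is a minimal sketch of what such a table might look like. The table name, column names, and the choice of a SHA-256 checksum are all assumptions for illustration; none of them come from the original schema. Storing a checksum and the sizes is one possible way to detect the inconsistency mentioned above, not a cure for it.

```sql
-- Hypothetical metadata table: every name below is an illustrative assumption.
CREATE TABLE book_part_files (
  part_id     INT NOT NULL,           -- matches book_parts.id
  file_path   VARCHAR(255) NOT NULL,  -- where the gzip-compressed text lives
  raw_bytes   INT UNSIGNED NOT NULL,  -- size of the uncompressed text
  gz_bytes    INT UNSIGNED NOT NULL,  -- size of the .gz file on disk
  sha256_hex  CHAR(64) NOT NULL,      -- checksum of the uncompressed text
  PRIMARY KEY (part_id),
  CONSTRAINT FOREIGN KEY (part_id) REFERENCES book_parts (id) ON DELETE CASCADE
) ENGINE=InnoDB;
```

The application would write the file first, then insert the row, and verify `gz_bytes` and `sha256_hex` when reading the file back.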

Update 2: if you have plenty of programming resources, take a look at projects like this one: http://code.google.com/p/mysql-filesystem-engine/.

Final update: according to your info, you can just use InnoDB compression; it is based on zlib, the same deflate algorithm used by ZIP. You can start with these params:

CREATE TABLE book_parts
 (...) 
 ENGINE=InnoDB
 ROW_FORMAT=COMPRESSED 
 KEY_BLOCK_SIZE=8;
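
Since `book_parts` already exists, the same settings can presumably be applied in place with ALTER TABLE instead of recreating the table. The sketch below assumes a MySQL 5.5/5.6-era server, where compressed tables require `innodb_file_per_table` and the Barracuda file format. On 5.7 Barracuda is already the default, and on 8.0 the `innodb_file_format` variable no longer exists, so that line should be dropped there.

```sql
-- Assumption: MySQL 5.5/5.6. Omit the innodb_file_format line on 8.0.
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_file_format = 'Barracuda';

-- Approximate size before the change (InnoDB figures here are estimates).
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'book_parts';

-- Rebuilds the whole table in compressed row format.
ALTER TABLE book_parts
  ROW_FORMAT=COMPRESSED
  KEY_BLOCK_SIZE=8;
```

The ALTER rebuilds the table, so on a 4 GB table it may take a while and is best tried on a copy first. Re-running the size query afterwards gives a rough comparison; the size of the table's .ibd file on disk is the more reliable number.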

Later you will need to play with KEY_BLOCK_SIZE. Watch the COMPRESS_OPS_OK and COMPRESS_OPS counters; note that MySQL exposes them as columns of the INFORMATION_SCHEMA.INNODB_CMP table rather than through SHOW STATUS. The ratio of COMPRESS_OPS_OK to COMPRESS_OPS should be close to 1.0; see the docs.
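
The ratio can be read directly from `INFORMATION_SCHEMA.INNODB_CMP`. This is a small sketch; the `ok_ratio` alias is my own naming. The counters are server-wide totals per compressed page size since startup, not per table, so on a server with several compressed tables they only give an aggregate picture.

```sql
-- One row per compressed page size; KEY_BLOCK_SIZE=8 corresponds to page_size = 8192.
SELECT page_size,
       compress_ops,
       compress_ops_ok,
       compress_ops_ok / NULLIF(compress_ops, 0) AS ok_ratio  -- NULL until any compression has happened
FROM information_schema.INNODB_CMP
ORDER BY page_size;
```

If `ok_ratio` for the page size in use stays well below 1.0, many compression attempts are failing and forcing page splits, which suggests trying a larger KEY_BLOCK_SIZE.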
