Compressing text before storing it in the database


Question

I need to store a very large amount of text in a MySQL database. There will be millions of records with a LONGTEXT field, and the database will be huge.

So I want to ask whether there is a safe way to compress text before storing it in a TEXT field to save space, with the ability to extract it again when needed.

Something like:

$archived_text = compress_text($huge_text);
// saving $archived_text to database here
// ...

// ...
// getting compressed text from database
$archived_text = get_text_from_db();
$huge_text = uncompress_text($archived_text);

Is there a way to do this with PHP or MySQL? All the texts are UTF-8 encoded.

UPDATE

My application is a large literature website where users can add their texts. Here is the table I have:

CREATE TABLE `book_parts` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `book_id` int(11) NOT NULL,
  `title` varchar(200) DEFAULT NULL,
  `content` longtext,
  `order_num` int(11) DEFAULT NULL,
  `views` int(10) unsigned DEFAULT '0',
  `add_date` datetime DEFAULT NULL,
  `is_public` tinyint(3) unsigned NOT NULL DEFAULT '1',
  `published_as_draft` tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `key_order_num` (`order_num`),
  KEY `add_date` (`add_date`),
  KEY `key_book_id` (`book_id`,`is_public`,`order_num`),
  CONSTRAINT FOREIGN KEY (`book_id`) REFERENCES `books` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 

Currently it has about 800k records and weighs 4 GB; 99% of queries are SELECTs. I have every reason to think these numbers will grow dramatically. I would rather not store the texts in files, because there is quite heavy logic around them and my website gets quite a lot of traffic.

Solution

Are you going to index these texts? How big is the read load on them? The insert load?

You can use InnoDB data compression, which is a transparent and modern approach. See the docs for more info.
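
For comparison with the transparent approach, the explicit compress/uncompress round trip from the question can also be written with MySQL's built-in COMPRESS() and UNCOMPRESS() functions. This is only a minimal sketch: the table `book_parts_archive` and the column `content_z` are hypothetical names made up for illustration, and the functions require a MySQL build with zlib support.

```sql
-- Hypothetical table for illustration: compressed data is binary,
-- so it is stored in a BLOB column rather than a TEXT column.
CREATE TABLE book_parts_archive (
  id INT NOT NULL AUTO_INCREMENT,
  content_z LONGBLOB,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Compress on write.
INSERT INTO book_parts_archive (content_z)
VALUES (COMPRESS('a very long UTF-8 text ...'));

-- Uncompress on read. UNCOMPRESS() returns a binary string,
-- so it is converted back to utf8 explicitly.
SELECT id,
       UNCOMPRESSED_LENGTH(content_z) AS original_bytes,
       LENGTH(content_z) AS stored_bytes,
       CONVERT(UNCOMPRESS(content_z) USING utf8) AS content
FROM book_parts_archive;
```

The trade-off is that a column compressed this way cannot be searched or indexed as text without uncompressing it first, which is one reason the transparent InnoDB option is usually simpler.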

If you have really huge texts (say, each text is above 10 MB), then it is a good idea not to store them in MySQL. Store the texts gzip-compressed in the file system and keep only pointers and metadata in MySQL. You can then easily expand your storage in the future and move it to, e.g., a DFS.

Update: another plus of storing texts outside MySQL is that the DB stays small and fast. The minus is a high probability of data inconsistency.

Update 2: if you have a lot of programming resources, take a look at projects like this one: http://code.google.com/p/mysql-filesystem-engine/.

Final update: based on your info, you can simply use InnoDB compression. It is built on zlib, the same family of algorithm that ZIP uses. You can start with these parameters:

CREATE TABLE book_parts
 (...) 
 ENGINE=InnoDB
 ROW_FORMAT=COMPRESSED 
 KEY_BLOCK_SIZE=8;
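
Since the table in the question already exists, the same options can be applied with ALTER TABLE instead of recreating it. The following is a sketch under assumptions: it presumes `innodb_file_per_table` is enabled (and, on MySQL 5.5/5.6, that `innodb_file_format` is set to Barracuda), and the statement rebuilds the whole table, so it may take a while on 4 GB of data.

```sql
-- Compressed tables need file-per-table tablespaces.
-- Verify the setting before converting (expected value: ON).
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Convert the existing table in place; this rebuilds the table.
ALTER TABLE book_parts
  ROW_FORMAT=COMPRESSED
  KEY_BLOCK_SIZE=8;

-- Confirm the row format that is actually in use afterwards.
SHOW TABLE STATUS LIKE 'book_parts';
```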

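Later you will need to experiment with KEY_BLOCK_SIZE. Watch the COMPRESS_OPS_OK and COMPRESS_OPS counters, which are exposed as columns of INFORMATION_SCHEMA.INNODB_CMP rather than through SHOW STATUS. The ratio of COMPRESS_OPS_OK to COMPRESS_OPS should be close to 1.0; see the docs.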
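
One way to check that ratio is to query INFORMATION_SCHEMA.INNODB_CMP, where COMPRESS_OPS and COMPRESS_OPS_OK appear as columns, one row per compressed page size. This is a sketch; the alias `success_ratio` is a name chosen here for illustration.

```sql
-- One row per compressed page size (e.g. 8192 for KEY_BLOCK_SIZE=8).
-- NULLIF avoids a division by zero before any compression has happened.
SELECT page_size,
       compress_ops,
       compress_ops_ok,
       compress_ops_ok / NULLIF(compress_ops, 0) AS success_ratio
FROM INFORMATION_SCHEMA.INNODB_CMP;
```

A ratio well below 1.0 means many pages failed to fit after compression and had to be split and recompressed, which suggests trying a larger KEY_BLOCK_SIZE.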
