Partitioning or separating a very large table in MySQL


Question

We have a very large table in MySQL with 500,000,000 records, serving 100 SELECT requests per second.
This is the schema:


id(int), 
user_id (int), 
content(text), 
date(datetime)

Up to 90% of requests are for records from the last 6 months, and my question is about improving performance. Is it a good idea to move the records from the last 6 months into a separate table and SELECT from that, or should I use partitioning to fetch the last 6 months of records quickly?

Or is there a better way?

For instance, here is one of the queries:

SELECT content,user_id FROM log
JOIN users ON users.id = log.user_id
WHERE date > DATE_SUB(CURDATE(), INTERVAL 180 DAY)
LIMIT 15

user_id and date are indexed in table log.
There are 2 million users in table users.

Recommended answer

Your edit says you run queries like this at a rate of about a third of a million per hour (100 per second is 360,000 per hour).

 SELECT content,user_id 
   FROM log
   JOIN users ON users.id = log.user_id
  WHERE date > DATE_SUB(CURDATE(), INTERVAL 180 DAY)
  LIMIT 15

I will take the liberty of rewriting this query to fully qualify your column selections.

 SELECT log.content,
        log.user_id 
   FROM log                                  /* one half gigarow table */
   JOIN users ON users.id = log.user_id      /* two megarow table */
  WHERE log.date > DATE_SUB(CURDATE(), INTERVAL 180 DAY)
  LIMIT 15

(Please consider updating your question if this is not correct.)

Why are you joining the users table in this query? None of your result columns come from it. Why won't this query do what you need?

 SELECT log.content,
        log.user_id 
   FROM log                                  /* one half gigarow table */
  WHERE log.date > DATE_SUB(CURDATE(), INTERVAL 180 DAY)
  LIMIT 15

If you want to make this query faster, put a compound covering index on (date, user_id, content). This covering index supports a range scan on date and lets MySQL answer the query from the index alone. If your content column is in fact of type TEXT (a LOB), MySQL can only index a prefix of it, so put just (date, user_id) into the index. Retrieval will then be a little slower, because each row's content has to be fetched from the table.
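As a rough sketch of the TEXT case described above, the index could be added and checked like this. The index name idx_date_user is a made-up name for illustration, and the plan MySQL actually chooses depends on your data and version, so treat the EXPLAIN step as a way to check rather than a guaranteed result.

```sql
-- idx_date_user is a hypothetical index name, not one from the question.
ALTER TABLE log ADD INDEX idx_date_user (`date`, user_id);

-- Inspect the plan: the `key` column shows which index (if any) the
-- optimizer picked, and the `type` column shows the access method.
EXPLAIN
SELECT log.content,
       log.user_id
  FROM log
 WHERE log.`date` > DATE_SUB(CURDATE(), INTERVAL 180 DAY)
 LIMIT 15;
```

Adding an index to a table of this size takes time and disk space, so it is worth trying on a copy first.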

Are you using the JOIN to ensure that the log entries returned have a matching entry in users? If so, please explain your query better.

You definitely can partition your table based on date ranges. But you will need to either alter your table, or recreate and repopulate it, which will incur either downtime or a giant scramble.

http://dev.mysql.com/doc/refman/5.6/en/partitioning-range.html
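The "recreate and repopulate" route might look roughly like the sketch below. It assumes a table named log_new (a hypothetical name) already exists with the same columns as log and a partitioned definition, and that id is a usable batching key. Rows written to log while the copy runs are not handled here and would need their own catch-up step.

```sql
-- 1. Copy rows in bounded batches rather than in one enormous transaction.
--    Repeat this statement, advancing the id range each time.
INSERT INTO log_new (id, user_id, `date`, content)
SELECT id, user_id, `date`, content
  FROM log
 WHERE id >= 1
   AND id < 1000001;

-- 2. Once log_new has caught up, swap the two tables in a single
--    atomic statement. log_old is also a hypothetical name.
RENAME TABLE log TO log_old, log_new TO log;
```

The batch size of one million ids is arbitrary; smaller batches hold locks for less time but take more round trips.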

Something like this DDL should then do the trick for you:

CREATE TABLE log (
  id         INT NOT NULL AUTO_INCREMENT,  /*maybe BIGINT? */
  user_id    INT NOT NULL,
  `date`     DATETIME NOT NULL,
  content    TEXT,
  UNIQUE KEY (id, `date`),
  KEY covering (`date`,user_id)
) 
PARTITION BY RANGE COLUMNS(`date`) (
    PARTITION p0 VALUES LESS THAN ('2012-01-01'),
    PARTITION p1 VALUES LESS THAN ('2012-07-01'),
    PARTITION p2 VALUES LESS THAN ('2013-01-01'),
    PARTITION p3 VALUES LESS THAN ('2013-07-01'),
    PARTITION p4 VALUES LESS THAN ('2014-01-01'),
    PARTITION p5 VALUES LESS THAN ('2014-07-01'),
    PARTITION p6 VALUES LESS THAN ('2015-01-01'),
    PARTITION p7 VALUES LESS THAN ('2015-07-01')
);

Notice the monkey business with the UNIQUE KEY. MySQL requires every column used in the partitioning expression to be part of every unique key on the table, including the primary key. That is why date appears alongside id in the key.
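To make the key rule concrete, here is a small contrast between a definition MySQL rejects and one it accepts. The table names log_bad and log_ok are invented for this illustration, and the tables are cut down to the two columns that matter.

```sql
-- Rejected: the primary key does not contain `date`,
-- which is the partitioning column.
CREATE TABLE log_bad (
  id      INT NOT NULL AUTO_INCREMENT,
  `date`  DATETIME NOT NULL,
  PRIMARY KEY (id)
)
PARTITION BY RANGE COLUMNS(`date`) (
    PARTITION p0   VALUES LESS THAN ('2012-01-01'),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)
);

-- Accepted: the key includes the partitioning column.
CREATE TABLE log_ok (
  id      INT NOT NULL AUTO_INCREMENT,
  `date`  DATETIME NOT NULL,
  PRIMARY KEY (id, `date`)
)
PARTITION BY RANGE COLUMNS(`date`) (
    PARTITION p0   VALUES LESS THAN ('2012-01-01'),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
```

One consequence of the composite key is that id alone is no longer guaranteed unique by the database; in practice AUTO_INCREMENT still hands out distinct values.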

Later on, when July 2015 (partition p7's cutoff date) draws near, you can run this statement to add a partition for the next six-month segment of time.

 ALTER TABLE `log`
   ADD PARTITION (PARTITION p8 VALUES LESS THAN ('2016-01-01'));
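To see which partitions exist and where the last boundary sits before adding the next one, a query along these lines against information_schema may help. It assumes the table is named log and lives in the currently selected database.

```sql
SELECT PARTITION_NAME,
       PARTITION_DESCRIPTION,   -- the LESS THAN boundary of each partition
       TABLE_ROWS               -- a row count; only an estimate for InnoDB
  FROM information_schema.PARTITIONS
 WHERE TABLE_SCHEMA = DATABASE()
   AND TABLE_NAME   = 'log'
 ORDER BY PARTITION_ORDINAL_POSITION;
```

This matters because, with the DDL above, a row whose date falls at or beyond the last boundary has no partition to go into and the INSERT fails, so the new partition has to be added ahead of time.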

But, seriously, none of this partitioning junk is going to help much if your queries have unnecessary joins or poor index coverage. And it is going to make your database administration more complex.
