Can I set up MySQL to auto-partition?


Question

I want to partition a very large table. As the business grows, partitioning by date isn't really that good, because each year's partitions are bigger than the last. What I'd really like is one partition for every 10 million records.

The MySQL manual shows this simple example:

CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN MAXVALUE
);

But this means that every store_id of 16 or more gets thrown into the last partition. Is there a way to auto-generate a new partition every interval (in my case, 10 million records) so I won't have to keep modifying an active database? I am running MySQL 5.5.

Thanks!

Here is my actual table:

CREATE TABLE `my_table` (
`row_id` int(11) NOT NULL AUTO_INCREMENT,
`filename` varchar(50) DEFAULT NULL,
`timestamp` datetime DEFAULT NULL,
`unit_num` int(3) DEFAULT NULL,
`string` int(3) DEFAULT NULL,
`voltage` float(6,4) DEFAULT NULL,
`impedance` float(6,4) DEFAULT NULL,
`amb` float(6,2) DEFAULT NULL,
`ripple_v` float(8,6) DEFAULT NULL,
 PRIMARY KEY (`row_id`),
 UNIQUE KEY `timestamp` (`timestamp`,`filename`,`string`,`unit_num`),
 KEY `index1` (`filename`),
 KEY `index2` (`timestamp`),
 KEY `index3` (`timestamp`,`filename`,`string`),
 KEY `index4` (`filename`,`unit_num`)
 ) ENGINE=MyISAM AUTO_INCREMENT=690892041 DEFAULT CHARSET=latin1

And an example query for the graph is:

SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num 
FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY) 
AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2' ORDER BY timestamp asc;

Here is the EXPLAIN for the query:

mysql> explain SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY) AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2'ORDER BY timestamp asc;
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| id | select_type | table      | type | possible_keys           | key    | key_len | ref         | rows  | Extra                                              |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
|  1 | SIMPLE      | unit_tarma | ref  | timestamp,index3,index4 | index4 | 58      | const,const | 13440 | Using index condition; Using where; Using filesort | 
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+

Recommended answer

(This answer is directed at the schema and the SELECT.)

Since you anticipate millions of rows, first I want to point out some improvements to the schema.

  • FLOAT(m,n) is usually the 'wrong' thing to do because it leads to two roundings. Either use plain FLOAT (which seems 'right' for metrics like voltage) or use DECIMAL(m,n). FLOAT is 4 bytes; in the cases given, DECIMAL would be 3 or 4 bytes.

  • When you have both INDEX(a) and INDEX(a,b), the former is unnecessary since the latter can stand in for it. You have 3 unnecessary KEYs (index1, index2, and index3). This slows down INSERTs.

  • INT(3) -- Are you saying a "3-digit number"? The (3) is only a display width; the column is still a 4-byte INT. If the values are that small, consider TINYINT UNSIGNED (values 0..255), which takes 1 byte instead of 4. This will save many MB of disk space and so improve speed. (See also SMALLINT, etc., and SIGNED or UNSIGNED.)

  • If filename is repeated a lot, you may want to "normalize" it. This would save many MB.

  • Use NOT NULL unless you need NULL for something.

  • AUTO_INCREMENT=690892041 implies that you are about 1/3 of the way to disaster with row_id, which, as a signed INT, will top out at about 2 billion. Do you use row_id for anything? Getting rid of the column would avoid the issue; you would then change the UNIQUE KEY to be the PRIMARY KEY. (If you do need row_id, let's talk further.)

  • ENGINE=MyISAM -- Switching to InnoDB has some ramifications, both favorable and unfavorable. The table would become 2-3 times as big. The 'right' choice of PRIMARY KEY would further speed up this SELECT significantly. (And may or may not slow down other SELECTs.)
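Several of the points above can be sketched in SQL. This is only a sketch under stated assumptions: nothing refers to index1, index2, or index3 by name (for example in an index hint), and every unit_num and string value fits in 0..255, which the first query is meant to check. The columns are left nullable here; adding NOT NULL is a further step once you have confirmed there are no NULLs. Widening row_id to INT UNSIGNED is not something the answer proposes; it is one possible stopgap if the column has to stay. On a table this size each ALTER rebuilds the table, so try it on a copy first.

```sql
-- Check the real value ranges and NULL counts before narrowing the columns.
SELECT MIN(unit_num), MAX(unit_num), SUM(unit_num IS NULL),
       MIN(`string`), MAX(`string`), SUM(`string` IS NULL)
FROM my_table;

-- One ALTER, so the table is rebuilt only once.
-- The three dropped indexes are left-prefixes of other indexes:
--   index1 (filename)                    is a prefix of index4
--   index2 (timestamp)                   is a prefix of the UNIQUE key
--   index3 (timestamp, filename, string) is a prefix of the UNIQUE key
ALTER TABLE my_table
  DROP INDEX index1,
  DROP INDEX index2,
  DROP INDEX index3,
  -- Assumption: the ranges checked above fit in 0..255.
  MODIFY unit_num  TINYINT UNSIGNED DEFAULT NULL,
  MODIFY `string`  TINYINT UNSIGNED DEFAULT NULL,
  -- Plain FLOAT avoids the second rounding that FLOAT(m,n) applies.
  MODIFY voltage   FLOAT DEFAULT NULL,
  MODIFY impedance FLOAT DEFAULT NULL,
  MODIFY amb       FLOAT DEFAULT NULL,
  MODIFY ripple_v  FLOAT DEFAULT NULL;

-- Optional stopgap (an assumption, not from the answer): if row_id must stay,
-- making it UNSIGNED raises the ceiling from about 2.1 to about 4.3 billion.
ALTER TABLE my_table
  MODIFY row_id INT UNSIGNED NOT NULL AUTO_INCREMENT;
```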

A note on the SELECT: Since string and unit_num are constants in the query, the last two fields of ORDER BY timestamp asc, string asc, unit_num asc are unnecessary. If they are relevant for reasons not apparent in the SELECT, then my advice may be incomplete.

WHERE filename = 'foobar'
  AND unit_num='40'
  AND string='2' 
  AND timestamp >= ...

is optimally handled by INDEX(filename, unit_num, string, timestamp). The order of the columns is not important except that timestamp needs to be last. Rearranging the current UNIQUE key would give you the optimal index. (Meanwhile, none of the existing indexes is very good for this SELECT.) Making it the PRIMARY KEY and the table InnoDB would make it even faster.
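As a rough illustration of that index advice, here is what a rebuilt table might look like. It is a sketch, not a drop-in migration: the name my_table_new is hypothetical, row_id is dropped on the assumption that nothing uses it, the key columns are NOT NULL because a PRIMARY KEY requires that, and the measurement columns are assumed never to need NULL.

```sql
-- Hypothetical replacement table; the name and the NOT NULL choices are assumptions.
CREATE TABLE my_table_new (
  filename    VARCHAR(50)      NOT NULL,
  unit_num    TINYINT UNSIGNED NOT NULL,
  `string`    TINYINT UNSIGNED NOT NULL,
  `timestamp` DATETIME         NOT NULL,
  voltage     FLOAT            NOT NULL,
  impedance   FLOAT            NOT NULL,
  amb         FLOAT            NOT NULL,
  ripple_v    FLOAT            NOT NULL,
  -- Equality columns first, the range column (timestamp) last.
  PRIMARY KEY (filename, unit_num, `string`, `timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- The graph query can then read one contiguous slice of the primary key,
-- already in timestamp order.
SELECT DATE_FORMAT(`timestamp`, '%Y/%m/%d %H:%i:%s') AS mytime,
       voltage, impedance, amb, ripple_v, unit_num
FROM my_table_new
WHERE filename = 'dlrphx10s320upsab3'
  AND unit_num = 5
  AND `string` = 2
  AND `timestamp` >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY)
ORDER BY `timestamp` ASC;
```

Running EXPLAIN on the query against the new table is a reasonable way to check whether the "Using filesort" step has gone away.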

Partitioning? No advantage. Not for performance; not for anything else you have mentioned. A common use for partitioning is purging 'old' data. If you intend to do that, let's talk further.
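For reference, the purging use case usually looks something like the sketch below. The table readings and its partition names are hypothetical. As far as I know, MySQL 5.5 has no built-in way to create partitions automatically, so the two ALTER statements would have to be issued on a schedule (a cron job or a MySQL event). Note that the partitioning column must be part of every unique key, the PRIMARY KEY included.

```sql
-- Hypothetical table partitioned by month on a DATETIME column.
CREATE TABLE readings (
  ts       DATETIME         NOT NULL,
  unit_num TINYINT UNSIGNED NOT NULL,
  voltage  FLOAT            NOT NULL,
  PRIMARY KEY (unit_num, ts)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(ts)) (
  PARTITION p201505 VALUES LESS THAN (TO_DAYS('2015-06-01')),
  PARTITION p201506 VALUES LESS THAN (TO_DAYS('2015-07-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Purge the oldest month: far cheaper than a DELETE of the same rows.
ALTER TABLE readings DROP PARTITION p201505;

-- Make room for the next month by splitting the catch-all partition.
ALTER TABLE readings REORGANIZE PARTITION pmax INTO (
  PARTITION p201507 VALUES LESS THAN (TO_DAYS('2015-08-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```

The same REORGANIZE PARTITION pattern applies to a table ranged on an integer key, such as the employees example in the question.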

In huge tables it is best to look at all the important SELECTs simultaneously so that we don't speed up one while demolishing the speed of others. It may even turn out that partitioning helps in this kind of tradeoff.
