MySQL: The quickest way to split a big table into small tables

Question

I have a very big table, which has almost 300 million records. Since the select query is too slow for me, I want to split it into about 800 small tables.

The data set looks like this:

XXXXXX column2 column3 column4 ...
XXXXXX column2 column3 column4 ...
XXXXXX column2 column3 column4 ...
YYYYYY column2 column3 column4 ...
YYYYYY column2 column3 column4 ...

I want to split the table based on the value of the first column (e.g. records with XXXXXX go into table XXXXXX). What's the quickest way to do it?

Note: I have already added 10 partitions for it, but it doesn't speed it up very well.

Recommended answer

Partitioning works as a performance strategy under two circumstances:

  1. The primary queries against that table end up doing table or index scans, and run on a system with adequate resources and appropriate configuration to support a high level of parallelism. If all of the partitions are on the same physical drive, that doesn't buy you much: you're as I/O bound as you were in the first place. But if you're on a 16-core system, with each partition on a physically distinct disk, partitioning may produce startling improvements in system performance.

  2. The partitioning rule uses an index that is often used in the most prevalent queries against that table. If you're going for performance by that route, you should partition on an indexed value that is often used to filter or constrain the result set. The most frequent candidate is transaction date, since reporting is often by a calendar date range. The query optimizer can then use the partitioning rule to constrict action to a single (smaller) partition, or to run two or more partition scans in parallel (subject to the same strictures mentioned above).
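
To make the pruning idea concrete, here is a minimal conceptual sketch in Python. It is not how MySQL implements pruning internally; it only illustrates how a range-partitioning rule lets a date-range query skip partitions that cannot contain matching rows. The partition names and yearly boundaries are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical RANGE partitions on a transaction date. Each bound is the
# exclusive upper limit ("VALUES LESS THAN") of the partition at the same index.
PARTITION_BOUNDS = [date(2021, 1, 1), date(2022, 1, 1), date(2023, 1, 1), date(2024, 1, 1)]
PARTITION_NAMES = ["p2020", "p2021", "p2022", "p2023"]


def partitions_to_scan(start, end):
    """Return the partitions that overlap the inclusive date range [start, end]."""
    first = bisect_right(PARTITION_BOUNDS, start)  # partition holding `start`
    last = bisect_right(PARTITION_BOUNDS, end)     # partition holding `end`
    return PARTITION_NAMES[first:last + 1]


# A one-month report touches a single partition; the other three are skipped.
print(partitions_to_scan(date(2022, 3, 1), date(2022, 3, 31)))
# A range that crosses a year boundary touches two partitions.
print(partitions_to_scan(date(2022, 12, 1), date(2023, 2, 15)))
```

The benefit only appears when the query filters on the partitioning column. A query that filters on some other column still has to visit every partition.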

I'm presuming that the primary reason to want to split up this table is performance. But 800 partitions? If performance improvement is what you're after, that may be the wrong approach. Enterprise databases keep as much of the top-level table indexes in cache memory as they can, for good performance. In a five-level b-tree, for a moderately used table, it's quite possible that the top three levels stay in cache after their first access (this is a likely configuration for a 300M row table with an integer primary key). Splitting your table into 800 pieces means there will be 800 index structures to try to keep cached (in addition to the table data itself). Chances are, if your access is more or less evenly distributed by the primary key, that searching on one partition will end up pushing other partitions out of cache, to the ultimate detriment of overall performance.

Nevertheless, if you're determined to do this, the easiest way to partition a table into N pieces is to partition it by the primary key modulo the number of partitions you want (primary_key % 800, in your case). Newer versions of MySQL also have hash partition support, making partitioning into an arbitrary number of sets fairly straightforward:

PARTITION BY HASH(some_column_value) PARTITIONS number_of_partitions
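
As a rough illustration of what the modulus rule does, the Python sketch below maps integer keys to partition numbers. It assumes a non-negative integer key and plain (non-linear) hash partitioning, where the partition is chosen as the expression modulo the partition count. The function name and the sample key range are made up for the example.

```python
from collections import Counter

NUM_PARTITIONS = 800  # the number of pieces mentioned in the question


def hash_partition(key, num_partitions=NUM_PARTITIONS):
    """Partition index for a non-negative integer key: key modulo partition count."""
    if key < 0:
        raise ValueError("this sketch only handles non-negative keys")
    return key % num_partitions


# Key 1601 lands in partition 1, because 1601 = 2 * 800 + 1.
print(hash_partition(1601))

# Sequential keys spread evenly: 8000 keys fill 800 partitions with 10 rows each.
counts = Counter(hash_partition(k) for k in range(1, 8001))
print(len(counts), min(counts.values()), max(counts.values()))
```

An even spread like this is what you get from an auto-increment key. It also means that a range scan over the key touches every partition, so this scheme helps point lookups more than range queries.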

If you want to put your data into 800 actual tables instead, you'll have to do some editor magic, or use a scripting language, and do it in SQL:

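-- Create one empty table per bucket, copying MasterTable's structure.
CREATE TABLE table1 LIKE MasterTable;
CREATE TABLE table2 LIKE MasterTable;
CREATE TABLE table3 LIKE MasterTable;
..
-- Copy each bucket's rows; table N receives the rows where id MOD 800 = N - 1.
INSERT INTO table1 SELECT * FROM MasterTable WHERE id MOD 800 = 0;
INSERT INTO table2 SELECT * FROM MasterTable WHERE id MOD 800 = 1;
INSERT INTO table3 SELECT * FROM MasterTable WHERE id MOD 800 = 2;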

You could do this in a loop in your favorite programming language using dynamic SQL: that would probably be the easiest to render.
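
One way such a loop might look is sketched below in Python. It only generates the SQL text and does not connect to a database. The function name and its parameters are hypothetical, and the table and column names are the placeholders used above. The numbering follows the same convention: table1 receives the rows whose remainder is 0.

```python
def split_statements(master="MasterTable", prefix="table", pieces=800, key="id"):
    """Yield the CREATE and INSERT statements that split `master` into `pieces` tables."""
    for remainder in range(pieces):
        target = f"{prefix}{remainder + 1}"  # table1 holds remainder 0, and so on
        yield f"CREATE TABLE {target} LIKE {master};"
        yield (
            f"INSERT INTO {target} SELECT * FROM {master} "
            f"WHERE {key} MOD {pieces} = {remainder};"
        )


# Preview the statements for a 3-way split before running anything for real.
for statement in split_statements(pieces=3):
    print(statement)
```

Each generated statement could then be written to a file and fed to the mysql client, or executed one at a time through whatever MySQL driver you use. Every INSERT scans the master table once unless an index can serve the MOD filter, so expect the full 800-way run to take a long time on 300 million rows.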
