导入前后的InnoDB索引 [英] InnoDB indexes before and after importing

查看:85
本文介绍了导入前后的InnoDB索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试导入mysqldump为InnoDB表生成的大型SQL文件,但是即使在my.cnf中调整了一些参数并禁用了AUTOCOMMIT(以及FOREIGN_KEY_CHECKS和UNIQUE_CHECKS之后,也要花很长时间),该表没有任何外键或唯一键).但是我想知道是否由于表中的多个索引花费了这么长时间?

I'm trying to import a large SQL file that was generated by mysqldump for an InnoDB table but it is taking a very long time even after adjusting some parameters in my.cnf and disabling AUTOCOMMIT (as well as FOREIGN_KEY_CHECKS and UNIQUE_CHECKS but the table does not have any foreign or unique keys). But I'm wondering if it's taking so long because of the several indexes in the table.

看一下SQL文件,似乎在插入所有数据之前已在CREATE TABLE语句中创建了索引.根据我的(有限)研究和个人经验,我发现插入所有数据.是否不必检查每个INSERT的索引?我知道mysqldump确实具有--disable-keys选项,该选项确实可以做到这一点–在插入之前禁用键,但是显然,这仅适用于MyISAM表,而不适用于InnoDB.

Looking at the SQL file, it appears that the indexes are being created in the CREATE TABLE statement, prior to inserting all the data. Based on my (limited) research and personal experience, I've found that it's faster to add the indexes after inserting all the data. Does it not have to check the indexes for every INSERT? I know that mysqldump does have a --disable-keys option which does exactly that – disable the keys prior to inserting, but apparently this only works with MyISAM tables and not InnoDB.

但是,为什么mysqldump不能在InnoDB表的CREATE TABLE语句中不包含键,然后在插入所有数据之后执行ALTER TABLE?还是InnoDB的工作方式不同,并且速度没有差别?

But why couldn't mysqldump not include the keys with the CREATE TABLE statement for InnoDB tables, then do an ALTER TABLE after all the data is inserted? Or does InnoDB work differently, and there is no speed difference?

谢谢!

推荐答案

我在过去的工作中对这个概念进行了一些试验,我们需要一种在MySQL服务器之间复制架构的快速方法.

I experimented with this concept a bit at a past job, where we needed a fast method of copying schemas between MySQL servers.

当您插入具有二级索引的表时,确实存在性能开销.插入需要更新聚簇索引(又名表),也需要更新二级索引.表具有的索引越多,插入的开销就越大.

There is indeed a performance overhead when you insert to tables that have secondary indexes. Inserts need to update the clustered index (aka the table), and also update secondary indexes. The more indexes a table has, the more overhead it causes for inserts.

InnoDB具有称为更改缓冲区的功能通过延迟索引更新会有所帮助,但最终必须将它们合并.

InnoDB has a feature called the change buffer which helps a bit by postponing index updates, but they have to get merged eventually.

插入没有二级索引的表的速度更快,因此很容易尝试将索引创建推迟到数据加载后再进行,如您所描述的那样.

Inserts to a table with no secondary indexes are faster, so it's tempting to try to defer index creation until after your data is loaded, as you describe.

Percona Server(MySQL的一个分支)尝试了mysqldump --optimize-keys选项.当您使用此选项时,它会将mysqldump的输出更改为不带索引的CREATE TABLE,然后插入所有数据,然后在数据加载后添加ALTER TABLE以添加索引.参见 https://www.percona.com/doc/percona -server/LATEST/management/innodb_expanded_fast_index_creation.html

Percona Server, a branch of MySQL, experimented with a mysqldump --optimize-keys option. When you use this option, it changes the output of mysqldump to have CREATE TABLE with no indexes, then INSERT all data, then ALTER TABLE to add the indexes after the data is loaded. See https://www.percona.com/doc/percona-server/LATEST/management/innodb_expanded_fast_index_creation.html

但是根据我的经验,性能的净改善很小.即使没有索引的表也要花很多时间才能插入很多行.然后,还原需要运行ALTER TABLE来建立索引.一张大桌子要花一些时间.当您计算INSERT的时间加上建立索引的额外时间时,它比将传统方法插入具有索引的表中仅快百分之几(低个位数).

But in my experience, the net improvement in performance was small. It still takes a while to insert a lot of rows, even for tables with no indexes. Then the restore needs to run an ALTER TABLE to build the indexes. This takes a while for a large table. When you count the time of INSERTs plus the extra time to build indexes, it's only a few (low single-digit) percents faster than inserting the traditional way, into a table with indexes.

此后处理索引创建的另一个好处是索引存储更紧凑,因此,如果需要节省磁盘空间,那是使用此技术的更好理由.

Another benefit of this post-processing index creation is that the indexes are stored more compactly, so if you need to save disk space, that's a better reason to use this technique.

我发现通过并行加载多个表对恢复性能更有利.

I found it much more beneficial to performance to restore by loading several tables in parallel.

  • 新的MySQL 8.0工具 mysqlpump 支持多种线程转储.
  • 开源工具 mydumper 支持多线程转储,并且还具有多线程转储功能-线程还原工具,称为myloader. mydumper/myloader的最糟糕的缺点是该文档实际上不存在,因此您必须是一个勇敢的超级用户才能弄清楚如何运行它.
  • The new MySQL 8.0 tool mysqlpump supports multi-threaded dump.
  • The open-source tool mydumper supports multi-threaded dump, and also has a multi-threaded restore tool, called myloader. The worst downside of mydumper/myloader is that the documentation is virtually non-existant, so you need to be an intrepid power user to figure out how to run it.

另一种策略是使用mysqldump --tab转储CSV文件而不是SQL脚本.批量加载CSV文件比执行SQL脚本还原数据要快得多.好吧,它转储用于表定义的SQL文件和用于导入数据的CSV文件.它为每个表创建单独的文件.您必须通过加载所有SQL文件来手动重新创建表(这是快速的),然后使用 mysqlimport 以加载CSV数据文件. mysqlimport工具甚至具有用于并行执行的--use-threads选项.

Another strategy is to use mysqldump --tab to dump CSV files instead of SQL scripts. Bulk-loading CSV files is much faster than executing SQL scripts to restore the data. Well, it dumps an SQL file for the table definition, and a CSV for the data to import. It creates separate files for each table. You have to manually recreate the tables by loading all the SQL files (this is quick), and then use mysqlimport to load the CSV data files. The mysqlimport tool even has a --use-threads option for parallel execution.

使用不同数量的并行线程进行仔细测试.我的经验是4个线程是最好的.随着更大的并行度,InnoDB成为瓶颈.但是您的经验可能会有所不同,具体取决于MySQL的版本和服务器硬件的性能.

Test carefully with different numbers of parallel threads. My experience is that 4 threads is the best. With greater parallelism, InnoDB becomes a bottleneck. But your experience may be different, depending on the version of MySQL and your server hardware's performance capacity.

最快的还原方法是使用物理备份工具时,最流行的还原方法是

The fastest restore method of all is when you use a physical backup tool, the most popular is Percona XtraBackup. This allows for fast backups and even faster restores. The backed up files are literally ready to be copied into place and used as live tablespace files. The downside is that you must shut down your MySQL Server to perform the restore.

这篇关于导入前后的InnoDB索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆