How can I improve DELETE FROM performance on large InnoDB tables?

Question

I have a fairly large InnoDB table which contains about 10 million rows (and counting, it is expected to become 20 times that size). Each row is not that large (131 B on average), but from time to time I have to delete a chunk of them, and that is taking ages. This is the table structure:

 CREATE TABLE `problematic_table` (
    `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    `taxid` int(10) unsigned NOT NULL,
    `blastdb_path` varchar(255) NOT NULL,
    `query` char(32) NOT NULL,
    `target` int(10) unsigned NOT NULL,
    `score` double NOT NULL,
    `evalue` varchar(100) NOT NULL,
    `log_evalue` double NOT NULL DEFAULT '-999',
    `start` int(10) unsigned DEFAULT NULL,
    `end` int(10) unsigned DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `taxid` (`taxid`),
    KEY `query` (`query`),
    KEY `target` (`target`),
    KEY `log_evalue` (`log_evalue`)
) ENGINE=InnoDB AUTO_INCREMENT=7888676 DEFAULT CHARSET=latin1;

Queries that delete large chunks from the table are simply like this:

DELETE FROM problematic_table WHERE problematic_table.taxid = '57';

A query like this just took almost an hour to finish. I can imagine that the index rewriting overhead makes these queries very slow.

I am developing an application that will run on pre-existing databases. I most likely have no control over server variables unless I make changes to them mandatory (which I would prefer not to), so I'm afraid suggestions that change those are of little value.

I have tried to INSERT ... SELECT the rows that I don't want to delete into a temporary table and then drop the rest, but as the ratio of to-delete vs. to-keep shifts towards to-keep, this is no longer a useful solution.

This is a table that may see frequent INSERTs and SELECTs in the future, but no UPDATEs. Basically, it's a logging and reference table that needs to drop parts of its content from time to time.

Could I improve my indexes on this table by limiting their length? Would switching to MyISAM help, which supports DISABLE KEYS during transactions? What else could I try to improve DELETE performance?

One such deletion would be on the order of one million rows.

Recommended answer

This solution can provide better performance once completed, but the process may take some time to implement.

A new BIT column can be added that defaults to TRUE, meaning "active", with FALSE meaning "inactive". If two states are not enough, you could use TINYINT with 256 possible values.
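As a minimal sketch, the flag column could be added as follows. The column names `active` and `status` are assumptions for illustration; the answer does not name the column.

```sql
-- Hypothetical flag column: 1 = active (default), 0 = inactive.
-- On a table of this size the ALTER itself may run for a long time.
ALTER TABLE problematic_table
    ADD COLUMN active BIT(1) NOT NULL DEFAULT b'1';

-- Alternative if more than two states are needed (values 0..255):
-- ALTER TABLE problematic_table
--     ADD COLUMN status TINYINT UNSIGNED NOT NULL DEFAULT 1;
```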

Adding this new column will probably take a long time, but once it's done, your updates should be much faster than the deletes, as long as you locate the rows through an existing index, as your deletes do, and don't index the new column.
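A sketch of what the "update instead of delete" step might look like, assuming a hypothetical unindexed flag column named `active` (1 = active, 0 = inactive). The taxid values are only examples.

```sql
-- Soft delete: flag the rows instead of removing them.
-- The WHERE clause uses the existing taxid index, like the original DELETE.
UPDATE problematic_table
   SET active = 0
 WHERE taxid = 57
   AND active = 1;

-- Readers then have to skip the flagged rows explicitly.
SELECT id, `query`, target, score
  FROM problematic_table
 WHERE taxid = 42
   AND active = 1;
```

The trade-off is that every query reading the table needs the extra `active = 1` condition until the flagged rows are actually purged.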

The reason why InnoDB takes so long to DELETE on such a massive table as yours is the clustered index. InnoDB physically orders the table by its PRIMARY KEY, or failing that by the first UNIQUE index whose columns are all NOT NULL, or failing that by a hidden row ID it generates itself. Every deleted row has to be removed from that clustered index and from each of the table's four secondary indexes (taxid, query, target, log_evalue), and a single statement that removes a million rows does all of this, plus the undo logging, in one large transaction. InnoDB does not literally re-sort the whole table for each deleted row (rows are delete-marked first and purged in the background), but it is this index and page maintenance, not finding the rows, that takes so long.

When you update a fixed-width, unindexed column instead of deleting, the row keeps its size and its place in the clustered index, so the change is applied in place and none of the secondary indexes need to be touched.

During off hours, a single DELETE can be used to remove the unnecessary rows. This operation will still be slow but collectively much faster than deleting individual rows.
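The off-hours cleanup could be sketched like this, again assuming the hypothetical `active` flag column. The batched variant is not part of the answer; it is a common alternative shown here only as an option, and the batch size of 10000 is arbitrary.

```sql
-- Off-hours purge of everything previously flagged inactive.
-- `active` is deliberately unindexed, so this scans the whole table.
DELETE FROM problematic_table
 WHERE active = 0;

-- Variant (an assumption, not from the answer): delete in bounded batches
-- so that each transaction stays small. Repeat from the application
-- until ROW_COUNT() reports 0 affected rows.
DELETE FROM problematic_table
 WHERE active = 0
 LIMIT 10000;

SELECT ROW_COUNT();
```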
