Delete huge amounts of data from huge table
Question
I have two tables. Let's call them KEY and VALUE.
KEY is small, somewhere around 1.000.000 records.
VALUE is huge, say 1.000.000.000 records.
Between them there is a connection such that each KEY might have many VALUES. It's not a foreign key but basically the same meaning.
The DDL looks like this:
create table `KEY` (
    key_id int,
    primary key (key_id)
);
create table `VALUE` (
    key_id int,
    value_id int,
    primary key (key_id, value_id)
);
Now, my problem. About half of all key_ids in VALUE have been deleted from KEY, and I need to delete them in an orderly fashion while both tables are still under high load.
This would be easy to do:
delete v
from `VALUE` v
left join `KEY` k using (key_id)
where k.key_id is null;
However, since limit is not allowed on a multi-table delete, I don't like this approach. Such a delete would take hours to run, and without a limit there is no way to throttle it.
Another approach is to create a cursor that finds all missing key_ids and delete them one by one with a limit. That seems very slow and kind of backwards.
Are there any other options? Some nice tricks that could help?
Recommended answer
What about this, which does have a limit?
delete x
from `VALUE` x
join (select key_id, value_id
from `VALUE` v
left join `KEY` k using (key_id)
where k.key_id is null
limit 1000) y
on x.key_id = y.key_id AND x.value_id = y.value_id;
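To throttle the cleanup, the statement above has to be run repeatedly until no orphaned rows remain, with a pause between batches. One way to do that is a small client-side loop. The following is only a sketch: the function name delete_in_batches, the 0.5 second pause, and the use of a Python DB-API connection are assumptions made for illustration and are not part of the original answer.

```python
import time

# The batch statement from the answer above, unchanged apart from layout.
BATCH_DELETE_SQL = """
delete x
from `VALUE` x
join (select key_id, value_id
      from `VALUE` v
      left join `KEY` k using (key_id)
      where k.key_id is null
      limit 1000) y
  on x.key_id = y.key_id and x.value_id = y.value_id
"""


def delete_in_batches(conn, batch_sql=BATCH_DELETE_SQL, pause_seconds=0.5,
                      sleep=time.sleep):
    """Run batch_sql until it deletes nothing; return the total rows deleted.

    conn is assumed to be any DB-API 2.0 connection (hypothetical here;
    for MySQL it would come from whichever driver is already in use).
    """
    total = 0
    while True:
        cur = conn.cursor()
        try:
            cur.execute(batch_sql)
            deleted = cur.rowcount  # rows removed by this batch
        finally:
            cur.close()
        conn.commit()  # commit per batch so locks are held only briefly
        if deleted <= 0:
            return total  # nothing left to delete (or count unavailable)
        total += deleted
        sleep(pause_seconds)  # throttle: leave room for the normal workload
```

Both the batch size (the limit inside the statement) and the pause can be tuned to the load the tables can tolerate. Committing after every batch keeps each transaction short, which matters when both tables are busy.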