Delete huge amounts of data from huge table

Problem description

I have two tables. Let's call them KEY and VALUE.
KEY is small, somewhere around 1,000,000 records.
VALUE is huge, say 1,000,000,000 records.

Between them there is a connection such that each KEY might have many VALUEs. It's not a foreign key, but it means basically the same thing.

The DDL looks like this:

-- KEY is a reserved word in MySQL, so the placeholder names are quoted with backticks
create table `KEY` (
 key_id int,
 primary key (key_id)
);

create table `VALUE` (
 key_id int,
 value_id int,
 primary key (key_id, value_id)
);

Now, my problem. About half of all key_ids in VALUE have been deleted from KEY, and I need to delete the orphaned VALUE rows in an orderly fashion while both tables are still under high load.

It would be easy to do this:

delete v
  from `VALUE` v
  left join `KEY` k using (key_id)
 where k.key_id is null;

However, since a LIMIT is not allowed on a multi-table delete, I don't like this approach. Such a delete would take hours to run as a single statement, and there is no way to throttle it.

Another approach is to create a cursor to find all missing key_ids and delete them one by one with a limit. That seems very slow and kind of backwards.

Are there any other options? Some nice tricks that could help?

Recommended answer

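What about this, which does have a limit? The LIMIT sits inside a derived table that picks up to 1000 orphaned rows, and the outer delete joins back to it on the full primary key.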

delete x 
  from `VALUE` x
  join (select key_id, value_id
          from `VALUE` v
          left join `KEY` k using (key_id)
         where k.key_id is null
         limit 1000) y
    on x.key_id = y.key_id AND x.value_id = y.value_id;
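
To actually throttle the cleanup, the statement above has to be repeated until it stops deleting rows, with a pause between runs. Here is a minimal sketch of such a driver loop, assuming a Python client and a PEP 249 (DB-API) connection to the MySQL server. The names `delete_in_batches`, `pause_seconds` and `max_batches` are made up for illustration and are not part of the original answer, and the pause length is a guess you would tune against your own load.

```python
import time

# The batched statement from the answer above (MySQL syntax).
BATCH_DELETE_SQL = """
delete x
  from `VALUE` x
  join (select key_id, value_id
          from `VALUE` v
          left join `KEY` k using (key_id)
         where k.key_id is null
         limit 1000) y
    on x.key_id = y.key_id and x.value_id = y.value_id
"""


def delete_in_batches(conn, batch_sql=BATCH_DELETE_SQL,
                      pause_seconds=0.5, max_batches=None):
    """Run batch_sql repeatedly until it deletes nothing; return total rows deleted.

    Assumption: conn is a PEP 249 (DB-API) connection whose cursors report
    the number of deleted rows in cursor.rowcount.
    """
    total = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        cur = conn.cursor()
        try:
            cur.execute(batch_sql)
            deleted = cur.rowcount
        finally:
            cur.close()
        conn.commit()  # one short transaction per batch, so locks are released quickly
        if deleted <= 0:
            break  # nothing left to delete (or the driver cannot report a count)
        total += deleted
        batches += 1
        time.sleep(pause_seconds)  # throttle: leave room for the regular load
    return total
```

Committing after every batch keeps each transaction small, and `max_batches` lets you stop after a fixed amount of work and resume later, since each run simply picks up whatever orphaned rows remain.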
