优化比较 MySQL 两个大表中的数据 [英] Optimise comparing data in two big MySQL tables
问题描述
我如何优化查询,它将找到所有记录,其中:
How could I optimise query, which will find all records, which:
- 有activation_request.date_confirmed 不为空
和
- 在另一个表中没有相关的字符串值:activation_request.email =user.username 不应返回任何记录
我试过了:
SELECT email
FROM activation_request l
LEFT JOIN user r ON r.username = l.email
WHERE l.date_confirmed is not null
AND r.username IS NULL
和
SELECT email
FROM activation_request
WHERE date_confirmed is not null
AND NOT EXISTS (SELECT 1
FROM user
WHERE user.username = activation_request.email
)
但是两个表都有 xxx.xxx.xxx 记录,因此在运行这些查询一整夜之后不幸的是我没有得到任何结果.
but both tables have xxx.xxx.xxx records hence after all night running those queries unfortunatelly I haven't got any results.
创建语句:
CREATE TABLE `activation_request` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`date_confirmed` datetime DEFAULT NULL,
`email` varchar(255) NOT NULL,
(...)
PRIMARY KEY (`id`),
KEY `emailIdx` (`email`),
KEY `reminderSentIdx` (`date_reminder_sent`),
KEY `idx_resent_needed` (`date_reminder_sent`,`date_confirmed`),
) ENGINE=InnoDB AUTO_INCREMENT=103011867 DEFAULT CHARSET=utf8;
CREATE TABLE `user` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`username` varchar(255) NOT NULL,
(...)
PRIMARY KEY (`id`),
UNIQUE KEY `Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb` (`username`),
) ENGINE=InnoDB AUTO_INCREMENT=431400048 DEFAULT CHARSET=latin1;
LEFT JOIN 的说明:
Explain for LEFT JOIN:
[[id:1, select_type:SIMPLE, table:l, type:ALL, possible_keys:null,key:null, key_len:null, ref:null, rows:49148965, Extra:Using where],[id:1, select_type:SIMPLE, table:r, type:index, possible_keys:null,密钥:Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb,key_len:257, ref:null, rows:266045508, Extra:Using where;不存在;使用索引;使用连接缓冲区(块嵌套循环)]] [[id:1,选择类型:简单,表:l,类型:全部,可能的键:空,键:空,key_len:null, ref:null, rows:49148965, Extra:Using where], [id:1,选择类型:简单,表:r,类型:索引,可能的键:空,密钥:Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb,key_len:257, ref:null, rows:266045508, Extra:Using where;不存在;使用索引;使用连接缓冲区(块嵌套循环)]]
[[id:1, select_type:SIMPLE, table:l, type:ALL, possible_keys:null, key:null, key_len:null, ref:null, rows:49148965, Extra:Using where], [id:1, select_type:SIMPLE, table:r, type:index, possible_keys:null, key:Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb, key_len:257, ref:null, rows:266045508, Extra:Using where; Not exists; Using index; Using join buffer (Block Nested Loop)]] [[id:1, select_type:SIMPLE, table:l, type:ALL, possible_keys:null, key:null, key_len:null, ref:null, rows:49148965, Extra:Using where], [id:1, select_type:SIMPLE, table:r, type:index, possible_keys:null, key:Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb, key_len:257, ref:null, rows:266045508, Extra:Using where; Not exists; Using index; Using join buffer (Block Nested Loop)]]
在暂存数据库上添加索引(数据略少,但结构相同)后,查询现在运行约 24 小时,但仍然没有结果):
After adding indexes on staging db (with slightly less data, but the same structure) query is now running ~24h and still no results):
$ show processlist;
| Id | User | Host | db | Command | Time | State | Info
| 64 | root | localhost | staging_db | Query | 110072 | Sending data | SELECT ar.email FROM activation_request ar WHERE ar.date_confirmed is not null AND NOT EXISTS (SELE |
Mysql 版本:
$ select version();
5.6.16-1~exp1
列表中的所有其他命令都是 Sleep
,因此没有其他查询在运行,也没有可能干扰/锁定行.
All other commands on the list are Sleep
so there is no other query running and possibly disturbing/locking rows.
推荐答案
对于这个查询:
SELECT ar.email
FROM activation_request ar
WHERE ar.date_confirmed is not null AND
NOT EXISTS (SELECT 1
FROM user u
WHERE u.username = ar.email
)
我会推荐 activation_request(date_confirmed, email)
和 user(username)
的索引.
I would recommend indexes on activation_request(date_confirmed, email)
and user(username)
.
除非你有大量的数据,否则你的问题可能是表被锁定了.
Unless you have a really humongous amount of data, though, your problem may be that tables are locked.
这篇关于优化比较 MySQL 两个大表中的数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!