MySQL taking forever 'sending data'. Simple query, lots of data


Question


I'm trying to run what I believe to be a simple query on a fairly large dataset, and it's taking a very long time to execute -- it stalls in the "Sending data" state for 3-4 hours or more.

The table looks like this:

CREATE TABLE `transaction` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `uuid` varchar(36) NOT NULL,
  `userId` varchar(64) NOT NULL,
  `protocol` int(11) NOT NULL,
  -- ... A few other fields: ints and small varchars
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `uuid` (`uuid`),
  KEY `userId` (`userId`),
  KEY `protocol` (`protocol`),
  KEY `created` (`created`)
) ENGINE=InnoDB AUTO_INCREMENT=61 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 COMMENT='Transaction audit table'

Here is the query:

select protocol, count(distinct userId) as count from transaction
where created > '2012-01-15 23:59:59' and created <= '2012-02-14 23:59:59'
group by protocol;


The table has approximately 222 million rows, and the where clause in the query filters down to about 20 million rows. The distinct option will bring it down to about 700,000 distinct rows, and then after grouping, (and when the query finally finishes), 4 to 5 rows are actually returned.


I realize that it's a lot of data, but it seems that 4-5 hours is an awfully long time for this query.

Thanks.


For reference, this is running on AWS on a db.m2.4xlarge RDS database instance.

Answer


This is a really heavy query. To understand why it takes so long you should understand the details.


You have a range condition on an indexed field. That is, MySQL finds the smallest `created` value in the range, and for each matching index entry it takes the corresponding primary key, retrieves the row from disk, and fetches the fields missing from the index record (`protocol`, `userId`). It puts these into a temporary table and performs the grouping on those 700,000 rows. The index can be and is used here, but only to speed up the range condition.
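You can confirm this access pattern by inspecting the query plan; a sketch using the query from the question (the exact output columns vary by MySQL version):

```sql
EXPLAIN
SELECT protocol, COUNT(DISTINCT userId) AS count
FROM `transaction`
WHERE created > '2012-01-15 23:59:59'
  AND created <= '2012-02-14 23:59:59'
GROUP BY protocol;
-- Typically shows type=range and key=created, but no "Using index"
-- in the Extra column -- meaning every matching row is fetched from disk.
```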


The only way to speed it up is to have an index that contains all the necessary data, so that MySQL does not need to make on-disk lookups for the rows. That is called a covering index. But you should understand that the index will reside in memory and will contain ~ sizeOf(created+protocol+userId+PK)*rowCount bytes, which may become a burden in itself for queries that update the table and for the other indexes. It is easier to create a separate aggregates table and periodically update it using your query.
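A minimal sketch of both options; the index and table names below are illustrative, not from the original post:

```sql
-- Option 1: a covering index. The query can then be answered
-- entirely from the index, with no per-row disk lookups.
ALTER TABLE `transaction`
  ADD INDEX `ix_created_protocol_user` (`created`, `protocol`, `userId`);

-- Option 2: a small aggregates table, refreshed periodically
-- (e.g. from a cron job), so the heavy scan runs off-line.
CREATE TABLE `protocol_user_counts` (
  `period_start` datetime NOT NULL,
  `period_end`   datetime NOT NULL,
  `protocol`     int(11) NOT NULL,
  `userCount`    bigint unsigned NOT NULL,
  PRIMARY KEY (`period_start`, `period_end`, `protocol`)
) ENGINE=InnoDB;

INSERT INTO `protocol_user_counts`
SELECT '2012-01-15 23:59:59', '2012-02-14 23:59:59',
       protocol, COUNT(DISTINCT userId)
FROM `transaction`
WHERE created > '2012-01-15 23:59:59'
  AND created <= '2012-02-14 23:59:59'
GROUP BY protocol;
```

With the covering index, `EXPLAIN` should report "Using index" in the Extra column, confirming the rows are never touched.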
