Optimizing query to get entire row where one field is the maximum for a group


Problem description

I have a table with a schema like, say,

EventTime   DATETIME(6),
EventType   VARCHAR(20),
Number1     INT,
Number2     INT,
Number3     INT,
...

There are an unimaginably large number of rows in this table, but for the sake of this query I'm only interested in, say, a few thousand of them that are between two given values of EventTime. There's an index on EventTime, and if I just do something like

SELECT * FROM table WHERE EventTime >= time1 and EventTime <= time2;

Then it's able to return the relevant rows near-instantaneously.

Out of the rows in this time window, I want to extract precisely those whose Number1 is the largest among the rows with the same EventType. In other words, I want to do something equivalent to this query:

SELECT * FROM
  (SELECT EventType, MAX(Number1) as max_Number1
   FROM table
   WHERE EventTime >= time1 AND EventTime <= time2
   GROUP BY EventType) AS a
  LEFT JOIN
  (SELECT * FROM table
   WHERE EventTime >= time1 AND EventTime <= time2) AS b
  ON a.EventType = b.EventType AND a.max_Number1 = b.Number1;

This seems like it should work just fine -- I can run each of the subqueries, namely

SELECT EventType, MAX(Number1) as max_Number1
FROM table
WHERE EventTime >= time1 AND EventTime <= time2
GROUP BY EventType;

SELECT * FROM table
WHERE EventTime >= time1 AND EventTime <= time2;

virtually instantaneously, so at this point it shouldn't be too hard to produce the desired results: the database can sort or index the results of both subqueries by EventType and then just match things up.

However, when I actually run this it takes forever. I don't know how long, because I've never let it complete, but it takes way longer than it would for me to just manually pull the results of both queries and do the merge elsewhere.
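
For comparison, the manual merge mentioned above can be sketched in Python. This is only an illustration under assumptions: the rows are assumed to have been fetched already for the time window as dictionaries, `rows_with_group_max` is a hypothetical helper name, the sample data is made up, and Number1 is assumed to be non-NULL.

```python
def rows_with_group_max(rows, group_key="EventType", value_key="Number1"):
    """Return every row whose value_key equals the maximum within its group."""
    # Pass 1: find the maximum value per group (the GROUP BY / MAX step).
    best = {}
    for row in rows:
        group, value = row[group_key], row[value_key]
        if group not in best or value > best[group]:
            best[group] = value
    # Pass 2: keep the rows that match their group's maximum (the join step).
    return [row for row in rows if row[value_key] == best[row[group_key]]]


# Hypothetical rows, as if already fetched with the EventTime range filter.
window_rows = [
    {"EventTime": "2020-01-01 00:00:01", "EventType": "A", "Number1": 5},
    {"EventTime": "2020-01-01 00:00:02", "EventType": "A", "Number1": 9},
    {"EventTime": "2020-01-01 00:00:03", "EventType": "B", "Number1": 3},
    {"EventTime": "2020-01-01 00:00:04", "EventType": "B", "Number1": 3},
]
result = rows_with_group_max(window_rows)
```

Both passes are linear in the number of rows in the window, so for a few thousand rows this work is negligible. Ties are kept, which matches what the join on Number1 would return.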

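Questions:

  1. Why is this taking so long? What is the database engine doing?
  2. Is there a way to write this query in a reasonable way?
  3. If not, could I write it as a stored procedure instead?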

Difficulty: As this table has tens of billions of rows it would be quite costly to add any further indices to it.

Recommended answer

You actually are already pretty close to a good query. The main drawback of yours is likely the LEFT JOIN on selecting all from table in the time frame. Try the following:

SELECT * FROM
table b
INNER JOIN (
    SELECT EventType, MAX(Number1) as max_Number1
    FROM table
    WHERE EventTime >= time1 AND EventTime <= time2
    GROUP BY EventType
) AS a
ON a.EventType = b.EventType
AND a.max_Number1 = b.Number1
WHERE b.EventTime >= time1 AND b.EventTime <= time2;
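
One way to sanity-check the shape of this query is to run it on a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MariaDB/MySQL, so it says nothing about performance on the real table. The table name `events`, the index name `idx_event_time`, the reduced column set, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced version of the schema from the question.
conn.execute(
    "CREATE TABLE events ("
    "EventTime TEXT, EventType TEXT, Number1 INTEGER, Number2 INTEGER)"
)
conn.execute("CREATE INDEX idx_event_time ON events (EventTime)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01 00:00:00", "A", 100, 0),  # before the window
        ("2020-01-01 00:00:01", "A", 5, 1),
        ("2020-01-01 00:00:02", "A", 9, 2),
        ("2020-01-01 00:00:03", "B", 3, 3),
        ("2020-01-01 00:00:04", "B", 3, 4),    # tie with the previous row
        ("2020-01-01 00:00:09", "A", 9, 5),    # after the window
    ],
)

# Same structure as the query above: per-type maximum inside the window,
# joined back to the rows of the same window.
QUERY = """
SELECT b.EventTime, b.EventType, b.Number1, b.Number2
FROM events AS b
INNER JOIN (
    SELECT EventType, MAX(Number1) AS max_Number1
    FROM events
    WHERE EventTime >= :time1 AND EventTime <= :time2
    GROUP BY EventType
) AS a
  ON a.EventType = b.EventType
 AND a.max_Number1 = b.Number1
WHERE b.EventTime >= :time1 AND b.EventTime <= :time2
ORDER BY b.EventType, b.EventTime
"""
params = {"time1": "2020-01-01 00:00:01", "time2": "2020-01-01 00:00:05"}
rows = conn.execute(QUERY, params).fetchall()
```

In this toy data, the row after the window has Number1 equal to the in-window maximum for type A, yet the outer WHERE clause excludes it. The two tied rows for type B are both returned.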

Ideally, this would be accompanied by an index on (EventType, EventTime). Please add the output of SHOW CREATE TABLE for this table to your question, so we can see which indexes you currently have. We may be able to tweak an existing one, or help you drop unneeded ones, to permit adding this new index.

Disclaimer: My experience is pretty exclusively in MySQL and InnoDB, but I think this should still be helpful for MariaDB and MyISAM.
