Optimizing query to get entire row where one field is the maximum for a group

Question

I have a table with a schema like, say,

EventTime   DATETIME(6),
EventType   VARCHAR(20),
Number1     INT,
Number2     INT,
Number3     INT,
...

There are an unimaginably large number of rows in this table, but for the sake of this query I'm only interested in, say, a few thousand of them that are between two given values of EventTime. There's an index on EventTime, and if I just do something like

SELECT * FROM table WHERE EventTime >= time1 and EventTime <= time2;

Then it's able to return the relevant rows near-instantaneously.

Out of the rows in this time window, I want to extract precisely those where Number1 is the largest for any row with that EventType. So in other words I want to do something equivalent to this query:

SELECT * FROM
  (SELECT EventType, MAX(Number1) as max_Number1
   FROM table
   WHERE EventTime >= time1 AND EventTime <= time2
   GROUP BY EventType) AS a
  LEFT JOIN
  (SELECT * FROM table
   WHERE EventTime >= time1 AND EventTime <= time2) AS b
  ON a.EventType = b.EventType AND a.max_Number1 = b.Number1;

This seems like it should work just fine -- I can run each of the subqueries, namely

SELECT EventType, MAX(Number1) as max_Number1
FROM table
WHERE EventTime >= time1 AND EventTime <= time2
GROUP BY EventType;

SELECT * FROM table
WHERE EventTime >= time1 AND EventTime <= time2;

virtually instantaneously, so at this point it shouldn't be too hard to produce the desired results: the database can sort or index the results of both subqueries by EventType and then just match things up.

However, when I actually run this it takes forever. I don't know how long, because I've never let it complete, but it takes way longer than it would for me to just manually pull the results of both queries and do the merge elsewhere.
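
For reference, the manual merge described above could look roughly like the following. This is only a minimal sketch in Python: the function name `rows_with_group_max` and the sample rows are hypothetical, the rows are assumed to have already been fetched by the fast time-window query, and Number1 is assumed to be non-NULL.

```python
def rows_with_group_max(rows, group_key="EventType", value_key="Number1"):
    """Return every row whose value_key equals the maximum within its group_key."""
    # First pass: the equivalent of
    #   SELECT EventType, MAX(Number1) ... GROUP BY EventType
    best = {}
    for row in rows:
        group, value = row[group_key], row[value_key]
        if group not in best or value > best[group]:
            best[group] = value
    # Second pass: the equivalent of the join on (EventType, Number1).
    # Rows that tie for the maximum are all kept, as in the SQL version.
    return [row for row in rows if row[value_key] == best[row[group_key]]]


# Hypothetical rows standing in for the result of the time-window query.
window_rows = [
    {"EventTime": "2020-01-01 00:00:01", "EventType": "a", "Number1": 3},
    {"EventTime": "2020-01-01 00:00:02", "EventType": "a", "Number1": 7},
    {"EventTime": "2020-01-01 00:00:03", "EventType": "b", "Number1": 5},
    {"EventTime": "2020-01-01 00:00:04", "EventType": "a", "Number1": 7},
]
result = rows_with_group_max(window_rows)
```

Both passes are linear in the number of rows in the window, which is why the merge is cheap for a few thousand rows.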

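Questions:

  1. Why does this take so long? What is the database engine doing?
  2. Is there a way to write this query so that it performs reasonably?
  3. If not, can I somehow write it as a stored procedure?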

Difficulty: As this table has tens of billions of rows it would be quite costly to add any further indices to it.

Recommended answer

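You are actually already pretty close to a good query. The main drawback of your version is likely the LEFT JOIN against a second derived table that selects every column from table in the time frame. Try the following, which joins the base table directly and applies the time filter in the outer WHERE clause: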

SELECT * FROM
table b
INNER JOIN (
    SELECT EventType, MAX(Number1) as max_Number1
    FROM table
    WHERE EventTime >= time1 AND EventTime <= time2
    GROUP BY EventType
) AS a
ON a.EventType = b.EventType
AND a.max_Number1 = b.Number1
WHERE b.EventTime >= time1 AND b.EventTime <= time2
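
To check that this shape of query returns the intended rows, here is a small sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL/MariaDB here, so it says nothing about performance on the real table. The table name `events`, the index name and the sample data are made up for illustration, and only three of the columns are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (EventTime TEXT, EventType TEXT, Number1 INTEGER)"
)
# Mirrors the single existing index on EventTime described in the question.
conn.execute("CREATE INDEX idx_event_time ON events (EventTime)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2020-01-01 00:00:00", "a", 7),   # before the window, ties the max for "a"
        ("2020-01-01 00:00:01", "a", 3),
        ("2020-01-01 00:00:02", "a", 7),   # max for "a" inside the window
        ("2020-01-01 00:00:03", "b", 5),   # max for "b" inside the window
        ("2020-01-01 00:00:04", "b", 2),
        ("2020-01-01 00:00:09", "b", 50),  # after the window
    ],
)

QUERY = """
SELECT b.EventTime, b.EventType, b.Number1
FROM events AS b
INNER JOIN (
    SELECT EventType, MAX(Number1) AS max_Number1
    FROM events
    WHERE EventTime >= :time1 AND EventTime <= :time2
    GROUP BY EventType
) AS a
  ON a.EventType = b.EventType
 AND a.max_Number1 = b.Number1
WHERE b.EventTime >= :time1 AND b.EventTime <= :time2
ORDER BY b.EventTime
"""

rows = conn.execute(
    QUERY,
    {"time1": "2020-01-01 00:00:01", "time2": "2020-01-01 00:00:05"},
).fetchall()
```

The first sample row shows why the outer WHERE clause is still needed: it has the same EventType and Number1 as the in-window maximum for "a", so without the time filter on b it would be returned as well.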

Ideally, this would be accompanied by a composite index on (EventType, EventTime). Please add the output of SHOW CREATE TABLE table to your question, so we can see what indexes you currently have. We may be able to tweak an existing one, or help you drop unneeded ones, to permit adding this new index.

Disclaimer: My experience is pretty exclusively in MySQL and InnoDB, but I think this should still be helpful for MariaDB and MyISAM.
