慢查询:根据每个组的另一列上的 min 和 max 查找值之间的差异 [英] Slow query: find difference between values base on min and max on another column for each group
问题描述
对于选定的odds_type,对于每个相同的fix_id,我正在寻找更新列中MIN和MAX的赔率差异.通过输出,我想取回更新列中的值为最大值的其他行.这就是我的桌子的样子
I'm looking for a difference in odds where MIN and MAX in updated column for each same fix_id for selected odds_type. With the output I would like to get back other rows where value in updated column was max. That's how my table looks like
fix_id | 赔率 | 市场 | 更新 | 输入 |
---|---|---|---|---|
120 | 1.80 | home | 160 | 平均 |
120 | 1.40 | home | 150 | 平均 |
120 | 2.00 | home | 110 | 平均 |
188 | 1.00 | u/o | 200 | REG |
121 | 1.60 | 离开 | 160 | 平均 |
121 | 1.40 | 离开 | 150 | 平均 |
121 | 1.10 | 离开 | 110 | 平均 |
我期待得到什么
fix_id | 赔率 | 市场 | 更新 | 差异 | 输入 |
---|---|---|---|---|---|
120 | 1.80 | home | 160 | -0.2 | 平均 |
121 | 1.60 | 离开 | 160 | 0.5 | 平均 |
我正在使用的代码正在运行,但处理时间太长,而且数据越多,我就会出现超时错误
The code I'm using is working but it takes too much time to process and with more data I get time out error
SELECT a.*
, a.odds - b.odds delta
FROM
( SELECT x.*
FROM average_odds x
JOIN
( SELECT fix_id
, market
, MAX(updated) min_updated
FROM average_odds x where type='avg'
GROUP BY fix_id
, market
) y
ON y.fix_id = x.fix_id
AND y.market = x.market
AND y.min_updated = x.updated
) a
JOIN
( SELECT x.*
FROM average_odds x
JOIN
( SELECT fix_id
, market
, MIN(updated) min_updated
FROM average_odds x where type='avg'
GROUP BY fix_id
, market
) y
ON y.fix_id = x.fix_id
AND y.market = x.market
AND y.min_updated = x.updated
) b
ON b.fix_id = a.fix_id
AND b.market = a.market
ORDER BY `delta` ASC
这里也是查询用了 15 秒的 sql 的解释表
Here is also explain table from sql for the query that took 15 seconds
ID | S TYPE | 表格.. | parti | 输入 | pos_keys | KEY | key len | ref | 行 | 过滤 | 额外 |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | 衍生3> | null | 全部 | null | null | null | null | 17466 | 100.00 | 使用临时的;使用文件排序 |
1 | PRIMARY | x | null | ref | 修复、修复、市场、更新 | 修复 | 4 | y.fix_id | 596 | 0.11 | 使用 where |
1 | PRIMARY | x | null | ref | 修复、修复、市场、更新 | 修复 | 4 | y.fix_id | 596 | 2.27 | 使用 where |
1 | PRIMARY | 衍生5> | null | ref | auto_key0> | auto_key0> | 31 | y.fix_id,y.market,bobi.x.updated | 10 | 100.00 | 使用索引 |
5 | 派生 | x | null | ref | boki | boki | 4 | const | 17466 | 100.00 | 使用索引条件;使用临时;使用文件... |
3 | 派生 | x | null | ref | boki | boki | 4 | const | 17466 | 100.00 | 使用索引条件;使用临时;使用文件... |
有一个表格定义
字段 | TYPE | NULL | key | 默认 | 额外 |
---|---|---|---|---|---|
fix_id | int(20) | no | MUL | NULL | |
市场 | varchar(20) | no | MUL | NULL | |
标签 | varchar(20) | no | NULL | ||
赔率 | 十进制(4,2) | no | NULL | ||
更新 | int(20) | no | MUL | NULL | |
ev_tstamp | int(20) | no | NULL | ||
TYPE | varchar(20) | no | MUL | NULL | |
market_id | int(20) | no | NULL |
目前查询需要 15 秒才能在大约 75 000 行上执行,但大部分时间我都会出现超时错误.我需要在有 500 000 行的表上使用它,我想知道这是否可能.P.S 我使用的是 MySQL 5.7.29 并且我在共享服务器上,所以我无法将其升级到 8
At the moment the query takes 15s to perform on around 75 000 rows but most of the time I get time out error. I need to use it on table with 500 000 rows and I wonder if that's possible. P.S I'm using MySQL 5.7.29 and I'm on the share server so I'm not able to upgrade it to 8
推荐答案
使用 NOT EXISTS
过滤表和相关子查询以获得最早odds
代码>更新代码>:
Use NOT EXISTS
to filter the table and a correlated subquery to get the odds
for the earliest update
:
SELECT a.*,
a.odds - (SELECT t.odds FROM average_odds t WHERE t.fix_id = a.fix_id ORDER BY t.updated LIMIT 1) diff
FROM average_odds a
WHERE type = 'avg'
AND NOT EXISTS (SELECT 1 FROM average_odds t WHERE t.fix_id = a.fix_id AND t.updated > a.updated)
请参阅演示.结果:
fix_id | 赔率 | 市场 | 更新 | 输入 | 差异 |
---|---|---|---|---|---|
120 | 1.80 | home | 160 | 平均 | -0.20 |
121 | 1.60 | 离开 | 160 | 平均 | 0.50 |
这篇关于慢查询:根据每个组的另一列上的 min 和 max 查找值之间的差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!