Is there a replacement for the 'max' function and grouping (performance optimization of aggregate operations)?
Problem description
I have a large query that also returns a very large result set. The query looks like this:
SELECT group, subgroup, max(last_update) FROM
(
SELECT a as group, a1 as subgroup, d1 as last_update FROM....
UNION ALL
SELECT b as group, b1 as subgroup, d2 as last_update FROM....
UNION ALL
SELECT c as group, c1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT d as group, d1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT e as group, e1 as subgroup, d4 as last_update FROM....
... and some more selects (15 select queries in total)
) GROUP BY group, subgroup;
As you can see, I need to load the maximum date for each group. The problem is that these dates have to be loaded from 15 selects, and the query is very slow (~4 s). I tested that the subselect
SELECT a as group, a1 as subgroup, d1 as last_update FROM....
UNION ALL
SELECT b as group, b1 as subgroup, d2 as last_update FROM....
UNION ALL
SELECT c as group, c1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT d as group, d1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT e as group, e1 as subgroup, d4 as last_update FROM....
... and some more selects
runs fast (~0.1 s), so the problem is the grouping step (that is why the whole query is slow):
SELECT group, subgroup, max(last_update) FROM
(
...
) GROUP BY group, subgroup;
Is there some way to improve this grouping? As I wrote, the goal is to get the maximum date for each subgroup within a group.
Recommended answer
I suggest taking a look at parallel queries:
create table ttt as
with t(a, b, c, d, a1, b1, c1, d1, last_updated) as (
select 1, 2, 3, 4, 1, 2, 3, 4, sysdate + 1 from dual union all
select 1, 2, 3, 4, 1, 2, 3, 4, sysdate from dual union all
select 2, 3, 4, 5, 2, 3, 4, 5, sysdate + 2 from dual union all
select 2, 3, 4, 5, 2, 3, 4, 5, sysdate + 1 from dual union all
select 3, 4, 5, 6, 3, 4, 5, 6, sysdate + 3 from dual union all
select 3, 4, 5, 6, 3, 4, 5, 6, sysdate + 2 from dual union all
select 4, 5, 6, 7, 4, 5, 6, 7, sysdate + 4 from dual union all
select 4, 5, 6, 7, 4, 5, 6, 7, sysdate + 3 from dual
)
select * from t;
select a grp, a1 subgrp, max(last_updated)
from ttt
group by a, a1;
[Explain plan for the serial query]
Let's add some parallelism:
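A plan like the one referred to here can be produced in several ways. As a minimal sketch, assuming an Oracle database and the ttt table created above, one common approach is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY:

```sql
-- Store the optimizer's plan for the serial query in PLAN_TABLE
-- (the statement itself is not executed).
explain plan for
select a grp, a1 subgrp, max(last_updated)
from ttt
group by a, a1;

-- Display the most recently explained plan, including the Cost column.
select * from table(dbms_xplan.display);
```

Running the same two statements again after changing the table's parallel setting makes it possible to compare the estimated cost of the two plans.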
alter table ttt parallel;
select a grp, a1 subgrp, max(last_updated)
from ttt
group by a, a1;
[Explain plan for the parallel query]
As you can see, the estimated cost went down. But this is not free: while it runs, a parallel query can take up a large share of the resources you have, so it may hurt the performance of other work on the same database. Since you said this query is not run very often, I think this is a good solution. To read more about parallel queries, take a look at this [link not preserved].
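If changing the table's default is too broad, parallelism can instead be requested for a single statement. The following is a sketch under assumptions, not part of the original answer: it uses Oracle's NOPARALLEL clause and PARALLEL hint, and the degree of 4 is an arbitrary illustrative value that would need tuning for a real system.

```sql
-- Revert the table-level setting made by "alter table ttt parallel"
-- so that other queries against ttt run serially by default.
alter table ttt noparallel;

-- Request parallel execution for this statement only.
-- The degree 4 is an illustrative assumption, not a recommendation.
select /*+ parallel(ttt, 4) */ a grp, a1 subgrp, max(last_updated)
from ttt
group by a, a1;
```

A hint is only a request, so the database may still run the statement with a lower degree or serially depending on its configuration and load. Checking the explain plan is the way to confirm what was chosen.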