Is there some replacement for the 'max' function and grouping (performance optimization of aggregate operations)?

Question

I have a big query that also returns a very large response. The query looks like this:

SELECT group, subgroup, max(last_update) FROM
(
    SELECT a as group, a1 as subgroup, d1 as last_update FROM....
    UNION ALL
    SELECT b as group, b1 as subgroup, d2 as last_update FROM....
    UNION ALL
    SELECT c as group, c1 as subgroup, d3 as last_update FROM....
    UNION ALL
    SELECT d as group, d1 as subgroup, d3 as last_update FROM....
    UNION ALL
    SELECT e as group, e1 as subgroup, d4 as last_update FROM....
    ... and some more selects (15 select queries in total)
) GROUP BY group, subgroup;

As you can see, I need to load the maximum date for each group. The problem is that those dates need to be loaded from 15 selects, and the query is very slow (~4 s). I tested the subselect on its own:

SELECT a as group, a1 as subgroup, d1 as last_update FROM....
UNION ALL
SELECT b as group, b1 as subgroup, d2 as last_update FROM....
UNION ALL
SELECT c as group, c1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT d as group, d1 as subgroup, d3 as last_update FROM....
UNION ALL
SELECT e as group, e1 as subgroup, d4 as last_update FROM....
... and some more selects

It runs fast (~0.1 s) on its own, so the problem is with the grouping (that's why the whole query is slow):

SELECT group, subgroup, max(last_update) FROM
(
    ...
) GROUP BY group, subgroup;

Is there some way to improve this grouping? As I wrote, the goal is to get the maximum date for each subgroup in each group.

Recommended answer

I suggest you take a look at parallel queries:

create table ttt as
with t(a, b, c, d, a1, b1, c1, d1, last_updated) as (
  select 1, 2, 3, 4, 1, 2, 3, 4, sysdate + 1 from dual union all
  select 1, 2, 3, 4, 1, 2, 3, 4, sysdate from dual union all
  select 2, 3, 4, 5, 2, 3, 4, 5, sysdate + 2 from dual union all
  select 2, 3, 4, 5, 2, 3, 4, 5, sysdate + 1 from dual union all
  select 3, 4, 5, 6, 3, 4, 5, 6, sysdate + 3 from dual union all
  select 3, 4, 5, 6, 3, 4, 5, 6, sysdate + 2 from dual union all
  select 4, 5, 6, 7, 4, 5, 6, 7, sysdate + 4 from dual union all
  select 4, 5, 6, 7, 4, 5, 6, 7, sysdate + 3 from dual 
)
select * from t;

select a grp, a1 subgrp, max(last_updated)
  from ttt
 group by a, a1

Explain plan (output not reproduced here)
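
To reproduce the plan for the query above, one option is Oracle's EXPLAIN PLAN statement together with DBMS_XPLAN.DISPLAY. This is a minimal sketch that assumes the ttt table created above and the default PLAN_TABLE; the cost figures it prints depend on your data and statistics.

```sql
-- Store the optimizer's plan for the aggregate query in PLAN_TABLE
explain plan for
select a grp, a1 subgrp, max(last_updated)
  from ttt
 group by a, a1;

-- Print the stored plan, including each operation and its estimated cost
select * from table(dbms_xplan.display);
```

Running the same two statements again after changing the table's parallel setting lets you compare the two plans side by side.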

Let's add some parallelism:

alter table ttt parallel;

select a grp, a1 subgrp, max(last_updated)
  from ttt
 group by a, a1

Explain plan (output not reproduced here)

As you can see, the cost went down. But it is not free: during parallel execution the query can use all the resources you have, so it could hurt overall performance. Since you said this query is not run very often, I think this is a good solution. To read more about parallel queries, take a look at this
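
If leaving the whole table marked as parallel is a concern, a statement-level hint with an explicit degree is one way to keep the resource use bounded. The sketch below is only an illustration: the degree of 4 is an assumed value, not a recommendation, and the right setting depends on your hardware and workload.

```sql
-- Request parallel execution for this statement only;
-- the degree of 4 is an illustrative assumption
select /*+ parallel(ttt, 4) */ a grp, a1 subgrp, max(last_updated)
  from ttt
 group by a, a1;

-- Revert the table-level setting made earlier by "alter table ttt parallel"
alter table ttt noparallel;
```

With the hint, other queries against ttt keep running serially, so only this aggregate pays for the extra parallel workers.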
