Oracle SQL层次查询:展平层次结构并执行聚合 [英] Oracle SQL Hierarchical Query: Flatten Hierarchy and Perform Aggregation

查看:103
本文介绍了Oracle SQL层次查询:展平层次结构并执行聚合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在努力提高性能,以证明我已经写过并且没有运气的概念证明.我认为该方法可能有缺陷,但是我正在努力寻找其他解决方案.我已经介绍了我可以找到的所有Ask Tom文章和论坛帖子.

我们正在运行Oracle 10g R2.

我们有按层次结构排列的项目.数量在关系上定义.层次结构中有两种类型的对象:逻辑分组的程序集和代表实际项目的项目.因此,如果我们要代表一个完整的工具集,我们将有一个代表整个工具集的根,以及一个代表实际工具的叶子.所以:

工具套件->螺丝刀->平头螺丝刀->小型平头螺丝刀

程序集以及项目都可以在层次结构中重复使用.

我需要展平层次结构,以便项目的每个实例都有一行和数量.任何一种关系的数量都可以> =1.要获得一个项目的数量,我们需要从根到叶的所有关系中获得数量的乘积.

我的解决方案有效,但扩展性不佳.根据实际数据运行,大约需要8分钟才能产生6000+行,而我们的层次结构将产生50k +行.理想情况下,此过程将在10秒或更短的时间内完成,但是我知道这很乐观;)

我的解决方案和简化的数据集如下.任何反馈将不胜感激!

 CREATE TABLE ITEMHIER
(
  PARENT          VARCHAR2(30 BYTE),
  CHILD           VARCHAR2(30 BYTE),
  QUANTITY        NUMBER(15,2),
  ISLEAF          NUMBER
);

INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM001',2,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM002',1,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM003',5,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY006','ITEM002',10,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY006','ITEM004',3,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY007','ITEM005',12,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY007','ITEM006',1,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY008','ITEM006',2,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY008','ITEM005',5,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY002','ASSY005',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY002','ASSY007',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY003','ASSY006',3,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY003','ASSY008',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY007',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY005',3,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY006',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY002',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY003',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY004',1,0);

COMMIT;
/

CREATE OR REPLACE FUNCTION GETQTY(P_NAVPATH   IN VARCHAR2,
                                  P_STARTWITH IN VARCHAR2) RETURN INTEGER AS

R_QTY  INTEGER;

BEGIN

    SELECT EXP(SUM(LN(QUANTITY)))
    INTO R_QTY
    FROM (
           SELECT QUANTITY, SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH
           FROM ITEMHIER
           START WITH PARENT = P_STARTWITH
           CONNECT BY PRIOR  CHILD = PARENT
         )
    WHERE INSTR(P_NAVPATH, NAV_PATH) = 1; 

    RETURN R_QTY;
END;
/

SELECT 'ASSY001' || SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH,
      GETQTY(SYS_CONNECT_BY_PATH(CHILD,'/'), 'ASSY001') QTY,
      CHILD
FROM ITEMHIER
WHERE ISLEAF = 1
START WITH PARENT = 'ASSY001'
CONNECT BY PRIOR CHILD = PARENT;
 

----编辑

使用WITH子句,我可以将处理时间减少大约1/2,这是一个很大的收获!还有其他想法吗?

with
h as (
    select sys_connect_by_path(child,'/') navpath,
          child,
          quantity qty,
          isleaf
    from itemhier
    start with parent = 'ASSY001'
    connect by prior child = parent
)
select h1.navpath,
       h1.child,
       (SELECT exp(sum(ln(h2.qty)))
        FROM h h2
        WHERE instr(h1.navpath, h2.navpath) = 1) qty
from h h1
where isleaf = 1

编辑2

jonearles建议使用sys_connect_by_path来构建算术表达式,然后使用PL/SQL对其进行评估似乎是可行的方法.在最大的数据集上运行,我能够在55秒内产生77k行的输出.

我也尝试使用并行性,但是正如他指出的那样,几乎没有性能提升.

解决方案

您可以使用SYS_CONNECT_BY_PATH生成表达式,即分支中所有数量的乘积.然后使用函数动态执行该表达式以获取最终数量.

这不是理想的解决方案. SQL和PL/SQL之间的上下文切换将花费一些时间.您将需要担心SQL注入.但是至少您可以避免两次查询同一张表.

(如Dan A.和podiluska所建议的那样,递归CTE很可能是最好的解决方案.以我的经验,即使两种语法在做同样的事情并且使用相似的访问路径,递归CTE也会非常有效.比connect by快.但是您需要等到升级到11gR2.)

CREATE OR REPLACE FUNCTION EVALUATE_EXPRESSION(P_EXPRESSION IN VARCHAR2) RETURN NUMBER AS
    R_QTY  INTEGER;
BEGIN
    EXECUTE IMMEDIATE 'SELECT '||P_EXPRESSION||' FROM DUAL' INTO R_QTY;
    RETURN R_QTY;
END;
/


SELECT 'ASSY001' || SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH,
      GETQTY(SYS_CONNECT_BY_PATH(CHILD,'/'), 'ASSY001') QTY,
      SUBSTR(SYS_CONNECT_BY_PATH(QUANTITY,'*'), 2) QTY_EXPRESSION,
      EVALUATE_EXPRESSION(SUBSTR(SYS_CONNECT_BY_PATH(QUANTITY,'*'), 2)) QTY2,
      CHILD
FROM ITEMHIER
WHERE ISLEAF = 1
START WITH PARENT = 'ASSY001'
CONNECT BY PRIOR CHILD = PARENT;

此外,您提到该表具有索引.但是查询是否使用索引?你可以发布解释计划吗?

最后,使用这种缓慢的查询,您可能需要研究并行性.不幸的是,使用并行性和connect by我从来没有碰过运气.

I am trying to improve performance for a proof of concept I have already written and am having no luck. I think the approach is probably flawed, but I’m struggling to find another solution. I’ve covered all the Ask Tom articles and forum posts I can find.

We’re running Oracle 10g R2.

We have items arranged in a hierarchical structure. Quantities are defined on the relationships. There are two types of objects in the hierarchy: assemblies that are logical groupings, and items that represent an actual item. So if we were representing a full tool set we would have a root representing the whole tool set, and a leaf that represent an actual tool. So:

tool set -> screw drivers -> flat head screw drivers -> small flat head screw driver

The assemblies can be reused in the hierarchy, as can the items.

I need to flatten the hierarchy so each instance of an item has a row, and the quantity. Any of the relationships can have a quantity >= 1. To get the quantity of an item we need to get the product of the quantities from all the relationships from the root to the leaf.

My solution works, but it does not scale well. Running against actual data it is taking about 8 minutes to produce 6000+ rows, and we have hierarchies that would produce 50k+ rows. Ideally this would be completed in 10 seconds or less, but I know that’s… optimistic ;)

My solution and simplified dataset is below. Any feedback will be much appreciated!

CREATE TABLE ITEMHIER
(
  PARENT          VARCHAR2(30 BYTE),
  CHILD           VARCHAR2(30 BYTE),
  QUANTITY        NUMBER(15,2),
  ISLEAF          NUMBER
);

INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM001',2,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM002',1,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM003',5,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY006','ITEM002',10,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY006','ITEM004',3,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY007','ITEM005',12,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY007','ITEM006',1,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY008','ITEM006',2,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY008','ITEM005',5,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY002','ASSY005',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY002','ASSY007',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY003','ASSY006',3,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY003','ASSY008',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY007',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY005',3,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY006',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY002',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY003',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY004',1,0);

COMMIT;
/

CREATE OR REPLACE FUNCTION GETQTY(P_NAVPATH   IN VARCHAR2,
                                  P_STARTWITH IN VARCHAR2) RETURN INTEGER AS

R_QTY  INTEGER;

BEGIN

    SELECT EXP(SUM(LN(QUANTITY)))
    INTO R_QTY
    FROM (
           SELECT QUANTITY, SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH
           FROM ITEMHIER
           START WITH PARENT = P_STARTWITH
           CONNECT BY PRIOR  CHILD = PARENT
         )
    WHERE INSTR(P_NAVPATH, NAV_PATH) = 1; 

    RETURN R_QTY;
END;
/

SELECT 'ASSY001' || SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH,
      GETQTY(SYS_CONNECT_BY_PATH(CHILD,'/'), 'ASSY001') QTY,
      CHILD
FROM ITEMHIER
WHERE ISLEAF = 1
START WITH PARENT = 'ASSY001'
CONNECT BY PRIOR CHILD = PARENT;

----EDIT

Using the WITH clause I was able to cut processing time in about 1/2, which is a great gain! Any other ideas?

with
h as (
    select sys_connect_by_path(child,'/') navpath,
          child,
          quantity qty,
          isleaf
    from itemhier
    start with parent = 'ASSY001'
    connect by prior child = parent
)
select h1.navpath,
       h1.child,
       (SELECT exp(sum(ln(h2.qty)))
        FROM h h2
        WHERE instr(h1.navpath, h2.navpath) = 1) qty
from h h1
where isleaf = 1

EDIT 2

jonearles suggestion to use the sys_connect_by_path to build an arithmetic expression, then use PL/SQL to evaluate it appears to be the way to go. Running against my largest dataset I was able to produce 77k rows of output in 55 seconds.

I also attempted to use parallelism, but as he noted there was little to no performance gain to be had.

解决方案

You can use SYS_CONNECT_BY_PATH to generate an expression, the product of all quantities in a branch. Then use a function to dynamically execute that expression to get the final quantity.

It's not an ideal solution. The context switches between SQL and PL/SQL will take some time. And you'll need to worry about SQL injection. But at least you can avoid querying the same table twice.

(A recursive CTE, as Dan A. and podiluska suggested, would very likely be the best solution. In my experience, even when the two syntaxes are doing the same thing and using similar access paths, the recursive CTE can be significantly faster than connect by. But you'll need to wait until an upgrade to 11gR2.)

CREATE OR REPLACE FUNCTION EVALUATE_EXPRESSION(P_EXPRESSION IN VARCHAR2) RETURN NUMBER AS
    R_QTY  INTEGER;
BEGIN
    EXECUTE IMMEDIATE 'SELECT '||P_EXPRESSION||' FROM DUAL' INTO R_QTY;
    RETURN R_QTY;
END;
/


SELECT 'ASSY001' || SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH,
      GETQTY(SYS_CONNECT_BY_PATH(CHILD,'/'), 'ASSY001') QTY,
      SUBSTR(SYS_CONNECT_BY_PATH(QUANTITY,'*'), 2) QTY_EXPRESSION,
      EVALUATE_EXPRESSION(SUBSTR(SYS_CONNECT_BY_PATH(QUANTITY,'*'), 2)) QTY2,
      CHILD
FROM ITEMHIER
WHERE ISLEAF = 1
START WITH PARENT = 'ASSY001'
CONNECT BY PRIOR CHILD = PARENT;

Also, you mentioned that the table has indexes. But are the queries using the indexes? Can you post the explain plan?

Finally, with a query this slow you may need to look into parallelism. Unfortunately, I've never had much luck using parallelism and connect by.

这篇关于Oracle SQL层次查询:展平层次结构并执行聚合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆