计算其父母拥有的根的百分比 [英] Calculate the percentage of the root owned by its parents

查看:87
本文介绍了计算其父母拥有的根的百分比的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

简而言之,我正在尝试计算其父本拥有的一棵树的根的百分比,并进一步计算该树的根.我该如何单独在SQL中执行此操作?

这是我的(样本)模式.请注意,尽管层次结构本身非常简单,但是还有一个附加的holding_id,这意味着单亲可以拥有"其孩子的不同部分.

create table hierarchy_test ( 
       id number -- "root" ID
     , parent_id number -- Parent of ID
     , holding_id number -- The ID can be split into multiple parts
     , percent_owned number (3, 2)
     , primary key (id, parent_id, holding_id) 
        );

以及一些示例数据:

insert all 
 into hierarchy_test values (1, 2, 1, 1) 
 into hierarchy_test values (2, 3, 1, 0.25)
 into hierarchy_test values (2, 4, 1, 0.25)
 into hierarchy_test values (2, 5, 1, 0.1)
 into hierarchy_test values (2, 4, 2, 0.4)
 into hierarchy_test values (4, 5, 1, 1)
 into hierarchy_test values (5, 6, 1, 0.3)
 into hierarchy_test values (5, 7, 1, 0.2)
 into hierarchy_test values (5, 8, 1, 0.5)
select * from dual;

SQL小提琴

以下查询返回我要进行的计算.据SYS_CONNECT_BY_PATH的性质,据我所知,它本身无法执行计算.

 select a.*, level as lvl
      , '1' || sys_connect_by_path(percent_owned, ' * ') as calc
   from hierarchy_test a
  start with id = 1
connect by nocycle prior parent_id = id

数据中存在周期性关系,只是在此示例中没有.

此刻,我将使用一个非常简单的函数将calc列中的字符串转换为数字

create or replace function some_sum ( P_Sum in varchar2 ) return number is
   l_result number;
begin  
   execute immediate 'select ' || P_Sum || ' from dual'
     into l_result;

   return l_result;   
end;
/

这似乎是一种荒谬的处理方式,我宁愿避免解析动态SQL 1 会花费额外的时间.

从理论上讲,我认为我应该能够使用MODEL子句进行计算.我的问题是由树的非唯一性引起的.我使用MODEL子句执行此操作的尝试之一是:

select *
  from ( select a.*, level as lvl
              , '1' || sys_connect_by_path(percent_owned, ' * ') as calc
           from hierarchy_test a
          start with id = 1
        connect by nocycle prior parent_id = id
                 )
 model
 dimension by (lvl ll, id ii)
 measures (percent_owned, parent_id )
 rules upsert all ( 
   percent_owned[any, any]
   order by ll, ii  = percent_owned[cv(ll), cv(ii)] * nvl( percent_owned[cv(ll) - 1, parent_id[cv(ll), cv(ii)]], 1)
               )

可以理解的是,此操作失败,并且出现以下情况:

ORA-32638:模型尺寸中的非唯一寻址

使用唯一唯一引用失败的原因类似,即ORDER BY子句不是唯一的.

tl; dr

是否有一种简单的方法可以仅使用SQL来计算其父本拥有的树的根的百分比?如果我在使用MODEL的道路上走对了,那我哪里错了?

1.我也想避免PL/SQL SQL上下文切换.我意识到这是一个很小的时间,但是要每天不增加几分钟就很难快速地做到这一点.

解决方案

这应该得到答案;尽管要注意,我们在某些特殊情况下仍在运行.

首先要提到的是,做到这一点的最佳方法是使用递归子查询分解/评论中的递归CTE:

with temp (id, parent_id, percent_owned, calc) as (
  select a.id, a.parent_id, a.percent_owned, percent_owned as calc
    from hierarchy_test a
   where id = 1
   union all
  select a.id, a.parent_id, a.percent_owned, a.percent_owned * t.calc as calc
    from temp t
    join hierarchy_test a
      on t.parent_id = a.id
         )
select * 
  from temp

他们的SQL提琴..

不幸的是,查询的复杂性和我们正在处理的数据量是如此之大,以至于事实证明这是不可能的.如果没有每次都完全扫描一些过大的表,就无法做到这一点.

这不一定意味着我们回到了CONNECT BY.有机会批量计算层次结构.不幸的是,事实证明这也是不可能的.一个小时的数据库崩溃了.三次.我们用完了将近100GB的UNDO,而服务器却无法应付.

这些是特殊情况;我们最多只能在几个小时内计算成千上万的层次结构.平均一个深约1.5级,总共可能有5-10片叶子和8-12个节点.但是,离群值有90k节点,27个级别和多个循环关系.离群值离稀有度还远.

因此,CONNECT BY.在问题中建议针对PL/SQL EXECUTE IMMEDIATE Annjawn的解决方案进行基准测试表明,对于高于平均水平的树,XMLQuery()是最多慢4倍.太好了,有答案了;没有其他选择;留在那里.

不是.

由于我们要计算的节点层次结构如此之多,我们最终由于库高速缓存销锁而导致了漫长的等待,这是由于EXECUTE IMMEDIATE.

对此没有明显的反应,因此回头再回想一下Annjawn的解决方案,它可以快3倍! 库缓存销锁完全消失,我们又回到了狭窄的局面.

不是.

不幸的是,当结合使用CONNECT BYXMLQuery()和DBMS_SCHEDULER时,似乎在11.2中出现了一个Oracle错误.在某些情况下,通常在较大的层次结构中,它会泄漏大量内存.丢失了数据库,而服务器丢失了该服务器. Oracle提出了一份报告,我们正在12c中进行测试;尽管内存泄漏的表现较少,但仍会出现,因此12c已耗尽.

解决方案?将XMLQuery()包装在PL/SQL函数中.解决了内存泄漏问题,不幸的是,此功能引起了大量争用,我们开始获得数小时的库高速缓存:互斥锁x 等待.查询x$kglob确认是XMLTYPE被锤击.

Andrey Nikolaev建议更改系统;而不是在其他一切正常的情况下执行此操作,或者使用 DBMS_POOL.MARKHOT 过程告诉Oracle您将大量访问此对象.临时来看,这可能已经解决了问题,但是,大约10分钟之后,经历了Oracle似乎拥有的每一个锁,我们最终获得了5个争用CPU的进程.显然还不够(测试盒上有54GB和24个内核)...

然后我们开始获取光标针:s 等待. Burleson建议更多隐藏参数修饰,

create table hierarchy_test ( 
       id number -- "root" ID
     , parent_id number -- Parent of ID
     , holding_id number -- The ID can be split into multiple parts
     , percent_owned number (3, 2)
     , primary key (id, parent_id, holding_id) 
        );

And some sample data:

insert all 
 into hierarchy_test values (1, 2, 1, 1) 
 into hierarchy_test values (2, 3, 1, 0.25)
 into hierarchy_test values (2, 4, 1, 0.25)
 into hierarchy_test values (2, 5, 1, 0.1)
 into hierarchy_test values (2, 4, 2, 0.4)
 into hierarchy_test values (4, 5, 1, 1)
 into hierarchy_test values (5, 6, 1, 0.3)
 into hierarchy_test values (5, 7, 1, 0.2)
 into hierarchy_test values (5, 8, 1, 0.5)
select * from dual;

SQL Fiddle

The following query returns the calculation I would like to make. Due to the nature of SYS_CONNECT_BY_PATH it can't, to my knowledge, perform the calculation itself.

 select a.*, level as lvl
      , '1' || sys_connect_by_path(percent_owned, ' * ') as calc
   from hierarchy_test a
  start with id = 1
connect by nocycle prior parent_id = id

There are cyclical relationships in the data, just not in this example.

At the moment I'm going to use a pretty simple function to turn the string in the calc column into a number

create or replace function some_sum ( P_Sum in varchar2 ) return number is
   l_result number;
begin  
   execute immediate 'select ' || P_Sum || ' from dual'
     into l_result;

   return l_result;   
end;
/

This seems to be a ridiculous way of going about it and I would rather avoid the additional time that will be taken parsing the dynamic SQL1.

Theoretically, I think, I should be able to use the MODEL clause to calculate this. My problem is caused by the non-uniqueness of the tree. One of my attempts at using the MODEL clause to do this is:

select *
  from ( select a.*, level as lvl
              , '1' || sys_connect_by_path(percent_owned, ' * ') as calc
           from hierarchy_test a
          start with id = 1
        connect by nocycle prior parent_id = id
                 )
 model
 dimension by (lvl ll, id ii)
 measures (percent_owned, parent_id )
 rules upsert all ( 
   percent_owned[any, any]
   order by ll, ii  = percent_owned[cv(ll), cv(ii)] * nvl( percent_owned[cv(ll) - 1, parent_id[cv(ll), cv(ii)]], 1)
               )

This, understandably, fails with the following:

ORA-32638: Non unique addressing in MODEL dimensions

Using UNIQUE SINGLE REFERENCE fails for a similar reason, namely that the ORDER BY clause is not unique.

tl;dr

Is there a simple way to calculate the percentage of the root of a tree owned by its parents using only SQL? If I'm on the right track with MODEL where am I going wrong?

1. I'd also like to avoid the PL/SQL SQL context-switch. I realise that this is a tiny amount of time but this is going to be difficult enough to do quickly without adding an additional few minutes a day.

解决方案

This deserves an answer; though beware we're operating under a few special circumstances.

The first thing to mention is that the best possible way to do this is with recursive sub-query factoring/a recursive CTE as per Daniel Hilgarth and jonearles in the comments:

with temp (id, parent_id, percent_owned, calc) as (
  select a.id, a.parent_id, a.percent_owned, percent_owned as calc
    from hierarchy_test a
   where id = 1
   union all
  select a.id, a.parent_id, a.percent_owned, a.percent_owned * t.calc as calc
    from temp t
    join hierarchy_test a
      on t.parent_id = a.id
         )
select * 
  from temp

Their SQL Fiddle..

Unfortunately the complexity of the query and the size of the data we#re working with is such that this turned out to be impossible. There was no way to do it without full-scanning some overly large tables each time.

This doesn't necessarily mean that we're back to CONNECT BY. There is the opportunity to calculate hierarchies in bulk. Unfortunately, this turned out to be impossible as well; an hour in the database crashed. Thrice. We were using up almost 100GB of UNDO and the server just couldn't cope.

These are the special circumstances; we have to calculate hundreds of thousands of hierarchies in a few hours, at most. The average one is about 1.5 levels deep with maybe 5-10 leaves and 8-12 nodes in total. However, the outliers have 90k nodes, 27 levels and multiple cyclic relationships. The outliers aren't anywhere near rare enough.

So, CONNECT BY. Benchmarking Annjawn's solution against the PL/SQL EXECUTE IMMEDIATE suggested in the question indicated that for an above average tree XMLQuery() was up to 4 times slower. Excellent, had the answer; no other option; leave it at that.

Not.

Because we're calculating so many hierarchies with so many nodes we ended up getting overly long waits from library cache pin locks caused by the constant hard-parsing of hundreds of thousands of mathematical functions in EXECUTE IMMEDIATE.

No obvious response to that, so going back too Annjawn's solution it ends up 3 times quicker! The library cache pin locks completely disappear and we're back on the straight and narrow.

Not.

Unfortunately, there seems to be an Oracle bug in 11.2 that appears when you combine CONNECT BY, XMLQuery() and DBMS_SCHEDULER. In certain circumstances, normally in the larger hierarchies, it leaks huge amounts of memory. Lost the database and the server finding that one out. A report's been raised with Oracle and we're testing in 12c; though the memory leaks exhibit less, they still appear so 12c is out.

The solution? Wrap the XMLQuery() in a PL/SQL function. Memory leak solved, unfortunately this caused a large amount of contention for this function, and we started getting multi-hour Library cache: mutex x waits.. Querying x$kglob confirmed it was XMLTYPE that was getting hammered.

Andrey Nikolaev recommends either altering the system; rather not do that when everything else works fine, or using the DBMS_POOL.MARKHOT procedure to tell Oracle that you'll be accessing this object a lot. To the casual eye, this may have solved the issue, however, after about 10 minutes, and going through what seemed to be every lock that Oracle has, we ended up with 5 processes contending for CPU. Apparently there wasn't enough (54GB and 24 cores on the test box)...

We then started getting Cursor pin: s waits. Burleson recommends more hidden parameter finangling, Jonathan Lewis suggests that it's down to SGA resizing. As the DB was using automated SGA sizing we tried gradually increasing the shared pool, up to 30GB and only got back out old friend the Library cache: mutex x wait.

So, what's the solution? Who knows is the honest answer, but a Java stored procedure is working brilliantly so far, no memory leaks, no waits and significantly quicker than everything else.

I'm sure there's more out there... and I'd really like to get the MODEL clause to work if anyone has any ideas?

P.S. I can't claim credit for all of this; it's the work of about 3 people to get us to this stage...

这篇关于计算其父母拥有的根的百分比的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆