创建一个分层定义的数据集的展平表/视图 [英] Creating a flattened table/view of a hierarchically-defined set of data

查看:184
本文介绍了创建一个分层定义的数据集的展平表/视图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含分层数据的表。目前这个层次结构中有8个级别。



我非常喜欢数据结构的方式,但是当我需要知道8级的记录时,表现是令人沮丧的是一级记录的小孩。



我有PL / SQL存储的函数,为我做这些查找,每个都有一个 select *从tbl开始...通过...连接... 语句。当我查询一些记录时,这样工作正常,但是现在我需要一次查询〜10k个记录,并且每个记录都运行这个功能。它需要2-3分钟,我需要它在短短的几秒钟内运行。



根据我对当前数据的了解,使用一些启发式方法,我可以摆脱查找功能,只需执行 childrecord.key || '%'LIKE parentrecord.key 但是这是一个非常脏的黑客,并不总是可以工作。



所以现在我在想这个分层定义的表我需要一个单独的父子表,它将包含每个关系...对于从1-8级开始的层次结构将有8个!记录,1与2,1与3,...,1与8和2与3,2与4,...,2与8.等等。


$ b $我的想法是,我需要一个插入触发器,它将基本上运行查询的连接,并且对于每个匹配到层次结构,它将插入一个记录在查找表中。并且为了处理旧的数据,我将使用级联删除将外键设置到主表。



有更好的选择吗?我错过了另一种方式,可以更快地确定这些遥远的祖先/后裔吗?



编辑:这似乎正是我' m思考: http://evolt.org/working_with_hierarchical_data_in_sql_using_ancestor_tables

解决方案

所以你想要的是实现传递闭包。也就是说,给出这个应用程序表...

  ID | PARENT_ID 
------ + ----------
1 |
2 | 1
3 | 2
4 | 2
5 | 4

...图表将如下所示:

  PARENT_ID | CHILD_ID 
----------- + ----------
1 | 2
1 | 3
1 | 4
1 | 5
2 | 3
2 | 4
2 | 5
4 | 5

可以在Oracle中维护这样的表,尽管您需要滚动自己的框架为它。问题是是否值得开销。如果源表是易失性的,那么将图形数据保持新鲜可能会花费更多的周期,而不是保存在查询上。只有你知道你的数据资料。



我不认为你可以用CONNECT BY查询和级联外键来维护这样的图表。间接活动太多了,太难了。也是一个物化视图,因为当我们删除的源记录时,我们不能编写一个SQL查询,它将zap 1-> 5 ID = 4



所以我建议你阅读一个名为。这包含了很多理论和一些精采(Oracle)SQL,但它将为您建立维护一个图表表所需的PL / SQL的基础。







你可以扩展一下它
太难了维持
CONNECT BY / cascading FKs?如果我控制
访问表,所有
插入/更新/删除通过
存储过程进行,什么类型的
情况在那里会有
分解?


考虑记录 1-> ; 5 这是 1-> 2-> 4-> 5< / code>的短路。现在,如我所说,我们删除 ID = 4 的源记录会发生什么?级联外键可以删除 2-> 4 4-> 5 的条目。但是,尽管图表中的 1-> 5 (确实 2-> 5 ), em>不再代表图表中的有效边缘。



什么可以工作(我想,我还没有这样做)会使用在源表中增加了合成键,就像这样。

  ID | PARENT_ID | NEW_KEY 
------ + ----------- + ---------
1 | | AAA
2 | 1 | BBB
3 | 2 | CCC
4 | 2 | DDD
5 | 4 | EEE

现在图表如下所示:

  PARENT_ID | CHILD_ID | NEW_KEY 
----------- + ---------- + ---------
1 | 2 | BBB
1 | 3 | CCC
1 | 4 | DDD
1 | 5 | DDD
2 | 3 | CCC
2 | 4 | DDD
2 | 5 | DDD
4 | 5 |所以图表有一个引用源表中关系的外键,它产生它,而不是而不是链接到ID。然后删除 ID = 4 的记录将级联图表中所有记录的删除,其中 NEW_KEY = DDD



如果任何给定的ID只能有零个或一个父ID,这将工作。但是如果允许这种情况发生,则不行:

  ID | PARENT_ID 
------ + ----------
5 | 2
5 | 4

换句话说边缘 1-> 5 代表 1-> 2-> 4-> 5 1-> 2-> 5 。所以,可能的工作取决于数据的复杂性。


I have a table containing hierarchical data. There are currently ~8 levels in this hierarchy.

I really like the way the data is structured, but performance is dismal when I need to know if a record at level 8 is a child of a record at level 1.

I have PL/SQL stored functions which do these lookups for me, each having a select * from tbl start with ... connect by... statement. This works fine when I'm querying a handful of records, but I'm in a situation now where I need to query ~10k records at once and for each of them run this function. It's taking 2-3 minutes where I need it to run in just a few seconds.

Using some heuristics based on my knowledge of the current data, I can get rid of the lookup function and just do childrecord.key || '%' LIKE parentrecord.key but that's a really dirty hack and will not always work.

So now I'm thinking that for this hierarchically-defined table I need to have a separate parent-child table, which will contain every relationship...for a hierarchy going from level 1-8 there would be 8! records, associating 1 with 2, 1 with 3,...,1 with 8 and 2 with 3, 2 with 4,...,2 with 8. And so forth.

My thought is that I would need to have an insert trigger where it will basically run the connect by query and for every match going up the hierarchy it will insert a record in the lookup table. And to deal with old data I'll just set up foreign keys to the main table with cascading deletes.

Are there better options than this? Am I missing another way that I could determine these distant ancestor/descendant relationships more quickly?

EDIT: This appears to be exactly what I'm thinking about: http://evolt.org/working_with_hierarchical_data_in_sql_using_ancestor_tables

解决方案

So what you want is to materialize the transitive closures. That is, given this application table ...

 ID   | PARENT_ID
------+----------
    1 | 
    2 |         1
    3 |         2
    4 |         2
    5 |         4

... the graph table would look like this:

 PARENT_ID | CHILD_ID
-----------+----------
         1 |        2
         1 |        3
         1 |        4
         1 |        5
         2 |        3
         2 |        4
         2 |        5
         4 |        5

It is possible to maintain a table like this in Oracle, although you will need to roll your own framework for it. The question is whether it is worth the overhead. If the source table is volatile then keeping the graph data fresh may cost more cycles than you will save on the queries. Only you know your data's profile.

I don't think you can maintain such a graph table with CONNECT BY queries and cascading foreign keys. Too much indirect activity, too hard to get right. Also a materialized view is out, because we cannot write a SQL query which will zap the 1->5 record when we delete the source record for ID=4.

So what I suggest you read a paper called Maintaining Transitive Closure of Graphs in SQL by Dong, Libkin, Su and Wong. This contains a lot of theory and some gnarly (Oracle) SQL but it will give you the grounding to build the PL/SQL you need to maintain a graph table.


"can you expand on the part about it being too difficult to maintain with CONNECT BY/cascading FKs? If I control access to the table and all inserts/updates/deletes take place via stored procedures, what kinds of scenarios are there where this would break down?"

Consider the record 1->5 which is a short-circuit of 1->2->4->5. Now what happens if, as I said before, we delete the the source record for ID=4? Cascading foreign keys could delete the entries for 2->4 and 4->5. But that leaves 1->5 (and indeed 2->5) in the graph table although they no longer represent a valid edge in the graph.

What might work (I think, I haven't done it) would be to use an additional synthetic key in the source table, like this.

 ID   | PARENT_ID | NEW_KEY
------+-----------+---------
    1 |           | AAA
    2 |         1 | BBB
    3 |         2 | CCC
    4 |         2 | DDD
    5 |         4 | EEE

Now the graph table would look like this:

 PARENT_ID | CHILD_ID | NEW_KEY
-----------+----------+---------
         1 |        2 | BBB
         1 |        3 | CCC
         1 |        4 | DDD
         1 |        5 | DDD
         2 |        3 | CCC
         2 |        4 | DDD
         2 |        5 | DDD
         4 |        5 | DDD

So the graph table has a foreign key referencing the relationship in the source table which generated it, rather than linking to the ID. Then deleting the record for ID=4 would cascade deletes of all records in the graph table where NEW_KEY=DDD.

This would work if any given ID can only have zero or one parent IDs. But it won't work if it is permissible for this to happen:

 ID   | PARENT_ID
------+----------
    5 |         2
    5 |         4

In other words the edge 1->5 represents both 1->2->4->5 and 1->2->5. So, what might work depends on the complexity of your data.

这篇关于创建一个分层定义的数据集的展平表/视图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆