标记与 Oracle SQL 共享共同特性的个人 [英] Flag individuals that share common features with Oracle SQL

查看:41
本文介绍了标记与 Oracle SQL 共享共同特性的个人的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑下表:

ID  Feature
1   1
1   2
1   3
2   3
2   4
2   6
3   5
3   10
3   12
4   12
4   18
5   10
5   30

我想根据重叠特征对个体进行分组.如果其中两个组再次具有重叠特征,我会将它们视为一组.这个过程应该重复,直到组之间没有重叠的特征.上表中此过程的结果将是:

I would like to group the individuals based on overlapping features. If two of these groups again have overlapping features, I would consider both as one group. This process should be repeated until there are no overlapping features between groups. The result of this procedure on the table above would be:

ID  Feature Flag
1   1       A
1   2       A
1   3       A
2   3       A
2   4       A
2   6       A
3   5       B
3   10      B
3   12      B
4   12      B
4   18      B
5   10      B
5   30      B

所以实际上我试图解决的问题是在图中找到连通分量.这里 [1,2,3] 是 ID 为 1 的图(参见 https://en.wikipedia.org/wiki/Connectivity_(graph_theory)).这个问题相当于这个问题,但是我想用 Oracle SQL 解决.

So actually the problem I am trying to solve is finding connected components in a graph. Here [1,2,3] is the graph with ID 1 (see https://en.wikipedia.org/wiki/Connectivity_(graph_theory)). The problem is equivalent to this problem, however I would like to solve it with Oracle SQL.

推荐答案

这是一种实现此目的的方法,使用分层(连接方式")查询.第一步是从基础数据中提取初始关系;分层查询建立在第一步的结果上.我在输入中再添加一行来说明一个节点,它本身就是一个连接组件.

Here is one way to do this, using a hierarchical ("connect by") query. The first step is to extract the initial relationships from the base data; the hierarchical query is built on the result from this first step. I added one more row to the inputs to illustrate a node that is a connected component by itself.

您将连接的组件标记为 A 和 B - 当然,如果您有 30,000 个连接的组件,这将不起作用.在我的解决方案中,我使用最小节点名称作为每个连接组件的标记.

You marked the connected components as A and B - of course, that won't work if you have, say, 30,000 connected components. In my solution, I use the minimum node name as the marker for each connected component.

with
  sample_data (id, feature) as (
    select 1,  1 from dual union all
    select 1,  2 from dual union all
    select 1,  3 from dual union all
    select 2,  3 from dual union all
    select 2,  4 from dual union all
    select 2,  6 from dual union all
    select 3,  5 from dual union all
    select 3, 10 from dual union all
    select 3, 12 from dual union all
    select 4, 12 from dual union all
    select 4, 18 from dual union all
    select 5, 10 from dual union all
    select 5, 30 from dual union all
    select 6, 40 from dual
  )
-- select * from sample_data; /*
, initial_rel(id_base, id_linked) as (
    select distinct s1.id, s2.id
      from sample_data s1 join sample_data s2
                          on s1.feature = s2.feature and s1.id <= s2.id
  )
-- select * from initial_rel; /*
select     id_linked as id, min(connect_by_root(id_base)) as id_group
from       initial_rel
start with id_base <= id_linked
connect by nocycle prior id_linked = id_base and id_base < id_linked
group by   id_linked
order by   id_group, id
;

输出:

     ID   ID_GROUP
------- ----------
      1          1
      2          1
      3          3
      4          3
      5          3
      6          6

然后,如果您需要将 ID_GROUP 作为 FLAG 添加到基础数据中,您可以通过简单的连接来实现.

Then, if you need to add the ID_GROUP as a FLAG to the base data, you can do so with a trivial join.

这篇关于标记与 Oracle SQL 共享共同特性的个人的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆