SQL组合多个标识符为重复的记录创建组ID [英] SQL combine multiple identifiers to create a group id for duplicate records

查看:114
本文介绍了SQL组合多个标识符为重复的记录创建组ID的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在努力解决Oracle中一个难以解决的问题.

I'm working on a problem in Oracle that I'm struggling to solve 'elegantly'.

我有一个带有三个不同标识符的数据提取:A,B,C

I have a data extract with three different identifiers: A, B, C

每个标识符可能出现在不止一行中,并且每一行可能具有这三个标识符中的一个或多个(即,该列已填充​​或为空).

Each identifier may appear in more than one row, and each row may have one or more of these three identifiers (i.e the column is populated or null).

我希望能够对所有具有A,B或C任意组合的记录进行分组,并为其分配相同的组ID.

I want to be able to group all records that have any combination of either A, B or C in common and assign them the same group id.

提取表显示最终的组应该是什么:

Extract table showing what the eventual groups should be:

Rownum | A    | B    | C    | End group
1        p      NULL   NULL   1
2        p      r      NULL   1
3        q      NULL   NULL   2
4        NULL   r      NULL   1
5        NULL   NULL   s      2
6        q      NULL   s      2

我最初的方法是为提取物中的每一行分配一个GUID并为三个标识符创建一个查找表:

My original approach was to assign a guid to each row in the extract and create a lookup table for the three identifiers:

GUID | IDENTIFIER | IDENTIFIER TYPE | GROUP | END GROUP
1      p            A                 1       1
2      p            A                 1       1
2      r            B                 2       1
3      q            A                 3       3
4      r            B                 2       1
5      s            C                 4       3
6      q            A                 3       3
6      s            C                 4       3

然后按标识符分组并分配一个组号.但是,在可能的情况下,需要将这些组合并以提供端组中显示的视图.

Then group by identifier and assign a group number. The groups, however, need to be combined where possible to provide the view shown in end group.

我可以想到的唯一解决方案是使用循环,而我宁愿避免使用循环.

The only solution I can think of for this problem is to use loops, which I'd rather avoid.

任何想法都将不胜感激.

Any ideas would be greatly appreciated.

Niall

推荐答案

这确实是一个有趣的问题.不过,我认为我们仍缺少组"的定义.由于在您的示例中,(p,null,null) (row1)(null,r,null) (row4)没有共用标识符,并且属于同一组,因此我将使用此定义进行分组:

This is truly an interesting problem. Still, I think we are missing a definition of a "group". Since in your example (p,null,null) (row1) and (null,r,null) (row4) share no common identifier and belong to the same group I'll go with this definition for grouping:

如果一行与该组的至少一行共享至少一个标识符,则该行属于该组.

A row belongs to a group if it shares at least one identifier with at least one row of this group.

这意味着我们可以链接"行.这自然会导致分层解决方案:

This means we can "chain" rows. This naturally leads to a hierarchical solution:

SQL> SELECT ID, a, b, c, MIN(grp) grp
  2    FROM (SELECT connect_by_root(id) ID,
  3                 connect_by_root(a) a,
  4                 connect_by_root(b) b,
  5                 connect_by_root(c) c,
  6                 ID grp
  7             FROM a
  8           CONNECT BY NOCYCLE(PRIOR a = a
  9                           OR PRIOR b = b
 10                           OR PRIOR c = c))
 11   GROUP BY ID, a, b, c
 12   ORDER BY ID;

        ID A          B          C                 GRP
---------- ---------- ---------- ---------- ----------
         1 p                                         1
         2 p          r                              1
         3 q                                         3
         4            r                              1
         5                       s                   3
         6 q                     s                   3

6 rows selected

您可以执行子查询来了解其构造:

You can execute the subquery to understand the construction:

SQL> SELECT connect_by_root(id) ID,
  2         connect_by_root(a) a,
  3         connect_by_root(b) b,
  4         connect_by_root(c) c,
  5         substr(sys_connect_by_path(ID, '->'), 3) path,
  6         ID grp
  7    FROM a
  8  CONNECT BY NOCYCLE(a = PRIOR a
  9                  OR b = PRIOR b
 10                  OR c = PRIOR c);

        ID A          B          C          PATH            GRP
---------- ---------- ---------- ---------- -------- ----------
         1 p                                1                 1
         1 p                                1->2              2
         1 p                                1->2->4           4
         2 p          r                     2                 2
         2 p          r                     2->1              1
         2 p          r                     2->4              4
         3 q                                3                 3
         3 q                                3->6              6
         3 q                                3->6->5           5
         4            r                     4                 4
         4            r                     4->2              2
         4            r                     4->2->1           1
         5                       s          5                 5
         5                       s          5->6              6
         5                       s          5->6->3           3
         6 q                     s          6                 6
         6 q                     s          6->3              3
         6 q                     s          6->5              5

18 rows selected

这篇关于SQL组合多个标识符为重复的记录创建组ID的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆