Group records based on two columns with possible chains


Problem description


I'm working with PostgreSQL and I am anything but good with SQL. I have a long table of records and want to assign them to groups, where each member of a group shares a value in at least one of two columns with at least one other member of that group. Ideally, the result would be a table containing all distinct values from either column (the two columns never overlap, as they hold entirely different character values) along with the number of the group each value belongs to.

I expect some groups to consist of only one member, but there may be long chains of relations where two records are not directly connected and a third record connects them. In a programming language (e.g. JavaScript) I would probably use a recursive function, but with SQL I am lost.

I have tried searching for an answer, but I find it hard to come up with suitable keywords. It is kind of like an enormous game of dominoes, but still a bit different. Is there a simple solution to this problem? If not, can someone please point me in a good direction?

Edit: Some example data. userID and session are my two columns. In this case, Ids 2, 4, and 6 would be in the same group.

Id   Type      userID                session
1    callback  25596094              lJcD7fiFCnB4o4ZxI_DQHKMmBGW1T0b4
2    callback  26631605              xupFcU6C8cl7wdviOnc1XX37Feg234vK
3    callback  02-9128924-01         eNE8VuJBz9vffGeuALy72owq1cJhK84l
4    callback  26631605              GhenxfiVXQaGbYq2_SXJhhkvTRN8M3vb
5    callback  globetrotter-394146   PdJEDeW57piXMu6nNsJjLZeFmNrP2jvG
6    callback  31831125              xupFcU6C8cl7wdviOnc1XX37Feg234vK

Solution

This is just a partial solution that will retrieve all rows "related" to a first one, based on your logic.

In this case, if we start with row (id = 4), the query could be:

with recursive
x as (
  select * from my_table where id = 4 -- this is the starting row
  union -- not "union all": duplicates must be dropped, or the recursion never ends
  select t.*
  from my_table t
  join x on t.userID = x.userID or t.session = x.session
)
select * from x;

And the result will be:

Id   Type      userID                session
4    callback  26631605              GhenxfiVXQaGbYq2_SXJhhkvTRN8M3vb
2    callback  26631605              xupFcU6C8cl7wdviOnc1XX37Feg234vK
6    callback  31831125              xupFcU6C8cl7wdviOnc1XX37Feg234vK

The query needs more work to make it run for all rows, not just a subset like in this case.
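
For comparison, the whole table can also be grouped in one pass outside the database. The following is a minimal Python sketch, not part of the answer above, that swaps the recursive query for a union-find (disjoint-set) structure: each row links its userID value to its session value, and values that end up linked share a group. The names `group_values` and `sample_rows`, and the `(column, value)` keys, are made up for illustration; the rows are the example data from the question.

```python
def group_values(rows):
    """Map every (column, value) pair to a group number.

    rows is an iterable of (user_id, session) pairs. Values are tagged with
    their column name so a userID can never collide with a session.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root.
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[node] != root:
            parent[node], node = root, parent[node]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each record connects its userID value with its session value.
    for user_id, session in rows:
        union(("userID", user_id), ("session", session))

    # Number the groups in order of first appearance, starting at 1.
    number_of_root = {}
    groups = {}
    for node in list(parent):
        root = find(node)
        if root not in number_of_root:
            number_of_root[root] = len(number_of_root) + 1
        groups[node] = number_of_root[root]
    return groups


# Example data from the question, as (userID, session) pairs for Ids 1 to 6.
sample_rows = [
    ("25596094", "lJcD7fiFCnB4o4ZxI_DQHKMmBGW1T0b4"),
    ("26631605", "xupFcU6C8cl7wdviOnc1XX37Feg234vK"),
    ("02-9128924-01", "eNE8VuJBz9vffGeuALy72owq1cJhK84l"),
    ("26631605", "GhenxfiVXQaGbYq2_SXJhhkvTRN8M3vb"),
    ("globetrotter-394146", "PdJEDeW57piXMu6nNsJjLZeFmNrP2jvG"),
    ("31831125", "xupFcU6C8cl7wdviOnc1XX37Feg234vK"),
]

groups = group_values(sample_rows)
for (column, value), number in groups.items():
    print(number, column, value)
```

The resulting dictionary has the shape the question asks for: one entry per distinct value from either column, with the number of the group it belongs to. Whether this is preferable to doing the work in SQL depends on how large the table is and whether it can be pulled out of the database at all.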

UPDATE Dec 7, 2018:

I wrote a SQL update that will find one group and assign a new (different) group_id value to it. If you run this SQL update multiple times you'll eventually assign group ids to all rows. Here it is:

alter table my_table add group_id int; -- extra column stores the group_id

create sequence group_id_seq; -- will generate a different group_id each time

with recursive
s as (
  select nextval('group_id_seq') as nv
),
x as (
  select * from (
    select * from my_table where group_id is null fetch first 1 rows only
  ) x
  union
  select t.*
  from my_table t
  join x on t.userid = x.userid or t.session = x.session
)
update my_table t set group_id = s.nv from s, x where t.id = x.id;

Again, each time you run it, it will mark a new [unmarked] set of rows with a new group id value.
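
The "run it until every row is labeled" step can be automated with a small driver loop. The sketch below is an illustration under several assumptions rather than the answer's exact statement: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, replaces the `group_id_seq` sequence with a Python counter (`next_group_id`), rewrites the `update ... from` as an `in (...)` filter, and adds an `order by id` to the seed row so the numbering is repeatable. Against PostgreSQL, the same loop would execute the answer's own update statement instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table ("
    "id integer primary key, userid text, session text, group_id integer)"
)
# Example data from the question; group_id starts out as null.
conn.executemany(
    "insert into my_table (id, userid, session) values (?, ?, ?)",
    [
        (1, "25596094", "lJcD7fiFCnB4o4ZxI_DQHKMmBGW1T0b4"),
        (2, "26631605", "xupFcU6C8cl7wdviOnc1XX37Feg234vK"),
        (3, "02-9128924-01", "eNE8VuJBz9vffGeuALy72owq1cJhK84l"),
        (4, "26631605", "GhenxfiVXQaGbYq2_SXJhhkvTRN8M3vb"),
        (5, "globetrotter-394146", "PdJEDeW57piXMu6nNsJjLZeFmNrP2jvG"),
        (6, "31831125", "xupFcU6C8cl7wdviOnc1XX37Feg234vK"),
    ],
)

# Label the group reachable from one still-unlabeled seed row.
label_one_group = """
with recursive x as (
  select * from (
    select * from my_table where group_id is null order by id limit 1
  )
  union
  select t.*
  from my_table t
  join x on t.userid = x.userid or t.session = x.session
)
update my_table set group_id = ? where id in (select id from x)
"""

count_unlabeled = "select count(*) from my_table where group_id is null"

next_group_id = 0  # stands in for the PostgreSQL sequence
while conn.execute(count_unlabeled).fetchone()[0] > 0:
    next_group_id += 1
    conn.execute(label_one_group, (next_group_id,))

# Collect the final id -> group_id assignment.
labels = dict(conn.execute("select id, group_id from my_table order by id"))
print(labels)
```

Each pass labels exactly one connected group, so the loop runs once per group. For a table with very many small groups that means many round trips, which is worth keeping in mind before using this approach on a long table.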

I hope it helps.
