How do I turn a list of interconnected pairs of ids into a cluster of ids?
Problem description
I have a table with pairs (and sometimes triples) of ids, which act as links in a chain:
+------+-----+
| from | to  |
+------+-----+
| id1  | id2 |
| id2  | id3 |
| id4  | id5 |
+------+-----+
I want to create a new table where all the links are clustered into chains/families:
+-----+----------+
| id  | familyid |
+-----+----------+
| id1 | 1        |
| id2 | 1        |
| id3 | 1        |
| id4 | 2        |
| id5 | 2        |
+-----+----------+
That is, combine all the links in a chain into a single family and give it an id. In the example above, the first two rows of the first table form one family, and the last row forms another.
Solution
I will use node.js to query big batches of rows (a few thousand per batch), process them, and insert them into my own table with a family id.
Problem
The problem is that I have a few tens of thousands of id pairs, and after the initial creation of the families table I will also need to add new ids over time and attach them to existing families.
Are there good algorithms for clustering pairs of data into families/clusters that fit these constraints?
Recommended answer
This looks a lot like clustering over a graph dataset, where 'familyid' is the cluster center number.
Here is the algorithm description. You will need to implement it under the conditions you described.
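The algorithm description the answer points to is not reproduced here. As a stand-in, one common way to group linked ids is to treat each id as a graph node, each row as an edge, and compute connected components with a disjoint-set (union-find) structure. The sketch below is only an illustration under that assumption: the class name `FamilyIndex` and its methods `find`, `link` and `rows` are hypothetical, and the family numbers it produces are simply assigned in first-seen order.

```javascript
// Minimal disjoint-set (union-find) sketch. All names here are illustrative
// assumptions, not taken from the original answer.
class FamilyIndex {
  constructor() {
    this.parent = new Map(); // id -> parent id (a root points to itself)
    this.size = new Map();   // root id -> number of ids in its family
  }

  // Return the root of id's family, registering id if it has not been seen.
  find(id) {
    if (!this.parent.has(id)) {
      this.parent.set(id, id);
      this.size.set(id, 1);
      return id;
    }
    let root = id;
    while (this.parent.get(root) !== root) {
      root = this.parent.get(root);
    }
    // Path compression: point every visited node directly at the root.
    let node = id;
    while (node !== root) {
      const next = this.parent.get(node);
      this.parent.set(node, root);
      node = next;
    }
    return root;
  }

  // Merge the families of all ids in one row (a pair or a triple).
  link(...ids) {
    if (ids.length === 0) return undefined;
    let root = this.find(ids[0]);
    for (const id of ids.slice(1)) {
      let other = this.find(id);
      if (other === root) continue;
      // Union by size: attach the smaller tree under the larger one.
      if (this.size.get(root) < this.size.get(other)) {
        [root, other] = [other, root];
      }
      this.parent.set(other, root);
      this.size.set(root, this.size.get(root) + this.size.get(other));
    }
    return root;
  }

  // Produce { id, familyid } rows, numbering families in first-seen order.
  rows() {
    const familyIds = new Map();
    const out = [];
    for (const id of this.parent.keys()) {
      const root = this.find(id);
      if (!familyIds.has(root)) familyIds.set(root, familyIds.size + 1);
      out.push({ id, familyid: familyIds.get(root) });
    }
    return out;
  }
}

// Usage with the rows from the question's first table.
const index = new FamilyIndex();
for (const [from, to] of [["id1", "id2"], ["id2", "id3"], ["id4", "id5"]]) {
  index.link(from, to);
}
console.log(index.rows());
```

New rows that arrive later can be passed to the same `link` call: an id linked to a known id joins that family, and a row that connects two existing families merges them. In this sketch the numbers returned by `rows()` are recomputed on each call, so after a merge a persisted families table would need the affected rows updated.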