Merging sets which have even one element in common

Question

Possible duplicate:
  Python: simple list merging based on intersections (http://stackoverflow.com/questions/9110837/python-simple-list-merging-based-on-intersections)

I am trying to classify objects. Each object is identified by a unique identifier property called id. My classification logic goes like this: first I prepare a list of objects, and then the classification function takes two objects at a time and returns a frozenset containing their ids. So if object1 and object5 are in the same category, frozenset((id1, id5)) is returned. I keep adding these frozensets to a set, so in the end I have a set like this:

matched_set = {
             frozenset({id1, id2}),
             frozenset({id9, id3}),
             frozenset({id9, id2}),
             frozenset({id24, id22}),
             frozenset({id1, id23}),
             frozenset({id25, id24}),
             frozenset({id30, id24}),
            }

Now, because objects id1 and id2 are in the same category, objects id9 and id3 are in the same category, and objects id9 and id2 are in the same category, objects id1, id2, id3 and id9 should all be in the same category. So I should end up with a set like {id1, id2, id3, id9}. Can someone provide an algorithm to do this? Thanks.

Answer

It sounds like you're looking for a disjoint-set data structure.

Given your set of ids, your categories partition them into disjoint subsets. A disjoint-set data structure represents each category by choosing a representative id, which is returned by a query on any of its members. (Isolated ids each form a category of their own and return themselves.)
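To make the query semantics concrete, here is a minimal sketch in Python, assuming ids are hashable values. It is the naive "quick-find" variant, not the reverse-tree method described further down: every id maps straight to its representative, so a query is one dict lookup, but an update has to relabel a whole category. The class name QuickFind is hypothetical.

```python
class QuickFind:
    """Hypothetical naive disjoint-set: each id maps directly to its representative."""

    def __init__(self):
        # id -> representative; ids that are absent represent themselves.
        self.rep = {}

    def find(self, x):
        # Query: an isolated (never merged) id is its own representative.
        return self.rep.get(x, x)

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return  # same category already: the update is a no-op
        # Relabel every member of b's category so it reports ra from now on.
        for key, r in list(self.rep.items()):
            if r == rb:
                self.rep[key] = ra
        self.rep[rb] = ra
```

The O(n) relabelling in union is exactly the cost that the tree-based representation avoids.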

Updates to a disjoint-set data structure combine the categories of any two ids, so that future queries return the same representative for members of both subsets. (If the two ids are already members of the same category, the update is effectively a no-op.)

The usual method is to represent each category as a reverse tree: each id has a parent link, but no child links. The "representative element" is the root of the tree, which is easy to find by following the parent links. An update requires finding the roots of both ids' trees and, if they differ, merging the trees by making one root the parent of the other.
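A minimal sketch of the reverse tree without any optimizations, assuming hashable Python ids; ReverseTreeSets is a hypothetical name. Only parent links are stored, a root is an id with no parent entry, and a merge links one root under the other.

```python
class ReverseTreeSets:
    """Hypothetical unoptimized disjoint-set forest using parent links only."""

    def __init__(self):
        # child id -> parent id; roots (and unseen ids) have no entry.
        self.parent = {}

    def find(self, x):
        # Follow parent links until reaching an id that is its own root.
        while self.parent.get(x, x) != x:
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Merge the trees by making one root the parent of the other.
            self.parent[rb] = ra
```

Without further care, a long sequence of merges can produce a chain, making find linear in the category size.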

By adding a couple of simple optimizations (queries "collapse" the query path so that every node on it points directly to the root, and updates always choose the root of the deeper tree as the parent in a merge), this algorithm becomes extremely efficient, running in "almost O(1)" amortized time per operation (formally O(α(n)), where α is the inverse Ackermann function).
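Both optimizations are commonly called path compression and union by rank. A minimal sketch, assuming hashable Python ids; DisjointSet is a hypothetical name. Because path compression changes depths without updating them, this version tracks a rank (an upper bound on tree depth) rather than the exact depth, which is the standard way to implement "merge under the deeper tree".

```python
class DisjointSet:
    """Hypothetical disjoint-set forest with path compression and union by rank."""

    def __init__(self):
        self.parent = {}
        self.rank = {}  # upper bound on the depth of each root's tree

    def find(self, x):
        if x not in self.parent:
            # First sighting: x starts as an isolated category.
            self.parent[x] = x
            self.rank[x] = 0
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the query path at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the categories of a and b; return False if it was a no-op."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by rank: hang the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```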

If you want online access to the complete list of ids in each category, maintain a cumulative list attached to each category root and concatenate the two lists on each merge. More generally, you can maintain any number of statistics about your categories this way.
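Putting it together for the question's data, here is a sketch assuming the ids are written as strings. merge_overlapping is a hypothetical helper: each root owns the member list of its category, and the lists are concatenated on every merge. It uses union by size (the root with more members becomes the parent), a common alternative to union by rank that also keeps the trees shallow.

```python
def merge_overlapping(pairs):
    """Hypothetical helper: group ids linked, directly or transitively, by any set."""
    parent = {}
    members = {}  # root id -> list of every id in that category

    def find(x):
        if x not in parent:
            parent[x] = x
            members[x] = [x]
            return x
        root = x
        while parent[root] != root:
            root = parent[root]
        while x != root:  # path compression
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Union by size: the root with more members becomes the parent.
        if len(members[ra]) < len(members[rb]):
            ra, rb = rb, ra
        parent[rb] = ra
        # Concatenate the absorbed category's list onto the surviving root's.
        members[ra].extend(members.pop(rb))

    for group in pairs:
        ids = list(group)
        if not ids:
            continue
        find(ids[0])  # registers the id even if the set has one element
        for other in ids[1:]:
            union(ids[0], other)
    return [frozenset(m) for m in members.values()]


matched_set = {
    frozenset({"id1", "id2"}),
    frozenset({"id9", "id3"}),
    frozenset({"id9", "id2"}),
    frozenset({"id24", "id22"}),
    frozenset({"id1", "id23"}),
    frozenset({"id25", "id24"}),
    frozenset({"id30", "id24"}),
}
groups = merge_overlapping(matched_set)
```

Here groups holds two categories, {id1, id2, id3, id9, id23} and {id22, id24, id25, id30}; id23 joins the first one through frozenset(id1, id23). The order of the returned list depends on set iteration order.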
