合并成对的列表(元组)? [英] Combine a list of pairs (tuples)?

查看:54
本文介绍了合并成对的列表(元组)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

从链接对的列表中,我想将这些对组合成通用ID组,以便随后可以将group_ids写回到数据库,例如:

From a list of linked pairs, I want to combine the pairs into groups of common IDs, so that I can then write group_ids back to the database, eg:

UPDATE table SET group = n WHERE id IN (...........);

示例:

[(1,2), (3, 4), (1, 5), (6, 3), (7, 8)]

成为

[[1, 2, 5], [3, 4, 6], [7, 8]]

允许:

UPDATE table SET group = 1 WHERE id IN (1, 2, 5);
UPDATE table SET group = 2 WHERE id IN (3, 4, 6);
UPDATE table SET group = 3 WHERE id IN (7, 8);

[(1,2), (3, 4), (1, 5), (6, 3), (7, 8), (5, 3)]

成为

[[1, 2, 5, 3, 4, 6], [7, 8]]

允许:

UPDATE table SET group = 1 WHERE id IN (1, 2, 5, 3, 4, 6);
UPDATE table SET group = 2 WHERE id IN (7, 8);

我写了一些有效的代码.我传入一个元组列表,其中每个元组都是一对链接的ID.我返回一个列表列表,其中每个内部列表都是通用ID的列表.

I have written some code which works. I pass in a list of tuples, where each tuple is a pair of linked ids. I return a list of lists, where each internal list is a list of common id's.

我遍历元组列表,并将每个元组元素分配给组,如下所示:

I iterate over the list of tuples and assign each tuple element to groups as follows:

  • 如果 a 和 b 都不在列表中,则创建一个新列表,附加 a 和 b 并将新列表附加到列表列表中
  • 如果 a 在一个组中而 b 不在,则将 b 添加到 a 组中
  • 如果b在一个组中,但a不在组中,则将a添加到b组中.
  • 如果a和b已经在单独的组中,则合并a和b组
  • 如果 a 和 b 已经在同一个组中,则什么都不做

我期望数以百万计的链接对,并且期望成千上万的团体和成千上万的团体.因此,我需要算法更快,我正在寻找一些真正有效的代码的建议.尽管我已经实现了构建列表列表的方法,但是我对任何事物都持开放态度,他们关键的一点是能够构建上述SQL以将组ID返回数据库.

I am expecting millions of linked pairs, and I am expecting hundreds of thousand of gropus and hunderds of thousands of group members. So, I need the algorithm to be fast, I am looking for some suggestions for some really efficient code. Although I have implemented this to build a list of lists, I am open to anything, they key thing is being able to build the above SQL to get the group ids back to the database.

def group_pairs(list_of_pairs):
    """

    :param list_of_pairs:
    :return:
    """
    groups = list()
    for pair in list_of_pairs:
        a_group = None
        b_group = None

        for group in groups:
            # find what group if any a and b belong to

            # don't bother checking if a group already found
            if a_group is None and pair[0] in group:
                a_group = group
            # don't bother checking if b group already found
            if b_group is None and pair[1] in group:
                b_group = group
            # if a and b found, stop looking
            if a_group is not None and b_group is not None:
                break

        if a_group is None:
            if b_group is None:
                # neither a nor b are in a group; create a new group and
                # add a and b
                groups.append([pair[0], pair[1]])
            else:
                # b is in a group but a isn't, so add a to the b group
                b_group.append(pair[0])
        elif a_group != b_group:
            if b_group is None:
                # a is in a group but b isn't, so add b to the a group
                a_group.append(pair[1])
            else:
                # a and b are in different groups, add join b to a and get
                # rid of b
                a_group.extend(b_group)
                groups.remove(b_group)
        elif a_group == b_group:
            # a and b already in same group, so nothing to do
            pass

    return groups

推荐答案

使用:

def make_equiv_classes(pairs):
    groups = {}
    for (x, y) in pairs:
        xset = groups.get(x, set([x]))
        yset = groups.get(y, set([y]))
        jset = xset | yset
        for z in jset:
            groups[z] = jset
    return set(map(tuple, groups.values()))

您可以获得:

>>> make_equiv_classes([(1,2), (3, 4), (1, 5), (6, 3), (7, 8)])
{(1, 2, 5), (3, 4, 6), (8, 7)}

>>> make_equiv_classes([(1,2), (3, 4), (1, 5), (6, 3), (7, 8), (5, 3)])
{(1, 2, 3, 4, 5, 6), (8, 7)}

复杂度应为 O(np),其中 n 是不同整数值的数量,而 p 是对的数量

The complexity should be O(np), where n is the number of distinct integer values, and p is the number of pairs.

我认为 set 是单个组的正确类型,因为它使联合操作快速且易于表达,而 dict 是存储 groups ,因为您可以进行恒定时间查询,以询问特定整数值属于哪个组的问题.

I think set is the proper type for a single group, because it makes the union operation fast and easy to express, and dict is the right way to store groups because you get constant-time lookup for asking the question of what group a particular integer value belongs to.

如果需要,我们可以设置测试工具来计时此代码.首先,我们可以在中等大小的对象(例如10K节点)(即不同的整数值)上构建随机图.我将放入5K个随机链接(成对),因为这往往会给我成千上万个组,这些组合起来约占节点的三分之二(即,约3K个节点将位于一个单例组中,而不链接到任何东西)其他).

We can set up a test harness to time this code, if we want to. To start, we can build a random graph over something moderately large, like 10K nodes (i.e., distinct integer values). I'll put in 5K random links (pairs), since that tends to give me thousands of groups, which together account for about two thirds of the nodes (that is, about 3K nodes will be in a singleton groups, not linked to anything else).

import random
pairs = []
while len(pairs) < 5000:
    a = random.randint(1,10000)
    b = random.randint(1,10000)
    if a != b:
        pairs.append((a,b))

然后,我们可以将其计时(我在这里使用IPython魔术):

Then, we can just time this (I'm using IPython magic here):

In [48]: %timeit c = make_equiv_classes(pairs)
10 loops, best of 3: 63 ms per loop

比您最初的解决方案要快:

which is faster than your initial solution:

In [49]: %timeit c = group_pairs(pairs)
1 loop, best of 3: 2.08 s per loop

我们还可以使用此随机图来检查两个函数的输出对于某些较大的随机输入是否相同:

We can also use this random graph to check that the output of the two functions is identical for some large random input:

>>> c = make_equiv_classes(pairs)
>>> c2 = group_pairs(pairs)
>>> set(tuple(sorted(x)) for x in c) == set(tuple(sorted(x)) for x in c2)
True

这篇关于合并成对的列表(元组)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆