Group Python lists based on repeated items

Question

This question is very similar to "Group Python list of lists into groups based on overlapping items"; in fact, it could be called a duplicate.

Basically, I have a list of sub-lists where each sub-list contains some number of integers (this number is not the same among sub-lists). I need to group all sub-lists that share one integer or more.
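
To pin the requirement down, here is a minimal brute-force sketch. The function name group_by_overlap_naive is hypothetical and does not come from the question or the answers. It starts with one group per sub-list and keeps merging any two groups that share an integer until no two groups overlap. It scales poorly, but it can serve as a reference for checking faster versions.

```python
def group_by_overlap_naive(sublists):
    """Brute-force reference (hypothetical helper, not from the question).

    Start with one group per sub-list and keep merging any two groups whose
    integers overlap, until no two groups share an integer. Each pass is
    quadratic in the number of groups, so this is only meant for small inputs.
    """
    # Each group is (set of all integers in it, list of sub-list indices).
    groups = [(set(sub), [k]) for k, sub in enumerate(sublists)]
    changed = True
    while changed:
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if not groups[i][0].isdisjoint(groups[j][0]):
                    # Fold group j into group i, then rescan from the start.
                    groups[i][0].update(groups[j][0])
                    groups[i][1].extend(groups[j][1])
                    del groups[j]
                    changed = True
                    break
            if changed:
                break
    # Report the sub-lists of each group in their original input order.
    return [[sublists[k] for k in sorted(idx)] for _, idx in groups]
```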

The reason I'm asking a new, separate question is that I've been trying to adapt Martijn Pieters' great answer, with no luck.

Here is an MWE:

def grouper(sequence):
    result = []  # will hold (members, group) tuples

    for item in sequence:
        for members, group in result:
            if members.intersection(item):  # overlap
                members.update(item)
                group.append(item)
                break
        else:  # no group found, add new
            result.append((set(item), [item]))

    return [group for members, group in result]


gr = [[29, 27, 26, 28], [31, 11, 10, 3, 30], [71, 51, 52, 69],
      [78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81],
      [86, 68, 67, 84]]

for i, group in enumerate(grouper(gr)):
    print('g{}:'.format(i), group)

The output I get is:

g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [84, 67, 78, 77, 81], [86, 68, 67, 84]]
g4: [[86, 84, 81, 82, 83, 85]]

The last group, g4, should have been merged with g3, since the lists inside them share the items 81, 84 and 86, and even a single repeated element should be enough to merge them.

I'm not sure if I'm applying the code wrong, or if there's something wrong with the code.
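
The code is applied correctly. The limitation is in grouper itself: it puts each sub-list into the first group it overlaps and never merges two groups that already exist. When [86, 84, 81, 82, 83, 85] is processed, g3 holds only [78, 67, 68, 39, 75], so a new group g4 is opened. The bridging sub-list [84, 67, 78, 77, 81] arrives later and joins g3, and the loop breaks before g4 is ever considered. One way to patch this is to collect every group a new item touches and fold them all into one. The sketch below does that under the assumption that the order of the groups does not matter. The name grouper_merging is hypothetical; this is a modification of the question's function, not code from any answer.

```python
def grouper_merging(sequence):
    """Variant of the question's grouper that also merges existing groups
    when a new item overlaps more than one of them (hypothetical name)."""
    result = []  # list of (members, group) tuples, as in the original
    for item in sequence:
        item_set = set(item)
        # Split the current groups into those the item touches and the rest.
        touching = [(m, g) for m, g in result if not m.isdisjoint(item_set)]
        rest = [(m, g) for m, g in result if m.isdisjoint(item_set)]
        # Fold every touched group, plus the new item, into a single group.
        members, group = set(item_set), []
        for m, g in touching:
            members |= m
            group.extend(g)
        group.append(item)
        # The merged group goes to the end of the list.
        result = rest + [(members, group)]
    return [group for members, group in result]
```

Because the merged group is appended at the end of result, the group order can differ from the original grouper's output. On the sample data it yields the same four groups as the answer below.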

Recommended answer

You can describe the merge you want to do as a set consolidation or as a connected-components problem. I tend to use an off-the-shelf set consolidation algorithm and then adapt it to the particular situation. For example, IIUC, you could use something like:

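def consolidate(sets):
    # Iterative set consolidation, from
    # http://rosettacode.org/wiki/Set_consolidation#Python:_Iterative
    # Merges overlapping sets in place and returns the non-empty survivors.
    setlist = [s for s in sets if s]
    for i, s1 in enumerate(setlist):
        if s1:
            for s2 in setlist[i+1:]:
                intersection = s1.intersection(s2)
                if intersection:
                    # Move s1's elements into the later set s2, empty s1,
                    # and keep merging from the enlarged s2.
                    s2.update(s1)
                    s1.clear()
                    s1 = s2
    return [s for s in setlist if s]

def wrapper(seqs):
    # Consolidate the sub-lists' element sets into disjoint groups.
    consolidated = consolidate(map(set, seqs))
    # Map every integer to the index of the consolidated set containing it.
    groupmap = {x: i for i, seq in enumerate(consolidated) for x in seq}
    output = {}
    for seq in seqs:
        # Any element identifies the group, so use the first one
        # (this assumes every sub-list is non-empty).
        target = output.setdefault(groupmap[seq[0]], [])
        target.append(seq)
    return list(output.values())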

This gives:

>>> for i, group in enumerate(wrapper(gr)):
...     print('g{}:'.format(i), group)
...     
g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81], [86, 68, 67, 84]]

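(Before Python 3.7, the group order isn't guaranteed because of the use of dictionaries. From Python 3.7 on, dicts preserve insertion order, so the groups come out in order of first appearance.)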
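
The connected-components view mentioned in the answer can also be sketched with a union-find (disjoint-set) structure. This is a swapped-in technique, not the Rosetta Code consolidation the answer uses. Every sub-list links its integers together, and the sub-lists are then bucketed by the root of their first integer. This is a minimal sketch that assumes Python 3.7+ for ordered dicts; the name group_by_union_find is hypothetical.

```python
def group_by_union_find(sublists):
    """Group sub-lists that share integers, via union-find (hypothetical name)."""
    parent = {}

    def find(x):
        # Register x on first sight, then walk to its root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link all integers that occur in the same sub-list.
    for sub in sublists:
        for x in sub[1:]:
            union(sub[0], x)

    # Bucket sub-lists by the root of their first element. Dicts keep
    # insertion order (Python 3.7+), so groups come out in first-seen order.
    groups = {}
    for sub in sublists:
        if sub:  # empty sub-lists share nothing and are dropped here
            groups.setdefault(find(sub[0]), []).append(sub)
    return list(groups.values())
```

Unlike wrapper, this sketch skips empty sub-lists instead of raising IndexError on seq[0]. It also avoids the pairwise set intersections that consolidate performs.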