如何在python中随机合并两个列表的两个元素,并确保生成的条目都是唯一的? [英] How to merge two elements of two lists in python at random and ensure resulting entries are all unique?

查看:130
本文介绍了如何在python中随机合并两个列表的两个元素,并确保生成的条目都是唯一的?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个列表,A和B,具有相同数量的元素,尽管每个列表中的元素不一定是不同的.

I have two lists, A and B, with an equal number of elements, although the elements in each list are not necessarily distinct.

我想通过随机耦合A和B中的元素来形成一个新列表(随机配对很重要).

I would like to form a new list by coupling the elements from A and B at random (the random pairing is important).

但是,我还需要确保结果列表中的每一对都是唯一的.

However, I also need to make sure that each pair in the resulting list is unique.

到目前为止,我一直在解决以下问题,该问题适用于小型列表,但显然不适用于具有许多组合的大型列表.

So far, I've been approaching the problem as follows, which works for small lists, but clearly is not suited to larger lists with many combinations.

from random import shuffle

# Create a list of actors and events for testing
events = ['P1','P1','P1','P2','P2','P2','P3','P3','P3','P4','P5','P6','P7','P7']
actors = ['IE','IE','ID','ID','IA','IA','IA','IC','IB','IF','IG','IH','IH','IA']

# Randomize the elements of each list
shuffle(events)
shuffle(actors)

# Merge the two lists into a new list of pairs
edgelist = zip(events,actors)

# If the new list of pairs has all unique elements, then it is a good solution, otherwise try again at random
x = set(edgelist)

if len(edgelist) == len(x):
  break
else:
  while True:
    shuffle(events)
    shuffle(actors)
    edgelist = zip(events,actors)
    x = set(edgelist)
    if len(edgelist) == len(x):
      break

# Display the solution
print 'Solution obtained: '
for item in edgelist:
  print item

有人可以建议一种修改或替代方法,以适应更大的输入列表吗?

Can anyone suggest a modification or alternative approach that would scale to larger input lists?

感谢您的有用答复.

更新

事实证明,这是一个比最初想象的更具挑战性的问题.我想我现在有一个解决方案.它可能无法很好地扩展,但适用于中小型列表.它会在开始之前检查是否有解决方案,因此无需对输入列表的分布进行假设.我还包括几行代码,以显示结果列表的频率分布与原始频率匹配.

Turns out this is a more challenging problem than originally thought. I think I now have a solution. It may not scale incredibly well but works fine for small or medium sized lists. It checks to see whether a solution is possible before starting, so assumptions about the distribution of the input lists aren't necessary. I also included a few lines of code to show that the frequency distributions of the resulting list match the original.

# Randomize the elements
shuffle(events)

# Make sure a solution is possible
combinations = len(set(events))*len(set(actors))
assert combinations >= len(events) and combinations >= len(actors) and len(events) == len(actors), 'No soluton possible!'

# Merge the two lists into a new list of pairs (this will contain duplicates)
edgelist = zip(events,actors)

# Search for duplicates
counts = collections.Counter(edgelist)
duplicates = [i for i in counts if counts[i] > 1]
duplicate_count = len(duplicates)

while duplicate_count > 0:

  # Get a single duplicate to address
  duplicate = duplicates[0]

  # Find the position of the duplicate in the in edgelist
  duplicate_pos = edgelist.index(duplicate)

  # Search for a replacement
  swap = choice(edgelist)
  swap_pos = edgelist.index(swap)

  if (swap[0],duplicate[1]) not in edgelist: 
    edgelist[duplicate_pos] = (swap[0],duplicate[1])
    edgelist[swap_pos] = (duplicate[0],swap[1])

  # Update duplicate count
  counts = collections.Counter(edgelist)
  duplicates = [i for i in counts if counts[i] > 1]
  duplicate_count = len(duplicates)


# Verify resulting edgelist and frequency distributions

print 'Event Frequencies: '
print sorted([y for (x,y) in list(collections.Counter(events).items())], reverse=True)

print 'Edgelist Event Frequencies: '        
print sorted([y for (x,y) in list(collections.Counter([x for (x,y) in edgelist]).items())], reverse=True)

print 'Actor Frequencies: '        
print sorted([y for (x,y) in list(collections.Counter(actors).items())], reverse=True)

print 'Edgelist Actor Frequencies: '        
print sorted([y for (x,y) in list(collections.Counter([y for (x,y) in edgelist]).items())], reverse=True)

assert len(set(edgelist)) == len(events) == len(actors)

推荐答案

好吧,您没有理由对两个列表进行混洗.配对不会获得更多随机性".

Well, there is no reason for you to shuffle both lists. The pairing will not get "more random".

更新:我发布的解决方案与原始解决方案不同.它是递归的,并且保证总是返回有效答案;如果不可能,则返回None.

Update: I've posted a different solution than my original one. It's recursive, and is guaranteed to always return a valid answer, or None if one is not possible.

from random import shuffle

def merge(events, actors, index=None):
    """Returns list of unique pairs of events and actors, none of which may on index"""
    if len(events) == 0:
        return []
    if index is None:
        index = set()

    merged = None
    tried_pairs = set()
    for event in events:
        for i, actor in enumerate(actors):
            pair = (event, actor)
            if pair not in index and pair not in tried_pairs:
                new_index = index.union([pair])
                rest = merge(events[1:], actors[:i]+actors[i+1:], new_index)

                if rest is not None:
                    # Found! Done.
                    merged = [pair] + rest
                    break
                else:
                    tried_pairs.add(pair)
        if merged is not None:
            break

    return merged


if __name__ == '__main__':
    _events = ['P1','P1','P1','P2','P2','P2','P3','P3','P3','P4','P5','P6','P7','P7']
    _actors = ['IE','IE','ID','ID','IA','IA','IA','IC','IB','IF','IG','IH','IH','IA']
    shuffle(_events)
    edgelist = merge(_events, _actors)

    if edgelist is not None:
        print 'Solution obtained: '
        for item in edgelist:
            print item
    else:
        print 'No possible solution with these events and actors.'

注意:OP的解决方案上的"index"变量类似于检查边缘列表. "tried_pa​​irs"变量只是每个特定递归步骤的优化,以避免一遍又一遍地重试同一对(例如,如果actor中有多个连续的相同项).

Note: the "index" variable is similar to checking edgelist, on the OP's solution. The "tried_pairs" variable is just an optimisation for each specific recursion step, to avoid retrying the same pair over and over again (if, for instance, there are several consecutive identical items in actors).

这篇关于如何在python中随机合并两个列表的两个元素,并确保生成的条目都是唯一的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆