交错不同的长度列表,删除重复项并保留Python中的顺序 [英] Interleave different length lists, elimating duplicates and preserve order in Python

查看:124
本文介绍了交错不同的长度列表,删除重复项并保留Python中的顺序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个列表,让我们说:

I have two lists, lets say:

keys1 = ['A', 'B', 'C', 'D', 'E',           'H', 'I']
keys2 = ['A', 'B',           'E', 'F', 'G', 'H',      'J', 'K']

如何创建没有重复项的合并列表,以保留两个列表的顺序,将缺少的元素插入它们所属的位置?像这样:

How do I create a merged list without duplicates that preserve the order of both lists, inserting the missing elements where they belong? Like so:

merged = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']

请注意,可以将元素与相等进行比较,但不进行排序(它们是复杂的字符串)。
更新:不能通过比较来排序元素,但是它们会根据它们在原始列表中的出现次序进行排序。

Note that the elements can be compared against equality but not ordered (they are complex strings). Update: The elements can't be ordered by comparing them, but they have a order based on their occurrence in the original lists.

更新:如果出现矛盾(两个输入列表中的顺序不同),则包含所有元素的任何输出都有效。如果解决方案在保留大部分订单时显示常识,当然还有奖励积分。

Update: In case of contradiction (different order in both input lists), any output containing all elements is valid. Of course with bonus points if the solution shows 'common sense' in preserving most of the order.

更新:再次(因为一些评论仍然争论它,这些列表通常不会在共同元素的顺序上相互矛盾。如果他们这样做,算法需要优雅地处理该错误。

Update: Again (as some comments still argue about it), the lists normally don't contradict each other in terms of the order of the common elements. In case they do, the algorithm needs to handle that error gracefully.

我开始使用.next()迭代列表的版本,只推进包含该列表的列表不匹配的元素,但.next()只是不知道何时停止。

I started with a version that iterates over the lists with .next() to advance just the list containing the unmatched elements, but .next() just doesn't know when to stop.

merged = []
L = iter(keys1)
H = iter(keys2)
l = L.next()
h = H.next()

for i in range(max(len(keys1, keys2))):
  if l == h:
    if l not in merged:
      merged.append(l)
    l = L.next()
    h = H.next()

  elif l not in keys2:
    if l not in merged:
      merged.append(l)
    l = L.next()

  elif h not in keys1:
    if h not in merged:
      merged.append(h)
    h = H.next()

  else: # just in case the input is badly ordered
    if l not in merged:
      merged.append(l)
    l = L.next()
    if h not in merged:
      merged.append(h)
    h = H.next()   

print merged

这显然不起作用,因为.next()会导致最短列表的例外。现在我可以更新我的代码以在每次调用.next()时捕获该异常。但是这些代码已经完全不同于pythonic,这显然会破坏泡沫。

This obviously doesn't work, as .next() will cause an exception for the shortest list. Now I could update my code to catch that exception every time I call .next(). But the code already is quite un-pythonic and this would clearly burst the bubble.

有没有人更好地了解如何迭代这些列表来组合元素?

Does anyone have a better idea of how to iterate over those lists to combine the elements?

如果我可以一次性完成三个列表,那么奖励积分。

Bonus points if I can do it for three lists in one go.

推荐答案

您需要的基本上是任何合并实用程序所做的:它尝试合并两个序列,同时保持每个序列的相对顺序。您可以使用Python的 difflib 模块来区分两个序列,合并它们:

What you need is basically what any merge utility does: It tries to merge two sequences, while keeping the relative order of each sequence. You can use Python's difflib module to diff the two sequences, and merge them:

from difflib import SequenceMatcher

def merge_sequences(seq1,seq2):
    sm=SequenceMatcher(a=seq1,b=seq2)
    res = []
    for (op, start1, end1, start2, end2) in sm.get_opcodes():
        if op == 'equal' or op=='delete':
            #This range appears in both sequences, or only in the first one.
            res += seq1[start1:end1]
        elif op == 'insert':
            #This range appears in only the second sequence.
            res += seq2[start2:end2]
        elif op == 'replace':
            #There are different ranges in each sequence - add both.
            res += seq1[start1:end1]
            res += seq2[start2:end2]
    return res

示例:

>>> keys1 = ['A', 'B', 'C', 'D', 'E',           'H', 'I']
>>> keys2 = ['A', 'B',           'E', 'F', 'G', 'H',      'J', 'K']
>>> merge_sequences(keys1, keys2)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']

请注意,您期望的答案不一定是唯一可能的答案。例如,如果我们在此处更改序列的顺序,我们会得到另一个同样有效的答案:

Note that the answer you expect is not necessarily the only possible one. For example, if we change the order of sequences here, we get another answer which is just as valid:

>>> merge_sequences(keys2, keys1)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'I']

这篇关于交错不同的长度列表,删除重复项并保留Python中的顺序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆