Merging ordered lists

Problem description

Here's an algorithm question: How should I efficiently merge a
collection of mostly similar lists, with different lengths and
arbitrary contents, while eliminating duplicates and preserving order
as much as possible?

My code:

def merge_to_unique(sources):
    """Merge the unique elements from each list in sources into a new
    list.

    Using the longest input list as a reference, merges in the elements
    from each of the smaller or equal-length lists, and removes duplicates.

    @return: Combined list of elements.
    """
    sources.sort(key=len, reverse=True)  # Descending length
    ref = sources[0]
    for src in sources[1:]:
        for i, s in enumerate(src):
            if s and (ref[i] != s) and s not in ref:
                # Insert s just after the element preceding it in src
                # (for i == 0, src[i-1] wraps around to the last element of src)
                ref.insert(ref.index(src[i-1]) + 1, s)
    # Remove duplicates
    return [r for i, r in enumerate(ref) if r and r not in ref[i+1:]]
This comes up when using the CSV module's DictWriter class to merge a
set (list, here) of not-quite-perfect CSV sources. The DictWriter
constructor needs a list of field names so that it can convert
dictionaries into rows of the CSV file it writes. Some of the input
CSV files are missing columns and some might have extras; all of this
should be accepted, and the order of the columns in the merged file
should match the order of the input files as much as possible (not
alphabetical). All of the list elements are strings in this case, but
it would be nice if the function didn't require it.
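A minimal sketch of the DictWriter side of this, assuming the rows have already been read into dicts (for example with csv.DictReader) and a merged header list is available. The helper `write_merged` and the sample rows are hypothetical; `restval` and `writeheader()` are real `csv.DictWriter` features.

```python
import csv
import io

def write_merged(tables, fieldnames):
    """Hypothetical helper: write rows from several sources under one header.

    tables: a list of row lists, each row a dict (as csv.DictReader yields)
    fieldnames: the merged column order
    """
    out = io.StringIO()
    # restval="" fills in columns that a given row does not have; a key that a
    # row has but fieldnames lacks still raises ValueError (extrasaction="raise")
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for rows in tables:
        writer.writerows(rows)
    return out.getvalue()

tables = [
    [{"id": "1", "name": "Ann"}],              # source without an "email" column
    [{"id": "2", "email": "bo@example.com"}],  # source without a "name" column
]
print(write_merged(tables, ["id", "name", "email"]))
```

Missing columns are harmless, but every column that appears in any source has to be in the merged header, which is why the field-name merge must not drop extras.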

Speed actually isn't a problem yet; it might matter some day, but for
now it's just an issue of conceptual aesthetics. Any suggestions?
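One way to keep the insert-after-predecessor idea from the code above while avoiding its side effects (it sorts `sources` in place and inserts into `sources[0]`) is sketched below, in Python 3 with a hypothetical function name. Each new element is anchored after the previous element of its own source, or at the front if there is none, so `src[i-1]` is never consulted when `i` is 0. Elements only need to support `==`, not hashing.

```python
def merge_keep_order(sources):
    """Hypothetical variant: merge lists, dropping duplicates and empty items.

    Longest lists are processed first; each element not yet merged is
    inserted right after the element that precedes it in its own source.
    """
    merged = []
    # sorted() builds a new outer list; the inner lists are not modified
    for src in sorted(sources, key=len, reverse=True):
        prev = None  # last non-empty element seen in this source
        for item in src:
            if not item:
                continue  # skip empty/falsy entries, as the original code does
            if item not in merged:
                # position right after the predecessor, or the front if none
                pos = merged.index(prev) + 1 if prev is not None else 0
                merged.insert(pos, item)
            prev = item
    return merged

print(merge_keep_order([["id", "name", "email"], ["id", "email", "phone"]]))
# ['id', 'name', 'email', 'phone']
```

One design consequence worth checking against real headers: an element that comes first in a shorter list is placed at the very front of the result.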

Recommended answers

On May 31, 10:00 pm, etal <eric.talev...@gmail.com> wrote:

> Here's an algorithm question: How should I efficiently merge a
> collection of mostly similar lists, with different lengths and
> arbitrary contents, while eliminating duplicates and preserving order
> as much as possible?

I would do it in two steps. There are a number of ways to merge, depending
on whether everything is pulled into memory or not:
http://aspn.activestate.com/ASPN/Coo.../Recipe/491285
http://aspn.activestate.com/ASPN/Coo.../Recipe/305269

After merging, the groupby itertool is good for removing duplicates:

from itertools import groupby
# imerge: the merge function from one of the recipes above
result = [k for k, g in groupby(imerge(*sources))]
Raymond
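Raymond's two-step idea can be tried with only the standard library by letting `heapq.merge` stand in for the recipe's `imerge` (an assumption; the recipe code itself is not reproduced here). This sketch, with a hypothetical function name, only behaves as intended when every input is already sorted: `heapq.merge` produces a globally sorted stream only from sorted inputs, and `groupby` collapses only adjacent repeats. Column names kept in file order are generally not sorted.

```python
from heapq import merge
from itertools import groupby

def merge_sorted_unique(*sources):
    """Merge already-sorted iterables and drop repeated values.

    heapq.merge lazily interleaves the sorted inputs into one sorted stream;
    groupby then yields one key per run of equal, adjacent values.
    """
    return [key for key, _ in groupby(merge(*sources))]

print(merge_sorted_unique([1, 3, 5], [1, 2, 5, 7], [3, 4]))  # [1, 2, 3, 4, 5, 7]

# Unsorted input: the two "id" values never become adjacent, so both survive
print(merge_sorted_unique(["name", "id"], ["id"]))  # ['id', 'name', 'id']
```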

etal wrote:

> Here's an algorithm question: How should I efficiently merge a
> collection of mostly similar lists, with different lengths and
> arbitrary contents, while eliminating duplicates and preserving order
> as much as possible?

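#untested
import difflib
from functools import reduce  # reduce is not a builtin in Python 3

def _merge(a, b):
    sm = difflib.SequenceMatcher(None, a, b)
    for op, a1, a2, b1, b2 in sm.get_opcodes():
        if op == "insert":
            # run that exists only in b
            yield b[b1:b2]
        else:
            # "equal", "delete" and "replace" runs are taken from a
            yield a[a1:a2]

def merge(a, b):
    # concatenate the yielded slices into one list
    return sum(_merge(a, b), [])

def merge_to_unique(sources):
    return reduce(merge, sorted(sources, key=len, reverse=True))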

Peter
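In the code above, `get_opcodes()` can also report a "replace" run, and that branch keeps only the slice from `a`, so elements that exist only in `b` at that position are dropped. An element that moved can also land in both an "insert" and a "delete" run and so be listed twice. A possible variant (Python 3.7+, hypothetical names, still a sketch) keeps both sides of a replace and then removes later repeats. Like `SequenceMatcher` itself, it needs hashable elements.

```python
import difflib
from functools import reduce

def merge_pair(a, b):
    """Hypothetical variant of merge(): keep replaced items from both sides."""
    out = []
    sm = difflib.SequenceMatcher(None, a, b)
    for op, a1, a2, b1, b2 in sm.get_opcodes():
        if op == "insert":
            out.extend(b[b1:b2])   # run that exists only in b
        elif op == "replace":
            out.extend(a[a1:a2])   # keep a's items ...
            out.extend(b[b1:b2])   # ... and b's, instead of dropping them
        else:                      # "equal" or "delete"
            out.extend(a[a1:a2])
    # dicts preserve insertion order (Python 3.7+): keep first occurrence only
    return list(dict.fromkeys(out))

def merge_all(sources):
    return reduce(merge_pair, sorted(sources, key=len, reverse=True))

print(merge_all([["a", "b", "c"], ["a", "x", "c"]]))  # ['a', 'b', 'x', 'c']
```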

Hi!

Use a set (union).
Example:

la = [2, 1, 3, 5, 4, 6]
lb = [2, 8, 6, 4, 12]

# compact:
print(list(set(la).union(set(lb))))

# detail:
s1 = set(la)
s2 = set(lb)
s3 = s1.union(s2)
print(list(s3))
@-salutations

Michel Claveau
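A set union removes duplicates but does not keep the input order, which the question asks for. A minimal order-preserving alternative, assuming Python 3.7+ (where dicts keep insertion order) and using a hypothetical function name, keeps the first occurrence of each element. Like a set, it requires hashable elements, and elements that only appear in later lists go to the end rather than next to their neighbors.

```python
from itertools import chain

def ordered_union(*lists):
    """Hypothetical helper: union of the lists, first-occurrence order kept."""
    # dict.fromkeys keeps the first time each element is seen, in order
    return list(dict.fromkeys(chain.from_iterable(lists)))

la = [2, 1, 3, 5, 4, 6]
lb = [2, 8, 6, 4, 12]
print(ordered_union(la, lb))  # [2, 1, 3, 5, 4, 6, 8, 12]
```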