Yielding from sorted iterators in sorted order in Python?
Question
Is there a better way to merge/collate a bunch of sorted iterators into one so that it yields the items in sorted order? I think the code below works but I feel like there is a cleaner, more concise way of doing it that I'm missing.
def sortIters(*iterables, **kwargs):
    key = kwargs.get('key', lambda x : x)
    nextElems = {}
    currentKey = None
    for g in iterables:
        try:
            nextElems[g] = g.next()
            k = key(nextElems[g])
            if currentKey is None or k < currentKey:
                currentKey = k
        except StopIteration:
            pass #iterator was empty
    while nextElems:
        minKey = None
        stoppedIters = set()
        for g, item in nextElems.iteritems():
            k = key(item)
            if k == currentKey:
                yield item
                try:
                    nextElems[g] = g.next()
                except StopIteration:
                    stoppedIters.add(g)
            minKey = k if minKey is None else min(k, minKey)
        currentKey = minKey
        for g in stoppedIters:
            del nextElems[g]
The use case for this is that I have a bunch of CSV files that I need to merge according to some sorted field. They are big enough that I don't want to just read them all into a list and call sort(). I'm using Python 2.6, but if there's a solution for Python 3 I'd still be interested in seeing it.
Recommended answer
Yes, you want heapq.merge(), which does exactly one thing: iterate over sorted iterators in order. It works lazily, so the inputs are never read into memory all at once.
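As a minimal sketch of that behaviour (Python 3 syntax, with made-up in-memory inputs standing in for real data), heapq.merge() accepts any number of already-sorted iterables and returns a lazy iterator over their combined contents:

```python
import heapq

# Hypothetical inputs: each one is already sorted on its own.
evens = iter([0, 2, 4, 6])
odds = iter([1, 3, 5])
more = iter([2, 3, 10])

# merge() returns an iterator; nothing is consumed until it is iterated.
merged = heapq.merge(evens, odds, more)
result = list(merged)
print(result)  # → [0, 1, 2, 2, 3, 3, 4, 5, 6, 10]
```

Duplicates across inputs are kept, and the output is only correctly ordered if every input is itself sorted.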
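import csv
import heapq
from itertools import imap  # Python 2; on Python 3 the built-in map is already lazy

def sortkey(row):
    # Decorate each row with its sort field (column 5) so tuples compare on it first.
    return (row[5], row)

def unwrap(decorated):
    # Drop the sort field again and return the original row.
    _key, row = decorated
    return row

FILE_LIST = map(open, ['foo.csv', 'bar.csv'])
# Decorate the rows of each reader, not the reader objects themselves.
input_iters = [imap(sortkey, csv.reader(f)) for f in FILE_LIST]
output_iter = imap(unwrap, heapq.merge(*input_iters))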
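Since the question also asks about Python 3: from Python 3.5 onward, heapq.merge() accepts a key argument, so the decorate/unwrap step is not needed. The following is a sketch under assumptions, not a drop-in solution: the helper name merge_sorted_csv is hypothetical, and io.StringIO objects with made-up rows stand in for the real CSV files.

```python
import csv
import heapq
import io
from operator import itemgetter

def merge_sorted_csv(sources, column):
    """Lazily merge CSV sources that are each already sorted on `column`."""
    readers = [csv.reader(src) for src in sources]
    # key= requires Python 3.5+; rows are compared on that one field only.
    return heapq.merge(*readers, key=itemgetter(column))

# Hypothetical data standing in for open file objects.
foo = io.StringIO("apple,1\ncherry,3\n")
bar = io.StringIO("banana,2\ndate,4\n")

rows = list(merge_sorted_csv([foo, bar], column=0))
print(rows)
```

With real files, the sources would be file objects opened with newline='' as the csv module recommends. Note that csv.reader yields every field as a string, so the comparison is lexicographic unless the key function converts the field first.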