将排序列表/迭代器/生成器合并为一个值流... [英] Merging sorted lists/iterators/generators into one stream of values...

查看:66
本文介绍了将排序列表/迭代器/生成器合并为一个值流...的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要将几个值的来源合并为一个值流。所有来源的b $ b已被排序,我需要从排序的顺序中检索

的值。


In换句话说:

s1 = [10,20,30,40,50]

s2 = [15,25]

s3 = [ 17,27,37]


中的值为(s1,s2,s3):

打印值


将按顺序打印10,15,17,20,25,27,30,37,40,50.


源是游标从几个数据中检索数据数据库,而不是来自

的同一台服务器,并且有很多行可能来自

。因此,任何将它全部加载到内存中并对其进行排序的方法并不好,因为内存太多了。


是否有函数或在Python中可以做我想做的事情吗?我已经想出了我自己的方法,但是因为它使用了生成器,所以我不会确定它有多少开销。另外,由于从游标检索的值将是fieldname的字典:value

对,该方法将需要处理该构造(并且

告诉要排序的字段名称,或者能够将一个函数对象用于

用于比较操作。


基本上,我会使用我自己的功能,除非有人可以指出我已经可用的东西
。似乎无法在内置模块中找到任何内容

但是当我看到

时我找不到glob.glob如何查找文件名要么是谁知道它叫什么:)


因为我需要处理一个变量来源列表(即2合1

应用程序,3在另一个和4-5中的第三个),一个简单的双源方法

是不够的,但如果它比我对2个来源做的更好,那么我可以

为它制作一个包装,因为这就是我所做的。


-

Lasse V?gs?ther Karlsen
http://usinglvkblog.blogspot.com/

mailto:la *** @ vkarlsen.no

PGP KeyID:0x2A42A1C2

解决方案

Lasse V?gs Karlsen写道:

我需要将几个值的来源合并为一个值流。所有来源都已经排序了,我需要按照排序的顺序从
中检索这些值。

换句话说:
s1 = [10, 20,30,40,50]
s2 = [15,25]
s3 = [17,27,37]

值为???(s1,s2 ,s3):
打印价值
将按顺序打印出10,15,17,20,25,27,30,37,40,50。

在Python中是否有函数或诸如此类的东西可以做我所做的事情想?我已经提出了自己的方法,但由于它使用了生成器,我不确定它有多少开销。另外,由于从游标中检索的值将是fieldname:value
对的字典,因此该方法将需要处理该构造(并且被告知要对哪个字段进行排序) on),或者能够将一个功能对象用于比较操作。

基本上,除非有人能指出我,否则我会使用自己的功能到已经可用的东西。似乎在内置模块中找不到任何东西
但是当我在寻找如何查找文件名时我找不到glob.glob,所以谁知道它叫什么: )

因为我需要处理一个可变的源列表(即2合一应用程序,3个在另一个应用程序中,4到5在第三个应用程序中),一个简单的2源代码方法
是不够的,但如果它比我为2个来源做的更好,那么我可以为它做一个包装,因为那就是我做的事情。



我怀疑你会找到一个预建的,毕竟创建你自己的,相当容易。
。生成器在Python中是相当快的构造,顺便说一句。

这是我几分钟内掀起的:


def firstIter(value):

it = iter(value)

试试:

返回it.next(),除了StopIteration之外它是
,错误:

返回无,无


def inorder(比较,* args):

iterators = [

[value,it]

for(value,it)in [firstIter(arg)for arg in args]

如果不是无

]

iterators.sort()

迭代器:

产生迭代器[0] [0]

试试:

value = iterators [0] [0] = iterators [0] [1] .next()

除了StopIteration,错误:

iterators.pop(0)

else:

if len(iterators)> 1和比较(值,

迭代器[1] [0])== 1:

iterators.sort()

继续


如果__name__ ==" __ main __":

s1 = [10,20,30,40,50]

s2 = [15,25]

s3 = [17,27,37]

s4 = []

for order in order(cmp,s1 ,s2,s3,s4):

打印价值

无论如何,玩得开心,

迈克


-

________________________________________________

Mike C. Fletcher

设计师,VR水管工,编码器
http://www.vrplumber.com
http://blog.vrplumber.com


好的,那个看起来更多比我想出来的更好。


我从你的解决方案中学到了很多东西,请纠正我,如果我b / b
误解了一些东西:


1.列表包含其他列表会根据内部列表中的第一个元素

对自己进行排序吗?

2. sort(),pop()不是昂贵的操作

除此之外你似乎没有使用cmp操作符但是很容易修复



这个看起来好多了我的,这就是我想出的:


def merge_sorted(iterables,comparer = None):

"""

生成器将获取单独排序的可迭代/生成器列表,然后通过取最低值以排序顺序生成

值来自每个

源的每次通话。


@param iterables:用于检索值的可迭代/生成器列表
来自
>
@type iterables:可迭代/生成器列表


@param comparer:可选的fn(v1,v2)函数可以比较两个

值,应该返回< 0如果v1< v2,

0如果v1> v2和== 0如果v1 == v2。



@type comparer:function-object,例如:lambda x,y:xy


@注意:可迭代列表实际上可以是

生成可迭代列表的任何东西,所以你可以使用一个产生列表的函数。

"" ;"


#首先将我们给出的任何内容转换为源列表

iterables = [迭代可迭代的迭代]


#这一系列的if语句将决定我们有多少来源b
,并解决了可管理的子问题

#。 br />
if len(iterables)!= 2:

if len(iterables)== 0:

#List,但没有来源

通过

elif len(iterables)== 1:

#只有1个来源,只需返回其内容

的值在iterables [0]:

收益价值

elif len(iterables)== 3:

#3来源,细分为0 < - > (1,2)

left_iterable = iterables [0]

right_iterable = merge_sorted([iterables [1],iterables [2]],

comparer)

merge_sorted中的值([left_iterable,right_iterable],

comparer):

收益率价值

elif len(iterables)== 4:

#4来源,细分为(0,1)< - > (2,3)

left_iterable = merge_sorted([iterables [0],iterables [1]],

comparer)

right_iterable = merge_sorted ([iterables [2],iterables [3]],

comparer)

for merge_sorted中的值((left_iterable,right_iterable),

比较器):

收益率价值

elif len(iterables)> 4:

#> 4个来源,细分为(0,1)< - > (2,...)

left_iterable = merge_sorted([iterables [0],iterables [1]],

comparer)

right_iterable = merge_sorted(iterables [2:],comparer)

for merge_sorted中的值((left_iterable,right_iterable),

comparer):

yield价值

提高StopIteration


#只有两个来源才能到达这个方法,

很容易处理

i1 = iter(iterables [0])

i2 = iter(iterables [1])


#Grab来自两个来源的前两个值,如果可能的话

试试:

v1 = i1.next()

has_v1 = True
除了StopIteration:

has_v1 = False

试试:

v2 = i2.next()

has_v2 = True

除了StopIteration:

has_v2 = False


#只要我们从两个值获得值/迭代器/无论如何,

比较并得出最低的

#two,然后得到来自相应来源的下一个值

而has_v1和has_v2:

#找出v1和v2中的哪一个首先

如果比较器不是无:

comp = comparer(v1,v2)

如果comp< = 0:

yield_v1 = True

else:

yield_v1 = False

else:

如果v1< = v2:

yield_v1 =真的

否则:

yield_v1 =错误


#产生下一个值,然后从

对应来源

如果yield_v1:

收益率v1

尝试:

v1 = i1。 next()

除了StopIteration:

has_v1 = False

else:

yield v2

试试:

v2 = i2.next()

除了StopIteration:

has_v2 = False


#当我们到达这里时,我们有3种可能性:

#1. has_v1 == True,has_v2 == False - >产生剩余的v1 / i1和

只是退出StopIteration异常

#2. has_v1 == False,has_v1 == True - >产生剩余的v2 / i2和

只需退出StopIteration异常

#3。has_v1 == has_v2 == False - > while-loops将跳过,函数

从最终落下

而has_v1:

yield v1

v1 = i1.next()

而has_v2:

收益v2

v2 = i2.next()


" Lasse V?gs?ther Karlsen" < LA ***** @ gmail.com>写道:

好吧,那个看起来比我想象的还要小。




为了最有效和优雅解决方案,查看Raymond Hettinger的回复:

http://aspn.activestate.com/ASPN/Coo.../Recipe/141934


George


I need to merge several sources of values into one stream of values. All
of the sources are sorted already and I need to retrieve the values from
them all in sorted order.

In other words:
s1 = [10, 20, 30, 40, 50]
s2 = [15, 25]
s3 = [17, 27, 37]

for value in ???(s1, s2, s3):
print value

will print out 10, 15, 17, 20, 25, 27, 30, 37, 40, 50 in that order.

The sources are cursors retrieving data from several databases, not from
the same server, and there''s a potential for a large number of rows from
several of the sources. As such, any method that would load it all into
memory and sort it is no good as it would too much memory.

Is there a function or whatnot in Python that will do what I want? I
have come up with my own method but since it uses generators I''m not
sure how much overhead there is. Additionally, since the values
retrieved from the cursors will be dictionaries of "fieldname":value
pairs, the method would either need to handle that construct (and be
told what fieldname to sort on), or be able to take a function object to
use for the comparison operation.

Basically, I''ll use my own function for this unless someone can point me
to something that is already available. Couldn''t seem to find anything
in the builtin modules but I didn''t find glob.glob when I was looking
for how to find filenames either so who knows what it''s called :)

Since I need to deal with a variable list of sources (that is, 2 in one
application, 3 in another and 4-5 in a third), a simple 2-source method
isn''t enough but if it''s better than what I do for 2 sources then I can
make a wrapper for it, since that''s what I do anyway.

--
Lasse V?gs?ther Karlsen
http://usinglvkblog.blogspot.com/
mailto:la***@vkarlsen.no
PGP KeyID: 0x2A42A1C2

解决方案

Lasse V?gs?ther Karlsen wrote:

I need to merge several sources of values into one stream of values. All
of the sources are sorted already and I need to retrieve the values from
them all in sorted order.

In other words:
s1 = [10, 20, 30, 40, 50]
s2 = [15, 25]
s3 = [17, 27, 37]

for value in ???(s1, s2, s3):
print value

will print out 10, 15, 17, 20, 25, 27, 30, 37, 40, 50 in that order.

The sources are cursors retrieving data from several databases, not from
the same server, and there''s a potential for a large number of rows from
several of the sources. As such, any method that would load it all into
memory and sort it is no good as it would too much memory.

Is there a function or whatnot in Python that will do what I want? I
have come up with my own method but since it uses generators I''m not
sure how much overhead there is. Additionally, since the values
retrieved from the cursors will be dictionaries of "fieldname":value
pairs, the method would either need to handle that construct (and be
told what fieldname to sort on), or be able to take a function object to
use for the comparison operation.

Basically, I''ll use my own function for this unless someone can point me
to something that is already available. Couldn''t seem to find anything
in the builtin modules but I didn''t find glob.glob when I was looking
for how to find filenames either so who knows what it''s called :)

Since I need to deal with a variable list of sources (that is, 2 in one
application, 3 in another and 4-5 in a third), a simple 2-source method
isn''t enough but if it''s better than what I do for 2 sources then I can
make a wrapper for it, since that''s what I do anyway.


I doubt you''ll find a prebuilt one, it''s fairly easy to create your own,
after all. Generators are fairly fast constructs in Python, btw.
Here''s what I whipped up in a few minutes:

def firstIter( value ):
it = iter( value )
try:
return it.next(), it
except StopIteration, err:
return None, None

def inorder( comparision, *args ):
iterators = [
[value,it]
for (value,it) in [ firstIter( arg ) for arg in args ]
if it is not None
]
iterators.sort()
while iterators:
yield iterators[0][0]
try:
value = iterators[0][0] = iterators[0][1].next()
except StopIteration, err:
iterators.pop(0)
else:
if len(iterators) > 1 and comparision( value,
iterators[1][0]) == 1:
iterators.sort()
continue

if __name__ == "__main__":
s1 = [10, 20, 30, 40, 50]
s2 = [15, 25]
s3 = [17, 27, 37]
s4 = []
for value in inorder(cmp, s1, s2, s3, s4):
print value
Anyway, have fun,
Mike

--
________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com


Ok, that one looks more sleak than what I came up with.

Couple of things I learn from your solution, please correct me if I
misunderstood something:

1. list containing other lists will sort itself based on first element
on lists inside ?
2. sort(), pop() is not costly operations

Other than that you seem to not use the cmp operator but that''s easily
fixed.

This one looks much better than mine, here''s what I came up with:

def merge_sorted(iterables, comparer=None):
"""
Generator that will take a list of iterables/generators that is
individually sorted, and then produce
values in a sorted order by taking the lowest value from each
source each call.

@param iterables: List of iterables/generators to retrieve values
from
@type iterables: List of iterables/generators

@param comparer: Optional fn(v1, v2) function that can compare two
values, should return <0 if v1<v2,

0 if v1>v2 and ==0 if v1==v2.


@type comparer: function-object, example: lambda x, y: x-y

@note: The "list of iterables" can actually be anything that
produces a list of iterables, so you can
use a function that yields lists for instance.
"""

# First convert whatever we''re given into a list of sources
iterables = [iterable for iterable in iterables]

# This series of if-statements will determine how many sources we
have and work out sub-problems
# that are manageable.
if len(iterables) != 2:
if len(iterables) == 0:
# List, but no sources
pass
elif len(iterables) == 1:
# Only 1 source, just return its contents
for value in iterables[0]:
yield value
elif len(iterables) == 3:
# 3 sources, sub-divide into 0 <--> (1, 2)
left_iterable = iterables[0]
right_iterable = merge_sorted([iterables[1], iterables[2]],
comparer)
for value in merge_sorted([left_iterable, right_iterable],
comparer):
yield value
elif len(iterables) == 4:
# 4 sources, sub-divide into (0, 1) <--> (2, 3)
left_iterable = merge_sorted([iterables[0], iterables[1]],
comparer)
right_iterable = merge_sorted([iterables[2], iterables[3]],
comparer)
for value in merge_sorted((left_iterable, right_iterable),
comparer):
yield value
elif len(iterables) > 4:
# >4 sources, sub-divide into (0, 1) <--> (2, ...)
left_iterable = merge_sorted([iterables[0], iterables[1]],
comparer)
right_iterable = merge_sorted(iterables[2:], comparer)
for value in merge_sorted((left_iterable, right_iterable),
comparer):
yield value
raise StopIteration

# The method will only get here if there is only two sources, which
is an easy case to handle
i1 = iter(iterables[0])
i2 = iter(iterables[1])

# Grab the first two values from the two sources, if possible
try:
v1 = i1.next()
has_v1 = True
except StopIteration:
has_v1 = False
try:
v2 = i2.next()
has_v2 = True
except StopIteration:
has_v2 = False

# As long as we got values from both generates/iterators/whatever,
compare and yield the lowest of the
# two, and then get the next value from the corresponding source
while has_v1 and has_v2:
# Work out which of v1 and v2 comes first
if comparer is not None:
comp = comparer(v1, v2)
if comp <= 0:
yield_v1 = True
else:
yield_v1 = False
else:
if v1 <= v2:
yield_v1 = True
else:
yield_v1 = False

# Yield the next value, then grab a new value from the
corresponding source
if yield_v1:
yield v1
try:
v1 = i1.next()
except StopIteration:
has_v1 = False
else:
yield v2
try:
v2 = i2.next()
except StopIteration:
has_v2 = False

# When we get here, we got 3 possibilities:
# 1. has_v1 == True, has_v2 == False --> yield rest of v1/i1 and
just exit on StopIteration exception
# 2. has_v1 == False, has_v1 == True --> yield rest of v2/i2 and
just exit on StopIteration exception
# 3. has_v1 == has_v2 == False --> while-loops will skip, function
falls off the end
while has_v1:
yield v1
v1 = i1.next()
while has_v2:
yield v2
v2 = i2.next()


"Lasse V?gs?ther Karlsen" <la*****@gmail.com> wrote:

Ok, that one looks more sleak than what I came up with.



For the most efficient and elegant solution, check out Raymond Hettinger''s reply to:

http://aspn.activestate.com/ASPN/Coo.../Recipe/141934

George


这篇关于将排序列表/迭代器/生成器合并为一个值流...的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆