加入一组有序整数的Python迭代器 [英] Joining a set of ordered-integer yielding Python iterators

查看:140
本文介绍了加入一组有序整数的Python迭代器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个看似简单的问题:给定一个按升序生成整数序列的迭代器列表,编写一个简洁的生成器,只生成每个序列中出现的整数。



昨晚看了几篇论文后,我决定用Python修改一个完全最小的全文索引器,如此处所示(虽然现在这个版本很老了。)



我的问题在于 search() 函数,它必须遍历每个发布列表并仅生成每个列表中显示的文档ID。正如您从上面的链接中看到的那样,我当前的非递归工作尝试非常糟糕。



示例

  postss = [[1,100,142,322,12312],
[2,100,101,322,1221],
[100,142,322,956,1222]]

应该收益:

  [100,322] 

至少有一个优雅的递归函数解决方案,但我想尽可能避免这种情况。但是,一个涉及嵌套生成器表达式, itertools 滥用或任何其他类型的代码高尔夫的解决方案非常受欢迎。 : - )



应该可以安排函数只需要与最小列表中的项目一样多的步骤,并且不需要抽取整个整数集进入记忆。将来,这些列表可以从磁盘中读取,并且大于可用的RAM。



在过去的30分钟里,我已经有了一个想法,但我不能把它变成代码。请记住,这只是为了好玩!


解决方案

 进口heapq,itertools 
def intersect(* its):
表示密钥,值为itertools.groupby(heapq.merge(* its)):
if len(list(values))== len(its):
yield key

>>> list(intersect(* postss))
[100,322]


Here is a seemingly simple problem: given a list of iterators that yield sequences of integers in ascending order, write a concise generator that yields only the integers that appear in every sequence.

After reading a few papers last night, I decided to hack up a completely minimal full text indexer in Python, as seen here (though that version is quite old now).

My problem is with the search() function, which must iterate over each posting list and yield only the document IDs that appear on every list. As you can see from the link above, my current non-recursive 'working' attempt is terrible.

Example:

postings = [[1,   100, 142, 322, 12312],
            [2,   100, 101, 322, 1221],
            [100, 142, 322, 956, 1222]]

Should yield:

[100, 322]

There is at least one elegant recursive function solution to this, but I'd like to avoid that if possible. However, a solution involving nested generator expressions, itertools abuse, or any other kind of code golf is more than welcome. :-)

It should be possible to arrange for the function to only require as many steps as there are items in the smallest list, and without sucking the entire set of integers into memory. In future, these lists may be read from disk, and larger than available RAM.

For the past 30 minutes I've had an idea on the tip of my tongue, but I can't quite get it into code. Remember, this is just for fun!

解决方案

import heapq, itertools
def intersect(*its):
    for key, values in itertools.groupby(heapq.merge(*its)):
        if len(list(values)) == len(its):
            yield key

>>> list(intersect(*postings))
[100, 322]

这篇关于加入一组有序整数的Python迭代器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆