加入一组有序整数产生 Python 迭代器 [英] Joining a set of ordered-integer yielding Python iterators

查看:31
本文介绍了加入一组有序整数产生 Python 迭代器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个看似简单的问题:给定一个按升序生成整数序列的迭代器列表,编写一个简洁的生成器,只生成出现在每个序列中的整数.

Here is a seemingly simple problem: given a list of iterators that yield sequences of integers in ascending order, write a concise generator that yields only the integers that appear in every sequence.

昨晚阅读了几篇论文后,我决定用 Python 编写一个完全最小的全文索引器,如此处所示(尽管该版本现在已经很旧了).

After reading a few papers last night, I decided to hack up a completely minimal full text indexer in Python, as seen here (though that version is quite old now).

我的问题是 search() 函数,它必须遍历每个发布列表并只生成出现在每个列表中的文档 ID.从上面的链接可以看出,我目前的非递归工作"尝试很糟糕.

My problem is with the search() function, which must iterate over each posting list and yield only the document IDs that appear on every list. As you can see from the link above, my current non-recursive 'working' attempt is terrible.

示例:

postings = [[1,   100, 142, 322, 12312],
            [2,   100, 101, 322, 1221],
            [100, 142, 322, 956, 1222]]

应该屈服:

[100, 322]

至少有一个优雅的递归函数解决方案,但如果可能的话,我想避免这种情况.然而,涉及嵌套生成器表达式、itertools 滥用或任何其他类型的代码高尔夫的解决方案是非常受欢迎的.:-)

There is at least one elegant recursive function solution to this, but I'd like to avoid that if possible. However, a solution involving nested generator expressions, itertools abuse, or any other kind of code golf is more than welcome. :-)

应该可以安排函数只需要与最小列表中的项目一样多的步骤,而不需要将整个整数集吸入内存.将来,这些列表可能会从磁盘读取,并且大于可用 RAM.

It should be possible to arrange for the function to only require as many steps as there are items in the smallest list, and without sucking the entire set of integers into memory. In future, these lists may be read from disk, and larger than available RAM.

在过去的 30 分钟里,我的舌尖上有了一个想法,但我无法将它写成代码.请记住,这只是为了好玩!

For the past 30 minutes I've had an idea on the tip of my tongue, but I can't quite get it into code. Remember, this is just for fun!

推荐答案

import heapq, itertools
def intersect(*its):
    for key, values in itertools.groupby(heapq.merge(*its)):
        if len(list(values)) == len(its):
            yield key

>>> list(intersect(*postings))
[100, 322]

这篇关于加入一组有序整数产生 Python 迭代器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆