How does heapq.nlargest work?

Question

I was looking at this PyCon talk at 34:30, and the speaker says that getting the t largest elements of a list of n elements can be done in O(t + n).

How is that possible? My understanding is that creating the heap will be O(n), but what's the complexity of nlargest itself, is it O(n + t) or O(t) (and what's the actual algorithm)?

Answer

The speaker is wrong in this case. The actual cost is O(n * log(t)). Heapify is called only on the first t elements of the iterable. That's O(t), but is insignificant if t is much smaller than n. Then all the remaining elements are added to this "little heap" via heappushpop, one at a time. That takes O(log(t)) time per invocation of heappushpop. The length of the heap remains t throughout. At the very end, the heap is sorted, which costs O(t * log(t)), but that's also insignificant if t is much smaller than n.
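
The steps above can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not CPython's actual source: the name `nlargest_sketch` is hypothetical, and it ignores the `key` argument and the tie-breaking details that the real `heapq.nlargest` handles.

```python
import heapq
from itertools import islice


def nlargest_sketch(t, iterable):
    """Hypothetical simplified nlargest: return the t largest items, largest first."""
    if t <= 0:
        return []
    it = iter(iterable)
    # Take only the first t elements and heapify them: O(t).
    # This is a min-heap, so heap[0] is the smallest of the current top-t candidates.
    heap = list(islice(it, t))
    heapq.heapify(heap)
    # Feed every remaining element through heappushpop: O(log t) each.
    # If x is not larger than heap[0], x itself is popped right back out;
    # either way the heap size stays exactly t.
    for x in it:
        heapq.heappushpop(heap, x)
    # Final sort of the t survivors: O(t log t).
    heap.sort(reverse=True)
    return heap
```

The total is O(t) + O((n - t) log t) + O(t log t), i.e. O(n log t), and the only extra storage is the t-element heap.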

There are reasonably easy ways to find the t-th largest element in expected O(n) time; for example, see here. There are harder ways to do it in worst-case O(n) time. Then, in another pass over the input, you could output the t elements >= the t-th largest (with tedious complications in case of duplicates). So the whole job can be done in O(n) time.
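
As one illustration of the expected-O(n) route, here is a sketch using randomized quickselect (a swapped-in example technique; the answer's own link is not preserved). The names `kth_largest` and `top_t_linear` are hypothetical. The second pass shows the duplicate handling: copies equal to the threshold are admitted only until t elements have been collected.

```python
import random


def kth_largest(values, k):
    """Randomized quickselect: expected O(n) time; returns the k-th largest (1-based)."""
    items = list(values)  # works on a copy, so O(n) extra memory
    if not 1 <= k <= len(items):
        raise ValueError("k out of range")
    while True:
        pivot = random.choice(items)
        greater = [x for x in items if x > pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k <= len(greater):
            items = greater  # answer lies strictly above the pivot
        elif k <= len(greater) + equal_count:
            return pivot  # answer is the pivot value itself
        else:
            k -= len(greater) + equal_count
            items = [x for x in items if x < pivot]  # answer lies below the pivot


def top_t_linear(values, t):
    """Return t largest elements (unordered) in expected O(n) time and O(n) memory."""
    data = list(values)
    if t <= 0:
        return []
    if t >= len(data):
        return data[:]
    threshold = kth_largest(data, t)
    # Fewer than t elements are strictly greater than the threshold,
    # so the remaining slots must be filled with copies equal to it.
    ties_left = t - sum(1 for x in data if x > threshold)
    result = []
    for x in data:
        if x > threshold:
            result.append(x)
        elif x == threshold and ties_left > 0:
            result.append(x)
            ties_left -= 1
    return result
```

The result is not sorted; sorting it afterwards would add O(t log t), as in the heap-based version.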

But those ways require O(n) memory too. Python doesn't use them. An advantage of what's actually implemented is that the worst-case "extra" memory burden is O(t), and that can be very significant when the input is, for example, a generator producing a great many values.
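
For example, `heapq.nlargest` can consume a generator directly. In CPython a generator has no `len()`, so `nlargest` takes the heap path and keeps only about t items alive while the stream is consumed, instead of materializing all n values. The generator `sensor_stream` below is a hypothetical data source used only for illustration.

```python
import heapq
import random


def sensor_stream(n, seed=0):
    """Hypothetical data source: lazily yields n pseudo-random readings."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.random()


# The stream is never stored as a list; nlargest keeps a heap of 10 items.
top10 = heapq.nlargest(10, sensor_stream(200_000))
```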
