What is the time complexity of heapq.nlargest?

Question

I was watching this PyCon talk, and at 34:30 the speaker says that getting the t largest elements of a list of n elements can be done in O(t + n).

How is that possible? My understanding is that creating the heap will be O(n), but what is the complexity of nlargest itself: is it O(n + t) or O(t)? And what is the actual algorithm?

Solution

The speaker is wrong in this case. The actual cost is O(n * log(t)). Heapify is called only on the first t elements of the iterable. That's O(t), but is insignificant if t is much smaller than n. Then all the remaining elements are added to this "little heap" via heappushpop, one at a time. That takes O(log(t)) time per invocation of heappushpop. The length of the heap remains t throughout. At the very end, the heap is sorted, which costs O(t * log(t)), but that's also insignificant if t is much smaller than n.
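
The steps described above can be sketched roughly as follows. This is only an illustration of the described algorithm, not CPython's actual source; the function name nlargest_sketch is made up for this example, and the real heapq.nlargest also handles a key argument and some special cases.

```python
import heapq
from itertools import islice

def nlargest_sketch(t, iterable):
    """Return the t largest items, largest first (hypothetical helper)."""
    if t <= 0:
        return []
    it = iter(iterable)
    # Take the first t elements and turn them into a min-heap: O(t).
    heap = list(islice(it, t))
    heapq.heapify(heap)
    # Feed every remaining element through the heap. Each call pushes the
    # new item and pops the smallest, so the heap never grows past t items.
    # Cost: O(log t) per element, O(n log t) overall.
    for item in it:
        heapq.heappushpop(heap, item)
    # Sort the t survivors in descending order: O(t log t).
    heap.sort(reverse=True)
    return heap

print(nlargest_sketch(3, [5, 1, 9, 3, 7, 2]))
```

Because heap[0] is always the smallest of the t items kept so far, any new element that is not larger than it is returned by heappushpop right away and the heap is left unchanged.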

Fun with Theory ;-)

There are reasonably easy ways to find the t'th-largest element in expected O(n) time; for example, see here. There are harder ways to do it in worst-case O(n) time. Then, in another pass over the input, you could output the t elements >= the t-th largest (with tedious complications in case of duplicates). So the whole job can be done in O(n) time.
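
As a rough sketch of that idea, the following uses quickselect with a random pivot, which is one of the expected-O(n) selection methods (the source does not say which method the linked page describes, so this choice is an assumption). The names kth_largest and t_largest_linear are hypothetical. Duplicates of the threshold are handled by padding the result with copies of it, which is a simplification that only suits plain values such as numbers.

```python
import random

def kth_largest(items, k):
    """Return the k-th largest item (1-based) in expected O(n) time."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        greater = [x for x in items if x > pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k <= len(greater):
            # The answer lies among the items larger than the pivot.
            items = greater
        elif k <= len(greater) + equal_count:
            # The pivot itself is the k-th largest.
            return pivot
        else:
            # Skip everything >= pivot and continue among smaller items.
            k -= len(greater) + equal_count
            items = [x for x in items if x < pivot]

def t_largest_linear(values, t):
    """Return the t largest values in arbitrary order, expected O(n) time."""
    values = list(values)  # needs O(n) memory, unlike the heap approach
    if t <= 0:
        return []
    if t >= len(values):
        return values
    threshold = kth_largest(values, t)
    # Second pass: everything strictly above the threshold is in the result.
    result = [x for x in values if x > threshold]
    # Fill the remaining slots with copies of the threshold (duplicates).
    result.extend([threshold] * (t - len(result)))
    return result

print(sorted(t_largest_linear([5, 1, 9, 3, 7, 2, 9], 3), reverse=True))
```

Note that the result comes back unsorted; sorting it would add O(t * log(t)) on top.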

But those ways require O(n) memory too. Python doesn't use them. An advantage of what's actually implemented is that the worst-case "extra" memory burden is O(t), and that can be very significant when the input is, for example, a generator producing a great many values.
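
To illustrate the memory point, here is a small example that passes a generator to heapq.nlargest, so the input is consumed one item at a time. The generator squares_mod and its parameters are invented for this example.

```python
import heapq

def squares_mod(n, m=1_000_003):
    """Hypothetical generator: yields n values without storing them."""
    for i in range(n):
        yield (i * i) % m

# The generator is consumed lazily; only about 3 values are held at once.
top = heapq.nlargest(3, squares_mod(100_000))
print(top)
```

A selection-based approach would first have to materialize all 100,000 values in a list before it could partition them.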
