Fastest method of getting the k smallest numbers in an unsorted list of size N in Python?


Question

What is the fastest method to get the k smallest numbers in an unsorted list of size N using Python?

Is it faster to sort the big list of numbers and then take the k smallest,
or to find the minimum in the list k times, making sure you remove each found minimum from the list before the next search?
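
For concreteness, the two approaches described in the question could be sketched roughly as below. The function names are hypothetical and chosen only for illustration; sorting costs O(N log N), while the repeated-minimum approach costs O(N * K).

```python
def k_smallest_by_sorting(numbers, k):
    # Sort a copy of the input, then slice off the first k items: O(N log N).
    return sorted(numbers)[:k]


def k_smallest_by_repeated_min(numbers, k):
    # Find and remove the minimum k times: O(N * K).
    remaining = list(numbers)  # work on a copy so the caller's list is untouched
    result = []
    for _ in range(min(k, len(remaining))):
        smallest = min(remaining)
        remaining.remove(smallest)  # removes only the first occurrence
        result.append(smallest)
    return result


sample = [7, 2, 9, 4, 2, 8]
print(k_smallest_by_sorting(sample, 3))
print(k_smallest_by_repeated_min(sample, 3))
```

Both functions return the same values for the same input; they differ only in how the cost grows with N and K.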

Recommended answer

You could use a heap queue; it can give you the K largest or smallest numbers out of a list of size N in O(N log K) time.

The Python standard library includes the heapq module, which provides a ready-made heapq.nsmallest() function:

import heapq

k_smallest = heapq.nsmallest(k, input_list)

Internally, this creates a heap of size K from the first K elements of the input list, then iterates over the remaining N-K elements, pushing each onto the heap and then popping off the largest one. Each such push and pop takes O(log K) time, making the overall operation O(N log K).
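
The bounded-heap idea described above can be sketched by hand as follows. This is an illustrative sketch only, not the actual heapq.nsmallest() source; the function name is hypothetical, and the negation trick assumes the inputs are numbers (heapq only provides a min-heap, so values are stored negated to keep the largest candidate at the root).

```python
import heapq
from itertools import islice


def k_smallest_bounded_heap(numbers, k):
    # Illustrative sketch; assumes numeric input so values can be negated.
    if k <= 0:
        return []
    iterator = iter(numbers)
    # Seed the heap with the first k items, negated, so that -heap[0]
    # is always the largest of the k candidates kept so far.
    heap = [-value for value in islice(iterator, k)]
    heapq.heapify(heap)
    for value in iterator:
        if value < -heap[0]:
            # Swap out the current largest candidate in O(log K).
            heapq.heapreplace(heap, -value)
    # Undo the negation and return the candidates in ascending order.
    return sorted(-value for value in heap)


print(k_smallest_bounded_heap([7, 2, 9, 4, 2, 8], 3))
```

Only K items are ever held in the heap, which is where the O(K) extra memory and the O(log K) cost per element come from.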

The function also optimises the following edge cases:


  • If K is 1, the min() function is used instead, giving you an O(N) result.
  • If K >= N, the function uses sorting instead, since an O(N log N) sort is no slower than O(N log K) in that case.
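
A quick way to see these edge cases from the outside is to compare the results with min() and sorted(). The snippet below only checks the observable results on a made-up sample list; it does not show which internal code path is taken.

```python
import heapq

data = [5, 1, 4, 1, 3]

# K == 1: the single result matches min(), wrapped in a list.
assert heapq.nsmallest(1, data) == [min(data)]

# K >= N: the result matches a full sort of the input.
assert heapq.nsmallest(len(data), data) == sorted(data)
assert heapq.nsmallest(100, data) == sorted(data)

# In the general case the K smallest values come back in ascending order.
assert heapq.nsmallest(3, data) == [1, 1, 3]
```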

A better option is to use the introselect algorithm, which offers an O(N) option. The only implementation I am aware of is the numpy.partition() function:

import numpy

# assuming you have a python list, you need to convert to a numpy array first
array = numpy.array(input_list)
# partition, slice back to the k smallest elements, convert back to a Python list
# note: k must be less than len(array), and the k values come back in no particular order
k_smallest = numpy.partition(array, k)[:k].tolist()

Apart from requiring numpy to be installed, this also takes O(N) memory (versus O(K) for heapq), as a copy of the list is created for the partition.
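
To give a feel for how a linear-time selection works, here is a simplified quickselect in pure Python. Note that this is a swapped-in illustration and not numpy's introselect implementation: introselect builds on the same partitioning idea but adds a fallback to guarantee its worst case, whereas this sketch is only O(N) on average. The function name is hypothetical.

```python
import random


def k_smallest_quickselect(numbers, k):
    # Simplified quickselect sketch; NOT numpy's introselect.
    items = list(numbers)  # copy, so this also uses O(N) extra memory
    if k <= 0:
        return []
    if k >= len(items):
        return items
    target = k - 1  # index that the k-th smallest value must end up at
    lo, hi = 0, len(items) - 1
    while lo < hi:
        pivot = items[random.randint(lo, hi)]
        i, j = lo, hi
        # Hoare-style partition of items[lo..hi] around the pivot value.
        while i <= j:
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Now items[lo..j] <= pivot <= items[i..hi]; keep only the side
        # that still contains the target index.
        if target <= j:
            hi = j
        elif target >= i:
            lo = i
        else:
            break  # items[target] is already in its final position
    # The first k items are the k smallest, in no particular order.
    return items[:k]


print(sorted(k_smallest_quickselect([7, 2, 9, 4, 2, 8], 3)))
```

As with numpy.partition(), the selected values are not returned in sorted order, so sort the result afterwards if order matters.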

If you only want the indices, you can use the following for either variant:

heapq.nsmallest(k, range(len(input_list)), key=input_list.__getitem__)  # O(NlogK)
numpy.argpartition(numpy.array(input_list), k)[:k].tolist()  # O(N)
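
As a small worked example of the heapq variant, the snippet below uses a made-up sample list and maps the returned indices back to their values to show what they refer to.

```python
import heapq

input_list = [40, 10, 30, 10, 20]
k = 3

# Rank the indices 0..N-1 by the value stored at each index.
indices = heapq.nsmallest(k, range(len(input_list)), key=input_list.__getitem__)

# Look the values back up through the indices.
values = [input_list[i] for i in indices]
print(indices, values)
```

Here the looked-up values are 10, 10 and 20. The numpy.argpartition() variant selects the same set of values but returns its indices in no particular order.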
