Retrieving the top 100 numbers from one hundred million numbers
Problem description
One of my friends was asked how to retrieve the 100 largest numbers from one hundred million numbers in a recent job interview. Do you have any ideas for an efficient way to solve it?
Recommended answer
Run them all through a min-heap of size 100: fill it with the first 100 numbers, then for each remaining input number k, replace the current min m with max(k, m). Afterwards the heap holds the 100 largest inputs.
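Because the funnel only ever keeps the current 100 candidates, it does not need all one hundred million numbers in memory at once. Below is a minimal sketch in Python, assuming the inputs arrive as any iterable (for example a generator reading a file); the name top_k is hypothetical. Each input costs one comparison plus at most one O(log k) heap update, so a pass over N inputs is O(N log k) time and O(k) memory, with N = 10^8 and k = 100 here.

```python
import heapq
import random
from typing import Iterable, List


def top_k(k: int, numbers: Iterable[int]) -> List[int]:
    """Return the k largest values from any iterable, largest first.

    Only the current top-k candidates are kept (in a min-heap), so the
    input can be a lazy generator over a very large stream.
    """
    if k <= 0:
        return []
    heap: List[int] = []  # min-heap; heap[0] is the smallest value kept so far
    for x in numbers:
        if len(heap) < k:
            heapq.heappush(heap, x)     # still filling the first k slots
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # evict the current minimum, insert x
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    # Hypothetical stand-in for the huge input: a lazy generator of random ints.
    stream = (random.randrange(10**9) for _ in range(1_000_000))
    print(top_k(100, stream)[:5])
```

heapq.heapreplace pops the minimum and pushes the new value in one sift, which is cheaper than a separate heappop followed by heappush. The final sort costs O(k log k), negligible for k = 100.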
A search engine like Lucene can use this method, with refinements, to choose the most-relevant search answers.
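One refinement a search engine needs is ranking by a relevance score rather than by the raw value, and staying deterministic when scores tie. The sketch below is a hypothetical Python illustration of that idea, not Lucene's actual API or implementation: the (doc_id, score) hit format, the function name top_k_by_score, and the rule that the earlier document wins a tie are all assumptions.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_by_score(k: int, hits: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep the k highest-scoring (doc_id, score) pairs, best first.

    Hypothetical helper: heap entries are (score, -seq, doc_id). The unique
    -seq breaks score ties, so doc_ids are never compared, and on equal
    scores the earlier document is kept.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, int, str]] = []  # min-heap of the current top k
    for seq, (doc_id, score) in enumerate(hits):
        entry = (score, -seq, doc_id)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # evict the weakest kept hit
    return [(doc_id, score) for score, _, doc_id in sorted(heap, reverse=True)]


if __name__ == "__main__":
    hits = [("a", 0.5), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
    print(top_k_by_score(2, hits))  # [('b', 0.9), ('d', 0.7)]
```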
Edit: I fail the interview; I got the details wrong twice (after having done this before, in production). Here's code to check it; it's almost the same as Python's standard heapq.nlargest():
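import heapq

def funnel(n, numbers):
    """Return the n largest items of the list `numbers`, in heap order (not sorted)."""
    if n == 0:
        return []
    # Seed a min-heap with the first n numbers; heap[0] is the smallest one kept.
    heap = numbers[:n]
    heapq.heapify(heap)
    for k in numbers[n:]:
        # Only a number larger than the current minimum can enter the top n.
        if heap[0] < k:
            # Pop the minimum and push k in a single O(log n) operation.
            heapq.heapreplace(heap, k)
    return heap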
>>> funnel(4, [3,1,4,1,5,9,2,6,5,3,5,8])
[5, 8, 6, 9]
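funnel returns its result in heap order ([5, 8, 6, 9]), not sorted; sorted([5, 8, 6, 9], reverse=True) gives [9, 8, 6, 5]. For comparison, here is a short sketch of the standard-library call the answer mentions: heapq.nlargest(n, iterable) returns the n largest items already sorted from largest to smallest, and it accepts any iterable, including a generator.

```python
import heapq

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]

# Same four values as funnel(4, data), but sorted largest first.
top4 = heapq.nlargest(4, data)
print(top4)  # [9, 8, 6, 5]

# A generator works too, so the input never has to be materialized as a list.
squares = heapq.nlargest(3, (x * x for x in range(10)))
print(squares)  # [81, 64, 49]
```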