Fastest way to obtain the largest X numbers from a very large unsorted list?


Question


I'm trying to obtain the top, say, 100 scores from a list of scores generated by my program. Unfortunately the list is huge (on the order of millions to billions), so sorting is a time-intensive portion of the program.


What's the best way of doing the sorting to get the top 100 scores?


The only two methods I can think of so far are: first, generating all the scores into a massive array, sorting it, and taking the top 100; or second, generating X scores, sorting them and truncating to the top 100, then generating more scores, adding them to the truncated list, and sorting it again.
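For reference, the second method described above (append a batch, sort, truncate) might look roughly like this. It is a minimal sketch assuming scores are `double`s held in `std::vector`s; `keep_top_after_batch` is a hypothetical helper name, not code from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: merge a new batch into the running top list,
// then keep only the k largest scores (largest first).
void keep_top_after_batch(std::vector<double>& top,
                          const std::vector<double>& batch,
                          std::size_t k) {
    // Append the newly generated scores to the previously truncated list.
    top.insert(top.end(), batch.begin(), batch.end());
    // Full descending sort of everything: this is the expensive step.
    std::sort(top.begin(), top.end(), std::greater<double>());
    // Truncate to the k best scores.
    if (top.size() > k) {
        top.resize(k);
    }
}
```

Each call re-sorts the whole combined list, so the cost grows with the batch size even though only k values survive, which is why this approach still feels slow.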


Either way, it still takes more time than I would like. Any ideas on how to do it more efficiently? (I've never taken programming courses before; maybe those of you with comp sci degrees know of efficient algorithms for this, at least that's what I'm hoping.)


Lastly, what's the sorting algorithm used by the standard sort() function in C++?
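On that last question: the C++ standard does not name an algorithm for `std::sort`; since C++11 it only requires O(N log N) comparisons, and mainstream implementations typically use an introsort-style hybrid (quicksort with a heapsort fallback and insertion sort for small ranges). For a top-K problem the standard library also provides `std::partial_sort` and `std::partial_sort_copy`, which only order the first K positions (roughly N log K comparisons). A minimal sketch, assuming all scores are already in a `std::vector<double>` (which the question would like to avoid for billions of scores); `top_k_partial_sort` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: returns the k largest scores, largest first,
// without fully sorting the input.
std::vector<double> top_k_partial_sort(const std::vector<double>& scores,
                                       std::size_t k) {
    std::vector<double> out(std::min(k, scores.size()));
    // With std::greater, the "first" elements in sorted order are the largest.
    std::partial_sort_copy(scores.begin(), scores.end(),
                           out.begin(), out.end(),
                           std::greater<double>());
    return out;
}
```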

Thanks,

Edit: Just for anyone who is curious...


I ran a few timing trials before and after; here are the results:


Old program (performs sorting after each outer loop iteration):

top 100 scores: 147 seconds
top  10 scores: 147 seconds
top   1 scores: 146 seconds
Sorting disabled: 55 seconds


New program (tracking only the top scores and using the default sorting function):

top 100 scores: 350 seconds <-- hmm...worse than before
top  10 scores: 103 seconds 
top   1 scores:  69 seconds 
Sorting disabled: 51 seconds

New program (with a rewritten sorting algorithm):

top 100 scores: 71 seconds <-- Very nice!
top  10 scores: 52 seconds
top   1 scores: 51 seconds
Sorting disabled: 50 seconds


Done on a Core 2, 1.6 GHz... I can't wait until my Core i7 860 arrives...

There are a lot of other, even more aggressive optimizations for me to work out (mainly in reducing the number of iterations I run), but as it stands right now, the speed is more than good enough; I might not even bother working out those algorithm optimizations.

Thanks for everyone's input!

Recommended answer


    1. Take the first 100 scores and sort them in an array.
    2. Take the next score and insertion-sort it into the array (starting at the "small" end).
    3. Drop the 101st value.
    4. Continue with the next value at step 2 until done.
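The steps above can be sketched in C++ as follows. This is an illustration under assumptions, not the answerer's code: scores are `double`s, `TopScores` is a hypothetical class name, and step 1 (sorting the first 100) is folded into the same insertion loop, which produces the same sorted array.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class: keeps the k best scores in a descending array.
class TopScores {
public:
    explicit TopScores(std::size_t k) : k_(k) { best_.reserve(k + 1); }

    void add(double score) {
        if (k_ == 0) return;
        // Fast path once the array is full: the new score is not larger than
        // the smallest candidate, so the insertion sort aborts immediately.
        if (best_.size() == k_ && score <= best_.back()) return;
        // Step 2: append, then shift smaller values toward the end until the
        // array is descending again (insertion sort from the "small" end).
        best_.push_back(score);
        std::size_t i = best_.size() - 1;
        while (i > 0 && best_[i - 1] < score) {
            best_[i] = best_[i - 1];
            --i;
        }
        best_[i] = score;
        // Step 3: drop the (k+1)-th value.
        if (best_.size() > k_) best_.pop_back();
    }

    // The current top scores, largest first.
    const std::vector<double>& best() const { return best_; }

private:
    std::size_t k_;
    std::vector<double> best_;
};
```

Usage: construct `TopScores top(100);` once and call `top.add(score)` for every generated score, so the full list never needs to be stored.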


    Over time, the list will resemble the 100 largest values more and more, so increasingly often the insertion sort aborts immediately, because the new value is smaller than the smallest of the current top-100 candidates.
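When a new value does qualify, the insertion step can shift up to K elements (O(K) per accepted score); this worst case is hit when scores arrive in increasing order. A common alternative, swapped in here and not what the answer describes, is a min-heap of size K: the smallest candidate sits at the top, and replacing it costs O(log K). A minimal sketch assuming `double` scores in a vector; `top_k_heap` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the k largest scores, largest first,
// using a min-heap that never holds more than k values.
std::vector<double> top_k_heap(const std::vector<double>& scores, std::size_t k) {
    if (k == 0) return {};
    std::priority_queue<double, std::vector<double>, std::greater<double>> heap;
    for (double s : scores) {
        if (heap.size() < k) {
            heap.push(s);
        } else if (s > heap.top()) {
            // The new score beats the smallest candidate: replace it.
            heap.pop();
            heap.push(s);
        }
    }
    // Drain smallest-first, then reverse so the result is largest-first.
    std::vector<double> out;
    out.reserve(heap.size());
    while (!heap.empty()) {
        out.push_back(heap.top());
        heap.pop();
    }
    std::reverse(out.begin(), out.end());
    return out;
}
```

In a program that generates scores on the fly, the loop body would run once per generated score instead of iterating over a stored vector. For K = 100 either approach is usually fast; the heap mainly helps when K is large or the input tends to increase.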
