Sort a large collection while showing progress

Question

What is the best way to sort a collection while updating a progress bar? Currently I have code like this:

for (int i = 0; i < items.size(); i++)
{
    progressBar.setValue(i);

    // Uses Collections.binarySearch to find the insertion point:
    CollectionUtils.insertInOrder(sortedItems, items.get(i));
}

This shows progress, but the progress bar slows down as the number of items in sortedItems grows. Does anyone have a better approach? Ideally I'd like to use an interface similar to Collections.sort() so that I can try different sorting algorithms.

Any help would be great!


As a bit of background, this code is pulling back lots of documents (1-10 million) from Lucene and running a custom comparator over them. Sorting them by writing data back onto the disk would be far too slow to be practical. Most of the cost is reading each item off the disk and then running the comparator over the items. My PC has loads of memory, so there are no issues relating to swapping to disk, etc.

In the end I went with Stephen's solution since it was very clean and allowed me to easily add a multi-threaded sorting algorithm.

Recommended answer

You want to be careful here. You've chosen to use an algorithm that incrementally builds a sorted data structure so that (I take it) you can display a progress bar. However, in doing this, you may have chosen a sorting method that is significantly slower than the optimal sort. (Both sorts will be O(NlogN) but there's more to performance than big-O behaviour ...)

If you are concerned that this might be an issue, compare the time to sort a typical collection using TreeMap and Collections.sort. The latter works by copying the input collection into an array, sorting the array and then copying it back. (It works best if the input collection is an ArrayList. If you don't need the result as a mutable collection, you can avoid the final copy back by using Collection.toArray, Arrays.sort and Arrays.asList instead.)
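The array-based route mentioned in the parenthesis can be sketched roughly as follows. This is only an illustration: the class name ArraySortSketch and the method sortedView are made up for the example, and the returned list is a fixed-size view backed by the sorted array, so elements cannot be added to or removed from it.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of any library.
final class ArraySortSketch {

    // Copies the collection into an array, sorts the array in place,
    // and wraps it in a fixed-size list without copying it back.
    @SuppressWarnings("unchecked")
    static <T> List<T> sortedView(Collection<? extends T> items,
                                  Comparator<? super T> comparator) {
        // toArray() returns Object[]; the cast is safe here because the
        // array never escapes except through the List<T> wrapper.
        T[] array = (T[]) items.toArray();
        Arrays.sort(array, comparator);
        return Arrays.asList(array);
    }

    public static void main(String[] args) {
        List<String> input = List.of("pear", "apple", "fig");
        List<String> sorted = sortedView(input, Comparator.<String>naturalOrder());
        System.out.println(sorted);
    }
}
```

Timing this against the incremental insertion loop on a representative data set is the comparison the answer suggests; the sketch itself makes no claim about which is faster for a given workload.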

An alternative idea would be to use a Comparator object that keeps track of the number of times that it has been called, and use that to track the sort's progress. You can make use of the fact that the comparator is typically going to be called roughly N*log(N) times, though you may need to calibrate this against the actual algorithm used (see note 1 below).

Incidentally, counting the calls to the comparator will give you a better indication of progress than you get by counting insertions. You won't get the rate of progress appearing to slow down as you get closer to completing the sort.
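A minimal sketch of such a counting comparator is shown below. The class name CountingComparator, its methods, and the use of N*log2(N) as the expected number of calls are assumptions made for the example; the real constant depends on the sort algorithm and the data, so the estimate would need calibrating as described above. The counter here is a plain field, which is only adequate when the sort and the progress readout run on the same thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical wrapper that counts how often the real comparator is called.
final class CountingComparator<T> implements Comparator<T> {
    private final Comparator<? super T> delegate;
    private final double expectedCalls;
    private long calls;

    CountingComparator(Comparator<? super T> delegate, int n) {
        this.delegate = delegate;
        // Rough guess: N * log2(N) comparisons, never less than 1
        // so that percent() cannot divide by zero.
        double log2n = Math.log(Math.max(n, 2)) / Math.log(2);
        this.expectedCalls = Math.max(1.0, n * log2n);
    }

    @Override
    public int compare(T a, T b) {
        calls++;
        return delegate.compare(a, b);
    }

    long calls() {
        return calls;
    }

    // Estimated completion in the range 0..100, clamped because the
    // actual number of comparisons may exceed the guess.
    int percent() {
        return (int) Math.min(100, Math.round(100.0 * calls / expectedCalls));
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        Random random = new Random(42);
        for (int i = 0; i < 100_000; i++) {
            items.add(random.nextInt());
        }
        CountingComparator<Integer> counting =
                new CountingComparator<>(Comparator.<Integer>naturalOrder(), items.size());
        items.sort(counting);
        System.out.println(counting.calls() + " comparisons, estimate ended at "
                + counting.percent() + "%");
    }
}
```

Because the wrapper is just a Comparator, it can be passed to Collections.sort, List.sort or Arrays.sort unchanged, which keeps the sorting algorithm swappable.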

(You'll have different threads reading and writing the counter, so you need to consider synchronization. Declaring the counter as volatile would work, at the cost of extra memory traffic. You could also just ignore the issue if you are happy for the progress bar to sometimes show stale values ... depending on your platform, etc.)
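One way to handle the cross-thread counter is sketched below, under some assumptions: it uses an AtomicLong rather than a volatile field (a swapped-in choice that also stays correct if several sorting threads increment the counter), it prints to the console instead of updating a real progress bar, and the names SortProgressDemo and sortWithProgress are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: sort on a worker thread, poll progress from the caller.
final class SortProgressDemo {

    // Sorts the list in place and returns the number of comparisons made.
    static long sortWithProgress(List<Integer> items) {
        AtomicLong comparisons = new AtomicLong();
        Comparator<Integer> counting = (a, b) -> {
            comparisons.incrementAndGet();
            return Integer.compare(a, b);
        };

        // Rough N * log2(N) estimate; an assumption that needs calibrating.
        int n = Math.max(items.size(), 2);
        double expected = n * (Math.log(n) / Math.log(2));

        Thread worker = new Thread(() -> items.sort(counting), "sorter");
        worker.start();
        try {
            while (worker.isAlive()) {
                long percent = Math.min(100,
                        Math.round(100.0 * comparisons.get() / expected));
                System.out.println("progress ~" + percent + "%");
                worker.join(50); // wait up to 50 ms before polling again
            }
            worker.join(); // make the sorted list visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sorting", e);
        }
        return comparisons.get();
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        Random random = new Random(1);
        for (int i = 0; i < 1_000_000; i++) {
            items.add(random.nextInt());
        }
        long total = sortWithProgress(items);
        System.out.println("done after " + total + " comparisons");
    }
}
```

In a Swing application the polling loop would more likely live in a timer or similar mechanism that updates the progress bar on the event dispatch thread; that wiring is left out here.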

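1 - There is a problem here. With some algorithms, the number of comparisons can vary dramatically depending on the initial order of the data being sorted. For such an algorithm, there is no way to calibrate the counter that will work in "non-average" cases.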
