Java: Parallelizing quick sort via multi-threading

Question

I am experimenting with parallelizing algorithms in Java. I began with merge sort, and posted my attempt in this question. My revised attempt is in the code below, where I now try to parallelize quick sort.

Are there any rookie mistakes in my multi-threaded implementation or approach to this problem? If not, shouldn't I expect more than a 32% speed increase between a sequential and a parallelized algorithm on a dual-core machine (see timings at bottom)?

Here is the multithreading algorithm:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    public class ThreadedQuick extends Thread
    {
        final int MAX_THREADS = Runtime.getRuntime().availableProcessors();

        CountDownLatch doneSignal;
        static int num_threads = 1;

        int[] my_array;
        int start, end;

        public ThreadedQuick(CountDownLatch doneSignal, int[] array, int start, int end) {
            this.my_array = array;
            this.start = start;
            this.end = end;
            this.doneSignal = doneSignal;
        }

        public static void reset() {
            num_threads = 1;
        }

        public void run() {
            quicksort(my_array, start, end);
            doneSignal.countDown();
            num_threads--;
        }

        public void quicksort(int[] array, int start, int end) {
            int len = end-start+1;

            if (len <= 1)
                return;

            // medianOfThree and swap are helper methods not shown in the question.
            int pivot_index = medianOfThree(array, start, end);
            int pivotValue = array[pivot_index];

            swap(array, pivot_index, end);

            int storeIndex = start;
            for (int i = start; i < end; i++) {
                if (array[i] <= pivotValue) {
                    swap(array, i, storeIndex);
                    storeIndex++;
                }
            }

            swap(array, storeIndex, end);

            if (num_threads < MAX_THREADS) {
                num_threads++;

                CountDownLatch completionSignal = new CountDownLatch(1);

                // Sort the left partition in a new thread and the right partition in this one.
                new ThreadedQuick(completionSignal, array, start, storeIndex - 1).start();
                quicksort(array, storeIndex + 1, end);

                try {
                    completionSignal.await(1000, TimeUnit.SECONDS);
                } catch(Exception ex) {
                    ex.printStackTrace();
                }
            } else {
                quicksort(array, start, storeIndex - 1);
                quicksort(array, storeIndex + 1, end);
            }
        }
    }
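The code above calls medianOfThree and swap without showing them, and the timings below compare it with "a similar sequential quick sort algorithm". The following is a minimal sketch of such a sequential baseline, assuming a conventional median-of-three (the index of the median of the first, middle and last elements). The class name SequentialQuick and both helper bodies are hypothetical, not the asker's actual code.

```java
import java.util.Arrays;
import java.util.Random;

class SequentialQuick {

    // Hypothetical helper: returns whichever of lo, mid, hi holds the
    // median of array[lo], array[mid] and array[hi].
    static int medianOfThree(int[] array, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        int a = array[lo], b = array[mid], c = array[hi];
        if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
        if ((b <= a && a <= c) || (c <= a && a <= b)) return lo;
        return hi;
    }

    // Hypothetical helper: exchanges two elements.
    static void swap(int[] array, int i, int j) {
        int tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
    }

    // Same partitioning scheme as the threaded version, with no threads.
    static void quicksort(int[] array, int start, int end) {
        if (end - start + 1 <= 1) return;

        int pivotIndex = medianOfThree(array, start, end);
        int pivotValue = array[pivotIndex];
        swap(array, pivotIndex, end);           // park the pivot at the end

        int storeIndex = start;
        for (int i = start; i < end; i++) {
            if (array[i] <= pivotValue) {
                swap(array, i, storeIndex);
                storeIndex++;
            }
        }
        swap(array, storeIndex, end);           // pivot moves to its final slot

        quicksort(array, start, storeIndex - 1);
        quicksort(array, storeIndex + 1, end);
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(1_000, 0, Integer.MAX_VALUE).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);
        quicksort(data, 0, data.length - 1);
        System.out.println(Arrays.equals(data, expected)); // true
    }
}
```

Note that this partitioning sends elements equal to the pivot to the left, so arrays with many duplicates degrade towards quadratic time; random values from Random.nextInt(Integer.MAX_VALUE), as in the benchmark, rarely repeat.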

Here is how I start it off:

    ThreadedQuick.reset();
    CountDownLatch completionSignal = new CountDownLatch(1);
    new ThreadedQuick(completionSignal, array, 0, array.length-1).start();
    try {
        completionSignal.await(1000, TimeUnit.SECONDS);
    } catch(Exception ex){
        ex.printStackTrace();
    }

I tested this against Arrays.sort and a similar sequential quick sort algorithm. Here are the timing results on an Intel dual-core Dell laptop, in seconds:

Elements: 500,000, sequential: 0.068592, threaded: 0.046871, Arrays.sort: 0.079677

Elements: 1,000,000, sequential: 0.14416, threaded: 0.095492, Arrays.sort: 0.167155

Elements: 2,000,000, sequential: 0.301666, threaded: 0.205719, Arrays.sort: 0.350982

Elements: 4,000,000, sequential: 0.623291, threaded: 0.424119, Arrays.sort: 0.712698

Elements: 8,000,000, sequential: 1.279374, threaded: 0.859363, Arrays.sort: 1.487671

Each number above is the average time of 100 tests, throwing out the 3 lowest and 3 highest cases. I used Random.nextInt(Integer.MAX_VALUE) to generate an array for each test, which was initialized once every 10 tests with the same seed. Each test consisted of timing the given algorithm with System.nanoTime. I rounded to six decimal places after averaging. And obviously, I did check to see if each sort worked.
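The measurement procedure described above (time each run with System.nanoTime, discard the 3 fastest and 3 slowest of 100 runs, average the rest) can be sketched as follows. The names TrimmedBenchmark, trimmedMean and trimmedMeanSeconds are illustrative, and re-creating the Random with the same seed every 10 runs is only one reading of the seeding described in the question; this is not the asker's harness.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Consumer;

class TrimmedBenchmark {

    // Mean of the samples after dropping the `trim` smallest and `trim` largest values.
    static double trimmedMean(long[] samples, int trim) {
        if (samples.length <= 2 * trim) {
            throw new IllegalArgumentException("not enough samples to trim");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = trim; i < sorted.length - trim; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2 * trim);
    }

    // Times `sorter` on `runs` freshly generated arrays; returns the trimmed mean in seconds.
    // Assumption: the Random is re-created with the same seed every 10 runs.
    static double trimmedMeanSeconds(Consumer<int[]> sorter, int size, int runs, long seed) {
        long[] nanos = new long[runs];
        Random rnd = null;
        for (int r = 0; r < runs; r++) {
            if (r % 10 == 0) {
                rnd = new Random(seed);
            }
            int[] array = new int[size];
            for (int i = 0; i < size; i++) {
                array[i] = rnd.nextInt(Integer.MAX_VALUE);
            }
            long t0 = System.nanoTime();
            sorter.accept(array);              // only the sort itself is timed
            nanos[r] = System.nanoTime() - t0;
        }
        return trimmedMean(nanos, 3) / 1e9;
    }

    public static void main(String[] args) {
        double secs = trimmedMeanSeconds(Arrays::sort, 500_000, 100, 12345L);
        System.out.printf("Arrays.sort, 500,000 elements: %.6f s%n", secs);
    }
}
```

Any sort with the shape int[] -> void can be passed as the Consumer, so the sequential and threaded versions can be timed with the same harness.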

As you can see, there is about a 32% increase in speed between the sequential and threaded cases in every set of tests. As I asked above, shouldn't I expect more than that?
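A note on the arithmetic: the threaded runs take roughly 32% less time than the sequential ones. Expressed as a speedup (sequential time divided by threaded time), that is roughly 1.46x to 1.51x, whereas a perfect 2x on two cores would mean a 50% reduction in time. The small helper below recomputes both ratios from the timings listed above; the class and method names are illustrative.

```java
class SpeedupFromTimings {

    // Speedup = sequential time / parallel time (2.0 is the ideal on two cores).
    static double speedup(double sequential, double threaded) {
        return sequential / threaded;
    }

    // Fraction of running time saved = 1 - threaded / sequential.
    static double timeReduction(double sequential, double threaded) {
        return 1.0 - threaded / sequential;
    }

    public static void main(String[] args) {
        // Average timings in seconds, copied from the question.
        int[] elements = {500_000, 1_000_000, 2_000_000, 4_000_000, 8_000_000};
        double[] sequential = {0.068592, 0.14416, 0.301666, 0.623291, 1.279374};
        double[] threaded   = {0.046871, 0.095492, 0.205719, 0.424119, 0.859363};

        for (int i = 0; i < elements.length; i++) {
            System.out.printf("%,d elements: %.1f%% less time, speedup %.2fx%n",
                    elements[i],
                    100 * timeReduction(sequential[i], threaded[i]),
                    speedup(sequential[i], threaded[i]));
        }
    }
}
```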

Solution

Making num_threads static can cause problems: every thread reads and updates it without any synchronization, so it is highly likely that you will end up with more than MAX_THREADS threads running at some point.
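If you did want to keep a shared thread counter, the check and the increment have to happen as one atomic step. The sketch below uses AtomicInteger.compareAndSet so that a fork is only allowed if the counter was still below the limit at the moment it was incremented. ThreadBudget, tryAcquire and release are hypothetical names, not part of the question's code or of any library; the answer's own recommendation (a pool plus a precomputed partition size) avoids the counter altogether.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ThreadBudget {
    private final int max;
    private final AtomicInteger active;

    ThreadBudget(int max, int initiallyActive) {
        this.max = max;
        this.active = new AtomicInteger(initiallyActive);
    }

    // Atomically claims a slot; returns false (and changes nothing) if the limit is reached.
    boolean tryAcquire() {
        while (true) {
            int current = active.get();
            if (current >= max) {
                return false;
            }
            if (active.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread changed the counter between get() and compareAndSet(); retry.
        }
    }

    // Returns a slot when a forked thread finishes.
    void release() {
        active.decrementAndGet();
    }

    int activeCount() {
        return active.get();
    }

    public static void main(String[] args) {
        ThreadBudget budget = new ThreadBudget(2, 1);
        // In the question's quicksort, "if (num_threads < MAX_THREADS) num_threads++"
        // would become "if (budget.tryAcquire())", with release() called when the thread ends.
        System.out.println(budget.tryAcquire()); // true: 1 -> 2
        System.out.println(budget.tryAcquire()); // false: already at the limit
        budget.release();
        System.out.println(budget.tryAcquire()); // true again
    }
}
```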

Probably the reason you don't get a full doubling in performance is that your quick sort cannot be fully parallelised. Note that the first call to quicksort makes a pass through the whole array in the initial thread before anything really runs in parallel. There is also an overhead to parallelising an algorithm, in the form of context switching and mode transitions when farming work off to separate threads.
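The "cannot be fully parallelised" point can be made rough-and-ready quantitative with Amdahl's law, a standard model that the answer itself does not invoke: if a fraction p of the work parallelises perfectly over n cores and the rest is serial, the speedup is at most 1 / ((1 - p) + p / n). Solving for p gives p = (1 - 1/s) / (1 - 1/n) for an observed speedup s. This sketch is illustrative only; it ignores memory bandwidth and uneven partition sizes, so the inferred p lumps every kind of overhead together.

```java
class Amdahl {

    // Upper bound on speedup when fraction p (0..1) of the work runs on n cores.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    // Inverse of speedup(): the parallel fraction implied by an observed speedup s on n cores.
    static double parallelFraction(double s, int n) {
        return (1.0 - 1.0 / s) / (1.0 - 1.0 / n);
    }

    public static void main(String[] args) {
        // Observed ratio for 8,000,000 elements in the question: 1.279374 s / 0.859363 s.
        double observed = 1.279374 / 0.859363;
        double p = parallelFraction(observed, 2);
        System.out.printf("speedup %.2f on 2 cores implies p = %.2f%n", observed, p);
    }
}
```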

Have a look at the Fork/Join framework; this problem would probably fit quite neatly there.
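A minimal sketch of what a Fork/Join version might look like, assuming Java 8 or later (for ForkJoinPool.commonPool()). It keeps the question's partitioning scheme, splits each large range into two RecursiveAction subtasks, and falls back to plain sequential recursion below a cutoff. The class name ForkJoinQuick, the SEQUENTIAL_CUTOFF of 10,000 and the middle-element pivot (used instead of the question's medianOfThree, which is not shown) are assumptions for illustration, not tuned values.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ForkJoinQuick extends RecursiveAction {
    // Assumed cutoff: ranges this small are sorted sequentially instead of split.
    static final int SEQUENTIAL_CUTOFF = 10_000;

    private final int[] array;
    private final int start, end; // inclusive bounds, as in the question

    ForkJoinQuick(int[] array, int start, int end) {
        this.array = array;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start + 1 <= SEQUENTIAL_CUTOFF) {
            sequentialSort(array, start, end);
            return;
        }
        int p = partition(array, start, end);
        // invokeAll returns once both halves are sorted; a worker waiting on a join
        // can help run other pending tasks instead of simply blocking.
        invokeAll(new ForkJoinQuick(array, start, p - 1),
                  new ForkJoinQuick(array, p + 1, end));
    }

    // Partition around the middle element; returns the pivot's final index.
    static int partition(int[] a, int start, int end) {
        swap(a, start + (end - start) / 2, end);
        int pivotValue = a[end];
        int storeIndex = start;
        for (int i = start; i < end; i++) {
            if (a[i] <= pivotValue) {
                swap(a, i, storeIndex);
                storeIndex++;
            }
        }
        swap(a, storeIndex, end);
        return storeIndex;
    }

    static void sequentialSort(int[] a, int start, int end) {
        if (start >= end) return;
        int p = partition(a, start, end);
        sequentialSort(a, start, p - 1);
        sequentialSort(a, p + 1, end);
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Sorts the whole array on the shared common pool.
    static void sort(int[] array) {
        ForkJoinPool.commonPool().invoke(new ForkJoinQuick(array, 0, array.length - 1));
    }

    public static void main(String[] args) {
        int[] data = new Random(1).ints(2_000_000, 0, Integer.MAX_VALUE).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data);
        System.out.println(Arrays.equals(data, expected)); // true
    }
}
```

Compared with submitting Runnables to a fixed pool and calling Future.get(), this leaves the scheduling of subtasks to the pool's work stealing, so no thread counter or precomputed fork limit is needed.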

A couple of points on the implementation. Implement Runnable rather than extending Thread. Extending Thread should be reserved for when you create some new kind of Thread; when you just want some job to run in parallel, you are better off with Runnable. Implementing Runnable also leaves you free to extend another class, which gives you more flexibility in OO design.

Use a thread pool that is restricted to the number of threads available in the system. Also, don't use num_threads to decide whether or not to fork off a new thread; you can calculate this up front. Use a minimum partition size equal to the size of the total array divided by the number of available processors. Something like:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;

    public class ThreadedQuick implements Runnable {

        public static final int MAX_THREADS = Runtime.getRuntime().availableProcessors();
        // Fixed-size pool; its threads are non-daemon, so call executor.shutdown() when done.
        static final ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS);

        final int[] my_array;
        final int start, end;

        private final int minPartitionSize;

        public ThreadedQuick(int minPartitionSize, int[] array, int start, int end) {
            this.minPartitionSize = minPartitionSize;
            this.my_array = array;
            this.start = start;
            this.end = end;
        }

        public void run() {
            quicksort(my_array, start, end);
        }

        public void quicksort(int[] array, int start, int end) {
            int len = end - start + 1;

            if (len <= 1)
                return;

            // medianOfThree and swap are the question's helpers (not shown).
            int pivot_index = medianOfThree(array, start, end);
            int pivotValue = array[pivot_index];

            swap(array, pivot_index, end);

            int storeIndex = start;
            for (int i = start; i < end; i++) {
                if (array[i] <= pivotValue) {
                    swap(array, i, storeIndex);
                    storeIndex++;
                }
            }

            swap(array, storeIndex, end);

            if (len > minPartitionSize) {
                // Sort the left partition on a pool thread and the right partition here.
                ThreadedQuick quick = new ThreadedQuick(minPartitionSize, array, start, storeIndex - 1);
                Future<?> future = executor.submit(quick);
                quicksort(array, storeIndex + 1, end);

                try {
                    // Blocks until the left partition is sorted. If every pool thread ends up
                    // waiting here, the queued subtask has no thread left to run on.
                    future.get(1000, TimeUnit.SECONDS);
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            } else {
                quicksort(array, start, storeIndex - 1);
                quicksort(array, storeIndex + 1, end);
            }
        }
    }

You can kick it off by doing:

    ThreadedQuick quick = new ThreadedQuick(array.length / ThreadedQuick.MAX_THREADS, array, 0, array.length - 1);
    quick.run();

This will start the sort in the same thread, which avoids an unnecessary thread hop at start up.

Caveat: I'm not sure the above implementation will actually be faster, as I haven't benchmarked it.
