C OpenMP并行气泡排序 [英] C OpenMP parallel bubble sort

查看:85
本文介绍了C OpenMP并行气泡排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个并行气泡排序算法的实现(奇偶转置排序),使用OpenMP.但是,经过测试,它比串行版本慢(约10%),尽管我有4核处理器(由于Intel超线程,所以2实x 2).我检查了内核是否已实际使用,并且在运行该程序时可以看到它们各占100%.因此,我认为我在算法的实现上犯了一个错误.

I have an implementation of parallel bubble sort algorithm(Odd-Even transposition sort) in C, using OpenMP. However, after I tested it it's slower than the serial version(by about 10%) although I have a 4 cores processor ( 2 real x 2 because of Intel hyperthreading). I have checked to see if the cores are actually used and I can see them at 100% each when running the program. Therefore I think I did a mistake in the implementation the algorithm.

我正在使用内核2.6.38-8-generic的linux.

I am using linux with kernel 2.6.38-8-generic.

这是我的编译方式:

gcc -o bubble-sort bubble-sort.c -Wall -fopenmp

gcc -o bubble-sort bubble-sort.c -Wall -fopenmp(针对串行版本)

这是我的奔跑方式:

./bubble-sort < in_10000 > out_10000

#include <omp.h>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

int main()
{
        int i, n, tmp, *x, changes;
        int chunk;
        scanf("%d ", &n);
        chunk = n / 4;
        x = (int*) malloc(n * sizeof(int));
        for(i = 0; i < n; ++i)
            scanf("%d ", &x[i]);
    changes = 1;
    int nr = 0;
    while(changes)
    {
    #pragma omp parallel private(tmp)
    {
            nr++;
            changes = 0;
            #pragma omp for \
                    reduction(+:changes)
            for(i = 0; i < n - 1; i = i + 2)
            {
                    if(x[i] > x[i+1] )
                    {
                            tmp = x[i];
                            x[i] = x[i+1];
                            x[i+1] = tmp;
                            ++changes;
                    }
            }
            #pragma omp for \
                    reduction(+:changes)
            for(i = 1; i < n - 1; i = i + 2)
            {
                    if( x[i] > x[i+1] )
                    {
                            tmp = x[i];
                            x[i] = x[i+1];
                            x[i+1] = tmp;
                            ++changes;
                    }
            }
    }
    }

    return 0;

}

以后

在我做出了您建议的更改之后,现在看来效果很好.它的伸缩性也相当不错(我也在8个物理内核上进行了测试->花了21s的一组15万个数字,远远少于一个内核).但是,如果我自己设置OMP_SCHEDULE环境变量,则性能会降低...

It seems to work well now after I made the changes you suggested. It also scales pretty good(I tested on 8 physical cores too -> took 21s for a set of 150k numbers which is far less than on one core). However if I set the OMP_SCHEDULE environment variable myself the performance decreases...

推荐答案

您应该对其进行概要分析,并检查线程在哪里花费时间.

You should profile it and check where threads spend time.

一个可能的原因是平行区域不断被创建和破坏;取决于OpenMP的实现,它可能导致线程池的重新创建,尽管好的实现可能会处理这种情况.

One possible reason is that parallel regions are constantly created and destroyed; depending on OpenMP implementation, it could lead to re-creation of the thread pool, though good implementations should probably handle this case.

需要刮除的一些小东西:
-ok似乎完全没有必要,您只需将循环退出条件更改为i<n-1;
-不需要显式的障碍-首先,将其放在平行区域之外,这样就没有意义了;其次,OpenMP并行区域和循环在末尾具有隐式的障碍;
-在while循环内至少合并两个相应的并行区域:

Some small things to shave off:
- ok seems completely unnecessary, you can just change the loop exit condition to i<n-1;
- explicit barrier is unnecessary - first, you put it out of parallel regions so it makes no sense; and second, OpenMP parallel regions and loops have implicit barriers at the end;
- combine at least the two consequent parallel regions inside the while loop:

#pragma omp parallel private(tmp)
{
    #pragma omp for bla-bla
    for (i=0; i<n-1; i+=2 ) {...}

    #pragma omp for bla-bla
    for (i=1; i<n-1; i+=2 ) {...}
}

这篇关于C OpenMP并行气泡排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆