C OpenMP并行气泡排序 [英] C OpenMP parallel bubble sort
问题描述
我有一个并行气泡排序算法的实现(奇偶转置排序),使用OpenMP.但是,经过测试,它比串行版本慢(约10%),尽管我有4核处理器(由于Intel超线程,所以2实x 2).我检查了内核是否已实际使用,并且在运行该程序时可以看到它们各占100%.因此,我认为我在算法的实现上犯了一个错误.
I have an implementation of parallel bubble sort algorithm(Odd-Even transposition sort) in C, using OpenMP. However, after I tested it it's slower than the serial version(by about 10%) although I have a 4 cores processor ( 2 real x 2 because of Intel hyperthreading). I have checked to see if the cores are actually used and I can see them at 100% each when running the program. Therefore I think I did a mistake in the implementation the algorithm.
我正在使用内核2.6.38-8-generic的linux.
I am using linux with kernel 2.6.38-8-generic.
这是我的编译方式:
gcc -o bubble-sort bubble-sort.c -Wall -fopenmp
或
gcc -o bubble-sort bubble-sort.c -Wall -fopenmp
(针对串行版本)
这是我的奔跑方式:
./bubble-sort < in_10000 > out_10000
#include <omp.h>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
int main()
{
int i, n, tmp, *x, changes;
int chunk;
scanf("%d ", &n);
chunk = n / 4;
x = (int*) malloc(n * sizeof(int));
for(i = 0; i < n; ++i)
scanf("%d ", &x[i]);
changes = 1;
int nr = 0;
while(changes)
{
#pragma omp parallel private(tmp)
{
nr++;
changes = 0;
#pragma omp for \
reduction(+:changes)
for(i = 0; i < n - 1; i = i + 2)
{
if(x[i] > x[i+1] )
{
tmp = x[i];
x[i] = x[i+1];
x[i+1] = tmp;
++changes;
}
}
#pragma omp for \
reduction(+:changes)
for(i = 1; i < n - 1; i = i + 2)
{
if( x[i] > x[i+1] )
{
tmp = x[i];
x[i] = x[i+1];
x[i+1] = tmp;
++changes;
}
}
}
}
return 0;
}
以后
在我做出了您建议的更改之后,现在看来效果很好.它的伸缩性也相当不错(我也在8个物理内核上进行了测试->花了21s的一组15万个数字,远远少于一个内核).但是,如果我自己设置OMP_SCHEDULE环境变量,则性能会降低...
It seems to work well now after I made the changes you suggested. It also scales pretty good(I tested on 8 physical cores too -> took 21s for a set of 150k numbers which is far less than on one core). However if I set the OMP_SCHEDULE environment variable myself the performance decreases...
推荐答案
您应该对其进行概要分析,并检查线程在哪里花费时间.
You should profile it and check where threads spend time.
一个可能的原因是平行区域不断被创建和破坏;取决于OpenMP的实现,它可能导致线程池的重新创建,尽管好的实现可能会处理这种情况.
One possible reason is that parallel regions are constantly created and destroyed; depending on OpenMP implementation, it could lead to re-creation of the thread pool, though good implementations should probably handle this case.
需要刮除的一些小东西:
-ok
似乎完全没有必要,您只需将循环退出条件更改为i<n-1
;
-不需要显式的障碍-首先,将其放在平行区域之外,这样就没有意义了;其次,OpenMP并行区域和循环在末尾具有隐式的障碍;
-在while循环内至少合并两个相应的并行区域:
Some small things to shave off:
- ok
seems completely unnecessary, you can just change the loop exit condition to i<n-1
;
- explicit barrier is unnecessary - first, you put it out of parallel regions so it makes no sense; and second, OpenMP parallel regions and loops have implicit barriers at the end;
- combine at least the two consequent parallel regions inside the while loop:
#pragma omp parallel private(tmp)
{
#pragma omp for bla-bla
for (i=0; i<n-1; i+=2 ) {...}
#pragma omp for bla-bla
for (i=1; i<n-1; i+=2 ) {...}
}
这篇关于C OpenMP并行气泡排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!