Using OpenMP to reduce execution time


Problem description

Hello.

I'm trying to optimize this code by using OpenMP to reduce the execution time. However, when I add #pragma omp parallel at the for loop, the execution time gets longer. How do I optimize this code using OpenMP?

What I have tried:

#include <stdio.h>
#include <omp.h>
 
int main()
{
 int array[10][10], row, column, i, j, sum = 0;
 printf("\nEnter The Limit of Rows:\t");
 scanf("%d", &row);
 printf("\nEnter The Limit of Columns:\t");
 scanf("%d", &column);
 printf("\nEnter Elements in the %d*%d Matrix\n", row, column);

 for(j = 0; j < column; j++)
 {
     for(i = 0; i < row; i++)
     {
         scanf("%d", &array[i][j]);
     }
 }
 printf("\nArray\n");
 for(j = 0; j < column; j++)
 {
     for(i = 0; i < row; i++)
     {
         printf("%4d", array[i][j]);
     }
     printf("\n");
 }

 for(i = 0; i < row; i++)
 {
     for(j = 0; j < column; j++)
     {
         sum = sum + array[i][j];
     }
     printf("\nSum of Column No. [%d]:\t%d", i, sum);
     sum = 0;
 }
 printf("\n");
 return 0;
}

Solution

It is difficult to decipher what's going on with this because it has lost its indenting.

From what I can tell, this code will not parallelize easily because it is so trivial that the time spent on synchronization will overwhelm the calculations. In this code, every update to the shared sum variable would have to be synchronized (for example with a lock), and that is very inefficient.

A better approach would be to have an intermediate sum for each row and then sum all of those. The sum for each row can be done in parallel because they are independent. You can save the data in an array with one slot per row and then no synchronization is required. The sum of the rows can then be done using a standard reduction. Those are common in the CUDA world but I don't know if they are in OpenMP. If a reduction is not available in OpenMP then a standard linear calculation can be done.
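The approach above can be sketched as follows. OpenMP does provide a reduction clause, so the second step can use it directly. This is only a minimal sketch: the function names sum_each_row and sum_all and the MAX_DIM constant are made up for illustration, and it assumes array[i][j] means row i, column j.

```c
#include <assert.h>

#define MAX_DIM 10  /* matches the 10x10 array in the question */

/* Fill row_sum[i] with the sum of row i. Each iteration writes only its own
   slot, so the threads share no accumulator and no lock is needed. */
void sum_each_row(int a[MAX_DIM][MAX_DIM], int rows, int cols, int row_sum[MAX_DIM])
{
    assert(rows >= 0 && rows <= MAX_DIM && cols >= 0 && cols <= MAX_DIM);
    #pragma omp parallel for
    for (int i = 0; i < rows; i++) {
        int s = 0;                  /* declared in the loop body, so private */
        for (int j = 0; j < cols; j++)
            s += a[i][j];
        row_sum[i] = s;
    }
}

/* Add up the per-row results. reduction(+:total) gives each thread a private
   partial total and combines the partial totals when the loop ends. */
int sum_all(const int row_sum[], int rows)
{
    int total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < rows; i++)
        total += row_sum[i];
    return total;
}
```

With GCC the pragmas take effect only when compiling with -fopenmp; without that flag they are ignored and the code runs serially with the same results. For a 10*10 array the serial version will most likely still be faster.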


If you want to reduce execution time, get rid of I/O within the code that you want optimized! I/O can't be parallelized in a meaningful way. Also, every single one of the printf and scanf statements will take much longer to process than the entire summation over a 10*10 array. You can't make a meaningful measurement of your program's performance as long as you have I/O in there, because the actual processing is so trivial. Try measuring the time of your code when you remove the summation: I bet you'll have trouble seeing a difference.
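One way to follow this advice is to time only the summation, repeated many times, with no printf or scanf inside the timed region. The sketch below is an assumption about how that could look, not a definitive benchmark: time_row_sums, now_seconds and MAX_DIM are hypothetical names, and it uses the C11 function timespec_get for wall-clock time.

```c
#include <assert.h>
#include <time.h>

#define MAX_DIM 10  /* matches the 10x10 array in the question */

/* Wall-clock time in seconds, using C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time `repeats` passes of the row summation with no I/O in the timed region.
   The grand total is stored in *checksum so that the result is actually used;
   an optimizing compiler may still simplify the repeated passes. */
double time_row_sums(int a[MAX_DIM][MAX_DIM], int rows, int cols,
                     int repeats, long *checksum)
{
    assert(rows >= 0 && rows <= MAX_DIM && cols >= 0 && cols <= MAX_DIM);
    long total = 0;
    double start = now_seconds();
    for (int r = 0; r < repeats; r++) {
        for (int i = 0; i < rows; i++) {
            int sum = 0;
            for (int j = 0; j < cols; j++)
                sum += a[i][j];
            total += sum;
        }
    }
    double elapsed = now_seconds() - start;
    *checksum = total;
    return elapsed;
}
```

Print the returned time and the checksum only after the call returns, so the output does not distort the measurement.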

Read solution 1 for further advice. Rick has it right: parallelizing code adds complexity to the code, and if there is not enough processing going on that you can save time on, execution will take more, not less time.


Quote:

I'm trying to optimize this code by using OpenMP to reduce the execution time. However, when I add #pragma omp parallel at the for loop, the execution time gets longer. How do I optimize this code using OpenMP?


Your problem is that MP comes at a cost: setting up threads and gathering results takes time. So each thread needs to do enough work to make it worth the cost.
A good coding style may also help to speed up threads.
Example:
In this code, sum is set in another part of the code, and you have to rely on the fact that the variable is not used for another task in order to get a correct result.

for(i = 0; i < row; i++)
{
    for(j = 0; j < column; j++)
    {
        sum = sum + array[i][j];
    }
    printf("\nSum of Column No. [%d]:\t%d", i, sum);
    sum = 0;
}


Here, sum cannot be corrupted by any intermediate use elsewhere.

for(i = 0; i < row; i++)
{
    sum = 0;
    for(j = 0; j < column; j++)
    {
        sum = sum + array[i][j];
    }
    printf("\nSum of Column No. [%d]:\t%d", i, sum);
}


It also makes it obvious that each inner loop does not depend on anything external (assuming out-of-order results are not a problem).
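Both points, the cost of starting threads and keeping sum local, can be combined in one sketch. It declares sum inside the loop body so each iteration has its own copy, and it uses OpenMP's if clause so that threads are only started when the matrix is large. The function name row_sums_if_large, the flat row-by-row layout and the PARALLEL_THRESHOLD value are assumptions for illustration; the threshold in particular is a guess that would need tuning by measurement.

```c
#include <assert.h>

/* Hypothetical cutoff: below this many elements the loop stays serial.
   The value is a guess, not a measured figure. */
#define PARALLEL_THRESHOLD 100000L

/* a is a rows x cols matrix stored row by row; out[i] receives the sum of
   row i. Results are stored instead of printed, so the order in which the
   threads finish does not matter. */
void row_sums_if_large(const int *a, int rows, int cols, int *out)
{
    assert(a && out && rows >= 0 && cols >= 0);
    #pragma omp parallel for if((long)rows * cols > PARALLEL_THRESHOLD)
    for (int i = 0; i < rows; i++) {
        int sum = 0;    /* declared in the loop body: one copy per iteration */
        for (int j = 0; j < cols; j++)
            sum += a[(long)i * cols + j];
        out[i] = sum;
    }
}
```

For the 10*10 case in the question the if condition is false, so the loop runs on a single thread and avoids the thread setup cost.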

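Another problem:
You haven't tested your code.
If array[i][j] holds row i, column j of the matrix
1 2
3 4
the summation loop gives
3 7
when a column sum should give
4 6
because the loop adds up each row instead of each column.
Note that your input and print loops run with j on the outside and print each line with a fixed j, so what appears on screen is the transpose of that layout. Pick one convention and use it in all three loops.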
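Assuming array[i][j] means row i, column j, a column sum fixes j and walks down the rows. A minimal sketch of that loop order (the function name sum_each_column and the MAX_DIM constant are hypothetical):

```c
#include <assert.h>

#define MAX_DIM 10  /* matches the 10x10 array in the question */

/* col_sum[j] = a[0][j] + a[1][j] + ... + a[rows-1][j],
   where a[i][j] is row i, column j. */
void sum_each_column(int a[MAX_DIM][MAX_DIM], int rows, int cols, int col_sum[MAX_DIM])
{
    assert(rows >= 0 && rows <= MAX_DIM && cols >= 0 && cols <= MAX_DIM);
    for (int j = 0; j < cols; j++) {        /* one pass per column */
        int sum = 0;
        for (int i = 0; i < rows; i++)      /* walk down the rows */
            sum += a[i][j];
        col_sum[j] = sum;
    }
}
```

For the matrix with rows 1 2 and 3 4 this fills col_sum with 4 and 6.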

