Matrix multiplication by vector OpenMP C

Question

I'm trying to write matrix-by-vector multiplication in C (OpenMP), but my program slows down when I add processors:

1 proc - 1,3 s
2 proc - 2,6 s
4 proc - 5,47 s

I tested this on my PC (Core i5) and on our school's cluster, and the result is the same: the program slows down.

Here is my code (the matrix is 10000 x 10000 and the vector has 10000 elements):

double start_time = clock();
#pragma omp parallel private(i) num_threads(4)
{
    tid = omp_get_thread_num();
    world_size = omp_get_num_threads();
    printf("Threads: %d\n",world_size);

    for(y = 0; y < matrix_size ; y++){
        #pragma omp parallel for private(i) shared(results, vector, matrix)
        for(i = 0; i < matrix_size; i++){
                results[y] = results[y] + vector[i]*matrix[i][y];   
        }
    }
}
double end_time = clock();
double result_time = (end_time - start_time) / CLOCKS_PER_SEC;
printf("Time: %f\n", result_time);

My question is: is there a mistake? To me it seems pretty straightforward, and it should speed up.

Recommended answer

I essentially already answered this question in parallelizing-matrix-times-a-vector-by-columns-and-by-rows-with-openmp.

You have a race condition when you write to results[y]: several threads read and update the same element at the same time. To fix this and still parallelize the inner loop, you have to make private versions of results[y], fill them in parallel, and then merge them in a critical section.

In the code below I assume you're using double; replace it with float, int, or whatever data type you're using. (Note that your inner loop runs over the first index of matrix[i][y], which is cache-unfriendly.)

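#pragma omp parallel num_threads(4)
{
    int y,i;
    /* per-thread, zero-initialized copy of the result vector */
    double* results_private = (double*)calloc(matrix_size, sizeof(double));
    for(y = 0; y < matrix_size ; y++) {
        /* the iterations over i are split among the threads */
        #pragma omp for
        for(i = 0; i < matrix_size; i++) {
            results_private[y] += vector[i]*matrix[i][y];
        }
    }
    /* merge the per-thread sums into the shared array, one thread at a time */
    #pragma omp critical
    {
        for(y=0; y<matrix_size; y++) results[y] += results_private[y];
    }
    free(results_private);
}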
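
The cache remark above can be illustrated with a variant that swaps the two loops, so that each thread works on whole rows and the inner loop walks matrix[i][y] along its second index. This is only a sketch under assumptions: the function name vecmat_rows is hypothetical, matrix is taken to be an array of row pointers (double **), and each row is assumed to be stored contiguously. It keeps the same private-copy-then-merge idea as the code above.

```c
#include <stdlib.h>

/* Hypothetical helper (not from the original post):
   results[y] = sum over i of vector[i] * matrix[i][y].
   Assumes matrix is an array of n row pointers, each row holding n doubles. */
void vecmat_rows(double **matrix, double *vector, double *results, int n)
{
    for (int y = 0; y < n; y++) results[y] = 0.0;

    #pragma omp parallel
    {
        /* per-thread, zero-initialized partial result */
        double *results_private = calloc((size_t)n, sizeof *results_private);
        if (!results_private) abort();

        /* rows are split among the threads; the inner loop over y
           reads one row of the matrix from start to end */
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            const double *row = matrix[i];
            for (int y = 0; y < n; y++)
                results_private[y] += vector[i] * row[y];
        }

        /* merge the partial results, one thread at a time */
        #pragma omp critical
        {
            for (int y = 0; y < n; y++) results[y] += results_private[y];
        }
        free(results_private);
    }
}
```

With GCC this would typically be built with something like gcc -O2 -fopenmp. When timing it, omp_get_wtime() reports wall-clock time, whereas clock() reports processor time, which on many systems adds up the time spent by all threads and can therefore grow with the thread count.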

If this is a homework assignment and you really want to impress your instructor, it's possible to do the merging without a critical section. See fill-histograms-array-reduction-in-parallel-with-openmp-without-using-a-critic for an idea of what to do, though I can't promise it will be faster.
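
One way the merge without a critical section might look is sketched below. It is not the code from the linked answer; the function name vecmat_no_critical and the shared buffer partial are made up for illustration, and matrix is again assumed to be an array of row pointers. Each thread accumulates into its own slice of one shared buffer, and after a barrier the threads split the final summation over y among themselves.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not from the original post). The _OPENMP guards only
   let the sketch compile and run serially when OpenMP is disabled. */
void vecmat_no_critical(double **matrix, double *vector, double *results, int n)
{
    double *partial = NULL;   /* shared: nthreads slices of n doubles each */
    int nthreads = 1;

    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* one thread allocates the zeroed buffer for the whole team */
        #pragma omp single
        {
#ifdef _OPENMP
            nthreads = omp_get_num_threads();
#endif
            partial = calloc((size_t)nthreads * n, sizeof *partial);
            if (!partial) abort();
        }   /* implicit barrier: every thread now sees partial and nthreads */

        double *mine = partial + (size_t)tid * n;   /* this thread's slice */

        #pragma omp for
        for (int i = 0; i < n; i++)
            for (int y = 0; y < n; y++)
                mine[y] += vector[i] * matrix[i][y];
        /* implicit barrier: all slices are complete */

        /* each thread sums a disjoint range of y across all slices,
           so no two threads write the same results[y] */
        #pragma omp for
        for (int y = 0; y < n; y++) {
            double sum = 0.0;
            for (int t = 0; t < nthreads; t++)
                sum += partial[(size_t)t * n + y];
            results[y] = sum;
        }
    }
    free(partial);
}
```

The trade-off is extra memory (one slice per thread) and a second pass over the slices in exchange for a merge that runs in parallel; whether that pays off depends on the matrix size and thread count, so it is worth measuring.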
