False sharing in OpenMP when writing to a single vector


Problem description


I learnt OpenMP using Tim Mattson's lecture notes, and he gave an example of false sharing as below. The code is simple and is used to calculate pi from the numerical integral of 4.0/(1+x*x) with x ranging from 0 to 1. The code uses an array to hold each thread's partial sum of 4.0/(1+x*x), then sums the array at the end:

#include <omp.h>
static long num_steps = 100000;
double step;
#define NUM_THREADS 2
int main()
{
    int i, nthreads; double pi, sum[NUM_THREADS];
    step = 1.0/(double)num_steps;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int i, id, nthrds;
        double x;
        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();
        if (id == 0) nthreads = nthrds;
        for (i=id, sum[id]=0.0; i<num_steps; i=i+nthrds){
            x = (i+0.5)*step;
            sum[id] += 4.0/(1.0+x*x);
        }
    }
    for (i=0, pi=0.0; i<nthreads; i++) pi += sum[i]*step;
    return 0;
}


I have some questions about false sharing from this example:

  1. Is the false sharing caused by the fact that the job of writing to the array is divided intermittently between the two threads, i.e. [thread0, thread1, thread0, thread1, ...]? If we use #pragma omp parallel for, the array will instead be divided as [thread0, thread0, thread0, ..., thread1, thread1, thread1, ...]; do we still have false sharing, given that the addresses accessed by the two threads are now far from each other?
  2. If I have a job that uses #pragma omp parallel for to write to an output vector that has a 1-to-1 correspondence with my input vector (for example, the input is a matrix of predictors and the output is a vector of predictions), when do I need to worry about false sharing?

Answer


This tutorial keeps sending confused people to Stack Overflow - sometimes it's not a good idea to learn bottom-up.


  1. The array sum only has 2 (== NUM_THREADS) entries, i.e. [sum of thread 0, sum of thread 1]. Those values are likely on the same cache line, therefore causing false sharing.


If the input and output vectors are sufficiently large (i.e. hundreds of elements per thread), you are fine. You should always use idiomatic OpenMP, i.e. parallel for rather than the manual worksharing exhibited in the problematic examples of this tutorial. Then you are fine by default, because OpenMP will assign adjacent indices to the same thread.


If you haven't gotten to that point in the tutorial yet, make sure to use the built-in reduction clause rather than manually hacking a reduction together as shown in the example.
