False sharing over multiple cores

Views: 72 | Published: 2021/6/12 20:19:37 | Tags: c++ parallel-processing openmp cpu-cache false-sharing


Question

Would false sharing happen in the following program?

Memory

One array is split into 4 equal regions: [A1, A2, B1, B2]
The whole array fits into the L1 cache in the actual program.
Each region is padded to a multiple of 64 bytes.

Steps

1. Thread 1 writes to regions A1 and A2 while thread 2 writes to regions B1 and B2.
2. Barrier.
3. Thread 1 reads B1 and writes to A1 while thread 2 reads B2 and writes to A2.
4. Barrier.
5. Go to step 1.

Test

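#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    const int N = 64;  // four regions [A1, A2, B1, B2] of N/4 elements each
    std::vector<std::int32_t> x(N, 0);
    #pragma omp parallel
    {
        for (int i = 0; i < 1000; ++i) {
            // Step 1: iteration j = 0 writes A1 and A2, iteration j = 1 writes B1 and B2.
            #pragma omp for
            for (int j = 0; j < 2; ++j) {
                for (int k = 0; k < (N / 2); ++k) {
                    x[j*N/2 + k] += 1;
                }
            }
            // The implicit barrier at the end of each "omp for" separates the steps.
            // Step 2: j = 0 reads B1 and writes A1, j = 1 reads B2 and writes A2.
            #pragma omp for
            for (int j = 0; j < 2; ++j) {
                for (int k = 0; k < (N / 4); ++k) {
                    x[j*N/4 + k] += x[N/2 + j*N/4 + k] - 1;
                }
            }
        }
    }
    // Print the final contents of the array.
    for (auto v : x) std::cout << v << " ";
    std::cout << "\n";
}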

Result

32 elements of 500500 (1000 * 1001 / 2)
32 elements of 1000

Recommended answer

There is some false sharing in your code, since x is not guaranteed to be aligned to a cache line. Padding alone is not necessarily enough: a region padded to 64 bytes can still straddle two cache lines if the array does not start on a cache-line boundary. In your example N is really small, which may be a problem. Note that at this N the biggest overhead is probably worksharing and thread management. If N is sufficiently large, i.e. array-size / number-of-threads >> cache-line-size, false sharing is not a relevant problem.
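To make the alignment point concrete, here is a minimal sketch of a buffer that is forced onto a cache-line boundary. It assumes a 64-byte cache line and a compile-time size; `kCacheLine`, `kN`, `AlignedArray`, `region_begin`, `is_cache_aligned` and `cache_line_of` are illustrative names, not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (common on x86-64, not universal).
constexpr std::size_t kCacheLine = 64;
// Same element count as the test program in the question.
constexpr std::size_t kN = 64;

// alignas forces every object of this type to start on a cache-line boundary.
struct alignas(kCacheLine) AlignedArray {
    std::int32_t data[kN];
};

// Each of the four regions [A1, A2, B1, B2] must cover whole cache lines.
static_assert((kN / 4) * sizeof(std::int32_t) % kCacheLine == 0,
              "each region must be a multiple of the cache-line size");

// Pointer to the first element of region r (0 = A1, 1 = A2, 2 = B1, 3 = B2).
inline std::int32_t* region_begin(AlignedArray& a, std::size_t r) {
    assert(r < 4);
    return a.data + r * (kN / 4);
}

// True if p sits exactly on a cache-line boundary.
inline bool is_cache_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}

// Number of the cache line that contains p (address divided by the line size).
inline std::uintptr_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}
```

With this layout, `is_cache_aligned(region_begin(a, r))` holds for every region, so no cache line is shared between two regions. A plain `std::vector<std::int32_t>` gives no such guarantee, which is the situation described above.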

The alternating writes to A2 from different threads in your code are also not optimal in terms of cache usage, but that is not a false sharing issue, because both threads really do touch the same data.

Note that you do not need to split the loops. If a loop accesses memory at contiguous indices, a single loop is just fine, e.g.

#pragma omp for
for (int j = 0; j < N; ++j)
    x[j] += 1;

If you want to be really careful, you may add schedule(static); then you are guaranteed an even, contiguous distribution of the elements across the threads.
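The contiguous distribution can be observed directly. The following is a small sketch, not part of the original answer: it records which thread handled each index under schedule(static) with no chunk size and checks that the thread number never decreases along the array. `owners_static`, `is_nondecreasing` and `thread_index` are hypothetical helper names; without OpenMP enabled the pragmas are ignored and everything runs on one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Current thread number, or 0 when compiled without OpenMP.
inline int thread_index() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// For every index j in [0, n), record the thread that executed iteration j.
inline std::vector<int> owners_static(int n) {
    assert(n >= 0);
    std::vector<int> owner(static_cast<std::size_t>(n), -1);
    #pragma omp parallel for schedule(static)
    for (int j = 0; j < n; ++j) {
        owner[j] = thread_index();
    }
    return owner;
}

// With one contiguous block per thread, handed out in thread order,
// the recorded thread numbers never decrease from left to right.
inline bool is_nondecreasing(const std::vector<int>& owner) {
    for (std::size_t i = 1; i < owner.size(); ++i) {
        if (owner[i] < owner[i - 1]) return false;
    }
    return true;
}
```

Calling `is_nondecreasing(owners_static(1000))` should return true: each thread owns one contiguous block, so two threads can only meet on the cache lines at block boundaries.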

Remember that false sharing is a performance issue, not a correctness problem, and it only matters if it occurs frequently. A typical bad pattern is repeated writes to vector[my_thread_index].
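As an illustration of that pattern, the sketch below counts even numbers in two ways. The first function writes to a per-thread slot of a shared vector on every hit, so neighboring slots share a cache line. The second accumulates in a thread-private variable and touches shared memory once per thread. All function names are hypothetical, and the counting task itself is only a stand-in chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Helpers so the sketch also compiles (and runs serially) without OpenMP.
inline int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
inline int this_thread() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Bad pattern: frequent writes to slots[t]; adjacent slots sit on one cache line.
inline long count_even_shared_slots(const std::vector<int>& v) {
    std::vector<long> slots(static_cast<std::size_t>(max_threads()), 0);
    const int n = static_cast<int>(v.size());
    #pragma omp parallel
    {
        const int t = this_thread();
        assert(t < static_cast<int>(slots.size()));
        #pragma omp for
        for (int i = 0; i < n; ++i) {
            if (v[i] % 2 == 0) slots[t] += 1;
        }
    }
    long total = 0;
    for (long s : slots) total += s;
    return total;
}

// Better: accumulate in a thread-private local, then do one shared update per thread.
inline long count_even_local(const std::vector<int>& v) {
    long total = 0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel
    {
        long local = 0;
        #pragma omp for
        for (int i = 0; i < n; ++i) {
            if (v[i] % 2 == 0) local += 1;
        }
        #pragma omp atomic
        total += local;
    }
    return total;
}
```

Both functions return the same count, which matches the point that false sharing affects speed rather than correctness. In real code an OpenMP reduction clause would express the second version more compactly.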
