OpenMP False Sharing
Question
I believe I am experiencing false sharing with OpenMP. Is there any way to identify and fix it?
My code is: https://github.com/wchan/libNN/blob/master/ResilientBackpropagation.hpp line 36.
On a 4-core CPU, the parallel version yielded only about 10% more performance than the single-threaded version. On a NUMA system with 32 physical (64 virtual) CPUs, utilization stays stuck at around 1.5 cores. I think this failure to scale is a direct symptom of false sharing.
I also tried running it under the Intel VTune profiler, which reported that most of the time is spent in the "f()" and "+=" functions. That seems reasonable, but it doesn't really explain why I am getting such poor scaling...
Any ideas/suggestions?
Thanks.
Answer
Use a reduction instead of explicitly indexing an array based on the thread ID. That array virtually guarantees false sharing.
Replace this:

    #pragma omp parallel for
    clones[omp_get_thread_num()]->mse() += norm_2(dedy);
    for (int i = 0; i < omp_get_max_threads(); i++) {
        neural_network->mse() += clones[i]->mse();
    }

with this:

    #pragma omp parallel for reduction(+ : mse)
    mse += norm_2(dedy);
    neural_network->mse() = mse;