OpenMP False Sharing
Question
I believe I am experiencing false sharing with OpenMP. Is there any way to identify and fix it?
My code is: https://github.com/wchan/libNN/blob/master/ResilientBackpropagation.hpp line 36.
On a 4-core CPU, the parallel version yielded only about 10% more performance than the single-threaded version. On a NUMA system with 32 physical (64 virtual) CPUs, utilization stays stuck at around 1.5 cores. I think this failure to scale is a direct symptom of false sharing.
I also tried running it under the Intel VTune profiler, which reported that most of the time is spent in the "f()" and "+=" functions. That seems reasonable, but it doesn't really explain why I am getting such poor scaling...
Any ideas/suggestions?
Thanks.
Answer
Use a reduction instead of explicitly indexing an array based on the thread ID. That array virtually guarantees false sharing.
Replace this:

    #pragma omp parallel for
    clones[omp_get_thread_num()]->mse() += norm_2(dedy);
    for (int i = 0; i < omp_get_max_threads(); i++) {
        neural_network->mse() += clones[i]->mse();
    }

with this:

    #pragma omp parallel for reduction(+ : mse)
    mse += norm_2(dedy);
    neural_network->mse() = mse;