OpenMP False Sharing


Problem Description


I believe I am experiencing false sharing using OpenMP. Is there any way to identify it and fix it?


My code is: https://github.com/wchan/libNN/blob/master/ResilientBackpropagation.hpp line 36.


Using a 4-core CPU yielded only about 10% additional performance over the single-threaded version. On a NUMA system with 32 physical (64 virtual) CPUs, utilization is stuck at around 1.5 cores; I think this is a direct symptom of false sharing preventing the code from scaling.


I also tried running it under the Intel VTune profiler; it reported that most of the time is spent in the "f()" and "+=" functions. I believe this is reasonable, but it doesn't really explain why I am getting such poor scaling...

Any ideas/suggestions?

Thanks.

Recommended Answer


Use a reduction instead of explicitly indexing an array based on the thread ID. That array virtually guarantees false sharing.
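To see why the thread-indexed array hurts: consecutive array slots sit on the same 64-byte cache line, so every thread's `+=` invalidates that line in the other cores' caches. A minimal sketch of the layout, with hypothetical names (`per_thread_mse`, `PaddedAccum`); padding each slot out to a full cache line is the usual fallback when a reduction is not an option:

```cpp
// Per-thread accumulators packed back to back: eight doubles share one
// 64-byte cache line, so concurrent writes from different threads keep
// bouncing that line between cores (false sharing).
double per_thread_mse[64];

// If per-thread slots are unavoidable, pad each one to a full cache line
// so no two threads ever write to the same line.
struct alignas(64) PaddedAccum { double value; };
PaddedAccum padded_mse[64];

static_assert(sizeof(PaddedAccum) == 64, "one accumulator per cache line");
```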

Replace

#pragma omp parallel for
    clones[omp_get_thread_num()]->mse() += norm_2(dedy);

for (int i = 0; i < omp_get_max_threads(); i++) {
    neural_network->mse() += clones[i]->mse();
}

with

#pragma omp parallel for reduction(+ : mse)
    mse += norm_2(dedy);

neural_network->mse() = mse;

