Why is this OpenMP program slower than single-thread?
Question
Please look at this code.
Single-threaded program: http://pastebin.com/KAx4RmSJ. Compiled with:
g++ -lrt -O2 main.cpp -o nnlv2
Multi-threaded version with OpenMP: http://pastebin.com/fbe4gZSn. Compiled with:
g++ -lrt -fopenmp -O2 main_openmp.cpp -o nnlv2_openmp
I tested it on a dual-core system (so two threads run in parallel). But the multi-threaded version is slower than the single-threaded one (and its timings are unstable; try running it a few times). What's wrong? Where did I make a mistake?
Some tests:
Single-thread:
Layers Neurons Inputs --- Time (ns)
10 200 200 --- 1898983
10 500 500 --- 11009094
10 1000 1000 --- 48116913
Multi-thread:
Layers Neurons Inputs --- Time (ns)
10 200 200 --- 2518262
10 500 500 --- 13861504
10 1000 1000 --- 53446849
I don't understand what is wrong.
Solution
Is your goal here to study OpenMP, or to make your program faster? If the latter, it would be more worthwhile to write multiply-add code, reduce the number of passes, and incorporate SIMD.
Step 1: Combine loops and use multiply-add:
// remove the variable 'temp' completely
for(int i=0;i<LAYERS;i++)
{
    for(int j=0;j<NEURONS;j++)
    {
        outputs[j] = 0;
        for(int k=0,l=0;l<INPUTS;l++,k++)
        {
            outputs[j] += inputs[l] * weights[i][k];
        }
        outputs[j] = sigmoid(outputs[j]);
    }
    std::swap(inputs, outputs);
}
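For anyone who wants to try the fused loop outside the original program, here is a minimal self-contained sketch of the same idea. It is not the code from the pastebin links: the function name forward, the sigmoid definition, and the weight layout (one flat row-major array per layer, so neuron j reads its own run of n consecutive weights) are all assumptions made for illustration. It also assumes NEURONS equals INPUTS, as in the timings in the question, so the two buffers can be swapped between layers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical activation: the question's sigmoid() is not shown in the source.
inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Runs the input vector through every layer, one pass per neuron.
// Assumed layout: weights[layer] holds n*n floats, row-major, so the
// weights of neuron j are weights[layer][j*n] .. weights[layer][j*n + n - 1].
std::vector<float> forward(std::vector<float> inputs,
                           const std::vector<std::vector<float>>& weights)
{
    const std::size_t n = inputs.size();
    std::vector<float> outputs(n);

    for (const std::vector<float>& w : weights) {
        assert(w.size() == n * n);           // layout assumption stated above
        for (std::size_t j = 0; j < n; ++j) {
            const float* row = w.data() + j * n;
            float sum = 0.0f;                // local accumulator, no 'temp' array
            for (std::size_t l = 0; l < n; ++l) {
                sum += inputs[l] * row[l];   // multiply-add in a single pass
            }
            outputs[j] = sigmoid(sum);
        }
        std::swap(inputs, outputs);          // this layer's output feeds the next
    }
    return inputs;                           // after the last swap, holds the result
}
```

Accumulating into a local variable instead of outputs[j] is a small variation on the loop above; it tends to make it easier for the compiler to keep the running sum in a register, though whether that changes the timings on your machine is something to measure rather than assume.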