Why is this OpenMP program slower than single-thread?

Problem description

Please look at this code.

Single-threaded program: http://pastebin.com/KAx4RmSJ. Compiled with:

g++ -lrt -O2 main.cpp -o nnlv2

Multi-threaded version with OpenMP: http://pastebin.com/fbe4gZSn. Compiled with:

g++ -lrt -fopenmp -O2 main_openmp.cpp -o nnlv2_openmp

I tested it on a dual-core system (so we have two threads running in parallel). But the multi-threaded version is slower than the single-threaded one (and its timings are unstable; try running it a few times). What's wrong? Where did I make a mistake?

Some tests:

Single-thread:

Layers  Neurons  Inputs  Time (ns)
10      200      200     1898983
10      500      500     11009094
10      1000     1000    48116913

Multi-thread:

Layers  Neurons  Inputs  Time (ns)
10      200      200     2518262
10      500      500     13861504
10      1000     1000    53446849

I don't understand what is wrong.

Solution

Is your goal here to study OpenMP, or to make your program faster? If the latter, it would be more worthwhile to write multiply-add code, reduce the number of passes, and incorporate SIMD.

Step 1: Combine loops and use multiply-add:

// remove the variable 'temp' completely
for(int i=0;i<LAYERS;i++)
{
  for(int j=0;j<NEURONS;j++)
  {
    outputs[j] = 0;

    for(int k=0,l=0;l<INPUTS;l++,k++)
    {
      outputs[j] += inputs[l] * weights[i][k];
    }

    outputs[j] = sigmoid(outputs[j]);
  }

  std::swap(inputs, outputs);
}
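
For readers who want to try the combined loop outside the original program, here is a minimal self-contained sketch. Several things in it are assumptions and not taken from the pastebin code: the function name forward, the logistic definition of sigmoid, and a row-major weight layout in which neuron j of layer i reads weights[i][j * n_in + l]. It also assumes the per-neuron iterations are independent, which is what makes the optional OpenMP pragma on the neuron loop legal. Whether that pragma actually helps on a dual-core machine at these sizes is something to measure, not something this sketch promises.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed activation: the logistic function. The original sigmoid() lives in
// the pastebin source and may differ.
inline double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Hypothetical helper: runs every layer with the fused multiply-add loop.
// Assumed layout: weights[i] stores neurons * n_in values, row-major, so the
// weights of neuron j start at offset j * n_in.
std::vector<double> forward(const std::vector<std::vector<double>>& weights,
                            std::vector<double> inputs,
                            std::size_t neurons)
{
    std::vector<double> outputs(neurons);

    for (std::size_t i = 0; i < weights.size(); ++i) {
        const std::size_t n_in = inputs.size();
        assert(weights[i].size() == neurons * n_in);

        const double* w  = weights[i].data();
        const double* in = inputs.data();
        double* out      = outputs.data();

        // Each iteration writes only out[j], so neurons can run in parallel.
        // Without -fopenmp the pragma is ignored and the loop runs serially.
        #pragma omp parallel for
        for (long j = 0; j < static_cast<long>(neurons); ++j) {
            double sum = 0.0;  // local accumulator, no 'temp' array needed
            for (std::size_t l = 0; l < n_in; ++l) {
                sum += in[l] * w[static_cast<std::size_t>(j) * n_in + l];
            }
            out[j] = sigmoid(sum);
        }

        // This layer's outputs become the next layer's inputs.
        std::swap(inputs, outputs);
        outputs.resize(neurons);
    }
    return inputs;
}
```

Compiling the same file once with g++ -O2 and once with g++ -O2 -fopenmp gives a serial and a parallel build of identical code, which makes a timing comparison between the two straightforward.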
