Boost threads and non-existent speedups on Linux SMPs


Question

I have written a small example C++ program using boost::thread. Since it is 215 lines long, I've posted it on pastebin instead:

http://pastebin.com/LRZ24W7D

The program creates a large array of floats (currently 1 GB) and adds them up, first sequentially and then using a number of threads (hosted inside the device_matrix class). Assuming the machine is an SMP, I'd expect to see a speedup from the code. And on my Windows machine I do see a four-fold speedup when using 4 device_matrix instances (giving 4 threads on my dual-core hyperthreaded Intel Core2 CPU). The output on Windows is the following:

starting computation
device_matrix count       4
elements                  268435456
UINT_MAX                  4294967295
data size total           1024 mb
size per device_matrix    256 mb
reference                 134224128.00000
result                    134224128.00000
time taken (init)         12.015 secs
time taken (single)       3.422 secs
time taken (device)       0.859 secs
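For readers without access to the pastebin, the kind of partitioned sum described above can be sketched roughly as follows. This is not the posted program: it assumes C++11 and swaps in std::thread for boost::thread so that it builds without Boost, and the names partial_sum and threaded_sum are made up for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Sum data[begin, end) and store the result in *out.
// Accumulating in double limits rounding error on large inputs.
void partial_sum(const std::vector<float>& data, std::size_t begin,
                 std::size_t end, double* out) {
    double acc = 0.0;
    for (std::size_t i = begin; i < end; ++i) acc += data[i];
    *out = acc;
}

// Split data into num_threads contiguous chunks, sum each chunk on its own
// thread, then combine the partial results on the calling thread.
double threaded_sum(const std::vector<float>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<double> partials(num_threads, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        // The last chunk absorbs any remainder.
        const std::size_t end =
            (t + 1 == num_threads) ? data.size() : begin + chunk;
        workers.emplace_back(partial_sum, std::cref(data), begin, end,
                             &partials[t]);
    }
    for (std::thread& w : workers) w.join();
    double total = 0.0;
    for (double p : partials) total += p;
    return total;
}
```

Each worker writes only its own slot in partials, so no locking is needed, and the caller reads the slots only after every thread has been joined.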

However, when I compile the same code on an Ubuntu machine I have available, I see the following output:

starting computation
device_matrix count       8
elements                  268435456
UINT_MAX                  4294967295
data size total           1024 mb
size per device_matrix    128 mb
reference                 134215408.00000
result                    134215400.00000
time taken (init)         3.670 secs
time taken (single)       3.030 secs
time taken (threaded)     3.950 secs

Here, no speedup is seen (in fact, it's slower by quite a lot).

The Ubuntu machine I'm using has the following uname -a output:

Linux gpulab03 2.6.32-23-generic #37-Ubuntu SMP Fri Jun 11 08:03:28 UTC 2010 x86_64 GNU/Linux

And hwinfo -short gives the following output:

cpu:
                       Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz, 1600 MHz
                       ... 7 more times

Which I read as the machine having eight cores (well, a quad core with HT).

I'm using the following line to compile my program on Windows:

cl /Fe"boost.exe" /EHsc -I. boost.cpp /link /LIBPATH:"C:\boost\boost_1_45_0\stage\lib"

And on Ubuntu, I use the following line:

g++ -O0 -v -o boost -I$HOME/Code/boost -L$HOME/Code/boost/stage/lib boost.cpp -lboost_thread-gcc44-mt

The output from running the above line is at http://pastebin.com/Gj6W3pcs, in case it tells anyone anything.

Since I'm not used to developing on Linux, I'm just not sure what to look for. Is there some flag I need to pass to GCC, or some setting I need to enable somewhere, to get actual concurrent threads?

I've looked around the net for an example program using boost::thread that could give me something to benchmark against, but I'm only finding small producer-consumer examples that don't need to crunch anything "heavy".

As an extra data point, running under the time command with one thread gives the following times (just in case boost::timer is wonky):

real    0m9.788s
user    0m9.500s
sys     0m0.280s

And when using 8 threads, I see the following:

real    0m7.292s
user    0m10.340s
sys     0m0.340s

Which doesn't seem to indicate a meaningfully faster run anyway.

I should also mention that I'm on a normal user account and that I've built Boost myself (and so I'm linking against it outside of the "normal" folders for this purpose on Linux). This also means I'm severely limited in what I can install, etc. Are there similar limitations that apply to threads somehow?

Recommended answer

I believe the problem is with boost::timer. I get different timing results if I use gettimeofday and subtract the start time from the end time instead.

It looks as if clock(), which is what boost::timer uses, returns the amount of CPU time used by the entire program, not just by one thread. This looks like a Boost bug to me.
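The difference between the two clocks can be checked with a small experiment along these lines. It is only a sketch, not the modified program mentioned in this answer: it assumes a POSIX system and C++11, uses std::thread rather than boost::thread so it builds without Boost, and the helper names (wall_seconds, cpu_seconds, spin_for, measure) are invented for illustration. On Linux, clock() reports CPU time summed over all threads of the process, whereas the Microsoft C runtime's clock() reports elapsed wall-clock time, which would be consistent with the speedup showing up only on Windows.

```cpp
#include <sys/time.h>  // gettimeofday (POSIX)
#include <ctime>       // std::clock, CLOCKS_PER_SEC
#include <thread>
#include <vector>

// Wall-clock time in seconds, taken from gettimeofday.
double wall_seconds() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

// CPU time in seconds consumed so far by the whole process, from clock().
double cpu_seconds() {
    return static_cast<double>(std::clock()) / CLOCKS_PER_SEC;
}

// Keep the calling thread busy for roughly `seconds` of wall-clock time.
void spin_for(double seconds) {
    const double deadline = wall_seconds() + seconds;
    while (wall_seconds() < deadline) {
        // Busy-wait so that the thread keeps consuming CPU time.
    }
}

struct Timing {
    double wall;  // elapsed wall-clock seconds
    double cpu;   // process CPU seconds consumed over the same interval
};

// Run num_threads busy threads concurrently and read both clocks around them.
Timing measure(unsigned num_threads, double seconds) {
    const double wall0 = wall_seconds();
    const double cpu0 = cpu_seconds();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
        workers.emplace_back(spin_for, seconds);
    for (std::thread& w : workers) w.join();
    return Timing{wall_seconds() - wall0, cpu_seconds() - cpu0};
}
```

On a multi-core Linux machine, measure(4, 1.0) would be expected to report about one second of wall time but several seconds of CPU time. That is the same pattern as in the time output from the question, where user time with 8 threads exceeds real time.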

I made a new version of your code that is compatible with the Boost on a CentOS 5 machine. I turned your do_sum operation into a free function, so that the sum is guaranteed to be computed in exactly the same way in the single-threaded and multithreaded cases. I also added a non-Windows header so I could use gettimeofday.

The code is here.
