Why does my code run slower with multiple threads than with a single thread when it is compiled for profiling (-pg)?
I'm writing a ray tracer.
Recently, I added threading to the program to exploit the additional cores on my i5 Quad Core.
In a weird turn of events the debug version of the application is now running slower, but the optimized build is running faster than before I added threading.
I'm passing the "-g -pg" flags to gcc for the debug build and the "-O3" flag for the optimized build.
Host system: Ubuntu Linux 10.04 AMD64.
I know that debug symbols add significant overhead to the program, but the relative performance has always been maintained, i.e. a faster algorithm will always run faster in both debug and optimized builds.
Any idea why I'm seeing this behavior?
Debug version is compiled with "-g3 -pg". Optimized version with "-O3".
Optimized no threading: 0m4.864s
Optimized threading: 0m2.075s
Debug no threading: 0m30.351s
Debug threading: 0m39.860s
Debug threading after "strip": 0m39.767s
Debug no threading (no-pg): 0m10.428s
Debug threading (no-pg): 0m4.045s
This convinces me that "-g3" is not to blame for the odd performance delta, but that it's rather the "-pg" switch. It's likely that the "-pg" option adds some sort of locking mechanism to measure thread performance.
Since "-pg" is broken on threaded applications anyway, I'll just remove it.
What do you get without the -pg flag? That's not debugging symbols (which don't affect the code generation), that's for profiling (which does).
It's quite plausible that profiling in a multithreaded process requires additional locking which slows the multithreaded version down, even to the point of making it slower than the non-multithreaded version.
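To make the locking argument concrete, here is a minimal sketch of the general effect. It is not what gcc's -pg runtime does internally. It assumes a hypothetical profiler that counts calls in one global counter behind one mutex, and compares that with per-thread counters merged at the end. The names `run_workers`, `worker`, `worker_args`, `profile_lock`, `shared_calls` and `MAX_THREADS` are invented for this illustration.

```c
#include <pthread.h>

#define MAX_THREADS 64

/* Hypothetical profiler state: one global call counter behind one mutex.
   This is an assumption for illustration, not gcc's actual -pg runtime. */
static pthread_mutex_t profile_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_calls;

struct worker_args {
    long iterations;   /* instrumented "calls" this thread makes */
    int use_shared;    /* 1: lock the global counter, 0: count privately */
    long local_calls;  /* per-thread tally, merged after join */
};

static void *worker(void *p)
{
    struct worker_args *a = p;
    for (long i = 0; i < a->iterations; i++) {
        if (a->use_shared) {
            /* Every call from every thread queues on the same lock. */
            pthread_mutex_lock(&profile_lock);
            shared_calls++;
            pthread_mutex_unlock(&profile_lock);
        } else {
            a->local_calls++;  /* no synchronization needed */
        }
    }
    return NULL;
}

/* Runs nthreads workers; returns total calls recorded, or -1 on error. */
long run_workers(int nthreads, long iterations, int use_shared)
{
    pthread_t threads[MAX_THREADS];
    struct worker_args args[MAX_THREADS];
    long total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    shared_calls = 0;
    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        args[i].use_shared = use_shared;
        args[i].local_calls = 0;
        if (pthread_create(&threads[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += args[i].local_calls;
    }
    if (started != nthreads)
        return -1;
    return use_shared ? shared_calls : total;
}
```

Both variants record the same number of calls. Calling `run_workers` from a small `main` (built with `gcc -pthread`) and timing it with `use_shared` set to 1 and then to 0 is one way to see how much a single shared lock can cost as the thread count grows. How large the gap is depends on the machine.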