Kcachegrind/callgrind is inaccurate for dispatcher functions?
Question
I have a model program on which kcachegrind/callgrind reports strange results. It is a kind of dispatcher function. The dispatcher is called from 4 places; each call says which actual do_J function to run (so first2 will call only do_1 and do_2, and so on).
Source (this is a model of the actual code):
#define N 1000000
int a[N];
/* do_J updates the first J*N/4 elements, so its cost grows with J. */
void do_1(int *a) { int i; for(i=0;i<N/4;i++) a[i]+=1; }
void do_2(int *a) { int i; for(i=0;i<N/2;i++) a[i]+=2; }
void do_3(int *a) { int i; for(i=0;i<N*3/4;i++) a[i]+=3; }
void do_4(int *a) { int i; for(i=0;i<N;i++) a[i]+=4; }
/* Every do_J is reached only through this function. */
void dispatcher(int *a, int j) {
if(j==1) do_1(a);
else if(j==2) do_2(a);
else if(j==3) do_3(a);
else do_4(a);
}
/* Only first2 and outer2 ever reach do_1. */
void first2(int *a) { dispatcher(a,1); dispatcher(a,2); }
void last2(int *a) { dispatcher(a,4); dispatcher(a,3); }
void inner2(int *a) { dispatcher(a,2); dispatcher(a,3); }
void outer2(int *a) { dispatcher(a,1); dispatcher(a,4); }
int main(void){
first2(a);
last2(a);
inner2(a);
outer2(a);
return 0;
}
Compiled with gcc -O0; profiled with valgrind --tool=callgrind; viewed with kcachegrind and qcachegrind-0.7.
Here is the full callgraph of the application. All paths to do_J go through dispatcher, which is correct (do_1 is merely hidden because it is too fast, but it really is there, just to the left of do_2).
Let's focus on do_1 and check who called it (this picture is incorrect):
This is very strange, I think: only first2 and outer2 called do_1, not all four callers.
Is this a limitation of callgrind/kcachegrind? How can I get an accurate callgraph with weights (proportional to the running time of every function, with and without its children)?
Answer
Yes, this is a limitation of the callgrind format. It doesn't store full traces; it only stores parent-child (caller-callee) call information.
There is the google-perftools project with the pprof/libprofiler.so CPU profiler, http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html . libprofiler.so collects profiles with call traces: it stores every sample together with its full backtrace. pprof converts libprofiler's output into graphical formats or into the callgrind format. In the full view the result will be the same as in kcachegrind, but if you focus on a function, e.g. do_1, using pprof's --focus option, it will show an accurate call tree for that function.