Kcachegrind/callgrind is inaccurate for dispatcher functions?


Problem description

I have a model program for which kcachegrind/callgrind reports strange results. It is built around a dispatcher function. The dispatcher is called from four places; each call says which actual do_J function to run (so first2 calls only do_1 and do_2, and so on).

Source (this is a model of the actual code):

#define N 1000000

int a[N];

/* Each do_J touches a different fraction of the array, so each has a different cost. */
void do_1(int *a) { int i; for(i=0;i<N/4;i++) a[i]+=1; }
void do_2(int *a) { int i; for(i=0;i<N/2;i++) a[i]+=2; }
void do_3(int *a) { int i; for(i=0;i<N*3/4;i++) a[i]+=3; }
void do_4(int *a) { int i; for(i=0;i<N;i++) a[i]+=4; }

/* Runs the do_J function selected by j. */
void dispatcher(int *a, int j) {
    if(j==1) do_1(a);
    else if(j==2) do_2(a);
    else if(j==3) do_3(a);
    else do_4(a);
}

/* Four callers; each reaches two of the do_J functions through dispatcher. */
void first2(int *a) { dispatcher(a,1); dispatcher(a,2); }
void last2(int *a) { dispatcher(a,4); dispatcher(a,3); }
void inner2(int *a) { dispatcher(a,2); dispatcher(a,3); }
void outer2(int *a) { dispatcher(a,1); dispatcher(a,4); }

int main(void){
    first2(a);
    last2(a);
    inner2(a);
    outer2(a);
    return 0;
}

Compiled with gcc -O0; profiled with valgrind --tool=callgrind; viewed with kcachegrind and qcachegrind-0.7.

Here is the full call graph of the application. All paths to do_J go through dispatcher, which is correct (do_1 is hidden because it is too fast, but it is really there, just to the left of do_2).

Let's focus on do_1 and check who called it (this picture is incorrect):

I find this very strange: only first2 and outer2 called do_1, not all four callers of dispatcher.

Is this a limitation of callgrind/kcachegrind? How can I get an accurate call graph with weights (proportional to the running time of every function, with and without its children)?

Recommended answer

Yes, this is a limitation of the callgrind format. It does not store the full trace; it stores only parent-child call information. Once all paths merge in dispatcher, the data no longer records which caller of dispatcher led to do_1.

The google-perftools project provides a CPU profiler, pprof/libprofiler.so: http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html . libprofiler.so collects a profile with call traces and stores every trace event with its full backtrace. pprof converts libprofiler's output to graphical formats or to callgrind format. In the full view the result will be the same as in kcachegrind, but if you focus on a function, e.g. do_1, using pprof's focus option, it shows an accurate call tree for that function.
