使用gprof的怪异分析输出 [英] Weird profiling output using gprof

查看:161
本文介绍了使用gprof的怪异分析输出的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前我的分析code,这是C99标准,使用gprof。 我的计划是递归定义。我使用的GCC与启用了一些警告,并没有优化标志的-pg标志编译。然后我运行code后跟命令:

  gprof的./myProgram> outputFile.txt

我以什么输出告诉我很困惑,虽然。这是说,我花了很多时间在我知道的一个事实,我不花很多时间的函数。有问题的功能有内循环和执行任务的时间定数。这是一个很小的任务,但是,并通过增加的次任务,比如说,从完成5至100,我得到的时间采取我的程序来执行的总量没有可见的改变的数目。然而,gprof的告诉我,我已经在这个函数花费的执行时间近50%。

我不知道是什么问题就在这里。有没有人遇到这样的事情过吗?我能解决这个我自己?我使用Linux的一个CentOS分布。我曾试图安装OProfile的为不同的探查,但没有到/ usr /本地写权限,所以化妆安装失败,所以不知道这仅仅是使用gprof问题

下面是输出文件的相关部分。怪异的输出是功能 nl_jacobi_to_converge

从平面轮廓我有

 %的累积自我自我总
 时间秒秒调用S /呼叫S /调用名称
 49.02 7.50 7.50 0.00 1562 0.01 calculate_resid_numeric_jac_diags_tc41
 16.41 10.01 2.51 323878716 0.00 0.00Ç
 14.97 12.30 2.29 323878716 0.00 0.00 b
  6.01 13.22 0.92 0.01 89 0.02 calculate_resid_tc41
  5.69 14.09 0.87 0.00 10967082 0.00 rhs_function
  1.11 14.26 0.17 1562 0.00 0.00 single_nl_jacobi
  ...
  0.00 15.30 0.00 0.00 11 0.76 nl_jacobi_to_converge

和从粒度我有

 叫名索引%的时间自我的孩子
                0.00 8.32 11/11 nl_multigrid [3]
[5] 54.4 0.00 8.32 11 nl_jacobi_to_converge [5]
                5.28 2.921562分之1100calculate_resid_numeric_jac_diags_tc41 [4]
                0.12 0.00一千五百六十二分之一千一single_nl_jacobi [17]


解决方案

您有许多人试图找出 gprof的告诉他们中的一个,到小数点后四位。
我使用随机暂停和检查堆栈。

首先, gprof的是CPU分析器。
在IO,互斥,寻呼,或其他任何阻塞的系统调用,这意味着,它的关闭的,不计那个时候。
你说你正在做这样的事,但它可能内心深处的一些库函数发生。
如果是这样, gprof的口罩吧。
另一方面,一个单堆样品将显示什么它在等待,以概率等于时间它正在等待的级分

其次,@keshlam说,了解有关自我时间是很重要的。
如果它的高,这意味着程序计数器在该例程中发现的CPU时间的好个百分点。
但是,如果它是低的,这并不意味着功能是无罪的。
该功能可以花费的地段的时间,但通过调用子函数这样做。
要看到的是,你需要在 gprof的总一栏,但总时间的百分比,而不是每次通话的绝对时间,这是它给你。
如果你把一个堆栈样本,那么任何程序将出现在它的概率等于一次被消费的比例。
更重要的是,你就会知道到底为什么的时间被消耗,因为样本将显示您所在的呼叫发生precise行号。

增加: gprof的尝试处理递归函数,但由于其作者所指出的,它不会成功。然而,堆叠的样品有递归没有问题。如果堆叠样品递归函数调用期间拍摄,则函数出现不止一次堆栈,可能多次上。然而,它仍然是一个函数,或$ C $中任意行调用一个函数的包容时间成本,只是时间的分数是在栈上的情况。
看到这一点,假设采样在一个恒定的频率,总共M个样本,和code的特定功能或行上他们的分数出现F。
如果该功能或code线可以作出不采取任何时间,比如通过删除它时,它周围的分支,或把它关到一个无限快速的处理器,那么就没有接触到被采样。
然后它已经出现在M *˚F样品会的消失的,由F部位缩短执行时间。

I am currently profiling my code, which is C99 compliant, with gprof. My program is defined recursively. I am compiling using gcc with the -pg flag with some warnings enabled, and no optimisation flags. Then I am running the code followed by the command:

gprof ./myProgram > outputFile.txt

I am very confused as to what the output is telling me, though. It is saying that I am spending a lot of time in a function that I know for a fact I do not spend a lot of time in. The function in question has a loop inside and performs a task a set number of times. This is a tiny task, though, and by increasing the number of times the task is completed from, say, 5 to 100 I get no visible change in the total amount of time taken for my program to execute. However, gprof is telling me that I have spent almost 50% of the execution time in this function.

I have no idea what the issue is here. Has anyone encountered something like this before? Can I resolve this myself? I am using a CentOS distribution of Linux. I have tried to install OProfile as a different profiler, but don't have write access to /usr/local, so the make install fails, so not sure if this is only an issue with gprof

Here are the relevant parts of the output file. The weird output is for the function nl_jacobi_to_converge:

From the flat profile I have

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 49.02      7.50     7.50     1562     0.00     0.01  calculate_resid_numeric_jac_diags_tc41
 16.41     10.01     2.51 323878716     0.00     0.00  c
 14.97     12.30     2.29 323878716     0.00     0.00  b
  6.01     13.22     0.92       89     0.01     0.02  calculate_resid_tc41
  5.69     14.09     0.87 10967082     0.00     0.00  rhs_function
  1.11     14.26     0.17     1562     0.00     0.00  single_nl_jacobi
  ...
  0.00     15.30     0.00       11     0.00     0.76  nl_jacobi_to_converge

And from the granularity I have

index % time    self  children    called     name
                0.00    8.32      11/11          nl_multigrid [3]
[5]     54.4    0.00    8.32      11         nl_jacobi_to_converge [5]
                5.28    2.92    1100/1562        calculate_resid_numeric_jac_diags_tc41 [4]
                0.12    0.00    1100/1562        single_nl_jacobi [17]

解决方案

You are one of numerous people trying to figure out what gprof is telling them, to four decimal places. I use random pausing and examining the stack.

In the first place, gprof is a "CPU profiler". That means during IO, mutex, paging, or any other blocking system calls, it is shut off, and doesn't count that time. You say you're doing nothing of the sort, but it could be happening deep inside some library function. If it were, gprof masks it. On the other hand, a single stack sample will show what it is waiting for, with probability equal to the fraction of time it is waiting.

Second, as @keshlam said, it's important to understand about "self time". If it's high, that means the program counter was found in that routine a good percent of the CPU time. However, if it's low, that does not mean the function isn't guilty. The function could be spending lots of time, but doing it by calling subfunctions. To see that, you need the gprof "total" column, but as a percent of total time, not as an absolute time per call, which is what it is giving you. If you take a stack sample, then any routine will appear on it with probability equal to the fraction of time it is spending. What's more, you will know exactly why that time is being spent, because the sample will show you the precise line numbers where the calls occur.

ADDED: gprof attempts to handle recursive functions, but as its authors point out, it does not succeed. However, stack samples have no problem with recursion. If a stack sample is taken during a recursive function call, the function appears more than once on the stack, possibly many times. However, it is still the case that the inclusive time cost of a function, or of any line of code that calls a function, is simply the fraction of time it is on the stack. To see this, suppose samples are taken at a constant frequency, for a total of M samples, and a particular function or line of code appears on fraction F of them. If that function or line of code could be made to take no time, such as by deleting it, branching around it, or passing it off to an infinitely fast processor, then it would have no exposure to being sampled. Then the M*F samples on which it had appeared would disappear, shortening execution time by fraction F.

这篇关于使用gprof的怪异分析输出的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆