如何使用Google Perf工具 [英] How to use google perf tools

查看:332
本文介绍了如何使用Google Perf工具的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我刚刚开始使用google性能工具(ubuntu中的google-perftoolslibgoogle-perftools4软件包),我发誓要搜索大约一天,但没有找到答案!! 问题是我无法通过CPU分析获得所有函数的结果.这是我的代码:

I just started using google performance tools (google-perftools and libgoogle-perftools4 packages in ubuntu), I swear that I'm googling for around a day and I didn't find an answer!! The problem is that I do not get the result for ALL of my functions with CPU profiling. This is my code:

#include "gperftools/profiler.h"
#include <iostream>
#include <math.h>
using namespace std;
void bar()
{
        int a,b,c,d,j,k;
        a=0;
        int z=0;
        b = 1000;
        while(z < b)
        {
                while (a < b)
                {
                        d = sin(a);
                        c = cos(a);
                        j = tan(a);
                        k = tan(a);
                        k = d * c + j *k;
                        a++;
                }
                a = 0;
                z++;
        }
}
void foo()
{
        cout << "hey " << endl;
}

int main()
{
        ProfilerStart("/home/mohammad/gperf/dump.txt");

        int a = 1000;
        while(a--){foo();}
        bar();
        ProfilerFlush();
        ProfilerStop();
}

编译为g++ test.cc -lprofiler -o a.out

这是我运行代码的方式:

this is how I run the code:

CPUPROFILE=dump.txt ./a.out

我也尝试过:

CPUPROFILE_FREQUENCY=10000 LD_PRELOAD=/usr/local/lib/libprofiler.so.0.3.0 CPUPROFILE=dump.txt ./a.out

这就是我从google-pprof --text a.out dump.txt那里得到的:

And this is what I get from google-pprof --text a.out dump.txt:

Using local file ./a.out.
Using local file ./dump.txt.
Total: 22 samples
8  36.4%  36.4%        8  36.4% 00d8cb04
6  27.3%  63.6%        6  27.3% bar
3  13.6%  77.3%        3  13.6% __cos (inline)
2   9.1%  86.4%        2   9.1% 00d8cab4
1   4.5%  90.9%        1   4.5% 00d8cab6
1   4.5%  95.5%        1   4.5% 00d8cb06
1   4.5% 100.0%        1   4.5% __write_nocancel
0   0.0% 100.0%        3  13.6% __cos

但是没有有关foo函数的信息!

But there is no information about the foo function!

我的系统信息: Ubuntu 12.04 g ++ 4.6.3

my system information: ubuntu 12.04 g++ 4.6.3

仅此而已!

推荐答案

TL; DR:foo要获取概要分析事件又快又小,请再运行100次.频率设置带有错别字,pprof的采样频率不会超过CONFIG_HZ(通常为250).最好改用更现代的Linux perf分析器(作者的教程维基百科).

TL;DR: foo is to fast and small to get profiling events, run it 100 more times. Frequency setting was with typo, and pprof will not sample more often than CONFIG_HZ (usually 250). It is better to switch to more modern Linux perf profiler (tutorial from its authors, wikipedia).

长版:

您的foo函数太短和太简单-只需调用两个函数即可.使用g++ test.cc -lprofiler -o test.s -S -g编译测试,并使用c++filt程序过滤test.s,以使c ++名称可读:

Your foo function is just too short and simple - just call two functions. Compiled the test with g++ test.cc -lprofiler -o test.s -S -g, with filtering of test.s with c++filt program to make c++ names readable:

foo():
.LFB972:
        .loc 1 27 0
        pushq   %rbp
        movq    %rsp, %rbp
        .loc 1 28 0
        movl    $.LC0, %esi
        movl    std::cout, %edi
        call    std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
        movl    std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&), %esi
        movq    %rax, %rdi
        call    std::basic_ostream<char, std::char_traits<char> >::operator<<(std::basic_ostream<char, std::char_traits<char> >& (*)(std::basic_ostream<char, std::char_traits<char> >&))
        .loc 1 29 0
        popq    %rbp
        ret
.LFE972:
        .size   foo(), .-foo()

因此,要在配置文件中看到它,您应该运行foo更多次,将main中的int a = 1000;更改为更大的值,例如10000或更好的100000(我在测试中也是如此).

So, to see it in the profile you should run foo for more times, changing int a = 1000; in main to something much greater, like 10000 or better 100000 (as did I for the test).

此外,您还可以修复不正确的"CPUPROFILE_FREQUENC=10000"来更正CPUPROFILE_FREQUENCY(请注意Y).我应该说CPUPROFILE_FREQUENCY的设置太高了10000,因为它通常每秒可能仅生成1000或250个事件,具体取决于内核配置CONFIG_HZ(大多数3.x内核有250个,请检查grep CONFIG_HZ= /boot/config*). pprof中的CPUPROFILE_FREQUENCY的默认设置为100.

Also you may fix the incorrect "CPUPROFILE_FREQUENC=10000" to correct CPUPROFILE_FREQUENCY (note the Y). I should say that 10000 is too high setting for CPUPROFILE_FREQUENCY, because it usually may only generate 1000 or 250 events per second depending on kernel configuration CONFIG_HZ (most 3.x kernels have 250, check grep CONFIG_HZ= /boot/config*). And default setting for CPUPROFILE_FREQUENCY in pprof is 100.

我在Ubuntu 14.04上使用bash脚本测试了CPUPROFILE_FREQUENCY的不同值:100000、10000、1000、250

I tested different values of CPUPROFILE_FREQUENCY: 100000, 10000, 1000, 250 with bash script on Ubuntu 14.04

for a in 100000 100000 10000 10000 1000 1000 300 300 250 250; do 
   echo -n "$a "; 
   CPUPROFILE_FREQUENCY=$a CPUPROFILE=dump$a.txt ./test >/dev/null;
done

结果是相同的120-140个事件和每个./test的运行时间大约为0.5秒,因此google-perftools的cpuprofiler不能为单线程每秒执行的事件要比内核中设置的CONFIG_HZ多(我有250).

And the results were the same 120-140 events and runtime of every ./test around 0.5 seconds, so cpuprofiler from google-perftools can't do more events per second for single thread, than CONFIG_HZ set in kernel (my has 250).

100000 PROFILE: interrupts/evictions/bytes = 124/1/6584
100000 PROFILE: interrupts/evictions/bytes = 134/0/7864
10000 PROFILE: interrupts/evictions/bytes = 125/0/7488
10000 PROFILE: interrupts/evictions/bytes = 123/0/6960
1000 PROFILE: interrupts/evictions/bytes = 134/0/6264
1000 PROFILE: interrupts/evictions/bytes = 125/2/7272
300 PROFILE: interrupts/evictions/bytes = 137/2/7984
300 PROFILE: interrupts/evictions/bytes = 126/0/7216
250 PROFILE: interrupts/evictions/bytes = 123/3/6680
250 PROFILE: interrupts/evictions/bytes = 137/2/7352

原始a = 1000 foo且cout的函数运行太快而无法在每次运行中获得任何概要分析事件(即使是250个事件/秒),因此您没有foo,也没有任何Input/Output函数.在少量运行中,__write_nocancel可能会发生采样事件,然后将报告由libstdc ++组成的foo和I/O函数(某些地方不在最顶端,因此请使用pprofgoogle-pprof),自我事件计数为零,子事件计数为非零:

With original a=1000 foo and cout's functions runs too fast to get any profiling event (even on 250 events/s) on them in every run, so you have no foo, nor any Input/Output functions. In small amount of runs, the __write_nocancel may got sampling event, and then foo and I/O functions form libstdc++ will be reported (somewhere not in the very top, so use --text option of pprof or google-pprof) with zero self event count, and non-zero child event count:

 ....
   1   0.9%  99.1%        1   0.9% __write_nocancel
 ....
   0   0.0% 100.0%        1   0.9% _IO_new_file_overflow
   0   0.0% 100.0%        1   0.9% _IO_new_file_write
   0   0.0% 100.0%        1   0.9% __GI__IO_putc
   0   0.0% 100.0%        1   0.9% foo
   0   0.0% 100.0%        1   0.9% new_do_write
   0   0.0% 100.0%        1   0.9% std::endl
   0   0.0% 100.0%        1   0.9% std::ostream::put

使用a=100000时,foo仍然太短和太快而无法获得自己的事件,但是I/O函数却得到了几个.这是我从长--text输出中提取的列表:

With a=100000, foo is still too short and fast to get own events, but I/O functions got several. This is list I grepped from long --text output:

  34  24.6%  24.6%       34  24.6% __write_nocancel

   1   0.7%  95.7%       35  25.4% __GI__IO_fflush
   1   0.7%  96.4%        1   0.7% __GI__IO_putc
   1   0.7%  97.8%        2   1.4% std::operator<< 
   1   0.7%  98.6%       36  26.1% std::ostream::flush
   1   0.7%  99.3%        2   1.4% std::ostream::put
   0   0.0% 100.0%       34  24.6% _IO_new_file_sync
   0   0.0% 100.0%       34  24.6% _IO_new_file_write
   0   0.0% 100.0%       40  29.0% foo

   0   0.0% 100.0%       34  24.6% new_do_write

   0   0.0% 100.0%        2   1.4% std::endl

仅凭pprof读取调用链的功能(只有在不省略帧信息的情况下,谁才能调用获得样本的函数),才能看到拥有自己的计数器为零的函数.

Functions with zero own counters seen only thanks to pprof capability of reading call chains (it knows who calls the functions which got sample, if frame info is not omitted).

我还可以推荐更现代,更强大的功能(软件和硬件事件,高达5 kHz或更高的频率;用户空间和内核性能分析)以及更好支持的探查器Linux perf探查器(来自其作者的教程

I can also recommend more modern, more capable (both software and hardware events, up to 5 kHz frequency or more; both user-space and kernel profiling) and better supported profiler, the Linux perf profiler (tutorial from its authors, wikipedia).

perfa=10000的结果:

$ perf record  ./test3  >/dev/null
 ... skip some perf's spam about inaccessibility of kernel symbols 
 ... note the 3 kHz frequency here VVVV
Lowering default frequency rate to 3250. 
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.078 MB perf.data (~3386 samples) ]

要查看来自perf.data输出文件的文本报告,我将减少使用(因为默认情况下perf report启动交互式配置文件浏览器):

To see text report from perf.data output file I'll use less (because perf report by default starts interactive profile browser):

$ perf report |less
... skip some extra info about the machine, kernel, and perf starting command
# Samples: 1K of event 'cycles'
# Event count (approx.): 1155264208
# Overhead   Command   Shared Object          Symbol
    41.94%    test3  libm-2.19.so         [.] __tan_sse2                                                                                                                                                                    
    16.95%    test3  libm-2.19.so         [.] __sin_sse2    
    13.40%    test3  libm-2.19.so         [.] __cos_sse2                                                                                                                                                                    
     4.93%    test3  test3                [.] bar()                                                                                                                                                                         
     2.90%    test3  libc-2.19.so         [.] __GI___libc_write    
....
     0.20%    test3  test3                [.] foo()                                                                                                                                                                         

perf report -n | less查看原始事件(样本)计数器:

Or perf report -n | less to see raw event (sample) counters:

# Overhead       Samples  Command        Shared Object 
    41.94%           663    test3  libm-2.19.so         [.] __tan_sse2                                                                                                                                                                    
    16.95%           268    test3  libm-2.19.so         [.] __sin_sse2   
    13.40%           212    test3  libm-2.19.so         [.] __cos_sse2                                                                                                                                                                    
     4.93%            78    test3  test3                [.] bar()                                                                                                                                                                         
     2.90%            62    test3  libc-2.19.so         [.] __GI___libc_write     
 ....
     0.20%             4    test3  test3                [.] foo()                                                                                                                                                                         

这篇关于如何使用Google Perf工具的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆