How to count the number of executed instructions of a process ID, including all future child threads


Question

Some time ago, I asked the question "How to count number of executed instructions of a process id including child processes", and @M-Iduoad kindly provided a solution that uses pgrep to capture all child PIDs and passes them to perf stat with -p. It works great!

However, I ran into a problem with multi-threaded applications when a new thread is spawned. Since I'm not a fortune teller (too bad!), I don't know the TIDs of newly created threads in advance, and therefore I can't add them to perf stat's -p or -t parameter.

As an example, let's assume I have a multithreaded nodejs server (deployed as a container on top of Kubernetes) with the following pstree:

root@node2:/home/m# pstree -p 4037791
node(4037791)─┬─sh(4037824)───node(4037825)─┬─{node}(4037826)
              │                             ├─{node}(4037827)
              │                             ├─{node}(4037828)
              │                             ├─{node}(4037829)
              │                             ├─{node}(4037830)
              │                             └─{node}(4037831)
              ├─{node}(4037805)
              ├─{node}(4037806)
              ├─{node}(4037807)
              ├─{node}(4037808)
              ├─{node}(4037809)
              ├─{node}(4037810)
              ├─{node}(4037811)
              ├─{node}(4037812)
              ├─{node}(4037813)
              └─{node}(4037814) 

Of course, I can use the following perf stat command to watch its threads:

perf stat --per-thread -e instructions,cycles,task-clock,cpu-clock,cpu-migrations,context-switches,cache-misses,duration_time -p $(pgrep --ns 4037791 | paste -s -d ",")

It works fine with a single-threaded nodejs application. But in the case of a multi-threaded service, as soon as it receives a request, the pstree output looks like this:

root@node2:/home/m# pstree -p 4037791
node(4037791)─┬─sh(4037824)───node(4037825)─┬─{node}(4037826)
              │                             ├─{node}(4037827)
              │                             ├─{node}(4037828)
              │                             ├─{node}(4037829)
              │                             ├─{node}(4037830)
              │                             ├─{node}(4037831)
              │                             ├─{node}(1047898)
              │                             ├─{node}(1047899)
              │                             ├─{node}(1047900)
              │                             ├─{node}(1047901)
              │                             ├─{node}(1047902)
              │                             ├─{node}(1047903)
              │                             ├─{node}(1047904)
              │                             ├─{node}(1047905)
              │                             ├─{node}(1047906)
              │                             ├─{node}(1047907)
              │                             ├─{node}(1047908)
              │                             ├─{node}(1047909)
              │                             ├─{node}(1047910)
              │                             ├─{node}(1047911)
              │                             ├─{node}(1047913)
              │                             ├─{node}(1047914)
              │                             ├─{node}(1047919)
              │                             ├─{node}(1047920)
              │                             ├─{node}(1047921)
              │                             └─{node}(1047922)
              ├─{node}(4037805)
              ├─{node}(4037806)
              ├─{node}(4037807)
              ├─{node}(4037808)
              ├─{node}(4037809)
              ├─{node}(4037810)
              ├─{node}(4037811)
              ├─{node}(4037812)
              ├─{node}(4037813)
              └─{node}(4037814)

Therefore, my previous perf stat command does not capture the stats of the newly created threads. I mean, it may capture the accumulated instructions, but it definitely does not show them in a "per-thread" format.

Is there any way to use --per-thread in perf stat and capture the stats of newly spawned threads in a multithreaded application? It seems to work only with -p or -t, following a fixed set of threads that already exist when perf starts, and it won't follow new ones.

There's a similar question here for perf record, but I'm using perf stat. Also, that doesn't seem to separate the recorded profile by thread, so it's just equivalent to perf stat node ..., unless there's a way to process the recorded data to separate it out by thread after the fact?

Any other potential solution that helps me dynamically count "instructions,cycles,task-clock,cpu-clock,cpu-migrations,context-switches,cache-misses" per thread of a given PID (including newly spawned threads) is acceptable, whether it uses perf or anything else!

Recommended answer

The combination of perf record -s and perf report -T should give you the information you need.

To demonstrate, take the following example code, which uses threads with well-defined instruction counts:

#include <cstdint>
#include <thread>

void work(int64_t count) {
    for (int64_t i = 0; i < count; i++);
}

int main() {
    std::thread first(work, 100000000ll);
    std::thread second(work, 400000000ll);
    std::thread third(work, 800000000ll);
    first.join();
    second.join();
    third.join();
}

(Compile without optimization, otherwise the compiler may remove the empty loops!)

Now, use perf record as a prefix command. It will follow all spawned processes and threads.

$ perf record -s -e instructions -c 1000000000 ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (5 samples) ]

To display the statistics nicely:

$ perf report -T
[... snip ...]
#    PID     TID  instructions:u
  270682  270683       500003888
  270682  270684      2000001866
  270682  270685      4000002177
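
If the per-thread table needs further processing, it can be turned into CSV. The following is only a sketch: the helper name report_to_csv is made up for illustration, and it assumes the table rows consist purely of whitespace-separated numbers (PID, TID, then one count per event) as in the output above. Other perf versions may format the table differently.

```shell
# Hypothetical helper: convert the numeric rows of `perf report -T`
# into CSV (pid,tid,count[,count...]). Assumes rows look like the
# sample output above; everything else is skipped.
report_to_csv() {
    awk '
        /^#/ { next }                      # skip header and comment lines
        NF >= 3 {
            # keep only rows where every field is a plain number
            for (i = 1; i <= NF; i++)
                if ($i !~ /^[0-9]+$/) next
            line = $1
            for (i = 2; i <= NF; i++)
                line = line "," $i
            print line
        }
    '
}

# Usage (assumes perf.data from the perf record -s run above):
#   perf report -T | report_to_csv
```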

The counts scale with the loop lengths, at about five instructions per iteration in this build. The parameters for perf record are a little bit tricky. -s writes separate records with fairly precise numbers; they do not depend on the instruction samples (generated every 1000000000 instructions). However, perf report, even with -T, fails when it does not find a single sample. So you need to set an instruction sample count with -c (or a frequency) that triggers at least once. Any sample will do; there does not need to be a sample per thread.

Alternatively, you could look at the raw records in perf.data. Then you can actually tell perf record not to collect any samples.

$ perf record -s -e instructions -n ./a.out             
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data ]

But you need to filter for the relevant records, and there might be additional records that you need to sum up.

$ perf script -D | grep PERF_RECORD_READ | grep -v " 0$"
# Annotation by me                              PID    TID 
213962455637481 0x760 [0x40]: PERF_RECORD_READ: 270887 270888 instructions:u 500003881
213963194850657 0x890 [0x40]: PERF_RECORD_READ: 270887 270889 instructions:u 2000001874
213964190418415 0x9c0 [0x40]: PERF_RECORD_READ: 270887 270890 instructions:u 4000002175
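
The filtering and summing can be scripted. The following is a minimal sketch, not a definitive implementation: the function name sum_reads_per_tid is made up for illustration, and it assumes every PERF_RECORD_READ line has the layout shown above (PID, TID, event name and value directly after the record name) and that several records for the same thread and event should be added together.

```shell
# Hypothetical helper: sum PERF_RECORD_READ values per (PID, TID, event)
# from the output of `perf script -D`. Assumes the line layout shown above.
sum_reads_per_tid() {
    awk '
        {
            # locate the record name, then read the four fields after it
            for (i = 1; i <= NF; i++) {
                if ($i == "PERF_RECORD_READ:" && i + 4 <= NF) {
                    key = $(i + 1) " " $(i + 2) " " $(i + 3)
                    total[key] += $(i + 4)
                    break
                }
            }
        }
        END {
            # print non-zero totals; %.0f avoids integer limits of %d in some awks
            for (key in total)
                if (total[key] > 0)
                    printf "%s %.0f\n", key, total[key]
        }
    ' | sort -k2,2n -k3,3    # order by TID, then by event name
}

# Usage (assumes perf.data from the perf record -s -n run above):
#   perf script -D | sum_reads_per_tid
```

Each output line has the form "PID TID event total", so with several events recorded in one run there is one line per thread and event.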
