Reading performance registers from the kernel

Problem description

I want to read certain performance counters. I know there are tools like perf that can do this for me from user space, but I want the code to be inside the Linux kernel.

I want to write a mechanism to monitor performance counters on an Intel(R) Core(TM) i7-3770 CPU. On top of that, I am running kernel 4.19.2 on Ubuntu. I got the following method from easyperf.

Here's the part of my code that counts retired instructions.

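  #define _GNU_SOURCE
  #include <assert.h>
  #include <linux/perf_event.h>
  #include <stdint.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <sys/types.h>
  #include <unistd.h>

  /* Open a counter for retired instructions, counting user and kernel mode. */
  int open_instructions_counter(pid_t pid, int cpu, int grp, unsigned long flags)
  {
    struct perf_event_attr pe;
    memset(&pe, 0, sizeof(struct perf_event_attr));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(struct perf_event_attr);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 0;       /* start counting as soon as it is opened */
    pe.exclude_kernel = 0; /* count ring 0 ... */
    pe.exclude_user = 0;   /* ... and user space */
    pe.exclude_hv = 0;
    pe.exclude_idle = 0;

    return (int)syscall(__NR_perf_event_open, &pe, pid, cpu, grp, flags);
  }

  /* Read the current 64-bit count from a perf event file descriptor. */
  uint64_t perf_read(int fd) {
    uint64_t val;
    ssize_t rc;
    rc = read(fd, &val, sizeof(val));
    assert(rc == (ssize_t)sizeof(val));
    return val;
  }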
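
For comparison, the usual user-space pattern (the one shown in the perf_event_open(2) man page) opens the counter disabled, then resets, enables, runs the code of interest, disables, and reads. A minimal sketch, assuming Linux headers are installed; count_instructions() and demo_workload() are hypothetical names, and the helper returns -1 when the kernel refuses the counter (for example no PMU in a VM, a seccomp filter in a container, or a restrictive perf_event_paranoid).

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical workload to measure. */
void demo_workload(void)
{
    volatile uint64_t sum = 0;
    for (int i = 0; i < 100000; i++)
        sum += (uint64_t)i;
}

/* Hypothetical helper: count user-space instructions retired by fn() in the
 * calling thread. Returns 0 on success, -1 if the counter can't be used. */
int count_instructions(void (*fn)(void), uint64_t *out)
{
    struct perf_event_attr pe;
    memset(&pe, 0, sizeof(pe));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 1;       /* start stopped; enabled explicitly below */
    pe.exclude_kernel = 1; /* user space only: needs fewer privileges */
    pe.exclude_hv = 1;

    /* pid = 0, cpu = -1: this thread, on whichever CPU it runs */
    int fd = (int)syscall(__NR_perf_event_open, &pe, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t val;
    ssize_t rc = read(fd, &val, sizeof(val));
    close(fd);
    if (rc != (ssize_t)sizeof(val))
        return -1;
    *out = val;
    return 0;
}
```

Usage: `uint64_t n; if (count_instructions(demo_workload, &n) == 0) printf("%llu\n", (unsigned long long)n);`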

I want to put the same lines in the kernel code (in the context switch function) and check the values being read.

My end goal is to find a way to read a process's performance counters from within the kernel (4.19.2) itself, every time that process is switched out for another one.

To achieve this, I looked at the code for the __NR_perf_event_open system call. It can be found here. To make it usable, I copied the code inside it into a separate function, named it perf_event_open(), put it in the same file, and exported it.

Now the problem is that whenever I call perf_event_open() the same way as above, the descriptor returned is -2. Checking the error codes, I found that the error was ENOENT. The perf_event_open() man page says this error means the type field is invalid.

Since file descriptors are associated with the process that opened them, how can one use them from the kernel? Is there an alternative way to configure the PMU to start counting without involving file descriptors?

Recommended answer

You probably don't want the overhead of reprogramming the counters inside the context-switch function.

The easiest thing would be to make system calls from user-space to program the PMU (to count some event, probably setting it to count in kernel mode but not user-space, just so the counter overflows less often).
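
A minimal sketch of that user-space setup, assuming root (or CAP_SYS_ADMIN, or kernel.perf_event_paranoid <= 0, which per-CPU events require) and that CPUs 0..N-1 are online. The helper names are hypothetical. It opens one pinned, kernel-mode-only counter per CPU (pid = -1, cpu = i) and is meant to be followed by keeping the file descriptors open, since closing them lets perf release the hardware counter.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill *pe so the counter counts `config` only in kernel mode (CPL 0). */
void init_kernel_only_attr(struct perf_event_attr *pe, uint64_t config)
{
    memset(pe, 0, sizeof(*pe));
    pe->type = PERF_TYPE_HARDWARE;
    pe->size = sizeof(*pe);
    pe->config = config;
    pe->exclude_user = 1;   /* skip user space ...                     */
    pe->exclude_kernel = 0; /* ... count ring 0 only, so fewer wraps   */
    pe->exclude_hv = 1;
    pe->pinned = 1;         /* ask perf to keep it on the PMU, not multiplexed */
}

/* Open one counter per CPU, assuming CPUs 0..ncpus-1 are online.
 * Returns how many opened; fds[i] is -1 where opening failed. */
int open_kernel_counters(uint64_t config, int *fds, int max_cpus)
{
    struct perf_event_attr pe;
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    int opened = 0;

    init_kernel_only_attr(&pe, config);
    for (int cpu = 0; cpu < max_cpus; cpu++) {
        fds[cpu] = -1;
        if (cpu >= ncpus)
            continue;
        /* pid = -1, cpu = cpu: every task that runs on this CPU */
        fds[cpu] = (int)syscall(__NR_perf_event_open, &pe, -1, cpu, -1, 0);
        if (fds[cpu] >= 0)
            opened++;
    }
    return opened;
}
```

The caller would then just sleep (or pause()) with the descriptors open while the kernel-side code reads the counters.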

Then just use rdpmc twice (to get start/stop counts) in your custom kernel code. The counter will stay running, and I guess the kernel perf code will handle interrupts when it wraps around. (Or when its PEBS buffer is full.)
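
The two rdpmc reads and the start/stop difference might look like the sketch below. Assumptions not in the answer: the counter index is whichever one perf assigned, which user space can discover from the index field of the mmap'd struct perf_event_mmap_page (rdpmc takes index - 1), and perf does not move the event to another counter in between; and the general-purpose counters are 48 bits wide, as CPUID leaf 0xA (EAX bits 23:16) typically reports on this CPU generation. read_pmc() and pmc_delta() are hypothetical names; in-tree kernel code would more likely use the rdpmc helpers in <asm/msr.h>.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* Raw RDPMC: ECX selects the counter (0..N-1 are general-purpose; setting
 * bit 30 selects the fixed-function counters). Faults outside ring 0 unless
 * CR4.PCE is set, so this is meant for kernel code. */
static inline uint64_t read_pmc(uint32_t idx)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(idx));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Events counted between two reads of a `width`-bit counter. Masking
 * handles at most one wraparound between start and stop. */
static inline uint64_t pmc_delta(uint64_t start, uint64_t stop, unsigned width)
{
    uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
    return (stop - start) & mask;
}
```

In the context-switch path this would be `start = read_pmc(idx);` when a task is switched in and `pmc_delta(start, read_pmc(idx), 48)` when it is switched out.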

IDK if it's possible to program a counter so it just wraps without interrupting, for use-cases like this where you don't care about totals or sample-based profiling, and just want to use rdpmc. If so, do that.

Old answer, addressing your old question which was based on a buggy printf format string that was printing non-zero garbage even though you weren't counting anything in user-space either.

Your inline asm looks correct, so the question is what exactly that PMU counter is programmed to count in kernel mode in the context where your code runs.

perf virtualizes the PMU counters on context switch, giving the illusion of perf stat counting a single process even when it migrates across CPUs. Unless you're using perf stat -a to get system-wide counts, the PMU might not be programmed to count anything, so multiple reads would all give 0 even if at other times it's programmed to count a fast-changing event like cycles or instructions.

Are you sure you have perf set to count user + kernel events, not just user-space events?

perf stat will show something like instructions:u instead of instructions if it's limiting itself to user space. (This is the default for non-root users if you haven't lowered the sysctl kernel.perf_event_paranoid, e.g. to 0, from its safe default, which doesn't let user space learn anything about the kernel.)
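
To check which setting a machine has, read the sysctl from /proc. A small sketch; the helper names are hypothetical, and the < 2 threshold follows the kernel's sysctl documentation, where values >= 2 disallow kernel profiling for users without CAP_SYS_ADMIN.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a perf_event_paranoid value such as "2\n" or "-1\n".
 * Returns INT_MIN if the text doesn't start with an integer. */
int parse_paranoid(const char *text)
{
    char *end;
    long v = strtol(text, &end, 10);
    if (end == text)
        return INT_MIN;
    return (int)v;
}

/* Read the current setting; INT_MIN if the sysctl file is unavailable. */
int read_paranoid(void)
{
    char buf[32];
    FILE *f = fopen("/proc/sys/kernel/perf_event_paranoid", "r");
    if (!f)
        return INT_MIN;
    char *ok = fgets(buf, sizeof(buf), f);
    fclose(f);
    return ok ? parse_paranoid(buf) : INT_MIN;
}

/* Values >= 2 disallow kernel profiling for unprivileged users. */
int unprivileged_kernel_counting_allowed(int paranoid)
{
    return paranoid != INT_MIN && paranoid < 2;
}
```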

There's HW support for programming a counter to count only when CPL != 0 (i.e. not in ring 0 / kernel mode). Higher values of kernel.perf_event_paranoid make the perf API refuse to program counters that count in kernel mode, but even with paranoid = -1 you can still program them to count user space only. If that's how you programmed your counter, it would explain everything.
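
At the register level this is the USR and OS bits of Intel's IA32_PERFEVTSELx MSRs (bit 16 counts at CPL != 0, bit 17 at CPL 0, bit 22 enables the counter, per the Intel SDM); on x86, perf sets USR unless exclude_user is set and OS unless exclude_kernel is set. A sketch that only builds the event-select value; perfevtsel() is a hypothetical name, and actually writing the MSR would need wrmsr from ring 0, on hardware that perf normally owns.

```c
#include <stdint.h>

#define EVTSEL_USR (1ULL << 16) /* count when CPL != 0 (user mode)   */
#define EVTSEL_OS  (1ULL << 17) /* count when CPL == 0 (kernel mode) */
#define EVTSEL_EN  (1ULL << 22) /* enable the counter                */

/* Build an IA32_PERFEVTSELx value for event/umask, choosing which
 * privilege levels are counted. */
uint64_t perfevtsel(uint8_t event, uint8_t umask, int count_user, int count_kernel)
{
    uint64_t v = (uint64_t)event | ((uint64_t)umask << 8) | EVTSEL_EN;
    if (count_user)
        v |= EVTSEL_USR;
    if (count_kernel)
        v |= EVTSEL_OS;
    return v;
}
```

With the architectural "instructions retired" event (event 0xC0, umask 0x00), a user-only counter is 0x4100C0 and a user+kernel counter is 0x4300C0.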

We'd need to see the code you used to program the counter. This doesn't happen automatically.

The kernel doesn't just leave the counters running all the time when no process has used a PAPI function to enable a per-process or system-wide counter; that would generate interrupts that slow the system down for no benefit.