perf_event_open Overflow Signal

Question

I want to count the (more or less) exact number of instructions for some piece of code. Additionally, I want to receive a signal after a specific number of instructions have passed.

For this purpose, I use the overflow signal behaviour provided by perf_event_open.

I'm using the second way the manpage proposes to achieve overflow signals:

Signal overflow

Events can be set to deliver a signal when a threshold is crossed. The signal handler is set up using the poll(2), select(2), epoll(2) and fcntl(2), system calls.

[...]

The other way is by use of the PERF_EVENT_IOC_REFRESH ioctl. This ioctl adds to a counter that decrements each time the event overflows. When nonzero, a POLL_IN signal is sent on overflow, but once the value reaches 0, a signal is sent of type POLL_HUP and the underlying event is disabled.

Further explanation of PERF_EVENT_IOC_REFRESH ioctl:

PERF_EVENT_IOC_REFRESH

Non-inherited overflow counters can use this to enable a counter for a number of overflows specified by the argument, after which it is disabled. Subsequent calls of this ioctl add the argument value to the current count. A signal with POLL_IN set will happen on each overflow until the count reaches 0; when that happens a signal with POLL_HUP set is sent and the event is disabled. Using an argument of 0 is considered undefined behavior.
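Putting the two man-page excerpts together: each overflow happens after sample_period counted events, and PERF_EVENT_IOC_REFRESH with argument n allows n overflows, the last of which raises POLL_HUP and disables the event. So, ignoring skid, POLL_HUP is expected after roughly n * sample_period events. The minimal C sketch below encodes that arithmetic and the POLL_IN/POLL_HUP distinction; classify_overflow, events_until_hup and overflow_handler are hypothetical helper names, not part of any API.

```c
#define _GNU_SOURCE 1
#include <signal.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>

/* Hypothetical classification of the si_code of an overflow signal. */
enum overflow_kind { OVERFLOW_MORE_PENDING, OVERFLOW_LAST, OVERFLOW_UNEXPECTED };

/* POLL_IN: refresh count is still nonzero, the event keeps running.
   POLL_HUP: refresh count reached 0, the kernel has disabled the event. */
enum overflow_kind classify_overflow(int si_code)
{
    switch (si_code) {
    case POLL_IN:  return OVERFLOW_MORE_PENDING;
    case POLL_HUP: return OVERFLOW_LAST;
    default:       return OVERFLOW_UNEXPECTED;
    }
}

/* Counted events after which POLL_HUP is expected (ignoring skid) when the
   event was armed with ioctl(fd, PERF_EVENT_IOC_REFRESH, refresh). */
unsigned long long events_until_hup(unsigned long long sample_period,
                                    unsigned long refresh)
{
    return sample_period * refresh;
}

/* Hypothetical SA_SIGINFO handler: re-arm for one more overflow only after
   the event has been disabled (POLL_HUP). si_fd is filled in because the
   fd was configured with F_SETSIG. */
void overflow_handler(int signum, siginfo_t *info, void *ucontext)
{
    (void)signum;
    (void)ucontext;
    if (classify_overflow(info->si_code) == OVERFLOW_LAST)
        ioctl(info->si_fd, PERF_EVENT_IOC_REFRESH, 1);
}
```

With sample_period = 1000 and a refresh argument of 1, as in the question, the single overflow is also the last one, so only POLL_HUP should ever be seen.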

A very minimal example would look like this:

#define _GNU_SOURCE 1

#include <asm/unistd.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

long perf_event_open(struct perf_event_attr* event_attr, pid_t pid, int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, event_attr, pid, cpu, group_fd, flags);
}

static void perf_event_handler(int signum, siginfo_t* info, void* ucontext) {
    if(info->si_code != POLL_HUP) {
        // Only POLL_HUP should happen.
        _exit(EXIT_FAILURE);    // exit() is not async-signal-safe; _exit() is
    }

    ioctl(info->si_fd, PERF_EVENT_IOC_REFRESH, 1);
}

int main(int argc, char** argv)
{
    // Configure signal handler
    struct sigaction sa;
    memset(&sa, 0, sizeof(struct sigaction));
    sa.sa_sigaction = perf_event_handler;
    sa.sa_flags = SA_SIGINFO;

    // Setup signal handler
    if (sigaction(SIGIO, &sa, NULL) < 0) {
        fprintf(stderr,"Error setting up signal handler\n");
        perror("sigaction");
        exit(EXIT_FAILURE);
    }

    // Configure perf_event_attr struct
    struct perf_event_attr pe;
    memset(&pe, 0, sizeof(struct perf_event_attr));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(struct perf_event_attr);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;     // Count retired hardware instructions
    pe.disabled = 1;        // Event is initially disabled
    pe.sample_type = PERF_SAMPLE_IP;
    pe.sample_period = 1000;
    pe.exclude_kernel = 1;      // excluding events that happen in the kernel-space
    pe.exclude_hv = 1;          // excluding events that happen in the hypervisor

    pid_t pid = 0;  // measure the current process/thread
    int cpu = -1;   // measure on any cpu
    int group_fd = -1;
    unsigned long flags = 0;

    int fd = perf_event_open(&pe, pid, cpu, group_fd, flags);
    if (fd == -1) {
        fprintf(stderr, "Error opening leader %llx\n", pe.config);
        perror("perf_event_open");
        exit(EXIT_FAILURE);
    }

    // Setup event handler for overflow signals
    fcntl(fd, F_SETFL, O_NONBLOCK|O_ASYNC);
    fcntl(fd, F_SETSIG, SIGIO);
    fcntl(fd, F_SETOWN, getpid());

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);     // Reset event counter to 0
    ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);   // Enable the event for one overflow

    // Start monitoring

    long loopCount = 1000000;
    volatile long c = 0;    // volatile keeps the compiler from removing the loop
    long i = 0;

    // Some sample payload.
    for(i = 0; i < loopCount; i++) {
        c += 1;
    }

    // End monitoring

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);   // Disable event

    long long counter;
    read(fd, &counter, sizeof(long long));  // Read event counter value

    printf("Used %lld instructions\n", counter);

    close(fd);
}

So basically I'm doing the following:

  1. Set up a signal handler for SIGIO signals
  2. Create a new performance counter with perf_event_open (returns a file descriptor)
  3. Use fcntl to add signal sending behavior to the file descriptor.
  4. Run a payload loop to execute many instructions.
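Steps 2 and 3 can be wrapped in a small helper that checks every fcntl result, which the code above does not do. This is a sketch, and arm_overflow_signal is a hypothetical name. Unlike the code in the question, it sets the owner and the signal before enabling O_ASYNC, so that delivery is fully configured before any notification can be generated.

```c
#define _GNU_SOURCE 1
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Route overflow notifications for `fd` to this process as signal `signo`.
   Returns 0 on success, -1 (with errno set) if any fcntl call fails. */
int arm_overflow_signal(int fd, int signo)
{
    /* Deliver notifications to this process. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    /* A nonzero F_SETSIG value makes the kernel send `signo` and, for an
       SA_SIGINFO handler, fill in si_fd and si_code (POLL_IN / POLL_HUP). */
    if (fcntl(fd, F_SETSIG, signo) == -1)
        return -1;

    /* Only now enable signal-driven I/O, keeping any existing file flags. */
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK | O_ASYNC) == -1)
        return -1;

    return 0;
}
```

In the question's main(), the three fcntl calls could then become `if (arm_overflow_signal(fd, SIGIO) == -1) { perror("fcntl"); exit(EXIT_FAILURE); }`.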

When the payload loop runs, at some point 1000 instructions (the sample_period) will have been executed. According to the perf_event_open man page, this triggers an overflow, which then decrements an internal counter. Once this counter reaches zero, "a signal is sent of type POLL_HUP and the underlying event is disabled."

When a signal is sent, the control flow of the current process/thread is stopped, and the signal handler is executed. Scenario:

  1. 1000 instructions have been executed.
  2. Event is automatically disabled and a signal is sent.
  3. Signal is immediately delivered, control flow of the process is stopped and the signal handler is executed.

This scenario would mean two things:

  • The final amount of counted instructions would always be equal to an example which does not use signals at all.
  • The instruction pointer which has been saved for the signal handler (and can be accessed through ucontext) would directly point to the instruction which caused the overflow.

In other words, the signal behavior could be regarded as synchronous.

These are exactly the semantics I want.

However, as far as I understand, the signal I configured is asynchronous, and some time may pass before it is delivered and the signal handler runs. This could be a problem for me.

For example, consider the following scenario:

  1. 1000 instructions have been executed.
  2. Event is automatically disabled and a signal is sent.
  3. Some more instructions pass
  4. Signal is delivered, control flow of the process is stopped and the signal handler is executed.

This scenario would mean two things:

  • The final amount of counted instructions would be less than an example which does not use signals at all.
  • The instruction pointer saved for the signal handler would point either to the instruction that caused the overflow or to any instruction after it.
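To find out empirically where the signal actually lands, the handler can record the instruction pointer at which it interrupted the program and compare it with the addresses of the payload loop in a disassembly. The sketch below reads that pointer from the ucontext argument of an SA_SIGINFO handler. It assumes glibc on Linux and handles only x86-64, i386 and AArch64; ucontext_ip, last_overflow_ip and record_ip_handler are hypothetical names.

```c
#define _GNU_SOURCE 1
#include <signal.h>
#include <stdint.h>
#include <ucontext.h>

/* Return the instruction pointer saved in a ucontext_t, or 0 on
   architectures this sketch does not handle. */
uintptr_t ucontext_ip(const ucontext_t *uc)
{
#if defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#elif defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext.pc;
#else
    (void)uc;
    return 0;
#endif
}

/* Where the most recent overflow signal interrupted the program. */
volatile uintptr_t last_overflow_ip = 0;

/* Hypothetical SA_SIGINFO handler that only records the interrupted IP. */
void record_ip_handler(int signum, siginfo_t *info, void *ucontext)
{
    (void)signum;
    (void)info;
    last_overflow_ip = ucontext_ip((const ucontext_t *)ucontext);
}
```

If the recorded address varies from run to run within the loop, instructions are passing between the overflow and the delivery of the signal.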

So far, I've tested the above example extensively and have not observed any missed instructions, which supports the first scenario.

However, I'd really like to know whether I can rely on this assumption. What happens in the kernel?

Answer

I want to count the (more or less) exact number of instructions for some piece of code. Additionally, I want to receive a signal after a specific number of instructions have passed.

You have two tasks which may conflict with each other. When you want counts (exact amounts of some hardware event), just use the performance monitoring unit (PMU) of your CPU in counting mode (don't set sample_period/sample_freq in the perf_event_attr structure you pass) and place the measurement code in your target program, as was done in your example. In this mode, according to the perf_event_open man page, no overflows will be generated (the hardware counters are wide and the kernel accumulates them into a 64-bit count, so they don't overflow in practice unless they are preset to a small negative value, which is what sampling mode does):

Overflows are generated only by sampling events (sample_period must have a nonzero value).
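The difference between the two modes comes down to one field of struct perf_event_attr. In the sketch below, init_instruction_event is a hypothetical helper that mirrors the fields used in the question: a sample_period of 0 gives a pure counting event, while a nonzero value turns it into a sampling event that overflows every sample_period instructions.

```c
#include <string.h>
#include <linux/perf_event.h>

/* Fill `pe` for user-space retired instructions, initially disabled.
   sample_period == 0 -> counting mode: no overflows, hence no signals.
   sample_period  > 0 -> sampling mode: an overflow every sample_period events. */
void init_instruction_event(struct perf_event_attr *pe,
                            unsigned long long sample_period)
{
    memset(pe, 0, sizeof(*pe));
    pe->type = PERF_TYPE_HARDWARE;
    pe->size = sizeof(*pe);
    pe->config = PERF_COUNT_HW_INSTRUCTIONS;
    pe->disabled = 1;
    pe->exclude_kernel = 1;
    pe->exclude_hv = 1;

    if (sample_period != 0) {
        /* freq stays 0, so the union member is interpreted as a period. */
        pe->sample_period = sample_period;
        pe->sample_type = PERF_SAMPLE_IP;
    }
}
```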

To count only part of a program, use ioctls on the fd returned by perf_event_open, as described in the man page:

perf_event ioctl calls - Various ioctls act on perf_event_open() file descriptors: PERF_EVENT_IOC_ENABLE ... PERF_EVENT_IOC_DISABLE ... PERF_EVENT_IOC_RESET

You can read the current value with rdpmc (on x86) or with a read syscall on the fd, as in this short example from the man page:

   #include <stdlib.h>
   #include <stdio.h>
   #include <unistd.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <linux/perf_event.h>
   #include <asm/unistd.h>

   static long
   perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                   int cpu, int group_fd, unsigned long flags)
   {
       int ret;

       ret = syscall(__NR_perf_event_open, hw_event, pid, cpu,
                      group_fd, flags);
       return ret;
   }

   int
   main(int argc, char **argv)
   {
       struct perf_event_attr pe;
       long long count;
       int fd;

       memset(&pe, 0, sizeof(struct perf_event_attr));
       pe.type = PERF_TYPE_HARDWARE;
       pe.size = sizeof(struct perf_event_attr);
       pe.config = PERF_COUNT_HW_INSTRUCTIONS;
       pe.disabled = 1;
       pe.exclude_kernel = 1;
       pe.exclude_hv = 1;

       fd = perf_event_open(&pe, 0, -1, -1, 0);
       if (fd == -1) {
          fprintf(stderr, "Error opening leader %llx\n", pe.config);
          exit(EXIT_FAILURE);
       }

       ioctl(fd, PERF_EVENT_IOC_RESET, 0);
       ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

       printf("Measuring instruction count for this printf\n");
       /* Place target code here instead of printf */

       ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
       read(fd, &count, sizeof(long long));

       printf("Used %lld instructions\n", count);

       close(fd);
   }

Additionally, I want to receive a signal after a specific number of instructions have passed.

Do you really want to get a signal, or do you just need the instruction pointer after every 1000 executed instructions? If you want to collect pointers, use perf_event_open in sampling mode, but do it from another program, so that the event-collection code itself is not measured.

It will also have less negative effect on your target program if you don't use a signal for every overflow (which means a lot of kernel-tracer interaction and switching to and from the kernel), but instead use the ability of perf_events to collect several overflow events into a single mmap buffer and poll on this buffer. On an overflow interrupt from the PMU, the perf interrupt handler is called to save the instruction pointer into the buffer; then the count is reset and the program returns to execution. In your example, the perf interrupt handler will wake your program, which will do several syscalls and return to the kernel, and then the kernel will restart the target code (so the overhead per sample is greater than with mmap and parsing the buffer).

With the precise_ip flag you may activate advanced sampling of your PMU, if it has such a mode (like PEBS and PREC_DIST on Intel x86/em64t for some counters such as INST_RETIRED, UOPS_RETIRED, BR_INST_RETIRED, BR_MISP_RETIRED, MEM_UOPS_RETIRED, MEM_LOAD_UOPS_RETIRED, MEM_LOAD_UOPS_LLC_HIT_RETIRED, and, with a simple hack, cycles too; or like IBS on AMD x86/amd64; see the paper about PEBS and IBS), in which case the instruction address is saved directly by the hardware with low skid. Some very advanced PMUs can do sampling in hardware, storing overflow information for several events in a row with automatic counter reset and no software interrupts (some descriptions of precise_ip are in the same paper).
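The mmap-based alternative can be sketched in two parts. Per the perf_event_open man page, the mapping should be 1 + 2^n pages: one metadata page (struct perf_event_mmap_page) followed by a power-of-two data area of records, each starting with a struct perf_event_header. The parser below assumes sample_type == PERF_SAMPLE_IP, so each PERF_RECORD_SAMPLE body is a single u64 IP, and that the records have already been copied out of the ring into a linear buffer. Real code must also read data_head from the metadata page with a read barrier, handle wraparound and update data_tail. perf_mmap_len and collect_sample_ips are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Length to pass to mmap(): one metadata page plus 2^n data pages. */
size_t perf_mmap_len(size_t page_size, unsigned n)
{
    return (1 + ((size_t)1 << n)) * page_size;
}

/* Walk a linear copy of ring-buffer records and store the IPs of
   PERF_RECORD_SAMPLE records (sample_type == PERF_SAMPLE_IP assumed).
   Returns the number of IPs written to `ips`. */
size_t collect_sample_ips(const unsigned char *data, size_t len,
                          uint64_t *ips, size_t max_ips)
{
    size_t off = 0, n = 0;

    while (off + sizeof(struct perf_event_header) <= len && n < max_ips) {
        struct perf_event_header hdr;
        memcpy(&hdr, data + off, sizeof hdr);     /* avoid unaligned access */

        if (hdr.size < sizeof hdr || off + hdr.size > len)
            break;                                /* truncated or corrupt record */

        if (hdr.type == PERF_RECORD_SAMPLE &&
            hdr.size >= sizeof hdr + sizeof(uint64_t))
            memcpy(&ips[n++], data + off + sizeof hdr, sizeof(uint64_t));

        off += hdr.size;                          /* skip to the next record */
    }
    return n;
}
```

Other record types (for example PERF_RECORD_LOST) are simply skipped using their size field, which every record carries in its header.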

I don't know whether the perf_events subsystem and your CPU allow two perf_event tasks to be active at the same time: one counting events in the target process and, simultaneously, one sampling from another process. With an advanced PMU this may be possible in hardware, and perf_events in a modern kernel may allow it. But you gave no details about your kernel version or your CPU vendor and family, so we can't answer this part.

You may also try other APIs to access the PMU, such as PAPI or likwid (https://github.com/RRZE-HPC/likwid). Some of them can read PMU registers directly (sometimes MSRs) and may allow sampling while counting is enabled.
