How to Configure and Sample Intel Performance Counters In-Process


Question

In a nutshell, I'm trying to achieve the following inside a userland benchmark process (pseudo-code, assuming x86_64 and a UNIX system):

results[] = ...
for (iteration = 0; iteration < num_iterations; iteration++) {
    pctr_start = sample_pctr();
    the_benchmark();
    pctr_stop = sample_pctr();
    results[iteration] = pctr_stop - pctr_start;
}

FWIW, the performance counter I am thinking of using is CPU_CLK_UNHALTED.THREAD_ALL, to read the number of core cycles independent of clock frequency changes (in an earlier question I had been planning to use the TSC register for this, but alas, that is not what this register measures at all).

My initial intention was to use inline assembler to first configure a counter using WRMSR, then to read the counter using RDPMC inside sample_pctr().
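For illustration, the read side of sample_pctr() might look something like the sketch below. It assumes GCC or Clang on x86_64; the helper names rdpmc() and rdpmc_fixed() are made up for this example. It also assumes the kernel has already enabled user-space RDPMC and programmed the counter, since otherwise the instruction faults.

```c
#include <stdint.h>

/* Hypothetical helper: ECX value that selects fixed-function counter `idx`.
 * Setting bit 30 of ECX makes RDPMC read from the fixed-function bank
 * instead of the general-purpose (programmable) counters. */
static inline uint32_t rdpmc_fixed(uint32_t idx)
{
    return (1u << 30) | idx;
}

/* Hypothetical helper: read one performance counter with RDPMC.
 * The counter index goes in ECX; the value comes back in EDX:EAX.
 * This faults unless user-space RDPMC has been enabled by the kernel. */
static inline uint64_t rdpmc(uint32_t counter)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
    return ((uint64_t)hi << 32) | lo;
}
```

With these, sample_pctr() could be as small as `return rdpmc(rdpmc_fixed(1));` on CPUs where fixed-function counter 1 counts unhalted core cycles. Whether that counter matches the event named above should be checked against the Intel SDM for the CPU in question.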

I stumbled at the first hurdle, as writing MSRs requires kernel privileges. It seems like you can in fact read the counters from user space (if configured correctly), but the act of configuring the counter (with an MSR) needs to be undertaken by the kernel.

Does anyone know a lightweight way to ask the kernel to configure a performance counter from user space so that I can then use RDPMC from within my benchmark harness?

Stuff I've looked into/thought about:

  • Perf tools for Linux. Seems to be geared up for sampling over the whole lifetime of a process, not within a process at specific points (before and after each iteration).
  • Use perf syscalls directly (i.e. perf_event_open). Looks like the counter value will only update periodically (using a sample rate) or after the counter exceeds a threshold. I need the counter value precisely at the moment I ask. This is why RDPMC seemed so attractive. I imagine that sampling frequently will itself skew the performance counter readings.
  • PAPI builds on perf, so probably inherits the above problem.
  • Write a kernel module -- too much effort, too error prone.

Ideally I would like a solution which works on OpenBSD and Linux, but somehow I think that is a tall order. Perhaps just for Linux for now.

Any help is most appreciated. Thanks.

I just found the Linux msr device node, which would probably suffice. I'll leave the question up in case a better answer shows up.

Recommended answer

It seems the best way -- for Linux at least -- is to use the msr device node.

You simply open a device node, seek to the address of the MSR required, and read or write 8 bytes.
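A minimal sketch of that open/seek/read-or-write sequence in C follows. The function names msr_read() and msr_write() are invented for this example, and the device path is passed in as a parameter rather than hard-coded. pread() and pwrite() are used so that the seek and the 8-byte transfer happen in one call.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one 64-bit MSR through an msr device node.
 * The file offset is the MSR address. Returns 0 on success, -1 on failure. */
int msr_read(const char *dev_path, uint32_t msr_addr, uint64_t *value)
{
    int fd = open(dev_path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof *value, (off_t)msr_addr);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* Hypothetical helper: write one 64-bit MSR through an msr device node.
 * Returns 0 on success, -1 on failure. */
int msr_write(const char *dev_path, uint32_t msr_addr, uint64_t value)
{
    int fd = open(dev_path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &value, sizeof value, (off_t)msr_addr);
    close(fd);
    return n == (ssize_t)sizeof value ? 0 : -1;
}
```

On Linux the node for CPU 0 is normally /dev/cpu/0/msr, which exists once the msr kernel module is loaded and is generally only accessible to root. MSRs are per-CPU, so the benchmark would presumably need to be pinned to the same CPU whose node was written. The MSR addresses and bit layouts themselves (for example IA32_PERFEVTSEL0 at 0x186) come from the Intel SDM and are not checked by this code.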

OpenBSD is harder, since (at the time of writing) there is no user-space proxy to the MSRs. So you would need to write a kernel module or implement a sysctl by hand.
