How should I read Intel PCI uncore performance counters on Linux as non-root?


Question

I'd like to have a library that allows 'self profiling' of critical sections of Linux executables. In the same way that one can time a section using gettimeofday() or RDTSC, I'd like to be able to count events such as branch misses and cache hits.
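
To make "in the same way" concrete, this is the kind of minimal timing wrapper I have in mind (a sketch only; the serialization details are simplified and a stricter version would add explicit fencing):

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc(), __rdtscp() */

int main(void)
{
        unsigned aux;
        uint64_t start = __rdtsc();

        /* ... critical section being profiled ... */

        /* __rdtscp() waits for preceding instructions to retire before
         * reading the TSC, which is why it is used at the end. */
        uint64_t end = __rdtscp(&aux);

        printf("section took %llu TSC ticks\n",
               (unsigned long long)(end - start));
        return 0;
}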

There are a number of tools that do similar things (perf, PAPI, likwid) but I haven't found anything that matches what I'm looking for. Likwid comes closest, so I'm mostly looking at ways to modify its existing Marker API.

The per-core counter values are stored in MSRs (Model Specific Registers), but for current Intel processors (Sandy Bridge onward) the "uncore" measurements (memory accesses and other things that pertain to the CPU as a whole) are accessed over PCI.

The usual approach taken is that the MSRs are read using the msr kernel module, and that the PCI counters (if supported) are read from the sysfs-pci hierarchy. The problem is that both of these require the reader to be running as root and have 'setcap cap_sys_rawio'. This is difficult (or impossible) for many users.
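
For context, a privileged read through the msr module is just a pread() at the MSR's address on the per-CPU device node; MSR_EXAMPLE below is a placeholder index rather than a specific counter register:

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define MSR_EXAMPLE 0x0   /* placeholder: the MSR number to read */

/* Read one MSR on CPU 0 via the msr kernel module (needs root or
 * cap_sys_rawio). The file offset selects the MSR address. */
static uint64_t read_msr(uint32_t msr)
{
        uint64_t val = 0;
        int fd = open("/dev/cpu/0/msr", O_RDONLY);

        if (fd < 0)
                return (uint64_t)-1;
        if (pread(fd, &val, sizeof(val), msr) != sizeof(val))
                val = (uint64_t)-1;
        close(fd);
        return val;
}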

It's also not particularly fast. Since the goal is to profile small pieces of code, the 'skew' from reading each counter with a syscall is significant. It turns out that the MSR registers can be read by a normal user using RDPMC. I don't yet have a great solution for reading the PCI registers.
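
To make the per-core side concrete, a direct user-space read looks like the sketch below. Whether it is permitted is controlled by the kernel (CR4.PCE, exposed on recent kernels through something like /sys/devices/cpu/rdpmc), and the counter index has to match whatever was actually programmed, so both are assumptions about the setup:

#include <stdint.h>

/* Read performance-monitoring counter 'idx' directly from user space.
 * Requires the kernel to have enabled user-mode RDPMC (CR4.PCE).
 * Setting bit 30 of idx selects the fixed-function counters. */
static inline uint64_t rdpmc(uint32_t idx)
{
        uint32_t lo, hi;

        __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(idx));
        return ((uint64_t)hi << 32) | lo;
}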

One way would be to proxy everything through an 'access server' running as root. This would work, but would be even slower (and hence less accurate) than using /proc/bus/pci. I'm trying to figure out how best to make the PCI 'configuration' space of the counters visible to a non-privileged program.

The best I've come up with is to have a server running as root, to which the client can connect at startup via a Unix local domain socket. As root, the server will open the appropriate device files, and pass the open file handle to the client. The client should then be able to make multiple reads during execution on its own. Is there any reason this wouldn't work?
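
The descriptor hand-off itself is standard SCM_RIGHTS passing over the Unix socket; a sketch of the server side follows (socket setup and error handling omitted, and the client mirrors it with recvmsg()):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send an already-open file descriptor (e.g. the PCI device file the
 * server opened as root) to the client over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
        char byte = 0;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union {                          /* keeps the cmsg buffer aligned */
                char buf[CMSG_SPACE(sizeof(int))];
                struct cmsghdr align;
        } u;
        struct msghdr msg = {
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_RIGHTS;
        cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}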

But even if I do that, I'll still be using a pread() system call (or something comparable) for every access, of which there might be billions. If trying to time small sub-1000 cycle sections, this might be too much overhead. Instead, I'd like to figure out how to access these counters as Memory Mapped I/O.
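
Concretely, each sample through the passed descriptor would be something like the following, where UNCORE_CTR_OFFSET is a hypothetical register offset rather than one taken from the uncore documentation:

#include <stdint.h>
#include <unistd.h>

#define UNCORE_CTR_OFFSET 0xA0   /* hypothetical counter register offset */

/* One counter sample = one pread() on the config-space fd received
 * from the root server. */
static uint64_t read_uncore_ctr(int cfg_fd)
{
        uint64_t val = 0;

        if (pread(cfg_fd, &val, sizeof(val), UNCORE_CTR_OFFSET) != sizeof(val))
                return (uint64_t)-1;
        return val;
}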

That is, I'd like to have read-only access to each counter represented by an address in memory, with the I/O mapping happening at the level of the processor and IOMMU rather than involving the OS. This is described in the Intel Architectures Software Developer's Manual, Vol. 1, section 16.3.1 (Memory Mapped I/O).

This seems almost possible. In proc_bus_pci_mmap(), the device handler for /proc/bus/pci seems to allow the configuration area to be mapped, but only by root, and only if I have CAP_SYS_RAWIO.

static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct pci_dev *dev = PDE_DATA(file_inode(file));
        struct pci_filp_private *fpriv = file->private_data;
        int i, ret;

        if (!capable(CAP_SYS_RAWIO))
                return -EPERM;

        /* Make sure the caller is mapping a real resource for this device */
        for (i = 0; i < PCI_ROM_RESOURCE; i++) {
                if (pci_mmap_fits(dev, i, vma,  PCI_MMAP_PROCFS))
                        break;
        }

        if (i >= PCI_ROM_RESOURCE)
                return -ENODEV;

        ret = pci_mmap_page_range(dev, vma,
                                  fpriv->mmap_state,
                                  fpriv->write_combine);
        if (ret < 0)
                return ret;

        return 0;
}

So while I could pass the file handle to the client, it can't mmap() it, and I can't think of any way to share an mmap'd region with a non-descendent process.

(Finally, we get to the questions!)

So presuming I really want to have a pointer in a non-privileged process that can read from PCI configuration space without help from the kernel each time, what are my options?

1) Maybe I could have a root process open /dev/mem, and then pass that open file descriptor to the child, which can then mmap the part that it wants. But I can't think of any way to make that even remotely secure.
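
For reference, the mapping itself would look something like this on the client side; PCICFG_BASE is a stand-in for the ECAM/MMCFG base, which would really have to be discovered from the ACPI MCFG table, so treat the constant and layout macro as illustrative only:

#define _FILE_OFFSET_BITS 64
#include <stdint.h>
#include <sys/mman.h>

#define PCICFG_BASE 0x80000000ULL        /* hypothetical MMCFG/ECAM base */
#define CFG_PAGE(bus, dev, fn) \
        (PCICFG_BASE + (((uint64_t)(bus) << 20) | ((dev) << 15) | ((fn) << 12)))

/* Map one device function's 4 KiB config page, read-only, through the
 * /dev/mem fd that was passed in from the root helper. */
static volatile uint32_t *map_cfg(int mem_fd, int bus, int dev, int fn)
{
        void *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, mem_fd,
                       (off_t)CFG_PAGE(bus, dev, fn));

        return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}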

2) I could write my own kernel module, which looks a lot like linux/drivers/pci/proc.c but omits the check for the usual permissions. Since I can lock this down so that it is read-only and just for the PCI space that I want, it should be reasonably safe.
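
A very rough sketch of what that module's mmap path might be, exposing a single fixed window read-only through a misc device (the physical base and size are placeholders, not real uncore addresses):

#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/mm.h>

#define CTR_PHYS_BASE 0x80000000UL   /* placeholder counter window */
#define CTR_PHYS_SIZE 0x1000UL

static int ctr_mmap(struct file *file, struct vm_area_struct *vma)
{
        unsigned long len = vma->vm_end - vma->vm_start;

        /* Read-only, and never larger than the one window we expose. */
        if (len > CTR_PHYS_SIZE || (vma->vm_flags & VM_WRITE))
                return -EPERM;

        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
        return remap_pfn_range(vma, vma->vm_start,
                               CTR_PHYS_BASE >> PAGE_SHIFT,
                               len, vma->vm_page_prot);
}

static const struct file_operations ctr_fops = {
        .owner = THIS_MODULE,
        .mmap  = ctr_mmap,
};

static struct miscdevice ctr_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "uncore_ctr",
        .fops  = &ctr_fops,
};

static int __init ctr_init(void)
{
        return misc_register(&ctr_dev);
}

static void __exit ctr_exit(void)
{
        misc_deregister(&ctr_dev);
}

module_init(ctr_init);
module_exit(ctr_exit);
MODULE_LICENSE("GPL");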

3) ??? (This is where you come in)

Answer

Maybe the answer is a little late. The answer is to use likwid. As you said, reading the MSRs and sysfs-pci has to be done as root. Building the likwid accessDaemon and giving it the rights to access the MSRs bypasses this issue. Of course, due to the inter-process communication, the performance values can arrive with some delay, but this delay is not very high. (For small code sections, the performance counters are somewhat imprecise anyway.)
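
For what it's worth, the Marker API usage that goes with the accessDaemon looks roughly like the sketch below (built with -DLIKWID_PERFMON and linked with -llikwid; in older likwid releases the macros come from likwid.h, newer ones ship them in likwid-marker.h, and the region name "critical" is arbitrary):

#include <likwid.h>   /* likwid-marker.h in likwid 5.x */

int main(void)
{
        LIKWID_MARKER_INIT;
        LIKWID_MARKER_THREADINIT;

        LIKWID_MARKER_START("critical");
        /* ... critical section being profiled ... */
        LIKWID_MARKER_STOP("critical");

        LIKWID_MARKER_CLOSE;
        return 0;
}

The program is then run under likwid-perfctr with the marker switch, something like likwid-perfctr -C 0 -g <group> -m ./a.out, so that the counters read through the daemon are attributed to the marked region.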

Likwid can also work with uncore events.

Best
