Can the Intel performance monitor counters be used to measure memory bandwidth?


Question

Can the Intel PMU be used to measure per-core read/write memory bandwidth usage? Here "memory" means DRAM (i.e., accesses that do not hit in any cache level).

Recommended answer

Yes, this is possible, although it is not necessarily as straightforward as programming the usual PMU counters.

One approach is to use the programmable memory controller counters, which are accessed via PCI space. A good place to start is Intel's own implementation in pcm-memory, at pcm-memory.cpp. This tool reports per-socket or per-memory-controller throughput, which is suitable for some uses. Note that the bandwidth measured this way is shared among all cores, so it is not a per-core figure: on a quiet machine you can assume most of the bandwidth is associated with the process under test, and if you want to monitor at the socket level, it is exactly what you need.
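As a rough illustration of the arithmetic involved, here is a minimal sketch that turns two snapshots of memory controller read/write counters into bandwidth figures. It assumes each counted event corresponds to one 64-byte cache line transfer (which is how CAS count events are commonly interpreted) and it ignores counter wraparound. The `Snapshot` type and `dram_bandwidth` function are hypothetical names made up for this example; they are not part of pcm, and actually reading the counters is left out entirely.

```python
from dataclasses import dataclass

# Assumption: one counted read/write event moves one 64-byte cache line.
CACHE_LINE_BYTES = 64


@dataclass
class Snapshot:
    """Hypothetical container for raw memory controller counter values."""
    reads: int   # cumulative read event count
    writes: int  # cumulative write event count


def dram_bandwidth(start: Snapshot, end: Snapshot, seconds: float):
    """Return (read_bytes_per_second, write_bytes_per_second).

    The result covers everything that hit the memory controller in the
    interval, so it is a per-socket (or per-controller) figure, not a
    per-core one. Counter wraparound is not handled.
    """
    if seconds <= 0:
        raise ValueError("interval must be positive")
    read_bytes = (end.reads - start.reads) * CACHE_LINE_BYTES
    write_bytes = (end.writes - start.writes) * CACHE_LINE_BYTES
    return read_bytes / seconds, write_bytes / seconds
```

For example, calling `dram_bandwidth(Snapshot(0, 0), Snapshot(1_000_000, 500_000), 1.0)` gives the read and write rates in bytes per second for a one-second interval.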

The other alternative is careful programming of the "offcore response" counters. These, as far as I know, relate to traffic between the L2 (the last core-private cache) and the rest of the system. You can filter by the result of the offcore response, so you can use a combination of the various "L3 miss" events and multiply by the cache line size to get read and write bandwidth. The events are quite fine-grained, so you can break the traffic down further by what caused the access in the first place: instruction fetch, demand data requests, prefetching, and so on.
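The "multiply by the cache line size" step can be sketched as below. This is only an illustration under assumptions: the counts are taken to be per-core L3-miss offcore response counts that you have already collected by some other means, the cache line is assumed to be 64 bytes, and the request-type keys (`demand_data_rd`, `prefetch`, and so on) are hypothetical labels chosen for the example rather than real event names, which differ between microarchitectures.

```python
# Assumption: every L3-miss response transfers one 64-byte cache line.
CACHE_LINE_BYTES = 64


def offcore_bandwidth(l3_miss_counts: dict, seconds: float):
    """Convert per-request-type L3-miss counts into bytes per second.

    l3_miss_counts maps a caller-chosen label (hypothetical names such as
    "demand_data_rd" or "prefetch") to the number of offcore responses that
    missed in L3 during the interval.

    Returns (per_type, total): a dict of bytes/second per label, and the sum.
    """
    if seconds <= 0:
        raise ValueError("interval must be positive")
    per_type = {
        label: count * CACHE_LINE_BYTES / seconds
        for label, count in l3_miss_counts.items()
    }
    return per_type, sum(per_type.values())
```

Keeping the request types separate preserves the breakdown by cause (demand, prefetch, instruction fetch) that makes these events useful, while the total gives an estimate of the core's DRAM traffic.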

Tool support for the offcore response counters in perf and likwid has generally lagged behind, but at least recent versions seem to have reasonable support, even for client parts like SKL.
