Cache misses in an infinite loop with no memory references?


Question

I am just running a while(1) loop and measuring cache misses.

int main() {
   while(1);
}

This process is pinned to one CPU (using taskset), and that CPU is isolated, meaning no other process can be scheduled on it. When I measured cache performance with perf, I was surprised to see a last-level cache miss rate of about 42%.

22,579      cache-references                                            (20.82%)
8,976      cache-misses         #   39.754 % of all cache refs      (20.83%)
4,414      LLC-load-misses      #   42.74% of all LL-cache hits

I expected zero cache misses, since the loop performs no memory operations. Any help or thoughts on this would be appreciated.

CPU model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz

In another experiment, I added a nanosleep of 0.1 milliseconds, and the cache miss rate dropped to less than 1%. I have no clue what is going on.

Recommended answer

The perf counters are probably counting some events from kernel code in interrupt handlers. perf counter events aren't precise, so you'll get counts attributed to nearby instructions, and I guess also for ops still in the pipeline when the kernel code executes an iret. Or perf may simply be counting, in full, events that happen in kernel context, since it would be expensive to reprogram the performance counters on every interrupt.

Note that the cache-miss ratio only looks bad if you don't take into account how few cache accesses there are in total:

$ perf stat -e cycles,instructions,L1-dcache-loads,LLC-load-misses,LLC-loads,cache-references,cache-misses  ./infloop

Performance counter stats for './infloop':

 6,177,174,823      cycles                                                        (28.79%)
 6,167,361,425      instructions              #    1.00  insns per cycle          (43.00%)
     1,884,882      L1-dcache-loads                                               (42.93%)
        13,133      LLC-load-misses           #   19.41% of all LL-cache hits     (42.75%)
        67,676      LLC-loads                                                     (28.74%)
       391,004      cache-references                                              (28.50%)
        18,025      cache-misses              #    4.610 % of all cache refs      (28.42%)

   2.604227273 seconds time elapsed

Timed on a Conroe Core2Duo E6600 (since I bricked my Intel SnB motherboard with Intel's broken BIOS updates).

According to perf list, cache-references and cache-misses are "Kernel PMU events", while LLC-* and L1-* are "Hardware cache events". I'm not sure what that means.
