Cache misses in an infinite loop with no memory references?
Problem description
I am just running a while(1) loop and measuring cache misses:
int main() {
while(1);
}
This process is pinned to one CPU (using taskset), and that CPU is isolated, meaning no other process can be scheduled on it. I then measured cache performance using perf, and to my surprise the last-level cache miss rate is 42%:
22,579 cache-references (20.82%)
8,976 cache-misses # 39.754 % of all cache refs (20.83%)
4,414 LLC-load-misses # 42.74% of all LL-cache hits
This surprised me: I expected zero cache misses, since I am not doing any memory operations. Any help or thoughts on this would be appreciated.
CPU model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
In another experiment I added a nanosleep of 0.1 millisecond, and the cache miss rate dropped to less than 1%. I have no clue what is going on.
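The code for the sleep experiment is not shown, so the following is only a guess at its shape: a minimal sketch assuming the sleep sits inside the loop body and uses POSIX nanosleep. The function name sleepy_loop and the iteration bound are hypothetical, added so the sketch terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical reconstruction of the sleep experiment: pause for 0.1 ms
   on every iteration. `iterations` bounds the loop so that the sketch
   terminates; the experiment described above would loop forever.
   Returns the number of sleeps that completed without interruption. */
unsigned long sleepy_loop(unsigned long iterations)
{
    /* 0.1 ms = 100,000 ns */
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 100000 };
    unsigned long completed = 0;

    for (unsigned long i = 0; i < iterations; i++) {
        /* nanosleep returns 0 when the full interval has elapsed and
           -1 if it failed or was interrupted by a signal. */
        if (nanosleep(&pause, NULL) == 0)
            completed++;
    }
    return completed;
}
```

With a loop like this the process spends most of its time blocked in the kernel rather than spinning in user space, so the two experiments exercise very different code paths.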
Recommended answer
Probably the perf counters are counting some events from kernel code in interrupt handlers. perf counter events aren't precise, so you'll get counts attributed to nearby instructions, and I guess for ops still in the pipeline when the kernel code did an iret. Or this may just be fully counting events that happened in kernel context, since it would be expensive to mess with perf counters on every interrupt.
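One way to probe that guess is to count only user-space events and see whether the misses disappear. On the perf command line the :u event modifier (for example cache-misses:u) restricts counting to user mode. The sketch below does the same from C, assuming Linux and the perf_event_open(2) interface; the helper name count_user_cache_misses and the bounded spin loop are hypothetical stand-ins for the original program.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: spin for `iters` iterations and return the number
   of cache-misses events counted in user space only, or -1 if the counter
   could not be opened or read (e.g. no PMU access in a VM or container). */
long long count_user_cache_misses(unsigned long iters)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* do not count events in kernel mode */
    attr.exclude_hv = 1;      /* do not count events in the hypervisor */

    /* glibc has no wrapper for perf_event_open, so go through syscall(2).
       pid = 0 and cpu = -1: this thread, on whichever CPU it runs. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* Bounded stand-in for while(1). The empty asm statement (a GCC/Clang
       extension) keeps the compiler from deleting the loop. */
    for (unsigned long i = 0; i < iters; i++)
        __asm__ volatile("");

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count = -1;
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        count = -1;
    close(fd);
    return count;
}
```

If the user-only count stays near zero while an unrestricted count does not, that would be consistent with the misses coming from kernel code. It would not prove it, since event attribution around interrupts is imprecise.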
Note that the cache-miss ratio only looks bad if you don't take into account how few cache accesses there are, total:
$ perf stat -e cycles,instructions,L1-dcache-loads,LLC-load-misses,LLC-loads,cache-references,cache-misses ./infloop
Performance counter stats for './infloop':
6,177,174,823 cycles (28.79%)
6,167,361,425 instructions # 1.00 insns per cycle (43.00%)
1,884,882 L1-dcache-loads (42.93%)
13,133 LLC-load-misses # 19.41% of all LL-cache hits (42.75%)
67,676 LLC-loads (28.74%)
391,004 cache-references (28.50%)
18,025 cache-misses # 4.610 % of all cache refs (28.42%)
2.604227273 seconds time elapsed
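To put those counts in proportion, the arithmetic can be sketched as follows. The numbers are copied from the perf stat output above; the helper names percent and per_million are hypothetical. The counts are scaled estimates, because perf multiplexed the events (the percentages in parentheses show how much of the run each event was actually counted), so treat the results as approximate.

```c
#include <stdio.h>

/* Percentage of `part` in `whole`; returns 0 when `whole` is 0. */
double percent(double part, double whole)
{
    return whole > 0.0 ? 100.0 * part / whole : 0.0;
}

/* Number of events per million instructions; 0 if no instructions. */
double per_million(double events, double instructions)
{
    return instructions > 0.0 ? 1e6 * events / instructions : 0.0;
}

/* Print the ratios for the counts reported by perf stat above. */
void print_ratios(void)
{
    const double instructions = 6167361425.0;
    const double cache_refs   = 391004.0;
    const double cache_misses = 18025.0;

    printf("miss ratio:               %.3f %%\n",
           percent(cache_misses, cache_refs));
    printf("cache refs per 1M insns:   %.1f\n",
           per_million(cache_refs, instructions));
    printf("cache misses per 1M insns: %.1f\n",
           per_million(cache_misses, instructions));
}
```

Calling print_ratios() shows roughly 63 cache references and about 3 cache misses per million instructions, so even a large miss percentage corresponds to very few misses in absolute terms.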
Timed on a Conroe Core2Duo E6600 (since I bricked my Intel SnB motherboard with Intel's broken BIOS updates).
cache-references and cache-misses are "Kernel PMU events", while LLC-* and L1-* are "Hardware cache events", according to perf list. I'm not sure what that means.