Two TLB misses per mmap/access/munmap


Problem description

for (int i = 0; i < 100000; ++i) {
    int *page = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                            MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

    page[0] = 0;

    munmap(page, PAGE_SIZE);
}

I expect to get ~100000 dTLB-store-misses in userspace, one per iteration (and also ~100000 page faults and dTLB-load-misses in the kernel). When I run the following command, the result is roughly 2x what I expect. I would appreciate it if someone could clarify why this is the case:

perf stat -e dTLB-store-misses:u ./test
Performance counter stats for './test':

           200,114      dTLB-store-misses

       0.213379649 seconds time elapsed

P.S. I have verified and am certain that the generated code doesn't introduce anything that would justify this result. Also, I do get ~100000 page-faults and dTLB-load-misses:k.

Recommended answer

I expect to get ~100000 dTLB-store-misses in userspace, one per each iteration

I'd expect:

  • The CPU tries to execute page[0] = 0; it tries to load the cache line containing page[0], can't find a TLB entry for it, increments dTLB-load-misses, fetches the translation, realises the page is "not present", and generates a page fault.
  • The page fault handler allocates a page and (because the page table was modified) ensures that the TLB entry is invalidated (possibly by relying on the fact that Intel CPUs don't cache "not present" pages anyway, not necessarily by explicitly doing an INVLPG). The page fault handler returns to the instruction that caused the fault so it can be retried.
  • The CPU tries to execute page[0] = 0; a second time, tries to load the cache line containing page[0], can't find a TLB entry for it, increments dTLB-load-misses, fetches the translation, then modifies the cache line.
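
The fault-then-retry sequence above implies one page fault per iteration, which can be checked from inside the program. The following is a minimal sketch, assuming Linux with glibc; count_minor_faults is a hypothetical helper name (not from the question), and it uses sysconf(_SC_PAGESIZE) in place of the question's PAGE_SIZE macro. It only counts page faults via getrusage(); it does not read the TLB-miss counters.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: runs the question's mmap/store/munmap loop and
 * returns the number of minor page faults the process took while doing
 * so, or -1 if a system call failed. */
long count_minor_faults(int iterations)
{
    const size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    struct rusage before, after;

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;

    for (int i = 0; i < iterations; ++i) {
        int *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (page == MAP_FAILED)
            return -1;

        /* First touch of a fresh anonymous page: the store faults,
         * the kernel maps a page, and the store is retried. */
        *(volatile int *)page = 0;

        munmap(page, page_size);
    }

    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    return after.ru_minflt - before.ru_minflt;
}
```

On a typical Linux system one would expect count_minor_faults(100000) to come out close to 100000, in line with the ~100000 page-faults the question reports, though the exact figure depends on the kernel and environment.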

For fun, you could use the MAP_POPULATE flag with mmap() to try to get the kernel to pre-allocate the pages (and avoid the page fault and the first TLB miss).
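
A minimal sketch of that experiment, assuming Linux with glibc (MAP_POPULATE is Linux-specific). touch_pages and its populate parameter are hypothetical names introduced here for illustration, and sysconf(_SC_PAGESIZE) stands in for the question's PAGE_SIZE macro.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS and MAP_POPULATE on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: runs the question's loop `iterations` times.
 * When `populate` is non-zero, MAP_POPULATE asks the kernel to
 * pre-fault the page inside mmap() instead of on first access.
 * Returns the number of iterations that completed. */
int touch_pages(int iterations, int populate)
{
    const size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    int flags = MAP_ANONYMOUS | MAP_PRIVATE;
    int done = 0;

    if (populate)
        flags |= MAP_POPULATE;

    for (int i = 0; i < iterations; ++i) {
        int *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                         flags, -1, 0);
        if (page == MAP_FAILED)
            break;

        *(volatile int *)page = 0; /* the store being measured */

        munmap(page, page_size);
        ++done;
    }
    return done;
}
```

Calling touch_pages(100000, 0) and touch_pages(100000, 1) from two separate builds of main and running each under perf stat -e dTLB-store-misses:u would show whether the count drops toward one miss per iteration; I have not measured this, so treat the expected outcome as a hypothesis.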
