Is TLB inclusive?

Question

Is the TLB hierarchy inclusive on modern x86 CPUs (e.g. Skylake, or maybe other Lakes)?

For example, prefetchtn brings data into the level-(n+1) cache, along with a corresponding TLB entry in the DTLB. Will the entry be placed in the STLB as well?
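A minimal sketch of the software prefetch the question is about, using the _mm_prefetch intrinsic (the helper name is made up for illustration; _MM_HINT_T1 maps to prefetcht1, i.e. "bring into L2"):

    #include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_T1 */

    /* Hypothetical helper: ask for the line containing p to be pulled into L2
     * (prefetcht1). If the translation misses in the dTLB, the prefetch can
     * trigger a page walk and fill a dTLB entry; whether that translation also
     * ends up in the STLB is exactly what the question asks. */
    static inline void prefetch_to_l2(const void *p)
    {
        _mm_prefetch((const char *)p, _MM_HINT_T1);
    }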

Answer

AFAIK, on the Intel SnB family the 2nd-level TLB is a victim cache for the first-level iTLB and dTLB. (I can't find a source for this and I don't know where I read it originally, so take it with a grain of salt. I had originally thought this was a well-known fact, but it might have been a misconception I invented!)

I thought this was documented somewhere in Intel's optimization manual, but it doesn't seem to be.

If this is correct, you get basically the same benefit: a hit in the STLB some time later, after the entry has been evicted from the dTLB, without wasting space on duplicate entries.
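One rough way to probe that claim (a sketch only, not a careful benchmark: the page count is an assumption based on Skylake-client sizes of roughly 64 dTLB entries and 1536 STLB entries, and a single rdtscp-timed load is noisy):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <x86intrin.h>   /* __rdtscp */

    #define PAGE_SIZE 4096
    #define NPAGES    512    /* assumed: more pages than the dTLB holds, fewer than the STLB */

    int main(void)
    {
        volatile uint8_t *buf = malloc((size_t)NPAGES * PAGE_SIZE);
        if (!buf) return 1;

        /* Touch one byte per page so each page's translation gets cached. */
        for (size_t i = 0; i < NPAGES; i++)
            buf[i * PAGE_SIZE] = 1;

        /* By now page 0 has very likely been evicted from the small dTLB.
         * If the reload below costs far less than a full page walk, the
         * translation was still cached at some level (dTLB or STLB). */
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)buf[0];
        uint64_t t1 = __rdtscp(&aux);

        printf("reload of page 0: ~%llu cycles (includes rdtscp overhead)\n",
               (unsigned long long)(t1 - t0));
        free((void *)buf);
        return 0;
    }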

So, for example, if you keep code and data in the same page, you could get an iTLB miss when executing the code, and then a dTLB miss that also misses in the STLB and does another page walk if that code loads data from the same page. (That's one reason we don't keep read-only data in the same page as code on x86: it has no code-size advantage, and it wastes iTLB + dTLB coverage by having the same page occupy an entry in both TLBs.)

But perhaps I'm wrong; Travis (@BeeOnRope) suggested using data prefetch to reduce iTLB miss cost; he's assuming that the page walker fills an entry in both the STLB and the dTLB. (On Core 2(?) and later, a software prefetch that misses the TLB can trigger a page walk instead of giving up.)

I think L2 prefetching is likely to be very effective for code that would otherwise miss all the way to DRAM. Yes, you don't warm the iTLB or the L1i, but you do warm the L2 and STLB, so the first execution only costs something like a dozen cycles.
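A sketch of that idea, assuming (as above) that a data prefetch which misses the TLB does a page walk whose result lands in the STLB; warm_code, rarely_called and caller are made-up names, and the function-pointer-to-data-pointer cast is fine on x86 but not strictly portable C:

    #include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_T1 */

    void rarely_called(void);          /* some cold function, defined elsewhere */

    /* Prefetch the first bytes of a function as *data* with prefetcht1: the
     * line lands in L2, and a TLB miss here can trigger a page walk, so by the
     * time we actually jump there only the L1i / iTLB fills remain. */
    static inline void warm_code(void (*f)(void))
    {
        _mm_prefetch((const char *)(void *)f, _MM_HINT_T1);
    }

    void caller(void)
    {
        warm_code(rarely_called);
        /* ... other work that hides the prefetch latency ... */
        rarely_called();
    }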

This would work for a NINE (non-inclusive, non-exclusive) STLB; it doesn't have to actually be inclusive, just not exclusive or a victim cache. (e.g. the L2 cache is NINE w.r.t. the L1i and L1d caches: they fetch through it, but lines can be evicted from L2 without forcing eviction from either L1 cache.)

Further details with links to source:

Understanding TLB from CPUID results on Intel

https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Memory_Hierarchy

https://www.7-cpu.com/cpu/Skylake.html has timing results and TLB sizes, but not the info we're looking for.

Core 2 was different: https://www.realworldtech.com/nehalem/8/ says it had a tiny 16-entry L1 dTLB used only for loads, and used the L2 DTLB for stores as well as for loads that missed the L1 dTLB.

Nehalem changed that (64-entry dTLB), along with reorganizing the memory hierarchy to what's still used on client (non-server) chips: a large shared inclusive LLC and 256k private L2. (And of course still the usual split 32k L1i/d.) See also: Which cache mapping technique is used in intel core i7 processor?
