Does clflush also remove TLB entries?
Problem description
Does clflush [1] also flush associated TLB entries? I would assume not, since clflush operates at cache-line granularity while TLB entries exist at the (much larger) page granularity, but I am prepared to be surprised.
[1] ... or clflushopt, although one would reasonably assume their behaviors are the same.
I think it's safe to assume no; baking invlpg into clflush sounds like an insane design decision that I don't think anyone would make. You often want to invalidate multiple lines in a page, so dropping the translation after every line would force a fresh page walk for the next one. There's also no apparent benefit; flushing the TLB as well doesn't make it any easier to implement data-cache flushing.
Even just dropping the final TLB entry (without necessarily invalidating any page-directory caching) would be weaker than invlpg, but it still wouldn't make sense.
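To make the granularity point concrete, here is a minimal sketch of flushing a range one cache line at a time. The function name flush_range, the 64-byte line size, and the trailing fence are assumptions for illustration, not something the answer specifies; on a build without SSE2 the flush itself is compiled out and only the line count remains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence */
#endif

/* Assumed line size; 64 bytes is typical on x86 but not guaranteed. */
#define CACHE_LINE_BYTES 64u

/*
 * Hypothetical helper: flush every cache line that overlaps [p, p + len).
 * Returns the number of lines visited. The address translation (and thus
 * the TLB entry) for a page is needed again for each line in that page.
 */
size_t flush_range(const void *p, size_t len)
{
    assert((CACHE_LINE_BYTES & (CACHE_LINE_BYTES - 1)) == 0);
    if (len == 0)
        return 0;

    /* Round the start down to a line boundary. */
    uintptr_t line = (uintptr_t)p & ~(uintptr_t)(CACHE_LINE_BYTES - 1);
    uintptr_t end = (uintptr_t)p + len;
    size_t lines = 0;

    for (; line < end; line += CACHE_LINE_BYTES) {
#if defined(__SSE2__)
        _mm_clflush((const void *)line);
#endif
        lines++;
    }
#if defined(__SSE2__)
    _mm_mfence();   /* conservative: order the flushes before later accesses */
#endif
    return lines;
}
```

Under these assumptions, flushing one aligned 4 KiB page issues 64 clflush instructions that all use the same translation, which is why invalidating that translation on each flush would be counterproductive.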
All modern x86 CPUs use caches with physical indexing/tagging, not virtual. (VIPT L1d caches behave like PIPT with free translation of the index, because the index is taken from address bits that are part of the offset within a page.) And even if caches were virtual, invalidating TLB entries would require invalidating virtual caches, but not the other way around.
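The parenthetical about VIPT can be checked with a little arithmetic. The sketch below tests whether a cache's set-index bits fall entirely inside the page offset; the helper names and the example geometries (such as 32 KiB, 8-way, 64-byte lines with 4 KiB pages) are illustrative assumptions, not figures from the answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* log2 of a power of two (hypothetical helper). */
unsigned log2_pow2(size_t x)
{
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Number of sets = total size / (associativity * line size). */
size_t cache_sets(size_t size_bytes, size_t ways, size_t line_bytes)
{
    return size_bytes / (ways * line_bytes);
}

/*
 * True if line-offset bits plus set-index bits fit within the page offset.
 * In that case the index bits are identical in the virtual and physical
 * address, so a virtually indexed cache behaves like a physically indexed one.
 */
bool index_within_page_offset(size_t size_bytes, size_t ways,
                              size_t line_bytes, size_t page_bytes)
{
    unsigned offset_bits = log2_pow2(line_bytes);
    unsigned index_bits = log2_pow2(cache_sets(size_bytes, ways, line_bytes));
    return offset_bits + index_bits <= log2_pow2(page_bytes);
}
```

For the assumed 32 KiB, 8-way geometry there are 64 sets, so 6 offset bits plus 6 index bits exactly fill the 12-bit page offset. The same condition can be stated as size / ways <= page size; a 64 KiB, 4-way cache with 4 KiB pages would fail it.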
According to IACA, clflush is only 2 uops on Haswell through Skylake (HSW-SKL), and 4 uops (including micro-fusion) on Nehalem through Ivy Bridge (NHM-IVB). So it's not even micro-coded on Intel.
IACA doesn't model invlpg, but I assume it's more uops. (And it's privileged, so it's not totally trivial to test.) It's remotely possible those extra uops on pre-HSW were for TLB invalidation.
I don't have any info on AMD.
The fact that invlpg is privileged is another reason to expect clflush not to be a superset of it: clflush is unprivileged. Presumably it's only for performance reasons that invlpg is restricted to ring 0.
But invlpg won't page-fault, so if it were available to user-space, it could be used to invalidate kernel TLB entries, delaying real-time processes and interrupt handlers. (wbinvd is privileged for similar reasons: it's very slow and, I think, not interruptible.) clflush does fault on illegal addresses, so it wouldn't open up that denial-of-service vulnerability. You could clflush the shared VDSO page, though.
Unless there's some reason why a CPU would want to expose invlpg in user-space (by baking it into clflush), I really don't see why any vendor would do it.
With non-volatile DIMMs in the future of computing, it's even less likely that any future CPUs will make it super-slow to loop over a range of memory doing clflush. You'd expect most software using memory-mapped NV storage to be using clflushopt, but I'd expect CPU vendors to make clflush as fast as possible, too.