Can we use non-temporal mov instructions on heap memory?

Question

In Agner Fog's "Optimizing subroutines in assembly language - section 11.8 Cache control instructions," he says: "Memory writes are more expensive than reads when cache misses occur in a write-back cache. A whole cache line has to be read from memory, modified, and written back in case of a cache miss. This can be avoided by using the non-temporal write instructions MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPD, MOVNTPS. These instructions should be used when writing to a memory location that is unlikely to be cached and unlikely to be read from again before the would-be cache line is evicted. As a rule of thumb, it can be recommended to use non-temporal writes only when writing a memory block that is bigger than half the size of the largest-level cache."

From the "Intel 64 and IA-32 Architectures Software Developer's Manual Combined Volumes Oct 2019" - "These SSE and SSE2 non-temporal store instructions minimize cache pollution by treating the memory being accessed as the write combining (WC) type. If a program specifies a non-temporal store with one of these instructions and the memory type of the destination region is write back (WB), write through (WT), or write combining (WC), the processor will do the following . . . "

I thought that write-combining memory is only found in graphics cards but not in general-purpose heap memory -- and by extension that the instructions listed above would only be useful in such cases. If that's true, why would Agner Fog recommend those instructions? The Intel manual seems to suggest that it's only useful with WB, WT or WC memory, but then they say that the memory being accessed will be treated as WC.

If those instructions actually can be used in an ordinary write to heap memory, are there any limitations? How do I allocate write-combining memory?

Recommended answer

You can use NT stores like movntps on normal WB memory (i.e. the heap). See also "Enhanced REP MOVSB for memcpy" for more about NT stores vs. normal stores.

The CPU treats the destination as WC for the purposes of those NT stores, even though the MTRR and/or PAT have it set to normal WB.
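As a rough illustration of NT stores on ordinary heap memory, here is a minimal sketch in C, assuming an x86-64 target with SSE2 and a C11 library that provides aligned_alloc. The helper names nt_fill and alloc_filled are made up for this example. movntdq (the instruction behind _mm_stream_si128) requires a 16-byte-aligned destination, and because NT stores are weakly ordered, the sketch ends with an sfence before anything else is expected to rely on the data.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_set1_epi8; pulls in _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: fill `bytes` bytes at `dst` with `value` using
 * non-temporal stores. Assumes dst is 16-byte aligned and bytes is a
 * multiple of 16. */
static void nt_fill(void *dst, unsigned char value, size_t bytes)
{
    assert((uintptr_t)dst % 16 == 0 && bytes % 16 == 0);

    __m128i v = _mm_set1_epi8((char)value);
    __m128i *p = (__m128i *)dst;

    for (size_t i = 0; i < bytes / 16; i++)
        _mm_stream_si128(p + i, v);  /* compiles to movntdq */

    /* NT stores are weakly ordered: fence before other threads
     * are told that the buffer is ready. */
    _mm_sfence();
}

/* Hypothetical helper: allocate plain (WB) heap memory and fill it with
 * NT stores. `bytes` should be a multiple of 64 for aligned_alloc. */
static unsigned char *alloc_filled(size_t bytes, unsigned char value)
{
    unsigned char *buf = aligned_alloc(64, bytes);
    if (buf != NULL)
        nt_fill(buf, value, bytes);
    return buf;
}
```

Nothing special is done at allocation time: the buffer is plain malloc-style memory, and only the store instruction differs from an ordinary fill loop.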

The Intel docs are telling you that NT stores "work" on WB, WT, and WC memory. (But not strongly-ordered UC uncacheable memory, and of course not on WP write-protected memory).

You are correct that normally only video RAM (or possibly other similar device-memory regions) is mapped WC. And no, you can't easily allocate WC memory in a user-space process under a normal OS like Linux, but you wouldn't normally want to.

SSE4.1 NT loads (movntdqa) are only useful on WC memory (on other memory types, current CPUs ignore the NT hint), but some cache pollution from loads is a small price to pay for keeping HW prefetch and caching working. You can use NT prefetch (prefetchnta) from WB memory to reduce pollution in some levels of cache, e.g. bypassing L2. But that's hard to tune.
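For the NT prefetch idea, a minimal sketch in C might look like the following, assuming an x86 target with SSE. The function name sum_bytes_nta and the PREFETCH_AHEAD distance of 512 bytes are assumptions for illustration only. The right distance depends on the CPU and the access pattern, which is the "hard to tune" part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

/* Assumed prefetch distance in bytes; a placeholder that would need
 * tuning on real hardware. */
#define PREFETCH_AHEAD 512

/* Hypothetical helper: sum a byte buffer once, hinting with prefetchnta
 * that the data will not be reused soon. */
static uint64_t sum_bytes_nta(const unsigned char *src, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Issue one prefetch per 64-byte step, staying inside the buffer. */
        if (i % 64 == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(src + i + PREFETCH_AHEAD), _MM_HINT_NTA);
        sum += src[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it; whether it helps at all has to be measured.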

IIRC, normal stores like mov on WC memory have the store-merging behaviour you get from NT stores. But you don't need to use WC memory for NT stores to work.
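To tie this back to the rule of thumb quoted in the question (use NT writes only for blocks bigger than half the largest-level cache), here is a hedged sketch in C, assuming x86-64 with SSE2. ASSUMED_LLC_BYTES is a placeholder value, not something detected from the hardware, and should_use_nt and copy_block are hypothetical names. Small or unaligned copies fall back to plain memcpy.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_loadu_si128 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: an 8 MiB last-level cache. Real code would query the CPU. */
#define ASSUMED_LLC_BYTES ((size_t)8 << 20)

/* Rule of thumb from the quoted text: NT stores only for blocks larger
 * than half of the largest-level cache. */
static int should_use_nt(size_t bytes)
{
    return bytes > ASSUMED_LLC_BYTES / 2;
}

/* Hypothetical helper: copy with NT stores for large, suitably aligned
 * blocks, and with memcpy otherwise. */
static void copy_block(void *dst, const void *src, size_t bytes)
{
    assert(dst != NULL && src != NULL);

    int aligned = ((uintptr_t)dst % 16 == 0) && (bytes % 16 == 0);
    if (!should_use_nt(bytes) || !aligned) {
        memcpy(dst, src, bytes);
        return;
    }

    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;
    for (size_t i = 0; i < bytes / 16; i++)
        _mm_stream_si128(d + i, _mm_loadu_si128(s + i));  /* normal load, NT store */

    _mm_sfence();  /* order the NT stores before any later "data ready" store */
}
```

Only the stores are non-temporal here; the loads are ordinary cached loads, in line with the point that NT loads gain nothing on WB memory.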
