What are _mm_prefetch() locality hints?

Question

The intrinsics guide says only this about void _mm_prefetch (char const* p, int i):

Fetch the line of data from memory that contains address p to a location in the cache hierarchy specified by the locality hint i.

Could you list the possible values for int i parameter and explain their meanings?

I've found _MM_HINT_T0, _MM_HINT_T1, _MM_HINT_T2, _MM_HINT_NTA and _MM_HINT_ENTA, but I don't know whether this is an exhaustive list and what they mean.

If processor-specific, I would like to know what they do on Ryzen and latest Intel Core processors.

Recommended answer

Sometimes intrinsics are better understood in terms of the instruction they represent rather than the abstract semantics given in their descriptions.

The full set of the locality constants, as of today, is

#define _MM_HINT_T0 1
#define _MM_HINT_T1 2
#define _MM_HINT_T2 3
#define _MM_HINT_NTA 0
#define _MM_HINT_ENTA 4
#define _MM_HINT_ET0 5
#define _MM_HINT_ET1 6
#define _MM_HINT_ET2 7

as described in this paper about Intel Xeon Phi coprocessor prefetching capabilities.

For IA32/AMD processors, the set is reduced to

#define _MM_HINT_T0 1
#define _MM_HINT_T1 2
#define _MM_HINT_T2 3
#define _MM_HINT_NTA 0
#define _MM_HINT_ET1 6


_mm_prefetch is compiled into different instructions based on the architecture and the locality hint

    Hint              IA32/AMD          iMC
_MM_HINT_T0           prefetcht0     vprefetch0
_MM_HINT_T1           prefetcht1     vprefetch1
_MM_HINT_T2           prefetcht2     vprefetch2
_MM_HINT_NTA          prefetchnta    vprefetchnta
_MM_HINT_ENTA              -         vprefetchenta
_MM_HINT_ET0               -         vprefetche0
_MM_HINT_ET1          prefetchwt1    vprefetche1
_MM_HINT_ET2               -         vprefetche2
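
Since the hint selects the instruction, it has to be a compile-time constant; you can't pass a runtime int through. A minimal sketch of how a runtime choice can still be routed to the four hints that the IA32/AMD column lists (the enum and the function name are hypothetical, made up for this example). It uses the symbolic names rather than raw numbers because the numeric values are header-specific (GCC's xmmintrin.h, for instance, numbers _MM_HINT_T0 as 3).

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch and the _MM_HINT_* names */

/* Hypothetical runtime selector, invented for this sketch. */
enum prefetch_target { PF_T0, PF_T1, PF_T2, PF_NTA };

/* Prefetches the line containing p.
   Returns 0 on success, -1 for an unknown target.
   Each case passes a literal hint so the compiler can pick the instruction. */
static int prefetch_line(const void *p, enum prefetch_target target)
{
    /* The instruction itself would not fault on NULL,
       but a NULL here almost certainly means a caller bug. */
    assert(p != NULL);

    switch (target) {
    case PF_T0:  _mm_prefetch((const char *)p, _MM_HINT_T0);  return 0;
    case PF_T1:  _mm_prefetch((const char *)p, _MM_HINT_T1);  return 0;
    case PF_T2:  _mm_prefetch((const char *)p, _MM_HINT_T2);  return 0;
    case PF_NTA: _mm_prefetch((const char *)p, _MM_HINT_NTA); return 0;
    default:     return -1;
    }
}
```

Calling prefetch_line(&x, PF_T1) has no visible effect on the program; the only observable result is the return code.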


What the (v)prefetch instructions do, if all the requirements are satisfied, is bring a cache line's worth of data into the cache level specified by the locality hint.
The instruction is just a hint and may be ignored.
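
To make the "hint" nature concrete, here is a rough sketch of the usual pattern: prefetch the line a few iterations ahead while working on the current element. The indirect access is chosen because it is the kind of pattern where a software prefetch has a chance to help; gather_sum and PREFETCH_AHEAD are hypothetical names, and the distance of 8 is an untuned guess. The result is the same whether or not the CPU honors the prefetch.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical tuning knob: how many iterations ahead to prefetch. */
enum { PREFETCH_AHEAD = 8 };

/* Sums table[index[i]] for i in [0, n). */
static long gather_sum(const int *table, const size_t *index, size_t n)
{
    assert(n == 0 || (table != NULL && index != NULL));

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Request the line needed PREFETCH_AHEAD iterations from now.
           The bounds check only protects the read of index[], not the
           prefetch, which does not fault. */
        if (i + PREFETCH_AHEAD < n) {
            _mm_prefetch((const char *)&table[index[i + PREFETCH_AHEAD]],
                         _MM_HINT_T0);
        }
        sum += table[index[i]];
    }
    return sum;
}
```

Whether this is faster than the plain loop depends on the access pattern and the machine, so it needs measuring rather than assuming.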

When a line is prefetched into level X, the manuals (both Intel and AMD) say that it is also fetched into all the higher levels (except when X=3).
I'm not sure this is actually true; I believe the line is prefetched with respect to cache level X and, depending on the caching strategies of the higher levels (inclusive vs. non-inclusive), it may or may not be present there too.

Another attribute of the (v)prefetch instructions is the non-temporal attribute.
Non-temporal data is unlikely to be reused soon.
In my understanding, NT data is stored in the "streaming load buffers" on the IA32 architecture [1], while on the iMC architecture it is stored in the normal cache (using the hardware thread ID to select the way) but with a Most Recently Used replacement policy (so that it will be the next line evicted if needed).
For AMD, the manual says that the actual location is implementation dependent, ranging from a software-invisible buffer to a dedicated non-temporal cache.
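
As an illustration of where the non-temporal hint fits, here is a sketch of a one-pass scan over a buffer that won't be touched again. It assumes a 64-byte cache line and a prefetch distance of four lines; both are guesses for the example, as are the names stream_checksum, LINE_BYTES and LINES_AHEAD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

/* Assumptions for this sketch: 64-byte lines, prefetch 4 lines ahead. */
enum { LINE_BYTES = 64, LINES_AHEAD = 4 };

/* Adds up every byte of buf exactly once. */
static uint64_t stream_checksum(const unsigned char *buf, size_t len)
{
    assert(len == 0 || buf != NULL);

    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        /* Once per assumed line, ask for a later line with the NTA hint,
           since the data is read once and not reused. */
        if (i % LINE_BYTES == 0) {
            size_t ahead = i + (size_t)LINE_BYTES * LINES_AHEAD;
            if (ahead < len) {
                _mm_prefetch((const char *)buf + ahead, _MM_HINT_NTA);
            }
        }
        sum += buf[i];
    }
    return sum;
}
```

A plain sequential scan like this is also what hardware prefetchers handle well on their own, so treat the code as a demonstration of the hint, not as a proven optimization.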

The last attribute of the (v)prefetch instructions is the "intent" attribute or the "eviction" attribute.
Under MESI and its variant protocols, a Request For Ownership (RFO) must be made to bring a line into an exclusive state (in order to modify it).
An RFO is just a special read, so prefetching with an RFO brings the line into the Exclusive state directly (otherwise the first store to it cancels the benefit of prefetching because of the "delayed" RFO it needs), provided we know we will write to it later.
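
A sketch of a write-intent prefetch in a read-modify-write loop. To stay compilable on older headers it does not use _MM_HINT_ET1; instead it uses the GCC/Clang builtin __builtin_prefetch(addr, rw, locality) with rw = 1, so it assumes one of those compilers. Which instruction this becomes (prefetchwt1, prefetchw, or an ordinary read prefetch) depends on the target flags, which the sketch does not control. scale_in_place, FLOATS_PER_LINE and FLOATS_AHEAD are hypothetical names, and the 64-byte line is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this sketch: 64-byte lines (16 floats), 4 lines ahead. */
enum { FLOATS_PER_LINE = 16, FLOATS_AHEAD = 64 };

/* Multiplies every element of v by k, in place. */
static void scale_in_place(float *v, size_t n, float k)
{
    assert(n == 0 || v != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Once per assumed line, announce the upcoming store:
           rw = 1 means "will be written", locality = 2 is a middle level. */
        if (i % FLOATS_PER_LINE == 0 && i + FLOATS_AHEAD < n) {
            __builtin_prefetch(&v[i + FLOATS_AHEAD], 1, 2);
        }
        v[i] *= k;
    }
}
```

The idea is only that the line arrives already writable by the time the store executes; on a target without a write-prefetch instruction the builtin falls back to a normal prefetch and the loop behaves the same.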

The IA32 and AMD architectures don't support an exclusive non-temporal hint (yet), since the non-temporal cache level is implementation-defined.
The iMC architecture allows for it with the locality code _MM_HINT_ENTA.

[1] Which I understand to be the WC buffers. Peter Cordes clarified this in a comment: prefetchnta only uses the line-fill buffers when prefetching USWC memory regions. Otherwise it prefetches into L1.

For reference, here are the descriptions of the instructions involved:

PREFETCHh

Fetches the line of data from memory that contains the byte specified with the source operand to a location in the cache hierarchy specified by a locality hint:

• T0 (temporal data)—prefetch data into all levels of the cache hierarchy.
• T1 (temporal data with respect to first level cache misses)—prefetch data into level 2 cache and higher.
• T2 (temporal data with respect to second level cache misses)—prefetch data into level 3 cache and higher, or an implementation-specific choice.
• NTA (non-temporal data with respect to all cache levels)—prefetch data into non-temporal cache structure and into a location close to the processor, minimizing cache pollution.

PREFETCHWT1

Fetches the line of data from memory that contains the byte specified with the source operand to a location in the cache hierarchy specified by an intent to write hint (so that data is brought into ‘Exclusive’ state via a request for ownership) and a locality hint:

• T1 (temporal data with respect to first level cache)—prefetch data into the second level cache.

VPREFETCHh

                 Cache   Non-temporal   Exclusive state
                 level
VPREFETCH0       L1      NO             NO
VPREFETCHNTA     L1      YES            NO
VPREFETCH1       L2      NO             NO
VPREFETCH2       L2      YES            NO
VPREFETCHE0      L1      NO             YES
VPREFETCHENTA    L1      YES            YES
VPREFETCHE1      L2      NO             YES
VPREFETCHE2      L2      YES            YES
