Intel's CLWB instruction invalidating cache lines

Question

I am trying to find a configuration or memory access pattern for Intel's clwb instruction that does not invalidate the cache line. I am testing on an Intel Xeon Gold 5218 processor with NVDIMMs, running Linux 5.4.0-3-amd64. I tried Device-DAX mode, mapping the char device directly into the address space. I also tried adding the non-volatile memory as a new NUMA node and binding memory to it with numactl --membind. In both cases, when I use clwb on a cached address, the line is evicted. I observe the evictions with PAPI hardware counters, with the prefetchers disabled.

This is the simple loop I am testing. Both array and tmp are declared volatile, so the loads are actually executed.

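// _mm_clwb and _mm_mfence come from <immintrin.h>; GCC and Clang need -mclwb
for (int i = 0; i < arr_size; i++) {
    tmp = array[i];                      // first load of the element
    _mm_clwb((const void *)&array[i]);   // write the line back; the cast drops volatile for the intrinsic
    _mm_mfence();                        // fence between the write-back and the reload
    tmp = array[i];                      // second load: a hit only if clwb left the line cached
}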

Both reads result in cache misses.

Has anyone else tried to determine whether some configuration or memory access pattern leaves the cache line in the cache?

Recommended answer

clwb behaves like clflushopt on Skylake-SP (SKX) and Cascade Lake (CSL): the line is written back and also evicted. However, programs that use clwb on these processors will automatically benefit when run on a future processor that supports an optimized implementation of clwb.

Section 2.1.1.4 of the Intel Optimization Manual (September 2019) mentions that clwb is new on Ice Lake Client (ICL). Perhaps this means that the performance advantage of clwb is new on Ice Lake. However, the cpuid leaf 0x7 information from InstLatx64 says that ICL doesn't support clwb, and I'm not sure which source is wrong. Someone should check whether _mm_clwb(void const *p) works on ICL. In any case, it will most probably be supported on Ice Lake Server (ICX).
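One way to check what a given machine reports is to read the feature flags directly. CPUID leaf 0x7 (subleaf 0) reports CLFLUSHOPT in EBX bit 23 and CLWB in EBX bit 24. The following is a minimal sketch, assuming a GCC or Clang toolchain that provides <cpuid.h> and __get_cpuid_count; the helper names ebx_has_clwb, ebx_has_clflushopt and cpu_has_clwb are made up for this illustration. Note that the flag only says the instruction is supported. It says nothing about whether the line stays in the cache afterwards, which is exactly the SKX/CSL situation described above.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header (toolchain assumption) */
#endif

/* Bit positions in CPUID.(EAX=07H, ECX=0):EBX */
#define EBX_CLFLUSHOPT_BIT 23u
#define EBX_CLWB_BIT       24u

/* Hypothetical helper: decode the CLWB flag from a leaf-7 EBX value. */
static int ebx_has_clwb(unsigned int ebx)
{
    return (int)((ebx >> EBX_CLWB_BIT) & 1u);
}

/* Hypothetical helper: decode the CLFLUSHOPT flag from a leaf-7 EBX value. */
static int ebx_has_clflushopt(unsigned int ebx)
{
    return (int)((ebx >> EBX_CLFLUSHOPT_BIT) & 1u);
}

/*
 * Returns 1 if the CPU reports CLWB, 0 if it does not,
 * and -1 if leaf 7 cannot be queried (old CPU or non-x86 build).
 */
static int cpu_has_clwb(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return -1;               /* leaf 7 not available */
    return ebx_has_clwb(ebx);
#else
    return -1;                   /* CPUID is x86-only */
#endif
}
```

Calling cpu_has_clwb() before the test loop avoids an illegal-instruction fault on a CPU that lacks clwb, and would settle the ICL question on real hardware.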

clwb is also supported on Zen 2, but I don't know how it works on this microarchitecture.
