Difference between prefetch for read or write


Problem description

The gcc docs talk about a difference between prefetch for read and prefetch for write. What is the technical difference?

Solution

At the CPU level, a software prefetch (as opposed to one triggered by the hardware itself) is a convenient way to hint to the CPU that a line is about to be accessed and that you want it fetched in advance to hide the latency.

If the access will be a simple read, you want a regular prefetch, which behaves much like a normal load from memory (apart from not blocking the CPU if it misses, not faulting if the address is bad, and various other benefits that depend on the microarchitecture).
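As a rough illustration of a read prefetch, the sketch below sums an array while hinting a line that will be needed a few iterations later. The function name and PREFETCH_DISTANCE are hypothetical, and the distance is not a tuned value. For a plain sequential scan like this one the hardware prefetcher will likely do the job on its own, so the pattern is mainly of interest for less regular access patterns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting upcoming elements for reading. */
long long sum_with_read_prefetch(const long long *data, size_t n)
{
    assert(data != NULL || n == 0);
    long long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* rw = 0 (read), locality = 3 (high temporal locality).
           A prefetch of a bad address does not fault, but we stay
           in bounds anyway so no out-of-range pointer is formed. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint: the result is the same with or without it, and only the timing may differ.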

However, if you intend to write to that line, and it also exists in another core, a simple read operation would not suffice. This is due to MESI-based cache coherence protocols. A core must have ownership of a line before modifying it, so that coherency is preserved (if the same line were modified in multiple cores at once, you could not ensure a correct ordering of these changes and might even lose some of them, which is not allowed on normal WB memory types). Instead, a write operation starts by acquiring ownership of the line and snooping it out of any other core / socket that may hold a copy. Only then can the write occur. A read operation (demand or prefetch) would have left the line in the other cores in a shared state, which is good if the line is read multiple times by many cores, but doesn't help if your core later writes to it, because the write still has to obtain ownership first.
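The ownership argument can be sketched with a toy model of a single cache line shared by two cores. This is purely illustrative: the types and function names are made up for the example, and real coherence protocols have more states, messages and corner cases than this.

```c
#include <assert.h>

/* Toy model of one cache line's MESI state in two cores. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

typedef struct { mesi_t core[2]; } line_t;

/* Core c reads the line (demand load or read prefetch). */
void core_read(line_t *l, int c)
{
    assert(c == 0 || c == 1);
    int other = 1 - c;
    if (l->core[c] != INVALID)
        return;                      /* hit: no state change */
    if (l->core[other] == INVALID) {
        l->core[c] = EXCLUSIVE;      /* we hold the only copy */
    } else {
        l->core[other] = SHARED;     /* other core downgrades its copy */
        l->core[c] = SHARED;
    }
}

/* Core c writes the line. Returns 1 if it first had to request
   ownership (line was Invalid or Shared locally), 0 if the write
   could proceed silently (line was Exclusive or Modified). */
int core_write(line_t *l, int c)
{
    assert(c == 0 || c == 1);
    int other = 1 - c;
    int needed_ownership =
        (l->core[c] == INVALID || l->core[c] == SHARED);
    l->core[other] = INVALID;        /* any other copy is invalidated */
    l->core[c] = MODIFIED;
    return needed_ownership;
}
```

In this model, a line brought in by core_read while another core holds it ends up Shared, so the later core_write still reports an ownership request. That extra step is the latency a write prefetch tries to move off the critical path.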

To allow useful prefetching for lines that will later be written to, most CPU vendors support special prefetches for writing. On x86, both Intel and AMD support the PREFETCHW instruction, which should have the effect of a write (i.e. acquiring sole ownership of a line and invalidating any other copy of it). Note that not all CPUs support it (even within the same family, not all generations have it), and not all compiler versions enable it.
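A minimal sketch of a write prefetch in a store loop follows. The function name and PREFETCH_DISTANCE are hypothetical and untuned. Passing rw = 1 only expresses the intent to write. Whether the compiler actually emits a write-prefetch instruction depends on the target and the flags in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Fill an array while hinting upcoming elements for writing. */
void fill_with_write_prefetch(long long *dst, size_t n, long long value)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* rw = 1 (write): ask for the line with the intent to modify it. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&dst[i + PREFETCH_DISTANCE], 1, 3);
        dst[i] = value;
    }
}
```

As with a read prefetch, the hint never changes the result of the loop, only (possibly) how long the stores wait for the line.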

Here's an example (with gcc 4.8.2); note that you need to enable it explicitly here:

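#include <emmintrin.h>  /* not needed for __builtin_prefetch, which is a compiler builtin */

int main() {
    long long int a[100];
    /* __builtin_prefetch(addr, rw, locality): rw 0 = read, 1 = write; locality 0..3 */
    __builtin_prefetch (&a[0], 0, 0);   /* read, locality 0 */
    __builtin_prefetch (&a[16], 0, 1);  /* read, locality 1 */
    __builtin_prefetch (&a[32], 0, 2);  /* read, locality 2 */
    __builtin_prefetch (&a[48], 0, 3);  /* read, locality 3 */
    __builtin_prefetch (&a[64], 1, 0);  /* write */
    return 0;
}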

Compiled with gcc -O3 -mprfchw prefetchw.c -c, this disassembles to:

0000000000000000 <main>:
   0:   48 81 ec b0 02 00 00    sub    $0x2b0,%rsp
   7:   48 8d 44 24 88          lea    -0x78(%rsp),%rax
   c:   0f 18 00                prefetchnta (%rax)
   f:   0f 18 98 80 00 00 00    prefetcht2 0x80(%rax)
  16:   0f 18 90 00 01 00 00    prefetcht1 0x100(%rax)
  1d:   0f 18 88 80 01 00 00    prefetcht0 0x180(%rax)
  24:   0f 0d 88 00 02 00 00    prefetchw 0x200(%rax)
  2b:   31 c0                   xor    %eax,%eax
  2d:   48 81 c4 b0 02 00 00    add    $0x2b0,%rsp
  34:   c3                      retq

If you play with the third argument (the locality hint), you'll notice that the hint levels are ignored for prefetchw, since it doesn't support temporal level hints. By the way, if you remove the -mprfchw flag, gcc will convert this into a normal read prefetch (I haven't tried different -march/mattr settings, maybe some of them include it as well).
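To see which case applies to a given build, one option is to test the __PRFCHW__ macro, which GCC and Clang predefine when the PRFCHW extension is enabled for the target (for example via -mprfchw). The helper name below is hypothetical. The check only reflects compiler flags, and says nothing about whether the CPU the program runs on supports the instruction.

```c
#include <assert.h>

/* Returns 1 if this translation unit was compiled with PRFCHW
   enabled, 0 otherwise. Compile-time information only. */
int prfchw_enabled_at_compile_time(void)
{
#ifdef __PRFCHW__
    return 1;
#else
    return 0;
#endif
}

/* Issue a write-intent prefetch; what it is lowered to is up to
   the compiler and the enabled target features. */
void prefetch_for_write(const void *p)
{
    assert(p != 0);
    __builtin_prefetch(p, 1, 0);
}
```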
