Write Allocate / Fetch on Write Cache Policy


Question

I couldn't find a source that explains how the policy works in great detail. The combinations of write policies are explained in Jouppi's paper, for those interested. This is how I understood it:


  1. A write request is sent from the CPU to the cache.
  2. The request results in a cache miss.
  3. A cache block is allocated for this request in the cache. (Write-Allocate)
  4. The requested block is fetched from lower memory into the allocated cache block. (Fetch-on-Write)
  5. Now we can write onto the allocated cache block, which the fetch has just filled. (See the sketch after this list.)
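To make steps 3 to 5 concrete, here is a minimal sketch in C, simulating the lower memory level as a flat array. The structure, names, and single-level lookup are illustrative assumptions, not how any real cache is implemented:

```c
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64
#define MEM_SIZE  (1u << 20)

static uint8_t main_memory[MEM_SIZE];   /* simulated lower memory level */

struct cache_line {
    uint64_t tag;     /* which line of memory this slot currently holds */
    int      valid;   /* a single valid bit covering the whole line     */
    uint8_t  data[LINE_SIZE];
};

/* Write-allocate + fetch-on-write miss path, following the steps above:
 * allocate a line, fill it from lower memory, then apply the store on
 * top of the fetched data. */
static void handle_write_miss(struct cache_line *line, uint64_t addr,
                              const uint8_t *store_data, size_t len)
{
    uint64_t base = addr & ~(uint64_t)(LINE_SIZE - 1);

    line->tag = addr / LINE_SIZE;                         /* step 3: allocate       */
    memcpy(line->data, &main_memory[base], LINE_SIZE);    /* step 4: fetch on write */
    memcpy(line->data + (addr - base), store_data, len);  /* step 5: do the write   */
    line->valid = 1;
}
```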

The question is what happens between step 4 and step 5. (Let's say the cache is a non-blocking cache using Miss Status Handling Registers.)

Does the CPU have to retry the write request on the cache until a write hit happens (after the block has been fetched into the allocated cache block)?

If not, where is the write-request data held in the meantime?

Edit: I think I've found my answer in Implementation of Write Allocate in the K86™ Processors. The data is written directly into the allocated cache block, and it gets merged with the read request later on.

Answer


"It is directly being written into the allocated cache block and it gets merged with the read request later on."

No, that's not what AMD's pdf says. They say the store data is merged with the just-fetched data from memory and then stored into the L1 cache's data array.

The cache tracks validity at cache-line granularity. There's no way for it to record that "bytes 3 to 6 are valid; keep them when the data arrives from memory". That kind of logic is far too big to replicate in each line of the cache array.
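A minimal sketch of the merge the AMD pdf describes, assuming a hypothetical fill buffer with a per-byte mask (the names and layout are illustrative). The point is that the byte mask lives only in the miss-handling buffer, never in the cache array itself:

```c
#include <stdint.h>

#define LINE_SIZE 64

/* Hypothetical fill/merge buffer entry: per-byte bookkeeping exists only
 * here, while the cache array keeps just one valid bit per line. */
struct fill_buffer {
    uint8_t store_bytes[LINE_SIZE];  /* pending store data for this line   */
    uint8_t byte_valid[LINE_SIZE];   /* 1 = this byte comes from the store */
};

/* When the fetched line arrives from memory, overlay the pending store
 * bytes; the merged, fully valid line is what gets written into the L1
 * data array. */
static void merge_fill(const struct fill_buffer *fb,
                       const uint8_t from_memory[LINE_SIZE],
                       uint8_t merged[LINE_SIZE])
{
    for (int i = 0; i < LINE_SIZE; i++)
        merged[i] = fb->byte_valid[i] ? fb->store_bytes[i] : from_memory[i];
}
```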

Also note that the pdf you found describes some specific behaviour of AMD's K6 microarchitectures, which were single-core only; some models had only a single level of cache, so no cache-coherency protocol was even necessary. They do describe the K6-III (model 9) using MESI between the L1 and L2 caches.

A CPU writing to cache has to hold onto the data until the cache is ready to accept it. It's not a retry-until-success process, though. It's more like the cache notifies the store hardware when it's ready to accept that store (i.e. it has that line active, and in the Modified state if the cache is coherent with other caches using the MESI protocol).

In a real CPU, multiple outstanding misses can be in flight at once (even without full out-of-order speculative execution). This is called miss-under-miss. The CPU-to-cache connection needs a buffer for each outstanding miss that can be supported in parallel, to hold the store data. For example, a core might have 8 buffers and support 8 outstanding load or store misses. A 9th memory operation couldn't start until one of the 8 buffers became available; until then, its data would have to stay in the CPU's store queue.
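As a rough model of that buffering, here is a sketch of an MSHR pool with the 8 entries used in the example above; the fields and the allocation policy are assumptions for illustration, not a description of any particular core:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_MSHRS 8  /* matches the 8-buffer example; real counts vary */

/* One Miss Status Handling Register: bookkeeping for one outstanding miss. */
struct mshr {
    bool     in_use;
    uint64_t line_addr;  /* which cache line this miss is fetching */
    /* ... plus store data, or which register a load targets, etc. */
};

static struct mshr mshrs[NUM_MSHRS];

/* Try to start a new outstanding miss. Returning false models the 9th
 * miss having to wait in the store queue until an MSHR frees up. */
static bool allocate_mshr(uint64_t line_addr)
{
    for (int i = 0; i < NUM_MSHRS; i++) {
        if (!mshrs[i].in_use) {
            mshrs[i].in_use    = true;
            mshrs[i].line_addr = line_addr;
            return true;
        }
    }
    return false;  /* all buffers busy: the memory op cannot start yet */
}
```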

These buffers might be shared between loads and stores, or there might be dedicated store buffers. The OP reports that searching on "store buffer" turned up lots of related material of interest, one example being this part of Wikipedia's MESI article.

The L1 cache is really part of a CPU core in modern high-performance designs. It's very tightly integrated with the memory-ordering logic, and needs to be able to efficiently support atomic operations like lock inc [mem], along with lots of other complications (such as memory reordering). See https://en.wikipedia.org/wiki/Memory_disambiguation#Avoiding_WAR_and_WAW_dependencies for example.
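For a software-visible example of the kind of atomic read-modify-write the L1 has to support, here is a C11 atomic increment. The note about the generated instruction is the usual x86 behaviour, not something the C standard guarantees:

```c
#include <stdatomic.h>

static atomic_int counter;

/* A C11 atomic read-modify-write. On x86 this typically compiles to a
 * single `lock add` instruction; the core must keep the line in its L1
 * in Modified state for the duration of the operation. */
void bump(void)
{
    atomic_fetch_add_explicit(&counter, 1, memory_order_seq_cst);
}
```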

Some more terminology:

  • store buffer
  • store queue
  • memory order buffer
  • cache write port / cache read port / cache port
  • globally visible

Distantly related: an interesting post investigating the adaptive replacement policy of Intel IvyBridge's L3 cache, which makes it more resistant to evicting valuable data when scanning a huge array.
