How do data caches route the object in this example?

Question

Consider the diagrammed data cache architecture. (ASCII art follows.)

  --------------------------------------
  | CPU core A | CPU core B |          |
  |------------|------------| Devices  |
  |  Cache A1  |  Cache B1  | with DMA |
  |-------------------------|          |
  |         Cache 2         |          |
  |------------------------------------|
  |                RAM                 |
  --------------------------------------

Assume that


  • an object is shadowed on a dirty line of Cache A1,
  • an older version of the same object is shadowed on a clean line of Cache 2, and
  • the newest version of the same object has recently been written to RAM via DMA.

Diagram:

  --------------------------------------
  | CPU core A | CPU core B |          |
  |------------|------------| Devices  |
  |  (dirty)   |            | with DMA |
  |-------------------------|          |
  |     (older, clean)      |          |
  |------------------------------------|
  |          (newest, via DMA)         |
  --------------------------------------

Please answer three questions.


  1. If CPU core A tries to load (read) the object, what happens?

  2. If, instead, CPU core A tries to store (write) the object, what happens?

  3. Would anything nonobvious, interesting and/or different happen if, rather than core A, core B did the loading or storing?

My questions are theoretical. My questions do not refer to any particular CPU architecture but you may refer to x86 or ARM (or even RISC-V) in your answer if you wish.

Notes. If disregarding snooping would simplify your answer then you may disregard snooping at your discretion. Alternately, you may modify the problem if a modified problem would better illuminate the topic in your opinion. If you must write code to answer, then I would prefer C/C++. You need not name specific flags of a MESI or MOESI protocol in your answer as far as I know, but a simpler, less detailed answer would probably suffice.

Motive. My motive to ask is that I am reading about concurrency and the memory model in the C++ standard. I would like to learn to visualize this model approximately in terms of hardware operations if possible.

UPDATE

To the extent to which I understand, @HadiBrais advises that the following diagrammed architecture would be more usual than the one I have earlier diagrammed, especially if DDIO (see his answer below) is implemented.

  --------------------------------------
  | CPU core A | CPU core B | Devices  |
  |------------|------------| with DMA |
  |  Cache A1  |  Cache B1  |          |
  |------------------------------------|
  |              Cache 2               |
  |------------------------------------|
  |                RAM                 |
  --------------------------------------


Answer

Your hypothetical system seems to include coherent, write-back L1 caches and non-coherent DMA. A very similar real processor is ARM11 MPCore, except that it doesn't have an L2 cache. However, most modern processors do have coherent DMA. Otherwise, it is the software's responsibility to ensure coherence. The state of the system as shown in your diagram is already incoherent.


If CPU core A tries to load (read) the object, what happens?

It will just read the line held in its local L1 cache. No changes will occur.


If, instead, CPU core A tries to store (write) the object, what happens?

The line is already in the M coherence state in the L1 cache of core A, so core A can write to it directly. No changes will occur.


Would anything nonobvious, interesting and/or different happen if, rather than core A, core B did the loading or storing?

If core B issues a load request to the same line, the L1 cache of core A is snooped and the line is found in the M state. The line is updated in the L2 cache and sent to the L1 cache of core B. In addition, one of the following will occur:


  • The line is invalidated from core A's L1 cache. The line is inserted into core B's L1 cache in the E coherence state (for the MESI protocol) or the S coherence state (for the MSI protocol). If the L2 uses a snoop filter, the filter is updated to indicate that core B holds the line in the E/S state. Otherwise, the state of the line in the L2 will be the same as in core B's L1, except that the L2 does not know the line is there (so snoops will have to be broadcast anyway).
  • The state of the line in core A's L1 cache is changed to S. The line is inserted into core B's L1 cache in the S coherence state, and the L2 inserts the line in the S state.

Either way, both L1 caches and the L2 cache will hold the same copy of the line, which remains incoherent with the copy in memory.
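
This load case is essentially what a release/acquire pair in C++ produces on such a machine: core A dirties the line with its store, and core B's first loads of the flag and the payload trigger the snoop described above. Below is a minimal sketch, assuming a C++11-or-later compiler and that the scheduler happens to place the two threads on different cores; the names payload and ready are purely illustrative.

  #include <atomic>
  #include <cstdio>
  #include <thread>

  int payload = 0;                     // ordinary object; lives on some cache line
  std::atomic<bool> ready{false};      // synchronization flag

  void core_a() {                      // producer: dirties the payload's line in its L1
      payload = 42;
      ready.store(true, std::memory_order_release);
  }

  void core_b() {                      // consumer: its loads miss and snoop core A's L1
      while (!ready.load(std::memory_order_acquire)) { }
      std::printf("%d\n", payload);    // prints 42
  }

  int main() {
      std::thread a(core_a), b(core_b);
      a.join();
      b.join();
  }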

If core B issues a store request to the same line, the line will be invalidated from core A's cache and will end up in the M state in core B's cache.
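
One illustrative way to observe this store case from software is two threads repeatedly storing to counters that happen to share a cache line: each store needs the line in the M state in its own core's L1, so the two copies keep invalidating each other (the classic false-sharing ping-pong). A sketch, assuming C++11 and that the two threads run on different cores; the struct layout and loop counts are arbitrary.

  #include <atomic>
  #include <functional>
  #include <thread>

  struct Counters {
      std::atomic<long> a{0};   // very likely on the same 64-byte line as b
      std::atomic<long> b{0};   // adding alignas(64) to each member would stop the ping-pong
  };

  int main() {
      Counters c;
      auto bump = [](std::atomic<long>& n) {
          for (int i = 0; i < 10000000; ++i)
              n.fetch_add(1, std::memory_order_relaxed);  // each store needs the line in M state
      };
      std::thread t1(bump, std::ref(c.a));
      std::thread t2(bump, std::ref(c.b));
      t1.join();
      t2.join();
  }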

Eventually, the line will be evicted from the cache hierarchy to make space for other lines. When that happens, there are two cases:


  • The line is in the S/E state, so it is simply removed from all caches. Later, if the line is read again, the copy written by the DMA operation will be read from main memory.
  • The line is in the M state, so it will be written back to main memory, (potentially partially) overwriting the copy written by the DMA operation.

Obviously, such an incoherent state must never occur. It can be prevented by invalidating all relevant lines from all caches before the DMA write operation begins, and by ensuring that no core accesses the area of memory being written until the operation finishes. The DMA controller sends an interrupt whenever an operation completes. In the case of a read DMA operation (where the device reads from memory), all relevant lines need to be written back to memory first to ensure that the most recent values are used.
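
In software, the sequence described above might look roughly like the following sketch. The cache-maintenance and DMA helpers (cache_invalidate_range, cache_clean_range, and so on) are hypothetical placeholders, stubbed out here, standing in for whatever the OS or SoC vendor actually provides.

  #include <cstddef>
  #include <cstdint>

  // Hypothetical platform hooks; a real driver would call the vendor's APIs instead.
  void cache_invalidate_range(void*, std::size_t)        { /* discard cached copies of the range  */ }
  void cache_clean_range(const void*, std::size_t)       { /* write dirty lines of the range back */ }
  void dma_start_device_to_mem(void*, std::size_t)       { /* start a DMA write into RAM          */ }
  void dma_start_mem_to_device(const void*, std::size_t) { /* start a DMA read from RAM           */ }
  void dma_wait_done()                                   { /* wait for the completion interrupt   */ }

  std::uint8_t buf[4096];

  void receive_from_device() {
      // DMA write (device -> memory): invalidate all relevant lines first, and
      // make sure no core touches buf until the transfer has completed.
      cache_invalidate_range(buf, sizeof buf);
      dma_start_device_to_mem(buf, sizeof buf);
      dma_wait_done();
      // CPU loads now miss in the caches and fetch the DMA-written data.
  }

  void send_to_device() {
      // DMA read (memory -> device): write dirty lines back so the device sees
      // the most recent values.
      cache_clean_range(buf, sizeof buf);
      dma_start_mem_to_device(buf, sizeof buf);
      dma_wait_done();
  }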

Intel Data Direct I/O (DDIO) technology enables the DMA controller to read or write directly from the shared last-level cache to improve performance.

This section is not directly related to the question, but I want to write this somewhere.

All commercial x86 CPUs are fully cache coherent (i.e., the whole cache hierarchy is coherent). To be more precise, all processors within the same shared memory domain are cache coherent. In addition, all commercial x86 manycore coprocessors (i.e., Intel Xeon Phi in the PCIe card form) are internally fully coherent. A coprocessor, which is a device on the PCIe interconnect, is not coherent with other coprocessors or CPUs. So a coprocessor resides in a separate coherence domain of its own. I think this is because there is no built-in hardware mechanism to make a PCIe device that has a cache coherent with other PCIe devices or CPUs.

Other than commercial x86 chips, there are prototype x86 chips that are not cache coherent. The only example I'm aware of is Intel's Single-Chip Cloud Computer (SCC), which later evolved into the coherent Xeon Phi.
