What are the semantics of the Super Queue and Line Fill Buffers?

Question

I am asking this question about the Haswell microarchitecture (Intel Xeon E5-2640 v3 CPU). From the CPU specifications and other resources, I found that there are 10 LFBs and that the super queue has 16 entries. I have two questions about the LFBs and the super queue:

1) What is the maximum degree of memory-level parallelism the system can provide: 10 or 16 (LFBs or SQ)?

2) According to some sources, every L1D miss is recorded in the SQ, and the SQ then assigns the line fill buffer; other sources say that the SQ and the LFBs can work independently. Could you briefly explain how the SQ works?

Here is an example figure (not for Haswell) of the SQ and LFBs. References: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

http://www.realworldtech.com/haswell-cpu/

Recommended answer

For (1), logically the maximum parallelism would be limited by the least-parallel part of the pipeline, which is the 10 LFBs, and this is probably strictly true for demand-load parallelism when prefetching is disabled or can't help. In practice, everything is more complicated once your load is at least partly helped by prefetching, since the wider queues between L2 and RAM can then be used, which could make the observed parallelism greater than 10. The most practical approach is probably direct measurement: given the measured latency to RAM and the observed throughput, you can calculate an effective parallelism for any particular load.
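The calculation mentioned above is an application of Little's law: the number of cache lines in flight equals latency multiplied by the rate at which lines arrive. Below is a minimal sketch of that arithmetic, assuming 64-byte cache lines. The function names and the latency and bandwidth figures are hypothetical placeholders for illustration, not measurements from an E5-2640 v3.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size


def effective_parallelism(latency_ns, bandwidth_gb_per_s, line_bytes=CACHE_LINE_BYTES):
    """Little's law: lines in flight = latency * line arrival rate."""
    # 1 GB/s (with GB = 1e9 bytes) is exactly 1 byte per nanosecond.
    bytes_per_ns = bandwidth_gb_per_s
    return latency_ns * bytes_per_ns / line_bytes


def bandwidth_ceiling(latency_ns, buffers, line_bytes=CACHE_LINE_BYTES):
    """Upper bound on bandwidth (GB/s) if at most `buffers` lines are in flight."""
    return buffers * line_bytes / latency_ns


if __name__ == "__main__":
    latency_ns = 80.0    # hypothetical measured load-to-use latency to RAM
    observed_gbs = 12.0  # hypothetical observed single-thread throughput

    # Average number of lines in flight implied by the two measurements.
    print(effective_parallelism(latency_ns, observed_gbs))
    # Bandwidth that 10 LFBs alone could sustain at this latency.
    print(bandwidth_ceiling(latency_ns, buffers=10))
```

If the computed parallelism for a workload comes out above 10, that would be consistent with prefetching keeping more lines in flight than the LFBs alone allow.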

For (2), my understanding is that it is the other way around: all demand misses in L1 first allocate an LFB (unless, of course, they hit an existing LFB) and may involve the "superqueue" (or whatever it is called these days) later, if they also miss higher in the cache hierarchy. The diagram you included seems to confirm that: the only path from the L1 is through the LFB queue.
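The ordering described above can be illustrated with a toy model. This is only a sketch of the answer's understanding, not documented Intel behavior: the class `MissPathModel`, its method names, and the returned labels are all hypothetical, the buffer sizes are the ones quoted in the question, and prefetchers are not modeled.

```python
class MissPathModel:
    """Toy model: an L1 demand miss takes an LFB first, then an SQ entry on an L2 miss."""

    def __init__(self, lfb_entries=10, sq_entries=16):
        self.lfb_entries = lfb_entries
        self.sq_entries = sq_entries
        self.lfb = set()  # line addresses with an outstanding L1 miss
        self.sq = set()   # line addresses that also missed in L2

    def demand_miss(self, line_addr, hits_l2):
        # A miss to a line that is already being fetched merges into its LFB.
        if line_addr in self.lfb:
            return "merged"
        # Without a free LFB the load cannot leave the L1.
        if len(self.lfb) >= self.lfb_entries:
            return "stalled"
        # A request that misses L2 additionally needs an SQ entry.
        if not hits_l2 and len(self.sq) >= self.sq_entries:
            return "stalled"
        self.lfb.add(line_addr)
        if hits_l2:
            return "lfb"
        self.sq.add(line_addr)
        return "lfb+sq"

    def fill(self, line_addr):
        # Data has arrived: release both buffers for this line.
        self.lfb.discard(line_addr)
        self.sq.discard(line_addr)


if __name__ == "__main__":
    model = MissPathModel()
    # Twelve demand misses to distinct lines, all of which also miss in L2.
    for addr in range(12):
        print(addr, model.demand_miss(addr, hits_l2=False))
```

In this model, demand misses alone can never occupy more than 10 SQ entries, because each one must hold an LFB first. That matches the reasoning in (1) that the LFBs are the limit when prefetching does not help.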
