What are the semantics of the Super Queue and Line Fill Buffers?


Question

I am asking this question about the Haswell microarchitecture (Intel Xeon E5-2640 v3 CPU). From the CPU specifications and other resources, I found that there are 10 LFBs and that the super queue has 16 entries. I have two questions about the LFBs and the super queue (SQ):

1) What is the maximum degree of memory-level parallelism the system can provide: 10 or 16 (LFBs or SQ)?

2) According to some sources, every L1D miss is recorded in the SQ, and the SQ then assigns a line fill buffer. Other sources say that the SQ and the LFBs can work independently. Could you briefly explain how the SQ works?

Here is an example figure (not for Haswell) showing the SQ and the LFBs. References: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

http://www.realworldtech.com/haswell-cpu/

Recommended answer

For (1), logically the maximum parallelism is limited by the least-parallel part of the pipeline, which is the 10 LFBs, and this is probably strictly true for demand-load parallelism when prefetching is disabled or cannot help. In practice, things get more complicated once your load is at least partly helped by prefetching, because the wider queues between L2 and RAM can then be used, which could make the observed parallelism greater than 10. The most practical approach is probably direct measurement: given the measured latency to RAM and the observed throughput, you can calculate an effective parallelism for any particular load (by Little's law, the number of lines in flight equals latency times the rate at which lines are transferred).
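The calculation suggested above can be sketched with Little's law. This is only an illustration: the function names are hypothetical, the 64-byte line size is the usual x86 cache line, and the latency and bandwidth figures in the usage lines are made-up placeholders, not measurements from a Xeon E5-2640 v3.

```python
CACHE_LINE_BYTES = 64  # usual x86 cache line size


def effective_parallelism(latency_ns, bandwidth_gb_per_s,
                          line_bytes=CACHE_LINE_BYTES):
    """Average number of cache lines in flight, by Little's law.

    1 GB/s (10**9 bytes per second) is exactly 1 byte per nanosecond,
    so the bandwidth figure can be used directly as bytes/ns.
    """
    lines_per_ns = bandwidth_gb_per_s / line_bytes
    return latency_ns * lines_per_ns


def bandwidth_ceiling_gb_per_s(latency_ns, buffers,
                               line_bytes=CACHE_LINE_BYTES):
    """Bandwidth reachable if at most `buffers` lines can be in flight."""
    return buffers * line_bytes / latency_ns


# Placeholder numbers, chosen only to make the arithmetic easy to follow.
print(effective_parallelism(latency_ns=80.0, bandwidth_gb_per_s=8.0))
print(bandwidth_ceiling_gb_per_s(latency_ns=80.0, buffers=10))
```

If the effective parallelism computed from your own measurements comes out above 10 for a single core, that would suggest prefetching is keeping more lines in flight than the LFBs alone could.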

For (2), my understanding is that it is the other way around: all demand misses in L1 first allocate an LFB (unless, of course, they hit an existing LFB) and may involve the "superqueue" (or whatever it is called these days) later, if they also miss higher in the cache hierarchy. The diagram you included seems to confirm this: the only path out of the L1 is through the LFB queue.
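The ordering described above can be written down as a toy model. This is a sketch of the answer's description only, not of real Haswell hardware: the class, method, and return strings are hypothetical, entries are never freed, and the default sizes (10 and 16) are simply the figures quoted in the question.

```python
LINE_BYTES = 64  # usual x86 cache line size


class MissPath:
    """Toy model: an L1 demand miss takes an LFB first, the superqueue later."""

    def __init__(self, lfb_entries=10, sq_entries=16):
        self.lfb_entries = lfb_entries
        self.sq_entries = sq_entries
        self.lfb = set()  # lines with an outstanding L1 miss
        self.sq = set()   # lines that also missed in L2

    def l1_demand_miss(self, addr, hits_l2):
        line = addr // LINE_BYTES
        if line in self.lfb:
            # A second miss to the same line reuses the existing LFB.
            return "merged into existing LFB"
        if len(self.lfb) >= self.lfb_entries:
            return "stalled: no free LFB"
        self.lfb.add(line)
        if hits_l2:
            return "LFB allocated, served by L2"
        if len(self.sq) >= self.sq_entries:
            return "LFB allocated, waiting for superqueue entry"
        self.sq.add(line)
        return "LFB allocated, then superqueue entry"


path = MissPath()
print(path.l1_demand_miss(0x1000, hits_l2=False))
print(path.l1_demand_miss(0x1008, hits_l2=False))  # same 64-byte line
```

In this model nothing reaches the superqueue without holding an LFB first, which is the point the answer makes about the diagram.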
