If a cache miss happens, is the data moved to a register directly, or first to cache and then to the register?


Problem description

If a cache miss happens, is the data moved to the register directly from main memory, or is it first moved to cache and then to the register? Is there a direct way to connect a register with main memory?

Recommended answer

I think you're asking whether a cache-miss load has to wait for L1 load-use latency after the cache line arrives from an outer cache, i.e. wait for the line to be written to L1 and then retry the load normally.

I'm almost certain that high-performance CPUs don't work that way. L2-hit latency matters for many workloads, and the CPU needs a load buffer tracking the incoming cache line anyway, to know when to restart the load. So the load just grabs the data as it comes in, in parallel with the line being written to the cache. The TLB check was already done as part of generating the physical address sent to the outer cache.

Most real CPUs use an early-restart design that lets the pipeline restart as soon as the word / byte it was waiting for arrives, so the rest of the cache line transfers "in the background".
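To make the benefit of early restart concrete, here is a minimal sketch that counts how many transfer beats a stalled load waits for. It is a toy model, not a description of any specific CPU: the function name `beats_until_restart`, the 64-byte line, the 8-byte beat, and the assumption that the line is delivered in address order starting at offset 0 are all illustrative assumptions.

```python
def beats_until_restart(word_offset, line_bytes=64, beat_bytes=8, early_restart=True):
    """Count the transfer beats before a stalled load can complete.

    Toy model: the line is assumed to arrive in address order, one
    beat_bytes-sized chunk per beat, starting at offset 0.
    """
    total_beats = line_bytes // beat_bytes
    needed_beat = word_offset // beat_bytes  # beat carrying the requested bytes
    if early_restart:
        # Forward the data to the load as soon as its beat arrives.
        return needed_beat + 1
    # Without early restart, wait until the whole line has been received.
    return total_beats


# A load of the bytes at offset 16 within the line:
print(beats_until_restart(16))                       # 3
print(beats_until_restart(16, early_restart=False))  # 8
```

In this model the saving depends on where the requested word sits in the line: a load of the last chunk gains nothing from early restart alone.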

A further optimization is critical-word-first, which asks for the cache line to be sent starting with the needed word, so a demand miss for a word in the middle of a cache line can receive that word first. I think modern DDR DRAM still supports this when reading from main memory, starting the 64-byte burst at a specified 64-bit chunk. I'm not 100% sure modern out-of-order CPUs use it, though; when out-of-order execution allows multiple outstanding misses to the same line, it probably gets more complicated.
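As a rough illustration of critical-word-first, the sketch below computes the order in which the 8-byte chunks of a 64-byte line would be delivered if the transfer starts at the requested chunk and wraps around. The function name `delivery_order` and the plain wrap-around order are assumptions made for illustration; the exact burst ordering used by real DRAM and interconnects may differ in detail.

```python
def delivery_order(word_offset, line_bytes=64, chunk_bytes=8):
    """Return chunk indices in delivery order, requested chunk first.

    Assumes a simple wrap-around order; this is an illustrative model,
    not the ordering of any particular memory standard.
    """
    chunks = line_bytes // chunk_bytes
    first = (word_offset % line_bytes) // chunk_bytes
    return [(first + i) % chunks for i in range(chunks)]


# A demand miss for the bytes at offset 40 gets chunk 5 first:
print(delivery_order(40))  # [5, 6, 7, 0, 1, 2, 3, 4]
```

With this ordering the requested chunk is always the first one to arrive, regardless of its position in the line.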

See "which is optimal a bigger block cache size or a smaller one?" for some discussion of early-restart and critical-word-first.

"Is there a direct way to connect a register with main memory?"

It depends what you mean by "direct". In a modern high-performance CPU, there are 2 or 3 layers of cache and a memory controller with its own buffering to arbitrate access to memory for multiple cores. So no, you can't.

If you design a simple single-core CPU with special cache-bypassing load and store instructions, then sure. Or if you consider early-restart to be "direct", then yes, it already happens.

For stores, x86 and some other architectures have cache-bypassing stores, but x86's MOVNT instructions don't directly connect registers with memory. Stores go into a line-fill buffer which is flushed when full, so you get write-combining.
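The write-combining behaviour described above can be sketched with a toy model of a single buffer. Everything here is hypothetical and simplified: the class name `WriteCombiningBuffer`, the single 64-byte buffer, and the two flush conditions (buffer full, or a store to a different line) are assumptions for illustration. Real hardware has several such buffers and more flush conditions.

```python
LINE_BYTES = 64  # assumed cache-line size


class WriteCombiningBuffer:
    """Toy model of one write-combining buffer (not a real API)."""

    def __init__(self):
        self.line_addr = None             # base address of the line being combined
        self.data = bytearray(LINE_BYTES)
        self.valid = [False] * LINE_BYTES
        self.memory_writes = []           # (line_addr, bytes_written) per flush

    def store(self, addr, payload):
        base = addr - addr % LINE_BYTES
        offset = addr - base
        assert offset + len(payload) <= LINE_BYTES, "store must not cross a line"
        if self.line_addr is not None and self.line_addr != base:
            self.flush()                  # different line: evict the partial buffer
        self.line_addr = base
        self.data[offset:offset + len(payload)] = payload
        for i in range(offset, offset + len(payload)):
            self.valid[i] = True
        if all(self.valid):
            self.flush()                  # full line: one combined write to memory

    def flush(self):
        if self.line_addr is None:
            return
        self.memory_writes.append((self.line_addr, sum(self.valid)))
        self.line_addr = None
        self.data = bytearray(LINE_BYTES)
        self.valid = [False] * LINE_BYTES


# Eight 8-byte stores to one line leave the buffer as a single 64-byte write.
wc = WriteCombiningBuffer()
for i in range(8):
    wc.store(0x1000 + 8 * i, bytes(8))
print(wc.memory_writes)  # [(4096, 64)]
```

In this model, interleaving stores to two different lines defeats the combining, because each switch flushes a partially filled buffer.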

There are also uncacheable memory regions: a load or store to uncacheable memory is architecturally "direct", but in the actual microarchitecture it still goes through the memory hierarchy, from the load/store execution unit through the same mechanism that L1D uses to talk to the memory controller.
