How does CPU make data request via TLBs and caches?

Question

I have been looking at the last few Intel microarchitectures (Nehalem/SB/IB and Haswell) and trying to work out what happens, at a fairly simplified level, when a data request is made. So far I have this rough idea:

  1. The execution engine issues a data request
  2. The "memory control" unit queries the L1 DTLB
  3. If that misses, the L2 TLB is queried next

At this point one of two things can happen, a hit or a miss:

  1. If it's a hit, the CPU tries the L1D/L2/L3 caches, the page table, and then main memory/hard disk, in that order?

If it's a miss, the CPU asks the (integrated memory controller?) to check the page table held in RAM (did I get the role of the IMC correct there?).

If somebody could edit/provide a set of bullet points giving a basic "overview" of what the CPU does from the execution engine's data request onwards, including the

  • L1 DTLB (data TLB)
  • L2 TLB (data + instruction TLB)
  • L1D cache (data cache)
  • L2 cache (data + instruction cache)
  • L3 cache (data + instruction cache)
  • the part of the CPU that controls access to main memory
  • page table

it would be most appreciated. I did find some useful images:

  • http://www.realworldtech.com/wp-content/uploads/2012/10/haswell-41.png
  • http://upload.wikimedia.org/wikipedia/commons/thumb/6/60/Intel_Core2_arch.svg/1052px-Intel_Core2_arch.svg.png

but they didn't really show the interaction between the TLBs and the caches.

UPDATE: I have changed the above as I think I now understand it. The TLB just gets the physical address from the virtual one. If there's a miss, we're in trouble and need to check the page table. If there's a hit, we just proceed down through the memory hierarchy, starting with the L1D cache.

Answer

The pagemap is only used for virtual-to-physical address translation. However, since it resides in memory and is only partially cached in the TLBs, you may have to access it in memory during the translation process.

The basic flow is as follows:

  1. Execution calculates the address (actually, some calculations such as scale and offset may be done in the memory unit).
  2. Look up the DTLB.
    2.a. If it misses, look up the 2nd-level TLB.
    2.a.a. If that misses too, start a page walk.
    2.a.b. If it hits in the 2nd-level TLB, fill the DTLB and proceed with the new physical address.
    2.b. If it hits in the DTLB, proceed with the physical address.
  3. Look up the L1; if it misses, look up the L2; if that misses again, look up the L3; if that misses too, send the request to the memory controller and wait for the DRAM access.
  4. When the data returns (from whichever level), fill it into the caches along the way (depending on the fill policy, cache inclusiveness, temporal hints on the instruction, the memory region type, and probably other factors as well).
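
As a purely illustrative toy, the same flow can be written out sequentially. This is a minimal sketch: the dict-based TLBs and caches, the fake `page_walk` and the fake DRAM are all made-up stand-ins, and real hardware overlaps and speculates on these steps rather than running them one after another.

```python
# Toy sequential model of the flow above (illustration only).

dtlb, stlb = {}, {}                 # virtual page -> physical page
l1d, l2, l3 = {}, {}, {}            # physical address -> data
dram = {}                           # backing store

PAGE = 4096

def page_walk(va):
    """Placeholder for the page walk (see the sketch further down)."""
    return (va // PAGE) ^ 0x5_0000  # fake translation, just for the toy model

def load(va):
    vpn, offset = va // PAGE, va % PAGE

    # Step 2: translate the virtual address.
    ppn = dtlb.get(vpn)
    if ppn is None:                 # 2.a   DTLB miss
        ppn = stlb.get(vpn)
        if ppn is None:             # 2.a.a STLB miss: walk the pagemap
            ppn = page_walk(va)
            stlb[vpn] = ppn
        dtlb[vpn] = ppn             # 2.a.b refill the DTLB
    pa = ppn * PAGE + offset

    # Step 3: probe the cache hierarchy with the physical address.
    for cache in (l1d, l2, l3):
        if pa in cache:
            data = cache[pa]
            break
    else:
        data = dram.get(pa, 0)      # all caches missed: memory controller / DRAM

    # Step 4: fill the line back in on the way up (real hardware applies fill
    # policy, inclusiveness, memory type, temporal hints, etc. here).
    for cache in (l1d, l2, l3):
        cache.setdefault(pa, data)

    return data

print(load(0x1234))                 # first access misses everywhere
print(load(0x1234))                 # second access hits the DTLB and the L1D
```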

If a pagewalk is required, stall the main request and issue physical loads to the pagemap (according to the architectural definition). In x86 this may involve CR3, PDPTR, PDP, PDE, PTE, and so on, depending on the paging mode, page sizes, etc. Note that under virtualization, each pagewalk level on the VM may require a full pagewalk on the host (so you effectively square the number of steps needed).
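
For concreteness, here is a minimal sketch of those loads, assuming the 4-level x86-64 format with 4 KiB pages (the CR3/PDPTR/PDP/PDE/PTE list above spans several paging modes); `phys_read`, `pagemap_memory`, `cr3` and the table addresses are made-up toy values, and permission bits, large pages and faults are ignored apart from a bare "present" check.

```python
# Simplified 4-level x86-64 page walk with 4 KiB pages (illustration only).

ADDR_MASK = 0x000F_FFFF_FFFF_F000   # bits 51:12 of an entry: next table / frame
PRESENT   = 0x1

pagemap_memory = {}                 # physical address -> 8-byte entry (toy DRAM)
cr3 = 0x1000                        # physical base of the top-level table (PML4)

def phys_read(pa):
    """Physical load issued by the walker; on real hardware this may hit in the
    data caches, since pagemap data is cacheable (see the note below)."""
    return pagemap_memory.get(pa, 0)

def page_walk(va):
    table = cr3 & ADDR_MASK
    # Four dependent loads: PML4E -> PDPTE -> PDE -> PTE. Each level is indexed
    # by 9 bits of the virtual address, starting at bit 39.
    for shift in (39, 30, 21, 12):
        index = (va >> shift) & 0x1FF
        entry = phys_read(table + index * 8)
        if not (entry & PRESENT):
            raise RuntimeError("page fault")   # not-present entry ends the walk
        table = entry & ADDR_MASK              # next table, or the final frame
    return table | (va & 0xFFF)                # frame base + page offset

# Minimal setup mapping virtual page 0x12345 to physical frame 0x99 for the toy.
va = 0x12345 * 0x1000 + 0xABC
tables = [0x1000, 0x2000, 0x3000, 0x4000]      # PML4, PDPT, PD, PT bases
for i, shift in enumerate((39, 30, 21, 12)):
    index = (va >> shift) & 0x1FF
    nxt = tables[i + 1] if i < 3 else 0x99 * 0x1000
    pagemap_memory[tables[i] + index * 8] = nxt | PRESENT

print(hex(page_walk(va)))                      # 0x99abc
```

The loop also makes the point of the next paragraph explicit: each `phys_read` consumes the value produced by the previous one, so the four loads form a serial dependency chain.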

Note that the pagemap is basically a tree structure, where each access depends on the value of the previous one (and on part of the virtual address you are translating). These accesses are therefore serially dependent, and only once the last one is done do you get the physical address and can go back to #3. All along, the line you want may be sitting in your L1 with no way for you to know it (although, to be honest, if you did a pagewalk you are not likely to still have the line in your upper caches).

Other important notes: the pagemap lives in physical space and is accessed that way. You don't want to have to translate the accesses you need for translation; that could be a deadlock :)
More importantly, the pagemap data can itself be cached, so while a simple memory access may expand into multiple accesses due to a TLB miss, the pagewalk may still be fairly cheap.
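
To put rough numbers on that fan-out (my own illustration under an assumed 4-level paging scheme, not figures from the answer): a TLB miss adds up to one extra load per level, and under nested paging the usual bound is (n + 1) * (m + 1) - 1 references for n guest and m host levels, which is the "square the number of steps" effect mentioned above. Caching of the pagemap data is what keeps the common case far below these worst-case counts.

```python
# Back-of-the-envelope count of extra loads per TLB miss (illustration only).
levels = 4

native_walk = levels                              # up to 4 extra loads on bare metal
nested_walk = (levels + 1) * (levels + 1) - 1     # up to 24 when guest + host both walk

print(native_walk, nested_walk)                   # 4 24
```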
