Is there a penalty for accesses to virtual addresses which are mapped to the same physical address?


Question


Given the separation between virtual addresses that processes manipulate and the physical address that represent an actual location in memory, you can play some interesting tricks: such as creating a circular buffer without a discontinuity at the beginning/end of the allocated space.

I would like to know if such mapping tricks have a penalty for data read or write access in the case:

  • That access to the physical page is mostly through the same virtual mapping but only occasionally through the other mapping(s).
  • Access to the physical page(s) are spread more or less evenly between the virtual addresses that map to the same physical address.

I'm interested especially in x86 chips released over the last decade or so, but also in contemporary ARM and POWER chips.

Solution

For 80x86 (I don't know about other architectures):

a) the normal instruction/data/unified caches are physically indexed (and therefore unaffected by paging tricks)

b) TLBs are virtually indexed. This means that (depending on a lot of things), for your circular buffer trick, you might expect a lot more TLB misses than you would have seen without it. Things that could matter include the size of the area and the number and types of TLB entries used (4 KiB, 2 MiB/1 GiB); whether the CPU prefetches TLB entries (recent CPUs do) and enough time is spent doing other work to ensure that the prefetched entries arrive before they're needed; and whether the CPU caches higher-level paging structures (e.g. page directories) to avoid fetching every level on a TLB miss (e.g. fetching the page table entry alone because the page directory was cached, versus PML4 entry, then PDPT entry, then PD entry, then page table entry).

c) Any uop cache (e.g. as part of a loop stream detector, or the old Pentium 4 "trace cache") is virtually indexed or not indexed at all (e.g. CPU just remembers "uops from start of loop"). That won't matter unless you have multiple copies of code; and if you do have multiple copies of code it becomes complicated (e.g. if duplication causes the number of uops to exceed the size of the uop cache).

d) Branch prediction is virtually indexed. This means that if you have multiple copies of the same code it becomes complicated again (e.g. it would increase "training time" for branches that aren't statically predicted correctly; and duplication can cause the number of branches to exceed the number of branch prediction slots and result in worse branch prediction).

e) The return buffer is virtually indexed, but I can't think of how that could matter (duplicating code wouldn't increase the depth of the call graph).

f) For store buffers (used for store forwarding): if a store and a later load are on different virtual pages, the CPU has to assume they might alias regardless of whether they actually do; so this shouldn't matter.

g) For write combining buffers; I'm honestly not sure if they're virtually indexed or physically indexed. Chances are that if it might matter you're going to run out of "write combining slots" before it actually does matter.

