Minimum associativity for a PIPT L1 cache to also be VIPT, accessing a set without translating the index to physical



This question comes in context of a section on virtual memory in an undergraduate computer architecture course. Neither the teaching assistants nor the professor were able to answer it sufficiently, and online resources are limited.

Question:

Suppose a processor with the following specifications:

  • 8KB pages
  • 32-bit virtual addresses
  • 28-bit physical addresses
  • a two-level page table, with a 1KB page table at the first level, and 8KB page tables at the second level
  • 4-byte page table entries
  • a 16-entry 8-way set associative TLB
  • in addition to the physical frame (page) number, page table entries contain a valid bit, a readable bit, a writeable bit, an executable bit, and a kernel-only bit.

Now suppose this processor has a 32KB L1 cache whose tags are computed based on physical addresses. What is the minimum associativity that cache must have to allow the appropriate cache set to be accessed before computing the physical address that corresponds to a virtual address?

Intuition:

My intuition is that if the number of indices in the cache and the number of virtual pages (aka page table entries) is evenly divisible by each other, then we could retrieve the bytes contained within the physical page directly from the cache without ever computing that physical page, thus providing a small speed-up. However, I am unsure if this is the correct intuition and definitely don't know how to follow through with it. Could someone please explain this?

Note: I have computed the number of page table entries to be 2^19, if that helps anyone.
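As a sanity check on that note, the 2^19 figure follows directly from the given parameters: a 32-bit virtual address minus the 13 offset bits of an 8 KB page leaves 19 bits of virtual page number, and there is one page-table entry per virtual page. A quick sketch (variable names are mine):

```python
# Given parameters from the question.
virtual_address_bits = 32
page_size = 8 * 1024                  # 8 KB pages

# log2 of the page size gives the page-offset width: 13 bits.
page_offset_bits = page_size.bit_length() - 1

# One page-table entry per virtual page.
virtual_pages = 2 ** (virtual_address_bits - page_offset_bits)
assert page_offset_bits == 13
assert virtual_pages == 2 ** 19
```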

Solution

What is the minimum associativity that cache must have to allow the appropriate cache set to be accessed before computing the physical address that corresponds to a virtual address?

They only specified that the cache is physically tagged.

You can always build a virtually indexed cache, no minimum associativity. Even direct-mapped (1 way per set) works. See Cache Addressing Methods Confusion for details on VIPT vs. PIPT (and VIVT, and even the unusual PIVT).

For this question not to be trivial, I assume they also meant "without creating aliasing problems", so VIPT is just a speedup over PIPT (physically indexed, physically tagged). You get the benefit of allowing TLB lookup in parallel with fetching tags (and data) for the ways of the indexed set without any downsides.

My intuition is that if the number of indices in the cache and the number of virtual pages (aka page table entries) is evenly divisible by each other, then we could retrieve the bytes contained within the physical page directly from the cache without ever computing that physical page

You need the physical address to check against the tags; remember your cache is physically tagged. (Virtually tagged caches do exist, but typically have to get flushed on context switches to a process with different page tables = different virtual address space. This used to be used for small L1 caches on old CPUs.)

Having both numbers be a power of 2 is normally assumed, so they're always evenly divisible.

Page sizes are always a power of 2 so you can split an address into page number and offset-within-page by just taking different ranges of bits in the address.

Small/fast cache sizes also always have a power of 2 number of sets so the index "function" is just taking a range of bits from the address. For a virtually-indexed cache: from the virtual address. For a physically-indexed cache: from the physical address. (Outer caches like a big shared L3 cache may have a fancier indexing function, like a hash of more address bits, to avoid aliasing for addresses offset from each other by a large power of 2.)
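That bit-range split is just shifting and masking. A minimal sketch for this question's 8 KB pages (the example address is arbitrary):

```python
PAGE_OFFSET_BITS = 13   # 8 KB pages

def split_virtual_address(va: int) -> tuple[int, int]:
    """Split a virtual address into (virtual page number, page offset)."""
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)  # low bits: identical in the physical address
    vpn = va >> PAGE_OFFSET_BITS                 # high bits: what the TLB translates
    return vpn, offset

vpn, offset = split_virtual_address(0x0040_2ABC)
assert (vpn, offset) == (0x201, 0xABC)
```

The low 13 bits come through translation unchanged, which is exactly why index bits drawn only from that range are "free".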

The cache size might not be a power of 2, but you'd do that by having a non-power-of-2 associativity (e.g. 10 or 12 ways is not rare) rather than a non-power-of-2 line size or number of sets. After indexing a set, the cache fetches the tags for all the ways of that set and compares them in parallel. (And fast L1 caches often fetch the data selected by the line-offset bits in parallel, too; then the comparators just mux that data into the output, or raise a flag for no match.)
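The indexed lookup described above can be modeled as a toy sketch, using the question's 32 KB / 4-way cache and an assumed 64-byte line size (the hardware compares all way tags in parallel; the loop here does it sequentially):

```python
LINE_BITS = 6           # assumed 64-byte lines
INDEX_BITS = 7          # 32 KB / (64 B * 4 ways) = 128 sets
SETS = 1 << INDEX_BITS
WAYS = 4

def cache_fields(addr: int) -> tuple[int, int, int]:
    """Break an address into (tag, set index, byte-within-line offset)."""
    line_off = addr & ((1 << LINE_BITS) - 1)
    index = (addr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (LINE_BITS + INDEX_BITS)
    return tag, index, line_off

# Each set holds the tags of its (up to 4) ways.
cache = [[] for _ in range(SETS)]

def lookup(phys_addr: int) -> bool:
    tag, index, _ = cache_fields(phys_addr)
    return any(way_tag == tag for way_tag in cache[index])

def fill(phys_addr: int) -> None:
    tag, index, _ = cache_fields(phys_addr)
    if tag not in cache[index] and len(cache[index]) < WAYS:
        cache[index].append(tag)

fill(0x0ABC_DEF0)
assert lookup(0x0ABC_DEF0)       # same line -> hit
assert not lookup(0x0ABC_0000)   # different set/tag -> miss
```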


Requirements for VIPT without aliasing (like PIPT)

For that case, you need all index bits to come from below the page offset. They translate "for free" from virtual to physical so a VIPT cache (that indexes a set before TLB lookup) has no homonym/synonym problems. Other than performance, it's PIPT.

My detailed answer on Why is the size of L1 cache smaller than that of the L2 cache in most of the processors? includes a section on that speed hack.

Virtually indexed physically tagged cache Synonym shows a case where the cache does not have that property, and needs page coloring by the OS to avoid synonym problems.

How to compute cache bit widths for tags, indices and offsets in a set-associative cache and TLB has some more notes about cache size / associativity that give that property.

Formula:

  • min associativity = cache size / page size

e.g. a system with 8kiB pages needs a 32kiB L1 cache to be at least 4-way associative so that index bits only come from the low 13.
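The formula is a one-liner; a hedged sketch checking the question's configuration and a couple of other cases:

```python
def min_vipt_ways(cache_size: int, page_size: int) -> int:
    """Smallest associativity that keeps every index bit below the page offset."""
    return max(1, cache_size // page_size)

assert min_vipt_ways(32 * 1024, 8 * 1024) == 4   # the question's configuration
assert min_vipt_ways(32 * 1024, 4 * 1024) == 8   # e.g. with 4 KB pages
assert min_vipt_ways(8 * 1024, 8 * 1024) == 1    # direct-mapped works up to 1 page
```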

A direct-mapped cache (1 way per set) can only be as large as 1 page: byte-within-line and index bits total up to the byte-within-page offset. Every byte within a direct-mapped (1-way) cache must have a unique index:offset address, and those bits come from contiguous low bits of the full address.

To put it another way, 2^(idx_bits + within_line_bits) is the total cache size with only one way per set. 2^N is the page size, for a page offset of N (the number of byte-within-page address bits that translate for free).

The actual number of sets (in this case = lines) depends on the line size and page size. Using smaller / larger lines would just shift the divide between offset and index bits.
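To see that shifting divide concretely: for the question's 32 KiB 4-way cache, index bits plus line-offset bits always total the 13-bit page offset, whatever (power-of-2) line size is assumed:

```python
import math

PAGE_SIZE = 8 * 1024     # 13-bit page offset
CACHE_SIZE = 32 * 1024
WAYS = 4                 # cache size / page size

for line_size in (32, 64, 128):          # the split moves, the total doesn't
    sets = CACHE_SIZE // (line_size * WAYS)
    idx_bits = int(math.log2(sets))
    line_bits = int(math.log2(line_size))
    assert idx_bits + line_bits == int(math.log2(PAGE_SIZE))  # always 13
```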

From there, the only way to make the cache bigger without indexing from higher address bits is to add more ways per set, not more sets.
