How to compute cache bit widths for tags, indices and offsets in a set-associative cache and TLB

Question

Here is the question:

We have a memory system with 64-bit virtual addresses and 48-bit physical addresses. The L1 TLB is fully associative with 64 entries. The page size in virtual memory is 16KB. The L1 cache is 32KB and 2-way set associative; the L2 cache is 2MB and 4-way set associative. The block size of both the L1 and L2 caches is 64B. The L1 cache uses a virtually indexed, physically tagged (VIPT) scheme.

We are required to compute the tags, indices and offsets. This is the solution I have formulated so far:

  • Page offset = log2(page size) = 14 bits
  • Block offset = log2(block size) = 6 bits
  • Virtual page number = virtual address bits - page offset = 64 - 14 = 50 bits
  • L1 cache index = page offset - block offset = 8 bits
  • L1 tag = physical address - L1 index - block offset = 48 - 8 - 6 = 34 bits
  • TLB index = log2(64/64) = 0 bits {since it is fully associative, the whole TLB can be viewed as one set}
  • TLB tag = virtual page number - index = 50 bits
  • L2 cache index = log2(cache size / (block size × ways)) = 13 bits
  • L2 tag = physical address - L2 index - block offset = 48 - 13 - 6 = 29 bits
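
As a quick cross-check of the numbers above, here is a small Python sketch (my addition, not part of the original question; the `index_bits` helper is just an illustrative name) that derives each width from the stated parameters:

```python
from math import log2

# Parameters given in the question.
VIRTUAL_BITS = 64
PHYSICAL_BITS = 48
PAGE_SIZE = 16 * 1024           # 16 KiB pages
BLOCK_SIZE = 64                 # 64 B cache lines
L1_SIZE, L1_WAYS = 32 * 1024, 2
L2_SIZE, L2_WAYS = 2 * 1024 * 1024, 4
TLB_ENTRIES, TLB_WAYS = 64, 64  # fully associative: ways == entries

def index_bits(size, block, ways):
    """Bits needed to select a set: log2(number of sets)."""
    return int(log2(size // (block * ways)))

page_offset = int(log2(PAGE_SIZE))            # 14
block_offset = int(log2(BLOCK_SIZE))          # 6
vpn_bits = VIRTUAL_BITS - page_offset         # 50

l1_index = index_bits(L1_SIZE, BLOCK_SIZE, L1_WAYS)   # 8
l1_tag = PHYSICAL_BITS - l1_index - block_offset      # 34

tlb_index = int(log2(TLB_ENTRIES // TLB_WAYS))        # 0 (fully associative)
tlb_tag = vpn_bits - tlb_index                        # 50

l2_index = index_bits(L2_SIZE, BLOCK_SIZE, L2_WAYS)   # 13
l2_tag = PHYSICAL_BITS - l2_index - block_offset      # 29

print(f"page offset={page_offset}, block offset={block_offset}, VPN={vpn_bits}")
print(f"L1: index={l1_index}, tag={l1_tag}")
print(f"TLB: index={tlb_index}, tag={tlb_tag}")
print(f"L2: index={l2_index}, tag={l2_tag}")
```

The same `index_bits` formula covers both caches; the fully associative TLB is just the special case where the number of ways equals the number of entries, giving 0 index bits.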

For reference: [the question's address-breakdown diagram, which shows the L1 index bits falling entirely within the page offset, is not reproduced here]

This is the solution that I have calculated. Please tell me if it is wrong. Thanks in advance :)

Answer

Looks right.

You should really calculate L1D index bits the same way you do for L2: log2(32KiB / (64B * 2)) = log2(256) = 8 bits.

Calculating the L1 index bits as page offset - block offset is only possible because your diagram shows you that your cache has the desirable property that all the index bits are page-offset bits. (So for aliasing behaviour, it's like a PIPT cache: homonyms and synonyms are impossible. So you can get VIPT speed without any of the aliasing downsides of virtual caches.)
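
To make the "behaves like a PIPT cache" point concrete, here is a small illustrative Python check (my own sketch with made-up addresses, not from the original answer): because the 8 L1 index bits sit entirely inside the 14-bit page offset, a virtual address and whatever physical address it translates to always select the same set.

```python
PAGE_OFFSET_BITS = 14   # 16 KiB pages
BLOCK_OFFSET_BITS = 6   # 64 B lines
L1_INDEX_BITS = 8

def l1_set(addr):
    # Set number = address bits [6..13], i.e. the bits just above the block offset.
    return (addr >> BLOCK_OFFSET_BITS) & ((1 << L1_INDEX_BITS) - 1)

# Translation replaces only the bits above the page offset; the low 14 bits are
# unchanged. Use an arbitrary virtual address and an arbitrary physical frame.
vaddr = 0x00007F3AB2C4D6E8
page_offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
paddr = (0x012345 << PAGE_OFFSET_BITS) | page_offset   # any frame number works

assert l1_set(vaddr) == l1_set(paddr)   # both pick the same L1 set
```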

So I guess really calculating both ways and checking is a good sanity check. i.e. check that it matches the diagram, or that the diagram matches the other parameters.
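
That sanity check is easy to automate; a minimal sketch (my addition) asserting that the geometry-derived index width matches the diagram-derived one:

```python
from math import log2

cache_size, block_size, ways = 32 * 1024, 64, 2   # L1D parameters from the question
page_offset_bits, block_offset_bits = 14, 6

from_geometry = int(log2(cache_size // (block_size * ways)))   # log2(256) = 8
from_diagram = page_offset_bits - block_offset_bits            # 14 - 6   = 8
assert from_geometry == from_diagram
```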

It's also not required that L1D index+offset bits "use up" all the page offset bits: e.g. increasing L1D associativity would leave 1 or more page-offset bits as part of the tag. (This is fine, and wouldn't introduce aliasing problems, it just means your L1D isn't as big as it could be for a given associativity and page size.)
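
For example (illustrative numbers only, not a configuration from the question): keeping the 32KiB size but doubling associativity to 4-way shrinks the index to 7 bits, so one page-offset bit moves into the tag while the no-aliasing property is preserved.

```python
from math import log2

page_offset_bits, block_offset_bits = 14, 6
size, block, ways = 32 * 1024, 64, 4        # hypothetical 4-way L1D

index_bits = int(log2(size // (block * ways)))                          # 7
page_offset_bits_in_tag = page_offset_bits - (index_bits + block_offset_bits)
print(index_bits, page_offset_bits_in_tag)                              # 7 1
```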

It is common to build caches this way, though, especially with smaller page sizes. For example, x86 has 4k pages, and Intel CPUs have used 32kiB / 8-way L1D for over a decade. (32k / 8 = 4k). Making it larger (64kiB) would also require making it 16-way associative, because changing the page size is not an option. This would start to get too expensive for a low-latency high throughput cache with parallel tag + data fetch. Earlier CPUs like Pentium III had 16kiB / 4-way, and they were able to scale that up to 32kiB / 8-way, but I don't think we should expect larger L1D unless something fundamental changes. But with your hypothetical CPU architecture with 16kiB pages, a small+fast L1D with more associativity is certainly plausible. (Your diagram is pretty clear that the index goes all the way up to the page split, but other designs are possible without giving up the VIPT benefits.)
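
The constraint in that paragraph boils down to size/ways <= page size for an alias-free VIPT L1D; a tiny sketch (my addition, using the publicly known x86 figures mentioned above) makes the scaling problem visible:

```python
page_size = 4 * 1024   # x86 4 KiB pages

for size, ways in [(32 * 1024, 8),     # long-standing Intel L1D geometry
                   (64 * 1024, 8),     # doubling the size at the same associativity
                   (64 * 1024, 16)]:   # the associativity a 64 KiB L1D would need
    per_way = size // ways             # bytes indexed within one way
    ok = per_way <= page_size          # do index + block offset fit inside the page offset?
    print(f"{size // 1024} KiB / {ways}-way: size/ways = {per_way} B -> "
          f"{'index fits in page offset' if ok else 'index spills above page offset'}")
```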

See also "Why is the size of L1 cache smaller than that of the L2 cache in most of the processors?" for more about the "VIPT hack" and why multi-level caches are necessary to get a combination of low latency and large capacity in practical designs. (And note that current Intel L1D caches are pipelined and multi-ported (with 2 reads and 1 write per clock) for access widths up to 32 bytes, or even all 64 bytes of a line with AVX512; see "How can cache be that fast?". So making L1D larger and more highly associative would cost a lot of power.)
