How does the VIPT to PIPT conversion work on L1->L2 eviction

Question

This scenario came into my head and it seems a bit basic but I'll ask.

So there is a virtual index and physical tag in L1 but the set becomes full so it is evicted. How does the L1 controller get the full physical address from the virtual index and the physical tag in L1 so the line can be inserted into L2? I suppose it could search the TLB for the combination but that seems slow and also it may not be in the TLB at all. Perhaps the full physical address from the original TLB translation is stored in the L1 next to the cache line?

This also opens the wider question of how the PMH invalidates the L1 entry when it writes accessed bits to the PTEs, PDEs and so on. My understanding is that it interfaces with the L2 cache directly using physical addresses, but when it writes the accessed and modified bits (sending an RFO first if it needs to), it would also have to reflect the change in the L1 copy if there is one, which means it would have to know the virtual index corresponding to that physical address. In that case, if the full physical address were also stored in the L1, it would give the L2 a way to index it as well.

Answer

Yes, outer caches are (almost?) always PIPT, and memory itself obviously needs the physical address. So you need the physical address of a line when you send it out into the memory hierarchy.

In Intel CPUs, the VIPT L1 caches take all of their index bits from the offset-within-page part of the address, so virt = phys and there are no aliasing problems. It's basically PIPT, but still able to fetch data/tags from the set in parallel with the TLB lookup that translates the page-number bits to create the input for the tag comparator.
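
As a rough illustration of the constraint that makes this work, here is a minimal C sketch assuming the usual 32 KiB, 8-way, 64-byte-line L1d with 4 KiB pages; the point is that all index and line-offset bits lie inside the page offset, where virtual and physical addresses are identical.

    #include <assert.h>

    enum {
        LINE_SIZE  = 64,                                /* bytes per line   */
        WAYS       = 8,
        CACHE_SIZE = 32 * 1024,
        PAGE_SIZE  = 4096,
        SETS       = CACHE_SIZE / (WAYS * LINE_SIZE)    /* = 64 sets        */
    };

    /* 6 index bits + 6 line-offset bits = 12 bits = the 4 KiB page offset,
     * so every bit used to select the set is untranslated (virt == phys). */
    static_assert(SETS * LINE_SIZE <= PAGE_SIZE,
                  "set index must fit inside the page offset");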

The full physical address is known just from L1d index + tag, again because it behaves like a PIPT for everything except load latency.
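
So when a line has to be written back, the cache can rebuild its physical address from what it already stores. A minimal sketch under the same assumed geometry (the function name is mine, not anything architectural):

    #include <stdint.h>

    #define LINE_SHIFT 6u                          /* 64-byte lines       */
    #define INDEX_BITS 6u                          /* 64 sets             */
    #define TAG_SHIFT  (LINE_SHIFT + INDEX_BITS)   /* = 12 = page shift   */

    /* The stored tag is phys_addr >> 12; the set index and line offset
     * supply the page-offset bits.  No TLB lookup is needed at eviction. */
    static inline uint64_t l1d_line_phys_addr(uint64_t tag, uint64_t set_index)
    {
        return (tag << TAG_SHIFT) | (set_index << LINE_SHIFT);
    }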

In the general case of virtually-indexed caches where some of the index bits do come from the page-number, that's a good question. Such systems do exist, and page-colouring is often used by the OS to avoid aliasing. (So they don't need to flush the cache on context switches.)
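
A sketch of the page-colouring idea, assuming a hypothetical cache whose index reaches two bits above a 4 KiB page offset: the OS only allows mappings whose virtual and physical "colours" match, so those translated index bits come out the same either way and aliases can't land in different sets.

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12u
    #define COLOR_BITS 2u    /* hypothetical: index extends 2 bits past the page offset */
    #define COLOR_MASK ((1u << COLOR_BITS) - 1u)

    static inline unsigned page_color(uint64_t addr)
    {
        return (unsigned)((addr >> PAGE_SHIFT) & COLOR_MASK);
    }

    /* A page allocator that only creates mappings satisfying this predicate
     * keeps the translated index bits equal in vaddr and paddr.            */
    static inline bool coloring_ok(uint64_t vaddr, uint64_t paddr)
    {
        return page_color(vaddr) == page_color(paddr);
    }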

The question "Virtually indexed physically tagged cache Synonym" has a diagram for one such VIPT L1d: the physical tag is extended a few bits to come all the way down to the page offset, overlapping the top index bit.

Good observation that a write-back cache needs to be able to evict dirty lines long after the TLB check for the store was done. Unlike a load, you don't still have the TLB result floating around unless you stored it somewhere.

Having the tag include all the physical address bits above the page offset (even if that overlaps some index bits) solves this problem.
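
A sketch of an eviction with such an overlapping tag, under a hypothetical geometry of 64-byte lines and 128 sets (so the index reaches one bit above a 4 KiB page offset):

    #include <stdint.h>

    #define LINE_SHIFT  6u
    #define PAGE_SHIFT 12u
    #define INDEX_BITS  7u      /* 128 sets: the top index bit is translated */

    /* The tag stores every physical bit from PAGE_SHIFT upward (overlapping
     * the translated index bit), so only the untranslated part of the index
     * is needed to complete the evicted line's physical address.            */
    static inline uint64_t evict_phys_addr(uint64_t tag, uint64_t set_index)
    {
        uint64_t idx_in_page = set_index & ((1u << (PAGE_SHIFT - LINE_SHIFT)) - 1u);
        return (tag << PAGE_SHIFT) | (idx_in_page << LINE_SHIFT);
    }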

Another solution would be a write-through cache, so you do always have the physical address from the TLB to send with the data, even if it's not reconstructable from the cache tag+index. Or for read-only caches, e.g. instruction caches, being virtual isn't a problem.
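
For the write-through case the point is simply that the physical address is captured while the TLB result is still live and travels with the data, along the lines of this hypothetical store-queue entry:

    #include <stdint.h>

    /* Hypothetical write-through path: every store carries its translated
     * address, so nothing has to be reconstructed at eviction time.       */
    struct wt_store {
        uint64_t phys_addr;   /* captured from the TLB when the store executed */
        uint64_t data;
        uint8_t  size;        /* store width in bytes: 1, 2, 4 or 8            */
    };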

But I don't think a TLB check at eviction could solve the problem for the non-overlapping-tag case: you don't have the full virtual address anymore; only the low bits of the page number are virtual (from the index), and the rest are physical (from the tag). So that isn't a valid input to the TLB.
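
To see why, consider the bits you could actually assemble at eviction time in that hypothetical non-overlapping layout (7 index bits, tag starting at bit 13): the result is a hybrid that is neither a virtual nor a physical address.

    #include <stdint.h>

    #define LINE_SHIFT 6u
    #define INDEX_BITS 7u                          /* top index bit is virtual */
    #define TAG_SHIFT  (LINE_SHIFT + INDEX_BITS)   /* tag starts at bit 13     */

    /* Bit 12 of the result comes from the (virtual) index, bits 13 and up
     * from the (physical) tag: not a valid input to the TLB either way.   */
    static inline uint64_t evict_addr_fragment(uint64_t tag, uint64_t set_index)
    {
        return (tag << TAG_SHIFT) | (set_index << LINE_SHIFT);
    }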

So besides being inefficient, there's also the equally important problem that it wouldn't work at all. :P Maybe there's some trick I don't know or something I'm missing, but I don't think even a special TLB indexed both ways (phys->virt and virt->phys) could work, because multiple mappings of the same physical page are allowed.

I think real CPUs that have used VIVT caches have normally had them as write-through. I don't know the history well enough to say for sure or cite any examples. I don't see how they could be write-back, unless they stored two tags (physical and virtual) for every line.
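
If such a VIVT cache did want to be write-back, each line would have to carry something like both tags (a purely hypothetical layout):

    #include <stdint.h>

    /* Hypothetical write-back VIVT line: lookups compare the virtual tag;
     * the physical tag exists only so a dirty eviction knows where to
     * write the data back without needing a reverse translation.         */
    struct vivt_wb_line {
        uint64_t vtag;
        uint64_t ptag;
        uint8_t  dirty;
        uint8_t  data[64];
    };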

I think early RISC CPUs often had 8k direct-mapped caches.

But first-gen classic 5-stage MIPS R2000 (using external SRAM for its L1) apparently had a PIPT write-back cache, if the diagram in these slides labeled MIPS R2000 is right, showing a 14-bit cache index taking some bits from the physical page number of the TLB result. But it still works with 2 cycle latency for loads (1 for address-generation in the EX stage, 1 for cache access in the MEM stage).

Clock speeds were much lower in those days, and caches+TLBs have gotten larger. I guess back then a 32-bit binary adder in the ALU did have comparable latency to TLB + cache access, maybe not using as aggressive carry-lookahead or carry-select designs.

A MIPS 4300i datasheet (variant of the MIPS 4200 used in the Nintendo 64) shows what happens where/when in its 5-stage pipeline, with some things happening on the rising or falling edge of the clock, letting it divide some things up into half-clocks within a stage. (So e.g. forwarding can work from the first half of one stage to the 2nd half of another, e.g. for branch target -> instruction fetch, still without needing extra latching between half-stages.)

Anyway, it shows DVA (data virtual address) calculation happening in EX: that's the register + imm16 from a lw $t0, 1234($t1). Then DTLB and DCR (data-cache read) happen in parallel in the first half of the Data Cache stage. (So this is a VIPT). DTC (Data Tag Check) and LA (load alignment e.g. shifting for LWL / LWR, or for LBU to extract a byte from a fetched word) happen in parallel in the 2nd half of the stage.

So I still haven't found confirmation of a single-cycle (after address calculation) PIPT MIPS. But this is definite confirmation that single-cycle VIPT was a thing. From Wikipedia, we know that its D-cache was 8-kiB direct-mapped write-back.
