Pipeline stalling and bypassing examples

Question

I am taking a course on Computer Architecture. I found this website from another university, whose notes and videos have been helping me so far: CS6810, Univ of Utah. I am working through this series of notes but need some explanation of a few of the example problems. I am currently looking at Problem 7, on pages 17-18. The solutions are given in the notes on page 18, but I am unsure how the professor reaches his conclusions. He states on his class webpage that he does not provide solutions to anything, so asking him is out of the picture.

For those that cannot view the pdf, the problem is as follows:

Consider an 8-stage pipeline where Register Read (RR) and Register Write (RW) take a full cycle. Key: Instruction Fetch = IF, Decode = DE, ALU = AL, Data Memory = DM, Latch # = L#

L1-->IF-->L2-->DE-->L3-->RR-->L4-->AL-->L5-->AL-->L6-->DM-->L7-->DM-->L8-->RW-->L9

Given the following pairs of instructions, determine the number of stalls for the second instruction, with and without bypassing:

  1. ADD R1 + R2 -> R3, ADD R3 + R4 -> R5 : without bypassing 5, with bypassing 1
  2. LD[R1] -> R2, ADD R2 + R3 -> R4 : without bypassing 5, with bypassing 3
  3. LD[R1] -> R2, SD[R2] -> R3 : without bypassing 5, with bypassing 3
  4. LD[R1] -> R2, SD[R3] -> R2 : without bypassing 5, with bypassing 1
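All four answers follow from one rule. The Python sketch below is a simplified model, not the course's own method: number the stages 0 to 7 (the names `AL1`, `AL2`, `DM1`, `DM2` are labels added here to tell the repeated stages apart), say the producer's value is usable after the end of stage `ready`, the consumer needs it at the start of stage `need`, and the second instruction trails the first by one cycle. The stall count is then `max(0, ready - need)`. Without bypassing, the value travels from RW to RR; with bypassing, it travels from the producing stage to the consuming stage. It assumes, as the answer further down does, that store data is needed at the first DM stage. `stalls` and `CASES` are hypothetical names.

```python
# Stage order of the 8-stage pipeline; the index is each stage's cycle offset.
STAGES = ["IF", "DE", "RR", "AL1", "AL2", "DM1", "DM2", "RW"]
IDX = {name: i for i, name in enumerate(STAGES)}

def stalls(ready: str, need: str) -> int:
    """Stalls for an instruction issued one cycle after its producer.

    ready: stage at whose end the producer's value becomes available.
    need:  stage at whose start the consumer must have the value.
    With no stalls the consumer starts `need` in cycle IDX[need] + 1,
    and the value is usable from cycle IDX[ready] + 1 onward.
    """
    return max(0, IDX[ready] - IDX[need])

# (description, producer's ready stage, consumer's need stage) with bypassing
CASES = [
    ("1. ADD -> ADD operand", "AL2", "AL1"),
    ("2. LD -> ADD operand", "DM2", "AL1"),
    ("3. LD -> SD address", "DM2", "AL1"),
    ("4. LD -> SD data", "DM2", "DM1"),
]

for name, ready, need in CASES:
    without = stalls("RW", "RR")  # written back in RW, read again in RR
    with_bypass = stalls(ready, need)
    print(f"{name:22} without: {without}  with: {with_bypass}")
```

It reports 5 stalls without bypassing for every case and 1, 3, 3, 1 with bypassing, which matches the given answers.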

I understand how each of them generates 5 stalls without bypassing, and I understand how the first one generates only 1 stall with bypassing, but I am unsure how the stall counts with bypassing arise for cases 2-4.

Any help would be greatly appreciated.

Edit (for further clarification, here is my understanding of how the cases would look): ST = Stall; latches are implied.

1.

IF-->DE-->RR-->AL-->AL-->DM-->DM-->RW
     IF-->DE-->ST-->ST-->ST-->ST-->ST-->RR-->AL-->AL-->DM-->DM-->RW (without)
     IF-->DE-->RR-->ST-->AL-->AL-->DM-->DM-->RW                     (with)

Without bypassing, I2 stalls before entering RR and has to wait until R3 has been written before it can enter RR; this holds for all of the cases. With bypassing, I2 can enter RR but then stalls until I1 has done the arithmetic, which is complete only after I1's second ALU stage.

2.

IF-->DE-->RR-->AL-->AL-->DM-->DM-->RW
     IF-->DE-->ST-->ST-->ST-->ST-->ST-->RR-->AL-->AL-->DM-->DM-->RW (without)
     IF-->DE-->RR-->ST-->ST-->ST-->AL-->AL-->DM-->DM-->RW           (with)

With bypassing, I2 can enter RR but must wait until R2 is produced, which happens only after I1's second DM stage.

3.

IF-->DE-->RR-->AL-->AL-->DM-->DM-->RW
     IF-->DE-->ST-->ST-->ST-->ST-->ST-->RR-->AL-->AL-->DM-->DM-->RW (without)
     IF-->DE-->RR-->ST-->ST-->ST-->AL-->AL-->DM-->DM-->RW           (with)

With bypassing, I2 can enter RR but must wait until R2 is produced, since it needs R2 to compute the store address, and this happens only after I1's second DM stage.

4.

IF-->DE-->RR-->AL-->AL-->DM-->DM-->RW
     IF-->DE-->ST-->ST-->ST-->ST-->ST-->RR-->AL-->AL-->DM-->DM-->RW (without)
     IF-->DE-->RR-->AL-->AL-->ST-->DM-->DM-->RW                     (with)

With bypassing, I2 can proceed through both ALU stages, but it must then wait before its first DM stage until it can pull R2, which I1 does not produce until after its second DM stage.
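Diagrams like the ones above can be regenerated mechanically, which is a quick way to check a hand-drawn row. In this minimal Python sketch (the `render` helper and its `before` parameter are hypothetical), the stall bubbles go immediately before the stage that needs the value: RR without bypassing, the first AL with bypassing in cases 1 to 3, and the first DM in case 4.

```python
# Stage sequence of one instruction; latches are implied, as in the diagrams.
STAGES = ["IF", "DE", "RR", "AL", "AL", "DM", "DM", "RW"]

def render(stall_count: int, before: int) -> str:
    """Join the stages with '-->', inserting stall_count 'ST' bubbles
    immediately before the stage at index `before`."""
    row = STAGES[:before] + ["ST"] * stall_count + STAGES[before:]
    return "-->".join(row)

# (label, number of stalls, index of the stage that must wait for the value)
ROWS = [
    ("without bypassing, all cases", 5, 2),  # waits to enter RR
    ("case 1, with", 1, 3),                  # waits to enter the first AL
    ("cases 2 and 3, with", 3, 3),           # waits to enter the first AL
    ("case 4, with", 1, 5),                  # waits to enter the first DM
]

print(render(0, 0))  # I1 never stalls
for label, count, before in ROWS:
    # I2 is indented to show that it starts one cycle after I1.
    print("     " + render(count, before) + "   (" + label + ")")
```

Only the position and size of the bubble differ between the cases; the rest of each row is the unchanged 8-stage sequence.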

And one more, just to make sure I understand everything:

I1: R1+R2-->R3, I2: SD[R4]<--R3

IF-->DE-->RR-->AL-->AL-->DM-->DM-->RW
     IF-->DE-->ST-->ST-->ST-->ST-->ST-->RR-->AL-->AL-->DM-->DM-->RW (without)
     IF-->DE-->RR-->AL-->AL-->DM-->DM-->RW                          (with)

It is my understanding that without bypassing, it would stall in the same place for the same number of cycles (5). With bypassing, however, there would be 0 stalls, because I2 uses its ALU stages to calculate the memory address, and when it is time to perform the store, it can take R3 as produced at the end of I1's second ALU stage.
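The same conclusion can be checked by stepping through cycles instead of applying a formula. This Python sketch assumes, as in the diagrams, that I1 is in stage k during cycle k and that I2 trails by one cycle; `simulate` and the stage labels `AL2`/`DM1` are hypothetical names. R3 is available at the end of I1's second ALU stage (cycle 4 in this numbering), while I2 does not reach its first DM stage until cycle 6, so no bubble is inserted.

```python
STAGES = ["IF", "DE", "RR", "AL1", "AL2", "DM1", "DM2", "RW"]

def simulate(ready: str, need: str) -> int:
    """Count stalls by stepping cycle by cycle.

    I1 occupies stage k in cycle k, so its value is usable from cycle
    STAGES.index(ready) + 1. I2 is one cycle behind and holds in the stage
    before `need` until the value is usable.
    """
    usable_from = STAGES.index(ready) + 1
    start = STAGES.index(need) + 1  # cycle I2 would enter `need` with no stalls
    stall_count = 0
    while start + stall_count < usable_from:
        stall_count += 1  # one bubble: I2 waits another cycle
    return stall_count

print("without bypassing:", simulate("RW", "RR"))    # R3 read back from the register file
print("with bypassing:   ", simulate("AL2", "DM1"))  # R3 forwarded to the store's DM stage
```

This prints 5 stalls without bypassing and 0 with bypassing for the ADD/SD pair.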

Answer

The stalls in cases 2 and 3 arise because the second instruction needs, in its first ALU stage, the result of the load in the previous instruction, which is not available until after the second Data Memory stage. The second instruction therefore stalls while the earlier instruction is in its second ALU stage and its two Data Memory stages. (L8 of the first instruction lines up with L4 of the second.)

 L1-->IF-->L2-->DE-->L3-->RR-->L4-->AL-->L5-->AL-->L6-->DM-->L7-->DM-->L8-->RW-->L9
           L1-->IF-->L2-->DE-->L3-->RR-->STALL---->STALL---->STALL---->L4-->AL-->L5-->AL-->L6-->DM-->L7-->DM-->L8-->RW-->L9

For case 4, the value the second instruction stores to memory is (presumably) not needed until its first Data Memory stage, and the address-generation part of the second instruction does not depend on the first instruction. (L8 of the first instruction lines up with L6 of the second.)

 L1-->IF-->L2-->DE-->L3-->RR-->L4-->AL-->L5-->AL-->L6-->DM-->L7-->DM-->L8-->RW-->L9
           L1-->IF-->L2-->DE-->L3-->RR-->L4-->AL-->L5-->AL-->STALL---->L6-->DM-->L7-->DM-->L8-->RW-->L9
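The latch alignment noted above (L8 with L4 for cases 2 and 3, L8 with L6 for case 4) can be computed. This small Python sketch assumes stages are numbered from IF = 0, so that stage k sits between latches L(k+1) and L(k+2); `aligned_latches` is a hypothetical helper.

```python
# Stage numbering: IF=0, DE=1, RR=2, AL1=3, AL2=4, DM1=5, DM2=6, RW=7.
# Stage k sits between latch L(k+1) (its input) and latch L(k+2) (its output).
DM2, AL1, DM1 = 6, 3, 5

def aligned_latches(ready_stage, need_stage):
    """Return (producer latch, consumer latch, stalls) for a bypassed value.

    The producer's result lands in latch L(ready_stage + 2) at the end of
    cycle ready_stage. The consumer, one cycle behind, loads its input latch
    L(need_stage + 1) at the end of cycle need_stage + stalls. When
    ready_stage - need_stage is positive, that many stalls make both latches
    load on the same clock edge, which is what "lines up" means here.
    """
    stalls = max(0, ready_stage - need_stage)
    return f"L{ready_stage + 2}", f"L{need_stage + 1}", stalls

print(aligned_latches(DM2, AL1))  # cases 2 and 3 -> ('L8', 'L4', 3)
print(aligned_latches(DM2, DM1))  # case 4        -> ('L8', 'L6', 1)
```

The difference between the two latch numbers, minus one, is the stall count, which is another way to read the diagrams.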

(Since writing to memory commits architectural state, much as writing a register does, it might be more typical for a pipeline not to require the stored value until the RW stage.)

Without bypassing, all register source operands are retrieved from the register file in the Register Read stage. Since a new value is written to the register file in the Register Write stage, and both stages take a full cycle, the given 8-stage pipeline requires 5 stall cycles for such dependent cases when there is no bypassing.

 L1-->IF-->L2-->DE-->L3-->RR-->L4-->AL-->L5-->AL-->L6-->DM-->L7-->DM-->L8-->RW-->L9
           L1-->IF-->L2-->DE-->STALL---->STALL---->STALL---->STALL---->STALL---->L3-->RR-->L4-->AL-->L5-->AL-->L6-->DM-->L7-->DM-->L8-->RW-->L9

With bypassing, a dependent value can be forwarded from the earliest stage in which it is available (the end of the second ALU stage for arithmetic instructions, the end of the second Data Memory stage for loads), rather than from the Register Write stage, to the earliest stage of the dependent instruction that needs it (before the ALU stages for arithmetic instructions and address computation, and before the Data Memory stages for stores if stores require the stored value early, as seems to be the case in this pipeline), rather than to the Register Read stage.
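The sources and destinations listed above can be tabulated. This Python sketch assumes the stage numbering of the diagrams, back-to-back issue, and, as the answer does, that store data is needed at the first DM stage; `READY`, `NEED`, and `stall_table` are illustrative names. It reproduces the four given answers with bypassing (1, 3, 3, 1) and the questioner's extra ADD/SD case (arithmetic result used as store data, 0 stalls).

```python
STAGES = ["IF", "DE", "RR", "AL1", "AL2", "DM1", "DM2", "RW"]

# Stage at whose end each kind of producer has its result (forwarding source).
READY = {"arithmetic": "AL2", "load": "DM2"}
# Stage at whose start each kind of use needs the operand (forwarding destination).
# Store data is assumed to be needed at DM1, as this pipeline appears to require.
NEED = {"ALU operand": "AL1", "address": "AL1", "store data": "DM1"}

def stall_table():
    """Stalls for every producer/use pair when the two issue back to back."""
    table = {}
    for producer, ready in READY.items():
        for use, need in NEED.items():
            table[(producer, use)] = max(0, STAGES.index(ready) - STAGES.index(need))
    return table

for (producer, use), n in stall_table().items():
    print(f"{producer:10} -> {use:11}: {n} stall(s)")
```

Without bypassing every entry would be 5, since the source becomes RW and the destination becomes RR regardless of instruction type.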

(Aside: Some pipelines perform the register write in the first half of the cycle and the register read in the second half. Not only can this reduce the number of access ports needed for the register file, but it also makes values available from the register file one cycle earlier, since a newly written value can be read in the latter half of the same cycle in which it is written. This reduces the amount of bypassing needed.)
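Applied to this pipeline, the split-cycle register file from the aside would cut the no-bypass penalty by one cycle. A minimal Python sketch, assuming the consumer issues one cycle after the producer and that only the RR/RW timing changes; `no_bypass_stalls` is a hypothetical helper.

```python
STAGES = ["IF", "DE", "RR", "AL1", "AL2", "DM1", "DM2", "RW"]

def no_bypass_stalls(split_cycle: bool) -> int:
    """Stalls for back-to-back dependent instructions without bypassing.

    Full-cycle RR/RW: the consumer's RR must come in a later cycle than the
    producer's RW. Split cycle (write in the first half, read in the second
    half): the consumer's RR may share the cycle with the producer's RW.
    """
    rw, rr = STAGES.index("RW"), STAGES.index("RR")
    # The producer is in RW during cycle rw; with no stalls the consumer
    # would be in RR during cycle rr + 1.
    earliest_rr = rw if split_cycle else rw + 1
    return max(0, earliest_rr - (rr + 1))

print("full-cycle RR/RW: ", no_bypass_stalls(False))
print("split-cycle RR/RW:", no_bypass_stalls(True))
```

It prints 5 for the full-cycle register file, matching the answers above, and 4 for the split-cycle variant.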
