When an interrupt occurs, what happens to instructions in the pipeline?


Problem Description



Assume a 5 stage pipeline architecture (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back). There are 4 instructions that have to be executed.

(These sample instructions are not accurate, but I believe the point will be understood.)

In the fifth clock cycle, these instructions will be in the pipeline as shown below.

Add a, b, c [IF ID EX MEM WB]

Add a, b, d [IF ID EX MEM]

Add a, b, e [IF ID EX]

Add a, b, f [IF ID]
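The stage occupancy above follows from a simple rule: with one instruction issued per cycle and no stalls, an instruction issued in cycle i has entered `cycle - i + 1` stages by the end of a given cycle. A minimal sketch (function and variable names invented for illustration):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stages_completed(issue_cycle, current_cycle):
    """Stages an instruction has occupied by the end of current_cycle,
    assuming one instruction issues per cycle with no stalls."""
    n = current_cycle - issue_cycle + 1   # stages entered so far
    return STAGES[:max(0, min(n, len(STAGES)))]

# Four instructions issued in cycles 1..4; look at the end of cycle 5.
for i in range(4):
    print(f"Add instruction {i + 1}:", stages_completed(i + 1, 5))
```

Running this reproduces the table: the first Add has reached WB, the last is still in ID.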

Now, if a hardware interrupt occurs, what happens to these instructions? Will the interrupt be handled only after all the instructions in the pipeline have executed? Will software interrupts and exceptions be handled in a different way?

Solution

First, terminology:

Usually, at Intel at least, an interrupt is something that comes from the outside world. Usually it is not synchronized with instructions executing on the processor, i.e. it is an asynchronous external interrupt.

In Intel terminology an exception is something caused by instructions executing on the processor. E.g. a page fault, or an undefined instruction trap.

---+ Interrupts flush all instructions in flight

On every machine that I am familiar with - e.g. all Intel processors since the P5 (I worked on the P6), AMD x86s, ARM, MIPS - when the interrupt signal is received the instructions in the pipeline are nearly always flushed, thrown away.

The only reason I say "nearly always" is that on some of these machines you are not always at a place where you are allowed to receive an interrupt. So, you proceed to the next place where an interrupt is allowed - any instruction boundary, typically - and THEN throw away all of the instructions in the pipeline.

For that matter, interrupts may be blocked. So you proceed until interrupts are unblocked, and THEN you throw them away.

Now, these machines aren't exactly simple 5 stage pipelines. Nevertheless, this observation - that most machines throw away all instructions in the pipeline, in pipestages before the pipestage where the interrupt logic lives - remains almost universally true.

In simple machines the interrupt logic typically lives in the last stage of the pipeline, WB, corresponding roughly to the commit pipestage of advanced machines. Sometimes it is moved up to a pipestage just before, e.g. MEM in your example. So, on such machines, all instructions in IF ID EX, and usually MEM, are thrown away.
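That flush rule can be sketched as follows, assuming (as in the simple-machine case above) that the interrupt logic lives at WB. The model and names are invented for illustration:

```python
PIPELINE_ORDER = ["IF", "ID", "EX", "MEM", "WB"]

def flush_on_interrupt(pipeline, interrupt_stage="WB"):
    """Discard every in-flight instruction in a stage earlier than the
    stage where the interrupt logic lives; keep the rest."""
    cutoff = PIPELINE_ORDER.index(interrupt_stage)
    kept = {s: i for s, i in pipeline.items()
            if PIPELINE_ORDER.index(s) >= cutoff}
    flushed = {s: i for s, i in pipeline.items()
               if PIPELINE_ORDER.index(s) < cutoff}
    return kept, flushed

# Cycle 5 of the example: one Add per stage.
pipe = {"WB": "add1", "MEM": "add2", "EX": "add3", "ID": "add4"}
kept, flushed = flush_on_interrupt(pipe)
print("kept:", kept)        # only the instruction already at WB survives
print("flushed:", flushed)  # everything in IF/ID/EX/MEM is thrown away
```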

---++ Why I care: Avoiding Wasted Work

This topic is near and dear to my heart because I have proposed NOT doing this. E.g. in customer visits while we were planning to build the P6, I asked customers which they preferred - lower latency interrupts, flushing instructions that are in flight, or (slightly) higher throughput, allowing at least some of the instructions in flight to complete, at the cost of slightly longer latency.

However, although some customers preferred the latter, we chose to do the traditional thing, flushing immediately. Apart from the lower latency, the main reason is complexity:

E.g. if you take an interrupt, but if one of the instructions already in flight also takes an exception, after you have resteered IF (instruction fetch) but before any instruction in the interrupt has committed, which takes priority? A: it depends. And that sort of thing is a pain to deal with.

---+++ Folklore: Mainframe OS Interrupt Batching

This is rather like the way that some IBM mainframe OSes are reported to have operated:

  • with all interrupts blocked in normal operation except for the timer interrupt;
  • in the timer interrupt, you unblock interrupts, and handle them all;
  • and then return to normal operation with interrupts blocked

Conceivably they might only use such an "interrupt batching" mode when heavily loaded; if lightly loaded, they might not block interrupts.

---+++ Deferred Machine Check Exceptions

The idea of deferring interrupts to give instructions already in the pipeline a chance to execute is also similar to what I call the Deferred Machine Check Exception - a concept that I included in the original Intel P6 family Machine Check Architecture, circa 1991-1996, but which appears not to have been released.

Here's the rub: machine check errors like (un)correctable ECC errors can occur AFTER an instruction has retired (i.e. after supposedly younger instructions have committed state, e.g. written registers), or BEFORE the instruction has retired.

The classic example of AFTER errors is an uncorrectable ECC triggered by a store that is placed into a write buffer at graduation. Pretty much all modern machines do this, all machines with TSO, which pretty much means that there is always the possibility of an imprecise machine check error that could have been precise if you cared enough not to buffer stores.

The classic example of BEFORE errors is ... well, every instruction, on any machine with a pipeline. But more interestingly, errors on wrong-path instructions, in the shadow of a branch misprediction.

When a load instruction gets an uncorrectable ECC error, you have two choices:

(1) you could pull the chain immediately, killing not just instructions YOUNGER than the load instruction but also any OLDER instructions

(2) or you could write some sort of status code into the logic that controls speculation, and take the exception at retirement. This is pretty much what you have to do for a page fault, and it makes such errors precise, helping debugging.

(3) But what if the load instruction that got the uncorrectable ECC error was a wrong path instruction, and never retires because an older inflight branch mispredicted and went another way?

Well, you could write the status to try to make it precise. You should have counters of precise errors and imprecise errors. You could otherwise ignore an error on such a wrong-path instruction - after all, if it is a hard error, it will either be touched again, or it might not be. E.g. it is possible that the error would be architecturally silent - e.g. a bad cache line might be overwritten by a good cache line for the same address.

And, if you really wanted, you could set a bit so that if an older branch mispredicts, then you take the machine check exception at that point in time.

Such an error would not occur at a program counter associated with the instruction that caused the error, but might still have otherwise precise state.

I call (2) deferring a machine check exception; (3) is just how you might handle the deferral.
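Options (2) and (3) boil down to attaching a deferred error status to the in-flight instruction and resolving it at retirement or at squash. A toy sketch of that bookkeeping (all names hypothetical; these are not the actual P6 machine-check registers):

```python
class MachineCheck(Exception):
    pass

class RobEntry:
    """One in-flight instruction; `fault` holds a deferred error status."""
    def __init__(self, pc):
        self.pc = pc
        self.fault = None

def record_error(entry, kind):
    # Option (2): don't pull the chain now - just note the error.
    entry.fault = kind

def retire(entry, counters):
    # The instruction reached retirement: deliver the error precisely.
    if entry.fault:
        counters["precise"] += 1
        raise MachineCheck(f"{entry.fault} at pc={entry.pc:#x}")

def squash(entry, counters):
    # Option (3): a wrong-path instruction never retires; count the
    # error rather than trapping on it.
    if entry.fault:
        counters["wrong_path_ignored"] += 1

counters = {"precise": 0, "wrong_path_ignored": 0}
load = RobEntry(pc=0x1000)
record_error(load, "uncorrectable ECC")
try:
    retire(load, counters)
except MachineCheck as e:
    print(e)   # delivered at retirement, precisely
```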

IIRC, all Intel P6 machine check exceptions were imprecise.

---++ On the gripping hand: even faster

So, we have discussed

0) taking the interrupt immediately, or, if interrupts are blocked, executing instructions and microinstructions until an interrupt unblocked point is reached. And then flushing all instructions in flight.

1) trying to execute instructions in the pipeline, so as to avoid wasted work.

But there is a third possibility:

-1) if you have microarchitecture state checkpoints, take the interrupt immediately, never waiting for an interrupt unblocked point. Which you can only do if you have a checkpoint of all relevant state at the most recent "safe to take an interrupt" point.

This is even faster than 0), which is why I labelled it -1). But it requires checkpoints, which many but not all aggressive CPUs use - e.g. Intel P6 did not use checkpoints. And such post-retirement checkpoints get funky in the presence of shared memory - after all, you can do memory operations like loads and stores while interrupts are blocked. And you can even communicate between CPUs. Even hardware transactional memory usually doesn't do that.
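Option -1) amounts to snapshotting all relevant state at the last safe point and rolling back to it when the interrupt arrives. A toy sketch (hypothetical model; a real checkpoint is rename/RAT state in hardware, not a Python dict):

```python
import copy

class Core:
    def __init__(self):
        self.arch_state = {"pc": 0, "regs": [0] * 8}
        self.checkpoint = None

    def reach_safe_point(self):
        # Snapshot all relevant state at the most recent
        # "safe to take an interrupt" point.
        self.checkpoint = copy.deepcopy(self.arch_state)

    def speculate(self, pc, reg, val):
        # Keep executing past the safe point (interrupts blocked).
        # NOTE: stores already visible to other CPUs could NOT be
        # rolled back this way - the shared-memory problem noted above.
        self.arch_state["pc"] = pc
        self.arch_state["regs"][reg] = val

    def take_interrupt_now(self):
        # Take the interrupt immediately: discard the speculative
        # state and resume from the checkpoint after the handler.
        self.arch_state = copy.deepcopy(self.checkpoint)
        return self.arch_state["pc"]

core = Core()
core.reach_safe_point()
core.speculate(0x40, 3, 99)
print("resume pc:", core.take_interrupt_now())   # back at the safe point
```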

---+ Exceptions mark the instructions affected

Conversely, exceptions, things like page faults, mark the instruction affected.

When that instruction is about to commit, at that point all later instructions after the exception are flushed, and instruction fetch is redirected.

Conceivably, instruction fetch could be resteered earlier, the way branch mispredictions are already handled on most processors, at the point at which we know that the exception is going to occur. I don't know anyone who does this. On current workloads, exceptions are not that important.
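In program-order terms the rule is simple: everything older than the marked instruction commits normally, and the marked instruction and everything younger is flushed only when it reaches the commit point. A sketch (hypothetical model, instructions listed in program order):

```python
def commit_point_exception(rob, fault_idx):
    """rob: in-flight instructions in program order; rob[fault_idx]
    is marked with an exception such as a page fault."""
    # Older instructions have already committed in order...
    committed = rob[:fault_idx]
    # ...the faulting instruction and everything younger is flushed,
    # and instruction fetch is redirected to the handler.
    flushed = rob[fault_idx:]
    return committed, flushed

committed, flushed = commit_point_exception(
    ["i1", "i2", "pagefault_load", "i4", "i5"], 2)
print("committed:", committed)
print("flushed:", flushed)
```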

---+ "Software Interrupts"

"Software interrupts" are a misnamed instruction usually associated with system calls.

Conceivably, such an instruction could be handled without interrupting the pipeline, predicted like a branch.

However, all of the machines I am familiar with serialize in some way. In my parlance, they do not rename the privilege level.

---+ "Precise Interrupts", EMON, PEBS

Another poster mentioned precise interrupts.

This is a historical term. On most modern machines interrupts are defined to be precise. Older machines with imprecise interrupts have not been very successful in the marketplace.

However, there is an alternate meaning that I was involved in introducing: when I got Intel to add the capability to produce an interrupt on performance counter overflow - first using external hardware, and then inside the CPU - it was, in the first few generations, completely imprecise.

E.g. you might set the counter to count the number of instructions retired. The retirement logic (RL) would see the instructions retire, and signal the performance event monitoring circuitry (EMON). It might take two or three clock cycles to send this signal from RL to EMON. EMON would increment the counter, and then see that there was an overflow. The overflow would trigger an interrupt request to the APIC (Advanced Programmable Interrupt Controller). The APIC might take a few cycles to figure out what was happening, and then signal the retirement logic.

I.e. the EMON interrupt would be signalled imprecisely. Not at the time of the event, but some time thereafter.
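The effect of that multi-cycle RL -> EMON -> APIC round trip is "skid": the interrupt is attributed some number of retirements after the instruction that actually caused the overflow. A toy model (the delay value is illustrative, not a measured P6 number):

```python
def emon_skid(retire_stream, overflow_at, signal_delay):
    """retire_stream: PCs in retirement order. The counter overflows at
    retirement number `overflow_at`, but the interrupt is only taken
    `signal_delay` retirements later - so the sampled PC is wrong."""
    causing_pc = retire_stream[overflow_at]
    sampled_pc = retire_stream[min(overflow_at + signal_delay,
                                   len(retire_stream) - 1)]
    return causing_pc, sampled_pc

pcs = [0x100 + 4 * i for i in range(16)]
cause, sampled = emon_skid(pcs, overflow_at=5, signal_delay=3)
print(hex(cause), hex(sampled))   # the sample "skids" past the real event
```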

Why this imprecision? Well, in 1992-6, performance measurement hardware was not a high priority. We were leveraging existing interrupt hardware. Beggars can't be choosers.

But furthermore, some performance events are intrinsically imprecise. E.g. when do you signal an interrupt for a cache miss on a speculative instruction that never retires? (I have a scheme I called Deferred EMON events, but this is still considered too expensive.) For that matter, what about cache misses on store instructions, where the store is placed into a store buffer, and the instruction has already retired?

I.e. sometimes performance events occur after the instruction they are associated with has committed (retired). Sometimes before. And often not exactly at the instruction they are associated with.

But in all of the implementations so far, as far as I know, these performance events are treated like interrupts: existing instructions in the pipe are flushed.

Now, you can make a performance event precise by treating it like a trap. E.g. if it is an event like instructions retired, you can have the retirement logic trap immediately, instead of taking that circuitous loop I described above. If it occurs earlier in the pipeline, you can have the fact that it occurred marked in the instruction fault status in the ROB (Re-Order Buffer). Something like this is what Intel has done with PEBS (Precise Event Based Sampling). http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf.

However, note that not all events can be sampled using PEBS. For example, PEBS in the example above can count loads that took a cache hit or miss, but not stores (since stores occur later).

So this is like exceptions: the event is delivered only when the instruction retires. Because in a sense the event has not completely occurred - it is a load instruction, that takes a cache miss, and then retires. And instructions after the marked PEBS instruction are flushed from the pipeline.

I hope this helps.

---+ Late Addition About Early Computers
