Interrupting an instruction in the middle of execution


Problem description

Suppose the CPU is running an assembly instruction, say FOO, that takes several clock cycles to execute (e.g. 10).

An interrupt request arrives in the middle of executing FOO, and the processor needs to service it. Does it wait until the instruction has finished executing, or is FOO aborted and restarted afterwards? Does the behavior differ depending on the priority of the interrupt?

Solution

The CPU has the option of deciding to do either one, i.e. deciding when the interrupt was handled relative to the original instruction stream.

Instructions that have been issued, but not yet dispatched to an execution unit, are cancelled in current implementations from AMD and Intel. See the related question "When an interrupt occurs, what happens to instructions in the pipeline?"

With out-of-order execution, typically dozens of instructions are in flight, and more than one can literally be in the middle of executing in an ALU at once.

But it's an interesting question whether low-latency instructions like add or imul that have started executing but not yet retired will be allowed to complete and update the architectural state that the interrupt handler sees.

If not, it's probably because of the difficulty of building the logic for detecting how many more contiguous instructions will be ready to retire "soon", beyond the current retirement state. Interrupts are rare (one per thousands of instructions at worst, or one per millions of instructions with low I/O load), so the benefit of squeezing a bit more throughput of surrounding code around interrupt handling is low. And any potential cost in interrupt latency would be a downside.


Some instructions, especially micro-coded ones, have mechanisms for being interrupted without having to restart from scratch. For example:

  • rep movsb can leave RSI, RDI, and RCX updated to part-way through a copy (so it will finish the copy on restart). The other REP-string instructions can similarly be interrupted. Only a single count of the operation is atomic with respect to interrupts.

    Even when single-stepping in a debugger (by setting TF), the CPU breaks after each count, so from an interrupt point of view it really is repeating a separate movsb instruction RCX times.

  • AVX2 gathers like vpgatherdd have an input mask vector that shows which elements to gather vs. ignore. It clears mask elements after successfully gathering the corresponding index. On an exception (e.g. page fault), the faulting element is the right-most element with its mask still set (gather order is not guaranteed, but fault order is, see Intel's manual entry).

This makes it possible for a gather to succeed without needing all the relevant pages to be mapped at the same time. Evicting an already-gathered element while paging in another can't lead to an infinite loop, even in a memory-pressure corner case. Forward progress is guaranteed.

On an async interrupt, the hardware could similarly leave the gather partially done, using the mask to record progress. I don't know whether any hardware actually does that, but the ISA design leaves that option open.

Anyway, this is why you need to keep creating a fresh all-ones mask inside the loop for every gather.
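To make the mask mechanism concrete, here is a minimal software model in Python. It is not real AVX2 code; it is only a sketch of the semantics described above. The names PageFault, model_gather and gather_with_paging are made up for illustration. The model processes elements from index 0 upward, which is a simplification (real hardware does not guarantee gather order), and it treats every address as its own page. The pretend OS keeps only one page mapped at a time, to mimic the memory-pressure corner case.

```python
class PageFault(Exception):
    """Raised by the toy memory model when an address is not mapped."""

    def __init__(self, addr):
        super().__init__(addr)
        self.addr = addr


def model_gather(dest, mask, indices, load):
    """Toy masked gather: progress is recorded by clearing mask elements.

    Assumption: elements are handled from index 0 upward. If load() raises,
    dest and mask keep whatever progress was made before the fault.
    """
    for i in range(len(mask)):
        if mask[i]:
            dest[i] = load(indices[i])  # may raise PageFault
            mask[i] = False             # element done; a retry will skip it


def gather_with_paging(dest, mask, indices, memory):
    """Re-run the gather after each fault, like restarting the instruction.

    The hypothetical "OS" here maps the faulting address and evicts every
    other one, so at most one "page" is ever mapped. Returns the fault count.
    """
    mapped = set()
    faults = 0

    def load(addr):
        if addr not in mapped:
            raise PageFault(addr)
        return memory[addr]

    while True:
        try:
            model_gather(dest, mask, indices, load)
            return faults
        except PageFault as fault:
            faults += 1
            mapped.clear()          # evict everything else
            mapped.add(fault.addr)  # page in only the faulting address


if __name__ == "__main__":
    memory = {0x1000: 11, 0x2000: 22, 0x3000: 33, 0x4000: 44}
    indices = [0x1000, 0x2000, 0x3000, 0x4000]
    dest, mask = [0] * 4, [True] * 4
    print(gather_with_paging(dest, mask, indices, memory), dest, mask)
```

In this model the gather finishes after one fault per element even though earlier pages keep getting evicted, because finished elements are never re-read. It also shows why a stale mask is a bug: once the mask is all-clear, passing it to a second gather loads nothing.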

AVX512 gathers and scatters have the same mechanism but with a mask register instead of a vector register. http://felixcloutier.com/x86/VPSCATTERDD:VPSCATTERDQ:VPSCATTERQD:VPSCATTERQQ.html
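The rep movsb behavior from the list above can be sketched the same way. This is a toy Python model, not a description of how any real microcode works. The Regs class and the rep_movsb and interrupt_after helpers are hypothetical names, and the model assumes DF=0 (addresses increase). The point is only that each count is one atomic step, and that the registers carry enough state for the instruction to resume where it stopped.

```python
from dataclasses import dataclass


@dataclass
class Regs:
    rsi: int  # source offset into the toy memory
    rdi: int  # destination offset into the toy memory
    rcx: int  # remaining count


def rep_movsb(mem, regs, interrupt_pending):
    """Copy regs.rcx bytes, checking for an interrupt between counts.

    Returns True when the copy is complete, or False if it was interrupted.
    When interrupted, regs already describe the remaining work, so calling
    the function again models re-executing the same instruction.
    """
    while regs.rcx != 0:
        if interrupt_pending():
            return False
        mem[regs.rdi] = mem[regs.rsi]  # one count: a single movsb
        regs.rsi += 1
        regs.rdi += 1
        regs.rcx -= 1
    return True


def interrupt_after(n):
    """Return a callable that reports an interrupt once n counts have run."""
    remaining = [n]

    def pending():
        if remaining[0] == 0:
            return True
        remaining[0] -= 1
        return False

    return pending


if __name__ == "__main__":
    mem = bytearray(b"hello world" + bytes(11))
    regs = Regs(rsi=0, rdi=11, rcx=11)
    print(rep_movsb(mem, regs, interrupt_after(4)), regs)
    print(rep_movsb(mem, regs, lambda: False), bytes(mem[11:22]))
```

After the interrupted call the registers point part-way into the copy, and the second call completes it without recopying the first bytes.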


Very slow instructions without a mechanism for being interrupted and restarted include wbinvd, which writes all modified cache lines back to main memory and invalidates the caches. Intel's manual mentions that wbinvd does delay interrupts:

"As a consequence, the use of the WBINVD instruction can have an impact on logical processor interrupt/event response time."

This is probably why it's a privileged instruction. There's lots of stuff that user-space can do to make the system slow (e.g. use up lots of memory bandwidth), but it can't increase interrupt latency too dramatically. (Stores that have retired from the ROB but not yet committed to L1d can increase interrupt latency because they have to happen and can't be aborted. But creating a pathological case of lots of scattered cache-miss stores in flight is harder, and the store buffer size is small.)


