What is a microcoded instruction?

Question

I have seen a lot of literature referencing microcoded instructions.

What are these, and why are they used?

Recommended answer

A CPU reads machine code and decodes it into internal control signals that send the right data to the right execution units.

Most instructions map to one internal operation and can be decoded directly. (For example, on x86, add eax, edx just sends eax and edx to the integer ALU for an ADD operation and puts the result in eax.)

Some other single instructions do much more work. For example, x86's rep movs implements memcpy(edi, esi, ecx) and requires the CPU to loop.
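To make the "requires the CPU to loop" point concrete, here is a minimal Python sketch of the architectural effect of rep movsb (the byte-sized form), assuming the direction flag is clear. The function name rep_movsb and the bytearray standing in for memory are illustrative assumptions. This models only what the instruction does to registers and memory, not how any real microcode implements it (real CPUs typically copy in larger chunks).

```python
def rep_movsb(memory: bytearray, edi: int, esi: int, ecx: int):
    """Toy model of x86 `rep movsb` with the direction flag clear.

    Copies ecx bytes from memory[esi...] to memory[edi...] and returns the
    final (edi, esi, ecx), mirroring how the real registers are updated.
    """
    while ecx != 0:
        memory[edi] = memory[esi]  # move one byte
        esi += 1                   # advance source pointer
        edi += 1                   # advance destination pointer
        ecx -= 1                   # count down remaining iterations
    return edi, esi, ecx


mem = bytearray(b"abcd----")
final_regs = rep_movsb(mem, edi=4, esi=0, ecx=4)
print(mem, final_regs)
```

A single instruction hides a data-dependent number of iterations, which is why the decoders cannot simply emit a fixed set of control signals for it.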

When the instruction decoders see an instruction like that, instead of producing internal control signals directly, they read microcode out of the microcode ROM.

A microcoded instruction is one that decodes to many internal operations

Modern x86 CPUs always decode x86 instructions to internal micro-operations (uops). In this terminology, an instruction still doesn't count as "microcoded" even when add [mem], eax decodes to a load from [mem], an ALU ADD operation, and a store back into [mem]. Another example is xchg eax, edx, which decodes to 3 uops on Intel Haswell. Interestingly, these are not exactly the same kind of uops you'd get from using 3 MOV instructions to do the exchange with a scratch register, because the xchg uops aren't zero-latency.
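As a rough illustration of that load / ADD / store breakdown, the following Python sketch represents the three operations as tuples and runs them on a toy machine state. The tuple format, the "tmp" register, and the function names are all assumptions made for this example. Real uop encodings are not public, and flags are ignored here.

```python
MASK32 = 0xFFFFFFFF  # keep results to 32 bits, like eax


def decode_add_mem_eax(addr):
    """Return the three internal operations described in the text."""
    return [
        ("load", "tmp", addr),   # tmp <- [mem]
        ("add", "tmp", "eax"),   # tmp <- tmp + eax
        ("store", addr, "tmp"),  # [mem] <- tmp
    ]


def run_uops(uops, regs, memory):
    """Execute toy uops in order against register and memory dicts."""
    for op, dst, src in uops:
        if op == "load":
            regs[dst] = memory[src]
        elif op == "add":
            regs[dst] = (regs[dst] + regs[src]) & MASK32
        elif op == "store":
            memory[dst] = regs[src]
        else:
            raise ValueError(f"unknown uop: {op}")


regs = {"eax": 5, "tmp": 0}
memory = {0x1000: 37}
run_uops(decode_add_mem_eax(0x1000), regs, memory)
print(memory[0x1000])  # 42
```

The list is short and fixed, so a decoder can produce it directly. That is the distinction the answer draws: multi-uop, but not microcoded.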

On Intel / AMD CPUs, "microcoded" means the decoders turn on the microcode sequencer to feed uops from the ROM into the pipeline, instead of producing multiple uops directly.

(You could call any multi-uop x86 instruction "microcoded" if you were thinking in pure RISC terms, but it's useful to use the term "microcoded" to make a different distinction, IMO. This meaning is, I think, widespread in x86 optimization circles, as in Intel's optimization manual. Other people may use different meanings for the terminology, especially when talking about other architectures, or about computer architecture in general when comparing x86 to a RISC.)

In current Intel CPUs, the limit on what the decoders can produce directly, without going to the microcode ROM, is 4 uops (fused-domain). AMD similarly has FastPath (aka DirectPath) single or double instructions (1 or 2 "macro-ops", AMD's equivalent of uops), and beyond that it's VectorPath, aka microcode, as explained in David Kanter's in-depth look at AMD Bulldozer, specifically the part about its decoders.
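The Intel rule above can be sketched as a tiny classifier. This is only a toy model under stated assumptions: the function decode_path and the use of None for "uop count depends on run-time data" are invented for illustration. The counts for add and xchg come from the text, while the count used for div is a placeholder and not a measured figure.

```python
INTEL_DIRECT_DECODE_LIMIT = 4  # fused-domain uops, the limit quoted in the text


def decode_path(uop_count, limit=INTEL_DIRECT_DECODE_LIMIT):
    """Say which front-end path would supply the uops for an instruction.

    uop_count=None means the number of uops is not fixed (e.g. a loop).
    """
    if uop_count is None or uop_count > limit:
        return "microcode sequencer"
    return "decoders"


examples = {
    "add eax, edx": 1,   # single uop
    "xchg eax, edx": 3,  # 3 uops on Haswell, per the text
    "div ecx": 10,       # placeholder count; the text only says "microcoded"
    "rep movsb": None,   # loops, so no fixed uop count
}

for insn, count in examples.items():
    print(f"{insn:15} -> {decode_path(count)}")
```

Passing limit=2 gives a similarly rough picture of AMD's FastPath versus VectorPath split, with macro-ops in place of uops.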

Another example is x86's integer DIV instruction, which is microcoded even on modern Intel CPUs like Haswell. But not on AMD: AMD uses just 1 or 2 uops to activate everything inside the integer divider unit. Microcoding is not fundamental to DIV, just an implementation choice. For the numbers, see my answer on the question "C++ code for testing the Collatz conjecture faster than hand-written assembly - why?".

FP division is also slow, but it is decoded to a single uop, so it doesn't bottleneck the front-end. If FP division is rare and not part of a latency bottleneck, it can be as cheap as multiplication. (But if execution does have to wait for its result, or bottlenecks on its throughput, it's much slower.) More in this answer.

Integer division and other microcoded instructions can give the CPU a hard time, and create effects that make code alignment matter where it otherwise wouldn't.

To learn more about x86 CPU internals, see the x86 tag wiki, and especially Agner Fog's microarch guide.

Also, David Kanter's deep dives into x86 microarchitectures are useful for understanding the pipeline that uops go through: Core 2 and Sandy Bridge are the major ones, and the AMD K8 and Bulldozer articles are interesting for comparison.

"RISC vs. CISC Still Matters" (Feb 2000) by Paul DeMone looks at how PPro breaks instructions down into uops, versus RISCs, where most instructions are already simple enough to go through the pipeline in one step, with only rare ones like ARM's push/pop of multiple registers needing to send multiple things down the pipeline (aka microcoded in RISC terms).

And for good measure, "Modern Microprocessors: A 90-Minute Guide!" is always worth recommending for the basics of pipelining and OoO exec.

In some older / simpler CPUs, every instruction was effectively microcoded. For example, the 6502 executed 6502 instructions by running a sequence of internal instructions from a PLA decode ROM. This works well for a non-pipelined CPU, where the order of using the different parts of the CPU can vary from instruction to instruction.
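The idea of running every instruction as a sequence of internal steps from a ROM can be sketched in a few lines of Python. This is a toy machine, not a model of the real 6502: the opcode names, the micro-steps, and the MICRO_ROM table are all hypothetical, chosen only to show a sequencer walking a per-opcode list of steps.

```python
def step_fetch_operand(cpu):
    cpu["operand"] = cpu["program"][cpu["pc"]][1]


def step_load_a(cpu):
    cpu["a"] = cpu["operand"] & 0xFF


def step_add_a(cpu):
    cpu["a"] = (cpu["a"] + cpu["operand"]) & 0xFF  # 8-bit accumulator


def step_store_a(cpu):
    cpu["memory"][cpu["operand"]] = cpu["a"]


# Hypothetical micro-ROM: every opcode maps to an ordered list of micro-steps.
MICRO_ROM = {
    "LOAD_IMM": [step_fetch_operand, step_load_a],
    "ADD_IMM": [step_fetch_operand, step_add_a],
    "STORE": [step_fetch_operand, step_store_a],
}


def run(program):
    """Execute (opcode, operand) pairs, one micro-step at a time."""
    cpu = {"pc": 0, "a": 0, "operand": 0, "memory": {}, "program": program}
    while cpu["pc"] < len(program):
        opcode = program[cpu["pc"]][0]
        for micro_step in MICRO_ROM[opcode]:  # sequencer walks the ROM entry
            micro_step(cpu)
        cpu["pc"] += 1
    return cpu


cpu = run([("LOAD_IMM", 40), ("ADD_IMM", 2), ("STORE", 0x10)])
print(cpu["memory"][0x10])  # 42
```

Because each opcode lists its own steps, different instructions can use the parts of the machine in different orders, which matches the non-pipelined case described above.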

Historically, there was a different technical meaning for "microcode", meaning something like the internal control signals decoded from the instruction word. This applied especially in a CPU like MIPS, where the instruction word mapped directly to those control signals without complicated decoding. (I may have this partly wrong; I read something like this (other than in the deleted answer on this question) but couldn't find it again later.)

This meaning may still actually get used in some circles, for example when designing a simple pipelined CPU such as a hobby MIPS.
