Could a processor be made that supports multiple ISAs? (ex: ARM + x86)

Question

Intel has been internally decoding CISC instructions to RISC instructions since their Skylake(?) architecture and AMD has been doing so since their K5 processors. So does this mean that the x86 instructions get translated to some weird internal RISC ISA during execution? If that is what is happening, then I wonder if it's possible to create a processor that understands (i.e., internally translates to its own proprietary instructions) both x86 and ARM instructions. If that is possible, what would the performance be like? And why hasn't it been done already?

Recommended answer

The more different the ISAs, the harder it would be. And the more overhead it would cost, especially the back-end. It's not as easy as slapping a different front-end onto a common back-end microarchitecture design.

If it was just a die area cost for different decoders, not other power or perf differences, that would be minor and totally viable these days, with large transistor budgets. (Taking up space in a critical part of the chip that places important things farther from each other is still a cost, but that's unlikely to be a problem in the front-end). Clock or even power gating could fully power down whichever decoder wasn't being used. But as I said, it's not that simple because the back-end has to be designed to support the ISA's instructions and other rules / features; CPUs don't decode to a fully generic / neutral RISC back-end. Related: "Why does Intel hide internal RISC core in their processors?" has some thoughts and info about what the internal RISC-like uops are like in modern Intel designs.

Adding ARM support capability to Skylake for example would make it slower and less power-efficient when running pure x86 code, as well as cost more die area. That's not worth it commercially, given the limited market for it, and the need for special OS or hypervisor software to even take advantage of it. (Although that might start to change with AArch64 becoming more relevant thanks to Apple.)

A CPU that could run both ARM and x86 code would be significantly worse at either one than a pure design that only handles one.

  • efficiently running 32-bit ARM requires support for fully predicated execution, including fault suppression for loads / stores. (Unlike AArch64 or x86, which only have ALU-select type instructions like csinc vs. cmov / setcc that just have a normal data dependency on FLAGS as well as their other inputs.)

  • ARM and AArch64 (especially SIMD shuffles) have several instructions that produce 2 outputs, while almost all x86 instructions only write one output register. So x86 microarchitectures are built to track uops that read up to 3 inputs (2 before Haswell/Broadwell), and write only 1 output (or 1 reg + EFLAGS).

  • x86 requires tracking the separate components of a CISC instruction, e.g. the load and the ALU uops for a memory source operand, or the load, ALU, and store for a memory destination.

  • x86 requires coherent instruction caches, and snooping for stores that modify instructions already fetched and in flight in the pipeline, or some way to handle at least x86's strong self-modifying-code ISA guarantees. (See "Observing stale instruction fetching on x86 with self-modifying code".)

  • x86 requires a strongly-ordered memory model. (program order + store buffer with store-forwarding). You have to bake this into your load and store buffers, so I expect that even when running ARM code, such a CPU would basically still use x86's far stronger memory model. (Modern Intel CPUs speculatively load early and do a memory order machine clear on mis-speculation, so maybe you could let that happen and simply not do those pipeline nukes. Except in cases where it was due to mis-predicting whether a load was reloading a recent store by this thread or not; that of course still has to be handled correctly.)

A pure ARM could have simpler load / store buffers that didn't interact with each other as much. (Except for the purpose of making stlr / ldar release / acquire cheaper, not just fully stalling.)

  • Different page-table formats. (You'd probably pick one or the other for the OS to use, and only support the other ISA for user-space under a native kernel.)

  • If you did try to fully handle privileged / kernel stuff from both ISAs, e.g. so you could have HW virtualization with VMs of either ISA, you'd also have to deal with stuff like control registers and debug facilities.
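To make the memory-model item in the list above more concrete, here is a minimal Python sketch of "program order + store buffer with store-forwarding". It is a toy model, not a description of any real load/store unit: the function name `outcomes`, the tuple-based program encoding, and the `SB` litmus test are all assumptions made for illustration. It enumerates every interleaving of two cores running the classic store-buffer litmus test, once with per-core FIFO store buffers and once with stores going straight to memory.

```python
def outcomes(programs, use_store_buffer):
    """Enumerate all final register values reachable for the given per-core programs.

    Each program is a tuple of ("store", addr, value) or ("load", addr, reg) ops.
    Hypothetical toy model: each core runs in program order, stores may sit in a
    per-core FIFO buffer, and loads check the core's own buffer before memory.
    """
    results = set()

    def replace(tup, i, item):
        return tup[:i] + (item,) + tup[i + 1:]

    def explore(pcs, buffers, memory, regs):
        progressed = False
        for core, ops in enumerate(programs):
            # Choice 1: commit this core's oldest buffered store to memory.
            if buffers[core]:
                progressed = True
                addr, val = buffers[core][0]
                explore(pcs, replace(buffers, core, buffers[core][1:]),
                        {**memory, addr: val}, regs)
            # Choice 2: execute this core's next instruction in program order.
            if pcs[core] < len(ops):
                progressed = True
                kind, addr, arg = ops[pcs[core]]
                next_pcs = replace(pcs, core, pcs[core] + 1)
                if kind == "store" and use_store_buffer:
                    queued = buffers[core] + ((addr, arg),)
                    explore(next_pcs, replace(buffers, core, queued), memory, regs)
                elif kind == "store":
                    explore(next_pcs, buffers, {**memory, addr: arg}, regs)
                else:
                    # Store forwarding: the youngest matching buffered store wins.
                    pending = [v for a, v in buffers[core] if a == addr]
                    val = pending[-1] if pending else memory.get(addr, 0)
                    explore(next_pcs, buffers, memory, {**regs, arg: val})
        if not progressed:
            results.add(tuple(regs[name] for name in sorted(regs)))

    n = len(programs)
    explore((0,) * n, ((),) * n, {}, {})
    return results


# Store-buffer litmus test: each core stores to one location, then loads the other.
SB = (
    (("store", "x", 1), ("load", "y", "r0")),
    (("store", "y", 1), ("load", "x", "r1")),
)

print(sorted(outcomes(SB, use_store_buffer=True)))   # includes (0, 0)
print(sorted(outcomes(SB, use_store_buffer=False)))  # (0, 0) is absent
```

With the buffers enabled, both loads can read 0 because each core's store is still sitting in its own buffer; without them that outcome is impossible. This is the only reordering the toy model allows, which is roughly why x86's model is described as strongly ordered; a weaker model like ARM's permits more reorderings than this sketch can express.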

This already exists for other combinations of ISAs, notably AArch64 + ARM. x86-64 and 32-bit x86 also have slightly different machine-code formats, and x86-64 has a larger register set. Those pairs of ISAs were of course designed to be compatible, and for kernels for the new ISA to have support for running the older ISA as user-space processes.

At the easiest end of the spectrum, we have x86-64 CPUs which support running 32-bit x86 machine code (in "compat mode") under a 64-bit kernel. They use exactly the same fetch/decode/issue/out-of-order-exec pipeline for all modes. 64-bit x86 machine code is intentionally similar enough to 16 and 32-bit modes that the same decoders can be used, with only a few mode-dependent decoding differences. (Like inc/dec vs. REX prefix.) AMD was intentionally very conservative, unfortunately, leaving many minor x86 warts unchanged for 64-bit mode, to keep decoders as similar as possible. (Perhaps in case AMD64 didn't even catch on, they didn't want to be stuck spending extra transistors that people wouldn't use.)
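As a concrete illustration of the "inc/dec vs. REX prefix" difference mentioned above, the sketch below classifies a single byte in the range 0x40..0x4F. In 32-bit mode these bytes are the one-byte inc/dec register encodings; in 64-bit mode the same bytes are REX prefixes with the bit layout 0100WRXB. The function name `decode_0x4x` and the dictionary result format are made up for this example; a real decoder obviously handles far more than this.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_0x4x(byte, mode):
    """Classify one byte in 0x40..0x4F for 32-bit or 64-bit mode (toy sketch)."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("this sketch only handles bytes 0x40..0x4F")
    if mode == 64:
        # 64-bit mode: REX prefix, bits are 0100 W R X B.
        return {
            "kind": "rex",
            "W": (byte >> 3) & 1,
            "R": (byte >> 2) & 1,
            "X": (byte >> 1) & 1,
            "B": byte & 1,
        }
    if mode == 32:
        # 32-bit mode: 0x40+r is inc r32, 0x48+r is dec r32.
        op = "inc" if byte < 0x48 else "dec"
        return {"kind": op, "reg": REG32[byte & 7]}
    raise ValueError("mode must be 32 or 64 in this sketch")


print(decode_0x4x(0x48, 32))  # {'kind': 'dec', 'reg': 'eax'}
print(decode_0x4x(0x48, 64))  # {'kind': 'rex', 'W': 1, 'R': 0, 'X': 0, 'B': 0}
```

The point is that the mode only changes how a handful of byte values are interpreted, so one decoder with a mode input is enough; nothing like a second front-end is needed.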

AArch64 and ARM 32-bit are separate machine-code formats with significant differences in encoding. e.g. immediate operands are encoded differently, and I assume most of the opcodes are different. Presumably pipelines have 2 separate decoder blocks, and the front-end routes the instruction stream through one or the other depending on mode. Both are relatively easy to decode, unlike x86, so this is presumably fine; neither block has to be huge to turn instructions into a consistent internal format. Supporting 32-bit ARM does mean somehow implementing efficient support for predication throughout the pipeline, though.
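To show why predication is more than an ALU select, here is a small Python model contrasting a predicated load (in the style of 32-bit ARM's conditional loads) with an unconditional load followed by a select (in the style of AArch64 csel or x86 cmov). The `Fault` exception, the dictionary used as memory, and both function names are assumptions for illustration only.

```python
class Fault(Exception):
    """Stands in for a page fault or other memory exception."""


def load(memory, addr):
    if addr not in memory:
        raise Fault(hex(addr))
    return memory[addr]


def predicated_load(cond, memory, addr, old_value):
    """Predicated load: with a false predicate there is no access and no fault."""
    if not cond:
        return old_value
    return load(memory, addr)


def load_then_select(cond, memory, addr, old_value):
    """Unconditional load, then an ALU select on the condition."""
    loaded = load(memory, addr)  # always executed, so a bad address always faults
    return loaded if cond else old_value


memory = {0x1000: 42}
print(predicated_load(False, memory, 0xDEAD, 7))  # 7, the bad address is never touched
try:
    load_then_select(False, memory, 0xDEAD, 7)
except Fault as exc:
    print("fault at", exc)
```

A back-end built only around the select style has no cheap way to express the first function, which is the fault-suppression requirement the text refers to.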

Early Itanium (IA-64) also had hardware support for x86, defining how the x86 register state mapped onto the IA-64 register state. Those ISAs are completely different. My understanding was that x86 support was more or less "bolted on", with a separate area of the chip dedicated to running x86 machine code. Performance was bad, worse than good software emulation, so once that was ready the HW designs dropped it. (https://en.wikipedia.org/wiki/IA-64#Architectural_changes)

So does this mean that the x86 instructions get translated to some weird internal RISC ISA during execution?

Yes, but that "RISC ISA" is not similar to ARM. e.g. it has all the quirks of x86, like shifts leaving FLAGS unmodified if the shift count is 0. (Modern Intel handles that by decoding shl eax, cl to 3 uops; Nehalem and earlier stalled the front-end if a later instruction wanted to read FLAGS from a shift.)
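The shift quirk can be written down as a small Python model of a 32-bit `shl` with a variable count. This is a simplified sketch: the function name `shl32` and the flags dictionary are invented here, and only CF, ZF and SF are modeled (OF, PF and AF are left out).

```python
def shl32(value, count, flags):
    """Toy model of x86 `shl r32, cl`: returns (result, new_flags)."""
    value &= 0xFFFFFFFF
    count &= 31  # the count is masked to 5 bits for 32-bit operands
    if count == 0:
        # Nothing is shifted and FLAGS keeps whatever it held before.
        return value, flags
    result = (value << count) & 0xFFFFFFFF
    new_flags = dict(flags)
    new_flags["CF"] = (value >> (32 - count)) & 1  # last bit shifted out
    new_flags["ZF"] = int(result == 0)
    new_flags["SF"] = result >> 31
    return result, new_flags


old = {"CF": 1, "ZF": 0, "SF": 1}
print(shl32(0x80000000, 1, old))  # (0, {'CF': 1, 'ZF': 1, 'SF': 0})
print(shl32(0x80000000, 0, old))  # value and flags both unchanged
print(shl32(0x80000000, 32, old)) # count masks to 0, so also unchanged
```

Because the count is only known at execution time, the old FLAGS value is effectively an extra input to the instruction, which is the kind of x86-specific dependency the internal uops have to carry.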

Probably a better example of a back-end quirk that needs to be supported is x86 partial registers, like writing AL and AH, then reading EAX. The RAT (register allocation table) in the back-end has to track all that, and issue merging uops or handle it some other way. (See "Why doesn't GCC use partial registers?".)
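A rough Python sketch of the bookkeeping this implies: the class below tracks pending writes to AL and AH separately from the last full-width EAX value, and counts one "merge" each time a full-width read has to fold a partial write back in. The class name `ToyRenamer`, its methods, and the one-merge-per-partial-write policy are all assumptions for illustration; real microarchitectures differ in how (and whether) they rename partial registers.

```python
class ToyRenamer:
    """Toy model of partial-register tracking for EAX / AL / AH."""

    def __init__(self, eax=0):
        self.full = eax & 0xFFFFFFFF  # last full-width value of EAX
        self.pending = {}             # "al" / "ah" -> byte written since then
        self.merge_uops = 0           # how many merges reads have required

    def write_al(self, byte):
        self.pending["al"] = byte & 0xFF

    def write_ah(self, byte):
        self.pending["ah"] = byte & 0xFF

    def write_eax(self, value):
        # A full-width write replaces everything, so nothing needs merging.
        self.full = value & 0xFFFFFFFF
        self.pending.clear()

    def read_eax(self):
        # Fold each pending partial write into the full value (one merge each).
        for name, shift in (("al", 0), ("ah", 8)):
            if name in self.pending:
                byte = self.pending.pop(name)
                mask = ~(0xFF << shift) & 0xFFFFFFFF
                self.full = (self.full & mask) | (byte << shift)
                self.merge_uops += 1
        return self.full


r = ToyRenamer(0x11223344)
r.write_al(0xAA)
r.write_ah(0xBB)
print(hex(r.read_eax()), r.merge_uops)  # 0x1122bbaa 2
```

An ISA without partial registers never needs this state, so supporting x86 means the rename stage carries it even when running other code.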
