What's 'new' in a 'new' processor when viewed from a programmer's point of view?

Question

I have recently become interested in understanding low-level computing. I understand that today's widely used computers follow the x86/x86-64 architecture.

To my understanding, an architecture, or more specifically an Instruction Set Architecture (ISA), is the set of instructions that the programmer is able to issue to the CPU.

First question: does the ISA keep evolving, or does it remain the same?

I think that it keeps evolving (meaning new instructions keep getting added, or previous instructions get modified?), but then how is an old processor able to execute code written with new instructions? (It doesn't know about the new instructions, but it should be able to execute the code because it has the x86 architecture.) Does the compiler handle this, or the processor? Basically, how is the same collection of instructions able to run on all processors, old or new?

Finally, apart from the microarchitecture, which isn't the programmer's concern (correct me if I'm wrong), what changes does the programmer see when dealing with a new processor? Due to changes in microarchitecture, the old instructions may run faster because of a more efficient implementation. But are new instructions introduced to allow what couldn't be done previously? Or to do with one instruction what previously took a bunch of instructions, thanks to changes in hardware? New registers? Anything else?

Is it done something like this: if the processor supports a new, powerful instruction for faster execution, then use the new instruction, else fall back to the slower older instruction? If yes, who implements this if-else clause? The compiler? If no, then what happens?

Recommended answer

Like most ISAs, x86 is evolving.

Some ISAs break backwards compat by redefining existing opcodes, but that's somewhat rare. MIPS32r6 / MIPS64r6 is an example (https://en.wikipedia.org/wiki/MIPS_architecture#MIPS32/MIPS64_Release_6): it redefined several encodings and removed a few instructions.

x86 has never broken backwards compat: a Ryzen or Skylake-X could still boot and run machine code that worked on an 8086. That's part of what it means to be an x86 CPU; see also "The start of x86: Intel 8080 vs Intel 8086?". (We're just talking about machine code, but even I/O devices are emulated if you boot a PC in legacy BIOS mode, not UEFI, so a very early 8086 PC OS like early DOS might actually run natively.)

Intel is planning to drop some legacy IBM-PC hardware emulation support from its chipsets, like PIC, PIT, A20 gate. And also to drop support for legacy-BIOS bootup (CSM) in favour of just UEFI, but CPUs themselves will still support switching back to real mode.

Intel and AMD take this to such an extreme that undocumented 8086 instructions like SALC (like sbb al,al but without updating FLAGS) are still supported in 16 and 32-bit mode on current CPUs, using up valuable opcode coding space that could be used for shorter encodings for new instructions.

But SW that uses new insns only works on new HW. New software will run on current and future hardware, and old hardware as far back as it chooses to be compatible with. (e.g. in 32-bit code, you might avoid using cmov or other instructions that were new with Pentium Pro, so your code can run on P5 (i586) Pentium / PMMX.)

x86-64 set a new baseline that includes SSE2, and PPro instructions like cmov. So fortunately 64-bit code never has to worry about compat with old CPUs that don't have those things; they're required by x86-64.

A new baseline that includes AVX2, FMA, and BMI2 (e.g. Haswell) would be quite nice. BMI1/BMI2 especially are most useful if your compiler can use them everywhere throughout your code for more efficient variable-count shift instructions and so on, not just in a couple hot loops like with SIMD instructions. But Intel is still selling new CPUs without BMI2 (e.g. Pentium/Celeron versions of Skylake / Coffee Lake.)

If no, then what happens?

Instructions not supported by the CPU will normally fault with #UD (UnDefined). On Unix-like OSes, your process will receive a SIGILL (illegal instruction) signal.

(Fun fact: original 8086 didn't have a #UD exception; every sequence of bytes decoded as something.)

The only way to make one binary that will take advantage of new instructions but not trigger illegal instruction faults on old CPUs is by doing runtime CPU detection and dynamic dispatching. Some compilers can do that for you.
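As a rough sketch of what that runtime detection and dispatch can look like in C: GCC and Clang provide `__builtin_cpu_init()` and `__builtin_cpu_supports()` on x86, plus a `target` function attribute that lets one function use AVX2 while the rest of the file stays at the baseline. The function names (`sum_bytes`, `sum_bytes_baseline`, `sum_bytes_avx2`, `select_sum_bytes`) are made up for illustration, and the lazy one-time selection here is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline version: plain C, runs on any CPU. */
static uint64_t sum_bytes_baseline(const uint8_t *p, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
/* Same loop, but the compiler is allowed to use AVX2 inside this one
   function. It must never be called on a CPU without AVX2. */
__attribute__((target("avx2")))
static uint64_t sum_bytes_avx2(const uint8_t *p, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
#endif

typedef uint64_t (*sum_bytes_fn)(const uint8_t *, size_t);

/* Pick an implementation based on what the running CPU reports. */
static sum_bytes_fn select_sum_bytes(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_bytes_avx2;
#endif
    return sum_bytes_baseline;   /* the "else" branch of the if-else */
}

/* Public entry point: the CPU check happens once, on the first call. */
uint64_t sum_bytes(const uint8_t *p, size_t n)
{
    static sum_bytes_fn impl;    /* NULL until the first call */
    assert(p != NULL || n == 0);
    if (!impl)
        impl = select_sum_bytes();
    return impl(p, n);
}
```

Callers just use `sum_bytes(buf, len)`; the "if the CPU supports it" decision is made by code in the binary at run time, not by the CPU and not at compile time.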

New instructions may have an encoding that (on old CPUs) looks like a redundant prefix for a different instruction. e.g. lzcnt on a CPU that doesn't support it will decode as rep bsr, which runs as just bsr. And gives a different result than lzcnt!
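To make the "different result" concrete, here is a small portable C model of what the two instructions compute for a 32-bit operand. The names `model_bsr32` and `model_lzcnt32` are invented for this sketch; it models the results only, not flags.

```c
#include <assert.h>
#include <stdint.h>

/* bsr: bit index of the highest set bit. For a zero input the real
   instruction sets ZF and does not produce a defined result, so this
   model simply rejects zero. */
static unsigned model_bsr32(uint32_t x)
{
    assert(x != 0);
    unsigned index = 0;
    while (x >>= 1)
        index++;
    return index;
}

/* lzcnt: number of zero bits above the highest set bit, or 32 if the
   input is zero. */
static unsigned model_lzcnt32(uint32_t x)
{
    if (x == 0)
        return 32;
    return 31 - model_bsr32(x);
}
```

For an input of 1, `model_bsr32` gives 0 while `model_lzcnt32` gives 31, so code compiled to use lzcnt silently computes the wrong thing on a CPU that decodes it as bsr.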

(Intel's docs are explicit that future CPUs are not guaranteed to decode instructions with meaningless prefixes the same way that current CPUs do. This leaves them room to make ISA extensions that way.)

Sometimes the silent-ignore of meaningless REP prefixes on old CPUs is useful for ISA extensions. e.g. pause is rep nop. It's very useful that it decodes harmlessly on old CPUs, allowing it to be placed in spin-loops without checking. Similarly, hardware lock elision (transactional memory) decodes to code that still works on old CPUs, actually doing the atomic operations instead of beginning a transaction.
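A minimal spin-lock sketch showing where pause goes, using C11 atomics and the `_mm_pause()` intrinsic from `<immintrin.h>`. The `demo_spinlock` type, the function names, and the `cpu_relax` macro are hypothetical, and a real lock would want more care (backoff, fairness).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* pause, encoded as rep nop (F3 90) */
#else
#define cpu_relax() ((void)0)     /* non-x86 build: no spin hint */
#endif

typedef struct {
    atomic_flag locked;
} demo_spinlock;

static void demo_lock(demo_spinlock *l)
{
    assert(l != NULL);
    /* test_and_set returns the previous value: true means the lock
       was already held, so keep spinning. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        cpu_relax();
}

static void demo_unlock(demo_spinlock *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

No CPU feature check is needed around `cpu_relax()`: on a CPU that predates pause, the same bytes just run as a nop.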

See also: Stop the instruction set war, by Agner Fog. Some history of Intel screwing over AMD by not releasing details for upcoming ISA extensions, so AMD ends up developing their own incompatible ones, and taking more years to add support for a new extension to their own CPUs. (e.g. SSSE3 wasn't available on AMD CPUs before Bulldozer, meaning that even games that require new-ish computers couldn't require it as a baseline for many years while Phenom-II CPUs were still around.)

But are the new instructions introduced to allow what couldn't be done previously?

8086 is Turing complete (except for bounded memory) so the most important form of "couldn't be done" is addressing more memory: 32-bit addresses in 386, 64-bit addresses (err 48 virtual / 52 physical) in x86-64. But those came by introducing whole new modes; the new instructions they also introduced were a separate thing.

But if you mean "couldn't be done efficiently":

Yes, SIMD is one of the most important examples. MMX, then SSE/SSE2, then SSE4.x. Then AVX for twice as wide vectors. Processing a whole vector of 16 or 32 bytes of data in parallel gives a huge speedup for stuff like strlen or memcmp vs. a byte-at-a-time loop. Also very helpful for lots of array stuff.
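A sketch of the 16-bytes-at-a-time idea with SSE2 intrinsics, assuming GCC or Clang (for `__builtin_ctz`). This is not how any particular libc implements strlen: `bounded_strlen` is a made-up helper that searches a buffer of known size, and it assumes all `max` bytes are readable, which sidesteps the page-crossing care a real strlen needs. Without SSE2 it falls back to the byte loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Index of the first zero byte in s[0..max), or max if there is none.
   Assumption: the whole max-byte buffer is readable. */
size_t bounded_strlen(const char *s, size_t max)
{
    assert(s != NULL || max == 0);
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (; i + 16 <= max; i += 16) {
        /* Compare 16 bytes against zero at once, then collapse the
           16 compare results into a 16-bit mask. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(s + i));
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)
            return i + (size_t)__builtin_ctz(mask);  /* lowest set bit = first zero byte */
    }
#endif
    /* Scalar tail (and the whole job when SSE2 is unavailable). */
    for (; i < max && s[i] != '\0'; i++)
        ;
    return i;
}
```

The vector loop does one load, one compare and one movemask per 16 bytes, where the scalar loop does a load and a compare-and-branch per byte.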

"AVX2 what is the most efficient way to pack left based on a mask?" is an interesting example of new tricks enabled by new instruction sets. e.g. AVX512 has this operation built-in, while AVX2 + BMI2 allows tricks with pdep/pext that weren't possible before.
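To give a feel for the pdep/pext part, here is a portable C model of the two BMI2 instructions, plus one way they can turn an 8-bit keep-mask into left-packed element indices. The `model_*` functions are slow bit-by-bit stand-ins written for this sketch (real code would use the hardware instructions), and `left_pack_indices` is a hypothetical helper showing the general idea, not necessarily the exact code from the linked answer.

```c
#include <assert.h>
#include <stdint.h>

/* pext: gather the bits of src selected by mask into the low bits. */
static uint64_t model_pext64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned out = 0;
    for (unsigned bit = 0; bit < 64; bit++) {
        if ((mask >> bit) & 1) {
            result |= ((src >> bit) & 1) << out;
            out++;
        }
    }
    return result;
}

/* pdep: scatter the low bits of src to the positions selected by mask. */
static uint64_t model_pdep64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned in = 0;
    for (unsigned bit = 0; bit < 64; bit++) {
        if ((mask >> bit) & 1) {
            result |= ((src >> in) & 1) << bit;
            in++;
        }
    }
    return result;
}

/* For an 8-bit keep-mask, return the indices of the kept elements
   packed into the low bytes, lowest index first. */
static uint64_t left_pack_indices(uint8_t keep)
{
    /* One bit per element -> one 0x01 byte per kept element -> 0xFF bytes. */
    uint64_t byte_mask = model_pdep64(keep, 0x0101010101010101u) * 0xFF;
    /* Pull the selected bytes out of the identity index list 0..7. */
    return model_pext64(0x0706050403020100u, byte_mask);
}
```

With `keep = 0xA5` (elements 0, 2, 5 and 7 kept), the packed result is `0x07050200`, which can then serve as the control for a variable shuffle. The point is that two instructions replace what would otherwise be a loop or a large lookup table.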

SSSE3 pshufb is the first variable-control shuffle instruction, and loading a shuffle-control from a lookup table allows things that weren't previously possible efficiently. e.g. "Fastest way to get IPv4 address from string".
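For readers who haven't met pshufb, this portable C model shows what the 128-bit form computes; the real instruction does all 16 lanes at once. `model_pshufb` and the `reverse_ctl` table entry are names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each control byte picks one source byte by index (low 4 bits),
   or produces zero if its top bit is set. */
static void model_pshufb(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t ctl[16])
{
    assert(dst != NULL && src != NULL && ctl != NULL);
    uint8_t out[16];
    for (int i = 0; i < 16; i++)
        out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
    for (int i = 0; i < 16; i++)   /* copy last, so dst may alias src */
        dst[i] = out[i];
}

/* Example control vector: reverse the 16 bytes. In real uses, a table
   of such vectors is indexed by something computed from the data. */
static const uint8_t reverse_ctl[16] = {
    15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
};
```

Because the control is data rather than an immediate baked into the instruction, the shuffle pattern can be chosen at run time, which is what makes the lookup-table tricks possible.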

"How to implement atoi using SIMD?" also shows some nifty things you can do with x86's pmaddubsw / pmaddwd integer multiply + horizontal add instructions, to multiply by decimal place-values.

The earlier history of new instructions being added after 8086 is nicely documented in a bugfixed fork of an appendix of the NASM manual. The current version of this appendix removed text descriptions of each instruction to make room for SIMD instructions. (There are a lot of them.)

A.5.118 IMUL: Signed Integer Multiply
IMUL r/m8                     ; F6 /5                [8086]
IMUL r/m16                    ; o16 F7 /5            [8086]
IMUL r/m32                    ; o32 F7 /5            [386]

IMUL reg16,r/m16              ; o16 0F AF /r         [386]
IMUL reg32,r/m32              ; o32 0F AF /r         [386]

IMUL reg16,imm8               ; o16 6B /r ib         [186]
IMUL reg16,imm16              ; o16 69 /r iw         [186]
IMUL reg32,imm8               ; o32 6B /r ib         [386]
IMUL reg32,imm32              ; o32 69 /r id         [386]

IMUL reg16,r/m16,imm8         ; o16 6B /r ib         [186]
IMUL reg16,r/m16,imm16        ; o16 69 /r iw         [186]
IMUL reg32,r/m32,imm8         ; o32 6B /r ib         [386]
IMUL reg32,r/m32,imm32        ; o32 69 /r id         [386]

Of course any reg32 instruction requires 386 for 32-bit extensions, but note that imul-immediate was new in 186 (imul cx, [bx], 123) while 2-operand imul was new in 386 (imul cx, [bx]), allowing multiply without clobbering DX:AX, making AX less "special".

Other 386 instructions like movsx and movzx also went a long way towards making the registers more orthogonal, letting you sign-extend into any register efficiently. Before that you had to get your data into AL and use cbw, or into AX for cwd to sign extend into DX:AX.
