Why does any modern x86 mask the shift count in CL to its 5 low bits?

Question

I'm digging into left and right shift operations in x86 ASM, like shl eax, cl

From the IA-32 Intel Architecture Software Developer's Manual, Volume 3:

All IA-32 processors (starting with the Intel 286 processor) do mask the shift count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.

I'm trying to understand the reasoning behind this. Maybe it works this way because, at the hardware level, it is hard to implement a shift by any amount up to 32 (or 64) bits in a single cycle?

Any detailed explanation would help a lot!

Recommended Answer

Edited to correct my statement about the 80386, which (to my surprise) did have a barrel shifter.

Happy to hear the 286 described as "modern" :-)

The 8086 ran SHL AX, CL in 8 clocks + 4 clocks per bit shifted. So if CL = 255, this is a seriously slow instruction (8 + 4 × 255 = 1028 clocks)!

So the 286 did everybody a favour and clamped the count by masking it to 0..31, limiting the instruction to at most 5 + 31 clocks, which for 16-bit registers is an interesting compromise.

[I found the "80186/80188 80C186/80C188 Hardware Reference Manual" (order no. 270788-001), which says that this innovation first appeared there. SHL et al. ran in 5+n clocks (for register operations), the same as on the 286. FWIW, the 186 also added PUSHA/POPA, PUSH immed., INS/OUTS, BOUND, ENTER/LEAVE, IMUL immed. and SHL/ROL etc. immed. I do not know why the 186 appears to be a non-person.]

For the 386 they kept the same mask, but it now also applies to 32-bit register shifts. I found a copy of the "80386 Programmer's Reference Manual" (order no. 230985-001), which gives a clock count of 3 for all register shifts. The "Intel 80386 Hardware Reference Manual" (order no. 231732-002), section 2.4 "Execution Unit", says that the Execution Unit includes:

• The Data Unit contains the ALU, a file of eight 32-bit general-purpose registers, and a 64-bit barrel shifter (which performs multiple bit shifts in one clock).

So, I do not know why they did not mask 32-bit shifts to 0..63. At this point I can only suggest the cock-up theory of history.

I agree it is a shame that there isn't a (GPR) shift that returns zero for any count >= the operand size. That would require the hardware to check for any bit set above the bottom 6/5 bits of the count and return zero. As a compromise, perhaps it could check just bit 6/bit 5.

[I haven't tried it, but I suspect that using PSLLQ et al. (which do return zero when the count exceeds the element width) is hard work, what with shuffling the count and value into an xmm register and shuffling the result back again, compared to testing the shift count and masking the result of a shift in some branch-free fashion.]

Anyway... the reason for the behaviour appears to be history.