Which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted?
Question
In assembly programming, it's fairly common to want to compute something from the low bits of a register that isn't guaranteed to have the other bits zeroed. In higher level languages like C, you'd simply cast your inputs to the small size and let the compiler decide whether it needs to zero the upper bits of each input separately, or whether it can chop off the upper bits of the result after the fact.
This is especially common for x86-64 (aka AMD64), for various reasons1, some of which are present in other ISAs.
I'll use 64bit x86 for examples, but the intent is to ask about/discuss 2's complement and unsigned binary arithmetic in general, since all modern CPUs use it. (Note that C and C++ don't guarantee two's complement4, and that signed overflow is undefined behaviour.)
As an example, consider a simple function that can compile to an LEA instruction2. (In the x86-64 SysV (Linux) ABI3, the first two function args are in rdi and rsi, with the return in rax. int is a 32bit type.)
; int intfunc(int a, int b) { return a + b*4 + 3; }
intfunc:
lea eax, [edi + esi*4 + 3] ; the obvious choice, but gcc can do better
ret
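gcc knows that addition, even of negative signed integers, carries from right to left only, so the upper bits of the inputs can't affect what goes into eax. Thus, it saves an instruction byte and uses lea eax, [rdi + rsi*4 + 3].
What other operations have this property of the low bits of the result not depending on the high bits of the inputs?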
And why does it work?
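To make the claim concrete, here is a minimal C sketch of the two strategies: truncate the inputs first, or compute at full width and truncate the result. The function names are hypothetical and only illustrate the idea. Unsigned types are used so that wraparound is well defined in C; the arithmetic mirrors the intfunc example above.

```c
#include <stdint.h>

/* Reference: truncate each input to 32 bits, then compute at 32-bit width. */
static uint32_t narrow_inputs_first(uint64_t rdi, uint64_t rsi) {
    uint32_t a = (uint32_t)rdi;
    uint32_t b = (uint32_t)rsi;
    return (uint32_t)(a + b * 4u + 3u);
}

/* Like `lea eax, [rdi + rsi*4 + 3]`: compute at 64-bit width, truncate last. */
static uint32_t truncate_result_last(uint64_t rdi, uint64_t rsi) {
    return (uint32_t)(rdi + rsi * 4u + 3u);
}

/* Hypothetical self-check: returns 1 if both forms agree for every sample,
 * whatever garbage sits in bits 32..63 of the inputs. */
static int lea_forms_agree(void) {
    const uint64_t junk[] = {0, 0xDEADBEEFULL, 0xFFFFFFFFULL};
    const uint32_t vals[] = {0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
    for (int j = 0; j < 3; j++) {
        for (int x = 0; x < 5; x++) {
            for (int y = 0; y < 5; y++) {
                uint64_t rdi = (junk[j] << 32) | vals[x];
                uint64_t rsi = (junk[2 - j] << 32) | vals[y];
                if (narrow_inputs_first(rdi, rsi) != truncate_result_last(rdi, rsi))
                    return 0;
            }
        }
    }
    return 1;
}
```

The sample set is small and proves nothing by itself; it only shows the kind of equivalence the question is asking about.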
1 Why this comes up frequently for x86-64: x86-64 has variable-length instructions, where an extra prefix byte changes the operand size (from 32 to 64 or 16), so saving a byte is often possible in instructions that are otherwise executed at the same speed. It also has false dependencies (AMD/P4/Silvermont) when writing the low 8b or 16b of a register (or a stall when later reading the full register (Intel pre-IvB)): for historical reasons, only writes to 32b sub-registers zero the rest of the 64b register. Almost all arithmetic and logic can be used on the low 8, 16, or 32 bits, as well as the full 64 bits, of general-purpose registers. Integer vector instructions are also rather non-orthogonal, with some operations not available for some element sizes.
Furthermore, unlike x86-32, the ABI passes function args in registers, and upper bits aren't required to be zero for narrow types.
2 LEA: Like other instructions, the default operand size of LEA is 32bit, but the default address size is 64bit. An operand-size prefix byte (0x66 or REX.W) can make the output operand size 16 or 64bit. An address-size prefix byte (0x67) can reduce the address size to 32bit (in 64bit mode) or 16bit (in 32bit mode). So in 64bit mode, lea eax, [edx+esi] takes one byte more than lea eax, [rdx+rsi].
It is possible to do lea rax, [edx+esi], but the address is still only computed with 32 bits (a carry doesn't set bit 32 of rax). You get identical results with lea eax, [rdx+rsi], which is two bytes shorter. Thus, the address-size prefix is never useful with LEA, as the comments in disassembly output from Agner Fog's excellent objconv disassembler warn.
3 x86 ABI:
The caller doesn't have to zero (or sign-extend) the upper part of 64bit registers used to pass or return smaller types by value. A caller that wanted to use the return value as an array index would have to sign-extend it (with movsxd rax, eax, or the special-case-for-eax instruction cdqe; not to be confused with cdq, which sign-extends eax into edx:eax, e.g. to set up for idiv).
This means a function returning unsigned int can compute its return value in a 64bit temporary in rax, and not require a mov eax, eax to zero the upper bits of rax. This design decision works well in most cases: often the caller doesn't need any extra instructions to ignore the undefined bits in the upper half of rax.
4 C and C++ specifically do not require two's complement binary signed integers (except for C++ std::atomic types). One's complement and sign/magnitude are also allowed, so for fully portable C, these tricks are only useful with unsigned types. Obviously for signed operations, a set sign-bit in sign/magnitude representation means the other bits are subtracted, rather than added, for example. I haven't worked through the logic for one's complement.
However, bit-hacks that only work with two's complement are widespread, because in practice nobody cares about anything else. Many things that work with two's complement should also work with one's complement, since the sign bit still doesn't change the interpretation of the other bits: it just has a value of -(2^(N-1) - 1) (instead of -2^(N-1)). Sign/magnitude representation does not have this property: the place value of every bit is positive or negative depending on the sign bit.
Also note that C compilers are allowed to assume that signed overflow never happens, because it's undefined behaviour. So e.g. compilers can and do assume (x+1) < x is always false. This makes detecting signed overflow rather inconvenient in C. Note the difference between unsigned wraparound (carry) and signed overflow.
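Because a compiler may fold (x+1) < x to false, a portable overflow check has to be phrased so that the overflowing addition is never performed. The helper below is a sketch with a hypothetical name; it tests the operands against INT_MAX and INT_MIN before adding.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int. No overflowing addition is
 * ever evaluated, so there is no undefined behaviour to optimize away. */
static bool add_would_overflow(int a, int b) {
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow when b > 0 */
    if (b < 0)
        return a < INT_MIN - b;   /* INT_MIN - b cannot overflow when b < 0 */
    return false;                 /* adding zero never overflows */
}
```

For example, add_would_overflow(INT_MAX, 1) reports an overflow, while add_would_overflow(1, 2) does not.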
Recommended answer
tl;dr summary:
Wide operations that can be used with garbage in upper bits:
- bitwise logicals
- left shift (including the *scale in [reg1 + reg2*scale + disp])
- addition/subtraction (and thus LEA instructions: the address-size prefix is never needed. Just use the desired operand-size to truncate if needed.)
- The low half of a multiply. e.g. 16b x 16b -> 16b can be done with a 32b x 32b -> 32b. You can avoid LCP stalls (and partial-register problems) from imul r16, r/m16, imm16 by using a 32bit imul r32, r/m32, imm32 and then reading only the low 16 of the result. (Be careful with wider memory refs if using the m32 version, though.)
  As pointed out by Intel's insn ref manual, the 2 and 3 operand forms of imul are safe for use on unsigned integers. The sign bits of the inputs don't affect the N bits of the result in an N x N -> N bit multiply.

Obviously flags like carry / overflow / sign / zero will all be affected by garbage in high bits of a wider operation. x86's shifts put the last bit shifted out into the carry flag, so this even affects shifts.
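The operations listed above as safe can be spot-checked in C. This is only a sketch: the helper names are hypothetical, 16 bits stands in for "the low part", and a handful of samples is not a proof. Each helper computes at 64-bit width and keeps the low 16 bits.

```c
#include <stdint.h>

/* Compute at 64-bit width, keep only the low 16 bits of the result. */
static uint16_t low16_add(uint64_t x, uint64_t y) { return (uint16_t)(x + y); }
static uint16_t low16_sub(uint64_t x, uint64_t y) { return (uint16_t)(x - y); }
static uint16_t low16_mul(uint64_t x, uint64_t y) { return (uint16_t)(x * y); }
static uint16_t low16_xor(uint64_t x, uint64_t y) { return (uint16_t)(x ^ y); }
static uint16_t low16_shl(uint64_t x, unsigned n) { return (uint16_t)(x << n); }

/* Returns 1 if junk placed above bit 15 of either input leaves the low
 * 16 bits of every result unchanged. */
static int low16_ignores_high_bits(uint16_t a, uint16_t b,
                                   uint64_t junk_a, uint64_t junk_b) {
    uint64_t wa = (junk_a << 16) | a;   /* a with garbage in the upper bits */
    uint64_t wb = (junk_b << 16) | b;   /* b with garbage in the upper bits */
    return low16_add(wa, wb) == low16_add(a, b)
        && low16_sub(wa, wb) == low16_sub(a, b)
        && low16_mul(wa, wb) == low16_mul(a, b)
        && low16_xor(wa, wb) == low16_xor(a, b)
        && low16_shl(wa, 3)  == low16_shl(a, 3);
}
```

Flags have no C equivalent here, so the caveat about carry, overflow, sign and zero is not covered by this check.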
Operations that can't be used with garbage in upper bits:
- right shift
- full multiplication: e.g. for 16b x 16b -> 32b, ensure the upper 16 of the inputs are zero- or sign-extended before doing a 32b x 32b -> 32b imul. Or use a 16bit one-operand mul or imul to inconveniently put the result in dx:ax. (The choice of signed vs. unsigned instruction will affect the upper 16b in the same way as zero- or sign-extending before a 32b imul.)
- memory addressing ([rsi + rax]): sign or zero-extend as needed. There is no [rsi + eax] addressing mode.
- division and remainder
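Right shift and division, by contrast, move information from high bits down into the low part of the result. The sketch below (hypothetical helper names, 16 bits as the "low part") shows a single stray bit at position 16 changing the low 16 bits of both operations, and zero-extension restoring the expected answer.

```c
#include <stdint.h>

/* Compute at 64-bit width, keep only the low 16 bits of the result. */
static uint16_t low16_shr(uint64_t x, unsigned n) { return (uint16_t)(x >> n); }
static uint16_t low16_div(uint64_t x, uint64_t y) { return (uint16_t)(x / y); }

/* Zero-extend the low 16 bits, discarding anything above bit 15. */
static uint64_t zero_extend16(uint64_t x) { return (uint16_t)x; }

/* Returns 1 if garbage in bit 16 changes the low 16 bits of a right shift
 * and of a division, and zero-extending the input first fixes both. */
static int high_bits_leak_into_low(void) {
    uint64_t clean = 0x0010;    /* the 16-bit value we care about */
    uint64_t dirty = 0x10010;   /* same low 16 bits, plus garbage in bit 16 */
    int shr_differs = low16_shr(dirty, 4) != low16_shr(clean, 4);
    int div_differs = low16_div(dirty, 2) != low16_div(clean, 2);
    int shr_fixed = low16_shr(zero_extend16(dirty), 4) == low16_shr(clean, 4);
    int div_fixed = low16_div(zero_extend16(dirty), 2) == low16_div(clean, 2);
    return shr_differs && div_differs && shr_fixed && div_fixed;
}
```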
Two's complement, like unsigned base 2, is a place-value system. The MSB for unsigned base2 has a place value of 2^(N-1) in an N bit number (e.g. 2^31). In 2's complement, the MSB has a value of -2^(N-1) (and thus works as a sign bit). The wikipedia article explains many other ways of understanding 2's complement and negating an unsigned base2 number.
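The key point is that having the sign bit set doesn't change the interpretation of the other bits. Addition and subtraction work exactly the same as for unsigned base2, and it's only the interpretation of the result that differs between signed and unsigned. (E.g. signed overflow happens when there's a carry into but not out of the sign bit.)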
In addition, carry propagates from LSB to MSB (right to left) only. Subtraction is the same: regardless of whether there is anything in the high bits to borrow, the low bits borrow it. If that causes an overflow or carry, only the high bits will be affected. e.g.:
  0x801F
 -0x9123
 -------
  0xeefc
The low 8 bits, 0xFC, don't depend on what they borrowed from. They "wrap around" and pass on the borrow to the upper 8.
So addition and subtraction have the property that the low bits of the result don't depend on any upper bits of the operands.
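The subtraction above can be replayed in C as a quick sanity check. The helper names are hypothetical. The low byte of the difference comes out as 0xFC whether or not the upper bytes of the operands are present.

```c
#include <stdint.h>

/* Low 8 bits of x - y; the conversion to uint8_t reduces modulo 256. */
static uint8_t low8_of_difference(uint16_t x, uint16_t y) {
    return (uint8_t)(x - y);
}

/* Full 16-bit difference, wrapping modulo 65536. */
static uint16_t difference16(uint16_t x, uint16_t y) {
    return (uint16_t)(x - y);
}
```

Here difference16(0x801F, 0x9123) gives the full 16-bit result from the worked example, and low8_of_difference gives the same low byte for (0x801F, 0x9123) as for (0x001F, 0x0023).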
Since LEA only uses addition (and left-shift), using the default address-size is always fine. Delaying truncation until the operand-size comes into play for the result is always fine.
(Exception: 16bit code can use an address-size prefix to do 32bit math. In 32 or 64b code, the address-size prefix reduces the width instead of increasing.)
Multiplication can be thought of as repeated addition, or as shifting and addition. The low half isn't affected by any upper bits. In this 4-bit example, I've written out all the bit-products that are summed into the low 2 result bits. Only the low 2 bits of either source are involved. It's clear that this works in general: Partial products are shifted before addition, so high bits in the source never affect lower bits in the result in general.
*Warning*: This diagram is probably slightly bogus.

       ABCD   A has a place value of -2^3 = -8
     * abcd   a has a place value of -2^3 = -8
     ------
   RRRRrrrr

   AAAAABCD * d  sign-extended partial products
 + AAAABCD  * c
 + AAABCD   * b
 - AABCD    * a  (a * A = +2^6, since the negatives cancel)
  ----------
          D*d
         ^
         C*d+D*c

See Wikipedia for a larger version of this with much more detailed explanation. There are many good google hits for binary signed multiplication, including some teaching material.
Doing a signed multiply instead of an unsigned multiply still gives the same result in the low half (the low 4 bits in this example). Sign-extension of the partial products only happens into the upper half of the result.
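A small C sketch of this claim, using hypothetical helper names and a 32b x 32b -> 64b product: the low 32 bits match between the signed and unsigned interpretations, and only the upper 32 bits differ. Converting a uint32_t above INT32_MAX to int32_t is implementation-defined in older C standards; the sketch assumes the usual two's complement behaviour.

```c
#include <stdint.h>

/* Full 64-bit product, treating the inputs as unsigned. */
static uint64_t product_unsigned(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}

/* Full 64-bit product, reinterpreting the same bit patterns as signed.
 * Assumption: the uint32_t -> int32_t conversion wraps (two's complement). */
static uint64_t product_signed(uint32_t a, uint32_t b) {
    int64_t p = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint64_t)p;
}

static uint32_t low_half(uint64_t p)  { return (uint32_t)p; }
static uint32_t high_half(uint64_t p) { return (uint32_t)(p >> 32); }
```

For a = 0xFFFFFFFF and b = 2, the unsigned product is 0x1FFFFFFFE and the signed product is -2, so the low halves agree while the high halves differ.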
This explanation is not very thorough (and maybe even has mistakes), but there is good evidence that it is true and safe to use in production code:
- gcc uses imul to compute the unsigned long product of two unsigned long inputs.
- Intel's insn ref manual says:
  The two- and three-operand forms may also be used with unsigned operands because the lower half of the product is the same regardless if the operands are signed or unsigned. The CF and OF flags, however, cannot be used to determine if the upper half of the result is non-zero.
- Intel's design decision to only introduce 2 and 3 operand forms of imul, not mul.
Obviously the bitwise binary logical operations (and/or/xor/not) treat each bit independently: the result for a bit position depends only on the inputs' values at that bit position. Bit-shifts are also rather obvious.