Fastest way to set a Carry Flag
Question
I'm writing a loop to sum two arrays. My goal is to avoid explicit carry checks such as c = a + b; carry = (c < a). The problem is that I lose CF when I do the loop test with the cmp instruction.
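For reference, here is a minimal sketch of the portable approach the question is trying to avoid, with the carry recomputed by comparisons on every element. The function name add_arrays_portable and the in-place a[i] += b[i] layout are assumptions for illustration; the question does not show the surrounding loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a[i] += b[i] with carry propagation.
 * Returns the final carry out (0 or 1). */
static unsigned add_arrays_portable(uint64_t *a, const uint64_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = a[i] + b[i];   /* may wrap around */
        unsigned c1 = sum < a[i];     /* carry out of a + b */
        uint64_t res = sum + carry;   /* add the incoming carry */
        unsigned c2 = res < sum;      /* carry out of adding the incoming carry */
        a[i] = res;
        carry = c1 | c2;              /* both cannot be set at the same time */
    }
    return carry;
}
```

Each element needs two compares once an incoming carry is involved, which is the overhead that adc is meant to remove.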
Currently, I am using je and stc to test and set the previously saved state of CF. But the jump takes roughly 7 cycles, which is a lot for what I want.
//This one is working
asm(
"cmp $0,%0;"
"je 0f;"
"stc;"
"0:"
"adcq %2, %1;"
"setc %0"
: "+r" (carry), "+r" (anum)
: "r" (bnum)
);
I already tried using sahf (2 + 2 (mov) cycles), but that did not work.
// Does not work
asm(
"mov %0, %%ah;"
"sahf;"
"adcq %2, %1;"
"setc %0"
: "+r" (carry), "+r" (anum)
: "r" (bnum)
);
Does anyone know a quicker way to set CF? Like a direct move or something similar.
Recommended answer
Looping without clobbering CF will be faster. See http://stackoverflow.com/questions/32084204/problems-with-adc-sbb-and-inc-dec-in-tight-loops-on-some-cpus for some better asm loops.
Don't try to write just the adc with inline asm inside a C loop. It's impossible for that to be optimal, because you can't ask gcc not to clobber flags. Trying to learn asm with GNU C inline asm is harder than writing a stand-alone function, especially in this case where you are trying to preserve the carry flag.
You could use setnc %[carry] to save and subb $1, %[carry] to restore. (Or cmpb $1, %[carry], I guess.) Or, as Stephen points out, negb %[carry], paired with a plain setc to save, since neg sets CF whenever its operand is nonzero.
0 - 1 produces a carry, but 1 - 1 doesn't.
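To make the save/restore pairing concrete, here is a small C model of the flag arithmetic (not real flag access). The helper names are made up for illustration; the rules they encode are that an 8-bit sub or cmp of 1 borrows only from 0, and that neg sets CF for any nonzero operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CF after an 8-bit `sub $1, x` (or `cmp $1, x`): a borrow happens only when x < 1. */
static bool cf_after_subb1(uint8_t x) { return x < 1; }

/* CF after an 8-bit `neg x`: set whenever x is nonzero. */
static bool cf_after_negb(uint8_t x) { return x != 0; }

/* setnc stores the inverted carry; setc stores the carry as-is. */
static uint8_t setnc(bool cf) { return (uint8_t)!cf; }
static uint8_t setc(bool cf)  { return (uint8_t)cf; }

/* Both pairings give back the CF value that was saved. */
static void check_round_trips(void)
{
    for (int cf = 0; cf <= 1; cf++) {
        assert(cf_after_subb1(setnc(cf)) == (bool)cf);  /* setnc + subb $1 */
        assert(cf_after_negb(setc(cf)) == (bool)cf);    /* setc + negb */
    }
}
```

So the inverted save (setnc) goes with subb/cmpb, while the plain save (setc) goes with negb.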
Use a uint8_t variable to hold the carry, since you will never add it directly to %[anum]. This avoids any chance of partial-register slowdowns (see http://stackoverflow.com/questions/33666617/which-is-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and). E.g.:
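uint8_t carry = 0;
int64_t anum, bnum;
for (...) {
    asm ( "negb %[carry]\n\t"           // CF = (carry != 0): restore the saved carry
          "adc %[bnum], %[anum]\n\t"    // anum += bnum + CF
          "setc %[carry]\n\t"           // save the outgoing carry for the next iteration
          : [carry] "+&r" (carry), [anum] "+r" (anum)
          : [bnum] "rme" (bnum)
          : // no clobbers
        );
}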
You could also provide an alternate constraint pattern for register source, reg/mem dest. I used an x86 "e" constraint instead of "i", because 64-bit mode still only allows 32-bit sign-extended immediates. gcc will have to get larger compile-time constants into a register on its own. Carry is early-clobbered, so even if it and bnum were both 1 to start with, gcc couldn't use the same register for both inputs.
This is still terrible, and increases the length of the loop-carried dependency chain from 2c to 4c (Intel pre-Broadwell), or from 1c to 3c (Intel BDW/Skylake, and AMD).
So your loop runs at 1/3 the speed, because you're using a kludge instead of writing the whole loop in asm.
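As a rough sketch of what "the whole loop in asm" could look like while staying in one C file, the version below keeps CF live across iterations by using only instructions that leave CF alone (mov, lea, dec) for the loop bookkeeping. The function name add_arrays_adc, the in-place a[i] += b[i] layout, and the portable fallback are assumptions for illustration. It is not tuned or unrolled, and the linked question discusses CPUs where adc mixed with inc/dec is slow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a[i] += b[i] for i in [0, n), returns the final carry (0 or 1). */
static uint8_t add_arrays_adc(uint64_t *a, const uint64_t *b, size_t n)
{
    if (n == 0)
        return 0;                       /* the asm loop below needs n > 0 */
    assert(a != NULL && b != NULL);
    uint8_t carry;
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t tmp;
    size_t i = 0;
    __asm__ volatile(
        "clc\n\t"                           /* start with CF = 0 */
        "1:\n\t"
        "movq (%[b],%[i],8), %[tmp]\n\t"    /* mov does not touch flags */
        "adcq %[tmp], (%[a],%[i],8)\n\t"    /* a[i] += b[i] + CF */
        "leaq 1(%[i]), %[i]\n\t"            /* i++ without touching flags */
        "decq %[n]\n\t"                     /* dec updates ZF but leaves CF alone */
        "jnz 1b\n\t"
        "setc %[carry]"                     /* save the final carry out */
        : [carry] "=&r"(carry), [tmp] "=&r"(tmp), [i] "+&r"(i), [n] "+&r"(n)
        : [a] "r"(a), [b] "r"(b)
        : "cc", "memory");
#else
    /* Portable fallback for other targets: explicit carry checks. */
    carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = a[i] + b[i];
        uint8_t c1 = sum < a[i];
        a[i] = sum + carry;
        carry = c1 | (uint8_t)(a[i] < sum);
    }
#endif
    return carry;
}
```

Because the carry never leaves CF inside the loop, there is no setc/negb round trip on the dependency chain. A stand-alone .S function would do the same thing without the constraint boilerplate.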
A previous version of this answer suggested adding the carry directly, instead of restoring it into CF. This approach has a fatal flaw: it mixed up the incoming carry into this iteration with the outgoing carry going to the next iteration.
Also, sahf is Store AH into Flags, and lahf is Load Flags into AH (both operate on the whole low 8 bits of flags). Pair those instructions; don't use sahf on a 0 or 1 that you got from setc.
Read the insn set reference manual for any insns that don't seem to be doing what you expect. See http://stackoverflow.com/tags/x86/info