Fastest way to set a Carry Flag


Question

I'm writing a loop to sum two arrays. My goal is to avoid the explicit carry checks c = a + b; carry = (c < a). The problem is that I lose CF when I do the loop test with the cmp instruction.

Currently, I am using je and stc to test and set the previously saved state of CF. But the jump takes roughly 7 cycles, which is a lot for what I want.

   //This one is working
   asm(
        "cmp $0,%0;"
        "je 0f;"
        "stc;"
    "0:"   
        "adcq %2, %1;"
        "setc %0"

    : "+r" (carry), "+r" (anum)
    : "r" (bnum)
   );

I already tried using sahf (2 + 2 (mov) cycles), but that did not work.

   //This one does not work
   asm(
        "mov %0, %%ah;"
        "sahf;"
        "adcq %2, %1;"
        "setc %0"

        : "+r" (carry), "+r" (anum)
        : "r" (bnum)
   );

Does anyone know a faster way to set CF? Like a direct move or something similar.

Recommended answer

Looping without clobbering CF will be faster. See http://stackoverflow.com/questions/32084204/problems-with-adc-sbb-and-inc-dec-in-tight-loops-on-some-cpus for some better asm loops.

Don't try to write just the adc with inline asm inside a C loop. It's impossible for that to be optimal, because you can't ask gcc not to clobber flags. Trying to learn asm with GNU C inline asm is harder than writing a stand-alone function, especially in this case where you are trying to preserve the carry flag.

You could use setnc %[carry] to save and subb $1, %[carry] to restore. (Or cmpb $1, %[carry], I guess.) Or, as Stephen points out, save with setc and restore with negb %[carry].

0 - 1 produces a carry, but 1 - 1 doesn't.
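To see why these save/restore pairs hand back the original flag, here is a minimal sketch in plain C that models only the carry-flag result of each restore instruction. The helper names are hypothetical, and the model rests on two assumptions about x86 flag behavior: sub/cmp set CF when the subtraction borrows, and neg sets CF whenever its operand is nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CF after `subb $1, x` (or `cmpb $1, x`): set when the subtraction borrows. */
static bool cf_after_sub1(uint8_t x) { return x < 1; }

/* CF after `negb x`: set when x is nonzero. */
static bool cf_after_neg(uint8_t x) { return x != 0; }

/* Save with setnc (stores !CF), restore with subb $1. */
static bool roundtrip_setnc_sub(bool cf) {
    uint8_t saved = (uint8_t)!cf;
    return cf_after_sub1(saved);
}

/* Save with setc (stores CF), restore with negb. */
static bool roundtrip_setc_neg(bool cf) {
    uint8_t saved = (uint8_t)cf;
    return cf_after_neg(saved);
}

/* Both pairs should hand back the flag they were given. */
static void check_roundtrips(void) {
    for (int cf = 0; cf <= 1; cf++) {
        assert(roundtrip_setnc_sub(cf != 0) == (cf != 0));
        assert(roundtrip_setc_neg(cf != 0) == (cf != 0));
    }
}
```

Note the pairing: the inverted save (setnc) goes with the subtract, and the plain save (setc) goes with the negate.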

Use a uint8_t variable to hold the carry, since you will never add it directly to %[anum]. This avoids any chance of partial-register slowdowns (see http://stackoverflow.com/questions/33666617/which-is-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and). e.g.

uint8_t carry = 0;
int64_t anum, bnum;   // names must match the asm operands below

for (...) {
    asm ( "negb   %[carry]\n\t"
          "adc    %[bnum], %[anum]\n\t"
          "setc   %[carry]\n\t"
          : [carry] "+&r" (carry), [anum] "+r" (anum)
          : [bnum] "rme" (bnum)
          : // no clobbers
        );
}

You could also provide an alternate constraint pattern for register source, reg/mem dest. I used an x86 "e" constraint instead of "i", because 64-bit mode still only allows 32-bit sign-extended immediates; gcc will have to get larger compile-time constants into a register on its own. carry is early-clobbered, so even if it and bnum were both 1 to start with, gcc couldn't use the same register for both inputs.
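For context, the snippet above could be wrapped into a complete function along these lines. This is only a sketch: the function name add_limbs_adc, the limb-array interface, and the portable fallback branch are assumptions added for illustration, and the asm path is taken only when compiling with a GNU-compatible compiler for x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a[i] += b[i] for n limbs, least significant limb first.
   Returns the carry out of the most significant limb. */
static uint8_t add_limbs_adc(uint64_t *a, const uint64_t *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    uint8_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t anum = a[i];
        uint64_t bnum = b[i];
#if defined(__GNUC__) && defined(__x86_64__)
        __asm__ ("negb   %[carry]\n\t"          /* CF = (carry != 0) */
                 "adcq   %[bnum], %[anum]\n\t"  /* anum += bnum + CF */
                 "setc   %[carry]"              /* save the new CF as 0 or 1 */
                 : [carry] "+&r" (carry), [anum] "+r" (anum)
                 : [bnum] "rme" (bnum)
                 : "cc");                       /* flags clobber, stated explicitly */
#else
        /* Portable fallback (assumption, not from the answer): track both carries. */
        uint64_t sum = anum + bnum;
        uint8_t c1 = sum < anum;                /* carry out of anum + bnum */
        anum = sum + carry;
        uint8_t c2 = anum < sum;                /* carry out of adding the incoming carry */
        carry = (uint8_t)(c1 | c2);
#endif
        a[i] = anum;
    }
    return carry;
}
```

The compiler is still free to touch flags between iterations, so this keeps the longer dependency chain the answer warns about; it only makes the save/restore pattern concrete.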

This is still terrible: it increases the length of the loop-carried dependency chain from 2c to 4c (Intel pre-Broadwell), or from 1c to 3c (Intel Broadwell/Skylake, and AMD).

So your loop runs at 1/3 the speed, because you are using a kludge instead of writing the whole loop in asm.

A previous version of this answer suggested adding the carry directly instead of restoring it into CF. That approach has a fatal flaw: it mixes up the incoming carry into this iteration with the outgoing carry going to the next iteration.
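To illustrate the distinction, here is a hedged sketch in portable C of the explicit carry checks the question wanted to avoid, written so that the incoming and outgoing carries stay separate. The function name add_limbs and its interface are hypothetical; this is not the code from the earlier version of the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst = a + b over n limbs, least significant limb first.
   Returns the carry out of the most significant limb. */
static unsigned add_limbs(uint64_t *dst, const uint64_t *a,
                          const uint64_t *b, size_t n) {
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    unsigned carry_in = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = a[i] + b[i];
        unsigned c1 = sum < a[i];        /* carry out of a[i] + b[i] */
        uint64_t res = sum + carry_in;   /* consume the incoming carry */
        unsigned c2 = res < sum;         /* carry out of adding the incoming carry */
        dst[i] = res;
        carry_in = c1 | c2;              /* outgoing carry; both cannot be set at once */
    }
    return carry_in;
}
```

Each limb needs two wraparound checks because either addition can overflow. Dropping one of them, or folding the saved carry into the operand without checking, loses a carry in cases such as an all-ones limb plus an incoming carry.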

Also, sahf stores AH into flags, and lahf loads flags into AH (both operate on the whole low 8 bits of flags). Pair those instructions; don't use sahf on a 0 or 1 that you got from setc.

Read the instruction set reference manual for any instructions that don't seem to be doing what you expect. See http://stackoverflow.com/tags/x86/info
