How to access the carry flag while adding two 64-bit numbers using asm in C

Question

Yeah, thanks, that works, @PeterCordes. __int128 also works. But one more thing: as you suggested, I'm using the multiprecision-arithmetic intrinsic _addcarry_u64 in C, via the header file immintrin.h, with the following code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <immintrin.h>

unsigned char _addcarry_u64(unsigned char c_in, uint64_t src1, uint64_t src2,uint64_t *sum);

int main()
{
    unsigned char carry;
    uint64_t sum;
    long long int c1=0,c2=0;
    uint64_t a=0x0234BDFA12CD4379,b=0xA8DB4567ACE92B38;
    carry = _addcarry_u64(0,a,b,&sum);
    printf("sum is %lx and carry value is %u\n",sum,carry);
    return 0;
}

Can you please point out the error? I'm getting an undefined reference to _addcarry_u64. A quick Google search didn't tell me whether some other header file has to be used, or whether it's incompatible with gcc, and if so, why.

Initially I had this code for adding two 64-bit numbers:

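#include <stdint.h>

// Assumed definitions (not shown in the question): 64-bit limbs.
typedef uint64_t digit_t;
#define RADIX 64

static __inline int is_digit_lessthan_ct(digit_t x, digit_t y)
{ // Is x < y? (branch-free, constant-time)
    return (int)((x ^ ((x ^ y) | ((x - y) ^ y))) >> (RADIX-1));
}

// sumOut = addend1 + addend2 + carryIn; carryOut = carry-out (0 or 1)
#define ADDC(carryIn, addend1, addend2, carryOut, sumOut) \
    { digit_t tempReg = (addend1) + (int)(carryIn); \
      (sumOut) = (addend2) + tempReg; \
      (carryOut) = (is_digit_lessthan_ct(tempReg, (int)(carryIn)) | \
                    is_digit_lessthan_ct((sumOut), tempReg)); \
    }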

I then learned that this implementation could be made faster using assembly language, so I'm trying to do something similar, but I can't access or return the carry. Here is my code:

#include<stdio.h>
#include<stdlib.h>
#include<stdint.h>
uint64_t add32(uint64_t a,uint64_t b)
{
    uint64_t d=0,carry=0;
    // Note: adc also adds whatever value CF happens to hold on entry.
    __asm__("mov %1,%%rax\n\t"
            "adc %2,%%rax\n\t"
            "mov %%rax,%0\n\t"
            :"=r"(d)
            :"r"(a),"r"(b)
            :"%rax"
           );
    return d;
}
int main()
{
    uint64_t a=0xA234BDFA12CD4379,b=0xA8DB4567ACE92B38;
    printf("Sum = %lx\n",add32(a,b));
    return 0;
}

此加法的结果应该是 14B100361BFB66EB1,其中 msb 中的初始 1 是进位.我想将该进位保存在另一个寄存器中.我试过 jc,但我遇到了一些或其他错误.甚至 setc 也给了我错误,可能是因为我不确定语法.那么谁能告诉我如何将进位保存在另一个寄存器中或通过修改此代码返回它?

The result of this addition should be 14B100361BFB66EB1, where the leading 1 (the most significant digit) is the carry. I want to save that carry in another register. I tried jc, but I keep getting one error or another. Even setc gave me errors, maybe because I'm not sure of the syntax. So can anyone tell me how to save the carry in another register, or return it, by modifying this code?

Answer

As usual, inline asm is not strictly necessary. https://gcc.gnu.org/wiki/DontUseInlineAsm. But currently compilers kinda suck for actual extended-precision addition, so you might want asm for this.
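
If you do want inline asm for the question's add32, here is a minimal sketch (GNU C extended asm, x86-64 only; the name add_with_carryout is hypothetical). It uses add rather than adc so that no stale CF is consumed, uses setc to copy CF into a byte output operand, and declares the "cc" clobber. On other targets it falls back to the portable (sum < a) test.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b and stores the carry-out (0 or 1) in *carry.
 * The asm path is GNU C extended asm for x86-64 only; elsewhere it falls back
 * to the portable (sum < a) idiom. */
static inline uint64_t add_with_carryout(uint64_t a, uint64_t b, unsigned char *carry)
{
    uint64_t sum = a;
#if defined(__GNUC__) && defined(__x86_64__)
    unsigned char c;
    __asm__("add  %[b], %[sum]\n\t"  /* sum += b; plain add, so no stale CF is used */
            "setc %[c]"               /* copy CF into a byte register or memory */
            : [sum] "+r"(sum), [c] "=qm"(c)
            : [b] "rm"(b)
            : "cc");
    *carry = c;
#else
    sum = a + b;
    *carry = (unsigned char)(sum < a);
#endif
    return sum;
}
```

With the question's operands, a = 0xA234BDFA12CD4379 and b = 0xA8DB4567ACE92B38, this should give sum = 0x4B100361BFB66EB1 and carry = 1, i.e. the 65-bit result 14B100361BFB66EB1.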

There's an Intel intrinsic for adc: _addcarry_u64. But gcc and clang may make slow code with it, unfortunately. In GNU C on a 64-bit platform, you could just use unsigned __int128.
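
A sketch of both options, assuming a recent GCC or clang on x86-64 (the helper names add_intrin and add_int128 are hypothetical). The intrinsic is provided by immintrin.h itself, so there's no need to declare it by hand; GCC and clang declare its sum pointer as unsigned long long *, which is why the output variable uses that type rather than uint64_t. The __int128 version simply reads the carry from the high half of a 128-bit sum.

```c
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h>   /* provides _addcarry_u64 on x86-64 GCC/clang */
#endif

/* Sketch 1: the Intel intrinsic (x86-64 only; portable fallback elsewhere). */
static inline uint64_t add_intrin(uint64_t a, uint64_t b, unsigned char *carry)
{
#if defined(__x86_64__)
    unsigned long long sum;                  /* matches the intrinsic's pointer type */
    *carry = _addcarry_u64(0, a, b, &sum);   /* carry-in 0, returns carry-out */
    return sum;
#else
    uint64_t sum = a + b;
    *carry = (unsigned char)(sum < a);
    return sum;
#endif
}

/* Sketch 2: GNU C unsigned __int128 (64-bit targets): the high half is the carry. */
static inline uint64_t add_int128(uint64_t a, uint64_t b, unsigned char *carry)
{
    unsigned __int128 wide = (unsigned __int128)a + b;
    *carry = (unsigned char)(wide >> 64);
    return (uint64_t)wide;
}
```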

Compilers usually manage to make pretty good code when checking for carry-out from addition using the idiom that carry_out = (x+y) < x, where < is an unsigned compare. For example:

struct long_carry { unsigned long res; unsigned carry; };

struct long_carry add_carryout(unsigned long x, unsigned long y) {
    unsigned long retval = x + y;
    unsigned carry = (retval < x);
    return (struct long_carry){ retval, carry };
}

gcc7.2 -O3 emits this (and clang emits similar code):

    mov     rax, rdi        # because we need return value in a different register
    xor     edx, edx        # set up for setc
    add     rax, rsi        # generate carry
    setc    dl              # save carry.
    ret                     # return with rax=sum, edx=carry  (SysV ABI struct packing)

There's no way you can do better than this with inline asm; this function already looks optimal for modern CPUs. (Well, I guess if mov weren't zero-latency, doing the add first would shorten the latency until the carry is ready. But on Intel CPUs it's supposed to be better to overwrite mov-elimination results right away, so it's better to mov first and then add.)

Clang will even use adc to feed the carry-out from one add into the next add as its carry-in, but only for the first limb. Perhaps that's because of this (update): the function below is broken. carry_out = (x+y) < x doesn't work when there's a carry-in: with carry_out = (x+y+c_in) < x, y+c_in can wrap to zero and give you (x+0) < x (false) even though there was a carry.
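
One way to get a correct carry-out in portable C is to test each of the two partial additions separately. This is a sketch with the hypothetical name adc_fixed, using uint64_t limbs instead of unsigned long. At most one of the two additions can wrap (if x + y wraps, the partial sum is at most 2^64 - 2, so adding a carry of 1 cannot wrap again), so OR-ing the two tests gives the exact carry.

```c
#include <stdint.h>

/* Hypothetical fixed counterpart of adc_buggy: *sum = x + y + carry_in,
 * returns the carry-out (0 or 1). carry_in must be 0 or 1. */
static inline unsigned adc_fixed(uint64_t *sum, uint64_t x, uint64_t y, unsigned carry_in)
{
    uint64_t t = x + y;          /* first partial sum */
    unsigned c1 = (t < x);       /* carry from x + y */
    *sum = t + carry_in;         /* add the incoming carry */
    unsigned c2 = (*sum < t);    /* carry from adding carry_in */
    return c1 | c2;              /* at most one of c1, c2 can be set */
}
```

For example, x = 5, y = ~0, carry_in = 1 gives sum 5 and carry 1, which is exactly the case where adc_buggy reports 0. Whether a given compiler turns this pattern into an adc chain is not guaranteed.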

Notice that clang's cmp / adc reg,0 implements exactly the behaviour of the C source, which isn't the same as another adc would be.

Anyway, gcc doesn't even use adc the first time, when it is safe. (So use unsigned __int128 for code that doesn't suck, and asm for integers even wider than that.)

// BROKEN with carry_in=1 and y=~0UL: y + carry_in wraps to 0, so the carry is lost
static
unsigned adc_buggy(unsigned long *sum, unsigned long x, unsigned long y, unsigned carry_in) {
    *sum = x + y + carry_in;
    unsigned carry = (*sum < x);
    return carry;
}

// *x += *y  (4 little-endian limbs; the final carry-out is discarded)
void add256(unsigned long *x, unsigned long *y) {
    unsigned carry;
    carry = adc_buggy(x, x[0], y[0], 0);
    carry = adc_buggy(x+1, x[1], y[1], carry);
    carry = adc_buggy(x+2, x[2], y[2], carry);
    carry = adc_buggy(x+3, x[3], y[3], carry);
}

    mov     rax, qword ptr [rsi]
    add     rax, qword ptr [rdi]
    mov     qword ptr [rdi], rax

    mov     rax, qword ptr [rdi + 8]
    mov     r8, qword ptr [rdi + 16]   # hoisted
    mov     rdx, qword ptr [rsi + 8]
    adc     rdx, rax                   # ok, no memory operand but still adc
    mov     qword ptr [rdi + 8], rdx

    mov     rcx, qword ptr [rsi + 16]   # r8 was loaded earlier
    add     rcx, r8
    cmp     rdx, rax                    # manually check the previous result for carry.  /facepalm
    adc     rcx, 0

    ...

This sucks, so if you want extended-precision addition, you still need asm. But for getting the carry-out into a C variable, you don't.
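
For wider integers, here is a minimal inline-asm sketch of the 256-bit x += y from above (x86-64 GNU C, AT&T syntax; add256_asm is a hypothetical name, limbs are little-endian, and the final carry-out is discarded). The key point is that the whole add/adc chain lives in one asm statement, because CF is not guaranteed to survive between separate asm statements. It uses memory-destination adc for brevity rather than speed, and falls back to a portable per-limb carry test on other targets.

```c
#include <stdint.h>

/* x[0..3] += y[0..3] (little-endian limbs); the final carry-out is discarded. */
static void add256_asm(uint64_t x[4], const uint64_t y[4])
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t tmp;
    __asm__("movq   (%[y]), %[t]\n\t"
            "addq   %[t],   (%[x])\n\t"   /* limb 0: plain add sets CF */
            "movq  8(%[y]), %[t]\n\t"     /* mov does not modify flags */
            "adcq   %[t],  8(%[x])\n\t"   /* limbs 1..3: add with carry-in */
            "movq 16(%[y]), %[t]\n\t"
            "adcq   %[t], 16(%[x])\n\t"
            "movq 24(%[y]), %[t]\n\t"
            "adcq   %[t], 24(%[x])"
            : [t] "=&r"(tmp)              /* early-clobber: written before x, y are last read */
            : [x] "r"(x), [y] "r"(y)
            : "cc", "memory");            /* flags and the pointed-to limbs change */
#else
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t t = x[i] + y[i];
        unsigned c1 = (t < x[i]);
        x[i] = t + carry;
        carry = c1 | (x[i] < t);
    }
#endif
}
```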
