sqrt of uint64_t vs. int64_t


Question

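I noticed that computing the integer part of the square root of a uint64_t is much more complicated than for an int64_t. Does anybody have an explanation for this? Why is it seemingly so much harder to deal with one extra bit?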

The following:

#include <cmath>
#include <cstdint>

int64_t sqrt_int(int64_t a) {
    return sqrt(a);
}

compiles with clang 5.0 and -mfpmath=sse -msse3 -Wall -O3 to

sqrt_int(long):                           # @sqrt_int(long)
        cvtsi2sd        xmm0, rdi
        sqrtsd  xmm0, xmm0
        cvttsd2si       rax, xmm0
        ret

But the following:

uint64_t sqrt_int(uint64_t a) {
    return sqrt(a);
}

compiles to:

.LCPI0_0:
        .long   1127219200              # 0x43300000
        .long   1160773632              # 0x45300000
        .long   0                       # 0x0
        .long   0                       # 0x0
.LCPI0_1:
        .quad   4841369599423283200     # double 4503599627370496
        .quad   4985484787499139072     # double 1.9342813113834067E+25
.LCPI0_2:
        .quad   4890909195324358656     # double 9.2233720368547758E+18
sqrt_int(unsigned long):                           # @sqrt_int(unsigned long)
        movq    xmm0, rdi
        punpckldq       xmm0, xmmword ptr [rip + .LCPI0_0] # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
        subpd   xmm0, xmmword ptr [rip + .LCPI0_1]
        haddpd  xmm0, xmm0
        sqrtsd  xmm0, xmm0
        movsd   xmm1, qword ptr [rip + .LCPI0_2] # xmm1 = mem[0],zero
        movapd  xmm2, xmm0
        subsd   xmm2, xmm1
        cvttsd2si       rax, xmm2
        movabs  rcx, -9223372036854775808
        xor     rcx, rax
        cvttsd2si       rax, xmm0
        ucomisd xmm0, xmm1
        cmovae  rax, rcx
        ret

Solution

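First off, be clear about what this code does: it converts a 64-bit integer (signed or unsigned) to double-precision floating point, takes the square root, and then casts the result back to a signed or unsigned integer.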
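To make those three steps visible, the implicit conversions can be written out by hand. This is only an illustrative sketch: the names sqrt_int_explicit and sqrt_uint_explicit are hypothetical, and the instruction names in the comments are the ones that appear in the signed listing in the question.

```cpp
#include <cmath>
#include <cstdint>

// Signed case: each step corresponds to one instruction in the signed listing.
// Assumes a >= 0; a negative input would produce NaN in the middle step.
std::int64_t sqrt_int_explicit(std::int64_t a) {
    double as_double = static_cast<double>(a);  // integer -> double (cvtsi2sd)
    double root = std::sqrt(as_double);         // square root in double (sqrtsd)
    return static_cast<std::int64_t>(root);     // truncate to integer (cvttsd2si)
}

// Unsigned case: the source has the same three steps; only the two
// conversions differ, and those are what the compiler has to expand.
std::uint64_t sqrt_uint_explicit(std::uint64_t a) {
    double as_double = static_cast<double>(a);  // unsigned integer -> double
    double root = std::sqrt(as_double);         // square root in double
    return static_cast<std::uint64_t>(root);    // truncate to unsigned integer
}
```

Note that a double has a 53-bit significand, so inputs above 2^53 are rounded in the first step, and for very large values the result may differ from the exact integer square root.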

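The difference arises because the instruction set you are compiling for provides a 64-bit signed integer to double-precision conversion (cvtsi2sd) and the reverse (cvttsd2si), but no unsigned counterparts. Intel added unsigned conversion instructions in AVX-512, but they do not exist before that. So in the signed case, the conversion to double and the conversion back are one instruction each. In the unsigned case, the compiler has to synthesize the conversions from the instructions that are available.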
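As a rough illustration of what "synthesizing" means here, the two unsigned conversions can be sketched in portable C++ using only bit manipulation, double arithmetic, and the signed conversion. The function names are hypothetical, and this is a reading of the constants in the unsigned listing (0x43300000 and 0x45300000 are the high words of 2^52 and 2^84, and 9.2233720368547758E+18 is 2^63), not a statement of how clang implements it internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// uint64_t -> double without an unsigned conversion instruction.
double u64_to_double_sketch(std::uint64_t x) {
    // Put the low 32 bits into the mantissa of 2^52 and the high 32 bits
    // into the mantissa of 2^84 (where one mantissa unit is worth 2^32).
    std::uint64_t lo_bits = 0x4330000000000000ULL | (x & 0xFFFFFFFFULL);
    std::uint64_t hi_bits = 0x4530000000000000ULL | (x >> 32);
    double lo, hi;
    std::memcpy(&lo, &lo_bits, sizeof lo);  // lo == 2^52 + low32
    std::memcpy(&hi, &hi_bits, sizeof hi);  // hi == 2^84 + high32 * 2^32
    // Both subtractions are exact; only the final addition rounds.
    return (hi - std::ldexp(1.0, 84)) + (lo - std::ldexp(1.0, 52));
}

// double -> uint64_t using only the signed truncating conversion.
// Assumes 0 <= d < 2^64.
std::uint64_t double_to_u64_sketch(double d) {
    assert(d >= 0.0 && d < std::ldexp(1.0, 64));
    const double two63 = std::ldexp(1.0, 63);
    if (d >= two63) {
        // Shift into the signed range, convert, then restore the top bit.
        std::int64_t shifted = static_cast<std::int64_t>(d - two63);
        return static_cast<std::uint64_t>(shifted) ^ 0x8000000000000000ULL;
    }
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(d));
}
```

The first function mirrors the punpckldq / subpd / haddpd sequence, and the second mirrors the subsd / cvttsd2si / xor / cmovae sequence, except that the compiled code computes both candidates and selects one with cmovae instead of branching.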

You can look up which instructions are available in which extension (SSE2, AVX, AVX-512, etc.) here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
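For example, the same computation can be written with intrinsics so that the required extension is explicit. The sketch below is an assumption-laden illustration, not part of the original answer: the function names are hypothetical, the unsigned intrinsics are only compiled when the compiler defines __AVX512F__ (for example with -mavx512f), and a plain cast is used as a fallback everywhere else.

```cpp
#include <cmath>
#include <cstdint>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#endif

// Signed case: three SSE2 intrinsics, one per instruction in the signed listing.
std::int64_t sqrt_int_intrin(std::int64_t a) {
#if defined(__x86_64__) || defined(_M_X64)
    __m128d v = _mm_cvtsi64_sd(_mm_setzero_pd(), a);  // cvtsi2sd
    v = _mm_sqrt_sd(v, v);                            // sqrtsd
    return _mm_cvttsd_si64(v);                        // cvttsd2si
#else
    // Fallback for other targets: let the compiler pick the conversions.
    return static_cast<std::int64_t>(std::sqrt(static_cast<double>(a)));
#endif
}

// Unsigned case: single-instruction conversions exist only with AVX-512F.
std::uint64_t sqrt_uint_intrin(std::uint64_t a) {
#if defined(__AVX512F__) && (defined(__x86_64__) || defined(_M_X64))
    __m128d v = _mm_cvtu64_sd(_mm_setzero_pd(), a);   // vcvtusi2sd
    v = _mm_sqrt_sd(v, v);                            // vsqrtsd
    return _mm_cvttsd_u64(v);                         // vcvttsd2usi
#else
    // Without AVX-512F the compiler synthesizes the unsigned conversions.
    return static_cast<std::uint64_t>(std::sqrt(static_cast<double>(a)));
#endif
}
```

Built for a target with AVX-512F, the unsigned version should come out about as short as the signed one, which is the point the answer makes about instruction availability.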

The method used to synthesize the conversion is discussed here: Are there unsigned equivalents of the x87 FILD and SSE CVTSI2SD instructions?
