How to efficiently convert from two __m128d to one __m128i in MSVC?

Question

Is converting then shifting then bitwise-or'ing the only way to convert from two __m128d to a single __m128i?

This is perfectly acceptable to Xcode in an x64 build:

__m128d v2dHi = ....;
__m128d v2dLo = ....;
__m128i v4i = _mm_set_epi64(_mm_cvtpd_pi32(v2dHi), _mm_cvtpd_pi32(v2dLo));

and the disassembly shows _mm_cvtpd_pi32 being used. However, Visual Studio cannot compile this and reports a linker error. That is consistent with the VS docs, which say _mm_cvtpd_pi32 is not supported on x64.

I'm not too worried that it's not available, but is two conversions, a shift, then a bitwise-or the fastest way?

Solution

If you got a linker error, you're probably ignoring a warning about an undeclared intrinsic function.

Your current code has a high risk of compiling to terrible asm. If it compiled to a vector shift and an OR, it would already be sub-optimal. (Update: that's not what it compiles to; IDK where you got that idea.)
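For comparison, a convert/shift/OR version might look like the minimal sketch below, assuming only SSE2 intrinsics (the function name `pack_shift_or` is hypothetical, not from the question). It works only because `_mm_cvtpd_epi32` zeroes the upper 64 bits of its result, and it spends a `pslldq` plus a `por` where a single shuffle is enough.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtpd_epi32, _mm_slli_si128, _mm_or_si128 */

/* Hypothetical "convert, shift, OR" packing (illustration only).
 * Lane 0 is the lowest element of each vector. */
static __m128i pack_shift_or(__m128d lo, __m128d hi) {
    __m128i l = _mm_cvtpd_epi32(lo);                     /* [lo0, lo1, 0, 0] */
    __m128i h = _mm_slli_si128(_mm_cvtpd_epi32(hi), 8);  /* byte shift: [0, 0, hi0, hi1] */
    return _mm_or_si128(l, h);                           /* [lo0, lo1, hi0, hi1] */
}
```

The result is correct, but it costs one more instruction than combining the two low halves with a single unpack.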

Use 2x _mm_cvtpd_epi32 to get two __m128i vectors with ints you want in the low 2 elements of each. Use _mm_unpacklo_epi64 to combine those two low halves into one vector with all 4 elements you want.
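A minimal sketch of that two-step recipe, assuming only SSE2 (`<emmintrin.h>`, which MSVC, GCC and clang all provide on x86/x64). The names `pack_lo_hi` and `store_lanes` are illustrative helpers, not part of the answer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Illustrative packing helper: cvtpd2dq on each input, then punpcklqdq.
 * _mm_cvtpd_epi32 puts 2 int32 results in the low 64 bits and zeroes the rest;
 * _mm_unpacklo_epi64 joins the two low halves: [lo0, lo1, hi0, hi1]. */
static __m128i pack_lo_hi(__m128d lo, __m128d hi) {
    return _mm_unpacklo_epi64(_mm_cvtpd_epi32(lo), _mm_cvtpd_epi32(hi));
}

/* Illustrative inspection helper: copies the 4 int32 lanes to out[0..3]
 * (out[0] is the lowest lane). */
static void store_lanes(__m128i v, int out[4]) {
    _mm_storeu_si128((__m128i *)out, v);
}
```

With the question's variable names, `_mm_set_epi64(_mm_cvtpd_pi32(v2dHi), _mm_cvtpd_pi32(v2dLo))` corresponds to `pack_lo_hi(v2dLo, v2dHi)`, since `_mm_set_epi64` takes the high element first. Note that `cvtpd2dq` rounds according to MXCSR (round-to-nearest-even by default, so 2.5 becomes 2); `_mm_cvttpd_epi32` is the truncating variant if C-cast semantics are wanted.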


Compiler output from clang 3.8.1 on the Godbolt compiler explorer. (Xcode uses clang by default, I think.)

#include <immintrin.h>

// the good version
__m128i pack_double_to_int(__m128d a, __m128d b) {
    return _mm_unpacklo_epi64(_mm_cvtpd_epi32(a), _mm_cvtpd_epi32(b));
}
    cvtpd2dq        xmm0, xmm0
    cvtpd2dq        xmm1, xmm1
    punpcklqdq      xmm0, xmm1      # xmm0 = xmm0[0],xmm1[0]
    ret

// the original
__m128i pack_double_to_int_badMMX(__m128d a, __m128d b) {
    return _mm_set_epi64(_mm_cvtpd_pi32(b), _mm_cvtpd_pi32(a));
}
    cvtpd2pi        mm0, xmm1
    cvtpd2pi        mm1, xmm0
    movq2dq xmm1, mm0
    movq2dq xmm0, mm1
    punpcklqdq      xmm0, xmm1      # xmm0 = xmm0[0],xmm1[0]
      # note the lack of EMMS, because of not using the intrinsic for it
    ret

MMX is almost totally useless when SSE2 or later is available; just avoid it. See the tag wiki for some guides.
