_mm_cvtsd_f64 analogue for the high-order floating-point element


Question

I'm playing around with SIMD and wonder why there is no analogue of _mm_cvtsd_f64 that extracts the high-order double from a __m128d.

GCC 4.6+ has an extension that achieves this nicely:

__m128d a = ...;
double d1 = a[0];
double d2 = a[1];

But on older GCC (e.g. 4.4) the only way I could manage this was to define my own analogue using __builtin_ia32_vec_ext_v2df:

extern __inline double __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cvtsd_f64_h (__m128d __A)
{
  return __builtin_ia32_vec_ext_v2df (__A, 1);
}

__m128d a = ...;
double d1 = _mm_cvtsd_f64(a);
double d2 = _mm_cvtsd_f64_h(a);

Is this really the way to go? Is there an alternative that does not rely on potentially compiler-specific __builtin functions? And again, why is there no predefined _mm_cvtsd_f64_h or similar?

The alternative I could come up with is much slower, by the way:

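static inline double _mm_cvtsd_f64_h(__m128d __A) {
    double d[2];
    /* Unaligned store: a plain double[2] is not guaranteed to be 16-byte aligned. */
    _mm_storeu_pd(d, __A);
    return d[1];
}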
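
If a round trip through memory is acceptable, SSE2 also provides _mm_storeh_pd, which writes only the upper double and has no alignment requirement. The following is a minimal sketch of that variant; high_lane_via_store and check_high_lane_via_store are hypothetical names chosen for illustration, not part of any intrinsic header, and no claim is made here about how it compares in speed.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128d, _mm_set_pd, _mm_storeh_pd */

/* Hypothetical helper: return the high-order double of v via a scalar store. */
static inline double high_lane_via_store(__m128d v)
{
    double hi;
    _mm_storeh_pd(&hi, v);  /* stores only the upper 64 bits of v */
    return hi;
}

/* Small self-check. _mm_set_pd takes (high, low), so 2.0 is the high lane. */
static void check_high_lane_via_store(void)
{
    __m128d v = _mm_set_pd(2.0, 1.0);
    assert(high_lane_via_store(v) == 2.0);
}
```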

Recommended answer

I suggest using the following code:

static inline double _mm_cvtsd_f64_h(__m128d x) {
    /* Copy the high lane into the low lane, then read the low lane. */
    return _mm_cvtsd_f64(_mm_unpackhi_pd(x, x));
}

This is likely the fastest way to get the upper half of an XMM register, since the value never leaves the registers, and it is compatible with MSVC, ICC, GCC, and Clang.
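
To make the lane order concrete, here is a small sketch built on the same _mm_unpackhi_pd idea. The helper names (mm_low_f64, mm_high_f64, mm_hsum_f64, check_lane_helpers) are hypothetical and only for illustration; the intrinsics themselves come from emmintrin.h (SSE2).

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128d, _mm_set_pd, _mm_cvtsd_f64, _mm_unpackhi_pd */

/* Hypothetical helper: low-order double, a thin wrapper over the intrinsic. */
static inline double mm_low_f64(__m128d v)
{
    return _mm_cvtsd_f64(v);
}

/* Hypothetical helper: high-order double.
   _mm_unpackhi_pd(v, v) yields {v[1], v[1]}, so the low lane now holds v[1]. */
static inline double mm_high_f64(__m128d v)
{
    return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));
}

/* Example use: horizontal sum of both lanes. */
static inline double mm_hsum_f64(__m128d v)
{
    return mm_low_f64(v) + mm_high_f64(v);
}

/* Small self-check. _mm_set_pd takes (high, low). */
static void check_lane_helpers(void)
{
    __m128d v = _mm_set_pd(4.0, 3.0);
    assert(mm_low_f64(v) == 3.0);
    assert(mm_high_f64(v) == 4.0);
    assert(mm_hsum_f64(v) == 7.0);
}
```

Because the sketch uses only documented SSE2 intrinsics and no __builtin functions, it should not depend on a particular compiler.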
