Load constant floats into SSE registers


Question

I'm trying to figure out an efficient way to load compile-time constant floats into SSE(2/3) registers. I've tried simple code like this:

const __m128 x = { 1.0f, 2.0f, 3.0f, 4.0f }; 

but that generates 4 movss instructions from memory!

movss       xmm0,dword ptr [__real@3f800000 (14048E534h)] 
movss       xmm1,dword ptr [__real@40000000 (14048E530h)] 
movaps      xmm6,xmm12 
shufps      xmm6,xmm12,0C6h 
movss       dword ptr [rsp],xmm0 
movss       xmm0,dword ptr [__real@40400000 (14048E52Ch)] 
movss       dword ptr [rsp+4],xmm1 
movss       xmm1,dword ptr [__real@40a00000 (14048E528h)] 

which load the scalars in and out of memory... (?!?!)

Doing this, though,

float Align(16) myfloat4[4] = { 1.0f, 2.0f, 3.0f, 4.0f, }; // out in global scope

generates

movaps      xmm5,xmmword ptr [::myarray4 (140512050h)]

Ideally, if I have constants, it would be nice if there were a way to not even touch memory and just do it with immediate-style instructions (e.g. the constants compiled into the instruction itself).

Thanks

Recommended answer

If you want to force it to a single load, you could try (gcc):

__attribute__((aligned(16))) float vec[4] = { 1.0f, 1.1f, 1.2f, 1.3f };
__m128 v = _mm_load_ps(vec); // edit by sor: removed the "&" cause its already an address

If you have Visual C++, use __declspec(align(16)) to request the proper alignment.
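One way to keep the same source building on both compilers is to hide the alignment request behind a macro. The sketch below is only an illustration: `ALIGN16`, `kVec`, and `load_constants` are names made up for this example, and it assumes an x86 target with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_storeu_ps */

/* Hypothetical helper macro: pick the alignment syntax for the compiler. */
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))
#endif

/* 16-byte aligned constant data, as _mm_load_ps requires. */
static ALIGN16 const float kVec[4] = { 1.0f, 1.1f, 1.2f, 1.3f };

/* Load all four floats with one aligned load. */
static __m128 load_constants(void)
{
    return _mm_load_ps(kVec);
}
```

Whether the load ends up as a single movaps still depends on the compiler and its flags, so it is worth checking the disassembly here as well.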

On my system, this (compiled with gcc -m32 -msse -O2; compiling with no optimization at all clutters the code but still retains the single movaps at the end) produces the following assembly code (gcc / AT&T syntax):

    andl    $-16, %esp
    subl    $16, %esp
    movl    $0x3f800000, (%esp)
    movl    $0x3f8ccccd, 4(%esp)
    movl    $0x3f99999a, 8(%esp)
    movl    $0x3fa66666, 12(%esp)
    movaps  (%esp), %xmm0

Note that it aligns the stack pointer before allocating stack space and storing the constants there. Leaving out the __attribute__((aligned(16))) may, depending on your compiler, produce incorrect code that skips this alignment step, so beware, and check the disassembly.
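If the alignment of a buffer cannot be guaranteed, a more defensive variant is to test the address and fall back to `_mm_loadu_ps`, which has no alignment requirement. This is a minimal sketch, assuming SSE is available; `load4` is a hypothetical helper name, not something from the answer above.

```c
#include <stdint.h>     /* uintptr_t */
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: aligned load when possible, unaligned otherwise. */
static __m128 load4(const float *p)
{
    if (((uintptr_t)p & 15u) == 0)
        return _mm_load_ps(p);   /* requires a 16-byte aligned address */
    return _mm_loadu_ps(p);      /* accepts any address */
}
```

Both paths read the same four floats; only the alignment requirement differs.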

Additionally:
Since you asked how to put the constants into the code, simply try the above with a static qualifier on the float array. That produces the following assembly:

    movaps  vec.7330, %xmm0
    ...
vec.7330:
    .long   1065353216
    .long   1066192077
    .long   1067030938
    .long   1067869798
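As for keeping constants out of memory entirely: SSE has no instruction that takes a floating-point immediate, but a few special values can be built from registers alone. The sketch below is not part of the answer above. It assumes SSE2 (for the integer compare and shifts, so -msse alone would not be enough), and `make_zero` and `make_ones` are hypothetical names. Arbitrary constants such as 1.1f still have to come from memory or from a general-purpose register.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE header */

/* All four lanes 0.0f; compilers typically emit xorps for this. */
static __m128 make_zero(void)
{
    return _mm_setzero_ps();
}

/* All four lanes 1.0f, built from bit operations instead of a load. */
static __m128 make_ones(void)
{
    __m128i z = _mm_setzero_si128();
    __m128i bits = _mm_cmpeq_epi32(z, z);   /* every bit set            */
    bits = _mm_srli_epi32(bits, 25);        /* 0x0000007F in each lane  */
    bits = _mm_slli_epi32(bits, 23);        /* 0x3F800000, i.e. 1.0f    */
    return _mm_castsi128_ps(bits);
}
```

An optimizing compiler may fold these sequences back into a plain constant load, so the disassembly is again the only reliable check.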
