Load constant floats into SSE registers
Problem description
I'm trying to figure out an efficient way to load compile-time constant floats into SSE(2/3) registers. I've tried simple code like this:
const __m128 x = { 1.0f, 2.0f, 3.0f, 4.0f };
but that generates 4 movss instructions from memory!
movss xmm0,dword ptr [__real@3f800000 (14048E534h)]
movss xmm1,dword ptr [__real@40000000 (14048E530h)]
movaps xmm6,xmm12
shufps xmm6,xmm12,0C6h
movss dword ptr [rsp],xmm0
movss xmm0,dword ptr [__real@40400000 (14048E52Ch)]
movss dword ptr [rsp+4],xmm1
movss xmm1,dword ptr [__real@40a00000 (14048E528h)]
which load the scalars in and out of memory... (?!?!)
Doing this, though,
float Align(16) myfloat4[4] = { 1.0f, 2.0f, 3.0f, 4.0f, }; // out in global scope
generates:
movaps xmm5,xmmword ptr [::myarray4 (140512050h)]
Ideally, if I have constants, it would be nice if there were a way not to touch memory at all and just do it with immediate-style instructions (e.g. the constants compiled into the instruction itself).
Thanks
Recommended answer
If you want to force it to a single load, you could try (gcc):
__attribute__((aligned(16))) float vec[4] = { 1.0f, 1.1f, 1.2f, 1.3f };
__m128 v = _mm_load_ps(vec); // edit by sor: removed the "&" cause its already an address
If you have Visual C++, use __declspec(align(16)) to request the proper constraint.
On my system, this (compiled with gcc -m32 -msse -O2; no optimization at all clutters the code but still retains the single movaps in the end) creates the following assembly code (gcc / AT&T syntax):
andl $-16, %esp
subl $16, %esp
movl $0x3f800000, (%esp)
movl $0x3f8ccccd, 4(%esp)
movl $0x3f99999a, 8(%esp)
movl $0x3fa66666, 12(%esp)
movaps (%esp), %xmm0
Note that it aligns the stack pointer before allocating stack space and putting the constants in there. Leaving the __attribute__((aligned)) out may, depending on your compiler, create incorrect code that doesn't do this, so beware, and check the disassembly.
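The gcc and Visual C++ spellings of the alignment request can be folded into one macro so the same source builds with either compiler. The following is only a sketch: ALIGN16 and sum_of_constants are made-up names for illustration, not part of any library, and the assert is there purely to make the 16-byte requirement of _mm_load_ps visible.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_storeu_ps */

/* ALIGN16 is a hypothetical helper macro, not a standard one. */
#if defined(_MSC_VER)
#  define ALIGN16 __declspec(align(16))
#else
#  define ALIGN16 __attribute__((aligned(16)))
#endif

/* Loads four constants with a single aligned load and returns their sum. */
static float sum_of_constants(void)
{
    ALIGN16 float vec[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float out[4];

    /* _mm_load_ps requires a 16-byte aligned address. */
    assert(((uintptr_t)vec & 15u) == 0);

    __m128 v = _mm_load_ps(vec);
    _mm_storeu_ps(out, v);  /* unaligned store, so out needs no alignment */
    return out[0] + out[1] + out[2] + out[3];
}
```

If the alignment request were dropped and the array happened to land on a misaligned address, the aligned load would fault at run time, which is why the disassembly is worth checking.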
Additionally:
Since you've been asking how to put constants into the code, simply try the above with a static qualifier for the float array. That creates the following assembly:
movaps vec.7330, %xmm0
...
vec.7330:
.long 1065353216
.long 1066192077
.long 1067030938
.long 1067869798
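For comparison, here is a sketch that uses the set intrinsics instead of a hand-aligned array. The helper names (load_constants, zero_vector, lane0) are made up for illustration. With optimization enabled, compilers commonly turn a _mm_setr_ps of compile-time constants into one load from read-only data, much like the static array above, but that is not guaranteed, so the disassembly is still worth checking. As far as I know, SSE has no instruction that takes an arbitrary float vector as an immediate; an all-zero vector is the usual exception, since _mm_setzero_ps typically becomes a register-only xorps.

```c
#include <xmmintrin.h>  /* SSE: _mm_setr_ps, _mm_setzero_ps, _mm_cvtss_f32 */

/* Hypothetical helper: builds the vector in memory order,
   so 1.0f ends up in the lowest lane. */
static __m128 load_constants(void)
{
    return _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
}

/* Hypothetical helper: an all-zero vector needs no constant in memory. */
static __m128 zero_vector(void)
{
    return _mm_setzero_ps();
}

/* Hypothetical helper: extracts the lowest lane as a scalar. */
static float lane0(__m128 v)
{
    return _mm_cvtss_f32(v);
}
```

Note that _mm_set_ps takes its arguments in the opposite order (highest lane first), which is why the sketch uses _mm_setr_ps to match the array layout.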