Flipping sign on packed SSE floats
Question
I'm looking for the most efficient method of flipping the sign on all four floats packed in an SSE register.
I have not found an intrinsic for doing this in the Intel Architecture software dev manual. Below are the things I've already tried.
For each case I looped over the code 10 billion times and got the wall-time indicated. I'm trying to at least match the 4 seconds taken by my non-SIMD approach, which uses just the unary minus operator.
[48 sec] _mm_sub_ps( _mm_setzero_ps(), vec );
[32 sec] _mm_mul_ps( _mm_set1_ps( -1.0f ), vec );
[9 sec]
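union NegativeMask {
    unsigned int intRep;  /* unsigned: 0x80000000 does not fit in a signed 32-bit int */
    float fltRep;
} negMask;
negMask.intRep = 0x80000000u;  /* only the IEEE 754 sign bit is set */
_mm_xor_ps( _mm_set1_ps( negMask.fltRep ), vec );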
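The XOR approach can also be written without the union, because the float constant -0.0f has only the sign bit set. Below is a minimal sketch, assuming an x86 target with SSE enabled; the helper names negate_ps and negate4 are made up for illustration and are not part of any library.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_xor_ps, _mm_set1_ps, ... */

/* Hypothetical helper: flips the sign of all four packed floats.
   -0.0f is 0x80000000 in IEEE 754 single precision, i.e. only the sign bit,
   so XOR with it toggles the sign and leaves exponent and mantissa alone. */
static __m128 negate_ps(__m128 vec)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    return _mm_xor_ps(sign_mask, vec);
}

/* Hypothetical convenience wrapper for (possibly unaligned) float arrays. */
static void negate4(const float *in, float *out)
{
    _mm_storeu_ps(out, negate_ps(_mm_loadu_ps(in)));
}
```

In a hot loop the mask would normally be built once, before the loop, so that only the XOR remains per iteration. Whether that changes the timings above would have to be measured.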
The compiler is gcc 4.2 with -O3. The CPU is an Intel Core 2 Duo.
Recommended answer
Just to complete your own answer, here is what the gcc documentation says about these builtin vector types:
The types defined in this manner can be used with a subset of normal C
operations. Currently, GCC will allow using the following operators on
these types: `+, -, *, /, unary minus, ^, |, &, ~'.
It is probably a good idea to stick to these operators whenever possible. gcc will very likely generate the most efficient code for this SSE stuff.
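As an illustration of the operators quoted from the documentation, unary minus can be applied directly to a vector type. This is a minimal sketch, assuming GCC (or a compiler that implements the same vector extension, such as Clang); the names v4sf, negate_v4sf and negate4_vec are chosen for the example only.

```c
#include <string.h>  /* memcpy */

/* Four packed floats as a GCC vector type (16 bytes wide).
   The name v4sf is just a common convention, not a requirement. */
typedef float v4sf __attribute__((vector_size(16)));

/* Unary minus on the vector type negates every element;
   the compiler chooses the instruction sequence. */
static v4sf negate_v4sf(v4sf v)
{
    return -v;
}

/* Hypothetical wrapper: copy four floats in, negate, copy four floats out. */
static void negate4_vec(const float *in, float *out)
{
    v4sf v;
    memcpy(&v, in, sizeof v);
    v = negate_v4sf(v);
    memcpy(out, &v, sizeof v);
}
```

This leaves the instruction selection to the compiler instead of fixing it in an intrinsic call, which is the point the answer makes.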
For your compiler options, add something more specific to your architecture; something like -march=native will do in most cases.